Cluster Post 3 | Module 4: Data Analysis and Presenting Results
From Concept to Submission Series | 2026
Inferential Statistics: Choosing and Reporting the Right Test
The module overview describes t-tests, ANOVA, correlation, regression, and chi-square. This post goes deeper: a decision framework for choosing the right test, exactly what to report for each test with worked examples, effect sizes explained and why they matter as much as p-values, and the multiple comparisons problem that invalidates many published results.
Choosing the Right Test: The Decision Logic
The most common statistical error in student and early-career research is not running a test incorrectly — it is running the wrong test on the right data. Choosing a test requires answering three questions in sequence: What is the research question asking? What type of variables are involved? Are the assumptions of the candidate test met?
| Research question | Appropriate test |
| Do two independent groups differ on a continuous outcome? | Independent samples t-test (or Mann-Whitney U if normality violated) |
| Does one group differ before vs. after an intervention? | Paired samples t-test (or Wilcoxon signed-rank if normality violated) |
| Do three or more independent groups differ on a continuous outcome? | One-way ANOVA (or Kruskal-Wallis if normality violated) |
| Is there a relationship between two continuous variables? | Pearson correlation (or Spearman’s rho if non-normal) |
| Does one continuous variable predict another? | Simple linear regression |
| Do multiple variables together predict an outcome? | Multiple linear regression |
| Is there a relationship between two categorical variables? | Chi-square test of independence |
| Does a categorical outcome depend on predictors? | Logistic regression |
The non-parametric alternatives in the table — Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Spearman’s rho — are not inferior tests you use when the real test fails. They are the correct tests for data that violates the normality assumption. Using a parametric test on non-normal data with a small sample produces inflated Type I error rates; using the appropriate non-parametric test is the methodologically sound choice.
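The decision table can be written down as a small lookup, which makes the logic explicit: the research question selects the row, and the normality check selects the parametric or non-parametric column. This is only a sketch; the question labels and the `choose_test` function are hypothetical names, not part of any statistics library.

```python
# Each (hypothetical) question label maps to (parametric test, non-parametric alternative).
# None means the table above lists no alternative for that row.
TESTS = {
    "two_independent_groups": ("Independent samples t-test", "Mann-Whitney U"),
    "one_group_before_after": ("Paired samples t-test", "Wilcoxon signed-rank"),
    "three_plus_independent_groups": ("One-way ANOVA", "Kruskal-Wallis"),
    "two_continuous_relationship": ("Pearson correlation", "Spearman's rho"),
    "one_continuous_predictor": ("Simple linear regression", None),
    "multiple_predictors": ("Multiple linear regression", None),
    "two_categorical_relationship": ("Chi-square test of independence", None),
    "categorical_outcome": ("Logistic regression", None),
}


def choose_test(question, normality_ok=True):
    """Return the test the table suggests for a question label."""
    parametric, nonparametric = TESTS[question]
    # Fall back to the listed test when the table gives no alternative.
    if normality_ok or nonparametric is None:
        return parametric
    return nonparametric


print(choose_test("two_independent_groups"))
print(choose_test("three_plus_independent_groups", normality_ok=False))
```

A lookup like this does not replace the third question (checking assumptions); it only records which test follows once that check has been made.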
Understanding p-Values: What They Do and Do Not Tell You
The p-value is the probability of obtaining results at least as extreme as yours if the null hypothesis were true and the study were repeated many times. It is not the probability that the null hypothesis is true. It is not the probability that your findings are a fluke. It is not a measure of the size or importance of an effect.
The .05 threshold is a convention, not a law of nature. Ronald Fisher, who popularised it in the 1920s, described it as a rough guideline for when results were worth further investigation — not as a bright line separating true from false findings. A result with p = .049 is not meaningfully different from one with p = .051. Both represent weak evidence by conventional standards.
Report exact p-values rather than “p < .05” or “n.s.” This allows readers to evaluate the strength of evidence themselves. “p = .043” tells a reader more than “p < .05”. “p = .412” tells a reader more than “n.s.” — particularly when the non-significant result is theoretically informative.
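Formatting exact p-values consistently is easy to automate. The helper below is a minimal sketch, assuming the common APA convention of three decimals, no leading zero, and "p < .001" for very small values; `format_p` is a hypothetical name.

```python
def format_p(p):
    """Format a p-value as an exact value, APA style (assumed convention)."""
    if p < 0.001:
        return "p < .001"
    # Three decimals, with the leading zero removed: 0.043 becomes .043
    return "p = " + f"{p:.3f}".lstrip("0")


print(format_p(0.043))   # p = .043
print(format_p(0.412))   # p = .412
print(format_p(0.0004))  # p < .001
```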
Why effect sizes matter as much as p-values
Statistical significance tells you whether the data are hard to reconcile with there being no effect at all. Effect size tells you how large the effect is. These are different questions, and both matter for interpreting research findings. A study with N = 2,000 will detect very small effects as statistically significant. An effect that is statistically significant but trivially small (say d = 0.05, one-twentieth of a standard deviation) is not practically meaningful, regardless of its p-value.
Conversely, a study with N = 30 may fail to detect a medium-sized effect as statistically significant simply because it is underpowered. A non-significant result in an underpowered study cannot be interpreted as evidence that the effect does not exist; it is evidence that the study could not detect an effect of the size studied. Effect sizes, combined with confidence intervals, give a more complete picture.
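The interaction between sample size and effect size can be made concrete. For two equal-sized independent groups, the t-statistic implied by a given Cohen's d is d × √(n/2), where n is the size of each group. The sketch below uses two hypothetical scenarios (they are illustrations, not results from any study): a small effect in a large sample and a medium effect in a small sample.

```python
import math


def t_from_d(d, n_per_group):
    """t statistic implied by Cohen's d for two equal-sized independent groups."""
    return d * math.sqrt(n_per_group / 2)


# Small effect (d = 0.10), N = 2,000 in total (1,000 per group)
t_large_sample = t_from_d(0.10, 1000)
# Medium effect (d = 0.50), N = 30 in total (15 per group)
t_small_sample = t_from_d(0.50, 15)

print(round(t_large_sample, 2))  # 2.24, above the two-tailed .05 critical value of about 1.96
print(round(t_small_sample, 2))  # 1.37, below the critical value of about 2.05 for df = 28
```

The smaller effect reaches significance and the larger one does not, purely because of sample size.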
| Effect size measure | Used with | Small / Medium / Large benchmarks (Cohen, 1988) |
| Cohen’s d | t-tests: difference between two means | 0.20 / 0.50 / 0.80 |
| η² (eta-squared) | ANOVA: proportion of variance explained | .01 / .06 / .14 |
| r | Correlation | .10 / .30 / .50 |
| R² | Regression: proportion of variance explained by model | .02 / .13 / .26 |
| Cramér’s V | Chi-square: association between categorical variables | .10 / .30 / .50 (for 2×2 tables; thresholds are lower for larger tables) |
Reporting Templates for Each Test
APA format specifies exactly what statistical information to include when reporting each test. These are not arbitrary style requirements — they ensure that readers have everything needed to evaluate your results, replicate your analysis, or include your findings in a meta-analysis. The following templates give the required elements.
Independent samples t-test
Students who received peer mentoring (M = 3.74, SD = 0.89) scored significantly higher on the belonging scale than students who did not (M = 3.12, SD = 0.97), t(438) = 7.03, p < .001, d = 0.67. Elements required: group means and SDs, t-statistic, degrees of freedom in parentheses, p-value, effect size (Cohen’s d).
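The effect size in this example can be recovered from the reported means and standard deviations. The sketch below assumes the two groups are of equal size, which the example does not state; with unequal groups the pooled standard deviation should be weighted by each group's degrees of freedom.

```python
import math


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


d = cohens_d(3.74, 0.89, 3.12, 0.97)
print(round(d, 2))  # 0.67
```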
One-way ANOVA
A one-way ANOVA revealed a significant effect of mentoring frequency on belonging scores, F(2, 439) = 14.23, p < .001, η² = .06. Post-hoc comparisons using Tukey’s HSD indicated that students with weekly contact (M = 3.91, SD = 0.84) scored significantly higher than those with monthly contact (M = 3.41, SD = 0.93, p = .003) and those with no contact (M = 3.12, SD = 0.97, p < .001). Monthly and no-contact groups did not differ significantly (p = .21). Elements required: F-statistic, both degrees of freedom (between groups, within groups), p-value, effect size (η²), post-hoc results when overall test is significant.
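For a one-way ANOVA, η² can be checked against the reported F-statistic and its degrees of freedom, because η² = (F × df between) / (F × df between + df within). A minimal sketch, with `eta_squared_from_f` as a hypothetical helper name:

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta-squared implied by a one-way ANOVA F statistic."""
    explained = f_value * df_between
    return explained / (explained + df_within)


eta_sq = eta_squared_from_f(14.23, 2, 439)
print(round(eta_sq, 2))  # 0.06
```

A check like this is a quick way to catch transcription errors before submission: if the reported η² and F disagree, one of them was copied incorrectly.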
Pearson correlation
Peer mentor contact frequency was significantly positively correlated with social belonging scores, r(440) = .38, p < .001, 95% CI [.30, .46]. Elements required: r coefficient, degrees of freedom (N − 2) in parentheses, p-value, 95% confidence interval. Note: the confidence interval for r is more informative than the p-value alone.
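The confidence interval for r is usually obtained with the Fisher z-transformation: transform r with the inverse hyperbolic tangent, build a normal-theory interval with standard error 1/√(N − 3), and transform back. The sketch below applies that standard approach to the example, taking N = 442 from the reported df of 440; whether statistical software uses exactly this method is an assumption.

```python
import math


def pearson_r_ci(r, n, z_crit=1.96):
    """95% CI for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)              # Fisher transform of r
    se = 1 / math.sqrt(n - 3)      # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


lower, upper = pearson_r_ci(0.38, 442)
print(round(lower, 2), round(upper, 2))  # 0.3 0.46
```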
Multiple regression
A multiple regression was conducted to predict social belonging from peer contact frequency, first-generation status, and college type. The model was statistically significant, F(3, 438) = 28.44, p < .001, R² = .16, indicating that the predictors together explained 16% of variance in belonging scores. Peer contact frequency was the strongest predictor (β = .31, p < .001), followed by first-generation status (β = -.18, p = .002). College type was not a significant predictor (β = .07, p = .14). Elements required: overall F and significance, R², standardised coefficients (β) and significance for each predictor. Also report unstandardised B coefficients and standard errors in a table.
Chi-square test of independence
A chi-square test of independence found a significant association between mentoring participation and first-year progression status (pass vs. resit/fail), χ²(1, N = 442) = 12.44, p < .001, Cramér’s V = .17. Elements required: chi-square value, degrees of freedom, N, p-value, effect size (Cramér’s V or phi for 2×2 tables).
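Cramér's V follows directly from the chi-square value: V = √(χ² / (N × (k − 1))), where k is the smaller of the number of rows and columns. A short sketch for the 2×2 example above, with `cramers_v` as a hypothetical helper name:

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic and the table dimensions."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))


v = cramers_v(12.44, 442, 2, 2)
print(round(v, 2))  # 0.17
```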
The Multiple Comparisons Problem
When you run multiple statistical tests on the same dataset, the probability of obtaining at least one false positive increases substantially. With alpha = .05, you accept a 5% chance of a false positive on any single test. If you run 20 independent tests, the probability of at least one false positive across all tests rises to approximately 64%, because 1 − (1 − .05)^20 ≈ .64.
This is called the familywise error rate, and it inflates silently every time you run more than one test without correction. Many published papers and theses run ten, twenty, or more tests without any correction, which means their “significant” findings may include several that are false positives produced by chance rather than by real effects.
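The inflation is simple to compute. Assuming the tests are independent and every null hypothesis is true, the familywise error rate for m tests is 1 − (1 − alpha)^m. A minimal sketch:

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for m in (1, 5, 10, 20):
    print(m, round(familywise_error_rate(m), 2))
# 20 tests give roughly 0.64
```

Tests on the same dataset are rarely fully independent, so the real rate may differ somewhat, but the direction of the problem is the same.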
When to apply a correction
Corrections are warranted when you are running multiple tests on the same outcome variable, when you are testing multiple outcomes in an exploratory analysis without strong prior hypotheses, or when individual tests are part of a family of related comparisons (such as all pairwise comparisons in a post-hoc ANOVA test).
The Bonferroni correction is the simplest: divide your alpha level by the number of tests. For 10 tests at alpha = .05, the corrected threshold is .005. It is conservative — it increases the risk of false negatives — but it is straightforward to apply and easy to explain.
For ANOVA post-hoc comparisons, Tukey’s HSD is more powerful than Bonferroni and is the standard recommendation. For complex analyses with many predictors, false discovery rate (FDR) corrections such as the Benjamini-Hochberg procedure offer a less conservative alternative to Bonferroni.
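The difference between Bonferroni and Benjamini-Hochberg is easiest to see side by side. The sketch below applies both to ten hypothetical p-values (invented for illustration). Bonferroni compares every p-value to alpha / m. Benjamini-Hochberg sorts the p-values, finds the largest rank k for which p(k) ≤ (k / m) × alpha, and rejects every hypothesis up to that rank.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank whose p-value sits at or below its rank-specific threshold
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values from ten tests
p_values = [0.001, 0.008, 0.012, 0.041, 0.049, 0.20, 0.35, 0.50, 0.74, 0.90]
print(sum(bonferroni_reject(p_values)))          # 1
print(sum(benjamini_hochberg_reject(p_values)))  # 3
```

On these values Bonferroni keeps one finding and Benjamini-Hochberg keeps three, which illustrates the trade-off between strict familywise control and statistical power.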
When correction is not required
Corrections are not always necessary. When you have a single, pre-specified primary hypothesis and a small number of secondary hypotheses clearly declared in advance, you can justify testing them without correction: the hypotheses were specified before data collection, so the analysis is not capitalising on chance. The key is transparency: your methodology chapter must clearly distinguish pre-specified primary hypotheses from exploratory analyses, and exploratory analyses require correction.
For Law Students
Quantitative analysis in empirical legal research most commonly uses regression to examine predictors of legal outcomes. Three specific considerations apply in this context.
Logistic regression for binary legal outcomes
Many legal outcomes are binary: case won or lost, bail granted or refused, sentence custodial or non-custodial, appeal upheld or dismissed. Binary outcomes require logistic regression, not linear regression. Linear regression applied to a binary outcome violates assumptions and can produce predicted probabilities outside the 0–1 range — a theoretical and practical impossibility.
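The out-of-range problem is easy to reproduce. The sketch below fits an ordinary least-squares line by hand to a small, entirely hypothetical dataset (x could be months of case age, y whether the case was adjourned) and shows predictions below 0 and above 1. The `logistic` function is included only to show that the logistic link keeps any score between 0 and 1; no logistic model is fitted here.

```python
import math

# Hypothetical data: ten cases, binary outcome coded 0/1
x = list(range(10))
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x


def linear_prediction(value):
    """Predicted 'probability' from the linear model (not bounded)."""
    return intercept + slope * value


def logistic(score):
    """Logistic link: maps any real number into the interval (0, 1)."""
    return 1 / (1 + math.exp(-score))


print(round(linear_prediction(0), 3))  # -0.127, below 0
print(round(linear_prediction(9), 3))  # 1.127, above 1
print(0 < logistic(-3) < logistic(3) < 1)  # True
```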
Logistic regression results are usually reported as odds ratios, which are the exponentiated regression coefficients, rather than as raw coefficients on the log-odds scale. An odds ratio of 2.3 for the predictor “represented by counsel” means that represented defendants have 2.3 times the odds of a favourable outcome compared to unrepresented defendants, holding other predictors constant. Always report 95% confidence intervals around odds ratios alongside their significance: a wide confidence interval around a large odds ratio signals imprecision that the odds ratio alone does not convey.
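An odds ratio and its confidence interval come from exponentiating the logistic coefficient and the ends of its interval. In the sketch below the coefficient is chosen so that the odds ratio matches the 2.3 used in the text, and the standard error of 0.30 is a hypothetical value for illustration only.

```python
import math


def odds_ratio_ci(coefficient, standard_error, z_crit=1.96):
    """Odds ratio and 95% CI from a logistic regression coefficient."""
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z_crit * standard_error)
    upper = math.exp(coefficient + z_crit * standard_error)
    return odds_ratio, lower, upper


coefficient = math.log(2.3)  # log-odds coefficient giving an odds ratio of 2.3
standard_error = 0.30        # hypothetical value

odds_ratio, lower, upper = odds_ratio_ci(coefficient, standard_error)
print(round(odds_ratio, 1), round(lower, 2), round(upper, 2))
# roughly 2.3 with an interval of about 1.28 to 4.14
```

An interval this wide is compatible with anything from a modest to a very large advantage, which is exactly the imprecision a bare odds ratio would hide.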
Reporting regression results for legal audiences
Legal audiences — including journal reviewers, examiners, and policy audiences — vary widely in statistical literacy. When writing for a primarily legal audience, supplement the statistical reporting with plain-language interpretations of each coefficient. After the formal reporting, add a sentence: “In practical terms, each additional month of case age at the point of listing was associated with a 4.2% increase in the probability of adjournment, controlling for case complexity and court location.” This is not dumbing down — it is translating the statistical finding into terms that allow legal reasoning about its implications.
The ecological fallacy in aggregate court data
When using aggregate court data — statistics at the district or state level rather than the individual case level — be cautious about drawing conclusions about individual cases. If districts with higher caseload show lower grant rates for bail applications, this does not necessarily mean individual judges in high-caseload courts are more likely to refuse bail. The association at district level may be confounded by case composition differences between districts. This is the ecological fallacy — inferring individual-level relationships from aggregate-level correlations — and it requires explicit acknowledgement when aggregate data is used.
References
- Field, A. (2024). Discovering Statistics Using IBM SPSS Statistics (6th ed.). Sage.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum.
- Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
- American Psychological Association. (2020). Publication Manual of the American Psychological Association (7th ed.).
- Tabachnick, B. G., & Fidell, L. S. (2022). Using Multivariate Statistics (8th ed.). Pearson.
- Frost, J. (2023). Regression Analysis: An Intuitive Guide. Statistics By Jim Publishing.
Next: Cluster Post 4 — Statistical Assumptions: The Checks Most Researchers Skip
