2018 Dec 21;7:e36163. doi: 10.7554/eLife.36163.

Why we need to report more than 'Data were Analyzed by t-tests or ANOVA'



Tracey L Weissgerber et al. eLife.

Abstract

Transparent reporting is essential for the critical evaluation of studies. However, the reporting of statistical methods for studies in the biomedical sciences is often limited. This systematic review examines the quality of reporting for two statistical tests, t-tests and ANOVA, for papers published in a selection of physiology journals in June 2017. Of the 328 original research articles examined, 277 (84.5%) included an ANOVA or t-test or both. However, papers in our sample were routinely missing essential information about both types of tests: 213 papers (95% of the papers that used ANOVA) did not contain the information needed to determine what type of ANOVA was performed, and 26.7% of papers did not specify what post-hoc test was performed. Most papers also omitted the information needed to verify ANOVA results. Essential information about t-tests was also missing in many papers. We conclude by discussing measures that could be taken to improve the quality of reporting.

Keywords: analysis of variance; human biology; medicine; meta-research; statistics; systematic review; t-test; transparency.


Conflict of interest statement

TW, OG, VG, NM, SW: No competing interests declared.

Figures

Figure 1. Systematic review flow chart.
The flow chart illustrates the selection of articles for inclusion in this analysis at each stage of the screening process.
Figure 2. Many papers lack the information needed to determine what type of ANOVA was performed.
The figure illustrates the proportion of papers in our sample that reported information needed to determine what type of ANOVA was performed, including the number of factors, the names of factors, and the type of post-hoc tests. The top panel presents the proportion of all papers that included ANOVA (n = 225). 'Sometimes' indicates that the information was reported for some ANOVAs but not others. The bottom row examines the proportion of papers that specified whether each factor was between- vs. within-subjects. Papers are subdivided into those that reported using repeated measures ANOVA (n = 41) and those that did not report using repeated measures ANOVA (n = 184). RM: repeated measures.
Figure 3. Why it matters whether investigators use a one-way vs two-way ANOVA for a study design with two factors.
The two-way ANOVA allows investigators to determine how much of the variability explained by the model is attributed to the first factor, the second factor, and the interaction between the two factors. When a one-way ANOVA is used for a study with two factors, this information is missed because all variability explained by the model is assigned to a single factor. We cannot determine how much variability is explained by each of the two factors, or test for an interaction. The simulated dataset includes four groups – wild-type mice receiving placebo (closed blue circles), wild-type mice receiving an experimental drug (open blue circles), knockout mice receiving placebo (closed red circles) and knockout mice receiving an experimental drug (open red circles). The same dataset was used for all four examples, except that means for particular groups were shifted to show a main effect of strain, a main effect of treatment, an interaction between strain and treatment, or no main effects and no interaction. One- and two-way (strain × treatment) ANOVAs were applied to illustrate differences between how these two tests interpret the variability explained by the model.
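The split of explained variability described in this caption can be checked numerically. Below is a minimal pure-Python sketch using an invented, balanced 2 × 2 dataset (all values are hypothetical, not taken from the paper's simulation). In a balanced design, the between-groups sum of squares that a one-way ANOVA assigns to its single four-level factor decomposes exactly into strain, treatment, and interaction components.

```python
from statistics import mean

# Hypothetical 2 x 2 dataset (strain x treatment), n = 4 mice per cell.
# All values invented for illustration.
cells = {
    ("WT", "placebo"): [10.1, 9.8, 10.4, 9.7],
    ("WT", "drug"):    [12.0, 11.6, 12.3, 11.9],
    ("KO", "placebo"): [10.2, 9.9, 10.5, 10.0],
    ("KO", "drug"):    [14.1, 13.8, 14.4, 13.9],
}

n = 4  # observations per cell (balanced design)
grand = mean(v for vals in cells.values() for v in vals)

def level_mean(factor_index, level):
    """Mean of all observations at one level of one factor."""
    return mean(v for key, vals in cells.items()
                if key[factor_index] == level for v in vals)

strains = ["WT", "KO"]
treatments = ["placebo", "drug"]

# Main-effect sums of squares: deviations of level means from the grand
# mean, weighted by the number of observations per level (2 cells x n).
ss_strain = 2 * n * sum((level_mean(0, s) - grand) ** 2 for s in strains)
ss_treat  = 2 * n * sum((level_mean(1, t) - grand) ** 2 for t in treatments)

# Between-groups sum of squares over the four cell means -- this is all
# the variability a one-way ANOVA assigns to its single factor.
ss_between = n * sum((mean(vals) - grand) ** 2 for vals in cells.values())

# In a balanced design, the interaction is what remains after the two
# main effects are removed from the between-groups variability.
ss_interaction = ss_between - ss_strain - ss_treat

print(f"SS strain      = {ss_strain:.4f}")
print(f"SS treatment   = {ss_treat:.4f}")
print(f"SS interaction = {ss_interaction:.4f}")
print(f"SS between     = {ss_between:.4f}  (one-way lumps all three together)")
```

The one-way between-groups term equals the sum of the three two-way components, illustrating the caption's point: the one-way analysis sees the same explained variability but cannot say how much belongs to strain, treatment, or their interaction.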
Figure 4. Additional implications of using a one-way vs two-way ANOVA.
This figure compares key features of one- and two-way ANOVAs to illustrate potential problems with using a one-way ANOVA for a design with two or more factors. When used for a study with two factors, the one-way ANOVA incorrectly assumes that the groups are unrelated, generates a single p-value that does not provide information about which groups are different, and does not test for interactions. The two-way ANOVA correctly interprets the study design, which can increase power. The two-way ANOVA also allows for the generation of a set of p-values that provide more information about which groups may be different, can test for interactions, and may eliminate the need for unnecessary post-hoc comparisons. This figure uses an experimental design with four groups (wild-type mice receiving placebo, wild-type mice receiving an experimental drug, knockout mice receiving placebo and knockout mice receiving an experimental drug). See Figure 3 for a detailed explanation of the material in the statistical implications section. KO: knockout; WT: wild-type; Pla: placebo.
Figure 5. Why it matters whether investigators used an ANOVA with vs. without repeated measures.
This figure highlights the differences between ANOVA with vs. without repeated measures and illustrates the problems with using an ANOVA without repeated measures when the study design includes longitudinal or non-independent measurements. These two tests interpret the data differently, test different hypotheses, use information differently when calculating the test statistic, and give different results.
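The "different results" point can be made concrete with a small pure-Python sketch (all data invented for illustration): four hypothetical subjects measured at three time points, where subjects differ substantially from one another but each shows a similar upward trend. A repeated-measures ANOVA removes the between-subject variability before forming its error term; an ANOVA without repeated measures leaves that variability in the error term, where it can swamp the time effect.

```python
from statistics import mean

# Hypothetical longitudinal data: 4 subjects x 3 time points.
# Large between-subject differences, consistent within-subject trend.
data = [
    [10, 12, 14],   # subject 1
    [20, 21, 24],   # subject 2
    [30, 33, 34],   # subject 3
    [40, 42, 45],   # subject 4
]
n_sub, n_time = len(data), len(data[0])

grand = mean(v for row in data for v in row)
time_means = [mean(row[t] for row in data) for t in range(n_time)]
sub_means = [mean(row) for row in data]

ss_total = sum((v - grand) ** 2 for row in data for v in row)
ss_time = n_sub * sum((m - grand) ** 2 for m in time_means)
ss_sub = n_time * sum((m - grand) ** 2 for m in sub_means)

# Repeated-measures ANOVA: subject-to-subject differences are removed
# from the error term before testing the time effect.
ss_error_rm = ss_total - ss_sub - ss_time
f_rm = (ss_time / (n_time - 1)) / (ss_error_rm / ((n_time - 1) * (n_sub - 1)))

# ANOVA without repeated measures: subject differences stay in the
# error term, inflating it.
ss_within = ss_total - ss_time
f_ind = (ss_time / (n_time - 1)) / (ss_within / (n_sub * n_time - n_time))

print(f"F with repeated measures:     {f_rm:.2f}")
print(f"F ignoring repeated measures: {f_ind:.2f}")
```

With this invented dataset the repeated-measures F is large while the independent-groups F is below 1: the two tests interpret the same numbers very differently, which is why readers need to know which one was used.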
Figure 6. Why papers need to contain sufficient detail to confirm that the appropriate t-test was used.
This figure highlights the differences between unpaired and paired t-tests by illustrating how these tests interpret the data differently, test different hypotheses, use information differently when calculating the test statistic, and give different results. If the wrong t-test is used, the result may be misleading because the test will make incorrect assumptions about the experimental design and may test the wrong hypothesis. Without the original data, it is very difficult to determine what the result should have been (see Figure 7).
Figure 7. Differences between the results of statistical tests depend on the data.
The three datasets use different pairings of the values shown in the dot plot on the left. The comments on the right side of the figure illustrate what happens when an unpaired t-test is inappropriately used to compare paired, or related, measurements. We expect paired data to be positively correlated – two paired observations are usually more similar than two unrelated observations. The strength of this correlation will vary. We expect observations from the same participant to be more similar (strongly correlated) than observations from pairs of participants matched for age and sex. Stronger correlations result in greater discrepancies between the results of the paired and unpaired t-tests. Very strong correlations between paired data are unusual but are presented here to illustrate this relationship. We do not expect paired data to be negatively correlated – if this happens it is important to review the experimental design and data to ensure that everything is correct.
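The effect of correlation on the two tests can be sketched in a few lines of pure Python. The data below are invented for illustration: five hypothetical paired measurements (e.g. before/after treatment in the same participants) chosen so that pairs are strongly positively correlated. The paired test analyzes the within-pair differences; the unpaired test treats the columns as unrelated groups, so the large between-subject spread inflates its standard error.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements in five participants (values invented).
before = [5, 8, 11, 14, 17]
after  = [6, 9, 13, 15, 19]
n = len(before)

# Paired t-test: test whether the mean within-pair difference is zero.
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Unpaired (Student's) t-test with equal group sizes: pooled variance
# includes the large between-subject spread that pairing would remove.
sp2 = (stdev(before) ** 2 + stdev(after) ** 2) / 2
t_unpaired = (mean(after) - mean(before)) / sqrt(sp2 * 2 / n)

print(f"paired t   = {t_paired:.2f}")   # large: consistent within-pair change
print(f"unpaired t = {t_unpaired:.2f}") # small: pairing information discarded
```

Here the paired t-statistic is more than ten times the unpaired one for identical numbers, mirroring the figure's point that the discrepancy between the two tests grows with the strength of the correlation between paired observations.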
Figure 8. Few papers report the details needed to confirm that the result of the ANOVA was correct.
This figure reports the proportion of papers with ANOVAs (n = 225) that reported the F-statistic, degrees of freedom and exact p-values. 'Sometimes' indicates that the information was reported for some ANOVAs contained in the paper but not for others.
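When the F-statistic, degrees of freedom and group summary statistics are all reported, a reader can reconstruct a one-way ANOVA result and check it. The sketch below uses entirely hypothetical numbers (three groups of ten, with means and SDs of the kind a paper might report) to show the arithmetic of that verification.

```python
# Hypothetical reported summaries for a one-way ANOVA: three groups,
# each n = 10, with these means and sample SDs (all values invented).
means = [5.0, 6.0, 7.5]
sds = [1.2, 1.1, 1.3]
ns = [10, 10, 10]

k = len(means)
n_total = sum(ns)
grand = sum(n * m for n, m in zip(ns, means)) / n_total

# Reconstruct the sums of squares from the reported summaries alone.
ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

# Degrees of freedom follow directly from the design, so a reader can
# also check that the reported df match the number of groups and animals.
df_between, df_within = k - 1, n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A paper reporting this hypothetical analysis should give F with 2 and 27 degrees of freedom and a matching value; a mismatch in either the F-statistic or the degrees of freedom signals a problem. None of this checking is possible when, as Figure 8 shows, papers omit these details.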
