Proc Natl Acad Sci U S A. 2013 Nov 26;110(48):19313-7.
doi: 10.1073/pnas.1313476110. Epub 2013 Nov 11.

Revised standards for statistical evidence

Valen E Johnson. Proc Natl Acad Sci U S A. 2013.

Abstract

Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggests that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25-50:1, and to 100-200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.

Conflict of interest statement

The author declares no conflict of interest.

Figures

Fig. 1.
Evidence thresholds and size of corresponding significance tests. The UMPBT and significance tests used to construct this plot have the same (z, χ2, and binomial tests) or approximately the same (t tests) rejection regions. The smooth curves represent, from top to bottom, t tests based on 20, 30, and 60 degrees of freedom, the z test, and the χ2 test on 1 degree of freedom. The discontinuous curves reflect the correspondence between tests of a binomial proportion based on 20, 30, or 60 observations when the null hypothesis is p0 = 0.5.
Fig. 2.
P values versus UMPBT Bayes factors. This plot depicts approximate Bayes factors derived from 765 t statistics reported by Wetzels et al. (20). A breakdown of the curvilinear relationship between Bayes factors and P values occurs in the lower right portion of the plot, which corresponds to t statistics that produce Bayes factors that are near their maximum value.
Fig. 3.
Histogram of P values that were less than 0.05 and reported in ref. .
