Research Methods & Reporting

Interpreting and reporting clinical trials with results of borderline significance

BMJ 2011; 343 doi: https://doi.org/10.1136/bmj.d3340 (Published 04 July 2011) Cite this as: BMJ 2011;343:d3340

Origin of the 5% p-value threshold

Whilst few would disagree that the assessment of the reliability of a
study depends on more than just the p-value, it is nonetheless a key
indicator that should not be dismissed lightly.

The use of the 5% p-value threshold appears to have become universal
in biomedical research, yet it does not seem to be based on any clear
statistical reasoning. So far as I can make out, the origin of this
threshold seems to lie in a discussion of the theoretical basis of
experimental design, published by the Cambridge geneticist and
statistician RA Fisher in 1926 [1].

Fisher's work laid the statistical foundation for the evolution of
randomised controlled trials, which he and others developed over the
subsequent 25 years. With regard to the use of p<0.05, this was an
arbitrary threshold that he adopted in order to discuss the broader issue
of statistical significance, which was then a novel concept. To quote
Fisher from this paper:

"...If one in twenty does not seem high enough odds, we may, if we
prefer it, draw the line at one in fifty or one in a hundred. Personally,
the writer prefers to set a low standard of significance at the 5 per cent
point, and ignore entirely all results which fail to reach this level. A
scientific fact should be regarded as experimentally established only if a
properly designed experiment rarely fails to give this level of
significance..."

My interpretation of this is that the 5% standard represents the
absolute minimum standard for a single study, with non-significant results
only being admissible if the study falls within the context of a broader
evidence base, composed of similar studies that did yield statistically
significant results.
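
As a rough illustration of why Fisher's emphasis on repeated experiments
matters, the sketch below computes the chance that several independent
studies of a truly ineffective treatment would cross a given threshold
by chance alone. It assumes that each study has probability alpha of
yielding p < alpha under the null hypothesis and that the studies are
independent; the function name and the choice of five studies are
illustrative assumptions of mine and do not come from Fisher's paper.

```python
from math import comb


def prob_at_least_k_significant(alpha: float, n: int, k: int) -> float:
    """Chance that at least k of n independent null studies give p < alpha.

    Assumes each study is 'significant' with probability alpha when the
    treatment truly has no effect (a binomial model).
    """
    return sum(
        comb(n, j) * alpha**j * (1 - alpha) ** (n - j)
        for j in range(k, n + 1)
    )


# The three thresholds Fisher mentions: one in twenty, fifty and a hundred.
THRESHOLDS = (0.05, 0.02, 0.01)

if __name__ == "__main__":
    for alpha in THRESHOLDS:
        single = prob_at_least_k_significant(alpha, n=1, k=1)
        most = prob_at_least_k_significant(alpha, n=5, k=4)
        print(
            f"alpha={alpha:.2f}: one study {single:.4f}, "
            f"at least 4 of 5 studies {most:.2e}"
        )
```

Under these assumptions a single chance finding at the 5% level is
unremarkable, whereas a run of significant results across similar
studies is very unlikely to arise by chance.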

So while it may be entirely reasonable to set a more stringent threshold
than 5%, especially where methodological concerns raise the risk of bias,
moving to a more lenient threshold is much more difficult to justify.

Reference

1. Fisher RA. The arrangement of field experiments. Journal of the
Ministry of Agriculture of Great Britain 1926; 33:503-513.

Competing interests: I run a company that provides data analytic and health economic consultancy services to the pharmaceutical industry.

18 July 2011
Jonathan D Belsey
Managing Director
JB Medical Ltd