Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data
BMJ 2014; 348 doi: https://doi.org/10.1136/bmj.g2215 (Published 31 March 2014) Cite this as: BMJ 2014;348:g2215
- John Wood, principal research associate¹,
- Nick Freemantle, professor of clinical epidemiology and biostatistics¹,
- Michael King, professor of primary care psychiatry²,
- Irwin Nazareth, professor of primary care and population science¹
- ¹Research Department of Primary Care and Population Health and PRIMENT Clinical Trials Unit, University College London, London NW3 2PF, UK
- ²Division of Psychiatry and PRIMENT Clinical Trials Unit, University College London, London W1W 7EJ, UK
- Correspondence to: J Wood
- Accepted 10 March 2014
P values that fail to reach the conventional significance level of P≤0.05 are regularly reported as if they were moving in that direction. Phrases such as “almost/approaching statistical significance” or, most tellingly, a “trend towards” statistical significance continue to find their way into papers in journals with high impact factors.1 In this article, we examine the mathematical basis for this assumption and assess the extent to which a near significant P value may predict movement towards a future significant P value through the addition of extra data. We also explore the likelihood that extra data would actually result in a significant outcome and, lastly, the confidence one might have that a repeat experiment would independently give statistically significant results.
What does P value represent?
The clearest context in which to consider the correct interpretation of a P value is within a randomised trial. Fisher described how the “simple precaution of randomisation will suffice to guarantee the validity of the test of significance.”2 Random allocation of participants to groups ensures that only the play of chance or a real effect of treatment can explain any difference seen in outcome between the groups. A P value tells us how far chance alone can explain the observed difference and acts as a “snapshot” measure of the strength of evidence at the end of the trial.
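The link between random allocation and the P value can be illustrated with a randomisation (permutation) test. The following is a minimal sketch, not the authors' method: it assumes a difference in group means as the test statistic, estimates the P value by Monte Carlo re-allocation rather than exact enumeration, and uses invented outcome scores. The function name `randomisation_p_value` and its parameters are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def randomisation_p_value(treated, control, n_shuffles=2000, seed=0):
    """Estimate a two sided P value for a difference in means.

    Under the null hypothesis of no treatment effect, the group labels are
    arbitrary, so re-allocating participants at random shows how large a
    difference chance alone would produce.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # one random re-allocation of all participants
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        # Small tolerance guards against floating point ties.
        if diff >= observed - 1e-12:
            extreme += 1

    # Add-one correction: counts the observed allocation itself, so the
    # estimated P value is never exactly zero.
    return (extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Invented outcome scores, for illustration only.
    treated = [12, 15, 11, 18, 14, 16]
    control = [10, 13, 9, 12, 11, 14]
    print(randomisation_p_value(treated, control))
```

The returned value is the proportion of random re-allocations that give a difference at least as large as the one observed, which is the sense in which a P value measures how far chance alone can explain the result.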
Calculating extent to which “near significant” P value predicts subsequent significant one
As evidence accumulates, conclusions become more firmly based. A P value could easily be imagined as following …