- aMedical Statistics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX,
- bDepartment of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
- Correspondence to: Mr Altman.
The non-equivalence of statistical significance and clinical importance has long been recognised, but this error of interpretation remains common. Although a significant result in a large study may sometimes not be clinically important, a far greater problem arises from misinterpretation of non-significant findings. By convention a P value greater than 5% (P>0.05) is called “not significant.” Randomised controlled clinical trials that do not show a significant difference between the treatments being compared are often called “negative.” This term wrongly implies that the study has shown that there is no difference, whereas usually all that has been shown is an absence of evidence of a difference. These are quite different statements.
The sample size of controlled trials is generally inadequate, with a consequent lack of power to detect real, and clinically worthwhile, differences in treatment. Freiman et al1 found that only 30% of a sample of 71 trials published in the New England Journal of Medicine in 1978-9 with P>0.1 were large enough to have a 90% chance of detecting even a 50% difference in the effectiveness of the treatments being compared, and they found no improvement in a similar sample of trials published in 1988. To interpret all these “negative” trials as providing evidence of the ineffectiveness of new treatments is clearly wrong and foolhardy. The term “negative” should not be used in this context.2
A recent example is given by a trial comparing octreotide and sclerotherapy in patients with variceal bleeding.3 The study was carried out on a sample of only 100 despite a reported calculation that suggested that 1800 patients were needed. This trial had only a 5% chance of getting a statistically significant result if the stated clinically worthwhile treatment difference truly existed. One consequence of such low statistical power was a wide confidence interval for the treatment difference. The authors concluded that the two treatments were equally effective despite a 95% confidence interval that included differences between the cure rates of the two treatments of up to 20 percentage points.
Similar evidence of the dangers of misinterpretation of non-significant results is found in numerous metaanalyses (overviews) of published trials, when few or none of the individual trials were statistically large enough. A dramatic example is provided by the overview of clinical trials evaluating fibrinolytic treatment (mostly streptokinase) for preventing reinfarction after acute myocardial infarction. The overview of randomised controlled trials found a modest but clinically worthwhile (and highly significant) reduction in mortality of 22%,4 but only five of the 24 trials had shown a statistically significant effect with P<0.05. The lack of statistical significance of most of the individual trials led to a long delay before the true value of streptokinase was appreciated.
While it is usually reasonable not to accept a new treatment unless there is positive evidence in its favour, when issues of public health are concerned we must question whether the absence of evidence is a valid enough justification for inaction. A recent publicised example is the suggested link between some sudden infant deaths and antimony in cot mattresses. Statements about the absence of evidence are common—for example, in relation to the possible link between violent behaviour and exposure to violence on television and video, the possible harmful effects of pesticide residues in drinking water, the possible link between electromagnetic fields and leukaemia, and the possible transmission of bovine spongiform encephalopathy from cows. Can we be comfortable that the absence of clear evidence in such cases means that there is no risk or only a negligible one?
When we are told that “there is no evidence that A causes B” we should first ask whether absence of evidence means simply that there is no information at all. If there are data we should look for quantification of the association rather than just a P value. Where risks are small P values may well mislead: confidence intervals are likely to be wide, indicating considerable uncertainty. While we can never prove the absence of a relation, when necessary we should seek evidence against the link between A and B—for example, from case-control studies. The importance of carrying out such studies will relate to the seriousness of the postulated effect and how widespread is the exposure in the population.