Some gentle statistics

BMJ 2001; 322 doi: (Published 27 January 2001) Cite this as: BMJ 2001;322:a

I propose that all readers of the BMJ with a shaky knowledge of statistics (which, if we are frank, is about 99.9% of us) read every word—and all the boxes—of the paper on significance testing on p 226. Many readers don't like longish articles and are phobic about statistics, which is why they should start with the cartoons. One shows a newsreader on “Today's Random Medical News” (p 227). Behind him are three dials indicating that “Smoking, coffee, stress, etc” cause “Heart disease, depression, breast cancer, etc” in “Children, rats, men aged 25-40.” You've heard those bulletins, and so have your patients. In the second cartoon a listener with a spinning head is being told “Don't eat eggs … eat more eggs … stay out of the sun … don't lie around inside” (p 230).

Why have we got into such a mess? Why has a recent book suggested that the solution to medicine's ills would be the closure of all departments of epidemiology? The answer, according to Jonathan Sterne and George Davey Smith, is an overdependence on significance testing and too many small and imprecise trials testing improbable hypotheses.

Most BMJ readers are used to the idea that journals will contain many “false positive” studies if they deem any study with a P value of 0.05 or below to be “positive” and publish studies that measure many variables and many outcomes in many subgroups. But the authors make a plausible calculation to show that it's much worse than that. Firstly, they assume that 10% of hypotheses are true and 90% untrue, a reasonable assumption. Their second assumption is that most studies are too small and that the average power of studies reported in medical journals is 50%. Lots of evidence supports this assumption.

Consider then 1000 studies testing different hypotheses. One hundred will be true, but 50% of those (because of lack of power) will be reported as untrue. Of the 900 hypotheses that are untrue, 45 (5%) will be reported as true because of the use of P<0.05 as the criterion for significance. So almost half of the 95 studies reported as “positive” are false alarms.
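For readers who prefer to see the arithmetic laid out, the calculation above can be sketched in a few lines (a minimal illustration using the assumptions stated in the article; the variable names are mine):

```python
# Sterne and Davey Smith's illustrative calculation:
# 1000 studies, 10% of hypotheses true, 50% average power, P<0.05 threshold.
n_studies = 1000
p_true = 0.10   # assumed fraction of hypotheses that are actually true
power = 0.50    # assumed average power of published studies
alpha = 0.05    # conventional significance threshold

true_hyps = n_studies * p_true         # 100 true hypotheses
false_hyps = n_studies - true_hyps     # 900 untrue hypotheses

true_positives = true_hyps * power     # 50 correctly reported as "positive"
false_positives = false_hyps * alpha   # 45 falsely reported as "positive"

positives = true_positives + false_positives   # 95 "positive" studies in all
false_alarm_fraction = false_positives / positives

print(f"{positives:.0f} positives, of which "
      f"{false_alarm_fraction:.0%} are false alarms")
# → 95 positives, of which 47% are false alarms
```

Nearly half of the apparently positive findings are spurious, even before multiple outcomes and subgroup analyses make matters worse.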

“In many ways,” argue the authors, “the general public is ahead of medical researchers in its interpretation of new ‘evidence.’ The reaction to ‘lifestyle scares’ is usually cynical, which, for many reasons, may well be rational.” The authors propose guidelines for reporting (and interpreting) the results of statistical analyses in medical journals, but one memorable guideline comes in a commentary on the paper: “All reports of large effects confined to Aston Villa supporters over the age of 75 and living south of Birmingham should go into the wastepaper basket” (p 231). For Aston Villa read Chelsea, Real Madrid, Wimbledon, or even Queens Park Rangers. You get the point.
