Rapid response to:

Education And Debate

Sifting the evidence—what's wrong with significance tests? Another comment on the role of statistical methods

BMJ 2001; 322 doi: https://doi.org/10.1136/bmj.322.7280.226 (Published 27 January 2001) Cite this as: BMJ 2001;322:226

Rapid Response:

Understanding statistical significance testing

Editor – Few readers of the BMJ appear to understand statistical
significance testing(1, 2). This is not surprising. Imagine a patient with
a BP of 170/110 asking a doctor for the probability that the finding of
‘hypertension’ would be replicated, i.e. that the BP would again be higher
than 150/90 on repeat measurement. Instead of being told ‘about 95%’, say,
the patient is told that if we assume that the true blood pressure was
120/80, then the probability of observing a BP of 170/110 or higher by
chance would be 4%. The BP of 120/80 is analogous to a null hypothesis and
the probability of 4% is analogous to a P value.

In a crossover trial, 14 out of 19 patients respond better to drug A
than to drug B. The ‘P value’ is calculated using the binomial theorem by
imagining 19 patients selected at random from a hypothetical population in
which ‘A is better than B’ in 0.5 of cases (a null hypothesis). The
proportion of such samples giving the observed result of 14/19, or a more
extreme one (15/19 up to 19/19), would be 3.18% (i.e. P = 0.0318).
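
The tail probability quoted above can be checked in a few lines of Python. This is a minimal sketch rather than the author's own calculation: the helper name `upper_tail` is hypothetical, and it assumes a one-sided test in which only results favouring drug A count as extreme.

```python
from math import comb


def upper_tail(n: int, k_min: int, p: float) -> float:
    """Probability of k_min or more successes in n independent trials,
    each with success probability p (hypothetical helper)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Null hypothesis: 'A is better than B' in 0.5 of cases.
# Observed result: 14 of 19 patients responded better to A.
p_value = upper_tail(19, 14, 0.5)
print(f"P = {p_value:.4f}")  # prints "P = 0.0318"
```

With a null proportion of 0.5 the sum reduces to 16664/524288, which agrees with the 3.18% given in the text.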

A doctor might wonder about the probability of replication(3), i.e. the
probability that drug A would still be better than drug B if the study
were repeated with the same numbers (i.e. getting a result of 10/19 or
11/19, up to 19/19). We can find this by selecting 19 patients from a
hypothetical population in which ‘A is better than B’ in 0.737 (=14/19) of
cases, made up of patients from pooled studies with a result of 14/19.
Using the binomial theorem again, the proportion of studies with a
result of 10/19 or over is 96.06%, i.e. the ‘probability of replication’
in this case.
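
The replication figure can be explored with the same kind of binomial sum, this time with the proportion set to the observed 14/19. The sketch below is illustrative only: `upper_tail` is a hypothetical helper, not something taken from the letter. It evaluates the tail from both 10/19 and 11/19 because, on this calculation, the quoted 96.06% corresponds to a result of 11/19 or over, whereas counting 10/19 as well gives about 98.70%.

```python
from math import comb


def upper_tail(n: int, k_min: int, p: float) -> float:
    """Probability of k_min or more successes in n independent trials,
    each with success probability p (hypothetical helper)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical population in which 'A is better than B' in 14/19 of cases.
p_obs = 14 / 19

# Tail probabilities for the two candidate replication thresholds.
replication = {k_min: upper_tail(19, k_min, p_obs) for k_min in (10, 11)}
for k_min, prob in replication.items():
    print(f"P(at least {k_min}/19) = {prob:.2%}")
```

Either way the result lies close to 1 – P, so the comparison the letter goes on to make is not sensitive to which threshold is used.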

Note that this probability of replication of 96.06% is similar to 1 –
P = 100 – 3.18 = 96.82%. This is so in the special case where
‘replication’ means getting a similar result again that just excludes the
null hypothesis. The probability of replication would also rise or fall
depending on other factors, such as the specified replication limits
(these may be confidence limits), the accuracy of the study’s description,
and the similarity of the patients and geographical areas.

Replicating clinical findings and study results is familiar to
doctors and scientists. To understand something, we have to use models
based on our own familiar experiences. Statistical hypotheses are
concerned with replication. Scientific hypotheses also depend on much
imagination and on the use of other models that are not necessarily
statistical(4).

D E H Llewelyn
Consultant Physician

KRUF Centre, West Wales General Hospital, Carmarthen SA31 2AF

1. Editor’s choice. BMJ 2001;322:0 (27 January).

2. Sterne JAC, Smith GD. BMJ 2001;322:226-231 (27 January).

3. Llewelyn DEH. Assessing the validity of diagnostic tests and
clinical decisions. MD thesis, University of London, 1988.

4. Llewelyn DEH, Hopkins A. Editors’ introduction. In: Analysing how
we reach clinical decisions. London: Royal College of Physicians
Publications, 1993.

Competing interests: No competing interests

19 February 2001