P values
BMJ 2010; 340 doi: https://doi.org/10.1136/bmj.c2203 (Published 28 April 2010) Cite this as: BMJ 2010;340:c2203
- Philip Sedgwick, senior lecturer in medical statistics
A randomised controlled trial evaluated the cost and efficacy of community leg ulcer clinics that used four layer compression bandaging.1 The control treatment was provision of usual care by district nurses. Over the 12 months of follow-up, ulcers healed more quickly among patients randomly assigned to clinic treatment than in those assigned to the control treatment (P=0.03). On the other hand, there was no difference between treatment groups in mean total NHS costs per patient (P=0.89). All statistical tests were two sided, and the critical level of significance was set at 0.05 (5%).
Which of the following statements, if any, can be concluded?
a) The P value represents the strength of the evidence in support of the null hypothesis
b) There was a statistically significant difference between treatments in healing times at the 0.05 level of significance
c) There is no difference in mean total NHS costs between treatments in the total population
d) The null hypothesis for the statistical test of mean total NHS costs between treatments is true.
Answers a and b can be concluded; answers c and d cannot.
Last week’s question described the role of the null and alternative hypotheses in statistical hypothesis testing.2 The null and alternative hypotheses for leg ulcer healing times are shown in the box.
Null and alternative hypotheses
Null hypothesis: For the population from which the sample was obtained, no differences exist in the healing times of leg ulcers between patients treated at community clinics and those treated by district nurses.
Alternative hypothesis: For the population from which the sample was obtained, the leg ulcer healing times of patients treated by district nurses are not the same as in patients treated in community clinics (two sided).
The P value is a probability and is therefore indicative of how likely an event is to occur. It represents the strength of the evidence provided by the sample data in support of the null hypothesis (a is true). For the above example of ulcer healing times, the classic definition of the P value is the probability of obtaining the observed difference between treatment groups in average leg ulcer healing times, or a larger difference, if the null hypothesis were true; that is, if there were no differences in the total population in leg ulcer healing times between patients treated in community clinics and those treated by district nurses.
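The definition of the P value can be made concrete with a permutation test, sketched here in Python. This is an illustration only: the trial's actual method of analysis is not described in this article, the healing times are invented for the example, and the function name permutation_p_value is hypothetical. If the null hypothesis were true, the treatment labels would be interchangeable, so the sketch reshuffles the labels many times and counts how often the reshuffled difference in mean healing time is at least as large as the observed one.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Approximate two sided P value for a difference in means (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_difference(values):
        # Absolute difference in means between the first n_a values and the rest.
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_difference(pooled)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the treatment labels at random
        # Small tolerance guards against floating point rounding in ties.
        if abs_mean_difference(pooled) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_permutations


# Invented healing times in weeks; NOT data from the trial.
clinic = [8, 10, 12, 9, 11, 7, 13, 10]
district_nurse = [12, 15, 11, 14, 16, 13, 18, 12]

p = permutation_p_value(clinic, district_nurse)
print(f"Approximate two sided P value: {p:.3f}")
```

The returned proportion estimates the probability of a difference as large as the observed one, or larger, when the null hypothesis holds. Taking the absolute difference makes the test two sided, in line with the alternative hypothesis in the box.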
A large P value suggests the sample data support the null hypothesis, whereas a small P value suggests they do not. The cut off between a large and a small P value is typically set at 0.05 (that is, 5%), termed the critical level of significance. The P value for the statistical test of healing times is 0.03, which is less than 0.05. Therefore, there is little evidence to support the null hypothesis, and we reject it in favour of the alternative hypothesis (b is true). There is a statistically significant difference in healing times at the 0.05 level of significance. The test itself is two sided and does not indicate direction; inspection of the sample data shows that leg ulcers healed more quickly in patients treated in community clinics.
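The decision rule can be written out in a few lines of Python. The sketch below applies it to the two P values reported in the trial (0.03 and 0.89) with the critical level of 0.05; the function name is_statistically_significant and the wording of the printed conclusions are illustrative assumptions, not part of the trial report.

```python
ALPHA = 0.05  # critical level of significance used in the trial


def is_statistically_significant(p_value, alpha=ALPHA):
    """Return True when the P value is below the critical level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("A P value is a probability and must lie between 0 and 1")
    return p_value < alpha


# P values as reported for the two outcomes in the trial.
reported_p_values = {"healing time": 0.03, "mean total NHS cost": 0.89}

for outcome, p_value in reported_p_values.items():
    if is_statistically_significant(p_value):
        conclusion = "reject the null hypothesis"
    else:
        conclusion = "fail to reject the null hypothesis"
    print(f"{outcome}: P={p_value:.2f}, {conclusion} at the {ALPHA} level")
```

The second branch is worded as "fail to reject" rather than "accept": a P value above the critical level does not show that the null hypothesis is true.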
There was not a statistically significant difference between treatment groups in the mean total NHS costs (P=0.89). However, this finding does not necessarily mean that there is no difference in the total population (c is false). All that we can conclude is that the study has failed to show a difference: absence of evidence is not evidence of absence.3 A further study, perhaps with a different sample of patients with leg ulcers or involving community clinics and district nurses in another geographical region, may give different results. Therefore, we can never conclude from a study that the null or alternative hypothesis is true or false (d is false). Sample data only ever provide evidence in support of the null or alternative hypothesis, in turn permitting inferences to be made about the population.
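A short simulation may help show why a non-significant result is not evidence of no difference. The sketch below is hypothetical throughout: it assumes normally distributed outcomes with a known standard deviation, a true difference between groups of 0.3 standard deviations, and 20 patients per group, none of which comes from the trial. It repeats the imagined study many times and reports how often a simple two sided z test fails to reach the 0.05 level even though a real difference exists in the population.

```python
import math
import random
from statistics import NormalDist, mean


def two_sided_z_test_p(sample_a, sample_b, sigma):
    """Two sided P value for a difference in means, assuming sigma is known."""
    standard_error = sigma * math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def proportion_not_significant(true_difference, sigma, n_per_group,
                               n_studies=2000, alpha=0.05, seed=1):
    """Proportion of simulated studies whose P value is not below alpha."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    misses = 0
    for _ in range(n_studies):
        group_a = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_difference, sigma) for _ in range(n_per_group)]
        if two_sided_z_test_p(group_a, group_b, sigma) >= alpha:
            misses += 1
    return misses / n_studies


# Hypothetical scenario: a real but modest difference and small groups.
miss_rate = proportion_not_significant(true_difference=0.3, sigma=1.0,
                                       n_per_group=20)
print(f"Studies that failed to show the true difference: {miss_rate:.0%}")
```

Under these assumptions most of the simulated studies fail to show a difference that is in fact present, which is why a single non-significant result cannot establish that the null hypothesis is true.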
Competing interests: None declared.