I thank Schiff[1] for his rapid response to my endgame on statistical hypothesis testing.[2]
The statistical null and alternative hypotheses are statements about the population. However, it is a common misconception that either hypothesis can be deemed true or false on the basis of the result (P-value) of a hypothesis test. In the absence of statistical significance, it can only be inferred that the study failed to find evidence to reject the null hypothesis in favour of the alternative. In the presence of statistical significance, it can only be inferred that the study found evidence to reject the null hypothesis in favour of the alternative. The reasoning for this is straightforward. The participants in a trial are a single sample from the population. Another trial will involve a different sample that may give very different results. This underlies the principle of sampling error,[3] and is the basis for undertaking meta-analyses. For the same reason, it is unlikely that any drug or therapeutic regimen would be licensed on the results of a single trial.
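The sampling-error argument above can be illustrated with a short simulation. This is a minimal Python sketch, assuming a hypothetical population in which the drug truly improves the quit rate (35% versus 25%, values invented for illustration and not taken from the trial in the endgame) and a pooled two-proportion z-test; the function names are also hypothetical. It repeats the same trial on fresh samples from that one population: some runs reach P < 0.05 and others do not, even though the truth of the null hypothesis never changes.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided P-value from a pooled two-proportion z-test of H0: p_a == p_b."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every participant had the same outcome: no evidence of a difference.
        return 1.0
    z = (x_a / n_a - x_b / n_b) / se
    return 2 * (1 - normal_cdf(abs(z)))


def simulate_trials(n_trials, n_per_arm, p_drug, p_placebo, seed=0):
    """Repeat the same two-arm trial on fresh random samples from one population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_values = []
    for _ in range(n_trials):
        # Each participant quits with the population probability for their arm.
        quit_drug = sum(rng.random() < p_drug for _ in range(n_per_arm))
        quit_placebo = sum(rng.random() < p_placebo for _ in range(n_per_arm))
        p_values.append(two_sided_p(quit_drug, n_per_arm, quit_placebo, n_per_arm))
    return p_values


if __name__ == "__main__":
    # Hypothetical population in which the drug truly is better (35% vs 25%).
    p_values = simulate_trials(n_trials=20, n_per_arm=150, p_drug=0.35, p_placebo=0.25, seed=1)
    significant = sum(p < 0.05 for p in p_values)
    print(f"{significant} of {len(p_values)} simulated trials reached P < 0.05")
    print("P-values:", ", ".join(f"{p:.3f}" for p in p_values))
```

With 150 participants per arm and a true 10-percentage-point difference, only around half of such trials would be expected to reach significance, which is why a non-significant result says nothing definite about whether the null hypothesis is true.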
It would be very difficult, if not impossible, to prove that a statistical hypothesis is true or false. Presumably the only way to achieve this would be to sample the entire population, which is not possible. If it were, statistical hypothesis testing would obviously not be performed, as it would not be necessary.
The trial referenced in the endgame was designed as a superiority trial. Whilst the text did not state this, it was implicit, since the statistical hypothesis test was stated to be two-sided. Traditionally, two-sided statistical hypothesis testing in a placebo-controlled trial is a process of investigating the prior belief, specified by the research hypothesis, that the drug is superior in effectiveness. If the researchers had doubted the effectiveness of varenicline compared with placebo, as Schiff suggested, would it have been ethical to undertake such a trial? No doubt it would be deemed unethical to undertake a placebo-controlled trial of a drug that was thought to be of no benefit (and possibly less effective than placebo) and that had potential side-effects! Would patients consent to that? Obviously, because the test was two-sided, superiority of placebo over varenicline could still have been demonstrated in a superiority design. However, if the researchers had doubted the effectiveness of varenicline compared with placebo, it would be an interesting debate whether they would have undertaken a non-inferiority trial rather than a superiority one.
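The contrast between the two designs can be sketched in a few lines. This is a minimal Python sketch, assuming a binary outcome (quit or not) analysed with normal-approximation z-tests; the counts, the 0.15 margin, the one-sided alpha of 0.025 (a common convention) and the function names are hypothetical and not taken from the trial in the endgame. The first function shows that a two-sided superiority test can reject the null hypothesis in either direction; the second tests instead against a pre-specified non-inferiority margin with a one-sided test, using the simple Wald standard error (one of several possible choices).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def superiority_two_sided(x_drug, n_drug, x_placebo, n_placebo, alpha=0.05):
    """Pooled two-proportion z-test of H0: no difference, against a two-sided H1."""
    pooled = (x_drug + x_placebo) / (n_drug + n_placebo)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_drug + 1 / n_placebo))
    if se == 0:
        raise ValueError("all outcomes identical; z-test is undefined")
    z = (x_drug / n_drug - x_placebo / n_placebo) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    if p_value >= alpha:
        return "no evidence to reject H0", p_value
    # A two-sided test can reject in either direction; the sign gives the direction.
    if z > 0:
        return "reject H0: drug superior", p_value
    return "reject H0: placebo superior", p_value


def non_inferiority(x_drug, n_drug, x_placebo, n_placebo, margin, alpha=0.025):
    """One-sided Wald z-test of H0: p_drug - p_placebo <= -margin (hypothetical margin)."""
    p_d = x_drug / n_drug
    p_p = x_placebo / n_placebo
    se = math.sqrt(p_d * (1 - p_d) / n_drug + p_p * (1 - p_p) / n_placebo)
    if se == 0:
        raise ValueError("zero standard error; Wald test is undefined")
    z = (p_d - p_p + margin) / se
    p_value = 1 - normal_cdf(z)
    if p_value < alpha:
        return "non-inferiority shown", p_value
    return "non-inferiority not shown", p_value


if __name__ == "__main__":
    print(superiority_two_sided(60, 150, 30, 150))
    print(superiority_two_sided(30, 150, 60, 150))
    print(non_inferiority(45, 150, 45, 150, margin=0.15))
```

The design choice is visible in the null hypotheses: the superiority test asks whether there is any difference at all, whereas the non-inferiority test asks only whether the drug is worse by more than a margin that would have to be justified before the trial.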
1. Schiff A. Re: Understanding statistical hypothesis testing. 22 June 2014.
2. Sedgwick P. Understanding statistical hypothesis testing. BMJ 2014;348:g3557.
3. Sedgwick P. What is sampling error? BMJ 2012;344:e4285.
Competing interests: No competing interests
15 August 2014
Philip M Sedgwick
Reader in Medical Statistics and Medical Education
Just read the print version - gobsmacked to be told statement c is false! And I spent 17 years designing clinical trials in the drug industry.
The online "explanation" (para 8) doesn't help either - what's the difference between inferring that varenicline is better than placebo and inferring that the null hypothesis is false? I hope the author will respond to this!
Also, I would disagree with the statement "The research hypothesis would have been that the outcome would be superior with varenicline compared with placebo" - the researchers could have been a sceptical lot, and doubted the value of varenicline - how are we to know?