- Thomas V Perneger (perneger@cmu.unige.ch), medical epidemiologist
- Institute of Social and Preventive Medicine, University of Geneva, CH-1211 Geneva 4, Switzerland
- Correspondence to: Dr Perneger
- Accepted 16 January 1998
When more than one statistical test is performed in analysing the data from a clinical study, some statisticians and journal editors demand that a more stringent criterion be used for “statistical significance” than the conventional P<0.05.¹ Many well-meaning researchers, eager for methodological rigour, comply without fully grasping what is at stake. Recently, adjustments for multiple tests (or Bonferroni adjustments) have found their way into introductory texts on medical statistics, which has increased their apparent legitimacy. This paper advances the view, widely held by epidemiologists, that Bonferroni adjustments are, at best, unnecessary and, at worst, deleterious to sound statistical inference.
Summary points
- Adjusting statistical significance for the number of tests that have been performed on study data (the Bonferroni method) creates more problems than it solves
- The Bonferroni method is concerned with the general null hypothesis (that all null hypotheses are true simultaneously), which is rarely of interest or use to researchers
- The main weakness is that the interpretation of a finding depends on the number of other tests performed
- The likelihood of type II errors is also increased, so that truly important differences are deemed non-significant
- Simply describing what tests of significance have been performed, and why, is generally the best way of dealing with multiple comparisons
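The summary points on interpretation and on type II errors can be made concrete with a small calculation. The sketch below assumes the usual form of the Bonferroni adjustment (each of k tests judged at α/k) and a two-sided z-test. The observed P value of 0.004, the true effect of 2.8 standard errors (which gives roughly 80% power at α=0.05), and the function names are hypothetical values chosen for illustration, not figures from this paper.

```python
"""Same data, different verdicts: a fixed P value judged under Bonferroni
adjustments for different numbers of tests, and the power each rule leaves.
A minimal sketch assuming a two-sided z-test; all names and numbers are
hypothetical."""
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def is_significant(p_value: float, k: int, alpha: float = 0.05) -> bool:
    """Bonferroni rule: the result counts only if P < alpha / k."""
    return p_value < alpha / k


def power_two_sided(effect_in_se: float, level: float) -> float:
    """Power of a two-sided z-test at the given level when the true effect
    equals effect_in_se standard errors (power = 1 - type II error rate)."""
    z_crit = STD_NORMAL.inv_cdf(1 - level / 2)
    upper = 1 - STD_NORMAL.cdf(z_crit - effect_in_se)  # statistic above +z_crit
    lower = STD_NORMAL.cdf(-z_crit - effect_in_se)      # statistic below -z_crit
    return upper + lower


if __name__ == "__main__":
    p_observed = 0.004  # hypothetical P value for one comparison
    effect = 2.8        # hypothetical true effect, in standard errors
    for k in (1, 5, 20):
        level = 0.05 / k
        print(f"k={k:2d}  level={level:.4f}  "
              f"significant={is_significant(p_observed, k)}  "
              f"power={power_two_sided(effect, level):.2f}")
```

Under these assumptions the comparison with P=0.004 is significant when 1 or 5 tests are counted but not when 20 are, although its own data are unchanged, and the power to detect the hypothetical effect falls from about 0.80 to about 0.41.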
Adjustment for multiple tests
Bonferroni adjustments are based on the following reasoning.¹⁻³ If a null hypothesis is true (for instance, two treatment groups in a randomised trial do not differ in terms of cure rates), a significant difference (P<0.05) will nevertheless be observed by chance in about one trial in 20. This is the type I error, or α. When 20 independent tests are performed (for example, study groups are compared with regard to 20 unrelated variables) and the null hypothesis holds for all 20 comparisons, the chance of at least one test being significant is no longer 0.05 but 0.64 (1 − 0.95²⁰ ≈ 0.64). …
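The arithmetic behind the figure of 0.64 can be checked directly, both exactly and by simulation. The sketch below assumes that all 20 null hypotheses are true and that the tests are independent, and it uses the usual form of the Bonferroni adjustment, dividing α by the number of tests; the function names are hypothetical.

```python
"""Family-wise type I error for k independent tests, with and without a
Bonferroni adjustment. A minimal sketch: it assumes every null hypothesis
is true and the tests are independent; all names are hypothetical."""
import random


def familywise_error(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests gives P < alpha."""
    # Each test avoids a false positive with probability (1 - alpha),
    # so all k tests avoid one with probability (1 - alpha) ** k.
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test level under the usual Bonferroni rule: alpha divided by k."""
    return alpha / k


def simulate_familywise_error(alpha: float, k: int,
                              trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null hypothesis a P value is uniform
    on [0, 1), so each simulated test is 'significant' with probability alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))  # at least one P < alpha
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    alpha, k = 0.05, 20
    adjusted = bonferroni_threshold(alpha, k)
    print(f"unadjusted, exact:     {familywise_error(alpha, k):.2f}")     # 0.64
    print(f"unadjusted, simulated: {simulate_familywise_error(alpha, k):.2f}")
    print(f"Bonferroni level per test: {adjusted:.4f}")                  # 0.0025
    print(f"adjusted, exact:       {familywise_error(adjusted, k):.3f}")  # 0.049
```

With the adjustment each of the 20 tests is judged at P<0.0025, which holds the chance of any false positive to about 0.049; the cost is a much stricter criterion for every individual comparison.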