CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials
BMJ 2010; 340 doi: https://doi.org/10.1136/bmj.c869 (Published 24 March 2010) Cite this as: BMJ 2010;340:c869
All rapid responses
Proper Interpretation of Trial Results Deserves Attention
Misinterpretation of results is a major problem in reports of trials.
At the same time as the CONSORT 2010 update was published, an article
appeared advocating that reports of completed studies should not include
power calculations [1], because power is irrelevant for interpreting
completed studies and investigators and readers often misinterpret
studies’ results by focusing on power and P-values to the exclusion of
estimates and confidence intervals. Although this viewpoint is likely too
drastic for the CONSORT group to adopt, the 2010 CONSORT explanation and
elaboration document [2] could have done more to mitigate this problem.
Misinterpretations are often based on the belief that a calculation
showing high power permits any result with P>0.05 to be interpreted as
strong evidence against any effect large enough to be important. This
occurs in both large and small studies, and even when the effect actually
estimated by the study would be important if it turned out to be correct.
The CONSORT document mentions a form of this problem in the section on
sample size (Item 7a), but not in the section on “Interpretation
consistent with results ...” (Item 22), and it does not clearly indicate
how such misinterpretation should be avoided. The implication that this is
mainly a problem “when in fact too few patients were studied to make such
a claim” is confusing: this is an interpretation problem, not a sample
size problem. Proper interpretation should take into account the
estimated effect and its confidence interval, rather than focusing only on
whether P<0.05.
Unfortunately, the discussion for Item 22 does not address the issue
at all, and the example it gives is a negative interpretation that does
not mention the estimated treatment effect or its confidence interval; it
only states that the study was “large” and the treatment did not reduce
risk “significantly” (which most readers will understand to mean
P>0.05). This example therefore appears to show exactly the sort of
poor reasoning that should be avoided. The CONSORT document does cite a
paper that advocated use of confidence intervals instead of power for
interpretation [3], but only in support of a puzzling statement that for
completed studies “power is then appropriately indicated by confidence
intervals”. Confidence intervals should be used directly for
interpretation instead of power, not as a way to indicate power.
A better example would include support like “the confidence bound
provides evidence against any substantial benefit.” Ironically, such
well-supported negative conclusions are sometimes criticized as invalid
due to lack of power [4, 5]. An additional form of misinterpretation is to
conclude that a difference large enough to be important must be present if
the observed difference was statistically significant. This is touched on
in the discussion of Item 17 (confidence intervals), but again not
addressed at Item 22. Recent research [6] found that about 75% of the
publications studied committed the “significance fallacy”, failing to
clearly distinguish “statistically significant” from the general meaning
of “significant” as a synonym for “important”.
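To make the contrast concrete, here is a minimal sketch (in Python, with entirely hypothetical trial numbers) of interpreting an estimate and its confidence interval directly, rather than appealing to power:

```python
import math

def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Point estimate and Wald 95% confidence interval for the
    risk difference p_a - p_b between two trial arms."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Hypothetical trial: 150/1000 events on treatment vs 160/1000 on control.
diff, low, high = risk_difference_ci(150, 1000, 160, 1000)
print(f"RD = {diff:.3f}, 95% CI ({low:.3f} to {high:.3f})")
# RD = -0.010, 95% CI (-0.042 to 0.022)
```

If the smallest clinically important benefit were, say, a 5 percentage point reduction in risk, the lower confidence bound (about −0.042) would itself provide evidence against any substantial benefit; no power calculation is needed to reach, or to justify, that negative conclusion.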
The CONSORT guidelines recommend reporting estimates and confidence
intervals, and reporting of them has increased [6]. Actual consideration
of them when interpreting study results seems to be much less common.
Such consideration is not as familiar or easy as just noticing whether or
not P<0.05, but it is crucial for valid interpretation. Work has begun
on a wiki webpage to try to make valid interpretation easier
(ctspedia.org/do/view/CTSpedia/ResultsInterpretation). The CONSORT
guidelines would also be an excellent place to emphasize key principles
about how to interpret results.
Peter Bacchetti
University of California, San Francisco, USA
peter@biostat.ucsf.edu
References
1. Bacchetti P: Current sample size conventions: Flaws, harms, and
alternatives. BMC Medicine 2010, 8.
2. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux
PJ, Elbourne D, Egger M, Altman DG: CONSORT 2010 Explanation and
Elaboration: updated guidelines for reporting parallel group randomised
trials. BMJ 2010, 340:c869.
3. Goodman SN, Berlin JA: The use of predicted confidence-intervals
when planning experiments and the misuse of power when interpreting
results. Annals of Internal Medicine 1994, 121(3):200-206.
4. Ford AC, Moayyedi P: "Power" of Selective Serotonin Reuptake
Inhibitors in Irritable Bowel Syndrome. Clinical Gastroenterology and
Hepatology 2010, 8(3):313-314.
5. Bacchetti P, Ladabaum U: "Power" of Selective Serotonin Reuptake
Inhibitors in Irritable Bowel Syndrome - Reply. Clinical Gastroenterology
and Hepatology 2010, 8(3):314-314.
6. Silva-Ayçaguer LC, Suárez-Gil P, Fernández-Somoano A: The null
hypothesis significance test in health sciences research (1995-2006):
statistical analysis and interpretation. BMC Med Res Methodol 2010, 10:44.
Competing interests: None declared
In their discussion on the CONSORT guidelines for clinical trial
reporting, Moher et al. state that accurate and transparent reporting are
important (1). About blinding, they write as follows: “Participants may
respond differently if they are aware of their treatment assignment ...
These biases have been well documented” (6 citations).
One of the six cited papers is the trial by Karlowski et al. (2),
which reported that the benefit of vitamin C against the common cold was
explained by a break in the blind. Its abstract states that the [vitamin
C] “effects demonstrated might be explained equally well by a break in
the double blind” (2). The Karlowski trial has frequently been cited by
clinical trialists as an example of the importance of blinding, and by
specialists in nutrition and infectious diseases as evidence that
vitamin C does not have a real effect on the common cold (3,4).
Given that Moher et al. argue for the importance of good quality
reporting of controlled trials, it is surprising that they ignore obvious
shortcomings in the Karlowski paper. The trial report is inconsistent with
several of the CONSORT 2010 items (1).
Item 1b. Abstract: “The abstract should accurately reflect what is
included in the full journal article” [in their text] and “for the primary
outcome, a result for each group and the estimated effect size and its
precision” [in Table 2]. Karlowski et al.'s abstract does not describe the
results for the primary outcome. Their results section describes:
“Volunteers taking placebo had colds of a mean duration of 7.14 days,
while those taking 3 gm of ascorbic acid (groups 2 and 3) had colds of a
mean duration of 6.59 days and those taking 6 gm had colds of a mean
duration of 5.92 days. Thus, each 3-gm increment of ascorbic acid would
appear to shorten the mean duration of a cold by approximately half a
day”, but this is not mentioned in their abstract (2). Instead, in their
abstract Karlowski et al. describe the findings of a post hoc subgroup
analysis based on guessing the treatment, even though half of recorded
common cold episodes were missing from the subgroup analysis without any
explanation.
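The dose-response arithmetic in the passage quoted above can be checked directly; a minimal sketch in Python (the durations are those reported by Karlowski et al.; the variable names are illustrative):

```python
# Mean cold durations (days) reported by Karlowski et al.,
# keyed by total daily ascorbic acid dose in grams.
mean_duration = {0: 7.14, 3: 6.59, 6: 5.92}

drop_first_3g = mean_duration[0] - mean_duration[3]   # 0.55 days
drop_second_3g = mean_duration[3] - mean_duration[6]  # 0.67 days
avg_per_3g = (mean_duration[0] - mean_duration[6]) / 2

print(f"{drop_first_3g:.2f}, {drop_second_3g:.2f}, average {avg_per_3g:.2f}")
# 0.55, 0.67, average 0.61
```

Each 3-gm increment does correspond to roughly half a day, which is what makes the omission of this primary-outcome result from the abstract notable.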
Item 3b. “Important changes to methods after trial commencement (such
as eligibility criteria), with reasons” (1). Although Karlowski et al.
explained the motivation for their post hoc subgroup analysis, they did
not give any details about the groups who were “blinded” and “unblinded”
after the trial. Karlowski et al. administered vitamin C in two ways,
using a 2×2 factorial design: prophylactically each day over the study,
and therapeutically for 5 days when a participant caught a cold. Thus, a
participant could be “unblinded” for either or both of the
supplementation methods. However, Karlowski et
al. did not report which supplementation method the “unblinded”
participants guessed correctly.
Item 12b. “Methods for additional analyses, such as subgroup analyses
and adjusted analyses. ... Because of the high risk for spurious findings,
subgroup analyses are often discouraged. Post hoc subgroup comparisons
(analyses done after looking at the data) are especially likely not to be
confirmed by further studies. Such analyses do not have great credibility”
(1). In the Karlowski trial, the subgroup analysis by “guessing the
treatment” was a post hoc subgroup comparison. The methods of the subgroup
analysis are poorly described (see Items 3b and 13b). Furthermore,
Karlowski et al. used “correct answer” as a surrogate for “knowing”
without considering that many answers were correct purely by guesswork.
Thus, the main finding of the Karlowski study, on the basis of their
abstract - the reason for Moher et al. to cite the study (1) - is based on
a poorly described post hoc subgroup analysis.
Item 13b. “For each group, losses and exclusions after randomisation,
together with reasons” (1). In the Karlowski trial, 42% (105/249) of
recorded common cold episodes were missing from the subgroup analysis (4),
but no reasons are given for this exclusion. Furthermore, the maximum
effect of vitamin C on common cold duration was even greater in the
“missing group” than in the entire study population (4), but this was not
commented on by Karlowski et al.
Item 20. “Trial limitations, addressing sources of potential bias,
imprecision, and, if relevant, multiplicity of analyses” (1). In the
abstract, the main finding of the Karlowski trial was the subgroup
difference in the effect of vitamin C by guessing the treatment. However,
the shortcomings of the subgroup analysis are not properly discussed.
There are numerous logical inconsistencies that should have been noted by
the original authors (3-5).
The Karlowski trial is particularly important for two reasons, one
methodological and one biological. First, it has been, and still is,
used as an example of the “placebo effect in action” (1,3). Second, the
Karlowski trial is a frequent citation in medical textbooks when stating
that vitamin C is ineffective against the common cold (3), whereas
placebo-controlled trials have shown quite consistently that it is
effective, even though its practical significance is unsettled (8).
Over a decade ago, I pointed out the problems of the Karlowski
subgroup analysis (4). The principal investigator of the Karlowski trial
did not find errors in my reanalysis (6,7). The study cannot be considered
as an example of placebo effect in action.
Thus, while Moher et al. propose the CONSORT items as a guide for
authors who write trial reports and for the readers of such reports, they
ignore those items when they refer to the Karlowski trial as evidence
justifying their claim that “biases [caused by the awareness of their
treatment assignment] have been well documented” (1).
Furthermore, Moher et al. do not refer to a recent large
meta-analysis of 202 trials covering 60 clinical conditions, which directly
compared a placebo arm with a no-treatment arm (9). Most of the included
trials had three arms so that the third arm received an active
intervention. Therefore, the participants of the placebo arms did not know
whether they were given the active treatment or not (blinded to the
treatment), whereas the no-treatment participants knew that they were not
being treated. Thus, this meta-analysis measures the importance of
blinding. Placebo had no effect when the outcome was binary or objective,
whereas it had effects on several subjective and continuous outcomes (9).
Furthermore, this meta-analysis gives estimates for bias potentially
caused by the lack of blinding in trials on various clinical conditions.
It would seem much more useful to identify the specific conditions in
which the lack of blinding may cause substantial bias, in contrast to
universal statements such as “biases have been well documented” (1).
Citation bias means that authors refer to studies that are consistent
with their preconceptions (10). There is evident citation bias in the
article by Moher et al. (1). When arguing that the lack of blinding causes
bias in controlled trials, they refer to an old study which supports their
preconceptions (2), ignoring the evidence which indicates that the old
study was erroneously analyzed (3-5). In addition, they ignore an
extensive meta-analysis which analyses the effect of blinding on 60
clinical conditions (9). Although trial reports should describe the level
of blinding, bias caused by the awareness of treatment assignment has not
been “well documented”.
Harri Hemilä
University of Helsinki
harri.hemila@helsinki.fi
REFERENCES
1. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux
PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and
elaboration: updated guidelines for reporting parallel group randomised
trials. BMJ 2010;340:c869. doi: 10.1136/bmj.c869.
2. Karlowski TR, Chalmers TC, Frenkel LD, Kapikian AZ, Lewis TL, Lynch JM.
Ascorbic acid for the common cold: a prophylactic and therapeutic trial.
JAMA 1975;231:1038-42.
3. Hemilä H. The most influential trial on vitamin C and the common cold:
Karlowski et al. (1975). In: Do vitamins C and E affect respiratory
infections? [PhD Thesis] University of Helsinki, Helsinki, Finland,
2006:21-7. Available at:
http://ethesis.helsinki.fi/julkaisut/laa/kansa/vk/hemila/ and
http://www.ltdk.helsinki.fi/users/hemila/karlowski/
4. Hemilä H. Vitamin C, the placebo effect, and the common cold: a case
study of how preconceptions influence the analysis of results. J Clin
Epidemiol 1996;49:1079-84.
5. Hemilä H. Analysis of clinical data with breached blindness
[commentary]. Stat Med 2006;25:1434-7.
6. Chalmers TC. To the preceding article by H. Hemilä. J Clin Epidemiol
1996;49:1085.
7. Hemilä H. To the dissent by Thomas Chalmers. J Clin Epidemiol
1996;49:1087.
8. Hemilä H, Chalker EB, Douglas RM. Vitamin C for preventing and treating
the common cold. Cochrane Database Syst Rev 2010;(3):CD000980.
9. Hrobjartsson A, Gøtzsche PC. Placebo interventions for all clinical
conditions. Cochrane Database Syst Rev 2010;(1):CD003974.
10. Gøtzsche PC. Reference bias in reports of drug trials. BMJ
1987;295:654-6.
Competing interests: None declared
The recently published CONSORT 2010 Statement [1] provides laudable
impetus in the ongoing quest to improve the reporting of randomised
controlled clinical trials. However, with the increasing contemporary
focus on participant self-determined, self-reported, and personally
relevant outcomes in clinical trials, it is worth placing due emphasis,
under Item 19 (All important harms or unintended effects in each group)
of the CONSORT 2010 Statement, on patients' own first-hand reports of
adverse symptoms [2] during trial participation and follow-up, rather
than on clinicians' impressions of them. Furthermore, for specific
detailed guidance on reporting trial-related harm in randomised trials,
CONSORT 2010 refers readers to a 2004 extension document [3] that
similarly did not discuss the role of patient-reported symptoms
unmediated by the scrutiny, judgement, and documentation of study
outcome assessors. Without such emphasis, outcome assessors are likely
to under-detect, downplay, or under-document the frequency or severity
of patient-reported symptoms, the more so according to their prior
beliefs in unblinded trials, thereby delaying recognition of, and
exacerbating the risk of, preventable adverse effects [4].
References:
1. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and
Elaboration: updated guidelines for reporting parallel group randomised
trials. BMJ 2010; 340: c869.
2. Basch E. The missing voice of patients in drug-safety reporting. N Engl
J Med 2010; 362: 865-73.
3. Ioannidis JP, Evans SJ, Gotzsche PC, et al. Better reporting of harms
in randomized trials: an extension of the CONSORT statement. Ann Intern
Med 2004; 141: 781-8.
4. Pakhomov SV, Jacobsen SJ, Chute CG, Roger VL. Agreement between patient
-reported symptoms and their documentation in the medical record. Am J
Manag Care 2008; 14: 530-9.
Competing interests: Dr J Ting is a current external postgraduate
student at the London School of Hygiene and Tropical Medicine.