Peter Jüni a
a Department of Social and Preventive Medicine, University of Bern, Bern, 3012 Switzerland, b Imperial Cancer Research Fund Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF, c Medical Research Council Health Services Research Collaboration, Department of Social Medicine, University of Bristol, Bristol BS8 2PR
Correspondence to: M Egger m.egger{at}bristol.ac.uk
The quality of controlled trials is of obvious relevance to systematic reviews. If the "raw material" is flawed then the conclusions of systematic reviews cannot be trusted. Many reviewers formally assess the quality of primary trials by following the recommendations of the Cochrane Collaboration and other experts.1 2 However, the methodology for both the assessment of quality and its incorporation into systematic reviews and meta-analysis is a matter of ongoing debate.3-5 In this article we discuss the concept of study quality and the methods used to assess quality.
Components of internal and external validity of controlled clinical trials
Internal validity: extent to which systematic error (bias) is minimised in clinical trials
External validity: extent to which results of trials provide a correct basis for generalisation to other circumstances
Quality is a multidimensional concept, which could relate to the design, conduct, and analysis of a trial, its clinical relevance, or quality of reporting.6 The validity of the findings generated by a study is clearly an important dimension of quality. In the 1950s the social scientist Campbell proposed a useful distinction between internal and external validity (see box).7 8 Internal validity implies that the differences observed between groups of patients allocated to different interventions may, apart from random error, be attributed to the treatment under investigation. In contrast, external validity, or generalisability, is the extent to which the results of a study provide a correct basis for generalisations to other circumstances. There is no external validity per se: the term is meaningful only with regard to specified "external" conditions, such as other patient populations or treatment regimens. Internal validity is a prerequisite for external validity: the results of a flawed trial are invalid, and the question of its external validity becomes redundant.
Dimensions of internal validity
Internal validity is threatened by bias, "any process at any stage of inference tending to produce results that differ systematically from the true values."9 In clinical trials, biases fall into four categories: selection bias, performance bias, detection bias, and attrition bias (box).
Selection bias
The aim of randomisation is the creation of groups that are comparable for any known or unknown potential confounding factors.10 Success depends on two interrelated procedures (see box).11 Firstly, an allocation sequence that is suitable to prevent selection bias must be generated (for example, by using a computer algorithm, tossing a coin, or throwing a die). Secondly, this sequence must be concealed from investigators enrolling patients. Knowledge of assignments (for example, from a table of random numbers posted on a bulletin board) can cause selective enrolment of patients on the basis of prognostic factors.12 Patients who would have been assigned to a treatment deemed to be "inappropriate" may be rejected, and some patients may be deliberately directed to the "appropriate" treatment.13 Deciphering of allocation schedules may occur even if concealment was attempted. For example, envelopes may be opened or held against a bright light to reveal the contents.14
The two interrelated steps of randomisation
Generation of allocation sequences
Concealment of allocation sequences
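The first step, generating the sequence with a computer algorithm, can be sketched as follows. This is a minimal illustration of simple randomisation, not a procedure described by the authors: the function name `simple_allocation_sequence`, the arm labels, and the seed are hypothetical. The second step, concealment, is organisational and cannot be achieved by code alone; the list would need to be held by someone not involved in enrolling patients.

```python
import random


def simple_allocation_sequence(n_participants, arms=("treatment", "control"), seed=None):
    """Generate an allocation sequence by simple randomisation.

    Each participant is assigned independently of all others, the software
    analogue of tossing a coin. `seed` is an assumption added so that a
    demonstration can be reproduced; a real trial would not publish it.
    """
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_participants)]


# Hypothetical usage: 20 assignments, of which the first five are shown.
sequence = simple_allocation_sequence(20, seed=2001)
print(sequence[:5])
```

Simple randomisation can leave the groups unequal in size by chance; restricted schemes exist to limit this, but they are outside what the text describes.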
Performance bias and detection bias
Performance bias occurs if additional treatment interventions are provided preferentially to one group. Blinding of patients and care providers prevents this type of bias and also safeguards against differences in placebo responses between the groups. Detection bias arises if the knowledge of patient assignment influences the assessment of outcome.15 This is avoided by the blinding of those assessing outcomes (for example, patients, care providers, radiologists, or end point review committees; see box).
Attrition bias
Deviations from protocol and loss to follow up often lead to the exclusion of patients after they have been allocated to treatment groups, which may introduce attrition bias. Possible deviations from protocol include the violation of eligibility criteria and non-adherence to treatments. Loss to follow up refers to patients becoming unavailable for examinations at some stage during the study period because they refuse to participate further (also called drop outs), because they cannot be contacted, or because clinical decisions are made to stop the assigned interventions.
Empirical evidence of bias
Numerous case studies show that the biases described above do occur in practice, distorting the results of clinical trials.6 The authors are aware of four methodological studies that have gauged their relative importance in a large number of clinical trials while avoiding confounding by disease or intervention.20-23 The figure shows a meta-analysis of the results from these studies. Inadequate or unclear concealment of treatment allocation was associated with an exaggeration of treatment effects in all four studies. Odds ratios from trials with inadequate or unclear concealment were on average 30% lower (more beneficial) than those from trials with adequate methodology (combined ratio of odds ratios 0.70, 95% confidence interval 0.62 to 0.80).
The inappropriate generation of allocation sequences was assessed in three studies only and was not consistently associated with treatment effects, although an effect was evident in the study from Denmark (figure).20 21 23 Interestingly, when only trials with adequate concealment of allocation were analysed in Schulz et al's study, those with an inadequate generation of allocation sequences did yield inflated treatment effects.20 This indicates that if assignments are predictable some deciphering can occur, even with adequate concealment. On the other hand, the generation of unbiased sequences is probably irrelevant if the sequences are not concealed from those involved in the enrolment of patients.13
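The link between a ratio of odds ratios of 0.70 and the statement that odds ratios were "on average 30% lower" is simple arithmetic, sketched below. The function names are hypothetical, and the odds ratio of 0.80 for adequately concealed trials is an invented figure used only to show the scale of the distortion; it does not come from the studies cited.

```python
def percent_exaggeration(ratio_of_odds_ratios):
    """Percentage by which odds ratios are lower, on average, in the flawed trials."""
    return (1.0 - ratio_of_odds_ratios) * 100.0


def expected_biased_odds_ratio(odds_ratio_adequate, ratio_of_odds_ratios):
    """Odds ratio expected from flawed trials, given the odds ratio from adequate ones.

    Assumes the ratio of odds ratios is defined as
    (odds ratio in flawed trials) / (odds ratio in adequate trials).
    """
    return odds_ratio_adequate * ratio_of_odds_ratios


ROR_CONCEALMENT = 0.70  # combined ratio of odds ratios quoted in the text

print(round(percent_exaggeration(ROR_CONCEALMENT)))                 # 30
print(round(expected_biased_odds_ratio(0.80, ROR_CONCEALMENT), 2))  # 0.56
```

Under these assumptions, a treatment whose adequately concealed trials give an odds ratio of 0.80 would on average appear to have an odds ratio of about 0.56 in trials with inadequate or unclear concealment.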
Results for double blinding were more heterogeneous: the two larger studies20 22 found that estimates were on average moderately biased in open trials, whereas one of the two smaller studies showed no effect,21 and the other showed substantial bias associated with lack of double blinding (figure).23 To some extent the importance of blinding depends on the outcomes assessed. In some situations (for example, when examining the effect of an intervention on overall mortality) blinding of outcome assessment is irrelevant. Differences in the type of outcomes examined could thus explain the discrepancy between the studies.
Furthermore, investigators' understanding of who exactly should be
blinded in double blind trials varies,24 and this may also
introduce heterogeneity. Two studies addressed attrition bias but used
different definitions. Schulz et al compared trials that reported
exclusions with trials that either explicitly reported no exclusions or
gave the impression that no exclusions had taken place.20
In contrast, Kjaergard et al compared trials that reported adequately on attrition (independent of whether exclusions occurred) with trials with inadequate reporting.23 Schulz et al found little difference in effect estimates (ratio of odds ratios 1.07, 95% confidence interval 0.94 to 1.21), whereas Kjaergard et al found a trend towards larger effect estimates in trials with adequate reporting (ratio of odds ratios 1.50, 0.80 to 2.78).20 23 The methods used to assess attrition were unsatisfactory in both of these
studies. Future research in this area should distinguish between
quality of reporting and methodological quality and consider that some
exclusions and losses to follow up may be unavoidable whereas others
are clearly inappropriate.
Dimensions of external validity
External validity relates to the applicability of the results of a study to other "populations, settings, treatment variables, and measurement variables".8 External validity is a matter of judgment, which depends on the characteristics of the patients included in the trial, the setting, the treatment regimens, and the outcomes assessed (box).8 In recent years large meta-analyses based on data from individual patients have shown that important differences in treatment effects may exist between patient groups and settings. For example, antihypertensive treatment reduces total mortality in middle aged patients with hypertension, but this may not be the case in elderly people.25 The benefit of fibrinolytic treatment in suspected acute myocardial infarction has been shown to decrease linearly with the delay between the start of symptoms and the initiation of treatment.26 In trials of cholesterol lowering drugs the benefits, in terms of reductions in non-fatal myocardial infarction and mortality due to coronary heart disease, depend on the reduction in total cholesterol concentration and the duration of follow up.27 At the very least, therefore, assessment of a trial's applicability requires adequate information about the characteristics of the participants.
Quality of reporting
The assessment of the methodological quality of a trial is intertwined with the quality of reporting (that is, the extent to which a report provides information about the design, conduct, and analysis of the trial).4 Reports often omit important methodological details. For example, only 1 of 122 randomised trials of selective serotonin reuptake inhibitors specified the method of randomisation.28 A widely used approach to this problem is to assume that the quality was inadequate unless information to the contrary is provided (the "guilty until proved innocent" approach). This is often justified because faulty reporting generally reflects faulty methods.20 29 A well conducted but badly reported trial will, however, be misclassified. An alternative approach is to explicitly assess the quality of the reporting rather than the adequacy of the methods. This is also problematic because a biased but well reported trial will receive full credit.30 The adoption of guidelines on the reporting of clinical trials has recently improved this situation for several journals,31 32 but deficiencies in reporting will continue to be confused with deficiencies in design, conduct, and analysis.
Assessing trial quality
How the quality of trials should be assessed is being debated. Quality scales combine information on several features in a single numerical value, whereas the component approach examines key dimensions individually, without calculation of a score. Moher et al reviewed the use of quality scores in systematic reviews published in medical journals and the Cochrane database of systematic reviews.33 Trial quality was assessed in 78 (38%) of the 204 reviews from journals, of which 20 (26%) used components and 52 (67%) used scales. By contrast, all 36 reviews from the database assessed quality, of which 33 (92%) used components and none used scales.
Scales vary considerably in dimensions covered and complexity.4 Many scales include items for which there is little evidence that they are related to the internal validity of a trial. For example, a widely used instrument includes items related to the presentation of data and the organisation of the trial.34 Unsurprisingly, different scales can lead to discordant results. This was shown in a study in which 25 different scales were used to assess 17 trials comparing low molecular weight heparin with standard heparin for thromboprophylaxis.5 With some scales, the relative risks of the "high quality" trials were close to unity and not statistically significant, indicating that low molecular weight heparin was not superior to standard heparin, whereas the "low quality" trials assessed by these scales showed better protection with the low molecular weight heparin. With other scales the opposite was the case: high quality trials indicated that low molecular weight heparin was superior to standard heparin, whereas low quality trials found no significant difference.5
When the association of effect estimates with quality scores is examined, interpretation of results is difficult. In the absence of an association there are three possible explanations35: there is no association with any of the components; there are associations with one or several components, but these components have so little weight that the effects are lost in the summary score; or there are associations with two or more components, but these cancel out so that no association is found with the overall score. On the other hand, if treatment effects do vary with quality scores then investigators will have to identify the component or components that are responsible for this association to interpret this finding.
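The third explanation, components whose associations cancel out in the overall score, can be made concrete with a toy calculation. Everything below is invented for illustration: the four trials, the two unnamed components, and the effect estimates, which are constructed so that meeting one component shifts the estimate by +0.3 and meeting the other by -0.3.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trials: each meets (1) or fails (0) two quality components.
trials = [
    {"component_a": 0, "component_b": 0, "effect": 0.0},
    {"component_a": 1, "component_b": 0, "effect": 0.3},
    {"component_a": 0, "component_b": 1, "effect": -0.3},
    {"component_a": 1, "component_b": 1, "effect": 0.0},
]


def mean_effect_by(trials, key):
    """Average effect estimate within each level returned by `key`."""
    groups = defaultdict(list)
    for trial in trials:
        groups[key(trial)].append(trial["effect"])
    return {level: mean(values) for level, values in sorted(groups.items())}


# Summary score: number of components met (0, 1, or 2).
by_score = mean_effect_by(trials, lambda t: t["component_a"] + t["component_b"])
by_component_a = mean_effect_by(trials, lambda t: t["component_a"])
by_component_b = mean_effect_by(trials, lambda t: t["component_b"])

print("by summary score:", by_score)
print("by component a:  ", by_component_a)
print("by component b:  ", by_component_b)
```

In this constructed example the average effect is the same at every level of the summary score, although each component on its own is clearly associated with the effect estimate.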
The analysis of individual components of trial quality overcomes many of the shortcomings of composite scores. The component approach takes into account that the importance of individual quality domains, and the direction of potential biases associated with these domains, varies between the contexts in which trials are performed.
Incorporating study quality into meta-analysis
It makes intuitive sense to take into account information on the
quality of studies when doing systematic reviews. One approach is to
exclude trials that fail to meet some standard of quality. This may
often be justified but could exclude studies that might contribute
valid information. It may therefore be prudent to exclude only trials with gross deficiencies in design (for example, those that clearly failed to study comparable groups). The possible influence of study
quality on effect estimates should, however, always be examined in a
given set of included studies. Several approaches have been proposed
for this purpose.
Quality as a weight in statistical pooling
The most radical approach is to directly incorporate information
on study quality as weighting factors in the analysis. Study weights
can be multiplied by quality scores, thus increasing the weight of
trials deemed to be of high quality and decreasing the weight of those
of low quality.3 21 A trial with a quality score of 40 out of 100 will thus get the same weight in the analysis as a trial with half the amount of information but a quality score of 80.
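The arithmetic behind the 40 versus 80 comparison can be sketched as follows. The sketch assumes that "amount of information" means the usual inverse-variance weight and that the quality score is applied as a fraction of its maximum; the function names and the variances are hypothetical.

```python
import math


def quality_adjusted_weight(variance, quality_score, max_score=100):
    """Inverse-variance weight multiplied by the quality score as a fraction of the maximum."""
    return (1.0 / variance) * (quality_score / max_score)


def pooled_estimate(estimates, weights):
    """Weighted average of trial effect estimates (for example, log odds ratios)."""
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Hypothetical trials: B carries half the information of A (twice the variance)
# but has twice the quality score.
weight_a = quality_adjusted_weight(variance=0.05, quality_score=40)
weight_b = quality_adjusted_weight(variance=0.10, quality_score=80)

print(math.isclose(weight_a, weight_b))  # True
```

Because the two adjusted weights coincide, the pooled estimate from `pooled_estimate` would fall midway between the two trials, regardless of which quality items produced the scores.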
Sensitivity analysis
The robustness of the findings of a meta-analysis to different
assumptions should always be examined in a thorough sensitivity
analysis. An assessment of the influence of methodological quality
should be part of this process. Simple stratified analyses and
meta-regression models are useful for exploring associations between
treatment effects and study characteristics. Quality summary scores or
categorical data on individual components can be used for this purpose.
For the reasons discussed the authors recommend that sensitivity
analysis should be based on the components of study quality that are
considered important in the context of a given meta-analysis. Other
approaches, such as plotting effect estimates against quality scores or
performing cumulative meta-analysis in order of quality, are also
affected by the problems surrounding composite scales.3 36
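A simple stratified analysis on a single quality component might look like the sketch below. It is an illustration under stated assumptions rather than the authors' analysis: the trial data are invented, pooling uses a fixed-effect inverse-variance model, and the names `pool_fixed_effect` and `stratify_by_component` are hypothetical.

```python
import math


def pool_fixed_effect(trials):
    """Inverse-variance fixed-effect pooled log odds ratio and its standard error."""
    weights = [1.0 / t["se"] ** 2 for t in trials]
    pooled = sum(w * t["log_or"] for w, t in zip(weights, trials)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def stratify_by_component(trials, component):
    """Pool trials separately within each level of one quality component."""
    strata = {}
    for trial in trials:
        strata.setdefault(trial[component], []).append(trial)
    return {level: pool_fixed_effect(group) for level, group in strata.items()}


# Hypothetical trials, classified by adequacy of allocation concealment.
trials = [
    {"log_or": -0.2, "se": 0.2, "concealment": "adequate"},
    {"log_or": -0.4, "se": 0.2, "concealment": "adequate"},
    {"log_or": -0.6, "se": 0.2, "concealment": "inadequate"},
    {"log_or": -0.8, "se": 0.2, "concealment": "inadequate"},
]

results = stratify_by_component(trials, "concealment")
ratio = math.exp(results["inadequate"][0] - results["adequate"][0])

for level, (log_or, se) in results.items():
    print(level, "pooled odds ratio:", round(math.exp(log_or), 2))
print("ratio of odds ratios:", round(ratio, 2))
```

A ratio of odds ratios below 1 in such a comparison would suggest that trials failing the component report more beneficial effects; a meta-regression would extend the same idea to several components at once.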
Conclusions
There is ample evidence that many trials are methodologically weak and increasing evidence that deficiencies translate into biased findings of systematic reviews. The assessment of
the methodological quality of controlled trials and the conduct of
sensitivity analyses should therefore be considered routine procedures
in systematic reviews and meta-analysis. Although composite quality
scales may provide a useful overall assessment when comparing populations of trials, such scales should generally not be used to
identify trials of apparent low quality or high quality in a given
systematic review. Rather, the relevant methodological aspects should
be identified a priori and assessed individually. This should include
the generation and concealment of treatment allocation, blinding, and
handling of attrition in the analysis. Other ways of investigating and
dealing with bias in systematic reviews will be discussed and
illustrated later in this series.37
Acknowledgments
We thank Ken Schulz and Lise Kjaergard for unpublished data and Iain Chalmers for useful comments on an earlier version of this paper.
Footnotes
Series editor: Matthias Egger
Funding: PJ is supported by the Swiss National Science Foundation. The work on trial quality in Bristol was supported by the NHS Research and Development Programme.
Competing interests: None declared.
References
1. Clarke M, Oxman AD, eds. Cochrane reviewers' handbook 4.0. In: Cochrane Collaboration. Cochrane Library. Oxford: Update Software, 1999.
2. Cook DJ, Sackett DL, Spitzer WO. Methodologic guidelines for systematic reviews of randomized control trials in health care from the Potsdam consultation on meta-analysis. J Clin Epidemiol 1995; 48: 167-171.
3. Detsky AS, Naylor CD, O'Rourke K, McGeer AJ, L'Abbé KA. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol 1992; 45: 255-265.
4. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Controlled Clin Trials 1995; 16: 62-73.
5. Jüni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 1999; 282: 1054-1060.
6. Jüni P, Altman DG, Egger M. Assessing the quality of controlled clinical trials. In: Egger M, Davey Smith G, Altman DG, eds. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Books, 2001.
7. Campbell DT. Factors relevant to the validity of experiments in social settings. Psychol Bull 1957; 54: 297-312.
8. Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research on teaching. In: Gage NL, ed. Handbook of research on teaching. Chicago: Rand McNally, 1963:171-246.
9. Murphy EA. The logic of medicine. Baltimore: Johns Hopkins University Press, 1976.
10. Altman DG, Bland JM. Treatment allocation in controlled trials: why randomise? BMJ 1999; 318: 1209.
11. Altman DG. Randomisation. Essential for reducing bias. BMJ 1991; 302: 1481-1482.
12. Keirse MJ. Amniotomy or oxytocin for induction of labor. Re-analysis of a randomized controlled trial. Acta Obstet Gynecol Scand 1988; 67: 731-735.
13. Schulz KF. Randomised trials, human nature, and reporting guidelines. Lancet 1996; 348: 596-598.
14. Schulz KF. Subverting randomization in controlled trials. JAMA 1995; 274: 1456-1458.
15. Noseworthy JH, Ebers GC, Vandervoort MK, Farquhar RE, Yetisir E, Roberts R. The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. Neurology 1994; 44: 16-20.
16. Sackett DL, Gent M. Controversy in counting and attributing events in clinical trials. N Engl J Med 1979; 301: 1410-1412.
17. Coronary Drug Project Research Group. Influence of adherence to treatment and response of cholesterol on mortality in the CDP. N Engl J Med 1980; 303: 1038-1041.
18. May GS, Demets DL, Friedman LM, Furberg C, Passamani E. The randomized clinical trial: bias in analysis. Circulation 1981; 64: 669-673.
19. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999; 319: 670-674.
20. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995; 273: 408-412.
21. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998; 352: 609-613.
22. Jüni P, Tallon D, Egger M. 'Garbage in - garbage out'? Assessment of the quality of controlled trials in meta-analyses published in leading journals. In: Proceedings of the 3rd symposium on systematic reviews: beyond the basics, St Catherine's College, Oxford. Oxford: Centre for Statistics in Medicine, 2000:19.
23. Kjaergard LL, Villumsen J, Gluud C. Quality of randomised clinical trials affects estimates of intervention efficacy. In: Proceedings of the 7th Cochrane colloquium. Universita S.Tommaso D'Aquino, Rome. Milan: Centro Cochrane Italiano, 1999:57 (poster B10).
24. Devereaux PJ, Manns BJ, Ghali WA, Quan H, Lacchetti C, Mouton VM, et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA 2001; 285: 2000-2003.
25. Gueyffier F, Bulpitt C, Boissel JP, Schron E, Ekbom T, Fagard R, et al. Antihypertensive drugs in very old people: a subgroup meta-analysis of randomised controlled trials. Lancet 1999; 353: 796.
26. Fibrinolytic Therapy Trialists' (FTT) Collaborative Group. Indications for fibrinolytic therapy in suspected acute myocardial infarction: collaborative overview of early mortality and major morbidity results from all randomised trials of more than 1000 patients. Lancet 1994; 343: 311-322.
27. Thompson SG. Controversies in meta-analysis: the case of the trials of serum cholesterol reduction. Stat Methods Med Res 1993; 2: 173-192.
28. Hotopf M, Lewis G, Normand C. Putting trials on trial - the costs and consequences of small trials in depression: a systematic review of methodology. J Epidemiol Community Health 1997; 51: 354-358.
29. Liberati A, Himel HN, Chalmers TC. A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol 1986; 4: 942-951.
30. Feinstein AR. Meta-analysis: statistical alchemy for the 21st century. J Clin Epidemiol 1995; 48: 71-79.
31. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of reports of randomized trials. JAMA 2001; 285: 1987-1991.
32. Egger M, Jüni P, Bartlett C. Value of flow diagrams in reports of randomized controlled trials. JAMA 2001; 285: 1996-1999.
33. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, et al. Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess 1999; 3(12).
34. Chalmers TC, Smith H, Blackburn B, Silverman B, Schroeder B, Reitman D, et al. A method for assessing the quality of a randomized control trial. Controlled Clin Trials 1981; 2: 31-49.
35. Greenland S. Quality scores are useless and potentially misleading. Am J Epidemiol 1994; 140: 300-302.
36. Linde K, Scholz M, Ramirez G, Clausius N, Melchart D, Jonas WB. Impact of study quality on outcome in placebo-controlled trials of homeopathy. J Clin Epidemiol 1999; 52: 631-636.
37. Sterne JAC, Egger M, Davey Smith G. Investigating and dealing with publication and other biases in meta-analysis. BMJ 2001 (in press).