Problems with use of composite end points in cardiovascular trials: systematic review of randomised controlled trials
BMJ 2007; 334 doi: https://doi.org/10.1136/bmj.39136.682083.AE (Published 12 April 2007) Cite this as: BMJ 2007;334:786
- Ignacio Ferreira-González, research fellow1,
- Gaiet Permanyer-Miralda, senior consultant2,
- Antònia Domingo-Salvany, senior scientist10,
- Jason W Busse, research associate3,
- Diane Heels-Ansdell, statistician3,
- Victor M Montori, associate professor5,
- Elie A Akl, assistant professor6,
- Dianne M Bryant, clinical epidemiologist8,
- Pablo Alonso-Coello, general practitioner9,
- Jordi Alonso, general practitioner10,
- Andrew Worster, associate professor3,
- Suneel Upadhye, associate member3,
- Roman Jaeschke, clinical professor4,
- Holger J Schünemann, associate professor7,
- Valeria Pacheco-Huergo, research fellow1,
- Ping Wu, senior scientist11,
- Edward J Mills, assistant professor12,
- Gordon H Guyatt, professor3
- 1Departament de Medicina, Universitat Autònoma de Barcelona, and Hospital Vall d'Hebron, Barcelona 08035, Spain
- 2Cardiology Service, Epidemiology Unit, Hospital General Vall d'Hebron
- 3Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada L8N 3Z5
- 4Department of Medicine, McMaster University
- 5Knowledge and Encounter Research Unit, Department of Medicine, Mayo Clinic College of Medicine, Rochester, MN 55905, USA
- 6Department of Medicine and of Social and Preventive Medicine, University at Buffalo, Buffalo, New York 14214, USA
- 7Unit of Clinical Research Development and INFORMAtion Translation/CLARITY Research Team, Department of Epidemiology, Italian National Cancer Institute Regina Elena, Rome 00144, Italy
- 8Faculty of Health Sciences, University of Western Ontario, Elborn College, London, ON, Canada N6G 1H1
- 9Iberoamerican Cochrane Centre, Department of Clinical Epidemiology and Public Health, Hospital de Sant Pau, Barcelona 08041
- 10Health Services Research Unit, Institut Municipal d'Investigació Médica (IMIM-hospital del mar), Barcelona E-08003
- 11College of Naturopathic Medicine, Toronto, ON, Canada M2K 1E2
- 12Global Health, Simon Fraser University, Vancouver, BC, Canada V5A 1S6
- Correspondence to: J W Busse j.busse{at}utoronto.ca
- Accepted 29 January 2007
Abstract
Objective To explore the extent to which components of composite end points in randomised controlled trials vary in importance to patients, the frequency of events in the more and less important components, and the extent of variability in the relative risk reductions across components.
Design Systematic review of randomised controlled trials.
Data sources Cardiovascular randomised controlled trials published in the Lancet, Annals of Internal Medicine, Circulation, European Heart Journal, JAMA, and New England Journal of Medicine, from 1 January 2002 to 30 June 2003. Component end points of composite end points were categorised according to importance to patients as fatal, critical, major, moderate, or minor.
Results Of 114 identified randomised controlled trials that included a composite end point of importance to patients, 68% (n=77) reported complete component data for the primary composite end point; almost all (98%; n=112) primary composite end points included a fatal end point. Of 84 composite end points for which component data were available, 54% (n=45) showed large or moderate gradients in both importance to patients and magnitude of effect across components. When analysed by categories of importance to patients, the most important components were associated with lower event rates in the control group (medians of 3.3-3.7% for fatal, critical, and major outcomes; 12.3% for moderate outcomes; and 8.0% for minor outcomes). Components of greater importance to patients were associated with smaller treatment effects than less important ones (relative risk reduction of 8% for death and 33% for components of minor importance to patients).
Conclusion The use of composite end points in cardiovascular trials is frequently complicated by large gradients in importance to patients and in magnitude of the effect of treatment across component end points. Higher event rates and larger treatment effects associated with less important components may result in misleading impressions of the impact of treatment.
Introduction
Composite end points capture the number of patients who have one or more of several events of interest. Clinical trials, particularly in cardiology,1 often use composite end points to reduce sample size requirements and to capture the overall impact of therapeutic interventions.2
Freemantle and colleagues have highlighted potential advantages and limitations of the use of composite end points.1 Although composite end points may increase the event rate and thus the statistical power of the study, they may mislead if component end points are of widely differing importance to patients, the number of events in the components of greater importance is small, and the magnitude of effect differs markedly across components.3 For example, a statement that an intervention reduces a composite of cardiovascular mortality, myocardial infarction, and revascularisation procedures is problematic if most of the events were revascularisation procedures and investigators found a large apparent effect of treatment on revascularisation but not on death or infarction.
To explore the characteristics of composite end points in common use, we reviewed a consecutive sample of randomised controlled trials that investigated cardiology interventions and were published in six prominent journals. In particular, we were interested in the extent to which components of composite end points varied in importance to patients, the frequency of events in the more and less important components, and the extent of variability in the relative risk reductions across components.
Methods
Eligibility criteria
We included parallel group randomised controlled trials that involved humans exposed to any cardiovascular therapeutic intervention and reported at least one composite end point. We defined a cardiovascular clinical trial as any randomised controlled trial in which the target population of the study had to have coronary artery disease, valvular heart disease, arrhythmia, cardiomyopathy, or congestive heart failure on entry. We also included randomised controlled trials investigating primary prevention of cardiovascular disease. We excluded trials that reported composite end points with components relating to toxicity or safety, trials whose composite end points contained no outcomes important to patients (that is, only surrogate outcomes), and subgroup analyses that ignored random allocation.
Literature search
We used Medline to search electronically four high impact general medicine journals (Lancet, Annals of Internal Medicine, JAMA, and New England Journal of Medicine) and two leading cardiology journals (Circulation and European Heart Journal), from 1 January 2002 to 30 June 2003. We used the publication type function to restrict our search to “randomized controlled trial” and “human” subjects; the National Library of Medicine and the US Cochrane Center have collaborated to use these terms to accurately index randomised controlled trials in the Medline database.4
Study selection
Eight investigators (JWB, EAA, DMB, PA-C, AW, SU, VP-H, and AD-S), working in pairs, used standardised forms to establish if abstracts of articles identified in our electronic search were parallel group randomised controlled trials studying humans and covered a cardiology topic (as defined above). We retrieved the full text of all potentially eligible articles. The same reviewers independently assessed eligibility of the full text articles with standardised forms and resolved discrepancies by discussion. An arbitrator (VMM) resolved any discrepancies that remained.
Data extraction
Seven reviewers (JWB, EAA, DMB, PA-C, AW, SU, and IF-G) trained in health research methods worked in pairs to extract data independently and in duplicate, using a standardised form and a data collection manual. Reviewers collected information on content area and the type of interventions tested, sample size, the length of follow-up, the number of composite end points presented, and the declared source(s) of funding.
To avoid confounding, we explored only data associated with each trial's primary composite end point. When authors reported more than one composite end point, we established the primary one by using the following hierarchy: (a) authors' explicit declaration of primacy, (b) the composite end point used to calculate the sample size, (c) authors' attribution of importance to the composite end point in their description of the results, and (d) the composite end point that appeared first in the methods section. Two reviewers (JWB and IF-G) independently selected the primary composite end point, resolving discrepancies by discussion.
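The selection hierarchy above can be sketched in code. The following Python is a minimal illustration, not the reviewers' actual procedure: the CompositeEndPoint record, its field names, and the select_primary function are hypothetical, and breaking ties within a rule by order of appearance in the methods section is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompositeEndPoint:
    """Hypothetical record of what a trial report says about one composite."""
    name: str
    declared_primary: bool = False       # (a) authors explicitly call it primary
    used_for_sample_size: bool = False   # (b) used in the sample size calculation
    emphasised_in_results: bool = False  # (c) given prominence in the results
    order_in_methods: int = 0            # (d) position in the methods section (0 = first)


def select_primary(candidates: List[CompositeEndPoint]) -> Optional[CompositeEndPoint]:
    """Apply rules (a) to (d) in order and return the first composite that qualifies."""
    if not candidates:
        return None
    rules = (
        lambda c: c.declared_primary,
        lambda c: c.used_for_sample_size,
        lambda c: c.emphasised_in_results,
    )
    for rule in rules:
        matches = [c for c in candidates if rule(c)]
        if matches:
            # Assumption: ties within a rule go to the composite reported first.
            return min(matches, key=lambda c: c.order_in_methods)
    # Rule (d): fall back to the composite that appears first in the methods.
    return min(candidates, key=lambda c: c.order_in_methods)
```

For example, a composite used for the sample size calculation is preferred over one that merely appears first in the methods section, unless the authors explicitly declared another composite as primary.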
For each composite end point we extracted the number of components, the effect of the intervention on the composite end point, and the number of events attributed to the composite end point. For the component end points of each composite end point we recorded the effect of the intervention and the number of patients who achieved the outcome. When authors reported results from the same composite end point for more than one time point, we used data from the longest interval. A statistician (DH-A) entered all data into an electronic database and reviewed them for errors and missing data.
Ranking of outcomes according to importance to patients
Patients will typically assign varying importance to different health outcomes.5 We sought, but were unable to find, a published hierarchical categorisation of importance to patients for cardiovascular outcomes. Therefore, to explore the variability in importance to patients across components, we developed a hierarchical categorisation of importance to patients for the component end points included in eligible studies. Two cardiologists (GP-M and IF-G) and nine internists (GHG, HJS, VMM, EAA, RJ, JA, VP-H, PA-C, and AD-S) independently categorised each of 72 outcomes used as components of composite end points in the eligible trials into five categories (I to V, in descending order of importance): I=death, II=critical, III=major, IV=moderate, and V=minor. Estimates of utility associated with the outcomes guided the process.6 Group members met to resolve disagreements and succeeded in coming to consensus.
Data analysis
The κ statistic provided a measure of interobserver agreement independent of chance on the eligibility of randomised controlled trials. We calculated, for each of the five categories of outcomes, the median event rate and the interquartile range for the control group as well as the effect of the intervention within the category by using the authors' reporting of relative risk, odds ratio, or hazard ratio. To ensure independence of observations within categories of importance to patients, we selected only one end point in each category for each composite end point to make these calculations; where a composite included more than one end point in the same category, we selected the end point with the highest event rate in the control group. To estimate the effect of the intervention across trials and within each category, we used random effects meta-analyses with an inverse variance approach. This method is conservative, in that it considers both within study and between study differences in calculating the pooled estimate. We used the I² statistic, the percentage of between study variability that is due to true differences between studies (heterogeneity) rather than sampling error (chance), to quantify inconsistency among trials.7
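As an illustration of the pooling step, the sketch below implements one common random effects estimator, the DerSimonian-Laird method, with inverse variance weights and the I² statistic. The paper does not state which estimator was used, so the choice of DerSimonian-Laird, the recovery of standard errors from 95% confidence intervals, the function name, and the example numbers are all assumptions made for illustration.

```python
import math


def random_effects_pool(estimates):
    """Pool ratio estimates given as (point, ci_lower, ci_upper) tuples.

    Assumes each interval is a 95% confidence interval that is symmetric
    on the log scale. Returns the pooled ratio, its 95% CI, tau^2 and I^2.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two studies")

    # Work on the log scale; recover each variance from the 95% CI width.
    y = [math.log(point) for point, _, _ in estimates]
    v = [((math.log(hi) - math.log(lo)) / (2 * 1.96)) ** 2
         for _, lo, hi in estimates]

    # Fixed effect (inverse variance) estimate and Cochran's Q.
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between study variance (DerSimonian-Laird), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random effects weights add tau^2 to each within study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: share of variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "pooled": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical relative risks for one category of importance to patients.
result = random_effects_pool([(0.90, 0.75, 1.08), (0.85, 0.60, 1.20), (1.02, 0.80, 1.30)])
```

When the between study variance is estimated as zero, the random effects weights reduce to the fixed effect inverse variance weights, which is why the method is described as the more conservative of the two.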
To describe the gradient of importance to patients among component end points, we considered a large gradient to be present in composite end points combining outcomes from categories I or II (fatal and critical) with outcomes from category V (minor). We considered a moderate gradient to be present when composite end points combined outcomes from categories I or II with outcomes from category IV (moderate) without any component from category V. We assigned a minor or absent gradient to composite end points not included in the other two categories.
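The classification rule for the gradient in importance to patients can be written as a short function. This is a sketch only: the function name and the encoding of categories I to V as the integers 1 to 5 are assumptions made for illustration.

```python
def importance_gradient(categories):
    """Classify a composite end point from the categories of its components.

    `categories` holds one integer per component: 1 = fatal, 2 = critical,
    3 = major, 4 = moderate, 5 = minor.
    """
    present = set(categories)
    has_fatal_or_critical = bool(present & {1, 2})

    # Large: fatal or critical components combined with a minor component.
    if has_fatal_or_critical and 5 in present:
        return "large"
    # Moderate: fatal or critical combined with moderate, and no minor component.
    if has_fatal_or_critical and 4 in present:
        return "moderate"
    # Everything else falls into the remaining category.
    return "minor or absent"
```

Under this rule a composite of death, non-fatal myocardial infarction (major), and a minor outcome has a large gradient, whereas a composite of major and minor outcomes only has a minor or absent gradient because it contains no fatal or critical component.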
We limited analysis of the gradient of efficacy across components to those composite end points that provided data on at least two of their individual end points. We assigned a large gradient in the effect of the intervention if the difference between the smallest and largest reported treatment effects (relative risk, odds ratio, or hazard ratio) was >0.4, a moderate gradient when the difference was 0.2 to 0.4, and a small gradient when the difference was <0.2.
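The thresholds for the gradient in treatment effect translate directly into code. The sketch below assumes the effects are supplied as ratios (relative risk, odds ratio, or hazard ratio) on their natural scale; the function name and the rounding used to guard against floating point noise at the 0.2 and 0.4 boundaries are assumptions.

```python
def effect_gradient(effects):
    """Classify the spread of treatment effects across components.

    `effects` holds one ratio per component end point with reported data.
    """
    if len(effects) < 2:
        raise ValueError("at least two component effects are required")

    # Round so that differences such as 0.7 - 0.5 compare cleanly with 0.2.
    spread = round(max(effects) - min(effects), 10)

    if spread > 0.4:
        return "large"
    if spread >= 0.2:
        return "moderate"
    return "small"
```

For instance, components with relative risks of 0.50 and 1.00 differ by 0.5 and would be classed as a large gradient, whereas relative risks of 0.90 and 1.00 would be classed as a small one.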
Among composite end points with moderate or large gradients in importance to patients, we explored the impact of the outcomes with less importance to patients on both the total event rate for the composite end point in the control group and the magnitude of effect of the intervention. Our approach was to quantify, in sequence, the impact of adding component end points from importance to patients category III (major end points) and categories IV and V (moderate and minor end points) to end points allocated to categories I and II (fatal and critical end points). This was only possible for those studies that supplied data for each component of the composite in categories I, II, and III (fatal, critical, and major). For each study, we first calculated the event rate in the control group and the relative risk reduction on the basis of a composite of all the end points in categories I and II included in the original composite. We then repeated these calculations for another composite including all end points in categories I, II, and III. When adding the moderate and minor components (categories IV and V), we used the data for the original composite end point reported in the paper to calculate the control event rate and the relative risk reduction. Thus, we did not need component data for end points in categories IV and V.
Calculation of the exact impact would require joint distributions for all the components; because authors did not provide this level of detail, we made estimations by using a conservative approach to assess the impact of the outcomes of moderate and minor importance to patients. To establish the effect on the event rate for the control group, we estimated the impact of fatal and critical end points under the assumption that no patient had both a critical and a fatal event or more than one critical event. For instance, if the rate of death for the control group was 1% and that of large stroke was 2%, we calculated an event rate for the end points within importance to patients categories I and II of 3%. We then estimated the effect of adding the events associated with end points grouped in category III of importance to patients, again assuming mutually exclusive events, to the more serious events. Thus, if the rate of non-fatal myocardial infarctions was 2%, the event rate for the control group would increase from 3% to 5%. We considered the end points grouped in categories IV and V of importance to patients to account for the total composite end point event rate left unaccounted for. Thus, if the composite end point event rate for the control group was 10%, the effect of adding the less important outcomes (categories IV and V) would increase the control event rate from 5% to 10%.
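The worked example above can be reproduced in a few lines. This sketch assumes, as the text does, that events in categories I to III are mutually exclusive; the function name and dictionary keys are hypothetical.

```python
def stepwise_control_event_rates(fatal_critical, major, composite_rate):
    """Build up the control group event rate as components are added.

    `fatal_critical` and `major` are lists of component event rates (in %)
    for categories I-II and III; `composite_rate` is the reported rate (in %)
    for the whole composite end point.
    """
    # Categories I and II: sum the rates, assuming no patient has two events.
    rate_i_ii = sum(fatal_critical)
    # Add category III, again assuming mutually exclusive events.
    rate_i_iii = rate_i_ii + sum(major)
    # Categories IV and V account for whatever the composite rate leaves over.
    return {
        "I-II": rate_i_ii,
        "I-III": rate_i_iii,
        "I-V": composite_rate,
        "attributed_to_IV_V": composite_rate - rate_i_iii,
    }


# Death 1%, large stroke 2%, non-fatal myocardial infarction 2%, composite 10%.
rates = stepwise_control_event_rates([1.0, 2.0], [2.0], 10.0)
```

With these inputs the control event rate moves from 3% to 5% to 10%, with 5 percentage points attributed to the moderate and minor components, matching the figures in the text.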
A similar approach allowed assessment of the impact of outcomes grouped according to importance to patients on the effect of the intervention. We calculated the median and associated interquartile range for both the control group event rate and the effect of the intervention. We used a test of proportions (χ² test) to explore associations between declared source of funding (industry versus non-industry funded) and gradients in either importance to patients or the effect of treatment on components within composite end points. We used SAS version 9.1 and S-PLUS version 6.2 (Insightful Corporation, Seattle, Washington) for analyses; we chose a 5% threshold for statistical significance for all analyses.
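A test of proportions for a 2×2 table (for example, funding source by presence of a large or moderate gradient) can be sketched as follows. The paper does not report the underlying cross-tabulation or whether a continuity correction was applied, so the counts below are invented and the uncorrected Pearson statistic is an assumption; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    The table is laid out as:
                      gradient present   gradient absent
        industry             a                  b
        non-industry         c                  d
    Returns the chi-square statistic and its two sided P value.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")

    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts, for illustration only.
chi2, p_value = chi_square_2x2(30, 20, 8, 7)
significant = p_value < 0.05  # the 5% threshold stated in the text
```

With small expected cell counts an exact test would usually be preferred over this approximation.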
Results
Results of literature search
Our literature search generated 650 abstracts, from which we identified 242 potentially eligible randomised controlled trials, of which 114 proved eligible on consensus review of the full text publications (fig 1). Chance corrected agreement on eligibility was excellent (κ=0.90). Thus, approximately half of all parallel group cardiovascular randomised controlled trials identified reported an eligible composite end point.
Study characteristics
Table 1 describes the characteristics of the included studies. Most trials appeared in Circulation, the Lancet, and JAMA; focused on treatment of coronary disease primarily through pharmacological intervention; and reported only one primary composite end point. Almost all (98%) composite end points included fatal end points, usually reported as “all cause mortality” (table 2). The median sample size of eligible randomised controlled trials was 840 (interquartile range 238-2334), and the median follow-up time was 1 year (90 days-3.5 years). When a composite end point included only two components, reporting of individual event rates and the composite end point rate made the joint distribution apparent; no eligible trials with three or more components reported the joint distribution of component outcomes.
Most trials (69%; n=79) declared either direct financial industry funding (n=74) or industry having supplied the drug or device under investigation (n=5). Authors of 15 (13%) trials declared not for profit funding alone, and 20 (18%) did not declare a funding source. Of the 74 trials that declared industry funding, 27 trials also reported funding by not for profit sources.
Gradient in importance to patients and effect of intervention across components
Most composite end points (56%; n=64) showed either a large (10%; n=11) or moderate (47%; n=53) gradient in importance to patients. Among the 84 composite end points that reported data for at least two of their component end points, the gradient in the effect of the intervention across component end points was usually large (57%; n=48) or moderate (18%; n=15). Of these 84 randomised controlled trials, 45 (54%) included a composite end point with components that exhibited large or moderate gradients in both importance to patients and effect of intervention across components. Many remaining composite end points (32%; n=27) included a large or moderate gradient in either importance to patients (11%; n=9) or treatment effect across components (21%; n=18). Only 14% (n=12) of composite end points reviewed were composed of end points that exhibited either no gradient or a minor gradient in both importance to patients and the effect of treatment across components. Declared industry funding versus non-industry funding was not significantly associated with either the gradient in importance to patients or in the gradient of effect of treatment across components among composite end points.
Effect of end points of moderate and minor importance to patients
When analysed by categories of importance to patients, the most important components were associated with lower control group event rates, with medians of 3.3% (interquartile range 1.4-6.9%) for fatal outcomes, 3.3% (2.2-5.2%) for critical end points, and 3.7% (1.6-8.5%) for major outcomes. End points of moderate and minor importance to patients had higher event rates: median 12.3% (2.9-26.7%) for moderate and 8.0% (4.5-26.8%) for minor. Similarly, we found that pooled effects for fatal and critical outcomes were small, and end points of lesser importance to patients were associated with larger effects (fig 2).
Of 64 composite end points with moderate or large gradients in importance to patients, 46 had sufficient data to quantify the impact of the less important end points on both the event rates for the composite end point in the control group and the effect of the intervention on the composite end point. The median event rate for the control group when we considered only the most important patient outcomes (fatal and critical end points) was 2.5%. The event rate rose to 8.7% when we added end points of major importance to the composite and to 21.7% when we added end points of moderate and minor importance (table 3). The magnitude of the treatment effect also rose substantially as less important components were included (table 3). Of the 46 composite end points with a large or moderate gradient in importance to patients that provided data on component end points, 59% (n=27) were statistically significant (P<0.05). However, only seven achieved statistical significance when we considered only components of greater importance to patients (fatal, critical, and major end points), whereas most (20/27) achieved statistical significance only when we added end points of moderate or minor importance to the composite.
Our results suggest that a naïve interpretation of composite end points may lead clinicians to overestimate the impact of treatments on preventing adverse events that matter most to patients. Consider, for example, the following statement: “In patients with in-stent stenosis of coronary artery bypass grafts, γ radiation reduced the composite end point of death from cardiac causes, Q wave myocardial infarction, and revascularisation of the target vessel.” This result sounds impressive because it suggests that γ radiation reduces the incidence of death and myocardial infarction, as well as the need for revascularisation. The trial that produced this result randomised 120 patients with in-stent stenosis of a saphenous vein graft to γ radiation (iridium-192) or placebo.8 Death or myocardial infarction contributed only 6/43 events in the placebo arm and 5/22 events in the iridium-192 arm. The investigators have shown the impact of the intervention on revascularisation. The trial provides, however, essentially no information about the effect of the intervention on myocardial infarction or death.
Consider another example: the irbesartan diabetic nephropathy trial, which randomised 1715 hypertensive patients with nephropathy and type 2 diabetes to irbesartan, amlodipine, or placebo.9 Results showed a benefit of irbesartan over amlodipine in the primary end point, a composite of a doubling of the baseline serum creatinine concentration, the onset of end stage renal disease (serum creatinine >6.0 mg/dl, initiation of dialysis, or transplantation), or death from any cause. Doubling of serum creatinine provided most events and was the only outcome for which irbesartan was convincingly beneficial. Indeed, in this instance, irbesartan both lowered the incidence of doubling of creatinine and showed a trend towards reduction in end stage renal disease, but it showed a slight trend towards increased all cause mortality (fig 3). These examples highlight the challenges that clinicians face when making decisions on the basis of the results of cardiovascular trials that report composite end points.
Discussion
Findings
Our analysis of a sample of 114 randomised controlled trials on cardiovascular interventions found that the use of composite end points is common. Reporting is not optimal: authors failed to provide the effect of treatment for all the components in almost one third of the articles. Most randomised controlled trials showed a large or moderate gradient in importance of end points to patients, and in 54% of the 84 trials in which data were available the component end points exhibited substantial gradients in both importance to patients and the effect of treatment across components. Less important components showed higher event rates and larger treatment effects.
Limitations and strengths
Our review has potential limitations. Our conclusions depend on clinicians assigning importance to patients to cardiovascular end points, a challenging process. Previous analyses have considered component end points in one of two categories, fatal or non-fatal1; patients are, however, unlikely to attach similar importance to all non-fatal end points. Several authors have suggested weighting end points to reflect their relative importance when constructing composite end points,10 11 12 13 14 and Lubsen and Kirwan have outlined a theoretical classification scheme for ranking components of composite end points.15 Trialists, however, rarely use these strategies. Published data examining the utilities that patients attribute to a variety of cardiovascular outcomes guided our classification,6 and 11 clinicians knowledgeable in cardiovascular care worked independently in generating the classification and were able to achieve consensus. Our decisions on categorisation are available (table 2 provides examples), and readers can independently judge their credibility.
Our analysis of the effect of treatment across components was limited to trials that reported data on component end points in categories I (fatal) to III (major) of importance to patients. This may have led to a biased sample. Our analytic approach was, however, conservative in that when three or more components were present and the joint distribution of results was unavailable we assumed distributions that attributed the maximum number of events to the end points of greater importance to patients.
One might question our application of meta-analytic approaches to data across a wide variety of interventions (fig 2). The variability in results, represented by the I² statistic, proved to be 0% for fatal end points and was below a commonly used threshold of 50% for other end points.7 The increasing treatment effect with decreasing importance seems to be a real phenomenon.
Our work has additional strengths. Our sample of 114 composite end points is the result of a systematic search completed in duplicate with excellent agreement. Our data collection was comprehensive and careful, including independent judgment and abstraction of data at all stages by reviewers trained in the methodology and use of targeted, relevant analyses. We excluded composite end points containing efficacy and safety outcomes because their inclusion would have overestimated the proportion of composite end points with large gradients in the effect of the treatment. We included only one composite end point for each randomised controlled trial to avoid clustering, as multiple reported composite end points within a single trial commonly share components.
Implications
Use of composite end points appeals to clinical trialists because it increases event rates and statistical power. The fundamental problem with composite end points is, however, the difficulty in interpreting results when the gradient of importance to patients is substantial and a substantial gradient in the magnitude of the treatment effect also exists. Conversely, confident interpretation of composite end point results requires relatively small gradients of importance to patients and similar relative risk reductions across components.3 Our findings suggest that most composite end points used in cardiovascular randomised controlled trials have substantial gradients in both importance to patients and treatment effects across component end points. Furthermore, less important outcomes provide larger contributions to the composite end point event rate and show larger treatment effects. In particular, mortality outcomes, present in almost all cardiovascular composite end points, provide the lowest event rate and show the smallest treatment effects. Thus, an important and plausible risk of misleading conclusions associated with the use of composite end points is to attribute reductions in mortality to interventions that do not, in fact, reduce death rates.
The common use of inadequately reported composite end points with large gradients in importance to patients, in which end points of least importance contribute most events, and in which treatment fundamentally affects these same components, is problematic. Trialists should report complete data on individual component end points to facilitate appropriate interpretation; clinicians should view with caution the results of cardiovascular trials that use composite end points to report their results. Clinicians and patients are best served when trialists restrict their use of composite end points to end points of similar importance to patients and contexts in which they anticipate that more important end points will contribute a large proportion of study events. If they do not, they risk misleading their audience.
What is already known on this topic
Clinical trialists use composite end points, outcomes that capture the number of patients who have one or more of several events, to increase event rates and statistical power
When the gradient of importance to patients is large, and the more important events are uncommon and show negligible treatment effects, use of composite end points can be misleading
What this study adds
Almost half of a sample of recent prominently published cardiovascular trials used composite end points, which were often inadequately reported and showed large gradients in importance to patients
End points of least importance to patients typically contributed most events
Composite end points, as currently used in cardiovascular trials, may often be misleading
Footnotes
We thank Lisa Buckingham for assistance in creating the database used to capture the information extracted from eligible trials.
Contributors: IF-G, JWB, GP-M, VMM, and GHG were involved in the study design and concept. JWB, EAA, IF-G, DMB, PA-C, AW, and SU collected all data. DH-A, IF-G, JWB, and GHG did the analysis. All authors offered critical revisions to the manuscript, and all approved the final version. GHG is the guarantor.
Funding: Data collection was partially funded by Carlos III Spanish Institute of Health Research (FIS; CIBER: Network of Medical Research Centres). IF-G and PA-C are partially funded by Carlos III Spanish Institute of Health Research fellowship award (FIS). JWB is funded by a Canadian Institutes of Health Research fellowship award. VMM is a Mayo Foundation scholar. These sources did not play any other role and there were no other funding sources for this work.
Competing interests: None declared.
Ethics approval: Not needed.