- Ioanna Tzoulaki, lecturer¹,
- Konstantinos C M Siontis, research associate¹,
- John P A Ioannidis, professor²
- ¹Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
- ²Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Stanford 94305, USA
- Correspondence to: J P A Ioannidis
- Accepted 22 September 2011
Objective To compare the reported effect sizes of cardiovascular biomarkers in datasets from observational studies with those in datasets from randomised controlled trials.
Design Review of meta-analyses.
Study selection Meta-analyses of emerging cardiovascular biomarkers (not part of the Framingham risk score) that included datasets from at least one observational study and at least one randomised controlled trial were identified through Medline (last update, January 2011).
Data extraction Study-specific risk ratios were extracted from all identified meta-analyses and synthesised with random effects for (a) all studies, and (b) separately for observational and for randomised controlled trial populations for comparison.
Results 31 eligible meta-analyses were identified. For seven major biomarkers (C reactive protein, non-HDL cholesterol, lipoprotein(a), post-load glucose, fibrinogen, B-type natriuretic peptide, and troponins), the prognostic effect was significantly stronger in datasets from observational studies than in datasets from randomised controlled trials. For five of the biomarkers the effect was less than half as strong in the randomised controlled trial datasets. Across all 31 meta-analyses, on average datasets from observational studies suggested larger prognostic effects than those from randomised controlled trials; from a random effects meta-analysis, the estimated average difference in the effect size was 24% (95% CI 7% to 40%) of the overall biomarker effect.
Conclusions Cardiovascular biomarkers often have less promising results in the evidence derived from randomised controlled trials than from observational studies.
A plethora of novel biomarkers are being examined as predictors of cardiovascular outcomes.1 2 However, concerns have been raised about the reported effect sizes and about their claims of improved prediction, as several biases may inflate the observed associations.3 4 The population samples used to test these prognostic associations come primarily from traditional observational epidemiological studies, including cohort and case-control studies. However, some evidence may also be available from datasets of participants in randomised controlled trials. Randomised controlled trials would still be analysed as observational datasets in this scenario, but would they show the same results as analyses from traditional epidemiological studies?
Epidemiological studies may differ from randomised controlled trials in several ways, including the risk of confounding, the extent of susceptibility to publication and selective reporting biases,5 and the characteristics of enrolled participants. Because of such differences, observational studies often give larger estimates for the size of treatment benefits of interventions.6 7 8 9 However, it is unknown whether differences exist between the two types of study design when biomarker effects, rather than treatment effects, are assessed (and, if so, in which direction). This is important, since the eventual clinical use of these biomarkers depends on whether the prognostic effects are large enough to make some meaningful impact on risk classification and decision making for treatment or preventive interventions.
We aimed here to assess a comprehensive sample of published meta-analyses of biomarkers in relation to cardiovascular outcomes and compare the reported effect sizes in datasets from observational studies and from randomised controlled trials.
Literature search and eligibility of articles
We assembled meta-analyses that examined any emerging biomarker, defined as any biological parameter10 other than those included in the Framingham risk score,11 in relation to cardiovascular disease, coronary heart disease, or cardiovascular mortality. Biomarkers were eligible regardless of whether they were derived from blood, urine, tissue, or imaging. We excluded meta-analyses of single common genetic variants, since these markers are known to have very limited prognostic ability when examined in isolation,12 but multi-gene scores were eligible.
We used three different approaches to collect a comprehensive sample of biomarker meta-analyses indexed in Medline (no year restriction, last update 15 January 2011). First, we used the algorithm “(Biological marker[MeSH terms]) AND (cardiovascular OR coronary [Title/Abstract])” limited to meta-analyses, English language, and human studies. Second, we performed targeted Medline searches for meta-analyses of 71 additional specific emerging biomarkers included in recent comprehensive reviews10 13 14 with the same algorithm but using each biomarker’s name instead of the generic “Biological marker” that may not index some otherwise eligible biomarkers (see web extra table A on bmj.com for full list of biomarkers searched). We first perused the title and abstract of each of these citations, and potentially eligible articles were then retrieved for perusal in full text. Finally, we identified all meta-analyses of individual participant data published by major consortia operating in the specialty (Emerging Risk Factors Collaboration, Fibrinogen Studies Collaboration, Ankle Brachial Index Collaboration).
Articles were eligible if they included at least one meta-analysis examining the association of an eligible biomarker with an eligible outcome and containing data from at least one dataset from an observational study and one from a randomised controlled trial, which was subsequently analysed as an observational study. We included studies regardless of the baseline characteristics (clinical setting) of the examined populations. If an article presented separate meta-analyses on more than one eligible biomarker or outcome or on participants in different clinical settings, these meta-analyses were kept separate. Finally, meta-analyses were eligible regardless of whether the included studies used adjustment for some covariates or score (such as the Framingham risk score) or tested for association in unadjusted analyses. When we identified more than one meta-analysis examining the same biomarker and same outcome in the same clinical setting, we kept only the most recent one with eligible data. We accepted meta-analyses regardless of whether they were meta-analyses of the published literature or of individual participant data.
Data extraction was performed independently by two investigators, and discrepancies were solved by discussion. From each eligible meta-analysis, we recorded the first author, journal, year of publication, number of studies in the meta-analysis, the biomarker examined, risk factors or score used for covariate adjustment, and outcome examined. We extracted the study-specific estimates of relative risk (risk ratio, odds ratio, hazard ratio, or incident risk ratio) for each biomarker and outcome. If this information was not given in sufficient numerical detail in the meta-analysis, we sought to complement it by extracting information from the meta-analysis forest plot using software for image digitisation (Engauge Digitizer 4.1). Whenever data were provided with different adjustments in each study, we preferred estimates that adjusted for the Framingham risk score or the model with Framingham risk score variables; if none of these options was available, we preferred the model with the larger number of adjusting factors.
We also recorded for each study that was included in the meta-analysis whether the data came from a randomised controlled trial or observational study; if the latter, we also specified whether it was a prospective or retrospective study. We verified the original study design by checking the original publication of each study cited in each meta-analysis. We also extracted information on whether both treatment arms of each randomised controlled trial population were included in the meta-analysis. For randomised controlled trials that included all treatment arms, we further extracted information on whether the trial had previously reported a significant difference (P<0.05) between treatment arms for cardiovascular disease outcomes. We also categorised biomarkers according to whether they were recommended (class I or II recommendation) by leading guideline authorities15 16 17 18 19 20 21 22 23 24 for use in the clinical setting where the meta-analysis pertained.
To perform sensitivity analyses, for all eligible meta-analyses of individual participant data, we conducted Medline searches to identify the most recent meta-analysis of published literature for the same biomarker, outcome, and setting. We then extracted the relevant data on relative risk estimates for each newly identified meta-analysis as outlined above.
For each meta-analysis, we synthesised the reported relative risks of each dataset with random effects for all datasets and separately for datasets from observational studies and those from randomised controlled trials. Random effects calculations were implemented with the inverse variance approach, considering the sum of the within-study plus between-study variance.25 All summary relative risks were expressed so as to be ≥1.00 (if the summary relative risk in the original meta-analysis was <1.00, the relative risks of all its datasets were inverted; for example, 0.50 became 2.00). Prediction intervals were calculated for the summary relative risks for datasets from observational studies and from randomised controlled trials.26 We also calculated the relative relative risk within each meta-analysis as the ratio of the summary relative risk in datasets from observational studies to the summary relative risk in datasets from randomised controlled trials. A relative relative risk >1 means that datasets from observational studies show a stronger effect for the biomarker than datasets from randomised controlled trials.
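The pooling and comparison steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Stata code: it assumes the DerSimonian-Laird moment estimator for the between-study variance (the text specifies only an inverse variance random effects approach), it assumes the two subgroups are independent when combining their standard errors, and all function names and input numbers are hypothetical.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def log_rr_and_se(rr, lower, upper):
    """Convert a relative risk and its 95% CI to (log RR, standard error)."""
    return math.log(rr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def invert(estimates):
    """Flip the direction of effect: RR 0.50 becomes 2.00 (log RR changes sign)."""
    return [(-y, se) for y, se in estimates]


def random_effects(estimates):
    """Pool (log effect, standard error) pairs with inverse variance weights.

    Assumes the DerSimonian-Laird estimator of the between-study variance.
    Returns (pooled log effect, its standard error, tau^2).
    """
    y = [e for e, _ in estimates]
    w = [1.0 / se ** 2 for _, se in estimates]
    k = len(y)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Each weight uses within-study plus between-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for _, se in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def relative_relative_risk(observational, trials):
    """Ratio of the pooled RR in observational datasets to that in trial datasets."""
    log_obs, se_obs, _ = random_effects(observational)
    log_rct, se_rct, _ = random_effects(trials)
    log_rrr = log_obs - log_rct
    se_rrr = math.sqrt(se_obs ** 2 + se_rct ** 2)  # assumes independent subgroups
    return math.exp(log_rrr), log_rrr, se_rrr


# Hypothetical inputs, not taken from any meta-analysis in this review.
obs = [log_rr_and_se(1.60, 1.30, 1.97), log_rr_and_se(1.45, 1.10, 1.91)]
rct = [log_rr_and_se(1.15, 0.90, 1.47)]
rrr, log_rrr, se_rrr = relative_relative_risk(obs, rct)
print(f"RRR = {rrr:.2f}")
```

A relative relative risk above 1 in this sketch corresponds to a stronger pooled effect in the observational datasets, matching the convention used in the text.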
The relative relative risk is difficult to compare across different meta-analyses, because each biomarker risk may have been expressed for a different contrast (such as comparison of extreme tertiles or quintiles, or per one standard deviation of a continuous measurement, etc) and with a different relative risk metric. To standardise the relative relative risks across meta-analyses, we also calculated the ratio of the log of the relative relative risk (logRRR) to the summary log of the relative risk (logRR) of each meta-analysis. As an example, the logRR and logRRR would be 10 times higher if the results were expressed as per 100 mg/dl of biomarker instead of per 10 mg/dl, whereas their ratio (called the design difference) remains the same. The design difference represents the difference between datasets from observational studies and from randomised controlled trials as a proportion of the summary effect of each meta-analysis. Thus, if the logRRR is 0.2 and the summary logRR is 0.5, the design difference is 0.2/0.5=40%; that is, the difference in the effect between datasets from observational studies and from randomised controlled trials has 40% of the size of the overall effect of the biomarker that is estimated when all studies are considered. We combined design differences across meta-analyses to obtain a summary design difference and its variance according to random and fixed effects calculations and calculated the 95% prediction interval for the summary design difference.25 26 27
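The worked example in the preceding paragraph, and the claim that the design difference does not depend on the unit in which the biomarker is expressed, can be checked with a short Python sketch. The function name is hypothetical, and the variance of the design difference (which the authors derive from their cited methods) is not reproduced here.

```python
import math


def design_difference(log_rrr, summary_log_rr):
    """Observational-versus-trial difference as a proportion of the overall effect."""
    return log_rrr / summary_log_rr


# Worked example from the text: logRRR = 0.2, summary logRR = 0.5.
dd = design_difference(0.2, 0.5)

# Expressing the biomarker per 100 mg/dl instead of per 10 mg/dl multiplies
# both log quantities by 10, so their ratio is unchanged.
dd_rescaled = design_difference(10 * 0.2, 10 * 0.5)

print(f"design difference = {dd:.0%}, after rescaling = {dd_rescaled:.0%}")
assert math.isclose(dd, dd_rescaled)
```

Because the ratio cancels the measurement scale, design differences from meta-analyses that report per-unit, per-standard-deviation, or tertile contrasts can be placed on one axis and pooled.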
We performed analyses calculating the design difference according to type of observational design (prospective v retrospective); type of meta-analysis (published literature v individual participant data); type of randomised controlled trial (those with significant difference between treatment arms and the biomarker analysed in the combined population of both treatment arms v those with no significant treatment effect or included only placebo arms in the biomarker analysis); statistical significance of the biomarker (those with P<0.05 for the summary relative risk including all datasets v those not statistically significant); and whether the biomarker was recommended for clinical use.
In sensitivity analyses, we also calculated the design difference in (a) meta-analyses of individual participant data of biomarkers for which we could identify a published meta-analysis of published literature for the same clinical setting and outcome, (b) meta-analyses of individual participant data limited only to studies that were also included in the respective matched meta-analyses of published literature, and (c) the matched meta-analyses of published literature.
Heterogeneity was quantified with Cochran’s Q test and the I² metric,25 28 and 95% confidence intervals for I² were calculated.29 I² has a scale of 0–100%, and values >75% suggest very large between-study heterogeneity. Analyses were performed in STATA 10.1 (STATA Corp., College Station, TX). P values are two tailed.
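For readers unfamiliar with these heterogeneity statistics, the following Python sketch shows how Cochran's Q and I² are conventionally computed from study-level log effects and standard errors. It uses the standard definition I² = max(0, (Q − df)/Q); the function names and the two example studies are hypothetical, and the confidence interval for I² is not covered.

```python
import math


def cochran_q(estimates):
    """Cochran's Q for a list of (log effect, standard error) pairs.

    Returns (Q, degrees of freedom).
    """
    w = [1.0 / se ** 2 for _, se in estimates]
    fixed = sum(wi * y for wi, (y, _) in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, estimates))
    return q, len(estimates) - 1


def i_squared(q, df):
    """Percentage of variability attributed to between-study heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Two hypothetical studies with equal precision but different log effects.
q, df = cochran_q([(0.0, 0.1), (0.4, 0.1)])
print(f"Q = {q:.1f} on {df} df, I2 = {i_squared(q, df):.1f}%")
```

I² is truncated at zero when Q falls below its degrees of freedom, which is why homogeneous sets of studies report I² = 0% rather than a negative value.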
Overall, we searched 295 articles (see web extra fig A on bmj.com), and 21 articles corresponding to 31 meta-analyses were deemed eligible.30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 This is recent literature, and 11 articles appeared in 2009–10 alone. The eligible meta-analyses pertained to a wide range of biomarkers used in different clinical settings, with a predominance of general population (primary prevention) settings (table 1). All meta-analyses examined coronary heart disease as the outcome of interest, except for five, which examined cardiovascular disease and cardiovascular mortality.34 37 46 47 48 Sixteen of the 31 meta-analyses (52%) corresponded to situations where guidelines already recommended the clinical use of the respective biomarker in that particular setting. Seven meta-analyses were of individual participant data, and 24 analysed published literature. Each meta-analysis included a median (interquartile range) of 11 (6–20) datasets from observational studies and 2 (1–3) from randomised controlled trials. The total number of events (number of participants with the outcome of interest) ranged from 4 to 26 459 (median 3867 (interquartile range 1415–7316), data not available for two meta-analyses). The total number of events was larger in datasets from observational studies than in those from randomised controlled trials in 21 (72%) of the 29 meta-analyses with information on number of events. In 48% (14/29) of the meta-analyses the average number of events per dataset was larger for observational studies than for randomised controlled trials.
By random effects calculations, 27 of the 31 meta-analyses showed significant (P<0.05) associations of the examined biomarker with the outcome of interest. The summary relative risks were adjusted for different variables in each meta-analysis, ranging from age and sex to multivariable models with a range of cardiovascular disease risk factors. None reported adjustment for the Framingham risk score. Estimates of between-study heterogeneity were high for many meta-analyses, and often it remained high even within the subgroups of datasets from observational studies and those from randomised controlled trials (table 2 and web extra tables B and C).
Comparison of effects in datasets from observational studies versus those from randomised controlled trials
The summary estimates in datasets from observational studies and from randomised controlled trials were in the same direction for all comparisons apart from fasting insulin (table 2). Related prediction intervals for the summary relative risks are shown in web extra table D. For 15 meta-analyses the summary relative risk was nominally significant in both types of datasets, for eight it was nominally significant only in datasets from observational studies, and in three it was nominally significant only in datasets from randomised controlled trials.
In 19 of the 31 meta-analyses the relative relative risk estimate was >1 (suggesting a stronger effect in datasets from observational studies than in those from randomised controlled trials, design difference >0). For seven meta-analyses, the effect was significantly stronger in datasets from observational studies than those from randomised controlled trials: this included meta-analyses which examined C reactive protein, non-HDL cholesterol, lipoprotein(a), post-load glucose, fibrinogen, B-type natriuretic peptide, and troponins. For five of those biomarkers, the effect was less than half as strong in the randomised controlled trial datasets; these differences would probably be considered not only statistically significant but also clinically important. There was no meta-analysis with a significantly stronger effect in datasets from randomised controlled trials than in datasets from observational studies.
The random effects summary design difference across all 31 meta-analyses was 24% (95% confidence interval 7% to 40%, P=0.005), showing that the difference in the effect size between datasets from observational studies versus those from randomised controlled trials amounted to 24% of the overall effect of the biomarker (figure). The associated 95% prediction interval of the design difference was −29% to 76%. This means that, typically, observational studies are expected to show larger or even much larger effects than randomised controlled trials, but exceptions can exist where larger effects are seen in the randomised controlled trials. There was modest heterogeneity between meta-analyses in the design difference (estimated I²=39%), and fixed effects estimates were slightly larger (33% (23% to 43%)) (table 3). A funnel plot of the design differences and their standard errors is shown in web extra figure B and reveals a non-significant trend (Egger’s P value=0.09) for larger design differences when there is more evidence.
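The difference between the confidence interval and the much wider prediction interval reported above can be illustrated with a short Python sketch. It assumes the commonly used formula in which the prediction interval is the summary estimate ± t(k−2) × √(τ² + SE²); the text does not spell out the formula, so this is an assumption. The standard error is approximated from the reported confidence interval, the between-meta-analysis standard deviation `tau` is a placeholder because it is not reported in the text, and the t critical value for 29 degrees of freedom (31 meta-analyses minus 2) is supplied by hand to avoid external dependencies.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)


def prediction_interval(mean, se, tau, t_crit):
    """Interval expected to contain the true effect in a new, similar setting."""
    half_width = t_crit * math.sqrt(tau ** 2 + se ** 2)
    return mean - half_width, mean + half_width


summary = 0.24                 # reported summary design difference
se = se_from_ci(0.07, 0.40)    # approximated from the reported 95% CI
tau = 0.20                     # hypothetical placeholder, not reported in the text
t_crit_29df = 2.045            # two-sided 95% t critical value, 29 df

low, high = prediction_interval(summary, se, tau, t_crit_29df)
print(f"95% prediction interval: {low:.0%} to {high:.0%}")
```

The confidence interval describes uncertainty in the average design difference, whereas the prediction interval also carries the between-meta-analysis variance, which is why it can include zero even when the confidence interval does not.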
Of the five trials that contributed biomarker data to more than five different meta-analyses, the three trials that targeted high risk populations (with high levels of low density lipoprotein cholesterol in the West of Scotland Coronary Prevention Study, high cardiovascular disease risk in the Multiple Risk Factor Intervention Trial, and postmenopausal women in the Women’s Health Study) quite consistently found smaller effect size estimates than the meta-analysis summary (5/7 times, 7/8 times, and 7/9 times respectively), whereas this trend was not seen in the two trials that enrolled healthy populations (5/9 times in the Physicians Health Study, 4/7 in the Air Force/Texas Coronary Atherosclerosis Prevention Study).
Subgroup and sensitivity analyses
Based on random effects calculations, the design difference did not differ beyond chance when analyses were performed according to type of observational study, type of meta-analysis, type of randomised controlled trial, statistical significance of the biomarker, and whether the biomarker was recommended for clinical practice (table 3). There were trends for stronger design differences in meta-analyses of individual participant data versus those of published literature and in non-significant versus significant biomarkers; the contrast between meta-analyses of individual participant data and those of published literature was nominally statistically significant with fixed effects calculations (table 3).
In sensitivity analysis, we compared the overall design difference of four biomarkers (C reactive protein, Lp(a) lipoprotein, lipoprotein associated phospholipase A2 mass, and fibrinogen) examined in meta-analyses of individual participant data of our sample and the corresponding design difference of these four biomarkers in data from meta-analyses of published literature identified through additional literature searches. The random effects summary design difference was similar in the meta-analyses of individual participant data (53% (21% to 84%)), in data from the corresponding meta-analyses of published literature (69% (33% to 105%)), and in the meta-analyses of individual participant data limited only to studies included both in meta-analyses of individual participant data and in those of published literature (55% (22% to 89%)) (see web extra tables E and F for tabulated data).
In this empirical evaluation of 31 meta-analyses examining the association of a wide range of cardiovascular biomarkers, effect sizes were on average stronger in datasets derived from observational studies than in datasets from randomised controlled trials. The average difference in effect size amounted to about a quarter or a third of the estimated overall effect of the biomarker based on all data. For seven biomarkers, six of which are recommended for wide clinical use by major guidelines,15 16 17 18 19 20 21 22 23 24 the prognostic effect sizes were significantly stronger in datasets from observational studies than in those from randomised controlled trials.
New cardiovascular biomarkers are continuously proposed.1 2 Several of them have received great attention in the medical literature, and multiple meta-analyses and individual data consortia thereof have reported consistent associations with cardiovascular disease, raising hopes for improved cardiovascular prediction over and above what traditional markers and scores such as the Framingham risk score achieve.2 33 51 52 53 Accurate estimates of the prognostic ability of these markers are important for their clinical translation, and deviant results with different study designs raise some concern.
There are different possible interpretations for these discrepancies between observational studies and randomised controlled trials. Firstly, publication bias and selective reporting are well documented,53 54 55 56 and this applies also to prognostic analyses.55 57 Such biases may inflate the size of biomarker associations. Another evaluation of meta-analyses of biomarkers has shown that the largest studies almost always show smaller effect size estimates than the most highly cited, smaller studies of biomarkers.58 Epidemiological studies and their analyses may suffer differently from publication and other selective reporting biases than analyses of randomised controlled trial data.5 Large randomised controlled trials (those that are often also used for the assessment of biomarkers with major outcomes) are highly visible and unlikely to remain unpublished. Moreover, the analysis of biomarker associations is not a primary goal of these trials, and there may be less zeal in obtaining and reporting specific results for these markers in analyses of randomised controlled trials than in observational studies. On the other hand, in some observational studies, in contrast to randomised controlled trials, the biomarker analysis may be the primary aim; that is, the study may have been designed for that aim and therefore may suffer less from publication bias.
The estimated design difference tended to be higher, if anything, in meta-analyses of individual participant data than in meta-analyses of published literature. Groups that have undertaken meta-analyses of individual participant data, such as the Emerging Risk Factors Collaboration (ERFC), have performed a massive task bringing together multiple studies by harmonising and standardising their data, and problems of selective reporting might be partially amended in these analyses.59 If so, the gap between effect sizes in data from observational versus randomised controlled trial designs might be expected to close in meta-analyses of individual participant data. However, we found that the observed design difference was of similar magnitude in meta-analyses of individual participant data and in the corresponding meta-analyses of published literature for the same biomarker, outcome, and clinical setting.
One possible explanation is that in these meta-analyses of individual participant data each biomarker is studied in a selected subset of all potentially eligible datasets. For example, the meta-analyses by ERFC included data from 12–68 studies for each biomarker even though there are currently over 100 studies included in this collaboration.59 60 61 In fact, biomarkers examined in large consortia are among the most studied and “hottest” biomarkers in the medical literature—such as C reactive protein, fibrinogen, and lipid related markers. Selection forces might be affecting some of those particular biomarkers to a greater extent and may not be fully remedied even with meticulous standardisation of data contributed to a retrospective consortium such as ERFC.62
An alternative explanation may be that some randomised controlled trials may have more restrictive inclusion criteria than observational studies. Some observational studies enrol participants from the general population without any restrictions. This may lead to a more limited range of risk profiles across randomised controlled trial participants and less prominent prognostic effects for biomarkers. However, the typical randomised controlled trials included here are studies that also target either the general population or all patients without documented cardiovascular disease.
One may also speculate whether the design difference might be a result of higher measurement error of biomarkers or outcomes in randomised controlled trials than in observational studies. However, most of the biomarkers examined here, including almost all those that showed significant differences between the two study designs, are routinely measured with standardised, highly reproducible assays. As for outcome misclassification, this is also unlikely to be greater in randomised controlled trials than in observational studies, since randomised controlled trials (especially the generally high profile trials analysed here) are usually sufficiently meticulous for the ascertainment of such major outcomes.
Potential limitations of study
Some limitations should be acknowledged. Although we examined a large number of emerging biomarkers through comprehensive and complementary search strategies, there are still biomarkers that do not have available meta-analyses or have not been examined in any randomised controlled trial populations. It is unclear whether our results can be extended to any tested biomarker, including those that have only one or a few studies reported on them. However, the set of biomarkers that we examined includes many that have a strong presence in the literature, and most of them are either routinely used or in the process of entering clinical practice. Finally, we concentrated on the effect size of biomarkers and not on other important measures of predictive ability such as discrimination, calibration, and reclassification. However, these additional useful metrics are rarely reviewed in meta-analyses of biomarkers and are not reported by most studies.3
When the results of different types of studies disagree, it is useful to consider whether a biomarker would still be useful for clinical practice, given the different levels of estimated predictive performance. For example, C reactive protein and Lp(a) lipoprotein have been promoted for routine clinical practice. The European Atherosclerosis Society recently endorsed routine measurement of Lp(a) lipoprotein in individuals at moderate or high risk of cardiovascular disease and suggested that achieving Lp(a) lipoprotein concentrations <50 mg/dL should be a treatment priority.15 22 However, in the data that we examined Lp(a) lipoprotein had a markedly stronger prognostic effect size in datasets from observational studies than from randomised controlled trials; in the latter, the effect was practically null. Similarly, C reactive protein has been recommended by several guidelines for routine use,15 16 17 23 but again the prognostic effect was small in randomised controlled trials. If one considered only data from randomised controlled trials, probably neither Lp(a) lipoprotein nor C reactive protein would be considered good biomarkers, whereas data from observational studies suggest the opposite.
Given that biases are difficult to control for after publication of data, prospective solutions need to be considered. One option is to register available study populations, either from observational or from randomised controlled trial studies, for which the quality of the outcome and covariate data are acceptable and biospecimens are available, and then ensure that all these populations are assessed for each emerging biomarker of interest, not just those datasets that investigators are interested in or have reported on for that specific biomarker. Registration has been used for randomised trials to decrease biases in the evidence that determines the use of medical interventions.63 64 Biomarkers can sometimes be costly, especially if they are used for screening and primary prevention purposes in wide segments of the population. Prognostic estimates derived from all-inclusive consortia of registered studies may be more reliable for understanding the true potential of these biomarkers.
What is already known on this topic
Several cardiovascular disease biomarkers are proposed, but there is uncertainty about their effects and their ability to improve prediction
It is unknown whether the observed effect sizes differ between datasets from observational studies versus those from randomised controlled trials
What this study adds
Across all 31 meta-analyses included in this study, datasets from observational studies suggested, on average, larger prognostic effects than those from randomised controlled trials
For seven major biomarkers the prognostic effect was significantly stronger in datasets from observational studies than in datasets from randomised controlled trials
Cardiovascular biomarkers often have less promising results in the evidence derived from randomised controlled trial populations than from observational studies
Cite this as: BMJ 2011;343:d6829
Contributors: JPAI, IT, KCMS conceived the study, analysed the data, interpreted the results, and drafted the manuscript. IT and KCMS extracted the data. JPAI is the guarantor.
Funding: There was no funding for this study.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years, no other relationships or activities that could appear to have influenced the submitted work.
Ethical approval: Not required for this study.
Data sharing: Statistical code and dataset available from the corresponding author at email@example.com.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.