Methods for deriving risk difference (absolute risk reduction) from a meta-analysis
BMJ 2023; 381 doi: https://doi.org/10.1136/bmj-2022-073141 (Published 05 May 2023) Cite this as: BMJ 2023;381:e073141
- M Hassan Murad, professor1,
- Zhen Wang, professor1,
- Ye Zhu, research associate1,
- Samer Saadi, research fellow1,
- Haitao Chu, professor2 3,
- Lifeng Lin, associate professor4
- 1Evidence-based Practice Center, Mayo Clinic, Rochester, MN 55905, USA
- 2Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- 3Statistical Research and Data Science Center, New York, NY, USA
- 4Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, USA
- Correspondence to: M H Murad
- Accepted 6 March 2023
Decision making requires the trade-off of benefits and harms, which in turn requires knowledge of the absolute treatment effect on binary patient outcomes. Decision makers make judgments about how this absolute effect relates to their decisional thresholds.1 For example, a systematic review showed that compared with carotid artery stenting, carotid endarterectomy was associated with a significant reduction in the risk of stroke in patients with symptoms of average risk (risk ratio 0.77, 95% confidence interval 0.63 to 0.94). Endarterectomy, however, increased the risk of myocardial infarction (risk ratio 2.15, 95% confidence interval 1.27 to 3.61).2 Trading off these two outcomes in a clinical setting, a guideline, or decision modeling cannot be done solely based on these relative effects. Instead, we need to know the exact treatment effect in absolute terms (ie, how many strokes were prevented and how many additional myocardial infarctions were caused in 1000 patients who received endarterectomy rather than stenting in a population with a given baseline risk). This absolute effect is called absolute risk reduction or risk difference and is calculated by subtracting the average risk (incidence rate) in the control group from the average risk in the intervention group.
In addition to the importance of risk difference for the trade-off of benefits and harms, the confidence interval of risk difference is the basis for judgments about certainty in the estimates, and subsequently certainty in the decision being made. Following the GRADE approach (grading of recommendations, assessment, development and evaluation), a treatment effect can be considered precise (ie, more certain) if the confidence interval of risk difference does not cross the decisional threshold (ie, the decision would be the same regardless of which confidence interval boundary represented the truth).345 Since risk difference varies based on the population baseline risk, this precision judgment might need to be made separately for populations with different baseline risks for the outcome of interest.
Therefore, robust methods to estimate risk difference and associated confidence intervals that correspond to patient groups with certain baseline risks are needed. Estimating risk difference from a single study is simple: one only needs to calculate each group’s risk in the individual study by dividing the number of events by the sample size. The risk difference is then calculated as the difference between the risks of the two groups, and its standard error and confidence interval can also be derived using the risks and sample sizes (supplementary table 1). The most common scenario in practice, however, is that multiple studies have been published on the topic in question. Estimating risk difference from multiple studies is not straightforward, and the available methods have several limitations. In this exposition we describe four methods for estimating risk difference from a meta-analysis. The first method is meta-analysis of risk differences derived from individual studies. The second method is transformation of a relative effect to an absolute effect using an assumed baseline risk. This second method is the most commonly used and is recommended by the Cochrane Collaboration and GRADE.367 We present two other less well known approaches that can reduce some of the limitations of the two original methods. One is a modeling approach that incorporates uncertainties in the relative effect and the baseline risk. The other is a bivariate random effects model that estimates risk difference by conditioning on the baseline risk of individual studies. We provide open source coding using R software8 so that meta-analysts or guideline developers can estimate risk difference using these two methods (code is provided in the appendix and can be downloaded with data from https://github.com/linlf/RD). All four methods are applied to a case study in which carotid artery endarterectomy is compared with carotid stenting.2
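The single study calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual Wald (normal approximation) standard error; supplementary table 1 is not reproduced here, so the exact formula used there is an assumption, and the function name and the counts are hypothetical.

```python
import math

def risk_difference(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk difference (intervention minus control) with a Wald confidence interval."""
    risk_trt = events_trt / n_trt   # risk in the intervention group
    risk_ctl = events_ctl / n_ctl   # risk in the control group
    rd = risk_trt - risk_ctl
    # Assumed Wald standard error: sum of the two binomial variances.
    se = math.sqrt(risk_trt * (1 - risk_trt) / n_trt
                   + risk_ctl * (1 - risk_ctl) / n_ctl)
    return rd, se, (rd - z * se, rd + z * se)

# Hypothetical counts, not taken from the case study.
rd, se, ci = risk_difference(15, 100, 25, 100)
print(f"RD per 1000: {1000 * rd:.0f} ({1000 * ci[0]:.0f} to {1000 * ci[1]:.0f})")
```

A negative value means fewer events in the intervention group, which matches the definition of risk difference as intervention risk minus control risk.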
The absolute risk reduction or risk difference is a critical measure for decision making; however, the available methods for estimating risk difference have limitations
Pooling of risk differences generated from multiple studies in a meta-analysis is limited by the variability of risk differences across baseline risks
Transforming a pooled relative effect (such as a risk ratio or odds ratio) into risk difference is the most commonly used and recommended method; however, this approach does not incorporate uncertainty in the baseline risk, which counterintuitively makes risk difference estimates imprecise in higher risk populations and precise in lower risk populations
Modeling can incorporate relative effects from trials and baseline risks from observational studies and accounts for both uncertainties, and is the preferred approach if information about the uncertainty of baseline risk is available
The bivariate random effects model computes conditional effects based on baseline risks and is preferred if enough studies are available
Method 1: Meta-analysis of risk difference
For this simple two step method, the risk difference is estimated for each study with the accompanying standard errors, then these risk differences are pooled across studies in a meta-analysis with typical study specific weights.
Application of method
Data should be prepared in four columns that include the number of events and sample size in each study arm. The meta-analysis is then simply conducted on the risk difference scale. Numerous proprietary or open source general statistical and meta-analysis packages that can pool data and express the results as a risk difference are available.8910 The meta-analysis can be performed using a common effect (also called fixed effect) model or a random effects model.11 Specifically, the common effect model can be used if these studies are considered homogeneous (that is, they share a common true risk difference). The random effects model is appropriate if the true risk differences vary across studies from either the statistical or clinical perspective. Such variation between studies is often referred to as heterogeneity. Both models are implemented in most available meta-analysis software.
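As a rough sketch of this two step procedure, the Python code below pools study level risk differences with inverse variance weights under a common effect model and under a random effects model. It assumes Wald variances and the DerSimonian-Laird estimator of the between study variance, which is one common choice and not necessarily what any particular package uses. The four studies are hypothetical and are not the case study trials.

```python
import math

def pool_risk_differences(studies, z=1.96):
    """Pool risk differences from (events_trt, n_trt, events_ctl, n_ctl) tuples."""
    rds, variances = [], []
    for e1, n1, e0, n0 in studies:
        p1, p0 = e1 / n1, e0 / n0
        rds.append(p1 - p0)
        # Wald variance; it is zero when each arm has either no events or only
        # events, so such studies would need separate handling before pooling.
        variances.append(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

    # Common effect model: inverse variance weights.
    w = [1 / v for v in variances]
    common = sum(wi * rd for wi, rd in zip(w, rds)) / sum(w)

    # DerSimonian-Laird estimate of the between study variance (tau squared).
    q = sum(wi * (rd - common) ** 2 for wi, rd in zip(w, rds))
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random effects model: weights also include tau squared.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * rd for wi, rd in zip(w_re, rds)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"common": common, "random": pooled,
            "ci": (pooled - z * se, pooled + z * se),
            "tau2": tau2, "i2": i2}

# Hypothetical data: (events, n) in the intervention arm, then in the control arm.
hypothetical_studies = [(10, 200, 20, 200), (5, 150, 9, 150),
                        (30, 300, 45, 300), (8, 100, 20, 100)]
result = pool_risk_differences(hypothetical_studies)
print(result)
```

Multiplying the pooled estimate and its interval limits by 1000 expresses the result as events per 1000 patients.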
Advantages and disadvantages
The main limitation of using the risk difference as a meta-analysis effect measure is that risk difference, by definition, depends on baseline risks that could vary considerably across studies. This challenge was recognized in the early days of meta-analysis to the extent that the Cochrane Handbook for Systematic Reviews of Interventions discourages pooling risk difference across trials.6 For example, assume that a study shows a risk reduction from 25% to 15% (ie, a risk difference of 10%). This 10% risk reduction cannot be applied to a population with a baseline risk of 1% (which would give a negative risk). Thus, studies with very different baseline risks cannot be meta-analyzed to produce a weighted mean risk difference.
The other limitation of using the risk difference as a meta-analysis effect measure is that it estimates a single pooled risk difference applicable to a population that in theory has the weighted average baseline risk of the control arms of the studies in the meta-analysis. Thus, this approach does not help to establish risk differences for populations with different baseline risks, unless a meta-analysis of risk differences was stratified in a subgroup analysis based on a grouping of baseline risks. Stratification might reduce statistical heterogeneity but will likely lead to imprecise risk difference estimates because the number of studies in each stratum of baseline risk will be small. Furthermore, simulations suggest that the approach of pooling risk differences across studies performs poorly when studies have zero events.11
Application to case study
Using a traditional random effects model that assumes that study specific underlying effect measures follow a normal distribution, the meta-analysis of risk differences from eight trials shows that compared with stenting, endarterectomy reduced strokes by 20 per 1000 patients (95% confidence interval 40 fewer to 1 more; fig 1).
This meta-analysis was associated with some heterogeneity (I2=44%, τ2=0.0003, P=0.09). The average baseline risk (ie, average stroke incidence in the stenting arms of the eight trials) was 8%. To estimate risk difference in a high risk population, we can restrict meta-analysis to studies with a baseline risk >10%, which gives a pooled risk difference of 41 fewer strokes per 1000 patients (95% confidence interval 80 fewer to 3 fewer). As expected, the risk difference was higher in the high risk group, but its confidence interval became much wider because only three trials were available for this subset of the analysis.
Method 2: Transforming a pooled relative effect into risk difference using an assumed baseline risk
This method transforms or re-expresses a pooled odds ratio or a risk ratio into a variety of risk differences across a range of assumed baseline risks. This approach is recommended by most methodology groups, including the Cochrane Collaboration and the GRADE working group, and is the approach used in GRADEpro, the software that facilitates grading certainty of evidence and guideline development.367121314
Application of method
The only data needed are a single pooled odds ratio or risk ratio with its confidence interval and a single baseline risk value. Systematic reviewers usually pool log odds ratio or log risk ratio from the studies in meta-analysis software using either a common effect or random effects model, to generate a relative effect (odds ratio or risk ratio) and its confidence interval. The baseline risk is usually derived from the control arms of randomized trials, but can also be derived from a large observational study or based on expert consensus.12 Baseline risk derived from the control arms of trials might not be representative of the actual risk.15 The subsequent computation of risk difference is simple and intuitive and can be made by hand (box 1) or in the open source software GRADEpro. The 95% confidence interval of risk difference is calculated using the same equations in box 1, by substituting the lower or upper limit of the relative effect for the point estimate while keeping the same baseline risk.
Box 1: Computing risk difference from risk ratio and odds ratio
RD = BR × (RR − 1)
RD = (OR × BR)/(1 − BR + (OR × BR)) − BR
BR=baseline risk; RD=risk difference; RR=risk ratio; OR=odds ratio.
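The two equations in box 1 translate directly into code. The sketch below is a minimal Python illustration; the function names are hypothetical, the risk ratio and its confidence limits are those reported for stroke in the case study, and the 10% baseline risk is an assumed value chosen only for illustration.

```python
def rd_from_rr(rr, br):
    """Risk difference from a risk ratio and a baseline risk."""
    return br * (rr - 1)

def rd_from_or(odds_ratio, br):
    """Risk difference from an odds ratio and a baseline risk."""
    return (odds_ratio * br) / (1 - br + odds_ratio * br) - br

def rd_with_ci(point, lower, upper, br, transform):
    """Apply the same transformation to the point estimate and both limits."""
    return tuple(transform(x, br) for x in (point, lower, upper))

# Risk ratio 0.77 (0.63 to 0.94) at an assumed baseline risk of 10%.
rd, low, high = rd_with_ci(0.77, 0.63, 0.94, 0.10, rd_from_rr)
per_1000 = [round(1000 * x) for x in (rd, low, high)]
print(per_1000)
```

Because the baseline risk enters as a fixed number, all of the uncertainty in the resulting interval comes from the relative effect.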
Advantages and disadvantages
The first limitation of this approach is that it does not account for the variability in the baseline risk and assumes no uncertainty about it. All the variability of the resultant risk difference is derived from variability of the relative effect. This approach is clinically illogical because the baseline risk is always uncertain.16 This limitation becomes critical when the baseline risk is small, which can result in a misleadingly precise risk difference.
The second limitation becomes evident when guideline developers make precision judgments for several groups with various baseline risks (eg, low, medium, and high risk). Since the confidence interval of risk difference is a simple linear transformation of the baseline risk, it widens as the baseline risk increases. This phenomenon means that the estimates in high risk populations are always less precise than in lower risk populations (ie, the confidence interval of risk difference is always wider at higher baseline risk), which is counterintuitive. No clinical or methodological reason exists for high risk populations to always have less precise estimates. The consequences of this problem are critical, because guideline developers will always be less certain (less precise) in high risk populations, which could weaken their recommendations for treatment. When an odds ratio is transformed to a risk difference, the confidence interval of risk difference also widens as baseline risk increases, although the relation is not perfectly linear as with risk ratio transformation. Nevertheless, the same concerns remain for transforming odds ratios to risk differences.
The third limitation is noted when the treatment effect is not significant (ie, the confidence interval of the relative effect crosses the null). In this case, the majority of the confidence interval of the risk difference sometimes gives the opposite conclusion to the point estimate. We showed in a previous study that this discrepancy can occur in 5-10% of meta-analyses with non-significant findings and is attributed to the logarithmic transformation, which generates an asymmetric confidence interval of the relative effect.7 Although this limitation might not seem serious, it can cause interpretational challenges (such as when considering non-inferiority decisions) and impact subsequent decision modeling.
Finally, this method is based on the assumption of total portability of risk ratio and odds ratio across various baseline risks (ie, the relative effect is assumed to be independent of the baseline risk). This assumption is often debated and although it might hold true in some settings, in many others it does not.17181920
Application to case study
A meta-analysis of the eight trials that compared endarterectomy with stenting on the outcome of stroke calculated a risk ratio of 0.77 (95% confidence interval 0.63 to 0.94). In figure 2, this risk ratio is used to compute risk difference, which is estimated across baseline risks ranging from 0% to 100%. Figure 2 shows that the confidence interval of risk difference widens as a linear function of the baseline risk. We note that if a decisional threshold was set at 100 fewer strokes per 1000 people, the treatment effect would be precise for patient groups with baseline risks of 10% and 20%, but not 30%. This conclusion about imprecision is only attributed to the method of computation and might not make sense clinically (ie, no clinical reason exists to make us consistently less certain about the benefit of endarterectomy over stenting in a higher risk population compared with a lower risk population). In addition, since this method does not account for the uncertainty in the baseline risk, if the baseline risk was low (1%) the case study’s 95% confidence interval of risk difference would be four fewer strokes to one fewer stroke per 1000 patients, which may be misleadingly precise.
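The pattern described in this case study can be reproduced with a short Python sketch. It applies the risk ratio transformation at several baseline risks and checks whether each interval crosses the decisional threshold of 100 fewer strokes per 1000 people. The variable names are hypothetical, and treating "crosses the threshold" as the threshold lying strictly inside the interval is an assumption about how the judgment is made.

```python
RR, RR_LOW, RR_HIGH = 0.77, 0.63, 0.94   # pooled risk ratio from the case study
THRESHOLD = -100                         # 100 fewer strokes per 1000 people

def rd_interval_per_1000(br):
    """Risk difference and its limits per 1000 patients at baseline risk br."""
    return tuple(1000 * br * (x - 1) for x in (RR, RR_LOW, RR_HIGH))

rows = []
for br in (0.01, 0.10, 0.20, 0.30):
    rd, low, high = rd_interval_per_1000(br)
    crosses = low < THRESHOLD < high     # imprecise if the threshold is inside the interval
    rows.append((br, round(rd), round(low), round(high), high - low, crosses))

for br, rd, low, high, width, crosses in rows:
    print(f"BR {br:.0%}: RD {rd} ({low} to {high}), width {width:.0f}, crosses: {crosses}")
```

The interval width grows in proportion to the baseline risk, so only the 30% group crosses the threshold, and the 1% group has an interval only a few events wide.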
Method 3: Modeling of uncertainty in the baseline risk via simple microsimulation
Method 2 can be enhanced by addressing the uncertainty in the baseline risk through a modeling approach that uses a simple microsimulation method. This approach is particularly helpful when two bodies of evidence need to be considered simultaneously. The least biased estimate for comparative effectiveness evidence is usually derived from a meta-analysis of randomized trials, whereas a real world baseline risk might be better derived from a different source (such as a population based study or registry), which provides better generalizability and applicability to the target population.1221
Application of method
The data required are a pooled risk ratio or odds ratio with its confidence interval and a baseline risk with its confidence interval. A general statistical package is needed since this approach is not usually implemented in meta-analysis software. In the appendix we provide the R code for this simple microsimulation.
This method estimates risk difference by simulating trials of individual patients and aggregating individual results to generate group means and distributions (similar to Monte Carlo simulation). Thus, the main feature of this approach is “constructing the desired patient group from individual sampling,” which means simulating one individual at a time, and the mean results of the whole cohort are calculated from all the aggregated individual simulation results.22 Each time the model runs, it selects a patient with a baseline risk drawn randomly from a distribution that is defined by the mean and standard deviation of the available baseline risk. This information is usually derived from a study or a series of studies with the closest characteristics to the patient group of interest. This patient will also have a simulated risk ratio (or odds ratio) randomly drawn from a distribution of risk ratio (or odds ratio) that is defined by the mean and standard deviation of the available risk ratio (or odds ratio).
Similar to baseline risks, the value for risk ratio (or odds ratio) should come from the most relevant studies. Risk difference will be generated for this patient from these two simulated values using the same equations listed in box 1. This calculation concludes one microsimulation for one patient. The model then begins the next simulation by selecting a second patient and randomly drawing their baseline risk and risk ratio, and then generating their risk difference. This step is repeated for a large number of simulations to create a distribution of the simulated risk difference (typically 10 000 or more simulated patients such that the variation due to simulation is negligible). The distribution of these simulated risk differences will have a mean risk difference, which is the final estimated risk difference. The risk difference distribution will have 2.5 and 97.5 centile values that provide the 95% confidence interval for risk difference.23
The framework of this modeling process follows bayesian decision theory, which contains two major core tasks: constructing a decision analytic model, and performing a probabilistic analysis to identify the uncertainty of the current strategy. In this approach, we assume that the baseline risk follows the beta distribution and that the risk ratio (or odds ratio) follows the lognormal distribution. The parameters of the beta distribution are selected such that the distribution has the assumed mean and variance. We used the beta distribution for simulating baseline risks owing to its special association with binomial data and because its natural range is between 0 and 1. The beta distribution is frequently used for event probabilities and in bayesian statistics. Other distributions could also be considered, such as logit or probit normal distributions. If information about the uncertainty in the baseline risk was only available in a range form, then baseline risk could be sampled from a uniform distribution.2324 If the distributions could be estimated statistically, they should be used in the simulations. For example, if the associations between probabilities of outcomes and patient characteristics can be established using regression models the predicted probabilities from the regression could be used.
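The distributional assumptions above can be written out as follows. This is a sketch of one standard parameterization and not necessarily the one used in the authors' code: the beta parameters come from matching moments to an assumed mean m and standard deviation s of the baseline risk, and the lognormal standard deviation is back-calculated from the reported 95% confidence limits L and U of the risk ratio, with the log scale centered on the pooled estimate (an assumption).

```latex
% Baseline risk: beta distribution matched to mean m and variance s^2
% (requires s^2 < m(1 - m))
\mathrm{BR} \sim \mathrm{Beta}(\alpha, \beta), \qquad
\alpha = m\left(\frac{m(1-m)}{s^{2}} - 1\right), \qquad
\beta = (1-m)\left(\frac{m(1-m)}{s^{2}} - 1\right)

% Relative effect: lognormal distribution
\ln \mathrm{RR} \sim \mathcal{N}\!\left(\ln \widehat{\mathrm{RR}},\ \sigma^{2}\right), \qquad
\sigma \approx \frac{\ln U - \ln L}{2 \times 1.96}

% One simulated patient (box 1, risk ratio version)
\mathrm{RD} = \mathrm{BR} \times (\mathrm{RR} - 1)
```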
Advantages and disadvantages
This proposed microsimulation produces a confidence interval that does not automatically widen with increasing baseline risk, which is a limitation of method 2. Rather, the confidence interval widens as uncertainties of baseline risk or risk ratio (or odds ratio) increase. Another advantage of this type of modeling is that simulation provides flexibility to study other scenarios, such as longer follow-up, multiple treatment arms, and the inclusion of multiple outcomes; however, these scenarios require modeling expertise and are outside of the scope of this study.
Application to case study
The meta-analysis of the eight trials that compared endarterectomy with stenting on the outcome of stroke calculated a risk ratio of 0.77 (95% confidence interval 0.63 to 0.94).2 This risk ratio and its variance provide the parameters of the lognormal distribution to be used for microsimulation. We derived a baseline risk from an observational study of Medicare claims data from 10 958 patients aged 66 years or older. These claims data were considered to be real world data and gave a stroke rate of 5.3% (95% confidence interval 3.9% to 7.2%) in patients with symptoms who received stenting.25 This average baseline risk and its variance provide the parameters of the beta distribution to be used for microsimulation. A microsimulation of 10 000 iterations showed that compared with stenting, endarterectomy was associated with 12 fewer strokes per 1000 patients (95% confidence interval 3 fewer to 21 fewer). A histogram of the distribution of risk difference is in figure 3.
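A simulation of this kind can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the authors' R code: the standard deviation of the baseline risk is approximated from the reported confidence interval by treating it as symmetric, the beta parameters are obtained by moment matching, the lognormal distribution is centered on the log of the pooled risk ratio, and baseline risk and risk ratio are drawn independently. Function names and the random seed are hypothetical.

```python
import math
import random
import statistics

def beta_parameters(mean, sd):
    """Moment-matched beta parameters for a given mean and standard deviation."""
    common = mean * (1 - mean) / sd ** 2 - 1
    return mean * common, (1 - mean) * common

def simulate_rd(br_mean, br_low, br_high, rr, rr_low, rr_high, n=10_000, seed=1):
    """Simulate n risk differences; return the mean and approximate 2.5 and 97.5 centiles."""
    rng = random.Random(seed)
    # Assumption: approximate the SD of the baseline risk from its 95% interval.
    br_sd = (br_high - br_low) / (2 * 1.96)
    alpha, beta = beta_parameters(br_mean, br_sd)
    # Assumption: log risk ratio is normal, centered on the log of the pooled estimate.
    mu = math.log(rr)
    sigma = (math.log(rr_high) - math.log(rr_low)) / (2 * 1.96)
    draws = sorted(
        rng.betavariate(alpha, beta) * (rng.lognormvariate(mu, sigma) - 1)
        for _ in range(n)
    )
    # Nearest-rank approximation to the 2.5 and 97.5 centiles.
    lower = draws[int(0.025 * n)]
    upper = draws[int(0.975 * n) - 1]
    return statistics.fmean(draws), lower, upper

# Inputs reported in the case study: baseline risk 5.3% (3.9% to 7.2%),
# risk ratio 0.77 (0.63 to 0.94).
mean_rd, lower, upper = simulate_rd(0.053, 0.039, 0.072, 0.77, 0.63, 0.94)
print(f"RD per 1000: {1000 * mean_rd:.0f} ({1000 * lower:.0f} to {1000 * upper:.0f})")
```

Because the distribution parameters here are approximations, the output should be broadly similar to, but need not exactly match, the figures reported in the case study.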
Method 4: Bivariate random effects model
This one stage, meta-analysis model is based on joint analysis of the risks in treatment and control groups using a bivariate random effects model and computing conditional effects based on baseline risks.262728 Models used for this method include the bivariate generalized mixed effects model and the bivariate beta binomial model.293031
Application of method
The data required for this analysis are similar to those needed for method 1, which are the number of events and sample size in each study arm. Data are formatted in a long format with columns for events and sample size, intervention type (intervention v control) and study identification. Considering that this approach is not currently available in meta-analysis software and requires a general software package and knowledge of statistical coding, we have provided the code for the open source R software for a bivariate random effects model that estimates risk difference (appendix). This code also plots risk difference and its confidence interval across baseline risks and can provide risk difference and confidence interval for a given baseline risk.
Advantages and disadvantages
In regression analysis, the confidence interval of the response variable is narrowest when the predictor variable is at its mean value. Using this bivariate approach, therefore, the confidence interval of risk difference is narrowest at the average baseline risk.17 This method solves the problem, noted with method 2, of the ever widening confidence interval of risk difference as the baseline risk increases. Unlike method 2, method 4 does not require the assumption of portability of the relative effect across baseline risks, and unlike method 1, it does not force a normal distribution on risk differences estimated from the individual studies. Method 4 also offers the advantage of not requiring continuity correction for studies with zero events and avoids bias related to continuity correction.11
Simulations suggest that in the presence of zero events, the bivariate random effects model generally leads to smaller biases and mean squared errors and higher interval coverage probabilities than other methods.11 The important assumption of this method is that event counts of individual studies follow binomial distributions and the risks on the logit transformed scale (or more generally other link functions such as probit or complementary log-log) follow a bivariate normal distribution.
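The model assumption stated above can be written compactly. In the sketch below, the notation is assumed rather than taken from the article: i indexes studies, k = 0 denotes the control arm and k = 1 the treatment arm, and g is the link function (logit by default). The last two lines use the standard conditional mean of a bivariate normal distribution to show one simple plug-in way of reading off a risk difference at a given baseline risk; the authors' code may compute the conditional effect and its confidence interval differently.

```latex
% Within-study level: binomial event counts
y_{ik} \sim \mathrm{Binomial}(n_{ik},\, p_{ik}), \qquad k = 0, 1

% Between-study level: bivariate normal on the link scale
\begin{pmatrix} g(p_{i0}) \\ g(p_{i1}) \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{0} \\ \mu_{1} \end{pmatrix},
\begin{pmatrix} \sigma_{0}^{2} & \rho\,\sigma_{0}\sigma_{1} \\
                \rho\,\sigma_{0}\sigma_{1} & \sigma_{1}^{2} \end{pmatrix}
\right)

% Conditional mean on the link scale at baseline risk p_0
\mathrm{E}\!\left[\, g(p_{1}) \mid g(p_{0}) \,\right]
= \mu_{1} + \rho\,\frac{\sigma_{1}}{\sigma_{0}}\,\bigl(g(p_{0}) - \mu_{0}\bigr)

% Plug-in conditional risk difference
\mathrm{RD}(p_{0}) \approx
g^{-1}\!\left(\mu_{1} + \rho\,\frac{\sigma_{1}}{\sigma_{0}}\,\bigl(g(p_{0}) - \mu_{0}\bigr)\right) - p_{0}
```

With five parameters to estimate (two means, two standard deviations, and one correlation), this formulation also makes clear why a sufficient number of studies with a spread of baseline risks is needed.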
This regression method has some inherent limitations. A sufficient number of studies with a sufficient spread of risks is needed to estimate model parameters and avoid model singularity and collinearity (ie, six or more studies in bivariate models with five parameters11). If the correlation between the two treatment groups is close to −1 or +1, convergence problems in the parameter estimation could make the results of this method less reliable. Extrapolation of risk difference for baseline risks not included in the original dataset should also be avoided.
Another limitation of this approach is that it assumes a linear relation between study specific risk differences and baseline risks on the logit (or more generally probit, complementary log-log, and other link functions) transformed scale of risks. This linearity assumption might not hold in some cases, and a non-linear relation on the transformed scale could be modeled. Nevertheless, such alternative parametric models might require a large number of studies for the accurate estimation of parameters. Note that a linear relation between study specific risk differences and baseline risks in a transformed scale commonly suggests a non-linear relation in a different link function transformed scale. Thus, choosing a link function that can provide better goodness of fit than the default logit transformation can ameliorate this limitation.
Application to case study
We applied the bivariate generalized mixed effects model to data from the eight trials that compared the effect of endarterectomy with the effect of stenting on stroke. Figure 4 demonstrates that the confidence interval of risk difference is narrowest at the average baseline risk of 8% and widens as the baseline risk increases or decreases. The model provides risk difference and the 95% confidence interval for specific baseline risks.
We have described four methods to estimate risk difference in the context of a meta-analysis in terms of their main principles, strengths, and limitations (supplementary table 2) and applied them to a case study (results summarized in table 1). When choosing between these four methods, meta-analysts should consider the available data and the goal of risk difference estimation. An algorithm is proposed in figure 5 to help in choosing the appropriate method. Methods 1 and 4 require the availability of 2×2 data from each individual study. Methods 2 and 3 can be performed without such data and only require a pooled relative effect, which is also the case when the effect estimates are adjusted. When results are presented as hazard ratios, researchers can impute a relative risk from a hazard ratio3233 and apply method 3. Alternatively, they could apply the bivariate random effects model where the follow-up periods could be considered as a covariate in person time for estimating the event rate in each treatment group. For method 2, which has the most limitations, we suggest using a range of baseline risks (as opposed to a single value) to compute the confidence interval of risk difference. Using a range of baseline risks from the available studies avoids the assumption of no uncertainty in baseline risk, which is clinically irrational and can lead to a misleadingly narrow confidence interval of risk difference.
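For the hazard ratio case mentioned above, one commonly used conversion can be sketched as follows. The article cites references for this imputation but does not give a formula, so the one below is an assumption: it holds under proportional hazards, where the event-free proportion in the intervention group equals the event-free proportion in the control group raised to the power of the hazard ratio. The function name and the input values are hypothetical.

```python
def risk_ratio_from_hazard_ratio(hr, br):
    """Impute a risk ratio from a hazard ratio at baseline risk br.

    Assumes proportional hazards, so that the intervention group risk is
    1 - (1 - br) ** hr over the same follow-up period.
    """
    risk_trt = 1 - (1 - br) ** hr
    return risk_trt / br

# Hypothetical inputs: hazard ratio 0.77 at a baseline risk of 10%.
rr = risk_ratio_from_hazard_ratio(0.77, 0.10)
rd = 0.10 * (rr - 1)   # box 1 equation for the risk ratio
print(f"Imputed RR {rr:.3f}, RD per 1000 {1000 * rd:.0f}")
```

The imputed risk ratio lies between the hazard ratio and 1, and moves closer to the hazard ratio as the baseline risk becomes small.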
If the goal of risk difference estimation is to estimate risk difference for several groups with different baseline risks and make separate imprecision judgments and subsequent decisions for these groups, then method 2 (transforming a pooled relative effect into risk difference) has the most problems. Such an approach could lead to weaker recommendations in high risk populations owing to imprecise estimates, and stronger recommendations in low risk populations owing to precise estimates. The bivariate random effects model and the microsimulation would be better for making imprecision judgments for multiple groups with different baseline risks. If, however, the goal of risk difference estimation is to incorporate a baseline risk from a body of evidence that is external to a meta-analysis of comparative effectiveness (ie, including baseline risk data from a registry or a large observational study), then the microsimulation approach would be clearly the preferred method. Lastly, if empirical evidence suggested heterogeneity of the relative effect based on the baseline risk, then the bivariate random effects model would have the most compelling rationale. The microsimulation and the bivariate random effects model align with efforts that aim to deal with heterogeneity of the treatment effect.34
Although no single approach resolves all the limitations we identified, we have presented some guidance for choosing among four methods. As always, a sensitivity analysis can help to determine whether the choice of method affects imprecision judgments and subsequent decisions.
Contributors: MHM provided expertise in evidence synthesis, guideline development, and decision making. ZW and SS provided expertise in methodology and meta-analysis. YZ, HC, and LL provided expertise in statistics and modeling. MHM is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Patient and public involvement: Patients or the public were not involved in the design, conduct, reporting, or dissemination plans of this research. Ethical approval was not required. All associated data are shared publicly on https://github.com/linlf/RD.
Provenance and peer review: Not commissioned; externally peer reviewed.