- Stephen Burgess, statistician (1),
- Adam Butterworth, genetic epidemiologist (2),
- Anders Malarstig, genetic epidemiologist (3),
- Simon G Thompson, statistician (2)
- (1) Strangeways Research Laboratory, Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
- (2) Department of Public Health and Primary Care, University of Cambridge, Cambridge
- (3) Pfizer Global Research and Development, Cambridge
- Correspondence to: S Burgess
- Accepted 21 September 2012
What is Mendelian randomisation?
If epidemiologists are compared with fishermen, causality is the big fish. It is elusive to find, difficult to catch, and claims to have measured it are often exaggerated. But, despite the challenge, demonstration of causal relations remains a central aim of epidemiological inquiry. Mendelian randomisation is becoming a commonly used technique to make assessment of causality possible from observational data.1 For example, in coronary disease it has recently strengthened the case for a causal role of lipoprotein(a)2 and weakened the case for C reactive protein.3
To perform Mendelian randomisation, we look for a genetic variant with three key features. Firstly, it is associated with the risk factor of interest. Secondly, it divides the observed population into groups similar to arms in a randomised trial, which do not systematically differ with respect to any confounding variable.4 This ensures that any difference in the outcome is because of the genetic variant. Thirdly, it affects the outcome only through the risk factor of interest and not by other biological pathways. Provided these key features hold, we can infer a causal effect of the risk factor on the outcome.5 The genetic variant acts as a proxy for the risk factor, and the random allocation of genes at conception is exploited as a natural experiment to show causation. The figure shows the correspondence between Mendelian randomisation and a randomised trial.4
For example, the genetic variant rs11206510 is associated with both low density lipoprotein cholesterol concentrations and coronary heart disease.6 Each additional copy of the C allele is associated with a 2.5% reduction in low density lipoprotein cholesterol concentration and an odds ratio for coronary heart disease of 0.93. Under the Mendelian randomisation assumptions that these associations are not confounded and that the genetic association with the disease is entirely mediated by the risk factor, we estimate that a 2.5% reduction in low density lipoprotein cholesterol concentration leads to a 7% reduction in the odds of coronary heart disease.
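The arithmetic behind this estimate can be sketched as a ratio calculation: the per-allele log odds ratio for disease divided by the per-allele change in log-transformed cholesterol concentration. The snippet below is a minimal illustration that uses only the two figures quoted above. The function and variable names are hypothetical, and the sketch ignores the uncertainty in both genetic associations.

```python
import math


def ratio_estimate(log_or_outcome: float, beta_risk_factor: float) -> float:
    """Causal log odds ratio per unit change in the (log) risk factor.

    Divides the gene-outcome association by the gene-risk factor
    association. Valid only under the three assumptions in the text.
    """
    if beta_risk_factor == 0:
        raise ValueError("variant must be associated with the risk factor")
    return log_or_outcome / beta_risk_factor


# Per-allele associations for rs11206510 as quoted in the text.
per_allele_or = 0.93              # odds ratio for coronary heart disease
per_allele_ldl_ratio = 1 - 0.025  # 2.5% reduction in LDL cholesterol

log_or = math.log(per_allele_or)
beta_log_ldl = math.log(per_allele_ldl_ratio)

# Causal log odds ratio per unit increase in log LDL cholesterol.
causal_log_or_per_log_unit = ratio_estimate(log_or, beta_log_ldl)

# Scaling back to the variant's own 2.5% reduction recovers the 0.93.
or_per_2_5_pct = math.exp(causal_log_or_per_log_unit * beta_log_ldl)

print(f"log OR per unit log LDL: {causal_log_or_per_log_unit:.2f}")
print(f"OR per 2.5% reduction:   {or_per_2_5_pct:.2f}")
```

At the scale of the variant itself the ratio estimate simply restates the genetic association with disease. The division matters once the estimate is rescaled to a different size of change in the risk factor.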
What factors affect the validity of Mendelian randomisation studies?
From the first discussions of Mendelian randomisation, researchers have emphasised that the assumptions leading to the assertion of causal association could be invalid for many genetic variants.7 Violations of the assumptions of no direct association between the genetic variant and either the outcome or any confounding risk factor can occur for several reasons. These reasons include the association of the variant with multiple risk factors (pleiotropy), the association between the variant and other genetic variants (linkage disequilibrium), and the presence of genetic differences between possibly hidden subgroups in the population under investigation (population stratification).8 For example, in a North American cohort, a variant could be associated with type 2 diabetes because of increased prevalence of the variant among people of Native American descent, who are known to have a greater incidence of the disease. The genetic association might be driven not specifically by a single risk factor but by a range of factors associated with the difference in ethnic background. Such violations of internal validity can lead to misleading conclusions.1 5
An aspect of validity that has received less attention is the issue of external validity. If the assumptions about the genetic variant are true and a valid estimate is made that corresponds to a causal association, can this estimate be generalised to the effect of a clinical intervention? For example, is the estimate of lowered risk derived from considering genetically reduced concentrations of cholesterol the same as the lowered risk conferred by an intervention that reduces concentrations of cholesterol?
Reasons why Mendelian randomisation might give a different estimate to an intervention
Mendelian randomisation is different from a randomised trial in a fundamental way. In a randomised trial, the intervention applied to the treatment group is usually the intervention that is proposed in clinical practice. In Mendelian randomisation, the “intervention” leading to differences between the groups within the study is the presence of a genetic variant.9 The question of external validity is whether the causal effect due to the change in the risk factor as a result of the presence of the genetic variant is similar to the causal effect due to the proposed intervention on the risk factor. Aside from those resulting from differences in the study population,10 there are several reasons why these effects might be unequal:
Time scale and developmental compensation
The presence or absence of the genetic variant in an individual is determined at conception. This means that the Mendelian randomisation estimate represents the result of a lifelong difference in the risk factor between the groups. An intervention in levels of a risk factor for coronary heart disease (for example) might have limited benefit because some stages of atherosclerosis might be irreversible. There might be no intervention on the risk factor in a mature cohort that can imitate the genetic effect. The same would be true if the disease progression depends on a developmental phase at a particular stage of life.8
For some risk factors, an individual might develop compensatory mechanisms (canalisation) in response to increased (or reduced) levels of the risk factor.9 This has been seen in knockout studies, in which deletion of a particular gene often does not have the profound effect expected. This is because alternative pathways are developed as a compensatory mechanism to circumvent the missing gene.7 For example, previous studies of interleukin 1 knockout mice have suggested that other inflammatory responses (for example, tumour necrosis factor alpha concentrations) might be increased to compensate for the loss of inflammatory signalling from the interleukin 1 pathway.11
Usual versus pathological levels
Risk of disease often depends primarily on the average or “usual” levels of a risk factor. Mendelian randomisation has a particular role to play here, as genetic variants would be expected to affect these average levels. It is plausible, however, that long term increased average levels of a risk factor do not affect risk of disease but acute response of the risk factor does.8 The efficacy of short term targeted interventions on pathological levels of a risk factor might not be validly assessed by Mendelian randomisation.
For example, genetic variants that are associated with usual concentrations of C reactive protein have been used to assess the causal association of long term raised average concentrations of C reactive protein on cardiovascular risk.3 Though there does not seem to be a causal association between C reactive protein and cardiovascular risk, this does not preclude the efficacy of a therapeutic intervention on acute concentrations of C reactive protein, which is better assessed by in vivo studies.12
Extrapolation of small differences
The change in a risk factor because of genetic variants is generally small. For an intervention lowering (or raising) the risk factor uniformly by a small amount for everyone in the population, a Mendelian randomisation study can provide a relevant estimate of the effect of the intervention. If the proposed intervention in the risk factor is more substantial, however, then the Mendelian randomisation estimate of its effect relies on extrapolation.
For example, the effect of statin use (inhibition of 3-hydroxy-3-methylglutaryl coenzyme A (HMG CoA) reductase) on low density lipoprotein cholesterol concentrations is several times larger than the association of low density lipoprotein cholesterol concentrations with variants in the HMG CoA reductase gene. Extrapolation of the genetic effect relying on a linear assumption for the effect of the risk factor on the outcome might not be valid.
Different pathways of genetic and intervention effects
The genetic variant and the proposed intervention will not, in general, have the same specific effect on the risk factor. This situation is similar to that of differences between drugs that act on different mechanisms but influence the same mediating risk factor. The genetic change in the risk factor might be associated with another variable, as in the case of a variant in the FTO gene associated with obesity.13
The effect of FTO on obesity is not direct; rather the genetic variant affects satiety, which in turn affects obesity.14 An intervention on obesity that is not based on reducing food intake might have a different effect on the outcome from that estimated in a Mendelian randomisation study. Even when both effects are specifically targeted on the risk factor, it could be that they are on different biological, biochemical, or physiological pathways, and so the genetic and clinical changes in risk factor might affect the outcome to different extents.
Cholesterol and coronary heart disease
We can illustrate the differences between Mendelian randomisation estimates and those from other approaches in the following example. Coronary heart disease is the result of a build up of atheromatous plaques in the coronary arteries. A major component of such plaques is cholesterol, and low density lipoprotein cholesterol is an established causal risk factor for coronary heart disease. We assessed the association between low density lipoprotein cholesterol and coronary heart disease from Mendelian randomisation and from randomised trials in which statins are used as the clinical intervention to lower low density lipoprotein cholesterol concentrations.
We considered five genetic variants from a meta-analysis of genome-wide association studies that are associated with low density lipoprotein cholesterol but not with high density lipoprotein cholesterol or triglycerides.6 The table gives the estimates of association of each genetic variant with log transformed concentrations of low density lipoprotein cholesterol and risk of coronary heart disease, and Mendelian randomisation estimates with each genetic variant of the causal odds ratio of coronary heart disease per 30% decrease in concentration.5 These odds ratios range from 0.27 to 0.45. These estimates rely on an extrapolation of between eightfold and 20-fold of the genetic effects on the risk factor.
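To show what such an extrapolation involves, the sketch below rescales a per-allele association to a 30% decrease in concentration. It uses the figures quoted earlier in the article for rs11206510 (a 2.5% reduction per allele and an odds ratio of 0.93), not the values in the table, and it assumes a log-linear relation between concentration and the odds of disease. The function name is hypothetical and no confidence interval is computed.

```python
import math


def extrapolate_odds_ratio(per_allele_or: float,
                           per_allele_reduction: float,
                           target_reduction: float) -> tuple[float, float]:
    """Rescale a per-allele odds ratio to a larger proportional reduction.

    Reductions are proportions (0.025 means 2.5%). Works on the log scale,
    so the result assumes a log-linear dose-response relation.
    Returns (extrapolation factor, extrapolated odds ratio).
    """
    beta = math.log(1 - per_allele_reduction)  # genetic effect on log LDL
    target = math.log(1 - target_reduction)    # intervention-sized change
    factor = target / beta                     # how far we extrapolate
    return factor, math.exp(factor * math.log(per_allele_or))


factor, or_30_pct = extrapolate_odds_ratio(0.93, 0.025, 0.30)
print(f"extrapolation factor: {factor:.1f}-fold")
print(f"odds ratio per 30% decrease: {or_30_pct:.2f}")
```

For this variant the extrapolation is roughly 14-fold and the rescaled odds ratio is about 0.36, which falls within the ranges reported above. The per-variant estimates in the table may have been derived from more precise inputs than the rounded figures used here.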
In comparison, randomised trials of statins have reported lower estimates of the benefits of reduced low density lipoprotein cholesterol concentrations. A meta-analysis examining the effect of statin use on coronary heart disease, comprising around 69 139 participants with 6406 events, gave a relative risk of 0.73 (95% confidence interval 0.70 to 0.77) based on average reduction of around 30% in low density lipoprotein cholesterol concentrations over an average follow-up time of at least three years.15 A more focused meta-analysis examining the effect of statin use for primary disease prevention, comprising around 27 969 individuals without a history of coronary heart disease with 1677 events, gave a similar relative risk of 0.72 (95% confidence interval 0.65 to 0.79) over 1.5 to three years’ follow-up.16
The effect of statins in reducing coronary heart disease increases over time.17 As atherosclerosis is a chronic condition that develops progressively, it is not surprising that the estimates of the effect of the lifelong reduction of low density lipoprotein cholesterol associated with the genetic variants correspond to a greater proportional change in the risk of coronary heart disease than the effects of statin use. The difference between the estimates could also be caused by the non-specific effects of statins; any effects of statins on inflammatory response, however, would further lessen the role of low density lipoprotein cholesterol and make the contrast with the genetic effects more extreme.
Blood pressure and coronary heart disease
Another example is the association between blood pressure and coronary heart disease. A genetic risk score associated with a 1.6 mm Hg decrease in systolic blood pressure corresponds to an odds ratio for coronary heart disease of 0.91 (95% confidence interval 0.89 to 0.92).18 Assuming a linear association, this implies an odds ratio of 0.55 (0.47 to 0.61) for a 10 mm Hg decrease compared with the relative risk from a meta-analysis of 0.78 (0.73 to 0.83) in trials and 0.75 (0.73 to 0.77) in cohort studies.19 Here again, the estimate of the benefit of reducing blood pressure from the genetic variants is much greater than that of the intervention.
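The linear rescaling used here can be sketched in a few lines. The snippet assumes that "linear" means linear on the log odds scale, which is an assumption of this illustration; the function name is hypothetical. It rescales the published point estimate and interval limits from a 1.6 mm Hg to a 10 mm Hg decrease.

```python
import math


def rescale_odds_ratio(or_value: float,
                       from_change: float,
                       to_change: float) -> float:
    """Rescale an odds ratio from one size of change to another,
    assuming the log odds change linearly with the risk factor."""
    return math.exp(math.log(or_value) * to_change / from_change)


# Genetic risk score association per 1.6 mm Hg decrease, from the text.
genetic = {"estimate": 0.91, "lower": 0.89, "upper": 0.92}

# The same association expressed per 10 mm Hg decrease.
scaled = {name: rescale_odds_ratio(value, from_change=1.6, to_change=10.0)
          for name, value in genetic.items()}

for name, value in scaled.items():
    print(f"{name}: {value:.2f}")
```

This reproduces the point estimate of 0.55. The rescaled limits come out at about 0.48 and 0.59 rather than the reported 0.47 and 0.61, a gap that would be consistent with the published inputs having been rounded to two decimal places.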
Interpreting the result of a Mendelian randomisation study
A Mendelian randomisation study tests whether a risk factor is causally associated with a disease outcome by examining whether there are differences in the outcome between genetically defined groups with different average levels of the risk factor of interest.
There are three pitfalls in interpreting the result of a Mendelian randomisation study:
- Failure of key assumptions: The key assumption is that the genetic variant that is associated with the risk factor divides the population into groups that are similar to treatment arms in a randomised trial in that all potential confounding factors are balanced between the groups. This requires lack of pleiotropy of the variant, absence of linkage disequilibrium with other functional variants, and absence of hidden population strata (see text). If any of these conditions does not hold, then estimates from Mendelian randomisation can be misleading.
- Overinterpreting a null finding: The differences in the risk factor between the genetic groups are usually small compared with its overall variation. A null finding might simply reflect that the small differences between the groups do not result in large enough differences in the outcome to be reliably distinguished from chance differences in a limited sample size. In some cases sample sizes in tens of thousands are required to provide sufficient power to reliably interpret a null finding.8
- Overinterpreting a positive finding: While the Mendelian randomisation hypothesis relates to genetic groups, one aim of a Mendelian randomisation study is to determine the potential effect of a clinical intervention in the risk factor of interest. Qualitative and quantitative differences between the comparison of genetic groups and the proposed intervention mean that the causal effect estimated by Mendelian randomisation might not directly translate into the observed effect on the outcome of modifying the risk factor in practice.
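The point about null findings can be made concrete with a rough sample size sketch. It rests on a standard large-sample approximation for a continuous, standardised outcome: the sample size needed for a Mendelian randomisation analysis is roughly that of a conventional analysis divided by the proportion of variance in the risk factor explained by the variant. The input values below are hypothetical and chosen only for illustration, the function name is an assumption of this sketch, and the approximation does not carry over exactly to binary disease outcomes.

```python
from statistics import NormalDist


def approx_mr_sample_size(effect_per_sd: float,
                          variance_explained: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> float:
    """Approximate sample size to detect a causal effect.

    effect_per_sd: causal effect in outcome SDs per SD of risk factor.
    variance_explained: proportion of risk factor variance explained by
        the genetic variant (use 1.0 for a conventional analysis).
    """
    if not 0 < variance_explained <= 1:
        raise ValueError("variance_explained must be in (0, 1]")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2 / (effect_per_sd ** 2 * variance_explained)


# Hypothetical inputs: a modest effect and a variant explaining 2% of
# the variance in the risk factor.
n_conventional = approx_mr_sample_size(0.1, variance_explained=1.0)
n_mendelian = approx_mr_sample_size(0.1, variance_explained=0.02)

print(f"conventional analysis: about {n_conventional:,.0f} participants")
print(f"Mendelian randomisation: about {n_mendelian:,.0f} participants")
```

With these illustrative inputs the genetic analysis needs about 50 times as many participants as the conventional one, which is how a requirement in the tens of thousands can arise.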
Using the Mendelian randomisation paradigm to guide drug discovery
Questions of generalisability of results are especially relevant when Mendelian randomisation is used in guiding clinical interventions and drug discovery. Genetic evidence can inform the causal role of a risk factor that is being considered as a target for intervention. Association between a relevant genetic variant affecting the risk factor and the outcome might be taken as evidence for the potential efficacy of a drug affecting the risk factor pathway. For the reasons given above, however, absence of evidence for such an association does not necessarily imply lack of efficacy. Although we can expect Mendelian randomisation in many circumstances to provide a good qualitative indication, the magnitude of the Mendelian randomisation estimate will not necessarily be a reliable guide to the potential benefit of a drug.
In the examples considered, evidence from Mendelian randomisation suggests that low density lipoprotein cholesterol and blood pressure are appropriate targets for interventions aimed at reducing coronary heart disease. The genetic variants might also suggest particular biochemical pathways for such intervention. In this way, Mendelian randomisation can be used to prioritise risk factors for future pharmacological investigation.
Prospects for Mendelian randomisation
Mendelian randomisation is a useful tool for exploring causal relations between modifiable risk factors and outcomes of interest. It is one of the few epidemiological methods that can help in the selection of targets for therapeutic intervention. It would be misleading, however, to assume that the estimate from a Mendelian randomisation study gave the definitive answer to the general question of causal relevance of a risk factor. Mendelian randomisation estimates are especially relevant when the effect of interest is that of a long term population based intervention. We conclude that, while a Mendelian randomisation approach will generally be qualitatively informative for the direction of effect of a clinical intervention, the genetically derived estimate might not correspond to the magnitude of the effect in practice.
- Estimates from Mendelian randomisation represent causal effects of genetically determined differences in a risk factor on a disease outcome
- These estimates are informative for assessing aetiological associations of risk factors and for prioritising targets for pharmaceutical intervention
- The effects of such interventions can be quantitatively different to those obtained from Mendelian randomisation
Cite this as: BMJ 2012;345:e7325
Contributors: SB was the lead author; the other authors were involved with all stages in the conceptualising and editing of this article. SB is guarantor.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: all authors receive support through grant RG/08/014 from the British Heart Foundation; AM is currently on secondment to the University of Cambridge from Pfizer Pharmaceuticals.
Provenance and peer review: Not commissioned; externally peer reviewed.