What can mendelian randomisation tell us about modifiable behavioural and environmental exposures?
BMJ 2005; 330 doi: https://doi.org/10.1136/bmj.330.7499.1076 (Published 05 May 2005) Cite this as: BMJ 2005;330:1076
- Correspondence to: G Davey Smith
- Accepted 4 February 2005
Using genetic variants as a proxy for modifiable environmental factors that are associated with disease can circumvent some of the problems of observational studies
Epidemiologists look for modifiable causes of common diseases to improve population health. However, epidemiological studies may identify spurious “causes.” For example, the epidemiological findings that hormone replacement therapy protects against coronary heart disease,w1 β carotene prevents lung cancer,w2 and vitamin E and vitamin C reduce risk of cardiovascular disease w3 have all been refuted by randomised controlled trials and have raised concerns about the value of epidemiological studies.1 The misleading findings were probably due to confounding by behavioural, physiological, and socioeconomic factors related both to exposures and to disease end points.2 3 One solution to these problems is mendelian randomisation.4 5
What is mendelian randomisation?
Mendelian randomisation is a recent development in genetic epidemiology6 7 based on Mendel's second law that inheritance of one trait is independent of inheritance of other traits. It uses common genetic polymorphisms that are known to influence exposure patterns (such as propensity to drink alcohol) or have effects equivalent to those produced by modifiable exposures (such as raised blood cholesterol concentration). Associations between genetic variants and outcome are not generally confounded by behavioural or environmental exposures. This means that observational studies of genetic variants have similar properties to intention to treat analyses in randomised controlled trials (fig 1).
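The contrast between a confounded observational association and a genotype-based comparison can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method taken from the article: the allele frequency (0.3), the effect sizes, the sample size, and the function names `ols_slope` and `simulate` are all hypothetical, and the ratio of the two genotype associations (often called a Wald ratio) is used here as one simple way of turning genotype comparisons into an effect estimate.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n=20000, true_effect=0.5, seed=1):
    """Simulate genotype, exposure and outcome with an unmeasured confounder.

    All parameter values are illustrative assumptions, not estimates
    from any study.
    """
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Number of variant alleles (0, 1 or 2), allocated at random.
        g = (rng.random() < 0.3) + (rng.random() < 0.3)
        # Confounder that affects exposure and outcome but not genotype.
        u = rng.gauss(0, 1)
        x = 1.0 * g + u + rng.gauss(0, 1)
        y = true_effect * x + u + rng.gauss(0, 1)
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


genotype, exposure, outcome = simulate()

# Conventional analysis: exposure against outcome, distorted by the confounder.
observational = ols_slope(exposure, outcome)

# Genotype-based analysis: genotype-outcome association divided by the
# genotype-exposure association.
ratio_estimate = ols_slope(genotype, outcome) / ols_slope(genotype, exposure)

print(f"observational slope: {observational:.2f}")
print(f"genotype-based ratio estimate: {ratio_estimate:.2f}")
```

Under these assumed parameters the observational slope is expected to be about 0.91, well above the assumed true effect of 0.5, whereas the genotype-based ratio is expected to lie close to 0.5 because the simulated confounder is unrelated to genotype.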
Scope of mendelian randomisation
The simplest way of appreciating the potential of mendelian randomisation is to consider applications of the underlying principles. The inferences that can be drawn from mendelian randomisation studies depend on the different ways in which genetic variants can proxy for environmentally modifiable exposures.
Understanding effects of health related behaviours
In observational studies, alcohol consumption is related to many known and unknown confounding factors; ill health may lead people to reduce their drinking habits, and people often misreport their alcohol intake, making interpretation of observed associations between alcohol consumption and health outcomes difficult. It is unlikely that the definitive test of causality—a randomised controlled trial of the long term effects of alcohol consumption—will ever be carried out. However, genetic variants that influence the tendency to drink alcohol have been identified, allowing estimation of the unbiased and unconfounded health effects of alcohol.
The enzyme aldehyde dehydrogenase is responsible for efficient metabolism of alcohol after it has been oxidised to acetaldehyde. Half of Japanese people are heterozygous or homozygous for a variant of the aldehyde dehydrogenase gene (ALDH2) that is non-functional, called the null variant, making them unable to metabolise acetaldehyde efficiently. Peak blood acetaldehyde concentrations after drinking alcohol are 18 times higher among people who are homozygous for the null variant allele and five times higher among heterozygous people compared with people with two functioning alleles.w4 Possession of the null variant allele makes consumption of alcohol unpleasant; the high acetaldehyde concentrations induce facial flushing, palpitations, drowsiness, and other symptoms. People with null variant alleles drink much less alcohol (table).8 The table also shows the basic principle of mendelian randomisation: that genetic variants are not confounded; neither age nor smoking, likely confounders of associations between alcohol consumption and disease, are related to ALDH2 polymorphisms.
A longstanding debate about the cardioprotective effect of alcohol may be resolved by examining differences in risk factors and coronary heart disease events in people with different ALDH2 variants. Considerable evidence, including that derived from randomised controlled trials, suggests that alcohol increases high density lipoprotein (HDL) cholesterol concentrations w5 w6 (which should protect against coronary heart disease) and raises blood pressure (which should mitigate or reverse the protective effect of alcohol).w7 w8 In line with this, the ALDH2 genotype related to higher alcohol consumption is associated with higher HDL cholesterol concentrations and hypertension (table). The difference in HDL cholesterol concentration between ALDH2 variants is similar to what would be predicted from the difference in alcohol consumption between ALDH2 variants and the effects of alcohol consumption on HDL cholesterol observed in randomised trials.w5 w6 This mathematical similarity provides strong evidence for the utility of the mendelian randomisation approach.
Understanding effects of modifying physiological factors
Genetic variants can influence circulating biochemical factors such as cholesterol, homocysteine, or fibrinogen concentrations. This provides a method for assessing causality in associations observed between these measures (referred to in genetic epidemiology as intermediate phenotypes) and disease, and thus whether interventions to modify the intermediate phenotype could be expected to influence risk of disease. For example, familial hypercholesterolaemia is a mendelian dominant condition in which many rare mutations of the low density lipoprotein receptor gene w9 lead to high circulating cholesterol concentrations w10 and premature coronary heart disease.w11 The inference to be made from this evidence (now well accepted) is that high blood cholesterol concentration is an important cause of coronary heart disease in the general population.
In familial hypercholesterolaemia, blood cholesterol concentrations are about 3.0 mmol/l higher than in the general population. Thus if we assume a linear relation between blood cholesterol and risk of coronary heart disease, the evidence from randomised trials of statins would predict an increase in coronary heart disease of around twofold for people with familial hypercholesterolaemia, whereas fourfold risks are observed.w11
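The size of this discrepancy can be put on a per mmol/l scale. The sketch below takes the twofold and fourfold figures and the 3.0 mmol/l difference from the text, and assumes that relative risk changes by a constant factor for each 1 mmol/l of cholesterol (linearity on the log scale). That assumption and the function name `per_mmol_relative_risk` are illustrative choices, not part of the original analysis.

```python
def per_mmol_relative_risk(total_relative_risk, cholesterol_difference_mmol):
    """Relative risk per 1 mmol/l, assuming a constant multiplicative
    effect for each mmol/l (an illustrative assumption)."""
    return total_relative_risk ** (1.0 / cholesterol_difference_mmol)


# Figures quoted in the text: a 3.0 mmol/l difference, a twofold risk
# predicted from the statin trials, and a fourfold risk observed.
predicted = per_mmol_relative_risk(2.0, 3.0)
observed = per_mmol_relative_risk(4.0, 3.0)

print(f"per mmol/l, predicted from trials: {predicted:.2f}")
print(f"per mmol/l, observed in familial hypercholesterolaemia: {observed:.2f}")
```

On this scale the trial-based prediction corresponds to a relative risk of roughly 1.26 per mmol/l, against roughly 1.59 per mmol/l implied by the observed fourfold risk.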
Have the statin trials underestimated the effect of long term lowering of cholesterol concentrations? As atherosclerosis builds up over many years, short term trials of cholesterol lowering in adulthood would not mirror the lifetime effects of high blood cholesterol concentration in familial hypercholesterolaemia. In the statin trials the relative reduction in mortality from coronary heart disease increases over time from randomisation (and thus time with lowered cholesterol concentrations). This is what would be expected if raised cholesterol concentrations cause clinical atherosclerosis over decades. Furthermore, the strength of the association increases as the lag period between cholesterol measurement and mortality increases,w12 indicating that long term rises in cholesterol concentration are the important aetiological factor and that the longer the reduction in cholesterol, the greater the benefits.
In mendelian randomisation approaches, genetic variants are equivalent to lifetime differences in blood cholesterol and indicate the long term effects of lower blood cholesterol concentrations on disease. They therefore generate more realistic estimates of causal effects that are free from measurement error or short term fluctuations in cholesterol concentrations, both of which may dilute the strength of association.
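The dilution caused by measurement error can be shown with a short simulation. This is a sketch under assumed values only: the true slope (0.5), the size of the measurement error, the sample size, and the names `ols_slope` and `simulate_dilution` are hypothetical and are not drawn from any study cited here.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate_dilution(n=20000, true_slope=0.5, error_sd=1.0, seed=2):
    """Compare the slope for a long term (usual) exposure level with the
    slope for a single error-prone measurement of the same exposure."""
    rng = random.Random(seed)
    usual_level, single_measure, outcome = [], [], []
    for _ in range(n):
        usual = rng.gauss(0, 1)                     # long term exposure level
        measured = usual + rng.gauss(0, error_sd)   # one noisy measurement
        y = true_slope * usual + rng.gauss(0, 1)    # outcome depends on usual level
        usual_level.append(usual)
        single_measure.append(measured)
        outcome.append(y)
    return ols_slope(usual_level, outcome), ols_slope(single_measure, outcome)


slope_usual, slope_measured = simulate_dilution()
print(f"slope using long term level: {slope_usual:.2f}")
print(f"slope using single measurement: {slope_measured:.2f}")
```

With the assumed error variance equal to the variance of the true exposure, the slope based on a single measurement is expected to be about half the slope based on the long term level (about 0.25 rather than 0.5).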
In some circumstances, physiological risk factors that seem to be targets for intervention may actually be influenced by the disease process itself— “reverse causation.” For example, the presence of atherosclerosis increases circulating fibrinogen concentrations. Although fibrinogen concentration predicts risk of coronary heart disease, a genetic variant associated with higher concentrations of fibrinogen is not associated with higher risk of disease.9 This suggests that reverse causation may generate the association between fibrinogen and coronary heart disease and, crucially, that lowering fibrinogen concentrations may not prevent disease. However, in such situations the genetic association studies need to be of very large sample size to provide robust evidence.9
Swinging the lead or toxic jobs?
The potential hazards of sheep dip have prompted parliamentary questions in Britain and considerable discussion about the merits of banning organophosphates used in sheep dip.w13 Farmers exposed to sheep dips containing organophosphates attribute a variety of chronic symptoms to sheep dip.10 In 1999, however, a British government committee considered that the evidence did not support a causal link between sheep dip and farmers' symptoms.w14 Randomised trials of exposure to sheep dip would not be feasible, and as people in studies of organophosphate exposure generally know the possible health problems, it is difficult to conduct valid case-control studies.
Genetic variants that modify the biological response to the exposure (in this case variants related to detoxification of organophosphates10) can indicate the effects of different levels of exposure. Different forms of the enzyme paraoxonase have varying ability to detoxify sheep dip. If organophosphates truly cause ill health, then people with the genetic variants that are less efficient would be expected to form a higher proportion of exposed people with symptoms than those with more efficient variants, and this is what is found.10 Since it is unlikely that the detoxification genotype will affect a person's tendency to report symptoms or to desire compensation or early retirement, these findings provide evidence that exposure to sheep dip has a causal effect on health.
Intrauterine environment and health
Fetal exposures during pregnancy are difficult to measure but may be modified by parental genotype. For example, folate deficiency in pregnancy is now known to be a cause of neural tube defects.w15 w16 A polymorphism in the gene MTHFR is associated with metabolic effects equivalent to those seen with lower folate intake, and in a meta-analysis of case-control studies of neural tube defects, mothers homozygous for this variant (TT) had double the risk of having an infant with a neural tube defect compared with mothers homozygous for the CC variant.11 The relative risk of a neural tube defect associated with the TT genotype in the infant was less than that observed for maternal genotype, and paternal genotype had no effect on risk. This suggests that the intrauterine environment, influenced by maternal TT genotype operating as a proxy for lower maternal folate concentrations, rather than the genotype of the offspring increases the risk of neural tube defect. The association between maternal MTHFR genotype and risk of neural tube defects in offspring provides evidence that maternal folate intake is a key aetiological factor.
Limitations of mendelian randomisation
Clearly the approach has potential, but several limitations need to be considered (box). Establishing reliable associations between the genotype or intermediate phenotype and the disease is a particular concern and is largely related to the limited sample size of current studies.
This problem relates to all genetic association studies.13 A recent illustration of this is the association between MTHFR genotype and coronary heart disease. Since the MTHFR TT genotype is associated with raised homocysteine concentrations, a meta-analysis of studies that showed increased risk of disease associated with this genotype was taken as strong evidence of the causal nature of the association between homocysteine and coronary heart disease14 and used to support a protective effect of folate, which lowers homocysteine concentrations.15 Recently a large study has shown that the strength of the association (and thus the protective effect of folate) may have been overestimated.16 An updated meta-analysis showed evidence of potential publication bias (Begg test z = 2.06, P = 0.039).
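The reported P value can be checked from the z statistic. The sketch below assumes that the Begg test statistic is referred to a standard normal distribution and that the P value is two sided; the function name `two_sided_p` is illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a z statistic under a standard normal
    distribution (an assumption about how the reported P was obtained)."""
    return math.erfc(abs(z) / math.sqrt(2))


p_value = two_sided_p(2.06)
print(f"P = {p_value:.3f}")  # approximately 0.039
```

The result agrees with the P = 0.039 quoted for z = 2.06.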
Uses and limitations of mendelian randomisation in observational studies12
Confounding—genetic variants will not generally be liable to confounding by behavioural, socioeconomic, and physiological factors
Reverse causation—genetic variants will not be influenced by the onset of disease or by the tendency for individuals with disease to differentially report exposure history
Selection biases—genetic variants will not generally be influenced by factors determining how participants are selected into a study, either as a case or a control.
Attenuation by errors (regression dilution bias)—genetic variants will indicate differences in exposure level across the lifetime and associations will not be attenuated by random imprecision in measurement of the exposure.
Cannot be used without a reliable association between genotype and disease
Confounding of genotype by linkage disequilibrium between the genetic variant of interest and another genetic variant that influences the outcome
Genetic variants with multiple (pleiotropic) effects may lead to misleading conclusions
Canalisation and developmental stability
Inadequate biological understanding of the function of genetic variants
Although control of confounding is one of the advantages of mendelian randomisation, it is essential to show the relation of potential confounders to genotype to be sure that this is the case. Confounding may arise either through multiple effects of a genotype (pleiotropy) or through one genetic variant being physically close to (and thus transmitted with) another functional variant (linkage disequilibrium). Most tests of such potential confounding have suggested that it is either negligible or much less than is observed in conventional epidemiology. However, the possibility of such confounding should be examined in all studies. For example, in the study we discussed above of paraoxonase variants and symptoms of sheep dip toxicity, the relation between genotype and symptom reporting among those unexposed to sheep dip should be explored to test the assumption that genotype is not related to differential symptom reporting.
Another problem is developmental compensation. If a person has developed and grown within an environment in which one factor is perturbed because of a particular genetic variant, they may be rendered resistant to its influence through permanent changes in tissue structure and function that counterbalance its effects, so called canalisation.7
Observational epidemiological studies have often identified spurious apparent causes of disease
Confounding has probably contributed to many of these misidentifications
Genetic variants that proxy for environmentally modifiable risk factors are not generally subject to confounding
Mendelian randomisation (use of genetic variations as proxies for modifiable risk factors in observational studies) is a powerful new strategy in epidemiology
Study of genetic variants can lead to inferences on how population-wide changes in environmental exposures could reduce risk of disease.17–20 Importantly, the inferences are concerned with attribution of causality that is relevant to whole populations and are not concerned with targeting interventions on those with specific genetic variants. By applying the increasing knowledge of human genetics, it is possible to improve knowledge of how diverse environmental exposures, many of which are socially patterned, shape population health and where future public health prevention should be targeted.
References w1-w16 are on bmj.com
We thank Nancy Krieger, John Lynch, and Neil Pearce for their comments.
Contributors and sources: GDS and SE are experienced epidemiologists who have set up and maintain large observational cohort studies. Both authors planned the content and drafted the manuscript. The authors reviewed every hit produced by “Mendelian randomisation” and “Mendelian randomization” on Google and Google Scholar and all Web of Science citations to key references.
Competing interests: None declared.