Clinical Review: Science, medicine, and the future

Translating genomics into improved healthcare

BMJ 2010; 341 doi: http://dx.doi.org/10.1136/bmj.c5945 (Published 05 November 2010) Cite this as: BMJ 2010;341:c5945
  1. Aroon D Hingorani, professor of genetic epidemiology, British Heart Foundation senior research fellow1 2,
  2. Tina Shah, postdoctoral research fellow2,
  3. Meena Kumari, senior research fellow1,
  4. Reecha Sofat, specialist registrar in clinical pharmacology2,
  5. Liam Smeeth, professor of clinical epidemiology, Wellcome Trust senior clinical fellow3
  1. Genetic Epidemiology Group, Department of Epidemiology and Public Health, University College London, UK
  2. Centre for Clinical Pharmacology, Division of Medicine, University College London
  3. Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, WC1E 7HT, UK
Correspondence to: L Smeeth liam.smeeth{at}lshtm.ac.uk

Summary points

  • Until recently the genetic basis of most common diseases remained elusive

  • Genome wide association analysis has now uncovered thousands of regions of human DNA where sequence variation influences susceptibility to common diseases

  • Common single nucleotide polymorphisms associated with disease are distributed across multiple chromosomal regions, have modest effects on disease risk, act additively, and explain only part of disease heritability

  • Association analysis is starting to provide data on the causes of common human diseases that should accelerate the design and development of new treatments

  • Emerging technologies, including rapid, less costly sequencing of whole genomes, bring the prospect of better diagnostics and treatment

  • Clinicians will need to keep updated on genetic advances that have healthcare applications

Since the work of Mendel,1 genetic research has been punctuated by key advances that changed how inheritance was understood (box 1), culminating in the first draft sequence of the human genome in 2001.2 3 4 Although most of the human genome sequence is shared by everyone, a small proportion varies between individuals. This variation, acting together with environmental factors, is thought to underlie differences in physiology, susceptibility to disease, and responses to drugs. We summarise recent discoveries and review the implications of this newly acquired knowledge for medical practice and public health.

Sources and selection criteria

Because of the wide remit of this article, we did not attempt a systematic search covering the whole of translational genetics. Instead, we used personal collections of major journal articles and reviews accumulated individually by all authors over several years of academic work in translational genetics.

Box 1 Key early milestones in genetic research

  • Discovery of nucleic acid (Miescher 1869)

  • Identification of its key constituents: the organic bases adenine (A), thymine (T), guanine (G), and cytosine (C) (Kossel 1879), held on a backbone of deoxyribose or ribose sugars in DNA and RNA, respectively (in RNA, uracil takes the place of thymine)

  • The Boveri-Sutton chromosome theory of inheritance (1902)

  • The proposal that DNA (not protein) is the heritable genetic material that distinguishes organisms of different species (Chargaff 1950)

  • The discovery of the double helical structure of DNA (Crick and Watson 1953) provided the necessary insight into how base sequence information can be replicated and transmitted from parent to offspring. It also pointed the way to understanding how the DNA sequence is translated into the primary amino acid sequence of proteins, which, in turn, determines the structure and function of an organism. The unidirectional flow of information from DNA to RNA to protein was subsequently summarised in the central dogma (Crick 1958)

What is known about the molecular basis of single gene disorders?

Uncommon single gene (Mendelian) disorders, such as cystic fibrosis and familial forms of type 2 diabetes, colon cancer, and breast cancer, are caused mainly by mutations, usually in the coding sequence of a gene, that produce a major structural or functional disruption of the encoded protein. The mutation is both necessary and sufficient for disease to develop, and the pattern of disease transmission from parent to offspring is predictable. Mendelian disorders are categorised as autosomal dominant, autosomal recessive, or X linked, depending on whether the mutation is located on an autosome or a sex chromosome and on whether one or two mutant alleles are needed for the disease to manifest itself. The responsible genes were identified mainly through analysis of DNA samples from multigenerational families (pedigrees) with affected and unaffected members, a technique called linkage analysis.5 6 Coinheritance of a genetic marker of known chromosomal location with the disease phenotype allows the position of the disease gene to be “mapped” in these families, and DNA sequencing of candidate genes in the linked region then allows the disease causing mutations to be identified. For some conditions, mutations in different genes can produce the same disease phenotype, a phenomenon known as locus heterogeneity.7 For some diseases, the precise disease causing mutations, even within the same gene, can also differ from family to family; this is referred to as allelic heterogeneity.8 These discoveries provided great insight into the normal biological function of the affected genes and proteins, and in some cases have led to the development of predictive genetic tests and of gene based or drug treatments. A comprehensive list of single gene disorders and their molecular basis can be found online (www.ncbi.nlm.nih.gov/omim/).
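The predictable transmission of autosomal disorders can be made concrete by enumerating the alleles each parent can pass on (a Punnett square). The minimal sketch below assumes a single autosomal locus and full penetrance, and uses "A" for the disease allele; the function names are illustrative only, and X linked inheritance is not covered.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Probability of each offspring genotype at one autosomal locus.

    Each parent is a two-letter genotype such as "Aa" and transmits one of
    its two alleles with equal probability (Mendel's law of segregation).
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def affected_probability(parent1: str, parent2: str, mode: str) -> Fraction:
    """Chance a child is affected, assuming full penetrance.

    mode is "dominant" (one copy of "A" suffices) or "recessive" (two copies needed).
    """
    copies_needed = 1 if mode == "dominant" else 2
    return sum(
        (p for g, p in offspring_genotypes(parent1, parent2).items()
         if g.count("A") >= copies_needed),
        Fraction(0),
    )


# Autosomal dominant: affected heterozygous parent x unaffected parent
print(affected_probability("Aa", "aa", "dominant"))   # 1/2
# Autosomal recessive: two unaffected carrier parents
print(affected_probability("Aa", "Aa", "recessive"))  # 1/4
```

Exact fractions are used so that the familiar 1 in 2 and 1 in 4 risks appear without rounding.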

What is known about the genetic basis of complex diseases?

Diseases that commonly affect adults (such as cardiovascular disease, diabetes, Parkinson’s disease, Alzheimer’s disease, and common cancers) result from a more complex interplay between genetic and environmental factors. Such diseases exhibit familial clustering, but there is no clear inheritance pattern because of the polygenic aetiology and the substantial contribution from the environment. Family based linkage analysis can rarely identify genetic mutations associated with these disorders because people are typically over 50 years old at clinical presentation; by this time their parents may have died and their children may be too young to manifest the disorder. The “common-disease common-variant” hypothesis proposes that these diseases arise from many common DNA variants, each with a modest influence on disease risk.9 10 The presence of a particular genetic variant is neither necessary nor sufficient to cause disease but confers a modest increase in risk.

After publication of the draft human sequence in 2001, attention turned to cataloguing and studying the effects of common variations in the DNA sequence on the risk of common diseases. Single nucleotide polymorphisms (SNPs) are variations at a single base pair (fig 1) and are the most common type of human sequence variation, occurring about every 500-1000 DNA base pairs. Less common types of variation include single nucleotide insertions and deletions (indels) and deletion or duplication of longer tracts of DNA (copy number variation) (fig 1).


Fig 1 The spectrum of common genetic variation includes single nucleotide polymorphism, insertion and deletion polymorphism, nucleotide repeat polymorphism, and copy number variation, all of which may affect coding or non-coding regions of DNA
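The distinction between the variant types in fig 1 can be sketched by comparing a reference allele with an alternative allele at the same position. This is a simplified classifier written for illustration (real variant formats such as VCF, and variant calling software, apply many more rules); the function name and the 50 base cut-off separating small indels from larger structural changes are assumptions, not taken from the article.

```python
def classify_variant(ref: str, alt: str, large_change_min: int = 50) -> str:
    """Roughly classify a variant from its reference and alternative alleles.

    large_change_min is an assumed cut-off: length changes at least this big
    are labelled as larger structural (for example copy number) changes.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no variant"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"                          # one base swapped for another
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "multi-base substitution"      # same length, several bases differ
    kind = "insertion" if length_change > 0 else "deletion"
    if abs(length_change) >= large_change_min:
        return f"large {kind} (structural change)"
    return f"indel ({kind})"


for ref, alt in [("A", "G"), ("C", "CT"), ("GTA", "G"), ("A", "A" + "T" * 120)]:
    print(ref, alt[:6], "->", classify_variant(ref, alt))
```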

The SNP Consortium developed the SNP database (www.ncbi.nlm.nih.gov/SNP/) to identify the most common single nucleotide variations in the genome. Each SNP in this database is allocated a unique reference number. The Human HapMap consortium (http://hapmap.ncbi.nlm.nih.gov/index.html.en) was then established to quantify the non-random association between nearby SNPs (linkage disequilibrium) in human populations of differing ancestry. Coupled with new technological developments, this provided the framework to develop arrays capable of typing around 500 000 SNPs across the genome. Information on another million or more SNPs could then be inferred from the known associations between SNPs. These arrays, or “SNP chips,” provide a cost effective way to genotype many thousands of people.11
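Linkage disequilibrium between two SNPs is commonly summarised by the statistics D, D′, and r², computed from haplotype and allele frequencies; a high r² is what allows one typed SNP to stand in for, or help infer, another. The sketch below assumes the four haplotype frequencies are already known (phased) and uses made-up values; it does not reproduce HapMap's own analysis pipelines.

```python
def ld_statistics(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    Arguments are the frequencies of the four possible haplotypes.
    """
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab                 # frequency of allele A at the first SNP
    p_B = p_AB + p_aB                 # frequency of allele B at the second SNP
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    D = p_AB - p_A * p_B              # excess of AB haplotypes over chance expectation
    # D' scales D by its largest possible value given the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D, D / d_max, D ** 2 / denom


# Hypothetical SNP pair in which allele A almost always travels with allele B
D, d_prime, r2 = ld_statistics(0.28, 0.02, 0.02, 0.68)
print(f"D={D:.3f}, D'={d_prime:.2f}, r2={r2:.2f}")
```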

To investigate the genetic basis of common diseases, genome-wide case-control studies have compared the allele frequencies of typed (and inferred) SNPs in large numbers of unrelated people affected by a disease and in unaffected unrelated controls.12 Instead of the single exposure evaluated in a non-genetic case-control study (such as smoking), these studies examine hundreds of thousands of genetic exposures simultaneously. Points in the genome at which allele frequencies differ between cases and controls harbour, or lie close to, the genetic variants contributing to disease risk. Figure 2 shows the results of such a study of myocardial infarction.13 The website of the National Human Genome Research Institute’s Office of Population Genomics contains an up to date list of genome-wide association studies and publications (www.genome.gov/GWAStudies/).
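Each of the hundreds of thousands of tests in such a study can be as simple as comparing allele counts between cases and controls. A minimal sketch of one common choice, an allelic 2×2 chi-squared test with an allelic odds ratio, follows; the counts are invented, and published studies often use other tests (for example, logistic regression adjusted for covariates).

```python
import math


def allelic_test(case_risk, case_other, control_risk, control_other):
    """Allelic chi-squared test (1 df, no continuity correction) and odds ratio.

    Arguments are allele counts: each person contributes two alleles.
    """
    table = [[case_risk, case_other], [control_risk, control_other]]
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared with 1 df
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    return chi2, p_value, odds_ratio


# Hypothetical study: 2000 cases and 2000 controls, so 4000 alleles in each group
chi2, p, odds_ratio = allelic_test(1400, 2600, 1200, 2800)
print(f"chi2={chi2:.1f}, P={p:.1e}, OR={odds_ratio:.2f}")
```

With these invented counts the association is convincing for a single test but would not pass the genome-wide thresholds used when hundreds of thousands of SNPs are tested.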


Fig 2 Manhattan plot from a genome-wide association study of myocardial infarction (adapted, with permission, from Samani and colleagues13). The x axis shows the positions along the genome (separated by chromosome) of each of the several hundred thousand single nucleotide polymorphisms (SNPs; each represented by a dot) evaluated. The y axis shows the negative base 10 logarithm of the P value for a test of association between each SNP and the binary outcome (presence or absence of disease). For example, a −log P value of 7 (shown by the dashed red line) equates to a P value for the association of an SNP with disease of 1×10⁻⁷ or 0.0000001, and a −log P value of 7.3 equates to 5×10⁻⁸
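The y axis transformation in this legend is a base 10 logarithm, so each unit step up the plot represents a tenfold smaller P value. A short sketch checking the figures quoted in the legend (the function names are illustrative):

```python
import math


def minus_log10(p_value: float) -> float:
    """P value on the -log10 scale used for a Manhattan plot's y axis."""
    return -math.log10(p_value)


def p_from_minus_log10(score: float) -> float:
    """Invert the transform to recover the P value."""
    return 10 ** (-score)


print(round(minus_log10(1e-7), 2))   # 7.0
print(round(minus_log10(5e-8), 2))   # 7.3
```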

Genome wide association studies involve hundreds of thousands of tests of association, so the number of false positive SNP-disease associations would be high if conventional thresholds were used to determine significant associations. Thus, stringent criteria are used before “genome-wide significance” can be declared. Typically, P value thresholds of 1×10⁻⁷ or 5×10⁻⁸ are used.14 A P value of 5×10⁻⁸ can be thought of as a P value of 0.05 with a Bonferroni correction for one million statistical tests (a typical number of variants genotyped or inferred in a genome-wide association study). Even with such stringent thresholds, positive findings are routinely replicated in independent datasets to confirm or refute association. Because SNPs of smaller effect size can be identified by increasing the available sample size, research consortiums focus on a single disease so that data from several case-control collections can be pooled and summarised using meta-analysis. This approach has increased the number of genetic loci identified for many disorders.15
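Two calculations in this paragraph can be sketched directly: the Bonferroni-derived genome-wide threshold and the pooling of results from several case-control collections. The article does not say which meta-analysis method consortiums use; the inverse-variance fixed-effect method below, applied to log odds ratios from three made-up studies, is one common choice shown here as an assumption.

```python
import math

# Bonferroni: a 0.05 family-wise error rate spread over one million tests
GENOME_WIDE_ALPHA = 0.05 / 1_000_000          # 5e-8


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance fixed-effect pooling of per-study log odds ratios.

    Returns (pooled log OR, its standard error, two-sided P value).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal P value
    return pooled, pooled_se, p_value


# Hypothetical: three studies, each estimating an odds ratio of about 1.1
betas = [math.log(1.12), math.log(1.08), math.log(1.10)]
ses = [0.03, 0.025, 0.02]
beta, se, p = fixed_effect_meta(betas, ses)
print(f"pooled OR={math.exp(beta):.3f}, P={p:.1e}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this invented example no single study reaches 5×10⁻⁸ on its own, but the pooled estimate, with its smaller standard error, does.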

How might findings from genome-wide association studies affect healthcare?

Accumulated evidence from such studies is now helping to determine the direction of future research and to clarify where future healthcare applications are likely to lie. Typically, many genetic regions contribute to increased risk of complex disease (20 loci have been identified for type 2 diabetes and 40 for Crohn’s disease), but the effect at each region is weak (5-10% increase in risk) and seems to be additive and independent. As a result, genome wide association studies to date have explained only a small part of the heritability of common disorders (table). The findings currently have limited value for predicting disease risk but may have other important implications for healthcare provision (box 2).

Box 2 Insight into disease aetiology from genome-wide association studies

In Crohn’s disease, many of the single nucleotide polymorphisms (SNPs) associated with disease susceptibility lie in and around genes concerned with autophagy, which was previously not considered an important disease mechanism in Crohn’s disease.

In type 2 diabetes many of the loci encode proteins concerned with insulin secretion rather than insulin signalling, which had previously been the focus of research.

Several of the genetic loci associated with the risk of coronary heart disease influence low density lipoprotein-cholesterol, but some lie far from known genes and seem not to influence any of the known risk factors for myocardial infarction. Thus, previously unknown disease mechanisms might be at work.

Data from different disorders show that some genetic regions or SNPs affect the risk of more than one disease. For example, different SNPs in the same region on chromosome 12 influence the risk of coeliac disease, type 1 diabetes, and myocardial infarction. The same SNP in the TCF2 (HNF1B) gene on chromosome 17 is associated with the risk of developing type 2 diabetes and prostate cancer. These and other examples indicate that some common disorders have a partially overlapping aetiological basis, and this may lead to a new disease taxonomy.

Recently identified genetic variants associated with disease*

Can genetics help predict risk of common disease?

Predictive genetic tests based on findings from genome-wide association studies are being offered commercially, despite concerns about their clinical value.16 17 With a few notable exceptions (such as age related macular degeneration18; table), carrying any one common risk allele increases the chance of experiencing a common disease event by only a fraction (typically 5-25%). This makes tests based on a single SNP poorly predictive. Several risk alleles, often in different genes, may contribute to increased risk of disease, so would information from a panel of common modest effect alleles predict disease better than a single common risk allele? A person with 10 risk alleles, each in a different gene and each conferring a 20% increase in risk, might be expected to be at double the risk of disease compared with someone carrying none, and such a high risk person might benefit from a targeted preventive intervention. However, people with 10 risk alleles are rare in the population. For example, if the average frequency of a risk allele in a population is 30% (0.3), the probability of inheriting 10 such independent alleles is 0.3¹⁰ (0.0000059). Because the number of common, independently inherited risk alleles that people carry follows an approximately normal (bell shaped) distribution, and the association with risk is additive, more cases of disease would be expected among the many people with an intermediate number of risk alleles than among the minority with a large number. Thus, the frequency distributions of risk alleles should overlap substantially between eventual cases and controls, making it difficult to separate the two groups by the number of risk alleles carried (fig 3).
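The argument in this paragraph can be checked numerically. The sketch below keeps the paragraph's simplifying assumptions (10 independent risk alleles, each carried with probability 0.3, each adding 20% to risk) and additionally assumes, for illustration, that the increments add on the relative risk scale. The baseline risk cancels out of the proportions computed, so it is not needed.

```python
from math import comb

N_ALLELES = 10            # independent risk alleles considered (from the text)
FREQ = 0.3                # probability of carrying each risk allele (from the text)
EXCESS_PER_ALLELE = 0.2   # 20% added risk per allele; additivity is an assumption


def allele_count_distribution(n=N_ALLELES, p=FREQ):
    """Binomial probability of carrying exactly k risk alleles, k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def share_of_cases(n=N_ALLELES, p=FREQ, excess=EXCESS_PER_ALLELE):
    """Fraction of all expected cases arising in people with k risk alleles."""
    dist = allele_count_distribution(n, p)
    # Relative risk with k alleles under the additive model: 1 + excess * k
    weighted = [prob * (1 + excess * k) for k, prob in enumerate(dist)]
    total = sum(weighted)
    return [w / total for w in weighted]


dist = allele_count_distribution()
shares = share_of_cases()
print(f"P(all 10 risk alleles) = {dist[10]:.7f}")   # 0.0000059
print(f"share of cases with 2-4 risk alleles: {sum(shares[2:5]):.0%}")
print(f"share of cases with all 10 risk alleles: {shares[10]:.6%}")
```

Most expected cases arise in the large middle of the distribution, while the rare carriers of many risk alleles contribute a negligible share, which is why the case and control distributions in fig 3 overlap so much.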


Fig 3 Association between the population frequency distribution of type 2 diabetes risk alleles (bars) and risk of incident diabetes (red line) in the Whitehall II study (adapted from Talmud and colleagues19)

In disease prevention the aim is often to stratify risk rather than to discriminate between people who will and will not have events. This is because many preventive interventions produce the same relative risk reduction whatever the baseline risk, so the absolute benefit is larger (and the number needed to treat smaller) in people at high risk. If the number needed to treat to prevent one event among people with 10 risk alleles were 100, the number needed to screen to prevent one event would be 100/0.000006 (about 16.7 million). Although genotype based tests are cheap, screening such a large number of people to prevent a single event would be very costly.
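The arithmetic in this paragraph links relative risk reduction, absolute risk, and screening effort. A minimal sketch, assuming (as the paragraph does) a constant relative risk reduction; the 10% and 2% absolute risks and the 50% reduction in the first two usage lines are invented for illustration, while the last line reproduces the paragraph's own figures.

```python
def number_needed_to_treat(absolute_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction, where ARR = baseline risk x RRR."""
    return 1 / (absolute_risk * relative_risk_reduction)


def number_needed_to_screen(nnt, prevalence_of_high_risk_genotype):
    """People screened to find enough high risk carriers to prevent one event."""
    return nnt / prevalence_of_high_risk_genotype


# Same 50% relative reduction: the higher the risk, the smaller the NNT
print(round(number_needed_to_treat(0.10, 0.5)))   # 20
print(round(number_needed_to_treat(0.02, 0.5)))   # 100
# Figures from the text: NNT of 100, genotype carried by about 0.000006 of people
print(f"{number_needed_to_screen(100, 0.000006):,.0f}")   # 16,666,667
```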

Genetic tests that capture a wider range of variability in a given gene or region, rather than simply a few SNPs, may be better for prediction. The discovery of rare or intermediate frequency alleles that have a much larger effect size than common HapMap alleles may also open up greater opportunities.20 Tests of rarer alleles may be useful in family based screening, such as that used for monogenic familial hypercholesterolaemia, which currently uses cholesterol measurement rather than genotyping.21

Can genetics improve understanding of the non-genetic causes of disease?

The genotype is unique among naturally occurring differences between people because it is allocated at random,1 fixed throughout life, and not modified by disease. Interpretation of genetic associations is therefore largely protected from confounding (where exposure and disease seem to be associated because of common association with a third factor) and from reverse causation (where the association between an exposure and disease arises because the disease itself alters the exposure).22 23 24 Thus when the function of a gene is known, its association with a disease (however weak) can provide clear insight into the causal mechanisms leading to disease (fig 4). For example, variants in the ADH1 (alcohol dehydrogenase) gene are associated with differences in long term alcohol consumption, which is otherwise difficult to measure accurately, so these variants can provide a useful index of usual long term alcohol intake. The association between these variants and oesophageal cancer provides evidence for a causal role of alcohol consumption in the disease.25 Another example is C reactive protein (CRP) and its role in coronary heart disease. Raised concentrations of CRP are associated with an increased risk of heart disease. However, bias, confounding, and reverse causation may partly or wholly explain these associations, so we do not know whether lowering CRP would reduce the risk of cardiovascular events. Genetic variants that influence CRP concentrations are less prone to confounding, and reverse causation is not a problem. The presence or absence of an association between CRP genetic variants and disease can thus provide clear evidence on whether CRP actually plays a causal role in disease.26 27
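The logic of the CRP example can be illustrated with a small simulation. Everything in it is assumed for illustration: an unmeasured confounder raises both a biomarker and a continuous outcome, a genotype shifts only the biomarker, and by default the biomarker has no causal effect. The observational regression is then biased, whereas the ratio (Wald) estimator, which divides the gene-outcome association by the gene-biomarker association, comes out close to the true value. Real Mendelian randomisation analyses rest on further assumptions, notably that the variant affects the outcome only through the biomarker.

```python
import random


def ols_slope(x, y):
    """Least squares slope of y on x, i.e. cov(x, y) / var(x)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=200_000, true_effect=0.0, seed=1):
    """Return (observational estimate, Wald ratio estimate) from simulated data."""
    rng = random.Random(seed)
    genotype, biomarker, outcome = [], [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)   # 0, 1 or 2 variant alleles
        u = rng.gauss(0, 1)                                 # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0, 1)                   # biomarker level (hypothetical)
        y = true_effect * x + u + rng.gauss(0, 1)           # continuous outcome (hypothetical)
        genotype.append(g)
        biomarker.append(x)
        outcome.append(y)
    observational = ols_slope(biomarker, outcome)
    # Wald ratio: gene-outcome slope divided by gene-biomarker slope
    wald_ratio = ols_slope(genotype, outcome) / ols_slope(genotype, biomarker)
    return observational, wald_ratio


obs, mr = simulate()
print(f"observational: {obs:.2f}, Mendelian randomisation: {mr:.2f}")
```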


Fig 4 Conceptual parallels between a randomised controlled trial and a Mendelian randomisation experiment to judge the causal relevance of a biomarker associated with risk of cardiovascular disease (adapted from Casas and colleagues28)

Can genetics lead to improved therapeutics?

Using pharmacogenetics to develop genotype based predictive tests of drug response may help to personalise or stratify therapeutic interventions at an individual or group level. The number of pharmacogenetic tests currently approved for clinical use is limited. A recent systematic review of pharmacogenetic studies highlighted several methodological and other problems in this area,29 notably small underpowered studies and the widespread use of surrogate outcome measures. However, genetic loci associated with the risk of statin myopathy have recently been identified,30 as well as loci associated with hypersensitivity to the nucleoside reverse transcriptase inhibitor abacavir,31 response to interferon treatment in hepatitis C infection,32 and dose requirements in people taking warfarin.33 Pharmacogenetic testing may eventually become common in some therapeutic areas, but the clinical value and cost effectiveness of emergent pharmacogenetic tests have not yet been subject to the same level of scrutiny and careful appraisal as other diagnostic tests.

As well as offering the potential for new pharmacogenetic tests, emerging genome-wide association studies of drug response could provide new insights into the pathways by which even well established drugs are handled by the body. The concept of pharmacogenetics is expanding to incorporate the use of genetic data not only to guide treatment but also to inform drug development. This is because population studies of variants in genes encoding a drug target protein can be considered a type of natural randomised trial and could help predict the on-target effects of modifying the same target pharmacologically. For example, common variants in the CETP gene, which encodes cholesteryl ester transfer protein (the target of the CETP inhibitor torcetrapib), were associated with the same lipid and lipoprotein changes seen with torcetrapib treatment but were not associated with high blood pressure, an off-target effect of torcetrapib.34 The hope is that genetic studies applied early in drug development, by providing randomised evidence of the effects of modifying a target in humans without exposing participants to a new molecule of uncertain safety and efficacy, will help validate new drug targets and reduce the risk of late stage failure.

Future directions

Much of the heritability of common diseases cannot be explained by common SNPs, so the focus is now moving towards other types of genetic variation, such as copy number variation and rare single nucleotide variants. Interest is also increasing in heritable changes in gene expression that are not caused by changes in the DNA sequence itself, such as those mediated by DNA methylation or histone modification.20 35

Some alleles with large effects on disease risk are likely to be rare (because natural selection reduces their frequency over time) and thus are not well represented on whole genome arrays. Efforts to find them are being stimulated by another technological advance: the ability to sequence an entire human genome, or just its protein coding regions (exome sequencing), using “next generation” sequencing technology. In contrast to the first draft human genome sequence, which took several years and millions of pounds to complete, sequencing a genome now takes a few days and costs about £8000 (€9000; $12 600). Systematic rare variant discovery is now being undertaken as part of the international 1000 Genomes Project (www.1000genomes.org/page.php) and the UK10K Consortium. Whole exome sequencing is expected to identify rare variants that influence risk of common disorders, and to identify mutations underlying sporadic single gene disorders that have not been amenable to linkage analysis because multigenerational pedigrees are not available. The variants discovered may eventually prove valuable in family based genetic tests.
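One reason systematic discovery needs many sequenced genomes is that a rare allele is often simply absent from a modest sample. Assuming chromosomes are sampled independently from the population, the chance of observing an allele of population frequency f at least once among N people (2N chromosomes) is 1 − (1 − f)^(2N). The frequencies and sample sizes below are illustrative only.

```python
def prob_observed(allele_freq: float, n_people: int) -> float:
    """Chance an allele appears at least once in n_people diploid genomes.

    Assumes each of the 2 * n_people chromosomes is sampled independently.
    """
    return 1 - (1 - allele_freq) ** (2 * n_people)


for freq in (0.05, 0.01, 0.001):
    for n in (100, 1000):
        print(f"allele frequency {freq}: {n} genomes -> {prob_observed(freq, n):.2f}")
```

Common HapMap-type alleles are almost certain to be seen in 100 genomes, whereas an allele carried on 1 in 1000 chromosomes is more likely than not to be missed at that sample size.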

Finally, analysis of the effects of newly discovered genetic loci in representative population based cohort studies (not case-control studies) is beginning to provide better information on the absolute (not relative) risk of common diseases, as well as insight into how environmental factors modify genetic effects. The UK Biobank project (www.ukbiobank.ac.uk/), a prospective study that has recruited more than 500 000 volunteers, stored millions of biological samples (including DNA), and recorded information on lifestyle, will provide a new resource for scientists studying the environmental and genetic determinants of a wide range of common diseases in future decades.
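The difference between the relative measures that case-control studies estimate and the absolute risks a cohort can provide can be shown with a hypothetical cohort table; the counts below are invented for illustration.

```python
def cohort_measures(cases_carriers, n_carriers, cases_noncarriers, n_noncarriers):
    """Absolute and relative risk measures from a prospective cohort."""

    def odds(cases, n):
        return cases / (n - cases)

    risk_carriers = cases_carriers / n_carriers
    risk_noncarriers = cases_noncarriers / n_noncarriers
    return {
        "absolute risk, carriers": risk_carriers,
        "absolute risk, non-carriers": risk_noncarriers,
        "risk difference": risk_carriers - risk_noncarriers,
        "relative risk": risk_carriers / risk_noncarriers,
        "odds ratio": odds(cases_carriers, n_carriers)
        / odds(cases_noncarriers, n_noncarriers),
    }


# Hypothetical follow-up of 20 000 people, 8000 of whom carry a risk genotype
for name, value in cohort_measures(240, 8000, 240, 12000).items():
    print(f"{name}: {value:.3f}")
```

With an uncommon outcome the odds ratio (what a case-control study estimates) is close to the relative risk of 1.5, but only the cohort reveals that this corresponds to one extra case per 100 carriers over the follow-up period.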

Conclusions

As the use of whole genome sequencing becomes more widespread, an improved understanding of the causes of disease, better targeted drug treatments, and perhaps prediction of risk are realistic expectations. As in any area of medical advance, rigorous evaluation of new genetic based technologies will be needed. A major challenge for clinicians will be to keep updated on genetic advances with potential healthcare applications and to develop the ability to critically appraise research findings in this fast moving field.

Glossary

  • Allele: One of the alternative forms of a gene or genetic locus; one allele at each locus is inherited from each parent

  • Autosome: A chromosome not involved in sex determination

  • Complex disease genetics: The study of the patterns of inheritance of common diseases resulting from the combined action of alleles of more than one gene (such as heart disease, diabetes, and some cancers)

  • Epigenetics: The study of heritable changes in gene expression that are not caused by changes in DNA sequence

  • Genotype: The specific combination of two alleles inherited for a particular gene

  • Human genome: The total set of chromosomes found in an individual

  • Linkage analysis: Analysis of DNA markers that are near or within a gene of interest among families to identify the inheritance of a disease causing mutation in a given gene

  • Linkage disequilibrium: The non-random association of alleles at different loci, such that they occur together on the same chromosome more often than can be accounted for by chance alone

  • Locus: The physical position of a gene or marker on a chromosome

  • Marker: An identifiable physical location on a chromosome (for example, a single nucleotide polymorphism or a gene), the inheritance of which can be assessed. Markers can be regions of DNA that are expressed (genes) or a segment of DNA with no known coding function but whose pattern of inheritance can be determined

  • Mutation: Any heritable change in DNA sequence that occurs in less than 1% of the population

  • Phenotype: The observable characteristics of an organism produced by the genotype (or environment, or both)

  • Sex chromosome: The X and Y chromosomes in humans that determine sex

  • Single nucleotide polymorphism (SNP): Variation at a single base pair (A, T, C, or G) in the DNA sequence

  • The SNP Consortium: A public-private partnership that was established to identify and map common SNPs. As part of the international HapMap project, it aimed to generate a high quality, extensive, publicly available map of SNPs as markers evenly distributed throughout the human genome in different populations

Additional educational resources

Resources for healthcare professionals
  • Online Mendelian Inheritance in Man (OMIM) (www.ncbi.nlm.nih.gov/omim)—Comprehensive database of human genes and genetic phenotypes that contains information on all known mendelian disorders and more than 12 000 genes

  • The SNP Consortium Database (www.ncbi.nlm.nih.gov/SNP/)—National Center for Biotechnology Information database of all known single nucleotide polymorphisms

  • Human HapMap Consortium (http://hapmap.ncbi.nlm.nih.gov/index.html.en)—Public resource developed by a partnership of scientists and funding agencies to help researchers find genes associated with human disease

  • Office of Population Genomics, National Human Genome Research Institute (www.genome.gov/GWAStudies/): Catalogue of published genome wide association studies

  • 1000 Genomes Project (www.1000genomes.org/page.php)—Catalogue of human genetic variation

  • UK Biobank Project (www.ukbiobank.ac.uk/)—Resource to support a diverse range of research intended to improve the prevention, diagnosis, and treatment of illness, and the promotion of health

Resources for patients and the public

Footnotes

  • Contributors: LS and AH had the original idea for the article and developed the outline. All authors contributed to drafts and approved the final draft. LS and AH are guarantors.

  • Funding: LS is supported by a Wellcome Trust Senior Clinical Fellowship (082178). AH is supported by a senior fellowship from the British Heart Foundation (FS05/125) and is the principal investigator on an MRC Biomarker Award with funding from Pfizer. The funders had no role in the manuscript.

  • Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work (TS, MK, RS); LS had support from a Wellcome Trust senior clinical fellowship and ADH had support from a British Heart Foundation senior fellowship for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years (LS, TS, MK, RS); ADH is on the editorial board of Drug and Therapeutics Bulletin, a BMJ Group publication, has provided non-remunerated advice to GlaxoSmithKline and London Genetics, and has received honorariums for speaking at educational meetings with sponsorship from the drug industry, which have been donated in whole or large part to charity. ADH has received a Medical Research Council biomarker research award on which Pfizer is a co-sponsor; no other relationships or activities that could appear to have influenced the submitted work (all authors).

  • Provenance and peer review: Commissioned; externally peer reviewed.

References