The Inherited Basis of Common Diseases

CHAPTER 41  THE INHERITED BASIS OF COMMON DISEASES  

DAVID ALTSHULER

A central question in medicine is to understand why some people get sick and others do not. We seek these answers for multiple reasons: to provide explanations to our patients, to improve our ability to predict disease risk, and most importantly, to understand pathophysiology as a foundation for designing rational approaches to prevention and therapy. In some cases, a single environmental exposure is found to play a major role in disease (e.g., smoking and lung cancer, or human immunodeficiency virus [HIV] infection and acquired immunodeficiency syndrome [AIDS]). In others, such as Huntington’s disease or cystic fibrosis, mutation of a single gene is necessary and can be sufficient to cause illness. Such singular answers are the exception; in most cases, disease is attributable neither to a single environmental factor nor to mutation of a single gene. Rather, most cases of disease result from the combined action of inborn and somatically acquired alterations in gene sequence, environmental and behavioral exposures, and bad luck. Such disorders, which make up most of the morbidity and mortality in the population, are termed complex traits.

Human genetics is a unique tool for generating new hypotheses about the root causes of disease, based on genome-wide searches in the human population that are unlimited by prior assumptions about underlying pathophysiologic processes. Because the sequence of the human genome and much of its common variation is now known, and given emerging tools and methods to directly determine genome sequences of individuals, we are entering an era in which medicine can be informed by knowledge of the specific genes and variants that contribute to risk for common human diseases.

HERITABILITY: INHERITED VARIATION IN DISEASE RISK

Susceptibility to disease varies within and across human populations. Studies of familial aggregation can determine the extent to which inheritance contributes to these patterns. These studies are simple in concept and ask whether members of the same family display more similar rates of disease than do individuals chosen at random from the population. Of course, familial clustering can reflect not only shared genes but also shared environment. The contribution of shared genotype can be dissected further by comparing rates of disease within families as a function of the extent of genetic relatedness. The cleanest such design involves the comparison of disease concordance among dizygotic and monozygotic twin pairs. For common diseases such as types 1 and 2 diabetes mellitus, obesity, hypertension, coronary artery disease, autoimmune diseases, common cancers, schizophrenia, and bipolar disease, twin studies have documented that rates of concordance are significantly higher in monozygotic than in dizygotic twin pairs. For many other traits of clinical interest (e.g., most drug responses), formal tests of heritability have not yet been performed, and the role of inheritance in these characteristics is less well documented.

Data about familial aggregation allow the calculation of heritability, or the fraction of interindividual variability in disease risk attributable to additive genetic influences. The remaining variability among individuals is due to all other contributions: environmental influences on disease, nonadditive (epistatic) genetic effects (e.g., gene-gene interactions or gene-environment interactions), error in the measurement of relatedness or disease, and random chance. For most clinically important traits (diseases and risk factors), empirical estimates of heritability range from 20 to 80% (see Online Mendelian Inheritance in Man, available at www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=OMIM, for comprehensive information).

When interpreting estimates of heritability, it is important to consider two crucial factors: the effect of measurement errors and the environmental context. First, measurement errors can decrease the estimate of the heritability of a trait. A single measurement of blood pressure is much less heritable than a composite score based on serial measures of blood pressure over time. That is, day-to-day variability and imprecision in clinical measures can obscure an underlying biologic susceptibility that is entrained by inheritance. For the patient and physician, this means that although the blood pressure on a given day may not be particularly heritable, the blood pressure over time (which is presumably the relevant risk factor for vascular disease) is heritable to a much greater extent. Second, estimates of heritability must be interpreted in the context of the environment in which the study was performed. In the case in which environmental triggers of disease are relatively constant across a study population, inherited factors may explain much of the variation in rates of disease.
In contrast, in the case in which exposure to environmental causes of disease is highly varied across the study population, nongenetic factors may outweigh the contribution of inborn susceptibility. For example, the rate and diversity of smoking behavior will have a major impact on how much of the variability in rates of lung cancer (in any given study or patient cohort) may be explained by inheritance. If nobody smoked (or everyone smoked), little of the variation in lung cancer risk would be due to smoking behavior; if, in contrast, half the population smoked multiple packs a day, and the other half not at all, this behavior would no doubt dominate over inborn susceptibility. For these reasons, heritability is not a fixed characteristic of a given disease, but an assessment of a given population, set of measurements, and the extent to which variability in genetic and environmental exposure explains disease risk. This sheds light on what is sometimes thought to be a contradiction between rates of disease being highly heritable (in a given population) and yet varying dramatically across populations separated by time, geography, or socioeconomic status. In broad comparisons across groups, environmental exposure and methods of clinical ascertainment can vary substantially and contribute to secular changes in patterns of disease. Conversely, within a group exposed to a relatively uniform environment and studied in a standardized manner, genetic susceptibility may play a major role in determining individual risk.
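As a rough illustration of how twin data and measurement error bear on a heritability estimate, the sketch below uses Falconer's formula, h² ≈ 2(r_MZ − r_DZ). This classical approximation is not described in this chapter; it assumes purely additive genetic effects and equally shared environments for both twin types. The second part shows how averaging repeated measurements (via the Spearman-Brown formula, also an assumption introduced here) raises reliability and therefore the observed heritability. All function names and input values are hypothetical.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Approximate heritability from monozygotic and dizygotic twin correlations.

    Falconer's approximation: h2 = 2 * (r_MZ - r_DZ), clipped to [0, 1].
    """
    h2 = 2.0 * (r_mz - r_dz)
    return max(0.0, min(1.0, h2))


def reliability_of_mean(single_reliability: float, n_measurements: int) -> float:
    """Reliability of the mean of n parallel measurements (Spearman-Brown)."""
    k = n_measurements
    return k * single_reliability / (1.0 + (k - 1) * single_reliability)


def observed_heritability(true_h2: float, reliability: float) -> float:
    """Heritability of a noisy measurement.

    Measurement error adds non-genetic variance, so the observed value is the
    heritability of the error-free trait multiplied by the reliability.
    """
    return true_h2 * reliability


if __name__ == "__main__":
    # Hypothetical twin correlations for a quantitative trait.
    h2 = falconer_heritability(r_mz=0.6, r_dz=0.35)
    print("Heritability of the error-free trait:", h2)

    # Hypothetical reliability of a single blood pressure reading.
    for n in (1, 4, 16):
        rel = reliability_of_mean(0.5, n)
        print(n, "readings -> observed heritability", observed_heritability(h2, rel))
```

With these made-up inputs, the observed heritability rises toward the error-free value as more readings are averaged, which mirrors the contrast drawn above between a single blood pressure measurement and a composite of serial measurements.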

HETEROZYGOSITY: INHERITED VARIATION IN GENOME SEQUENCE

Heritability expresses the patterns of inherited variation in rates of disease; heterozygosity expresses the rate of inherited variation in genome sequences (Table 41-1). Heterozygosity is defined as the proportion of sites on the chromosome at which two randomly chosen copies differ in DNA sequence. Because cells are diploid (carry two copies of the genome sequence) and because these two copies were selected in a semirandom manner from the population, heterozygosity is equivalent to the fraction of base pairs that vary between the two copies each of us inherited from our mother and our father. That is, heterozygosity is the rate of genetic variation in the individual. Single-nucleotide polymorphisms (SNPs) are sites at which a single letter in the DNA code has been swapped for a single alternate letter. Such variants are observed at approximately 1 in 1000 positions in the human genome sequence. In the protein coding regions of genes, rates of genetic variation are lower—less than 1 in every 2000 bases; the rate of variation that substantially alters the sequence of the encoded protein is lower still (see Table 41-1). These rates can be understood in light of darwinian selection against changes that alter the amino acid sequence of encoded proteins. Genomes also contain larger scale variation: insertions and deletions of nucleotides, alteration in the number of copies of particular genes and sequences, and larger-scale alterations such as inversions and translocations. Both SNPs and larger-scale alterations can influence gene function and contribute to disease.
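The definition of heterozygosity given above can be made concrete with a short sketch. It counts the sites at which two aligned copies of a sequence differ, then scales the 1-in-1000 rate to the 3-billion-base-pair genome. The function names and the toy sequences are hypothetical; real comparisons would also need to handle insertions, deletions, and alignment, which this sketch ignores.

```python
def heterozygosity(copy_a: str, copy_b: str) -> float:
    """Fraction of aligned positions at which two sequence copies differ."""
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be aligned to the same length")
    if not copy_a:
        raise ValueError("sequences must not be empty")
    differing = sum(1 for a, b in zip(copy_a, copy_b) if a != b)
    return differing / len(copy_a)


def expected_heterozygous_sites(genome_length_bp: int, one_in: int) -> float:
    """Expected number of differing sites if 1 site in `one_in` varies."""
    return genome_length_bp / one_in


if __name__ == "__main__":
    # Toy example: one difference in ten positions.
    print(heterozygosity("ACGTACGTAC", "ACGTACGAAC"))

    # Genome-scale arithmetic using the figures quoted in the text.
    print(expected_heterozygous_sites(3_000_000_000, 1000))
```

The genome-scale figure of about 3 million heterozygous sites per individual is the same number that appears in Table 41-1.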

TABLE 41-1 CHARACTERISTICS OF HUMAN GENOME SEQUENCE VARIATION

Length of the human genome sequence (base pairs): 3,000,000,000

Number of human genes (estimated): 20,000

Fraction of base pairs that differ between the genome sequence of a human and a chimpanzee: 1.3% (1 in 80)

Fraction of base pairs that vary between the genome sequence of any two humans: 0.1% (1 in 1000)

Fraction of coding region base pairs that vary in a manner that substantially alters the sequence of the encoded protein: 0.02% (1 in 5000)

Number of sequence variants present in each individual as heterozygous sites: 3,000,000

Number of amino acid–altering variants present in each individual as heterozygous sites: 12,000

Number of sequence variants in any given human population with frequency of >1%: 10,000,000

Number of amino acid polymorphisms present in the human genome with a population frequency of >1%: 75,000

Fraction of all human heterozygosity attributable to variants with a frequency of >1%: 98%

The genetic variation in each of us is due largely to common variants. Empirically, more than 98% of the heterozygous sites in each individual display frequency of greater than 1% in the worldwide human population. Because most human heterozygosity is due to common variants, a database containing all common (>1% frequency) sequence variants in the human population can be constructed by sequencing the genomes of only hundreds of individuals, and yet would capture most of the genetic variation in any individual. Built on the foundation of the human genome project, a catalogue of common DNA variants has been created by a series of public-private projects, including the SNP Consortium, International HapMap, and 1000 Genomes Projects. At the time of this writing, the public database contains more than 17 million human genetic variants (www.ncbi.nlm.nih.gov:80/SNP/index.html). Not all these entries represent common variants (some are rare), and some may represent technical false-positive findings. Nonetheless, the existing collection represents most of the common variants in each individual and has fueled efforts to systematically measure genetic variants for their contribution to disease.

The major role of common variation in human sequence diversity is explained by the unique demographic history of the human population. Despite the global distribution of the current human population, it is now clear that all people on the planet are the descendants of a single population that lived in Africa only 10,000 to 40,000 years ago. The ancestral population was small (with an effective size of perhaps 10,000 individuals), lived a hunter-gatherer existence at low population densities (relative to other humans and later domesticated animals), and had evolved in Africa over millions of years.
Most human genetic variation arose in this phase of human history, before the more recent migrations, expansions, and invention of technologies (e.g., farming) that resulted in widespread population of the globe. Most common human genetic variation predates the Diaspora and is shared by all populations on earth. A second factor is the slow rate of change in human DNA. Mutation and recombination occur at very low rates: on the order of 10⁻⁸ per base pair per generation. And yet, any pair of human genes traces a lineage back to a shared ancestor who lived on the order of 10³ to 10⁴ generations ago (if a generation is 20 years, then 10⁴ generations is 200,000 years). In other words, considering the typical nucleotide in two unrelated humans, it is more likely that they trace back to a shared ancestor without any mutation having occurred than it is that a mutation has arisen in the intervening time. This explains why 99.9% of base pairs are identical when any two copies of the human genome are compared.

Another aspect of human variation is explained by these simple mathematical and population genetic relationships: the extent of human DNA sequence diversity attributable to rare and common variants. Each of us inherits from our parents some 3 million common polymorphisms (classically defined as those with frequency of >1%). We also inherit rarer variants that are shared by apparently unrelated individuals but do not reach a frequency of 1% or higher. Finally, we inherit thousands of variants that are unique to each individual and their closest relatives. The question of how these different classes of variants influence disease is of central interest and importance to medical genetics.

The shared ancestry of human populations explains another aspect of human genetic variation: the correlations among nearby variants known as linkage disequilibrium, or haplotypes. Empirically, individuals who carry a particular variant at one site in the genome are observed to be more likely than chance to carry a particular set of variants at nearby positions along the chromosome. That is, not all combinations of nearby variants are observed in the population, but rather only a small subset of the possible combinations. These correlations reflect the fact discussed earlier that most variants in our genomes arose once in human history (typically long ago) and did so on an arbitrary but unique copy carried by some individual in the population. The ancestral copy of the genome on which the mutation occurred can be recognized in the current population as a stretch of particular alleles (known as a haplotype) that track together in the population. That is, although most variations in our genome arose before written human history, the DNA sequence in each of us carries a record of the evolution and demographic history of the human population. These ancestral haplotypes, passed down from shared prehistoric ancestors in Africa, can be recognized in the current human population. The haplotype structure of the human genome offers a practical tool in association studies of human disease because it is not necessary to measure directly each nucleotide in order to capture much of the information. Such haplotype-based methods are the foundation for genome-wide association studies, discussed later.
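The correlation between nearby variants described above is commonly summarized by two statistics that this chapter does not define: D, the difference between the observed haplotype frequency and the frequency expected if the two sites were independent, and r², the squared correlation between alleles at the two sites. The sketch below computes both from a list of two-site haplotypes. The function name and the toy haplotypes are hypothetical.

```python
from typing import List, Tuple


def linkage_disequilibrium(
    haplotypes: List[str], allele_1: str, allele_2: str
) -> Tuple[float, float]:
    """Return (D, r_squared) for two sites.

    Each haplotype is a two-character string: the allele at site 1 followed
    by the allele at site 2 on the same chromosome copy.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    p_a = sum(1 for h in haplotypes if h[0] == allele_1) / n   # site 1 allele frequency
    p_b = sum(1 for h in haplotypes if h[1] == allele_2) / n   # site 2 allele frequency
    p_ab = sum(1 for h in haplotypes if h == allele_1 + allele_2) / n

    # D: observed haplotype frequency minus the expectation under independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denom


if __name__ == "__main__":
    # Only two of the four possible combinations are seen: complete correlation.
    print(linkage_disequilibrium(["AG"] * 5 + ["CT"] * 5, "A", "G"))

    # All four combinations equally frequent: no correlation.
    print(linkage_disequilibrium(["AG", "AT", "CG", "CT"], "A", "G"))
```

When r² between two sites is close to 1, genotyping one site conveys nearly all the information about the other, which is why it is not necessary to measure every nucleotide directly.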

THE SEARCH FOR GENES UNDERLYING MONOGENIC DISEASES

The genetic architecture of a disease refers to the number and magnitude of genetic risk factors that exist in each patient and in the population and their frequencies and interactions. Diseases can be due to a single gene (monogenic) in each family or to multiple genes (polygenic). It is easiest to identify genetic risk factors when only a single gene is involved and this gene has a large impact on disease in that family. In cases in which a single gene is necessary and sufficient to cause disease, the condition is termed a mendelian disorder because the disease tracks perfectly with a mutation (in the family) that obeys Mendel’s simple laws of inheritance. Some single-gene disorders are caused by the same gene in all affected families; for example, cystic fibrosis is always caused by mutations in CFTR. Although many individuals with cystic fibrosis carry the same founder mutation (ΔF508), others carry any pair of a wide variety of different mutations in CFTR. The existence of many different mutations at a given disease gene is known as allelic heterogeneity. A mendelian disorder can be due to a single genetic lesion in any given family, but in different families can be due to mutations in a variety of genes. This phenomenon, termed locus heterogeneity, is illustrated by retinitis pigmentosa. Although mutation in a single gene is typically necessary and sufficient to cause retinitis pigmentosa, there are dozens of different genes in which retinitis pigmentosa mutations have been found (Online Mendelian Inheritance in Man #268000). In each family, however, only one such gene is mutated to cause disease. Most single-gene disorders are rare (present in <1% of the population) and are manifested early in life. Many are severe and cause death before reproduction in the absence of modern medical care. The fact that most monogenic disorders are severe in childhood and rare in the population is probably not a coincidence, but reflects the impact of natural selection.
The deleterious effect of these mutations results in a decrease in reproductive fitness (in individuals unlucky enough to inherit them), and the mutations and the disease are therefore unlikely to drift to high frequency in the population. There are exceptions to this general idea: cases in which the mutation causing a severe monogenic disease (such as HbS, the cause of sickle cell anemia) is common in the population at large. Such cases appear to be the result of a different kind of selection, known as balancing selection: situations in which a gene mutation is beneficial in one circumstance (a genotype or environment) but deleterious in another. Heterozygous carriers for HbS are relatively protected against malaria, and this benefit balances the deleterious effect of sickle cell disease in homozygotes.

Starting in the 1980s, the advent of genome-wide linkage analysis led to rapid success at identifying the specific genetic mutations that cause mendelian disorders, with hundreds of genes identified for clinically important conditions (for comprehensive information, see www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=OMIM). Progress was sparked by the development of a suite of powerful research techniques (family-based linkage analysis followed by positional cloning) in which a genome-wide search is undertaken for the causal gene, which is first localized to a chromosomal region. (The initial idea of genetic linkage mapping traces to Sturtevant in fruit flies in 1913 but did not become practical in humans until the 1980s.) Once the search has been focused by the discovery of linkage between a chromosomal region and a disease, that chromosomal neighborhood is scoured for the genetic culprit, which is recognized based on the observation of mutations that alter the protein coding sequence and are enriched in cases of disease compared with unaffected relatives and population-based controls. The power of these approaches prompted and was fueled by the Human Genome Project, which provided the foundation of information on DNA structure, sequence, and genetic variation required to undertake such searches.

GENETIC INVESTIGATION OF COMMON DISEASES

Similar to mendelian disorders, most common diseases are influenced by inheritance. In contrast to mendelian disorders, however, the genetic contribution to common diseases appears to be due to the action of many genes rather than a single gene in each family. Empirical evidence in favor of this model comes from efforts to use the same approach (positional cloning) for complex traits that was applied successfully to monogenic disorders. In the 1990s, the tools of family-based linkage analysis were applied to nearly all common disorders. Much of this work was done in isolated founder populations (such as Finland and Iceland) with the goal of simplifying the genetic architecture and accessing extended pedigrees. Excepting a few notable successes, however, these studies revealed few strong signals that localized the genes responsible for disease. In most of the hundreds of such studies that have been published, there are many weak statistical signals (few if any are statistically significant given the large number of hypotheses tested) and little agreement between different studies of the same disease. In view of the well-understood statistical power of family-based linkage methods (based on their extensive use for monogenic disorders) and their relatively limited success despite extensive efforts in common diseases, it was concluded by most investigators that rare variants in single genes do not explain a large fraction of the risk for common diseases. If a single gene contained rare mutations of large effect that explained 20% or more of the inherited risk for type 2 diabetes, hypertension, or schizophrenia, it is likely that its location would long since have been found based on linkage analysis. A next potential shortcut to understanding the genetic determinants of common diseases is to identify and study rare, early-onset forms of diseases that clearly demonstrate mendelian patterns of inheritance. 
Because these families display patterns of inheritance consistent with a major gene of large effect, the powerful tools of positional cloning can be and have been used successfully to identify the genes responsible. Important examples include the role of BRCA1 and BRCA2 in early-onset breast cancer, maturity-onset diabetes of the young as a form of type 2 diabetes, many monogenic disorders of blood pressure and electrolyte regulation, early-onset Alzheimer’s disease, and many others. These successes provide diagnostic information for families burdened with severe, early-onset forms of disease and insight into the underlying pathways responsible for disease. For example, more than 20 genes have been identified that, when mutated, cause rare mendelian disorders of blood pressure and electrolyte regulation. So far, every one of these genes is active in the kidney, and most are involved in the renin-angiotensin-aldosterone pathway. This result is a compelling demonstration of the central importance of the kidney in human blood pressure regulation and has suggested new therapeutic targets of substantial promise. It was hoped that the genes found to be responsible for early-onset, monogenic forms of common diseases would contribute to the more common forms of disease in the population. In this scenario, severe mutations might cause early-onset forms, and more prevalent but subtle alterations in the same genes might contribute to common forms of disease. A comprehensive test of this hypothesis awaited tools from the human genome project and improved methods of genetic epidemiologic analysis.

ASSOCIATION STUDIES: FROM CANDIDATE GENES TO GENOME-WIDE ASSOCIATION STUDIES

Genome-wide association studies (GWAS) are simple in concept. A genetic variant is identified, its frequency is measured in individuals with the disease of interest, and it is compared with its frequency in well-matched controls (drawn from the population at large or unaffected family members). This process can be repeated for as many genetic variants as exist, up to and including a genome-wide collection. Appropriate analyses must be performed to rule out alternative explanations for an association with disease, such as mismatching of cases and controls, or technical artifacts. Because the null distribution is well described (under the hypothesis of no association between genotype and phenotype), it is possible to calibrate such analyses and to identify reproducible associations from the large sea of benign polymorphisms.

Genetic association studies were pioneered in the context of the HLA locus on chromosome 6. The HLA was discovered based on its role in transplantation tolerance and is characterized by diverse allelic variation that can be measured based on interactions of antibodies and antigens. By measuring these protein-based (immunologic) readouts of the underlying genetic variation, HLA alleles were found to be a major determinant of susceptibility to infectious and autoimmune diseases. Starting in the 1960s, empirical data on human population genetics and genetic association studies were developed in the context of the HLA. By the 1980s, tools of molecular biology made it possible to directly measure DNA variation (rather than using protein or phenotype measurements as surrogates for the underlying genetic variation), ushering in the modern era of human genetic research. In this pregenomic era, it was only practical to measure one or a small number of genetic variations in each study, limiting association studies to incomplete assessments of individual “candidate” genes selected based on biologic criteria.
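The basic comparison of variant frequency in cases and controls can be sketched as a test on a 2×2 table of allele counts. The example below is one common way to do this (an allelic odds ratio with a Pearson chi-square test on 1 degree of freedom) and is an assumption of this illustration, not a method specified in the chapter. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def allelic_association(
    case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int
) -> Tuple[float, float, float]:
    """Return (odds_ratio, chi_square, p_value) for a 2x2 table of allele counts."""
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    if min(row_cases, row_ctrls, col_alt, col_ref) == 0:
        raise ValueError("every row and column total must be non-zero")
    if case_ref == 0 or ctrl_alt == 0:
        raise ValueError("odds ratio is undefined when these cells are zero")

    n = row_cases + row_ctrls
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

    # Pearson chi-square for a 2x2 table (no continuity correction).
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * cross * cross / (row_cases * row_ctrls * col_alt * col_ref)

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: the variant allele is more frequent in cases.
    print(allelic_association(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60))
```

A real analysis would also check genotyping quality and the matching of cases and controls, the alternative explanations noted in the text, before interpreting a small P value.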
The study of candidate genes led to a modest number of robust and reproducible associations, such as the contribution of Apo-ε4 to Alzheimer’s disease; factor V Leiden to deep venous thrombosis; a 32-base deletion in the chemokine receptor CCR5 to HIV infection; common variants in the insulin gene to type 1 diabetes; SNPs in the peroxisome proliferator-activated receptor γ (PPAR-γ) and the β-cell potassium channel Kir6.2 to the risk for type 2 diabetes. By early in the 2000s, comprehensive surveys of published genetic association studies showed that valid associations were few and far between, with many initial claims of association proving irreproducible, likely representing false-positive claims. One such analysis estimated that, in the pre-GWAS era, only 10 to 20 bona fide associations had been documented of common genetic variants with common diseases. A major reason for the state of this literature was the intrinsically low likelihood of finding a gene and variant contributing to any given disease. Each genome contains millions of genetic variants, and presumably only a small fraction of these influence disease. This is often described as a problem of “multiple hypothesis testing,” with the investigative community searching for associations between multiple genes, multiple variants in each gene, and multiple diseases. An alternative (bayesian) statistical framework frames this issue based on low prior probabilities of association. Regardless, it is conceptually clear that much more stringent statistical thresholds (than the traditional P < .05) are required for declaring association of genetic variants and disease. As in linkage analysis for mendelian traits, a key to success in association studies was the advent of genome-wide search, unbiased by prior hypotheses about biologic mechanisms. 
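The Bayesian framing mentioned above can be illustrated with a small calculation: given a prior probability that a tested variant truly influences disease, the study's power, and the significance threshold, what fraction of "significant" results reflect true associations? The sketch also shows a Bonferroni-style threshold for many tests. The prior, power, and number of tests are hypothetical values chosen for illustration, and the function names are not from the chapter.

```python
def posterior_probability_true(prior: float, power: float, alpha: float) -> float:
    """Probability that a significant result is a true association.

    prior: probability that a tested variant truly affects disease
    power: probability of reaching significance when the effect is real
    alpha: significance threshold (false-positive rate under the null)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the overall false-positive rate near family_alpha."""
    return family_alpha / n_tests


if __name__ == "__main__":
    prior = 1e-5   # hypothetical: 1 in 100,000 tested variants is truly associated
    power = 0.8    # hypothetical study power

    for alpha in (0.05, 1e-4, 1e-7):
        print(alpha, posterior_probability_true(prior, power, alpha))

    # Hypothetical study testing one million variants.
    print(bonferroni_threshold(0.05, 1_000_000))
```

Under these assumed inputs, nearly all results that pass P < .05 are false positives, whereas most results that pass a far stricter threshold are true, which is the intuition behind demanding more stringent thresholds.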
With the sequencing of the human genome, development of large-scale SNP databases, and tools for genotyping up to 1 million SNPs per individual, by 2005 it became practical to perform GWAS to identify genomic loci harboring allelic variation. With a recognition that any given variant had a very low likelihood of truly being associated with disease, much more stringent statistical thresholds were deployed (typically requiring a P value of 10⁻⁷ or lower to declare “genome-wide significance”).

Age-related macular degeneration (AMD) provided an early success of GWAS. AMD is a typical common, polygenic disease (Chapter 431); siblings of affected patients are perhaps three to six times as likely as unrelated individuals to become afflicted, and yet family-based linkage analysis revealed only modestly significant (and modestly reproducible) linkage results. The pathophysiologic defects that underlie AMD were largely unknown until it was found that a common coding polymorphism in the gene for complement factor H is a major risk factor for AMD. The variant (Y402H) has a high population frequency (approximately 35% in European populations) and increases risk by 2.5- to 3-fold in heterozygotes and by 5- to 7-fold in homozygotes. Multiple other complement factors have since been found to harbor common genetic variation that influences the risk for AMD in a highly reproducible manner, providing unambiguous information about the primary role of complement in this common disease.

Since 2005, GWAS has been used to identify literally hundreds of novel genetic variants that show reproducible associations to a large variety of common human diseases. The field evolved a set of criteria and standards that largely eliminated the previous difficulties with irreproducible claims of association, making association studies a reliable method to identify genomic loci related to human diseases. The National Human Genome Research Institute of the National Institutes of Health maintains a catalogue of GWAS findings (www.genome.gov/26525384) that, at the time of this writing, included 904 such associations for 165 traits. This represents rapid progress compared with the two dozen or so such findings known at the start of the decade.

The results of GWAS support a number of conclusions about the role of common genetic variants in common disease. First, most diseases investigated by GWAS have yielded novel findings, suggesting that the approach has general utility. Second, only a small fraction of these findings were previously known, indicating that new clues can be obtained by genetic mapping of common diseases. Third, most of the associations demonstrate extremely modest odds ratios (on the order of 1.1-fold to 1.5-fold), indicating that natural selection has purged alleles of large effect from the pool of common variants. Fourth, most of the associated SNPs lie in noncoding regions, suggesting that they act through effects on gene regulation rather than directly altering protein-coding sequences.
Fifth, only a modest fraction of the estimated heritability of each disease has yet been explained, indicating a role for other common variants of more modest effect, rare variants, genetic interactions, or other (as yet unanticipated) influences. Because GWAS are genome-wide (not limited to “candidate” genes), they provide a test of the hypothesis that prior investigations had identified sets of genes relevant to each disease (relevant, that is, through the lens of genetic variation and inheritance). In the case of autoimmune diseases, many (perhaps half) of the 100 or more findings from GWAS lie near a gene previously known to play a role in the immune system. Similarly, a substantial fraction of the genetic variants found to influence lipid levels lie near genes that were previously known to play a role in lipid biology (because they were known already either through rare mutations that contribute to mendelian forms of hyperlipidemia or through biologic investigations). These findings confirm that the basic pathophysiologic mechanisms already known can be “validated” through the lens of inherited risk factors and encourage investigation of the new findings offered by GWAS. In contrast, for some diseases, most of the genetic variants found are novel and do not lie near genes previously studied. One such case is type 2 diabetes, for which 35 independent genomic loci have been found to influence risk for disease, and yet only a handful were previously implicated by other methods. This may indicate and illuminate gaps in our previous knowledge of the pathophysiology of type 2 diabetes. Although tantalizing, the results of GWAS have raised many more questions than they have answered. These discoveries implicate particular genomic regions, but to date in only a few cases have the causal genes been proved. 
This is challenging in large part because so many of these common variants are noncoding, and it remains difficult to connect noncoding variation to the genes thereby regulated. To the extent that truly novel genes are identified, much work is needed to discover their biologic and physiologic functions. Finally, GWAS findings explain only a modest fraction of the estimated heritability of most diseases, leaving open the question of which genes, and which types of variants and genetic effects, explain the remainder.

FROM COMMON VARIANTS TO INDIVIDUAL GENOMES

Although much of human genetic variation is due to common DNA variants (such as those tested through GWAS), each of us also inherits many thousands of variants that arose more recently and that tend to be lower in frequency and more population specific. To the extent that such variants have very large effects on phenotype, they may have been previously identified based on family-based linkage studies of mendelian disorders. However, there almost certainly exists a large universe of lower-frequency variations that are too rare to have been captured by the first generation of GWAS and that have effects too modest to have been recognized and identified in family-based linkage analyses.

The study of such lower-frequency and rare variants is now becoming practical owing to advances in technology for DNA sequencing. With dramatic drops in price and increases in throughput, it is increasingly practical to sequence individual genomes in the context of medical research (and, in the future, clinical practice). Such an approach will provide a much more complete assessment of genetic variation than was previously obtainable and will incorporate common as well as rare variants.

The first task will be to develop methods to interpret the millions of variants in each genome. Variants of high frequency have already been well studied by GWAS, and each is increasingly annotated with information about disease association. For variants that are lower in frequency, but still observed in a substantial number of unrelated individuals, the basic association methodology can be applied. That is, the frequencies of each specific variant can be measured in affected cases and in unaffected controls and compared with an appropriately constructed null distribution and statistical threshold. Moreover, the availability of a much more complete database of DNA variation (as is being created by the 1000 Genomes Project) will lead to a second generation of GWAS that is more complete for lower-frequency variation.

Many DNA variants will be unique to individuals (and their close relatives), however, and will require different approaches. If the analysis includes a large pedigree, with multiple affected and unaffected relatives, it may be possible to perform the association analysis in single families. More typically, however, it will be necessary to analyze large sets of samples, measure the rate of different (individually rare) variants in each gene, and compare these rates between cases and controls. In the distant future, we may learn to “read” genome sequences and predict the effect of a variant never before observed. For the foreseeable future, however, interpretation will require statistical analysis of genomes, documenting that variation in particular genes is robustly and reproducibly associated with each particular disease.
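
The two comparisons described above, a single variant's frequency in cases versus controls and the per-gene rate of individually rare variants in cases versus controls, can both be sketched as a test on a 2x2 table. The example below is a minimal illustration, not the procedure of any particular study. It assumes Fisher's exact test as the null distribution and a Bonferroni correction as the threshold, and all function names and counts are hypothetical. Real analyses must also address ancestry, relatedness, and genotyping quality, which are ignored here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def single_variant_test(case_alt, case_chroms, ctrl_alt, ctrl_chroms) -> float:
    """Compare one variant's allele counts in cases and controls.

    *_chroms is the number of chromosomes sampled (two per person).
    """
    if case_alt > case_chroms or ctrl_alt > ctrl_chroms:
        raise ValueError("allele count exceeds chromosomes sampled")
    return fisher_exact_two_sided(
        case_alt, case_chroms - case_alt, ctrl_alt, ctrl_chroms - ctrl_alt
    )


def gene_burden_test(case_carriers, n_cases, ctrl_carriers, n_ctrls) -> float:
    """Compare the number of people carrying any rare variant in one gene."""
    if case_carriers > n_cases or ctrl_carriers > n_ctrls:
        raise ValueError("carrier count exceeds sample size")
    return fisher_exact_two_sided(
        case_carriers, n_cases - case_carriers, ctrl_carriers, n_ctrls - ctrl_carriers
    )


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after correcting for n_tests tests."""
    return alpha / n_tests


# Hypothetical counts: 40 of 2000 case chromosomes vs 15 of 2000 control chromosomes
p_variant = single_variant_test(40, 2000, 15, 2000)
# Hypothetical counts: 12 of 1000 cases vs 2 of 1000 controls carry a rare variant
p_gene = gene_burden_test(12, 1000, 2, 1000)
# Assumes roughly 20,000 genes are each tested once
significant = p_gene < bonferroni_threshold(0.05, 20_000)
```

Collapsing rare variants to one carrier count per gene trades resolution for power: no single rare variant is seen often enough to test on its own, but the aggregate rate can still differ between cases and controls.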

IMPLICATIONS AND FUTURE DIRECTIONS

Inherited factors contribute substantially to common as well as rare diseases. Mendelian disorders are typically caused by rare mutations in the protein-coding regions of genes. Common variants, investigated through GWAS, typically have modest effects and often act through noncoding effects on gene regulation. Each of us carries a deep reservoir of less common variation that will soon be tested for a role in disease using methods of next-generation DNA sequencing. It seems reasonable to expect that an integrative analysis of this information will result in a defined list of genes and variants in those genes (both common and rare) that contribute to each human disease.

Success in identifying genes and mutations will prove of value only if it leads to improved prediction, diagnosis, understanding, and treatment. Prediction and personalized medicine require a foundation of evidence that demonstrates clinical benefit. This will involve incorporating DNA variation in epidemiologically valid cohorts and testing particular approaches to genetic prediction in clinical trials. The fact that some genetic tests may prove predictive in no way means that routine genome sequencing will be useful to patients, and much work will be needed to offer the public and physicians guidance in interpreting such data.

Biologic understanding requires bedside-to-bench research, in which genes found to be mutated in patients are studied in the laboratory. It will be necessary to place these new genes into known (and as yet unrecognized) biologic pathways and to understand how dysfunction and dysregulation lead to disease. In some cases, such as the role of complement in AMD (see earlier), initial answers may come quickly; in others, in which the relevant pathobiology is as yet unknown, the information to be gleaned from following these clues is unpredictable.
Presumably, in the fullness of time, the genetic insights gleaned from patients will lead to a new generation of therapies that more directly target the underlying root causes of risk in the population. What is most certain is that genetic and genomic information is accumulating at a staggering rate and holds both much potential and much challenge for the future of medicine.