Review
TRENDS in Molecular Medicine Vol.7 No.11 November 2001
Using genetic variation to study human disease

James G. Taylor, Eun-Hwa Choi, Charles B. Foster and Stephen J. Chanock

The generation of a draft sequence of the human genome has spawned a unique opportunity to investigate the role of genetic variation in human diseases. The difference between any two human genomes has been estimated to be less than 0.1% overall, but this still means that there are at least several million nucleotide differences per individual. The study of single nucleotide polymorphisms (SNPs), the most common type of variant, is likely to contribute substantially to deciphering genetic determinants of common and rare diseases. The effort to identify SNPs has been accelerated by three developments: the availability of sequence data from the genome project, improved informatic tools for searching those data and high-throughput genotyping platforms. With these new tools in hand, the genetic dissection of disease will move forward rapidly, although a number of formidable challenges will have to be met before its promise is realized in clinical medicine.
James G. Taylor, Eun-Hwa Choi, Charles B. Foster, Stephen J. Chanock*
Section of Genomic Variation, Pediatric Oncology Branch, National Cancer Institute, Advanced Technology Center, 8717 Grovemont Circle, Gaithersburg, MD 20877, USA. *e-mail:
[email protected]
The completion of the first draft of a human genome map marks the beginning of a new age in the investigation of the genetic basis of human disease (Refs 1,2). Until now, efforts have concentrated on mapping disease loci for highly penetrant mendelian disorders, while the unraveling of determinants of complex human traits has lagged behind. The ability to annotate the human genome has generated new enthusiasm for investigating genetic determinants of disease, particularly for defining markers for polygenic disease loci. The Human Genome Project has identified the SINGLE NUCLEOTIDE POLYMORPHISM (SNP) (see Glossary) as the most common variant (Refs 1,3–5) (Box 1). The availability of a comprehensive SNP catalog offers the possibility to identify many disease loci and, eventually, to pinpoint functionally important variants in which the nucleotide change alters the function or expression of a gene that directly influences a disease outcome. The study of the distribution of SNPs, particularly in different populations, is also valuable for investigating the molecular events that underlie evolution, namely genetic drift, mutation, recombination and selection (Ref. 6). Finally, the study of human variation can illuminate important events in human history, for example by tracing the origin of populations and their migrations.

What is an SNP?
An SNP is a stable substitution of a single base with a frequency of more than 1% in at least one population (see Box 1). SNPs are distributed throughout the human genome at an estimated overall frequency of one in every 1900 bp (Ref. 3). At the level of the chromosome, the density of SNPs appears to be relatively constant across the genome, with the exception of the sex chromosomes (Ref. 3). Another measure of global diversity in the genome, MEAN HETEROZYGOSITY, is also relatively stable. The SNP Consortium (TSC) reported that only two autosomal chromosomes (i.e. 15 and 21) had mean heterozygosity scores that differed by more than 10% from the genome-wide average (Ref. 3). Even though we are early in the annotation of the genome, it is likely that there are substantial differences in SNP densities among specific regions of each chromosome and possibly within individual genes (Refs 1,3).
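The working definition and the diversity measure above can be made concrete with a small sketch. It assumes a bi-allelic variant and, for heterozygosity, random mating, so that the expected heterozygosity at a locus is 2pq. The per-locus average used here is only a simplified stand-in for the mean heterozygosity reported by TSC, and all function names and example values are illustrative assumptions, not taken from the TSC report.

```python
def is_snp(minor_allele_freqs, threshold=0.01):
    """True if the minor-allele frequency exceeds the threshold
    (1% by default) in at least one of the sampled populations."""
    return any(f > threshold for f in minor_allele_freqs)


def expected_heterozygosity(p):
    """Expected heterozygosity 2pq at a bi-allelic locus with allele
    frequency p. Assumes Hardy-Weinberg proportions (our assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p)


def mean_heterozygosity(allele_freqs):
    """Average of the per-locus expected heterozygosity over a set of loci."""
    return sum(expected_heterozygosity(p) for p in allele_freqs) / len(allele_freqs)


def expected_snp_count(region_bp, bp_per_snp=1900):
    """Rough number of SNPs expected in a region at the genome-wide
    average density of one SNP per 1900 bp quoted in the text."""
    return region_bp / bp_per_snp


if __name__ == "__main__":
    # Hypothetical variant: rare in one population, common in another.
    print(is_snp([0.002, 0.03]))
    print(expected_heterozygosity(0.5))
    print(expected_snp_count(190_000))
```

Because the density estimate is a genome-wide average, the last function gives only an order-of-magnitude expectation for any particular region.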
1471-4914/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S1471-4914(01)02183-9
In some cases, SNPs might confer a phenotypic advantage in response to a significant challenge and have been positively selected in a population. In addition, there are undoubtedly many less common single nucleotide variants that do not occur at a high enough frequency to meet the working definition of an SNP, but could nonetheless be of substantial biological or clinical importance (Refs 7,8). In many instances, SNPs have been maintained by genetic hitchhiking, which occurs when linked neutral variants are dragged along with advantageous mutations. Background selection occurs when deleterious mutations are removed from a population, leaving behind flanking neutral variants (Ref. 6). Population shifts can also be traced by selected SNPs, which might have arisen because of a founder effect. The latter refers to a population in which many individuals share an identical region of a chromosome derived from a single ancestor.

Box 1. Essential single nucleotide polymorphism (SNP) facts
• Defined by a frequency of >1% in at least one population
• Stable inheritance
• Building block of haplotypes
• Estimated density of 1 in ~2 kb throughout the genome
• Bi-allelic – suitable for high-throughput genotype analysis
• Topographical classification [a]
    Coding amino acid change (nonsynonymous, nonconservative)
    Coding amino acid change (nonsynonymous, conservative)
    No change in amino acid (synonymous coding)
    Noncoding (5′ UTR)
    Noncoding (3′ UTR)
    Other noncoding (including introns and intergenic regions)
[a] Risch, N.J. (2000) Searching for genetic determinants in the new millennium. Nature 405, 847–856

Table 1. Well-studied genetic variants in human disease
Gene                                                         Variant(s) [a]                   Disease
Susceptibility
Apolipoprotein E (APOE)                                      ε4 allele                        Alzheimer’s disease (Ref. 20)
Factor V (F5)                                                Factor V Leiden                  Venous thrombosis (Ref. 24) [b]
Natural resistance-associated macrophage protein-1 (NRAMP1)  Asp543Asn and intron 4 G/C       Tuberculosis (Ref. 29)
Peroxisome proliferator-activated receptor-γ (PPARG)         Pro12Ala                         Type 2 diabetes (Ref. 21)
Outcomes
Mannose-binding lectin (MBL2)                                B, C, and D structural variants  Cystic fibrosis (Ref. 22)
Tumor necrosis factor-α (TNF)                                G-376A and G-308A of promoter    Cerebral malaria (Refs 30,31)
Pharmacogenomics [c]
β2 adrenergic receptor (ADRB2)                               Gly16Arg                         Catecholamine (albuterol) response (Ref. 25)
Cytochrome P450 IIC, polypeptide 9 (CYP2C9)                  Arg144Cys                        Dosing of warfarin (Ref. 32)
[a] For coding SNPs, the amino acid variant or common designation is listed.
[b] Also associated with resistance to activated protein C (APC resistance).
[c] A subcategory of clinical outcomes.

SNPs and phenotype

Interest has concentrated on SNPs that alter the function or expression of a gene because they make it possible to fit genetic observations to plausible mechanisms of pathogenesis. For many complex diseases, it has been proposed that multiple variants confer susceptibility, but it is not clear whether rare or common variants will be the responsible determinants (Refs 8,9). An SNP classification scheme has been proposed based upon location and the likelihood of altering biological activity (Box 1). Based upon the first annotation of an SNP map across the draft sequence of the human genome, it has been estimated that there could be between 50 000 and 250 000 ‘functionally’ interesting SNPs (Ref. 10). This represents a small percentage of the total number of SNPs, most of which do not appear to alter phenotype. Although it is intuitively apparent that amino acid substitutions can change the function of a protein, gene expression can also be affected by SNPs positioned in critical regulatory sequences. It is notable that SNPs in the promoter regions of cytokine genes, namely tumor necrosis factor (TNF), interleukin 4 (IL4), IL6 and IL10 (i.e. genes that coordinate the immune response), have been associated with a range of infectious and autoimmune disorders. This trend suggests that small differences in the regulation of key genes can amplify or dampen a biological pathway, for example, the coordination of host response to a pathogen. Such variation could ensure diversity in response to a range of infectious and inflammatory disorders.

The significance of a functional SNP should be interpreted in the context of a defined study population that is under specific environmental pressure, because SNPs that confer selective advantage can be deleterious in a different setting. For example, the T→A nucleotide change in codon 6 of the β chain of hemoglobin (HBB) results in the sickle cell mutation. In the setting of endemic malaria, heterozygosity is protective against infection, but without this challenge, there is no selective advantage for heterozygosity at this locus (Ref. 11).

How to find an SNP
The vast majority of SNPs deposited in the public database dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) have been discovered by the public effort, TSC. So far, TSC has discovered over 1.5 million SNPs using sequence data from the public Human Genome Project (Ref. 3). In short, validated informatic algorithms have been designed to search for single nucleotide differences between aligned sequence reads and available genomic sequence. This approach has been successful in identifying common SNPs, namely those with a frequency of greater than 1% in a diverse panel of individuals representative of different populations. Moreover, this approach has concentrated on developing a dense map, with uniform coverage across the existing draft of the human genome.
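The core idea of searching for single nucleotide differences between an aligned read and a reference sequence can be sketched in a few lines. This is a deliberately naive illustration and not the TSC algorithm: it assumes the read has already been aligned without gaps at a known offset, and it ignores base quality, sequencing error and paralogous sequence, all of which a validated pipeline has to handle. The function name and example sequences are hypothetical.

```python
def candidate_snps(reference, read, offset=0):
    """Return (position, reference_base, read_base) for every single-base
    mismatch between a read and the reference.

    Assumes an ungapped alignment starting at `offset` in the reference.
    Positions are 0-based reference coordinates. Ambiguous bases such as
    'N' are skipped rather than reported as differences.
    """
    mismatches = []
    for i, (ref_base, read_base) in enumerate(zip(reference[offset:], read)):
        if ref_base in "ACGT" and read_base in "ACGT" and ref_base != read_base:
            mismatches.append((offset + i, ref_base, read_base))
    return mismatches


if __name__ == "__main__":
    reference = "ACGTACGT"
    read = "ACGAACGT"  # differs from the reference at position 3
    print(candidate_snps(reference, read))
```

A mismatch found this way is only a candidate: it becomes an SNP in the sense used here once it is confirmed and shown to reach a frequency above 1% in at least one population.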
Approximately 5% of SNPs in the public database have been discovered by gene-based studies (Ref. 3). Such deliberate efforts to re-sequence a gene or set of genes belonging to a common biological pathway have moved forward slowly. Public efforts are underway to identify and validate SNPs present in genes drawn from critical pathways, such as the Th1 and Th2 cytokines, complement cascade factors and mediators of apoptosis. In turn, these can be applied to relevant studies, which are based upon a plausible hypothesis. For example, functional SNPs in DNA repair genes could be studied in selected cancers. This approach enables an investigation that examines a biological process in toto. Using a biological model of a pathway, the choice of candidate SNPs can be extended to include known genes that participate in the dynamic process, such as the targets of signal transduction or mediators of intracellular trafficking. In parallel, elaborate alignment programs are able to search clusters of EXPRESSED SEQUENCE TAGS (ESTs) for possible SNPs that have an increased likelihood of residing in a coding region or a 3′ untranslated region of a gene (Refs 12,13). Sophisticated informatic tools can search for putative SNPs that predict amino acid substitutions, but are less effective at predicting functional SNPs in noncoding regions. The availability of the mouse genome sequence and phylogenetic footprinting algorithms offers an additional opportunity for comparative genomic studies in the vicinity of known genes. Conserved regions of mouse and human genomes might contain functional SNPs within regulatory sequences.

[Figure 1. Flow diagram. Unrelated population: case-control or cohort study, leading to candidate-gene association study and genome-wide association study† (outcome measure: odds ratio or relative risk). Family: genome-wide linkage analysis (outcome measure: LOD score) and transmission disequilibrium test (outcome measure: proportion of transmitted alleles). Inset 2 × 2 table of gene frequency with cells A and B (case) and C and D (control): odds ratio = (A × D)/(B × C); relative risk = [A/(A + B)]/[C/(C + D)].]

Fig. 1. Different study designs for population-based genetic association studies. Study populations suitable for genetic analysis include families with affected cases or unrelated affected cases and suitable controls. On the left, association studies of unrelated populations use the case-control or COHORT STUDY (see Glossary) designs; the genotype or allele frequencies are compared between the affected cases and unaffected control group. A difference in allele frequency between case and control groups is indicated by the odds ratio in case-control studies (or the relative risk in cohort studies). Candidate gene case-control studies are the most commonly performed analyses between unrelated cases and controls. Genome-wide association studies can employ a case-control design and analyze a collection of SNPs and microsatellites regularly spaced across the genome. On the right, family based genetic studies use linkage analysis to identify highly penetrant mendelian disorders; the main outcome measure of linkage being the LOD SCORE (or logarithm of odds). Currently, genome-wide linkage scans of family pedigrees are possible using 200–400 microsatellite markers at approximately 10 centiMorgan intervals, but such study designs might be of limited value for the identification of genes of modest effect (Ref. 16). The transmission disequilibrium test compares the proportion of alleles transmitted (or inherited) within a family looking at heterozygous parents and their offspring (Ref. 29). Case-control studies use the odds ratio as an outcome measure, while the main outcome measure for a cohort study is the relative risk. †Some have proposed that genome-wide association studies (i.e. case-control studies) can be performed with sufficient statistical power using unaffected siblings as controls (sibship-based tests) as an alternative to case-control studies of unrelated populations (Ref. 29).
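The two outcome measures named in Fig. 1 can be computed directly from the four cell counts of a 2 × 2 table. The sketch below follows the formulas printed in the figure, with cells labelled A to D; the function names and the example counts are hypothetical, and no confidence interval or significance test is attempted.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (A x D) / (B x C) for a 2 x 2 table with cells A, B, C, D."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when B or C is zero")
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk [A / (A + B)] / [C / (C + D)] for the same table."""
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("relative risk is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    a, b, c, d = 30, 70, 15, 85
    print(round(odds_ratio(a, b, c, d), 3))
    print(round(relative_risk(a, b, c, d), 3))
```

A value of 1.0 for either measure corresponds to no association. In practice the estimate is reported with a confidence interval, which is omitted here.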
Construction of haplotypes
Combinations of SNPs are inherited together on the same DNA strand to form haplotypes. Although it is early in the effort to annotate the genome, it appears that regions of haplotype sharing are large (Refs 3,33). If the number of common haplotypes is limited, then haplotype analysis could streamline the analysis of genetic factors responsible for complex diseases (Ref. 14). Without prior knowledge of the functional variant, genomic regions can be evaluated for association with disease outcomes. Later, the functionally significant SNPs can be identified by closer examination of regions identified by haplotype-based studies. The public effort to map SNPs across the entire genome has established a foundation upon which to construct common and uncommon haplotypes using LINKAGE DISEQUILIBRIUM analysis (Refs 15,33).

Study design for SNP-based approach towards studying human diseases
Two experimental strategies are being used by geneticists to investigate genetic variants in human disease: linkage analysis and association studies. The former, LINKAGE ANALYSIS, seeks to define a physical relationship between two or more genetic markers, identifying the location of a disease gene, whereas ASSOCIATION ANALYSIS correlates a sequence variant with a well-defined phenotype. Linkage analysis compares inheritance patterns of predefined genetic markers residing on the same chromosome to a disease outcome. The approach pinpoints a region of a chromosome based upon a non-random pattern of co-inheritance of markers, although usually it does not uncover the specific genetic variation responsible for disease outcome. Multigeneration family pedigrees are required to facilitate tracing segregation patterns of genetic markers. Linkage studies have been employed to map hundreds of highly penetrant disease loci. The advantage of analyzing family pedigrees is that disease inheritance can be compared to patterns of linkage over large genomic regions in an effort to map genetic mutations. On the other hand, it is difficult to collect affected families of sufficient size or number to effectively apply these methods. Finally, traditional linkage studies might have limited power for identifying the moderate gene effects postulated to contribute to complex diseases (Ref. 16).

[Figure 2. Flow diagram. Left: human genome, high-density genome-wide genetic map, genome-wide association study (odds ratio), genetic marker, map SNPs and haplotypes in candidate gene(s), informative SNP and candidate gene haplotype. Right: candidate gene, map SNPs, candidate gene association study, informative SNP and candidate gene haplotype. Final step: validation in clinical study and in vitro correlation.]

Fig. 2. Parallel approaches to investigating genetic variation and its contribution to disease. On the left, the approach to a genome-wide association scan is portrayed, which requires a sufficiently dense map of markers (microsatellites or SNPs) distributed across the genome. A genetic marker is localized to a region of a chromosome if a sufficiently high outcome measure, such as an odds ratio, is observed. Candidate genes in the region are later analyzed to pinpoint the genetic variant (or haplotype) responsible for the observed phenotype. On the right, the figure shows the design for a candidate gene study. Candidate SNPs are selected on the basis of prior biological or genetic observations. Both approaches utilize case-control or COHORT STUDY (see Glossary) designs where affected and unaffected cases are compared in relation to well-defined epidemiological endpoints (with outcome measures of either odds ratio for case-control studies or relative risk for cohort studies). *denotes genetic marker (SNP or microsatellites). ‡denotes informative genetic marker.

Association studies test whether an SNP or MICROSATELLITE is enriched in patients with disease compared to suitable controls. Typically, the study uses a case-control or cohort strategy involving unrelated subjects (Fig. 1). If a susceptibility factor is more prevalent in cases than in controls, an association could be inferred. Still, a variant might not be directly responsible for a disease phenotype, but could be a genetic marker in linkage with a nearby locus. Association studies could be particularly useful in the identification of genetic markers that modify risk for disease outcome. Optimal study design typically interrogates SNPs that are present in sufficiently high frequency and that alter the function or expression of the gene product. Thus, a candidate SNP association study is particularly useful if a specific disease model is to be tested, because it utilizes genetic markers presumed to have phenotypic consequences. Until recently, the cost and efficiency of genotype platforms have limited the opportunity to analyze large numbers of SNPs in association studies. SNPs identified in pilot studies should be re-tested in additional populations, but the difficulty in replicating findings in separate study populations has raised questions about the applicability of the results to populations with different genetic and environmental profiles (Ref. 17).

Association studies are usually conducted with unrelated subjects, which overcomes the enormous problem encountered in gathering enough related subjects for family-based studies. However, TRANSMISSION DISEQUILIBRIUM testing is a useful method to study genetic associations if a sufficient number of families are available. A limitation of CASE-CONTROL STUDIES of unrelated subjects is that the selection of controls could lead to bias. The problem of population stratification arises when cases and controls differ sufficiently in ethnic admixture and allele prevalence also differs by ethnicity. In simulated analyses, large differences in allele frequency by ethnic background, together with significant differences in ethnic admixture between cases and controls, are needed before bias substantially influences the result of a genetic association study (Ref. 18). One solution is to test a set of unlinked genetic markers in the study population to directly determine if there is a substantive difference between the estimated ancestry of the sampled cases and controls (Ref. 19).
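The idea of checking for stratification with unlinked markers can be illustrated with a simple calculation. The sketch below compares allele counts between cases and controls at each marker with a Pearson chi-square statistic and sums the statistics across markers; under the assumption that the markers are unlinked and unrelated to disease, the sum would be compared against a chi-square distribution with one degree of freedom per bi-allelic marker. It is a simplified sketch in the spirit of the unlinked-marker approach cited above, not a reimplementation of the published method, and the function names and counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for allele counts: cases (a, b) and controls (c, d), where the first
    count in each pair is allele 1 and the second is allele 2."""
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])


def stratification_statistic(markers):
    """Sum the per-marker chi-square statistics over a set of unlinked
    bi-allelic markers. Returns (statistic, degrees_of_freedom)."""
    total = sum(chi_square_2x2(*counts) for counts in markers)
    return total, len(markers)


if __name__ == "__main__":
    # Hypothetical allele counts (case allele 1, case allele 2,
    # control allele 1, control allele 2) at two unlinked markers.
    markers = [(10, 10, 10, 10), (20, 10, 10, 20)]
    statistic, df = stratification_statistic(markers)
    print(round(statistic, 3), df)
```

A large summed statistic relative to its degrees of freedom would suggest that cases and controls differ in ancestry, so that an association at a candidate SNP should be interpreted with caution.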
SNPs and human disease

Studies using SNPs to probe the genetic basis of human disease can provide insights into fundamentally different questions: susceptibility to a disease, modification of the phenotype of a monogenic disease, and response to pharmacologic treatment (see Table 1). Among the best examples of an association between a genetic marker and a complex disorder is that of Alzheimer’s disease (AD), in which the ε4 allele of apolipoprotein E (APOE) has been strongly correlated with the risk of AD (Ref. 20) (Table 1). The field has moved towards examining gene–gene interactions by evaluating additional SNPs in AD (Ref. 17). Another example is non-insulin-dependent diabetes mellitus (NIDDM), with which at least 16 common SNPs have been associated. The strongest association validated in confirmation studies has been reported between a variant of the peroxisome proliferator-activated receptor-γ gene (PPARG) and diabetes mellitus (Ref. 21) (Table 1).

Although many have embraced the utility of SNP markers for the study of complex diseases, the candidate gene approach is also well suited to examining phenotypic differences in outcomes in diseases typically considered monogenic (e.g. cystic fibrosis, sickle cell anemia or chronic granulomatous disease) (Refs 22,23). For many years, it has been appreciated, and even confirmed in family studies, that the clinical course of patients with cystic fibrosis (CF) is heterogeneous, even among those with a common CFTR genotype (∆F508). In most cases, the primary mutation does not effectively predict the specific outcomes observed in the course of the disease. Recently, variants in the mannose-binding lectin gene, MBL2, have been shown to be associated with deleterious pulmonary outcomes in CF (Ref. 22) (Table 1). Here, the link between pulmonary outcome and SNPs that alter the function and circulating level of the protein has provided a new insight into the pathogenesis of lung disease. Thus, it is possible that replacement therapy with recombinant MBL2 protein could benefit a subset of CF patients.

For the clinician, the challenge is to apply the fruits of candidate SNP studies to patient care. Already, clinicians have embraced testing for informative SNPs; for example, genotype analysis for the Factor V Leiden variant is one of several important parameters used to determine the appropriate duration of anticoagulant therapy following a documented venous thrombosis (Ref. 24).
Pharmacogenomics is an emerging field that attempts to use genetic variation to explain differences in response to drug therapy. For example, variants in the β2 adrenergic receptor (β2 AR) affect the regulation of receptors in response to catecholamine exposure (Ref. 25). Asthma patients who are homozygous for an SNP at codon 16 of ADRB2 respond differently to a commonly prescribed β agonist. Similarly, genetic variation could be used to identify individuals at high risk for life-threatening toxicities.

Future directions
The clinical potential of the SNP revolution has yet to be realized (Fig. 2). It remains to be determined whether the current approaches, namely candidate gene studies and whole-genome scans, will be effective in dissecting the genetic basis of common as well as rare complex diseases (see Anthony Brookes, page 512). So far, the majority of published studies have identified a handful of genetic markers (i.e. SNPs) that alter the risk for disease outcome. It is difficult to predict how many genetic markers will be validated in well-designed confirmation studies. If the current approaches are successful, the next step will be to understand the relationship between informative SNPs and haplotypes and begin to build profiles comprising genetic markers present in different regions of the genome. If this is accomplished, an additional, difficult step will be to determine the significance of each contributing SNP or haplotype. In the near future, it will be necessary to develop more sophisticated analytical tools to investigate gene–gene interactions on a larger scale than is currently possible. Similarly, it will also be critical to account for gene–environment interactions, many of which will require parallel investigation in animal models or in vitro laboratory studies.

Deciphering the contribution of genetic variation to disease outcome carries important implications for the theory and practice of medicine. Based upon the initial findings of such investigations, it is anticipated that the most significant contribution of SNP-based studies will be the identification of genetic markers that alter the risk for a disease outcome. The integration of these data into clinical medicine will be a daunting task, because it will require both a substantial shift in the public’s perception of genetic risk factors and the establishment of a new set of paradigms for clinical medicine. Currently, genetic testing is restricted mainly to a set of informative, highly penetrant disease conditions, in which the presence of a genetic mutation is predictive of a defined outcome (e.g. diagnosis of cystic fibrosis or hemophilia mutations). The exact manner in which the findings of SNP-based studies will be integrated into clinical practice remains to be defined. Most likely, they will influence preventive and early intervention strategies, encouraging individuals to make smart choices with respect to exposures or participation in high-risk activities (see Helena Furberg and Christine Ambrosone, page 517). For example, determination of a common variant in the myeloperoxidase gene, MPO, could be used to encourage individuals at higher risk for lung cancer to minimize exposure to tobacco smoke (Ref. 26).

The promise of pharmacogenomics could substantially shift current algorithms for drug therapy in clinical medicine. Selection of a particular drug could take into account an individual’s genetic profile to predict the likelihood of response. Determination of genetic profiles associated with severe or life-threatening toxicity will permit health care providers to choose alternative prescriptions. An informative example is the altered metabolism of 6-mercaptopurine (6MP), a drug commonly used for the treatment of childhood acute lymphoblastic leukemia (ALL); the altered metabolism occurs at a frequency of slightly less than 1%. Previously, it had been noted that a small subset of patients develop potentially fatal hematopoietic toxicity. This low-prevalence thiopurine intolerance occurs in individuals who are homozygous for one of several rare variants in the TPMT (thiopurine S-methyltransferase) gene (Ref. 27). These data suggest that screening of newly diagnosed patients could avoid deleterious complications in ALL, and that therapy could be modified for the at-risk group.

Conclusions

There are a number of important challenges to meet before the results of SNP-based studies can be integrated into clinical medicine. It is possible that medicine will evolve to incorporate predictive models based upon genetic profiles. However, as this shift looms on the horizon, we need to address the difficult ethical, financial and personal questions raised by the coming age of individual, tailored medicine (Ref. 28). The combination of SNP analysis with new approaches to investigate profiles of gene expression and proteomics should lead to fundamental insights into the biological importance of common genetic variations in the human genome.

References
1 Venter, J.C. et al. (2001) The sequence of the human genome. Science 291, 1304–1351
2 Lander, E.S. et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921
3 Sachidanandam, R. et al. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933
4 Cargill, M. et al. (1999) Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22, 231–238 5 Halushka, M.K. et al. (1999) Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 22, 239–247 6 Nachman, M.W. (2001) Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17, 481–485
7 Glatt, C.E. et al. (2001) Screening a large reference sample to identify very low frequency sequence variants: comparisons between two genes. Nat. Genet. 27, 435–438 8 Pritchard, J.K. (2001) Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 9 Collins, F.S. et al. (1997) Variations on a theme: cataloging human DNA sequence variation. Science 278, 1580–1581
512
Review
TRENDS in Molecular Medicine Vol.7 No.11 November 2001
10 Risch, N. (2001) The genetic epidemiology of cancer: interpreting family and twin studies and their implications for molecular genetic approaches. Cancer Epidemiol. Biomarkers Prev. 10, 733–741
11 Flint, J. et al. (1998) The population genetics of the haemoglobinopathies. Baillieres Clin. Haematol. 11, 1–51
12 Buetow, K.H. et al. (1999) Reliable identification of large numbers of candidate SNPs from public EST data. Nat. Genet. 21, 323–325
13 Irizarry, K. et al. (2000) Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat. Genet. 26, 233–236
14 Taillon-Miller, P. et al. (2000) Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat. Genet. 25, 324–328
15 Kruglyak, L. (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22, 139–144
16 Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science 273, 1516–1517
17 Emahazion, T. et al. (2001) SNP association studies in Alzheimer’s disease highlight problems for complex disease analysis. Trends Genet. 17, 407–413
18 Wacholder, S. et al. (2000) Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J. Natl. Cancer Inst. 92, 1151–1158
19 Pritchard, J.K. and Rosenberg, N.A. (1999) Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65, 220–228
20 Strittmatter, W.J. et al. (1993) Apolipoprotein E: high-avidity binding to β-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc. Natl. Acad. Sci. U. S. A. 90, 1977–1981
21 Altshuler, D. et al. (2000) The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat. Genet. 26, 76–80
22 Garred, P. et al. (1999) Association of mannose-binding lectin gene heterogeneity with severity of lung disease and survival in cystic fibrosis. J. Clin. Invest. 104, 431–437
23 Foster, C.B. et al. (1998) Host defense molecule polymorphisms influence the risk for immune-mediated complications in chronic granulomatous disease. J. Clin. Invest. 102, 2146–2155
24 Seligsohn, U. and Lubetsky, A. (2001) Genetic susceptibility to venous thrombosis. New Engl. J. Med. 344, 1222–1231
25 Israel, E. et al. (2000) The effect of polymorphisms of the β(2)-adrenergic receptor on the response to regular use of albuterol in asthma. Am. J. Respir. Crit. Care Med. 162, 75–80
26 London, S.J. et al. (1997) Myeloperoxidase genetic polymorphism and lung cancer risk. Cancer Res. 57, 5001–5003
27 Krynetski, E.Y. et al. (1996) Genetic polymorphism of thiopurine S-methyltransferase: clinical importance and molecular mechanisms. Pharmacogenetics 6, 279–290
28 Collins, F.S. (1999) Shattuck lecture – medical and societal consequences of the Human Genome Project. New Engl. J. Med. 341, 28–37
29 Bellamy, R. et al. (1998) Variations in the NRAMP1 gene and susceptibility to tuberculosis in West Africans. New Engl. J. Med. 338, 640–644
30 McGuire, W. et al. (1994) Variation in the TNF-α promoter region associated with susceptibility to cerebral malaria. Nature 371, 508–510
31 Knight, J.C. et al. (1999) A polymorphism that affects OCT-1 binding to the TNF promoter region is associated with severe malaria. Nat. Genet. 22, 145–150
32 Furuya, H. et al. (1995) Genetic polymorphism of CYP2C9 and its effect on warfarin maintenance dose requirement in patients undergoing anticoagulation therapy. Pharmacogenetics 5, 389–392
33 Reich, D.E. et al. (2001) Linkage disequilibrium in the human genome. Nature 411, 199–204
Rethinking genetic strategies to study complex diseases
Anthony J. Brookes
Understanding the genetic basis of complex diseases is turning out to be difficult, prompting a widespread (re-)evaluation of the relevant issues. ‘Forward’ and ‘reverse’ genetics strategies have arguably been applied in a manner suitable only for much simpler diseases. It would now be beneficial to pay detailed attention to experimental design, and to increase study scales dramatically. Ultimately, this would lead to completely hypothesis-free, truly comprehensive, multi-platform investigations. Such studies would maximize the chances of finding data patterns indicative of real etiology, although many aspects of complex disease causation might simply be too intricate and inconsistent ever to be deciphered. Therefore, considerable technology development is an immediate priority, along with parallel advances in bioinformatics and biostatistics systems aimed at discriminating between marginal signals and background noise within extremely large, diverse and complex data sets. Community standards and open data sharing will be essential ingredients for success in this exciting 21st-century challenge.
Anthony J. Brookes
Center for Genomics and Bioinformatics, Karolinska Institute, Theorells väg 3, S-171 77 Stockholm, Sweden.
e-mail: Anthony.Brookes@cgb.ki.se
Researchers worldwide are working hard to elucidate the genetic basis of complex diseases, but progress is frustratingly slow. To appreciate why this is so, one might compare deciphering the molecular etiology of complex morbidity to the challenge of reassembling a jigsaw puzzle to create a view or understanding that did not previously exist. This simple analogy is useful as it highlights two points that will be echoed throughout this text. First, just as a young child would not be able to tackle a highly complex jigsaw puzzle, one may expect that there will be many aspects of disease etiology that investigators (with less than perfectly developed technologies) will never be able to resolve [1]. Second, rather than completing the jigsaw by (adult) rational thinking, even the dumbest robotic device could in principle reassemble a puzzle of immense complexity by implementing the simple command ‘try all combinations and see what consistently fits’ (see Fig. 1). Similarly, as statistical genetic, genomic and computational technologies improve, it is likely that within one or two decades a corresponding ‘hypothesis-free’, comprehensive and highly automated research strategy could turn out to be the most effective (although still limited) way to unravel the molecular basis of human disease.
The complex disease puzzle
To place these ideas in a real-world context, one can consider the history of previous genetic disease
1471-4914/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S1471-4914(01)02163-3