Clinica Chimica Acta 363 (2006) 127 – 137 www.elsevier.com/locate/clinchim
Review
Clinical genotyping: The need for interrogation of single nucleotide polymorphisms and mutations in the clinical laboratory
Gregory J. Tsongalis a,*, William B. Coleman b
a Department of Pathology, Dartmouth Medical School and Dartmouth Hitchcock Medical Center, Lebanon, NH, United States
b University of North Carolina Medical School, Chapel Hill, NC, United States
Received 20 April 2005; received in revised form 21 May 2005; accepted 21 May 2005
Available online 15 August 2005
Abstract
Background: Detection of single nucleotide polymorphisms (SNPs) and gene mutations is becoming more routine in the clinical laboratory.
Methods: Completion of the Human Genome Project has led to new scientific knowledge of human disease processes that has revealed the most fundamental of abnormalities in nucleic acids while at the same time bringing some of the most sophisticated diagnostic tools to the clinical laboratory. In addition, awareness of and sensitivity to human genetics among both lay persons and healthcare providers have increased tremendously. Together, this rapidly evolving science and increased public education have led to an increasing demand for genotypic testing.
Conclusions: There are several clinical applications of human genotyping that are available using these newer technologies.
© 2005 Elsevier B.V. All rights reserved.
Keywords: SNP; Genotyping; Mutation; Molecular diagnostics
Contents
1. Introduction
2. The Human Genome Project
3. Single nucleotide polymorphisms (SNPs)
4. Educating the populace
5. Sources and types of genetic variation
6. New genotyping technologies
7. Clinical applications
7.1. Analysis of less complex genetic diseases
7.2. Analysis of highly complex and common polygenic diseases
8. Summary
References
1. Introduction
* Corresponding author. Molecular Pathology, Dartmouth Hitchcock Medical Center, One Medical Center Drive, Lebanon, NH 03756, United States. Tel.: +1 603 650 5498; fax: +1 603 650 4845. E-mail address: [email protected] (G.J. Tsongalis).
0009-8981/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.cccn.2005.05.043
The ability for clinical laboratories to detect human genetic variation has historically been limited to a rather small number of traditional genetic diseases whose gene sequences and mutation spectra had been clearly identified. For many of these diseases, no clinical laboratory testing was ever available and treatment options were minimal if at
G.J. Tsongalis, W.B. Coleman / Clinica Chimica Acta 363 (2006) 127 – 137
all existent. These realities made turn-around-time and assay performance characteristics less of an issue than the quality assurance we practice today. Currently, there is a growing need for clinical laboratories to provide high quality genotyping-based tests within the realm of clinical relevance, including performance and turn-around-time. This need is being driven by the completion of the Human Genome Project, identification of millions of human single nucleotide polymorphisms (SNPs), education of the populace with respect to genetic diseases, and development of new technologies suitable for the clinical laboratory [1]. All of these driving forces will continue to place an unprecedented demand on the clinical laboratory to provide increased and refined diagnostic testing capabilities for rapid identification of genomic targets. While all genetic variants could be detected by rapid genotyping methods, this review will provide a general overview of the Human Genome Project effort, several new technologies and examples of clinical genotyping for monogenic and polygenic diseases.
2. The Human Genome Project
In the mid-1980s, the United States Department of Energy (DOE), as part of its studies addressing the protection of the genome from the mutagenic effects of radiation, established an early genome project. In 1988, the United States Congress funded both the National Institutes of Health (NIH) and the DOE to coordinate research and technical activities related to the human genome [2]. In 1989, under the direction of James Watson, the National Center for Human Genome Research (NCHGR) was created and a 5-year plan implemented for the initial phase of what was estimated to be a 15-year research effort. In 1993, Francis S. Collins was named the new director of the NCHGR and a new 5-year plan was implemented based on technological improvements in large-scale sequencing methods [3]. A large part of the early and subsequent successes of the Human Genome Project is attributable to the development of improved technologies that accelerated the elucidation of the human genome sequence.
After receiving full institute status at NIH in 1997, the National Human Genome Research Institute (NHGRI) announced a third 5-year plan in 1998 [4]. Two years later, it was announced that the sequence of the majority of the human genome was complete in draft form. The draft sequence of the human genome (including 90% of the human genome's 3 billion basepairs) was published in 2001 by the NHGRI and the International Human Genome Sequencing Consortium [5], and by a group of researchers led by J. Craig Venter and Celera Genomics [6]. Just 2 years later, in April of 2003, the NHGRI announced that the last of the Human Genome Project's original goals had been achieved with the completion of the sequencing of the human genome, ushering in the “genomic era” [7,8]. Since that time, the International Human Genome Sequencing Consortium has continued to work towards completion of the finished sequence of the human genome. The most recent version of the human genome sequence contains 2.85 billion basepairs, corresponding to 99% of the euchromatic genome, with very few gaps remaining to be resolved [9].
Despite the rapid progress towards completion of the sequence of the human genome, the actual number of genes contained in the human genome remains unknown. The difficulty with this determination reflects the lack of straightforward criteria that will reliably identify structural genes in the genomic DNA sequence [10]. Early estimates suggested that the human genome might contain 70,000 to 100,000 genes [11–13]. However, more recently the number of genes contained in the human genome has been estimated to be approximately 30,000–40,000 [14,15], and possibly as few as 20,000–25,000 [9]. Refinement of the human genome sequence, identification and characterization of the genes contained, and description of the features of the genome continue at a rapid pace [16].
3. Single nucleotide polymorphisms (SNPs)
Early analysis of the draft sequences of the human genome revealed considerable variability between individuals. However, it is now thought that any two individuals will display 99.9% identity at the DNA sequence level. The remaining 0.1% variation can be accounted for, in part, by the presence of single nucleotide polymorphisms (SNPs) distributed throughout the genome [17,18]. SNPs are defined as common, single base variations that occur in human DNA at a frequency of approximately 1 in every 1000 bases. Early estimates suggested that the human genome would contain 3–4 million SNPs [17,18]. However, the NCBI dbSNP database currently contains over 5 million validated SNPs and nearly 10 million SNP records [19]. These seemingly benign sequence variations can greatly affect an individual's risk for developing disease. Some mutations can be considered SNPs, reflecting the fact that they are represented in >1% of the general population. While not all mutations are polymorphisms, many authors use mutation and SNP interchangeably to indicate a genetic variant. SNPs may be associated with disease-causing genes even when they are not themselves functional. SNPs can, therefore, also serve as markers in the search for disease-causing genes [20,21]. Sets of nearby SNPs on the same chromosome are inherited in groups. This pattern of SNP groupings is known as a haplotype. A major initiative in the study of SNPs and human diseases is the International HapMap Project [22]. A HapMap is a map of these SNP groupings on various chromosomes. This should make genome scanning approaches to define disease-associated SNPs or genes much more efficient. The HapMap should also be a powerful tool for identifying
genetic factors associated with adverse responses to drugs and other disease mechanisms [23]. Nonetheless, identification of clinically relevant SNPs has become a major initiative in diagnostic testing laboratories.
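The figures quoted in Sections 2 and 3 can be checked against one another with simple arithmetic. The following is a minimal sketch, assuming a round genome size of 3 billion basepairs and treating the quoted SNP density and sequence identity as exact; the function names are illustrative only.

```python
# Back-of-the-envelope check of the SNP figures quoted in the text.
# All constants are the rounded values from Sections 2 and 3; they are
# treated as exact here purely for illustration.
GENOME_SIZE_BP = 3_000_000_000   # ~3 billion basepairs
SNPS_PER_BASE = 1 / 1000         # ~1 SNP in every 1000 bases
SEQUENCE_IDENTITY = 0.999        # 99.9% identity between two individuals


def expected_snp_count(genome_size_bp: int, snps_per_base: float) -> int:
    """Number of SNP sites implied by a uniform SNP density."""
    return round(genome_size_bp * snps_per_base)


def differing_bases(genome_size_bp: int, identity: float) -> int:
    """Number of positions at which two genomes are expected to differ."""
    return round(genome_size_bp * (1.0 - identity))


if __name__ == "__main__":
    print("SNP sites implied by density:", expected_snp_count(GENOME_SIZE_BP, SNPS_PER_BASE))
    print("Bases differing at 99.9% identity:", differing_bases(GENOME_SIZE_BP, SEQUENCE_IDENTITY))
```

Both calculations give about 3 million positions, which is consistent with the early estimate of 3–4 million SNPs cited above.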
4. Educating the populace
One consequence of the Human Genome Project has been the raised profile of human genetics among lay persons through various forms of media. An educated patient population has resulted in challenges to the healthcare system that nobody could have predicted. The Internet has been responsible for the education of, or rather the provision of information to, more individuals than any educational institution worldwide. One could argue as to the accuracy of the information available through the Internet, yet it has become commonplace for a patient or their family to seek the latest research, therapeutic options and clinical trials for a given disease via the World Wide Web. In an attempt to better inform patients and their families, mass educational efforts are needed to distribute basic yet accurate information to the consumer served by the healthcare system. Two national efforts include the Family History Initiative and National DNA Day.
On November 8, 2004, the U.S. Department of Health and Human Services launched a Family History Initiative to encourage all Americans to learn about their families' health histories as a way of promoting personal health and preventing disease. Obtaining an accurate family history is often difficult for physicians seeing a single patient. However, using a new website tool, family members can begin this task during family gatherings [24]. This new family health history tool encourages the first step in providing accurate preventative medicine. In addition to the Family History Initiative, the NHGRI has promoted a National DNA Day to commemorate the completion of the Human Genome Project and the discovery of the DNA double helix [25]. Both of these initiatives are first steps on a national level to educate and prepare individuals for this new era in medicine and promote awareness of a genetic role in family medicine.

Fig. 1. Chromosomal abnormalities. (Panels: normal human chromosome; amplification; terminal deletion.)

Table 1. Known types of human genetic variants
Chromosomal abnormalities: aneuploidy; translocation; large deletion; gene amplification
Nucleotide sequence abnormalities: point mutation; single nucleotide polymorphism; insertion; deletion
Point mutations: missense; nonsense; transition or transversion; splice site
5. Sources and types of genetic variation
The human population displays a range of phenotypes that reflect variations in heritable traits or characteristics. Some of these phenotypic differences are related to complex gene expression patterns, while others reflect allelic variation between individuals. Allelic variation refers to the subtle differences in DNA sequence that exist between individuals for any particular gene. Additional variation within the human population is related to acquired mutations that have entered the human gene pool, defining the genetic history of humans and distribution of genetic variation from a founding population (founder effect) [26–28]. Mutations that are not tolerated are not retained in the gene pool, while advantageous mutations are efficiently passed from generation to generation.
Genetic abnormalities or variants can arise in and be inherited through the germline. Likewise, mutations can be acquired in somatic cells. In both cases, these genetic variations can result from inefficient repair of one of many types of DNA damage [29]. Spontaneous lesions can occur during normal cellular processes such as DNA replication, DNA repair, or gene rearrangement [30], or through chemical alteration of the DNA molecule itself as a result of hydrolysis, oxidation, or methylation [31,32]. Typically, DNA lesions create nucleotide mismatches that lead to point mutations. Nucleotide mismatches can result from the formation of an apurinic or apyrimidinic site following depurination or depyrimidination reactions, nucleotide conversions involving deamination reactions, or in rare instances from the presence of a tautomeric form of an individual nucleotide in replicating DNA. The most common nucleotide deamination reaction involves methylated cytosine, which can replace cytosine in the linear sequence of a DNA molecule in the form of 5-methylcytosine. The deamination of 5-methylcytosine results in the formation of thymine. This particular deamination reaction accounts for a large percentage of spontaneous mutations in human disease [33,34].

Fig. 2. A. Example of normal gene structure with corresponding transcribed and amino acid sequence. B. Examples of missense and nonsense mutations with corresponding changes to the amino acid sequences.
Panel A (labels: 5′, 3′, P, T, Exon, Coding Sequence, Splice Donor, Splice Acceptor): transcribed sequence GTG AAG TCA TGC; amino acid translation VAL LYS SER CYS.
Panel B, missense mutation: normal sequence GTG AAG TCA TGC (VAL LYS SER CYS); mutant sequence GTG ATG TCA TGC (VAL MET SER CYS).
Panel B, nonsense mutation: normal sequence GTG AAG TCA TGC (VAL LYS SER CYS); mutant sequence GTG TAG TCA TGC (VAL STOP).

Table 2. The genetic code
First position (5′-end) | Second position: U | C | A | G | Third position (3′-end)
U | Phe | Ser | Tyr | Cys | U
U | Phe | Ser | Tyr | Cys | C
U | Leu | Ser | Stop | Stop | A
U | Leu | Ser | Stop | Trp | G
C | Leu | Pro | His | Arg | U
C | Leu | Pro | His | Arg | C
C | Leu | Pro | Gln | Arg | A
C | Leu | Pro | Gln | Arg | G
A | Ile | Thr | Asn | Ser | U
A | Ile | Thr | Asn | Ser | C
A | Ile | Thr | Lys | Arg | A
A | Met | Thr | Lys | Arg | G
G | Val | Ala | Asp | Gly | U
G | Val | Ala | Asp | Gly | C
G | Val | Ala | Glu | Gly | A
G | Val | Ala | Glu | Gly | G

Table 3. PCR detection methods
Electrophoretic: gel vs. capillary; restriction enzyme digestion; conformational analysis (HA, SSCP); sequencing
Hybridization: Southern blot; dot/slot blot; line probe assay; microarray
Real time: FRET; TaqMan; molecular beacons; intercalating dyes

Interaction of DNA with physical agents, such as ionizing radiation, can lead to single or double-strand breaks due to scission of phosphodiester bonds on one or both polynucleotide strands of the DNA molecule. Ultraviolet light can produce different forms of photoproducts, including pyrimidine dimers between adjacent pyrimidine bases on the same DNA strand. Nucleotide base modifications can result from exposure of the DNA to various chemical agents, including N-nitroso compounds
and polycyclic aromatic hydrocarbons. DNA damage can also be caused by chemicals that intercalate the DNA molecule and/or cross-link the DNA strands. Bifunctional alkylating agents can cause both intrastrand and interstrand crosslinks in the DNA molecule.
Various forms of spontaneous and induced DNA damage can give rise to different types of mutation, including both gross alteration of chromosomes and more subtle alterations to specific gene sequences in otherwise normal chromosomes. Gross chromosomal aberrations include (i) large deletions, (ii) additions (reflecting amplification of DNA sequences), and (iii) translocations (reciprocal and nonreciprocal) (Fig. 1). The most common forms of genetic abnormality involve single nucleotide alterations, small deletions, or small insertions into specific gene sequences (Table 1). Single nucleotide alterations that involve a change in the normal coding sequence of the gene are referred to as point mutations (Fig. 2A and B). The consequence of most point mutations is an alteration in the amino acid sequence of the encoded protein (Fig. 2B). However, some point mutations are “silent” and do not affect the structure of the gene product. Silent mutations are possible since most amino acids can be encoded by more than one triplet codon (Table 2).
Point mutations fall into two classes, which are termed missense mutations and nonsense mutations (Fig. 2B). Missense mutations involve nucleotide base substitutions that alter the translation of the affected codon triplet. In contrast, nonsense mutations involve nucleotide base substitutions that convert a triplet codon which normally encodes an amino acid into a translational stop codon. This results in the premature termination of translation and the production of a truncated protein product.

Fig. 3. Cepheid GeneXpert®.

Table 4. SNP genotyping for less complex genetic diseases

Fig. 4. Verigene System (Verigene AutoProcessing System (APS) and the Verigene ID) (Nanosphere Inc.).

Small deletions and insertions typically result in frameshift mutations because the deletion or insertion of a single
nucleotide (for instance) alters the reading frame of the gene on the 3′-side of the affected site.
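The classes of point mutation described in this section can be illustrated with the example sequences of Fig. 2. The following is a minimal sketch, assuming the standard genetic code (Table 2, written here with T in place of U for DNA) and one-letter amino acid codes; the function names and the single-base deletion used for the frameshift are illustrative choices, not part of the original figure.

```python
from itertools import product

# Standard genetic code in DNA alphabet, first base varying slowest.
# "*" marks a translational stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

NORMAL = "GTGAAGTCATGC"  # normal sequence from Fig. 2 (Val-Lys-Ser-Cys)


def translate(dna: str) -> str:
    """Translate complete codons in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_substitution(normal: str, mutant: str) -> str:
    """Classify the first differing codon as silent, missense or nonsense."""
    if len(normal) != len(mutant):
        raise ValueError("a substitution leaves the sequence length unchanged")
    for i in range(0, len(normal) - len(normal) % 3, 3):
        ref, alt = normal[i:i + 3], mutant[i:i + 3]
        if ref == alt:
            continue
        if CODON_TABLE[ref] == CODON_TABLE[alt]:
            return "silent"
        return "nonsense" if CODON_TABLE[alt] == "*" else "missense"
    return "no change"


def delete_base(dna: str, position: int) -> str:
    """Remove one nucleotide (0-based position) to model a frameshift."""
    return dna[:position] + dna[position + 1:]


if __name__ == "__main__":
    for label, seq in [("missense", "GTGATGTCATGC"), ("nonsense", "GTGTAGTCATGC")]:
        print(label, translate(seq), classify_substitution(NORMAL, seq))
    print("frameshift", translate(delete_base(NORMAL, 3)))
```

Deleting a single base changes every codon on the 3′-side of the deletion, which is why the frameshift example no longer shares the Lys-Ser-Cys residues of the normal translation.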
6. New genotyping technologies
In human genetic diseases, genetic abnormalities will affect either one copy (heterozygous) or both copies (homozygous) of the disease-causing gene [35]. Detecting these various allelic patterns is critical in determining the causative or susceptibility roles of these abnormalities in any given clinical scenario. Most current approaches to detecting these variants include methods that are based on the initial amplification of target gene sequences by the polymerase chain reaction (PCR) [36,37]. While many of these technologies will be reviewed in more detail elsewhere in this issue of the journal, a simple overview of several newer technologies will lend considerable insight into the future of clinical genotyping.
PCR clearly revolutionized the way in which laboratories could interrogate DNA or RNA sequences for genetic abnormalities. The ability to amplify a small segment of nucleic acid and produce enough material to perform a series of analytical tests enables the molecular analysis of many genetic variants, and may form the basis for the development of molecular tests for all human genetic variations (Table 3). Most of these techniques are manual and are disadvantaged by their labor requirements and low throughput. However, enzymatic detection systems and semi-automated blotting techniques are now available, improving these performance characteristics. Array technologies have been the most recent detection scheme used for interrogation of PCR products and can provide significant amounts of data. However, the introduction of these technologies into the clinical laboratory has been slow due to the complexity of testing and quality assurance issues associated with these systems. Nonetheless, some microchip arrays, as well as bead arrays coated with specific oligonucleotides, allow the user to simultaneously screen for several genetic sequence abnormalities and have become commercially available for clinical genotyping applications [38,39].

Table 4 (continued)
Disease | Gene | Mutation(s)
Sickle cell anemia | β-globin | Single nucleotide substitution; GAG to GTG results in substitution of valine for glutamic acid
Cystic fibrosis | CFTR | >1000 described mutations; recommended 23 mutation panel for carrier screening
Hereditary thrombophilia | Factor II | G20210A
Hereditary thrombophilia | Factor V | G1691A
Hereditary thrombophilia | Factor VII | G1601A
Hereditary thrombophilia | MTHFR | C677T

Fig. 5. Liat™ Analyzer, Liat Tube. Whole blood is collected directly into a Liat Tube (A–B); after the tube is capped, the analyzer scans the barcode (C); and the tube is inserted into the analyzer (D); the analyzer then conducts all the nucleic acid testing automatically and reports the results in real-time (E).

Table 5. Typical carrier screening panel in individuals of Ashkenazi Jewish ancestry
Disease | Gene | Chromosome location | Mutation(s) | Carrier frequency
Bloom syndrome | BLM | 15q26.1 | 2281del6bp/ins7bp | 1:111
Canavan disease | ASPA | 17pter-p13 | E285A, Y231X, A305E, IVS2 | 1:40
Cystic fibrosis | CFTR | 7q31 | W1282X, ΔF508, G542X, 3849 + 10 kb, N1303K | 1:30
Gaucher disease | GBA | 1q21–31 | N370S, 84GG, L444P, IVS2 + 1 | 1:13
Niemann–Pick disease | SMPD1 | 11p15.4–p15.1 | R496L, L302P, fsP330 | 1:90
Tay–Sachs disease | HEXA | 15q23–q24 | 1277insTATC, IVS12 + 1(G > C), G269S, R247W, R249W | 1:30

Newer technologies have emerged that are capable of discriminating between genetic variants by real-time detection of PCR products or without the initial need for amplification. Real-time PCR offers the sensitivity and specificity of traditional PCR with the added benefit of improved performance characteristics due to detection chemistries and rapid turn-around times related to new thermal cycling capabilities [40]. As a next generation standard for real-time PCR technologies, the GeneXpert® (Cepheid, Sunnyvale, CA) combines DNA or RNA extraction, PCR amplification, and real-time detection in a single disposable cartridge that provides test results in less than 2 h
(Fig. 3) [41]. The most established of the non-amplified technologies is the Invader® technology (TWT, Madison, WI) [42]. The Invader® assays enable simultaneous detection of two DNA sequences in a single well by using two different discriminatory probes, each with a unique 5′ flap, and two different FRET cassettes, each with a spectrally distinct fluorophore. By design, the released 5′ flaps will bind only to their respective FRET cassettes to generate a target-specific signal [42].
Other technologies that promise to provide rapid genotyping results include the Verigene™ System (Nanosphere, Northbrook, IL) (Fig. 4) and the Liat™ System (IQuum, Allston, MA) (Fig. 5). Nanoparticles are at the core of Nanosphere's Verigene™ System. Each gold particle (approximately 13 nm in diameter) is coated with many copies of a complementary DNA sequence. The probes can then bind with and signal the presence of a specific target DNA sequence. The unique properties of the gold nanoparticle probes allow them to be detected by optical, electrical or magnetic processes.

Table 6. Examples of genes associated with hereditary cancers
Gene | Familial syndrome
APC | Familial adenomatous polyposis coli
ATM | Ataxia telangiectasia
AXIN2 | Attenuated polyposis
BLM | Bloom syndrome
BRCA1, 2 | Hereditary breast cancer
EXT1, 2 | Hereditary multiple exostoses
FANC A, C, D, E, F, G | Fanconi anemia
KIT | Familial gastrointestinal tumors
MEN1 | Multiple endocrine neoplasia type I
MET | Hereditary papillary renal cell carcinoma
MSH2, MLH1, MSH6, PMS2 | Hereditary non-polyposis colon cancer
NF1, 2 | Neurofibromatosis type 1, 2
p16INK4A | Familial malignant melanoma
PTCH | Gorlin syndrome
PTEN | Cowden syndrome
RB1 | Hereditary retinoblastoma
RET | Multiple endocrine neoplasia type II
SDHB, C, D | Familial paraganglioma
STK11 | Peutz–Jeghers syndrome
TP53 | Li–Fraumeni syndrome
TSC1, 2 | Tuberous sclerosis
VHL | Von Hippel–Lindau syndrome
WT1 | Familial Wilms tumor
XPA, C; ERCC2, 3, 4, 5 | Xeroderma pigmentosum

The Liat™ System consists of the Liat™ Analyzer and disposable Liat™ Tubes. Similar to the Cepheid GeneXpert®, the Liat™ System is designed to be a truly closed system that automates all procedural steps including reagent preparation, target enrichment, inhibitor removal, nucleic acid extraction, amplification, and real-time detection. The lab-in-a-tube technology reduces the testing process to three simple steps: collecting a raw biological sample such as whole blood into a Liat™ Tube and capping the tube, scanning the tube's barcode, and inserting the tube into the Liat™ Analyzer. Thus, as the need for clinically valid genotyping tests continues to increase, technologies are continually being developed to meet that demand.
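Assays that report each allele through a spectrally distinct fluorophore, as described above for the two-probe format, ultimately reduce to a genotype call from two signal intensities. The following is a minimal sketch of one way such a call could be made; the fold-over-background inputs, the thresholds and the function name are hypothetical and are not taken from any vendor's documentation or validated protocol.

```python
def call_genotype(allele1_signal: float, allele2_signal: float,
                  min_signal: float = 2.0,
                  het_window: tuple = (0.3, 0.7)) -> str:
    """Call a genotype from two fold-over-background fluorescence signals.

    min_signal and het_window are hypothetical cut-offs chosen for
    illustration; a clinical assay would establish them during validation.
    """
    if allele1_signal < 0 or allele2_signal < 0:
        raise ValueError("signals must be non-negative")
    # Neither fluorophore rises clearly above background: refuse to call.
    if max(allele1_signal, allele2_signal) < min_signal:
        return "no call"
    # Fraction of the total signal contributed by the allele 1 fluorophore.
    fraction = allele1_signal / (allele1_signal + allele2_signal)
    low, high = het_window
    if fraction > high:
        return "homozygous allele 1"
    if fraction < low:
        return "homozygous allele 2"
    return "heterozygous"


if __name__ == "__main__":
    for signals in [(10.0, 1.1), (5.0, 6.0), (1.0, 9.0), (1.1, 1.2)]:
        print(signals, "->", call_genotype(*signals))
```

Returning an explicit "no call" for weak signals, rather than forcing a genotype, reflects the quality assurance concerns raised earlier in this section.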
7. Clinical applications
The rapid completion of the Human Genome Project, advances in genotyping technologies and the promotion of preventative medicine concepts have led to an increasing demand for clinical laboratories to become involved in accurate detection of DNA sequence variations. We now know that single nucleotide polymorphisms (SNPs) are the most common form of variant in the human genome sequence and that these SNPs can be associated with, and in some cases causative of, one of the many described human diseases [43,44]. A discussion, or even a listing, of all but a small percentage of the known mutations and SNPs that are associated with human disease is beyond the scope of this manuscript. A comprehensive list of disease-associated genetic variants can be found at the Online Mendelian Inheritance in Man (OMIM) website [45]. The following sections will highlight some of the current SNP genotyping that is being performed in clinical laboratories for several common monogenic human diseases and several polygenic human diseases.

Table 7. Genes in which polymorphisms have been described that affect cell transformation
Gene | Cancer type
Cyclin D1 | Acute lymphoblastic leukemia (ALL), hepatocellular carcinoma, head/neck squamous cell carcinoma, hereditary non-polyposis colon cancer, pituitary adenoma, prostate cancer, ovarian cancer, urinary bladder cancer
p16INK4A | Head/neck squamous cell carcinoma, malignant melanoma, urinary bladder cancer
p21Cip1 | Breast cancer, endometrial cancer
p53 | Breast cancer, lung cancer, stomach cancer
L-myc | Glioma, lung cancer, stomach cancer, urinary bladder cancer
P2X7 receptor | Chronic lymphocytic leukemia

7.1. Analysis of less complex genetic diseases
With respect to human genetic diseases, our knowledge of inheritance patterns, penetrance, and exceptions to Mendelian rules has augmented the need for molecular genetic testing [46]. Many laboratories perform these types of tests routinely and provide extensive reporting and/or genetic counseling through appropriate mechanisms. It is important that accurate test results and interpretation, including risk assessment, be conveyed to the physician and patient. SNP genotyping can be applied to both simple and complex trait diseases to establish a diagnosis or assess risk. Table 4 illustrates the routine application of SNP or mutation genotyping for several common genetic diseases. Sickle cell anemia represents the simplest of the scenarios, with one mutation in the β-globin gene accounting for the majority of this disease. Cystic fibrosis genotyping, on the other hand, involves a single gene (CFTR) in which more than 1000 mutations have been described. Population screening, as indicated by the recent American College of Obstetricians and Gynecologists and American College of Medical Genetics guidelines, would not be feasible without knowledge of the most common of these mutations [47]. Finally, hereditary thrombophilia, in which the majority of patients were once accounted for by a single SNP in the factor V gene, is now known to be much more complex with respect to phenotype versus genotype. Thrombophilia represents a condition where SNPs in at least four separate genes have been associated with increased risk for developing disease in persons heterozygous and homozygous for these genetic variants.

Fig. 6. A. SNP Maps: Chromosome 1 confirmed SNPs NCI CGAP (http://gai.nci.nih.gov/cgi-bin/histo.cgi?c=1 and o=h). B. The NCI CGAP LuckySNP500 database contains lists of validated SNP assays as shown below. C. In addition to the NCI CGAP Lucky SNP database list, the site offers specific information for individual SNPs, allele frequencies and assay conditions as shown below for ABCA1-02.

Panel B
SNP ID | dbSNP ID | Amino acid change | SNP region | Validated assay (MGB Eclipse or TaqMan)
ABCA1-02 | rs9282537 | G2061G | Ex46-22C>T | ✓
ABCA1-03 | rs9282538 | L217L | Ex7-70A>G | ✓
ABCA1-04 | rs2230806 | R219K | Ex7-65G>A | ✓
ABCA1-12 | rs4149313 | I883M | Ex18-8A>G | ✓
ABCA1-15 | rs2777801 | | IVS32+30T>G | ✓
ABCA6-01 | rs9282552 | | IVS13-16G>A | ✓
ABCA6-02 | rs2058128 | L637L | Ex15+9T>C | ✓
ABCA7-01 | rs4147912 | | IVS16+8A>C | ✓
ABCA7-05 | rs3764651 | | IVS20+166A>G | ✓
ABCA7-07 | rs9282557 | | IVS26+59A>G | ✓

Panel C
SNPs matching: ABCA1-02
dbSNP ID: rs9282537
LuckySNP500 ID: ABCA1-02
Gene: ABCA1
Amino acid change: G2061G
SNP region: Ex46-22C>T
Links: dbSNP; NCBI map; Ensembl map; Entrez Gene
Surrounding sequence (GC content = 49%):
CATGTATGTGTAGGACAGCATGATAAAATTCCCAAGCCAGACCAAAGYCAAG
GTGCTTTTTATCACTGTAGGTTGGTGAGTGGGCGATTCGGAAACTGGGCCTC
GTGAAGTATGGAGAAAAATATGCTGGTAACTATAGTGGAGGCAACAAACGCA
AGCTCTCTACAGCCATGGCTTTGATCGG(C/T)GGGCCTCCTGTGGTGTTTC
TGGTGAGTATAACTGTGGATGGAAAACTGTTGTTCTGGCCTGAGTGGAAAAC
ATGACTGTTCAAAAGTCCTATATGTCCAGGGCTGTTGTATGATTGGCTTGTC
TTCCCCCAGGGACAGCAGAGCAACCTTGGAAAAGCAGAGGGAAGCTTCTCCC
TTGGCACACACTGGGGTGGCTGTACCATGCCTGCAGATGCTCCCAAATAG
Frequency data as determined by sequencing 102 anonymized subjects (total completed: 101):
Genotypic: CC 85/101 (0.842); CT 12/101 (0.119); TT 4/101 (0.040)
Allelic: C 182/202 (0.901); T 20/202 (0.099)
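The relationship between the genotypic and allelic frequencies reported for ABCA1-02 in Fig. 6C can be reproduced by allele counting. The following is a minimal sketch using the genotype counts from that panel (CC 85, CT 12, TT 4); the Hardy-Weinberg expectation is added here only as an illustrative extension and is not part of the original figure.

```python
def allele_frequencies(n_ref_hom: int, n_het: int, n_alt_hom: int) -> tuple:
    """Allele frequencies by gene counting: each subject carries two alleles."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    ref_alleles = 2 * n_ref_hom + n_het
    alt_alleles = 2 * n_alt_hom + n_het
    return ref_alleles / total_alleles, alt_alleles / total_alleles


def hardy_weinberg_expected(n_ref_hom: int, n_het: int, n_alt_hom: int) -> tuple:
    """Expected genotype counts under Hardy-Weinberg proportions (p^2, 2pq, q^2)."""
    n = n_ref_hom + n_het + n_alt_hom
    p, q = allele_frequencies(n_ref_hom, n_het, n_alt_hom)
    return n * p * p, 2 * n * p * q, n * q * q


if __name__ == "__main__":
    counts = (85, 12, 4)  # CC, CT, TT from Fig. 6C
    p, q = allele_frequencies(*counts)
    print(f"C allele: {p:.3f}  T allele: {q:.3f}")
    expected = hardy_weinberg_expected(*counts)
    print("Expected CC, CT, TT:", [round(x, 2) for x in expected])
```

The counted frequencies match the allelic values shown in the figure (182/202 and 20/202). Whether the observed genotype counts depart meaningfully from the Hardy-Weinberg expectation would require a formal test, which is outside the scope of this sketch.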
In cases where there may be large numbers of mutations or multiple genes associated with disease, it is common practice for laboratories to test for panels of mutations as screening requires. This is important in certain populations, as with individuals of Ashkenazi Jewish ancestry [48]. Table 5 indicates some of the SNP genotyping that is performed on individuals of Ashkenazi Jewish ancestry as part of carrier screening programs.
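Carrier frequencies such as those listed in Table 5 translate directly into a prior risk for a couple. The following is a minimal sketch, assuming autosomal recessive inheritance, two untested partners drawn independently from the screened population, and the carrier frequencies as read from Table 5; the dictionary and function names are illustrative.

```python
from fractions import Fraction

# Carrier frequencies as read from Table 5 (assumed to map to diseases in row order).
CARRIER_FREQUENCY = {
    "Bloom syndrome": Fraction(1, 111),
    "Canavan disease": Fraction(1, 40),
    "Cystic fibrosis": Fraction(1, 30),
    "Gaucher disease": Fraction(1, 13),
    "Niemann-Pick disease": Fraction(1, 90),
    "Tay-Sachs disease": Fraction(1, 30),
}


def affected_child_risk(carrier_a: Fraction, carrier_b: Fraction = None) -> Fraction:
    """Prior risk of an affected child for an autosomal recessive disease.

    Both partners must be carriers, and a carrier couple has a 1 in 4
    chance of an affected child with each pregnancy.
    """
    if carrier_b is None:
        carrier_b = carrier_a
    return carrier_a * carrier_b * Fraction(1, 4)


if __name__ == "__main__":
    for disease, frequency in CARRIER_FREQUENCY.items():
        risk = affected_child_risk(frequency)
        print(f"{disease}: 1 in {risk.denominator // risk.numerator}")
```

Exact fractions are used so that the result can be reported in the same "1 in N" form as the carrier frequencies themselves.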
Table 9 Single nucleotide (SNP) and other polymorphisms associated with thrombosis and hemostasis
7.2. Analysis of highly complex and common polygenic diseases
Gene
Region
Polymorphism
Factor II (prothrombin) Factor V
3V Untranslated region Exon 10
Factor VII
Exon 8 Intron 7 Promoter h-chain promoter Promoter
20210 G Y A 1691 G Y A (FV Leiden) 353 G Y A 37 bp repeat 10 bp insertion 455 G Y A 675 insG
While the analytical challenges of genetic testing for monogenic disorders continue to be optimized, a direct result of this is to have defined technologies which can then be applied to the more common forms of human diseases. These more complex polygenic, multifactorial disorders that affect individuals occur at a much higher frequency than the single gene or monogenic disorders. In many of these conditions, the genetic element is comprised of abnormalities in several genes that each contribute to the disease phenotype. Two of these disease categories account for the majority of sickness in the developed world and include cancer and cardiovascular disease. The genetics of human cancer has revolved around gene identification and mutation spectra that may be associated with the development of the disease. Three main categories of genes have been identified and include: oncogenes, tumor suppressor genes and stability genes [49 – 51]. Initially, much of our understanding of cancer genetics stemmed from data that was obtained from studying families with inherited cancers. Not only did these studies give us a better appreciation for the complexity of this disease, but they led to gene discoveries that gave laboratories the ability to perform predisposition testing for some malignancies (Table 6). However, the majority of human cancers are sporadic, making the evaluation of genetic abnormalities in these cells for diagnostic or prognostic purposes much more complex. We now know that the malignant transformation of somatic cells and clonal neoplastic growth require multiple changes at the molecular, biochemical and cellular levels [52,53]. This transformation is characterized by uncontrolled cell growth and proliferation for which many genes and associated polymorphisms have been described (Table 7). More than 290 cancer genes have been identified [54]. Of these, 90% show somatic mutations and 20% show germline or inherited mutations in human cancer. 
While many cancer genes have been identified, much work is still needed to fully understand the mechanisms by which normal cells transition to cancer cells. The mission of the National Cancer Institute's (NCI) Cancer Genome Anatomy Project (CGAP) is to determine the gene expression profiles of normal, precancer, and cancer cells in various types of human tissues [55]. The ultimate goal of this effort is improved detection, diagnosis, and treatment of human cancers through the identification of tumor-specific markers for diagnosis and therapeutic intervention. One aspect of CGAP is the SNP500Cancer Database, which includes over 1500 defined SNPs with validated assay protocols (Fig. 6A–C). These types of user-accessible discovery tools significantly augment the efforts that are underway to win the war on cancer.

Table 8
Diseases that result in increased plasma levels of LDL cholesterol

Disease                                          Gene
Autosomal recessive hypercholesterolemia         ARH
Familial hypercholesterolemia                    LDLR (low-density lipoprotein receptor)
Familial ligand-defective apolipoprotein B-100   APOB-100
Familial lipoprotein lipase (LPL) deficiency     LPL
Sitosterolemia                                   ABCG5, ABCG8 (ATP-binding cassette)

Fibrinogen Plasminogen activator inhibitor-1 (PAI-1)

Cardiovascular disease (CVD) is the leading cause of morbidity and mortality in the United States, where an estimated 62 million people have CVD [56]. The causal agent in many forms of CVD is an elevated plasma level of low-density lipoprotein (LDL). LDL is the major transporter of cholesterol and is composed of a cholesteryl ester core surrounded by a coat of phospholipid and apolipoprotein B-100 [57]. Several genes and mutations have been described that elevate plasma LDL, and hence cholesterol, levels (Table 8). Plasma lipid and lipoprotein concentrations are important factors in the development of CVD. Lipoprotein lipase (LPL), which is located on the vascular endothelium, removes lipids from the circulation by hydrolyzing triglycerides. Several polymorphisms that alter the function of LPL have been described, and these may result in CVD, including the familial LPL deficiency syndrome.

Thrombosis and disorders of hemostasis form another category of CVD for which several SNPs have been identified (Table 9). These SNPs are typically associated with an increased risk for thrombosis, and the most common, Factor V Leiden, is present in approximately 6% of the general population [58,59]. The identification of these SNPs can help predict who may be at increased risk for developing disease.
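To illustrate what a population prevalence of about 6% could imply at the allele level, the following Python sketch applies the Hardy-Weinberg relationship. It rests on two assumptions that are not stated in the text: that the 6% figure counts everyone who carries at least one copy of the variant, and that the population is in Hardy-Weinberg equilibrium. The function names are hypothetical.

```python
import math


def allele_frequency_from_carrier_prevalence(carrier_prevalence: float) -> float:
    """Variant allele frequency q under Hardy-Weinberg equilibrium.

    Carriers (heterozygous or homozygous) make up 1 - (1 - q)^2 of the
    population, so q = 1 - sqrt(1 - carrier_prevalence).
    """
    if not 0.0 <= carrier_prevalence <= 1.0:
        raise ValueError("carrier_prevalence must lie between 0 and 1")
    return 1.0 - math.sqrt(1.0 - carrier_prevalence)


def genotype_frequencies(q: float) -> dict:
    """Expected genotype frequencies for variant allele frequency q."""
    p = 1.0 - q
    return {
        "wild_type": p * p,         # no copy of the variant
        "heterozygous": 2 * p * q,  # one copy
        "homozygous": q * q,        # two copies
    }


if __name__ == "__main__":
    # 0.06 is the approximate prevalence quoted in the text (assumed carrier rate).
    q = allele_frequency_from_carrier_prevalence(0.06)
    print(f"Estimated variant allele frequency: {q:.4f}")
    for genotype, freq in genotype_frequencies(q).items():
        print(f"{genotype}: {freq:.4%}")
```

Under these assumptions the variant allele frequency comes out at roughly 3%, and nearly all carriers are heterozygous. Real populations differ considerably by ancestry, so the numbers are illustrative only.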
The pathophysiology of CVD is complex, and numerous genes have been identified for other forms of CVD such as heart failure, hypertension, and hypertrophic cardiomyopathy [60,61]. As a disease entity, CVD comprises many pathophysiologic processes that affect different cellular and biochemical pathways. While many SNPs have been identified in genes associated with these functions, our ability to accurately correlate genotype
with phenotype remains an enormous challenge for these types of complex diseases.
8. Summary
The need for clinical genotyping of SNPs and mutations has been, and continues to be, driven by our quest for knowledge. This includes both a scientific approach to the underlying mechanisms of disease and an educational approach to increase awareness among the general public. The Human Genome Project and its many programs, together with rapid advances in technology, have offered laboratorians the opportunity to make use of the genetic code to provide better diagnostic, prognostic, and therapeutic assessment of human diseases. Unraveling the more common yet more complex diseases will be the biggest challenge as we move into the future of molecular diagnostics.
References
[1] Gray IC, Campbell DA, Spurr NK. Single nucleotide polymorphisms as tools in human genetics. Hum Mol Genet 2000;16:2403–8.
[2] http://www.genome.gov/120112391.
[3] Collins FS, Galas D. A new five-year plan for the US Human Genome Project. Science 1993;262:43–6.
[4] Collins FS, Patrinos A, Jordan E, Chakravarti A, Gesteland R, Walters L. New goals for the U.S. Human Genome Project: 1998–2003. Science 1998;282:682–9.
[5] Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
[6] Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001;291:1304–51.
[7] http://www.genome.gov/11006929.
[8] Guttmacher AE, Collins FS. Welcome to the genomic era. N Engl J Med 2003;349:996–8.
[9] International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004;431:931–45.
[10] Snyder M, Gerstein M. Defining genes in the genomics era. Science 2003;300:258–60.
[11] Nowak R. Mining treasures from 'junk DNA'. Science 1994;263:608–10.
[12] Antequera F, Bird A. Predicting the total number of human genes. Nat Genet 1994;8:114.
[13] Fields C, Adams MD, White O, Venter JC. How many genes in the human genome? Nat Genet 1994;7:345–6.
[14] Ewing B, Green P. Analysis of expressed sequence tags indicates 35,000 human genes. Nat Genet 2000;25:232–4.
[15] Roest Crollius H, Jaillon O, Bernot A, et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet 2000;25:235–8.
[16] http://www.nhgri.nih.gov/.
[17] Sachidanandam R, Weissman D, Schmidt SC, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–33.
[18] Holden AL. The SNP Consortium: summary of a private consortium effort to develop an applied map of the human genome. BioTechniques, discovery of markers for disease; 2002. p. 22–26. Suppl.
[19] http://www.ncbi.nlm.nih.gov/projects/SNP.
[20] Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005;6:95–108.
[21] Wang WYS, Barratt BJ, Clayton DG, Todd JA. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005;6:109–18.
[22] http://www.genome.gov/10001688#1.
[23] http://genome.gov/10001688.
[24] http://www.genome.gov/12513847.
[25] http://www.genome.gov/10506367.
[26] Collins A, Lonjou C, Morton NE. Genetic epidemiology of single-nucleotide polymorphisms. Proc Natl Acad Sci U S A 1999;96:15173–7.
[27] Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516–7.
[28] Lander ES. The new genomics: global views of biology. Science 1996;274:536–9.
[29] Drake JW, Baltz RH. The biochemistry of mutagenesis. Ann Rev Biochem 1976;45:11–37.
[30] Friedberg EC, Walker GC, Siede W. DNA repair and mutagenesis. Washington (DC): ASM Press; 1995.
[31] Lindahl T. Instability and decay of the primary structure of DNA. Nature 1993;362:709–15.
[32] Ames BN, Shigenaga MK, Gold LS. DNA lesions, inducible DNA repair, and cell division: three key factors in mutagenesis and carcinogenesis. Environ Health Perspect 1993;101(Suppl. 5):35–44.
[33] Cooper DN, Youssoufian H. The CpG dinucleotide and human genetic disease. Hum Genet 1988;78:151–5.
[34] Rideout III WM, Coetzee GA, Olumi AF, Jones PA. 5-Methylcytosine as an endogenous mutagen in the human LDL receptor and p53 genes. Science 1990;249:1288–90.
[35] Burke W. Genetic testing. N Engl J Med 2002;347:1867–75.
[36] Kristensen VN, Kelefiotis D, Kristensen T, Borresen-Dale AL. High throughput methods for detection of genetic variation. BioTechniques 2001;30:318–32.
[37] Chen X, Sullivan PF. Single nucleotide polymorphisms genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics J 2003;3:77–96.
[38] Dudda-Subramanya R, Lucchese G, Kanduc D, Sinha AA. Clinical applications of DNA microarray analysis. J Exp Ther Oncol 2003;3:297–304.
[39] Iannone MA, Consler TG, Pearce KH, Stimmel JB, Parks DJ, Gray JG. Multiplexed molecular interactions of nuclear receptors using fluorescent microspheres. Cytometry 2001;44:326–37.
[40] Wilhelm J, Pingoud A. Real-time polymerase chain reaction. ChemBioChem 2003;4:1120–8.
[41] Raja S, Ching J, Xi L, et al. Technology for automated, rapid, and quantitative PCR or reverse transcription-PCR clinical testing. Clin Chem, in press.
[42] de Arruda M, Lyamichev VI, Eis PS, et al. Invader technology for DNA and RNA analysis: principles and applications. Expert Rev Mol Diagn 2002;2:487–96.
[43] Taylor JG, Choi EH, Foster CB, Chanock SJ. Using genetic variation to study human disease. Trends Mol Med 2001;7:507–12.
[44] Chanock S. Candidate genes and single nucleotide polymorphisms (SNPs) in the study of human disease. Dis Markers 2001;17:89–98.
[45] www.ncbi.nlm.nih.gov/entrez/quert.fcgi?db=OMIM.
[46] Guttmacher AE, Collins FS, Carmona RH. The family history: more important than ever. N Engl J Med 2004;351:2333–6.
[47] ACOG. Preconception and prenatal carrier screening for cystic fibrosis: clinical laboratory guidelines. Washington (DC): American College of Obstetricians and Gynecologists; 2001. p. 1–31.
[48] Charrow J. Ashkenazi Jewish genetic disorders. Fam Cancer 2004;3(3–4):201–6.
[49] Knudson AG. Cancer genetics. Am J Med Genet 2002;111:96–102.
[50] Nowell PC. Tumor progression: a brief historical perspective. Semin Cancer Biol 2002;12:261–6.
[51] Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med 2004;10:789–99.
[52] Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100:57–70.
[53] Loktionov A. Common gene polymorphisms, cancer progression and prognosis. Cancer Lett 2004;208:1–33.
[54] Futreal PA, Coin L, Marshall M, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177–83.
[55] http://cgap.nci.nih.gov/.
[56] www.nhlbi.nih.gov/resources/docs/cht-book.htm.
[57] Nabel EG. Cardiovascular disease. N Engl J Med 2003;349:60–72.
[58] Caprini JA, Glase CJ, Anderson CB, Hathaway K. Laboratory markers in the diagnosis of venous thromboembolism. Circulation 2004;109(12 Suppl. 1):I4–8.
[59] Hoppe B, Tolou F, Radtke H, Kieswetter H, Dorner T, Salama A. Marburg I polymorphism of factor VII-activating protease is associated with idiopathic venous thromboembolism. Blood 2005;105:1549–51.
[60] Doris PA. Hypertension genetics, single nucleotide polymorphisms, and the common disease: common variant hypothesis. Hypertension 2002;39:323–31 [part 2].
[61] Archacki S, Wang Q. Expression profiling of cardiovascular disease. Hum Genomics 2004;1(5):355–70.