Analytical Biochemistry 415 (2011) 218–220
Notes & Tips
Multiple target loci assembly sequencing (mTAS)
Hyojun Han a, Jung-ki Yoon b, Byoung Chul Cho c, Hwangbeom Kim a, Duhee Bang a,⇑
a Department of Chemistry, Yonsei University, Shinchon 134, Seoul 120-749, Republic of Korea
b College of Medicine, Seoul National University, Seoul 110-799, Republic of Korea
c Department of Internal Medicine, College of Medicine, Yonsei University, 250 Seongsanno, Seodaemun-gu, Seoul 120-752, Republic of Korea
Article info
Article history: Received 27 February 2011; received in revised form 8 April 2011; accepted 9 April 2011; available online 16 April 2011
Abstract
Here we present multiple target loci assembly sequencing (mTAS), a method for examining multiple genomic loci in a single DNA sequencing read. The key to the success of mTAS target sequencing is the uniform amplification of multiple target genomic loci into a single DNA fragment using polymerase cycling assembly (PCA). Using this strategy, we successfully collected multiloci sequence information from a single DNA sequencing run. We applied mTAS to examine 29 different sets of human genomic loci, each containing from 2 to 11 single-nucleotide polymorphisms (SNPs) present at different exons. We believe mTAS can be used to reduce the cost of Sanger sequencing-based genetic analysis. © 2011 Elsevier Inc. All rights reserved.
The PCR¹-based amplification of a genomic locus is the simplest protocol for targeted sequencing [1–3]. Multiple loci can be captured by adding several primer pairs to a single PCR to generate several amplicons [4,5]. Recently developed methods allow a number of genomic loci to be captured selectively prior to sequencing [6–13]. For example, RainDance, Inc., has produced a microfluidic platform that amplifies thousands of amplicons simultaneously [14]. Other large-scale target-enrichment methods use "molecular inversion probes" (MIP) [10–13] and the hybrid capture approach [6–9]. These target-enrichment methods have significantly increased the degree of multiplexing in the capture of target DNA [2]. By combining these target-enrichment strategies with high-throughput DNA sequencing technology [15], researchers have been able to reduce the costs of sequencing dramatically.
However, to the best of our knowledge, no methodology can achieve multiplex target sequencing using a single sequencing read. None of the currently available target sequencing methods can amplify separated genomic loci into a single DNA fragment, so multiple target loci must be amplified separately and sequenced multiple times. Here we present mTAS, a new strategy for detecting multiple loci in a single sequencing read. We demonstrate the utility of mTAS by sequencing 26 different sets of human genomic loci. We also present the successful application of the mTAS method to analyze epidermal growth factor receptor (EGFR) mutations present at three different exons [16]. Our method enables multiplex amplification of desired target loci with high uniformity and the examination of multiple target loci in a single sequencing read.
⇑ Corresponding author. E-mail address: [email protected] (D. Bang).
¹ Abbreviations used: EGFR, epidermal growth factor receptor; mTAS, multiple target loci assembly sequencing; PCA, polymerase cycling assembly; PCR, polymerase chain reaction; SNP, single-nucleotide polymorphism.
0003-2697/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ab.2011.04.012
The mTAS method takes advantage of PCA [17], a method for constructing large stretches of DNA. The PCA method typically uses multiple overlapping oligonucleotides that are designed to assemble via a polymerase chain reaction. For mTAS target sequencing, we designed multiple PCA probes, each having a target-specific sequence at the 3′ end and an assembly spacer sequence at the 5′ end (Fig. 1). These probes first generate multiple short amplicons; in the second round of the assembly process, the overlapping spacer sequences are used to assemble the short amplicons into a large stretch of the desired DNA sequence.
To test the utility of mTAS for targeted sequencing, we assembled various sets of disease- and phenotype-related human single-nucleotide polymorphisms (SNPs) from those listed on the website of a commercial genetic testing service (https://www.23andme.com/). The selected SNP sequences are shown in SI Table 1. To facilitate the design of oligonucleotides for the mTAS experiments, we developed a Perl computer program, Perl-mTAS. Perl-mTAS generates overlapping oligonucleotides from a SNP ID, the target locus length, and the oligonucleotide annealing temperature. We used the nearest-neighbor method to calculate the assembly temperatures for the overlapping regions of adjacent oligonucleotides [18]. The oligonucleotide sequences generated by Perl-mTAS are listed in SI Table 1.
The assembly process for mTAS proceeds in two steps. We used genomic DNA purified from human blood. The first assembly step generated a mixture of amplicons of about 100 bp (Fig. S1). We then mixed an aliquot of the first amplification products, without further purification, with an excess of a pair of flanking primer oligonucleotides to begin a second assembly process. Using an optimized protocol for the assembly process (see supplemental material), we were able to assemble 25 amplicons out of 26 mTAS experimental sets (Fig. 2 and Fig. S2 for repeat experiments).
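The two-step assembly described above can be pictured with a toy string model. The sketch below is not the Perl-mTAS program. It assumes, as one reading of Fig. 1, that the reverse probe of one locus and the forward probe of the next locus carry the same spacer, so that adjacent first-step amplicons overlap by exactly that spacer. All sequences, the 8-nt target-specific arm, and the function names are hypothetical, and only the top strand is tracked.

```python
# Toy model of the two-step mTAS assembly (top strand only).
# Assumption: adjacent loci share one spacer; all names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_probes(targets, spacers, arm=8):
    """Build (forward, reverse) probes for each target locus.

    Each probe is written 5'->3': an assembly spacer at the 5' end and a
    target-specific arm at the 3' end. spacers[i] sits upstream of
    targets[i], and spacers[i + 1] sits downstream of it.
    """
    if len(spacers) != len(targets) + 1:
        raise ValueError("need one more spacer than targets")
    probes = []
    for i, target in enumerate(targets):
        if len(target) < 2 * arm:
            raise ValueError("target shorter than the two primer arms")
        forward = spacers[i] + target[:arm]
        reverse = reverse_complement(target[-arm:] + spacers[i + 1])
        probes.append((forward, reverse))
    return probes


def first_step(target, forward, reverse, arm=8):
    """Top strand of the short amplicon: the probes define both ends."""
    return forward + target[arm:len(target) - arm] + reverse_complement(reverse)


def second_step(amplicons, overlap):
    """Join adjacent amplicons through their shared spacer overlap."""
    assembled = amplicons[0]
    for nxt in amplicons[1:]:
        if assembled[-overlap:] != nxt[:overlap]:
            raise ValueError("adjacent amplicons do not share a spacer")
        assembled += nxt[overlap:]
    return assembled


# Hypothetical 20-nt target loci and 10-nt spacers.
targets = ["ACGTTGCAAGGCTTACCGAT", "TTGACCATGGCATTCAGGAC", "GGCATATCCGTAGCTAAGTC"]
spacers = ["GATCGGAAGA", "CCTAGGTTAA", "GAATTCCTGA", "TCTAGACGTT"]

probes = design_probes(targets, spacers)
amplicons = [first_step(t, f, r) for t, (f, r) in zip(targets, probes)]
assembled = second_step(amplicons, overlap=10)
print(assembled)
```

In this model the outermost spacers play the role of the binding sites for the flanking primers used in the second step. A real design would also have to balance the melting temperatures of the overlaps, which the sketch ignores.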
Fig. 1. Schematic representation of the mTAS method. mTAS target sequencing uniformly amplifies multiple target genomic loci into a single DNA fragment using multiple PCR primer pairs in two steps. The first assembly step, using genomic DNA and assembly primers, generates amplicons of about 100 bp. The first amplification products and an excess of a pair of flanking primer oligonucleotides are subsequently mixed for a second assembly process to create target assembly DNA fragments, which vary from 141 to 624 bp depending on the number of target loci. A single DNA sequencing run of the DNA fragment simultaneously provides the sequence information for all of these loci.
Fig. 2. Agarose gel data from the mTAS target amplification of 26 different sets of human genomic loci. Two to nine SNP loci, grouped by phenotype, were combined for each mTAS experiment. Targeted phenotypes for each experimental set are listed on the gel picture. Red triangles indicate the desired target amplicon sizes. (For interpretation of color mentioned in this figure, the reader is referred to the web version of the article.)
We found that the concentration of each oligonucleotide used in the first and second assembly steps should be over 1 μM to obtain the desired amplicons as major products (see supplemental material and SI Table 2). We also found that, in the majority of experiments, the assembly of two to five SNPs proceeded with high efficiency, as shown in Fig. 2. We grouped two to five SNPs based on phenotype. For example, we used all SNP loci listed for age-related macular degeneration (on https://www.23andme.com/) as one set. For phenotypes that contain only one SNP, we put two to four of these SNPs together to carry out one mTAS experiment. Notably, we achieved mTAS target sequencing of six to nine loci, as illustrated for rheumatoid arthritis (six SNPs), type 1 diabetes (eight SNPs), and type 2 diabetes (nine SNPs) (Fig. 2). However, we found that the amplifications were less efficient in these experiments. Thus, when the number of SNPs is more than five, it may be necessary to divide the mTAS experiment into two sets.
To check the efficiency of the mTAS method, we cloned the amplicons and used Sanger sequencing to confirm the sequences of the captured target loci. We found that the majority (92%) of the target sequences were perfectly assembled; the remaining 8% had lost some target loci from the assembled amplicons (SI Table 3). We repeated the assembly of the 26 amplicons two more times and found that the assembly efficiency was comparable to that of the first experiments (Fig. S3 and SI Table 3). In these experiments, we used cloning procedures to evaluate mTAS precisely; however, the PCR products from mTAS can be sequenced directly after agarose gel purification, as discussed below.
Mutations in the EGFR are a leading cause of non-small-cell lung cancer (NSCLC) [16]. More than 90% of EGFR mutations are present in exon 19 (a five-amino-acid deletion mutation) and in exon 21 (the Leu858Arg mutation, caused by a single nucleotide change).
More importantly, patients treated with tyrosine kinase inhibitor drugs (gefitinib and erlotinib) that target these EGFR mutations often develop drug-resistant cancer, mainly through the emergence of an exon 20 mutation (Thr790Met). At present, because the identification of these EGFR mutations is very important for screening patients for personalized therapy, tumor tissue samples from lung cancer patients are often examined by PCR amplification of these loci followed by multiple Sanger sequencing runs. By applying mTAS to these clinically important EGFR target sequences, we expected to reduce the DNA sequencing cost through the use of mTAS primer pairs designed for EGFR. Using genomic DNA extracted from human lung cancer tissues, we carried out mTAS target amplification of three loci, covering parts of exons 19, 20, and 21 (Fig. S4). Subsequently, we used direct Sanger sequencing of these amplicons to verify the target sequences. We successfully detected EGFR mutations related to lung cancer (arrows in Fig. S4), and our results were identical to the sequencing results from a conventional EGFR DNA sequencing provider.
In summary, using multiple PCR primer pairs that anneal to target genomic loci, we were able to collect the sequence information of these loci from a single DNA sequencing run. Furthermore, mTAS target sequencing provides homogeneous enrichment over multiple target loci (10 loci) and results in specific and uniform evaluations of target loci. As a result, the mTAS target sequencing process provides a unique solution for cost-effective analyses of clinical samples that are typically examined by Sanger sequencing runs. Currently, most clinical genetic tests are carried out using Sanger sequencing. Thus, by amplifying the target loci of these genetic tests into a single sequencing read, we can reduce the cost of Sanger sequencing severalfold.
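The cost argument can be made concrete with a rough count of sequencing reads. The sketch below is an illustration only. It assumes that conventional analysis needs one Sanger read per locus, that mTAS needs one read per assembled set, that sets are capped at five loci (following the observation above that larger sets amplify less efficiently), and that every read costs the same. Oligonucleotide costs and failed assemblies are ignored, and the function names are hypothetical. The locus counts are those reported in the text.

```python
import math


def mtas_reads(n_loci: int, max_per_set: int = 5) -> int:
    """Reads needed when loci are split into sets of at most max_per_set."""
    return math.ceil(n_loci / max_per_set)


def read_reduction(loci_per_panel: dict, max_per_set: int = 5):
    """Return (conventional reads, mTAS reads, fold reduction)."""
    # Assumption: one conventional Sanger read per locus.
    conventional = sum(loci_per_panel.values())
    # Assumption: one mTAS read per assembled set of loci.
    mtas = sum(mtas_reads(n, max_per_set) for n in loci_per_panel.values())
    return conventional, mtas, conventional / mtas


panels = {
    "rheumatoid arthritis": 6,
    "type 1 diabetes": 8,
    "type 2 diabetes": 9,
    "EGFR exons 19-21": 3,
}
conventional, mtas, fold = read_reduction(panels)
print(f"{conventional} reads vs {mtas} reads ({fold:.1f}-fold fewer)")
```

Under these assumptions the four panels need 26 conventional reads but 7 mTAS reads. The actual saving depends on the assembly success rate and on whether each assembled fragment fits within one read.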
Although we used Sanger sequencing here to evaluate mTAS, this method can be used in conjunction with high-throughput sequencing technology to increase the throughput even further. For example, the Roche-454 sequencing platform, which has a read length of about 500 bp, can be used with mTAS to detect single-nucleotide polymorphisms spread out over the genome while retaining most of the sequence data.
Acknowledgments
This study was supported by a grant of the Korea Healthcare technology R&D Project, Ministry of Health & Welfare, Republic of Korea (A101259-1001-0000100), and BK21 program of the Ministry of Education, Science, and Technology. D.B. is a TJ Park junior faculty fellow supported by the Posco TJ Park Foundation.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ab.2011.04.012.
References
[1] S. Yang, R.E. Rothman, PCR-based diagnostics for infectious diseases: uses, limitations, and future applications in acute-care settings, Lancet Infect. Dis. 4 (2004) 337–348.
[2] L. Mamanova, A.J. Coffey, C.E. Scott, I. Kozarewa, E.H. Turner, A. Kumar, E. Howard, J. Shendure, D.J. Turner, Target-enrichment strategies for next-generation sequencing, Nat. Methods 7 (2010) 111–118.
[3] R.K. Saiki, D.H. Gelfand, S. Stoffel, S.J. Scharf, R. Higuchi, G.T. Horn, K.B. Mullis, H.A. Erlich, Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase, Science 239 (1988) 487–491.
[4] M.C. Edwards, R.A. Gibbs, Multiplex PCR: advantages, development, and applications, PCR Methods Appl. 3 (1994) S65–S75.
[5] J.Y. Chun, K.J. Kim, I.T. Hwang, Y.J. Kim, D.H. Lee, I.K. Lee, J.K. Kim, Dual priming oligonucleotide system for the multiplex detection of respiratory viruses and SNP genotyping of CYP2C19 gene, Nucleic Acids Res. 35 (2007) e40.
[6] M. Lovett, J. Kere, L.M. Hinton, Direct selection: a method for the isolation of cDNAs encoded by large genomic regions, Proc. Natl. Acad. Sci. USA 88 (1991) 9628–9632.
[7] S. Parimoo, S.R. Patanjali, H. Shukla, D.D. Chaplin, S.M. Weissman, cDNA selection: efficient PCR approach for the selection of cDNAs encoded in large chromosomal DNA fragments, Proc. Natl. Acad. Sci. USA 88 (1991) 9623–9627.
[8] T.J. Albert, M.N. Molla, D.M. Muzny, L. Nazareth, D. Wheeler, X. Song, T.A. Richmond, C.M. Middle, M.J. Rodesch, C.J. Packard, G.M. Weinstock, R.A. Gibbs, Direct selection of human genomic loci by microarray hybridization, Nat. Methods 4 (2007) 903–905.
[9] A. Gnirke, A. Melnikov, J. Maguire, P. Rogov, E.M. LeProust, W. Brockman, T. Fennell, G. Giannoukos, S. Fisher, C. Russ, S. Gabriel, D.B. Jaffe, E.S. Lander, C. Nusbaum, Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing, Nat. Biotechnol. 27 (2009) 182–189.
[10] P.M. Lizardi, X. Huang, Z. Zhu, P. Bray-Ward, D.C. Thomas, D.C. Ward, Mutation detection and single-molecule counting using isothermal rolling-circle amplification, Nat. Genet. 19 (1998) 225–232.
[11] D.O. Antson, A. Isaksson, U. Landegren, M. Nilsson, PCR-generated padlock probes detect single nucleotide variation in genomic DNA, Nucleic Acids Res. 28 (2000) E58.
[12] P. Hardenbol, J. Baner, M. Jain, M. Nilsson, E.A. Namsaraev, G.A. Karlin-Neumann, H. Fakhrai-Rad, M. Ronaghi, T.D. Willis, U. Landegren, R.W. Davis, Multiplexed genotyping with sequence-tagged molecular inversion probes, Nat. Biotechnol. 21 (2003) 673–678.
[13] P. Hardenbol, F. Yu, J. Belmont, J. Mackenzie, C. Bruckner, T. Brundage, A. Boudreau, S. Chow, J. Eberle, A. Erbilgin, M. Falkowski, R. Fitzgerald, S. Ghose, O. Iartchouk, M. Jain, G. Karlin-Neumann, X. Lu, X. Miao, B. Moore, M. Moorhead, E. Namsaraev, S. Pasternak, E. Prakash, K. Tran, Z. Wang, H.B. Jones, R.W. Davis, T.D. Willis, R.A. Gibbs, Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay, Genome Res. 15 (2005) 269–275.
[14] R. Tewhey, J.B. Warner, M. Nakano, B. Libby, M. Medkova, P.H. David, S.K. Kotsopoulos, M.L. Samuels, J.B. Hutchison, J.W. Larson, E.J. Topol, M.P. Weiner, O. Harismendy, J. Olson, D.R. Link, K.A. Frazer, Microdroplet-based PCR enrichment for large-scale targeted sequencing, Nat. Biotechnol. 27 (2009) 1025–1031.
[15] J. Shendure, H. Ji, Next-generation DNA sequencing, Nat. Biotechnol. 26 (2008) 1135–1145.
[16] S.V. Sharma, D.W. Bell, J. Settleman, D.A. Haber, Epidermal growth factor receptor mutations in lung cancer, Nat. Rev. Cancer 7 (2007) 169–181.
[17] W.P. Stemmer, A. Crameri, K.D. Ha, T.M. Brennan, H.L. Heyneker, Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides, Gene 164 (1995) 49–53.
[18] J. SantaLucia Jr., A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc. Natl. Acad. Sci. USA 95 (1998) 1460–1465.