A multiplexing single nucleotide polymorphism typing method based on restriction-enzyme-mediated single-base extension and capillary electrophoresis


ANALYTICAL BIOCHEMISTRY Analytical Biochemistry 329 (2004) 220–229 www.elsevier.com/locate/yabio

Yihua Che and Xiangning Chen*

Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, 800 E. Leigh Street, Suite 1-110, Richmond, VA 23298, USA

Received 3 December 2003. Available online 10 May 2004.

Abstract

Millions of single nucleotide polymorphisms (SNPs) have been identified in recent years. This provides a great opportunity for large-scale association and population studies. However, many high-throughput SNP typing techniques require expensive, dedicated instruments, which puts them out of reach for many laboratories. To meet the needs of these laboratories, we report a method that uses widely available DNA sequencers for SNP typing. The method uses a type II restriction enzyme to create extendable ends at target polymorphic sites and uses single-base extension (SBE) to discriminate alleles. In this design, a restriction site is engineered into one of the two polymerase chain reaction (PCR) primers so that the restriction endonuclease cuts immediately upstream of the targeted SNP site. Digestion of the PCR products generates a 5′-overhang structure at the targeted polymorphic site. This 5′-overhang structure then serves as a template for the SBE reaction, which generates allele-specific products using fluorescent dye-terminator nucleotides. Following SBE, the allele-specific products of different sizes can be resolved by DNA sequencers. Through primer design, we can create a series of PCR products that vary in size and contain only one restriction enzyme recognition site. This allows us to load many PCR products in a single capillary/lane. The method, restriction-enzyme-mediated single-base extension, is demonstrated by typing multiple SNPs simultaneously for 44 DNA samples. By multiplexing PCR and pooling multiplexed reactions together, this method has the potential to score 50–100 SNPs/capillary/run if the sizes of the PCR products are spaced every 5–10 bases across the 100 to 600 base range.

© 2004 Elsevier Inc. All rights reserved.

Keywords: SNP; Genotyping; Multiplex PCR; Restriction digestion; DNA sequencer; Capillary electrophoresis; Single base extension

Genetic variations are the basis of human diversity and play an important role in human diseases. Single nucleotide polymorphisms (SNPs)¹ are the most abundant variation in the human genome [1,2]. The large number of SNPs available in the public databases makes fine-mapping of disease genes realistic and exciting. However, there are practical problems for such studies, and the cost and throughput of SNP typing are among the most significant [3–5]. Most methods for SNP typing require dedicated instruments, such as microarray techniques [6,7], matrix-assisted laser desorption/ionization mass spectrometry [8,9], the SNP stream system [10], the TaqMan nuclease assay [11], pyrosequencing [12], and the FP-TDI method [13]. The high cost of dedicated instrumentation puts these methods out of reach for many laboratories.

* Corresponding author. Fax: 1-804-828-3223. E-mail address: [email protected] (X. Chen).

¹ Abbreviations used: SNPs, single nucleotide polymorphisms; CE, capillary electrophoresis; REs, restriction endonucleases; REM-SBE, restriction-enzyme-mediated single-base extension.

0003-2697/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ab.2004.01.038

DNA sequencers are among the most widely available instruments for biomedical research. Their throughput and automation have been demonstrated in large-scale sequencing projects such as the human genome project. DNA sequencers separate and detect DNA fragments by size and fluorescence labeling [14]. To use sequencers efficiently and economically, the key is to find a way to generate a series of products of different sizes and colors, because sequencers can separate and identify such products very efficiently. The SNaPshot and SNuPe techniques marketed by Applied Biosystems and Amersham were the first attempts in this direction. Because of the length limitation of extension primers, neither method was used broadly. Recently, Schouten et al. [15] reported a ligation-mediated method to quantify target sequences and potentially to type SNPs. In that method, a short probe containing a target-specific sequence and a common tail was ligated to a large probe, which was produced by cloning a short target sequence and stuffer sequences of variable length. Ligation products were then amplified with a universal primer set and separated and identified by capillary electrophoresis (CE). One weakness of this method was the cumbersome cloning procedure required for each target. CE has also been used to separate allele-specific PCR products [16]. However, allele-specific PCR is not a robust procedure. It has serious problems in multiplexing and does not work for some SNPs [17]. As a result, allele-specific PCR is not routinely used for SNP typing.

Restriction endonucleases (REs) are useful tools for generating DNA fragments of different lengths and have been widely used for genetic mapping. Direct use of REs for SNP typing has also been reported [18,19]. There is a special group of REs, type II restriction endonucleases, whose recognition sites lie several bases away from their cutting sites [20]. FokI, BbvI, BtgZI, and BceAI are examples of these enzymes (Table 2). This feature provides an opportunity to integrate a recognition site into a PCR primer while the digestion site lies in the amplified genomic sequence. Taking advantage of this property, we can design a PCR primer close to a target SNP site so that the SNP is right at the cutting site. After enzyme digestion, a 5′-overhang structure is created, and this structure can be extended by DNA polymerases. In this design, we control the size of the PCR product through the position of the reverse primer. This enables us to generate a unique size for each SNP and paves the way for high-level multiplex SNP typing by DNA sequencers.

In this article, we report the development of a technique that exploits this design for SNP typing. The technique, restriction-enzyme-mediated single-base extension (REM-SBE), overcomes the limitation of primer length and allows us to use the entire size range of DNA sequencers for genotyping. Using this design, we can significantly increase the capacity for multiplexing and reduce operating cost.

1. Materials and methods

1.1. DNA samples

Human genomic DNAs were obtained from Coriell Institute (Camden, NJ). The sample panel consisted of 44 individuals. The working concentration was 10 ng/µl.


1.2. PCR primer design

The forward primer was engineered to contain a type II RE recognition site at a specific position so that the restriction enzyme could cut the DNA fragment immediately upstream of the SNP site (in the 5′ to 3′ direction). For example, the recognition sequence, GGATG, was placed 13 bases upstream of the targeted SNP site to generate a FokI site (Fig. 1). Since the position of the forward primer was fixed in this design, the reverse primer was positioned to produce a unique size for each SNP. When each SNP had a unique size, multiple SNPs could be stacked together for a sequencer run. Common tails (F: 5′-CGGTGCGCGTCGCTCAGG-3′ for the forward primer, and R: 5′-TCCGATATCCCGGGTCGT-3′ for the reverse primer) were added to the forward and reverse primers, respectively, to improve the performance of multiplex PCR. Eighteen randomly selected markers were designed for this study using the Primer 3 program (http://www.broad.mit.edu/cgi-bin/primer/primer3_www.cgi/) [21]. Primers were obtained from Qiagen (Alameda, CA). The marker information and primer sequences are listed in Table 1. Some other REs that could be used for the REM-SBE technique are listed in Table 2.

1.3. Multiplex PCR

Multiplex PCR was optimized by a stepwise procedure. We first tested primer efficiency and specificity for each SNP individually using a standard three-step PCR protocol (94 °C, 30 s; 55 °C, 45 s; 65 °C, 1 min; 35 cycles). We then pooled four or five markers that had different amplicon sizes and showed similar efficiencies for multiplexing. Multiplex PCRs were performed using a two-step PCR protocol in a 20 µl reaction volume.

For the first step, multiplex PCR was performed for 15 cycles with a reaction mixture containing 20 mM Tris–HCl (pH 8.4), 2.5 mM MgCl2, 50 mM KCl, 6 mM (NH4)2SO4, 30 nM each primer (10 primers in total for a five-plex combination), 250 µM dNTPs (Invitrogen, Carlsbad, CA), 40 ng of DNA, and 1 U of HotMaster Taq DNA polymerase (Eppendorf, Hamburg, Germany). After the first step, the PCR was paused to add a 5 µl mixture containing 1 U of HotMaster Taq polymerase, 500 nM of each tail primer, and 500 µM dNTPs. The reaction was then resumed for 25 more cycles.

Two optimized programs were used for multiplex PCR. In program A, the first step consisted of 15 cycles of 95 °C for 30 s, 58 °C for 5 s, ramping from 58 to 48 °C at 0.1 °C/s, and 72 °C for 1 min. The second step used 25 cycles of 95 °C for 30 s, 60 °C for 1 min, and 72 °C for 1 min. In program B, the first step used 15 cycles of 95 °C for 30 s, 60 °C for 45 s with a temperature decrement of −1 °C/cycle, and 72 °C for 1 min. The second step for program B was the same as that in program A.
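The annealing schedules of the two first-step programs reduce to simple arithmetic. The following minimal sketch assumes that the first touchdown cycle of program B anneals at 60 °C and that each later cycle is 1 °C cooler; the text does not state whether the decrement is applied before or after the first cycle. The function names are illustrative and are not part of the published protocol.

```python
def touchdown_annealing_temps(start_c=60.0, step_c=-1.0, cycles=15):
    """Annealing temperature (deg C) for each first-step cycle of program B."""
    # Assumption: cycle 1 anneals at start_c; every later cycle adds step_c.
    return [start_c + step_c * cycle for cycle in range(cycles)]


def ramp_duration_s(start_c=58.0, end_c=48.0, rate_c_per_s=0.1):
    """Seconds spent ramping from start_c down to end_c in program A."""
    return abs(start_c - end_c) / rate_c_per_s


program_b = touchdown_annealing_temps()
print(program_b[0], program_b[-1])   # first and last annealing temperatures
print(ramp_duration_s())             # length of the program A ramp in seconds
```

Under these assumptions, the touchdown of program B ends at 46 °C in cycle 15, and the ramp of program A takes about 100 s in each cycle.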


Fig. 1. Schematic drawing illustrating the design of the REM-SBE technique. A restriction endonuclease recognition sequence, GGATG for FokI as shown, is inserted in the forward PCR primer so that the enzyme cuts the amplified PCR products at the position immediately upstream of the targeted polymorphic site. The cutting position is shown by the dashed line. The digestion creates a 5′-overhang structure that can be extended by DNA polymerases when appropriate dye-labeled ddNTPs are provided. The labeled products can then be separated and identified by DNA sequencers.
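The cut geometry in this design can be sketched in a few lines of Python. The sketch uses the FokI cut positions given in Table 2 (9 bases after GGATG on the top strand, 13 on the bottom strand). It assumes, as one reading of the design, that the SNP is the 13th base after GGATG and that the labeled terminator is added to the recessed 3′ end of the bottom strand of the larger fragment, so the incorporated base is the complement of the top-strand allele. The function name and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FOKI_SITE = "GGATG"
TOP_CUT = 9       # top strand is cut 9 bases 3' of the site (Table 2)
BOTTOM_CUT = 13   # bottom strand is cut 13 bases 3' of the site (Table 2)


def foki_overhang(top_strand):
    """Return (overhang, template_base, incorporated_base) for the first FokI site.

    Assumption: the sequence contains a single GGATG site and the SNP is
    the last base of the 4-base 5' overhang (13th base after the site).
    """
    site = top_strand.find(FOKI_SITE)
    if site < 0:
        raise ValueError("no FokI recognition site found")
    first_n = site + len(FOKI_SITE)            # index of the first base after GGATG
    if len(top_strand) < first_n + BOTTOM_CUT:
        raise ValueError("sequence too short to contain the cut site")
    overhang = top_strand[first_n + TOP_CUT:first_n + BOTTOM_CUT]
    template_base = top_strand[first_n + BOTTOM_CUT - 1]
    return overhang, template_base, COMPLEMENT[template_base]


# Hypothetical top strand: site, 12 primer bases, then the SNP base (C).
example = "AAGGATG" + "ACGTACGTACGT" + "C" + "TTTT"
print(foki_overhang(example))
```

With this reading, a forward primer can extend at most 12 bases past GGATG, which agrees with the forward primers listed in Table 1.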

For visualization, PCR products were stained with SYBR Green (Molecular Probes, Eugene, OR), kept at room temperature for 20 min, and then separated by electrophoresis on a 2% agarose gel (BioRad Laboratories, Hercules, CA). For any given gel analysis, the same amount of PCR product (5 µl) was loaded in each lane.

1.4. Restriction digestion and SBE

After PCR amplification, 15 µl of PCR products was incubated with 2 U of shrimp alkaline phosphatase (Roche, Indianapolis, IN) and 4 U of FokI RE (New England Biolabs, Beverly, MA) for 6 h at 37 °C to digest unincorporated nucleotides and to cut the amplicons at the designed position. The enzymes were then inactivated by heating for 15 min at 85 °C. The restriction-digested PCR products were labeled in an SBE reaction using fluorescent terminator nucleotides. The SBE reaction contained 10 µl of digested PCR products, 2 µl of 10× sequencing buffer, 1 U of Taq DNA polymerase (New England Biolabs), and a mixture of fluorescent terminator nucleotides (5-(and-6)-carboxytetramethylrhodamine (TAMRA)-ddATP, TAMRA-ddCTP, rhodamine 110 (R110)-ddGTP, R110-ddUTP; 40 nM each) (Perkin–Elmer, Boston, MA). The mixture of terminators was designed to use only two fluorescent dyes. When an A/C or G/T polymorphism was tested, R110-ddCTP and TAMRA-ddGTP were used. This design was intended to simplify color matrix correction. Distilled water was added to make a total reaction volume of 20 µl, and the mixture was incubated for 1 h at 74 °C.

1.5. Capillary electrophoresis

Following the incubation, SBE reactions were diluted to 35 µl and purified by column filtration using a Performa 96-well plate (Edge Biosystems, Gaithersburg, MD) following the manufacturer's instructions. One microliter of the filtered PCR product was resuspended in 9 µl of deionized formamide with 0.1 µl of ILS-600 DNA size standard (Promega, Madison, WI). The fragments were then separated and identified on the SpectruMedix capillary sequencer SCE9610 (SpectruMedix LLC, State College, PA) under the following conditions: sample injection at 3.0 kV for 120 s and data acquisition at 1.0 kV for 120 min. Electrophoresis was performed using sequencing gel from SpectruMedix in TBE buffer (0.09 M Tris, 0.09 M boric acid, pH 8.0, 0.002 M EDTA). The GenoSpectrum software (SpectruMedix) was used to analyze the electropherograms.

Table 1
SNP information and primer sequences

No | SNP | Forward primer (5′ to 3′) | Reverse primer (5′ to 3′) | Allele | Size^a (bp)
1 | rs1156853 | F-CAACTTTCGGATGATAACCAGTA | R-AGAATTTTACCAGATCTCCAATGT | [a/g] | 409
2 | rs1345662 | F-CACTTAGAGCGGATGGTAATTATGTCT | R-GAGGGCAAGCCTCTCTATATC | [c/g] | 331
3 | rs1990001 | F-TCCAGGGGATGCATGTCCTGTTC | R-CCTTTCCCTGGCCTAGTACAG | [a/g] | 171
4 | rs257926 | F-TTCAACCGGATGCCAACTGAGCAC | R-TCCTGAAGGGATGAGTTCC | [a/g] | 180
5 | rs1864922 | F-ACCCGGATGCAACAGTCACC | R-TGCAAGAATTGAGCTTTAATA | [c/t] | 245
6 | rs246943 | F-CCTTATTTAGGGGATGTACAAACACTT | R-ACGCCCGGCAAGATTCAT | [c/t] | 143
7 | rs246945 | F-AGAGGAGTGGATGCCTCTAATGTT | R-GGACACGCAGAATGGGAGA | [c/g] | 273
8 | rs149445 | F-ATGAAAAGGATGGAGTCACTG | R-AAATACATCTAACCATATTTAAGAG | [c/t] | 467
9 | rs1560636 | F-AATGAAAAGGATGGAGTCACTG | R-ACCCCAGGAAAGGACAAAACAA | [a/g] | 393
10 | rs1422318 | F-AGTTCTTGGGATGAAGGAAAT | R-CATTCCATGATATAATCTTTGTG | [a/g] | 490
11 | rs974495 | F-GTTGATGGGATGGTTAGAAAAAG | R-ACAACACAAGGTAGTTTCACG | [a/g] | 160
12 | rs2898095 | F-CAGGTAGGATGGGGCTTGTGTA | R-TCTCTAACATACCTATCAAGTCTA | [a/t] | 525
13 | rs2109857 | F-TCCTGGGGATGGAAATAAGGAC | R-AGCGGAAACTGCCTTAGCTG | [a/g] | 267
14 | rs27563 | F-TGCTGGGATGCATTTTGATGTT | R-CCCACACAAGGGATTGAAA | [a/t] | 209
15 | rs2045628 | F-AGTATAACAGGATGGAAAGAGCTG | R-ATTCCTATTCTTGAAACCTCTGG | [a/g] | 243
16 | rs2041189 | F-TGCAAATCGGATGCCTCTAGC | R-TTCTACTTTTATTCCATCATTGC | [c/g] | 133
17 | rs1609850 | F-TTGAGACGGATGTGACTAACACTG | R-CCAGGTAATGAATAATGTGAGGT | [a/t] | 328
18 | rs27562 | F-AGTTTACGGATGATTTAGGTCTCC | R-GCAATTGTAAGATTCAGGGAAG | [g/c] | 223

F, the forward common tail: 5′-CGGTGCGCGTCGCTCAGG. R, the reverse common tail: 5′-TCCGATATCCCGGGTCGT.
^a The size was calculated after single-base extension.

Table 2
Examples of type II restriction endonucleases that can be used with the REM-SBE technique

Restriction endonuclease | Recognition sequence and cutting site (^), top strand | Bottom strand
BbvI | 5′…GCAGC(N)8^…3′ | 3′…CGTCG(N)12^…5′
BceAI | 5′…ACGGC(N)10^…3′ | 3′…TGCCG(N)14^…5′
BtgZI | 5′…GCGATG(N)10^…3′ | 3′…CGCTAC(N)14^…5′
FokI | 5′…GGATG(N)9^…3′ | 3′…CCTAC(N)13^…5′

2. Results

2.1. Multiplex PCR

Our goals were to use DNA sequencers to increase throughput and reduce cost. To accomplish these goals, multiplexing was a necessity, and a robust multiplex PCR protocol was an essential part of our SNP typing protocol. We used a two-step procedure to optimize multiplex PCR. Since the inclusion of a FokI site in the forward PCR primers could introduce a maximum of 5 base mismatches, single-marker PCR was performed to examine specificity and efficiency. As shown in Fig. 2A, single-marker PCRs worked well for all 18 markers despite some variation in amplification efficiency. None of the markers had nonspecific product. Interestingly, those markers that had lower efficiency in single-marker PCR were not necessarily weaker in a multiplex setup (such as marker 10 in combination E and marker 16 in combination G; Fig. 2B). Whether the variation in efficiency was caused by the marker itself or by the mismatches introduced in the forward PCR primers was not clear. We noticed that a minimal concentration of genomic DNA (≥2 ng/µl) was necessary for successful multiplex PCRs. A lower concentration led to insufficient amplification for some markers.

While most five-plex combinations of the 18 markers could be successfully amplified, competition between primers caused uneven amplification in some combinations. In a few combinations, we observed nonspecific products with sizes of more than 1 kb. The uniformity and specificity of multiplexed amplicons could be adjusted by using different PCR programs. The touchdown program, program B, generated no nonspecific bands in any of the combinations tested. However, it had some difficulty producing even PCR products in some of the five-plex combinations. The ramping program, program A, on the other hand, gave better uniformity of PCR products in the five-plex sets (data not shown). Our two-step PCR procedure allowed us to achieve relatively even amplification of all multiplexed amplicons. We found that the primer concentrations in the first step were critical for successful multiplexing. In our testing, the range of concentration was between 20 and 40 nM for each primer. When the concentration was too low, some amplicons in the multiplex were not seen; on the other hand, higher concentrations normally led to uneven amplification of the amplicons. The use of two-domain primers, as observed by others [22], effectively improved the multiplex.

Fig. 2. Comparison of single-marker PCR and five-plex multiplex PCR. (A, top) Single-marker PCRs (Nos. 1 to 18, see Table 1). The results indicated that the inclusion of a restriction recognition site in the forward PCR primers did not change the specificity of PCR. (B, bottom) Some examples of randomly assembled five-plex PCRs. The combinations of markers used were A, 2/6/7/9/10; B, 1/3/8/12/18; C, 1/2/3/4/6; D, 4/6/7/9/10; E, 2/9/10/11/13; F, 5/10/12/14/16; G, 2/13/14/15/16; H, 2/3/6/7/9; I, 1/5/7/9/12; J, 2/10/13/15/16; K, 11/13/15/16/17; L, 3/8/11/14/17; M, 1/4/10/13/16; N, 4/9/13/15/16; O, 1/7/10/15/16; P, 3/4/9/10/16; Q, 3/9/10/17/18; and R, 3/4/6/9/12. The numbers in the combinations refer to the order listed in (A, top). These results demonstrated that our two-domain primer design and two-stage protocol performed well for multiplexing.

2.2. Alkaline phosphatase and restriction enzyme digestions

After PCR, we needed to inactivate excess dNTPs and to create an extendable end at the targeted polymorphic site. These tasks were accomplished by digestion with shrimp alkaline phosphatase and a type II RE. When we first tested the protocol, we performed the two enzyme digestions separately. To make the protocol more efficient, we tested whether the two reactions could be combined. Side-by-side comparisons indicated that shrimp alkaline phosphatase and FokI endonuclease did not interfere with each other when they were used together (data not shown). Combined or separate, the FokI RE cut the DNA fragments precisely at the designed position for all 18 markers tested. The restriction digestion produced two DNA fragments for each marker, and both fragments had a 5′-overhang structure that could be extended by DNA polymerase. For each marker, the smaller fragment, which contained the enzyme recognition site, was 40–50 bp long (forward primer plus forward tail) and could not be easily distinguished between markers. The larger fragments, however, were designed to have a different size for each amplicon so that they could be resolved by a capillary sequencer.

When several five-plex PCRs were pooled together for the digestions, we extended the digestion time from 6 to 8 h. The amount of enzymes was kept the same. In our tests, when three five-plex PCRs were pooled for the digestions, results identical to those from individual PCRs were obtained (data not shown). We did not pool more reactions because we had only 18 markers tested; 15 of them were put into three multiplex reactions. We believe that substantially more reactions could be pooled.

2.3. SBE and genotype scoring

After the digestions, both the shrimp alkaline phosphatase and the FokI endonuclease were inactivated by heating at 85 °C for 15 min. SBE was performed using Taq DNA polymerase and fluorescent terminators corresponding to the polymorphisms. Because the FokI digestion created a 5′-overhang structure at the polymorphic site, there was no need to use any extension primer. DNA polymerase extended the ends created by the restriction digestion and produced labeled, allele-specific DNA fragments. The labeled products could be easily separated and identified by DNA sequencers. We found that SBE was much more efficient when it was performed at elevated temperature, because the sticky ends of the FokI digestion could anneal together at room temperature and reduce extension efficiency. This was the primary reason that we decided to use a thermostable DNA polymerase for the extension.

A typical result is shown in Fig. 3. A homozygous sample had a single peak of one color, and the color of the peak represented the allele. Fig. 3A shows a single blue peak of 273 bases, as expected for marker rs246945, indicating that the sample was a homozygote for allele C. Similarly, a single green peak is seen in Fig. 3C, representing a homozygote for the G allele. For the heterozygote, both blue and green peaks were seen (Fig. 3B). We noticed that the blue and green peaks were offset by a few data points. This was because each fluorescent group had a distinct mobility and the high resolution of CE was able to separate one from another. Even the same fluorophore could be separated when it was linked to primers of the same length and sequence except for the polymorphic base at the 3′ end [23]. The sensitivity of CE, therefore, was very helpful in identifying the heterozygous samples: they always had two peaks of different colors, and the two peaks were offset by a few data points. The heights of the two peaks were approximately the same. While the peak height ratio changed between SNPs, it was constant for a given marker because the efficiency of incorporation of dye-terminators by Taq DNA polymerase was constant for a given sequence context [24,25]. When several SNPs were multiplexed together, the peak height pattern of individual SNPs did not change (compare the heterozygote in Fig. 3 with marker 7 in Fig. 4, which was the same SNP, rs246945, run by itself or multiplexed with other SNPs).

Fig. 3. Electropherograms showing allele discrimination for marker 7, rs246945 (G/C SNP, fragment size 273 bp). After FokI digestion, SBE was performed with R110-ddGTP and TAMRA-ddCTP, and extension products were separated by CE. Products with the expected size and allele-specific fluorescent labeling were identified. R110-labeled products (blue peaks) represented the C allele; TAMRA-labeled products (green peaks) represented the G allele. Red peaks were the DNA size standard ILS600. (A) C/C homozygote; (B) heterozygote; (C) G/G homozygote.

Fig. 4. Electropherogram showing marker separation and allele discrimination for a five-plex reaction (combination H: 2/3/6/7/9). F, R110-labeled ddGTP/ddUTP (blue peaks); T, TAMRA-labeled ddATP/ddCTP (green peaks); I, DNA size standard ILS 600 (red peaks).

To verify the accuracy and efficiency of the protocol, we typed 44 DNA samples for five SNPs in a single five-plex PCR. After CE, the color and size of the peaks, along with peak height and peak area, were exported from GenoSpectrum, the genotyping software from our sequencer vendor SpectruMedix. Genotypes were scored by a Microsoft Excel template [26] implementing the following criteria: (i) product size was within ±1.5 bases of the expected size; (ii) if the peak height ratio of allele 1/allele 2 was ≥10, the sample was scored as homozygous for allele 1; if the ratio was ≤0.1, the sample was scored as homozygous for allele 2; (iii) when the peak height ratio was between 0.1 and 10, genotypes were scored by a cluster algorithm based on Euclidean distances. The results of the five-plex reaction are shown in Fig. 5. The genotypes scored from the five-plex setup were a 100% match with the genotypes scored from single-marker reactions and were in complete concordance with the genotypes obtained from a different technology [13,27].

To be more efficient and cost effective, we needed to pool several multiplex PCRs together. In the protocol, there were several stages at which reactions could be pooled: after PCR, after phosphatase and endonuclease digestion, or after gel filtration before the sequencer run. Obviously, the earlier in the protocol we could pool, the more efficient we could be. We tried to multiplex more SNPs in a PCR and found that it was significantly more difficult when more than five SNPs were multiplexed. Therefore, we settled on multiplexing five SNPs per PCR. We then tried to pool several multiplexed PCRs together for the phosphatase and endonuclease digestion. Fig. 6 shows some results of the pooling tests. Fig. 6A shows an experiment that pooled two five-plex PCRs for the phosphatase and endonuclease digestion. Ten SNPs were clearly typed in a single capillary. Fig. 6B shows an experiment that pooled three five-plex PCRs. All 15 markers were separated and typed. The genotypes scored from the pooled samples matched those from the single five-plex setup, indicating that pooling several reactions did not compromise the phosphatase and endonuclease digestion and did not sacrifice genotype quality.
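The first two scoring criteria in Section 2.3 can be written as a short function. This is a minimal sketch, not the Excel template cited in the text: the function name and the returned labels are hypothetical, the handling of zero peak heights is an assumption, and criterion (iii) is not implemented because the clustering algorithm is not described in detail, so intermediate ratios are only flagged for clustering.

```python
SIZE_TOLERANCE = 1.5   # bases, criterion (i)
HOMO_RATIO = 10.0      # allele 1/allele 2 peak height ratio, criterion (ii)


def score_genotype(expected_size, observed_size, height1, height2):
    """Score one marker from its size and the two allele peak heights.

    Returns 'homo allele 1', 'homo allele 2', 'cluster' (ratio between
    0.1 and 10, left to a clustering step), or None when no call is made.
    """
    # Criterion (i): the product must lie within 1.5 bases of the expected size.
    if abs(observed_size - expected_size) > SIZE_TOLERANCE:
        return None
    # Assumption: no call when neither allele produced a peak.
    if height1 <= 0 and height2 <= 0:
        return None
    # Assumption: a missing allele 2 peak counts as an infinite ratio.
    if height2 <= 0:
        return "homo allele 1"
    ratio = height1 / height2
    # Criterion (ii): extreme ratios are scored as homozygotes.
    if ratio >= HOMO_RATIO:
        return "homo allele 1"
    if ratio <= 1 / HOMO_RATIO:
        return "homo allele 2"
    # Criterion (iii): intermediate ratios go to the clustering step.
    return "cluster"


print(score_genotype(273, 273.4, 5000, 100))
print(score_genotype(273, 272.8, 3000, 2800))
```

Keeping the size check first means that a peak outside the expected window is never called, whatever its height ratio.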

Fig. 5. Genotyping results from a five-plex reaction (combination A). Five SNPs were typed for 44 subjects. After the reactions and purification, samples were separated on the SCE9610 sequencer. Genotypes were scored by peak size, peak color, and peak height ratio as described in the text.

3. Discussion

In this article, we demonstrate the principles of a new SNP typing method that uses type II restriction enzymes and DNA sequencers. We show that a recognition site can be engineered in one of the PCR primers and that the mismatches introduced by the recognition site do not compromise the efficiency and specificity of PCR. We further demonstrate that an extendable 5′-overhang structure is produced precisely at the position immediately before the targeted SNP site and that allele-specific products are produced by SBE reactions. The quality and accuracy of the method are illustrated by typing five SNPs simultaneously in a single PCR for 44 subjects.

In the protocol, we show that we can increase efficiency by multiplexing PCR and by pooling several multiplexed reactions for a sequencer run. For multiplex PCR, we use a two-domain primer design that has a target-specific domain at the 3′ end and a common tag at the 5′ end. We find that our two-stage procedure works well for multiplexing four or five markers. From the 18 SNPs used in this study, we randomly selected sets of 5 SNPs to multiplex, and most of them worked well on the first try (Fig. 2 shows some of these results). We believe that, in addition to the two-stage procedure and two-domain primer design, the primer concentration used in the first reaction and the mismatches introduced by the FokI site in the primers contribute to the success of multiplex PCR. Pooling several multiplexed reactions for enzyme digestion, cleanup, and the sequencer run is key to reducing the overall cost of genotyping because these are the most time-consuming and expensive steps in the protocol. The more reactions we can pool, the lower the cost. If we could pool 10 five-plex reactions, we would type 50 SNPs in a capillary. The throughput would be significant and the cost would be competitive.

An important reason to develop a sequencer-based SNP typing method is to use DNA sequencers that are widely accessible in biomedical research laboratories and biotech/genomic companies. Almost all methods currently available for SNP typing require specialized, dedicated instruments that are expensive and beyond the reach of most laboratories. A method that uses popular sequencers would meet the needs of these laboratories and significantly stimulate research in SNP-related fields. We believe that the method described here meets this demand.

Our method has some limitations. The use of an RE to produce extendable ends at the polymorphic sites makes it difficult to type SNPs that are located close to the same restriction recognition sequence used in the PCR primers. Fortunately, there are several restriction enzymes we can use in our method. These include FokI, BbvI, BtgZI, and BceAI (Table 2). All these enzymes have at least 8 bp between the cutting site and the recognition sequence; this is sufficient to allow robust PCR, as demonstrated by FokI, whose distance between the cutting site and the recognition sequence is 9 bp (Fig. 2). If one enzyme does not work for a given SNP, we can use a different one. Because of the limited distance between the recognition and cutting sites of the restriction enzymes, PCR primer design may be restricted to some extent. Since there are two orientations in which to design a PCR primer for a given SNP, this should be manageable.
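The throughput estimates discussed above depend on how closely product sizes can be stacked in one capillary. The sketch below, a rough illustration rather than part of the published protocol, counts the size slots available in the 100 to 600 base range and reports the smallest size gap within a pool, using the product sizes from Table 1. The function names are hypothetical, and counting slots as the range divided by the spacing is an assumption chosen to match the 50–100 figure quoted in the abstract.

```python
# Product sizes after single-base extension (bp), keyed by marker number (Table 1).
TABLE1_SIZES = {
    1: 409, 2: 331, 3: 171, 4: 180, 5: 245, 6: 143,
    7: 273, 8: 467, 9: 393, 10: 490, 11: 160, 12: 525,
    13: 267, 14: 209, 15: 243, 16: 133, 17: 328, 18: 223,
}


def size_slots(low=100, high=600, spacing=5):
    """Number of product sizes that fit between low and high at a given spacing."""
    # Assumption: slots are counted as (range / spacing), without an extra endpoint.
    return (high - low) // spacing


def min_size_gap(markers):
    """Smallest difference (bp) between any two product sizes in a pool."""
    sizes = sorted(TABLE1_SIZES[m] for m in markers)
    return min(b - a for a, b in zip(sizes, sizes[1:]))


print(size_slots(spacing=5), size_slots(spacing=10))
print(min_size_gap([2, 3, 6, 7, 9]))                       # combination H
print(min_size_gap([1, 2, 7, 10, 11, 13, 15, 16, 17, 18]))  # pool of Fig. 6A
```

A check of this kind could be run before pooling to flag marker pairs whose sizes are closer than the spacing the sequencer is expected to resolve.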

Fig. 6. Examples of electropherograms showing marker separation and allele discrimination for pooled multiplex PCRs. (A) Pooling of two five-plex reactions. Ten markers (1/2/7/10/11/13/15/16/17/18) were clearly separated. (B) Three five-plex reactions were pooled. Fifteen markers (1/2/3/4/6/7/8/9/10/12/13/15/16/18) were separated and genotypes scored. The results show that pooling several multiplex PCRs is an effective way to increase throughput and to reduce cost.

4. Conclusion

We report the development of an SNP typing method that combines the accuracy of the SBE reaction and the sensitivity of CE. SBE is one of the best biochemistries for SNP typing, and most commercially available techniques today use this same biochemistry. DNA sequencers and other CE platforms have proven high-throughput and automation capabilities. The primary obstacle to using CE efficiently is the need to create a set of allele-specific products with different lengths. We overcome this barrier by taking advantage of type II restriction enzymes and engineering an enzyme recognition site in one of the PCR primers. This design enables us to obtain allele-specific products as long as a PCR amplicon. By purposefully varying the sizes of the PCR products, we can stack many SNPs in a capillary and use the resolving power of CE efficiently. As a result, DNA sequencers can be used for SNP typing efficiently and economically.

Acknowledgments

The authors thank Ms. Xu Wang and Shaon Hussein for their help in sample preparation and other technical assistance. We are grateful to Dr. Kenneth S. Kendler for his critical reading of the manuscript. The study was supported by funds from the Virginia Institute for Psychiatric and Behavioral Genetics and the Department of Psychiatry.

References

[1] R. Sachidanandam, D. Weissman, S.C. Schmidt, J.M. Kakol, L.D. Stein, G. Marth, S. Sherry, J.C. Mullikin, B.J. Mortimore, D.L. Willey, S.E. Hunt, C.G. Cole, P.C. Coggill, C.M. Rice, Z. Ning, J. Rogers, D.R. Bentley, P.Y. Kwok, E.R. Mardis, R.T. Yeh, B. Schultz, L. Cook, R. Davenport, M. Dante, L. Fulton, L. Hillier, R.H. Waterston, J.D. McPherson, B. Gilman, S. Schaffner, W.J. Van Etten, D. Reich, J. Higgins, M.J. Daly, B. Blumenstiel, J. Baldwin, N. Stange-Thomann, M.C. Zody, L. Linton, E.S. Lander, D. Altshuler, A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms, Nature 409 (2001) 928–933.

[2] J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton, H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, J.D. Gocayne, P. Amanatides, R.M. Ballew, D.H. Huson, J.R. Wortman, Q. Zhang, C.D. Kodira, X.H. Zheng, L. Chen, M. Skupski, G. Subramanian, P.D. Thomas, J. Zhang, G.L. Gabor Miklos, C. Nelson, S. Broder, A.G. Clark, J. Nadeau, V.A. McKusick, N. Zinder, A.J. Levine, R.J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, F.V. Di, P. Dunn, K. Eilbeck, C. Evangelista, A.E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T.J. Heiman, M.E. Higgins, R.R. Ji, Z. Ke, K.A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G.V. Merkulov, N. Milshina, H.M. Moore, A.K. Naik, V.A. Narayan, B. Neelam, D. Nusskern, D.B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M.L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y.H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N.N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J.F. Abril, R. Guigo, M.J. Campbell, K.V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y.H. Chiang, M. Coyne, C. Dahlke, A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C.
Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, The sequence of the human genome, Science 291 (2001) 1304–1351. P.Y. Kwok, Methods for genotyping single nucleotide polymorphisms, Annu. Rev. Genomics Hum. Genet. 2 (2001) 235–258. X. Chen, P.F. Sullivan, Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput, Pharmacogenomics J. 3 (2003) 77–96. A.C. Syvanen, Accessing genetic variation: Genotyping single nucleotide polymorphisms, Nat. Rev. Genet. 2 (2001) 930–942. J.B. Fan, X. Chen, M.K. Halushka, A. Berno, X. Huang, T. Ryder, R.J. Lipshutz, D.J. Lockhart, A. Chakravarti, Parallel genotyping of human SNPs using generic high-density oligonucleotide tag arrays, Genome Res. 10 (2000) 853–860. B. Armstrong, M. Stewart, A. Mazumder, Suspension arrays for high throughput, multiplexed single nucleotide polymorphism genotyping, Cytometry 40 (2000) 102–108. T.J. GriYn, L.M. Smith, Single-nucleotide polymorphism analysis by MALDI-TOF mass spectrometry, Trends Biotechnol. 18 (2000) 77–84.

Y. Che, X. Chen / Analytical Biochemistry 329 (2004) 220–229 [9] M.S. Bray, E. Boerwinkle, P.A. Doris, High-throughput multiplex SNP genotyping with MALDI-TOF mass spectrometry: practice, problems and promise, Hum. Mutat. 17 (2001) 296–304. [10] P.A. Bell, S. Chaturvedi, C.A. Gelfand, C.Y. Huang, M. Kochersperger, R. Kopla, F. Modica, M. Pohl, S. Varde, R. Zhao, X. Zhao, M.T. Boyce-Jacino, SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug discovery, Biotechniques Suppl. (2002) 70–77. [11] K.J. Livak, Allelic discrimination using Xuorogenic probes and the 5⬘ nuclease assay, Genet. Anal. 14 (1999) 143–149. [12] M. Ronaghi, Pyrosequencing sheds light on DNA sequencing, Genome Res. 11 (2001) 3–11. [13] X. Chen, L. Levine, P.Y. Kwok, Fluorescence polarization in homogeneous nucleic acid analysis, Genome Res. 9 (1999) 492–498. [14] C. Heller, Principles of DNA separation with capillary electrophoresis, Electrophoresis 22 (2001) 629–643. [15] J.P. Schouten, C.J. McElgunn, R. Waaijer, D. Zwijnenburg, F. Diepvens, G. Pals, Relative quantiWcation of 40 nucleic acid sequences by multiplex ligation-dependent probe ampliWcation, Nucleic Acids Res. 30 (2002) e57. [16] I. Medintz, W.W. Wong, G. Sensabaugh, R.A. Mathies, High speed single nucleotide polymorphism typing of a hereditary haemochromatosis mutation with capillary array electrophoresis microplates, Electrophoresis 21 (2000) 2352–2358. [17] S. Ayyadevara, J.J. Thaden, R.J. Shmookler Reis, Discrimination of primer 3⬘-nucleotide mismatch by taq DNA polymerase during polymerase chain reaction, Anal. Biochem. 284 (2000) 11–18. [18] W.H. Liu, M. Kaur, G.M. Makrigiorgos, Detection of hotspot mutations and polymorphisms using an enhanced PCR-RFLP approach, Hum. Mutat. 21 (2003) 535–541.

229

[19] Z. Ronai, M. Sasvari-Szekely, A. Guttman, Miniaturized SNP detection: quasi-solid-phase RFLP analysis, Biotechniques 34 (2003) 1172–1173. [20] A. Pingoud, A. Jeltsch, Structure and function of type II restriction endonucleases, Nucleic Acids Res. 29 (2001) 3705– 3727. [21] S. Rozen, H. Skaletsky, Primer3 on the WWW for general users and for biologist programmers, Methods Mol. Biol. 132 (2000) 365–386. [22] Shuber AP., 1999. Universal primer sequence for multiplex DNA ampliWcation. US patent 5882856. [23] G. Matyas, C. Giunta, B. Steinmann, J.P. Hossle, R. Hellwig, QuantiWcation of single nucleotide polymorphisms: a novel method that combines primer extension assay and capillary electrophoresis, Hum. Mutat. 19 (2002) 58–68. [24] L.T. Parker, Q. Deng, H. Zakeri, C. Carlson, D.A. Nickerson, P.Y. Kwok, Peak height variations in automated sequencing of PCR products using Taq dye-terminator chemistry, Biotechniques 19 (1995) 116–121. [25] L.T. Parker, H. Zakeri, Q. Deng, S. Spurgeon, P.Y. Kwok, D.A. Nickerson, AmpliTaq DNA polymerase, FS dye-terminator sequencing: analysis of peak height patterns, Biotechniques 21 (1996) 694–699. [26] E.J. van den Oord, Y. Jiang, B.P. Riley, K.S. Kendler, X. Chen, FPTDI SNP scoring by manual and statistical procedures: a study of error rates and types, Biotechniques 34 (2003) 610–620, see also p. 622. [27] X. Chen, Fluorescence polarization for single nucleotide polymorphism genotyping, Comb. Chem. High Throughput Screen. 6 (2003) 213–223.