Analytical Biochemistry 377 (2008) 46–54
Analytical Biochemistry journal homepage: www.elsevier.com/locate/yabio
Single-molecule polymerase chain reaction reduces bias: Application to DNA methylation analysis by bisulfite sequencing

Aparna Chhibber, Benjamin G. Schroeder *

Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA
Article info

Article history: Received 10 November 2007; Available online 4 March 2008

Keywords: DNA methylation; 5-Methylcytosine; Bisulfite; Single-molecule PCR; Artifacts; DNA sequencing
Abstract

The treatment of DNA with bisulfite, which converts C to U but leaves 5-methyl-C unchanged, forms the basis of many analytical techniques for DNA methylation analysis. Many techniques exist for measuring the methylation state of a single CpG but, for analysis of an entire region, cloning and sequencing remains the gold standard. However, biases in polymerase chain reaction (PCR) amplification and in cloning can skew the results. We hypothesized that single-molecule PCR (smPCR) amplification would eliminate the PCR amplification bias because competition between templates that amplify at different efficiencies no longer exists. The amplified products can be sequenced directly, thus eliminating cloning bias. We demonstrated this accurate and unbiased approach by analyzing a sample that was expected to contain a 50:50 ratio of methylated to unmethylated molecules: a region of the X-linked FMR1 gene from a human female cell line. We compared traditional cloning and sequencing to smPCR and sequencing. Sequencing smPCR products gave an expected methylated to unmethylated ratio of 48:52, whereas conventional cloning and sequencing gave a biased ratio of 72:28. Our results show that smPCR sequencing can eliminate both PCR and cloning bias and represents an attractive approach to bisulfite sequencing.

© 2008 Elsevier Inc. All rights reserved.
* Corresponding author. Fax: +1(650)-554-2660. E-mail addresses: [email protected], [email protected] (B.G. Schroeder).
0003-2697/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ab.2008.02.026

DNA methylation, in which cytosine in a 5'-CpG-3' sequence context is methylated at the 5 position, is associated with normal processes such as epigenetic reprogramming [1], X chromosome inactivation [2], environmental exposure and aging [3,4], and disease processes [5,6] including cancer [7,8], mental retardation, and autoimmune disease [9]. Advances in the bisulfite conversion of DNA [10,11] have led to increased use of this method for methylation analysis. Once normal C is converted to U, analysis essentially becomes a problem of identifying and measuring sequence differences. Many techniques have been developed, spanning the spectrum from those that analyze a single nucleotide difference to those that measure global changes in methylation [8,12]. One very popular technique is bisulfite sequencing, first described by Frommer et al. [13] in 1992. In contrast to single-nucleotide-polymorphism-based techniques, sequencing a CpG-containing region measures the methylation state of all the CpGs within the region and is useful for detailed analysis and initial surveys of regions suspected to be under methylation regulation. Another advantage of bisulfite sequencing is the built-in quality control for bisulfite conversion. Because methylation happens only at CpG, all CpH (CpA, CpC, and CpT) cytosines should be converted to T in the resulting bisulfite sequence. The detection of CpH is evidence of incomplete bisulfite conversion.

There are two methods commonly used when performing bisulfite sequencing, each with benefits and drawbacks. One method is to directly sequence the amplicon. The other method is to PCR amplify a region of interest, clone it, and then pick several clones for sequencing. Although direct amplicon sequencing is faster and requires less sequencing, there are several issues to consider. Sequence data generated from fluorescently labeled dideoxy terminators requires extensive normalization and processing to quantitatively detect 20% differences [14]. Pyrosequencing is more quantitative but is limited to at most 15 CpGs in an amplicon of no larger than 100 bp [15]. Furthermore, any approach that analyzes a pool of molecules, regardless of its quantitative accuracy, cannot provide information about the methylation haplotype, that is, the methylation status of each CpG within an individual DNA molecule. Finally, when analyzing PCR amplicons derived from a mixed pool of template molecules, care must be taken to measure PCR bias, which can often be severe [16–18]. If PCR bias is observed, steps must be taken to correct for it or to change reaction conditions to try to reduce it. PCR bias is a result of the difference in amplification efficiencies of sequences within a PCR. Sequences that amplify more efficiently become overrepresented in the final amplicon population. Several approaches for the reduction of bisulfite-specific PCR bias have been reported. Voss et al. [17]
Bias reduction with single-molecule PCR / A. Chhibber, B.G. Schroeder / Anal. Biochem. 377 (2008) 46–54
reported that bias in favor of unmethylated targets could be reduced by the addition of betaine to the PCR. Wojdacz and Hansen [19] found that the amplification bias in favor of unmethylated targets could be reversed by redesigning the primers to include one or two CpGs. Shen et al. [20] reported that the amount of PCR bias observed can change depending on the annealing temperature of the PCR.

Unlike direct PCR sequencing, PCR amplification followed by cloning and sequencing has the advantage of producing methylation haplotypes. However, the process still relies on PCR from a pool of molecules and can be subject to PCR bias. One additional source of bias associated with cloning and sequencing is the cloning step itself [21], where the ratio of methylated to unmethylated PCR products is not faithfully maintained through the ligation, transformation, and bacterial growth steps.

We sought to circumvent these problems by using single-molecule PCR (smPCR)¹ bisulfite sequencing. smPCR has been employed for the analysis of single-copy genes in sperm (see [22] and references therein), for digital PCR [23], for analysis of somatic mutations (reviewed in [24]), for massively parallel sequencing (reviewed in [25]), and for screening protein expression libraries (see [26] and references therein). Here we demonstrate that single-molecule PCR bisulfite sequencing provides all the benefits of bisulfite cloning and sequencing (quantitation, methylation haplotypes, and internal QC) while eliminating the problems of PCR and cloning bias. The result is detailed, unbiased, molecule-by-molecule methylation mapping that can provide quantitative descriptions of the methylation state of the region of interest and the methylation status of each individual CpG within the region.
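The effect of unequal amplification efficiencies on a pooled reaction, and its absence when a well holds a single template, can be illustrated with a minimal sketch of idealized exponential amplification. The per-cycle efficiencies (0.95 and 0.85), the starting copy numbers, and the function name are hypothetical values chosen for illustration; they are not measurements from this study, and plateau effects are ignored.

```python
def amplified_fraction(start_a, start_b, eff_a, eff_b, cycles):
    """Fraction of product derived from template A after idealized exponential PCR.

    Each cycle multiplies a template's copy number by (1 + efficiency).
    Plateau effects are ignored, so this is only a rough model.
    """
    a = start_a * (1.0 + eff_a) ** cycles
    b = start_b * (1.0 + eff_b) ** cycles
    return a / (a + b)


# Hypothetical efficiencies: template A amplifies slightly better than template B.
pooled = amplified_fraction(5000, 5000, 0.95, 0.85, cycles=35)
print(f"pooled 50:50 input, fraction A after 35 cycles: {pooled:.2f}")

# With a single template molecule in the well there is no competitor,
# so all product comes from that molecule whatever its efficiency.
single = amplified_fraction(1, 0, 0.85, 0.95, cycles=45)
print(f"single-molecule well, fraction from its one template: {single:.2f}")
```

In this toy model a modest efficiency difference is enough to push a 50:50 input far from 50:50 in the pooled product, whereas the single-template well can only ever report the molecule it contains.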
¹ Abbreviations used: smPCR, single-molecule PCR; NTC, no-template control.

Materials and methods

Bisulfite conversion

Female human cell line genomic DNA from CEPH individual 1347-02 (Applied Biosystems 403062) or in vitro universally methylated genomic DNA (Chemicon S7821) was converted using the methylSEQr Bisulfite Conversion Kit (Applied Biosystems 4379580) according to the manufacturer's instructions. Genomic DNA (300 ng) was converted and isolated in a total volume of 50 µl for a concentration of 6 ng/µl.

PCR, cloning, and sequencing

The FMR1 region was amplified in a 50-µl PCR containing 1X Gold Buffer, 200 nM each dNTP, 2 mM MgCl2, 0.1 unit AmpliTaq Gold polymerase (all from Applied Biosystems), 0.5 mg/ml bovine serum albumin (Sigma B8667), 0.5% (v/v) glycerol (Shelton Scientific IB15760), 250 nM each primer (forward: 5'-GTGTAAAACGACGGCCAGTTGAGTGTATTTTTGTAGAAATGGG; reverse: 5'-GCAGGAAACAGCTATGACCTCTCTCTTCAAATAACCTAAAAAC; underlined portions correspond to sequencing primer tails), and 30 ng (approximately 10,000 copies) bisulfite-converted genomic DNA. The reaction was thermally cycled in a GeneAmp PCR System 9700 (Applied Biosystems) as follows: 5 min at 95 °C (to activate the hot-start polymerase), 5 cycles of 95 °C/30 s, 60 °C/2 min, 72 °C/3 min; 30 cycles of 95 °C/30 s, 65 °C/1 min, 72 °C/3 min; hold at 72 °C/7 min; hold at 4 °C. The control was an identical reaction without genomic DNA. An aliquot of each reaction was analyzed by agarose gel electrophoresis to confirm the presence of the expected 312-bp amplicon and the absence of any product in the control reaction.

One microliter of PCR product was cloned into pCR4-TOPO using the TOPO TA Cloning Kit (Invitrogen) according to the manufacturer's instructions, and a 2-µl aliquot of this reaction was used to transform One Shot TOP 10 chemically competent cells (Invitrogen). After overnight growth on LB agar plates containing 30 µg/ml kanamycin, isolated colonies were inoculated into 2.3 ml of LB containing 100 µg/ml ampicillin for overnight growth at 37 °C with shaking. Plasmids were purified with the QIAprep Spin Miniprep Kit (Qiagen). Sequencing reactions contained 2 µl plasmid DNA, 4 µl BigDye Terminator v3.1 Ready Reaction Mix (Applied Biosystems), and 320 nM primer 5'-ATTAATGCAGCTGGCACGAC in a 10-µl volume. Reactions were thermally cycled as follows: 1 min at 96 °C, 25 cycles of 96 °C/10 s, 50 °C/4 min, then purified with the BigDye XTerminator Purification Kit (Applied Biosystems) according to the manufacturer's instructions. Purified sequencing products were analyzed on a 3130xl Genetic Analyzer with a 50-cm array and POP-7 polymer (all from Applied Biosystems). Data were analyzed with Sequencing Analysis 5.2 with the KB Basecaller and SeqScape 2.5 (all from Applied Biosystems).

Primer screening

Twelve primer sets previously described [18] were screened for their propensity to generate template-independent amplification (primer dimer). Duplicate 10-µl no-template reactions containing 1X SYBR Green Master Mix (Applied Biosystems 4309155) and 250 nM each primer were thermally cycled as follows: 10 min at 95 °C (to activate the hot-start polymerase), 5 cycles of 95 °C/15 s, 60 °C/2 min, 72 °C/3 min; 40 cycles of 95 °C/15 s, 65 °C/1 min, 72 °C/3 min. Reactions were monitored on a 7900HT Real-Time PCR instrument (Applied Biosystems).

Single molecule PCR

SYBR Green reaction mixtures were prepared with a series of 10-fold serial dilutions of bisulfite-converted genomic DNA template. For each template dilution, and a no-template control mixture, a 5-µl aliquot was transferred into each of 12 wells for real-time PCR analysis as described above. Negative wells were counted and used to calculate the copies/µl concentration of the bisulfite-converted genomic DNA stock as follows. The distribution of molecules into wells is modeled as a Poisson distribution, where the probability of getting $k$ molecules in a well is $P(k) = e^{-\lambda}\lambda^{k}/k!$, where $e$ is the base of the natural logarithm, $k!$ is the factorial of $k$, and $\lambda$ is equal to the expected number of occurrences per well, in other words, the average number of molecules per 5-µl aliquot. From this relation, it follows that the probability of getting 0 molecules in a well is $P(0) = e^{-\lambda}$, the probability of getting one molecule in a well is $P(1) = \lambda e^{-\lambda}$, and the probability of getting two or more molecules in a well is $P(>1) = 1 - P(0) - P(1)$. A positive amplification can occur as a result of one or more than one molecule present in the well, whereas a negative amplification indicates either that no molecules were present or that the PCR failed; PCR failure can be ruled out if higher concentrations in the dilution series generated amplification. The fraction of wells that do not amplify should therefore equal $P(0)$, and thus the average number of molecules per 5-µl aliquot is $\lambda = -\ln(P(0))$. Based on the dilution used, the stock concentration can then be calculated.

For single-molecule methylation analysis of FMR1, a 1X SYBR Green reaction containing 250 nM each primer and 0.01 copies/µl bisulfite-converted genomic DNA was prepared and 20 µl per well distributed to 336 wells of a 384-well plate. The remaining 48 wells were filled with 20 µl each of the corresponding no-template control reaction. The plate was thermally cycled as above. For analysis without a real-time PCR instrument, after thermal cycling the plate was heated to 70 °C for 1 min and then imaged on a 365-nm UV transilluminator with a standard ethidium bromide filter (AlphaImager HP; Alpha Innotech).
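The Poisson titration described for the single-molecule PCR dilution series can be sketched as follows. This is a minimal illustration rather than the authors' own calculation: the function names, the `dilution_factor` parameter, and the example counts (8 negative wells out of 12 at a 1:1000 dilution) are hypothetical.

```python
import math


def mean_copies_per_well(negative_wells, total_wells):
    """Estimate lambda (mean template molecules per well) from the fraction of
    wells that failed to amplify, using P(0) = exp(-lambda)."""
    if not 0 < negative_wells <= total_wells:
        raise ValueError("need at least one negative well and negative <= total")
    p0 = negative_wells / total_wells
    return -math.log(p0)


def stock_copies_per_ul(negative_wells, total_wells, aliquot_ul, dilution_factor):
    """Back-calculate the undiluted stock concentration in copies/µl.

    dilution_factor is the fold dilution of the stock in the reaction mix
    (for example 1000 for a 1:1000 dilution).
    """
    lam = mean_copies_per_well(negative_wells, total_wells)
    return lam / aliquot_ul * dilution_factor


# Hypothetical titration result: 8 of 12 wells negative at a 1:1000 dilution,
# with 5-µl aliquots per well.
lam = mean_copies_per_well(8, 12)
print(f"lambda = {lam:.3f} molecules per well")
print(f"stock  = {stock_copies_per_ul(8, 12, 5.0, 1000):.1f} copies/µl")
```

The estimate is only defined when at least one well is negative; a dilution in which every well amplifies gives no upper bound on the concentration, which is why a dilution series is run.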
Sequencing single-molecule PCR products

PCR products (10 µl) were mixed with 4 µl ExoSAP-IT (USB), incubated at 37 °C for 30 min followed by 80 °C for 15 min and then diluted with 42 µl water. Sequencing reactions contained 2 µl diluted amplicon, 4 µl BigDye Terminator v3.1 Ready Reaction Mix, and 320 nM primer (either M13-forward 5'-TGTAAAACGACGGCCAGT or M13-reverse 5'-CAGGAAACAGCTATGACC) in a 10-µl volume. Reactions were thermally cycled as follows: 1 min at 96 °C, 25 cycles of 96 °C/10 s, 50 °C/4 min, and then purified with the BigDye XTerminator Purification Kit. Purified sequencing products were analyzed on a 3730xl DNA Analyzer with a 36-cm array and POP-7 polymer.

Results and discussion

Direct PCR sequencing

Primers specific for bisulfite-converted DNA and tailed with M13-forward and M13-reverse sequencing primer binding sites were used to amplify a region of the FMR1 gene. The primer binding sites do not contain CpGs and will amplify the target regardless of methylation status. The process of X-inactivation in females, in which one of the two copies of X-linked genes is methylated, provides a natural 50:50 mixture of methylated and unmethylated DNA. The X-linked gene FMR1 was chosen to test for bias in the detection of methylated and unmethylated sequences. Bisulfite-converted female human (CEPH) genomic DNA or bisulfite-converted universally methylated genomic DNA was amplified with M13-tailed primers specific for the top strand of the FMR1 gene and sequenced with M13 forward primer (Fig. 1).
Fig. 1. The M13-forward primer was used to directly sequence FMR1 PCR products. Sequencing template for the upper sequence trace is the PCR product from universally methylated genomic DNA, which contains only methylated CpGs. Lower trace template is from CEPH female genomic DNA, which contains both methylated and unmethylated CpGs.
Sequencing extension temperature was lowered to 50 °C instead of the normal 60 °C to reduce the top-heavy "ski-slope" in peak heights observed when generating the C-poor sequencing products from unmethylated templates [27]. As expected, all non-CpG cytosines have been converted to T. In the universally methylated sample (Fig. 1, upper trace), all CpG cytosines remain C. In the female sample (Fig. 1, lower trace), mixed C/T sequences appeared at each CpG location, indicating a mixture of methylated and unmethylated C at these positions.

Whereas direct sequencing of PCR products derived from bisulfite-converted DNA is simple and straightforward, the analysis of the results from a mixed methylation sample is not. As seen in the electropherograms in Fig. 1, a homogeneous template, such as that derived from universally methylated genomic DNA, produces a clean sequence (upper trace), whereas templates derived from heterogeneously methylated samples, such as the female genomic DNA used here, yield electropherograms that display several characteristics that confound quantitative analysis (lower trace). The Big Dyes on the C and T terminators have different quantum efficiencies, resulting in different peak intensities. Also, the C and T terminators can be incorporated at different efficiencies. These two effects can result in peak heights that differ from the expected ratio. In addition, several researchers have reported PCR bias resulting in nonuniform amplification of a mixed methylation sample [16,18]. Whereas PCR bias appears to be strong for some amplicons and weak for others, and there are several reported remedies, any approach to methylation analysis that relies on an upfront PCR amplification step must be carefully tested for PCR bias. Peak broadening and splitting become apparent near the 3' end of the mixed methylation sequence trace (lower trace of Fig. 1) due to the sequence-dependent differences in mobility of the sequencing product. Boyd et al. [18] demonstrated that, in POP-4 polymer, each C/T difference results in approximately a 0.1 nucleotide difference in mobility. Over the 22 CpG sites in the FMR1 amplicon, methylated and unmethylated sequencing products have considerably different mobilities, resulting in the observed peak splitting.
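Reading methylation calls from a single, unmixed bisulfite read follows directly from the rules used above: a CpG cytosine that still reads C was methylated, one that reads T was not, and any non-CpG cytosine that still reads C signals incomplete conversion. A minimal sketch is shown below. The function name and the toy sequences are invented for illustration (they are not the FMR1 amplicon), and the sketch assumes the read is already aligned to an unconverted top-strand reference of equal length.

```python
def call_methylation(reference, read):
    """Compare a bisulfite-converted top-strand read with the unconverted reference.

    Returns (cpg_calls, unconverted_cph):
      cpg_calls        list of (position, True/False) for every CpG cytosine;
                       True means the read still shows C (methylated), and any
                       other base is treated as unmethylated.
      unconverted_cph  positions of non-CpG cytosines still read as C,
                       i.e. evidence of incomplete bisulfite conversion.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    cpg_calls, unconverted_cph = [], []
    for i, base in enumerate(reference):
        if base != "C":
            continue
        is_cpg = i + 1 < len(reference) and reference[i + 1] == "G"
        if is_cpg:
            cpg_calls.append((i, read[i] == "C"))
        elif read[i] == "C":
            unconverted_cph.append(i)
    return cpg_calls, unconverted_cph


# Toy sequences, invented for illustration.
reference = "ACGTCCGATCGA"
read = "ACGTTTGATCGA"
calls, failures = call_methylation(reference, read)
print(calls)     # per-CpG methylation calls for this one molecule
print(failures)  # empty when every non-CpG cytosine was converted
```

The list of per-CpG calls for one molecule is what the paper later calls a methylation haplotype. A mixed C/T position, as seen when a pool of molecules is sequenced directly, cannot be represented by this simple true/false call.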
Cloning and sequencing

Cloning and sequencing of PCR products from bisulfite-converted DNA is a widely used technique for methylation analysis. This approach eliminates the problems of peak splitting and peak quantitation. In addition, it yields the methylation state of each CpG for each molecule cloned, rather than an average methylation percentage at each CpG. Siegmund and Laird [28] proposed the term "methylation pattern" to describe analysis of multiple-linked CpGs by such methods as methylation-specific PCR, MethyLight, and bisulfite-cloned sequencing. In contrast to methylation-specific PCR and MethyLight, which provide a composite answer based on a pool of molecules, cloning and sequencing (and smPCR sequencing; see below) provide molecule-by-molecule information. Therefore, we introduce the term "methylation haplotype" to describe the status of a group of CpGs on an individual DNA molecule. Methylation haplotypes can unravel epigenetic phenomena not easily detected by other DNA methylation analysis methods. For example, methylation haplotypes clearly reveal the bimodal distribution of methylation patterns in X-inactivated genes or the linkage between one allele and a particular methylation pattern.

Methylation haplotypes were determined by cloning the same amplicon from bisulfite-converted CEPH female human genomic DNA used in Fig. 1 into the vector pCR 4-TOPO; 52 plasmid clones were isolated, purified, and sequenced. Of these, 2 plasmids gave low-quality values and were removed from the analysis. The remaining 50 sequences could be grouped into two sets shown in Table 1: completely unmethylated (14 clones; 28%) and completely
or partially methylated (36 clones; 72%). Of these 36 clones, 10 were completely methylated at all 22 CpG positions and 26 were partially methylated, containing 1–5 nonmethylated CpG sites. Our bacterial cloning results show a bias against AT-rich clones, resulting in only 28% unmethylated sequences recovered rather than the expected 50%. A previous study of this region [29] reported a larger than expected number of AT-rich clones, which may reflect the PCR bias in favor of AT-rich amplicons [16,18]. Our cloning results may reflect a reported cloning bias against AT-rich clones [10,21,30,31]. The bias against ligation and propagation of AT-rich sequences may vary from one cloning system to another. Several deviations from the expected sequences, including three clones that show evidence of incomplete bisulfite conversion and several mutations, were observed (Table 2). Excluding the incomplete bisulfite conversion, PCR misincorporation (mutation) is well documented and is the most likely cause of the remaining mutations observed.

Benefits of single-molecule PCR

Given the above results, we hypothesized that smPCR amplification and sequencing would produce less-biased results by eliminating both bacterial cloning bias and PCR bias. PCR bias is eliminated because there is no competition between template sequences of higher and lower amplification efficiency when there is only a single template molecule in the PCR. The reaction can be cycled until even the less-efficiently amplified products accumulate to the point that they can be easily sequenced. In this work, 45 cycles were found to amplify all smPCR products to plateau phase. Since the single-molecule amplicon can be sequenced directly, no bacterial cloning is required.

Screening PCR primers for primer dimer

A requirement for successful single-molecule PCR is a very low occurrence of non-template-dependent amplification or primer dimer.
Primer dimer formation can be greatly reduced by using a hot-start technique [32]. However, even with hot-start, the challenge of primer-dimer-free bisulfite-specific PCR is somewhat greater than that for normal PCR. Successful primer design is more challenging because the bisulfite-converted DNA has lower complexity, the primers are often AT-rich and therefore need to be longer to bring them up to an acceptable Tm, and the design is further constrained by having to anneal outside of CpG sites. Nonetheless, screening for primer dimer is as simple as running a no-template control reaction. We screened 12 primer sets for their propensity to form amplification products in the absence of template. These primer sets are specific for bisulfite-converted DNA, regardless of methylation status, and contain M13 tails [18]. As shown in Table 3, four primer sets showed no template-independent amplification: RasSF, FMR1, p16, and MLH1. When 3 ng of bisulfite-converted genomic DNA (approximately 1000 copies) was used as template, all primer sets generated amplification, as expected, with threshold cycles ranging from 24 to 29. These four primer sets were deemed suitable for smPCR because they did not form primer dimer yet could amplify the target when supplied. In contrast, a primer-dimer-forming primer set such as that for SRBC, where NTC reactions generate Ct values of 27, was found unsuitable for smPCR because the primer dimer would out-amplify a single template molecule. However, even for a "well-behaved" primer set, such as FMR1, the occasional primer dimer is observed when large numbers of NTC reactions are performed (see below). These products are smaller than the real amplicon and their sequences are a combination of the forward and reverse primer sequences (data not shown).
Table 1 Bisulfite sequencing from 50 plasmid clones of the FMR1 amplicon from female human DNA results in biased overrepresentation of methylated molecules
Therefore they do not represent genomic DNA contamination or amplicon carryover contamination; rather, they are de novo artifacts.

smPCR sequencing of FMR1

The same bisulfite-converted CEPH female genomic DNA used above was diluted to 0.2 copy per well based on initial quantification by titration (see Materials and methods) for smPCR with the FMR1 primer set in 20-µl SYBR Green PCRs. The SYBR Green fluorescence of a 384-well plate of reactions can be monitored in real time during PCR and during post-PCR melting curve analysis. Wells with amplification product can also be identified by using a fluorescent plate reader or a UV transilluminator as shown in Fig. 2. SYBR Green becomes fluorescent in the presence of PCR primers at room temperature, but this fluorescence decreases rapidly as the temperature is increased. To reduce the background fluorescence of the SYBR Green binding to primer, the plate was incubated at 70 °C for 2 min prior to UV imaging. Forty-eight NTC reactions were run at the same time in the bottom two rows of the plate. Six (12.5%) of the 48 NTC reactions
Table 2
Unexpected base changes observed in FMR1 plasmid clones

Clone: 6, 22, 5, 38, 44, 25, 16, 29, 28
Base change (a): 57T>C (c), 73A>G, 126A>G, 126A>G, 126A>G, 129G>A (d), 11T>C (d), 47T>C (d), 76T>C (d), 5T>C (d), 9T>C (d), 10T>C (d), 20T>C (d), 72T>C (d), 76T>C (d), 77T>C (d), 101T>C (d), 102T>C (d), 132T>C (d)
QV (b): 50, 50, 50, 50, 48, 35, 50, 50, 50, 50, 50, 50, 35, 50, 50, 50, 50, 50, 50

a First base of nonprimer amplicon is base number 1.
b Quality value.
c T in genomic sequence and not a result of incomplete bisulfite conversion.
d Evidence of incomplete bisulfite conversion.
Table 3
Results from real-time PCR screening identify primer sets that do not form primer dimer

Primer (a)    Ct, NTC (b)          Ct, gDNA (c)
BRCA          Und. (d)   38        29    29
SRBC          27         27        24    24
TIMP          36         37        28    27
CDH1          28         29        26    26
MYOD1         26         26        25    25
RasSF         Und.       Und.      25    26
FMR1          Und.       Und.      26    25
MGMT          35         33        25    25
APC           28         28        24    24
p16           Und.       Und.      25    25
ER            31         Und.      28    27
MLH1          Und.       Und.      28    28

a Primer sequences are listed in [18].
b Two replicate no-template-control reactions.
c Two replicate reactions containing 3 ng bisulfite-converted universally methylated genomic DNA.
d Undetermined, i.e., no amplification seen after 45 cycles.
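The screening criterion applied to Table 3 can be expressed as a short sketch: a primer set passes only if no no-template replicate amplifies. The Ct values below are transcribed from Table 3, with `None` standing in for "Und."; the dictionary and function names are hypothetical.

```python
# NTC Ct values transcribed from Table 3; None stands for "Und." (no amplification).
NTC_CT = {
    "BRCA": (None, 38), "SRBC": (27, 27), "TIMP": (36, 37), "CDH1": (28, 29),
    "MYOD1": (26, 26), "RasSF": (None, None), "FMR1": (None, None),
    "MGMT": (35, 33), "APC": (28, 28), "p16": (None, None),
    "ER": (31, None), "MLH1": (None, None),
}


def primer_dimer_free(ntc_cts):
    """A primer set passes only if every no-template replicate is undetermined."""
    return all(ct is None for ct in ntc_cts)


suitable = [name for name, cts in NTC_CT.items() if primer_dimer_free(cts)]
print(suitable)
```

Requiring every replicate to be undetermined is deliberately strict: BRCA and ER each have one clean replicate but still fail, which matches the four primer sets the text reports as suitable.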
showed amplification. In contrast, 133 (39.6%) of 336 wells containing dilute bisulfite-converted DNA showed amplification. From these template-containing reactions, 96 SYBR Green-positive amplicons were sequenced as described under Materials and methods. As expected from the NTC reaction results, 41 sequences were primer dimer. Eight sequences had low-quality values. Four sequence traces showed one or more mixed bases at a CpG, indicating that these PCR wells contained multiple templates. One sequence did not match FMR1, was longer than a primer dimer, and was discarded. The remaining 42 sequences were high-quality FMR1 amplicon sequences and the methylation state of each CpG could be easily read (Fig. 3). As done for the plasmid clones, the smPCR sequences were grouped into two sets as shown in Table 4: completely unmethylated (22 clones; 52%) and completely or partially methylated (20 clones; 48%). The observed ratio of 52%
Fig. 2. Fluorescence image of smPCR plate reveals single-molecule amplicons. The top 14 rows started with 0.2 copy/well of bisulfite-converted gDNA template. Fluorescent wells indicate the presence of either FMR1 amplicon or primer dimer. The bottom 2 rows contain no template. The fluorescent wells indicate the presence of primer dimer.
unmethylated to 48% methylated was very close to the expected 50:50 ratio. Of the 20 methylated clones, 5 were completely methylated at all 22 CpG positions and 15 were partially methylated, containing 1–3 nonmethylated CpG sites. In contrast to our PCR cloning data, which contained many mutations, only a single sequence contained an unexpected sequence difference, 34A>G in clone C8.

Several lines of evidence suggest that we have achieved single-molecule PCR. First, the dilution series behaves as expected by transitioning from all wells positive to some wells positive to no wells positive for amplification upon further dilution of the template. Second, this transition occurs approximately where it is expected to based on the concentration of the template as measured by traditional UV absorbance. Third, each amplified product produced a discrete sequence, and several sequence types were observed (fully methylated, partially methylated, etc.). Fourth, as predicted from the Poisson distribution of zero, one, and sometimes more than one molecule per well, several examples of mixed sequences that presumably came from the coamplification of two or more molecules were observed. Fig. 4 shows the expected distribution of single and multiple molecules per well as a function of the fraction of empty wells. Note that the fraction of wells containing a single molecule never exceeds 37% and that, as the concentration of molecules increases (that is, the number of observed negative wells decreases), the number of wells with more than one molecule increases. In our experiment, four mixed-template sequences were identified. These presumably represent the presence of one methylated and one unmethylated template molecule in the PCR. It is also possible that these mixed sequences could arise from two nonidentical partially methylated molecules. Because we discarded mixed-template sequences from our analysis, we may underreport the number of partially methylated molecules.
It is also possible that some wells could contain two completely nonmethylated templates or two completely methylated templates, but these cannot be detected because both molecules have the same sequence. Of 336 wells, 133 amplified and 203 did not, yielding a calculated fraction of nonamplified wells of 0.604. However, from our sequencing of 96 positive wells, we know that only 46 contained real amplicon (42 analyzed sequences plus four multiple templates). Applying this correction to the 133 amplified
Fig. 3. Single-molecule PCR sequencing of FMR1 reveals methylation haplotypes unambiguously. For clarity, only the regions including CpGs 7–13 from 3 of the 42 single molecules analyzed in Table 4 are shown. Arrows point to CpG 10, which is not methylated (TG in the sequence) in the unmethylated molecule C1 (top) and the partially methylated molecule J1 (middle) but is methylated (CG in the sequence) in the completely methylated molecule E14 (bottom).
wells, we calculate that only 64 wells contained real amplicon, leaving 272 nonamplified wells of 336, for a nonamplified fraction of 0.810. The true fraction may lie somewhere between these values because it may be the case that some wells that contained a template molecule could also form primer dimer which would out-compete the template for amplification. For a nonamplified fraction between 0.604 and 0.810, the Poisson distribution predicts the fraction of multiple template wells to be between 0.093 and 0.019. In our sample of 96 positives, this represents between 9 and 22 wells that contained more than one template. This is in general agreement with our observed four mixed templates, given that we cannot detect the presence of two templates with the same methylation pattern.
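The Poisson estimate of multiple-template wells can be reproduced approximately with the sketch below. It is a reading of the calculation rather than the authors' own code: the function names are hypothetical, and dividing P(>1) by the fraction of template-positive wells (1 − P(0)) before scaling to 96 sequenced wells is an assumption about how the per-well fractions were converted into well counts.

```python
import math


def poisson_well_fractions(p0):
    """Return (P(1), P(>1)) per well given the fraction of empty wells P(0)."""
    lam = -math.log(p0)
    p1 = lam * p0  # lambda * exp(-lambda), since exp(-lambda) = p0
    return p1, 1.0 - p0 - p1


def expected_multi_among_positives(p0, positives_sequenced):
    """Expected number of sequenced template-positive wells holding >1 molecule."""
    _, p_multi = poisson_well_fractions(p0)
    return positives_sequenced * p_multi / (1.0 - p0)


# The two bounding nonamplified fractions discussed in the text.
for p0 in (203 / 336, 272 / 336):
    p1, p_multi = poisson_well_fractions(p0)
    n_multi = expected_multi_among_positives(p0, 96)
    print(f"P(0)={p0:.3f}  P(1)={p1:.3f}  P(>1)={p_multi:.3f}  "
          f"expected multi-template wells among 96 positives: {n_multi:.1f}")
```

Under these assumptions the two bounds give roughly 22 and roughly 10 multi-template wells among 96 positives, in line with the range quoted above.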
Conclusion

Our results show that it is possible to determine methylation haplotypes, the precise status of a group of CpGs on a molecule-by-molecule basis, by PCR amplification from individual bisulfite-converted genomic DNA molecules and that, by doing so, PCR and cloning biases are eliminated. The technique depends on a bisulfite-specific primer set that forms primer dimer at a low rate. The observed rate of primer dimer formation based on NTC reactions for the FMR1 primer set used here is 12.5%. However, this number is based on only six primer dimer events and carries a large measure of uncertainty. From the sequencing of SYBR Green positive wells we can also calculate a primer dimer rate: 39.6% of wells were positive and 41.7% (40/96) of these
Table 4 Bisulfite sequencing from single-molecule PCR of the FMR1 amplicon from female human DNA results in the expected unbiased distribution of methylated and unmethylated molecules
were primer dimer, so the observed primer dimer formation rate is 16.5% (39.6% × 41.7%) of all wells. Ongoing work is centered on ways to reduce the fraction of wells lost to primer dimer formation. Single-molecule PCR has several advantages besides the elimination of PCR bias, including the elimination of jumping PCR artifacts, the mitigation of jackpot mutations, and the ability to easily detect rare sequence variants [24].

Next-generation sequencing platforms are providing new tools for DNA methylation analysis, such as ultradeep sequencing of bisulfite amplicons [33]. Ultradeep amplicon sequencing has the potential to generate very large numbers of sequences and a quantitative measure of methylation patterns. However, as mentioned above, PCR amplification bias can still affect these results. The use of molecular barcodes as described by Miner et al. [34] should be considered a way to detect redundancy and contamination in ultradeep resequencing experiments. This is especially true when starting with limited template because of the increased risks of redundancy and contamination. In addition, barcodes can be used to detect the presence of more than one amplicon molecule in the emulsion PCR reactor in a manner analogous to the method described by Ohuchi et al. [35]. Other methods, including reduced-representation bisulfite sequencing [36] and methylated DNA capture approaches such as MeDIP [37] and MIRA [38], have been or are expected to soon be combined with high-throughput sequencing in pursuit of DNA methylation analysis. Methods in which the original sample molecule is amplified and directly analyzed, as shown in this paper, overcome PCR bias, jumping PCR, jackpot mutations, and cloning bias to provide accurate results.
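The uncertainty attached to a rate based on six events, and the strength of the departure from 50:50 in the two data sets, can be gauged with a short calculation. The sketch below is an illustration and not an analysis reported in the paper: it uses a Wilson score interval for the 6/48 primer dimer rate and an exact two-sided binomial test for the methylated counts (36 of 50 plasmid clones, 20 of 42 smPCR sequences) against an assumed 50:50 expectation. The function names are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9)))


low, high = wilson_interval(6, 48)
print(f"primer dimer rate 6/48: 95% interval {low:.3f} to {high:.3f}")
print(f"cloning, 36 methylated of 50: p = {binom_two_sided_p(36, 50):.4f}")
print(f"smPCR, 20 methylated of 42:   p = {binom_two_sided_p(20, 42):.4f}")
```

The interval for the NTC-based rate is wide enough to include the 16.5% figure derived from sequencing, and only the cloning ratio is difficult to reconcile with a 50:50 input under this simple test.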
[Figure 4: plot of fraction of wells (y axis, 0.0 to 1.0) against fraction of wells with no template (x axis, 0.0 to 1.0), with curves for "One molecule" and "More than one molecule".]
Fig. 4. Predicted single-molecule and multiple-molecule frequencies vs fraction of empty wells. Graph shows the relationship between the observed fraction of wells with no template on the x axis and the expected fraction of wells on the y axis that contain one (solid line) or more than one (dashed line) template molecule based on the Poisson distribution.
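The Poisson relationship described in the Fig. 4 legend can be reproduced numerically. The following is a minimal sketch, assuming template molecules are distributed independently among wells so that the number per well follows a Poisson distribution with mean λ. Under that assumption the fraction of empty wells is P(0) = exp(-λ), so λ can be recovered from the observed empty fraction. The function name `well_fractions` and its argument name are illustrative and do not come from the paper.

```python
import math


def well_fractions(p_empty):
    """Return (fraction with exactly one template, fraction with more than one).

    p_empty: observed fraction of wells with no template, in (0, 1].
    """
    if not 0.0 < p_empty <= 1.0:
        raise ValueError("p_empty must be in (0, 1]")
    lam = -math.log(p_empty)         # mean templates per well, from P(0) = exp(-lam)
    p_one = lam * p_empty            # P(1) = lam * exp(-lam)
    p_multi = 1.0 - p_empty - p_one  # P(2 or more) = 1 - P(0) - P(1)
    return p_one, p_multi


# Tabulate the two curves for a few empty-well fractions.
for p_empty in (0.2, 0.4, 0.6, 0.8):
    p_one, p_multi = well_fractions(p_empty)
    print(f"empty={p_empty:.1f}  one={p_one:.3f}  more_than_one={p_multi:.3f}")
```

Under this model the single-molecule fraction is largest when about 37% of wells are empty (λ = 1), while the share of positive wells that hold more than one template keeps falling as the empty fraction rises, which is the trade-off the figure illustrates.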
Acknowledgments

We thank Mary Margaret Jotz for help with tables, figures, and careful reading of the manuscript and Linda Lee and Gerald Zon for their thoughtful discussions and continuous support of this work.

References

[1] H.D. Morgan, F. Santos, K. Green, W. Dean, W. Reik, Epigenetic reprogramming in mammals, Hum. Mol. Genet. 14 (2005) R47–R58.
[2] T. Sado, A.C. Ferguson-Smith, Imprinted X inactivation and reprogramming in the preimplantation mouse embryo, Hum. Mol. Genet. 14 (2005) R59–R64.
[3] J.-P. Issa, Age-related epigenetic changes and the immune system, Clin. Immunol. 109 (2003) 103–108.
[4] M.F. Fraga, E. Ballestar, M.F. Paz, S. Ropero, F. Setien, M.L. Ballestar, D. Heine-Suñer, J.C. Cigudosa, M. Urioste, J. Benitez, M. Boix-Chornet, A. Sanchez-Aguilera, C. Ling, E. Carlsson, P. Poulsen, A. Vaag, Z. Stephan, T.D. Spector, Y.-Z. Wu, C. Plass, M. Esteller, Epigenetic differences arise during the lifetime of monozygotic twins, Proc. Natl. Acad. Sci. USA 102 (2005) 10604–10609.
[5] J.F. Costello, C. Plass, Methylation matters, J. Med. Genet. 38 (2001) 285–303.
[6] K.D. Robertson, DNA methylation and human disease, Nat. Rev. Genet. 6 (2005) 597–610.
[7] P.W. Laird, Cancer epigenetics, Hum. Mol. Genet. 14 (2005) R65–R76.
[8] M. Esteller, Cancer epigenomics: DNA methylomes and histone-modification maps, Nat. Rev. Genet. 8 (2007) 286–298.
[9] B. Richardson, DNA methylation and autoimmune disease, Clin. Immunol. 109 (2003) 72–79.
[10] C. Grunau, S.J. Clark, A. Rosenthal, Bisulfite genomic sequencing: systematic investigation of critical experimental parameters, Nucleic Acids Res. 29 (2001) e65.
[11] V.L. Boyd, G. Zon, Bisulfite conversion of genomic DNA for methylation analysis: protocol simplification with higher recovery applicable to limited samples and increased throughput, Anal. Biochem. 326 (2004) 278–280.
[12] S.J. Clark, A. Statham, C. Stirzaker, P.L. Molloy, M. Frommer, DNA methylation: bisulphite modification and analysis, Nat. Protoc. 1 (2006) 2353–2364.
[13] M. Frommer, L.E. McDonald, D.S. Millar, C.M. Collis, F. Watt, G.W. Grigg, P.L. Molloy, C.L. Paul, A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands, Proc. Natl. Acad. Sci. USA 89 (1992) 1827–1831.
[14] J. Lewin, A.O. Schmitt, P. Adorján, T. Hildmann, C. Piepenbrock, Quantitative DNA methylation analysis based on four-dye trace data from direct sequencing of PCR amplificates, Bioinformatics 20 (2004) 3005–3012.
[15] K. Brakensiek, L.U. Wingen, F. Länger, H. Kreipe, U. Lehmann, Quantitative high-resolution CpG island mapping with Pyrosequencing™ reveals disease-specific methylation patterns of the CDKN2B gene in myelodysplastic syndrome and myeloid leukemia, Clin. Chem. 53 (2007) 17–23.
[16] P.M. Warnecke, C. Stirzaker, J.R. Melki, D.S. Millar, C.L. Paul, S.J. Clark, Detection and measurement of PCR bias in quantitative methylation analysis of bisulphite-treated DNA, Nucleic Acids Res. 25 (1997) 4422–4426.
[17] K.O. Voss, K.P. Roos, R.L. Nonay, N.J. Dovichi, Combating PCR bias in bisulfite-based cytosine methylation analysis. Betaine-modified cytosine deamination PCR, Anal. Chem. 70 (1998) 3818–3823.
[18] V.L. Boyd, K.I. Moody, A.E. Karger, K.J. Livak, G. Zon, J. Burns, Methylation-dependent fragment separation: direct detection of DNA methylation by capillary electrophoresis of PCR products from bisulfite-converted genomic DNA, Anal. Biochem. 354 (2006) 266–273.
[19] T.K. Wojdacz, L.L. Hansen, Reversal of PCR bias for improved sensitivity of the DNA methylation melting curve assay, BioTechniques 41 (2006) 274–278.
[20] L. Shen, Y. Guo, X. Chen, S. Ahmed, J.-P. Issa, Optimizing annealing temperature overcomes bias in bisulfite PCR methylation analysis, BioTechniques 42 (2007) 48–58.
[21] P.M. Warnecke, C. Stirzaker, J. Song, C. Grunau, J.R. Melki, S.J. Clark, Identification and resolution of artifacts in bisulfite sequencing, Methods 27 (2002) 101–107.
[22] J. Brohede, N. Arnheim, H. Ellegren, Single-molecule analysis of the hypermutable tetranucleotide repeat locus D21S1245 through sperm genotyping: a heterogeneous pattern of mutation but no clear male age effect, Mol. Biol. Evol. 21 (2004) 58–64.
[23] B. Vogelstein, K.W. Kinzler, Digital PCR, Proc. Natl. Acad. Sci. USA 96 (1999) 9236–9241.
[24] Y. Kraytsberg, K. Khrapko, Single-molecule PCR: an artifact-free PCR approach for the analysis of somatic mutations, Expert Rev. Mol. Diagn. 5 (2005) 809–815.
[25] D.R. Bentley, Whole-genome re-sequencing, Curr. Opin. Genet. Dev. 16 (2006) 545–552.
[26] S. Rungpragayphan, T. Yamane, H. Nakano, SIMPLEX: single-molecule PCR-linked in vitro expression: a novel method for high-throughput construction and screening of protein libraries, Methods Mol. Biol. 375 (2007) 79–94.
[27] V.L. Boyd, B.G. Schroeder, L.G. Lee, in: H.P. Neumann (Ed.), Progress in DNA methylation research, Nova Science Publishers, Hauppauge, 2007, pp. 91–116.
[28] K.D. Siegmund, P.W. Laird, Analysis of complex methylation data, Methods 27 (2002) 170–178.
[29] R. Stöger, T.M. Kajimura, W.T. Brown, C.D. Laird, Epigenetic variation illustrated by DNA methylation patterns of the fragile-X gene FMR1, Hum. Mol. Genet. 6 (1997) 1791–1801.
[30] R. Godiska, M. Patterson, T. Schoenfeld, D.A. Mead, in: J. Kieleczawa (Ed.), DNA sequencing: optimizing the process and analysis, Jones and Bartlett, Sudbury, 2005, pp. 55–75.
[31] S.M.D. Goldberg, J. Johnson, D. Busam, T. Feldblyum, S. Ferriera, R. Friedman, A. Halpern, H. Khouri, S.A. Kravitz, F.M. Lauro, K. Li, Y.-H. Rogers, R. Strausberg, G. Sutton, L. Tallon, T. Thomas, E. Venter, M. Frazier, J.C. Venter, A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes, Proc. Natl. Acad. Sci. USA 103 (2006) 11240–11245.
[32] Q. Chou, M. Russell, D.E. Birch, J. Raymond, W. Bloch, Prevention of pre-PCR mis-priming and primer dimerization improves low-copy-number amplifications, Nucleic Acids Res. 20 (1992) 1717–1723.
[33] K.H. Taylor, R.S. Kramer, J.W. Davis, J. Guo, D.J. Duff, D. Xu, C.W. Caldwell, H. Shi, Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing, Cancer Res. 67 (2007) 8511–8518.
[34] B.E. Miner, R.J. Stöger, A.F. Burden, C.D. Laird, R.S. Hansen, Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR, Nucleic Acids Res. 32 (2004) e135.
[35] S. Ohuchi, H. Nakano, T. Yamane, In vitro method for the generation of protein libraries using PCR amplification of a single DNA molecule and coupled transcription/translation, Nucleic Acids Res. 26 (1998) 4339–4346.
[36] A. Meissner, A. Gnirke, G.W. Bell, B. Ramsahoye, E.S. Lander, R. Jaenisch, Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis, Nucleic Acids Res. 33 (2005) 5868–5877.
[37] M. Weber, J.J. Davies, D. Wittig, E.J. Oakeley, M. Haase, W.L. Lam, D. Schübeler, Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells, Nat. Genet. 37 (2005) 853–862.
[38] T. Rauch, G.P. Pfeifer, Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer, Lab. Invest. 85 (2005) 1172–1180.