Available online at www.sciencedirect.com
ScienceDirect Journal of Genetics and Genomics 42 (2015) 151–159
JGG ORIGINAL RESEARCH
The Performance of Whole Genome Amplification Methods and Next-Generation Sequencing for Pre-Implantation Genetic Diagnosis of Chromosomal Abnormalities
Na Li a,b,1, Li Wang c,1, Hui Wang c, Minyue Ma c, Xiaohong Wang a, Yi Li a, Wenke Zhang c, Jianguang Zhang d, David S. Cram d,*, Yuanqing Yao c,*
a Department of Gynecology and Obstetrics, Reproductive Medicine Center, Tangdu Hospital, Fourth Military Medical University, Xi’an 710038, China
b Department of Obstetrics and Gynecology, Affiliated Hospital of Academy of Military Medical Sciences, Beijing 100071, China
c Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing 100853, China
d Berry Genomics Co., Limited, Beijing 100015, China
Received 20 October 2014; revised 6 March 2015; accepted 8 March 2015
Available online 14 March 2015
ABSTRACT
Reliable and accurate pre-implantation genetic diagnosis (PGD) of a patient’s embryos by next-generation sequencing (NGS) is dependent on efficient whole genome amplification (WGA) of a representative biopsy sample. However, the performance of the current state-of-the-art WGA methods has not been evaluated for sequencing. Using low template DNA (15 pg) and single cells, we showed that the two PCR-based WGA systems SurePlex and MALBAC are superior to the REPLI-g multiple displacement amplification (MDA) system in terms of consistent and reproducible genome coverage and sequence bias across the 24 chromosomes, allowing better normalization of test to reference sequencing data. When copy number variation sequencing (CNV-Seq) was applied to single cell WGA products derived by either SurePlex or MALBAC amplification, we showed that known disease CNVs in the range of 3–15 Mb could be reliably and accurately detected at the correct genomic positions. These findings indicate that our CNV-Seq pipeline incorporating either SurePlex or MALBAC as the key initial WGA step is a powerful methodology for clinical PGD to identify euploid embryos in a patient’s cohort for uterine transfer.
KEYWORDS: Single cells; Whole genome amplification; Next-generation sequencing; Copy number variation; Pre-implantation genetic diagnosis
* Corresponding authors. Tel/fax: +86 10 6693 9258 (Y. Yao); Tel: +86 10 5325 9188x5073, fax: +86 10 8430 6824 (D.S. Cram). E-mail addresses: [email protected] (D.S. Cram); yqyao_[email protected] (Y. Yao).
1 These two authors contributed equally to this work.
INTRODUCTION
Whole genome amplification (WGA) is a technology designed to amplify small amounts of DNA down to the single cell level and generate a representative DNA template sufficient for downstream genetic analysis using conventional molecular techniques. A variety of WGA methodologies have evolved
but are essentially based on principles of primer extension PCR (PEP) (Zhang et al., 1992), degenerate oligonucleotide primed PCR (DOP-PCR) (Telenius et al., 1992) or multiple displacement amplification (MDA) (Dean et al., 2002). Applications of WGA are widespread in clinical practice, including genetic typing of circulating tumor cells (Czyz et al., 2014), identification of forensic samples (Cai et al., 2010), and chromosome and single gene analysis of embryos from patients undergoing assisted reproduction and pre-implantation genetic diagnosis (PGD) (Zhang et al., 1992). In the field of PGD, array-based comparative genomic hybridization (CGH) and single-nucleotide polymorphism (SNP) arrays are widely used to discriminate between euploid and
http://dx.doi.org/10.1016/j.jgg.2015.03.001 1673-8527/Copyright © 2015, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Limited and Science Press. All rights reserved.
aneuploid embryos in the cohort produced by assisted reproductive technologies (Munne, 2012; Handyside, 2013). More recently, several next-generation sequencing (NGS) protocols have been developed and validated (Hou et al., 2013; Fiorentino et al., 2014; Huang et al., 2014; Wang et al., 2014b, 2014c; Wells et al., 2014). Both array and NGS methodologies rely on a prior WGA step to amplify the small amount of DNA in an embryo biopsy sample with high fidelity, in order to generate sufficient embryo DNA for either hybridization or sequencing. Currently, the WGA kits that are used in PGD are commercially available (de Bourcy et al., 2014), and have been specifically optimized for single cell application. However, no WGA method has been reported with the capacity to replicate the entire single cell genome (Zheng et al., 2011). Further, analysis of the sequences amplified by WGA shows a substantial bias due to the cumulative effects of allelic dropout or preferential allelic amplification (Findlay et al., 1995; Zheng et al., 2011; Binder et al., 2014). For NGS-based technologies aimed at comprehensive PGD of the full spectrum of chromosomal abnormalities afflicting human embryos (Vanneste et al., 2009; Voet et al., 2011), the WGA reaction therefore remains the most critical step determining the overall diagnostic performance of both array and sequencing based methods.
In this study, we examined the performance of two PCR-based methods, SurePlex (Fiorentino et al., 2011; Shen et al., 2013) and MALBAC (Zong et al., 2012), as well as the multiple displacement amplification (MDA)-based method REPLI-g (Treff et al., 2011), for genome coverage and bias, and their potential as a starting template for detection of copy number variation (CNV) using our recently described NGS method called CNV-Seq (Liang et al., 2014).
Here, we showed that both PCR-based WGA methods, used in conjunction with CNV-Seq, are highly effective for identification, delineation and quantitation of small CNVs in single cells.
RESULTS
Evaluation of WGA methods for genome coverage of low template DNA
Three commonly used WGA methods based on either PCR (SurePlex, MALBAC) or MDA (REPLI-g) were evaluated with the aim of determining which methodology can provide the most uniform and reproducible coverage of the 24 chromosomes in a single cell for CNV analysis. For initial experimentation, we used 15 pg genomic DNA samples, which approximate the chromosomal DNA content of a single cell (6.6 pg of DNA) once the random loss of DNA due to the effects of limiting dilution is taken into account. Replicate 46,XX genomic DNA samples (15 pg) were prepared and subjected to WGA using the three different methods. Agarose gel electrophoresis was used to determine the yield and size range of the WGA reaction products (Fig. 1A). From five replicates, the mean yield was 4.15 ± 0.40 μg (SurePlex), 3.18 ± 0.39 μg (MALBAC) and 21.91 ± 3.50 μg (REPLI-g). The size range of the different
WGA products was 0.2‒0.8 kb (SurePlex), 0.2‒2.0 kb (MALBAC) and >10 kb (REPLI-g), respectively, and the patterns were consistent with previous results from single cells (Treff et al., 2011; Zong et al., 2012; Wang et al., 2014c).
To evaluate the genome coverage of the different WGA systems in more depth, we first performed WGA reactions from 15 pg samples, constructed libraries from 50 ng of the WGA products and 50 ng of unamplified DNA, and then performed massively parallel sequencing and chromosome mapping of the unique reads. Density plots of the distribution of sequencing reads (100 kb sequencing bins) for all 24 chromosomes were compared between the three different WGA methods, and representative results are shown for Chr. 1, Chr. 12 and Chr. 18 (Fig. 1B; refer to Fig. S1 for remaining chromosomes). In contrast to the uniform density of sequencing reads across Chr. 1, Chr. 12 and Chr. 18 for non-amplified genomic DNA, the patterns for amplified low template genomic DNA were non-uniform, with a significant number of regions with high read numbers (peaks) and low read numbers (troughs). Nonetheless, the overall density profiles were relatively similar, regardless of the WGA method, and this held true for the other chromosomes (Fig. S1). As judged by the distribution and fluctuation of the peaks and troughs, both PCR-based WGA methods exhibited more uniformity and potentially less regional amplification bias than the MDA-based WGA system.
The two PCR-based WGA systems were further evaluated for reproducing consistent chromosomal read density profiles. This is of particular importance for the application of NGS-based technologies for CNV detection, since a high degree of reproducibility would allow the application of normalization algorithms and thus more meaningful comparisons of test to reference samples. Three replicate WGA products derived from 15 pg of genomic DNA were sequenced and density plots for Chr. 1, Chr. 12 and Chr. 18 were analyzed (Fig. 2).
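To make the density metric concrete, the following is a minimal sketch, assuming per-bin read counts are already available as a plain list. It expresses each bin as a percentage of all mapped reads, as in the density plots, and summarizes unevenness with the coefficient of variation. The function names and the toy counts are illustrative assumptions and are not part of the CNV-Seq pipeline.

```python
from statistics import mean, pstdev

def read_density_percent(bin_counts):
    """Return each bin's share of all reads, in percent."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    return [100.0 * c / total for c in bin_counts]

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0 = perfectly uniform)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero")
    return pstdev(values) / m

# Toy counts for five bins (hypothetical numbers, for illustration only).
uniform_counts = [100, 100, 100, 100, 100]
biased_counts = [20, 250, 60, 150, 20]

uniform_density = read_density_percent(uniform_counts)
biased_density = read_density_percent(biased_counts)

print("uniform CV:", round(coefficient_of_variation(uniform_density), 3))
print("biased CV:", round(coefficient_of_variation(biased_density), 3))
```

A lower coefficient of variation corresponds to a flatter density profile; comparing the same statistic across replicates would be one simple way to quantify the reproducibility judged visually here.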
Replicate sequencing read density distributions across these three example chromosomes, and the remaining chromosomes, were again very similar for both SurePlex and MALBAC methods. Thus, both PCR-based WGA systems showed reproducible chromosome-specific coverage from a low template DNA sample mimicking the DNA content of a single cell.
Sensitivity, specificity and reproducibility of CNV diagnosis from different WGA templates
The suitability of the three different WGA systems to provide a starting template for reliable and accurate chromosome analysis by CNV-Seq was assessed using 15 pg DNA samples with known clinically significant CNVs. For these experiments, we selected four genomic DNA disease models with variably sized terminal deletions or duplications (Table 1), including an unbalanced t(1,X) translocation (52.24 Mb Xp del, 92.8 Mb 1q dup), Sotos syndrome (15.4 Mb 5p dup, 6.44 Mb 5q del), Jacobsen syndrome (12.38 Mb 11q del) and Wolf-Hirschhorn syndrome (3.32 Mb 4p del). CNV-Seq analysis of SurePlex and MALBAC products detected all six CNVs with sizes and map intervals similar to those detected in matching unamplified genomic DNA
[Fig. 1 image. A: agarose gels of SurePlex, MALBAC and REPLI-g WGA products (lanes M, NC, 1‒5). B: sequencing read density (%) versus read bin (100 kb) for Chr. 1, Chr. 12 and Chr. 18, with tracks for gDNA, SurePlex, MALBAC and REPLI-g.]
Fig. 1. Analysis of SurePlex, MALBAC and REPLI-g WGA products. A: Analysis of WGA products by agarose gel electrophoresis. Lanes 1‒5, replicate 15 pg samples; NC, negative control; M, DNA molecular weight markers. The significant DNA product observed in the REPLI-g NC sample has been previously reported to result from primer-dimer formation, followed by concatemeric amplification in the absence of a natural DNA template (Treff et al., 2011). B: Density plots of sequencing reads for Chr. 1, Chr. 12 and Chr. 18 (refer to Fig. S1 for the remaining chromosomes). Chromosome plots depict the number of sequencing reads per 100 kb bin relative to all 100 kb bins (Y-axis) versus each sequential 100 kb bin (X-axis). The black bars represent the centromere separating the p and q arms and red bars represent regions of repetitive sequences. Each chromosome displayed a unique distribution of mapped sequencing reads.
samples (Table 1, Fig. 3A). No non-specific CNVs were detected in other regions of these chromosomes (Fig. 3A) or in the remaining chromosomes. In addition, for both deletion and duplication CNVs, with the exception of the 3.32 Mb 4p deletion (Wolf-Hirschhorn syndrome) identified from the MALBAC product, the actual copy number change was also very similar to the genomic DNA copy number change (Table 1). In contrast, from the REPLI-g products, CNV-Seq only identified the three larger CNVs and failed to completely detect the three smaller CNVs. Furthermore, a number of other CNVs not present in the original genomic DNA samples were additionally detected by CNV-Seq (Table 1), indicating a lack of specificity originating from the REPLI-g products.
Since the two PCR-based WGA methods applied to 15 pg of genomic DNA clearly provided a more representative template conducive to sensitive and specific detection of CNVs, we further evaluated the reproducibility of SurePlex and
MALBAC using the genomic DNA sample with the 3.32 Mb 4p del (Wolf-Hirschhorn syndrome) as the test CNV. Three replicate chromosome profiles derived by CNV-Seq analysis of the SurePlex and MALBAC WGA products are shown in Fig. 3B. The 3.32 Mb CNV was detected from all three replicate SurePlex and MALBAC WGA products with the correct copy number. On the basis of these studies using 15 pg samples, we concluded that both SurePlex and MALBAC WGA provided high sensitivity and reproducibility for CNV detection by CNV-Seq, although the overall performance of SurePlex was judged to be marginally superior.
Validation of CNV-Seq for detection of small CNVs at the single cell level
Because 15 pg of genomic DNA only models the DNA content in a single cell, we proceeded to assess the
[Fig. 2 image. Sequencing read density (%) versus read bin (100 kb) for Chr. 1, Chr. 12 and Chr. 18; replicates 1‒3 for SurePlex and for MALBAC.]
Fig. 2. Density plots of chromosome sequencing reads derived from replicate 15 pg genomic DNA samples subjected to SurePlex and MALBAC WGA. Chromosome plots depict the number of sequencing reads per 100 kb bin relative to all 100 kb bins (Y-axis) versus each sequential 100 kb bin (X-axis). Black bars represent the centromere separating the p and q arms and red bars represent regions of repetitive sequences. Chromosome-specific distributions of mapped sequencing reads were reproducible in replicate samples.
combination of SurePlex and CNV-Seq for CNV detection on single cells directly lysed and then amplified by WGA. For these experiments, we accessed patient blood samples with small CNVs associated with Sotos syndrome (15.4 Mb 5p dup, 6.44 Mb 5q del), 1p36 microdeletion syndrome (2.34 Mb 1p del) and 3q29 microduplication syndrome (2.26 Mb 3q dup). Small numbers of lymphocytes in a small drop of PBS buffer were prepared and then single cells were transferred by micromanipulation to individual PCR tubes to create replicate test samples. High-resolution (20 kb sequencing bins) CNV-Seq profiles derived from triplicate SurePlex products are shown in Fig. 4. In all replicates, the detected size and map position of the four small CNVs were almost identical to those in the matching genomic DNA control. In addition, the mean copy numbers of 2.9 (14.76 Mb 5p dup), 1.1 (5.89 Mb 5q del), 1.2 (2.24 Mb 1p del) and 2.8 (2.26 Mb 3q dup) were close to the expected values. Taken together, these results suggest that the combination of CNV-Seq and SurePlex WGA is sensitive, specific and reproducible for detecting these small CNVs at the single cell level.
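The relation between copy number and the log2 scale of the CNV-Seq profiles can be sketched as follows, assuming the plotted value is log2 of the observed copy number divided by the normal diploid copy number of 2. The helper name and this normalization are assumptions made for illustration; the mean copy numbers are those reported above.

```python
import math

NORMAL_COPIES = 2.0  # assumed diploid baseline for autosomes

def log2_ratio(copy_number, baseline=NORMAL_COPIES):
    """log2 of copy number relative to the baseline (0 means no change)."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive")
    return math.log2(copy_number / baseline)

# Theoretical reference points: one-copy loss, normal, one-copy gain.
expected = {copies: log2_ratio(copies) for copies in (1, 2, 3)}

# Mean copy numbers reported for the single cell replicates.
observed = {
    "5p dup": 2.9,
    "5q del": 1.1,
    "1p del": 1.2,
    "3q dup": 2.8,
}

for name, cn in observed.items():
    print(f"{name}: copy number {cn}, log2 ratio {log2_ratio(cn):+.2f}")
print({copies: round(value, 2) for copies, value in expected.items()})
```

Under this assumption, a full one-copy gain sits at about +0.58 and a full one-copy loss at −1, so the scale is asymmetric: deletions move further from zero than duplications of the same copy-number step.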
DISCUSSION
NGS technologies for genetic testing of low template samples such as blastomere and trophectoderm biopsy samples from embryos produced by assisted reproductive technologies (ART) are evolving at a rapid pace, with reports of several different protocols (Hou et al., 2013; Yin et al., 2013; Fiorentino et al., 2014; Huang et al., 2014; Wang et al., 2014b; Wells et al., 2014). However, in most cases, selection of the appropriate WGA method for PGD has been arbitrary, based mainly on previous performance as a hybridization template for array based diagnosis. To optimize NGS for PGD, we evaluated three well-known systems for WGA, which is the key first step in the procedure. The main finding from the study was that PCR-based WGA of low template DNA, including 15 pg and one cell samples, provided a far superior template than MDA-based WGA for CNV-Seq to identify and correctly delineate pathogenic CNVs. The CNV-Seq method used in this study to analyze CNVs down to the single cell level is a relatively low pass sequencing
Table 1. Comparison of sensitivity and specificity of WGA for detection of known pathogenic CNV by CNV-Seq

| Chromosome disease | gDNA CNV | SurePlex CNV | MALBAC CNV | REPLI-g CNV |
|---|---|---|---|---|
| Mental retardation | 52.24 Mb Xp11.21-pter del | 52.10 Mb Xp11.22-pter del | 52.10 Mb Xp11.22-pter del | 49.20 Mb Xp11.22-pter del |
| | 92.80 Mb 1q22-qter dup | 93.30 Mb 1q22-qter dup | 92.70 Mb 1q23.1-qter dup | 93.70 Mb 1q22-qter dup |
| Sotos syndrome | 15.40 Mb 5p15.1-pter dup | 15.50 Mb 5p15.1-pter dup | 15.40 Mb 5p15.1-pter dup | 13.50 Mb 5p15.2-pter dup |
| | 6.44 Mb 5q35.2-qter del | 5.60 Mb 5q35.2-qter del | 5.90 Mb 5q35.2-qter del | Not detected |
| | | | | 4.20 Mb 18q12.3-q21.1 dup |
| | | | | 44.10 Mb Xp11.23-pter dup |
| | | | | 60.00 Mb 1p13.3-p21.2 dup |
| Jacobsen syndrome | 12.38 Mb 11q24.1-qter del | 12.00 Mb 11q24.1-qter del | 12.40 Mb 11q24.1-qter del | Not detected |
| | | | | 7.70 Mb 11q24.4-q25 del |
| | | | | 3.50 Mb 5q11.1-q11.2 del |
| | | | | 5.80 Mb 7q31.1-q31.2 del |
| Wolf-Hirschhorn syndrome | 3.32 Mb 4p16.3 del | 3.30 Mb 4p16.3 del | 2.80 Mb 4p16.3 del | Not detected |
| | | | | 5.60 Mb 5q35.2-qter dup |
| | | | | 5.40 Mb 7p11.2-p12.1 dup |
| | | | | 4.10 Mb 11p15.4-pter dup |
| | | | | 3.60 Mb 13q22.3-q31.1 dup |
| | | | | 3.00 Mb 17p11.2 dup |
| Normal | Not detected | Not detected | Not detected | 4.20 Mb 9q21.33-q22.2 dup |
False positive is shown in bold.
strategy whereby approximately three million uniquely and perfectly mapped sequencing reads are allocated to 20 kb sequencing bins along the length of each chromosome. Thus, from sample to sample, the sequence read content of each bin will be random, but the mean read number per bin should be reflective of copy number. This prediction has been exemplified in our previous studies of small CNVs at the genomic DNA level whereby CNV detection was reproducible at a resolution of 0.1 Mb and CNVs were shown to be bona fide by mate pair sequencing confirmation (Liang et al., 2014). Thus, for CNV-Seq to be both sensitive and specific at the single cell level, the main role of the WGA reaction is to provide uniform and consistent coverage of the genome from sample to sample so that sequencing bin data can be normalized to reference with minimum remaining noise. Only the two PCR-based WGA methods analyzed in this study fulfilled these criteria. However, whether the two WGA methods provide reproducible genome-wide coverage remains to be more rigorously tested in a more comprehensive study using other known pathogenic CNVs located in different regions of the genome.
The disparity between PCR- and MDA-based WGA for revealing CNVs by sequencing was an unexpected finding because MDA products have been previously used successfully to detect chromosomal abnormalities in embryos by array-based methods (Munne, 2012; Handyside, 2013). Both PCR WGA methods evaluated in this study rely on DOP-PCR using primers with degeneracy at the 3′ end (Telenius et al., 1992). Thus, following random priming across the genome, annealed primers in close proximity (within 1 Mb) can initiate efficient PCR. In contrast, MDA uses random hexamer priming but generates much larger genomic fragments of
greater than 10 kb in size by an isothermal reaction (Gill and Ghaemi, 2008). Based on these inherent differences in DNA priming and enzymatic amplification kinetics, we speculate that the two PCR-based methods have a much higher probability than the MDA method of reproducibly covering the single cell genome with less bias, including regions of recalcitrant GC- and AT-rich sequences, which would account for the more consistent and uniform density plots observed for PCR WGA products across all 24 chromosomes. In support of this hypothesis, in recent embryo validation studies (Wang et al., 2014b) and two cases of clinical PGD (Wang et al., 2014a), we showed that the SurePlex WGA method, which was fortuitously selected, was effective in combination with CNV-Seq for detecting aneuploidies and unbalanced translocations associated with small terminal CNVs. More recently, MALBAC WGA combined with NGS was also successfully applied to detect aneuploidies in single embryonic blastomeres (Huang et al., 2014). Therefore, from the limited clinical research studies conducted to date using PCR-based WGA and NGS, including this study, a relatively high degree of diagnostic sensitivity and specificity has now been demonstrated for detection of chromosome CNVs in embryos.
In order to offer even more comprehensive PGD to patients at an affordable cost, the next challenge is to extend the resolution of NGS at the single cell level down to approximately 1 Mb without increasing sequencing depth. To achieve this goal, further optimization of both WGA and NGS protocols will be necessary. Firstly, it is well known that the current reaction volume of 10‒50 μL used for WGA is too high, diluting the small amount of DNA template and decreasing reaction efficiency. One promising solution is an automated fluidics system
[Fig. 3 image. A: log2 ratio versus read bin (100 kb) for SurePlex and MALBAC products from the mental retardation, Sotos syndrome, Jacobsen syndrome and Wolf-Hirschhorn syndrome samples, with the detected CNVs labeled. B: Wolf-Hirschhorn syndrome (Chr. 4), replicates 1‒3 for SurePlex and for MALBAC.]
Fig. 3. CNV-Seq profiles of pathogenic CNVs derived from 15 pg genomic DNA samples subjected to SurePlex and MALBAC WGA. A: CNV profiles for chromosome disease syndromes. B: Replicate CNV profiles for Wolf-Hirschhorn syndrome. Chromosome plots are log2 values of the mean sequencing read number per 100 kb bin (Y-axis) versus sequential 100 kb sequencing bins (X-axis). The mean CNV along the length of each chromosome is marked with a blue line. The upper and lower dashed lines indicate the expected position of a 100% chromosome gain (duplication) and a 100% chromosome loss (deletion), respectively. Red bars mark regions of repetitive sequences and black bars mark the centromere. The disease-specific CNVs were clearly identified and delineated in all samples, including replicate samples of the CNV associated with Wolf-Hirschhorn syndrome.
[Fig. 4 image. Log2 ratio versus read bin (20 kb) for Sotos syndrome (Chr. 5), 1p36 microdeletion syndrome and 3q29 microduplication syndrome; tracks for gDNA and single cell replicates.]
Fig. 4. CNV-Seq profiles of pathogenic CNVs from single cell SurePlex WGA products. Chromosome plots are log2 values of the mean sequencing read number per 20 kb bin (Y-axis) versus sequential 20 kb sequencing bins (X-axis). The mean CNV along the length of each chromosome is marked with a blue line. The upper and lower dashed lines indicate the expected position of a 100% chromosome gain (duplication) and a 100% chromosome loss (deletion), respectively. Red bars mark regions of repetitive sequences and black bars mark the centromere. The CNVs associated with Sotos, 1p36 microdeletion and 3q29 microduplication syndromes were all clearly identified and delineated in the replicate single cell samples.
for isolating single cells into very small chambers (Yu et al., 2014), which would allow WGA to proceed in sub-microliter volumes, increasing efficiency and, possibly, genomic coverage. Secondly, generation of the DNA sequencing library from the WGA product uses a pre-amplification PCR step, which introduces additional bias into the population of sequencing reads. Recently, a PCR-free library method called VeriSeq (Illumina, USA) based on SurePlex WGA has been developed, which may help to remove this extra PCR-associated bias in the sequencing reads. Thirdly, a more radical approach would remove the WGA step, which is the primary source of genome bias, and directly sequence PCR libraries constructed from the native embryonic genomic DNA. This may also be possible with a microfluidics approach in combination with more efficient DNA modifying enzymes to generate the sequencing libraries. With these improvements, it may eventually be possible to generate sequencing reads with virtually no bias, allowing more effective normalization of test against reference, which should translate to increased sensitivity and specificity at a higher resolution.
In conclusion, we have identified and validated two PCR-based WGA systems compatible with CNV-Seq to reliably and accurately detect pathogenic CNVs at the single cell level. This sequencing pipeline should be ideally suited for clinical PGD using either blastomere or trophectoderm biopsy to identify true euploid embryos for transfer and increase the chances for an infertile couple or a couple with a translocation to have a healthy baby by assisted reproduction.
MATERIALS AND METHODS
Clinical research samples
For the research study, patients’ peripheral blood samples were collected in EDTA tubes at the Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, with approval from the Ethics Committee of the Chinese PLA General Hospital (S2013-092-02). Five patients had chromosomal diseases associated with CNVs, including Sotos, Jacobsen, Wolf-Hirschhorn, 1p36 microdeletion and 3q29 microduplication syndromes. Of two other patients, one had an Xp11.22-pter deletion and 1q22-qter duplication associated with mental retardation and one had a normal karyotype (control).
Isolation and dilution of genomic DNA
Extraction of genomic DNA (gDNA) from blood samples was performed using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, USA) according to the manufacturer’s instructions. The concentrations of purified gDNA were determined using the Quant-iT Assay Kit with the Qubit fluorometer (Invitrogen, USA). For preparation of 15 pg genomic DNA samples, 1.5 μg of DNA was serially diluted 10-fold in Dulbecco’s phosphate-buffered saline (DPBS) to generate a solution with a DNA concentration of 1500 pg/100 μL. From this solution, 10 μL was transferred to a sterile PCR tube containing 15 μL of DPBS to give a final stock solution with a DNA concentration of 6 pg/μL. Replicate 2.5 μL volumes of this stock solution containing 15 pg of genomic DNA were used as a starting template for WGA.
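The dilution arithmetic above can be checked with a short worked example, with all volumes taken to be in microliters. This is only a sketch: the function name is a hypothetical helper, and the 6.6 pg per cell figure is the single cell DNA content quoted in the Results.

```python
def dilute(conc_pg_per_ul, transfer_ul, diluent_ul):
    """Concentration after adding transfer_ul of solution to diluent_ul of buffer."""
    total_ul = transfer_ul + diluent_ul
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return conc_pg_per_ul * transfer_ul / total_ul

# Values from the protocol described above.
intermediate = 1500 / 100  # 1500 pg per 100 uL, expressed in pg/uL
stock = dilute(intermediate, transfer_ul=10, diluent_ul=15)
template_pg = stock * 2.5  # DNA in each 2.5 uL replicate

# DNA content of one cell as quoted in the Results (6.6 pg).
PG_PER_CELL = 6.6
genome_equivalents = template_pg / PG_PER_CELL

print(f"stock: {stock} pg/uL, template: {template_pg} pg")
print(f"about {genome_equivalents:.2f} single cell equivalents per replicate")
```

The numbers are internally consistent: 10 μL of a 15 pg/μL solution in a total of 25 μL gives 6 pg/μL, and 2.5 μL of that stock carries 15 pg.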
Micromanipulation of single cells
White blood cells (WBCs) were micromanipulated under a dissection microscope (Olympus SZ61, Japan) using a finely pulled glass Pasteur pipette. Isolated lymphocytes were washed three times with DPBS buffer and transferred into 200 μL tubes containing 2.5 μL DPBS. All procedures were performed in a dedicated laboratory with positive pressure HEPA-filtered air, taking precautions to avoid contamination. Tubes containing single cells were frozen at −20 °C prior to WGA.
Whole genome amplification
Single WBCs or 15 pg of gDNA in 2.5 μL of DPBS were subjected to WGA using three different commercial kits, namely, the SurePlex WGA Kit (New England Biolabs,
USA), the MALBAC Single Cell Whole Genome Amplification Kit (Yikon Genomics, China) and the REPLI-g Single Cell Kit (Qiagen) according to the recommended protocols. Following WGA reactions, the amplified DNA products were purified using DNA Clean and Concentrator columns (Zymo Research, USA) to remove unincorporated primers, and then quantitated using a NanoDrop 2000c spectrophotometer (Thermo Scientific, Germany). To determine the quality of the WGA products, small aliquots were analyzed by electrophoresis on 2% agarose gels prepared in TBE buffer (90 mmol/L Tris-borate, pH 8.0, 2 mmol/L EDTA), stained with GelRed (Biotium, USA), and the DNA was visualized with Image Master software (Bio-Rad, USA).
Next-generation sequencing and CNV detection
CNV-Seq was performed as previously described (Liang et al., 2014; Wang et al., 2014d). The data analysis bioinformatics pipeline used for identifying, mapping and quantitating chromosomal CNVs in gDNA and WGA samples has been described previously (Wang et al., 2014b, 2014d). In brief, DNA products (input of 50 ng) were fragmented to an average size of 300 bp and end ligated with sequencing adaptors containing a 9 bp sample identifying barcode (Liang et al., 2013). Tagged fragments were then amplified by PCR with adaptor primers to generate sequencing libraries for each sample. Barcoded libraries were mixed and subjected to massively parallel sequencing on the HiSeq2500 platform to generate 43 bp reads (9 bp barcode and 36 bp genomic sequence). From the approximately 5 million sequencing reads generated from each library, 3.0‒3.25 million reads (60%‒65%) were uniquely and perfectly mapped to the hg19 human reference genome using the Burrows-Wheeler algorithm (Li and Durbin, 2010).
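As a rough plausibility check on this read budget, the sketch below estimates how many mapped reads would fall into each 20 kb or 100 kb bin if reads were spread evenly. The genome length of 3.1 Gb and the Poisson model of counting noise are assumptions introduced here for illustration, and the function names are hypothetical; WGA bias would add variation on top of this idealized sampling noise.

```python
import math

GENOME_SIZE_BP = 3.1e9  # assumption: approximate length of the human reference genome

def mapped_fraction(mapped_reads, total_reads):
    """Fraction of sequenced reads that were uniquely mapped."""
    return mapped_reads / total_reads

def mean_reads_per_bin(mapped_reads, bin_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average reads per bin if reads were spread evenly over the genome."""
    n_bins = genome_size_bp / bin_size_bp
    return mapped_reads / n_bins

def poisson_cv(mean_count):
    """Relative sampling noise (sd/mean) of a Poisson count with this mean."""
    return 1.0 / math.sqrt(mean_count)

print(mapped_fraction(3.0e6, 5.0e6), mapped_fraction(3.25e6, 5.0e6))
for bin_kb in (20, 100):
    depth = mean_reads_per_bin(3.0e6, bin_kb * 1000)
    print(f"{bin_kb} kb bins: ~{depth:.1f} reads per bin, "
          f"Poisson CV ~{poisson_cv(depth):.2f}")
```

Under these assumptions, larger bins average more reads and therefore less relative noise per bin, which is the usual trade-off between resolution and noise in low-pass sequencing.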
Data sets from a minimum of 15 samples were internally compared with each other using the number of sequencing reads allocated to consecutive chromosomal bins of either 100 kb (low resolution) or 20 kb (high resolution) as previously described (Wang et al., 2014c). To specifically identify CNVs, the log2 value of the mean CNV was plotted against each successive 20 kb or 100 kb bin from the p arm to the q arm of each chromosome, using theoretical reference points of log2[1.0] for two copies (normal), log2[1.5] for three copies (duplication) and log2[0.5] for one copy (deletion). Gain or loss of chromosomal regions was finally determined using the fused lasso algorithm (Tibshirani and Wang, 2008) and CNVs were called using cut-off values of >2.8 for duplications and <1.2 for deletions.
ACKNOWLEDGMENTS
The study was supported by grants awarded to Yuanqing Yao by the Key Program of the “Twelfth Five-year plan” of People’s Liberation Army (No. BWS11J058) and the National High Technology Research and Development Program (SS2015AA020402).
SUPPLEMENTARY DATA

Fig. S1. Chromosome plots of sequencing read density versus read bins of 100 kb.

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jgg.2015.03.001.
REFERENCES

Binder, V., Bartenhagen, C., Okpanyi, V., Gombert, M., Moehlendick, B., Behrens, B., Klein, H.U., Rieder, H., Ida Krell, P.F., Dugas, M., Stoecklein, N.H., Borkhardt, A., 2014. A new workflow for whole-genome sequencing of single human cells. Hum. Mutat. 35, 1260–1270.
Cai, H.Q., Liu, H.T., Shi, B., Li, A., Tang, W.R., Luo, Y., 2010. Whole genome amplification and its application in forensic individual identification. Yi Chuan 32, 1119–1125.
Czyz, Z.T., Hoffmann, M., Schlimok, G., Polzer, B., Klein, C.A., 2014. Reliable single cell array CGH for clinical samples. PLoS One 9, e85907.
de Bourcy, C.F., De Vlaminck, I., Kanbar, J.N., Wang, J., Gawad, C., Quake, S.R., 2014. A quantitative comparison of single-cell whole genome amplification methods. PLoS One 9, e105585.
Dean, F.B., Hosono, S., Fang, L., Wu, X., Faruqi, A.F., Bray-Ward, P., Sun, Z., Zong, Q., Du, Y., Du, J., Driscoll, M., Song, W., Kingsmore, S.F., Egholm, M., Lasken, R.S., 2002. Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl. Acad. Sci. USA 99, 5261–5266.
Findlay, I., Ray, P., Quirke, P., Rutherford, A., Lilford, R., 1995. Allelic dropout and preferential amplification in single cells and human blastomeres: implications for preimplantation diagnosis of sex and cystic fibrosis. Hum. Reprod. 10, 1609–1618.
Fiorentino, F., Biricik, A., Bono, S., Spizzichino, L., Cotroneo, E., Cottone, G., Kokocinski, F., Michel, C.E., 2014. Development and validation of a next-generation sequencing-based protocol for 24-chromosome aneuploidy screening of embryos. Fertil. Steril. 101, 1375–1382.
Fiorentino, F., Spizzichino, L., Bono, S., Biricik, A., Kokkali, G., Rienzi, L., Ubaldi, F.M., Iammarrone, E., Gordon, A., Pantos, K., 2011. PGD for reciprocal and Robertsonian translocations using array comparative genomic hybridization. Hum. Reprod. 26, 1925–1935.
Gill, P., Ghaemi, A., 2008. Nucleic acid isothermal amplification technologies: a review. Nucleosides Nucleotides Nucleic Acids 27, 224–243.
Handyside, A.H., 2013. 24-chromosome copy number analysis: a comparison of available technologies. Fertil. Steril. 100, 595–602.
Hou, Y., Fan, W., Yan, L., Li, R., Lian, Y., Huang, J., Li, J., Xu, L., Tang, F., Xie, X.S., Qiao, J., 2013. Genome analyses of single human oocytes. Cell 155, 1492–1506.
Huang, J., Yan, L., Fan, W., Zhao, N., Zhang, Y., Tang, F., Xie, X.S., Qiao, J., 2014. Validation of multiple annealing and looping-based amplification cycle sequencing for 24-chromosome aneuploidy screening of cleavage-stage embryos. Fertil. Steril. 102, 1685–1691.
Li, H., Durbin, R., 2010. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595.
Liang, D., Lv, W., Wang, H., Xu, L., Liu, J., Li, H., Hu, L., Peng, Y., Wu, L., 2013. Non-invasive prenatal testing of fetal whole chromosome aneuploidy by massively parallel sequencing. Prenat. Diagn. 33, 409–415.
Liang, D., Peng, Y., Lv, W., Deng, L., Zhang, Y., Li, H., Yang, P., Zhang, J., Song, Z., Xu, G., Cram, D.S., Wu, L., 2014. Copy number variation sequencing for comprehensive diagnosis of chromosome disease syndromes. J. Mol. Diagn. 16, 519–526.
Munne, S., 2012. Preimplantation genetic diagnosis for aneuploidy and translocations using array comparative genomic hybridization. Curr. Genomics 13, 463–470.
Shen, J., Cram, D.S., Wu, W., Cai, L., Yang, X., Sun, X., Cui, Y., Liu, J., 2013. Successful PGD for late infantile neuronal ceroid lipofuscinosis achieved by combined chromosome and TPP1 gene analysis. Reprod. Biomed. Online 27, 176–183.
Telenius, H., Carter, N.P., Bebb, C.E., Nordenskjold, M., Ponder, B.A., Tunnacliffe, A., 1992. Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13, 718–725.
Tibshirani, R., Wang, P., 2008. Spatial smoothing and hot spot detection for CGH data using fused lasso. Biostatistics 9, 18–29.
Treff, N.R., Su, J., Tao, X., Northrop, L.E., Scott, R.T., 2011. Single-cell whole-genome amplification technique impacts the accuracy of SNP microarray-based genotyping and copy number analyses. Mol. Hum. Reprod. 17, 335–343.
Vanneste, E., Voet, T., Le Caignec, C., Ampe, M., Konings, P., Melotte, C., Debrock, S., Amyere, M., Vikkula, M., Schuit, F., Fryns, J.P., Verbeke, G., D’Hooghe, T., Moreau, Y., Vermeesch, J.R., 2009. Chromosome instability is common in human cleavage-stage embryos. Nat. Med. 15, 577–583.
Voet, T., Vanneste, E., Van der Aa, N., Melotte, C., Jackmaert, S., Vandendael, T., Declercq, M., Debrock, S., Fryns, J.P., Moreau, Y., D’Hooghe, T., Vermeesch, J.R., 2011. Breakage-fusion-bridge cycles leading to inv dup del occur in human cleavage stage embryos. Hum. Mutat. 32, 783–793.
Wang, H., Wang, L., Ma, M., Song, Z., Zhang, J., Xu, G., Fan, J., Li, N., Cram, D.S., Yao, Y., 2014a. A PGD pregnancy achieved by embryo copy number variation sequencing with confirmation by non-invasive prenatal diagnosis. J. Genet. Genomics 41, 453–456.
Wang, L., Cram, D.S., Shen, J., Wang, X., Zhang, J., Song, Z., Xu, G., Li, N., Fan, J., Wang, S., Luo, Y., Wang, J., Yu, L., Liu, J., Yao, Y., 2014b. Validation of copy number variation sequencing for detecting chromosome imbalances in human preimplantation embryos. Biol. Reprod. 91, 37.
Wang, L., Wang, X., Zhang, J., Song, Z., Wang, S., Gao, Y., Wang, J., Luo, Y., Niu, Z., Yue, X., Xu, G., Cram, D.S., Yao, Y., 2014c. Detection of chromosomal aneuploidy in human preimplantation embryos by next-generation sequencing. Biol. Reprod. 90, 95.
Wang, Y., Chen, Y., Tian, F., Zhang, J., Song, Z., Wu, Y., Han, X., Hu, W., Ma, D., Cram, D., Cheng, W., 2014d. Maternal mosaicism is a significant contributor to discordant sex chromosomal aneuploidies associated with noninvasive prenatal testing. Clin. Chem. 60, 251–259.
Wells, D., Kaur, K., Grifo, J., Glassner, M., Taylor, J.C., Fragouli, E., Munne, S., 2014. Clinical utilisation of a rapid low-pass whole genome sequencing technique for the diagnosis of aneuploidy in human embryos prior to implantation. J. Med. Genet. 51, 553–562.
Yin, X., Tan, K., Vajta, G., Jiang, H., Tan, Y., Zhang, C., Chen, F., Chen, S., Pan, X., Gong, C., Li, X., Lin, C., Gao, Y., Liang, Y., Yi, X., Mu, F., Zhao, L., Peng, H., Xiong, B., Zhang, S., Cheng, D., Lu, G., Zhang, X., Lin, G., Wang, W., 2013. Massively parallel sequencing for chromosomal abnormality testing in trophectoderm cells of human blastocysts. Biol. Reprod. 88, 69.
Yu, Z., Lu, S., Huang, Y., 2014. Microfluidic whole genome amplification device for single cell sequencing. Anal. Chem. 86, 9386–9390.
Zhang, L., Cui, X., Schmitt, K., Hubert, R., Navidi, W., Arnheim, N., 1992. Whole genome amplification from a single cell: implications for genetic analysis. Proc. Natl. Acad. Sci. USA 89, 5847–5851.
Zheng, Y.M., Wang, N., Li, L., Jin, F., 2011. Whole genome amplification in preimplantation genetic diagnosis. J. Zhejiang Univ. Sci. B 12, 1–11.
Zong, C., Lu, S., Chapman, A.R., Xie, X.S., 2012. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338, 1622–1626.