A novel approach to improving hybrid capture sequencing targeting efficiency

Molecular and Cellular Probes 46 (2019) 101424

Contents lists available at ScienceDirect

Molecular and Cellular Probes journal homepage: www.elsevier.com/locate/ymcpr

WenXiang Lu a,b, Miao Zhu a, Yi Chen c,*, Yunfei Bai d,**

a Department of R&D, Decode Genomics Incorporation, Nanjing, 210000, China
b Department of R&D, AGCU ScienTech Incorporation, Wuxi, 214174, China
c Department of Obstetrics and Gynaecology, Kunshan Hospital of Traditional Chinese Medicine, Kunshan, 215300, China
d State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China

ARTICLE INFO

Keywords: Hybrid capture; Single-stranded DNA; Library construction; Targeting efficiency

ABSTRACT

Current approaches to hybrid capture sequencing have several limitations, including considerable complexity, heavy labor requirements, and low enrichment efficiency, which have restricted their use. To overcome these drawbacks, we developed a novel method that relies on direct genomic DNA hybridization followed by single-stranded DNA library preparation. Using this protocol, we achieved a targeting efficiency as high as 75%, and we found the method to be a simple and efficient approach to DNA library construction.

1. Introduction

Targeted sequencing is a commonly employed strategy in both clinical and research settings, as it requires substantially fewer computational and sequencing resources than whole genome sequencing, making it a more cost-effective strategy that can be more readily implemented [1]. To date, several different strategies for targeted sequencing have been developed, with many depending on the selection of specific regions of interest using either PCR amplification- or hybridization capture-based methods [2]. In settings where panels of distinct genes are being analyzed, in-solution hybridization enrichment strategies are the most commonly employed, as they are highly specific, reproducible, and scalable. Each approach to targeted sequencing is hampered by specific limitations pertaining to ease of use, coverage, on-target enrichment rates, or cost, and as such there are many ongoing efforts to optimize solution-based targeted enrichment methods [3,4]. One key factor that affects the ultimate enrichment efficiency of a given approach is the presence of adapters ligated to the genomic library in the hybridization mixture. When specific degenerate primers are added to the hybridization reaction, they are able to block these adapter sequences and compete for nonspecific hybridization, raising the on-target read rate to roughly 60%, compared with roughly 35% without blocking oligonucleotides [5,6].

As such, we hypothesized that more efficient, uniform hybridization could be achieved through a PCR-free approach in which genomic DNA is directly hybridized without the need for an adapter-containing DNA library. In light of this hypothesis, we developed a novel protocol aimed at improving in-solution hybridization capture targeting efficiency (Fig. 1). In this approach, sheared genomic DNA is hybridized to biotinylated RNA bait. Following hybridization, streptavidin magnetic beads are used to capture the bait RNAs and any hybridized DNA, after which the beads are washed to eliminate nonspecific binding and the captured fragments are eluted and ligated to specific adapter sequences. The ligated DNA is then amplified with Illumina-compatible primers over a limited number of PCR cycles.

2. Materials & methods

2.1. DNA extraction

We obtained human gDNA from a donor-provided blood sample, whereas bacterial genomic DNA was derived from Escherichia coli DH5α. Both samples were collected from Southeast University. The DNeasy Blood & Tissue Kit (Qiagen) was used for DNA extraction, and extracted DNA was then quantified with a Qubit dsDNA HS Assay system (Life Technologies).

* Corresponding author.
** Corresponding author.
E-mail addresses: [email protected] (Y. Chen), [email protected] (Y. Bai).

https://doi.org/10.1016/j.mcp.2019.101424 Received 13 May 2019; Received in revised form 8 July 2019; Accepted 19 July 2019 Available online 20 July 2019 0890-8508/ © 2019 Elsevier Ltd. All rights reserved.

W. Lu, et al.

Fig. 1. Overview of the library preparation strategy. Genomic DNA is sheared and then hybridized to biotinylated RNA bait sequences. Streptavidin magnetic beads are then used to capture the hybridized DNA fragments. The captured DNA is eluted and ligated to 5′ and 3′ adapters with random overhanging nucleotides, after which the adapter-ligated DNA undergoes PCR amplification using Illumina-compatible primers.

2.2. Adaptor preparation

GenScript Biotech synthesized all oligonucleotides used herein. 3′ adapters were prepared by mixing equimolar concentrations of two oligonucleotides, SS3F (5′P-AGATCGGAAGAGCACACGTCTGAACTCCAGTC-3′-amino-modifier) and SS3R (5′-GTGTGCTCTTCCGATCTNNNNNN-3′-amino-modifier), at a final concentration of 100 μM in ddH2O. 5′ adapters were prepared in the same manner using the following two oligonucleotides: SS5F (5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) and SS5R (5′-NNNNNNAGATCGGAAGAGCGTC-3′-amino-modifier). For both adapters, the oligonucleotides were annealed by heating to 95 °C and then slowly cooling to 10 °C.

2.3. Hybrid capture library preparation

We utilized Agilent SureDesign to generate unique RNA bait sequences targeting ~3.1 Mb of the human genome, including 352 skeletal disorder-related genes in total. Baits were 120 nucleotides long, with a 2X density parameter. Genomic DNA was sheared to ~300 bp (Covaris E220 System) and then treated with 5 U of T4 polynucleotide kinase (Thermo Fisher). The resultant DNA was purified with 1.8 X AMPure XP (Beckman Coulter) beads and eluted using 10 μl ddH2O.

2.3.1. Hybridization

Hybridization was conducted in a manner previously described for an exome capture approach [7]. Briefly, 500 ng of the above-generated genomic DNA was combined with 2.5 μg of human Cot-1 DNA (Life Technologies) and 2.5 μg of salmon sperm DNA (Life Technologies) in a PCR tube. As a negative control, water was combined with human Cot-1 DNA and salmon sperm DNA and then processed using the same capture and library preparation process outlined below. This solution was warmed to 95 °C for 5 min and then incubated at 65 °C for 5 min. The sample was then combined with 2X hybridization buffer (10X SSPE, 10X Denhardt's, 10 mM EDTA and 0.2% SDS; 65 °C), a mixture of 500 ng biotinylated RNA baits that had been warmed to 65 °C for 2 min, and 20 U SUPERase-In (Ambion). The solution was thoroughly mixed via pipetting, after which it was allowed to incubate at 65 °C for 12 h.

2.3.2. Washing

Following this hybridization reaction, 50 μl M-280 streptavidin Dynabeads (Invitrogen) were collected and washed with wash buffer (1 M NaCl, 10 mM Tris-HCl [pH 7.5], 1 mM EDTA, and 0.01% Tween 20) before their resuspension in binding buffer (1 M NaCl, 1 mM EDTA, and 10 mM Tris-HCl, pH 7.5). These beads were then mixed with the hybridized target samples for 30 min with occasional vortexing. A magnetic separator was then used to isolate the streptavidin beads, and the bead-bound sequences were washed once for 15 min at room temperature using 1X SSC/0.1% SDS, after which samples were washed three times for 10 min each at 65 °C using prewarmed 0.1X SSC/0.1% SDS. DNA selected in these samples via hybridization was then

Table 1
A workflow comparison for various DNA enrichment strategies.

Step | Time | This study | Conventional
Covaris shearing | 80 s | ✓ | ✓
5′-phosphates | 15 min | ✓ | –
End repair | 15 min | – | ✓
A-tailing | 15 min | – | ✓
Adapter ligation | 15 min | – | ✓
Amplification | 25 min | – | ✓
Hybridization | 16 h | ✓ | ✓
Adapter ligation | 15 min | ✓ | –
Amplification | 45 min | ✓ | ✓
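The step counts and durations in Table 1 can be totalled with a short script. This is only a sketch of the arithmetic: the durations are transcribed from the table, while the "pre-capture" and "post-capture" labels on the repeated steps are our own annotation, and the totals cover only the durations listed in the table.

```python
# Step durations in seconds, transcribed from Table 1.
# Each entry: (step, seconds, used in this study, used in conventional workflow).
# The "(pre-capture)" / "(post-capture)" suffixes are an assumed annotation
# to tell the repeated steps apart; they do not appear in the table.
STEPS = [
    ("Covaris shearing", 80, True, True),
    ("5'-phosphates", 15 * 60, True, False),
    ("End repair", 15 * 60, False, True),
    ("A-tailing", 15 * 60, False, True),
    ("Adapter ligation (pre-capture)", 15 * 60, False, True),
    ("Amplification (pre-capture)", 25 * 60, False, True),
    ("Hybridization", 16 * 3600, True, True),
    ("Adapter ligation (post-capture)", 15 * 60, True, False),
    ("Amplification (post-capture)", 45 * 60, True, True),
]


def summarize(column):
    """Return (step count, total seconds) for tuple column 2 (this study) or 3 (conventional)."""
    used = [step for step in STEPS if step[column]]
    return len(used), sum(step[1] for step in used)


def fmt(seconds):
    """Format a duration in seconds as 'H h M min S s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} h {minutes} min {secs} s"


this_study = summarize(2)
conventional = summarize(3)

print(f"This study:   {this_study[0]} steps, {fmt(this_study[1])}")
print(f"Conventional: {conventional[0]} steps, {fmt(conventional[1])}")
print(f"Time saved:   {fmt(conventional[1] - this_study[1])}")
```

Under these figures the saving comes from the omitted pre-capture steps, since hybridization dominates the total in both workflows.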

Table 2
Sequencing, mapping, and coverage for the direct hybridization method.

Library | Total reads (M) | Reads mapping rate | On target reads rate | Duplication rate | Target regions > 1X | Target regions > 10X
DHM-1 | 10 | 91% | 77% | 16% | 99% | 96%
DHM-2 | 10 | 90% | 75% | 19% | 99% | 95%
DHM-3 | 10 | 90% | 76% | 17% | 99% | 95%
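To put the on-target rates in Table 2 in perspective, the sketch below averages the three replicates and estimates a fold enrichment relative to unenriched sequencing. The target size (~3.1 Mb) comes from Section 2.3; the genome size of roughly 3.1 Gb is our assumption for the hg19 reference, and the fold-enrichment figure is an illustrative calculation rather than a result reported in the paper.

```python
from statistics import mean

# Percentages transcribed from Table 2.
ON_TARGET = {"DHM-1": 77, "DHM-2": 75, "DHM-3": 76}
DUPLICATION = {"DHM-1": 16, "DHM-2": 19, "DHM-3": 17}

TARGET_BP = 3.1e6  # ~3.1 Mb bait design (Section 2.3)
GENOME_BP = 3.1e9  # ASSUMPTION: approximate size of the hg19 reference

mean_on_target = mean(ON_TARGET.values())      # percent
mean_duplication = mean(DUPLICATION.values())  # percent

# Fraction of reads expected to fall on target with no enrichment at all.
background = TARGET_BP / GENOME_BP
fold_enrichment = (mean_on_target / 100) / background

print(f"Mean on-target rate:  {mean_on_target:.1f}%")
print(f"Mean duplication:     {mean_duplication:.1f}%")
print(f"Approx. enrichment:   {fold_enrichment:.0f}-fold over background")
```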

Fig. 2. Human gDNA library read densities. No capture step was performed for this assay, with only library preparation having been conducted. Read densities at the genome-scale were calculated with a 1 M window. Red bands correspond to centromeric regions. Density drops in centromeric regions are expected because these regions are commonly enriched with long arrays of near-identical tandem repeats.

eluted using 50 μl of 0.1 M NaOH for 10 min at room temperature, after which 50 μl of 1 M Tris-HCl (pH 7.5) was added for neutralization. Lastly, 1.8 X AMPure XP beads were used to concentrate the isolated DNA, which was eluted in 10 μl ddH2O.

2.3.3. Amplification

The eluted captured single-stranded DNA then underwent simultaneous 3′ and 5′ adapter ligation using 1 μM of each adapter at 25 °C for 15 min in a 25 μl total volume containing 12.5 μl 2 X Quick ligation buffer (NEB) and 1 μl Quick T4 DNA ligase (NEB). Nucleic acids were then isolated using 1 X AMPure XP beads and underwent PCR amplification in a 50-μl reaction containing 25 μl of 2 X Q5 High-Fidelity Master Mix (NEB) and 10 μM of the appropriate PCR primers, which were as follows:

5′-CAAGCAGAAGACGGCATACGAGATN8GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′

5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′

where N8 indicates Illumina index barcode sequences. PCR settings included: 98 °C for 3 min, 13 cycles of [98 °C for 10 s, 66 °C for 30 s, 72 °C for 1 min], 72 °C for 5 min. We then used 1 X AMPure XP beads to isolate the final libraries, eluting in 20 μl ddH2O.
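As a sanity check on how the PCR primers relate to the adapters from Section 2.2, the following sketch fills the N8 position with an index and tests whether each primer's 3′ end matches the corresponding adapter strand. The sequences are transcribed from the text; the helper names and the example index `ACGTACGT` are hypothetical placeholders, not part of the published protocol or of any real Illumina index set.

```python
# Oligonucleotide sequences transcribed from Sections 2.2 and 2.3.3 (5'->3').
SS3F = "AGATCGGAAGAGCACACGTCTGAACTCCAGTC"
SS5F = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
INDEX_PRIMER_TEMPLATE = (
    "CAAGCAGAAGACGGCATACGAGAT{index}GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
)
UNIVERSAL_PRIMER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_index_primer(index):
    """Substitute an 8-nt index for the N8 placeholder in the indexing primer."""
    if len(index) != 8 or set(index) - set("ACGT"):
        raise ValueError("index must be exactly 8 nt of A/C/G/T")
    return INDEX_PRIMER_TEMPLATE.format(index=index)


EXAMPLE_INDEX = "ACGTACGT"  # HYPOTHETICAL placeholder, not a real Illumina index
primer = build_index_primer(EXAMPLE_INDEX)

# The indexing primer should end with the reverse complement of SS3F,
# so that it can anneal to the SS3F strand ligated to the 3' end of a fragment.
primes_on_3p_adapter = primer.endswith(reverse_complement(SS3F))

# The second primer should end with SS5F itself, so that it can anneal to
# the complementary strand synthesized in the first PCR cycle.
primes_on_5p_adapter = UNIVERSAL_PRIMER.endswith(SS5F)

print(primer)
print("Indexing primer matches 3' adapter:", primes_on_3p_adapter)
print("Second primer matches 5' adapter:", primes_on_5p_adapter)
```

Both checks pass for the sequences as printed, which suggests the adapter and primer sequences in the text are mutually consistent.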

2.4. Genomic DNA NGS library preparation

We sheared human and E. coli DH5α gDNA into ~300 bp fragments, which were then treated with 5 U T4 polynucleotide kinase and purified via 1.8 X AMPure XP beads, eluting in 10 μl ddH2O as above. This isolated gDNA was then warmed to 95 °C for 5 min, prior to cooling on ice. This gDNA, along with a negative control containing water instead of gDNA, then underwent simultaneous 3′ and 5′ adapter ligation (5 μM each adapter) at 25 °C for 15 min in a total volume of 25 μl containing 12.5 μl 2 X Quick ligation buffer and 1 μl Quick T4 DNA ligase. We then used 1 X AMPure XP beads to isolate the ligated DNA sequences, which were amplified in a total 50-μl PCR reaction volume as above. To benchmark library preparation performance, we also obtained the commercial VAHTS™ Universal DNA Library Prep Kit (Vazyme Biotech), which was used to generate an E. coli DH5α gDNA library based on the provided protocols.

2.5. Sequencing and data analysis

An Agilent Bioanalyzer 2100 was used to quantify sample libraries, which were then sequenced using an Illumina HiSeq X Ten platform (2 × 150 bp) (Nanjing Geneseeq). The SOAPnuke software was used to process raw reads and to eliminate low-quality and adapter sequences, after which BWA was used to align the remaining reads to the hg19 human reference genome [8]. We then employed SAMtools [9], GATK [10], and Picard to sort SAM/BAM files, conduct local realignment, and mark duplicate sequences.

3. Results

3.1. Development of the direct hybridization approach

In this study, we developed a novel strategy as an alternative to current approaches for targeted sequencing, in an effort to improve targeting efficiency via hybridization of genomic DNA to RNA bait molecules. There are two key steps to this method (Fig. 1). First, genomic DNA is sheared and hybridized to biotinylated RNA bait sequences. Second, single-stranded DNA library construction is performed. This gDNA direct hybridization approach eliminates the risk of adaptor-mediated cross-hybridization and does not require any amplification steps prior to hybridization. Ultimately, this approach markedly reduced both the time and effort needed for effective target enrichment, owing to the simplified library preparation step and the removal of one PCR amplification. We conducted a rough comparison of the time considerations for both our approach and a conventional strategy (Table 1). In total, our protocol incorporates 5 basic steps, whereas a more conventional approach entails 7. Our strategy therefore has the potential to significantly reduce the workforce, reagents, and time needed to complete the enrichment process, because it requires fewer enzymatic processing and purification steps.

3.2. A direct hybridization approach to hybrid capture library preparation

Using the approach outlined in the present study, we prepared a negative control library containing no sample DNA but all other reagents and three hybrid capture libraries from the same sample, and

Table 3
Comparison of DH5α DNA library preparation outcomes.

Sample | Raw reads (M) | Mapped reads ratio | Mean depth | 1X Coverage Depth | ≥30X Coverage Depth | ≥50X Coverage Depth
DH-100ng | 4 | 99.5% | 103.2 | 99.9% | 99.9% | 98.3%
DH-10ng | 4 | 98.8% | 102.4 | 99.9% | 99.8% | 96.7%
DH-VAHTS | 4 | 99.4% | 107.8 | 99.9% | 99.9% | 99.7%
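The mean depths in Table 3 can be compared with a naive estimate from read count, read length, and genome size. This is a back-of-the-envelope sketch only: it assumes that "4 M" counts individual 150-bp reads and that the E. coli genome is roughly 4.6 Mb, neither of which is stated in the text, and it ignores trimming, unmapped reads, and duplicates.

```python
READS = 4_000_000     # down-sampled reads per library (Table 3)
READ_LENGTH_BP = 150  # 2 x 150 bp sequencing (Section 2.5)
GENOME_BP = 4.6e6     # ASSUMPTION: approximate E. coli K-12 genome size

# Naive depth: total sequenced bases divided by genome length.
expected_depth = READS * READ_LENGTH_BP / GENOME_BP

# Mean depths transcribed from Table 3.
OBSERVED = {"DH-100ng": 103.2, "DH-10ng": 102.4, "DH-VAHTS": 107.8}

for sample, depth in OBSERVED.items():
    ratio = depth / expected_depth
    print(f"{sample}: observed {depth:.1f}x, "
          f"{ratio:.0%} of the naive {expected_depth:.0f}x estimate")
```

Under these assumptions all three libraries reach a similar fraction of the naive estimate, in line with the similar mean depths reported in Table 3.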

Fig. 3. E. coli gDNA library comparison of the read density distributions for libraries generated with our approach and the VAHTS kit. No capture step was performed for this assay, with only library preparation having been conducted. Read densities were calculated at the whole-genome level using a 10K window.
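Windowed read densities of the kind plotted in Fig. 3, and their comparison by Pearson's correlation, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline (the text does not say which software was used for this step); the function names and the toy read positions are hypothetical.

```python
import math
from collections import Counter


def window_counts(positions, genome_length, window=10_000):
    """Count reads per fixed-size window from 0-based read start positions."""
    n_windows = math.ceil(genome_length / window)
    counts = Counter(pos // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy read start positions for two libraries on a hypothetical 40-kb genome.
genome_length = 40_000
lib_a = [100, 5_000, 12_000, 12_500, 25_000, 38_000]
lib_b = [300, 900, 4_000, 11_000, 13_000, 26_000, 39_999]

dens_a = window_counts(lib_a, genome_length)
dens_b = window_counts(lib_b, genome_length)
r = pearson_r(dens_a, dens_b)

print("Library A density:", dens_a)
print("Library B density:", dens_b)
print(f"Pearson's r: {r:.3f}")
```

With real data, the read positions would come from the aligned BAM files, and libraries of different sizes would typically be down-sampled or normalized before comparison, as was done here by down-sampling to 4 million reads.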

then used them for sequencing. About 0.1 million reads were obtained from the negative control library, and further analysis revealed that no reads could be mapped to target regions, indicating that human Cot-1 DNA and salmon sperm DNA do not affect our data. Approximately 10 million reads were obtained from each library sample, and the resultant data were used to assess target efficiency and duplication rates (Table 2). Using our approach, we achieved an on-target read rate of approximately 75%, with a duplication rate < 20%. These findings indicate that this approach does improve target efficiency, as hypothesized. Furthermore, our approach did not increase the duplication rate significantly, and as such only a single PCR step is needed for this approach following hybridization.

3.3. NGS library construction with genomic DNA

To confirm that our approach can be used to generate DNA libraries suited to NGS, we used human and E. coli DH5α gDNA for sample library construction. To assess adapter dimer or artifact formation in the library preparation process, a negative control library and a sample library with 10 ng human gDNA input were constructed before the indexing PCR step. These libraries underwent qPCR using universal primers specific

for the ligated adapters as follows: qPCR-F (5′-ACTCTTTCCCTACACGACGCTC-3′) and qPCR-R (5′-GACTGGAGTTCAGACGTGTGCT-3′). Each library was diluted 1000x, yielding a CT value of 25.75 ± 0.15 for the negative control library and a value of 20.49 ± 0.16 for the sample gDNA library. These results suggest that simultaneous 3′ and 5′ adapter ligation can produce a limited amount of adapter dimer. However, after the PCR indexing step we conducted bead purification with a 1 X bead-to-sample ratio to remove PCR-amplified adapter dimers from the library, ensuring that these dimers cannot be detected in the final libraries.

To assess the performance of our approach, we utilized a range of different human gDNA input amounts (10, 50, or 100 ng). Library coverage was assessed using roughly 25 million reads, and genome-wide read densities were calculated for each library. The results highlighted a uniform distribution of sequences across chromosomes, although the library constructed from 10 ng of input DNA had some inconsistencies in certain areas, such as on chromosome 16 (Fig. 2). These results suggest that our library preparation step is well suited to the successful construction of NGS libraries.

To compare how well our approach agreed with the results of a commercial library preparation kit, we used E. coli DH5α gDNA for library construction, given that this organism has a much smaller genome than humans, and thus we were able to achieve whole genome coverage using only a small amount of data. We used either 10 or 100 ng of input DNA for our approach, and we used 100 ng of input DNA with the VAHTS™ Universal DNA Library Prep Kit. For each sample, the resultant sequencing data were down-sampled to 4 million reads. Although there was a correlation between input DNA amount and depth of coverage, our method and the VAHTS kit exhibited overall similar findings with respect to mean depth of coverage (Table 3).

We additionally compared the read density of our approach with that of the commercially available VAHTS kit. We observed a largely identical read density for both approaches (Fig. 3). Consistent with this, there were very high correlation coefficients when comparing the read density between our two libraries and the VAHTS library (Pearson's R = 0.97 for 100 ng and 0.96 for 10 ng; Fig. 4).

Fig. 4. Pairwise comparison of read densities at the genome-wide level with a 10K window. Pearson's correlation coefficient values are shown to the right of the diagonal.

4. Discussion

In the present manuscript, we have described a novel approach to hybrid capture library preparation that is also suited for NGS library preparation. The approach described herein is simple and easy to implement, while improving on-target read rates.

Previous work has revealed that the efficiency of hybridization approaches requiring long adapter libraries decreases as a consequence of both library hybridization to bait sequences and long adapter-mediated capture of off-target regions [5,6]. To overcome this issue, shorter adapters have been designed for use in generating truncated libraries during the hybrid capture process [11]. While an improvement over long adapters, these shorter adapter sequences still remain in the libraries, and blocking oligonucleotides must therefore be used during the hybrid capture process to improve hybridization efficiency. Unlike in previous studies, we hybridized the baits directly to the DNA fragments, thereby eliminating the presence of adapters during the hybrid capture step, with no need to construct a DNA library prior to hybridization. This simple advance allowed us to achieve very high on-target read rates without any significant increase in the duplication rate, as has been observed in certain published approaches such as the double hybridization method [6]. One limitation is that samples cannot be multiplexed during capture, because the DNA carries no index/barcode sequence at that stage.

The adapter ligation scheme outlined in Fig. 1 can also be employed for DNA library construction without the hybridization capture step if the sample DNA is heat-denatured prior to the reaction. Indeed, we found it to be a straightforward and easy-to-implement approach to such library construction as compared with alternative methods [12–14]. We found this approach to be well suited for a range of DNA input levels (as low as 10 ng), while remaining highly concordant with a commercial library preparation kit.

In summary, we have developed a novel approach to constructing a hybrid capture library via direct gDNA hybridization and ssDNA library preparation. Our approach can enhance the on-target efficiency of targeted sequencing while remaining very simple to employ. Importantly, we also developed this approach so as to be compatible with DNA library construction for NGS assays using a range of DNA input amounts.

Author contributions

W.X.L developed the method and performed the experiments. M.Z performed all bioinformatics analyses. Y.C and Y.F.B supervised the study and edited the manuscript.

Acknowledgments

We thank the department of IT of the Decode Genomics Incorporation for their technical assistance.

References

[1] L. Mamanova, A.J. Coffey, C.E. Scott, et al., Target-enrichment strategies for next-generation sequencing, Nat. Methods 7 (2010) 111–118.
[2] I. Kozarewa, J. Armisen, A.F. Gardner, B.E. Slatko, C.L. Hendrickson, Overview of target enrichment strategies, Curr. Protoc. Mol. Biol. 112 (2015) 7.21.1-23.
[3] M. Harakalova, M. Mokry, B. Hrdlickova, et al., Multiplexed array-based and in-solution genomic enrichment for flexible and cost-effective targeted next-generation sequencing, Nat. Protoc. 6 (12) (2011) 1870–1886.
[4] A.E. Shearer, M.S. Hildebrand, R.J. Smith, Solution-based targeted genomic enrichment for precious DNA samples, BMC Biotechnol. 12 (2012) 20.
[5] I.J. Nijman, M. Mokry, R. van Boxtel, P. Toonen, E. de Bruijn, E. Cuppen, Mutation discovery by targeted genomic enrichment of multiplexed barcoded samples, Nat. Methods 7 (11) (2010) 913–915.
[6] H. Lee, B.D. O'Connor, B. Merriman, et al., Improving the efficiency of genomic loci capture using oligonucleotide arrays for high throughput resequencing, BMC Genomics 10 (2009) 646.
[7] A. Gnirke, A. Melnikov, J. Maguire, et al., Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing, Nat. Biotechnol. 27 (2009) 182–189.
[8] H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics 26 (2010) 589–595.
[9] H. Li, B. Handsaker, A. Wysoker, et al., The sequence alignment/map format and SAMtools, Bioinformatics 25 (2009) 2078–9.
[10] A. McKenna, M. Hanna, E. Banks, et al., The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, Genome Res. 20

(2010) 1297–1303.
[11] R. Nadin, R. David, Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture, Genome Res. 22 (2012) 939–946.
[12] M.T. Gansauge, M. Meyer, Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA, Nat. Protoc. 8 (4) (2013) 737–748.
[13] M.T. Gansauge, T. Gerber, I. Glocke, et al., Single-stranded DNA library preparation from highly degraded DNA using T4 DNA ligase, Nucleic Acids Res. 45 (10) (2017) e79.
[14] D.W. Lazinski, A. Camilli, Homopolymer tail-mediated ligation PCR: a streamlined and highly efficient method for DNA cloning and library construction, Biotechniques 54 (2013) 25–34.