Accepted Manuscript
Title: Massively parallel sequencing of 32 forensic markers using the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System
Authors: Zheng Wang, Di Zhou, Hui Wang, Zhenjun Jia, Jing Liu, Xiaoqin Qian, Chengtao Li, Yiping Hou
PII: S1872-4973(17)30190-4
DOI: http://dx.doi.org/10.1016/j.fsigen.2017.09.004
Reference: FSIGEN 1778
To appear in: Forensic Science International: Genetics
Received date: 7-2-2017
Revised date: 26-8-2017
Accepted date: 6-9-2017
Please cite this article as: Zheng Wang, Di Zhou, Hui Wang, Zhenjun Jia, Jing Liu, Xiaoqin Qian, Chengtao Li, Yiping Hou, Massively parallel sequencing of 32 forensic markers using the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System, Forensic Science International: Genetics, http://dx.doi.org/10.1016/j.fsigen.2017.09.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Massively parallel sequencing of 32 forensic markers using the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System
Zheng Wang a, Di Zhou b, Hui Wang c, Zhenjun Jia d, Jing Liu a, Xiaoqin Qian a, Chengtao Li e, Yiping Hou a,f,*
a Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu 610041, China
b Thermo Fisher Scientific, Shanghai 200050, China
c Department of Forensic Genetics, Institute of Forensic Science, Chengdu Public Security Bureau, Chengdu 610081, China
d Department of Criminal Science and Technology, People's Public Security University of China, Beijing 100038, China
e Shanghai Key Laboratory of Forensic Medicine, Institute of Forensic Science, Ministry of Justice, P.R. China, Shanghai 200063, China
f Collaborative Innovation Center of Judicial Civilization, Beijing 100088, China
The first author: Zheng Wang, M.D.
Institution: Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu 610041, China.
Address: 3-16 Renmin South Road, Chengdu 610041, China
E-mail: [email protected]
Corresponding Author: Prof. Yiping Hou
Institution: Institute of Forensic Medicine, West China School of Basic Science and Forensic Medicine, Sichuan University, Chengdu 610041, China.
Address: 3-16 Renmin South Road, Chengdu 610041, China
Tel: +86-28-85501549 Fax: +86-28-85501550
E-mail: [email protected]
Highlights
·The Precision ID GlobalFiler™ NGS STR Panel and Ion PGM™ System can generate results concordant with CE-based methods.
·Single-source full profiles could be obtained using as little as 100 pg of input DNA.
·Partial STR genotypes of the minor contributor could be detected in mixtures up to 19:1.
·106 unrelated Han individuals were sequenced to perform genetic analyses of allelic diversity at the MPS level.
Abstract
Massively parallel sequencing (MPS) technologies have proved capable of sequencing the majority of the key forensic STR markers. With MPS, not only the repeat length but also sequence variations can be detected. Recently, Thermo Fisher Scientific designed an advanced MPS 32-plex panel, named the Precision ID GlobalFiler™ NGS STR Panel, in which the primer set has been designed specifically for MPS technologies and the data analysis is supported by a new version of the HID STR Genotyper Plugin (V4.0). In this study, a series of experiments was conducted to evaluate concordance, reliability, sensitivity of detection, mixture analysis, and the ability to analyze case-type and challenged samples. In addition, 106 unrelated Han individuals were sequenced to perform genetic analyses of allelic diversity. As expected, MPS detected broader allele variation and achieved a higher power of discrimination and power of exclusion. MPS results were found to be concordant with current capillary electrophoresis methods, and single-source complete profiles could be obtained stably using as little as 100 pg of input DNA. Moreover, this MPS panel could be adapted to case-type samples, and partial STR genotypes of the minor contributor could be detected in mixtures up to 19:1. These results indicate that the Precision ID GlobalFiler™ NGS STR Panel is reliable, robust and reproducible and has the potential to be used as a tool for human forensics.
Keywords: Massively parallel sequencing (MPS); Ion PGM™ System; Short tandem repeat (STR); Precision ID GlobalFiler™ NGS STR Panel
1. Introduction
Short tandem repeats (STRs) are DNA regions with a variable number of tandemly repeated short sequence motifs (2-7 bp in length) [1]. The number of repeat motifs in some STRs can be highly variable among individuals, and DNA profiling with sets of selected STRs (e.g. the expanded CODIS core loci [2,3]) has now been applied in various aspects of human forensics [4,5]. Genotyping of STRs based on length-based amplicon detection has been used in forensic genetics for over 20 years. The length-based detection method uses PCR amplification of STR markers (in the form of commercial kits), separation by capillary electrophoresis (CE) and allele calling using fluorescent fragment sizing. This type of analysis procedure has been the mainstay for STR typing applications; however, it is not without limitations. In CE, the need to design non-overlapping fragment lengths for STRs or to use different fluorescent labels for STR primers limits the number of STRs that can be multiplexed together. In addition, CE only identifies STRs by length and cannot distinguish between isoalleles (alleles of the same length but different sequences). Massively parallel sequencing (MPS), also known as next generation sequencing (NGS), has been around for over a decade, and has the capability to sequence many targeted regions of multiple samples simultaneously at single-base resolution [6]. In recent years, MPS technology has attracted much interest amongst researchers, especially in the field of forensic science [7]. To date, multiple studies have been conducted to sequence forensic biomarkers using MPS, including STR [8-17], SNP [18-21], microRNA [22] and mtDNA [23-25]. All these studies indicate that MPS holds promise for forensic applications. For STRs, many more loci can be multiplexed in one reaction with MPS technology because the detection is no longer based on fluorescent labeling.
Moreover, MPS has the potential not only to reveal the sequence variations in repeats and flanking regions of STRs but also to meet the demands of high multiplexing and throughput. In recent years several MPS-STR commercial kits were released, e.g. the Ion TorrentTM HID STR 10-plex and the Early Access STR Kit v1 by Thermo Fisher Scientific (Waltham, MA, USA) [9,12], the PowerSeqTM Auto System by Promega (Madison, WI, USA) [11], and ForenSeqTM DNA Signature Prep Kit by Illumina (San Diego, CA, USA) [13]. Related
studies support that multiplex STR typing by MPS is a promising technology for forensic applications and reliable STR genotypes could be obtained at a sensitivity level [9,11-13]. In the study herein, the performance of a novel MPS-STR panel, the Precision ID GlobalFilerTM NGS STR Panel (Thermo Fisher Scientific) that is designed for MPS on the Ion Torrent platforms, was evaluated. The multiplex panel contains the expanded CODIS 20 core loci (CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, D1S1656, D2S441, D2S1338, D10S1248, D12S391, D19S433, D22S1045) as well as 9 non-CODIS core loci (D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S474, D6S1043, D12ATA63 and D14S1434), Amelogenin, Y-STR (DYS391) and Y-InDel (rs2032678). The studies addressed are sensitivity of detection, concordance, reliability, case-type and challenged samples typing and mixture analysis. In addition, 106 unrelated Han individuals were sequenced to perform genetic analyses of allelic diversity at MPS level.
2. Materials and methods
2.1 Sample preparation
Human blood samples were collected with the approval of the Ethics Committee at the Institute of Forensic Medicine, Sichuan University (Approval Number: K2015008). Peripheral blood was collected from 106 unrelated healthy donors (51 females and 55 males) after written informed consent was received. Human genomic DNA was extracted using the PureLink® Genomic DNA Mini Kit (Thermo Fisher Scientific) according to the manufacturer’s protocol (https://tools.thermofisher.com/content/sfs/manuals/purelink_genomic_man.pdf). Genomic DNA was quantified using the Quantifiler® Human DNA Quantification Kit (Thermo Fisher Scientific) on an Applied Biosystems™ 7500 Real-time PCR System (Thermo Fisher Scientific) following the manufacturer’s instructions (https://tools.thermofisher.com/content/sfs/manuals/cms_041395.pdf). Based on quantitative results, samples were normalized to 1.0 ng/μL.
2.2 Library construction
STR libraries were prepared according to the Precision ID STR panel protocol (Revision A.0, https://tools.thermofisher.com/content/sfs/manuals/MAN0015830_PrecisionID_Panels_IonPGM_UG.pdf) and materials with minor modifications. For the amplification of targets, 1.0 ng DNA was amplified in a 20 μL reaction volume containing 4 μL of 5X Ion AmpliSeq HiFi Mix, 10 μL of Precision ID STR panel and 5 μL nuclease-free water. Thermal cycling was performed on the ProFlex™ 96-Well PCR System (Thermo Fisher Scientific) using the following conditions: enzyme activation for 2 min at 99 °C, amplification for 23 cycles of 15 s at 99 °C and 4 min at 60 °C, and hold at 10 °C. Then, primer sequences were partially digested by adding 2 μL FuPa Reagent and incubating for 10 min at 50 °C, 10 min at 55 °C and 20 min at 60 °C, followed by a hold at 10 °C for up to 1 h. For the ligation of STR libraries with adapters, 4 μL Switch solution, 4 μL DNA ligase, 0.5 μL Ion P1 Adapter, 0.5 μL Xpress Barcode X (X chosen from the Ion Xpress™ Barcode Adapters 1-96 kit for different samples) and 1 μL nuclease-free water were added to the 22 μL digested PCR reaction, which was then incubated for 30 min at 22 °C and 10 min at 68 °C, followed by a hold at 10 °C for up to 1 h. After ligation, each library was purified with 45 μL Agencourt™ AMPure™ XP Reagent (Beckman Coulter, FL, USA) as recommended by the manufacturer, with the minor modification that libraries were washed three times with 70% ethanol. To assess the yield and normalize the libraries, 9 μL of each diluted library (1:100 dilution) was quantified on an Applied Biosystems™ 7500 Real-time PCR System with the Ion Library TaqMan™ Quantitation Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions, and libraries were then normalized to 12 pM (except libraries in the sensitivity study). The normalized libraries were pooled in equal volumes for template preparation.
2.3 Template preparation and Ion PGM™ sequencing
The diluted library (25 μL) was used to generate template-positive Ion Sphere™ Particles (ISPs) containing clonally amplified DNA. Emulsion PCR (emPCR) was conducted in the Ion OneTouch™ 2 System (Thermo Fisher Scientific) with the Ion PGM™ Hi‑Q™ OT2 Kit (Thermo Fisher Scientific). Template-positive ISPs were enriched with the Ion OneTouch™ ES (Thermo Fisher Scientific) following the manufacturer’s recommendations. Sequencing was performed on the Ion Torrent PGM™ instrument using the Precision ID STR Hi‑Q™ Sequencing Solutions Kit and Ion 318™ Chip v2 (850 flows). Sequencing primer and Control Ion Sphere™ Particles of the Ion PGM™ Hi‑Q™ sequencing kit were added to the enriched, template-positive ISPs. After annealing the sequencing primer, sequencing polymerase was added and a final volume of 30 μL was loaded onto the Ion 318™ chip (Thermo Fisher Scientific).
2.4 Sequencing data acquisition and analysis
All sequencing data generated in this study were analyzed using the Torrent Suite V4.4 (Thermo Fisher Scientific) with the Homo sapiens hg19 genome as the alignment reference. The HID STR Genotyper plugin V4.0 was launched to identify regions of STR loci with the panel BED file (PrecisionID GlobalFiler NGS STR Panel targets.bed) and the JSON file (PrecisionID GlobalFiler NGS STR Panel AnalysisParams v1.json). In the output files generated from the Genotyper plugin, the sequences at each locus could be divided into three categories: allele sequence, stutter sequence (N ± 1 repeat motif) and noise sequence (neither alleles nor stutters, i.e., PCR/sequencing errors) [12]. In this study, the minimum analytical coverage threshold was set to 500 reads for loci, and the minimum coverage threshold used for determination of a minor allele was 200 reads. Depth of coverage (DoC), also known as read depth, was calculated for each sample and each STR locus. Allele ratio (AR) was computed for each STR locus as the ratio of allele reads to total reads (allele reads, stutter reads and noise reads). Allele coverage ratio (ACR), also known as heterozygote balance (Hb), was calculated for each heterozygous STR locus as the ratio of the lower allele coverage to the higher allele coverage (1.0 indicating equal coverage of the two alleles at a heterozygous STR locus).
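The AR and ACR definitions given here can be illustrated with a minimal Python sketch. The function names and the read counts are hypothetical and are not taken from the HID STR Genotyper plugin output; only the two ratio definitions follow the text.

```python
def allele_ratio(allele_reads, stutter_reads, noise_reads):
    """AR: allele reads divided by total reads (allele + stutter + noise)."""
    total = allele_reads + stutter_reads + noise_reads
    if total == 0:
        raise ValueError("locus has no reads")
    return allele_reads / total


def allele_coverage_ratio(coverage_a, coverage_b):
    """ACR (heterozygote balance): lower allele coverage / higher allele coverage."""
    high = max(coverage_a, coverage_b)
    if high == 0:
        raise ValueError("both alleles have zero coverage")
    return min(coverage_a, coverage_b) / high


# Hypothetical read counts for one heterozygous locus (illustrative only).
allele_a, allele_b = 900, 720
stutter, noise = 90, 90

ar = allele_ratio(allele_a + allele_b, stutter, noise)
acr = allele_coverage_ratio(allele_a, allele_b)
print(f"AR = {ar:.3f}, ACR = {acr:.3f}")
```

In this made-up case the two alleles account for 1620 of 1800 reads, and the less-covered allele has 80% of the coverage of the other.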
2.5 Concordance study and forensic parameters calculation
Control DNA 007 (Thermo Fisher Scientific), Control DNA 9947A (AGCU, Wuxi, China) and 106 Han samples were chosen for the concordance study. The Huaxia Platinum System (Thermo Fisher Scientific) [26] and the AGCU 21+1 STR kit (AGCU) [27,28] were selected because together they cover 31 loci of the Precision ID GlobalFiler™ NGS STR Panel (29 autosomal loci, Amelogenin and rs2032678, i.e., all markers except DYS391). STR genotypes generated from the HID STR Genotyper plugin V4.0 were compared to the results of CE-based data from the Huaxia Platinum System and the AGCU 21+1 STR kit. PCR amplification (CE-based kits) was performed on a ProFlex™ 96-well PCR System, and amplification products were separated and detected on the Applied Biosystems 3500xL Genetic Analyzer (Thermo Fisher Scientific) using POP-4 polymer and a 36 cm capillary array according to the manufacturer’s instructions. Initial fragment sizing and allele calling were conducted using GeneMapper ID-X software V1.4 (Thermo Fisher Scientific) with the peak amplitude threshold set at 175 relative fluorescence units (RFUs) for all colors. For every plate, Control DNA 007 was included as a positive control and a nuclease-free water sample was included as a negative control. To assess the potential genetic variation, STR sequencing results of the 106 Han samples were analyzed. Sequence variants per locus and per allele were compiled and counted. The power of discrimination (PD) and power of exclusion (PE) were calculated for the alleles observed in MPS and CE using the Promega PowerStats V12 spreadsheet.
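As a rough illustration of how sequence-level allele calls can change these two parameters, the sketch below computes PD and PE for a single locus. It assumes the formulas commonly associated with the PowerStats spreadsheet, PD = 1 − Σ(genotype frequency)² and PE = h²(1 − 2hH²) with h the heterozygote fraction and H the homozygote fraction; the manuscript does not state the formulas, so this is an assumption. The genotypes and the isoallele labels "12a"/"12b" are invented for the example and are not study data.

```python
from collections import Counter


def power_of_discrimination(genotypes):
    """PD = 1 - sum of squared genotype frequencies (assumed formula)."""
    n = len(genotypes)
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def power_of_exclusion(genotypes):
    """PE = h^2 * (1 - 2 * h * H^2), h = heterozygote fraction, H = 1 - h (assumed formula)."""
    n = len(genotypes)
    h = sum(1 for a, b in genotypes if a != b) / n
    hom = 1.0 - h
    return h * h * (1.0 - 2.0 * h * hom * hom)


# Hypothetical genotypes for four individuals at one locus (not study data).
# "12a" and "12b" stand for two isoalleles that share the CE length call "12".
length_calls = [("12", "12"), ("12", "12"), ("12", "13"), ("13", "14")]
sequence_calls = [("12a", "12a"), ("12a", "12b"), ("12a", "13"), ("13", "14")]

for label, calls in (("CE length", length_calls), ("MPS sequence", sequence_calls)):
    pd_value = power_of_discrimination(calls)
    pe_value = power_of_exclusion(calls)
    print(f"{label}: PD = {pd_value:.4f}, PE = {pe_value:.4f}")
```

In this toy set, splitting the length allele 12 into two isoalleles turns one apparent homozygote into a heterozygote, which raises both PD and PE.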
2.6 Sensitivity study
To demonstrate the sensitivity of the Precision ID GlobalFiler™ NGS STR Panel, a series of dilutions of Control DNA 007 (1000, 500, 250, 100 and 50 pg) was used in triplicate (i.e., 15 samples) for library preparations. To avoid run-to-run variation, libraries were labeled by different barcodes to conduct template preparation and then sequenced on the same Ion 318™ chip.
2.7 Reliability study
Control DNA 007 was amplified in quadruplicate (i.e., 4 samples) at 1.0 ng of input DNA to determine coverage variation (the coverage of each locus/the total coverage of all loci), inter-locus balance (the lowest coverage locus/the highest coverage locus) and ACR variation. To avoid run-to-run variation, all reactions were sequenced in the same run.
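The two coverage metrics defined for the reliability study can be sketched in a few lines of Python. The locus names and read counts below are placeholders chosen for easy arithmetic, not measured values, and the function names are hypothetical.

```python
def coverage_ratios(locus_coverage):
    """Coverage of each locus divided by the total coverage of all loci."""
    total = sum(locus_coverage.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {locus: reads / total for locus, reads in locus_coverage.items()}


def inter_locus_balance(locus_coverage):
    """Lowest-coverage locus divided by highest-coverage locus."""
    highest = max(locus_coverage.values())
    if highest == 0:
        raise ValueError("sample has no reads")
    return min(locus_coverage.values()) / highest


# Placeholder read counts for one replicate (illustrative only).
replicate = {"locus_A": 3000, "locus_B": 600, "locus_C": 1200, "locus_D": 1200}

ratios = coverage_ratios(replicate)
balance = inter_locus_balance(replicate)
print(ratios)
print(f"inter-locus balance = {balance:.2f}")
```

For a 32-marker panel, perfectly even coverage would give every locus a coverage ratio of 1/32 ≈ 0.031, which is a useful reference point when reading per-locus ratios.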
2.8 Case-type and challenged samples
To evaluate the ability to obtain reliable genotypes from typically encountered forensic case-type samples, a set of 12 case-type samples, including 3 blood stains, 2 muscle samples, 2 rooted hair samples, 1 semen stain, 2 cigarette butts and 2 bone samples, was prepared and quantified as described in our previous study [26]. Artificially degraded samples were prepared by incubating 5 ng Control DNA 007 in a 10 μL reaction with 1 μL 10× DNase I buffer and 2 U DNase I (Takara, Dalian, China) at 37 °C for 1, 2 and 4 min [26]. These challenged samples, for which STR profiles had previously been generated [26], were sequenced in triplicate (i.e., 9 samples). The MPS results were compared with the STR profiles generated by CE.
2.9 Mixture study
The mixture study was performed with mixed Control DNA samples with known ratios of DNA. One male DNA sample (Control DNA 007) and one female DNA sample (Control DNA 9947A) were mixed at a total amount of 1 ng in the following ratios: 29:1, 19:1, 9:1, 4:1, 1:1, 1:4, 1:9, 1:19 and 1:29 (in duplicate, i.e., 18 samples).
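To make the mixture design concrete, the following sketch converts each ratio into the amount of DNA contributed by each source at a total input of 1 ng (1000 pg). It assumes that the first number of each ratio refers to Control DNA 007 and the second to Control DNA 9947A; the function name is hypothetical.

```python
def contributor_amounts_pg(ratio_a, ratio_b, total_pg=1000.0):
    """Split a total DNA input between two contributors mixed at ratio_a:ratio_b."""
    parts = ratio_a + ratio_b
    return total_pg * ratio_a / parts, total_pg * ratio_b / parts


# Ratios listed in the mixture study; order assumed to be 007:9947A.
ratios = [(29, 1), (19, 1), (9, 1), (4, 1), (1, 1), (1, 4), (1, 9), (1, 19), (1, 29)]

for a, b in ratios:
    pg_007, pg_9947a = contributor_amounts_pg(a, b)
    print(f"{a}:{b}  007 = {pg_007:.1f} pg, 9947A = {pg_9947a:.1f} pg")
```

Under this assumption the minor contributor supplies 50 pg in the 19:1 and 1:19 mixtures and about 33 pg in the 29:1 and 1:29 mixtures, so the most extreme ratios place the minor component at or below the lowest input tested in the sensitivity study (50 pg).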
3. Results and discussion
The Precision ID GlobalFilerTM NGS STR Panel allows for multiplex amplification of 29 autosomal STRs, one Y-STR, one Y-InDel and the sex determination locus. These loci include the expanded CODIS core STR loci [2,3], D1S1677, D2S1776, D3S4529, D4S2408, D5S2800, D6S474, D6S1043, D12ATA63, D14S1434, rs2032678, DYS391 and Amelogenin. The sizes of the targeted amplicons for STRs range from 129 to 250 base pairs (bps). Meanwhile, a new version (V4.0) of the HID STR Genotyper Plugin was developed specifically for analysis of STR sequencing data obtained from the Ion Torrent platforms. In this study, the evaluation of this MPS-STR panel was completed to determine utility in analyzing a set of forensically relevant genetic biomarkers on reference and challenged samples. This study was divided into five sets of experiments in order to examine the Precision ID GlobalFilerTM NGS STR Panel's concordance with CE, reliability, sensitivity, ability to interpret mixtures, and case-type and challenged samples typing. In addition, 106 unrelated Han
individuals (Han population, the biggest ethnic group, constitutes about 92% of the total population in China) were sequenced to determine forensic parameters.
3.1. Sequencing results and analysis of data quality
Five sequencing runs were performed in total. The first run contained 36 Han samples and Control DNA 007; the mean depth was 1786× (± 293×) and the mapped reads ranged from 54464 to 102956. The second run included 36 Han samples and Control DNA 007; the mean depth was 1841× (± 329×) and the mapped reads ranged from 32874 to 100098. The third run contained 34 Han samples, Control DNA 007 and Control DNA 9947A; the mean depth was 1679× (± 302×) and the mapped reads ranged from 44293 to 100098. The fourth run included 12 case-type samples, 4 samples from the repeatability study, 9 challenged samples and Control DNA 007; the mean depth was 1461× (± 399×) and the mapped reads ranged from 34116 to 92959. The last run included 15 samples from the sensitivity study and 18 samples from the mixture study; the mean depth was 2538× (± 862×) and the mapped reads ranged from 33837 to 147165. Informative metrics, including DoC, ARs and ACRs, were calculated for the sequencing data produced from the 106 Han individuals. The average DoC across the 29 autosomal STRs was 1527× per locus (± 680×, within the range of 635× - 3371×). For rs2032678, DYS391 and Amelogenin, the average DoC was 923× (± 221×, within the range of 589× - 1192×), 626× (± 80×, within the range of 516× - 785×) and 3331× (± 747×, within the range of 1939× - 4738×), respectively. Obvious inter-locus imbalance was observed (Fig. 1). Because this variation in coverage can reduce sample throughput, it points to a need for the manufacturer to further optimize panel primer design and balance primer efficiency. Nevertheless, the depth level in this study is sufficient for reliable MPS-STR genotyping of a single-source reference sample [12,29]. As the Amelogenin locus and Y-InDel (rs2032678) have nearly no stutter and no noise, ARs were not calculated for them.
For the other 30 STRs, genuine allele sequence ratios averaged 91.9%, with the lowest (85.4%) observed at D1S1656 and the highest (98.8%) observed at D3S4529 (Supplemental Fig. 1). Compared with the previous versions (the Ion Torrent™ HID STR 10-plex [9] and the Early Access STR Kit v1 [12]), the ARs of this MPS-STR panel have been improved significantly. The stutter ratio is less than 8% at all loci except D22S1045 (12.4%), D1S1656 (12.2%) and D12S391 (9.1%), and the average of 5.3% is a little lower than in the previous study [12]. The vast majority of stutter reads consist of minus stutter sequences (N-1 repeat motif). The noise ratio is surprisingly narrow (Supplementary Table 1), and X.3 and X.2 variants (e.g., a 1 bp or 2 bp deletion, respectively, at a tetranucleotide repeat locus) are the major forms of noise peaks (Supplementary Table 2). For ACRs, a good balance between the amplicons of heterozygous loci could ensure reliable heterozygote genotyping and facilitate mixture deconvolution. Thirty of the 32 markers (29 autosomal STRs and the Amelogenin locus) were analyzed; ACRs were not calculated for the Y-STR (DYS391) and Y-InDel (rs2032678) as they have only one allele. The mean ACR was 0.80, ranging from 0.53 (D12S391) to 0.93 (TPOX), with values < 0.60 at three loci: D12S391, D12ATA63 and D10S1248 (Fig. 2). Overall, these results, in comparison to previous NGS-STR panels [9,12], indicate a significant improvement in the quality of sequencing data and the data analysis. Sequencing chemistry plays an important role in helping to increase the quality of sequencing data [30]. For this MPS-STR panel, a new version of Hi‑Q™ Sequencing Chemistry (the Precision ID STR Hi‑Q™ Sequencing Solutions) has been developed specifically. In addition, the upgrades and improvements in the new version of the HID STR Genotyper Plugin enhanced the readability and compatibility of sequencing data, and provided flexibility in terms of kit configuration and analysis parameter tuning. Sequence alignment artifacts could be minimized and most noise could be eliminated by the improved alignment algorithm incorporated in the new plugin. However, there is still room for performance improvement, particularly in terms of inter-locus imbalance and biased heterozygote balance; such effects will reduce the sample throughput on one chip and affect the success of interpretation.
3.2. Concordance and sequence variation
Maintaining the ability to determine the traditional length polymorphism of STRs (CE genotypes) promotes backward compatibility with current forensic DNA databases. Sequencing results were assessed for the 106 Han samples and Control DNA 007 and 9947A by comparison with CE profiles. With MPS, alleles were called when the allele read counts were above 200 (heterozygotes) or 500 (homozygotes). With CE, alleles were called when an RFU signal of 175 or higher was observed. While notable allelic imbalance at D12S391, D12ATA63 and D10S1248 (mean ACR < 0.6) was observed (Fig. 2), genotype results were 100% concordant between the two methods for all overlapping loci (31 loci in 108 samples, i.e., 3348 locus comparisons). The results demonstrate the utility and compatibility of HID STR Genotyper Plugin V4.0 for analyzing sequencing data generated from Ion Torrent sequencing platforms. Next, we examined the sequence variation beyond the nominal repeat-length based allele calls and assessed the impact of this variation on the discriminating power. Theoretically, MPS offers the capability to reveal substantial genetic variation of STRs and thus could detect a broader allele range or allele variants. Alleles uniquely identified by the CE method (length polymorphism) comprise 74.2% of the total alleles observed using MPS (sequence polymorphism) in these 29 STRs for the analyzed set of 106 Han samples. However, no new allele was found in this study, and the sequence variation was not evenly dispersed over the loci (Table 1). A full listing of sequence variants obtained at all loci is shown in Supplementary Table 2. Power of discrimination (PD) and power of exclusion (PE) for each locus in the 106 Han samples obtained by sequence analysis and CE analysis are also shown in Table 1. The additional sequence variation has the strongest effect on the discriminating power of D21S11 and D12S391, with an average two-fold difference in the match likelihood between the two methods. However, no additional allele variants were observed at 13 STR loci. Prior to forensic implementation, a greater number of samples than were included in this study will be needed to generate allele frequencies. We are expanding our sequencing efforts to include more samples from distinct ethnic groups to address this need in the near future.
3.2. Reliability study
Reliability of the Precision ID GlobalFiler™ NGS STR Panel was evaluated by examining the reproducibility and repeatability of genotype calls on Control DNA 007. Full profiles and reproducible results for Control DNA 007 (1 ng of input DNA) were obtained across five sequencing runs. Among the four replicates of Control DNA 007 in Run 4, allele calls at all 32 loci (56 alleles) were concordant. The ACRs (24 heterozygous loci) ranged from 0.48 ± 0.03 (D12S391) to 0.95 ± 0.02 (D2S441) (Supplemental Table 3). The variation of ACRs was limited, with a standard deviation < 0.15 except for the loci D5S2800 (± 0.19), D21S11 (± 0.17) and D8S1179 (± 0.15). The inter-locus balance (the lowest coverage locus/the highest coverage locus) was examined among the four reactions of Control DNA 007, and the mean and standard deviation were 0.15 ± 0.008. Although the coverage variation among the four replicates was limited (standard deviation < 0.007), seven loci showed a coverage ratio < 0.020: D18S51 (0.017), DYS391 (0.012), D1S1677 (0.014), D21S11 (0.014), rs2032678 (0.014), D3S4529 (0.013) and FGA (0.012) (Supplemental Table 3). The data support that the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System can generate consistent results in multiple reactions and different sequencing runs. However, notable inter-locus imbalance was observed, indicating that the manufacturer should further optimize the panel.
3.3. Sensitivity
The sensitivity of detection of the Precision ID GlobalFiler™ NGS STR Panel was determined by analyzing Control DNA 007 at five different amounts of input DNA: 1000, 500, 250, 100 and 50 pg. The samples with 50 pg of input DNA (in triplicate) yielded the lowest quantity of amplified libraries (7, 9 and 13 pM). Therefore, all 15 libraries for the sensitivity study were diluted to 7 pM and the input volume was adjusted accordingly for library preparation. The sequencing results at 1.0 ng of input DNA (the recommended input amount) were consistent with the CE genotyping results. The results showed that 1.0 ng of input DNA generated high DoC for all loci (1010× - 7424×), with an average ACR of 0.78 ± 0.15 for 24 heterozygous loci. At 500 pg, full genotypes could be achieved and the DoC ranged from 886× to 7012×, with an average ACR of 0.74 ± 0.18. However, obviously imbalanced (< 0.5) ACRs were observed at D12S391 (Fig. 3A). With 250 pg of input DNA, all alleles could be obtained and the DoC ranged from 681× to 6268×, with an average ACR of 0.73 ± 0.21. Notably imbalanced (< 0.3) ACRs were observed at D12S391 (Fig. 3A). When the input DNA was reduced to 100 pg, one specific locus, D12ATA63, showed notably imbalanced (< 0.3) ACRs (Fig. 3B). The DoC ranged from 455× to 4611×, with some allele coverage < 200 reads (heterozygous loci: D12S391 and FGA) or < 500 reads (homozygous loci: D3S4529 and Y-InDel). Although coverage was lower than the analytical threshold at these four loci, complete loss of an allele was not observed with 100 pg of input DNA (Supplemental Fig. 2). With 50 pg of input DNA, notably imbalanced ACRs and allele dropout were observed (Fig. 3B and Supplemental Fig. 2), making these samples unsuitable for allele calling. These results indicate that 100 pg may be an initial minimum input amount for sequencing, but input amounts of 250 pg and above would be more reliable.
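The threshold logic used when reading low-input profiles (200 reads per allele at heterozygous loci, 500 reads at homozygous loci) can be sketched as follows. The locus names, allele labels and read counts are placeholders rather than the Control DNA 007 profile, and the function is a hypothetical illustration, not the plugin's calling algorithm.

```python
HET_MIN_READS = 200  # per-allele threshold at heterozygous loci
HOM_MIN_READS = 500  # threshold at homozygous (single-allele) loci


def alleles_below_threshold(allele_reads):
    """Return the alleles of one locus whose coverage is under the applicable threshold."""
    threshold = HOM_MIN_READS if len(allele_reads) == 1 else HET_MIN_READS
    return [allele for allele, reads in allele_reads.items() if reads < threshold]


# Placeholder low-input profile: locus -> {allele: reads} (illustrative only).
profile = {
    "locus_A": {"24": 310, "26": 150},  # heterozygous, one allele under 200 reads
    "locus_B": {"13": 455},             # homozygous, under 500 reads
    "locus_C": {"8": 900, "11": 850},   # heterozygous, both alleles above threshold
}

flagged = {}
for locus, allele_reads in profile.items():
    low = alleles_below_threshold(allele_reads)
    if low:
        flagged[locus] = low

print(flagged)
```

An allele flagged this way is still present in the data; it has fallen under the analytical threshold rather than dropped out completely, which is the distinction drawn for the 100 pg samples.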
3.4. Mixtures
Forensic biological samples that involve materials originating from more than one individual are commonly encountered in routine forensic casework. Mixture studies can assist mixture interpretation, including determination of the number of contributors, the major and minor contributor genotypes, and contributor ratios [26]. With MPS, the traditional length polymorphism of STRs can be detected along with elucidation of any sequence variants present within the alleles. Identification of intra-allelic sequence variants could provide opportunities for better mixture deconvolution. For instance, the presence of an intra-allelic sequence variant could help to distinguish a minor contributor from stutter in a mixture or even detect a minor contributor masked by the major contributor [11]. Two Control DNA samples (one female: 9947A and one male: 007) were selected for the mixture study. The DNA ratios were 29:1, 19:1, 9:1, 4:1, 1:1, 1:4, 1:9, 1:19 and 1:29. The total amount of input DNA for each reaction was 1.0 ng. The detailed genotypes of Control DNA 007 and 9947A were used to determine at what ratios both contributors could be detected (Supplementary Table 4). In total, 28 alleles from the Control DNA 9947A and 25 alleles from the Control DNA 007 could be considered as non-overlapping alleles (Supplemental Table 4). In this study, the minimum coverage threshold was adjusted to 50 reads for determination of a minor allele. All alleles contributed by 9947A and 007 were detectable in the 1:1 mixture (Fig. 4). All non-overlapping alleles from the minor contributor could be detected in mixtures 4:1 (or 1:4) through 9:1 (or 1:9), except that three alleles dropped out (below threshold) in the 1:9 mixture (allele 20 at D12S391, allele 30 at D21S11 and allele 14 at D22S1045) (Fig. 5A). In the 1:29 and 29:1 mixtures, only the alleles of the major contributor could be detected at most loci. Fig. 5B shows that 4 alleles at 4 loci were identified from the minor contributor (9947A) of the 29:1 mixture: allele 10 (111×) at CSF1PO, allele 13 (213×) at D14S1434, allele 19 (179×) at D2S1338, and allele 14 (224×) at D3S1358. In the 19:1 and 1:19 mixtures, 8 of the 25 non-overlapping alleles from Control DNA 9947A and 10 of 28 non-overlapping alleles from Control DNA 007 were found (Supplemental Table 4 and Fig. 5C). The calling threshold for minor alleles in this study was set arbitrarily. However, it should be set according to the sequencing noise [31], and thus more work needs to be done to determine the noise level of each locus. The results indicate that the Precision ID GlobalFiler™ NGS STR Panel and the Ion PGM™ System could detect partial minor STR profiles in mixtures up to 1:19.
3.5. Case-type and challenged samples
The ability to obtain reliable and robust results from typically encountered case-type samples could indicate whether this novel MPS-STR assay has the potential to be used as an efficient tool for human identification. Twelve case-type samples were chosen based on data generated for these samples with typical CE methodology [26]. Each sample yielded clear, complete and concordant genotypes with both MPS and CE methods, indicating that the Precision ID GlobalFiler™ NGS STR Panel has good reproducibility and can be applied to various types of samples. A representative histogram for one human muscle DNA sample (1.0 ng) is shown in Supplemental Fig. 3. Forensic biological samples might be exposed to unfavorable environmental factors. Artificially digested DNA samples were examined with this novel panel to determine its capability with degraded samples. The reduction of read numbers and allelic dropout began with larger amplicons. For the Control DNA 007 samples that were incubated for 1 min, all alleles were called, with a reduction of allele reads for most loci compared to the undigested Control DNA 007. For the 2 min incubation, allelic dropout (< 200 reads) occurred at DYS391, FGA and D12S391 in one or two digested samples. For the 4 min incubation, the alleles at FGA, DYS391, rs2032678, D10S1248, D12S391, D18S51 and D21S11 were missing in one or two digested samples. Overall, MPS out-performed CE in terms of the number of alleles obtained (Table 2). The likely reason is the short amplicons in the MPS format. Unlike STR detection in CE, primer design in NGS platforms is more flexible, as amplicon lengths can overlap and amplicons can be kept short.
Conclusions
In this study, the performance of the Precision ID GlobalFiler™ NGS STR Panel on the Ion Torrent PGM™ platform was evaluated. The concordance test demonstrated that the genotypes generated by this novel MPS panel were 100% concordant with those obtained by typical CE methods. The PCR sensitivity study indicated that complete single-source genotypes could be obtained with as little as 100 pg of input DNA (250 pg or more is recommended). The genotypes were reproducible and consistent among multiple typing replicates of a control DNA. Different types of single-source samples (blood stains, muscle samples, hairs with roots, semen stains, cigarette butts and bone samples) were sequenced and readily yielded full profiles, and in the mixture study the system detected partial STR genotypes of the minor contributor in mixtures up to a 19:1 ratio. In general, it is a robust and reliable system for human forensics. While the results herein support the view that STR typing by MPS will become a viable methodology, additional studies are still needed, such as testing more samples to describe the additional sequence variation of STR alleles and determining stutter and noise ratios [31]. In addition, the design of the next version of the MPS-STR panel should follow the guidelines of the ISFG [32].
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgment
This study was supported by grants from the National Natural Science Foundation of the People’s Republic of China (No. 81330073 and 81501635) and the Five-Thirteenth National Science and Technology Support Program of China (2016YFC0800703). We would like to thank Ying Wang, Wei Wu and Yu Liang (Thermo Fisher Scientific) for providing reagents for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
[1] J.M. Butler, Genetics and genomics of core short tandem repeat loci used in human identity testing, J. Forensic Sci. 51 (2006) 253–265.
[2] D.R. Hares, Expanding the CODIS Core Loci in the United States, Forensic Sci. Int. Genet. 6 (2012) e52–54.
[3] D.R. Hares, Selection and implementation of expanded CODIS core loci in the United States, Forensic Sci. Int. Genet. 17 (2015) 33–34.
[4] M.A. Jobling, P. Gill, Encoded evidence: DNA in forensic analysis, Nat. Rev. Genet. 5 (2004) 739–751.
[5] M. Kayser, P. de Knijff, Improving human forensics through advances in genetics, genomics and molecular biology, Nat. Rev. Genet. 12 (2011) 179–192.
[6] K.V. Voelkerding, S.A. Dames, J.D. Durtschi, Next-generation sequencing: from basic research to diagnostics, Clin. Chem. 55 (2009) 641–658.
[7] C. Børsting, N. Morling, Next generation sequencing and its applications in forensic genetics, Forensic Sci. Int. Genet. 18 (2015) 78–89.
[8] C. Gelardi, E. Rockenbauer, S. Dalsgaard, C. Børsting, N. Morling, Second generation sequencing of three STRs D3S1358, D12S391 and D21S11 in Danes and a new nomenclature for sequenced STR alleles, Forensic Sci. Int. Genet. 12 (2014) 38–41.
[9] S.L. Fordyce, H.S. Mogensen, C. Børsting, R.E. Lagacé, C.W. Chang, N. Rajagopalan, et al., Second-generation sequencing of forensic STRs using the Ion Torrent™ HID STR 10-plex and the Ion PGM™, Forensic Sci. Int. Genet. 14 (2015) 132–140.
[10] X. Zeng, J.L. King, M. Stoljarova, D.H. Warshauer, B.L. LaRue, A. Sajantila, et al., High sensitivity multiplex short tandem repeat loci analyses with massively parallel sequencing, Forensic Sci. Int. Genet. 16 (2015) 38–47.
[11] X. Zeng, J. King, S. Hermanson, J. Patel, D.R. Storts, B. Budowle, An evaluation of the PowerSeq™ Auto System: A multiplex short tandem repeat marker kit compatible with massively parallel sequencing, Forensic Sci. Int. Genet. 19 (2015) 172–179.
[12] F. Guo, Y. Zhou, F. Liu, J. Yu, H. Song, H. Shen, et al., Evaluation of the Early Access STR Kit v1 on the Ion Torrent PGM™ platform, Forensic Sci. Int. Genet. 23 (2016) 111–120.
[13] J.D. Churchill, S.E. Schmedes, J.L. King, B. Budowle, Evaluation of the Illumina® Beta Version ForenSeq™ DNA Signature Prep Kit for use in genetic profiling, Forensic Sci. Int. Genet. 20 (2016) 20–29.
[15] K.B. Gettings, K.M. Kiesler, S.A. Faith, E. Montano, C.H. Baker, B.A. Young, et al., Sequence variation of 22 autosomal STR loci detected by next generation sequencing, Forensic Sci. Int. Genet. 21 (2016) 15–21.
[16] E.H. Kim, H.Y. Lee, I.S. Yang, S.E. Jung, W.I. Yang, K.J. Shin, Massively parallel sequencing of 17 commonly used forensic autosomal STRs and amelogenin with small amplicons, Forensic Sci. Int. Genet. 22 (2016) 1–7.
[17] X. Zhao, H. Li, Z. Wang, K. Ma, Y. Cao, W. Liu, Massively parallel sequencing of 10 autosomal STRs in Chinese using the ion torrent personal genome machine (PGM), Forensic Sci. Int. Genet. 25 (2016) 34–38.
[18] S.B. Seo, J.L. King, D.H. Warshauer, C.P. Davis, J. Ge, B. Budowle, Single nucleotide polymorphism typing with massively parallel sequencing, Int. J. Legal Med. 127 (2013) 1079–1086.
[19] C. Børsting, S.L. Fordyce, J. Olofsson, H.S. Mogensen, N. Morling, Evaluation of the Ion Torrent HID SNP 169-plex: a SNP typing assay developed for human identification by second generation sequencing, Forensic Sci. Int. Genet. 12 (2014) 144–154.
[20] M. Eduardoff, C. Santos, M. de la Puente, T.E. Gross, M. Fondevila, C. Strobl, et al., Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™, Forensic Sci. Int. Genet. 17 (2015) 110–121.
[21] M. Eduardoff, T.E. Gross, C. Santos, M. de la Puente, D. Ballard, C. Strobl, et al., Inter-laboratory evaluation of the EUROFORGEN Global ancestry-informative SNP panel by massively parallel sequencing using the Ion PGM™, Forensic Sci. Int. Genet. 23 (2016) 178–189.
[22] Z. Wang, D. Zhou, Y. Cao, Z. Hu, S. Zhang, Y. Bian, et al., Characterization of microRNA expression profiles in blood and saliva using the Ion Personal Genome Machine® System (Ion PGM™ System), Forensic Sci. Int. Genet. 20 (2016) 140–146.
[23] J.L. King, B.L. LaRue, N.M. Novroski, M. Stoljarova, S.B. Seo, X. Zeng, et al., High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq, Forensic Sci. Int. Genet. 12 (2014) 128–135.
[24] J.A. McElhoe, M.M. Holland, K.D. Makova, M.S. Su, I.M. Paul, C.H. Baker, et al., Development and assessment of an optimized next-generation DNA sequencing approach for the mtgenome using the Illumina MiSeq, Forensic Sci. Int. Genet. 13 (2014) 20–29.
[25] W. Parson, C. Strobl, G. Huber, B. Zimmermann, S.M. Gomes, L. Souto, et al., Reprint of: evaluation of next generation mtGenome sequencing using the Ion Torrent Personal Genome Machine (PGM), Forensic Sci. Int. Genet. 7 (2013) 632–639.
[26] Z. Wang, D. Zhou, Z. Jia, L. Li, W. Wu, C. Li, et al., Developmental Validation of the Huaxia Platinum System and application in 3 main ethnic groups of China, Sci. Rep. 6 (2016) 31075.
[27] B.F. Zhu, Y.D. Zhang, C.M. Shen, W.A. Du, W.J. Liu, H.T. Meng, et al., Developmental validation of the AGCU 21+1 STR kit: a novel multiplex assay for forensic application, Electrophoresis 36 (2015) 271–276.
[28] C. Phillips, W. Parson, J. Amigo, J.L. King, M.D. Coble, C.R. Steffen, et al., D5S2500 is an ambiguously characterized STR: Identification and description of forensic microsatellites in the genomics age, Forensic Sci. Int. Genet. 23 (2016) 19–24.
[29] K.J. van der Gaag, R.H. de Leeuw, J. Hoogenboom, J. Patel, D.R. Storts, J.F. Laros, et al., Massively parallel sequencing of short tandem repeats-Population data and mixture analysis results for the PowerSeq™ system, Forensic Sci. Int. Genet. 24 (2016) 86–96.
[30] J.D. Churchill, J.L. King, R. Chakraborty, B. Budowle, Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality, Int. J. Legal Med. 130 (2016) 1169–1180.
[31] B. Young, J.L. King, B. Budowle, L. Armogida, A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis, PLoS One 12 (2017) e0178005.
[32] W. Parson, D. Ballard, B. Budowle, J.M. Butler, K.B. Gettings, P. Gill, et al., Massively parallel sequencing of forensic STRs: Considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements, Forensic Sci. Int. Genet. 22 (2016) 54–63.
Fig. 1. Average depth of coverage (DoC) for the 32 loci in the Precision ID GlobalFiler™ NGS STR Panel for the 106 Han individuals amplified with 1.0 ng of DNA. Error bars represent standard deviation.
Fig. 2. Average allele coverage ratios (ACRs) for the 30 STRs in the Precision ID GlobalFiler™ NGS STR Panel for the 106 Han individuals amplified with 1.0 ng of DNA. ACRs were calculated by dividing the coverage of the lower-coverage allele by that of the higher-coverage allele at each locus. Error bars represent standard deviation.
Fig. 3. Allelic imbalance and dropout at D12S391 and D12ATA63 in the sensitivity test. (A) As the template DNA decreased, allelic imbalance increased at D12S391, and dropout of one allele was observed at 50 pg. (B) Notable allelic imbalance was observed at D12ATA63 with 100 pg of input DNA, and dropout of one allele occurred when the template DNA was reduced to 50 pg. The x-axis shows the alleles at a given locus and the y-axis shows the coverage of each allele. The analytical thresholds are 200 for heterozygotes and 500 for homozygotes. A: allele, S: stutter, N: noise.
Fig. 4. A histogram of the allele DoC ratio by locus for a 1:1 mixture of two Control DNAs (9947A and 007). S: stutter, N: noise.
Fig. 5. Histograms of allelic dropout and minor allele detection in the mixture test. (A) Allelic dropout was observed at D12S391 (allele 20), D21S11 (allele 30) and D22S1045 (allele 14) in the 1:9 mixture. (B) The alleles of the minor contributor (alleles 10, 13, 14 and 19 at CSF1PO, D14S1434, D3S1358 and D2S1338, respectively) were detected in the 29:1 mixture. (C) Eight non-overlapping alleles of the minor contributor were identified in the 19:1 mixture. The minimum coverage threshold for calling a minor allele is 50. The arrows marked M indicate alleles from the minor contributor. D: dropout, S: stutter, N: noise.
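The ACR definition in the Fig. 2 caption and the analytical thresholds in the Fig. 3 caption can be illustrated with a short Python sketch. The function names and example coverages are hypothetical. The calling logic is also a deliberate simplification: it assumes a single-source sample with stutter and noise already filtered out, which is not necessarily how the analysis software used in this study applies its thresholds.

```python
HET_THRESHOLD = 200  # reads, heterozygote threshold from the Fig. 3 caption
HOM_THRESHOLD = 500  # reads, homozygote threshold from the Fig. 3 caption


def allele_coverage_ratio(cov_a, cov_b):
    """ACR: coverage of the lower allele divided by that of the higher one."""
    high = max(cov_a, cov_b)
    if high <= 0:
        raise ValueError("coverage must be positive")
    return min(cov_a, cov_b) / high


def call_locus(coverage):
    """Classify one locus from a dict mapping allele -> read coverage.

    Assumption: single-source sample, artifacts already removed.
    """
    passing = sorted(a for a, reads in coverage.items() if reads >= HET_THRESHOLD)
    if len(passing) == 2:
        return "heterozygote", passing
    if len(passing) == 1 and coverage[passing[0]] >= HOM_THRESHOLD:
        return "homozygote", passing
    if len(passing) > 2:
        return "more than two alleles", passing
    # A lone allele between the two thresholds may hide a dropped-out partner.
    return "no call", passing


# Hypothetical coverages for one locus at decreasing DNA input.
balanced = {18: 850, 20: 640}
imbalanced = {18: 350, 20: 150}

acr_balanced = allele_coverage_ratio(balanced[18], balanced[20])
```

With these invented numbers, `call_locus(balanced)` returns a heterozygote call, whereas `call_locus(imbalanced)` returns no call because the only allele above 200 reads stays below the 500-read homozygote threshold.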
Table 1 Locus statistics for CE and MPS analysis of the 106 Han samples.
Locus      No. of alleles a  No. of alleles b  Increment     PD a   PE a   PD b   PE b
D12S391    12                31                +19 (158.3%)  0.947  0.794  0.974  0.850
D2S1338    11                27                +16 (145.5%)  0.960  0.812  0.970  0.850
D21S11     14                32                +18 (128.6%)  0.929  0.648  0.977  0.812
D3S1358    5                 9                 +4 (80.0%)    0.841  0.530  0.905  0.648
D8S1179    9                 16                +7 (77.8%)    0.956  0.719  0.971  0.831
vWA        7                 12                +5 (71.4%)    0.926  0.665  0.931  0.683
D1S1656    11                15                +4 (36.4%)    0.946  0.648  0.952  0.649
CSF1PO     8                 9                 +1 (12.5%)    0.886  0.498  0.892  0.514
D2S441     9                 12                +3 (33.3%)    0.905  0.410  0.942  0.579
D4S2408    6                 8                 +2 (33.3%)    0.890  0.438  0.920  0.530
D3S4529    6                 7                 +1 (16.7%)    0.892  0.579  0.894  0.579
D12ATA63   7                 8                 +1 (14.3%)    0.889  0.453  0.891  0.454
D5S2800    7                 8                 +1 (14.3%)    0.846  0.498  0.862  0.498
D6S1043    11                12                +1 (14.3%)    0.960  0.775  0.960  0.775
D7S820     7                 8                 +1 (14.3%)    0.921  0.613  0.921  0.613
D14S1434   8                 9                 +1 (12.5%)    0.844  0.613  0.845  0.613
D10S1248   7                 7                 - (0%)        0.891  0.468  0.891  0.468
D13S317    8                 8                 - (0%)        0.928  0.596  0.928  0.596
D16S539    7                 7                 - (0%)        0.918  0.530  0.918  0.530
D18S51     11                11                - (0%)        0.955  0.756  0.955  0.756
D19S433    11                11                - (0%)        0.947  0.738  0.947  0.738
D1S1677    8                 8                 - (0%)        0.800  0.370  0.800  0.370
D22S1045   7                 7                 - (0%)        0.876  0.483  0.876  0.483
D2S1776    8                 8                 - (0%)        0.859  0.630  0.859  0.630
D5S818     8                 8                 - (0%)        0.915  0.579  0.915  0.579
D6S474     5                 5                 - (0%)        0.841  0.498  0.841  0.498
FGA        15                15                - (0%)        0.957  0.630  0.957  0.630
TH01       6                 6                 - (0%)        0.829  0.453  0.829  0.453
TPOX       5                 5                 - (0%)        0.784  0.226  0.784  0.226
a Obtained by length (CE).
b Obtained by sequence (MPS). Allele variants are treated as different alleles.
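The Increment column of Table 1 appears to be the gain in distinct alleles when alleles are defined by sequence rather than by length, expressed as a count and as a percentage of the length-based count. The Python sketch below reproduces it for four loci. The function name is hypothetical, and the formula is inferred from the tabulated values rather than stated in the text.

```python
def allele_increment(n_length, n_sequence):
    """Return (absolute gain, percentage gain relative to the CE count)."""
    gain = n_sequence - n_length
    return gain, round(100.0 * gain / n_length, 1)


# (alleles by length, alleles by sequence), copied from Table 1.
table1_counts = {
    "D12S391": (12, 31),
    "D2S1338": (11, 27),
    "D21S11": (14, 32),
    "D3S1358": (5, 9),
}

increments = {
    locus: allele_increment(*counts) for locus, counts in table1_counts.items()
}

for locus, (gain, pct) in increments.items():
    print(f"{locus}: +{gain} ({pct}%)")
```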
Table 2 The total number of alleles observed with CE and MPS for the degraded samples.
Samples            Alleles observed with CE a  Alleles observed with MPS
1 min incubation   46 (100%)                   56 (100%)
1 min incubation   46 (100%)                   56 (100%)
1 min incubation   46 (100%)                   56 (100%)
2 min incubation   45 (97.8%)                  54 (96.4%)
2 min incubation   45 (97.8%)                  55 (98.2%)
2 min incubation   45 (97.8%)                  55 (98.2%)
4 min incubation   36 (78.3%)                  50 (89.3%)
4 min incubation   35 (76.1%)                  49 (87.5%)
4 min incubation   37 (80.4%)                  50 (89.3%)
a Results from the Huaxia Platinum System.
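The percentages in Table 2 can be checked, and the replicates pooled per incubation time, with a few lines of Python. This is an illustrative sketch: the full-profile totals of 46 (CE) and 56 (MPS) are taken from the rows reported as 100%, and pooling replicates by summing their counts is an assumption of this example, not a calculation reported in the study.

```python
CE_TOTAL, MPS_TOTAL = 46, 56  # allele counts reported as 100% in Table 2

# Incubation time (min) -> (CE alleles, MPS alleles) for each replicate.
observed = {
    1: [(46, 56), (46, 56), (46, 56)],
    2: [(45, 54), (45, 55), (45, 55)],
    4: [(36, 50), (35, 49), (37, 50)],
}


def recovery(count, total):
    """Percentage of expected alleles recovered, rounded to one decimal."""
    return round(100.0 * count / total, 1)


def pooled_recovery(replicates):
    """Recovery over all replicates of one time point as (CE %, MPS %)."""
    n = len(replicates)
    ce = sum(c for c, _ in replicates)
    mps = sum(m for _, m in replicates)
    return recovery(ce, CE_TOTAL * n), recovery(mps, MPS_TOTAL * n)


pooled = {minutes: pooled_recovery(reps) for minutes, reps in observed.items()}
```

Pooled in this way, the 4 min samples retain 78.3% of alleles with CE and 88.7% with MPS, while the 2 min samples are nearly identical between the two methods (97.8% and 97.6%).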