Forensic Science International: Genetics 1 (2007) 180–185 www.elsevier.com/locate/fsig
Evaluation of the Genplex SNP typing system and a 49plex forensic marker panel
C. Phillips a,*, R. Fang b, D. Ballard c, M. Fondevila a, C. Harrison c, F. Hyland b, E. Musgrave-Brown c, C. Proff d, E. Ramos-Luis a, B. Sobrino a, A. Carracedo a, M.R. Furtado b, D. Syndercombe Court c, P.M. Schneider d, The SNPforID Consortium
a Forensic Genetics Department, Genomic Medicine Group, University of Santiago de Compostela, Galicia, Spain
b Applied Markets Group, Applied Biosystems, Foster City, CA, USA
c Department of Haematology, ICMS, Queen Mary’s School of Medicine & Dentistry, London E1 2AT, UK
d Institute of Legal Medicine, University of Cologne, Germany
Received 29 January 2007; accepted 3 February 2007
Abstract
Using a 52-SNP marker set previously developed for forensic analysis, a novel 49plex assay has been developed based on the Genplex typing system, a modification of SNPlex™ chemistry (both Applied Biosystems) that uses oligo-ligation of pre-amplified DNA and dye-labeled, mobility-modified detection probes. This gives highly predictable electrophoretic mobility of the allelic products generated by the assay, allowing detection with standard capillary electrophoresis analyzers. The loci chosen comprise the 48 most informative autosomal SNPs from the SNPforID core discrimination set, supplemented with the amelogenin gender marker. These SNPs are evenly distributed across all 22 autosomes, exhibit balanced polymorphisms in three major population groups and have previously been shown to be effective markers for forensic analysis. We tested the accuracy and reproducibility of the Genplex system in three SNPforID laboratories, each using a different Applied Biosystems Genetic Analyzer. Genotyping concordance was measured using replicates of 44 standardized DNA controls and by comparing genotypes for the same samples generated by the TaqMan®, SNaPshot® and Sequenom iPLEX® SNP typing systems. The informativeness of the 48 SNPs for forensic analysis was measured using previously estimated allele frequencies to derive the cumulative match probability, and in paternity analysis using 24 trios previously typed with 18 STRs together with three CEPH families with extensive sibships typed with the 15 STRs in the Identifiler® kit.
© 2007 Elsevier Ireland Ltd. All rights reserved.
Keywords: SNP; Genotyping; Oligo-ligation assay; OLA; Genplex
1. Introduction
The SNPforID consortium (www.snpforid.org) has been funded by the EU Growth programme to develop single nucleotide polymorphisms (SNPs) for forensic use. A major outcome of the development work of SNPforID was the creation of a core single-tube 52plex PCR that can be used as the preparatory step in a variety of assays, each based on different SNP genotyping chemistries [1]. Although the
* Corresponding author. Tel.: +34 981582327; fax: +34 981580336. E-mail address:
[email protected] (C. Phillips). URL: www.snpforid.org
1872-4973/$ – see front matter © 2007 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.fsigen.2007.02.007
consortium chose the SNaPshot® (AB: Applied Biosystems, Foster City, CA, USA) single base extension system as the ‘‘benchmark’’ technique for analysis of forensic samples and for the allele frequency validation process [2], peak imbalances and high background signal with this assay have hampered its application in the analysis of challenging DNA. Furthermore, although the AB SNPlex™ oligo-ligation system was considered a potentially useful technique because it can genotype up to 48 SNPs, it was evident that this approach lacks sufficient sensitivity for forensic analysis, requiring 37 ng of DNA to ensure successful ligation. Using oligo-ligation as a forensic technique only became a viable prospect once the system was adapted to begin with a pre-amplification of SNPs prior to the ligation stage. This is easier to achieve with fixed SNP sets
where the pre-amplification step can be carefully adjusted to give a balanced yield of target DNAs for ligation. The appropriate step was therefore to incorporate the previously optimized 52plex PCR as the first stage in an adapted SNPlex™ system (termed Genplex), using a universal dye-labeled probe set to detect the alleles of each SNP with capillary electrophoresis analyzers.
This report details the Genplex assay developed by AB to genotype 48 of the 52 SNPforID loci (plus the amelogenin gender marker) and outlines the concordance studies performed to measure the accuracy of the technique. In addition, the informativeness of the 48-SNP set is compared with that of widely used STR marker sets using standard parameters: match probabilities in European and African populations, together with exclusion probabilities and paternity indices estimated from both normal trios and extended families, the latter allowing first-degree relatives of the true father to be compared.
2. Materials and methods
2.1. Selection of SNP markers for Genplex
Genplex routinely analyzes 48 SNPs, so 4 of the original 52 SNP markers were not incorporated: rs2016276, rs826472, rs2830795 and rs1028528. SNP rs2016276 had several proximal SNPs, previously avoided in the SNaPshot® extension primer design, that were likely to interfere with the binding of ligation primers on both sides of the substitution site. The other three SNPs had been given revised positions in dbSNP build 118 (NCBI genome build 34) that made them less well separated from other, more informative SNPs retained in the Genplex set. Amelogenin was added as a gender marker, with the assay able to differentiate the 6 base pair (bp) deletion of the X chromosome from the full Y sequence using the same oligo-ligation chemistry used to analyze the SNP substitutions.
2.2. DNA samples
Genotyping concordance was measured using a standardized Applied Biosystems control plate available as a common reference panel for the SNPlex™ system (PN 4366135) and containing 44 duplicated DNA samples sourced from the European Collection of Cell Cultures (www.ecacc.org.uk). The control plate was typed with Genplex in all three laboratories to gauge between-replicates concordance. In addition, the plate was genotyped in parallel with TaqMan® assays as part of the assay optimization process at AB, and in one of the SNPforID laboratories with the SNaPshot® and Sequenom iPLEX® MALDI-TOF SNP genotyping systems for cross-platform concordance analysis. The informativeness of the 48 SNPs in paternity analysis was studied using 24 German paternity trios previously analyzed with 18 STRs together with 3 extended CEPH Utah families: 1333, 1340 and 1345, each comprising three generations of European origin and sibships of 9, 4 and 7 offspring, respectively (http://ccr.coriell.org/nigms/nigms_cgi/fam.cgi?1333).
2.3. The Genplex assay
The main principle of the Genplex and SNPlex™ assay systems is to identify the products of a 48plex oligo-ligation assay (OLA) using probes specific to each SNP/allele combination [3]. Specificity is controlled using a standard set of 96 non-human sequences: one at the end of each allele-specific oligo. These sequences in turn hybridize to complementary oligos termed ZipChute® probes that carry FAM™ or dR6G dye labels and proprietary mobility modifiers. The Genplex assay comprises the following reaction steps (outlined in Fig. 1, with reference to the numbered steps below) performed in 96-well micro-titre plate (MTP) format:
(i) PCR: pre-amplification of 48 SNPs used the same primers developed for the core discrimination set PCR, designed to give amplicon lengths of 59–115 bp as previously described [1], plus a primer pair for amelogenin developed de novo by AB (giving a 170 bp product).
(ii) Post-PCR cleanup: removal of unincorporated bases and primers with ExoSAP-IT® (USB Corp., Cleveland, OH, USA).
(iii) OLA: ligation of biotinylated locus-specific oligos (LSOs), binding to the sequence immediately downstream of the SNP site (20 bases), with allele-specific oligos (ASOs) binding directly to the alleles and upstream sequence. Each ASO is identified by a pre-assigned reporting sequence complementary to a ZipChute® detection probe.
(iv) Binding of OLA products to capture plate: a streptavidin-coated MTP binds the biotinylated OLA products to provide a solid phase for ZipChute® hybridization.
Fig. 1. Graphical summary of Genplex assay steps. Numerals refer to the description in the text of each step.
(v) ZipChute® probe hybridization: captured OLA products are hybridized with a set of ZipChute® identifying probes, each with a designated mobility and colour label regulated by proprietary chemical mobility modifiers and FAM™/dR6G dyes.
(vi) ZipChute® elution and electrophoresis: the ZipChutes® are eluted from capture plates and combined with sample loading mix containing LIZ®-labeled size standard. Samples are transferred to the electrophoresis plate with one negative target control (NTC) and multiple allelic ladder wells. Products are separated with capillary electrophoresis using POP-7™ polymer for 15 min.
The oligo-ligation reagents, capture plates, ZipChute® probe set, size ladder and allelic ladder are common, universal components from the SNPlex™ system. The components specific to Genplex and this SNP combination are therefore the PCR primer pool and the oligo-ligation oligonucleotide pool. The additional amelogenin marker made use of extra ZipChute® probes that have been retained in the SNPlex™ system.
2.4. Capillary electrophoresis and GeneMapper™ analysis software
Electrophoresis of Genplex and SNPlex™ products requires a 3130 or 3730 Genetic Analyzer, POP-7™ polymer and GeneMapper™ v.4.0 analysis software. A different AB Genetic Analyzer was used in each SNPforID laboratory: a 3730xl (96 capillaries), a 3130xl (16) and a 3130Avant (4). GeneMapper™ software performs the same role as GeneScan® and Genotyper® in routine use for STR analysis and similarly assigns alleles on the basis of relative peak heights. However, rather than relying on user-defined peak height ratios that would normally take into account non-specific baseline signals and PCR byproducts such as stutter peaks, GeneMapper™ software automatically types SNP alleles by comparing the positions of sets of data-points in three clusters (two homozygotes and one heterozygote) on a 2D plot of log signal strength and peak height ratio.
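The cluster idea can be illustrated with a minimal sketch. It is not GeneMapper™'s algorithm, which the text does not describe in detail. The definition of the peak height ratio as the allele 1 fraction, the per-SNP cluster centres, and the minimum signal threshold are all assumptions made for illustration; the function names are hypothetical.

```python
import math

def cluster_coordinates(h1, h2):
    """Return (log10 total signal, allele-1 fraction) for one sample.

    h1 and h2 are the peak heights of allele 1 and allele 2. Using the
    allele-1 fraction as the 'peak height ratio' is an assumption.
    """
    total = h1 + h2
    return math.log10(total), h1 / total

def call_genotype(h1, h2, centres, min_log_signal=2.0):
    """Assign a sample to the nearest of three per-SNP cluster centres.

    centres maps a genotype label to an expected allele-1 fraction
    (hypothetical values that would be learned per SNP from many samples).
    Returns None when the total signal is too weak to call.
    """
    if h1 + h2 <= 0:
        return None
    log_signal, ratio = cluster_coordinates(h1, h2)
    if log_signal < min_log_signal:
        return None
    return min(centres, key=lambda genotype: abs(centres[genotype] - ratio))

# Hypothetical centres for a SNP whose heterozygotes are biased to allele 2.
example_centres = {"1/1": 0.95, "1/2": 0.45, "2/2": 0.05}
print(call_genotype(900, 1100, example_centres))
```

Because the centres are specific to each SNP, a consistent heterozygote imbalance or a consistent minor signal in homozygotes shifts the cluster positions rather than causing miscalls, which is the behaviour the surrounding text attributes to cluster-based typing.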
Cluster-based typing is the optimum approach to automated SNP typing with OLA for two reasons: peak heights from ligation products rarely show perfect balance in heterozygous samples, and homozygotes often give minor non-specific signals (typically a peak of significant but consistent height in the position of the absent allele). Cluster plot analysis in GeneMapper™ software allows the comparison of multiple data-points, either from the same run or imported from previous runs, to reveal that the heterozygote imbalance or non-specific signal is consistent for any one SNP across all samples and all assay runs. Automatic allele calls can be manually edited after visual inspection of each cluster plot, and Genplex offers the advantage of using fixed SNP sets whose cluster plot characteristics become increasingly familiar to the user.
2.5. Alternative SNP genotyping systems
TaqMan® real-time PCR assays were designed for 43 of the SNPs using standard protocols [4]. SNaPshot® genotyping was
performed using a single 52plex PCR amplification and two tandem 23plex and 29plex primer extension reactions as previously described [1]. The third alternative system, Sequenom iPLEX® MALDI-TOF spectrometry [5], uses a multiplexed primer extension reaction (typically 24–28plex) with mass-modified terminating nucleotides. This allows fine-tuning of the resolution between each extension product in the mass spectrum together with a broadened range of product peaks within the lower and upper mass limits. Two iPLEX® assays typed 39 of the 48 Genplex SNPs. German paternity trios were typed using the SEfiler™ (AB) and Powerplex 16™ (Promega Corp., Madison, WI, USA) STR systems, and the CEPH families were typed using the Identifiler® STR system (AB).
2.6. Forensic and paternity analysis informativeness metrics
The informativeness of the 48-SNP set for forensic profiling was measured by estimating the match probability, obtained by combining across loci the per-locus probability $p^4 + 4p^2q^2 + q^4$, with allele frequencies $p$ and $q$. The ability of SNPs and STRs to distinguish first-degree relatives was assessed by recording the genotype differences observed in the CEPH families between full sibs and unrelated individuals. Statistical analysis of STR and SNP genotypes of the German paternity trios was carried out using DNAVIEW v.28.01 (C.H. Brenner, Oakland, CA, USA). Paternity informativeness statistics derived from the CEPH family data comprised: the average probability of exclusion (using the 20 offspring), $\mathrm{LOD}_{\mathrm{father}}$ (i.e., likelihood of data if the individual is the father, equivalent to paternity index with known mother) and the value $[\mathrm{LOD}_{\mathrm{father}} - \mathrm{LOD}_{\mathrm{next\ closest\ relative}}]$, a parameter that measures the ability of a set of loci to differentiate the true father from wrongly named first-degree relatives.
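As a worked illustration of the match probability defined earlier in this section, the sketch below evaluates the per-locus term for a biallelic SNP and multiplies it across loci. It assumes independent loci and Hardy-Weinberg genotype proportions (the three genotype frequencies $p^2$, $2pq$ and $q^2$, squared and summed, give the formula in the text). The function names and the example frequencies are hypothetical and are not the SNPforID estimates.

```python
import math

def locus_match_probability(p):
    """Per-locus match probability p^4 + 4p^2q^2 + q^4 for a biallelic SNP."""
    q = 1.0 - p
    return p**4 + 4 * p**2 * q**2 + q**4

def cumulative_match_probability(allele_frequencies):
    """Product of per-locus match probabilities, assuming independent loci.

    Summing log10 values avoids underflow and matches the log10 scale
    commonly used when plotting cumulative match probability.
    """
    log_total = sum(math.log10(locus_match_probability(p))
                    for p in allele_frequencies)
    return 10 ** log_total

# Hypothetical example: 48 SNPs that all have a 0.5 allele frequency,
# the most informative case for a biallelic marker.
print(locus_match_probability(0.5))
print(cumulative_match_probability([0.5] * 48))
```

A perfectly balanced SNP gives a per-locus value of 0.375, so each additional SNP of that kind lowers the cumulative value by roughly 0.43 log10 units; less balanced SNPs contribute less.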
The paternity metrics were all calculated using CERVUS v2.0 (http://helios.bto.ed.ac.uk/evolgen/cervus), a program that specializes in examining the feasibility of novel marker sets for paternity analysis.
3. Results and discussion
Table 1 summarizes the genotyping success of Genplex achieved in the three laboratories and concordance values with replicated samples and across platforms. Complete concordance was observed between Genplex and the three alternative platforms tested. Typical cluster plots and sample electropherograms are shown in Fig. 2. These were consistent in pattern between all three laboratories, i.e., individual SNPs exhibited very similar cluster distributions, underlining the principle that comparative peak heights are influenced more by the dynamics of PCR and ligation at each locus than by factors specific to an assay run, although the same does not apply to signal strength. The inclusion of a minimum total number of data-points (20 or more) was an essential factor in generating good quality clusters and therefore reliable automated allele calls. One sample pair in the control plate, L38, showed peak height imbalance for SNP rs938283 in both Genplex (Fig. 2F) and iPLEX® assays, leading to data-points being excluded from
Table 1
Genotyping success and concordance for Genplex between replicated controls, plus between Genplex and three alternative SNP typing systems

Concordance study performed              | SNPs assayed | Paired genotype comparisons | Failed genotypes | % genotyping success | Discordant genotypes | % concordance
-----------------------------------------+--------------+-----------------------------+------------------+----------------------+----------------------+--------------
Genplex between-replicates laboratory 1  | 48           | 2112                        | 3                | 99.93                | 0                    | 100
Genplex between-replicates laboratory 2  | 48           | 2112                        | 1                | 99.98                | 0                    | 100
Genplex between-replicates laboratory 3  | 43           | 1892                        | 141              | 96.27                | 0                    | 100
Genplex:TaqMan®                          | 43           | 989                         | 11               | 99.44                | 0                    | 100
Genplex:SNaPshot®                        | 48           | 2064                        | 21               | 99.49                | 0                    | 100
Genplex:iPLEX®                           | 39           | 1677                        | 150              | 95.53                | 0                    | 100
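The percentages in Table 1 can be cross-checked with a few lines of arithmetic. The text does not state how genotyping success was computed; the sketch below assumes that each paired comparison involves two genotypes, so the denominator is twice the number of paired comparisons. Under that assumption the published values are reproduced for all six rows. The function name is hypothetical.

```python
# Rows from Table 1:
# (study, paired genotype comparisons, failed genotypes, published % success)
TABLE_1 = [
    ("Genplex between-replicates laboratory 1", 2112, 3, 99.93),
    ("Genplex between-replicates laboratory 2", 2112, 1, 99.98),
    ("Genplex between-replicates laboratory 3", 1892, 141, 96.27),
    ("Genplex:TaqMan", 989, 11, 99.44),
    ("Genplex:SNaPshot", 2064, 21, 99.49),
    ("Genplex:iPLEX", 1677, 150, 95.53),
]

def genotyping_success(paired_comparisons, failed_genotypes):
    """Percent of genotypes obtained.

    Assumes two genotypes per paired comparison; this denominator is an
    inference from the published numbers, not a stated method.
    """
    total_genotypes = 2 * paired_comparisons
    return round(100.0 * (1 - failed_genotypes / total_genotypes), 2)

for study, pairs, failed, published in TABLE_1:
    computed = genotyping_success(pairs, failed)
    print(f"{study}: computed {computed:.2f}, published {published:.2f}")
```

The paired comparison counts for the replicate studies are also consistent with 44 control samples multiplied by the number of SNPs assayed (44 × 48 = 2112 and 44 × 43 = 1892).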
Fig. 2. Typical cluster plots and examples of corresponding electropherogram segments for six SNPs genotyped by Genplex and analyzed by GeneMapper™ v.4.0 software. Plots A and B represent the two extremes of heterozygote imbalance found in this SNP set (bias towards higher allele 2 and allele 1 signals, respectively). All electropherograms are set at the same vertical scale of 3000 RFUs. Plot F shows two outlier data-points (light blue crosses) for SNP rs938283 observed in all three laboratories for control DNA L38.
Fig. 3. Cumulative match probability values plotted for increasing numbers of loci from the 48 Genplex SNPs. Horizontal lines show equivalent values for two widely used STR sets, indicating, for example, that SGM+ has discrimination power in Europeans comparable to that of 34 SNPs.
the nearest cluster in both platforms. The flanking sequence of rs938283 in this sample is currently being investigated for possible proximal SNP sites not previously observed by SNPforID during the 52-SNP validation process [1].
The 48 SNPs in Genplex give match probabilities of $9.6 \times 10^{-18}$ for Europeans and $6.9 \times 10^{-16}$ for Africans using relevant estimates from the allele frequency browser [2]. Plots of log10 match probability with increasing SNP number are given in Fig. 3, with estimates for two widely used STR sets included for comparison. Overall, the Genplex SNP set provides match probabilities comparable to those of Identifiler®, the most informative STR set, although lower variability in Africans for some of the SNPs reduces informativeness slightly.
Pair-wise analysis of the 40 CEPH family members gave a mean number of SNP loci with different genotypes of 28.4 ± 3.6 in unrelated individuals and 20 ± 2.9 in full sibs, with a minimum of 15 loci different. The equivalent mean numbers for STR loci are 14.3 ± 1.2 and 10.9 ± 2.4, with a minimum of 7 loci different. Pair-wise comparison grids and plots of loci showing the varying numbers of genotype differences are outlined in Fig. 4.
The power of the 48-SNP set in paternity analysis compared to STR marker sets was assessed in both the German trios and the CEPH families. One noteworthy result observed in three separate German trios was the occurrence of single second order exclusions in STR systems: D21S11 (maternal mutation, PI = 2.88E+10), FGA (paternal null allele, residual PI = 1.94E+10) and SE33 (paternal mutation, residual PI = 6.37E+10). All trios, including the above three, showed consistent SNP results with an average PI of 1.21E+6 for SNPs
Fig. 4. Pair-wise comparison grids and distributions of genotype difference values with 15 Identifiler® STRs and 48 Genplex SNPs for the 40 individuals in the three CEPH extended families studied. Sib pairs are marked in the right-hand SNP grid with red crosses.
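The pair-wise comparison behind grids of this kind amounts to counting, for every pair of individuals, the loci at which their genotypes differ. The sketch below shows one way to do this. The data layout (a dictionary of locus name to an unordered allele pair), the function names and the toy profiles are assumptions for illustration and are not the study data.

```python
from itertools import combinations

def genotype(allele_a, allele_b):
    """Store a genotype as a sorted tuple so that A/G equals G/A."""
    return tuple(sorted((allele_a, allele_b)))

def count_differences(profile_a, profile_b):
    """Number of loci typed in both profiles whose genotypes differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci
               if profile_a[locus] != profile_b[locus])

def pairwise_difference_grid(profiles):
    """Map each unordered pair of individuals to their difference count."""
    return {(a, b): count_differences(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}

# Toy profiles at three hypothetical loci.
toy_profiles = {
    "sib1": {"snp1": genotype("A", "G"), "snp2": genotype("C", "C"),
             "snp3": genotype("T", "T")},
    "sib2": {"snp1": genotype("G", "A"), "snp2": genotype("C", "T"),
             "snp3": genotype("T", "T")},
    "unrelated": {"snp1": genotype("G", "G"), "snp2": genotype("T", "T"),
                  "snp3": genotype("T", "T")},
}
for pair, differences in pairwise_difference_grid(toy_profiles).items():
    print(pair, differences)
```

Restricting the count to loci typed in both profiles keeps failed genotypes from being scored as differences, which is a design choice of this sketch rather than something the text specifies.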
Fig. 5. Median $\mathrm{LOD}_{\mathrm{father}}$ and $[\mathrm{LOD}_{\mathrm{father}} - \mathrm{LOD}_{\mathrm{next\ closest\ relative}}]$ values (midline in each box-plot) for 48 Genplex SNPs and 15 Identifiler® STRs obtained from comparison of parents/grand-parents of CEPH families as putative fathers in paternity analysis. Boxes represent the lower and upper quartiles of the values obtained and extended bars the full data range.
in comparison to 2.12E+11 for 18 STR loci. Average exclusion probabilities were $9.5 \times 10^{-4}$ for SNPs and $7.9 \times 10^{-9}$ for STRs. These values indicate that SNPs show much lower power than STRs in paternity analysis, but the observation of three STR incompatibilities in such a small sample underlines the greater instability of tandem repeat loci compared with substitution polymorphisms (although 18 STRs is a larger number than normally used). As with the German trios, CEPH family SNP genotyping was less informative than STRs, with an average SNP PI of 4.16E+4 and an average STR PI of 2.07E+6. The average exclusion probabilities were $2.1 \times 10^{-5}$ for SNPs and $9 \times 10^{-7}$ for the 15 STRs of Identifiler®. The LOD values obtained with the CEPH families are plotted in Fig. 5: median $\mathrm{LOD}_{\mathrm{father}}$ values of 10.22 were obtained for SNPs and 12.96 for STRs. In contrast, when CEPH family members related to the true father were analyzed, the much larger number of loci in the SNP set compared to 15 STRs led to increased power to distinguish related individuals in paternity analysis: median $[\mathrm{LOD}_{\mathrm{father}} - \mathrm{LOD}_{\mathrm{next\ closest\ relative}}]$ values obtained were 9.24 for SNPs and 4.96 for STRs.
As a test of the reproducibility of Genplex, parallel genotyping of standardized test samples in three laboratories, two without prior experience of oligo-ligation, showed this assay to be both reliable and robust. Generally there was little need for intervention to edit the automatic allele calls made by GeneMapper™ software. Comparison of genotyping performance with the same samples on alternative platforms indicated Genplex to be completely concordant with established systems and the most successful approach to SNP genotyping of the four. Estimates of the informativeness of the 48 SNPs successfully incorporated into the Genplex assay indicate that these loci offer comparable power to STRs for routine forensic
profiling and, while less informative for normal paternity analysis, clearly provide greater ability than STR analysis to distinguish closely related individuals as putative fathers. The second phase of the validation of Genplex by SNPforID concentrates on the forensic performance of this system and will be reported in due course. Indications from the study detailed here already suggest that Genplex is likely to offer a comparable, if not enhanced, alternative to SNaPshot® for forensic SNP genotyping.
Acknowledgement
The authors wish to thank Maria Torres and Ines Quintela, University of Santiago de Compostela, for performing the Sequenom iPLEX® genotyping, Gabi Förster, Institute of Legal Medicine, Cologne, for excellent technical assistance, and Yogesh Prasad and Michael Rhodes of AB for their guidance and help with the use of GeneMapper™ software.
References
[1] J.J. Sanchez, C. Phillips, C. Borsting, K. Balogh, M. Bogus, M. Fondevila, C.D. Harrison, E. Musgrave-Brown, A. Salas, D. Syndercombe-Court, P.M. Schneider, A. Carracedo, N. Morling, A multiplex assay with 52 single nucleotide polymorphisms for human identification, Electrophoresis 27 (9) (2006) 1713–1724.
[2] The SNPforID allele frequency browser: http://bioinformatics.cesga.es/snpforid/search.php.
[3] AB SNPlex™ chemistry guide pdf: http://docs.appliedbiosystems.com/pebiodocs/04360856.pdf.
[4] AB TaqMan® chemistry guide pdf: http://docs.appliedbiosystems.com/search-dodnum.taf?dodnum=4348358.
[5] Sequenom iPLEX® application note pdf: http://www.sequenom.com/Assets/pdfs/appnotes/8876-006.pdf.