Forensic Science International: Genetics Supplement Series xxx (xxxx) xxx–xxx
Forensic Science International: Genetics Supplement Series journal homepage: www.elsevier.com/locate/fsigss
A microhaplotypes panel for forensic genetics using massive parallel sequencing
Chiara Turchi ⁎, Mauro Pesaresi, Adriano Tagliabracci
Section of Legal Medicine, Department of Biomedical Sciences and Public Health, Polytechnic University of Marche, Ancona, Italy
Keywords: Microhaplotypes; Massive parallel sequencing; Forensic genetics
Abstract
Microhaplotypes (microhaps) are defined as loci of two or more SNPs within the span of a single sequencing read, with three or more common allelic combinations (haplotypes) of the SNPs. Microhaps appear to be useful in forensics for individual identification, ancestry inference, estimating relationships, and deconvoluting mixtures. The most important issue is identifying and characterizing a set of microhaps with the optimum characteristics for specific purposes and developing a suitable genotyping technology. MPS technologies are now making microhaplotypes a new type of forensic marker: a single sequence read can cover the expanse of a microhaplotype, so these loci become phase-known. In the present study we selected a panel of 89 microhaps from the ALlele FREquency Database and evaluated these loci in 73 Italian samples. The results show that the panel is very informative for resolving DNA mixtures, for identification, and for reconstruction of biological relationships. Microhaps will become a very promising new type of forensic marker when MPS is the technology being used for genotyping.
1. Introduction
The introduction of massive parallel sequencing (MPS) technology in the field of forensic genetics has opened new possibilities in forensic DNA typing. Besides multiplexing the existing STR markers, along with various numbers of SNPs and InDels, MPS makes it possible to explore a new type of genetic marker, known as the microhaplotype. A microhaplotype locus (or microhap) is defined by at least two single nucleotide polymorphisms (SNPs) within the length of a sequence read and by the expectation of a very low recombination rate [1]. The alleles at a microhap locus are defined by the allelic combination of its SNPs and are referred to as haplotypes. Microhaplotypes were only recently introduced into the landscape of forensic genetics. To date about 148 microhaps have been evaluated in the human genome and annotated in the ALlele FREquency Database (https://alfred.med.yale.edu/), identified by a specific nomenclature system [2]. Recently, 130 microhaps were studied in 83 different populations using TaqMan assays, and haplotype frequencies were inferred by phasing the individual SNPs [1]. MPS is the only methodology that will directly yield the phase. Indeed, a single sequence read can cover the expanse of a microhaplotype, so these loci become phase-known. Microhaps appear to be useful in forensics for identification and for lineage/family relationships. Moreover, they can provide information
on biogeographic ancestry and can be useful for both detecting and deconvoluting mixtures of DNA. In this study, we searched for microhaplotypes that would be particularly informative for mixture detection and deconvolution and for identification of close biological relationships. The relative value of microhaps for these purposes can be estimated using the effective number of alleles (Ae), which represents the number of equally frequent alleles that would generate the same heterozygosity as the observed locus, whose alleles may have very different frequencies (Ae = 1/Σ p_i², where p_i is the frequency of haplotype i).
2. Material studied, methods, techniques
We selected microhaps from the ALlele FREquency Database (https://alfred.med.yale.edu/) that matched either of the following criteria: (1) comprising three, four or five SNPs; (2) comprising two SNPs but with a Global Average Effective Number of Alleles (Ae) ranking ≤ 60. A total of 89 microhaps were selected, spread across the 22 human autosomes. The loci range from 18 bp to 279 bp, with only three spanning more than 200 bp. Two primer pools for multiplex PCR reactions were designed, with amplicons ranging between 199 bp and 374 bp. Libraries for next generation sequencing analysis were prepared with the Ion AmpliSeq Library kit 2.0. After PCR the two multiplex reactions were pooled and processed as a single sample. Samples were barcoded with Ion Xpress
⁎ Corresponding author. E-mail address: [email protected] (C. Turchi).
http://dx.doi.org/10.1016/j.fsigss.2017.09.035 Received 27 August 2017; Accepted 11 September 2017 1875-1768/ © 2017 Elsevier B.V. All rights reserved.
Fig. 1. Distribution of the 88 microhaplotypes by their effective number of alleles (Ae).
by their effective number of alleles. Thirty-three loci have values greater than 3. More important is the subset of 17 microhaplotype loci with Ae above 4. We then calculated the theoretical probability of detecting at least three alleles in a mixture of two unrelated individuals using different combinations of microhaps [3]. Considering the 17 loci with Ae greater than 4, the probability of identifying more than two haplotypes in a mixture is close to 1. These results make this subset of loci very informative for forensic purposes.
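The quantities discussed here can be illustrated with a short sketch. It is not the authors' calculation: it assumes Hardy-Weinberg proportions, unrelated contributors, and independent loci, and it uses a hypothetical locus with four equally frequent haplotypes (Ae = 4) rather than the frequencies observed in this study. The function names are illustrative only.

```python
from itertools import combinations
from math import prod


def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Effective number of alleles, Ae = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def prob_three_or_more(freqs):
    """Probability that the four alleles carried by two unrelated people
    (four independent draws under Hardy-Weinberg) include >= 3 distinct alleles."""
    # All four draws are the same allele.
    one_allele = sum(p ** 4 for p in freqs)
    # All four draws fall on exactly two alleles (both must appear).
    two_alleles = sum(
        (p + q) ** 4 - p ** 4 - q ** 4 for p, q in combinations(freqs, 2)
    )
    return 1.0 - one_allele - two_alleles


def prob_detect_mixture(loci):
    """Probability that at least one locus shows >= 3 alleles,
    assuming the loci are independent."""
    return 1.0 - prod(1.0 - prob_three_or_more(f) for f in loci)


# Hypothetical locus: four equally frequent haplotypes, so Ae = 4.
equal4 = [0.25, 0.25, 0.25, 0.25]
# Hypothetical panel of 17 such loci (an assumption, not the study data).
panel = [equal4] * 17

print(effective_alleles(equal4))    # Ae of the hypothetical locus
print(heterozygosity(equal4))       # expected heterozygosity
print(prob_three_or_more(equal4))   # per-locus detection probability
print(prob_detect_mixture(panel))   # combined over 17 loci
```

Under these assumptions a single locus with Ae = 4 shows three or more alleles with probability 0.65625, and 17 such loci together exceed 0.9999999, which is consistent with a detection probability close to 1 for the subset of loci with Ae above 4.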
barcode Adapters. Individual libraries were pooled and submitted to emulsion PCR using the Ion PGM Hi-Q OT2 kit. The template-positive Ion PGM Hi-Q ISPs were enriched on the Ion OneTouch ES instrument and sequenced on the PGM instrument using the Ion PGM Hi-Q sequencing kit, 318 chips and 400-base read mode. A total of 73 Italian samples were analyzed. Written informed consent was obtained from each participant. Data analysis was performed using an in-house bioinformatic pipeline. Briefly, reads were aligned against the human reference genome with Torrent Suite version 5.0.4, followed by variant calling at the 272 SNP sites using samtools and bcftools. Each haplotype was annotated using GATK. The output of this step was a set of VCF files, which were filtered and converted into a haplotype table using Excel.
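The final conversion step, from per-SNP calls to a haplotype table, can be sketched as follows. This is a minimal illustration and not the authors' pipeline (which used Excel for this step): the sample names and genotype strings are hypothetical, and phase is assumed to be already known for every sample, as it is when a single read spans all SNPs of a locus.

```python
from collections import Counter


def build_haplotypes(phased_calls):
    """Combine phased per-SNP genotypes (e.g. "A|G") into the two
    microhaplotype alleles of one sample at one locus.

    phased_calls: list of strings, one per SNP, ordered by position.
    """
    first, second = [], []
    for call in phased_calls:
        allele_1, allele_2 = call.split("|")
        first.append(allele_1)
        second.append(allele_2)
    return "".join(first), "".join(second)


def haplotype_frequencies(samples):
    """Count haplotypes over all samples at one locus and return frequencies."""
    counts = Counter()
    for calls in samples.values():
        counts.update(build_haplotypes(calls))
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Hypothetical three-SNP locus typed in three hypothetical samples.
example_locus = {
    "S1": ["A|G", "C|C", "T|G"],
    "S2": ["A|A", "C|T", "T|T"],
    "S3": ["G|G", "C|C", "G|G"],
}

freqs = haplotype_frequencies(example_locus)
for hap, freq in sorted(freqs.items()):
    print(hap, round(freq, 3))
```

The resulting per-locus frequency table is the input needed to compute heterozygosity and the effective number of alleles.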
4. Discussion
Overall, the results of this study confirm the utility of microhaps in forensics. We selected a subset of microhap loci, but several issues must be addressed and resolved before these markers can be used in forensic practice. First, microhap data obtained by MPS will include additional variation not previously observed. Therefore, it could be useful to explore the presence of uncommon variant alleles at sites sequenced with MPS. Moreover, it could be useful to design a new PCR panel with reduced amplicon size, to allow analysis of samples with degraded DNA. A very important issue will be to improve the bioinformatic analysis, either by testing whether existing software developed for STR analysis of MPS data is also suitable for microhaps or by developing a new dedicated tool. Finally, it would be interesting to test the selected loci on mixtures.
3. Results
Overall, the MPS results showed a very good performance of the designed panel. The average coverage ranged between 267.9 and 3174 (median: 848.1; mean: 1006.6). We also observed good uniformity of coverage (median: 93.91%, mean: 88.78%), although primer pool 1 was usually more efficient than primer pool 2. Two samples showed low uniformity of base coverage, due to a PCR failure of one primer pool. These samples were excluded from the study. Only one microhap locus was consistently negative across all samples analyzed. The final microhap panel was therefore reduced to 88 loci and 272 SNPs, evaluated in 71 Italian samples. The haplotype table data were used to infer the characteristics relevant for forensic purposes. The heterozygosity values for all 88 loci ranged between 0.01 and 0.88, with a mean value of 0.61. Only 3 loci displayed heterozygosity values < 0.3. We then calculated the effective number of alleles (Ae), which relates to the usefulness of a locus in resolving DNA mixtures and relationships; more than half of the loci presented Ae values above 2.5. The Ae values observed in our study were compared with those reported in a recently published study [1], and they were similar. The higher values observed at certain loci in our results could be explained by the greater ability of MPS to detect rare haplotypes compared with phasing data sets of individual SNPs. Because the higher the effective number of alleles, the more likely a mixture is to be detected, we searched for candidate loci that displayed high Ae values. Fig. 1 describes the distribution of the 88 microhaps
5. Conclusion
Microhaps will become a very promising new type of forensic marker when MPS is the technology being used for genotyping. The selected microhaps could be incorporated into the multiplexes used for typing STRs, AISNPs, the mtGenome, phenotype-predictive SNPs, etc., because they will be highly informative for individual identification, mixture identification and reconstruction of family relationships.
References
[1] K.K. Kidd, W.C. Speed, A.J. Pakstis, et al., Evaluating 130 microhaplotypes across a global set of 83 populations, Forensic Sci. Int. Genet. 29 (2017) 29–37.
[2] K.K. Kidd, Proposed nomenclature for microhaplotypes, Hum. Genom. 10 (2016) 16.
[3] K.K. Kidd, W.C. Speed, Criteria for selecting microhaplotypes: mixture detection and deconvolution, Investig. Genet. 6 (2015) 1.