Development and application of a nonbinary SNP-based microhaplotype panel for paternity testing involving close relatives


Forensic Science International: Genetics 46 (2020) 102255


Forensic Science International: Genetics journal homepage: www.elsevier.com/locate/fsigen


Shule Sun a,1, Ying Liu a,1, Jienan Li a, Zedeng Yang a, Dan Wen a, Weibo Liang b, Yiqing Yan a, Hao Yu a, Jifeng Cai a, Lagabaiyila Zha a,*

a Department of Forensic Medicine, School of Basic Medical Sciences, Central South University, Changsha, 410013, Hunan, PR China
b Department of Forensic Biology, West China School of Preclinical and Forensic Medicine, Sichuan University, Chengdu, 610041, PR China

ARTICLE INFO

Keywords: Close relatives; Paternity tests; Microhaplotype; Nonbinary SNP

ABSTRACT

Paternity testing involving close relatives remains a challenge in forensic genetics. Microhaplotypes have recently been proposed as promising genetic markers because of their low mutation rates and high discrimination power. In this study, we selected 30 microhaplotypes containing nonbinary SNPs from the 1000 Genomes Project, together with six microhaplotypes from published studies containing only binary SNPs, to establish a microhaplotype panel for paternity testing. Most microhaplotypes had a high effective number of alleles (Ae); the harmonic mean of Ae was 3.91 and the arithmetic mean heterozygosity was 0.74. We collected 54 unrelated individuals and 53 samples from six extended families. Notably, 13 samples from the six extended families were unrelated, so they were also counted among the unrelated individuals. Pedigrees of 38 parent–child duos, 55 uncle/aunt/grandparent–child duos (non-biological parent–child duos) and 29 full sibling pairs were constructed from the 53 samples of the six extended families. The genotype and haplotype results showed that the combined power of discrimination (CPD) reached 0.99999999999999999999999999999999799 and the cumulative probability of exclusion (CPE) reached 0.999999999999548. The combined probability of excluding relatives (uncle/aunt/grandparent) (CPER) was 0.999999993 (> 0.9999), indicating that the panel was effective in preventing close relatives from being misinterpreted as biological parents. For the 38 parent–child duos, the combined paternity index (CPI) obtained with the microhaplotype panel was higher than that obtained with the Goldeneye 20A kit, owing to the higher polymorphism and larger number of loci in our panel. For the 55 non-biological parent–child duos, the CPIs from STR loci failed to classify 9 duos as "exclusions" of paternity, whereas the CPIs from microhaplotype loci failed to exclude parenthood in 4 duos (CPI > 0.0001).
Using the CPI derived from the combined STR and microhaplotype data, all non-biological parent–child duos could be classified as exclusions. The efficiency of the panel in excluding close relatives was evaluated with 2000 simulated pairs, and the effectiveness was 0.988 at thresholds of t1 = 4 and t2 = −4. Moreover, the average log10 combined full sibling index (CFSI) for the 29 full sibling pairs was about 7.55 after physical linkage was taken into account. These data demonstrate that this nonbinary SNP-based microhaplotype panel has advantages in paternity testing, especially in cases involving STR mutations or close relatives.

1. Introduction

Paternity testing plays an important role in searches for missing persons, inheritance disputes, identification of disaster and war victims, and criminal investigations [1]. However, paternity tests involving close relatives remain challenging in forensic DNA analysis [2]. For example, in cases involving the identification of missing persons or war victims, the genotype profiles of more than one person may match the DNA collected from the scene after comparison against a DNA database. Cases of this kind have been reported repeatedly [3,4]. The reason may be that the DNA database contains several close relatives of these persons. In addition, if the alleged father in a duo parentage case is a close relative of the child rather than the biological father, the alleged father and the child will share alleles. Thus, only a few non-matching loci may be detected between the profile of the child and that of the alleged

* Corresponding author. E-mail address: [email protected] (L. Zha).
1 These authors contributed equally to this work and should be considered co-first authors.

https://doi.org/10.1016/j.fsigen.2020.102255 Received 1 August 2019; Received in revised form 12 December 2019; Accepted 20 January 2020 Available online 24 January 2020 1872-4973/ © 2020 Elsevier B.V. All rights reserved.


parent. In such cases it is difficult to tell mismatches caused by mutation from those caused by a relative who is not the parent. Hence, the risk of drawing a wrong conclusion may increase considerably [5,6]. In summary, misinterpreting close relatives as biological parents compromises the accuracy of paternity testing. The misinterpretation may involve second-degree relatives, third-degree relatives, or more distant relatives, but second-degree relatives are the most common case [7]. Hence, averting the false inclusion of second-degree relatives during paternity testing is crucial.

Forensic short tandem repeat (STR) loci have been the "gold standard" markers for decades [8]. They are highly polymorphic in human populations and thus have high per-locus discrimination power in kinship testing [9]. Despite the high paternity index (PI) values of STR loci in forensic applications, most STR loci show relatively high mutation rates. According to Lai and Sun [10], locus-specific mutation rates range from 10⁻⁴ to 10⁻² per gamete per generation. When several non-matching loci are observed during paternity testing, it is difficult to determine whether these mismatches are due to real mutations or to the inclusion of a close relative. Compared with STRs, single-nucleotide polymorphisms (SNPs) have significantly lower mutation rates: the locus-specific mutation rate is 10⁻⁷ to 10⁻⁸ per gamete per generation, far below that of STR loci [11,12]. However, most SNPs have low polymorphism because they are biallelic [13,14]. Approximately 45–65 unlinked SNP markers are required to match the polymorphism information content (PIC) of 12–16 STRs [15,16].

The "microhaplotype" was first proposed by Professor Kenneth Kidd [17]. It refers to a locus with two or more SNPs in a short segment of DNA (often < 300 bp) that generates three or more common alleles.
Multiple haplotypes often exist at such a locus because of historically rare crossovers, the accumulation of variants at different sites, and the vagaries of random genetic drift and/or selection [18]. The microhaplotype could be a promising genetic marker in forensic paternity tests for five reasons. First, for each locus, sequencing the haplotype of multiple SNPs can offer more information than sequencing a single SNP, even compared with STR loci [19]. Second, because a mutation at a microhaplotype locus originates from a mutation of one of its SNPs, its mutation rate is far lower than that of STR loci. Third, microhaplotypes are multi-SNP haplotypes, so the heterozygosity of a microhaplotype locus is higher than that of a standalone SNP locus. Fourth, next-generation sequencing makes it possible to detect a large number of microhaplotypes at one time and to obtain the sequence of each DNA fragment [20]. Fifth, stutter and imbalanced heterozygous peaks are not produced during polymerase chain reaction (PCR) amplification, because microhaplotype alleles differ in sequence rather than in the number of repeat units [17].

In our previous study, we screened many nonbinary SNPs [21] and found that some nonbinary SNPs can form microhaplotypes with nearby SNPs. These nonbinary SNP-based microhaplotypes possess high heterozygosity and are potential genetic markers for paternity testing, especially when mutated STRs are involved and/or a relative is the alleged parent. In addition, the Illumina MiSeq platform (San Diego, CA, USA) with the PE300 option can sequence both ends of the amplicons spanning the target loci (300 bp at each end), so the sequenced length of the target region can reach 600 bp. This technology makes it feasible and practicable to sequence multiple loci, including more SNPs within each amplicon that offer more information [22].
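The third reason listed above can be made concrete with a small calculation. The sketch below, written in Python purely for illustration, uses the standard expected-heterozygosity formula H = 1 − Σp_i² to compare a binary SNP with a multi-allele locus. The allele frequencies are hypothetical and are not taken from the panel.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# A binary SNP with equal allele frequencies (the best case for two alleles).
binary_snp = [0.5, 0.5]
# A hypothetical microhaplotype with four equally common haplotypes.
microhaplotype = [0.25, 0.25, 0.25, 0.25]

print(expected_heterozygosity(binary_snp))      # 0.5
print(expected_heterozygosity(microhaplotype))  # 0.75
```

A binary SNP can never exceed 0.5 under this formula, whereas a locus with several common haplotypes can, which is the motivation for the Ho > 0.6 requirement used in this study.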
In the present study, we selected 30 microhaplotypes (including nonbinary SNPs) and 6 microhaplotypes from Kidd et al. (containing only binary SNPs), and designed primers for amplicon sequencing on the MiSeq. Genotyping by sequencing and subsequent haplotype phasing provided microhaplotype information for unrelated individuals and for samples from extended families, so that the effectiveness of the panel in preventing the false inclusion of a second-degree relative during paternity testing could be evaluated.

2. Materials and methods

2.1. Sample collection

Ninety-four Chinese Han individuals were enrolled in this study. The samples fell into two categories: (i) 54 unrelated random Chinese Han individuals; (ii) 53 samples from six unrelated extended families of a Chinese Han population. The detailed kinship among the samples of the six extended families is depicted in Fig. 1. Thirteen samples were both extended-family samples and unrelated random samples. In total, 38 parent–child duos and 55 uncle/aunt/grandparent–child duos (non-biological parent–child duos) were assembled to evaluate the efficiency of the microhaplotype loci in paternity testing. Because these duos came from six unrelated extended families, some duos shared the same alleged parent or alleged child. The study protocol was approved by the Ethics Committee of the Third Xiangya Hospital of Central South University (2018-S194; Hunan Province, China). Written informed consent was obtained from each participant.

2.2. Selection of candidate microhaplotype loci

Based on the nonbinary SNPs observed in our previous study, we selected neighboring SNPs to form nonbinary SNP-based microhaplotype loci. Population data from the 1000 Genomes Project Phase 3 (International Genome Sample Resource; www.internationalgenome.org/home) were used to choose and assess candidate loci [23]. A total of 211 unrelated individuals from the Han Chinese in Beijing, China (CHB) and Southern Han Chinese (CHS) populations were included. The markers were selected manually according to four criteria. First, microhaplotypes were required to have an effective number of alleles (Ae) greater than 3 and an observed heterozygosity (Ho) greater than 0.6. To this end, we selected microhaplotypes that contained nonbinary SNPs, in which every SNP had a minor allele frequency greater than 0.1, the sum of the two minor allele frequencies of a nonbinary SNP was greater than 0.2, and each microhaplotype consisted of 3–6 adjacent SNPs.
Second, the target region between the two outermost SNPs needed to be < 500 bp. Third, the microhaplotype loci should avoid "recombination hotspots" and repetitive sequences in the genome. A fine-scale genomic recombination map of CHB published by Gao et al. was used to check whether a locus lay in a recombination hotspot, and the sequences of candidate loci were compared using BLAST to check for repetitive sequences [24,25]. The recombination rate of most of the selected loci was about 1 cM/Mb. A previous study suggested that the recombination rate between two genetic markers within < 10 kb is < 10⁻⁴, even smaller than the mutation rate of STRs (10⁻³), assuming a genome-wide average recombination rate of 1 % per megabase [17]. Our microhaplotype loci (< 500 bp) could therefore each be treated as a haplotype. As an example, the distribution of our selected loci on chromosome 18 in the CHB fine-scale genomic recombination map is shown in Supplementary Fig. 1. Fourth, heterozygosity in the CHB and CHS populations was > 0.6. In addition, we chose six microhaplotypes with Ho > 0.6 from Kidd et al. that contain only binary SNPs [19]. The nomenclature of our microhaplotypes follows that proposed by Kidd [26].

2.3. Amplicon sequencing using the MiSeq PE300 platform

Genomic PCR primer sequences were designed using Primer3 and Oligo v2.3.7 (Molecular Biology Insights, Colorado Springs, CO, USA), and their genomic specificity was evaluated in silico using NCBI Primer-BLAST. Primer sequences are summarized in Supplementary Table 1 and Supplementary Table 2. Parallel nanoliter PCR-based target enrichment for amplicon sequencing was conducted on the Takara/WaferGen SmartChip TE™ system, with all 96 samples in a single SmartChip, using a procedure similar to that of De Wilde et al. [27]. The PCR reaction


Fig. 1. Pedigrees of the six extended families. (A), (B), (C), (D), (E), (F) and (G) represent the different extended families. Symbols: female sample; male sample; female sample unavailable; male sample unavailable.

volume for a single well was 100 nL. The reagent concentrations in the single-plex PCR for each sample were: MasterMix 1×, Universal Outer Primer 1 μM, Index Primer 1 μM, Inner Primer Pair 0.25 μM, and DNA template 2.5 ng/μL. The PCR was conducted on a T100 Thermal Cycler (Bio-Rad Laboratories, Hercules, CA, USA) under the following cycling conditions: 95 °C for 5 min; 10 cycles of 95 °C for 15 s, 60 °C for 30 s, 72 °C for 60 s; 2 cycles of 95 °C for 15 s, 80 °C for 30 s, 60 °C for 30 s, 72 °C for 60 s; 8 cycles of 95 °C for 15 s, 60 °C for 30 s, 72 °C for 60 s; 2 cycles of 95 °C for 15 s, 80 °C for 30 s, 60 °C for 30 s, 72 °C for 60 s; 8 cycles of 95 °C for 15 s, 60 °C for 30 s, 72 °C for 60 s; and 10 cycles of 95 °C for 15 s, 80 °C for 30 s, 60 °C for 30 s, 72 °C for 60 s. The PCR products of the whole chip, i.e. the multiplexed sequencing library with dual indexes for tracking the different samples, were collected, purified by gel recovery, and then sequenced on the Illumina MiSeq platform with the PE300 setting (Reagent Kit v3, 600 cycles) according to the manufacturer's recommendations [22]. The average read depth for each locus in the panel is shown in Supplementary Fig. 2. There was variation among loci, and the loci were filtered by two different researchers without access to sample information.

2.4. Duplicate samples and chips for reproducibility

Four samples from two persons (two samples per person) were placed on the same chip for amplicon sequencing by MiSeq, and the genotype results for the same person were identical. Moreover, two different chips (chip A and B) were used to verify the reproducibility of 20 microhaplotype loci in some samples, and the genotype results of these 20 loci were the same between chips (Duplicate loci: mh01zha011, mh01zha012, mh02zha008, mh02zha013, mh02KK-136, mh04zha004, mh05zha002, mh07zha003, mh08zha010, mh13zha002, mh13KK-218, mh14zha006, mh16zha006, mh16zha009, mh16KK-302, mh18zha005, mh19zha006, mh19zha007, mh19zha009, mh22zha003; Duplicate samples: F1, F3, F5, F7, F8, F10, F11, F13, F14, F17, F19, M1, M2, M6, M8, M14, M21, M22, S5, S12, S13, S14, S15, S16, S18, S19, S20, S21, S23, S26, S28, S29, S30, S35, S36, S37, S39, S40, S41, S42, S43, S44, S46, S47, S48, S49, S52).

2.5. Sanger sequencing for validation

To verify the accuracy of MiSeq PE300 sequencing, we selected several samples randomly and obtained unambiguous genotypes using T vector molecular cloning and Sanger sequencing. Thirty-two microhaplotypes were typed using sample S1. Among these 32 microhaplotypes, loci mh02zha001 and mh03zha009 were also typed using samples S5, S15, and S23. Target fragments were first amplified on a GeneAmp PCR 9700 thermal cycler, then ligated into T vectors, cloned and, finally, sequenced. To obtain an unambiguous genotype for each microhaplotype, we sequenced both the PCR products and multiple monoclonal plaques using Sanger technology. Ultimately, the TA cloning sequences were consistent with the corresponding PCR products.


2.6. STR genotyping

based on the data of CHB and CHS from the 1000 Genomes Project and on the 54 unrelated individuals from our study, respectively. Full siblings were evaluated using the ITO method under LR principles, following the 2010 AABB Guidelines for Mass Fatality DNA Identification Operations (http://www.aabb.org/programs/disasterresponse/Pages/massfatality.aspx), and the CFSI for each pair was calculated as the product of the FSIs for all microhaplotype loci [39]. The FSI compares the probability of the DNA results if the tested persons are full siblings with the probability of the DNA results if they are unrelated individuals. The ITO formulas for calculating the FSI are shown in Supplementary Table 3. Because the genetic map distance between some loci on the same chromosome was smaller than 50 cM, linkage was taken into account for full sibling testing according to Bright et al. [40]. The linkage correction formulas for calculating the FSI are shown in Supplementary Table 4. Moreover, 1000 biological parent–child duos and 1000 uncle/aunt/grandparent–child duos (non-biological parent–child duos) were simulated with SimPed software based on the CHB and CHS data from the 1000 Genomes Project [41]. The log10 CPI of these duos was calculated, and the efficiency of the panel in excluding close relatives was evaluated using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and effectiveness.
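The evaluation parameters named above can be sketched as follows. This Python example is not the authors' code. It assumes the standard confusion-matrix definitions, a two-threshold decision rule on log10 CPI (inclusion above t1, exclusion below t2, inconclusive in between, with the t1 = 4 and t2 = −4 values quoted in the abstract as defaults), and it treats effectiveness as the fraction of all pairs classified correctly. All input values are hypothetical.

```python
def classify(log10_cpi, t1=4.0, t2=-4.0):
    """Two-threshold decision rule on log10 CPI (assumed, see lead-in)."""
    if log10_cpi > t1:
        return "included"
    if log10_cpi < t2:
        return "excluded"
    return "inconclusive"


def ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(parent_logs, relative_logs, t1=4.0, t2=-4.0):
    """Summarize classification of true-parent and relative duos."""
    parent_calls = [classify(x, t1, t2) for x in parent_logs]
    relative_calls = [classify(x, t1, t2) for x in relative_logs]
    tp = parent_calls.count("included")    # true parents included
    fn = parent_calls.count("excluded")    # true parents wrongly excluded
    tn = relative_calls.count("excluded")  # relatives correctly excluded
    fp = relative_calls.count("included")  # relatives wrongly included
    total = len(parent_calls) + len(relative_calls)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "effectiveness": ratio(tp + tn, total),
    }


# Hypothetical log10 CPI values for four true duos and four relative duos.
metrics = evaluate([9.2, 6.5, 3.1, 8.8], [-12.0, -6.3, -2.0, 5.1])
print(metrics)
```

Under these assumptions inconclusive pairs lower the effectiveness but do not enter sensitivity or specificity; a different convention would change the numbers.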

STR loci were also genotyped for the 53 family samples and for 39 of the 54 unrelated individuals to confirm their kinship. A Multiplex PCR System Goldeneye 20A kit (Peoplespot, Beijing, China) was employed to amplify 19 autosomal STR markers (CSF1PO, D2S1338, D3S1358, D5S818, D6S1043, D7S820, D8S1179, D12S391, D13S317, D16S539, D18S51, D21S11, Penta D, Penta E, FGA, TPOX, TH01, vWA) and the AMEL locus according to the manufacturer's protocols. Multiplex amplification was performed in a 10 μL PCR volume on a GeneAmp PCR System 9700 Thermal Cycler (Thermo Fisher Scientific, Waltham, MA, USA). The amplified products were separated by capillary electrophoresis on an ABI PRISM 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) with POP4 denaturing polymer (Applied Biosystems). Electrophoresis data were analyzed against allelic ladders using GeneMapper ID v3.2 (Applied Biosystems).

2.7. Data analyses

Exact tests of Hardy–Weinberg equilibrium (HWE) and forensic parameters (allele frequencies, power of discrimination (PD), PIC, power of exclusion (PE), typical paternity index (TPI) and observed heterozygosity (Ho)) were evaluated with a modified spreadsheet in PowerStat v1.2 (Promega, Madison, WI, USA) [28]. Linkage disequilibrium (LD) between pairs of microhaplotype loci was tested using Arlequin v3.5 and SHEsis software [29,30]. The effective number of alleles (Ae) was calculated with the formula proposed by Kidd and Speed [18]. Moreover, the probability of excluding relatives (uncle/aunt/grandparent) (PER) was calculated with the formula PER = 0.75PE provided by Funk et al. [31]. The raw Illumina FASTQ data generated on the MiSeq PE300 platform were first demultiplexed per sample using bcl2fastq software and then processed with BBDuk from BBMap (version 37.75; https://sourceforge.net/projects/bbmap) to trim adapters and to remove low-quality reads (average Q value < 20) and short reads (< 100 bp).
BWA [32] was used to map the "clean" reads onto the human reference genome sequence (GRCh37-hg19 version), and GATK (v4.0) [33] was used to call genotypes at each target SNP site (Module: HaplotypeCaller; Parameters used: base_Q_min 20, map_Q min20, stand_call_conf 20, –dontUseSoftClippedBases). Both VCF and gVCF files were generated to confirm the genotype calls at target sites with GQ ≥ 20. Moreover, a read depth of at least 30× was required to call genotypes. A customized microhaplotype (MH) reconstruction pipeline including HapCUT2 [34] was developed to phase the genotype data.
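The two acceptance thresholds stated above (GQ of at least 20 and read depth of at least 30×) and the final step of joining phased SNP alleles into a microhaplotype genotype can be illustrated with a short Python sketch. It is not the authors' pipeline and does not parse GATK or HapCUT2 output; the record layout (`alleles`, `gq`, `depth`) and all values are hypothetical.

```python
MIN_GQ = 20      # genotype-quality threshold quoted in the text
MIN_DEPTH = 30   # minimum read depth quoted in the text


def passes_filters(call):
    """Return True if a SNP call meets both thresholds."""
    return call["gq"] >= MIN_GQ and call["depth"] >= MIN_DEPTH


def build_haplotypes(phased_calls):
    """Join phased SNP alleles into two haplotype strings.

    Returns None if any SNP in the locus fails the filters, because a
    microhaplotype genotype needs every constituent SNP.
    """
    if not all(passes_filters(c) for c in phased_calls):
        return None
    hap1 = "".join(c["alleles"][0] for c in phased_calls)
    hap2 = "".join(c["alleles"][1] for c in phased_calls)
    return hap1, hap2


# Hypothetical phased calls for a four-SNP locus (values are illustrative).
locus_calls = [
    {"alleles": ("T", "C"), "gq": 99, "depth": 850},
    {"alleles": ("G", "A"), "gq": 99, "depth": 842},
    {"alleles": ("A", "A"), "gq": 60, "depth": 861},
    {"alleles": ("T", "T"), "gq": 75, "depth": 858},
]
genotype = build_haplotypes(locus_calls)
print(genotype)  # ('TGAT', 'CAAT')
```

The sketch presumes that phasing has already assigned the first allele of every SNP to the same chromosome copy; producing that assignment is the job of the phasing tool.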

3. Results

3.1. General information

Among these markers, 30 microhaplotypes contained nonbinary SNPs, so the polymorphism of these genetic markers was improved (harmonic mean Ae = 3.8), which was higher than that of the 11 microhaplotypes with only binary SNPs reported for kinship analysis by Chen et al. (mean Ae = 3.3) [42]. The 36 microhaplotype loci were distributed on 16 different autosomes. Detailed information on the 36 microhaplotypes, including SNP ID, Ae, SNP position on the chromosome, number of SNPs per microhaplotype locus, and extent, is summarized in Table 1 based on data from the 1000 Genomes Project. Eleven of the 36 loci span three SNPs, 23 span four SNPs, and the remaining two microhaplotypes contain five and six SNPs, respectively. The distance between the outermost SNPs of each microhaplotype ranged from 63 bp to 423 bp (mean = 216 bp). All of these loci had relatively high Ae based on data from the 1000 Genomes Project. The locus with the highest Ae was mh02zha013 (11.31). Thirteen microhaplotypes had Ae > 4.0 and 24 had Ae > 3.5. The harmonic mean and median of the Ae of these 36 loci were 3.91 and 3.76, respectively. Based on data from the 1000 Genomes Project, the maximum heterozygosity of these microhaplotypes was 0.86 and the minimum was 0.65 (mean heterozygosity = 0.74).
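As a minimal sketch of how Ae summaries of this kind can be computed, the following Python snippet assumes that Ae is the reciprocal of the sum of squared allele frequencies (the usual definition attributed to Kidd and Speed [18]) and combines loci with a harmonic mean. The locus names and frequencies are hypothetical and are not taken from Table 1.

```python
from statistics import harmonic_mean


def effective_number_of_alleles(freqs):
    """Ae = 1 / sum(p_i^2), assumed here as the Kidd and Speed definition."""
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical allele-frequency vectors for three loci.
loci = {
    "locus_a": [0.25, 0.25, 0.25, 0.25],
    "locus_b": [0.5, 0.25, 0.25],
    "locus_c": [0.4, 0.3, 0.2, 0.1],
}

ae = {name: effective_number_of_alleles(f) for name, f in loci.items()}
panel_ae = harmonic_mean(list(ae.values()))

for name, value in ae.items():
    print(name, round(value, 3))
print("harmonic mean Ae:", round(panel_ae, 3))
```

The harmonic mean is a natural summary here because it equals the reciprocal of the mean per-locus homozygosity, so a single weakly polymorphic locus pulls it down more than it would an arithmetic mean.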

2.8. Calculation of combined paternity index (CPI) and combined full sibling index (CFSI)

The CPI was calculated on likelihood ratio (LR) principles according to the recommendations of the International Society for Forensic Genetics (ISFG) [35]. The PI compares the probability of the DNA results if the alleged father is the biological father with the probability of the DNA results if a random man in the population is the biological father. For each case, the CPI was the product of the PIs of all loci tested. For the 55 non-biological parent–child duos, non-matching loci observed between the close relative and the child were treated as mutations. The mutation rate of STRs was taken as 10⁻³ based on the study of Lai and Sun [10], and the PI was calculated with the stepwise mutation model [36]. The STR allele frequencies were taken from the study of Zou et al. [37]. According to Nachman and Crowell, the mutation rate of a SNP is 10⁻⁸ [38]. If one non-matching SNP was observed at a microhaplotype locus, we used a mutation rate of 10⁻⁸ to calculate the PI. If two non-matching SNPs were observed at a microhaplotype locus, we used a mutation rate of 10⁻⁸ × 10⁻⁸, and so on for more non-matching SNPs at a locus. The allele frequencies for microhaplotypes were

3.2. Forensic parameters

3.2.1. Forensic parameters of the 36 microhaplotypes by the data of CHB and CHS from 1000 Genomes Project

A total of 211 unrelated individuals from CHB and CHS were included in the calculation of forensic parameters. Allele frequencies and forensic parameters of the 36 microhaplotype loci are given in Supplementary Table 5. A total of 260 alleles were observed, with allele frequencies ranging from 0.002 to 0.549. All 36 microhaplotype loci were highly polymorphic, with the number of distinct alleles per locus ranging from 4 (mh01zha012, mh09zha014 and mh05zha002) to 24 (mh02zha013). The PD, PIC, PE, TPI and Ho ranged from 0.826 (mh02zha013 locus) to 0.976 (mh18zha004 locus), 0.619 (mh16zha006 locus) to 0.905 (mh02zha013 locus), 0.354 (mh03zha009 locus) to 0.720 (mh13KK-218 locus), 1.426 (mh03zha009 locus) to 3.638 (mh13KK-218 locus) and 0.649 (mh03zha009 locus) to 0.863 (mh13KK-218 locus), respectively. The


Table 1. Precise information of the 36 microhaplotypes.

Microhaplotype | Ae | # SNPs | Position (build37) | Extent (bp) | SNPs
mh01zha009 | 3.759 | 3 | chr1:189911396-189911775 | 380 | rs815738/rs10920178/rs843814
mh01zha011 | 3.349 | 3 | chr1:208209385-208209589 | 205 | rs4844633/rs841863/rs6540449
mh01zha012 | 4.517 | 4 | chr1:210372065-210372127 | 63 | rs72649965/rs11119437/rs2494175/rs4844968
mh02KK-136 | 4.025 | 3 | chr2:228092389-228092459 | 71 | rs6714835/rs6756898/rs12617010
mh02zha001 | 3.973 | 4 | chr2:15838619-15838723 | 105 | rs6745544/rs10190568/rs6717035/rs4668958
mh02zha003 | 3.459 | 4 | chr2:80080017-80080287 | 271 | rs10199736/rs10199901/rs62141577/rs7423545
mh02zha008 | 3.696 | 3 | chr2:220665364-220665668 | 305 | rs6758274/rs7579563/rs11692946
mh02zha013 | 11.31 | 4 | chr2:138693815-138693915 | 101 | rs6430761/rs7606213/rs62168684/rs11695953
mh03zha009 | 2.878 | 4 | chr3:3158819-3159162 | 344 | rs193098/rs73005815/rs9831674/rs6784393
mh04zha004 | 3.411 | 4 | chr4:57939863-57940018 | 156 | rs10049992/rs1914740/rs1714017/rs6835177
mh05zha002 | 3.775 | 3 | chr5:178765138-178765405 | 268 | rs33900/rs28060/rs33899
mh05zha005 | 3.496 | 4 | chr5:165727583-165727891 | 309 | rs17066625/rs1696966/rs1799554/rs1696967
mh07zha003 | 4.723 | 4 | chr7:41441508-41441607 | 100 | rs4724041/rs378367/rs433709/rs404569
mh08zha003 | 3.669 | 4 | chr8:2772228-2772575 | 348 | rs1471734/rs921781/rs73492855/rs341730
mh08zha008 | 3.672 | 4 | chr8:4811346-4811576 | 231 | rs13272409/rs6997388/rs77698197/rs76386173
mh08zha009 | 3.153 | 4 | chr8:5796703-5796935 | 233 | rs201591576/rs4875646/rs4875647/rs7824642
mh08zha010 | 3.147 | 3 | chr8:140873929-140874087 | 159 | rs9886648/rs9886647/rs11783687
mh09zha004 | 3.423 | 4 | chr9:90650554-90650880 | 327 | rs72618165/rs10125273/rs10125295/rs10125370
mh12zha008 | 3.27 | 4 | chr12:84222002-84222424 | 423 | rs10862775/rs9943694/rs61954584/rs9943755
mh13KK-213 | 4.081 | 3 | chr13:23765541-23765681 | 141 | rs8181845/rs679482/rs9510616
mh13KK-218 | 7.561 | 4 | chr13:54060827-54060972 | 146 | rs1927847/rs9536429/rs7492234/rs9536430
mh13KK-223 | 4.829 | 4 | chr13:110806699-110806852 | 154 | rs1192204/rs1192205/rs3825483/rs3825481
mh13zha002 | 3.849 | 4 | chr13:103130462-103130682 | 221 | rs72649485/rs12877457/rs9514021/rs9514022
mh14zha006 | 6.144 | 4 | chr14:106476185-106476511 | 327 | rs71205883/rs7160425/rs7161550/rs78689987
mh16KK-302 | 4.094 | 4 | chr16:7587734-7587847 | 114 | rs1395579/rs1395580/rs1395582/rs9939248
mh16zha003 | 4.216 | 3 | chr16:12718025-12718443 | 419 | rs6498348/rs16960309/rs8052581
mh16zha004 | 3.766 | 4 | chr16:12724015-12724327 | 313 | rs34771585/rs72638292/rs4781308/rs4781311
mh16zha006 | 3.139 | 3 | chr16:77852460-77852635 | 176 | rs2966051/rs2914455/rs12922936
mh16zha009 | 3.698 | 4 | chr16:86921457-86921568 | 112 | rs76047588/rs11641186/rs11641193/rs80213582
mh18zha004 | 3.223 | 4 | chr18:14315931-14316046 | 116 | rs3971538/rs11660240/rs2135869/rs2870160
mh18zha005 | 4.339 | 3 | chr18:44908587-44908819 | 233 | rs877630/rs3892875/rs76519339
mh19zha006 | 3.702 | 6 | chr19:20762669-20762731 | 63 | rs10422309/rs201266999/rs75130051/rs10422326/rs10422340/rs118026388
mh19zha007 | 4.824 | 4 | chr19:28888223-28888363 | 141 | rs8106726/rs8102417/rs59490836/rs10406130
mh19zha009 | 3.488 | 5 | chr19:53632326-53632503 | 178 | rs74178308/rs8108729/rs8107824/rs8108835/rs2560950
mh21KK-320 | 5.132 | 4 | chr21:43062859-43063044 | 186 | rs2838081/rs2838082/rs78902658/rs2838083
mh22zha003 | 3.958 | 3 | chr22:19532378-19532704 | 327 | rs4819804/rs4819514/rs5748271

The nonbinary SNP is marked in bold.

combined PD and combined PE were 0.99999999999999999999999999999999999246 and 0.9999999999875, respectively.

99999999999999999799 and 0.999999999999548. The combined PE of the 19 STR loci was 0.99999998387 according to the study of Zou et al. [37]. The combined PE of our panel was slightly higher than that of the Goldeneye 20A kit. In addition, the combined PER (uncle/aunt/grandparent) of our panel was 0.999999993 (> 0.9999) based on the data of the 54 unrelated individuals. The exact test (p value) for pairwise microhaplotype loci, based on the 54 unrelated individuals, revealed no significant LD for any pair of loci after Bonferroni's adjustment for multiple testing (p = 0.05/630) except for mh09zha004–mh19zha009. Loci mh19zha009 and mh09zha004 are located on different chromosomes. However, the p value for LD depends largely on the size of the population studied. We therefore used a more stable measure (r²) to verify LD for these loci, and all r² values for pairwise microhaplotype loci were smaller than 0.1. The p values and r² values for the 36 microhaplotype loci are given in Supplementary Table 8-1. Similarly, the exact test (p value) for pairwise microhaplotype and STR loci, based on 39 of the 54 unrelated individuals, revealed no significant LD for any pair of loci after Bonferroni's adjustment for multiple testing (p = 0.05/1485) except for mh19zha009–D7S820. However, all r² values of these pairwise loci were smaller than 0.1, and loci mh19zha009 and D7S820 are located on different chromosomes. The p values and r² values of the 36 microhaplotype and 19 STR loci are given in Supplementary Table 8-2. Taking these results together, we concluded that there was no significant LD between the tested loci for paternity testing. Physical linkage, however, becomes relevant in full sibling testing, so a linkage correction should be applied there.
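The Bonferroni denominators quoted above follow from the number of locus pairs: 36 microhaplotypes give 630 pairs, and 36 microhaplotypes plus 19 STRs give 1485 pairs. The Python sketch below reproduces that arithmetic and also shows the usual product rule for combining per-locus exclusion probabilities under the assumption of independent loci. The function names and the per-locus values in the last line are hypothetical.

```python
from math import comb, prod


def bonferroni_threshold(n_loci, alpha=0.05):
    """Return (number of locus pairs, per-test significance threshold)."""
    n_pairs = comb(n_loci, 2)
    return n_pairs, alpha / n_pairs


def combined_probability(per_locus):
    """Combine per-locus probabilities (e.g. PE) assuming independent loci."""
    return 1.0 - prod(1.0 - p for p in per_locus)


print(bonferroni_threshold(36))       # 630 pairs among 36 microhaplotypes
print(bonferroni_threshold(36 + 19))  # 1485 pairs among 55 loci
print(combined_probability([0.5, 0.5, 0.5]))  # 0.875
```

Double-precision floats cannot resolve values as close to 1 as the combined PD reported here, so a real calculation at that precision would need Python's `decimal` module or a log-scale formulation.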

3.2.2. Forensic parameters of the 36 microhaplotypes by the data of 54 unrelated individuals from MiSeq PE300 sequencing

The genotype profiles of all 94 sequenced samples (including the 54 unrelated individuals) enrolled in our study are shown in Supplementary Table 6 and Supplementary Table 7. The results of MiSeq PE300 sequencing were accurate and in accordance with those of Sanger sequencing. Fig. 2 illustrates the consistency using locus mh02zha001 of sample S5 as an example, with the data taken from cell F48 of Supplementary Table 6; Supplementary Fig. 3 to Supplementary Fig. 34 show the TA cloning results of the 32 microhaplotypes. Moreover, the results of the 20 duplicate loci were the same on chips A and B. The allele frequencies and forensic parameters of the 36 microhaplotype loci based on the 54 individuals are shown in Table 2. Altogether, 222 alleles were observed, with allele frequencies ranging from 0.009 to 0.593. All 36 microhaplotype loci were highly polymorphic, with the number of distinct alleles per locus ranging from 3 (mh14zha006 locus) to 15 (mh02zha013 locus). No deviation from HWE was observed at any locus after Bonferroni's adjustment for multiple testing (p = 0.05/36). The PD, PIC, PE, TPI, and Ho varied from 0.776 (mh02zha013 locus) to 0.940 (mh18zha004 locus), 0.572 (mh03zha009 locus) to 0.836 (mh02zha013 locus), 0.261 (mh03zha009 locus) to 0.735 (mh02zha013 locus), 1.174 (mh03zha009 locus) to 3.857 (mh02zha013 locus) and 0.574 (mh03zha009 locus) to 0.870 (mh02zha013 locus), respectively. The combined PD and combined PE were 0.999999999999999


Fig. 2. Consistency between NGS and Sanger sequencing results, using locus mh02zha001 of sample S5 as an example. The direct PCR sequencing result is shown at the bottom: TGAT/CAAT, indicating that sample S5 is heterozygous at rs6745544 and rs10190568. Clones 1 and 2 are Sanger reads from two TA clones, showing that the haplotypes are CAAT|TGAT, which is consistent with the NGS result shown at the top (Integrative Genomics Viewer (IGV) screenshot). Because of the high read depth, only a portion of the reads is shown.

3.3. Evaluation of the efficiency of this panel in paternity tests of 38 parent–child duos

Thirty-eight parent–child duos were analyzed, and PIs were calculated separately from the genotype data of the STR loci (Goldeneye 20A kit) and of the microhaplotype loci (this panel). The allele frequencies of the microhaplotypes were obtained separately from the 54 unrelated individuals sequenced on our MiSeq platform and from 211 unrelated CHB and CHS individuals published in the 1000 Genomes Project. Therefore, two different PIs were obtained from the genotype data of the microhaplotype loci. The genotypes of the 38 parent–child duos, the PI per locus and the CPI per parent–child duo based on the different genetic markers are shown in Supplementary table 9, Supplementary table 10 and Supplementary table 11, respectively. Furthermore, we compared the distribution of the CPI in the 38 duo cases across the different genetic markers (Fig. 3). The mean values of the CPIs (in log10) based on the allele frequencies of the STR loci, of the microhaplotype loci from the 54 unrelated individuals and of the microhaplotype loci from CHB and CHS were 6.85, 8.97 and 9.92, respectively. Overall, the CPIs from the 36-microhaplotype panel were higher than the CPIs from the Goldeneye 20A kit. When the STR and microhaplotype panels were combined, the CPIs for the 38 duos improved considerably (Fig. 3).

3.4. Evaluation of the efficiency of this panel in paternity tests of 55 uncle/aunt/grandparent–child duos

3.4.1. Paternity analyses in 55 non-biological parent-child duos by 19 STR loci

The detailed STR genotype profiles of these samples are shown in Supplementary table 12. To test the parenthood of these non-biological parent-child duos, we calculated the CPIs of the 55 duos (Supplementary table 12). According to ISFG guidelines [43] and the Chinese National Standards for Paternity Testing, we supported the tested man as the biological father if CPI > 10000, supported that the tested man was not the biological father if CPI < 0.0001, and could not draw a clear conclusion if 0.0001 < CPI < 10000. The results showed that we mistakenly supported a close relative of the child as a biological parent in two non-biological parent-child duos (the 37th and 39th), because their CPI was > 10000. In addition, we could not draw clear conclusions in seven non-biological parent-child duos (the 7th, 26th, 27th, 28th, 44th, 48th and 54th), because the CPI of these seven duos ranged from 0.0001 to 10000.

3.4.2. Paternity testing in 55 non-biological parent-child duos by 36 microhaplotype loci

The panel of 36 microhaplotype loci was genotyped, and we evaluated its efficiency by computing the CPI of the 55 uncle/aunt/grandparent–child duos. The detailed genotype profiles of these samples are shown in Supplementary table 6. We calculated the CPI of the 55 non-biological parent-child duos according to the allele frequencies of the microhaplotype loci from the 54 unrelated individuals and from the 1000 Genomes Project, respectively. The CPI of each non-biological parent-child duo is shown in Supplementary tables 13 and 14. No erroneous conclusions were drawn among the 55 non-biological parent-child duos, and only 4 second-degree relatives could not be excluded as biological parents, because the CPI of these duos was between 0.0001 and 10000. When we computed the CPI by combining the STR loci and the microhaplotype loci, all the close relatives could be ruled out as true parents, and the CPI of these non-biological parent-child duos was much less than 0.0001. Finally, we compared the CPIs of the 55 non-biological parent-child duos obtained separately from the STR loci and the microhaplotype loci (Fig. 4). Among the 55 non-biological parent-child duos, the maximum, minimum and mean values of the log10CPI from the 36 microhaplotype loci were lower than those from the Goldeneye 20A kit. When the CPI of these non-biological parent-child duos was calculated using both the microhaplotype loci and the STR loci, the log10CPI decreased further.

3.5. Evaluation of the efficiency of this panel for excluding close relatives

The efficiency of the microhaplotype panel for excluding close relatives was evaluated by analyzing the parameters of 1000 biological


Table 2 Allele frequency and forensic genetic parameters of every microhaplotype locus.

| Locus | Alleles (frequency) | PD | PIC | PE | TPI | HET | Ae | P |
|---|---|---|---|---|---|---|---|---|
| mh01zha009 | CCA 0.111; GCA 0.38; GCG 0.231; GTA 0.046; GTT 0.231 | 0.87 | 0.69 | 0.526 | 2.077 | 0.759 | 3.766 | 0.76 |
| mh01zha011 | GCC 0.407; GGT 0.167; GTT 0.111; TCC 0.296; TGC 0.009; TTT 0.009 | 0.868 | 0.656 | 0.328 | 1.35 | 0.63 | 3.406 | 0.178 |
| mh01zha012 | CCGA 0.269; GCAC 0.065; GCGA 0.269; GCTC 0.241; GTAC 0.157 | 0.904 | 0.73 | 0.463 | 1.8 | 0.722 | 4.316 | 0.343 |
| mh02KK-136 | GTA 0.074; GTC 0.093; TCA 0.287; TCC 0.167; TTC 0.38 | 0.882 | 0.688 | 0.494 | 1.929 | 0.741 | 3.72 | 0.97 |
| mh02zha001 | AACG 0.278; CAAT 0.185; CACG 0.009; CGAG 0.176; TGAT 0.352 | 0.876 | 0.686 | 0.558 | 2.25 | 0.778 | 3.753 | 0.533 |
| mh02zha003 | CGCC 0.157; GACA 0.083; GACC 0.38; GGCC 0.185; TAAC 0.009; TGAC 0.176; TGCC 0.009 | 0.883 | 0.724 | 0.627 | 2.7 | 0.815 | 4.144 | 0.395 |
| mh02zha008 | AAG 0.333; CAG 0.361; GAA 0.13; GGG 0.176 | 0.848 | 0.658 | 0.463 | 1.8 | 0.722 | 3.459 | 0.937 |
| mh02zha013 | CCAA 0.028; CCAG 0.009; CCGA 0.028; CGAA 0.009; CGAG 0.009; CTAA 0.019; CTAG 0.009; TCAA 0.083; TCAG 0.037; TCGA 0.194; TCGG 0.009; TGAA 0.231; TGAG 0.037; TTAA 0.148; TTAG 0.148 | 0.94 | 0.836 | 0.735 | 3.857 | 0.87 | 6.814 | 0.837 |
| mh03zha009 | AAGT 0.176; GAGC 0.056; GGGC 0.083; GGGT 0.019; TGAC 0.593; TGGC 0.056; AAGC 0.009; GGAC 0.009 | 0.782 | 0.572 | 0.261 | 1.174 | 0.574 | 2.523 | 0.589 |

mh04zha004

mh05zha002

mh05zha005

mh07zha003

mh08zha003

mh08zha008

mh08zha009

mh08zha010

mh09zha004

GCAG GGAG GGCG GTCG TCAC TCAG TCCG TGAG TGCG TTAG TTCG PD PIC PE TPI HET Ae P

CAC CAT CGG TGG

0.139 0.333 0.241 0.287

ACTT GCCC GCCT GGAT

0.111 0.389 0.231 0.269

CGAT CGCC CGCT CTTC TTTC

0.343 0.139 0.213 0.083 0.222

CACG CATA CATG CGCA CGCG TGCA TGCT

0.065 0.009 0.231 0.37 0.019 0.241 0.065

CCCT CTCT GTCT GTGT TCCC

0.454 0.028 0.176 0.139 0.204

ACTC ATCC CACG CATG CCTC CCTG CTCC CTCG

0.167 0.009 0.009 0.398 0.167 0.009 0.231 0.009

CCC CCT CTC TAC TCT

0.083 0.241 0.5 0.167 0.009

CCCG CTCG CTGA TTTG

0.352 0.074 0.333 0.241

PD PIC PE TPI HET Ae P

0.853 0.679 0.558 2.25 0.778 3.695 0.487

PD PIC PE TPI HET Ae P

0.83 0.658 0.558 2.25 0.778 3.456 0.324

PD PIC PE TPI HET Ae P

0.897 0.724 0.558 2.25 0.778 4.193 0.875

PD PIC PE TPI HET Ae P

0.861 0.701 0.592 2.455 0.796 3.888 0.426

PD PIC PE TPI HET Ae P

0.855 0.658 0.406 1.588 0.685 3.347 0.71

PD PIC PE TPI HET Ae P

0.857 0.69 0.699 3.375 0.852 3.733 0.059

PD PIC PE TPI HET Ae P

0.816 0.606 0.526 2.077 0.759 2.916 0.136

PD PIC PE TPI HET Ae P

0.852 0.644 0.406 1.588 0.685 3.352 0.71

0.046 0.167 0.111 0.019 0.12 0.019 0.019 0.028 0.009 0.028 0.435 0.902 0.726 0.463 1.8 0.722 4.021 0.537

mh12zha008

mh13KK-213

mh13KK-218

mh13KK-223

mh13zha002

mh14zha006

mh16KK-302

mh16zha003

mh16zha004

GAGC GATC GCGC GCTT TCTT TGGC TGTT GATT

0.306 0.12 0.019 0.352 0.009 0.019 0.167 0.009

CCA CCG TAG TCA TCG

0.324 0.157 0.25 0.213 0.056

0.009 0.157 0.019 0.009 0.333 0.269 0.204

AGTA GACA GCTT

0.278 0.38 0.343

ACTT GCTC GCTT GTAT GTTT

0.065 0.194 0.204 0.38 0.157

CCG CCT CGG CGT GGT TGG

0.306 0.009 0.056 0.231 0.157 0.241

AATG ATGG ATTG GTAA GTTG GTGG

0.269 0.222 0.093 0.083 0.324 0.009

0.87 0.697 0.526 2.077 0.759 3.836 0.831

PD PIC PE TPI HET Ae P

0.889 0.719 0.526 2.077 0.759 4.156 0.902

CCCC CCCT CGCT CGTC CGTT TCCT TGCT CCTC CGCC PD PIC PE TPI HET Ae P

TGGG CATC CGCG CGGC CGGG CGTC TGCG

PD PIC PE TPI HET Ae P

CCCC CTCC CTCT CTTC CTTT TTCC TTCT TTTC TTTT PD PIC PE TPI HET Ae P

PD PIC PE TPI HET Ae P

0.885 0.707 0.494 1.929 0.741 3.999 0.78

PD PIC PE TPI HET Ae P

0.806 0.587 0.406 1.588 0.685 2.947 0.783

PD PIC PE TPI HET Ae P

0.872 0.708 0.699 3.375 0.852 3.96 0.097

PD PIC PE TPI HET Ae P

0.893 0.728 0.494 1.929 0.741 4.293 0.555

PD PIC PE TPI HET Ae P

0.894 0.718 0.526 2.077 0.759 4.128 0.921

0.019 0.13 0.213 0.102 0.046 0.028 0.259 0.093 0.111 0.936 0.816 0.592 2.455 0.796 6.098 0.335

0.019 0.139 0.167 0.269 0.176 0.056 0.157 0.009 0.009 0.935 0.798 0.627 2.7 0.815 5.591 0.779

Table 2 (continued)

| Locus | Alleles (frequency) | PD | PIC | PE | TPI | HET | Ae | P |
|---|---|---|---|---|---|---|---|---|
| mh16zha006 | AAC 0.056; AAG 0.139; GTC 0.407; GTT 0.398 | 0.818 | 0.586 | 0.379 | 1.5 | 0.667 | 2.886 | 0.908 |
| mh16zha009 | GCCC 0.324; GCCT 0.13; GGTC 0.315; TTCC 0.231 | 0.845 | 0.674 | 0.699 | 3.375 | 0.852 | 3.643 | 0.047 |
| mh18zha004 | CGCT 0.389; CTCG 0.009; CTTC 0.37; TTCG 0.231 | 0.776 | 0.585 | 0.494 | 1.929 | 0.741 | 2.927 | 0.233 |
| mh18zha005 | GAC 0.13; GAT 0.139; GGC 0.269; GTC 0.093; TAC 0.009; TGC 0.361 | 0.896 | 0.715 | 0.434 | 1.688 | 0.704 | 4.038 | 0.335 |
| mh19zha006 | CATACC 0.241; CGGCCT 0.37; CGGTTT 0.093; CGTACC 0.111; TGGCCT 0.009; TGGTTT 0.176 | 0.894 | 0.715 | 0.627 | 2.7 | 0.815 | 4.048 | 0.345 |
| mh19zha007 | AAGG 0.241; GAAG 0.157; GAGC 0.259; GAGG 0.194; TGGG 0.148 | 0.901 | 0.757 | 0.699 | 3.375 | 0.852 | 4.777 | 0.322 |
| mh19zha009 | CCAGA 0.009; CCCGA 0.009; CTATG 0.296; CTGTG 0.009; GCCGA 0.241; GCCTA 0.083; GTATG 0.009; GTGTG 0.343 | 0.857 | 0.681 | 0.662 | 3 | 0.833 | 3.696 | 0.106 |
| mh21KK-320 | AACA 0.12; AACG 0.315; GACA 0.278; GACG 0.009; GATA 0.194; GGCA 0.056; GGCG 0.028 | 0.896 | 0.731 | 0.558 | 2.25 | 0.778 | 4.3 | 0.956 |
| mh22zha003 | ACC 0.019; ACT 0.194; AGC 0.046; GAC 0.046; GAT 0.333; GGC 0.213; GGT 0.148 | 0.909 | 0.747 | 0.526 | 2.077 | 0.759 | 4.537 | 0.626 |

MP: matching probability; PD: power of discrimination; PIC: polymorphism information content; PE: probability of exclusion; TPI: typical paternity index; HET: observed heterozygosity; Ae: effective number of alleles; P: probability value of the exact test for Hardy–Weinberg equilibrium (P values less than 0.05 are marked in bold).
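As a worked illustration of the parameters listed in this footnote, the sketch below recomputes Ae, TPI and PE for locus mh01zha009 from the values in Table 2. It assumes the commonly used definitions Ae = 1/Σp², TPI = 1/(2(1 − h)) and PE = h²(1 − 2h(1 − h)²), where h is the observed heterozygosity; the source does not spell these formulas out. The function names are illustrative, and the count of 41 heterozygotes among 54 individuals is inferred from the reported HET of 0.759 rather than stated in the text.

```python
def effective_alleles(freqs):
    """Effective number of alleles, assumed here to be Ae = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def typical_paternity_index(h):
    """Typical paternity index, assumed here to be TPI = 1 / (2 * (1 - h))."""
    return 1.0 / (2.0 * (1.0 - h))


def probability_of_exclusion(h):
    """Probability of exclusion, assumed here to be PE = h^2 * (1 - 2*h*(1-h)^2)."""
    hom = 1.0 - h  # observed homozygosity
    return h * h * (1.0 - 2.0 * h * hom * hom)


# Allele frequencies of mh01zha009 as listed in Table 2.
mh01zha009 = [0.111, 0.38, 0.231, 0.046, 0.231]

# Assumption: HET = 0.759 corresponds to 41 heterozygotes out of 54 individuals.
h_obs = 41 / 54

ae = effective_alleles(mh01zha009)
tpi = typical_paternity_index(h_obs)
pe = probability_of_exclusion(h_obs)

print(f"Ae = {ae:.3f}, TPI = {tpi:.3f}, PE = {pe:.3f}")
```

Under these assumptions the three values agree with the Table 2 entries for this locus (3.766, 2.077 and 0.526) to three decimal places.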

parent-child duos and 1000 non-biological parent-child duos, and the results are shown in Table 3. The thresholds for identifying the relationship between the alleged parent and the child were set as t1 and t2. If the Log10CPI was larger than t1, the alleged parent was considered the biological parent, whereas the alleged parent was considered a non-biological parent if the Log10CPI was smaller than t2. If the Log10CPI fell between t2 and t1, the relationship between the alleged parent and the child was uncertain. Even at the thresholds of t1 = 5 and t2 = −5, the PPV and NPV were 1.000. The effectiveness was 0.988 at the thresholds of t1 = 4 and t2 = −4, and only the relationship of 24 simulated non-biological parent-child duos was uncertain.

3.6. Evaluation of the efficiency of this panel for identification of full siblings

We also explored the application of this panel for full sibling identification. We constructed 29 pairs of full siblings and 29 pairs of unrelated individuals from the 94 samples, and then calculated the CFSI of these pairs (shown in Supplementary table 15 and Supplementary table 16). Because the genetic map distances between the loci mh01zha009/mh01zha011/mh01zha012, mh02zha003/mh02zha013, mh02zha008/mh02KK-136, mh05zha005/mh05zha002, mh08zha003/mh08zha008/mh08zha009, mh13KK-213/mh13KK-218/mh13zha002/mh13KK-223, mh16KK-302/mh16zha003/mh16zha004, mh16zha006/mh16zha009, mh18zha004/mh18zha005 and mh19zha006/mh19zha007/mh19zha009 were smaller than 50 cM according to the data of the HapMap project (shown in Supplementary table 17), the FSIs of these loci were calculated according to the linkage correction formulas (shown in Supplementary table 4). The FSIs of the other 10 loci without physical linkage were obtained from the formulas of the ITO method (shown in Supplementary table 3). The allele frequencies of the microhaplotypes were taken from the data of the 54 unrelated individuals in our study, and the subpopulation effect was not considered. The recombination fractions of these loci were calculated by Kosambi's genetic mapping equation (shown in Supplementary table 17) [44]. A total of 36 microhaplotype loci were applied in our panel, and the Log10CFSI for the 29 full sibling pairs ranged from 2.66 to 11.88 after linkage correction. The average Log10CFSI for these full sibling pairs was 7.55 using the 36 microhaplotype loci. In contrast, the Log10CFSI for the 29 pairs of unrelated individuals ranged from −8.30 to −3.82, and the average Log10CFSI for these pairs was −6.45 after linkage correction. These results indicated that the Log10CFSI of full sibling pairs could be completely separated from the Log10CFSI of unrelated pairs using these 36 microhaplotype loci (Fig. 5).

Fig. 3. A box plot of log10CPI of 38 parent–child duos separately by STR loci and microhaplotype loci.

Fig. 4. A box plot of log10CPI of 55 uncle/aunt/grandparent–child duos separately by STR loci and microhaplotype loci.

4. Discussion

With the increasing number of paternity disputes, paternity testing plays a more critical role in forensic DNA analyses. STRs are the most common genetic markers used in this field. However, because of their high mutation rates, applying STR markers to duo paternity tests involving close relatives becomes particularly difficult. For instance, when a close relative is the alleged father, we may detect only a few non-matching loci between the profile of the child and that of the alleged parent, and it is difficult to distinguish them from mutations. To solve this problem, SNPs have been exploited by some experts [45]. The mutation rate of SNPs is low, but their polymorphism is limited [13]. The number of SNPs has to be increased significantly to enhance their power in paternity testing. To reach the polymorphism information content (PIC) of 12–16 STRs, 45–65 SNPs are required [15,16]. Their use in casework is therefore still inconvenient.


Table 3 The effectiveness (system power) of the microhaplotype panel for discriminating the biological father from the close relatives based on the 1000 genomes project.

| t1 (Log10CPI) | t2 (Log10CPI) | Sensitivity | Specificity | PPV | NPV | Effectiveness |
|---|---|---|---|---|---|---|
| 1 | −1 | 1.000 | 1.000 | 1.000 | 1.000 | 1.0000 |
| 2 | −2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.0000 |
| 3 | −3 | 1.000 | 0.994 | 1.000 | 1.000 | 0.997 |
| 4 | −4 | 1.000 | 0.976 | 1.000 | 1.000 | 0.988 |
| 5 | −5 | 0.945 | 0.965 | 1.000 | 1.000 | 0.955 |

Sensitivity: probability of judging biological parents correctly as parents; Specificity: probability of judging non-biological (uncle/aunt/grandparent) parents correctly as non-parents; PPV (Positive predictive value): proportion of subjects correctly judged as parents; NPV (Negative predictive value): proportion of subjects correctly judged as non-parents; Effectiveness(system power): proportion of biological parents correctly judged as parents or non-biological (uncle/aunt/grandparent) parents correctly judged as non-parents.
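The decision rule and the summary statistics defined above can be sketched as follows. This is a minimal illustration on made-up Log10CPI values, not the simulation used in the study; the function names and the toy data are hypothetical, and treating "uncertain" calls as not correct is an assumption (it is consistent with the reported specificity of 0.976 when 24 of 1000 non-biological duos were uncertain).

```python
def classify(log10_cpi, t1, t2):
    """Apply the two-threshold rule: parent above t1, non-parent below t2."""
    if log10_cpi > t1:
        return "parent"
    if log10_cpi < t2:
        return "non-parent"
    return "uncertain"


def summarize(bio, nonbio, t1, t2):
    """Summary statistics for biological and non-biological duos.

    Assumption: an 'uncertain' call counts as not correct for
    sensitivity, specificity and effectiveness.
    """
    bio_calls = [classify(x, t1, t2) for x in bio]
    non_calls = [classify(x, t1, t2) for x in nonbio]
    tp = bio_calls.count("parent")        # biological, judged as parent
    fn = bio_calls.count("non-parent")    # biological, judged as non-parent
    tn = non_calls.count("non-parent")    # non-biological, judged as non-parent
    fp = non_calls.count("parent")        # non-biological, judged as parent
    return {
        "sensitivity": tp / len(bio),
        "specificity": tn / len(nonbio),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        "effectiveness": (tp + tn) / (len(bio) + len(nonbio)),
    }


# Hypothetical Log10CPI values, for illustration only.
bio_toy = [9.2, 7.5, 6.1, 4.6]
nonbio_toy = [-8.3, -6.0, -4.4, -3.1]

stats = summarize(bio_toy, nonbio_toy, t1=4, t2=-4)
print(stats)

# Cross-check against the t1 = 4, t2 = -4 row of Table 3:
# 1000 correct biological calls and 976 correct non-biological calls out of 2000.
table3_effectiveness = (1000 + 976) / 2000
print(table3_effectiveness)
```

In the toy data one non-biological duo (−3.1) falls between the thresholds, so specificity and effectiveness drop while PPV and NPV stay at 1, which mirrors the pattern in the lower rows of Table 3.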

Microhaplotypes are novel genetic markers proposed by Kidd et al. to supplement the forensic marker system. The rates of mutation and recombination within microhaplotypes are all significantly lower than the mutation rate of STRs [46]. Therefore, if only one non-matching locus is observed, we consider it to be caused by a close relative in paternity testing rather than by a mutation. If two or more mismatched loci are found in a disputed paternity case, we believe more strongly that the non-matching loci originated from a close relative. Chen’s laboratory previously reported that some microhaplotype loci used in kinship analysis have high polymorphism, but the Ho of these loci is still lower than that of STRs [42]. It is therefore necessary to find new microhaplotype loci with higher polymorphism for paternity testing involving close relatives. Moreover, the MiSeq PE300 platform can sequence DNA fragments ≤600 bp, making it feasible to sequence a microhaplotype locus containing more SNPs [22]. A “nonbinary SNP” refers to a SNP locus with more than two alleles [47]. A nonbinary SNP has higher discrimination power and exclusion power than a binary SNP, and it also shows significantly lower mutation rates than STR loci [48]. In our previous study, we confirmed that nonbinary SNPs are useful supplementary tools for mixture detection and personal identification [21]. Chen’s previously reported study showed that the mean Ae value of 11 microhaplotypes containing binary SNPs used in kinship analysis was 3.3, whereas the 30 microhaplotypes with nonbinary SNPs in this study have higher polymorphism (harmonic mean Ae = 3.8) [42]. Hence, with higher polymorphism and lower mutation rates, microhaplotypes containing nonbinary SNPs are appropriate supplementary genetic markers for paternity testing. In the present study, we developed and analyzed a panel of 36 microhaplotype loci, most of which included nonbinary SNPs and, thus, generated relatively high heterozygosity and polymorphism. They were distributed on 16 different autosomal chromosomes. By analyzing 54 unrelated individuals, we calculated the forensic parameters of the 36 microhaplotypes to assess their application in forensic science. A total of 222 alleles were observed. The microhaplotype loci possessed high PD, TPI, PIC and Ho, ranging from 0.776 to 0.940, 1.174 to 3.857, 0.572 to 0.836 and 0.574 to 0.870, respectively. The CPD and CPE were 0.99999999999999999999999999999999799 and 0.999999999999548. The combined PER (uncle/aunt/grandparent) of our panel was 0.999999993 (> 0.9999), indicating that our panel was effective in preventing the misinterpretation of close relatives as biological parents. These results were similar to those of the CHB and CHS populations, which supported that our 36 microhaplotypes had high polymorphism in the Chinese Han population. Furthermore, we calculated the CPI of the 38 parent–child duos by the 36 microhaplotype loci and the 19 STR loci. For the 38 duos, the CPIs calculated using the microhaplotype panel were, in general, higher than those using the Goldeneye 20A kit because of the higher polymorphism and larger number of loci in our panel. Combining the STR and microhaplotype panels improved the CPIs for the 38 duos considerably. CPIs were also computed between a close relative and a child. For the 55 non-biological parent-child duos analyzed by the 19 STR loci, the CPIs of nine duos were > 10^−4, so it was difficult to determine the parent-child relationship of these duos. In contrast, the CPI using the 36 microhaplotype loci was < 10^−4 for 51 non-biological parent-child duos, owing to the lower mutation rate of microhaplotype loci. However, these relative duos came from only six unrelated extended families, and some of them shared the same alleged parent or alleged child, so they could not represent the distribution of the CPI in the whole population. A simulation study was therefore conducted. The efficiency of excluding close relatives was evaluated by analyzing the parameters of 2000 simulated pairs, and the effectiveness was 0.988 at the thresholds of t1 = 4 and t2 = −4. Our microhaplotype panel therefore had high exclusion power and was not subject to interference from a close relative. Finally, we evaluated the efficiency of this panel for the identification of full siblings. Among the 29 full sibling pairs, the average Log10CFSI was 7.55 after linkage correction. Among the 29 pairs of unrelated individuals, the average Log10CFSI was −6.45 after linkage correction. The results suggested that this panel might be a valuable tool for full sibling identification. Our study shows that these 36 microhaplotypes are useful genetic markers in paternity testing, especially when close relatives are involved. However, we have to acknowledge that a sample of 54 unrelated Chinese Han individuals is limited; more populations need to be studied, and further research is needed in the future.

Fig. 5. A scatter plot for log10CFSI of 29 pairs of full siblings and 29 pairs of unrelated individuals using microhaplotype loci.

5. Conclusions

Microhaplotypes proposed by Kidd and colleagues are novel genetic markers for paternity testing and human identification. We selected 30 nonbinary-SNP-microhaplotypes and developed a panel containing these loci. This panel had high Ho and PIC. We also evaluated its effectiveness in preventing interference from close relatives. The parenthood of 38 parent–child duos was confirmed. The parenthood of 55 uncle/aunt/grandparent–child duos was excluded except for 4 cases in which we could not draw a conclusion. The effectiveness in the simulation study was 0.988 at the thresholds of t1 = 4 and t2 = −4. The microhaplotypes could also distinguish full siblings from unrelated individuals. These results demonstrated that our panel was highly efficient for paternity testing and has potential value for kinship analyses and other forensic applications.

Declaration of Competing Interest

None.

Acknowledgments

We are grateful to all volunteers who provided the samples used in this study. We are also grateful to Microanaly Gene Technologies Co., Ltd for its technical guidance. This work was funded by the National Natural Science Foundation of China (grant number 81871533), the Natural Science Foundation of Hunan Province (grant number 2017JJ3422), the Shanghai Key Laboratory of Forensic Medicine Open Project (grant number KF1815) and the postdoctoral fund of Central South University (Postdoctor Ying Liu, No. 220494).

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.fsigen.2020.102255.

References

[1] I.A. Pretty, D.P. Hildebrand, The forensic and investigative significance of reverse paternity testing with absent maternal sample, Am. J. Forensic Med. Pathol. 26 (2005) 340–342.
[2] J.C. Lee, L.C. Tsai, P.C. Chu, Y.Y. Lin, C.Y. Lin, T.Y. Huang, Y.J. Yu, A. Linacre, H.M. Hsieh, The risk of false inclusion of a relative in parentage testing - an in silico population study, Croat. Med. J. 54 (2013) 257–262.
[3] I. Gornik, M. Marcikic, M. Kubat, D. Primorac, G. Lauc, The identification of war victims by reverse paternity is associated with significant risks of false inclusion, Int. J. Legal Med. 116 (2002) 255–257.
[4] I. Birus, M. Marcikić, D. Lauc, S. Dzijan, G. Lauc, How high should paternity index be for reliable identification of war victims by DNA typing? Croat. Med. J. 44 (2003) 322–326.
[5] N. Von Wurmb-Schwark, V. Mályusz, E. Simeoni, et al., Possible pitfalls in motherless paternity analysis with related putative fathers, Forensic Sci. Int. 159 (2006) 92–97.
[6] M. Dogan, K. Murat Canturk, R. Emre, et al., Demonstration of false inclusion risks of duo parentage analyses in the Turkish population in light of parentage acceptance criteria, Aust. J. Forensic Sci. 49 (2017) 326–331.
[7] M. Dogan, U. Kara, R. Emra, W.K. Fung, K.M. Canturk, Two brothers’ alleged paternity for a child: who is the father, Mol. Biol. Rep. 42 (2015) 1025–1027.
[8] M. Gymrek, A genomic view of short tandem repeats, Curr. Opin. Genet. Dev. 44 (2017) 9–16.
[9] J.M. Butler, Genetics and genomics of core short tandem repeat loci used in human identity testing, J. Forensic Sci. 51 (2006) 253–265.
[10] Y. Lai, F. Sun, The relationship between microsatellite slippage mutation rate and the number of repeat units, Mol. Biol. Evol. 20 (2003) 2123–2131.
[11] C. Børsting, J.J. Sanchez, H.E. Hansen, A.J. Hansen, H.Q. Bruun, N. Morling, Performance of the SNPforID 52 SNP-plex assay in paternity testing, Forensic Sci. Int. Genet. 2 (2008) 292–300.
[12] T. Schwark, P. Meyer, M. Harder, J.H. Modrow, N. von Wurmb-Schwark, The SNPforID assay as a supplementary method in kinship and trace analysis, Transfus. Med. Hemother. 39 (2012) 187–193.
[13] K.K. Kidd, A.J. Pakstis, W.C. Speed, Developing a SNP panel for forensic identification of individuals, Forensic Sci. Int. 164 (2006) 20–32.
[14] J.J. Sanchez, C. Phillips, C. Børsting, A multiplex assay with 52 single nucleotide polymorphisms for human identification, Electrophoresis 27 (2006) 1713–1724.
[15] A.A. Westen, A.S. Matai, J.F. Laros, H.C. Meiland, Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples, Forensic Sci. Int. Genet. 3 (2009) 233–241.
[16] R. Chakraborty, D.N. Stivers, B. Su, Y. Zhong, B. Budowle, The utility of short tandem repeat loci beyond human identification: implications for development of new DNA typing systems, Electrophoresis 20 (1999) 1682–1696.
[17] K.K. Kidd, A.J. Pakstis, W.C. Speed, R. Lagacé, J. Chang, S. Wootton, E. Haigh, J.R. Kidd, Current sequencing technology makes microhaplotypes a powerful new type of genetic marker for forensics, Forensic Sci. Int. Genet. 12 (2014) 215–224.
[18] K.K. Kidd, W.C. Speed, Criteria for selecting microhaplotypes: mixture detection and deconvolution, Investig. Genet. 28 (6) (2016) 1.
[19] K.K. Kidd, W.C. Speed, A.J. Pakstis, D.S. Podini, R. Lagacé, J. Chang, S. Wootton, E. Haigh, U. Soundararajan, Evaluating 130 microhaplotypes across a global set of 83 populations, Forensic Sci. Int. Genet. 29 (2017) 29–37.
[20] R.K. Ravi, K. Walton, M. Khosroheidari, MiSeq: A Next Generation Sequencing Platform for Genomic Analysis, Methods Mol. Biol. 1706 (2018) 223–232.
[21] L. Zha, L. Yun, P. Chen, H. Luo, J. Yan, Y. Hou, Exploring of tri-allelic SNPs using pyrosequencing and the SNaPshot methods for forensic application, Electrophoresis 33 (2012) 841–848.
[22] Illumina, Miseq System, (2018) Guide (15027617 v04), https://support.illumina.com/downloads/miseq_system_user_guide_15027617.html . pdf.
[23] P.H. Sudmant, T. Rausch, E.J. Gardner, et al., An integrated map of structural variation in 2,504 human genomes, Nature 526 (2015) 75–81.
[24] F. Gao, C. Ming, W. Hu, H. Li, New software for the fast estimation of population recombination rates (FastEPRR) in the genomic era, G3 (Bethesda) 6 (2016) 1563–1571.
[25] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
[26] K.K. Kidd, Proposed nomenclature for microhaplotypes, Hum. Genomics 10 (2016) 16.
[27] B. De Wilde, S. Lefever, W. Dong, J. Dunne, S. Husain, S. Derveaux, J. Hellemans, J. Vandesompele, Target enrichment using parallel nanoliter quantitative PCR amplification, BMC Genomics 15 (2014) 184.
[28] F. Zhao, X.Y. Wu, G.Q. Cai, C.C. Xu, The application of modified-powerstates software in forensic biostatistics, Chin J Forensic Med 8 (2003) 297–298.
[29] L. Excoffier, H. Lischer, Arlequin suite version 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol Ecol Resources 10 (2010) 564–567.
[30] Y.Y. Shi, L. He, SHEsis, a powerful software platform for analyses of linkage disequilibrium, haplotype construction, and genetic association at polymorphism loci, Cell Res. 15 (2005) 97–98.
[31] W.K. Fung, Y.Q. Hu, Statistical Forensics: Theory, Methods and Computation, Wiley Blackwell, 2008.
[32] H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler Transform, Bioinformatics 25 (2009) 1754–1760.
[33] G.A. Auwera, M.O. Carneiro, C. Hartl, From FastQ Data to high‐confidence variant calls: the genome analysis toolkit best practices pipeline, Curr Prot Bioinform 43 (2013) 1–33.
[34] P. Edge, V. Bafna, V. Bansal, HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies, Genome Res. 116 (2016) 801–912.
[35] D.W. Gjertson, C.H. Brenner, M.P. Baur, ISFG: recommendations on biostatistics in paternity testing, Forensic Sci. Int. Genet. 1 (2007) 223–231.
[36] H. Fan, J.Y. Chu, A brief review of short tandem repeat mutation, Genomics Proteomics Bioinformatics 5 (2007) 7–14.
[37] X. Zou, Y. Li, P. Li, Q. Nie, T. Wang, Genetic polymorphisms for 19 autosomal STR loci of Chongqing Han ethnicity and phylogenetic structure exploration among 28 Chinese populations, Int. J. Legal Med. 131 (2017) 1539–1542.
[38] M.W. Nachman, S.L. Crowell, Estimate of the mutation rate per nucleotide in humans, Genetics 156 (2000) 297–304.
[39] H.L. Lu, Q.G. Yang, ITO method to calculate the chance of blood relationship between two individuals, Chinese J Forensic Med 17 (2002) 188–191.
[40] J.A. Bright, J.M. Curran, J.S. Buckleton, Relatedness calculations for linked loci incorporating subpopulation effects, Forensic Sci. Int. Genet. 7 (2013) 380–383.
[41] S.M. Leal, K. Yan, B. Müller-Myhsok, SimPed: a simulation program to generate haplotype and genotype data for pedigree structures, Hum. Hered. 60 (2005) 119–122.
[42] J. Zhu, P. Chen, S. Qu, Y. Wang, H. Jian, S. Cao, Y. Liu, R. Zhang, M. Lv, W. Liang, L. Zhang, Evaluation of the microhaplotype markers in kinship analysis, Electrophoresis 40 (2019) 1091–1095.
[43] N. Morling, R.W. Allen, A. Carracedo, H. Geada, F. Guidet, Paternity Testing Commission of the International Society of Forensic Genetics: recommendations on genetic investigations in paternity cases, Forensic Sci. Int. 129 (2002) 148–157.
[44] D.D. Kosambi, The estimation of map distance from recombination values, Ann. Eugen. 12 (1944) 172–175.
[45] S.K. Mo, Z.L. Ren, Y.R. Yang, Y.C. Liu, J.J. Zhang, H.J. Wu, Z. Li, X.C. Bo, S.Q. Wang, J.W. Yan, M. Ni, A 472-SNP panel for pairwise kinship testing of second-degree relatives, Forensic Sci. Int. Genet. 34 (2018) 178–185.
[46] F. Oldoni, K.K. Kidd, D. Podini, Microhaplotypes in forensic genetics, Forensic Sci. Int. Genet. 38 (2019) 54–69.
[47] Y. Liu, H. Liao, Y. Liu, J. Guo, Developing a new nonbinary SNP fluorescent multiplex detection system for forensic application in China, Electrophoresis 38 (2017) 1154–1162.
[48] C. Phillips, J. Amigo, Á Carracedo, M.V. Lareu, Tetra-allelic SNPs: informative forensic markers compiled from public whole-genome sequence data, Forensic Sci. Int. Genet. 19 (2015) 100–106.
