A 472-SNP panel for pairwise kinship testing of second-degree relatives


Accepted Manuscript

Title: A 472-SNP panel for pairwise kinship testing of second-degree relatives

Authors: Shao-Kang Mo, Zi-Lin Ren, Ya-Ran Yang, Ya-Cheng Liu, Jing-Jing Zhang, Hui-Juan Wu, Zhen Li, Xiao-Chen Bo, Sheng-Qi Wang, Jiang-Wei Yan, Ming Ni

PII: S1872-4973(18)30121-2
DOI: https://doi.org/10.1016/j.fsigen.2018.02.019
Reference: FSIGEN 1860

To appear in: Forensic Science International: Genetics

Received date: 16-10-2017
Revised date: 22-2-2018
Accepted date: 25-2-2018

Please cite this article as: Shao-Kang Mo, Zi-Lin Ren, Ya-Ran Yang, Ya-Cheng Liu, Jing-Jing Zhang, Hui-Juan Wu, Zhen Li, Xiao-Chen Bo, Sheng-Qi Wang, Jiang-Wei Yan, Ming Ni, A 472-SNP panel for pairwise kinship testing of second-degree relatives, Forensic Science International: Genetics https://doi.org/10.1016/j.fsigen.2018.02.019 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

A 472-SNP panel for pairwise kinship testing of second-degree relatives


Shao-Kang Mo1,5, Zi-Lin Ren1, Ya-Ran Yang2, Ya-Cheng Liu3, Jing-Jing Zhang4, Hui-Juan Wu4, Zhen Li1, Xiao-Chen Bo1, Sheng-Qi Wang1, Jiang-Wei Yan2,6*, and Ming Ni1*

1 Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China.
2 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
3 Department of Genetics, Beijing Tongda Shoucheng Institute of Forensic Science, Beijing 100192, China.
4 Department of Biotechnology, Beijing Center for Physical and Chemical Analysis, Beijing 100089, China.
5 Department of Reproductive Center, General Hospital of Lanzhou Military Region, Lanzhou 730050, China.
6 University of Chinese Academy of Sciences, Beijing, 100049, China.

* Corresponding authors: Jiang-Wei Yan (e-mail: [email protected], telephone: +861064097964) and Ming Ni (e-mail: [email protected], telephone: +8610669302421).

Email addresses for other authors: Shao-Kang Mo ([email protected]), Zi-Lin Ren ([email protected]), Ya-Ran Yang ([email protected]), Ya-Cheng Liu ([email protected]), Jing-Jing Zhang ([email protected]), Hui-Juan Wu ([email protected]), Zhen Li ([email protected]), Xiao-Chen Bo ([email protected]) and Sheng-Qi Wang ([email protected]).

Highlights

• A new 472-SNP panel for kinship analysis is proposed and validated by MPS.
• SNP2kin could distinguish 2nd-degree relatives from the unrelated.
• The testing power of ~6.45 SNPs is equivalent to that of one STR for 2nd-degree relatives.

Abstract

Kinship testing based on genetic markers, such as forensic short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs), has valuable practical applications. Paternity and first-degree relationships can be accurately identified with the commonly used forensic STRs and reported SNP markers. However, second-degree and more distant relationships remain challenging. Although ~10^5–10^6 SNPs can be used to estimate relatedness of higher degrees, genome-wide genotyping and analysis may be impractical for forensic use. With the rapid growth of human genome data sets, it is worthwhile to explore additional markers, especially SNPs, for kinship analysis. Here, we report an autosomal SNP panel consisting of 342 SNPs selected from >84 million SNPs and 131 SNPs from previous systems. We genotyped these SNPs in 136 Chinese individuals by multiplex amplicon massively parallel sequencing (MPS) and performed pairwise gender-independent kinship testing. The specificity and sensitivity of these SNPs for distinguishing second-degree relatives from unrelated individuals were 99.9% and 100%, respectively, compared with 99.9% and 53.7% for 19 commonly used forensic STRs. Moreover, the specificity increased to 100% with the combined use of these STRs and SNPs. The 472-SNP panel could also greatly facilitate discrimination among different relationships. We estimated that the power of ~6.45 SNPs was equivalent to that of one forensic STR in the scenario of a 2nd-degree relative pedigree. Altogether, we propose a panel of 472 SNP markers for kinship analysis, which could be an important supplement to current forensic STRs for solving the problem of second-degree relative testing.

Keywords

Kinship testing, Genetic markers, SNP2kin, Single nucleotide polymorphism (SNP), Short tandem repeat (STR), Second-degree relatives

1. Introduction

Kinship testing, or accurate inference of relationships among individuals, is a valuable application of human genetics. It has a critical role in resolving inheritance disputes, searching for missing persons, criminal investigations, and identifying victims of disasters or war. Kinship testing is based on carefully selected DNA markers. Forensic short tandem repeat (STR) loci have been the gold-standard markers for decades, as in the U.S. Combined DNA Index System (CODIS; 13 STR loci) and the Extended European Standard Set (17 STR loci) [1]. STR markers are highly polymorphic in human populations and thus have high per-locus discrimination power in kinship testing. Reliable discrimination in human identity (HID) and paternity testing can be achieved with the 15–19 commonly used forensic STRs [1,2]. Close relatives such as full siblings can also be distinguished from unrelated persons by typing additional STR markers [3–5]. However, for identifying more distant relatives (complex kinship testing), such as second-degree relatives (avuncular, grandparent-child, and half-sibling), the power of currently available STR markers is insufficient. One way to enhance discrimination power in complex kinship testing is to include multiple known relatives when testing a person (e.g., using two or more full siblings to determine an alleged nephew); one can also type additional DNA markers, usually Y-chromosomal or mitochondrial markers [6–10]. However, involving more individuals and paternal/maternal lineage-specific markers limits the applicability of kinship testing, and the outcome may remain inconclusive with currently available STR markers.

More markers are needed to increase the power of complex kinship testing. Compared to STRs, SNPs offer far more loci as candidate markers [11–13]. Large public resources such as the 1000 Genomes Project (1KGP) [14,15] and the UK10K Project [16] provide abundant datasets of human SNP genotypes and their population frequencies. Massively parallel sequencing (MPS) and microarray technologies make it convenient to type large numbers of SNP loci in parallel. Nonetheless, the primary shortcoming of SNPs in testing is that they are normally bi-allelic and thus have much lower per-locus discrimination power than STR markers, which usually have seven to 16 alleles. Consequently, SNPs are usually regarded as supplementary to STR markers [12,17–24]. SNPs have low germline mutation rates (~10^-8 compared to 10^-4–10^-3 for STRs) [19,23–25], and SNP genotyping relies on short amplicons. When a testing result is ambiguous due to germline mutation of STRs, or when STR typing of highly degraded DNA fails, typing additional SNP markers is useful [3,23,25]. Currently, the main SNP marker systems for HID and kinship testing include the SNPforID multiplex assay (52 SNPs) [17] and the SNP panel for individual identification (IISNP, 92 SNPs) [18,26,27], both first proposed in 2006. Very recently, Zhang et al. (2017) developed an expanded 273-SNP system designed for the Chinese Han population, including markers selected from SNPforID, IISNP, and the HapMap database. However, complex kinship testing of second-degree relatives remains unsolved.

The main question is how many markers are needed. Phillips et al. suggested that marker sets with medium-scale multiplexing (256–1,000) could be suitable for challenging kinship analyses [25]. In our previous simulation study, we predicted that ~490 SNPs with a minor allele frequency distribution comparable to those of SNPforID and IISNP are required to determine second-degree relatives [29]. It has been estimated that 10–15 STR loci are comparable to ~50 SNPs in paternity testing [17,20,21]. Thus, the power of currently available SNP markers, even in addition to the forensic STRs, is expected to be insufficient for pairwise kinship testing of second-degree relatives. Additional SNP markers should be explored and validated in complex kinship testing.

Of note, distant relatives can also be distinguished by using a large number (~10^4–10^6) of SNP loci, although such SNPs cannot technically be regarded as markers for kinship testing. In a variety of human genetics studies, ensuring exact pedigrees of samples and excluding cryptic relatives are critical [30–33], given the common errors in reported pedigrees and samples [34–38]. Many methods and tools have been proposed to estimate relatedness among individuals by using SNPs typed by microarray or MPS [28,39–49]. With SNPs of this order of magnitude, distant relatives up to the ninth degree are predictable [42,44]. However, until now there have been few efforts to select a specific set of markers, termed a panel, that includes a limited number of markers yet has sufficient power to distinguish second-degree relatives. A panel provides many advantages in actual kinship testing cases compared with the methods applied in genetic research. In particular, genotyping the SNPs in a panel by MPS of multiplex PCR products (amplicon sequencing) has lower requirements for DNA amount (ng or sub-ng) and integrity (amplicon < 200 bp) [28,50] than genome-wide microarray or exome/whole-genome sequencing. That is, it can be applied to degraded DNA samples, which are often encountered in actual cases. The per-sample experimental and analysis cost is also low. Once established, a panel can be used to genotype a large number of individuals to extensively evaluate its power in kinship testing. Typing a specific set of markers for kinship testing is more likely to become a routine approach than methods based on genome-wide genotyping.

In this study, we aimed to develop a SNP panel that is efficient for pairwise gender-independent kinship testing of second-degree relatives. We propose SNP2kin, a 472-SNP panel selected from millions of human SNPs and from previous SNP marker systems. SNP2kin greatly expands the utility of SNP markers for kinship analysis. We genotyped SNP2kin in 136 individuals by amplicon sequencing, and performed kinship testing of 59 pairs of second-degree relatives and 8,281 pairs of unrelated persons. We found that SNP2kin distinguished second-degree relatives from unrelated individuals with high reliability in pairwise testing.

2. Materials and methods

2.1 Selecting SNP markers for kinship testing

The process of candidate SNP selection is shown in Fig. 1. The original SNP dataset included over 84.7 million bi-allelic SNPs deposited in the 1KGP. Multiple criteria on various characteristics of SNPs were applied, as described below.

Genome positions. SNPs at or within 100 kbp up- or downstream of coding/noncoding genes were excluded to avoid the potential influence of selection pressure on SNP population frequencies. Gene annotations were obtained from the GENCODE project (http://www.gencodegenes.org) [51,52]. The distance between two adjacent SNPs was required to be no smaller than 10 kbp.
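The two positional criteria above can be illustrated with a minimal Python sketch. It assumes all positions lie on a single chromosome, that gene annotations are given as (start, end) intervals, and that spacing conflicts are resolved greedily from left to right; the function names and the greedy rule are illustrative assumptions, since the text does not specify how conflicts between neighbouring SNPs were resolved.

```python
def far_from_genes(pos, genes, flank=100_000):
    """True if `pos` is more than `flank` bp away from every (start, end) gene interval."""
    return all(pos < start - flank or pos > end + flank for start, end in genes)


def enforce_spacing(positions, min_gap=10_000):
    """Greedy left-to-right selection (an assumption) keeping SNPs at least `min_gap` bp apart."""
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept


def positional_filter(positions, genes, flank=100_000, min_gap=10_000):
    """Apply the gene-distance filter first, then the minimum-spacing filter."""
    intergenic = [p for p in positions if far_from_genes(p, genes, flank)]
    return enforce_spacing(intergenic, min_gap)
```

For example, with a single hypothetical gene at 100,000–120,000 bp, a SNP at 250,000 bp passes the gene-distance filter while one at 200,000 bp does not.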

Hardy–Weinberg equilibrium (HWE). The HWE status of SNPs was extracted from datasets of genome-wide association studies deposited in the NCBI dbGaP. SNPs were required to be in HWE (p value for HWE testing > 0.05) in at least one dataset.

Heterozygosity. SNPs were filtered by heterozygosity in populations from the 1KGP and the UK10K project. From the 1KGP, we selected 853 unrelated individuals who had a kinship coefficient < 0.0884 when using KING v1.4 with the “--kinship” parameter [41], based on 134 previously proposed autosomal HID SNPs (SNPforID, n = 50; IISNP, n = 88; four SNPs are shared). The unrelated individuals were from African (n = 294), admixed American (n = 120), East Asian (n = 223), and European (n = 216) super-continental populations. Heterozygosity of SNPs was required to be ≥ 0.3 for the population of all 853 individuals, and ≥ 0.2 for each super-continental population. SNPs were further filtered by using heterozygosity data from the UK10K project, which includes 4,000 individuals divided into two subpopulations (ALSPAC, n = 2,000 and TWINSUK, n = 2,000). As individual genotypes were unavailable for examination of potential relatedness, stricter criteria for SNP heterozygosity were used: ≥ 0.4 for the total and ≥ 0.3 for both subpopulations.

Fixation index among populations. The fixation index (FRT) measures the variance of allele frequencies between continental populations [53]. The FRT of each SNP was determined based on genotypes of the 853 unrelated individuals from the 1KGP. SNPs with FRT ≤ 0.03 were retained. The FRT is calculated as

$$F_{RT} = \frac{H_e(T) - H_e(R)}{H_e(T)},$$

where He(i) is the expected heterozygosity of the regional-continental population, i = R, or of the total population, i = T. The regional-continental populations include the East Asian, African, admixed American, and European.
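As a rough illustration of the heterozygosity and fixation-index filters, the sketch below computes the expected heterozygosity of a bi-allelic SNP as 2p(1 − p) and FRT as (He(T) − He(R)) / He(T). Two points are assumptions rather than details from the text: heterozygosity is derived from allele frequencies instead of observed genotypes, and He(R) is taken as the sample-size-weighted mean over the regional populations. The function names are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a bi-allelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fixation_index(freqs, sizes):
    """F_RT = (He(T) - He(R)) / He(T) from regional allele frequencies and sample sizes.

    He(R) is the size-weighted mean of the regional values (an assumption).
    """
    total = sum(sizes)
    p_total = sum(p * n for p, n in zip(freqs, sizes)) / total
    he_total = expected_heterozygosity(p_total)
    he_regional = sum(expected_heterozygosity(p) * n for p, n in zip(freqs, sizes)) / total
    if he_total == 0.0:
        return 0.0  # monomorphic SNP: no differentiation can be measured
    return (he_total - he_regional) / he_total


def passes_1kgp_filters(freqs, sizes, min_total=0.3, min_regional=0.2, max_frt=0.03):
    """Apply the 1KGP thresholds quoted above to one SNP."""
    total = sum(sizes)
    p_total = sum(p * n for p, n in zip(freqs, sizes)) / total
    if expected_heterozygosity(p_total) < min_total:
        return False
    if any(expected_heterozygosity(p) < min_regional for p in freqs):
        return False
    return fixation_index(freqs, sizes) <= max_frt
```

With the four 1KGP group sizes (294, 120, 223, 216), a SNP with regional frequencies 0.5, 0.45, 0.55 and 0.5 passes, whereas a SNP with a frequency of 0.1 in one region fails the regional heterozygosity threshold.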

Linkage disequilibrium (LD) among SNPs. We calculated the LD of retained SNPs by using PLINK v1.07 to estimate pairwise r2 values with parameters “--r2 --matrix” [39]. The calculation was first conducted pairwise between candidate SNPs and the 134 previously proposed autosomal HID SNPs. Candidate SNPs were excluded if they had r2 ≥ 0.01 with ≥ 1 previous HID SNP.

Pairwise LD values among the remaining candidate SNPs were also obtained. Two SNPs were considered to be in LD if their r2 was ≥ 0.01. Thus, each SNP was a node in a network, with a degree denoting the number of other SNPs in LD with it. Subsequently, SNPs were excluded one after another in descending order of their degree in the network until no pair of SNPs had r2 ≥ 0.01.

Experimental validation. Retained SNPs, together with previously identified HID SNPs included in 1KGP (n = 134), comprised a panel for multiplex-PCR primer design and genotyping in actual samples. Only SNP sites for which primers could be successfully designed and which failed amplification or sequencing in no more than 20% of samples were retained in SNP2kin. Details of experimental techniques are described in the “Multiplex amplification of SNPs and sequencing” section.
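The degree-based LD pruning described under “Linkage disequilibrium (LD) among SNPs” can be sketched as a greedy procedure on the LD network. This is one possible reading, not the authors' code: degrees are recomputed after every removal, and ties are broken by SNP name; both choices are assumptions, as is the input format (a dictionary from SNP pairs to r2 values).

```python
def prune_ld(r2, threshold=0.01):
    """Remove highest-degree SNPs one at a time until no pair has r2 >= threshold.

    `r2` maps (snp_a, snp_b) tuples to pairwise r2 values. Returns the retained SNPs.
    """
    neighbours = {}
    for (a, b), value in r2.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if value >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    while neighbours:
        # SNP with the largest current degree; ties broken by name (an assumption).
        snp = max(neighbours, key=lambda s: (len(neighbours[s]), s))
        if not neighbours[snp]:
            break  # no remaining pair is in LD
        for other in neighbours.pop(snp):
            neighbours[other].discard(snp)
    return set(neighbours)
```

In a toy network where SNP "a" is in LD with both "b" and "c", removing "a" alone resolves all LD, so "b", "c" and any unlinked SNP are retained.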

2.2 Sample collection

Whole-blood or FTA card (Whatman Bioscience, Cambridge, UK) blood samples were collected from 136 Chinese individuals in a population of 77 unrelated families in Beijing, China. DNA was extracted from whole-blood samples by using the PureLink Genomic DNA kit (Invitrogen, MA, USA) following the manufacturer’s instructions. DNA was stored at -80 ℃ and quantified by quantitative real-time PCR (qPCR) with the Quantifiler Trio DNA Quantification Kit (Applied Biosystems, MA, USA) before use. For FTA card samples, a punch of each sample was used for multiplex amplification directly (without DNA extraction).

2.3 Multiplex amplification of SNPs and sequencing

A set of 497 SNPs (Supplementary Table 1) was used as input for an online multiplex PCR primer design system, Ion AmpliSeq Designer (https://www.ampliseq.com, Thermo Fisher Scientific, MA, USA), with the design model selected for formalin-fixed paraffin-embedded DNA samples (germline and somatic). Primers (Supplementary Table 2) were synthesized by Invitrogen. Multiplex amplification for DNA samples (1.5 ng input) or FTA card samples (1.0 mm punch) was performed in a pool by using the mix in the Ion AmpliSeq Library Kit 2.0 (Life Technologies) with a thermal cycling sequence of 2 min at 99 ℃, 18 cycles of 15 s at 99 ℃ and 4 min at 60 ℃, and a hold at 10 ℃ for up to 60 min. PCR products were prepared for MPS libraries by using the Ion AmpliSeq Library Kit 2.0 according to the manufacturer’s instructions, with the small modification of increasing the number of amplification cycles to 21 for FTA samples. Libraries were quantified by qPCR with the Ion Library Quantitation Kit (Life Technologies) on a 7500 Real-Time PCR System (Applied Biosystems). The Ion Personal Genome Machine (PGM, Thermo Fisher Scientific) MPS platform was used to generate ~160-bp single-end reads with the Ion PGM Template OT2 200 Kit, Ion PGM Sequencing 200 Kit v2, and Ion 318 Chip v2.

2.4 SNP calling

Two pipelines were implemented for SNP genotyping from MPS reads. The hg19 (GRCh37) human assembly was used as a reference [54]. First, SNPs were called by using the Torrent Variant Caller v4.4.8 plugin in Torrent Suite v4.4.2 (installed in PGM, Life Technologies) with the settings “Generic – PGM – Germ Line – Low Stringency”. SNPs were further filtered to require a genotype quality value ≥ 30 in the output Variant Call Format files. Second, raw reads were analyzed by using BWA and SAMtools [55–57]. Trimmomatic v0.33 was used for read quality control with parameters “maxinfo:40:0.8 crop:150 minlen:40 headcrop:2” [58]. Clean reads were aligned to the human genome by using BWA-MEM v0.7.12 as single-end reads with default settings. SAMtools was used to generate the “mpileup” file from the SAM-formatted alignment files. The mpileup files were used as inputs for BCFtools v1.2 (http://samtools.github.io/bcftools) for SNP calling with parameters “-mvO v”. For each individual, SNPs with consistent genotypes called by both pipelines were retained in subsequent analyses.
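The final concordance step can be pictured with a short Python sketch. It assumes each pipeline's calls for one individual have already been parsed into a dictionary from SNP identifier to an unordered pair of alleles; the function name, the data layout and the identifiers in the test are hypothetical.

```python
def concordant_genotypes(calls_a, calls_b):
    """Keep only SNPs whose genotype is identical in both pipelines.

    Each argument maps a SNP identifier to a pair of alleles, e.g. ("A", "G").
    Allele order is ignored, so ("A", "G") and ("G", "A") count as concordant.
    """
    kept = {}
    for snp_id, genotype in calls_a.items():
        other = calls_b.get(snp_id)
        if other is not None and sorted(genotype) == sorted(other):
            kept[snp_id] = tuple(sorted(genotype))
    return kept
```

SNPs called by only one pipeline, or called differently by the two, are dropped for that individual, which is why the number of usable SNPs can vary among samples.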

2.5 STR genotyping

Forensic STRs were genotyped by using the PowerPlex 21 kit (Promega, WI, USA) and the GoldenEye 20A kit (Beijing PeopleSpot Inc., Beijing, China). The 19 STR markers shared by both kits were used for the following analyses: D2S1338, D3S1358, D5S818, D6S1043, D7S820, D8S1179, D12S391, D13S317, D16S539, D18S51, D19S433, D21S11, CSF1PO, FGA, Penta D, Penta E, TH01, TPOX, and vWA. PCR amplification was performed according to the manufacturers’ recommendations. Analysis of amplified PCR products was performed on the ABI 3130xl Genetic Analyzer (Applied Biosystems) capillary electrophoresis instrument. Data were analyzed using GeneMapper ID 3.2 software (Applied Biosystems).

2.6 Likelihood ratio of kinship testing

The software MERLIN (v1.1.2), which implements the Lander-Green algorithm, was employed to calculate the likelihood ratio (LR) accounting for linkage in kinship testing with parameters “--likelihood --perFamily” [59]. For a person within a given pedigree, the LR model compares likelihood values (L) based on genotypes of autosomal markers (G) under two alternative hypotheses: H0 (the test person is the specific member in the relationship pedigree) and H1 (the test person is unrelated). Namely, LR = L(G|H0)/L(G|H1). In this study, we used LR'i, j to denote the likelihood ratio that an individual is in one relationship i (hypothesis Hi) against another relationship j (Hj). Namely, LR'i, j = L(G|Hi)/L(G|Hj) = LRi/LRj, where LRi or LRj is the LR of Hi or Hj against H1. To evaluate the effect of genetic linkage on testing, the LR ignoring genetic linkage among markers was also calculated as previously described [29].

Allele frequencies of the 19 STR loci used for LR calculation were from a Chinese population [60], and those of SNP2kin were from the East Asian population in 1KGP (Supplementary Table 1). Genetic distances (centiMorgan) of STRs were obtained from the reference data of ILIR (v1.02) [61], and genetic distances of SNPs were from the HapMap data (ftp://ftp.ncbi.nlm.nih.gov/hapmap/, Supplementary Table 6).

2.7 Availability of data and materials

The accession numbers for raw reads of the 136 samples reported in the current study are available in the NCBI SRA repository under BioProject accession no. PRJNA347598 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA347598/). All accession numbers (SRS1743256–SRS1743391) of the samples are listed in Supplementary Table 3.
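To make the likelihood ratio of Section 2.6 concrete, the following Python sketch computes a pairwise log10(LR) for bi-allelic SNPs while ignoring linkage, so it is not a substitute for the MERLIN calculation. It uses the standard identity-by-descent coefficients (k0, k1, k2) of each relationship and assumes Hardy-Weinberg proportions, no mutation and no genotyping error. Genotypes are coded as the number of copies (0, 1 or 2) of one allele with population frequency p; all function names are illustrative.

```python
import math

# IBD coefficients (k0, k1, k2): probabilities that a pair shares 0, 1 or 2
# alleles identical by descent at a locus.
UNRELATED = (1.0, 0.0, 0.0)
SECOND_DEGREE = (0.5, 0.5, 0.0)   # half-sibling, avuncular, grandparent-child
FULL_SIBLING = (0.25, 0.5, 0.25)
PARENT_CHILD = (0.0, 1.0, 0.0)
FIRST_COUSIN = (0.75, 0.25, 0.0)


def genotype_prob(g, p):
    """Hardy-Weinberg probability of carrying g copies (0, 1 or 2) of allele A."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[g]


def prob_given_ibd1(g1, g2, p):
    """P(g2 | g1) when the pair shares exactly one allele identical by descent."""
    q = 1.0 - p
    alleles1 = [1] * g1 + [0] * (2 - g1)       # 1 stands for allele A, 0 for allele B
    total = 0.0
    for shared in alleles1:                    # either allele of person 1 is shared, prob. 1/2 each
        for other, freq in ((1, p), (0, q)):   # person 2's other allele comes from the population
            if shared + other == g2:
                total += 0.5 * freq
    return total


def locus_lr(g1, g2, p, k):
    """Single-locus LR of a relationship with IBD coefficients k vs. unrelated."""
    k0, k1, k2 = k
    p_g2 = genotype_prob(g2, p)
    same = 1.0 if g1 == g2 else 0.0
    return k0 + k1 * prob_given_ibd1(g1, g2, p) / p_g2 + k2 * same / p_g2


def log10_lr(genos1, genos2, freqs, k):
    """Sum of per-locus log10(LR) values, treating loci as independent (no linkage)."""
    total = 0.0
    for g1, g2, p in zip(genos1, genos2, freqs):
        lr = locus_lr(g1, g2, p, k)
        if lr == 0.0:
            return float("-inf")   # genotypes incompatible with the relationship
        total += math.log10(lr)
    return total
```

For instance, two individuals who are both homozygous for an allele of frequency 0.5 give a single-locus LR of 1.5 for a second-degree relationship, while opposite homozygotes give 0.5; the per-SNP evidence is small, which is why hundreds of SNPs are needed.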

3. Results

3.1 The 472-SNP panel for kinship testing (SNP2kin)

First, we selected additional SNP markers for kinship testing. Compared to STRs, resources for human SNPs are abundant, and their low per-locus power can be largely compensated by their greater numbers. SNPs can be batch-genotyped by amplicon sequencing, even for extremely low-quality DNA [11]. We integrated public human population genome resources to find additional SNPs for HID use. Resources included the 1KGP (http://www.internationalgenome.org/), NCBI dbGaP (http://www.ncbi.nlm.nih.gov/gap), and the UK10K (http://www.uk10k.org/) project.

Fig. 1. Scheme to determine markers of SNP2kin from candidate SNPs. The input SNP dataset is from the 1KGP and is then filtered by the multiple filters listed at right. Details and cutoffs of these filters are described in the Methods section. The autosomal SNP markers in SNPforID and IISNP were included except three sites (rs10768550, rs2291395, rs9606186) that are located too close (<10 kbp) to other markers. M, million. HWE, Hardy-Weinberg equilibrium. HID, human identity.

With criteria on genome position, population allele heterozygosity, Hardy–Weinberg equilibrium, fixation index among populations, and linkage disequilibrium, we identified 363 additional autosomal SNP markers from among >84.7 million SNPs (Fig. 1; see details in “Methods”). Incorporating these new SNPs with previously proposed autosomal SNP markers in SNPforID and IISNP [12,17,18], we obtained a total of 497 autosomal HID SNPs (Supplementary Table 1). We excluded 25 SNPs, including three SNPs from the IISNP that are too closely located (< 10 kbp) in the human genome, 15 SNPs for which multiplex PCR primer design failed, and seven SNPs with a low genotype calling rate (failed in >20% of samples). Finally, we obtained a 472-SNP panel, named SNP2kin, for kinship testing of second-degree relatives. We previously predicted that a panel containing ~490 SNP markers, comparable in size to the 472-SNP system, would provide reliable testing for pairwise second-degree relationships, including avuncular, half-sibling, and grandparent-child relatives [29]. These three relationships belong to one kinship class and could be considered equivalent in this study. We refer to these relationships as 2nd-degree relatives.

Fig.2. The heterozygosity of the 472 SNPs in SNP2kin. Correlation of SNP heterozygosity according to unrelated individuals from our data (n = 77) and East Asian (EAS, n = 223) populations in the 1KGP. Linear regressions are shown with R2.

Among subjects from 77 Chinese families (Supplementary Table 3), 75.8% of the SNPs in SNP2kin had a heterozygosity > 0.4, and 22.5% had a heterozygosity within 0.2–0.4. SNP heterozygosity in our data was highly correlated with that of the East Asian populations in 1KGP (n = 223, R2 = 0.714, Fig. 2) and weakly correlated with that in the African, admixed American, and European populations (R2 = 0.23–0.48, Supplementary Fig. 1b-c). Nonetheless, 98.3% of the SNPs had a heterozygosity > 0.2 in the other super-continental populations, implying the applicability of SNP2kin for kinship testing in these populations. Nine SNPs from SNPforID (rs10495407, rs2056277, rs1335873, rs1886510, rs2016276, rs1528460, rs740910, rs1024116 and rs2830795) had a heterozygosity < 0.2 in African and/or East Asian populations (Supplementary Table 1). Two SNPs (rs1897859 and rs2368108) had a heterozygosity < 0.1 among individuals typed in this study, but > 0.35 in populations of 1KGP. The inconsistency might be attributed to biased or insufficient sampling of the Chinese population.

3.2 Discrimination of 2nd-degree relatives and unrelated individuals by SNP2kin

We conducted pairwise and gender-independent kinship testing among 136 individuals, including 105 relatives from 46 families (Supplementary Table 4) and 31 unrelated persons. For comparison, besides the SNP markers within SNP2kin, we typed 19 commonly used forensic STR markers for most of the individuals. Based on a given set of markers, we calculated the LR, accounting for genetic linkage among markers, of the hypothesis that a pair were 2nd-degree relatives vs. unrelated. Using SNP2kin, we found that taking linkage into account led to an adjustment of ~3 in the log10(LR) values for the 2nd-degree relatives (mean 8.5, SD = 3.2) compared with those ignoring linkage (mean 5.1, SD = 3.2; linear correlation coefficient, 0.952; Supplementary Fig. 2 and Supplementary Table 5). For the unrelated, although the correlation between the LR accounting for and ignoring linkage was relatively low (linear correlation coefficient, 0.875), 99.15% of the unrelated pairs had both a log10(LR) accounting for linkage and a log10(LR) ignoring linkage < 0.

Fig. 3. Discrimination of 2nd-degree relatives and unrelated controls. The distributions of log10(LR) values of 2nd-degree relatives and the unrelated, obtained by using the 19 forensic STRs (a), SNPforID+IISNP (b), SNP2kin (c), and both SNP2kin and the STRs (d).

In Fig. 3, we show the distributions of log10(LR) values (accounting for linkage among markers unless otherwise specified) for the 2nd-degree relatives and unrelated individuals obtained by using the 19 forensic STRs, the autosomal SNP markers within SNPforID and IISNP, and SNP2kin, respectively. Because it includes many more markers, SNP2kin remarkably improved the testing efficacy compared with the STR-based and previous SNP system-based testing. Discrimination of 2nd-degree relatives from unrelated individuals was infeasible when using the 19 STRs. Only 22.2% (12/54) of 2nd-degree relative pairs had a log10(LR) ≥ 3 (i.e., the genotypes were at least 1000-fold more likely if the individuals were relatives than if they were unrelated), and 40.7% (22/54) had 1 ≤ log10(LR) < 3. The corresponding percentages for markers in SNPforID and IISNP were 27.1% (16/59) and 59.3% (35/59), respectively. Because the proportion of confidently genotyped SNPs among the 472 varied among samples, we included only individual pairs that had ≥400 shared SNPs. The percentages of log10(LR) ≥ 3 and 1 ≤ log10(LR) < 3 for SNP2kin-based testing were 94.9% (56/59) and 5.1% (3/59), respectively. On the other hand, for the unrelated individual pairs, only 0.85% (70/8281) had a SNP2kin-based log10(LR) > 0, compared with 10.0% (470/4679) and 10.5% (872/8281) for the STR-based and SNPforID+IISNP-based testing, respectively.

We further combined SNP2kin and the 19 forensic STRs for kinship testing of 2nd-degree relatives. Intriguingly, we found that the STR+SNP2kin marker set could completely distinguish the 2nd-degree relatives from the unrelated (Fig. 3d). The minimum log10(LR) value for 2nd-degree relatives was 4.07, whereas the maximum value for the unrelated was 1.73. Besides the LR approach, we also examined the power of discrimination by SNPs based on an identical-by-descent approach, and found that it was infeasible (Supplementary Fig. 3).

3.3 System efficacy of markers in 2nd-degree relative scenario

To evaluate the system efficacy of SNP2kin for kinship testing in the scenario of a 2nd-degree relative pedigree, we obtained the sensitivity and specificity for distinguishing the relatives from the unrelated. Compared with the 19 forensic STRs and SNPforID+IISNP, SNP2kin had a remarkably improved testing efficacy (Table 1). At a log10(LR) > 2 threshold for discrimination, SNP2kin had a 99.9% specificity and 100% sensitivity. Although the STRs and SNPforID+IISNP had comparable specificity, their sensitivities were only 53.7% and 50.9%, respectively. The receiver operating characteristic (ROC) curves obtained by using the forensic STRs, SNPforID+IISNP, and SNP2kin are shown in Fig. 4a, and their area under curve (AUC) values were 0.971, 0.982 and 0.99995, respectively. Moreover, when SNP2kin was combined with the STRs in testing, all the 2nd-degree relatives could be distinguished from the unrelated and the AUC value of the ROC curve equaled one. These results suggest that the available forensic STRs supplemented by SNP2kin would be a promising marker system in practical cases of 2nd-degree relative kinship testing.

Markers                   Threshold of log10(LR)   Sensitivity (%)   Specificity (%)   False positivity (%)
Forensic STRs¹            3                        44.44             100.00            0.00
                          2                        53.70             99.89             0.11
                          1                        62.96             98.38             1.62
SNPforID+IISNP²           3                        27.12             99.99             0.01
                          2                        50.85             99.70             0.30
                          1                        86.44             97.81             2.19
SNP2kin                   3                        94.92             99.94             0.06
                          2                        100.0             99.86             0.14
                          1                        100.0             99.72             0.28
SNP2kin+Forensic STRs     3                        100.00            100.00            0.00
                          2                        100.00            100.00            0.00
                          1                        100.00            99.81             0.19

Table 1. Summary of testing efficacy at three thresholds of log10(LR) and by using different sets of markers. ¹ 19 STRs; see section “STR genotyping” for details. ² 131 SNPs; rs2269355, rs938283, rs10768550, rs2291395 and rs9606186 in SNPforID or IISNP are not included because of a lack of population frequency data or too-close distances.
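The quantities in Table 1 follow from thresholding the pairwise log10(LR) values, and the AUC can be obtained by comparing every related pair against every unrelated pair. The Python sketch below illustrates both; it assumes a pair is called "related" when its log10(LR) is strictly greater than the threshold, and the function names are hypothetical.

```python
def sensitivity_specificity(related, unrelated, threshold):
    """Sensitivity and specificity when log10(LR) > threshold is called 'related'."""
    true_pos = sum(1 for s in related if s > threshold)
    true_neg = sum(1 for s in unrelated if s <= threshold)
    return true_pos / len(related), true_neg / len(unrelated)


def auc(related, unrelated):
    """Area under the ROC curve via pairwise comparisons (ties count as one half)."""
    wins = 0.0
    for r in related:
        for u in unrelated:
            if r > u:
                wins += 1.0
            elif r == u:
                wins += 0.5
    return wins / (len(related) * len(unrelated))
```

False positivity is simply one minus the specificity. When the lowest score among relatives exceeds the highest score among unrelated pairs, as reported for the combined STR+SNP2kin set (4.07 vs. 1.73), the AUC equals one.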

Fig. 4. The 2nd-degree relative testing efficacy of markers in the scenario of a 2nd-degree relative pedigree. a The ROC curves for discrimination between 2nd-degree relatives and the unrelated by using the STRs, SNPforID+IISNP and SNP2kin. b The variation of testing efficacy, measured by 1–AUC, with increasing number of STR or SNP markers. The y axis is logarithmically scaled. c Correlation of the forensic STR number and the number of SNPs with equivalent testing efficacy. The equation and R2 of linear regression are shown.

By using AUC as a metric, we found that the testing efficacy increased exponentially as the number of SNP or STR markers increased (Fig. 4b). Moreover, we could quantitatively compare the power of SNP markers within SNP2kin and the forensic STRs in the scenario of a 2nd-degree relative pedigree. We estimated that the power of approximately 6.45 SNPs was equivalent to that of one forensic STR in 2nd-degree relative testing (Fig. 4c). Thus SNP2kin, which includes 472 SNP markers, had a power comparable with that of ~73 hypothetical forensic STRs, which is beyond the number of currently available STR markers. This SNP-to-STR ratio was larger than the estimate of ~50 SNPs being comparable to 10–15 STR loci in paternity testing (i.e., roughly 3.3–5 SNPs per STR) [17,20,21], suggesting that the per-locus power of SNP markers relative to STRs is smaller when testing more distant relationships.
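The arithmetic behind these equivalence figures can be checked in a few lines of Python. The variable names are illustrative; the numbers are those quoted in the text.

```python
SNPS_PER_STR_SECOND_DEGREE = 6.45   # estimated from the regression in Fig. 4c
PANEL_SIZE = 472                    # number of SNPs in SNP2kin

# Number of hypothetical forensic STRs with power comparable to the whole panel.
equivalent_strs = PANEL_SIZE / SNPS_PER_STR_SECOND_DEGREE

# Paternity-testing estimate: ~50 SNPs comparable to 10-15 STR loci.
paternity_ratio_low = 50 / 15
paternity_ratio_high = 50 / 10

print(f"{equivalent_strs:.1f} STR equivalents")
print(f"{paternity_ratio_low:.1f}-{paternity_ratio_high:.1f} SNPs per STR in paternity testing")
```

Both ends of the paternity range lie below 6.45, which is the basis for the statement that SNPs are relatively less powerful per locus for more distant relationships.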

3.4 Discrimination among different relationships

Next, we investigated the efficacy of SNP2kin and SNP2kin combined with forensic STRs to distinguish among 2nd-degree relative (denoted as h), full-sibling (1st-degree relative, f), parent-child (p), and 1st-cousin (c) relationships. The problem is, for a given pair of related individuals, to select the most confident relationship from the four candidates, h, f, p, and c. Discrimination among relationships is useful in cases that require reconstructing the pedigree of the subjects, rather than testing whether they are the alleged relatives or unrelated. To avoid confusion, we used LR'i,j to denote the LR of relationship i (i = h, f, p, or c) vs. relationship j (j = h, f, p, or c; j ≠ i), with log10(LR'i,j) = log10(LRi) − log10(LRj), where LRi (or LRj) is the LR of being related by relationship i (or j) vs. unrelated.
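The definition above can be sketched in a few lines. This is a minimal illustration, not the software used in this study: the function names and the example log10(LR) values are hypothetical. Because log10(LR'i,j) is a difference of two log10(LR) values against the same "unrelated" hypothesis, the relationship favored over every other candidate is simply the one with the largest log10(LR).

```python
def pairwise_log10_lr(log10_lr, i, j):
    """log10(LR'_{i,j}) = log10(LR_i) - log10(LR_j)."""
    return log10_lr[i] - log10_lr[j]


def most_supported_relationship(log10_lr):
    """Return the candidate whose pairwise log10(LR') against every other
    candidate is non-negative, i.e. the one with the largest log10(LR)."""
    return max(log10_lr, key=log10_lr.get)


# Hypothetical log10(LR) values (each relationship vs. unrelated) for one pair;
# h = 2nd-degree, f = full-sibling, p = parent-child, c = 1st-cousin.
example = {"h": 6.2, "f": 1.5, "p": -30.0, "c": 4.8}

best = most_supported_relationship(example)
print("most supported:", best)
for other in example:
    if other != best:
        value = pairwise_log10_lr(example, best, other)
        print(f"log10(LR'_{best},{other}) = {value:.1f}")
```

A negative log10(LR'h,f) or log10(LR'h,c) for a true 2nd-degree pair would mean that the pair is assigned to the competing relationship.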

Fig. 5. Discrimination of 2nd-degree relatives from full-siblings (a) and 1st cousins (b) by the likelihood ratio of one relationship vs. the other. The black and gray hollow circles denote values below and above the zero baseline, respectively.

We performed the calculation for the relative pairs included in this study (Supplementary Table 4). Among the candidates h, f, p, and c, all full-sibling and parent-child pairs in this study were correctly determined by using either the 19 forensic STRs or SNP2kin. However, the efficacy of the STRs was insufficient to determine 2nd-degree relatives. The log10(LR'h,f) and log10(LR'h,c) values of the 2nd-degree relatives are shown in Fig. 5. Based on the STRs, two pairs of 2nd-degree relatives were wrongly determined as full-siblings (log10(LR'h,f) < 0) and 16 pairs were wrongly determined as cousins (log10(LR'h,c) < 0), leading to a high total error rate of 33.3% (18/54). By comparison, none of the 2nd-degree relatives were determined as full-siblings by using SNP2kin, and four 2nd-degree relative pairs were determined as 1st cousins. Thus the error rate was 6.8% (4/59) for SNP2kin. Moreover, three of the four wrongly determined 2nd-degree relative pairs were corrected by using both SNP2kin and the STRs, and the error rate further decreased to 1.9% (1/54), suggesting that combining the power of SNP2kin and forensic STRs is also favorable for discrimination among relationships.

4. Discussion

Our study demonstrates the potential of SNP markers in kinship analysis. Using a medium-scale number of SNP markers compensated for their low per-locus power. We estimated that SNP2kin (a 472-SNP panel) was comparable to 73 hypothetical forensic STRs, far beyond the number of forensic STRs normally available. By solely using SNP2kin, the sensitivity and specificity for distinguishing 2nd-degree relatives from unrelated individuals were 100% and 99.9%, respectively, with an optimized cutoff (log10(LR) > 2). Moreover, when we combined SNP2kin and forensic STRs, all the 2nd-degree relatives were correctly distinguished from the unrelated individuals. Therefore, SNPs could act as important supplementary markers to forensic STRs in kinship testing of 2nd-degree relatives.

We should stress that we do not suggest using SNP markers to replace forensic STRs in complex kinship testing. These STRs are forensically well-established markers. The STR databases contain millions of profiles, and STR typing is a routine technique in many forensic laboratories worldwide [62]. STRs have well-characterized population frequencies and locus-specific mutation rates, which are important parameters in the calculation. These advantages of forensic STRs correspond to deficiencies of SNP markers that need to be addressed.

In this study, hundreds of SNPs within SNP2kin were typed through multiplex amplification in a pool followed by MPS. With relatively stringent bioinformatics quality control, the number of confidently typed SNPs varied among samples (451.9±11.7 of the 472-SNP panel, Supplementary Table 3). The multiplex primers and/or the amplification protocol might need to be improved in further studies. The main cause of invalid genotyping was biased amplification efficiency among loci. Improved primer design and thermal cycling regimens might be helpful. More advanced commercially available services may also be used in the future. The performance of SNP typing for DNA from different human materials should also be examined, considering its applicability in forensic uses. On the other hand, in this study we typed STRs by capillary electrophoresis and SNPs by MPS separately. Co-amplification of SNPs and STRs with simultaneous typing on an MPS platform has been developed in the forensic community [63]. This strategy significantly reduces the cost and the number of genotyping procedures, and may be applied to a combined 19-STR and SNP2kin system in the future.

Although kinship testing of 2nd-degree relatives was reliable when using SNP2kin and the forensic STRs, the efficacy of the combined system was insufficient for kinship analysis of more distant relationships. Based on our previous prediction, approximately 2,000 SNP markers with minor allele frequencies >0.2 are required to distinguish pairwise third-degree relationships [29]. Identifying more SNP markers suitable for kinship testing, as well as high-quality, low-cost genotyping of ~10³ SNPs, is challenging but worthwhile. Compared with analysis using ~10⁴–10⁶ SNPs typed by whole-genome sequencing or large human genotyping microarrays, using a customized set of selected markers is an alternative and probably more practical approach for kinship analysis. It should also be noted that microhaplotypes (microhaps) are another promising type of genetic marker for forensic uses. Compared to SNPs, microhaps have higher per-marker power to distinguish relatives. The inclusion of microhaps in one battery could increase the testing power without introducing too many amplicons. Several studies have shown the usefulness of microhaps in measuring biological relatedness and resolving DNA mixtures [64–66].

To explore more SNP markers, some criteria used for candidate SNP selection in this study could be adjusted. For instance, the requirement that SNPs not be located within or adjacent to annotated functional elements, which led to a remarkable reduction in the number of candidate SNPs, could be relaxed. The LD cutoff of r² ≥ 0.01 also seemed excessively strict and could be raised. Besides, filtering SNPs in LD on different chromosomes was not necessary.
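For readers less familiar with the LD measure, the following minimal sketch computes r² for two biallelic loci from a haplotype frequency and the two allele frequencies, using the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA pB. The function name and the example frequencies are hypothetical, and we assume here that SNP pairs with r² at or above the cutoff were treated as being in LD.

```python
LD_CUTOFF = 0.01  # r^2 cutoff quoted in the text


def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for monomorphic loci")
    return d * d / denom


# Hypothetical frequencies, for illustration only.
r2 = ld_r_squared(p_ab=0.30, p_a=0.5, p_b=0.5)
print(f"r^2 = {r2:.3f}, at or above cutoff: {r2 >= LD_CUTOFF}")
```

In this example a modest excess of the AB haplotype (0.30 observed vs. 0.25 expected under independence) already gives r² of about 0.04, which illustrates how strict a cutoff of 0.01 is.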

In sum, we have developed an efficient panel, SNP2kin, comprising 472 autosomal SNP markers analyzed with MPS technology, which can significantly elevate the power of kinship testing. As supplements to the 19 commonly used forensic STRs, these SNPs could sufficiently distinguish 2nd-degree relatives, as well as closer relationships, from unrelated individuals. The testing is gender-independent and pairwise and thus has broad applicability. With suitable optimization of the genotyping experiment, the combined SNP and STR system is promising for practical forensic applications. The methodology might also provide insights for kinship analysis research.

Authors' contributions

MN, JWY, and SKM conceived and designed this study. JWY, YCL, and HJW collected and prepared the samples. MN, SKM, and JWY developed the methods and protocols. SKM, JJZ, YRY, and ZL performed the experiments. SKM, MN, ZLR and SQW analysed the data. MN, JWY, and SKM prepared the figures and the manuscript. JWY and XCB supervised the study. All authors read and approved the final manuscript.

Acknowledgements

We thank all of the volunteers for providing samples. We thank Hua Chen for his helpful comments and advice on the manuscript. This work was funded by the National Science Foundation of China (81330073, U1435222) and HBAMMS (AWS16J003).

Compliance with Ethical Standards

Conflict of Interest: none.

Ethical approval: “All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.”

Informed consent: “Informed consent was obtained from all individual participants included in the study.”

References

[1] J.M. Butler, Genetics and genomics of core short tandem repeat loci used in human identity testing, J. Forensic Sci. 51 (2006) 253–265. doi:10.1111/j.1556-4029.2006.00046.x.

[2] M.A. Jobling, P. Gill, Encoded evidence: DNA in forensic analysis, Nat. Rev. Genet. 5 (2004) 739–751. doi:10.1038/nrg1455.

[3] I. Lindner, N. von Wurmb-Schwark, P. Meier, R. Fimmers, A. Büttner, Usefulness of SNPs as Supplementary Markers in a Paternity Case with 3 Genetic Incompatibilities at Autosomal and Y Chromosomal Loci, Transfus. Med. Hemotherapy. 41 (2014) 2–2. doi:10.1159/000357989.

[4] E. Ochiai, M. Osawa, T. Tamura, K. Minaguchi, K. Miyashita, Y. Matsushima, et al., Effects of using the GlobalFiler™ multiplex system on parent-child analyses of cases with single locus inconsistency, Leg. Med. 18 (2016) 72.

[5] H. Inoue, S. Manabe, K. Fujii, Y. Iwashima, S. Miyama, A. Tanaka, et al., Sibling assessment based on likelihood ratio and total number of shared alleles using 21 short tandem repeat loci included in the GlobalFiler™ kit, Leg. Med. 19 (2017) 122–126. doi:10.1016/j.legalmed.2015.07.008.

[6] T.E. King, G.G. Fortes, P. Balaresque, M.G. Thomas, D. Balding, P.M. Delser, et al., Identification of the remains of King Richard III, Nat. Commun. (2014) 1–8. doi:10.1038/ncomms6631.

[7] B. Rolf, W. Keil, B. Brinkmann, L. Roewer, R. Fimmers, Paternity testing using Y-STR haplotypes: Assigning a probability for paternity in cases of mutations, Int. J. Legal Med. 115 (2001) 12–15. doi:10.1007/s004140000201.

[8] P. Gill, C.H. Brenner, J.S. Buckleton, A. Carracedo, M. Krawczak, W.R. Mayr, et al., DNA commission of the International Society of Forensic Genetics: Recommendations on the interpretation of mixtures, Forensic Sci. Int. 160 (2006) 90–101. doi:10.1016/j.forsciint.2006.04.009.

[9] I. Ayadi, N. Mahfoudh-Lahiani, H. Makni, L. Ammar-Keskes, A. Rebaï, Combining autosomal and Y-chromosomal short tandem repeat data in paternity testing with male child: Methods and application, J. Forensic Sci. 52 (2007) 1068–1072. doi:10.1111/j.1556-4029.2007.00513.x.

[10] W. Parson, L. Gusmão, D.R. Hares, J.A. Irwin, W.R. Mayr, N. Morling, et al., DNA Commission of the International Society for Forensic Genetics: Revised and extended guidelines for mitochondrial DNA typing, Forensic Sci. Int. Genet. 13 (2014) 134–142. doi:10.1016/j.fsigen.2014.07.010.

[11] C. Børsting, N. Morling, Next generation sequencing and its applications in forensic genetics, Forensic Sci. Int. Genet. (2015). doi:10.1016/j.fsigen.2015.02.002.

[12] C. Børsting, S.L. Fordyce, J. Olofsson, H.S. Mogensen, N. Morling, Evaluation of the Ion Torrent™ HID SNP 169-plex: A SNP typing assay developed for human identification by second generation sequencing, Forensic Sci. Int. Genet. 12 (2014) 144–154. doi:10.1016/j.fsigen.2014.06.004.

[13] M. Eduardoff, C. Santos, M. de la Puente, T.E. Gross, M. Fondevila, C. Strobl, et al., Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™, Forensic Sci. Int. Genet. 17 (2015) 110–121. doi:10.1016/j.fsigen.2015.04.007.

[14] 1000 Genomes Project Consortium, A. Auton, L.D. Brooks, R.M. Durbin, E.P. Garrison, H.M. Kang, et al., A global reference for human genetic variation, Nature. 526 (2015) 68–74. doi:10.1038/nature15393.

[15] P.H. Sudmant, T. Rausch, E.J. Gardner, R.E. Handsaker, A. Abyzov, J. Huddleston, et al., An integrated map of structural variation in 2,504 human genomes, Nature. 526 (2015) 75–81. doi:10.1038/nature15394.

[16] UK10K Consortium, K. Walter, J.L. Min, J. Huang, L. Crooks, Y. Memari, et al., The UK10K project identifies rare variants in health and disease, Nature. 526 (2015) 82–90. doi:10.1038/nature14962.

[17] J.J. Sanchez, C. Phillips, C. Børsting, K. Balogh, M. Bogus, M. Fondevila, et al., A multiplex assay with 52 single nucleotide polymorphisms for human identification, Electrophoresis. 27 (2006) 1713–1724. doi:10.1002/elps.200500671.

[18] A.J. Pakstis, W.C. Speed, R. Fang, F.C.L. Hyland, M.R. Furtado, J.R. Kidd, et al., SNPs for a universal individual identification panel, Hum. Genet. 127 (2010) 315–324. doi:10.1007/s00439-009-0771-1.

[19] T. Schwark, P. Meyer, M. Harder, J.H. Modrow, N. Von Wurmb-Schwark, The SNPforID assay as a supplementary method in kinship and trace analysis, Transfus. Med. Hemotherapy. 39 (2012) 187–193. doi:10.1159/000338855.

[20] P. Gill, An assessment of the utility of single nucleotide polymorphisms (SNPs) for forensic purposes, Int. J. Legal Med. 114 (2001) 204–210. doi:10.1007/s004149900117.

[21] A. Amorim, L. Pereira, Pros and cons in the use of SNPs in forensic kinship investigation: A comparative analysis with STRs, Forensic Sci. Int. 150 (2005) 17–21. doi:10.1016/j.forsciint.2004.06.018.

[22] C. Børsting, J.J. Sanchez, H.E. Hansen, A.J. Hansen, H.Q. Bruun, N. Morling, Performance of the SNPforID 52 SNP-plex assay in paternity testing, Forensic Sci. Int. Genet. 2 (2008) 292–300. doi:10.1016/j.fsigen.2008.03.007.

[23] C. Børsting, N. Morling, Mutations and/or close relatives? Six case work examples where 49 autosomal SNPs were used as supplementary markers, Forensic Sci. Int. Genet. 5 (2011) 236–241. doi:10.1016/j.fsigen.2010.02.007.

[24] P.M. Schneider, Beyond STRs: The role of diallelic markers in forensic genetics, Transfus. Med. Hemotherapy. 39 (2012) 176–180. doi:10.1159/000339139.

[25] C. Phillips, M. García-Magariños, A. Salas, Á. Carracedo, M.V. Lareu, SNPs as supplements in simple kinship analysis or as core markers in distant pairwise relationship tests: When do SNPs add value or replace well-established and powerful STR tests?, Transfus. Med. Hemotherapy. 39 (2012) 202–210. doi:10.1159/000338857.

[26] K.K. Kidd, A.J. Pakstis, W.C. Speed, E.L. Grigorenko, S.L.B. Kajuna, N.J. Karoma, et al., Developing a SNP panel for forensic identification of individuals, Forensic Sci. Int. 164 (2006) 20–32. doi:10.1016/j.forsciint.2005.11.017.

[27] A. Pakstis, W. Speed, J. Kidd, K. Kidd, Candidate SNPs for a universal individual identification panel, Hum. Genet. 121 (2007) 305–317. doi:10.1007/s00439-007-0342-2.

[28] S. Zhang, Y. Bian, A. Chen, H. Zheng, Y. Gao, Y. Hou, et al., Developmental validation of a custom panel including 273 SNPs for forensic application using Ion Torrent PGM, Forensic Sci. Int. Genet. 27 (2017) 50–57. doi:10.1016/j.fsigen.2016.12.003.

[29] S.K. Mo, Y.C. Liu, S.Q. Wang, X.C. Bo, Z. Li, Y. Chen, et al., Exploring the efficacy of paternity and kinship testing based on single nucleotide polymorphisms, Forensic Sci. Int. Genet. 22 (2016) 161–168. doi:10.1016/j.fsigen.2016.02.012.

[30] M.J. Mcmillin, J.E. Below, K.M. Shively, A.E. Beck, H.I. Gildersleeve, J. Pinner, et al., Mutations in ECEL1 Cause Distal Arthrogryposis Type 5D, Am. J. Hum. Genet. 92 (2013) 150–156. doi:10.1016/j.ajhg.2012.11.014.

[31] J.E. Below, D.L. Earl, K.M. Shively, M.J. Mcmillin, J.D. Smith, E.H. Turner, et al., Whole-Genome Analysis Reveals that Mutations in Inositol Polyphosphate Phosphatase-like 1 Cause Opsismodysplasia, Am. J. Hum. Genet. 92 (2013) 137–143. doi:10.1016/j.ajhg.2012.11.011.

[32] B. Li, D. Krakow, D.A. Nickerson, M.J. Bamshad, Y. Chang, R.S. Lachman, et al., Opsismodysplasia resulting from an insertion mutation in the SH2 domain, which destabilizes INPPL1, Am. J. Med. Genet. Part A. 164 (2014) 2407–2411. doi:10.1002/ajmg.a.36640.

[33] V. Makaryan, E.A. Rosenthal, A.A. Bolyard, M.L. Kelley, J.E. Below, M.J. Bamshad, et al., TCIRG1-Associated Congenital Neutropenia, Hum. Mutat. 35 (2014) 824–827. doi:10.1002/humu.22563.

[34] B.F. Voight, J.K. Pritchard, Confounding from cryptic relatedness in case-control association studies, PLoS Genet. 1 (2005). doi:10.1371/journal.pgen.0010032.

[35] A.G. Day-Williams, J. Blangero, T.D. Dyer, K. Lange, E.M. Sobel, Linkage analysis without defined pedigrees, Genet. Epidemiol. 35 (2011) 360–370. doi:10.1002/gepi.20584.

[36] S.M. Kerr, A. Campbell, L. Murphy, C. Hayward, C. Jackson, L. V Wain, et al., Pedigree and genotyping quality analyses of over 10,000 DNA samples from the Generation Scotland: Scottish Family Health Study, BMC Med. Genet. 14 (2013) 38. doi:10.1186/1471-2350-14-38.

[37] M.A. Bellis, K. Hughes, S. Hughes, J.R. Ashton, Measuring paternal discrepancy and its public health consequences, J. Epidemiol. Community Heal. 59 (2005) 749–754.

[38] M. Wolf, J. Musch, J. Enczmann, J. Fischer, Estimating the Prevalence of Nonpaternity in Germany, Hum. Nat. 23 (2012) 208–217. doi:10.1007/s12110-012-9143-y.

[39] S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M.A.R. Ferreira, D. Bender, et al., PLINK: A tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet. 81 (2007) 559–575. doi:10.1086/519795.

[40] B. Kirkpatrick, S. Li, R. Karp, E. Halperin, Pedigree Reconstruction Using Identity by Descent, in: Res. Comput. Mol. Biol., 2011: pp. 136–152. doi:10.1089/cmb.2011.0156.

[41] A. Manichaikul, J.C. Mychaleckyj, S.S. Rich, K. Daly, M. Sale, W.M. Chen, Robust relationship inference in genome-wide association studies, Bioinformatics. 26 (2010) 2867–2873. doi:10.1093/bioinformatics/btq559.

[42] J. Staples, D.J. Witherspoon, L.B. Jorde, D.A. Nickerson, J.E. Below, C.D. Huff, PADRE: Pedigree-Aware Distant-Relationship Estimation, Am. J. Hum. Genet. 99 (2016) 154–162. doi:10.1016/j.ajhg.2016.05.020.

[43] J. Staples, D. Qiao, M.H. Cho, E.K. Silverman, D.A. Nickerson, J.E. Below, PRIMUS: Rapid reconstruction of pedigrees from genome-wide estimates of identity by descent, Am. J. Hum. Genet. 95 (2014) 553–564. doi:10.1016/j.ajhg.2014.10.005.

[44] C.D. Huff, D.J. Witherspoon, T.S. Simonson, J. Xing, W.S. Watkins, Y. Zhang, et al., Maximum-likelihood estimation of recent shared ancestry (ERSA), Genome Res. 21 (2011) 768–774. doi:10.1101/gr.115972.110.

[45] M.P. Epstein, W.L. Duren, M. Boehnke, Improved inference of relationship for pairs of individuals, Am. J. Hum. Genet. 67 (2000) 1219–1231. doi:10.1086/321195.

[46] L. Sun, K. Wilder, M.S. McPeek, Enhanced pedigree error detection, Hum. Hered. (2002). doi:10.1159/000067666.

[47] J. Cussens, M. Bartlett, E.M. Jones, N.A. Sheehan, Maximum likelihood pedigree reconstruction using integer linear programming, Genet. Epidemiol. (2013). doi:10.1002/gepi.21686.

[48] D. He, Z. Wang, B. Han, L. Parida, E. Eskin, IPED: Inheritance Path-based Pedigree Reconstruction Algorithm Using Genotype Data, J. Comput. Biol. 20 (2013) 780–791. doi:10.1089/cmb.2013.0080.

[49] D. Shem-Tov, E. Halperin, Historical Pedigree Reconstruction from Extant Populations Using PArtitioning of RElatives (PREPARE), PLoS Comput. Biol. 10 (2014) 1–13. doi:10.1371/journal.pcbi.1003610.

[50] I. Grandell, R. Samara, A.O. Tillmar, A SNP panel for identity and kinship testing using massive parallel sequencing, Int. J. Legal Med. (2016) 905–914. doi:10.1007/s00414-016-1341-4.

[51] J. Harrow, F. Denoeud, A. Frankish, A. Reymond, C.K. Chen, J. Chrast, et al., GENCODE: producing a reference annotation for ENCODE, Genome Biol. 7 Suppl 1 (2006) S4 1–9. doi:10.1186/gb-2006-7-s1-s4.

[52] J. Harrow, A. Frankish, J.M. Gonzalez, E. Tapanari, M. Diekhans, F. Kokocinski, et al., GENCODE: the reference human genome annotation for The ENCODE Project, Genome Res. 22 (2012) 1760–1774. doi:10.1101/gr.135350.111.

[53] D.L. Hartl, A.G. Clark, Principles of Population Genetics, Fourth Edition, Sinauer Associates, Inc., 2007.

[54] P.A. Fujita, B. Rhead, A.S. Zweig, A.S. Hinrichs, D. Karolchik, M.S. Cline, et al., The UCSC Genome Browser database: update 2011, Nucleic Acids Res. 39 (2010). doi:10.1093/nar/gkq963.

[55] H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform, Bioinformatics. 26 (2010) 589–595. doi:10.1093/bioinformatics/btp698.

[56] H. Li, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, Bioinformatics. 27 (2011) 2987–2993. doi:10.1093/bioinformatics/btr509.

[57] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, et al., The Sequence Alignment/Map format and SAMtools, Bioinformatics. 25 (2009) 2078–2079. doi:10.1093/bioinformatics/btp352.

[58] A.M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics. 30 (2014) 2114–2120. doi:10.1093/bioinformatics/btu170.

[59] G.R. Abecasis, S.S. Cherny, W.O. Cookson, L.R. Cardon, Merlin—rapid analysis of dense genetic maps using sparse gene flow trees, Nat. Genet. 30 (2002) 97–101. doi:10.1038/ng786.

[60] B. Xie, L. Chen, Y. Yang, Y. Lv, J. Chen, Y. Shi, et al., Genetic distribution of 39 STR loci in 1027 unrelated Han individuals from Northern China, Forensic Sci. Int. Genet. 19 (2015) 205–206. doi:10.1016/j.fsigen.2015.07.019.

[61] A.O. Tillmar, C. Phillips, Evaluation of the impact of genetic linkage in forensic identity and relationship testing for expanded DNA marker sets, Forensic Sci. Int. Genet. 26 (2017) 58–65. doi:10.1016/j.fsigen.2016.10.007.

[62] J.M. Butler, M.D. Coble, P.M. Vallone, STRs vs. SNPs: Thoughts on the future of forensic DNA testing, Forensic Sci. Med. Pathol. 3 (2007) 200–205. doi:10.1007/s12024-007-0018-1.

[63] J.D. Churchill, S.E. Schmedes, J.L. King, B. Budowle, Evaluation of the Illumina® Beta Version ForenSeq™ DNA Signature Prep Kit for use in genetic profiling, Forensic Sci. Int. Genet. 20 (2016) 20–29. doi:10.1016/j.fsigen.2015.09.009.

[64] K.K. Kidd, A.J. Pakstis, W.C. Speed, R. Lagacé, J. Chang, S. Wootton, et al., Current sequencing technology makes microhaplotypes a powerful new type of genetic marker for forensics, Forensic Sci. Int. Genet. 12 (2014) 215–224. doi:10.1016/j.fsigen.2014.06.014.

[65] K.K. Kidd, W.C. Speed, A.J. Pakstis, D.S. Podini, R. Lagacé, J. Chang, et al., Evaluating 130 microhaplotypes across a global set of 83 populations, Forensic Sci. Int. Genet. 29 (2017) 29–37. doi:10.1016/j.fsigen.2017.03.014.

[66] N. Hiroaki, F. Koji, K. Tetsushi, S. Kazumasa, N. Hiroaki, S. Kazuyuki, Approaches for identifying multiple-SNP haplotype blocks for use in human identification, Leg. Med. 17 (2015) 415. doi:10.1016/j.legalmed.2015.06.003.