Forensic Science International: Genetics 42 (2019) 90–98
Forensic Science International: Genetics journal homepage: www.elsevier.com/locate/fsigen
Research paper
Forensic characterization and statistical considerations of the CaDNAP 13STR panel in 1,184 domestic dogs from Germany, Austria, and Switzerland
Burkhard Berger a,⁎, Josephin Heinrich a, Harald Niederstätter a, Werner Hecht b, Nadja Morf c, Andreas Hellmann d, Udo Rohleder d, Uwe Schleenbecker d, Cordula Berger a, Walther Parson a,e, The CaDNAP Group
a Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria
b Institute of Veterinary Pathology, Justus-Liebig-University Giessen, Giessen, Germany
c Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
d Bundeskriminalamt, Kriminaltechnisches Institut, Wiesbaden, Germany
e Forensic Science Program, The Pennsylvania State University, University Park, PA, USA
ARTICLE INFO
Keywords: Canine STRs; Non-human DNA; Dog breeds; Population structure; Match probability
ABSTRACT
Crime scene samples originating from domestic dogs such as hair, blood, or saliva can be probative as possible transfer evidence in human crime and in dog attack cases. In the majority of such cases canine DNA identification using short tandem repeat (STR) analysis is the method of choice, which demands, among others, a systematic survey of allele frequency data in the relevant dog populations. A set of 13 highly polymorphic canine STR markers was used to analyze samples of 1,184 dogs (including 967 purebred dogs) from the so-called DACH countries (Germany, Austria, Switzerland). This CaDNAP 13-STR panel has previously been validated for canine identification in a forensic context. Here, we present robust estimates of allele frequencies, which are essential to assess the weight of the evidence by estimating the probability of a matching DNA profile within the dog population under question, e.g. in the form of a random match probability (RMP). The geographical provenance of the tested dogs showed a negligible influence on the observed genotype variation. Therefore, we combined the STR data from all three countries into a single dog population sample (DPS). In contrast, pronounced genetic differentiation between dog breeds was found by principal component analysis and sub-structure analysis with the STRUCTURE software. These findings entailed the need to account for the effects of DPS breed composition on allele frequency estimates. A possible strategy, which was favored here, relies on collecting a DPS that is guided by the breed composition of the relevant dog population. In total, dogs from 166 different breeds were included in our DPS, 64 of them including at least 5 individuals (n = 771 dogs). Sampling reflected the abundance of breeds in the DACH countries with the following being the most common ones: German Shepherds (population frequency: 14.3%), Dachshunds (5.9%), Labrador Retrievers (3.9%), and Golden Retrievers (3.2%). 
The pedigree listing of the purebred dogs in our DPS ranked German Shepherds (DPS frequency 8.5%) first, followed by Labrador Retrievers (3.9%), Golden Retrievers (3%), and Dachshunds (2.5%). RMP values based on overall allele frequencies and accounting for substructure using FST between breeds ranged between 10⁻¹³ and 10⁻¹⁴ and represent a conservative approach of RMP assessment.
⁎ Corresponding author. E-mail address: [email protected] (B. Berger).
https://doi.org/10.1016/j.fsigen.2019.06.017
Received 18 April 2019; Received in revised form 21 June 2019; Accepted 22 June 2019; Available online 26 June 2019
1872-4973/© 2019 Elsevier B.V. All rights reserved.
1. Introduction
Short tandem repeat (STR) loci, chosen to take full advantage of the high diversity of these genetic markers among individuals, played a pivotal role for the sustainable success of forensic identity testing over the last decades [1,2]. This is primarily true for human identification, but STR loci have been identified in numerous genomes of other organisms and have become a vital tool for DNA identity testing in the non-human field as well [3–7]. Accordingly, a number of approaches have been published using canine STR loci for dog individualization from crime scene evidence [8–16]. The Canine DNA Profiling (CaDNAP) Group (https://gerichtsmedizin.at/cadnap.html), a collaboration between forensic institutes from Germany, Austria, and Switzerland (hereafter referred to as DACH countries; D for Germany, A for Austria and CH for Switzerland), has focused on canine STRs for several years [17–20]. CaDNAP has been recognized as official working group
by the International Society for Forensic Genetics in 2017 (https://www.isfg.org/Working%20Groups). The panel of STRs selected by CaDNAP provides strong evidence for individual identification and has been successfully validated [21] according to current forensic guidelines [22]. It consists of thirteen STR markers co-amplified in two PCR multiplexes containing one overlapping marker. This set was applied in the comprehensive population study presented here.
The challenging nature of biological evidence encountered in routine forensic genetic investigations (e.g. limited or degraded DNA) explains why initial studies on canine DNA focused on technical aspects such as establishing sensitive assays for co-amplifying informative STR marker sets and the construction of reliable allelic ladders. While detailed frequency estimates and accurate descriptions of human STRs have been the efforts of large-scale collaborative endeavors – recently driven by the rapidly developing Next Generation Massively Parallel Sequencing technologies (for recent examples see [23–25]) – extensive canine STR frequency databases are largely lacking and until now frequency data for most loci are based on merely a few hundred genotypes, with the only exception of [11].
When assessing the weight of evidence of a matching DNA profile in forensic casework – e.g. by calculating match probabilities – it is imperative that allele frequencies are accurately known for the population that is considered to be the population from which the unknown contributor to the evidence profile originated. Here, we present STR data from 1184 dogs, thereby substantially increasing the number of data published on this particular marker set by a factor of four [21]. This increase in database size resulted in newly observed alleles and fosters reliable allele frequency estimates.
Recently – and in line with data published previously [26–31] – we have shown that the manifestation of canine STR genotypes correlated well with the particular dog breeds [32]. As match probabilities are highly sensitive to population (sub)structure, potential shaping factors have to be considered. Under natural conditions, population structure corresponds to a pattern of preferential mating within a subgroup of a population. Therefore, for many wildlife species subgroups can be found on a geographical scale. This, however, seems less likely for domestic dog populations [33], as the high level of population structuring in dogs can mainly be attributed to domestication history and – to an even higher proportion – to breeding practices, which have been intensified over the last 200 years [34–38]. Recent studies reported an increase of inbreeding and a loss of intra-pedigree genetic variation for many dog breeds [33,39–41]. Consequently, the genetic structure of a dog population largely depends on the mix of pedigrees composing it.
Such considerations raise forensically relevant questions about appropriate sampling strategies and the statistical evaluation of the evidence. Here, essentially, two approaches are conceivable: i) an overall approach using data from (ideally) all relevant breeds (including crossbreeds) or ii) a breed-specific approach. For the latter, individual frequency data for the different dog breeds are necessary. To account for the obvious influence of the particular breeds on genotype characteristics (e.g. allele frequencies) of the whole population, we aimed to adjust the breed composition of the population sample examined to real-world breed proportions. The effect of population substructure due to breed composition on the weight of evidence was considered by calculating correction factors for population substructure and/or consanguinity, which can be applied to random match probability (RMP) calculations.
This overall approach was compared to the usage of breed specific allele frequencies. For the latter, allele frequency data for the three most abundant dog breeds in the DACH region - German Shepherd (GS), Labrador Retriever (LR) and Golden Retriever (GR) - were compiled.
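Why breed structure matters for pooled allele frequencies can be illustrated with a minimal sketch. It pools two hypothetical breeds that each mate randomly but differ in allele frequencies, and compares the mean within-breed expected heterozygosity (H_S) with the heterozygosity expected from the pooled frequencies (H_T). The breed weights, allele labels, frequencies, and the function name are invented for illustration and are not taken from the data of this study.

```python
def pooled_heterozygosity(subpops):
    """subpops: list of (weight, {allele: frequency}) tuples, weights summing to 1.

    Returns (H_S, H_T, F_ST) for a single locus, where
    H_S is the weighted mean expected heterozygosity within subpopulations,
    H_T is the expected heterozygosity from the pooled allele frequencies,
    and F_ST = (H_T - H_S) / H_T.
    """
    # Mean within-subpopulation expected heterozygosity
    h_s = sum(w * (1.0 - sum(p * p for p in freqs.values()))
              for w, freqs in subpops)

    # Pooled allele frequencies across subpopulations
    pooled = {}
    for w, freqs in subpops:
        for allele, p in freqs.items():
            pooled[allele] = pooled.get(allele, 0.0) + w * p

    h_t = 1.0 - sum(p * p for p in pooled.values())
    f_st = (h_t - h_s) / h_t
    return h_s, h_t, f_st


# Two hypothetical breeds of equal size with strongly diverged frequencies
breeds = [
    (0.5, {"A": 0.9, "B": 0.1}),
    (0.5, {"A": 0.1, "B": 0.9}),
]
h_s, h_t, f_st = pooled_heterozygosity(breeds)
print(f"H_S = {h_s:.2f}, H_T = {h_t:.2f}, F_ST = {f_st:.2f}")
```

In this toy case the pooled sample contains fewer heterozygotes than the pooled allele frequencies predict, which is the kind of deviation from Hardy-Weinberg expectations that a substructure correction is meant to absorb.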
2. Materials and methods
2.1. Sampling
Dogs were typically sampled by direct queries to private dog owners, breeders, obedience schools, and dog shows. Sampling was performed by collecting mucosa cells from the buccal cavity on sterile cotton swabs (buccal swab). In addition, tissue samples were taken from dogs delivered for necropsy and subsequent carcass disposal as described in [32]. The sample set was completed by including data from an earlier study, which originally comprised 295 dogs [21] (referred to as DPS-295). However, data from ten individuals were omitted because of possible kinship relations with other dogs in this study. Hence, only 285 samples were added to the actual collection (Supplementary Table S1). All sampled dogs were categorized and their breed assigned by trained personnel. Whenever possible, pedigree certificates were documented and photographs of the dog's external appearance were taken. The selection was done regardless of sex and age of the individuals. An overview of purebred dogs and cross-breeds living in the research area (DACH countries) was compiled to determine the overall breed frequency distribution in this region as detailed in [32]. This information formed the basis for assessing the consistency of the collected breeds with the breed composition of the real dog population. In general, the breed designations used in this research (Supplementary Table S1) refer to the nomenclature of the Fédération Cynologique Internationale. However, in some rare cases an unambiguous breed assignment was not straightforward, because slightly deviating breed nomenclature rules and breed standards apply to the different countries and national breeding associations (kennel clubs).
2.2. DNA extraction and quantification
Total genomic DNA (gDNA) from mouth swabs and tissue samples was extracted according to the protocols described in detail in [32]. Following extraction, DNA concentrations were measured either with a spectral photometer (Nanodrop 2000; Peqlab GmbH, Erlangen, Germany) or by using a quantitative real-time PCR assay according to [42]. The assay was performed in a total volume of 10 μl and run on an AB 7500 Fast Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA; TFS).
2.3. Analysis of canine STRs
Multiplexed PCR amplifications of the 13 CaDNAP STR markers and two sex-specific markers were carried out according to [21] with template input amounts of 500 pg to 1 ng DNA. Capillary electrophoresis was performed on ABI Prism 3100 or 3500xL Genetic Analyzers using default instrument settings (all TFS). The data were analyzed using GeneMarker HID V1.7 (SoftGenetics Inc., State College, PA, USA). For further analyses, only the 13 STR multilocus genotypes (MLG) were considered.
2.4. Sequencing analysis of markers C38 and WILMS-TF
The unlabeled PCR primers C38F (GTGATACAATGCATTTTCTGGGTTG, canFam3, chr38: 6841726-6841750) and C38R (CATTTTTTCATGTGTCTGTTGGGC, chr38: 6841921-6841944), both binding outside the original C38 amplification primers, were used to test for a suspected primer binding site mutation. For sequence verification of an exceptionally short WILMS-TF allele, unlabeled versions of the original amplification primers [21] were used. Amplification and Sanger sequencing were performed according to [17]. Capillary electrophoresis was performed on an AB 3500xL Genetic Analyzer using POP-6, 50 cm capillary arrays and default instrument settings (all TFS). The data were analyzed using Sequencing Analysis Version 5.4 (TFS) and Sequencher Version 5.1 (GeneCodes, Ann Arbor, MI, USA).
2.5. Data analyses
STR genotype data were formatted in Excel 2013 (Microsoft Corporation, Redmond, WA, USA) and input files for downstream data analysis were generated with GenAlEx 6.4 [43,44]. Population statistic
parameters, like allele frequency, Hardy-Weinberg-Equilibrium (HWE), expected heterozygosity (Hexp), observed heterozygosity (Hobs), power of exclusion (PE), power of discrimination (PD), polymorphism information content (PIC), and Fst and Fis values, were calculated using GenAlEx 6.4 [43,44], Arlequin v.3.5 [45], and STRAF [46]. For the analysis of molecular variance (AMOVA), routines provided by the poppr package (v2.8.1 [47,48]) for R v.3.5.2 [49] were employed. The “genotype_curve” function of the poppr package was used for computing genotype accumulation curves. This function randomly samples without replacement a given number of loci and counts the number of distinct MLGs obtained for the full set of individuals. This procedure was repeated 10,000 times for all MLG sizes from 1 up to 12 of the altogether 13 loci under study. A genotype accumulation curve was constructed by plotting the resulting number of distinct MLGs vs. the number of retained loci forming the underlying, down-sampled marker sets.
To detect potential genotype clusters due to population stratification, principal component analysis (PCA) was performed using STRAF [46]. STRUCTURE v.2.3.4.21 software [50,51] was used to examine subpopulation structure within the entire population set. The analysis was performed using 100,000 burn-in steps followed by 100,000 Markov Chain Monte Carlo steps with the admixture and correlated allele frequencies models. Ten independent runs (replicates) were performed for each value of K (i.e., the user-defined number of clusters), which was varied from 2 to 5. Post processing of the STRUCTURE results was performed in CLUMPAK [52].
RMP was calculated for dog individuals not included in the population sample and belonging to the three most frequent breeds (GS, LR, GR). Following NRCII recommendation 4.2 [53], the Balding and Nichols formulas [54] were used to account for population structure among breeds. Overall allele frequencies derived from all genotypes regardless of the breed were used for calculations. For theta correction we applied the Fst obtained for all breeds represented by at least five individuals. For breed-specific RMP calculations, formulas 4.2a and 4.2b [53] and breed-specific allele frequencies were used. To account for intra-breed non-random association of alleles, breed-specific Fis values were calculated. At heterozygous loci 2qp was used instead of 2qp(1 − Fis).
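The two RMP calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the theta-corrected expressions are the commonly quoted Balding and Nichols conditional genotype probabilities; the breed-specific homozygote term p² + p(1 − p)·Fis is our reading of formula 4.2a and is an assumption; the heterozygote term 2pq follows the text. All function names, the three-locus profile, and its allele frequencies are hypothetical. Only the value θ = 0.16 is taken from this study.

```python
import math

THETA = 0.16  # global Fst between breeds reported in this study


def bn_homozygote(p, theta):
    """Theta-corrected probability of a homozygous genotype (Balding-Nichols form)."""
    num = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return num / ((1 + theta) * (1 + 2 * theta))


def bn_heterozygote(p, q, theta):
    """Theta-corrected probability of a heterozygous genotype (Balding-Nichols form)."""
    num = 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q)
    return num / ((1 + theta) * (1 + 2 * theta))


def breed_homozygote(p, f_is):
    """Within-breed homozygote probability with Fis correction (assumed form)."""
    return p * p + p * (1 - p) * f_is


def breed_heterozygote(p, q):
    """Within-breed heterozygote probability; 2pq is used without the (1 - Fis) factor."""
    return 2 * p * q


def profile_probability(per_locus_probs):
    """Multiply per-locus genotype probabilities into a profile probability."""
    return math.prod(per_locus_probs)


# Hypothetical profile: (p, q) for heterozygous loci, (p, None) for homozygous loci
profile = [(0.12, 0.07), (0.20, None), (0.05, 0.15)]

overall = profile_probability(
    bn_homozygote(p, THETA) if q is None else bn_heterozygote(p, q, THETA)
    for p, q in profile
)
uncorrected = profile_probability(
    p * p if q is None else 2 * p * q for p, q in profile
)
print(f"theta-corrected: {overall:.3e}, uncorrected: {uncorrected:.3e}")
```

With θ = 0 both theta-corrected expressions reduce to p² and 2pq, so the correction can only raise the per-locus probability for θ > 0, which is why the overall approach tends toward larger (more conservative) RMP values.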
Fig. 1. Genotype accumulation curve for simulating the effects of locus dropout on the genotyping approach’s ability to distinguish between all dogs included in DPS-1184. The resulting curve reached a plateau with as little as 5–6 randomly chosen loci.
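The resampling idea behind the genotype accumulation curve can be sketched in a few lines. The following is a minimal Python re-implementation of the procedure described for the “genotype_curve” function (random subsets of loci drawn without replacement, counting distinct multilocus genotypes); it is not the poppr code itself, and the toy genotypes, the function signature, and the default number of repetitions are assumptions made for illustration.

```python
import random


def genotype_curve(genotypes, n_reps=1000, seed=0):
    """Count distinct multilocus genotypes (MLGs) for random locus subsets.

    genotypes: list of individuals, each a tuple with one genotype per locus.
    Returns {k: [distinct-MLG count per repetition]} for k = 1 .. n_loci - 1.
    """
    rng = random.Random(seed)
    n_loci = len(genotypes[0])
    curve = {}
    for k in range(1, n_loci):
        counts = []
        for _ in range(n_reps):
            # Draw k loci without replacement
            loci = rng.sample(range(n_loci), k)
            # Reduce every individual to the retained loci and count distinct MLGs
            mlgs = {tuple(ind[i] for i in loci) for ind in genotypes}
            counts.append(len(mlgs))
        curve[k] = counts
    return curve


# Toy data: 4 hypothetical dogs typed at 3 hypothetical loci
toy = [
    (("10", "11"), ("7", "7"), ("20", "21")),
    (("10", "11"), ("7", "8"), ("20", "21")),
    (("10", "12"), ("7", "7"), ("20", "22")),
    (("11", "11"), ("8", "8"), ("20", "21")),
]
curve = genotype_curve(toy, n_reps=200)
for k, counts in curve.items():
    print(k, min(counts), max(counts))
```

Plotting the distribution of counts against k gives the accumulation curve; the plateau is reached once nearly every repetition resolves all individuals.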
3. Results and discussion
The DPS comprised in total 1184 samples (hereafter referred to as “DPS-1184”) collected from unrelated dogs from Austria (n = 260; 22%), Germany (n = 569; 48%), and Switzerland (n = 355; 30%). Complete 13-locus STR profiles were obtained for all but three samples with null-alleles in markers C38 and PEZ3. DPS-1184 comprised 967 purebred dogs (82%) and 217 crossbreeds and/or dogs of unknown ancestry (18%). Of these, 285 had been published previously [21]. The objective of the current study was to present a substantially extended data set for gaining a better picture of allelic ranges and rare alleles. Compared to the previously published DPS-295 data [21], a fourfold increase in sample size was achieved and the number of covered breeds doubled (166 breeds in DPS-1184 vs. 77 in DPS-295). Furthermore, DPS-1184 also contained samples from Switzerland. As a result, more precise and robust estimates of allele frequencies should be possible, which are essential to assess the weight of evidence by estimating the probability of a matching DNA profile within the relevant dog population, e.g. as random match probability (RMP). In Supplementary Table S1 sample information is listed including breed affiliation and country of origin. DPS-295 samples [21] are tagged.
All 1184 13-locus STR genotypes obtained in this research were unique, which underlines the forensic utility of the marker set employed here. However, in forensic genetic casework, biological evidence may be challenging in terms of available DNA quantity and/or quality, and partial profiles can be expected under such framework conditions. As a result, some profiles may become indistinguishable because of dropout events. Therefore, the effect of locus drop-out was simulated by computing a genotype accumulation curve [47,48]. As depicted in Fig. 1, the resulting curve reached a plateau with as few as 5–6 randomly chosen loci. With increasing number of retained loci, however, the variance dropped. For instance, when simulating the complete loss of information for 4 randomly chosen STR loci, 6950 (∼70%) of the 10,000 applied repetitions showed full resolution (i.e. 1184 distinct 9-locus genotypes) and the remaining 3050 repetitions (∼30%) yielded 1183 different genotypes for the 1184 dogs in our dataset (Supplementary Table S2). These values changed to 9218 (∼92%) and 782 (∼8%), respectively, when assuming the complete drop-out of a single, randomly chosen locus (Supplementary Table S2). These data clearly illustrate that the CaDNAP 13-STR panel is large enough for reasonably robust discrimination even in cases of degraded DNA that may result in partial profiles.
At the population genetics level, we tested for possible spatial stratification of DPS-1184. First, and on the basis of the country of origin, the 1184 genotype records were assigned to three sub-populations (Austria, Germany, Switzerland). Then STRUCTURE analysis was performed to identify potential clusters formed by differences in allele frequency patterns. STRUCTURE was applied to the entire data set using increasing values of K (K = 1 to K = 5). As shown in Fig. 2, the application of STRUCTURE to the total population sample showed no cluster formation corresponding to the three declared subpopulations at K = 3. At K = 2 a split was observed, which, however, did not occur along the geographical division used. One cluster almost exclusively contained the STR genotypes of dogs declared to be German Shepherds, whereas the other cluster contained the profiles of all the other individuals regardless of breed and geographical origin. A similar finding was already reported in an earlier study [32]. As an alternative approach, PCA was used for sub-structure analysis
based on the country-by-country classification. Consistent with the STRUCTURE results, a rather homogeneous distribution of the genotypes was observed (Fig. 3a). Notably, completely deviating clustering was observed when pre-classifying the genotypes according to their breed affiliation. Fig. 3b exemplifies the PCA results for the three most frequent dog breeds in the DACH region (GS, LR, GR). In line with the results obtained for the entire DPS, no sub-structuring attributable to geography was observed for any individual breed (Fig. 3b). However, the clear-cut separation between GS and GR plus LR along PC-1 and the placement of the latter two in opposite quadrants along PC-2 (Fig. 3b) supports the idea of having strong population sub-structuring at the level of pedigrees [32]. Further evidence pointing in this direction came from AMOVA, suggesting statistically significant stratification of the dataset at the breed level by revealing that 13.2% of the total variation was attributable to differences between breeds but only 3.7% were explained by differences between individuals within a country. Within breeds, a mere 0.3% of the total variation was attributed to the country of origin. As expected, the vast majority of variation (82.7%) was found within individuals, which emphasizes the STR panel’s suitability for identity testing. These analyses clearly show that geographical provenance had a negligible effect on genotype variation and that STR data from the three selected countries can be combined and treated as one common population sample. Similar findings were previously reported for dogs collected from different locations in the U.S. [31].
Fig. 2. STRUCTURE analyses of 13-STR-genotypes from 1184 dogs. The samples were grouped by the country of origin (CHE Switzerland, AUT Austria, GER Germany). The individual STRUCTURE plots were generated with Clumpak Files (major cluster ClumppIndFile.output) for K = 2 (upper panel) and K = 3 (lower panel). The German Shepherd splits from all other breeds at K = 2.
Fig. 3. (a) Principal component analysis (PCA) of 13-STR-genotypes from 1184 dogs was used for sub-structure analysis based on the country-by-country classification (CHE Switzerland, AUT Austria, GER Germany). (b) PCA performed on STR data of the three most frequent breeds GS, LR, and GR.
Furthermore, the pronounced genetic differentiation between the breeds indicated that the breed composition of the tested dogs had a huge impact on the allele frequencies. This phenomenon may be explained by the extreme form of selective breeding in dogs, which has resulted in the very different morphological, behavioral, and genetic characteristics distinguishing present day dog breeds [29,35,37,38,55–57]. Hence, the breed composition of a given population is a sensitive factor that has to be considered with particular care when portraying this population on the basis of STR frequency estimates obtained by sampling. The primary strategy favored here was to align DPS breed composition as closely as possible to that of the entire DACH population.
To obtain reliable information about the breed composition of the current dog population in the DACH countries, official authorities were contacted and the data obtained for all three countries were merged (for details see [32]). The particular frequencies of most of the more than 500 different breeds recorded were low with some significant exceptions. GS displayed by far the highest frequency (14.3%), followed by Dachshunds (5.9%), LR (3.9%), and GR (3.2%). The pedigree listing of the 967 purebred dogs included in DPS-1184 also ranked GS (8.5%) first, followed by LR (3.9%), and GR (3.0%). The Dachshund ranked fifth with a frequency of 2.5%. Fig. 4 compares the frequencies of the 23 most popular breeds in the DACH countries to their corresponding frequencies in DPS-1184. The bar chart shows that most breeds had broadly similar frequency levels except the underrepresented GS and Dachshunds. Overall, a clear correlation (R² = 0.883) between actual DACH and DPS-1184 dog breed frequencies was found (Fig. 4, inset), which is a prerequisite for avoiding biased STR allele frequency estimates on the basis of our population sample. In total, dogs from 166 different breeds were included, and 64 of them were represented by five or more individuals (n = 771). The three most common dog breeds in our dataset (GS: 82, LR: 38, GR: 29) accounted for 12.6% of all genotypes. At the other end of the
Fig. 4. Frequencies of the 23 most popular breeds in the DACH countries (ranked in descending order) compared to their corresponding frequencies in DPS-1184. The inserted graph shows the ratio between the breed frequencies in DPS-1184 and in the DACH dog population.
spectrum, 48 less common breeds were represented in DPS-1184 by single individuals only.
Table 1 provides an overview of the allele frequencies obtained for DPS-1184. Forensically relevant parameters such as Hobs, Hexp, PE, PD, and PIC are listed in Table 2. Statistically significant (p < 0.05) deviations from Hardy-Weinberg equilibrium (HWE) were found for all 13 loci, indicating that the dogs were not mating randomly within the entire population. Regardless of breed, the overall Hobs of all dogs was estimated at 0.73. The Hobs values obtained for the purebred dogs (n = 967) and for the crossbreeds (n = 217) amounted to 0.71 and 0.83, respectively. An increase of Hobs between 10 and 30% for mongrels compared to pedigree dogs was observed for all STR markers as shown in Supplementary Fig. S1.
Turning more specifically to the existing differences among dog breeds, Supplementary Tables S3a, b, and c list the allele frequency data obtained for GS, LR, and GR. Population genetic parameters for these three breed-specific population samples can be found in Supplementary Tables S4a, b, and c. In contrast to the findings on the entire population sample consisting of a mix of many different breeds, the three breed-specific subsets showed no statistically significant deviations from HWE at the 0.05 level for the majority of the 13 analyzed STR loci (GS: 10 loci, GR: 12, LR: 12).
As a notable result of the comparison between DPS-1184 and DPS-295, a marked increase in the number of different alleles per locus was found. The total number of distinct alleles for all 13 loci increased by nearly 30% from 289 (DPS-295) to 375 (DPS-1184). In Table 3 the number of different alleles and the corresponding allelic ranges for all STR markers are listed. C38 was the locus with the highest allele count, featuring 58 different alleles ranging from 11 to 35.1 repeats. The 19 newly described C38-alleles accounted for an almost 50% increase in allele number as compared to DPS-295. At the lower end of the variability range, loci FH2508 and FH2087Ub had only 14 and 11 different alleles, respectively. In both cases, only one additional allele was found as compared to DPS-295.
Two observations made during STR genotyping required further evaluation via Sanger sequencing. First, in two out of the 11 samples from Bernese Mountain Dogs no allele calls for marker C38 were obtained. Sanger sequencing revealed an eight base pair deletion affecting the binding site of the reverse primer to be causative for this phenomenon (Supplementary Fig. S2). Second, we detected an exceptionally short WILMS-TF allele in three (out of six) Czechoslovakian Wolfdog samples and one specimen from a crossbreed. Due to its very short amplicon length, the electrophoretic peak of this allele appeared in the range of the adjacent FH2087Ub marker (Supplementary Fig. S2). However, since FH2087Ub is part of both applied STR multiplexes, albeit in differing marker contexts, unambiguous allele calling was assured. Allele sequencing verified a WILMS-TF repeat count of 2.3 (Supplementary Fig. S2). Finding an unusual allele in Czechoslovakian Wolfdogs was not fully unexpected, as these dogs constitute a unique breed, which originated from military hybridization experiments between German Shepherds and wild Carpathian wolves in the 1950s. Recent genetic analyses revealed a limited introgression of wolf alleles [58]. However, a wolf origin of WILMS-TF allele 2.3 cannot be confirmed on the basis of our data.
Although an in-depth discussion of statistical analyses was not the primary objective of this study, some considerations about estimating the probability of a matching canine DNA profile should be briefly mentioned here. The substantial population sub-structuring due to marked differences among dog breeds significantly challenges RMP calculations in a forensic context, as RMPs tend to be underestimated
Table 1 Allele frequencies of 13 canine STRs (CaDNAP 13-STR panel) based on 1184 dogs from multiple breeds (DPS-1184). Columns: Alleles, C38, FH2054, FH2087Ub, FH2137, FH2328, FH2361, FH2508, FH2611, FH2613, PEZ15, PEZ3, PEZ6, WILMS-TF. Rows: allele designations 2.3 and 6 to 38, including intermediate (.1, .2, .3) variants.
values between all breeds represented by at least five individuals were used for theta correction. The database for these calculations comprised 771 individuals from 64 different breeds and yielded a global Fst value of 0.16. Alternatively, a breed-specific approach for calculating RMP for these three example profiles was applied. As the breed affiliations of the evidence profiles were known and corresponding breed-specific allele frequencies were available, the calculation of within-breed-RMPs was done following NRCII formulas 4.2a and 4.2b [53] using breed-specific allele frequencies (Supplementary Tables S3a-c) and breed specific FIS values as correction factors for intra-breed non-random association of alleles. Table 4 lists the RMP values obtained with the overall as well as the breed-specific approach. Sub-structure corrected RMP values computed on the basis of overall allele frequency data were roughly in the order of 10−13 to 10-14. These values are 2 to 6 orders of magnitude higher than those obtained with the breed-specific approach (ranging from 10-15 to 10-19). This is a clear indication that applying an overall approach, as exemplarily carried out here, may yield more conservative estimates of matching probabilities than considering breed-specific allele frequencies. However, a more extensive evaluation, e.g. by testing more breeds, varying the breed composition of the population sample and/or applying alternative correction factors, is still pending and may be subject to further studies. In conclusion, our results show that collecting DNA samples from a large number of dogs and aligning the sample’s breed composition as close as possible with that of the actual dog population under study is a good basis for obtaining reliable STR allele frequencies. These can be used for realistic and at the same time conservative (i.e. court-admissible) estimates for the weight of the DNA evidence. 
The data presented here is valid for a large geographical area, which makes it applicable to canine DNA cases from different countries.
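The two calculation routes compared in Table 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard NRC II expressions: formula 4.10 (the theta correction of Balding and Nichols) for the overall approach, and formula 4.2a for homozygotes combined with an uncorrected 2pq for heterozygotes in the breed-specific approach, as noted in Table 4. The function names and the allele frequencies in the example profile are hypothetical and are not taken from Table 1 or the supplementary tables.

```python
from math import prod


def locus_prob_theta(p_a, p_b, theta):
    """Per-locus match probability with theta correction (NRC II 4.10).

    Pass p_b=None for a homozygous genotype.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygote, formula 4.10a
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # heterozygote, formula 4.10b
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def locus_prob_fis(p_a, p_b, f_is):
    """Per-locus match probability within a breed.

    Homozygotes follow NRC II 4.2a; heterozygotes use plain 2pq
    (not 2pq(1 - F)), mirroring the note in Table 4.
    """
    if p_b is None:
        return p_a ** 2 + p_a * (1 - p_a) * f_is
    return 2 * p_a * p_b


def profile_rmp(genotype_freqs, locus_fn, correction):
    """Multiply per-locus probabilities across all loci of a profile."""
    return prod(locus_fn(p_a, p_b, correction) for p_a, p_b in genotype_freqs)


# Hypothetical three-locus profile: (freq allele 1, freq allele 2 or None).
example_profile = [(0.10, 0.20), (0.05, None), (0.15, 0.08)]
rmp_overall = profile_rmp(example_profile, locus_prob_theta, 0.16)
rmp_within_breed = profile_rmp(example_profile, locus_prob_fis, 0.0225)
print(f"{rmp_overall:.3e} {rmp_within_breed:.3e}")
```

In real casework the two routes would use different frequency tables (overall versus breed-specific), so the same frequencies are reused here only to keep the sketch short.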
Table 2 Summary statistics of forensically relevant parameters of the 13 analyzed canine STRs based on 1184 dogs, including expected heterozygosity (Hexp), polymorphism information content (PIC), power of discrimination (PD), observed heterozygosity (Hobs), power of exclusion (PE), typical paternity index (TPI), and the p-value for testing Hardy-Weinberg equilibrium (pHW). All parameters were calculated with the software tool STRAF [46].

Locus      Hexp    PIC     PD      Hobs    PE      TPI     pHW
C38        0.901   0.893   0.979   0.761   0.530   2.096   < 0.05
FH2054     0.847   0.829   0.958   0.693   0.417   1.626   < 0.05
FH2087Ub   0.846   0.828   0.957   0.685   0.405   1.587   < 0.05
FH2137     0.931   0.927   0.988   0.787   0.575   2.349   < 0.05
FH2328     0.865   0.850   0.965   0.702   0.431   1.677   < 0.05
FH2361     0.844   0.827   0.956   0.744   0.500   1.954   < 0.05
FH2508     0.838   0.818   0.953   0.661   0.371   1.476   < 0.05
FH2611     0.899   0.890   0.979   0.763   0.532   2.107   < 0.05
FH2613     0.879   0.868   0.971   0.715   0.451   1.751   < 0.05
PEZ15      0.887   0.877   0.974   0.715   0.452   1.757   < 0.05
PEZ3       0.864   0.849   0.965   0.702   0.432   1.680   < 0.05
PEZ6       0.888   0.878   0.976   0.789   0.579   2.368   < 0.05
WILMS-TF   0.898   0.889   0.978   0.753   0.514   2.020   < 0.05
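Two of the parameters in Table 2, TPI and PE, depend only on the observed heterozygosity. The sketch below assumes the widely used PowerStats-style expressions, TPI = 1 / (2h) and PE = H²(1 − 2Hh²) with H = Hobs and h = 1 − Hobs; that STRAF applies exactly these expressions is an assumption here, although the tabulated values agree with them to within rounding. The function names are hypothetical.

```python
def typical_paternity_index(h_obs):
    """TPI = 1 / (2 * homozygosity), with homozygosity = 1 - Hobs."""
    return 1.0 / (2.0 * (1.0 - h_obs))


def power_of_exclusion(h_obs):
    """PE = H^2 * (1 - 2 * H * h^2), with H = Hobs and h = 1 - Hobs."""
    hom = 1.0 - h_obs
    return h_obs ** 2 * (1.0 - 2.0 * h_obs * hom ** 2)


# Hobs values for C38 and PEZ6 as listed in Table 2.
for locus, h_obs in [("C38", 0.761), ("PEZ6", 0.789)]:
    tpi = typical_paternity_index(h_obs)
    pe = power_of_exclusion(h_obs)
    print(f"{locus}: TPI={tpi:.3f} PE={pe:.3f}")
```

Small deviations from the tabulated values are expected because the inputs used here are Hobs values already rounded to three decimals.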
when population structure is not adequately accounted for. A feasible solution – favored in this study and also considered sufficiently conservative by others [11,59] – utilizes allele frequency data obtained across breeds and an overall estimate of population structure (theta correction). This overall approach can therefore also be used for evidence of unknown or uncertain breed affiliation. We calculated RMPs for three STR genotypes, one each from a GS, an LR, and a GR (none of them part of DPS-1184; for genotypes see Table S1), using the formulae of Balding and Nichols [54] to account for breed-based population stratification. In these calculations, DPS-1184 allele frequency data were applied (Table 1) and the FST
Table 3 Numbers of different alleles and the corresponding allelic ranges for the 13 canine STRs (CaDNAP 13-STR panel) in the current population sample (DPS-1184) and the population sample published in [21] (DPS-295). Increased allelic ranges of DPS-1184 compared to DPS-295 are indicated in grey. d.n.a. – does not apply.

           Number of distinct alleles       Repeat range (DPS-295)   Repeat range (DPS-1184)
Locus      DPS-295   DPS-1184   Diff. (%)   min.    max.    span     min.    max.    span
C38        39        58         48.7        11      35.1    24.1     9       37.1    28.1
FH2361     34        44         29.4        10      37      27       10      38      28
PEZ6       30        38         26.7        14      25.3    11.3     14      25.3    11.3
FH2613     26        36         38.5        8       28.2    20.2     8       28.2    20.2
WILMS-TF   27        36         33.3        8       19.3    11.3     2.3     20      17.1
FH2137     28        34         21.4        15.3    28      12.1     15.3    30      14.1
FH2611     22        26         18.2        14      27.2    13.2     10      27.2    17.2
PEZ15      22        26         18.2        6       22.2    16.2     6       23.2    17.2
PEZ3       14        21         50.0        20      35      15       15      35      20
FH2328     13        16         23.1        12      23      11       12      23      11
FH2054     11        15         36.4        9       18      9        8       19      11
FH2508     13        14         7.7         9       15.1    6.1      8       15.1    7.1
FH2087Ub   10        11         10.0        7       16      9        6       16      10
All        289       375        29.8        d.n.a.
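The "Diff. (%)" column of Table 3 appears to be the relative increase in the number of distinct alleles from DPS-295 to DPS-1184. A minimal sketch of that arithmetic follows, assuming this reading of the column; the function name is hypothetical and the counts are copied from the table.

```python
def percent_increase(n_old, n_new):
    """Relative increase from n_old to n_new, in percent."""
    return 100.0 * (n_new - n_old) / n_old


# Allele counts (DPS-295, DPS-1184) for C38, FH2508 and the whole panel.
for label, n_295, n_1184 in [("C38", 39, 58), ("FH2508", 13, 14), ("All", 289, 375)]:
    print(label, round(percent_increase(n_295, n_1184), 1))
```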
Table 4 RMP for three test samples (one dog of each of the three most frequent breeds GS, LR, and GR) using allele frequencies from all dogs or breed-specific frequencies. Applied correction factors and calculation methods [53] are specified.

Test sample   Allele frequency database   Theta (FST)   FIS      Formula                      RMP
1_GS          all breeds (DPS-1184)       0.16                   NRC 4.10                     2.92E-13
2_LR          all breeds (DPS-1184)       0.16                   NRC 4.10                     4.55E-14
3_GR          all breeds (DPS-1184)       0.16                   NRC 4.10                     1.05E-13
1_GS          breed specific (GS)                       0.0225   NRC 4.2 [2pq not 2pq(1-F)]   2.09E-15
2_LR          breed specific (LR)                       0.0535   NRC 4.2 [2pq not 2pq(1-F)]   1.35E-19
3_GR          breed specific (GR)                       0.0502   NRC 4.2 [2pq not 2pq(1-F)]   9.08E-19

Declaration of Competing Interest
The authors declare that they do not have any known conflict of interest.

Acknowledgement
We extend our warmest thanks to all dog owners and breeders enabling the present study through their interest in supporting canine forensic STR analysis.

Appendix A. Supplementary data
Supplementary material related to this article can be found, in the online version, at https://doi.org/10.1016/j.fsigen.2019.06.017.

References
[1] P. Gill, H. Haned, O. Bleka, O. Hansson, G. Dorum, T. Egeland, Genotyping and interpretation of STR-DNA: low-template, mixtures and database matches-Twenty years of research and development, Forensic Sci. Int. Genet. 18 (2015) 100–117.
[2] M.A. Jobling, P. Gill, Encoded evidence: DNA in forensic analysis, Nat. Rev. Genet. 5 (10) (2004) 739–751.
[3] M. Arenas, F. Pereira, M. Oliveira, N. Pinto, A.M. Lopes, V. Gomes, et al., Forensic genetics and genomics: much more than just a human affair, PLoS Genet. 13 (9) (2017) e1006960.
[4] B. Budowle, P. Garofano, A. Hellman, M. Ketchum, S. Kanthaswamy, W. Parson, et al., Recommendations for animal DNA forensic and identity testing, Int. J. Legal Med. 119 (5) (2005) 295–302.
[5] B.G. Cassidy, R.A. Gonzales, DNA testing in animal forensics, J. Wildl. Manage. 69 (4) (2005) 1454–1463.
[6] S. Kanthaswamy, Review: domestic animal forensic genetics - biological evidence, genetic markers, analytical approaches and challenges, Anim. Genet. 46 (5) (2015) 473–484.
[7] H. Miller Coyle, Nonhuman DNA Typing, Theory and Casework Applications, CRC Press, Boca Raton, FL, 2007.
[8] C. Berger, B. Berger, W. Parson, Canine DNA profiling in forensic casework: the tail wagging the dog, Forensic Sci. Rev. 21 (1) (2009) 1–13.
[9] B. van Asch, F. Pereira, State-of-the-Art and future prospects of canine STR-Based genotyping, Open Forensic Sci. J. 3 (2010) 45–52.
[10] T. Kun, L.A. Lyons, B.N. Sacks, R.E. Ballard, C. Lindquist, E.J. Wictum, Developmental validation of Mini-DogFiler for degraded canine DNA, Forensic Sci. Int. Genet. 7 (1) (2013) 151–158.
[11] E. Wictum, T. Kun, C. Lindquist, J. Malvick, D. Vankan, B. Sacks, Developmental validation of DogFiler, a novel multiplex for canine DNA profiling in forensic casework, Forensic Sci. Int. Genet. 7 (1) (2013) 82–91.
[12] J.L. Halverson, C. Basten, Forensic DNA identification of animal-derived trace evidence: tools for linking victims and suspects, Croat. Med. J. 46 (4) (2005) 598–605.
[13] R. Ogden, R.J. Mellanby, D. Clements, A.G. Gow, R. Powell, R. McEwing, Genetic data from 15 STR loci for forensic individual identification and parentage analyses in UK domestic dogs (Canis lupus familiaris), Forensic Sci. Int. Genet. 6 (2) (2012) e63–5.
[14] M. Dayton, M.T. Koskinen, B.K. Tom, A.M. Mattila, E. Johnston, J. Halverson, et al., Developmental validation of short tandem repeat reagent kit for forensic DNA profiling of canine biological material, Croat. Med. J. 50 (3) (2009) 268–285.
[15] B. van Asch, C. Alves, L. Gusmao, V. Pereira, F. Pereira, A. Amorim, A new autosomal STR nineplex for canine identification and parentage testing, Electrophoresis 30 (2) (2009) 417–423.
[16] R.J. Mellanby, R. Ogden, D.N. Clements, A.T. French, A.G. Gow, R. Powell, et al., Population structure and genetic heterogeneity in popular dog breeds in the UK, Vet. J. 196 (1) (2013) 92–97.
[17] C. Eichmann, B. Berger, W. Parson, A proposed nomenclature for 15 canine-specific polymorphic STR loci for forensic purposes, Int. J. Legal Med. 118 (5) (2004) 249–266.
[18] C. Eichmann, B. Berger, M. Steinlechner, W. Parson, Estimating the probability of identity in a random dog population using 15 highly polymorphic canine STR markers, Forensic Sci. Int. 151 (1) (2005) 37–44.
[19] A.P. Hellmann, U. Rohleder, C. Eichmann, I. Pfeiffer, W. Parson, U. Schleenbecker, A proposal for standardization in forensic canine DNA typing: allele nomenclature of six canine-specific STR loci, J. Forensic Sci. 51 (2) (2006) 274–281.
[20] C. Eichmann, B. Berger, M. Reinhold, M. Lutz, W. Parson, Canine-specific STR typing of saliva traces on dog bite wounds, Int. J. Legal Med. 118 (6) (2004) 337–342.
[21] B. Berger, C. Berger, W. Hecht, A. Hellmann, U. Rohleder, U. Schleenbecker, et al., Validation of two canine STR multiplex-assays following the ISFG recommendations for non-human DNA analysis, Forensic Sci. Int. Genet. 8 (1) (2014) 90–100.
[22] A. Linacre, L. Gusmao, W. Hecht, A.P. Hellmann, W.R. Mayr, W. Parson, et al., ISFG: recommendations regarding the use of non-human (animal) DNA in forensic genetic investigations, Forensic Sci. Int. Genet. 5 (5) (2011) 501–505.
[23] M. Bodner, I. Bastisch, J.M. Butler, R. Fimmers, P. Gill, L. Gusmao, et al., Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on quality control of autosomal Short Tandem Repeat allele frequency databasing (STRidER), Forensic Sci. Int. Genet. 24 (2016) 97–102.
[24] K.B. Gettings, L.A. Borsuk, D. Ballard, M. Bodner, B. Budowle, L. Devesse, et al., STRSeq: a catalog of sequence diversity at human identification Short Tandem Repeat loci, Forensic Sci. Int. Genet. 31 (2017) 111–117.
[25] C. Phillips, K.B. Gettings, J.L. King, D. Ballard, M. Bodner, L. Borsuk, et al., "The devil's in the detail": release of an expanded, enhanced and dynamically revised forensic STR Sequence Guide, Forensic Sci. Int. Genet. 34 (2018) 162–169.
[26] S. Bjornerfeldt, F. Hailer, M. Nord, C. Vila, Assortative mating and fragmentation within dog breeds, BMC Evol. Biol. 8 (2008) 28.
[27] D.N. Irion, A.L. Schaffer, T.R. Famula, M.L. Eggleston, S.S. Hughes, N.C. Pedersen, Analysis of genetic variation in 28 dog breed populations with 100 microsatellite markers, J. Hered. 94 (1) (2003) 81–87.
[28] M.T. Koskinen, Individual assignment using microsatellite DNA reveals unambiguous breed identification in the domestic dog, Anim. Genet. 34 (4) (2003) 297–301.
[29] H.G. Parker, L.V. Kim, N.B. Sutter, S. Carlson, T.D. Lorentzen, T.B. Malek, et al., Genetic structure of the purebred domestic dog, Science 304 (5674) (2004) 1160–1164.
[30] G.D. Zouganelis, R. Ogden, N. Nahar, V. Runfola, M. Bonab, A. Ardalan, et al., An old dog and new tricks: genetic analysis of a Tudor dog recovered from the Mary Rose wreck, Forensic Sci. Int. 245 (2014) 51–57.
[31] S. Kanthaswamy, B.K. Tom, A.-M. Mattila, E. Johnston, M. Dayton, J. Kinaga, et al., Canine population data generated from a multiplex STR kit for use in forensic casework, J. Forensic Sci. 54 (4) (2009) 829–840.
[32] B. Berger, C. Berger, J. Heinrich, H. Niederstätter, W. Hecht, A. Hellmann, et al., Dog breed affiliation with a forensically validated canine STR set, Forensic Sci. Int. Genet. 37 (2018) 126–134.
[33] F.C.F. Calboli, J. Sampson, N. Fretwell, D.J. Balding, Population structure and inbreeding from pedigree analysis of purebred dogs, Genetics 179 (1) (2008) 593, https://doi.org/10.1534/genetics.107.084954.
[34] G. Larson, E.K. Karlsson, A. Perri, M.T. Webster, S.Y.W. Ho, J. Peters, et al., Rethinking dog domestication by integrating genetics, archeology, and biogeography, Proc. Natl. Acad. Sci. 109 (23) (2012) 8878, https://doi.org/10.1073/pnas.1203005109.
[35] E.A. Ostrander, R.K. Wayne, A.H. Freedman, B.W. Davis, Demographic history, selection and functional diversity of the canine genome, Nat. Rev. Genet. 18 (12) (2017) 705–720.
[36] H.G. Parker, Genomic analyses of modern dog breeds, Mamm. Genome 23 (1-2) (2012) 19–27.
[37] H.G. Parker, D.L. Dreger, M. Rimbault, B.W. Davis, A.B. Mullen, G. Carpintero-Ramirez, et al., Genomic analyses reveal the influence of geographic origin, migration, and hybridization on modern dog breed development, Cell Rep. 19 (4) (2017) 697–708.
[38] H.G. Parker, S.F. Gilbert, From caveman companion to medical innovator: genomic insights into the origin and evolution of domestic dogs, Adv. Genomics Genet. 5 (2015) 239–255.
[39] M. Jansson, L. Laikre, Pedigree data indicate rapid inbreeding and loss of genetic diversity within populations of native, traditional dog breeds of conservation concern 13 (9) (2018) e0202849.
[40] G. Leroy, X. Rognon, A. Varlet, C. Joffrin, E. Verrier, Genetic variability in French dog breeds assessed by pedigree data, J. Anim. Breed. Genet. 123 (1) (2006) 1–9.
[41] S. Voges, O. Distl, Inbreeding trends and pedigree analysis of Bavarian mountain hounds, Hanoverian hounds and Tyrolean hounds, J. Anim. Breed. Genet. 126 (5) (2009) 357–365.
[42] J.J. Evans, E.J. Wictum, M.C. Penedo, S. Kanthaswamy, Real-time polymerase chain reaction quantification of canine DNA, J. Forensic Sci. 52 (1) (2007) 93–96.
[43] R. Peakall, P.E. Smouse, GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research, Mol. Ecol. Notes 6 (1) (2006) 288–295.
[44] R. Peakall, P.E. Smouse, GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update, Bioinformatics 28 (19) (2012) 2537–2539.
[45] L. Excoffier, H.E. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (3) (2010) 564–567.
[46] A. Gouy, M. Zieger, STRAF-A convenient online tool for STR data evaluation in forensic genetics, Forensic Sci. Int. Genet. 30 (2017) 148–151.
[47] Z.N. Kamvar, J.C. Brooks, N.J. Grunwald, Novel R tools for analysis of genome-wide population genetic data with emphasis on clonality, Front. Genet. 6 (2015) 208.
[48] Z.N. Kamvar, J.F. Tabima, N.J. Grunwald, Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction, PeerJ 2 (2014) e281.
[49] R_Core_Team, R: A Language and Environment for Statistical Computing, (2018).
[50] J.K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using multilocus genotype data, Genetics 155 (2) (2000) 945–959.
[51] L. Porras-Hurtado, Y. Ruiz, C. Santos, C. Phillips, A. Carracedo, M.V. Lareu, An overview of STRUCTURE: applications, parameter settings, and supporting software, Front. Genet. 4 (2013) 98.
[52] N.M. Kopelman, J. Mayzel, M. Jakobsson, N.A. Rosenberg, I. Mayrose, Clumpak: a program for identifying clustering modes and packaging population structure inferences across K, Mol. Ecol. Resour. 15 (5) (2015) 1179–1191.
[53] National Research Council, The Evaluation of Forensic DNA Evidence, Washington (DC) (1996).
[54] D.J. Balding, R.A. Nichols, DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands, Forensic Sci. Int. 64 (2-3) (1994) 125–140.
[55] G. Leroy, Genetic diversity, inbreeding and breeding practices in dogs: results from pedigree analyses, Vet. J. 189 (2) (2011) 177–182.
[56] N. Pedersen, H. Liu, G. Theilen, B. Sacks, The effects of dog breed development on genetic diversity and the relative influences of performance and conformation breeding, J. Anim. Breed. Genet. 130 (3) (2013) 236–248.
[57] K. Streitberger, M. Schweizer, R. Kropatsch, G. Dekomien, O. Distl, M.S. Fischer, et al., Rapid genetic diversification within dog breeds as evidenced by a case study on Schnauzers, Anim. Genet. 43 (5) (2012) 577–586.
[58] M. Smetanova, B. Cerna Bolfikova, E. Randi, R. Caniglia, E. Fabbri, M. Galaverni, et al., From wolves to dogs, and back: genetic composition of the czechoslovakian wolfdog, PLoS One 10 (12) (2015) e0143807.
[59] S. Kanthaswamy, R.F. Oldt, M. Montes, A. Falak, Comparing two commercial domestic dog (Canis familiaris) STR genotyping kits for forensic identity calculations in a mixed-breed dog population sample, Anim. Genet. 50 (1) (2019) 105–111.