Forensic Science International: Genetics 6 (2012) 819–826
European Network of Forensic Science Institutes (ENFSI): Evaluation of new commercial STR multiplexes that include the European Standard Set (ESS) of markers

L.A. Welch a,*, P. Gill b,i,**, C. Phillips c, R. Ansell d, N. Morling e, W. Parson f, J.U. Palo g, I. Bastisch h

a Centre for Forensic Science, Department of Pure and Applied Chemistry, University of Strathclyde, Glasgow G42 9TA, United Kingdom
b Department of Forensic Genetics, Norwegian Institute of Public Health, Oslo, Norway
c Forensic Genetics Unit, University of Santiago de Compostela, Spain
d Swedish National Laboratory of Forensic Science, Sweden
e Department of Forensic Medicine, University of Copenhagen, Denmark
f Institute of Legal Medicine, Innsbruck Medical University, Muellerstrasse 44, 6020 Innsbruck, Austria
g Laboratory of Forensic Biology, University of Helsinki, Finland
h Bundeskriminalamt, Germany
i University of Oslo, Oslo, Norway
ARTICLE INFO

Article history: Received 10 November 2011; received in revised form 20 March 2012; accepted 22 March 2012

Keywords: ENFSI; DNA; STR multiplexes; Europe

ABSTRACT

To support and to underpin the European initiative to increase the European set of standard markers (ESS), by the addition of five new loci, a collaborative project was organised by the European Network of Forensic Science Institutes (ENFSI) DNA working group in order to assess the new multiplex kits available. We have prepared allele frequency databases from 26 EU populations. Concordance studies were carried out to verify that genotyping results were consistent between kits. Population genetics studies were conducted and it was estimated that FST < 0.001. The results showed that the kits were comparable to each other in terms of performance and major discrepancy issues were highlighted. We provide details of allele frequencies for each of the populations analysed per laboratory.

© 2012 Published by Elsevier Ireland Ltd.
* Corresponding author. Tel.: +1 240 626 6856.
** Corresponding author.
E-mail addresses: [email protected], [email protected] (L.A. Welch), [email protected] (P. Gill).
1872-4973/$ – see front matter © 2012 Published by Elsevier Ireland Ltd. http://dx.doi.org/10.1016/j.fsigen.2012.03.005

1. Introduction

It is now more than ten years since the SGM Plus and other equivalent systems were introduced into a number of European laboratories. Since then, national DNA databases have grown considerably in size. The Prüm Treaty (Convention) was established in 2005 between Belgium, Germany, Spain, France, Luxembourg, the Netherlands and Austria to “step up cross-border cooperation, particularly in combating terrorism, cross-border crime and illegal migration” [1]. It was formalised as part of an EU Council decision in 2008, with additional countries signing the declaration (Finland, Slovenia, Hungary, Norway, Estonia and Romania). Italy also signed the declaration in 2009. This Convention enables participating countries to compare unidentified DNA profile data with other databases in order to facilitate cross-border data exchange.

In 2006 the European Network of Forensic Science Institutes (ENFSI) DNA working group and the European DNA Profiling (EDNAP) group published recommendations outlining a proposed evolution of DNA databases across Europe [2,3]. Previous collaborative exercises had shown that success rates from degraded DNA could be markedly improved if smaller amplicons were utilised [4,5]. In particular, Butler et al. had proposed a number of ‘mini-STRs’ for inclusion. These ‘mini-STRs’ had the advantage of satisfying both criteria: small amplicon size and heterozygosities >80% [6].

Historically, there were only seven STR loci common to the available STR multiplex kits – D3S1358, vWA, D8S1179, D21S11, D18S51, HUMTH01 and FGA [7]. The evolution of DNA databases meant that more loci would need to be added to the standard set in order to facilitate the sharing of data between countries. Gill et al. suggested the addition of five new loci to increase the standard set of loci from seven to twelve – D10S1248, D12S391, D22S1045, D1S1656 and D2S441 [2,3] – and these were adopted by an EU Council recommendation in 2009 [8] as the new European Standard Set (ESS) loci. The addition of new loci to the ESS decreases the chance of obtaining false positive matches in cross-border DNA data exchanges, especially when profiles are partial (incomplete), whilst the small amplicon sizes of the new loci increase the chance of amplification in degraded samples, where DNA may be fragmented and/or in low quantity.
In response to the ENFSI/EDNAP recommendation to increase the number of STR loci available, two commercial companies, Applied Biosystems™ and Promega Corporation, developed new multiplex systems which included the five new ESS loci. In addition, all of the new multiplex kits available have improved buffer systems. These give increased amplification efficiency and reduced inhibition effects, and the inclusion of mini-STRs increases the chance of obtaining a DNA profile from degraded sample types.

The Applied Biosystems™ AmpFℓSTR® NGM™ PCR Amplification Kit amplifies 15 STR loci plus amelogenin, incorporating all ten loci from the previous AmpFℓSTR® SGM Plus® PCR Amplification Kit as well as the five new ESS loci. The Promega PowerPlex® ESX-16 and ESI-16 Systems amplify all twelve ESS loci plus D2S1338, D16S539, D19S433 and amelogenin; the ESX-17 and ESI-17 Systems incorporate an additional locus, SE33. QIAGEN also has a license to distribute commercial STR multiplex kits and the ESSplex® was included in this study. ESSplex® incorporates the same STR loci as the ESX-16 and ESI-16 Systems from Promega, whilst the ESSplex SE® kit includes SE33.

A collaborative project was organised by the ENFSI DNA working group in order to assess the new multiplex kits available. We have prepared allele frequency databases from 26 EU populations. Some laboratories have published either all or part of their data set as separate publications [9–15]. Concordance studies were carried out to verify that genotyping results were consistent between kits. Population genetics studies were conducted.

2. Materials and methods

2.1. Sample collection

A total of 26 European laboratories participated. Genotype data were collected and amalgamated onto Microsoft Excel® spreadsheets listing the loci specific to each multiplex tested (Table 1). Each laboratory applied for ethical approval within their local jurisdictions before carrying out analysis. Data were provided anonymously.
For legal reasons, four laboratories could only provide ‘shuffled’ data sets, whereby the alleles within each locus were randomly assorted. This had no effect on the allele frequencies of the data set but precluded their inclusion in population genetics and concordance studies, as the original genotypes were unknown.

Table 1
STR loci investigated in each kit (plus amelogenin = AMELO). Loci are ordered according to size within each dye set. NGM = AmpFℓSTR® NGM™ PCR Amplification Kit; ESX-16/17 = PowerPlex® ESX-16/ESX-17 (+SE33); ESI-16/17 = PowerPlex® ESI-16/ESI-17 (+SE33); ESSplex = QIAGEN® Investigator ESSplex.

NGM
  Blue:   D10S1248, vWA, D16S539, D2S1338
  Green:  AMELO, D8S1179, D21S11, D18S51
  Yellow: D22S1045, D19S433, TH01, FGA
  Red:    D2S441, D3S1358, D1S1656, D12S391

ESX-16/17
  Blue:   AMELO, D3S1358, TH01, D21S11, D18S51
  Green:  D10S1248, D1S1656, D2S1338, D16S539
  Yellow: D22S1045, vWA, D8S1179, FGA
  Red:    D2S441, D12S391, D19S433, (SE33)

ESI-16/17
  Blue:   AMELO, D3S1358, D19S433, D2S1338, D22S1045
  Green:  D16S539, D18S51, D1S1656, D10S1248, D2S441
  Yellow: TH01, vWA, D21S11, D12S391
  Red:    D8S1179, FGA, (SE33)

ESSplex
  Blue:   AMELO, TH01, D3S1358, vWA, D21S11
  Green:  D16S539, D1S1656, D19S433, D8S1179, D2S1338
  Yellow: D10S1248, D22S1045, D12S391, FGA
  Red:    D2S441, D18S51
2.2. Data analysis

2.2.1. Sample collation and data transformation

Alleles that fell outside of the range of the control allelic ladders were designated as ‘rare’ but still allocated an allele number. Samples with multiple alleles missing were removed from further analysis. Population data for each kit were amalgamated into Excel® spreadsheets. Sample IDs were anonymised to simple numbers indicative of the Lab ID (e.g. 1-1, 1-2, 1-200; 35-1, 35-2, 35-60; for labs 1 and 35, respectively). After collation, CONVERT software was used to create files suitable for population analysis programs. CONVERT.exe “facilitates ready transfer of co-dominant, diploid genotypic data amongst commonly used population genetic software packages” [16].

2.2.2. Concordance studies

A MATLAB® program was prepared by P. Gill to detect duplicate data within each population dataset (within a single kit), and to discover non-concordances between the different STR kits used. A number of duplicates were detected within the datasets. After verification by the submitting laboratory, one of each pair of duplicate data was deleted from each affected population. All non-concordant samples were checked with the corresponding laboratories before any corrections were made to datasets.

2.2.3. Allele frequency calculations

CONVERT software was used to produce a table of allele frequencies per locus across all populations tested. An overall allele frequency for the entire data set was also calculated. Sample numbers for each population were recorded.

2.2.4. Hardy–Weinberg calculations

Arlequin software was used to test for Hardy–Weinberg equilibrium for each locus within each population [17]. Arlequin uses a test analogous to Fisher’s Exact test to calculate an overall probability value for each locus, based on the algorithm outlined by Guo and Thompson [18]. Hardy–Weinberg expectations were also evaluated using Genetic Data Analysis (GDA) software [18] for comparison. This method is also based on Fisher’s Exact tests, as described by Weir [19].

2.2.5. Genetic data analysis (GDA)

GDA software [18], with Fisher’s Exact tests [20], was used to study associations between and within loci which may be indicative of disequilibrium. With hundreds of tests carried out simultaneously, a large number of statistically significant results will be expected to occur by chance. If there is no association, the probabilities should be uniformly distributed between zero and one. Probability plots (P–P plots) following Buckleton et al. [21] were used to test this. GDA software was also used to calculate the heterozygosity of each locus and the FST obtained with the different kits.

2.2.6. Profile frequency

The profile frequency was calculated for each population for each kit using the product rule. The most common genotypes (Pm) were used to create an STR profile that was heterozygous at every locus, and was the least discriminating genotype [22]. Common genotypes were calculated from the combined allele frequencies for all populations for each kit. The frequencies of the two most common alleles per locus were used for the calculations using Eq. (1):
$$P_m = \prod_{\substack{i=1 \\ p_i \neq q_i}}^{n} 2\,p_i\,q_i \qquad (1)$$
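The product in Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout and the example frequencies are hypothetical and are not taken from the study data; p and q are taken as the two highest allele frequencies at each locus.

```python
from math import prod


def most_common_profile_frequency(allele_freqs_by_locus):
    """Product-rule frequency of the profile that is heterozygous for the
    two most common alleles at every locus (Eq. (1))."""
    per_locus = []
    for locus, freqs in allele_freqs_by_locus.items():
        if len(freqs) < 2:
            raise ValueError(f"{locus}: need at least two alleles")
        # p and q are the two highest allele frequencies at this locus
        p, q = sorted(freqs.values(), reverse=True)[:2]
        per_locus.append(2 * p * q)
    return prod(per_locus)


# Hypothetical allele frequencies for two loci (illustration only)
example = {
    "LocusA": {"12": 0.30, "13": 0.25, "14": 0.45},
    "LocusB": {"6": 0.20, "7": 0.45, "9.3": 0.35},
}
pm = most_common_profile_frequency(example)
```

For the two hypothetical loci the per-locus terms are 2 × 0.45 × 0.30 and 2 × 0.45 × 0.35, so the product shrinks quickly as further loci are added.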
Table 2
Anonymous Lab IDs were allocated prior to data collection. Multiple sample numbers in the second column indicate that a different number of samples were genotyped with each kit. ESX and ESI systems were separated to indicate those labs that included locus SE33. [The per-laboratory kit columns (NGM, ESX-16, ESX-17, ESI-16, ESI-17, ESSplex; marked 'x' where used) are not recoverable from the source layout and are omitted here.]

Lab ID   No. of samples   No. of kits used
1        333/330          2
2        206              1
3        142              2
4        151              3 b
5        45               3 b
6        331              2
7        247              1
8        202              3 b
10       200              1
11       132              3 b
12       200              3 b
13       206              2 b
14       217/224          2 b
17       200              1
18       202              2 b
20       208              2 b
22 a     210              4
23       304              2 b
25       200              2 b
26       284              3
27       425              1
32       207              1
33 a     208              1
34 a     171              1
35       200              3 b
36       104              1

a Indicates laboratories that provided shuffled data.
b Indicates concordance tested.
In Eq. (1), p_i is the frequency of the most common allele at locus i, q_i is the frequency of the second most common allele at the same locus, and n is the number of loci in the kit.

3. Results and discussion

3.1. Sample collection

Data were collected from 26 European countries (Table 2).

3.2. Data analysis

3.2.1. Duplicate pairs and non-concordance samples

Table 3 shows the number of duplicate samples observed across all populations for each STR kit tested. Duplicates were only removed after verification from the participant laboratory. All statistical analysis was carried out on the verified sets of data.

One population was found to have twenty discordances between the NGM and ESI kits. This population was removed from further analysis because the high error rate in the laboratory made its genotyping results unreliable (data not shown). The majority of the initial discordances were the result of transcription errors (Table 4a). For example: an 18.3 allele at D1S1656 typed as 18.8; a 29 allele at D21S11 typed as 20; a 22 allele at FGA typed as 12; and, at amelogenin, XX mis-transcribed as XY. After these initial transcription errors were removed, 13 discordances remained (‘resolved’ discordances) (Tables 4a and 4b).

Seven out of the ten resolved NGM/ESX discordances were found within one population; the remaining three were in one other population. These differences were homozygote/heterozygote genotypes in the following loci: D8S1179, vWA, D3S1358, FGA, D10S1248 and D18S51 (Table 4b). The two resolved discordances seen between NGM and ESI were homozygote/heterozygote genotypes in D1S1656 and D2S1338 (Table 4b). The resolved discordance between ESX and ESI was at the FGA locus – ESX genotyped as 23, 24.1 and ESI was homozygous 23, 23 (see examples, Fig. 1a and b).

Discordance rates were very low for all kits, supporting the use of any of the kits by any laboratory. It is well known that different kits can sometimes produce different results, due to the presence of primer-binding site mutations, sequence differences within primer-binding regions, and insertions/deletions within the amplicon region [23–26].

Five instances of tri-allelic genotypes were noted in separate samples, regardless of the kit used, for the vWA, D18S51, SE33, D2S1338 and FGA loci. These genotypes were verified using different STR kits (see supplementary data for examples).

Table 3
Number of pairs of duplicated data per kit tested.

                                 NGM     ESX     ESI     ESS
No. of duplicate pairs           6       3       2       0
Total no. of samples genotyped   3316    2977    3204    690
Duplication rate (%)             0.18    0.10    0.06    0.00

Table 4a
The number of comparisons and discordances seen between different STR kits.

Kit comparison                         NGM/ESX   NGM/ESI   NGM/ESS   ESX/ESI   ESX/ESS   ESI/ESS
No. of comparisons                     18,756    18,372    511       24,800    932       432
Initial no. of discordances            11        8         0         2         0         0
Final no. of resolved discordances     10        2         0         1         0         0
Concordance (%)                        99.95     99.99     100.00    99.99     100.00    100.00

Table 4b
Resolved discordances – i.e. discordances remaining after transcription errors were removed from the datasets.

Discordant locus   Genotype Kit 1    Genotype Kit 2
vWA                19, 19 [NGM]      16, 19 [ESX]
vWA                14, 14 [NGM]      14, 16 [ESX]
D18S51             15, 15 [NGM]      12, 15 [ESX]
vWA                14, 14 [NGM]      14, 18 [ESX]
vWA                14, 14 [NGM]      14, 17 [ESX]
D10S1248           13, 16 [NGM]      16, 16 [ESX]
FGA                19, 25 [NGM]      19, 19 [ESX]
D18S51             14, 20 [NGM]      20, 20 [ESX]
D3S1358            15, 17 [NGM]      17, 17 [ESX]
D8S1179            14, 15 [NGM]      14, 14 [ESX]
D1S1656            16, 17 [NGM]      17, 17 [ESI]
D2S1338            17, 24 [NGM]      24, 24 [ESI]
Amelogenin         X, X [ESX]        X, Y [ESI]

Fig. 1. (a) Discordance seen at the vWA locus. The top panel shows a 14, 14 homozygous genotype using the NGM kit. The bottom panel reveals a 14, 17 genotype using ESX-17. (b) Discordance seen between ESI and ESX at the FGA locus.
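The kind of comparison summarised in Tables 4a and 4b can be sketched as follows. The MATLAB program used in the study is not reproduced here; this Python sketch is hypothetical, assumes that one "comparison" is one sample-locus pair typed with both kits, and uses made-up sample IDs and genotypes.

```python
def find_discordances(kit1, kit2):
    """Compare genotypes for every sample and locus typed with both kits.

    kit1, kit2: dict mapping sample ID -> dict mapping locus -> (allele, allele).
    Returns (number of comparisons, list of discordant records).
    """
    comparisons = 0
    discordant = []
    for sample in sorted(kit1.keys() & kit2.keys()):
        shared_loci = kit1[sample].keys() & kit2[sample].keys()
        for locus in sorted(shared_loci):
            comparisons += 1
            # Sort alleles so that (14, 17) and (17, 14) count as the same genotype
            g1 = tuple(sorted(kit1[sample][locus]))
            g2 = tuple(sorted(kit2[sample][locus]))
            if g1 != g2:
                discordant.append((sample, locus, g1, g2))
    return comparisons, discordant


def concordance_percent(comparisons, discordances):
    """Percentage of comparisons that gave the same genotype."""
    return 100.0 * (1 - discordances / comparisons)


# Hypothetical genotypes (illustration only)
ngm = {"1-1": {"vWA": ("14", "14"), "FGA": ("19", "25")},
       "1-2": {"vWA": ("16", "17")}}
esx = {"1-1": {"vWA": ("17", "14"), "FGA": ("25", "19")},
       "1-2": {"vWA": ("16", "17")},
       "1-3": {"vWA": ("15", "15")}}

n_comparisons, discordant = find_discordances(ngm, esx)
```

Applied to the NGM/ESX column of Table 4a, `concordance_percent(18756, 10)` rounds to the reported 99.95%.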
Table 5
Rare alleles identified using the NGM STR kit. NGM and ESX/ESI ladder ranges are included for comparison. NB. Only the rare allele 28 at D12S391 was not present in any allelic ladder.

Locus      Rare allele designation   NGM ladder range       ESX/ESI ladder ranges
D12S391    28                        14–27                  14–27
D10S1248   19                        8–18                   8–19
D2S1338    12, 13                    15–28                  10–28
D19S433    18, 18.2                  9–17.2                 5.2–18.2
FGA        15, 16                    17–33.2; 42.2–51.2     14–33.2; 40.2–50.2
D2S441     8                         9–16                   8–17
D3S1358    10, 11                    12–19                  9–20
Table 6
P-values for the NGM STR kit Hardy–Weinberg equilibrium (HWE) calculations generated using GDA software. A Šidák correction, if a significance level of α = 0.05 is used for 15 loci per population, corresponds to α[per test] = 0.0034 (which was used as the significance level). P-values marked with an asterisk (bold in the original) indicate loci that show a statistical deviation away from HWE.

Each row lists the P-values for one population in the following locus order:
D10S1248, vWA, D16S539, D2S1338, D8S1179, D21S11, D18S51, D22S1045, D19S433, TH01, FGA, D2S441, D3S1358, D1S1656, D12S391

Population ID   P-values
3               0.778 0.946 0.435 0.100 0.524 0.482 0.006 0.066 0.417 0.555 0.272 0.285 0.745 0.081 0.753
4               0.193 0.420 0.125 0.678 0.338 0.193 0.953 0.256 0.082 0.151 0.557 0.745 0.325 0.851 0.494
5               0.718 0.525 0.550 0.688 0.271 0.879 0.711 0.792 0.818 0.981 0.804 0.341 0.903 0.457 0.241
8               0.813 0.749 0.135 0.457 0.472 0.258 0.190 0.841 0.345 0.099 0.601 0.414 0.615 0.515 0.881
10              0.066 0.444 0.597 0.205 0.143 0.059 0.007 0.629 0.156 0.075 0.002* 0.529 0.876 0.862 0.592
12              0.050 0.522 0.043 0.198 0.906 0.824 0.867 0.671 0.146 0.056 0.828 0.624 0.568 0.953 0.177
13              0.403 0.121 0.230 0.034 0.678 0.205 0.352 0.794 0.423 0.913 0.398 0.142 0.003* 0.115 0.719
14              0.013 0.190 0.190 0.612 0.433 0.412 0.919 0.402 0.473 0.262 0.894 0.425 0.858 0.879 0.658
18              0.712 0.937 0.929 0.230 0.302 0.128 0.786 0.457 0.808 0.490 0.121 0.628 0.678 0.244 0.576
20              0.369 0.506 0.268 0.353 0.525 0.479 0.373 0.874 0.888 0.849 0.234 0.742 0.873 0.013 0.310
23              0.346 0.084 0.193 0.806 0.581 0.187 0.363 0.183 0.910 0.582 0.666 0.972 0.924 0.194 0.477
26              0.194 0.475 0.603 0.475 0.685 0.059 0.299 0.441 0.156 0.741 0.341 0.838 0.618 0.505 0.414
32              0.549 0.282 0.543 0.048 0.143 0.930 0.246 0.275 0.640 0.646 0.524 0.238 0.579 0.660 0.206
35              0.097 0.947 0.364 0.380 0.455 0.298 0.722 0.721 0.441 0.223 0.232 0.519 0.135 0.547 0.533
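The per-test significance level quoted in the Table 6 caption follows from the standard Šidák formula, α[per test] = 1 − (1 − α)^(1/m) for m tests. The short Python sketch below reproduces the calculation; the function names are illustrative, and the two example P-values are the D18S51 and FGA entries for population 10 in Table 6.

```python
def sidak_per_test_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate
    at alpha over n_tests independent tests (Sidak correction)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def loci_deviating(p_values_by_locus, alpha=0.05):
    """Return the loci whose P-value falls below the Sidak-corrected level."""
    threshold = sidak_per_test_alpha(alpha, len(p_values_by_locus))
    return [locus for locus, p in p_values_by_locus.items() if p < threshold]


alpha_15_loci = sidak_per_test_alpha(0.05, 15)  # NGM: 15 loci per population
alpha_16_loci = sidak_per_test_alpha(0.05, 16)  # kits with 16 loci per population

# Two of the population 10 values from Table 6, tested at the 15-locus level
threshold = alpha_15_loci
flagged = [locus for locus, p in {"D18S51": 0.007, "FGA": 0.002}.items()
           if p < threshold]
```

With 15 loci the corrected level rounds to 0.0034 and with 16 loci to 0.0032, matching the values given in the captions of Tables 6 to 8.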
3.2.2. Allele frequency tables

Allele frequency tables were compiled for each STR kit for every population tested. Rare alleles, i.e. alleles that fell outside of the ladder range, were observed in some populations using the NGM kit (Table 5). The ESX and ESI allelic ladder ranges were wider than the NGM allelic ladder ranges. Most rare alleles had corresponding peaks within the ESX/ESI ladder. The one exception to this was an allele 28 at D12S391 (the allelic range extended to 27).

The allele frequencies observed between the different European populations were similar for each locus. This was expected, given previous studies on similar population sets [27]. Consequently, low θ values (<0.005) were recorded (Section 3.2.6).

3.2.3. Hardy–Weinberg calculations

Hardy–Weinberg calculations were carried out using both Arlequin and GDA programs, as described in Section 2.2.4. The two datasets were statistically compared to determine whether there was any difference between the results from the two separate programs. A Mann–Whitney statistical test for differences between population medians was used and no significant differences were observed (NGM, Pr = 0.9593; ESX, Pr = 0.9730; ESI, Pr = 0.9520; ESS, Pr = 0.9831). The GDA dataset was used for all further tests, so that the same population genetics program was applied throughout.

The arbitrary significance level of α ≤ 0.05 would be achieved in c. 1 in 20 tests by chance. Given the large number of tests utilised, a Šidák correction was applied to the data to give a revised significance level of α = 0.01 [28]. At this level, there was some deviation from Hardy–Weinberg within the NGM, ESI and ESX datasets (Tables 6–8). Further investigation of these populations revealed that tests of significance were greatly affected by homozygous or heterozygous rare allele genotypes (Table 9). To explore the cause of the deviations, rare genotypes were temporarily deleted from the data sets; this removed the significant P-values (data not shown). Data were reinstated for all further tests. Subject to the observation of the rare genotypes, these results indicated that the assumption of independence was reasonable for all populations [29–31].

Table 7
P-values for the ESX STR kit Hardy–Weinberg equilibrium (HWE) calculations generated using GDA software. No statistical deviations from HWE were observed. A Šidák correction was used where α[per test] = 0.0032 corresponds to α = 0.05.

Each row lists the P-values for one population in the following locus order (– = not typed):
D3S1358, TH01, D21S11, D18S51, D10S1248, D1S1656, D2S1338, D16S539, D22S1045, vWA, D8S1179, FGA, D2S441, D12S391, D19S433, SE33

Population ID   P-values
6               0.137 0.884 0.401 0.358 0.733 0.373 0.121 0.082 0.778 0.874 0.521 0.233 0.564 0.474 0.560 0.346
8               0.568 0.036 0.709 0.169 0.854 0.693 0.457 0.056 0.800 0.677 0.496 0.481 0.696 0.423 0.569 0.060
11              0.908 0.742 0.149 0.612 0.265 0.840 0.233 0.610 0.300 0.985 0.951 0.797 0.560 0.642 0.448 0.362
12              0.557 0.151 0.795 0.695 0.009 0.944 0.378 0.135 0.690 0.368 0.860 0.881 0.448 0.677 0.303 0.998
20              0.781 0.879 0.104 0.367 0.162 0.684 0.264 0.151 0.877 0.347 0.769 0.211 0.755 0.021 0.860 –
23              0.898 0.244 0.034 0.449 0.739 0.120 0.828 0.107 0.156 0.173 0.818 0.561 0.863 0.512 0.236 –
25              0.040 0.620 0.797 0.029 0.628 0.392 0.449 0.162 0.073 0.379 0.544 0.906 0.268 0.163 0.914 0.526
26              0.491 0.693 0.212 0.918 0.136 0.439 0.384 0.575 0.285 0.529 0.735 0.228 0.989 0.200 0.097 0.611
27              0.187 0.391 0.764 0.247 0.815 0.461 0.953 0.961 0.824 0.968 0.469 0.194 0.354 0.379 0.261 –
35              0.023 0.298 0.456 0.635 0.221 0.200 0.448 0.588 0.837 0.936 0.583 0.448 0.214 0.140 0.633 0.148

Table 8
P-values for the ESI STR kit Hardy–Weinberg equilibrium (HWE) calculations generated using GDA software. The P-value marked with an asterisk (bold in the original) indicates a locus that shows a statistical deviation away from HWE. A Šidák correction was used where α[per test] = 0.0032 corresponds to α = 0.05.

Each row lists the P-values for one population in the following locus order (– = not typed):
D3S1358, D19S433, D2S1338, D22S1045, D16S539, D18S51, D1S1656, D10S1248, D2S441, TH01, vWA, D21S11, D12S391, D8S1179, FGA, SE33

Population ID   P-values
3               0.992 0.295 0.108 0.106 0.047 0.287 0.182 0.767 0.328 0.471 0.933 0.663 0.838 0.637 0.332 0.892
4               0.303 0.025 0.700 0.254 0.140 0.933 0.733 0.171 0.738 0.170 0.320 0.173 0.006 0.155 0.526 0.402
5               0.893 0.820 0.690 0.782 0.544 0.707 0.442 0.721 0.333 0.982 0.534 0.874 0.235 0.285 0.805 –
6               0.159 0.611 0.150 0.799 0.038 0.685 0.524 0.568 0.663 0.893 0.880 0.842 0.001* 0.245 0.010 0.343
7               0.210 0.159 0.657 0.775 0.293 0.445 0.268 0.377 0.264 0.655 0.227 0.762 0.422 0.471 0.563 0.133
8               0.618 0.356 0.471 0.851 0.131 0.186 0.515 0.815 0.413 0.100 0.744 0.260 0.868 0.473 0.604 0.188
11              0.125 0.302 0.534 0.787 0.519 0.138 0.411 0.693 0.491 0.687 0.163 0.610 0.284 0.878 0.732 0.238
12              0.561 0.142 0.195 0.650 0.038 0.846 0.950 0.049 0.628 0.050 0.531 0.828 0.173 0.903 0.820 0.944
13              0.006 0.413 0.031 0.786 0.233 0.350 0.074 0.412 0.128 0.903 0.134 0.205 0.752 0.700 0.391 0.312
14              0.783 0.471 0.716 0.396 0.273 0.949 0.848 0.018 0.500 0.288 0.245 0.448 0.609 0.452 0.708 0.403
17              0.533 0.040 0.364 0.173 0.452 0.653 0.293 0.203 0.344 0.869 0.333 0.072 0.537 0.078 0.779 0.022
18              0.800 0.796 0.364 0.374 0.883 0.750 0.232 0.740 0.687 0.394 0.918 0.132 0.609 0.320 0.120 –
25              0.011 0.977 0.486 0.042 0.312 0.098 0.569 0.730 0.342 0.588 0.335 0.793 0.571 0.563 0.745 0.446
36              0.131 0.054 0.110 0.343 0.968 0.529 0.195 0.443 0.016 0.673 0.104 0.235 0.612 0.928 0.879 –

Table 9
Rare allele genotypes identified within populations that deviated from HWE. NB. One sample in population 13 gave the same rare genotype 12, 12 for D3S1358 using both NGM and ESI. Note that a rare homozygote is not expected, and may be indicative of inbreeding.

Population ID   Locus      STR kit   Rare genotype     Allele frequency
2               D2S441     NGM       13.3, 13.3 [2]    0.0122
3               D18S51     NGM       10, 10            0.0112
6               D12S391    ESI       20.3, 20.3        0.003
10              D18S51     NGM       11, 11            0.0075
10              FGA        NGM       24.2, 26.2        0.0025, 0.0025
12              D10S1248   ESX       12, 17            0.0325, 0.05
13              D3S1358    NGM       12, 12            0.0073
13              D3S1358    ESI       12, 12            0.0073

3.2.4. Linkage disequilibrium

Probability plots (P–P plots) were created for each population for each STR kit, as described in Section 2.2.5, using data derived from GDA calculations for linkage disequilibrium. Most plots fell within the expected boundaries; however, some lay outside of the predicted range. Example plots are given in Fig. 2. Further exploratory analysis of these populations again identified rare genotypes, causing significant differences as outlined in Table 9. Once these were removed from the dataset, populations were within the ranges predicted by chance.

3.2.5. Heterozygosity

Heterozygosity values for each locus were calculated by combining all population datasets. All loci gave heterozygosity values above 0.695 (69.5%). Nine loci had values greater than 0.795 (79.5%), with the highest value found at the SE33 locus (0.95, 95%), present in the ESX and ESI STR kits.

3.2.6. FST values

FST (θ) values were calculated from the combined datasets for each kit using GDA software, as outlined by Weir [19]. FST estimates the differentiation between populations (population structure).
Fig. 2 panels: random samples P–P plots for Population Numbers 4, 10, 14 and 18 (x-axis: expected p-values; y-axis: observed p-values; both axes run from 0 to 1).
Fig. 2. Examples of P–P plots derived from linkage disequilibrium data generated by GDA software. Blue lines indicate maximum/minimum values; red lines indicate the 5th/ 95th percentiles; grey line indicates the median. Black line is the observed data plot. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)
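The idea behind such a plot can be sketched by simulation: if there is no association, the sorted P-values from m tests behave like the order statistics of m uniform random numbers, so an envelope can be built from simulated uniform samples. The Python sketch below is a hypothetical illustration of that idea only; it is not the GDA output or necessarily the procedure of Buckleton et al. [21], and the function names, the number of simulations and the seed are assumptions.

```python
import random


def expected_p_values(n_tests):
    """Expected value of the k-th smallest of n_tests uniform P-values."""
    return [(k + 1) / (n_tests + 1) for k in range(n_tests)]


def pp_envelope(n_tests, n_sim=1000, seed=0):
    """Simulated 5th percentile, median and 95th percentile of each ordered
    P-value under the null hypothesis of uniformly distributed P-values."""
    rng = random.Random(seed)
    sims = [sorted(rng.random() for _ in range(n_tests)) for _ in range(n_sim)]
    lower, median, upper = [], [], []
    for k in range(n_tests):
        column = sorted(sim[k] for sim in sims)
        lower.append(column[int(0.05 * (n_sim - 1))])
        median.append(column[int(0.50 * (n_sim - 1))])
        upper.append(column[int(0.95 * (n_sim - 1))])
    return lower, median, upper


def outside_envelope(observed_p, lower, upper):
    """Ranks (0-based) at which the sorted observed P-values leave the envelope."""
    ordered = sorted(observed_p)
    return [k for k, p in enumerate(ordered) if not lower[k] <= p <= upper[k]]


# 15 loci give 105 pairwise tests between loci
n_pairs = 15 * 14 // 2
lower, median, upper = pp_envelope(n_pairs)
```

Plotting the sorted observed P-values against `expected_p_values(n_pairs)` together with the three envelope lines gives a plot of the same general form as the panels in Fig. 2.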
Fig. 3. FST (θ) values for each locus (per kit) using NGM, ESI and ESX STR kits (y-axis range 0–0.006). ESS was not included in the analysis due to the limited population size (n = 2).
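To illustrate how a θ value enters a genotype frequency calculation, the Python sketch below uses the subpopulation-corrected genotype frequencies of NRC II Recommendation 4.2 (the Balding–Nichols formulae). Using this particular correction is an assumption made for illustration; the function name and the example allele frequencies are hypothetical and are not taken from the study data.

```python
def genotype_frequency(p, q=None, theta=0.0):
    """Genotype frequency with a theta (FST) correction.

    p, q: allele frequencies; pass q=None for a homozygote.
    With theta = 0 the formulae reduce to p**2 and 2*p*q.
    """
    denominator = (1 + theta) * (1 + 2 * theta)
    if q is None:
        # Homozygote
        return ((2 * theta + (1 - theta) * p)
                * (3 * theta + (1 - theta) * p)) / denominator
    # Heterozygote
    return (2 * (theta + (1 - theta) * p)
            * (theta + (1 - theta) * q)) / denominator


# Hypothetical heterozygote with allele frequencies 0.2 and 0.3
f_no_correction = genotype_frequency(0.2, 0.3, theta=0.0)
f_theta_0005 = genotype_frequency(0.2, 0.3, theta=0.005)
f_theta_001 = genotype_frequency(0.2, 0.3, theta=0.01)
```

For these example frequencies a larger θ gives a larger, and therefore more conservative, genotype frequency.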
The NRC II report published in 1996 recommended the use of θ = 0.01 for general populations and 0.03 for smaller isolated populations where there is a higher likelihood of inbreeding [32]. The value of 0.01 for the US population has been shown to be highly conservative [33,34] and the degree of polymorphism within the allele frequencies across all of the European populations was similar (Section 3.2.2), therefore it was anticipated that θ values would be low across the dataset. The highest θ observed was 0.005 (NGM, D10S1248). All other θ values were less than 0.004 (Fig. 3), an order of magnitude lower than the recommended 0.01 used in frequency calculations.

3.2.7. Profile frequency

Profile frequencies were calculated for each population, using the most common alleles approach, as described in Section 2.2.6.

Table 10
Most common genotype profiles for each STR kit. Alleles in bold (in the original) indicate variation between kits. This was due to high similarities between different allele frequencies, altering the commonest alleles relative to the kit used and sample numbers utilised in the calculation.

Locus      NGM        ESI        ESX          ESS
D10S1248   13, 14     14, 13     14, 13       13, 14
vWA        16, 17     17, 18     16, 17       16, 17
D16S539    11, 12     11, 12     11, 12       11, 12
D2S1338    17, 20     17, 20     17, 20       17, 20
D8S1179    13, 14     13, 14     13, 14       13, 14
D21S11     29, 30     29, 30     29, 30       29, 30
D18S51     14, 15     13, 15     12, 15       14, 15
D22S1045   15, 16     15, 16     15, 16       16, 15
D19S433    13, 14     13, 14     13, 14       13, 14
TH01       6, 9.3     6, 9.3     6, 9.3       6, 9.3
FGA        21, 22     21, 22     21, 22       21, 22
D2S441     11, 14     11, 14     11, 14       11, 14
D3S1358    15, 16     15, 16     15, 16       15, 16
D1S1656    15, 17.3   15, 17.3   15, 17.3     12, 15
D12S391    18, 20     18, 20     18, 20       18, 19
SE33       –          27.2, 19   27.2, 28.2   –
The most common genotype profiles were similar amongst the different kits, with slight variation at some loci (Table 10). An overall profile frequency was calculated for each kit by using a combined allele frequency, i.e. all samples were collated for all populations to provide overall allele frequencies per kit. These would be the allele frequencies used for a pan-European database. The overall profile frequencies were calculated as: 2.7 × 10⁻¹⁵ (NGM), 3.0 × 10⁻¹⁷ (ESX-17), 2.0 × 10⁻¹⁷ (ESI-17), 7.0 × 10⁻¹⁵ (ESS), 2.0 × 10⁻¹⁵ (ESX-16) and 1.5 × 10⁻¹⁵ (ESI-16). ESX-17 and ESI-17 gave profile frequencies that were 2–3 orders of magnitude lower (i.e. more discriminating) than NGM, ESS, ESX-16 and ESI-16, due to the presence of the highly heterozygous SE33 locus (heterozygosity = 95%).

4. Conclusion

The increased use of forensic DNA analysis in criminal casework has accelerated the addition of DNA profiles to national DNA databases. The increased number of profiles leads to an increased likelihood of false positive matches. This likelihood is increased further by the advent of the Prüm Treaty [1], which allows sharing of DNA profile data across individual countries. As a result of a European requirement to update and improve DNA profiling systems, new STR profiling kits have become available, allowing the analysis of more loci, with smaller amplicon sizes and enhanced buffer systems. These kits contain all of the ESS loci agreed by the EU commission, as well as extra loci which are common to previously manufactured kits.

The analysis of STR data from 26 European populations shown here indicates that the new STR kits are fit for purpose. There does not appear to be significant deviation from Hardy–Weinberg equilibrium, and population substructure is accommodated by application of a correction factor using FST = 0.01. Rare alleles were shown to have a large effect on statistical tests; otherwise, genotypes can be considered to be in linkage equilibrium. Some discordances were noted between kits, but this is to be expected and can be accommodated in database searches. Overall, the profile frequencies generated by the new kits greatly decrease the chance of obtaining false positive results.

Acknowledgments

Institute of Forensic Research, Poland.
Reparto Carabinieri Investigazioni Scientifiche, Italy – Andrea Berti.
Ministero dell’Interno, Direzione Centrale Anticrimine della Polizia di Stato, Servizio Polizia Scientifica, Italy – Renato Biondo.
The Forensic Science Service, UK – Andy Hopwood, Valerie Tucker.
Landeskriminalamt Northrhine-Westfalia, Germany – Sabine West.
Institute of Forensic Science, Slovakia – Roman Lohaj.
Institute of Legal Medicine, Zürich, Switzerland – Cordula Haas, Helen Burri.
Forensic Centre, Montenegro – Sandra Kovacevic.
Institute of Legal Medicine, Austria – Martin Steinlechner, Daniela Niederwieser.
Institute of Criminalistics, Prague, Czech Republic – Vlastimil Stenzl, P. Coufalova.
National Institute of Criminalistics and Criminology, Brussels, Belgium – Sophie Dognaux, Tom Heylen and Fabrice Noël.
Institute for Forensic Sciences, Budapest, Hungary – Sandor Furedi.
Department of Forensic Medicine, University of Copenhagen, Denmark – Maria C. Stene.
Institute of Forensic Medicine, Oslo – Oskar Hannson.
Institut National de Police Scientifique, France, Laboratoires de Lille, Lyon, Marseille, Paris and Toulouse – Annick Delaire. National Bureau of Investigation, Finland – Matti Karjalainen. Eolaíocht Fhóiréinseach Éireann, Republic of Ireland – Maureen Smyth. School of Criminal Justice, Institut de police scientifique, University of Lausanne, Switzerland – Vincent Castella. Forensic genetics unit, University center of legal medicine Lausanne-Geneva, Switzerland – Christian Gehrig. Forensic Science Unit, Basque Country Police (FSU), Erandio, Vizcaya, Spain – Oscar García. General Commissary of Scientific Police, Spanish Forensic Police, Madrid, Spain – Carmen Solis. Criminalistic Service of the Civil Guard, Madrid, Spain – David Parra. Scientific Police Division, Catalonian Police (CME), Sabadell, Barcelona, Spain – Susana Maulini. National Institute of Toxicology and Forensic Science, Madrid, Spain – Antonio Alonso. Ministry of the Interior, Slovenia – Aleksander Regent. Hellenic Police, Forensic Science Division, Greece – Penelope Miniati. International Commission on Missing Persons, Bosnia & Herzegovina – Thomas Parsons & René Huel. University of Santiago de Compostela, Spain – Manuel Fondevila. Barts and The London School of Medicine and Dentistry, UK – Denise Syndercombe-Court.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.fsigen.2012.03.005.

References

[1] http://register.consilium.europa.eu/pdf/en/05/st10/st10900.en05.pdf, 2005.
[2] P. Gill, L. Fereday, N. Morling, P.M. Schneider, The evolution of DNA databases – recommendations for new European STR loci, Forensic Sci. Int. 156 (2006) 242–244.
[3] P. Gill, L. Fereday, N. Morling, P.M. Schneider, New multiplexes for Europe – amendments and clarification of strategic development, Forensic Sci. Int. 163 (2006) 155–157.
[4] L.A. Dixon, A.E. Dobbins, H.K. Pulker, J.M. Butler, P.M. Vallone, M.D. Coble, W. Parson, B. Berger, P. Grubwieser, H.S. Mogensen, et al., Analysis of artificially degraded DNA using STRs and SNPs – results of a collaborative European (EDNAP) exercise, Forensic Sci. Int. 164 (2006) 33–44.
[5] M.D. Coble, J.M. Butler, Characterization of new miniSTR loci to aid analysis of degraded DNA, J. Forensic Sci. 50 (2005) 43–53.
[6] J.M. Butler, Y. Shen, B.R. McCord, The development of reduced size STR amplicons as tools for analysis of degraded DNA, J. Forensic Sci. 48 (2003) 1054–1064.
[7] P.D. Martin, H. Schmitter, P.M. Schneider, A brief history of the formation of DNA databases in forensic science within Europe, Forensic Sci. Int. 119 (2001) 225–231.
[8] Council of the European Union, Brussels, vol. 15870/09 ENFOPOL 287 CRIMORG 170, 2009, pp. 1–7.
[9] L. Albinsson, L. Norén, R. Hedell, R. Ansell, Swedish population data and concordance for the kits PowerPlex ESX 16 System, PowerPlex ESI 16 System, AmpFlSTR NGM™, AmpFlSTR SGM Plus™ and Investigator ESSplex, Forensic Sci. Int. Genet. 5 (2011) e89–e92.
[10] A. Berti, F. Brisighelli, A. Bosetti, E. Pilli, C. Trapani, V. Tullio, C. Franchi, G. Lago, C. Capelli, Allele frequencies of the new European Standard Set (ESS) loci in the Italian population, Forensic Sci. Int. Genet. 5 (5) (2011) 548–549.
[11] O. Garcia, J. Alonso, J.A. Cano, R. Garcia, G.M. Luque, P. Martin, I. Martinez de Yuso, S. Maulini, D. Parra, I. Yurrebaso, Population genetic data and concordance study for the kits Identifier, NGM, PowerPlex ESX 17 System and Investigator ESSplex in Spain, Forensic Sci. Int. Genet. 6 (2012) e78–e79.
[12] C. Previderè, P. Grignani, S. Presciuttini, Italian population data for the new ENFSI/EDNAP loci D1S1656, D2S441, D10S1248, D12S391, D22S1045. The GeFI collaborative exercise and concordance study, Forensic Sci. Int. Genet. 5 (2011) e238–e239.
[13] V.C. Tucker, A.J. Hopwood, C.J. Sprecher, R.S. McLaren, D.R. Rabbach, M.G. Ensenberger, J.M. Thompson, D.R. Storts, Developmental validation of the PowerPlex ESI 16 and PowerPlex ESI 17 Systems: STR multiplexes for the new European standard, Forensic Sci. Int. Genet. 5 (5) (2011) 436–448.
[14] V.C. Tucker, C. Baumgartner, G.R. Stead, A.J. Hopwood, UK population data generated with PowerPlex ESI 16 system, Forensic Sci. Int. Genet. (2011), http://dx.doi.org/10.1016/j.fsigen.2011.08.005.
[15] S. Dognaux, M.H. Larmuseau, L. Jansen, T. Heylen, N. Vanderheyden, B. Bekaert, F. Noel, R. Decorte, Allele frequencies for the new European Standard Set (ESS) loci and D1S1677 in the Belgian population, Forensic Sci. Int. Genet. 6 (March (2)) (2012) e75–e77 [Epub 2011 June 12].
[16] http://www.agriculture.purdue.edu/fnr/html/faculty/Rhodes/Students%20and%20Staff/glaubitz/software.htm, 2011.
[17] L. Excoffier, H.E.L. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (2010) 564–567.
[18] S.W. Guo, E.A. Thompson, Performing the exact test of Hardy–Weinberg proportion for multiple alleles, Biometrics 48 (1992) 361–372.
[19] B.S. Weir, Genetic Data Analysis II, Sinauer Associates Inc., Sunderland, MA, 1996.
[20] D. Zaykin, L. Zhivotovsky, B.S. Weir, Exact tests for association between alleles at arbitrary numbers of loci, Genetica 96 (1995) 169–178.
[21] J. Buckleton, C.M. Triggs, S.J. Walsh, Forensic DNA Interpretation, CRC Press, FL, 2005.
[22] A. Edwards, H.A. Hammond, L. Jin, C.T. Caskey, R. Chakraborty, Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups, Genomics 12 (1992) 241–253.
[23] B. Rolf, N. Bulander, P. Wiegand, Insertion/deletion polymorphisms close to the repeat region of STR loci can cause discordant genotypes with different STR kits, Forensic Sci. Int. Genet. 5 (2011) 339–341.
[24] B. Budowle, A. Masibay, S.J. Anderson, C. Barna, L. Biega, S. Brenneke, B.L. Brown, J. Cramer, G.A. DeGroot, D. Douglas, et al., STR primer concordance study, Forensic Sci. Int. 124 (2001) 47–54.
[25] B. Budowle, C.J. Sprecher, Concordance study on population database samples using the PowerPlex 16 kit and AmpFlSTR Profiler Plus kit and AmpFlSTR COfiler kit, J. Forensic Sci. 46 (2001) 637–641.
[26] C.R. Hill, M.C. Kline, J.J. Mulero, R.E. Lagace, C.W. Chang, L.K. Hennessy, J.M. Butler, Concordance study between the AmpFlSTR MiniFiler PCR amplification kit and conventional STR typing kits, J. Forensic Sci. 52 (2007) 870–873.
[27] P. Gill, L. Foreman, J.S. Buckleton, C.M. Triggs, H. Allen, A comparison of adjustment methods to test the robustness of an STR DNA database comprised of 24 European populations, Forensic Sci. Int. 131 (2003) 184–196.
[28] H. Abdi, Bonferroni and Šidák corrections for multiple comparisons, in: N.J. Salkind (Ed.), Encyclopedia of Measurement and Statistics, Sage, Thousand Oaks, CA, 2007.
[29] D.J. Balding, M. Greenhalgh, R.A. Nichols, Population genetics of STR loci in Caucasians, Int. J. Legal Med. 108 (1996) 300–305.
[30] D.J. Balding, R.A. Nichols, DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands, Forensic Sci. Int. 64 (1994) 125–140.
[31] R.A. Nichols, D.J. Balding, Effects of population structure on DNA fingerprint analysis in forensic science, Heredity 66 (Pt 2) (1991) 297–302.
[32] NRCII, The Evaluation of Forensic DNA Evidence, National Academy Press, Washington, DC, 1996.
[33] B. Budowle, R. Chakraborty, Population variation at the CODIS core short tandem repeat loci in Europeans, Leg. Med. (Tokyo) 3 (2001) 29–33.
[34] B. Budowle, B. Shea, S. Niezgoda, R. Chakraborty, CODIS STR loci data from 41 sample populations, J. Forensic Sci. 46 (2001) 453–489.