Forensic Science International 124 (2001) 47–54
STR primer concordance study

Bruce Budowle (a,*), Arni Masibay (l), Stacey J. Anderson (m), Charles Barna (h), Lisa Biega (j), Susanne Brenneke (i), Barry L. Brown (a), Jill Cramer (c), Gretchen A. DeGroot (n), Derek Douglas (d), Barry Duceman (j), Allison Eastman (j), Robert Giles (c), Jennifer Hamill (f), Daniel J. Haase (n), Dirk W. Janssen (n), Timothy D. Kupferschmid (f), Terry Lawton (h), Christine Lemire (g), Barbara Llewellyn (d), Tamyra Moretti (b), Jennifer Neves (g), Chris Palaski (k), Sindey Schueler (e), Joanne Sgueglia (g), Cynthia Sprecher (l), Christine Tomsey (k), Don Yet (h)

(a) FBI, Laboratory Division, 935 Pennsylvania Avenue, NW, Washington, DC 20535, USA
(b) FSRU, FBI, Quantico, VA 22135, USA
(c) GeneScreen, 2600 Stemmons Fwy. #133, Dallas, TX 75207, USA
(d) Illinois State Police Research and Development Laboratory, 2040 Hill Meadows, Springfield, IL 62702, USA
(e) Kansas Bureau of Investigation, Topeka Headquarters Laboratory, 1620 SW Tyler, Topeka, KS 66612, USA
(f) Maine State Police Crime Laboratory, 30 Hospital Street, Augusta, ME 04333, USA
(g) Massachusetts State Police Crime Laboratory, 59 Horsepond Road, Sudbury, MA 01776, USA
(h) Michigan State Police, Forensic Laboratory East Lansing, 714 South Harrison, East Lansing, MI 48823, USA
(i) Missouri Highway Patrol, Forensic Laboratory, 1510 East Elm Street, Jefferson City, MO 65101, USA
(j) New York State Police Crime Laboratory, Building 22, 1220 Washington Avenue, Albany, NY 12226, USA
(k) Pennsylvania State Police DNA Laboratory, 80 N. Westmoreland Avenue, Greensburg, PA 15601, USA
(l) Promega Corporation, 2800 Woods Hollow Road, Madison, WI 53711, USA
(m) South Dakota Forensic Laboratory, c/o 500 East Capitol Street, 3500 E. Highway 34, Pierre, SD 57501, USA
(n) Wisconsin State Laboratory-Milwaukee, 1578 South 11th Street, Milwaukee, WI 53204, USA

Accepted 20 July 2001
Abstract
Over 1500 population database samples comprising African Americans, Caucasians, Hispanics, Native Americans, Chamorros and Filipinos were typed using the PowerPlex® 16 and the Profiler Plus™/COfiler™ kits. Excluding the D8S1179 locus in Chamorros and Filipinos from Guam, a typing difference due to allele dropout was observed in only eight instances. At the D8S1179 locus in the population samples from Guam, 13 examples of allele dropout were observed when using the Profiler Plus kit. The data support the conclusion that the primers used in the PowerPlex® 16, Profiler Plus™, and COfiler™ kits are reliable for typing reference samples intended for use in CODIS. In addition, allele frequency databases have been established for the STR loci Penta D and Penta E. Both loci are highly polymorphic.
© 2001 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: AmpFlSTR® Profiler Plus™ kit; AmpFlSTR® COfiler™ kit; PowerPlex® 16 kit; PowerPlex® 2.1 kit; CODIS; Concordance; Primers; STR; Stochastic effects; Allele dropout; Penta D; Penta E; Hardy–Weinberg expectations
* Corresponding author. Tel.: 1-202-324-9512; fax: 1-202-324-1462. E-mail address: [email protected] (B. Budowle).
0379-0738/01/$ – see front matter © 2001 Elsevier Science Ireland Ltd. All rights reserved. PII: S0379-0738(01)00563-1

1. Introduction
United States forensic laboratories that contribute to the National DNA Index System database, utilizing the software known as the Combined DNA Index System (CODIS), are
required to type 13 short tandem repeat (STR) loci from DNA acquired from convicted felons. The STR loci are CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11 [1]. To facilitate analysis of the core CODIS STR loci, commercial kits have been developed by two manufacturers (i.e. Applied Biosystems, Foster City, CA, and Promega Corp., Madison, WI). These kits contain primer sets for amplifying the various STR loci by the polymerase chain reaction (PCR). Primers are designed to anneal to invariant regions flanking the STR locus of interest. Regardless of how well the primers are designed, when a sufficiently large number of individuals are typed, it is likely that a variant will be seen in the "invariant" flanking region of the target sequence where a primer hybridizes. If a mismatch is proximal to the 3′ end of the primer, the mismatch may hinder or prevent primer extension during the PCR. When no product is synthesized, allele dropout results.

For a particular locus, the primer sequences (and thus the primer binding sites) presumably differ between manufacturers (and in some cases between different versions of kits from the same manufacturer). Within and between laboratories that use the same primer sets, there generally is little concern about allele dropout due to primer mismatch when profiles are compared; typically, multiple analyses of DNA samples from the same source would yield the same profile. In some rare casework scenarios where inhibitors or contaminants co-purify with the DNA, however, it may be possible to observe an allele dropout in an evidence sample and not in a reference sample from the same source using the same primer sets, and vice versa. A comparison of typing results from samples that have been analyzed using the different primer sets (or kits) from the different manufacturers reveals the degree of allele dropout that may occur with a particular primer set.
The results of such a comparison can be used as part of the evaluation process for the typing reliability of the primer sets. In the current study, concordance results were compiled on samples typed using both the PowerPlex® 16 kit (Promega Corp., Madison, WI) and the Profiler Plus™/COfiler™ kits (Applied Biosystems, Foster City, CA). The data show that for the major population groups (African American, Caucasian, and Hispanic) allele dropout is rare using either manufacturer's primers.

2. Materials and methods
The number of samples typed for each of the 13 core STR loci with both manufacturers' kits from each contributing laboratory is displayed in Table 1.

2.1. STR amplification and typing
The DNA samples were amplified using the PowerPlex® 16 kit (Promega Corp., Madison, WI) and the Profiler Plus™/COfiler™ kits (Applied Biosystems, Foster City, CA), following the manufacturers' recommendations. The amplified products were analyzed using either an ABI Prism® 310 Genetic Analyzer or an ABI Prism® 377 DNA Sequencer (Applied Biosystems, Foster City, CA) according to the manufacturer's recommended protocol. Allele designations were determined by comparison of the sample fragments with those of the allelic ladders supplied with each kit. Details about the analytical process and raw data can be obtained by contacting the contributing laboratories.

2.2. Statistical analysis
The frequency of each allele for each locus was calculated from the numbers of each genotype in the sample set (i.e. the gene count method). Unbiased estimates of expected heterozygosity were computed as described by Edwards et al. [2]. Possible divergence from Hardy–Weinberg expectations (HWE) was tested by calculating the unbiased estimate of the expected homozygote/heterozygote frequencies [3–6] and by the exact test [7], based on 2000 shuffling experiments. The computer program to perform these tests was developed by R. Chakraborty (School of Public Health, University of Texas, Houston, TX).

3. Results and discussion
This study provides concordance data from the PowerPlex 16 and Profiler Plus/COfiler kits on approximately 1500 samples (African Americans N = 305, Caucasians N = 509, Hispanics N = 264, Native Americans N = 313, Filipinos N = 74, Chamorros N = 68, Other N = 4). Excluding the D8S1179 locus in Chamorros and Filipinos, there were eight examples of allele dropout observed (only three in the major US population groups) (Table 2). However, in the comparison of D8S1179 typing results from Chamorros and Filipinos, 13 individuals typed as heterozygotes using the PowerPlex 16 kit, but only a single allele was observed using the Profiler Plus kit (Table 3). The data suggest that the variant that causes the allele dropout originated in the flanking region of allele 16.

Budowle et al. [8] reported that, in a population study of Chamorros and Filipinos (in which the same samples typed in the current study were also typed), out of nine STR loci only the D8S1179 locus departed significantly from HWE (P = 0.005, Chamorros; P = 0.030, Filipinos). The observed and expected homozygosity were 38.4 and 22.8%, respectively, for Chamorros and 25.8 and 15.8%, respectively, for Filipinos. While a departure from HWE at one locus would have little impact on the validity of estimating the frequency of a multiple STR locus profile, the fact that the same locus departed from HWE in both sample populations suggests that the departure may not be due to chance alone. Indeed, the current concordance study demonstrates
Table 1
Number of individuals typed per locus per contributing laboratory

Laboratory/population           N        D3S1358  vWA      FGA      D8S1179  D21S11   D18S51   D5S818   D13S317  D7S820   CSF1PO   TPOX     TH01     D16S539
Illinois/African American       66       66       64       64       64       64       64       63       63       66       66       66       66       66
Michigan/African American       112      112      103      103      103      103      103      103      103      112      112      112      112      112
Missouri/African American       50       50       50       50       50       50       50       50       50       50       50       50       50       50
New York/African American       28       28       28       28       28       28       28       28       28       28       28       28       28       28
Pennsylvania/African American   3        3        3        3        3        3        3        3        3        3        3        3        3        3
Wisconsin/African American      46       46       46       46       46       46       46       46       46       46       46       46       46       46
Illinois/Caucasian              31       31       31       31       31       31       31       29       31       31       30       30       30       30
Maine/Caucasian                 101      101      101      99       101      101      100      100      101      101      100      101      101      101
Michigan/Caucasian              116      116      116      116      116      116      116      116      116      116      116      116      116      116
Missouri/Caucasian              51       51       51       51       51       51       51       51       51       51       51       51       51       51
New York/Caucasian              27       27       27       27       27       27       27       27       27       27       27       27       27       27
Pennsylvania/Caucasian          87       87       87       87       87       87       87       87       87       87       87       87       87       87
Wisconsin/Caucasian             96       95       95       95       95       95       95       95       95       95       95       95       95       95
Michigan/Hispanic               115      115      115      115      115      115      115      115      115      115      115      115      115      115
New York/Hispanic               30       30       30       30       30       30       30       30       30       30       30       30       30       30
Pennsylvania/Hispanic           107      107      107      107      107      107      107      107      107      107      107      107      107      107
Wisconsin/Hispanic              12       12       12       12       12       12       12       12       12       12       12       12       12       12
Kansas/Apache                   128      128      128      128      128      128      128      128      128      128      128      128      128      128
Kansas/Navajo                   80       80       80       80       80       80       80       80       80       80       80       80       80       80
South Dakota/Sioux              101      101      101      101      101      101      101      101      101      101      101      101      101      101
Wisconsin/Native American       4        4        4        4        4        4        4        4        4        4        4        4        4        4
Massachusetts and FBI/Chamorro  68       67       67       67       68       68       67       67       68       67       68       66       66       66
Massachusetts and FBI/Filipino  74       69       72       74       72       74       70       69       72       72       73       73       71       72
Wisconsin/Asian                 2        2        2        2        2        2        2        2        2        2        2        2        2        2
Wisconsin/Miscellaneous         2        2        2        2        2        2        2        2        2        2        2        2        2        2
Total                           1537     1536     1524     1522     1523     1525     1519     1515     1522     1533     1533     1532     1530     1531
Table 2
The eight discrepant types (and population group) due to allele dropout observed in approximately 1537 individuals typed with both the PowerPlex 16 kit and the Profiler Plus/COfiler kits

Population        Locus     PowerPlex 16  Profiler Plus/COfiler
Caucasian         D16S539   13, 13        12, 13
Caucasian         vWA       17, 18        18, 18
Caucasian         vWA       15, 16        16, 16
Native American   CSF1PO    12, 14        12, 12
Native American   CSF1PO    13, 14        13, 13
Native American   D16S539   11, 11        11, 12
Chamorro          TH01      7, 9          7, 7
Filipino          D21S11    30.2, 30.2    30.2, 32.2
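
The pattern in Table 2 can be made explicit with a short sketch. The Python below is illustrative only and is not the procedure used by the contributing laboratories. The function name `classify_pair` and the working rule (a dropout is called when one kit gives a heterozygote and the other gives a homozygote for one of the same two alleles) are assumptions made for this example. The genotype pairs are copied from Table 2.

```python
from collections import Counter

def classify_pair(type_first, type_second):
    """Compare two genotypes (allele pairs) typed for the same sample and locus.

    Returns 'concordant', 'dropout_with_first', 'dropout_with_second', or 'other'.
    """
    a, b = sorted(type_first), sorted(type_second)
    if a == b:
        return "concordant"
    # Heterozygote with the first kit, homozygote for one of those alleles with the second
    if a[0] != a[1] and b[0] == b[1] and b[0] in a:
        return "dropout_with_second"
    # Heterozygote with the second kit, homozygote for one of those alleles with the first
    if b[0] != b[1] and a[0] == a[1] and a[0] in b:
        return "dropout_with_first"
    return "other"

# (population, locus, PowerPlex 16 type, Profiler Plus/COfiler type), from Table 2
TABLE_2 = [
    ("Caucasian", "D16S539", (13, 13), (12, 13)),
    ("Caucasian", "vWA", (17, 18), (18, 18)),
    ("Caucasian", "vWA", (15, 16), (16, 16)),
    ("Native American", "CSF1PO", (12, 14), (12, 12)),
    ("Native American", "CSF1PO", (13, 14), (13, 13)),
    ("Native American", "D16S539", (11, 11), (11, 12)),
    ("Chamorro", "TH01", (7, 9), (7, 7)),
    ("Filipino", "D21S11", (30.2, 30.2), (30.2, 32.2)),
]

calls = Counter(classify_pair(pp16, other) for _, _, pp16, other in TABLE_2)
per_locus = Counter(locus for _, locus, _, _ in TABLE_2)
print(calls)      # how many dropouts were seen with each manufacturer's primers
print(per_locus)  # how many discrepancies fall on each locus
```

Under this rule, every row of Table 2 is classified as a dropout with one kit or the other, and no locus in the table accounts for more than two discrepancies.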
Table 3
Discrepant types due to allele dropout observed at the D8S1179 locus in Chamorros (N = 68) and Filipinos (N = 72)

Population  PowerPlex 16  Profiler Plus
Chamorros   13, 16        13, 13
Chamorros   13, 16        13, 13
Chamorros   13, 16        13, 13
Chamorros   10, 18        10, 10
Chamorros   13, 16        13, 13
Filipinos   13, 16        13, 13
Filipinos   12, 16        12, 12
Filipinos   15, 16        15, 15
Filipinos   12, 16        12, 12
Filipinos   15, 16        15, 15
Filipinos   13, 15        13, 13
Filipinos   13, 16        13, 13
Filipinos   15, 17        15, 15
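
As a worked check on Table 3, a per-allele dropout frequency can be computed as the number of discordant samples divided by the number of alleles typed (2N). This is a sketch under the assumption that each discordant sample contributes exactly one undetected allele. The names `TABLE_3` and `dropout_frequency` are illustrative and are not taken from the study.

```python
# Counts read from Table 3: discordant D8S1179 samples and individuals typed.
TABLE_3 = {
    "Chamorros": {"dropouts": 5, "n_individuals": 68},
    "Filipinos": {"dropouts": 8, "n_individuals": 72},
}

def dropout_frequency(dropouts, n_individuals):
    """Fraction of the 2N alleles typed that were not detected."""
    return dropouts / (2 * n_individuals)

frequencies = {
    population: round(dropout_frequency(**counts), 3)
    for population, counts in TABLE_3.items()
}
print(frequencies)  # {'Chamorros': 0.037, 'Filipinos': 0.056}
```

These values agree with the observed D8S1179 dropout frequencies of 0.037 and 0.056 quoted in the text for the Profiler Plus primers.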
that the departure from HWE likely is due to allele dropout. The method of Chakraborty et al. [9] predicts allele dropout (i.e. null allele, based on departure from HWE) frequencies of 0.112 and 0.063 in Chamorros and Filipinos, respectively. The observed allele dropout frequencies using the Profiler Plus kit primers are 0.037 and 0.056, respectively. Even though the observed allele dropout frequency is lower than predicted for Chamorros, after correcting for the observed allele dropout, there were no significant departures from HWE at the D8S1179 locus in Chamorros (P = 0.153) and Filipinos (P = 0.094). A study is underway to identify the variant in the primer binding sequence that causes the allele dropout at the D8S1179 locus.

Budowle and Sprecher [10] compared over 500 population database samples comprising African Americans, Bahamians, and Southwestern Hispanics, again using both the PowerPlex 16 and the Profiler Plus/COfiler kits. In that study, only one typing difference due to allele dropout was observed, and it was in African Americans at the FGA locus. An FGA heterozygote profile was observed using the PowerPlex 16 primers, and a single allele FGA profile was observed using Profiler Plus primers. By combining the results of the current study with those reported by Budowle and Sprecher [10], more than 2000 samples have been typed using both manufacturers' kits. In total, there were only 22 examples of allele dropout. These were observed in 7 of the 13 core STR loci: CSF1PO, D8S1179, D16S539, D21S11, FGA, TH01, and vWA. Excluding the D8S1179 locus, there were no more than two observations of allele dropout due to a primer mismatch (out of approximately 2000 typings) in any of the loci. Moreover, of the 22 examples of allele dropout due to a primer mismatch, only four were observed in the major US population group samples (i.e. one in African Americans, three in Caucasians, and zero in Hispanics). The current searching algorithms in CODIS can accommodate the match requirements when allele dropout (which is rare) occurs.

Because editing, interpretation of electropherograms, and recording data involve human intervention, most STR population databases can be expected to contain a few transcriptional errors. A small number of errors will not affect the overall allele frequency distribution and thus are of little consequence for statistical inferences. If the frequency of error were high and there were systematic bias in the errors in one laboratory as compared with another (or with one manufacturer's kit as compared with the other), one might anticipate that samplings of various STR population databases within a major population group would be notably different. However, using data generated by several laboratories and primer sets from both manufacturers, Budowle et al. [12] showed that subpopulations within a major population group are genetically similar.
Regardless, the 13 core CODIS STR population data sets developed in this concordance study most likely contain few, if any, transcriptional errors because the samples were typed twice (using different primer sets) and transcriptional differences were addressed. The allele frequency distributions generated in the current study are similar to those reported previously [12], supporting the reliability of the current and previously reported STR population databases. The genotype data of the current study will be available at www.promega.com/geneticidentity (larger population data sets can be found at www.fbi.gov in the library section).

Allele dropout is not a typing error, because it is a systematic characteristic, producing typing results that generally are reproducible when using the same primers. In this study, differences in typing results between primer sets due to a sequence mismatch resulting in allele dropout were very few. In addition to typing discrepancies due to allele dropout, this concordance study identified potential areas of the analytical process where transcriptional differences or errors may occur. These typing differences may or may not require additional scrutiny by CODIS laboratory participants. Typing discrepancies that were not a result of primer design were due predominantly to: (1) designation of alleles that exceed the size range included in the allelic ladder; (2)
notable peak height imbalance due to increased annealing temperature (a parameter to consider in internal validation studies); and (3) editing electropherograms.

Depending on the analytical protocol, some very large alleles may not be detected. At the FGA locus, for example, some alleles can be much larger than the largest allele in the allelic ladder. In the current study, failure to detect such large alleles occurred but was uncommon. For example, a sample was typed at the FGA locus as 20, 45.2 using the PowerPlex primers and initially as 20, 20 using Profiler Plus primers. The difference was not due to a primer mismatch. In the Profiler Plus analysis, data collection was terminated before the large FGA allele migrated past the laser; a reanalysis with a longer electrophoretic data collection time demonstrated the presence of the larger FGA allele. As with true allele dropout, CODIS search algorithms can accommodate such differences; thus failure to detect a large allele does not pose a problem for searching DNA profiles in the appropriate DNA index. However, slightly longer collection times could obviate failure to detect alleles that substantially exceed the boundaries of the allelic ladder of the largest sized STR loci in a multiplex system.

Extrapolation of the size of alleles that exceed the boundaries of the allelic ladder is an acceptable process. However, when alleles are much larger or smaller than the extreme alleles of an allelic ladder, the value assigned to the same allele may differ when using different manufacturers' kits (or different electrophoretic platforms). Such a difference in allele designation can be predicted by theory and was empirically demonstrated in our study. As an example, an allele for one sample at the FGA locus was designated as
43.2 with the PowerPlex 16 primers and 44 with the Profiler Plus primers. A difference in the large allele designation is not considered a typing error; rather, it is a limitation in assigning values to these alleles by extrapolation. To accommodate this limitation in allele designation, CODIS designates, for example, all FGA alleles that exceed in size the allele 30 as >30, regardless of the manufacturer's kit that is used. In such a situation, after a CODIS search, a reference sample would be typed with the same kit by the requesting laboratory to verify whether or not the DNA profiles are similar.

Since primer concordance is one of the preliminary studies of validation for CODIS compatibility, not all participating laboratories had completed internal validation studies (for routine applications) before collecting data reported in this paper. Minimum peak height thresholds for interpretation were arbitrarily set by each participating laboratory, and verification of effective primer annealing temperatures during the PCR was not carried out by all participants. Some samples that were initially typed as homozygotes were clearly heterozygotes when more closely scrutinized. Fig. 1A is an electropherogram of a DNA profile generated using the PowerPlex 16 kit that represents the quality expected by the manufacturer. Fig. 1B is a PowerPlex 16 electropherogram generated for one sample by a participant in our study. In Fig. 1B, the D5S818 heterozygous profile was initially typed as a homozygote; the other allele was below the arbitrary interpretation threshold used by the laboratory that performed this analysis. More notable is the imbalance in peak heights between loci in Fig. 1B compared with those displayed in Fig. 1A. This observation is indicative of variation in performance of analytical instruments among laboratories. For example, different thermocyclers may perform differently thermodynamically, and the actual temperatures may vary from what is set. The imbalance seen in Fig. 1B is characteristic of a primer annealing temperature that is higher than desired during the PCR. Two approaches can correct the imbalance: (1) recalibrate the thermocycler so that it performs well under the kit manufacturer's recommended conditions, or (2) modify the protocol so that the results obtained approximate the desired performance. Before implementation for casework, it is incumbent upon each laboratory to perform in-house validation studies to determine acceptable working analytical parameters and to demonstrate that reliable results can be obtained when analyzing forensic samples [11].

Fig. 1. (A) A PowerPlex 16 electropherogram of the loci D5S818, D13S317, D7S820, D16S539, CSF1PO, and Penta D. This electropherogram is of good quality and readily interpretable. (B) An electropherogram of the same loci as in (A), but of a database sample analyzed in this study where an allele at the D5S818 locus was initially not recorded. The arrow points to the allele at the D5S818 locus that was not initially called because it was below the interpretation threshold of the contributing laboratory. The raised baseline and the lower peak height of the D5S818 locus (compared with the other loci) are indicators to interpret the pattern with extra care. It would not be a proper interpretation for the D5S818 locus to be considered a homozygote profile. The other five loci in this electropherogram were correctly typed.

Most of the initial discrepancies in the concordance study were due to errors in editing of the electropherograms. Three contributing laboratories had few or no editing errors (0 out of 1239 locus typings, 0 out of 1805 locus typings, and 1 out of 2651 locus typings). One laboratory, however, had as many as 53 errors out of 2704 locus typings. Genotyping errors were due to: (1) not deleting designations on stutter peaks; (2) deleting a designation on a true allele peak; or (3) manual transcription of typing results into a database spreadsheet. While CODIS can accommodate an incorrect allele assignment at a locus or two, supplementary software might reduce transcriptional errors in data entered in CODIS. Such software includes: (1) expert systems to assist interpretation by the analyst, and (2) ranking of profiles that partially match (by the number of matching loci).

One should not rely on the current study to extrapolate either the chance or potential source of human error when performing casework analysis. Casework analysis is performed with greater scrutiny, and a technical review is carried out in every case. Additionally, because of the expected secondary review by one of us (Budowle), the contributors may not have reviewed the data with the same deliberation as would be done by those who routinely enter data into CODIS. Lastly, this study will heighten awareness of scientists so that there will be greater attention paid to areas where transcription errors may potentially occur.

Allele frequency data on the 13 core STR loci are not displayed in this paper, because the data are similar to other reported studies [12]. However, the PowerPlex 16 kit contains primers for two additional STR loci (i.e. Penta D and Penta E). Thus, population data on the two pentanucleotide STRs also were generated. The frequency distributions of observed alleles for the Penta D and Penta E STR loci are shown in Tables 4 and 5. The observed and expected homozygosities, exact test for departures from HWE, PD and PE are also provided. Both loci are highly polymorphic in all populations analyzed. In fact, in all populations studied, the Penta E locus has a higher discriminating power than any of the 13 core STR loci. The few departures from HWE are due to genotypes consisting of rare alleles (i.e. those observed fewer than five times). Rare alleles generally have no consequence for estimating genotype frequencies, because rare allele frequencies are replaced by a minimum allele frequency [13,14].

Table 4
Observed allele distributions for the Penta D locus (allele frequencies in %)

Allele                        African American  Caucasian  Hispanic   Apache     Navajo     Sioux      Chamorro   Filipino
N                             301               506        261        128        80         101        62         61
2.2                           8.140             0.198      1.724      0.391      0.000      0.495      0.000      0.000
3.2                           1.329             0.000      0.000      0.000      0.000      0.000      0.000      0.000
5                             3.821             0.198      1.149      0.000      0.000      0.000      0.000      0.000
6                             1.163             0.000      0.192      0.000      0.000      0.000      0.000      0.000
7                             2.492             0.494      0.575      0.000      0.000      0.990      3.226      0.000
8                             11.296            1.482      2.299      0.391      0.000      0.495      1.613      4.918
9                             18.272            21.245     19.349     25.781     26.875     33.663     42.742     45.082
10                            7.475             13.043     15.900     18.750     26.250     26.733     12.903     18.033
11                            16.279            14.526     14.176     14.844     10.000     9.901      13.710     9.836
12                            13.123            19.862     18.391     19.141     11.250     10.396     7.258      13.115
13                            11.794            19.466     18.774     19.922     23.750     13.366     9.677      8.197
14                            3.488             7.115      6.130      0.781      1.875      2.475      6.452      0.820
15                            0.664             1.581      0.766      0.000      0.000      0.990      0.806      0.000
16                            0.498             0.494      0.575      0.000      0.000      0.495      1.613      0.000
17                            0.166             0.296      0.000      0.000      0.000      0.000      0.000      0.000
Observed homozygosity         12.3%             18.4%      18.0%      18.0%      26.3%      20.8%      21.0%      24.6%
Expected homozygosity         11.8%             16.5%      15.5%      15.5%      21.6%      22.0%      23.2%      26.8%
Homozygosity test (P-value)   0.800             0.264      0.265      0.624      0.308      0.764      0.672      0.725
Exact test (P-value)          0.844             0.880      0.933      0.952      0.614      0.293      0.810      0.086
PD                            0.971             0.951      0.956      0.925      0.916      0.911      0.914      0.880
PE                            0.759             0.666      0.684      0.599      0.567      0.574      0.579      0.522
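
The expected homozygosity row of Table 4 can be checked against the listed allele frequencies. The sketch below assumes the commonly used unbiased estimator of Nei [6], in which the expected heterozygosity for N individuals is 2N/(2N − 1) × (1 − Σp²). Whether the program used in this study applies exactly this correction is an assumption, and the function name is illustrative. The input is the Navajo Penta D column of Table 4.

```python
def expected_homozygosity(freqs_percent, n_individuals):
    """Unbiased expected homozygosity from allele frequencies given in percent."""
    p = [f / 100.0 for f in freqs_percent]
    n_alleles = 2 * n_individuals  # two alleles are sampled per individual
    sum_of_squares = sum(x * x for x in p)
    # Small-sample correction applied to the expected heterozygosity
    expected_het = n_alleles / (n_alleles - 1) * (1.0 - sum_of_squares)
    return 1.0 - expected_het

# Navajo Penta D frequencies (%) for alleles 9 to 14, from Table 4 (N = 80).
# All other alleles have a frequency of 0 in this sample.
navajo_penta_d = [26.875, 26.250, 10.000, 11.250, 23.750, 1.875]
hom = expected_homozygosity(navajo_penta_d, 80)
print(f"{hom:.1%}")  # 21.6%
```

Under this assumption the result agrees with the 21.6% listed for the Navajo sample in Table 4.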
Table 5
Observed allele distributions for the Penta E locus (allele frequencies in %)

Allele                        African American  Caucasian  Hispanic   Apache     Navajo     Sioux      Chamorro   Filipino
N                             301               506        263        128        80         101        45         48
5                             8.638             7.609      6.084      0.000      0.000      2.970      2.222      4.167
7                             9.136             15.415     12.357     0.781      0.000      3.960      7.778      0.000
8                             16.944            1.877      4.183      0.391      0.625      1.485      0.000      0.000
9                             2.492             1.087      1.331      0.000      0.000      0.000      1.111      2.083
10                            5.316             9.486      6.274      0.000      0.625      1.980      4.444      1.042
11                            6.478             11.265     10.076     1.172      3.750      1.980      11.111     9.375
12                            15.615            17.688     19.392     12.109     8.125      20.297     8.889      16.667
12.1                          0.000             0.000      0.190      0.000      0.000      0.000      0.000      0.000
13                            12.458            10.870     8.175      16.016     13.750     12.871     4.444      5.208
14                            4.651             6.621      6.464      15.234     2.500      7.921      6.667      11.458
15                            5.980             4.249      8.175      7.422      14.375     17.327     14.444     11.458
16                            4.817             4.842      4.373      14.063     11.875     13.366     12.222     6.250
17                            2.990             4.447      4.373      15.234     12.500     4.950      4.444      3.125
18                            2.326             2.174      2.471      6.641      5.625      2.970      6.667      8.333
19                            0.997             0.988      1.901      2.734      10.625     2.970      5.556      9.375
20                            0.664             0.791      3.042      5.859      6.875      2.475      4.444      4.167
20.3                          0.000             0.000      0.000      0.000      0.000      0.000      1.111      0.000
21                            0.166             0.198      0.570      1.563      2.500      1.980      4.444      4.167
22                            0.166             0.296      0.380      0.781      4.375      0.495      0.000      3.125
23                            0.000             0.099      0.190      0.000      1.250      0.000      0.000      0.000
24                            0.000             0.000      0.000      0.000      0.625      0.000      0.000      0.000
25                            0.166             0.000      0.000      0.000      0.000      0.000      0.000      0.000
Observed homozygosity         11.0%             11.3%      10.6%      18.8%      6.3%       10.9%      8.9%       14.6%
Expected homozygosity         10.0%             10.5%      9.4%       11.8%      9.4%       11.6%      7.4%       8.3%
Homozygosity test (P-value)   0.586             0.579      0.496      0.014      0.331      0.818      0.711      0.117
Exact test (P-value)          0.009             0.089      0.432      0.007      0.219      0.154      0.398      0.046
PD                            0.976             0.978      0.981      0.966      0.968      0.963      0.968      0.964
PE                            0.796             0.787      0.808      0.755      0.798      0.760      0.829      0.813
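
The P-values in Tables 4 and 5 rest on 2000 shuffling experiments. The general idea of a shuffling test can be sketched as follows. This is a simplified permutation test on the homozygote count only. It is not the exact test of Guo and Thompson [7] and not the program developed by R. Chakraborty. The function name, the one-sided statistic, the add-one correction, and the sample genotypes are all assumptions made for illustration.

```python
import random

def homozygote_excess_p(genotypes, n_shuffles=2000, seed=0):
    """Permutation P-value for an excess of homozygotes.

    genotypes: list of (allele, allele) tuples, one tuple per individual.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(a == b for a, b in genotypes)
    alleles = [allele for pair in genotypes for allele in pair]
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(alleles)
        # Re-pair the shuffled alleles into pseudo-individuals
        shuffled = sum(
            alleles[i] == alleles[i + 1] for i in range(0, len(alleles), 2)
        )
        if shuffled >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical samples, not data from this study
all_homozygous = [(12, 12)] * 10 + [(13, 13)] * 10
all_heterozygous = [(12, 13)] * 20
p_excess = homozygote_excess_p(all_homozygous)
p_none = homozygote_excess_p(all_heterozygous)
print(p_excess, p_none)
```

A sample made up entirely of homozygotes gives a very small P-value, whereas a sample with no homozygotes gives P = 1 for this one-sided statistic.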
In conclusion, the data support the conclusion that allele dropout is rare for primer sets found in the PowerPlex 16, Profiler Plus, and COfiler kits; this is particularly so for major US population groups (i.e. African Americans, Caucasians, and Hispanics), in which a total of only four examples of allele dropout were observed (three in the current study and one by Budowle and Sprecher [10]). Typing samples with different primer sets and comparing the results also supports the validity of STR typing. Knowledge of the primer sequences was not needed to perform the validation study. Reference sample profiles generated by the kits described in this study are reliable for entry into CODIS and should be compatible for comparison/searching purposes. Reliable results can be obtained, as long as proper protocols are used and appropriate validation studies are carried out to define the limitations of the analytical systems (see FBI Quality Assurance Standards [11]). If new primer sets are developed for any of the 13 core STR loci, concordance studies should be performed as an early part of the validation process. Since allele dropout occurs at a very low frequency, informed interpretation and the current searching algorithms in CODIS can accommodate the match requirements when allele dropout (or transcriptional error) may have occurred. A mismatch at only one or two loci of a 13-locus STR profile should be considered for potential inclusion of a suspect for investigative purposes.
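
The idea of tolerating a mismatch at one or two loci, and of ranking partial matches by the number of matching loci, can be sketched as follows. This is not the CODIS search algorithm, whose details are not described here. The function names, the `max_mismatches` parameter, and the four-locus profiles are hypothetical and serve only to illustrate the principle.

```python
def count_matching_loci(profile_a, profile_b):
    """Return (matching loci, loci compared) for the loci present in both profiles."""
    shared = profile_a.keys() & profile_b.keys()
    matches = sum(
        sorted(profile_a[locus]) == sorted(profile_b[locus]) for locus in shared
    )
    return matches, len(shared)

def rank_candidates(query, database, max_mismatches=2):
    """List (sample_id, mismatches) for candidates within the tolerance, best first."""
    hits = []
    for sample_id, profile in database.items():
        matches, compared = count_matching_loci(query, profile)
        mismatches = compared - matches
        if compared and mismatches <= max_mismatches:
            hits.append((sample_id, mismatches))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical four-locus profiles (a real search would use all 13 core loci)
query = {"vWA": (17, 18), "TH01": (7, 9), "FGA": (20, 24), "TPOX": (8, 11)}
database = {
    "ref-1": {"vWA": (18, 18), "TH01": (7, 9), "FGA": (20, 24), "TPOX": (8, 11)},
    "ref-2": {"vWA": (17, 18), "TH01": (7, 9), "FGA": (24, 20), "TPOX": (8, 11)},
    "ref-3": {"vWA": (14, 15), "TH01": (6, 8), "FGA": (21, 22), "TPOX": (9, 9)},
}
ranked = rank_candidates(query, database)
print(ranked)  # [('ref-2', 0), ('ref-1', 1)]
```

In this example `ref-1` differs from the query only by a dropout-like difference at vWA and is retained as a candidate, while `ref-3` is excluded. Any candidate returned in this way would still need confirmation by retyping with the same kit, as the text describes.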
Acknowledgements
This is publication number 01-10 of the Laboratory Division of the Federal Bureau of Investigation. Names of commercial manufacturers are provided for identification only, and inclusion does not imply endorsement by the Federal Bureau of Investigation.
References
[1] B. Budowle, T.R. Moretti, S.J. Niezgoda, B.L. Brown, CODIS and PCR-based short tandem repeat loci: law enforcement tools, in: Proceedings of the Second European Symposium on Human Identification 1998, Promega Corporation, Madison, WI, 1998, pp. 73–88.
[2] A. Edwards, H.A. Hammond, L. Jin, C.T. Caskey, R. Chakraborty, Genetic variation at five trimeric and tetrameric repeat loci in four human population groups, Genomics 12 (1992) 241–253.
[3] R. Chakraborty, P.E. Smouse, J.V. Neel, Population amalgamation and genetic variation: observations on artificially agglomerated tribal populations of Central and South America, Am. J. Hum. Genet. 43 (1988) 709–725.
[4] R. Chakraborty, M. Fornage, R. Gueguen, E. Boerwinkle, Population genetics of hypervariable loci: analysis of PCR based VNTR polymorphism within a population, in: T. Burke, G. Dolf, A.J. Jeffreys, R. Wolff (Eds.), DNA Fingerprinting: Approaches and Applications, Birkhäuser Verlag, Berlin, 1991, pp. 127–143.
[5] M. Nei, A.K. Roychoudhury, Sampling variances of heterozygosity and genetic distance, Genetics 76 (1974) 379–390.
[6] M. Nei, Estimation of average heterozygosity and genetic distance from a small number of individuals, Genetics 89 (1978) 583–590.
[7] S.W. Guo, E.A. Thompson, Performing the exact test of Hardy–Weinberg proportion for multiple alleles, Biometrics 48 (1992) 361–372.
[8] B. Budowle, D.A. Defenbaugh, K.M. Keys, Genetic variation at nine short tandem repeat loci in Chamorros and Filipinos from Guam, Legal Med. 2 (1) (2000) 26–30.
[9] R. Chakraborty, Y. Zhong, L. Jin, B. Budowle, Nondetectability of restriction fragments and independence of DNA fragment sizes within and between loci in RFLP typing of DNA, Am. J. Hum. Genet. 55 (1994) 391–401.
[10] B. Budowle, C. Sprecher, Concordance study on population database samples using the PowerPlex™ 16 Kit and AmpFlSTR® Profiler Plus™ Kit and AmpFlSTR® COfiler™ Kit, J. Forensic Sci. 46 (3) (2001) 637–641.
[11] Quality Assurance Standards for Forensic DNA Testing Laboratories, Forensic Science Communications 2 (3) (July 2000), available at: http://www.fbi.gov/programs/lab/fsc.
[12] B. Budowle, B. Shea, S. Niezgoda, R. Chakraborty, CODIS STR loci data from 41 sample populations, J. Forensic Sci. 46 (3) (2001) 453–489.
[13] B. Budowle, K.L. Monson, R. Chakraborty, Estimating minimum allele frequencies for DNA profile frequency estimates for PCR-based loci, Int. J. Leg. Med. 108 (1996) 173–176.
[14] National Research Council II Report, The Evaluation of Forensic Evidence, National Academy Press, Washington, DC, 1996.