10% or 5% match window in DNA profiling


Forensic Science International 78 (1996) 111-118

Wing K. Fung

Department of Statistics, University of Hong Kong, Hong Kong

Received 16 December 1994; revised 6 October 1995; accepted 8 November 1995

Abstract

Though the FBI repeatedly claims that the width of the window of the matching criterion they use in DNA profiling is 5%, it is shown that the width is indeed 10%. This fact has been noticed by some people but seems unknown to many. However, no one seems to be aware of the fundamental problem it creates for the fixed bin method employed by the FBI and some other laboratories, because about half of the bins have sizes less than 10%. In other words, the probability of a random match between the DNA fragments from the crime scene and those from the suspect could be underestimated. The problem may have serious implications for commonly adopted legal and forensic practices. The potential seriousness of underestimating the match probability is illustrated using the Hong Kong Chinese database.

Keywords: DNA profiling; Match-binning; Fixed bins; Match window size; Bin size

1. Introduction

DNA profiling, since its introduction in 1985 [1], has become the most controversial area in forensic science. The method is very powerful and highly discriminating for forensic human identification, and it may be regarded as one of the most important discoveries in the forensic field since the introduction of fingerprinting. The method is, of course, not perfect and has been seriously criticized [2-4]. Nevertheless, it is widely employed in courts in some countries, though with a warning to use the tool carefully [5].

The match-binning approach, because of its simplicity, is widely used for evaluating the probability of a random match between the DNA fragments from the crime scene and those from the suspect [6]. The alternative likelihood ratio approach, though it has better justification and is more powerful statistically [6,7], is not so popular. The match-binning approach uses a matching criterion and determines the probability of a random match using the relative frequencies of the associated (fixed or floating) bins. Details of the fixed bin method are given by Budowle et al. [8].

The common matching criterion is that the fragment lengths are within some fixed percentage (for example, the FBI uses ±2.5%) of each other. There are a few ways of expressing or interpreting this criterion mathematically, but these expressions do not seem to have been compared. I shall show that they are equivalent. Despite the repeated claim by the FBI that the width of the match window is (approximately) 5% of the fragment length [8], it has been illustrated in a letter to Science by Sullivan [9] that the width is indeed (approximately) 10%. However, apparently not many people have noticed this finding. Furthermore, no one seems to have realized its serious implications for the fixed bin method used by the FBI (and some other laboratories and countries), in which many of the bin sizes are smaller than 10%. In other words, the FBI fixed bin method is scientifically unsound, and may not be conservative as the FBI often claims. This may have serious consequences for commonly used legal and forensic practices. I shall illustrate, using the Hong Kong Chinese DNA database of 208 cases, the possible effect on the evaluation of the random match probability.

0379-0738/96/$15.00 © 1995 Elsevier Science Ireland Ltd. All rights reserved. SSDI 0379-0738(95)01876-K

2. Sizes of match windows and bins

Let x and y be the fragment lengths in base pairs (bp) from the crime scene and from the suspect, respectively. The FBI first determines the match visually, and any visual match must be confirmed or rejected through application of the following matching criterion [10]. A range of base pair values is constructed independently for each of x and y as

$$[(1 - 0.025)x,\ (1 + 0.025)x] \quad \text{and} \quad [(1 - 0.025)y,\ (1 + 0.025)y]. \tag{1}$$

Two fragments are declared a match if there is an overlap between the two ranges [8,10]. The FBI claimed that the above matching criterion has a width of 5% only [8,11]. DNA profiling is so powerful that two fragments are (almost) always declared a match if they belong to the same source [8]. Sometimes the fragments are declared as matching if the following criterion is satisfied [12]:

$$(1 - 0.025) \le \text{both } \frac{x}{(x+y)/2} \text{ and } \frac{y}{(x+y)/2} \le (1 + 0.025). \tag{2}$$

Another popular way of determining a match is that the absolute difference between x and y satisfies the condition

$$|x - y| \le 0.05\,\frac{x + y}{2}. \tag{3}$$

A discussion of the width of this condition can be found in [9,13,14]. Suppose that the fragment length x from the crime scene is 1590 bp. Using the FBI criterion (Eq. (1)), a match is declared if y lies between 1513 bp and 1671 bp. The window width is 158 bp, which is (approximately) 10% of the fragment length x, not 5% as claimed by the FBI. This fact has been illustrated in a similar way in a letter to Science by Sullivan [9], but few people seem to have noticed it. Actually, the ranges in Eq. (1) can be simplified. The possible range of the suspect's fragment y for a match is

$$0.975x/1.025 \le y \le 1.025x/0.975. \tag{4}$$
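The step from the overlap criterion to the single range for y can be spelled out as follows. This is a short derivation added for clarity, assuming only that x and y are positive; two closed intervals overlap exactly when each lower end does not exceed the other interval's upper end.

```latex
\begin{align*}
[0.975x,\ 1.025x] \cap [0.975y,\ 1.025y] \neq \emptyset
  &\iff 0.975x \le 1.025y \ \text{ and } \ 0.975y \le 1.025x \\
  &\iff \frac{0.975}{1.025}\,x \le y \le \frac{1.025}{0.975}\,x , \\[4pt]
\text{width of the range}
  &= \left(\frac{1.025}{0.975} - \frac{0.975}{1.025}\right)x
   = \frac{1.025^2 - 0.975^2}{0.975 \times 1.025}\,x
   = \frac{0.1}{0.999375}\,x \approx 0.10006\,x .
\end{align*}
```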

Although Eqs. (2) and (3) look different from Eq. (1), the three equations are indeed equivalent and can all be simplified to Eq. (4). The size of the possible range of y for matching is 0.10006x, which is equivalent to a 10% window.

The fixed bin method is widely used by the FBI (and by government laboratories in Singapore and Hong Kong) for assessing the probability of a random match of DNA profiles. The bins are arbitrarily defined and their boundaries are determined by a set of size-standard markers with exact base pair lengths. The sizes of the bins need to be greater than the size of the match window [8]; otherwise the random match probability could be underestimated, which biases the result against the accused. The FBI bins and their sizes are given in the first three columns of Table 1. Though the FBI repeatedly claimed that their bins were wider than the match window, it is found that, surprisingly, about half of the bins are smaller than 10% (the match window width), and the smallest bin size is only 5.8%. Thus, the FBI's evaluation of the random match probability is definitely unsound scientifically. However, no one seems to have noticed this fundamental problem. A similar problem also arises in the Hong Kong and Singapore binning methods [15-17].

One may wonder whether the rebinning method used by the FBI would solve the problem. It has also been suggested that the bin with the larger frequency be employed when the fragments from the crime scene and from the suspect lie in different bins. I use the same fragment length of 1590 bp, with possible matching fragments lying between 1513 bp and 1671 bp, for illustration. These fragments lie in bins 9 and 10 of the FBI bins (Table 4 of [8]), which have allele counts of 32 and 35 for the D14S13 locus of the Black population data. Since the counts are greater than 5, the frequencies of bins 9 and 10 would remain unchanged (Table 6 of [8]) even if the FBI applied the rebinning method. However, the sizes of these bins are only 8.2% and 8.8%, which are less than the 10% match window size, and so the rebinning method does not help here.
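The arithmetic in this section can be checked with a minimal Python sketch. The function names are illustrative and not taken from the paper or from any FBI software. The bin size is assumed to be the bin width divided by the bin midpoint; the paper does not state its definition, but this choice reproduces the 8.2% and 8.8% quoted for bins 9 and 10.

```python
TOL = 0.025  # the +/-2.5% tolerance quoted for the FBI criterion


def is_match(x, y, tol=TOL):
    """Eq. (1): the +/-tol ranges around x and y overlap."""
    return (1 - tol) * x <= (1 + tol) * y and (1 - tol) * y <= (1 + tol) * x


def match_range(x, tol=TOL):
    """Eq. (4): lower and upper limits of y that match a given x."""
    return (1 - tol) * x / (1 + tol), (1 + tol) * x / (1 - tol)


def bin_size_percent(lower, upper):
    """Assumed definition: bin width as a percentage of the bin midpoint."""
    return 100 * (upper - lower) / ((upper + lower) / 2)


# Window for the 1590 bp example, relative to the fragment length
low, high = match_range(1590)
relative_width = (high - low) / 1590

# Whole-bp fragment lengths that match x = 1590 bp under Eq. (1)
matching = [y for y in range(1400, 1800) if is_match(1590, y)]
print(matching[0], matching[-1], round(relative_width, 5))

# FBI bins 9 (1508-1637 bp) and 10 (1638-1788 bp) against the 10% window
for lower, upper in [(1508, 1637), (1638, 1788)]:
    print(lower, upper, round(bin_size_percent(lower, upper), 1))
```

Both bins come out narrower than the match window, which is the point of the example: a fragment's window can extend beyond the bin whose frequency is used to represent it.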

3. The possible consequence

Next, let us study the possible underestimation of the random match probabilities by the FBI fixed bin method due to the bin size problem. Since the FBI and Hong Kong use the same HaeIII enzyme for DNA profiling, the Hong Kong Chinese DNA database of 208 unrelated persons is employed for illustration. The VNTR loci D1S7, D2S44, D4S139 and D10S28 are used in Hong Kong. The allele counts are given in the last four columns of Table 1 according to the FBI bins. Notice that alleles at loci D1S7 and D10S28 have larger variability and spread over more bins than those at the other two loci.

Table 1
Ranges and sizes of FBI bins, with allele counts at loci D1S7, D2S44, D4S139 and D10S28 from 208 unrelated individuals

| Bin | Range (bp) | Size (%) | D1S7 | D2S44 | D4S139 | D10S28 |
|---|---|---|---|---|---|---|
| 1 | 0-639 | | 0 | 0 | 0 | 0 |
| 2 | 640-772 | 18.7 | 1 | 0 | 0 | 6 |
| 3 | 773-871 | 11.9 | 0 | 8 | 0 | 4 |
| 4 | 872-963 | 9.9 | 1 | 11 | 0 | 29 |
| 5 | 964-1077 | 11.1 | 4 | 5 | 0 | 32 |
| 6 | 1078-1196 | 10.4 | 1 | 9 | 0 | 23 |
| 7 | 1197-1352 | 12.2 | 1 | 27 | 0 | 35 |
| 8 | 1353-1507 | 10.8 | 2 | 42 | 0 | 27 |
| 9 | 1508-1637 | 8.2 | 6 | 70 | 0 | 23 |
| 10 | 1638-1788 | 8.8 | 3 | 63 | 0 | 22 |
| 11 | 1789-1924 | 7.3 | 8 | 32 | 0 | 17 |
| 12 | 1925-2088 | 8.1 | 6 | 38 | 0 | 17 |
| 13 | 2089-2351 | 11.8 | 15 | 39 | 0 | 16 |
| 14 | 2352-2522 | 7.0 | 11 | 27 | 0 | 20 |
| 15 | 2523-2692 | 6.5 | 13 | 13 | 1 | 16 |
| 16 | 2693-2862 | 6.1 | 17 | 9 | 1 | 30 |
| 17 | 2863-3033 | 5.8 | 15 | 7 | 7 | 17 |
| 18 | 3034-3329 | 9.3 | 21 | 8 | 12 | 33 |
| 19 | 3330-3674 | 9.8 | 26 | 3 | 11 | 12 |
| 20 | 3675-3979 | 7.9 | 29 | 3 | 19 | 17 |
| 21 | 3980-4323 | 8.3 | 28 | 0 | 14 | 12 |
| 22 | 4324-4821 | 10.9 | 28 | 2 | 46 | 5 |
| 23 | 4822-5219 | 7.9 | 18 | 0 | 35 | 0 |
| 24 | 5220-5685 | 8.5 | 25 | 0 | 59 | 1 |
| 25 | 5686-6368 | 11.3 | 26 | 0 | 62 | 0 |
| 26 | 6369-7241 | 12.8 | 32 | 0 | 71 | 2 |
| 27 | 7242-8452 | 15.4 | 25 | 0 | 42 | 0 |
| 28 | 8453-10093 | 17.7 | 18 | 0 | 19 | 0 |
| 29 | 10094-11368 | 11.9 | 8 | 0 | 10 | 0 |
| 30 | 11369-12829 | 12.1 | 10 | 0 | 2 | 0 |
| 31 | 12830- | | 18 | 0 | 5 | 0 |
| Total | | | 416 | 416 | 416 | 416 |

I consider all possible between-person comparisons of fragments at each locus using the FBI matching criterion (Eq. (1)) and count the number of matches out of these 21 528 (208 × 207/2) comparisons. The average random match probability of a single-locus match can then be estimated under this direct counting method. The probability and number of matches at each locus are given in Table 2. For each of these matches, the random match probability can also be evaluated using the formula $2f_1f_2$, where $f_1$ and $f_2$ are the frequencies of the bins that the found alleles overlap. If the alleles fall in different bins, the larger frequency is employed. The evaluated probabilities of random matches at each locus according to the FBI bins are then averaged, and these averages are given in the fourth column of Table 2. I also evaluate these average probabilities according to the rebinned FBI bins (minimum bin count of 5). They are given in the fifth column of Table 2 and are very similar to those evaluated without rebinning.

Table 2
Number of between-person matches and average random match probabilities by direct count and by various binning methods

| Locus | Number of matches | Direct count | FBI bins | Rebinned FBI bins | Adjusted bins | Rebinned adjusted bins |
|---|---|---|---|---|---|---|
| D1S7 | 132 | 0.0061 | 0.0063 | 0.0064 | 0.0103 | 0.0103 |
| D2S44 | 475 | 0.0221 | 0.0325 | 0.0325 | 0.0559 | 0.0559 |
| D4S139 | 391 | 0.0182 | 0.0348 | 0.0348 | 0.0392 | 0.0393 |
| D10S28 | 174 | 0.0081 | 0.0081 | 0.0083 | 0.0130 | 0.0134 |
| Combined (a) | | 2.0 | 5.8 | 6.0 | 29.3 | 30.3 |
| Odds (b) | | 50.3m | 17.3m | 16.6m | 3.4m | 3.3m |

(a) × 10^-8. (b) 50.3m: one to 50.3 million.

The average probabilities at loci D2S44 and D4S139 according to the FBI bins are found to be higher (by about 50% or more) than those obtained by direct counting, which suggests that the FBI binning method has little problem at these loci. The average probabilities at the other two loci, however, are very close to those obtained by direct counting. This suggests that some of the individual probabilities of random matches are overestimated and some are underestimated, so that the averages are nearly 'unbiased' (but not conservative). Thus, the FBI binning method is not (always) conservative as the FBI often claims, at least at loci D1S7 and D10S28.

Since the ranges of the bins are arbitrarily determined (by markers), the bins are adjusted so that their sizes are all greater than the 10% match window size. As most of the original FBI bins 4 to 24 have sizes smaller than 10%, they are reset to have 11% sizes (1.1 × the size of the 10% match window). These 27 (originally 31) adjusted bins are listed in Table 3. The average random match probabilities are evaluated according to the adjusted bins and the rebinned adjusted bins for comparison. They are very close to each other and are given in the last two columns of Table 2. These probabilities are at least 60% higher (at loci D1S7 and D10S28) than those evaluated by direct counting.

Table 3
The upper limits (in bp) of the adjusted bins with sizes at least 11%

639, 772, 871, 974, 1088, 1216, 1358, 1518, 1695, 1894,
2115, 2363, 2639, 2947, 3291, 3675, 4104, 4583, 5118,
5715, 6381, 7241, 8452, 10093, 11368, 12829

The average random match probabilities and the average odds for the four combined loci are evaluated using the multiplication rule. They are given in the last two rows of Table 2. The smallest odds are naturally found with the direct counting method, which gives one to 50.3 million. The odds according to the rebinned FBI bins and the larger rebinned adjusted bins are one to 16.6 million and one to 3.3 million, respectively. One may argue that the random match probability estimate could still be conservative for a four-locus match using the FBI bins. This, of course, is not generally true. It may also be argued that since these probabilities are as small as one in a million, a difference of one or two orders of magnitude would not affect the general conclusion. However, these arguments do not help solve the problem of too small a bin size and would definitely be challenged in courtrooms. Moreover, if there is only a two-locus (instead of four-locus) match, the random match probability would be much higher.


For example, there is a case in the database with an estimated frequency of 1.7 in 1000 under the FBI binning but a probability of 1.3 in 100 under the adjusted binning, which is 7.8 times higher. The evidence from this two-locus match is much less convincing according to the adjusted binning.
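The binned single-locus estimate and the multiplication rule used in this section can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the per-locus averages are copied from Table 2, and the two-band pattern in the single-locus example is invented for the purpose (one band touching D2S44 bins 9 and 10, with counts 70 and 63 in Table 1, and one band in bin 12, with count 38).

```python
from math import prod

# Average per-locus random match probabilities from Table 2
# (order: D1S7, D2S44, D4S139, D10S28).
TABLE_2 = {
    "direct count": [0.0061, 0.0221, 0.0182, 0.0081],
    "FBI bins": [0.0063, 0.0325, 0.0348, 0.0081],
    "rebinned FBI bins": [0.0064, 0.0325, 0.0348, 0.0083],
    "adjusted bins": [0.0103, 0.0559, 0.0392, 0.0130],
    "rebinned adjusted bins": [0.0103, 0.0559, 0.0393, 0.0134],
}

TOTAL_ALLELES = 416  # 208 persons, two alleles each


def bin_frequency(counts, total=TOTAL_ALLELES):
    """Frequency for one band; if it touches several bins, use the largest count."""
    return max(counts) / total


def single_locus_probability(f1, f2):
    """Binned estimate 2 * f1 * f2 for a two-band pattern at one locus."""
    return 2 * f1 * f2


def combined_probability(per_locus):
    """Multiplication rule: product of the per-locus probabilities."""
    return prod(per_locus)


def odds_in_millions(probability):
    """Express a probability p as 'one to N million' and return N."""
    return 1 / probability / 1e6


# Hypothetical two-band pattern at D2S44 (see the lead-in above)
example = single_locus_probability(bin_frequency([70, 63]), bin_frequency([38]))
print(f"single-locus example: {example:.4f}")

for method, probs in TABLE_2.items():
    p = combined_probability(probs)
    print(f"{method}: {p:.2e}, one to {odds_in_millions(p):.1f} million")
```

Multiplying the rounded per-locus averages in this way gives back the combined values and odds in the last two rows of Table 2 to the precision shown there.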

4. Concluding remarks

The FBI fixed bin method is a very popular method for assessing random match probabilities in DNA profiling. A few matching criteria (equations) are commonly used, and I have shown that they are indeed equivalent. Although the FBI often claims that the match window size they employ is 5%, and so is smaller than the bin sizes and thus conservative [10,18], it is shown that the window size is actually 10%. In other words, the FBI binning method is scientifically unsound because the bins are too small. It has also been shown that the random match probabilities could be underestimated at some loci and that the consequences could be serious in some circumstances. The commonly used practice may face serious challenges in courtrooms; good statistical and forensic practice is therefore definitely needed in this area.

Acknowledgements

This work was partly supported by CRCG grant A/C 337/017/0003 and a statistics departmental fund of the University of Hong Kong. The author thanks D.M. Wong, P. Tsui, S.T. Chow, W.K. Li, and the Assistant to the Editor-in-Chief for assistance. He is grateful for the referee's valuable comments, which largely improved the presentation of the paper.

References

[1] A.J. Jeffreys, V. Wilson and S.L. Thein, Individual-specific 'fingerprints' of human DNA. Nature, 316 (1985) 76-79.
[2] D.J. Balding and P. Donnelly, How convincing is DNA evidence? Nature, 368 (1994) 285-286.
[3] J.E. Cohen, DNA fingerprinting for forensic identification: potential effects on data interpretation of subpopulation heterogeneity and band number variability. Am. J. Hum. Genet., 46 (1990) 358-368.
[4] E.S. Lander, DNA fingerprinting on trial. Nature, 339 (1989) 501-505.
[5] National Research Council, DNA Technology in Forensic Science, National Academy Press, Washington, DC, 1992.
[6] D.A. Berry, I.W. Evett and R. Pinchin, Statistical inference in crime investigations using deoxyribonucleic acid profiling (with discussion). Appl. Stat., 41 (1992) 499-531.
[7] B. Devlin, N. Risch and K. Roeder, Forensic inference using DNA fingerprints. J. Am. Stat. Assoc., 87 (1992) 337-350.
[8] B. Budowle, A.M. Giusti, J.S. Waye, F.S. Baechtel, R.M. Fourney, D.E. Adams, L.A. Presley, H.A. Deadman and K.L. Monson, Fixed-bin analysis for statistical evaluation of continuous distributions of allelic data from VNTR loci, for use in forensic comparisons. Am. J. Hum. Genet., 48 (1991) 841-855.
[9] P.J. Sullivan, DNA fingerprint matches. Science, 256 (1992) 1743-1744.
[10] Federal Bureau of Investigation, Laboratory Division, Procedures for the Detection of Restriction Fragment Length Polymorphisms in Human DNA. FBI Academy, Virginia.
[11] K.L. Monson and B. Budowle, A comparison of the fixed bin method with the floating bin and direct count methods: effect of VNTR profile frequency estimation and reference population. J. Forensic Sci., 38 (1993) 1037-1050.
[12] A.W. Sudbury, J. Marinopoulos and P. Gunn, Assessing the evidential value of DNA profiles matching without using the assumption of independent loci. J. Forensic Sci. Soc., 33 (1993) 73-82.
[13] N. Risch and B. Devlin, On the probability of matching DNA fingerprints. Science, 255 (1992) 717-720.
[14] N. Risch and B. Devlin, Response to P.J. Sullivan. Science, 256 (1992) 1744-1745.
[15] P. Tsui and D.M. Wong, Allele frequencies of four VNTR loci in Chinese population in Hong Kong. Technical Report, The Government Laboratory of Hong Kong, 1994.
[16] S.T. Chow, W.F. Tan, K.H. Yap and T.L. Ng, The development of DNA profiling database in an HaeIII-based RFLP system for Chinese, Malays, and Indians in Singapore. J. Forensic Sci., 38 (1993) 874-884.
[17] S.T. Chow, K.H. Yap and W.F. Tan, Determination of match criteria for DNA profiling based on casework data. Technical Report, Institute of Science and Forensic Medicine, Singapore, 1994.
[18] R. Chakraborty, L. Jin, M.R. Srinivasan and B. Budowle, On allele frequency computation from DNA typing data. Int. J. Leg. Med., 106 (1993) 103-106.