Forensic Science International, 20 (1982) 237 - 245
THE INTERPRETATION OF REFRACTIVE INDEX MEASUREMENTS. III.
I. W. EVETT and J. A. LAMBERT
Home Office Central Research Establishment, Aldermaston, Reading, Berkshire RG7 4PN (Gt. Britain)
(Received February 18, 1982; accepted March 22, 1982)
Summary
The question of whether or not to group recovered glass fragments before comparison with a control sample is fundamental to the interpretation of refractive index measurements. A computer program has been written to perform the grouping and explore the consequences. Grouping the fragments is shown to increase the chances of finding all recovered fragments similar when they have in fact come from the same source as the control, and to give enhanced discrimination when the recovered glass has come from a different source. When the recovered fragments have come from two different sources the consequences of the grouping procedure are preferable to non-grouping.
Introduction
In two previous papers in this journal [1, 2] one of the authors has described an approach to the forensic comparison of glass refractive index (RI) measurements. This may be briefly summarised as follows.
Grouping
Decide if the spread of the measurements on the recovered fragments is small enough for them to be treated as having come from one source. If so, group them.
Similarity
Compare the mean of the measurements on the recovered fragments with the mean of the measurements on the control and decide whether they are close enough to be classed as ‘similar’. If not, decide ‘different’.
Significance
In the event of the result being ‘similar’, the significance is assessed by estimating the probability of a coincidence. That is, if the fragments had been taken at random from some glass source, itself taken at random from the population, what is the probability that they would be found to be similar to the control? The smaller the value of this probability, the greater is the evidential significance of the result.
0379-0738/82/0000-0000/$02.75 © Elsevier Sequoia/Printed in The Netherlands
Fig. 1. Comparison of grouping and non-grouping approaches in a hypothetical case. (a) Conventional non-grouping approach: acceptance interval of 3 S.D. about the control mean. (b) Grouping approach: acceptance interval from t-test at 99% level. RI axis from 1.5159 to 1.5161.
Lindley [3], Seheult [4] and Grove [5, 6] have each described different approaches to the problem. Whereas the various approaches differ in fundamental theoretical aspects, they all take it as almost a sine qua non that, provided the recovered fragments are sufficiently close in RI, they should be treated as having come from one source. To the statistician, the advantages of doing this are self-evident, but the operational forensic scientist is bound to have doubts about its admissibility.
The alternative to grouping is to compare the recovered measurements individually with an acceptance interval centred on the control mean. It is quite widespread practice to use an interval of ± 3 S.D., where S.D. denotes the estimate of standard deviation calculated from repeated measurements on the control sample. Throughout this discussion it is assumed that 10 measurements are made on the control.
Figure 1 shows a simple example which contrasts the comparison stage of a grouping approach with the non-grouping ‘3 S.D.’ approach. The control mean is 1.5160 and there are five recovered fragments with RIs varying between 1.51585 and 1.51592. The S.D. is taken to be 0.00004. The non-grouping comparison is shown in Fig. 1(a): two of the fragments would be classed as similar. If the measurements are grouped, however, the position is quite different. In Fig. 1(b) note that the mean of the five recovered fragments falls well outside the acceptance interval. Note also that the acceptance interval (here taken from a 99% t-test; see below) decreases as the number of grouped fragments increases, so the discrimination increases. This paper describes a detailed investigation of the relative advantages of grouping and non-grouping.
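The non-grouping comparison of the hypothetical case can be illustrated with a minimal Python sketch. The control mean and S.D. are those quoted in the text; the five individual RI values are assumed here purely for illustration, since only their range (1.51585 to 1.51592) is given.

```python
# Minimal sketch of the non-grouping '3 S.D.' comparison in the hypothetical case.
CONTROL_MEAN = 1.5160   # quoted in the text
CONTROL_SD = 0.00004    # quoted in the text

# ASSUMED values: the text states only that the five RIs lie
# between 1.51585 and 1.51592.
recovered = [1.51585, 1.51586, 1.51587, 1.51590, 1.51592]


def within_3sd(ri, mean=CONTROL_MEAN, sd=CONTROL_SD):
    """Return True if a single RI lies inside mean +/- 3 S.D."""
    return abs(ri - mean) <= 3 * sd


# Each recovered fragment is compared individually with the interval.
similar = [ri for ri in recovered if within_3sd(ri)]
group_mean = sum(recovered) / len(recovered)

print(f"acceptance interval: {CONTROL_MEAN - 3 * CONTROL_SD:.5f} "
      f"to {CONTROL_MEAN + 3 * CONTROL_SD:.5f}")
print(f"fragments classed as similar: {len(similar)} of {len(recovered)}")
print(f"mean of recovered fragments: {group_mean:.5f}")
```

With these assumed values, two of the five fragments fall inside the interval, in line with the count quoted for Fig. 1(a), while the mean of the five lies at the very edge of the 3 S.D. interval.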
A computer program *GP has been written for investigating a collection of recovered measurements, deciding how they should be grouped and comparing them with measurements from between one and three controls. In addition, a routine has been written for generating simulated casework control and recovered results and comparing the outcomes of ‘grouping’ and ‘non-grouping’.
The computer program
The program is run on a PRIME computer and operates in two stages. Information on up to three control samples can be read in. The user has the option of fixing the mean and standard deviations for the control samples or of generating the results about a particular mean. A pseudo-random number generator is used to generate results from a normal distribution. It is possible to simulate up to three different control samples in this way. A set of recovered results is also generated from a normal distribution. The recovered results are compared one at a time with the control results. The comparison interval is ± 3 × S.D. of the control results, reflecting widespread practice. Grouping of the recovered results then takes place. The method of grouping is described in the following section. The grouped results are compared with the control by means of a standard t-test at the 99% significance level (see the Appendix).
The decision to use a t-test rather than the comparison criterion calculated by Evett [2] was based on two considerations. First, the Evett comparison criteria were based solely on data from window glass and so rely on rather restrictive assumptions. Second, when a computer routine is used to carry out the comparison, the extra computation required for the t-test is not a problem. In the previous papers a significance level of 95% was used: thus, on average, the conclusion ‘different’ would be drawn in 1 in 20 cases in which the recovered glass had in fact come from the same source as the control. Discussions with forensic scientists suggest that most would consider this error rate to be unacceptably high and so the significance level of 99% was used for the present study. At this level the t-test for one recovered fragment (assuming 10 control fragments) corresponds to an acceptance interval of ± 3.3 S.D., which is close to the usual practice. However, as the number of grouped recovered fragments increases, the effective acceptance interval becomes progressively smaller, and so the discriminating power of the test increases.
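The narrowing of the effective acceptance interval can be seen by rearranging the t statistic of the Appendix. As a sketch only, writing $\bar{x}$ and $\bar{y}$ for the control and recovered means, $n$ and $m$ for the numbers of control and recovered measurements, and $s_p$ for the pooled S.D. (this notation is assumed here for illustration), the condition for ‘similar’ becomes:

```latex
|\bar{x} - \bar{y}| \;\le\; t_{1\%,\,n+m-2}\; s_p \sqrt{\frac{1}{n} + \frac{1}{m}},
\qquad
s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}
```

With $n = 10$, the square-root factor falls from about 1.05 at $m = 1$ to about 0.45 at $m = 10$, and the tabulated 1% point of the t distribution falls from 3.25 (9 degrees of freedom) to 2.88 (18 degrees of freedom), so both terms shrink the interval as more fragments are grouped.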
Grouping
Given a set of m recovered results we want some way of deciding whether these results could reasonably have come from one source. For glass with a known S.D., we can use tables to get a range of RI in which we would expect any specified number of results to fall. Unfortunately, as we do not know in advance whether the recovered glass has come from one or more sources, we cannot even estimate the S.D. The procedure adopted in Evett [2] and employed here is to use a range based on the distribution of S.D.s of control glass samples. The grouping range for m fragments has been calculated such that in 95% of cases, m fragments from one source will have a range of measurements less than the grouping range. Although the ranges were calculated from data on window glass alone, examination of casework data from 6000 cases indicates that the only effect of applying the ranges to non-window glass would be to lower slightly the above figure of 95%. The grouping range for different numbers of glass fragments is shown in Table 1.
TABLE 1
Grouping intervals for different sizes of group
No. in group    Grouping interval (x 10,000)
2               1.13
3               1.46
4               1.68
5               1.86
6               1.98
7               2.09
8               2.21
9               2.29
10              2.36
11              2.44
12              2.50
13              2.56
14              2.61
15              2.66
16              2.70
17              2.74
18              2.78
19              2.82
20              2.85
The remaining problem is how to group the results. Several methods were tried and the following was selected as the most appropriate for this work. Start with m recovered results and sort them into ascending order. Let m(M) be the median result, m(H) the highest and m(L) the lowest result. First find the median result m(M). Next find the nearest result to m(M) and combine it with m(M) to form a group of two. Since the results are sorted, this will be either m(M - 1) or m(M + 1). If the range of the two results is within the required limits, include the next nearest result to form a group of three. Repeat the process until either all the results are grouped, or the range of results exceeds the corresponding grouping interval. In the latter case remove the most recently added result. This may leave the lower or higher results ungrouped. If so, find the mid point of first the lower set and then the higher set of results and repeat the comparisons until all the results are accounted for.
Results
The various experiments to compare the relative advantages of grouping and non-grouping are described in the following sections.
TABLE 2
Grouping versus non-grouping for control and recovered glass having the same origin
No. of recovered    Probability of at least one similar    Probability all similar
fragments           Grouping    Non-grouping               Grouping    Non-grouping
2                   0.991       0.999                      0.946       0.964
4                   0.990       1.000                      0.974       0.933
6                   0.991       1.000                      0.986       0.910
8                   0.990       1.000                      0.989       0.883
10                  0.990       1.000                      0.989       0.856
Both control and recovered fragments generated normally with S.D. = 0.00004.
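The grouping procedure set out in the Grouping section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the Table 1 intervals are taken to apply to group sizes 2 to 20, the upper median is used when the number of results is even, and the ‘mid point’ of a leftover set is interpreted as its median result.

```python
import math

# Grouping intervals from Table 1, converted to RI units (tabulated x 10,000).
# ASSUMPTION: the 19 tabulated values correspond to group sizes 2..20.
_TABLE_1 = [1.13, 1.46, 1.68, 1.86, 1.98, 2.09, 2.21, 2.29, 2.36, 2.44,
            2.50, 2.56, 2.61, 2.66, 2.70, 2.74, 2.78, 2.82, 2.85]
GROUPING_INTERVAL = {size: value * 1e-4
                     for size, value in zip(range(2, 21), _TABLE_1)}


def grow_group(vals):
    """Grow one group outward from the median of the sorted list vals.

    Returns the inclusive index bounds (lo, hi) of the group.
    """
    mid = len(vals) // 2  # ASSUMPTION: upper median for even counts
    lo = hi = mid
    while hi - lo + 1 < len(vals):
        # Distance from the median to the nearest ungrouped neighbour on each side.
        left_gap = vals[mid] - vals[lo - 1] if lo > 0 else math.inf
        right_gap = vals[hi + 1] - vals[mid] if hi < len(vals) - 1 else math.inf
        new_lo, new_hi = (lo - 1, hi) if left_gap <= right_gap else (lo, hi + 1)
        size = new_hi - new_lo + 1
        if vals[new_hi] - vals[new_lo] > GROUPING_INTERVAL[size]:
            break  # range too wide: the most recently added result is left out
        lo, hi = new_lo, new_hi
    return lo, hi


def group_results(values):
    """Split recovered RI values into groups, returned in ascending order."""
    if len(values) > 20:
        raise ValueError("Table 1 only covers groups of up to 20 results")

    def split(segment):
        if not segment:
            return []
        lo, hi = grow_group(segment)
        # ASSUMPTION: leftover lower and higher sets are regrouped the same way.
        return split(segment[:lo]) + [segment[lo:hi + 1]] + split(segment[hi + 1:])

    return split(sorted(values))


# Hypothetical values: two well-separated clusters give two groups.
print(group_results([1.5163, 1.5158, 1.51632, 1.51582]))
```

Growing the group from the median outward means that a single outlying result cannot drag the group centre away from the bulk of the measurements before the range check is applied.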
Control and recovered glass from the same source
The first stage was to simulate the case where the control and recovered glass have the same origin. Ten control results and two recovered results were generated about the same mean with a standard deviation of 0.00004, and compared first without grouping and then using grouping. This was repeated 10,000 times and then for 4, 6, 8, and 10 recovered results. The results are summarised in Table 2. The results for grouping reflect the significance level of the t-test. In 1% of the cases the recovered glass will be reported as different when it has in fact come from the same source as the control. A disadvantage of the non-grouping approach immediately emerges: even when all the fragments actually come from the same source as the control there is an appreciable chance of finding one or more of them different. This chance increases with the number of fragments, giving the forensic scientist the unnecessary difficulty of explaining the presence of glass from two or more sources when in fact all the glass has the same origin.
It must be emphasized that the ‘probabilities’ quoted in Table 2 and elsewhere are based on 10,000 results. A tabulated probability of 1.000 means that a particular result was obtained at least 9,995 times out of 10,000 (since the results are rounded to three decimal places), and does not imply certainty.
Recovered glass from a different source
The second stage was to simulate the case when the recovered glass originated from a different source to the control.
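The non-grouping half of these simulations can be sketched in Python. This is a minimal illustration, not the authors' PRIME program: the function name and parameters are hypothetical, Python's standard pseudo-random generator stands in for the original one, and `offset` is the assumed difference between the recovered and control source means (zero for the same-source case).

```python
import random
import statistics


def simulate_non_grouping(m, offset=0.0, n_control=10, sd=0.00004,
                          control_mean=1.5160, trials=10_000, seed=1):
    """Estimate P(at least one similar) and P(all similar) without grouping.

    Each recovered result is compared with control mean +/- 3 S.D., where
    both the mean and the S.D. are estimated from the simulated control.
    """
    rng = random.Random(seed)
    any_similar = 0
    all_similar = 0
    for _ in range(trials):
        control = [rng.gauss(control_mean, sd) for _ in range(n_control)]
        centre = statistics.mean(control)
        half_width = 3 * statistics.stdev(control)
        recovered = [rng.gauss(control_mean + offset, sd) for _ in range(m)]
        hits = sum(abs(x - centre) <= half_width for x in recovered)
        any_similar += hits > 0
        all_similar += hits == m
    return any_similar / trials, all_similar / trials


if __name__ == "__main__":
    for m in (2, 10):
        p_any, p_all = simulate_non_grouping(m)
        print(f"m={m}: P(at least one similar)={p_any:.3f}, P(all similar)={p_all:.3f}")
```

With `offset=0.0` the estimates would be expected to fall near the non-grouping columns of Table 2, subject to sampling noise and to the differences between random number generators; increasing `offset` in steps of 0.00001 gives a sketch of the different-source experiment.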
Fig. 2. The probability of finding at least one recovered fragment to be similar to the control, when the recovered and control fragments have come from different sources. 10 control and 2 recovered fragments. S.D. = 4 × 10⁻⁵. Horizontal axis: (recovered mean - control mean) × 10⁵.
Ten control results were generated at a mean of 1.5160. Two recovered results were generated initially at the same mean, and then in steps of 0.00001 up to 1.5162. This was repeated 10,000 times for each step. The standard deviation was set at 0.00004. The results are shown in graphical form in Fig. 2. The process was repeated, but for ten recovered results instead of two. The outcome of this is shown in Fig. 3. An examination of Figs. 2 and 3 shows that grouping is consistently more discriminating than non-grouping when the control and recovered glass samples have different means. The difference in performance between grouping and non-grouping becomes more pronounced as the number of recovered fragments increases; e.g. Fig. 3 shows that for 10 results, even when the means are 0.0001 apart, there is still a very good chance of finding at least one result similar by not grouping and hence establishing an apparent connection.
Recovered glass from more than one source
There is always a possibility of erroneously grouping glass fragments which have come from two or more sources. If grouping is to be generally recommended as the first stage in the interpretation of RI measurements, the consequences of incorrectly grouping should be no more serious than if the fragments were not grouped. There is a limitless range of possible recovered fragments to be considered, but the investigation is restricted to the case where the recovered fragments have come from two sources, neither of which is the same as the control.
Fig. 3. The probability of finding at least one recovered fragment to be similar to the control, when the recovered and control fragments have come from different sources. 10 control and 10 recovered fragments. S.D. = 4 × 10⁻⁵. Horizontal axis: (recovered mean - control mean) × 10⁵.
The computer simulation was set up as follows. Ten control fragments were generated about a fixed mean of 1.5160. Two RI values were generated in the range 1.51585 to 1.51615, by randomly sampling from a uniform distribution. These two values were taken as the two means of the recovered results and a group of recovered results was generated about each mean. Comparisons were made and the process was repeated 10,000 times, and then for totals of 2, 4, 6, 8 and 10 recovered fragments, split equally between the two sources. The results are displayed in Table 3.
TABLE 3
Grouping versus non-grouping for recovered glass having come from more than one source
No. of recovered    Average no. of similar fragments
fragments           Grouping    Non-grouping
1+1                 1.38        1.44
2+2                 2.33        2.90
3+3                 3.16        4.35
4+4                 4.05        5.78
5+5                 5.01        7.28
Both control and recovered fragments generated normally with S.D. = 0.00004. Control mean fixed at 1.5160. Recovered means in the range 1.51585 to 1.51615.
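The non-grouping side of the two-source experiment can be sketched in the same spirit. The function name and its parameters are hypothetical, and the comparison rule (control mean ± 3 S.D., both estimated from the simulated control) is assumed to be the one used for the other non-grouping experiments.

```python
import random
import statistics


def average_similar_two_sources(per_source, trials=10_000, seed=1,
                                n_control=10, sd=0.00004, control_mean=1.5160,
                                low=1.51585, high=1.51615):
    """Average number of recovered fragments classed as similar (no grouping)
    when the fragments come from two sources with uniformly drawn means."""
    rng = random.Random(seed)
    total_similar = 0
    for _ in range(trials):
        control = [rng.gauss(control_mean, sd) for _ in range(n_control)]
        centre = statistics.mean(control)
        half_width = 3 * statistics.stdev(control)
        for _source in range(2):
            # Each source mean is drawn uniformly from the stated range.
            source_mean = rng.uniform(low, high)
            fragments = [rng.gauss(source_mean, sd) for _ in range(per_source)]
            total_similar += sum(abs(x - centre) <= half_width for x in fragments)
    return total_similar / trials


if __name__ == "__main__":
    for k in (1, 3, 5):
        print(f"{k}+{k} fragments: {average_similar_two_sources(k, trials=2000):.2f}")
```

Because each fragment is judged on its own, the average count in this sketch grows roughly in proportion to the number of fragments, which is the behaviour the non-grouping column of Table 3 displays.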
Table 3 shows the average number of matching fragments found for grouping and non-grouping. Note that non-grouping results in more matching fragments being found, and that the difference between the two approaches becomes more pronounced as the number of recovered fragments increases. The figures for both grouping and non-grouping are appreciable, but fortunately the probability of getting measurements from different sources close enough to group, and to be similar to a particular control sample, is small. As an example, the probability of glass from two different sources falling in the range 1.51585 to 1.51615 can be estimated from the casework data as 0.0836², approximately 0.7%.
More than one control sample
Particular problems occur in the non-grouping approach when there are two or more control samples which are very close in RI. The performance of grouping in these circumstances is assessed in the following computer simulation. Ten results were generated about 1.5160 to represent control 1. Ten results were generated to represent control 2, first at 1.5160, then in steps of 0.00005 to 1.5164. Ten recovered results were generated midway between the two control samples. The process was repeated 10,000 times for each step; the results are shown in Table 4. Table 4 shows that there is a much higher probability of finding recovered fragments similar to both controls when no grouping takes place. In these circumstances grouping is shown to be far more discriminating than non-grouping.
TABLE 4
Grouping versus non-grouping for two control samples
Mean of      Mean of      Probability of a connection with both controls
control 2    recovered    Grouping    Non-grouping
1.5160       1.5160       0.982       1.000
1.51605      1.516025     0.806       1.000
1.5161       1.51605      0.214       1.000
1.51615      1.516075     0.004       0.996
1.5162       1.5161       0.000       0.966
1.51625      1.516125     0.000       0.849
1.5163       1.51615      0.000       0.579
1.51635      1.516175     0.000       0.256
1.5164       1.5162       0.000       0.078
Both control and recovered fragments generated normally with S.D. = 0.00004. Mean of control 1 fixed at 1.5160.
Conclusion
Computer simulations have been carried out to investigate the relative performance of treating recovered results individually, or of grouping them before making a comparison with a control sample. Some typical casework problems have been modelled and the results indicate throughout that grouping gives enhanced discrimination. This work has used computer simulation to examine the first stage in the interpretation of RI measurements, that is the comparison between control and recovered measurements. The next stage is to investigate the significance of the comparison and this will be the subject of a future report.
References
1 I. W. Evett, The interpretation of refractive index measurements, Forensic Sci., 9 (1977) 209 - 217.
2 I. W. Evett, The interpretation of refractive index measurements. II, Forensic Sci. Int., 12 (1978) 37 - 47.
3 D. V. Lindley, A problem in forensic science, Biometrika, 64 (1977) 207 - 213.
4 A. Seheult, On a problem in forensic science, Biometrika, 65 (1978) 646 - 648.
5 D. M. Grove, The interpretation of forensic evidence using a likelihood ratio, Biometrika, 67 (1980) 243 - 246.
6 D. M. Grove, The statistical interpretation of refractive index measurements, Forensic Sci. Int., 18 (1981) 189 - 194.
7 R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics (3rd edn.), MacMillan, 1970, p. 303.
8 D. V. Lindley and J. C. P. Miller, Cambridge Elementary Statistical Tables, Cambridge, 1971, p. 6.
Appendix: The t-test
Let the mean and standard deviation calculated from n control measurements be $\bar{x}$ and $s_x$, respectively. Let the mean and standard deviation calculated from m measurements on recovered fragments be $\bar{y}$ and $s_y$, respectively. Then, if the control and recovered fragments have come from the same source, the statistic
$$T = \frac{[nm/(n+m)]^{1/2}\,(\bar{x} - \bar{y})}{\{[(n-1)s_x^2 + (m-1)s_y^2]/(n+m-2)\}^{1/2}}$$
has a t distribution with n + m - 2 degrees of freedom [7]. The test for similarity at the 99% significance level that follows from this is:
If $|T| \le t_{1\%,\,n+m-2}$ conclude ‘similar’. Otherwise conclude ‘different’,
where $t_{1\%,\,n+m-2}$ is the 1% point of the t distribution with n + m - 2 degrees of freedom [8].
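The test can be written out as a short Python sketch. The function names and the sample measurements are hypothetical; the critical values are the standard tabulated two-sided 1% points of the t distribution, listed only for 9 to 18 degrees of freedom (10 control results and 1 to 10 grouped recovered results).

```python
import math
import statistics

# Two-sided 1% points of Student's t from standard tables, keyed by degrees
# of freedom. Only the range needed for n = 10 and m = 1..10 is listed.
T_CRIT_1PCT = {9: 3.250, 10: 3.169, 11: 3.106, 12: 3.055, 13: 3.012,
               14: 2.977, 15: 2.947, 16: 2.921, 17: 2.898, 18: 2.878}


def t_statistic(control, recovered):
    """Pooled two-sample t statistic as defined in the Appendix."""
    n, m = len(control), len(recovered)
    var_x = statistics.variance(control)
    # A single recovered result contributes no variance term (m - 1 = 0).
    var_y = statistics.variance(recovered) if m > 1 else 0.0
    pooled_var = ((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2)
    diff = statistics.mean(control) - statistics.mean(recovered)
    return math.sqrt(n * m / (n + m)) * diff / math.sqrt(pooled_var)


def compare(control, recovered):
    """Return 'similar' or 'different' at the 99% level used in the paper."""
    df = len(control) + len(recovered) - 2
    t = t_statistic(control, recovered)
    return "similar" if abs(t) <= T_CRIT_1PCT[df] else "different"


# HYPOTHETICAL measurements, chosen only to exercise the functions.
control = [1.51596, 1.51604, 1.51600, 1.51598, 1.51602,
           1.51595, 1.51605, 1.51600, 1.51597, 1.51603]
grouped = [1.51585, 1.51586, 1.51587, 1.51590, 1.51592]

print(compare(control, grouped))      # five grouped recovered results
print(compare(control, [1.51598]))    # one recovered result
```

Hard-coding the tabulated critical values keeps the sketch free of third-party dependencies; a statistics library could supply them for other sample sizes.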