Does suggestive information cause a confirmation bias in bullet comparisons?

Forensic Science International 198 (2010) 138–142

Contents lists available at ScienceDirect

Forensic Science International journal homepage: www.elsevier.com/locate/forsciint

Jose Kerstholt a,*, Aletta Eikelboom a, Tjisse Dijkman b, Reinoud Stoel b, Rob Hermsen b, Bert van Leuven b

a TNO Human Factors, PO Box 23, 3769 ZG Soesterberg, The Netherlands
b Netherlands Forensic Institute, PO Box 24044, 2490 AA Den Haag, The Netherlands

ARTICLE INFO

Article history: Received 26 March 2009; received in revised form 19 January 2010; accepted 9 February 2010; available online 6 March 2010

Keywords: Forensics; Observer effects; Bullet comparisons; Expert opinion

ABSTRACT

Several researchers have argued that the confirmation bias, the tendency to selectively gather and process information such that it fits existing beliefs, is a main threat to objective forensic examinations. The goal of the present study was to empirically investigate whether examiners making bullet comparisons are indeed vulnerable to this bias. In the first experiment, six qualified examiners evaluated 6 sets of bullets that were presented to them twice. In the neutral task condition it was mentioned in the case description that there were two perpetrators and two crime scenes, whereas in the potentially biasing task condition it was mentioned that there was only one perpetrator and one crime scene. The results showed no effect of biased information on the decision outcome. An exploratory analysis revealed rather large individual differences in two cases. In a second study we compared the conclusions of first and second examiners of actual cases that were conducted in the period between 1997 and 2006. As the second examiner mostly has no context information, the conclusion of the first examiner would be expected to be more extreme if he or she had fallen prey to a confirmation bias. The results indicate an effect in the opposite direction: the first examiner gave less extreme ratings than the second one. In all, our results indicate that examiners were not affected by biased information in the case description.

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Over the last decades several criminal cases were reopened after it became clear that a suspect was wrongly convicted. Even though the judges were convinced of the suspect’s guilt at the time, their assessment turned out to be incorrect. Tunnel vision has been mentioned as one of the most important causes of wrongful convictions. When persons fall prey to tunnel vision, their information processing is affected too much by their prior beliefs. A thorough investigation of a Dutch criminal case, for example, showed that investigators solely worked towards the conviction of the suspect after his confession. Alternative scenarios were neglected [1].

Tunnel vision, also referred to as a confirmation bias, is the tendency to selectively gather and process information such that it fits existing beliefs, expectations or hypotheses [2,3]. A confirmation bias can affect information processing at various levels. At the level of information selection, for example, it is widely shown that people tend to select information that is congruent with their own opinion (e.g. [4]), whereas in integrating information people put more weight on information that concurs with their own view [5].

* Corresponding author. Tel.: +31 3463 56442. E-mail address: [email protected] (J. Kerstholt). 0379-0738/$ – see front matter © 2010 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.forsciint.2010.02.007

Prior beliefs can therefore have a major impact on subsequent information processing, potentially resulting in a conclusion that is in line with prior beliefs rather than with the factual data to be processed.

Risinger et al. [2] mentioned four possible sources that may bias the information processing of forensic examiners. First, an investigator could provide more information than necessary when the case is submitted for examination, for example by mentioning other incriminating evidence or by expressing hopes or expectations. A second source is that forensic examiners working on the same case communicate with each other, preventing an independent assessment. The third source is that an examiner receives information at a later point in time that does not concur with his or her own assessment. This may trigger a re-thinking process and, possibly, a re-assessment. The final source is that the police or prosecutor is unhappy with the result and requires a reassessment. In this case the preferred outcome is insinuated, more or less subtly, potentially biasing the assessment of the examiner.

To date, different sources of biased information processing have been investigated. One line of research investigated to what extent fingerprint specialists are affected by an (opposite) expert opinion before doing their own assessment [6–8]. The general finding of these studies is that fingerprint specialists made more cautious decisions when they were given an opposite expert opinion, that is, they more often concluded that two fingerprints came from a different source (exclusion) or that a match could not be assessed on the basis of the available data. Another class of manipulations added potentially biasing information to the case description. Kerstholt et al. [9], for example, manipulated the amount of incriminating evidence given to shoe print examiners, and Hall and Player [10] manipulated the emotional content of a case. In both studies no effects were found on the final conclusion of the examiners. The type of manipulation may therefore have different effects on examiners’ information processing.

One of the reasons that Kerstholt et al. [9] mentioned for the absence of a manifest confirmation bias is that shoe print examiners use a fairly proceduralised method. Examiners assess which traces are relevant, how big they are and which form they have, and then determine to what extent these traces correspond in shoe and shoe print. Integrating the traces and assessing the degree of similarity is subsequently completely standardized. In bullet comparisons, there is far less standardisation. In both areas a distinction is made between class characteristics (dependent on the type and make of the firearm and the ammunition that is used) and individual characteristics (striations and irregularities that are characteristic of the firearm that is used). The exact assessment of the specific marks on bullets is, however, less standardized than in the area of shoe print examinations.

The central research question of the present study was whether prior knowledge affects the assessment of similarity between two bullets. Two studies were conducted. In the first study we did an experiment in which the transmittal letter accompanying the presentation of two bullets either contained biased information or not. In the second study we analyzed to what extent first and second examiners differed in their assessment of complex, ambiguous cases that were carried out in the period between 1997 and 2006. In most cases, the first examiner has far more background information than the second examiner, and for that reason a potential difference could point to a confirmation bias.

The cases that were used in the experiment can be qualified as relatively difficult. As FN pistols were used to make the test sets, matching striations were relatively hard to assess. In contrast with simple cases, where the data speak for themselves, difficult cases require more interpretation, which will increase the potential influence of biased information. Furthermore, in actual casework the results of an examiner are checked by a second person (quality regulations and standard working procedure in this type of work). This was not allowed in this test.

2. Effect of prior knowledge on bullet comparison

2.1.3. Procedure

The examiners were given one month to examine each experimental case. Care was taken that the second matched case under the other condition was presented at least six months after the first case was submitted. Total time for data collection was one year. The examiner received two bullets for comparison, conforming to the standard operating procedure. Examiners were aware of the fact that the case was part of an experiment, but did not know the explicit research question. Examiners were asked to describe their (sub)conclusions in each step of the procedure in detail.

The procedure was as follows. First, class characteristics of the bullets were compared. If clear differences are observed in these characteristics, further examination is useless as it can immediately be concluded that the bullets were fired from different firearms. In our experiment all bullets were fired with the same type of firearm, and class characteristics were therefore similar. Next, individual characteristics (striations) are compared. For this comparison a microscope is used that enables the comparison of two bullets in one common picture. To interpret a correspondence in striations correctly, the examiner needs to know the value of this trace. Dependent on the trace the characteristic value can be assessed. The conclusion, as currently in use, is given on an ordered classification; the scale value used in the current analysis is presented in brackets for each category.

Additional information concerning a case may negatively affect an objective examination [2]. The description of the case may contain information, such as incriminating evidence, that may bias the examiner in a particular direction. This would compromise an independent assessment. In the present experiment we gave examiners two different types of background information. In one condition it was suggested that there was only one perpetrator and one crime scene (the biasing condition). In the other condition it was suggested that there were two perpetrators and two crime scenes (the neutral condition). In all cases examiners were required to indicate whether two bullets were fired from the same firearm. It was expected that examiners would be more certain that the bullets were fired from the same firearm when the background information suggested that there was only one perpetrator and one crime scene.

2.1. Materials and methods

2.1.1. Participants

Six qualified forensic firearms examiners from the Netherlands Forensic Institute participated in the experiment: 4 male and 2 female. One person had just started with his training and one had almost finished the training. The other four were highly experienced and had worked on average 19 years in this area (varying from 11 to 38 years).

2.1.2. Stimuli

An expert from the Netherlands Forensic Institute made six sets of bullets. Three sets contained bullets that were fired with the same firearm and three sets contained bullets from different firearms. All bullets were from the same make and type. They were fired with the same type of pistol (FN, 7.65 Browning), ensuring that class characteristics and caliber of all sets were the same. This way, the assessment could only be based on individual characteristics of the firearm that was used.

On submitting the bullets for comparison a case description of the crime was given. This description either contained biased information or not. All participants received all cases twice: once with and once without biased information. In order to prevent memory effects, these cases were presented with a relatively long period between them. Even though these two cases were superficially different (with regard to familiarity), the type of crime was exactly the same, which made both cases comparable. The resulting experimental design could be seen as a 2 × 2 within-subjects design, corresponding to the two types of bullet sets (same firearm versus different firearms) under the two conditions. Each cell of the design had 18 observations (6 examiners times 3 sets of bullets). In the neutral condition it was suggested that two perpetrators and two crime scenes were involved and in the biased condition that there was only one perpetrator and one crime scene (see Table 1 for an example). In all cases the examiner was asked to assess whether the bullets were fired from the same firearm. The order in which the sets of bullets in each experimental condition were presented to the examiners was counterbalanced.
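The cell sizes of the design described above can be checked with a short enumeration. This is only a sketch: the set labels (`S1`, `D1`, ...) and the odd/even counterbalancing rule are hypothetical placeholders, since the text does not specify how the presentation order was balanced.

```python
from itertools import product

EXAMINERS = range(1, 7)   # six examiners (Section 2.1.1)
BULLET_SETS = {           # hypothetical labels; three sets per ground truth
    "same firearm": ["S1", "S2", "S3"],
    "different firearms": ["D1", "D2", "D3"],
}
CONDITIONS = ["neutral", "biasing"]


def design_cells():
    """Count the observations in each cell of the 2 x 2 within-subjects design."""
    cells = {}
    for truth, condition in product(BULLET_SETS, CONDITIONS):
        trials = [(e, s, condition) for e in EXAMINERS for s in BULLET_SETS[truth]]
        cells[(truth, condition)] = len(trials)
    return cells


def counterbalanced_order(examiner):
    """Assumed rule for illustration: odd-numbered examiners see the neutral version first."""
    return CONDITIONS if examiner % 2 else CONDITIONS[::-1]


cells = design_cells()
for cell, n in cells.items():
    print(cell, n)
print("total observations:", sum(cells.values()))
```

Each of the four cells holds 6 examiners times 3 sets, which is the figure of 18 observations quoted in the text.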

Table 1 Case descriptions used in the experiment.

Case description condition 1
At a shooting in Venlo on Monday morning an as yet unknown man was killed. According to witnesses there was only one perpetrator, who is still on the run. The street in which the shooting took place was almost immediately closed off for research by the NFI. The reason for the shooting is unclear. The car with which the perpetrator escaped was tracked down at the Maas on the border of the districts Venlo and Tegelen. During autopsy two bullets were removed from the victim’s body.

Case description condition 2
A shooting this weekend at café Piet Grijs in the centre of Zwolle resulted in the death of its owner. The incident took place around noon. Several visitors witnessed that two men, approximately 20 years of age, entered the café and immediately shot at the victim who was, at that time, standing behind the bar. During the flight of the perpetrators, another shot fell a few blocks further down. This bullet was found in the shop-window of a clothing store. The bullet in the victim’s body has been removed. The perpetrators are still on the run.


Conclusion scale (scale value used in the analysis in brackets):

The bullets are ... fired from the same firearm
* Beyond any reasonable doubt (4)
* Most likely (3)
* Likely (2)
* Possibly (1)
Evidence that the bullets are fired from the same firearm is inconclusive (0)
The bullets are ... fired from the same firearm
* Likely not (−2)
* Beyond any reasonable doubt not (−4)
* Certainly not (−5)

2.2. Results

2.2.1. Assessment of the bullets

Table 2 shows the level of agreement between the conclusions that examiners gave in the neutral and the biasing condition. When the bullets came from the same barrel, there was 1 trial in which there was a change from an exclusion in the neutral condition to an individualisation in the biasing condition. When the bullets came from different barrels, there were no switches from an exclusion to an individualisation. For the statistical analyses we used the probability assessments, as there could also be a shift in the examiners’ certainty with regard to a conclusion. An overall analysis showed no differences between the biasing and neutral conditions (Wilcoxon signed-rank test, z = .91, p = .18; see footnote 1). If the sets containing bullets from the same firearm are analysed separately from the sets containing bullets from a different firearm, the results remain essentially the same, with neither a significant difference between the conditions for the same firearm (z = 1.35, p = .09), nor for different firearms (z = .00, p = 1.00). The medians in the biasing and neutral conditions are both equal to 2 for the same firearm and both zero for different firearms. Thus, whether the bullets came from the same or a different firearm, we did not find an effect of our biasing case description.

Table 2 Correspondence between conclusions in the neutral and the bias condition (rows: neutral condition; columns: bias condition).

Bullets from same barrel
Neutral condition | Individualisation | Exclusion | Inconclusive
Individualisation | 12 | 0 | 2
Exclusion | 1 | 0 | 0
Inconclusive | 1 | 0 | 2

Bullets from different barrels
Neutral condition | Individualisation | Exclusion | Inconclusive
Individualisation | 0 | 0 | 0
Exclusion | 0 | 3 | 3
Inconclusive | 0 | 3 | 9

2.2.2. Individual differences

For two cases individual differences were rather large (see Table 3 for the median values and ranges for each case). As the patterns were the same for both conditions, we only present the data for the neutral condition. For case 2 the conclusions varied from −2 (likely not) to 2 (likely) and for case 3 from zero (inconclusive) to 4 (beyond any reasonable doubt). None of the examiners concluded that there was a probability that the bullets were fired from the same firearm when they were in fact not. All examiners either concluded that it could not be assessed or that the bullets probably were fired from a different firearm.

Table 4 shows the assessment of the marks and the conclusions that each examiner gave in both cases. There were three different marks: striations in the grooves, striations in the higher parts, and structural similarities when there are no matching striations between the bullets. The examiners could conclude that there are no matching patterns, or that the matching patterns are small, partial or complete. As can be seen in Table 4 there were large differences between examiners as to the identification of the marks. For case 2, for example, two examiners thought that there was a partly matching pattern of striations in the grooves, one thought that the matching pattern was only small and three did not see any matches. These results show that examiners largely differed in their assessment of matching patterns. In addition, there were differences with regard to the conclusion, given the assessment of matching patterns. For example, whereas examiner 1 and examiner 3 gave a score of ‘2’ when there was a partial match in the grooves in case 2, examiner 3 gave a score of ‘4’ in case 3.

1 The scale from which the probability assessments could be chosen cannot be considered an interval scale, and therefore a non-parametric test needs to be used.

2.3. Discussion

Our results indicate no effect of the information given in the case description on the conclusion of the examiners. A case description suggesting that the two bullets were fired from the same firearm did not result in higher probability assessments than the suggestion that more perpetrators and several crime scenes were involved. Note that the bullet sets used are among the most difficult ones to be encountered in practice, with traces that were relatively difficult to assess, due to ambiguity or unclear correspondences. This would facilitate the potential influence of biased information. The results of this experiment concur with previous findings in the domain of shoe print examinations [9] and fingerprints [10], but do not correspond to the general idea that biased information in the transmittal letter would result in a confirmation bias [2].

Even though there was no effect of case description, there were large individual differences for two cases. For one case, the conclusions varied from ‘beyond any reasonable doubt’ to ‘inconclusive’ and for the other case from ‘likely’ to ‘likely not’. As our exploratory analysis revealed, examiners differed with regard to the traces that were identified (some saw high agreement between bullets where others saw none), but also with regard to the conclusion that was drawn given the assessment of the matching pattern. One possible factor is the amount of time that examiners used for their assessment, which may also have interacted with the level of experience. As experienced examiners have a higher case load than their less experienced colleagues, they may have spent less time on these experimental cases. On the other hand, if it is true that in the most complex type of bullet comparisons examiners look at (slightly) different traces, it may be that examiners are complementary to each other. If this is the case, such differences between examiners could be exploited and made explicit, likely resulting in a better conclusion. Further research that focuses on the exact features examiners use in drawing a conclusion might shed light on this.

Although it may be possible that the absence of a difference between the biasing and neutral condition was indeed caused by an absence of confirmation bias, some alternative explanations could also account for this result. First, even though we presented six examiners with six different cases, sample size was quite small, leading to unstable parameter estimates and an increased probability of a Type II error, i.e. the probability of not rejecting the null hypothesis that the groups are equal, when in fact they differ. In this experiment the power was around 60% based on a medium effect size. In other words, the effects need to be quite large in this experiment to result in significant differences. This drawback would be more relevant if there had been a trend in the predicted direction. Second, the results might have been influenced by the fact that examiners knew that they were participating in an experiment. Note, however, that examiners were not aware of the research question. Most participants thought that the variability between examiners was tested and were not aware of the manipulation in the case description. As such our manipulation was rather subtle and not at odds with reality. Participants were therefore not aware of the possibility that their information processing could be biased, excluding the possibility that they corrected for it.

Table 3 Median assessments and ranges for each case in the non-suggestive task condition.

Case | Median value | Range
Case 1 (same firearm) | 4 | None
Case 2 (same firearm) | 0.5 | [−2; 2]
Case 3 (same firearm) | 2 | [0; 4]
Case 4 (different firearms) | 0 | [−2; 0]
Case 5 (different firearms) | −1 | [−2; 0]
Case 6 (different firearms) | 0 | [−2; 0]

Table 4 Assessment of matching striations by each examiner for case 2 and case 3 in the non-suggestive task condition.

Case 2
Examiner | Grooves | High parts | Conclusion
1 | Partly | None | 2
2 | None | None | 0
3 | Partly | None | 2
4 | None | None | 0
5 | None | None | −2
6 | Small | None | 1
Structure (three entries, in the order listed): Similar; Partly similar; Partly similar

Case 3
Examiner | Grooves | High parts | Conclusion
1 | None | None | 0
2 | Partly | None | 2
3 | Partly | None | 4
4 | Partly | None | 2
5 | All around | Partly | 4
6 | Partly | Small | 2
Structure (three entries, in the order listed): Similar; Similar; Partly similar
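As a small consistency check, the case 3 row of Table 3 can be recomputed from the case 3 conclusions in Table 4, using the scale values from the procedure (the "not" conclusions carry negative values). The sketch below rests on one assumption that the text implies but does not spell out: a positive score counts as an individualisation, a negative score as an exclusion and zero as inconclusive. The function name `conclusion_class` is ours.

```python
from statistics import median

# Verbal conclusions and their scale values (Section 2.1.3).
SCALE = {
    "beyond any reasonable doubt": 4,
    "most likely": 3,
    "likely": 2,
    "possibly": 1,
    "inconclusive": 0,
    "likely not": -2,
    "beyond any reasonable doubt not": -4,
    "certainly not": -5,
}


def conclusion_class(score):
    """Assumed mapping from a scale value to the three classes of Tables 2 and 5."""
    if score > 0:
        return "individualisation"
    if score < 0:
        return "exclusion"
    return "inconclusive"


# Conclusions of examiners 1-6 for case 3, neutral condition (Table 4).
case3 = [0, 2, 4, 2, 4, 2]

print("median:", median(case3))
print("range:", (min(case3), max(case3)))
print([conclusion_class(s) for s in case3])
```

The median and range agree with the values Table 3 lists for case 3, which supports reading the two tables together.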
However, in light of these drawbacks and in order to shed more light on the present issue, we analysed an existing data set and compared the conclusions of first and second examinations in real cases. In contrast to the experiment this study has a higher ecological validity, in combination with a relatively large sample size (N = 153).

3. Agreement between first and second examinations in real cases

In complex or ambiguous cases the first examiner will ask a colleague to give a second opinion. This procedure is totally blind, in the sense that the second examiner does not know the conclusion of the first examiner. In addition he or she also lacks any context information. In contrast to the first examiner, the second examiner only receives the bullets/cartridge cases without any case description. This fact makes a comparison of first and second opinions interesting as a measurement of a potential confirmation bias in real cases. As the first examiner has more (and potentially biasing) case information than the second one, the first examiner is expected to give more extreme conclusions than the second if he or she had fallen prey to a confirmation bias.

3.1. Method

The second examiner writes down his or her conclusions on a standard form which is added to the file. From 1997 onwards all second opinions are collected in a specific folder in order to annually report the number of cases, examiners involved and outcomes. In the period 1997–2006 a total of 153 cases with two independent conclusions were available. The transformation from the qualitative indicators to numerical values was the same as described above. The data presented here are part of the standard operating procedure at the Netherlands Forensic Institute and were, as such, not gathered specifically for the purpose of the present study.

3.2. Results

Table 5 shows the agreement between the first and second examiner when only the class of the conclusion (individualisation, exclusion or inconclusive) is taken into account. In 128 cases examiners had the same conclusion (84%). In 3 cases (2%) the first examiner concluded that the case was an exclusion whereas the second examiner came to an individualisation. In none of the cases did a switch occur from an individualisation to an exclusion. When the probability assessment is also taken into account, there were exact correspondences in 94 cases, in 37 cases the second conclusion was higher than the first one and in 22 cases the second conclusion was lower. A comparison of both examiners indicates that the assessments of the second examiner are more extreme than those of the first examiner (Wilcoxon signed-rank test; z = 2.31, p = .01). The second examiners concluded that the bullets/cartridge cases were more likely to be fired from the same firearm than the first examiners.

3.3. Discussion

The second examiner gave higher scores than the first examiner in analyzing the same case, which suggests that the first examiners were more cautious in their conclusions than the second ones. Unlike the first examiner, the second examiner mostly lacks any background information concerning the case. If examiners had been affected by potentially biasing information in the transmittal letter accompanying the bullets, the results would be predicted to be the other way around. Due to the biasing information less evidence would be needed to conclude that the bullets/cartridge cases came from the same firearm, resulting in higher scores. In fact, the second examiner gave significantly higher scores than the first one.

It is stressed that the cases that were included in our analysis were the most ambiguous and difficult ones to be examined. The first examiners were uncertain of their assessment, and for that reason they asked for a second opinion. This uncertainty might be the reason that they gave relatively lower scores than the second examiners. This concurs with findings by Kerstholt et al. [9], who found that shoe print examiners made more cautious conclusions when the case was more complex, and with studies in which the effect of opposite expert opinions was investigated [6–8]. Possibly, examiners are (subconsciously) controlling for factors that potentially bias their conclusion.

4. Overall discussion

Our overall conclusion is that examiners were not affected by the biased information in the transmittal letter. The data of the first experiment showed that there were no differences between biased and neutral information when the same case was presented twice to the examiner. In the second study we even found that examiners who received less or no information at all were less cautious in their assessment than those who received a case description.

One of the mechanisms underlying a confirmation bias is that a particular hypothesis is momentarily taken to be true [11,12]. Additional incriminating evidence in the case description may generate a hypothesis of guilt or conviction, which consequently affects further information processing. Possibly, examiners do not have that attitude towards information that is provided in the transmittal letter. As a matter of fact, examiners are quite critical towards information that is provided to them, which may inhibit the generation of an initial hypothesis. Possibly, this critical attitude explains the absence of a confirmation bias in the forensic domains investigated.

This explanation cannot account for the findings when fingerprint specialists were given an opposite expert opinion [6–8]. In these studies it was shown that examiners made more cautious conclusions when an opposite expert opinion was given, that is, they more often concluded that two fingerprints did not match or that the data were inconclusive. Possibly examiners became less confident of their conclusion with an opposite expert opinion and corrected for that uncertainty by providing a more cautious judgment. As also indicated by Langenburg et al. [8], the costs of a false alarm (incorrect individualisation) are higher than the costs of a miss (incorrect exclusion).

Overall, different manipulations of context might address different underlying psychological mechanisms. Rather than addressing context effects in general, a distinction should be made between the specific phase in the examination process and the psychological mechanism that is being addressed.

Table 5 Agreement between first and second examiner (rows: examiner 1; columns: examiner 2).

Examiner 1 | Individualisation | Exclusion | Inconclusive | Total
Individualisation | 85 | 0 | 8 | 93
Exclusion | 3 | 10 | 3 | 16
Inconclusive | 8 | 3 | 33 | 44
Total | 96 | 13 | 44 | 153
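The agreement figures quoted in Section 3.2 follow directly from the counts in Table 5, as the short check below illustrates. It is a sketch for verification only; the variable and function names are ours, and the matrix is read with the first examiner in the rows and the second examiner in the columns.

```python
LABELS = ["individualisation", "exclusion", "inconclusive"]

# Rows: first examiner; columns: second examiner (counts from Table 5).
TABLE5 = [
    [85, 0, 8],   # first examiner: individualisation
    [3, 10, 3],   # first examiner: exclusion
    [8, 3, 33],   # first examiner: inconclusive
]


def summarise(table):
    """Return the total, the diagonal count (same class) and the marginal totals."""
    total = sum(sum(row) for row in table)
    same_class = sum(table[i][i] for i in range(len(table)))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return total, same_class, row_totals, col_totals


total, same_class, row_totals, col_totals = summarise(TABLE5)
excl_to_indiv = TABLE5[LABELS.index("exclusion")][LABELS.index("individualisation")]

print(f"same class: {same_class}/{total} = {same_class / total:.0%}")
print(f"exclusion -> individualisation: {excl_to_indiv} ({excl_to_indiv / total:.0%})")
print("row totals:", row_totals, "column totals:", col_totals)

# Direction of the second score relative to the first (Section 3.2).
exact, higher, lower = 94, 37, 22
print("direction counts cover all cases:", exact + higher + lower == total)
```

The diagonal sums to 128 of 153 cases, and the exact, higher and lower counts together account for all 153 cases.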
To date, research has primarily focused on the outcome of an examination. In order to increase our understanding of underlying mechanisms, it would be interesting to also measure process variables, such as the amount of time spent on the task, the cues that are used for an identification and the way various cues are integrated into an overall conclusion. Including these process variables may further our understanding of the differential effects obtained by the specific manipulations of context.

Acknowledgements

The authors thank the examiners of the weapons munition department for their willingness to participate in the experiment.

References

[1] F. Posthumus, Evaluatieonderzoek in de Schiedammerparkmoord. Amsterdam: Technical Report (in Dutch).
[2] M.D. Risinger, M.J. Saks, W.C. Thompson, R. Rosenthal, The Daubert/Kumho implications of observer effects in forensic science: hidden problems of expectation and suggestion, California Law Review 90 (2002) 1–56.
[3] R.S. Nickerson, Confirmation bias: a ubiquitous phenomenon in many guises, Review of General Psychology 2 (1998) 175–220.
[4] P. Fisher, E. Jonas, D. Frey, S. Schulz-Hardt, Selective exposure to information: the impact of information limits, European Journal of Social Psychology 35 (2005) 469–492.
[5] C.G. Lord, L. Ross, M.R. Lepper, Biased assimilation and attitude polarization: the effects of prior theories on subsequently considered evidence, Journal of Personality and Social Psychology 37 (1979) 2098–2109.
[6] I.E. Dror, D. Charlton, Why experts make errors, Journal of Forensic Identification 56 (2006) 600–616.
[7] I.E. Dror, D. Charlton, A.E. Péron, Contextual information renders experts vulnerable to making erroneous identifications, Forensic Science International 156 (2006) 74–78.
[8] G. Langenburg, C. Champod, P. Wertheim, Testing for potential contextual bias effects during the verification stage of the ACE-V methodology when conducting fingerprint comparisons, Journal of Forensic Science 54 (2009) 571–582.
[9] J.H. Kerstholt, R. Paashuis, M. Sjerps, Shoe print examinations: effects of expectation, complexity and experience, Forensic Science International 165 (2007) 30–34.
[10] L.J. Hall, E. Player, Will the introduction of an emotional context affect fingerprint analysis and decision making? Forensic Science International 181 (2008) 36–39.
[11] D.J. Koehler, Explanation, imagination and confidence in judgment, Psychological Bulletin 110 (1991) 499–519.
[12] J.H. Kerstholt, A.R. Eikelboom, Effects of prior interpretation on situation assessment in crime analysis, Journal of Behavioural Decision Making 20 (2007) 455–465.