Forensic Science International: Genetics 16 (2015) 71–76
Inclusion probability with dropout: An operational formula
E. Milot a,b,*, J. Courteau c, F. Crispino a,b, F. Mailly d
a Département de chimie, biochimie et physique, Université du Québec à Trois-Rivières, 3351 boul. des Forges, CP 500, Trois-Rivières, QC, Canada G9A 5H7
b Centre international de criminologie comparée, Université du Québec à Trois-Rivières, 3351 boul. des Forges, CP 500, Trois-Rivières, QC, Canada G9A 5H7
c Groupe de recherche PRIMUS, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001, 12e Avenue N., Sherbrooke, QC, Canada J1H 5N4
d Laboratoire de sciences judiciaires et de médecine légale, Ministère de la sécurité publique du Québec, 1701, rue Parthenais, Montréal, QC, Canada H2K 3S7
ARTICLE INFO
Article history: Received 7 February 2014; received in revised form 6 November 2014; accepted 26 November 2014
Keywords: Inclusion probability; Dropout; Number of contributors; Forensic mixture; Low-template DNA; RMNE

ABSTRACT
In forensic genetics, a mixture of two or more contributors to a DNA profile is often interpreted using inclusion probability theory. In this paper, we present a general formula for estimating the probability of inclusion (PI, also known as the RMNE probability) from a subset of visible alleles when dropouts are possible. This one-locus formula can easily be extended to multiple loci using the cumulative probability of inclusion. We show that an exact formulation requires fixing the number of contributors, which slightly modifies the classic interpretation of the PI. We discuss the implications of our results for the enduring debate over the use of PI vs likelihood ratio approaches within the context of low template amplifications. © 2014 Elsevier Ireland Ltd. All rights reserved.
* Corresponding author at: Département de chimie, biochimie et physique, Université du Québec à Trois-Rivières, 3351 boul. des Forges, CP 500, Trois-Rivières, QC, Canada G9A 5H7. Tel.: +1 819 376 5011x4397.
http://dx.doi.org/10.1016/j.fsigen.2014.11.023
1872-4973/© 2014 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

The analysis of mixtures from low template (LT) DNA profiling is opening a new era in forensic genetics by providing an opportunity to extract more information than ever relevant to judiciary casework from crime scene traces. At the same time, it raises major probabilistic challenges in the evaluation of the evidentiary weight of genetic profiles by necessitating the assessment of alleles potentially present but below the analytical threshold, the so-called “dropouts”.

Two major schools of mixture interpretation have been cohabiting for years in the forensic science community [1]: inclusion probability (PI) theory, also known as the “random man not excluded” (RMNE) approach, and likelihood ratios (LR). In brief, the former provides a measure of how inclusive a mixture is by estimating the proportion of a relevant population expected to have genotypes such that these individuals cannot be excluded as possible contributors to a mixture DNA profile. The latter uses the same mixed profile to evaluate two or more competing hypotheses about the source of a trace. A debate has run over the merits of each approach [2–4] but consensus is growing over the superiority of LR to deal with mixture data in general and dropouts in particular, and for the evaluation of evidentiary weight in court [5,6].

Nevertheless, the PI approach can serve for investigative purposes (i.e. forensic intelligence) such as to evaluate the power of discrimination or “quality” of a mixture prior to the collection of a suspect profile [2,7], especially in situations where no known profiles can be assumed to be present. Thus, it may help investigators focus their efforts on the most useful evidence. The PI can also serve to decide whether a mixture should be searched against a crime scene or convicted offender databank [2].

For evidentiary purposes, the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines recommend the removal of loci that exhibit peaks below the stochastic level for cumulative probability of inclusion (CPI) calculations prior to comparison with a suspect’s profile, unless higher RFU alleles can be interpreted as a distinct group, in which case a lab could calculate a restricted CPI using only these alleles (SWGDAM 2010, Sections 4.6.3 and 5.3.5) [8]. However, it has been reported that a widespread practice was to exclude from the CPI calculations loci for which a suspect shows discordant alleles. This may not always be conservative and may produce evidence prejudicial to suspects [9].

While it is true that many advanced statistical tools for LR calculations are becoming available for such complex circumstances, a proper understanding and efficient use of this approach implies much more than implementing new software within a laboratory. The effort and time required to properly train analysts should not be
underestimated. Moreover, there is still some resistance to the LR framework within the judicial system [1]. Thus, many labs still use the PI for both evidential and investigative purposes and, until the use of LR becomes more widespread, they are left with few options to deal with dropouts.

It may seem problematic to develop a PI that accounts for dropouts because any allele from the population genetic pool could have been present in the mixture before dropping out; hence, in theory, no “random man” should be excluded at all. Van Nieuwerburgh et al. [7] proposed a PI formulation that allows for dropout occurrence. The method requires the user to specify the number x of dropouts assumed to have occurred on a given mixture and then considers as not excluded any genotype in the population matching that restriction. Thus, it makes no assumption about dropout rates, which has the advantage of avoiding their difficult estimation. However, Van Nieuwerburgh et al.’s PI is unduly conservative when x > 0 (Tables 1 and 2). This is because the inclusion of a given genotype with x discordant allele(s) is based only on the frequency of the latter in the population rather than on the joint probability that the discordant alleles occurred in the pool of contributor genotypes in the first place and then dropped out.

The above discussion underscores another major issue with current PI calculations: they imply a post-hoc interpretation of mixtures to assess whether, how many or at what loci dropouts actually occurred, an error-prone process that may be hard to justify in court. In effect, one “paints the target around the arrow” [10].

Ideally, a PI accounting for dropouts should come as near as possible to the value that would be obtained if it were calculated with the standard formulation (see Section 2) when all alleles are visible (no dropouts). Moreover, it should not rely on a post-hoc evaluation of the dropouts that may have occurred. An exact formulation necessitates some knowledge about both dropout rates and the probability that alleles were present in the trace (or equivalently, in the pool of contributor genotypes) before dropping out. No such formulation has yet been proposed, likely because developers of statistical tools incorporating dropout probabilities mostly adopt the LR approach. Here we develop such a formulation and show that it requires fixing the number of contributors to a trace. Therefore, the interpretation of the PI changes slightly and we discuss the implications of this. Nevertheless, this formulation constitutes a more rigorous solution to deal with dropouts than alternative PI methods proposed thus far.

2. Methods
2.1. Some notations and properties of the probability theory

Let $A_1, A_2, \ldots, A_N$ be the $N$ distinct alleles represented in the population at a specific locus and let $p_1, p_2, \ldots, p_N$ be the corresponding allele frequencies or probabilities. Let $C$ be the set of $N_C$ visible distinct alleles from a mixture of $\mathit{NCo}$ contributors. Without loss of generality, we can assume that $C = \{A_1, \ldots, A_{N_C}\}$. Let $G_c$ be the set of all distinct alleles of the $\mathit{NCo}$ contributors.

Let $\Pr(A)$ designate the probability of an event $A$. Probability theory states that $\Pr(A \cap B) = \Pr(A \mid B)\Pr(B)$ and, more generally, $\Pr(A) = \sum_n \Pr(A \cap B_n) = \sum_n \Pr(A \mid B_n)\Pr(B_n)$, where $\{B_n\}$ is a partition of the sample space. This property is known as the law of total probability.

2.2. General expression for one locus

The exclusion probability (PE) is the probability that a random man (or woman) would be excluded at a focal locus. Then PE = 1 − PI, where PI is the inclusion probability or the RMNE probability (random man not excluded). A general expression is

\[ \mathrm{PI} = \sum_{i=1}^{N(N+1)/2} \Pr(g_i)\,\gamma_i \tag{1} \]

where $\Pr(g_i)$ is the probability that an individual chosen at random from the population is of genotype $g_i$ at the locus, $\gamma_i$ is the probability that the distinct alleles of $g_i$ (denoted $g_i'$) are included in $G_c$, and $N(N+1)/2$ is the total number of distinct genotypes made from $N$ distinct alleles. Under Hardy–Weinberg (HW) equilibrium, $\Pr(g_i) = f\,p_{i,1}\,p_{i,2}$, where $p_{i,1}$ and $p_{i,2}$ are the frequencies of the first and second alleles ($A_{i,1}$, $A_{i,2}$) of $g_i$, respectively, and $f = 2$ if the individual is heterozygous (i.e. $A_{i,1} \neq A_{i,2}$) and $f = 1$ otherwise. Since $C = G_c$ in the absence of dropout,

\[ \gamma_i = \begin{cases} 1 & \text{if } g_i' \subseteq C \\ 0 & \text{otherwise} \end{cases} \]

and Eq. (1) reduces to the classical formula

\[ \mathrm{PI} = \sum_{i=1}^{N_C(N_C+1)/2} f\,p_{i,1}\,p_{i,2} = \left( \sum_{j=1}^{N_C} p_j \right)^{2} \tag{2} \]

Table 1
Frequency of each allele for the three loci used in Van Nieuwerburgh et al. [7].

Allele   Locus 1   Locus 2   Locus 3
A1       0.18      0.15      0.12
A2       0.19*     0.15      0.12
A3       0.20*     0.16*     0.13*
A4       0.21*     0.17*     0.14*
A5       0.22      0.18*     0.15*
A6       –         0.19      0.16
A7       –         –         0.18

The asterisk (*) means that the allele is observed in the evidence profile.

2.3. Modelling dropouts

In the presence of dropouts, we need to include not only all genotypes compatible with $C$, but also all genotypes not in $C$ due to the occurrence of dropouts on the mixture. These genotypes are those $g_i$ that satisfy:

\[ g_i' \subseteq G_c; \qquad g_i' \not\subseteq C \]
Note that the first condition does not imply that the genotype $g_i$ is one of the mixture’s contributors but implies instead that its alleles are compatible with the mixture. The term $\Pr(g_i)$ in Eq. (1) remains the same, but the problem is to derive a mathematical expression for $\gamma_i$, the weight that multiplies $\Pr(g_i)$ in Eq. (1). Recall that $\gamma_i$ is the probability that the distinct alleles of $g_i$ ($g_i'$) are included in $G_c$. When dropouts are possible, all we know of $G_c$ is the visible subset $C$ ($C \subseteq G_c$). Let $D_k = G_c \setminus C$ be the set of dropout alleles in $G_c$ distinct from $C$ (then $C \cap D_k = \emptyset$ and $C \cup D_k = G_c$). This is equivalent to saying that $D_k$ represents one of the $K$ possible events (or outcomes) of dropout alleles forming the invisible part of the mixture. Then, conditioning on the invisible alleles (alleles in dropout) and summing over all possible sets of invisible alleles (law of total probability, see Section 2.1), we have

\[ \gamma_i = \Pr(g_i' \subseteq G_c) = \sum_{k=0}^{K} \Pr(g_i' \subseteq G_c \mid D_k)\Pr(D_k) = \sum_{k=0}^{K} \Pr(g_i' \subseteq C \cup D_k \mid D_k)\Pr(D_k) \tag{3} \]
where $K$ is the total number of possible combinations of dropout alleles, namely

\[ K = \sum_{j=0}^{2\,\mathit{NCo} - N_C - 1} \binom{N - N_C}{2\,\mathit{NCo} - N_C - j} \]
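The count $K$ and the candidate dropout sets $D_k$ can be enumerated directly. The following is a minimal Python sketch, assuming that the number of visible alleles does not exceed 2·NCo; the function names and the allele labels are illustrative choices, not part of the original method description. It uses locus 1 of Table 1, where A2, A3 and A4 are visible among five alleles.

```python
from itertools import combinations
from math import comb


def count_dropout_sets(n_alleles, n_visible, n_contributors):
    """K: number of non-empty candidate dropout sets (formula for K above)."""
    max_dropped = 2 * n_contributors - n_visible
    # math.comb returns 0 when more alleles are requested than are hidden.
    return sum(comb(n_alleles - n_visible, max_dropped - j)
               for j in range(max_dropped))


def dropout_sets(alleles, visible, n_contributors):
    """Return D_0 (the empty set) followed by every non-empty candidate D_k."""
    hidden = [a for a in alleles if a not in visible]
    max_dropped = 2 * n_contributors - len(visible)
    sets = []
    for size in range(max_dropped + 1):
        sets.extend(frozenset(c) for c in combinations(hidden, size))
    return sets


alleles = ["A1", "A2", "A3", "A4", "A5"]   # locus 1 of Table 1
visible = {"A2", "A3", "A4"}               # alleles marked with an asterisk
for n_co in (2, 3):
    k = count_dropout_sets(len(alleles), len(visible), n_co)
    sets = dropout_sets(alleles, visible, n_co)
    # The enumeration contains D_0 plus the K non-empty sets.
    print(n_co, k, len(sets) - 1)
```

With two contributors only one allele can be hidden, so the candidates are D_0 = ∅, {A1} and {A5}; with three contributors the pair {A1, A5} becomes possible as well.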
Here $\Pr(D_k)$ represents the probability that the set of all distinct alleles of the $\mathit{NCo}$ contributors ($G_c$) is composed of the disjoint sets $C$ and $D_k$ and that the alleles in $D_k$ are in dropout (since $C$ is given, i.e. observed, we do not need to model its probability). Now, let $0 \le d \le 1$ be the probability of dropout and let $N_{D_k}$ denote the number of alleles in $D_k$. We can rewrite $\Pr(D_k)$ as

\[ \Pr(D_k) = \frac{d^{N_{D_k}} \Pr(G_c = C \cup D_k)}{\sum_{m=0}^{K} d^{N_{D_m}} \Pr(G_c = C \cup D_m)} \]

The denominator of this expression ensures that, summing over all possible $D_k$, we obtain $\sum_k \Pr(D_k) = 1$. Therefore we have

\[ \gamma_i = \sum_{k=0}^{K} \Pr(g_i' \subseteq C \cup D_k \mid D_k) \left( \frac{d^{N_{D_k}} \Pr(G_c = C \cup D_k)}{\sum_{m=0}^{K} d^{N_{D_m}} \Pr(G_c = C \cup D_m)} \right) \]

or, equivalently, because $\Pr(g_i' \subseteq C \cup D_k \mid D_k)$ equals 1 when $g_i' \subseteq C \cup D_k$ and 0 otherwise,

\[ \gamma_i = \sum_{\substack{D_k \text{ such that} \\ g_i' \subseteq C \cup D_k;\; k = 0, \ldots, K}} \left( \frac{d^{N_{D_k}} \Pr(G_c = C \cup D_k)}{\sum_{D_m} d^{N_{D_m}} \Pr(G_c = C \cup D_m)} \right) \]

Under the assumption that the $\mathit{NCo}$ contributors to the mixture are randomly drawn from the population with respect to their genotype, $\Pr(G_c = C \cup D_k)$ reduces to the probability of randomly drawing from the population $\mathit{NCo}$ genotypes where each allele of $C \cup D_k$ is represented at least once. This reduces to summing over all possible combinations of alleles repeated $r_i$ times with $r_i \ge 1$ and $\sum_i r_i = 2\,\mathit{NCo}$. Formally, indexing the alleles of $C \cup D_k$ from 1 to $N_{C \cup D_k}$, we can write this as

\[ \Pr(G_c = C \cup D_k) = \sum_{S_1} \frac{(2\,\mathit{NCo})!}{r_1! \cdots r_{N_{C \cup D_k}}!}\; p_1^{r_1} \cdots p_{N_{C \cup D_k}}^{r_{N_{C \cup D_k}}} = \sum_{S_1} \left( (2\,\mathit{NCo})! \prod_{i=1}^{N_{C \cup D_k}} \frac{p_i^{r_i}}{r_i!} \right) \]

where $S_1 = \{ r_1, \ldots, r_{N_{C \cup D_k}} \ge 1 \text{ with } \sum r_i = 2\,\mathit{NCo} \}$.

With an approach similar to the “inclusion–exclusion principle” used in combinatorial mathematics, we have

\[ \Pr(G_c = C \cup D_k) = \sum_{S_2} \left( (2\,\mathit{NCo})! \prod_{i=1}^{N_{C \cup D_k}} \frac{p_i^{r_i}}{r_i!} \right) + \sum_{m=1}^{N_{C \cup D_k}} (-1)^m \sum_{1 \le i_1 < \cdots < i_m \le N_{C \cup D_k}} \; \sum_{S_3} \left( (2\,\mathit{NCo})! \prod_{j \in S_4} \frac{p_j^{r_j}}{r_j!} \right) \]

where $S_2 = \{ r_1, \ldots, r_{N_{C \cup D_k}} \ge 0 \text{ with } \sum r_i = 2\,\mathit{NCo} \}$, $S_3 = \{ r_1, \ldots, r_{i_1 - 1}, r_{i_1 + 1}, \ldots, r_{i_m - 1}, r_{i_m + 1}, \ldots, r_{N_{C \cup D_k}} \ge 0 \text{ with } \sum r_i = 2\,\mathit{NCo} \}$, and $S_4 = \{ 1, \ldots, i_1 - 1, i_1 + 1, \ldots, i_m - 1, i_m + 1, \ldots, N_{C \cup D_k} \}$, i.e. the allele indices with $i_1, \ldots, i_m$ removed. By the multinomial theorem, the expression for $\Pr(G_c = C \cup D_k)$ reduces to

\[ \left( \sum_{i=1}^{N_{C \cup D_k}} p_i \right)^{2\,\mathit{NCo}} + \sum_{m=1}^{N_{C \cup D_k}} (-1)^m \sum_{1 \le i_1 < \cdots < i_m \le N_{C \cup D_k}} \left( \sum_{j \in S_4} p_j \right)^{2\,\mathit{NCo}} \tag{4} \]

Hence, for a population in HW equilibrium, the inclusion probability taking into account the possibility of dropouts is given by

\[ \mathrm{PI}(d, \mathit{NCo}) = \sum_{i=1}^{N(N+1)/2} f\,p_{i,1}\,p_{i,2}\,\gamma_i \tag{5} \]

where

\[ \gamma_i = \sum_{\substack{D_k \text{ such that} \\ g_i' \subseteq C \cup D_k;\; k = 0, \ldots, K}} \left( \frac{d^{N_{D_k}} \Pr(G_c = C \cup D_k)}{\sum_{D_m} d^{N_{D_m}} \Pr(G_c = C \cup D_m)} \right) \]

with $D_0 = \emptyset$ and $\Pr(G_c = C \cup D_k)$ given by Eq. (4). Note that $D_0 = \emptyset$ represents the case where all alleles are visible and $d^{N_{D_0}} = d^0 = 1$.

In summary, the frequency of a genotype is used to determine the probability of drawing a random man with that specific profile (the $\Pr(g_i)$ in Eq. (1)). This probability is also equal to the probability of not being excluded from the mixture when all alleles of the random man are visible in the mixture (i.e. when $g_i' \subseteq C$). However, when this is not the case, a random man is allowed to have discordant alleles because of dropouts and the genotype probability must be weighted by the probability that the discordant alleles were present in the first place (before dropout). This, in turn, depends on the number of contributors and dropout rates. This probability is given by the $\gamma_i$ term in Eq. (5). It is easy to see that $\gamma_i = 1$ when $g_i' \subseteq C$, implying that Eq. (5) is always greater than or equal to the standard Eq. (2).

2.4. Multiple loci

The previous formula applies to the case of one locus. However, allowing locus-dependent dropout probabilities $\mathbf{d} = (d_1, \ldots, d_L)$, where $L$ is the total number of loci, the cumulative probability of inclusion across multiple loci is easily deduced by the product rule as

\[ \mathrm{CPI}(\mathbf{d}, \mathit{NCo}) = \prod_{\ell=1}^{L} \mathrm{PI}_\ell(d_\ell, \mathit{NCo}) \tag{6} \]

3. Examples
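Before turning to the examples, Eqs. (4) to (6) can be sketched in a few lines of Python. This is an illustrative sketch under the stated model assumptions (HW equilibrium, randomly drawn contributors, a single dropout rate per locus), not the authors' Mathematica routine; the function names and the dictionary-based data layout are assumptions made here for readability. The allele frequencies and visible alleles are those of Table 1.

```python
from itertools import combinations, combinations_with_replacement
from math import prod


def prob_allele_set(freqs, n_contributors):
    """Eq. (4): probability that 2*NCo random allele draws show exactly the
    alleles whose frequencies are listed in `freqs`, each at least once."""
    n_draws = 2 * n_contributors
    total = 0.0
    for m in range(len(freqs) + 1):            # m = number of alleles removed
        for kept in combinations(freqs, len(freqs) - m):
            total += (-1) ** m * sum(kept) ** n_draws
    return total


def inclusion_probability(freqs, visible, n_contributors, d):
    """Eq. (5) for one locus. `freqs` maps allele -> frequency; `visible` is C."""
    max_dropped = 2 * n_contributors - len(visible)
    if max_dropped < 0:
        raise ValueError("more visible alleles than 2*NCo allele copies")
    hidden = [a for a in freqs if a not in visible]
    # Unnormalised weight d^|D_k| * Pr(Gc = C u D_k) for every candidate D_k.
    weights = {}
    for size in range(max_dropped + 1):
        for dropped in combinations(hidden, size):
            present = [freqs[a] for a in visible] + [freqs[a] for a in dropped]
            weights[frozenset(dropped)] = (
                d ** size * prob_allele_set(present, n_contributors))
    norm = sum(weights.values())
    pi = 0.0
    for a, b in combinations_with_replacement(list(freqs), 2):
        f = 1 if a == b else 2                  # HW genotype probability
        discordant = {a, b} - set(visible)      # alleles that must have dropped
        gamma = sum(w for dk, w in weights.items() if discordant <= dk) / norm
        pi += f * freqs[a] * freqs[b] * gamma
    return pi


def cumulative_inclusion_probability(loci, n_contributors, d_per_locus):
    """Eq. (6): product of the one-locus values, one dropout rate per locus."""
    return prod(inclusion_probability(fr, vis, n_contributors, dl)
                for (fr, vis), dl in zip(loci, d_per_locus))


locus1 = ({"A1": 0.18, "A2": 0.19, "A3": 0.20, "A4": 0.21, "A5": 0.22},
          {"A2", "A3", "A4"})
locus2 = ({"A1": 0.15, "A2": 0.15, "A3": 0.16, "A4": 0.17, "A5": 0.18,
           "A6": 0.19}, {"A3", "A4", "A5"})
locus3 = ({"A1": 0.12, "A2": 0.12, "A3": 0.13, "A4": 0.14, "A5": 0.15,
           "A6": 0.16, "A7": 0.18}, {"A3", "A4", "A5"})
loci = [locus1, locus2, locus3]

for n_co in (2, 3):
    pis = [inclusion_probability(fr, vis, n_co, 1 / 9) for fr, vis in loci]
    cpi = cumulative_inclusion_probability(loci, n_co, [1 / 9] * 3)
    # Compare with the d = 1/9 rows of Table 2.
    print(n_co, [round(x, 3) for x in pis], round(cpi, 4))
```

Setting d = 0 gives zero weight to every non-empty dropout set, so the function then returns the classical value of Eq. (2), for instance 0.36 at locus 1. The direct enumeration of all dropout sets is kept for clarity; it grows quickly with the number of hidden alleles and contributors.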
Let us consider the simple example used in Van Nieuwerburgh et al. [7]. They used three loci with 5, 6 and 7 distinct alleles, respectively. Table 1 contains the allele frequencies used in [7], the asterisk (*) indicating that the allele is visible in the evidence profile. To be able to compare methods, we selected d = 1/9 and d = 2/9 because they correspond to one or two expected dropouts considering the total number of alleles (i.e. 9) in the three-locus profile. Comparisons between our results and those of Van Nieuwerburgh et al. [7] are presented in Table 2. They show that our approach leads to only a modest increase in PI relative to the no-dropout value, compared to Van Nieuwerburgh et al.’s method (e.g. a CPI of 0.0247 for two contributors and d = 1/9 vs 0.1175 when allowing one dropout, against 0.0168 without dropouts).

To further assess how PI(d, NCo) differs from the standard Eq. (2), we used the Caucasian frequency data of the Laboratoire de sciences judiciaires et de médecine légale in Québec to create 10 random mixtures of two individuals for the 15 loci of the AmpFℓSTR® Identifiler® kit. Dropouts were randomly generated on these mixtures at rate d = 0.1 (equal across loci). Then, PI(d, NCo) was calculated by setting d = 0.1, 0.25 or 0.5 and assuming two or three contributors (Table 3). The results show that PI(d = 0.1, NCo = 2) calculated on mixtures after alleles were dropped out provides values close to the standard PI calculated on the same mixtures before the occurrence of these dropouts (Fig. 1A). Values generally differed by less than one order of magnitude and were higher than the standard PI in most cases. On the other hand, values obtained from the standard PI after dropouts were always lower (less conservative) by up to three orders of magnitude. When d or NCo were overestimated (i.e. d = 0.25 or 0.5 or NCo = 3 in this example), PI(d, NCo) provided more conservative (inclusive) values (Fig. 1B).

Table 2
Probability of inclusion for the example in Table 1 obtained with Van Nieuwerburgh et al.’s [7] method and our method considering up to three contributors.

Method                                     PI1     PI2     PI3     CPI
No dropouts (d = 0)
  Van Nieuwerburgh et al.                  0.36    0.26    0.18    0.0168
  Eqs. (5) and (6)                         0.36    0.26    0.18    0.0168
With possible dropouts (d > 0)
  Van Nieuwerburgh allowing 1 dropout      –       –       –       0.1175
  Eqs. (5) and (6), 2 contributors, d = 1/9   0.397   0.295   0.211   0.0247
  Eqs. (5) and (6), 3 contributors, d = 1/9   0.485   0.374   0.291   0.0528
  Van Nieuwerburgh allowing 2 dropouts     –       –       –       0.3649
  Eqs. (5) and (6), 2 contributors, d = 2/9   0.425   0.319   0.233   0.0316
  Eqs. (5) and (6), 3 contributors, d = 2/9   0.553   0.433   0.343   0.0821

4. Discussion

Here we show that an exact formulation of the PI accounting for dropouts requires the specification of the number of
contributors. On the one hand, this represents an advantage as it incorporates more information from the trace DNA profile, which helps to better approach the PI that would be obtained from the standard equation, which assumes no alleles have dropped out. However, within the context of the PI vs LR debate, this lessens the usefulness of the PI for evidentiary purposes by removing one of its specificities relative to the LR, namely the non-specification of contributors, a characteristic that contributes to the simplicity of both calculation and interpretation of the standard equation [3].

Table 3
Cumulative probability of inclusion for 10 two-person mixtures randomly generated from the Caucasian allelic frequencies used as a reference by the Laboratoire de sciences judiciaires et de médecine légale in Québec.

          Equation (2)                    PI(d, NCo = 2)                                PI(d, NCo = 3)
Mixture   After dropout  Before dropout   d = 0.1        d = 0.25       d = 0.5        d = 0.1        d = 0.25       d = 0.5
1         9.4 × 10^-12   3.4 × 10^-10     2.4 × 10^-9    2.1 × 10^-8    9.8 × 10^-8    2.5 × 10^-7    4.3 × 10^-6    3.2 × 10^-5
2         7.3 × 10^-10   4.7 × 10^-7      1.1 × 10^-7    5.6 × 10^-7    1.8 × 10^-6    1.7 × 10^-6    1.4 × 10^-5    6.4 × 10^-5
3         3.9 × 10^-9    1.2 × 10^-7      9.8 × 10^-8    4.1 × 10^-7    1.2 × 10^-6    1.8 × 10^-6    1.4 × 10^-5    7.0 × 10^-5
4         1.8 × 10^-10   9.6 × 10^-9      1.5 × 10^-8    1.0 × 10^-7    4.0 × 10^-7    7.1 × 10^-7    7.7 × 10^-6    6.0 × 10^-5
5         4.7 × 10^-10   1.8 × 10^-8      5.1 × 10^-8    2.3 × 10^-7    7.3 × 10^-7    1.7 × 10^-6    1.6 × 10^-5    8.2 × 10^-5
6         1.4 × 10^-10   6.9 × 10^-10     6.3 × 10^-9    4.7 × 10^-8    2.0 × 10^-7    6.0 × 10^-7    7.2 × 10^-6    4.3 × 10^-5
7         3.9 × 10^-10   9.4 × 10^-8      1.9 × 10^-7    9.3 × 10^-7    3.0 × 10^-6    4.6 × 10^-6    3.3 × 10^-5    1.4 × 10^-4
8         8.7 × 10^-12   1.4 × 10^-9      9.3 × 10^-9    7.1 × 10^-8    3.0 × 10^-7    5.7 × 10^-7    6.7 × 10^-6    3.9 × 10^-5
9         6.1 × 10^-10   2.5 × 10^-8      1.9 × 10^-8    1.1 × 10^-7    4.4 × 10^-7    9.1 × 10^-7    9.9 × 10^-6    5.2 × 10^-5
10        3.0 × 10^-11   2.5 × 10^-8      4.2 × 10^-8    1.7 × 10^-7    4.7 × 10^-7    1.4 × 10^-6    1.1 × 10^-5    5.7 × 10^-5

Fig. 1. Cumulative probability of inclusion (log scale) for 10 randomly generated 2-person mixtures. A: CPI value for each mixture as obtained with the standard Eq. (2) before (blue triangles) and after (black dots) generating dropouts, and using Eq. (6) with d = 0.1 and NCo = 2 after generating dropouts (red squares). B: average CPI and range (error bars) for the 10 mixtures obtained with the standard equation before (‘standard’) and after (‘standard-do’) generating dropouts and using Eq. (6) with various values of d (0.1, 0.25, 0.5) and NCo (2, 3).

The interpretation of PI(d, NCo) is somewhat different from the classic one and could be worded as the probability that a random man would not be excluded from a mixture assuming there were NCo contributors. Although it requires specifying the number of contributors, it is not suspect-driven; thus it remains a full PI approach, not an intermediate one between the PI and LR. As a consequence, when a suspect profile is available, we emphasize that the use of PI(d, NCo) for an evidential purpose should be limited to cases where all of the suspect’s alleles are included in the mixture at all loci. This is because the method is not intended to evaluate the probability of observing a suspect’s genotype, whether or not it has alleles discordant with the mixture. In cases where it is eventually determined that all of the suspect’s alleles are present within a mixture, it could be said that 1 in X individuals in the population of interest is not excluded as a potential contributor to the mixture, and that the suspect is one of those individuals not excluded. On the contrary, in situations where one or two of the suspect’s alleles are absent, the PI value calculated from the mixture cannot be applied to this suspect. Replicate amplifications may be considered and may provide additional data that could help resolve the issue.

Our approach requires specifying the number of contributors. In practice, it will often be realistic to assume that this number lies within a limited range. Most interpretable mixtures in casework likely have fewer than four significant contributors and more complex mixtures are rarely usable for evidentiary purposes. Therefore, one strategy could be to calculate PI(d, NCo) for three contributors as it will provide a more conservative value than for two contributors. Alternatively, if both possibilities of two or three contributors are assumed with equal probabilities, a PI can be obtained by averaging the two possibilities: PI(d, NCo = {2, 3}) = (1/2)PI(d, NCo = 2) + (1/2)PI(d, NCo = 3). This last equation can easily be generalized to cases where the possibilities are not assumed equally likely (unequal weights) or where there are more than two possibilities for the number of contributors (say, a range between 2 and 5). Moreover, recently published approaches that perform better than the classic maximum allele count method show that it is possible to reliably determine the number of contributors in most two-person or three-person mixtures, and that even a majority of four-person mixtures would be classified as such [11,12].

An important characteristic of PI(d, NCo) is the possibility to account for the tremendous variation empirically observed in dropout rates between loci. However, this necessitates reasonably accurate estimates of d. This information is probably not yet available to many forensic labs, which may have limited data that only allow the establishment of a stochastic threshold. However, for both the PI and LR approaches, it will be urgent to obtain such estimates if we are to take full advantage of new statistical methodologies that can make the best use of all the information contained in mixed profiles. This is crucial to assess the weight of evidence in the fairest manner possible and may thus have important consequences for justice. For instance, the Laboratoire de sciences judiciaires et de médecine légale in Québec is currently setting up a protocol to empirically estimate dropout rates in relation to peak heights at each locus and in various situations (e.g. balanced vs unbalanced contributions). Indeed, a single d is unlikely to be applicable to all situations. Nevertheless, our examples with simulated mixtures suggest that PI(d, NCo) is not too sensitive to moderate errors in d. Namely, changing d either from 0.1 to 0.25, or from 0.25 to 0.5, changed the PI by about one order of magnitude (Fig. 1B).

PI(d, NCo) necessarily provides conservative estimates compared to the standard PI calculated on a mixture after dropouts. However, these are not excessively conservative as the method aims to approach the standard PI that would be obtained if there were no dropouts. Therefore, using PI(d, NCo) avoids unduly depreciating the evidence. This is because it weighs the probability that a given allele was present in the pool of contributor genotypes and dropped out, based not only on its frequency but also on the probability of a given set of NCo contributors drawn from the population.

One advantage of our method is that it can be used with a wide range of mixtures, even those for which no dropouts are visible under the analytical threshold. For a given d > 0, it is expected that some multilocus profiles will show no dropouts (unless d → 1) and it is thus logical to apply PI(d, NCo) to all mixtures. Thus, it is a way to make the use of the PI more objective, so that it is less dependent on subjective experience when dealing with dropouts.
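The averaging over the number of contributors described earlier in this section, including its generalization to unequal weights, can be sketched as follows. This is a minimal Python illustration; the function name is hypothetical, the PI values are the locus 1 entries of Table 2 for d = 1/9, and the unequal weights in the second call are arbitrary numbers chosen only to show the call, not estimates from any data.

```python
def average_over_contributors(pi_by_nco, weights=None):
    """Weighted mean of PI(d, NCo) over candidate numbers of contributors.

    `pi_by_nco` maps NCo -> PI(d, NCo); `weights` maps NCo -> prior weight
    and defaults to equal weights, as in the two-or-three-contributor case.
    """
    if weights is None:
        weights = {n: 1.0 / len(pi_by_nco) for n in pi_by_nco}
    if set(weights) != set(pi_by_nco):
        raise ValueError("weights and PI values must cover the same NCo values")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[n] * pi_by_nco[n] for n in pi_by_nco)


# Locus 1 of Table 2 with d = 1/9: PI = 0.397 (NCo = 2) and 0.485 (NCo = 3).
pi_locus1 = {2: 0.397, 3: 0.485}
print(average_over_contributors(pi_locus1))                    # equal weights
print(average_over_contributors(pi_locus1, {2: 0.7, 3: 0.3}))  # arbitrary weights
```

With equal weights the result is the midpoint of the two values (0.441 here), which lies between the two-contributor value and the more conservative three-contributor value.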
Extensions of our method could include developments to deal with a number of empirical complications such as population structure, d variable across alleles within a locus or the possibility of genetically related contributors. They could also make use of quantitative data such as peak height/area or replicate amplifications. Such endeavours would make the model much more complex and increase the number of parameters to be estimated empirically. We should first evaluate whether these would be better handled within a Bayesian (hence LR) framework that accounts for the uncertainty around parameters [13], especially given that promising methodological developments have been published on these questions (e.g. [14]). The PI that we propose here remains relatively simple, adding only two parameters to be estimated (d, NCo), and straightforward to interpret.

Our method should not be seen as a replacement for the LR approach, especially since the uncertainty around parameter estimates is not explicitly modelled. Rather, it can be viewed as a tool that has specific utilities for intelligence or investigative purposes. For instance, one could evaluate more accurately the discrimination power of several mixtures within a case and prioritize samples before profiles from suspects are available. It also provides a method to deal with potential dropouts in low template mixtures for labs that have yet to adopt the LR framework, as more conservative estimates for PI values are obtained.

Code to implement PI(d, NCo) in Mathematica [15] is given in Appendix A. Code for the R environment [16] is also available from the corresponding author upon request.

Appendix A. A programme in Mathematica

The following programme in Mathematica calculates PI(d) given by Eq. (5) for one locus.

We call the routine “fctPI” as follows:

fctPI[{p1,p2,p3,p4,p5},{p2,p3,p4},2,0.1,{0.18, 0.19, 0.20, 0.21,0.22}]

where the five parameters represent:
1. p = {p1,p2,p3,p4,p5}: the list of all possible alleles at the locus
2. pC = {p2,p3,p4}: the list of all visible alleles
3. NCo = 2: the number of contributors
4. dropout = 0.1: the probability of dropout d
5. v = {0.18, 0.19, 0.20, 0.21,0.22}: the list of allele frequencies
References

[1] C.D. Steele, D.J. Balding, Ann. Rev. Stat. Appl. 1 (2014) 20.1.
[2] J. Buckleton, in: J. Buckleton, C.M. Triggs, S.J. Walsh (Eds.), Forensic DNA Evidence Interpretation, CRC Press, London, 2005, pp. 218–274.
[3] J. Buckleton, J. Curran, A discussion of the merits of random man not excluded and likelihood ratios, Forensic Sci. Int. Genet. 2 (2008) 343–348.
[4] Budowle, et al., Mixture interpretation: defining the relevant features of mixed DNA profiles, J. Forensic Sci. 54 (2009) 810–821.
[5] P. Gill, C.H. Brenner, J.S. Buckleton, A. Carracedo, M. Krawczak, W.R. Mayr, N. Morling, M. Prinz, P.M. Schneider, B.S. Weir, DNA Commission of the International Society of Forensic Genetics: recommendations on the interpretation of mixtures, Forensic Sci. Int. 160 (2006) 90–101.
[6] P. Gill, L. Gusmão, H. Haned, W.R. Mayr, N. Morling, W. Parson, L. Prieto, M. Prinz, H. Schneider, P.M. Schneider, B.S. Weir, DNA Commission of the International Society of Forensic Genetics: recommendations on the evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods, Forensic Sci. Int. Genet. 6 (2012) 679–688.
[7] F. Van Nieuwerburgh, E. Goetghebeur, M. Vandewoestyne, D. Deforce, Bioinformatics 25 (2009) 225.
[8] SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories, Scientific Working Group on DNA Analysis Methods (SWGDAM), 2010.
[9] J.M. Curran, J. Buckleton, Inclusion probabilities and dropout, J. Forensic Sci. 55 (2010) 1171–1173.
[10] W.C. Thompson, Painting the target around the matching profile: the Texas sharpshooter fallacy in forensic DNA interpretation, Law Probab. Risk 8 (2009) 257–276.
[11] J. Perez, A.A. Mitchell, N. Ducasse, J. Tamariz, T. Caragine, Estimating the number of contributors to two-, three-, and four-person mixtures containing DNA in high template and low template amounts, Croat. Med. J. 52 (2011) 314–326.
[12] H. Haned, L. Pène, J.R. Lobry, A.B. Dufour, D. Pontier, Estimating the number of contributors to forensic DNA mixtures: does maximum likelihood perform better than maximum allele count? J. Forensic Sci. 56 (2011) 23–28.
[13] A. Biedermann, F. Taroni, Bayesian networks for evaluating forensic DNA profiling evidence: a review and guide to literature, Forensic Sci. Int. Genet. 6 (2012) 147–157.
[14] R.G. Cowell, T. Graversen, S.L. Lauritzen, J. Mortera, Analysis of forensic DNA mixtures with artefacts, http://arxiv.org/pdf/1302.4404.pdf, 2013.
[15] Wolfram Research, Inc., Mathematica, Version 5.0.1.0, Champaign, IL, 2003.
[16] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008, ISBN 3-900051-07-0, http://www.R-project.org.