Spectrochimica Acta Part B 63 (2008) 129 – 141 www.elsevier.com/locate/sab
Limits of detection and decision. Part 2 ☆

E. Voigtman

Department of Chemistry, University of Massachusetts, Amherst, 710 N. Pleasant St., Amherst, MA 01003-9336, United States

Received 29 August 2007; accepted 12 November 2007; available online 22 November 2007

☆ This article is published in a special honor issue dedicated to Jim Winefordner on the occasion of his retirement, in recognition of his outstanding accomplishments in analytical atomic and molecular spectroscopy.
E-mail address: [email protected].
doi:10.1016/j.sab.2007.11.018
Abstract

Extensive Monte Carlo studies of instrumental limits of detection (LODs) were performed on a simple univariate chemical measurement system having homoscedastic, Gaussian measurement noise and using ordinary least squares (OLS) processing of tens of millions of independent calibration curve data sets. It was found that experimental decision and detection limits in the content domain were distributed as scaled reciprocals of noncentral t variates. In the response domain, the decision and detection limits were distributed as scaled χ variates. Rates of false negatives were as expected statistically and no bias was found. However, detection limit expressions based on critical values of the noncentrality parameter of the noncentral t distribution were found to be significantly negatively biased, resulting in substantial positive bias in rates of false negatives.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Limit of detection; Detection limit; Noncentral t distribution; Bias
1. Introduction

Part 1 began the detailed statistical exploration of a simple univariate chemical measurement system (CMS) with homoscedastic, Gaussian measurement noise and ordinary least squares (OLS) processing of the calibration curve data. This model system is unrealistically simplistic relative to what is often encountered in reality. Nevertheless, it serves as an excellent "zero order approximation" to what exists in practice and, mutatis mutandis, can accommodate heteroscedastic measurement errors, weighted least squares processing, non-Gaussian noise and so on. It is especially valuable in the present investigation, where a major goal is elucidation of the probability density functions (PDFs) of the experimental decision and detection limits that arise from Currie's classic hypothesis testing formulation of detection theory [1]. In Part 1 it was shown, via extensive Monte Carlo computer simulations performed on the CMS above, that Currie's schema provided rates of false positives (i.e., type I errors, denoted by p) and false negatives (i.e., type II errors, denoted by q) that were as
expected statistically and no bias was found. In contrast, Hubaux and Vos' prediction interval-based instantiation of Currie's schema [2] was found to produce significantly negatively biased experimental detection limits. Uncovering the cause of the bias in the prediction interval methodology led to the finding that Currie's schema can admit non-unique solutions if more than one alternative hypothesis can be considered; this implies that the alternative hypothesis must be carefully formulated if a unique solution is desired, as will almost certainly be the case in practice.

Given the clear advantages that Currie's schema affords relative to the more complicated and biased prediction interval methodology, and the absence of any disadvantages in this regard, it might seem that the matter is settled enough so that attention may be directed toward determining the PDFs for the experimental decision and detection limits in both the response and content domains. This process was initiated near the end of Part 1, with Fig. 8 there showing histograms of 10^7 xC and 10^7 xD variates. Each histogram was overplotted with its corresponding theoretical PDF; these PDFs were stated, without proof, to be modified noncentral t distributions. In this paper, the PDFs of xC and xD variates are derived and examples are provided to illustrate their computation, accuracy and utility in yielding, e.g., confidence intervals and expectation values. However, there are
two issues that must be addressed first, which is why the PDF derivations were necessarily delayed from Part 1.

The first issue is essentially a philosophical one: Currie has consistently maintained that the limit of detection in his schema simply does not exist, even in principle, if the population standard deviation of the noise, σ0, is unknown [3–5]. In the author's view, this is completely untenable, given that population parameters are almost never known outside of non-experimental situations such as Monte Carlo computer simulations. Every real measurement is intrinsically a random variate, i.e., a sample from some PDF wider than a 'delta function', as are all quantities computed from one or more measurements, e.g., sample means, sample standard deviations or area integrals. Simply put, measurements rarely yield population parameters except in contrived or special situations. Accordingly, we have to take into account that real chemical measurement systems and their associated calibration curves, even for the simple CMS described above, will only yield decision and detection limits that are random variates, and it is the PDFs of these variates that determine all of their statistical properties.

The second issue concerns the noncentral t distribution and its fundamental connection with detection limits. In Part 1, it was shown that the two experimental domain Currie detection limits were given by

yD = (tp + tq)s0η^1/2   (1)

and

xD = (tp + tq)s0η^1/2/b   (2)
with all notation as in Part 1. These two equations are exact, valid for all degrees of freedom, and follow directly from their theoretical domain counterparts YD and XD, which they asymptotically approach as ν → ∞. Yet these two expressions have repeatedly been mischaracterized as mere approximations, ascribed to a supposedly "invalid use of the central t-distribution" [6, p. 52] and thus inferior to "use of the more statistically justified noncentral t-distribution" [6, p. 56] and its associated critical noncentrality values. These latter, variously denoted by Δ(p,q) [7], δα,β,ν [4,5] and by δp,q herein, are typically obtained from published tables [7]. Clayton et al. [7] clearly disagree with Eqs. (1) and (2), stating bluntly "Under the alternative hypothesis, X > 0, the appropriate derived distribution is a noncentral t distribution, and detection limits become functions of the corresponding noncentrality parameter. Other techniques involving similar assumptions, such as those cited previously, purport to provide fixed type I and type II error rates but avoid the use of noncentral t probabilities. As pointed out by Burrows, the derivation of such methods must include some type of logical or mathematical fallacy; otherwise such methods 'would always be available to circumvent noncentral t distributions when considering power of t tests' (5)." In the above quotation, the Burrows sub-quotation is referenced as coming from one of Burrows' publications in press, but the publication seems never to have happened. Regardless of who may have said it, what matters is the apparent contradiction between Eqs. (1) and (2), which are without mathematical, statistical or logical fallacy and which handily pass the scrutiny of extensive Monte Carlo simulations, and the
theoretical, content domain detection limit given by Clayton et al. as their Eq. (17). As they state "Thus xd(p,q) = w0Δ(p,q)σ/β (17) is the detection limit associated with the detection rule that achieves assurance probability (1−q). Some values of Δ(p,q) for specified p, q, and ν are tabulated by Owen (13) and Burrows (5). A more useful listing for the present application is given here in table 1, which also gives values of tp for p = 0.10, 0.05, and 0.01. Note that xd(p,q) is a parameter; it is not an estimate based on a particular set of observations. As the number of calibration observations, n, increases, the detection limit given by eq 17 approaches that given by eq 13. Apart from the unknown factor (σ/β), which is analyte and measurement-specific and which must be estimated, all of the components of eq 17 can be determined for any given calibration design by specifying values for p and q." The second major goal of this part of the study is to resolve this apparent contradiction.

2. Theoretical

All of the assumptions concerning the CMS, the measurement noise, the OLS processing of the calibration curve data, the standard measurement protocol and so on are exactly as in Part 1 and there is no change in nomenclature. Fig. 1 concisely gives the expressions for the Currie decision and detection limits in all four quadrants. If σ0 were known, which will almost never be true in practice, then the equations in quadrant 1 are the true Currie decision and detection limits in the response domain:

YC = zpσ0η^1/2   (3)

YD = (zp + zq)σ0η^1/2.   (4)
These equations yield errorless real numbers, greater than zero. If p = q, as is commonly the case, then YD = 2YC. If β were also known, then the equations in quadrant 3 are the true Currie decision and detection limits in the content domain:

XC = zpσ0η^1/2/β   (5)

XD = (zp + zq)σ0η^1/2/β.   (6)
These equations also yield errorless real numbers, greater than zero. These are the ideal circumstances, approachable only asymptotically by experimental results.
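As a concrete numerical illustration of Eqs. (3)–(6) (an illustrative aid, not part of the original study), the short Python sketch below evaluates the four theoretical limits for the Model 1 design used later in Table 1, assuming σ0 and β are known and that SciPy is available:

```python
from scipy.stats import norm

# Model 1 parameters from Table 1; sigma0 and beta are assumed known here
sigma0 = 0.1             # population standard deviation of the homoscedastic noise
eta_sqrt = (6 / 5)**0.5  # eta^1/2 for the Model 1 calibration design
beta = 1.0               # true slope
p = q = 0.05

zp, zq = norm.ppf(1 - p), norm.ppf(1 - q)   # critical z values (~1.64485)

YC = zp * sigma0 * eta_sqrt          # Eq. (3): response domain decision limit
YD = (zp + zq) * sigma0 * eta_sqrt   # Eq. (4): response domain detection limit
XC = YC / beta                       # Eq. (5): content domain decision limit
XD = YD / beta                       # Eq. (6): content domain detection limit

print(YC, YD, XC, XD)  # ~0.1802, ~0.3604, ~0.1802, ~0.3604; YD = 2*YC since p = q
```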
Fig. 1. Quadrant diagram summarizing the four Currie decision limit expressions (subscript ‘C’) and four Currie detection limit expressions (subscript ‘D’).
Now suppose that σ0 were unknown, as is almost always the case outside of computer simulations and the like. In this case, there is agreement [7, Eq. (14); 3, p. 20; 4, Eq. (4B); 5, Eq. (4b)] that the response domain, experimental Currie decision limit is

yC = tps0η^1/2   (7)
as given in quadrant 2 of Fig. 1. This follows in standard fashion from the expression for YC in quadrant 1: the unknown σ0 is replaced by the sample standard deviation s0 obtained via the OLS processing of the calibration curve data (since s0 = sr, the sample standard error about regression), and the critical zp value is replaced by the corresponding critical (central) tp value for N − 2 degrees of freedom. Since s0 is a χ distributed random variate, η is a constant for any given OLS calibration design and tp is constant, yC is a scaled χ distributed random variate with PDF given by

pu(u) = [ν^(ν/2) u^(ν−1) e^(−νu²/2(kσ0)²) / ((kσ0)^ν 2^((ν−2)/2) Γ(ν/2))] U(u)   (8)

where k = tpη^1/2, u = ks0 and U(•) is the unit step function. Only if σ0 were unknown would Eq. (7) be used: it would never be used in place of the YC equation if σ0 were available.

The stage has now been set for the noncentral t distribution to make one of its appearances. As Johnson et al. [8] state so concisely: "The statistic √n(X̄ − ξ0)/S is used in testing the hypothesis that the mean of a normal population is equal to ξ0. If X̄ = n⁻¹ Σ_{i=1}^{n} Xi and

S = [(n − 1)⁻¹ Σ_{i=1}^{n} (Xi − X̄)²]^1/2

are calculated from a random sample of size n, and the population mean is equal to ξ0, then √n(X̄ − ξ0)/S should be distributed as (central) t with n − 1 degrees of freedom. If, however, the population mean ξ is not equal to ξ0, then √n(X̄ − ξ0)/S is distributed as t′_{n−1}(√n(ξ − ξ0)/σ), where σ is the population standard deviation. The power of the test is calculated as a partial integral of the probability density function of this noncentral t-distribution." From this long standing usage of the noncentral t distribution, Clayton et al. deduce their Eq. (17) [7], vide supra, which in the notation of this paper is

XD′ = δp,qσ0η^1/2/β   (9)

with xd(p,q) = XD′, w0 = η^1/2, Δ(p,q) = δp,q, σ = σ0, and β unchanged. This equation is for the theoretical Currie detection limit in the content domain. Removing β yields the corresponding theoretical Currie detection limit in the net response domain:

YD′ = δp,qσ0η^1/2.   (10)

Both Eqs. (9) and (10) are indisputably correct, but arise in the first place only because Eq. (7) was used. But Eq. (7) is used only when σ0 is unknown, and yet Eqs. (9) and (10) both require σ0. This peculiar situation of both requiring σ0, and yet not knowing it, is one source of the apparent contradiction. If σ0 were known, then XD and YD would always be computed via Eqs. (6) and (4), thereby requiring only critical z values and requiring neither the central t distribution nor the noncentral t distribution. Note that Eqs. (6) and (4) will always yield lower numerical values than those computed from Eqs. (9) and (10), respectively. This is due to the fact that, for all possible combinations of p, q, and ν,

zp + zq < δp,q   (11)

with lim ν→∞ δp,q = zp + zq. Thus Eqs. (9) and (10) are sub-optimal in their respective theoretical domains.

Clayton et al., quoted above, recognized that σ/β must be estimated and knew this would have to be done well, since they said [7]: "Attempts to redefine limits that have more realistic type II error rates, as we advocate, have generally suffered from one or more of the following deficiencies: (a) the influence of the calibration has been ignored altogether, (b) the measurement error variance has been assumed known (or a large-sample, highly precise estimate of it is assumed available), (c) a simple straight-line calibration with known model parameters (slope and intercept) has been assumed, and/or (d) mathematical or logical fallacies have occurred in the development of the detection limit estimators." Despite (b) in the quotation above, they then proceeded to modify Eq. (9) (their Eq. (17)) by substituting s0 for σ0 and b for β:

xD′ = δp,qs0η^1/2/b   (12)

and the corresponding yD equation would be

yD′ = δp,qs0η^1/2.   (13)
This is almost the only possible course of action (but see the MARLAP correction below), given that neither δp,q nor η^1/2 can be changed, and others have been forced to make the same substitutions [3,4,6]. But s0 is a negatively biased point estimate of σ0, as has been known for almost a century [9], and this guarantees that Eqs. (12) and (13) are negatively biased and therefore will give rates of false negatives significantly in excess of what was nominally specified by the choice of q in selecting δp,q. In contrast, Eqs. (1) and (2) are unbiased for all degrees of freedom, and comparison of Eqs. (1) and (13) and of Eqs. (2) and (12) shows that the experimental detection limits given by Eqs. (12) and (13) are negatively biased by factors of δp,q/(tp + tq). In general, for all possible combinations of p, q, and ν,

zp + zq < δp,q < tp + tq.   (14)
As an example, for p = 0.05 = q and ν = 4, zp + zq ≈ 2(1.64485) = 3.28970 < δp,q ≈ 4.06728 < tp + tq ≈ 2(2.13185) = 4.26370. Hence, a bias compensation factor (bcf) may be defined as

bcf ≡ (tp + tq)/δp,q   (15)

and its use will be demonstrated in the Monte Carlo simulations below.
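The numbers in this example are easy to reproduce. The sketch below is illustrative only; it assumes SciPy is available, and it assumes the usual definition of δp,q as the noncentrality parameter for which a noncentral t variate with ν degrees of freedom falls at or below the central critical value tp with probability q:

```python
from scipy.optimize import brentq
from scipy.stats import nct, norm, t

p = q = 0.05
nu = 4  # degrees of freedom for Model 1 of Table 1

zp, zq = norm.ppf(1 - p), norm.ppf(1 - q)
tp, tq = t.ppf(1 - p, nu), t.ppf(1 - q, nu)

# delta_{p,q}: the noncentrality value for which P[T'(nu, delta) <= t_p] = q
# (assumed definition, consistent with the tabulated values of refs. [7,13])
dpq = brentq(lambda d: nct.cdf(tp, nu, d) - q, 0.0, 50.0)

print(zp + zq, dpq, tp + tq)    # ~3.2897 < ~4.0673 < ~4.2637, i.e. Eq. (14)
print((tp + tq) / dpq)          # bias compensation factor of Eq. (15), ~1.048
```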
Assuming for the moment that σ0 is known, but Eq. (7) is used despite this, then Eqs. (9) and (10) solve a Neyman–Pearson hypothesis test upon which Currie's schema is founded. This is exactly as Clayton et al. demonstrated, but it is important to realize that this solution is unique only for their stated alternative hypothesis, which is that the true detection limit is a constant (i.e., a theoretical parameter) that is greater than zero. Thus, with the standard measurement protocol given in Part 1, the net response will be distributed as

net response ∼ N:δp,qσ0η^1/2, σ0η^1/2 = N:YD′, σ0η^1/2   (16)
where YD′ is the requisite theoretical Currie detection limit as per Eq. (10). However, with the reality that experimental detection limits must be random variates, there is another viable alternative hypothesis that involves replacing YD′ in Eq. (16) with yD from Eq. (1). Note that yD is a scaled χ distributed random variate with PDF given by

pu(u) = [ν^(ν/2) u^(ν−1) e^(−νu²/2(kσ0)²) / ((kσ0)^ν 2^((ν−2)/2) Γ(ν/2))] U(u)   (17)

where k = (tp + tq)η^1/2, u = ks0 and U(•) is the unit step function. Then the net response will be distributed as

net response ∼ N:yD, σ0η^1/2 = N:(tp + tq)s0η^1/2, σ0η^1/2   (18)
which is a compound normal distribution [10], i.e., a distribution in which one (or both) of the population parameters is a random variate. This will be elaborated more fully after the Monte Carlo results are presented.

At this point, it might seem that the noncentral t distribution has gone away, but this is incorrect: it lies at the very heart of content domain experimental limits of decision and detection. Consider Eq. (28) from Part 1:

xC = tpη^1/2 Sxx^1/2 / tslope   (19)
where tslope = b / sb. This equation consists of the product of three positive constants, divided by a random variate, tslope, that is itself the quotient of a Gaussian variate

b ∼ N:β, σb   (20)

and the χ variate sb. Then

tslope ≡ b/sb = (b/σb)/(sb/σb)   (21)

so that

b/σb ∼ N:β/σb, 1 ≡ N:δ, 1   (22)

with δ ≡ β/σb and

sb/σb = (s0/Sxx^1/2)/(σ0/Sxx^1/2) = s0/σ0 ∼ (χ²/ν)^1/2.   (23)

Accordingly, as shown by the defining equations in the Appendix,

tslope ∼ t(u|ν, δ) = t(u|N − 2, β/σb)   (24)

where u is the independent variable and δ is the noncentrality parameter. Thus, for homoscedastic, Gaussian noise and OLS processing of the calibration curve data, the t test statistic of the slope, tslope, is distributed in accordance with the noncentral t distribution. Note that tslope is a positively biased point estimate of δ and it is essentially the experimental (amplitude) signal-to-noise ratio of the CMS.

If x is a random variate with PDF px(x), and y = a/x, where a is a positive constant, then the PDF of y is py(y) = (a/y²) × px(a/y) [12]. Hence xC is distributed as the scaled reciprocal of a noncentral t variate having N − 2 degrees of freedom and noncentrality parameter δ ≡ β/σb. For lack of a better or nonambiguous name, this is referred to as a modified noncentral t distribution. Explicitly,

xC ∼ (tpη^1/2 Sxx^1/2/xC²) t(tpη^1/2 Sxx^1/2/xC | N − 2, β/σb).   (25)

Since xD functionally differs from xC only in having a factor of (tp + tq) in place of tp, the PDF of xD is also a modified noncentral t distribution:

xD ∼ ((tp + tq)η^1/2 Sxx^1/2/xD²) t((tp + tq)η^1/2 Sxx^1/2/xD | N − 2, β/σb).   (26)

For the overplots shown in Fig. 8 in Part 1, these two PDFs were evaluated with the following values: β = 1, σ0 = 0.1, N = 18 (3 replicates at each of xi = 1, 2, …, 6), M = 5, η^1/2 ≈ 0.699206, Sxx^1/2 ≈ 7.245688, σb = σ0/Sxx^1/2, t0.05 = 1.74588, and t0.01 = 2.58349. The fits were excellent.

3. Experimental

To test the above developments, extensive Monte Carlo calculations were performed. All of the software and hardware details are unchanged from Part 1, quo vide. Unless noted otherwise, the linear calibration curve parameters used in the Monte Carlo simulations are as given in Table 1. As in Part 1, various combinations of p and q were used, with values of

Table 1
Model parameters used in the Monte Carlo simulations

Parameters            Model 1                                  Model 2 and 2A
α, β, σ0, N, M, ν     1, 1, 0.1, 6, 3, 4                       1, 1, 0.1, 18, 5, 16
Standards values xi   1 measurement each of xi = 1, 2, …, 6    3 replicates each of xi = 1, 2, …, 6
σb                    0.1/(35/2)^1/2 ≈ 0.02390457218           0.1/(105/2)^1/2 ≈ 0.01380131119
δ                     ≈ 41.83300134                            ≈ 72.45688371
η^1/2                 (6/5)^1/2 ≈ 1.095445115                  (22/45)^1/2 ≈ 0.6992058988
p, q                  0.05, 0.05                               0.05, 0.01 or 0.05
tp, tq                2.13185, 2.13185                         1.74588, 2.58349 or 1.74588
δp,q                  4.06728                                  4.15529 or 3.44041
Calibration curves    10 sets of 1 million each                10 sets of 1 million each
0.05 and 0.01 being most frequently employed. It is important to note that one complete calibration curve, and all associated test statistics and computations, were performed for each simulation step, for one million steps per simulation and 10 simulations in a simulation set. Fig. 2 shows a typical Monte Carlo simulation model (Model 1 in Table 1) that was used to determine how well Eqs. (1), (10) and (13) performed in terms of their rates of false negatives. As for the simulations in Part 1, this is both the computer model itself and also its own self-documenting block diagram. For each simulation step, M = 3 independent true blank replicates were averaged and then blank-corrected by subtracting the OLS intercept from their sample mean. This specimen under test (SUT), distributed as SUT ∼ N:0, σ0η^1/2, is then used to prepare 3 "net responses" by separately adding YD′ (Eq. (10)), yD′ (Eq. (13)) and yD (Eq. (1)) to the given SUT variate. These are designated by NR1, NR2, and NR3, respectively. Note that, in Fig. 2, Eq. (1) is implemented by starting with yD′ from Eq. (13) and fixing it by multiplying by the bias compensation factor in Eq. (15). In Fig. 2, p = 0.05 = q, so in a given simulation of 1 million steps, it is expected that 50 000 (= 0.05 × 10^6) false negatives should occur. In a set of ten simulations, the mean numbers of false negatives should approximate 50 000, with relatively small dispersion. This particular model has only 4 degrees of freedom, so it provides a stringent test of Eqs. (1), (10) and (13).
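The essential logic of one simulation step in Fig. 2 can be summarized in a short script. The sketch below is an illustrative stand-in for the block diagram model, not the original software; it assumes NumPy and SciPy, uses the tabulated δp,q value from Table 1, and uses fewer steps than the 10^6 per simulation employed in the paper:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)

# Model 1 parameters (Table 1); alpha = intercept, beta = slope
alpha, beta, sigma0 = 1.0, 1.0, 0.1
x = np.arange(1.0, 7.0)            # one standard each at 1..6, so N = 6, nu = 4
N, M = x.size, 3                   # M = number of blank replicates averaged
nu = N - 2
Sxx = np.sum((x - x.mean())**2)    # = 17.5
eta_sqrt = np.sqrt(1/M + 1/N + x.mean()**2 / Sxx)   # ~1.0954

p = q = 0.05
tp, tq = t.ppf(1 - p, nu), t.ppf(1 - q, nu)
dpq = 4.06728                      # critical noncentrality value for p = q = 0.05, nu = 4 (Table 1, ref. [7])
YDprime = dpq * sigma0 * eta_sqrt  # Eq. (10), requires sigma0

steps = 200_000                    # the paper uses 10^6 per simulation; fewer here for speed
fn = np.zeros(3, dtype=int)        # false-negative counters for NR1, NR2, NR3

for _ in range(steps):
    y = alpha + beta * x + rng.normal(0, sigma0, N)     # one calibration curve
    b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # OLS slope
    a = y.mean() - b * x.mean()                         # OLS intercept
    s0 = np.sqrt(np.sum((y - a - b * x)**2) / nu)       # standard error about regression
    yC = tp * s0 * eta_sqrt                             # Eq. (7), experimental decision limit
    SUT = rng.normal(alpha, sigma0, M).mean() - a       # blank-corrected specimen under test
    NR = SUT + np.array([YDprime,                       # NR1, Eq. (10)
                         dpq * s0 * eta_sqrt,           # NR2, Eq. (13), negatively biased
                         (tp + tq) * s0 * eta_sqrt])    # NR3, Eq. (1), unbiased
    fn += NR < yC                                       # count false negatives

print(fn / steps)   # expect ~0.050, ~0.0625, ~0.050 (cf. the data tables of Fig. 2)
```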
4. Results and discussion

The heart of the model is the block named "Linear calibration curves" at left center. This contains the desired calibration curve parameters, together with requisite constants (e.g., critical t values) and it generates all the calibration curve data and associated sample test statistics, e.g., slopes, intercepts, their standard errors, the standard error about regression, and so on. It also generates the experimental Currie decision limit in the net response domain, yC, via Eq. (7). For each simulation step, the generated variates are used for further calculations, as necessary, and 6 such variates are binned into the histogram blocks having letter or numeral designations to their right. The 6 binned variates are tslope, sr, yC, NR1, NR2, and NR3. Fig. 3 shows the normalized histogram of 10^7 tslope variates, overplotted with the noncentral t distribution (Eq. (24)) with ν = 4 and δ ≈ 41.833. Agreement is excellent. The population mean and population standard deviation of the tslope variates may be computed via Eqs. (A2) and (A3) in the Appendix, giving E[tslope] ≈ 52.429 and (var[tslope])^1/2 ≈ 27.436. For comparison, the histogram results were 52.434 and 27.534, respectively. The positive bias of tslope was about 25%. The sr histogram was similarly well fit (not shown) by Eq. (8), with k ≡ 1. Histograms for the remaining 4 variates are shown and discussed below.
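The quoted population mean and standard deviation of tslope follow directly from Eqs. (A2) and (A3) of the Appendix; a minimal numerical check (illustrative only, using just the Python standard library) is:

```python
from math import gamma, sqrt

# Moments of a noncentral t variate, Eqs. (A2)/(A3), for Model 1: nu = 4, delta ~ 41.833
nu, delta = 4, 41.833

mean = delta * sqrt(nu / 2) * gamma((nu - 1) / 2) / gamma(nu / 2)   # Eq. (A2)
var = nu / (nu - 2) * (1 + delta**2) - mean**2                      # algebraically equal to Eq. (A3)
print(mean, sqrt(var))   # ~52.43 and ~27.44, as quoted above for E[tslope] and (var[tslope])^1/2
```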
Fig. 2. Typical Monte Carlo simulation model showing how rates of false negatives are computed in the theoretical and experimental net response domains.
Fig. 3. Plot showing normalized histogram (markers) of sample test statistics of the slopes for 10^7 independent calibration curves for Model 1 in Table 1. Overplotted is the noncentral t distribution in Eq. (24), for ν = 4 and δ ≈ 41.833.
4.1. Quadrant 1: the theoretical, net response domain

In the upper portion of Fig. 2, it is assumed that σ0 is both unknown and known, so that the experimental net response domain Currie decision limit, yC, is the χ distributed random variate given by Eq. (7), but the theoretical net response domain Currie detection limit, YD′, is given by Eq. (10). Since δ0.05,0.05 ≈ 4.06728 for ν = 4 [7], and σ0η^1/2 = 0.1 × (6/5)^1/2, YD′ ≈ 0.445548. Therefore NR1 should be normally distributed as

NR1 ∼ N:YD′, σ0η^1/2 ≈ N:0.445548, 0.109544.   (27)

For the 10^7 binned NR1 results in histogram block D, the sample mean was 0.445549 and the sample standard deviation was 0.109546, so agreement was excellent. These NR1 variates, judged against their corresponding yC random variates, resulted in 50 041.9 ± 230.1 false negatives (sample mean ± sample standard deviation, for 10 simulations). This is shown in the small data table 1 at the upper right of Fig. 2. Thus no bias was found, and the Monte Carlo results are consistent with theory, as expected. The histograms for yC and NR1 are shown in Fig. 4, without (for clarity purposes) their overplotted theoretical PDFs from Eqs. (8) and (27), respectively. But note carefully that, if σ0 is known, as assumed in this section, then the true theoretical Currie decision and detection limits for this CMS model are YC ≈ 0.180185 (Eq. (3)) and YD ≈ 0.360369 (Eq. (4)). These are clearly superior, in all regards, to the yC, YD′ pair, and the Monte Carlo performance of the YC, YD pair (not shown) is equally as good as that of the yC, YD′ pair. Thus, YD′ would never be used, even in the very unlikely event that σ0 were known.

4.2. Quadrant 2: the experimental, net response domain

Since σ0 is almost never known, the remainder of the model in Fig. 2 explores what happens when Eqs. (13) (for yD′) and (1) (for yD) are put to the test. As noted above, Eq. (13) is guaranteed to be negatively biased, so the only real question is how badly this increases the rate of false negatives. By contrast, Eq. (1) cannot be biased and the Monte Carlo results should readily confirm this. The results are given in the small data tables 2 and 3 at the right of Fig. 2. For yD′, data table 2 shows that 62 490.0 ± 216.2 false negatives occurred. This is very poor performance for ν = 4: the obtained q was about 25% higher than the q = 0.05 used in initially specifying δp,q. In contrast, data table 3 shows that yD gives excellent performance for exactly the same underlying set of SUT variates: 49 975.6 ± 235.5 false negatives. The only difference between Eqs. (13) and (1) is that the former has a factor of δp,q, which is replaced with a sum of critical t values in the latter. Fig. 4 shows the histograms for yC, NR1, NR2, and NR3. Both NR2 and NR3 are examples of compound normal distributions [10], since they are

NR2 ∼ N:yD′, σ0η^1/2 = N:δp,qs0η^1/2, σ0η^1/2   (28)

and

NR3 ∼ N:yD, σ0η^1/2 = N:(tp + tq)s0η^1/2, σ0η^1/2   (29)

and, as may be clearly seen in Fig. 4, the histograms are quite similar in appearance. Nevertheless, the yD variates binned in the NR3 histogram, with sample mean and sample standard deviation of 0.4390 and 0.1933, respectively, perform much
Fig. 4. Plot showing normalized histograms, in the net response domain, for 10^7 independent values each of the Currie decision limit, yC, and the three net response variates NR1, NR2, and NR3.
better than the yD′ variates binned in the NR2 histogram, with sample mean and sample standard deviation of 0.4188 and 0.1873, respectively. Evidently, small differences matter.

4.3. Quadrant 3: the theoretical, content domain

In transferring to the content domain, all that is necessary is to divide, as appropriate, by β, if it is known, or by b, otherwise, taking care not to use both in a comparison. Fig. 5 shows the content domain version of Fig. 2, with the deliberate mistake, at the top left of the figure, of dividing δp,qσ0η^1/2 by β even though xC and the SUT involved division by b. Even though b is an unbiased point estimate of β, data table 1 shows that careless mixing of b and β usage causes significant bias. This bias entirely vanishes, as it must, if either β (much preferably) or b is consistently used throughout. One difference relative to the response domain is that NR1 is no longer normally distributed if b, rather than β, is the divisor: the distribution would be a relatively complicated compound normal distribution:

NR1 ∼ N:δp,qσ0η^1/2/b, σ0η^1/2/b   (30)
where both population parameters of the underlying normal distribution are themselves distributed as scaled reciprocals of normal variates.

4.4. Quadrant 4: the experimental, content domain

The remaining quadrant is entirely consistent with quadrant 2 since all of the variates to be compared, i.e., NR2, NR3, and xC, are simply b⁻¹ times the respective variates from quadrant 2. For xD′, data table 2 shows that 62 540.3 ± 331.1 false negatives occurred, which is statistically the same as for yD′. In contrast, data table 3 shows that xD gives excellent performance: 49 996.5 ± 333.8 false negatives. Fig. 6 shows the histograms for xC, xD, NR1, NR2 and NR3. It also shows the PDFs for xC and xD, computed via Eqs. (25) and (26), respectively, using the Model 1 values in Table 1. The PDFs of xC and xD were computed using the simulation model shown in Fig. 7. The ν and δ values are as in Eqs. (25) and (26), the 'k' value is tp, for xC, or tp + tq, for xD, and the 'scale factor' is η^1/2 Sxx^1/2. This simulation also computes the cumulative distribution function (CDF), confidence intervals and the expectation value of the variate, i.e., of xC or xD. For the xC results in Fig. 6, the expectation value and 95% CI from the 10^7 binned histogram results were 0.219639 and 0.0799 to 0.390, respectively. The corresponding results obtained using numerical integration of the theoretical PDF, via the model in Fig. 7, were 0.219643 and 0.0815 to 0.391, respectively. For the xD results in Fig. 6, the expectation value and 95% CI from the 10^7 binned histogram results were 0.439277 and 0.160 to 0.780, respectively. The corresponding results obtained using numerical integration of the theoretical PDF, via the model in Fig. 7, were 0.432985 and 0.163 to 0.782, respectively. Again, agreement on all counts is excellent. The PDFs for NR2 and NR3 are even more complicated than that of NR1 in Eq. (30) since both population parameters of the normal distributions are themselves distributed as modified noncentral t variates.
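The same theoretical expectation values and 95% confidence intervals can be obtained without the simulation model of Fig. 7, by direct numerical work on the modified noncentral t PDFs of Eqs. (25) and (26). The sketch below is illustrative only; it assumes SciPy, and the finite integration range is a convenience justified by the negligible noncentral t density outside it:

```python
from scipy.integrate import quad
from scipy.stats import nct

# Model 1 values (Table 1)
nu, delta = 4, 41.833
tp = tq = 2.13185
eta_sqrt, Sxx_sqrt = (6 / 5)**0.5, 17.5**0.5

def reciprocal_nct_stats(k):
    """Expectation and central 95% interval of x = a/T', with T' ~ noncentral t(nu, delta)
    and a = k*eta^1/2*Sxx^1/2, i.e. the modified noncentral t distribution of Eqs. (25)/(26)."""
    a = k * eta_sqrt * Sxx_sqrt
    mean = quad(lambda u: (a / u) * nct.pdf(u, nu, delta), 1.0, 1000.0)[0]
    # x = a/u decreases monotonically in u, so the quantiles simply swap ends
    ci = (a / nct.ppf(0.975, nu, delta), a / nct.ppf(0.025, nu, delta))
    return mean, ci

print(reciprocal_nct_stats(tp))        # xC: mean ~0.2196, 95% CI ~(0.082, 0.39)
print(reciprocal_nct_stats(tp + tq))   # xD: mean ~0.439, 95% CI ~(0.163, 0.78)
```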
Fig. 5. Typical Monte Carlo simulation model showing how rates of false negatives are computed in the theoretical and experimental content domains.
Fig. 6. Plot showing normalized histograms, in the content domain, for 10^7 independent values each of the Currie decision limit, xC, Currie detection limit, xD, and the three net response variates NR1, NR2, and NR3. The xC and xD histograms are overplotted with their respective theoretical PDFs.
Fig. 7. Simulation model used to numerically evaluate the PDFs and associated variates for xC and xD.
4.5. More degrees of freedom, but no escape
Model 2 in Table 1 has 16 degrees of freedom, compared to only 4 for Model 1 and it also has M = 5 replicates averaged for the SUT. Both of these factors decrease η significantly, thereby lowering all decision and detection limits. However, Model 2 has q = 0.01, as some authors have suggested as desirable [6,14], rather than q = 0.05, which is the IUPAC recommended default value [4]. Table 2 shows the Monte Carlo results for this model, for all four quadrants, and again Eqs. (12) and (13) alone perform very poorly, this time exhibiting rates of false negatives that are about 42% too high, even though ν was significantly increased. Model 2A in Table 1 differs from Model 2 in having q = 0.05, as for Model 1, and the other necessary changes in tq and δp,q. This results in improved performance from Eqs. (12) and (13), with rates of false negatives that average only about 9.5% too high. For all 3 models, the NR1 and NR3 results are as expected statistically. Clearly, aside from the obvious conclusion that increasing ν is beneficial, it is also seen that relatively low values of ν should especially be avoided if low rates of false negatives are of importance and Eqs. (12) or (13) are used.
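These excess rates of false negatives can also be verified without simulation, by averaging the normal 'miss' probability over the χ distribution of s0. The following sketch is illustrative only (SciPy assumed; the δp,q values are those of Table 1); it reproduces the approximately 42% and 9.5% excesses quoted above and confirms that the (tp + tq)s0η^1/2 form of Eq. (1) yields the nominal q:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi, norm, t

def false_negative_rate(k, p, nu):
    """P(false negative) when the detection limit is k*s0*eta^1/2 and the decision limit
    is tp*s0*eta^1/2, with s0/sigma0 ~ chi_nu/sqrt(nu) (homoscedastic, OLS case)."""
    tp = t.ppf(1 - p, nu)
    integrand = lambda s: norm.cdf((tp - k) * s) * chi.pdf(s, nu, scale=1 / np.sqrt(nu))
    return quad(integrand, 0, np.inf)[0]

nu = 16
tp = t.ppf(0.95, nu)   # p = 0.05

# Model 2 (q = 0.01): Eq. (13) with delta_{p,q} = 4.15529 versus Eq. (1) with tp + tq
print(false_negative_rate(4.15529, 0.05, nu))               # ~0.0142, ~42% above the nominal 0.01
print(false_negative_rate(tp + t.ppf(0.99, nu), 0.05, nu))  # ~0.0100, exactly as specified

# Model 2A (q = 0.05): delta_{p,q} = 3.44041 versus Eq. (1)
print(false_negative_rate(3.44041, 0.05, nu))               # ~0.0548, ~9.5% above the nominal 0.05
print(false_negative_rate(2 * tp, 0.05, nu))                # ~0.0500
```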
4.6. The MARLAP correction

Since it is obvious that Eqs. (12) and (13) cannot be unbiased, and this is readily verified in the simulations detailed above, the Multi-Agency Radiological Laboratory Analytical Protocols (MARLAP) Manual [16] attempts to correct the bias in Eqs. (12) and (13) by replacing σ0 by s0/c4, where c4 is the expectation value of s0, divided by σ0:

c4 ≡ c4(ν) ≡ E[s0]/σ0 = (2/ν)^1/2 Γ[(ν + 1)/2]/Γ[ν/2]   (31)
Note that c4 must always be less than unity, so this correction is in the right direction. However, just as confidence interval construction requires either the product of a critical z value and a population standard deviation, or the product of a critical t value and a sample standard deviation, and nothing else is a usable substitute, so the same restriction is present here. Ironically, if it were possible to construct a viable confidence interval with the product of a critical z value and s0 / c4, this would provide a means of circumventing use of t values, which is not possible.
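A minimal check of Eq. (31) (illustrative only, Python standard library) makes the size of the correction concrete for Model 1 (ν = 4, p = q = 0.05):

```python
from math import gamma, sqrt

def c4(nu):
    """Eq. (31): c4(nu) = E[s0]/sigma0, always less than unity."""
    return sqrt(2 / nu) * gamma((nu + 1) / 2) / gamma(nu / 2)

# Model 1: compare the multipliers of s0*eta^1/2 in the competing detection limit expressions
dpq, tp_plus_tq = 4.06728, 4.26370
print(c4(4))                            # ~0.940
print(dpq, tp_plus_tq, dpq / c4(4))     # 4.067 < 4.264 < 4.327: the MARLAP value overshoots the unbiased one
```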
Table 2
Rates of false negatives obtained via Monte Carlo simulation

Model 1 (p = 0.05; q = 0.05; ν = 4):
  Quadrant 1 (net response domain, theoretical): 50 041.9 ± 230.1
  Quadrant 2 (net response domain, experimental): 62 490.0 ± 216.2 (NR2); 49 975.6 ± 235.5 (NR3)
  Quadrant 3 (content domain, theoretical): 50 032.2 ± 187.3 (only b used)
  Quadrant 4 (content domain, experimental): 62 540.3 ± 331.1 (NR2); 49 996.5 ± 333.8 (NR3)

Model 2 (p = 0.05; q = 0.01; ν = 16):
  Quadrant 1 (net response domain, theoretical): 10 023.6 ± 97.1
  Quadrant 2 (net response domain, experimental): 14 222.9 ± 134.1 (NR2); 10 015.5 ± 98.7 (NR3)
  Quadrant 3 (content domain, theoretical): 9997.7 ± 89.2
  Quadrant 4 (content domain, experimental): 14 178.6 ± 134.1 (NR2); 9988.6 ± 112.9 (NR3)

Model 2A (p = 0.05; q = 0.05; ν = 16):
  Quadrant 1 (net response domain, theoretical): 49 996.9 ± 121.1
  Quadrant 2 (net response domain, experimental): 54 815.4 ± 250.7 (NR2); 50 019.2 ± 213.0 (NR3)
  Quadrant 3 (content domain, theoretical): 49 923.0 ± 164.1
  Quadrant 4 (content domain, experimental): 54 694.0 ± 249.8 (NR2); 49 925.3 ± 213.2 (NR3)

N.B.: Numbers are sample means ± sample standard deviations for averages of 10 simulations of 10^6 calibration curves each.
Fig. 8. One of 10 Monte Carlo simulation models used to determine rates of false positives and false negatives for 8 expressions under test.
In the interests of completeness, Fig. 8 shows one of the 10 models used to test 8 of the possible Currie decision and detection limit equations. The model shown is for ν = 2, with the calibration curve having 4 standards (1, 2, 3 and 4). The other 9 models were constructed for ν = 1 and ν = 3 through 10, with evenly spaced standards in the obvious pattern. For each model, appropriate critical values were used for zp, zq, tp, tq and δp,q. Values of √η were also modified as appropriate. The remaining parameters were α = 0.5, β = 3, σ0 = 0.1 and p = 0.05 = q. As in Figs. 2 and 5, the bracketed numbers at the far right of Fig. 8 identify the tested expression. In this case, (1) is a test of false positives using Eq. (3) while (2) tests false negatives via Eq. (4). For both of these, YC is used as the decision limit, but yC is the decision limit for the remaining 6 under test. Next, (3)
is a test of false negatives using Eq. (10), (4) tests false positives using Eq. (7), and (5) tests false negatives via Eq. (1). All five of these equations should work well, for all degrees of freedom, when paired, as here, with their appropriate decision limits. The last three all test false negatives. They are (6), which tests Eq. (13), then (7), which tests what would happen if σ0 in Eq. (4) were replaced by s0, while foolishly retaining the use of the critical z values from Eq. (4), and, lastly, (8) tests the MARLAP correction described above, with c4 values computed via Eq. (31). Thus the MARLAP equation is simply Eq. (13) divided by c4. Table 3 gives the results for (1), (2), (3) and (4), while Table 4 gives the remaining results. Beginning with Table 3, columns (1) and (2) show the rates of false positives and negatives, respectively, then the sample standard deviations for the 10
simulations of 1 million results each, and then, at the bottom of each table box, either YC, for (1), or YD, for (2). Column (3) gives the equivalent results for Eq. (10), with YD′ as the bottom entry in each box in the table. Note that, as expected, YD < YD′ for ν = 1, 2, …, 10 and this pattern must hold for all degrees of freedom. For the remaining columns, σ0 is assumed unknown, so there is no access to YC or YD, hence there are only two entries per box in the tables. Column (4) of Table 3 and column (5) of Table 4 show that Eqs. (7) and (1), respectively, also work as expected. Not surprisingly, column (6) shows that Eq. (13) works poorly, producing positively biased rates of false negatives, except for anomalously 'less bad' performance at ν = 1. Column (7) is duly atrocious, exactly as expected, while column (8), testing the MARLAP correction, shows that significant over-correction has occurred: the detection limits are now positively biased, resulting in negatively biased rates of false negatives. With the exception of ν = 1, the MARLAP correction is clearly less biased than the uncorrected Eq. (13), and the bias in the rates of false negatives is in the conservative direction, but the results in column (5) clearly show that Eq. (1) is unbiased and Eq. (1) also has the advantage of being both
Table 3
Monte Carlo error rates, true decision and true detection limits for items (1) through (4) in Fig. 8

Column (1): zpσ0, σ0 known, false positives (entries: mean, sample standard deviation, YC)
Column (2): (zp + zq)σ0, σ0 known, false negatives (entries: mean, sample standard deviation, YD)
Column (3): δp,qσ0, σ0 both known and unknown, false negatives (entries: mean, sample standard deviation, YD′)
Column (4): tps0, σ0 unknown, false positives (entries: mean, sample standard deviation)

ν = 1:  (1) 49939.2, 206.7, 0.300   (2) 50019.9, 213.8, 0.601   (3) 50010.5, 200.0, 2.288   (4) 50006.4, 202.8
ν = 2:  (1) 49912.6, 204.4, 0.260   (2) 49979.9, 257.7, 0.520   (3) 49975.8, 307.8, 0.872   (4) 49938.6, 182.0
ν = 3:  (1) 49978.7, 225.9, 0.238   (2) 50185.9, 250.9, 0.477   (3) 49959.7, 180.4, 0.646   (4) 50051.5, 151.3
ν = 4:  (1) 49948.6, 243.9, 0.225   (2) 49935.5, 142.1, 0.449   (3) 49989.3, 209.6, 0.556   (4) 49903.3, 213.7
ν = 5:  (1) 49972.0, 181.2, 0.215   (2) 50129.1, 215.8, 0.431   (3) 50024.9, 266.4, 0.507   (4) 49922.1, 197.3
ν = 6:  (1) 49913.5, 200.0, 0.209   (2) 50066.4, 239.7, 0.417   (3) 50095.6, 245.0, 0.476   (4) 49927.7, 193.5
ν = 7:  (1) 50024.3, 199.1, 0.203   (2) 49876.3, 265.8, 0.407   (3) 49904.6, 199.3, 0.454   (4) 50028.9, 96.1
ν = 8:  (1) 49922.9, 253.3, 0.199   (2) 50016.5, 213.1, 0.398   (3) 50039.3, 180.4, 0.438   (4) 49929.2, 174.4
ν = 9:  (1) 49956.1, 226.5, 0.196   (2) 49912.5, 297.4, 0.392   (3) 49926.6, 263.2, 0.426   (4) 49932.1, 211.0
ν = 10: (1) 49948.2, 220.1, 0.193   (2) 50062.3, 233.8, 0.386   (3) 50033.7, 223.5, 0.416   (4) 49922.5, 155.4
Table 4
Monte Carlo false negative error rates for items (5) through (8) in Fig. 8

Column (5): (tp + tq)s0, σ0 unknown
Column (6): δp,qs0, σ0 unknown
Column (7): (zp + zq)s0, σ0 unknown
Column (8): δp,qs0/c4, σ0 unknown
(entries: mean number of false negatives, sample standard deviation)

ν = 1:  (5) 50034.2, 227.8   (6) 50802.5, 233.5   (7) 898382.2, 302.6   (8) 33779.3, 156.6
ν = 2:  (5) 49947.3, 255.3   (6) 60918.3, 297.1   (7) 373829.6, 273.1   (8) 40332.6, 247.7
ν = 3:  (5) 49994.1, 227.4   (6) 63113.3, 299.0   (7) 209173.9, 455.0   (8) 44490.7, 230.3
ν = 4:  (5) 49965.9, 191.4   (6) 62516.8, 211.4   (7) 155532.0, 388.7   (8) 46562.0, 195.2
ν = 5:  (5) 50056.8, 185.4   (6) 61478.5, 205.7   (7) 129390.7, 301.5   (8) 47743.3, 160.5
ν = 6:  (5) 50057.4, 299.6   (6) 60240.7, 259.8   (7) 113527.3, 355.7   (8) 48385.3, 288.6
ν = 7:  (5) 50012.0, 284.1   (6) 59320.6, 316.7   (7) 102874.7, 366.9   (8) 48784.1, 291.8
ν = 8:  (5) 49948.4, 174.2   (6) 58452.0, 194.8   (7) 95320.9, 302.1    (8) 48988.1, 176.2
ν = 9:  (5) 49998.6, 316.6   (6) 57698.6, 342.5   (7) 89574.5, 445.1    (8) 49224.7, 307.8
ν = 10: (5) 50056.5, 125.8   (6) 57163.2, 137.1   (7) 85204.1, 246.0    (8) 49426.8, 117.7
trivially simple and intuitively obvious. Thus the MARLAP correction cannot salvage Eq. (13) (or Eq. (12)).

5. Conclusions

The reported results support a number of important conclusions. First, for the simple CMS system that has been examined, Currie's decision and detection limit schema is correctly implemented via the equations in Fig. 1. Each of these is exact and unbiased for all degrees of freedom. The yC and yD random variates are scaled χ variates, as per Eqs. (8) and (17), respectively. The xC and xD random variates are scaled reciprocals of noncentral t variates, as per Eqs. (25) and (26), respectively. If σ0 is known, then quadrant 1 is always preferred over quadrant 2. If both σ0 and β are known, then quadrant 3 is always preferred over quadrant 4. If only β is known, it is consistently used where b would have been used otherwise. Indeed, b is then simply ignored. If p = q, as is most commonly the case, then in each quadrant the detection limit is exactly twice the decision limit, as intuitively expected.

The second conclusion is that there is never any need for critical values of the noncentrality parameter of the noncentral t distribution. If both σ0 and β are known, then Eqs. (3)–(6) are optimum, being errorless real numbers that are as low as possible. In contrast, Eq. (7) would not be used if σ0 were known, which automatically eliminates Eqs. (9) and (10). Thus neither the central t distribution nor the noncentral t distribution would have any relevance. If Eq. (7) were used despite σ0 being known, then Eqs. (9) and (10) would produce errorless real numbers that are larger than those produced by Eqs. (6) and (4), respectively. This raises several troubling issues. First, why use s0, as required in Eq. (7), when σ0 is known? Second, since Eqs. (4), (6), (9) and (10) are all equally correct, with none having a
claim to greater statistical, mathematical or logical legitimacy, what is to be done about having two values of the content domain detection limit, from Eqs. (6) and (9), that are both errorless real numbers, yet are different? The same problem arises with the detection limits in the net response domain, i.e., Eqs. (4) and (10). Clearly, there is no reason to use Eqs. (9) and (10) since all of the advantages, and none of the disadvantages, reside with Eqs. (4) and (6). This means that Eqs. (9) and (10), despite being perfectly legitimate, are dead ends.

The third conclusion is that Eqs. (9) and (10) cannot be usefully salvaged by substituting s0 for σ0. As soon as this substitution is made, the statistical legitimacy of Eqs. (9) and (10) is undermined severely, by what is considered to be a mistake of novices [15]: "One of the first things you learn in statistics is to distinguish between the true parameter value of the standard deviation σ and the sample standard deviation s." The resulting Eqs. (12) and (13) are guaranteed to be negatively biased and therefore must result in rates of false negatives that are in excess (i.e., are positively biased) of those specified in the selection of the critical value of the noncentrality parameter of the noncentral t distribution. This is exactly what the Monte Carlo simulations demonstrate quantitatively. Furthermore, it is trivially simple to fix Eqs. (12) and (13): all that is required is to multiply them by the bias compensation factor, Eq. (15), thereby resulting in the unbiased Eqs. (2) and (1), respectively. The MARLAP correction also fails, resulting in positively biased detection limits and correspondingly negatively biased rates of false negatives. This is certainly preferable to elevated rates of false negatives, and the bias magnitudes are smaller, but there is no need for the bias in the first place: it can be eliminated by simply not using the 'critical delta' method.

In reality, there was never actually a contradiction between the pair of Eqs. (9) and (10) and the pair of Eqs. (1) and (2): they are simply based on different alternative hypotheses. The alternative hypothesis relevant to Eq. (10) is "given fixed values of p, q, and ν, and repeated sampling from N:Y, σ0η^1/2, what is the smallest constant value of Y that will result in no more than 100q% false negatives when the decision limit is given by Eq. (7)?" This minimum Y value is YD′ in Eq. (10) and, divided by β, it is XD′ in Eq. (9). This is the solution given by Clayton et al. [7, Eq. (17)]. For Eq. (1), the relevant alternative hypothesis is "given fixed values of p, q, and ν, and repeated sampling from the compound normal distribution N:yD, σ0η^1/2, what χ distributed random variate yD will result in no more than 100q% false negatives when the decision limit is given by Eq. (7)?" Then Eq. (2) gives the experimental detection limit in the content domain and it is distributed as a modified noncentral t variate, as per Eq. (26).

Given that there is no reason to ever use Eqs. (9), (10), (12) and (13), there is no need for critical values of the noncentrality parameter of the noncentral t distribution, except possibly to correct existing published LODs that were based on use of Eqs. (12) or (13). This would be done via multiplication by the bcf in Eq. (15). For MARLAP-based results, an additional factor of c4 would be necessary. Fortunately, existing published detection
limits based on Eqs. (12) or (13), though negatively biased, are probably not so biased as to be problematic unless the degrees of freedom are small. Upward corrections of ten percent or so are meaningless in a practical sense. For future work, however, there is no reason to employ methodology that is needlessly complicated and knowingly results in non-conservatively biased results. Similar considerations apply to the less biased MARLAP-based detection limits.

A fourth conclusion is that, to the extent that national and international standards organizations have adopted the 'critical values of the noncentral t distribution' methodology [16–19], this should be discarded due to its inherent bias and, instead, the unbiased Currie schema given in Fig. 1 should be adopted.

In Part 3, weighted least squares (WLS) extension of Currie's decision and detection schema to heteroscedastic, Gaussian white noise is demonstrated, with heteroscedasticities as high as μ = 0.45 [4] being easily handled. In Part 4, Boumans' RSDB-BEC methodology [20], and related simple LODs that neglect type II errors, are investigated and comprehensive recommendations are made with regard to both future harmonization of LOD methodology and standardized nomenclature.

The author thanks S. Nadarajah and S. Kotz for inspiration for the present work [21].

Appendix. Noncentral t probability density function

Consider the quotient of two independent random variates, x and y, where x ∼ N:δ, 1, y ∼ (χ²/ν)^1/2, "∼" means "is distributed as", ν is degrees of freedom and δ is the noncentrality parameter. Thus x is simply a normally distributed (i.e., Gaussian) variate and y is a "χ" distributed variate. Then, by definition, u ≡ x/y is distributed as the noncentral t distribution [11, p. 191]:

t(u|ν, δ) = C (1 + u²/ν)^(−(ν+1)/2) e^(−νδ²/2(ν+u²)) Hh(k)   (A1)

where

C ≡ ν!/[2^((ν−1)/2) Γ(ν/2)(πν)^1/2], k ≡ −uδ/(ν + u²)^1/2, Hh(k) ≡ ∫ from 0 to ∞ of (w^ν/ν!) e^(−(w+k)²/2) dw

and Hh(k) is the Airey integral [8, p. 515]. If δ = 0, the noncentral t distribution reduces to the customary central "Student" t distribution. Where there is no risk of confusion, tu(u|ν, δ) will be denoted by t(u|ν, δ) or even more concisely by t(ν, δ). Johnson et al. [8, p. 508] use the notations t′ν(δ), t′ν or simply t′ if no confusion can result. The notation in Eq. (A1) is necessary since the independent variable, u, may be the transform of another variable. As ν goes to infinity, both the central and noncentral t distributions asymptotically become Gaussian [8, p. 519]. Note that there are numerous incorrect published expressions for the PDF of the noncentral t distribution; that given above has been extensively tested numerically and is correct. The expectation value of u, for ν > 1, is [13]

E[u] = δ(ν/2)^1/2 Γ((ν − 1)/2)/Γ(ν/2)   (A2)
and the variance of u, for ν > 2, is

var[u] = [ν/(ν − 2)] + δ²{[ν/(ν − 2)] − (ν/2)Γ²((ν − 1)/2)/Γ²(ν/2)}.   (A3)
Confidence intervals are computed in the customary manner by inverting the cumulative distribution function (CDF) of the noncentral t distribution at the desired bounding values. The CDF is simply

CDF(u) ≡ ∫ from −∞ to u of tu(u′|ν, δ) du′.   (A4)
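As a numerical cross-check of Eq. (A1) (illustrative only; SciPy is assumed, and its scipy.stats.nct implements the same distribution), the Hh form can be evaluated by direct quadrature and compared with the library implementation:

```python
from math import exp, factorial, gamma, pi, sqrt
from scipy.integrate import quad
from scipy.stats import nct

def nct_pdf_hh(u, nu, delta):
    """Noncentral t PDF via Eq. (A1), with the Hh(k) integral evaluated numerically."""
    C = factorial(nu) / (2**((nu - 1) / 2) * gamma(nu / 2) * sqrt(pi * nu))
    k = -u * delta / sqrt(nu + u**2)
    Hh = quad(lambda w: (w**nu / factorial(nu)) * exp(-(w + k)**2 / 2), 0, 200)[0]
    return C * (1 + u**2 / nu)**(-(nu + 1) / 2) * exp(-nu * delta**2 / (2 * (nu + u**2))) * Hh

# Spot check against SciPy for nu = 4 and delta = 3; the values agree to quadrature accuracy
for u in (0.0, 2.0, 5.0):
    print(nct_pdf_hh(u, 4, 3.0), nct.pdf(u, 4, 3.0))
```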
References

[1] L.A. Currie, Limits for qualitative detection and quantitative determination. Application to radiochemistry, Anal. Chem. 40 (1968) 586–593.
[2] A. Hubaux, G. Vos, Decision and detection limits for linear calibration curves, Anal. Chem. 42 (1970) 849–855.
[3] L.A. Currie, Detection in analytical chemistry: importance, theory, and practice, in: L.A. Currie (Ed.), ACS Symposium Series 361, ACS, Washington, DC, 1988, pp. 1–62.
[4] L.A. Currie, Detection: international update, and some emerging dilemmas involving calibration, the blank, and multiple detection decisions, Chemom. Intell. Lab. Syst. 37 (1997) 151–181.
[5] L.A. Currie, Detection and quantification limits: origins and historical overview, Anal. Chim. Acta 391 (1999) 127–134.
[6] R.D. Gibbons, D.E. Coleman, Statistical Methods for Detection and Quantification of Environmental Contamination, first ed., John Wiley & Sons, New York, 2001.
[7] C.A. Clayton, J.W. Hines, P.D. Elkins, Detection limits with specified assurance probabilities, Anal. Chem. 59 (1987) 2506–2514.
[8] N.L. Johnson, S. Kotz, N. Balakrishnan, Continuous Univariate Distributions, vol. 2, second ed., John Wiley & Sons, New York, 1995, p. 509.
[9] W.S. Gosset, The probable error of a mean, Biometrika 6 (1908) 1–25.
[10] N.L. Johnson, S. Kotz, N. Balakrishnan, Continuous Univariate Distributions, vol. 1, second ed., John Wiley & Sons, New York, 1994, p. 163.
[11] E.S. Keeping, Introduction to Statistical Inference, Dover, Mineola, NY, 1995.
[12] A. Papoulis, Probability, Random Variables, and Stochastic Processes, second ed., McGraw-Hill, New York, 1984.
[13] D.B. Owen, A survey of properties and applications of the noncentral t-distribution, Technometrics 10 (1968) 445–478.
[14] M.E. Zorn, R.D. Gibbons, W.C. Sonzogni, Weighted least-squares approach to calculating limits of detection and quantification by modeling variability as a function of concentration, Anal. Chem. 69 (1997) 3069–3075.
[15] J.F. Box, Guinness, Gosset, Fisher, and small samples, Stat. Sci. 2 (1987) 45–52.
[16] Multi-Agency Radiological Laboratory Analytical Protocols (MARLAP) Manual, Volume III, Section 20A.3.1, pp. 20-54 to 20-58. United States agencies in MARLAP are EPA, DoD, DoE, DHS, NRC, FDA, USGS and NIST.
[17] B.D. Real, M.C. Ortiz, L.A. Sarabia, Analysis of interferents by means of a D-optimal screening design and calibration using partial least squares regression in the spectrophotometric determination of Cr(VI), Talanta 71 (2007) 1599–1609.
[18] ISO standard 11843, part 2.
[19] Implementing council directive 96/23/EC concerning the performance of analytical methods and the interpretation of results, 2002/657/EC Commission Decision of August 12, 2002.
[20] P.W.J.M. Boumans, Detection limits and spectral interferences in atomic emission spectrometry, Anal. Chem. 66 (1994) 459A–467A.
[21] S. Nadarajah, S. Kotz, Computation of signal-to-noise ratios, MATCH Commun. Math. Comput. Chem. 57 (2007) 105–110.