Chemometrics and Intelligent Laboratory Systems 69 (2003) 77 – 87 www.elsevier.com/locate/chemolab
A new approach to determining the precision and bias of on-line gauges

F. Lombard

Department of Mathematics and Statistics, Rand Afrikaans University, P.O. Box 524, Auckland Park 2006, South Africa

Received 15 November 2002; received in revised form 27 May 2003; accepted 4 June 2003
Abstract

On-line gauges are used in many industries to measure characteristics of raw material destined for a production process. In the coal industry, which is the focus of interest in this paper, the traditional method for determining coal characteristics has been the assay of coal samples. This method requires a great deal of work on a day-to-day basis. It has the further drawback that there is generally a time lag of at least 1 day between sampling of the coal and obtaining numerical measures of its quality characteristics. An interesting recent development is that nuclear gauges, which are capable of producing estimates of coal quality in real time, are now available commercially. The present paper demonstrates how bootstrap methods can be used to obtain a confidence interval for gauge precision using gauge output data alone. No specially constructed reference values from a sampling and assay operation are required. As such, the method is potentially an attractive alternative to the methodology set out in ISO 15239. It is also shown how gauge calibration can be checked using gauge output and one set of reference samples obtained under normal operating conditions. The method, which again uses the bootstrap, involves the construction of a simultaneous confidence band for a calibration function that relates gauge values to reference values.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Precision; Bias; On-line gauges
E-mail address: [email protected] (F. Lombard).
doi:10.1016/S0169-7439(03)00114-X

1. Introduction

In the coal industry, the traditional method for determining coal characteristics has been the assay of coal samples. This method requires a great deal of work on a day-to-day basis, and there is generally a time lag of at least 1 day between sampling the coal and doing the preparation and assays to obtain numerical measures of quality characteristics. An interesting recent development is that Prompt Gamma Neutron Activation Analysis (PGNAA) gauges,
which are capable of producing estimates of coal quality in real time, are now available commercially. An issue of great importance to a user or prospective purchaser of a gauge is its ability to reproduce the "true" values of the various coal characteristics in question. The International Standards Organization has recently produced a protocol [8], based on the Grubbs method of variance estimation [5,6], to establish the degree of measurement precision of on-line gauges. Such an evaluation is typically conducted prior to purchase of a gauge and thereafter perhaps once or twice a year. The method requires that average gauge output over a 6- to 8-h period be compared to assay results from coal samples gathered over the same period of time. At least 60 such comparisons are to be made. Taking into account the time and labor required to collect and prepare a coal sample for assay, this means that about 2 months are required just to gather the requisite data demanded by an evaluation of gauge precision. The computations required to estimate gauge precision are, however, relatively trivial and can be readily done on any modern pocket calculator. A recent discussion of the Grubbs method, together with an example of its application to analyzer evaluation, can be found in Lombard [9].

Implementation of the Grubbs method in this particular context is a cumbersome, expensive and time-consuming business. Furthermore, the sampling protocol deviates substantially from that used under normal operating conditions. Thus, it is a moot point how representative the final results are of those that would pertain under normal operating conditions. Finally, the results can be rather disappointing in that the method often ascribes zero variance to the gauge. That such a result should occur is not at all surprising, however, for it is often the case that gauge measurement error variance is substantially smaller than the total variance attributable to sampling and assay. In using the Grubbs method, one is, therefore, attempting to estimate gauge precision by using a rather less precise instrument (sampling and assay) as the standard. Indeed, Grubbs [5] was aware of this circumstance and did not advocate application of his method in situations where there exist wide disparities between the precisions of the instruments being compared.

In the present paper, a method is proposed that uses a few hours' output from the gauge to produce a reliable estimate of its measurement error variance. There is no need to gather and assay any samples. Thus, gauge precision can be checked much more frequently. The potential savings in time and in financial and human resources that implementation of the proposed method implies are therefore substantial. The computations required cannot, however, be performed on a pocket calculator. A personal computer with software capable of generating normal random numbers and performing nonlinear least-squares fits is required. There are numerous software packages available that are capable of these tasks. All computations in the present paper were programmed in MATLAB. The programs are available from the author on request.
The rationale of the proposed method is simple. The output produced by a gauge over a short time interval (from 1 to 5 min, typically) is a valuable source of information that is ignored completely when the Grubbs method of estimation is used. A salient fact is that there is usually substantial serial correlation between successive gauge readings over short time intervals. The presence of such serial correlation leads to a mode of analysis that obviates entirely the gathering of reference values via sampling and laboratory assay prescribed in ISO 15239 [8]. The main objective of the present paper is to show that, subject to some readily verifiable assumptions, a valid estimate of gauge precision, together with a confidence interval, can be had from gauge output alone. While such estimation can be accomplished in a number of different ways, we develop here a method based on the variogram of observed gauge output. The variogram is widely recognized as a useful tool in geostatistics and is understood and used extensively in the mining industry. In the statistical model we use, gauge variance appears as a "nugget effect" that is to be estimated.

An equally important aspect of gauge evaluation is the determination of its accuracy, that is, the ability of the gauge to measure unbiasedly (i.e. without any offset) the "true" value of the quality characteristic in question. This is equivalent to determining whether the gauge is properly calibrated. In order to answer the question, a "true" value must be defined. Typically, observations reported by a sampling and assay operation are deemed to be unbiased estimates of the "true" value. Thus, the value reported by a gauge is to be compared with the assay value from a reference sample taken from the same body of coal. In this instance, the labor involved in gathering the requisite number of reference samples cannot be avoided. The second objective of the present paper is to demonstrate how an evaluation of gauge calibration can be made if one set of reference samples, obtained under normal operating conditions, is available as a standard of comparison. The proposed method rests on the construction of a confidence band for the bias function, the ideal form of which is a horizontal line that passes through the origin.

The proposed methodology uses the bootstrap method of statistical inference. Two standard references for bootstrap methods are Efron and Tibshirani [3] and Davison and Hinkley [2], to which the reader is referred for a detailed discussion of the underlying philosophy and for a wide variety of bootstrap applications. The layout of the paper is as follows. In Section 2, estimation of gauge measurement variance is discussed. Section 3 illustrates the application of the method to some data. Section 4 considers the question of calibration and illustrates the methodology by application to another set of data. Technical details are kept to a minimum.

On-line gauges of various types are presently used in a number of industries to determine characteristics of raw material destined for a production process. While the present paper focuses on the coal industry, most, if not all, results are applicable if "coal" is replaced by "product" throughout the text.
2. Estimating gauge precision

Consider a hypothetical situation in which a given mass of coal is subjected repeatedly to interrogation by a gauge. The result would be a series of measurements y_1, y_2, ... distributed around a mean X. The corresponding measurement errors e_j = y_j − X will be uncorrelated and distributed around a mean of zero with some unknown standard deviation σ_ε. We assume that successive observations are produced over short time intervals in which approximately equal masses of coal are interrogated. We also assume that the distribution of measurement errors is normal (Gaussian) with standard deviation σ_ε. The X-value is determined primarily by the composition of the coal in question. The intrinsic variability between contiguous coal masses produces successive X-values that are similarly variable and, more importantly, serially correlated over short time spans. Thus, a series of measurements Y_t, t = 1,...,n, can be represented in the form

$$Y_t = \mu + X_t + \varepsilon_t \qquad (1)$$

where X_t, t = 1,...,n, is a serially correlated process with zero expected values and where the measurement errors ε_t are independently distributed with common unknown standard deviation σ_ε. The problem we address in this section of the paper is the estimation of gauge precision, that is, the measurement error standard deviation σ_ε. We assume that the variogram of the X_t process,

$$V_X(l;\theta) = \frac{1}{2}E(X_{t+l} - X_t)^2, \qquad l = 1, 2, \ldots,$$

can be expressed in terms of a small number of parameters θ = (θ_1,...,θ_{p−1}), p ≥ 2. The Y_t process from Eq. (1) then has variogram

$$V(l;\theta,\sigma_\varepsilon) = V_X(l;\theta) + \sigma_\varepsilon^2 \qquad (2)$$

which contains p unknown parameters. The variance σ_ε² is commonly known in the geostatistical literature as a "nugget effect". Notice that V_X must be a nonconstant function of l in order that the parameters θ and σ_ε in Eq. (2) can be estimated separately. The latter assumption will be satisfied if the X_t series exhibits nonzero autocorrelation at one or more lags. This will generally be true if the time interval over which the gauge produces an estimated X_t value is short. On the other hand, if each X_t represents a daily value (480 min, say), then the successive X_t values can be expected to be statistically independent because the effective correlation memory will be a negligible fraction of 480 min. The present analysis would not be applicable in such a case and one would have no recourse other than to gather (at least) two independent reference samples and apply the Grubbs estimation procedure.

An important special case of the preceding setup occurs when X_t is an AR(1) process

$$X_t = \rho X_{t-1} + \delta_t. \qquad (3)$$

The δ_t are assumed independently distributed with mean zero and constant variance σ_δ², and the autocorrelation ρ satisfies 0 < ρ < 1. The variogram of X, found by straightforward calculation from Eq. (3) using the fact that σ_X² = σ_δ²/(1 − ρ²) (see Harvey [7, p. 17]), is

$$V_X(l;\theta) = \sigma_X^2(1 - \rho^l).$$

This can be written in the form

$$V_X(l;\theta) = \sigma_X^2(1 - \exp(l\log\rho)),$$

which is known in geostatistics as the exponential variogram model (see Cressie [1, p. 61]).
Here, one has p = 3, θ = (ρ, σ_X²) and, by direct calculation from Eq. (2),

$$V(l;\theta,\sigma_\varepsilon) = \sigma_X^2(1 - \rho^l) + \sigma_\varepsilon^2. \qquad (4)$$
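To make the model concrete, the short simulation below (our illustration, not code from the paper) generates data from Eq. (1) with an AR(1) X_t and compares the empirical variogram of Y_t with the theoretical form (4). The values of ρ, σ_X and σ_ε are borrowed from the estimates reported in Section 3; the mean μ is an arbitrary choice.

```python
# A minimal sketch: simulate model (1) with an AR(1) X_t and Gaussian
# measurement noise, then compare empirical and theoretical variograms.
import numpy as np

rng = np.random.default_rng(1)
n, rho, sigma_X, sigma_e, mu = 480, 0.62, 1.60, 2.03, 22.0  # mu is arbitrary

# AR(1) with stationary variance sigma_X^2: X_t = rho*X_{t-1} + delta_t
sigma_d = sigma_X * np.sqrt(1 - rho**2)
X = np.empty(n)
X[0] = sigma_X * rng.standard_normal()
for t in range(1, n):
    X[t] = rho * X[t - 1] + sigma_d * rng.standard_normal()

Y = mu + X + sigma_e * rng.standard_normal(n)          # Eq. (1)

def vhat(y, l):
    """Empirical variogram at lag l, cf. Eq. (5)."""
    d = y[l:] - y[:-l]
    return 0.5 * np.mean(d**2)

for l in (1, 2, 5, 10):
    theory = sigma_X**2 * (1 - rho**l) + sigma_e**2    # Eq. (4)
    print(l, round(vhat(Y, l), 2), round(theory, 2))
```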
This model has provided an acceptably good fit to many data sets that we have analyzed. If the simple AR(1) plus noise model (i.e. exponential variogram plus nugget effect) does not fit the data well, one would typically replace the AR(1) process by a low-order ARMA process (see Ref. [7, Section 2.4]). The methodology described below does not, however, rest on any such specific assumption regarding the structure of the X_t process. The only structural requirement is that a parametric model for a variogram that fits the data well be available. A number of such models commonly used in geostatistics are listed by Cressie [1, pp. 61-62].

2.1. Point estimation

V(l; θ, σ_ε) can be estimated unbiasedly from the observed data Y_t by

$$\hat V(l) = \frac{1}{2(n-l)}\sum_{t=1}^{n-l}(Y_{t+l} - Y_t)^2. \qquad (5)$$

If V̂(1),...,V̂(L) are viewed as the primary data, then θ and σ_ε can be estimated by least squares, that is, by minimizing

$$\sum_{l=1}^{L}\left(\hat V(l) - V_X(l;\theta) - \sigma_\varepsilon^2\right)^2$$

subject to the constraints V_X(l; θ) ≥ 0 and σ_ε ≥ 0. Here, L is small relative to n. The resulting least-squares estimates will be denoted by θ̂ and σ̂_ε, respectively, and the mean squared estimation error by

$$\hat s^2 = \frac{1}{L-p-1}\sum_{l=1}^{L}\left(\hat V(l) - V_X(l;\hat\theta) - \hat\sigma_\varepsilon^2\right)^2. \qquad (6)$$

Note that σ_ε² = V(0), so that estimation of σ_ε² involves an extrapolation of the fit from l = 1,...,L to l = 0 which, it is reasonable to suggest, should be based primarily on estimated values V̂(l) with indices l that are "close" to zero. Choosing L too small, however, leaves too few degrees of freedom after estimating the parameters from Eq. (5), thus giving rise to highly variable estimates. With this in mind, one would be inclined to choose for L a value at which a sill seems to have been reached. A rather more objective choice of L can be made on the following basis. From Eq. (1), we see that

$$\mathrm{var}(Y_t) = v_X(\theta) + \sigma_\varepsilon^2$$

where v_X(θ) denotes the variance of X_t, expressed as a function of θ. In the AR(1) case, for instance, v_X(θ) = θ_2 = σ_X². Denoting the sample variance of the Y_t by S_Y² and (for the moment only) the estimates obtained from Eq. (5) by θ̂(L) and σ̂_ε(L), we should expect to have

$$S_Y^2 \approx v_X(\hat\theta(L)) + \hat\sigma_\varepsilon^2(L).$$

Thus, it seems sensible to choose L in such a way that the discrepancy between the two sides in the last display,

$$D(L) = \left|S_Y^2 - v_X(\hat\theta(L)) - \hat\sigma_\varepsilon^2(L)\right|, \qquad (7)$$

is sufficiently small. The data analysis in Section 3 illustrates the implementation of this idea.

Remark 1. It is important to realize that σ_ε is expressed on a per-interval (unit mass) basis. The measurement error standard deviation σ_{ε(k)} pertaining to an aggregate of k intervals (for instance, k = 12 if the time interval is 1 h and measurements are produced every 5 min) can be estimated if the masses m_1,...,m_k of coal passing through the gauge in each of the k constituent time intervals are known. In such a case, the observed value, Y(k) say, is typically a mass-weighted average of the observations made in each of the constituent intervals, namely

$$Y(k) = \sum_{t=1}^{k} w_t Y_t, \qquad w_t = \frac{m_t}{m_1 + \cdots + m_k}.$$

It then follows from Eq. (1) that

$$Y(k) = X(k) + \varepsilon(k)$$

where

$$X(k) = \sum_{t=1}^{k} w_t X_t, \qquad \varepsilon(k) = \sum_{t=1}^{k} w_t\varepsilon_t,$$

so that

$$\sigma_{\varepsilon(k)}^2 = (w_1^2 + \cdots + w_k^2)\,\sigma_\varepsilon^2.$$

If the mass flow is (more or less) constant, so that w_t = 1/k for all t, then ε(k) = ε̄, the average error, and the usual formula

$$\sigma_{\bar\varepsilon} = \frac{\sigma_\varepsilon}{\sqrt{k}}$$

for the standard deviation of an average of independent errors is recovered.
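The sketch below outlines one possible implementation of the point-estimation step (the paper's own computations were done in MATLAB; the function names and starting values here are our choices): the empirical variogram of Eq. (5), the constrained least-squares fit of the exponential-plus-nugget model (4), and the choice of L by minimizing D(L) of Eq. (7).

```python
# A sketch, under the AR(1)-plus-nugget model (4); names are ours.
import numpy as np
from scipy.optimize import least_squares

def empirical_variogram(y, L):
    """Vhat(1),...,Vhat(L) of Eq. (5)."""
    return np.array([0.5 * np.mean((y[l:] - y[:-l])**2) for l in range(1, L + 1)])

def fit_variogram(y, L):
    """Constrained least-squares estimates (rho, sigma_X, sigma_e) and s^2."""
    vh = empirical_variogram(y, L)
    lags = np.arange(1, L + 1)

    def resid(p):
        rho, sX, se = p
        return sX**2 * (1 - rho**lags) + se**2 - vh    # model (4) minus data

    # crude but serviceable starting values
    p0 = [0.5, np.sqrt(max(vh[-1] - vh[0], 1e-6)), np.sqrt(vh[0] / 2)]
    out = least_squares(resid, p0, bounds=([0.0, 0.0, 0.0], [0.999, np.inf, np.inf]))
    rho, sX, se = out.x
    s2 = np.sum(out.fun**2) / (L - 3 - 1)              # Eq. (6) with p = 3
    return rho, sX, se, s2

def choose_L(y, Lmin=5, Lmax=30):
    """Cutoff lag minimizing the discrepancy D(L) of Eq. (7)."""
    SY2 = np.var(y, ddof=1)
    Ds = {}
    for L in range(Lmin, Lmax + 1):
        rho, sX, se, _ = fit_variogram(y, L)
        Ds[L] = abs(SY2 - sX**2 - se**2)               # v_X(theta) = sigma_X^2
    return min(Ds, key=Ds.get)
```

The estimates reported in Section 3 were obtained from a nonlinear least-squares fit of exactly this type.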
2.2. Confidence intervals

Under the usual "independent and identically distributed residuals" assumption, nonlinear regression theory says that the parameter estimates (θ̂, σ̂_ε) are asymptotically jointly normally distributed with covariance matrix Cs², where s² is the residual variance and

$$C = (U'U)^{-1},$$

U being the L × p matrix with lth row

$$\left(\frac{\partial V_X(l;\theta)}{\partial\theta_1}, \ldots, \frac{\partial V_X(l;\theta)}{\partial\theta_{p-1}},\; 2\sigma_\varepsilon\right).$$

Denote the square root of the element in the pth row and pth column of C by c_pp. Confidence intervals for σ_ε can then be obtained via the distribution of the studentized statistic

$$s = (\sigma_\varepsilon - \hat\sigma_\varepsilon)/(\hat c_{pp}\hat s) \qquad (8)$$

with ŝ from Eq. (6) and where ĉ_pp is obtained by replacing the unknown parameters θ and σ_ε in c_pp by their least squares estimates θ̂ and σ̂_ε, respectively. An asymptotic 100(1−α)% equitailed confidence interval for σ_ε is

$$(\hat\sigma_\varepsilon - z_{1-\alpha/2}\,\hat c_{pp}\hat s,\;\; \hat\sigma_\varepsilon + z_{1-\alpha/2}\,\hat c_{pp}\hat s)$$

where z_{1−α/2} denotes the 1−α/2 quantile of the standard normal distribution. The substitution of θ̂ and σ̂_ε for the unknowns θ and σ_ε in c_pp adds substantial variability to the statistic and has the consequence, especially when L is small, that the true coverage probability of the confidence interval differs substantially from the nominal value. Also, because of statistical dependence between successive estimated variogram values, it is not true that the residuals from the fit are independently or identically distributed. A more trustworthy confidence interval can be obtained by applying an appropriate bootstrap method. By "appropriate" we mean that one does not resample the residuals from the least squares fit but rather simulates realizations Y_t* of the original data, pretending that θ̂ and σ̂_ε are the true parameter values. If there is good reason to believe that both X_t and ε_t in Eq. (1) are normally distributed, a parametric bootstrap can be used. Since V(l; θ, σ_ε) does not depend upon μ, the latter parameter can be set equal to zero in the simulation process. In the AR(1) case, simulated realizations of the Y_t process are constructed from standard independent Gaussian random numbers g_0,...,g_n, e_1,...,e_n by setting

(a) X_0* = σ̂_X g_0

and, for t = 1,...,n,

(b) X_t* = ρ̂X*_{t−1} + σ̂_X(1 − ρ̂²)^{1/2} g_t and Y_t* = X_t* + σ̂_ε e_t.

Then calculate

$$s^* = (\hat\sigma_\varepsilon - \hat\sigma_\varepsilon^*)/(\hat c_{pp}^*\,\hat s^*)$$

where a star indicates that the quantity in question is calculated from the Y_t* data in the same way that the corresponding unstarred quantity was calculated from the Y_t data. Repeating this a large number of times gives an empirical distribution of s* values that approximates the exact distribution of s.
The degree of approximation is usually excellent when a studentized quantity such as s is bootstrapped (see Ref. [2], Sections 5.2.1 and 5.4.1). Denoting the α/2 and 1−α/2 quantiles of the empirical distribution by q*_{1,α} and q*_{2,α}, respectively, the 100(1−α)% parametric bootstrap confidence interval for σ_ε is then

$$(\hat\sigma_\varepsilon + q^*_{1,\alpha}\,\hat c_{pp}\hat s,\;\; \hat\sigma_\varepsilon + q^*_{2,\alpha}\,\hat c_{pp}\hat s).$$

Remark 2. The AR(1) case with ρ = 1 also formally falls within the ambit of the methods discussed in this paper. To see this, it is sufficient to observe that σ_X²(1 − ρ^l) equals σ_δ²(1 − ρ^l)/(1 − ρ²) and that the latter tends to the limit lσ_δ²/2 as ρ tends to 1. Thus, model (4) now becomes

$$V(l; \sigma_\delta^2, \sigma_\varepsilon) = l\sigma_\delta^2/2 + \sigma_\varepsilon^2.$$

This is nothing other than the linear variogram model listed in Cressie [1, p. 61]. The point and interval estimation methods remain valid except that in generating bootstrap samples, one now takes X_0* = 0. However, in this case, the Y_t process is nonstationary so that the criterion (7) can no longer be used to choose the cutoff lag L.
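A corresponding sketch of the studentized parametric bootstrap (again ours, reusing fit_variogram from the previous sketch) computes ĉ_pp from the Jacobian U of Section 2.2, in the paper's parameterization θ = (ρ, σ_X²):

```python
# A sketch of the parametric bootstrap interval for sigma_e (AR(1) case).
import numpy as np

def c_pp(rho, sX, se, L):
    """sqrt of the (p,p) element of C = (U'U)^{-1}, U the L x 3 Jacobian
    of V(l; rho, sigma_X^2, sigma_e) evaluated at the estimates."""
    lags = np.arange(1, L + 1)
    U = np.column_stack([
        -sX**2 * lags * rho**(lags - 1),   # dV/d rho
        1 - rho**lags,                     # dV/d sigma_X^2
        2 * se * np.ones(L),               # dV/d sigma_e
    ])
    C = np.linalg.inv(U.T @ U)
    return np.sqrt(C[2, 2])

def bootstrap_ci(y, L, B=2000, alpha=0.05, rng=None):
    rng = rng or np.random.default_rng()
    n = len(y)
    rho, sX, se, s2 = fit_variogram(y, L)
    scale = c_pp(rho, sX, se, L) * np.sqrt(s2)
    stars = []
    for _ in range(B):
        X = np.empty(n)
        X[0] = sX * rng.standard_normal()                        # step (a)
        g = rng.standard_normal(n)
        for t in range(1, n):
            X[t] = rho * X[t - 1] + sX * np.sqrt(1 - rho**2) * g[t]  # step (b)
        ystar = X + se * rng.standard_normal(n)                  # mu set to 0
        r_b, sX_b, se_b, s2_b = fit_variogram(ystar, L)
        scale_b = c_pp(r_b, sX_b, se_b, L) * np.sqrt(s2_b)
        stars.append((se - se_b) / scale_b)                      # s* of the text
    q1, q2 = np.quantile(stars, [alpha / 2, 1 - alpha / 2])
    return se + q1 * scale, se + q2 * scale
```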
3. Data analysis

This section illustrates the application of the preceding method to some data. Fig. 1 shows a plot of n = 480 values of Y_t against t. The data, which are the output from an on-line coal gauge, give the measured calorific value content (in megajoules per kilogram) for 480 consecutive 1-min intervals. The assumption underlying the analysis is that each observed Y value is the sum of two unobservable values, namely, a true value over the 1-min interval in question and a measurement error inherent in the gauge. The objective is to estimate the standard deviation of the gauge measurement error. Another assumption is that the Y_t data are Gaussian. A standard normal Q-Q plot of the data, not shown here, indicated excellent conformance to a normal distribution.

Fig. 1. A series of 480 1-min calorific value (CV) determinations from an on-line gauge.

The stars in Fig. 2 are a plot of the estimated variogram V̂(l) for l = 1,...,30. A sill seems to have been reached around l = 10. Fig. 3 shows a plot of the discrepancy D(L) in Eq. (7) for L = 9,...,30. There is a local minimum at L = 21, so we choose this value for use in Eq. (5). The corresponding least squares estimates of the parameters are then found to be

$$\hat\sigma_\varepsilon = 2.03, \qquad \hat\rho = 0.62, \qquad \hat\sigma_X = 1.60.$$

The dotted curve in Fig. 2 is the corresponding fitted variogram.

Fig. 2. Estimated variogram (stars) and exponential variogram (dots) fitted by nonlinear least squares.

Fig. 3. Plot of the discrepancy function D(L) in Eq. (7) against the lag L.

Next, we applied the parametric bootstrap to the 480 calorific values from Fig. 1. Fig. 4 displays a normal Q-Q plot of 2000 bootstrap values of the statistic s in Eq. (8). If the large sample normal theory were applicable, the bootstrap values would cluster around the straight line shown in the figure. Inspection of the figure reveals substantial differences between the bootstrap distribution of s and the large sample normal approximation. The 0.025 and 0.975 bootstrap quantiles are q*_{1,α} = −4.84 and q*_{2,α} = 1.19, respectively, compared to the normal theory values of −1.96 and 1.96. Thus, a 95% bootstrap confidence interval for σ_ε is (1.84, 2.08). This interval is wider than the normal theory interval by a factor of 1.6.

Fig. 4. Normal Q-Q plot of 2000 bootstrapped values of the statistic s in Eq. (8).

We can use these data to illustrate the importance of obtaining gauge results over sufficiently short time intervals. The data from Fig. 1 were aggregated into 48 contiguous, nonoverlapping intervals of 10 min each and the average, Ȳ, of the 10 values in each new interval calculated. Fig. 5 shows plots of the variogram of the resulting series of averages and of the original series for l = 1,...,25. The averaging has clearly destroyed the curvature at small lags that was apparent in the 1-min variogram and, with it, one's ability to estimate σ_ε² by the extrapolation method described above. All that can be estimated from the averaged data

$$\bar Y = \mu + \bar X + \bar\varepsilon$$

is the variance

$$\sigma_{\bar Y}^2 = \sigma_{\bar X}^2 + \sigma_\varepsilon^2/10.$$

Fig. 5. Estimated variograms from 1- and 10-min intervals.
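This aggregation effect is easy to reproduce (our sketch, with Y the simulated 1-min series from the first code block standing in for the real data):

```python
# Averaging into 10-min blocks flattens the variogram at small lags,
# removing the curvature needed to extrapolate to the nugget at l = 0.
import numpy as np

Y10 = Y[: (len(Y) // 10) * 10].reshape(-1, 10).mean(axis=1)     # 48 block means
v1 = [0.5 * np.mean((Y[l:] - Y[:-l])**2) for l in range(1, 6)]
v10 = [0.5 * np.mean((Y10[l:] - Y10[:-l])**2) for l in range(1, 6)]
print(np.round(v1, 2))    # curved near l = 0: nugget is identifiable
print(np.round(v10, 2))   # nearly flat: nugget no longer identifiable
```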
It is also of some interest to contrast the standard errors of the estimates of measurement precision delivered by the variogram method and by the Grubbs method in this particular application. For this, we recall from Remark 1 in Section 2.1 that σ_ε is expressed on a per-interval basis. Thus, if k min of coal sampling is required to prepare one reference sample for assay, then the average measurement error over an aggregate of k intervals has standard deviation σ_ε̄ = σ_ε/√k. The Grubbs estimation procedure yields directly an estimator σ̂_G of σ_ε̄ based on n pairs of assays and n gauge averages (see, e.g. Lombard [9] for a detailed description). The variogram method, on the other hand, yields the estimator σ̂_ε̄ = σ̂_ε/√k of σ_ε̄ from k individual 1-min readings. It is the variability of these two estimates that will be compared. Suppose 8 h of coal sampling is required to prepare one reference sample for assay, so that k = 480. Then σ̂_ε̄ has standard error

$$SE(\hat\sigma_{\bar\varepsilon}) = SE(\hat\sigma_\varepsilon)/\sqrt{k} \approx 0.23/\sqrt{480} = 0.011 \qquad (9)$$

where SE(σ̂_ε) ≈ 0.23 comes from the 2000 bootstrap estimates σ̂*_ε of σ_ε. On the other hand, the standard error of the Grubbs estimator σ̂_G of σ_ε̄ is

$$SE(\hat\sigma_G) \approx (2\hat\sigma_{\bar\varepsilon}^4 + 2\hat\sigma_{\bar\varepsilon}^2\sigma_a^2 + \sigma_a^4)^{1/2}/(2\sqrt{n})$$

(see, e.g. Ref. [9]) where σ_a denotes the standard deviation of the error due to sampling, preparation and assay. This value depends substantially on the sophistication of the sampling system. We use here the value σ_a = 0.345 which was estimated in Lombard [9]. Also, in the present instance, we have σ̂_ε̄ = σ̂_ε/√k = 2.03/√480 = 0.093. Thus

$$SE(\hat\sigma_G) \approx 0.066/\sqrt{n}. \qquad (10)$$

Equating the right-hand sides of Eqs. (9) and (10), we see that n = 36 days of gathering observations are required for the Grubbs estimator to deliver the estimation precision that the method proposed in Section 2 can deliver in 1 day. Of course, the difference between the standard errors delivered by the two methods diminishes if the time interval between gauge readings is increased (from 1 to 2 min, say) and will disappear completely as soon as this time interval exceeds the effective correlation memory of the underlying 1-min series. Indeed, it was shown above that for the data considered here, the advantage has already dissipated completely at time intervals of 10 min and more.
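The arithmetic behind Eqs. (9) and (10) is easily reproduced (our sketch; the numbers are those quoted in the text, and small differences from the quoted 0.066 and n = 36 are a matter of rounding):

```python
# Comparison of variogram-based and Grubbs-based standard errors.
import numpy as np

se_hat, SE_se, k = 2.03, 0.23, 480   # sigma_e estimate, its bootstrap SE, k
sigma_a = 0.345                      # sampling/preparation/assay error s.d. [9]
se_bar = se_hat / np.sqrt(k)         # 0.093
SE_variogram = SE_se / np.sqrt(k)    # Eq. (9): about 0.011, from one day of data
num = np.sqrt(2 * se_bar**4 + 2 * se_bar**2 * sigma_a**2 + sigma_a**4) / 2
print(SE_variogram, num)             # SE(Grubbs) ~ num/sqrt(n), num ~ 0.064-0.066
n_days = (num / SE_variogram)**2     # ~37 here; 36 in the text after rounding
print(round(n_days))
```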
4. Checking gauge calibration

Another important aspect of gauge evaluation is to determine whether the gauge is indeed measuring the true value of the characteristic in question, that is, whether the gauge is properly calibrated. The numbers c_1,...,c_m produced by a series of sampling and laboratory analyses applied to m distinct batches of coal typically serve as unbiased estimates of what is meant by "true" values. Thus, the standard for what is "true" and what is "not true" is set by sampling and laboratory analysis. A batch would consist typically of 8 h of coal flow, and each gauge result for a batch is then an aggregation of 480 1-min determinations. Denote by v_1,...,v_m the corresponding gauge results for the m batches of coal. If the gauge calibration is good, a scatterplot of the (v, c) pairs should cluster around a straight line which has unit slope and passes through the origin. Any statistically significant deviation from this pattern indicates that the gauge is not measuring the "true" value of the characteristic. Fig. 6 depicts an instance in which a calibration bias might be present. A paired t-test does not, however, detect any such bias from these data. The reason is that the bias, if any, is clearly not of a simple location (fixed offset) type.

Denote by x_j the expected value of the jth batch (as seen by the gauge), which it measures with an error u_j. Then

$$v_j = x_j + u_j. \qquad (11)$$

The sampling and analysis results can be expressed as

$$c_j = \beta_0 + \beta_1 x_j + e_j \qquad (12)$$

where e_j denotes sampling and analysis error and where β_0 and β_1 are constants. The number β_0 + β_1x_j represents the "true" value for the jth batch. If we define a bias function for the gauge by

$$b(z) = \beta_0 + (\beta_1 - 1)z,$$

then the gauge calibration is good if and only if b(z) = 0 for all z. Below, we will construct simultaneous 100(1−α)% confidence intervals (L_j, U_j) for the bias function evaluated at the observed gauge results, i.e.

$$\Pr\{L_j \le b(v_j) \le U_j \text{ for } j = 1,\ldots,m\} = 1 - \alpha.$$
The hypothesis that the gauge calibration is satisfactory is accepted at the 100α% level if and only if L_j ≤ 0 ≤ U_j for j = 1,...,m.

It follows directly from Eqs. (11) and (12) that

$$\beta_1 = \frac{\sigma_{vc}}{\sigma_v^2 - \sigma_u^2}, \qquad \beta_0 = \mu_c - \beta_1\mu_v.$$

Assume until further notice that σ_u² is known. Then the method of moments estimates of β_1 and β_0 are

$$\hat\beta_1 = \frac{S_{vc}}{S_v^2 - \sigma_u^2}, \qquad \hat\beta_0 = \bar c - \hat\beta_1\bar v. \qquad (13)$$

The corresponding estimate of the bias function is b̂(z) = β̂_0 + (β̂_1 − 1)z. Denote the discrepancy between the estimate and the true value by

$$D(z) = \hat b(z) - b(z).$$

It is shown in Appendix A that the variance of D(z) in large samples is approximately

$$V_z^2 = \frac{1}{m}\left\{S^2_{c-\hat\beta_1 v} + (z - \bar v)^2\,\frac{S_v^2\,S^2_{c-\hat\beta_1 v} + S^2_{v,\,c-\hat\beta_1 v}}{(S_v^2 - \hat\sigma_u^2)^2}\right\} \qquad (14)$$

where S²_{c−β̂₁v} and S_{v,c−β̂₁v} denote the sample variance of the c_j − β̂_1v_j and their sample covariance with the v_j, and where σ̂_u² is an estimate of σ_u² derived from the output of the gauge over the full duration of the trial by, for example, the method described in Section 2. Typically, m could be 30 8-h days, each day producing 480 1-min gauge readings. Thus σ̂_u² is an average of 30 independent estimates of σ_u². The 1-min variance estimate made in Section 3 had a standard error of 0.837. This implies a standard error of 0.837/√(30 × 480) = 0.007 for the estimate over a 30-day period, assuming, of course, that the variance has remained constant over that time period. The estimate σ̂_u² can therefore be considered essentially error-free in this, and in most other, practical circumstances and we will treat it as such below.

Fig. 6. Plot of 30 (gauge, reference) pairs of CV values.

Now, suppose we can find a constant d_{1−α} such that

$$\Pr\left\{\frac{|D(v_j)|}{V_{v_j}} \le d_{1-\alpha} \text{ for } j = 1,\ldots,m\right\} = 1 - \alpha,$$

that is, d_{1−α} is the 100(1−α) percentile of the studentized statistic

$$t = \max_{1\le j\le m} \frac{|D(v_j)|}{V_{v_j}}. \qquad (15)$$

Then a set of simultaneous 100(1−α)% confidence intervals (L_j, U_j) for b(v_j) is given by

$$L_j = \hat b(v_j) - d_{1-\alpha}V_{v_j}, \qquad U_j = \hat b(v_j) + d_{1-\alpha}V_{v_j}. \qquad (16)$$

Since the small sample distribution of the statistic t is unknown, we estimate the constant d_{1−α} via a nonparametric bootstrap procedure. For this, we sample with replacement m pairs (v_j*, c_j*), j = 1,...,m, from the collection {(v_j, c_j), j = 1,...,m} of observed pairs and compute the statistic

$$t^* = \max_{1\le j\le m} \frac{|D^*(v_j^*)|}{V^*_{v_j^*}}.$$

Repeating this resampling a large number of times produces an empirical distribution of t* values that serves as surrogate for the true (unknown) distribution of t. Denote the 100(1−α) percentile of the empirical distribution of t* values by d*_{1−α}. A set of simultaneous 100(1−α)% confidence intervals for b(v_j), j = 1,...,m, is then given by Eq. (16) with d_{1−α} replaced by d*_{1−α}.
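A compact sketch of the whole procedure follows (our code and naming: v holds the m gauge batch values, c the corresponding reference assay values, and sigma_u2 the per-batch gauge error variance):

```python
# Method-of-moments fit (13), variance (14) and the nonparametric
# bootstrap for the simultaneous critical value d* of Section 4.
import numpy as np

def eiv_fit(v, c, sigma_u2):
    """Errors-in-variables estimates of Eq. (13)."""
    b1 = np.cov(v, c)[0, 1] / (np.var(v, ddof=1) - sigma_u2)
    b0 = c.mean() - b1 * v.mean()
    return b0, b1

def V_z(z, v, c, sigma_u2):
    """Approximate s.d. of D(z) = bhat(z) - b(z), Eq. (14)."""
    m = len(v)
    b0, b1 = eiv_fit(v, c, sigma_u2)
    nu = c - b1 * v                      # centering does not affect variances
    S_nu2 = np.var(nu, ddof=1)
    S_v2 = np.var(v, ddof=1)
    S_vnu = np.cov(v, nu)[0, 1]
    var = (S_nu2 + (z - v.mean())**2 * (S_v2 * S_nu2 + S_vnu**2)
           / (S_v2 - sigma_u2)**2) / m
    return np.sqrt(var)

def simultaneous_band(v, c, sigma_u2, B=2000, alpha=0.05, rng=None):
    """Simultaneous intervals (L_j, U_j) of Eq. (16) with bootstrap d*."""
    rng = rng or np.random.default_rng()
    m = len(v)
    b0, b1 = eiv_fit(v, c, sigma_u2)
    bias = lambda z: b0 + (b1 - 1) * z   # bhat(z)
    t_stars = []
    for _ in range(B):
        idx = rng.integers(0, m, m)      # resample (v_j, c_j) pairs
        vs, cs = v[idx], c[idx]
        b0s, b1s = eiv_fit(vs, cs, sigma_u2)
        Dstar = (b0s + (b1s - 1) * vs) - bias(vs)
        t_stars.append(np.max(np.abs(Dstar) / V_z(vs, vs, cs, sigma_u2)))
    d = np.quantile(t_stars, 1 - alpha)
    half = d * V_z(v, v, c, sigma_u2)
    return bias(v) - half, bias(v) + half
```

Calibration would be judged satisfactory at level α if every interval so produced contains zero; in the example below, with σ̂_u² = 0.128, the ideal bias function falls outside the band and a calibration bias is declared.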
Fig. 7. Simultaneous confidence intervals (CI) and the estimated and ideal bias functions for the data from Fig. 6.

Fig. 7 shows a plot of the 95% simultaneous bootstrap confidence intervals computed from B = 2000 resamples of the data from Fig. 6. There are 30 such intervals and, for visual effect, the upper limits have been connected by straight-line segments, as have the lower limits. The estimate σ̂_u² = 0.128 was used in the calculations. Also shown in Fig. 7 are the horizontal line through 0 (i.e. the ideal bias function) and the estimated bias function. Since the ideal bias function is not wholly contained within the intervals, the conclusion is that calibration bias is present.

5. Summary

This paper has described statistical methods for estimating the measurement precision and accuracy of on-line gauges. The estimation of measurement precision can be made using gauge output alone. This is in contrast to the Grubbs estimation method, which requires collection and assay of a relatively large number (at least 60, according to ISO 15239 [8]) of duplicate samples. Typically, it would then take at least 60 working days to gather the requisite data. We saw that the method proposed in this paper can produce a reliable result in just a few hours without the need to gather and assay any samples. The potential savings in time and in financial and human resources that implementation of the proposed method implies are therefore substantial. The possibility of estimating measurement precision from gauge output alone suggests a further possibility, namely, to monitor gauge output to detect quickly any substantial degradation in measurement precision and/or accuracy. Such monitoring can be implemented at various levels of sophistication using adaptations of techniques that are well known in the statistical quality control literature. These latter issues will be dealt with in a separate paper.

Appendix A

Set

$$\nu = c - \beta_0 - \beta_1 v \qquad \text{and} \qquad T = \frac{S_{v,\nu} - \sigma_{v,\nu}}{\sigma_v^2 - \sigma_u^2},$$

noting that σ_v² − σ_u² = σ_x². Some algebraic manipulation gives

$$D(z) = \bar\nu + (\hat\beta_1 - \beta_1)(z - \mu_v) - (\hat\beta_1 - \beta_1)(\bar v - \mu_v),$$

while it follows from results in Fuller [4, p. 16] that β̂_1 − β_1 = T + O_P(m^{−1}) and T = O_P(m^{−1/2}). Also, by the central limit theorem, v̄ − μ_v = O_P(m^{−1/2}). Thus,

$$D(z) = \bar\nu + T(z - \mu_v) + O_P(m^{-1})$$

follows. Now, the variance of ν̄ + T(z − μ_v) is

$$\frac{1}{m}\left\{\sigma_\nu^2 + (z - \mu_v)^2\,\frac{\sigma_\nu^2\sigma_v^2 + \sigma_{v,\nu}^2}{(\sigma_v^2 - \sigma_u^2)^2}\right\} \qquad (17)$$

if one assumes that the random variables x, e and u, hence c and v, are normally distributed. The estimate V_z² from Eq. (14) follows upon replacing the variances and covariances in Eq. (17) by their sample equivalents and replacing β_0 and β_1 by β̂_0 and β̂_1 when calculating the latter.
If the normality assumption is dropped, a rather more complicated expression for the variance of D(z), involving third and fourth cumulants in the joint distribution of (c, v), is obtained. However, the main purpose of developing an approximate variance estimate is to obtain a standardized version of the statistic D(z) that is approximately pivotal (see Ref. [2], Sections 2.6.1 and 5.2). The variance estimate Eq. (14) should serve this purpose quite well unless, of course, there is evidence of substantial nonnormality.
References

[1] N.A.C. Cressie, Statistics for Spatial Data, Wiley, New York, 1991.
[2] A.C. Davison, D.V. Hinkley, Bootstrap Methods and their Application, Cambridge Univ. Press, Cambridge, 1997.
[3] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, 1993.
[4] W.A. Fuller, Measurement Error Models, Wiley, New York, 1987.
[5] F.E. Grubbs, On estimating precision of measuring instruments and product variability, J. Am. Stat. Assoc. 43 (1948) 243-264.
[6] F.E. Grubbs, Errors of measurement, precision, accuracy and the statistical comparison of measuring instruments, Technometrics 15 (1973) 53-66.
[7] A.C. Harvey, Time Series Models, 2nd ed., Harvester Wheatsheaf, London, 1993.
[8] ISO 15239, Solid Mineral Fuels: Evaluation of the Measurement Performance of On-Line Analyzers, International Standards Organization, Geneva, 2001.
[9] F. Lombard, Statistical assessment of on-line analyzers, Chemometr. Intell. Lab. Syst. 37 (1997) 281-289.