Validation of accuracy by interlaboratory programme


Talanta, Vol. 32, No. 11, pp. 1088-1091, 1985. Printed in Great Britain

0039-9140/85 $3.00 + 0.00 Pergamon Press Ltd

R. SUTARNO and HENRY F. STEGER

Mineral Sciences Laboratories, Canada Centre for Mineral and Energy Technology, 555 Booth Street, Ottawa, Ontario, Canada

(Received 22 March 1985. Accepted 18 May 1985)

Summary: A statistical design is proposed for assessing the accuracy of an analytical method by its application to a certified reference material in an interlaboratory programme. The validation of accuracy is based on the difference between the certified value and the overall mean of the test programme and is linked to the concept that below a certain limit this difference has no practical significance. It is shown that a certified reference material cannot be used to detect bias in a method if the bias is smaller than the confidence interval of the certified value.

Certified reference materials (CRMs) are widely applied to ensure the reliability of analytical data by use in the calibration of instruments or standardization of reagents and in validation of the accuracy of analytical methods. Nevertheless, there are no general guidelines for relating the uncertainty of the certified value of the reference material to the effectiveness of its uses. Recently, however, Sutarno and Steger¹ proposed an experimental design for validation of the accuracy of an analytical method by analysis of a CRM, in which the uncertainties in the CRM certified values and in the method are taken into account. This procedure requires that the CRM be analysed by the same analyst 10 times, either in a single batch of determinations or in smaller batches over a period of time. The accuracy of the method is validated if the difference between the certified value for the CRM and the mean value obtained by the method is not statistically significant. The significance is based on the magnitude of the uncertainty in the certified value of the reference material, so the better characterized the reference material, i.e., the smaller the uncertainty in the certified values, the more rigorous is the validation of the accuracy, or the lower the level at which bias in the method can be detected.

From time to time there arise occasions when the analyst is required to validate the accuracy of a method (i.e., to detect bias) to a degree that is not attainable by comparison with the statistical significance of the uncertainty in the certified values for the CRMs available. This paper presents an alternative statistical design for such validation by applying the analytical method to a CRM in an interlaboratory programme. The proposed test procedure is also applicable to the validation of the "long-term" accuracy of a method by one analyst if at least 10 batches of at least two replicate determinations each are done, each batch at a different time. The results then constitute a multi-data set (i.e., a quasi-interlaboratory programme) in which the variance of the overall mean has a between-periods component. If more than one analyst participates, however, the variance of the overall mean has both between-periods and between-analysts (i.e., between-laboratories) components, but these are inseparable and appear simply as the between-periods variance.

The proposed statistical design has been submitted by the Canadian Certified Reference Materials Project to the Council Committee on Reference Materials (REMCO) of the International Organization for Standardization for inclusion in ISO/REMCO Guide 33 "The Use of Certified Reference Materials".

Crown copyrights reserved.

UNCERTAINTY OF CRMs

The effect of the mode of certification on the type of statistical parameters available and on the estimation of the uncertainty in the certified value, $A_c$, of the CRM was discussed in the preceding paper.¹ For a reference material certified by an interlaboratory programme, the estimate of the precision of $A_c$ is expressed as $\sigma_c^2$, the variance of $A_c$, or alternatively as a confidence interval (usually 95%). The magnitude of $\sigma_c^2$ is estimated from

$$\sigma_c^2 = (S_{Lc}^2 + S_{rc}^2/n_c)/N_c \qquad (1)$$

if more than one method was used, or

$$\sigma_c^2 = (\sigma_{Lc}^2 + \sigma_{rc}^2/n_c)/N_c \qquad (2)$$

if all participating laboratories used the same method; here, $N_c$ = number of laboratories (excluding those giving results subsequently decided to be outliers) that participated in the certification of the reference material, $n_c$ = average number of replicate determinations per laboratory, $S_{Lc}$ and $\sigma_{Lc}$ = between-laboratories standard deviations, $S_{rc}$ and $\sigma_{rc}$ = within-laboratory standard deviations. For a reference material certified by use of a "definitive" method by one laboratory, $\sigma_{Lc} = 0$ (by definition) and $\sigma_c^2$ becomes

$$\sigma_c^2 = \sigma_{rc}^2 \qquad (3)$$
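As a rough illustration of equations (1) and (2), which share the same algebraic form, the variance of a certified value can be computed as in the sketch below. The function name and all numerical inputs are hypothetical and chosen only to show the arithmetic; they are not taken from any certification report. The approximate 95% half-width assumes a normal multiplier of 2.

```python
import math


def certified_value_variance(sd_between, sd_within, n_c, n_labs):
    """Variance of the certified value, in the form of equations (1) and (2).

    sd_between : between-laboratories standard deviation
    sd_within  : within-laboratory standard deviation
    n_c        : average number of replicates per laboratory
    n_labs     : number of laboratories retained after outlier rejection
    """
    if n_c <= 0 or n_labs <= 0:
        raise ValueError("n_c and n_labs must be positive")
    return (sd_between ** 2 + sd_within ** 2 / n_c) / n_labs


# Hypothetical certification programme (illustrative numbers only)
var_c = certified_value_variance(sd_between=0.5, sd_within=0.2, n_c=4, n_labs=20)
half_width_95 = 2.0 * math.sqrt(var_c)  # assumed multiplier of 2 for ~95%
print(f"var_c = {var_c:.4f}; A_c +/- {half_width_95:.3f} (approx. 95%)")
```

The between-laboratories term usually dominates, so adding laboratories tends to reduce the variance more than adding replicates within each laboratory.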

For convenience, $S_{Lc}$ and $\sigma_{Lc}$ will both be denoted hereafter by $\sigma_{Lc}$, and $S_{rc}$ and $\sigma_{rc}$ by $\sigma_{rc}$.

VALIDATION OF ANALYTICAL METHODS

The concept of the use of CRMs to validate the accuracy of an analytical method by interlaboratory programme is based on three premises. First, that the certified value of an element in a CRM is the best estimate of the true value. Second, that the analytical method is validated if the overall results of the test programme differ from the corresponding true values for the CRM by no more than can be accounted for by statistical fluctuation. Third, that the accuracy of the analytical method is also validated even if the difference between the overall mean value $\bar{x}$ for the test programme and $A_c$ of the CRM is statistically significant but is of such magnitude that it is negligible for practical purposes. The last premise is necessary because the probability of detecting a statistically significant difference between $\bar{x}$ and $A_c$ increases with increasing number of laboratories participating in the programme and/or increasing degree of replication.

GENERAL PROCEDURES FOR INTERLABORATORY PROGRAMMES

A physical and documentary validation of the CRM¹ should be performed before use of the CRM in an interlaboratory programme. Subdivision of a unit of the CRM, before distribution, must be done with care to avoid introducing any additional significant error, either systematic or random. The organization, physical execution and statistical analysis of the results of such an interlaboratory programme are outside the scope of this paper. Details are available from many sources, but ISO/REMCO Guide 35² or document ISO 5725³ are recommended. The statistical parameters calculated are:

$\bar{x}$ = the overall mean (excluding outliers) of the test programme results
$V[\bar{x}]$ = variance of $\bar{x}$
$\sigma_{Lm}$ = between-laboratories standard deviation of the measurement procedure, estimated by $S_{Lm}$
$\sigma_r$ = within-laboratory standard deviation of the measurement procedure, estimated by $S_r$
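The paper defers to ISO/REMCO Guide 35 and ISO 5725 for the statistical analysis, so the following is only a minimal sketch of how the four parameters could be estimated. It assumes a balanced programme (every laboratory reports the same number of replicates, outliers already removed) and uses the ordinary one-way analysis-of-variance estimators; the function name and the data are hypothetical.

```python
from statistics import mean


def programme_statistics(results):
    """One-way ANOVA estimates for a balanced interlaboratory programme.

    results: list of k lists, each holding the n replicates of one laboratory.
    """
    k = len(results)
    n = len(results[0])
    if k < 2 or n < 2 or any(len(r) != n for r in results):
        raise ValueError("need k >= 2 laboratories, each with the same n >= 2 replicates")

    lab_means = [mean(r) for r in results]
    x_bar = mean(lab_means)  # overall mean (balanced design)

    # Mean squares within and between laboratories
    ms_within = sum((x - m) ** 2 for r, m in zip(results, lab_means) for x in r) / (k * (n - 1))
    ms_between = n * sum((m - x_bar) ** 2 for m in lab_means) / (k - 1)

    s_r = ms_within ** 0.5
    # A negative variance-component estimate is truncated to zero
    s_lm = max(0.0, (ms_between - ms_within) / n) ** 0.5
    var_x_bar = (s_lm ** 2 + s_r ** 2 / n) / k
    return {"x_bar": x_bar, "s_r": s_r, "s_lm": s_lm, "var_x_bar": var_x_bar}


# Hypothetical results: three laboratories, duplicate determinations each
data = [[10.1, 10.3], [10.6, 10.4], [9.9, 10.1]]
stats = programme_statistics(data)
print(stats)
```

When the between-laboratories component is not truncated, the variance of the overall mean reduces to the between-laboratories mean square divided by the total number of results.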

The number of laboratories and of replicate determinations

The number of laboratories, $k$, and the number of replicate determinations, $n$, of the test interlaboratory programme should ideally be selected according to the tolerable difference, $\Delta$, between $A_c$ and $\bar{x}$ at the significance levels $\alpha$ and $\beta$. The parameter $\alpha$ is the probability of concluding that $A_c$ and $\bar{x}$ are statistically significantly different when in fact they are not; $\beta$ is the probability of concluding that $A_c$ and $\bar{x}$ are not statistically significantly different when in fact they are. In practice, however, $k$ and $n$ are seldom decided upon in this way, and indeed there are no firm guidelines for choosing them. One recommendation⁴ is that $k$ and $n$ should be at least 8 and 2, respectively. All laboratories should perform the same number of replicate determinations.

The overall mean of the results of the interlaboratory programme represents the best estimate of the value of that characteristic of the CRM by the measurement procedure. The accuracy of the measurement procedure is defined as the agreement between $\bar{x}$ and $A_c$. The bias of the measurement procedure, $\delta$, is

$$\delta = A_c - \bar{x} \qquad (4)$$

and the variance of the bias is

$$V[\delta] = V[\bar{x}] + \sigma_c^2 \qquad (5)$$

$$V[\delta] = (\sigma_{Lm}^2 + \sigma_r^2/n)/k + \sigma_c^2 \qquad (6)$$

Theoretically, the measurement procedure is free of bias if $\delta = 0$. In practice, however, $\delta \neq 0$, but the statistical significance of $\delta$ depends on the magnitude of $V[\delta]$. The larger $k$ and/or $n$, the smaller will be $V[\delta]$ and the higher the likelihood that $\delta$ is statistically significant. In other words, increasing $k$ and/or $n$ increases the chance of being able to detect smaller measurement bias. If $k$ and/or $n$ are large enough, values of $\delta$ may be obtained that are statistically but not practically significant, and it is necessary to invoke a parameter, $\lambda$, which is the minimum value of $2\delta$ which can be assumed to be of practical significance.

To clarify the concept of $\lambda$, let us consider the frequency distribution of $\delta = A_c - \bar{x}$. The null hypothesis is that the measurement procedure is unbiased. In that case the distribution of $\delta$ has a mean of zero and variance $V[\delta]$. Figure 1 shows that even if the hypothesis is correct, there is a chance $\alpha$ that the measurement procedure will be rejected as being biased. For $\alpha = 0.05$, the acceptance criterion is

$$|A_c - \bar{x}| \le 2(V[\delta])^{1/2} \qquad (7)$$

Fig. 1. Frequency distribution of $\delta = A_c - \bar{x}$ (horizontal axis marked at $-M$, 0 and $M$).

The alternative possibility is that the measurement procedure is in fact biased by an amount $M$. In this case the frequency distribution of $\delta$ has a mean of $M$ (or $-M$) and the same variance (Fig. 1, broken lines), and the chance that the measurement procedure is accepted as free from bias is $\beta$. For $\beta = 0.05$, the value of $M$ will be

$$M = 4(V[\delta])^{1/2} \qquad (8)$$

$M$ is the minimum value of bias in the measurement procedure that can be detected from the interlaboratory results at the probability levels $\alpha = 0.05$ and $\beta = 0.05$. For a given characteristic of a CRM, $M$ decreases as $V[\bar{x}]$ decreases (i.e., as $n$ and $k$ increase); the minimum occurs at $V[\bar{x}] = 0$:

$$M_{min} = 4\sigma_c \qquad (9)$$

The right-hand side of equation (9) is approximately the width of the 95% confidence interval of $A_c$. In other words, a CRM cannot be used to detect bias in a measurement procedure if the bias is less than the confidence interval of the estimate of $A_c$. Sometimes the value of $M_{min}$ for a characteristic of a CRM is so small that it is practically insignificant. In this case, the measurement procedure can be accepted as practically free of bias if $M$ is less than the critical value $\lambda$:

$$\lambda > M = 4(V[\delta])^{1/2} \qquad (10)$$
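The relations in equations (6), (8), (9) and (10) lend themselves to a short numerical sketch. The function names below are hypothetical. The SCH-1 figures (in % Fe) are read from Table 1, with 0.06 taken as the between-laboratories and 0.10 as the within-laboratory standard deviation of the test programme. The critical value of 0.18% Fe is an assumed number used only to show how candidate combinations of k and n might be screened.

```python
import math


def var_bias(var_c, s_lm, s_r, k, n):
    """Equation (6): V[delta] = (s_lm^2 + s_r^2/n)/k + var_c."""
    return (s_lm ** 2 + s_r ** 2 / n) / k + var_c


def min_detectable_bias(var_c, s_lm, s_r, k, n):
    """Equation (8): M = 4 * sqrt(V[delta]), for alpha = beta = 0.05."""
    return 4.0 * math.sqrt(var_bias(var_c, s_lm, s_r, k, n))


def feasible_designs(lam, var_c, s_lm, s_r, ks, ns):
    """All (k, n) pairs for which M stays below the critical value lam."""
    return [(k, n) for k in ks for n in ns
            if min_detectable_bias(var_c, s_lm, s_r, k, n) < lam]


VAR_C = 0.0017                 # variance of the certified value, SCH-1 (Table 1)
S_LM, S_R = 0.06, 0.10         # programme standard deviations, SCH-1 (Table 1)

m_min = 4.0 * math.sqrt(VAR_C)                            # equation (9)
m_prog = min_detectable_bias(VAR_C, S_LM, S_R, 34, 3.26)  # programme as run
designs = feasible_designs(0.18, VAR_C, S_LM, S_R,        # lam = 0.18 is assumed
                           ks=(8, 16, 34), ns=(2, 4))

print(f"M_min = {m_min:.3f}% Fe, M for k=34, n=3.26 = {m_prog:.3f}% Fe")
print("designs with M < 0.18% Fe:", designs)
```

Because the certified-value variance enters every term, no design can push M below M_min; extra laboratories or replicates only close the gap between the two.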

In terms of the variance components of equation (6), the condition $\lambda > M$ becomes

$$\lambda > 4[\sigma_c^2 + (\sigma_{Lm}^2 + \sigma_r^2/n)/k]^{1/2} \qquad (11)$$

On the assumption that the analytical method being tested for accuracy is as precise as the method(s) used to certify the reference material, $\sigma_{Lm}$ and $\sigma_r$ can be replaced by $\sigma_{Lc}$ and $\sigma_{rc}$ for the reference material, and a series of combinations of $k$ and $n$ can be computed for use in setting up a test programme.

Assessment of accuracy

The accuracy of the measurement procedure is checked by comparing the overall mean of the interlaboratory programme, $\bar{x}$, with the certified value of the CRM, $A_c$, by:

$$|A_c - \bar{x}| < 2\sqrt{\sigma_c^2 + \sigma_m^2} \qquad (12)$$

where $\sigma_c^2$ is the uncertainty associated with the CRM and is given by equations (1)-(3), and $\sigma_m^2$ is the uncertainty associated with the overall mean of the interlaboratory comparison for the measurement procedure, and is given by

$$\sigma_m^2 = (S_{Lm}^2 + S_r^2/n)/k \qquad (13)$$

Two decisions are then possible.

(1) If equation (12) is satisfied, the measurement procedure is accepted to be as accurate as those used to certify the CRM. There is only a 5% chance (or less) that the procedure is, in fact, biased by an amount $\lambda$ or greater.

(2) If equation (12) is not satisfied, $|A_c - \bar{x}|$ is statistically significant, but if $|A_c - \bar{x}| \le \lambda/2$ the bias is not practically significant and the procedure may still be accepted; if $|A_c - \bar{x}| > \lambda/2$ the measurement procedure is not sufficiently accurate for the intended purpose.

EXAMPLES OF THE USE OF CRMs

(1) A CRM, SCH-1,⁵ was used for assessing by interlaboratory comparison the accuracy of a method to determine total iron (ISO/TC 102/SC2 N768E). The pertinent statistical parameters are reported in Table 1.

Table 1. Certification and test programme statistical parameters

Statistical parameter      SCH-1           CCU-1
Element                    Fe (total)      Ag

(A) Certificate statistical data
$A_c$                      60.75%          139 µg/g
$\sigma_{rc}$              0.09%           2.1 µg/g
$\sigma_{Lc}$              0.20%           7.5 µg/g
$\sigma_c^2$               0.0017 (%)²     1.9 (µg/g)²

(B) Interlaboratory programme
$\bar{x}$                  60.67%          145.2 µg/g
$S_r$                      0.10%           1.12 µg/g
$S_{Lm}$                   0.06%           1.1 µg/g
$k$                        34              10
$N$                        111             20
$n = N/k$                  3.26            2

Test for accuracy

$|A_c - \bar{x}| = 60.75 - 60.67 = 0.08\%$ Fe
$\sigma_c^2 = V[A_c] = 0.0017$
$\sigma_m^2 = [S_{Lm}^2 + S_r^2/n]/k = 0.0002$
$2\sqrt{\sigma_c^2 + \sigma_m^2} = 0.087\%$ Fe

$|A_c - \bar{x}| < 2\sqrt{\sigma_c^2 + \sigma_m^2}$: the method is sufficiently accurate and shows no bias.

(2) A certified copper concentrate, CCU-1,⁶ was used to assess the long-term accuracy of a combined fire-assay/atomic-absorption procedure for silver determination, the same analyst analysing duplicate test portions once each month over a period of 10 months. The results are reported in Table 2; the estimated statistical parameters are summarized in Table 1.

Table 2. Results for CRM CCU-1

Trial    Ag, µg/g      Trial    Ag, µg/g
1        147, 145      6        143, 145
2        144, 143      7        147, 148
3        147, 149      8        141, 143
4        144, 143      9        147, 146
5        145, 147      10       145, 144

Test for accuracy

$|A_c - \bar{x}| = 6.2$ µg/g Ag
$\sigma_c^2 = V[A_c] = 1.9$
$\sigma_m^2 = [S_{Lm}^2 + S_r^2/n]/k = 0.188$

$|A_c - \bar{x}| > 2\sqrt{\sigma_c^2 + \sigma_m^2}$: the bias of the method is statistically significant.

The method may still be acceptable with respect to accuracy since $|A_c - \bar{x}| \le \lambda/2$, so $\lambda = 2 \times 6.2 = 12.4$ µg/g Ag, if the analyst accepts that the bias is not practically significant.

By comparison, if the results for CCU-1 are treated by the procedure of Sutarno and Steger¹ as representing a quasi "single-shot" investigation, the mean value, $\bar{X}$, would of course be the same as $\bar{x}$. The criterion for validation of accuracy would then be

$$|A_c - \bar{X}| \le 2\sigma_{Lc} = 15.0\ \text{µg/g}$$

and since $|A_c - \bar{X}| = 6.2$ µg/g Ag, the bias of the method is assessed as statistically insignificant. The two types of accuracy tests performed for CCU-1 illustrate the greater power of the interlaboratory programme procedure for detecting bias in an analytical method, and this is therefore the recommended procedure whenever the higher cost in terms of time and expense of such a programme is warranted by the ultimate purpose of the method.
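The arithmetic of the two examples can be retraced with a small sketch of the decision rule built on equation (12) and the $\lambda/2$ threshold. The function name and verdict strings are hypothetical; the inputs are the certified values, overall means and variances quoted in the text for SCH-1 and CCU-1. The value $\lambda = 12.4$ µg/g is the one discussed for CCU-1, and whether it is acceptable remains the analyst's judgement.

```python
import math


def assess_accuracy(a_c, x_bar, var_c, var_m, lam=None):
    """Return (|A_c - x_bar|, limit of equation (12), verdict)."""
    diff = abs(a_c - x_bar)
    limit = 2.0 * math.sqrt(var_c + var_m)  # right-hand side of equation (12)
    if diff < limit:
        verdict = "no significant bias"
    elif lam is not None and (diff <= lam / 2.0 or math.isclose(diff, lam / 2.0)):
        # Statistically significant, but within the practical tolerance lam/2
        verdict = "significant but practically negligible bias"
    else:
        verdict = "statistically significant bias"
    return diff, limit, verdict


# Example (1): SCH-1, total iron (% Fe)
fe = assess_accuracy(60.75, 60.67, var_c=0.0017, var_m=0.0002)

# Example (2): CCU-1, silver (ug/g), without and with a practical tolerance
ag = assess_accuracy(139.0, 145.2, var_c=1.9, var_m=0.188)
ag_lam = assess_accuracy(139.0, 145.2, var_c=1.9, var_m=0.188, lam=12.4)

for label, (diff, limit, verdict) in (("SCH-1 Fe", fe), ("CCU-1 Ag", ag),
                                      ("CCU-1 Ag, lam=12.4", ag_lam)):
    print(f"{label}: diff = {diff:.3f}, limit = {limit:.3f} -> {verdict}")
```

For CCU-1 the limit from equation (12) comes out near 2.9 µg/g, well below the observed difference of 6.2 µg/g, which is why the interlaboratory-style test flags a bias that the single-analyst criterion of 15.0 µg/g does not.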

REFERENCES

1. R. Sutarno and H. F. Steger, Talanta, 1985, 32, 439.
2. ISO/REMCO Guide 35-1984(E), Certification of Reference Materials - General and Statistical Principles.
3. ISO 5725-1981(E), in ISO Standards Handbook 3, Geneva, 1981.
4. K. A. Brownlee, Statistical Theory and Methodology in Science and Engineering, Wiley, New York, 1960.
5. H. F. Steger, W. S. Bowman, R. Sutarno and G. H. Faye, CANMET Rept., 75-168(TR), Energy, Mines and Resources Canada, Ottawa, 1975.
6. G. H. Faye, W. S. Bowman and R. Sutarno, CANMET Rept., 79-16, Energy, Mines and Resources Canada, Ottawa, 1979.