Journal of Geochemical Exploration, 9 (1978) 23--30 © Elsevier Scientific Publishing Company, Amsterdam -- Printed in The Netherlands
23
A NEW APPROACH TO THE ESTIMATION OF ANALYTICAL PRECISION
MICHAEL THOMPSON and RICHARD J. HOWARTH Applied Geochemistry Research Group, Imperial College, London SW7 2BP (Great Britain) (Received February 7, 1977; revised and accepted June 6, 1977)
ABSTRACT Thompson, M. and Howarth, R.J., 1978. A new approach to the estimation of analytical precision. J. Geochem. Explor., 9: 23--30. Methods used for the estimation of analytical precision commonly suffer from two deficiencies which give rise to misleading results. These are: (1) the methods take no account of changes in absolute or relative error over the concentration range, and (2) they tend for other reasons to produce optimistically-biassed results. The difficulties can be avoided by the correct use of duplicate determinations. One method presented allows precision parameters to be estimated. The other gives rise to a simple control chart for use in geochemical analysis.
INTRODUCTION In p r e v i o u s w o r k it has b e e n s h o w n t h a t w h e n t h e r e is a wide range o f c o n c e n t r a t i o n in a set o f s a m p l e s , b o t h t h e a b s o l u t e a n d relative errors in a n a l y t i c a l d e t e r m i n a t i o n s can v a r y significantly o v e r t h e range ( T h o m p s o n a n d H o w a r t h , 1 9 7 6 ) . In such a case, t h e s t a n d a r d d e v i a t i o n a l o n e c a n n o t p r o p e r l y d e s c r i b e a n a l y t i c a l precision. An e q u a t i o n or p l o t relating s t a n d a r d d e v i a t i o n t o c o n c e n t r a t i o n is m o r e a p p r o p r i a t e . This a p p r o a c h leads t o a realistic e s t i m a t e in c o n t r a s t t o t h e usual a s s u m p t i o n o f e i t h e r a c o n s t a n t a b s o l u t e e r r o r ( b y using t h e s t a n d a r d d e v i a t i o n ) , or a c o n s t a n t relative e r r o r ( b y using t h e c o e f f i c i e n t o f v a r i a t i o n ) . I t has b e e n s h o w n t h a t a m e a n ingful e s t i m a t i o n o f p r e c i s i o n o v e r a r a n g e o f c o n c e n t r a t i o n s can be o b t a i n e d b y t h e d u p l i c a t i o n o f analyses. In a d d i t i o n , m a n y a s p e c t s o f t h e m e t h o d s c u r r e n t l y u s e d t e n d t o p r o v i d e o p t i m i s t i c a l l y biassed e s t i m a t e s o f a n a l y t i c a l p r e c i s i o n . Such e s t i m a t e s m a y b e i m m e d i a t e l y c o m f o r t i n g , b u t are u l t i m a t e l y u n h e l p f u l . P r o p e r use o f d u p l i c a t e d d a t a can a v o i d t h e s e difficulties. In this p a p e r w o r k i n g i n s t r u c t i o n s are p r e s e n t e d f o r t w o s i m p l e p r o c e d u r e s utilising d u p l i c a t e analyses. A d e t a i l e d discussion o f t h e p r o c e d u r e s w i t h e x a m p l e s o f t h e i r a p p l i c a t i o n c a n be f o u n d e l s e w h e r e ( T h o m p s o n a n d H o w a r t h , 1 9 7 6 ) . In t h e first m e t h o d t o b e used, w h e r e 50 or m o r e d u p l i c a t e d
24 results are available, the variation of standard deviation of the determination (Sc) is expressed as a function of the concentration (c) and the standard deviation at zero concentration (So), thus: Sc = So + k c
(1)
The parameters So and k can be used to quantify the precision (Pc) at concentration c, by means of the definition P c = 2Sc/C. This gives the expression: Pc = 2So/C + 2k
(2)
Thus, the value 2k corresponds to the precision which is observed at concentrations well above the detection limit. In suitable cases, i.e. when there are observations at low concentrations, the practical detection limit Cd (when P c = 1.0) can also be estimated from the expression: Cd = 2s0/(1 -- 2k)
(3)
This m e t h o d thus enables the precision characteristics of an analytical system to be estimated. The second m e t h o d is appropriate when the number of duplicate observations available is insufficient for the valid estimation of the parameters So and k to be made. In this case the data are tested against an empirical standard of precision, again expressed in the form of equation (1). The m e t h o d consists of a rapid graphical procedure which can be used as a useful control chart for within-batch precision when about 10 or more duplicated results are available. The success of either of these methods depends critically upon the duplication being properly carried out, according to the following conditions: (1) The duplicate analyses must be performed on splits of all the actual samples, or a random selection of them. (2) Each of the two sub-samples must be taken through the whole analytical procedure as if it were an independent sample. (3) The position of the second sub-sample in the analytical sequence must n o t be systematically related to that of the first, but should be distributed at random in the batch of samples passing through the analytical process. This makes the methods generally easier to use for within-batch precision studies. (4) The data must not be rounded off too severely. At least one significant figure containing uncertainty must be retained. (5) Sub-zero measurements when obtained must be recorded and used as such and not set to zero or other arbitrary value. The same applies to values falling below a presumed detection limit. (6) The sub-samples should be numbered so that it is impossible for them to be identified as such at the time of analysis. The omission of any of these steps will give optimistically-biassed esti-
25
mates which will n o t truly reflect the precision behaviour of the system. A computer program for the automatic randomisation of samples and selection of duplicates has been published (Howarth, 1977). If an assessment of the total effect of field error plus sampling error is required, then duplicate samples taken at the field site should be treated as the two sub-samples. Further splitting of the field samples is not required. M E T H O D 1: F O R 50 O R M O R E D U P L I C A T E D R E S U L T S
Procedure
(1) From the (N>50) pairs of results ai, bi (i = 1, 2 , . . iV), form lists of the pair means (ai + b i ) / 2 and the corresponding absolute differences i ai - - bi I (do n o t logtransform the data). (2) Sort the list of the means into increasing order and the differences into the corresponding order. (3) Select the first eleven results and calculate the mean of the pair means and the median (i.e. the central value, n o t the average) of the differences. (4) Repeat this procedure for successive groups of eleven results, and obtain corresponding lists of means and medians. Ignore any terminal group of less than eleven results. (5) E i t h e r : plot the medians as a function of the means and obtain the intercept and slope of the line graphically by eye; or: obtain the same parameters by regression. These parameters correspond respectively with So and h in equation (1). Discussion
The whole procedure is illustrated in Fig. 1, which shows means versus differences, plotted as small squares, for a set of 100 pairs of duplicated results. The data set is divided into groups of eleven individual points by the vertical lines, and the mean concentrations and median differences of each group are shown by the large squares. Regression of the medians on the means gives the values so = 5.20 (standard error = 1.53) and k = 0.046 (standard error = 0.014). There are a few technical details which need emphasising. For strictly unbiassed estimates, the parameters obtained by regression must be multiplied by 1.048, which is a factor derived from the properties of the half-normal distribution. However, omission of this step is unlikely to lead to problems, as the difference is rarely outside the 95% confidence limits of the estimates. The parameters are valid for estimating precision only within the concentration range spanned. Thus for meaningful estimates of so (and therefore detection limit) a reasonable proportion of the samples must have concentrations approaching the detection limit. When estimating the parameters by regression it is good practice to calculate in addition the standard errors, and thereby determine whether So and k are significantly greater than zero. Additional refinements
26
30
I I J
I i o
°
°1
= 20
°
o
i 1oO ~[]
°1 I
I[] °t
I ,
o
i
II
[]
2 ...... 2 2 -
....
~ mo ..... "~
o L ol Iol ~1 o I I~ [] i t fool I i °I o~ I ~ L lJ []I ~ i []
[]
[]
50
I00
I i
o
o
o
t~
r
150
200
Mean of duplicate results
Fig. 1. P r o c e d u r e for m e t h o d 1 ( > 50 d u p l i c a t e d results available). Small squares are i n d i v i d u a l results. Large s q u a r e s s h o w t h e m e d i a n a b s o l u t e d i f f e r e n c e a n d m e a n conc e n t r a t i o n for e a c h range b o u n d e d b y vertical lines. T h e diagonal lines are t h e regression line w i t h c o n f i d e n c e limits o f -+1.0 s t a n d a r d error.
consist of testing linearity, and the n o r m a l distribution of error. A c o m p u t e r program DUPAN3 for p e r f o r m i n g all of these operations is available ( T h o m p s o n , 1978). M E T H O D 2: F O R 10 T O 50 D U P L I C A T E D R E S U L T S
Procedure
(1) Specify the precision required in the form: S c = S o + kc
Either So or k could be zero if appropriate. (2) F o r m t w o new equations f r o m this, namely: dg0 = 2.326 (So + k c ) d99 = 3.643 (So + k c ) (3) (4) ences (5)
Plot dg0 and d99 over a suitable range of c, to f o r m the control chart. As in M e t h o d l , obtain the pair means (a i + bi)/2 and absolute differJai - - biJ of the duplicate results. Plot these points on the c o n t r o l chart.
27
Discussion
This procedure is illustrated in Fig. 2 for the specification Sc = 1.0 + 0.05c i.e. a system with a detection limit of 2.2 units (from equation 3) and a precision approaching 10% at high concentrations (from equation 2). The lines dg0 and d99 are respectively the 90th and 99th percentiles of the absolute difference between duplicates as a function of concentration, assuming a normal distribution of error. If the duplicate analytical data comply with the specification, on average 90% of the points will fall below the d90 line and 99% below the d99 line. If the precision is better, then a higher proportion will tend to fall below the lines. If the precision is worse, the opposite will tend to occur. Statistical variations in the number of points falling in each area will occur, and it may be desirable to calculate the probability of any particular occurrence by the binomial theory. Tables I and II list the binomial probabilities associated with this type of chart, giving the chance that M or more more points from a total of N will fall above the specified percentile, i.e., dg0 or d99 • As an example, in Fig. 2, three of the 14 points plotted on the chart fall above the 90th percentile (d90), and one falls above d99 • This distribution can be interpreted by reference to Table I under N = 14 and M = 3, which gives the value 0.158. This signifies that only in about 16% of cases will this distribution (or worse) occur if the
30
I
I
I
I
d99 = 20
t
i
°
dgo
IO ///~'~/
0 I
0
25
0
o 0 0
000 °
qp
I
I
50
75
I00
Mean of duplicate results
Fig. 2. P r o c e d u r e for m e t h o d 2 ( < 5 0 d u p l i c a t e d results available). Lines m a r k e d d,0 a n d d,9 are p e r c e n t i l e s for a b s o l u t e differences in the specification s c = 1.0 ÷ 0.05c.
28
analytical data comply with the specification. Likewise the corresponding value obtained from Table II for the single point falling above d99 is about 13%. The implication is that the precision is probably worse than the specification. For general geochemical purposes we have had printed a control chart for 10% precision (i.e. with percentile lines drawn for the specification S c = 0.05c) on logarithmic axes. This chart is shown in Figure 3, and it has been consistently useful for rapid precision control over the course of several years. The points plotted in Fig. 3 are the same data as plotted in Fig. 2. However, the distribution of the points between the three areas is different, because of the more stringent specification in Fig. 3. CONCLUDING REMARKS
A full account of the theoretical background to the method, its scope and robustness will be found in Thompson and Howarth (1976}. Examples discussed include determination of lead, arsenic and zinc by emission spec-
TABLE I The p r o b a b i l i t y that M or m o r e p o i n t s o u t o f N will fall above the 9 0 t h percentile o f the chart (single e v e n t p r o b a b i l i t y = 0 . 1 0 0 0 0 0 ) H
H=I
M=2
r1=3
~=q
r1=5
I 2 3 q 5
.i00000 .190000 .2710O0 .3q3900 ,409510
m=5
r!=I
P1~8
:~=9
.010000 .028000 ,052300 ,081460
.00]000 .003700 .008560
.000100 .000460
.[]00010
.~8059 .521703 .569533 .612560 ,651322
,114265 .149694 ,186895 .225159 ,263901
.0]5850 .025692 ,038092 .052972 ,070191
.001270 .00272~ ,005024 .00833J ,012795
.000099 .000177 ,000432 ,000891 ,001635
.O0000l ,000006 .000023 .000064 .000147
.000000 .000001 .000003 .000009
.000000 .000000 .300000
CCO00C OOO0OC
13 14 15
.686189 .7]7570 .7q~813 .771232 .794109
.302643 ,3443998 .378655 ,415371 .450957
.089562 .110870 .133883 ,158360 ,184061
.L!CCC2~
.0]8535 .025637 .034161 .044]33 .055556
.002751 .004329 ,006460 .009230 .012720
.000296 .000541 .000920 ,001474 .002250
.000023 ,000050 .000099 .000l@J .000311
.000001 .000003 .000008 ,000017 .000034
.C0000C ,000000 .@00000 .000301 .000003
.000000 .0003CC .COC000 .COCOCJ .00C00~
.3~DCCC .32220z .DJOCCC ,220020 .~3CCCC
.CCSCZC C22222 .C~C22~ .22~220
15 17 18 19 20
.81q698 ,833228 .849905 .864915 .878423
.485272 ,5182]5 ,549716 .579735 ,608253
.21075] .238203 .266204 .294~55 .323073
.068406 ,082641 ,098197 .114998 .132953
,01700~ .022144 .008194 .035194 .043174
,003297 .0041567 .006415 .008593 .01]253
.000509 ,000784 .001172 .001896 .002385
.000061 .000106 .000173 .000273 .0004]6
.300006 ,300011 .000021 .000036 .000050
.000030 .000001 .000002 .000004 .000007
.200032 .000002 .~CCCQC .300000 .300001
.3322;32 ..~00000 .32ZCCC .388000 .3ZC2CS
21 22 23 24 25
.8g0581 .901~23 .911371 .g20234 ,928210
,635270 .660801 .684873 .707523 ,720794
.351591 ,379959 .4080a3 .435726 ,462906
,151965 ,171928 .19273l .214262 .236409
.05215# ,062]34 .073113 .085075 ,097994
.014445 ,018216 .022608 .027658 .033400
,003273 .004390 .005773 .007456 .009476
.000613 ,000879 .001230 .001684 .002261
,000095 .000J47 .000220 .000321 .000458
.000012 ,00002! .000033 .000052 .00007g
.20000] .C00C02 .@C0004 .000007 .000012
.COCCCC .?COCCC .52000Z ,DJOCOl .22CC:~I
26 27 28 3930
.935389 .941B50 .947665 ,952899 .g57609
,748736 ,76740i .78z~445 .801128 ,816305
.4~9495 .5]5419 .5406J7 ,565O40 .588649
.259058 ,282102 ".305434 .~28952 ,352561
.]11835 .]20557 ,142J12 ,]58444 .]75495
.039859 .047057 .055007 ,0637]7 .073190
.011869 ,01~6~ .017907 ,02]6]7 ,025827
.002983 ,003871 ,004951 .006247 .007784
.300638 .300872 .001112 ,301550 ,002020
,300117 .000[69 .000239 .0C0333 .00045q
.000018 .000028 .000042 .CC0062 .000009
,200203 .0C~C:] 4 .20C007 .2C0012 .2223:5
3i 32 33 34 35
.96184~8 .965663 ,969097 ,972J87 .g74968
.830435 .843577 .855785 ,867116 ,877624
,611414 .6333|6 ,654342 .6744~7 .693750
,376]70 .399694 .423056 .446;85 ,~9015
.193201 ,211498 .230318 .249592 ,269251
.083421 ,094399 ,106109 ,J18530 ,131636
.030563 ,035849 .041704 .040144 ,055183
.009588 .011685 .014102 .016802 .019990
.002596 .003295 ,004134 ,005131 ,006304
.000611 .000809 ,00[058 ,001366 ,001742
.000126 .000174 .030238 .C00320 .200424
.200023 .ZCCC33 .ZCCC47 .Z~CC66 .CC2C~I
36 37 38 39 40
.977472 .979724 .981752 ,983577 .985219
.887358 .896369 .904705 .9J2410 .919526
.7J2137 ,729659 .746330 ,762168 .777102
,4gl4~Bg ,513553 .535164 .556281 .576869
,289227 .3094~4 .329864 .350394 .370982
.145397 ,159780 .174748 .190259 ,206273
.062828 .071085 .079955 ,089434 ,099516
.023509 .02744i .031806 .036621 .04]902
.007673 .009256 .0H075 .01314~ .C15495
,302198 ,0027a~ .003397 .C04165 .005353
.O0J£Se ,000720 .300923 .501172 .221472
.CCZ:25 .CC£168 .202223 .5022~3 .£CZ3~1
8 9 10 i~
-=:2
"=:1
"'=±:
29
TABLE
II
T h e p r o b a b i l i t y that M or m o r e p o i n t s o u t o f N will fall above the 9 9 t h percentile o f the chart (single e v e n t p r o b a b i l i t y = 0 . 0 1 0 0 0 0 ) H
>'=l
M=2
"'=3
D1=4
*'75
1 2 3 4 5
.oioooo .OiggO0 .o297Ol .039JC4 .0490i0
~=6
M:]
'I:~
M:9
rlciC
.OOOlO0 .ooo298 .000592 .000980
.oooocJ .000004 .O000iO
.oooooo .000000
.000000
6 7 8 9 ]0
.0~8~20 .06793S .07725S .086483 ,095618
.O0]d60 .00203i .D02690 ,003436 .004266
.000020 .000034 .000054 .O000BO .000114
.000000 ,o0000e .OODO0] .00000~ .000002
11 12 13 la is
.]0"~562 .ii3615 ,i22479 131254 .139942
,005180 .00617S .007249 .008401 .009G30
,000155 .000206 ,000265 ,000335 .000416
18 17 18 19 20
.i,461Sq2 .157057 .16S486 ,173B31 .182093
.010933 .012309 .0~3756 .015274 .016859
23 24 29
.000000 .000000 .000000 .000000 .000000
.000000 .000000 ,000000 .000000 ,000000
.000000 ,ooo000 ,000000 .000000
.000000 .000000 .000000
.o0000c .000000
.000000
.000003 .O0000S .000007 .000009 ,000012
.000000 ,000000 .000000 .000000 .000000
,000000 .000000 ,000000 .000000 ,000000
.O00DO0 .000000 ,000000 .000000 .000000
.000000 .O0000O ,000000 .000000 .000000
.000000 .000000 .000000 .000000 .O00000
,000000 .000000 ,000000 +000000 .O00000
.000000 ,000000 .000000 .000000 .O0000O
,000000 .000000 .000000 .000000
,000508 .000612 .000729 .000859 .001004
.000017 .000021 .000027 .000034 ,000043
.000000 ,000001 .O00OOl .000001 .000001
.OOOO00 .000000 .000000 ,000000 ,000000
.000000 .000000 ,000000 ,000000 ,O000OO
.OOOOOO .000000 .000000 ,000000 .000000
.OOO000 ,DO0000 .DO0000 .000000 .000000
.000000 ,000000 .000000 .000000 .O0000O
.O0000O .000000 .000000 .000000 .OOOO00
.000000 .000000 .OOO000 .000000 .OOO000
,190272 .198369 .206386 .214322 .222179
,018512 .020229 .02201] .023854 ,025759
.001162 ,001336 .001~25 .00172g ,001951
,000052 .000063 .000075 .000091 .000107
,000002 ,000002 .DO00O3 .000004 .000004
.000000 .000000 .000000 .OO0000 .000000
.000000 .000000 .O0000O .OO0000 .O000OO
.000000 .000000 .000000 .000000 .000000
.000000 .000000 .000000 .000000 .000000
.000000 .000000 .000000 +O00000 .ODO000
,000000 .000000 .000000 .OOO000 .O00000
.000000 .000000 .000000 .DO0000 .000000
~6 27 28 29 3O
,229957 .237697 .245281 .252828 ,260300
.027723 .029746 .031825 .0339~9 .0361~
,002189 .002444 .002717 ,O0300B .003310
>000125 ,0001"16 ,000169 .000194 ,000223
,000005 .000007 ,000008 .000010 ,000012
.000000 ,000000 .000000 .000000 ,000000
.000000 ,000000 ,000000 .000000 ,O0000O
.DO0000 ,000000 .000000 .000000 .000000
.000000 ,000000 .000000 .O000DO .000000
.000000 ,000000 .000000 .000000 ,000000
,000000 .000000 .O00000 .000000 ,000000
.000000 ,000000 ,000000 .000000 .OCO000
33 34 3S
,267697 .275020 .282269 .289~7 .296552
.038390 .040683 .043026 ,04541B ,047859
.0036~ .003993 .004360 .004747 .005154
.000254 .000287 ,0~0325 .000365 .000409
.000014 .ODO016 .000019 .000022 .000025
.O00OOl .O000Ol .000001 .O00OOi .000001
,000000 .000000 .O0000O .O0000D .O0000D
.000000 .000000 .O00000 .OOO000 .OOO000
.000000 ,000000 .000000 .O00OO0 .000000
.000000 .000000 .000000 .O0000O ~O00000
.000000 .OOOO00 .000000 .O0000O ,000000
.000000 .000000 .OODO00 .OOO000 .OO0000
3a 37 38 39 40
.303~07 .3~0~i .317,1~ .324271 .331028
.050346 .0S2878 .05~455 ,0~8075 .060737
.00~581 .00602B ,006497 .005986 .007497
.0O0456 .000507 ,000563 .000522 ,000686
.O00029 ,000033 .O0003B .000043 ,000049
.000002 .000002 .OOOO02 ,000002 .000003
.000000 .000000 ,000000 .000000 ,000000
.OOO000 .000000 .000000 .000000 .000000
.000000 .000000 .000000 ,OODO00 .000000
.000000 .000000 .000000 .O000OO ,000000
.000000 .O00000 .000000 .O000DO .000000
.OOO000 .000000 .000000 .000000 .O00000
}D
....................................
I00
IIII]I
sc
II lti
fl~ll
I [I[11
I I
I
II I]I D-"Z
I]11]1
I]1
I
I/ 1
II I I I / Y
"~I II ifl
II
II
II
II II i|
L
I
I IIlI
I l llll 1/1111
I[1111
.0
Z /I
f f
~
/i"/i"
.l . .m. . , I
l
I/l-I IlJl I I llll~l
lill I 1G I
II II
El
II
]I I
~
/] "
A
•
i .ill i i
Jq I I l l IJ~ lill
l
II II II
,f
II
I
II
I
III
I I
II 0.1 mean of duplicate results
io
Fig. 3. Precision chart used in Applied Geochemistry Research Group.
H=I]
rl ] 2
30
trography, colorimetry and atomic absorption respectively. Computations were carried out on the Imperial College Computer Centre CDC 7314/6400 facility. REFERENCES Howarth, R.J., 1977. Automatic generation of randomised sample submittal schemes for laboratory analysis. Comput. Geosci., 3: 327--334. Thompson, M., 1978. Interpretation of duplicated data in geochemical analysis. Comput, Geosci., in press. Thompson, M. and Howarth, R.J., 1976. Duplicate analysis in geochemical practice (2 parts). Analyst, 101: 690--709.