Clin. Biochem. 1, 3-11 (1967)
CLINICAL LABORATORY ERROR IN PERSPECTIVE
D. G. CAMPBELL AND J. A. OWEN
Departments of Biochemistry, Royal Women's Hospital and Alfred Hospital, Melbourne, Australia
(Received January 10, 1967)
SUMMARY
1. The subject of error in clinical analyses is reviewed. Arising from this, it is suggested that a more complete evaluation of laboratory variation, particularly as to the type of frequency distribution of the error of results, would lead to a better use of the results. 2. Evidence is recalled that, throughout the world, even good biochemistry laboratories do not provide the day-to-day reproducibility in results that many clinicians feel is necessary. 3. Estimates of allowable analytical error, expressed as coefficients of variation of values, are given for the following analyses: plasma calcium, chloride, glucose, phosphate, potassium, protein, sodium, and urea. 4. It is suggested that current analytical techniques are not adequate, at least in respect of non-replicate analyses.
THE MAIN PURPOSE OF CLINICAL BIOCHEMISTRY LABORATORIES is to provide information about patients that can be used in making a diagnosis, in assessing progress, or in selecting therapy. This information is very largely numerical and, because of the way it is obtained, is subject to a degree of uncertainty. Obviously, the less uncertain the information is, the more valuable it is likely to be and, because of this, most laboratories attempt to minimize the uncertainty by the application of better analytical methods and various quality control measures. There can be few clinical chemists, however, who have looked into standards of laboratory practice without discovering that the matter was a good deal more complex than they had anticipated. In this article we have reviewed some of the problems facing anyone interested in laboratory performance.
DESCRIPTIVE TERMS
In reports of laboratory performance, various terms have been used in different senses by different authors. To avoid further confusion, we define and comment upon the usage of these terms.
Correspondence: The Royal Women's Hospital, Carlton N3, Victoria, Australia.
Accuracy
Accuracy is the extent to which a result conforms to a standard, or to the truth, or to an assumed or accepted value. Ideally, accuracy should mean conformity to the true value. In clinical chemistry, because of the frequent inability to define the true value, accuracy commonly comes to mean conformity to an assumed value. Thus, in interlaboratory surveys, conformity to the interlaboratory mean is commonly taken as the measure of accuracy. Accuracy is of most importance when an initial result is being interpreted on the basis of established knowledge. Thus a clinician may read (1) that a person whose capillary true blood sugar is greater than 140 mg/100 ml 2 hr after the ingestion of 50 g of glucose is diabetic. If the laboratory method of measuring blood sugar gives consistently high results, the clinician may be misled into making a diagnosis of diabetes in a patient who is not diabetic.
Reproducibility
Reproducibility is the extent to which a set of results deviates from its mean. Because it depends on the conditions under which the results are obtained, reproducibility must always be qualified to define these conditions, e.g. reproducibility between laboratories, between operators, between the morning and the afternoon, between weekends and weekdays, between January and July. We would emphasize the importance of qualifying the term "reproducibility". Robinson (2) has recently shown how the reproducibility of a manual analytical procedure worsened during the day. This has been called the "four o'clock phenomenon". Further, a given difference between two blood urea values obtained on consecutive days has a greater significance than the same difference between two results obtained a month apart. To the clinician, this is a matter of considerable importance. In acute renal failure, the decision to dialyse a patient may be based upon daily blood analyses, whereas in chronic renal failure decisions are more likely to be based upon monthly investigations. Again, if minor day-to-day changes in a patient's plasma calcium level are sought, it is best to store the samples and analyse them on the same day, preferably in the same batch, because reproducibility in this case is better than reproducibility between days.
We avoid the terms precision and repeatability, which have been used to mean reproducibility under specified conditions (3). We believe it is better to use only the term reproducibility, which is self-explanatory and which can be qualified to cover all situations. Neither precision nor repeatability is sufficiently explicit. The measurement of reducing substances in urine with a tablet test may give extremely good reproducibility (on a scale of 0, +, ++, or +++) but the test would not ordinarily be considered to have great precision. The repeatability of a result could refer to the possibility of obtaining another sample of blood to repeat the test rather than to the chance of obtaining an identical result.
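The need to qualify reproducibility by its conditions can be shown with a small calculation. The sketch below is illustrative only: the plasma calcium figures are invented, not taken from this review, and the helper name `coefficient_of_variation` is hypothetical. It expresses reproducibility as a coefficient of variation (sample standard deviation as a percentage of the mean) for results obtained in one batch and for results obtained on separate days.

```python
from statistics import mean, stdev


def coefficient_of_variation(results):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(results) / mean(results)


# Hypothetical plasma calcium results (mg/100 ml) on one pooled sample.
within_batch = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9]   # same day, same batch
between_days = [10.0, 10.4, 9.7, 10.2, 9.6, 10.3]   # one result on each of six days

cv_within = coefficient_of_variation(within_batch)
cv_between = coefficient_of_variation(between_days)

print(f"within-batch CV: {cv_within:.2f}%")
print(f"between-day CV:  {cv_between:.2f}%")
```

With figures of this kind the same method yields two quite different values for "reproducibility", which is why the conditions have to be stated alongside the number.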
Reliability
We believe this term should be used as by Henry (4), who defines as reliable any laboratory method whose accuracy and reproducibility are readily maintained. Thus the use of a reliable method makes it easier for the laboratory to maintain an appropriate standard; if an unreliable method must be used, adequate quality control and more frequent correction measures will be necessary to keep to the same standards. However, reliability of analytical procedures should not be a factor the clinician need consider.
LABORATORY VARIATION
Apart from blunders such as incorrect calculations or reading mistakes, there are many factors affecting laboratory accuracy and reproducibility. Some of these, such as different analysts, differing methods, differing reagents, differing standards, and variation in instruments, are readily classified. A good analyst knows, however, that even when he works with great care, avoids all blunders, and has the most refined equipment, his results are still subject to a certain variation. This has been termed the residual, indeterminate, or accidental error (5). It is attributable to uncontrolled variation in experimental conditions and to the limits to which scales on pipettes, colorimeters, and other measuring devices can be read.
During the past two centuries, there has accumulated from many fields of study data indicating that the incidence of residual errors in measurements can be described by a Gaussian curve. This has led biochemistry laboratory workers to assume that the overall variation in analytical chemistry results is likewise delineated and, on this basis, predictions of the incidence of particular variations have been made and control systems set up. There is evidence, however, to indicate that variation in biochemical results frequently does not have a Gaussian distribution (6). Skew distributions for errors have been found for the volumetric determination of lactose in aqueous solutions (7) and of glucose in urine (8). In both instances the range of errors below the mean was less than that above the mean.
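A skew in this direction can be reproduced by simulation. The sketch below assumes, purely for illustration, a result calculated as a constant divided by an observed titration value whose errors are Gaussian. The constant `K`, the mean and standard deviation of the titration, and the helper `skewness` are all hypothetical and are not taken from the studies cited.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Moment coefficient of skewness (close to 0 for a symmetric distribution)."""
    m = mean(values)
    s = pstdev(values, m)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(0)   # fixed seed so that the run is repeatable
K = 100.0                # hypothetical constant: result = K / titration value

# Gaussian observational errors in the titration value T (mean 20, SD 2).
titrations = [rng.gauss(20.0, 2.0) for _ in range(100_000)]
# The reported result is inversely related to T.
results = [K / t for t in titrations]

skew_t = skewness(titrations)
skew_result = skewness(results)

print(f"skewness of T:   {skew_t:+.2f}")
print(f"skewness of K/T: {skew_result:+.2f}")
```

The titration values remain symmetric about their mean, whereas the derived results acquire a longer tail above the mean than below it, even though no non-Gaussian error was introduced at the observation step.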
In the case of urine urea estimations by the hypobromite method, Shore and Thomson (7) obtained a symmetrical distribution of errors, but it did not fit a Gaussian curve well.
The result that is obtained by an analytical method involves observational errors in volumetric and gravimetric procedures and in reading colorimeters and other instruments. Even if these observational errors have a Gaussian frequency distribution, the error in the final result will have a Gaussian distribution if, and only if, the result is directly dependent on the quantity observed. Often this is not so. Thus, in the case of a volumetric determination of sugar in urine, the concentration is inversely related to a titration value (T). If errors in T are Gaussian in distribution, then errors of 1/T will have a skew distribution. There would thus seem to be good reason why variation in analytical results should not fit a Gaussian curve. Instances where analytical variation was better fitted by other types of frequency distribution curve have been reported (6, 7). How widely non-Gaussian frequency distributions such as the log-normal distribution (9) or those described by Pearson (10) apply to variations in everyday clinical chemistry procedures remains to be determined. Certainly the question is worthy of some attention.
Some figures for laboratory reproducibility, taken from the literature, are set out in Table 1. The list is not comprehensive. Nevertheless, the literature upon this subject is by no means large and enquiry in many laboratories will reveal that locally pertinent data are frequently not available.
UNCERTAINTY AND THE CLINICIAN

TABLE 2 SIGNIFICANCE OF DIFFERENCES

Ratio of difference      Chance (%) of difference being
to its standard error    due to analytical error
                         A*       B*
1.0                      31       48
1.5                      12       29
1.75                     8        21
2.0                      4.5      16
2.25                     2.5      9
2.5                      1        6.5
2.75                     0.5      5
3.0                      0.2      3.5

*Column A is to be used when comparing a result with a fixed value; column B is to be used when comparing a result with a previous result.
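The two columns of Table 2 can be approximated from the area in the tails of a Gaussian curve. The sketch below is an illustration rather than the authors' stated procedure. It assumes that the tabulated ratio is the difference divided by the standard error of a single result, that both tails are counted, and that in column B the two results carry equal and independent errors, so that the standard error of their difference is √2 times that of one result. The function name `chance_due_to_error` is hypothetical.

```python
from math import erfc, sqrt


def chance_due_to_error(ratio, two_results=False):
    """Two-sided Gaussian tail area (%) for a difference of `ratio` standard errors.

    ratio        -- difference divided by the standard error of a single result
    two_results  -- False: result compared with a fixed value (column A)
                    True:  result compared with a previous result (column B)
    """
    if two_results:
        # Assumed: equal, independent errors, so SE(difference) = sqrt(2) * SE.
        ratio = ratio / sqrt(2.0)
    # P(|Z| > z) for a standard Gaussian variable equals erfc(z / sqrt(2)).
    return 100.0 * erfc(ratio / sqrt(2.0))


for r in (1.0, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0):
    a = chance_due_to_error(r)
    b = chance_due_to_error(r, two_results=True)
    print(f"{r:4.2f}   A: {a:5.1f}   B: {b:5.1f}")
```

Under these assumptions the computed figures lie close to most of the tabulated ones, although they do not match every entry exactly.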
We turn now to consider how uncertainty in a laboratory result affects the clinician in the management of individual patients. The clinician requests biochemical information primarily for the purpose of making a decision. He has learnt from his own experience, and from that of others, that if a biochemical result lies within certain limits or if a change in the result lies outside certain limits, one line of action is more appropriate than another.
If decisions were made solely on the basis of a single biochemical finding, the matter would be simple. The course of action to be taken would be immediately apparent. Occasionally, because of laboratory error, a mistake would be made but the clinician would have no alternative but to heed the biochemical result, for in this particular circumstance there would be nothing else providing guidance. In clinical medicine, such a situation is rare and almost invariably a decision requires the consideration of many more or less independent variables. The clinician is thus commonly faced with a reported biochemical finding indicating one line of action while other features of the case (including, often, other biochemical findings) suggest a different line of action. The problem may be further complicated by the treatment itself having risks. The clinician must weigh these factors against each other. Knowledge of the probability that the biochemical result could have a certain error would be of considerable help to him.
There are two basic types of probability assessment. The first concerns the probability that a result, which is subject to error, differs from a fixed value. For example, experience may indicate that, in certain circumstances, a patient with a plasma potassium below 3.0 mEq/l does better with an immediate intravenous infusion of potassium than without this. However, in a particular case there may be factors present which contraindicate potassium infusion, such as oliguria. If the laboratory reports that the plasma potassium is 2.8 mEq/l, the clinician will want to know the probability that the difference between the reported value (2.8) and the critical value (3.0) is real.
The second type of probability assessment concerns two results. In relation to a particular patient, the clinician may want to know the significance of an apparent change in a biochemical result. Thus, in looking after a jaundiced patient, he may be faced with a plasma bilirubin reported one day to be 15.0 mg/100 ml and on the next day to be 14.1 mg/100 ml. He wishes to know what is the probability that the jaundice is actually regressing. In other words, he wishes to know what are the chances that a difference of 0.9 mg/100 ml in the reported plasma bilirubin represents a change in the patient.
In both instances, he has to interpret an uncertain difference. Knowing the frequency curve of analytical error, he can compute the chance of the difference being real. This is readily practical if the error curve is Gaussian. Thus Table 2 lists probabilities relating to various difference/standard error ratios. Two lists of probabilities are given. The first (column A) is to be used when comparing a result (subject to error) with a fixed value. The second (column B) is to be used when comparing two results both of which are subject to error. We have evidence (Table 3) that laboratories have good days and bad days and, ideally, this should be taken into account in assessing the significance of a difference between results on different days.

TABLE 3 WITHIN-DAY REPRODUCIBILITY ON CONSECUTIVE DAYS*
Day    Na     K      Cl     Urea
1      2.4    1.5    1.2    1.3
2      2.0    1.1    1.5    2.2
3      0.9    1.5    0.8    1.0
4      3.1    3.6    3.5    3.0
5      3.8    2.4    1.9    2.7
6      2.1    0.9    0.8    2.5
7      1.3    1.6    2.3    4.0
8      2.3    1.5    1.1    3.5
9      1.2    1.1    1.2    4.6
10     1.5    2.3    0.9    3.7

*Results are expressed as a coefficient of variation calculated from 10 replicate daily analyses of a serum pool.

This approach allows statements about the probability that the difference in analytical results is real, but without further data the physiological significance of any difference cannot be determined. For the latter, information on individual patient variation from day to day, or hour to hour, is required. Some work has been done on this, for example by Fawcett and Wynn (11), Zieve (12), and Pollard (13), but much more information is needed.
It must be emphasized that a knowledge of the frequency curve of results in healthy persons, that is, "the normal range," does not allow calculation of the probability that a patient with a particular result is healthy. This requires knowledge of the number of healthy persons with the particular result, compared with the number of unhealthy patients with the same result.
We can illustrate this by considering the case of a person found to have a plasma calcium of over 12.0 mg/100 ml in a population of three million in which healthy persons have plasma calcium values distributed in a Gaussian manner with a mean of 10.0 mg/100 ml and a standard deviation of 0.5 mg/100 ml. From this information, it can be calculated that approximately one person in 33,000 will have a plasma calcium exceeding 12.0 mg/100 ml. In the total population there would therefore be about 100 such healthy persons. If, further, the population contained 200 unhealthy persons, each with a plasma calcium over 12.0 mg/100 ml, of whom 100 had parathyroid adenomas and 100 had other diseases causing hypercalcemia, then, on the average, one out of three persons with a plasma calcium exceeding 12.0 mg/100 ml would be healthy, one would have a parathyroid adenoma, and one would have some other disease associated with hypercalcemia. If the distribution of plasma calcium in healthy individuals follows a non-Gaussian distribution, the above probabilities will be in error but the argument will still hold.

TABLE 4 SOME ESTIMATES OF PHYSIOLOGICAL VARIATION, ANALYTICAL REPRODUCIBILITY, AND ACCEPTABLE REPRODUCIBILITY*

             Physiological variation            Day-to-day analytical     Acceptable
             Among individuals    Within        reproducibility           reproducibility
Test                              individuals                             (mean & range)
Calcium      4.6      5.2         -             2.3     -       5.3       1.2 (0.3-3)
Chloride     1.9      3.7         1             2.0     0.8     2.8       2.4 (0.7-3)
Glucose      13.8     10.0        -             -       2.2     -         2.2 (0.7-3)
Phosphate    11.8     14          -             5.8     -       7.5       3.6 (3-5.5)
Potassium    9.4      7.1         6             2.2     1.8     2.4       3.4 (1.5-4.5)
Protein      3.4      6.8         2             3.3     2.7     3.5       -
Sodium       2.7      1.8         1             1.3     1.0     1.8       1.4 (0.8-2)
Urea         28       31          -             3.6     4.0     -         7.5 (3.5-9)
Reference    (22)     (23)        (11)          (19)    (17)    (8)       this paper

*Results are expressed as coefficients of variation.

WHAT ERROR IS ACCEPTABLE?
There will be few who would argue with the tenet that complete accuracy and reproducibility in clinical chemistry are ultimately desirable in the interest of the patient. But to attain this, clinical chemistry laboratories would require resources which they do not have. With fewer facilities, a lower degree of accuracy and reproducibility must be accepted. We shall end by considering how far, in the light of present clinical knowledge, present-day clinical chemistry is providing an acceptable standard of laboratory performance. The problem here is to define what is acceptable.
Few would think the reproducibility obtained in interlaboratory surveys (Table 1) an acceptable state of laboratory performance. We would question whether the best laboratories are good enough.
Tonks (14) and Zwart Voorspuij and van der Slik (15) have suggested that the allowable day-to-day analytical variation should be from one-third to one-fifth of the population variation. This basis for computing the allowable analytical error breaks down when considering components with a relatively wide normal range, for example, plasma haptoglobin with a normal range of 20-200 mg/100 ml. There would seem to be no theoretical justification for basing allowable analytical error on a fixed fraction of the physiological range amongst individuals. Thus, no one would consider as acceptable measurements of body weight having an accuracy of one-fifth the normal physiological range. As has been pointed out (14), some of our analyses are not satisfactory if this type of criterion is used. In the case of sodium analyses as performed by many laboratories (see Table 1), the 99% analytical confidence limits can cover from half to more than the total normal range.
There are better grounds for basing allowable analytical error on the day-to-day or hour-to-hour variation of result in an individual, for there is evidence that less variation occurs within the individual than among individuals (Table 4).
Since it is the clinician who is faced with the problem of making a decision in the face of uncertainty, we asked a number of hospital clinicians to state what accuracy in clinical chemistry results they would regard as acceptable. This was done by giving them a list of results (all within the normal range) and asking them to indicate what they considered to be acceptable analytical limits. We realize that interpretation of their answers is fraught with difficulty. Undoubtedly their views were influenced by day-to-day experience with an imperfect clinical chemistry service and probably also by the knowledge that improvement would cost money.
Nevertheless, we feel that it is of interest to record a summary of their views (Table 4, last column). It was assumed that errors are distributed in a Gaussian manner, and acceptable was taken to mean attainable 99 times out of 100. Since the problem facing clinicians frequently concerns the difference between two results, both of which are subject to error, the stated acceptable range was divided by 3.6 (√2 times 2.5) to obtain an acceptable standard deviation, which was then converted to a coefficient of variation.
The results (Table 4, last column) make it clear that there is still a discrepancy between what clinicians feel they require and what laboratories are able to provide. With increased accuracy and reproducibility smaller changes would become more significant, and this in turn would lead to a desire for a further increase in accuracy and reproducibility.
CONCLUSIONS
Interlaboratory surveys have shown that the standard of performance of many clinical chemistry laboratories throughout the world is far from satisfactory. There is evidence that even good laboratories probably do not provide the day-to-day reproducibility that many clinicians feel is necessary.
In reviewing these problems two important aspects of laboratory practice that require further study have arisen. A more complete evaluation of laboratory variation, particularly as to the type of frequency distribution of the errors of results, would lead to a better use of the results. What reproducibility is needed to satisfy clinical requirements is unknown. It may be that current analytical techniques are not adequate, at least in respect of non-replicate analyses.
ACKNOWLEDGMENT
We are grateful to many persons who discussed with us the matters presented in this review. We are particularly indebted to our clinical colleagues for allowing us to present their views on clinically acceptable analytical limits and to Miss Marcia Kilgariff for the data presented in Table 3.
REFERENCES
1. World Health Organization. Tech. Rep. Series No. 310. Diabetes Mellitus, Geneva, 1965.
2. ROBINSON, R. The four o'clock phenomenon. Lancet ii, 744-745 (1966).
3. HUGHES, H. K. Suggested nomenclature in applied spectroscopy. Analyt. Chem. 24, 1349-1354 (1952).
4. HENRY, R. J. Clinical Chemistry: Principles and Technics. Hoeber, New York, 1964.
5. VOGEL, A. I. A Textbook of Quantitative Inorganic Analysis. Longmans, London, 1964.
6. CLANCEY, V. J. Statistical methods in chemical analysis. Nature 159, 339-340 (1947).
7. SHORE, A. & THOMSON, L. C. A study of the total error in two commonly used biochemical estimations. Guy's Hosp. Rep. 98, 209-221 (1949).
8. MAINLAND, D. Elementary Medical Statistics, p. 268. Saunders, Philadelphia, 1963.
9. GADDUM, J. H. The log-normal distribution. Nature 156, 463-466 (1945).
10. PEARSON, K. Cited by W. P. Elderton, Frequency Curves and Correlation, p. 36. Layton, London, 1927.
11. FAWCETT, J. K. & WYNN, V. Variation of plasma electrolytes and total protein levels in the individual. Brit. Med. J. ii, 582-585 (1956).
12. ZIEVE, L. On interpreting variations in laboratory tests. Postgrad. Med. 35 A. 46-56 (Apr. 1964).
13. POLLARD, A. C. The Quantitative Changes Which Occur Through the Day in Some Commonly Determined Plasma Constituents. Technicon Symposium, London, 1965.
14. TONKS, D. B. A study of the accuracy and precision of clinical chemistry determinations in 170 Canadian laboratories. Clin. Chem. 9, 217-233 (1963).
15. ZWART VOORSPUIJ, A. J. & VAN DER SLIK, W. The accuracy of sodium and potassium determinations in the clinical laboratory. Clin. Chim. Acta 9, 99 (1964).
16. CAMPBELL, D. G. & ANNAN, W. Precision of plasma urea and electrolytes estimated by the AutoAnalyzer. J. Clin. Pathol. 19, 513-516 (1966).
17. MABRY, C. C., GEVEDON, R. E., ROECKEL, I. E. & GOCHMAN, N. Automated submicrochemistries. Am. J. Clin. Pathol. 46, 265-281 (1966).
18. THIERS, R. E., BRYAN, D. J. & OGLESBY, K. A multichannel continuous flow analyzer. Clin. Chem. 12, 120-136 (1966).
19. MITCHELL, F. L. Quality control in a large laboratory. Proc. Assoc. Clin. Biochem. 4, 38-41 (1966).
20. HENDRY, P. I. A. Proficiency Survey in Clinical Chemistry (1965); Report of Scientific Meeting, College of Pathologists of Australia. Griffen Press, Adelaide, 1966.
21. DESMOND, F. B. A clinical chemistry proficiency survey. N.Z. Med. J. 63, 716-720 (1964).
22. WOOTTON, I. D. P. Micro-Analysis in Medical Biochemistry. Churchill, London, 1964.
23. BRYAN, D. J., WEARNE, J. C., VIAU, A., MUSSER, A. W., SCHOONMAKER, F. W. & THIERS, R. E. Profile of admission chemical data by multichannel automation: an evaluation experiment. Clin. Chem. 12, 137-143 (1966).