A critical examination of analytical error




Talanta, Vol. 25, pp. 325-329. © Pergamon Press Ltd., 1978. Printed in Great Britain. 0039-9140/78/0601-0325$02.00/0

W. E. HARRIS
Department of Chemistry, The University of Alberta, Edmonton, Alberta, Canada

(Received 10 July 1977. Accepted 6 November 1977)

Summary: Errors associated with three of the five basic analytical operations are examined. It is concluded that non-Gaussian distributions of error are to be expected. With greater care and improved technique, such non-Gaussian distributions become more evident. The condensation of experimental data and the interpretation of analytical results should be carried out with these characteristics in mind.

Chemistry, as an experimental science, necessarily deals with numbers, since the result of an analysis is usually expressed as a number. Such numbers should be evaluated with an awareness of their limitations. This study is part of a broad, long-term inquiry into the best formal processing methods for condensing experimental data and for interpreting the condensed information. Experimentally, data are obtained from a sequence of up to five basic operations: definition of the goal, isolation of the system, manipulation, measurement, and evaluation. Error can be introduced at any stage. This paper presents experimental data on errors in the manipulative and measurement steps and discusses evaluation.

OBSERVATIONAL (QUANTIZATION) ERROR

As indicated, the fourth major operation in analysis is measurement. Numbers are usually generated by an analogue-to-digital conversion, or quantization, in which the magnitude of an analogue parameter is represented numerically. (Quantization is the operation of limiting the possible values of a continuous quantity to a discrete set.) This process has been examined most carefully¹,² for the electronic analogue-to-digital conversion of electrical signals. However, analogue-to-digital conversion, or quantization, is also required whenever humans read scales, dials and verniers, as observations frequently require. By definition, an analogue quantity can take an infinite number of values, whereas a numerical representation of an analogue value is always limited in resolution. The resolution of a number is related to the quantization level, defined as the smallest change in input signal that the system used can record numerically. In addition to this resolution limitation, a quantizer (the device or human agent that brings about the analogue-to-digital conversion) may also introduce bias into the numerical representation of the output. Such bias is seldom random. For example, some electronic quantizers always round down to the nearest quantization level. The introduction of bias into the resulting numerical data has been studied and assessed for electronic quantizers.¹,² Although the use of electronic quantizers is spreading, a large number of quantization operations in scientific measurement must continue to be carried out by humans. Like electronic quantizers, human scale-readers are seldom unbiased, and their biases are clearly non-random. The existence of such bias has important implications for the statistical evaluation of numerical data.

Observations

I have examined in some detail one of the simplest human quantization operations: reading a burette. At the completion of a titration, the meniscus should be at a random position with respect to the final digit of the burette reading (the 7 in a reading of 44.47 ml, for example). A large number of such terminal digits would therefore be expected to be uniformly distributed among the ten digits 0 to 9. Figure 1 shows three samples of distributions obtained from human quantization of burette meniscus positions. Figure 1A shows a sample of the distribution of 10000 terminal digits for burette readings recorded by 184 students (between 45 and 65 readings each) during their analyses. Before this sample was selected, the few students with a pronounced prejudice for zero (p < 0.01) were eliminated. A large number of readings was examined so that the expected standard deviation would be reduced to the low level indicated. Presumably, each of the 10000 readings in Fig. 1A had been taken with care. Each student had been thoroughly instructed on how to read the burette scale and on the common sources of error, and had been given both practice and a practical test in careful reading. The fact that terminal digits of 0 and 5 do not stand out from the others is evidence that quantization to the nearest 0.01 ml was attempted. It can be seen that nearly all ten digits have an observed frequency lying well outside the expected standard-deviation limits. A chi-square calculation can be used to determine the extent to which data such as these deviate from expected probabilities.
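The chi-square comparison with a uniform distribution, and the ± two-standard-deviation limits drawn in Fig. 1, can be sketched as follows. This is an illustrative Python sketch, not the author's calculation. The function name, the digit counts (invented, not the data of Fig. 1) and the tabulated critical value of 16.919 (chi-square with 9 degrees of freedom at p = 0.05) are assumptions introduced here. The sketch also assumes that each digit's count is binomially distributed with probability 0.1 when the digits are truly uniform.

```python
import math

# Assumed tabulated critical value: chi-square, 9 degrees of freedom
# (ten digit classes, one constraint because the total is fixed), p = 0.05.
CHI2_CRIT_DF9_P05 = 16.919


def terminal_digit_test(counts):
    """Compare terminal-digit counts with a uniform 0-9 distribution.

    counts[d] is the number of readings whose final digit is d.
    Returns the chi-square statistic, the expected count per digit and the
    (low, high) band of +/- two standard deviations for one digit's count.
    """
    if len(counts) != 10:
        raise ValueError("expected one count for each digit 0-9")
    n = sum(counts)
    expected = n / 10
    chi2 = sum((obs - expected) ** 2 / expected for obs in counts)
    # Under uniformity each digit's count is binomial(n, 0.1).
    sd = math.sqrt(n * 0.1 * 0.9)
    return chi2, expected, (expected - 2 * sd, expected + 2 * sd)


if __name__ == "__main__":
    # Hypothetical counts for 10000 readings (NOT the data of Fig. 1).
    counts = [1080, 1050, 1120, 990, 1000, 930, 950, 880, 1000, 1000]
    chi2, expected, (lo, hi) = terminal_digit_test(counts)
    print(f"chi-square = {chi2:.1f} (critical value {CHI2_CRIT_DF9_P05})")
    print(f"expected {expected:.0f} per digit, 2-sd band {lo:.0f} to {hi:.0f}")
    outside = [d for d, c in enumerate(counts) if not lo <= c <= hi]
    print("digits outside the band:", outside)
```

For 10000 readings the band is 1000 ± 60 counts, that is, 10 ± 0.6% per digit. This shows why a large number of readings makes even small digit preferences stand out.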


[Fig. 1: frequency against terminal digit, 0-9; panels A, B, C.]

Fig. 1. Frequency of observation of randomly distributed terminal digits in burette readings; s, the interval between the arrows, denotes the expected 95% confidence limits (± two standard deviations). (A) 10000 readings of terminal digits taken by 184 students during the performance of analyses; (B) 1500 extraordinarily carefully taken readings by 200 students during a practical test;³,⁴ (C) 3500 readings with prior elimination of those from individuals with bias (p < 0.05).

For these data the chi-square value indicates a probability approaching certainty that the observed distribution was not uniform. Each of the 10000 readings was presumably taken with care, but the question remains whether readings with a more reliable distribution could be obtained if even greater care were exercised. Figure 1B shows the distribution of 1500 readings taken with painstaking care during a practical examination, in which extraordinary concern for reporting the correct value was shown. The distribution pattern is strikingly similar to that for the 10000 readings. In both cases, for example, the digit 2 is observed most frequently and 7 least frequently. With 1500 readings the expected standard deviation is relatively larger, but every digit still has a frequency lying outside the expected standard-deviation limits. A chi-square calculation again indicates, with a probability approaching certainty, that the distribution of the reported observations is non-uniform.

Figure 1C shows a third independent sample of a distribution of terminal digits. For this sample, chi-square values were calculated for the 45-65 readings taken by each individual. Individuals whose distributions had a probability of randomness of less than 0.05 (about half) were eliminated. The third sample was thus obtained solely from the readings of 72 selected individuals who showed no obvious bias. For these 3500 terminal digits the same distribution pattern persists, and again the probability approaches certainty that the distribution of the reported observations in this highly selected sample is non-uniform.

In distributions such as those illustrated in Fig. 1, three distinct types of group bias can be identified. The first type is bias towards small (0, 1, 2) and away from mid-range (5, 6, 7) terminal digits. The more painstakingly careful readings in Fig. 1B actually appear to show more, not less, of this type of bias. The second type of group bias is a prejudice for even and against odd terminal digits. For all three groups of readings, chi-square calculations on the odd-even distribution show that the observed distributions are highly unlikely to arise from a uniform distribution. The tendency toward odd-even bias was also examined for each of the 184 individuals in the 10000-reading distribution. Some showed a bias toward odd digits, and more showed a bias toward even digits. For individuals, this type of bias appears to be almost as common as the pronounced low-mid-range bias. The overall group distributions with respect to odd-even bias therefore include some mutual cancellation of individual biases.

The third type of bias can be revealed by examining the deviations of readings from their correct values. Figure 2 shows the distribution of such errors for 200 independent readings. Like those in Fig. 1B, these readings were obtained with painstaking care under the controlled conditions of a practical test.

[Fig. 2: frequency against burette reading error, ml.]

Fig. 2. Distribution of reading errors for 200 independent student observations of over 40 different burettes (errors ≥ 0.1 ml omitted). (Correct values were established to a high level of confidence from the readings of 10-15 experienced instructors.)

This figure illustrates an obviously non-Gaussian distribution of errors: negative errors in excess of 0.01 ml outweighed positive ones by almost ten to one. The simultaneous existence of three types of bias in an operation as simple as reading a burette scale seems remarkable. Also remarkable is that detailed instruction and practice do not eliminate the bias but instead bring it sharply into focus. We can speculate about the causes of the various types of bias. For example, the third type probably results from incorrect positioning of the reading aid. This had been identified beforehand as a likely source of error, yet it was not eliminated even in painstakingly careful readings. Similarly, explanations might be offered for the sources of error that give rise to the first two types of bias. Even if the causes were identified, however, their elimination would appear to be unattainable.

The use of a vernier scale decreases, but does not eliminate, number bias in readings. It has been shown that the random terminal digits of weighings from a balance with a vernier read-out exhibit less bias than those from a burette scale, although some bias remains. Digital read-out, where feasible, should further reduce the uncertainties introduced by human quantization, but it should be recognized that electronic quantization also exhibits bias.
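The odd-even comparison and the third, directional type of bias can be checked in the same spirit. The sketch below illustrates these checks and is not the procedure used in the paper. The odd-even chi-square has 1 degree of freedom, with an assumed tabulated critical value of 3.841 at p = 0.05. The exact binomial sign test for negative against positive reading errors is a substitute technique that the source does not mention. The function names and all counts are hypothetical.

```python
import math

# Assumed tabulated critical value: chi-square, 1 degree of freedom, p = 0.05.
CHI2_CRIT_DF1_P05 = 3.841


def odd_even_chi2(counts):
    """Chi-square (1 df) for even versus odd terminal digits.

    counts[d] is the number of readings ending in digit d (d = 0..9).
    Under a uniform distribution, half the readings should be even.
    """
    even = sum(counts[d] for d in range(0, 10, 2))
    odd = sum(counts[d] for d in range(1, 10, 2))
    expected = (even + odd) / 2
    return ((even - expected) ** 2 + (odd - expected) ** 2) / expected


def sign_test_p(n_neg, n_pos):
    """Exact two-sided sign test: are negative and positive errors equally likely?"""
    n = n_neg + n_pos
    k = min(n_neg, n_pos)
    # Probability of a split at least this lopsided in one direction, doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical digit counts and a hypothetical 50:5 negative:positive split.
    counts = [1080, 1050, 1120, 990, 1000, 930, 950, 880, 1000, 1000]
    print("odd-even chi-square:", odd_even_chi2(counts), "vs", CHI2_CRIT_DF1_P05)
    print("sign-test probability:", sign_test_p(50, 5))
```

With these hypothetical counts, 5150 even digits against 4850 odd gives a chi-square of 9.0, above 3.841. A split of 50 negative to 5 positive errors gives a sign-test probability far below 0.001. The sign test makes no assumption about the shape of the error distribution, which suits the non-Gaussian errors of Fig. 2.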

MANIPULATIVE ERROR

The third operation in analysis involves physical and chemical manipulation of the system. It includes such steps as separation, transfer and reagent addition, together with the associated determinate sampling errors. Errors introduced during this operation are called manipulative, and they usually become smaller as the competence of the experimentalist increases. Observational bias is attractive to study in detail because the quantization operation is easy to isolate and large amounts of data are easy to obtain. The errors associated with physical and chemical manipulations are more difficult to isolate. In a complex set of operations, quantization errors such as those described in the preceding section probably make only minor contributions to the total uncertainty. Most bias and uncertainty is probably introduced by the manipulative operations. In other words, besides the many opportunities for error and bias in the quantization operations, there is also manipulative error. In general, the several sources of error are difficult to study separately, so usually only the overall quality of the results is compared with the correct values. Several such studies have been reported. For example, one study⁶ of several hundred chloride determinations found no significant bias for the gravimetric technique, positive bias for the Fajans method, and negative bias for the Volhard method. Manipulative errors are difficult to isolate, but manipulative bias is probably far more significant than quantization bias and therefore deserves a larger share of attention. As the number of sources of error increases, and provided no one source dominates, the distribution of error would be expected to become more nearly Gaussian even though the individual sources have non-Gaussian distributions. Conversely, a strongly non-Gaussian overall distribution suggests that one source of error dominates.
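The expectation that many comparable error sources give a more nearly Gaussian total, while a single dominant source keeps the total skewed, can be illustrated with a small simulation. This is a hypothetical Python sketch. The centred exponential used as a skewed error source, the small uniform sources and all parameter values are assumptions chosen for illustration, not models of any operation in this paper. Skewness (the third central moment divided by the 1.5 power of the second) is zero for a Gaussian distribution.

```python
import random


def sample_skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5 (zero for symmetric data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skewed_error(rng, scale=1.0):
    """Hypothetical one-sided error source: a centred exponential (skewness 2)."""
    return scale * (rng.expovariate(1.0) - 1.0)


def total_errors(n_sources, n_trials, dominant_scale=None, seed=1):
    """Simulate the total error of many analyses, each the sum of several sources.

    With dominant_scale=None, all n_sources are identical skewed sources.
    Otherwise one skewed source of that scale is combined with n_sources
    small symmetric (uniform) sources.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        if dominant_scale is None:
            total = sum(skewed_error(rng) for _ in range(n_sources))
        else:
            total = skewed_error(rng, dominant_scale)
            total += sum(rng.uniform(-0.05, 0.05) for _ in range(n_sources))
        totals.append(total)
    return totals


if __name__ == "__main__":
    for k in (1, 4, 16):
        skew = sample_skewness(total_errors(k, 20000))
        print(f"{k:2d} equal skewed sources: skewness {skew:.2f}")
    skew = sample_skewness(total_errors(15, 20000, dominant_scale=1.0))
    print(f"one dominant source + 15 small ones: skewness {skew:.2f}")
```

For a sum of k identical centred exponential sources the theoretical skewness is 2/√k. The printed values should therefore fall near 2, 1 and 0.5, while the dominated case stays close to 2.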
Observations

One simple, fundamental manipulative operation, taking an aliquot, was adequately isolated from the reading operation and examined for manipulative bias. Figure 3 shows the distribution of errors in the taking of aliquots by a group of 700 students over a 5-year period. Aliquots with gross errors were excluded. To obtain the data, a procedure was used that allowed the error to be estimated to the nearest 0.01 ml. Scale-reading errors were reduced to a negligible level, certainly at least an order of magnitude smaller than the average errors shown in the figure. In Fig. 3 the errors have been collected into 0.1-ml intervals to show the group distribution. The aliquots were taken carefully, under the conditions of a practical test and after detailed instruction and practice. An individual may well tend toward either positive or negative manipulative bias. On a group basis, however, the figure clearly shows a much higher probability of positive than of negative errors of moderate magnitude. Chi-square calculations indicate, with a probability approaching certainty, that the distribution is non-Gaussian. The operations involved in taking an aliquot showed no obvious evidence of bias until the instructions had been refined and the techniques for using a pipette and volumetric flask improved. After such improvement, the uncertainty in the operation was reduced and the bias became clear.

[Fig. 3: frequency against error, ml, from -0.4 to +0.4 ml.]

Fig. 3. Distribution of errors for the taking of an aliquot that would require about 40 ml of titrant. The manipulative operations are those involved in the procedure “pipette 10 ml of the concentrated solution (3M H₂SO₄) from the 50-ml flask into the 250-ml volumetric flask; dilute to volume; pipette a 10-ml aliquot of the diluted solution into the 200-ml conical flask”. Each student was given a fresh portion of 3M H₂SO₄, which has little tendency to absorb or lose water. The pipettes and volumetric flasks were matched so that the calibration error was negligible, normally less than 1 part in 3000. The instructor titrated the aliquot supplied by the student to a highly precise end-point. The error plotted in the graph is the difference between the expected and observed volumes of titrant used.

Less direct evidence of overall bias, observational plus manipulative, comes from evaluating the results of analyses performed by students. Minimizing the adverse effect of bias is basic to a systematic approach to making students' marks agree as closely as possible with the quality of their experimental work. An error in technique that makes a result high by a given amount may be more or less significant than one that makes it low by the same amount. The best index of the reliability of grades is the quantity called discrimination.*⁷ Grading scales that allow different ranges for high and for low results have usually been observed to give better values of discrimination. The most desirable skew factors for such grading scales cannot be predicted theoretically. Figure 4 shows plots of discrimination against average grade for two such experiments. They were chosen for illustration because they are closely similar in many respects. In both, samples and standards are weighed, then subjected to appropriate chemical manipulations, and finally titrated alternately. The figure clearly shows that a skew factor greater than unity is desirable in setting up the grading scale for copper, whereas skew in the opposite direction is desirable for grading the results of the total-salt determination.

[Fig. 4: discrimination against average grade, %; panels A and B.]

Fig. 4. Smoothed curves showing the effect of varying degrees of skew (1/3, 2/3, 1, 1.5 and 3) on the relation between average grade and discrimination for two analyses by a class in introductory quantitative analysis. (Arbitrarily, a skew of 3 is much more lenient toward results that show positive deviations from correct values than toward those that show negative deviations. A skew of 1/3 is much more lenient toward negative deviations, and a skew of unity is equally lenient toward both.) These curves relate to a 5-point grading scale described elsewhere.⁴ (A) Iodometric determination of copper; (B) determination of total salt by ion-exchange.

Results such as those in Fig. 4 would seem to have implications for the evaluation of data from interlaboratory comparisons of analyses and from clinical testing. However, the findings are given here mainly to indicate that we should expect real data to show bias.

EVALUATIVE ERROR

The final operation in analysis is the complex matter of evaluation. Significant error can result from inadequate or improper exercise of judgment about the data collected. In a formal sense, evaluation involves (1) selection of the most valid value(s) to represent the data collected, (2) determination of the reliability of the selected value(s), and (3) an attempt to assess the validity of the usually unstated prior assumptions about non-random errors. Evaluation is, and must partly remain, an ill-defined and elusive process. In the context of scientific measurement it involves two stages: the processing of data and the exercise of judgment.

Data-processing aims to condense observations into results free from irrelevant information. Such condensation follows formalized rules. The calculation of the average, median and confidence limits, and the application of graphical, minimax, least-squares, maximum-likelihood and Bayesian methods, all have formalized rules by which an estimate of the “real” value is produced. We need not be concerned further with such formalities, except to note that their estimates and validities may differ.

The exercise of judgment is another matter. It enters the evaluative process first in the choice of a data-processing method, and second in the interpretation of results. In reality, these two acts of judgment are not made consciously as separate acts, since the choice of processing method establishes the rules by which the results are interpreted. Almost inevitably, the data-processing techniques chosen involve quantities such as the average (arithmetic mean) and standard deviation. For these quantities to be good estimates of the central tendency and its validity, a Gaussian distribution of error is required, and experimentalists usually tacitly assume that this is the case. This is the point at which substantial evaluative error is often introduced. (Evaluative error is taken here to include the use of assumptions that lead to a biased central value.)

*Discrimination is a measure of the difference between the performance of the best and the poorest students in a class on an item such as a test question or an experiment. It is calculated from the top and bottom end-fractions of the class. The size of the end-fraction chosen is a compromise between including as many students as possible, for statistical validity, and keeping the difference between the two fractions as great as possible.
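The remark that different formalized rules can yield different estimates can be made concrete by applying three of them to the same data. The following is a hypothetical Python example. The five titrant volumes are invented for illustration. Identifying the least-squares, least-absolute-deviation and minimax estimates of a single location with the average, median and midrange comes from standard statistics, not from this paper.

```python
import statistics


def location_estimates(values):
    """Estimate a single 'true' value from replicates under three formal rules.

    - least squares (minimise the sum of squared residuals): the average
    - least absolute deviations (minimise the sum of |residuals|): the median
    - minimax (minimise the largest |residual|): the midrange
    """
    return {
        "average (least squares)": statistics.mean(values),
        "median (least absolute deviations)": statistics.median(values),
        "midrange (minimax)": (min(values) + max(values)) / 2,
    }


if __name__ == "__main__":
    # Hypothetical replicate titrant volumes, ml, one of them noticeably high.
    volumes = [40.02, 40.03, 40.03, 40.04, 40.21]
    for rule, value in location_estimates(volumes).items():
        print(f"{rule:36s} {value:.3f} ml")
```

The three rules give 40.066, 40.030 and 40.115 ml for these invented data. Which rule is "best" depends on the distribution of error assumed, which is exactly where judgment enters.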
If a Gaussian distribution of error could be ensured, the average would be the most efficient estimate of the central value, and confidence limits the most efficient estimate of its reliability. But, as Schmitt⁸ has said, “No physical process or collection of observations has ever followed or will ever follow the Gaussian distribution exactly.” One part of the exercise of judgment in evaluation must therefore be to weigh the convenience of the Gaussian assumption against the consequent probability of introducing substantial evaluative error. Non-Gaussian distributions decrease the reliability of averages and also make conclusions about their reliability conform less with “reality”. The frequency of non-Gaussian distributions calls into question the nearly universal practice of using the average as the best estimate of the true value. Nevertheless, the use of a value other than the average meets with widespread prejudice and is likely to be viewed with strong suspicion. It is often believed that if an experimentalist would simply exercise more care, faith in the fantasy of the Gaussian distribution would somehow be justified. In fact, as pointed out earlier, non-Gaussian distributions become more clearly evident with greater care.⁷

It is instinctively difficult to accept non-Gaussian distributions and bias because, from a utopian point of view, a Gaussian distribution of error is a persistently attractive model. It is so attractive, in fact, that it is called the “normal distribution”. The use of “normal” as a synonym for “Gaussian” is unfortunate because of its misleading connotation of “customary” or “usual”. In real measurements, high and low results of the same magnitude are unlikely to occur with the same frequency. Each operation is likely to have its own characteristic biases, and only by chance will their combination produce a Gaussian distribution of error. Sources of bias may be uncovered, but in many cases the best that can be hoped for is to minimize their effects. The median is more robust than the average as an indicator of the true value, in that it depends less on the nature of the distribution of the results.⁵ In real analyses, and particularly those involving only a few measurements, it is probably superior to the average.
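The claim that the median is more robust than the average for a few measurements can be explored with a small simulation. The following Python sketch is hypothetical. The error model (unit Gaussian scatter plus an occasional one-sided gross error of about +5 units with probability 0.1), the sample size of five and the seed are assumptions made for illustration and are not taken from the data in this paper.

```python
import random
import statistics


def contaminated_sample(rng, n, p_gross=0.1, shift=5.0):
    """n results with unit Gaussian error; each result has probability p_gross
    of an added one-sided gross error of about +shift (hypothetical model)."""
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if rng.random() < p_gross:
            x += rng.gauss(shift, 1.0)
        out.append(x)
    return out


def mean_abs_error(estimator, p_gross=0.1, n=5, trials=20000, seed=7):
    """Average |estimate - true value| over many small samples (true value 0).

    The same seed is used for every estimator, so each one sees identical samples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = contaminated_sample(rng, n, p_gross)
        total += abs(estimator(sample))
    return total / trials


if __name__ == "__main__":
    for p in (0.1, 0.0):
        print(f"p_gross={p}: average {mean_abs_error(statistics.mean, p):.3f}, "
              f"median {mean_abs_error(statistics.median, p):.3f}")
```

Under the contaminated model the median's mean absolute error comes out clearly smaller than the average's. With purely Gaussian errors (p_gross = 0) the ordering reverses, so the choice between them rests on a judgment about the distribution of error.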
The data presented here demonstrate that several types of bias can be encountered in even the most elementary operations. Such bias makes it more difficult to arrive at the truth. For example, in Fig. 2 the median of the deviations from correct burette-reading values is substantially smaller than the average of those deviations. This holds even when the median is obtained without prior elimination of gross errors and the average after such elimination. Similarly, with the manipulative bias exhibited in Fig. 3, the median appears to be more reliable than the average: the median deviation from correct values is only half as large as the average deviation. The curves shown in Fig. 4 also appear to confirm, more broadly, that non-Gaussian distributions are the norm for the overall results of a series of operations. In the exercise of judgment during evaluation, the most significant error is probably that of confusing the estimation process with the interpretation process. One conclusion reached here is that significant deviation from a Gaussian distribution should be assumed, a priori, to be customary. Another is that these deviations may be large enough to make the average less valid than the median, and that non-parametric methods of data-reduction should be used more extensively. Caution in the use of averages and confidence limits is recommended, and the prior assumption of a Gaussian distribution should be avoided where possible.

REFERENCES

1. G. Horlick, Anal. Chem., 1975, 47, 352; P. C. Kelly and G. Horlick, ibid., 1973, 45, 518.
2. H. V. Malmstadt, C. G. Enke, S. R. Crouch and G. Horlick, Optimization of Electronic Measurements, p. 18. Benjamin, Menlo Park, California, 1974.
3. W. E. Harris and B. Kratochvil, J. Chem. Educ., 1971, 48, 543.
4. Idem, Teaching Introductory Analytical Chemistry, p. 78. Saunders, Philadelphia, 1974; Rev. Anal. Chem., 1977, 3, 249.
5. H. A. Laitinen and W. E. Harris, Chemical Analysis, 2nd Ed., p. 547. McGraw-Hill, New York, 1975.
6. B. Kratochvil, T. J. Bydalek and W. J. Blaedel, J. Chem. Educ., 1965, 42, 430.
7. R. L. Ebel, Essentials of Educational Measurement, 2nd Ed. Prentice-Hall, Englewood Cliffs, N.J., 1972; W. E. Harris, Anal. Chem., 1975, 47, 1046A.
8. S. A. Schmitt, Measuring Uncertainty: An Elementary Introduction to Bayesian Statistics. Addison-Wesley, Reading, Mass., 1969.