Comput. & Ops. Res. Vol. 1, pp. 135-136. Pergamon Press, 1974. Printed in Great Britain
NOTES, IDEAS & TECHNIQUES
REDUCING COMPUTATIONAL ROUNDOFF ERRORS EFFICIENTLY
EDWARD L. MELNICK*
New York University, New York, N.Y. 10003, U.S.A.
Presented here is a new computational procedure for determining the statistical properties of data.
INTRODUCTION
The essence of operations research lies in the construction of a mathematical model that adequately describes the behavior of the phenomenon being studied. Often preceding this construction is an examination of summary statistics computed from observable data generated by the unknown process. Some of these statistics are the mean, the variance, and the skewness and kurtosis coefficients. The computational form of the latter three statistics is based on

$$m_k = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^k, \qquad k = 2, 3, 4, \tag{1}$$

where $X_i$, $i = 1, \ldots, n$, are the observed data and $\bar{X}$ is the sample mean, defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. The sample variance is $m_2$, the skewness coefficient is $m_3/m_2^{3/2}$ and the kurtosis coefficient is $m_4/m_2^{2}$. Concentrating, for the moment, on the variance $m_2$, we recall the equivalent mathematical relationship

$$m_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2. \tag{2}$$
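The difference between equations (1) and (2) can be seen on a modern machine. The following is a minimal Python sketch, not taken from the paper, assuming IEEE 754 double precision (Python's `float`, 53-bit significand) stands in for the "limited word size"; the function names and the data are illustrative choices. Equation (1) is computed in two passes and equation (2) in one pass over the same observations, which share a large common offset.

```python
# Illustrative comparison of equation (1) (two passes) and equation (2) (one pass)
# for the variance m_2, using Python floats (IEEE 754 double precision).


def variance_two_pass(xs):
    """Equation (1) with k = 2: one pass for the mean, a second for squared deviations."""
    n = len(xs)
    total = 0.0
    for x in xs:
        total += x
    mean = total / n
    ss = 0.0
    for x in xs:
        ss += (x - mean) ** 2
    return ss / n


def variance_one_pass_raw(xs):
    """Equation (2): accumulate sum(x) and sum(x**2) in one pass, then subtract."""
    n = len(xs)
    s1 = 0.0
    s2 = 0.0
    for x in xs:
        s1 += x
        s2 += x * x
    mean = s1 / n
    return s2 / n - mean * mean


# Deviations about the mean are -6, -3, 3, 6, so the exact m_2 is 90 / 4 = 22.5.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(variance_two_pass(data))      # 22.5
print(variance_one_pass_raw(data))  # not 22.5: each square (about 1e18) is rounded to a multiple of 128
```

In equation (2) the low-order digits that carry the variance are lost when each $X_i^2$ is rounded, before the subtraction of $\bar{X}^2$ can recover them; equation (1) squares only the small deviations.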
In the olden days of the desk calculator, provisions were made for calculations with 20 significant digits, so equation (2) was almost always used to calculate $m_2$; this formula requires only one pass through the data file. With the arrival of electronic computers and their limited word size, equation (1) became the preferred calculating formula, because equation (2) subtracts two large, nearly equal quantities and is therefore affected by roundoff errors, especially when $n$ is large and/or the absolute values of $X_i$ are large (large being determined by the computer word size). This is especially true for computer programs that invert variance-covariance matrices (such as regression programs), since roundoff errors greatly influence the results of the computations when the matrices are ill-conditioned. Studies have demonstrated that accuracy is enhanced by working with deviations about the mean rather than with the raw data. Although greater accuracy is obtained, this procedure is unnecessarily slow and expensive, since it requires two passes through the data file: one to compute $\bar{X}$ and another to compute $m_2$.

* Edward L. Melnick is Associate Professor of Statistics at New York University. He has been a mathematical statistician at the U.S. Census Bureau and has published in the Journal of Applied Probability, the IEEE Transactions on Information Theory, and Decision Sciences. He holds the B.A. degree in industrial psychology from Lehigh University, the M.S. in mathematical statistics from Virginia Polytechnic Institute and the Ph.D. in mathematical statistics from George Washington University.

A NEW COMPUTATIONAL PROCEDURE

If $X_c$, a good approximation of $\bar{X}$, is known, then

$$m_2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - X_c)^2 - (X_c - \bar{X})^2 \tag{3}$$

requires only one pass through the data, and the roundoff error will be of the same order of magnitude as that obtained from equation (1). However, since a good $X_c$ is rarely known, it can be obtained by the following iterative scheme, in which the running mean plays the role of $X_c$. Define
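$$S_r^k = \sum_{i=1}^{r}(X_i - \bar{X}_r)^k, \tag{4}$$

where $\bar{X}_r = \frac{1}{r}\sum_{i=1}^{r} X_i$ and the superscript $k$ on $S_r^k$ is an index, not an exponent. Then

$$S_r^k = \sum_{i=1}^{r}\left([X_i - \bar{X}_{r-1}] - [\bar{X}_r - \bar{X}_{r-1}]\right)^k \tag{5}$$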
and, using the binomial expansion,

$$S_r^k = \sum_{i=1}^{r}(X_i - \bar{X}_{r-1})^k + \sum_{i=1}^{r}\sum_{j=1}^{k}(-1)^j\binom{k}{j}(X_i - \bar{X}_{r-1})^{k-j}(\bar{X}_r - \bar{X}_{r-1})^j. \tag{6}$$
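Changing the order of summation in the last expression and substituting, where appropriate, the notation $S_{r-1}^{j} = \sum_{i=1}^{r-1}(X_i - \bar{X}_{r-1})^{j}$ (the $i = r$ term is written out separately),

$$S_r^k = S_{r-1}^k + (X_r - \bar{X}_{r-1})^k + \sum_{j=1}^{k}(-1)^j\binom{k}{j}\left[S_{r-1}^{k-j} + (X_r - \bar{X}_{r-1})^{k-j}\right](\bar{X}_r - \bar{X}_{r-1})^j. \tag{7}$$

Recognizing the identity $\bar{X}_r - \bar{X}_{r-1} = \frac{1}{r}(X_r - \bar{X}_{r-1})$, equation (7) can be expressed as

$$S_r^k = S_{r-1}^k + (X_r - \bar{X}_{r-1})^k + \sum_{j=1}^{k}\left(-\frac{1}{r}\right)^j\binom{k}{j}\left[(X_r - \bar{X}_{r-1})^k + (X_r - \bar{X}_{r-1})^j S_{r-1}^{k-j}\right]. \tag{8}$$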
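Equation (8) can be turned into a single-pass routine that keeps only the running mean and the sums $S_{r-1}^{j}$. Below is a minimal Python sketch under stated assumptions: the name `moment_sums`, the default `kmax = 4` (enough for skewness and kurtosis), and starting the recursion from the first observation ($\bar{X}_1 = X_1$, $S_1^k = 0$) are illustrative choices, not the author's program. It uses `math.comb` (Python 3.8 or later) for the binomial coefficients.

```python
from math import comb


def moment_sums(xs, kmax=4):
    """One pass over xs; returns (n, mean, S) with S[k] = sum_i (x_i - mean)**k.

    Sketch of the recursion in equation (8). S[0] is the count r and S[1] stays
    identically zero, as follows from definition (4).
    """
    it = iter(xs)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("need at least one observation") from None
    # r = 1: Xbar_1 = X_1 and S_1^k = 0 for k >= 1 (assumed starting point).
    r = 1
    mean = float(first)
    S = [0.0] * (kmax + 1)
    S[0] = 1.0
    for x in it:
        r += 1
        d = x - mean  # X_r - Xbar_{r-1}
        # Update the highest order first so S[k - j] still holds S_{r-1}^{k-j}.
        for k in range(kmax, 1, -1):
            acc = S[k] + d ** k
            for j in range(1, k + 1):
                acc += (-1.0 / r) ** j * comb(k, j) * (d ** k + d ** j * S[k - j])
            S[k] = acc
        S[0] = float(r)  # S_r^0 = r; S[1] is never changed
        mean += d / r  # Xbar_r = Xbar_{r-1} + (X_r - Xbar_{r-1}) / r
    return r, mean, S


n, mean, S = moment_sums([2, 4, 4, 4, 5, 5, 7, 9])
m2, m3, m4 = S[2] / n, S[3] / n, S[4] / n
print(mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2)  # mean, variance, skewness, kurtosis
```

Updating $S^k$ in descending order of $k$ lets the sketch overwrite one list in place, which matches the remark that only $\bar{X}_{r-1}$ and the $S_{r-1}^{j}$ need to be saved.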
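Thus $m_k = \frac{1}{n}S_n^k$, and it is obtained by saving only $\bar{X}_{r-1}$ and $S_{r-1}^{j}$, $j = 1, \ldots, k$. In the special case $k = 2$, the variance is $m_2 = \frac{1}{n}S_n^2$, and, since $S_{r-1}^{1} = 0$ and $S_{r-1}^{0} = r - 1$, $S_r^2$ in equation (5) reduces to the form

$$S_r^2 = S_{r-1}^2 + \frac{r-1}{r}(X_r - \bar{X}_{r-1})^2.$$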
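For $k = 2$ the update needs only two saved numbers per step. A minimal Python sketch, assuming IEEE double-precision floats; the function name and the test data are illustrative, not from the paper. Starting from $\bar{X}_0 = 0$ is harmless because the factor $(r-1)/r$ is zero at $r = 1$.

```python
def variance_updating(xs):
    """m_2 in one pass via S_r^2 = S_{r-1}^2 + ((r - 1) / r) * (X_r - Xbar_{r-1})**2."""
    r = 0
    mean = 0.0  # Xbar_0; its value does not matter since the r = 1 weight is 0
    s2 = 0.0  # S_r^2
    for x in xs:
        r += 1
        d = x - mean  # X_r - Xbar_{r-1}
        s2 += (r - 1) / r * d * d
        mean += d / r  # Xbar_r
    return s2 / r


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(variance_updating(data))  # approximately 22.5
```

On this data the routine returns approximately 22.5 after a single pass, because each step squares a deviation from the running mean rather than a raw observation, so the large common offset never enters a square.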
(Paper received 7 March 1973)