ABSTRACTS
A New General Purpose Minimum Variance Algorithm for the Analysis and Modeling of Biological Data

F. Turkheimer, L. Sokoloff, K. Pettigrew*, K. Schmidt
Laboratory of Cerebral Metabolism and *Division of Epidemiology and Services Research, National Institute of Mental Health, Bethesda, MD, USA

Introduced in the early nineteenth century (1,2), the method of least squares (LS) has gradually become the basic tool for parameter estimation. The widespread use of this technique is due both to the technical ease of its implementation and to a strong belief that measurement errors are distributed according to the normal law. This belief in normality relies more on the Central Limit Theorem ("the sum of many small independent elementary errors is approximately normal") than on careful observation of the data. What began with a small group of investigators who objected to the so-called "dogma of normality" (3) has today become the generally accepted view that real-world measurement noise can be distributed differently from the normal, and that blind application of LS can lead to quite erroneous parameter estimates and other related statistics. In unknown noise conditions a safer choice than LS estimation is the use of "robust" estimators (4), but these estimators may lose efficiency in the normal case. We have developed a new method that tackles this problem by selecting the minimum variance (MV) estimator from a given family of estimators. One such family is the Lp norm, in which the minimized residuals are taken to the power p; this family is a generalization of the LS method, which corresponds to p=2. The MV method also allows the use of any other family, e.g., the m-estimators of Huber (5). Given a data sample and a suitable model, the MV algorithm computes, for each member of the family, both the estimate of the parameter of interest and the variance of that estimate through the "bootstrap" technique (6).
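The Lp-norm family described above can be illustrated with a short sketch. The function name `lp_fit`, the Nelder-Mead optimizer, and the example data are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def lp_fit(x, y, p):
    """Fit y = a*x + b by minimizing sum(|y - (a*x + b)|**p).
    p = 2 recovers ordinary least squares; p = 1 gives the robust L1 fit."""
    cost = lambda th: np.sum(np.abs(y - (th[0] * x + th[1])) ** p)
    # Start the search from the closed-form least-squares solution.
    return minimize(cost, np.polyfit(x, y, 1), method="Nelder-Mead").x

# Illustrative data: the line y = x + 10 of the example, plus normal noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 25, 10)
y = x + 10 + rng.normal(0, 1, x.size)
a, b = lp_fit(x, y, p=1.5)
```

For p other than 2 there is no closed-form solution, so the fit is done by numerical minimization; any general-purpose optimizer would serve here.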
The parameter estimate with minimum variance over the entire family is then selected. No information on the distribution or the scale of the noise is needed. The MV method can be applied to any estimation problem; we have successfully applied it in simulation studies to estimation of the center of a symmetric distribution and to parameter estimation with linear and nonlinear models. In these studies with large numbers of simulations, the technique produced unbiased estimates of the parameters, with variances close to the minimum theoretically achievable under various noise conditions, even with small sample sizes (10 points). The example below (Fig 1A) shows the variance of the slope of the straight line y = x + 10 when the data are subject to zero-mean Cauchy noise and the slope and intercept of the line are estimated by minimization of the Lp norm of the residuals for the family in which 1 ≤ p ≤ 4.
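The selection step can be sketched as follows: for each candidate p, bootstrap the data, refit, take the variance of the bootstrapped slopes, and keep the p with the smallest variance. The function names, the p grid, and the bootstrap size are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def lp_fit(x, y, p):
    """Fit y = a*x + b by minimizing sum(|residual|**p)."""
    cost = lambda th: np.sum(np.abs(y - (th[0] * x + th[1])) ** p)
    return minimize(cost, np.polyfit(x, y, 1), method="Nelder-Mead").x

def mv_estimate(x, y, p_grid=(1.0, 1.5, 2.0, 3.0), n_boot=100, seed=0):
    """Pick the member of the Lp family whose slope estimate has the
    smallest bootstrap variance (a sketch of the MV selection step)."""
    rng = np.random.default_rng(seed)
    best_p, best_var = None, np.inf
    for p in p_grid:
        slopes = []
        for _ in range(n_boot):
            # Resample (x, y) pairs with replacement and refit.
            idx = rng.integers(0, x.size, x.size)
            slopes.append(lp_fit(x[idx], y[idx], p)[0])
        var = np.var(slopes)
        if var < best_var:
            best_p, best_var = p, var
    slope, intercept = lp_fit(x, y, best_p)
    return best_p, best_var, (slope, intercept)

# Heavy-tailed (Cauchy) noise on y = x + 10, as in the simulation example.
rng = np.random.default_rng(1)
x = np.linspace(0, 25, 10)
y = x + 10 + rng.standard_cauchy(x.size)
p_best, var_best, (slope, intercept) = mv_estimate(x, y)
```

Note that no knowledge of the noise distribution or scale enters the procedure; the bootstrap variances alone drive the choice of p.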
[Fig 1A. Variance (SD) of the slope estimate as a function of p; heavy line indicates smoothed curve.]
[Fig 1B. Best-fitting MV and LS lines: solid line (MV estimate), dashed line (LS estimate).]
References
1. Gauss CF. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. Hamburg: Frid. Perthes et I.H. Besser, 1809
2. Legendre AM. Nouvelles Méthodes pour la Détermination des Orbites des Comètes. Paris: Courcier, 1805
3. Huber PJ. The Annals of Mathematical Statistics. 1972, 43(4): 1041-1067
4. Huber PJ. Robust Statistics. New York: Wiley & Sons, 1981
5. Rey WJJ. Introduction to Robust and Quasi-Robust Statistical Methods. Berlin: Springer-Verlag, 1983
6. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall, 1993