Some Properties of a Class of Biased Regression Estimators

by J. M. LOWERRE

Department of Mathematics, Clarkson College, Potsdam, New York 13676, U.S.A.
ABSTRACT: For the linear statistical model y = Xb + e, with X of full column rank, estimates of b of the form (C + X'X)^+X'y are studied, where C commutes with X'X and Q^+ is the Moore-Penrose inverse of Q. Such estimators may have smaller mean square error, component by component, than does the least squares estimator. It is shown that this class of estimators is equivalent to two apparently different classes considered by other authors. It is also shown that there is no C such that (C + X'X)^+X'y = My, in which My has the smallest mean square error, component by component. Two criteria, other than tmse, are suggested for selecting C. Each leads to an estimator independent of the unknown b and σ². Subsequently, comparisons are made between estimators in which the C matrices are functions of a parameter k. Finally, it is shown for the no-intercept model that standardizing, using a biased estimate for the transformed parameter vector, and retransforming to the original units yields an estimator with larger tmse than the least squares estimator.
I. Introduction

If the covariance matrix for e, in the linear statistical model y = Xb + e, is σ²I and X'X is of full rank, there is a considerable literature on biased estimators for b. The use of biased estimators is suggested when X'X has at least one small eigenvalue [see Refs. (1-4) and the extensive lists of references found therein]. Many of the estimators are of the form b* = (C + X'X)^+X'y, where C is a symmetric matrix which commutes with X'X, and M^+ indicates the Moore-Penrose inverse of the matrix M. When C is such that the Moore-Penrose inverse is the ordinary inverse of C + X'X, the class is that identified by Hoerl and Kennard (5) and, more generally, by the author in (6). A property which makes the class attractive is that if C has sufficiently small, positive eigenvalues, then the estimator has smaller mean square error, component by component, than the components of b̂ = (X'X)^{-1}X'y have variance. That is, if b*_i is the ith component of b* and b̂_i is the ith component of b̂, then var(b*_i) + [E(b*_i) - b_i]² < var(b̂_i), for all i. Of course, this means that if the eigenvalues of C are sufficiently small and positive, the total mean square error (tmse) of b* is less than that of b̂: i.e., Hoerl and Kennard's basic theoretical result. This paper explores five additional properties of this class of estimators.

(1) This is not the only class of biased estimators. Goldstein and Smith (22), Crone (13), Obenchain (19), and Hocking, Speed, and Lynn (2) studied an apparently different class of estimators. We will show that the estimators they studied comprise the same class identified here.
(2) Another way to form a linear estimator of b is to set b** = My, in which M is selected so that each b**_i will have the smallest possible mean square error; equivalently, the mean square error matrix E[(My - b)(My - b)'] is to have the smallest possible diagonal elements. Although (C + X'X)^+X'y is clearly in the class of linear estimators, it will be shown that there is no commuting C such that (C + X'X)^+X' = M.

(3) When C + X'X is of full rank, the optimal C to minimize tmse b* has been developed by Hoerl and Kennard. Unfortunately, this depends on the unknown b and σ². It will be shown that (C + X'X)^+X'y - b can be written as a linear combination of vectors, each of which in turn is the sum of a bias vector and a noise vector. One would like to minimize both the bias and the noise. Since this is not possible, it will be shown that minimizing either of two functions of the bias and noise yields an estimator which does not depend on unknown parameters.

(4) A subclass of this class of estimators is one in which the eigenvalues of C are functions of a parameter, k, whose magnitude is at the statistician's disposal. Hoerl and Kennard's ridge regression estimator, in which C = kI, is a particular example. Section V presents two different methods of selection from within this subclass.

(5) The literature often suggests that X should be standardized before estimating b. However, it will be shown that if the estimates are to be in the original units and the model is the no-intercept model, then standardizing, using a biased estimator of this class on the transformed parameters, and retransforming this estimate to the original units yields an estimator with larger tmse than that of the least squares estimator.
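As an illustration of the basic property above (not part of the original paper), the following numpy sketch builds a hypothetical ill-conditioned design, forms b* = (C + X'X)^{-1}X'y with C = kI for a small positive k, and compares the exact component-wise mean square errors with the least squares variances. All the numbers (n, p, σ², X, b, k) are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical example: an ill-conditioned design (values are assumptions, not from the paper).
rng = np.random.default_rng(0)
n, p, sigma2 = 30, 4, 1.0
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=n)    # nearly collinear columns -> a small eigenvalue
b = np.array([1.0, -2.0, 0.5, 1.5])

XtX = X.T @ X
C = 0.05 * np.eye(p)                             # C = kI with small positive eigenvalues

A_ls = np.linalg.solve(XtX, X.T)                 # least squares:   b_hat = A_ls @ y
A_b  = np.linalg.solve(C + XtX, X.T)             # biased class:    b*    = A_b  @ y

def mse(A):
    # Exact component-wise mse of the linear estimator Ay under y = Xb + e, Cov(e) = sigma2 I:
    # sigma2 * diag(AA') + (AXb - b)^2
    return sigma2 * np.diag(A @ A.T) + (A @ X @ b - b) ** 2

print("variance of least squares:", mse(A_ls))   # unbiased, so mse = variance
print("mse of the biased b*     :", mse(A_b))    # typically smaller in each component for small k
```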
II. Class Members

The shrunken estimator of Mayer and Wilke (17), as well as Perlman (20), is of the form α(X'X)^{-1}X'y, and hence is in the class with C = (α^{-1} - 1)X'X. The ridge estimator of Hoerl and Kennard is (kI + X'X)^{-1}X'y, and is obviously in the class, as is Dwivedi's (k(X'X)^{-1} + X'X)^{-1}X'y. It can also be shown that Marquardt's fractional rank estimator (14) is in this class. Goldstein and Smith (22), Crone (13), Obenchain (19), and Hocking, Speed and Lynn (2) have studied an apparently different class. It will now be shown that both classes are the same as the one identified here.

The class of Crone, Obenchain, and Goldstein and Smith will be treated first. If the singular value decomposition of X is UΛ^{1/2}P', in which U is an n x p matrix of orthogonal columns, Λ is the p x p diagonal matrix of eigenvalues of X'X, and P is a p x p orthogonal matrix, then Crone, Obenchain, and Goldstein and Smith have considered estimators of the form PAU'y, in which A is a diagonal matrix of constants. Generally, they have selected the ith element of A, say a_i, to be a function of λ_i, the ith eigenvalue of X'X, and perhaps one or more parameters. Actually, Crone and Obenchain do this directly. Goldstein and Smith
augment the matrix U to the n x n orthogonal matrix (U : R) and set

z = (U : R)'y, so that z_1 = U'y.

They then form the p x p diagonal matrix D, in which the ith diagonal element, say d_i, is a function of the ith singular value of X and a parameter k. They then estimate ξ with Dz_1 and estimate b with Pξ̂. This is, however, b* = PDU'y.

Now, if A is any diagonal matrix it is clear from the properties of the Moore-Penrose inverse (9) that there is a unique diagonal matrix D such that A = (Λ + D)^+Λ^{1/2}; conversely, for any diagonal matrix D there is a unique A = (Λ + D)^+Λ^{1/2}. Then

PAU'y = P(D + Λ)^+Λ^{1/2}U'y = P(D + Λ)^+P'PΛ^{1/2}U'y = [P(D + Λ)P']^+X'y = (C + X'X)^+X'y,

where C = PDP' commutes with X'X. Since the steps are reversible, the classes are the same. Obviously, A approaches Λ^{-1/2} if the eigenvalues of C approach 0 through positive values. Thus, component by component, the estimator PAU'y will have a smaller mean square error than the corresponding element of b̂ = (X'X)^{-1}X'y will have variance. This is Goldstein and Smith's primary result.

Hocking, Speed, and Lynn (2) considered estimators of b of the form b* = PBP'b̂, in which B is diagonal and P'b̂ = Λ^{-1}P'X'y. However, this is b* = PBΛ^{-1}P'X'y = PBΛ^{-1/2}U'y. Since BΛ^{-1/2} is diagonal, their estimator is also in the same class. Their diagonal B is formed to satisfy

B = Σ_{i=1}^p a_i [ I_i  0 ]
                 [ 0    0 ],

in which I_i is an i x i identity and a_i is a scalar. It is easy to show this is equivalent to B1 = Ra, in which R is of full rank and 1 is a vector of all ones, so that for any diagonal matrix A there is a vector a such that AΛ^{1/2}1 = B1 = Ra has a solution vector a. Thus, for any diagonal matrix A, the estimator PAU'y is in the class they considered.
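The correspondence A = (Λ + D)^+Λ^{1/2} (equivalently D = A^+Λ^{1/2} - Λ) behind this equivalence is easy to check numerically. The sketch below is an illustrative example under assumed values (random X and y, an arbitrary diagonal A): it recovers D and C = PDP' from A and confirms that PAU'y and (C + X'X)^+X'y coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Write the SVD of X as X = U Lam^{1/2} P', where Lam holds the eigenvalues of X'X.
U, s, Pt = np.linalg.svd(X, full_matrices=False)
P, Lam = Pt.T, np.diag(s ** 2)

a = np.array([0.4, 0.2, 0.1])                    # an arbitrary diagonal A (assumed values)
A = np.diag(a)

D = np.linalg.pinv(A) @ np.sqrt(Lam) - Lam       # D = A^+ Lam^{1/2} - Lam
C = P @ D @ P.T                                  # C = PDP' commutes with X'X = P Lam P'

lhs = P @ A @ U.T @ y                            # PAU'y  (Crone / Obenchain / Goldstein-Smith form)
rhs = np.linalg.pinv(C + X.T @ X) @ X.T @ y      # (C + X'X)^+ X'y
print(np.allclose(lhs, rhs))                     # True
```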
III. No Optimal C
Theil (23) shows that if M = bb'X'(Xbb'X' + σ²I)^{-1}, then the mean square error matrix of My, that is E[(My - b)(My - b)'], has the smallest possible diagonal elements. Farebrother (15) shows, alternately, that M can be written (σ² + b'X'Xb)^{-1}bb'X'. One can also show this by using the form of the Gauss-Markov theorem in Liebelt (12) and the tearing matrix result that (Q + cd')^{-1} = Q^{-1} - (1 + d'Q^{-1}c)^{-1}Q^{-1}cd'Q^{-1}, where c and d are vectors and Q is an invertible matrix. The M obtained is, like Hoerl and Kennard's optimal C, not directly applicable, as it depends on the unknown b and σ². However, if there were a C such that (C + X'X)^+X' = bb'X'(σ² + b'X'Xb)^{-1} = qbb'X' (where for simplicity we set q = (σ² + b'X'Xb)^{-1}), we might estimate C and use it to estimate the optimal linear estimator. We now show that no such C exists.
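The equality of Theil's and Farebrother's expressions for M follows from the tearing identity just quoted. A short numerical check, using made-up X, b and σ² purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2 = 20, 3, 2.0
X = rng.normal(size=(n, p))
b = rng.normal(size=p)
bb = np.outer(b, b)

# Theil's form:        M = b b' X' (X b b' X' + sigma2 I)^{-1}
M_theil = bb @ X.T @ np.linalg.inv(X @ bb @ X.T + sigma2 * np.eye(n))

# Farebrother's form:  M = (sigma2 + b'X'Xb)^{-1} b b' X'
M_fare = bb @ X.T / (sigma2 + b @ X.T @ X @ b)

print(np.allclose(M_theil, M_fare))   # True
```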
Suppose, to the contrary, there exists a commuting C such that (C + X'X)^+X'y = qbb'X'y. On substituting y = Xb + e and noting that the equality must hold for all e, hence for e = 0, one can obtain (C + X'X)^+X'e = qbb'X'e. Since this in turn is to hold for all e, (C + X'X)^+X' must equal qbb'X'. Multiplying both sides of (C + X'X)^+X' = qbb'X' on the right by X(X'X)^{-1} yields (C + X'X)^+ = qbb'.

Now, X'X = PΛP' and C = PDP'. Further, with no loss of generality, if C + X'X is non-invertible the diagonal matrix D + Λ may be partitioned as

D + Λ = [ D_1 + Λ_1  0 ]
        [ 0          0 ],

with D_1 + Λ_1 invertible, and P partitioned conformably as (P_1 : P_2). It follows that (C + X'X)^+ = P_1(D_1 + Λ_1)^{-1}P_1'. Similarly, when C + X'X is invertible its inverse is P(D + Λ)^{-1}P'. Substituting into (C + X'X)^+ = qbb' yields P_1(D_1 + Λ_1)^{-1}P_1' = qbb', or

(D_1 + Λ_1)^{-1} = qP_1'bb'P_1 = (√q P_1'b)(√q P_1'b)'.

However, the left side is diagonal while the right side is not, except for the trivial and uninteresting case in which b is a scalar; hence, the conclusion follows from the contradiction.
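The contradiction can also be seen numerically: if (C + X'X)^+ had to equal qbb', then P'(qbb')P would have to be diagonal, which fails for any b with more than one nonzero transformed coordinate. An illustrative check with assumed values:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 20, 3, 1.0
X = rng.normal(size=(n, p))
b = np.array([1.0, 2.0, -1.0])

q = 1.0 / (sigma2 + b @ X.T @ X @ b)
target = q * np.outer(b, b)                  # (C + X'X)^+ would have to equal q b b'

# A commuting C shares the eigenvectors P of X'X, so (C + X'X)^+ must be P (diagonal) P'.
_, P = np.linalg.eigh(X.T @ X)
rotated = P.T @ target @ P
off_diag = rotated - np.diag(np.diag(rotated))
print(np.max(np.abs(off_diag)))              # far from 0: q b b' is not diagonalized by P
```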
IV. Two Choices of C Independent of Unknown Parameters

It has been shown earlier that if the singular value decomposition of X is UΛ^{1/2}P', where C = PDP' with D diagonal and Λ the diagonal matrix of eigenvalues of X'X, then (C + X'X)^+X'y = PAU'y. Here, D = A^+Λ^{1/2} - Λ, or A = (D + Λ)^+Λ^{1/2}. A major problem is the selection of D, or equivalently A. In ridge regression D = kI and the ith diagonal element of A is a_i = λ_i^{1/2}/(k + λ_i). For C = k(X'X)^{-1} the matrix D = kΛ^{-1}, and the ith diagonal element of A is a_i = λ_i^{3/2}(k + λ_i²)^{-1}. For these two choices of C one must select k. If the method used in selecting k is to choose it so that tmse (C + X'X)^{-1}X'y < tmse (X'X)^{-1}X'y, the choice of k depends on the unknown b and σ². On the other hand, Hoerl and Kennard (5) showed that the matrix C = PDP' such that tmse (C + X'X)^+X'y is a minimum has d_i = σ²z_i^{-2}, where the vector z = P'b. This criterion for optimization thus leads to a solution with the same limitation as Theil's: it also depends on the unknown quantities. We note that the optimal A has a_i = λ_i^{1/2}z_i²(σ² + λ_i z_i²)^{-1}.

Two methods of selecting C, or A, are now suggested, where the criterion is to optimize a property of the estimator other than mse or tmse. To develop this we note that

b* = PAU'y = PAU'(UΛ^{1/2}P'b + e) = PAΛ^{1/2}P'b + PAU'e
   = b + P(AΛ^{1/2} - I)P'b + PAU'e
   = b + Σ_{j=1}^p p_j{[a_jλ_j^{1/2} - 1]p_j'b + a_j u_j'e},
where p_j is the jth column of P and u_j is the jth column of U. The scalar (a_jλ_j^{1/2} - 1)(p_j'b) is identified as the jth bias quantity, (a_jλ_j^{1/2} - 1) as the jth bias factor, a_j u_j'e as the jth noise quantity, and a_j as the jth noise factor. The difference b* - b is then Σ_{j=1}^p (p_jB_j + p_jN_j), in which B_j is the bias quantity and N_j the noise quantity. It is also noted that the random variables {u_j'e : j = 1, 2, . . . , p} are uncorrelated, each with mean 0 and variance σ². Thus,

E[(b* - b)'(b* - b)] = Σ_{j=1}^p E[(a_jλ_j^{1/2} - 1)(p_j'b) + a_j u_j'e]²,
and selecting A to minimize the tmse is equivalent to selecting, for each i, a_i to minimize (a_iλ_i^{1/2} - 1)²(p_i'b)² + a_i²σ². This yields a_i = λ_i^{1/2}z_i²(σ² + λ_i z_i²)^{-1}, and hence the optimal C determined by Hoerl and Kennard (5).

However, as mentioned earlier, rather than select a C dependent on (p + 1) unknown quantities, it is recognized that what one would like to do is simultaneously minimize the bias quantity and the noise quantity. Since each of these is a product of an unknown factor and a known factor, the objective can be modified to simultaneously minimizing the bias and noise factors, no matter what the unknown error vector e and unknown b are. Although this cannot be done, in the spirit of least squares the a_i can be selected to minimize the sum of the squares of the bias and noise factors. Elementary calculus shows

d[(a_iλ_i^{1/2} - 1)² + a_i²]/da_i = 0 = 2[(a_iλ_i^{1/2} - 1)λ_i^{1/2} + a_i],
if ai = Af(l+ hi)-‘, and that this yields a minimum. From D = A-‘hf-A, and C = PDP’ we obtain C = I, and the estimator 6* = (I+ X’X)-‘X’y. This is Hoer1 and Kennard’s ridge-regressor with k = 1. Rather than minimize the sum of the squares of the bias and noise factors, one might choose to minimize the sum of their absolute values. That is, one might select the ai to minimize (aihf - I]+ (ail. To do so, it is first meaningful to determine what ai are reasonable and minimize within this set. Comparing the vector term pi[(aiAf - l)plb + aiu;e] in 6* - b and piA;tuIe in 6 - b, it is reasonable that o I ai 5 ATi, or more noise will exist with the biased estimate than the unbiased one. It also seems reasonable that ai approach o as Ai approaches 0, and approaches A$ as hi gets large. (Actually, we note that the optimal ai = Afz?(a2+ AiZ:)-’ has all of these properties.) If then ]aiAi- I]+ [ai] is minimized subject to o 5 ai 5 A$, then the objective function is 1 - aJT’+ ai. For 0 5 ai I Ai this is obviously minimized by setting ai = 0 when hi 5 1 and ai = A$ when hi > 1. It obviously satisfies the other constraints. A1 0 Assuming A= o ~ in which the elements in A, are greater than one and ( 21 these in AZ are less than one, it follows that and
Vol.303,No. 6,June1977
D=A+Af-A=
521
Hence, C + X'X = P_1Λ_1P_1'. This is Marquardt's estimator (14), or principal components estimator, with an explicit cut-off point.

Since the two preceding choices of C are independent of b and σ², because they are obtained by minimizing functions of the bias and noise factors, there will be vectors b for which the tmse of the estimator exceeds that of the least squares estimator. On the other hand, no matter how C is selected, if it is independent of b, σ², and y, it is easy to show there will exist vectors b for which the least squares estimator will have smaller tmse than will (C + X'X)^+X'y.
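Both data-independent choices derived in this section are simple functions of the eigenvalues alone. The following sketch (a hypothetical design, nothing paper-specific) forms the two estimators in the PAU'y form: ridge regression with k = 1 from minimizing the squared factors, and the λ = 1 principal-component cut-off from minimizing the absolute factors.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

U, s, Pt = np.linalg.svd(X, full_matrices=False)
P, lam = Pt.T, s ** 2                            # lam_i = eigenvalues of X'X

# Choice 1: minimize (a_i lam_i^{1/2} - 1)^2 + a_i^2  =>  a_i = lam_i^{1/2}/(1 + lam_i),
#           which is ridge regression with k = 1.
a_sq = np.sqrt(lam) / (1.0 + lam)
b_ridge1 = P @ (a_sq * (U.T @ y))
print(np.allclose(b_ridge1,
                  np.linalg.solve(np.eye(p) + X.T @ X, X.T @ y)))    # True

# Choice 2: minimize |a_i lam_i^{1/2} - 1| + |a_i| over 0 <= a_i <= lam_i^{-1/2}
#           =>  a_i = 0 if lam_i <= 1, a_i = lam_i^{-1/2} if lam_i > 1
#           (principal components with the cut-off at lam = 1).
a_abs = np.where(lam > 1.0, 1.0 / np.sqrt(lam), 0.0)
b_pc = P @ (a_abs * (U.T @ y))
print(b_pc)
```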
V. Functions of k

Although estimators in which the elements of D are functions of a parameter k are not apt to have optimal properties, except for ridge regression with k = 1, they may still hold practical advantages for an analyst. This is because such an estimator may have smaller tmse than (X'X)^{-1}X'y. For example, Hoerl and Kennard showed tmse (kI + X'X)^{-1}X'y < tmse (X'X)^{-1}X'y if 0 < k < σ²/max{z_i² : z = P'b}. In general, if C(k) = PD(k)P', where the ith diagonal element of D(k) is written d_i(λ_i, k) to show it as a function of the ith eigenvalue of X'X and a parameter k, and if d_i(λ_i, k) > 0 and has limit zero as k approaches zero through positive values, then there exists a K such that if 0 < k < K, tmse b*(k) < tmse (X'X)^{-1}X'y. Since

tmse b*(k) = Σ_{i=1}^p [σ²λ_i + z_i²d_i(λ_i, k)²] / [d_i(λ_i, k) + λ_i]²,

then if C_j(k) = PD_j(k)P', j = 1, 2, with the ith diagonal element of D_j(k) written d_i(λ_i, k : j), a sufficient condition for b*_1(k) to be tmse better than b*_2(k) is that

d/dk {Σ_{i=1}^p [σ²λ_i + z_i²d_i(λ_i, k : 1)²] / [d_i(λ_i, k : 1) + λ_i]²} |_{k=0}
    < d/dk {Σ_{i=1}^p [σ²λ_i + z_i²d_i(λ_i, k : 2)²] / [d_i(λ_i, k : 2) + λ_i]²} |_{k=0}.
As an example, consider C_1(k) = k(X'X)^{-1} and C_2(k) = kI, so that d_i(λ_i, k : 1) = kλ_i^{-1} and d_i(λ_i, k : 2) = k. We find

d/dk {Σ_{i=1}^p [σ²λ_i + z_i²(kλ_i^{-1})²] / [kλ_i^{-1} + λ_i]²} |_{k=0} = -2σ² Σ_{i=1}^p λ_i^{-3},
while
It follows that (k(X’X)-’
+ X’X)-lX’y
2cr* 2 (Ai)-3-2U2
is tmse better than (kl+ X’X)-‘X’y
igl (hi)-2= 2~’
i=l
f
if
(1 -Ai)(‘>O.
i=l
If the eigenvalues of X'X are ordered as λ_1 ≤ λ_2 ≤ . . . ≤ λ_t ≤ 1 < λ_{t+1} ≤ . . . ≤ λ_p, this is equivalent to

Σ_{i=1}^t (1 - λ_i)λ_i^{-3} > Σ_{i=t+1}^p (λ_i - 1)λ_i^{-3}.
Although this is a sufficient condition for b*_1(k) to be tmse better than b*_2(k), its application may be simplified. The maximum of (x - 1)x^{-3}, for x > 0, is 4/27, which occurs at x = 3/2. Thus, the inequality condition will hold if

Σ_{i=1}^t (1 - λ_i)λ_i^{-3} > (p - t)(4/27).
It is easily shown that this latter holds if (p - t) ≤ 6 and at least one eigenvalue of X'X is less than 0.6. This latter is apt to occur, of course, when the matrix X makes biased estimates reasonable in the first place.

An attractive alternative criterion for selecting within such a class can be obtained by rewriting the estimators as b*_1(k) = PA_1(k)U'y and b*_2(k) = PA_2(k)U'y. In this form it is reasonable to identify b*_1(k) as optimal better than b*_2(k) if each element in the diagonal matrix A_1(k) is closer to the corresponding element of an optimal A than is the corresponding element of A_2(k). As an example, consider the optimal A as the one obtained by minimizing |a_iλ_i^{1/2} - 1| + |a_i|, subject to 0 ≤ a_i ≤ λ_i^{-1/2}. As shown earlier, this leads to a_i = 0 if λ_i ≤ 1, and a_i = λ_i^{-1/2} if λ_i > 1. Now, for b*_1(k) = (k(X'X)^{-1} + X'X)^{-1}X'y the ith diagonal element of A_1(k) is λ_i^{3/2}/(k + λ_i²), while for b*_2(k) = (kI + X'X)^{-1}X'y the ith diagonal element of A_2(k) is λ_i^{1/2}/(k + λ_i). Thus, the ratio of the ith diagonal element of A_1(k) to that of A_2(k) is λ_i(λ_i + k)(λ_i² + k)^{-1}. If λ_i ≤ 1 this ratio is equal to or less than 1; if λ_i > 1, the ratio is greater than 1. Thus, both diagonal elements are between 0 and λ_i^{-1/2}, with the element from A_1(k) closer to 0 than that from A_2(k) when λ_i ≤ 1 and closer to λ_i^{-1/2} when λ_i > 1. Using the optimal A identified here, we conclude that (k(X'X)^{-1} + X'X)^{-1}X'y is optimal better than (kI + X'X)^{-1}X'y.
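A numerical check of the k = 0 slope comparison, under an assumed spectrum, z = P'b and σ² (all invented for illustration): when at least one eigenvalue is well below 1, the tmse of (k(X'X)^{-1} + X'X)^{-1}X'y falls faster near k = 0 than that of (kI + X'X)^{-1}X'y.

```python
import numpy as np

# Assumed spectrum and transformed coefficients z = P'b (not taken from the paper).
lam = np.array([0.05, 0.4, 1.2, 3.0])
z = np.array([1.0, -0.5, 2.0, 0.3])
sigma2 = 1.0

def tmse(k, d):
    # tmse b*(k) = sum_i [sigma2 lam_i + z_i^2 d_i(k)^2] / [d_i(k) + lam_i]^2
    di = d(k)
    return np.sum((sigma2 * lam + z ** 2 * di ** 2) / (di + lam) ** 2)

d1 = lambda k: k / lam                    # C_1(k) = k (X'X)^{-1}
d2 = lambda k: k * np.ones_like(lam)      # C_2(k) = k I

eps = 1e-6                                # numerical derivatives at k = 0
slope1 = (tmse(eps, d1) - tmse(0.0, d1)) / eps
slope2 = (tmse(eps, d2) - tmse(0.0, d2)) / eps
print(slope1, -2 * sigma2 * np.sum(lam ** -3))   # approx -2 sigma2 sum lam_i^{-3}
print(slope2, -2 * sigma2 * np.sum(lam ** -2))   # approx -2 sigma2 sum lam_i^{-2}
print(slope1 < slope2)                           # True: the C_1(k) family improves tmse faster
```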
VI. Non-Standardization

None of the preceding has made reference to whether or not X is in standard or correlation form; all the previous results are therefore applicable in either case. It is usual, though, for X to be in standard form (3). A major reason seems to be that if one estimates b with (kI + X'X)^{-1}X'y using the original data and then changes the scale in X, the effect of k on the different components will change.
There is one time, however, when the variables should not be standardized. Explicitly, the variables should not be standardized if the model is the no-intercept model and one wishes an estimator in the original units which has smaller tmse than does the least squares estimator. It will be shown for the no-intercept model that standardizing the variables, estimating the transformed parameters by an estimator in the class being discussed, and converting this to an estimator in the original units leads to an estimator in the original units with a tmse greater than that of the least squares estimator.

Consider then y = Xb + e, where no column of X is a vector of all ones. Let 1 indicate a vector of all ones and x̄' = n^{-1}1'X indicate the row vector of the averages of the columns of X. The model can then be written y = (X - 1x̄' + 1x̄')b + e = (X - 1x̄')b + 1x̄'b + e. Also, ȳ = n^{-1}1'y = x̄'b + ē, so

y - ȳ1 = (X - 1x̄')b + e - ē1 = xb + ε,

where x = X - 1x̄'. If now q_j² = n^{-1} Σ_{i=1}^n (X_{ij} - x̄_j)², then the diagonal matrix Q = diag(q_1, q_2, . . . , q_p) can be formed, and the model written as y - ȳ1 = xQ^{-1}Qb + ε = xQ^{-1}β + ε, with β = Qb. Since each element of xQ^{-1} is of the form (X_{ij} - x̄_j)/q_j,
the model is in standard form in terms of the parameter vector β. If C(k) is a matrix which commutes with Q^{-1}x'xQ^{-1} and whose positive eigenvalues approach 0 as k approaches 0, then a biased estimator of β of the type considered in this paper, β*, satisfies

(C(k) + Q^{-1}x'xQ^{-1})β* = Q^{-1}x'(y - ȳ1).

Since the columns of x sum to zero, x'(y - ȳ1) = x'y, so this is the same as Q^{-1}(QC(k)Q + x'x)Q^{-1}β* = Q^{-1}x'y. Using the obvious transformation between the estimators, b*(k) = Q^{-1}β*, one finds (QC(k)Q + x'x)b*(k) = x'y. It follows that

tmse b*(k) = σ² tr[x(QC(k)Q + x'x)^{-2}x'] + b'[(QC(k)Q + x'x)^{-1}x'x - I]'[(QC(k)Q + x'x)^{-1}x'x - I]b.
It follows that the limit as k approaches 0 of tmse b*(k) is σ² tr(x'x)^{-1} = σ² tr(X'X - n^{-1}X'JX)^{-1}, in which J = 11' is the n x n matrix of all ones. This assumes, of course, that the inverse of x'x exists, which it will if the vector 1 is not in the column space of X; we make this additional assumption now. Using the tearing operation quoted earlier, it is found that

σ² tr(x'x)^{-1} = σ² tr(X'X)^{-1} + σ² [n^{-1}1'X(X'X)^{-2}X'1] / [1 - n^{-1}1'X(X'X)^{-1}X'1],

if 1 - n^{-1}1'X(X'X)^{-1}X'1 ≠ 0. However, this denominator is identical to n^{-1}1'[I - X(X'X)^{-1}X']1, which is greater than 0 because I - X(X'X)^{-1}X' is positive semi-definite and 1 is not in the column space of X. Thus, we can state that tmse
b*(k) = tmse (QC(k)Q + x'x)^{-1}x'y > tmse (X'X)^{-1}X'y if the eigenvalues of C(k) are positive and close to 0.

Interestingly, the same result does not hold in the intercept model. That is, if y = (1 : X)(b_0, b')' + e, where 1 is not in the column space of X, the model can again be transformed into y - ȳ1 = xb + ε = xQ^{-1}β + ε. However, in this case x is constructed only from X and not from the entire coefficient matrix (1 : X). In this case, b*(k) is also equal to (QC(k)Q + x'x)^{-1}x'y, and its tmse, as the eigenvalues of C(k) go to zero, is σ² tr(x'x)^{-1}. Searle (21) shows, however, that the least squares estimators of b_0 and b are ȳ - b̂'x̄ and b̂ = (x'x)^{-1}x'y. Thus, the limit of tmse b*(k) equals tmse b̂, and is not larger than tmse b̂ as it was in the no-intercept model.
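The inequality σ² tr(x'x)^{-1} > σ² tr(X'X)^{-1} that drives the no-intercept result, and the tearing identity used to prove it, can both be checked directly; the sketch below uses an assumed uncentered design (purely illustrative) so that 1 is not in the column space of X.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma2 = 30, 3, 1.0
X = rng.normal(loc=2.0, size=(n, p))     # no-intercept model with uncentered columns
one = np.ones(n)

x_c = X - np.outer(one, X.mean(axis=0))  # centered matrix x = X - 1 xbar'

lim_tmse_biased = sigma2 * np.trace(np.linalg.inv(x_c.T @ x_c))   # limit of tmse b*(k) as k -> 0
tmse_ls = sigma2 * np.trace(np.linalg.inv(X.T @ X))               # tmse of (X'X)^{-1} X'y
print(lim_tmse_biased > tmse_ls)         # True when 1 is not in the column space of X

# The tearing identity from the text:
XtX_inv = np.linalg.inv(X.T @ X)
u = X.T @ one                            # X'1
correction = sigma2 * (u @ XtX_inv @ XtX_inv @ u / n) / (1.0 - u @ XtX_inv @ u / n)
print(np.isclose(lim_tmse_biased, tmse_ls + correction))          # True
```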
VII. Conclusion

Biased regression estimators of the form b* = (C + X'X)^+X'y have been developed recently, since such estimators may have smaller mean square error, component by component, than the corresponding elements of (X'X)^{-1}X'y. Writing the singular value decomposition of X as UΛ^{1/2}P', it is shown that the class of estimators of the form PAU'y, in which A is diagonal, is the same class; hence, as the elements of A approach Λ^{-1/2}, estimators of this class have smaller mean square error, component by component, than the corresponding elements of the least squares estimator. It is also shown to be the same as the class in which b* = QX'y, where Q commutes with X'X.

The estimator (C + X'X)^+X'y is, of course, linear in y. It is shown, however, that there is no C such that the estimator (C + X'X)^+X'y is the linear estimator with the smallest mean square error, component by component. As shown by Hoerl and Kennard, there is a commuting matrix C = PDP', with d_i = σ²/z_i², such that tmse b* is minimized. Unfortunately, the elements of D depend on the unknown σ² and b. To circumvent this, it is shown that b* - b is the sum of bias vectors and noise vectors, and two estimators exist in which a function of the bias and noise factors is minimized. Minimizing either of these, no matter what the noise is, provides estimators which do not depend on unknown parameters. Of the two, the one with the more intuitive appeal to the author is the one which minimizes |a_iλ_i^{1/2} - 1| + |a_i|.

Subsequently, two methods of choosing between estimators in which the diagonal elements of D are functions of a parameter k were proposed. This provides methods of comparison between such estimators; since each might have smaller tmse than (X'X)^{-1}X'y, such methods seem desirable. Both methods show (k(X'X)^{-1} + X'X)^{-1}X'y to be better than (kI + X'X)^{-1}X'y.

Finally, a problem with standardization is discussed, because most of the literature seems to accept standardization as the norm. It is shown for the no-intercept model that standardizing, estimating the transformed parameter vector with a biased estimator, and then re-transforming to the original units yields an estimator with larger tmse than the least squares estimator.

References

(1) R. F. Gunst, J. T. Webster and R. L. Mason, "A Comparison of Least Squares and Latent Root Regression Estimators", Technometrics, Vol. 18, pp. 75-83, 1976.
(2) R. R. Hocking, F. M. Speed and M. J. Lynn, "A Class of Biased Estimators in Linear Regression", Technometrics, Vol. 18, pp. 425-437, 1976.
(3) D. W. Marquardt and R. D. Snee, "Ridge Regression in Practice", The American Statistician, Vol. 29, pp. 3-19, 1975.
(4) S. M. Sidik, "Comparison of Some Biased Estimation Methods (Including Ordinary Subset Regression) in the Linear Model", Technical Report No. NASA TN D-7932, National Aeronautics and Space Admin., Lewis Research Center, Cleveland, Ohio.
(5) A. E. Hoerl and R. W. Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems", Technometrics, Vol. 12, pp. 55-67, 1970.
(6) J. M. Lowerre, "On the Mean Square Error of Parameter Estimates for Some Biased Estimators", Technometrics, Vol. 16, pp. 461-467, 1974.
(7) T. Dwivedi, "Some Properties of a General Family of Biased Estimators", Ph.D. Dis., Dept. of Mathematics, Clarkson College of Technology, Potsdam, New York, 1974.
(8) W. J. Hemmerle, "An Explicit Solution for Generalized Ridge Regression", Technometrics, Vol. 17, 1975.
(9) F. Graybill, "Introduction to Matrices with Applications in Statistics", Wadsworth Pub. Co., Belmont, Calif., 1969.
(10) A. E. Hoerl and R. W. Kennard, "Ridge Regression: Applications to Nonorthogonal Problems", Technometrics, Vol. 12, pp. 69-82, 1970.
(11) A. E. Hoerl and R. W. Kennard, "A Note on a Power Generalization of Ridge Regression", Technometrics, Vol. 17, p. 269, 1975.
(12) P. B. Liebelt, "An Introduction to Optimal Estimation", Addison-Wesley, Reading, Mass., 1967.
(13) L. Crone, "The Singular Value Decomposition of Matrices and Cheap Numerical Filtering of Systems of Linear Equations", J. Franklin Inst., Vol. 294, No. 2, pp. 133-136, 1972.
(14) D. W. Marquardt, "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation", Technometrics, Vol. 12, pp. 591-612, 1970.
(15) R. W. Farebrother, "The Minimum Mean Square Error Linear Estimator and Ridge Regression", Technometrics, Vol. 17, pp. 127-128, 1975.
(16) G. C. McDonald and D. I. Galarneau, "A Monte Carlo Evaluation of Some Ridge-Type Estimators", J. Am. Statist. Ass., Vol. 70, pp. 407-416, 1975.
(17) L. S. Mayer and T. A. Wilke, "On Biased Estimation in Linear Models", Technometrics, Vol. 15, pp. 497-508, 1973.
(18) B. Noble, "Applied Linear Algebra", Prentice-Hall, Inc., Englewood Cliffs, N.J., 1969.
(19) R. L. Obenchain, "Ridge Analysis Following a Preliminary Test of the Shrunken Hypothesis", Technometrics, Vol. 17, pp. 431-441, 1975.
(20) M. Perlman, "Reduced Mean Square Error Estimation for Several Parameters", Sankhya, Ser. B, Vol. 34, 1972.
(21) S. R. Searle, "Linear Models", Wiley & Sons, Inc., New York, 1971.
(22) M. Goldstein and A. F. M. Smith, "Ridge-Type Estimators for Regression Analysis", J. Roy. Statist. Soc., Ser. B, Vol. 36, pp. 284-291, 1974.
(23) H. Theil, "Principles of Econometrics", North Holland Pub. Co., Amsterdam, 1971.
(24) J. T. Webster, R. F. Gunst and R. L. Mason, "Latent Root Regression Analysis", Technometrics, Vol. 16, pp. 513-522, 1974.