Economics Letters 3 (1979) 159-164 © North-Holland Publishing Company

A 'CONSERVATIVE' RIDGE ESTIMATOR *

Frank MOULAERT
Centrum voor Economische Studiën, K.U. Leuven, Belgium

Received 11 July 1979
In this paper an alternative to the Ordinary Ridge Estimator (ORE) introduced by Hoerl and Kennard (1970) is proposed. This estimator is called a 'Conservative' Ridge Estimator (CRE), because it puts a heavier weight on the unbiasedness and a smaller weight on the statistical stability of the 'unstable' estimation components than the ORE does. Nevertheless, the CRE often performs better in terms of the Mean Square Error of estimation.
1. Two ways of decomposing the OLSE

The model under study is the well-known linear homoskedastic model,

$$y = X\beta + \varepsilon, \qquad (1)$$

where $X \stackrel{\text{def}}{=} [x_1, \ldots, x_i, \ldots, x_m]$ for $x_i \in \mathbb{R}^T$ (all $i$).
$y_t$ is a typical element of $y$, and $x_{it}$ is a typical element of $x_i$ ($t = 1, \ldots, T$; $T$ refers to the sample size). The stochastic variable $\varepsilon_t$ is a typical element of $\varepsilon$ ($t = 1, \ldots, T$) obeying the statistical properties of centrality, i.e., $E(\varepsilon_t) = 0$ (all $t$), and of homoskedasticity, i.e., $E(\varepsilon_t \varepsilon_{t'}) = 0$ (all $t, t'$: $t \neq t'$) and $E(\varepsilon_t^2) = \sigma^2$ (all $t$). All observations $y_t$ and $x_{it}$ are standardized, i.e., centered about their arithmetic mean and divided by their standard error times $T$. In addition, it is assumed that $X$ is a non-stochastic matrix of full column rank. The singular value decomposition of $X$ is given as

$$X = U\Lambda P', \qquad (2)$$

such that $X'X = P\Lambda^2 P'$ for $P'P = PP' = I_m$ (the identity matrix of dimension $m$) and
* I wish to thank Tony E. Smith, Stephen Gale, Yoel Haitovsky and Essie Maasoumi for their comments.
F.Moulaert /A ‘conservative’ ridge estimator
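$\Lambda^2 = \operatorname{diag}(\delta_1^2, \ldots, \delta_i^2, \ldots, \delta_m^2)$ such that $\delta_i > \delta_{i+1} > 0$ (all $i$), and $U \stackrel{\text{def}}{=} XP\Lambda^{-1}$ (as a consequence, $U'U = I_m$). For the purpose of this note, use will be made of the following two decompositions of the OLSE $\hat\beta$ of the vector of parameters $\beta$ in (1), namely

$$\hat\beta = (X'X)^{-1}X'y = (P\Lambda^{-2}P')X'y, \qquad (3)$$

and

$$\hat\beta = (X'X)^{-1}X'y = P\Lambda^{-1}\alpha, \qquad \text{for } \alpha \stackrel{\text{def}}{=} U'y. \qquad (4)$$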
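Both decompositions are easy to check numerically. The following is a minimal sketch, assuming NumPy and a small simulated data set (sample size, regressors and parameter values are hypothetical, not taken from this note). It relies on `np.linalg.svd`, which returns the singular values in descending order, so `delta[i]` plays the role of $\delta_i$ and the columns of `P` those of $P$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 50, 4

# Hypothetical design: random regressors, centered and scaled to unit length.
X = rng.normal(size=(T, m))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
beta = np.array([1.0, -0.5, 0.25, 2.0])  # hypothetical true parameters
y = X @ beta + 0.1 * rng.normal(size=T)
y -= y.mean()

# Thin SVD: X = U diag(delta) P', with delta_1 >= ... >= delta_m > 0.
U, delta, Pt = np.linalg.svd(X, full_matrices=False)
P = Pt.T

alpha = U.T @ y                                        # alpha = U'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)           # (X'X)^{-1} X'y
beta_eq3 = P @ np.diag(delta**-2.0) @ P.T @ X.T @ y    # (3): P Lambda^{-2} P' X'y
beta_eq4 = P @ (alpha / delta)                         # (4): P Lambda^{-1} alpha

print(np.allclose(beta_ols, beta_eq3), np.allclose(beta_ols, beta_eq4))
```

Form (4) is the cheaper one once the SVD is available: it only divides each $\alpha_i$ by $\delta_i$, which makes the role of a small $\delta_m$ directly visible.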
2. Unbiasedness versus statistical stability

2.1. Statistical instability of the OLSE
Many authors have illustrated how the statistical stability of the OLSE is strongly dependent on the lower bound of the spectrum of eigenvalues of the cross-product matrix $X'X$.¹ For example, if one measures the statistical stability of the OLSE by means of its Mean Square Error (MSE), i.e.,

$$\operatorname{MSE}(\hat\beta) \stackrel{\text{def}}{=} E[(\hat\beta - \beta)'(\hat\beta - \beta)] = \operatorname{tr}\operatorname{cov}(\hat\beta) = \sigma^2 \sum_{i=1}^{m} \frac{1}{\delta_i^2},$$

one notices that, for given finite $\sigma^2$,

$$\lim_{\delta_m \to 0} \operatorname{tr}\operatorname{cov}(\hat\beta) = \infty.$$
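For a concrete (hypothetical) case, take two standardized regressors with correlation $\rho$, so that $X'X = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ has eigenvalues $1 \pm \rho$ and $\delta_m = \sqrt{1 - \rho}$. Then $\operatorname{MSE}(\hat\beta) = \sigma^2[1/(1+\rho) + 1/(1-\rho)] = 2\sigma^2/(1-\rho^2)$, which diverges as $\rho \to 1$. The short NumPy sketch below just evaluates the formula above; the function names are illustrative.

```python
import numpy as np

def ols_mse(delta, sigma2):
    """MSE of the OLSE: sigma^2 * sum_i 1/delta_i^2 (X'X has eigenvalues delta_i^2)."""
    delta = np.asarray(delta, dtype=float)
    return sigma2 * np.sum(1.0 / delta**2)

def singular_values_for_correlation(rho):
    """Singular values of a hypothetical two-column standardized X with
    X'X = [[1, rho], [rho, 1]]; they are sqrt(1 + rho) and sqrt(1 - rho)."""
    return np.sqrt(np.array([1.0 + rho, 1.0 - rho]))

sigma2 = 1.0
for rho in (0.0, 0.9, 0.99, 0.999):
    d = singular_values_for_correlation(rho)
    print(f"rho={rho:<6} delta_m={d[-1]:.4f}  MSE(OLSE)={ols_mse(d, sigma2):.1f}")
```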
2.2. Ridge type estimators

To deal with this condition of statistical instability, one often relies on biased estimators in general, and ridge type estimators in particular.² Following, and at the same time generalizing, Haitovsky and Wax (1976),³ the class of ridge estimators under investigation is defined as

$$\beta^*(k, S) \stackrel{\text{def}}{=} [X'X + kS]^{-1}X'y, \quad \text{where } S \stackrel{\text{def}}{=} P\mathcal{S}P' \text{ for } \mathcal{S} \stackrel{\text{def}}{=} \operatorname{diag}(s_1, \ldots, s_i, \ldots, s_m), \text{ all } s_i > 0, \qquad (5)$$
¹ For procedures to decide on the 'smallness' of $\delta_m$, see Belsley (1976, part III).
² For a good survey of ridge type estimators, see Vinod (1978).
³ For this generalization, see Moulaert (1979, p. 5 ff.).
and $k$ some positive scalar.⁴ The estimator(s) $\beta^*(k, S)$ can be decomposed as $\beta^*(k, S) = P\gamma^*(k, S)$, where the typical element of $\gamma^*$ equals

$$\gamma_i^*(k, S) = \frac{\delta_i \alpha_i}{\delta_i^2 + k s_i},$$

and where $\alpha_i$ is the typical element of the vector $\alpha$ defined in (4). The $\gamma_i^*$ (all $i$) are called 'estimation components' of $\beta^*$. (In fact, they are the elements of the estimator $\gamma^*$ of $\gamma \stackrel{\text{def}}{=} P'\beta$.) Given $k = 0$, estimation components corresponding with small $\delta_i$ are called 'unstable'; the others are called 'stable' estimation components.

2.3. Unbiasedness versus statistical stability

The MSE of $\beta^*(k, S)$ equals
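$$E[(\beta^*(k, S) - \beta)'(\beta^*(k, S) - \beta)],$$

which can be decomposed into the total variance $f_1$ and the total squared bias $f_2$, i.e.,

$$\operatorname{MSE}[\beta^*(k, S)] = f_1[\beta^*(k, S)] + f_2[\beta^*(k, S)],$$

where

$$f_1[\beta^*(k, S)] = \sigma^2 \sum_{i=1}^{m} \frac{\delta_i^2}{(\delta_i^2 + k s_i)^2}, \qquad (6)$$

and

$$f_2[\beta^*(k, S)] = k^2 \sum_{i=1}^{m} \frac{s_i^2 \gamma_i^2}{(\delta_i^2 + k s_i)^2}. \qquad (7)$$

It is shown in Moulaert (1979) that there always exists a $k > 0$ such that $\operatorname{MSE}[\beta^*(k, S)] < \operatorname{MSE}(\hat\beta)$. Moreover, one can easily verify that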
⁴ The problem of selecting a statistically optimal $k$ is not treated in this note; for this problem, see again Vinod (1978). It is important to point out that the philosophy behind most of the selection criteria is to find a small $k$ which leads the estimator, without too much shrinking, into acceptable (depending upon the criterion utilized) statistical stability boundaries. (For the link between the shrinking of the estimation components and of the estimator on the one hand, and statistical stability on the other, see footnote 8.)
(i) the first partial derivative of $f_1$ with respect to $k$ is strictly negative, and (ii) the first partial derivative of $f_2$ with respect to $k$ is strictly positive. Hence, as $k$ increases, the ridge estimators in (5) behave better and better in terms of statistical stability as measured by $f_1$, but worse and worse in terms of unbiasedness as measured by $f_2$.
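A minimal NumPy sketch of the generalized ridge estimator (5), computed through its estimation components, together with the variance and squared-bias terms (6) and (7). The function names are hypothetical, and the weights `s` are assumed to be given in the same (descending $\delta_i$) order as the singular values returned by `np.linalg.svd`. The last lines check, on illustrative values, that $f_1$ falls and $f_2$ rises with $k$, and that a small $k$ already improves on the OLSE ($k = 0$).

```python
import numpy as np

def generalized_ridge(X, y, k, s):
    """beta*(k, S) = [X'X + kS]^{-1} X'y with S = P diag(s) P', computed via
    the estimation components gamma*_i = delta_i alpha_i / (delta_i^2 + k s_i)."""
    U, delta, Pt = np.linalg.svd(X, full_matrices=False)
    alpha = U.T @ y
    gamma_star = delta * alpha / (delta**2 + k * s)
    return Pt.T @ gamma_star          # beta* = P gamma*

def f1(delta, k, s, sigma2):
    """Total variance (6): sigma^2 sum_i delta_i^2 / (delta_i^2 + k s_i)^2."""
    return sigma2 * np.sum(delta**2 / (delta**2 + k * s) ** 2)

def f2(delta, gamma, k, s):
    """Total squared bias (7): k^2 sum_i s_i^2 gamma_i^2 / (delta_i^2 + k s_i)^2,
    where gamma = P' beta."""
    return k**2 * np.sum((s * gamma) ** 2 / (delta**2 + k * s) ** 2)

# Illustrative values (hypothetical): one small singular value, s_i = 1.
delta = np.array([1.5, 1.0, 0.1])
gamma = np.array([1.0, -2.0, 0.5])
s = np.ones(3)
ks = np.linspace(0.01, 2.0, 50)
var_path = np.array([f1(delta, k, s, 1.0) for k in ks])
bias_path = np.array([f2(delta, gamma, k, s) for k in ks])
print(np.all(np.diff(var_path) < 0), np.all(np.diff(bias_path) > 0))

mse_ols = f1(delta, 0.0, s, 1.0)   # k = 0 gives the OLSE, which has no bias
mse_small_k = f1(delta, 0.01, s, 1.0) + f2(delta, gamma, 0.01, s)
print(mse_small_k < mse_ols)
```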
3. CRE versus ORE

3.1. Defining the estimators

By setting all $s_i$ in (5) equal to unity, one obtains the ORE $\beta^*(k)$ of $\beta$. In fact, Hoerl and Kennard originally derived this estimator by shifting the spectrum of eigenvalues of $X'X$ upward by a small positive scalar $k$, i.e.,

$$\beta^*(k) = (X'X + kI_m)^{-1}X'y = P(\Lambda^2 + kI_m)^{-1}P'X'y. \qquad (8)$$

The corresponding estimation components then equal

$$\gamma_i^*(k) = \frac{\delta_i \alpha_i}{\delta_i^2 + k}, \qquad \text{all } i.$$
If, on the other hand, one sets $s_i = \delta_i$ (all $i$), one obtains the CRE $\tilde\beta(k)$ of $\beta$. This estimator corresponds with the decomposition of $\hat\beta$ given in expression (4), where the main diagonal of $\Lambda$ has been shifted upward by a positive scalar $k$, i.e.,

$$\tilde\beta(k) = (\tilde X'\tilde X)^{-1}\tilde X'y = P(\Lambda + kI_m)^{-1}\alpha, \qquad (9)$$

for $\tilde X = U(\Lambda + kI_m)P'$ or, equivalently, $\tilde X = U(I_m + k\Lambda^{-1})U'X$. Hence, $\tilde X$ can also be regarded as an instrumental transformation of $X$. The estimation components corresponding with the CRE $\tilde\beta(k)$ equal

$$\tilde\gamma_i(k) = \frac{\alpha_i}{\delta_i + k}, \qquad \text{all } i.$$
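The ORE and the CRE are the special cases $s_i = 1$ and $s_i = \delta_i$ of (5). The NumPy sketch below (function names and data are hypothetical) computes both, and checks the CRE against two equivalent routes: the generalized form $[X'X + kP\Lambda P']^{-1}X'y$ and least squares on the transformed design $\tilde X = U(\Lambda + kI_m)P'$, reading (9) as $(\tilde X'\tilde X)^{-1}\tilde X'y$.

```python
import numpy as np

def ore(X, y, k):
    """Ordinary ridge estimator (8): (X'X + k I_m)^{-1} X'y."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(m), X.T @ y)

def cre(X, y, k):
    """'Conservative' ridge estimator (9): P (Lambda + k I_m)^{-1} alpha,
    i.e. estimation components alpha_i / (delta_i + k)."""
    U, delta, Pt = np.linalg.svd(X, full_matrices=False)
    alpha = U.T @ y
    return Pt.T @ (alpha / (delta + k))

def cre_generalized(X, y, k):
    """CRE as beta*(k, S) with S = P Lambda P' (that is, s_i = delta_i)."""
    _, delta, Pt = np.linalg.svd(X, full_matrices=False)
    S = Pt.T @ np.diag(delta) @ Pt
    return np.linalg.solve(X.T @ X + k * S, X.T @ y)

def cre_transformed(X, y, k):
    """CRE as least squares on the transformed design X~ = U (Lambda + k I_m) P'."""
    U, delta, Pt = np.linalg.svd(X, full_matrices=False)
    X_tilde = U @ np.diag(delta + k) @ Pt
    return np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))   # hypothetical data
y = rng.normal(size=40)
k = 0.2
print(ore(X, y, k))
print(cre(X, y, k))
```

Writing the ORE component as $\alpha_i/(\delta_i + k/\delta_i)$ makes the contrast explicit: for $\delta_i < 1$ the ORE divides by the larger quantity $\delta_i + k/\delta_i$ and therefore shrinks the unstable components harder than the CRE, which divides by $\delta_i + k$.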
3.2. Comparative analytical evaluation⁵

By use of (6) and (7) it can be verified that, given $k$,

(i) for all $\delta_i < 1$, the corresponding terms of the sum in $f_1$, i.e., the total variance of the estimator, will be smaller for the ORE than for the CRE. They will, however, be larger for all $\delta_i > 1$;
⁵ For a geometrical interpretation of both ORE and CRE, see Moulaert (1979, p. 13 ff.).
(ii) for all $\delta_i < 1$, the corresponding terms of the sum in $f_2$, i.e., the total squared bias of the estimator, will be smaller for the CRE than for the ORE. They will, however, be larger for all $\delta_i > 1$;

(iii) the terms in $f_1$ ($f_2$) corresponding with $\delta_i = 1$ are identical for the ORE and the CRE.

In other words: for given $k$, the ORE weights the statistical stability of the unstable estimation components more heavily than the CRE does, whereas the CRE puts more weight on the unbiasedness of the unstable estimation components. (For the stable components, the reverse holds.)

3.3. A Bayesian interpretation

Assume that the elements of $\varepsilon$ in (1) are distributed
independently and normally, i.e.,

$$\varepsilon \sim N(0, \sigma^2 I_T), \qquad (10)$$

and that the following a priori information on $\beta$ is given:

$$r = R\beta + e, \qquad e \sim N(0, I_n), \qquad (11)$$
for $n \le m$, $r \in \mathbb{R}^n$ and $R$ a given $n \times m$ non-stochastic matrix with rank $n$.⁶ Following Theil (1971, p. 670 ff.), $(r, R)$ is here regarded as a second sample which is statistically independent of the sample $(y, X)$. In addition, it is assumed that $\beta$ and $\log\sigma$ are uniformly and independently distributed a priori as

$$p(\beta, \sigma) \propto \frac{1}{\sigma}, \qquad \text{for } -\infty < \beta_i < \infty \ (\text{all } i), \quad 0 < \sigma < \infty. \qquad (12)$$
By application of Bayes' theorem and integration of the joint posterior density function $p(\beta, \sigma \mid y, r)$ with respect to $\sigma$, a posterior density function for $\beta$ is obtained. An approximative posterior mean of $\beta$ equals

$$\bar\beta = (X'X + \hat\sigma^2 R'R)^{-1}(X'y + \hat\sigma^2 R'r), \qquad (13)$$

where $\hat\sigma^2$ is the OLSE of $\sigma^2$. For $r = 0$ (and, hence, $\beta = 0$ a priori), expression (13) reduces to an ORE and a CRE, respectively, when $R = I_m$ and $R = \Lambda^{1/2}P'$.⁷
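The reduction of (13) can be checked numerically. The sketch below assumes NumPy; the function name and data are hypothetical, and $\hat\sigma^2$ is fixed at an illustrative value rather than estimated, since the identity holds for any positive $\hat\sigma^2$. With $r = 0$ it compares the mixed-estimation form of (13) with the ORE and the CRE at $k = \hat\sigma^2$.

```python
import numpy as np

def approx_posterior_mean(X, y, R, r, sigma2_hat):
    """(13): (X'X + sigma2_hat R'R)^{-1} (X'y + sigma2_hat R'r), for prior
    information r = R beta + e with e ~ N(0, I_n)."""
    A = X.T @ X + sigma2_hat * (R.T @ R)
    b = X.T @ y + sigma2_hat * (R.T @ r)
    return np.linalg.solve(A, b)

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))  # hypothetical data
y = rng.normal(size=40)
sigma2_hat = 0.7              # illustrative stand-in for the OLSE of sigma^2
U, delta, Pt = np.linalg.svd(X, full_matrices=False)
r0 = np.zeros(3)

# R = I_m: the posterior mean is the ORE with k = sigma2_hat.
post_ore = approx_posterior_mean(X, y, np.eye(3), r0, sigma2_hat)
ore_k = np.linalg.solve(X.T @ X + sigma2_hat * np.eye(3), X.T @ y)

# R = Lambda^{1/2} P': R'R = P Lambda P', so the posterior mean is the CRE.
R_cre = np.diag(np.sqrt(delta)) @ Pt
post_cre = approx_posterior_mean(X, y, R_cre, r0, sigma2_hat)
cre_k = Pt.T @ ((U.T @ y) / (delta + sigma2_hat))

print(np.allclose(post_ore, ore_k), np.allclose(post_cre, cre_k))
```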
⁶ For a given non-singular variance-covariance matrix $V$, without loss of generality, any a priori information of the form $r = R\beta + e$, $e \sim N(0, V)$, with $n \le m$, $r \in \mathbb{R}^n$ and $R$ given and non-stochastic of rank $n$, can be transformed into a priori information of the form (11). See Theil (1971, p. 670).
⁷ In view of footnote 6, given $V = \sigma_v^2 I_m$, $R$ would equal $(1/\sigma_v)I_m$ and $(1/\sigma_v)\Lambda^{1/2}P'$, respectively. In that case, $k = \hat\sigma^2/\sigma_v^2$ instead of $\hat\sigma^2$ solely.
In view of (11), $r$ can be regarded as an unbiased a priori estimator of $\beta$ corresponding with the (posterior) ORE, whereas $\Lambda^{-1/2}r$ can be regarded as an unbiased a priori estimator of $\gamma$ corresponding with the (posterior) CRE. Hence, the a priori estimator corresponding with the CRE has a tendency to inflate the (norm of the) unstable estimation components and may deflate the stable estimation components, whereas the ORE a priori estimator remains neutral in this respect. Since there exists a direct link between the norm of an estimation component and its impact on the statistical stability as measured by $f_1$,⁸ the above result confirms the conclusions on statistical stability of ORE versus CRE drawn in section 3.2.
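The termwise comparison of section 3.2, to which this argument appeals, is easy to tabulate. The sketch below assumes NumPy; the singular values, $\gamma$, $k$ and $\sigma^2$ are hypothetical. It evaluates the per-component terms of (6) and (7) for $s_i = 1$ (ORE) and $s_i = \delta_i$ (CRE); for the CRE they simplify to $\sigma^2/(\delta_i + k)^2$ and $k^2\gamma_i^2/(\delta_i + k)^2$.

```python
import numpy as np

def component_terms(delta, gamma, k, s, sigma2):
    """Per-component terms of (6) and (7) for weights s_i."""
    denom = (delta**2 + k * s) ** 2
    variance = sigma2 * delta**2 / denom
    sq_bias = k**2 * (s * gamma) ** 2 / denom
    return variance, sq_bias

delta = np.array([2.0, 1.0, 0.5, 0.1])  # hypothetical singular values
gamma = np.ones_like(delta)             # hypothetical gamma = P' beta
k, sigma2 = 0.3, 1.0

var_ore, bias_ore = component_terms(delta, gamma, k, np.ones_like(delta), sigma2)
var_cre, bias_cre = component_terms(delta, gamma, k, delta, sigma2)

for d, vo, vc, bo, bc in zip(delta, var_ore, var_cre, bias_ore, bias_cre):
    print(f"delta={d:4.1f}  var ORE={vo:.4f} CRE={vc:.4f}  "
          f"bias^2 ORE={bo:.4f} CRE={bc:.4f}")
```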
4. Conclusion

The reader may verify that mere algebraic analysis is not powerful enough to discriminate between ORE and CRE on the basis of their comparative performance in terms of MSE. However, Monte Carlo experiments [see Moulaert (1979, p. 18 ff.)] confirm that, for small values of $\sigma$, the 'conservative' stabilization procedure offered by the CRE often dominates the ORE in terms of the MSE of the estimator and that, in general, ORE is a worthy alternative to CRE.
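A small Monte Carlo sketch in the spirit of the experiments cited above can be written in a few lines. It is not a reproduction of Moulaert (1979): the design matrix, $\beta$, $\sigma$, $k$ and the number of replications are all hypothetical choices, and $k$ is held fixed rather than selected from the data. Because both estimators are linear in $y$ for fixed $k$, the simulated MSE should agree with $f_1 + f_2$ from (6) and (7), which the sketch also computes.

```python
import numpy as np

def ore(X, y, k):
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def cre(X, y, k):
    U, delta, Pt = np.linalg.svd(X, full_matrices=False)
    return Pt.T @ ((U.T @ y) / (delta + k))

def theoretical_mse(X, beta, k, sigma, conservative):
    """f1 + f2 from (6)-(7), with s_i = delta_i (CRE) or s_i = 1 (ORE)."""
    _, delta, Pt = np.linalg.svd(X, full_matrices=False)
    gamma = Pt @ beta                     # gamma = P' beta
    s = delta if conservative else np.ones_like(delta)
    denom = (delta**2 + k * s) ** 2
    return np.sum(sigma**2 * delta**2 / denom + k**2 * (s * gamma) ** 2 / denom)

def simulated_mse(X, beta, k, sigma, estimator, reps=10000, seed=0):
    """Average squared estimation error over simulated samples y = X beta + eps."""
    rng = np.random.default_rng(seed)
    errors = np.empty(reps)
    for j in range(reps):
        y = X @ beta + sigma * rng.normal(size=X.shape[0])
        diff = estimator(X, y, k) - beta
        errors[j] = diff @ diff
    return errors.mean()

# Hypothetical nearly collinear design: two columns share a common factor.
rng = np.random.default_rng(42)
T = 30
z = rng.normal(size=T)
X = np.column_stack([z + 0.05 * rng.normal(size=T),
                     z + 0.05 * rng.normal(size=T),
                     rng.normal(size=T)])
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
beta = np.array([1.0, 1.0, 0.5])
k = 0.05

results = {}
for sigma in (0.05, 0.5):
    for name, est, cons in (("ORE", ore, False), ("CRE", cre, True)):
        sim = simulated_mse(X, beta, k, sigma, est)
        theo = theoretical_mse(X, beta, k, sigma, cons)
        results[(sigma, name)] = (sim, theo)
        print(f"sigma={sigma}  {name}: simulated MSE={sim:.4f}  f1+f2={theo:.4f}")
```

Varying `sigma`, `k` and the degree of collinearity in this setup is a cheap way to explore where one estimator dominates the other; the outcome depends on these choices and on $\gamma = P'\beta$.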
References

Belsley, D.A., 1976, Multicollinearity: Diagnosing its presence and assessing the potential damage it causes least-squares estimation, N.B.E.R. Working Paper 154.

Haitovsky, Y. and Y. Wax, 1976, Generalized ridge regression, least squares with stochastic prior information and Bayesian estimators, CORE Discussion Paper 7626.

Hoerl, A.E. and R.W. Kennard, 1970, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12, 55-67.

Moulaert, F., 1979, Conservative ridge estimation, Ph.D. Dissertation (Regional Science Department, University of Pennsylvania, Philadelphia, PA).

Theil, H., 1971, Principles of econometrics (Wiley, New York).

Vinod, H.D., 1978, A survey of ridge regression and related techniques for improvements over ordinary least squares, The Review of Economics and Statistics LX, no. 1, 121-131.
⁸ In fact, $f_1[\beta^*(k, S)] = \sigma^2 \sum_i \gamma_i^{*2}/\alpha_i^2$, so that shrinking $\gamma_i^*$ (for any $i$) improves the statistical stability of $\beta^*$.