
Journal of Statistical Planning and Inference 89 (2000) 169–174

www.elsevier.com/locate/jspi

Another view of the Kuks–Olman estimator

Bernhard F. Arnold*, Peter Stahlecker

Institut für Statistik und Ökonometrie, Universität Hamburg, Von-Melle-Park 5, D-20146 Hamburg, Germany

Received 31 March 1999; received in revised form 15 December 1999; accepted 19 January 2000

* Corresponding author.

Abstract

In linear regression, biased estimators like ridge estimators, Kuks–Olman estimators, Bayes, and minimax estimators are mainly used in order to circumvent difficulties caused by multicollinearity. Up to now, the application of the minimax principle to the weighted scalar mean squared error yields explicit solutions solely in specific cases, where, e.g., ridge estimators or Kuks–Olman estimators are obtained. In this paper we introduce a new objective function in such a way that we always get an explicit minimax solution which, in an important special case, can be interpreted as a Kuks–Olman estimator. Our functional may be viewed as a measure of relative rather than absolute squared error. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Kuks–Olman estimator; Linear regression; Minimax estimator; Ridge estimator

1. Introduction

We consider the linear regression model

$$y = X\beta + u, \qquad (1)$$

where $y \in \mathbb{R}^n$ is the column vector of observations of the dependent variable, $X \in \mathbb{R}^{n \times k}$ is the deterministic model matrix consisting of the known values of the $k$ explanatory variables, $\beta \in \mathbb{R}^k$ is the column vector of the unknown regression coefficients, and $u \in \mathbb{R}^n$ is the column vector of the unobservable disturbances. In order to estimate $\beta$ we focus on linear estimators $b = Cy$ with $C \in \mathbb{R}^{k \times n}$. When $\mathrm{rk}(X) = k$ the most prominent estimator of $\beta$ is the OLSE, given by $C = (X'X)^{-1}X'$, where $'$ denotes the transpose. If $\mathrm{rk}(X) < k$ or if the matrix $X'X$ is ill-conditioned, i.e., in the case of multicollinearity, a ridge estimator proposed by Hoerl and Kennard (1970) may be used; it is given by

$$C = (rI_k + X'X)^{-1}X', \qquad (2)$$

where $I_k \in \mathbb{R}^{k \times k}$ is the identity matrix and $r$ is a suitably chosen positive real number. As we do in this paper, Kuks and Olman (1971, 1972) suggested using the minimax principle but, in contrast to our approach, they assume that there is some prior information available about $\beta$, represented by a nonempty compact subset of $\mathbb{R}^k$. A prerequisite of their investigations is that the expectation $E(u)$ of the disturbances $u$ is equal to zero and that the covariance matrix $V \in \mathbb{R}^{n \times n}$ of $u$ is known and positive definite (p.d.):

$$E(u) = 0 \quad \text{and} \quad E(uu') = V. \qquad (3)$$
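As a concrete illustration of the setting (1)–(3), the following sketch simulates a small instance of model (1) and computes the OLSE and a ridge estimator (2). The simulated data, the dimensions, and the choice $r = 1$ are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch of model (1) and the estimators discussed above;
# all data, dimensions, and the ridge constant r are assumptions.
rng = np.random.default_rng(0)
n, k = 50, 4
X = rng.normal(size=(n, k))             # model matrix of model (1)
beta = np.array([1.0, -2.0, 0.5, 3.0])  # true coefficients (unknown in practice)
u = rng.normal(size=n)                  # disturbances with E(u) = 0
y = X @ beta + u                        # observations according to (1)

# OLSE: C = (X'X)^{-1} X', requires rk(X) = k
C_ols = np.linalg.solve(X.T @ X, X.T)
b_ols = C_ols @ y

# Ridge estimator (2): C = (r I_k + X'X)^{-1} X' with some r > 0
r = 1.0
C_ridge = np.linalg.solve(r * np.eye(k) + X.T @ X, X.T)
b_ridge = C_ridge @ y
```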

Note that throughout this paper any nonnegative definite (n.n.d.) or p.d. matrix is assumed to be symmetric. Kuks and Olman applied the minimax principle to the weighted scalar mean squared error. In general, the underlying optimization problem cannot be solved explicitly; for further discussions see, e.g., Läuter (1975), Hoffmann (1979), Pilz (1986), Stahlecker (1987), Trenkler and Stahlecker (1987), Gaffke and Heiligers (1989), Pilz (1991), Drygas (1993), Stahlecker and Trenkler (1993) and Rao and Toutenburg (1995). In a special case, Kuks and Olman derived an optimal linear estimator $b^* = C^*y$ with

$$C^* = (S + X'V^{-1}X)^{-1}X'V^{-1}, \qquad (4)$$

where $S \in \mathbb{R}^{k \times k}$ is a given p.d. matrix. Here, no rank condition is imposed on $X$. Expression (4) leads us to the following definition of a Kuks–Olman estimator, where we slightly generalize the class of linear estimators to the class of linear affine estimators.

Definition 1. A linear affine estimator $b = Cy + c$ ($C \in \mathbb{R}^{k \times n}$, $c \in \mathbb{R}^k$) for $\beta$ is called a Kuks–Olman estimator if $C = (U + X'WX)^{-1}X'W$ with p.d. matrices $U \in \mathbb{R}^{k \times k}$ and $W \in \mathbb{R}^{n \times n}$.

Kuks–Olman estimators can also be viewed as general ridge estimators, discussed, e.g., by Rao (1976) and by Markiewicz (1996). It is noteworthy that Kuks–Olman estimators also appear within the framework of a Bayesian approach, where the knowledge about $\beta$ is represented by a probability distribution (see, e.g., Rao (1976) or Pukelsheim (1993, p. 270)).

2. The objective function

Let $b$ be an estimator of the regression coefficient $\beta$ in the linear model (1), and let $(b - \beta)'A(b - \beta)$ be the weighted squared error of $b$, where $A \in \mathbb{R}^{k \times k}$ is a given n.n.d. matrix of weights. Adopting the concept of a relative error, we want to allow for an increasing squared error $(b - \beta)'A(b - \beta)$ whenever $\beta$ is increasing with respect to some suitably chosen norm; furthermore, it seems reasonable to tolerate a greater value of the weighted squared error of $b$ in the case of a larger norm of the disturbances $u$. Combining both vectors $\beta$ and $u$ into one column vector $\theta = (\beta', u')' \in \mathbb{R}^{k+n}$, we consider the ratio

$$\frac{(b - \beta)'A(b - \beta)}{\theta'T\theta}, \qquad (5)$$

where $T \in \mathbb{R}^{(k+n) \times (k+n)}$ is a given p.d. matrix and where, of course, we assume $\theta \neq 0$. Obviously, (5) meets both of the requirements stated above. Moreover, it may be more appropriate to focus not on the values of $\beta$ and $u$ themselves, i.e., on the deviations of $\beta$ and $u$ from the corresponding null vectors, but on the deviations of $\beta$ and $u$ from given parameters $\beta_0 \in \mathbb{R}^k$ and $u_0 \in \mathbb{R}^n$, respectively. Here, $\beta_0$ might be the result of theoretical or empirical considerations, and $u_0$ might be some presumed specification error of model (1). Setting $\theta_0 = (\beta_0', u_0')' \in \mathbb{R}^{k+n}$, we obtain

$$\frac{(b - \beta)'A(b - \beta)}{(\theta - \theta_0)'T(\theta - \theta_0)} \qquad (6)$$

as an analogue to expression (5). When there are no preferences with respect to $\beta_0$ and $u_0$, these parameters should be set equal to 0, and (5) will be relevant. As a rule, the matrix $T$ in the denominator of (5) and (6) will be selected in diagonal or in block-diagonal form, giving the weights of the corresponding components of $\theta$ or $\theta - \theta_0$, respectively. But there might be more general applications, where an interaction between the regression coefficient $\beta$ and the disturbance term $u$ should be modelled. This could be the case when a certain pattern of the components of $u$ is expected, e.g., seasonal effects in time series or individual effects in panel studies, having different impacts on the components of $\beta$.

In this paper we are going to apply the minimax principle to quantity (6); here, we consider linear affine estimators and we do not make use of any secured information about $\theta$. Inserting Eq. (1) into $b = Cy + c$ ($C \in \mathbb{R}^{k \times n}$, $c \in \mathbb{R}^k$) and setting $D = (CX - I_k, C)$, where $I_k \in \mathbb{R}^{k \times k}$ is the identity matrix and $D \in \mathbb{R}^{k \times (k+n)}$, we get

$$\frac{(D\theta + c)'A(D\theta + c)}{(\theta - \theta_0)'T(\theta - \theta_0)}, \qquad (7)$$

being equivalent to (6). This leads to the following definition of an optimal linear affine estimator.

Definition 2. Let $A \in \mathbb{R}^{k \times k}$, n.n.d., and $T \in \mathbb{R}^{(k+n) \times (k+n)}$, p.d., be given matrices, and let $\theta_0 \in \mathbb{R}^{k+n}$ be a given vector. Then a linear affine estimator $b^* = C^*y + c^*$ ($C^* \in \mathbb{R}^{k \times n}$, $c^* \in \mathbb{R}^k$) for $\beta$ in model (1) is optimal if the inequality

$$\sup_{\substack{\theta \in \mathbb{R}^{k+n} \\ \theta \neq \theta_0}} \frac{(D^*\theta + c^*)'A(D^*\theta + c^*)}{(\theta - \theta_0)'T(\theta - \theta_0)} \;\leq\; \sup_{\substack{\theta \in \mathbb{R}^{k+n} \\ \theta \neq \theta_0}} \frac{(D\theta + c)'A(D\theta + c)}{(\theta - \theta_0)'T(\theta - \theta_0)}$$

holds for all $C \in \mathbb{R}^{k \times n}$ and all $c \in \mathbb{R}^k$. Here, we have set $D^* = (C^*X - I_k, C^*)$ and $D = (CX - I_k, C)$. Note that no stochastic aspects enter this definition.

The problem of determining an optimal linear affine estimator can be reduced to finding its linear part. Looking at the relation

$$\sup_{\substack{\theta \in \mathbb{R}^{k+n} \\ \theta \neq \theta_0}} \frac{(D\theta + c)'A(D\theta + c)}{(\theta - \theta_0)'T(\theta - \theta_0)} = \sup_{\substack{\theta \in \mathbb{R}^{k+n} \\ \theta \neq \theta_0}} \frac{(D(\theta - \theta_0) + D\theta_0 + c)'A(D(\theta - \theta_0) + D\theta_0 + c)}{(\theta - \theta_0)'T(\theta - \theta_0)},$$

we see that this supremum is finite iff $A(D\theta_0 + c) = 0$ (otherwise the numerator stays bounded away from zero as $\theta \to \theta_0$ while the denominator tends to zero). Thus, we restrict ourselves to those linear affine estimators $b = Cy + c$ satisfying $Ac = -AD\theta_0$. In the following, we focus on the special solution

$$c = -D\theta_0, \qquad (8)$$

which is equivalent to

$$c = -(CX - I_k)\beta_0 - Cu_0. \qquad (9)$$
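The reduction above is easy to compute: given any linear part $C$, one forms $D = (CX - I_k, C)$ and obtains the affine part from (8). The following sketch checks the equivalence of (8) and (9) numerically; the dimensions and the choices of $C$, $\beta_0$ and $u_0$ are our own illustrative assumptions.

```python
import numpy as np

# Sketch of the reduction (8)/(9): build D = (CX - I_k, C) and the
# affine part c = -D theta_0; C, beta0, u0 are illustrative assumptions.
rng = np.random.default_rng(1)
n, k = 50, 4
X = rng.normal(size=(n, k))
C = rng.normal(size=(k, n))                    # an arbitrary linear part
beta0 = rng.normal(size=k)                     # prior guess for beta
u0 = rng.normal(size=n)                        # presumed specification error

D = np.hstack([C @ X - np.eye(k), C])          # D = (CX - I_k, C)
theta0 = np.concatenate([beta0, u0])           # theta_0 = (beta_0', u_0')'
c = -D @ theta0                                # special solution (8)
c_alt = -(C @ X - np.eye(k)) @ beta0 - C @ u0  # equivalent form (9)
assert np.allclose(c, c_alt)
```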

For these linear affine estimators we get

$$\sup_{\substack{\theta \in \mathbb{R}^{k+n} \\ \theta \neq \theta_0}} \frac{(D\theta + c)'A(D\theta + c)}{(\theta - \theta_0)'T(\theta - \theta_0)} = \sup_{\substack{\theta \in \mathbb{R}^{k+n} \\ \theta \neq \theta_0}} \frac{(\theta - \theta_0)'D'AD(\theta - \theta_0)}{(\theta - \theta_0)'T(\theta - \theta_0)} = \lambda_{\max}(T^{-1/2}D'ADT^{-1/2}) = \lambda_{\max}(A^{1/2}DT^{-1}D'A^{1/2}),$$

where $\lambda_{\max}(\cdot)$ denotes the maximal eigenvalue of the corresponding matrix and $T^{1/2}$ ($A^{1/2}$) is the p.d. (n.n.d.) square root of $T$ ($A$). According to Definition 2, we obtain an optimal linear affine estimator by minimizing the objective function $Z : \mathbb{R}^{k \times n} \to [0, \infty)$ with

$$Z(C) = \lambda_{\max}(A^{1/2}DT^{-1}D'A^{1/2}) \qquad (10)$$

and $D = (CX - I_k, C) = C(X, I_n) - (I_k, 0_{k \times n})$; here, $I_n \in \mathbb{R}^{n \times n}$ and $0_{k \times n} \in \mathbb{R}^{k \times n}$ denote the identity and the null matrix, respectively. If $C^*$ is a minimizer of $Z$, then the affine part is calculated by (9),

$$c^* = -(C^*X - I_k)\beta_0 - C^*u_0, \qquad (11)$$

and $b^* = C^*y + c^*$ is an optimal linear affine estimator for $\beta$ in model (1). Proceeding like this, we see that $\theta_0$ enters only the affine part of our optimal linear affine estimator which, in the case of $\theta_0 = 0$, reduces to a linear estimator.
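The objective (10) is directly computable once $A$, $T$ and $C$ are fixed. The sketch below evaluates $Z(C)$ and checks it against the equivalent form $\lambda_{\max}(T^{-1/2}D'ADT^{-1/2})$ derived above; all matrices are randomly generated assumptions chosen only for illustration.

```python
import numpy as np

# Sketch of the objective (10): Z(C) = lambda_max(A^{1/2} D T^{-1} D' A^{1/2}),
# checked against lambda_max(T^{-1/2} D' A D T^{-1/2}); all inputs are
# illustrative assumptions.
rng = np.random.default_rng(2)
n, k = 20, 3
X = rng.normal(size=(n, k))
C = rng.normal(size=(k, n))
A = np.diag([2.0, 1.0, 0.0])               # an n.n.d. weight matrix
M = rng.normal(size=(k + n, k + n))
T = M @ M.T + np.eye(k + n)                # a p.d. matrix T

def psd_sqrt(S):
    """Symmetric square root of a symmetric n.n.d. matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

D = np.hstack([C @ X - np.eye(k), C])      # D = (CX - I_k, C)
T_inv = np.linalg.inv(T)
A_half = psd_sqrt(A)

Z = np.linalg.eigvalsh(A_half @ D @ T_inv @ D.T @ A_half).max()   # (10)

T_inv_half = psd_sqrt(T_inv)               # this is T^{-1/2}
Z_alt = np.linalg.eigvalsh(T_inv_half @ D.T @ A @ D @ T_inv_half).max()
assert np.isclose(Z, Z_alt)
```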

3. An explicit solution

We consider the argument of the $\lambda_{\max}$ operator in (10), i.e., the function $f : \mathbb{R}^{k \times n} \to \mathbb{R}^{k \times k}_{\geq}$,

$$f(C) = A^{1/2}[C(X, I_n) - (I_k, 0_{k \times n})]T^{-1}[C(X, I_n) - (I_k, 0_{k \times n})]'A^{1/2},$$

where $\mathbb{R}^{k \times k}_{\geq}$ denotes the set of all $k$-dimensional n.n.d. matrices equipped with the Löwner ordering defined by $A_1 \geq A_2$ iff $A_1 - A_2$ is n.n.d. Since $f(\cdot)$ is quadratic and convex with respect to the Löwner ordering, it can be seen by direct calculation that this function is minimized by

$$C^* = (I_k, 0_{k \times n})T^{-1}(X, I_n)'[(X, I_n)T^{-1}(X, I_n)']^{-1}. \qquad (12)$$
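A direct numerical spot check of (12): the sketch below computes $C^*$ and verifies, for randomly perturbed alternatives $C$, that none attains a smaller value of the objective (10), taken here with $A = I_k$. The setup is again an illustrative assumption.

```python
import numpy as np

# Sketch of the explicit solution (12) with a random spot check that it
# minimizes Z from (10) (here with A = I_k); the setup is illustrative.
rng = np.random.default_rng(3)
n, k = 20, 3
X = rng.normal(size=(n, k))
M = rng.normal(size=(k + n, k + n))
T = M @ M.T + np.eye(k + n)                   # a p.d. matrix T
T_inv = np.linalg.inv(T)

G = np.hstack([X, np.eye(n)])                 # (X, I_n)
E = np.hstack([np.eye(k), np.zeros((k, n))])  # (I_k, 0_{k x n})
C_star = E @ T_inv @ G.T @ np.linalg.inv(G @ T_inv @ G.T)   # (12)

def Z(C):
    """Objective (10) with A = I_k."""
    D = np.hstack([C @ X - np.eye(k), C])
    return np.linalg.eigvalsh(D @ T_inv @ D.T).max()

for _ in range(100):
    C = C_star + 0.1 * rng.normal(size=(k, n))
    assert Z(C_star) <= Z(C) + 1e-9
```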

As any minimizer of $f(\cdot)$ also minimizes $\varphi(f(\cdot))$, where $\varphi$ is an arbitrarily selected isotonic functional on $\mathbb{R}^{k \times k}_{\geq}$, for example the $\lambda_{\min}$, the $\lambda_{\max}$ or the trace operator (further isotonic operators on $\mathbb{R}^{k \times k}_{\geq}$ are discussed, e.g., in Gaffke and Krafft (1982), in Pukelsheim (1993) or in Lauterbach and Stahlecker (1990)), the matrix $C^*$ minimizes, in particular, the objective function $Z(\cdot)$, and we have proved the following theorem.

Theorem. Let $A \in \mathbb{R}^{k \times k}$, n.n.d., and $T \in \mathbb{R}^{(k+n) \times (k+n)}$, p.d., be given matrices, and let $\theta_0 = (\beta_0', u_0')' \in \mathbb{R}^{k+n}$ be a given vector. Then an optimal linear affine estimator $b^* = C^*y + c^*$ for $\beta$ in model (1) is given by (12) and (11).

Note that our optimal linear affine estimator $b^*$ does not depend on the matrix $A$ of weights. We now consider the important special case where the p.d. matrix $T \in \mathbb{R}^{(k+n) \times (k+n)}$ is of block-diagonal form

$$T = \begin{pmatrix} T_{11} & 0 \\ 0 & T_{22} \end{pmatrix}$$

with $T_{11} \in \mathbb{R}^{k \times k}$ and $T_{22} \in \mathbb{R}^{n \times n}$. Inserting

$$T^{-1} = \begin{pmatrix} T_{11}^{-1} & 0 \\ 0 & T_{22}^{-1} \end{pmatrix}$$

into (12) and applying the inversion formula, we obtain

$$C^* = T_{11}^{-1}X'(XT_{11}^{-1}X' + T_{22}^{-1})^{-1} = (T_{11} + X'T_{22}X)^{-1}X'T_{22}.$$
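The collapse of (12) to the two closed forms above is easy to verify numerically; the following sketch (with illustrative random $T_{11}$, $T_{22}$) checks that all three expressions agree.

```python
import numpy as np

# Spot check: for block-diagonal T, the general solution (12) equals the
# two closed forms above; T11, T22 are illustrative random p.d. matrices.
rng = np.random.default_rng(4)
n, k = 20, 3
X = rng.normal(size=(n, k))
M1 = rng.normal(size=(k, k)); T11 = M1 @ M1.T + np.eye(k)   # p.d.
M2 = rng.normal(size=(n, n)); T22 = M2 @ M2.T + np.eye(n)   # p.d.

# general formula (12) with T = blockdiag(T11, T22)
T_inv = np.block([[np.linalg.inv(T11), np.zeros((k, n))],
                  [np.zeros((n, k)), np.linalg.inv(T22)]])
G = np.hstack([X, np.eye(n)])                 # (X, I_n)
E = np.hstack([np.eye(k), np.zeros((k, n))])  # (I_k, 0_{k x n})
C12 = E @ T_inv @ G.T @ np.linalg.inv(G @ T_inv @ G.T)

# the two equivalent closed forms obtained via the inversion formula
T11i = np.linalg.inv(T11)
C_a = T11i @ X.T @ np.linalg.inv(X @ T11i @ X.T + np.linalg.inv(T22))
C_b = np.linalg.solve(T11 + X.T @ T22 @ X, X.T @ T22)

assert np.allclose(C12, C_a) and np.allclose(C12, C_b)
```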

Thus, according to Definition 1, the optimal linear affine estimator $b^* = C^*y + c^*$ given by (12) and (11) is a Kuks–Olman estimator (with $U = T_{11}$ and $W = T_{22}$). We end this paper with the general remark that our approach does not require any rank condition with respect to the model matrix $X$.

Acknowledgements

The authors wish to thank the referees for their helpful comments, which significantly improved the original version of this paper.


References

Drygas, H., 1993. Reparametrization methods in linear minimax estimation. In: Matusita, K., Puri, M.L., Hayakawa, T. (Eds.), Proceedings of the Third Pacific Area Statistical Conference, Zeist, Netherlands (VSP), pp. 87–95.
Gaffke, N., Heiligers, B., 1989. Bayes, admissible, and linear minimax estimators in linear models with restricted parameter space. Statistics 20, 487–508.
Gaffke, N., Krafft, O., 1982. Matrix inequalities in the Löwner ordering. In: Korte, B. (Ed.), Modern Applied Mathematics: Optimization and Operations Research. North-Holland, Amsterdam, pp. 595–622.
Hoerl, A.E., Kennard, R.W., 1970. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67.
Hoffmann, K., 1979. Characterization of minimax linear estimators in linear regression. Math. Oper. Statist. Ser. Statist. 10, 19–26.
Kuks, J., Olman, W., 1971. Minimax linear estimation of regression coefficients I. Iswestija Akademija Nauk Estonskoj SSR 20, 480–482 (in Russian).
Kuks, J., Olman, W., 1972. Minimax linear estimation of regression coefficients II. Iswestija Akademija Nauk Estonskoj SSR 21, 66–72 (in Russian).
Läuter, H., 1975. A minimax linear estimator for linear parameters under restrictions in form of inequalities. Math. Oper. Statist. Ser. Statist. 6, 689–695.
Lauterbach, J., Stahlecker, P., 1990. Some properties of $[\mathrm{tr}(Q^{2p})]^{1/2p}$ with application to linear minimax estimation. Linear Algebra Appl. 127, 301–325.
Markiewicz, A., 1996. Characterization of general ridge estimators. Statist. Probab. Lett. 27, 145–148.
Pilz, J., 1986. Minimax linear regression estimation with symmetric parameter restriction. J. Statist. Plann. Inference 13, 297–318.
Pilz, J., 1991. Bayesian Estimation and Experimental Design in Linear Regression Models. Wiley, New York.
Pukelsheim, F., 1993. Optimal Design of Experiments. Wiley, New York.
Rao, C.R., 1976. Estimation of parameters in a linear model. Ann. Statist. 4, 1023–1037.
Rao, C.R., Toutenburg, H., 1995. Linear Models, Least Squares and Alternatives. Springer, Heidelberg.
Stahlecker, P., 1987. A priori Information und Minimax-Schätzung im linearen Regressionsmodell. Mathematical Systems in Economics, Vol. 108. Athenäum, Frankfurt (in German).
Stahlecker, P., Trenkler, G., 1993. Minimax estimation in linear regression with singular covariance structure and convex polyhedral constraints. J. Statist. Plann. Inference 36, 185–196.
Trenkler, G., Stahlecker, P., 1987. Quasi minimax estimation in the linear regression model. Statistics 18, 219–226.