Statistics & Probability Letters 2 (1984) 245-248
North-Holland

LINEAR RESTRICTIONS AND TWO STEP LEAST SQUARES WITH APPLICATIONS

Guido E. del PINO

Departamento de Estadística, P. Universidad Católica de Chile, Casilla 114-D, Santiago, Chile

Received January 1984
Revised June 1984

Abstract: In this paper we consider the full rank regression model with arbitrary covariance matrix: $Y = X\beta + \varepsilon$. It is shown that the effect of restricting the information $Y$ to $T = A'Y$ may be analyzed through an associated regression problem which is amenable to solution by two step least squares. The results are applied to the important case of missing observations, where some classical results are rederived.

Keywords: linear models, two step least squares, influential data, missing data, dummy variables.

1. Introduction and summary

Consider the full rank regression model

$$Y = X\beta + \varepsilon, \qquad (1.1)$$

where $Y$ is $n \times 1$, $X$ is $n \times p$, $\beta$ is $p \times 1$ and $\varepsilon$ is $n \times 1$, with

$$E(\varepsilon) = 0, \qquad \operatorname{Var}(\varepsilon) = V,$$

where $X$ has rank $p$ and the $n \times n$ matrix $V$ is positive definite. The problem of estimating $\beta$ when only a subset of the $Y_i$ is observed (the complementary subset is 'missing') has been thoroughly analyzed in the literature. Some relevant references are Bartlett (1937), Seber (1977) and Belsley et al. (1980). In the last reference, the observations are deleted on purpose to examine their influence on the estimated regression model. It is well known that the use of dummy variables may be useful in this context. In this paper we consider the more general case where a set of linear combinations of the $Y_i$ is observed. This may be written as

$$T = A'Y, \qquad (1.2)$$

where $A$ is an $n \times m$ matrix of rank $m$.

This framework includes, in addition to the problem of missing observations, the case of aggregated data. It is also worth mentioning that this is a finite-dimensional analog of remote sensing problems. One important point is that the method of proof employed in the general situation offers additional insight for the case of missing observations. In Section 2 it is shown that the problem is equivalent to a regression problem with different regressors. Theorems 2.1 and 2.2 give two alternative characterizations of the solution. The last theorem is used in Section 3 to find the change in the estimated parameters caused by the linear restrictions. Finally, in Section 4 we particularize the results of the previous section to the case of missing observations. Some results well known in the literature are obtained as special cases.
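As a concrete illustration of (1.2), the following sketch (added here; it is not part of the original paper, and the group sizes and indices are arbitrary) shows how both missing observations and aggregated data correspond to particular choices of the matrix $A$.

```python
import numpy as np

n = 5
I = np.eye(n)

# Missing observations: rows 2 and 4 (0-based) are deleted, so we observe
# T = A'Y with A formed from the identity columns of the retained rows.
retained = [0, 1, 3]
A_missing = I[:, retained]          # n x m with m = 3

# Aggregated data: only the totals over the groups {0,1} and {2,3,4} are
# reported, so each column of A sums one group of responses.
A_aggregated = np.zeros((n, 2))
A_aggregated[[0, 1], 0] = 1.0
A_aggregated[[2, 3, 4], 1] = 1.0

Y = np.arange(1.0, 6.0)             # Y = (1, 2, 3, 4, 5)'
print(A_missing.T @ Y)              # [1. 2. 4.]  -> the observed subset
print(A_aggregated.T @ Y)           # [ 3. 12.]   -> the group totals
```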

2. Main theorems

Assuming the regression model (1.1), it is desired to find the best estimator among those linear in $T$ given by (1.2) and unbiased for $\beta$ (BLUE). A direct approach consists in finding the expected value and the covariance matrix of $T$. We have

$$E(T) = A'X\beta \quad \text{and} \quad \operatorname{Var}(T) = A'VA.$$

Then the BLUE of $\beta$ is

$$\tilde\beta = \bigl(X'A(A'VA)^{-1}A'X\bigr)^{-1}X'A(A'VA)^{-1}A'Y. \qquad (2.1)$$

Let

$$C = VA. \qquad (2.2)$$

Then $X'A = X'V^{-1}C$ and $A'VA = C'V^{-1}C$. Hence

$$\tilde\beta = \bigl(X'WC(C'WC)^{-1}C'WX\bigr)^{-1}X'WC(C'WC)^{-1}C'WY,$$

with $W = V^{-1}$. For any $n \times r$ matrix $M$, let $P_M$ be the matrix representing the orthogonal projection on the column space of $M$, with respect to the inner product $\langle a, b\rangle = a'Wb$. It is well known that if $M$ is of rank $r$, then $P_M = M(M'WM)^{-1}M'W$. Since the rank of $A$ is $m$, so is the rank of $C$. Then

$$\tilde\beta = (X'WP_C X)^{-1}X'WP_C Y.$$

If $E$ and $F$ are matrices with $n$ rows, we will use the notation

$$\langle E, F\rangle = E'WF, \qquad (2.3)$$

that is, the element $(i, j)$ of $\langle E, F\rangle$ is the inner product of the $i$th column of $E$ and the $j$th column of $F$. With this notation (2.1) becomes

$$\tilde\beta = \langle X, P_C X\rangle^{-1}\langle X, P_C Y\rangle. \qquad (2.4)$$

By the general properties of projection matrices

$$\langle a, P_C b\rangle = \langle P_C a, b\rangle = \langle P_C a, P_C b\rangle,$$

so that

$$\tilde\beta = \langle P_C X, P_C X\rangle^{-1}\langle P_C X, Y\rangle.$$

We have thus shown the following.

Theorem 2.1. The estimator $\tilde\beta$ coincides with the BLUE of $\beta$ in the regression problem

$$Y = P_C X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad \operatorname{Var}(\varepsilon) = V.$$

The use of dummy variables to study the problem of missing data is very old in statistics. One of the first to employ it seems to be Bartlett (1937), and an excellent reference for this topic is Fuller (1980). We now generalize this method to the case of linear restrictions. It will be seen that the dummy variables are replaced by columns of more complicated structure.

Theorem 2.2. The estimator $\tilde\beta$ coincides with $\beta^*$, the BLUE of $\beta$ in the model

$$Y = X\beta + Z\gamma + \varepsilon, \qquad E(\varepsilon) = 0, \qquad \operatorname{Var}(\varepsilon) = V, \qquad (2.5)$$

where $Z$ is any matrix satisfying

$$r(A) + r(Z) = n \qquad (2.6)$$

and

$$Z'A = 0. \qquad (2.7)$$

Proof. Conditions (2.6) and (2.7) are equivalent to

$$P_C + P_Z = I. \qquad (2.8)$$

Standard results about overfitting show that $\beta^*$ coincides with the BLUE of $\beta$ in

$$Y = (I - P_Z)X\beta + \varepsilon.$$

An application of Theorem 2.1 and (2.8) completes the proof.

3. The effect of observation restrictions on estimated coefficients

In this section we use Theorem 2.2 to find a convenient formula for $\hat\beta - \tilde\beta$, where $\hat\beta = (X'WX)^{-1}X'WY$ denotes the BLUE of $\beta$ based on the full data $Y$.

Theorem 3.1. The change in regression coefficients due to the restrictions is

$$\hat\beta - \tilde\beta = (X'WX)^{-1}X'WZR^{-1}Z'We, \qquad (3.1)$$

where

$$e = Y - X\hat\beta, \qquad R = Z'W(I - P_X)Z.$$

Proof. From Theorem 2.2, $\tilde\beta = \beta^*$. Standard results on overfitting give

$$\hat\beta - \tilde\beta = (X'WX)^{-1}X'WZ\hat\gamma, \qquad (3.2)$$

where $\hat\gamma$ is the BLUE of $\gamma$ in the model

$$Y = (I - P_X)Z\gamma + \varepsilon.$$

But

$$\hat\gamma = \langle (I - P_X)Z, (I - P_X)Z\rangle^{-1}\langle (I - P_X)Z, Y\rangle = \langle Z, (I - P_X)Z\rangle^{-1}\langle Z, (I - P_X)Y\rangle = R^{-1}Z'We,$$

from which (3.1) follows.

To appreciate the relative simplicity achieved by the application of Theorem 2.1 in the proof of Theorem 3.1, we give a different proof of this last theorem in the Appendix. The method of proof is an extension of the one employed by Belsley et al. (1980), and involves the use of fairly complicated matrix manipulations.
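Formula (3.1) admits a quick numerical check, again added here purely for illustration: compute the full-data BLUE $\hat\beta$, obtain the restricted estimator through the augmented regression of Theorem 2.2, and compare their difference with the right-hand side of (3.1).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 6, 2, 4
X = rng.standard_normal((n, p))
A = rng.standard_normal((n, m))
L = rng.standard_normal((n, n))
V = L @ L.T + n * np.eye(n)
W = np.linalg.inv(V)
Y = X @ np.array([1.0, -2.0]) + rng.standard_normal(n)
Z = np.linalg.svd(A)[0][:, m:]                   # Z'A = 0, r(A) + r(Z) = n

XWX = X.T @ W @ X
beta_full = np.linalg.solve(XWX, X.T @ W @ Y)    # BLUE from the full data
e = Y - X @ beta_full                            # full-data residuals
P_X = X @ np.linalg.solve(XWX, X.T @ W)          # W-orthogonal projection on col(X)
R = Z.T @ W @ (np.eye(n) - P_X) @ Z

# Restricted estimator via the augmented regression of Theorem 2.2
XZ = np.hstack([X, Z])
beta_T = np.linalg.solve(XZ.T @ W @ XZ, XZ.T @ W @ Y)[:p]

# Right-hand side of (3.1)
change = np.linalg.solve(XWX, X.T @ W @ Z @ np.linalg.solve(R, Z.T @ W @ e))
print(np.allclose(beta_full - beta_T, change))   # True
```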

4. The effect of eliminating observations

In this section we show how the general results obtained may be applied to analyze the effect of eliminating a set of observations (deleting a set of rows). For a matrix $G$ of order $a \times b$, the submatrix formed by picking the rows with indices in $D = \{i_1, \ldots, i_r\}$ and the columns with indices in $E = \{j_1, \ldots, j_s\}$ will be denoted by $G_D^E$. If $D = \{1, \ldots, a\}$ or $E = \{1, \ldots, b\}$ the corresponding subscript or superscript is omitted. With this notation, the operation of deleting the rows with index in $D$ (the observations $Y_i$ with $i$ in $D$ are missing) corresponds to the choice $A = I^{\bar D}$ in (1.2), where $\bar D$ denotes the complement of $D$. It is obvious that $Z = I^D$ satisfies conditions (2.6) and (2.7). This provides theoretical justification for the intuitive idea that the effect of deleting $r$ rows may be analyzed by adding $r$ dummy variables to the regression (see also Fuller (1980)). Note that this holds irrespective of the presence of correlation in the errors.

Let $H = P_X$ and $B = W(I - H)$. From (3.1) we obtain

$$\hat\beta - \tilde\beta = (X'WX)^{-1}X'W^D(B_D^D)^{-1}W_D e. \qquad (4.1)$$

Note that the symmetric matrix $B$ is such that $Y'BY$ is the squared length of the projection of $Y$ on the space orthogonal to $X$. From (4.1) it is easily seen that $\hat\beta - \tilde\beta$ depends on $Y$ only through $W_D e$. On the other hand, this depends only on the residuals corresponding to the deleted observations (i.e. $e_D$) if and only if $W_D^{\bar D} = 0$. This condition is equivalent to $E(\varepsilon_D \varepsilon_{\bar D}') = 0$.

We now consider some special cases of (4.1).

(a) $V = I$:

$$\hat\beta - \tilde\beta = (X'X)^{-1}(X_D)'(I - H_D^D)^{-1}e_D.$$

(b) $D = \{i\}$:

$$\hat\beta - \tilde\beta = (X'WX)^{-1}X'W^i\,W_i e / B_{ii}.$$

Note that all residuals $e_j$ are generally involved in this expression.

(c) $V = I$, $D = \{i\}$:

$$\hat\beta - \tilde\beta = (X'X)^{-1}X_i' e_i/(1 - h_i),$$

where $h_i$ is the $i$th diagonal element of $H$. This result appears in Belsley et al. (1980, p. 13).
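Special case (c) is easy to verify directly. The sketch below, added here for illustration, fits an ordinary least squares regression with and without one row and compares the change in coefficients with $(X'X)^{-1}X_i'e_i/(1 - h_i)$; the sample size and deleted index are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, i = 8, 3, 4                                # delete the single row i
X = rng.standard_normal((n, p))
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.standard_normal(n)

XtX = X.T @ X
beta_full = np.linalg.solve(XtX, X.T @ Y)        # OLS on all n rows
e = Y - X @ beta_full
H = X @ np.linalg.solve(XtX, X.T)                # hat matrix, h_i = H[i, i]

keep = [j for j in range(n) if j != i]
beta_del = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]

rhs = np.linalg.solve(XtX, X[i]) * e[i] / (1.0 - H[i, i])   # case (c)
print(np.allclose(beta_full - beta_del, rhs))               # True
```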

Acknowledgement

The author would like to thank a referee for helpful comments.

Appendix

Here we give an alternative proof of Theorem 3.1.

Proof. Substituting $(I - P_Z)$ for $P_C$ in (2.4),

$$\tilde\beta = E^{-1}(X'WY - X'WP_Z Y)$$

with

$$E = X'WX - X'WZ(Z'WZ)^{-1}Z'WX. \qquad (A.1)$$

From Rao (1973, p. 33), we have the identity

$$(A + BDB')^{-1} = A^{-1} - A^{-1}B(B'A^{-1}B + D^{-1})^{-1}B'A^{-1}, \qquad (A.2)$$

where $A$ and $D$ are nonsingular matrices. We now apply (A.2) to (A.1) with

$$A = X'WX, \qquad B = X'WZ, \qquad D = -(Z'WZ)^{-1}$$

to get

$$E^{-1} = A^{-1} + A^{-1}X'WZR^{-1}Z'WXA^{-1}$$

with

$$R = Z'W(I - P_X)Z.$$

Then

$$\tilde\beta = \bigl(A^{-1} + A^{-1}X'WZR^{-1}Z'WXA^{-1}\bigr)(X'WY - X'WP_Z Y)$$

and

$$\hat\beta - \tilde\beta = A^{-1}X'WMY, \qquad (A.3)$$

where

$$M = P_Z - ZR^{-1}Z'WX(X'WX)^{-1}(X'W - X'WP_Z). \qquad (A.4)$$

Premultiplying the identity $R^{-1}R = I$ by $Z$ and postmultiplying by $(Z'WZ)^{-1}Z'W$, we get

$$ZR^{-1}Z'W(I - P_X)P_Z = P_Z. \qquad (A.5)$$

Substituting (A.5) in (A.4) we finally obtain

$$M = ZR^{-1}Z'W(I - P_X),$$

which together with (A.3) yields Theorem 3.1.
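Identity (A.5), the key step of the alternative proof, can also be checked numerically; the following sketch (added here for illustration) does so for arbitrary full-rank $X$ and $Z$ and a positive definite weight matrix $W$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 6, 2, 3
X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, r))
L = rng.standard_normal((n, n))
W = np.linalg.inv(L @ L.T + n * np.eye(n))       # W = V^{-1}, positive definite

P_X = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)  # W-orthogonal projections
P_Z = Z @ np.linalg.solve(Z.T @ W @ Z, Z.T @ W)
R = Z.T @ W @ (np.eye(n) - P_X) @ Z

lhs = Z @ np.linalg.solve(R, Z.T @ W @ (np.eye(n) - P_X) @ P_Z)
print(np.allclose(lhs, P_Z))                     # True, identity (A.5)
```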

References

Bartlett, M. (1937), Some examples of statistical methods of research in agriculture and applied biology, Journal of the Royal Statistical Society, Supplement 4, 137-170.
Belsley, D.A., E. Kuh and R.E. Welsch (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (Wiley, New York).
Fuller, W. (1980), The use of indicator variables in computing predictions, Journal of Econometrics 12, 231-243.
Rao, C.R. (1973), Linear Statistical Inference and its Applications (Wiley, New York, 2nd ed.).
Seber, G.A.F. (1977), Linear Regression Analysis (Wiley, New York).