Asymptotic distribution of test statistic for the covariance dimension reduction methods in regression


Statistics & Probability Letters 68 (2004) 421–427

Xiangrong Yin^{a,*,1}, R. Dennis Cook^{b,2}

^a Department of Statistics, University of Georgia, 204 Statistics Building, Athens, GA 30602, USA
^b School of Statistics, University of Minnesota, 1994 Buford Ave., St. Paul, MN 55108, USA

Received 11 April 2002; available online 26 May 2004

Abstract

Yin and Cook (J. Roy. Statist. Soc. Ser. B 64 (2002) 159) recently introduced a new dimension reduction method for regression called $\mathrm{Cov}_k$. Here, we develop the asymptotic distribution of the $\mathrm{Cov}_k$ test statistic for dimension under weak assumptions. This serves as an analytic counterpart to the permutation test suggested by Yin and Cook. © 2004 Elsevier B.V. All rights reserved.

Keywords: Asymptotics; Central subspaces; Dimension reduction subspaces; Regression graphics

1. Introduction

In this article, we develop the asymptotic distribution of the test statistic for the dimension reduction method $\mathrm{Cov}_k$, introduced recently by Yin and Cook (2002). Estimating the structural dimension of a regression by testing, popularized in the context of sliced inverse regression (Li, 1991), has proven an effective method for dimension reduction. Typically, either the asymptotic distribution of the test statistic or a non-parametric alternative is needed. Yin and Cook (2002) used a permutation test, which was suggested first by Cook and Weisberg (1991) and further developed by Cook and Yin (2001). In this article, we derive the asymptotic distribution of the $\mathrm{Cov}_k$ test statistic under the null dimension hypothesis.

*Corresponding author. Tel.: +706-542-3312; fax: +706-542-3391. E-mail address: [email protected] (X. Yin).
1 Supported in part by the University of Georgia Research Foundation.
2 Supported in part by National Science Foundation grant DMS-0103983.
doi:10.1016/j.spl.2004.04.010


We briefly review the $\mathrm{Cov}_k$ method in Section 2 and develop its asymptotic distribution in Section 3. The reader is referred to Yin and Cook (2002) for motivation and details of $\mathrm{Cov}_k$. The scalar response is denoted by $Y$ and the $p \times 1$ vector of predictors by $X$. We assume throughout that the data $(Y_i, X_i^T)$, $i = 1, \ldots, n$, are iid observations on $(Y, X^T)$ with finite moments. Subspaces will be denoted by $S$, and $S(B)$ means the subspace of $R^t$ spanned by the columns of the $t \times u$ matrix $B$. $P_B$ denotes the projection operator for $S(B)$ with respect to the usual inner product.

2. Covariance methods

Extending the mean dimension reduction subspaces developed by Cook and Li (2002), Yin and Cook (2002) introduced $k$th moment dimension reduction subspaces and central $k$th moment dimension reduction subspaces. Here dimension reduction hinges on finding a $p \times q$ matrix $\gamma$, $q \le p$, so that the random vector $\gamma^T X$ contains all the information about $Y$ that is available from $E(Y|X), \mathrm{Var}(Y|X), \ldots, M^{(k)}(Y|X)$, where $M^{(k)}(Y|X) = E[(Y - E(Y|X))^k \mid X]$ for $k \ge 2$. For notational convenience, $M^{(1)}(Y|X)$ stands for $E(Y|X)$. If

$$Y \perp\!\!\!\perp \{M^{(1)}(Y|X), \ldots, M^{(k)}(Y|X)\} \mid \gamma^T X,$$

then $S(\gamma)$ is a $k$th moment dimension reduction subspace for the regression of $Y$ on $X$. Let $S^{(k)}_{Y|X} = \cap\, S^{(k)}$, where the intersection is over all $k$th moment dimension reduction subspaces $S^{(k)}$. If $S^{(k)}_{Y|X}$ is itself a $k$th moment dimension reduction subspace, it is called the central $k$th moment dimension reduction subspace, or simply the central $k$th moment subspace.

Let $\Sigma_X = \mathrm{Var}(X)$, which is assumed to be non-singular, and let $Z = \Sigma_X^{-1/2}(X - E(X))$ be the standardized predictor. Then $S^{(k)}_{Y|X} = \Sigma_X^{-1/2} S^{(k)}_{Y|Z}$ (Yin and Cook, 2002), and there is no loss of generality in working on the $Z$-scale. Throughout this note, we work on the $Z$-scale unless specified otherwise.

Let $\gamma$ be a basis for $S^{(k)}_{Y|Z}$, the central $k$th moment subspace, and assume that the linearity condition (Li, 1991) holds: $E(Z \mid \gamma^T Z) = P_\gamma Z$. The population kernel matrix is defined as

$$K = (E(YZ), \ldots, E(Y^k Z)) \qquad (1)$$

and the corresponding covariance subspace is $S^{(k)}_{\mathrm{Cov}} = S(K)$.
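To make the estimation step concrete, here is a minimal numpy sketch (our own illustration, not code from the paper): it standardizes the predictors with a plug-in inverse square root of the sample covariance, forms the sample analogue of $K = (E(YZ), \ldots, E(Y^k Z))$, and inspects its singular values; the toy model is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
Y = X[:, 0] * (1 + 0.5 * rng.normal(size=500))   # toy model with d = 1

n, p = X.shape
Sigma = np.cov(X, rowvar=False)                  # usual estimate of Var(X)
vals, vecs = np.linalg.eigh(Sigma)               # Sigma = V diag(vals) V^T
Z = (X - X.mean(axis=0)) @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)  # standardized predictors

k = 2
# column j of K_hat is the sample average of Y^j * Z
K_hat = np.column_stack([Z.T @ (Y ** j) / n for j in range(1, k + 1)])  # p x k
left, s, _ = np.linalg.svd(K_hat)
print(s)           # a sharp drop after the first singular value suggests d = 1
print(left[:, 0])  # estimated direction in S(K), on the Z-scale
```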

Then, under the linearity condition, $S^{(k)}_{\mathrm{Cov}} \subseteq S^{(k)}_{Y|Z}$ (Yin and Cook, 2002, Proposition 2). Thus the subspace spanned by the left singular vectors of $K$ corresponding to its non-zero singular values is a subspace of $S^{(k)}_{Y|Z}$. The kernel matrix $K$ is one of many that could be used. For instance, Yin and Cook (2002) also suggested the corresponding kernel matrix with $Y$ centered and scaled:

$$K_c = (E(WZ), \ldots, E(W^k Z)),$$

where $W = (Y - E(Y))/\sigma(Y)$ and $\sigma(Y) = \sqrt{\mathrm{Var}(Y)}$. Here the response is standardized in $K_c$ to perhaps induce some numerical stability, and so that all calculations are invariant to unit changes in $Y$. Because $S(K_c) = S(K) = S^{(k)}_{\mathrm{Cov}}$, methods based on $K_c$ should give results equivalent to those based on $K$. This conclusion follows from the relation $K_c = KU$, where the $k \times k$ upper triangular matrix

$$U = \begin{pmatrix}
\frac{1}{\sigma(Y)} & \frac{-2E(Y)}{\sigma^2(Y)} & \frac{(E(Y))^2 C_3^2}{\sigma^3(Y)} & \cdots & \frac{(-E(Y))^{k-1} C_k^{k-1}}{\sigma^k(Y)} \\
0 & \frac{1}{\sigma^2(Y)} & \frac{-E(Y)\, C_3^1}{\sigma^3(Y)} & \cdots & \frac{(-E(Y))^{k-2} C_k^{k-2}}{\sigma^k(Y)} \\
0 & 0 & \frac{1}{\sigma^3(Y)} & \cdots & \frac{(-E(Y))^{k-3} C_k^{k-3}}{\sigma^k(Y)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \frac{-E(Y)\, C_k^1}{\sigma^k(Y)} \\
0 & 0 & 0 & \cdots & \frac{1}{\sigma^k(Y)}
\end{pmatrix}$$

is non-singular, and $C_k^n = \binom{k}{n}$; that is, the $(i, j)$ entry of $U$ is $C_j^{\,j-i}(-E(Y))^{j-i}/\sigma^j(Y)$ for $i \le j$ and $0$ otherwise.
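The structure of $U$ is easy to check numerically. The sketch below (our own construction; `u_matrix` is an illustrative name) builds $U$ from plug-in moments and verifies the finite-sample identity $\hat K_c = \hat K \hat U$ used in Section 3.2, reusing `Z` and `Y` from the previous sketch.

```python
import numpy as np
from math import comb

def u_matrix(mean, sd, k):
    """Upper triangular U: (i, j) entry is comb(j, j - i) * (-mean)^(j - i) / sd^j."""
    U = np.zeros((k, k))
    for j in range(1, k + 1):
        for i in range(1, j + 1):
            U[i - 1, j - 1] = comb(j, j - i) * (-mean) ** (j - i) / sd ** j
    return U

n, k = len(Y), 3
W = (Y - Y.mean()) / Y.std()                     # sample version of the scaled response
K_hat  = np.column_stack([Z.T @ (Y ** j) / n for j in range(1, k + 1)])
Kc_hat = np.column_stack([Z.T @ (W ** j) / n for j in range(1, k + 1)])
U_hat  = u_matrix(Y.mean(), Y.std(), k)
assert np.allclose(Kc_hat, K_hat @ U_hat)        # K_c = K U with plug-in moments
```

The identity holds exactly in the sample because the standardized predictors sum to zero, so the constant term in each expanded power of $\hat W$ drops out.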

3. Asymptotic distribution

In general, we are interested in the subspace

$$S\{E(f_i^{(k)}(Y)Z),\ i = 1, \ldots, h\} \subseteq S^{(k)}_{Y|Z}, \qquad (2)$$

where $f_1^{(k)}(Y), \ldots, f_h^{(k)}(Y)$ are $h \le \min(p, k)$ linearly independent known polynomials of degree up to $k$. Particular interest is placed on the special case

$$S(K_c) = S^{(k)}_{\mathrm{Cov}} \subseteq S^{(k)}_{Y|Z}. \qquad (3)$$

Our goal is to estimate the dimensions of these subspaces by using the singular values of the corresponding sample kernel matrices. While (3) is the subspace that we normally pursue in practice, the results of the next section are presented in terms of (2) to allow for generality.

3.1. General known polynomial $f^{(k)}(Y)$

Let $\hat\Sigma_X$ denote the usual estimate of $\Sigma_X$, and define the standardized predictors

$$\hat Z_i = \hat\Sigma_X^{-1/2}(X_i - \bar X), \quad i = 1, \ldots, n,$$

where $\bar X$ is the sample mean of the predictor vector. Define $\varphi(Y) = (f_1^{(k)}(Y), \ldots, f_h^{(k)}(Y))^T$, $M_\varphi = E(\varphi(Y))$, $M_i = E(f_i^{(k)}(Y))$, and

$$K_h = \big(E((f_1^{(k)}(Y) - M_1)Z), \ldots, E((f_h^{(k)}(Y) - M_h)Z)\big) = \big(E(f_1^{(k)}(Y)Z), \ldots, E(f_h^{(k)}(Y)Z)\big).$$
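As a sketch of these definitions in code (our own, reusing `Z` and `Y` from the Section 2 sketch; `kernel_hat` is an illustrative name), the sample version of $K_h$ for any user-chosen polynomials $f_i^{(k)}$ is:

```python
import numpy as np

def kernel_hat(Z_hat, Y, polys):
    """Sample kernel matrix: column i is the mean of f_i(Y_j) * Z_hat_j,
    for user-supplied polynomials f_i (the f_i^{(k)} of the text)."""
    n = len(Y)
    return np.column_stack([Z_hat.T @ f(Y) / n for f in polys])

# e.g. h = 2 linearly independent polynomials of degree <= k = 2:
K_h_hat = kernel_hat(Z, Y, [lambda y: y, lambda y: y ** 2 - y])
```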


The sample kernel matrix corresponding to $K_h$ is

$$\hat K_h = \left(\frac{1}{n}\sum_{i=1}^n f_1^{(k)}(Y_i)\hat Z_i,\ \ldots,\ \frac{1}{n}\sum_{i=1}^n f_h^{(k)}(Y_i)\hat Z_i\right).$$

Assume $d = \dim(S(K_h))$ is known. Let $\hat s_1 \ge \hat s_2 \ge \cdots \ge \hat s_h$ be the singular values of $\hat K_h$ and let $\hat l_1, \ldots, \hat l_h$ be the corresponding left singular vectors. Then $\hat S(K_h) = S(\hat l_1, \ldots, \hat l_d)$. Otherwise, inference about $d$ is still required for use in practice. The linear combination $\hat l_j^T \hat Z$ will be called the $j$th $\mathrm{Cov}_k$ predictor.

We base our estimate of $d$ on the idea suggested by Li (1991). The statistic

$$\hat\Lambda_m = n \sum_{j = m + 1}^{h} \hat s_j^2$$

is used as follows to estimate $d$. Beginning with $m = 0$, compare $\hat\Lambda_m$ to the percentage points of its distribution under the hypothesis $d = m$. If it is smaller, there is no information to contradict the hypothesis. If it is larger, conclude that $d > m$, increment $m$ by 1, and repeat the procedure. The estimate $\hat d = m$ follows when $\hat\Lambda_{m-1}$ is relatively large, implying that $d > m - 1$, while $\hat\Lambda_m$ is relatively small, so there is no information to contradict the hypothesis. The estimate of $S(K_h)$ is then given by $S\{\hat l_1, \ldots, \hat l_{\hat d}\}$. These vectors can then be back-transformed to $\hat\Sigma_X^{-1/2}\hat l_j$, $j = 1, \ldots, \hat d$, for consideration on the original scale.
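The sequential procedure is short to code. In the minimal sketch below (our own; `estimate_d` and `null_quantile` are illustrative names), `null_quantile(m)` must return a percentage point of the null distribution of $\hat\Lambda_m$ under $d = m$, obtained either from Proposition 1 below or from the permutation approach of Yin and Cook (2002); a plug-in version is sketched after Proposition 2.

```python
import numpy as np

def estimate_d(s_hat, n, null_quantile):
    """Sequential dimension estimate: starting at m = 0, stop at the first m
    for which Lambda_m = n * sum_{j > m} s_j^2 does not exceed its null quantile."""
    for m in range(len(s_hat)):
        Lambda_m = n * np.sum(s_hat[m:] ** 2)   # 0-based: s_hat[m:] is s_{m+1}, ..., s_h
        if Lambda_m <= null_quantile(m):
            return m                            # no information against d = m
    return len(s_hat)                           # every hypothesis rejected

# d_hat = estimate_d(np.linalg.svd(K_h_hat, compute_uv=False), n, null_quantile)
```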

Here, we find the asymptotic distribution of $\hat\Lambda_d$ by investigating the joint asymptotic distribution of the smallest $h - d$ singular values of $\hat K_h$, using the general approach developed by Eaton and Tyler (1994). First, construct the singular value decomposition of $K_h$,

$$K_h = C^T \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} W,$$

where $C^T$ and $W$ are orthonormal matrices with dimensions $p \times p$ and $h \times h$, and $D$ is a $d \times d$ diagonal matrix of singular values. Next partition $C^T = (C_1, C_0)$ and $W^T = (W_1, W_0)$, where $C_0$ is $p \times (p - d)$ and $W_0$ is $h \times (h - d)$. Then it follows from Eaton and Tyler (1994) that the asymptotic distribution of the smallest $h - d$ singular values of $\sqrt{n}\,\hat K_h$ is the same as the asymptotic distribution of the singular values of the $(p - d) \times (h - d)$ matrix

$$\sqrt{n}\,U_n = \sqrt{n}\,C_0^T \hat K_h W_0.$$

Thus, the asymptotic distribution of $\hat\Lambda_d$ is the same as that of

$$L_d = n\,\mathrm{trace}\big[C_0^T \hat K_h W_0 (C_0^T \hat K_h W_0)^T\big],$$

which is the sum of the squared singular values of $\sqrt{n}\,U_n$; equivalently, $L_d = n\,\mathrm{vec}(U_n)^T \mathrm{vec}(U_n)$, where "vec" is the usual operator that maps a matrix into a vector by stacking its columns. Now we are ready to state the following proposition.

Proposition 1. Let $d = \dim[S(K_h)]$. Then the asymptotic distribution of $\hat\Lambda_d$ is the same as the distribution of $Q = \sum_{j=1}^{(p-d)(h-d)} \omega_j Q_j$, where the $Q_j$ are independent chi-square random variables, each with one degree of freedom, and $\omega_1 \ge \omega_2 \ge \cdots \ge \omega_{(p-d)(h-d)}$ are the eigenvalues of the covariance matrix

$$\Omega = (W_0^T \otimes C_0^T)\,\mathrm{Var}\big[(\varphi(Y) - M_\varphi) \otimes Z\big]\,(W_0 \otimes C_0), \qquad (4)$$

where $\otimes$ is the Kronecker product.

This result does not require the linearity condition, an essential part of the justification for $\mathrm{Cov}_k$. Thus it provides a general method for inference about $S^{(k)}_{\mathrm{Cov}}$. But the linearity condition is still required for establishing the connection between $S^{(k)}_{\mathrm{Cov}}$ and $S^{(k)}_{Y|Z}$.

3.2. Special polynomial $((Y - E(Y))/\sigma(Y))^j$

In this section, we consider the special case $K_c$, assuming $k \le p$. Recall that $K_c = KU$ and $W = (Y - E(Y))/\sigma(Y)$. Define the corresponding sample estimates: $\hat W = (Y - \bar Y)/\hat\sigma(Y)$, where $\bar Y$ and $\hat\sigma(Y)$ are the sample mean and sample standard deviation of $Y$, respectively, and $\hat U$ is the matrix $U$ with $E(Y)$ and $\sigma(Y)$ replaced by $\bar Y$ and $\hat\sigma(Y)$. Finally,

$$\hat K = \left(\frac1n\sum_{i=1}^n Y_i \hat Z_i,\ \ldots,\ \frac1n\sum_{i=1}^n Y_i^k \hat Z_i\right) \quad\text{and}\quad \hat K_c = \left(\frac1n\sum_{i=1}^n \hat W_i \hat Z_i,\ \ldots,\ \frac1n\sum_{i=1}^n \hat W_i^k \hat Z_i\right).$$

With some algebra, $\hat K_c = \hat K \hat U$. Construct the singular value decomposition of $K_c$,

$$K_c = C_c^T \begin{pmatrix} D_c & 0 \\ 0 & 0 \end{pmatrix} W_c,$$

where $C_c^T$ and $W_c$ are orthonormal matrices with dimensions $p \times p$ and $k \times k$, and $D_c$ is a $d \times d$ diagonal matrix of singular values. Partition $C_c^T = (C_{c1}, C_{c0})$ and $W_c^T = (W_{c1}, W_{c0})$, where $C_{c0}$ is $p \times (p - d)$ and $W_{c0}$ is $k \times (k - d)$. Similar to the argument in Section 3.1, we then need to investigate the distribution of $\sqrt{n}\,V_n = \sqrt{n}\,C_{c0}^T \hat K_c W_{c0}$. Note that $C_{c0}^T K_c = 0$ and $K U W_{c0} = 0$; since $U$ is non-singular, the first identity also gives $C_{c0}^T K = 0$. We then have

$$\begin{aligned}
\sqrt{n}\,V_n &= \sqrt{n}\,C_{c0}^T \hat K \hat U W_{c0} \\
&= \sqrt{n}\,C_{c0}^T (\hat K - K)(\hat U - U + U) W_{c0} \\
&= \sqrt{n}\,C_{c0}^T (\hat K - K) U W_{c0} + o_p(1) \\
&= \sqrt{n}\,C_{c0}^T (\hat K - \hat B K) U W_{c0} + o_p(1) \\
&= \sqrt{n}\,C_{c0}^T C_n U W_{c0} + o_p(1),
\end{aligned}$$

where $\hat B$ and $C_n$ are defined in the appendix, with the general polynomial replaced by $\varphi(Y) = (Y, \ldots, Y^k)^T$. Thus, based on the proof of Proposition 1, the new corresponding covariance matrix is

$$\Omega_c = (W_{c0}^T \otimes C_{c0}^T)\,\mathrm{Var}\big[(U^T(\varphi(Y) - M_{\varphi(Y)})) \otimes Z\big]\,(W_{c0} \otimes C_{c0}).$$


But simple algebra shows that $U^T(\varphi(Y) - M_{\varphi(Y)}) = \varphi(W) - M_{\varphi(W)}$. Hence we summarize our result in the following proposition.

Proposition 2. Let $d = \dim[S(K_c)]$. Then the asymptotic distribution of $\hat\Lambda_d$ is the same as the distribution of $Q = \sum_{j=1}^{(p-d)(k-d)} \omega_j Q_j$, where the $Q_j$ are independent chi-square random variables, each with one degree of freedom, and $\omega_1 \ge \omega_2 \ge \cdots \ge \omega_{(p-d)(k-d)}$ are the eigenvalues of the covariance matrix

$$\Omega_c = (W_{c0}^T \otimes C_{c0}^T)\,\mathrm{Var}\big[(\varphi(W) - M_{\varphi(W)}) \otimes Z\big]\,(W_{c0} \otimes C_{c0}), \qquad (5)$$

where $W = (Y - E(Y))/\sigma(Y)$.
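Propositions 1 and 2 suggest a natural plug-in test: replace the population quantities in $\Omega_c$ with sample versions, take the eigenvalues $\hat\omega_j$, and simulate the weighted chi-square variable $Q$. The sketch below is our own plug-in implementation for the $K_c$ kernel; the paper establishes the limiting distribution but does not prescribe this particular estimator of $\Omega_c$, and `null_quantile_covk` is an illustrative name.

```python
import numpy as np

def null_quantile_covk(Z_hat, Y, k, m, alpha=0.05, reps=20000, seed=0):
    """Plug-in (1 - alpha) point of the Proposition 2 null distribution of
    Lambda_m under the hypothesis d = m, for the centered-and-scaled kernel K_c."""
    rng = np.random.default_rng(seed)
    n, p = Z_hat.shape
    W = (Y - Y.mean()) / Y.std()
    phiW = np.column_stack([W ** j for j in range(1, k + 1)])   # phi(W), n x k
    phiW_c = phiW - phiW.mean(axis=0)                           # phi(W) - M_phi(W)
    Kc_hat = Z_hat.T @ phiW / n                                 # p x k kernel estimate
    C, s, Wt = np.linalg.svd(Kc_hat)
    C0, W0 = C[:, m:], Wt.T[:, m:]      # estimated null-space directions under d = m
    # sample covariance of (phi(W) - M_phi(W)) kron Z, a (k p) x (k p) matrix
    G = np.einsum('ni,nj->nij', phiW_c, Z_hat).reshape(n, -1)
    Omega = np.kron(W0.T, C0.T) @ np.cov(G, rowvar=False) @ np.kron(W0, C0)
    omega = np.clip(np.linalg.eigvalsh(Omega), 0.0, None)       # eigenvalues >= 0
    Q = (rng.chisquare(1, size=(reps, omega.size)) * omega).sum(axis=1)
    return np.quantile(Q, 1 - alpha)
```

Combined with the `estimate_d` sketch from Section 3.1, one would pass the singular values of $\hat K_c$ together with `lambda m: null_quantile_covk(Z, Y, k, m)`.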

Appendix. Justifications

Proof of Proposition 1. By the equivalence of $\hat\Lambda_d$ and $L_d$, we need only develop the asymptotic distribution for $L_d = n\,\mathrm{vec}(U_n)^T \mathrm{vec}(U_n)$. Define $\hat B = \hat\Sigma_X^{-1/2}\Sigma_X^{1/2}$ and $C_n = \hat K_h - \hat B K_h$. Then, because $K_h W_0 = 0$,

$$U_n = C_0^T(\hat K_h - \hat B K_h)W_0 = C_0^T C_n W_0,$$
$$\sqrt{n}\,\mathrm{vec}(U_n) = (W_0^T \otimes C_0^T)\sqrt{n}\,\mathrm{vec}(C_n),$$
$$\mathrm{vec}(C_n) = \mathrm{vec}(\hat K_h - \hat B K_h) = \mathrm{vec}(\hat K_h) - (I \otimes \hat B)\,\mathrm{vec}(K_h).$$

Now, because $E(Z) = 0$ and $\sum_{i=1}^n \hat Z_i = 0$, we can write

$$K_h = E\big((\varphi(Y)^T - M_\varphi^T) \otimes Z\big) = E(\varphi(Y)^T \otimes Z),$$
$$\hat K_h = \frac1n\sum_{i=1}^n (\varphi(Y_i)^T - M_\varphi^T) \otimes \hat Z_i = \frac1n\sum_{i=1}^n \varphi(Y_i)^T \otimes \hat Z_i.$$

Thus we have $\mathrm{vec}(C_n) = \frac1n\sum_{i=1}^n \varphi(Y_i) \otimes \hat Z_i - (I \otimes \hat B)E(\varphi(Y) \otimes Z)$. Note that $(I \otimes \hat B)E(\varphi(Y) \otimes Z) = (I \otimes \hat B)E((\varphi(Y) - M_\varphi) \otimes Z)$, and

$$\begin{aligned}
\frac1n\sum_{i=1}^n \varphi(Y_i) \otimes \hat Z_i
&= \frac1n\sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes \hat Z_i \\
&= \frac1n\sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes \hat\Sigma_X^{-1/2}(X_i - \bar X) \\
&= \frac{I \otimes \hat\Sigma_X^{-1/2}}{n}\sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes \big(X_i - \mu_X - (\bar X - \mu_X)\big) \\
&= \frac{I \otimes \hat\Sigma_X^{-1/2}}{n}\left(\sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes (X_i - \mu_X) - \sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes (\bar X - \mu_X)\right) \\
&= \frac{I \otimes \hat B}{n}\left(\sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes Z_i - \sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes \bar Z\right) \\
&= \frac{I \otimes \hat B}{n}\sum_{i=1}^n (\varphi(Y_i) - M_\varphi) \otimes Z_i - (I \otimes \hat B)\,\overline{(\varphi(Y) - M_\varphi)} \otimes \bar Z,
\end{aligned}$$

where $\overline{(\varphi(Y) - M_\varphi)} = \frac1n\sum_{i=1}^n(\varphi(Y_i) - M_\varphi)$ and $\bar Z = \frac1n\sum_{i=1}^n Z_i$. Combining these two, we have

$$\sqrt{n}\,\mathrm{vec}(C_n) = (I \otimes \hat B)\,\sqrt{n}\left(\frac1n\sum_{i=1}^n(\varphi(Y_i) - M_\varphi) \otimes Z_i - E\big((\varphi(Y) - M_\varphi) \otimes Z\big)\right) - (I \otimes \hat B)\big(\sqrt{n}\,\overline{(\varphi(Y) - M_\varphi)} \otimes \bar Z\big).$$

Now, applying the central limit theorem to the above two terms and using the facts that $\hat B \to_P I$ and $\bar Z \to_P 0$, we have that the second term converges to 0 in probability and that the first term is asymptotically normal with mean 0 and covariance matrix $\mathrm{Var}\big((\varphi(Y) - M_\varphi) \otimes Z\big)$. Thus $\sqrt{n}\,\mathrm{vec}(U_n)$ is asymptotically normal with mean 0 and covariance matrix $\Omega$. Finally, the conclusion follows from the usual multinormal theory. $\square$
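As a small numerical sanity check (our own, with arbitrary matrices) of the vec/Kronecker step $\sqrt{n}\,\mathrm{vec}(U_n) = (W_0^T \otimes C_0^T)\sqrt{n}\,\mathrm{vec}(C_n)$, which is the identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ with column-stacking vec:

```python
import numpy as np

rng = np.random.default_rng(1)
p, h, d = 5, 3, 1
C0 = rng.normal(size=(p, p - d))
W0 = rng.normal(size=(h, h - d))
Cn = rng.normal(size=(p, h))
vec = lambda M: M.reshape(-1, order='F')   # stack columns
lhs = vec(C0.T @ Cn @ W0)                  # vec(U_n)
rhs = np.kron(W0.T, C0.T) @ vec(Cn)        # (W0^T kron C0^T) vec(C_n)
assert np.allclose(lhs, rhs)
```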

References

Cook, R.D., Weisberg, S., 1991. Discussion of Li (1991). J. Amer. Statist. Assoc. 86, 328–332.
Cook, R.D., Yin, X., 2001. Dimension reduction and visualization in discriminant analysis (with discussion). Austral. New Zealand J. Statist. 43 (2), 147–199.
Cook, R.D., Li, B., 2002. Dimension reduction for the conditional mean in regression. Ann. Statist. 30, 455–474.
Eaton, M.L., Tyler, D., 1994. The asymptotic distribution of singular values with application to canonical correlations and correspondence analysis. J. Multivariate Anal. 50, 238–264.
Li, K.C., 1991. Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86, 316–342.
Yin, X., Cook, R.D., 2002. Dimension reduction for the conditional kth moment in regression. J. Roy. Statist. Soc. Ser. B 64 (Part 2), 159–175.