A theoretical view of the envelope model for multivariate linear regression as response dimension reduction

Journal of the Korean Statistical Society 42 (2013) 143–148

Jae Keun Yoo, Department of Statistics, Ewha Womans University, Seoul 120-750, Republic of Korea

Article history: Received 1 August 2011; Accepted 12 March 2012; Available online 5 April 2012.

AMS 2000 subject classifications: primary 62G08; secondary 62H05.

Abstract: The envelope model recently developed for the classical multivariate linear regression offers a potential gain in efficiency in estimating the unknown parameters over the usual maximum likelihood estimation. In this paper, we theoretically investigate the envelope model as dimension reduction for the response variables and connect it to existing methods.

Keywords: envelope model; multivariate linear regression; reducing subspaces; response dimension reduction

1. Introduction

When interest is placed on how the distribution of a multi-dimensional response changes as predictors vary, multivariate linear regression is one of the most popular statistical tools. The classical multivariate linear regression of Y ∈ R^r on X ∈ R^p with r ≥ 2 is as follows:

Y|X = α + βX + ε,   (1)

where α ∈ R^r is an intercept vector, β ∈ R^{r×p} is an unknown coefficient matrix, and the error vector ε ∈ R^r ∼ MN(0, Σ ≥ 0) with ε ⊥⊥ X. The notation '⊥⊥' indicates statistical independence, and MN stands for the multivariate normal distribution. In addition, it is assumed that Σ > 0 throughout the rest of the paper.

When the dimensions of Y and X are high, maximum likelihood estimation (MLE) of the parameters, especially of the regression coefficient matrix β, may not be efficient. This is problematic when prediction of the responses for a new observation of X is a main issue in the regression study. As an alternative in such cases, Cook, Li, and Chiaromonte (2010, CLC) proposed the envelope model under conditions that force a connection between the conditional mean E(Y|X) and the conditional covariance cov(Y|X) = Σ. By constructing the minimal reducing subspace of Σ that is fully informative about E(Y|X), the dimensions of the parameters in model (1) are reduced, and this leads to a more efficient MLE than the usual one. This model is called the envelope model. Another interpretation of the envelope model is to partition the response subspace into the reducing subspace and its complement. Since the former subspace is fully informative about E(Y|X), the projection of the response variables onto the reducing subspace can be thought of as dimension reduction of the responses.
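To fix ideas, the following minimal sketch (Python with NumPy; all names and the toy dimensions are ours, not from the paper) simulates data from model (1) and computes the usual OLS estimator of β, the baseline that the envelope model seeks to improve:

```python
import numpy as np

rng = np.random.default_rng(0)
r, p, n = 4, 3, 500                      # response dim, predictor dim, sample size

beta = rng.normal(size=(r, p))           # true coefficient matrix
Sigma = np.eye(r)                        # error covariance (Sigma > 0)

X = rng.normal(size=(n, p))              # predictors, sampled independently of errors
eps = rng.multivariate_normal(np.zeros(r), Sigma, size=n)
Y = X @ beta.T + eps                     # model (1) with alpha = 0

# OLS estimate: beta_hat = cov(Y, X) Sigma_X^{-1}, via centered cross-products
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
beta_hat = (Yc.T @ Xc) @ np.linalg.inv(Xc.T @ Xc)
print(np.round(beta_hat - beta, 2))      # estimation error, small for large n
```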


The main purpose of this paper is to provide a theoretical view of the envelope model for multivariate regression as response dimension reduction. Since, under the envelope model, lower-dimensional linearly transformed responses given X alone are informative about E(Y|X), these linear combinations of Y can be considered as dimension-reduced responses. For this, we theoretically investigate the envelope model in the context of response dimension reduction developed by Yoo and Cook (2008, YC).

Response dimension reduction is important in various fields of study. Analysis of repeated measures, longitudinal data, functional data, and curve or time series data is often difficult due to the high dimensionality of Y, although the dimension of X is relatively low. The study of such data would be facilitated if we could find a low-dimensional linear transform of Y that adequately describes the regression relationship. For example, Leurgans, Moyeed, and Silverman (1993) applied canonical correlation analysis to functional data in order to apply smoothing in a suitable way by reducing the dimensions. More recently, Li, Aragon, Shedden, and Agnan (2003) reduced the dimension of a 12-dimensional multivariate response to adequately analyze China climate data.

The organization of the paper is as follows. Section 2 is devoted to reviewing both the envelope model for multivariate linear regression and response dimension reduction in YC. Section 3 contains a new interpretation of the envelope model as response dimension reduction in the context of YC. In Section 4, we summarize our work.

For notational convenience, define Σ_U = cov(U) for a random vector U ∈ R^u, and let S(B) stand for the subspace spanned by the columns of B ∈ R^{r×p}. For a subspace S of R^r, S^⊥ stands for its orthogonal complement. The original paper on the envelope model by CLC can be acquired from http://www.stat.umn.edu/~dennis/RecentArticles/CLC.pdf.

2. Review: envelope model and response dimension reduction

2.1. Envelopes

Envelopes are subspaces that originate from the concepts of invariant and reducing subspaces, so we start with an invariant subspace. A subspace S of R^r is an invariant subspace of M ∈ R^{r×r} if MS ⊆ S. Moreover, if also MS^⊥ ⊆ S^⊥, then S is a reducing subspace of M. Now we define M-envelopes as follows. Let M ∈ S^{r×r}, the set of symmetric r × r matrices, and let S ⊆ S(M). The M-envelope of S, written E_M(S), is the intersection of all reducing subspaces of M that contain S. By definition, E_M(S) is the minimal and unique such reducing subspace among all possible ones. For more details regarding invariant and reducing subspaces and E_M(S), readers can refer to Section 2 of CLC.

To develop an envelope model under (1), we take M to be Σ, the conditional covariance cov(Y|X), equivalently the covariance of the random error vector ε in (1). As the choice of S, we consider B = S(β). Letting d = dim(B) and u = dim{E_Σ(B)}, it is assumed that 0 < d ≤ u ≤ r throughout the rest of the paper. Following the definition of E_Σ(B), Σ is partitioned along E_Σ(B) and E_Σ^⊥(B) in the envelope model.
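To make these definitions concrete, here is a numeric sketch (Python/NumPy; the function names are ours). It checks the reducing-subspace property and constructs an M-envelope by projecting S(B) onto the eigenspaces of M and collecting the results, a characterization of the envelope discussed in Section 2 of CLC:

```python
import numpy as np

def is_reducing_subspace(M, G, tol=1e-10):
    """Check that span(G) is a reducing subspace of symmetric M.

    G is r x k with orthonormal columns; for symmetric M, span(G)
    reduces M iff Q M G = 0 with Q = I - G G^T, since invariance of
    span(G) then forces invariance of its orthogonal complement too.
    """
    Q = np.eye(M.shape[0]) - G @ G.T
    return np.linalg.norm(Q @ M @ G) < tol

def envelope_basis(M, B, tol=1e-8):
    """Orthonormal basis of the M-envelope of span(B): project span(B)
    onto each eigenspace of M and keep the nonzero pieces.
    (Assumes the collected columns are linearly independent, which
    holds when B has a single column.)"""
    vals, vecs = np.linalg.eigh(M)
    cols = []
    for v in np.unique(np.round(vals, 8)):        # loop over distinct eigenvalues
        Vi = vecs[:, np.abs(vals - v) < tol]      # eigenspace basis
        Pi_B = Vi @ (Vi.T @ B)                    # projection of B onto eigenspace
        if np.linalg.norm(Pi_B) > tol:
            cols.append(Pi_B)
    Q, _ = np.linalg.qr(np.hstack(cols))          # orthonormalize collected columns
    return Q

# Example: Sigma with two eigenspaces; beta lying inside one of them
Sigma = np.diag([1.0, 1.0, 4.0])
beta = np.array([[1.0], [1.0], [0.0]])
G = envelope_basis(Sigma, beta)
print(G.shape[1], is_reducing_subspace(Sigma, G))  # u = 1, True
```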
2.2. Envelope model for multivariate linear regression

For the classical multivariate linear regression in (1), we connect Σ and B through E_Σ(B) by assuming the existence of E_Σ(B). It should again be noted that B ⊆ E_Σ(B) and that E_Σ(B) reduces Σ.

We denote by Γ an r × u semi-orthogonal basis matrix of E_Σ(B) throughout the rest of the paper. Then we can state that: (1) β does not have full rank if u < r; (2) if so, only parts of β are informative about the regression; (3) Γ, that is E_Σ(B), can fully explain β, because β = Γν for some u × p matrix ν; (4) this implies that the MLE of β can be obtained through the lower-dimensional matrix Γ.

Along with the statements above, the following results are easily derived under the existence of E_Σ(B) in (1).

R1. Σ = Σ_1 + Σ_2 with Σ_1Σ_2 = 0 and E_Σ(B) = S(Σ_1).
R2. Model (1) can be re-written as follows:

Y|X = α + ΓνX + ε,   (2)

where the r × (r − u) matrix Γ_0 is an orthogonal complement of Γ, Ω = Γ^TΣΓ, and Ω_0 = Γ_0^TΣΓ_0.
R3. Σ_1 = ΓΩΓ^T and Σ_2 = Γ_0Ω_0Γ_0^T.
R4. Γ^TY ⊥⊥ Γ_0^TY | X.
R5. Γ_0^TY ⊥⊥ X and Γ_0^TY ⊥⊥ X | Γ^TY.

Results R1–R3 come directly from properties of envelopes. Result R4 can be proved from R3 under (1), and it rules out any possibility that Γ_0^TY contributes to the regression. The relation β = Γν and R4 directly imply R5.

Now we consider model (2) as an alternative to (1); the model in (2) is called the envelope model for multivariate linear regression. To gain insight into how efficient model (2) can be, we compare the total number of parameters for both. Model (2) has r + pu + u(r − u) + u(u + 1)/2 + (r − u)(r − u + 1)/2 = r + pu + r(r + 1)/2 parameters, while model (1) has r + pr + r(r + 1)/2. The difference between the two is p(r − u), and with high-dimensional p and u relatively small compared to r, the difference clearly gets bigger.
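A quick arithmetic check of this parameter count (a minimal sketch; the function name is ours):

```python
def n_params(r, p, u):
    """Parameter counts for the envelope model (2) and the full model (1)."""
    envelope = r + p * u + u * (r - u) + u * (u + 1) // 2 + (r - u) * (r - u + 1) // 2
    full = r + p * r + r * (r + 1) // 2
    return envelope, full

env, full = n_params(r=10, p=50, u=2)
print(env, full, full - env)   # difference equals p * (r - u) = 50 * 8 = 400
```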


Let β̂_CLC be the maximum likelihood estimator (MLE) of β under (2), and let β̂_OLS be the usual OLS estimator of β under (1). From CLC, the asymptotic covariances of β̂_CLC and β̂_OLS are as follows:

Under (1), cov{√n vec(β̂_OLS − β)} = Σ_X^{-1} ⊗ Σ;   (3)

Under (2), cov{√n vec(β̂_CLC − β)} = Σ_X^{-1} ⊗ ΓΓ^TΣΓΓ^T + (ν^T ⊗ Γ_0)M^{-}(ν ⊗ Γ_0^T),   (4)

where n stands for the sample size, '⊗' represents the usual Kronecker product operator (see Searle, 1982), and '−' in M^{-} indicates the Moore–Penrose inverse. According to CLC, the right-hand side of (4) does not exceed Σ_X^{-1} ⊗ Σ in (3), and the difference becomes bigger with high-dimensional p and u relatively small compared to r. From these results, under the envelope model (2), we have a potential gain in efficiency in the MLE of β and in the prediction of Y for a new observation of X, based on these covariance comparisons. For more details regarding the MLEs under (2), a closer look at M, and the derivation of the covariances in (3) and (4), readers can refer to Section 5 of CLC.

2.3. Response dimension reduction

Define L ∈ R^{r×q} with the smallest possible rank and q ≤ r so that E(Y|X) = A E(L^TY|X), where A is an r × q matrix. This says that X can be thought of as influencing L^TY, and all other conditional mean components are determined from E(L^TY|X) via A. It can be shown that A is a generalized inverse of L^T: L^TAL^T = L^T. Without loss of generality, we take A = Σ_Y L(L^TΣ_Y L)^{-1}. Then LA^T forms the orthogonal projection operator P_{L(Σ_Y)} for S(L) relative to the inner product ⟨v_1, v_2⟩_{Σ_Y} = v_1^TΣ_Y v_2, so we have

E(Y|X) = E{P^T_{L(Σ_Y)} Y|X}.   (5)

This says that E(Y|X) varies in the subspace spanned by Σ_Y L, depending only on X. In other words, we pursue dimension reduction of Y through linear projection without loss of information about E(Y|X). Following Yoo and Cook (2008, YC), this type of dimension reduction in regression is called linear response reduction for E(Y|X).

Suppose that there exists an r × k matrix K ≠ I_r so that

E(Y|X) = E{E(Y|X, K^TY)|X} = E{E(Y|K^TY)|X}.   (6)

Then E(Y|K^TY) in (6) can be re-written as E{g(K^TY)|X}. If k < r, dimension reduction for Y is achieved, which is called conditional response reduction. According to Proposition 2 in YC, S(K) ⊆ S(L) for L defined in (5), and the equality holds under the following condition:

A1. E(Y|K^TY = a) is linear in a.

Condition A1 is called the linearity condition and will hold to a reasonable approximation in many problems (Hall & Li, 1993). If Y has an elliptically contoured distribution, condition A1 is automatically satisfied. In the case that condition A1 does not hold, Y can often be one-to-one transformed to satisfy it. In regression, the responses are typically generated given the predictors; that is, the predictors are sampled before the responses are generated. Consequently, condition A1 may not seem reasonable. In practice, however, numerical studies show that the condition holds reasonably well in many multivariate regressions; one example is given in Section 6 of YC.

In YC, such L in (5) and K in (6) under condition A1 are inferred through ρ = Σ_Y^{-1} cov(Y, X) Σ_X^{-1}, based on the relation S(ρ) = S(L) = S(K). Throughout the rest of the paper, the notation φ ∈ R^{r×q} represents an orthonormal basis of S(ρ) and q its true dimension.

3. Response dimension reduction in the envelope model

3.1. Dimension reduction for Y

For convenience and without loss of generality, we can re-write the multivariate linear regression in (1) without the intercept α by centering the responses and predictors. Then E(Y|X) = βX.

Proposition 1. For Γ defined in the envelope model (2) for the multivariate linear regression (1), E(Y|X) = E(P^T_{Γ(Σ_Y)} Y|X).

Proof. Under the envelope model in (2), Proposition 3.1 in CLC establishes that E_Σ(β) = E_Σ(Σ_Y^{-1}β). By the definition of E_Σ(Σ_Y^{-1}β), it is implied that S(Σ_Y^{-1}β) ⊆ S(Γ), equivalently that S(β) ⊆ S(Σ_YΓ). From this we can establish the relation β = Σ_YΓγ for a u × p matrix γ. Therefore, the following chain of equalities can be derived:

P^T_{Γ(Σ_Y)} E(Y|X) = P^T_{Γ(Σ_Y)} βX = Σ_YΓ(Γ^TΣ_YΓ)^{-1}Γ^TβX = Σ_YΓ(Γ^TΣ_YΓ)^{-1}Γ^TΣ_YΓγX = Σ_YΓγX = βX = E(Y|X).

This completes the proof. □
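Proposition 1 is easy to verify numerically. The following sketch (Python/NumPy; the particular construction of Γ, ν, Ω and Ω_0 is ours, chosen only to satisfy β = Γν and the envelope decomposition of Σ) confirms that P^T_{Γ(Σ_Y)} β = β:

```python
import numpy as np

rng = np.random.default_rng(1)
r, p, u = 5, 3, 2

# Semi-orthogonal Gamma (r x u) and its completion Gamma_0 (r x (r-u))
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))
Gamma, Gamma0 = Q[:, :u], Q[:, u:]

nu = rng.normal(size=(u, p))
beta = Gamma @ nu                                  # beta = Gamma nu
Omega = np.diag([2.0, 1.0])                        # u x u, positive definite
Omega0 = np.eye(r - u)                             # (r-u) x (r-u)
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T
Sigma_X = np.eye(p)
Sigma_Y = beta @ Sigma_X @ beta.T + Sigma          # Sigma_Y under model (1)

# P^T_{Gamma(Sigma_Y)} = Sigma_Y Gamma (Gamma^T Sigma_Y Gamma)^{-1} Gamma^T
PT = Sigma_Y @ Gamma @ np.linalg.inv(Gamma.T @ Sigma_Y @ Gamma) @ Gamma.T
print(np.allclose(PT @ beta, beta))                # True: E(Y|X) = E(P^T Y|X)
```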




From Proposition 1, Γ^TY in the envelope model achieves linear response reduction and can replace the original r-dimensional Y without loss of information about E(Y|X). In the envelope model, the predictors X influence the conditional distribution of Y|X only through Γ^TY, because Γ^TY ⊥⊥ Γ_0^TY | X (R4) and Γ_0^TY ⊥⊥ X (R5). This means that all information about Y|X is the same as that about Γ^TY|X, which is one direct implication of Proposition 1.

The linearity condition A1 is satisfied in the envelope model (2). To show this, the following lemma is needed.

Lemma 1. Assume that the centered envelope model in (2) holds. Then E(Γ_0^TY|Γ^TY) = 0.

Proof. We have E(Γ_0^TY|Γ^TY) = E{E(Γ_0^TY|Γ^TY, X)|Γ^TY}. Then results R4–R5 force E(Γ_0^TY|Γ^TY, X) = E(Γ_0^TY|X) = E(Γ_0^TY) = 0. This completes the proof. □

We return to the linearity condition. With P_Γ = ΓΓ^T and Q_Γ = I_r − P_Γ,

E(Y|Γ^TY) = E{(P_Γ + Q_Γ)Y|Γ^TY} = E(P_ΓY|Γ^TY) + E(Q_ΓY|Γ^TY) = P_ΓY.   (7)

The third equality in (7) is directly established by Lemma 1. We summarize this in the following proposition.

Proposition 2. For Γ defined in the envelope model (2) for the multivariate linear regression (1), linearity condition A1 holds; that is, E(Y|Γ^TY = a) is linear in a.

By Proposition 2, linear and conditional response reductions in the envelope model for (1) always coincide.

3.2. Relations between Γ and φ

3.2.1. Containment

Under model (1), β can be thought of as the OLS coefficient matrix, and hence we have β = cov(Y, X)Σ_X^{-1}. Proposition 3.1 in CLC shows that E_Σ(β) = E_Σ(Σ_Y^{-1}β), which implies the following relation:

S(β) = S(Σ_Y^{-1}β) ⊆ S(Γ).

It is easily noticed that Σ_Y^{-1}β = Σ_Y^{-1}cov(Y, X)Σ_X^{-1}, which is the kernel matrix used to restore φ. Therefore, the following relation is directly implied:

S(β) = S{Σ_Y^{-1}cov(Y, X)Σ_X^{-1}} = S(φ) ⊆ S(Γ).   (8)

Eq. (8) shows that S(Γ) is an upper bound of S(φ) under the envelope model. This implies that parts of the dimension-reduced responses Γ^TY can be redundant for E(Y|X), but those parts are required to explain the conditional covariance Σ.

3.2.2. Covariance dimension reduction versus mean dimension reduction

The main purpose of the envelope model (2) is more efficient estimation of β through dimension reduction of the part of Σ informative about β; the relation R1, Σ = Σ_1 + Σ_2 with E_Σ(B) = S(Σ_1), establishes this. This covariance dimension reduction leads to dimension reduction of β, which in turn induces dimension reduction of the response Y for E(Y|X). Supposing that Γ is known, the CLC approach reduces the dimension of Σ from r(r + 1)/2 to u(u + 1)/2 + (r − u)(r − u + 1)/2, and the difference is u(r − u). Since u ≤ r, u(r − u) is always non-negative. On the other hand, the YC approach directly reduces the dimension of Y for E(Y|X) without considering the relation between E(Y|X) and Σ. That is, in the case of YC, the relations Σ = Σ_1 + Σ_2 and E_Σ(B) = S(Σ_1) are not required.

The following example highlights the difference between the two approaches. With r = 2, let φ = (1/√2, 1/√2)^T and φ_0 = (1/√2, −1/√2)^T. We construct a regression as follows: Y ∈ R^2 | X = φνX + ε, where ν ∈ R^{1×p}, ε ∼ MN(0, Σ) and

Σ = [ a  b
      b  c ].

In this example, it is easily derived that E(Y|X) = E(P^T_{φ(Σ_Y)} Y|X) regardless of a, b, and c. The reduced dimension for Y is clearly one, and the YC approach can estimate φ. For this example to be an envelope model with respect to φ, the following covariance decomposition should hold:

Σ = φφ^TΣφφ^T + φ_0φ_0^TΣφ_0φ_0^T = [ (a + c)/2      b
                                            b      (a + c)/2 ].

That is, Σ must have a special structure, namely common diagonal values. If this is not satisfied, φ cannot span a reducing subspace of Σ, and hence we cannot have an envelope model with respect to φ. Suppose that

Σ = [  1   0.5
      0.5   2  ].

In this case, a possible choice of Γ to construct an envelope model must satisfy S(φ) ⊂ S(Γ), as the numeric check below illustrates.
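This requirement is easy to check numerically; in the sketch below (Python/NumPy; the helper `reduces` is ours), φ spans a reducing subspace of Σ when the diagonal values are common, but not for Σ = {(1, 0.5), (0.5, 2)}:

```python
import numpy as np

phi = np.array([[1.0], [1.0]]) / np.sqrt(2)

def reduces(Sigma, G):
    """span(G) reduces symmetric Sigma iff (I - G G^T) Sigma G = 0."""
    Q = np.eye(Sigma.shape[0]) - G @ G.T
    return np.allclose(Q @ Sigma @ G, 0)

Sigma_equal = np.array([[1.5, 0.5], [0.5, 1.5]])   # common diagonal values
Sigma_noneq = np.array([[1.0, 0.5], [0.5, 2.0]])   # unequal diagonal values

print(reduces(Sigma_equal, phi))   # True: envelope model with respect to phi
print(reduces(Sigma_noneq, phi))   # False: S(phi) must be enlarged to S(Gamma)
```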


Given this requirement and without loss of generality, Γ should then be the identity matrix I_2, and the envelope model with respect to Γ fails to reduce the dimension of Y for E(Y|X). Generally, in the case u = r, that is Σ = Σ_1, we can always set Γ = I_r. In such a case, the CLC approach cannot reduce the dimension of Y for E(Y|X) as desired, although there may still be room for dimension reduction of E(Y|X).

When the primary importance in multivariate linear regression is placed on dimension reduction of the covariance cov(Y|X), the YC method is not expected to have a potential advantage over the envelope estimator, because the former covers E(Y|X) only. In such a case, the latter should be considered and has more efficiency in dimension reduction of cov(Y|X).

3.2.3. Parametric versus non-parametric

Response dimension reduction in the envelope model through Γ is parametric, because all theoretical developments for model (2) are based on the existence of E_Σ(B) under the classical multivariate linear regression (1). On the other hand, in the derivation of φ, YC does not impose any assumption except condition A1. In this sense, the YC approach can be seen as non-parametric response dimension reduction.

3.2.4. Estimation

The approaches for the estimation of Γ and φ are different. Since the envelope model is fully parametric, the log-likelihood function under (2) is maximized to estimate the unknown population quantities. After replacing all the other parameters with their MLEs in the log-likelihood, we have the following objective:

log D = log det(Γ^T Σ̂_res Γ) + log det(Γ_0^T Σ̂_Y Γ_0),   (9)

where Σ̂_res is the sample covariance matrix of the residuals from the fit of the multivariate linear regression (1) and Σ̂_Y is the usual moment estimator of Σ_Y. Then Γ is estimated by minimizing log D in (9) over the Grassmann manifold of r × m semi-orthogonal bases under u = m; for more about Grassmann manifolds, readers may refer to Edelman, Arias, and Smith (1998). For the hypothesis u = m, the suggested test statistic is Λ̂(m) = 2(L̂_FM − L̂_m), which converges asymptotically to a χ²_{p(r−m)} distribution, where L̂_FM and L̂_m denote the maximum values of the log-likelihoods for (1) and (2), respectively.
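The following sketch evaluates the objective in (9) (Python/NumPy; the helper names are ours, and the brute-force random search stands in for the proper Grassmann-manifold optimization that CLC actually use):

```python
import numpy as np

def log_D(Gamma, Sigma_res, Sigma_Y):
    """Objective (9): log det(G^T S_res G) + log det(G0^T S_Y G0)."""
    r, u = Gamma.shape
    # Orthonormal completion Gamma_0 from the projection onto span(Gamma)^perp
    U, _, _ = np.linalg.svd(np.eye(r) - Gamma @ Gamma.T)
    Gamma0 = U[:, :r - u]
    return (np.linalg.slogdet(Gamma.T @ Sigma_res @ Gamma)[1]
            + np.linalg.slogdet(Gamma0.T @ Sigma_Y @ Gamma0)[1])

# Crude stand-in for Grassmann optimization: best of many random candidates
def fit_Gamma(Sigma_res, Sigma_Y, u, n_starts=2000, seed=0):
    rng = np.random.default_rng(seed)
    r = Sigma_res.shape[0]
    cands = [np.linalg.qr(rng.normal(size=(r, u)))[0] for _ in range(n_starts)]
    return min(cands, key=lambda G: log_D(G, Sigma_res, Sigma_Y))
```

Note that log D depends on Γ only through span(Γ), since det(O^T A O) = det(A) for any orthogonal O, which is why the minimization lives on a Grassmann manifold.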

The estimation of φ requires the minimization of the following quadratic objective function with arguments B ∈ R^{r×q} and C ∈ R^{q×p}:

F_q(B, C) = {vec(ρ̂) − vec(BC)}^T V̂_ρ^{-1} {vec(ρ̂) − vec(BC)},   (10)

ˆ Y , cov ˆX ˆ (Y, X), and 6 where ρˆ is constructed by replacing 6Y , cov(Y, X) and 6X with their usual moment estimators 6 ˆ ρ is a consistent estimator of the asymptotic covariance matrix of respectively, and the pr × pr inner-product matrix V √ nvec(ρˆ − ρ). The minimization of (10) can be considered as the alternating weighted least squares. Write vec(BC) = (Ir ⊗ B)vec(C) and fix B through replacing it by a p × q initial matrix. Then the estimation of C via minimizing (10) can be considered as 1 ˆ on (Ir ⊗ B) with weight Vˆ − weighted least squares of vec(ρ) ρ . On the other hand, with fixing C, consider minimization of (10) over B. The matrix B = (b1 , . . . , bq ) is updated column by column with the other columns fixed subject to ∥bk ∥ = 1 and bTk B(−k) = 0, where B(−k) is the matrix that is left after taking column bk from B. Let ck is the kth row of C = (c1T , . . . , cqT )T and C(−k) consist of all but the kth row of C. For this partial minimization problem, we can re-write (10) as follows. 1 T ˆ− F ∗ (bk ) = {θ k − (ckT ⊗ Ip )QB(−k) bk }T V ρ {θ k − (ck ⊗ Ip )QB(−k) bk },

where θ k = vec{ρˆ − B(−k) C(−k) } ∈ Rr ×p and QB(−k) projects onto the orthogonal complement of S {B(−k) } in the usual inner product. Again, the problem becomes a weighted least square fit, and we can summarize the estimation of B and C as follows: (1) Fix B. As an initial value, a set of canonical bases, (e1 , . . . , eu ), are taken, where ei is a p × 1 vector with the ith place alone one and elsewhere zeros. ˆ on (Ir ⊗ B) with weight Vˆ −1 . (2) Given B, estimate C through weighted least squares of vec(β) ρ (3) Fixing C with an estimate acquired from step (2), estimate bk , k = 1, . . . , u, in order through weighted least squares of 1 θ k on (ckT ⊗ Ip )QB(−k) with weight Vˆ − ρ . Update B as its estimate acquired from the current step. (4) Alternate weighted least squares in step (2) and (3) until a termination condition is satisfied.
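The following sketch of this alternating scheme makes simplifying assumptions that are ours, not YC's: V̂_ρ is taken to be the identity (plain least squares), and the column-by-column update of B is replaced by the equivalent identity-weight least squares update B = ρ̂C^T(CC^T)^{-1}, followed by re-orthonormalization with the QR factor absorbed into C so that BC is unchanged:

```python
import numpy as np

def estimate_phi(Y, X, q, n_iter=100, tol=1e-10):
    """Sketch of estimating phi via alternating least squares with
    identity weight V_rho: minimizes ||rho_hat - B C||_F^2 over
    B (r x q, orthonormal) and C (q x p); span(B) estimates S(phi)."""
    n = Y.shape[0]
    Yc, Xc = Y - Y.mean(0), X - X.mean(0)
    S_Y, S_X = Yc.T @ Yc / n, Xc.T @ Xc / n
    S_YX = Yc.T @ Xc / n
    # Moment estimate: rho_hat = S_Y^{-1} S_YX S_X^{-1}
    rho = np.linalg.solve(S_Y, S_YX) @ np.linalg.inv(S_X)

    r = rho.shape[0]
    B = np.eye(r)[:, :q]                      # step (1): canonical-basis start
    obj_old = np.inf
    for _ in range(n_iter):
        C = np.linalg.lstsq(B, rho, rcond=None)[0]        # step (2): C given B
        B = rho @ C.T @ np.linalg.inv(C @ C.T)            # step (3): B given C
        B, R = np.linalg.qr(B)                            # keep B orthonormal
        C = R @ C                                         # absorb so BC unchanged
        obj = np.linalg.norm(rho - B @ C) ** 2
        if obj_old - obj < tol:                           # step (4): terminate
            break
        obj_old = obj
    return B, C
```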

Define (φ̂, ν̂_φ) = arg min_{B,C} F_m(B, C) in (10) and F̂_m = F_m(φ̂, ν̂_φ) under q = m. Then, for the hypothesis q = m, the suggested test statistic is nF̂_m, which converges asymptotically to a χ²_{(r−m)(p−m)} distribution. Under q = q̂, φ̂ is the estimator of φ.
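As a usage sketch (assuming SciPy is available; the helper name is ours), the dimension hypothesis q = m is tested by comparing nF̂_m with a χ²_{(r−m)(p−m)} cutoff:

```python
from scipy.stats import chi2

def test_dim(n, F_hat_m, r, p, m, level=0.05):
    """Reject q = m when n * F_hat_m exceeds the chi-square cutoff."""
    stat = n * F_hat_m
    df = (r - m) * (p - m)
    return stat, chi2.sf(stat, df), stat > chi2.ppf(1 - level, df)
```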


4. Discussion

In this paper, we re-interpret the envelope model for the classical multivariate linear regression proposed by Cook et al. (2010) as response dimension reduction in the context of Yoo and Cook (2008). Without loss of information about E(Y|X), the envelope model reduces the dimension of the r-dimensional response Y by transforming Y to its lower-dimensional linear combination Γ^TY, following the notation used in the paper. Since all information about the multivariate linear regression is contained in E(Y|X), Γ^TY|X carries the same amount of information as Y|X. Therefore, using Γ^TY|X instead of Y|X can relieve the curse of dimensionality that arises in the analysis of data with high-dimensional responses, such as functional data, longitudinal data, growth curve or time series data.

We generalize the envelope model in (2) as follows:

Y|X = α + g(βX) + ε,   (11)

where g(·) is an unknown link function. The model in (11) reduces to the model in (2) if g(·) is the identity function. Supposing that E_Σ(B) exists and that β = Γν, we call model (11) the link-free envelope model. One potential research topic is the study of the link-free envelope model regarding its properties and the estimation of the unknown parameters without knowing g(·). The research may include the development of response dimension reduction under (11), which can be called link-free response dimension reduction. Since an unknown link function is involved in E(Y|X), conditions guaranteeing that Γ^TY reduces the dimension of Y without loss of information about E(Y|X) should be established. Work along these lines is in progress.

Acknowledgments

The author is grateful to the associate editor and the two referees for many insightful and helpful comments. This work was supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (2011-0005581).

References

Cook, R. D., Li, B., & Chiaromonte, F. (2010). Envelope models for parsimonious and efficient multivariate linear regression. Statistica Sinica, 20, 927–1010.
Edelman, A., Arias, T. A., & Smith, S. T. (1998). The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20, 303–353.
Hall, P., & Li, K. C. (1993). On almost linearity of low-dimensional projections from high-dimensional data. Annals of Statistics, 21, 867–889.
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society: Series B, 55, 725–740.
Li, K. C., Aragon, Y., Shedden, K., & Agnan, C. T. (2003). Dimension reduction for multivariate response data. Journal of the American Statistical Association, 98, 99–109.
Searle, S. R. (1982). Matrix algebra useful for statistics. New York: Wiley.
Yoo, J. K., & Cook, R. D. (2008). Response dimension reduction for the conditional mean in multivariate regression. Computational Statistics & Data Analysis, 53, 334–343.