Computational Statistics and Data Analysis 103 (2016) 139–150
Reduced rank regression with possibly non-smooth criterion functions: An empirical likelihood approach

Sanying Feng^a, Heng Lian^b,*, Fukang Zhu^c

a School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
b School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
c School of Mathematics, Jilin University, Changchun 130012, China
* Corresponding author. E-mail address: [email protected] (H. Lian).
Article history: Received 22 December 2015; Received in revised form 29 April 2016; Accepted 29 April 2016; Available online 11 May 2016.
Keywords: Asymptotic normality; Empirical likelihood; Median regression; Quantile regression
Abstract
Reduced rank regression is considered when the criterion function is possibly non-smooth, which includes the previously unstudied reduced rank quantile regression. The approach used is based on empirical likelihood with a rank constraint. Asymptotic properties of the maximum empirical likelihood estimator (MELE) are established using general results on over-parametrized models. Empirical likelihood leads to more efficient estimators than some existing estimators. Moreover, within the empirical likelihood framework it is conceptually straightforward to test the rank of the unknown coefficient matrix. The proposed methods are illustrated by simulation studies and real data analyses.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
Reduced rank regression was proposed in the 1950s (Anderson, 1951) with the purpose of introducing a more parsimonious model in cases with multiple responses. Let y = (y_1, ..., y_q)^T be the q responses that are to be related to the p-dimensional predictor x = (x_1, ..., x_p)^T. The multivariate linear regression model is given by
y = α + B^T x + e,   (1)
where α ∈ R^q is the intercept, B is the p × q coefficient matrix, and e is the q-vector of mean-zero noise. Different columns of B are associated with different responses. If we assume the noise terms for different responses are independent and use the ordinary least squares procedure to fit the model, the procedure is the same as fitting a separate linear regression for each response. Reduced rank regression aims at producing a more parsimonious model with fewer than p × q parameters by assuming that rank(B) ≤ r for an integer r < min(p, q). Informally, the number of parameters in reduced rank regression can be counted as follows. Assuming the first r columns of B are linearly independent, we have the parametrization B = (B_1, B_1 B_2^T) for a rank r matrix B, where B_1 is a p × r matrix and B_2 is a (q − r) × r matrix, and the number of free parameters is thus r(p + q − r) < pq. With reduced degrees of freedom, reduced rank regression has the potential to produce a more efficient estimator of B, as shown in Anderson (1999). Some more recent references on reduced rank regression include Geweke (1996), Bunea et al. (2011, 2012), Chen and Huang (2012), Chen et al. (2012), Chen et al. (2013) and Lian and Ma (2013).
In this paper, we consider reduced rank regression in a more general framework, allowing the criterion functions to be non-smooth, as in median or quantile regression, although the framework proposed here is more generally applicable
than the numerical examples provided later. For example, we can also deal with generalized linear models with multivariate binary or count responses. Let β = (β_1^T, ..., β_q^T)^T = vec(B) and let α ∈ R^l be the other parameters in the model (for example, α is the intercept in (1)). Let θ = (α^T, β^T)^T be all the parameters. We assume there are m estimating equations, with m ≥ qp + l,
g(z, θ) = (g_1(z, θ), ..., g_m(z, θ))^T,
where z = (x^T, y^T)^T. For example, in the standard reduced rank regression for the mean, we can use the estimating equations x_j(y_k − α_k − x^T β_k) = 0, j = 1, ..., p, k = 1, ..., q, together with y_k − α_k − x^T β_k = 0, k = 1, ..., q. It is known that the least squares procedure is extremely sensitive to outliers and heavy-tailed error distributions, and its estimator is not asymptotically normal when the second moment of the error is infinite. An obvious alternative to the least squares procedure is to replace the quadratic loss with the least absolute deviation loss, resulting in median regression. More generally, if we want to estimate the τth quantiles of the q responses, we can use the estimating equations x_j(τ − I{y_k ≤ α_k + x^T β_k}) = 0, j = 1, ..., p, k = 1, ..., q, together with τ − I{y_k ≤ α_k + x^T β_k} = 0, k = 1, ..., q. When τ = 1/2 the above reduces to median regression. The flexibility of allowing more estimating equations than parameters also means that we can easily incorporate additional information on the parameters. For example, if the error distribution is symmetric, we can collect the estimating equations for mean regression and median regression together to improve efficiency. Thus our approach is more general than previous reduced rank models, which focus solely on the least squares loss.
We propose to use empirical likelihood (Owen, 1988, 1990), which naturally deals with a general set of estimating equations. As demonstrated in Qin and Lawless (1994) and Newey and Smith (2004), it requires less restrictive distributional assumptions for statistical inference. Empirical likelihood involving non-smooth criterion functions has previously been treated in Lopez et al. (2009). Notably, when there are more estimating equations than parameters, which is typically the case in reduced rank regression, Qin and Lawless (1994) showed that the empirical likelihood approach is fully efficient in the sense that the estimator has the same asymptotic variance as the optimal estimator obtained from the class of estimating equations that are linear combinations of g_1, ..., g_m (see Corollary 2 in Qin and Lawless (1994)). Another advantage of using empirical likelihood is that it is easy to develop a test statistic for the rank of the coefficient matrix based on the empirical likelihood ratio, which is an important but difficult problem.
The rest of the paper is organized as follows. In Section 2, we present the general framework for reduced rank regression based on estimating equations using empirical likelihood. Section 3 specializes the general results to mean regression and quantile regression, verifying that the maximum empirical likelihood estimator is more efficient than some naive alternative estimation approaches. Section 4 reports our numerical results, using both simulations and two data examples. We conclude with some discussion in Section 5. The technical assumptions and theoretical proofs are relegated to the supplementary material (Appendix A).
2. Empirical likelihood for reduced rank regression
Assume we are given i.i.d. observations z_i = (x_i^T, y_i^T)^T, i = 1, ..., n, from some distribution F with unknown parameters (α, B), where α ∈ R^l and B ∈ R^{p×q}. Let β = vec(B) and θ = (α^T, β^T)^T. With m estimating equations g(z, θ), which satisfy E[g(z, θ)] = 0 if θ is the true value, the empirical likelihood ratio is defined as
EL(θ) = sup{ Π_{i=1}^n n p_i | p_i ≥ 0, Σ_{i=1}^n p_i = 1, Σ_{i=1}^n p_i g(z_i, θ) = 0 },
and the maximum empirical likelihood estimator (MELE) is defined to be the maximizer of EL(θ), which exists and is unique under mild regularity assumptions. The reduced rank estimator is naturally defined as the maximizer of EL(θ) under the constraint rank(B) ≤ r, for a given integer r. When the maximizer is well defined, the Lagrange multiplier method gives the optimal weights
p_i = (1/n) · 1/(1 + λ^T g(z_i, θ)),   (2)
where the multiplier λ is determined by
(1/n) Σ_i g(z_i, θ)/(1 + λ^T g(z_i, θ)) = 0,   (3)
and thus λ = λ(θ) is considered as a function of θ. The following matrix plays a role in the proof of our main result:
V = [ V_11    V_12
      V_12^T  0   ],
where V_11 = E[g(z, θ)g^T(z, θ)] is an m × m symmetric matrix and V_12 = −∂E[g(z, θ)]/∂θ^T is an m × (pq + l) matrix. Note that we do not assume that g(z, θ) is differentiable in θ, only that its expectation is.
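Equations (2) and (3) also suggest how EL(θ) can be evaluated in practice: λ(θ) maximizes the concave function λ ↦ Σ_i log(1 + λ^T g(z_i, θ)) over the set where all 1 + λ^T g(z_i, θ) are positive, and log EL(θ) = −Σ_i log(1 + λ(θ)^T g(z_i, θ)). As an illustration, a minimal base-R sketch of this inner computation (for exposition only; the function names are ours and the numerical safeguards are deliberately simple) is:

## Evaluate log EL(theta) by a damped Newton iteration on lambda.
## G is the n x m matrix whose ith row is g(z_i, theta)^T.
log_EL <- function(G, maxit = 50, tol = 1e-10) {
  n <- nrow(G); m <- ncol(G)
  lambda <- rep(0, m)
  for (it in 1:maxit) {
    u <- as.vector(1 + G %*% lambda)          # must remain positive
    score <- colSums(G / u)                   # gradient of sum(log(u))
    hess  <- -crossprod(G / u)                # Hessian of sum(log(u))
    step  <- tryCatch(solve(hess, -score), error = function(e) rep(0, m))
    s <- 1
    repeat {                                  # step-halving keeps 1 + lambda' g_i > 0
      lam_new <- lambda + s * step
      if (all(1 + G %*% lam_new > 1e-12)) break
      s <- s / 2
      if (s < 1e-10) { lam_new <- lambda; break }
    }
    converged <- max(abs(lam_new - lambda)) < tol
    lambda <- lam_new
    if (converged) break
  }
  u <- as.vector(1 + G %*% lambda)
  list(logEL = -sum(log(u)), lambda = lambda, weights = 1 / (n * u))
}

The weights returned are the p_i in (2), so that sum(weights) equals 1 and colSums(weights * G) is (numerically) zero at convergence.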
For the presentation of the theoretical results, we also need a parametrization of B that removes the rank constraint. Since rank(B) ≤ r, we can write B = DE^T for two matrices D ∈ R^{p×r} and E ∈ R^{q×r}. We define the Jacobian matrix
Δ = ∂θ / ∂(α_1, ..., α_l, vec^T(D), vec^T(E)),
evaluated at the true parameter values. Note that Δ has the form
Δ = [ I_{l×l}  0
      0        Δ_1 ],
where
Δ_1 = ∂β / ∂(vec^T(D), vec^T(E)).
In the following theorem, we consider the asymptotic distribution of the (reduced rank) MELE θ̂. We use the subscript 0 to denote the true parameters; in particular, we denote the rank of B_0 by r_0.

Theorem 1. Under assumptions (A1)–(A6) stated in the supplementary material (Appendix A), we have
√n(θ̂ − θ_0) → N(0, Δ(Δ^T A Δ)^− Δ^T) in distribution,
where A = V_12^T V_11^{-1} V_12 and (·)^− denotes the Moore–Penrose pseudo-inverse. Here both A and Δ are evaluated at the true parameter values.
Now we consider the problem of testing the rank r in the model. Using empirical likelihood, it is conceptually straightforward to test the rank based on the empirical likelihood ratio. More specifically, we have the following result.

Theorem 2. Assume (A1)–(A6) in the supplementary material (Appendix A). Let EL(r) = max_{θ: rank(B)≤r} EL(θ). If r ≥ r_0, we have
−2 log EL(r) → χ²_{m−l−r(p+q−r)} in distribution.
On the other hand, if r < r_0 and in addition (A7) holds, then for any C > 0, P(−2 log EL(r) > C) → 1.

3. Mean and quantile regression
In this section, we specialize the previous general results to mean and quantile regression; for these cases the asymptotic variance can be written more explicitly. We do not try to verify that (A1)–(A7) hold for these special cases, which can be done as in Lopez et al. (2009) for quantile regression, for example. For both mean and quantile regression, we assume α = (α_1, ..., α_q)^T are the intercept parameters and B contains the coefficients associated with x = (x_1, ..., x_p)^T. In this section it turns out to be more convenient to reorder the components of θ such that θ = (α_1, β_1^T, ..., α_q, β_q^T)^T, so that the expressions below can be simplified without treating the intercept explicitly. Still let
Δ = ∂θ / ∂(α_1, ..., α_q, vec^T(D), vec^T(E)),
with V as defined in the previous section. We still have that the asymptotic covariance matrix of θ̂ is Δ(Δ^T A Δ)^− Δ^T.

3.1. Mean regression
First consider the mean regression model (1). We assume the errors e are independent of the covariates x. Let Σ_e be the covariance matrix of e. Without using empirical likelihood, one can define the following reduced rank estimator:
θ̂_1 = argmin_{rank(B)≤r} ‖Y − 1α^T − XB‖²,
where for any matrix A, ‖A‖² = tr(AA^T) is the squared Frobenius norm, Y = (y_1, ..., y_n)^T, X = (x_1, ..., x_n)^T, and 1 is the vector of ones whose dimension can be determined from the context. To simplify notation, we define w_i = (1, x_i^T)^T, W = (w_1, ..., w_n)^T and B̃ = (α, B^T)^T, so that Y − 1α^T − XB can be written as Y − WB̃. Note that, due to the reordering of the components of θ, we have θ = vec(B̃), which will simplify some of the expressions below. We use the estimating functions
g(x, y, θ) = (y − B̃^T w) ⊗ w,
where ⊗ denotes the Kronecker product. As before, we denote the MELE by θ̂. The following proposition explicitly presents its asymptotic covariance matrix and compares it with that of θ̂_1. It shows that θ̂ is more efficient than θ̂_1, and that the two are equally efficient if Σ_e is the identity matrix. Let Σ_w be the limit of W^T W/n, which is assumed to be positive definite.

Proposition 1. √n(θ̂ − θ_0) and √n(θ̂_1 − θ_0) both have asymptotically Gaussian distributions with mean zero and covariance matrices given by
Σ = Δ(Δ^T (Σ_e^{-1} ⊗ Σ_w) Δ)^− Δ^T
and
Σ_1 = Δ(Δ^T (I ⊗ Σ_w) Δ)^− Δ^T (Σ_e ⊗ Σ_w) Δ(Δ^T (I ⊗ Σ_w) Δ)^− Δ^T,
respectively. We also have Σ ≤ Σ_1, and Σ = Σ_1 if Σ_e = I, where for two matrices A, B, A ≤ B (or B ≥ A) means that B − A is nonnegative definite.
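As a numerical illustration of Proposition 1 (not part of the theoretical development), the two covariance matrices can be computed for given Σ_e, Σ_w and Δ and compared directly; here ginv() from the MASS package plays the role of the Moore–Penrose pseudo-inverse:

library(MASS)   # ginv() gives the Moore-Penrose pseudo-inverse

asym_cov_mean <- function(Sig_e, Sig_w, Delta) {
  q <- nrow(Sig_e)
  Sigma  <- Delta %*% ginv(t(Delta) %*% kronecker(solve(Sig_e), Sig_w) %*% Delta) %*% t(Delta)
  H      <- Delta %*% ginv(t(Delta) %*% kronecker(diag(q), Sig_w) %*% Delta) %*% t(Delta)
  Sigma1 <- H %*% kronecker(Sig_e, Sig_w) %*% H
  list(Sigma = Sigma, Sigma1 = Sigma1)
}

For any admissible Δ, the eigenvalues of Sigma1 − Sigma computed this way should be (numerically) nonnegative, and the two matrices coincide when Sig_e is the identity.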
3.2. Quantile regression
Since quantile regression is usually based on minimizing the check loss function, an obvious reduced rank estimator in quantile regression without using empirical likelihood is
θ̂_1 = argmin_{rank(B)≤r} Σ_{i=1}^n Σ_{k=1}^q ρ_τ(y_ik − α_k − x_i^T β_k).   (4)
We note that here we are not using a concept of multivariate quantile such as that defined in Chaudhuri (1996) or Hallin et al. (2010). Rather, with responses y_1, ..., y_q, we are interested in the quantiles of each y_j "marginally". That is, we are interested in the τ-quantile of y_1, the τ-quantile of y_2, and so on. This can of course be regarded as q entirely separate quantile regression problems. The reduced-rank approach allows us to reduce the number of unknown parameters for more parsimonious and more efficient estimation.
The following proposition compares the efficiency of θ̂_1 with that of θ̂, where the latter is the MELE with estimating functions g(z, θ) = (τ1 − I{y ≤ B̃^T w}) ⊗ w, and I{y ≤ B̃^T w} = (I{y_1 ≤ α_1 + β_1^T x}, ..., I{y_q ≤ α_q + β_q^T x})^T. Let e = (e_1, ..., e_q)^T = τ1 − I{y ≤ B̃_0^T w} and Σ_e = E[ee^T | x]. Note that e has mean zero. Define the q × q diagonal matrix Γ = diag{f_1(0 | x), ..., f_q(0 | x)}, where f_j(0 | x) is the conditional density of y_j − α_{0j} − x^T β_{0j} at 0 given x.

Proposition 2. √n(θ̂ − θ) and √n(θ̂_1 − θ) are both asymptotically Gaussian with mean zero and covariance matrices given by
Σ = Δ(Δ^T E[Γ ⊗ (ww^T)] (E[Σ_e ⊗ (ww^T)])^{-1} E[Γ ⊗ (ww^T)] Δ)^− Δ^T
and
Σ_1 = Δ(Δ^T E[Γ ⊗ (ww^T)] Δ)^− Δ^T E[Σ_e ⊗ (ww^T)] Δ(Δ^T E[Γ ⊗ (ww^T)] Δ)^− Δ^T,
respectively. We also have Σ ≤ Σ_1.
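For reference, the estimating functions appearing in Proposition 2 can be assembled as follows (a sketch in our notation, with X the n × p covariate matrix and Y the n × q response matrix; it is an illustration rather than the implementation used for the numerical results in Section 4):

## Rows of the returned matrix are (tau*1 - I{y_i <= alpha + B^T x_i}) (x) w_i, with w_i = (1, x_i^T)^T.
G_quantile <- function(alpha, B, X, Y, tau) {
  W <- cbind(1, X)
  psi <- tau - (Y <= sweep(X %*% B, 2, alpha, "+"))   # n x q matrix of tau minus indicators
  G <- NULL
  for (k in 1:ncol(Y)) G <- cbind(G, psi[, k] * W)    # one Kronecker block per response
  G
}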
4. Numerical examples
As defined previously, when B, which results from rearranging the components of β into a p × q matrix, is assumed to be of rank r, the MELE is naturally defined as
θ̂ = argmax_{rank(B)≤r} EL(θ).
Directly solving this (nonconvex) constrained optimization problem seems very hard. To get around the rank constraint, we write B = DE^T as in our theory and iteratively solve
E ← argmax_E EL(α, D, E),
D ← argmax_D EL(α, D, E),
α ← argmax_α EL(α, D, E),
where we regard g(z_i, θ) as a function of (α, D, E) and
EL(α, D, E) = sup{ Π_{i=1}^n n p_i | p_i ≥ 0, Σ_i p_i = 1, Σ_i p_i g(z_i, α, D, E) = 0 }.
For example, for least squares regression, the estimating functions are (y − α − (DE^T)^T x) ⊗ w. Vectorization is used in our implementation, using that B^T x = ED^T x = (E ⊗ x^T)vec(D) = (x^T D ⊗ I)vec(E). Thus the three optimization problems take the same form as a standard empirical likelihood computation.
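As a concrete sketch of this scheme in the least squares case (our simplified illustration, reusing log_EL() from Section 2 and performing each block update with optim(); it is not meant as an efficient implementation), one cycle of the alternating maximization can be written as:

## X is n x p, Y is n x q, alpha has length q, D is p x r, E is q x r.
G_ls <- function(alpha, D, E, X, Y) {
  W <- cbind(1, X)                                    # w_i = (1, x_i^T)^T
  res <- Y - sweep(X %*% D %*% t(E), 2, alpha, "+")   # y_i - alpha - (D E^T)^T x_i
  G <- NULL
  for (k in 1:ncol(Y)) G <- cbind(G, res[, k] * W)    # (y - alpha - B^T x) (x) w, block k
  G
}

one_cycle <- function(alpha, D, E, X, Y) {
  p <- ncol(X); q <- ncol(Y); r <- ncol(D)
  neg_logEL <- function(a, Dm, Em) -log_EL(G_ls(a, Dm, Em, X, Y))$logEL
  E <- matrix(optim(c(E), function(e) neg_logEL(alpha, D, matrix(e, q, r)),
                    method = "BFGS")$par, q, r)
  D <- matrix(optim(c(D), function(d) neg_logEL(alpha, matrix(d, p, r), E),
                    method = "BFGS")$par, p, r)
  alpha <- optim(alpha, function(a) neg_logEL(a, D, E), method = "BFGS")$par
  list(alpha = alpha, D = D, E = E)                   # repeat cycles until log EL stabilizes
}

For non-smooth estimating functions, such as the quantile case, a derivative-free optimizer (e.g., method = "Nelder-Mead") is the safer choice for the block updates.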
Obviously, the iterative algorithm yields monotonically non-decreasing values of EL(θ). However, D and E are not identified in the reduced rank regression model, although DE^T is, which raises some concern about the convergence of the algorithm. Empirically, we did not encounter any convergence issues in our numerical studies, although it seems hard to give a general theory that guarantees convergence. In practice, similar iterative schemes are widely used in reduced-rank regression, as in Chen and Huang (2012) and Bunea et al. (2012), both of which focus on the mean regression problem. Chen and Huang (2012) did not discuss the convergence issue, while Bunea et al. (2012) showed that their algorithm converges to a stationary point.

4.1. Simulations
We use two simulation examples to illustrate both mean regression and quantile regression.

Example 1. The data are generated from model (1) with p = 5, q = 4, r = 2. The intercept is α = (1, ..., 1)^T and the true coefficient matrix is B_0 = D_0 E_0^T, where the entries of D_0 are independently generated from N(0, 2²) and E_0 is a q × r matrix randomly generated from the uniform distribution on the set {E : E^T E = I_{2×2}}. We generate the covariates from a multivariate Gaussian distribution with Cov(X_ij, X_ij′) = ρ^{|j−j′|}, 1 ≤ j, j′ ≤ p. The q-dimensional error vector follows either a multivariate normal distribution or a multivariate t distribution with 3 degrees of freedom, with correlation matrix Cov(e_ij, e_ij′) = ρ_e^{|j−j′|}, 1 ≤ j, j′ ≤ q, and scale parameter σ = 1 or 2. The multivariate normal and multivariate t distributions are generated using the mvtnorm package in R. We also consider χ²_2 errors, which are generated by first generating a multivariate normal vector as before and then applying the inverse cdf method to obtain correlated χ²_2 errors. We set n = 50, 100, ρ = 0.25, ρ_e = 0, 0.25, 0.5, and σ = 1, 2.
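For the normal-error case, the data generation just described can be sketched as follows (our reading of the description; the Student's t and χ²_2 variants are analogous, and drawing E_0 via the QR decomposition of a Gaussian matrix is one standard way to sample from the uniform distribution on {E : E^T E = I}):

library(mvtnorm)   # for rmvnorm()

gen_example1 <- function(n, p = 5, q = 4, r = 2, rho = 0.25, rho_e = 0.25, sigma = 1) {
  alpha <- rep(1, q)
  D0 <- matrix(rnorm(p * r, sd = 2), p, r)            # entries of D0 from N(0, 2^2)
  E0 <- qr.Q(qr(matrix(rnorm(q * r), q, r)))          # random q x r matrix with E0'E0 = I
  B0 <- D0 %*% t(E0)
  Sig_x <- rho^abs(outer(1:p, 1:p, "-"))              # Cov(X_j, X_j') = rho^|j - j'|
  X <- rmvnorm(n, sigma = Sig_x)
  Sig_e <- rho_e^abs(outer(1:q, 1:q, "-"))            # error correlation rho_e^|j - j'|
  err <- sigma * rmvnorm(n, sigma = Sig_e)            # normal errors with scale sigma
  Y <- sweep(X %*% B0, 2, alpha, "+") + err
  list(X = X, Y = Y, B0 = B0, alpha = alpha)
}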
Example 2. Here we consider an example with heterogeneous errors. For a given τ, we use the generating model y_i = α + B^T x_i + (0.5 + 0.1|(B′)^T x_i|) σ e_i, where α and B are generated in the same way as in Example 1, B′ is a matrix independently generated by the same method as B, |(B′)^T x| denotes the component-wise absolute value, and the components of e are independent normal, t, or χ²_2 errors, shifted by a constant such that their τ-quantile is equal to zero. As in Example 1, we set n = 50, 100, ρ = 0.25 (correlation of the predictors), and σ = 1, 2.

First we consider least squares regression and median regression (quantile regression with τ = 0.5) applied to both simulation examples (setting τ = 0.5 in Example 2), assuming knowledge of the true rank r = 2. Tables 1–6 show the mean squared errors (MSE) ‖B̂ − B_0‖² for six methods: least squares regression without rank constraint (LS), reduced rank least squares regression (RR.LS), median regression without rank constraint (RQ), reduced rank median regression (RR.RQ), empirical likelihood based reduced rank least squares regression (RREL.LS), and empirical likelihood based reduced rank median regression (RREL.RQ). We see that the reduced rank methods always improve on the corresponding non-reduced-rank methods, and the empirical likelihood based reduced rank methods are always better than their non-empirical-likelihood counterparts.

Next, using both examples with τ = 0.75, we further report the MSE for the quantile regression methods in Tables 7 and 8. We see that the empirical likelihood based reduced rank method is better than the check-loss based reduced rank method.

Finally, we consider the problem of testing the rank r. In the simulation, we directly verify the accuracy of approximating the log empirical likelihood by the chi-squared distribution, as stated in Theorem 2. To save space, we only use Example 1 with σ = 1 and ρ_e = 0.25 for illustration. The qq-plots of the fitted values of −2 log EL(r) versus the quantiles of the chi-squared distribution are shown in Figs. 1–3 for the three error distributions, respectively. Results for both RREL.LS and RREL.RQ are shown for r = 0, 1, 2, 3. When r = 0, 1, the model is misspecified, and the fitted values of −2 log EL(r) are very large compared to what would be predicted by a chi-squared distribution, as proved in the second part of Theorem 2. When r = 2, 3, the model is correctly specified, and based on the qq-plots the chi-squared distribution is a good approximation to the empirical distribution of −2 log EL(r).
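Once −2 log EL(r) has been computed at the constrained fit, the χ² calibration of Theorem 2 can be applied directly; a small sketch (our illustration) for the mean and quantile regression setups used here, where m = q(p + 1) and l = q, is:

## Rank test based on Theorem 2: large values of -2 log EL(r) relative to the
## chi-squared reference with m - l - r(p + q - r) degrees of freedom reject rank(B) <= r.
rank_test <- function(logEL_r, p, q, r) {
  m <- q * (p + 1); l <- q
  df <- m - l - r * (p + q - r)
  stat <- -2 * logEL_r
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}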
4.2. Data analysis
We analyze two datasets, to illustrate reduced-rank mean regression and reduced-rank quantile regression, respectively. First, we apply the proposed method to the SHARe Framingham Heart Study (cf. Cupples et al., 2007). In our data, various traits and medical indices were recorded for 1804 patients at multiple time points. To avoid correlations for a single patient across time points, only the measurements at the first time point for each patient are used in this analysis. The following 9 medical measurements are used as response variables: total cholesterol level, diastolic blood pressure, P-R interval in ECG, QRS interval in ECG, Q-T interval in ECG, glucose level, systolic blood pressure, triglyceride level, and ventricular rate per minute in ECG, together with 7 predictors: weight, height, beer consumption per week, number of cigarettes per day, hours of heavy activity per day, hours of slight activity per day, and sleeping hours per day.
Table 1
Mean squared errors for simulation Example 1 with normal errors. The numbers in the brackets are the standard errors of the MSE based on 100 repetitions.

(n, ρ_e, σ)     LS             RR.LS          RQ             RR.RQ          RREL.LS        RREL.RQ
(50, 0, 1)      0.499 (0.207)  0.436 (0.417)  0.514 (0.153)  0.476 (0.160)  0.415 (0.156)  0.449 (0.156)
(50, 0, 2)      1.999 (0.830)  1.465 (0.752)  1.957 (0.615)  1.551 (0.656)  1.406 (0.626)  1.460 (0.715)
(50, 0.25, 1)   0.500 (0.210)  0.424 (0.414)  0.407 (0.146)  0.363 (0.151)  0.286 (0.151)  0.348 (0.157)
(50, 0.25, 2)   2.000 (0.840)  1.470 (0.751)  1.630 (0.586)  1.525 (0.604)  1.195 (0.619)  1.436 (0.661)
(50, 0.5, 1)    0.502 (0.225)  0.476 (0.419)  0.411 (0.160)  0.371 (0.171)  0.267 (0.151)  0.368 (0.190)
(50, 0.5, 2)    2.009 (0.903)  1.499 (0.803)  1.645 (0.643)  1.598 (0.766)  1.116 (0.607)  1.487 (0.717)
(100, 0, 1)     0.240 (0.088)  0.196 (0.092)  0.245 (0.077)  0.196 (0.078)  0.179 (0.056)  0.187 (0.072)
(100, 0, 2)     0.961 (0.354)  0.677 (0.284)  0.981 (0.308)  0.767 (0.345)  0.625 (0.227)  0.759 (0.320)
(100, 0.25, 1)  0.245 (0.097)  0.203 (0.097)  0.204 (0.077)  0.186 (0.073)  0.130 (0.059)  0.157 (0.069)
(100, 0.25, 2)  0.981 (0.389)  0.717 (0.333)  0.816 (0.308)  0.769 (0.325)  0.529 (0.244)  0.647 (0.288)
(100, 0.5, 1)   0.250 (0.111)  0.209 (0.106)  0.211 (0.079)  0.194 (0.082)  0.124 (0.065)  0.147 (0.060)
(100, 0.5, 2)   1.001 (0.444)  0.755 (0.417)  0.840 (0.316)  0.807 (0.362)  0.507 (0.269)  0.611 (0.304)
Table 2
Mean squared errors for simulation Example 1 with Student's t errors.

(n, ρ_e, σ)     LS             RR.LS          RQ             RR.RQ          RREL.LS        RREL.RQ
(50, 0, 1)      1.358 (1.172)  1.100 (1.181)  1.080 (0.539)  0.687 (0.409)  0.683 (0.985)  0.536 (0.321)
(50, 0, 2)      5.432 (4.688)  4.160 (4.639)  4.319 (2.157)  2.922 (1.720)  3.086 (5.508)  2.379 (1.726)
(50, 0.25, 1)   1.385 (1.441)  1.127 (1.495)  1.067 (0.515)  0.692 (0.403)  0.716 (1.342)  0.566 (0.394)
(50, 0.25, 2)   5.540 (5.764)  4.251 (5.742)  4.267 (2.061)  2.876 (1.627)  3.009 (5.814)  2.271 (1.619)
(50, 0.5, 1)    1.423 (1.814)  1.164 (1.894)  1.070 (0.535)  0.716 (0.395)  0.709 (1.569)  0.533 (0.356)
(50, 0.5, 2)    5.693 (7.257)  4.458 (7.202)  4.279 (2.141)  3.032 (1.727)  3.713 (1.342)  2.072 (1.315)
(100, 0, 1)     0.703 (0.698)  0.491 (0.460)  0.452 (0.165)  0.289 (0.121)  0.252 (0.169)  0.208 (0.091)
(100, 0, 2)     2.812 (2.796)  2.269 (3.535)  1.811 (0.665)  1.181 (0.510)  1.667 (1.100)  0.835 (0.377)
(100, 0.25, 1)  0.705 (0.762)  0.492 (0.490)  0.461 (0.174)  0.295 (0.123)  0.246 (0.169)  0.198 (0.095)
(100, 0.25, 2)  2.823 (3.050)  2.247 (3.637)  1.844 (0.699)  1.196 (0.528)  1.633 (4.507)  0.792 (0.346)
(100, 0.5, 1)   0.712 (0.86)   0.502 (0.541)  0.442 (0.179)  0.287 (0.140)  0.2294 (0.164) 0.1981 (0.115)
(100, 0.5, 2)   2.851 (3.456)  2.192 (3.818)  1.770 (0.717)  1.174 (0.630)  1.324 (3.705)  0.805 (0.479)
All the variables are standardized to have mean zero and variance one. For illustration, we consider mean regression on this dataset. To investigate the difference between the least squares based and the empirical likelihood based reduced-rank regressions, we regard the whole dataset of 1804 patients as the population and repeatedly draw random samples of size 300 from it, 100 times.
Table 3
Mean squared errors for simulation Example 1 with χ² errors.

(n, ρ_e, σ)     LS             RR.LS          RQ             RR.RQ          RREL.LS        RREL.RQ
(50, 0, 1)      1.928 (0.955)  1.519 (0.926)  1.649 (0.618)  1.144 (0.531)  1.254 (0.672)  1.116 (0.434)
(50, 0, 2)      7.712 (3.823)  6.147 (3.709)  6.595 (2.472)  5.022 (2.328)  4.813 (2.633)  4.343 (1.756)
(50, 0.25, 1)   1.922 (1.027)  1.521 (0.980)  1.689 (0.728)  1.134 (0.543)  1.103 (0.658)  1.000 (0.484)
(50, 0.25, 2)   7.686 (4.108)  6.045 (3.791)  6.755 (2.915)  4.988 (2.413)  4.941 (2.664)  4.490 (2.361)
(50, 0.5, 1)    1.918 (1.151)  1.529 (1.056)  1.778 (0.880)  1.144 (0.572)  1.202 (0.585)  1.022 (0.488)
(50, 0.5, 2)    7.672 (4.606)  6.091 (4.210)  7.110 (3.521)  5.227 (3.156)  4.923 (3.437)  4.637 (2.968)
(100, 0, 1)     0.990 (0.375)  0.698 (0.304)  0.871 (0.302)  0.575 (0.225)  0.542 (0.325)  0.442 (0.162)
(100, 0, 2)     3.960 (1.500)  2.818 (1.283)  3.484 (1.211)  2.426 (0.939)  2.510 (1.199)  2.270 (0.834)
(100, 0.25, 1)  1.009 (0.399)  0.728 (0.336)  0.878 (0.351)  0.622 (0.275)  0.553 (0.384)  0.409 (0.177)
(100, 0.25, 2)  4.034 (1.599)  2.928 (1.403)  3.514 (1.405)  2.631 (1.197)  2.513 (1.237)  2.442 (0.873)
(100, 0.5, 1)   1.035 (0.454)  0.767 (0.405)  0.881 (0.399)  0.617 (0.276)  0.625 (0.369)  0.505 (0.183)
(100, 0.5, 2)   4.142 (1.818)  3.116 (1.684)  3.524 (1.596)  2.646 (1.334)  2.710 (1.470)  2.485 (1.030)
Table 4
Mean squared errors for simulation Example 2 with normal errors.

(n, σ)     LS             RR.LS          RQ             RR.RQ          RREL.LS        RREL.RQ
(50, 1)    0.335 (0.142)  0.274 (0.335)  0.356 (0.112)  0.284 (0.134)  0.232 (0.082)  0.277 (0.098)
(50, 2)    1.342 (0.571)  1.029 (0.540)  1.356 (0.450)  1.118 (0.545)  1.028 (0.325)  1.030 (0.435)
(100, 1)   0.159 (0.085)  0.121 (0.099)  0.153 (0.050)  0.127 (0.044)  0.110 (0.043)  0.114 (0.047)
(100, 2)   0.637 (0.341)  0.455 (0.273)  0.655 (0.202)  0.489 (0.180)  0.428 (0.180)  0.435 (0.195)
Table 5
Mean squared errors for simulation Example 2 with Student's t errors.

(n, σ)     LS             RR.LS          RQ             RR.RQ          RREL.LS        RREL.RQ
(50, 1)    1.006 (0.831)  0.784 (0.596)  0.616 (0.321)  0.390 (0.199)  0.439 (0.301)  0.317 (0.192)
(50, 2)    4.023 (3.325)  3.051 (3.337)  2.465 (1.284)  1.667 (1.014)  1.846 (1.380)  1.342 (0.927)
(100, 1)   0.469 (0.348)  0.331 (0.234)  0.252 (0.105)  0.172 (0.095)  0.175 (0.107)  0.116 (0.065)
(100, 2)   1.876 (1.394)  1.306 (1.118)  1.001 (0.423)  0.680 (0.370)  0.720 (0.427)  0.456 (0.243)
The two reduced-rank regression procedures are fitted on each of the 100 datasets. The average estimated coefficient matrices B̂ are reported in Tables 9 and 10, together with the standard errors over the 100 repetitions. Qualitatively, the two estimation procedures produce similar coefficient values. However, the standard errors for the empirical likelihood based reduced rank regression are smaller, indicating better stability of the procedure. It can also be seen that all the coefficients for the activity levels are insignificant, as are the coefficients for the number of cigarettes. Beer consumption is marginally significant at level 0.1 for only one response, diastolic blood pressure.
Next, we consider a portfolio return dataset. In asset pricing and portfolio management the Fama–French factor model is frequently adopted to describe stock returns.
Table 6
Mean squared errors for simulation Example 2 with χ² errors.

(n, σ)     LS             RR.LS          RQ             RR.RQ          RREL.LS        RREL.RQ
(50, 1)    1.403 (0.989)  1.014 (0.735)  1.006 (0.423)  0.656 (0.305)  0.757 (0.792)  0.542 (0.247)
(50, 2)    5.611 (3.957)  4.085 (3.342)  4.023 (1.692)  2.798 (1.411)  3.024 (2.913)  2.169 (0.993)
(100, 1)   0.623 (0.256)  0.487 (0.277)  0.411 (0.186)  0.352 (0.139)  0.372 (0.142)  0.240 (0.107)
(100, 2)   2.495 (1.027)  1.812 (0.956)  1.747 (0.745)  1.431 (0.551)  1.522 (0.625)  1.015 (0.430)
Table 7
Mean squared errors for simulation Example 1 with τ = 0.75.

(n, ρ_e, σ)     RQ             RR.RQ           RREL.RQ
Normal error
(50, 0, 1)      0.948 (0.331)  0.614 (0.252)   0.555 (0.306)
(50, 0, 2)      3.794 (1.325)  2.716 (1.250)   2.234 (1.089)
(50, 0.25, 1)   0.936 (0.337)  0.606 (0.254)   0.571 (0.318)
(50, 0.25, 2)   3.744 (1.349)  2.683 (1.245)   2.315 (1.200)
(50, 0.5, 1)    0.962 (0.386)  0.630 (0.279)   0.608 (0.345)
(50, 0.5, 2)    3.851 (1.550)  2.807 (1.369)   2.517 (1.410)
(100, 0, 1)     0.421 (0.130)  0.263 (0.104)   0.234 (0.105)
(100, 0, 2)     1.684 (0.522)  1.093 (0.499)   0.960 (0.454)
(100, 0.25, 1)  0.428 (0.139)  0.276 (0.113)   0.245 (0.111)
(100, 0.25, 2)  1.713 (0.559)  1.124 (0.466)   1.015 (0.454)
(100, 0.5, 1)   0.433 (0.172)  0.295 (0.142)   0.251 (0.133)
(100, 0.5, 2)   1.734 (0.689)  1.208 (0.593)   1.012 (0.537)
Student's t error
(50, 0, 1)      1.585 (0.758)  1.014 (0.612)   0.847 (0.455)
(50, 0, 2)      6.342 (3.035)  4.549 (2.735)   3.750 (2.026)
(50, 0.25, 1)   1.570 (0.776)  1.035 (0.653)   0.884 (0.609)
(50, 0.25, 2)   6.281 (3.107)  4.625 (2.727)   3.830 (2.397)
(50, 0.5, 1)    1.549 (0.809)  1.029 (0.646)   0.844 (0.498)
(50, 0.5, 2)    6.195 (3.237)  4.611 (2.847)   3.911 (2.690)
(100, 0, 1)     0.689 (0.277)  0.451 (0.223)   0.368 (0.144)
(100, 0, 2)     2.758 (1.111)  1.880 (0.954)   1.500 (0.639)
(100, 0.25, 1)  0.664 (0.301)  0.438 (0.265)   0.382 (0.222)
(100, 0.25, 2)  2.658 (1.207)  1.764 (1.039)   1.584 (0.819)
(100, 0.5, 1)   0.675 (0.319)  0.444 (0.258)   0.385 (0.225)
(100, 0.5, 2)   2.702 (1.272)  1.841 (1.097)   1.517 (0.915)
χ² error
(50, 0, 1)      2.495 (1.018)  1.826 (0.8214)  1.589 (0.748)
(50, 0, 2)      9.981 (4.073)  8.715 (4.405)   7.091 (2.803)
(50, 0.25, 1)   2.454 (1.114)  1.819 (0.9557)  1.417 (0.752)
(50, 0.25, 2)   9.815 (4.455)  7.283 (4.759)   6.146 (4.012)
(50, 0.5, 1)    2.420 (1.034)  1.889 (1.017)   1.498 (0.760)
(50, 0.5, 2)    9.681 (4.138)  7.418 (4.037)   6.131 (3.459)
(100, 0, 1)     1.226 (0.364)  0.824 (0.349)   0.669 (0.290)
(100, 0, 2)     4.903 (1.459)  3.736 (1.604)   2.306 (1.369)
(100, 0.25, 1)  1.259 (0.408)  0.854 (0.405)   0.702 (0.360)
(100, 0.25, 2)  5.035 (1.635)  3.915 (1.670)   3.114 (1.447)
(100, 0.5, 1)   1.281 (0.456)  0.925 (0.427)   0.832 (0.348)
(100, 0.5, 2)   5.126 (1.827)  4.106 (1.746)   3.575 (1.299)
The five factors proposed in Fama and French (2015) use the following five predictors: company Size, company Price-to-Book Ratio, Market Risk, profitability (stocks with a high operating profitability perform better), and an investment factor (stocks of companies with high total asset growth have below-average returns). The data we use here can be downloaded from http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, with the monthly returns of 10 Industry Portfolios as the responses (q = 10), covering the period from July 1964 to December 2015 (n = 630). For this dataset, we fit the quantile regression model at quantile levels 0.25, 0.5 and 0.75. The standard quantile regression (RQ), reduced rank quantile regression (RR.RQ) and empirical likelihood based reduced rank quantile regression (RREL.RQ) are applied at the three quantile levels. We randomly sample half of the data as training and use the rest as testing.
Table 8
Mean squared errors for simulation Example 2 with τ = 0.75.

(n, σ)     RQ             RR.RQ          RREL.RQ
Normal error
(50, 1)    0.530 (0.194)  0.335 (0.147)  0.326 (0.172)
(50, 2)    2.122 (0.778)  1.367 (0.597)  1.294 (0.605)
(100, 1)   0.229 (0.099)  0.144 (0.074)  0.128 (0.062)
(100, 2)   0.915 (0.398)  0.577 (0.308)  0.533 (0.233)
Student's t error
(50, 1)    0.964 (0.621)  0.617 (0.482)  0.539 (0.361)
(50, 2)    3.858 (2.485)  2.658 (2.275)  2.236 (1.279)
(100, 1)   0.362 (0.158)  0.245 (0.122)  0.225 (0.106)
(100, 2)   1.450 (0.633)  1.010 (0.507)  0.901 (0.480)
χ² error
(50, 1)    1.570 (0.662)  1.028 (0.511)  0.778 (0.341)
(50, 2)    6.281 (2.650)  4.935 (2.424)  3.469 (1.725)
(100, 1)   0.747 (0.333)  0.490 (0.262)  0.392 (0.201)
(100, 2)   2.990 (1.332)  2.098 (1.098)  1.663 (0.875)
Fig. 1. QQ plot for the values of −2logEL(r ) for r = 0, 1, 2, 3 for Example 1 with normal error. The first row is the results for RREL.LS and the second row is the results for RREL.RQ.
The prediction error on the test set is calculated as (1/n) Σ_{i=1}^n Σ_{k=1}^q ρ_τ(y_ik − α̂_k − x_i^T β̂_k), with α̂_k and β̂_k the estimators obtained from the training set. The procedure is repeated 100 times. The average prediction errors are reported in Table 11. The proposed empirical likelihood based reduced rank method achieves the smallest prediction errors in all cases.

5. Conclusion and discussions
In this paper, we proposed adopting empirical likelihood to optimally exploit the fact that the number of estimating equations is larger than the number of parameters in reduced rank regression. A general framework is investigated theoretically, with concrete examples given by least squares regression and quantile regression. The direct use of empirical likelihood for testing the rank of the coefficient matrix is of particular interest; similar tests are hard to construct based on non-empirical-likelihood methods. The present work can be extended in several directions. Empirical likelihood has been extended to incorporate penalized variable selection, which is particularly useful when the dimension of the problem is high (Hjort et al., 2009; Tang and Leng, 2010; Leng and Tang, 2012). Another direction is to investigate semiparametric versions of reduced rank regression using empirical likelihood (Zhu and Xue, 2006; Xue and Zhu, 2007; Lian and Ma, 2013).
Fig. 2. QQ plot for the values of −2logEL(r ) for r = 0, 1, 2, 3 for Example 1 with Student’s t error. The first row is the results for RREL.LS and the second row is the results for RREL.RQ.
Fig. 3. QQ plot for the values of −2logEL(r ) for r = 0, 1, 2, 3 for Example 1 with χ 2 error. The first row is the results for RREL.LS and the second row is the results for RREL.RQ.
Acknowledgments
We sincerely thank the editor, Professor Ana Colubi, the Associate Editor, and an anonymous reviewer for their insightful comments, which have led to a significantly improved manuscript. Sanying Feng's research was supported by the National Natural Science Foundation of China (11501522) and the Startup Research Fund of Zhengzhou University (1512315004). Zhu's work is supported by the National Natural Science Foundation of China (No. 11371168, No. 11271155), the Science and Technology Program of Jilin Educational Department during the "13th Five-Year" Plan Period, and the Cultivation Plan for Excellent Young Scholar Candidates of Jilin University.
Table 9 Average coefficient values with standard errors (in brackets) for least squares reduced-rank regression. Weight
Height
Beer
Cholesterol
0.264 (0.073)
−0.195
Diastolic
0.455 (0.064)
−0.103
ECG.PR
0.188 (0.075)
ECG.QRS
0.014 (0.067)
0.089 (0.074)
−0.024
0.085 (0.068)
−0.036
ECG.QT
−0.074 (0.075)
0.046 (0.041)
(0.092)
Cigarette 0.039 (0.052)
−0.033
(0.068)
0.077 (0.041)
0.071 (0.095)
0.031 (0.042)
−0.009
Glucose
0.326 (0.090)
−0.093
Systolic
0.404 (0.072)
−0.139
Triglyceride
0.431 (0.072)
−0.151
Ventricular.rate
0.058 (0.092)
−0.202
(0.048) (0.051)
(0.065) (0.072) (0.062) (0.119)
(0.051) (0.037)
−0.014 (0.058)
−0.037 (0.052)
0.051 (0.032)
−0.005
0.055 (0.047)
−0.035
(0.052) (0.050)
Heavy.act
Slight.act
Sleeping
−0.020
−0.028
−0.007
(0.048)
(0.047)
0.011 (0.042)
0.018 (0.047)
−0.004
0.018 (0.039)
0.015 (0.050)
−0.002 −0.001
(0.048)
0.092 (0.075)
0.043 (0.060)
0.048 (0.053)
−0.002
−0.007
−0.009
−0.004
(0.039)
(0.043)
0.010 (0.045)
0.025 (0.053)
0.071 (0.048)
0.066 (0.076)
−0.023
0.008 (0.045)
0.022 (0.050)
−0.060
−0.039
(0.051)
(0.048)
−0.025
(0.032) (0.043) (0.031) (0.037) (0.031)
−0.003 (0.035)
−0.003 (0.041)
−0.020 (0.046)
(0.046)
0.011 (0.034)
Heavy.act
Slight.act
Sleeping
−0.029
−0.016
−0.002
(0.055)
Table 10 Average coefficient values with standard errors (in brackets) for empirical likelihood based reduced-rank mean regression. Weight
Height
Beer
0.257 (0.069)
−0.171
Diastolic
0.449 (0.058)
−0.092
ECG.PR
0.196 (0.051)
ECG.QRS
0.048 (0.053)
0.042 (0.052)
−0.016
0.064 (0.056)
−0.032
Cholesterol
ECG.QT
−0.102 (0.053)
0.032 (0.033)
(0.077)
Cigarette 0.047 (0.040)
−0.024
(0.058)
0.081 (0.033)
0.045 (0.046)
0.040 (0.026)
−0.007
Glucose
0.300 (0.057)
−0.059
Systolic
0.394 (0.061)
−0.126
Triglyceride
0.447 (0.064)
−0.131
Ventricular.rate
0.075 (0.071)
−0.170
(0.032) (0.027)
(0.050) (0.070)
(0.028)
−0.024 (0.028)
−0.005
0.056 (0.028)
−0.031
−0.011
(0.084)
(0.020)
−0.012
0.054 (0.027)
0.069 (0.042)
(0.064)
(0.043)
(0.033)
(0.036) (0.045)
(0.027)
(0.032)
−0.027
0.002 (0.039)
(0.031)
−0.002
−0.004
(0.026)
(0.021)
−0.011 (0.035)
−0.015
(0.032)
(0.021)
(0.038)
0.061 (0.049)
0.002 (0.025)
0.045 (0.033)
0.030 (0.029)
−0.009
−0.033
−0.005
(0.024)
(0.028)
−0.016
0.017 (0.044)
(0.035)
0.063 (0.052)
−0.055
0.027 (0.030)
−0.035
−0.047
(0.039) (0.039)
−0.004 (0.023)
−0.007 (0.024)
−0.005 (0.036)
−0.013
(0.045)
(0.034)
0.006 (0.033)
0.024 (0.021)
Table 11
Average prediction errors for the portfolio return data for the three methods with τ = 0.25, 0.5, 0.75.

τ       RQ      RR.RQ   RREL.RQ
0.25    8.247   8.043   8.033
0.5     10.279  10.139  10.053
0.75    8.498   8.432   8.391
Appendix A. Supplementary material
Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.csda.2016.04.012.

References
Anderson, T.W., 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Ann. Math. Statist. 29, 327–351.
Anderson, T.W., 1999. Asymptotic distribution of the reduced rank regression estimator under general conditions. Ann. Statist. 27 (4), 1141–1154.
Bunea, F., She, Y., Wegkamp, M.H., 2011. Optimal selection of reduced rank estimators of high-dimensional matrices. Ann. Statist. 39 (2), 1282–1309.
Bunea, F., She, Y., Wegkamp, M.H., 2012. Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Ann. Statist. 40 (5), 2359–2388.
Chaudhuri, P., 1996. On a geometric notion of quantiles for multivariate data. J. Amer. Statist. Assoc. 91 (434), 862–872.
Chen, K., Chan, K.S., Stenseth, N.C., 2012. Reduced rank stochastic regression with a sparse singular value decomposition. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 (2), 203–221.
Chen, K., Dong, H., Chan, K.S., 2013. Reduced rank regression via adaptive nuclear norm penalization. Biometrika 100 (4), 901–920.
Chen, L., Huang, J.Z., 2012. Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. J. Amer. Statist. Assoc. 107 (500), 1533–1545.
Cupples, L.A., Arruda, H.T., Benjamin, E.J., D'Agostino, R.B., Demissie, S., DeStefano, A.L., Dupuis, J., Falls, K.M., Fox, C.S., Gottlieb, D.J., 2007. The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med. Genet. 8 (Suppl. 1), S1.
Fama, E.F., French, K.R., 2015. A five-factor asset pricing model. J. Financ. Econ. 116 (1), 1–22.
Geweke, J., 1996. Bayesian reduced rank regression in econometrics. J. Econometrics 75 (1), 121–146.
Hallin, M., Paindaveine, D., Siman, M., 2010. Multivariate quantile and multiple-output regression quantiles: from L1 optimization to halfspace depth. Ann. Statist. 635–703.
Hjort, N.L., McKeague, I.W., Keilegom, I.V., 2009. Extending the scope of empirical likelihood. Ann. Statist. 37 (3), 1079–1111.
Leng, C., Tang, C.Y., 2012. Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 99 (3), 703–716.
Lian, H., Ma, S., 2013. Reduced-rank regression in sparse multivariate varying-coefficient models with high-dimensional covariates. arXiv:1309.6058.
Lopez, E.M.M., Keilegom, I.V., Veraverbeke, N., 2009. Empirical likelihood for non-smooth criterion functions. Scand. J. Statist. 36 (3), 413–432.
Newey, W.K., Smith, R.J., 2004. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72 (1), 219–255.
Owen, A.B., 1988. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 (2), 237–249.
Owen, A., 1990. Empirical likelihood ratio confidence regions. Ann. Statist. 18 (1), 90–120.
Qin, J., Lawless, J., 1994. Empirical likelihood and general estimating equations. Ann. Statist. 22 (1), 300–325.
Tang, C.Y., Leng, C., 2010. Penalized high-dimensional empirical likelihood. Biometrika 97 (4), 905–920.
Xue, L., Zhu, L., 2007. Empirical likelihood for a varying coefficient model with longitudinal data. J. Amer. Statist. Assoc. 102 (478), 642–654.
Zhu, L., Xue, L., 2006. Empirical likelihood confidence regions in a partially linear single-index model. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 (3), 549–570.