Computational Statistics and Data Analysis 53 (2009) 4221–4227
Bootstrapping least distance estimator in the multivariate regression model

Myoungshic Jhun, Inkyung Choi

Department of Statistics, Korea University, Seoul 136-701, Republic of Korea
Article history: Received 14 December 2008; received in revised form 2 April 2009; accepted 17 May 2009; available online 21 May 2009.
Abstract

The most popular estimation methods in multivariate linear regression are multivariate least squares estimation and multivariate least absolute estimation. Each applies its univariate counterpart separately to each of the p response variables. Although these methods are relatively easy to apply, they do not exploit the relationship between the response variables. This study considers the multivariate least distance estimator of Bai et al. (1990), which accounts for this relationship. We confirm its relative efficiency with respect to the multivariate least absolute estimator under the multivariate normal distribution and a contaminated distribution. However, asymptotic inference for the multivariate least distance estimator is shown to perform poorly in certain circumstances. We suggest a bootstrap method for inference on the regression parameters and confirm its viability using Monte Carlo studies.
1. Introduction

Consider a multivariate linear regression model with p response variables:

$$y_i = X_i^T \beta + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$

where $y_i$ is a p × 1 response vector, $X_i$ is a (qp) × p matrix representing q explanatory variables, β is a (qp) × 1 vector of unknown regression parameters, and the $\varepsilon_i$ are i.i.d. p × 1 error vectors with spatial median vector 0. To avoid confusion, we note that the regression model (1) is equivalent to the usual Gauss–Markov linear model (2), since we can define $X_i$ and β in model (1) as $X_i^T = I \otimes x_i^T$ and $\beta = \mathrm{Vec}(B)$:

$$y_i = B^T x_i + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (2)$$

where B is a q × p matrix of regression parameters and $x_i$ is a q × 1 explanatory vector. Since $\beta_{(k)} = (\beta_{1k}, \ldots, \beta_{qk})^T$ of $B = [\beta_{(1)} : \cdots : \beta_{(p)}]$ is the regression parameter vector of the kth response variable, we refer to $\beta_{jk}$, an element of the parameter vector β in model (1), as the regression coefficient of the jth explanatory variable for the kth response variable.

In a multivariate regression model, our objective is to estimate the set of regression coefficients of each response variable and make inferences about these coefficients. The most widely used estimation methods are multivariate least squares estimation and multivariate least absolute estimation. These methods apply the univariate least squares and least absolute estimation methods to each individual response variable and collect the results in the parameter matrix. However, both methods have the drawback that they do not employ the relationship between the response variables in the estimation process.
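To make the vectorized notation concrete, the following minimal numpy sketch (our own illustration; the variable names are not from the paper) constructs $X_i^T = I_p \otimes x_i^T$ and $\beta = \mathrm{Vec}(B)$ and checks that models (1) and (2) give the same mean response.

```python
import numpy as np

q, p = 2, 2                                   # explanatory and response dimensions
rng = np.random.default_rng(0)

B = np.array([[0.0, 0.0],                     # q x p coefficient matrix of model (2)
              [1.0, 2.0]])
x_i = np.array([1.0, rng.standard_normal()])  # q x 1 explanatory vector (intercept, slope)

X_i_T = np.kron(np.eye(p), x_i)               # X_i^T = I_p (x) x_i^T, a p x (qp) matrix
beta = B.flatten(order="F")                   # Vec(B): stack the columns of B

# Both parameterizations give the same mean response for observation i
assert np.allclose(X_i_T @ beta, B.T @ x_i)
```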
In the multivariate regression model, we often encounter cases where the correlation between the response variables is measurably high. For instance, systolic and diastolic blood pressures from a patient are positively correlated, and the correlation between the number of cavities in the upper jaw and in the lower jaw is expected to be high as well. Indeed, the very reason we consider a multivariate model is to incorporate the relationship between the response variables. Breiman and Friedman (1997) shrank the usual multivariate least squares estimator through canonical analysis to exploit this relationship. Chakraborty (1999) argued that the fundamental reason the multivariate least absolute estimator suffers from low efficiency is its lack of equivariance, and proposed an affine-equivariant estimator based on a data-driven transformation and re-transformation method. This study considers the least distance estimator of Bai et al. (1990). In the multivariate location set-up, the least distance estimator is equivalent to the spatial median studied by Haldane (1948) and Brown (1983), and it has been shown to be more efficient than the coordinate-wise median vector (see Bai et al., 1990). Uniqueness of the estimator was shown by Ducharme (1995) under reasonable assumptions.

In Section 2, we confirm that the efficiency of the least distance estimator is also higher than that of the least absolute estimator in the multivariate regression setting. In Section 3, we discuss inference based on the least distance estimator using its asymptotic distribution and identify some limitations in practical applications. We then propose bootstrap inference and compare the asymptotic and bootstrap methods by Monte Carlo simulation under various experimental settings. Section 4 concludes the paper.

2. Least distance estimator

2.1. Definition

In model (1), the least distance estimator (LDE) is defined as the minimizer (3) of the sum of distances between observed and predicted values:
$$\hat{\beta} = \arg\min_{\beta} Q(\beta) \qquad (3)$$

where $Q(\beta) = \sum_{i=1}^{n} \lVert y_i - X_i^T \beta \rVert$. No closed-form expression for the least distance estimator (LDE) is available, but the estimator is shown to be unique (Bai et al., 1990), so we obtain its numerical value using the Newton–Raphson optimization algorithm.
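The paper computes the LDE by Newton–Raphson; as a minimal sketch of the same objective (the helper name fit_lde and the use of scipy's BFGS routine are our substitutions, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def fit_lde(Y, X):
    """Least distance estimate for model (2): Y is n x p, X is n x q.

    Minimizes Q(beta) = sum_i ||y_i - B^T x_i|| over the vectorized B.
    A sketch: BFGS stands in for the authors' Newton-Raphson; the
    objective is nondifferentiable only where a residual is exactly 0,
    which is harmless in practice for continuous data.
    """
    n, p = Y.shape
    q = X.shape[1]

    def Q(beta):
        B = beta.reshape(q, p, order="F")      # undo Vec(B)
        return np.linalg.norm(Y - X @ B, axis=1).sum()

    beta0 = np.linalg.lstsq(X, Y, rcond=None)[0].flatten(order="F")  # LSE start
    return minimize(Q, beta0, method="BFGS").x.reshape(q, p, order="F")
```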
On the other hand, the multivariate least absolute estimator (LAE) is defined as the matrix $\hat{B} = [\hat{\beta}_{(1)} : \cdots : \hat{\beta}_{(p)}]$ whose kth column is the univariate least absolute estimator $\hat{\beta}_{(k)}$ of the kth response variable. The multivariate least absolute estimation process is straightforward, since it is identical to repeating the usual least absolute estimation process p times. In the same manner, the multivariate least squares estimator (LSE) can be obtained by replacing the sum-of-absolute-residuals loss function with the sum of squared residuals. Although the LSE is the most widely used estimator in the regression model, it can be seriously affected when outliers exist. In such cases the LAE, which has the desirable property of robustness, is often considered. However, this estimation process fails to take into account the relationship among the response variables, so valuable information in their interdependence structure may be overlooked.

2.2. Relative efficiency of LDE

In this section, we investigate the performance of the robust multivariate regression estimators LDE and LAE in terms of relative efficiency (RE) through simulation. Since the generalized variance can be employed as a measure of the instability of an estimator vector, it is natural to define the relative efficiency of an estimator A with respect to an estimator B, say RE(A, B), as the ratio of generalized variances in (4) (Serfling, 1980). Thus, if RE(A, B) is larger than 1, estimator A is considered more efficient than estimator B:

$$RE(A, B) = \left( \frac{|\Sigma_B|}{|\Sigma_A|} \right)^{1/p} \qquad (4)$$
where $\Sigma_A$ and $\Sigma_B$ are the covariance matrices of estimators A and B, respectively. We compare the efficiency of LDE with respect to LAE through a Monte Carlo investigation. We generate regression data sets of size n = 100 with p = 2 response variables and q = 2 explanatory variables. Without loss of generality, the constant coefficient β1k (k = 1, 2) is set to 0 and the slope coefficient β2k (k = 1, 2) is set to k. The constant explanatory variable xi1 is set to 1; the non-constant explanatory variable xi2 is generated from N(0, 1) and kept fixed throughout the simulations. The random error vector εi is generated from three underlying distributions: Np(0, Σ), Np(0, Σ) with outliers in the first response variable only, and Np(0, Σ) with pairs of outliers in both response variables. The outliers are planted in the following way: xi2 is generated from N(0, 1) + 5 for five observations, and the response variables yik (i = 1, . . . , 5) for these five observations are shifted to yik + 5, as in Hadi and Simonoff (1993). The variance elements of the covariance matrix Σ are set to 1. However, to examine how performance changes with the correlation between the response variables, we consider ten values of the correlation, ρ = 0.0, 0.1, . . . , 0.9.
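A sketch of this data-generating scheme (the function name and the exact outlier placement are our reading of the description above):

```python
import numpy as np

def generate_data(n=100, rho=0.5, outliers="none", seed=None):
    """p = 2 responses, q = 2 explanatory variables; beta_1k = 0, beta_2k = k.

    outliers: 'none', 'y1' (first response only) or 'both', planted in the
    first five observations following Hadi and Simonoff (1993).
    """
    rng = np.random.default_rng(seed)
    B = np.array([[0.0, 0.0], [1.0, 2.0]])             # true coefficients
    x2 = rng.standard_normal(n)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    if outliers != "none":
        x2[:5] += 5.0                                   # x_i2 ~ N(0, 1) + 5
    X = np.column_stack([np.ones(n), x2])
    Y = X @ B + eps
    if outliers == "y1":
        Y[:5, 0] += 5.0                                 # shift first response only
    elif outliers == "both":
        Y[:5, :] += 5.0                                 # shift both responses
    return Y, X
```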
Table 1
Relative efficiency of LDE with respect to LAE.

       Underlying distribution of the error vector
ρ      Normal    Outlier in y1    Outlier in y1 and y2
0.0    1.214     1.208            1.397
0.1    1.239     1.196            1.382
0.2    1.207     1.193            1.376
0.3    1.270     1.210            1.388
0.4    1.264     1.220            1.411
0.5    1.292     1.278            1.393
0.6    1.318     1.322            1.471
0.7    1.408     1.388            1.607
0.8    1.544     1.477            1.679
0.9    1.698     1.737            1.869
In the ith trial of the simulation we obtain the LDE $\hat{\beta}_D^{(i)}$ and the LAE $\hat{\beta}_A^{(i)} = \mathrm{Vec}(\hat{B}^{(i)})$, and after m = 1000 independent trials we can approximate the covariance matrix of the LDE by

$$\hat{\Sigma}_D = \frac{1}{m}\sum_{i=1}^{m} (\hat{\beta}_D^{(i)} - \hat{\beta}_D^{(\cdot)})(\hat{\beta}_D^{(i)} - \hat{\beta}_D^{(\cdot)})^T, \qquad \hat{\beta}_D^{(\cdot)} = \frac{1}{m}\sum_{i=1}^{m} \hat{\beta}_D^{(i)}.$$

The approximation $\hat{\Sigma}_A$ for the covariance matrix of the LAE is obtained in an analogous way.
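Given the m replicated estimates, the ratio (4) can be approximated as below (a sketch; each row of betas_A and betas_B is assumed to hold one replicated estimate vector):

```python
import numpy as np

def relative_efficiency(betas_A, betas_B, p):
    """RE(A, B) = (|Sigma_B| / |Sigma_A|)^(1/p) from m x (qp) arrays of
    replicated estimates. np.cov divides by m - 1 rather than m, which is
    immaterial for m = 1000."""
    Sigma_A = np.cov(betas_A, rowvar=False)
    Sigma_B = np.cov(betas_B, rowvar=False)
    return (np.linalg.det(Sigma_B) / np.linalg.det(Sigma_A)) ** (1.0 / p)
```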
Table 1 summarizes the relative efficiency of LDE with respect to LAE, $(|\hat{\Sigma}_A| / |\hat{\Sigma}_D|)^{1/p}$, for the three underlying distributions of the error with varying correlation ρ. In Table 1, it is noticeable that the relative efficiency of LDE with respect to LAE increases as the correlation between the response variables increases. This increase can be attributed to the strong preservation of the correlation of the response variables in the parameter estimates. Further, the relative efficiency of LDE with respect to LAE is greater than 1 even when there is no correlation between the response variables. This implies that we can improve efficiency by employing the least distance method to estimate several sets of regression coefficients simultaneously.

3. Distribution theory for the inference of LDE

3.1. Asymptotic method

For an underlying distribution F of the random errors, define

$$S_n = \sum_{i=1}^{n} X_i B X_i^T, \qquad T_n = \sum_{i=1}^{n} X_i A X_i^T \qquad (5)$$

where

$$A = \int_{-\infty}^{+\infty} \frac{1}{\|u\|}\left(I_p - \frac{u u^T}{\|u\|^2}\right) dF(u), \qquad B = \int_{-\infty}^{+\infty} \frac{u u^T}{\|u\|^2}\, dF(u).$$

Under appropriate conditions, Bai et al. (1990) showed that, as n → ∞,

$$S_n^{-1/2} T_n (\hat{\beta} - \beta) \xrightarrow{d} N_{qp}(0, I). \qquad (6)$$
We need the matrices $S_n$ and $T_n$ to make inferences based on the LDE $\hat{\beta}$. However, $S_n$ and $T_n$ depend on the matrices A and B, which in turn depend on the underlying distribution F. Since this underlying distribution F is usually unknown, we cannot directly implement asymptotic inference using (6); estimators of the matrices A and B are prerequisites. Considering the integral form of A and B, natural candidate estimators are

$$\hat{A} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\|\hat{\varepsilon}_i\|}\left(I_p - \frac{\hat{\varepsilon}_i \hat{\varepsilon}_i^T}{\|\hat{\varepsilon}_i\|^2}\right), \qquad \hat{B} = \frac{1}{n}\sum_{i=1}^{n} \frac{\hat{\varepsilon}_i \hat{\varepsilon}_i^T}{\|\hat{\varepsilon}_i\|^2} \qquad (7)$$
where $\hat{\varepsilon}_i = y_i - \hat{y}_i$ is the residual for the ith observation. It is shown that, under regularity conditions, $\hat{A}$ and $\hat{B}$ are weakly consistent estimators of A and B, respectively, and that

$$\hat{S}_n^{-1/2} \hat{T}_n (\hat{\beta} - \beta) \qquad (8)$$

converges in distribution to a multivariate normal distribution with mean 0 and covariance $I_{qp}$ as n → ∞, where $\hat{S}_n = \sum_{i=1}^{n} X_i \hat{B} X_i^T$ and $\hat{T}_n = \sum_{i=1}^{n} X_i \hat{A} X_i^T$ (Bai et al., 1990). Thus, with the true regression coefficient vector $\beta_0$, we have
$$(\hat{\beta} - \beta_0)^T \hat{T}_n \hat{S}_n^{-1} \hat{T}_n (\hat{\beta} - \beta_0) \xrightarrow{d} \chi^2_{(qp)} \qquad (9)$$

and the 100 × (1 − α)% confidence region of the regression coefficient vector β can be obtained from (9) as

$$\left\{\beta : (\hat{\beta} - \beta)^T \hat{T}_n \hat{S}_n^{-1} \hat{T}_n (\hat{\beta} - \beta) \le \chi^2_{(qp)}(\alpha)\right\} \qquad (10)$$

where $\chi^2_{(qp)}(\alpha)$ is the 100 × (1 − α)th percentile of the chi-squared distribution with qp degrees of freedom.
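As an illustration, here is a sketch of the plug-in estimators (7) and the resulting covariance matrix and region check of (10); the function names and data layout are our assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import chi2

def asymptotic_covariance(E, X_list):
    """Covariance estimator T_n^{-1} S_n T_n^{-1} built from the plug-in (7).

    E:      n x p matrix of LDE residuals eps_i = y_i - X_i^T beta_hat
    X_list: list of the n matrices X_i, each of shape (qp) x p
    A sketch with no safeguard against near-zero residual norms.
    """
    n, p = E.shape
    A_hat = np.zeros((p, p))
    B_hat = np.zeros((p, p))
    for e in E:
        r = np.linalg.norm(e)
        outer = np.outer(e, e) / r**2
        A_hat += (np.eye(p) - outer) / r
        B_hat += outer
    A_hat, B_hat = A_hat / n, B_hat / n
    S_n = sum(Xi @ B_hat @ Xi.T for Xi in X_list)
    T_n = sum(Xi @ A_hat @ Xi.T for Xi in X_list)
    T_inv = np.linalg.inv(T_n)
    return T_inv @ S_n @ T_inv

def in_region(beta0, beta_hat, cov, alpha=0.05):
    """Check whether beta0 lies in the confidence region (10)."""
    d = beta_hat - beta0
    return d @ np.linalg.inv(cov) @ d <= chi2.ppf(1 - alpha, d.size)
```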
Table 2
Estimated MSE of variance estimators.

             n = 30                  n = 50                  n = 100
Estimator    Asymptotic  Bootstrap   Asymptotic  Bootstrap   Asymptotic  Bootstrap
Var(β̂11)     0.00251     0.00128     0.00062     0.00040     0.00011     0.00008
Var(β̂21)     0.00337     0.00161     0.00073     0.00044     0.00012     0.00009
Var(β̂12)     0.00270     0.00124     0.00059     0.00040     0.00012     0.00008
Var(β̂22)     0.00336     0.00156     0.00069     0.00046     0.00013     0.00009
Now, we have

$$\frac{\hat{\beta}_{jk} - \beta_{jk}}{\widehat{\mathrm{s.e.}}(\hat{\beta}_{jk})} \xrightarrow{d} N(0, 1) \qquad (11)$$

where $\widehat{\mathrm{s.e.}}(\hat{\beta}_{jk})$ is an estimator of the standard error of $\hat{\beta}_{jk}$ obtained from the covariance matrix estimator $\hat{T}_n^{-1} \hat{S}_n \hat{T}_n^{-1}$ of $\hat{\beta}$. Thus, using (11), the confidence interval of the regression coefficient βjk for the jth explanatory variable and the kth response variable can be obtained as

$$\left[\hat{\beta}_{jk} - z(\alpha/2)\,\widehat{\mathrm{s.e.}}(\hat{\beta}_{jk}),\; \hat{\beta}_{jk} + z(\alpha/2)\,\widehat{\mathrm{s.e.}}(\hat{\beta}_{jk})\right] \qquad (12)$$

where z(α/2) is the 100 × (1 − α/2)th percentile of the standard normal distribution.
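The interval (12) follows directly from any estimate of the covariance matrix of $\hat{\beta}$; a small sketch (the indexing convention into the vectorized parameter is ours):

```python
import numpy as np
from scipy.stats import norm

def coef_interval(beta_hat, cov, j, alpha=0.05):
    """Interval (12) for the jth entry of the vectorized coefficient beta_hat,
    given an estimated covariance matrix cov of beta_hat."""
    se = np.sqrt(cov[j, j])
    z = norm.ppf(1 - alpha / 2)          # z(alpha/2) of the text
    return beta_hat[j] - z * se, beta_hat[j] + z * se
```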
We may use the asymptotic results (10) and (12) to construct the confidence region of the regression coefficient vector β and the confidence interval of an individual regression coefficient βjk, respectively. However, their performance diminishes when the sample size is small, since the asymptotic distribution is derived under the assumption of an infinitely large sample size. Moreover, when some residual, say the ith, has a very small norm, the corresponding term $\frac{1}{\|\hat{\varepsilon}_i\|}(I_p - \hat{\varepsilon}_i \hat{\varepsilon}_i^T / \|\hat{\varepsilon}_i\|^2)$ of $\hat{A}$ in (7) takes an extraordinarily large value. Consequently, the covariance matrix $T_n^{-1} S_n T_n^{-1}$ is poorly estimated by $\hat{T}_n^{-1} \hat{S}_n \hat{T}_n^{-1}$.

To investigate the performance of the asymptotic covariance matrix estimator $\hat{T}_n^{-1} \hat{S}_n \hat{T}_n^{-1}$, we compare its Mean Squared Error (MSE) with that of the bootstrap covariance matrix estimator (Efron and Tibshirani, 1993), which is known to be less dependent on distributional assumptions. In this simulation study, we considered p = 2 response variables and q = 2 explanatory variables with sample sizes n = 30, 50 and 100. The constant coefficient β1k (k = 1, 2) is set to 0, the slope coefficient β2k (k = 1, 2) is set to k, and the non-constant explanatory variable xi2 is generated from N(0, 1). To estimate the MSE, the elements of the true covariance matrix are approximated using 10^6 independent trials, and 500 bootstrap replications are used for the bootstrap estimator. The results from 1000 independent trials are shown in Table 2.
Notice that the MSE of the asymptotic variance estimator of each individual regression coefficient, obtained from the estimator $\hat{T}_n^{-1} \hat{S}_n \hat{T}_n^{-1}$ of the covariance matrix $Var(\hat{\beta})$, is more than twice as large as that of the bootstrap method when the sample size is small. This is mainly due to the variance part of the MSE. We also found that the asymptotic covariances of pairs of individual regression coefficient estimators, $Cov(\hat{\beta}_{jk}, \hat{\beta}_{kj})$, are poorly estimated, with larger variance than under the bootstrap method, while the bootstrap estimator of $Var(\hat{\beta})$ provides rather satisfactory results.

3.2. Bootstrap method

In Section 3.1, we noted that the performance of asymptotic inference is likely to decline when either the sample size is insufficient or small residual values exist. We propose the bootstrap method as an alternative. Bootstrap inference is based on a bootstrap distribution derived from the data and can be applied without any strong distributional assumptions. The following algorithm is employed to construct the bootstrap confidence region of the regression coefficient vector β. In the algorithm, Steps 5 to 7 are nested to use a bootstrap method for the estimation of the covariance matrix $Var(\hat{\beta})$.
Step 1. Calculate the centered residuals $\tilde{\varepsilon}_i = \hat{\varepsilon}_i - \bar{\varepsilon}$, i = 1, . . . , n, where $\hat{\varepsilon}_i = y_i - X_i^T \hat{\beta}$ is the ith residual and $\bar{\varepsilon}$ is the spatial median of $\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n$.
Step 2. Draw an outer bootstrap residual sample $\varepsilon_1^*, \ldots, \varepsilon_n^*$ of size n from $\tilde{\varepsilon}_1, \ldots, \tilde{\varepsilon}_n$ with replacement.
Step 3. Using the LDE $\hat{\beta}$ and the bootstrap residual sample $\varepsilon_1^*, \ldots, \varepsilon_n^*$, create the new response variables $y_i^* = X_i^T \hat{\beta} + \varepsilon_i^*$ and obtain the LDE $\hat{\beta}^{*(b_1)}$ from the new data set $(y_i^*, X_i)$, i = 1, . . . , n.
Step 4. Calculate the centered residuals $\tilde{\varepsilon}_i' = \hat{\varepsilon}_i' - \bar{\varepsilon}'$, i = 1, . . . , n, where $\hat{\varepsilon}_i' = y_i^* - X_i^T \hat{\beta}^{*(b_1)}$ is the ith residual and $\bar{\varepsilon}'$ is the spatial median of $\hat{\varepsilon}_1', \ldots, \hat{\varepsilon}_n'$.
Step 5. Draw an inner bootstrap residual sample $\varepsilon_1^{**}, \ldots, \varepsilon_n^{**}$ of size n from $\tilde{\varepsilon}_1', \ldots, \tilde{\varepsilon}_n'$ with replacement.
Step 6. Using the LDE $\hat{\beta}^{*(b_1)}$ and the bootstrap residual sample $\varepsilon_1^{**}, \ldots, \varepsilon_n^{**}$, create the new response variables $y_i^{**} = X_i^T \hat{\beta}^{*(b_1)} + \varepsilon_i^{**}$ and obtain the LDE $\hat{\beta}^{**}$ from the new data set $(y_i^{**}, X_i)$, i = 1, . . . , n.
Step 7. Repeat Steps 5 and 6 $B_2$ times independently, obtain $\hat{\beta}^{**(1)}, \hat{\beta}^{**(2)}, \ldots, \hat{\beta}^{**(B_2)}$, and estimate the covariance matrix of $\hat{\beta}^{*(b_1)}$ as

$$\hat{\Sigma}^{*(b_1)} = \frac{1}{B_2}\sum_{b_2=1}^{B_2} \left(\hat{\beta}^{**(b_2)} - \hat{\beta}^{**(\cdot)}\right)\left(\hat{\beta}^{**(b_2)} - \hat{\beta}^{**(\cdot)}\right)^T, \qquad \hat{\beta}^{**(\cdot)} = \frac{1}{B_2}\sum_{b_2=1}^{B_2} \hat{\beta}^{**(b_2)}.$$

Step 8. Calculate $(\hat{\beta}^{*(b_1)} - \hat{\beta})^T (\hat{\Sigma}^{*(b_1)})^{-1} (\hat{\beta}^{*(b_1)} - \hat{\beta})$ using the covariance matrix estimate $\hat{\Sigma}^{*(b_1)}$ from Step 7.
Step 9. Repeat Steps 2 to 8 $B_1$ times independently and obtain the $B_1 \times (1 - \alpha)$th value $C^*(\alpha)$ of the $B_1$ possibly different values $(\hat{\beta}^{*(b_1)} - \hat{\beta})^T (\hat{\Sigma}^{*(b_1)})^{-1} (\hat{\beta}^{*(b_1)} - \hat{\beta})$, $b_1 = 1, \ldots, B_1$, an approximation of the 100 × (1 − α)th percentile of the bootstrap distribution.

Therefore, the 100 × (1 − α)% bootstrap confidence region of the regression coefficient vector β based on the LDE $\hat{\beta}$ is obtained as

$$\left\{\beta : (\hat{\beta} - \beta)^T \hat{\Sigma}^{-1} (\hat{\beta} - \beta) \le C^*(\alpha)\right\} \qquad (13)$$

where $\hat{\Sigma} = \frac{1}{B_1}\sum_{b_1=1}^{B_1} (\hat{\beta}^{*(b_1)} - \hat{\beta}^{*(\cdot)})(\hat{\beta}^{*(b_1)} - \hat{\beta}^{*(\cdot)})^T$ and $\hat{\beta}^{*(\cdot)} = \frac{1}{B_1}\sum_{b_1=1}^{B_1} \hat{\beta}^{*(b_1)}$. The bootstrap confidence interval of the individual regression coefficient βjk for the jth explanatory variable and the kth response variable can be obtained analogously.
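A direct transcription of Steps 1–9 into Python, as a sketch: it reuses the hypothetical fit_lde() from Section 2.1 and computes the spatial median numerically. With B1 = B2 = 500 the nested loop is computationally heavy, so smaller values may be preferred for experimentation.

```python
import numpy as np
from scipy.optimize import minimize

def spatial_median(E):
    """argmin_mu sum_i ||e_i - mu||, computed numerically (a sketch)."""
    cost = lambda mu: np.linalg.norm(E - mu, axis=1).sum()
    return minimize(cost, E.mean(axis=0), method="BFGS").x

def bootstrap_region(Y, X, B1=500, B2=500, alpha=0.05, seed=None):
    """Steps 1-9: returns (C_star, Sigma_hat, beta_hat) defining region (13)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    B_hat = fit_lde(Y, X)                          # hypothetical helper (Section 2.1)
    beta_hat = B_hat.flatten(order="F")
    E = Y - X @ B_hat
    E = E - spatial_median(E)                      # Step 1: centered residuals
    stats, outer = [], []
    for _ in range(B1):                            # Steps 2-8, repeated B1 times
        Ys = X @ B_hat + E[rng.integers(0, n, n)]  # Steps 2-3: outer resample
        Bs = fit_lde(Ys, X)
        beta_s = Bs.flatten(order="F")
        Ein = Ys - X @ Bs
        Ein = Ein - spatial_median(Ein)            # Step 4
        inner = np.array([
            fit_lde(X @ Bs + Ein[rng.integers(0, n, n)], X).flatten(order="F")
            for _ in range(B2)                     # Steps 5-6, repeated B2 times
        ])
        Sigma_s = np.cov(inner, rowvar=False, bias=True)   # Step 7 (1/B2 divisor)
        d = beta_s - beta_hat
        stats.append(d @ np.linalg.inv(Sigma_s) @ d)       # Step 8
        outer.append(beta_s)
    outer = np.asarray(outer)
    Sigma_hat = np.cov(outer, rowvar=False, bias=True)
    C_star = np.quantile(stats, 1 - alpha)         # Step 9
    return C_star, Sigma_hat, beta_hat
```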
3.3. Simulation

We conduct a simulation to study the performance of the procedures introduced above. We consider p = 2 and 3 response variables and q = 2 explanatory variables. The constant coefficient β1k (k = 1, . . . , p) is set to zero, and the slope coefficient β2k (k = 1, . . . , p) is set to k. The explanatory variable is generated from N(0, 1), and the random error is generated from Np(0, Σ) with unit variances and correlations drawn randomly from U(0, 1). We used B1 = 500 and B2 = 500 for the bootstrap algorithm. Using the asymptotic method and the bootstrap method, we obtained the confidence region of the parameter coefficient vector and confidence intervals of each individual parameter coefficient. This procedure is repeated 1000 times independently for data sets of size n = 30, 50 and 100 to obtain coverage probability estimates for the asymptotic method and the bootstrap method, which we compare to the nominal coverage probabilities 1 − α = 0.90 and 0.95. We report only the results for the 1 − α = 0.95 case to save space; the results exhibit a similar pattern for 1 − α = 0.90.

The simulation results are summarized in Tables 3 and 4. In many cells, the asymptotic confidence intervals of individual regression coefficients do not attain the nominal coverage probability. This is more likely to happen either when the sample size is small or when the number of response variables is small. More seriously, the asymptotic confidence regions of the regression coefficient vector perform very poorly. When the number of response variables is p = 2, coverage probabilities of the asymptotic confidence region are substantially below the nominal coverage probability. This can be attributed to the poor estimator $\hat{T}_n^{-1} \hat{S}_n \hat{T}_n^{-1}$ of the covariance matrix. On the other hand, coverage probabilities for the bootstrap method are close to the nominal coverage probability, both for the confidence region of the regression coefficient vector β and for the confidence interval of the individual regression coefficient βjk. The confidence region based on the asymptotic distribution is poor for small sample sizes, though it improves as the sample size increases, whereas the bootstrap method performs well even for relatively small sample sizes. Overall, the bootstrap confidence region tends to outperform the asymptotic one in terms of having coverage probabilities close to the nominal confidence level.
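A compact sketch of the coverage-probability estimation loop for one setting, reusing the hypothetical generate_data() and bootstrap_region() helpers sketched earlier (trial counts are reduced here for illustration):

```python
import numpy as np

def coverage(n=30, trials=100, alpha=0.05, seed=0):
    """Fraction of trials in which the bootstrap region (13) covers the
    true coefficient vector for the p = 2 design; a sketch only."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([0.0, 1.0, 0.0, 2.0])      # Vec(B) for beta_1k = 0, beta_2k = k
    hits = 0
    for _ in range(trials):
        Y, X = generate_data(n=n, rho=rng.uniform(), seed=rng.integers(1 << 31))
        C, Sigma, beta_hat = bootstrap_region(Y, X, B1=100, B2=100, alpha=alpha)
        d = beta_hat - beta_true
        hits += d @ np.linalg.inv(Sigma) @ d <= C
    return hits / trials
```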
Table 3
Estimated coverage probability for p = 2 and 1 − α = 0.95.

             n = 30                  n = 50                  n = 100
Estimator    Asymptotic  Bootstrap   Asymptotic  Bootstrap   Asymptotic  Bootstrap
β            0.486       0.963       0.618       0.970       0.730       0.965
β11          0.801       0.958       0.836       0.972       0.880       0.974
β21          0.781       0.942       0.831       0.940       0.881       0.944
β12          0.800       0.966       0.850       0.972       0.868       0.959
β22          0.792       0.946       0.838       0.951       0.867       0.936
Table 4
Estimated coverage probability for p = 3 and 1 − α = 0.95.

             n = 30                  n = 50                  n = 100
Estimator    Asymptotic  Bootstrap   Asymptotic  Bootstrap   Asymptotic  Bootstrap
β            0.666       0.970       0.769       0.958       0.876       0.960
β11          0.866       0.962       0.904       0.964       0.929       0.964
β21          0.858       0.936       0.903       0.936       0.930       0.938
β12          0.864       0.955       0.892       0.954       0.924       0.962
β22          0.847       0.935       0.899       0.940       0.938       0.955
β13          0.842       0.956       0.905       0.955       0.922       0.963
β23          0.868       0.936       0.896       0.946       0.932       0.946
3.4. Example

Let us consider the following simple example. Table 5 displays the systolic and diastolic blood pressures and ages of 40 Marwari females residing in the Burrabazar area of Calcutta (Chakraborty, 1999).

Table 5
Blood pressure and age data of Marwari females.

No.  Age  Systolic  Diastolic    No.  Age  Systolic  Diastolic
1    52   130       80           21   26   130       84
2    21   120       88           22   76   160       90
3    60   180       100          23   37   110       80
4    38   110       90           24   48   130       90
5    19   100       70           25   40   160       112
6    50   170       100          26   36   150       90
7    32   130       84           27   39   140       100
8    41   120       80           28   38   110       74
9    36   140       84           29   16   110       70
10   57   170       106          30   48   130       100
11   52   110       80           31   22   120       80
12   19   120       80           32   30   110       70
13   17   110       70           33   19   120       80
14   16   120       80           34   39   124       84
15   67   160       90           35   38   130       94
16   42   130       90           36   45   120       84
17   44   140       90           37   22   130       80
18   56   170       100          38   20   120       86
19   32   150       94           39   18   120       80
20   21   140       94           40   31   112       80
Since arterial pressure tends to increase with age, a multivariate linear regression model can be set up with the two blood pressures as response variables and age as the explanatory variable:

Systolic = β11 + β21 Age
Diastolic = β12 + β22 Age.

With this data set, we obtain confidence intervals for each regression coefficient β11, β21, β12 and β22 using the asymptotic method and the bootstrap method. Table 6 shows the LDE estimates and the corresponding confidence intervals. The asymptotic confidence intervals tend to be shorter than the bootstrap confidence intervals. This result was expected, since the standard error tends to be underestimated by the asymptotic method; it is therefore possible to draw incorrect statistical inferences from the asymptotic method.
Table 6
LDE estimates and confidence intervals for 1 − α = 0.95.

Estimator   LDE estimate   Asymptotic C.I.        Bootstrap C.I.
β11         106.333        [102.880, 109.788]     [91.017, 117.554]
β21         0.714          [0.626, 0.802]         [0.370, 1.054]
β12         74.215         [70.827, 77.604]       [66.683, 79.755]
β22         0.314          [0.228, 0.400]         [0.1485, 0.4852]
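For completeness, a hypothetical usage sketch for this example, assuming the Table 5 values have been saved to a file bp.csv (one age, systolic, diastolic triple per row; the file name is ours) and reusing the fit_lde() sketch from Section 2.1:

```python
import numpy as np

# age, systolic, diastolic: the 40 rows of Table 5 (bp.csv is hypothetical)
age, systolic, diastolic = np.loadtxt("bp.csv", delimiter=",", unpack=True)

X = np.column_stack([np.ones(len(age)), age])
Y = np.column_stack([systolic, diastolic])
B_hat = fit_lde(Y, X)   # column 0: (beta_11, beta_21); column 1: (beta_12, beta_22)
```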
4. Conclusion

We considered the least distance estimator in the multivariate regression model. It is more robust than the ordinary least squares estimator and more efficient than the least absolute estimator. Although an inferential method is needed to use the estimator in practice, we found that the asymptotic method derived under the large-sample assumption performs poorly, especially in constructing a simultaneous confidence region when the sample size is not sufficiently large. The low performance is an inherent property of the asymptotic method and is attributable to the poor estimator of the covariance matrix. The simulation study confirmed that the proposed bootstrap method provides more accurate results than the asymptotic method in terms of coverage probability, even in cases with small sample sizes. We therefore propose the bootstrap method, as it does not require strong assumptions.

Acknowledgement

This research was supported by a Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2007-314-C00039).

References

Bai, Z.D., Chen, X.R., Miao, B.Q., Rao, C.R., 1990. Asymptotic theory of least distance estimate in multivariate linear model. Statistics 21, 503–519.
Breiman, L., Friedman, J.H., 1997. Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society, Series B 59 (1), 3–54.
Brown, B.M., 1983. Statistical uses of the spatial median. Journal of the Royal Statistical Society, Series B 45 (1), 25–30.
Chakraborty, B., 1999. On multivariate median regression. Bernoulli 5 (4), 683–703.
Ducharme, G.R., 1995. Uniqueness of the least-distances estimator in regression models with multivariate response. The Canadian Journal of Statistics 23 (4), 421–424.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Hadi, A.S., Simonoff, J.S., 1993. Procedures for the identification of multiple outliers in linear models. Journal of the American Statistical Association 88 (424), 1264–1272.
Haldane, J.B.S., 1948. Note on the median of a multivariate distribution. Biometrika 35, 414–415.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.