Statistics and Probability Letters 158 (2020) 108676
On minimum volume properties of some confidence regions for multiple multivariate normal means ∗
S. Bedbur, J.M. Lennartz, U. Kamps
Institute of Statistics, RWTH Aachen University, D-52056 Aachen, Germany
Article info

Article history: Received 19 August 2019; received in revised form 14 November 2019; accepted 19 November 2019; available online 4 December 2019.

Abstract

In a multi-sample model of multivariate normal distributions with covariance matrices being known or known except for unknown multipliers, simultaneous confidence regions for the mean vectors are provided with minimum volume properties. The univariate case with unknown variances is included. © 2019 Elsevier B.V. All rights reserved.

Keywords: Simultaneous confidence region; Normal distribution; Minimum volume
1. Introduction

As an alternative or supplement to the criterion of coverage probabilities, the volume or expected volume is often used in parametric inference to measure and compare confidence regions for an unknown parameter. Under this criterion, a confidence region is considered better the smaller its (expected) volume. For a multivariate normal mean, different confidence regions have been proposed in the literature aiming for a smaller volume than that of the standard confidence regions given by certain (random) Mahalanobis balls centred at the sample mean; see, e.g., Stein (1962), Berger (1980), Casella and Hwang (1983), Shinozaki (1989), Brown et al. (1995), Tseng and Brown (1997), Samworth (2005), and, more recently, Abeysekera and Kabaila (2017). In these works, the covariance matrix was assumed to be known or known except for an unknown multiplier. Such smaller confidence regions are obtained, for instance, by centring the Mahalanobis balls at James–Stein estimators or by inverting particular spherical acceptance regions; see the aforementioned references. A general account of the topic along with an analysis and comparison of different approaches is provided by Efron (2006).

In the present paper, we focus on a multi-sample model and assume that independent samples of (possibly) different sizes from multiple multivariate normal distributions are available; the one-sample case and the univariate case are contained in the setting. Under different assumptions on the covariance matrices, simultaneous confidence regions for the mean vectors are provided with minimum volume properties. To be more precise, they each minimize the volume among all equal- or higher-levelled confidence regions based on the same pivotal quantity. In the one-sample case, the proposed optimal confidence regions are simply the standard Mahalanobis balls, so that the even smaller confidence regions from the literature are found to have minimum volume within a wider class of confidence regions.
In Section 2, a simultaneous minimum volume confidence region for the mean vectors is first developed under the assumption that the covariance matrices in all samples are known. Similar arguments are then applied in Section 3 to obtain a smallest confidence region in the situation where the covariance matrices are assumed to be each an unknown multiple of the identity matrix; in this setup, the multi-sample case of univariate normal distributions with unknown variances is included. Finally, we summarize the main findings in Section 4.

∗ Corresponding author. E-mail address: [email protected] (S. Bedbur).
https://doi.org/10.1016/j.spl.2019.108676
0167-7152/© 2019 Elsevier B.V. All rights reserved.
2. Known covariance matrices

We consider $m \ge 1$ independent samples from multivariate normal distributions. Sample $i$, $1 \le i \le m$, consists of independent and identically distributed random (row) vectors $X_{i1}, \dots, X_{in_i}$ with distribution $N_{k_i}(\mu_i, \Sigma_i)$, which denotes the $k_i$-dimensional normal distribution with mean vector $\mu_i \in \mathbb{R}^{k_i}$ and regular covariance matrix $\Sigma_i \in \mathbb{R}^{k_i \times k_i}$. Throughout this section, we suppose that $\Sigma_1, \dots, \Sigma_m$ are known, in the case of which

\[
T(X, \mu) = \left( \sqrt{n_1}\,(\bar X_1 - \mu_1)\,\Sigma_1^{-1/2}, \dots, \sqrt{n_m}\,(\bar X_m - \mu_m)\,\Sigma_m^{-1/2} \right) \tag{1}
\]

has a $k$-dimensional standard normal distribution and therefore serves as the usual pivotal quantity for $\mu = (\mu_1, \dots, \mu_m)$, where $X = (X_1, \dots, X_m)$, $X_i = (X_{i1}, \dots, X_{in_i})$ and $\bar X_i = \sum_{j=1}^{n_i} X_{ij} / n_i$ for $1 \le i \le m$, and $k = \sum_{i=1}^{m} k_i$. As usual, $\Sigma_i^{-1/2} \in \mathbb{R}^{k_i \times k_i}$ denotes the unique square root matrix of $\Sigma_i^{-1}$, i.e., $\Sigma_i^{-1/2} \Sigma_i^{-1/2} = \Sigma_i^{-1}$, $1 \le i \le m$.

In what follows, let $\|\cdot\|_2$ denote the Euclidean norm and superscript $t$ denote transposition. Moreover, we shall say that a confidence region $C = C(X)$ for $\mu$ is based on $T = T(X, \mu)$ if there exists a Borel set $B$ in such a way that $\mu \in C$ if and only if $T \in B$. A minimum volume confidence region for $\mu$ based on $T$ is stated in Theorem 1.
Theorem 1. Let $\alpha \in (0, 1)$. Among all confidence regions for $\mu$ based on $T$ with confidence level $1 - \alpha$, the confidence region

\[
C^* = \left\{ \mu : \|T(X, \mu)\|_2^2 \le \chi^2_{1-\alpha}(k) \right\}
\]

has minimum volume, which is non-random and given by

\[
\lambda^k(C^*) = \left( \prod_{i=1}^{m} \frac{|\Sigma_i|}{n_i^{k_i}} \right)^{1/2} \frac{\left[ \pi\, \chi^2_{1-\alpha}(k) \right]^{k/2}}{\Gamma(k/2 + 1)}. \tag{2}
\]

Here, $\chi^2_{1-\alpha}(k)$ denotes the $(1 - \alpha)$-quantile of the chi-square distribution $\chi^2(k)$ with $k$ degrees of freedom, $|\Sigma_i|$ is the determinant of $\Sigma_i$ for $1 \le i \le m$, and $\lambda^k$ denotes the $k$-dimensional Lebesgue measure.
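As a minimal sketch of how membership in $C^*$ could be checked, the following Python code treats the special case of univariate samples ($k_1 = \dots = k_m = 1$, $\Sigma_i = \sigma_i^2$), where $\|T(X, \mu)\|_2^2 = \sum_i n_i (\bar X_i - \mu_i)^2 / \sigma_i^2$. The function names are hypothetical and not taken from the paper. To stay within the standard library, the chi-square quantile is computed only for even $k$, using the closed-form distribution function that is available in that case; this restriction is an assumption of the sketch, not of Theorem 1.

```python
import math


def chi2_quantile_even(k, p):
    """Return the p-quantile of chi-square(k) for even k (hypothetical helper).

    For even k the CDF equals 1 - exp(-x/2) * sum_{j < k/2} (x/2)^j / j!,
    which is inverted here by bisection.
    """
    if k <= 0 or k % 2 != 0:
        raise ValueError("this sketch only covers even k")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")

    def cdf(x):
        half = x / 2.0
        partial = sum(half ** j / math.factorial(j) for j in range(k // 2))
        return 1.0 - math.exp(-half) * partial

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # enlarge the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):  # fixed number of bisection steps
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pivot_norm_sq(xbars, mus, sigmas, ns):
    """||T(X, mu)||_2^2 for univariate samples with known sigma_i."""
    return sum(n * ((xb - mu) / s) ** 2
               for xb, mu, s, n in zip(xbars, mus, sigmas, ns))


def in_region(xbars, mus, sigmas, ns, alpha):
    """True if mu lies in C* of Theorem 1 (univariate samples, even m)."""
    k = len(xbars)  # k = m because every k_i equals 1
    return pivot_norm_sq(xbars, mus, sigmas, ns) <= chi2_quantile_even(k, 1.0 - alpha)
```

For two samples, `in_region([0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [10, 10], 0.05)` tests whether the candidate mean vector $(0.1, 0.1)$ belongs to the region computed from sample means $(0, 0)$.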
Proof. Let $\bar X = (\bar X_1, \dots, \bar X_m)$ and $\Delta \in \mathbb{R}^{k \times k}$ be a block diagonal matrix with diagonal blocks $\sqrt{n_i}\, \Sigma_i^{-1/2}$, $1 \le i \le m$. Then, $\tau(\mu) = T(X, \mu) = (\bar X - \mu)\Delta$ defines a bijective mapping from $\mathbb{R}^k$ to $\mathbb{R}^k$ with Jacobian matrix $-\Delta$ being free of $\mu$. Hence, $T$ meets the conditions in Jeyaratnam (1985) and Dharmadhikari and Joag-Dev (1988, pp. 211/212), respectively, such that, for every $c > 0$, a minimum volume confidence region for $\mu$ among all equal- or higher-levelled confidence regions based on $T$ is given by

\[
C = \left\{ \mu : f^T(T(X, \mu)) \ge c \right\},
\]

where $f^T$ denotes the density function of $T(X, \mu)$. Inserting for $f^T$ yields the representation

\[
C = \left\{ \mu : \|T(X, \mu)\|_2^2 \le \tilde c \right\}
\]

for some constant $\tilde c \in \mathbb{R}$, such that the first statement is shown. Moreover, $\tau$ linearly transforms $C^*$ to a $k$-dimensional Euclidean ball with radius $[\chi^2_{1-\alpha}(k)]^{1/2}$, the volume of which divided by $|\Delta|$ leads to formula (2). □

Remark 1.
(i) Theorem 1 applies to the multivariate one-sample case by choosing $m = 1$.
(ii) Setting $k_1 = \dots = k_m = 1$ in Theorem 1 yields the result for independent samples from univariate normal distributions.
(iii) For $k = 2$, formula (2) simplifies to

\[
\lambda^2(C^*) = -\frac{2 \pi \sigma_1 \sigma_2 \ln(\alpha)}{\sqrt{n_1 n_2}}
\]

if $k_1 = k_2 = 1$, i.e., when independently sampling from $m = 2$ univariate normal distributions $N_1(\mu_i, \sigma_i^2)$, $i = 1, 2$, and to

\[
\lambda^2(C^*) = -\frac{2 \pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}\, \ln(\alpha)}{n_1}
\]

if $k_1 = 2$, i.e., in case of a single sample from the bivariate normal distribution with correlation parameter $\rho$ and marginal variances $\sigma_1^2$ and $\sigma_2^2$.

Remark 2. $C^*$, as stated in Theorem 1, coincides with the likelihood ratio confidence region, i.e., the confidence region obtained by inverting the acceptance region $A(\tilde\mu) = \{X : \|T(X, \tilde\mu)\|_2^2 \le \chi^2_{1-\alpha}(k)\}$ of the likelihood ratio test with confidence level $1 - \alpha$ for the simple null hypothesis $\mu = \tilde\mu$; see, e.g., Mardia et al. (1979, Section 5). Likewise, inverting the acceptance regions of the Wald and Rao score test each leads to $C^*$.
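The volume formula (2) and its two special cases in Remark 1(iii) can be checked numerically. The sketch below, with hypothetical function names, evaluates formula (2) for given determinants, sample sizes and dimensions, and takes the chi-square quantile as an argument; for $k = 2$ that quantile is $-2\ln(\alpha)$. For illustration only, it also computes the area of a rectangle built from two equal-tailed normal intervals, each at level $\sqrt{1 - \alpha}$; this particular rectangle is an assumed example of a competing region based on $T$ and is not taken from the paper.

```python
import math
from statistics import NormalDist


def min_volume(dets, ns, ks, chi2_q):
    """Evaluate formula (2): volume of C* for known covariance matrices.

    dets[i] is |Sigma_i|, ns[i] the sample size, ks[i] the dimension of
    sample i, and chi2_q the (1 - alpha)-quantile of chi-square(sum(ks)).
    """
    k = sum(ks)
    scale = math.sqrt(math.prod(d / n ** ki for d, n, ki in zip(dets, ns, ks)))
    return scale * (math.pi * chi2_q) ** (k / 2) / math.gamma(k / 2 + 1)


def rectangle_area(sigmas, ns, alpha):
    """Area of a rectangle from two z-intervals, each at level sqrt(1 - alpha).

    Assumed illustrative competitor for two univariate samples.
    """
    per_dim_level = math.sqrt(1.0 - alpha)
    z = NormalDist().inv_cdf((1.0 + per_dim_level) / 2.0)
    return math.prod(2.0 * z * s / math.sqrt(n) for s, n in zip(sigmas, ns))


alpha = 0.05
q2 = -2.0 * math.log(alpha)  # chi-square(2) quantile at level 1 - alpha

# Two univariate samples with sigma = (1, 2) and n = (10, 20).
ellipse = min_volume([1.0, 4.0], [10, 20], [1, 1], q2)
rectangle = rectangle_area([1.0, 2.0], [10, 20], alpha)
print(ellipse, rectangle)
```

In this example the ellipse has the smaller area, in line with the minimum volume property among regions based on $T$.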
For the multivariate one-sample case, we highlight the benefit and the limitation of Theorem 1 in relation to the smaller-volume confidence regions mentioned in the introduction. First, Theorem 1 does not contradict the former findings, since the corresponding confidence regions are not based on the usual pivotal quantity constructed via the sample mean; see formula (1). Hence, these confidence regions improve in terms of volume not only upon the standard Mahalanobis ball but also upon any other confidence region based on that pivotal quantity. The latter comprises, for instance, multi-dimensional 'rectangles' based on $T$ obtained by factorizing the overall confidence level, allocating the factors to the dimensions, and then choosing an appropriate confidence interval in every dimension.

3. Partially unknown covariance matrices

The theorem in Jeyaratnam (1985) may also be applied in situations where the covariance matrices contain unknown parameters. For this, $\Sigma_1, \dots, \Sigma_m$ in formula (1) have to be replaced by estimators in such a way that the distribution of $T$ is free of $\mu$ and $\Sigma_1, \dots, \Sigma_m$. By doing so, the resulting minimum volume confidence region for $\mu$ will still be of the form $C = \{\mu : f^T(T(X, \mu)) \ge c\}$. In the most general setup with completely unknown covariance matrices in all samples, $\Sigma_1, \dots, \Sigma_m$ might be estimated by the sample covariance matrices; however, in that case, finding the distribution and density of $T$ is not straightforward.

A simple expression for $C$ results if the covariance matrix in sample $i$ is supposed to be a multiple of the identity matrix, i.e., if $\Sigma_i = \sigma_i^2 I_{k_i}$ for some unknown parameter $\sigma_i > 0$, $1 \le i \le m$. In this case, provided that $n_i \ge 2$ for $1 \le i \le m$, a natural pivotal quantity based on the maximum likelihood estimators of $\mu$ and $\sigma_1, \dots, \sigma_m$ is given by

\[
\tilde T(X, \mu) = \left( \frac{\sqrt{n_1}\,(\bar X_1 - \mu_1)}{S_1 / \sqrt{k_1 (n_1 - 1)}}, \dots, \frac{\sqrt{n_m}\,(\bar X_m - \mu_m)}{S_m / \sqrt{k_m (n_m - 1)}} \right), \tag{3}
\]

where $S_i^2 = \sum_{j=1}^{n_i} \|X_{ij} - \bar X_i\|_2^2$, $1 \le i \le m$. Here, for $1 \le i \le m$, $S_i^2 / \sigma_i^2$ has a $\chi^2((n_i - 1) k_i)$-distribution, since it can be rewritten as the trace of the scaled scatter matrix $\sum_{j=1}^{n_i} (X_{ij} - \bar X_i)^t (X_{ij} - \bar X_i) / \sigma_i^2$, which, in turn, follows a Wishart distribution $W_{k_i}(n_i - 1, I_{k_i})$ with $n_i - 1$ degrees of freedom and scale matrix $I_{k_i}$. By Basu's theorem, $\bar X_i$ and $S_i$ are then independent for $1 \le i \le m$, which yields that $\tilde T$ is indeed a pivotal quantity for $\mu$ with distribution being free of $\sigma_1, \dots, \sigma_m$. A minimum volume confidence region for $\mu$ based on $\tilde T$ is stated in Theorem 2.
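To make the construction of the pivotal quantity in formula (3) concrete, the following sketch computes $\bar X_i$, $S_i^2$ and the $i$th component of $\tilde T$ from raw data using only the Python standard library. Each sample is represented as a list of rows (observations), and all function names are hypothetical. The sketch assumes $n_i \ge 2$ and $S_i > 0$.

```python
import math


def sample_mean(rows):
    """Componentwise mean of the observations (rows) of one sample."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_sum(rows):
    """S_i^2 = sum over observations of squared distance to the sample mean."""
    xbar = sample_mean(rows)
    return sum((x - m) ** 2 for row in rows for x, m in zip(row, xbar))


def studentized_component(rows, mu):
    """i-th component of the pivotal quantity in formula (3)."""
    n, k = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("each sample needs at least two observations")
    xbar = sample_mean(rows)
    denom = math.sqrt(scatter_sum(rows)) / math.sqrt(k * (n - 1))
    return [math.sqrt(n) * (xb - m) / denom for xb, m in zip(xbar, mu)]


def pivot(samples, mus):
    """Pivotal quantity of formula (3) as a list of per-sample components."""
    return [studentized_component(rows, mu) for rows, mu in zip(samples, mus)]
```

For a single univariate sample ($k_1 = 1$), `studentized_component` reduces to the usual one-sample t statistic, since $S_1 / \sqrt{n_1 - 1}$ is then the sample standard deviation.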
Theorem 2. Let $\alpha \in (0, 1)$ and $n_i \ge 2$ for $1 \le i \le m$. Among all confidence regions for $\mu$ based on $\tilde T$ with confidence level $1 - \alpha$, the confidence region

\[
\tilde C^* = \left\{ \mu : \sum_{j=1}^{m} k_j n_j \ln\left( 1 + \frac{n_j \|\bar X_j - \mu_j\|_2^2}{S_j^2} \right) \le d_{1-\alpha} \right\} \tag{4}
\]

has minimum volume. Here, $d_{1-\alpha}$ denotes the $(1 - \alpha)$-quantile of the random variable $Z = \sum_{j=1}^{m} k_j n_j \ln[1 + Y_j / (n_j - 1)]$, where $Y_1, \dots, Y_m$ are independent and $Y_i \sim F(k_i, (n_i - 1) k_i)$, i.e., $Y_i$ has an $F$-distribution with $k_i$ and $(n_i - 1) k_i$ degrees of freedom for $1 \le i \le m$.
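Theorem 2 characterizes $d_{1-\alpha}$ as a quantile of $Z$ without giving a closed form for general $m$. One possible way to approximate it, shown here as a hedged sketch and not as the authors' procedure, is plain Monte Carlo simulation: each $Y_j$ is generated as a ratio of scaled chi-square variables, which are drawn as gamma variables with scale 2. The function names, the default number of replications and the seed are assumptions of the sketch.

```python
import math
import random


def region_statistic(ratios, ks, ns):
    """Left-hand side of formula (4).

    ratios[j] is n_j * ||xbar_j - mu_j||_2^2 / S_j^2 for sample j.
    """
    return sum(k * n * math.log1p(r) for r, k, n in zip(ratios, ks, ns))


def simulate_d(ks, ns, alpha, reps=100_000, seed=0):
    """Monte Carlo approximation of the (1 - alpha)-quantile of Z."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = 0.0
        for k, n in zip(ks, ns):
            nu = (n - 1) * k
            # chi-square(df) is Gamma(shape = df / 2, scale = 2)
            num = rng.gammavariate(k / 2.0, 2.0) / k
            den = rng.gammavariate(nu / 2.0, 2.0) / nu
            y = num / den  # Y_j ~ F(k_j, (n_j - 1) k_j)
            z += k * n * math.log1p(y / (n - 1))
        draws.append(z)
    draws.sort()
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return draws[index]  # empirical (1 - alpha)-quantile
```

A candidate $\mu$ would then be accepted if `region_statistic(...) <= simulate_d(ks, ns, alpha)`. As a sanity check, for $m = 1$ and $k_1 = 2$ the variable $Z$ is exponentially distributed and $d_{1-\alpha} = -2 n_1 \ln(\alpha) / (n_1 - 1)$, which the simulation should reproduce up to Monte Carlo error.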
Proof. When replacing in the proof of Theorem 1 the diagonal blocks of $\Delta$ by $\left( \sqrt{k_i n_i (n_i - 1)} / S_i \right) I_{k_i}$, $1 \le i \le m$, the theorem in Jeyaratnam (1985) may still be applied to conclude that, for every $c > 0$, a minimum volume confidence region for $\mu$ among all equal- or higher-levelled confidence regions based on $\tilde T$ is given by

\[
\tilde C = \left\{ \mu : f^{\tilde T}(\tilde T(X, \mu)) \ge c \right\},
\]

where $f^{\tilde T}$ denotes the density function of $\tilde T(X, \mu)$. From formula (3), $f^{\tilde T}$ is seen to be the product density of $m$ standard multivariate $t$-distributions, i.e.,

\[
f^{\tilde T}(\tilde t_1, \dots, \tilde t_m) = \check c \prod_{j=1}^{m} \left[ 1 + \frac{1}{k_j (n_j - 1)} \|\tilde t_j\|_2^2 \right]^{-k_j n_j / 2}
\]

for $\tilde t_i \in \mathbb{R}^{k_i}$, $1 \le i \le m$, and a constant $\check c$ being free of $\mu$ and $\sigma_1, \dots, \sigma_m$; see, e.g., Kotz and Nadarajah (2004). Inserting for $\tilde T$ and taking the natural logarithm then leads to formula (4). Finally, since $n_i \|\bar X_i - \mu_i\|_2^2 / \sigma_i^2 \sim \chi^2(k_i)$ and $S_i^2 / \sigma_i^2 \sim \chi^2((n_i - 1) k_i)$ are independent for $1 \le i \le m$, the stated rule for determining $d_{1-\alpha}$ is shown. □

Remark 3.
(i) In the one-sample case, i.e., for $m = 1$, formula (4) in Theorem 2 simplifies to

\[
\tilde C^* = \left\{ \mu : \frac{n_1 \|\bar X_1 - \mu_1\|_2^2}{S_1^2 / (n_1 - 1)} \le F_{1-\alpha}(k_1, (n_1 - 1) k_1) \right\},
\]
where $F_{1-\alpha}(k_1, (n_1 - 1) k_1)$ denotes the $(1 - \alpha)$-quantile of the $F(k_1, (n_1 - 1) k_1)$-distribution. The corresponding (random) volume is given by

\[
\lambda^{k_1}(\tilde C^*) = \left( \frac{\pi F_{1-\alpha}(k_1, (n_1 - 1) k_1)}{n_1 (n_1 - 1)} \right)^{k_1 / 2} \frac{S_1^{k_1}}{\Gamma(k_1 / 2 + 1)}
\]

with distribution being free of $\mu$. Its mean may be stated explicitly by using the formula for the raw moments of the chi-distribution $\chi((n_1 - 1) k_1)$ with $(n_1 - 1) k_1$ degrees of freedom; see, e.g., Johnson et al. (1994, p. 421).
(ii) Theorem 2 applies to $m$ univariate normal distributions with arbitrary unknown variances by setting $k_1 = \dots = k_m = 1$.

As a particular result in the multivariate one-sample case, formerly developed confidence regions smaller than the standard one in Remark 3(i) may turn out to have minimum volume in a broader class of confidence regions; see, e.g., Berger (1980), Casella and Hwang (1983), Samworth (2005), and Abeysekera and Kabaila (2017).

Finally, it is worth mentioning that the findings of this section may easily be generalized to the case $\Sigma_i = \sigma_i^2 \Sigma_i^{(0)}$ with an unknown scale parameter $\sigma_i > 0$ and a known positive definite matrix $\Sigma_i^{(0)} \in \mathbb{R}^{k_i \times k_i}$ in sample $i$, $1 \le i \le m$. For this, the $i$th component of $\tilde T$ in formula (3) has to be substituted by

\[
\frac{\sqrt{n_i}\,(\bar X_i - \mu_i)\,(\Sigma_i^{(0)})^{-1/2}}{\tilde S_i / \sqrt{k_i (n_i - 1)}}
\]

with $\tilde S_i^2 = \sum_{j=1}^{n_i} \|(X_{ij} - \bar X_i)(\Sigma_i^{(0)})^{-1/2}\|_2^2$ for $1 \le i \le m$. Since the distribution of $\tilde T$ remains the same, we may follow the arguments in the proof of Theorem 2 to arrive at a minimum volume confidence region for $\mu$ as stated in formula (4) with $n_j \|\bar X_j - \mu_j\|_2^2 / S_j^2$ being replaced by $n_j \|(\bar X_j - \mu_j)(\Sigma_j^{(0)})^{-1/2}\|_2^2 / \tilde S_j^2$, $1 \le j \le m$.
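The one-sample volume in Remark 3(i) and its mean can be sketched as follows. The code uses the standard raw-moment formula of the chi-distribution, $E[X^r] = 2^{r/2}\, \Gamma((\nu + r)/2) / \Gamma(\nu/2)$ for $X \sim \chi(\nu)$, together with $S_1 / \sigma_1 \sim \chi((n_1 - 1) k_1)$. The $F$-quantile is passed in by the caller; only for numerator degrees of freedom $k_1 = 2$ does the sketch provide it in closed form, which is an assumption made to keep the example free of external libraries. All function names are hypothetical.

```python
import math


def chi_raw_moment(r, nu):
    """E[X^r] for X ~ chi(nu), computed via log-gamma for stability."""
    return 2.0 ** (r / 2.0) * math.exp(
        math.lgamma((nu + r) / 2.0) - math.lgamma(nu / 2.0))


def volume_one_sample(s1, n1, k1, f_q):
    """Random volume of the one-sample region in Remark 3(i), given S_1 = s1."""
    radius_factor = (math.pi * f_q / (n1 * (n1 - 1))) ** (k1 / 2.0)
    return radius_factor * s1 ** k1 / math.gamma(k1 / 2.0 + 1.0)


def expected_volume_one_sample(sigma, n1, k1, f_q):
    """Mean volume, using E[S_1^k1] = sigma^k1 * E[chi((n1 - 1) k1)^k1]."""
    nu = (n1 - 1) * k1
    return volume_one_sample(1.0, n1, k1, f_q) * sigma ** k1 * chi_raw_moment(k1, nu)


def f_quantile_numerator_df_2(p, nu):
    """p-quantile of F(2, nu), from the CDF 1 - (1 + 2x/nu)^(-nu/2)."""
    return (nu / 2.0) * ((1.0 - p) ** (-2.0 / nu) - 1.0)


# Bivariate one-sample example: k1 = 2, n1 = 5, sigma = 1.5, alpha = 0.05.
f_q = f_quantile_numerator_df_2(0.95, (5 - 1) * 2)
print(expected_volume_one_sample(1.5, 5, 2, f_q))
```

For $k_1 = 2$ the expected area reduces to $2\pi F_{1-\alpha}(2, 2(n_1 - 1))\, \sigma_1^2 / n_1$, which can be compared with the non-random area $-2\pi \sigma_1^2 \ln(\alpha) / n_1$ from formula (2) when $\sigma_1$ is known.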
4. Conclusion

Based on independent samples from multivariate normal distributions with known or one-parameter diagonal covariance matrices, simultaneous confidence regions for the mean vectors are derived with minimum volume among all equal- or higher-levelled confidence regions based on the usual pivotal quantities. As a particular consequence in the one-sample case, previously constructed confidence regions smaller than the standard Mahalanobis balls are seen to minimize the volume within a wider class of confidence regions. The setup with partially unknown covariance matrices covers, in particular, the multi-sample case of univariate normal distributions with arbitrary unknown variances.

CRediT authorship contribution statement

S. Bedbur: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. J.M. Lennartz: Conceptualization, Methodology, Writing - review & editing. U. Kamps: Conceptualization, Methodology, Writing - review & editing, Supervision.

References

Abeysekera, W., Kabaila, P., 2017. Optimized recentered confidence spheres for the multivariate normal mean. Electron. J. Stat. 11, 1798–1826.
Berger, J., 1980. A robust generalized Bayes estimator and confidence region for a multivariate normal mean. Ann. Statist. 8 (4), 716–761.
Brown, L.D., Casella, G., Hwang, J.T.G., 1995. Optimal confidence sets, bioequivalence, and the Limaçon of Pascal. J. Amer. Statist. Assoc. 90 (431), 880–889.
Casella, G., Hwang, J.T., 1983. Empirical Bayes confidence sets for the mean of a multivariate normal distribution. J. Amer. Statist. Assoc. 78 (383), 688–698.
Dharmadhikari, S., Joag-Dev, K., 1988. Unimodality, Convexity, and Applications. Academic Press, Boston.
Efron, B., 2006. Minimum volume confidence regions for a multivariate normal mean vector. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 (4), 655–670.
Jeyaratnam, S., 1985. Minimum volume confidence regions. Statist. Probab. Lett. 3 (6), 307–308.
Johnson, N.L., Kotz, S., Balakrishnan, N., 1994. Continuous Univariate Distributions, vol. 1, second ed. Wiley, New York.
Kotz, S., Nadarajah, S., 2004. Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge.
Mardia, K.V., Kent, J.T., Bibby, J.M., 1979. Multivariate Analysis. Academic Press, London.
Samworth, R., 2005. Small confidence sets for the mean of a spherically symmetric distribution. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (3), 343–361.
Shinozaki, N., 1989. Improved confidence sets for the mean of a multivariate normal distribution. Ann. Inst. Statist. Math. 41 (2), 331–346.
Stein, C.M., 1962. Confidence sets for the mean of a multivariate normal distribution (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 24 (2), 265–296.
Tseng, Y.-L., Brown, L.D., 1997. Good exact confidence sets for a multivariate normal mean. Ann. Statist. 25 (5), 2228–2258.