Statistics and Probability Letters 91 (2014) 1–5
Can the bounds in the multivariate Chebyshev inequality be attained?

Jorge Navarro
Facultad de Matemáticas, Universidad de Murcia, 30100 Murcia, Spain
Article history: Received 30 January 2014; Received in revised form 27 March 2014; Accepted 29 March 2014; Available online 3 April 2014.

Abstract

Chebyshev's inequality was recently extended to the multivariate case. In this paper we prove that the bounds in the multivariate Chebyshev's inequality for random vectors can be attained in the limit. Hence, these bounds are the best possible bounds for this kind of region.

Keywords: Chebyshev (Tchebysheff) inequality; Mahalanobis distance; Principal components; Ellipsoid
1. Introduction

The well-known Chebyshev's inequality for random variables provides a lower bound for the percentage of the population within a given distance of the population mean when the variance is known. There are several extensions of this inequality to the multivariate case (see, e.g., Chen (2011), Marshall and Olkin (1960) and the references therein). Recently, Chen (2011) proved the following multivariate Chebyshev's inequality

Pr((X − µ)′ V^{-1} (X − µ) ≥ ε) ≤ k/ε,    (1)

valid for all ε > 0 and for all random vectors X = (X_1, ..., X_k)′ (w′ denotes the transpose of w) with finite mean vector µ = E(X) and positive definite covariance matrix V = Cov(X) = E((X − µ)(X − µ)′). Of course, (1) can also be written as

Pr((X − µ)′ V^{-1} (X − µ) < ε) ≥ 1 − k/ε

for all ε > 0, or as

Pr(d_V(X, µ) < δ) ≥ 1 − k/δ^2    (2)

for all δ > 0, where

d_V(X, µ) = √((X − µ)′ V^{-1} (X − µ))

is the Mahalanobis distance associated with V between X and µ. Therefore (2) provides a lower bound for the probability of the concentration ellipsoid E_δ = {x ∈ R^k : d_V(x, µ) < δ}.
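As a quick numerical illustration of (1) and (2) (this sketch is not part of the original argument; the dimension, moments, distribution and sample size below are arbitrary choices), the following Python code simulates a random vector with known µ and V, computes the squared Mahalanobis distances, and compares the empirical tail probability with the bound k/ε:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustrative setup: any k, mu and positive definite V could be used.
k = 3
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.5]])

# Any distribution with mean mu and covariance V satisfies (1); a normal is used here.
X = rng.multivariate_normal(mu, V, size=200_000)

# Squared Mahalanobis distances (x - mu)' V^{-1} (x - mu), one per simulated row.
Vinv = np.linalg.inv(V)
d2 = np.einsum('ij,jk,ik->i', X - mu, Vinv, X - mu)

for eps in (3.0, 5.0, 10.0):
    empirical = np.mean(d2 >= eps)   # estimate of Pr((X - mu)' V^{-1} (X - mu) >= eps)
    bound = min(1.0, k / eps)        # Chebyshev bound from (1)
    print(f"eps={eps:4.1f}  empirical tail={empirical:.4f}  bound k/eps={bound:.4f}")
```

For a normal vector the empirical tail probability is much smaller than k/ε, which is consistent with the bound being attained only by rather extreme distributions, such as the sequence constructed in Section 2.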
A simple proof of (1) was obtained in Navarro (in press). The case of a singular covariance matrix is also studied in this reference by using the principal components associated with X. Extensions of (1) to Hilbert-space-valued and Banach-space-valued random elements can be seen in Prakasa Rao (2010) and Zhou and Hu (2012), respectively. Budny (2014) extends the inequality given in (1) to the following inequality:

Pr((X − µ)′ V^{-1} (X − µ) ≥ ε) ≤ I_{s,k}(X)/ε^s    (3)

whenever I_{s,k}(X) = E[((X − µ)′ V^{-1} (X − µ))^s] is finite (and known) for s > 0. In particular, if s = 2, then

Pr((X − µ)′ V^{-1} (X − µ) ≥ ε) ≤ I_{2,k}(X)/ε^2.    (4)
This inequality was obtained in Mardia (1970), where I_{2,k}(X) = E[((X − µ)′ V^{-1} (X − µ))^2] is presented as a multivariate kurtosis coefficient. If s = 1, then the bound obtained from (3) is the same as that given in (1) since I_{1,k}(X) = E[(X − µ)′ V^{-1} (X − µ)] = k for all X (see Navarro, in press). Sometimes, the bounds given in (3) and (4) are better than that given in (1), but note that there we need to know I_{s,k}(X) for some s > 0 (s ≠ 1).

In this paper we prove that the bound in (1) can be attained in the limit for all ε ≥ k. Hence, this bound cannot be improved (for this kind of region) when we only know µ and V. The proof is given in the next section and is based on the proof of (1) given in Navarro (in press). The same technique is also used to prove that the bounds given in (3) can be attained in the limit when s > 0. Some conclusions are given in the last section.

2. Main results

The following theorem proves that the bound in (1) can be attained in the limit.

Theorem 1. Let X = (X_1, ..., X_k)′ be a random vector with finite mean vector µ = E(X) and positive definite covariance matrix V = Cov(X), and let ε ≥ k. Then there exists a sequence X^(n) = (X_1^(n), ..., X_k^(n))′ of random vectors with mean vector µ and covariance matrix V such that

lim_{n→∞} Pr((X^(n) − µ)′ V^{-1} (X^(n) − µ) ≥ ε) = k/ε.    (5)
Proof. For a fixed ε ≥ k, let us consider the random variable D_n defined by

D_n = √(Z_n + ε)     with probability (p − 1/n)/2,
D_n = −√(Z_n + ε)    with probability (p − 1/n)/2,
D_n = 0              with probability 1 − p + 1/n,

for any positive integer n > ε/k, where p = k/ε ≤ 1 and where Z_n has an exponential distribution with mean

µ_n = (ε/n)/(p − 1/n) > 0.

Note that

p − 1/n = k/ε − 1/n ∈ (0, 1).

Also note that Pr(D_n^2 ≥ ε) = p − 1/n. The distribution function of √(Z_n + ε) is given by

Pr(√(Z_n + ε) ≤ x) = Pr(Z_n ≤ x^2 − ε) = 1 − exp(−(x^2 − ε)/µ_n)

for x ≥ √ε. Hence E(√(Z_n + ε)) < ∞ and

E(D_n) = ((p − 1/n)/2) E(√(Z_n + ε)) − ((p − 1/n)/2) E(√(Z_n + ε)) = 0

for all n > ε/k. Moreover,

E(D_n^2) = (p − 1/n) E(Z_n + ε) = (p − 1/n) ((ε/n)/(p − 1/n) + ε) = pε = k

for all n > ε/k.
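As a sanity check on this construction (an added illustration, not part of the original proof; the values of k, ε and n are arbitrary), the following sketch simulates D_n and verifies numerically that E(D_n) ≈ 0, E(D_n^2) ≈ k and Pr(D_n^2 ≥ ε) = p − 1/n:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative values: any k, eps >= k and integer n > eps/k work.
k, eps, n = 3, 6.0, 50
p = k / eps                       # p = k/eps <= 1
q = p - 1.0 / n                   # q = p - 1/n, lies in (0, 1)
mu_n = (eps / n) / q              # mean of the exponential variable Z_n

size = 1_000_000
Z = rng.exponential(mu_n, size)   # Z_n
U = rng.uniform(size=size)

# D_n = +sqrt(Z_n + eps) w.p. q/2, -sqrt(Z_n + eps) w.p. q/2, and 0 w.p. 1 - q.
D = np.where(U < q / 2, np.sqrt(Z + eps),
             np.where(U < q, -np.sqrt(Z + eps), 0.0))

print("E(D_n)           ~", D.mean())                      # should be near 0
print("E(D_n^2)         ~", (D ** 2).mean(), " target", k)
print("Pr(D_n^2 >= eps) ~", np.mean(D ** 2 >= eps), " target", q)
```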
Now let U_n be a random variable, independent of Z_n, with a uniform distribution over the set {1, ..., k}, that is, U_n = i with probability 1/k for i = 1, ..., k. Then we consider the random vector Y^(n) = (Y_1^(n), ..., Y_k^(n))′ (which depends both on U_n and on Z_n) defined by Y_i^(n) = D_n and Y_j^(n) = 0 for j = 1, ..., i − 1, i + 1, ..., k when U_n = i (i.e., with probability 1/k). Therefore, Y_i^(n) = D_n with probability 1/k and Y_i^(n) = 0 with probability (k − 1)/k. Hence

E(Y_i^(n)) = (1/k) E(D_n) = 0

and

Var(Y_i^(n)) = E((Y_i^(n))^2) = (1/k) E(D_n^2) = 1.

Moreover, Y_i^(n) Y_j^(n) = 0 and E(Y_i^(n) Y_j^(n)) = 0 for all i ≠ j. Then E(Y^(n)) = 0_k and Cov(Y^(n)) = I_k, where 0_k represents the zero vector of dimension k and I_k represents the identity matrix of dimension k. Finally, as V is positive definite, there exists a symmetric matrix V^{1/2} such that V = V^{1/2} V^{1/2} and V^{1/2} V^{-1} V^{1/2} = I_k. Then, we consider the random vector X^(n) defined by X^(n) = µ + V^{1/2} Y^(n), which has mean E(X^(n)) = µ and covariance matrix Cov(X^(n)) = Cov(V^{1/2} Y^(n)) = V^{1/2} V^{1/2} = V. Moreover,

Pr((X^(n) − µ)′ V^{-1} (X^(n) − µ) ≥ ε) = Pr((V^{1/2} Y^(n))′ V^{-1} (V^{1/2} Y^(n)) ≥ ε)
    = Pr((Y^(n))′ V^{1/2} V^{-1} V^{1/2} Y^(n) ≥ ε)
    = Pr((Y^(n))′ Y^(n) ≥ ε)
    = Pr(∑_{i=1}^{k} (Y_i^(n))^2 ≥ ε)
    = Pr(D_n^2 ≥ ε)
    = p − 1/n,

which goes to p = k/ε when n → ∞. □
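The next sketch (again an added illustration, with arbitrarily chosen µ, V, ε and n) builds X^(n) = µ + V^{1/2} Y^(n) exactly as in the proof and checks by simulation that its mean is µ, its covariance is V, and that the probability of the region {(x − µ)′ V^{-1} (x − µ) ≥ ε} equals p − 1/n, which approaches the Chebyshev bound k/ε as n grows:

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustrative values (eps >= k, n > eps/k).
k, eps, n = 3, 6.0, 50
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.5]])
p = k / eps
q = p - 1.0 / n
mu_n = (eps / n) / q

size = 1_000_000
Z = rng.exponential(mu_n, size)
U = rng.uniform(size=size)
D = np.where(U < q / 2, np.sqrt(Z + eps),
             np.where(U < q, -np.sqrt(Z + eps), 0.0))

# Y^(n): D_n is placed in one uniformly chosen coordinate, the others are zero.
idx = rng.integers(0, k, size)
Y = np.zeros((size, k))
Y[np.arange(size), idx] = D

# Symmetric square root V^{1/2} from the spectral decomposition of V.
w, P = np.linalg.eigh(V)
V_half = P @ np.diag(np.sqrt(w)) @ P.T

X = mu + Y @ V_half               # X^(n) = mu + V^{1/2} Y^(n) (V^{1/2} is symmetric)

d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(V), X - mu)
print("sample mean       ~", X.mean(axis=0))              # should be near mu
print("sample covariance ~\n", np.cov(X, rowvar=False))   # should be near V
print("Pr(d^2 >= eps)    ~", np.mean(d2 >= eps))
print("p - 1/n =", q, " limit k/eps =", p)
```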
Remark 2. Of course, if k = 1, then the preceding theorem proves that the univariate Chebyshev bound can be attained in the limit. Note that the distribution of the random vector Y^(n) = (Y_1^(n), ..., Y_k^(n))′ defined in the proof of the preceding theorem is a mixture of a discrete degenerate distribution (with weight 1 − p + 1/n) over 0_k and a singular distribution (with weight p − 1/n) over the axes. Also note that if i ≠ j then Y_i^(n) and Y_j^(n) are uncorrelated but not independent.

Remark 3. The sequence (X^(n)) defined in the preceding proof can also be used to prove that the bounds given in (3) can be attained in the limit when s > 0. For these random vectors, with the notation used in the proof, we have

I_{s,k}(X^(n)) = E(((X^(n) − µ)′ V^{-1} (X^(n) − µ))^s)
    = E(((V^{1/2} Y^(n))′ V^{-1} (V^{1/2} Y^(n)))^s)
    = E(((Y^(n))′ V^{1/2} V^{-1} V^{1/2} Y^(n))^s)
    = E(((Y^(n))′ Y^(n))^s)
    = E((∑_{i=1}^{k} (Y_i^(n))^2)^s)
    = E(D_n^{2s})
    = (p − 1/n) E((Z_n + ε)^s),

where Z_n has an exponential distribution with mean

µ_n = (ε/n)/(p − 1/n) > 0,

p = k/ε ≤ 1 and

p − 1/n = k/ε − 1/n ∈ (0, 1).
It is well known that E(Z_n^i) = i! µ_n^i for i = 1, 2, ... since Z_n has an exponential distribution with mean µ_n for n = 1, 2, .... Then

I_{s,k}(X^(n)) = (p − 1/n) E((Z_n + ε)^s)
    = (p − 1/n) ∑_{i=0}^{s} (s choose i) ε^{s−i} E(Z_n^i)
    = (p − 1/n) ∑_{i=0}^{s} (s choose i) ε^{s−i} i! µ_n^i

for s = 1, 2, .... Hence, the bound obtained for this sequence by using (3) is

I_{s,k}(X^(n))/ε^s = (p − 1/n) ∑_{i=0}^{s} (s!/(s − i)!) (µ_n/ε)^i,
where

µ_n/ε = (1/n)/(p − 1/n).

Therefore

lim_{n→∞} I_{s,k}(X^(n))/ε^s = lim_{n→∞} (p − 1/n) ∑_{i=0}^{s} (s!/(s − i)!) (µ_n/ε)^i = p = k/ε

and, from (5), we have

lim_{n→∞} Pr((X^(n) − µ)′ V^{-1} (X^(n) − µ) ≥ ε) = k/ε = lim_{n→∞} I_{s,k}(X^(n))/ε^s
for s = 2, 3, .... Hence, for these values of s, the bounds in (3) are attained in the limit.

Note that, for s > 0, the bound in (3) for this sequence is

I_{s,k}(X^(n))/ε^s = (p − 1/n) E((1 + Z_n/ε)^s),

where p − 1/n > 0 and 1 + Z_n/ε ≥ 1. Hence it is increasing in s for s > 0. Thus, for s > 1 we have

p = I_{1,k}(X^(n))/ε ≤ I_{s,k}(X^(n))/ε^s ≤ I_{m,k}(X^(n))/ε^m

for any integer value m ≥ s. Therefore, as

lim_{n→∞} I_{m,k}(X^(n))/ε^m = p,

then

lim_{n→∞} I_{s,k}(X^(n))/ε^s = p

for all s > 1. Analogously, for 0 < s < 1, we have

I_{s,k}(X^(n))/ε^s ≤ p = I_{1,k}(X^(n))/ε.

Moreover, from (3), we have

Pr((X^(n) − µ)′ V^{-1} (X^(n) − µ) ≥ ε) = p − 1/n ≤ I_{s,k}(X^(n))/ε^s,

and hence we obtain

lim_{n→∞} I_{s,k}(X^(n))/ε^s = p

for all s ∈ (0, 1). Therefore, the bounds obtained from (3) for all s > 0 are also tight. Note that, for this sequence, the bounds obtained in (3) for s > 1 are worse (greater) than the bound obtained in (1), whereas for 0 < s < 1 they are better (smaller) than the bound obtained in (1).
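To see this numerically (an added sketch; the values of k and ε are arbitrary, and s is restricted to integers so that the closed-form sum above applies), the following code evaluates I_{s,k}(X^(n))/ε^s = (p − 1/n) ∑_{i=0}^{s} (s!/(s − i)!) (µ_n/ε)^i for several s and increasing n; for fixed n the bound increases with s, and every column tends to p = k/ε as n → ∞:

```python
from math import factorial

# Arbitrary illustrative values with eps >= k.
k, eps = 3, 6.0
p = k / eps

def bound(s, n):
    """Closed-form bound I_{s,k}(X^(n)) / eps^s for integer s >= 1 and n > eps/k."""
    q = p - 1.0 / n                  # p - 1/n
    r = (1.0 / n) / q                # mu_n / eps
    return q * sum(factorial(s) // factorial(s - i) * r ** i for i in range(s + 1))

for n in (10, 100, 1000, 100000):
    cols = "  ".join(f"s={s}: {bound(s, n):.4f}" for s in (1, 2, 3, 4))
    print(f"n={n:>6}  {cols}   (limit p = {p})")
```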
Moreover, all of them are equal when n → ∞. As we have already mentioned, the same might happen for other random vectors and, in some cases, the bounds obtained in (3) (when I_{s,k}(X) is known) might be better (smaller) than the bound obtained in (1). However, when only µ and V are known, Theorem 1 proves that the best possible bound is that obtained from (1).

3. Conclusions

We have proved that the bound k/ε given in the multivariate Chebyshev's inequality (1) is the best possible upper bound for the probability Pr((X − µ)′ V^{-1} (X − µ) ≥ ε) whenever ε ≥ k, when only µ = E(X) and V = Cov(X) are known. Of course, if ε ≤ k, then the bound in (1) is at least 1, and Theorem 1 (applied with ε = k) shows that the best bound is the trivial bound 1; that is, we have no information about that probability. The same can be applied to the lower bound given in (2). Obviously, these bounds might be improved if we know some additional characteristics of the random vector X. For example, if X has a normal distribution, then the exact probabilities can be obtained by using the chi-squared distribution (see, e.g., page 39 in Mardia et al. (1979)). Sometimes, the bound given in (1) can be improved by the bounds obtained from (3) when I_{s,k}(X) is known for some s > 0 (s ≠ 1).

Acknowledgments

I would like to thank the anonymous reviewers for several helpful suggestions which were used to improve the results included in the present paper. This work is partially supported by Ministerio de Economía y Competitividad under Grant MTM2012-34023-FEDER and Fundación Séneca of C.A.R.M. under Grant 08627/PI/08.

References

Budny, K., 2014. A generalization of Chebyshev's inequality for Hilbert-space-valued random elements. Statist. Probab. Lett. 88, 62–65.
Chen, X., 2011. A new generalization of Chebyshev inequality for random vectors. arXiv:0707.0805v2.
Mardia, K.V., 1970. Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.
Mardia, K.V., Kent, J.T., Bibby, J.M., 1979. Multivariate Analysis. Academic Press.
Marshall, A.W., Olkin, I., 1960. Multivariate Chebyshev inequalities. Ann. Math. Statist. 31, 1001–1014.
Navarro, J., 2014. A very simple proof of the multivariate Chebyshev's inequality. Comm. Statist. Theory Methods. http://dx.doi.org/10.1080/03610926.2013.873135 (published online December 2013, in press).
Prakasa Rao, B.L.S., 2010. Chebyshev's inequality for Hilbert-space-valued random elements. Statist. Probab. Lett. 80, 1039–1042.
Zhou, L., Hu, Z.C., 2012. Chebyshev's inequality for Banach-space-valued random elements. Statist. Probab. Lett. 82, 925–931.