Linear Algebra and its Applications 296 (1999) 171–181
www.elsevier.com/locate/laa
A matrix version of the Wielandt inequality and its applications to statistics

Song-Gui Wang a,b, Wai-Cheung Ip c,*

a Department of Applied Mathematics, Beijing Polytechnic University, Beijing 100022, People's Republic of China
b Institute of Applied Mathematics, Academia Sinica, Beijing 100080, People's Republic of China
c Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Received 18 December 1997; accepted 22 May 1999
Submitted by R.A. Brualdi
Abstract

Suppose that $A$ is an $n \times n$ positive definite Hermitian matrix. Let $X$ and $Y$ be $n \times p$ and $n \times q$ matrices, respectively, such that $X^*Y = 0$. The present article proves the following inequality:
\[
X^*AY(Y^*AY)^-Y^*AX \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 X^*AX,
\]
where $\lambda_1$ and $\lambda_n$ are respectively the largest and smallest eigenvalues of $A$, and $M^-$ stands for a generalized inverse of $M$. This inequality is an extension of the well-known Wielandt inequality, in which both $X$ and $Y$ are vectors. The inequality is utilized to obtain some interesting inequalities about covariance matrices and various correlation coefficients, including the canonical, multiple and simple correlations. Some applications in parameter estimation are also given. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Wielandt inequality; Kantorovich inequality; Cauchy–Schwarz inequality; Canonical correlation; Condition number; Generalized inverse
* Corresponding author. Tel.: +852-27666953; fax: +852-23629045; e-mail: mathipwc@polyu.edu.hk.
1. Introduction

Let $A$ be an $n \times n$ positive definite Hermitian matrix and let $x$ and $y$ be $n \times 1$ vectors satisfying $x^*y = 0$. The Wielandt inequality (see, e.g., [4; 7, p. 443]) asserts that
\[
|x^*Ay|^2 \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 (x^*Ax)(y^*Ay), \tag{1.1}
\]
where $\lambda_1 \ge \cdots \ge \lambda_n$ are the eigenvalues of $A$. Moreover, there exists an orthonormal pair of vectors $x$ and $y$ for which equality holds in (1.1). Let $u_1, \ldots, u_n$ be the orthonormal eigenvectors of $A$ corresponding to $\lambda_1 \ge \cdots \ge \lambda_n$. In fact, equality holds in (1.1) when $x = (u_1 + u_n)/\sqrt{2}$ and $y = (u_1 - u_n)/\sqrt{2}$. Eaton [5] rediscovered (1.1). Householder [9] observed that when $y = (x^*x)A^{-1}x - (x^*A^{-1}x)x$, (1.1) becomes the following Kantorovich inequality
\[
\frac{(x^*Ax)(x^*A^{-1}x)}{(x^*x)^2} \le \frac{(\lambda_1+\lambda_n)^2}{4\lambda_1\lambda_n}; \tag{1.2}
\]
see, for example, [19, p. 44].

The Wielandt inequality (1.1) is an improvement on the general Cauchy–Schwarz inequality,
\[
|x^*Ay|^2 \le (x^*Ax)(y^*Ay), \tag{1.3}
\]
which holds for every pair of vectors $x$ and $y$. The last decade has witnessed considerable progress in the study of the Cauchy–Schwarz inequality (1.3), in particular in its matrix versions and their applications in statistics. Marshall and Olkin [10] and Baksalary and Puntanen [2,3] presented some matrix versions involving a positive definite or semidefinite matrix. Mond and Pecaric [11] and Pecaric et al. [14] derived some general Cauchy–Schwarz type inequalities. Thus there are several matrix versions of the Cauchy–Schwarz inequality (1.3) in the literature. In contrast, no matrix version of the Wielandt inequality has appeared in the literature yet. The purpose of this paper is to present a matrix version of (1.1) and its applications to statistics. In Section 2 we derive a matrix version of the Wielandt inequality, together with other useful inequalities that follow as straightforward corollaries. Section 3 presents several statistical applications of our main results, which involve some interesting inequalities about correlation coefficients and covariance matrices.
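To make the scalar inequalities (1.1)–(1.3) concrete, here is a small numerical sketch. It is our own illustration, not part of the original paper: it assumes NumPy, works with a randomly generated Hermitian positive definite matrix, and the dimension and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random Hermitian positive definite A.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B @ B.conj().T + n * np.eye(n)

lam = np.linalg.eigvalsh(A)                       # ascending real eigenvalues
lam_n, lam_1 = lam[0], lam[-1]
wielandt = ((lam_1 - lam_n) / (lam_1 + lam_n)) ** 2

# Orthogonal pair x, y (x*y = 0): project x out of a random y.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y -= (x.conj() @ y) / (x.conj() @ x) * x

lhs = abs(x.conj() @ A @ y) ** 2
xAx = (x.conj() @ A @ x).real
yAy = (y.conj() @ A @ y).real

assert lhs <= wielandt * xAx * yAy + 1e-9         # Wielandt (1.1)
assert lhs <= xAx * yAy + 1e-9                    # Cauchy-Schwarz (1.3)

# Kantorovich (1.2).
xAinvx = (x.conj() @ np.linalg.solve(A, x)).real
xx = (x.conj() @ x).real
assert xAx * xAinvx / xx**2 <= (lam_1 + lam_n)**2 / (4 * lam_1 * lam_n) + 1e-9
print("(1.1)-(1.3) hold for this example")
```

Replacing $x$ and $y$ above by $(u_1+u_n)/\sqrt{2}$ and $(u_1-u_n)/\sqrt{2}$ (the eigenvectors of the extreme eigenvalues) turns (1.1) into an equality, as stated above.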
2. A matrix version of the Wielandt inequality

For any $m \times n$ matrix $B$, we shall use $B^-$ to stand for a generalized inverse of it, i.e., $B^-$ satisfies $BB^-B = B$; $B^+$ denotes its unique Moore–Penrose inverse, which is a generalized inverse satisfying $B^+BB^+ = B^+$, $(BB^+)^* = BB^+$ and $(B^+B)^* = B^+B$. Readers may refer to [17,20] for more details. We shall use the following conventions throughout this paper: $\mathcal{M}(B)$ denotes the column space (range) of $B$, and $P_B = B(B^*B)^-B^*$ denotes the orthogonal projector onto $\mathcal{M}(B)$. $A \ge 0$ ($A > 0$) means that $A$ is a non-negative definite (positive definite) Hermitian matrix, and $A \ge B$ stands for $A \ge 0$, $B \ge 0$ and $A - B \ge 0$.

The main result of this paper is stated in the following theorem.

Theorem 1. Let $A > 0$ be an $n \times n$ matrix, $\lambda_1$ and $\lambda_n$ be its largest and smallest eigenvalues, $u_1$ and $u_n$ be the associated orthonormal eigenvectors, and $X$ and $Y$ be $n \times p$ and $n \times q$ matrices satisfying $X^*Y = 0$. Then
\[
X^*AY(Y^*AY)^-Y^*AX \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 X^*AX, \tag{2.1}
\]
with equality if
\[
\frac{1}{\lambda_1}P_Xu_1 + \frac{1}{\lambda_n}P_Xu_n = c(u_1 + u_n) \tag{2.2}
\]
for some scalar $c$ and $\mathcal{M}(Y) = \mathcal{M}(I - P_X)$.

In order to prove Theorem 1, we need the following lemma.

Lemma 1.
\[
AB^-C = AB^+C \quad \text{for all } B^- \tag{2.3}
\]
if and only if
\[
\mathcal{M}(A^*) \subseteq \mathcal{M}(B^*) \quad \text{and} \quad \mathcal{M}(C) \subseteq \mathcal{M}(B). \tag{2.4}
\]

The proof of Lemma 1 can be found in [16, Lemma 2.2.4 and Complement 13, p. 437].

Remark 1. In view of Lemma 1, the left-hand side of (2.1) is invariant with respect to the choice of generalized inverse $(Y^*AY)^-$. Therefore (2.1) is equivalent to
\[
X^*AY(Y^*AY)^+Y^*AX \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 X^*AX. \tag{2.5}
\]
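Remark 1 can be illustrated numerically. The following sketch is our own (it assumes NumPy; the dimensions, seed and the helper random_ginverse are arbitrary choices): it builds a deliberately rank-deficient $Y$, so that $Y^*AY$ has many generalized inverses, and checks that the left-hand side of (2.1) does not depend on which one is used.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, q = 8, 2, 4

M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                      # real symmetric A > 0 for simplicity
X = rng.standard_normal((n, p))
N = np.eye(n) - X @ np.linalg.pinv(X)        # projector onto the orthocomplement of M(X)
Y = N @ rng.standard_normal((n, 3)) @ rng.standard_normal((3, q))   # X'Y = 0, rank(Y) <= 3 < q

B = Y.T @ A @ Y                              # singular, hence many generalized inverses
Bp = np.linalg.pinv(B)

def random_ginverse(B, Bp):
    # One family of g-inverses: B^+ + (I - B^+ B) U + V (I - B B^+), U and V arbitrary.
    U, V = rng.standard_normal(B.shape), rng.standard_normal(B.shape)
    I = np.eye(B.shape[0])
    return Bp + (I - Bp @ B) @ U + V @ (I - B @ Bp)

lhs = lambda G: X.T @ A @ Y @ G @ Y.T @ A @ X
G1, G2 = random_ginverse(B, Bp), random_ginverse(B, Bp)
print(np.allclose(B @ G1 @ B, B),            # G1 really is a g-inverse of Y'AY
      np.allclose(lhs(G1), lhs(G2)))         # and the left-hand side of (2.1) is invariant
```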
Lemma 2. Let $A$ be an $n \times n$ matrix and $B$ be an $m \times n$ matrix such that $B^*B = I_n$. Then
\[
(BAB^*)^+ = BA^+B^*.
\]

This may be proved by direct verification.

Lemma 3. Let $A \ge 0$ be $n \times n$, $X$ be $n \times p$ and $\mathcal{M}(X) \subseteq \mathcal{M}(A)$. Then
\[
X(X^*A^-X)^-X^* = A - AN(NAN)^-NA, \tag{2.6}
\]
where $N = I - P_X$. Both sides of (2.6) are invariant with respect to the choice of the generalized inverses involved. In fact they are two equivalent expressions of the covariance matrix of the best linear unbiased estimator of $X\beta$ in the linear model $y = X\beta + e$, $E(e) = 0$, $\mathrm{cov}(e) = A$, where $\mathcal{M}(X) \subseteq \mathcal{M}(A)$. This identity is available in the statistical literature (e.g., [1, p. 162] and (3.2) of [3, p. 287]).
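The identity (2.6) is easy to check numerically. The sketch below is our own illustration (it assumes NumPy, takes $A$ invertible so that $A^-$ may be replaced by $A^{-1}$, and uses Moore–Penrose inverses for the remaining generalized inverses; dimensions and seed are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2

B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)                  # A > 0, so M(X) is contained in M(A)
X = rng.standard_normal((n, p))

P_X = X @ np.linalg.pinv(X.T @ X) @ X.T  # orthogonal projector onto M(X)
N = np.eye(n) - P_X

lhs = X @ np.linalg.pinv(X.T @ np.linalg.inv(A) @ X) @ X.T
rhs = A - A @ N @ np.linalg.pinv(N @ A @ N) @ N @ A

print(np.allclose(lhs, rhs))             # True: the two BLUE covariance formulas agree
```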
We are now in a position to prove Theorem 1.

Proof of Theorem 1. Pre- and post-multiplying (2.1) by $X(X^*X)^-$ and $(X^*X)^-X^*$, respectively, yields
\[
P_XAY(Y^*AY)^-Y^*AP_X \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 P_XAP_X. \tag{2.7}
\]
It is obvious that (2.1) and (2.7) are equivalent. It is therefore sufficient to prove the latter. It is in turn sufficient to show that for any $a \in \mathbb{C}^n$
\[
a^*P_XAY(Y^*AY)^-Y^*AP_Xa \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 a^*P_XAP_Xa,
\]
or equivalently
\[
\frac{a^*P_XAY(Y^*AY)^-Y^*AP_Xa}{a^*P_XAP_Xa} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2.
\]
Since both $\lambda_1$ and $\lambda_n$ are positive,
\[
\left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 \le 1.
\]
Thus the above inequality is also equivalent to
\[
1 - \frac{a^*P_XAY(Y^*AY)^-Y^*AP_Xa}{a^*P_XAP_Xa} \ge 1 - \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2,
\]
or
\[
\frac{a^*P_XAP_Xa - a^*P_XAY(Y^*AY)^-Y^*AP_Xa}{a^*P_XAP_Xa} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}.
\]
Note that
\[
N(NAN)^-N \ge Y(Y^*AY)^-Y^*,
\]
where $N = I - P_X$; thus it suffices to show
\[
\frac{a^*P_XAP_Xa - a^*P_XAN(NAN)^-NAP_Xa}{a^*P_XAP_Xa} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}. \tag{2.8}
\]
By Lemma 3, (2.8) reduces to
\[
\frac{a^*X(X^*A^{-1}X)^-X^*a}{a^*P_XAP_Xa} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}. \tag{2.9}
\]
To prove (2.9), consider the singular value decomposition of $X$,
\[
X = HDG^*,
\]
where $H$ is an $n \times r$ matrix with $H^*H = I_r$, $G$ is a $p \times r$ matrix with $G^*G = I_r$, $D$ is an $r \times r$ diagonal matrix with positive diagonal elements, and $r = \mathrm{rank}(X)$. In view of Lemma 2, it can be shown that
\[
X(X^*A^{-1}X)^-X^* = H(H^*A^{-1}H)^{-1}H^*.
\]
Since $P_X = HH^*$, if we put $t = H^*a$, then (2.9) becomes
\[
\frac{t^*(H^*A^{-1}H)^{-1}t}{t^*H^*AHt} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}. \tag{2.10}
\]
It can be shown using the Cauchy–Schwarz inequality (1.3) that
\[
(t^*t)^2 \le \bigl(t^*H^*A^{-1}Ht\bigr)\bigl(t^*(H^*A^{-1}H)^{-1}t\bigr),
\]
with equality holding if and only if
\[
H^*A^{-1}Ht = ct \tag{2.11}
\]
for a scalar $c$. Hence we have
\[
\frac{t^*(H^*A^{-1}H)^{-1}t}{t^*H^*AHt}
\ge \frac{(t^*t)^2}{(t^*H^*AHt)(t^*H^*A^{-1}Ht)}
= \frac{(x^*x)^2}{(x^*Ax)(x^*A^{-1}x)}
\ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}, \tag{2.12}
\]
where $x = Ht$ and $x^*x = t^*H^*Ht = t^*t$, and the last inequality follows from the Kantorovich inequality (see, e.g., [7, p. 452], or [19, p. 44]). Since equality holds in (2.12) when $x = (u_1 + u_n)/\sqrt{2}$, (2.2) follows from (2.11). This completes the proof of Theorem 1. $\Box$

Remark 2. If $p = q = 1$, (2.1) reduces to (1.1).

Remark 3. The Cauchy–Schwarz counterpart of (2.1) is
\[
X^*AY(Y^*AY)^-Y^*AX \le X^*AX,
\]
which holds for any $n \times p$ matrix $X$ and $n \times q$ matrix $Y$.

Corollary 1.
\[
X^*AY(Y^*AY)^-Y^*AX \le \left(\frac{\kappa-1}{\kappa+1}\right)^2 X^*AX,
\]
where $\kappa = \lambda_1/\lambda_n$ is the so-called condition number of $A$.
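As a sanity check of Theorem 1 (our own sketch, not part of the paper), the following code verifies (2.1) for random real matrices, in the real symmetric case for simplicity: the difference between the right- and left-hand sides should be non-negative definite. The dimensions and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 8, 2, 3

M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                                  # A > 0
lam = np.linalg.eigvalsh(A)
bound = ((lam[-1] - lam[0]) / (lam[-1] + lam[0])) ** 2   # ((lambda_1 - lambda_n)/(lambda_1 + lambda_n))^2

X = rng.standard_normal((n, p))
# Build Y with columns orthogonal to the columns of X, so that X'Y = 0.
N = np.eye(n) - X @ np.linalg.pinv(X)
Y = N @ rng.standard_normal((n, q))

lhs = X.T @ A @ Y @ np.linalg.pinv(Y.T @ A @ Y) @ Y.T @ A @ X
rhs = bound * (X.T @ A @ X)

# (2.1): rhs - lhs should be non-negative definite (up to rounding).
print(np.linalg.eigvalsh(rhs - lhs).min() >= -1e-9)
```

Replacing `bound` by $((\kappa-1)/(\kappa+1))^2$ with $\kappa = \texttt{lam[-1]/lam[0]}$ gives the equivalent form stated in Corollary 1.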
Let $A$ be an $n \times n$ positive definite Hermitian matrix, partitioned as
\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \tag{2.13}
\]
where $A_{11}$ is $p \times p$, $A_{22}$ is $q \times q$ and $p + q = n$. The matrix $A_{11} - A_{12}A_{22}^{-1}A_{21}$ is said to be the Schur complement of $A_{22}$ in $A$ and is usually denoted by $(A/A_{22})$. The term "Schur complement" and the notation $(A/A_{22})$ were introduced by Haynsworth [6]. In the statistical literature, however, authors usually prefer the notation $A_{11\cdot2}$ to $(A/A_{22})$ for simplicity. The Schur complement is also very useful in statistics; Ouellette and Styan [13,17] presented an excellent survey of the Schur complement and its statistical applications. From Theorem 1 we shall obtain an important inequality concerning the Schur complement, which is a refinement of the well-known fact that $A_{11\cdot2} > 0$ for $A > 0$, due to Haynsworth [6].

Theorem 2. Let $A > 0$ with partition (2.13). Suppose that $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues of $A$, and let $\kappa = \lambda_1/\lambda_n$. Then
\[
A_{12}A_{22}^{-1}A_{21} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 A_{11}, \tag{2.14}
\]
or equivalently
\[
A_{11\cdot2} = A_{11} - A_{12}A_{22}^{-1}A_{21} \ge \frac{4\kappa}{(\kappa+1)^2}\,A_{11}. \tag{2.15}
\]
Proof. Inequality (2.14) follows from Theorem 1 by taking the $n \times p$ matrix $X$ to be $\begin{pmatrix} I_p \\ 0 \end{pmatrix}$ and the $n \times q$ matrix $Y$ to be $\begin{pmatrix} 0 \\ I_q \end{pmatrix}$. Subtracting both sides of (2.14) from $A_{11}$ and noting that $1 - \bigl(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\bigr)^2 = \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2} = \frac{4\kappa}{(\kappa+1)^2}$ leads to (2.15). $\Box$
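A quick numerical check of (2.14) and (2.15) on a random positive definite matrix (our own sketch, real symmetric case; block sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 7, 3                                     # A is n x n, A11 is p x p

M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)
lam = np.linalg.eigvalsh(A)
k1, kn = lam[-1], lam[0]
kappa = k1 / kn

A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]
schur = A11 - A12 @ np.linalg.solve(A22, A21)   # A_{11.2}

# (2.14): ((k1-kn)/(k1+kn))^2 * A11 - A12 A22^{-1} A21 >= 0
gap_214 = ((k1 - kn) / (k1 + kn)) ** 2 * A11 - (A11 - schur)
# (2.15): A_{11.2} - 4*kappa/(kappa+1)^2 * A11 >= 0
gap_215 = schur - 4 * kappa / (kappa + 1) ** 2 * A11

print(np.linalg.eigvalsh(gap_214).min() >= -1e-9,
      np.linalg.eigvalsh(gap_215).min() >= -1e-9)
```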
3. Applications to statistics

In this section we derive some inequalities concerning various correlation coefficients and covariance matrices, using the results obtained in the previous section.

(1) Correlation coefficients. We shall first consider the canonical correlations between the random vectors $u$ and $w$, which are $p \times 1$ and $q \times 1$, respectively, and have joint covariance matrix
\[
\mathrm{cov}\begin{pmatrix} u \\ w \end{pmatrix} = \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \tag{3.1}
\]
Applying Theorem 2 yields
\[
\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 \Sigma_{11}, \tag{3.2}
\]
where $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues of $\Sigma$. It follows from (3.2) that
\[
\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 I_p. \tag{3.3}
\]
Without loss of generality we may assume $p \le q$ and $\mathrm{rank}(\Sigma_{12}) = p$. Denote by $\rho_1 \ge \cdots \ge \rho_p$ the canonical correlation coefficients of $u$ and $w$. It follows from (3.3) that
\[
\rho_i \le \frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}, \quad i = 1, \ldots, p. \tag{3.4}
\]

Corollary 2.
\[
\rho_H = \prod_{i=1}^{p} \rho_i^2 \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^{2p} = \left(\frac{\kappa-1}{\kappa+1}\right)^{2p}, \tag{3.5}
\]
where $\rho_H$ is the well-known Hotelling [8] correlation coefficient and $\kappa = \lambda_1/\lambda_n$ is the condition number of $\Sigma$. Inequality (3.5) establishes a relation between the Hotelling correlation coefficient of two random vectors and the condition number (or eigenvalues) of their joint covariance matrix.
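The bound (3.4) is easy to check numerically. In the following sketch (our own, hypothetical example assuming NumPy; inv_sqrt is a helper we define, and the dimensions and seed are arbitrary), the canonical correlations are computed directly from a random covariance matrix and compared with $(\lambda_1-\lambda_n)/(\lambda_1+\lambda_n) = (\kappa-1)/(\kappa+1)$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 2, 3
n = p + q

M = rng.standard_normal((n, n))
Sigma = M @ M.T + np.eye(n)                        # joint covariance of (u, w)
lam = np.linalg.eigvalsh(Sigma)
bound = (lam[-1] - lam[0]) / (lam[-1] + lam[0])    # = (kappa - 1)/(kappa + 1)

def inv_sqrt(S):
    # Symmetric inverse square root of a positive definite matrix.
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S22 = Sigma[p:, p:]

# Canonical correlations: singular values of Sigma_11^{-1/2} Sigma_12 Sigma_22^{-1/2}.
rho = np.linalg.svd(inv_sqrt(S11) @ S12 @ inv_sqrt(S22), compute_uv=False)

print(rho, bound, np.all(rho <= bound + 1e-9))
```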
Now we move to the multiple correlation. If $p = 1$, we replace $\Sigma_{11}$, $\Sigma_{12}$ and $\Sigma_{21}$ in (3.1) by $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{21}$, respectively. It then follows from (3.3) that
\[
\rho_{u,w}^2 = \frac{\sigma_{12}'\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}}
\le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 = \left(\frac{\kappa-1}{\kappa+1}\right)^2, \tag{3.6}
\]
where $\rho_{u,w}$ ($\ge 0$) is the so-called multiple correlation coefficient.

Corollary 3.
\[
\rho_{u,w} \le \frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n} = \frac{\kappa-1}{\kappa+1}, \tag{3.7}
\]
and equivalently
\[
\kappa \ge \frac{1+\rho_{u,w}}{1-\rho_{u,w}}. \tag{3.8}
\]

Inequalities (3.7) and (3.8) establish interesting relations between the condition number of the covariance matrix and the multiple correlation of a random variable $u$ and a random vector $w$. When $\rho_{u,w} \to 1$, $\kappa$ goes to infinity, and vice versa. This fact shows that the condition number and the multiple correlation play the same role in regression diagnostics.

Similar results for simple correlations may be obtained directly from the Wielandt inequality (1.1). Suppose that $z = (z_1, \ldots, z_p)'$ is a random vector with covariance matrix $\mathrm{cov}(z) = \Sigma = (\sigma_{ij})$. Putting $A = \Sigma$, $x = (0, \ldots, 0, 1, 0, \ldots, 0)'$ and $y = (0, \ldots, 0, 1, 0, \ldots, 0)'$ in (1.1), where the 1 is at the $i$th and $j$th position, respectively, we obtain
\[
\frac{\sigma_{ij}^2}{\sigma_{ii}\sigma_{jj}} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 = \left(\frac{\kappa-1}{\kappa+1}\right)^2 \quad \text{for all } i \ne j,
\]
or equivalently

Corollary 4.
\[
|\rho_{ij}| \le \frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n} = \frac{\kappa-1}{\kappa+1}, \tag{3.9}
\]
where $\rho_{ij}$ denotes the simple correlation coefficient between $z_i$ and $z_j$ and $\kappa = \lambda_1/\lambda_n$ is the condition number of $\Sigma$. Similarly, we may rewrite (3.9) as
\[
\kappa \ge \frac{1+|\rho_{ij}|}{1-|\rho_{ij}|}. \tag{3.10}
\]
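For illustration (our own sketch, not from the paper, assuming NumPy), the following code checks (3.7) and (3.9) for a randomly generated covariance matrix; the matrix below is an arbitrary example, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 5

M = rng.standard_normal((p, p))
sigma = M @ M.T + np.eye(p)                     # covariance matrix of z
lam = np.linalg.eigvalsh(sigma)
kappa = lam[-1] / lam[0]
bound = (kappa - 1) / (kappa + 1)               # = (lambda_1 - lambda_p)/(lambda_1 + lambda_p)

# Simple correlations, checked against (3.9): |rho_ij| <= bound for i != j.
d = 1.0 / np.sqrt(np.diag(sigma))
corr = d[:, None] * sigma * d[None, :]
off = corr[~np.eye(p, dtype=bool)]
print(np.all(np.abs(off) <= bound + 1e-12))

# Multiple correlation of z_1 on (z_2, ..., z_p), checked against (3.7) and (3.8).
s11, s12, S22 = sigma[0, 0], sigma[0, 1:], sigma[1:, 1:]
rho_mult = np.sqrt(s12 @ np.linalg.solve(S22, s12) / s11)
print(rho_mult <= bound + 1e-12, kappa >= (1 + rho_mult) / (1 - rho_mult) - 1e-9)
```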
Inequality (3.9), or (3.10), characterizes an important relation between the simple correlation coefficient of any two components of a random vector and the condition number (or eigenvalues) of its covariance matrix.

(2) Covariance matrix. Suppose that $u$ and $w$ are $p \times 1$ and $q \times 1$ random vectors with joint covariance matrix as defined in (3.1), and that $S > 0$ (with probability 1), partitioned as
\[
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \tag{3.11}
\]
is an estimator of $\Sigma$, where $S_{11}$ is $p \times p$. In view of Theorem 2, we have
\[
S_{11\cdot2} \ge \frac{4\kappa}{(\kappa+1)^2}\,S_{11}, \tag{3.12}
\]
\[
S_{12}S_{22}^{-1}S_{21} \le \left(\frac{\kappa-1}{\kappa+1}\right)^2 S_{11}, \tag{3.13}
\]
where $\kappa$ is the condition number of $S$. Further, if $S$ has the Wishart distribution $W(n, \Sigma)$, then ([12, pp. 93, 117])
\[
S_{11\cdot2} \sim W_p(n - q, \Sigma_{11\cdot2}), \qquad S_{11} \sim W_p(n, \Sigma_{11}),
\]
and, further, when $\Sigma_{12} = 0$,
\[
S_{12}S_{22}^{-1}S_{21} \sim W_p(q, \Sigma_{11}).
\]
Hence (3.12) and (3.13) establish some interesting relations among these Wishart matrices. These matrices appear quite often in the statistical literature, in particular in linear models and multivariate analysis. In what follows we present an example to explain their practical application.
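A small simulation sketch (ours, with arbitrary dimensions, degrees of freedom and seed, assuming NumPy) that draws a Wishart-distributed $S$ and checks the bounds (3.12) and (3.13):

```python
import numpy as np

rng = np.random.default_rng(6)
p, q, df = 2, 3, 20
m = p + q

M = rng.standard_normal((m, m))
Sigma = M @ M.T + np.eye(m)                     # true covariance

# S ~ W(df, Sigma): sum of df outer products of N(0, Sigma) draws.
Z = rng.multivariate_normal(np.zeros(m), Sigma, size=df)
S = Z.T @ Z

lam = np.linalg.eigvalsh(S)
kappa = lam[-1] / lam[0]

S11, S12 = S[:p, :p], S[:p, p:]
S21, S22 = S[p:, :p], S[p:, p:]
S11_2 = S11 - S12 @ np.linalg.solve(S22, S21)   # S_{11.2}

gap_312 = S11_2 - 4 * kappa / (kappa + 1) ** 2 * S11
gap_313 = ((kappa - 1) / (kappa + 1)) ** 2 * S11 - (S11 - S11_2)

print(np.linalg.eigvalsh(gap_312).min() >= -1e-9,
      np.linalg.eigvalsh(gap_313).min() >= -1e-9)
```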
(3) Covariance adjusted estimator. Suppose that $T_1$ is a $p \times 1$ statistic and $T_2$ is a $q \times 1$ statistic with expectations $E(T_1) = \theta$ and $E(T_2) = 0$, respectively, and joint covariance matrix
\[
\mathrm{cov}\begin{pmatrix} T_1 \\ T_2 \end{pmatrix} = \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
\]
Assume that $\Sigma > 0$. If $\Sigma_{12} \ne 0$, then, as an unbiased estimator of $\theta$, $T_1$ can be improved. In fact, the estimator
\[
\theta^* = T_1 - \Sigma_{12}\Sigma_{22}^{-1}T_2
\]
is the best unbiased estimator in the class
\[
\mathcal{A} = \{\tilde{\theta} : \tilde{\theta} = T_1 + AT_2,\ E\tilde{\theta} = \theta,\ A \text{ is } p \times q \text{ non-random}\}
\]
[15]. $\theta^*$ is called the covariance adjusted estimator of $\theta$. This estimator has several optimality properties and applications in linear models [20,18]. It is obvious that
\[
\mathrm{cov}(\theta^*) = \Sigma_{11\cdot2} \le \mathrm{cov}(T_1).
\]
Thus the matrix
\[
\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \mathrm{cov}(T_1) - \mathrm{cov}(\theta^*)
\]
may be considered as a measure of the gain of the estimator $\theta^*$. Hence the matrices $S_{11}$, $S_{11\cdot2}$ and $S_{12}S_{22}^{-1}S_{21}$ are estimators of $\mathrm{cov}(T_1)$, $\mathrm{cov}(\theta^*)$ and the gain of $\theta^*$, respectively. Although $S_{11\cdot2} \le S_{11}$ always holds, (3.12) gives a lower bound for $S_{11\cdot2}$. On the other hand, (3.13) establishes an upper bound for the estimated gain of $\theta^*$. Both bounds are related to $S_{11}$ and the condition number of $S$. We may regard $|\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|/|\Sigma_{11}|$ as the relative gain of $\theta^*$, which may be estimated by $|S_{12}S_{22}^{-1}S_{21}|/|S_{11}|$. It follows from (3.13) that an upper bound for this estimated relative gain is given by the following corollary.

Corollary 5.
\[
\frac{|S_{12}S_{22}^{-1}S_{21}|}{|S_{11}|} \le \left(\frac{\kappa-1}{\kappa+1}\right)^{2p}, \tag{3.14}
\]
in which the upper bound tends to 1 as $\kappa \to \infty$.
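To make the gain concrete, here is a small simulation sketch (our own, hypothetical setup, assuming NumPy; the dimensions, true covariance and seed are arbitrary): $(T_1, T_2)$ is drawn jointly normal with known covariance, the covariance adjusted estimator visibly reduces the sampling variance of $T_1$, and the last line checks the relative-gain bound (3.14) on the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(7)
p, q, reps = 2, 2, 5000
m = p + q
theta = np.array([1.0, -2.0])

# A joint covariance with nonzero Sigma_12, so that T1 can be improved.
M = rng.standard_normal((m, m))
Sigma = M @ M.T + np.eye(m)
Sig12, Sig22 = Sigma[:p, p:], Sigma[p:, p:]

mean = np.concatenate([theta, np.zeros(q)])          # E(T1) = theta, E(T2) = 0
draws = rng.multivariate_normal(mean, Sigma, size=reps)
T1, T2 = draws[:, :p], draws[:, p:]

theta_adj = T1 - T2 @ np.linalg.solve(Sig22, Sig12.T)   # covariance adjusted estimator

print(np.cov(T1.T).trace(), np.cov(theta_adj.T).trace())  # adjusted variance is smaller

# Estimated relative gain and the bound (3.14), using the sample covariance S.
S = np.cov(draws.T)
lam = np.linalg.eigvalsh(S)
kappa = lam[-1] / lam[0]
s11, s12, s22 = S[:p, :p], S[:p, p:], S[p:, p:]
rel_gain = np.linalg.det(s12 @ np.linalg.solve(s22, s12.T)) / np.linalg.det(s11)
print(rel_gain <= ((kappa - 1) / (kappa + 1)) ** (2 * p) + 1e-12)
```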
Acknowledgements

S.-G. Wang's research was partially supported by grants from the Natural Science Foundation of China, a project of the Education Committee of Beijing and the Natural Science Foundation of Beijing. The second author's research was supported by a grant from the Research Committee of The Hong Kong Polytechnic University.
References

[1] A. Albert, The Gauss–Markov theorem for regression models with possibly singular covariances, SIAM J. Appl. Math. 24 (1973) 182–187.
[2] J.K. Baksalary, S. Puntanen, Generalized matrix versions of the Cauchy–Schwarz and Kantorovich inequalities, Aequationes Math. 41 (1991) 103–110.
[3] J.K. Baksalary, S. Puntanen, G.P.H. Styan, A property of the dispersion matrix of the best linear unbiased estimator in the general Gauss–Markov model, Sankhya A 52 (1990) 279–296.
[4] F.L. Bauer, A.S. Householder, Some inequalities involving the Euclidean condition of a matrix, Numerische Mathematik 2 (1960) 308–311.
[5] M.L. Eaton, A maximization problem and its applications to canonical correlation, J. Multivariate Anal. 6 (1976) 422–425.
[6] E.V. Haynsworth, Determination of the inertia of a partitioned Hermitian matrix, Linear Algebra Appl. 1 (1968) 73–81.
[7] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1985.
[8] H. Hotelling, Relations between two sets of variates, Biometrika 28 (1936) 321–377.
[9] A.S. Householder, The Theory of Matrices in Numerical Analysis, Reprint Edition, Dover, New York, pp. 81–85. Original version: Blaisdell, New York, 1964.
[10] A.W. Marshall, I. Olkin, Matrix versions of the Cauchy and Kantorovich inequalities, Aequationes Math. 40 (1990) 89–93.
[11] B. Mond, J.E. Pecaric, Matrix versions of some means inequalities, Aust. Math. Soc. Gaz. 20 (1993) 117–120.
[12] R.J. Muirhead, Aspects of Multivariate Statistical Theory, Wiley, New York, 1982.
[13] D.V. Ouellette, Schur complements and statistics, Linear Algebra Appl. 36 (1981) 187–295.
[14] J.E. Pecaric, S. Puntanen, G.P.H. Styan, Some further matrix extensions of the Cauchy–Schwarz and Kantorovich inequalities, with some statistical applications, Linear Algebra Appl. 237 (1996) 455–476.
[15] C.R. Rao, Least squares theory using an estimated dispersion matrix and its application to measurement of signals, in: L. Le Cam, J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 355–372.
[16] C.R. Rao, S.K. Mitra, Generalized Inverse of Matrices and its Applications, Wiley, New York, 1971.
[17] G.P.H. Styan, Schur complements and linear statistical models, in: T. Pukkila, S. Puntanen (Eds.), Proceedings of the First International Tampere Seminar on Linear Statistical Models and their Applications, University of Tampere, Finland, 1985, pp. 37–75.
[18] S.G. Wang, A new estimator of regression coefficients in seemingly unrelated regression system, Science in China A 10 (1988) 1033–1040.
[19] S.G. Wang, S.C. Chow, Advanced Linear Models, Marcel Dekker, New York, 1994.
[20] S.G. Wang, Z.H. Yang, Pitman optimalities of covariance improved estimators, Chinese Sci. Bull. 40 (1995) 12–15.