A matrix version of the Wielandt inequality and its applications to statistics

Song-Gui Wang a,b, Wai-Cheung Ip c,*

a Department of Applied Mathematics, Beijing Polytechnic University, Beijing 100022, People's Republic of China
b Institute of Applied Mathematics, Academia Sinica, Beijing 100080, People's Republic of China
c Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Linear Algebra and its Applications 296 (1999) 171–181

Received 18 December 1997; accepted 22 May 1999
Submitted by R.A. Brualdi

Abstract

Suppose that $A$ is an $n \times n$ positive definite Hermitian matrix. Let $X$ and $Y$ be $n \times p$ and $n \times q$ matrices, respectively, such that $X^*Y = 0$. The present article proves the following inequality,

$$X^*AY(Y^*AY)^-Y^*AX \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 X^*AX,$$

where $\lambda_1$ and $\lambda_n$ are respectively the largest and smallest eigenvalues of $A$, and $M^-$ stands for a generalized inverse of $M$. This inequality is an extension of the well-known Wielandt inequality, in which both $X$ and $Y$ are vectors. The inequality is utilized to obtain some interesting inequalities about the covariance matrix and various correlation coefficients, including the canonical, multiple and simple correlations. Some applications in parameter estimation are also given. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Wielandt inequality; Kantorovich inequality; Cauchy–Schwarz inequality; Canonical correlation; Condition number; Generalized inverse

* Corresponding author. Tel.: +852-27666953; Fax: +852-23629045; e-mail: mathipwc@polyu.edu.hk

0024-3795/99/$ – see front matter © 1999 Elsevier Science Inc. All rights reserved. PII: S0024-3795(99)00117-2


1. Introduction

Let $A$ be an $n \times n$ positive definite Hermitian matrix and $x$ and $y$ be $n \times 1$ vectors satisfying $x^*y = 0$. The Wielandt inequality (see, e.g., [4; 7, p. 443]) asserts that

$$|x^*Ay|^2 \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 (x^*Ax)(y^*Ay), \qquad (1.1)$$

where $\lambda_1 \ge \cdots \ge \lambda_n$ are the eigenvalues of $A$. Moreover, there exists an orthonormal pair of vectors $x$ and $y$ for which equality holds in (1.1). Let $u_1, \ldots, u_n$ be the orthonormal eigenvectors of $A$ corresponding to $\lambda_1 \ge \cdots \ge \lambda_n$. In fact, when $x = (u_1 + u_n)/\sqrt{2}$ and $y = (u_1 - u_n)/\sqrt{2}$, equality holds in (1.1). Eaton [5] rediscovered (1.1). Householder [9] observed that when $y = (x^*x)A^{-1}x - (x^*A^{-1}x)x$, (1.1) becomes the following Kantorovich inequality

$$\frac{x^*Ax \cdot x^*A^{-1}x}{(x^*x)^2} \le \frac{(\lambda_1+\lambda_n)^2}{4\lambda_1\lambda_n}; \qquad (1.2)$$

see, for example, [19, p. 44].

The Wielandt inequality (1.1) is an improvement on the general Cauchy–Schwarz inequality, which is

$$|x^*Ay|^2 \le (x^*Ax)(y^*Ay) \qquad (1.3)$$

for every pair of vectors $x$ and $y$. The last decade has witnessed considerable progress in the study of the Cauchy–Schwarz inequality (1.3), in particular in its matrix versions and their applications in statistics. Marshall and Olkin [10] and Baksalary and Puntanen [2,3] presented some matrix versions involving a positive definite or semidefinite matrix. Mond and Pečarić [11] and Pečarić et al. [14] derived some general Cauchy–Schwarz type inequalities. Thus there are several matrix versions of the Cauchy–Schwarz inequality (1.3) in the literature. In contrast, there has not yet been a matrix version of the Wielandt inequality.

The purpose of this paper is to present a matrix version of (1.1) and its applications to statistics. In Section 2, we derive a matrix version of the Wielandt inequality, together with other useful inequalities obtained as straightforward corollaries. Section 3 presents several statistical applications of our main results, involving some interesting inequalities about correlation coefficients and the covariance matrix.
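The scalar inequalities (1.1) and (1.2) are easy to probe numerically. The following is a minimal sketch, assuming NumPy and a randomly generated Hermitian positive definite $A$; it checks (1.1) for a random orthogonal pair and confirms that $x = (u_1+u_n)/\sqrt{2}$, $y = (u_1-u_n)/\sqrt{2}$ attains equality.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random Hermitian positive definite A (hypothetical test matrix).
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M @ M.conj().T + n * np.eye(n)

lam, U = np.linalg.eigh(A)               # eigenvalues in ascending order
lam_n, lam_1 = lam[0], lam[-1]
u_n, u_1 = U[:, 0], U[:, -1]
bound = ((lam_1 - lam_n) / (lam_1 + lam_n)) ** 2

# (1.1) for a random pair with x*y = 0.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = y - (x.conj() @ y) / (x.conj() @ x) * x          # enforce orthogonality
lhs = abs(x.conj() @ A @ y) ** 2
rhs = bound * (x.conj() @ A @ x).real * (y.conj() @ A @ y).real
print(lhs <= rhs + 1e-9)                              # True

# Equality at x = (u1 + un)/sqrt(2), y = (u1 - un)/sqrt(2).
xe = (u_1 + u_n) / np.sqrt(2)
ye = (u_1 - u_n) / np.sqrt(2)
lhs_e = abs(xe.conj() @ A @ ye) ** 2
rhs_e = bound * (xe.conj() @ A @ xe).real * (ye.conj() @ A @ ye).real
print(np.isclose(lhs_e, rhs_e))                       # True
```

Replacing $y$ in the same script by $(x^*x)A^{-1}x - (x^*A^{-1}x)x$ reproduces Householder's observation leading to the Kantorovich bound (1.2).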


2. A matrix version of the Wielandt inequality

For any $m \times n$ matrix $B$, we shall use $B^-$ to stand for a generalized inverse of it, i.e., $B^-$ satisfies $BB^-B = B$. $B^+$ denotes its unique Moore–Penrose inverse, which is a generalized inverse satisfying $B^+BB^+ = B^+$, $(BB^+)^* = BB^+$ and $(B^+B)^* = B^+B$. Readers may refer to [17,20] for more details. We shall use the following conventions throughout this paper: $\mathcal{M}(B)$ denotes the column space (range) of $B$ and $P_B = B(B^*B)^-B^*$ denotes the orthogonal projector onto $\mathcal{M}(B)$; $A \ge 0$ ($A > 0$) means that $A$ is a non-negative definite (positive definite) Hermitian matrix; and $A \ge B$ stands for $A \ge 0$, $B \ge 0$ and $A - B \ge 0$.

The main result of this paper is stated in the following theorem.

Theorem 1. Let $A > 0$ be an $n \times n$ matrix, $\lambda_1$ and $\lambda_n$ be its largest and smallest eigenvalues, $u_1$ and $u_n$ be the associated orthonormal eigenvectors, and $X$ and $Y$ be $n \times p$ and $n \times q$ matrices satisfying $X^*Y = 0$. Then

$$X^*AY(Y^*AY)^-Y^*AX \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 X^*AX \qquad (2.1)$$

with equality if

$$\frac{1}{\lambda_1}P_Xu_1 + \frac{1}{\lambda_n}P_Xu_n = c(u_1 + u_n) \qquad (2.2)$$

for some scalar $c$ and $\mathcal{M}(Y) = \mathcal{M}(I - P_X)$.

In order to prove Theorem 1, we need the following lemma.

Lemma 1.

$$AB^-C = AB^+C \quad \text{for all } B^- \qquad (2.3)$$

if and only if

$$\mathcal{M}(A^*) \subseteq \mathcal{M}(B^*) \quad \text{and} \quad \mathcal{M}(C) \subseteq \mathcal{M}(B). \qquad (2.4)$$

The proof of Lemma 1 can be found in [16, Lemma 2.2.4 and Complement 13, p. 437].

Remark 1. In view of Lemma 1, the left-hand side of (2.1) is invariant with respect to the choice of generalized inverse $(Y^*AY)^-$. Therefore (2.1) is equivalent to

$$X^*AY(Y^*AY)^+Y^*AX \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 X^*AX. \qquad (2.5)$$
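A minimal numerical sketch of Theorem 1, assuming NumPy and the real symmetric case for simplicity: draw $A > 0$, pick $X$ at random, build $Y$ from the orthogonal complement of $\mathcal{M}(X)$ so that $X^*Y = 0$, and verify that the gap in (2.5) is non-negative definite.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 8, 3, 4

M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # real symmetric positive definite
lam = np.linalg.eigvalsh(A)
bound = ((lam[-1] - lam[0]) / (lam[-1] + lam[0])) ** 2

X = rng.standard_normal((n, p))
# Columns of Y are drawn from the orthogonal complement of the column
# space of X, so that X'Y = 0 as Theorem 1 requires.
Q, _ = np.linalg.qr(X, mode='complete')
Y = Q[:, p:] @ rng.standard_normal((n - p, q))

# Left-hand side of (2.5), using the Moore-Penrose inverse.
lhs = X.T @ A @ Y @ np.linalg.pinv(Y.T @ A @ Y) @ Y.T @ A @ X
rhs = bound * (X.T @ A @ X)

# The gap should be non-negative definite (allow tiny numerical negatives).
print(np.linalg.eigvalsh(rhs - lhs).min() >= -1e-8)    # True
```

The Moore–Penrose inverse is used here, which by Remark 1 (via Lemma 1) gives the same value as any other generalized inverse of $Y^*AY$.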


Lemma 2. Let $A$ be an $n \times n$ matrix and $B$ be an $m \times n$ matrix such that $B^*B = I_n$. Then

$$(BAB^*)^+ = BA^+B^*.$$

This may be proved by direct verification.

Lemma 3. Let $A \ge 0$ be $n \times n$, $X$ be an $n \times p$ matrix and $\mathcal{M}(X) \subseteq \mathcal{M}(A)$. Then

$$X(X^*A^-X)^-X^* = A - AN(NAN)^-NA, \qquad (2.6)$$

where $N = I - P_X$.

Both sides of (2.6) are invariant with respect to the choice of the involved generalized inverses. In fact they are two equivalent expressions of the covariance matrix of the best linear unbiased estimator of $X\beta$ in the linear model $y = X\beta + \varepsilon$, $E(\varepsilon) = 0$, $\mathrm{cov}(\varepsilon) = A$, where $\mathcal{M}(X) \subseteq \mathcal{M}(A)$. This result is available in the statistical literature (e.g., [1, p. 162] and (3.2) of [3, p. 287]).

We are now in a position to prove Theorem 1.

Proof of Theorem 1. Pre- and post-multiplying (2.1) by $X(X^*X)^-$ and $(X^*X)^-X^*$, respectively, yields

$$P_XAY(Y^*AY)^-Y^*AP_X \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 P_XAP_X. \qquad (2.7)$$

It is obvious that (2.1) and (2.7) are equivalent. It is therefore sufficient to prove the latter. It is in turn sufficient to show that for any $a \in \mathbb{C}^n$

$$a^*P_XAY(Y^*AY)^-Y^*AP_Xa \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 a^*P_XAP_Xa,$$

or equivalently

$$\frac{a^*P_XAY(Y^*AY)^-Y^*AP_Xa}{a^*P_XAP_Xa} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2.$$

Since both $\lambda_1$ and $\lambda_n$ are positive,

$$\left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 \le 1.$$

Thus the above inequality is also equivalent to

$$1 - \frac{a^*P_XAY(Y^*AY)^-Y^*AP_Xa}{a^*P_XAP_Xa} \ge 1 - \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2,$$


or

$$\frac{a^*P_XAP_Xa - a^*P_XAY(Y^*AY)^-Y^*AP_Xa}{a^*P_XAP_Xa} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}.$$

Note that

$$N(NAN)^-N \ge Y(Y^*AY)^-Y^*,$$

where $N = I - P_X$; thus it suffices to show

$$\frac{a^*P_XAP_Xa - a^*P_XAN(NAN)^-NAP_Xa}{a^*P_XAP_Xa} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}. \qquad (2.8)$$

By Lemma 3, (2.8) reduces to

$$\frac{a^*X(X^*A^{-1}X)^-X^*a}{a^*P_XAP_Xa} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}. \qquad (2.9)$$

To prove (2.9), consider the singular value decomposition of $X$,

$$X = HDG^*,$$

where $H$ is an $n \times r$ matrix with $H^*H = I_r$, $G$ is a $p \times r$ matrix with $G^*G = I_r$, $D$ is an $r \times r$ diagonal matrix with positive diagonal elements, and $r = \mathrm{rank}(X)$. In view of Lemma 2, it can be shown that

$$X(X^*A^{-1}X)^-X^* = H(H^*A^{-1}H)^{-1}H^*.$$

Since $P_X = HH^*$, if we put $t = H^*a$, then (2.9) becomes

$$\frac{t^*(H^*A^{-1}H)^{-1}t}{t^*H^*AHt} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}. \qquad (2.10)$$

It can be shown using the Cauchy–Schwarz inequality (1.3) that

$$(t^*t)^2 \le t^*H^*A^{-1}Ht \cdot t^*(H^*A^{-1}H)^{-1}t,$$

with equality holding if and only if

$$H^*A^{-1}Ht = ct \qquad (2.11)$$

for some scalar $c$. Hence we have

$$\frac{t^*(H^*A^{-1}H)^{-1}t}{t^*H^*AHt} \ge \frac{(t^*t)^2}{t^*H^*AHt \cdot t^*H^*A^{-1}Ht} = \frac{(x^*x)^2}{x^*Ax \cdot x^*A^{-1}x} \ge \frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}, \qquad (2.12)$$


where $x = Ht$ and $x^*x = t^*H^*Ht = t^*t$, and the last inequality follows from the Kantorovich inequality (see, e.g., [7, p. 452] or [19, p. 44]). Since equality holds in (2.12) when $x = (u_1 + u_n)/\sqrt{2}$, (2.2) follows from (2.11). This completes the proof of Theorem 1.

Remark 2. If $p = q = 1$, (2.1) reduces to (1.1).

Remark 3. The Cauchy–Schwarz counterpart of (2.1) is

$$X^*AY(Y^*AY)^-Y^*AX \le X^*AX,$$

which holds for any $n \times p$ matrix $X$ and $n \times q$ matrix $Y$.

Corollary 1.

$$X^*AY(Y^*AY)^-Y^*AX \le \left(\frac{\kappa-1}{\kappa+1}\right)^2 X^*AX,$$

where $\kappa = \lambda_1/\lambda_n$ is the so-called condition number of $A$.

Let $A$ be an $n \times n$ positive definite Hermitian matrix partitioned as

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad (2.13)$$

where $A_{11}$ is $p \times p$ and $A_{22}$ is $q \times q$, $p + q = n$. The matrix $A_{11} - A_{12}A_{22}^{-1}A_{21}$ is said to be the Schur complement of $A_{22}$ in $A$ and is usually denoted by $(A/A_{22})$. The term "Schur complement" and the notation $(A/A_{22})$ were introduced by Haynsworth [6]. In the statistical literature, however, authors usually prefer the notation $A_{11\cdot 2}$ to $(A/A_{22})$ for simplicity. The Schur complement is also very useful in statistics. Ouellette [13] and Styan [17] presented excellent surveys of the Schur complement and its statistical applications. From Theorem 1, we obtain an important inequality concerning the Schur complement, which is a refinement of the well-known fact that $A_{11\cdot 2} > 0$ for $A > 0$, due to Haynsworth [6].

Theorem 2. Let $A > 0$ with partition (2.13). Suppose that $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues of $A$ and let $\kappa = \lambda_1/\lambda_n$. Then

$$A_{12}A_{22}^{-1}A_{21} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 A_{11}, \qquad (2.14)$$

or equivalently

$$A_{11\cdot 2} = A_{11} - A_{12}A_{22}^{-1}A_{21} \ge \frac{4\kappa}{(\kappa+1)^2}A_{11}. \qquad (2.15)$$


Proof. Inequality (2.14) follows from Theorem 1 by taking the $n \times p$ matrix $X$ to be $\begin{pmatrix} I_p \\ 0 \end{pmatrix}$ and the $n \times q$ matrix $Y$ to be $\begin{pmatrix} 0 \\ I_q \end{pmatrix}$. Subtracting both sides of (2.14) from $A_{11}$ leads to (2.15). □

3. Applications to statistics

In this section we derive some inequalities concerning various correlation coefficients and covariance matrices using the results obtained in the previous section.

(1) Correlation coefficients. We first consider the canonical correlations between the random vectors $u$ and $w$, which are $p \times 1$ and $q \times 1$ respectively and have joint covariance matrix

$$\mathrm{cov}\begin{pmatrix} u \\ w \end{pmatrix} = \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \qquad (3.1)$$

Applying Theorem 2 yields

$$\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 \Sigma_{11}, \qquad (3.2)$$

where $\lambda_1$ and $\lambda_n$ are the largest and smallest eigenvalues of $\Sigma$. It follows from (3.2) that

$$\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 I_p. \qquad (3.3)$$

Without loss of generality we may assume $p \le q$ and $\mathrm{rank}(\Sigma_{12}) = p$. Denote by $\rho_1 \ge \cdots \ge \rho_p$ the canonical correlation coefficients of $u$ and $w$. It follows from (3.3) that

$$\rho_i \le \frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}, \qquad i = 1, \ldots, p. \qquad (3.4)$$

Corollary 2.

$$\rho_H = \prod_{i=1}^{p} \rho_i^2 \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^{2p} = \left(\frac{\kappa-1}{\kappa+1}\right)^{2p}, \qquad (3.5)$$

where $\rho_H$ is the well-known Hotelling [8] correlation coefficient and $\kappa = \lambda_1/\lambda_n$ is the condition number of $\Sigma$. Inequality (3.5) establishes a relation between the Hotelling correlation coefficient of two random vectors and the condition number (or eigenvalues) of their joint covariance matrix.
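A minimal simulation sketch of (3.4) and (3.5), assuming NumPy and a randomly generated joint covariance matrix $\Sigma$: the squared canonical correlations are computed as the eigenvalues of $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}$, cf. (3.3), and compared with the bound $(\kappa-1)/(\kappa+1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 2, 3
n_dim = p + q

M = rng.standard_normal((n_dim, n_dim))
Sigma = M @ M.T + n_dim * np.eye(n_dim)      # joint covariance of (u, w), positive definite

Sig11, Sig12 = Sigma[:p, :p], Sigma[:p, p:]
Sig21, Sig22 = Sigma[p:, :p], Sigma[p:, p:]

lam = np.linalg.eigvalsh(Sigma)
kappa = lam[-1] / lam[0]
bound = (kappa - 1) / (kappa + 1)

# Sigma11^{-1/2} via the spectral decomposition of Sigma11.
d, V = np.linalg.eigh(Sig11)
Sig11_inv_half = V @ np.diag(d ** -0.5) @ V.T

# Squared canonical correlations = eigenvalues of
# Sigma11^{-1/2} Sigma12 Sigma22^{-1} Sigma21 Sigma11^{-1/2}, cf. (3.3).
C = Sig11_inv_half @ Sig12 @ np.linalg.solve(Sig22, Sig21) @ Sig11_inv_half
rho_sq = np.clip(np.linalg.eigvalsh(C), 0.0, None)
rho = np.sqrt(rho_sq)

print(np.all(rho <= bound + 1e-12))          # (3.4)
print(np.prod(rho_sq) <= bound ** (2 * p))   # (3.5), Hotelling-type product bound
```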


Now we move to the multiple correlation. If $p = 1$, we replace $\Sigma_{11}$, $\Sigma_{12}$ and $\Sigma_{21}$ in (3.1) by $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{21}$, respectively. It then follows from (3.3) that

$$\rho_{u,w}^2 = \frac{\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 = \left(\frac{\kappa-1}{\kappa+1}\right)^2, \qquad (3.6)$$

where $\rho_{u,w}$ $(> 0)$ is the so-called multiple correlation coefficient.

Corollary 3.

$$\rho_{u,w} \le \frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n} = \frac{\kappa-1}{\kappa+1} \qquad (3.7)$$

and equivalently

$$\kappa \ge \frac{1+\rho_{u,w}}{1-\rho_{u,w}}. \qquad (3.8)$$

Inequalities (3.7) and (3.8) establish interesting relations between the condition number of the covariance matrix and the multiple correlation of a random variable $u$ and a random vector $w$. When $\rho_{u,w} \to 1$, $\kappa$ goes to infinity, and vice versa. This fact shows that the condition number and the multiple correlation play the same role in regression diagnostics.

Similar results may be obtained directly from the Wielandt inequality (1.1) for simple correlations. Suppose that $z = (z_1, \ldots, z_p)'$ is a random vector with covariance matrix $\mathrm{cov}(z) = \Sigma = (\sigma_{ij})$. Putting $A = \Sigma$, $x = (0, \ldots, 0, 1, 0, \ldots, 0)'$ and $y = (0, \ldots, 0, 1, 0, \ldots, 0)'$ in (1.1), where the 1 is at the $i$th and $j$th positions, respectively, we obtain

$$\frac{\sigma_{ij}^2}{\sigma_{ii}\sigma_{jj}} \le \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^2 = \left(\frac{\kappa-1}{\kappa+1}\right)^2 \quad \text{for all } i \ne j,$$

or equivalently

Corollary 4.

$$|\rho_{ij}| \le \frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n} = \frac{\kappa-1}{\kappa+1}, \qquad (3.9)$$

where $\rho_{ij}$ denotes the simple correlation coefficient between $z_i$ and $z_j$ and $\kappa = \lambda_1/\lambda_n$ is the condition number of $\Sigma$. Similarly, we may rewrite (3.9) as

$$\kappa \ge \frac{1+|\rho_{ij}|}{1-|\rho_{ij}|}. \qquad (3.10)$$
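Corollaries 3 and 4 can also be read in reverse: any observed correlation yields the lower bound $(1+\rho)/(1-\rho)$ on the condition number of the covariance matrix. A minimal sketch, assuming NumPy and a randomly generated covariance matrix, checks (3.9) and the resulting bound (3.10) over all pairs of components.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 5

M = rng.standard_normal((p, p))
Sigma = M @ M.T + 0.5 * np.eye(p)            # covariance matrix of z
lam = np.linalg.eigvalsh(Sigma)
kappa = lam[-1] / lam[0]

# Simple correlations rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj).
d = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(d, d)
off = np.abs(R[np.triu_indices(p, k=1)])     # all off-diagonal correlations

# Corollary 4: |rho_ij| <= (kappa - 1)/(kappa + 1).
print(np.all(off <= (kappa - 1) / (kappa + 1) + 1e-12))      # True

# Equivalently, (3.10): every correlation gives a lower bound on kappa.
print(kappa >= (1 + off.max()) / (1 - off.max()))            # True
```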


Inequalities (3.9) and (3.10) characterize an important relation between the simple correlation coefficient of any two components of a random vector and the condition number (or eigenvalues) of its covariance matrix.

(2) Covariance matrix. Suppose that $u$ and $w$ are $p \times 1$ and $q \times 1$ random vectors with joint covariance matrix as defined in (3.1), and that $S > 0$ (with probability 1), partitioned as

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad (3.11)$$

is an estimator of $\Sigma$, where $S_{11}$ is $p \times p$. In view of Theorem 2, we have

$$S_{11\cdot 2} \ge \frac{4\kappa}{(\kappa+1)^2}S_{11}, \qquad (3.12)$$

$$S_{12}S_{22}^{-1}S_{21} \le \left(\frac{\kappa-1}{\kappa+1}\right)^2 S_{11}, \qquad (3.13)$$

where $\kappa$ is the condition number of $S$. Further, if $S$ has the Wishart distribution $W(n, \Sigma)$, then ([12, pp. 93, 117])

$$S_{11\cdot 2} \sim W_p(n-q, \Sigma_{11\cdot 2}), \qquad S_{11} \sim W_p(n, \Sigma_{11}),$$

and, when $\Sigma_{12} = 0$,

$$S_{12}S_{22}^{-1}S_{21} \sim W_p(q, \Sigma_{11}).$$

Hence (3.12) and (3.13) establish some interesting relations among these Wishart matrices. These matrices appear quite often in the statistical literature, in particular in linear models and multivariate analysis. In what follows we present an example to explain their practical applications.

(3) Covariance adjusted estimator. Suppose that $T_1$ is a $p \times 1$ statistic and $T_2$ is a $q \times 1$ statistic with expectations $E(T_1) = \theta$ and $E(T_2) = 0$, respectively, and joint covariance matrix

$$\mathrm{cov}\begin{pmatrix} T_1 \\ T_2 \end{pmatrix} = \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

Assume that $\Sigma > 0$. If $\Sigma_{12} \ne 0$, then, as an unbiased estimator of $\theta$, $T_1$ can be improved. In fact, the estimator

$$\theta^* = T_1 - \Sigma_{12}\Sigma_{22}^{-1}T_2$$


is the best unbiased estimator in the class

$$\mathcal{A} = \left\{\tilde{\theta} : \tilde{\theta} = T_1 + AT_2,\ E\tilde{\theta} = \theta,\ A \text{ is } p \times q \text{ non-random}\right\}$$

[15]. $\theta^*$ is called the covariance adjusted estimator of $\theta$. This estimator has several optimality properties and applications in linear models [18,20]. It is obvious that

$$\mathrm{cov}(\theta^*) = \Sigma_{11\cdot 2} \le \mathrm{cov}(T_1).$$

Thus the matrix

$$\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \mathrm{cov}(T_1) - \mathrm{cov}(\theta^*)$$

may be considered as a measure of the gain of the estimator $\theta^*$. Hence the matrices $S_{11}$, $S_{11\cdot 2}$ and $S_{12}S_{22}^{-1}S_{21}$ are estimators of $\mathrm{cov}(T_1)$, $\mathrm{cov}(\theta^*)$ and the gain of $\theta^*$, respectively. Although $S_{11\cdot 2} \le S_{11}$ always holds, (3.12) gives a lower bound for $S_{11\cdot 2}$. On the other hand, (3.13) establishes an upper bound for the estimated gain of $\theta^*$. Both bounds are related to $S_{11}$ and the condition number of $S$. We may regard $|\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|/|\Sigma_{11}|$ as the relative gain of $\theta^*$, which may be estimated by $|S_{12}S_{22}^{-1}S_{21}|/|S_{11}|$. It follows from (3.13) that an upper bound for this estimated relative gain is given by the following corollary.

Corollary 5.

$$\frac{|S_{12}S_{22}^{-1}S_{21}|}{|S_{11}|} \le \left(\frac{\kappa-1}{\kappa+1}\right)^{2p}, \qquad (3.14)$$

in which the upper bound tends to 1 as $\kappa \to \infty$.

Acknowledgements

S.-G. Wang's research was partially supported by grants from the Natural Science Foundation of China, a project of the Education Committee of Beijing, and the Natural Science Foundation of Beijing. The second author's research was supported by a grant from the Research Committee of The Hong Kong Polytechnic University.

References

[1] A. Albert, The Gauss–Markov theorem for regression models with possibly singular covariances, SIAM J. Appl. Math. 24 (1973) 182–187.
[2] J.K. Baksalary, S. Puntanen, Generalized matrix versions of the Cauchy–Schwarz and Kantorovich inequalities, Aequationes Math. 41 (1991) 103–110.


[3] J.K. Baksalary, S. Puntanen, G.P.H. Styan, A property of the dispersion matrix of the best linear unbiased estimator in the general Gauss–Markov model, Sankhya A 52 (1990) 279–296.
[4] F.L. Bauer, A.S. Householder, Some inequalities involving the Euclidean condition of a matrix, Numerische Mathematik 2 (1960) 308–311.
[5] M.L. Eaton, A maximization problem and its applications to canonical correlation, J. Multivariate Anal. 6 (1976) 422–425.
[6] E.V. Haynsworth, Determination of the inertia of a partitioned Hermitian matrix, Linear Algebra Appl. 1 (1968) 73–81.
[7] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1985.
[8] H. Hotelling, Relations between two sets of variates, Biometrika 28 (1936) 321–377.
[9] A.S. Householder, The Theory of Matrices in Numerical Analysis, Reprint Edition, Dover, New York, pp. 81–85. Original version: Blaisdell, New York, 1964.
[10] A.W. Marshall, I. Olkin, Matrix versions of the Cauchy and Kantorovich inequalities, Aequationes Math. 40 (1990) 89–93.
[11] B. Mond, J.E. Pečarić, Matrix versions of some means inequalities, Aust. Math. Soc. Gaz. 20 (1993) 117–120.
[12] R.J. Muirhead, Aspects of Multivariate Statistical Theory, Wiley, New York, 1982.
[13] D.V. Ouellette, Schur complements and statistics, Linear Algebra Appl. 36 (1981) 187–295.
[14] J.E. Pečarić, S. Puntanen, G.P.H. Styan, Some further matrix extensions of the Cauchy–Schwarz and Kantorovich inequalities, with some statistical applications, Linear Algebra Appl. 237 (1996) 455–476.
[15] C.R. Rao, Least squares theory using an estimated dispersion matrix and its application to measurement of signals, in: L. Le Cam, J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 355–372.
[16] C.R. Rao, S.K. Mitra, Generalized Inverse of Matrices and Its Applications, Wiley, New York, 1971.
[17] G.P.H. Styan, Schur complements and linear statistical models, in: T. Pukkila, S. Puntanen (Eds.), Proceedings of the First International Tampere Seminar on Linear Statistical Models and their Applications, University of Tampere, Finland, 1985, pp. 37–75.
[18] S.G. Wang, A new estimator of regression coefficients in seemingly unrelated regression system, Science in China A 10 (1988) 1033–1040.
[19] S.G. Wang, S.C. Chow, Advanced Linear Models, Marcel Dekker, New York, 1994.
[20] S.G. Wang, Z.H. Yang, Pitman optimalities of covariance improved estimators, Chinese Sci. Bull. 40 (1995) 12–15.