Asymptotic results in canonical discriminant analysis when the dimension is large compared to the sample size


Journal of Statistical Planning and Inference 138 (2008) 3457 -- 3466


Yasunori Fujikoshi (a,*), Tetsuto Himeno (b), Hirofumi Wakaki (b)

(a) Department of Mathematics, Faculty of Science and Engineering, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
(b) Department of Mathematics, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8524, Japan

ARTICLE INFO

Article history: Received 23 July 2005; accepted 24 September 2006; available online 15 March 2008.

MSC: primary 62H10; secondary 62E20.

Keywords: Asymptotic null and non-null distributions; Canonical discriminant analysis; Characteristic roots; High-dimensional framework; Tests for dimensionality

ABSTRACT

This paper deals with the distributions of test statistics for the number of useful discriminant functions and of the characteristic roots in canonical discriminant analysis. These asymptotic distributions have been extensively studied when the number $p$ of variables is fixed, the number $q+1$ of groups is fixed, and the sample size $N$ tends to infinity. However, these approximations become increasingly inaccurate as $p$ increases for a fixed value of $N$. On the other hand, we often need to analyze high-dimensional data in which $p$ is large compared to $n$. The purpose of the present paper is to derive asymptotic distributions of these statistics in a high-dimensional framework such that $q$ is fixed, $p \to \infty$, $m = n - p + q \to \infty$, and $p/n \to c \in (0, 1)$, where $n = N - q - 1$. Numerical simulation revealed that our new asymptotic approximations are more accurate than the classical asymptotic approximations over a considerably wide range of $(n, p, q)$. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

Canonical discriminant analysis is a statistical procedure designed to discriminate between several different groups in terms of a few discriminant functions, based on given multivariate observations on individuals in each population. Suppose that we have $q+1$ $p$-variate normal populations $N_p(\mu_i, \Sigma)$ $(i = 1, \ldots, q+1)$, and that $n_i$ observations $x_{ij}$ $(j = 1, \ldots, n_i)$ are available from the $i$th population. Let $S_h$ and $S_e$ be the matrices of sums of squares and products due to between groups and within groups, respectively. They are defined as

$$S_h = \sum_{i=1}^{q+1} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})', \qquad S_e = \sum_{i=1}^{q+1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)',$$

where

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{q+1}\sum_{j=1}^{n_i} x_{ij}, \qquad N = n_1 + \cdots + n_{q+1}.$$
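As a quick numerical illustration (our own NumPy sketch, not from the paper; the group sizes and mean vectors below are invented for the example), $S_h$, $S_e$ and the characteristic roots of $S_h S_e^{-1}$ can be computed directly from simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
p, ns = 5, [30, 30, 40]  # p variables, q + 1 = 3 groups (illustrative sizes)
mus = [np.zeros(p), np.r_[2.0, np.zeros(p - 1)], np.r_[0.0, 1.0, np.zeros(p - 2)]]

# Simulate x_ij ~ N_p(mu_i, I_p) for each group i.
xs = [m + rng.standard_normal((n, p)) for m, n in zip(mus, ns)]

N = sum(ns)
xbar_i = [x.mean(axis=0) for x in xs]               # group means
xbar = sum(n * m for n, m in zip(ns, xbar_i)) / N   # grand mean

# Between-groups and within-groups sums of squares and products.
Sh = sum(n * np.outer(m - xbar, m - xbar) for n, m in zip(ns, xbar_i))
Se = sum((x - m).T @ (x - m) for x, m in zip(xs, xbar_i))

# Characteristic roots l_1 >= ... >= l_p of Sh Se^{-1};
# at most t = min(p, q) of them are non-zero.
ell = np.sort(np.linalg.eigvals(Sh @ np.linalg.inv(Se)).real)[::-1]
print(ell)
```

With $q = 2$ here, only the first two roots are non-zero, matching the rank bound on $S_h$.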


doi:10.1016/j.jspi.2006.09.028


Note that $S_h$ and $S_e$ are independently distributed as a non-central Wishart distribution $W_p(q, \Sigma; \Sigma^{1/2}\Omega\Sigma^{1/2})$ and a central Wishart distribution $W_p(n, \Sigma)$, respectively, where

$$\Sigma^{1/2}\Omega\Sigma^{1/2} = \sum_{i=1}^{q+1} n_i (\mu_i - \bar{\mu})(\mu_i - \bar{\mu})', \tag{1.1}$$

$$\bar{\mu} = \frac{1}{N}(n_1\mu_1 + \cdots + n_{q+1}\mu_{q+1}), \qquad n = N - q - 1.$$

In canonical discriminant analysis, it is of interest to determine the number of discriminant functions necessary to describe group differences. This number may be defined as $\mathrm{rank}(\Omega)$, and is called the dimension in canonical discriminant analysis. For testing

$$H_0: \mathrm{rank}(\Omega) = k \quad \text{vs.} \quad H_1: \mathrm{rank}(\Omega) \geq k + 1, \tag{1.2}$$

the following test statistics have been considered (see, e.g., Siotani et al., 1985):

$$\mathrm{LR} = \log \prod_{j=k+1}^{t} (1 + \ell_j), \qquad \mathrm{LH} = \sum_{j=k+1}^{t} \ell_j, \qquad \mathrm{BNP} = \sum_{j=k+1}^{t} \frac{\ell_j}{1 + \ell_j}, \tag{1.3}$$

where $k < t$, $t = \min(p, q)$, and $\ell_i$ $(i = 1, \ldots, p)$ are the characteristic roots of $S_h S_e^{-1}$ satisfying $\ell_1 > \cdots > \ell_t > \ell_{t+1} = \cdots = \ell_p = 0$. The $i$th characteristic root $\ell_i$ measures the discriminant ability of the $i$th discriminant function. In the following we consider the case $q \leq p$, since we are interested in the case that $p$ is large; therefore $t = q$. Further, we assume that $S_e$ is non-singular, which is equivalent to assuming $n = N - q - 1 \geq p$.

This paper is concerned with the asymptotic distributions of the three test statistics and of the characteristic roots $\ell_i$. These asymptotic distributions have been extensively studied (see, e.g., Hsu, 1941; Anderson, 1951; Lawley, 1959; Fujikoshi, 1977) when $p$ and $q$ are fixed and $n$ tends to infinity. However, these approximations become increasingly inaccurate as $p$ increases for a fixed value of $N$. On the other hand, we often need to analyze high-dimensional data in which $p$ is large compared to $n$. Such data arise even in ordinary multivariate data, and in micro-array data after the number of gene expressions has been reduced. The purpose of the present paper is to derive asymptotic distributions of these statistics under a high-dimensional framework such that

$$(\mathrm{A1}):\quad q \text{ fixed}, \quad p \to \infty, \quad m = n - p + q \to \infty, \quad \text{and} \quad p/n \to c \in (0, 1). \tag{1.4}$$

The high-dimensional framework is, for example, realized by setting $p = cb$ and $n = b$ with $0 < c < 1$ and letting $b \to \infty$. Numerical simulation revealed that our new asymptotic approximations are more accurate than the classical asymptotic approximations over a considerably wide range of $(n, p, q)$. In the special case $k = 0$, Wakaki et al. (2006) obtained asymptotic expansions of the distributions of the three test statistics.

It may be noted that there are some related works in a high-dimensional framework. For example, asymptotic distributions of the linear discriminant function for $p/n \to c \in (0, 1)$ have been studied extensively in the Russian literature (see Raudys and Young, 2004). The case $p/n \to \infty$ was studied by Bickel and Levina (2004). Asymptotic behaviors of the characteristic roots of Wishart matrices have been studied by Bai (1999) and the references therein. Johnstone (2001) derived an asymptotic distribution of the largest characteristic root of a matrix having a Wishart distribution $W_p(n, I_p)$. However, there has been no work on high-dimensional asymptotic theory of test statistics for the number of useful discriminant functions and of the characteristic roots in canonical discriminant analysis.

2. Preliminaries

Since we deal with the distribution of a function of the characteristic roots of $S_h S_e^{-1}$, without loss of generality we may transform $S_h$ and $S_e$ as

$$\tilde{S}_h = Q' S_h Q \quad \text{and} \quad \tilde{S}_e = Q' S_e Q$$

by a non-singular matrix $Q$. Note that

$$\Sigma^{-1/2} S_h \Sigma^{-1/2} \sim W_p(q, I_p; \Omega), \qquad \Sigma^{-1/2} S_e \Sigma^{-1/2} \sim W_p(n, I_p).$$


Let $\omega_1, \ldots, \omega_p$ ($\omega_1 \geq \cdots \geq \omega_p$) be the characteristic roots of $\Omega$; then $\omega_{q+1} = \cdots = \omega_p = 0$ since $\mathrm{rank}(\Omega) \leq q$. Let $H$ be an orthogonal matrix such that

$$H'\Omega H = \begin{bmatrix} \omega_1 & & 0 \\ & \ddots & \\ 0 & & \omega_p \end{bmatrix} = MM',$$

where $M$ is a $p \times q$ matrix defined by

$$M = \begin{bmatrix} \sqrt{\omega_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{\omega_q} \\ & 0 & \end{bmatrix}.$$

Letting $\tilde{S}_h = H'\Sigma^{-1/2} S_h \Sigma^{-1/2} H$ and $\tilde{S}_e = H'\Sigma^{-1/2} S_e \Sigma^{-1/2} H$, we have that $\tilde{S}_h$ and $\tilde{S}_e$ are independent,

$$\tilde{S}_h \sim W_p(q, I_p; MM'), \qquad \tilde{S}_e \sim W_p(n, I_p),$$

and the characteristic roots of $S_h S_e^{-1}$ are equal to those of $\tilde{S}_h \tilde{S}_e^{-1}$. Note that we can express $\tilde{S}_h$ as

$$\tilde{S}_h = ZZ', \qquad Z \sim N_{p \times q}(M, I_p \otimes I_q).$$

Based on Wakaki et al. (2006), we have the following lemma.

Lemma 2.1. Let $Z \sim N_{p \times q}(M, I_p \otimes I_q)$,

$$B = Z'Z \quad \text{and} \quad W = B^{1/2}(Z'\tilde{S}_e^{-1}Z)^{-1}B^{1/2}.$$

Then $B$ and $W$ are independently distributed as a non-central Wishart distribution $W_q(p, I_q; D_\omega)$ and a central Wishart distribution $W_q(m, I_q)$, respectively, where $m = n - p + q$ and the non-centrality matrix is given by $D_\omega = M'M = \mathrm{diag}(\omega_1, \ldots, \omega_q)$. The characteristic roots of $Z'\tilde{S}_e^{-1}Z$ are equal to those of $BW^{-1}$. Further, the non-zero characteristic roots of $Z'\tilde{S}_e^{-1}Z$ are equal to those of $\tilde{S}_h\tilde{S}_e^{-1}$.

Therefore we may consider the non-zero characteristic roots of $BW^{-1}$ instead of those of $S_h S_e^{-1}$. By Lemma 2.1, $B$ and $W$ are independently distributed as a non-central Wishart distribution $W_q(p, I_q; D_\omega)$ and a central Wishart distribution $W_q(m, I_q)$, respectively, and the diagonal elements $\omega_1, \ldots, \omega_q$ of $D_\omega$ are the non-zero characteristic roots of $\Omega$ in (1.1). If

$$n_i/N \to c_i \in (0, 1), \qquad i = 1, \ldots, q,$$

then the elements of $\Omega$ are of order $O(n)$ for given $\mu_i$ $(i = 1, \ldots, q+1)$. Therefore, it is natural to assume that

$$D_\omega = O(p) \tag{2.1}$$

under condition (1.4).

Summarizing the above reduction: when we treat the distributions of the test statistics in (1.3) with $t = q$ and of the characteristic roots $\ell_i$, the roots $\ell_1 > \cdots > \ell_q$ may be regarded as the characteristic roots of $BW^{-1}$. Note that the matrices $B$ and $W$ are $q \times q$ with $q$ fixed, while $S_h$ and $S_e$ are $p \times p$ with $p$ tending to infinity. This means that the distribution problem in the high-dimensional framework can be reduced to a similar problem in the large-sample framework, and we can therefore apply the traditional perturbation method. In the following we give a heuristic derivation, since its validity can be demonstrated as in Anderson (1963), etc. We consider a perturbation expansion of the key matrix

$$R = W^{-1/2}BW^{-1/2}, \tag{2.2}$$

whose characteristic roots are the same as those of $BW^{-1}$. We use the notation

$$r = \frac{p}{m} = \frac{p}{n-p+q}, \tag{2.3}$$

which is $O(1)$ under (1.4). Let $U$ and $V$ be the matrices defined by

$$U = \frac{1}{\sqrt{p}}(B - pI_q - D_\omega) \quad \text{and} \quad V = \frac{1}{\sqrt{m}}(W - mI_q),$$
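The reduction in Lemma 2.1 can be checked numerically; the following sketch (our code, with illustrative values of $p$, $q$, $n$ and $\omega_i$) confirms that the non-zero characteristic roots of $\tilde S_h \tilde S_e^{-1}$ coincide with those of $BW^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 20, 3, 50
m = n - p + q

# M is p x q with sqrt(omega_i) in the leading diagonal block (omegas illustrative).
omega = np.array([40.0, 15.0, 5.0])
M = np.zeros((p, q))
M[:q, :q] = np.diag(np.sqrt(omega))

Z = M + rng.standard_normal((p, q))   # Z ~ N_{p x q}(M, I_p (x) I_q)
E = rng.standard_normal((n, p))
Se_t = E.T @ E                        # S~_e ~ W_p(n, I_p)
Sh_t = Z @ Z.T                        # S~_h = ZZ'

B = Z.T @ Z
G = Z.T @ np.linalg.inv(Se_t) @ Z     # q x q matrix Z' S~_e^{-1} Z
w, V = np.linalg.eigh(B)              # symmetric square root of B
B_half = V @ np.diag(np.sqrt(w)) @ V.T
W = B_half @ np.linalg.inv(G) @ B_half  # W = B^{1/2} (Z' S~_e^{-1} Z)^{-1} B^{1/2}

# Non-zero roots of S~_h S~_e^{-1} versus the roots of B W^{-1}: they agree.
roots_big = np.sort(np.linalg.eigvals(Sh_t @ np.linalg.inv(Se_t)).real)[::-1][:q]
roots_small = np.sort(np.linalg.eigvals(B @ np.linalg.inv(W)).real)[::-1]
print(roots_big)
print(roots_small)
```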


respectively. Then it is well known that $U$ is asymptotically normal as $p \to \infty$, and $V$ is asymptotically normal as $m \to \infty$. We can write $B$ and $W$ in terms of $U$ and $V$ as

$$B = p\left(I_q + \frac{1}{p}D_\omega\right) + \sqrt{p}\,U = m\left(D_\delta + \frac{1}{\sqrt{m}}\sqrt{r}\,U\right), \tag{2.4}$$

$$W = m\left(I_q + \frac{1}{\sqrt{m}}V\right), \tag{2.5}$$

where

$$D_\delta = \mathrm{diag}(\delta_1, \ldots, \delta_q), \qquad \delta_i = r\left(1 + \frac{1}{p}\,\omega_i\right) \quad (i = 1, \ldots, q). \tag{2.6}$$

Note that $D_\delta = O(1)$. In the following we use the symbol $O^*_j$ for a quantity $O^*_j(p, m, n)$ that is of the $j$th order with respect to $(p^{-1}, m^{-1}, n^{-1})$; this notation is used both for stochastic and non-stochastic orders. Using (2.4) and (2.5), we obtain a perturbation expansion for $R$:

$$R = \left(I_q + \frac{1}{\sqrt{m}}V\right)^{-1/2}\left(D_\delta + \frac{1}{\sqrt{m}}\sqrt{r}\,U\right)\left(I_q + \frac{1}{\sqrt{m}}V\right)^{-1/2}$$

$$= \left(I_q - \frac{1}{2\sqrt{m}}V + O^*_1\right)\left(D_\delta + \frac{1}{\sqrt{m}}\sqrt{r}\,U\right)\left(I_q - \frac{1}{2\sqrt{m}}V + O^*_1\right)$$

$$= D_\delta + \frac{1}{\sqrt{m}}\left(\sqrt{r}\,U - \frac{1}{2}VD_\delta - \frac{1}{2}D_\delta V\right) + O^*_1. \tag{2.7}$$

3. Null distributions of tests for dimensionality

In this section we consider the null distributions of the three test statistics under conditions (1.4) and

$$(\mathrm{A2}):\quad \omega_1 > \cdots > \omega_k > \omega_{k+1} = \cdots = \omega_q = 0, \qquad \omega_i = O(p)\ (i = 1, \ldots, k). \tag{3.1}$$

Consider the transformed test statistics of LR, LH and BNP in (1.3) defined by

$$T_{LR} = \sqrt{p}\left(1+\frac{p}{m}\right)\frac{m}{p}\left\{\sum_{j=k+1}^{q}\log(1+\ell_j) - (q-k)\log\left(1+\frac{p}{m}\right)\right\},$$

$$T_{LH} = \sqrt{p}\left\{\frac{m}{p}\sum_{j=k+1}^{q}\ell_j - (q-k)\right\}, \tag{3.2}$$

$$T_{BNP} = \sqrt{p}\left(1+\frac{p}{m}\right)\left\{\frac{m}{p}\sum_{j=k+1}^{q}\left(1+\frac{p}{m}\right)\frac{\ell_j}{1+\ell_j} - (q-k)\right\}.$$
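For concreteness, the transformed statistics in (3.2) can be computed from the roots as follows (a hypothetical helper of our own, not the authors' code). When every root equals $r = p/m$ exactly, all three statistics vanish, which is a useful sanity check:

```python
import numpy as np

def dimension_test_stats(ell, p, m, q, k):
    """Transformed statistics of (3.2) from the roots ell_1 >= ... >= ell_q."""
    r = p / m
    tail = np.asarray(ell)[k:q]  # ell_{k+1}, ..., ell_q
    t_lr = np.sqrt(p) * (1 + r) * (m / p) * (
        np.sum(np.log1p(tail)) - (q - k) * np.log1p(r))
    t_lh = np.sqrt(p) * ((m / p) * np.sum(tail) - (q - k))
    t_bnp = np.sqrt(p) * (1 + r) * (
        (m / p) * np.sum((1 + r) * tail / (1 + tail)) - (q - k))
    return t_lr, t_lh, t_bnp

# Illustrative use: with every root equal to r = p/m, all three statistics are zero.
p, n, q, k = 30, 60, 3, 1
m = n - p + q
stats = dimension_test_stats(np.full(q, p / m), p, m, q, k)
print(stats)
```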

Note that $\ell_1, \ldots, \ell_q$ are the characteristic roots of $R$ in (2.2), and these test statistics are symmetric functions of the last $q-k$ characteristic roots $\ell_{k+1}, \ldots, \ell_q$. Using the fact that $R$ has the perturbation expansion (2.7), it can be seen (see Lawley, 1956, 1959; Anderson, 1963; Fujikoshi, 1977) that the last $q-k$ characteristic roots $\ell_{k+1}, \ldots, \ell_q$ are the characteristic roots of

$$Q = rI_{q-k} + \frac{1}{\sqrt{m}}(\sqrt{r}\,U_{22} - rV_{22}) + O^*_1, \tag{3.3}$$

where

$$U = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix}, \qquad V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix},$$

and $U_{22}$ and $V_{22}$ are $(q-k) \times (q-k)$ matrices.


Based on the above result, we can expand $T_{LR}$, $T_{LH}$ and $T_{BNP}$ as follows:

$$T_{LR} = \sqrt{p}\left(1+\frac{p}{m}\right)\frac{m}{p}\left\{\sum_{j=k+1}^{q}\log(1 + r + \ell_j - r) - (q-k)\log\left(1+\frac{p}{m}\right)\right\}$$

$$= \sqrt{p}\left(1+\frac{p}{m}\right)\frac{m}{p}\left[\sum_{j=k+1}^{q}\left\{\log(1+r) + \frac{\ell_j - r}{1+r}\right\} - (q-k)\log\left(1+\frac{p}{m}\right)\right] + O^*_1$$

$$= \sqrt{p}\,\frac{1+r}{r}\sum_{j=k+1}^{q}\frac{\ell_j - r}{1+r} + O^*_1 = \sqrt{\frac{m}{r}}\,\mathrm{tr}(Q - rI_{q-k}) + O^*_1 = \mathrm{tr}(U_{22} - \sqrt{r}\,V_{22}) + O^*_{1/2},$$

$$T_{LH} = \sqrt{p}\left[\frac{1}{r}\{r(q-k) + \mathrm{tr}(Q - rI_{q-k})\} - (q-k)\right] = \mathrm{tr}(U_{22} - \sqrt{r}\,V_{22}) + O^*_{1/2},$$

$$T_{BNP} = \sqrt{p}\left(1+\frac{p}{m}\right)\left\{\frac{m}{p}\sum_{j=k+1}^{q}\left(1+\frac{p}{m}\right)\frac{\ell_j}{1 + r + \ell_j - r} - (q-k)\right\}$$

$$= \sqrt{p}\,(1+r)\left[\frac{1}{r}\sum_{j=k+1}^{q}(r + \ell_j - r)\left(1 - \frac{\ell_j - r}{1+r}\right) - (q-k) + O^*_1\right]$$

$$= \sqrt{p}\,(1+r)\left[\sum_{j=k+1}^{q}\left\{1 + \left(\frac{1}{r} - \frac{1}{1+r}\right)(\ell_j - r)\right\} - (q-k) + O^*_1\right]$$

$$= \sqrt{\frac{m}{r}}\,\mathrm{tr}(Q - rI_{q-k}) + O^*_1 = \mathrm{tr}(U_{22} - \sqrt{r}\,V_{22}) + O^*_{1/2}.$$

Each of the diagonal elements of $U_{22}$ and $V_{22}$ is asymptotically distributed as $N(0, 2)$. Therefore we obtain the following theorem.

Theorem 3.1. Under conditions (1.4) and (3.1),

$$\frac{T_G}{\sigma_G} \xrightarrow{d} N(0, 1),$$

where $G = \mathrm{LR}, \mathrm{LH}, \mathrm{BNP}$, $\xrightarrow{d}$ denotes convergence in distribution, and $\sigma_G^2 = 2(q-k)(1+r)$.

Numerical accuracy was examined for our high-dimensional approximations (denoted by HA) in Theorem 3.1 and for the classical approximations (denoted by CA) based on the limiting distribution in a large-sample framework in which $p$ is fixed and $n$ tends to infinity. Under the large-sample framework it is well known (see, e.g., Siotani et al., 1985) that the three statistics

$$\text{(i)}\ n\log\prod_{j=k+1}^{q}(1+\ell_j), \qquad \text{(ii)}\ n\sum_{j=k+1}^{q}\ell_j, \qquad \text{(iii)}\ n\sum_{j=k+1}^{q}\frac{\ell_j}{1+\ell_j}$$

are asymptotically distributed as a $\chi^2$-distribution with $(p-k)(q-k)$ degrees of freedom. The values of $q$, $n$, $p$ and $\Omega$ were chosen as follows:

$(p, n)$: $(10, 40)$, $(20, 40)$, $(30, 40)$, $(10, 80)$, $(40, 80)$, $(70, 80)$;
$(q, \Omega)$: $(2, \mathrm{diag}(p, 0))$, $(4, \mathrm{diag}(p, 0, 0, 0))$, $(4, \mathrm{diag}(2p, p, 0, 0))$.
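The two critical values can be sketched as follows (our code; SciPy is assumed for the $\chi^2$ quantile). Note that they live on different scales: the HA value applies to the transformed statistic $T_G$ of (3.2), while the CA value applies to the unnormalized statistics (i)-(iii):

```python
import numpy as np
from scipy.stats import chi2, norm

def ha_critical(p, n, q, k, alpha=0.05):
    """Upper-alpha point for T_G under Theorem 3.1: z_alpha * sigma_G."""
    m = n - p + q
    r = p / m
    return norm.ppf(1 - alpha) * np.sqrt(2 * (q - k) * (1 + r))

def ca_critical(p, n, q, k, alpha=0.05):
    """Classical approximation: chi^2 with (p - k)(q - k) degrees of freedom,
    applied to n*LR (and likewise to the LH and BNP versions)."""
    return chi2.ppf(1 - alpha, (p - k) * (q - k))

print(ha_critical(30, 40, 2, 1), ca_critical(30, 40, 2, 1))
```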


Table 1
Upper 5 percent points. "Simu" denotes the simulated value, "p → ∞" the high-dimensional approximation (HA), and "p: fix" the classical approximation (CA), which is common to LR, LH and BNP.

| q | Ω | n | p | LR: Simu | LR: p → ∞ | LH: Simu | LH: p → ∞ | BNP: Simu | BNP: p → ∞ | p: fix |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | diag(p, 0) | 40 | 10 | 0.394 | 0.473 | 0.484 | 0.576 | 0.326 | 0.391 | 0.423 |
| | | 40 | 20 | 0.904 | 0.989 | 1.468 | 1.562 | 0.595 | 0.655 | 0.754 |
| | | 40 | 30 | 1.681 | 1.820 | 4.372 | 4.486 | 0.814 | 0.876 | 1.064 |
| | | 80 | 10 | 0.188 | 0.226 | 0.207 | 0.248 | 0.172 | 0.206 | 0.211 |
| | | 80 | 40 | 0.884 | 0.920 | 1.421 | 1.442 | 0.587 | 0.616 | 0.682 |
| | | 80 | 70 | 2.426 | 2.542 | 10.311 | 10.073 | 0.912 | 0.944 | 1.117 |
| 4 | diag(p, 0, 0, 0) | 40 | 10 | 0.948 | 1.103 | 1.177 | 1.309 | 0.783 | 0.936 | 1.003 |
| | | 40 | 20 | 2.241 | 2.373 | 3.595 | 3.517 | 1.525 | 1.666 | 1.891 |
| | | 40 | 30 | 4.246 | 4.325 | 10.762 | 9.223 | 2.209 | 2.328 | 2.744 |
| | | 80 | 10 | 0.453 | 0.542 | 0.501 | 0.589 | 0.413 | 0.500 | 0.501 |
| | | 80 | 40 | 2.295 | 2.359 | 3.585 | 3.527 | 1.577 | 1.648 | 1.791 |
| | | 80 | 70 | 6.388 | 6.358 | 25.441 | 20.898 | 2.607 | 2.664 | 3.020 |
| 4 | diag(2p, p, 0, 0) | 40 | 10 | 0.565 | 0.785 | 0.671 | 0.936 | 0.482 | 0.662 | 0.657 |
| | | 40 | 20 | 1.423 | 1.665 | 2.613 | 2.497 | 1.001 | 1.156 | 1.275 |
| | | 40 | 30 | 2.712 | 3.016 | 6.137 | 6.567 | 1.466 | 1.595 | 1.862 |
| | | 80 | 10 | 0.270 | 0.385 | 0.293 | 0.420 | 0.250 | 0.354 | 0.329 |
| | | 80 | 40 | 1.515 | 1.635 | 2.321 | 2.472 | 1.054 | 1.132 | 1.217 |
| | | 80 | 70 | 4.144 | 4.386 | 14.780 | 14.816 | 1.737 | 1.800 | 2.053 |

Table 2
Actual error probabilities of the first kind when the nominal level is 0.05. "p → ∞" denotes HA and "p: fix" denotes CA.

| q | Ω | n | p | LR: p → ∞ | LR: p: fix | LH: p → ∞ | LH: p: fix | BNP: p → ∞ | BNP: p: fix |
|---|---|---|---|---|---|---|---|---|---|
| 2 | diag(p, 0) | 40 | 10 | 0.0159 | 0.0320 | 0.0203 | 0.0933 | 0.0105 | 0.0039 |
| | | 40 | 20 | 0.0218 | 0.1733 | 0.0361 | 0.5256 | 0.0101 | 0 |
| | | 40 | 30 | 0.0221 | 0.6352 | 0.0448 | 0.9540 | 0.0034 | 0 |
| | | 80 | 10 | 0.0158 | 0.0255 | 0.0187 | 0.0453 | 0.0135 | 0.0108 |
| | | 80 | 40 | 0.0299 | 0.3631 | 0.0437 | 0.8014 | 0.0179 | 0.0006 |
| | | 80 | 70 | 0.0243 | 0.9945 | 0.0557 | 1 | 0.0021 | 0 |
| 4 | diag(p, 0, 0, 0) | 40 | 10 | 0.0085 | 0.0294 | 0.0214 | 0.1403 | 0.0033 | 0.0009 |
| | | 40 | 20 | 0.0212 | 0.2722 | 0.0608 | 0.8359 | 0.0055 | 0 |
| | | 40 | 30 | 0.0367 | 0.9164 | 0.1412 | 0.9999 | 0.0047 | 0 |
| | | 80 | 10 | 0.0088 | 0.0184 | 0.0118 | 0.0496 | 0.0061 | 0.0055 |
| | | 80 | 40 | 0.0282 | 0.6370 | 0.0587 | 0.9895 | 0.0120 | 0.0004 |
| | | 80 | 70 | 0.0553 | 1 | 0.1921 | 1 | 0.0085 | 0 |
| 4 | diag(2p, p, 0, 0) | 40 | 10 | 0.0014 | 0.0126 | 0.0026 | 0.0573 | 0.0009 | 0.0010 |
| | | 40 | 20 | 0.0042 | 0.1411 | 0.0103 | 0.5970 | 0.0015 | 0.0002 |
| | | 40 | 30 | 0.0084 | 0.7250 | 0.0297 | 0.9915 | 0.0012 | 0 |
| | | 80 | 10 | 0.0019 | 0.0086 | 0.0022 | 0.0220 | 0.0013 | 0.0030 |
| | | 80 | 40 | 0.0133 | 0.4429 | 0.0232 | 0.9235 | 0.0065 | 0.0002 |
| | | 80 | 70 | 0.0155 | 1 | 0.0491 | 1 | 0.0020 | 0 |

Table 1 shows the upper 5 percent points based on Monte Carlo simulation together with those based on HA; note that the large-sample (CA) percent points are the same for LR, LH and BNP. Table 2 shows the actual error probabilities of the first kind based on the limiting percent points given in Table 1. We can see the following tendencies from Tables 1 and 2:

(1) HA is better than CA over a considerably wide range of $(n, p, q)$.
(2) CA becomes poor as $p$ becomes large.
(3) HA remains good as $p$ becomes large.
(4) The approximations become worse as $k$ increases.
(5) The limiting distributions do not depend on $\omega_1, \ldots, \omega_k$; some refinement may be expected.


4. Distributions of characteristic roots

The limiting distribution of $(\ell_1, \ldots, \ell_q)$ in the large-sample framework was derived by Hsu (1941) and Anderson (1951); for its extension to an asymptotic expansion, see, e.g., Siotani et al. (1985). In this section we derive the limiting distribution of $(\ell_1, \ldots, \ell_q)$ under conditions (1.4) and (3.1).

4.1. Limiting distribution of $\ell_j$ ($j = 1, \ldots, k$)

Applying a general result on perturbation expansions (see, e.g., Siotani et al., 1985) to (2.7), we obtain

$$\sqrt{m}(\ell_j - \delta_j) = \sqrt{r}\,u_{jj} - \delta_j v_{jj} + O^*_{1/2}, \qquad j = 1, \ldots, k. \tag{4.1}$$

The random matrix $U$ may be expressed as

$$U = \frac{1}{\sqrt{p}}(X'X + M'X + X'M - pI_q),$$

where $X$ is a $p \times q$ random matrix having the normal matrix distribution $N_{p \times q}(0, I_p \otimes I_q)$. Writing $X = [x_{ij}] = [x_1, \ldots, x_q]$, we have

$$u_{jj} = \frac{1}{\sqrt{p}}(x_j'x_j + 2\sqrt{\omega_j}\,x_{jj} - p).$$

Thus $u_{jj}$ is asymptotically distributed as $N(0, 2 + 4\omega_j/p)$. Similarly, $v_{jj}$ is asymptotically distributed as $N(0, 2)$. Since $u_{jj}$ and $v_{jj}$ are independent, we obtain the following theorem.

Theorem 4.1. Under conditions (1.4) and (3.1),

$$\frac{\sqrt{m}(\ell_j - \delta_j)}{\sigma_j} \xrightarrow{d} N(0, 1), \qquad j = 1, \ldots, k,$$

where

$$\sigma_j^2 = 2r + 4\,\frac{r\omega_j}{p} + 2\delta_j^2.$$
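Theorem 4.1 yields a plug-in normal approximation for $\ell_j$; a minimal helper (our naming, with illustrative inputs) computing $\delta_j$ and $\sigma_j$ is:

```python
import numpy as np

def ha_root_approx(omega_j, p, n, q):
    """Limiting center delta_j of (2.6) and the standard deviation sigma_j of
    sqrt(m)(l_j - delta_j) from Theorem 4.1 (our helper, not the authors' code)."""
    m = n - p + q
    r = p / m
    delta = r * (1 + omega_j / p)                        # (2.6)
    sigma2 = 2 * r + 4 * r * omega_j / p + 2 * delta**2  # Theorem 4.1
    return delta, np.sqrt(sigma2)

delta, sigma = ha_root_approx(omega_j=60.0, p=30, n=60, q=3)
print(delta, sigma)
```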

For the limiting distribution of $\ell_j$ in the large-sample framework, it is known that

$$\frac{\sqrt{m}(\ell_j - \lambda^*_j)}{\sigma^*_j}$$

is asymptotically distributed as $N(0, 1)$, where

$$\lambda^*_j = \frac{\omega_j}{m} \quad \text{and} \quad \sigma^{*2}_j = 4\,\frac{\omega_j}{m} + 2\left(\frac{\omega_j}{m}\right)^2.$$

We can see through numerical simulation that there is a similar tendency in numerical accuracy between CA and HA for the distribution of $\ell_j$, $j = 1, \ldots, k$. However, it remains to identify a region of $(q, n, p, \Omega)$ in which HA is practically useful.

4.2. Limiting distribution of $(\ell_{k+1}, \ldots, \ell_q)$

In this section we consider the joint distribution of $\ell_{k+1}, \ldots, \ell_q$. We have seen that $\ell_{k+1}, \ldots, \ell_q$ are the characteristic roots of $Q$ in (3.3). By the definitions of $U$ and $V$, we can express

$$U_{22} = \frac{1}{\sqrt{p}}(\tilde{B} - pI_{q-k}), \qquad V_{22} = \frac{1}{\sqrt{m}}(\tilde{W} - mI_{q-k}),$$


where $\tilde{B} \sim W_{q-k}(p, I_{q-k})$ and $\tilde{W} \sim W_{q-k}(m, I_{q-k})$. Let $T = [\tfrac{1}{2}(1 + \delta_{ij})t_{ij}]$ ($t_{ij} = t_{ji}$, $\delta_{ij}$ is Kronecker's delta); then the characteristic function of $\sqrt{r}\,U_{22} - rV_{22}$ is given by

$$E[\mathrm{etr}\{iT(\sqrt{r}\,U_{22} - rV_{22})\}] = E\left[e^{-i\sqrt{rp}\,\mathrm{tr}(T)}\,e^{ir\sqrt{m}\,\mathrm{tr}(T)}\,\mathrm{etr}\left(\frac{i\sqrt{r}}{\sqrt{p}}\,T\tilde{B}\right)\mathrm{etr}\left(-\frac{ir}{\sqrt{m}}\,T\tilde{W}\right)\right]$$

$$= e^{-i\sqrt{rp}\,\mathrm{tr}(T) + ir\sqrt{m}\,\mathrm{tr}(T)}\left|I_{q-k} - \frac{2i\sqrt{r}}{\sqrt{p}}\,T\right|^{-p/2}\left|I_{q-k} + \frac{2ir}{\sqrt{m}}\,T\right|^{-m/2}$$

$$= e^{-i\sqrt{rp}\,\mathrm{tr}(T) + ir\sqrt{m}\,\mathrm{tr}(T)}\exp\{i\sqrt{pr}\,\mathrm{tr}(T) - r\,\mathrm{tr}(T^2) + O^*_{1/2}\}\exp\{-ir\sqrt{m}\,\mathrm{tr}(T) - r^2\,\mathrm{tr}(T^2) + O^*_{1/2}\}$$

$$= \exp\{-(r + r^2)\,\mathrm{tr}(T^2)\}\{1 + O^*_{1/2}\}.$$

Let $\sqrt{r}\,U_{22} - rV_{22} = S = [s_{ij}]$; then

$$s_{ii} \xrightarrow{d} N(0, 2r + 2r^2), \qquad s_{ij} \xrightarrow{d} N(0, r + r^2) \quad (i \neq j).$$

Thus, the limiting density function of $S$ is given by

$$\frac{\exp\left\{-\frac{1}{4r(1+r)}\mathrm{tr}(S^2)\right\}}{(2\pi)^{a(a+1)/4}(2r + 2r^2)^{a/2}(r + r^2)^{a(a-1)/4}} = \frac{1}{2^{a(a+3)/4}\,\pi^{a(a+1)/4}\,\{r(1+r)\}^{a(a+1)/4}}\exp\left\{-\frac{1}{4r(1+r)}\mathrm{tr}(S^2)\right\},$$

where $a = q - k$. Therefore, letting $y_j = \sqrt{m}(\ell_{k+j} - r)$, $j = 1, \ldots, a$, we obtain the following theorem.

Theorem 4.2. Under conditions (1.4) and (3.1), the limiting joint density function of $y_j = \sqrt{m}(\ell_{k+j} - r)$, $j = 1, \ldots, a$, is given by

$$\frac{\pi^{a(a-1)/4}}{\Gamma_a(a/2)\,2^{a(a+3)/4}\{r(1+r)\}^{a(a+1)/4}}\exp\left\{-\frac{1}{4r(1+r)}\sum_{j=1}^{a} y_j^2\right\}\prod_{i<j}(y_i - y_j),$$

where $a = q - k$ and $\Gamma_p(a)$ is the multivariate gamma function defined by

$$\Gamma_p(a) = \pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma(a - (i-1)/2).$$

5. Non-null distributions of tests for dimensionality

In this section we derive asymptotic non-null distributions of the three test statistics for dimensionality. Further, using the limiting results, we obtain their asymptotic powers.

5.1. Limiting non-null distributions

We consider the distributions of LR, LH and BNP under the alternative hypothesis $H_1$ with $\mathrm{rank}(\Omega) = b$ ($k < b \leq q$). Assume that

$$(\mathrm{A3}):\quad \omega_1 > \cdots > \omega_b > \omega_{b+1} = \cdots = \omega_q = 0, \qquad \omega_i = O(p)\ (i = 1, \ldots, b). \tag{5.1}$$

Under condition (A3) it can be seen that the perturbation expansion (4.1) holds also for $j = k+1, \ldots, b$, i.e.,

$$\sqrt{m}(\ell_j - \delta_j) = \sqrt{r}\,u_{jj} - \delta_j v_{jj} + O^*_{1/2}, \qquad j = k+1, \ldots, b. \tag{5.2}$$

Similarly, it can be seen that the last $q-b$ characteristic roots $\ell_{b+1}, \ldots, \ell_q$ are the characteristic roots of

$$\tilde{Q} = rI_{q-b} + \frac{1}{\sqrt{m}}(\sqrt{r}\,\tilde{U}_{22} - r\tilde{V}_{22}) + O^*_1, \tag{5.3}$$


where

$$U = \begin{bmatrix} \tilde{U}_{11} & \tilde{U}_{12} \\ \tilde{U}_{21} & \tilde{U}_{22} \end{bmatrix}, \qquad V = \begin{bmatrix} \tilde{V}_{11} & \tilde{V}_{12} \\ \tilde{V}_{21} & \tilde{V}_{22} \end{bmatrix},$$

and $\tilde{U}_{22}$ and $\tilde{V}_{22}$ are $(q-b) \times (q-b)$ matrices. Let

$$T^*_{LR} = \sqrt{p}\left(1+\frac{p}{m}\right)\frac{m}{p}\left\{\sum_{j=k+1}^{q}\log(1+\ell_j) - \sum_{j=k+1}^{q}\log(1+\delta_j)\right\},$$

$$T^*_{LH} = \sqrt{p}\left(\frac{m}{p}\sum_{j=k+1}^{q}\ell_j - \frac{m}{p}\sum_{j=k+1}^{q}\delta_j\right), \tag{5.4}$$

$$T^*_{BNP} = \sqrt{p}\left(1+\frac{p}{m}\right)\left\{\frac{m}{p}\sum_{j=k+1}^{q}\left(1+\frac{p}{m}\right)\frac{\ell_j}{1+\ell_j} - \frac{m}{p}\sum_{j=k+1}^{q}\left(1+\frac{p}{m}\right)\frac{\delta_j}{1+\delta_j}\right\}.$$

Using (4.1) and (5.3), we can express $T^*_{LR}$, $T^*_{LH}$ and $T^*_{BNP}$ as follows:

$$T^*_{LR} = \sum_{j=k+1}^{b}\frac{1+r}{\sqrt{r}\,(1+\delta_j)}(\sqrt{r}\,u_{jj} - \delta_j v_{jj}) + \mathrm{tr}(\tilde{U}_{22} - \sqrt{r}\,\tilde{V}_{22}) + O^*_{1/2},$$

$$T^*_{LH} = \sum_{j=k+1}^{b}\frac{1}{\sqrt{r}}(\sqrt{r}\,u_{jj} - \delta_j v_{jj}) + \mathrm{tr}(\tilde{U}_{22} - \sqrt{r}\,\tilde{V}_{22}) + O^*_{1/2},$$

$$T^*_{BNP} = \sum_{j=k+1}^{b}\frac{(1+r)^2}{\sqrt{r}\,(1+\delta_j)^2}(\sqrt{r}\,u_{jj} - \delta_j v_{jj}) + \mathrm{tr}(\tilde{U}_{22} - \sqrt{r}\,\tilde{V}_{22}) + O^*_{1/2}.$$

Therefore we can combine the above three expressions as

$$T^*_G = \sum_{j=k+1}^{b}\left(\frac{1+r}{1+\delta_j}\right)^{c}\frac{1}{\sqrt{r}}(\sqrt{r}\,u_{jj} - \delta_j v_{jj}) + \mathrm{tr}(\tilde{U}_{22} - \sqrt{r}\,\tilde{V}_{22}) + O^*_{1/2}.$$

Here $G$ and $c$ are related by

$$c = \begin{cases} 1 & \text{when } G = \mathrm{LR}, \\ 0 & \text{when } G = \mathrm{LH}, \\ 2 & \text{when } G = \mathrm{BNP}. \end{cases}$$

Based on Theorems 3.1 and 4.1, we obtain the following theorem.

Theorem 5.1. Let $T^*_G$ be the transformed test statistic defined by (5.4), where $G = \mathrm{LR}$, $\mathrm{LH}$ or $\mathrm{BNP}$. Then, under conditions (1.4) and (5.1),

$$\frac{T^*_G}{\sigma^*_G} \xrightarrow{d} N(0, 1),$$

where

$$\sigma^{*2}_G = \sum_{j=k+1}^{b}\left(\frac{1+r}{1+\delta_j}\right)^{2c} 2\left(1 + \frac{2\omega_j}{p} + \frac{\delta_j^2}{r}\right) + 2(q-b)(1+r).$$
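The variance $\sigma^{*2}_G$ of Theorem 5.1 can be computed as follows (our sketch, with names of our own choosing). With $b = k$, i.e. no alternative roots, it reduces to $\sigma_G^2 = 2(q-k)(1+r)$ of Theorem 3.1, which serves as a sanity check:

```python
import numpy as np

def sigma_star_sq(omegas, p, n, q, b, G="LR"):
    """sigma*_G^2 of Theorem 5.1; `omegas` holds omega_{k+1}, ..., omega_b,
    so an empty list corresponds to b = k (the null-case variance)."""
    m = n - p + q
    r = p / m
    c = {"LR": 1, "LH": 0, "BNP": 2}[G]
    om = np.asarray(omegas, dtype=float)
    delta = r * (1 + om / p)                          # delta_j of (2.6)
    terms = ((1 + r) / (1 + delta)) ** (2 * c) * 2 * (1 + 2 * om / p + delta**2 / r)
    return float(np.sum(terms) + 2 * (q - b) * (1 + r))

# With no omegas and b = k = 1, q = 3: reduces to 2(q - k)(1 + r).
print(sigma_star_sq([], p=30, n=60, q=3, b=1))
```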


5.2. Asymptotic powers

Let $\Delta_G = T_G - T^*_G$; then

$$\Delta_{LR} = \sqrt{m}\,\frac{1+r}{\sqrt{r}}\sum_{j=k+1}^{b}\log\left(1 + \frac{r\omega_j}{(1+r)p}\right),$$

$$\Delta_{LH} = \sqrt{p}\sum_{j=k+1}^{b}\frac{\omega_j}{p},$$

$$\Delta_{BNP} = \sqrt{p}\,(1+r)\left\{\frac{1+r}{r}\sum_{j=k+1}^{b}\frac{\delta_j}{1+\delta_j} - (b-k)\right\}.$$

We have $P_D = \Pr(T_G > \sigma_G z_\alpha) = \Pr(T^*_G > \sigma_G z_\alpha - \Delta_G)$, where $z_\alpha$ is the upper $100\alpha\%$ point of the standard normal distribution. Using Theorem 5.1, the power $P_D$ of a test with significance level $\alpha$ is asymptotically expressed as

$$\Phi\left(\frac{\Delta_G - z_\alpha \sigma_G}{\sigma^*_G}\right),$$

where $\Phi$ is the distribution function of the standard normal distribution. If the order of $\omega_j$ is larger than $\sqrt{p}$, then $\Delta_G \to \infty$, so that the asymptotic power is one; if the order of $\omega_j$ is smaller than $\sqrt{p}$, the asymptotic power is $\alpha$, since $\Delta_G \to 0$ and $\sigma^*_G \to \sigma_G$. Suppose that $\omega_j = O(\sqrt{p})$ $(j = 1, \ldots, b)$. Then

$$\sigma^*_G \to \sigma_G, \qquad \Delta_G \to \sum_{j=k+1}^{b}\frac{\omega_j}{\sqrt{p}}.$$

Therefore the powers of the three tests converge to the common value

$$\Phi\left(\frac{1}{\sigma_G}\sum_{j=k+1}^{b}\frac{\omega_j}{\sqrt{p}} - z_\alpha\right).$$

Acknowledgments

The authors would like to thank the two referees for their useful comments and careful reading.

References

Anderson, T.W., 1951. The asymptotic distribution of certain characteristic roots and vectors. In: Neyman, J. (Ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, pp. 105-130.
Anderson, T.W., 1963. Asymptotic theory for principal component analysis. Ann. Math. Statist. 34, 122-148.
Bai, Z.D., 1999. Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica 9, 611-677.
Bickel, P.J., Levina, E., 2004. Some theory for Fisher's linear discriminant function, "naive Bayes", and some alternatives when there are more variables than observations. Bernoulli 10, 989-1010.
Fujikoshi, Y., 1977. Asymptotic expansions for the distributions of some multivariate tests. In: Krishnaiah, P.R. (Ed.), Multivariate Analysis-IV. North-Holland, Amsterdam, pp. 55-71.
Hsu, P.L., 1941. On the limiting distribution of roots of a determinantal equation. J. London Math. Soc. 16, 183-194.
Johnstone, I.M., 2001. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29, 295-327.
Lawley, D.N., 1956. Tests of significance for the latent roots of covariance and correlation matrices. Biometrika 43, 128-136.
Lawley, D.N., 1959. Tests of significance in canonical analysis. Biometrika 46, 59-66.
Raudys, Š., Young, D.M., 2004. Results in statistical discriminant analysis: a review of the former Soviet Union literature. J. Multivariate Anal. 89, 1-35.
Siotani, M., Hayakawa, T., Fujikoshi, Y., 1985. Modern Multivariate Statistical Analysis: A Graduate Course and Handbook. American Sciences Press, Columbus, OH.
Wakaki, H., Fujikoshi, Y., Ulyanov, V.V., 2006. Asymptotic expansions of the distributions of MANOVA tests when the dimension is large. Submitted for publication.