Journal of Statistical Planning and Inference 170 (2016) 186–199
Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context

Aki Ishii^a, Kazuyoshi Yata^b, Makoto Aoshima^b,*

^a Graduate School of Pure and Applied Sciences, University of Tsukuba, Ibaraki, Japan
^b Institute of Mathematics, University of Tsukuba, Ibaraki 305-8571, Japan
Article history: Received 25 December 2014; received in revised form 11 October 2015; accepted 12 October 2015; available online 26 October 2015.

MSC: primary 34L20; secondary 62H25.

Keywords: Contribution ratio; Equality test of covariance matrices; HDLSS; Noise-reduction methodology; PCA
Abstract

A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively small. We call such data HDLSS data. In this paper, we study asymptotic properties of the first principal component in the HDLSS context and apply them to equality tests of covariance matrices for high-dimensional data sets. We consider HDLSS asymptotic theories as the dimension grows, both when the sample size is fixed and when it goes to infinity. We introduce an eigenvalue estimator based on the noise-reduction methodology and provide asymptotic distributions of the largest eigenvalue in the HDLSS context. We construct a confidence interval for the first contribution ratio and give a one-sample test. We also give asymptotic properties of the first PC direction and PC score. We apply the findings to equality tests of two covariance matrices in the HDLSS context. Finally, we provide numerical results and discussions on the performance of both the estimates of the first PC and the equality tests of two covariance matrices.
1. Introduction

One of the features of modern data is that the data dimension d is high while the sample size n is relatively small. We call such data HDLSS data. In HDLSS situations such as $d/n \to \infty$, new theories and methodologies need to be developed for statistical inference. One approach is to study geometric representations of HDLSS data and to investigate how they can be used in HDLSS statistical inference. Hall et al. (2005), Ahn et al. (2007), and Yata and Aoshima (2012) found several conspicuous geometric descriptions of HDLSS data when $d \to \infty$ while $n$ is fixed. HDLSS asymptotic studies usually assume either normality of the population distribution or a $\rho$-mixing condition on the dependency of the random variables in a sphered data matrix; see Jung and Marron (2009) and Jung et al. (2012). However, Yata and Aoshima (2009) developed an HDLSS asymptotic theory without these assumptions and showed that conventional principal component analysis (PCA) cannot give consistent estimation in the HDLSS context. In order to overcome this inconvenience, Yata and Aoshima (2012) provided the noise-reduction (NR) methodology, which successfully gives consistent estimators of both the eigenvalues and eigenvectors together with the principal component (PC) scores. Furthermore, Yata and Aoshima (2010, 2013) created the cross-data-matrix (CDM) methodology, a nonparametric method that ensures
consistent estimation of those quantities. Given this background, Aoshima and Yata (2011, 2015) developed a variety of inference procedures for HDLSS data, such as given-bandwidth confidence regions, two-sample tests, tests of equality of two covariance matrices, classification, variable selection, regression and pathway analysis, along with sample size determination to ensure prespecified accuracy for each inference.

In this paper, suppose we have a $d \times n$ data matrix, $X_{(d)} = [x_{1(d)}, \ldots, x_{n(d)}]$, where $x_{j(d)} = (x_{1j(d)}, \ldots, x_{dj(d)})^T$, $j = 1, \ldots, n$, are independent and identically distributed (i.i.d.) as a $d$-dimensional distribution with a mean vector $\mu_d$ and covariance matrix $\Sigma_d\ (\ge O)$. We assume $n \ge 3$. The eigen-decomposition of $\Sigma_d$ is given by $\Sigma_d = H_d \Lambda_d H_d^T$, where $\Lambda_d = \mathrm{diag}(\lambda_{1(d)}, \ldots, \lambda_{d(d)})$ is a diagonal matrix of eigenvalues, $\lambda_{1(d)} \ge \cdots \ge \lambda_{d(d)}\ (\ge 0)$, and $H_d = [h_{1(d)}, \ldots, h_{d(d)}]$ is an orthogonal matrix of the corresponding eigenvectors. Let $X_{(d)} - [\mu_d, \ldots, \mu_d] = H_d \Lambda_d^{1/2} Z_{(d)}$. Then, $Z_{(d)}$ is a $d \times n$ sphered data matrix from a distribution with the zero mean vector and the identity covariance matrix. Let $Z_{(d)} = [z_{1(d)}, \ldots, z_{d(d)}]^T$ and $z_{i(d)} = (z_{i1(d)}, \ldots, z_{in(d)})^T$, $i = 1, \ldots, d$. Note that $E(z_{ij(d)} z_{i'j(d)}) = 0$ $(i \ne i')$ and $\mathrm{Var}(z_{i(d)}) = I_n$, where $I_n$ is the $n$-dimensional identity matrix. The $i$th true PC score of $x_{j(d)}$ is given by $h_{i(d)}^T (x_{j(d)} - \mu_d) = \lambda_{i(d)}^{1/2} z_{ij(d)}$ (hereafter called $s_{ij(d)}$). Note that $\mathrm{Var}(s_{ij(d)}) = \lambda_{i(d)}$ for all $i, j$. Hereafter, the subscript $d$ will be omitted for the sake of simplicity when it does not cause any confusion. Let $z_{oi} = z_i - (\bar{z}_i, \ldots, \bar{z}_i)^T$, $i = 1, \ldots, d$, where $\bar{z}_i = n^{-1} \sum_{k=1}^n z_{ik}$. We assume that $\lambda_1$ has multiplicity one in the sense that $\liminf_{d \to \infty} \lambda_1/\lambda_2 > 1$. Also, we assume that $\limsup_{d \to \infty} E(z_{ij}^4) < \infty$ for all $i, j$ and $P(\lim_{d \to \infty} \|z_{o1}\| \ne 0) = 1$. Note that if $X$ is Gaussian, the $z_{ij}$s are i.i.d. as the standard normal distribution, $N(0, 1)$. As necessary, we consider the following assumption for the normalized first PC scores, $z_{1j}\ (= s_{1j}/\lambda_1^{1/2})$, $j = 1, \ldots, n$:

(A-i) $z_{1j}$, $j = 1, \ldots, n$, are i.i.d. as $N(0, 1)$.

Note that $P(\lim_{d \to \infty} \|z_{o1}\| \ne 0) = 1$ under (A-i) from the fact that $\|z_{o1}\|^2$ is distributed as $\chi^2_{n-1}$, where $\chi^2_\nu$ denotes a random variable distributed as the $\chi^2$ distribution with $\nu$ degrees of freedom. Let us write the sample covariance matrix as $S = (n-1)^{-1}(X - \bar{X})(X - \bar{X})^T = (n-1)^{-1} \sum_{j=1}^n (x_j - \bar{x})(x_j - \bar{x})^T$, where $\bar{X} = [\bar{x}, \ldots, \bar{x}]$ and $\bar{x} = \sum_{j=1}^n x_j / n$. Then, we define the $n \times n$ dual sample covariance matrix by $S_D = (n-1)^{-1}(X - \bar{X})^T (X - \bar{X})$. Let $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_{n-1} \ge 0$ be the eigenvalues of $S_D$. Let us write the eigen-decomposition of $S_D$ as $S_D = \sum_{j=1}^{n-1} \hat{\lambda}_j \hat{u}_j \hat{u}_j^T$, where $\hat{u}_j = (\hat{u}_{j1}, \ldots, \hat{u}_{jn})^T$ denotes a unit eigenvector corresponding to $\hat{\lambda}_j$. Note that $S$ and $S_D$ share non-zero eigenvalues. Also, note that $\mathrm{tr}(S) = \mathrm{tr}(S_D)$.

Here, we emphasize that the first principal component is quite important for high-dimensional data because $\lambda_1$ often becomes much larger than the other eigenvalues as $d$ increases, in the sense that $\lambda_j/\lambda_1 \to 0$ as $d \to \infty$ for all $j \ge 2$; see Fig. 1 in Yata and Aoshima (2013) or Table 1 in Section 2 for example. In other words, the first principal component contains much useful information about high-dimensional data sets. In addition, $\lambda_1$ and $h_1$ can be accurately estimated for high-dimensional data by using the NR methodology even when $n$ is fixed. The first principal component is therefore likely to be applicable to high-dimensional statistical inferences such as tests of mean vectors and covariance matrices. That is the reason why we focus on the first principal component in this paper.

In this paper, we study asymptotic properties of the first principal component in the HDLSS context. We apply them to a one-sample test and equality tests of covariance matrices for high-dimensional data sets. We consider HDLSS asymptotic theories as $d \to \infty$ both when $n$ is fixed and when $n \to \infty$. In Section 2, we introduce an eigenvalue estimator by the NR methodology and provide asymptotic distributions of the largest eigenvalue in the HDLSS context. We construct a confidence interval for the first contribution ratio and give a one-sample test. In Section 3, we give asymptotic properties of the first PC direction and PC score as well. In Section 4, we apply the findings to equality tests of two covariance matrices in the HDLSS context. Finally, in Section 5, we provide numerical results and discussions on the performance of both the estimates of the first PC and the equality tests of two covariance matrices.

2. Largest eigenvalue estimation and its applications

In this section, we give asymptotic properties of the largest eigenvalue. We construct a confidence interval for the first contribution ratio and give a one-sample test.

2.1. Asymptotic distributions of the largest eigenvalue

Let $\delta_i = \mathrm{tr}(\Sigma^2) - \sum_{s=1}^i \lambda_s^2 = \sum_{s=i+1}^d \lambda_s^2$ for $i = 1, \ldots, d-1$. We consider the following assumptions for the largest eigenvalue:

(A-ii) $\dfrac{\delta_1}{\lambda_1^2} = o(1)$ as $d \to \infty$ when $n$ is fixed; $\dfrac{\delta_{i_*}}{\lambda_1^2} = o(1)$ as $d \to \infty$ for some fixed $i_*\ (< d)$ when $n \to \infty$.

(A-iii) $\dfrac{\sum_{r,s \ge 2}^d \lambda_r \lambda_s E\{(z_{rk}^2 - 1)(z_{sk}^2 - 1)\}}{n \lambda_1^2} = o(1)$ as $d \to \infty$ either when $n$ is fixed or $n \to \infty$.
Note that (A-ii) implies that $\lambda_2/\lambda_1 \to 0$ as $d \to \infty$ when $n$ is fixed and that $\lambda_{i_*+1}/\lambda_1 \to 0$ as $d \to \infty$ for some fixed $i_*$ when $n \to \infty$. Also, note that (A-iii) holds when $X$ is Gaussian and (A-ii) is met. See Remark 2.2.
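To make the setup concrete, the following is a minimal sketch (our own illustration, not code from the paper) that generates a data matrix under a hypothetical spiked spectrum, forms the sample covariance matrix $S$ and its $n \times n$ dual $S_D$, and checks that $\mathrm{tr}(S) = \mathrm{tr}(S_D)$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 10
lam = np.concatenate(([d**0.75, d**0.5], np.ones(d - 2)))  # hypothetical spiked spectrum
Z = rng.standard_normal((d, n))            # sphered data matrix (Gaussian case)
X = np.sqrt(lam)[:, None] * Z              # take H = I_d and mu = 0 for simplicity

Xc = X - X.mean(axis=1, keepdims=True)     # X - [x_bar, ..., x_bar]
S = Xc @ Xc.T / (n - 1)                    # d x d sample covariance matrix
S_D = Xc.T @ Xc / (n - 1)                  # n x n dual sample covariance matrix

print(np.isclose(np.trace(S), np.trace(S_D)))   # tr(S) = tr(S_D): True
```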
Remark 2.1. For a spiked model such as $\lambda_j = a_j d^{\alpha_j}$ ($j = 1, \ldots, m$) and $\lambda_j = c_j$ ($j = m+1, \ldots, d$) with positive (fixed) constants, $a_j$s, $c_j$s and $\alpha_j$s, and a positive (fixed) integer $m$, (A-ii) holds under the condition that $\alpha_1 > 1/2$ and $\alpha_1 > \alpha_2$ when $n$ is fixed. When $n \to \infty$, (A-ii) holds under $\alpha_1 > 1/2$ even if $\alpha_1 = \alpha_m$. See Yata and Aoshima (2012) for the details.

Remark 2.2. For several statistical inferences on high-dimensional data, Bai and Saranadasa (1996), Chen and Qin (2010) and Aoshima and Yata (2015) assumed a general factor model as follows:

$$x_j = \Gamma w_j + \mu \quad \text{for } j = 1, \ldots, n,$$

where $\Gamma$ is a $d \times r$ matrix for some $r > 0$ such that $\Gamma \Gamma^T = \Sigma$, and $w_j$, $j = 1, \ldots, n$, are i.i.d. random vectors having $E(w_j) = 0$ and $\mathrm{Var}(w_j) = I_r$. As for $w_j = (w_{1j}, \ldots, w_{rj})^T$, assume that $E(w_{qj}^2 w_{sj}^2) = 1$ and $E(w_{qj} w_{sj} w_{tj} w_{uj}) = 0$ for all $q \ne s, t, u$. From Lemma 1 in Yata and Aoshima (2013), one can claim that (A-iii) holds under (A-ii) in the factor model. Also, we note that the factor model naturally holds when $X$ is Gaussian.

Let $\kappa = \mathrm{tr}(\Sigma) - \lambda_1 = \sum_{s=2}^d \lambda_s$. Then, we have the following result.
Proposition 2.1. Under (A-ii) and (A-iii), it holds that

$$\frac{\hat{\lambda}_1}{\lambda_1} - \left\|\frac{z_{o1}}{\sqrt{n-1}}\right\|^2 - \frac{\kappa}{(n-1)\lambda_1} = o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$.

Remark 2.3. (A-ii) and (A-iii) are milder when $n \to \infty$ compared to when $n$ is fixed. Jung et al. (2012) gave a result similar to Proposition 2.1 when $X$ is Gaussian, $\mu = 0$ and $n$ is fixed.

It holds that $E(\|z_{o1}/\sqrt{n-1}\|^2) = 1$ and $\|z_{o1}/\sqrt{n-1}\|^2 = 1 + o_p(1)$ as $n \to \infty$. If $\kappa/(n\lambda_1) = o(1)$ as $d \to \infty$ and $n \to \infty$, $\hat{\lambda}_1$ is a consistent estimator of $\lambda_1$. When $n$ is fixed, the condition '$\kappa/\lambda_1 = o(1)$' is equivalent to '$\lambda_1/\mathrm{tr}(\Sigma) = 1 + o(1)$', in which the contribution ratio of the first principal component is asymptotically 1. In that sense, '$\kappa/\lambda_1 = o(1)$' is a quite strict condition in real high-dimensional data analyses. Hereafter, we assume $\liminf_{d \to \infty} \kappa/\lambda_1 > 0$.

Yata and Aoshima (2012) proposed a method for eigenvalue estimation called the noise-reduction (NR) methodology, which was brought about by a geometric representation of $S_D$. If one applies the NR method to the present case, the $\lambda_i$s are estimated by

$$\tilde{\lambda}_i = \hat{\lambda}_i - \frac{\mathrm{tr}(S_D) - \sum_{j=1}^i \hat{\lambda}_j}{n-1-i} \quad (i = 1, \ldots, n-2). \tag{2.1}$$
Note that $\tilde{\lambda}_i \ge 0$ w.p.1 for $i = 1, \ldots, n-2$. Also, note that the second term in (2.1) with $i = 1$ is an estimator of $\kappa/(n-1)$; see Lemma 2.1 in Section 2.2 for the details. Yata and Aoshima (2012, 2013) showed that $\tilde{\lambda}_i$ has several consistency properties when $d \to \infty$ and $n \to \infty$. On the other hand, Ishii et al. (2014) gave asymptotic properties of $\tilde{\lambda}_1$ when $d \to \infty$ while $n$ is fixed. The following theorem summarizes their findings:

Theorem 2.1 (Yata and Aoshima, 2013; Ishii et al., 2014). Under (A-ii) and (A-iii), it holds that as $d \to \infty$

$$\frac{\tilde{\lambda}_1}{\lambda_1} = \begin{cases} \|z_{o1}/\sqrt{n-1}\|^2 + o_p(1) & \text{when } n \text{ is fixed}, \\ 1 + o_p(1) & \text{when } n \to \infty. \end{cases}$$

Under (A-i) to (A-iii), it holds that as $d \to \infty$

$$(n-1)\frac{\tilde{\lambda}_1}{\lambda_1} \Rightarrow \chi^2_{n-1} \quad \text{when } n \text{ is fixed}; \qquad \sqrt{\frac{n-1}{2}}\left(\frac{\tilde{\lambda}_1}{\lambda_1} - 1\right) \Rightarrow N(0, 1) \quad \text{when } n \to \infty.$$

Here, '$\Rightarrow$' denotes convergence in distribution.
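As a concrete illustration of (2.1), the following is a short sketch (our own code, not the authors') that computes the conventional eigenvalue estimates $\hat{\lambda}_i$ and the NR estimates $\tilde{\lambda}_i$ from the dual matrix $S_D$; the loop index is 0-based:

```python
import numpy as np

def nr_eigenvalues(S_D, n):
    """Conventional and NR eigenvalue estimates from the n x n dual matrix S_D."""
    hat_lam = np.sort(np.linalg.eigvalsh(S_D))[::-1][: n - 1]   # hat-lambda_1 >= ... >= 0
    tilde_lam = np.empty(n - 2)
    for i in range(n - 2):                  # 0-based i here corresponds to i+1 in (2.1)
        noise = (np.trace(S_D) - hat_lam[: i + 1].sum()) / (n - 2 - i)
        tilde_lam[i] = hat_lam[i] - noise   # tilde-lambda_i of (2.1)
    return hat_lam, tilde_lam
```

Applied to the `S_D` generated in the sketch of Section 1, `nr_eigenvalues` removes from $\hat{\lambda}_1$ the noise term estimating $\kappa/(n-1)$ that appears in Proposition 2.1.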
Table 1
Estimates of the first five eigenvalues by the $\hat{\lambda}_j$s and $\tilde{\lambda}_j$s for the microarray data sets.

| Data set | $n$ | $\hat{\lambda}_1, \ldots, \hat{\lambda}_5$ | $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_5$ |
| --- | --- | --- | --- |
| Lymphoma data with 7129 (= d) genes given by Shipp et al. (2002) | | | |
| DLBC | 58 | 1862, 564, 490, 398, 324 | 1768, 479, 412, 326, 257 |
| Follicular | 19 | 2476, 704, 614, 533, 369 | 2203, 457, 392, 333, 182 |
| Prostate cancer data with 12625 (= d) genes given by Singh et al. (2002) | | | |
| Normal | 50 | 6760, 562, 426, 371, 304 | 6637, 450, 320, 271, 209 |
| Prostate | 52 | 6106, 687, 512, 462, 298 | 5976, 568, 401, 359, 199 |
2.2. Confidence interval of the first contribution ratio

We consider a confidence interval for the contribution ratio of the first principal component. Let $a$ and $b$ be constants satisfying $P(a \le \chi^2_{n-1} \le b) = 1 - \alpha$, where $\alpha \in (0, 1)$. Then, from Theorem 2.1, under (A-i) to (A-iii), it holds that

$$P\left(\frac{\lambda_1}{\mathrm{tr}(\Sigma)} \in \left[\frac{(n-1)\tilde{\lambda}_1}{b\kappa + (n-1)\tilde{\lambda}_1},\ \frac{(n-1)\tilde{\lambda}_1}{a\kappa + (n-1)\tilde{\lambda}_1}\right]\right) = P\left(a \le (n-1)\frac{\tilde{\lambda}_1}{\lambda_1} \le b\right) = 1 - \alpha + o(1) \tag{2.2}$$

as $d \to \infty$ when $n$ is fixed. We need to estimate $\kappa$ in (2.2). Here, we give a consistent estimator of $\kappa$ by $\tilde{\kappa} = (n-1)\{\mathrm{tr}(S_D) - \hat{\lambda}_1\}/(n-2) = \mathrm{tr}(S_D) - \tilde{\lambda}_1$. Then, we have the following results.

Lemma 2.1. Under (A-ii) and (A-iii), it holds that

$$\frac{\tilde{\kappa}}{\kappa} = 1 + o_p(1) \quad \text{and} \quad \frac{\tilde{\kappa}}{\lambda_1} = \frac{\kappa}{\lambda_1} + o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$.

Theorem 2.2. Under (A-i) to (A-iii), it holds that

$$P\left(\frac{\lambda_1}{\mathrm{tr}(\Sigma)} \in \left[\frac{(n-1)\tilde{\lambda}_1}{b\tilde{\kappa} + (n-1)\tilde{\lambda}_1},\ \frac{(n-1)\tilde{\lambda}_1}{a\tilde{\kappa} + (n-1)\tilde{\lambda}_1}\right]\right) = 1 - \alpha + o(1) \tag{2.3}$$

as $d \to \infty$ when $n$ is fixed.

Remark 2.4. From Theorem 2.1 and Lemma 2.1, under (A-ii) and (A-iii), it holds that $\mathrm{tr}(S_D)/\mathrm{tr}(\Sigma) = (\tilde{\kappa} + \tilde{\lambda}_1)/\mathrm{tr}(\Sigma) = 1 + o_p(1)$ as $d \to \infty$ and $n \to \infty$. We have that

$$\frac{\tilde{\lambda}_1}{\mathrm{tr}(S_D)} = \frac{\lambda_1}{\mathrm{tr}(\Sigma)}\{1 + o_p(1)\}.$$

Remark 2.5. The constants $(a, b)$ should be chosen so that (2.3) has the minimum length. If $\lambda_1/\kappa = o(1)$, the length of the confidence interval becomes close to $\{(n-1)\tilde{\lambda}_1/\tilde{\kappa}\}(1/a - 1/b)$ under (A-ii) and (A-iii) when $d \to \infty$ and $n$ is fixed. Thus, we recommend choosing the constants $(a, b)$ as

$$\underset{a,b}{\operatorname{argmin}}\ (1/a - 1/b) \quad \text{subject to} \quad G_{n-1}(b) - G_{n-1}(a) = 1 - \alpha,$$

where $G_{n-1}(\cdot)$ denotes the c.d.f. of $\chi^2_{n-1}$.

We used gene expression data sets and constructed a confidence interval for the contribution ratio of the first principal component. The microarray data sets were as follows: lymphoma data with 7129 (= d) genes consisting of diffuse large B-cell (DLBC) lymphoma (58 samples) and follicular lymphoma (19 samples) given by Shipp et al. (2002); and prostate cancer data with 12625 (= d) genes consisting of normal prostate (50 samples) and prostate tumor (52 samples) given by Singh et al. (2002). The data sets are given in Jeffery et al. (2006). We standardized each sample so as to have unit variance. Then, it holds that $\mathrm{tr}(S)\ (= \mathrm{tr}(S_D)) = d$, so that $\tilde{\lambda}_1 + \tilde{\kappa} = d$. We gave estimates of the first five eigenvalues by the $\hat{\lambda}_j$s and $\tilde{\lambda}_j$s in Table 1. We observed that the first eigenvalues are much larger than the others, especially for the prostate cancer data. We also observed that $\hat{\lambda}_j$ was larger than $\tilde{\lambda}_j$ for $j = 1, \ldots, 5$, as expected theoretically from the fact that $\hat{\lambda}_j/\tilde{\lambda}_j \ge 1$ w.p.1 for all $j$.
We considered an estimator of $\delta_1$ by $\tilde{\delta}_1 = W_n - \tilde{\lambda}_1^2$, having $W_n$ by (4) in Aoshima and Yata (2015), where $W_n$ is an unbiased and consistent estimator of $\mathrm{tr}(\Sigma^2)$. We calculated that $\tilde{\delta}_1/\tilde{\lambda}_1^2 = 0.163$ for DLBC lymphoma, $\tilde{\delta}_1/\tilde{\lambda}_1^2 = -0.082$ for follicular lymphoma, $\tilde{\delta}_1/\tilde{\lambda}_1^2 = -0.245$ for normal prostate and $\tilde{\delta}_1/\tilde{\lambda}_1^2 = -0.235$ for prostate tumor. From these observations, we concluded that these data sets satisfy (A-ii). In addition, from Remark 3.1 given in Section 3, by using the Jarque–Bera test, we could confirm that these data sets satisfy (A-i) at the significance level 0.05. On the other hand, it is difficult to check whether (A-iii) holds or not. However, from Remark 2.2, (A-iii) should be a natural condition under (A-ii), so we assume (A-iii) for these data sets. Hence, from Theorem 2.2, we constructed a 95% confidence interval of the first contribution ratio for each data set by choosing $(a, b)$ as in Remark 2.5. The results are summarized in Table 2.

Table 2
The 95% confidence interval (CI) of the first contribution ratio, together with $\tilde{\lambda}_1$ and $\tilde{\kappa}$, for the microarray data sets.

| Data set | $(n, d)$ | CI | $\tilde{\lambda}_1$ | $\tilde{\kappa}$ |
| --- | --- | --- | --- | --- |
| DLBC lymphoma | (58, 7129) | [0.183, 0.322] | 1768 | 5361 |
| Follicular lymphoma | (19, 7129) | [0.178, 0.467] | 2203 | 4926 |
| Normal prostate | (50, 12625) | [0.422, 0.622] | 6637 | 5988 |
| Prostate tumor | (52, 12625) | [0.374, 0.569] | 5976 | 6649 |
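The following sketch (our own; the grid search for $(a, b)$ is a simple stand-in for any exact minimization) implements the interval (2.3) with the choice of Remark 2.5; the sample values for DLBC lymphoma are taken from Table 2:

```python
import numpy as np
from scipy.stats import chi2

def optimal_ab(nu, alpha=0.05, grid=10**5):
    """Grid search for argmin (1/a - 1/b) s.t. G_nu(b) - G_nu(a) = 1 - alpha."""
    lower = np.linspace(1e-6, alpha, grid)    # lower-tail mass P(chi2_nu < a)
    a = chi2.ppf(lower, nu)
    b = chi2.ppf(lower + 1 - alpha, nu)
    k = np.argmin(1.0 / a - 1.0 / b)
    return a[k], b[k]

def contribution_ratio_ci(tilde_lam1, tilde_kappa, n, alpha=0.05):
    a, b = optimal_ab(n - 1, alpha)
    lo = (n - 1) * tilde_lam1 / (b * tilde_kappa + (n - 1) * tilde_lam1)
    hi = (n - 1) * tilde_lam1 / (a * tilde_kappa + (n - 1) * tilde_lam1)
    return lo, hi

# DLBC lymphoma in Table 2: n = 58, tilde_lam1 = 1768, tilde_kappa = 5361
print(contribution_ratio_ci(1768.0, 5361.0, 58))   # approx. [0.183, 0.322]
```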
2.3. Test of mean vector

We consider the following one-sample test for the mean vector:

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \ne \mu_0, \tag{2.4}$$

where $\mu_0$ is a candidate mean vector such as $\mu_0 = 0$. Here, we have the following result.

Lemma 2.2. Under (A-ii), it holds that

$$\frac{\|\bar{x} - \mu\|^2 - \mathrm{tr}(S_D)/n}{\lambda_1} = \bar{z}_1^2 - \frac{\|z_{o1}/\sqrt{n-1}\|^2}{n} + o_p(1)$$

as $d \to \infty$ when $n$ is fixed.

Let

$$F_0 = \frac{n\|\bar{x} - \mu_0\|^2 - \mathrm{tr}(S_D)}{\tilde{\lambda}_1} + 1.$$

Note that $E\{\tilde{\lambda}_1 (F_0 - 1)/n\} = \|\mu - \mu_0\|^2$. Then, by combining Theorem 2.1 and Lemma 2.2, we have the following result.

Theorem 2.3. Under (A-i) to (A-iii), it holds that

$$F_0 \Rightarrow F_{1,n-1} \quad \text{under } H_0 \text{ in (2.4)}$$

as $d \to \infty$ when $n$ is fixed, where $F_{\nu_1,\nu_2}$ denotes a random variable distributed as the $F$ distribution with degrees of freedom $\nu_1$ and $\nu_2$.

For a given $\alpha \in (0, 1/2)$, we test (2.4) by

accepting $H_1 \iff F_0 > F_{1,n-1}(\alpha)$,

where $F_{\nu_1,\nu_2}(\alpha)$ denotes the upper $\alpha$ point of the $F$ distribution with degrees of freedom $\nu_1$ and $\nu_2$. Then, under (A-i) to (A-iii), it holds that size $= \alpha + o(1)$ as $d \to \infty$ when $n$ is fixed. For the same gene expression data as in Section 2.2, we tested (2.4) with $\mu_0 = 0$ and $\alpha = 0.05$. We observed that $H_1$ was accepted for all four data sets.
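For instance, the test can be carried out as in the following sketch (our own code, under (A-i) to (A-iii)); `X` is a $d \times n$ data matrix and `mu0` the candidate mean vector:

```python
import numpy as np
from scipy.stats import f as f_dist

def one_sample_test(X, mu0, alpha=0.05):
    """Return (F_0, accept_H1) for the test (2.4)."""
    d, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    S_D = Xc.T @ Xc / (n - 1)                # dual sample covariance matrix
    hat_lam1 = np.linalg.eigvalsh(S_D)[-1]
    tilde_lam1 = hat_lam1 - (np.trace(S_D) - hat_lam1) / (n - 2)   # NR, (2.1)
    xbar = X.mean(axis=1)
    F0 = (n * np.sum((xbar - mu0) ** 2) - np.trace(S_D)) / tilde_lam1 + 1.0
    crit = f_dist.ppf(1 - alpha, 1, n - 1)   # upper alpha point of F_{1,n-1}
    return F0, F0 > crit                     # True: accept H_1
```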
3. First PC direction and PC score

In this section, we give asymptotic properties of the first PC direction and PC score in the HDLSS context.
3.1. Asymptotic properties of the first PC direction
Let $\hat{H} = [\hat{h}_1, \ldots, \hat{h}_d]$, where $\hat{H}$ is a $d \times d$ orthogonal matrix of the sample eigenvectors such that $\hat{H}^T S \hat{H} = \hat{\Lambda}$ having $\hat{\Lambda} = \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_d)$. We assume $\hat{h}_i^T h_i \ge 0$ w.p.1 for all $i$ without loss of generality. Note that $\hat{h}_i$ can be calculated by $\hat{h}_i = \{(n-1)\hat{\lambda}_i\}^{-1/2}(X - \bar{X})\hat{u}_i$. First, we have the following result.

Lemma 3.1. Under (A-ii) and (A-iii), it holds that

$$\hat{h}_1^T h_1 - \left(1 + \frac{\kappa}{\lambda_1 \|z_{o1}\|^2}\right)^{-1/2} = o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$.

If $\kappa/(n\lambda_1) = o(1)$ as $d \to \infty$ and $n \to \infty$, $\hat{h}_1$ is a consistent estimator of $h_1$ in the sense that $\hat{h}_1^T h_1 = 1 + o_p(1)$. When $n$ is fixed, $\hat{h}_1$ is not a consistent estimator because $\liminf_{d \to \infty} \kappa/\lambda_1 > 0$. In order to overcome this inconvenience, we consider applying the NR methodology to the PC direction vector. Let $\tilde{h}_i = \{(n-1)\tilde{\lambda}_i\}^{-1/2}(X - \bar{X})\hat{u}_i$. From Lemma 3.1, we have the following result.

Theorem 3.1. Under (A-ii) and (A-iii), it holds that

$$\tilde{h}_1^T h_1 = 1 + o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$.

Note that $\|\tilde{h}_1\|^2 = \hat{\lambda}_1/\tilde{\lambda}_1 \ge 1$ w.p.1. We emphasize that $\tilde{h}_1$ is a consistent estimator of $h_1$ in the sense of the inner product even when $n$ is fixed, though $\tilde{h}_1$ is not a unit vector. We give an application of $\tilde{h}_1$ in Section 4.

3.2. Asymptotic properties of the first PC score

Let $z_{oij} = z_{ij} - \bar{z}_i$ for all $i, j$. Note that $z_{oi} = (z_{oi1}, \ldots, z_{oin})^T$ for all $i$. First, we have the following result.

Lemma 3.2. Under (A-ii) and (A-iii), it holds that

$$\hat{u}_{1j} = z_{o1j}/\|z_{o1}\| + o_p(1) \quad \text{for } j = 1, \ldots, n$$

as $d \to \infty$ when $n$ is fixed.

Remark 3.1. From Lemma 3.2, by using the $\hat{u}_{1j}$s and a test of normality such as the Jarque–Bera test, one can check whether (A-i) holds or not.

By applying the NR methodology to the first PC score, we obtain an estimate by $\tilde{s}_{1j} = \{(n-1)\tilde{\lambda}_1\}^{1/2} \hat{u}_{1j}$, $j = 1, \ldots, n$. A sample mean squared error of the first PC score is given by $\mathrm{MSE}(\tilde{s}_1) = n^{-1} \sum_{j=1}^n (\tilde{s}_{1j} - s_{1j})^2$. Then, from Theorem 2.1 and Lemma 3.2, we have the following result.
Theorem 3.2. Under (A-ii) and (A-iii), it holds that

$$\frac{1}{\sqrt{\lambda_1}}(\tilde{s}_{1j} - s_{1j}) = -\bar{z}_1 + o_p(1) \quad \text{for } j = 1, \ldots, n$$

as $d \to \infty$ when $n$ is fixed. Under (A-i) to (A-iii), it holds that

$$\sqrt{\frac{n}{\lambda_1}}(\tilde{s}_{1j} - s_{1j}) \Rightarrow N(0, 1) \quad \text{for } j = 1, \ldots, n; \qquad \text{and} \qquad \frac{n\,\mathrm{MSE}(\tilde{s}_1)}{\lambda_1} \Rightarrow \chi^2_1$$

as $d \to \infty$ when $n$ is fixed.

Remark 3.2. The conventional estimator of the first PC score is given by $\hat{s}_{1j} = \{(n-1)\hat{\lambda}_1\}^{1/2} \hat{u}_{1j}$, $j = 1, \ldots, n$. From Theorems 8.1 and 8.2 in Yata and Aoshima (2013), under (A-ii) and (A-iii), it holds that as $d \to \infty$ and $n \to \infty$

$$\frac{\mathrm{MSE}(\hat{s}_1)}{\lambda_1} = o_p(1) \ \text{ if } \kappa/(n\lambda_1) = o(1), \qquad \text{and} \qquad \frac{\mathrm{MSE}(\tilde{s}_1)}{\lambda_1} = o_p(1).$$
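Putting Sections 3.1 and 3.2 together, a minimal sketch (our own, not the authors' code) of the NR estimates $\tilde{h}_1$ and $\tilde{s}_{1j}$ is as follows; the sign convention applied to $\hat{u}_1$ is an arbitrary stand-in for the assumption $\hat{h}_1^T h_1 \ge 0$, which involves the unknown $h_1$:

```python
import numpy as np

def first_pc_nr(X):
    """NR estimates of (lambda_1, h_1, s_1j) from a d x n data matrix X."""
    d, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    S_D = Xc.T @ Xc / (n - 1)
    evals, evecs = np.linalg.eigh(S_D)
    hat_lam1, u1 = evals[-1], evecs[:, -1]
    tilde_lam1 = hat_lam1 - (np.trace(S_D) - hat_lam1) / (n - 2)   # NR, (2.1)
    h_tilde1 = Xc @ u1 / np.sqrt((n - 1) * tilde_lam1)   # tilde-h_1 (not a unit vector)
    if h_tilde1.sum() < 0:                               # arbitrary sign convention
        u1, h_tilde1 = -u1, -h_tilde1
    s_tilde1 = np.sqrt((n - 1) * tilde_lam1) * u1        # scores tilde-s_1j, j = 1,...,n
    return tilde_lam1, h_tilde1, s_tilde1
```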
4. Equality tests of two covariance matrices

In this section, we consider the test of equality of two covariance matrices in the HDLSS context. Even though there are a variety of tests that deal with covariance matrices when $d \to \infty$ and $n \to \infty$, there seem to be no tests available in the HDLSS context such as $d \to \infty$ while $n$ is fixed.

Suppose we have two independent $d \times n_i$ data matrices, $X_i = [x_{1(i)}, \ldots, x_{n_i(i)}]$, $i = 1, 2$, where $x_{j(i)}$, $j = 1, \ldots, n_i$, are i.i.d. as a $d$-dimensional distribution, $\pi_i$, having a mean vector $\mu_i$ and covariance matrix $\Sigma_i\ (\ge O)$. We assume $n_i \ge 3$, $i = 1, 2$. The eigen-decomposition of $\Sigma_i$ is given by $\Sigma_i = H_i \Lambda_i H_i^T$, where $\Lambda_i = \mathrm{diag}(\lambda_{1(i)}, \ldots, \lambda_{d(i)})$ having $\lambda_{1(i)} \ge \cdots \ge \lambda_{d(i)}\ (\ge 0)$ and $H_i = [h_{1(i)}, \ldots, h_{d(i)}]$ is an orthogonal matrix of the corresponding eigenvectors. We assume that $\liminf_{d \to \infty} \lambda_{1(i)}/\lambda_{2(i)} > 1$ for $i = 1, 2$. Also, we assume that $\limsup_{d \to \infty} E(z_{sj}^4) < \infty$ for all $s, j$ and $P(\lim_{d \to \infty} \|z_{o1}\| \ne 0) = 1$ for each $\pi_i$.

4.1. Equality test using the largest eigenvalues

We consider the following test for the largest eigenvalues:

$$H_0: \lambda_{1(1)} = \lambda_{1(2)} \quad \text{vs.} \quad H_a: \lambda_{1(1)} \ne \lambda_{1(2)} \ \ (\text{or } H_b: \lambda_{1(1)} < \lambda_{1(2)}). \tag{4.1}$$

Let $\tilde{\lambda}_{1(i)}$ be the estimate of $\lambda_{1(i)}$ by the NR methodology as in (2.1) for $\pi_i$. Let $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$. From Theorem 2.1, we have the following result.

Corollary 4.1. Under (A-i) to (A-iii) for each $\pi_i$, it holds that

$$\frac{\tilde{\lambda}_{1(1)}/\lambda_{1(1)}}{\tilde{\lambda}_{1(2)}/\lambda_{1(2)}} \Rightarrow F_{\nu_1,\nu_2}$$

as $d \to \infty$ when the $n_i$s are fixed.

Let $F_1 = \tilde{\lambda}_{1(1)}/\tilde{\lambda}_{1(2)}$. For a given $\alpha \in (0, 1/2)$, we test (4.1) by

accepting $H_a \iff F_1 \notin [\{F_{\nu_2,\nu_1}(\alpha/2)\}^{-1},\ F_{\nu_1,\nu_2}(\alpha/2)]$ (4.2)

or

accepting $H_b \iff F_1 < \{F_{\nu_2,\nu_1}(\alpha)\}^{-1}$. (4.3)

Then, under (A-i) to (A-iii) for each $\pi_i$, it holds that size $= \alpha + o(1)$ as $d \to \infty$ when the $n_i$s are fixed.

Now, we consider a test by the conventional estimator, $\hat{\lambda}_{1(i)}$. Let $\kappa_i = \mathrm{tr}(\Sigma_i) - \lambda_{1(i)} = \sum_{s=2}^d \lambda_{s(i)}$ for $i = 1, 2$. From Proposition 2.1, if $\kappa_i/\lambda_{1(i)} = o(1)$, $i = 1, 2$, under (A-i) for each $\pi_i$ it holds that

$$\frac{\hat{\lambda}_{1(1)}/\lambda_{1(1)}}{\hat{\lambda}_{1(2)}/\lambda_{1(2)}} \Rightarrow F_{\nu_1,\nu_2}$$

as $d \to \infty$ when the $n_i$s are fixed. As mentioned in Section 2, the condition '$\kappa_i/\lambda_{1(i)} = o(1)$ for $i = 1, 2$' is quite strict in real high-dimensional data analyses; see Table 2 for example. Hereafter, we assume $\liminf_{d \to \infty} \kappa_i/\lambda_{1(i)} > 0$ for $i = 1, 2$.
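For reference, a minimal sketch (our own) of the two-sided test (4.2), given the NR estimates $\tilde{\lambda}_{1(i)}$ from each sample:

```python
from scipy.stats import f as f_dist

def test_largest_eigenvalues(tilde_lam1_1, tilde_lam1_2, n1, n2, alpha=0.05):
    """Return (F_1, accept_Ha) for the two-sided test (4.2) of (4.1)."""
    nu1, nu2 = n1 - 1, n2 - 1
    F1 = tilde_lam1_1 / tilde_lam1_2
    lower = 1.0 / f_dist.ppf(1 - alpha / 2, nu2, nu1)   # {F_{nu2,nu1}(alpha/2)}^{-1}
    upper = f_dist.ppf(1 - alpha / 2, nu1, nu2)         # F_{nu1,nu2}(alpha/2)
    return F1, not (lower <= F1 <= upper)               # True: accept H_a
```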
4.2. Equality test using the largest eigenvalues and their PC directions

We consider the following test using the largest eigenvalues and their PC directions:

$$H_0: (\lambda_{1(1)}, h_{1(1)}) = (\lambda_{1(2)}, h_{1(2)}) \quad \text{vs.} \quad H_a: (\lambda_{1(1)}, h_{1(1)}) \ne (\lambda_{1(2)}, h_{1(2)}). \tag{4.4}$$
Let $\tilde{h}_{1(i)}$ be the estimator of the first PC direction for $\pi_i$ by the NR methodology given in Section 3.1. We assume $h_{1(i)}^T \tilde{h}_{1(i)} \ge 0$ w.p.1 for $i = 1, 2$, without loss of generality. Here, we have the following result.

Lemma 4.1. Under (A-ii) and (A-iii) for each $\pi_i$, it holds that

$$\tilde{h}_{1(1)}^T \tilde{h}_{1(2)} = h_{1(1)}^T h_{1(2)} + o_p(1)$$

as $d \to \infty$ either when $n_i$ is fixed or $n_i \to \infty$ for $i = 1, 2$.
We note that under $H_0$ in (4.4),

$$(\lambda_{1(i)} h_{1(i)})^T (\lambda_{1(j)}^{-1} h_{1(j)}) = 1 \quad \text{for } i = 1, 2;\ j \ne i.$$

Hence, one may consider a test statistic such as $F_1 |\tilde{h}_{1(1)}^T \tilde{h}_{1(2)}|$ or $F_1 |\tilde{h}_{1(1)}^T \tilde{h}_{1(2)}|^{-1}$. From Corollary 4.1 and Lemma 4.1, $F_1 |\tilde{h}_{1(1)}^T \tilde{h}_{1(2)}|$ and $F_1 |\tilde{h}_{1(1)}^T \tilde{h}_{1(2)}|^{-1}$ are asymptotically distributed as $F_{\nu_1,\nu_2}$ under $H_0$. Let $\tilde{h} = \max\{|\tilde{h}_{1(1)}^T \tilde{h}_{1(2)}|,\ |\tilde{h}_{1(1)}^T \tilde{h}_{1(2)}|^{-1}\}$. Note that $\tilde{h} \ge 1$ w.p.1. Then, in view of the power, we give a test statistic for (4.4) as follows:

$$F_2 = \frac{\tilde{\lambda}_{1(1)}}{\tilde{\lambda}_{1(2)}}\,\tilde{h}_*\ (= F_1 \tilde{h}_*), \quad \text{where} \quad \tilde{h}_* = \begin{cases} \tilde{h} & \text{if } \tilde{\lambda}_{1(1)} \ge \tilde{\lambda}_{1(2)}, \\ \tilde{h}^{-1} & \text{otherwise}. \end{cases}$$

From Lemma 4.1, we have the following result.

Theorem 4.1. Under (A-i) to (A-iii) for each $\pi_i$, it holds that

$$F_2 \Rightarrow F_{\nu_1,\nu_2} \quad \text{under } H_0 \text{ in (4.4)}$$

as $d \to \infty$ when the $n_i$s are fixed.

From Theorem 4.1, we consider testing (4.4) by (4.2) with $F_2$ instead of $F_1$. Then, the size becomes close to $\alpha$ as $d$ increases.

4.3. Equality test of the covariance matrices

We consider the following test for the covariance matrices:

$$H_0: \Sigma_1 = \Sigma_2 \quad \text{vs.} \quad H_a: \Sigma_1 \ne \Sigma_2. \tag{4.5}$$

When $d \to \infty$ and the $n_i$s are fixed, one can estimate the $\lambda_{1(i)}$s and $h_{1(i)}$s by the NR methodology; however, one cannot estimate the $\lambda_{j(i)}$s and $h_{j(i)}$s for $j = 2, \ldots, d$. Instead, we consider estimating the $\kappa_i$s. Let $S_{D(i)}$ be the dual sample covariance matrix for $\pi_i$. We estimate $\kappa_i$ by $\tilde{\kappa}_i = \mathrm{tr}(S_{D(i)}) - \tilde{\lambda}_{1(i)}$ for $i = 1, 2$. From Lemma 2.1, under (A-ii) and (A-iii) for each $\pi_i$, the $\tilde{\kappa}_i$s are consistent estimators of the $\kappa_i$s in the sense that $\tilde{\kappa}_i/\kappa_i = 1 + o_p(1)$ as $d \to \infty$ when the $n_i$s are fixed. Let $\tilde{\gamma} = \max\{\tilde{\kappa}_1/\tilde{\kappa}_2,\ \tilde{\kappa}_2/\tilde{\kappa}_1\}$. Similar to $F_2$, we give a test statistic for (4.5) as follows:

$$F_3 = \frac{\tilde{\lambda}_{1(1)}}{\tilde{\lambda}_{1(2)}}\,\tilde{h}_* \tilde{\gamma}_*\ (= F_2 \tilde{\gamma}_*), \quad \text{where} \quad \tilde{\gamma}_* = \begin{cases} \tilde{\gamma} & \text{if } \tilde{\lambda}_{1(1)} \ge \tilde{\lambda}_{1(2)}, \\ \tilde{\gamma}^{-1} & \text{otherwise}. \end{cases}$$
Then, we have the following result.

Theorem 4.2. Under (A-i) to (A-iii) for each $\pi_i$, it holds that

$$F_3 \Rightarrow F_{\nu_1,\nu_2} \quad \text{under } H_0 \text{ in (4.5)}$$

as $d \to \infty$ when the $n_i$s are fixed.

From Theorem 4.2, we consider testing (4.5) by (4.2) with $F_3$ instead of $F_1$. Then, the size becomes close to $\alpha$ as $d$ increases. A sketch of how $F_1$, $F_2$ and $F_3$ can be computed is given after Table 3.

We analyzed the lymphoma data given by Shipp et al. (2002) and the prostate cancer data given by Singh et al. (2002), which are the same gene expression data as in Section 2.2. When each sample is standardized, we note that $\tilde{\kappa}_1 \approx \tilde{\kappa}_2$ if $\lambda_{1(i)}/\kappa_i = o(1)$, $i = 1, 2$, since $\mathrm{tr}(S_{D(1)}) = \mathrm{tr}(S_{D(2)}) = d$, so that one loses information about the difference between $\kappa_1$ and $\kappa_2$. Hence, we did not standardize each sample. We set $\alpha = 0.05$. We considered two cases: (I) $\pi_1$: DLBC lymphoma ($n_1 = 58$) and $\pi_2$: follicular lymphoma ($n_2 = 19$); and (II) $\pi_1$: normal prostate ($n_1 = 50$) and $\pi_2$: prostate tumor ($n_2 = 52$). We compared the performance of $F_3$ with two other test statistics, $Q_2^2$ and $T_2^2$, by Srivastava and Yanagihara (2010). The results are summarized in Table 3. We observed that $F_3$ accepted $H_a$ for (I) and $H_0$ for (II); namely, $F_3$ rejected $H_0$ in (4.5) for (I). On the other hand, $Q_2^2$ and $T_2^2$ did not work for these data sets because they are established under the severe conditions that $0 < \lim_{d \to \infty} \mathrm{tr}(\Sigma^i)/d < \infty$ ($i = 1, \ldots, 4$) and $d^{1/2}/n = o(1)$. As observed in Table 1, these conditions seem not to hold for these data sets. Hence, there is no theoretical guarantee for the results by $Q_2^2$ and $T_2^2$.
Table 3
Tests of $H_0: \Sigma_1 = \Sigma_2$ vs. $H_a: \Sigma_1 \ne \Sigma_2$ with size 0.05 for two data sets: (I) lymphoma data with $d = 7129$ given by Shipp et al. (2002) and (II) prostate cancer data with $d = 12625$ given by Singh et al. (2002).

| | $H_a$ by $F_3$ | $H_a$ by $Q_2^2$ | $H_a$ by $T_2^2$ |
| --- | --- | --- | --- |
| (I) $\pi_1$: DLBC, $\pi_2$: Follicular | Accept | Accept | Reject |
| (II) $\pi_1$: Normal, $\pi_2$: Tumor | Reject | Reject | Reject |
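The following is a minimal sketch (our own, under the stated assumptions) of how $F_1$, $F_2$ and $F_3$ can be computed from two data matrices; it can be combined with the rejection rule of (4.2):

```python
import numpy as np

def equality_test_statistics(X1, X2):
    """Return (F_1, F_2, F_3) from two d x n_i data matrices X1, X2."""
    est = []
    for X in (X1, X2):
        n = X.shape[1]
        Xc = X - X.mean(axis=1, keepdims=True)
        S_D = Xc.T @ Xc / (n - 1)
        evals, evecs = np.linalg.eigh(S_D)
        hat1, u1 = evals[-1], evecs[:, -1]
        tilde1 = hat1 - (np.trace(S_D) - hat1) / (n - 2)   # NR estimate, (2.1)
        h1 = Xc @ u1 / np.sqrt((n - 1) * tilde1)           # tilde-h_{1(i)}
        kappa = np.trace(S_D) - tilde1                     # tilde-kappa_i
        est.append((tilde1, h1, kappa))
    (l1, h1, k1), (l2, h2, k2) = est
    F1 = l1 / l2
    inner = abs(h1 @ h2)
    h = max(inner, 1.0 / inner)                            # tilde-h >= 1
    g = max(k1 / k2, k2 / k1)                              # tilde-gamma >= 1
    h_star = h if l1 >= l2 else 1.0 / h
    g_star = g if l1 >= l2 else 1.0 / g
    return F1, F1 * h_star, F1 * h_star * g_star           # F_1, F_2, F_3
```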
5. Numerical results and discussions

5.1. Comparisons of the estimates on the first PC
In this section, we compared the performance of $\tilde{\lambda}_1$, $\tilde{h}_1$ and $\tilde{s}_{1j}$ with their conventional counterparts by Monte Carlo simulations. We set $d = 2^k$, $k = 3, \ldots, 11$, and $n = 10$. We considered two cases for the $\lambda_i$s: (a) $\lambda_i = d^{1/i}$, $i = 1, \ldots, d$, and (b) $\lambda_i = d^{3/(2+2i)}$, $i = 1, \ldots, d$. Note that $\lambda_1 = d$ for (a) and $\lambda_1 = d^{3/4}$ for (b). Also, note that (A-ii) holds both for (a) and (b). Let $d_* = \lceil d^{1/2} \rceil$, where $\lceil x \rceil$ denotes the smallest integer $\ge x$. We considered a non-Gaussian distribution as follows: $(z_{1j}, \ldots, z_{d-d_*\,j})^T$, $j = 1, \ldots, n$, are i.i.d. as $N_{d-d_*}(0, I_{d-d_*})$, and $(z_{d-d_*+1\,j}, \ldots, z_{dj})^T$, $j = 1, \ldots, n$, are i.i.d. as the $d_*$-variate $t$-distribution, $t_{d_*}(0, I_{d_*}, 10)$, with mean zero, covariance matrix $I_{d_*}$ and 10 degrees of freedom, where $(z_{1j}, \ldots, z_{d-d_*\,j})^T$ and $(z_{d-d_*+1\,j}, \ldots, z_{dj})^T$ are independent for each $j$. Note that (A-i) and (A-iii) hold both for (a) and (b) from the fact that

$$\sum_{r,s \ge 2}^d \lambda_r \lambda_s E\{(z_{rk}^2 - 1)(z_{sk}^2 - 1)\} = 2 \sum_{s=2}^{d-d_*} \lambda_s^2 + O\Big(\sum_{r,s \ge d-d_*+1} \lambda_r \lambda_s\Big) = o(\lambda_1^2).$$

The findings were obtained by averaging the outcomes from 2000 ($= R$, say) replications. Under a fixed scenario, suppose that the $r$th replication ends with estimates $(\hat{\lambda}_{1r}, \hat{h}_{1r}, \mathrm{MSE}(\hat{s}_1)_r)$ and $(\tilde{\lambda}_{1r}, \tilde{h}_{1r}, \mathrm{MSE}(\tilde{s}_1)_r)$ ($r = 1, \ldots, R$). Let us simply write $\hat{\lambda}_1 = R^{-1} \sum_{r=1}^R \hat{\lambda}_{1r}$ and $\tilde{\lambda}_1 = R^{-1} \sum_{r=1}^R \tilde{\lambda}_{1r}$. We also considered the Monte Carlo variability by $\mathrm{var}(\hat{\lambda}_1/\lambda_1) = (R-1)^{-1} \sum_{r=1}^R (\hat{\lambda}_{1r} - \hat{\lambda}_1)^2/\lambda_1^2$ and $\mathrm{var}(\tilde{\lambda}_1/\lambda_1) = (R-1)^{-1} \sum_{r=1}^R (\tilde{\lambda}_{1r} - \tilde{\lambda}_1)^2/\lambda_1^2$. Fig. 1 shows the behaviors of $(\hat{\lambda}_1/\lambda_1, \tilde{\lambda}_1/\lambda_1)$ in the left panel and $(\mathrm{var}(\hat{\lambda}_1/\lambda_1), \mathrm{var}(\tilde{\lambda}_1/\lambda_1))$ in the right panel for (a) and (b). We gave the asymptotic variance of $\tilde{\lambda}_1/\lambda_1$ by $\mathrm{Var}\{\chi^2_{n-1}/(n-1)\} = 0.222$ from Theorem 2.1 and showed it by the solid line in the right panel. We observed that the sample mean and variance of $\tilde{\lambda}_1/\lambda_1$ become close to those asymptotic values as $d$ increases.

[Fig. 1. The values of A: $\hat{\lambda}_1/\lambda_1$ and B: $\tilde{\lambda}_1/\lambda_1$ are denoted by the dashed lines for (a) and by the dotted lines for (b) in the left panel. The values of A: $\mathrm{var}(\hat{\lambda}_1/\lambda_1)$ and B: $\mathrm{var}(\tilde{\lambda}_1/\lambda_1)$ are denoted by the dashed lines for (a) and by the dotted lines for (b) in the right panel. The asymptotic variance of $\tilde{\lambda}_1/\lambda_1$, $\mathrm{Var}\{\chi^2_{n-1}/(n-1)\} = 0.222$, is denoted by the solid line in the right panel.]

Similarly, we plotted $(\hat{h}_1^T h_1, \tilde{h}_1^T h_1)$ and $(\mathrm{var}(\hat{h}_1^T h_1), \mathrm{var}(\tilde{h}_1^T h_1))$ in Fig. 2, and $(\mathrm{MSE}(\hat{s}_1)/\lambda_1, \mathrm{MSE}(\tilde{s}_1)/\lambda_1)$ and $(\mathrm{var}(\mathrm{MSE}(\hat{s}_1)/\lambda_1), \mathrm{var}(\mathrm{MSE}(\tilde{s}_1)/\lambda_1))$ in Fig. 3. From Theorem 3.2, we gave the asymptotic mean of $\mathrm{MSE}(\tilde{s}_1)/\lambda_1$ by $E(\chi^2_1/n) = 0.1$, shown by the solid line in the left panel of Fig. 3, and the asymptotic variance of $\mathrm{MSE}(\tilde{s}_1)/\lambda_1$ by $\mathrm{Var}(\chi^2_1/n) = 0.02$, shown in the right panel of Fig. 3. Throughout, the estimators by the NR method gave good performances both for (a) and (b) when $d$ is large. However, the conventional estimators gave poor performances, especially for (b). This is probably because the bias of the conventional estimators, $\kappa/\{(n-1)\lambda_1\}$, is large for (b) compared to (a). See Proposition 2.1 for the details.

[Fig. 2. The values of A: $\hat{h}_1^T h_1$ and B: $\tilde{h}_1^T h_1$ are denoted by the dashed lines for (a) and by the dotted lines for (b) in the left panel. The values of A: $\mathrm{var}(\hat{h}_1^T h_1)$ and B: $\mathrm{var}(\tilde{h}_1^T h_1)$ are denoted by the dashed lines for (a) and by the dotted lines for (b) in the right panel.]

[Fig. 3. The values of A: $\mathrm{MSE}(\hat{s}_1)/\lambda_1$ and B: $\mathrm{MSE}(\tilde{s}_1)/\lambda_1$ are denoted by the dashed lines for (a) and by the dotted lines for (b) in the left panel. The values of A: $\mathrm{var}(\mathrm{MSE}(\hat{s}_1)/\lambda_1)$ and B: $\mathrm{var}(\mathrm{MSE}(\tilde{s}_1)/\lambda_1)$ are denoted by the dashed lines for (a) and by the dotted lines for (b) in the right panel. The asymptotic mean and variance of $\mathrm{MSE}(\tilde{s}_1)/\lambda_1$, $E(\chi^2_1/n) = 0.1$ and $\mathrm{Var}(\chi^2_1/n) = 0.02$, are denoted by the solid lines in the respective panels.]

5.2. Equality tests of two covariance matrices

We used computer simulations to study the performance of the test procedures by (4.2) with $F_1$ for (4.1), $F_2$ for (4.4) and $F_3$ for (4.5). We set $\alpha = 0.05$. Independent pseudo-random normal observations were generated from $\pi_i: N_d(0, \Sigma_i)$, $i = 1, 2$. We set $(n_1, n_2) = (15, 25)$. We considered the cases $d = 2^k$, $k = 4, \ldots, 12$, and

$$\Sigma_i = \begin{pmatrix} \Sigma_{i(1)} & O_{2,d-2} \\ O_{d-2,2} & \Sigma_{i(2)} \end{pmatrix}, \quad i = 1, 2, \tag{5.1}$$
where $O_{k,l}$ is the $k \times l$ zero matrix, $\Sigma_{1(1)} = \mathrm{diag}(d^{3/4}, d^{1/2})$ and $\Sigma_{1(2)} = (0.3^{|s-t|})$. For the alternative hypotheses, we set

$$\Sigma_{2(1)} = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} \mathrm{diag}(3d^{3/4},\ 1.5d^{1/2}) \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} \tag{5.2}$$

and $\Sigma_{2(2)} = 1.5(0.3^{|s-t|})$. Note that $\lambda_{1(2)}/\lambda_{1(1)} = 3$, $\kappa_2/\kappa_1 = 1.5$ and $h_{1(1)}^T h_{1(2)} = 1/\sqrt{2}$. Also, note that (A-i) to (A-iii) hold for each $\pi_i$. Let $h = \max\{|h_{1(1)}^T h_{1(2)}|,\ |h_{1(1)}^T h_{1(2)}|^{-1}\}$ and $\gamma = \max\{\kappa_1/\kappa_2,\ \kappa_2/\kappa_1\}$. From Lemmas 2.1 and 4.1, it holds that $\tilde{h} = h + o_p(1)$ and $\tilde{\gamma} = \gamma + o_p(1)$. Thus, from Corollary 4.1, Theorems 4.1 and 4.2, we obtained the asymptotic powers of $F_1$, $F_2$ and $F_3$ with $(\tilde{h}_*, \tilde{\gamma}_*) = (h^{-1}, \gamma^{-1})$ as follows:

$$\mathrm{Power}(F_1) = P\big((\lambda_{1(1)}/\lambda_{1(2)})f \notin [\{F_{\nu_2,\nu_1}(\alpha/2)\}^{-1},\ F_{\nu_1,\nu_2}(\alpha/2)]\big) = 0.577,$$

$$\mathrm{Power}(F_2) = P\big(h^{-1}(\lambda_{1(1)}/\lambda_{1(2)})f \notin [\{F_{\nu_2,\nu_1}(\alpha/2)\}^{-1},\ F_{\nu_1,\nu_2}(\alpha/2)]\big) = 0.823$$

and

$$\mathrm{Power}(F_3) = P\big(\gamma^{-1} h^{-1}(\lambda_{1(1)}/\lambda_{1(2)})f \notin [\{F_{\nu_2,\nu_1}(\alpha/2)\}^{-1},\ F_{\nu_1,\nu_2}(\alpha/2)]\big) = 0.963,$$

where $f$ denotes a random variable distributed as the $F$ distribution with degrees of freedom $\nu_1$ and $\nu_2$. Note that $\mathrm{Power}(F_2)$ and $\mathrm{Power}(F_3)$ give lower bounds of the asymptotic powers when $\tilde{h}_* = h^{-1}$ and $\tilde{\gamma}_* = \gamma^{-1}$.

In Fig. 4, we summarized the findings obtained by averaging the outcomes from 4000 ($= R$, say) replications. Here, the first 2000 replications were generated by setting $\Sigma_2 = \Sigma_1$ as in (5.1), and the last 2000 replications were generated by setting $\Sigma_2$ as in (5.2). Let $F_{ir}$ ($i = 1, 2, 3$) be the $r$th observation of $F_i$ for $r = 1, \ldots, 4000$. We defined $P_r = 1$ (or 0) when $H_0$ was falsely rejected (or not) for $r = 1, \ldots, 2000$, and $H_a$ was falsely rejected (or not) for $r = 2001, \ldots, 4000$. We defined $\bar{\alpha} = (R/2)^{-1} \sum_{r=1}^{R/2} P_r$ to estimate the size and $1 - \bar{\beta} = 1 - (R/2)^{-1} \sum_{r=R/2+1}^{R} P_r$ to estimate the power. Their standard deviations are less than 0.011. When $d$ is not sufficiently large, we observed that the sizes of $F_2$ and $F_3$ are considerably higher than $\alpha$. This is probably because $\tilde{h}\ (\ge 1)$ and $\tilde{\gamma}\ (\ge 1)$ are much larger than 1. Actually, the sizes became close to $\alpha$ as $d$ increased. When $d$ is large, $F_3$ gave excellent performances both for the size and the power.
[Fig. 4. The values of $\bar{\alpha}$ are denoted by the dashed lines in the left panel and the values of $1 - \bar{\beta}$ are denoted by the dashed lines in the right panel for $F_1$, $F_2$ and $F_3$. The asymptotic powers, $\mathrm{Power}(F_1) = 0.577$, $\mathrm{Power}(F_2) = 0.823$ and $\mathrm{Power}(F_3) = 0.963$, are denoted by the solid lines in the right panel.]
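The asymptotic power values above can be checked numerically; the following sketch (our own) evaluates them by Monte Carlo over $f \sim F_{\nu_1,\nu_2}$:

```python
import numpy as np
from scipy.stats import f as f_dist

nu1, nu2, alpha = 14, 24, 0.05                     # (n1, n2) = (15, 25)
lower = 1.0 / f_dist.ppf(1 - alpha / 2, nu2, nu1)  # {F_{nu2,nu1}(alpha/2)}^{-1}
upper = f_dist.ppf(1 - alpha / 2, nu1, nu2)        # F_{nu1,nu2}(alpha/2)
ratio = 1.0 / 3.0                                  # lambda_{1(1)} / lambda_{1(2)}
h, g = np.sqrt(2.0), 1.5                           # h = sqrt(2), gamma = 1.5

f = f_dist.rvs(nu1, nu2, size=10**6, random_state=1)
for name, stat in [("F1", ratio * f),
                   ("F2", ratio * f / h),
                   ("F3", ratio * f / (h * g))]:
    power = np.mean((stat < lower) | (stat > upper))
    print(name, round(power, 3))                   # approx. 0.577, 0.823, 0.963
```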
Acknowledgments

The authors would like to thank two anonymous referees for their comments. The research of the second author was partially supported by a Grant-in-Aid for Young Scientists (B), Japan Society for the Promotion of Science (JSPS), under Contract Number 26800078. The research of the third author was partially supported by Grants-in-Aid for Scientific Research (B) and Challenging Exploratory Research, JSPS, under Contract Numbers 22300094 and 26540010.

Appendix

Throughout, let $P_n = I_n - 1_n 1_n^T/n$, where $1_n = (1, \ldots, 1)^T$. Let $e_n = (e_1, \ldots, e_n)^T$ be an arbitrary (random) $n$-vector such that $\|e_n\| = 1$ and $e_n^T 1_n = 0$.

Proof of Proposition 2.1. We assume $\mu = 0$ without loss of generality. We write that $X^T X = \sum_{s=1}^{i_*} \lambda_s z_s z_s^T + \sum_{s=i_*+1}^d \lambda_s z_s z_s^T$ for $i_* = 1$ when $n$ is fixed, and for some fixed $i_*\ (\ge 1)$ when $n \to \infty$. Here, by using Markov's inequality, for any $\tau > 0$, under (A-ii) and (A-iii), we have that

$$P\left(\left|\sum_{j=1}^n \sum_{s=i_*+1}^d \frac{\lambda_s (z_{sj}^2 - 1)}{n\lambda_1}\right| > \tau\right) \le \frac{\sum_{r,s \ge 2}^d \lambda_r \lambda_s E\{(z_{rk}^2 - 1)(z_{sk}^2 - 1)\}}{\tau^2 n \lambda_1^2} \to 0$$

and

$$P\left(\Big(\sum_{j \ne j'}^n \sum_{s=i_*+1}^d \frac{\lambda_s z_{sj} z_{sj'}}{n\lambda_1}\Big)^2 > \tau\right) \le \frac{2\delta_{i_*}}{\tau \lambda_1^2} \to 0 \tag{A.1}$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. Note that $\sum_{j=1}^n e_j^4 \le 1$ and $\sum_{j \ne j'}^n e_j^2 e_{j'}^2 \le 1$. Then, under (A-ii) and (A-iii), we have that

$$\left|\sum_{j=1}^n e_j^2 \sum_{s=i_*+1}^d \frac{\lambda_s (z_{sj}^2 - 1)}{n\lambda_1}\right| \le \Big(\sum_{j=1}^n e_j^4\Big)^{1/2} \left\{\sum_{j=1}^n \Big(\sum_{s=i_*+1}^d \frac{\lambda_s (z_{sj}^2 - 1)}{n\lambda_1}\Big)^2\right\}^{1/2} = o_p(1)$$

and

$$\left|\sum_{j \ne j'}^n e_j e_{j'} \sum_{s=i_*+1}^d \frac{\lambda_s z_{sj} z_{sj'}}{n\lambda_1}\right| \le \Big(\sum_{j \ne j'}^n e_j^2 e_{j'}^2\Big)^{1/2} \left\{\sum_{j \ne j'}^n \Big(\sum_{s=i_*+1}^d \frac{\lambda_s z_{sj} z_{sj'}}{n\lambda_1}\Big)^2\right\}^{1/2} = o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. Thus, we claim that

$$e_n^T \frac{X^T X}{(n-1)\lambda_1} e_n = e_n^T \frac{\sum_{s=1}^{i_*} \lambda_s z_s z_s^T}{(n-1)\lambda_1} e_n + \frac{\kappa}{(n-1)\lambda_1} + o_p(1) \tag{A.2}$$

from the fact that $\sum_{s=i_*+1}^d \lambda_s/\{(n-1)\lambda_1\} = \kappa/\{(n-1)\lambda_1\} + o(1)$ when $n \to \infty$. Note that $e_n^T P_n = e_n^T$ and $P_n z_s = z_{os}$ for all $s$. Also, note that $z_{os}^T z_{os'}/n = o_p(1)$ for $s \ne s'$ as $n \to \infty$ from the fact that $E\{(z_{os}^T z_{os'}/n)^2\} = o(1)$ as $n \to \infty$. Then, by noting that $P(\lim_{d \to \infty} \|z_{o1}\| \ne 0) = 1$, $\liminf_{d \to \infty} \lambda_1/\lambda_2 > 1$ and $z_{o1}^T 1_n = 0$, it holds that

$$\max_{e_n}\ e_n^T \frac{\sum_{s=1}^{i_*} \lambda_s z_s z_s^T}{(n-1)\lambda_1} e_n = \max_{e_n}\ e_n^T \frac{\sum_{s=1}^{i_*} \lambda_s z_{os} z_{os}^T}{(n-1)\lambda_1} e_n = \|z_{o1}/\sqrt{n-1}\|^2 + o_p(1) \tag{A.3}$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. Note that $\hat{u}_1^T 1_n = 0$ and $\hat{u}_1^T P_n = \hat{u}_1^T$ when $S_D \ne O$. Then, from (A.2), (A.3) and $P_n X^T X P_n/(n-1) = S_D$, under (A-ii) and (A-iii), we have that

$$\hat{u}_1^T \frac{S_D}{\lambda_1} \hat{u}_1 = \hat{u}_1^T \frac{X^T X}{(n-1)\lambda_1} \hat{u}_1 = \|z_{o1}/\sqrt{n-1}\|^2 + \frac{\kappa}{(n-1)\lambda_1} + o_p(1) \tag{A.4}$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. It concludes the result. $\square$
Proof of Lemma 2.1. By using Markov's inequality, for any $\tau > 0$, under (A-ii) and (A-iii), we have that

$$P\left(\left|\sum_{s=2}^d \frac{\lambda_s \{\|z_{os}\|^2 - (n-1)\}}{(n-1)\lambda_1}\right| > \tau\right) = P\left(\left|\sum_{s=2}^d \frac{\lambda_s \{(n-1)\sum_{k=1}^n (z_{sk}^2 - 1)/n - \sum_{k \ne k'}^n z_{sk} z_{sk'}/n\}}{(n-1)\lambda_1}\right| > \tau\right)$$
$$= O\left(\frac{\sum_{r,s \ge 2}^d \lambda_r \lambda_s E\{(z_{rk}^2 - 1)(z_{sk}^2 - 1)\}}{n\lambda_1^2}\right) + O\{\delta_1/(n\lambda_1)^2\} \to 0$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. Thus, it holds that $\mathrm{tr}(S_D)/\lambda_1 = \kappa/\lambda_1 + \|z_{o1}/\sqrt{n-1}\|^2 + o_p(1)$ from the fact that $\mathrm{tr}(S_D) = \lambda_1 \|z_{o1}\|^2/(n-1) + \sum_{s=2}^d \lambda_s \|z_{os}\|^2/(n-1)$. Then, from Proposition 2.1 and $\liminf_{d \to \infty} \kappa/\lambda_1 > 0$, we can claim the results. $\square$
Proof of Theorem 2.1. When $n \to \infty$, we can claim the results from Theorems 4.1, 4.2 and Corollary 4.1 in Yata and Aoshima (2013). When $n$ is fixed, we can claim the results from Theorem 3.1 and Corollary 3.1 in Ishii et al. (2014). $\square$

Proof of Theorem 2.2. From Theorem 2.1 and Lemma 2.1, under (A-i) to (A-iii), it holds that
$$P\left(\frac{\lambda_1}{\mathrm{tr}(\Sigma)} \in \left[\frac{(n-1)\tilde{\lambda}_1}{b\tilde{\kappa} + (n-1)\tilde{\lambda}_1},\ \frac{(n-1)\tilde{\lambda}_1}{a\tilde{\kappa} + (n-1)\tilde{\lambda}_1}\right]\right) = P\left(\frac{(n-1)\tilde{\lambda}_1}{b\tilde{\kappa} + (n-1)\tilde{\lambda}_1} \le \frac{\lambda_1}{\mathrm{tr}(\Sigma)} \le \frac{(n-1)\tilde{\lambda}_1}{a\tilde{\kappa} + (n-1)\tilde{\lambda}_1}\right)$$
$$= P\left(\frac{a\tilde{\kappa}}{(n-1)\tilde{\lambda}_1} \le \frac{\kappa}{\lambda_1} \le \frac{b\tilde{\kappa}}{(n-1)\tilde{\lambda}_1}\right) = P\left(a \le (n-1)\frac{\tilde{\lambda}_1}{\lambda_1}\frac{\kappa}{\tilde{\kappa}} \le b\right) = 1 - \alpha + o(1)$$

as $d \to \infty$ when $n$ is fixed. It concludes the result. $\square$
Proof of Lemma 2.2. We write that

$$n\|\bar{x} - \mu\|^2 - \mathrm{tr}(S_D) = \sum_{s=1}^d \lambda_s \left\{n\bar{z}_s^2 - \sum_{j=1}^n \frac{(z_{sj} - \bar{z}_s)^2}{n-1}\right\}.$$

Then, from (A.1) and $n\bar{z}_s^2 - \sum_{j=1}^n (z_{sj} - \bar{z}_s)^2/(n-1) = \sum_{j \ne j'}^n z_{sj} z_{sj'}/(n-1)$ for all $s$, under (A-ii), we have that

$$\{\|\bar{x} - \mu\|^2 - \mathrm{tr}(S_D)/n\}/\lambda_1 = \bar{z}_1^2 - \frac{\|z_{o1}/\sqrt{n-1}\|^2}{n} + o_p(1)$$

as $d \to \infty$ when $n$ is fixed. It concludes the result. $\square$

Proof of Theorem 2.3. Under (A-i), we note that $\bar{z}_1$ and $z_{o1}$ are independent, and $n\bar{z}_1^2$ is distributed as $\chi^2_1$. Then, from Theorem 2.1 and Lemma 2.2, we can conclude the result. $\square$
Proofs of Lemmas 3.1 and 3.2. We note that $\|z_{o1}\|^2/n = 1 + o_p(1)$ as $n \to \infty$. From (A.4), under (A-ii) and (A-iii), we have that

$$\hat{u}_1^T z_{o1}/\|z_{o1}\| = 1 + o_p(1) \tag{A.5}$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$, so that $\hat{u}_1^T z_{o1} = \|z_{o1}\| + o_p(n^{1/2})$. Thus, we can claim the result of Lemma 3.2. On the other hand, with the help of Proposition 2.1, under (A-ii) and (A-iii), it holds from (A.5) that

$$h_1^T \hat{h}_1 = \frac{h_1^T (X - \bar{X})\hat{u}_1}{\{(n-1)\hat{\lambda}_1\}^{1/2}} = \frac{\lambda_1^{1/2} z_{o1}^T \hat{u}_1}{\{(n-1)\hat{\lambda}_1\}^{1/2}} = \frac{\|z_{o1}\| + o_p(n^{1/2})}{\{\|z_{o1}\|^2 + \kappa/\lambda_1 + o_p(n)\}^{1/2}} = \frac{1}{\{1 + \kappa/(\lambda_1\|z_{o1}\|^2)\}^{1/2}} + o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. It concludes the result of Lemma 3.1. $\square$

Proof of Theorem 3.1. With the help of Theorem 2.1, under (A-ii) and (A-iii), we have from (A.5) that

$$h_1^T \tilde{h}_1 = \frac{h_1^T (X - \bar{X})\hat{u}_1}{\{(n-1)\tilde{\lambda}_1\}^{1/2}} = \frac{\|z_{o1}\| + o_p(n^{1/2})}{\{\|z_{o1}\|^2 + o_p(n)\}^{1/2}} = 1 + o_p(1)$$

as $d \to \infty$ either when $n$ is fixed or $n \to \infty$. It concludes the result. $\square$

Proof of Theorem 3.2. By combining Theorem 2.1 with Lemma 3.2, under (A-ii) and (A-iii), we have that

$$\tilde{s}_{1j}/\sqrt{\lambda_1} = \hat{u}_{1j}\sqrt{(n-1)\tilde{\lambda}_1/\lambda_1} = \hat{u}_{1j}\|z_{o1}\| + o_p(1) = z_{o1j} + o_p(1)$$

as $d \to \infty$ when $n$ is fixed. By noting that $z_{o1j} = z_{1j} - \bar{z}_1$ and $\bar{z}_1$ is distributed as $N(0, 1/n)$ under (A-i), we have the results. $\square$

Proof of Corollary 4.1. From Theorem 2.1, the result is obtained straightforwardly. $\square$
Proof of Lemma 4.1. Let $Z_i = [z_{1(i)}, \ldots, z_{d(i)}]^T$ be a sphered data matrix of $\pi_i$ for $i = 1, 2$, where $z_{j(i)} = (z_{j1(i)}, \ldots, z_{jn_i(i)})^T$. We assume $\mu_1 = \mu_2 = 0$ without loss of generality. Let $\beta_{st} = (\lambda_{s(1)}\lambda_{t(2)})^{1/2} h_{s(1)}^T h_{t(2)}$ for all $s, t$. Let $i_\star$ be a fixed constant such that $\sum_{s=i_\star+1}^d \lambda_{s(j)}^2/\lambda_{1(j)}^2 = o(1)$ as $d \to \infty$ for $j = 1, 2$. Note that $i_\star$ exists under (A-ii) for each $\pi_i$. We write that

$$X_1^T X_2 = \sum_{s,t \le i_\star} \beta_{st} z_{s(1)} z_{t(2)}^T + \sum_{s,t \ge i_\star+1}^d \beta_{st} z_{s(1)} z_{t(2)}^T + \sum_{s=i_\star+1}^d \sum_{t=1}^{i_\star} \beta_{st} z_{s(1)} z_{t(2)}^T + \sum_{s=1}^{i_\star} \sum_{t=i_\star+1}^d \beta_{st} z_{s(1)} z_{t(2)}^T.$$

Note that

$$E\left\{\Big(\sum_{s=i_\star+1}^d \sum_{t=1}^{i_\star} \beta_{st} z_{sj(1)} z_{tj'(2)}\Big)^2\right\} = \mathrm{tr}\left\{\Big(\sum_{s=i_\star+1}^d \lambda_{s(1)} h_{s(1)} h_{s(1)}^T\Big)\Big(\sum_{t=1}^{i_\star} \lambda_{t(2)} h_{t(2)} h_{t(2)}^T\Big)\right\} \le i_\star \lambda_{i_\star+1(1)}\lambda_{1(2)}$$

for all $j, j'$. Also, note that

$$E\left\{\Big(\sum_{s,t \ge i_\star+1}^d \beta_{st} z_{sj(1)} z_{tj'(2)}\Big)^2\right\} = \mathrm{tr}\left\{\Big(\sum_{s=i_\star+1}^d \lambda_{s(1)} h_{s(1)} h_{s(1)}^T\Big)\Big(\sum_{t=i_\star+1}^d \lambda_{t(2)} h_{t(2)} h_{t(2)}^T\Big)\right\} \le \Big(\sum_{s=i_\star+1}^d \lambda_{s(1)}^2\Big)^{1/2}\Big(\sum_{t=i_\star+1}^d \lambda_{t(2)}^2\Big)^{1/2}$$

for all $j, j'$. Then, by using Markov's inequality, for any $\tau > 0$, under (A-ii) for each $\pi_i$, we have that

$$P\left(\left|\sum_{j=1}^{n_1}\sum_{j'=1}^{n_2}\sum_{s=i_\star+1}^d\sum_{t=1}^{i_\star} \frac{\beta_{st} z_{sj(1)} z_{tj'(2)}}{(n_1 n_2 \lambda_{1(1)}\lambda_{1(2)})^{1/2}}\right|^2 > \tau\right) \to 0, \qquad P\left(\left|\sum_{j=1}^{n_1}\sum_{j'=1}^{n_2}\sum_{s=1}^{i_\star}\sum_{t=i_\star+1}^d \frac{\beta_{st} z_{sj(1)} z_{tj'(2)}}{(n_1 n_2 \lambda_{1(1)}\lambda_{1(2)})^{1/2}}\right|^2 > \tau\right) \to 0$$

and

$$P\left(\left|\sum_{j=1}^{n_1}\sum_{j'=1}^{n_2}\sum_{s,t \ge i_\star+1}^d \frac{\beta_{st} z_{sj(1)} z_{tj'(2)}}{(n_1 n_2 \lambda_{1(1)}\lambda_{1(2)})^{1/2}}\right|^2 > \tau\right) \to 0$$

as $d \to \infty$ either when $n_i$ is fixed or $n_i \to \infty$ for $i = 1, 2$. Hence, similar to (A.2), it holds that

$$\frac{e_{n_1}^T X_1^T X_2 e_{n_2}}{(\nu_1\nu_2\lambda_{1(1)}\lambda_{1(2)})^{1/2}} = \frac{e_{n_1}^T \big(\sum_{s,t \le i_\star} \beta_{st} z_{s(1)} z_{t(2)}^T\big) e_{n_2}}{(\nu_1\nu_2\lambda_{1(1)}\lambda_{1(2)})^{1/2}} + o_p(1).$$

Note that $e_{n_i}^T P_{n_i} = e_{n_i}^T$ and $P_{n_i} z_{1(i)} = z_{o1(i)}$ for $i = 1, 2$, where $z_{o1(i)} = z_{1(i)} - (\bar{z}_{1(i)}, \ldots, \bar{z}_{1(i)})^T$ and $\bar{z}_{1(i)} = n_i^{-1}\sum_{k=1}^{n_i} z_{1k(i)}$. Also, note that $X_i P_{n_i} = (X_i - \bar{X}_i)$ for $i = 1, 2$, where $\bar{X}_i = [\bar{x}_i, \ldots, \bar{x}_i]$ and $\bar{x}_i = \sum_{j=1}^{n_i} x_{j(i)}/n_i$. Let $\hat{u}_{1(i)}$ be the first (unit) eigenvector of $(X_i - \bar{X}_i)^T(X_i - \bar{X}_i)$ for $i = 1, 2$. Note that $\hat{u}_{1(i)}^T P_{n_i} = \hat{u}_{1(i)}^T$ when $(X_i - \bar{X}_i)^T(X_i - \bar{X}_i) \ne O$ for $i = 1, 2$. Then, under (A-ii) for each $\pi_i$, we have that

$$\frac{\hat{u}_{1(1)}^T (X_1 - \bar{X}_1)^T (X_2 - \bar{X}_2)\hat{u}_{1(2)}}{(\nu_1\nu_2\lambda_{1(1)}\lambda_{1(2)})^{1/2}} = \frac{\hat{u}_{1(1)}^T \big(\sum_{s,t \le i_\star} \beta_{st} z_{os(1)} z_{ot(2)}^T\big)\hat{u}_{1(2)}}{(\nu_1\nu_2\lambda_{1(1)}\lambda_{1(2)})^{1/2}} + o_p(1) \tag{A.6}$$

as $d \to \infty$ either when $n_i$ is fixed or $n_i \to \infty$ for $i = 1, 2$. Note that $\tilde{h}_{1(i)} = \{\nu_i\tilde{\lambda}_{1(i)}\}^{-1/2}(X_i - \bar{X}_i)\hat{u}_{1(i)}$ for $i = 1, 2$. Also, note that $z_{os(i)}^T z_{os'(i)}/n_i = o_p(1)$ ($s \ne s'$) when $n_i \to \infty$ for $i = 1, 2$. Then, by combining (A.6) with Theorem 2.1 and (A.5), we can claim the result. $\square$

Proofs of Theorems 4.1 and 4.2. By combining Theorem 2.1, Lemmas 2.1 and 4.1, we can claim the results. $\square$
References

Ahn, J., Marron, J.S., Muller, K.M., Chi, Y.-Y., 2007. The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika 94, 760–766.
Aoshima, M., Yata, K., 2011. Two-stage procedures for high-dimensional data. Sequential Anal. 30, 356–399. Editor's special invited paper.
Aoshima, M., Yata, K., 2015. Asymptotic normality for inference on multisample, high-dimensional mean vectors under mild conditions. Methodol. Comput. Appl. Probab. 17, 419–439.
Bai, Z., Saranadasa, H., 1996. Effect of high dimension: by an example of a two sample problem. Statist. Sinica 6, 311–329.
Chen, S.X., Qin, Y.-L., 2010. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38, 808–835.
Hall, P., Marron, J.S., Neeman, A., 2005. Geometric representation of high dimension, low sample size data. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 427–444.
Ishii, A., Yata, K., Aoshima, M., 2014. Asymptotic distribution of the largest eigenvalue via geometric representations of high-dimension, low-sample-size data. In: Mukhopadhyay, N. (Ed.), Sri Lankan J. Appl. Stat., 81–94. Special issue: Modern statistical methodologies in the cutting edge of science.
Jeffery, I.B., Higgins, D.G., Culhane, A.C., 2006. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics 7, 359.
Jung, S., Marron, J.S., 2009. PCA consistency in high dimension, low sample size context. Ann. Statist. 37, 4104–4130.
Jung, S., Sen, A., Marron, J.S., 2012. Boundary behavior in high dimension, low sample size asymptotics of PCA. J. Multivariate Anal. 109, 190–203.
Shipp, M.A., Ross, K.N., Tamayo, P., Weng, A.P., Kutok, J.L., Aguiar, R.C., Gaasenbeek, M., Angelo, M., Reich, M., Pinkus, G.S., Ray, T.S., Koval, M.A., Last, K.W., Norton, A., Lister, T.A., Mesirov, J., Neuberg, D.S., Lander, E.S., Aster, J.C., Golub, T.R., 2002. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Med. 8, 68–74.
Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D'Amico, A.V., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R., Sellers, W.R., 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209.
Srivastava, M.S., Yanagihara, H., 2010. Testing the equality of several covariance matrices with fewer observations than the dimension. J. Multivariate Anal. 101, 1319–1329.
Yata, K., Aoshima, M., 2009. PCA consistency for non-Gaussian data in high dimension, low sample size context. In: Mukhopadhyay, N. (Ed.), Comm. Statist. Theory Methods 38, 2634–2652. Special issue honoring S. Zacks.
Yata, K., Aoshima, M., 2010. Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix. J. Multivariate Anal. 101, 2060–2077.
Yata, K., Aoshima, M., 2012. Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. J. Multivariate Anal. 105, 193–215.
Yata, K., Aoshima, M., 2013. PCA consistency for the power spiked model in high-dimensional settings. J. Multivariate Anal. 122, 334–354.