On asymptotic normality of cross data matrix-based PCA in high dimension low sample size




Journal of Multivariate Analysis 175 (2020) 104556



Shao-Hsuan Wang∗, Su-Yun Huang, Ting-Li Chen
Institute of Statistical Sciences, Academia Sinica, Taiwan

Article info

Article history: Received 8 April 2019; Received in revised form 27 September 2019; Accepted 27 September 2019; Available online 4 October 2019.
AMS 2010 subject classifications: primary 62H12; secondary 62F12.
Keywords: Asymptotic normality; Cross data matrix; High dimension low sample size; Principal component analysis; Spiked covariance model.

Abstract: Principal component analysis (PCA) in the high dimension low sample size (HDLSS) setting has been an active research area in recent years. Yata and Aoshima (2010) proposed a cross data matrix-based method and showed the asymptotic normality for estimates of spiked eigenvalues, as well as consistency for the corresponding estimates of PC directions. However, the asymptotic normality for estimates of PC directions has been lacking. In this article, we extend Yata and Aoshima (2010)'s work to investigate the asymptotic normality for the leading CDM-based PC directions and to compare it with the asymptotic normality for classical PCA. Numerical examples are provided to illustrate the asymptotic normality. © 2019 Elsevier Inc. All rights reserved.

1. Introduction

The study of principal component analysis (PCA) has a long history in statistics and many other scientific fields. Early work on asymptotic theory can be found in Anderson [2] and Tyler [13] in the setting of large sample size and fixed dimension. Later, Bai [5], Baik et al. [6], Paul [11] and many others investigated the asymptotic behavior of empirical eigenvalues and eigenvectors in the setting where both the sample size and the dimension diverge. Recently, great attention has been given to high dimension and low sample size (HDLSS) data due to its wide applications, such as genetic microarrays, medical images, text recognition, finance, and chemometrics. See, for instance, Johnstone and Lu [9], Hall et al. [7], Ahn et al. [1], Yata and Aoshima [15,16] and Shen et al. [12]. Yata and Aoshima [16] proposed a cross data matrix (CDM) method for PCA and demonstrated its superior empirical performance compared with classical PCA in the HDLSS setting. In a study of prostate cancer, they showed that the first two PC directions of CDM-PCA can better separate the normal samples from the tumor samples than PCA. Yata and Aoshima [17] proposed another effective high-dimensional PCA method, the noise reduction (NR) method, and derived asymptotic properties of the NR estimates of eigenvalues and PC directions. However, in the case of a heavy-tailed distribution, they demonstrated in a simulation study that CDM-PCA tends to perform better than the NR method. There are quite a few applications of CDM methods, for instance, a correlation test for high-dimensional data [18], the estimation of tr(Σ²) in the high-dimensional setting [20], and a two-sample test under the strongly spiked eigenvalue model ([4] and [8]). Yata and Aoshima [16] also established the asymptotic normality for eigenvalue estimates. However, the asymptotic normality for estimates of PC

∗ Corresponding author. E-mail address: [email protected] (S.-H. Wang).
https://doi.org/10.1016/j.jmva.2019.104556 0047-259X/© 2019 Elsevier Inc. All rights reserved.


directions is still lacking. In this article, we extend their work to investigate the asymptotic normality for the leading CDM-based PC directions and to compare it with the asymptotic normality for classical PCA. We find that both CDM-PCA and classical PCA have the same asymptotic normality in HDLSS. However, the former provides consistent and asymptotically normal estimates in a broader HDLSS region. Our contribution in this article is twofold. First, we derive the asymptotic normality for PC directions in the HDLSS setting for both the CDM-based and the classical PCA. Second, we provide a theoretical comparison and explain why the CDM-based method works in a broader HDLSS region than classical PCA.

The rest of this article is organized as follows. In Section 2, a brief review of CDM-PCA and some existing asymptotic results are given. In Section 3, the asymptotic normality for the estimates of PC directions for both CDM-PCA and classical PCA is established. A comparison of the asymptotic theory for CDM-PCA and classical PCA is also provided. In Section 4, numerical examples are presented to demonstrate the behavior of the asymptotic normality. In Section 5, further discussions and some concluding remarks are made. A summary list of notation usage and all the proofs are placed in the Appendix.

2. A brief review of CDM-PCA

In this section, we introduce a spiked covariance model and give a brief review of CDM-PCA by Yata and Aoshima [16].

2.1. Probabilistic model

Suppose that X = (x_1, . . . , x_{2n}) is a data matrix of size d × 2n with x_i = (x_{1i}, . . . , x_{di})^⊤, i ∈ {1, . . . , 2n}, where the x_i's are independent and identically distributed from a d-dimensional multivariate distribution with mean zero and covariance matrix Σ. Let X = (X_{(1)}, X_{(2)}), where X_{(1)} and X_{(2)}, each of size d × n, form a partition of X. Assume that d ≫ n in the sense that n = o(d).
In general, the sample sizes of X_{(1)} and X_{(2)} can be different and the total sample size can be an odd number. For notational simplicity but without loss of generality, assume that the sample sizes in the partition are equal. Denote the eigenvalue decomposition of Σ by Σ = H Λ H^⊤, where Λ is a diagonal matrix with eigenvalues λ_1, . . . , λ_d and H = (h_1, . . . , h_d) is the matrix of corresponding eigenvectors. Let m be a fixed positive integer. Consider the following spiked covariance model: for λ_1 > · · · > λ_m > λ_{m+1} ≥ · · · ≥ λ_d,

    λ_j = a_j d^{α_j} for j ∈ {1, . . . , m}, and λ_j < C for j > m,   (1)

where a_1, . . . , a_m, C are positive constants and α_1 ≥ · · · ≥ α_m > 0.

The spiked covariance model in (1) is quite natural. In particular, the model with α_1 ≥ 1/2 is often observed in microarray data sets. A comprehensive discussion of the α_j's can be found in Yata and Aoshima [19] and Aoshima et al. [3].

2.2. Cross data matrix method

Define a cross data matrix by S_{D(1)} = n^{−1} X_{(1)}^⊤ X_{(2)} or S_{D(2)} = n^{−1} X_{(2)}^⊤ X_{(1)}. The procedure of CDM-PCA is described below.

1. Perform the singular value decomposition (SVD) of S_{D(1)}, i.e.,

       S_{D(1)} = ∑_{j=1}^d λ̃_j ũ_{j(1)} ũ_{j(2)}^⊤,

   where the λ̃_j's are the singular values and the ũ_{j(1)}'s and ũ_{j(2)}'s are, respectively, the left and right singular vectors.
2. Calculate h̃_{j(ℓ)} = X_{(ℓ)} ũ_{j(ℓ)} / √(n λ̃_j), j ∈ {1, . . . , n}, ℓ ∈ {1, 2}.
3. Let λ̃_j be the estimate of λ_j and h̃_j/∥h̃_j∥ be the estimate of h_j, j ∈ {1, . . . , n}, where h̃_j = (1/2) { h̃_{j(1)} + sign(h̃_{j(1)}^⊤ h̃_{j(2)}) h̃_{j(2)} }.

Note that Yata and Aoshima [16] used h̃_j = (1/2) { h̃_{j(1)} + sign(h̃_{j(1)}^⊤ h̃_{j(2)}) h̃_{j(2)} }. Since

    h̃_{j(1)}^⊤ h̃_{j(2)} = (1/(n λ̃_j)) ũ_{j(1)}^⊤ X_{(1)}^⊤ X_{(2)} ũ_{j(2)} = 1,

we dropped sign(h̃_{j(1)}^⊤ h̃_{j(2)}) from the above expression, so that h̃_j = (1/2) { h̃_{j(1)} + h̃_{j(2)} }.

Let X = H Λ^{1/2} Z, where Z is a d × 2n spherical data matrix from a distribution with the identity covariance matrix, whose jth row is z_j^⊤ = (z_{j1}, . . . , z_{j,2n}). In other words, E z_j = 0_{2n×1}, (2n)^{−1} E z_j^⊤ z_j = 1, and E z_j^⊤ z_k = 0 for j ≠ k. Assume that sup_{1≤j≤d} E|z_{j1}|⁴ is bounded. The following theorem is due to Yata and Aoshima [16].
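The three-step procedure above can be sketched in Python as follows. This is a minimal illustration only, assuming mean-zero data, an even total sample size 2n, and numpy; the function and variable names are ours, not from the paper.

```python
import numpy as np

def cdm_pca(X, m):
    """Sketch of the three-step CDM-PCA procedure (illustrative names).

    X : (d, 2n) mean-zero data matrix; m : number of spiked components.
    Returns estimated eigenvalues lam[:m] and unit-length directions H (d, m).
    """
    d, N = X.shape
    n = N // 2
    X1, X2 = X[:, :n], X[:, n:2 * n]          # partition X = (X(1), X(2))
    SD1 = X1.T @ X2 / n                        # cross data matrix S_D(1), n x n
    U1, lam, U2t = np.linalg.svd(SD1)          # step 1: SVD of S_D(1)
    U2 = U2t.T
    H = np.empty((d, m))
    for j in range(m):
        h1 = X1 @ U1[:, j] / np.sqrt(n * lam[j])   # step 2: h~_j(1)
        h2 = X2 @ U2[:, j] / np.sqrt(n * lam[j])   # step 2: h~_j(2)
        h = 0.5 * (h1 + np.sign(h1 @ h2) * h2)     # step 3 (sign is +1 in exact arithmetic)
        H[:, j] = h / np.linalg.norm(h)            # normalized direction h~_j / ||h~_j||
    return lam[:m], H
```

On data generated from the spiked model (1), the leading returned direction should be close to h_1.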

Theorem 1 (Theorems 2 and 3 and Remark 6 of Yata and Aoshima [16]). Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct. Then,
(a) for j ∈ {1, . . . , m},

    √( 2n / var(z_{j1}²) ) ( λ̃_j/λ_j − 1 ) ⇒ N(0, 1)  and  sign(h_j^⊤ h̃_j) h_j^⊤ h̃_j − 1 = o_p(1)

under conditions: d → ∞ and n → ∞ if α_j > 1/2; d → ∞ and d^{2(1−α_j)}/n → 0 if α_j ∈ (0, 1/2];
(b) for j ∈ {1, . . . , m},

    sign(h_j^⊤ h̃_j) h_j^⊤ ( h̃_j/∥h̃_j∥ ) − 1 = o_p(1)

under conditions: (i) d → ∞ and n → ∞ if α_j > 1; (ii) d → ∞ and d^{1−α_j}/n → 0 if α_j ∈ (1/2, 1]; (iii) d → ∞ and d^{2(1−α_j)}/n → 0 if α_j ∈ (0, 1/2].
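Theorem 1(a) can be illustrated by a small Monte Carlo sketch under a Gaussian spherical Z, for which var(z_{j1}²) = 2. The sizes d = 200, n = 50 and the single spike with α_1 = 1 below are illustrative choices of ours, not those used later in Section 4.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, reps = 200, 50, 200          # illustrative sizes
lam1 = float(d)                    # one spike, alpha_1 = 1 (assumed toy spectrum)
stats = []
for _ in range(reps):
    X = rng.standard_normal((d, 2 * n))
    X[0] *= np.sqrt(lam1)          # X = H Lambda^{1/2} Z with H = I
    lam_t = np.linalg.svd(X[:, :n].T @ X[:, n:] / n, compute_uv=False)[0]
    # standardized statistic of Theorem 1(a): sqrt(2n / var(z^2)), var(z^2) = 2
    stats.append(np.sqrt(2 * n / 2.0) * (lam_t / lam1 - 1))
stats = np.asarray(stats)          # should look approximately N(0, 1)
```

Its empirical mean and standard deviation should be close to 0 and 1, respectively.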


From Theorem 1, the eigenvalue and direction estimates of CDM-PCA achieve consistency under conditions (i)–(iii). Note that the conditions given by Theorem 1 are not continuous at α_j = 1/2. This phenomenon is mentioned in Remark 4 of Yata and Aoshima [16]. From the proof of Lemma 5 in Yata and Aoshima [16], the convergence rate of the non-spiked part needs to be of order o_p(n^{−1/2}) to ensure the asymptotic normality of λ̃_j. For α_j > 1/2, the conditions n → ∞ and d → ∞ ensure that the non-spiked parts are asymptotically negligible. However, for α_j ≤ 1/2, an extra requirement is needed.

3. Main results: asymptotic normality for PC directions

Recall the spiked covariance model in (1). Following the ideas of Yata and Aoshima [15,16] and Jung and Marron [10], the first m eigenvalues of this model can be further grouped as

    α_1 = α_2 = · · · = α_{s_1} > α_{s_1+1} = · · · = α_{s_2} > · · · > α_{s_{J−1}+1} = · · · = α_{s_J},

where s_0 = 0 and s_1 + · · · + s_J = m. That is, eigenvalues with the same growth rate are classified into the same group. For notational simplicity, denote H = (H_{[1:m]}, H_tail), where H_{[1:m]} = (h_1, . . . , h_m) and H_tail = (h_{m+1}, . . . , h_d). In addition, the inner product v^⊤ v̌ of a direction v and its estimate v̌ is adjusted by sign(v^⊤ v̌) v^⊤ v̌, so that the inner product is nonnegative. Such a convention is adopted throughout this article.

3.1. CDM-PCA

Theorem 2 (CDM-PCA). Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct, and the conditions of Theorem 1(a). Consider the jth eigenvector, where s_{u−1} < j ≤ s_u ≤ m. Then



    √(2n) H_{[1:m]}^⊤ ( h̃_j − h_j ) ⇒ N( 0, M^{[j]} ),

where M^{[j]} is an m × m matrix whose (k, k)th diagonal entries and (k, k′)th off-diagonal entries are given by

    M^{[j]}_{kk} = ( a_k a_j / (a_k − a_j)² ) E[ z_{j1}² z_{k1}² ]  for k ∈ (s_{u−1}, s_u],  and 0 for k ∉ (s_{u−1}, s_u];   (2)

    M^{[j]}_{kk′} = ( a_j √(a_k a_{k′}) / ((a_k − a_j)(a_{k′} − a_j)) ) E[ z_{k1} z_{k′1} z_{j1}² ]  for k, k′ ∈ (s_{u−1}, s_u],  and 0 for k ∉ (s_{u−1}, s_u] or k′ ∉ (s_{u−1}, s_u].   (3)

If we further assume that Z comes from a spherical distribution, then E[z_{k1} z_{k′1} z_{j1}²] = 0. Note that h̃_j is not a unit vector. More precisely, ∥h̃_j∥² = 1 + ∥h̃_{j(1)} − h̃_{j(2)}∥²/4 ≥ 1, since h̃_{j(1)}^⊤ h̃_{j(2)} = 1. In the following theorem, we show the asymptotic results for the length-normalized CDM-PC directions h̃_j/∥h̃_j∥.
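The norm identity ∥h̃_j∥² = 1 + ∥h̃_{j(1)} − h̃_{j(2)}∥²/4 holds for any pair of vectors whose inner product equals 1, which can be checked numerically (a quick sketch with arbitrary vectors):

```python
import numpy as np

# Check: ||h~_j||^2 = 1 + ||h~_j(1) - h~_j(2)||^2 / 4 whenever h~_j(1)^T h~_j(2) = 1.
rng = np.random.default_rng(7)
h1 = rng.standard_normal(10)
w = rng.standard_normal(10)
w -= (w @ h1) / (h1 @ h1) * h1       # make w orthogonal to h1
h2 = h1 / (h1 @ h1) + w              # then h1 @ h2 = 1 up to rounding
h = 0.5 * (h1 + h2)                  # h~_j
lhs = h @ h                          # ||h~_j||^2
rhs = 1 + np.sum((h1 - h2) ** 2) / 4
```

In particular, lhs = rhs ≥ 1, so the non-normalized h̃_j always has length at least one.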

Theorem 3. Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct, and the conditions of Theorem 1(a). Then,
(a) √(2n) h_k^⊤ ( h̃_j/∥h̃_j∥ − h_j ) ⇒ N( 0, M^{[j]}_{kk} ), k ≠ j, k, j ∈ (s_{u−1}, s_u], where M^{[j]}_{kk} is given by (2), and

    √(2n) h_k^⊤ ( h̃_j/∥h̃_j∥ − h_j ) = o_p(1), j ∈ (s_{u−1}, s_u], k ∉ (s_{u−1}, s_u];

(b) further assume that d → ∞ and d^{2(1−α_j)}/n → 0 if α_j ∈ (1/2, 1]. Then

    ∥h̃_j∥ = 1 + o_p(n^{−1/2}),  √(2n) h_j^⊤ ( h̃_j/∥h̃_j∥ − h_j ) = o_p(1), j ∈ (s_{u−1}, s_u];
    h̃_k^⊤ h̃_j = o_p( n^{−1/4} ), k ≠ j, k, j ≤ m.

Moreover,

    ∥ H_tail^⊤ ( h̃_j/∥h̃_j∥ ) ∥ = o_p( n^{−1/4} ),  ∥ H_tail^⊤ h̃_j ∥ = o_p( n^{−1/4} ).

Note that {h̃_j}_{j=1}^m do not satisfy the orthonormality property; i.e., ∥h̃_j∥ ≠ 1 and h̃_j^⊤ h̃_k ≠ 0 for j ≠ k. When h̃_j is normalized to unit length as h̃_j/∥h̃_j∥, it can be seen from Theorem 3(b) that the normalized estimate requires a stronger condition on the relation between d and n when α_j ∈ (1/2, 1] to achieve consistency.


Remark 1. When the sizes of X_{(1)} and X_{(2)} are different, say d × n_1 and d × n_2, respectively, a cross data matrix is defined as S_{D(1)} = (n_1 n_2)^{−1/2} X_{(1)}^⊤ X_{(2)}. Assume that 0 < lim_{n_1,n_2→∞} n_1/n_2 < ∞. The definition of h̃_{j(ℓ)} is modified to h̃_{j(ℓ)} = X_{(ℓ)} ũ_{j(ℓ)}/√(n_ℓ λ̃_j), j ∈ {1, . . . , n}, ℓ ∈ {1, 2}. The asymptotic normality in Theorems 2 and 3 remains valid with √(2n) replaced by √(2 n_1 n_2/(n_1 + n_2)). A balanced partition, i.e., lim_{n_1,n_2→∞} n_1/(n_1 + n_2) = 1/2, is optimal among all partitions in terms of asymptotic efficiency.

3.2. Classical PCA

Define the sample covariance matrix S = (2n)^{−1} X X^⊤ and its dual S_D = (2n)^{−1} X^⊤ X. Let the λ̂_j's and ĥ_j's be, respectively, the eigenvalues and eigenvectors of S. For classical PCA, the eigenvalues and PC directions are obtained from the sample covariance matrix. We first give the asymptotic normality of PC directions of the classical approach. Next, we compare the asymptotic normality of PC directions between the CDM-based method and the classical method.

Theorem 4 (Theorems 3.1 and 4.1 of Yata and Aoshima [15]). Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct. Then for j ∈ {1, . . . , m},

    √( 2n / var(z_{j1}²) ) ( λ̂_j/λ_j − 1 ) ⇒ N(0, 1),  h_j^⊤ ĥ_j − 1 = o_p(1)

under conditions: (i′) d → ∞ and n → ∞ if α_j > 1; (ii′) d → ∞ and d^{2(1−α_j)}/n → 0 if α_j ∈ (0, 1].

Theorem 5 (PCA). Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct, and conditions (i′)–(ii′) in Theorem 4. Consider the jth eigenvector, where s_{u−1} < j ≤ s_u ≤ m. Then,
(a) √(2n) H_{[1:m]}^⊤ ( ĥ_j − h_j ) ⇒ N( 0, M^{[j]} ), where M^{[j]} is given in Theorem 2;
(b) ∥ H_tail^⊤ ( ĥ_j − h_j ) ∥ = o_p( n^{−1/4} ).

Corollary 6. Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct, and conditions (i′)–(ii′) in Theorem 4. Then,
(a) √(2n) h_k^⊤ ( ĥ_j − h_j ) ⇒ N( 0, M^{[j]}_{kk} ), k ≠ j, k, j ∈ (s_{u−1}, s_u], where M^{[j]}_{kk} is given by (2);
(b) √(2n) h_j^⊤ ( ĥ_j − h_j ) = o_p(1), j ∈ (s_{u−1}, s_u];
(c) √(2n) h_k^⊤ ( ĥ_j − h_j ) = o_p(1), j ∈ (s_{u−1}, s_u], k ∉ (s_{u−1}, s_u].

Remark 2. The asymptotic behavior of PC directions for classical PCA was also investigated by Wang and Fan [14] under different assumptions on the spiked covariance model. In their work, the data are assumed to follow a sub-Gaussian distribution, while in our study only a finite fourth moment is required. In their work, d/(nλ_j) goes to a nonzero constant c_j, while in our setting d/(nλ_j) → 0 under conditions (i′)–(ii′).

Remark 3. In general, the population mean of x_i may not be zero. Let

    S°_{D(1)} = ( S°_{D(2)} )^⊤ = n^{−1} ( I_n − n^{−1} 1_n 1_n^⊤ ) X_{(1)}^⊤ X_{(2)} ( I_n − n^{−1} 1_n 1_n^⊤ ),
    S°_D = (2n)^{−1} ( I_{2n} − (2n)^{−1} 1_{2n} 1_{2n}^⊤ ) X^⊤ X ( I_{2n} − (2n)^{−1} 1_{2n} 1_{2n}^⊤ ),

where 1_k = (1, 1, . . . , 1)^⊤ is of size k. Following the same arguments as in Corollary 2 of [16], Theorems 1–5 and Corollary 6 remain valid with {S_{D(ℓ)}, S_D} replaced by {S°_{D(ℓ)}, S°_D}.

3.3. Comparison between CDM-PCA and classical PCA

By Theorems 2 and 5, the CDM-PC direction h̃_j and the classical PC direction ĥ_j, when projected onto the spiked subspace, have the same asymptotic normality with covariance M^{[j]}. However, the valid region of this asymptotic normality differs between the two methods. CDM-PCA has a broader valid region for asymptotic normality (described by conditions (i)–(iii)) than classical PCA (described by conditions (i′)–(ii′)). We explain this phenomenon of a broader valid region from two perspectives.
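The centered cross data matrix of Remark 3 is equivalent to forming the cross data matrix after subtracting each half's own sample mean. A quick numerical check (the sizes and means below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 20, 8
X1 = rng.standard_normal((d, n)) + 5.0   # halves with a common nonzero mean
X2 = rng.standard_normal((d, n)) + 5.0
Cn = np.eye(n) - np.ones((n, n)) / n     # centering matrix I_n - n^{-1} 1_n 1_n^T
So_D1 = Cn @ X1.T @ X2 @ Cn / n          # centered cross data matrix S°_D(1)
# equivalent form: subtract each half's sample mean, then take the CDM
X1c = X1 - X1.mean(axis=1, keepdims=True)
X2c = X2 - X2.mean(axis=1, keepdims=True)
```

Since X_{(ℓ)} C_n just removes the sample mean of each row of X_{(ℓ)}, the two constructions agree exactly.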


3.3.1. Broader consistency region for CDM-PCA

For ease of illustration, consider a simple spiked model with λ_1 = d^{α_1} and λ_2 = · · · = λ_d = 1. Let S_D = (2n)^{−1} X^⊤ X. For classical PCA, the largest empirical eigenvalue can be expressed as

    λ̂_1 = max_{e_1, e_2 ∈ R^{2n}, ∥e_1∥=∥e_2∥=1} e_1^⊤ S_D e_2 = λ_1 max_{e_1, e_2 ∈ R^{2n}, ∥e_1∥=∥e_2∥=1} { (2n)^{−1} e_1^⊤ z_1 z_1^⊤ e_2 + e_1^⊤ R_n e_2 },

where

    R_n = (2n λ_1)^{−1} ∑_{j=2}^d z_j z_j^⊤.

Therefore, we have the following ratio:

    λ̂_1/λ_1 = max_{e_1, e_2 ∈ R^{2n}, ∥e_1∥=∥e_2∥=1} { (2n)^{−1} e_1^⊤ z_1 z_1^⊤ e_2 + e_1^⊤ R_n e_2 }.

For CDM-PCA, the ratio of the largest empirical eigenvalue to the population one can be expressed as

    λ̃_1/λ_1 = λ_1^{−1} max_{e_1, e_2 ∈ R^n, ∥e_1∥=∥e_2∥=1} e_1^⊤ S_{D(1)} e_2 = max_{e_1, e_2 ∈ R^n, ∥e_1∥=∥e_2∥=1} { n^{−1} e_1^⊤ z_{1(1)} z_{1(2)}^⊤ e_2 + e_1^⊤ R̃_n e_2 },

where

    R̃_n = (n λ_1)^{−1} ∑_{j=2}^d z_{j(1)} z_{j(2)}^⊤.

For λ̂_1 and λ̃_1 to achieve consistency for λ_1, we need ∥R_n∥_F = o_p(1) and ∥R̃_n∥_F = o_p(1), respectively, where ∥·∥_F denotes the matrix Frobenius norm. Note that, in block form,

    R_n = (2n λ_1)^{−1} [ ∑_{j=2}^d z_{j(1)} z_{j(1)}^⊤ , 0 ; 0 , ∑_{j=2}^d z_{j(2)} z_{j(2)}^⊤ ] + (2n λ_1)^{−1} [ 0 , ∑_{j=2}^d z_{j(1)} z_{j(2)}^⊤ ; ∑_{j=2}^d z_{j(2)} z_{j(1)}^⊤ , 0 ] = D_n + (1/2) [ 0 , R̃_n ; R̃_n^⊤ , 0 ],   (4)

where D_n is the first (block diagonal) term in the expression for R_n. Since

    E ∥ (2n λ_1)^{−1} ∑_{j=2}^d z_{j(ℓ)} z_{j(ℓ)}^⊤ ∥_F² = O( d²/(λ_1² n) ), ℓ ∈ {1, 2},  and  E ∥ (2n λ_1)^{−1} ∑_{j=2}^d z_{j(1)} z_{j(2)}^⊤ ∥_F² = O( d/(λ_1² n) ),

we have the following proposition.

Proposition 7. As d → ∞ and n → ∞,

    ∥D_n∥_F = O_p( d^{1−α_1}/√n )  and  ∥R̃_n∥_F = O_p( d^{1/2−α_1}/√n ).

The residual matrix R_n in classical PCA involves D_n, which has a larger order of magnitude than R̃_n in CDM-PCA. This is why CDM-PCA can have a broader valid region for consistency and asymptotic normality.

3.3.2. More spiked eigenvalues in CDM-PCA

Recall that h̃_{j(ℓ)} = (n λ̃_j)^{−1/2} X_{(ℓ)} ũ_{j(ℓ)} for ℓ ∈ {1, 2}. We have, for j ∈ {1, . . . , m},

    λ̃_j² h̃_{j(1)} = n^{−2} X_{(1)} X_{(1)}^⊤ X_{(2)} X_{(2)}^⊤ h̃_{j(1)}  and  λ̃_j² h̃_{j(2)} = n^{−2} X_{(2)} X_{(2)}^⊤ X_{(1)} X_{(1)}^⊤ h̃_{j(2)}.   (5)

Note that E[ n^{−2} X_{(1)} X_{(1)}^⊤ X_{(2)} X_{(2)}^⊤ ] = E[ n^{−2} X_{(2)} X_{(2)}^⊤ X_{(1)} X_{(1)}^⊤ ] = Σ², as X_{(1)} and X_{(2)} are independent. Both equalities in (5) can be regarded as the sample analog of

    λ_j² h_j = Σ² h_j, j ∈ {1, . . . , m}.   (6)
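The dual relations in (5) are exact finite-sample identities, which can be verified numerically (a small sketch; the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 30, 10
X1 = rng.standard_normal((d, n))
X2 = rng.standard_normal((d, n))
U1, lam, U2t = np.linalg.svd(X1.T @ X2 / n)      # SVD of S_D(1)
j = 0
h1 = X1 @ U1[:, j] / np.sqrt(n * lam[j])         # h~_j(1)
lhs = lam[j] ** 2 * h1                           # left side of (5)
rhs = (X1 @ X1.T @ X2 @ X2.T / n ** 2) @ h1      # right side of (5)
```

The second equality in (5) can be checked the same way with the roles of X_{(1)} and X_{(2)} swapped.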

Table 1 (Degenerate asymptotic normality). Performance comparison of the estimated PC directions for the CDM-PCA and PCA methods. Here h_j, j ∈ {1, 2}, are the first two population directions; h̃_j, h̃_j/∥h̃_j∥, and ĥ_j are the estimated CDM-PCA (non-normalized), CDM-PCA (normalized), and PCA directions. Mean and standard deviation of the projections of the first two estimated PC directions for the Gaussian and Student-t cases, with sample size n = 100 and covariate dimension d = 200, 2000, over 2000 replicate runs.

                        CDM-PCA (non-normalized)          CDM-PCA (normalized)                    PCA
                        √2n(h_1^⊤h̃_1−1)  √2n(h_2^⊤h̃_2−1)  √2n(h_1^⊤h̃_1/∥h̃_1∥−1)  √2n(h_2^⊤h̃_2/∥h̃_2∥−1)  √2n(h_1^⊤ĥ_1−1)  √2n(h_2^⊤ĥ_2−1)
Gaussian
  d = 200     Mean       0.012            0.010           −0.042                 −0.102                 −0.042           −0.101
              Std        0.053            0.054            0.033                  0.035                  0.033            0.035
  d = 2000    Mean       0.008            0.008           −0.065                 −0.187                 −0.065           −0.185
              Std        0.051            0.053            0.038                  0.043                  0.038            0.042
Student-t
  d = 200     Mean       0.034            0.025           −0.060                 −0.099                 −0.064           −0.110
              Std        0.127            0.112            0.081                  0.076                  0.095            0.182
  d = 2000    Mean       0.040            0.030           −0.069                 −0.148                 −0.076           −0.197
              Std        0.128            0.108            0.067                  0.073                  0.192            0.548

As for the classical PCA, we have

    λ̂_j ĥ_j = { (2n)^{−1} X X^⊤ } ĥ_j, j ∈ {1, . . . , m}.   (7)

Its corresponding population analog is λ_j h_j = Σ h_j. From (5)–(7), we see that CDM-PCA corresponds to using Σ², while classical PCA uses Σ. In other words, the CDM method adopts a power method-like strategy with power exponent 2, which leads to more spiked leading eigenvalues.
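A tiny numerical illustration of this power-method effect: squaring the covariance squares each spike-to-noise ratio, so the leading eigenvalues stand out more. The spectrum below is illustrative only, not the one used in Section 4.

```python
import numpy as np

# Under Sigma the spike-to-noise gaps are lambda_j / lambda_noise;
# under Sigma^2 they become (lambda_j / lambda_noise)^2, i.e. more spiked.
eigvals = np.array([50.0, 20.0] + [1.0] * 98)        # illustrative spectrum
ratio_sigma = eigvals[:2] / eigvals[2]               # gaps under Sigma
ratio_sigma2 = eigvals[:2] ** 2 / eigvals[2] ** 2    # gaps under Sigma^2
```

A larger relative gap makes the leading subspace easier to separate from the noise components.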

4. Numerical study

In this section, we conduct a numerical study to demonstrate the asymptotic normality of the estimated CDM-PCA directions in comparison with classical PCA. Set the sample size n = 100 and the covariate dimension d = 200, 2000, let Σ be a d × d diagonal matrix of the form diag(16 d^{2/3}, 4 d^{2/3}, d^{1/3}, 0.5 d^{1/3}, 1, . . . , 1), and let the number of simulation runs be 2000. There are four spiked eigenvalues (s_1 = 2, s_2 = 2 and m = 4). Two types of distributions are considered: multivariate Gaussian and multivariate Student-t with 5 degrees of freedom.

Table 1 presents the performance of √(2n) h_j^⊤ (h̃_j − h_j) (non-normalized CDM-PCA) and √(2n) h_j^⊤ (h̃_j/∥h̃_j∥ − h_j) (normalized CDM-PCA) for j = 1, 2, and √(2n) h_j^⊤ (ĥ_j − h_j) (classical PCA). The non-normalized CDM-PCA outperforms the normalized CDM-PCA, while there is no significant difference between the normalized CDM-PCA and classical PCA. From the discussions in Section 3, the non-normalized CDM-PCA has a broader consistency region than the normalized CDM-PCA, and the classical PCA has the same consistency region as the normalized CDM-PCA. Furthermore, according to Theorems 3 and 5, both √(2n) h_j^⊤ (h̃_j/∥h̃_j∥ − h_j) and √(2n) h_j^⊤ (ĥ_j − h_j) are degenerate as n, d → ∞. For the case d = 200, the reported means and standard deviations are indeed close to zero. For the case d = 2000, we observe that √(2n) (h_2^⊤ h̃_2/∥h̃_2∥ − 1) and √(2n) (h_2^⊤ ĥ_2 − 1) deviate somewhat from zero. One possible reason is that d is too large, so that the conditions could be violated.

Table 2 presents the performance of √(2n) h_k^⊤ h̃_j (non-normalized CDM-PCA), √(2n) h_k^⊤ h̃_j/∥h̃_j∥ (normalized CDM-PCA), and √(2n) h_k^⊤ ĥ_j (classical PCA). We see that CDM-PCA slightly outperforms classical PCA.

In Figs. 1–2, we show the normal probability plots of the quantiles of four estimators, √(2n) h_2^⊤ h̃_1, √(2n) h_2^⊤ h̃_1/∥h̃_1∥, √(2n) h_1^⊤ h̃_2/∥h̃_2∥, and √(2n) h_1^⊤ ĥ_2, versus the theoretical quantiles of the standard normal distribution. The performances of these four estimators from CDM-PCA and PCA are similar. However, more points from PCA deviate from the dashed line in the Student-t case (see Figs. 3–4). This reveals that the estimators from CDM-PCA are more stable than those from PCA, especially when the data come from a heavy-tailed distribution.
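The comparison in this section can be reproduced in miniature with the following sketch. It is a scaled-down version of the study, using only the first spiked direction and smaller n and d, so the numbers are indicative only.

```python
import numpy as np

def pc1_errors(rng, d, n, lam1, heavy=False):
    """One replicate: |<estimate, h1>| for classical PCA and CDM-PCA."""
    Z = (rng.standard_t(5, size=(d, 2 * n)) / np.sqrt(5 / 3)
         if heavy else rng.standard_normal((d, 2 * n)))  # standardized t(5) or Gaussian
    X = Z.copy()
    X[0] *= np.sqrt(lam1)                    # single spike along e_1, so h1 = e_1
    # classical PCA: top left singular vector of X = top eigenvector of (2n)^{-1} X X^T
    h_hat = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    # CDM-PCA
    X1, X2 = X[:, :n], X[:, n:]
    U1, lam, U2t = np.linalg.svd(X1.T @ X2 / n)
    h1 = X1 @ U1[:, 0] / np.sqrt(n * lam[0])
    h2 = X2 @ U2t.T[:, 0] / np.sqrt(n * lam[0])
    h_tilde = 0.5 * (h1 + np.sign(h1 @ h2) * h2)
    h_tilde /= np.linalg.norm(h_tilde)
    return abs(h_hat[0]), abs(h_tilde[0])
```

Averaging the two returned values over replicates (with heavy=True for the Student-t case) mirrors, qualitatively, the patterns reported in Tables 1 and 2.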

5. Discussions and concluding remarks

In this article, we investigate the asymptotic behavior of CDM-based PCA in the HDLSS setting and compare it with classical PCA. Our results serve as a necessary complement to the pioneering work of Yata and Aoshima [16]. We show that both CDM-PCA and classical PCA have the same asymptotic covariance for their PC directions. However, the asymptotic normality for CDM-PCA is valid in a broader region. That is, CDM-PCA is able to distinguish the spiked components from the non-spiked components in more general cases. This might partially explain why CDM-PCA has better empirical performance than classical PCA.


Table 2 (Asymptotic normality). Performance comparison of the estimated PC directions for the CDM-PCA and PCA methods. Here h_j, j ∈ {1, 2}, are the first two population directions; h̃_j, h̃_j/∥h̃_j∥, and ĥ_j are the estimated CDM-PCA (non-normalized), CDM-PCA (normalized), and PCA directions. Mean and standard deviation of the projections of the first two estimated PC directions for the Gaussian and Student-t cases, with sample size n = 100 and covariate dimension d = 200, 2000, over 2000 replicate runs. Astd is the theoretical value of the asymptotic standard deviation, given by Astd = √( a_1 a_2/(a_1 − a_2)² · E[z_{11}² z_{21}²] ).

                        CDM-PCA (non-normalized)       CDM-PCA (normalized)                       PCA
                        √2n h_2^⊤h̃_1   √2n h_1^⊤h̃_2   √2n h_2^⊤h̃_1/∥h̃_1∥   √2n h_1^⊤h̃_2/∥h̃_2∥   √2n h_2^⊤ĥ_1   √2n h_1^⊤ĥ_2
Gaussian (Astd = 0.666)
  d = 200     Mean       0.012          0.036           0.012                0.036               −0.018          0.001
              Std        0.688          0.684           0.681                0.680                0.675          0.679
  d = 2000    Mean      −0.012         −0.006          −0.012               −0.006                0.013          0.003
              Std        0.706          0.703           0.692                0.698                0.689          0.697
Student-t (Astd = 1.104)
  d = 200     Mean       0.016          0.005           0.016                0.005               −0.018         −0.041
              Std        0.960          0.979           0.947                0.966                0.997          0.927
  d = 2000    Mean      −0.009          0.013          −0.008                0.013                0.004         −0.023
              Std        0.999          0.941           0.910                0.929                0.979          0.954

Fig. 1. Here h_j, j ∈ {1, 2}, are the first two population directions; h̃_j and ĥ_j are the estimated CDM-PCA (non-normalized) and PCA directions. Normal probability plots for the Gaussian case with sample size n = 100 and covariate dimension d = 200, 2000: (upper left) √(2n) h_2^⊤ h̃_1; (upper right) √(2n) h_2^⊤ ĥ_1; (bottom left) √(2n) h_1^⊤ h̃_2; (bottom right) √(2n) h_1^⊤ ĥ_2.

To achieve consistency, CDM-PCA requires conditions (i)–(iii) listed in Section 2.2. These conditions imply d/(nλ_j) → 0. Wang and Fan [14] have studied the asymptotic behavior of PC directions for classical PCA under the setting d/(nλ_j) → c_j > 0, and have discussed bias adjustment to achieve consistency. It is worth investigating the asymptotic behavior of CDM-PCA under the setting d/(nλ_j) → c_j > 0 in a future study.

Acknowledgments

The authors would like to thank the Editor-in-Chief, Professor Dietrich von Rosen, and two anonymous referees for their constructive comments and suggestions, which have led to a great improvement of the presentation of this article.

Appendix

A.1. Proofs

The following two lemmas will be used in the proofs of Theorems 2 and 5.


Fig. 2. Here h_j, j ∈ {1, 2}, are the first two population directions; h̃_j/∥h̃_j∥ and ĥ_j are the estimated CDM-PCA (normalized) and PCA directions. Normal probability plots for the Gaussian case with sample size n = 100 and covariate dimension d = 200, 2000: (upper left) √(2n) h_2^⊤ h̃_1/∥h̃_1∥; (upper right) √(2n) h_2^⊤ ĥ_1; (bottom left) √(2n) h_1^⊤ h̃_2/∥h̃_2∥; (bottom right) √(2n) h_1^⊤ ĥ_2.

Fig. 3. Here h_j, j ∈ {1, 2}, are the first two population directions; h̃_j and ĥ_j are the estimated CDM-PCA (non-normalized) and PCA directions. Normal probability plots for the Student-t case with sample size n = 100 and covariate dimension d = 200, 2000: (upper left) √(2n) h_2^⊤ h̃_1; (upper right) √(2n) h_2^⊤ ĥ_1; (bottom left) √(2n) h_1^⊤ h̃_2; (bottom right) √(2n) h_1^⊤ ĥ_2.

Lemma A.1. Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct, and the conditions of Theorem 1(a). Then for j, k ∈ {1, . . . , m},
(a) √n ( h_j^⊤ h̃_j − 1 ) = o_p(1);
(b) √(λ̃_j/λ̃_k) h_k^⊤ h̃_j + √(λ̃_k/λ̃_j) h_j^⊤ h̃_k = ( 1/√(λ̃_j λ̃_k) ) h_j^⊤ S h_k + o_p(n^{−1/2});
(c) √(λ̃_j/λ_k) h_k^⊤ h̃_j + √(λ̃_k/λ_j) h_j^⊤ h̃_k = ( 1/λ_j + 1/λ_k ) h_j^⊤ S h_k + ( (λ_j + λ_k)/√(λ_j λ_k) ) o_p(n^{−1/2}).

Proof. (a) Based on Yata and Aoshima [16, equation (A6)], we have

    ( z_{j(ℓ)}/∥z_{j(ℓ)}∥ )^⊤ ũ_{j(ℓ)} = 1 + o_p(n^{−1/2}), j ∈ {1, . . . , m}, ℓ ∈ {1, 2}.   (A.1)

It implies that there exist an n × 1 normalized vector y_j and two random variables b_{1n}, b_{2n} such that y_j^⊤ z_{j(ℓ)} = 0 and

    ũ_{j(ℓ)} = b_{1n} z_{j(ℓ)}/∥z_{j(ℓ)}∥ + b_{2n} y_j, j ∈ {1, . . . , m},   (A.2)


Fig. 4. Here h_j, j ∈ {1, 2}, are the first two population directions; h̃_j/∥h̃_j∥ and ĥ_j are the estimated CDM-PCA (normalized) and PCA directions. Normal probability plots for the Student-t case with sample size n = 100 and covariate dimension d = 200, 2000: (upper left) √(2n) h_2^⊤ h̃_1/∥h̃_1∥; (upper right) √(2n) h_2^⊤ ĥ_1; (bottom left) √(2n) h_1^⊤ h̃_2/∥h̃_2∥; (bottom right) √(2n) h_1^⊤ ĥ_2.



where b_{1n} = 1 + o_p(n^{−1/2}) and b_{2n} = o_p(n^{−1/4}). Recall that X_{(ℓ)} = H Λ^{1/2} Z_{(ℓ)} and h̃_{j(ℓ)} = X_{(ℓ)} ũ_{j(ℓ)}/√(n λ̃_j), ℓ ∈ {1, 2}. We have

    h_j^⊤ h̃_{j(ℓ)} = √( λ_j/(n λ̃_j) ) ∥z_{j(ℓ)}∥ { 1 + o_p(n^{−1/2}) }.   (A.3)

Again, based on Yata and Aoshima [16, equation (A6)],

    λ̃_j/λ_j = ∥n^{−1/2} z_{j(1)}∥ ∥n^{−1/2} z_{j(2)}∥ + o_p(n^{−1/2}).   (A.4)

By (A.3) and (A.4),

    √n ( h_j^⊤ h̃_j − 1 ) = √n ( h_j^⊤ (h̃_{j(1)} + h̃_{j(2)})/2 − 1 )
     = √n · ( ∥z_{j(1)}∥^{1/2} − ∥z_{j(2)}∥^{1/2} )² / ( 2 ∥z_{j(1)}∥^{1/2} ∥z_{j(2)}∥^{1/2} ) · ( 1 + o_p(n^{−1/2}) )
     = [ √n ( ∥z_{j(1)}∥² − ∥z_{j(2)}∥² )/(2n) ] · [ n ( ∥z_{j(1)}∥^{1/2} − ∥z_{j(2)}∥^{1/2} ) / ( ( ∥z_{j(1)}∥^{1/2} + ∥z_{j(2)}∥^{1/2} ) ∥z_{j(1)}∥^{1/2} ∥z_{j(2)}∥^{1/2} ( ∥z_{j(1)}∥ + ∥z_{j(2)}∥ ) ) ] · ( 1 + o_p(n^{−1/2}) ).   (A.5)

Since z_{j1(ℓ)}, . . . , z_{jn(ℓ)} are independent and identically distributed for each given j,

    √n ( ∥z_{j(1)}∥² − ∥z_{j(2)}∥² )/(2n) = O_p(1),  n ( ∥z_{j(1)}∥^{1/2} − ∥z_{j(2)}∥^{1/2} ) / ( ( ∥z_{j(1)}∥^{1/2} + ∥z_{j(2)}∥^{1/2} ) ∥z_{j(1)}∥^{1/2} ∥z_{j(2)}∥^{1/2} ( ∥z_{j(1)}∥ + ∥z_{j(2)}∥ ) ) = o_p(1).   (A.6)

By (A.5) and (A.6), we have √n ( h_j^⊤ h̃_j − 1 ) = o_p(1).

(b) Since h̃_{j(1)}^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) h̃_{j(1)} = 1 and ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) h̃_{j(1)} = h̃_{j(2)},

    ( h̃_{j(1)} − h_j )^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) ( h̃_{j(1)} − h_j )
     = h̃_{j(1)}^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) h̃_{j(1)} − 2 h_j^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) h̃_{j(1)} + h_j^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) h_j
     = 1 − 2 h_j^⊤ h̃_{j(2)} + (λ_j/λ̃_j) · z_{j(2)}^⊤ z_{j(2)}/n.   (A.7)

Similarly,

    ( h̃_{j(2)} − h_j )^⊤ ( X_{(1)} X_{(1)}^⊤ / (λ̃_j n) ) ( h̃_{j(2)} − h_j ) = 1 − 2 h_j^⊤ h̃_{j(1)} + (λ_j/λ̃_j) · z_{j(1)}^⊤ z_{j(1)}/n.   (A.8)


By (A.7) and (A.8),

    ( h̃_{j(1)} − h_j )^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_j n) ) ( h̃_{j(1)} − h_j ) + ( h̃_{j(2)} − h_j )^⊤ ( X_{(1)} X_{(1)}^⊤ / (λ̃_j n) ) ( h̃_{j(2)} − h_j )
     = 4 ( 1 − h_j^⊤ h̃_j ) + 2 ( (λ_j/λ̃_j) · z_j^⊤ z_j/(2n) − 1 )
     = 2 ( (λ_j/λ̃_j) · z_j^⊤ z_j/(2n) − 1 ) + o_p(n^{−1/2})   [by Lemma A.1(a)]
     = 2 ( ( ∥z_{j(1)}∥² + ∥z_{j(2)}∥² )/( 2 ∥z_{j(1)}∥ ∥z_{j(2)}∥ ) − 1 ) + o_p(n^{−1/2})   [by (A.4)]
     = ( ∥z_{j(1)}∥ − ∥z_{j(2)}∥ )² / ( ∥z_{j(1)}∥ ∥z_{j(2)}∥ ) + o_p(n^{−1/2}) = o_p(n^{−1/2}).   (A.9)

By the same arguments,

    ( h̃_{k(1)} − h_k )^⊤ ( X_{(2)} X_{(2)}^⊤ / (λ̃_k n) ) ( h̃_{k(1)} − h_k ) + ( h̃_{k(2)} − h_k )^⊤ ( X_{(1)} X_{(1)}^⊤ / (λ̃_k n) ) ( h̃_{k(2)} − h_k ) = o_p(n^{−1/2}).   (A.10)

By (A.9), (A.10), and the Cauchy inequality, we have

    ( h̃_{k(1)} − h_k )^⊤ ( X_{(2)} X_{(2)}^⊤ / (√(λ̃_j λ̃_k) n) ) ( h̃_{j(1)} − h_j ) + ( h̃_{k(2)} − h_k )^⊤ ( X_{(1)} X_{(1)}^⊤ / (√(λ̃_j λ̃_k) n) ) ( h̃_{j(2)} − h_j ) = o_p(n^{−1/2}).   (A.11)

Since h̃_{j(1)}^⊤ h̃_{j(2)} = 1 and h̃_{j(1)}^⊤ h̃_{k(2)} = 0 for j ≠ k, the left-hand side of (A.11) equals

    ( −√(λ̃_j/λ̃_k) h_k^⊤ h̃_{j(2)} − √(λ̃_k/λ̃_j) h_j^⊤ h̃_{k(2)} ) + ( −√(λ̃_j/λ̃_k) h_k^⊤ h̃_{j(1)} − √(λ̃_k/λ̃_j) h_j^⊤ h̃_{k(1)} ) + ( 2/√(λ̃_j λ̃_k) ) h_j^⊤ S h_k
     = 2 ( −√(λ̃_j/λ̃_k) h_k^⊤ h̃_j − √(λ̃_k/λ̃_j) h_j^⊤ h̃_k + ( 1/√(λ̃_j λ̃_k) ) h_j^⊤ S h_k ).   (A.12)

Thus, (b) follows from (A.11) and (A.12).

(c) We use the same approach as for part (b). Define, for ℓ ∈ {1, 2},

    Ψ(ℓ) = (1/2) { ( X_{(ℓ)} X_{(ℓ)}^⊤ / n ) ( ∑_{r=1}^m λ_r^{−1} h_r h_r^⊤ ) + ( ∑_{r=1}^m λ_r^{−1} h_r h_r^⊤ ) ( X_{(ℓ)} X_{(ℓ)}^⊤ / n ) }.

Note that Ψ(ℓ) is a non-negative definite matrix. We have

    ( h̃_{j(1)} − h_j )^⊤ Ψ(2) ( h̃_{j(1)} − h_j ) = ∑_{r=1}^m ( ũ_{j(2)}^⊤ z_{r(2)}/√n )( z_{r(1)}^⊤ ũ_{j(1)}/√n ) + z_{j(2)}^⊤ z_{j(2)}/n − √(λ̃_j/λ_j) ( ũ_{j(2)}^⊤ z_{j(2)}/√n ) − √(λ_j/λ̃_j) ∑_{r=1}^m ( ũ_{j(1)}^⊤ z_{r(1)}/√n )( z_{j(2)}^⊤ z_{r(2)}/n ),   (A.13)

    ( h̃_{j(2)} − h_j )^⊤ Ψ(1) ( h̃_{j(2)} − h_j ) = ∑_{r=1}^m ( ũ_{j(1)}^⊤ z_{r(1)}/√n )( z_{r(2)}^⊤ ũ_{j(2)}/√n ) + z_{j(1)}^⊤ z_{j(1)}/n − √(λ̃_j/λ_j) ( ũ_{j(1)}^⊤ z_{j(1)}/√n ) − √(λ_j/λ̃_j) ∑_{r=1}^m ( ũ_{j(2)}^⊤ z_{r(2)}/√n )( z_{j(1)}^⊤ z_{r(1)}/n ).   (A.14)


By (A.2), (A.4), (A.13), and (A.14),

    ( h̃_{j(1)} − h_j )^⊤ Ψ(2) ( h̃_{j(1)} − h_j ) + ( h̃_{j(2)} − h_j )^⊤ Ψ(1) ( h̃_{j(2)} − h_j )
     = 2 { λ̃_j/λ_j + z_j^⊤ z_j/(2n) − √(λ̃_j/λ_j) ( ∥z_{j(1)}∥ + ∥z_{j(2)}∥ )/√n } + o_p(n^{−1/2})   [by (A.2)]
     = ( ( ∥z_{j(1)}∥ + ∥z_{j(2)}∥ )/√n ) ( ( ∥z_{j(1)}∥/√n )^{1/2} − ( ∥z_{j(2)}∥/√n )^{1/2} )² + o_p(n^{−1/2})   [by (A.4)]
     = o_p(n^{−1/2}).   (A.15)

Similarly,

    ( h̃_{k(1)} − h_k )^⊤ Ψ(2) ( h̃_{k(1)} − h_k ) + ( h̃_{k(2)} − h_k )^⊤ Ψ(1) ( h̃_{k(2)} − h_k ) = o_p(n^{−1/2}).   (A.16)

By (A.15), (A.16), and the Cauchy inequality, we have

    ( h̃_{j(1)} − h_j )^⊤ Ψ(2) ( h̃_{k(1)} − h_k ) + ( h̃_{j(2)} − h_j )^⊤ Ψ(1) ( h̃_{k(2)} − h_k ) = o_p(n^{−1/2}).   (A.17)

On the other hand, expanding the left-hand side of (A.17) as in (A.13) and (A.14) gives

    ( h̃_{j(1)} − h_j )^⊤ Ψ(2) ( h̃_{k(1)} − h_k ) + ( h̃_{j(2)} − h_j )^⊤ Ψ(1) ( h̃_{k(2)} − h_k )
     = √(λ̃_j/λ_k) h_k^⊤ h̃_j + √(λ̃_k/λ_j) h_j^⊤ h̃_k − ( 1/λ_j + 1/λ_k ) h_j^⊤ S h_k + R_{jk},

where R_{jk} collects the cross-product remainder terms analogous to the sums in (A.13) and (A.14). Based on (A.2) and the fact that X_{(1)} and X_{(2)} are independent,

    R_{jk} = ( (λ_j + λ_k)/√(λ_k λ_j) ) o_p(n^{−1/2}).   (A.18)

Thus, (c) follows from (A.17) and (A.18). □

Lemma A.2. Assume the spiked covariance model (1), where the leading m population eigenvalues are distinct, and conditions (i′)–(ii′) in Theorem 4. Then for $j, k \in \{1, \ldots, m\}$,

(a) $\sqrt n\, (h_j^\top \hat h_j - 1) = o_p(1)$;

(b) $h_k^\top \hat h_j + h_j^\top \hat h_k = o_p(n^{-1/2})$;

(c) $\sqrt{\hat\lambda_j/\hat\lambda_k}\; h_k^\top \hat h_j + \sqrt{\hat\lambda_k/\hat\lambda_j}\; h_j^\top \hat h_k = (\hat\lambda_j \hat\lambda_k)^{-1/2}\, h_j^\top S h_k + o_p(n^{-1/2})$.

Proof. (a) Since $\hat h_j = X \hat u_j / \sqrt{2n \hat\lambda_j}$ and $X = H \Lambda^{1/2} Z$,
\[
h_j^\top \hat h_j = \sqrt{\frac{\lambda_j}{2n \hat\lambda_j}}\; z_j^\top \hat u_j.
\tag{A.19}
\]


Based on Yata and Aoshima [15, equation (29)], we have, for $j \in \{1, \ldots, m\}$,
\[
\Big(\frac{z_j}{\|z_j\|}\Big)^\top \hat u_j = 1 + o_p(n^{-1/2}).
\tag{A.20}
\]
It implies that there exist a $2n \times 1$ normalized vector $y_j$ and two random variables $b_{1n}$, $b_{2n}$ such that $y_j^\top z_j = 0$ and
\[
\hat u_j = b_{1n}\frac{z_j}{\|z_j\|} + b_{2n} y_j, \quad \text{given } j \le m,
\tag{A.21}
\]
with $b_{1n} = 1 + o_p(n^{-1/2})$ and $b_{2n} = o_p(n^{-1/4})$. Together with (A.19),
\[
h_j^\top \hat h_j = \sqrt{\frac{\lambda_j}{2n \hat\lambda_j}\, z_j^\top z_j} + o_p(n^{-1/2}).
\tag{A.22}
\]

Again, by Yata and Aoshima [15, equation (29)], we have

ˆ λj ∥zj ∥2 = + op (n−1/2 ). λj 2n

(A.23)
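(A.23) lends itself to a direct numerical check. The following sketch is our illustration, not part of the proof ($H = I$; sizes and eigenvalues are arbitrary choices): it compares $\hat\lambda_1/\lambda_1$ with $\|z_1\|^2/(2n)$, and also verifies $\|\hat h_1\| = 1$ and $h_1^\top \hat h_1 \approx 1$ as in Lemma A.2(a).

```python
import numpy as np

# Illustrative sketch (not from the paper): dual PCA in a spiked model, H = I.
rng = np.random.default_rng(1)
d, n = 2000, 50                      # 2n = 100 observations
lam = np.ones(d)
lam[0], lam[1] = 1.0e4, 2.5e3

Z = rng.standard_normal((d, 2 * n))
X = np.sqrt(lam)[:, None] * Z        # X = H Lambda^{1/2} Z with H = I

# dual sample covariance (2n x 2n); lambda^_j are its eigenvalues
SD = X.T @ X / (2 * n)
evals, evecs = np.linalg.eigh(SD)
lam_hat_1, u_hat_1 = evals[-1], evecs[:, -1]

ratio = lam_hat_1 / lam[0]
approx = Z[0] @ Z[0] / (2 * n)       # ||z_1||^2 / (2n), cf. (A.23)

# h^_1 = X u^_1 / sqrt(2n lambda^_1) has unit norm by construction
h_hat_1 = X @ u_hat_1 / np.sqrt(2 * n * lam_hat_1)
inner = abs(h_hat_1[0])              # |h_1' h^_1|; eigenvector sign is arbitrary
print(ratio, approx, inner)
```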

By (A.22) and (A.23), the proof of (a) is completed.

(b) Based on the fact that $\|\hat h_j\| = 1$ and part (a),
\[
\|\hat h_j - h_j\|^2 = 2(1 - h_j^\top \hat h_j) = 2 h_j^\top (h_j - \hat h_j) = o_p(n^{-1/2}).
\tag{A.24}
\]
Similarly,
\[
\|\hat h_k - h_k\|^2 = o_p(n^{-1/2}).
\tag{A.25}
\]
From (A.24), (A.25) and the fact that $\hat h_j^\top \hat h_k = 0$, it is implied by the Cauchy–Schwarz inequality that
\[
\big(h_j^\top \hat h_k + h_k^\top \hat h_j\big)^2 = \big((\hat h_j - h_j)^\top (\hat h_k - h_k)\big)^2 \le \|\hat h_j - h_j\|^2\, \|\hat h_k - h_k\|^2 = o_p(n^{-1}).
\]
(c) We will use the same approach as for part (b). Note that $S \hat h_j = \hat\lambda_j \hat h_j$, $j \in \{1, \ldots, 2n\}$, and $S = (2n)^{-1} H \Lambda^{1/2} Z Z^\top \Lambda^{1/2} H^\top$. We have
\[
\hat\lambda_j^{-1} (\hat h_j - h_j)^\top S (\hat h_j - h_j)
= 1 - 2 h_j^\top \hat h_j + \frac{h_j^\top S h_j}{\hat\lambda_j}
= 2 - 2 h_j^\top \hat h_j + \Big(\frac{\lambda_j}{\hat\lambda_j}\, \frac{z_j^\top z_j}{2n} - 1\Big)
= o_p(n^{-1/2}),
\tag{A.26}
\]
where the last equality uses Lemma A.2(a) and (A.23). Similarly,
\[
\hat\lambda_k^{-1} (\hat h_k - h_k)^\top S (\hat h_k - h_k) = o_p(n^{-1/2}).
\tag{A.27}
\]
From (A.26), (A.27) and the Cauchy–Schwarz inequality, we have

\[
\left(\sqrt{\frac{\hat\lambda_j}{\hat\lambda_k}}\, h_k^\top \hat h_j + \sqrt{\frac{\hat\lambda_k}{\hat\lambda_j}}\, h_j^\top \hat h_k - \frac{1}{\sqrt{\hat\lambda_j \hat\lambda_k}}\, h_j^\top S h_k\right)^2
= \left(\frac{1}{\sqrt{\hat\lambda_j \hat\lambda_k}}\, (\hat h_j - h_j)^\top S (\hat h_k - h_k)\right)^2
\]
\[
\le \left\{\frac{(\hat h_j - h_j)^\top S (\hat h_j - h_j)}{\hat\lambda_j}\right\}\left\{\frac{(\hat h_k - h_k)^\top S (\hat h_k - h_k)}{\hat\lambda_k}\right\} = o_p(n^{-1}). \quad \square
\]

Proof of Theorem 2. We first show the asymptotic normality for $\sqrt{2n}\, h_k^\top \tilde h_j$. It follows from Lemma A.1(a)–(b) that
\[
\Big(1 - \frac{\lambda_j}{\lambda_k}\Big) h_k^\top \tilde h_j
= \frac{\lambda_j}{\tilde\lambda_j \lambda_k}\, h_j^\top S h_k + \sqrt{\frac{\lambda_j}{\lambda_k}}\, o_p(n^{-1/2})
= \sqrt{\frac{\lambda_j}{\lambda_k}} \Big(\frac{z_j^\top z_k}{2n}\Big) (1 + o_p(1)) + \sqrt{\frac{\lambda_j}{\lambda_k}}\, o_p(n^{-1/2}).
\tag{A.28}
\]

Thus,
\[
h_k^\top \tilde h_j = \frac{\sqrt{\lambda_j \lambda_k}}{\lambda_k - \lambda_j} \Big(\frac{1}{2n} \sum_{i=1}^{2n} z_{ji} z_{ki}\Big) (1 + o_p(1)) + \frac{\lambda_k}{\lambda_k - \lambda_j}\, o_p(n^{-1/2}).
\tag{A.29}
\]
We have
\[
H_{[1:m]}^\top \tilde h_j = \frac{1}{2n} \sum_{i=1}^{2n}
\begin{pmatrix}
\dfrac{\sqrt{\lambda_j \lambda_1}}{\lambda_1 - \lambda_j}\, z_{ji} z_{1i} \\
\vdots \\
\dfrac{\sqrt{\lambda_j \lambda_m}}{\lambda_m - \lambda_j}\, z_{ji} z_{mi}
\end{pmatrix}
(1 + o_p(1)) + o_p(n^{-1/2})\, w
\tag{A.30}
\]
for some $m \times 1$ unit vector $w$.


For $j \ne k$ and $j \ne k'$, $j, k, k' \in (s_{u-1}, s_u]$,
\[
\mathrm{Avar}\big(\sqrt{2n}\, h_k^\top \tilde h_j\big) = \frac{a_k a_j}{(a_k - a_j)^2}\, \mathrm{E}[z_{j1}^2 z_{k1}^2],
\tag{A.31}
\]
\[
\mathrm{Avar}\big(\sqrt{2n}\, h_k^\top \tilde h_j,\ \sqrt{2n}\, h_{k'}^\top \tilde h_j\big) = \frac{\sqrt{a_k a_{k'}}\, a_j}{(a_k - a_j)(a_{k'} - a_j)}\, \mathrm{E}[z_{k1} z_{k'1} z_{j1}^2].
\tag{A.32}
\]
For $j \in (s_{u-1}, s_u]$ and $k \notin (s_{u-1}, s_u]$, $\sqrt{2n}\, h_k^\top \tilde h_j = o_p(1)$. Thus, the proof is completed. □
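For Gaussian data $\mathrm{E}[z_{j1}^2 z_{k1}^2] = 1$, so for a common spike exponent (A.31) reduces to $\mathrm{Avar}(\sqrt{2n}\, h_k^\top \tilde h_j) = \lambda_k \lambda_j/(\lambda_k - \lambda_j)^2$. The Monte Carlo sketch below is ours, not the paper's: the combined direction $\tilde h_j$ is formed as in (A.33), $H = I$, and $d$, $n$, $B$ and the eigenvalues are illustrative choices.

```python
import numpy as np

# Illustrative Monte Carlo check of (A.31) for Gaussian data (H = I).
rng = np.random.default_rng(2)
d, n, B = 500, 50, 200
lam = np.ones(d)
lam[0], lam[1] = 1.0e4, 2.5e3
target = lam[0] * lam[1] / (lam[0] - lam[1]) ** 2   # asymptotic variance, 4/9

stats = []
for _ in range(B):
    Z = rng.standard_normal((d, 2 * n))
    X = np.sqrt(lam)[:, None] * Z
    X1, X2 = X[:, :n], X[:, n:]
    U1, s, V2t = np.linalg.svd(X1.T @ X2 / n)       # CDM singular value decomposition
    lam_t1, u1, u2 = s[0], U1[:, 0], V2t[0]
    # combined CDM direction as in (A.33): (X1 u_{1(1)} + X2 u_{1(2)}) / (2 sqrt(n lam~_1))
    h_t1 = (X1 @ u1 + X2 @ u2) / (2.0 * np.sqrt(n * lam_t1))
    stats.append(np.sqrt(2 * n) * h_t1[1])          # sqrt(2n) h_2' h~_1
var_hat = np.var(stats)
print(var_hat, target)   # the empirical variance is close to the target
```

The overall sign of each replicate is arbitrary (SVD sign ambiguity), but the variance of the statistic is unaffected.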

Proof of Theorem 3. (a) It is a direct result of Theorem 2.

(b) We have
\[
\|\tilde h_j\|^2 = \begin{pmatrix} \tilde u_{j(1)}/\sqrt 2 \\ \tilde u_{j(2)}/\sqrt 2 \end{pmatrix}^\top (2n \tilde\lambda_j)^{-1} X^\top X \begin{pmatrix} \tilde u_{j(1)}/\sqrt 2 \\ \tilde u_{j(2)}/\sqrt 2 \end{pmatrix}.
\tag{A.33}
\]

Note that $\tilde u_{j(1)}^\top X_{(1)}^\top X_{(2)} \tilde u_{j(2)}/(n \tilde\lambda_j) = 1$. Based on (A.2) and (A.21), it implies that there exist a $2n \times 1$ normalized vector $y_j$ and two random variables $b_{1n}$, $b_{2n}$ such that $y_j^\top \hat u_j = 0$ and
\[
\begin{pmatrix} \tilde u_{j(1)}/\sqrt 2 \\ \tilde u_{j(2)}/\sqrt 2 \end{pmatrix} = b_{1n} \hat u_j + b_{2n} y_j, \quad \text{given } j \le m,
\tag{A.34}
\]
with $b_{1n} = 1 + o_p(n^{-1/2})$ and $b_{2n} = o_p(n^{-1/4})$. Therefore,

\[
\|\tilde h_j\|^2 = \frac{\hat\lambda_j}{\tilde\lambda_j}\{1 + o_p(n^{-1/2})\}
= \frac{\|z_{j(1)}\|^2 + \|z_{j(2)}\|^2}{2 \|z_{j(1)}\|\, \|z_{j(2)}\|}\{1 + o_p(n^{-1/2})\}
= 1 + o_p(n^{-1/2}),
\tag{A.35}
\]
where the first equality uses (A.33) and (A.34), the second uses (A.4) and (A.23), and the third uses (A.9).

Similar to (A.34), there exist a $2n \times 1$ unit vector $y_k$ and two random variables $b'_{1n} = 1 + o_p(n^{-1/2})$ and $b'_{2n} = o_p(n^{-1/4})$ such that $y_k^\top \hat u_k = 0$ and
\[
\begin{pmatrix} \tilde u_{k(1)}/\sqrt 2 \\ \tilde u_{k(2)}/\sqrt 2 \end{pmatrix} = b'_{1n} \hat u_k + b'_{2n} y_k, \quad \text{given } k \le m.
\]

Then, for $k \ne j$ and $k, j \le m$,
\[
\tilde h_k^\top \tilde h_j
= \frac{1}{2n \sqrt{\tilde\lambda_k \tilde\lambda_j}} \begin{pmatrix} \tilde u_{k(1)}/\sqrt 2 \\ \tilde u_{k(2)}/\sqrt 2 \end{pmatrix}^\top X^\top X \begin{pmatrix} \tilde u_{j(1)}/\sqrt 2 \\ \tilde u_{j(2)}/\sqrt 2 \end{pmatrix}
= \frac{1}{2n \sqrt{\tilde\lambda_k \tilde\lambda_j}} \big(b'_{1n} \hat u_k + b'_{2n} y_k\big)^\top X^\top X \big(b_{1n} \hat u_j + b_{2n} y_j\big)
= o_p\big(n^{-1/4}\big).
\]

Note that $(2n)^{-1} X^\top X \hat u_j = \hat\lambda_j \hat u_j$. Moreover, it is implied by Lemma A.1(a) and (A.35) that
\[
h_j^\top \Big(\frac{\tilde h_j}{\|\tilde h_j\|} - h_j\Big) = \frac{h_j^\top(\tilde h_j - h_j)}{\|\tilde h_j\|} + \frac{1 - \|\tilde h_j\|}{\|\tilde h_j\|} = o_p(n^{-1/2}).
\]
As for $H_{\mathrm{tail}}^\top \tilde h_j$, it is implied by Lemma A.1(a) and (A.35) that
\[
\|H_{\mathrm{tail}}^\top \tilde h_j\|^2 \le \|\tilde h_j\|^2 - |h_j^\top \tilde h_j|^2
= (\|\tilde h_j\|^2 - 1) + 1 - |h_j^\top(\tilde h_j - h_j) + 1|^2
= (\|\tilde h_j\|^2 - 1) - \big(h_j^\top(\tilde h_j - h_j)\big)^2 - 2 h_j^\top(\tilde h_j - h_j)
= o_p(n^{-1/2}).
\]
Since $\|\tilde h_j\| = 1 + o_p(n^{-1/2})$,
\[
\Big\| H_{\mathrm{tail}}^\top \frac{\tilde h_j}{\|\tilde h_j\|} \Big\|^2 = o_p(n^{-1/2}),
\]
and the proof is completed. □

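The two facts just established, $\|\tilde h_j\| = 1 + o_p(n^{-1/2})$ and the vanishing tail mass, can be illustrated numerically. The sketch is ours, not the paper's ($H = I$, illustrative constants, combined $\tilde h_j$ formed as in (A.33)).

```python
import numpy as np

# Illustrative sketch (not from the paper): norm and tail mass of the CDM direction.
rng = np.random.default_rng(3)
d, n = 2000, 50
lam = np.ones(d)
lam[0], lam[1] = 1.0e4, 2.5e3        # m = 2 spikes on the first two coordinates

Z = rng.standard_normal((d, 2 * n))
X = np.sqrt(lam)[:, None] * Z
X1, X2 = X[:, :n], X[:, n:]
U1, s, V2t = np.linalg.svd(X1.T @ X2 / n)
lam_t1, u1, u2 = s[0], U1[:, 0], V2t[0]
h_t1 = (X1 @ u1 + X2 @ u2) / (2.0 * np.sqrt(n * lam_t1))   # combined CDM direction

norm = np.linalg.norm(h_t1)                # close to 1, cf. (A.35)
tail_mass = float(h_t1[2:] @ h_t1[2:])     # ||H_tail' h~_1||^2, nearly 0
print(norm, tail_mass)
```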

Proof of Theorem 5. (a) First, we will show the asymptotic normality of $h_k^\top \hat h_j$. It follows from Lemma A.2(b) and (c) that
\[
\left(\sqrt{\frac{\hat\lambda_j}{\hat\lambda_k}} - \sqrt{\frac{\hat\lambda_k}{\hat\lambda_j}}\right) h_k^\top \hat h_j
= \frac{1}{\sqrt{\hat\lambda_j \hat\lambda_k}}\, h_j^\top S h_k + o_p(n^{-1/2})
= \frac{\sqrt{\lambda_j \lambda_k}}{\sqrt{\hat\lambda_j \hat\lambda_k}}\, \frac{z_j^\top z_k}{2n} + o_p(n^{-1/2}).
\]


Thus,
\[
h_k^\top \hat h_j = \frac{\sqrt{\lambda_j \lambda_k}}{\lambda_j - \lambda_k} \Big(\frac{1}{2n} \sum_{i=1}^{2n} z_{ji} z_{ki}\Big) (1 + o_p(1)) + o_p(n^{-1/2}).
\tag{A.36}
\]

Note that $S = (2n)^{-1} X X^\top = (2n)^{-1} H \Lambda^{1/2} Z Z^\top \Lambda^{1/2} H^\top$. We have
\[
H_{[1:m]}^\top \hat h_j = \frac{1}{2n} \sum_{i=1}^{2n}
\begin{pmatrix}
\dfrac{\sqrt{\lambda_j \lambda_1}}{\lambda_j - \lambda_1}\, z_{ji} z_{1i} \\
\vdots \\
\dfrac{\sqrt{\lambda_j \lambda_m}}{\lambda_j - \lambda_m}\, z_{ji} z_{mi}
\end{pmatrix}
(1 + o_p(1)) + o_p(n^{-1/2})\, w
\tag{A.37}
\]

for some $m \times 1$ unit vector $w$. For $j \ne k$ and $j \ne k'$, $j, k, k' \in (s_{u-1} + 1, s_u]$,
\[
\mathrm{Avar}\big(\sqrt{2n}\, h_k^\top \hat h_j\big) = \frac{a_k a_j}{(a_k - a_j)^2}\, \mathrm{E}[z_{j1}^2 z_{k1}^2],
\tag{A.38}
\]
\[
\mathrm{Avar}\big(\sqrt{2n}\, h_k^\top \hat h_j,\ \sqrt{2n}\, h_{k'}^\top \hat h_j\big) = \frac{\sqrt{a_k a_{k'}}\, a_j}{(a_k - a_j)(a_{k'} - a_j)}\, \mathrm{E}[z_{k1} z_{k'1} z_{j1}^2].
\tag{A.39}
\]
When $j \in (s_{u-1} + 1, s_u]$ and $k \notin (s_{u-1} + 1, s_u]$,
\[
\sqrt{2n}\, h_k^\top (\hat h_j - h_j) = o_p(1).
\tag{A.40}
\]

(b) By Lemma A.2(a) and the fact that $\|\hat h_j\| = 1$,
\[
\|H_{\mathrm{tail}}^\top \hat h_j\|^2 \le 1 - |h_j^\top \hat h_j|^2 = 1 - |h_j^\top(\hat h_j - h_j) + 1|^2 = -\{h_j^\top(\hat h_j - h_j)\}^2 - 2 h_j^\top(\hat h_j - h_j) = o_p(n^{-1/2}),
\tag{A.41}
\]
and the theorem is established. □

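(A.36) agrees with first-order eigenvector perturbation for the sample covariance, and can be illustrated in a single draw. The sketch below is ours, not the paper's: $H = I$, so $h_k^\top \hat h_j$ is simply coordinate $k$ of $\hat h_j$, and all constants are illustrative choices.

```python
import numpy as np

# Illustrative sketch (not from the paper): one-draw check of (A.36) with j = 1, k = 2.
rng = np.random.default_rng(4)
d, n = 2000, 50
lam = np.ones(d)
lam[0], lam[1] = 1.0e4, 2.5e3

Z = rng.standard_normal((d, 2 * n))
X = np.sqrt(lam)[:, None] * Z
evals, evecs = np.linalg.eigh(X.T @ X / (2 * n))
lam_hat_1, u_hat_1 = evals[-1], evecs[:, -1]
h_hat_1 = X @ u_hat_1 / np.sqrt(2 * n * lam_hat_1)
if h_hat_1[0] < 0:                 # orient so that h_1' h^_1 > 0
    h_hat_1 = -h_hat_1

actual = h_hat_1[1]                # h_2' h^_1
predicted = np.sqrt(lam[0] * lam[1]) / (lam[0] - lam[1]) * (Z[0] @ Z[1]) / (2 * n)
print(actual, predicted)           # close at the n^{-1/2} scale
```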

References

[1] J. Ahn, J.S. Marron, K.M. Muller, Y.Y. Chi, The high-dimension, low-sample-size geometric representation holds under mild conditions, Biometrika 94 (2007) 760–766.
[2] T.W. Anderson, Asymptotic theory for principal component analysis, Ann. Math. Stat. 34 (1963) 122–148.
[3] M. Aoshima, D. Shen, H. Shen, K. Yata, Y.H. Zhou, J.S. Marron, A survey of high dimension low sample size asymptotics, Aust. N. Z. J. Stat. 60 (2018) 4–19.
[4] M. Aoshima, K. Yata, Two-sample tests for high-dimension, strongly spiked eigenvalue models, Statist. Sinica 28 (2018) 43–62.
[5] Z.D. Bai, Methodologies in spectral analysis of large-dimensional random matrices, a review, Statist. Sinica 9 (1999) 611–677.
[6] J. Baik, G. Ben Arous, S. Péché, Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab. 33 (2005) 1643–1697.
[7] P. Hall, J.S. Marron, A. Neeman, Geometric representation of high dimension, low sample size data, J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 427–444.
[8] A. Ishii, K. Yata, M. Aoshima, Equality tests of high-dimensional covariance matrices under the strongly spiked eigenvalue model, J. Statist. Plann. Inference 202 (2019) 99–111.
[9] I.M. Johnstone, A.Y. Lu, On consistency and sparsity for principal components analysis in high dimensions, J. Amer. Statist. Assoc. 104 (2009) 682–693.
[10] S. Jung, J.S. Marron, PCA consistency in high dimension, low sample size context, Ann. Statist. 37 (2009) 4104–4130.
[11] D. Paul, Asymptotics of sample eigenstructure for a large dimensional spiked covariance model, Statist. Sinica 17 (2007) 1617–1642.
[12] D. Shen, H. Shen, H. Zhu, J.S. Marron, The statistics and mathematics of high dimension low sample size asymptotics, Statist. Sinica 26 (2016) 1747–1770.
[13] D.E. Tyler, The asymptotic distribution of principal component roots under local alternatives to multiple roots, Ann. Statist. 11 (1983) 1232–1242.
[14] W. Wang, J. Fan, Asymptotics of empirical eigenstructure for high dimensional spiked covariance, Ann. Statist. 45 (2017) 1342–1374.
[15] K. Yata, M. Aoshima, PCA consistency for non-Gaussian data in high dimension, low sample size context, Comm. Statist. Theory Methods 38 (2009) 2634–2652.
[16] K. Yata, M. Aoshima, Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix, J. Multivariate Anal. 101 (2010) 2060–2077.
[17] K. Yata, M. Aoshima, Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations, J. Multivariate Anal. 105 (2012) 193–215.
[18] K. Yata, M. Aoshima, Correlation tests for high-dimensional data using extended cross-data-matrix methodology, J. Multivariate Anal. 117 (2013) 313–331.
[19] K. Yata, M. Aoshima, PCA consistency for the power spiked model in high-dimensional settings, J. Multivariate Anal. 122 (2013) 334–354.
[20] K. Yata, M. Aoshima, High-dimensional inference on covariance structures via the extended cross-data-matrix methodology, J. Multivariate Anal. 151 (2016) 151–166.