A note on the unbiased estimator of $\Sigma^2$

Bu Zhou, Jia Guo*

Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546, Singapore

Statistics and Probability Letters 129 (2017) 141–146

Article history: Received 5 February 2017; received in revised form 9 May 2017; accepted 18 May 2017; available online 31 May 2017.

Abstract

This paper gives simple and intuitive derivations of three equivalent forms of a distribution-free and unbiased estimator of the squared covariance matrix $\Sigma^2$. In particular, computationally efficient forms of the unbiased estimators of $\Sigma^2$ and its trace are derived from the computationally intensive U-statistic forms.

MSC 2010: 62H12; 62G05

Keywords: Unbiased estimators; U-statistics; Covariance matrix; High-dimensional data

1. Introduction

Suppose we have n independently distributed p-variate samples $\{x_1, \ldots, x_n\}$ with a common mean vector $\mu$ and covariance matrix $\Sigma$. We are interested in the estimation of the squared covariance matrix $\Sigma^2$. This work is inspired by the estimation of the trace of the squared covariance matrix, $\mathrm{tr}(\Sigma^2)$, which is used in many high-dimensional inference problems, such as the high-dimensional mean and covariance testing problems; see, for example, Bai and Saranadasa (1996), Chen and Qin (2010), and Srivastava and Yanagihara (2010), among others. Let $S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^\top / (n-1)$ be the sample covariance matrix. In the classical setting where the dimension p is fixed, a very natural plug-in estimator of $\mathrm{tr}(\Sigma^2)$ is $\mathrm{tr}(S^2)$. However, $\mathrm{tr}(S^2)$ is biased and is not consistent when p diverges, because the bias of $\mathrm{tr}(S^2)$ depends on p. When the samples $\{x_1, \ldots, x_n\}$ are i.i.d. (independently and identically distributed) from a multivariate normal distribution, an unbiased estimator can easily be found using the moment formulas of the Wishart matrix. An unbiased and translation-invariant estimator of $\mathrm{tr}(\Sigma^2)$ for normally distributed samples is given, for example, in Bai and Saranadasa (1996) (see also (3.18) of Zhang and Xu, 2009) and is used in the estimation of the variance of their two-sample test statistic. This unbiased estimator is in fact a UMVUE (uniformly minimum variance unbiased estimator) of $\mathrm{tr}(\Sigma^2)$ under the normality assumption because it is a function of S only (see also Lemma 2 of Hu et al., 2017). However, this estimator is generally biased for non-normal data, and further assumptions are needed for its asymptotic unbiasedness. Some other asymptotically unbiased estimators of $\mathrm{tr}(\Sigma^2)$ have been proposed, for example, in Ahmad et al. (2008) and Chen and Qin (2010), but these estimators also require additional conditions for asymptotic unbiasedness and are usually not translation-invariant.

In high-dimensional analysis, the unbiasedness of an estimator is a very desirable property because any small bias depending on p may no longer be negligible when p diverges. Besides, we want the estimator of $\mathrm{tr}(\Sigma^2)$ to be translation-invariant in the sense that it does not depend on the mean vector $\mu$, because in many cases when we need the estimation of

* Corresponding author. E-mail addresses: [email protected] (B. Zhou), [email protected] (J. Guo).



$\mathrm{tr}(\Sigma^2)$, $\mu$ is also unknown. Another important factor to be considered is computational efficiency: when both n and p are large, the computation times of inefficient estimators can be intolerable.

In this paper, we derive three different forms of the unbiased estimator of $\Sigma^2$ under only the independence and the common mean and covariance assumptions. In particular, a computationally efficient form of the unbiased estimator of $\Sigma^2$ is derived from the computationally intensive U-statistic form. The derivations are simple, intuitive and extendable, relying only on some basic ideas of U-statistics and elementary calculations. Based on the three different forms of the unbiased estimator of $\Sigma^2$, we obtain three forms, including a computationally efficient one, of the unbiased and translation-invariant estimator of $\mathrm{tr}(\Sigma^2)$, which appear in Hu et al. (2017), Chen et al. (2010) and Yamada and Himeno (2015), respectively, and we establish their equivalence automatically. In particular, this paper provides a new and intuitive way to derive the computationally efficient form of the unbiased estimator of $\mathrm{tr}(\Sigma^2)$ given by Yamada and Himeno (2015), and thus unifies, from a U-statistics perspective, the three unbiased and translation-invariant estimators of $\mathrm{tr}(\Sigma^2)$ proposed by Chen et al. (2010), Hu et al. (2017) and Yamada and Himeno (2015).

2. Main results

An unbiased estimator can be given directly in the form of a U-statistic. Note that $\Sigma^2 = E\{(x_1 - x_3)(x_1 - x_4)^\top (x_2 - x_5)(x_2 - x_6)^\top\}$; by averaging over all possible combinations, the U-statistic for estimating $\Sigma^2$ is

$$\widehat{\Sigma^2}_{(1)} = \frac{1}{P_6} \sum_{s,t,i,j,k,l}^{*} \big\{ (x_s - x_i)(x_s - x_j)^\top (x_t - x_k)(x_t - x_l)^\top \big\}, \tag{1}$$

where the ``hollow sum'' $\sum^*_{i_1,\ldots,i_m}$ denotes the summation over mutually distinct indices $i_1, \ldots, i_m \in \{1, \ldots, n\}$, and $P_m = n \times \cdots \times (n - m + 1)$. By linearity of the trace and the expectation operators, we obtain the first form of the unbiased estimator of $\mathrm{tr}(\Sigma^2)$,

$$\widehat{\mathrm{tr}(\Sigma^2)}_{(1)} = \mathrm{tr}\big( \widehat{\Sigma^2}_{(1)} \big) = \frac{1}{P_6} \sum_{s,t,i,j,k,l}^{*} \big\{ (x_t - x_l)^\top (x_s - x_i)(x_s - x_j)^\top (x_t - x_k) \big\}, \tag{2}$$

which is a U-statistic for estimating $\mathrm{tr}(\Sigma^2)$. The form $\widehat{\mathrm{tr}(\Sigma^2)}_{(1)}$ was used by Hu et al. (2017) for their MANOVA test. The unbiasedness of $\widehat{\Sigma^2}_{(1)}$ and $\widehat{\mathrm{tr}(\Sigma^2)}_{(1)}$ does not depend on the underlying distribution of the samples. What is more, $\widehat{\Sigma^2}_{(1)}$ and $\widehat{\mathrm{tr}(\Sigma^2)}_{(1)}$ are translation-invariant because factors like $x_s - x_i$ in (1) and (2) are all translation-invariant.

Obviously, $\widehat{\Sigma^2}_{(1)}$ is computationally intensive because the summation $\sum^*_{s,t,i,j,k,l}$ is of order $O(n^6)$. But notice that any term in the expansion of $(x_s - x_i)(x_s - x_j)^\top (x_t - x_k)(x_t - x_l)^\top$ has at most four unique indices, for example $x_i x_j^\top x_k x_l^\top$, so the order of the summation can be reduced to $O(n^4)$ by expanding the kernel $(x_s - x_i)(x_s - x_j)^\top (x_t - x_k)(x_t - x_l)^\top$. This is equivalent to an alternative approach of constructing the unbiased estimator of $\Sigma^2$ based on a linear combination of U-statistics. Note that
$$\Sigma^2 = \big\{ E(x_1 x_1^\top) - \mu\mu^\top \big\}^2 = \big\{ E(x_1 x_1^\top) \big\}^2 + \mu\mu^\top \mu\mu^\top - E(x_1 x_1^\top)\mu\mu^\top - \mu\mu^\top E(x_1 x_1^\top)$$
by expanding $\Sigma^2$. Replacing the four terms in the expansion of $\Sigma^2$ by their corresponding U-statistics, we have the following estimator of $\Sigma^2$,

$$\widehat{\Sigma^2}_{(2)} = \frac{1}{P_2} \sum_{i,j}^{*} \big( x_i x_i^\top x_j x_j^\top \big) + \frac{1}{P_4} \sum_{i,j,k,l}^{*} \big( x_i x_j^\top x_k x_l^\top \big) - \frac{1}{P_3} \sum_{i,j,k}^{*} \big( x_i x_i^\top x_j x_k^\top \big) - \frac{1}{P_3} \sum_{i,j,k}^{*} \big( x_i x_j^\top x_k x_k^\top \big), \tag{3}$$

and the corresponding estimator of $\mathrm{tr}(\Sigma^2)$,

$$\widehat{\mathrm{tr}(\Sigma^2)}_{(2)} = \mathrm{tr}\big\{ \widehat{\Sigma^2}_{(2)} \big\} = \frac{1}{P_2} \sum_{i,j}^{*} \big( x_j^\top x_i x_i^\top x_j \big) + \frac{1}{P_4} \sum_{i,j,k,l}^{*} \big( x_l^\top x_i x_j^\top x_k \big) - \frac{2}{P_3} \sum_{i,j,k}^{*} \big( x_k^\top x_i x_i^\top x_j \big). \tag{4}$$

ˆ ˆ2 (1) The above estimator tr( Σ2 )(2) was used in Chen et al. (2010) for constructing their test statistics. The equivalence of Σ ⊤ ⊤ ˆ 2 and Σ (2) can be easily shown by expanding (xs − xi )(xs − xj ) (xt − xk )(xt − xl ) (see a complete proof in Appendix A), so we have the following fact. ˆ ˆ ˆ2 (1) = Σ ˆ2 (2) , and tr( Fact 1. Σ Σ2 )(1) = tr( Σ2 )(2) . ˆ ˆ2 (2) and tr( By Fact 1, we immediately know that Σ Σ2 )(2) are also translation-invariant. ˆ2 (2) given in (3) still needs a summation of order O(n4 ), which is also very slow to compute when n is The second form Σ large. A common way to reduce the computation of hollow sums in U-statistics is to represent them in terms of complete sums without any indices restrictions as in Lemma 1 in Appendix A. One may directly simplify the hollow sums in (3) using this idea. However, the computation can be significantly reduced by exploiting the translation-invariant property of the


unbiased estimator $\widehat{\Sigma^2}_{(2)}$. Recall that $\widehat{\Sigma^2}_{(1)}$ is translation-invariant, and so is $\widehat{\Sigma^2}_{(2)}$ by Fact 1. Let $\bar{x} = \sum_{i=1}^n x_i / n$ be the sample mean vector. If we make the following transformation of the original samples,
$$y_i = x_i - \bar{x}, \qquad i = 1, \ldots, n,$$
and define $\widehat{\Sigma^2}_{(2)'}$ in the same way as $\widehat{\Sigma^2}_{(2)}$ in (3) but with $x_i$ replaced by these centered samples $y_i$, $i = 1, \ldots, n$, that is, let
$$\widehat{\Sigma^2}_{(2)'} = \frac{1}{P_2} \sum_{i,j}^{*} \big( y_i y_i^\top y_j y_j^\top \big) + \frac{1}{P_4} \sum_{i,j,k,l}^{*} \big( y_i y_j^\top y_k y_l^\top \big) - \frac{1}{P_3} \sum_{i,j,k}^{*} \big( y_i y_i^\top y_j y_k^\top \big) - \frac{1}{P_3} \sum_{i,j,k}^{*} \big( y_i y_j^\top y_k y_k^\top \big), \tag{5}$$

we immediately have $\widehat{\Sigma^2}_{(2)'} = \widehat{\Sigma^2}_{(2)}$ by the translation-invariant property of $\widehat{\Sigma^2}_{(2)}$. To further simplify the expression of $\widehat{\Sigma^2}_{(2)'}$, we note that among all the complete sums of the form $\sum_{I} y_i y_j^\top y_k y_l^\top$, where I denotes the set of indices $i, j, k, l \in \{1, \ldots, n\}$ with the possibility that some indices coincide (always take the same values), only the following four are not 0:
$$\sum_{i=1}^n \big( y_i y_i^\top \big)^2, \qquad \sum_{i,j=1}^n y_i y_i^\top y_j y_j^\top = \Big( \sum_{i=1}^n y_i y_i^\top \Big)^2, \qquad \sum_{i,j=1}^n \big( y_i y_j^\top \big)^2, \qquad \text{and} \qquad \sum_{i,j=1}^n y_i y_j^\top y_j y_i^\top.$$

The above result is due to the fact that $\sum_{i=1}^n y_i = 0$. This important observation implies that if we re-express the hollow-sum terms in (5) in terms of complete sums, most terms are 0 and only the above four terms remain. So instead of directly re-expressing the hollow sums in (3), we re-express the hollow sums in (5) in terms of complete sums. Applying Lemma 1 to the four terms in (5), we have
$$\sum_{i,j}^{*} \big( y_i y_i^\top y_j y_j^\top \big) = \Big( \sum_{i=1}^n y_i y_i^\top \Big)^2 - \sum_{i=1}^n \big( y_i y_i^\top \big)^2,$$
$$\sum_{i,j,k}^{*} \big( y_i y_j^\top y_k y_k^\top \big) = \sum_{i,j,k}^{*} \big( y_i y_i^\top y_j y_k^\top \big) = 2 \sum_{i=1}^n \big( y_i y_i^\top \big)^2 - \Big( \sum_{i=1}^n y_i y_i^\top \Big)^2,$$
$$\sum_{i,j,k,l}^{*} \big( y_i y_j^\top y_k y_l^\top \big) = -6 \sum_{i=1}^n \big( y_i y_i^\top \big)^2 + \Big( \sum_{i=1}^n y_i y_i^\top \Big)^2 + \sum_{i,j=1}^n \big( y_i y_j^\top \big)^2 + \sum_{i,j=1}^n y_i y_j^\top y_j y_i^\top.$$
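As an aside, these identities are easy to check numerically by brute force. The following sketch (ours, not part of the paper's supplementary material) compares the first and last hollow sums with the closed forms above on a small centered sample:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
n, p = 7, 3
Y = rng.standard_normal((n, p))
Y -= Y.mean(axis=0)                                  # enforce sum_i y_i = 0

M = Y.T @ Y                                          # sum_i y_i y_i'
A = sum(np.outer(Y[i], Y[i]) @ np.outer(Y[i], Y[i]) for i in range(n))

# First identity: hollow sum over two distinct indices.
lhs2 = sum(np.outer(Y[i], Y[i]) @ np.outer(Y[j], Y[j])
           for i, j in permutations(range(n), 2))
np.testing.assert_allclose(lhs2, M @ M - A, atol=1e-8)

# Last identity: hollow sum over four distinct indices.
lhs4 = sum(np.outer(Y[i], Y[j]) @ np.outer(Y[k], Y[l])
           for i, j, k, l in permutations(range(n), 4))
sq = sum(np.outer(Y[i], Y[j]) @ np.outer(Y[i], Y[j])     # sum_{i,j} (y_i y_j')^2
         for i in range(n) for j in range(n))
flip = sum(np.outer(Y[i], Y[j]) @ np.outer(Y[j], Y[i])   # sum_{i,j} y_i y_j' y_j y_i'
           for i in range(n) for j in range(n))
np.testing.assert_allclose(lhs4, -6 * A + M @ M + sq + flip, atol=1e-8)
```

Note that the check centers the data first; without $\sum_{i=1}^n y_i = 0$, the dropped complete-sum terms would not vanish.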

By plugging the above results into (5), we get a new form of the unbiased estimator of $\Sigma^2$,
$$\widehat{\Sigma^2}_{(3)} = -\frac{1}{(n-2)(n-3)} \sum_{i=1}^n \big( y_i y_i^\top \big)^2 + \frac{n^2 - 3n + 1}{P_4} \Big( \sum_{i=1}^n y_i y_i^\top \Big)^2 + \frac{1}{P_4} \sum_{i,j=1}^n \big( y_i y_j^\top \big)^2 + \frac{1}{P_4} \sum_{i,j=1}^n y_i y_j^\top y_j y_i^\top. \tag{6}$$

Applying the matrix trace to both sides of (6), and noting that
$$\mathrm{tr}\Big\{ \sum_{i=1}^n \big( y_i y_i^\top \big)^2 \Big\} = \sum_{i=1}^n \big( y_i^\top y_i \big)^2, \qquad
\mathrm{tr}\Big\{ \Big( \sum_{i=1}^n y_i y_i^\top \Big)^2 \Big\} = \mathrm{tr}\Big( \sum_{i,j=1}^n y_i y_i^\top y_j y_j^\top \Big) = \sum_{i,j=1}^n \big( y_i^\top y_j \big)^2,$$
$$\sum_{i,j=1}^n \mathrm{tr}\big\{ \big( y_i y_j^\top \big)^2 \big\} = \sum_{i,j=1}^n y_j^\top y_i y_j^\top y_i = \sum_{i,j=1}^n \big( y_i^\top y_j \big)^2, \qquad
\sum_{i,j=1}^n \mathrm{tr}\big( y_i y_j^\top y_j y_i^\top \big) = \sum_{i,j=1}^n \big( y_i^\top y_i \big) \big( y_j^\top y_j \big) = \Big( \sum_{i=1}^n y_i^\top y_i \Big)^2,$$
we obtain the corresponding form of the unbiased estimator of $\mathrm{tr}(\Sigma^2)$,

$$\widehat{\mathrm{tr}(\Sigma^2)}_{(3)} = -\frac{1}{(n-2)(n-3)} \sum_{i=1}^n \big( y_i^\top y_i \big)^2 + \frac{1}{n(n-3)} \sum_{i,j=1}^n \big( y_i^\top y_j \big)^2 + \frac{1}{P_4} \Big( \sum_{i=1}^n y_i^\top y_i \Big)^2. \tag{7}$$
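To make (7) concrete, here is a minimal NumPy sketch (ours; the function name and the convention that the rows of X are the samples are assumptions, not the authors' supplementary code):

```python
import numpy as np

def tr_sigma2_hat3(X):
    """Unbiased estimator of tr(Sigma^2) from Eq. (7); rows of X are samples."""
    n = X.shape[0]
    Y = X - X.mean(axis=0)        # centered samples y_i = x_i - xbar
    G = Y @ Y.T                   # Gram matrix, G[i, j] = y_i' y_j
    d = np.diag(G)                # d[i] = y_i' y_i
    P4 = n * (n - 1) * (n - 2) * (n - 3)
    return (-np.sum(d**2) / ((n - 2) * (n - 3))
            + np.sum(G**2) / (n * (n - 3))
            + d.sum()**2 / P4)
```

Only the $n \times n$ Gram matrix is formed, so the cost is $O(n^2 p)$, in line with Table 1.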


Table 1
Computational complexity of the estimators.

Estimator: $\widehat{\Sigma^2}_{(1)}$, $\widehat{\Sigma^2}_{(2)}$, $\widehat{\Sigma^2}_{(3)}$, $\widehat{\mathrm{tr}(\Sigma^2)}_{(1)}$, $\widehat{\mathrm{tr}(\Sigma^2)}_{(2)}$, $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$
Complexity: $O(n^6 p^2)$, $O(n^4 p^2)$, $O(n^2 p^2)$, $O(n^6 p)$, $O(n^4 p)$, $O(n^2 p)$

Now define $Q = \frac{1}{n-1} \sum_{i=1}^n \big( y_i^\top y_i \big)^2$. By the definition of S,
$$\sum_{i,j=1}^n \big( y_i^\top y_j \big)^2 = (n-1)^2\, \mathrm{tr}\big( S^2 \big), \qquad \Big( \sum_{i=1}^n y_i^\top y_i \Big)^2 = (n-1)^2\, \mathrm{tr}^2(S),$$
so $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$ can be written as
$$\widehat{\mathrm{tr}(\Sigma^2)}_{(3)'} = -\frac{n-1}{(n-2)(n-3)}\, Q + \frac{(n-1)^2}{n(n-3)}\, \mathrm{tr}\big( S^2 \big) + \frac{n-1}{n(n-2)(n-3)}\, \mathrm{tr}^2(S). \tag{8}$$
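A matching sketch of (8) in the same setting (again ours, not the authors' code), which leans on built-in covariance and trace computations:

```python
import numpy as np

def tr_sigma2_hat3_prime(X):
    """Unbiased estimator of tr(Sigma^2) from Eq. (8), via Q, tr(S^2), tr^2(S)."""
    n = X.shape[0]
    Y = X - X.mean(axis=0)
    S = Y.T @ Y / (n - 1)                           # sample covariance matrix
    Q = np.sum(np.sum(Y**2, axis=1)**2) / (n - 1)   # Q = sum_i (y_i'y_i)^2 / (n - 1)
    tr_S2 = np.sum(S * S)                           # tr(S^2), since S is symmetric
    tr2_S = np.trace(S)**2                          # tr^2(S)
    return (-(n - 1) / ((n - 2) * (n - 3)) * Q
            + (n - 1)**2 / (n * (n - 3)) * tr_S2
            + (n - 1) / (n * (n - 2) * (n - 3)) * tr2_S)
```

Forming S costs $O(np^2)$, which is why (7) is preferable when $n \ll p$; see the discussion below.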

From previous discussions, we have the following result.

Fact 2. $\widehat{\Sigma^2}_{(2)} = \widehat{\Sigma^2}_{(3)}$, and $\widehat{\mathrm{tr}(\Sigma^2)}_{(2)} = \widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$.

The form $\widehat{\Sigma^2}_{(3)}$ given in (6) and the form $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$ given in (7) are very compact and also much more efficient than $\widehat{\Sigma^2}_{(1)}$ and $\widehat{\Sigma^2}_{(2)}$, and $\widehat{\mathrm{tr}(\Sigma^2)}_{(1)}$ and $\widehat{\mathrm{tr}(\Sigma^2)}_{(2)}$, respectively, because the summations involved in (6) and (7) are of order $O(n^2)$ at most.
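Facts 1 and 2 can also be verified numerically on a tiny data set by comparing the $O(n^6)$ hollow-sum form (2) with (7). A brute-force sketch (ours) follows; it reuses tr_sigma2_hat3 from the sketch after (7):

```python
import numpy as np
from itertools import permutations

def tr_sigma2_hat1(X):
    """Naive U-statistic form (2); O(n^6 p) work, feasible only for very small n."""
    n = X.shape[0]
    total = 0.0
    for s, t, i, j, k, l in permutations(range(n), 6):
        total += ((X[t] - X[l]) @ (X[s] - X[i])) * ((X[s] - X[j]) @ (X[t] - X[k]))
    P6 = n * (n - 1) * (n - 2) * (n - 3) * (n - 4) * (n - 5)
    return total / P6

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 4))
# Equal by Facts 1 and 2, up to floating-point error:
np.testing.assert_allclose(tr_sigma2_hat1(X), tr_sigma2_hat3(X), atol=1e-8)
```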

Katayama and Kano (2014) and Hu and Bai (2016) discussed how to optimize the computation of $\widehat{\mathrm{tr}(\Sigma^2)}_{(2)}$ by re-expressing the hollow sums involved in (4) using complete sums. However, neither of them noticed that using the form with centered samples, i.e., (5), can further simplify the computation of $\widehat{\mathrm{tr}(\Sigma^2)}_{(2)}$ to $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$. So their estimators have cumbersome forms in terms of complete sums and are computationally less efficient than $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$. The form $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)'}$ given in (8) was also derived in Yamada and Himeno (2015) under a general factor data model for i.i.d. samples, but using a very different approach, based on solving a linear system constructed by evaluating the expectations of $Q$, $\mathrm{tr}(S^2)$ and $\mathrm{tr}^2(S)$. Although they also obtained unbiased estimators of $\mathrm{tr}^2(\Sigma)$ and $E\{(x_1 - \mu)^\top (x_1 - \mu)\}^2$ at the same time, their method is not very intuitive and is not easy to extend. Besides, their approach requires more assumptions, e.g., that $E\{(x_1 - \mu)^\top (x_1 - \mu)\}^2$ exists. On the other hand, the method used in this paper is simpler, needs minimal assumptions, can be easily modified to derive distribution-free and unbiased estimators of $\mathrm{tr}^2(\Sigma)$ and $E\{(x_1 - \mu)^\top (x_1 - \mu)\}^2$, and can be easily generalized to estimate $\Sigma^k$ (and $\mathrm{tr}(\Sigma^k)$, $\mathrm{tr}^k(\Sigma)$), $k \geq 3$.

Finally, we briefly discuss some computational matters concerning these estimators in the high-dimensional scenario where p is much larger than n. It is easy to see that for a $p \times r$ matrix A and an $r \times q$ matrix B, the product AB requires $O(prq)$ calculations. So in formula (6) for $\widehat{\Sigma^2}_{(3)}$, the factors $(y_i y_i^\top)^2$ and $(y_i y_j^\top)^2$ should be calculated as $y_i (y_i^\top y_i) y_i^\top$ and $y_i (y_j^\top y_i) y_j^\top$, respectively, because the former two expressions require $O(p^3)$ calculations while the latter two need only $O(p^2)$. Similarly, it is better to calculate the factor $y_i y_j^\top y_j y_i^\top$ as $y_i (y_j^\top y_j) y_i^\top$, and to calculate $\big( \sum_{i=1}^n y_i y_i^\top \big)^2$ as $\sum_{i,j=1}^n y_i (y_i^\top y_j) y_j^\top$. For the estimation of $\mathrm{tr}(\Sigma^2)$, the formula $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$ given in (7) is more efficient than $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)'}$ given in (8) when $n \ll p$, because $\mathrm{tr}(S^2)$ and $\mathrm{tr}^2(S)$ in (8) have higher-order computational complexities when $n \ll p$ and contain redundant calculations. However, implementations of formula $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$ usually involve for loops, so formula $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)'}$, which can take advantage of built-in covariance matrix and matrix trace functions, may be faster than $\widehat{\mathrm{tr}(\Sigma^2)}_{(3)}$ even when $n \ll p$ in interpreted languages such as Matlab and Python.
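As an illustration of this grouping, here is a NumPy sketch (ours) of (6) that computes every scalar inner product before any outer product:

```python
import numpy as np

def sigma2_hat3(X):
    """Unbiased estimator of Sigma^2 from Eq. (6), inner products taken first."""
    n = X.shape[0]
    Y = X - X.mean(axis=0)
    G = Y @ Y.T                      # G[i, j] = y_i' y_j
    d = np.diag(G)                   # d[i] = y_i' y_i
    P4 = n * (n - 1) * (n - 2) * (n - 3)
    A = Y.T @ (Y * d[:, None])       # sum_i (y_i'y_i) y_i y_i' = sum_i (y_i y_i')^2
    B = Y.T @ G @ Y                  # sum_{i,j} (y_i'y_j) y_i y_j', which equals both
                                     # (sum_i y_i y_i')^2 and sum_{i,j} (y_i y_j')^2
    M = Y.T @ Y                      # sum_i y_i y_i'
    return (-A / ((n - 2) * (n - 3))
            + (n**2 - 3 * n + 1) / P4 * B
            + B / P4                 # the sum_{i,j} (y_i y_j')^2 term of (6)
            + d.sum() / P4 * M)      # sum_{i,j} y_i y_j' y_j y_i' = (sum_j y_j'y_j) M
```

The dominant costs are $O(n^2 p)$ for the Gram matrix and $O(np^2)$ for the $Y^\top(\cdot)Y$ products, well within the $O(n^2 p^2)$ bound of Table 1.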

Implementations of (7) and (8) for estimating $\mathrm{tr}(\Sigma^2)$ in various programming languages (C++, Fortran, R, Matlab/Octave, Python and Julia) are provided in the supplementary material and can be used for comparing their performance (see Appendix B). For easy comparison, the computational complexities of the different forms of the estimators are summarized in Table 1.

Acknowledgments

The authors thank the Editor-in-Chief, the Associate Editor and one reviewer for their comments and suggestions, which helped improve the presentation of this paper.

Appendix A. Technical Proofs

Proof of Fact 1. Simply expanding the kernel in (1), we have

$$\begin{aligned}
&\sum_{s,t,i,j,k,l}^{*} \big\{ (x_s - x_i)(x_s - x_j)^\top (x_t - x_k)(x_t - x_l)^\top \big\} \\
&\quad= \sum_{s,t,i,j,k,l}^{*} \big( x_s x_s^\top x_t x_t^\top - x_s x_s^\top x_t x_l^\top - x_s x_s^\top x_k x_t^\top + x_s x_s^\top x_k x_l^\top \big) \\
&\qquad+ \sum_{s,t,i,j,k,l}^{*} \big( {-x_s} x_j^\top x_t x_t^\top + x_s x_j^\top x_t x_l^\top + x_s x_j^\top x_k x_t^\top - x_s x_j^\top x_k x_l^\top \big) \\
&\qquad+ \sum_{s,t,i,j,k,l}^{*} \big( {-x_i} x_s^\top x_t x_t^\top + x_i x_s^\top x_t x_l^\top + x_i x_s^\top x_k x_t^\top - x_i x_s^\top x_k x_l^\top \big) \\
&\qquad+ \sum_{s,t,i,j,k,l}^{*} \big( x_i x_j^\top x_t x_t^\top - x_i x_j^\top x_t x_l^\top - x_i x_j^\top x_k x_t^\top + x_i x_j^\top x_k x_l^\top \big) \\
&\quad= \sum_{s,t,i,j,k,l}^{*} x_s x_s^\top x_t x_t^\top - \sum_{s,t,i,j,k,l}^{*} x_s x_s^\top x_t x_l^\top - \sum_{s,t,i,j,k,l}^{*} x_s x_j^\top x_t x_t^\top + \sum_{s,t,i,j,k,l}^{*} x_s x_j^\top x_t x_l^\top \\
&\quad= \frac{P_6}{P_2} \sum_{s,t}^{*} x_s x_s^\top x_t x_t^\top + \frac{P_6}{P_4} \sum_{s,t,j,l}^{*} x_s x_j^\top x_t x_l^\top - \frac{P_6}{P_3} \sum_{s,t,l}^{*} x_s x_s^\top x_t x_l^\top - \frac{P_6}{P_3} \sum_{s,t,j}^{*} x_s x_j^\top x_t x_t^\top,
\end{aligned}$$

where the second equality holds because the six indices are exchangeable dummies under the hollow sum, so each term collapses to a representative with the same pattern of repeated indices, and the third because a hollow sum over six distinct indices of a function of only m of them equals $P_6 / P_m$ times the hollow sum over those m indices.

Dividing both sides by $P_6$ and renaming the indices, it is then easy to see the equivalence of (1) and (3). □

The following lemma gives a general result for re-expressing hollow sums as complete sums, up to order 4.

Lemma 1. For any function $f(i,j)$ of indices $i, j \in \{1, \ldots, n\}$, any function $f(i,j,k)$ of indices $i, j, k \in \{1, \ldots, n\}$, and any function $f(i,j,k,l)$ of indices $i, j, k, l \in \{1, \ldots, n\}$, we have

$$\sum_{i,j}^{*} f(i,j) = \sum_{i,j=1}^{n} f(i,j) - \sum_{i=1}^{n} f(i,i), \tag{A.1}$$

$$\sum_{i,j,k}^{*} f(i,j,k) = \sum_{i,j,k=1}^{n} f(i,j,k) - \sum_{i,k=1}^{n} f(i,i,k) - \sum_{i,j=1}^{n} f(i,j,i) - \sum_{i,j=1}^{n} f(i,j,j) + 2 \sum_{i=1}^{n} f(i,i,i), \tag{A.2}$$

$$\begin{aligned}
\sum_{i,j,k,l}^{*} f(i,j,k,l) ={}& \sum_{i,j,k,l=1}^{n} f(i,j,k,l) - 6 \sum_{i=1}^{n} f(i,i,i,i) \\
&+ \sum_{i,k=1}^{n} f(i,i,k,k) + \sum_{i,j=1}^{n} f(i,j,i,j) + \sum_{i,j=1}^{n} f(i,j,j,i) \\
&+ 2 \Big\{ \sum_{i,l=1}^{n} f(i,i,i,l) + \sum_{i,k=1}^{n} f(i,i,k,i) + \sum_{i,j=1}^{n} f(i,j,i,i) + \sum_{i,j=1}^{n} f(i,j,j,j) \Big\} \\
&- \sum_{i,k,l=1}^{n} f(i,i,k,l) - \sum_{i,j,l=1}^{n} f(i,j,i,l) - \sum_{i,j,k=1}^{n} f(i,j,k,i) \\
&- \sum_{i,j,l=1}^{n} f(i,j,j,l) - \sum_{i,j,k=1}^{n} f(i,j,k,j) - \sum_{i,j,k=1}^{n} f(i,j,k,k).
\end{aligned} \tag{A.3}$$

Proof. For any two indices i and j, define the following indicator functions,
$$1_{\{i=j\}} = \begin{cases} 1, & i = j; \\ 0, & i \neq j, \end{cases} \qquad 1_{\{i \neq j\}} = \begin{cases} 1, & i \neq j; \\ 0, & i = j. \end{cases}$$
Clearly, $1_{\{i=j\}} + 1_{\{i \neq j\}} = 1$. Then
$$\sum_{i,j}^{*} f(i,j) = \sum_{i,j=1}^{n} f(i,j)\, 1_{\{i \neq j\}} = \sum_{i,j=1}^{n} f(i,j) \big( 1 - 1_{\{i=j\}} \big) = \sum_{i,j=1}^{n} f(i,j) - \sum_{i=1}^{n} f(i,i),$$
which proves Eq. (A.1). For the second one, (A.2),
$$\sum_{i,j,k}^{*} f(i,j,k) = \sum_{i,j,k=1}^{n} f(i,j,k)\, 1_{\{i \neq j\}} 1_{\{i \neq k\}} 1_{\{j \neq k\}},$$


where
$$\begin{aligned}
1_{\{i \neq j\}} 1_{\{i \neq k\}} 1_{\{j \neq k\}} &= \big( 1 - 1_{\{i=j\}} \big) \big( 1 - 1_{\{i=k\}} \big) \big( 1 - 1_{\{j=k\}} \big) \\
&= \big( 1 - 1_{\{i=k\}} - 1_{\{i=j\}} + 1_{\{i=j\}} 1_{\{i=k\}} \big) \big( 1 - 1_{\{j=k\}} \big) \\
&= 1 - 1_{\{i=k\}} - 1_{\{i=j\}} + 1_{\{i=j\}} 1_{\{i=k\}} - 1_{\{j=k\}} + 1_{\{i=k\}} 1_{\{j=k\}} + 1_{\{i=j\}} 1_{\{j=k\}} - 1_{\{i=j\}} 1_{\{i=k\}} 1_{\{j=k\}} \\
&= 1 - 1_{\{i=k\}} - 1_{\{i=j\}} - 1_{\{j=k\}} + 2 \times 1_{\{i=j\}} 1_{\{i=k\}} 1_{\{j=k\}},
\end{aligned} \tag{A.4}$$
so
$$\sum_{i,j,k}^{*} f(i,j,k) = \sum_{i,j,k=1}^{n} f(i,j,k) \big( 1 - 1_{\{i=k\}} - 1_{\{i=j\}} - 1_{\{j=k\}} + 2 \times 1_{\{i=j\}} 1_{\{i=k\}} 1_{\{j=k\}} \big),$$

and (A.2) is proved. For (A.3), similarly, we have
$$\sum_{i,j,k,l}^{*} f(i,j,k,l) = \sum_{i,j,k,l=1}^{n} f(i,j,k,l)\, 1_{\{i \neq j\}} 1_{\{i \neq k\}} 1_{\{i \neq l\}} 1_{\{j \neq k\}} 1_{\{j \neq l\}} 1_{\{k \neq l\}} = \sum_{i,j,k,l=1}^{n} f(i,j,k,l) \big( 1_{\{i \neq j\}} 1_{\{i \neq k\}} 1_{\{j \neq k\}} \big) \big( 1_{\{i \neq l\}} 1_{\{j \neq l\}} 1_{\{k \neq l\}} \big).$$
Based on (A.4) and the result
$$\begin{aligned}
1_{\{i \neq l\}} 1_{\{j \neq l\}} 1_{\{k \neq l\}} &= \big( 1 - 1_{\{i=l\}} \big) \big( 1 - 1_{\{j=l\}} \big) \big( 1 - 1_{\{k=l\}} \big) \\
&= \big( 1 - 1_{\{j=l\}} - 1_{\{i=l\}} + 1_{\{i=l\}} 1_{\{j=l\}} \big) \big( 1 - 1_{\{k=l\}} \big) \\
&= 1 - 1_{\{j=l\}} - 1_{\{i=l\}} + 1_{\{i=l\}} 1_{\{j=l\}} - 1_{\{k=l\}} + 1_{\{j=l\}} 1_{\{k=l\}} + 1_{\{i=l\}} 1_{\{k=l\}} - 1_{\{i=l\}} 1_{\{j=l\}} 1_{\{k=l\}},
\end{aligned}$$
after some simple calculations we have
$$\begin{aligned}
\big( 1_{\{i \neq j\}} 1_{\{i \neq k\}} 1_{\{j \neq k\}} \big) \big( 1_{\{i \neq l\}} 1_{\{j \neq l\}} 1_{\{k \neq l\}} \big)
={}& 1 - 6\, 1_{\{i=j\}} 1_{\{i=k\}} 1_{\{i=l\}} 1_{\{j=k\}} 1_{\{j=l\}} 1_{\{k=l\}} \\
&+ \big( 1_{\{i=j\}} 1_{\{k=l\}} + 1_{\{i=k\}} 1_{\{j=l\}} + 1_{\{j=k\}} 1_{\{i=l\}} \big) \\
&+ 2 \big( 1_{\{i=j\}} 1_{\{j=l\}} + 1_{\{j=k\}} 1_{\{k=l\}} + 1_{\{i=k\}} 1_{\{k=l\}} + 1_{\{i=j\}} 1_{\{j=k\}} \big) \\
&- 1_{\{i=j\}} - 1_{\{i=k\}} - 1_{\{i=l\}} - 1_{\{j=k\}} - 1_{\{j=l\}} - 1_{\{k=l\}},
\end{aligned}$$

and (A.3) is shown. □

Appendix B. Supplementary Material

R code for numerically verifying the equivalence of (1), (3) and (6), and implementations of (7) and (8) in various programming languages (C++, Fortran, R, Matlab/Octave, Python and Julia), are provided in the supplementary material. Supplementary material related to this article can be found online at http://dx.doi.org/10.1016/j.spl.2017.05.014.

References

Ahmad, M.R., Werner, C., Brunner, E., 2008. Analysis of high-dimensional repeated measures designs: The one sample case. Comput. Statist. Data Anal. 53 (2), 416–427.
Bai, Z.D., Saranadasa, H., 1996. Effect of high dimension: by an example of a two sample problem. Statist. Sinica 6 (2), 311–329.
Chen, S.X., Qin, Y.-L., 2010. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 (2), 808–835.
Chen, S.X., Zhang, L.-X., Zhong, P.-S., 2010. Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 105 (490), 810–819.
Hu, J., Bai, Z., 2016. A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices. Sci. China Math. 59 (12), 2281–2300.
Hu, J., Bai, Z., Wang, C., Wang, W., 2017. On testing the equality of high dimensional mean vectors with unequal covariance matrices. Ann. Inst. Statist. Math. 69 (2), 365–387.
Katayama, S., Kano, Y., 2014. A new test on high-dimensional mean vector without any assumption on population covariance matrix. Commun. Stat. Theory Methods 43 (24), 5290–5304.
Srivastava, M.S., Yanagihara, H., 2010. Testing the equality of several covariance matrices with fewer observations than the dimension. J. Multivariate Anal. 101 (6), 1319–1329.
Yamada, T., Himeno, T., 2015. Testing homogeneity of mean vectors under heteroscedasticity in high-dimension. J. Multivariate Anal. 139, 7–27.
Zhang, J.-T., Xu, J., 2009. On the k-sample Behrens–Fisher problem for high-dimensional data. Sci. China Ser. A Math. 52 (6), 1285–1304.