Moderate deviation principles for classical likelihood ratio tests of high-dimensional normal distributions


Accepted Manuscript

Authors: Jiang Hui, Wang Shaochen
PII: S0047-259X(17)30081-7
DOI: http://dx.doi.org/10.1016/j.jmva.2017.02.004
Reference: YJMVA 4221
To appear in: Journal of Multivariate Analysis
Received date: 4 December 2015

Please cite this article as: J. Hui, W. Shaochen, Moderate deviation principles for classical likelihood ratio tests of high-dimensional normal distributions, Journal of Multivariate Analysis (2017), http://dx.doi.org/10.1016/j.jmva.2017.02.004

This is a PDF file of an unedited manuscript that has been accepted for publication. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. During the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


JIANG HUI$^1$ AND WANG SHAOCHEN$^2$

$^1$ Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing, P.R. China
$^2$ School of Mathematics, South China University of Technology, Guangzhou, P.R. China

Abstract. Let $x_1, \dots, x_n$ be a random sample from a Gaussian random vector of dimension $p < n$ with mean $\mu$ and covariance matrix $\Sigma$. Based on this sample, we consider the moderate deviation principle of the modified likelihood ratio test (LRT) for the testing problem $H_0 : \Sigma = \lambda I_p$ versus $H_1 : \Sigma \neq \lambda I_p$ in the high-dimensional setting, where $\lambda$ is some unknown constant [T.F. Jiang, F. Yang, Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions, Ann. Statist. 41 (2013) 2029–2074]. We assume that both the dimension $p$ and the sample size $n$ go to infinity in such a way that $p/n \to y \in (0, 1]$. Under $H_0$, our results give the exponential convergence rate of the LRT statistic to the corresponding asymptotic distribution.

1. Introduction

Testing the mean and the covariance of a Normal distribution is an important topic in multivariate analysis. In the past few decades, due to technical limitations, most theoretical results about the asymptotic behavior of test statistics have been obtained in the fixed-dimensional case only. See, for instance, Anderson [1], Eaton [8] and Muirhead [22]. However, for many modern datasets such as financial, consumer, manufacturing and multimedia data, high-dimensional settings are now common. More examples can be found in Johnstone [17] and the references in [16].

We first review some known facts about tests for the covariance matrix of a high-dimensional random vector. Let $x_1, \dots, x_n$ be independent and identically distributed (iid) observations from a random vector of dimension $p < n$ with mean $\mu$ and covariance matrix $\Sigma$. We consider the hypothesis testing problem
\[ H_0 : \Sigma = \Sigma_0 \quad \text{vs.} \quad H_1 : \Sigma \neq \Sigma_0. \tag{1} \]

2010 Mathematics Subject Classification. 60F10; 62H15. Key words and phrases. High-dimensional Normal distribution, likelihood ratio tests, moderate deviations.


If $\Sigma_0 = I_p$, where $p$ is fixed and $x_1$ is normally distributed, the asymptotic behavior of the likelihood ratio test (LRT) for (1) has been well studied. See, for instance, Anderson [1], Eaton [8] and Muirhead [22]; see also below for more details. When the dimension $p$ tends to infinity, Bai et al. [3] corrected the LRT under the large-dimensional limiting scheme $p/n \to y \in [0, 1)$. Their proofs depend on the Central Limit Theorem for the linear spectral statistics of the sample covariance matrix. Jiang et al. [13] extended it to the case $p/n \to y \in (0, 1]$ and $p < n$. However, their proofs were based heavily on an assumption on the population distribution. More recently, Jiang [12] proposed Rao's score test for (1). This requires only the finiteness of the 4th moment of the population distribution and $p/n \to y \in [0, \infty)$. Her proofs also depend on the Central Limit Theorem for the linear spectral statistics of the sample covariance matrix. Finally, some other tests for covariance matrices can be found in Cai and Ma [5], Chen et al. [6], Wang and Yao [24], and others.

In this paper, we mainly study the moderate deviation principle (MDP) for the test statistic in the setting of Jiang and Yang [16]. More precisely, we consider the hypothesis testing problem
\[ H_0 : \Sigma = \lambda I_p \quad \text{vs.} \quad H_1 : \Sigma \neq \lambda I_p \tag{2} \]

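for the Normal distribution $N_p(\mu, \Sigma)$ with $\lambda$ unspecified and $p/n \to y \in (0, 1]$. Let $x_1, \dots, x_n$ be iid $\mathbb{R}^p$-valued random variables with Normal distribution $N_p(\mu, \Sigma)$. Let
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top. \tag{3} \]
Mauchly [21] introduced the likelihood ratio test statistic of (2) as follows:
\[ V_n = |S| \times \left\{ \frac{\operatorname{tr}(S)}{p} \right\}^{-p}, \tag{4} \]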

where $\operatorname{tr}(\cdot)$ is the trace operator and $|A|$ denotes the determinant of a square matrix $A$. When $p \geq n$, $S$ is not of full rank with probability 1; see also Remark 3, Section 2. This indicates that the likelihood ratio test of (2) exists only when $p \leq n - 1$. In what follows, we will not state this assumption again. For the case $p \geq n$, see Chen et al. [6], Ledoit and Wolf [20], and others.

We now consider the case when both $n$ and $p$ are large and $p \leq n - 1$. For the sake of clarity when taking limits, let $p = p_n$, i.e., $p$ depends on $n$ (if there is no confusion, we will frequently drop the subscript $n$). For all $n \geq 3$, we define
\[ \mu_n = -p - \left( n - p - \frac{3}{2} \right) \ln\left( 1 - \frac{p}{n-1} \right) \tag{5} \]
and
\[ \sigma_n^2 = -2 \left\{ \frac{p}{n-1} + \ln\left( 1 - \frac{p}{n-1} \right) \right\}. \tag{6} \]
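The quantities in (3)–(6) are straightforward to evaluate numerically. The following is a minimal sketch, assuming NumPy is available; the function names `log_Vn`, `mu_n` and `sigma2_n` are illustrative choices and do not come from the paper. It computes $\ln V_n$ from an $n \times p$ data matrix and standardizes it with $\mu_n$ and $\sigma_n$.

```python
import numpy as np


def log_Vn(X):
    """ln V_n of (4) for an (n, p) data matrix X, with S as in (3) (divisor n)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        raise ValueError("S is singular; the LRT requires p <= n - 1")
    return logdet - p * np.log(np.trace(S) / p)


def mu_n(n, p):
    """Centering constant (5)."""
    return -p - (n - p - 1.5) * np.log1p(-p / (n - 1))


def sigma2_n(n, p):
    """Variance constant (6)."""
    r = p / (n - 1)
    return -2.0 * (r + np.log1p(-r))


# Illustrative data drawn under H0 (Sigma = I_p); the seed is arbitrary.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
z_stat = (log_Vn(X) - mu_n(n, p)) / np.sqrt(sigma2_n(n, p))
print(z_stat)
```

Working with the log-determinant (`slogdet`) rather than the determinant itself avoids underflow when $p$ is large. Note that $\ln V_n \leq 0$ always holds by the arithmetic-geometric mean inequality applied to the eigenvalues of $S$.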


We assume that $\lim_{n\to\infty} p/n = y \in (0, 1]$. Then, under $H_0$ in (2), Jiang and Yang [16] proved that $(\ln V_n - \mu_n)/\sigma_n$ converges in distribution to $N(0, 1)$ as $n \to \infty$. The performance of the test statistic $V_n$ can then be measured by a local measure [18], i.e., for any $x > 0$,
\[ \lim_{a\to\infty} \lim_{n\to\infty} \frac{1}{a^2} \ln \Pr\left( \frac{|\ln V_n - \mu_n|}{\sigma_n} \geq a x \right) = -\frac{x^2}{2}. \]
A natural question is whether one has
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \ln \Pr\left( \frac{|\ln V_n - \mu_n|}{\sigma_n} \geq a_n x \right) = -\frac{x^2}{2} \tag{7} \]
for any sequence $(a_n)$ with $a_n \to \infty$. We call (7) the moderate deviation estimation or, more generally, the moderate deviation principle (MDP). A standard reference for MDP theory is the book by Dembo and Zeitouni [7]. We note that (7) extends the conventional local asymptotic analysis for $\ln V_n$, which focuses on $\sigma_n$-neighborhoods, i.e., $\Pr(|\ln V_n - \mu_n| \geq \sigma_n x)$ (central limit theory), to the moderate deviation region, which focuses on $a_n \sigma_n$-neighborhoods, i.e., $\Pr(|\ln V_n - \mu_n| \geq a_n \sigma_n x)$. See Inglot and Kallenberg [11], Kallenberg [19], and Otsu [23].

In hypothesis testing problems, it is important to control the type I error, i.e., the probability
\[ \Pr\left( \frac{|\ln V_n - \mu_n|}{a_n \sigma_n} \geq x \right), \quad x > 0. \]
As pointed out by Gao and Zhao [10], if the MDP holds for the test statistic and if we use the test statistic to construct the rejection (or acceptance) region, then the probabilities related to both type I and type II errors tend to zero exponentially (see Remark 2). This decay rate can be used to give the minimum sample sizes for given type I and type II errors. Thus, from the viewpoint of the statistical cost of experiments, the study of moderate deviation estimation is quite meaningful.

In this paper, in addition to the sphericity hypothesis test model (2), we consider the MDP problem for several other classical likelihood ratio tests for mean and covariance matrices of high-dimensional Normal distributions. Examples include the LRT statistics for testing that several components of a vector with distribution $N_p(\mu, \Sigma)$ are independent; the hypothesis testing problem with $H_0 : N_p(\mu_1, \Sigma_1) = \cdots = N_p(\mu_k, \Sigma_k)$; the test of the equality of the covariance matrices from several Normal distributions; and others. Since the proofs are quite similar, we state these results in Appendix A. We refer to Jiang and Yang [16] for a full study and a comparison with existing results, especially the powers of these new statistics.

The rest of this paper is organized as follows. In Section 2, we state our main results with some remarks. Section 3 is devoted to a simulation study of our MDP results. The proofs are postponed to Section 4. Moreover, we include the MDP for several other classical likelihood ratio tests for mean and covariance matrices of high-dimensional Normal distributions in Appendix A.


2. MDP for testing covariance matrices of normal distributions proportional to the identity matrix

As described in the introduction, we consider the testing problem
\[ H_0 : \Sigma = \lambda I_p \quad \text{vs.} \quad H_1 : \Sigma \neq \lambda I_p \]
for the Normal distribution $N_p(\mu, \Sigma)$ with $\lambda$ unspecified and $p/n \to y \in (0, 1]$. Let $V_n$ and $\mu_n$, $\sigma_n$ be defined as in (4)–(6), respectively. Our main result is the following.

THEOREM 2.1. (i) If $y = 1$, then $(\ln V_n - \mu_n)/(a_n \sigma_n)$ satisfies the moderate deviation principle with speed $a_n^2$ and good rate function $I(x) = x^2/2$ for all $x \in \mathbb{R}$, where $\{a_n : n \geq 1\}$ is a sequence of positive numbers satisfying
\[ \lim_{n\to\infty} a_n = \infty \quad \text{and} \quad \limsup_{n\to\infty} \frac{a_n}{\sigma_n} = 0. \]
(ii) If $y \in (0, 1)$, then $(\ln V_n - \mu_n)/(a_n \sigma_n)$ satisfies the moderate deviation principle with speed $a_n^2$ and good rate function $I(x) = x^2/2$ for all $x \in \mathbb{R}$, where $\{a_n : n \geq 1\}$ is any sequence of positive numbers such that $a_n \to \infty$ and $\lim_{n\to\infty} a_n/n = 0$.

In both cases, for any fixed $x \geq 0$, we have that
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \ln \Pr\left( \frac{|\ln V_n - \mu_n|}{a_n \sigma_n} \geq x \right) = -\frac{x^2}{2}. \]
Several remarks are in order.

REMARK 1. The different assumptions on the scale $a_n$ in Theorem 2.1 are mainly due to the fact that if $y = 1$, then $\sigma_n$ tends to infinity, whereas for $y \in (0, 1)$, $\sigma_n$ is bounded. In the proof of the above theorem, we will see that, in case (i), we have $\sigma_n^2 \sim -2 \ln(1 - p/n)$. Therefore, we can choose $a_n = \sqrt{\sigma_n}$. In contrast, $a_n$ can be chosen in case (ii) to be, e.g., $\sqrt{n}$.

REMARK 2. Suppose that the rejection region for testing the null hypothesis $H_0$ against $H_1$ has the form $\{|(\ln V_n - \mu_n)/\sigma_n| \geq c a_n\}$, where $c$ is a constant and $a_n$ is the scale in the above theorem. Then the probability $\alpha_n$ of type I error is
\[ \alpha_n = \Pr\nolimits_{H_0}\{|(\ln V_n - \mu_n)/\sigma_n| \geq c a_n\}. \]
Using Theorem 2.1, we know that for any $c \geq 0$,
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \ln \Pr\nolimits_{H_0}\{|(\ln V_n - \mu_n)/\sigma_n| \geq c a_n\} = -\frac{c^2}{2}. \]
This shows that $\alpha_n \approx \exp(-c^2 a_n^2/2)$, i.e., $\alpha_n$ decays to zero exponentially. Since the asymptotic distribution of the test statistic is obtained under the null hypothesis, we can only get the type I error probability $\alpha_n$. If we derive the asymptotic distribution of the test statistic under the alternative hypothesis, the type II error probability $\beta_n$ can be derived as above.
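To illustrate how the decay rate in Remark 2 translates into a sample size, here is a small sketch. It treats the logarithmic approximation $\alpha_n \approx \exp(-c^2 a_n^2/2)$ as if it were an equality, which is only a heuristic, and it assumes the scale $a_n = \sqrt{n}$ from case (ii) of Theorem 2.1; the function names are hypothetical.

```python
import math


def approx_type1_error(c, a_n):
    """Heuristic type I error exp(-c^2 a_n^2 / 2) suggested by the MDP."""
    return math.exp(-0.5 * c ** 2 * a_n ** 2)


def min_sample_size(alpha, c):
    """Smallest n with exp(-c^2 n / 2) <= alpha, assuming a_n = sqrt(n)."""
    return math.ceil(2.0 * math.log(1.0 / alpha) / c ** 2)


n_min = min_sample_size(alpha=0.05, c=0.5)
print(n_min, approx_type1_error(0.5, math.sqrt(n_min)))
```

Because the MDP only pins down the exponential order of $\alpha_n$, the resulting $n$ should be read as an order-of-magnitude guide rather than an exact requirement.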


REMARK 3. Now consider the simple null hypothesis $H_0 : \Sigma = I_p$. Under $H_0$, one has that $\ln V_n = \ln|S| - p \ln \operatorname{tr}(S) + p \ln p$. Moreover, $nS$ has the same distribution as $Z^\top Z$, where $Z = (z_{ij})_{(n-1)\times p}$ and the $z_{ij}$'s are iid with distribution $N(0, 1)$. Thus,
\[ p \ln \operatorname{tr}(S) = p \ln \operatorname{tr}(nS) - p \ln n \overset{d}{=} p \ln \sum_{i,j} z_{ij}^2 - p \ln n, \]
where $\overset{d}{=}$ refers to equality in distribution. Consequently,
\[ p \ln p - p \ln \operatorname{tr}(S) \overset{d}{=} -p \ln \frac{\sum_{i,j} z_{ij}^2}{np}. \]
Using the fact that the $z_{ij}^2$ are iid chi-square random variables with one degree of freedom, it is easy to check that $\sum_{i,j} z_{ij}^2/(np)$ satisfies the large deviation principle [7] with speed $np$ and good rate function $I(x) = (x - 1 - \ln x)/2$ for $x > 0$, and $I(x) = \infty$ otherwise. This shows that
\[ \Pr\{|p \ln p - p \ln \operatorname{tr}(S)| \geq a_n \sigma_n x\} = \Pr\left( \frac{\sum_{i,j} z_{ij}^2}{np} \geq e^{a_n \sigma_n x/p} \right) \approx \exp\left\{ -\frac{np}{2} \left( \lambda^{-1} e^{a_n \sigma_n x/p} - a_n \sigma_n x/p + \ln \lambda - 1 \right) \right\}. \]
Moreover, if we assume that $a_n = O(n^{1-\delta})$ for some $\delta \in (0, 1)$, we can easily check that $\limsup_{n\to\infty} a_n \sigma_n/p = 0$. Then the Taylor expansion gives
\[ \frac{1}{a_n^2} \ln \Pr\{|p \ln p - p \ln \operatorname{tr}(S)| \geq a_n \sigma_n x\} \approx -\frac{n \sigma_n^2}{4p} x^2. \]
We note that, by the definition of $\sigma_n^2$, regardless of the case ($p/n \in [0, 1]$), $\sigma_n^2$ is at most of order $-\ln(1 - p/n)$. This implies that if $\lim_{n\to\infty} p/n \in [0, 1)$, then $\limsup_{n\to\infty} |n\sigma_n^2/(4p)|$ is bounded. In contrast, if $\lim_{n\to\infty} p/n = 1$, then $\limsup_{n\to\infty} |n\sigma_n^2/(4p)| = \infty$. That is, the term $|p \ln p - p \ln \operatorname{tr}(S)|$ cannot be dropped in the sense of MDP when $\lim_{n\to\infty} p/n \in (0, 1)$. In contrast, if $\lim_{n\to\infty} p/n = 1$, then $|p \ln p - p \ln \operatorname{tr}(S)|$ is negligible. Consequently, when $\lim_{n\to\infty} p/n = 1$, the LRT statistic $\ln V_n$ enjoys the same MDP as $\ln|S|$, as obtained by Jiang and Wang [14]. However, if $\lim_{n\to\infty} p/n \in (0, 1)$, then the term $|p \ln p - p \ln \operatorname{tr}(S)|$ affects the variance function essentially.

REMARK 4 (Fixed dimensional case). Let $V_n$ be defined as in (4). When $p$ is fixed, Mauchly [21] showed that under $H_0$, $-(n - 1)\rho \ln V_n$ converges weakly to the chi-square distribution with $f = (p - 1)(p + 2)/2$ degrees of freedom, where $\rho = \rho_n \to 1$ is used to improve the convergence rate. Obviously, when $p$ varies, this test statistic cannot be used. However, we can still consider the MDP for $-(n - 1)\rho \ln V_n$ when $p$ is fixed.
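The Taylor step in Remark 3 can be checked numerically. The sketch below assumes $\lambda = 1$ (the simple null $\Sigma = I_p$ of that remark) and writes $t = a_n \sigma_n x / p$, so that the exponent $np \, I(e^t)$ with $I(x) = (x - 1 - \ln x)/2$ should be close to $np\, t^2/4 = n a_n^2 \sigma_n^2 x^2/(4p)$ for small $t$. The function names are illustrative.

```python
import math


def rate(x):
    """Rate function I(x) = (x - 1 - ln x) / 2 for x > 0, and +infinity otherwise."""
    return 0.5 * (x - 1.0 - math.log(x)) if x > 0 else math.inf


def exponent_exact(n, p, t):
    """n p I(e^t), the exact large-deviation exponent (lambda = 1 assumed)."""
    return n * p * rate(math.exp(t))


def exponent_taylor(n, p, t):
    """Second-order Taylor approximation n p t^2 / 4."""
    return n * p * t ** 2 / 4.0


for t in (0.1, 0.01, 0.001):
    print(t, exponent_exact(100, 50, t) / exponent_taylor(100, 50, t))
```

The ratio approaches 1 as $t \to 0$, which is the regime $a_n \sigma_n / p \to 0$ used in the remark.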


3. Numerical simulations of MDP results

In this section, we give a numerical verification of
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \ln \Pr\{|(\ln V_n - \mu_n)/(\sigma_n a_n)| \geq z\} = -\frac{z^2}{2} \]
for any fixed $z \geq 0$, where $V_n$ and $\{a_n : n \geq 1\}$ are the test statistic and the scale in Section 2, respectively. We consider the following alternative (which was also considered by Jiang and Yang [16] in the case $\theta = 1.69$):
\[ H_0 : \Sigma = I_p \quad \text{vs.} \quad H_1 : \Sigma = \operatorname{diag}(\underbrace{\theta, \theta, \dots, \theta}_{[p/2]}, 1, 1, \dots, 1), \tag{8} \]
where the diagonal matrix under $H_1$ has $p$ diagonal entries in total.

Let $(x_{10}, \dots, x_{n0})$ and $(x_{11}, \dots, x_{n1})$ be two samples under the assumptions $H_0$ and $H_1$, respectively. We compute $V_{n0}$, $V_{n1}$ according to (4). We repeat the above two steps $N = 1000$ times and obtain samples $V_{ni}^{(1)}, \dots, V_{ni}^{(N)}$ for $i = 0, 1$ under the assumptions $H_0$ and $H_1$, respectively. Moreover, we choose $a_n = n^{1/2}$ and define
\[ P(z) = \frac{1}{N} \#\{|\ln V_{n0}^{(k)} - \mu_n| \geq a_n \sigma_n z : k = 1, \dots, N\}, \]
\[ L(z) = \frac{1}{N} \#\{|\ln V_{n1}^{(k)} - \mu_n| \geq a_n \sigma_n z : k = 1, \dots, N\} \]
and $Q(z) = \exp(-z^2 a_n^2/2)$ for all $z \geq 0$. A comparison among $P(z)$, $L(z)$ and $Q(z)$ for different $\theta$'s and $p$'s is depicted in Figure 1, which illustrates at least two points, namely

(i) The black solid line and the blue dashed line are very close, even in a small region to the right of 0. This confirms the MDP result in Theorem 2.1.
(ii) The red dotted line in each panel shows that the moderate deviation tail estimates are very sensitive to the parameter $\theta$. Consequently, if the red dotted line deviates markedly from the black solid line, we should reject $H_0$.

Figure 2 also illustrates at least two points, namely

(i) The black solid line and the blue dashed line are not very close in the region to the right of 0. This suggests that the MDP result in Theorem 2.1 may not hold in the case $p/n \to 0$, possibly because the constants $\mu_n$ and $\sigma_n$ are misspecified in that regime.
(ii) Even though the MDP result may not hold in this form (in fact, the MDP still holds for the LRT, but the mean and variance of the LRT statistic must be corrected), the red dotted line in each panel still shows that the moderate deviation tail estimates are very sensitive to the parameter $\theta$. Consequently, the moderate deviation tail estimates are very robust for the testing problem (8) over the entire range $p/n \to y \in [0, 1]$.
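The simulation procedure of this section can be sketched as follows. This is a minimal reimplementation under stated assumptions, not the authors' code: it assumes NumPy, uses zero means under both hypotheses, takes $[p/2]$ to be the integer part of $p/2$, and uses fewer replications than the $N = 1000$ of the paper to keep the run short. The function names are hypothetical.

```python
import numpy as np


def log_Vn(X):
    """ln V_n of (4) for an (n, p) data matrix X, with S as in (3)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    _, logdet = np.linalg.slogdet(S)
    return logdet - p * np.log(np.trace(S) / p)


def mu_sigma(n, p):
    """Centering mu_n of (5) and scale sigma_n of (6)."""
    r = p / (n - 1)
    mu = -p - (n - p - 1.5) * np.log1p(-r)
    sigma = np.sqrt(-2.0 * (r + np.log1p(-r)))
    return mu, sigma


def tail_curves(n, p, theta, z_grid, N=1000, seed=0):
    """Empirical tails P (under H0), L (under H1) and the MDP curve Q, a_n = sqrt(n)."""
    rng = np.random.default_rng(seed)
    z_grid = np.asarray(z_grid, dtype=float)
    sd1 = np.ones(p)
    sd1[: p // 2] = np.sqrt(theta)  # first [p/2] variances equal theta, as in (8)
    v0 = np.array([log_Vn(rng.standard_normal((n, p))) for _ in range(N)])
    v1 = np.array([log_Vn(rng.standard_normal((n, p)) * sd1) for _ in range(N)])
    mu, sigma = mu_sigma(n, p)
    a_n = np.sqrt(n)
    thr = a_n * sigma * z_grid
    P = (np.abs(v0 - mu)[:, None] >= thr).mean(axis=0)
    L = (np.abs(v1 - mu)[:, None] >= thr).mean(axis=0)
    Q = np.exp(-0.5 * (a_n * z_grid) ** 2)
    return P, L, Q


z = np.linspace(0.0, 1.0, 11)
P, L, Q = tail_curves(n=100, p=90, theta=1.69, z_grid=z, N=200)
```

With a finite number of replications, the empirical tails cannot resolve probabilities below $1/N$, so agreement between $P(z)$ and $Q(z)$ can only be judged for moderate $z$.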

Figure 1. Plots of $Q(z)$, $P(z)$ and $L(z)$ against $z$. Model parameters with $n = 100$, $p = 90$ and $\theta$ equal to 1.1, 1.69 and 5 in the left, middle and right panel, respectively.

Figure 2. Plots of $Q(z)$, $P(z)$ and $L(z)$ against $z$. Model parameters with $n = 100$, $p = 5$ and $\theta$ equal to 1.1, 1.69 and 5 in the left, middle and right panel, respectively.

4. Proof of Theorem 2.1

This section is devoted to proving the main results listed in Section 2. We first state some lemmas which play an important role in our analysis.

4.1. Technical lemmas

Let us introduce some notation first. For two sequences of numbers $\{a_n : n \geq 1\}$ and $\{b_n : n \geq 1\}$, the notation $a_n = O(b_n)$ as $n \to \infty$ means that $\limsup_{n\to\infty} |a_n/b_n| < \infty$. The notation $a_n = o(b_n)$ as $n \to \infty$ means that $\lim_{n\to\infty} a_n/b_n = 0$. For two functions $f(x)$ and $g(x)$, the notations $f(x) = O\{g(x)\}$ and $f(x) = o\{g(x)\}$ as $x \to x_0 \in [-\infty, \infty]$ are similarly interpreted. Throughout the paper, $\Gamma(z)$ is the Gamma function defined on the complex plane $\mathbb{C}$.

The first lemma is a modified version of Lemma 5.1 in Jiang and Yang [16], where $\lambda_n$ is of small order compared with $n$.

LEMMA 4.1. Let $\lambda_n$, $n \geq 1$, be a sequence of positive numbers satisfying
\[ \lambda_n \to \infty, \quad \frac{\lambda_n}{n} \to 0, \quad n \to \infty. \]
Then, for any fixed $\mu \in \mathbb{R}$, as $n \to \infty$, we have
\[ \ln \frac{\Gamma(n + \mu\lambda_n)}{\Gamma(n)} = \left( \ln n - \frac{1}{2n} \right) \lambda_n \mu + \frac{1}{2n} \lambda_n^2 \mu^2 + \max\{O(1/n), O(\lambda_n^3/n^2)\}. \]

Proof. According to Stirling's formula (see Gamelin [9], p. 368),
\[ \ln \Gamma(n) = \left( n - \frac{1}{2} \right) \ln n - n + \ln \sqrt{2\pi} + \frac{1}{12n} + O(1/n^3) \tag{9} \]
as $n \to \infty$. This gives
\[ \begin{aligned} \ln \Gamma(n + \mu\lambda_n) &= \left( n - \frac{1}{2} + \mu\lambda_n \right) \ln(n + \mu\lambda_n) - n - \mu\lambda_n + \ln \sqrt{2\pi} + O(1/n) \\ &= \left( n - \frac{1}{2} + \mu\lambda_n \right) \left\{ \ln n + \ln\left( 1 + \frac{\mu\lambda_n}{n} \right) \right\} - n - \mu\lambda_n + \ln \sqrt{2\pi} + O(1/n) \\ &= \left( n - \frac{1}{2} + \mu\lambda_n \right) \left\{ \ln n + \frac{\mu\lambda_n}{n} - \frac{\mu^2\lambda_n^2}{2n^2} + O(\lambda_n^3/n^3) \right\} - n - \mu\lambda_n + \ln \sqrt{2\pi} + O(1/n). \end{aligned} \tag{10} \]
Using (9) and (10), we can complete the proof of the lemma. □

LEMMA 4.2. Let $\lambda_n$, $n \geq 1$, be a sequence of positive numbers satisfying
\[ \lambda_n \to \infty, \quad \frac{\lambda_n}{n} \to 0, \quad n \to \infty. \]
Moreover, we assume that
\[ p \to \infty, \quad \frac{p}{n} \to y \in (0, 1], \quad n \to \infty. \]
Then, for any $\mu \in \mathbb{R}$, as $n \to \infty$, we have
\[ \ln \frac{\Gamma_p(n + \mu\lambda_n)}{\Gamma_p(n)} = \sum_{i=1}^{p} \left\{ \ln\left( n - \frac{i-1}{2} \right) - \frac{1}{2(n - (i-1)/2)} \right\} \lambda_n \mu + \sum_{i=1}^{p} \frac{1}{2n + 1 - i} \lambda_n^2 \mu^2 + \max\{O(1/n), O(\lambda_n^3/n)\}, \]
where the function $\Gamma_p(z)$ is defined as
\[ \Gamma_p(z) = \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma\left( z - \frac{i-1}{2} \right) \tag{11} \]
for any complex number $z$ with $\operatorname{Re}(z) > (p - 1)/2$.
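The expansion of Lemma 4.1, which Lemma 4.2 applies term by term, can be sanity-checked numerically. The sketch below uses only the standard library (`math.lgamma`) and takes $\mu = 1$; the test triples $(n, \lambda_n)$ are arbitrary illustrative choices with $\lambda_n / n$ small. It compares the exact log-ratio with the two leading terms and prints the error next to $\lambda_n^3/n^2$.

```python
import math


def exact_log_ratio(n, a):
    """ln Gamma(n + a) - ln Gamma(n), computed with the log-gamma function."""
    return math.lgamma(n + a) - math.lgamma(n)


def expansion(n, a):
    """Leading terms of Lemma 4.1 with a = mu * lambda_n."""
    return (math.log(n) - 1.0 / (2 * n)) * a + a * a / (2.0 * n)


cases = [(10_000, 10.0), (10_000, 100.0), (1_000_000, 1000.0)]
errors = []
for n, lam in cases:
    a = 1.0 * lam  # mu = 1
    err = exact_log_ratio(n, a) - expansion(n, a)
    errors.append(err)
    print(n, lam, err, lam ** 3 / n ** 2)
```

In each case the discrepancy is of the order $\lambda_n^3/n^2$, in line with the stated remainder.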

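Proof. According to the definition of $\Gamma_p$, we can write
\[ \ln \frac{\Gamma_p(n + \mu\lambda_n)}{\Gamma_p(n)} = \sum_{i=1}^{p} \left\{ \ln \Gamma\left( n - \frac{i-1}{2} + \mu\lambda_n \right) - \ln \Gamma\left( n - \frac{i-1}{2} \right) \right\}. \]
We note that $p/n \to y \in (0, 1]$ as $n \to \infty$. This implies that for all $i \in \{1, \dots, p\}$, we have $\lambda_n = o\{n - (i - 1)/2\}$. Therefore, the proof follows from Lemma 4.1. □

LEMMA 4.3. For any positive integer $p$ with $p \leq n$, we have
\[ \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{2n} \right) \geq -p - (2n - p) \ln\left( 1 - \frac{p}{2n} \right). \tag{12} \]
Conversely,
\[ \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{2n} \right) \leq p \ln\left( 1 + \frac{1}{2n} \right) - p - (2n + 1 - p) \ln\left( 1 - \frac{p}{2n+1} \right). \tag{13} \]

Proof. A simple calculation shows that
\[ \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{2n} \right) \geq \int_0^p \ln\left( 1 - \frac{x}{2n} \right) dx = -p - (2n - p) \ln\left( 1 - \frac{p}{2n} \right). \]
This implies (12). In addition, we have
\[ \ln\left( 1 - \frac{i-1}{2n} \right) = \ln\left( 1 + \frac{1}{2n} \right) + \ln\left( 1 - \frac{i}{2n+1} \right). \tag{14} \]
Moreover,
\[ \sum_{i=1}^{p} \ln\left( 1 - \frac{i}{2n+1} \right) \leq \int_0^p \ln\left( 1 - \frac{x}{2n+1} \right) dx = -p - (2n + 1 - p) \ln\left( 1 - \frac{p}{2n+1} \right). \]
This implies (13) by summing (14) over all possible values of $i \in \{1, \dots, p\}$. □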
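The two-sided bound of Lemma 4.3 is easy to verify numerically for specific values. The following sketch uses only the standard library; the pairs $(n, p)$ are arbitrary illustrative choices with $p \leq n$, and the function names are hypothetical.

```python
import math


def log_sum(n, p):
    """Sum of ln(1 - (i - 1) / (2n)) over i = 1, ..., p."""
    return sum(math.log1p(-(i - 1) / (2 * n)) for i in range(1, p + 1))


def lower_bound(n, p):
    """Right-hand side of (12)."""
    return -p - (2 * n - p) * math.log1p(-p / (2 * n))


def upper_bound(n, p):
    """Right-hand side of (13)."""
    return (p * math.log1p(1 / (2 * n)) - p
            - (2 * n + 1 - p) * math.log1p(-p / (2 * n + 1)))


pairs = [(10, 1), (10, 10), (100, 50), (1000, 999)]
for n, p in pairs:
    print(n, p, lower_bound(n, p), log_sum(n, p), upper_bound(n, p))
```

The gap between the two bounds stays bounded as $n$ and $p$ grow, which is what makes the pair (12)–(13) useful for identifying $\mu_n$ up to an $O(1)$ term in the proof of Theorem 2.1.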


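LEMMA 4.4. Assume that $p/n \to y \in (0, 1]$ as $n \to \infty$. Then we have
\[ \lim_{n\to\infty} \sum_{i=1}^{p} \frac{1}{2n + 1 - i} = -\ln\left( 1 - \frac{y}{2} \right). \]

Proof. By simple calculations, we have
\[ \sum_{i=1}^{p} \frac{1}{n - (i-1)/2} = \frac{1}{n} \sum_{i=1}^{p} \frac{1}{1 - (i-1)/(2n)} \to \int_0^y \frac{1}{1 - x/2}\, dx = -2 \ln(1 - y/2). \]
Since $1/(2n + 1 - i) = \frac{1}{2} \cdot 1/\{n - (i-1)/2\}$, this immediately implies the conclusion of the lemma. □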
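As a quick numerical illustration of Lemma 4.4, the sketch below evaluates the finite sum for a large $n$ and compares it with the limit. It assumes $p = \lfloor yn \rfloor$, which is one convenient way to realize $p/n \to y$ and is not prescribed by the paper.

```python
import math


def partial_sum(n, p):
    """Sum of 1 / (2n + 1 - i) over i = 1, ..., p."""
    return sum(1.0 / (2 * n + 1 - i) for i in range(1, p + 1))


def limit_value(y):
    """Limit -ln(1 - y / 2) from Lemma 4.4."""
    return -math.log1p(-y / 2.0)


n = 100_000
for y in (0.25, 0.5, 1.0):
    p = int(y * n)
    print(y, partial_sum(n, p), limit_value(y))
```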

LEMMA 4.5. Let $n > p$, $p/n \to y \in (0, 1]$ and $t, s = o(1)$ as $n \to \infty$. Then
\[ \ln \prod_{i=n-p}^{n-1} \frac{\Gamma(i/2 - t)}{\Gamma(i/2)} = pt(1 + \ln 2 - \ln n) + r_n^2 \{t^2 + (p - n + 1.5)t\} + o(1) \]
and
\[ \ln \frac{\Gamma_p(n/2 + t)}{\Gamma_p(n/2 + s)} = p(t - s)(\ln n - 1 - \ln 2) + r_n^2 \{t^2 - s^2 - (p - n + 0.5)(t - s)\} + o(1), \]
as $n \to \infty$, where $r_n = \{-\ln(1 - p/n)\}^{1/2}$.

Proof. By Proposition 5.1 and Lemma 5.4 in Jiang and Yang [16], we know that the above results hold for $t, s = O(r_n^{-1})$. Carefully checking their proofs, we find that they still hold for any $t, s \to 0$. We omit the details. □

4.2. Proof of Theorem 2.1

The next lemma is from Corollary 8.3.6 in Muirhead [22].

LEMMA 4.6. For any $h > (p - n)/2$, the moments of $V_n$ satisfy
\[ E(V_n^h) = p^{ph}\, \frac{\Gamma\{(n-1)p/2\}}{\Gamma\{(n-1)p/2 + ph\}} \times \frac{\Gamma_p\{(n-1)/2 + h\}}{\Gamma_p\{(n-1)/2\}}, \]
where $\Gamma_p(z)$ is defined by (11).
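The moment formula of Lemma 4.6 gives the cumulant generating function $h \mapsto \ln E(V_n^h)$ of $\ln V_n$ in closed form. The sketch below evaluates it with the standard library's `math.lgamma` and uses central finite differences at $h = 0$ to approximate the mean and variance of $\ln V_n$ under $H_0$, which can then be compared with $\mu_n$ in (5) and $\sigma_n^2$ in (6). The function names and the step size `h` are illustrative choices, and the comparison is only expected to be approximate for finite $n$.

```python
import math


def log_moment(n, p, h):
    """ln E(V_n^h) from Lemma 4.6; requires h > (p - n) / 2."""
    if h <= (p - n) / 2:
        raise ValueError("need h > (p - n) / 2")
    big = (n - 1) * p / 2
    # ln Gamma_p(z + h) - ln Gamma_p(z); the pi^{p(p-1)/4} factor in (11) cancels.
    z = (n - 1) / 2
    multi = sum(math.lgamma(z + h - (i - 1) / 2) - math.lgamma(z - (i - 1) / 2)
                for i in range(1, p + 1))
    return p * h * math.log(p) + math.lgamma(big) - math.lgamma(big + p * h) + multi


def mu_n(n, p):
    return -p - (n - p - 1.5) * math.log1p(-p / (n - 1))


def sigma2_n(n, p):
    r = p / (n - 1)
    return -2.0 * (r + math.log1p(-r))


n, p, h = 100, 50, 0.01
mean_fd = (log_moment(n, p, h) - log_moment(n, p, -h)) / (2 * h)
var_fd = (log_moment(n, p, h) + log_moment(n, p, -h)) / h ** 2  # uses ln E(V_n^0) = 0
print(mean_fd, mu_n(n, p))
print(var_fd, sigma2_n(n, p))
```

Working on the log scale avoids overflow in the Gamma functions, whose arguments are of order $np$.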

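Proof of Theorem 2.1. By the Gärtner–Ellis theorem (see Dembo and Zeitouni [7]), we only need to show that, for any fixed $\lambda \in \mathbb{R}$,
\[ \lim_{n\to\infty} \Psi_n(\lambda) = \lim_{n\to\infty} \frac{1}{a_n^2} \ln E \exp\left\{ \lambda a_n \left( \frac{\ln V_n - \mu_n}{\sigma_n} \right) \right\} = \frac{\lambda^2}{2}, \tag{15} \]
where $\mu_n$ and $\sigma_n$ are defined by (5) and (6), respectively. We consider two cases.

Case 1: $p/n \to y = 1$ as $n \to \infty$. Note that in this case, as $n \to \infty$, we have
\[ \sigma_n^2 \sim -2 \ln(1 - p/n) \to \infty, \quad \lambda a_n/\sigma_n \to 0. \]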


According to the proof of Theorem 1 in Jiang and Yang [16] and Lemma 4.5, we deduce that
\[ \Psi_n(\lambda) = \frac{1}{a_n^2} \ln E V_n^{\lambda a_n/\sigma_n} - \frac{\lambda \mu_n}{a_n \sigma_n} = \frac{\lambda^2}{2} + o(1). \]
This implies (15) immediately.

Case 2: $p/n \to y \in (0, 1)$ as $n \to \infty$. In this case,
\[ \lim_{n\to\infty} \sigma_n^2 = -2y - 2 \ln(1 - y) > 0. \]
This implies that the variance $\sigma_n^2$ is uniformly bounded, so $|\lambda a_n \sigma_n^{-1}|$ tends to infinity as $n \to \infty$ for $\lambda \neq 0$. Therefore, in this case we cannot use the proof of Theorem 1 in Jiang and Yang [16] directly, and some finer analysis is called for. For convenience, we let $\lambda_n = \lambda a_n \sigma_n^{-1}$. By assumption, $a_n \ll n$. This gives $|\lambda_n| \ll n$. This, in turn, implies that we can use Lemma 4.6 to compute the moments of $V_n$. We have
\[ \ln E V_n^{\lambda_n} = p\lambda_n \ln p + \ln \frac{\Gamma\{(n-1)p/2\}}{\Gamma\{(n-1)p/2 + p\lambda_n\}} + \ln \frac{\Gamma_p\{(n-1)/2 + \lambda_n\}}{\Gamma_p\{(n-1)/2\}}. \tag{16} \]
Using Lemma 4.1, we get that
\[ \ln \frac{\Gamma\{(n-1)p/2\}}{\Gamma\{(n-1)p/2 + p\lambda_n\}} = -\left\{ \ln(n-1) + \ln p - \ln 2 - \frac{1}{(n-1)p} \right\} p\lambda_n - \frac{(n-1)p + 1}{(n-1)^2} \lambda_n^2 + \max\{O(1/n), O(\lambda_n^3/n)\}. \tag{17} \]
Similarly, by Lemma 4.2, we derive that
\[ \ln \frac{\Gamma_p\{(n-1)/2 + \lambda_n\}}{\Gamma_p\{(n-1)/2\}} = p\lambda_n \ln \frac{n-1}{2} + \sum_{i=1}^{p} \left\{ \ln\left( 1 - \frac{i-1}{n-1} \right) - \frac{1}{n-i} \right\} \lambda_n + \sum_{i=1}^{p} \frac{n-i+1}{(n-i)^2} \lambda_n^2 + \max\{O(1/n), O(\lambda_n^3/n)\}. \tag{18} \]

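Combining all the assertions (16)–(18), we obtain that
\[ \ln E V_n^{\lambda_n} = \left\{ \frac{1}{n-1} - \sum_{i=1}^{p} \frac{1}{n-i} + \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{n-1} \right) \right\} \lambda_n - \left\{ \frac{(n-1)p + 1}{(n-1)^2} - \sum_{i=1}^{p} \frac{n-i+1}{(n-i)^2} \right\} \lambda_n^2 + \max\{O(1/n), O(\lambda_n^3/n)\}. \tag{19} \]
According to Lemma 4.3, we have
\[ \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{n-1} \right) \geq -p - (n - p - 1) \ln\left( 1 - \frac{p}{n-1} \right) \]
and
\[ \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{n-1} \right) \leq p \ln\left( 1 + \frac{1}{n-1} \right) - p - (n - p) \ln\left( 1 - \frac{p}{n} \right). \]
Combining the two inequalities and using $\lambda_n = \lambda a_n \sigma_n^{-1}$ and Lemma 4.4, we get that
\[ \lambda_n \left\{ \frac{1}{n-1} - \sum_{i=1}^{p} \frac{1}{n-i} + \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{n-1} \right) - \mu_n \right\} = O(a_n). \]
This implies that, as $n \to \infty$,
\[ \frac{\lambda_n}{a_n^2} \left\{ \frac{1}{n-1} - \sum_{i=1}^{p} \frac{1}{n-i} + \sum_{i=1}^{p} \ln\left( 1 - \frac{i-1}{n-1} \right) - \mu_n \right\} \to 0. \tag{20} \]
Using Lemma 4.4 again, we have, as $n \to \infty$,
\[ \frac{(n-1)p + 1}{(n-1)^2} - \sum_{i=1}^{p} \frac{n-i+1}{(n-i)^2} \to y + \ln(1 - y). \]
Combining (19) and (20), we obtain
\[ \lim_{n\to\infty} \frac{1}{a_n^2} \ln E \exp\left\{ \lambda a_n \left( \frac{\ln V_n - \mu_n}{\sigma_n} \right) \right\} = \lim_{n\to\infty} \frac{\lambda^2}{\sigma_n^2} \{-y - \ln(1 - y)\} = \frac{\lambda^2}{2}. \]
This proves (15) and completes the proof of the theorem. □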

Appendix A. MDP for other LRT statistics

Herein, we give a brief summary of the MDPs for several other classical likelihood ratio test statistics for mean and covariance matrices of high-dimensional normal distributions. These include the LRT statistics for testing that several components of a vector with distribution $N_p(\mu, \Sigma)$ are independent, the hypothesis testing problem with $H_0 : N_p(\mu_1, \Sigma_1) = \cdots = N_p(\mu_k, \Sigma_k)$, the test of the equality of the covariance matrices from several Normal distributions, and others. The asymptotic normality of these test statistics has been obtained in Jiang and Yang [16]. In the following subsections, $\xrightarrow{d}$ means convergence in distribution.

A.1. Testing equality of means and covariance matrices

(I) Testing whether multiple Normal distributions are identical

Given Normal distributions $N_p(\mu_1, \Sigma_1), \dots, N_p(\mu_k, \Sigma_k)$, we are testing whether they are all identical, i.e.,
\[ H_0 : \mu_1 = \cdots = \mu_k, \ \Sigma_1 = \cdots = \Sigma_k \quad \text{vs.} \quad H_a : H_0 \text{ is not true}. \tag{21} \]


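Let $\{y_{ij} : 1 \leq i \leq k, 1 \leq j \leq n_i\}$ be independent $p$-dimensional random vectors and let $\{y_{ij} : 1 \leq j \leq n_i\}$ be iid from $N_p(\mu_i, \Sigma_i)$ for each $i \in \{1, \dots, k\}$. Moreover, let
\[ A = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})(\bar{y}_i - \bar{y})^\top, \quad B_i = \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(y_{ij} - \bar{y}_i)^\top \quad \text{and} \quad B = \sum_{i=1}^{k} B_i = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)(y_{ij} - \bar{y}_i)^\top, \]
where
\[ \bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{y}_i, \quad n = \sum_{i=1}^{k} n_i. \]
The following likelihood ratio test statistic for (21) was first derived by Wilks [26]:
\[ \Lambda_{n1} = \frac{\prod_{i=1}^{k} |B_i|^{n_i/2}}{|A + B|^{n/2}} \times \frac{n^{pn/2}}{\prod_{i=1}^{k} n_i^{p n_i/2}}. \]
See also Theorem 10.8.1 from Muirhead [22]. When $n_i > p + 1$ and the $p$'s are proportional to $n_i$, viz.
\[ p/n_i \to y_i \in (0, 1], \quad 1 \leq i \leq k, \quad n \to \infty, \]
Jiang and Yang [16] established, under $H_0$, that
\[ \frac{\ln \Lambda_{n1} - \mu_{n1}}{n \sigma_{n1}} \xrightarrow{d} N(0, 1), \]
where
\[ \mu_{n1} = \frac{1}{4} \left\{ -2kp - \sum_{i=1}^{k} y_i + n r_n^2 (2p - 2n + 3) - \sum_{i=1}^{k} n_i r_{n_i'}^2 (2p - 2n_i + 3) \right\} \]
and
\[ \sigma_{n1}^2 = \frac{1}{2} \left( \sum_{i=1}^{k} \frac{n_i^2}{n^2} r_{n_i'}^2 - r_n^2 \right) > 0, \]
with $n_i' = n_i - 1$ and $r_x = \{-\ln(1 - p/x)\}^{1/2}$ for $x > p$.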

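(II) Testing equality of several covariance matrices

Let $k \geq 2$ be an integer. For each $i \in \{1, \dots, k\}$, let $x_{i1}, \dots, x_{in_i}$ be iid $N_p(\mu_i, \Sigma_i)$-distributed random vectors. We are considering
\[ H_0 : \Sigma_1 = \cdots = \Sigma_k \quad \text{vs.} \quad H_a : H_0 \text{ is not true}. \]
For each $i \in \{1, \dots, k\}$, let
\[ \bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} \quad \text{and} \quad A_i = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)^\top, \]
as well as $A = A_1 + \cdots + A_k$ and $n = n_1 + \cdots + n_k$. Bartlett [4] suggested the likelihood ratio test statistic $\Lambda_{n2}$, viz.
\[ \Lambda_{n2} = \frac{\prod_{i=1}^{k} (\det A_i)^{(n_i - 1)/2}}{(\det A)^{(n-k)/2}} \times \frac{(n - k)^{(n-k)p/2}}{\prod_{i=1}^{k} (n_i - 1)^{(n_i - 1)p/2}}. \]
When $n_i > p + 1$ and the $p$'s are proportional to $n_i$, viz.
\[ p/n_i \to y_i \in (0, 1], \quad 1 \leq i \leq k, \quad n \to \infty, \]
Jiang and Yang [16] established, under $H_0$, that
\[ \frac{\ln \Lambda_{n2} - \mu_{n2}}{(n - k)\sigma_{n2}} \xrightarrow{d} N(0, 1), \]
where
\[ \mu_{n2} = \frac{1}{4} \left\{ (n - k)(2n - 2p - 2k - 1) \ln\left( 1 - \frac{p}{n-k} \right) - \sum_{i=1}^{k} (n_i - 1)(2n_i - 2p - 3) \ln\left( 1 - \frac{p}{n_i - 1} \right) \right\} \]
and
\[ \sigma_{n2}^2 = \frac{1}{2} \left\{ \ln\left( 1 - \frac{p}{n-k} \right) - \sum_{i=1}^{k} \left( \frac{n_i - 1}{n - k} \right)^2 \ln\left( 1 - \frac{p}{n_i - 1} \right) \right\}. \]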
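For concreteness, Bartlett's statistic $\Lambda_{n2}$ and the constants $\mu_{n2}$, $\sigma_{n2}^2$ above can be computed as in the following sketch. It assumes NumPy and works on the log scale; the function names, the sample sizes and the seed are illustrative choices, not part of the paper.

```python
import numpy as np


def scatter(X):
    """A_i: sum of (x - xbar)(x - xbar)^T over the rows x of X."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc


def centering_and_scale(ns, p):
    """Return (mu_n2, sigma_n2^2) for group sizes ns and dimension p."""
    ns = np.asarray(ns, dtype=float)
    n, k = ns.sum(), len(ns)
    m = n - k
    mu = 0.25 * (m * (2 * n - 2 * p - 2 * k - 1) * np.log1p(-p / m)
                 - np.sum((ns - 1) * (2 * ns - 2 * p - 3) * np.log1p(-p / (ns - 1))))
    sigma2 = 0.5 * (np.log1p(-p / m)
                    - np.sum(((ns - 1) / m) ** 2 * np.log1p(-p / (ns - 1))))
    return mu, sigma2


def log_lambda_n2(samples):
    """ln Lambda_n2 for a list of (n_i, p) data matrices."""
    ns = np.array([X.shape[0] for X in samples], dtype=float)
    p = samples[0].shape[1]
    n, k = ns.sum(), len(samples)
    A_list = [scatter(X) for X in samples]
    logdets = np.array([np.linalg.slogdet(A)[1] for A in A_list])
    logdet_A = np.linalg.slogdet(sum(A_list))[1]
    return (np.sum((ns - 1) / 2 * logdets) - (n - k) / 2 * logdet_A
            + (n - k) * p / 2 * np.log(n - k)
            - np.sum((ns - 1) * p / 2 * np.log(ns - 1)))


# Two groups drawn under H0 (equal covariance matrices).
rng = np.random.default_rng(1)
samples = [rng.standard_normal((50, 10)), rng.standard_normal((50, 10))]
mu, sigma2 = centering_and_scale([50, 50], 10)
log_lam = log_lambda_n2(samples)
z_stat = (log_lam - mu) / ((100 - 2) * np.sqrt(sigma2))
print(log_lam, mu, sigma2, z_stat)
```

Since the log-determinant is concave, $\ln \Lambda_{n2} \leq 0$ for any data, which offers a simple check on an implementation.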

(III) Testing specified values for mean vector and covariance matrix

Let $x_1, \dots, x_n$ be iid $\mathbb{R}^p$-valued random vectors from a Normal distribution $N_p(\mu, \Sigma)$, where $\mu \in \mathbb{R}^p$ is the mean vector and $\Sigma$ is the $p \times p$ covariance matrix. We consider the hypothesis test
\[ H_0 : \mu = \mu_0 \text{ and } \Sigma = \Sigma_0 \quad \text{vs.} \quad H_a : H_0 \text{ is not true}, \]
where $\mu_0$ is a specified vector in $\mathbb{R}^p$ and $\Sigma_0$ is a specified $p \times p$ non-singular matrix. By applying the transformation $\tilde{x}_i = \Sigma_0^{-1/2}(x_i - \mu_0)$, this hypothesis test is equivalent to the test of
\[ H_0 : \mu = 0 \text{ and } \Sigma = I_p \quad \text{vs.} \quad H_a : H_0 \text{ is not true}. \]
We recall the notation
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad A = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top. \]
The likelihood ratio test statistic can be defined as
\[ \Lambda_{n3} = \left( \frac{e}{n} \right)^{np/2} |A|^{n/2}\, e^{-\operatorname{tr}(A)/2}\, e^{-n \bar{x}^\top \bar{x}/2}. \]
When $n > p + 1$ for all $n \geq 3$ and the $p$'s are proportional to $n$, viz.
\[ p/n \to y \in (0, 1], \quad n \to \infty, \]
Jiang and Yang [16] established, under $H_0$, that
\[ \frac{\ln \Lambda_{n3} - \mu_{n3}}{n \sigma_{n3}} \xrightarrow{d} N(0, 1), \]
where
\[ \mu_{n3} = -\frac{1}{4} \left\{ n(2n - 2p - 3) \ln\left( 1 - \frac{p}{n-1} \right) + 2(n + 1)p \right\} \]
and
\[ \sigma_{n3}^2 = -\frac{1}{2} \left\{ \frac{p}{n-1} + \ln\left( 1 - \frac{p}{n-1} \right) \right\} > 0. \]

Using Corollaries 10.8.3 and 8.5.4 in Muirhead [22], Lemma 5.8 in Jiang and Yang [16] and methods similar to those in the proof of Theorem 2.1 (Lemmas 4.1–4.4), we can derive the following result. We omit the details.

THEOREM A.1. Let $\Lambda_{ni}$ and $\mu_{ni}$, $\sigma_{ni}$, for each $i \in \{1, 2, 3\}$, be defined as in (I)–(III), respectively. Then, for each $i \in \{1, 2, 3\}$, the following statements hold true.
(a) If $y = 1$, then $(\ln \Lambda_{ni} - \mu_{ni})/(a_n n \sigma_{ni})$ satisfies the moderate deviation principle with speed $a_n^2$ and good rate function $I(x) = x^2/2$ for all $x \in \mathbb{R}$, where $\{a_n : n \geq 1\}$ is any sequence of positive numbers such that
\[ \lim_{n\to\infty} a_n = \infty \quad \text{and} \quad \limsup_{n\to\infty} \frac{a_n}{n \sigma_{ni}} = 0. \]
(b) If $y \in (0, 1)$, then $(\ln \Lambda_{ni} - \mu_{ni})/(a_n n \sigma_{ni})$ satisfies the moderate deviation principle with speed $a_n^2$ and good rate function $I(x) = x^2/2$ for all $x \in \mathbb{R}$, where $\{a_n : n \geq 1\}$ is any sequence of positive numbers such that $\liminf_{n\to\infty} a_n/n = \infty$.

A.2. Testing independence of Normal distributions

(IV) Testing independence of components of Normal distributions

Let $k \geq 2$ and $p_1, \dots, p_k$ be positive integers. Let $p = p_1 + \cdots + p_k$ and let $\Sigma = (\Sigma_{ij})_{p \times p}$ be a positive-definite matrix, where $\Sigma_{ij}$ is a $p_i \times p_j$ sub-matrix for all $i, j \in \{1, \dots, k\}$. Let $N_p(\mu, \Sigma)$ be a $p$-dimensional normal distribution. We are testing
\[ H_0 : \Sigma_{ij} = 0 \text{ for all } 1 \leq i < j \leq k \quad \text{vs.} \quad H_1 : H_0 \text{ is not true}. \tag{22} \]


Let $x_1, \dots, x_N$ be iid with distribution $N_p(\mu, \Sigma)$. Let $n = N - 1$ and let $S$ be the covariance matrix as in (3). Now we partition $A = nS$ in the following way:
\[ A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kk} \end{pmatrix}, \]
where $A_{ij}$ is a $p_i \times p_j$ matrix. Wilks [26] and Muirhead [22] showed that the likelihood ratio statistic for testing (22) is given by
\[ \Lambda_{n4} = \frac{|A|^{(n+1)/2}}{\prod_{i=1}^{k} |A_{ii}|^{(n+1)/2}} = (W_n)^{(n+1)/2}. \tag{23} \]
When $n \geq p + 1$ and the $p_i$'s are proportional to $n$, viz.
\[ p_i/n \to y_i \in (0, 1), \quad n \to \infty, \]
Jiang and Yang [16] established, under $H_0$, that
\[ (\ln W_n - \mu_{n4})/\sigma_{n4} \xrightarrow{d} N(0, 1), \]
where
\[ \mu_{n4} = -r_{n-1}^2 \left( p - n + \frac{3}{2} \right) + \sum_{i=1}^{k} r_{n-1,i}^2 \left( p_i - n + \frac{3}{2} \right), \quad \sigma_{n4}^2 = 2 r_{n-1}^2 - 2 \sum_{i=1}^{k} r_{n-1,i}^2, \]
and $r_x = \{-\ln(1 - p/x)\}^{1/2}$ for $x > p$ and $r_{x,i} = \{-\ln(1 - p_i/x)\}^{1/2}$ for $x > p_i$ and each $i \in \{1, \dots, k\}$.

(V) Testing complete independence

Finally, we study the likelihood ratio test of the complete independence of the coordinates of a high-dimensional normal random vector. More precisely, let $R = (r_{ij})_{p \times p}$ be the correlation matrix generated from $N_p(\mu, \Sigma)$ and $x = (x_1, \dots, x_p) \sim N_p(\mu, \Sigma)$. The test is
\[ H_0 : R = I \quad \text{vs.} \quad H_a : R \neq I. \tag{24} \]
Let $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ and $y = (y_1, \dots, y_n) \in \mathbb{R}^n$. We recall that the Pearson correlation coefficient $r$ is defined by
\[ r = r_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \times \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{25} \]
where $\bar{x} = \sum_{i=1}^{n} x_i/n$ and $\bar{y} = \sum_{i=1}^{n} y_i/n$.

We say that a random vector $x \in \mathbb{R}^n$ has a spherical distribution if $Ox$ and $x$ have the same probability distribution for every $n \times n$ orthogonal matrix $O$. Let $X = (x_{ij})_{n \times p} = (x_1, \dots, x_n)^\top = (y_1, \dots, y_p)$ be an $n \times p$ matrix such that $y_1, \dots, y_p$ are independent random vectors with $n$-variate spherical distributions and $\Pr(y_i = 0) = 0$ for all $i \in \{1, \dots, p\}$ (these distributions may be different). Let $r_{ij} = r_{y_i, y_j}$, i.e., the Pearson correlation coefficient between $y_i$ and $y_j$ for any $1 \leq i \leq j \leq p$. Then $R_n = (r_{ij})_{p \times p}$ is the sample correlation matrix. It is known that $R_n$ can be written as $R_n = U^\top U$, where $U$ is an $n \times p$ matrix; see, e.g., Jiang [15]. Thus, $R_n$ does not have full rank, and hence $|R_n| = 0$, if $p > n$. When $n \geq p + 5$ and $p$ is proportional to $n$, viz.
\[ p/n \to y \in (0, 1], \quad n \to \infty, \]
Jiang and Yang [16] established, under $H_0$, that
\[ \frac{\ln \Lambda_{n5} - \mu_{n5}}{\sigma_{n5}} \xrightarrow{d} N(0, 1), \]
where $\Lambda_{n5} = |R_n|$,
\[ \mu_{n5} = \left( p - n + \frac{3}{2} \right) \ln\left( 1 - \frac{p}{n-1} \right) - \frac{n-2}{n-1}\, p \]
and
\[ \sigma_{n5}^2 = -2 \left\{ \frac{p}{n-1} + \ln\left( 1 - \frac{p}{n-1} \right) \right\}. \]
Using Theorem 11.2.3 in Muirhead [22], Lemma 5.10 in Jiang and Yang [16] and methods similar to those in the proof of Theorem 2.1 (Lemmas 4.1–4.4), we can obtain the following result. We omit the details.

THEOREM A.2. Let $\Lambda_{ni}$ and $\mu_{ni}$, $\sigma_{ni}$, for $i \in \{4, 5\}$, be defined as in (IV)–(V), respectively. For $i \in \{4, 5\}$, the following statements hold true.
(a) If $y = 1$, then $(\ln \Lambda_{ni} - \mu_{ni})/(a_n \sigma_{ni})$ satisfies the moderate deviation principle with speed $a_n^2$ and good rate function $I(x) = x^2/2$ for all $x \in \mathbb{R}$, where $\{a_n : n \geq 1\}$ is any sequence of positive numbers such that
\[ \lim_{n\to\infty} a_n = \infty \quad \text{and} \quad \limsup_{n\to\infty} \frac{a_n}{\sigma_{ni}} = 0. \]
(b) If $y \in (0, 1)$, then $(\ln \Lambda_{ni} - \mu_{ni})/(a_n \sigma_{ni})$ satisfies the moderate deviation principle with speed $a_n^2$ and good rate function $I(x) = x^2/2$ for all $x \in \mathbb{R}$, where $\{a_n : n \geq 1\}$ is any sequence of positive numbers such that $\liminf_{n\to\infty} a_n/n = \infty$.

Acknowledgments

The authors are grateful to the editor, Christian Genest, for his kind help, which led to an improved presentation of this paper. The authors would also like to express their gratitude to the two anonymous referees for their constructive comments, which clarified several points. Jiang Hui is supported by the Fundamental Research Funds for the Central Universities (Grant No. NS2015074) and China Postdoctoral Science Foundation (Grant No. 2016T90450). Wang Shaochen is supported by the Project Funded by China Postdoctoral Science Foundation (Grant No. 2015M580713).


References

[1] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd Ed., Wiley, New York, 1958.
[2] G.E. Andrews, R. Askey, R. Roy, Special Functions, Cambridge University Press, 1999.
[3] Z. Bai, D. Jiang, J. Yao, S. Zheng, Corrections to LRT on large-dimensional covariance matrix by RMT, Ann. Statist. 37 (2009) 3822–3840.
[4] M.S. Bartlett, Properties of sufficiency and statistical tests, Proc. R. Soc. Lond. A 160 (1937) 268–282.
[5] T.T. Cai, Z. Ma, Optimal hypothesis testing for high dimensional covariance matrices, Bernoulli 19 (2013) 2359–2388.
[6] S. Chen, L. Zhang, P. Zhong, Tests for high-dimensional covariance matrices, J. Amer. Statist. Assoc. 105 (2010) 810–819.
[7] A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications, vol. 38, Springer, New York, 2009.
[8] M. Eaton, Multivariate Statistics: A Vector Space Approach, Wiley, New York, 1983.
[9] T.W. Gamelin, Complex Analysis, 1st Ed., Springer, New York, 2001.
[10] F.Q. Gao, X.Q. Zhao, Delta method in large deviations and moderate deviations for estimators, Ann. Statist. 39 (2011) 1211–1240.
[11] T. Inglot, W.C.M. Kallenberg, Moderate deviations of minimum contrast estimators under contamination, Ann. Statist. 31 (2003) 852–879.
[12] D.D. Jiang, Tests for large dimensional covariance structure based on Rao’s score test, J. Multivariate Anal. 152 (2016) 28–?9.
[13] D.D. Jiang, T.F. Jiang, F. Yang, Likelihood ratio tests for covariance matrices of high-dimensional normal distributions, J. Stat. Plann. Inference 142 (2012) 2241–2256.
[14] H. Jiang, S.C. Wang, Cramér type moderate deviation and Berry–Esseen bound for the log determinant of sample covariance matrix, submitted, 2016.
[15] T.F. Jiang, The limiting distributions of eigenvalues of sample correlation matrices, Sankhyā 66 (2004) 35–48.
[16] T.F. Jiang, F. Yang, Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions, Ann. Statist. 41 (2013) 2029–2074.
[17] I. Johnstone, On the distribution of the largest eigenvalue in principal components analysis, Ann. Statist. 29 (2001) 295–327.
[18] J. Jurečková, W.C.M. Kallenberg, N. Veraverbeke, Moderate and Cramér-type large deviation theorems for M-estimators, Stat. Probab. Lett. 6 (1988) 191–199.
[19] W.C.M. Kallenberg, On moderate deviation theory in estimation, Ann. Statist. 11 (1983) 498–504.
[20] O. Ledoit, M. Wolf, Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size, Ann. Statist. 30 (2002) 1081–1102.
[21] J.W. Mauchly, Significance test for sphericity of a normal n-variate distribution, Ann. Math. Statist. 11 (1940) 204–209.
[22] R.J. Muirhead, Aspects of Multivariate Statistical Theory, Wiley, New York, 1982.
[23] T. Otsu, Moderate deviations of generalized method of moments and empirical likelihood estimators, J. Multivariate Anal. 102 (2011) 1203–1216.
[24] Q.W. Wang, J.F. Yao, On the sphericity test with large-dimensional observations, Electron. J. Statist. 7 (2013) 2164–2192.
[25] S.S. Wilks, On the independence of k sets of normally distributed statistical variables, Econometrica 3 (1935) 309–326.
[26] S.S. Wilks, Certain generalizations in the analysis of variance, Biometrika 24 (1932) 471–494.


