Computational Statistics and Data Analysis 53 (2008) 416–427

Analysis of high-dimensional repeated measures designs: The one sample case

M. Rauf Ahmad (a), C. Werner (b), E. Brunner (a,*)

(a) Department of Medical Statistics, University of Göttingen, Germany
(b) DATAMAP GmbH, Freiburg, Germany

Article history: Received 10 October 2007; received in revised form 7 August 2008; accepted 7 August 2008; available online 19 August 2008.

Abstract

A one sample statistic is derived for the analysis of repeated measures designs when the data are multivariate normal and the dimension, d, can be large compared to the sample size, n, i.e. d > n. Quadratic and bilinear forms are used to define the statistic based on Box's approximation [Box, G.E.P., 1954. Some theorems on quadratic forms applied in the study of analysis of variance problems I: Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics 25 (2), 290–302]. The statistic has an approximate χ²_f distribution, even for moderately large n. One of the main advantages of the statistic is that it can be used both for unstructured and factorially structured repeated measures designs. In the asymptotic derivations, it is assumed that n → ∞ while d remains finite and fixed. However, it is demonstrated through simulations that for n as small as 10, the new statistic very closely approximates the target distribution, unaffected by even large values of d (d > n). The application is illustrated using a sleep lab example with n = 10, d = 18.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction and a motivating example

Jordan et al. (2004) report a sleep disorder experiment whose objective was to investigate the activity of prostaglandin-D-synthase (β-trace) in relation to human sleep. The variable of interest, the serum concentration of lipocalin-type prostaglandin D synthase, was measured on 10 young, healthy women (we omit the measurements taken on 10 healthy men to keep the presentation simple). The repeated measures were taken at 4 h intervals for 5 consecutive days and nights. Part of the data is depicted in Fig. 1. The selected part of the data consists of the serum concentrations of the women measured for the first three nights. There is, therefore, a total of 18 repeated measurements taken on each of 10 subjects at 4 h intervals over three consecutive nights, classified as Normal Sleep, Total Sleep Deprivation and Recovery Night. These are structured repeated measures data with a longitudinal structure. Experiments with such a design layout are frequently carried out in medical, biological and behavioral research (see e.g. Hand and Taylor (1987), Crowder and Hand (1990) and Davis (2002)). Profile analysis for repeated measures designs has also been addressed when the underlying covariance structure is specified. For example, Kenward (1987) considers profile analysis under an antedependence covariance structure when the types of profiles are not specified in advance.

Recently, there have been several attempts to address the question of analyzing such data when the number of repeated measures, d, exceeds the number of subjects/replications in an experiment. This is the case for the selected part of the data shown in Fig. 1. Only a few recent attempts on the issue of high dimensionality can be found in the literature. Bathke (2002) proves that the classical ANOVA F-test is still asymptotically normal when the number of levels of a factor converges to infinity while the number of replications is finite, assuming independent observations. He proves the asymptotic normality without relying on the normality assumption for the model.

* Corresponding address: Department of Medical Statistics, University of Göttingen, Humboldtallee 32, 37073 Göttingen, Germany. Tel.: +49 551 39 4991; fax: +49 551 39 4995. E-mail address: [email protected] (E. Brunner).

doi:10.1016/j.csda.2008.08.013


Fig. 1. Whisker plot for sleep lab data.

Bathke and Harrar (2008) derive asymptotic distributions of different multivariate tests for factorial structures when the number of treatments tends to infinity while the number of replications is relatively small. The test statistics are nonparametric and do not require any special structure of the covariance matrix. For other such references, see Bathke et al. (2008) and Harrar and Bathke (2008).

In this paper, we consider a design similar to that of Bathke and Harrar (2008), however, for repeated measures taken on the same subject. We introduce a modified version of the ANOVA-type statistic, based on Box's approximation (Box, 1954), to test the hypothesis H_0: Hµ = 0, where H is an appropriate hypothesis matrix. In addition to the usual hypotheses of classical profile analysis, the hypothesis matrix H can also be formulated for a factorial structure with repeated measures on one or more factors. The sleep lab data cited above, for example, contain a factorial structure with repeated measures on the interaction of both factors. The statistic asymptotically follows a χ²-distribution and is not affected by the dimension, i.e. the number of repeated measures. For any fixed d we let n → ∞. The quality of the approximation of the distribution is not affected by even a large dimension d. Furthermore, the variances of the estimators used to develop the statistic are uniformly bounded with respect to d.

The main idea is introduced in Section 2. The estimators are defined in Section 3. The sampling distribution of the statistic and an approximation of it are investigated in Section 4. In Section 5, the performance of the statistic is investigated through simulation studies in a variety of parameter settings. The sleep disorder data, introduced in Section 1 above, are analyzed in Section 6. Section 7 covers a discussion and conclusions of the present work. The derivations are given in the Appendix.

2. The ANOVA-type statistic

Let X_k = (X_k1, ..., X_kd)′ ~ N(µ, Σ), Σ > 0, k = 1, ..., n, be independent, identically distributed random vectors whose d components are the repeated measurements taken on the kth individual. In this set-up, the null hypothesis to be tested is H_0: Hµ = 0, where H is the hypothesis matrix. The matrix H can be formulated in different ways depending on the objectives of the experiment. Specifically, we may set H = I to test the usual multivariate hypothesis H_0: µ = 0. For a simple, unstructured repeated measures design, we have H = P_d, where P_d = I_d − (1/d)J_d is the centering matrix; it is symmetric and idempotent, hence a projection matrix. Here J = 11′ is the matrix of 1s and I is the unit matrix. To give a unique representation of the null hypothesis, we can write H_0: Tµ = 0, where T = H′(HH′)⁻H is the general hypothesis matrix and (HH′)⁻ denotes a g-inverse of HH′. We note that Hµ = 0 ⇔ Tµ = 0 and that T is a projection matrix. The matrix T can be formulated so as to represent any general linear hypothesis, including, for example, any factorial structure of repeated measures. For the sleep lab experiment described above, we have T = P_3 ⊗ (1/6)J_6 and T = (1/3)J_3 ⊗ P_6 for the main effects and T = P_3 ⊗ P_6 for the interaction effect. When d < n, such data can be analyzed using the classical multivariate approach without assuming any particular structure of Σ.
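For concreteness, these hypothesis matrices are simple to construct numerically from Kronecker products; a minimal NumPy sketch for the sleep lab layout (a = 3 nights, t = 6 time points; the helper names are ours):

```python
import numpy as np

def centering(d):
    """P_d = I_d - (1/d) J_d, the d x d centering (projection) matrix."""
    return np.eye(d) - np.ones((d, d)) / d

def averaging(m):
    """(1/m) J_m, the m x m averaging matrix."""
    return np.ones((m, m)) / m

a, t = 3, 6  # 3 nights, 6 time points per night, d = a * t = 18

T_day  = np.kron(centering(a), averaging(t))    # T = P_3 (x) (1/6) J_6: main effect of day
T_time = np.kron(averaging(a), centering(t))    # T = (1/3) J_3 (x) P_6: main effect of time
T_int  = np.kron(centering(a), centering(t))    # T = P_3 (x) P_6: day x time interaction

# Each T is symmetric and idempotent, i.e. a projection matrix:
assert np.allclose(T_int @ T_int, T_int)
```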
But when d > n, the case of high-dimensional data, the multivariate approach, for example Hotelling's T², breaks down completely, since the estimated covariance matrix is singular. For d < n, there are several competing statistics for testing H_0, for example the Wald-type statistic, Hotelling's T² statistic and the ANOVA-type statistic, which we consider in detail in what follows. The Wald-type statistic (WTS) is defined as (Rao, 1973; Timm, 2002)

$$ W_n = n\,\bar{X}' H' (H \hat{\Sigma}_n H')^{-} H \bar{X}, \qquad (1) $$

where X̄ and Σ̂_n are the sample estimators of µ and Σ. Clearly, we can replace the g-inverse, (·)⁻, with the unique Moore–Penrose inverse, (·)⁺. Under H_0, W_n ~ χ²_r as n → ∞, where r = ρ(H) and ρ(·) denotes the rank of a matrix


Table 1
Estimated quantiles for A_n with covariance matrix Σ known; 10,000 simulations; d ≶ n. Columns grouped by covariance structure; sub-columns give the nominal level 1 − α.

          CS                          AR(0.6)                     UN
d         0.90    0.95    0.99       0.90    0.95    0.99       0.90    0.95    0.99
5         0.8952  0.9498  0.9888     0.8852  0.9404  0.9848     0.9215  0.9596  0.9907
10        0.9006  0.9480  0.9890     0.8966  0.9432  0.9856     0.9091  0.9536  0.9910
50        0.9016  0.9526  0.9918     0.9016  0.9448  0.9848     0.9003  0.9499  0.9895
100       0.8994  0.9454  0.9880     0.8978  0.9458  0.9874     0.8993  0.9486  0.9897
200       0.8918  0.9432  0.9884     0.8968  0.9428  0.9860     0.8990  0.9500  0.9897

(see Campbell and Meyer (1979)). W_n can be used to define Hotelling's T² statistic, usually transformed to an F-statistic, as (see e.g. Davis (2002, Ch. 3))

$$ H_n = \frac{n - r}{(n - 1)\,r}\, W_n, \qquad (2) $$

where r and W_n are as defined above. Under H_0, the statistic H_n has a central F-distribution, i.e., H_n ~ F(r, n − r).

The ANOVA-type statistic, A_n, as considered by Box (1954), Brunner et al. (1997), Brunner et al. (1999) and Tian and Wilcox (2007), is based on the traces of products of matrices involving the sample covariance matrix and is defined as

$$ A_n = \frac{n\,\bar{X}' T \bar{X}}{tr(T \hat{\Sigma})}, \qquad (3) $$

where tr(·) denotes the trace of a square matrix. Further, A_n ~ χ²_f̂ / f̂ approximately, where f̂ = [tr(TΣ̂)]² / tr((TΣ̂)²) is the degree of freedom f estimated from the sample.

The rationale behind the ANOVA-type statistic stems from a shortcoming of the Wald-type statistic, W_n, which needs a very large sample size to yield a good approximation to the chi-square distribution. The approximation remains very bad if at least some of the diagonal elements of the covariance matrix estimator, Σ̂, are zero or near zero. The solution taken by A_n in this situation is to remove Σ̂ from the statistic W_n in Eq. (1); since T = H′(HH′)⁻H by definition, W_n then reduces to

$$ Q_n = n\,\bar{X}' T \bar{X}. \qquad (4) $$

Obviously, Q_n is not asymptotically distributed as a χ² random variable. This problem, however, can be approached using Box's approximation (Box, 1954).

First, we need the following well-known representation theorem for a quadratic form from the theory of linear models (Box, 1954; Graybill, 1976).

Theorem 1. Let X ~ N_d(0, Σ) and let T be any symmetric matrix. Then

$$ Q = X' T X \sim \sum_{i=1}^{d} \lambda_i C_i, $$

where the λ_i are the eigenvalues of TΣ and the C_i ~ χ²_1 are independent.

The quantity Q in Theorem 1 represents a quadratic form as a weighted sum of independent single-degree-of-freedom chi-square random variables. The idea is to approximate the distribution of Q in Theorem 1 by that of a scaled chi-square distribution, g·χ²_f, where g and f are chosen such that the first two moments of Q and g·χ²_f coincide. Such an approximation is referred to as the Box (1954) approximation, although the idea was first proposed by Patnaik (1949); see Box (1954) and Mathai and Provost (1992). Based on this approximation, we obtain (Box, 1954; Mathai and Provost, 1992, Ch. 4)

$$ f = \frac{[tr(T\Sigma)]^2}{tr((T\Sigma)^2)} \quad \text{and} \quad g = \frac{tr((T\Sigma)^2)}{tr(T\Sigma)}. $$
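Only two traces are needed to compute f and g for a given design; a minimal NumPy sketch (the function name box_f_g is ours):

```python
import numpy as np

def box_f_g(T, Sigma):
    """Box-approximation parameters for Q = X'TX with X ~ N(0, Sigma):
    f = [tr(T Sigma)]^2 / tr((T Sigma)^2),  g = tr((T Sigma)^2) / tr(T Sigma)."""
    TS = T @ Sigma
    t1 = np.trace(TS)        # tr(T Sigma) = E(Q)
    t2 = np.trace(TS @ TS)   # tr((T Sigma)^2) = Var(Q) / 2
    return t1 ** 2 / t2, t2 / t1
```

By Theorem 1, E(Q) = Σ λ_i = tr(TΣ) and Var(Q) = 2 Σ λ_i² = 2 tr((TΣ)²), so g·f = tr(TΣ) and 2g²f = 2 tr((TΣ)²), which is exactly the moment matching above.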

In order to investigate the quality of this approximation, let us first evaluate the performance of A_n when Σ is known. Table 1 reports the simulated test sizes of A_n for a compound symmetric (CS), an autoregressive (AR(1)) and an unstructured (UN) covariance pattern. A compound symmetric covariance structure is defined as Σ = κI + ρJ, where I is the unit matrix, J is the matrix of 1s and κ and ρ are appropriate constants. A covariance structure is autoregressive of order 1 if Cov(X_k, X_l) = κρ^|k−l| for all k, l. The unstructured covariance matrix refers to the SAS PROC MIXED unstructured pattern (UN), i.e. Σ = (σ_ij), i, j = 1, ..., d. Table 1 reports the results of 10,000 simulation runs for n = 10 and d ∈ {5, 10, 50, 100, 200}, assuming κ = ρ = 1 for CS; κ = 1, ρ = 0.6 for AR(1); and, for UN, variances σ_ii = i, i = 1, ..., d, with correlations ρ_ij = (i − 1)/d for i > j.

From Table 1 we observe that the ANOVA-type statistic maintains the preassigned level quite accurately when the population covariance is known, even for n as small as 10. This performance is the same for all three types of covariance and does not seem to depend on the dimension. The evidence from these results is that, provided the covariance matrix is known, Box's approximation is accurate whatever the dimension.
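For reproducibility, the three covariance patterns can be generated as follows; a minimal NumPy sketch, under our reading of the UN description (variances 1, ..., d, correlation (max(i, j) − 1)/d off the diagonal):

```python
import numpy as np

def cs(d, kappa=1.0, rho=1.0):
    """Compound symmetry: Sigma = kappa * I + rho * J."""
    return kappa * np.eye(d) + rho * np.ones((d, d))

def ar1(d, kappa=1.0, rho=0.6):
    """AR(1): Cov(X_k, X_l) = kappa * rho**|k - l|."""
    idx = np.arange(d)
    return kappa * rho ** np.abs(np.subtract.outer(idx, idx))

def un(d):
    """Unstructured pattern: sigma_ii = i and, for i != j (1-based),
    rho_ij = (max(i, j) - 1) / d; this is one reading of the paper's description."""
    i = np.arange(1, d + 1)
    R = (np.maximum.outer(i, i) - 1) / d
    np.fill_diagonal(R, 1.0)
    sd = np.sqrt(i)
    return np.outer(sd, sd) * R
```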


Table 2
Comparison of estimated quantiles for W_n in Eq. (1), H_n in Eq. (2) and A_n in Eq. (3); Σ unknown; 10,000 simulations; d ≶ n.

(d = 5)
Statistic   Quantile   n = 10   n = 15   n = 20   n = 30   n = 50
W_n         0.90       0.6315   0.7377   0.7898   0.8302   0.8592
            0.95       0.7062   0.8161   0.8573   0.8947   0.9179
            0.99       0.8189   0.9074   0.9390   0.9618   0.9761
H_n         0.90       0.8963   0.9008   0.9020   0.9008   0.8995
            0.95       0.9491   0.9506   0.9513   0.9503   0.9487
            0.99       0.9906   0.9906   0.9905   0.9895   0.9911
A_n         0.90       0.9042   0.9043   0.9035   0.9031   0.9004
            0.95       0.9536   0.9519   0.9515   0.9526   0.9534
            0.99       0.9913   0.9904   0.9889   0.9928   0.9921

(n = 10)
Statistic   Quantile   d = 5    d = 10   d = 20   d = 30   d = 50
W_n         0.90       0.6234   0.0435   0.9780   1.0000   1.0000
            0.95       0.7005   0.0546   0.9836   1.0000   1.0000
            0.99       0.8130   0.0840   0.9918   1.0000   1.0000
H_n         0.90       0.8949   0.8982   –        –        –
            0.95       0.9440   0.9484   –        –        –
            0.99       0.9881   0.9899   –        –        –
A_n         0.90       0.9014   0.9368   0.9719   0.9849   0.9966
            0.95       0.9532   0.9774   0.9938   0.9979   0.9997
            0.99       0.9902   0.9980   1.0000   1.0000   1.0000

Note that we also investigated the behavior of Box's approximation for other covariance structures, for example a heterogeneous AR(1) structure where ρ is the same but the variances differ, or the even more extreme case where both variances and correlations increase monotonically, and found results similar to those reported in Table 1.

In practice, of course, we need to estimate f and g, hence we need to estimate Σ, or at least some function of it. Using the classical estimator, the sample covariance matrix,

$$ \hat{\Sigma}_n = \frac{1}{n-1} \sum_{k=1}^{n} (X_k - \bar{X})(X_k - \bar{X})', $$

we note that the plug-in estimator of the functional tr(TΣ) is unbiased, but the plug-in estimators of [tr(TΣ)]² and tr((TΣ)²) are biased, which makes the estimator of the degrees of freedom, f, biased.

Table 2 gives the simulated type I error rates of the three test statistics W_n, H_n and A_n, as defined in Eqs. (1)–(3), respectively. For the upper part of the table, d = 5 is fixed and n ∈ {10, 15, 20, 30, 50}, while for the lower part, n = 10 is fixed and d ∈ {5, 10, 20, 30, 50}. Note that both A_n and H_n keep the nominal level when d is fixed and n increases, i.e. when d < n, but the Wald-type statistic is liberal. It becomes less and less liberal as n increases, but even at n = 50 the gap is rather large, especially when compared with the other two statistics. On the other hand, when n is fixed and d increases, the case of high-dimensional data, H_n breaks down completely for d > n and W_n ranges from being very liberal to very conservative, while A_n also becomes conservative as d increases. Note that both A_n and H_n keep the nominal level when d ≤ n, but W_n fails even in this range. One can use similar results to show that f̂, which is proportional to Box's ε̂ (Box, 1954), is biased, and that the bias increases with increasing d.

The results of Tables 1 and 2 hint at a possibility of improving A_n, while H_n and W_n appear totally unworkable in the case of high dimensionality. Note that classical statistics, like the Wald-type statistic, are used in practice when d > n and the covariance structure is simple, for example compound symmetric. An example of the mixed model approach to the analysis of repeated measures designs is given in Davis (2002, Ch. 6) for d > n, assuming simple covariance structures. But, as stated in Davis (2002, p. 140), the researcher is limited in the choice of covariance patterns with these methods.

To improve the performance of A_n, we need to define the component estimators of the statistic such that they are unbiased and consistent and such that their variances are uniformly bounded with respect to the dimension, d. In other words, the estimators must be consistent and dimensionally stable. To this end, we need Definition 2, which extends the concept of consistency of a sequence of estimators to an array of estimators which depend, simultaneously, on n and d.

Definition 2. An array of estimators θ̂_{n,d} of a functional θ_d is consistent if θ̂_{n,d} − θ_d → 0 in probability for all fixed d.

A straightforward calculation shows that consistency follows if, for all 1 ≤ d < ∞, d fixed, lim_{n→∞} E(θ̂_{n,d}) = θ_d and lim_{n→∞} Var(θ̂_{n,d}) = 0. In what follows, we shall show that, for the estimators we define for the modified A_n, E(θ̂_{n,d}) = θ_d for all n, and lim_{n→∞} Var(θ̂_{n,d})/θ_d² = 0, where these variances are uniformly bounded with respect to d, so that the quality of the approximation depends only on n. Finally, to settle the issue of dimensional stability, we shall ensure that the estimators defined for the modified ANOVA-type statistic satisfy the criteria given below.


Definition 3. An array of estimators θ̂_{n,d} of a functional θ_d is dimensionally stable if, for all d ≥ 1, n ≥ 1,

(1) E(θ̂_{n,d}) = θ_d,
(2) Var(θ̂_{n,d}/θ_d) ≤ B(n) < ∞,

where B(n) → 0 as n → ∞ and is uniformly bounded with respect to d.

Based on these results, we shall, in the next section, define the new estimators such that they are unbiased and consistent and their variances are uniformly bounded with respect to d.

3. The estimators

We have X_k ~ N(µ, Σ), Σ > 0, k = 1, ..., n. For H_0: Tµ = 0, let Y_k = TX_k. Under H_0, E(Y_k) = 0 and Var(Y_k) = Var(TX_k) = TΣT = S. We then define the estimator of the covariance matrix as

$$ \hat{S}_n = \frac{1}{n} \sum_{k=1}^{n} S_k = \frac{1}{n} \sum_{k=1}^{n} Y_k Y_k', \qquad (5) $$

where S_k = Y_k Y_k'.

It is straightforward to verify that Ŝ_n, under H_0, is an unbiased estimator of S. Let A_k = Y_k′Y_k and A_kl = Y_k′Y_l, k ≠ l, be, respectively, a quadratic and a symmetric bilinear form. Let B_0, B_1 and B_2 be the estimators of tr(TΣ), [tr(TΣ)]² and tr((TΣ)²), respectively, where

$$ B_0 = \frac{1}{n}\sum_{k=1}^{n} A_k, \qquad B_1 = \frac{1}{n(n-1)}\sum_{\substack{k,l=1 \\ k \neq l}}^{n} A_k A_l \qquad \text{and} \qquad B_2 = \frac{1}{n(n-1)}\sum_{\substack{k,l=1 \\ k \neq l}}^{n} A_{kl}^2. \qquad (6) $$
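The estimators in Eq. (6) depend on the data only through the n × n matrix of quadratic and bilinear forms, i.e. the Gram matrix of the transformed observations, so they can be computed in O(n²d) time without ever forming a d × d covariance matrix. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def trace_estimators(X, T):
    """Unbiased estimators B0, B1, B2 of tr(T Sigma), [tr(T Sigma)]^2 and
    tr((T Sigma)^2) from Eq. (6). X is an (n, d) data matrix, T a symmetric
    (d, d) hypothesis (projection) matrix."""
    Y = X @ T                        # rows are Y_k' = (T X_k)' since T is symmetric
    G = Y @ Y.T                      # G[k, l] = A_kl = Y_k' Y_l; diagonal holds A_k
    A = np.diag(G)
    n = len(A)
    B0 = A.sum() / n
    B1 = (A.sum() ** 2 - (A ** 2).sum()) / (n * (n - 1))    # sum over k != l of A_k A_l
    B2 = ((G ** 2).sum() - (A ** 2).sum()) / (n * (n - 1))  # sum over k != l of A_kl^2
    return B0, B1, B2
```

The identity used for B_1 is Σ_{k≠l} A_k A_l = (Σ_k A_k)² − Σ_k A_k²; for B_2, the off-diagonal sum of squares of G.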

The properties of these estimators are given in the following theorem, which is proved in Appendix B.

Theorem 4. For the estimators B_0, B_1 and B_2, as defined in Eq. (6), we have, under H_0: Tµ = 0,

$$ E(B_0) = tr(T\Sigma), \qquad E(B_1) = [tr(T\Sigma)]^2, \qquad E(B_2) = tr((T\Sigma)^2) $$

and

$$ Var\left(\frac{B_0}{tr(T\Sigma)}\right) \le \frac{2}{n}, \qquad Var\left(\frac{B_1}{[tr(T\Sigma)]^2}\right) \le \frac{8}{n-1}, \qquad Var\left(\frac{B_2}{tr((T\Sigma)^2)}\right) \le \frac{8}{n-1}. $$

It may be noted that the variances are uniformly bounded with respect to the dimension, d.

4. The approximating distribution

Now, we can re-write Q_n in Eq. (4) as

$$ Q_n = n\,\bar{X}' T \bar{X} = \frac{1}{n} \sum_{k=1}^{n} \sum_{l=1}^{n} A_{kl}, \qquad (7) $$

where A_kl is the bilinear form defined right before Eq. (6). Then the modified ANOVA-type statistic is defined as

$$ \tilde{A}_n = \frac{Q_n}{B_0} \qquad (8) $$

with degrees of freedom f̃ = B_1/B_2. Using the moments of quadratic and bilinear forms (Appendix A), we get the following results, which are proved in Appendix C.


Proposition 5. For Q_n and B_0, as defined above, we have

$$ E(Q_n) = tr(T\Sigma), \qquad (9) $$

$$ Var(Q_n) = 2\,tr((T\Sigma)^2) \qquad (10) $$

and

$$ Cov(Q_n, B_0) = \frac{2}{n}\,tr((T\Sigma)^2). \qquad (11) $$

The test statistic Ã_n = Q_n/B_0 is the ratio of two random variables. Using approximation formulas based on the Taylor series expansion, we can approximately compute the moments of the test statistic. We compute the moments of the statistic ignoring the remainders, O(1/n) for the mean and o(1/n) for the variance, and obtain the result as given, e.g., in Casella and Berger (2002, p. 245). We get

$$ E\left(\frac{X}{Y}\right) \approx \frac{E(X)}{E(Y)}, \qquad (12) $$

$$ Var\left(\frac{X}{Y}\right) \approx \frac{[E(X)]^2}{[E(Y)]^2}\left(\frac{Var(X)}{[E(X)]^2} + \frac{Var(Y)}{[E(Y)]^2} - 2\,\frac{Cov(X, Y)}{E(X)\,E(Y)}\right), \qquad (13) $$

where ≈ means 'approximately equal to'. Note also that the expression within parentheses for the variance of the statistic is the sum of the squared coefficients of variation of X and Y minus twice their joint coefficient of variation. Consequently, the smaller these coefficients of variation, the better the approximation.

Theorem 6. The first two moments of Ã_n are approximately computed as

$$ E(\tilde{A}_n) \approx 1 \qquad \text{and} \qquad Var(\tilde{A}_n) \approx \frac{2}{\tilde{f}}\left(1 - \frac{1}{n}\right), $$

where f̃ = B_1/B_2 with E(f̃) ≈ [tr(TΣ)]² / tr((TΣ)²).

Proof. The proof follows by substituting X = Q_n and Y = B_0 in Eqs. (12) and (13) and noting that E(Q_n) = E(B_0), as proved in the Appendix.

Note that the first two moments of the target distribution χ²_f̃ / f̃ are, respectively, 1 and 2/f̃. Comparing these with the corresponding moments of the new statistic, Ã_n, we observe that the sampling distribution of Ã_n is very closely approximated by the χ²_f̃ / f̃ distribution, especially when n → ∞. Moreover, the moments of Ã_n are uniformly bounded with respect to d. For convenience, we can write

$$ F_n = \tilde{A}_n \cdot \frac{B_1}{B_2}, \qquad (14) $$

such that E(F_n) ≈ f̃ and Var(F_n) ≈ 2f̃, where the moments now correspond to the first two moments of the χ²_f̃ distribution; hence F_n ≈ χ²_f̃, as n → ∞.
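Putting the pieces together, the test computes Q_n, the trace estimators, the statistic F_n and its estimated degrees of freedom f̃, and refers F_n to a χ²_f̃ distribution. A minimal sketch, reusing the trace_estimators helper from Section 3 and assuming SciPy for the χ² tail probability (the function name anova_type_test is ours):

```python
import numpy as np
from scipy import stats

def anova_type_test(X, T):
    """Modified ANOVA-type test of H0: T mu = 0 for an (n, d) data matrix X.
    Returns (F_n, f_tilde, p_value), with F_n approx. chi^2_{f_tilde} under H0."""
    n = X.shape[0]
    xbar = X.mean(axis=0)
    Qn = n * xbar @ T @ xbar                 # Q_n = n xbar' T xbar, Eq. (7)
    B0, B1, B2 = trace_estimators(X, T)      # unbiased trace estimators, Eq. (6)
    An = Qn / B0                             # modified ANOVA-type statistic, Eq. (8)
    f = B1 / B2                              # estimated degrees of freedom f_tilde
    Fn = An * f                              # F_n = A_n * B1/B2, Eq. (14)
    p = stats.chi2.sf(Fn, df=f)              # upper chi^2 tail probability
    return Fn, f, p
```

For the sleep lab design, for example, one would call anova_type_test(X, T_int) with the 10 × 18 data matrix X and the interaction matrix T_int = P_3 ⊗ P_6 from the sketch in Section 2.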

5. Simulation results

The performance of the test statistic is evaluated through simulations for different parameter settings and covariance structures. The results are reported both for level and power for the CS, AR(1) and UN covariance patterns introduced with reference to Table 1. Although not reported here, simulations were also carried out for AR(1) with ρ = 0.2 and for a heterogeneous autoregressive (ARH(1)) covariance structure, with results similar to those reported for the other covariance structures.

Table 3 reports the estimated 1 − α quantiles of Ã_n. These are the results of 10,000 simulation runs with n ∈ {10, 20, 50} and d ∈ {10, 20, 50, 100, 200, 500}, where α ∈ {0.01, 0.05, 0.10}. The test statistic estimates the nominal quantiles very closely, even for n as small as 10 and d as large as 500. We note that the closeness is affected neither by the increasing dimension nor by changing the covariance structure.

The power of Ã_n is also examined for n ∈ {10, 50}, d ∈ {10, 20, 50, 100} and the same covariance structures, CS, AR(0.6) and UN. The corresponding power curves are shown in Fig. 2.


Table 3
Estimated quantiles of Ã_n using B_0, B_1 and B_2; Σ unknown; 10,000 simulations; d > n.

               CS                          AR(0.6)                     UN
n     d        0.90    0.95    0.99       0.90    0.95    0.99       0.90    0.95    0.99
10    10       0.9033  0.9503  0.9940     0.8972  0.9472  0.9903     0.8985  0.9471  0.9923
      20       0.9008  0.9542  0.9944     0.8961  0.9480  0.9924     0.9017  0.9544  0.9925
      50       0.9072  0.9491  0.9938     0.8965  0.9485  0.9900     0.8967  0.9480  0.9911
      100      0.9084  0.9535  0.9949     0.8980  0.9482  0.9902     0.9012  0.9496  0.9915
      200      0.9089  0.9540  0.9935     0.8950  0.9472  0.9902     0.8973  0.9485  0.9913
      500      0.9102  0.9619  0.9944     0.9017  0.9499  0.9906     0.8974  0.9513  0.9914
20    50       0.9006  0.9476  0.9913     0.8971  0.9442  0.9873     0.9050  0.9520  0.9911
      100      0.9015  0.9522  0.9901     0.9025  0.9510  0.9892     0.8983  0.9491  0.9888
      200      0.9038  0.9524  0.9921     0.8983  0.9469  0.9884     0.8989  0.9524  0.9911
      500      0.9040  0.9521  0.9917     0.9018  0.9498  0.9903     0.8976  0.9514  0.9903
50    100      0.8965  0.9501  0.9900     0.9018  0.9503  0.9892     0.8933  0.9483  0.9894
      200      0.9017  0.9518  0.9924     0.8975  0.9465  0.9861     0.8949  0.9468  0.9879
      500      0.9018  0.9514  0.9910     0.9033  0.9497  0.9895     0.9028  0.9538  0.9915

Fig. 2. Power curves for Ã_n.

For these power computations, we define the alternative as µ_Y = δ_i d^{−1}, where µ_Y = E(Y_k) = E(TX_k), d^{−1} = (1/d, 2/d, ..., d/d)′ and δ_i is the ith element of the vector of constants δ = 0(0.02)2. We observe that the power of Ã_n is not only very high, even for n = 10, but also increases with increasing d; like the nominal level, the power is unaffected by changing the covariance structure. Note that, for the case n > d, i.e. when H_n can be computed, we compared Ã_n with H_n. The two statistics outperform each other depending on the type of alternative and the underlying covariance structure.
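A level/power simulation in this set-up takes only a few lines; a sketch, assuming the anova_type_test and ar1 helpers from the earlier sketches, where we shift the mean of X itself by δ(1/d, 2/d, ..., d/d)′ so that µ_Y = Tµ:

```python
import numpy as np

rng = np.random.default_rng(2008)

def rejection_rate(n, d, T, Sigma, delta, alpha=0.05, reps=2000):
    """Empirical rejection rate of the modified ANOVA-type test under the
    shift alternative mu = delta * (1/d, 2/d, ..., d/d)'."""
    mu = delta * np.arange(1, d + 1) / d
    L = np.linalg.cholesky(Sigma)
    hits = 0
    for _ in range(reps):
        X = mu + rng.standard_normal((n, d)) @ L.T   # X_k ~ N(mu, Sigma)
        _, _, p = anova_type_test(X, T)
        hits += p < alpha
    return hits / reps

# Example: level (delta = 0) and power (delta = 1) for n = 10, d = 50
d = 50
T = np.eye(d) - np.ones((d, d)) / d                  # unstructured design, T = P_d
print(rejection_rate(10, d, T, ar1(d, rho=0.6), 0.0))
print(rejection_rate(10, d, T, ar1(d, rho=0.6), 1.0))
```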


Table 4
Analysis of sleep data

Effect         F_n    f̃     p-value
Day            0.25   1.55   0.7200
Time           4.04   3.86   0.0032
Day × time     2.66   9.24   0.0040

NS–TSD         3.39   4.67   0.0057
NS–RN          2.51   5.00   0.0279
TSD–RN         1.54   7.60   0.1404

Time (NS)      4.54   2.98   0.0035
Time (TSD)     1.08   8.46   0.3709
Time (RN)      2.80   4.74   0.0175

Fig. 3. Day × time interaction plot for sleep lab data.

6. Analysis of sleep lab data

The test statistic is used to analyze the sleep lab example introduced in Section 1. The variable of interest is the serum concentration measured on each of n = 10 individuals at d = 18 time points. Fig. 1 gives the mean profiles of the individuals for three consecutive nights. We examine all three effects: the main effects of day and time and the effect of their interaction. Table 4 reports the results. The hypothesis matrices for the main effects of day and time are P_3 ⊗ (1/6)J_6 and (1/3)J_3 ⊗ P_6, respectively, while that for the interaction effect is P_3 ⊗ P_6. For each effect, the test statistic F_n, under H_0, follows a χ² distribution with f̃ degrees of freedom.

We observe highly significant time and interaction effects. Since the interaction effect is significant (Fig. 3), we stratify it to study, for example, the effect for each pair of days. Note that, in this case, n = 10 and d = 12. The middle panel of the table gives the results for the three interactions for the pairs of days, labeled Normal Sleep (NS), Total Sleep Deprivation (TSD) and Recovery Night (RN). Based on the closure testing principle, we observe that the interaction is significant for the pairs in which NS is common, i.e. NS–TSD and NS–RN. Since a significant interaction effect can be better interpreted through the simple effects, the simple effects of time for each day level are also shown in the last panel of the table.

7. Discussion and conclusions

A modification of the ANOVA-type statistic, based on Box's approximation, is presented to analyze repeated measures data when the dimension is large compared to the number of replications. The statistic, F_n, asymptotically follows a χ²_f distribution and is based on unbiased, consistent and dimensionally stable estimators of the traces tr(TΣ), [tr(TΣ)]² and tr((TΣ)²). The estimators are defined such that the quality of the approximation to a χ² distribution is not affected by a very large dimension. The advantage is that, although we let n → ∞ in the derivation of the asymptotic distribution, the performance of the statistic, regarding control of the preassigned level and power, is in practice not negatively affected by large values of the dimension d, which is assumed fixed in the asymptotic derivations. The behavior of the statistic for small samples is evaluated through simulation studies.

Note that the statistic is not intrinsically invariant under linear transformations, but remains so in practice since, in a repeated measures set-up, measurements must be made on a commensurate scale. This implies that the statistic is also


invariant when each component undergoes the same linear transformation, even if the design set-up is not one of repeated measurements, for example the usual multivariate design set-up.

This modification of the ANOVA-type statistic extends the idea presented by Brunner et al. (1997), and it is shown that the modified ANOVA-type statistic, Ã_n, and hence Box's approximation, remains valid even if the data are high dimensional, provided appropriate estimators of the traces are used. Clearly, the statistic and the corresponding estimators are defined under H_0, but this does not limit the use of the test statistic in practice, as the results are derived under quite general conditions, especially regarding the structure of the covariance matrix.

Remark 7. We have derived the modified ANOVA-type statistic under the assumption of multivariate normality. It may be of interest to note that the statistic is robust to the normality assumption. In ongoing research, we are working on the robustness of the statistic to the normality assumption. The preliminary results are quite promising. This will be the subject of another paper.

Remark 8. As pointed out in Section 2, the test statistic Ã_n is valid for any general linear hypothesis, including the one for T = I, which transforms the general hypothesis H_0: Tµ = 0 into the usual multivariate hypothesis. Although the results are not reported here, one can easily verify that the test statistic Ã_n remains valid, with all its properties, for testing H_0: µ = 0. This is a topic of future research. Note that, for this specific case of the multivariate hypothesis, there has been some recent work by other researchers, for example Bai and Saranadasa (1996), Srivastava and Fujikoshi (2006), Srivastava (2007) and Srivastava and Du (2007).

Remark 9. Two main areas of interest where high-dimensional data frequently occur are microarray analysis and proteomics, where we usually have thousands of observations measured on a few individuals. Indeed, the analysis of microarray data gave impetus to the search for a useful test statistic which works under high dimensionality. It may be noted that the statistic F_n may also be applied to the analysis of microarray data. It is expected that the quality of the approximation of the statistic is maintained even if the dimension is in the thousands.

Acknowledgements

We are thankful to the associate editor and the anonymous referees for their helpful suggestions which led to a substantial improvement of the manuscript. This manuscript is part of the first author's Ph.D. thesis, for which he is thankful to the Higher Education Commission, Pakistan, and the German Academic Exchange Service (DAAD), Germany, for their financial support.

Appendix A. Quadratic and bilinear forms

Let A_k and A_kl, k ≠ l, be the quadratic and bilinear forms, respectively, as defined in Section 3. We state the following theorem without proof. For details see Searle (1971) and Mathai and Provost (1992).

Theorem 10. We have X_k ~ N(µ, Σ), Σ > 0. For the transformation Y_k = TX_k, where Y_k ~ N(0, TΣT) under H_0, the rth cumulant of the quadratic form Y_k′Y_k is given by

$$ \kappa_r(Y_k' Y_k) = 2^{r-1}(r-1)!\; tr((T\Sigma)^r), \quad r = 1, 2, \ldots. \qquad (A.1) $$
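For r = 1, 2, Eq. (A.1) gives E(A_k) = tr(TΣ) and Var(A_k) = 2 tr((TΣ)²); a quick Monte Carlo check of these two cumulants, assuming the centering and ar1 helpers from the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(416)
d = 20
T, Sigma = centering(d), ar1(d, rho=0.6)
L = np.linalg.cholesky(Sigma)

Y = (rng.standard_normal((100_000, d)) @ L.T) @ T    # Y_k = T X_k with X_k ~ N(0, Sigma)
A = np.einsum('ij,ij->i', Y, Y)                      # A_k = Y_k' Y_k, row-wise

TS = T @ Sigma
print(A.mean(), np.trace(TS))                # kappa_1: should agree with tr(T Sigma)
print(A.var(), 2 * np.trace(TS @ TS))        # kappa_2: should agree with 2 tr((T Sigma)^2)
```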

For the moments of a bilinear form, we first write it as a quadratic form:

$$ Y_k' Y_l = \frac{1}{2} \begin{pmatrix} Y_k' & Y_l' \end{pmatrix} \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \begin{pmatrix} Y_k \\ Y_l \end{pmatrix} = \frac{1}{2} Z' A Z, \qquad (A.2) $$

where Z = (Y_k′, Y_l′)′ with mean µ = (0′, 0′)′, covariance matrix

$$ V = \begin{pmatrix} T\Sigma T & 0 \\ 0 & T\Sigma T \end{pmatrix} \quad \text{and} \quad A = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}. $$

Then we have the following result (Searle, 1971).

Theorem 11. Let Y_k′Y_l be a bilinear form, expressed as a quadratic form as in Eq. (A.2), with A and V also as defined above. Then the rth cumulant of Y_k′Y_l is computed as

$$ \kappa_r(Y_k' Y_l) = \frac{1}{2}(r-1)!\; tr((AV)^r), \quad r = 1, 2, \ldots, \; k \neq l. \qquad (A.3) $$

From these general results, it is straightforward to compute the first four moments of quadratic and bilinear forms using the moment-cumulant relationships (see Stuart and Ord (1994, Ch. 3)).


For the joint moments we have the following theorem.

Theorem 12.

$$ Cov(A_k A_l, A_r A_s) = \begin{cases} 0, & k \neq r,\; l \neq s \\ 2\,tr((T\Sigma)^2)\,[tr(T\Sigma)]^2, & k \neq r,\; l = s;\;\; k = r,\; l \neq s \\ 4\,[tr((T\Sigma)^2)]^2 + 4\,tr((T\Sigma)^2)\,[tr(T\Sigma)]^2, & k = r,\; l = s \end{cases} \qquad (A.4) $$

$$ Cov(A_{kl}, A_{rs}) = \begin{cases} 0, & k \neq r,\; l \neq s;\;\; k = r,\; l \neq s;\;\; k \neq r,\; l = s \\ tr((T\Sigma)^2), & k = r,\; l = s \end{cases} \qquad (A.5) $$

$$ Cov(A_{kl}^2, A_{rs}^2) = \begin{cases} 0, & k \neq r,\; l \neq s \\ 2\,tr((T\Sigma)^4), & k = r,\; l \neq s;\;\; k \neq r,\; l = s \\ 6\,tr((T\Sigma)^4) + 2\,[tr((T\Sigma)^2)]^2, & k = r,\; l = s \end{cases} \qquad (A.6) $$

$$ Cov(A_{kl}, A_m) = 0, \quad k = m \neq l;\;\; l = m \neq k, \qquad (A.7) $$

where A_k, A_l, A_r, A_s, A_m are the quadratic forms and A_kl, k ≠ l, and A_rs, r ≠ s, are the bilinear forms defined in Section 3. Further, it is assumed that k ≠ l, r ≠ s.

Proof. The first parts of Eqs. (A.4) and (A.6), and of Eq. (A.5) for k ≠ r, l ≠ s, are clear from the independence of the quadratic and bilinear forms involved. For the second part of Eq. (A.4), let k = r, l ≠ s. Then, by independence,

$$ Cov(A_k A_l, A_k A_s) = E(A_k A_l A_k A_s) - E(A_k A_l) E(A_k A_s) = E(A_k^2)[E(A_k)]^2 - [E(A_k)]^4 = 2\,tr((T\Sigma)^2)[tr(T\Sigma)]^2, $$

as required. The last part of Eq. (A.4) is trivial since, for k = r, l = s, Cov(A_k A_l, A_k A_s) = Var(A_k A_l), and the result follows immediately from Theorem 10 using the fact that k ≠ l.

For Eq. (A.5), we first note that Cov(A_kl, A_ks) = E(A_kl A_ks), since E(A_kl) = 0. Now, for the second and third conditions of the first part of Eq. (A.5), let k = r, l ≠ s, so that

$$ Cov(A_{kl}, A_{ks}) = E(A_{kl} A_{ks}) = E(Y_k' Y_l Y_k' Y_s) = E(Y_l' Y_k Y_k' Y_s) = E[tr(Y_l' Y_k Y_k' Y_s)] = E[tr(Y_s Y_l' Y_k Y_k')] = tr[E(Y_s Y_l' Y_k Y_k')] = 0, $$

since k ≠ l ≠ s and E(Y_k) = 0 under H_0. Finally, when k = r, l = s, then Cov(A_kl, A_rs) = Var(A_kl) = tr((TΣ)²) from Theorem 11.

The most convenient way to prove the second part of Eq. (A.6) is to use the eigenvalue representation of a bilinear form directly. We know that E(Y_k) = 0 = E(Y_l), Var(Y_k) = TΣT = Var(Y_l) and Cov(Y_k, Y_l) = 0, k ≠ l. Then we can write Y_k′Y_l = Σ_{j=1}^d λ_j U_j V_j, where the λ_j are the eigenvalues of TΣ and U = (U_1, ..., U_d)′, V = (V_1, ..., V_d)′ are such that E(U) = 0 = E(V), Var(U) = I = Var(V) and Cov(U, V) = 0. Note that this representation needs no distributional assumption, normality or otherwise; but since we have Y_k ~ N(0, TΣT), we get E(U_i²) = 1 and E(U_i⁴) = 3. Then, for k = r, l ≠ s, we have Cov(A²_kl, A²_ks) = E(A²_kl A²_ks) − [E(A²_kl)]², where E(A²_kl) = tr((TΣ)²), from Theorem 11, and

$$ E(A_{kl}^2 A_{ks}^2) = E\left[\left(\sum_j \lambda_j U_j V_j\right)^{2} \left(\sum_j \lambda_j U_j W_j\right)^{2}\right] = E\left[\sum_j \lambda_j^4 U_j^4 V_j^2 W_j^2 + \sum_j \sum_{k \neq j} \lambda_j^2 \lambda_k^2 U_j^2 V_j^2 U_k^2 W_k^2\right] $$

$$ = 3 \sum_j \lambda_j^4 + \sum_j \sum_{k \neq j} \lambda_j^2 \lambda_k^2 = 2 \sum_j \lambda_j^4 + \sum_j \sum_k \lambda_j^2 \lambda_k^2 = 2\,tr((T\Sigma)^4) + \left[tr((T\Sigma)^2)\right]^2, $$


so that Cov(A²_kl, A²_ks) = 2 tr((TΣ)⁴). The last part of Eq. (A.6), when k = r, l = s, is trivial since then Cov(A²_kl, A²_ks) = Var(A²_kl), and the result follows immediately from Theorem 11, noting that k ≠ l. The last identity is the covariance of a bilinear form and a quadratic form. Since E(A_kl) = 0, we have, for k = m ≠ l,

$$ Cov(A_{kl}, A_k) = E(A_{kl} A_k) = E(Y_k' Y_l Y_k' Y_k) = E(Y_l' Y_k Y_k' Y_k) = 0, $$

since k ≠ l and E(Y_l) = 0 under H_0. This completes the proof.

Appendix B. Proof of Theorem 4

We use the results of Theorems 10–12 and proceed as follows. The unbiasedness is straightforward from the moments of quadratic and bilinear forms and the independence when k ≠ l. We begin with Var(B_0). We have

$$ Var(B_0) = \frac{1}{n^2} \sum_{k=1}^{n} Var(A_k) = \frac{2}{n}\, tr((T\Sigma)^2). $$

Now, with k ≠ l, r ≠ s, we have

$$ Var(B_1) = \frac{1}{n^2(n-1)^2}\Bigg[\sum_{\substack{k,l=1 \\ k \neq l}}^{n} Var(A_k A_l) + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} \sum_{\substack{r,s=1,\; r \neq s \\ k \neq r,\, l = s;\; k = r,\, l \neq s}}^{n} Cov(A_k A_l, A_r A_s)\Bigg] $$

$$ = \frac{1}{n^2(n-1)^2}\Big[2n(n-1)\big(4\,[tr((T\Sigma)^2)]^2 + 4\,tr((T\Sigma)^2)[tr(T\Sigma)]^2\big) + 4n(n-1)(n-2)\cdot 2\,tr((T\Sigma)^2)[tr(T\Sigma)]^2\Big] $$

$$ = \frac{8}{n(n-1)}\Big[[tr((T\Sigma)^2)]^2 + (n-1)\,tr((T\Sigma)^2)[tr(T\Sigma)]^2\Big]. $$

Similarly,

$$ Var(B_2) = \frac{1}{n^2(n-1)^2}\Bigg[\sum_{\substack{k,l=1 \\ k \neq l}}^{n} Var(A_{kl}^2) + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} \sum_{\substack{r,s=1,\; r \neq s \\ k \neq r,\, l = s;\; k = r,\, l \neq s}}^{n} Cov(A_{kl}^2, A_{rs}^2)\Bigg] $$

$$ = \frac{1}{n^2(n-1)^2}\Big[2n(n-1)\big(6\,tr((T\Sigma)^4) + 2\,[tr((T\Sigma)^2)]^2\big) + 4n(n-1)(n-2)\cdot 2\,tr((T\Sigma)^4)\Big] $$

$$ = \frac{4}{n(n-1)}\Big[(2n-1)\,tr((T\Sigma)^4) + [tr((T\Sigma)^2)]^2\Big]. $$

Clearly, all three variances vanish as n → ∞ for any fixed d, proving the estimators consistent. Further, we observe that

$$ Var\left(\frac{B_0}{tr(T\Sigma)}\right) = \frac{2}{n}\,\frac{tr((T\Sigma)^2)}{[tr(T\Sigma)]^2} \le \frac{2}{n}, $$

$$ Var\left(\frac{B_1}{[tr(T\Sigma)]^2}\right) = \frac{8}{n(n-1)}\left(\frac{[tr((T\Sigma)^2)]^2}{[tr(T\Sigma)]^4} + (n-1)\,\frac{tr((T\Sigma)^2)}{[tr(T\Sigma)]^2}\right) \le \frac{8}{n-1}, $$

$$ Var\left(\frac{B_2}{tr((T\Sigma)^2)}\right) = \frac{4}{n(n-1)}\left(\frac{(2n-1)\,tr((T\Sigma)^4)}{[tr((T\Sigma)^2)]^2} + 1\right) \le \frac{8}{n-1}. $$

The inequalities follow immediately by noting that, for two matrices A ≥ 0 and B ≥ 0, tr(AB) ≤ tr(A)tr(B) (see Zhang (1999, p. 166)).


Appendix C. Proof of Proposition 5

Here, Q_n is the quadratic form of the test statistic, and from Eq. (7) we have Q_n = (1/n) Σ_{k=1}^n Σ_{l=1}^n A_kl = n X̄′TX̄. Since Var(X̄) = (1/n)Σ, the first two identities follow directly from the moments of a quadratic form. Hence E(Q_n) = tr(TΣ) and Var(Q_n) = 2 tr((TΣ)²). For the covariance of Q_n and B_0, we write Q_n as

$$ Q_n = \frac{1}{n}\Bigg[\sum_{k=1}^{n} A_k + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} A_{kl}\Bigg], \qquad (C.1) $$

where A_k is the quadratic form when k = l and A_kl is the bilinear form when k ≠ l. Since B_0 = (1/n) Σ_{k=1}^n A_k, we obtain

$$ Cov(Q_n, B_0) = \frac{1}{n^2}\Bigg[Cov\Bigg(\sum_{\substack{k,l=1 \\ k \neq l}}^{n} A_{kl}, \sum_{m=1}^{n} A_m\Bigg) + Cov\Bigg(\sum_{k=1}^{n} A_k, \sum_{m=1}^{n} A_m\Bigg)\Bigg] = \frac{1}{n^2}\, Var\Bigg(\sum_{k=1}^{n} A_k\Bigg) = \frac{2}{n}\, tr((T\Sigma)^2), $$

using Eq. (A.7). This completes the proof.

References

Bai, Z., Saranadasa, H., 1996. Effect of high dimension: By an example of a two sample problem. Statistica Sinica 6, 311–329.
Bathke, A., 2002. ANOVA for a large number of treatments. Mathematical Methods of Statistics 11 (1), 118–132.
Bathke, A., Harrar, S.W., 2008. Nonparametric methods in multivariate factorial designs for large number of factor levels. Journal of Statistical Planning and Inference 138 (3), 588–610.
Bathke, A., Harrar, S.W., Madden, L., 2008. How to compare small multivariate samples using nonparametric tests? Computational Statistics and Data Analysis 52 (11), 4951–4965.
Box, G.E.P., 1954. Some theorems on quadratic forms applied in the study of analysis of variance problems I: Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics 25 (2), 290–302.
Brunner, E., Dette, H., Munk, A., 1997. Box-type approximations in nonparametric factorial designs. Journal of the American Statistical Association 92 (440), 1494–1502.
Brunner, E., Munzel, U., Puri, M., 1999. Rank score tests in factorial designs with repeated measures. Journal of Multivariate Analysis 70, 286–317.
Campbell, S.L., Meyer, C.D., 1979. Generalized Inverses of Linear Transformations. Dover Publications, New York.
Casella, G., Berger, R.L., 2002. Statistical Inference, 2nd ed. Duxbury Press, Pacific Grove, CA.
Crowder, M.J., Hand, D.J., 1990. Analysis of Repeated Measures. Chapman and Hall, London.
Davis, C.S., 2002. Statistical Methods for the Analysis of Repeated Measurements. Springer, New York.
Graybill, F., 1976. Theory and Application of Linear Models. Duxbury Press, MA.
Hand, D.J., Taylor, C.C., 1987. Multivariate Analysis of Variance and Repeated Measures. Chapman and Hall, London.
Harrar, S.W., Bathke, A., 2008. Nonparametric methods for unbalanced multivariate data and many factor levels. Journal of Multivariate Analysis 99 (8), 1635–1664.
Jordan, W., Tumani, H., Cohrs, S., et al., 2004. Prostaglandin D synthase (β-trace) in healthy human sleep. Sleep 27 (5), 867–874.
Kenward, M.G., 1987. A method for comparing profiles of repeated measurements. Applied Statistics 36 (3), 296–308.
Mathai, A.M., Provost, S.B., 1992. Quadratic Forms in Random Variables. Marcel Dekker Inc., New York.
Patnaik, P.B., 1949. The non-central χ²- and F-distributions and their applications. Biometrika 36, 202–232.
Rao, C.R., 1973. Linear Statistical Inference and its Applications, 2nd ed. Wiley, New York.
Searle, S.R., 1971. Linear Models. Wiley, New York.
Srivastava, M.S., 2007. Multivariate theory for analyzing high dimensional data. Journal of Japanese Statistical Association 37 (1), 53–86.
Srivastava, M.S., Du, M., 2007. A test for the mean vector with fewer observations than the dimension. Journal of Multivariate Analysis 99, 386–402.
Srivastava, M.S., Fujikoshi, Y., 2006. Multivariate analysis of variance with fewer observations than the dimension. Journal of Multivariate Analysis 97, 1927–1940.
Stuart, A., Ord, J.K., 1994. Kendall's Advanced Theory of Statistics, Vol. 1: Distribution Theory, 6th ed. Arnold, London.
Tian, T., Wilcox, R., 2007. A comparison of two rank tests for repeated measures designs. Journal of Modern Applied Statistical Methods 6 (1), 331–335.
Timm, N.H., 2002. Applied Multivariate Analysis. Springer, New York.
Zhang, F., 1999. Matrix Theory: Basic Results and Techniques. Springer, New York.