Journal of Statistical Planning and Inference 139 (2009) 3962 -- 3973
Estimation of variance components in the mixed effects models: A comparison between analysis of variance and spectral decomposition

Mi-Xia Wu a,b,∗, Kai-Fun Yu b, Ai-Yi Liu b

a College of Applied Sciences, Beijing University of Technology, Beijing 100022, PR China
b Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH/DHHS, 6100 Executive Boulevard, Rockville, MD 20852, USA
∗ Corresponding author at: College of Applied Sciences, Beijing University of Technology, Beijing 100124, PR China. E-mail address: [email protected] (M.-X. Wu).
doi:10.1016/j.jspi.2009.03.014
ARTICLE INFO

Article history: Received 14 January 2008; received in revised form 12 August 2008; accepted 14 March 2009; available online 27 March 2009.
Keywords: Analysis of variance; Best linear unbiased estimate; Least squares estimate; Mixed effects model; Spectral decomposition

ABSTRACT

The mixed effects models with two variance components are often used to analyze longitudinal data. For these models, we compare two approaches to estimating the variance components, the analysis of variance approach and the spectral decomposition approach. We establish a necessary and sufficient condition for the two approaches to yield identical estimates, and some sufficient conditions for the superiority of one approach over the other, under the mean squared error criterion. Applications of the methods to circular models and longitudinal data are discussed. Furthermore, simulation results indicate that better estimates of variance components do not necessarily imply higher power of the tests or shorter confidence intervals. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

In the past two decades, the mixed effects linear model has received considerable attention from both theoretical and practical points of view due to its extensive applications, for example, in analyzing longitudinal data arising from the health sciences, computer graphics and mechanical engineering. For a comprehensive overview of this model, see Davidian and Giltinan (1996), Diggle et al. (2002) and Demidenko (2004).

Consider the mixed effects linear model with two variance components

y_it = x_it'β + u_i + ε_it,  i = 1, ..., N, t = 1, ..., T,

where β is a p × 1 vector of fixed effects, u_i is a random effect, and u_i and ε_it are assumed to be independent and normally distributed with mean 0 and variances σ_u² and σ_ε², respectively. Writing this model in matrix form, we have

y = Xβ + Zu + ε,   (1)

where y and X are of dimensions n × 1 and n × p, respectively, n = NT, Z = I_N ⊗ 1_T, ⊗ denotes the Kronecker product, I_N is the identity matrix of order N, 1_T is a vector of ones of dimension T, u = (u_1, u_2, ..., u_N)', and ε = (ε_11, ..., ε_1T, ..., ε_N1, ..., ε_NT)'. Thus the covariance matrix of y is given by

Cov(y) = V(σ²) = σ_u² ZZ' + σ_ε² I_n.   (2)
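For concreteness, the design matrix Z = I_N ⊗ 1_T and the covariance structure (2) can be assembled in a few lines. The following is a minimal NumPy sketch; the dimensions and variable names are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

N, T = 4, 3                                   # subjects and repeated measures (illustrative)
n = N * T
Z = np.kron(np.eye(N), np.ones((T, 1)))       # Z = I_N ⊗ 1_T, of size n × N

def cov_y(sigma2_u, sigma2_eps):
    """Cov(y) = sigma_u^2 ZZ' + sigma_eps^2 I_n, as in (2)."""
    return sigma2_u * Z @ Z.T + sigma2_eps * np.eye(n)
```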
Analysis of variance is one of the popular methods in the statistical literature to estimate the variance components σ_u² and σ_ε². Compared to other estimates such as the maximum likelihood estimate (MLE), the restricted maximum likelihood estimate (REMLE) and the minimum norm quadratic unbiased estimate (MINQUE), the analysis of variance estimate (ANOVAE) has simple closed forms that make it feasible to study analytically its small sample behavior and to construct exact tests and confidence intervals for the variance components based on the estimates; see, e.g., Wald (1940), Searle et al. (1992), and Burch and Iyer (1997). In contrast, when the parameter lies on the boundary of the parameter space, the asymptotic distributions of the likelihood-based tests, such as the likelihood ratio tests, do not follow chi-square distributions, as shown in Stram and Lee (1994) and Crainiceanu and Ruppert (2004). Moreover, solutions based on maximum likelihood estimates and restricted maximum likelihood estimates usually have very poor performance in mixed models with relatively small sample sizes; see Burdick and Larsen (1997). Exact tests and confidence intervals are often required in practice.

For the two-variance-component mixed effects model (1), a new approach to estimating the variance components, referred to as the spectral decomposition, was recently developed by Wang and Yin (2002). The resulting estimate, the spectral decomposition estimate (SDE), possesses similar advantages as the ANOVAE, having a simple closed form and allowing the construction of exact tests and confidence intervals.

The present paper serves as the first in the literature to compare analytically the small sample properties of the ANOVAE and SDE under model (1). In Section 2 we introduce the ANOVAE and SDE of the variance components. A necessary and sufficient condition is derived in Section 3 under which the two methods yield identical estimates. In Section 4, a relationship between the two estimates is established, and some conditions are found under which one estimate is superior to the other in terms of mean squared error (MSE). Exact tests and confidence intervals based on the two estimates are constructed in Section 5. Applications to two special models are given in Section 6. Section 7 presents some simulation results and some discussions are given in Section 8.

2. ANOVA and spectral decomposition

Denote by A⁻ a generalized inverse of a matrix A, and write P_A = A(A'A)⁻A'. Then P_A is the orthogonal projector onto M(A), the column space of a matrix A. Let Q_A = I − P_A. Using Henderson's Method III (Henderson, 1953; or Searle et al., 1992), we can obtain the ANOVAE of σ_ε² and σ_u² as
σ̂_ε² = y'(I − P_(X:Z))y / (n − r_0),   (3)

σ̂_u² = {y'(P_(X:Z) − P_X)y − (r_0 − rk(X))σ̂_ε²} / {T tr(Q_X P_Z)},   (4)
where (X : Z) is the n × (p + N) matrix consisting of the column vectors of the matrices X and Z, rk(A) and tr(A) stand for the rank and trace of a matrix A, respectively, and r_0 = rk(X : Z).

Note that the covariance matrix of y can be decomposed as Cov(y) = λP_Z + σ_ε²Q_Z, where λ = Tσ_u² + σ_ε². Left-multiplying model (1) by P_Z and Q_Z, respectively, gives

P_Z y = P_Z Xβ + ε_1,  ε_1 ∼ N(0, λP_Z),
Q_Z y = Q_Z Xβ + ε_2,  ε_2 ∼ N(0, σ_ε²Q_Z),
both being singular linear models. From the unified theory of least squares (Rao, 1973), we can use

σ̃_ε² = y'(Q_Z − Q_Z X(X'Q_Z X)⁻X'Q_Z)y / b,   (5)

λ̃ = y'(P_Z − P_Z X(X'P_Z X)⁻X'P_Z)y / m,   (6)

where m = N − rk(P_Z X) and b = rk(Q_Z) − rk(Q_Z X), to estimate σ_ε² and λ, respectively. An estimate of σ_u² is then given by

σ̃_u² = (λ̃ − σ̃_ε²) / T.   (7)
Wang and Yin (2002) derived the two estimates σ̃_ε² and σ̃_u² and termed them the spectral decomposition estimates of σ_ε² and σ_u², respectively. It is easy to verify (Craig's theorem; see Rao, 1973) that the two quadratic forms, y'(P_Z − P_Z X(X'P_Z X)⁻X'P_Z)y and y'(Q_Z − Q_Z X(X'Q_Z X)⁻X'Q_Z)y, are independent, and distributed as λχ²_m and σ_ε²χ²_b, respectively, where χ²_k denotes a χ² variable with k degrees of freedom. Using this we can construct exact tests and confidence intervals for σ_ε² or σ_u² based on these quadratic forms; see Section 5 for details.
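To make the two estimators concrete, the following sketch computes the ANOVAE (3)–(4) and the SDE (5)–(7) directly from the projector formulas. It is an illustrative Python/NumPy implementation under our own naming (proj, variance_component_estimates), not code from the paper.

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto the column space of A (pinv serves as a g-inverse)."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

def variance_component_estimates(y, X, Z, T):
    """ANOVAE (3)-(4) and SDE (5)-(7) of (sigma_eps^2, sigma_u^2) under model (1)."""
    n = len(y)
    I = np.eye(n)
    PX, PZ, PXZ = proj(X), proj(Z), proj(np.hstack([X, Z]))
    QX, QZ = I - PX, I - PZ
    r0 = np.linalg.matrix_rank(np.hstack([X, Z]))
    rX = np.linalg.matrix_rank(X)

    # ANOVAE, Henderson's Method III: eqs. (3) and (4)
    s2_eps_hat = y @ (I - PXZ) @ y / (n - r0)
    s2_u_hat = (y @ (PXZ - PX) @ y - (r0 - rX) * s2_eps_hat) / (T * np.trace(QX @ PZ))

    # SDE: eqs. (5)-(7), based on the two singular models P_Z y and Q_Z y
    b = np.linalg.matrix_rank(QZ) - np.linalg.matrix_rank(QZ @ X)
    m = Z.shape[1] - np.linalg.matrix_rank(PZ @ X)            # N - rk(P_Z X)
    MZ = PZ - PZ @ X @ np.linalg.pinv(X.T @ PZ @ X) @ X.T @ PZ
    NZ = QZ - QZ @ X @ np.linalg.pinv(X.T @ QZ @ X) @ X.T @ QZ
    s2_eps_til = y @ NZ @ y / b     # numerically equals the ANOVAE of sigma_eps^2 (Section 3)
    lam_til = y @ MZ @ y / m
    s2_u_til = (lam_til - s2_eps_til) / T
    return (s2_eps_hat, s2_u_hat), (s2_eps_til, s2_u_til)
```

Whether the two σ_u² estimates returned here coincide or differ for a given design is exactly the question addressed in Sections 3 and 4.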
3. Conditions for equality

We first notice that the ANOVA and spectral decomposition approaches lead to the same estimate for σ_ε², as stated below.

Theorem 3.1. Under model (1) the ANOVAE and SDE of σ_ε² are identical, that is, σ̂_ε² = σ̃_ε².

Proof. It follows from M(A : B) = M(A : Q_A B) and A'Q_A B = 0 that

P_(A:B) = P_A + Q_A B(B'Q_A B)⁻B'Q_A   (8)

and rk(A : B) = rk(A) + rk(Q_A B) for any matrices A and B with the same number of rows. Thus

Q_Z − Q_Z X(X'Q_Z X)⁻X'Q_Z = I − (P_Z + Q_Z X(X'Q_Z X)⁻X'Q_Z) = I − P_(X:Z),
b = rk(Q_Z) − rk(Q_Z X) = n − r_0.

Therefore σ̂_ε² = σ̃_ε².

However, the estimates of σ_u² from the ANOVA and spectral decomposition approaches can be quite different. We derive below conditions under which the two estimates, σ̃_u² and σ̂_u², are identical. Write ξ = y − Xβ,

A_0 = [(P_(X:Z) − P_X) − (r_0 − rk(X))(I − P_(X:Z))/(n − r_0)] / {T tr(Q_X P_Z)},
A_1 = [(P_Z − P_Z X(X'P_Z X)⁻X'P_Z)/(N − rk(P_Z X)) − (Q_Z − Q_Z X(X'Q_Z X)⁻X'Q_Z)/(n − r_0)] / T.
Clearly, both σ̃_u² and σ̂_u² are unbiased estimators of σ_u², and they can be re-expressed as σ̂_u² = ξ'A_0ξ and σ̃_u² = ξ'A_1ξ, where ξ ∼ N(0, V(σ²)). Using the fact (Wang and Chow, 1994) that Var(ξ'Aξ) = 2 tr(AVAV) for any known symmetric matrix A, we have

Var(σ̂_u²) = 2σ_ε⁴(n − rk(X))(r_0 − rk(X)) / {T²[tr(Q_X P_Z)]²(n − r_0)} + 4σ_u²σ_ε² / {T tr(Q_X P_Z)} + 2σ_u⁴ tr[(Q_X P_Z)²] / [tr(Q_X P_Z)]²   (9)

and

Var(σ̃_u²) = 2σ_ε⁴(n + N − r_0 − rk(P_Z X)) / {T²[N − rk(P_Z X)](n − r_0)} + 4σ_u²σ_ε² / {T[N − rk(P_Z X)]} + 2σ_u⁴ / [N − rk(P_Z X)].   (10)
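Both (9) and (10) involve only simple functions of the design, so they can be evaluated directly. The sketch below (Python/NumPy; helper names are ours) computes the two variances for given values of (σ_u², σ_ε²), following (9) and (10) as written above.

```python
import numpy as np

def proj(A):
    # Orthogonal projector onto M(A)
    return A @ np.linalg.pinv(A.T @ A) @ A.T

def var_anovae_sde(X, Z, T, s2_u, s2_eps):
    """Evaluate Var(sigma_u^2-hat) and Var(sigma_u^2-tilde) from (9) and (10)."""
    n, N = Z.shape
    PX, PZ = proj(X), proj(Z)
    QX = np.eye(n) - PX
    r0 = np.linalg.matrix_rank(np.hstack([X, Z]))
    rX = np.linalg.matrix_rank(X)
    rPZX = np.linalg.matrix_rank(PZ @ X)
    t1 = np.trace(QX @ PZ)                 # tr(Q_X P_Z)
    t2 = np.trace(QX @ PZ @ QX @ PZ)       # tr[(Q_X P_Z)^2]
    var_hat = (2*s2_eps**2*(n - rX)*(r0 - rX) / (T**2 * t1**2 * (n - r0))
               + 4*s2_u*s2_eps / (T * t1)
               + 2*s2_u**2 * t2 / t1**2)
    var_til = (2*s2_eps**2*(n + N - r0 - rPZX) / (T**2 * (N - rPZX) * (n - r0))
               + 4*s2_u*s2_eps / (T * (N - rPZX))
               + 2*s2_u**2 / (N - rPZX))
    return var_hat, var_til
```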
We will need the following two lemmas to prove the main results.

Lemma 3.1 (Wang and Chow, 1994). Let P = P_A P_B. Then (i) P is an orthogonal projection matrix if and only if P_A P_B = P_B P_A; (ii) if P_A P_B = P_B P_A, then P is the orthogonal projection matrix onto M(A) ∩ M(B).

Lemma 3.2. The following statements are equivalent:

(i) P_A P_B = P_B P_A;
(ii) M(A) ∩ M(B) = M(P_B A);
(iii) rk(P_B A) = dim(M(A) ∩ M(B));
(iv) P_B A(A'P_B A)⁻A'P_B = P_A P_B;

where dim(·) denotes the dimension of a space.

Proof. Note that for any vector c ∈ M(A) ∩ M(B), there exist vectors α and γ such that c = Aα = Bγ. It follows that P_B P_A c = P_B P_A Aα = P_B Aα = P_B Bγ = Bγ = c, yielding

M(A) ∩ M(B) ⊆ M(P_B P_A) ⊆ M(P_B A).   (11)
If (i) holds, then P_A(P_B A) = P_B P_A A = P_B A, which is equivalent to M(P_B A) ⊆ M(A). Note that M(P_B A) ⊆ M(B). Thus M(P_B A) ⊆ M(A) ∩ M(B). This together with (11) yields (ii).

Conversely, if (ii) is true, then by (11) we have M(A) ∩ M(B) = M(P_B P_A), and P_A(P_B P_A) = P_B P_A. Thus P_B P_A is symmetric, and (i) holds. That (ii) and (iii) are equivalent is obvious from (11) and the fact that dim(M(P_B A)) = rk(P_B A), and that (iv) and (i) are equivalent is a direct consequence of Lemma 3.1.

The two lemmas lead to the following theorem.

Theorem 3.2. Var(σ̃_u²) = Var(σ̂_u²) if and only if P_X P_Z is symmetric, i.e. P_X P_Z = P_Z P_X.

Proof. Note that for any σ_u² ≥ 0 and σ_ε² > 0, Var(σ̃_u²) = Var(σ̂_u²) if and only if the following three equalities all hold:

(a) (n − rk(X))(r_0 − rk(X)) / [tr(Q_X P_Z)]² = (n + N − r_0 − rk(P_Z X)) / (N − rk(P_Z X));
(b) tr(Q_X P_Z) = N − tr(P_X P_Z) = N − rk(P_Z X);
(c) tr[(Q_X P_Z)²] / [tr(Q_X P_Z)]² = 1/(N − rk(P_Z X)).

Write r_1 = dim(M(X) ∩ M(Z)). Then r_0 = rk(X) + rk(Z) − r_1. Since rk(Z) = N, we have n + N − r_0 − rk(P_Z X) = n − rk(X) + r_1 − rk(P_Z X). Thus the three conditions (a)–(c) are equivalent to

tr[(Q_X P_Z)²] = tr(Q_X P_Z) = N − rk(P_Z X) = N − r_1.   (12)

Note that the last equality above is equivalent to rk(P_Z X) = r_1. Hence by Lemma 3.2 the last equality of (12) holds if and only if P_X P_Z = P_Z P_X. On the other hand, if P_X P_Z = P_Z P_X, then the first two equalities of (12) hold. Theorem 3.2 follows.

We now give the main results.

Theorem 3.3. σ̃_u² = σ̂_u² if and only if P_X P_Z = P_Z P_X.
Proof. Note that P_X P_Z = P_Z P_X if and only if Q_X P_Z = P_Z Q_X. By Lemma 3.2, if P_X P_Z = P_Z P_X, then

P_Z X(X'P_Z X)⁻X'P_Z = P_X P_Z,  Q_X Z(Z'Q_X Z)⁻Z'Q_X = Q_X P_Z.

Using (8) we have

P_(X:Z) − P_X = Q_X P_Z = P_Z − P_Z X(X'P_Z X)⁻X'P_Z.

Thus σ̃_u² = σ̂_u².

On the other hand, if σ̃_u² = σ̂_u², then Var(σ̃_u²) = Var(σ̂_u²). By Theorem 3.2, we have P_X P_Z = P_Z P_X. This completes the proof.
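The dichotomy in Theorem 3.3 is easy to probe numerically: the two estimators coincide for every y exactly when the matrices A_0 and A_1 defined above are equal. The sketch below (Python/NumPy; helper names and the test designs are ours) checks the commutativity of P_X P_Z and P_Z P_X together with the equality of A_0 and A_1 for a balanced one-way design (as in Example 3.1 below) and for a generic longitudinal-type design.

```python
import numpy as np

def proj(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

def anovae_sde_equal(X, Z, T):
    """Check P_X P_Z = P_Z P_X and compare the matrices A_0 and A_1 of Section 3."""
    n, N = Z.shape
    I = np.eye(n)
    PX, PZ, PXZ = proj(X), proj(Z), proj(np.hstack([X, Z]))
    QX, QZ = I - PX, I - PZ
    r0 = np.linalg.matrix_rank(np.hstack([X, Z]))
    rX = np.linalg.matrix_rank(X)
    A0 = ((PXZ - PX) - (r0 - rX) * (I - PXZ) / (n - r0)) / (T * np.trace(QX @ PZ))
    A1 = ((PZ - PZ @ X @ np.linalg.pinv(X.T @ PZ @ X) @ X.T @ PZ) / (N - np.linalg.matrix_rank(PZ @ X))
          - (QZ - QZ @ X @ np.linalg.pinv(X.T @ QZ @ X) @ X.T @ QZ) / (n - r0)) / T
    return np.allclose(PX @ PZ, PZ @ PX), np.allclose(A0, A1)

N, T = 6, 4
n = N * T
Z = np.kron(np.eye(N), np.ones((T, 1)))
rng = np.random.default_rng(0)
print(anovae_sde_equal(np.ones((n, 1)), Z, T))                                    # (True, True)
print(anovae_sde_equal(np.column_stack([np.ones(n), rng.normal(size=n)]), Z, T))  # typically (False, False)
```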
Remark 3.1. Both σ̃_u² and σ̂_u² are uniformly minimum variance unbiased (UMVU) estimates of σ_u² if P_X P_Z = P_Z P_X; see Wu and Wang (2004).

We exemplify the main results with two popular mixed effects models, both satisfying the identity condition P_X P_Z = P_Z P_X, so that the ANOVA and spectral decomposition approaches yield the same estimates for σ_u².

Example 3.1. The one-way classification model has the form

y_ij = μ + u_i + ε_ij,  i = 1, ..., a, j = 1, ..., b,

where μ is a fixed parameter, u_i is a random effect, and u_i and ε_ij are independent with u_i ∼ N(0, σ_u²) and ε_ij ∼ N(0, σ_ε²). Here X = 1_ab and Z = I_a ⊗ 1_b. Thus P_X P_Z = P_Z P_X = P_X = J̄_ab, where J̄_t = 1_t1_t'/t.

Example 3.2. The two-way classification model is given by

y_ij = μ + α_i + β_j + ε_ij,  i = 1, 2, ..., a, j = 1, 2, ..., b,

where α_i is a random effect and β_j is a fixed effect. We assume that α_i ∼ N(0, σ_α²) and ε_ij ∼ N(0, σ_ε²), and that all α_i's and ε_ij's are mutually independent.
In matrix form, the model can be expressed as y = Xβ + Zα + ε, where β = (μ, β_1, ..., β_b)', α = (α_1, ..., α_a)', X = (1_a ⊗ 1_b : 1_a ⊗ I_b), and Z = I_a ⊗ 1_b. It is easy to verify that P_X P_Z = P_Z P_X = (J̄_a ⊗ I_b)·(I_a ⊗ J̄_b) = J̄_a ⊗ J̄_b.

4. Conditions for superiority

An important question to be answered concerning the ANOVAE and SDE is, "If P_X P_Z ≠ P_Z P_X, then which estimator is better, and in what sense?" In this section, we obtain conditions under which one estimator has smaller mean squared error than the other. Since both estimators are unbiased, their mean squared errors are identical to their variances. To this end, let us express the spectral decomposition of the matrix Q_X ZZ'Q_X as

Q_X ZZ'Q_X = Σ_{i=1}^{g} d_i M_i,   (13)

where d_1, ..., d_g are the distinct nonzero eigenvalues of the matrix Q_X ZZ'Q_X and M_i is the orthogonal projector onto the corresponding eigenvalue subspace. Note that g > 1 because P_X P_Z ≠ P_Z P_X. Using the facts that M_i² = M_i, M_i M_j = 0 (i ≠ j), and

P_(X:Z) − P_X = Q_X Z(Z'Q_X Z)⁻Z'Q_X = P_{Q_X Z} = Σ_{i=1}^{g} M_i,

we obtain

y'(P_(X:Z) − P_X)y = Σ_{i=1}^{g} y'M_i y,   (14)

with y'M_i y ∼ (d_iσ_u² + σ_ε²)χ²_{m_i}, where m_i = tr(M_i), and y'M_1y, ..., y'M_gy are independent. Since Q_X ZZ'Q_X (P_Z − P_Z X(X'P_Z X)⁻X'P_Z) = T(P_Z − P_Z X(X'P_Z X)⁻X'P_Z), T is a nonzero eigenvalue of Q_X ZZ'Q_X and P_Z − P_Z X(X'P_Z X)⁻X'P_Z is the orthogonal projector onto the eigenvalue subspace corresponding to T. Without loss of generality, we assume that

M_1 = P_Z − P_Z X(X'P_Z X)⁻X'P_Z,  d_1 = T,  m_1 = m = N − rk(P_Z X),

and define

σ̂_u²(i) = (y'M_i y/m_i − σ̂_ε²)/d_i,  i = 1, ..., g.

It follows straightforwardly that σ̂_u²(1) = σ̃_u², and each σ̂_u²(i) is an unbiased estimate of σ_u². Combining (4) and (14), we obtain the following theorem.
Theorem 4.1. The ANOVAE of σ_u² is a weighted average of {σ̂_u²(i) : i = 1, ..., g}, that is,

σ̂_u² = Tm_1σ̃_u²/τ + Σ_{i=2}^{g} d_i m_i σ̂_u²(i)/τ,   (15)

where τ = T tr(Q_X P_Z) = Σ_{i=1}^{g} d_i m_i.

Thus the SDE σ̃_u² under model (1) is, in fact, an unbiased estimate based on only one of the quadratic forms y'M_1y, ..., y'M_gy. In the following, we compare the variance of σ̂_u², the weighted average of {σ̂_u²(i) : i = 1, ..., g}, with that of σ̃_u² when P_X P_Z ≠ P_Z P_X. Recalling Eqs. (9) and (10), we have the following result.

Theorem 4.2. The corresponding coefficients of σ_u²σ_ε² and σ_u⁴ in Var(σ̃_u²) are larger than their counterparts in Var(σ̂_u²) if P_X P_Z ≠ P_Z P_X.
Proof. Using the facts that

rk(P_Z X) = tr(P_Z X(X'P_Z X)⁻X'P_Z) ≥ tr(P_Z P_X P_Z) = tr(P_X P_Z)

and

tr[(Q_X P_Z)²] = tr(Q_X P_Z Q_X P_Z Q_X) ≤ tr(Q_X P_Z Q_X) = tr(Q_X P_Z),

we have

tr[(Q_X P_Z)²] / [tr(Q_X P_Z)]² ≤ 1/tr(Q_X P_Z) ≤ 1/(N − rk(P_Z X)),

and the equalities above hold if and only if P_X P_Z = P_Z P_X by Theorem 3.2. Theorem 4.2 thus follows.

Noting that r_0 − rk(X) = N − dim(M(X) ∩ M(Z)) and n + N − r_0 − rk(P_Z X) = n − rk(P_Z X : Q_Z X), we obtain the ratio of the coefficient of σ_ε⁴ in Var(σ̂_u²) to that in Var(σ̃_u²):

[(n − rk(X))(r_0 − rk(X)) / [tr(Q_X P_Z)]²] / [(n + N − r_0 − rk(P_Z X)) / (N − rk(P_Z X))]
  = [(n − rk(X)) / (n − rk(P_Z X : Q_Z X))] · [(N − dim(M(X) ∩ M(Z))) / (N − tr(P_X P_Z))] · [(N − rk(P_Z X)) / (N − tr(P_X P_Z))].   (16)

Because rk(P_Z X) ≥ tr(P_X P_Z) ≥ dim(M(X) ∩ M(Z)) ≥ 0 and n − rk(P_Z X : Q_Z X) ≤ n − rk(X), it follows that the first two fractions on the right-hand side of (16) are larger than or equal to 1, and the third fraction is no larger than 1. Combining these with Theorem 4.2, we have the following.

Corollary 4.1. If the right-hand side of (16) is less than 1, then Var(σ̂_u²) < Var(σ̃_u²).

Remark 4.1. The ANOVAE σ̂_u² is not uniformly superior to the SDE σ̃_u². If σ_ε²/T ≫ σ_u² and the right-hand side of (16) is greater than 1, then Var(σ̂_u²) > Var(σ̃_u²).

The example in Section 6.2 illustrates that (16) is not always less than 1. Note that (Var(σ̂_u²) − Var(σ̃_u²))/σ_ε⁴ is a quadratic function of ρ = Tσ_u²/σ_ε² with negative coefficients for the ρ² and ρ terms, and its constant term has the same sign as the difference between the numerator and the denominator of (16). Thus when the right-hand side of (16) is larger than 1, Var(σ̂_u²) > Var(σ̃_u²) for small ρ and Var(σ̂_u²) < Var(σ̃_u²) for large ρ; see Section 6.2. That is, the SD estimate is preferred when the predictor varies considerably across subjects and both the random-effect variance and the number of repeated observations for each subject, T, are small.
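As an illustration of the decomposition (13)–(15), the sketch below (Python/NumPy; variable names and the test design are ours) extracts the eigenvalue–projector pairs (d_i, M_i) of Q_X ZZ'Q_X, forms the component estimates σ̂_u²(i), and confirms numerically that the ANOVAE equals their weighted average with normalizing constant τ = T tr(Q_X P_Z).

```python
import numpy as np

def proj(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

rng = np.random.default_rng(1)
N, T = 8, 3
n = N * T
Z = np.kron(np.eye(N), np.ones((T, 1)))
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # longitudinal-type design, P_X P_Z != P_Z P_X
y = X @ np.array([1.0, 0.5]) + Z @ rng.normal(size=N) + rng.normal(size=n)

I = np.eye(n)
PX, PZ, PXZ = proj(X), proj(Z), proj(np.hstack([X, Z]))
QX = I - PX
r0 = np.linalg.matrix_rank(np.hstack([X, Z]))
s2e = y @ (I - PXZ) @ y / (n - r0)

# Spectral decomposition (13): group eigenvectors of Q_X ZZ' Q_X by distinct nonzero eigenvalues d_i
w, V = np.linalg.eigh(QX @ Z @ Z.T @ QX)
d_vals = np.unique(np.round(w[w > 1e-8], 8))
components = []
for d in d_vals:
    cols = V[:, np.isclose(np.round(w, 8), d)]
    M = cols @ cols.T                          # orthogonal projector M_i onto the eigenspace of d_i
    m_i = cols.shape[1]                        # multiplicity m_i = tr(M_i)
    components.append((d, m_i, (y @ M @ y / m_i - s2e) / d))   # sigma_u^2-hat(i)

tau = T * np.trace(QX @ PZ)                    # tau = T tr(Q_X P_Z) = sum_i d_i m_i
weighted = sum(d * m_i * est for d, m_i, est in components) / tau
anovae = (y @ (PXZ - PX) @ y - (r0 - np.linalg.matrix_rank(X)) * s2e) / tau
print(np.isclose(weighted, anovae))            # True: Theorem 4.1
```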
5. Exact tests and confidence intervals

A common interest in model (1) is testing hypotheses and constructing confidence intervals concerning the relative magnitude, θ = σ_u²/σ_ε², of the variation due to the random effects and the random errors. To compare the two approaches under investigation, we assume that P_X P_Z ≠ P_Z P_X, so that the two methods yield different estimates of σ_u². Consider the hypothesis

H_0: θ = θ_0  ←→  H_1: θ > θ_0,

where θ_0 ≥ 0. Since the spectral estimates σ̃_ε² and λ̃ in (5) and (6) are independent, with m_1λ̃/λ ∼ χ²_{m_1} and bσ̃_ε²/σ_ε² ∼ χ²_b, we can construct an exact test statistic

F(θ_0) = (Tθ̃ + 1)/(Tθ_0 + 1) = λ̃/[(Tθ_0 + 1)σ̃_ε²] = y'M_1y/[(Tθ_0 + 1)m_1σ̂_ε²],   (17)

which follows an F-distribution with degrees of freedom m_1 and b under the null hypothesis H_0: θ = θ_0. Here θ̃ = σ̃_u²/σ̃_ε² is the spectral decomposition estimate of θ.
We reject the null hypothesis H_0 if F(θ_0) > F_{m_1,b}(α). The power of the test is given by

P(F(θ_0) > F_{m_1,b}(α)) = P( [(Tθ + 1)/(Tθ_0 + 1)] F_{m_1,b} > F_{m_1,b}(α) ),   (18)

where F_{m_1,b} = bχ²_{m_1}/(m_1χ²_b) is an F-statistic with degrees of freedom m_1 and b, and F_{m_1,b}(α) is the corresponding upper 100α% quantile.

In contrast, the quadratic forms y'(P_(X:Z) − P_X)y and y'(I − P_(X:Z))y in the ANOVA method cannot be directly used to construct an exact test for H_0 except when θ_0 = 0, though the two quadratic forms are independent for any θ ≥ 0. From (14) we have y'(P_(X:Z) − P_X)y/σ_ε² ∼ Σ_{i=1}^{g}(d_iθ + 1)χ²_{m_i}, which is not a χ² variable, but rather a linear combination of two or more independently distributed χ² random variables if θ ≠ 0. In the case of θ_0 = 0, the exact test statistic based on the quadratic forms in the ANOVA method can be given by

F_A = [y'(P_(X:Z) − P_X)y/a] / [y'(I − P_(X:Z))y/b],   (19)

where b is defined as in (5), a = rk(Q_X Z) = Σ_{i=1}^{g} m_i, and m_i is defined in (14). Under H_0: θ = 0 the statistic follows an F-distribution with degrees of freedom a and b. If θ_0 > 0, F_A is intractable because of its complicated distribution. As an alternative, Wald's test statistic
W(θ_0) = [{Σ_{i=1}^{g} y'M_i y/(d_iθ_0 + 1)}/a] / [y'(I − P_(X:Z))y/b]   (20)

is often used in the literature; see Wald (1940). Clearly, W(0) = F_A. To compare the power of the two tests based on F(θ_0) and W(θ_0), we define

F_i(θ_0) = y'M_i y/[(d_iθ_0 + 1)m_iσ̂_ε²],  i = 2, ..., g.

Then W(θ_0) is, in fact, a weighted sum of these F-statistics, that is,

W(θ_0) = (m_1/a)F(θ_0) + Σ_{i=2}^{g} (m_i/a)F_i(θ_0).   (21)

It is easy to see that W(θ_0) follows an F-distribution with degrees of freedom a and b under the null hypothesis H_0: θ = θ_0, and the power is given by

P{W(θ_0) > F_{a,b}(α)} = P{ [Σ_{i=1}^{g}(d_iθ + 1)χ²_{m_i}/(d_iθ_0 + 1)] / (aχ²_b/b) > F_{a,b}(α) }.   (22)
Intuitively, the power of the two tests based on F(θ_0) and W(θ_0) should be close when m_1/a approaches 1; see the simulation results in Section 7. In this case, the former test is more appealing in practice due to its simplicity.

Note that the pivotal quantities F(θ) ∼ F_{m_1,b} and W(θ) ∼ F_{a,b} are decreasing functions of θ. Thus two exact 100(1 − α)% confidence intervals for θ, based on the pivotal quantities F(θ) and W(θ), respectively, can be given by the sets

{θ ∈ [0, +∞) : F_{m_1,b}(α/2) ≤ F(θ) ≤ F_{m_1,b}(1 − α/2)}   (23)

and

{θ ∈ [0, +∞) : F_{a,b}(α/2) ≤ W(θ) ≤ F_{a,b}(1 − α/2)}.   (24)

Furthermore, (23) can be simplified as

{θ ∈ [0, +∞) : y'M_1y/[Tm_1σ̂_ε²F_{m_1,b}(1 − α/2)] − 1/T ≤ θ ≤ y'M_1y/[Tm_1σ̂_ε²F_{m_1,b}(α/2)] − 1/T}.

In Section 7, we will compare the expected lengths of these confidence intervals via Monte Carlo simulation.
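Once y'M_1y, m_1, b and σ̂_ε² are available, the test (17) and the interval obtained by inverting F(θ) reduce to a few lines of code. The sketch below (Python with SciPy; the function name is ours) computes the one-sided p-value of H_0: θ = θ_0 and an equal-tailed confidence interval; because quantile conventions vary, the end-points are written directly in terms of the lower and upper α/2 points of F_{m_1,b}.

```python
from scipy.stats import f

def exact_test_and_ci(yM1y, m1, b, s2_eps_hat, T, theta0=0.0, alpha=0.05):
    """Exact F test of H0: theta = theta0 based on (17), and the interval from inverting F(theta), cf. (23)."""
    F0 = yM1y / ((T * theta0 + 1) * m1 * s2_eps_hat)
    p_value = f.sf(F0, m1, b)                 # one-sided, H1: theta > theta0
    q_hi = f.isf(alpha / 2, m1, b)            # upper alpha/2 point of F_{m1,b}
    q_lo = f.ppf(alpha / 2, m1, b)            # lower alpha/2 point of F_{m1,b}
    lo = max(0.0, yM1y / (T * m1 * s2_eps_hat * q_hi) - 1 / T)
    hi = max(0.0, yM1y / (T * m1 * s2_eps_hat * q_lo) - 1 / T)
    return p_value, (lo, hi)
```

The truncation at zero simply intersects the interval with the parameter space [0, +∞), matching the sets displayed above.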
6. Applications

Section 3 describes two simple examples in which the identity condition P_X P_Z = P_Z P_X always holds, and thus the ANOVAE and SDE are identical. In this section we examine two more popular models and discuss when the identity condition holds.

6.1. Circular data

The circular feature of a machined part is one of the most basic geometric primitives, and can be described easily by its center and radius. Due to imperfections introduced in manufacturing, the desired feature may not be truly circular. In order to control the production, we need to estimate the geometric parameters (center and radius), which requires data on machined parts along their circumferences, and a corresponding statistical model. In practice the data can be obtained using a computer-controlled coordinate measuring machine, while one of the models adopted for such data is the mixed effects model provided by Wang and Lam (1997), which takes into consideration the variability in center location across different machined parts.

Let (x_ij, y_ij) be the jth coordinate measurement on the circumference of the ith machined part, i = 1, ..., m, j = 1, ..., n, where m is the number of machined parts and n is the number of measurements taken from the circumference. Denote by φ_{i(j)} the angle of the jth measurement of the ith part. In general the phase change, or angular difference φ_{i(j+1)} − φ_{i(j)}, can be assumed to be known. Write
φ_{i(j)} = φ_{i0} + δ_j,  j = 1, 2, ..., n,

α_i = r_i cos φ_{i0} and β_i = r_i sin φ_{i0}, where δ_j is known and φ_{i0} is fixed but unknown. Then for the data points (x_ij, y_ij), we have the following model (see Wang and Lam, 1997):

x_ij = μ_1 + α_i cos δ_j − β_i sin δ_j + u_{1i} + ε_{1ij},
y_ij = μ_2 + α_i sin δ_j + β_i cos δ_j + u_{2i} + ε_{2ij},   (25)

where (μ_1, μ_2) is the designed location of the center of the part, (u_{1i}, u_{2i}) represents the random variation in the location of the center of the ith machined part, ε_{kij} ∼ N(0, σ_ε²), u_{ki} ∼ N(0, σ_0²), k = 1, 2, i = 1, ..., m, j = 1, ..., n, and all ε_{kij}'s and u_{ki}'s are independent.

Put z = (z_1', ..., z_m')', z_i = (x_i', y_i')', x_i = (x_{i1}, ..., x_{in})', y_i = (y_{i1}, ..., y_{in})', β = (μ_1, μ_2, α_1, β_1, ..., α_m, β_m)', u = (u_11, u_21, u_12, u_22, ..., u_1m, u_2m)', and

Φ = ( c  −s
      s   c ),

with c = (cos δ_1, ..., cos δ_n)' and s = (sin δ_1, ..., sin δ_n)'. Then model (25) can be rewritten in matrix form as z = Xβ + Zu + ε, where X = (1_m ⊗ I_2 ⊗ 1_n : I_m ⊗ Φ) and Z = I_{2m} ⊗ 1_n. Clearly, the covariance matrix of z is Cov(z) = σ_ε²I_{2mn} + σ_0²(I_{2m} ⊗ 1_n1_n'). It is easy to verify that P_X P_Z = P_Z P_X under model (25) is equivalent to

c̄ = (1/n)Σ_{j=1}^{n} cos δ_j = 0  and  s̄ = (1/n)Σ_{j=1}^{n} sin δ_j = 0.   (26)
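Condition (26) is easy to verify numerically for a given sampling plan; in particular, angles equally spaced around the circumference satisfy it exactly. A minimal check in Python/NumPy, with an angle grid of our own choosing:

```python
import numpy as np

n = 12
delta = 2 * np.pi * np.arange(1, n + 1) / n   # equally spaced measurement angles
c_bar = np.mean(np.cos(delta))
s_bar = np.mean(np.sin(delta))
print(np.isclose(c_bar, 0.0), np.isclose(s_bar, 0.0))   # True True: condition (26) holds
```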
According to Theorems 3.1 and 3.3, the SDE and ANOVAE of the variance components (σ_0², σ_ε²) are equal if (26) holds, and the condition c̄ = s̄ = 0 can easily be satisfied by sampling the n measurements equally spaced around the circumference of the circular feature.

6.2. Longitudinal data

Consider the popular model for longitudinal data:

y_it = β_0 + x_itβ_1 + u_i + ε_it,  i = 1, 2, ..., N, t = 1, 2, ..., T,   (27)
where u_i is the random effect, and u_i and ε_it satisfy the same assumptions as in model (1); see Diggle et al. (2002). In fact, model (27) is a special case of (1) with design matrix X = (1_n : x), where x = (x_1', x_2', ..., x_N')' and x_i = (x_{i1}, x_{i2}, ..., x_{iT})'. Without loss of generality, we assume that rk(X) = 2, that is, x ≠ a1_n for any scalar a. Note that

tr(Q_X P_Z) = N − 1 − s_b(x),  tr[(Q_X P_Z)²] = N − 1 − 2s_b(x) + s_b²(x),

where

s_b(x) = T Σ_{i=1}^{N}(x̄_i. − x̄_..)² / Σ_{i=1}^{N}Σ_{t=1}^{T}(x_it − x̄_..)²,
x̄_i. = Σ_{t=1}^{T} x_it/T and x̄_.. = Σ_{i=1}^{N}Σ_{t=1}^{T} x_it/(TN). Thus tr(Q_X P_Z) = tr[(Q_X P_Z)²] if and only if s_b(x) = 0 or s_b(x) = 1. Combining this with the following facts:

(i) 0 ≤ s_b(x) ≤ 1;
(ii) s_b(x) = 0 if and only if x̄_1. = x̄_2. = ··· = x̄_N.;
(iii) s_b(x) = 1 if and only if x_i = c_i1_T, i = 1, ..., N,

we can show that P_Z P_X = P_X P_Z in model (27) is equivalent to s_b(x) = 0 or s_b(x) = 1. In fact, if P_Z P_X = P_X P_Z then tr(Q_X P_Z) = tr[(Q_X P_Z)²], which gives s_b(x) = 0 or s_b(x) = 1. On the other hand, P_Z P_X = J̄_n if s_b(x) = 0, and P_Z P_X = P_(1_N:c) ⊗ J̄_T if s_b(x) = 1, where c = (c_1, ..., c_N)'; in either case P_Z P_X = P_X P_Z.

It follows from Theorem 3.3 that the necessary and sufficient condition for the identity of the SDE and ANOVAE of σ_u² under model (27) is x̄_1. = x̄_2. = ··· = x̄_N. or x_i = c_i1_T, i = 1, ..., N. However, these two conditions are not usually satisfied in practice. In the following, we compare the SDE and ANOVAE of σ_u² in the general case 0 < s_b(x) < 1. For convenience, we simply write s_b = s_b(x).

Clearly, if 0 < s_b < 1, then rk(P_Z X) = 2, dim(M(X) ∩ M(Z)) = 1, and the ratio of the coefficients of σ_ε⁴ in Var(σ̂_u²) and Var(σ̃_u²) given by (16) is equal to

f(s_b) = (n − 2)(N − 1)(N − 2) / [(n − 3)(N − 1 − s_b)²],

which is a continuous and increasing function of s_b on the interval (0, 1), and for any N > 2 and T > 1 it satisfies

lim_{s_b→0+} f(s_b) = (n − 2)(N − 2) / [(n − 3)(N − 1)] = 1 − [N(T − 1) − 1] / [(n − 3)(N − 1)] < 1,
lim_{s_b→1−} f(s_b) = (n − 2)(N − 1) / [(n − 3)(N − 2)] > 1.

So there exists s_b0 ∈ (0, 1) such that f(s_b) < 1 if s_b < s_b0 and f(s_b) > 1 if s_b > s_b0. In fact, s_b0 is given by

s_b0 = N − 1 − √[(N − 1)(N − 2)(n − 2)/(n − 3)].   (28)

From Corollary 4.1, it follows that the ANOVAE of σ_u² is superior to the SDE if 0 < s_b < s_b0. Because

0.5 − 1/(2T) < s_b0 < 0.5 + 1/(2N − 3),
it follows that (i) f(s_b) < 1 if s_b < 0.5 − 1/(2T); (ii) f(s_b) > 1 if s_b > 0.5 + 1/(2N − 3). Combining this with Theorem 4.2, we conclude that for any σ_ε² > 0 and σ_u² ≥ 0 the ANOVAE of σ_u² is superior to the SDE if s_b < 0.5 − 1/(2T).

On the other hand, the ANOVAE of σ_u² is not always superior to the SDE. Let v(ρ) = {Var(σ̂_u²) − Var(σ̃_u²)}/σ_ε⁴, which is a quadratic function of ρ = Tσ_u²/σ_ε² whose coefficients of ρ² and ρ are negative. Using the fact that v(0) > 0 if s_b > 0.5 + 1/(2N − 3), we can show that in this case the equation v(ρ) = 0 has two distinct solutions, denoted by ρ_1 and ρ_2 (ρ_1 < ρ_2), and clearly ρ_1 < 0 < ρ_2. Thus v(ρ) > 0 for any ρ ∈ (0, ρ_2) if s_b > 0.5 + 1/(2N − 3). That is, Var(σ̂_u²) > Var(σ̃_u²) if s_b > 0.5 + 1/(2N − 3) and Tσ_u²/σ_ε² < ρ_2.

7. Simulation

Under model (27), both T and Ts_b are distinct nonzero eigenvalues of the matrix Q_X ZZ'Q_X, with multiplicities N − 2 and 1, respectively. We fix σ_ε² = 1, and compare via Monte Carlo simulation the power of the tests based on F(θ_0) and W(θ_0). Results are displayed for two different choices, x_it = t + e_it + α_i and x_it = log(t + 1) + e_it + 2α_i, and three different sample sizes (N, T): (12, 3), (20, 4) and (5, 10), with 10,000 replicates in each case, where α_i ∼ N(0, 1) and e_it ∼ N(0, 1) are independent. Note that for each choice of sample size (N, T), the x_it's are held fixed across replicates. The value of s_b0 calculated from (28) for the three sample sizes of (N, T) is 0.354, 0.387 and 0.499, respectively.

We first obtain two sets of simulated values of s_b under the three sample sizes of (N, T), namely (0.345, 0.280, 0.039) for x_it = t + e_it + α_i and (0.732, 0.741, 0.431) for x_it = log(t + 1) + e_it + 2α_i. Clearly, the three values of s_b in the first set and the last value in the second set are smaller than the corresponding s_b0, whereas the first two values of s_b in the second set are larger than the corresponding s_b0. Using the results in Section 6.2, we obtain the following conclusions under the three sample sizes above. For the first choice, x_it = t + e_it + α_i, the ANOVAE of σ_u² is superior to the SDE for any σ_u² ≥ 0 and σ_ε² > 0. For the second choice, x_it = log(t + 1) + e_it + 2α_i, the SDE of σ_u² is superior to the ANOVAE only if σ_u² ≤ 0.227 for (N, T) = (12, 3), or only if σ_u² ≤ 0.159 for (N, T) = (20, 4).

Secondly, we draw random numbers from the independent χ² distributions χ²_{m_1}, χ²_1 and χ²_b, with 10,000 replicates for each of the six cases above. Using formulas (18) and (22), we obtain the simulated powers of the tests based on F(θ_0) and W(θ_0); see Tables 1 and 2.
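The quantities s_b(x), s_b0 and f(s_b) that drive these conclusions are straightforward to compute from a simulated design. The sketch below (Python/NumPy; function names and the random seed are ours, so the realized s_b values will differ from those reported above) mirrors the Section 7 setup for one draw of the covariates.

```python
import numpy as np

def s_b(x):
    """Between-subject share s_b(x) for an N x T covariate array, as defined in Section 6.2."""
    N, T = x.shape
    xbar_i = x.mean(axis=1)
    xbar = x.mean()
    return T * np.sum((xbar_i - xbar) ** 2) / np.sum((x - xbar) ** 2)

def s_b0(N, T):
    """Threshold (28) below which the ANOVAE of sigma_u^2 is superior to the SDE."""
    n = N * T
    return N - 1 - np.sqrt((N - 1) * (N - 2) * (n - 2) / (n - 3))

def f_ratio(sb, N, T):
    """Ratio f(s_b) of the sigma_eps^4 coefficients in (16) for model (27)."""
    n = N * T
    return (n - 2) * (N - 1) * (N - 2) / ((n - 3) * (N - 1 - sb) ** 2)

rng = np.random.default_rng(2024)             # arbitrary seed, for illustration only
for N, T in [(12, 3), (20, 4), (5, 10)]:
    t = np.arange(1, T + 1)
    alpha = rng.normal(size=(N, 1))
    e = rng.normal(size=(N, T))
    x1 = t + e + alpha                         # Case I:  x_it = t + e_it + alpha_i
    x2 = np.log(t + 1) + e + 2 * alpha         # Case II: x_it = log(t+1) + e_it + 2*alpha_i
    print((N, T), round(s_b0(N, T), 3),
          round(s_b(x1), 3), round(s_b(x2), 3), round(f_ratio(s_b(x2), N, T), 3))
```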
Table 1
Power of tests for the hypothesis θ = θ_0 under model (27) with x_it = t + e_it + α_i, where all α_i ∼ N(0, 1) and e_it ∼ N(0, 1) are independent (10,000 runs; significance level α = 0.05).

θ_0    θ      N = 5, T = 10        N = 12, T = 3        N = 20, T = 4
              F(θ_0)    W(θ_0)     F(θ_0)    W(θ_0)     F(θ_0)    W(θ_0)
0      0.1    0.2490    0.2815     0.1345    0.1351     0.2483    0.2518
       0.3    0.5519    0.6239     0.3689    0.3750     0.6993    0.7067
       0.5    0.6964    0.7797     0.5799    0.5874     0.8987    0.9044
       0.7    0.7808    0.8550     0.7220    0.7344     0.9685    0.9714
       0.9    0.8288    0.8982     0.8206    0.8311     0.9876    0.9891
       1      0.8479    0.9138     0.8508    0.8638     0.9922    0.9929
0.5    1      0.2131    0.2436     0.2518    0.2582     0.4200    0.4221
       1.5    0.3708    0.4314     0.4854    0.4988     0.7472    0.7598
       2      0.4909    0.5652     0.7750    0.7941     0.8987    0.9062
       3      0.6468    0.7306     0.8987    0.9132     0.983     0.9864
1.5    2      0.1053    0.1145     0.1249    0.1262     0.1794    0.1818
       3      0.2352    0.2672     0.3381    0.3515     0.5343    0.5472
       4      0.3526    0.4093     0.5383    0.5557     0.7780    0.7929
       5      0.4504    0.5213     0.6824    0.7059     0.8987    0.9070
       6      0.5300    0.6055     0.7850    0.8072     0.9577    0.9628
Table 2
Power of tests for the hypothesis θ = θ_0 under model (27) with x_it = log(t + 1) + e_it + 2α_i, where all α_i ∼ N(0, 1) and e_it ∼ N(0, 1) are independent (10,000 runs; significance level α = 0.05).

θ_0    θ      N = 5, T = 10        N = 12, T = 3        N = 20, T = 4
              F(θ_0)    W(θ_0)     F(θ_0)    W(θ_0)     F(θ_0)    W(θ_0)
0      0.1    0.2490    0.2562     0.1345    0.1315     0.2483    0.2441
       0.3    0.5519    0.5884     0.3689    0.362      0.6993    0.6922
       0.5    0.6964    0.7467     0.5799    0.5708     0.8987    0.8961
       0.7    0.7808    0.8288     0.7220    0.7141     0.9685    0.9681
       0.9    0.8288    0.8780     0.8206    0.8151     0.9876    0.9877
       1      0.8479    0.8963     0.8508    0.8500     0.9922    0.9918
0.5    1      0.2131    0.236      0.2518    0.2487     0.4200    0.4165
       1.5    0.3708    0.4211     0.4854    0.4833     0.7472    0.7492
       2      0.4909    0.5568     0.6585    0.6623     0.8987    0.9004
       3      0.6468    0.7229     0.8508    0.8572     0.983     0.9849
1.5    2      0.1053    0.1131     0.1249    0.1249     0.1794    0.1796
       3      0.2352    0.2646     0.3381    0.3442     0.5343    0.5393
       4      0.3526    0.4053     0.5383    0.5480     0.7780    0.7872
       5      0.4504    0.5176     0.6824    0.6948     0.8987    0.9041
       6      0.5300    0.6025     0.7850    0.7968     0.9577    0.9610
Tables 1 and 2 show that the test based on W(θ_0) is more powerful than that based on F(θ_0) for the null hypothesis H_0: θ = θ_0 under sample size (N, T) = (5, 10), a scenario with a relatively large number of repeated observations for each subject but a small number of subjects. There is no substantial difference between the powers of the two tests under the sample sizes (N, T) = (12, 3) and (20, 4), scenarios with a relatively small number of repeated observations for each subject but a large number of subjects, regardless of whether the ANOVAE is superior to the SDE. As a further observation, these results indicate that better estimates of variance components do not necessarily imply higher power of the tests. A similar phenomenon is observed in the expected lengths of the confidence intervals (24) and (23); see Fig. 1 below. Thus we can adopt the simpler test statistic F(θ_0) and confidence interval (23) in practice when T, the number of repeated observations for each subject, is relatively small. However, they are not good choices, due to the lower power of the test and the greater length of the confidence interval, when T is relatively large, especially when T is larger than N, the number of subjects.

8. Discussion

In this paper, we study the finite sample properties of the ANOVAE and SDE under model (1). We establish the necessary and sufficient condition for the equality of the two estimates, and some sufficient conditions for the superiority of one estimate over the other under the mean squared error criterion. Furthermore, we consider exact tests and confidence intervals based on the two estimates, and demonstrate via simulations that better estimates of variance components do not necessarily imply higher power of the tests or shorter confidence intervals.
Fig. 1. Expected lengths of the exact confidence intervals for θ based on F(θ_0) (solid line) and W(θ_0) (dashed line), plotted against θ_0. Panels correspond to Case I (x_it = t + e_it + α_i) and Case II (x_it = log(t + 1) + e_it + 2α_i) for (N, T) = (12, 3), (20, 4) and (5, 10); the vertical axis is the expected length.
Table 3
Estimated coverage probabilities of the confidence interval for θ based on the MLE under model (27) (10,000 runs).

θ      N = 5, T = 10         N = 12, T = 3         N = 20, T = 4
       Case I    Case II     Case I    Case II     Case I    Case II
0.5    0.7122    0.6944      0.8597    0.8408      0.8922    0.8745
1      0.7235    0.7088      0.8596    0.8317      0.8829    0.8692
1.5    0.7205    0.7127      0.8509    0.8292      0.8798    0.8669

Nominal confidence level 1 − α = 0.95. Case I: x_it = t + e_it + α_i. Case II: x_it = log(t + 1) + e_it + 2α_i.
By comparison, the likelihood method has poor small-sample performance in mixed models (see Burdick and Larsen, 1997). Moreover, when the parameter lies on the boundary of the parameter space, likelihood-based tests such as the Wald and likelihood ratio tests do not asymptotically follow chi-square distributions, as demonstrated in Stram and Lee (1994) and Crainiceanu and Ruppert (2004). For the two models considered in Section 7, with the same settings, Table 3 presents the coverage probabilities of the confidence intervals based on the maximum likelihood estimate. The results in the table clearly indicate that, with relatively small sample sizes, the coverage probabilities are substantially below the nominal level of 95%.
As noted in the literature, both the ANOVAE and the SDE may take negative values. This is not a problem in genetics, where variance components are permitted to be negative; see Burton et al. (1999) and Hazelton and Gurrin (2003). However, estimates of variance components are often required to be non-negative in many other settings. One remedy is to estimate the two variance components using max{0, σ̂_u²} and max{0, σ̃_u²}. However, further research is needed to find conditions for the superiority of one of the two truncated estimates over the other. In this regard, Theorem 4.1 in this paper is a potentially useful tool for such an investigation.

Acknowledgments

This research was supported by the Intramural Research Program of the National Institute of Child Health and Human Development, National Institutes of Health. Wu's research was also partially supported by the Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality PHR (IHLB) and the National Natural Science Foundation of China (NSFC 10801005 and NSFC 10771010). The opinions expressed in this article are not necessarily those of the National Institutes of Health. The authors wish to thank two referees for their valuable suggestions, which considerably improved this paper.

References

Burch, B.D., Iyer, H.K., 1997. Exact confidence intervals for a variance ratio (or heritability) in a mixed linear model. Biometrics 53, 1318–1333.
Burdick, R.K., Larsen, G.A., 1997. Confidence intervals on measures of variability in repeatability and reproducibility study. J. Qual. Tech. 29, 261–273.
Burton, P., Tiller, K., Gurrin, L., Cookson, W., Musk, A., Palmer, L., 1999. Genetic variance components analysis for binary phenotypes using generalized linear mixed models (GLMMs) and Gibbs sampling. Genet. Epidemiol. 17, 118–140.
Crainiceanu, C.M., Ruppert, D., 2004. Likelihood ratio tests in linear mixed models with one variance component. J. Royal Stat. Soc. B 66, 165–185.
Davidian, M., Giltinan, D.M., 1996. Nonlinear Models for Repeated Measurement Data. Chapman & Hall, London.
Demidenko, E., 2004. Mixed Models: Theory and Applications. Wiley, New Jersey.
Diggle, P.J., Liang, K.Y., Zeger, S.L., 2002. Analysis of Longitudinal Data, second ed. Oxford Science, New York.
Hazelton, M.L., Gurrin, L.C., 2003. A note on genetic variance components in mixed models. Genet. Epidemiol. 24, 297–301.
Henderson, C.R., 1953. Estimation of variance and covariance components. Biometrics 9, 226–253.
Rao, C.R., 1973. Linear Statistical Inference and its Applications, second ed. Wiley, New York.
Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance Components. Wiley, New York.
Stram, D.O., Lee, J.W., 1994. Variance components testing in the longitudinal mixed effects model. Biometrics 50, 1171–1177.
Wald, A., 1940. A note on the analysis of variance with unequal class frequencies. Ann. Math. Statist. 11, 96–100.
Wang, C.M., Lam, C.T., 1997. A mixed effects model for the analysis of circular measurements. Technometrics 39, 119–126.
Wang, S.G., Chow, S.C., 1994. Advanced Linear Models. Marcel Dekker, New York.
Wang, S.G., Yin, S.J., 2002. A new estimate of the parameters in linear mixed models. Sci. China Ser. A 45, 1301–1311.
Wu, M.X., Wang, S.G., 2004. Simultaneous optimal estimates of fixed effects and variance components in the mixed model. Sci. China Ser. A 47, 787–799.