Analysis of multivariate longitudinal data using quasi-least squares




Journal of Statistical Planning and Inference 103 (2002) 421–436

www.elsevier.com/locate/jspi

N. Rao Chaganty*, Dayanand N. Naik

Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529, USA

Received 17 September 1999; accepted 8 November 2000

Abstract

In this paper we consider the analysis of multivariate longitudinal data assuming a scale multiple of Kronecker product correlation structure for the covariance matrix of the observations on each subject. The method used for the estimation of the parameters is the quasi-least squares method developed in the following three papers: Chaganty (J. Statist. Plann. Inference 63 (1997) 39), Shults and Chaganty (Biometrics 54 (1998) 1622) and Chaganty and Shults (J. Statist. Plann. Inference 76 (1999) 145). We show that the estimating equations for the correlation parameters in the quasi-least squares method are optimal unbiased estimating equations if the data are from a normal population. An algorithm for computing the estimates is provided and implemented on a real life data set. The asymptotic joint distribution of the estimators of the regression and correlation parameters is derived and used for testing a linear hypothesis on the regression parameters. © 2002 Elsevier Science B.V. All rights reserved.

MSC: primary 62H12; 62H15; 62J12; secondary 62F10; 62F12

Keywords: Longitudinal data; Kronecker product; Quasi-least squares; GEE; AR(1); SUR model

1. Introduction

In many practical situations, observations on $n$ subjects (possibly randomly assigned to $g$ groups) are made on a set of $p$ response variables (or characteristics) at $t$ occasions. Thus, on each subject we have a $p \times t$ matrix $Y_i = (y_{ijk})$ of observations, where $y_{ijk}$ is the observation on the $j$th response variable taken at the $k$th occasion or time period corresponding to the $i$th subject. Here $1 \leq i \leq n$, $1 \leq j \leq p$, and $1 \leq k \leq t$. These data are termed multivariate repeated measures data, multivariate longitudinal data, or multi-response growth curve data. Also, associated with each of the $n$ subjects, we have measurements $x_{ijkl}$ taken on $q$ covariates ($1 \leq l \leq q$). The covariates could be categorical, and they may or may not change with time.

* Corresponding author. Tel.: +1-757-683-3897; fax: +1-757-683-3885. E-mail address: [email protected] (N.R. Chaganty).



Let $x_{il} = (x_{i11l}, \ldots, x_{ip1l}, x_{i12l}, \ldots, x_{ip2l}, \ldots, x_{i1tl}, \ldots, x_{iptl})'$ be the vector of observations on the $l$th covariate taken on the $i$th subject. Let $X_i = [x_{i1} : \cdots : x_{iq}]_{pt \times q}$ be the matrix of measurements taken on the $q$ covariates associated with each response variable at the $t$ occasions on the $i$th subject. Suppose that the expected value and the covariance matrix of $y_i = \operatorname{vec}(Y_i)$ are $E(y_i) = \mu_i(\beta) = X_i \beta$ and $\operatorname{Cov}(y_i) = \Sigma$, respectively. Assuming that the data on each subject come from a $pt$-dimensional multivariate normal distribution with dispersion matrix $\Sigma$, the maximum likelihood estimator of $\beta$ can be obtained as a function of $\Sigma$, and inference can be performed using the standard asymptotic theory of maximum likelihood estimators. If the data do not come from a multivariate normal distribution, or if the response variables on which data are collected are not continuous, then the standard methods do not readily apply. Recently, Chaganty (1997) introduced a new method called "quasi-least squares" for analyzing longitudinal data. This method is an alternative to the GEE method of Liang and Zeger (1986) and its various variations. The quasi-least squares method was developed to overcome some of the pitfalls of the GEE method (see Crowder, 1995). Unlike the GEE method, which can yield infeasible and inconsistent estimates of the correlation parameters (see Sutradhar and Das, 1999), the quasi-least squares method yields feasible and consistent estimates of the correlation parameters, in some cases even when the correlation structure is misspecified. Quasi-least squares has been successfully utilized in various practical problems involving unbalanced and unequally spaced data; see Shults and Chaganty (1998) and Chaganty and Shults (1999). Several authors (for example, Boik, 1991; Galecki, 1994; Shults, 2000; Naik and Rao, 2001; Shults and Morrow, 2002) have observed many advantages of using a Kronecker product structured covariance, that is, $\Sigma = \Sigma_1 \otimes \Sigma_2$, where $\Sigma_1$ and $\Sigma_2$ are $t \times t$ and $p \times p$ positive definite matrices, respectively, for analyzing multivariate repeated measures and other types of data. Further, the linear model with this covariance structure reduces to the well-known Zellner's seemingly unrelated regression (SUR) model (Zellner, 1962, 1963) when $\Sigma_2 = I$. Hence in this article we consider a Kronecker product covariance structure for the dispersion matrix of $y_i$. The main focus of this paper is to implement the quasi-least squares method for analyzing multivariate longitudinal data assuming a scale multiple of Kronecker product correlation structure for the covariance matrix. The organization of the paper is as follows. In Section 2, we describe the quasi-least squares method as applied to the present situation. We also present a discussion of the optimality of the estimating equations and an iterative algorithm for the computation of the estimates. In Section 3, we derive closed-form solutions for the estimates of the correlation parameters for some popular correlation structures. In Section 4, we derive the joint asymptotic distribution of the quasi-least squares regression and correlation parameter estimates. We also present a test statistic for testing a linear hypothesis concerning the regression parameter $\beta$ and derive its asymptotic distribution. We present the analysis of a data set in Section 5 and finally end with some concluding remarks.
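To fix ideas about the data layout, the following small sketch (ours, with toy values) shows how $y_i = \operatorname{vec}(Y_i)$ stacks the $p \times t$ matrix $Y_i$ column by column, so that the $p$ responses at the first occasion come first, matching the ordering used above for the covariate vectors $x_{il}$:

```python
import numpy as np

p, t = 3, 4                                      # p response variables, t occasions
Y_i = np.arange(1.0, p * t + 1).reshape(p, t)    # toy p x t data matrix, y_ijk = Y_i[j, k]

# vec() stacks columns: responses at occasion 1, then occasion 2, and so on,
# matching (x_i11l, ..., x_ip1l, x_i12l, ..., x_ip2l, ...).
y_i = Y_i.flatten(order="F")                     # column-major vec(Y_i), length p*t
print(y_i[:p])                                   # the p responses at the first occasion
```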


2. The method of quasi-least squares

For analyzing multivariate repeated measures data that are continuous non-normal or categorical, we adopt the quasi-least squares method described in Chaganty (1997) and Shults and Chaganty (1998), with the bias-corrected version of the correlation parameter estimate given in Chaganty and Shults (1999). To put the problem in a slightly more general framework we assume that

$$E(y_i) = \mu_i(\beta) = g(X_i \beta), \qquad (1)$$

where as before $X_i$ is the $pt \times q$ design matrix, $\beta$ is a $q \times 1$ vector of unknown parameters and the inverse of $g$ is a known link function. Further assume that the covariance matrix of $y_i$ is

$$\Sigma = \phi\, A_i^{1/2}(\beta)\,(R_T(\rho) \otimes R_P(\gamma))\, A_i^{1/2}(\beta) = \phi\, \Sigma_i(\theta) \quad \text{(say)}, \qquad (2)$$

where $\theta = (\beta', \rho', \gamma')'$ and $R_T(\rho)$ and $R_P(\gamma)$, respectively, are correlation matrices of order $t \times t$ and $p \times p$, which are functions of the vectors $\rho$ and $\gamma$, respectively. The correlation matrix $R_T(\rho)$ represents the correlation among the $t$ repeated measurements over time, whereas $R_P(\gamma)$ represents the correlation among the $p$ response variables. The $pt \times pt$ diagonal matrix $A_i^{1/2}(\beta)$ contains the standard deviations, and $\phi$ is an overdispersion or scale parameter. The mean-covariance model (1), (2) encompasses several discrete and continuous models. While the main parameter of interest is $\beta$, the parameters $\rho$, $\gamma$ and $\phi$ are nuisance parameters.

2.1. Estimating equations

Here we describe the method of quasi-least squares. This is a two stage procedure. In the first stage we minimize, with respect to $\beta$, $\rho$ and $\gamma$, the quadratic form

$$\sum_{i=1}^{n} (y_i - \mu_i(\beta))'\, A_i^{-1/2}(\beta)\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\, A_i^{-1/2}(\beta)\,(y_i - \mu_i(\beta)) = \sum_{i=1}^{n} z_i'(\beta)\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\, z_i(\beta), \qquad (3)$$

where $z_i(\beta) = A_i^{-1/2}(\beta)(y_i - \mu_i(\beta))$ with $(A_i^{1/2}(\beta))^{-1} = A_i^{-1/2}(\beta)$. Note that $z_i(\beta) = \operatorname{vec}(Z_i(\beta))$, where

$$Z_i(\beta) = \begin{pmatrix} z_{i11} & \cdots & z_{i1t} \\ \vdots & \ddots & \vdots \\ z_{ip1} & \cdots & z_{ipt} \end{pmatrix}.$$

Since

$$\operatorname{tr}\bigl(R_T^{-1}(\rho)\, Z_i'(\beta)\, R_P^{-1}(\gamma)\, Z_i(\beta)\bigr) = \operatorname{vec}'(Z_i(\beta))\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\,\operatorname{vec}(Z_i(\beta)) = z_i'(\beta)\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\, z_i(\beta),$$

the quadratic form (3) can be written in two equivalent trace forms, given below.
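As a quick numerical check of this trace-Kronecker identity, here is a short sketch (ours, with toy matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, t = 3, 4
R_T = 0.5 ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))  # AR(1) example, t x t
R_P = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)                       # equicorrelated example, p x p
Z = rng.standard_normal((p, t))        # plays the role of Z_i(beta)
z = Z.flatten(order="F")               # z_i(beta) = vec(Z_i(beta)), column-major

lhs = np.trace(np.linalg.inv(R_T) @ Z.T @ np.linalg.inv(R_P) @ Z)
rhs = z @ np.kron(np.linalg.inv(R_T), np.linalg.inv(R_P)) @ z
assert np.isclose(lhs, rhs)            # the trace and Kronecker forms agree
```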


Thus we can rewrite the quadratic form (3) as

$$\sum_{i=1}^{n} \operatorname{tr}\bigl(R_T^{-1}(\rho)\, Z_i'(\beta)\, R_P^{-1}(\gamma)\, Z_i(\beta)\bigr) = np\, \operatorname{tr}\bigl(R_T^{-1}(\rho)\, U(\beta, \gamma)\bigr) \qquad (4)$$

and also as

$$\sum_{i=1}^{n} \operatorname{tr}\bigl(R_P^{-1}(\gamma)\, Z_i(\beta)\, R_T^{-1}(\rho)\, Z_i'(\beta)\bigr) = nt\, \operatorname{tr}\bigl(R_P^{-1}(\gamma)\, V(\beta, \rho)\bigr), \qquad (5)$$

where the matrix $U(\beta, \gamma) = (1/np) \sum_{i=1}^{n} Z_i'(\beta)\, R_P^{-1}(\gamma)\, Z_i(\beta)$ is of order $t \times t$ and $V(\beta, \rho) = (1/nt) \sum_{i=1}^{n} Z_i(\beta)\, R_T^{-1}(\rho)\, Z_i'(\beta)$ is of order $p \times p$. Equating to zero the partial derivatives of (3)–(5) with respect to $\beta$, $\rho$ and $\gamma$, respectively, as in Chaganty (1997), we obtain the following three estimating equations:

$$\sum_{i=1}^{n} D_i'(\beta)\, A_i^{-1/2}(\beta)\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\, z_i(\beta) = 0, \qquad (6)$$

$$\operatorname{tr}\left(\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\, U(\beta, \gamma)\right) = 0, \qquad (7)$$

$$\operatorname{tr}\left(\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\, V(\beta, \rho)\right) = 0, \qquad (8)$$

where $D_i(\beta) = \partial \mu_i / \partial \beta'$. Let $\tilde{\theta} = (\tilde{\beta}', \tilde{\rho}', \tilde{\gamma}')'$ be the solution of the above three equations. The estimate $\tilde{\beta}$ is consistent, but the estimates $\tilde{\rho}$ and $\tilde{\gamma}$ are asymptotically biased (see Theorem 4.1). The main reason is that the estimating equation (6) is unbiased whereas Eqs. (7) and (8) are not. The second stage in the quasi-least squares method consists of solving the two equations

$$\operatorname{tr}\left(\left.\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\right|_{\rho = \tilde{\rho}} R_T(\rho)\right) = 0 \qquad (9)$$

and

$$\operatorname{tr}\left(\left.\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\right|_{\gamma = \tilde{\gamma}} R_P(\gamma)\right) = 0 \qquad (10)$$

to obtain consistent estimates $\hat{\rho}$ and $\hat{\gamma}$ of $\rho$ and $\gamma$, respectively. Let $\hat{\beta} = \tilde{\beta}$ (or the estimate obtained solving Eq. (6) substituting $\hat{\rho}$ and $\hat{\gamma}$ for $\rho$ and $\gamma$, respectively). We shall call the second stage quasi-least squares estimates $\hat{\beta}$, $\hat{\rho}$ and $\hat{\gamma}$ simply the quasi-least squares estimates of $\beta$, $\rho$ and $\gamma$, respectively. Finally, a consistent estimate of $\phi$ is given by $\hat{\phi} = \min(\hat{\phi}_1, \hat{\phi}_2)$, where

$$\hat{\phi}_1 = \frac{\sum_{i=1}^{n} (y_i - \mu_i(\hat{\beta}))'\,(R_T(\hat{\rho}) \otimes R_P(\hat{\gamma}))^{-1}\,(y_i - \mu_i(\hat{\beta}))}{ntp} \quad \text{and} \quad \hat{\phi}_2 = \frac{\sum_{i=1}^{n} (y_i - \mu_i(\hat{\beta}))'\,(y_i - \mu_i(\hat{\beta}))}{ntp}. \qquad (11)$$
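Given the residuals and the fitted correlation matrices, the scale estimate (11) is direct to compute; a minimal sketch (our variable names) follows:

```python
import numpy as np

def scale_estimate(resid, R_T_hat, R_P_hat):
    """phi_hat = min(phi_1, phi_2) of Eq. (11).

    resid: (n, p*t) array whose ith row is y_i - mu_i(beta_hat).
    """
    n, pt = resid.shape
    R_inv = np.linalg.inv(np.kron(R_T_hat, R_P_hat))
    phi_1 = np.einsum("ij,jk,ik->", resid, R_inv, resid) / (n * pt)
    phi_2 = np.sum(resid ** 2) / (n * pt)
    return min(phi_1, phi_2)
```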


2.2. Optimality of the estimating equations

It is well known that, when $\rho$ and $\gamma$ are known, the function

$$g_1(\beta; \rho, \gamma) = \sum_{i=1}^{n} D_i'(\beta)\, A_i^{-1/2}(\beta)\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\, z_i(\beta) \qquad (12)$$

is the optimal unbiased estimating function for estimating $\beta$ according to Godambe's criterion (see Godambe, 1960; Heyde, 1997, p. 22). Since $E(U(\beta, \gamma)) = \phi R_T(\rho)$, the function

$$g_2(\rho; \beta, \gamma, \phi) = \operatorname{tr}\left(\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\,(U(\beta, \gamma) - \phi R_T(\rho))\right)$$

is clearly unbiased. Also, since $\partial R_T^{-1}(\rho)/\partial \rho = -R_T^{-1}(\rho)\,(\partial R_T(\rho)/\partial \rho)\, R_T^{-1}(\rho)$, using the properties of the trace, Kronecker product and $\operatorname{vec}(\cdot)$ operator (see Rao and Rao, 1998, p. 202) we can verify that

$$g_2(\rho; \beta, \gamma, \phi) = -\left(\frac{\partial \operatorname{vec}(R_T(\rho))}{\partial \rho}\right)'\,(R_T(\rho) \otimes R_T(\rho))^{-1}\,(\operatorname{vec}(U(\beta, \gamma)) - \phi \operatorname{vec}(R_T(\rho))). \qquad (13)$$

When $\beta$, $\gamma$ and $\phi$ are known, the unbiased estimating function (13) is the optimal estimating function for estimating $\rho$ if a constant multiple of $\operatorname{Cov}(\operatorname{vec}(U(\beta, \gamma)))$ is used in place of $(R_T(\rho) \otimes R_T(\rho))$. But $\operatorname{Cov}(\operatorname{vec}(U(\beta, \gamma)))$ depends in general on the fourth moments of the $y_i$'s, and we have made no assumptions concerning the fourth moments. However, note that if the $y_i$'s are normal, then $\operatorname{Cov}(\operatorname{vec}(U(\beta, \gamma)))$ is $(2\phi^2/np)\,(R_T(\rho) \otimes R_T(\rho))$. Thus, the estimating function (13) is the optimal unbiased estimating function for the parameter $\rho$ when the $y_i$'s are normally distributed and the other parameters are known. It will be close to the optimal unbiased estimating equation whenever a constant multiple of $\operatorname{Cov}(\operatorname{vec}(U(\beta, \gamma)))$ is approximately equal to $(R_T(\rho) \otimes R_T(\rho))$. Similarly, since $E(V(\beta, \rho)) = \phi R_P(\gamma)$, the function

$$g_3(\gamma; \beta, \rho, \phi) = \operatorname{tr}\left(\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\,(V(\beta, \rho) - \phi R_P(\gamma))\right)$$

is unbiased. If the $y_i$'s are independent and normally distributed we can check that $\operatorname{Cov}(\operatorname{vec}(V(\beta, \rho))) = (2\phi^2/nt)\,(R_P(\gamma) \otimes R_P(\gamma))$. Therefore, the function

$$g_3(\gamma; \beta, \rho, \phi) = -\left(\frac{\partial \operatorname{vec}(R_P(\gamma))}{\partial \gamma}\right)'\,(R_P(\gamma) \otimes R_P(\gamma))^{-1}\,(\operatorname{vec}(V(\beta, \rho)) - \phi \operatorname{vec}(R_P(\gamma))) \qquad (14)$$

is the optimal unbiased estimating function for estimating $\gamma$ when $\beta$, $\rho$ and $\phi$ are known, if the $y_i$'s are normally distributed, and is close to being optimal if a constant multiple of $\operatorname{Cov}(\operatorname{vec}(V(\beta, \rho)))$ is approximately equal to $(R_P(\gamma) \otimes R_P(\gamma))$.
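The matrix-derivative identity $\partial R_T^{-1}(\rho)/\partial \rho = -R_T^{-1}(\rho)(\partial R_T(\rho)/\partial \rho)R_T^{-1}(\rho)$ used to pass to (13) is easy to confirm numerically; a small finite-difference sketch (ours, for a scalar AR(1) parameter):

```python
import numpy as np

def ar1(rho, t):
    return rho ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))

t, rho, eps = 4, 0.6, 1e-6
R = ar1(rho, t)
dR = (ar1(rho + eps, t) - ar1(rho - eps, t)) / (2 * eps)          # dR_T/drho
dRinv_fd = (np.linalg.inv(ar1(rho + eps, t))
            - np.linalg.inv(ar1(rho - eps, t))) / (2 * eps)       # finite difference
dRinv_id = -np.linalg.inv(R) @ dR @ np.linalg.inv(R)              # the identity
assert np.allclose(dRinv_fd, dRinv_id, atol=1e-6)
```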


Now from (7)–(10) we can see that the quasi-least squares estimates satisfy $g_1(\hat{\beta}; \hat{\rho}, \hat{\gamma}) = 0$,

$$\operatorname{tr}\left(\left.\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\right|_{\rho = \tilde{\rho}}\,(U(\hat{\beta}, \hat{\gamma}) - \phi R_T(\hat{\rho}))\right) = 0 \qquad (15)$$

and

$$\operatorname{tr}\left(\left.\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\right|_{\gamma = \tilde{\gamma}}\,(V(\hat{\beta}, \hat{\rho}) - \phi R_P(\hat{\gamma}))\right) = 0 \qquad (16)$$

for all $\phi$. In particular, Eqs. (15) and (16) are satisfied when $\phi = \hat{\phi}$. Thus the method of quasi-least squares provides a feasible solution to the unbiased estimating equations $g_i = 0$ for $i = 1, 2, 3$. If the data are from a normal population, then these three equations are also the optimal unbiased estimating equations according to Godambe's criterion. Regardless of normality, closeness of these estimating equations to optimality corresponds to closeness of (constant multiples of) $\operatorname{Cov}(\operatorname{vec}(U(\beta, \gamma)))$ and $\operatorname{Cov}(\operatorname{vec}(V(\beta, \rho)))$ to $(R_T(\rho) \otimes R_T(\rho))$ and $(R_P(\gamma) \otimes R_P(\gamma))$, respectively.

2.3. An algorithm

In general, a closed-form solution does not exist for the estimating equations (6)–(8), and we need to solve these equations using a recursive procedure and an optimization routine. An iterative algorithm for obtaining the first stage quasi-least squares estimates of $\beta$, $\rho$ and $\gamma$ can be described as follows (a code sketch follows at the end of this subsection):

Step 1: Start with a trial value $\beta_0$.
Step 2: Fix a trial value $\gamma_0$ and compute $U_0 = U(\beta_0, \gamma_0)$.
Step 3: Get the estimate $\rho_0$ minimizing $\operatorname{tr}(R_T^{-1}(\rho)\, U_0)$ with respect to $\rho$.
Step 4: Compute $V_0 = V(\beta_0, \rho_0)$.
Step 5: Get the estimate $\gamma_1$ minimizing $\operatorname{tr}(R_P^{-1}(\gamma)\, V_0)$ with respect to $\gamma$.
Step 6: Repeat Steps 2–5 with $\gamma_0 = \gamma_1$, until convergence, and obtain $(\rho_0, \gamma_0)$.
Step 7: Compute the updated value

$$\beta_1 = \beta_0 + \left(\sum_{i=1}^{n} D_{i0}'\, \Sigma_{i0}^{-1}\, D_{i0}\right)^{-1} \left(\sum_{i=1}^{n} D_{i0}'\, A_{i0}^{-1/2}\,(R_T^{-1}(\rho_0) \otimes R_P^{-1}(\gamma_0))\, z_i(\beta_0)\right),$$

where $\Sigma_{i0} = \Sigma_i(\theta_0)$, $\theta_0 = (\beta_0', \rho_0', \gamma_0')'$, $A_{i0} = A_i(\beta_0)$, $D_{i0} = D_i(\beta_0)$ and $D_i(\beta) = \partial \mu_i(\beta)/\partial \beta'$. Stop the iterative procedure if $\beta_1 \approx \beta_0$ and set $\tilde{\beta} = \beta_0$, $\tilde{\rho} = \rho_0$ and $\tilde{\gamma} = \gamma_0$. Otherwise repeat Steps 2–6 with $\beta_0$ replaced by $\beta_1$.

We note that for most of the commonly used correlation structures the second stage in the quasi-least squares method does not require an iterative procedure, since the estimates can be obtained in closed form as shown in the next section.
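The following sketch illustrates the first-stage iteration for the identity link ($\mu_i(\beta) = X_i\beta$, $A_i = I$), an AR(1) $R_T(\rho)$ and an equicorrelated $R_P(\gamma)$. It is our own illustration of the steps above, not the authors' code, and all names are ours:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ar1(rho, t):
    return rho ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))

def equi(gamma, p):
    return (1.0 - gamma) * np.eye(p) + gamma * np.ones((p, p))

def qls_first_stage(Y_list, X_list, p, t, tol=1e-8):
    """Y_list: n response vectors y_i of length p*t; X_list: n design matrices (p*t, q)."""
    n = len(Y_list)
    beta = np.linalg.lstsq(np.vstack(X_list), np.concatenate(Y_list), rcond=None)[0]  # Step 1
    rho = gamma = 0.0
    while True:
        # standardized residual matrices Z_i (p x t); vec is column-major
        Z = [(yi - Xi @ beta).reshape(t, p).T for yi, Xi in zip(Y_list, X_list)]
        for _ in range(200):                                          # Steps 2-6
            U = sum(Zi.T @ np.linalg.inv(equi(gamma, p)) @ Zi for Zi in Z) / (n * p)
            rho = minimize_scalar(lambda r: np.trace(np.linalg.inv(ar1(r, t)) @ U),
                                  bounds=(-0.999, 0.999), method="bounded").x   # Step 3
            V = sum(Zi @ np.linalg.inv(ar1(rho, t)) @ Zi.T for Zi in Z) / (n * t)
            g_new = minimize_scalar(lambda g: np.trace(np.linalg.inv(equi(g, p)) @ V),
                                    bounds=(-1.0 / (p - 1) + 1e-6, 0.999),
                                    method="bounded").x               # Step 5
            done = abs(g_new - gamma) < tol
            gamma = g_new
            if done:
                break
        # Step 7: scoring update of beta with Sigma_i = R_T(rho) (x) R_P(gamma)
        W = np.linalg.inv(np.kron(ar1(rho, t), equi(gamma, p)))
        H = sum(Xi.T @ W @ Xi for Xi in X_list)
        s = sum(Xi.T @ W @ (yi - Xi @ beta) for yi, Xi in zip(Y_list, X_list))
        step = np.linalg.solve(H, s)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            return beta, rho, gamma
```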

3. Correlation structures

Here we consider several popular correlation structures, including the unstructured correlation for $R_T(\rho)$, and illustrate the method of minimization of $\operatorname{tr}(R_T^{-1}(\rho)\, U)$ needed in Step 3 of the algorithm. We will also obtain feasible, unique and often closed-form solutions to Eq. (9) for these correlation structures. The structures assumed here for $R_T(\rho)$ can be assumed for $R_P(\gamma)$ as well. The form of the solutions $\tilde{\rho}$ and $\hat{\rho}$ obtained here for $\rho$ can be used for obtaining $\tilde{\gamma}$ in the algorithm described in Section 2.3 and the solution $\hat{\gamma}$ of Eq. (10) in Section 2.1.

3.1. Equicorrelated correlation structure

Suppose that the $t$ repeated measurements are equicorrelated, that is, the correlation structure $R_T(\rho)$ is of the form $R_T(\rho) = (1 - \rho)I + \rho J$, where $I$ is the identity matrix and $J$ is a matrix of 1's. Since

$$R_T^{-1}(\rho) = \frac{1}{(1 - \rho)}\left(I - \frac{\rho}{(1 + (t - 1)\rho)}\, J\right),$$

we have

$$\operatorname{tr}(R_T^{-1}(\rho)\, U) = \frac{1}{(1 - \rho)}\operatorname{tr}(U) - \frac{\rho}{(1 - \rho)(1 + (t - 1)\rho)}\operatorname{tr}(UJ) = \frac{a}{(1 - \rho)} - \frac{\rho\, b}{(1 - \rho)(1 + (t - 1)\rho)}, \qquad (17)$$

where $a = \operatorname{tr}(U)$ and $b = \operatorname{tr}(UJ)$. Taking derivatives, we can check that the function (17) has a unique point of minimum in the interval $(-1/(t - 1), 1)$, given by

$$\tilde{\rho} = \frac{-a(t - 1) + \sqrt{b(t - 1)(at - b)}}{(t - 1)(a(t - 1) - b)}.$$

Also, in this case there is a unique solution to Eq. (9) in the interval $(-1/(t - 1), 1)$ and it is given by

$$\hat{\rho} = \zeta_1(\tilde{\rho}) = \frac{\tilde{\rho}^2(t - 2) + 2\tilde{\rho}}{1 + \tilde{\rho}^2(t - 1)} = \frac{b - a}{(t - 1)\, a}. \qquad (18)$$

We can verify that the function $\zeta_1(\cdot)$ is a continuous, one-to-one and onto function on the interval $(-1/(t - 1), 1)$.
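For this structure both stages reduce to the closed forms above; a minimal sketch (ours):

```python
import numpy as np

def equicorrelated_qls(U):
    """rho_tilde (first stage) and rho_hat = zeta_1(rho_tilde) of Eq. (18) from a t x t matrix U."""
    t = U.shape[0]
    a = np.trace(U)
    b = U.sum()                        # tr(UJ) is the sum of all entries of U
    rho_tilde = (-a * (t - 1) + np.sqrt(b * (t - 1) * (a * t - b))) \
                / ((t - 1) * (a * (t - 1) - b))
    rho_hat = (b - a) / ((t - 1) * a)
    return rho_tilde, rho_hat

# Sanity check: if U is exactly equicorrelated with parameter rho, rho_hat recovers it.
t, rho = 5, 0.4
U = (1 - rho) * np.eye(t) + rho * np.ones((t, t))
assert np.isclose(equicorrelated_qls(U)[1], rho)
```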

3.2. First-order autoregressive (AR(1)) correlation structure

Consider the situation where the correlation between the $t$ repeated measurements decreases with time. A commonly used correlation structure in this situation is the first-order autoregressive (AR(1)) structure, $R_T(\rho) = [\rho^{|i-j|}]$. Here

$$R_T^{-1}(\rho) = \frac{1}{(1 - \rho^2)}\,[I - \rho\, C_1 + \rho^2\, C_2],$$

where $C_1$ is a tridiagonal matrix with 0 on the diagonal and 1 on the lower and upper diagonals, and $C_2 = \operatorname{diag}(0, 1, \ldots, 1, 0)$. Therefore,

$$\operatorname{tr}(R_T^{-1}(\rho)\, U) = \frac{1}{(1 - \rho^2)}\,[\operatorname{tr}(U) - \rho \operatorname{tr}(C_1 U) + \rho^2 \operatorname{tr}(C_2 U)] = \frac{1}{(1 - \rho^2)}\,[a - \rho\, c_1 + \rho^2\, c_2], \qquad (19)$$

where $a = \operatorname{tr}(U)$, $c_1 = \operatorname{tr}(C_1 U)$, and $c_2 = \operatorname{tr}(C_2 U)$. We can easily check that (19) has a unique point of minimum in the interval $(-1, 1)$ given by

$$\tilde{\rho} = \frac{(a + c_2) - \sqrt{(a + c_2)^2 - c_1^2}}{c_1}. \qquad (20)$$

In this case the feasible solution to Eq. (9) is

$$\hat{\rho} = \zeta_1(\tilde{\rho}) = \frac{2\tilde{\rho}}{(1 + \tilde{\rho}^2)} = \frac{c_1}{a + c_2}. \qquad (21)$$

The function $\zeta_1(\cdot)$ is a continuous, one-to-one and onto function on the interval $(-1, 1)$. We will use the above estimate $\hat{\rho}$ in the example discussed in Section 5.
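Similarly for AR(1), (20) and (21) give both estimates directly; a minimal sketch (ours):

```python
import numpy as np

def ar1_qls(U):
    """rho_tilde of Eq. (20) and bias-corrected rho_hat of Eq. (21) from a t x t matrix U."""
    t = U.shape[0]
    C1 = np.diag(np.ones(t - 1), 1) + np.diag(np.ones(t - 1), -1)
    C2 = np.diag(np.r_[0.0, np.ones(t - 2), 0.0])
    a, c1, c2 = np.trace(U), np.trace(C1 @ U), np.trace(C2 @ U)
    rho_tilde = ((a + c2) - np.sqrt((a + c2) ** 2 - c1 ** 2)) / c1
    rho_hat = c1 / (a + c2)            # equals 2*rho_tilde / (1 + rho_tilde**2)
    return rho_tilde, rho_hat

# Sanity check: if U is exactly AR(1) with parameter rho, rho_hat recovers it.
t, rho = 4, 0.7
U = rho ** np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
assert np.isclose(ar1_qls(U)[1], rho)
```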

3.3. Tri-diagonal structure

Let $R_T(\rho)$ be a tri-diagonal matrix, that is, the diagonal elements of $R_T(\rho)$ are one, all the elements immediately above and below the diagonal are equal to $\rho$, and the other elements are zero. Here $R_T^{-1}(\rho)$ does not have a simple closed form, but the matrix $R_T(\rho)$ admits a spectral decomposition $R_T(\rho) = P\,\Lambda(\rho)\,P'$, where $P$, the matrix of orthogonal eigenvectors, does not depend on $\rho$. See Chaganty (1997, Example 4.2). Now

$$\operatorname{tr}(R_T^{-1}(\rho)\, U) = \operatorname{tr}(\Lambda^{-1}(\rho)\, P' U P) = \sum_{k=1}^{t} \frac{u_k}{1 + 2\rho \cos(k\pi/(t + 1))}, \qquad (22)$$

where $u_k$ is the $k$th diagonal element of $P'UP$. It is well known that $R_T(\rho)$ is positive definite if and only if $\rho$ falls in the interval $(\lambda_1, \lambda_t)$, where

$$\lambda_j = \frac{-1}{2\cos(j\pi/(t + 1))}, \qquad j = 1, t.$$

We can verify that (22) has a unique point of minimum $\tilde{\rho}$ in the interval $(\lambda_1, \lambda_t)$, which can be computed numerically. In this case the second stage estimate $\hat{\rho}$ is available in closed form and is given by

$$\hat{\rho} = \zeta_1(\tilde{\rho}) = -\frac{1}{2}\,\frac{\sum_{k=1}^{t} b_k}{\sum_{k=1}^{t} b_k \cos(k\pi/(t + 1))}, \quad \text{where } b_k = \frac{\cos(k\pi/(t + 1))}{(1 + 2\tilde{\rho}\cos(k\pi/(t + 1)))^2}.$$
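For the tri-diagonal structure, $\tilde{\rho}$ needs a one-dimensional numerical search over $(\lambda_1, \lambda_t)$, while $\hat{\rho}$ is then explicit; a sketch (ours, assuming the standard sine eigenvectors of a tridiagonal Toeplitz matrix):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def tridiagonal_qls(U):
    """Numerical rho_tilde minimizing Eq. (22) and the closed-form rho_hat."""
    t = U.shape[0]
    k = np.arange(1, t + 1)
    cos_k = np.cos(k * np.pi / (t + 1))
    P = np.sqrt(2.0 / (t + 1)) * np.sin(np.outer(k, k) * np.pi / (t + 1))
    u = np.diag(P.T @ U @ P)                       # u_k of Eq. (22)
    lo = -1.0 / (2.0 * cos_k[0]) + 1e-8            # lambda_1
    hi = -1.0 / (2.0 * cos_k[-1]) - 1e-8           # lambda_t
    obj = lambda r: np.sum(u / (1.0 + 2.0 * r * cos_k))
    rho_tilde = minimize_scalar(obj, bounds=(lo, hi), method="bounded").x
    b = cos_k / (1.0 + 2.0 * rho_tilde * cos_k) ** 2
    rho_hat = -0.5 * b.sum() / (b @ cos_k)
    return rho_tilde, rho_hat
```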

3.4. Unstructured correlation matrix

Suppose that the correlation between the $t$ repeated measurements, $R_T(\rho) = R_T$, is an unstructured positive definite correlation matrix. As shown in Chaganty (1997), the point of minimum $\tilde{R}_T$ in Step 3 of the algorithm described in Section 2.3 can be obtained recursively, starting with any positive definite diagonal matrix $\Lambda_0$ and computing

$$\Lambda_k = \operatorname{diag}\bigl((\Lambda_{k-1}^{1/2}\, U\, \Lambda_{k-1}^{1/2})^{1/2}\bigr)$$

at the $k$th step, ending the recursive process as soon as $\Lambda_k \approx \Lambda_{k-1} = \tilde{\Lambda}$. The matrix $\tilde{R}_T$ is equal to $\tilde{\Lambda}^{-1/2}(\tilde{\Lambda}^{1/2}\, U\, \tilde{\Lambda}^{1/2})^{1/2}\,\tilde{\Lambda}^{-1/2}$. The bias-corrected correlation matrix is given by

$$\hat{R}_T = \begin{cases} \tilde{R}_T\,\Phi_T\,\tilde{R}_T & \text{if } \Phi_T \geq 0, \\ (\operatorname{diag}(U))^{-1/2}\, U\,(\operatorname{diag}(U))^{-1/2} & \text{otherwise.} \end{cases} \qquad (23)$$

Here $\Phi_T = \operatorname{diag}[(\tilde{R}_T \circ \tilde{R}_T)^{-1} e]$, $e$ is a vector of 1's and $\circ$ denotes the Hadamard product (see Chaganty and Shults, 1999). Similarly, we can construct an estimate $\hat{R}_P$ for $R_P(\gamma) = R_P$ when it is an unknown unstructured correlation matrix. We will use the estimate $\hat{R}_P$ in the example described in Section 5.
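The fixed-point recursion and the bias correction (23) translate directly into code; a sketch (ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def unstructured_qls(U, tol=1e-10, max_iter=500):
    """First-stage R_tilde and bias-corrected R_hat of Eq. (23) for an unstructured R_T."""
    t = U.shape[0]
    lam = np.ones(t)                               # Lambda_0 = I; any p.d. diagonal works
    for _ in range(max_iter):
        L = np.sqrt(lam)
        S = np.real(sqrtm(L[:, None] * U * L[None, :]))   # (Lam^1/2 U Lam^1/2)^1/2
        lam_new = np.diag(S).copy()
        if np.max(np.abs(lam_new - lam)) < tol:
            break
        lam = lam_new
    L = np.sqrt(lam)
    R_tilde = np.real(sqrtm(L[:, None] * U * L[None, :])) / np.outer(L, L)
    phi = np.linalg.solve(R_tilde * R_tilde, np.ones(t))  # (R~ o R~)^{-1} e
    if np.all(phi >= 0):                           # Eq. (23)
        R_hat = R_tilde @ np.diag(phi) @ R_tilde
    else:
        d = 1.0 / np.sqrt(np.diag(U))
        R_hat = d[:, None] * U * d[None, :]
    return R_tilde, R_hat
```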

4. Large sample inference

In this section we study the large sample properties of the quasi-least squares estimates. We show that the estimates are consistent and asymptotically normal. We also propose a test statistic for testing a hypothesis concerning the regression parameter and derive its asymptotic distribution.

4.1. Asymptotic distribution

Here we will establish consistency and joint asymptotic normality of the quasi-least squares estimates. Let $\theta = (\beta', \rho', \gamma')'$ be the vector consisting of the regression and the correlation parameters. Note that the first stage quasi-least squares estimate $\tilde{\theta}$ is the solution of the equation $\sum_{i=1}^{n} h_i(\theta) = 0$, where $h_i(\theta) = (h_{1i}'(\theta), h_{2i}'(\theta), h_{3i}'(\theta))'$ and

$$h_{1i}(\theta) = D_i'(\beta)\, A_i^{-1/2}(\beta)\,(R_T^{-1}(\rho) \otimes R_P^{-1}(\gamma))\, z_i(\beta),$$

$$h_{2i}(\theta) = \operatorname{tr}\left(\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\, Z_i'(\beta)\, R_P^{-1}(\gamma)\, Z_i(\beta)\right),$$

$$h_{3i}(\theta) = \operatorname{tr}\left(\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\, Z_i(\beta)\, R_T^{-1}(\rho)\, Z_i'(\beta)\right).$$

The expected value of $h_i(\theta)$ does not depend on $i$ and equals

$$\xi(\theta) = \left(0',\; \phi p\, \operatorname{tr}\left(\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\, R_T(\rho)\right),\; \phi t\, \operatorname{tr}\left(\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\, R_P(\gamma)\right)\right)'. \qquad (24)$$

Since $E(z_i(\beta)) = 0$, we can check that $I_n(\theta) = -(1/n)\sum_{i=1}^{n} E(\partial h_i(\theta)/\partial \theta')$ is of the form

$$I_n(\theta) = \begin{pmatrix} I_{n11}(\theta) & 0 & 0 \\ 0 & I_{n22}(\theta) & I_{n23}(\theta) \\ 0 & I_{n23}'(\theta) & I_{n33}(\theta) \end{pmatrix}. \qquad (25)$$


In the above, the three partitions are made according to the dimensions of the three vectors $\beta$, $\rho$ and $\gamma$, respectively. Similarly, we can partition $M_n(\theta) = (1/n)\sum_{i=1}^{n} \operatorname{Cov}(h_i(\theta))$ as

$$M_n(\theta) = \begin{pmatrix} M_{n11}(\theta) & M_{n12}(\theta) & M_{n13}(\theta) \\ M_{n12}'(\theta) & M_{n22}(\theta) & M_{n23}(\theta) \\ M_{n13}'(\theta) & M_{n23}'(\theta) & M_{n33}(\theta) \end{pmatrix}, \qquad (26)$$

where $M_{njk}(\theta) = (1/n)\sum_{i=1}^{n} \operatorname{Cov}(h_{ji}(\theta), h_{ki}(\theta))$. We can check that $M_{n11}(\theta) = \phi I_{n11}(\theta)$, where

$$I_{n11}(\theta) = \frac{1}{n}\sum_{i=1}^{n} D_i'(\beta)\, \Sigma_i^{-1}(\theta)\, D_i(\beta) \qquad (27)$$

and $\Sigma_i(\theta)$ is defined in (2). It is possible to express the other matrices $M_{njk}$ and $I_{njk}$ in (25) and (26) explicitly as functions of $\beta$, $\rho$, $\gamma$ and $\phi$; see Chaganty (1997, p. 47) for some details concerning these formulas. The next theorem establishes the asymptotic normality of the first stage quasi-least squares estimates. Below we use the acronym AN for asymptotically normal, as in Serfling (1981, p. 20).

Theorem 4.1. Let $\theta = (\beta', \rho', \gamma')'$ be fixed. Let $\tilde{\theta} = (\tilde{\beta}', \tilde{\rho}', \tilde{\gamma}')'$ be the solution of the equation $\sum_{i=1}^{n} h_i(\theta) = 0$. Let $M_n(\theta) \to M(\theta)$ and $I_n(\theta) \to I(\theta)$ as $n \to \infty$. Assume that a central limit theorem holds for the summands $h_i(\theta)$ and that they satisfy, as a function of $\theta$, the regularity conditions needed for a Taylor series expansion to hold. Then

$$\sqrt{n}\,\bigl(\tilde{\theta} - \theta - [I(\theta)]^{-1}\xi(\theta)\bigr) \to N\bigl(0,\; [I(\theta)]^{-1} M(\theta)\,[I(\theta)]^{-1}\bigr) \quad \text{as } n \to \infty, \qquad (28)$$

where $\xi(\theta)$ is defined in (24).

Proof. Since $\sum_{i=1}^{n} h_i(\tilde{\theta}) = 0$, using a Taylor series expansion and a standard argument we can verify that the asymptotic distribution of $(\tilde{\theta} - \theta)$ is the same as the asymptotic distribution of

$$\left(-\frac{1}{n}\sum_{i=1}^{n} E\left(\frac{\partial h_i(\theta)}{\partial \theta'}\right)\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} h_i(\theta)\right) = [I_n(\theta)]^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} h_i(\theta)\right). \qquad (29)$$

Note that $E(h_i(\theta)) = \xi(\theta)$ for all $i$. Since $M_n(\theta)$ converges to $M(\theta)$ and the summands $h_i(\theta)$ satisfy a central limit theorem, we conclude that

$$\frac{1}{n}\sum_{i=1}^{n} h_i(\theta) \;\text{ is AN}\left(\xi(\theta),\; \frac{M(\theta)}{n}\right). \qquad (30)$$

Since $I_n(\theta)$ converges to $I(\theta)$, from (29) and (30) we get that

$$(\tilde{\theta} - \theta) \;\text{ is AN}\left([I(\theta)]^{-1}\xi(\theta),\; \frac{[I(\theta)]^{-1} M(\theta)\,[I(\theta)]^{-1}}{n}\right), \qquad (31)$$

which is equivalent to (28). This completes the proof of the theorem.

Since $I(\theta)$ is the limit of (25), from the above theorem and using (24), we can see that the first stage quasi-least squares estimate $\tilde{\beta}$ is a consistent estimate of $\beta$,


whereas $\tilde{\rho}$ and $\tilde{\gamma}$ are asymptotically biased. To get consistent estimates of $\rho$ and $\gamma$ we make a transformation of $\tilde{\theta}$, which depends on the structures of the correlation matrices $R_T(\rho)$ and $R_P(\gamma)$. Let

$$b(\bar{\theta}; \theta) = \bigl(b_1'(\bar{\theta}; \theta),\; b_2(\bar{\theta}; \theta),\; b_3(\bar{\theta}; \theta)\bigr)' \qquad (32)$$

be a function of $\bar{\theta} = (\bar{\beta}', \bar{\rho}', \bar{\gamma}')'$ and $\theta = (\beta', \rho', \gamma')'$, where

$$b_1(\bar{\theta}; \theta) = \bar{\beta} - \beta, \qquad b_2(\bar{\theta}; \theta) = \operatorname{tr}\left(\left.\frac{\partial R_T^{-1}(\rho)}{\partial \rho}\right|_{\rho = \bar{\rho}} R_T(\rho)\right), \qquad b_3(\bar{\theta}; \theta) = \operatorname{tr}\left(\left.\frac{\partial R_P^{-1}(\gamma)}{\partial \gamma}\right|_{\gamma = \bar{\gamma}} R_P(\gamma)\right). \qquad (33)$$

Note that the second stage quasi-least squares estimate is the solution $\hat{\theta} = \zeta(\tilde{\theta})$ of the equation $b(\tilde{\theta}; \theta) = 0$. The next theorem shows that $\hat{\theta}$ is a consistent estimate of $\theta$ and asymptotically normal.

Theorem 4.2. Let $\theta = (\beta', \rho', \gamma')'$ be fixed. Let $\tilde{\theta}$ be as in Theorem 4.1 and $b(\tilde{\theta}; \theta)$ be as defined in (32). Assume that the conditions of Theorem 4.1 hold. Let $\hat{\theta} = (\hat{\beta}', \hat{\rho}', \hat{\gamma}')'$ be the solution, say $\zeta(\tilde{\theta})$, of the equation $b(\tilde{\theta}; \theta) = 0$, where $\zeta(\cdot)$ is a differentiable function. Then $\sqrt{n}\,(\hat{\theta} - \theta)$ converges in distribution to a normal distribution with mean $0$ and covariance matrix

$$\Omega(\theta) = \nabla\zeta(\theta^*)\,[I(\theta)]^{-1} M(\theta)\,[I(\theta)]^{-1}\,[\nabla\zeta(\theta^*)]', \qquad (34)$$

where $\theta^* = \theta + [I(\theta)]^{-1}\xi(\theta)$ and $\nabla\zeta(\theta^*)$ is $\partial\zeta(\theta)/\partial\theta'$ evaluated at $\theta = \theta^*$. Finally, $\hat{\phi}$, as defined in (11), is a consistent estimate of $\phi$.

Proof. From Theorem 4.1, we know that $\tilde{\theta} = (\tilde{\beta}', \tilde{\rho}', \tilde{\gamma}')'$ converges to $\theta^* = (\beta^{*\prime}, \rho^{*\prime}, \gamma^{*\prime})'$ as $n \to \infty$. Note that $\beta^* = \beta$. Using the weak law of large numbers, we can check that

$$U(\tilde{\beta}, \tilde{\gamma}) = \frac{1}{np}\sum_{i=1}^{n} Z_i'(\tilde{\beta})\, R_P^{-1}(\tilde{\gamma})\, Z_i(\tilde{\beta}) \;\to\; \frac{\phi}{p}\,\operatorname{tr}\bigl(R_P^{-1}(\gamma^*)\, R_P(\gamma)\bigr)\, R_T(\rho) \qquad (35)$$

and

$$V(\tilde{\beta}, \tilde{\rho}) = \frac{1}{nt}\sum_{i=1}^{n} Z_i(\tilde{\beta})\, R_T^{-1}(\tilde{\rho})\, Z_i'(\tilde{\beta}) \;\to\; \frac{\phi}{t}\,\operatorname{tr}\bigl(R_T^{-1}(\rho^*)\, R_T(\rho)\bigr)\, R_P(\gamma). \qquad (36)$$

Taking the limit as $n \to \infty$, using (35) and (36) we get

$$0 = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} h_i(\tilde{\theta}) = \begin{pmatrix} 0 \\ \phi\,\operatorname{tr}\bigl(R_P^{-1}(\gamma^*)\, R_P(\gamma)\bigr)\, b_2(\theta^*; \theta) \\ \phi\,\operatorname{tr}\bigl(R_T^{-1}(\rho^*)\, R_T(\rho)\bigr)\, b_3(\theta^*; \theta) \end{pmatrix}, \qquad (37)$$

where the functions $b_2$ and $b_3$ are defined in (33). Since $\beta^* = \beta$, from (37) we can see that $b(\theta^*; \theta) = 0$. Thus $\theta = \zeta(\theta^*)$, and therefore $\hat{\theta} = \zeta(\tilde{\theta})$ converges to $\theta$. This

establishes consistency of the second stage quasi-least squares estimate $\hat{\theta}$. By the delta theorem it follows that $\sqrt{n}\,(\hat{\theta} - \theta)$ converges in distribution to a normal distribution with mean $0$ and covariance matrix $\Omega(\theta)$. Finally, consistency of $\hat{\theta}$ implies that $\hat{\phi}$ is a consistent estimate of $\phi$. This completes the proof of the theorem.

Remark 4.1. Since $M_{n11}(\theta) = \phi I_{n11}(\theta)$, taking the limit as $n \to \infty$ and using (27) and Theorem 4.2, we can see that $\sqrt{n}\,(\hat{\beta} - \beta)$ converges to a $q$-variate normal distribution with mean $0$ and covariance matrix $\phi\,[C(\theta)]^{-1}$, where

$$C(\theta) = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} D_i'(\beta)\, \Sigma_i^{-1}(\theta)\, D_i(\beta). \qquad (38)$$

Thus $\hat{\beta}$ is asymptotically an efficient estimator. The same asymptotic property also holds for the estimate of $\beta$ obtained solving (6) after substituting $\hat{\rho}$, $\hat{\gamma}$ for $\rho$ and $\gamma$, respectively.

4.2. Test of hypothesis

Suppose that we are interested in testing the null hypothesis $K'\beta = m$, where $m$ is a known $s \times 1$ vector and $K$ is a known $q \times s$ matrix of rank $s \leq q$. We propose the test statistic

$$T_n = \frac{(K'\hat{\beta} - m)'\,\Bigl(K'\bigl(\sum_{i=1}^{n} D_i'(\hat{\beta})\,\hat{\Sigma}_i^{-1}\, D_i(\hat{\beta})\bigr)^{-1} K\Bigr)^{-1}(K'\hat{\beta} - m)}{\hat{\phi}}, \qquad (39)$$

where $\hat{\Sigma}_i = \Sigma_i(\hat{\theta})$ and $\hat{\phi}$ is defined in (11). Large values of $T_n$ are considered to be significant. It can be seen easily from Theorem 4.2 and Remark 4.1 that, under the above null hypothesis, $T_n$ converges to a central $\chi^2$ with $s$ degrees of freedom. We will use the test statistic $T_n$ to test various hypotheses in the example considered in Section 5.
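The statistic (39) is straightforward to compute once the QLS fit is available; a minimal sketch (our names, assuming a common $\hat{\Sigma}_i$ across subjects):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta_hat, phi_hat, D_list, Sigma_inv, K, m):
    """T_n of Eq. (39) and its chi-square p-value with s = K.shape[1] degrees of freedom."""
    H = sum(Di.T @ Sigma_inv @ Di for Di in D_list)       # sum_i D_i' Sigma_i^{-1} D_i
    diff = K.T @ beta_hat - m
    T_n = diff @ np.linalg.solve(K.T @ np.linalg.solve(H, K), diff) / phi_hat
    return T_n, chi2.sf(T_n, df=K.shape[1])
```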

5. An example

To illustrate the estimation of the regression, correlation and scale parameters, and to perform certain tests of hypotheses, here we present the analysis of Zullo's dental data that appeared in Timm (1980, Table 7.2). The data has two treatment groups and three response variables measured over three time periods. The study was concerned with the relative effectiveness of two orthopedic adjustments of the mandible. Nine subjects were assigned to each of the two orthopedic treatments, say T1 and T2 ($g = 2$, $n_1 = 9$, $n_2 = 9$), called activator treatments. The measurements were made on three characteristics ($p = 3$), namely SOr-Me (in mm), ANS-Me (in mm), and Pal-MP angle (in degrees), to assess the changes in the vertical position of the mandible at three time periods ($t = 3$) of activator treatment. The three null hypotheses of interest here are: (i) there is no group and time interaction, (ii) there is no group effect, and (iii) there is no time effect.

We analyze these data using the procedures described in the previous sections by fitting a Kronecker product of an arbitrary correlation structure (for the three response variables) and an AR(1) correlation structure (for the measurements observed over the three time periods). Our main objective in analyzing these data is to illustrate the computation of various estimators by implementing the algorithm given in Section 2.3 and the test statistic given in (39), rather than to gain any new insight about the data itself. As a preparation for fitting this correlation matrix, we first divide the observations corresponding to the three variables, SOr-Me, ANS-Me, and Pal-MP angle, by values close to their respective standard deviations, namely 7.34, 4.76, and 5.56. This is done to make the variances of these variables approximately equal. Suppose $y_{irjk}$ is the observation on the $j$th variable at the $k$th occasion corresponding to the $i$th individual in the $r$th group ($1 \leq r \leq g$). We assume that the expected value of $y_{irjk}$ is given by

$$\mu_{rjk} = \text{var}_j + \text{time}_k + \text{group}_r + (\text{group} * \text{time})_{rk}.$$

To express the model in the standard form, that is, as $E(y_i) = X_i\beta$, we define the following independent variables. First, for $i = 1, 2, 3$, let $xv_i = 1$ if the observation is on variable $i$ and 0 otherwise. The coefficient of each of these variables in the model represents the unstructured mean of that variable. Next, for $l = 1, 2$, let $xt_l = 1$ if the time period is $l + 1$, $-1$ if the time period is 1, and 0 otherwise. Now, the null hypothesis of no time effect is the same as testing that the regression coefficients corresponding to $xt_1$ and $xt_2$ are simultaneously zero. To test for the group effect, we introduce $xg = 1$ if the observation is from group T2 and $xg = -1$ if the observation is from T1. Testing that the coefficient of $xg$ in the model is zero tests the hypothesis that there is no group effect. Finally, the hypothesis of no interaction can be tested by testing that the regression coefficients corresponding to the products $xg * xt_1$ and $xg * xt_2$ are zero. Thus, the eight columns representing the independent variables in $X_i$ are the columns providing the values corresponding to the eight variables $(xv_1, xv_2, xv_3, xt_1, xt_2, xg, xt_1 * xg, xt_2 * xg)$. The quasi-least squares estimate of the regression parameter corresponding to these eight independent variables, obtained using the algorithm given in Section 2.3, is $\hat{\beta} = (16.7591, 13.6798, 4.4253, 0.0385, 0.1674, 0.1105, -0.0032, 0.0009)'$. Under the assumption $\operatorname{Cov}(y_i) = \phi(R_T(\rho) \otimes R_P)$, where $R_T(\rho)$ is the AR(1) correlation matrix and $R_P(\gamma) = R_P$ is the unstructured correlation matrix, the quasi-least squares estimates of the correlation parameters and the scale parameter are $\hat{R}_P(1,2) = 0.7478$, $\hat{R}_P(1,3) = 0.0264$, $\hat{R}_P(2,3) = 0.3364$, $\hat{\rho} = 0.9381$ and $\hat{\phi} = 0.9362$. The null hypothesis of no interaction, that is, $H_0$: $\beta_7 = 0$ and $\beta_8 = 0$, can be written as $K'\beta = 0$, where $K'$ is a $2 \times 8$ matrix with $K'(1,7) = 1$, $K'(2,8) = 1$ and all other elements equal to zero. The value of the test statistic $T_n$ in (39) for testing no interaction is 0.0136, indicating no interaction between the treatment groups and the time period. Similarly, the test for no group effect also showed no significance, with a value of the test statistic equal to 0.4861. Only the time effect is significant, with test statistic value 23.8208 and a corresponding p-value, based on the chi-square distribution with two degrees of freedom, less than 0.0001.
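For concreteness, the $9 \times 8$ design matrix $X_i$ of a single subject can be built as follows (our own construction, with rows ordered as in $\operatorname{vec}(Y_i)$, variables fastest):

```python
import numpy as np

def design_matrix(group):                  # group = +1 for T2, -1 for T1
    """Columns: (xv1, xv2, xv3, xt1, xt2, xg, xt1*xg, xt2*xg)."""
    rows = []
    for k in (1, 2, 3):                    # time period
        for j in (1, 2, 3):                # response variable
            xv = [1.0 if j == v else 0.0 for v in (1, 2, 3)]
            xt = [1.0 if k == l + 1 else (-1.0 if k == 1 else 0.0) for l in (1, 2)]
            rows.append(xv + xt + [float(group)] + [group * x for x in xt])
    return np.array(rows)                  # 9 x 8

X_T2 = design_matrix(+1)                   # design matrix for a subject in group T2
```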
Several authors (e.g. Hauck and Donner, 1977; Moolgavkar and Venzon, 1986), at least in the context of logistic regression, have observed that the Wald-type statistic $T_n$


in (39) requires very large sample sizes for a satisfactory approximation by a chi-square distribution. The sample sizes in the example we just analyzed are in fact very small ($n_1 = n_2 = 9$). A referee has suggested that we perform a simulation experiment to gain an understanding of how large the sample size should be for the chi-square approximation to be valid. We take the same design matrix as in the above example for our simulation. That is, we consider two treatment groups and three variables observed on each individual at three time periods. Corresponding to every individual (sampling unit) we generate a $9 \times 1$ multivariate normal random vector having mean vector zero and variance-covariance matrix $\phi(R_T(\rho) \otimes R_P(\gamma))$. For the present simulation experiment we fix $\phi = 1.0$ and the $3 \times 3$ unstructured correlation matrix with $R_P(1,2) = 0.75$, $R_P(1,3) = 0.03$, and $R_P(2,3) = 0.34$. However, we determine the AR(1) correlation structured matrix $R_T(\rho)$ by the three values $\rho = 0.1, 0.5$, and 0.9. This enables us, to some extent, to study the effect of the AR(1) correlation parameter $\rho$ on the distribution of the test statistic under consideration. Since the purpose of the simulation experiment is to examine the chi-square approximation for the null distribution of the test statistic $T_n$ in (39), we have set the $\beta$ vector to zero. For various values of $n_1$ and $n_2$ we generate 10,000 simulated data sets and for each data set compute the test statistic $T_n$. We then determine the proportion of times the values of this statistic exceed the appropriate 5% chi-square upper tail cut-off value. These proportions for the three tests, viz. no group effect (Group), no time effect (Time) and no group and time interaction (Int), are summarized in Table 1. It is clear from Table 1 that for values of $|\rho| \geq 0.9$, a sample size of at least 50 is needed for the one degree of freedom chi-square approximation to be valid for testing the group effect. Further, a sample size greater than 100 is needed for the two degrees of freedom chi-square approximation to be valid for the distribution of the test statistics for testing the time effect and the interaction effect. For smaller values of $\rho$, that is, for $\rho = 0.1$ and 0.5, the approximation is better even for reasonably large sample sizes ($\geq 30$). More extensive simulations to determine the effects of the parameters of $R_P$ and other design matrices on the chi-square approximation to the distribution of the test statistic can be performed. We postpone such a detailed study to a later date.

Table 1
Estimated level of significance for various effects

Sample          rho = 0.1                    rho = 0.5                    rho = 0.9
(n1 = n2)    Group   Time    Int         Group   Time    Int         Group   Time    Int
9            0.0862  0.0926  0.0966      0.0896  0.0970  0.0918      0.0904  0.1180  0.1128
30           0.0670  0.0684  0.0698      0.0692  0.0676  0.0684      0.0644  0.0854  0.0832
50           0.0656  0.0682  0.0682      0.0596  0.0602  0.0644      0.0574  0.0814  0.0740
100          0.0590  0.0670  0.0598      0.0474  0.0606  0.0610      0.0532  0.0846  0.0794
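The mechanics of the experiment are easy to reproduce; the sketch below (ours) generates the Kronecker-covariance normal data and estimates the rejection rate, with `statistic` standing for any routine that fits the QLS model and returns $T_n$ of (39) for the effect of interest:

```python
import numpy as np
from scipy.stats import chi2

def estimated_size(statistic, rho, n_per_group, df, n_sims=10_000, seed=0):
    """Proportion of simulated data sets with T_n above the 5% chi-square cut-off."""
    rng = np.random.default_rng(seed)
    idx = np.arange(3)
    R_T = rho ** np.abs(np.subtract.outer(idx, idx))      # AR(1), t = 3
    R_P = np.array([[1.00, 0.75, 0.03],
                    [0.75, 1.00, 0.34],
                    [0.03, 0.34, 1.00]])                  # unstructured, p = 3
    cov = np.kron(R_T, R_P)                               # phi = 1, beta = 0
    cutoff = chi2.ppf(0.95, df=df)
    exceed = 0
    for _ in range(n_sims):
        Y = rng.multivariate_normal(np.zeros(9), cov, size=2 * n_per_group)
        exceed += statistic(Y) > cutoff
    return exceed / n_sims
```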


6. Concluding remarks

In this paper we have analyzed multivariate repeated measures data, assuming a scale multiple of Kronecker product correlation structure for the observations on each subject, using the method of quasi-least squares. We presented an algorithm for the computation of the estimates and derived the large sample distributions of the estimators. We have suggested Wald-type tests for testing any linear hypothesis based on the quasi-least squares estimates. Finally, we have implemented these results on a real life data set. Since the quasi-least squares method as applied to the present problem amounts to solving a set of best (optimal if the data are normal) unbiased estimating equations, our procedure is one of the best procedures for analyzing these data without making any distributional assumptions.

Acknowledgements

We thank the associate editor and the referee for their valuable comments and suggestions. This research was partially supported by the USARO grant number DAAH04-96-1-0070.

References

Boik, J.B., 1991. Scheffé's mixed model for multivariate repeated measures: a relative efficiency evaluation. Commun. Statist.-Theory Meth. 20, 1233–1255.
Chaganty, N.R., 1997. An alternative approach to the analysis of longitudinal data via generalized estimating equations. J. Statist. Plann. Inference 63, 39–54.
Chaganty, N.R., Shults, J., 1999. On eliminating the asymptotic bias in the quasi-least squares estimate of the correlation parameter. J. Statist. Plann. Inference 76, 145–161.
Crowder, M., 1995. On the use of a working correlation matrix in using generalized linear models for repeated measures. Biometrika 82, 407–410.
Galecki, A.T., 1994. General class of covariance structures for two or more repeated factors in longitudinal data analysis. Commun. Statist.-Theory Meth. 23, 3105–3120.
Godambe, V.P., 1960. An optimum property of regular maximum likelihood estimation. Ann. Math. Statist. 31, 1208–1211.
Hauck, W.W., Donner, A., 1977. Wald's test as applied to hypotheses in logit analysis. J. Amer. Statist. Assoc. 72, 851–853.
Heyde, C.C., 1997. Quasi-Likelihood and its Applications. Springer, New York.
Liang, K.-Y., Zeger, S.L., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Moolgavkar, S.H., Venzon, D.J., 1986. Confidence regions for case-control and survival studies with general relative risk functions. In: Moolgavkar, S.H., Prentice, R.L. (Eds.), Modern Statistical Methods in Chronic Disease Epidemiology. Wiley, New York, pp. 104–120.
Naik, D.N., Rao, S., 2001. Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. J. Appl. Statist. 28, 91–105.
Rao, C.R., Rao, M.B., 1998. Matrix Algebra and its Applications to Statistics and Econometrics. World Scientific, Singapore.
Serfling, R.J., 1981. Approximation Theorems of Mathematical Statistics. Wiley, New York.
Shults, J., 2000. Modeling the correlation structure of data that have multiple levels of association. Commun. Statist.-Theory Meth. 29, 1005–1015.

Shults, J., Chaganty, N.R., 1998. Analysis of serially correlated data using quasi-least squares. Biometrics 54, 1622–1630.
Shults, J., Morrow, A.L., 2002. The use of quasi-least squares to adjust for two levels of correlation. Biometrics, to appear.
Sutradhar, B.C., Das, K., 1999. On the efficiency of regression estimators in generalized linear models for correlated data. Biometrika 86, 459–465.
Timm, N.H., 1980. Multivariate analysis of variance of repeated measurements. In: Krishnaiah, P.R. (Ed.), Handbook of Statistics, Vol. 1. North-Holland, New York, pp. 41–87.
Zellner, A., 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57, 348–368.
Zellner, A., 1963. Estimators for seemingly unrelated regressions: some exact finite sample results. J. Amer. Statist. Assoc. 58, 977–992.