Journal of Statistical Planning and Inference 141 (2011) 3345–3355
Contents lists available at ScienceDirect
Journal of Statistical Planning and Inference journal homepage: www.elsevier.com/locate/jspi
Variance least squares estimators for multivariate linear mixed model Jun Han Mathematics and Statistics Department, Georgia State University, Atlanta, GA 30303, United States
a r t i c l e i n f o
abstract
Article history: Received 30 August 2010 Received in revised form 17 April 2011 Accepted 20 April 2011 Available online 28 April 2011
Non-iterative, distribution-free, and unbiased estimators of variance components by least squares method are derived for multivariate linear mixed model. A general inter-cluster variance matrix, a same-member only general inter-response variance matrix, and an uncorrelated intra-cluster error structure for each response are assumed. Projection method is suggested when unbiased estimators of variance components are not nonnegative definite matrices. A simulation study is conducted to investigate the properties of the proposed estimators in terms of bias and mean square error with comparison to the Gaussian (restricted) maximum likelihood estimators. The proposed estimators are illustrated by an application of gene expression familial study. & 2011 Elsevier B.V. All rights reserved.
Keywords: Variance component Covariance matrix Variance least squares estimator Unbiased Non-iterative Distribution-free Multivariate mixed model Matrix differentiation Kronecker product
1. Introduction In biomedical sciences, such as survival analysis and genetic epidemiology, research questions can only be fully answered if multiple repeatedly measured outcomes are jointly modeled simultaneously. In controlled clinical trials, systolic and diastolic blood pressure are often jointly modeled to detect a longitudinal treatment effect on cardiovascular diseases. In familial studies, multiple traits are jointly modeled to investigate both familial association and trait association. In AIDS infection, plasma viral load HIV RNA, marker of the immune system status CD4, and marker of inflammation b2 microglobuline, can be modeled concurrently to explore relationship between markers, and to improve the likelihood value of joint model compared with those of separate univariate models (Thie´baut et al., 2002). For patients with polycystic kidney disease, the biomarkers for kidney structure and function are simultaneously modeled to study the joint evolution of these response variables over time. In epidemiological studies of hepatitis B virus (HBV) infected liver disease, although many HBV markers such as hepatitis B surface antigen (HBsAg), hepatitis B surface antibody (HBsAb), hepatitis Be antigen (HBeAg), hepatitis Be antibody (HBeAb), hepatitis B core antigen (HBcAg), hepatitis B core antibody (HBcAb), immunoglobulin (Ig) G and M have been identified and used in diagnosing and monitoring the progress of disease, no single marker can unambiguously diagnose the infection (Chan, 2002). Consequently, it becomes imperative to jointly model these markers for diagnostic and therapeutic purposes. For these multivariate clustered data sets, not only the correlation between repeated measurements of one outcome is taken into account, but also the
E-mail address:
[email protected] 0378-3758/$ - see front matter & 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2011.04.018
3346
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
correlation structure between the different outcomes should be allowed for. Therefore, random-effects model (Laird and Ware, 1982) has become a powerful and flexible tool to analyze multiple responses. We consider the situation that several characteristics are measured on each individual at each occasion under a multivariate random-effects and error structure for repeated measurements or longitudinal observations. Let l ¼1,y,k index the characteristic or the response, i¼1,y,m index the cluster or the subject, and j ¼ 1, . . . ,nli index the membership or the occasion within the cluster, for a repeated measurement or a longitudinal observation ylij. In a multivariate linear mixed model, the nli dimensional response vector yli, the nli pl fixed-effects design matrix Xli and the nli ql randomeffects design matrix Zli satisfy the characteristic-cluster indexed structural relation yli ¼ Xli bl þ Zli bli þ eli ,
l ¼ 1, . . . ,k, i ¼ 1, . . . ,m,
ð1Þ
where bl is a pl dimensional fixed-effects coefficient vector, bli is a ql dimensional random-effects coefficient vector with mean zero and finite covariance matrix Dll, and eli is an nli dimensional random error vector with mean zero and scalar diagonal covariance matrix Sli . Here it is assumed that for each fixed l ¼1,y,k, random vectors fbli , eli gm i ¼ 1 are mutually uncorrelated. The measurements taken on the same cluster across different characteristics indexed by l are likely to be correlated. The dependence between different types of responses is built through the random effects, where we assume Covðbli ,bri Þ ¼ Dlr for lar, and the error terms where Varðe1ij , . . . , ekij ÞT ¼ L ¼ ðllr Þ for any given i and j. The variance components of random effects and error terms represent the relationships among the response variables and potential correlations among observations from the same cluster or subject. In this article, we assume that there is no serial correlation within the error term of each cluster for each response, i.e., the homogeneous within-group error assumption given by Sli ¼ lll Inli . Additionally, it is assumed that all k characteristics are observed simultaneously for each member (occasion) of a cluster (subject), so that nli ni for any l¼1,y,k. T T Let yi ¼ ðy1i1 , . . . ,yki1 ; . . . ; y1ini , . . . ,ykini ÞT be the vector of all the observed response variables for subject i, b ¼ ðb1 , . . . , bk ÞT T T T be the combined fixed-effects coefficient vector, and bi ¼ ðb1i , . . . ,bki Þ be the combined random-effects coefficient vector. P Let Xi be the corresponding combined kni p fixed-effects design matrix with p ¼ kl¼ 1 pl , Zi be the corresponding Pk combined kni q random-effects design matrix with q ¼ l ¼ 1 ql , D ¼ ðDlr Þ be a q q partitioned block matrix, and Si ¼ Ini L be a kni kni matrix. By using these definitions, we are able to recast a multivariate linear mixed model as a cluster indexed univariate linear mixed model given by yi ¼ Xi b þ Zi bi þ ei ,
i ¼ 1, . . . ,m,
ð2Þ
m m m where Eðbi Þ ¼ 0, Varðbi Þ ¼ D, Eðei Þ ¼ 0, and Varðei Þ ¼ Si . After stacking fyi gm i ¼ 1 as y, fXi gi ¼ 1 as X, fbi gi ¼ 1 as b, fei gi ¼ 1 as e, and diagonalizing fZi gm as Z ¼ diagðZ , . . . ,Z Þ, we can write the cluster indexed short form model (2) as the index-free long m 1 i¼1 form model given by
y ¼ X b þZb þ e:
ð3Þ
Multivariate linear mixed model with the Gaussian random effects and error terms has been considered by many authors. For example, Reinsel (1982) derived close-form estimates for multivariate random-effects model with completely observed responses and balanced designs. Shah et al. (1997) studied the multivariate mixed-effects model for longitudinal data via the EM algorithm. Schafer and Yucel (2002) investigated the computational strategies for multivariate linear mixed-effects models with missing values. Fieuws and Verbeke (2006) proposed a pair-wise modeling approach to circumvent the dimensional limitations in multivariate mixed model. The key part of estimation of linear mixed model is that of variance components. However, the normality assumptions imposed on the random effects and error terms may not be appropriate in multivariate setting when the underlying distributions are skewed or multimodal. Calvin and Dykstra (1991) developed an iterated estimation procedure, by least squares criterion and convex analysis, to produce nonnegative definite estimates of the covariance matrices in balanced multivariate variance components models. Han (2011) proposed non-iterative, distribution-free, and unbiased estimators of variance components, including minimum norm quadratic unbiased estimators and method of moments estimators, for multivariate linear mixed model. In this article, we will develop non-iterative, distribution-free, and unbiased estimators of variance matrices via least squares method for multivariate linear mixed model from the perspective of matrix derivatives. A general inter-cluster variance matrix, a same-member only general inter-response variance matrix, and an uncorrelated intra-cluster error structure for each response are assumed. The outline of this paper is as follows. Section 2 presents the variance least squares estimators and the unbiased variance least squares estimators for variance components. Section 3 performs a simulation study to investigate the properties of the proposed estimators with comparison to the Gaussian (restricted) maximum likelihood estimators. Section 4 illustrates the proposed estimators by a gene expression familial data. We conclude with some remarks in Section 5. The following matrix identities involving vec operator and Kronecker product (Henderson and Searle, 1979) will prove useful, vecðABCÞ ¼ ðC T AÞvecðBÞ,
ð4Þ
ðA BÞðC DÞ ¼ AC BD,
ð5Þ
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
3347
vecðIn LÞ ¼ JvecðLÞ, T
ab ¼ bT a ¼ a bT ,
ð6Þ T
vecðab Þ ¼ b a,
ð7Þ
the results holding for matrices with conformable dimensions. The first identity shifts a between-matrix to the right side of a sequence of ordinary matrix multiplications. Identity (6) (Magnus, 1988, p. 44) facilitates the conversion of a Kronecker matrix product to an ordinary matrix product for the purpose of inversion, where L is a k k matrix, J ¼ ½ðIn Kkn ÞðvecðIn Þ Ik Þ Ik is a ðknÞ2 k2 matrix, and Kkn is a kn kn commutation matrix (Magnus and Neudecker, 1979). 2. Variance least squares estimators The idea of least squares on squared residuals of variance components was suggested by Amemiya (1977), and is widely used in the literature on regression with heteroscedastic errors. Demidenko (2004, p. 171) developed the variance least squares estimator for the univariate linear mixed model with homogeneous within-group scalar variance s2 and general between-group variance matrix D. We will consider the variance component estimators under the assumption of heterogeneous within-group variance matrix Si ¼ Ini L and general between-group variance matrix D for a multivariate setting. Let b^ be the ordinary least squares estimator of the coefficient vector for the fixed effects. Then the residual vector T ^ ei ¼ yi Xi b^ is an estimator of ei ¼ Zi bi þ ei ¼ yi Xi b, and the ith empirical covariance matrix can be assessed by e^ i e^ i . Accordingly, the resulting variance least squares (VLS) estimators for matrices L and D minimize the sum of traces of squared matrices of the difference between the empirical matrices and the theoretical covariance matrices: m X
T trðe^ i e^ i Ini LZi DZ Ti Þ2 :
ð8Þ
i¼1
Theorem 1. Suppose that there are m clusters and k responses, and there are ni observations for the ith cluster of each response. P Let the total number of observations per response be n ¼ m i ¼ 1 ni . Assume that the multivariate linear mixed model has a q q general inter-cluster covariance D, a k k same-occasion (same-member) only general inter-response covariance L, and an uncorrelated intra-cluster error structure for each response. Let Ji ¼ ½ðIni Kkni ÞðvecðIni Þ Ik Þ Ik with Kkni the commutation T T matrix, and k kni matrix UjðiÞ ¼ ð0, . . . ,0,Ik ,0, . . . ,0Þ with identity matrix Ik at its jth column block. Set ZiðjÞ ¼ UjðiÞ Zi and T ^ ^ e^ iðjÞ ¼ UjðiÞ e^ i . Then the variance least squares estimators l and d for vecðLÞ and vec(D) are given by 2 " # l^ d^
nIk2 6 6 6 ¼6 m 6X T 4 ðZ Z T ÞJi i
i¼1
i
ni m X X
31 2
ZiðjÞ ZiðjÞ 7 7 i ¼ 1j ¼ 1 7 7 m X 7 T T ðZi Zi Þ ðZi Zi Þ 5 i¼1
3 ni m X X e^ iðjÞ e^ iðjÞ 7 6 6 i ¼ 1j ¼ 1 7 6 7 6 m 7: 7 6X T T 4 ðZi e^ i Þ ðZi e^ i Þ 5
ð9Þ
i¼1
The estimators obtained in (9) are biased, but we can make some adjustments by taking expectations of the VLS estimators to obtain the unbiased variance least squares (UVLS) estimators. Theorem 2. Suppose that there are m clusters and k responses, and there are ni observations for the ith cluster of each response. P Let the total number of observations per response be n ¼ m i ¼ 1 ni . Assume that the multivariate linear mixed model has a q q general inter-cluster covariance D, a k k same-occasion (same-member) only general inter-response covariance L, and an uncorrelated intra-cluster error structure for each response. Let V ¼ ðX T XÞ1 , Pi ¼ Xi VX Ti , Ji ¼ ½ðIni Kkni ÞðvecðIni Þ Ik Þ Ik with T Kkni the commutation matrix, and k kni matrix UjðiÞ ¼ ð0, . . . ,0,Ik ,0, . . . ,0Þ with identity matrix Ik at its jth column block. Set T T T ^ XiðjÞ ¼ UjðiÞ Xi , ZiðjÞ ¼ UjðiÞ Zi , e^ iðjÞ ¼ UjðiÞ e i , and define ( " #) ni m X m X X T T F1 ¼ Ik2 ½UjðiÞ ðXiðjÞ VX Ti ÞJi ½ðXiðjÞ VX Ti Þ UjðiÞ Ji þ ½ðXiðjÞ VÞ ðXiðjÞ VÞ ðXlT XlT ÞJl , i¼1j¼1
F2 ¼
ni m X X
l¼1
(
"
½ZiðjÞ ZiðjÞ ½ZiðjÞ ðXiðjÞ VX Ti Zi Þ½ðXiðjÞ VX Ti Zi Þ ZiðjÞ þ ½ðXiðjÞ VÞ ðXiðjÞ VÞ
i¼1j¼1
F3 ¼
m X
fðZiT i¼1
m X
#) ðXlT Zl Þ ðXlT Zl Þ
l¼1
"
ZiT ÞJi ½ðZiT Pi Þ
ðZiT ÞJi ½ZiT
ðZiT Pi ÞJi g þ
m X i¼1
ðZiT Xi VÞ
ðZiT Xi VÞ
3 #2 m X T T 4 ðXj Xj ÞJj 5, j¼1
,
3348
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
F4 ¼
m X
" ½ðZiT Zi Þ
ðZiT Zi ÞðZiT Pi Zi Þ
ðZiT Zi ÞðZiT Zi Þ
ðZiT Pi Zi Þ þ
i¼1
m X i¼1
ðZiT Xi VÞ
ðZiT Xi VÞ
3 #2 m X T T 4 ðX Zj Þ ðX Zj Þ5: j
j
j¼1
Then the unbiased variance least squares estimators l^ and d^ for vecðLÞ and vec(D) are given by 2 m n 3 i XX e^ iðjÞ e^ iðjÞ 7 " # " #1 6 6 i ¼ 1j ¼ 1 7 F1 F2 l^ 6 7 ¼ 6 7: m F3 F4 6X T 7 d^ T^ 5 4 ^ ðZi e i Þ ðZi e i Þ
ð10Þ
i¼1
The proofs of Theorems 1 and 2 are left to Appendices A and B, respectively. The VLS estimators in (9) and UVLS estimators in (10) generalize those estimators derived in Demidenko (2004, p. 171) for the univariate mixed model. 3. Simulation To study the finite sample size properties of the proposed estimators, we perform a simulation study to numerically examine biases and root mean square errors of VLS and UVLS estimators. For comparison purposes, we also include the Gaussian maximum likelihood (ML) estimators and the Gaussian restricted maximum likelihood (REML) estimators. To generate the data, we use a familial gene-trait association model as a prototype, which is a typical example of bivariate mixed-effects ANOVA model. Let i¼1,y,m indicate the sibship identification and j ¼1,y,ni indicate the member identification. Let gij denote the gene expression level for member j in sibship i and tij the trait value for member j in sibship i. Suppose that gene expression levels gij and trait values tij follow the random-effects model given by " # " # " # " # egij gij mg mgi ¼ þ þ tij etij , mt mti where the population mean is given by ðmg , mt ÞT , the between-family variation is given by ðmgi , mti ÞT with mean zero and variance matrix D, the between-response (within-family) variation is given by ðegij , etij ÞT with mean zero and variance matrix L, and " # " # 2 2 D¼
s1 r1 s1 s2 s3 r2 s3 s4 , L¼ : r1 s1 s2 s22 r2 s3 s4 s24
ð11Þ
The association between a trait and a gene expression level can be reflected by the familial correlation r1 and the response (residual) correlation r2 in (11). In the simulation, we consider two unbalanced designs with the average sibship sizes being 3 and 6, respectively. For the design of average sibship size 3, the sibship size ranges from 2 to 4; while for the design of average sibship size 6, the sibship size ranges from 4 to 8. The distributions include normal distribution and mixture of normal distributions. There are 10 sibships and 500 data sets generated for each scenario of the four combinations of average cluster size and distribution. The fixed-effects coefficient vector is set to be (1,2)T. The inter-family random effects and inter-response random effects are generated by bivariate normal distributions or mixtures of two bivariate normal distributions. For inter-sibship random effects of the mixture distribution case, the means of the two components of the mixture are (2,0)T and (2,0)T, respectively, the elements of the variance matrix of the first component are s11 ¼ s22 ¼ 1:0 and s12 ¼ s21 ¼ 0:5, and the elements of the variance matrix of the second component are s11 ¼ s22 ¼ 1:0 and s12 ¼ s21 ¼ 0:0. For inter-response random effects of the mixture distribution case, the means of the two components of the mixture are ( 1,1)T and (1,1)T, respectively, the elements of the variance matrix of the first component are s11 ¼ s22 ¼ 1:0 and s12 ¼ s21 ¼ 0:5, and the elements of the variance matrix of the second component are s11 ¼ s22 ¼ 1:0 and s12 ¼ s21 ¼ 0:9. In the normal distribution case, the means of the two normal distributions are ð0,0ÞT , the elements of the variance matrix of the inter-family random effects are s11 ¼ 5:00, s22 ¼ 1:00, and s12 ¼ s21 ¼ 0:25, and the elements of the variance matrix of the inter-response random effects are s11 ¼ s22 ¼ 2:0 and s12 ¼ s21 ¼ 1:2. These two variance matrices of two types of random effects in normal case are exactly the variance matrices of two types of random effects in mixture case. To visualize the mixtures of bivariate normal distributions for the inter-family and inter-response random effects, we present the contour plots and perspective plots for the densities of these two distributions in Fig. 1. The top panel shows the mixture distribution for the inter-family random effects, while the bottom panel shows the mixture distribution for the inter-response random effects. The family-wise random effects are close to being symmetric though not exactly, whereas the response-wise random effects are asymmetric. The root mean square errors of four estimators, based on 500 sample data sets of average cluster size 6 generated from mixture and normal random effects, are presented in Table 1. For all components of D and L, ML estimators are the best among all the estimators in both mixture and normal distribution cases. VLS is better than UVLS while ML is better than
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
Family
3349
Family
2 0.08 0.06 0.04 0.02 0.00 –4
1 y
0 –1
–2
x 0
–2 –2
–4
0 x
0 2
2
0.15 0.10 0.05 0.00 –4
0
y
y
Response
4
–2
–2
x 0
–4 –2
4
4 –4
4
2
Response
–4
–2
2
2
0 x
0 2
–2
2
4
y
4 –4
4
Fig. 1. Contour and perspective plots of bivariate normal mixture distributions for inter-family and inter-response random effects.
Table 1 Root mean square errors and relative biases of variance estimators for normal mixture and normal random effects, based on 500 replicates of data with 10 clusters and six observations on average per cluster. The variance matrices are denoted as D ¼ (dij) and L ¼ ðlij Þ, respectively. Root mean square error vls
Mixture
y
d11 d21 d22
5.00 0.25 1.00 2.00 1.20 2.00
l11 l21 l22 Normal d11 d21 d22
l11 l21 l22
y 5.00 0.25 1.00 2.00 1.20 2.00
1.720 0.876 0.606 0.539 0.356 0.368 vls 2.471 0.820 0.638 0.665 0.364 0.400
uvls 1.803 0.989 0.661 0.546 0.360 0.371 uvls 2.766 0.926 0.694 0.669 0.366 0.402
Relative bias ml 1.593 0.835 0.560 0.374 0.306 0.344 ml 2.407 0.779 0.598 0.377 0.312 0.373
reml 1.646 0.928 0.607 0.374 0.306 0.345 reml 2.672 0.866 0.649 0.377 0.312 0.374
vls 0.128 0.101 0.153 0.008 0.020 0.008 vls 0.077 0.118 0.170 0.029 0.017 0.003
uvls 0.006 0.073 0.005 0.019 0.018 0.013 uvls 0.051 0.091 0.024 0.001 0.015 0.008
ml 0.117 0.007 0.128 0.012 0.007 0.017 ml 0.056 0.104 0.139 0.007 0.016 0.015
reml 0.012 0.034 0.005 0.012 0.006 0.017 reml 0.056 0.088 0.008 0.006 0.014 0.014
REML for all variance components and both mixture and normal distributions. It is observed that VLS is comparable with ML and UVLS is comparable with REML. Relative biases of these estimators for the design of average cluster size 6 are also given in Table 1. The relative biases are computed to facilitate the parameter wise comparison for a given method, in addition to the method wise comparison for a given parameter. The relative bias plot of variance component estimators for the design of average cluster size 6 is presented in Fig. 2. For both mixture and normal random effects, VLS and ML have larger biases than UVLS and REML do in general. Similar to the result of root mean square errors, VLS and ML are comparable while UVLS and REML are comparable in terms of relative bias.
3350
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
Relative Bias of Unbalanced.Mixture
vls uvls
0.20
Relative Bias of Unbalance.Normal
ml reml
ml reml
0.10
Relative Bias
0.10
Relative Bias
vls uvls
0.20
0.00
0.00
–0.10
–0.10
–0.20
–0.20 0
1
2
3
4
5
Parameter Estimate
6
7
0
1
2
3
4
5
6
7
Parameter Estimate
Fig. 2. Relative bias plot of estimators for normal mixture and normal random effects, based on 500 replicates of data with 10 clusters and six observations on average per cluster. The horizontal axis denotes the six parameter estimates of variance components in the same order as in the first column of Table 1.
The root mean square errors and relative biases of four estimators, based on 500 sample data sets of average cluster size 3 generated from mixture and normal random effects, are also obtained (not shown). The observations made for the design of average cluster size 3 are similar to those made for the design of average cluster size 6.
4. Application In biomedical studies and public health sciences, multiple responses are jointly modeled to accommodate a correlation structure between the different responses in addition to the association between repeated measurements of one response. As an example, we apply VLS and UVLS estimators to a gene expression data studied by Morley et al. (2004) and Cheung et al. (2005). This data set consists of 3554 expression phenotypes (traits) in lymphoblastoid cells from members of Centre d’Etude du Polymorphisme Humain (CEPH) Utah pedigrees. The trait values are collected for 14 three-generation families. There are 194 individuals, approximately eight offspring per sibship and about 14 individuals per family (Cheung and Spielman, 2007). As there are phenotypic information for four grandparents, two parents, and about eight sibling children for each family, the relationships within each family include grandparent-offspring, parent-offspring and spouse. To satisfy our model assumption of uniform within-family correlation for each trait, we only choose sibling children of the third generation from each family to study the intra-sibling and inter-trait association for two selected traits. The statistical model for this data set is the same as that of the example in Section 3, except that the bivariate responses in this real example are two traits instead of a gene and a trait. The VLS and UVLS estimators of covariance matrices for between-sibship random effects and between-trait random effects are presented in Table 2. Upon checking the eigenvalues of D and L given in Table 2, both VLS and UVLS estimates are legitimate covariance matrices. To make a comparison, we also include the Gaussian maximum likelihood (ML) estimates and the Gaussian restricted maximum likelihood (REML) estimates of variance components in Table 2. Based on estimates of covariance matrices, we can assess the familial correlation r1 and trait correlation r2 . The familial correlations obtained by VLS and UVLS are slightly bigger than those by ML and REML, while the trait correlations obtained by VLS and UVLS are slightly smaller than those by ML and REML. The standard errors for ML and REML estimates of variance components are computed by the Fisher information, and standard errors for VLS and UVLS estimates of variance components are computed by leave-one-cluster-out Jackknife method. The 95% confidence intervals for familial correlation and trait correlation are computed by applying delta method to the Fisher transformation of correlation coefficient. The technical details of the derivation of covariance matrices of variance component estimates for ML and REML, and the Fisher delta confidence intervals for familial correlation and trait correlation coefficients are deferred to Appendices C and D.
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
3351
Table 2 Estimates of variance components for the gene expression family data with 14 sibships and eight siblings per sibship on average. The variance matrices are denoted as D ¼(dij) and L ¼ ðlij Þ, and the familial correlation and the trait correlation are denoted as r1 and r2 , respectively. Standard errors are inside parentheses, and 95% confidence intervals are inside square brackets. Parameter
VLS
UVLS
ML
REML
d11 d12 d22
0.174 (0.052) 0.161 (0.060) 0.238 (0.066)
0.189 (0.057) 0.173 (0.065) 0.260 (0.071)
0.171 (0.073) 0.153 (0.076) 0.236 (0.110)
0.186 (0.082) 0.164 (0.084) 0.258 (0.123)
l11 l12 l22
0.178 (0.034) 0.012 (0.019) 0.415 (0.164)
0.178 (0.034) 0.012 (0.019) 0.415 (0.164)
0.180 (0.021) 0.017 (0.024) 0.415 (0.054)
0.180 (0.021) 0.017 (0.024) 0.415 (0.054)
EigenðDÞ
0.369 0.042
0.401 0.048
0.359 0.047
0.390 0.054
EigenðLÞ
0.415 0.178
0.415 0.178
0.416 0.179
0.416 0.179
r1 r2
0.79 [0.24, 0.96] 0.05 [ 0.09, 0.17]
0.78 [0.25, 0.95] 0.05 [ 0.09, 0.17]
0.76 [0.22, 0.94] 0.06 [ 0.11, 0.23]
0.75 [0.20, 0.94] 0.06 [ 0.11, 0.23]
5. Concluding remarks It is possible that UVLS estimators of variance components are not nonnegative definite, especially for large q and k, or small m. Suppose we consider the data set consisting of only the first three families of the original data, then the UVLS estimate of the inter-family variance component is given by ^ ¼ 0:0188 0:0429 : D 0:0429 0:0785 Since the estimate of matrix D is not nonnegative definitive matrices, we can project it onto the space of all nonnegative ^ ¼ PAPT be the eigenvalue decomposition with definite matrices D þ ¼ fDjD is nonnegative definite matrixg. Let D A ¼ diag½a1 , . . . ,aq being the eigenvalue matrix and P being the eigenvector matrix. Then the projection (Demidenko, ^ þ ¼ PA þ P, where A þ ¼ diag½maxð0,a1 Þ, . . . ,maxð0,aq Þ. For the three family data set, we have 2004, p. 104) can be found as D ^ þ ¼ 0:0306 0:0116 : D 0:0116 0:0044 In this article, we have proposed estimators of variance components via least squares method for multivariate linear mixed model. Such estimators are reasonable alternatives to the Gaussian ML and REML estimators. First, UVLS estimators are unbiased estimators while ML and REML estimators are not under general distributional assumptions of inter-cluster and inter-response random effects. Second, variance least squares estimators are distribution-free estimators and are therefore robust to distributional misspecifications. Third, these non-iterative estimators are very attractive when it comes to high dimensional data analysis, like micro-array data analysis and joint modeling multiple longitudinal biological markers, where iterative procedures are impractical even given the current advanced computing technology. Fourth, these non-iterative estimators can serve as reasonable starting values for iterative procedures, such as the EM and the Newton– Raphson algorithms, to obtain parametric maximum likelihood estimators if the underlying distribution assumptions are truly justified. Since Patterson and Thompson (1971) proposed and Harville (1977) promoted the restricted Gaussian maximum likelihood estimator, majority of theoretical and computational work has been devoted to iterative Gaussian (restricted) maximum likelihood estimator. As there is no reason to believe that the Gaussian iterative maximum likelihood estimators perform the best for any distribution of random effects, it is advisable to make some efforts to reconsider and develop some distribution-free estimators for variance components, like VLS and UVLS. Given the performance of the non-iterative VLS and UVLS estimators, we conjecture that least squares related iterative estimators based on nonparametric empirical likelihood instead of the Gaussian likelihood, shall outperform the iterative Gaussian maximum likelihood estimators if the underlying random effects do not come from multivariate normal distributions. An area for further development is to investigate the asymptotic properties of the proposed estimators and iterative versions of VLS and UVLS estimators given an empirical likelihood.
Acknowledgment The author would like to thank Dr. Yixin Fang for providing the data from the Problem 1 in Genetic Analysis Workshop 15 supported by GAW Grant GM031575. The data generation was funded by Grant HG002386. The author is also grateful to the editor and two referees for their helpful comments that led to improvements in the manuscript.
3352
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
Appendix A. Variance least squares estimators To find the stationary equations for L and D, we differentiate the quadratic form (8) with respect to L and D. We start with the differentiation of (8) with respect to L. Since this differentiation involves derivative of the Kronecker product or block diagonal matrix, let us first find out the derivative of Ini L with respect to L. Following Dwyer (1967), we use the notation /XSa, b to indicate the specific a, b element of the matrix while its scalar value is xa, b , and more generally we simply use /XS in place of /XSa, b . For constant block diagonal matrix, the derivative of the matrix with respect to an element of the diagonal block and the derivative of an element of the matrix with respect to the diagonal block matrix, are given by @ðIni LÞ ¼ Ini Jab , @/LSab @/Ini LSgd ¼ @L
(
ð12Þ 2O, if ðg, dÞ= if ðg, dÞ 2 O,
0 Jab
ð13Þ
where the indicator matrix Jab is the k k matrix with zero in every position except for jab ¼ 1, O ¼ fðg, dÞ : ðg, dÞ is a location in diagonal blocksg, and the ab in second equation of the above is the corresponding index within L for index gd within Ini L. When L ¼ s2 , Eqs. (12) and (13) are reduced to @ðs2 Ini Þ ¼ Ini , @/s2 S
@/s2 ISgd ¼ 1gd , @s2
ð14Þ
where 1gd is the Kronecker indicator function with 1gd ¼ 0 for gad and 1gd ¼ 1 for g ¼ d. T T Let w ¼ trðBÞ, B ¼ ðe^ i e^ i Ini LZi DZ Ti Þ2 , and C ¼ e^ i e^ i Ini LZi DZ Ti . Using Dwyer’s (1967) notations, we have @/BSgd ¼ Kgd C þ CK gd , @C
@B ¼ Jab C þ CJab , @/CSab
where Jab is the indicator matrix of same dimension as C and Kgd is the indicator matrix of same dimension as B. By chain rule we have X @/BSgd X @/BSgd @/CSab @/Ini LSab ¼ ¼ /Kgd C þ CK gd Sab @L @C @ L @L ab a, b a, b ¼
X
/Kgd C þCK gd Sab Ja0 b0 ¼
ða, bÞ2O
ni X
T UjðiÞ ðKgd C þ CK gd ÞUjðiÞ ,
j¼1
0
T where a0 b is the corresponding withinL index for index ab, and UjðiÞ ¼ ð0, . . . ,0,Ik ,0, . . . ,0Þ with Ik at its jth column block is a k kni matrix. By the fact of ð@=@BÞ½tr ðBÞ ¼ I and chain rule again, we have 2 3 ni ni ni X X X X X @/BSgd @w X @w T T 4 T ¼ ¼ 1gd UjðiÞ ðKgd C þ CK gd ÞUjðiÞ ¼ UjðiÞ 1gd ðKgd C þ CK gd Þ5UjðiÞ ¼ UjðiÞ ðIC þ CIÞUjðiÞ @ L @L @B gd j¼1 j¼1 j¼1 g, d g, d g, d
¼ 2
ni X
T
T ^ ^ UjðiÞ ðe i e i Ini LZi DZ Ti ÞUjðiÞ :
ð15Þ
j¼1
Now we are in a position to differentiate (8) with respect to qD. By the fact of @/Zi DZ Ti Sab =@D ¼ ZiT Kab Zi and chain rule, we have X @/BSgd X @/BSgd @/CSab ¼ ¼ /Kgd C þ CK gd Sab ZiT Kab Zi @D @C @D ab a, b a, b ¼ ZiT ðKgd C þ CK gd ÞZi : Applying chain rule again we have X @w T ¼ 1gd ZiT ðKgd C þ CK gd ÞZi ¼ ðZiT ICZ i þ ZiT CIZ i Þ ¼ 2ZiT ðe^ i e^ i Ini LZi DZ Ti ÞZi : @D g, d
ð16Þ
By (15) and (16), the stationary equations for L and D are given by ni m X X
T UjðiÞ ðIni L þZi DZ Ti ÞUjðiÞ ¼
i¼1j¼1 m X i¼1
ZiT ðIni LÞZi þ
ni m X X
T
T ^ ^ e i e i UjðiÞ , UjðiÞ
ð17Þ
i¼1j¼1 m X i¼1
ZiT Zi DZ Ti Zi ¼
m X i¼1
T ZiT e^ i e^ i Zi :
ð18Þ
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
3353
P T T ^ ^ Let n ¼ m i ¼ 1 ni , l ¼ vecðLÞ, d ¼vec(D), ZiðjÞ ¼ UjðiÞ Zi , e iðjÞ ¼ UjðiÞ e i , and Ji ¼ ½ðIni Kkni ÞðvecðIni Þ Ik Þ Ik . Applying (4), (6) and (7), the vectorized forms of (17) and (18) are given by nIk2 l þ
ni m X X
½ZiðjÞ ZiðjÞ d ¼
i¼1j¼1 m X
ni m X X
e^ iðjÞ e^ iðjÞ ,
i¼1j¼1
ðZiT ZiT ÞJi l þ
i¼1
m X
½ðZiT Zi Þ ðZiT Zi Þd ¼
i¼1
m X
ðZiT e^ i Þ ðZiT e^ i Þ:
i¼1
Therefore, the VLS estimators are given by (9) as desired. Appendix B. Unbiased variance least squares estimators Pm Pm 1 T^ T^ T T ^ First, we consider the expectation of , i ¼ 1 ðZi e i Þ ðZi e i Þ. Recalling that e i ¼ ei Xi V j ¼ 1 Xj ej , where V ¼ ðX XÞ T T Covðei ,ej Þ ¼ 0, Eei ¼ 0, and Varðei Þ ¼ Eðei ei Þ ¼ Ini L þZi DZ i , we have 2 0 13 m m m m X X X X T^ T^ T T T T A5 4 @ ðZiT ei Þ ðZi e i Þ ðZi e i Þ ¼ ðZi ei Þ ðZi ei Þ Zi Xi V Xj ej i¼1
m X
2 ðZiT ei Þ
i¼1
0
4ZiT Xi V @
i¼1
Let
Pi ¼ Xi VX Ti , E½ðZiT ei Þ
m X
13 XjT ej A5 þ
j¼1
i¼1 m X
2
0
4Z T Xi V @
j¼1 m X
i
i¼1
13 XjT ej A5
2
0
4ZiT Xi V @
j¼1
m X
13 XjT ej A5:
j¼1
applying (7), (4) and (6), and the fact that Eðei eTj Þ ¼ 0 for iaj, we have
ðZiT ei Þ ¼ ðZiT ZiT ÞJi l þ ½ðZiT Zi Þ ðZiT Zi Þd,
82 9 0 13 m < = X T T A5 T 4 @ ðZi ei Þ ¼ ½ðZiT Pi Þ ZiT Ji l þ ½ðZiT Pi Zi Þ ðZiT Zi Þd, Xj ej E Z XV : i i ; j¼1
8 2 0 139 = < m X T T T A5 4 @ ¼ ½ZiT ðZiT Pi ÞJi l þ ½ðZiT Zi Þ ðZiT Pi Zi Þd: Xj ej E ðZi ei Þ Zi Xi V ; : j¼1
Similarly, by (7), (4), (6), and (5), we have 82 0 13 2 0 139 = < m m m X X X XjT ej A5 4ZiT Xi V @ XjT ej A5 ¼ ðZiT Xi VÞ ðZiT Xi VÞ ðXjT XjT ÞJj l E 4ZiT Xi V @ ; : j¼1
j¼1
þ ðZiT Xi VÞ ðZiT Xi VÞ
m X
j¼1
ðXjT Zj Þ ðXjT Zj Þd:
j¼1
Setting F3 ¼
F4 ¼ we have " E
m X
"
fðZiT i¼1
ðZiT Pi ÞJi g þ
m X
ðZiT Xi VÞ
ðZiT Xi VÞ
i¼1
½ðZiT Zi Þ i¼1
ðZiT e^ i Þ i¼1
ðZiT ÞJi ½ZiT
ðZiT Zi ÞðZiT Pi Zi Þ
ðZiT Zi ÞðZiT Zi Þ
ðZiT Pi Zi Þ þ
3 #2 m X T T 4 ðXj Xj ÞJj 5, j¼1
"
m X
m X
ZiT ÞJi ½ðZiT Pi Þ
m X
ðZiT Xi VÞ i¼1
3 #2 m X T T 4 ðX Zj Þ ðX Zj Þ5
ðZiT Xi VÞ
j
j
j¼1
#
ðZiT e^ i Þ
¼ F3 l þF4 d:
ð19Þ
Pm Pni T T ei and XiðjÞ ¼ UjðiÞ Xi , then e^ iðjÞ ¼ eiðjÞ XiðjÞ V e^ e^ iðjÞ . Let eiðjÞ ¼ UjðiÞ Second, we consider the expectation of i¼1 j ¼ 1 iðjÞ Pm T X e and l¼1 l l " # " # " # " # m m m m X X X X XlT el XiðjÞ V XlT el eiðjÞ þ XiðjÞ V XlT el XiðjÞ V XlT el : e^ iðjÞ e^ iðjÞ ¼ eiðjÞ eiðjÞ eiðjÞ XiðjÞ V l¼1
Applying (7), (4), (6) and
Eðei eTj Þ ¼ 0
for iaj, we have
E½eiðjÞ eiðjÞ ¼ Ik2 l þ ½ZiðjÞ ZiðjÞ d,
l¼1
l¼1
l¼1
3354
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
(
"
E eiðjÞ XiðjÞ V
m X
#) T ðXiðjÞ VX Ti ÞJi l þ ½ZiðjÞ ðXiðjÞ VX Ti Zi Þd, ¼ ½UjðiÞ
XlT el
l¼1
(" XiðjÞ V
E
m X
#
)
XlT el eiðjÞ
T Ji l þ ½ðXiðjÞ VX Ti Zi Þ ZiðjÞ d: ¼ ½ðXiðjÞ VX Ti Þ UjðiÞ
l¼1
Similarly, by (7), (4), (6), and (5), we have " # (" # " #) " # m m m m X X X X T T T T T T Xl el XiðjÞ V Xl el ðXl Xl ÞJl l þ½ðXiðjÞ VÞ ðXiðjÞ VÞ ðXl Zl Þ ðXl Zl Þ d: ¼ ½ðXiðjÞ VÞ ðXiðjÞ VÞ E XiðjÞ V l¼1
l¼1
l¼1
l¼1
By setting F1 ¼
ni m X X
( T Ik2 ½UjðiÞ
"
ðXiðjÞ VX Ti ÞJi ½ðXiðjÞ VX Ti Þ
T UjðiÞ Ji þ ½ðXiðjÞ VÞ
ðXiðjÞ VÞ
i¼1j¼1
F2 ¼
ni m X X
( ½ZiðjÞ ZiðjÞ ½ZiðjÞ
#)
m X
ðXlT l¼1
"
ðXiðjÞ VX Ti Zi Þ½ðXiðjÞ VX Ti Zi Þ
ZiðjÞ þ ½ðXiðjÞ VÞ ðXiðjÞ VÞ
i¼1j¼1
XlT ÞJl
m X
,
ðXlT Zl Þ l¼1
#)
ðXlT Zl Þ
,
we obtain 2 3 ni m X X e^ iðjÞ e^ iðjÞ 5 ¼ F1 l þF2 d: E4
ð20Þ
i¼1j¼1
Therefore, combining (19) and (20) yields the unbiased variance least squares estimators given by (10). Appendix C. Covariance matrices of the Gaussian ML and REML estimators Let SðfÞ be the gradient of the log likelihood function lðfÞ with f the combined fixed-effects and non-redundant random-effects parameter vector. The gradient function for the fixed-effects coefficient vector is m X
SðbÞ ¼
1 1 T 1 1 XiT ½S1 þ ZiT S1 i Si Zi ðD i Zi Þ Zi Si ðyi Xi bÞ,
i¼1
for both ML and REML. The ML gradient function for the variance component matrix of the family random effects is SðDÞ ¼
m m 1X 1X T 1 1 ½D1 D1 Ui D1 þ ½D1 Ui ZiT S1 , i ri ri Si Zi Ui D 2i¼1 2i¼1
P 1 where Ui ¼ ðD1 þ ZiT S1 ¼ ðD1 þ nj i¼ 1 ZijT L1 Zij Þ1 and ri ¼ yi Xi b. The REML gradient function for the variance i Zi Þ component matrix of the family random effects is SðDÞ ¼
m m 1X 1X T 1 1 ½Z T P Z þ ½D1 Ui ZiT S1 , i ri ri Si Zi Ui D 2i¼1 i i i 2i¼1
P 1 1 T T Xi Wi and Wi ¼ ðSi þ Zi DZ Ti Þ1 ¼ S1 ðZi Ui ZiT ÞS1 where Pi ¼ Wi Wi Xi ð m i ¼ 1 Xi Wi Xi Þ i S i . The ML gradient function for the variance component matrix of the trait random effects is SðLÞ ¼
ni ni m X m X 1X 1X ½L1 L1 Zij Ui ZijT L1 þ ½L1 gij gijT L1 , 2i¼1j¼1 2i¼1j¼1
T 1 where gij ¼ rij Zij Ui ðZiT S1 i ri Þ, rij ¼ yij Xij b, and Zi Si ri ¼ component matrix of the trait random effects is
SðLÞ ¼
Pni
j¼1
ZijT L1 rij . The REML gradient function for the variance
ni ni m X m X 1X 1X ½Pi jj þ ½L1 gij gijT L1 , 2i¼1j¼1 2i¼1j¼1
where ½Pi jj is the ½ðj1Þkþ 1 : jk,ðj1Þk þ 1 : jk block of Pi. To avoid the second-order derivative of log likelihood function, we can apply numerical differential method to the gradient function SðfÞ to obtain the Hessian HðfÞ of lðfÞ. The central difference numerical differential formula of order Oðh2 Þ is given by Di ðhÞ ¼
f ðx þ hui Þf ðxhui Þ , 2h
J. Han / Journal of Statistical Planning and Inference 141 (2011) 3345–3355
3355
where h is the step size of a small positive value and ui is the ith coordinate vector with ith component being 1 and all others being 0. The Hessian matrix HðfÞ can be treated as the Jacobian of the gradient function SðfÞ. The covariance matrix of the component estimates can be obtained by taking the inverse of the negative Hessian matrix. Appendix D. Fisher delta confidence intervals for correlation coefficients pffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffi The familial correlation and trait correlation coefficients are computed by r1 ¼ d12 = d11 d22 and r2 ¼ l12 = l11 l22 , respectively, where the estimates of variance components dij and lij and their covariance matrices are available from VLS, UVLS, ML, and REML methods. Since the range of a correlation coefficient is within ( 1,1), we can apply the Fisher transformation given by 1 1þr z ¼ arctanhðrÞ ¼ log : 2 1r Then we can use delta method to obtain the variance of the transformed correlation coefficient estimate. Let f ¼ ðd^ 11 , d^ 12 , d^ 22 ÞT ,V ¼ VarðfÞ, and wðfÞ ¼ ðw1 ,w2 ,w3 Þ with w1 ¼
1:5 0:5 0:5d^ 11 d^ 12 d^ 22 , 2 1 1d1 11 d12 d22
w2 ¼
0:5 0:5 d^ 11 d^ 22 , 2 1 1d1 11 d12 d22
w3 ¼
0:5 1:5 0:5d^ 11 d^ 12 d^ 22 , 2 1 1d1 11 d12 d22
then the estimated variance of the Fisher-transformed correlation coefficient estimate is given by wðfÞT VwðfÞ, and the confidence interval based on normal approximation for the transformed correlation coefficient is obtained. The confidence interval for the original correlation coefficient can be computed by applying an inverse Fisher transformation to the confidence interval of the transformed correlation coefficient. References Amemiya, T., 1977. A note on a heteroscedastic model. Journal of Econometrics 6, 365–370. Calvin, J.A., Dykstra, R.L., 1991. Least squares estimation of covariance matrices in balanced multivariate variance components models. Journal of the American Statistical Association 86, 388–395. Chan, H.L.Y., 2002. Changing scene in hepatitis B serology interpretation. Hospital Medicine 63, 16–19. Cheung, V.G., Spielman, R.S., 2007. Data for genetic analysis workshop (GAW) 15, problem 1: genetics of gene expression variation in humans. BMC Proceedings 1 (Suppl. 1), S2. Cheung, V.G., Spielman, R.S., Ewens, K., Weber, T.M., Morley, M., Burdick, J.T., 2005. Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365–1369. Demidenko, E., 2004. Mixed Models Theory and Applications. John Wiley & Sons, Inc., Hoboken, NJ. Dwyer, P.S., 1967. Some applications of matrix derivatives in multivariate analysis. Journal of the American Statistical Association 62, 607–625. Fieuws, S., Verbeke, G., 2006. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics 62, 424–431. Han, J., 2011. Distribution-free estimators of variance components for multivariate linear mixed models. Journal of Nonparametric Statistics 23, 219–235 First published on: 28 September 2010 (iFirst). Harville, D.A., 1977. Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72, 320–338. Henderson, H.V., Searle, S.R., 1979. Vec and vech operator for matrices, with some uses in Jacobians and multivariate statistics. The Canadian Journal of Statistics 7, 65–81. Laird, N.M., Ware, J.H., 1982. Random-effects model for longitudinal data. Biometrics 38, 963–974. Magnus, J.R., 1988. Linear Structures. Oxford University Press, London. Magnus, J.R., Neudecker, H., 1979. The commutation matrix: some properties and applications. The Annals of Statistics 7, 381–394. Morley, M., Molony, C.M., Weber, T.M., Devlin, J.L., Ewens, K.G., Spielman, R.S., Cheung, V.G., 2004. Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743–747. Patterson, H.D., Thompson, R., 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554. Reinsel, G., 1982. Multivariate repeated-measurement or growth curve models with multivariate random-effects covariance structure. Journal of the American Statistical Association 77, 190–195. Schafer, J.L., Yucel, R.M., 2002. Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics 11, 437–457. Shah, A., Laird, N., Schoenfeld, D., 1997. A random-effects model for multiple characteristics with possibly missing data. Journal of the American Statistical Association 92, 775–779. Thie´baut, R., Jacqmin-Gadda, H., Chˆene, G., Leport, C., Commenges, D., 2002. Bivariate linear mixed models using SAS proc mixed. Computer Methods and Programs in Biomedicine 69, 249–256.