Bayesian Cholesky factor models in random effects covariance matrix for generalized linear mixed models
Keunbaik Lee^a, Jae Keun Yoo^b,*

^a Department of Statistics, Sungkyunkwan University, Seoul, 110-745, Republic of Korea
^b Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
* Corresponding author. E-mail address: [email protected] (J.K. Yoo).
Article history: Received 19 November 2013; Received in revised form 5 June 2014; Accepted 17 June 2014; Available online xxxx.
Keywords: Cholesky decomposition; Longitudinal data; Heterogeneity
Abstract: Random effects in generalized linear mixed models (GLMMs) are used to explain the serial correlation of longitudinal categorical data. Because the covariance matrix is high dimensional and must be positive definite, its structure is typically assumed to be constant across subjects and restricted to a form such as AR(1). However, these assumptions are too strong and can result in biased estimates of the fixed effects. In this paper we propose Bayesian modeling for the GLMM in which regression models are placed on the parameters of the random effects covariance matrix via a moving average Cholesky decomposition, which factors the covariance matrix into moving average (MA) parameters and innovation variances (IVs). We analyze lung cancer data using the proposed model.
1. Introduction

Random effects in generalized linear mixed models (GLMMs) are used to explain the serial correlation of longitudinal categorical data (Breslow and Clayton, 1993). Because the covariance matrix is high dimensional and must be positive definite, its structure is typically assumed to be constant across subjects and restricted to a form such as AR(1). However, these assumptions are too strong and can result in biased estimates of the fixed effects (Heagerty and Kurland, 2001).

To relax these assumptions, regression analysis of the random effects covariance matrix was proposed. Pourahmadi (1999, 2000) proposed a modified Cholesky decomposition approach, and similar models were proposed for linear (mixed) models (Pourahmadi and Daniels, 2002; Daniels and Zhao, 2003; Pan and MacKenzie, 2003, 2006) and for GLMMs (Lee et al., 2012; Lee, 2013). The modified Cholesky decomposition factors the inverse covariance matrix and yields a set of dependence parameters, the generalized autoregressive parameters (GARP), and a set of variance parameters, the innovation variances (IVs). The positive-definiteness restriction on the covariance matrix is removed by using a log-linear model for the IVs (Pourahmadi, 1999, 2000).

Recently, another approach was proposed using a moving average Cholesky decomposition of the random effects covariance matrix for linear models (Zhang and Leng, 2012). This decomposition factors the covariance matrix itself, rather than its inverse, into moving average (MA) parameters and IVs. It offers a moving average interpretation as an alternative to the autoregressive models. As with the modified Cholesky decomposition, the covariance matrix is positive definite whenever the IVs are positive.

In this paper, we propose Bayesian modeling for the GLMM with regression models for the parameters of the random effects covariance matrix obtained using Zhang and Leng (2012)'s Cholesky decomposition. In addition, we compare the GLMM
using the modified Cholesky decomposition (Lee et al., 2012) to the GLMM using the moving average Cholesky decomposition on real data.

The paper is organized as follows. In Section 2, we review the regression models for the random effects covariance matrix using the modified Cholesky decomposition approach. In Section 3, we propose a Bayesian GLMM with a random effects covariance matrix using the moving average Cholesky decomposition for longitudinal binary data. In Section 4, we analyze lung cancer data using the proposed model. Finally, a brief summary is given in Section 5.
2. Review of GLMM with random effects covariance matrix using modified Cholesky decomposition
In this section, we review the modified Cholesky decomposition approach for the random effects covariance matrix. Denote the categorical response vector of the ith subject by y_i = (y_i1, ..., y_in_i)^T for i = 1, ..., N, where y_it is a categorical response at time t for subject i. We assume that each Y_it is conditionally independent given b_i and that the responses for different subjects are independent. Let μ_it be the conditional expectation of Y_it given the random effect b_it, and let x_it denote the covariates corresponding to y_it. We assume the following generalized linear mixed model:
g(\mu_{it}) = x_{it}^T \beta + b_{it},    (1)
b_i \sim N(0, \Sigma_i),    (2)
where g(·) is a link function, β is a p × 1 vector of fixed effects, b_i = (b_i1, ..., b_in_i)^T is an n_i × 1 vector of random effect values for subject i, and the random effects covariance matrix Σ_i is indexed by subject i. To ensure the positive definiteness of Σ_i, we decompose the random effects covariance matrix using the modified Cholesky decomposition, in which each b_it is regressed on its predecessors b_i1, ..., b_i,t−1. More precisely, we have
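To make the model concrete, the following sketch (not part of the original paper) simulates binary longitudinal responses for one subject from (1) and (2) under a logit link; all dimensions, covariate values, and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_i, p = 5, 2                         # hypothetical: 5 visits, 2 covariates
beta = np.array([-0.5, 1.0])          # hypothetical fixed effects
# hypothetical AR(1)-type random effects covariance, Sigma_i[t, s] = 0.8^|t-s|
idx = np.arange(n_i)
Sigma_i = 0.8 ** np.abs(np.subtract.outer(idx, idx))

x_i = rng.normal(size=(n_i, p))                        # covariates x_it
b_i = rng.multivariate_normal(np.zeros(n_i), Sigma_i)  # b_i ~ N(0, Sigma_i), eq. (2)
lin_pred = x_i @ beta + b_i                            # g(mu_it) = x_it^T beta + b_it, eq. (1)
mu_i = 1.0 / (1.0 + np.exp(-lin_pred))                 # inverse logit link
y_i = rng.binomial(1, mu_i)                            # conditionally independent binary responses
```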
b_{i1} = e_{i1},    (3)
b_{it} = \sum_{j=1}^{t-1} \phi_{i,tj} b_{ij} + e_{it}, \quad t = 2, \dots, n_i,    (4)
where e_i = (e_i1, ..., e_in_i)^T ∼ N(0, D_i) with D_i = diag(σ²_i1, ..., σ²_in_i). Then we can write (3) and (4) in matrix form as

T_i b_i = e_i,    (5)

where T_i is a unit lower triangular matrix having ones on its diagonal and −φ_{i,tj} at its (t, j)th position for j < t. From (5), we have

T_i \Sigma_i T_i^T = \mathrm{var}(e_i) = D_i,    (6)
where D_i is diagonal with σ²_{i,t} = var(e_it) as its diagonal entries. The φ_{i,tj} are the generalized autoregressive parameters (GARP) and the σ²_{i,t} are the innovation variances (IV). For Σ_i to be positive definite, the IVs must be positive. The GARP and IV can be modeled using time and/or subject-specific covariate vectors w_{i,tj} and h_{i,t} by setting

\phi_{i,tj} = w_{i,tj}^T \gamma, \qquad \log(\sigma^2_{i,t}) = h_{i,t}^T \lambda,    (7)

where γ and λ are a × 1 and b × 1 vectors of unknown dependence and variance parameters, respectively. Note that the design vectors w_{i,tj} and h_{i,t} model the GARP/IV parameters as functions of subject-specific covariates. Such models have been commonly used for random effects covariance matrices in (generalized) linear mixed models to allow the covariance matrix to vary across subjects (Pourahmadi, 2000; Pourahmadi and Daniels, 2002; Daniels and Zhao, 2003; Lee et al., 2012; Lee, 2013).
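As a computational illustration (a minimal sketch, not from the paper), the following assembles Σ_i from the GARP/IV regressions in (5)-(7); the function name and the Model-1-style design in the usage lines are hypothetical.

```python
import numpy as np

def sigma_from_garp_iv(W, H, gamma, lam):
    """Assemble Sigma_i from the GARP/IV regressions in (5)-(7).

    W[t][j]: design vector w_{i,tj} for phi_{i,tj}, j < t
    H[t]   : design vector h_{i,t} for log sigma^2_{i,t}
    """
    n = len(H)
    T = np.eye(n)
    for t in range(1, n):
        for j in range(t):
            T[t, j] = -(W[t][j] @ gamma)          # T_i has -phi_{i,tj} below the diagonal
    D = np.diag(np.exp([h @ lam for h in H]))     # log-linear IV model keeps D_i positive
    T_inv = np.linalg.inv(T)
    return T_inv @ D @ T_inv.T                    # Sigma_i = T_i^{-1} D_i T_i^{-T}, from (6)

# Usage with an AR(1)-style design as in Model 1 of Section 4:
# w_{i,tj} = I(|t-j| = 1), h_{i,t} = 1 (parameter values are illustrative)
n = 4
W = [[np.array([float(t - j == 1)]) for j in range(t)] for t in range(n)]
H = [np.array([1.0]) for _ in range(n)]
Sigma_i = sigma_from_garp_iv(W, H, gamma=np.array([0.9]), lam=np.array([1.5]))
```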
3. Bayesian moving average regression model for random effects covariance matrix
We now propose Bayesian GLMMs with the random effects covariance matrix using the moving average Cholesky decomposition.
3.1. Moving average Cholesky decomposition of random effects covariance matrix
We also assume the GLMM in (1) and (2). To model Σ_i, we assume that, for t = 2, ..., n_i,

b_{it} = \sum_{j=1}^{t-1} l_{i,tj} e_{ij} + e_{it},    (8)
where the l_{i,tj} are moving average (MA) coefficients and e_i = (e_i1, ..., e_in_i)^T ∼ N(0, D_i). Note that b_i1 = e_i1. We can rewrite (8) in matrix form as

b_i = L_i e_i,
where L_i is a unique lower triangular matrix having ones on its diagonal and l_{i,tj} at its (t, j)th position for j < t. Then the covariance matrix for b_i is

\Sigma_i = L_i D_i L_i^T.

The only constraint on these parameters for Σ_i to be positive definite is that the IVs be positive. Note that by letting L_i = T_i^{-1}, we can write Σ_i = T_i^{-1} D_i T_i^{-T}, which is equivalent to (6). Similar to (7), the MA and IV parameters can be modeled using time and/or subject-specific covariate vectors z_{i,tj} and h_{i,t} by setting

l_{i,tj} = z_{i,tj}^T \eta, \qquad \log(\sigma^2_{i,t}) = h_{i,t}^T \lambda,    (9)
where η and λ are a × 1 and b × 1 vectors of unknown dependence and variance parameters, respectively. Thus, this approach reduces the number of parameters for the covariance matrix using regression models for the IV and MA parameters.
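The moving average construction is even more direct computationally, since no matrix inversion is needed to form Σ_i. The sketch below (a hypothetical helper mirroring the one above, with the same design-vector conventions) builds Σ_i = L_i D_i L_i^T from (9).

```python
import numpy as np

def sigma_from_ma_iv(Z, H, eta, lam):
    """Assemble Sigma_i = L_i D_i L_i^T from the MA/IV regressions in (9).

    Z[t][j]: design vector z_{i,tj} for l_{i,tj}, j < t
    H[t]   : design vector h_{i,t} for log sigma^2_{i,t}
    """
    n = len(H)
    L = np.eye(n)
    for t in range(1, n):
        for j in range(t):
            L[t, j] = Z[t][j] @ eta               # L_i has l_{i,tj} below the diagonal
    D = np.diag(np.exp([h @ lam for h in H]))     # positive IVs => Sigma_i positive definite
    return L @ D @ L.T
```

Because L_i is unconstrained below the diagonal and the IVs enter through a log link, any choice of η and λ yields a valid covariance matrix, which is the unconstrained-parameterization advantage noted above.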
3.2. Prior distributions
For the model based on the MA Cholesky decomposition, we adopt diffuse priors. The diffuse prior distributions for β, η, and λ are given by
\beta \sim N(0, \sigma_\beta^2 I),    (10)
\eta \sim N(0, \sigma_\eta^2 I),    (11)
\lambda \sim N(0, \sigma_\lambda^2 I),    (12)

where σ²_β, σ²_η, and σ²_λ are set to large values (for example, 100).
3.3. Bayesian analysis via MCMC
To derive the likelihood function for the GLMM in (1) and (2), we consider the simple case of longitudinal binary data. Then the link function g(·) is the logit, and we let ω = (β^T, η^T, λ^T)^T. The joint distribution of the sample and the random effects in (1) and (2) is given by
L(\omega; y, b) = \prod_{i=1}^N \left[ \prod_{t=1}^{n_i} p_{it}(b_{it}, \beta)^{y_{it}} \{1 - p_{it}(b_{it}, \beta)\}^{1 - y_{it}} \right] \phi(b_i \mid \eta, \lambda),    (13)
where p_it(b_it, β) = P(Y_it = 1 | b_it, β, x_it) and φ(b_i | η, λ) is the multivariate normal density with mean vector 0 and covariance matrix Σ_i, which simplifies to

\phi(b_i \mid \eta, \lambda) = (2\pi)^{-n_i/2} \left( \prod_{t=1}^{n_i} \sigma^2_{i,t} \right)^{-1/2} \exp\left( -\frac{1}{2} b_i^T L_i^{-T} D_i^{-1} L_i^{-1} b_i \right).
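Numerically, this density is most conveniently evaluated through the factor form rather than by building Σ_i explicitly: since L_i is unit lower triangular, the determinant reduces to the product of the IVs, and a triangular solve recovers e_i = L_i^{-1} b_i. A minimal sketch (function name hypothetical):

```python
import numpy as np
from scipy.linalg import solve_triangular

def log_phi_b(b_i, L_i, sigma2):
    """Evaluate log phi(b_i | eta, lambda) via the MA factorization.

    L_i   : unit lower triangular MA factor (n_i x n_i)
    sigma2: vector of innovation variances sigma^2_{i,t}
    """
    e_i = solve_triangular(L_i, b_i, lower=True, unit_diagonal=True)  # e_i = L_i^{-1} b_i
    quad = np.sum(e_i ** 2 / sigma2)   # equals b_i^T L_i^{-T} D_i^{-1} L_i^{-1} b_i
    n_i = b_i.shape[0]
    return -0.5 * (n_i * np.log(2 * np.pi) + np.sum(np.log(sigma2)) + quad)
```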
From the distribution (13) and prior distributions (10)–(12), the joint distribution is given by
P(y, b, \beta, \eta, \lambda) \propto \phi(\beta; \sigma_\beta^2)\, \phi(\eta; \sigma_\eta^2)\, \phi(\lambda; \sigma_\lambda^2) \prod_{i=1}^N \left\{ \left[ \prod_{t=1}^{n_i} p_{it}(b_{it}, \beta)^{y_{it}} \{1 - p_{it}(b_{it}, \beta)\}^{1 - y_{it}} \right] \phi(b_i \mid \eta, \lambda) \right\}.
Full conditional posterior distributions are required to implement the MCMC algorithm (Gelman et al., 2004), and they are given as follows:
• For β,
P(\beta \mid y, b, \eta, \lambda) \propto \left[ \prod_{i=1}^N \prod_{t=1}^{n_i} p_{it}(b_{it}, \beta)^{y_{it}} \{1 - p_{it}(b_{it}, \beta)\}^{1 - y_{it}} \right] \phi(\beta; \sigma_\beta^2)
\propto \left[ \prod_{i=1}^N \prod_{t=1}^{n_i} p_{it}(b_{it}, \beta)^{y_{it}} \{1 - p_{it}(b_{it}, \beta)\}^{1 - y_{it}} \right] \exp\left( -\frac{1}{2\sigma_\beta^2} \beta^T \beta \right).

• For η,
P(\eta \mid y, b, \beta, \lambda) \propto \left[ \prod_{i=1}^N \phi(b_i \mid \eta, \lambda) \right] \phi(\eta; \sigma_\eta^2)
\propto \exp\left( -\frac{1}{2} \sum_{i=1}^N b_i^T L_i^{-T} D_i^{-1} L_i^{-1} b_i \right) \exp\left( -\frac{1}{2\sigma_\eta^2} \eta^T \eta \right).
• For λ,

P(\lambda \mid y, b, \beta, \eta) \propto \left[ \prod_{i=1}^N \phi(b_i \mid \eta, \lambda) \right] \phi(\lambda; \sigma_\lambda^2)
\propto \left[ \prod_{i=1}^N \prod_{t=1}^{n_i} (\sigma^2_{i,t})^{-1/2} \right] \exp\left( -\frac{1}{2} \sum_{i=1}^N b_i^T L_i^{-T} D_i^{-1} L_i^{-1} b_i \right) \exp\left( -\frac{1}{2\sigma_\lambda^2} \lambda^T \lambda \right).
• For b_i (i = 1, ..., N),

P(b_i \mid y, \beta, \eta, \lambda) \propto \left[ \prod_{t=1}^{n_i} p_{it}(b_{it}, \beta)^{y_{it}} \{1 - p_{it}(b_{it}, \beta)\}^{1 - y_{it}} \right] \phi(b_i \mid \eta, \lambda)
\propto \left[ \prod_{t=1}^{n_i} p_{it}(b_{it}, \beta)^{y_{it}} \{1 - p_{it}(b_{it}, \beta)\}^{1 - y_{it}} \right] \exp\left( -\frac{1}{2} b_i^T L_i^{-T} D_i^{-1} L_i^{-1} b_i \right).
Here φ(·; σ²) denotes the multivariate normal density with mean 0 and covariance matrix σ²I. Since all full conditionals are analytically intractable and not easy to sample from directly, we construct suitable proposals for a Metropolis–Hastings step (Hastings, 1970; Gamerman, 1997). In practice, Gibbs sampling is implemented using WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml). The MCMC algorithm iterates draws from the above full conditionals until convergence is achieved. Two long chains with different initial values are used for the proposed model.
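Purely as an illustration of the Metropolis–Hastings step the full conditionals call for (the paper's actual fit is in WinBUGS), a random-walk update for one parameter block might be sketched as follows; all names are hypothetical.

```python
import numpy as np

def mh_update(current, log_conditional, rng, step=0.1):
    """One random-walk Metropolis-Hastings update for a parameter block.

    log_conditional: function returning the log of the (unnormalized)
                     full conditional density at a given parameter value.
    """
    proposal = current + step * rng.normal(size=current.shape)
    log_ratio = log_conditional(proposal) - log_conditional(current)  # symmetric proposal
    if np.log(rng.uniform()) < log_ratio:
        return proposal            # accept
    return current                 # reject, keep current value
```

Within a Gibbs sweep, `log_conditional` for η would sum log φ(b_i | η, λ) over subjects (rebuilding each L_i from η) and add the prior term −η^Tη/(2σ²_η); analogous updates would cycle through β, λ, and each b_i.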
In Bayesian modeling, there are several model selection criteria, such as posterior predictive loss, the Bayes factor, and the deviance information criterion (DIC) (Daniels and Hogan, 2008). In this paper, we use the DIC for Bayesian model comparison (Spiegelhalter et al., 2002). Let θ be the vector of all parameters and let L(θ|y) be the likelihood of y = (y_1, ..., y_N)^T. Then the DIC is given by

\mathrm{DIC} = \bar{D}(\theta) + p_D,

where \bar{D}(\theta) is the posterior mean of the deviance D(\theta) = -2 \log L(\theta \mid y) (a measure of goodness of fit) and p_D = \bar{D}(\theta) - D(\bar{\theta}) is the effective number of parameters (a penalty for model complexity), with D(\bar{\theta}) the deviance evaluated at the posterior mean of the parameters. The DIC is defined analogously to AIC, and models with smaller DIC values are preferred. The advantage of the DIC over other criteria for Bayesian model selection is that it is easily calculated from the MCMC samples. In contrast, AIC and BIC require the likelihood evaluated at its maximum, which is not readily available from the MCMC simulation.
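Given MCMC output, the DIC is a few lines of post-processing. In this sketch, `loglik` is a hypothetical function returning log L(θ|y) for one parameter vector.

```python
import numpy as np

def dic(theta_draws, loglik):
    """DIC from posterior draws: DIC = D_bar + p_D (= D(theta_bar) + 2 p_D)."""
    devs = np.array([-2.0 * loglik(th) for th in theta_draws])  # deviance per draw
    d_bar = devs.mean()                        # posterior mean deviance
    theta_bar = np.mean(theta_draws, axis=0)   # posterior mean of the parameters
    d_hat = -2.0 * loglik(theta_bar)           # deviance at the posterior mean
    p_d = d_bar - d_hat                        # effective number of parameters
    return d_bar + p_d
```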
4. Lung cancer data
This trial was designed as a prospective, open-label, randomized, non-comparative parallel study in a single institution to evaluate the response rate (the percentage of patients whose cancer shrinks or disappears after treatment) for two treatments, GEFITINIB and ERLOTINIB (Kim et al., 2012). A total of 96 patients with lung cancer were randomly assigned to either GEFITINIB or ERLOTINIB. The overall response rates were 47.9% (95% CI, 33.8%–62.0%) in the GEFITINIB arm and 39.6% (95% CI, 25.7%–53.4%) in the ERLOTINIB arm (Kim et al., 2012). In this paper, however, it was of interest to see whether the treatments had a negative impact on patients' quality of life (QOL). The health-related QOL questionnaire consisted of 30 items developed by the European Organization for Research and Treatment of Cancer (EORTC). Specifically, we examined one of the QOL measures, loss of appetite (Appetite). Appetite was originally scored on a 4-point scale ranging from normal appetite, "usually did not lose my appetite at all" (1), to no appetite, "usually lost my appetite" (4). For the purpose of this paper we dichotomized the score into scores 1–2 (Y = 0) and scores 3–4 (Y = 1). In our analysis, we used the 93 patients without missing data at baseline. To examine treatment differences in appetite levels, we included the type of treatment (GROUP = 1 for GEFITINIB, 0 for ERLOTINIB) and the re-scaled visit number (TIME = 0.0, 0.1, ..., 2.4). We assumed the missing responses (mostly due to dropout) were missing at random (MAR) in our initial analysis.

Fig. 1 presents the marginal proportions of Y = 1 for the two treatment groups. The proportions for the two groups were similar until visit 15; thereafter, the proportion of Y = 1 for GEFITINIB was larger than that for ERLOTINIB.

We fit five models for Σ_i using various structures, specified in Table 1. Models 1 and 2 are GLMMs with a random effects covariance matrix using AR(1) and AR(2) modified Cholesky decompositions, respectively. Models 3 and 4 are GLMMs with a random effects covariance matrix using MA(1) and MA(2) moving average Cholesky decompositions, respectively. All four of these models have a homogeneous covariance matrix. Model 5 is a GLMM with an independent, homogeneous random effects covariance matrix.

To examine the convergence of the Markov chains, we simulated two chains of 30,000 iterations starting from two different initial values. As an example, Fig. 2 illustrates the convergence of the Markov chains for the parameter β_2, the coefficient of GROUP. The figure shows that the two Markov chains with different initial values converge to the same target distribution. Similar patterns were found for the other parameters. Therefore, we concluded that the Markov chains converged properly for estimation of our model.
Fig. 1. Marginal proportions of Y = 1 under the two groups. Solid line: GEFITINIB; dashed line: ERLOTINIB.

Table 1. Specifications of φ_itj, l_itj, and log σ²_it.

Model 1: φ_itj = γ_0 I(|t−j|=1);                    log σ²_it = λ_0
Model 2: φ_itj = γ_0 I(|t−j|=1) + γ_2 I(|t−j|=2);   log σ²_it = λ_0
Model 3: l_itj = η_0 I(|t−j|=1);                    log σ²_it = λ_0
Model 4: l_itj = η_0 I(|t−j|=1) + η_2 I(|t−j|=2);   log σ²_it = λ_0
Model 5: (independent random effects);              log σ²_it = λ_0
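The Table 1 specifications translate directly into the design vectors of (7) and (9). For instance, Model 4 sets z_{i,tj} = (I(|t−j|=1), I(|t−j|=2))^T and h_{i,t} = 1, so Σ_i could be assembled with the sigma_from_ma_iv sketch from Section 3.1; the parameter values below are illustrative, not the fitted ones.

```python
import numpy as np

# Model 4 of Table 1: l_itj = eta_0 I(|t-j|=1) + eta_2 I(|t-j|=2), log sigma^2_it = lambda_0
n = 6
Z = [[np.array([float(t - j == 1), float(t - j == 2)]) for j in range(t)]
     for t in range(n)]
H = [np.array([1.0]) for _ in range(n)]

eta = np.array([0.8, 0.1])    # illustrative MA(1), MA(2) coefficients
lam = np.array([1.0])         # illustrative log-IV intercept
Sigma_model4 = sigma_from_ma_iv(Z, H, eta, lam)   # helper sketched in Section 3.1
```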
Fig. 2. Convergence diagnostics: time plot of two Markov chains with different initial values.
Table 2 presents the posterior means of the parameters for Models 1–5. There is no formal method to verify an AR or MA structure of the random effects covariance matrix for GLMMs. Therefore, we compared the five models with AR- or MA-structured random effects covariance matrices using a model selection criterion (DIC). The DIC values indicate that Model 4 provided a better fit than the other models, so we focus on Model 4. The estimates of the IV and MA(1) parameters were significant, and the estimate of the MA(2) parameter was not. Under ERLOTINIB (Group_i = 0), the log odds of the conditional posterior probability of losing appetite decreased by 0.252 (= 2.52 × 0.1) as the visit number increased by 0.1. Under GEFITINIB (Group_i = 1), the log odds of the conditional posterior probability of losing appetite increased by 0.067 (= (−2.52 + 3.19) × 0.1) as the visit number increased by 0.1.
5. Conclusion
We proposed moving average Cholesky regression models for the random effects covariance matrix of generalized linear mixed models. The covariance matrix is factored into the IV and MA parameters, which are modeled with measured covariates. The positive-definiteness restriction on the covariance matrix is satisfied whenever the IVs are positive. This approach also reduces the number of parameters for the covariance matrix by using regression models for the IV and MA parameters. The proposed models were implemented using a Bayesian approach for parameter estimation; the WinBUGS program was used to fit them, and the WinBUGS code is available upon request. The analysis of the lung cancer data showed that the estimated conditional posterior probability of losing appetite decreased under ERLOTINIB and increased under GEFITINIB as the visit number increased.
Table 2. Posterior means (95% Bayesian confidence intervals) of the fixed effects parameters, GARP (MA), and IV for Models 1–4, and of the fixed effects and variance parameters for Model 5. An asterisk (*) indicates statistical significance at the 95% confidence level.

Fixed effects parameters β
  Int:         Model 1: −0.80 (−2.56, 0.79);  Model 2: −0.84 (−2.70, 0.79);  Model 3: −2.19* (−4.53, −0.41);  Model 4: −2.22* (−4.79, −0.34);  Model 5: −1.27 (−2.92, −0.35)
  Time:        Model 1: −2.38* (−4.25, −0.81);  Model 2: −2.46* (−4.37, −0.88);  Model 3: −2.49* (−4.30, −1.05);  Model 4: −2.52* (−4.52, −1.01);  Model 5: −1.04* (−2.13, −0.35)
  Group:       Model 1: −2.01 (−4.31, 0.09);  Model 2: −2.04* (−4.41, 0.14);  Model 3: −2.51* (−4.99, −0.21);  Model 4: −2.52* (−5.17, −0.06);  Model 5: −1.21* (−2.71, −0.05)
  Time×Group:  Model 1: 2.79* (0.72, 5.08);  Model 2: 2.84* (0.80, 5.18);  Model 3: 3.17* (1.32, 5.36);  Model 4: 3.19* (1.18, 5.51);  Model 5: 1.58 (0.62, 3.09)

GARP or MA parameters γ or η
  Int (AR(1) or MA(1)):  Model 1: 0.92* (0.82, 1.01);  Model 2: 0.92* (0.82, 1.00);  Model 3: 0.79* (0.56, 1.09);  Model 4: 0.79* (0.56, 1.09)
  Int (AR(2) or MA(2)):  Model 2: 0.00 (−2.33, 2.35);  Model 4: 0.00 (−2.38, 2.37)

IV parameters λ
  Int:         Model 1: 1.49* (0.55, 2.60);  Model 2: 1.60* (0.65, 2.86);  Model 3: 2.61* (1.56, 3.61);  Model 4: 2.63* (1.56, 3.78);  Model 5: 0.46 (−2.03, 2.88)

DIC:           Model 1: 468.990;  Model 2: 460.523;  Model 3: 453.559;  Model 4: 452.661;  Model 5: 761.196
Acknowledgments

We would like to thank Dr. Myung-Ju Ahn of Samsung Medical Center in South Korea for providing the data, and for help with data collection and in clarifying some issues with the data. For Keunbaik Lee, this work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012R1A1A1004002). For Jae Keun Yoo, this work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012R1A1A1040077).

References

Breslow, N.E., Clayton, D.G., 1993. Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc. 88, 125–134.
Daniels, M.J., Zhao, Y.D., 2003. Modelling the random effects covariance matrix in longitudinal data. Stat. Med. 22, 1631–1647.
Gamerman, D., 1997. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall, London.
Gelman, A., Carlin, J., Stern, H., Rubin, D.B., 2004. Bayesian Data Analysis, second ed. Chapman & Hall, London, UK.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Heagerty, P.J., Kurland, B.F., 2001. Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika 88, 973–985.
Lee, K., 2013. Bayesian modeling of random effects covariance matrix for generalized linear mixed models. Comm. Statist. Appl. Methods 20, 235–240.
Lee, K., Joo, Y., Yoo, J.K., Lee, J., 2009. Marginalized random effects models for multivariate longitudinal binary data. Stat. Med. 28, 1284–1300.
Lee, K., Yoo, J.K., Lee, J., Hagan, J., 2012. Modeling the random effects covariance matrix for the generalized linear mixed models. Comput. Statist. Data Anal. 56, 1545–1551.
Pan, J.X., MacKenzie, G., 2003. On modelling mean–covariance structures in longitudinal studies. Biometrika 90, 239–244.
Pan, J.X., MacKenzie, G., 2006. Regression models for covariance structures in longitudinal studies. Stat. Model. 6, 43–57.
Pourahmadi, M., 1999. Joint mean–covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika 86, 677–690.
Pourahmadi, M., 2000. Maximum likelihood estimation of generalized linear models for multivariate normal covariance matrix. Biometrika 87, 425–435.
Pourahmadi, M., Daniels, M.J., 2002. Dynamic conditionally linear mixed models for longitudinal data. Biometrics 58, 225–231.
Zhang, W., Leng, C., 2012. A moving average Cholesky factor model in covariance modelling for longitudinal data. Biometrika 99, 141–150.