Computational Statistics and Data Analysis 56 (2012) 1944–1951
Contents lists available at SciVerse ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
A combined overdispersed and marginalized multilevel model✩ Samuel Iddi a , Geert Molenberghs b,a,∗ a
I-BioStat, Katholieke Universiteit Leuven, B-3000 Leuven, Belgium
b
I-BioStat, Universiteit Hasselt, Agoralaan 1, B-3590 Diepenbeek, Belgium
article
info
Article history: Received 16 June 2011 Received in revised form 24 November 2011 Accepted 28 November 2011 Available online 8 December 2011 Keywords: Combined model Correlation Overdispersion Partial marginalization
abstract Overdispersion and correlation are two features often encountered when modeling nonGaussian dependent data, usually as a function of known covariates. Methods that ignore the presence of these phenomena are often in jeopardy of leading to biased assessment of covariate effects. The beta-binomial and negative binomial models are well known in dealing with overdispersed data for binary and count data, respectively. Similarly, generalized estimating equations (GEE) and the generalized linear mixed models (GLMM) are popular choices when analyzing correlated data. A so-called combined model simultaneously acknowledges the presence of dependency and overdispersion by way of two separate sets of random effects. A marginally specified logistic-normal model for longitudinal binary data which combines the strength of the marginal and hierarchical models has been previously proposed. These two are brought together to produce a marginalized longitudinal model which brings together the comfort of marginally meaningful parameters and the ease of allowing for overdispersion and correlation. Apart from model formulation, estimation methods are discussed. The proposed model is applied to two clinical studies and compared to the existing approach. It turns out that by explicitly allowing for overdispersion random effect, the model significantly improves. © 2011 Elsevier B.V. All rights reserved.
1. Introduction Non-Gaussian hierarchical outcomes are commonly encountered in applied statistics. A rich array of methodology, mostly based on extending the univariate generalized linear models family (GLM; (Nelder and Wedderburn, 1972; Agresti, 2002; McCullagh and Nelder, 1989)), has become available, with a lot of recent work in the area (Hoff, 2011; Commenges et al., 2011). Roughly speaking, methods can be divided into population-averaged models, such as generalized estimating equations (Liang and Zeger, 1986) and subject-specific models, such as the generalized linear mixed model (GLMM; (Engel and Keen, 1994; Wolfinger and O’Connell, 1993; Breslow and Clayton, 1993)). The random effects also capture overdispersion to some extent but the model fit may be compromised because such random effects should be able to handle intra-subject variability and overdispersion simultaneously. Illustrations of this inadequacy are given in Molenberghs et al. (2007). In all cases, apart from the data hierarchy, overdispersion can be encountered, often even in the univariate case. Appropriate models have been developed in this respect, such as the negative binomial model for count data (Hinde and Demétrio, 1998a,b). Molenberghs et al. (2007, 2010) brought together and extended the aforementioned modeling frameworks to accommodate overdispersion and correlation in repeated, overdispersed non-Gaussian data through the introduction of two sets of
✩ The software code used for this paper is available as Supplementary Materials.
∗
Corresponding author at: I-BioStat, Universiteit Hasselt, Agoralaan 1, B-3590 Diepenbeek, Belgium. Tel.: +32 11268238. E-mail addresses:
[email protected],
[email protected] (G. Molenberghs).
0167-9473/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2011.11.021
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951
1945
random effects. The model is simply called the combined model (CM): an overdispersion random effect, often of a classical conjugate type, is introduced to handle overdispersion and juxtaposed with longitudinal random effects, often normally distributed, to cope with the hierarchical nature of the data. As a by-product, many of these models allow for analytical derivation of the marginal distributions, as well as their corresponding moments. This, in turn, is useful when marginal correlations or other explicit functions need to be derived. However, the derivations do require some algebra and, more importantly, the corresponding marginal functions may be neither very intuitive nor easy to manipulate, given that some contain infinite series. It is therefore sensible to turn to the concepts laid out in Heagerty (1999) and Heagerty and Zeger (2000), who proposed a so-called marginally specified logistic-normal model for longitudinal binary data, combining the strength of GEE and GLMM. The method specifies, at first sight contrary to intuition, a separate model for the marginal and conditional means and links them using a function that depends on covariates, marginal parameters, and the random-effect specification. Hence, both a marginal and conditional interpretation of the parameters can be maintained. Also, direct comparison can be made between the marginal and conditional parameters without fitting separate models. This model, called the marginalized multilevel model (MMM) also enjoys the advantages of utilizing likelihood-based estimation, such as the availability of expressions for the full probability distribution of the response (Fitzmaurice and Laird, 1993; Molenberghs and Lesaffre, 1994), and produces valid inferences when data are missing at random (MAR), among others. The CM, just like the GLMM, is a conditionally specified model which fails to provide a simple population-averaged interpretation for the covariate effects, in spite of the aforementioned closed forms. On the other hand, the MMM may fail to flexibly capture overdispersion and correlation simultaneously. Therefore, we bring together in a single modeling framework the attractive features of both CM and MMM: the combined overdispersed and marginalized multilevel model (COMMM). The remainder of the paper is organized as follows. We devote Section 2 to introducing two case studies, one in count data and one in binary data. Models specific for these situations are introduced, parameters fitted, and inferences drawn. Thereafter, a fully general account of methodology for the existing CM and MMM, and our proposed COMMM will be presented in Section 3. The estimation method adopted in this study is explained in Section 4. 2. The analysis of two case studies 2.1. Count data: a clinical trial in epileptic patients The data are from a randomized, double-blind, parallel group multi-center study for the comparison of placebo with a new anti-epileptic drug (AED), in combination with one or two other AED’s (Faught et al., 1996). Patients were randomized after a 12-week stabilization period for the use of AED’s, and during which the number of seizures were counted. After that run-in period, 45 patients were assigned to the placebo group, 44 to the new treatment. Patients were measured weekly and followed (double-blind) during 16 weeks; thereafter they entered a long-term open-extension study. Some patients were followed for up to 27 weeks. The outcome of interest is the number of epileptic seizures experienced during the latest week, i.e., since the last time the outcome was measured. The research question is whether or not the additional new treatment reduces the number of epileptic seizures. Let the number of epileptic seizures that patient i experienced during follow-up period j be Yij and tij the time point at which Yij was measured. In view of the research question to be answered, one assumes Yij to follow a Poisson distribution and focuses on the model: log(πijm ) =
β00 + β01 tij β10 + β11 tij
if placebo, if treatment,
(1)
where πijm is the marginal mean function. To allow for correlation between repeated measures, the conventional way would be to include a normally distributed random effect, say bi ∼ N (0, d), into the predictors in (1). However, the so-resulting generalized linear mixed model would no longer allow for a marginal interpretation. Therefore, in line with Heagerty (1999), we leave (1) intact, but rather specify, at the same time, the conditional model log(πijc ) = ∆ij + bi with bi ∼ N (0, d). Here, ∆ij connects the marginal and conditional specification. In Section 3, we return to its precise nature. This is our first instance of an MMM. The data may also exhibit overdispersion. In that case, a COMMM can be considered, for which the mean is written as πijc = θij exp(∆ij + bi ) where θij ∼ Gamma(α1 , α2 ). Note that the constraint α2 = 1/α1 needs to be imposed because this model is over-parameterized. If this addition is done without marginalization, then we refer to the resulting model as the combined model (CM), in line with Molenberghs et al. (2007, 2010). A fully general introduction of the model is given in Section 3, while details about the estimation procedure are presented in Section 4. We now continue with the data analysis. Parameter estimates and standard errors for the log–log-normal MMM and the gamma-log–log-normal COMMM model are presented in Table 1. Observe that the parameter estimates for the two models are very similar, with the same holding for the standard errors. This will be different in the next example. This notwithstanding, the log–log-normal model improves when the gamma random effect is introduced, as seen from a likelihood ratio comparison. This crucially affects inferences about the difference between the slopes as well as the ratio of the slopes. For the log–log-normal model, the difference of the
1946
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951 Table 1 A clinical trial in epileptic patients. Comparison of the log–log-normal MMM with the combined gamma and log–lognormal MMM.
Effect
Parameter
Intercept placebo Slope placebo Intercept treatment Slope treatment Std. dev. random effect Negative-binomial par. Negative-binomial par. −2 log-likelihood
β00 β01 β10 β √11 d
α1 α2 =
1
α1
CM gamma and log-normal
MMM log–log-normal
COMMM gamma and log–log normal
Estimate (s.e.)
Estimate (s.e.)
Estimate (s.e.)
0.9112 (0.1755)
1.3960 (0.1887)
1.4757 (0.1962)
−0.0248 (0.0077)
−0.0143 (0.0044)
−0.0248 (0.0077)
0.6555 (0.1782)
1.2256 (0.1901)
1.2200 (0.1970)
−0.0118 (0.0075)
−0.0120 (0.0043)
−0.0118 (0.0075)
1.0625 (0.0871) 2.4640 (0.2113) 0.4059 (0.0348) −7664
1.0755 (0.0857) – – −6810
1.0625 (0.0871) 2.4640 (0.2113) 0.4059 (0.0348) −7664
slopes β11 −β01 was found not to be significantly different form zero while the ratio of the slopes β11 /β01 showed a significant difference from one (p = 0.7111 and p = 0.0376, respectively). On the other hand, both the slope difference (p = 0.2260) and ratio (p = 0.1591) showed non-significance in the combined model. To understand this, two things need to be borne in mind. First, the above demonstrates that, due to more careful modeling of the association and dispersion structures, inferences about functions of the model parameters may be erroneous in the simpler model, underscoring that care must be taken regarding conclusions based on the simpler model. Indeed, it would lead to a significant treatment difference, whereas the more general combined model showed no evidence for treatment difference. Similar observations were also made by Molenberghs et al. (2007), where the combined Poisson–gamma-normal showed a strong improvement of the Poisson GLMM model, underscoring the importance of introducing the gamma random effect. Second, and very important, one should not directly compare the estimates in the marginalized and the conditional version. Indeed, in the MMM model, treatment effects, slopes, etc. have a marginal interpretation. In addition, we can examine the results of fitting a combined beta and log-normal model, which is purely conditionally specified. The interpretation of the latter should be considered at the individual level, or at least for a change between two patients with different covariate profile (e.g., treated versus non-treated), but with the same level of the random effect. We deduce from these results that for a random intercept model, only the intercept parameters are affected but all other parameters remain the same compared to the combined gamma and log–log-normal model. These would, however, not be the same, for example, for a random intercept and slope model. Given that the log link was used for both marginal and conditional models, we see further that the log-likelihood remains the same across both combined models. 2.2. A clinical trial in onychomycosis The data come from a randomized, double-blind, parallel group, multicenter study for the comparison of two oral treatments, A and B, for toenail dermatophyte onychomycosis (TDO), described in full detail by De Backer et al. (1996). TDO is a common toenail infection, difficult to treat, affecting more than 2% of western populations (Roberts, 1992). Antifungal compounds, classically used for treatment of TDO, need to be taken until the whole nail has grown out healthily. The development of new-generation compounds has reduced the treatment duration to 3 months. The aim of the present study was to compare the efficacy and safety of 12 weeks of continuous therapy with treatment A or with treatment B. In total, 2 × 189 patients were randomized. Subjects were followed during 12 weeks of treatment and followed further, up to a total of 48 weeks. Measurements were taken at baseline, every month during treatment, and every 3 months from then on, leading to a maximum of 7 measurements per subject. The outcome of interest was the severity of the infection, coded as 0 (not severe) or 1 (severe). The question of interest was whether the percentage of severe infections decreased over time, and whether that evolution was different for the two treatment groups. Also here, both the conditional as well as the marginal mean will be specified. The conditional model takes the form: Yij |bi ∼ Bernoulli(πijc ),
Φ −1 πijc = ∆ij + bi , where bi ∼ N (0, d). The corresponding mean model logit(πijm ) = β0 + β1 Xi + β2 tij + β3 Xi tij . Here, Xi is an indicator for the treatment applied to subject i, tij is the time at which the jth measurement is taken. For the COMMM model, the conditional mean model is specified as πijc = θij Φ (∆ij + bi ) where θij ∼ Beta(α1 , α2 ) and Φ −1 is the probit link. The constraint c = α2 /α1 was imposed. From the results presented in Table 2, it is again clear that introducing the beta random effect improves significantly the model fit when comparing the log-likelihoods (smaller AIC). Parameter estimates from both models are slightly different, but a much more dramatic effect is seen in precision estimation. For many, but not all parameters, the extended model yields a higher precision. Furthermore, we observed that whereas the broader model encompassing both overdispersion and
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951
1947
Table 2 A clinical trial in onychomycosis. Comparison of logistic–probit-normal MMM with the combined Beta and logistic–probitnormal MMM. CM beta and probit-normal
MMM logistic–probitnormal
COMMM beta and logistic–probit-normal
Effect
Parameter
Estimate (s.e)
Estimate (s.e)
Estimate (s.e)
Intercept Treatment effect Time effect Interaction effect Std. dev. random effect Beta-binomial parameter −2 log-likelihood
β0 β1 β2 β √3
−0.7285 (0.8622) −0.7404 (1.1816) −0.9109 (0.2321) −0.3989 (0.1876)
−0.6154 (0.1493) −0.0382 (0.2120) −0.1529 (0.0190) −0.0702 (0.0288)
−0.4762 (0.0408) −0.1858 (0.1240) −0.1832 (0.0241) −0.0691 (0.0392)
8.6763 (1.9535) 0.2828 (0.0372) 1259.9
2.1061 (0.1904) – 1265.2
8.8901 (0.0152) 0.2769 (0.0363) 1254.0
d
α2 /α1
correlation concludes that there is no effect of the evolution of treatment (β3 ) on the response with p-value of p = 0.0790, the MMM model results in a significant treatment evolution (p = 0.0155). Also presented in Table 2 are the results for a combined beta and probit-normal model whose parameters have a conditional interpretation. The treatment evolution was found to be significant with p-value of p = 0.0343. By comparing the two combined models, which both account for overdispersion and correlation simultaneously but with different interpretation of parameters, we may conclude that, while there is a significant treatment evolution given subjects, there is no evidence of population average treatment evolution. 3. Methodology In the previous section, two instances of an MMM and COMMM were given. Clearly, there is a single pattern underlying both, lending itself naturally to generalization. 3.1. General formulation of the combined model (CM) Suppose the longitudinal outcome Yij is measured for each independent subject i = 1, . . . , N at occasion j = 1, 2, . . . , ni . The response for the ith subject, Yi = (Yi1 , Yi2 , . . . , Yini )T is assumed to follow an exponential-family distribution. Hence, the density of an individual outcome Yij , conditional on overdispersion and longitudinal random effects, takes the generic form: fi yij |bi , β, θij , φ = exp φ −1 yij λij − ψ(λij ) + c (yij , φ) .
The conditional mean follows as the product: E Yij |bi , β, θij = µcij = ψ ′ (λij ) = θij κij ,
(2)
where the random variable
θij ∼ Θij υij , σij2 ,
(3)
with mean υij , variance σ
2 ij ,
g (κij ) = x′ij β + zij′ bi
and the mean component (4)
depends on an ni × p fixed-effects design Xi and a ni × q random-effects design Zi through a link function g (·); β and bi ∼ N (0, D) are fixed and random effects, respectively. The two sets of random effects θij and bi are conveniently assumed to be independent of each other, but this can be generalized if needed. Depending on the type of outcome under investigation, the distribution of θij can be chosen appropriately. The distribution of certain outcomes tends to combine nicely with a so-called conjugate distributional choice for θij (Cox and Hinkley, 1974, p. 370). This is well-known for the case without additional normal random effects. Conjugate distributions produce a general and closed-form solution for the corresponding marginal distribution. The particular case of the normal distribution is self-conjugate, i.e., conjugacy applies when θij is then chosen normally distributed as well. Further, the binomial and beta distributions are conjugate, as are the Poisson and gamma distributions. Also the Weibull and gamma distributions are conjugate. For the case where also normal random effects are present, Molenberghs et al. (2010) defined strong conjugacy, which essentially means that a simple, closed-form marginalization over the strongly conjugate random effect, but conditional on the normal random effect, applies. The marginal joint distribution and its moments are easy to obtain when strong conjugacy holds. In the absence of this property, straightforward marginalization is not guaranteed. For example, such expressions for a Poisson outcome Yi with gamma and normal random effects are given in Molenberghs et al. (2007). Similarly, expressions for a Bernoulli model with probit link and beta and normal random effects, and for Weibull and exponential models for survival data with gamma and normal random effects can be found in Molenberghs et al. (2010). The probit case is special in that it allows for closed-form marginalization, even though strong conjugacy does not apply.
1948
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951
In contrast, Bernoulli models with logit link do not allow for strong conjugacy with the beta and normal random effects and, not surprisingly, explicit expressions for the marginal moments and joint marginal distribution are unavailable. This also follows from the fact that there is no closed form solution for the logistic-normal GLMM model. However, closed-form expressions do exist for a logistic-normal GLMM model with a logit link that assumes a so-called bridge distribution for the random effects, as introduced by Wang and Louis (2003). 3.2. General marginalized multilevel models formulation (MMM) The general marginalized multilevel model due to Heagerty (1999) can be written as ′ m g (µm ij ) = xij β ,
(5)
g (µ ) = ∆ij + zij bi , ′
c ij
bi ∼ Fb (0, D) , Yijc
(6)
= Yij |bi ∼ FY c µcij , υ .
(7)
Here, υ is a dispersion parameter, similar to the overdispersion parameter φ in the exponential family. The marginal mean µm ij = E (Yij ) is made to depend on an ni × p matrix of p linear predictors Xi through a link function g (·). Further, the conditional mean µcij = E (Yij |bi ) relates to the random variable bi with distribution (6) and the function ∆ij connects the marginal and conditional means through the same link function; the latter aspect could be relaxed if desired. The conditional response distribution is given by FY c . The function ∆ij is obtained from the solution to the integral equation
µ =g m ij
−1
(xij β ) = ′
m
g −1 (∆ij + zij′ bi )dFb .
(8)
b
For example, when the link function is logit and the distribution of the random effect is normal, the expression of ∆ij is obtained from: expit(x′ij βm ) =
expit(∆ij + zij′ bi )ϕ(bi |0, D)dbi .
b
Griswold and Zeger (2004) expanded the model by formally relaxing the common link function assumed for both the marginal and conditional model specification. For example, using a logistic–probit-normal model formulated as, ′ m logit(µm ij ) = xij β ,
Φ −1 (µcij ) = ∆ij + zij′ bi , bi ∼ Fb (0, D) , Yijc |bi = Yij ∼ FY c µcij , υ . (8) becomes
∆ij =
1 + zij′ Dzij · Φ −1 {expit(x′ij βm )}.
(9)
The logit–probit-normal is more attractive than the logit–logit normal version in the sense that, for example, the marginal parameters will enjoy the odds ratio interpretation while at the same time retaining the computational advantage associated with the probit-normal relationship. For count data, a log–log-normal specification leads to
∆ij = x′ij βm − zij′ Dzij /2.
(10)
Note from this expression that, in particular for a random intercept model, i.e., one where zij bi = bi with bi ∼ N 0, τ 2 , ′
√
then zij′ Dzij = 1 + τ 2 , which implies that only fixed intercept parameters will be affected in the MMM model compared to their counterparts in the conditional GLMM model. For a general random-effects design zij′ bi , this will not be the case. The expression for ∆ij , in a case of probit–probit-normal, log–log-gamma model and the logistic–logistic-Bridge MMM can be found in Griswold and Zeger (2004). 3.3. Combined overdispersed and marginalized multilevel models (COMMM) We propose a new, extended model by combining (2)–(4) from the CM with (5)–(7) from the MMM in the following way: ′ m g (µm ij ) = xij β
g (κij ) = ∆ij + zij′ bi
µcij = θij κij θij ∼ Θij τij , σij2 bi ∼ Fb (0, D) Yijc = (Yij |θij , bi ) ∼ FY c µcij , υ .
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951
1949
Note that the response distribution is now conditioned on two sets of random effects, namely the overdispersion and longitudinal ones. This implies that the expression for ∆ij will change slightly. Because µcij = E (Yij |θij , bi ), the function ∆ij will now be obtained from the integral equation
µ =g m ij
−1
(xij β ) = ′
m
b
θ
θij g
−1
(∆ij + zij bi )dΘθ dFb = ′
E (θij )g −1 (∆ij + zij′ bi )dFb .
(11)
b
It can easily be shown that for the logistic–probit-normal model with beta distribution for the overdispersion parameter, i.e., θij ∼ Beta(α1j , β2j ), (11) becomes
∆ij =
1 + zij′ Dzij · Φ −1 {(1 + cj ) · expit(x′ij βm )},
where cj = β2j /α1j , which can serve as one of several possible constraints, given that the model is now over-parameterized. For the log–log-normal MMM model with θij ∼ Gamma(α1j , α2j ),
∆ij = − log(α1j α2j ) + x′ij βm − zij′ Dzij /2. The fully marginalized joint distribution can be obtained from integrating out the two random effects. Less effort is needed here because the expressions for the marginal distribution are similar to those found in Molenberghs et al. (2010), except for replacing κij with κij = g −1 (∆ij + zij′ bi ). 4. Estimation Even though the CM allows for closed-form full-distribution expressions, they are too intractable to use for parameter estimation. However, it is possible to make use of commercially available software for parameter estimation. This proceeds as follows. First, analytical integration over the conjugate random effects is straightforward, as we will show below. Second, a generic optimizer that allows for the presence of normal random effects can be used. For example, the SAS procedure NLMIXED allows for a generally defined likelihood for non-linear models fitted to repeated measures with normal random effects, precisely what is needed. Evidently, many other statistical software packages have similar features. The approach is termed, by Molenberghs et al. (2010), partial marginalization. We now give the expressions needed for the above approach, in both the Poisson and binary cases. For a combined Poisson–gamma-normal model, we assume Yij ∼ Poi θij κij ,
κij = exp x′ij β + zij′ bi , with bi normally distributed and θij following a gamma distribution. The distribution of Yi conditional on bi and marginal over θij is then given by f (yij |bi ) =
α1j + yij − 1 α1j − 1
α2j 1 + κij α2j
yij
1 1 + κij α2j
α2j
y
κij ij .
Similarly, for the combined Bernoulli (with logit link)-beta-normal model: Yij ∼ Bernoulli πij = θij κij ,
exp x′ij β + zij′ bi
κij =
1 + exp x′ij β + zij′ bi
,
with the distribution of θij now beta and partially marginalized density f (yij |bi ) =
1
α1j + α2j
y 1−yij κij α1j ij (1 − κij )α1j + α2j .
Molenberghs et al. (2010) referred to various other estimation methods, including but not limited to pseudo-likelihood estimation techniques (Aerts et al., 2002; Molenberghs and Verbeke, 2005), generalized estimating equations in the spirit of Zeger et al. (1988), and Bayesian estimation. Evidently, these are not readily available in statistical packages in view of the CM. Hence, partial marginalization is an attractive mode. Furthermore, MMM model parameters can also be conveniently estimated by way of maximum likelihood and can be easily implemented in the SAS procedure NLMIXED (Griswold and Zeger, 2004). For the COMMM, the expressions of the partially marginalized conditional densities of the combined model take center stage here once again, for reasons given above. Our model will be fitted by specifying the marginal and conditional means, together with the function linking them. The conditional distribution obtained by integrating out the overdispersed random effect will also be specified while leaving out the normal random effect, which is left for numerical integration, by NLMIXED. It is important to note that the connector function (9) in the binary case exhibits a non-linear relationship between the variance components D and the regression parameter βm , consisting of the product of two non-linear functions. For the count case, on the other hand, connector (10) is linear. The implications of this are clear from the precision estimates. In the binary case, the precision for the variance component d, changes dramatically when switching from the conditional to the marginal version, as can be seen from Table 2, whereas for the count data, reported in Table 1, both are the same.
1950
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951
5. Concluding remarks We have shown that the combined model introduced by Molenberghs et al. (2007, 2010) is not confined to extending the standard GLMM, but can be engaged in other model formulations as well. Beneficially, the model of Heagerty (1999), which combines the strength of marginal modeling with hierarchical, random-effects formulations into a single marginalized multilevel model, has been extended to accommodate overdispersion through the introduction of a new set of random effects. The resulting model enjoys three desirable properties: (a) flexible modeling of overdispersion; (b) flexible modeling of association and subject-specific effects; (c) direct marginal interpretation of the regression parameters. This leads to more trustworthy inferences, be it in terms of model parameters, or in terms of functions thereof. This was clear in both our data analyses, where the use of the more elaborate model leads to a change in the inferences on the treatment effect. Apart from parameters and inferences thereof, other functions, such as marginal correlation functions, could be studied, as was done by Vangeneugden et al. (2011), who showed in a data analysis that the correlations from a CM were considerably different from those obtained from a conventional GLMM. In terms of implementation, the proposed method can conveniently make use of available resources, such as, for example, the SAS procedure NLMIXED. Little additional coding effort is required. Programs used for analysis are available from the authors, upon simple request. One might question identifiability of all model parameters. As indicated in Molenberghs et al. (2007, 2010), this is definitely not an issue for count data, given that univariate count data allow for an identifiable overdispersion parameter. The same would be true for time-to-event data, but these are not considered here. For binary data, one or other form of data hierarchy is necessary because univariate binary data cannot exhibit overdispersion. This is different, though, for binomial data. As soon as binary outcomes are measured repeatedly, these can be viewed as being of a binomial type. Both the betabinomial model as well as the logistic-normal model allow for overdispersion and/or correlation within a cluster, but both are of a specific parametric form, leaving room for model improvement. This is why the parameters in the posited model are all identifiable. The data analyses reported here corroborate this. Supplementary Materials The web-based Supplementary Materials present the SAS programs for both the epileptic patients data analysis as well as for the onychomycosis trial. Acknowledgment The authors gratefully acknowledge the financial support from the IAP research Network P6/03 of the Belgian Government (Belgian Science Policy). Appendix. Supplementary data Supplementary material related to this article can be found online at doi:10.1016/j.csda.2011.11.021. References Aerts, M., Geys, H., Molenberghs, G., Ryan, L., 2002. Topics in Modelling of Clustered Data. Chapman & Hall, London. Agresti, A., 2002. Categorical Data Analysis, second ed. John Wiley & Sons, New York. Breslow, N.E., Clayton, D.G., 1993. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25. Commenges, D., Jolly, D., Drylewicz, J., Putter, H., Thiébaut, R., 2011. Inference in HIV dynamics models via hierarchical likelihood. Computational Statistics and Data Analysis 55, 446–456. Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman & Hall/CRC, London. De Backer, M., De Keyser, P., De Vroey, C., Lesaffre, E., 1996. A 12-week treatment for dermatophyte toe onychomycosis: terbinafine 250 mg/day vs, itraconazole 200 mg/day—a double blind comparative trial. British Journal of Dermatology 134, 16–17. Engel, B., Keen, A., 1994. A simple approach for the analysis of generalized linear mixed models. Statistica Neerlandica 48, 1–22. Faught, E., Wilder, B.J., Ramsay, R.E., Reife, R.A, Kramer, L.D., Pledger, G.W., Karim, R.M., 1996. Topiramate placebo-controlled dose-ranging trial in refractory partial epilepsy using 200-, 400-, and 600-mg daily dosage. Neurology 46, 1684–1690. Fitzmaurice, G.M., Laird, N.M., 1993. A likelihood based method for analysing longitudinal binary responses. Biometrika 80, 141–151. Griswold, M.E., Zeger, S.L., 2004, On marginalized multilevel models and their computation, Johns Hopkins University, Department of Biostatistics Working Paper #99, November 2004. Heagerty, P.J., 1999. Marginally specified logistic-normal models for longitudinal binary data. Biometrics 55, 688–698. Heagerty, P.J., Zeger, S.L., 2000. Marginalized multilevel models and likelihood inference with discussion. Statistical Science 15, 1–26. Hinde, J., Demétrio, C.G.B., 1998a. Overdispersion: models and estimation. Computational Statistics and Data Analysis 27, 151–170. Hinde, J., Demétrio, C.G.B., 1998b. Overdispersion: models and estimation. São Paulo: XIII Sinape. Hoff, P.D., 2011. Hierarchical multilinear models for multiway data. Computational Statistics and Data Analysis 55, 530–543. Liang, K.-Y., Zeger, S.L., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22. McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models. Chapman & Hall/CRC, London. Molenberghs, G., Lesaffre, E., 1994. Marginal modelling of correlated ordinal data using a multivariate Plackett distribution. Journal of the American Statistical Association 89, 633–644. Molenberghs, G., Verbeke, G., 2005. Models for Discrete Longitudinal Data. Springer, New York. Molenberghs, G., Verbeke, G., Demétrio, C., 2007. An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime Data Analysis 13, 513–531. Molenberghs, G., Verbeke, G., Demétrio, C., Vieira, A., 2010. A family of generalized linear models for repeated measures with normal and conjugate random effects. Statistical Science 25, 325–347.
S. Iddi, G. Molenberghs / Computational Statistics and Data Analysis 56 (2012) 1944–1951
1951
Nelder, J.A., Wedderburn, R.W.M., 1972. Generalized linear models. Journal of the Royal Statistical Society, Series B 135, 370–384. Roberts, D.T., 1992. Prevalence of dermatophyte onychomycosis in the United Kingdom: results of an omnibus survey. British Journal of Dermatology 126 (Suppl. 39), 23–27. Vangeneugden, T., Molenberghs, G., Verbeke, G., Demétrio, C., 2011. Marginal correlation from an extended random-effects model for repeated and overdispersed counts. Journal of Applied Statistics 38, 215–232. Wang, Z., Louis, T.A., 2003. Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika 90, 765–775. Wolfinger, R., O’Connell, M., 1993. Generalized linear mixed models: a pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48, 233–243. Zeger, S.L., Liang, K.-Y., Albert, P.S., 1988. Models for longitudinal data: a generalized estimating equation approach. Biometrics 44, 1049–1060.