Multivariate probit analysis of binary familial data using stochastic representations

Multivariate probit analysis of binary familial data using stochastic representations

Computational Statistics and Data Analysis 56 (2012) 656–663 Contents lists available at SciVerse ScienceDirect Computational Statistics and Data An...

795KB Sizes 17 Downloads 178 Views

Computational Statistics and Data Analysis 56 (2012) 656–663

Contents lists available at SciVerse ScienceDirect

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

Multivariate probit analysis of binary familial data using stochastic representations Yihao Deng a , Roy T. Sabo b,∗ , N. Rao Chaganty c a

Department of Mathematical Sciences, Indiana University Purdue University-Fort Wayne, Fort Wayne, IN 46805-1499, USA

b

Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298-0032, USA

c

Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA 23529-0077, USA

article

info

Article history: Received 28 October 2010 Received in revised form 5 September 2011 Accepted 10 September 2011 Available online 18 September 2011 Keywords: Multivariate probit Stochastic representation Familial binary data Fisher information

abstract The probit function is an alternative transformation to the logistic function in the analysis of binary data. However, use of the probit function is prohibitively complicated for cases of multivariate or repeated-measure binary responses, as integrations involving the multivariate normal distribution can be difficult to compute. In this paper, we propose an alternative form to stochastically represent random variables in the case of familial binary data that simplifies calculation of the multivariate normal integrals involved in the probit link. We provide examples of these stochastic representations for one- and twoparent families, and compare the performance of this methodology with that of moment estimators by calculating asymptotic relative efficiencies and through a real-life data example. Particular attention is paid to analyzing the properties of regression parameter estimates from these two methods with respect to the feasible ranges of the correlation parameters. Published by Elsevier B.V.

1. Introduction Familial data consisting of measurements on parents and children appear naturally in many fields, including biomedicine, psychology and social sciences. Since members within the same family can share similarities due to genetic inheritance or cohabitation, measurements on these subjects are usually dependent. Such data (or responses) can be either continuous (e.g. height and blood pressure) or discrete (e.g. presence or absence of a certain trait or disease). Generalized linear models are easily applied to repeated continuous responses; yet several difficulties arise when responses are discrete. In the latter case, correlation parameters are restricted by the Frechét bounds, which are dependent not only on the structured correlation matrix but also on the marginal means and covariates (Prentice, 1988; Chaganty and Joe, 2006). Though these bounds apply to all types of discrete data, they are most stringent for binary observations, which are ubiquitously used to represent qualitative measurements in all fields of science. Chaganty and Deng (2007) found ranges of familial dependence using several well-known association statistics in the case of binary response variables, and found that for even the simplest familial case of one parent and two children, the feasible correlation ranges can be prohibitively narrow. In most cases, there is no guarantee that correlation parameter estimates will lie within the restricted bounds, so specific methodologies are required to ensure feasibility. Ashford and Sowden (1970) proposed the multivariate probit model (MVP) for the analysis of binary variables, which assumes that a latent standard normal variable corresponds to each binary outcome. The binary outcome takes the value 1



Corresponding author. Tel.: +1 804 828 3047; fax: +1 804 828 8900. E-mail addresses: [email protected] (Y. Deng), [email protected] (R.T. Sabo), [email protected] (N.R. Chaganty).

0167-9473/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.csda.2011.09.014

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

657

when the latent variable falls below a threshold value and takes the value 0 otherwise. This modeling transforms the study of binary variables into the study of continuous variables, where well-developed statistical techniques can be used. Further, the correlation between the latent variables need only satisfy the positive-definite requirement and are not subject to the Frechèt bounds. Although MVP simplifies the analysis of binary variables (especially within a familial framework, as will be subsequently shown), maximum likelihood estimation of parameters is challenging because it requires computationally intensive evaluation of the multivariate normal integral at each iteration. Methods that improve or accelerate this process would enhance parameter estimation and thus be attractive to potential users. In this paper, we propose using i.i.d. standard normal random variables to stochastically represent the behaviors of the normal latent variables to simplify the multiple integral and reduce the computational burden required for parameter estimation in the MVP method. Various familial dependence structures are used as motivation, since they require several parameters and lead to interesting stochastic representations. The paper is organized as follows: in Section 2 we give the stochastic representation of the latent variables in MVP and discuss their properties for several family types, while the maximum likelihood estimation process is presented in Section 3. Results from comparing the efficiencies of maximum likelihood and moment estimators are discussed in Section 4, and an example is shown in Section 5 using longitudinal familial data. A brief discussion is provided in Section 6. 2. Stochastic representations Suppose for the ith family (of size ti ) we observed a vector of binary responses yi = (yi1 , yi2 , . . . , yiti )′ , where xij = (xij1 , xij2 , . . . , xijq )′ is a vector of covariates for the jth member. The multivariate probit model (MVP) assumes there is a latent random vector Zi = (Zi1 , Zi2 , . . . , Ziti )′ distributed as multivariate normal with mean 0 and correlation matrix Ri = (αjk ) such that the binary observation yij is an indicator of the event {Zij ≤ x′ij β}, where β is the regression parameter of interest. The probability mass function of yi is then given by P (yi ) =







1

···

ti

1

Ci1

(2π ) 2 |Ri | 2

 (−∞, x′ij β) (x′ij β, ∞)

if yij = 1 if yij = 0.

Cit

exp

−z′i Ri −1 zi 2

 dzi

(1)

where

Cij =

The ease of evaluating multiple integral (1) depends on correlation structure Ri . For example, when it is reasonable to assume that correlations among the members of the ith family are positive and equal (αjk = α > 0, for all j ̸= k), the latent variable √ √ Zij has the stochastic representation Zij = α Vi + 1 − α Uij where Vi , Uij , j = 1, 2, . . . , ti are independent standard normal random variables. This stochastic representation reduces the multidimensional integral (1) to a single integral P (yi ) =





1

√ −∞



 exp −

vi2 2

∏ ti

yij

pij (1 − pij )1−yij dvi

(2)

j =1

and thus abates the computational burden in calculating the probability mass function. Here pij = Φ





x′ij β− α vi



1−α

 and

Φ (·) is the  standard normal cumulative distribution function. Kotz et al. (2000) suggested using the representation Zij =

λij Vi +

1 − λ2ij Uij , where Vi , Uij are i.i.d. standard normal variables, if the correlation between Zij and Zik (j ̸= k) can be

written as the product of λij and λik . Unlike the former representation where α is restricted to be within the range (0, 1), this latter stochastic representation is less restrictive since λij can take values in the interval (−1, 1). 2.1. Single-parent family A reasonable correlation model in the case where the ith family consists of one parent and ti children is to assume constant correlation (ρ) between the parent and children and constant correlation (α) between children. In this case the off-diagonal elements in the first row and column of Ri are equal to ρ , while all other off-diagonal elements are equal to α . A stochastic representation that leads to this familial correlation structure Ri is given by

 ρ ρ2 Zim = √ Vi + 1 − Uim α √ √ α Zij = α Vi + 1 − α Uij j = 1, 2, . . . , ti ,

(3)

where the Vi , Uim and Uij are independent standard normal variables. It is easy to check that Var (Zim ) = √ Var (Zij ) =√ 1, Corr (Zim , Zij ) = ρ and Corr (Zij , Zij′ ) = α . Note that this representation requires for all ti : 0 < α < 1 and − α < ρ < α , which is a proper subset of the region where the familial correlation structure Ri is positive definite, which itself depends

658

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

Fig. 1. Ranges of (ρ, α ) for one-parent familial structure when ti = 3. Outer region: positive definite boundary. Inner region: correlation bounds necessitated by stochastic representation.

on ti . Fig. 1 shows the positive definite (outer) region and the bounds necessitated by the stochastic representation (inner region) when there are ti = 3 children. The outer region close to the inner region as ti increases. If we √becomes increasingly √ conveniently re-parameterize the correlations as λ = α and η = ρ/ α , then (3) can be rewritten as



Zim = ηVi +

1 − η2 Uim



Zij = λVi +

1 − λ2 Uij

j = 1, 2, . . . , ti

which is more flexible because both parameters λ and η range from −1 to 1. For this familial correlation structure the multiple integral (1) reduces to ∞



P (yi ) =



−∞

with pim = Φ



1

√ 

x′im β−ηvi





1−η2

exp −

vi2



2

and pij = Φ

y

pimim (1 − pim )1−yim

ti ∏

yij

pij (1 − pij )1−yij dvi

j =1



x′ij β−λvi





.

1−λ2

2.2. Nuclear family If we assume that observations are drawn from nuclear families (consisting of a father, mother and ti children), then there are the correlations between the: parents (γ ), father and children (ρ1 ), mother and children (ρ2 ), and children (α). The representations

 ρ1 ρ2 ρ1 ρ2 − ρ12 γ− Wi + 1 − γ + Uif α α   ρ2 ρ1 ρ2 ρ1 ρ2 − ρ22 Wi + 1 − γ + Uim Zim = √ Vi + γ − α α α √ √ Zij = α Vi + 1 − α Uij j = 1, 2, . . . , ti ρ1 Zif = √ Vi + α



can simplify the computation of the multiple integral (1) at the expense of some additional constraints beyond positive definiteness of the correlation structure. Fig. 2 shows the region necessitated by this representation as a subset of the positive definiteness region. By re-parameterizing the correlations ρ1 = λ ξ , ρ2 = λ η, α = λ2 , γ = ξ η + νζ , we can rewrite Zif = ξ Vi + ν Wi +



Zim = η Vi + ζ Wi +

1 − ξ 2 − ν 2 Uif



1 − η2 − ζ 2 Uim



Zij = λ Vi +

1 − λ2 Uij

j = 1, 2, . . . , ti .

This representation reduces (1) to the following double integral P (yi ) =



×





1

 exp −

−∞

t ∏ j =1

yij

wi2 2

∫



 exp −

−∞

pij (1 − pij )1−yij dvi dwi

vi2 2



yif

y

pif (1 − pif )1−yif pimim (1 − pim )1−yim

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

659

Fig. 2. Ranges of (ρ1 , ρ2 , α ) for nuclear familial structure when ti = 2, γ = 0.1 (left) and γ = 0.4 (right). Outer region: positive definite boundary. Inner region: correlation bounds necessitated by the stochastic representation. The horizontal axis denotes ρ1 , the vertical axis denotes α , and the axis going from front to back denotes ρ2 .

with pif = Φ



x′if β−ξ vi −ζ wi





1−ξ 2 −ζ 2

, pim = Φ



x′im β−ηvi −ζ wi



1−η2 −ζ 2

 and pij is defined as before. Alternatively we could use polar

coordinates for the correlation pairs (ξ , ν) ≡ (r1 cos(ϕ), r1 sin(ϕ)) and (η, ζ ) ≡ (r2 cos(ψ), r2 sin(ψ)). This will result in the following stochastic representations Zif = r1 cos(ϕ)Vi + r1 sin(ϕ)Wi +



Zim = r2 cos(ψ)Vi + r2 sin(ψ)Wi + Zij = λVi +



1 − λ2 Uij

1 − r12 Uif



1 − r22 Uim

j = 1, 2, . . . , ti ,

which are convenient since the constraints on the parameters are linear and the joint parameter space is a rectangle. We have used this parametrization in the software package ‘‘SRProbit’’ developed by the authors and described in the next Section. 3. Likelihood estimation Assuming we have data on y1 , . . . , yn for n independent families, parameter estimation is done by maximizing the log-likelihood

ℓ(θ ) =

n −

ln P (yi |θ),

(4)

i =1

where θ consists of the regression and correlation parameters. More specifically θ = ( β, η, λ) for single-parent familial data and θ = ( β, r1 , ϕ, r2 , ψ, λ) in the nuclear family case. Several optimization routines (such as optim in R, NLPQN in SAS) could be used to maximize (4) and numerically obtain the maximum likelihood (ML) estimate  θ . We have written a program entitled ‘‘SRProbit’’ in R language (available on the corresponding author’s website) for computation of the ML estimates and their standard errors. The program uses an iterative procedure. An initial value of the regression parameter β is obtained after setting all correlation parameters to zero. Initial values of the correlation parameters within the feasible range need to be chosen more carefully, since poor initial values could cause convergence problems. The current values of the correlation parameters and the stochastic representations are used to simplify the calculation of the multiple integral (1) and the log-likelihood function. Asymptotic standard errors for ML estimates of the regression and correlation parameters are obtained by inverting the Fisher information matrix (Chaganty and Joe, 2004)

 n − − ∂ P (yi ) ∂ P (yi )

I(θ ) =

i =1

∂θ

yi

∂θ ′

P (yi )

(5)

at θ =  θ , where the inner sum is taken over all possible binary vectors. Using the delta theorem, standard errors of the regression parameter and the original correlation parameters are given by ∆I(θ )−1 ∆′ , where ∆ is the matrix of partial derivatives of the parameter transformations. Since ρ = λ η and α = λ2 for the single parent family, the matrix ∆ = ∆1 where I 0 0

 ∆1 =

0

λ 0

0

η



 .

660

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

For the nuclear family model ∆ = ∆2 , where



I 0  ∆2 = 0 0 0

0

r2 cos(ϕ − ψ) λ cos ϕ 0 0

0 −r1 r2 sin(ϕ − ψ) −λr1 sin ϕ 0 0

0

r1 cos(ϕ − ψ) 0 λ cos ψ 0

0

r1 r2 sin(ϕ − ψ) 0 −λr2 sin ψ 0



0  0  r1 cos ϕ  . r2 cos ψ  2λ

We have also developed a similar program in SAS/IML using optimization routine NLPQN to cross-check the parameter estimates and standard errors from the R program. Both the SAS/IML program and ‘‘SRProbit’’ in R gave essentially identical results for test data sets. 4. Comparison with the alternative estimation method An alternative to the ML method is the unbiased estimating equation approach popularized by Liang and Zeger (1986) as the generalized estimating equation (GEE) method. The method estimates the regression parameter β solving the estimating equation n − ∂µ′ i

i=1

∂β

Wi−1 (yi − pi ( β)) = 0,

(6) 1/2

1/2

where pi ( β) is the mean of yi , and Wi = Ai ( β) Ro Ai ( β) is a working covariance matrix of yi that depends on the working familial correlation structure Ro , which is a function of the correlations (γ o , ρ1o , ρ2o , α o ), depending on the context. Unlike MVP, which models the latent correlations, the GEE models the actual correlation of the binary responses and hence these parameters (γ o , ρ1o , ρ2o , α o ) are subject to Fréchet bounds which are more restrictive than the positive definiteness of the dependence structure. Here Ai ( β), a function of pi ( β), is the diagonal matrix of variances of the binary observations. If  βg is the solution to Eq. (6) then its asymptotic covariance matrix is given by

′

V = D −1 Σ D −1 ,



∂µ′i −1 ∂µi ], i=1 [ ∂β Wi ∂β

(7)

∑n  ∂µ′i

 −1 ∂µi

where D = Σ = i=1 ∂β Wi−1 Cov(yi )Wi ∂β . In this section we compare the asymptotic relative efficiencies (ARE) of the GEE regression parameter estimate with respect to the ML estimate obtained from the MVP method using stochastic representations. Since calculation of ARE only requires the evaluation of (I(θ ))−1 and V at known values of β , the covariates and the correlation parameters, we will not need to simulate response vectors Y , nor will we need to estimate β . We assume the model consists of two covariates, one dichotomous type x1 (such as gender) simulated as Bernoulli(0.5) and another continuous type x2 (such as serum concentrations of some biomarker) simulated as uniform(3,6), along with an intercept and we constrain family sizes to take reasonable values between 3 and 7, ensuring that all familial correlations are present. We assume β = ( β0 , β1 , β2 ) takes fixed values (7.386, −0.327, −1.346) when n = 20 and (6.738, −0.192, −1.452) when n = 75. These particular values, along with the simulated values of x1 and x2 , ensure that we obtain a wide range of marginal probabilities (from 0.15 to 0.99 in the n = 20 case, and from 0.02 to 0.99 in the n = 75 case). True values for the latent correlation parameters are fixed and, together with the xi and β , are used to compute the Fisher information matrix (5), which is then inverted to obtain the covariance matrix of the ML parameter estimates. For the GEE estimates, varying correlation values are used to compute the working covariance matrix Wi , the latent correlation values are used to compute Cov(Yi ), and both are used in conjunction with the covariates xi and β to compute asymptotic covariance matrix Vi . The correlation values used to compute Wi are varied throughout the positive definite range to reflect the values of moment estimates, and at times will take values that violate the Frechét bounds. This is done to calibrate the efficiencies of the GEE estimators with respect to the ML estimator, which are calculated by taking the ratios of the diagonal elements of matrices (I(θ))−1 and V (defined in Eqs. (5) and (7), respectively) that correspond to the regression parameters.

∑n

4.1. Single-parent family In this case the latent correlations are fixed at ρ = 0.74 between the parent and children and α = 0.64 between children. Fig. 3 shows the contour plot of the efficiencies for the regression parameter estimates with n = 20 and n = 75 families. In both cases and for all three β parameters the ML estimators are everywhere more efficient than the corresponding GEE estimators. The efficiencies are relatively close (ARE > 0.85) for the estimator ( β1 ) of the categorical covariate (x1 ) over a wide range of working correlation estimates when n = 20, but those favorable comparisons are reduced when n = 75. For neither the intercept ( β0 ) or the estimator ( β2 ) of the continuous covariate (x2 ) are the GEE estimators as efficient as the ML estimator (ARE < 0.85). Note that the efficiencies for β1 are in general larger than those for β2 ; this is because the categorical covariate (x1 ) corresponding to β1 contains less information than the continuous covariate (x2 ) corresponding to β2 , and as such has less influence on the overall model.

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

661

Fig. 3. Efficiency of GEE versus MVP regression parameter estimates.

4.2. Nuclear family An efficiency study was also conducted for the two-parent family case. Family sizes are here constrained between 4 and 7, the number of families is fixed at n = 20, and the latent correlations are set at (γ = 0.2, ρ1 = 0.3, ρ2 = 0.3, α = 0.4). Table 1 shows the asymptotic relative efficiencies of the GEE and ML β estimators for various values of GEE working correlations (γ o , ρ1o , ρ2o , α o ). Here each working correlation is varied while holding all others constant to more precisely observe each working correlation’s effect on the ARE (ρ1o is not varied as it yields efficiencies that are practically identical to

662

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

Table 1 Efficiency of GEE versus MVP regression parameters estimators under nuclear family structure. Latent correlation values (γ = 0.2, ρ1 = ρ2 = 0.3, α = 0.4) are used to calculate I in MVP and Cov(yi ) in GEE, while (γ ◦ , ρ1◦ , ρ2◦ , α ◦ ) are used for GEE working correlation in Vi . Efficiency

γ◦

ρ1◦

ρ2◦

α◦

β0

β1

β2

0.1 0.1 0.1 0.1 0.1

0.3 0.3 0.3 0.3 0.3

0.3 0.3 0.3 0.3 0.3

0.1 0.2 0.3 0.4 0.5

0.6712 0.7733 0.8007 0.7924 0.7618

0.7975 0.9049 0.9430 0.9445 0.9191

0.6776 0.7714 0.7975 0.7905 0.7623

0.2 0.2 0.2 0.2 0.2

0.3 0.3 0.3 0.3 0.3

0.1 0.2 0.3 0.4 0.5

0.4 0.4 0.4 0.4 0.4

0.7844 0.7911 0.7945 0.7880 0.7492

0.9288 0.9418 0.9489 0.9403 0.8877

0.7820 0.7885 0.7923 0.7867 0.7504

0.3 0.4 0.5

0.3 0.3 0.3

0.3 0.3 0.3

0.4 0.4 0.4

0.7922 0.7839 0.7656

0.9454 0.9315 0.9017

0.7901 0.7824 0.7651

Table 2 Parameter estimates for MVP and GEE estimating procedures, with SEs and p-values. Parameter

MVP

SE

p-value

GEE

SE

p-value

Intercept Gender Age Pre-ATP

7.38 −0.58 −0.02 −1.36 0.74 0.64

2.175 0.357 0.011 0.423 – –

<0.001

7.45 −0.63 −0.02 −1.36 0.51 0.44

1.826 0.328 0.010 0.346 – –

<0.001 0.055 0.076 <0.001 – –

ρ α

0.106 0.166 0.001 – –

those observed when ρ2o is varied). As in the one-parent family case, here we see that the ML estimates are more efficient than the moment estimates. Note that the efficiencies are again generally higher for the parameter corresponding to the categorical covariate ( β1 ) than for the parameter corresponding to the continuous covariate ( β2 ). 5. Example The MVP model using the stochastic representation is demonstrated on a real-life example. Dern and Wiorkowski (1969) conducted a study on the relationships between offsprings and their parents (22 families) of Erythrocyte Adenosine Triphosphate (ATP) levels in human blood (both pre- and post-storage). The families were markedly unbalanced, as there was one family with one child, six families with two children, six families with three children, three families with four children, five families with five children, and one family with six children. Besides the ATP level measurements, age, gender and pedigree (father, mother, son or daughter) are also available. To obtain binary responses, we dichotomize the poststorage ATP level using the following scheme: code 1 if the measure is less than the median; 0 otherwise. For simplicity, we ignore fathers’ information to obtain a one-parent familial structure, and families with missing values are removed (leaving us with 19 families). Table 2 gives the MLEs of the parameters using the MVP model as estimated using the SRProbit software package, along with moment (GEE) estimates of those parameters for comparison. Note that the regression estimates are similar for both procedures, and the negative coefficient for Pre-ATP (MPV : −1.36; GEE : −1.36) implies that increased levels of prestorage ATP level lead to increased probabilities of high post-storage ATP levels. Both Gender and Age are not significantly related to the response in both models. Furthermore, the large associations between parent and child post-storage ATP levels (0.74), as well as the large association between child post-storage ATP levels (0.64), indicate that ATP levels are highly associated within families, especially between generations. Though the standard errors of the moment estimates are slightly smaller than those for the MLEs, the Frechét bounds necessarily imposed on the GEE estimate of the correlation parameter α is given by (−0.057, 0.276). The GEE estimate is clearly out of the bounds, which may invalidate the inferences we can make based on those estimates. 6. Discussion While our focus was on familial data, the multivariate probit model is useful for analyzing relationships between all types of longitudinal binary variables and corresponding covariates. Using the stochastic representations reduces the computation burden normally encountered in the evaluation of the likelihood function under certain familial structures. This version of maximum likelihood estimation is more efficient than moment estimation, though it can require more time to algorithmically converge. Importantly, correlation estimates from the MVP method are restricted only by the positive

Y. Deng et al. / Computational Statistics and Data Analysis 56 (2012) 656–663

663

definite range of the correlation structure and the form of the stochastic representations, and are not subject to the Frechét Bounds. Avoiding those restrictive bounds will help ensure valid inference on the parameters (see Example 2 in Sabo and Chaganty (2010) for another comparison of the MVP and GEE methodologies). Acknowledgement The first author’s research was supported by a Purdue Research Foundation summer research grant, 2007. References Ashford, J.R., Sowden, R.R., 1970. Multi-variate probit analysis. Biometrics 26, 535–546. Chaganty, N.R., Deng, Y., 2007. Ranges of measures of association for familial binary variables. Communications in Statistics—Theory and Methods 36, 587–598. Chaganty, N.R., Joe, H., 2004. Efficiency of generalized estimating equations for binary responses. Journal of Royal Statisical Society B 66, 851–860. Chaganty, N.R., Joe, H., 2006. Range of correlation matrices for dependent bernoulli random variables. Biometrika 93, 197–206. Dern, R.J., Wiorkowski, J.J., 1969. Studies on the preservation of human blood. IV. The hereditary component of pre- and poststorage erythrocyte adenosine triphosphate levels. Journal of Laboratory & Clinical Medicine 73, 1019–1029. Kotz, S., Balakrishnan, N., Johnson, N.L., 2000. Continuous Multivariate Distributions, Volume 1: Models and Applications, 2nd ed.. John Wiley & Sons, Inc.. Liang, K.-L., Zeger, S., 1986. Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22. Prentice, R., 1988. Correlated binary regression with covariates specific to each binary observation. Biometrics 44, 1033–1048. Sabo, R.T., Chaganty, N.R., 2010. What can go wrong when ignoring correlation bounds in the use of generalized estimating equations. Statistics in Medicine 29, 2501–2507.