Correlated binomial regression models

Correlated binomial regression models

Computational Statistics and Data Analysis 56 (2012) 2513–2525 Contents lists available at SciVerse ScienceDirect Computational Statistics and Data ...

547KB Sizes 0 Downloads 114 Views

Computational Statistics and Data Analysis 56 (2012) 2513–2525

Contents lists available at SciVerse ScienceDirect

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

Correlated binomial regression models Rubiane M. Pires ∗ , Carlos A.R. Diniz Departamento de Estatística, Universidade Federal de São Carlos, C.P. 676, 13565-905, São Carlos-SP, Brazil

article

info

Article history: Received 30 June 2011 Received in revised form 2 February 2012 Accepted 2 February 2012 Available online 11 February 2012 Keywords: Binomial regression models Generalized binomial distribution Data augmentation scheme Bayesian inference Bayesian residuals Local influence

abstract In this paper, a class of correlated binomial regression models is proposed. The model is based on the generalized binomial distribution proposed by Luceño (1995) and Luceño and Ceballos (1995). The regression structure is modeled by using four different link functions and the dependence between the Bernoulli trials is modeled by using three different correlation functions. A data augmentation scheme is used in order to overcome the complexity of the mixture likelihood. A Bayesian method for inference is developed for the proposed model which relies on both the data augmentation scheme and the MCMC algorithms to obtain the posterior estimate for the parameters. Two types of Bayesian residuals and a local influence measure from a Bayesian perspective are proposed to check the underlying model assumptions, as well as to identify the presence of outliers and/or influential observations. Simulation studies are presented in order to illustrate the performance of the developed methodology. A real data set is analyzed by using the proposed models. © 2012 Elsevier B.V. All rights reserved.

1. Introduction It is common in a real practical situation to obtain data sets concerning the frequencies of events, Y1 , Y2 , . . . , Ym , that occurred in m independent clusters, respectively, with frequency Yi based on a sum of dependent Bernoulli random variables ni Wij , j = 1, 2, . . . , ni , Yi = j=1 Wij , where Wij is an indicator variable of the event status of the jth individual and ni is the number of individuals in the ith cluster. In some situations, for each cluster, the values of a set of k covariates, xi1 , xi2 , . . . , xik and, for each individual inside the cluster, the values of a set of q covariates, ri1j , ri2j , . . . , riqj , are available. For instance, consider a financial data set containing a portfolio of m holding companies (clusters), controlling partial or complete interest in, at least, two companies. For each company inside the holding company the condition of default is observed, that is, it is observed if the company has or has not met its legal obligations according to the debt contract. The number of companies in default inside the ith holding, Yi , is observed. Holding company covariates such as the number of partners in the corporation and the size of the corporation and individual covariates such as the revenue of the company and the size of the company could be available. The main interest of the analysis is to fit a regression model that can be used to determine the probability of default of a new holding company. Considering this, the interest is to model the behavior of the response variable, Yi , as a function of k covariates, xi1 , xi2 , . . . , xik . There are some models in the literature which can be used to find the probability of default. For instance, if it is assumed that the Bernoulli random variables within the cluster have a common correlation and such a correlation parameter is common to all clusters, the modified logistic-linear model (Williams, 1982) may be applied. A multivariate probit regression for modeling equicorrelated binary observations is described in Ochi and Prentice (1984). Regression for overdispersed



Corresponding author. Tel.: +55 16 3351 8243; fax: +55 16 3351 8241. E-mail addresses: [email protected] (R.M. Pires), [email protected] (C.A.R. Diniz).

0167-9473/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2012.02.004

2514

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

binomial data is presented in Prentice (1986) which extended the beta-binomial distribution to allow negative correlations among binary variates within a cluster and proposed regression models for both the binary variate response rate and for the pairwise correlation between binary variates. Prentice (1988) proposed regression methods for the analysis of correlated binary data when each binary observation may have its own covariates. Lindsey and Altham (1998) discussed and compared three different generalizations of the binomial distribution, beta-binomial (Skellam, 1948), double binomial (Efron, 1986; Lindsey, 1995) and multiplicative binomial (Altham, 1978), which deals with over- or underdispersion. For each distribution, both the probability and the dispersion parameters are allowed to vary simultaneously with respect to a set of covariates according to two separate regression equations. In this paper, an alternative class of correlated binomial regression models is proposed based on the correlated binomial distribution. Kupper and Haseman (1978) developed the correlated binomial model. This distribution was derived by correcting the conventional binomial model via a method suggested by Bahadur (1961) to allow for dependency among the Bernoulli variables. Luceño (1995) and Luceño and Ceballos (1995) proposed a generalized binomial distribution which is discussed in detail in Diniz et al. (2010). The generalized binomial distribution proposed by Luceño (1995) and Luceño and Ceballos (1995) is used to develop the models discussed in this paper. The regression structure is modeled by using four different link functions, the logit, the complementary log–log, the log–log and the probit links. The dependency between individuals of the same cluster is modeled by using correlation functions (Jennrich and Schluchter, 1986; Zimmerman and Harville, 1991; Cressie, 1993; Russell, 1996; Sherman, 2011) taking into account the covariates of the individuals within the clusters. Correlated binomial regression models allow only the presence of positive correlation between individuals. Luceño and Ceballos (1995) and Kolev and Paiva (2008) present a biological motivation and a theoretical justification, respectively, for this restriction. The complexity of the mixture likelihood due to the presence of products of sums is overcome using a data augmentation scheme (Tanner and Wong, 1987; Diebolt and Robert, 1994). A Bayesian approach is considered in the estimation process. The sensitivity of the model due to the choice of the priors is evaluated. The posterior predictive distribution is defined and an algorithm is constructed to obtain the predicted value by using a single MCMC sample from the posterior distribution of the model parameters (Chen et al., 2000). Residuals based on the predicted values obtained by the posterior predictive distribution (Gelman et al., 2003) and residuals based on the posterior distribution of the model parameters (Albert and Chib, 1995) are defined to check the assumptions in the model. A sensitivity study to detect outliers or influential cases that can change the results is performed. Graphical techniques using the standardized residuals are considered to identify the presence of outliers and a Bayesian case deletion influence diagnostic (Cook and Weisberg, 1982) based on the Kullback–Leibler divergence (Cho et al., 2009) is considered to evaluate the sensitivity of the observations in the estimation of the parameters. The deviance information criterion (DIC) (Spiegelhalter et al., 2002) is used as a model selection criterion. The article is organized as follows. In Section 2, a class of correlated binomial regression models, in which the regression structure is modeled using one of the four link functions, is described. The likelihood function is defined and a more tractable alternative is presented using a data augmentation method. Section 3 deals with Bayesian inference of the model and the posterior predictive distribution is presented. In Section 4, a diagnostics analysis is provided and a Bayesian predictive model selection criterion is explored. In Section 5, simulation studies are presented in order to illustrate the performance of the developed methodology including the frequentist properties of the Bayes estimators. A Bayesian analysis for the betabinomial regression models is presented in Section 6. A real data set is explored in Section 7. Finally, we present a brief discussion and concluding remarks in Section 8. 2. A class of correlated binomial regression models Before presenting an alternative correlated binomial regression model, it is useful to recall the definition of the generalized binomial distribution. The generalized binomial distribution derived by Luceño (1995), CB(n, p, ρ), is obtained by supposing that variable Y , the number of successes in n trials of Bernoulli, is the sum of equicorrelated binary responses with a probability of success constant p and a common correlation coefficient equal to ρ . Formally, let Y = W1 + W2 + · · · + Wn , where Ws , s = 1, 2, . . . , n, is a binary variable with E (Ws ) = p, Var(Ws ) = p(1 − p) and Corr(Ws , Wt ) = ρ , for all s and t, s ̸= t; ρ > 0. A probability distribution of Y can be obtained by the mixture of the distributions of two variables. One of them follows a binomial distribution, B(n, p), with mixing probability (1 − ρ), and the other one follows a modified Bernoulli distribution, MBern(p), taking values 0 or n (Fu and Sproule, 1995), rather than the conventional values 0 or 1, with mixing probability ρ . The probability distribution of Y , given n, p and ρ is then given by P (Y = y|n, p, ρ) =

  n y

y

py (1 − p)n−y (1 − ρ)IA1 ( y) + p n (1 − p)

n−y n

ρ IA2 ( y),

(1)

where A1 = {0, 1, . . . , n}, A2 = {0, n}, y = 0, 1, . . . , n, n ∈ N − {0}, 0 < p < 1 and 0 ≤ ρ ≤ 1. The mean and variance of this model are E (Y ) = np and Var(Y ) = p(1 − p){n + ρ n(n − 1)}, which accommodates an extra-binomial variation for ρ ̸= 0. Note that the CB(n, p, ρ) model is equivalent to the binomial model for ρ = 0. Correlated binomial regression models are constructed assuming a regression structure in the correlated binomial distribution parameters (1) by using four different link functions, the logit, the complementary log–log, the log–log and the probit links. Using latent variable, the complete data likelihood function for this model is presented in the next section.

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

2515

Table 1 Link functions used to model pi . Link function

g −1 (ηi )

Logit Comp. log–log Log–log Probit

exp {ηi } / [1 + exp {ηi }] 1 − exp {− exp {ηi }} exp {− exp {−ηi }}

Φ (ηi )

Φ (·) is the cumulative distribution function for the normal distribution.

2.1. Correlated binomial regression models Suppose that Y1 , Y2 , . . . , Ym are independent random variables such that Yi has the correlated binomial distribution, CB(ni , pi , ρi ). The response variable Yi takes values 0, 1, . . . , ni , corresponding to the number of individuals with the event of interest within the ith cluster, that is, Yi = Wi1 + · · · + Wini , where Wij = 0, 1, j = 1, . . . , ni , is a binary variable with E (Wij ) = pi , Var(Wis ) = Var(Wit ) = pi (1 − pi ) = σi2 and Corr(Wis , Wit ) = ρi , for all s and t, s ̸= t. The mean and variance of Yi are ni pi and pi (1 − pi ){ni + ρi ni (ni − 1)}, respectively. It is convenient, as in the construction of likelihood functions for binary data presented in McCullagh and Nelder (1989) to consider the likelihood as a function of the m-vectors p = (p1 , p2 , . . . , pm )⊤ and ρ = (ρ1 , ρ2 , . . . , ρm )⊤ . Subsequently, the likelihood is rewritten as a function of the coefficients β0 , β1 , . . . , βk , associated with the covariates, and of the coefficient γ , associated with the correlation structure. Assuming y1 , y2 , . . . , ym a set of observed values of Y1 , Y2 , . . . , Ym and using (1), the likelihood may be written as

L(p, ρ; m, n, y ) =

 m   ni i =1

yi

y pi i

(1 − pi )

n i −y i

yi ni

(1 − ρi ) + pi (1 − pi )

(ni −yi ) ni

 ρi IA2i ( yi ) ,

(2)

where A2i = {0, ni }, yi = 0, . . . , ni , ni ∈ N − {0}, y = ( y1 , y2 , . . . , ym )⊤ , n = (n1 , n2 , . . . , nm )⊤ , 0 < pi < 1 and 0 ≤ ρi ≤ 1, with i = 1, . . . , m. Since our main interest is to define a correlated binomial regression model, it is useful to model the success probability, pi , and the correlation parameter, ρi . The parameters pi are modeled using the link functions g −1 (ηi ) specified in Table 1;  ηi = kr=0 βr xir ; the coefficients β0 , β1 , . . . , βk are unknown regression parameters to be estimated; xi0 = 1, for all i; and xi1 , xi2 , . . . , xik represent the values of the k covariates for the ith cluster. The correlation structure can be written, in general, as

ρi = h(v(ri ), γ )

(3)

where h(v(ri ), γ ), an appropriate nonlinear, monotonic and differentiable function, is the correlation between any two individuals within the ith cluster; v(ri ) represents a function of the individual covariates values, assuming positive values; ri = (ri11 , . . . , ri1ni , ri21 , . . . , ri2ni , riq1 , . . . , riqni )⊤ , with rilj representing the value of the lth covariate for jth individual inside the ith cluster, i = 1, . . . , m, l = 1, . . . , q and j = 1, . . . , ni ; γ is the parameter which determines the rate of decay of correlation as a function of v(ri ) (Sherman, 2011). Using spatial ideas of correlation structures, the possible choices of the function v(ri ) can be made considering, for instance, continuous functions of some distance between position vectors or between other available vectors which allow us to characterize the relationship among the individuals within the cluster (Sherman,  2011). Therefore, candidates for v(ri ), using only the covariates ri1 and ri2 , could be the Euclidean distance metric,

   (rils − rilt )2 , the Manhattan distance, defined as l=1,2 s s


l=1,2

  s

s
yi

i=1

+g

−1

yi ni

(ηi ) (1 − g

−1

(ηi ))

ni −yi ni

 h(v(ri ), γ )IA2i ( yi ) .

(4)

The complexity of the mixture likelihood (4) due to the presence of the products of sums is a natural barrier to determine the conditional posterior distributions for the Metropolis–Hasting within Gibbs algorithm (Robert and Casella, 2004). However, a data augmentation scheme (Tanner and Wong, 1987; Diebolt and Robert, 1994) overcomes this problem by simplifying the conditional posterior distributions required for the MCMC methods.

2516

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525 Table 2 Some alternatives for h(v(ri ), γ ). h(v(ri ), γ )

Structure (a)

exp{−γ v(ri )} exp{−[γ v(ri )]2 }

Exponential Gaussian(a) ContinuousAR(b)

γ v(ri )

In cases (a) γ > 0 and in (b) γ ∈ (0, 1).

2.2. The complete data likelihood function A latent variable Zi , i = 1, . . . , m, is introduced into the model indicating to which component of the CB(ni , pi , ρi ) model the observation yi , i = 1, . . . , m, belongs to, that is

 Zi =

1, 0,

if the observation yi belongs to the MBern(pi ) if the observation yi belongs to the B(ni , pi )

and, conditionally on the observations yi , the probability of success of the variable Zi is given by P (Yi = yi |Zi = 1)P (Zi = 1)

τi = P (Zi = 1|Yi = yi , ni , pi , ρi ) = yi ni

ρi pi (1 − pi )

=

yi ni

ρi pi (1 − pi )

ni −yi ni

P (Yi = yi ) ni −yi ni

IA 2 ( y i )

,

i

IA2 ( yi ) + (1 − ρi ) i

  ni yi

(5)

y

pi i (1 − pi )ni −yi

where A2i = {0, ni } and yi = 0, . . . , ni ; i = 1, . . . , m. Thus, z

P (Zi = zi |Yi = yi , ni , pi , ρi ) = τi i (1 − τi )1−zi



yi n

ρi pi i (1 − pi )

=

ni −yi ni

yi ni

zi  1−zi   y (1 − ρi ) nyii pi i (1 − pi )ni −yi

IA2 ( yi ) i

ρi pi (1 − pi )

ni −yi ni

IA2 ( yi ) + (1 − ρi )

 

i

ni yi

,

y

pi i (1 − pi )ni −yi

i = 1 , . . . , m. The complete data likelihood function with the presence of the latent variable z = (z1 , z2 , . . . , zm )⊤ , is defined as

 L(p, ρ; n, y , z ) =

m 

{P ( yi |ni , pi , ρi )}

i=1

=

 {P (zi |yi , ni , pi , ρi )}

i=1

  (1−zi ) m  ni i=1

 m 

yi

yi ni (zi +ni −ni zi )

pi

  z (ni −yi ) ni +1−zi zi

( 1 − pi )

 (1−zi )

ρi (1 − ρi )

i

IA 2 ( y i ) i

zi

.

(6)

Using parameterization φ = log(γ ), for exponential or Gaussian correlation structure, or φ = log(γ /(1−γ )), for continuous AR correlation structure, the parameter γ can be estimated without restriction. It follows that the likelihood function for θ = (β0 , . . . , βk , φ)⊤ , given the complete data D ∗ = (m, n, y , z , x, r )⊤ can be expressed as

L(θ; D ) ∝ ∗

  (1−zi ) m  ni yi

i=1

× h (v(ri ), φ) ∗

zi

(ηi )

yi ni (zi +ni −ni zi )

(1 − g

−1

(ηi ))

g

−1



(1−zi ) 1 − h (v(ri ), φ) IA2 ( yi )zi i ∗

  z (ni −yi ) ni +1−zi i

 (7)

where h∗ (v(ri ), φ) is similar to function h(v(ri ), γ ), presented in Table 2, considering the parameterization on γ = exp{φ}/(1 + exp{φ}) or γ = exp{φ}. The likelihood function (7), based on the latent data, becomes more tractable than the usual likelihood function, facilitating the Bayesian analysis.

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

2517

3. Bayesian inference In this section, a Bayesian method for inference is developed for the correlated binomial regression model. The method relies on both the data augmentation scheme and the MCMC algorithms to obtain the posterior estimate for the parameters. The Bayesian approach assumes ni , i = 1, . . . , m, known and prior independence among the parameters β0 , β1 , . . . , βk , φ . A priori distribution for parameter vector θ can be given by a (k + 2)-dimensional multivariate normal distribution with mean vector zero and covariance matrix Σ = diag{τ1 , . . . , τk+2 }, with known hyper-parameters τr , r = 1, . . . , k + 2. Using the parameterizations on γ , the joint posterior distribution for θ is obtained by combining the likelihood function (7) and the prior distribution for θ . Thus, the joint posterior distribution is given as

π (θ|D ∗ ) ∝ L(θ; D ∗ ) π (θ).

(8)

This distribution is not tractable analytically. The Bayesian inference can be conducted by considering MCMC methods such as Metropolis–Hasting within Gibbs algorithm (Robert and Casella, 2004). The posterior full conditional distributions for parameters βr and φ required to implement the Metropolis–Hasting algorithm are given by

π (βr |β(−r ) , τr , D ∗ ) ∝

m  



g −1 (ηi )

yi ni (zi +ni −ni zi )



(1 − g −1 (ηi ))

    z (ni −yi ) ni +1−zi i

i=1

 exp −

βr2 2τr



,

and

π (φ|τk+2 , D ∗ ) ∝

m  

1−zi 

h∗ (v(ri ), φ)zi 1 − h∗ (v(ri ), φ)



 exp −

i =1

 φ2 . 2τk+2

Since none of the above full conditionals have a closed form, the Metropolis–Hasting within Gibbs algorithm can be used to sample from these conditionals. (0) To generate samples from θ at the first iteration in the chain, consider the initial values for θ , θ (0) = (β0 , . . . ,

βk(0) , φ (0) )⊤ , generate z (0) = (z1(0) , . . . , zm(0) )⊤ , where zi(0) ∼ Bern(τi(0) ), τi as given by (5), obtaining the complete data D ∗ (0) = (D , z (0) )⊤ . Generate a candidate sample θ ∗ from Nk+2 (θ (0) , ξ Ik+2 ), where ξ is a value which should be chosen such that the acceptance rate is reasonable, and random numbers ui , i = 1, . . . , k + 2, from the uniform distribution on the interval (0, 1). For each candidate βr∗ , r = 0, . . . , k, and for each candidate φ ∗ , both in θ ∗ , calculate the ratios RMH1 =

0) ∗ (0) π (βr∗ |β((− ) r ) , τr , D 0) ∗ (0) ) π (βr(0) |β((− r ) , τr , D

,

and RMH2 =

π (φ ∗ |τk+2 , D ∗ (0) ) , π (φ (0) |τk+2 , D ∗ (0) )

(1)

(1)

(0)

if ur ≤ RMH1 , then accept the candidate point βr = βr∗ , else βr = βr and if uk+2 ≤ RMH2 , then accept the candidate point φ (1) = φ ∗ , else φ (1) = φ (0) . With θ (1) instead of θ (0) , repeat the process to update θ . The software is available under request by contacting [email protected]. 3.1. Posterior predictive distribution The distribution of a future observation, y˜ , conditional on the observed data D = (m, n, y , x, r )⊤ , is given by its posterior predictive distribution (Gelman et al., 2003). The posterior predictive distribution for the correlated binomial regression model is defined as follows

π (˜yi |D ) =

 Θ

with π (˜yi |D , θ) =

  π (˜yi |D , θ)π θ|D ∗ dθ,   ni y˜ i

(9) y˜ i n



pi i (1 − pi )ni −˜yi (1 − ρi ) + pi i (1 − pi )

(ni −˜yi ) ni

ρi IA2i ( yi ), where ρi = h∗ (v(ri ), φ) and pi =

k g −1 ( r =0 βr xir ). Considering the Bayes estimators for the cluster parameters, we can predict the value of y˜ i using (9). For this, a Monte Carlo estimation of the density π (˜yi |D , θ) can be obtained using a MCMC sample of the posterior distribution π (θ|D ∗ ) (Chen et al., 2000). However, in order to carry out this method, it is necessary first to fix a value to y˜ in {0, . . . , ni }. The following algorithm provides the predicted value for y˜ numerically. (a) Generate a sample set θ 1 , . . . , θ Q of size Q from π (θ|D ∗ ). For each θ q = (β0q , . . . , βkq , φq )⊤ , q = 1, . . . , Q , determine the values of pˆi q and ρˆi q , where pˆi q = g

−1

 k  r =0

 βˆr q xir

and ρˆi q = h∗ (v(ri ), φˆ q ).

2518

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

(b) For each y˜ i in {0, . . . , ni }, obtain the Monte Carlo estimate for πˆ (˜yi |D ), given by

πˆ (˜yi |D ) =

Q 1 

 

Q q=1

y˜ i

ni

y˜ i n



pˆi qi (1 − pˆi q )ni −˜yi (1 − ρˆi q ) + pˆi q i (1 − pˆi q )

(ni −˜yi ) ni

ρˆi q IA2i (˜yi ).

(c) The value of y˜ i in {0, . . . , ni } which maximizes πˆ (˜yi |D ) is the predict value for future observation. 4. Diagnostics Among the underlying assumptions made in the construction of the correlated binomial regression model we can emphasize: (i) independence among the response variables, (ii) the response variables follow a correlated binomial distribution, BC (ni , pi , ρi ), (iii) and the assumption of positive correlation between the variables in the Bernoulli cluster, ρi > 0. In this section, two different types of Bayesian residuals and a local influence measure from a Bayesian perspective are proposed to check the underlying model assumptions (i) and (ii) and to identify the presence of outliers and/or influential observations. To check the assumptions that the response variables follow a correlated binomial distribution BC (ni , pi , ρi ) and that ρi > 0, the significance of the correlation structure parameter γ is observed using inter-quantile and HPD (Chen and Shao, 1999) credibility intervals obtained in the inferential process. If γ = 0 or γ = 1, depending on the correlation structure used, the usual binomial regression model can be considered in the analysis. Furthermore, in this section we present a convenient criterion of a model selection. 4.1. Residuals based on the posterior predictive distribution pp

The residuals for the ith observation, ri , based on the predicted values obtained by the posterior predictive distribution pp (Gelman et al., 2003) are calculated as ri = yi − y˜ i , where yi is the ith observed response and y˜ i is the predicted value conditional to the vector of values of the covariates, (xi0 , . . . , xik , ri11 , . . . , ri1ni , ri21 , . . . , ri2ni , riq1 , . . . , riqni )⊤ , where xir is the value of the rth covariate inside the ith cluster, i = 1, . . . , m and r = 0, . . . , k, and rilj is the value of the lth covariate spp for jth individual inside the ith cluster, i = 1, . . . , m, l = 1, . . . , q and j = 1, . . . , ni . The standardized residuals, ri , based on the posterior predictive distribution for the correlated binomial regression model are defined as pp

spp

ri

= 

ri

pˆi (1 − pˆi ){ni + ρˆi ni (ni − 1)}

,

i = 1, . . . , m,

(10)

where pˆ i and ρˆ i are the Bayesian estimates of pi and ρi , respectively. It is expected that the set of residuals is close to zero. If this does not occur, that is, if there is a set of points far from zero it is an indication that the model is misfitted to the data. 4.2. Residuals based on the posterior distribution of the model parameters spd

The standardized residual, Ri , based on the posterior distribution of the model parameters is defined by the equation yi − E (Yi |D , θ) yi − ni pi = √ = √ , (11) pi (1 − pi )(ni + ρi ni (ni − 1)) Var(Yi |D , θ) where pi and ρi are unobserved and ni and yi are sample information for the ith observation. Since the relation in (11) is a function of the parameter vector θ , it carries all the uncertainty of the parameters reflecting the posterior distribution of spd residuals Ri . Therefore, the precision of the knowledge about θ , as expressed in its posterior distribution, will be reflected in spd

Ri

spd

the precision of the sizes of Ri s (Albert and Chib, 1995). The posterior distribution of the residuals can be summarized using samples from a MCMC sampler. That is, a sample θ 1 , . . . , θ Q of size Q from π (θ|D ∗ ) is generated and, for the pair ( yi , ni ), i = 1, . . . , m, a sample of size Q of the residuals in (11) is generated. It is expected that the residual mean is centered at zero. If this does not occur, or if the residual sample presents a high variability, the observation can be considered an outlier. 4.3. Bayesian influence diagnostics In this section, a Bayesian case deletion influence diagnostics (Cook and Weisberg, 1982) based on the Kullback–Leibler (K –L) divergence (Cho et al., 2009), which is calculated using the conditional predictive ordinate defined for the correlated binomial regression model, is considered to evaluate the sensitivity of the observations when estimating the parameters. The conditional predictive ordinate (CPO) of the ith observation is defined as

 CPOi =

Θ

1 G( yi |θ)

  π θ|D ∗ dθ

 −1

,

i = 1, . . . , m,

where G( yi |θ) =

  ni yi

y

yi n

pi i (1 − pi )ni −yi (1 − ρi ) + pi i (1 − pi )

(ni −yi ) ni

ρi IA2i ( yi ).

(12)

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

2519

The K –L divergence between the posterior distribution with the full data D ∗ and the posterior distribution with the ith ∗ observation deleted D(− i) can be written as ∗ K π θ|D ∗ , π θ|D(− i)



 





= − log(CPOi ) + Eθ [log{G( yi |θ)}|D ] ,

(13)

where G( yi |θ) is given by (12). In estimating the K –L divergence, the Monte Carlo method with MCMC samples of the posterior distribution, π (θ|D ∗ ), can be used (Chen et al., 2000). Thus, if θ 1 , . . . , θ Q is a sample of size Q from π (θ|D ∗ ), then the Monte Carlo estimation for (13) is given by

      ∗ = − log Kˆ π θ|D ∗ , π θ|D(− i)

Q 1 



1

Q q=1 G( yi |θ q )

+

Q 1 

Q q =1

log[G( yi |θ q )].

(14)

Large values of K –L divergence show evidence of the presence of influential points in the data set. In order to ascertain whether the ith observation is an influential point or not, a K –L divergence calibration is determined, as shown in Cho et al. (2009) and Peng and Dey (1995), given by

 p∗i = 0.5 1 +



   ∗ . 1 − exp −2Kˆ π (θ|D ∗ ) , π θ|D(− i) 



If the p∗i value is larger than 0.5, the ith case is regarded as influential. 4.4. Bayesian model selection In a practical situation, it is possible to have a set of potential models for the same data set. It can occur if each model can be constructed using a subset of covariates or the number of parameters is not clearly defined or non-hierarchical models are available. The DIC Bayesian model selection criterion (Spiegelhalter et al., 2002) for a correlated binomial regression model is determined as

 = 2D¯ − Dˆ , DIC where Q  ¯ = 1 ˆ =D D D(θ q ) and D



Q q=1

with D(θ q ) = −2

m

i =1

Q 1 

Q q=1

β0q , . . . ,

Q 1 

Q q =1

βkq ,

Q 1 

Q q =1

 φq ,

log G( yi |θ q ) , and θ q = (β0q , . . . , βkq , φq )⊤ a sample generated from the posterior distribution





π (θ|D ∗ ). Function G( yi |θ q ) is as defined in (12). To calculate Dˆ we generate Q MCMC samples of θ 1 , . . . , θ q and determine Q Q Q m 1 1 1 q=1 β0q , . . . , Q q=1 βkq , Q q=1 φq . These values are used directly in −2 i=1 log (G( yi |θ)). The best fitted model is Q the one which yields the smallest DIC. 5. A simulation study The simulated data set involves m = 500 clusters with the response variables Yi , i = 1, . . . , 500, following a BC (ni , pi , ρi ), with ni generated from a B(45, 0.5) distribution. The cluster parameter values are set at: β0 = 2, β1 = −2 and β2 = −2. The parameter of the correlation functions used in the simulation is γ = 0.2 and v(ri ) assumes values in a U (0, 1) distribution. Two covariates are considered, xi1 and xi2 , xi1 generated from a U (0, 2) distribution and xi2 from a N (0, 4) distribution. Global convergence was tested for each chain considering the Geweke criterion (Geweke, 1992). The generated data set is accepted if the Geweke values belong to the interval (−2, 2). 5.1. Sensitivity to the prior distribution To evaluate the sensitivity of the model for the choice of prior distributions, we conducted a study using two different priors, N (0, 10−4 ) and Constant, for each of the parameters βr , r = 0, 1, 2, and two different priors, LN (0, 10−4 ) and G(10−4 , 10−4 ), for γ and Constant for log(γ ). The logit link function and the exponential correlation structure are used in all models. The posterior summaries, point estimates (means and medians) and the 95% inter-quantile and HPD credibility intervals, for the parameters β0 , β1 , β2 for both priors are extremely similar. Moreover, the posterior summaries are almost the same, using the priors LN (0, 10−4 ) and Constant, for the parameter γ and log(γ ), respectively. Thus, only the results using the prior N (0, 10−4 ) for the parameters βr , r = 0, 1, 2, and the results using the priors LN (0, 10−4 ) and G(10−4 , 10−4 ) for parameter γ are shown in Table 3.

2520

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

Table 3 Summaries of the marginal posterior of the parameters β0 , β1 , β2 and γ . Parameter

Prior distributions

Posterior mean

Posterior median

95% inter-quantile credibility interval

95% HPD credibility interval

β0 β1 β2 γ γ

N (0, 10−4 ) N (0, 10−4 ) N (0, 10−4 ) LN (0, 10−4 ) G(10−4 , 10−4 )

1.606 −2.016 −1.879 0.194 0.212

1.574 −2.001 −1.872 0.188 0.207

(0.913; 2.398) (−2.749; −1.433) (−2.383; −1.462) (0.097; 0.329) (0.111; 0.344)

(0.934; 2.407) (−2.689; −1.375) (−2.332; −1.409) (0.084; 0.308) (0.109; 0.338)

Table 4 Coverage probabilities of the inter-quantile and HPD credibility intervals for different link functions and correlation structures. Parameter

Link function

Correlation structure Exponential

Gaussian

Continuous AR

Inter-quantile

HPD

Inter-quantile

HPD

Inter-quantile

HPD

β0

Logit Comp. log–log Log–log Probit

0.95 0.94 0.94 0.93

0.94 0.95 0.94 0.93

0.95 0.94 0.93 0.95

0.95 0.94 0.93 0.95

0.92 0.94 0.93 0.94

0.93 0.93 0.93 0.94

β1

Logit Comp. log–log Log–log Probit

0.94 0.94 0.95 0.94

0.95 0.94 0.94 0.94

0.95 0.94 0.93 0.96

0.95 0.94 0.93 0.95

0.94 0.93 0.94 0.94

0.93 0.93 0.94 0.93

β2

Logit Comp. log–log Log–log Probit

0.95 0.94 0.93 0.93

0.95 0.93 0.93 0.94

0.94 0.94 0.93 0.95

0.94 0.94 0.93 0.95

0.94 0.95 0.92 0.94

0.93 0.94 0.92 0.93

γ

Logit Comp. log–log Log–log Probit

0.94 0.94 0.96 0.95

0.93 0.94 0.96 0.94

0.93 0.93 0.93 0.92

0.93 0.91 0.92 0.92

0.94 0.95 0.94 0.94

0.94 0.95 0.94 0.95

95% nominal coverage probability.

5.2. Frequentist properties of the Bayes estimators The estimated coverage probabilities are based on 1000 simulations with m = 500 clusters and simulation scenarios as described at the beginning of the Section 5. The four link functions, logit, complementary log–log, log–log and probit links, and the three correlation structures, exponential, Gaussian and continuous AR, are used in the analysis. The prior N (0, 10−4 ) is considered for each of the parameters β0 , β1 , β2 and φ . The results presented in Table 4 indicate that the estimates of coverage probabilities are between 92% and 96%, considering a 95% nominal coverage probability, for the four parameters in all scenarios. For the exponential correlation structure the estimates of coverage probabilities of the inter-quantile and HPD credibility intervals for β0 , β1 and β2 , using the logit link function, are 94% or 95%, whereas for γ these estimates are 94% and 93% respectively. Similar results for β0 , β1 and β2 are found for the Gaussian correlation structure. However, for γ the estimates of coverage probabilities are between 91% and 93%. For the continuous AR structure the estimates of coverage probabilities for γ are between 94% and 95%. Table 5 presents the mean-square error and bias of the posterior medians considering the four link functions and the three correlation structures. In order to determine these values, 1000 MCMC samples of the posterior distribution for each parameter are generated. In each MCMC sample, the posterior median, the MSE and bias are calculated. The mean of these 1,000 MSE and bias, for each parameter, are calculated. The results show that the values are near zero. Similar results are found for the posterior mean. The continuous AR structure presents the best global results for mean-square error and bias, whereas the Gaussian structure presents the worst global results for mean-square error and bias. 6. Bayesian approach for the beta-binomial regression models In order to compare the results of the fitted model proposed in this work with those given by other methods, a Bayesian analysis is considered for the beta-binomial regression models (Prentice, 1986; Lindsey and Altham, 1998). Assuming p arises from a conjugate beta distribution, Beta(α1 , α2 ), α1 > 0 and α2 > 0, and the parameterization p = α1 /(α1 + α2 ) and ρ = 1/(α1 + α2 + 1), such that α1 = p/ζ and α2 = (1 − p)/ζ , where ζ = ρ/(1 − ρ). The beta-binomial distribution can be written as P (Y = y|n, p, ζ ) =

  y−1 n y

j =0

n−y−1

(p + ζ j)

 j =0

((1 − p) + ζ j)

 n −1  j =0

 −1 (1 + ζ j)

,

(15)

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

2521

Table 5 Mean-square error (MSE) and bias of the posterior median for different link functions and correlation structures. Parameter

Link function

Correlation structure Exponential MSE

Gaussian Bias

Continuous AR

MSE

Bias

MSE

Bias

−0.006

0.015 0.008 0.020 0.006

−0.002

β0

Logit Comp. log–log Log–log Probit

0.058 0.034 0.080 0.023

0.006 0.013 0.020 0.004

0.133 0.077 0.189 0.059

β1

Logit Comp. log–log Log–log Probit

0.056 0.035 0.067 0.023

−0.006 −0.011 −0.020 −0.006

0.122 0.079 0.160 0.067

−0.009 −0.031 −0.012

0.014 0.008 0.016 0.007

−0.003 −0.002 −0.002

β2

Logit Comp. log–log Log–log Probit

0.021 0.018 0.050 0.012

−0.014 −0.012 −0.021 −0.007

0.066 0.041 0.112 0.049

−0.012 −0.018 −0.035 −0.018

0.005 0.004 0.012 0.004

−0.001 −0.003 −0.003 −0.002

0.002 0.002 0.002 0.002

−0.001 −0.001

0.003 0.003 0.003 0.003

−0.008 −0.008 −0.009 −0.007

0.001 0.001 0.001 0.001

−0.001

γ

Logit Comp. log–log Log–log Probit

0.000 −0.000

0.011 0.029 0.022 0.008

0.003 0.000 0.001 0.003

0.000 0.001 0.001

where j=0 cj = 0, for any x < 0, y = 0, 1, . . . , n, n ∈ N − {0}, 0 < p < 1 and −1 ≤ ρ ≤ 1. The mean and variance of this model are E (Y ) = np and Var(Y ) = np(1 − p)(1 + (n − 1)ρ) (Prentice, 1986). Let y1 , y2 , . . . , ym be a set of observed values of Y1 , Y2 , . . . , Ym . The likelihood function is given by

x

LBB (p, ζ; m, n, y ) =

  yi −1 m   ni  i =1

 yi

ni −yi −1

(pi + ζi j)

j =0



((1 − pi ) + ζi j)

j =0

 n −1 i 

(1 + ζi j)

j =0

−1  

,

(16)



where pi = g −1 (ηi ) and ρi = h∗ (v(ri ), φ). Function g −1 (ηi ) is shown in Table 1 and h∗ (v(ri ), φ) is similar to that shown in Table 2. Considering a prior distribution, π (θ), for θ = (β0 , β1 , . . . , βk , φ)⊤ , θ ∼ Nk+2 (0, Σ ), Σ = diag{τ1 , . . . , τk+2 } with known hyper-parameters τr , r = 1, . . . , k + 2, and ni , i = 1, . . . , m, known, the joint posterior distribution for θ is given by

πBB (θ|D ) ∝ LBB (θ; D ) π (θ). (17) The posterior distribution πBB (θ|D ) is fairly tractable from a Bayesian perspective. The posterior full conditional distributions for parameters βr and φ are given by   m y   i −1 βr2  h∗ (v(ri ), φ)j g −1 (ηi ) + π (βr |β(−r ) , φ, τr , D ) ∝ exp − 2τr i=1 j=0 1 + h∗ (v(ri ), φ)  ni − yi −1   h∗ (v(ri ), φ)j −1 (1 − g (ηi )) + × , 1 + h∗ (v(ri ), φ) j =0 and

    m y i −1 φ2 h∗ (v(ri ), φ)j π (φ|β, τk+2 , D ) ∝ exp − g −1 (ηi ) + 2τk+2 i=1  j=0 1 + h∗ (v(ri ), φ)  ni − y i −1   h∗ (v(ri ), φ)j −1 × (1 − g (ηi )) + 1 + h∗ (v(ri ), φ) j =0  n −1    −1  i ∗  h (v(ri ), φ)j × 1+ ,  1 + h∗ (v(ri ), φ) j =0 

where h∗ (v(ri ), φ) is similar to the function h(v(ri ), γ ), presented in Table 2. The Bayesian inference can be conducted by considering MCMC methods such as Metropolis–Hasting within Gibbs algorithm. The deviance information criterion value for beta-binomial regression models can be obtained in a similar form to Section 4.4, replacing the function G( yi |θ) by GBB ( yi |θ) =

  y i −1 ni yi

j =0

n i −y i −1

(pi + ζi j)

 j =0

((1 − pi ) + ζi j)

 n −1 i  j =0

−1 (1 + ζi j)

,

(18)

2522

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525 Table 6 DIC values for correlated binomial regression fitted models with different link functions. DIC

Logit

Complementary log–log

Log–log

Probit

Value

0.679

0.689

0.699

0.689

Table 7 Summaries of the marginal posterior of the parameters γ , β0 , β1 , β2 and β3 , for the default data set. Parameter

γ β0 β1 β2 β3

Posterior mean

Posterior median

1.811

1.789

−3.158 −2.533

−3.157 −2.533

0.361 0.422

0.359 0.407

where ζi = ρi /(1 − ρi ),

x

j =0

95% inter-quantile credibility interval

95% HPD credibility interval

(1.370, 2.435) (−4.331, −2.079) (−3.439, −1.639) (0.182, 0.588) (0.044, 0.866)

(1.359, 2.425) (−4.133, −1.952) (−3.465, −1.723) (0.181, 0.583) (0.044, 0.866)

cj = 0, for any x < 0, yi = 0, 1, . . . , ni , ni ∈ N − {0}, 0 < pi < 1 and 0 ≤ ρi ≤ 1, with

pi = g −1 (ηi ) and ρi = h∗ (v(ri ), φ). In Section 7, the results of the correlated binomial regression model using the logit link function, are compared with these given by the Bayesian approach of the Prentice models. 7. Real data In this Section, a real financial data set from a credit bureau in Brazil containing a portfolio of m = 148 holding companies (clusters), controlling partial or complete interest in, at least, two companies, is analyzed by using the proposed model (data set is available at http://www.ufscar.br/~des/docente/carlos/Dados/Dados1.txt). For each company inside the holding company, the condition of default is observed, that is, it is observed if the company has or has not met its legal obligations, in the period from January 2009 to December 2009, according to the debt contract. The available data in the ith holding company, i = 1, 2, . . . , 148, with ni companies, consists of Wi1 , Wi2 , . . . , Wini , each one assuming value 0 or 1, depending  ni on the default status of the company (0 = No Default; 1 = Default). The response variable, Yi = j=1 Wij , assumes values at {0, 1, . . . , ni } according to the number of companies in default inside the ith holding. A total of 10 covariates were initially available in the data set, but only three covariates are considered in this analysis. Covariates with an excess of missing data, suspicious values and with the same value in all the cases were dropped from the data set. The three holding company covariates are the number of partners in the corporation, xi1 ; the size of the corporation, xi2 ; and the number of overdue promissory notes of the corporation with any delay in the last year, xi3 . The specific covariate, ri1 , considered in the analysis, is the revenue of the companies. The presence of correlation among any two companies inside the holding is possible due to different facts, such as the companies sharing the same Board of Directors, sharing the same economic priorities of the holding and the companies experiencing inherent weaknesses of the holding, which would limit their ability to overcome a financial crisis. Moreover, it is intuitive to assume that the greater the difference in revenues between any two companies within the corporation, the less the dependency between them, that is, companies with greater revenues do not rely on the companies with lower revenues, while companies with similar revenues would have a greater dependency between them in order to maintain their market positions, trading services, sales and information. For these reasons, the exponential correlation structure appears to be a good option. This correlation structure is given by h(v(ri ), γ ) = exp(−γ maxs,t |ri1s − ri1t |/ maxi maxs,t |ri1s − ri1t |), with s, t = 1, . . . , ni and i = 1, . . . , m, which is in the [0, 1] scale. Note that if all companies in a holding have the same revenues, either they default in block, or none of them defaults (perfect correlation), which could be a limit case. The main interest of the analysis is to fit a regression model that can be used to determine a probability of default of a new holding company. It is assumed that a holding company will be in default if, at least, one of its companies is in default. As it is well known, the most common approach for modeling default is by using a logistic regression model. The correlated binomial regression model is used as a new approach in this analysis with the aim of providing further inferences. The same vague prior distributions used in a simulation study were used here. The prior N (0, 10−4 ) is considered for each of the parameters βr , r = 0, . . . , 3, and log(γ ). Correlated binomial regression models are fitted to the data using the four link functions. The results obtained by the model selection DIC are shown in Table 6. As can be observed by these results, the model with the logit link function is identified as the best choice. The posterior summaries of the parameters for the fitted model with logit link function are shown in Table 7. Note that both the inter-quantile and HPD credibility intervals of the correlation structure parameter, γ , do not contain zero, corroborating with the fact that a correlated binomial regression model should be used in the analysis. Standard convergence diagnostics (Geweke, 1992) show that the MCMC chains have converged (Geweke tests present values in the (−1.2, 1.2) range). The assumption of independence and the presence of outliers can be observed by examining the residuals plotted in time order, if the order is available. The model specification and, again, the presence of outliers are observed by examining the residuals plotted against predicted values via the predictive distribution. These plots are presented in Fig. 1a and b for both

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

2523

Fig. 1. Regarding the correlated binomial regression model. (a) Residuals based on the predicted values versus predicted values; (b) residual based on the posterior distribution of the model parameters versus predicted values; (c) means and HPD credibility intervals, with 95% credibility, for residual posterior distributions versus expected values E (Yi |D , θ).

Fig. 2. Boxplots of the MCMC samples from the residual posterior distributions versus the expected values E (Yi |D , θ).

types of Bayesian residuals. In Fig. 1a, the residuals are based on the predicted values obtained by the posterior predictive distribution, as described in (10). In Fig. 1b, the residuals are based on the posterior distribution of the model parameters, that is, for each parameter value in the MCMC sample, the Eq. (11) is calculated. The means and HPD credibility intervals, with 95% credibility, for the residual posterior distributions versus expected values E (Yi |D , θ) are presented in Fig. 1c. Using the HPD interval as a reference, this plot indicates case 113 as a possible outlier. The boxplots of the MCMC samples from the residual posterior distributions, for each of the expected values E (Yi |D , θ), are presented in Fig. 2. The ranges of these boxplots are small due to the low variation of the data points in each sample. Cases that have high dispersions require further investigation. The assessment of the impact caused by case 113 in the estimation of parameters is shown in Table 8. The data set does not provide any calibration value greater than 0.61, indicating that, despite case 113 being indicated as an outlier, the KL divergence does not classify this point as influential in the estimation process. Table 8 presents the posterior mean and the relative change of this mean value with respect to the mean calculated omitting case 113. The calibration value for this case is equal to 0.56. To reinforce this need of using the proposed model in this data set, the usual binomial regression fitted model has a DIC value of 1.483, the beta-binomial regression fitted model has a DIC value of 1.340, while the correlated binomial regression fitted model has a DIC value of 0.679. The correlated binomial regression and beta-binomial regression models fitted using the logit link function and the exponential correlation structure. Besides the smallest DIC in its favor, the analysis described

2524

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

Table 8 Posterior mean and relative change of the mean value with respect to the mean calculated omitting some cases. Parameter Omitted

γ

Cases

Mean

%

Mean

β0 %

β1 Mean

%

Mean

β2 %

β3 Mean

%

{None} {113}

1.81 1.80

– −0.55

−3.16 −3.19

– 0.94

−2.53 −2.67

– −5.53

0.36 0.36

– –

0.42 0.46

– 9.52

Fig. 3. Regarding the beta-binomial regression model. (a) Residuals based on the predicted values versus predicted values; (b) residual based on the posterior distribution of the model parameters versus predicted values; (c) means and HPD credibility intervals, with 95% credibility, for residual posterior distributions versus expected values E (Yi |D , θ).

in this section indicates that the alternative model provides a very good fit for this data set. Fig. 3a–c are similar to Fig. 1a–c for the beta-binomial regression model. All these plots illustrated several large residuals indicating that the beta-binomial regression model is misfitted to the data. 8. Conclusions It is common to use binomial regression models in the data set whose response variables are frequency of events in independent clusters with the presence of covariates. However, an important assumption in the setting of the usual binomial regression model is the independence among the Bernoulli trials. For situations where this independence is not feasible there are models such as beta-binomial models which can be used. In this work an alternative class of correlated binomial regression models has been proposal which outperforms the aforementioned ones. This regression structure, besides to model the probability of success of an event of interest in a particular cluster, allows us to insert a correlation structure to model the dependence between the Bernoulli trials within the clusters. Different link functions can be used to provide the relationship between the linear predictor and the parameters of interest and different correlation structures can be suggested taking into account the individual covariates inside the cluster. The developed methodology can make the analysis more realistic to model the probability of success in data sets of event frequencies in the presence of dependence among the trials. Acknowledgments The research of Rubiane M. Pires is supported by the Brazilian organization CAPES. The authors would like to thank two anonymous referees for their helpful comments and suggestions for improving this paper.

R.M. Pires, C.A.R. Diniz / Computational Statistics and Data Analysis 56 (2012) 2513–2525

2525

References Albert, J.H., Chib, S., 1995. Bayesian residual analysis for binary response regression models. Biometrika 82 (4), 747–759. Altham, P.M.E., 1978. Two generalizations of the binomial distribution. Journal of the Royal Statistical Society, Series C 27 (2), 162–167. Bahadur, R.R., 1961. A representation of the joint distribution of responses to n dichotomous items. In: Studies Item Analysis and Prediction. Stanford University Press, Solomon. Chen, M.H., Shao, Q.M., 1999. Monte carlo estimation of bayesian credible and hpd intervals. Journal of Computational and Graphical Satatistics 8 (1), 69–92. Chen, M.H., Shao, Q.M., Ibrahim, J., 2000. Monte Carlo Methods in Bayesian Computation, 95. Springer, New York. Cho, H., Ibrahim, J.G., Sinha, D., Zhu, H., 2009. Bayesian case in influence diagnostics for survival models. Biometrics 65 (1), 116–124. Cook, R., Weisberg, S., 1982. Residuals and influence in regression. In: Monographs on Statistics and Applied Probability. Chapman and Hall, London. Cressie, N.A.C., 1993. Statistics for Spatial Data. Wiley-Interscience, New York. Diebolt, J., Robert, C.P., 1994. Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society, Series B 56 (2), 363–375. Diniz, C.A.R., Tutia, M.H., Leite, J.G., 2010. Bayesian analysis of a correlated binomial model. Brazilian Journal of Probability and Statistics 24 (1), 68–77. Efron, B., 1986. Double exponential families and their use in generalized linear regression. Journal of the American Statistical Association 81 (395), 709–721. Fu, J., Sproule, R., 1995. A generalization of the binomial distribution. Communications in Statistics—Theory and Methods 24 (10), 2645–2658. Gelman, A., Carlin, J., Stern, H., Rubin, D.B., 2003. Bayesian Data Analysis, second ed. In: Texts in Statistical Science, Chapman and Hall/CRC, London. Geweke, J., 1992. Evaluating the Accuracy of Sampling-Based Approaches to Calculating Posterior Moments. Oxford University Press, Oxford. Jennrich, R.I., Schluchter, M.D., 1986. Unbalanced repeated-measures models with structured covariance matrices. Biometrics 42 (4), 805–820. Kolev, N., Paiva, D., 2008. Random sums of exchangeable variables and actuarial applications. Insurance: Mathematics and Economics 42 (1), 147–153. Kupper, L.L., Haseman, J.K., 1978. The use of a correlated binomial model for the analysis of certain toxicological experiments. Biometrics 34 (1), 69–76. Lindsey, J.K., 1995. Modelling frequency and count data. In: Oxford Statistical Science. Oxford University. Lindsey, J.K., Altham, P.M.E., 1998. Analysis of the human sex ratio by using overdispersion models. Journal of the Royal Statistical Society, Series C 1 (47), 149–157. Luceño, A., 1995. A family of partially correlated poisson models for overdispersion. Computational Statistics and Data Analysis 20 (5), 511–520. Luceño, A., Ceballos, F., 1995. Describing extra-binomial variation with partially correlated models. Communications in Statistics—Theory and Methods 24 (6), 1637–1653. McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman and Hall, London. Ochi, Y., Prentice, R.L., 1984. Likelihood inference in a correlated probit regression model. Biometrika 71 (3), 531–543. Peng, F., Dey, D., 1995. Bayesian analysis of outlier problems using divergence measures. The Canadian Journal of Statistics 23 (2), 199–213. Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association 81 (394), 321–327. Prentice, R.L., 1988. Correlated binary regression with covariates specific to each binary observation. Biometrics 44 (4), 1033–1048. Robert, C., Casella, G., 2004. Monte Carlo statistical methods. In: Springer Texts in Statistics. springer. Russell, D.W., 1996. Heterogeneous variance: covariance structures for repeated measures. Journal of Agricultural, Biological, and Environmental Statistics 1 (2), 205–230. Sherman, M., 2011. Spatial Statistics and Spatio–Temporal Data: Covariance Functions and Directional Properties. In: Wiley Series in Probability and Statistics, John Wiley and Sons. Skellam, J.G., 1948. A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society, Series B 10 (2), 257–261. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A., 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64 (4), 583–639. Tanner, M.A., Wong, W.H., 1987. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82 (398), 528–540. Williams, D.A., 1982. Extra-binomial variation in logistic linear models. Journal of the Royal Statistical Society, Series C 31 (2), 144–148. Zimmerman, D.L., Harville, D.A., 1991. A random field approach to the analysis of field-plot experiments and other spatial experiments. Biometrics 47 (1), 223–239.