Computational Statistics and Data Analysis 71 (2014) 274–287
State space mixed models for binary responses with scale mixture of normal distributions links

Carlos A. Abanto-Valle^a,∗, Dipak K. Dey^b

a Department of Statistics, Federal University of Rio de Janeiro, Caixa Postal 68530, CEP: 21945-970, RJ, Brazil
b Department of Statistics, University of Connecticut, USA

Article history: Received 8 April 2012; received in revised form 13 January 2013; accepted 15 January 2013; available online 21 January 2013.

Keywords: Binary time series; Longitudinal data; Markov chain Monte Carlo; Particle learning; Probit; Scale mixture of normal links; Sequential Monte Carlo; State space models
Abstract: A state space mixed model for binary time series is proposed in which the inverse link function is the cumulative distribution function of a scale mixture of normal (SMN) distributions. Specific inverse links examined include the normal, Student-t, slash and variance gamma links. The latent threshold approach is used to represent the binary system as a linear state space model. Under a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced for parameter estimation. The proposed methods are illustrated with real data sets. Empirical results show that the slash inverse link fits better than the usual inverse probit link.

© 2013 Elsevier B.V. All rights reserved.
1. Introduction

In many areas of application of statistical modeling one encounters observations that take one of two possible forms. Such binary data are often measured together with covariates or explanatory variables that are continuous, discrete or categorical. Time series of binary responses may be adequately described by generalized linear models (McCullagh and Nelder, 1989). However, if serial correlation is present or if the observations are overdispersed, these models may not be adequate, and several approaches can be taken. Generalized linear state space models also address these problems and are treated by West et al. (1985) in a conjugate Bayesian setting. They have been the subject of further research by Fahrmeir (1992), Song (2000), Carlin and Polson (1992) and Czado and Song (2008), among others.

Consider a binary time series {Y_t, t = 1, ..., T} taking the values 0 or 1, with probability of success π_t, related to a time-varying covariate vector x_t = (x_{t1}, ..., x_{tk})' and a q-dimensional latent state variable θ_t. We consider a generalized linear state space model framework for binary responses in the following way:

$Y_t \sim Ber(\pi_t), \quad t = 1, \dots, T,$    (1)
$\pi_t = F(\mathbf{x}_t'\boldsymbol{\beta} + \mathbf{S}_t'\boldsymbol{\theta}_t),$    (2)
$\boldsymbol{\theta}_t = \mathbf{G}_t \boldsymbol{\theta}_{t-1} + \boldsymbol{\eta}_t, \quad \boldsymbol{\eta}_t \sim N_q(\mathbf{0}, \mathbf{W}_t).$    (3)

In the above setup the observed process {Y_t} is described by Eqs. (1)-(2), where π_t = P(Y_t = 1 | θ_t, x_t, S_t) is the conditional probability of success, S_t is a q-dimensional vector, β is a k-dimensional vector of regression coefficients and
∗ Corresponding author. E-mail addresses: [email protected], [email protected] (C.A. Abanto-Valle).
doi:10.1016/j.csda.2013.01.009
x_t = (x_{t1}, ..., x_{tk})' is a k × 1 vector of covariates. The system process is defined as a first order Markov process in Eq. (3), where G_t is the q × q transition matrix, W_t is the covariance matrix of the error term η_t, and Ber(·) and N_q(·,·) denote the Bernoulli and the q-dimensional normal distributions, respectively. In the terminology of generalized linear models (McCullagh and Nelder, 1989), F is the inverse link function. For ease of exposition, we refer to F as the link function in this article.

A critical issue in modeling binary response data is the choice of the link. In the context of binary regression problems, the probit link is widely used in the literature. Albert and Chib (1993), using the data augmentation principle, introduced the latent threshold approach to deal with the symmetric probit and Student-t links in an elegant way. Recently, Naranjo et al. (forthcoming) used the exponential power link. Other symmetric links using normal scale mixtures in a nonparametric setup are described in Basu and Mukhopadhyay (2000a,b). The binary state space model with probit link using the threshold approach (Albert and Chib, 1993) has been used by Carlin and Polson (1992) and Song (2000) without including covariates. Czado and Song (2008) introduced covariates for binary state space models with probit link and called the resulting class binary state space mixed models (BSSMM). They argued that including regression variables is appealing, as it enables us to quantify the relationship between the probability of success and the covariates.

In this paper, we extend the BSSMM with probit link (Czado and Song, 2008) by assuming the flexible class of scale mixture of normal (SMN) links (Lange and Sinsheimer, 1993; Chow and Chan, 2008), with univariate latent states following a first order autoregressive process. Interestingly, this rich class contains as proper elements the normal (BSSMM-N), Student-t (BSSMM-T), slash (BSSMM-S) and variance gamma (BSSMM-VG) links. All these distributions have heavier tails than the normal one, and thus can be used for robust inference in these types of models. We refer to this generalization as BSSMM-SMN. Inference in the BSSMM-SMN class is performed under a Bayesian paradigm via MCMC methods, which permits obtaining the posterior distribution of the parameters by simulation starting from reasonable prior assumptions. Using the latent threshold approach (Albert and Chib, 1993), we simulate the latent states efficiently with the simulation smoother of de Jong and Shephard (1995).

The remainder of this paper is organized as follows. Section 2 gives a brief review of the SMN distributions and links. Section 3 outlines the general class of BSSMM-SMN models as well as the Bayesian estimation procedure using MCMC methods; particle filtering methods are developed to calculate the predictive likelihood function. Section 4 presents a simulation study. Section 5 is devoted to the application and model comparison among particular members of the BSSMM-SMN models using two real data sets. Finally, some concluding remarks and suggestions for future developments are given in Section 6.

2. Scale mixture of normal distributions

A random variable Y belongs to the SMN family if it can be expressed as

$Y = \mu + \kappa(\lambda)^{1/2} X,$
(4)
where µ is a location parameter, X ∼ N(0, σ²), λ is a positive mixing random variable with cdf H(· | ν) and pdf h(· | ν), ν is a scalar or vector parameter indexing the distribution of λ, and κ(·) is a positive weight function. As in Lange and Sinsheimer (1993) and Chow and Chan (2008), we restrict our attention to the case in which κ(λ) = 1/λ. Given λ, we have Y | λ ∼ N(µ, λ⁻¹σ²) and the pdf of Y is given by

$f_{SMN}(y \mid \mu, \sigma^2, \nu) = \int_{-\infty}^{\infty} \phi(y \mid \mu, \lambda^{-1}\sigma^2)\, dH(\lambda \mid \nu),$    (5)

where φ(· | µ, σ²) denotes the density of the univariate N(µ, σ²) distribution. From Eq. (5), the cdf of the SMN distributions is given by

$F_{SMN}(y \mid \mu, \sigma^2, \nu) = \int_{-\infty}^{y} \int_{-\infty}^{\infty} \phi(u \mid \mu, \lambda^{-1}\sigma^2)\, dH(\lambda \mid \nu)\, du = \int_{-\infty}^{\infty} \Phi\!\left(\frac{\lambda^{1/2}[y - \mu]}{\sigma}\right) dH(\lambda \mid \nu),$    (6)
where Φ(·) is the cdf of the standard normal distribution. The notation Y ∼ SMN(µ, σ², ν, H) will be used when Y has pdf (5) and cdf (6); as before, H(· | ν) is the cdf of the mixing variable λ. As mentioned above, the SMN family constitutes a class of thick-tailed distributions including the normal, Student-t, slash and variance gamma distributions, which are obtained respectively by choosing the mixing variable as: λ = 1, λ ∼ G(ν/2, ν/2), λ ∼ Be(ν, 1) and λ ∼ IG(ν/2, ν/2), where G(·,·), Be(·,·) and IG(·,·) denote the gamma, beta and inverse gamma distributions respectively.

3. Binary responses state space mixed models with normal scale mixture links

In this section we introduce the BSSMM with SMN links using a latent variable representation in order to develop an efficient MCMC algorithm for parameter estimation.
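The mixture representation in Eq. (4) gives a direct recipe for simulating any member of the family: draw the mixing variable λ from its distribution, then draw a normal variate with variance λ⁻¹σ². A minimal sketch (ours, NumPy only; the mixing parameterizations follow Section 2 with κ(λ) = 1/λ):

```python
import numpy as np

def sample_smn(n, mu=0.0, sigma=1.0, dist="normal", nu=None, rng=None):
    """Draw n variates from an SMN distribution via its mixing variable.

    Mixing choices (Section 2): normal (lambda = 1), Student-t
    (lambda ~ Gamma(nu/2, rate nu/2)), slash (lambda ~ Beta(nu, 1)) and
    variance gamma (lambda ~ inverse gamma(nu/2, nu/2)).
    """
    rng = np.random.default_rng(rng)
    if dist == "normal":
        lam = np.ones(n)
    elif dist == "student-t":
        lam = rng.gamma(nu / 2.0, 2.0 / nu, size=n)        # rate nu/2 -> scale 2/nu
    elif dist == "slash":
        lam = rng.beta(nu, 1.0, size=n)
    elif dist == "variance-gamma":
        lam = 1.0 / rng.gamma(nu / 2.0, 2.0 / nu, size=n)  # inverse gamma draw
    else:
        raise ValueError(dist)
    # Y = mu + kappa(lambda)^{1/2} X with kappa(lambda) = 1/lambda, X ~ N(0, sigma^2)
    return mu + rng.normal(size=n) * sigma / np.sqrt(lam)
```

For instance, `sample_smn(10000, dist="student-t", nu=5.0)` yields Student-t₅ draws, whose tails are visibly heavier than `dist="normal"` draws.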
3.1. Model setup

Let Y_{1:T} = (Y_1, ..., Y_T)', where Y_t, t = 1, ..., T, denote T independent binary random variables. As before, x_t is a k × 1 vector of covariates. Following Albert and Chib (1993), we introduce T latent variables Z_1, ..., Z_T such that

$Y_t = \begin{cases} 1 & Z_t > 0, \\ 0 & Z_t \le 0. \end{cases}$    (7)

We assume that

$\pi_t = P(Y_t = 1 \mid \theta_t, \mathbf{x}_t, \boldsymbol{\beta}) = P(Z_t > 0 \mid \theta_t, \mathbf{x}_t, \boldsymbol{\beta}) = F_{SMN}(\mathbf{x}_t'\boldsymbol{\beta} + \theta_t) = \int_{-\infty}^{\infty} \Phi\!\left(\lambda_t^{1/2}[\mathbf{x}_t'\boldsymbol{\beta} + \theta_t]\right) dH(\lambda_t \mid \nu),$    (8)
which is the cdf in (6) with µ = 0 and σ² = 1. Using the latent threshold vector Z_{1:T} = (Z_1, ..., Z_T)', we have the linear state space model

$Z_t = \mathbf{x}_t'\boldsymbol{\beta} + \theta_t + \lambda_t^{-1/2}\epsilon_t,$    (9)
$\theta_t = \delta\theta_{t-1} + \tau\eta_t,$    (10)
$\lambda_t \sim p(\lambda_t \mid \nu),$    (11)

where the innovations ϵ_t and η_t are assumed to be mutually independent and normally distributed with mean zero and unit variance, and p(λ_t | ν) is the mixing density. We assume that |δ| < 1, i.e., the latent state process is stationary, and θ_0 ∼ N(0, τ²/(1 − δ²)). Eqs. (9) and (10), conditioned on δ, the vector β and the mixing variables λ_t, jointly represent a linear state space model. Clearly θ_t represents a time-specific effect on the observed process. Under a Bayesian paradigm, we use MCMC methods to conduct the posterior analysis in the next subsection. Conditionally on λ_t, some derivations are common
to all members of the BSSMM-SMN family (see Appendix for details).

3.2. Inference procedure using MCMC

For parameter estimation of the model defined by Eqs. (9)-(11), a Bayesian approach using Markov chain Monte Carlo (MCMC) simulation is adopted. Suppose that the model depends on a parameter vector Ψ = (β', δ, τ², ν)'. The likelihood function L(Ψ) is not easy to calculate. The Bayesian approach for estimating the parameters of the model uses the data augmentation principle, which treats Z_{1:T}, θ_{0:T} and λ_{1:T} as latent parameters. The joint posterior density of parameters and latent variables can be written as

$p(Z_{1:T}, \theta_{0:T}, \lambda_{1:T}, \Psi \mid Y_{1:T}) \propto p(Z_{1:T} \mid \theta_{0:T}, \lambda_{1:T}, \Psi, Y_{1:T})\, p(\theta_{0:T} \mid \Psi)\, p(\lambda_{1:T} \mid \Psi)\, p(\Psi),$    (12)

where

$p(Z_{1:T} \mid \theta_{0:T}, \lambda_{1:T}, \Psi, Y_{1:T}) = \prod_{t=1}^{T} \{1(Z_t \ge 0)\,1(Y_t = 1) + 1(Z_t < 0)\,1(Y_t = 0)\}\, \phi(Z_t \mid \mathbf{x}_t'\boldsymbol{\beta} + \theta_t, \lambda_t^{-1}),$    (13)

$p(\theta_{0:T} \mid \Psi) = \phi\!\left(\theta_0 \,\Big|\, 0, \frac{\tau^2}{1 - \delta^2}\right) \prod_{t=1}^{T} \phi(\theta_t \mid \delta\theta_{t-1}, \tau^2),$    (14)

$p(\lambda_{1:T} \mid \Psi) = \prod_{t=1}^{T} p(\lambda_t \mid \nu),$    (15)
where 1(X ∈ A) denotes an indicator function equal to 1 if the random variable X is contained in the set A and zero otherwise, and p(Ψ) indicates the prior distribution. We assume the prior distribution factorizes as p(Ψ) = p(β)p(δ)p(τ²)p(ν). For the common parameters of the BSSMM-SMN class, the prior distributions are set as: β ∼ N_k(β₀, Σ₀), δ ∼ N_{(−1,1)}(δ₀, σ_δ²) and τ² ∼ IG(n₀/2, T₀/2), where N_k(·,·), N_{(a,b)}(·,·) and IG(·,·) denote the k-variate normal, the normal truncated to the interval (a, b) and the inverse gamma distributions respectively. The prior p(ν) is specified for each member of the BSSMM-SMN class.

As the posterior distribution in (12) is analytically intractable, we draw random samples of Ψ, Z_{1:T}, λ_{1:T} and θ_{0:T} from their full conditional distributions using Gibbs sampling. The sampling scheme is described by the following algorithm:

Algorithm 1.
1. Set i = 0 and choose starting values for the parameters Ψ^{(0)} and the latent variables θ_{0:T}^{(0)}, λ_{1:T}^{(0)};
2. Draw Z_{1:T}^{(i+1)} ∼ p(Z_{1:T} | θ_{0:T}^{(i)}, λ_{1:T}^{(i)}, Ψ^{(i)}, Y_{1:T});
3. Draw θ_{0:T}^{(i+1)} ∼ p(θ_{0:T} | λ_{1:T}^{(i)}, Ψ^{(i)}, Z_{1:T}^{(i+1)}, Y_{1:T});
4. Draw λ_{1:T}^{(i+1)} ∼ p(λ_{1:T} | θ_{0:T}^{(i+1)}, Ψ^{(i)}, Z_{1:T}^{(i+1)}, Y_{1:T});
5. Draw Ψ^{(i+1)} ∼ p(Ψ | θ_{0:T}^{(i+1)}, λ_{1:T}^{(i+1)}, Z_{1:T}^{(i+1)}, Y_{1:T});
6. Set i = i + 1 and return to step 2 until convergence is achieved.
Cycling through steps 2-5 constitutes a complete sweep of this sampler. The MCMC sampler requires many thousands of sweeps to generate samples from the posterior distribution p(Z_{1:T}, θ_{0:T}, λ_{1:T}, Ψ | Y_{1:T}). Details on the full conditionals of Ψ, the mixing variables λ_{1:T} and the threshold variables Z_{1:T} are given in the Appendix. Conditional on λ_{1:T}, Z_{1:T} and Ψ, Eqs. (9)-(11) define a linear state space model, so we sample the latent states θ_{0:T} in step 3 of Algorithm 1 using the simulation smoother of de Jong and Shephard (1995).

3.3. Model comparison with the log predictive score

Scoring rules provide summary measures for the evaluation of probabilistic forecasts, assigning a numerical score based on the predictive distribution and on the event or value that materializes. The fit of the models studied here is assessed using log predictive scores (Good, 1952; Gneiting and Raftery, 2007; Delatola and Griffin, 2011). The average log predictive score for one-step-ahead predictions is given by

$LPS = -\frac{1}{T}\sum_{t=1}^{T} \log p(Y_t \mid Y_{1:t-1}, \hat{\Psi}),$    (16)
where Y_{1:t-1} = (Y_1, ..., Y_{t-1})', Ψ̂ is an estimate of the model parameters and p(Y_t | Y_{1:t-1}, Ψ̂) is the one-step-ahead predictive density. Smaller values of the LPS indicate a better fitting model. Because p(Y_t | Y_{1:t-1}, Ψ̂) does not have closed form, it can be evaluated numerically by using the particle learning (PL) method (Carvalho et al., 2010; Lopes et al., 2011), which is described next.

3.4. Particle learning algorithm

Sequential Monte Carlo (SMC) is an alternative to traditional MCMC methods that is designed for online inference in dynamic, possibly nonlinear models. It allows us to compute the full set of filtering distributions. Carvalho et al. (2010) recommended particle learning (PL) for filtering nonlinear state space models. Central to PL is the creation of an essential state vector V_t to be tracked sequentially. Specifically, at time t, we have a set of random draws, or particles, {V_t^{(j)}}_{j=1}^N, whose filtering distribution p(V_t | Y_{1:t}) contains the sufficient information about all of the uncertainties given the data up to time t. Given the next observation Y_{t+1}, the key task is to update the particles from t to t + 1 and provide draws from the next filtering distribution,

$p(V_{t+1} \mid Y_{1:t+1}) = \int p(V_{t+1} \mid V_t, Y_{t+1})\, dP(V_t \mid Y_{1:t+1}) \propto \int p(V_{t+1} \mid V_t, Y_{t+1})\, p(Y_{t+1} \mid V_t)\, dP(V_t \mid Y_{1:t}),$

where p(Y_{t+1} | V_t) and p(V_{t+1} | V_t, Y_{t+1}) denote the predictive and propagation rules, respectively, and Y_{1:t} = (Y_1, ..., Y_t)'. Here P(V_t | Y_{1:t}) denotes the current distribution of V_t, which in particle form corresponds to $\frac{1}{N}\sum_{j=1}^{N} \delta_{V_t^{(j)}}$, with δ a Dirac measure. This suggests the following particle approximation:
• Re-sample: for j = 1, ..., N draw ζ(j) ∼ Mult(w^{(1)}, ..., w^{(N)}), where w^{(j)} ∝ p(Y_{t+1} | V_t^{(j)});
• Propagate with a draw V_{t+1}^{(j)} ∼ p(V_{t+1} | V_t^{ζ(j)}, Y_{t+1}) to obtain a new collection of particles {V_{t+1}^{(j)}}_{j=1}^N ∼ P(V_{t+1} | Y_{1:t+1}).

Now we apply the particle learning algorithm to the system defined by Eqs. (7) and (9)-(11). The key to our approach is the use of an essential state vector V_t = (λ_{t+1}, s_t^θ). The predictive distribution is given by

$U_{t+1} = P(Y_{t+1} = 1 \mid s_t^\theta, \Psi, \lambda_{t+1}) = 1 - \Phi\!\left(-f_{t+1}\, Q_{t+1}^{-1/2}\right),$

which plays an important role in the re-sample step. We suppose that Ψ = (β', δ, τ², ν)' is known. The propagation step requires one to be able to sample λ_{t+1} from p(λ_{t+1} | s_t^θ, Ψ, Y_{t+1}), Z_{t+1} from p(Z_{t+1} | s_t^θ, Ψ, Y_{t+1}) and θ_t, θ_{t+1} from p(θ_t, θ_{t+1} | s_t^θ, Ψ, Y_{t+1}), with conditional state sufficient statistics s_{t+1}^θ = (m_{t+1}, C_{t+1}) given by the standard Kalman filter recursions (West and Harrison, 1997). More explicitly, the conditional posterior is θ_{t+1} | s_t^θ, Ψ ∼ N(m_{t+1}, C_{t+1}) with moments given by

$m_{t+1} = (1 - A_{t+1})\, a_{t+1} + A_{t+1} (Z_{t+1} - \mathbf{x}_{t+1}'\boldsymbol{\beta}),$    (17)
$C_{t+1} = (1 - A_{t+1})\, R_{t+1},$    (18)

where a_{t+1} = δm_t, A_{t+1} = R_{t+1}/Q_{t+1}, R_{t+1} = δ²C_t + τ², f_{t+1} = x_{t+1}'β + a_{t+1} and Q_{t+1} = R_{t+1} + λ_{t+1}⁻¹. We summarize the steps of the PL scheme in Algorithm 2.
Algorithm 2.
1. For i = 1, ..., N:
   1.1 Sample λ_{t+1}^{(i)} ∼ p(λ_{t+1} | ν);
   1.2 Set V_t^{(i)} = (λ_{t+1}^{(i)}, s_t^{θ(i)}).
2. For i = 1, ..., N, draw ζ(i) ∼ Mult(w_{t+1}), where w_{t+1}^{(i)} ∝ [U_{t+1}^{(i)}]^{Y_{t+1}} [1 − U_{t+1}^{(i)}]^{1−Y_{t+1}}.
3. Sample Z_{t+1}^{(i)} ∼ 1(Y_{t+1} = 1) N_{(0,∞)}(f_{t+1}^{ζ(i)}, Q_{t+1}^{ζ(i)}) + 1(Y_{t+1} = 0) N_{(−∞,0)}(f_{t+1}^{ζ(i)}, Q_{t+1}^{ζ(i)}).
4. Sample (θ_t, θ_{t+1})^{(i)} from p(θ_t, θ_{t+1} | V_t^{ζ(i)}, Z_{1:t+1}^{(i)}):
   4.1 Sample θ_t^{(i)} ∼ N(n_t^{(i)}/N_t^{(i)}, 1/N_t^{(i)}), where

       $n_t^{(i)} = \frac{\delta\,(Z_{t+1}^{(i)} - \mathbf{x}_{t+1}'\boldsymbol{\beta})}{(\lambda_{t+1}^{\zeta(i)})^{-1} + \tau^2} + \frac{m_t^{\zeta(i)}}{C_t^{\zeta(i)}}, \qquad N_t^{(i)} = \frac{1}{C_t^{\zeta(i)}} + \frac{\delta^2}{(\lambda_{t+1}^{\zeta(i)})^{-1} + \tau^2}.$

   4.2 Sample θ_{t+1}^{(i)} ∼ N(k_t^{(i)}/K_t^{(i)}, 1/K_t^{(i)}), where

       $k_t^{(i)} = \lambda_{t+1}^{\zeta(i)}(Z_{t+1}^{(i)} - \mathbf{x}_{t+1}'\boldsymbol{\beta}) + \frac{\delta\,\theta_t^{(i)}}{\tau^2}, \qquad K_t^{(i)} = \lambda_{t+1}^{\zeta(i)} + \frac{1}{\tau^2}.$

5. For i = 1, ..., N, update s_{t+1}^{θ(i)} using Eqs. (17) and (18).

Finally, the predictive likelihood p(Y_{t+1} | Y_{1:t}, Ψ) can be approximated by

$p(Y_{t+1} \mid Y_{1:t}, \Psi) \approx \frac{1}{N}\sum_{i=1}^{N} [U_{t+1}^{(i)}]^{Y_{t+1}} [1 - U_{t+1}^{(i)}]^{1-Y_{t+1}}.$    (19)
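One resample-propagate step of Algorithm 2 (with Ψ known) can be sketched as follows. The particles carry the state sufficient statistics (m_t, C_t); for simplicity this sketch updates them directly from Eqs. (17)-(18) after drawing the truncated-normal Z_{t+1}, and `pred` is the predictive-likelihood approximation of Eq. (19). Function names and structure are ours:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pl_step(y_next, x_next, m, C, beta, delta, tau2, sample_lam, rng):
    """One resample-propagate step of the PL scheme (Algorithm 2), Psi known."""
    N = len(m)
    lam = sample_lam(N, rng)                  # step 1: propose lambda_{t+1}
    a = delta * m                             # Kalman prediction
    R = delta ** 2 * C + tau2
    f = x_next @ beta + a
    Q = R + 1.0 / lam
    U = np.array([1.0 - Phi(-fi / sqrt(qi)) for fi, qi in zip(f, Q)])
    w = U if y_next == 1 else 1.0 - U         # step 2: resampling weights
    pred = w.mean()                           # Eq. (19)
    idx = rng.choice(N, size=N, p=w / w.sum())
    a, R, f, Q = a[idx], R[idx], f[idx], Q[idx]
    Z = np.empty(N)                           # step 3: truncated normal draw
    for i in range(N):
        while True:
            cand = rng.normal(f[i], sqrt(Q[i]))
            if (cand > 0) == (y_next == 1):
                Z[i] = cand
                break
    A = R / Q                                 # step 5: Eqs. (17)-(18)
    m_new = (1.0 - A) * a + A * (Z - x_next @ beta)
    C_new = (1.0 - A) * R
    return m_new, C_new, pred
```

For the probit case one takes `sample_lam = lambda n, rng: np.ones(n)`; running `pl_step` over t = 1, ..., T and averaging −log `pred` yields the LPS of Eq. (16).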
3.5. Out-of-sample forecasting

The K-step-ahead prediction density can be calculated using the composition method through the following recursive procedure:

$p(Z_{T+K} \mid y_{1:T}) = \iiint p(Z_{T+K} \mid \lambda_{T+K}, \theta_{T+K}, \Psi)\, p(\theta_{T+K} \mid \Psi, y_{1:T})\, p(\lambda_{T+K} \mid \Psi)\, p(\Psi \mid y_{1:T})\, d\theta_{T+K}\, d\lambda_{T+K}\, d\Psi,$

$p(\theta_{T+K} \mid \Psi, y_{1:T}) = \int p(\theta_{T+K} \mid \Psi, \theta_{T+K-1})\, p(\theta_{T+K-1} \mid \Psi, y_{1:T})\, d\theta_{T+K-1}.$
Evaluation of these integrals by Monte Carlo approximation is straightforward. To initialize the recursion, we use N draws {θ_T^{(i)}, λ_T^{(i)}, Ψ^{(i)}}_{i=1}^N from the MCMC sample. Given these N draws, we sample N draws θ_{T+k}^{(i)} from p(θ_{T+k} | Ψ^{(i)}, θ_{T+k−1}^{(i)}) and λ_{T+k}^{(i)} from p(λ_{T+k} | Ψ^{(i)}) for i = 1, ..., N and k = 1, ..., K, using Eqs. (10) and (11), respectively. Finally, we sample N draws {Z_{T+k}^{(i)}}_{i=1}^N from p(Z_{T+k} | λ_{T+k}^{(i)}, θ_{T+k}^{(i)}, Ψ^{(i)}). With draws from Z_{T+k}, θ_{T+k} and λ_{T+k}, the conditional probability π_{T+k} can be calculated easily.

4. Simulation study

We illustrate the Bayesian methods introduced in Section 3 by considering two tasks: parameter recovery and model selection using the LPS criterion. With this objective, we conduct a simulation study assuming two covariates x_{t1} and x_{t2}. We independently generate x_{t1} ∼ N(0, 1) and x_{t2} ∼ N(0, 9) for t = 1, ..., T, where T is the sample size. In our simulation study, we use two settings with T = 200 and T = 400, respectively. We then generate 300 data sets for each of the BSSMM-N, BSSMM-T and BSSMM-S models using the same set of x_{t1} and x_{t2} values, in such a way that for each data set T independent Bernoulli response variables Y_t are obtained with a proportion of ones of 0.55. We set β = (β₀, β₁, β₂)' = (−0.1, 0.5, −0.45)', θ₀ = 0, δ = 0.95 and τ² = 0.3. We assume ν = 4 and ν = 6 for the BSSMM-T and BSSMM-S models, respectively. We set the priors as: δ ∼ N_{(−1,1)}(0.95, 100), τ² ∼ IG(2.5, 0.125) and β ∼ N₃(β₀, Σ₀), where β₀ = 0 and Σ₀ = 500² I₃, 0 indicates a 3 × 1 vector of zeros and I₃ the identity matrix of order 3. The prior distributions on the shape parameter ν for the BSSMM-S and BSSMM-T models were chosen as ν ∼ G(5.0, 0.8) and a non-informative prior as in Fonseca et al. (2008), respectively. The prior mean and variance for δ are 0.0032 and 0.3328, respectively, so this prior is practically equivalent to the uniform distribution on the interval (−1, 1), which has zero mean and variance 0.3333.
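The data-generating design above can be sketched as follows for the BSSMM-T case. How the original study calibrated the design so that the proportion of ones is 0.55 is not detailed, so this sketch (ours) simply simulates from the model with the stated parameter values:

```python
import numpy as np

def simulate_bssmm_t(T, beta, delta, tau2, nu, rng):
    """One draw from the BSSMM-T design: x_t1 ~ N(0,1), x_t2 ~ N(0,9),
    AR(1) latent states (Eq. (10)) and a Student-t link via the
    threshold representation (Eqs. (7), (9), (11))."""
    x = np.column_stack([np.ones(T),
                         rng.normal(0.0, 1.0, T),
                         rng.normal(0.0, 3.0, T)])     # sd 3 -> variance 9
    lam = rng.gamma(nu / 2.0, 2.0 / nu, size=T)        # mixing variable of the t link
    theta = np.empty(T)
    prev = 0.0                                         # theta_0 = 0 as in the study
    for t in range(T):
        prev = delta * prev + np.sqrt(tau2) * rng.normal()
        theta[t] = prev
    Z = x @ beta + theta + rng.normal(size=T) / np.sqrt(lam)  # Eq. (9)
    y = (Z > 0).astype(int)                                   # Eq. (7)
    return x, theta, y

x, theta, y = simulate_bssmm_t(
    T=400, beta=np.array([-0.1, 0.5, -0.45]),
    delta=0.95, tau2=0.3, nu=4.0, rng=np.random.default_rng(42))
```

Replacing the gamma draw for λ by the beta or inverse-gamma draws of Section 2 gives the BSSMM-S and BSSMM-VG designs.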
Table 1
Simulation study based on 300 replications. Parameter estimation results for the BSSMM-N, BSSMM-T and BSSMM-S models.

BSSMM-N
Parameter  True value   T    Mean      Bias      MSE      CP
β0         −0.10        200   0.2408    0.3408   0.4189   99.67
β0         −0.10        400   0.1783    0.2782   0.3320   99.67
β1          0.50        200   0.5120    0.0120   0.0305   95.00
β1          0.50        400   0.4913   −0.0087   0.0147   96.33
β2         −0.45        200  −0.4475    0.0025   0.0124   98.33
β2         −0.45        400  −0.4479    0.0021   0.0076   98.33
δ           0.95        200   0.9410   −0.0089   0.0022   97.67
δ           0.95        400   0.9500    0.0000   0.0005   98.33
τ2          0.30        200   0.3005    0.0005   0.1148   96.33
τ2          0.30        400   0.3039    0.0039   0.0720   96.33

BSSMM-T
Parameter  True value   T    Mean      Bias      MSE      CP
β0         −0.10        200   0.2775    0.3775   0.3842   99.67
β0         −0.10        400   0.1642    0.2642   0.3842   99.67
β1          0.50        200   0.5039    0.0039   0.0400   96.33
β1          0.50        400   0.4911   −0.0088   0.0138   98.33
β2         −0.45        200  −0.4537   −0.0037   0.0154   96.33
β2         −0.45        400  −0.4503   −0.0003   0.0083   98.33
δ           0.95        200   0.9428   −0.0072   0.0025   98.33
δ           0.95        400   0.9506    0.0006   0.0005   98.67
τ2          0.30        200   0.3118    0.0118   0.3061   99.67
τ2          0.30        400   0.2943   −0.0056   0.0464   99.67
ν           4.00        200   6.6078    2.6078   8.3993  100.00
ν           4.00        400   6.9087    2.9087  12.0527  100.00

BSSMM-S
Parameter  True value   T    Mean      Bias      MSE      CP
β0         −0.10        200   0.2926    0.3926   0.4689   96.33
β0         −0.10        400   0.1078    0.2078   0.2272  100.00
β1          0.50        200   0.5316    0.0316   0.0349   96.33
β1          0.50        400   0.5056    0.0056   0.0142   98.33
β2         −0.45        200  −0.4689   −0.0189   0.0157  100.00
β2         −0.45        400  −0.4632   −0.0132   0.0065   98.33
δ           0.95        200   0.9425   −0.0075   0.0013   98.33
δ           0.95        400   0.9480   −0.0020   0.0005   96.33
τ2          0.30        200   0.3392    0.0392   0.1133  100.00
τ2          0.30        400   0.3184    0.0184   0.0281   96.33
ν           6.00        200   6.2439    0.2439   0.1553  100.00
ν           6.00        400   6.2330    0.2330   0.2043  100.00
For each generated data set, we fit the BSSMM-N, BSSMM-T and BSSMM-S models using the estimation procedure described in Section 3. In each case, we run 25 000 iterations of the MCMC procedure; the first 5000 are discarded as a burn-in period and the next 20 000 values are saved. All the calculations were performed with stand-alone code developed by us using an open source C++ library for statistical computation, the Scythe statistical library (Pemstein et al., 2011), which is available for free download at http://scythe.wustl.edu.

In Table 1 we summarize the mean of the posterior means, the bias, the mean square error (MSE) and the coverage probability (CP). For each of the simulated models and sample sizes, we found that β₀ is difficult to estimate, but in all cases the coverage probability of the true value being in the 95% credibility interval is high (above 96.33%). Turning to the degrees of freedom of the BSSMM-T model, Table 1 shows that the mean of the posterior means is 6.6078 for T = 200 and 6.9087 for T = 400, a biased result; however, the means of the posterior medians are 4.53 and 4.15, respectively (not reported in Table 1), and the CP in both cases is 1.00. For the other parameters, we found small values of the bias and MSE.

In Table 2, we report the LPS criterion for model selection. For each of the 300 data sets simulated from the respective true model, we fit the BSSMM-N, BSSMM-T and BSSMM-S models and report the fraction of times that the LPS selects each model. When T = 200 the LPS correctly selects the BSSMM-N, BSSMM-T and BSSMM-S models 91.00%, 93.00% and 93.67% of the time, respectively. When T = 400 the rates of correct classification are 93.67%, 95.00% and 96.67% for the BSSMM-N, BSSMM-T and BSSMM-S models, respectively.

5. Application

In this section we re-analyze the infant sleep status data (Czado and Song, 2008) and the deep brain stimulation data (Smith et al., 2009) using the methodology described in Section 3.
Table 2
Simulation study based on 300 replications. Fraction of times that the LPS selects each fitted model.

True model    T    BSSMM-N  BSSMM-T  BSSMM-S
BSSMM-N       200  0.9100   0.0533   0.0367
BSSMM-N       400  0.9367   0.0367   0.0263
BSSMM-T       200  0.0200   0.9300   0.0500
BSSMM-T       400  0.0167   0.9500   0.0333
BSSMM-S       200  0.0300   0.0333   0.9367
BSSMM-S       400  0.0167   0.0167   0.9667
5.1. Infant sleep data set

A binary time series of infant sleep status was recorded in a 120 min electroencephalographic (EEG) sleep pattern study (Stoffer et al., 1998). Careful consideration should be given to the lability of state and the disruption of the expected rapid eye movement (REM) and non-REM components of the neonatal or infant sleep cycle. Here it is considered that Y_t = 1 if during minute t the infant was judged to be in the REM sleep cycle and Y_t = 0 otherwise. Two time-varying covariates are considered: let x_{t1} be the number of body movements during minute t and x_{t2} the number of body movements not due to sucking during minute t. As in Czado and Song (2008), our objective is to investigate whether or not the probability of being in REM sleep status is significantly related to the two types of body movements x_{t1} and x_{t2}. Our analysis differs from theirs in that we compare the fit of the probit (BSSMM-N), Student-t (BSSMM-T), slash (BSSMM-S) and variance gamma (BSSMM-VG) links. From Eq. (8), the conditional probability of success is given by

$\pi_t = P(Y_t = 1 \mid \boldsymbol{\beta}, \theta_t, \lambda_t) = F_{SMN}(\beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \theta_t),$

and the state equation is an AR(1) process as in Eq. (10). We set the priors for the common parameters as in the simulation study. The prior distribution on the shape parameter was chosen as ν ∼ G(5.0, 0.8) for the BSSMM-S and BSSMM-VG models; for the BSSMM-T model we assume a non-informative prior as in Fonseca et al. (2008).

We fit the BSSMM-N, BSSMM-T, BSSMM-S and BSSMM-VG models. For each case, we ran the MCMC simulation for 50 000 iterations. In all cases, the first 10 000 draws were discarded as a burn-in period. In order to reduce the autocorrelation between successive values of the simulated chain, only every 20th value of the chain was stored. With the resulting 2000 values, we calculated the posterior means, the 95% credible intervals, the Monte Carlo errors and the convergence diagnostic (CD) statistics (Geweke, 1992). If the sequence of the recorded MCMC output is stationary, the CD converges in distribution to the standard normal. According to the CD, the null hypothesis that the sequence of 2000 draws is stationary was accepted at the 5% level, CD ∈ (−1.96, 1.96), for all the parameters in all the models considered here. Table 3 summarizes the results. The inefficiency factor is defined by $1 + 2\sum_{s=1}^{\infty}\rho_s$, where ρ_s is the sample autocorrelation at lag s. It measures how well the MCMC chain mixes (see, e.g., Kim et al., 1998): it is the estimated ratio of the numerical variance of the posterior sample mean to the variance of the sample mean from uncorrelated draws. When the inefficiency factor is equal to m, we need to draw m times as many MCMC samples as the number of uncorrelated samples. Examining the inefficiency factors in Table 3, we found that our algorithm produces a good mixing of the MCMC chain.
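In practice the inefficiency factor is estimated by truncating the autocorrelation sum; the truncation rule below is an illustrative default of ours, not the lag-window estimator used by Kim et al. (1998):

```python
import numpy as np

def inefficiency_factor(draws, max_lag=None):
    """Estimate 1 + 2 * sum_{s>=1} rho_s from a stored MCMC chain,
    truncating the sum at max_lag."""
    draws = np.asarray(draws, dtype=float)
    n = len(draws)
    if max_lag is None:
        max_lag = min(n - 2, int(10 * np.log10(n)))  # illustrative truncation rule
    d = draws - draws.mean()
    var = d @ d / n
    # sample autocorrelations rho_1, ..., rho_{max_lag}
    rho = np.array([(d[:-s] @ d[s:]) / (n * var) for s in range(1, max_lag + 1)])
    return 1.0 + 2.0 * rho.sum()
```

An independent chain gives a value near 1, while a slowly mixing chain gives a much larger value.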
For all the models, the posterior means of δ are above 0.93, showing high persistence of the autoregressive parameter for the state variables and thus in the binary time series. The posterior means of τ² are between 0.27 and 0.34. These values are consistent with the results in Czado and Song (2008). The magnitude of the tail fatness is measured by the shape parameter ν in the BSSMM-T, BSSMM-S and BSSMM-VG models; the posterior means of ν are approximately 7.5, 6.4 and 6.3, respectively. These results seem to indicate that the measurement errors of the Z_t threshold variables are better explained by heavy-tailed distributions; as a consequence, the corresponding links could be more suitable than the normal probit link. From Table 3, we found empirically that the influence of the number of body movements (x₁) is marginal, since the corresponding 95% credible interval for β₁ contains zero. On the other hand, the influence of the number of body movements not due to sucking (x₂) is detected to be statistically significant. The negative value of the posterior mean of β₂ shows that a higher number of body movements not due to sucking reduces the probability of the infant being in REM sleep. Fig. 1 shows the posterior smoothed mean of the state variables θ_t. The states can be considered as an underlying continuous ''sleep state''. We found some differences between the fits of the different models, but in general the results agree with Czado and Song (2008).

To assess the goodness of fit of the estimated models, we calculate the log predictive score (LPS) defined in Eq. (16). To estimate the LPS, we use the particle filter method of Carvalho et al. (2010) with 5000 particles. The minimum value of the LPS gives
Fig. 1. Estimation results for the infant sleep status data set. Posterior smoothed mean of θt. BSSMM-N: solid line, BSSMM-T: +, BSSMM-S: ∗ and BSSMM-VG: −.

Table 3
Estimation results for the infant sleep data set. For each parameter and model: posterior mean, posterior 95% credible interval, MC error, CD statistic, inefficiency factor (IF).

β0:
  BSSMM-N:  −0.0479 (−1.9425, 1.6770), MC error 0.0483, CD 1.39, IF 6.04
  BSSMM-T:  −0.0281 (−2.3300, 2.5256), MC error 0.0944, CD 1.43, IF 15.08
  BSSMM-S:  −0.0550 (−2.2316, 2.0157), MC error 0.0842, CD 0.68, IF 14.03
  BSSMM-VG:  0.3185 (−1.7192, 4.4206), MC error 0.1764, CD 1.25, IF 31.43
β1:
  BSSMM-N:   0.2706 (−0.0833, 0.6245), MC error 0.0044, CD 0.00, IF 1.12
  BSSMM-T:   0.4336 (−0.0837, 1.1298), MC error 0.0115, CD −0.32, IF 2.97
  BSSMM-S:   0.3134 (−0.0942, 0.7338), MC error 0.0056, CD 0.37, IF 1.33
  BSSMM-VG:  0.2775 (−0.0791, 0.6670), MC error 0.0054, CD −1.27, IF 1.29
β2:
  BSSMM-N:  −0.4679 (−0.9525, −0.0257), MC error 0.0056, CD 0.11, IF 1.12
  BSSMM-T:  −0.6755 (−1.5048, −0.0866), MC error 0.0131, CD 0.28, IF 2.62
  BSSMM-S:  −0.5196 (−1.0588, −0.0372), MC error 0.0068, CD −0.72, IF 1.33
  BSSMM-VG: −0.4647 (−0.9336, −0.0207), MC error 0.0060, CD 1.29, IF 1.30
δ:
  BSSMM-N:   0.9336 (0.8333, 0.9868), MC error 0.0011, CD −0.72, IF 1.67
  BSSMM-T:   0.9402 (0.8528, 0.9890), MC error 0.0011, CD 0.21, IF 2.25
  BSSMM-S:   0.9349 (0.8341, 0.9887), MC error 0.0012, CD −0.46, IF 1.72
  BSSMM-VG:  0.9424 (0.8512, 0.9922), MC error 0.0019, CD −0.18, IF 5.47
τ2:
  BSSMM-N:   0.3060 (0.0663, 1.0169), MC error 0.0149, CD 0.53, IF 7.04
  BSSMM-T:   0.3423 (0.0628, 1.1807), MC error 0.0206, CD −1.27, IF 6.82
  BSSMM-S:   0.3566 (0.0667, 1.2465), MC error 0.0226, CD 0.14, IF 6.85
  BSSMM-VG:  0.2789 (0.0512, 0.9980), MC error 0.0205, CD 1.51, IF 11.66
ν:
  BSSMM-N:   –
  BSSMM-T:   7.5621 (2.0410, 30.6290), MC error 0.4931, CD −0.56, IF 9.19
  BSSMM-S:   6.4002 (2.0310, 13.2640), MC error 0.1265, CD −1.74, IF 3.94
  BSSMM-VG:  6.2396 (1.7800, 12.8040), MC error 0.1118, CD 1.65, IF 3.15
Table 4
Infant sleep data: model comparison.

Model      LPS     Rank
BSSMM-N    0.4406  4
BSSMM-T    0.4379  2
BSSMM-S    0.4356  1
BSSMM-VG   0.4401  3
the best fit. We compare the BSSMM-N, BSSMM-T, BSSMM-S and BSSMM-VG models. From Table 4, the LPS selects the BSSMM-S model as the best model. Fig. 2 shows the posterior smoothed mean and 95% credibility intervals of πt based on the MCMC output for the BSSMM-N (solid line) and BSSMM-S (dotted line) links. We divide the entire data set into three panels for better visualization of the results. Both models show similar smoothed probabilities.
Fig. 2. Posterior smoothed mean of the conditional probability and 95% credibility limits for πt for BSSMM-N (solid line) and BSSMM-S (dotted line). The points are the true observations.
Fig. 3. CDF with location 0, scale 1. For ν , we use the posterior mean of the MCMC output from the respective model for the DBS data set.
We also investigate the predictive ability of the BSSMM-N and BSSMM-S models. For this purpose, we divide the data set into an initial group with 110 observations and leave the next 10 observations for out-of-sample forecasting. We fit the BSSMM-N and BSSMM-S models; parameter estimates for this subset are not reported here. We simulate out-of-sample states using the methods described in Section 3.5. As a byproduct of the MCMC simulation, we can obtain the conditional probability of success π_{110+k} for the next 10 observations under each model considered. In accordance with Albert and Chib (1993) we define the ''pseudo-residual'' as r_t = Y_t − π_t, and we use the mean square error of forecast defined by

$MSEF = \frac{1}{K}\sum_{k=1}^{K}(Y_{T+k} - \pi_{T+k})^2.$    (20)

We found that the MSEF is 0.0210 for the BSSMM-S and 0.0316 for the BSSMM-N, so the BSSMM-S outperforms the BSSMM-N model.

5.2. Deep brain stimulation

Now we consider responses from a monkey performing the attention paradigm described in Smith et al. (2009). The task consisted of making a saccade to a visual target followed by a variable period of fixation on the target and detection of a change in target color followed by a bar release. This standard task requires sustained attention because, in order to receive a reward, the animal must release the bar within a brief time window cued by the change in target color (see Smith et al., 2009, for a detailed description of the experiment). Thus our behavioral data set for this experiment is composed of a time series of binary observations, with a 1 corresponding to reward being delivered and a 0 corresponding to reward not being delivered at each trial. The goal of the experiment is to determine whether, once performance has diminished as a result of spontaneous fatigue, deep brain stimulation (DBS) allows the animal to recover its pre-fatigue level of performance.

In this experiment, the monkey performed 1250 trials. Stimulation was applied during 4 periods, across trials 300–364, 498–598, 700–799 and 1000–1099, indicated by the shaded gray regions in Figs. 4 and 5. Dividing the results into periods when stimulation is applied (''ON'') and not applied (''OFF''), there are 240 correct responses out of 367 trials during the ON periods and 501 correct responses out of 883 trials during the OFF periods. Out of 1250 observations, 741 (or 59.28%)
Table 5
Estimation results for the DBS data set. First row: posterior mean. Second row: posterior 95% credible interval in parentheses. Third row: MC error. Fourth row: CD statistic. Fifth row: inefficiency factor.

Parameter   BSSMM-N             BSSMM-T             BSSMM-S             BSSMM-VG
β0          1.1621              1.4890              1.8504              1.3288
            (−0.4972, 3.4056)   (−0.1961, 3.5161)   (−0.4034, 5.0480)   (−0.1766, 2.8082)
            0.1058              0.1141              0.1718              0.0974
            1.38                0.67                −1.58               −0.79
            28.89               31.70               36.84               16.34
β1          −0.0021             −0.0025             −0.0029             −0.0026
            (−0.0048, 0.0000)   (−0.0055, 0.0000)   (−0.0075, 0.0000)   (−0.0055, −0.0005)
            0.0004              0.0002              0.0002              0.0002
            0.00                −0.54               −1.21               0.79
            31.57               29.37               30.45               30.79
δ           0.9938              0.9941              0.9926              0.9953
            (0.9823, 0.9994)    (0.9831, 0.9994)    (0.9798, 0.9993)    (0.9853, 0.9994)
            0.0003              0.0003              0.0004              0.0004
            −1.36               −0.81               −1.08               −1.34
            10.29               9.93                10.52               12.73
τ²          0.0108              0.0126              0.0234              0.0102
            (0.0059, 0.0193)    (0.0065, 0.0224)    (0.0125, 0.0432)    (0.0056, 0.0174)
            0.0002              0.0206              0.0005              0.0003
            0.31                0.93                0.36                −1.26
            7.04                7.78                11.36               13.66
ν           –                   11.4105             1.9487              14.7991
            –                   (2.3180, 34.9660)   (1.0410, 3.3940)    (4.5330, 35.5310)
            –                   1.1229              0.0764              0.1118
            –                   −0.56               −1.83               1.80
            –                   30.77               25.49               35.19
Fig. 4. Estimation results for the DBS data set. Posterior smoothed mean of θt. BSSMM-N: solid line, BSSMM-T: dotted line, BSSMM-S: tiny dotted line and BSSMM-VG: dashed line.
are correct responses. So, from Eq. (8), the conditional probability of success is given by πt = P(Yt = 1 | β, θt, λt) = F_SMN(β0 + β1 t + θt).

We set the priors as δ ∼ N(−1,1)(0.95, 100), τ² ∼ IG(2.5, 0.125) and β ∼ N2(β0, Σ0), where β0 = 0 and Σ0 = 500² I2, with 0 denoting a 2 × 1 vector of zeros and I2 the identity matrix of order 2. The prior distributions on the shape parameter ν for the BSSMM-S and BSSMM-T models were chosen as ν ∼ G(5.0, 0.8) and the non-informative prior of Fonseca et al. (2008), respectively. As in the previous example, we fit the BSSMM-N, BSSMM-T, BSSMM-S and BSSMM-VG models. For each case, we ran the MCMC simulation for 50 000 iterations, discarding the first 10 000 draws as a burn-in period. In order to reduce the autocorrelation between successive values of the simulated chain, only every 20th value was stored. With the resulting 2000 values, we calculated the posterior means, the 95% credible intervals, the Monte Carlo errors and the convergence diagnostic (CD) statistics (Geweke, 1992). As all CD values lie in (−1.96, 1.96), convergence is not rejected for any parameter. From Table 5, we find that for all the models considered here the posterior means of δ are above 0.99, indicating high persistence of the autoregressive state variables and thus of the binary time series. The posterior means of τ² lie between 0.0102 and 0.0234, with the BSSMM-N, BSSMM-T and BSSMM-VG being less variable than the BSSMM-S. The posterior mean of β1 is negative for all the models, indicating that performance deteriorates as a result of spontaneous fatigue. The posterior means and 95% credible intervals for the shape parameter ν are 11.4105 and (2.3180, 34.9660), 1.9487 and (1.0410, 3.3949), and 14.7991 and (4.5330, 35.5310) for the BSSMM-T, BSSMM-S and BSSMM-VG, respectively. These results suggest that the measurement errors of the threshold variables Zt are better explained by heavy-tailed distributions,
Fig. 5. Estimation results for the DBS data set. Posterior smoothed mean of πt. BSSMM-N: solid line, BSSMM-T: dotted line, BSSMM-S: tiny dotted line and BSSMM-VG: dashed line.

Table 6
DBS data set: model comparison.

Model      LPS      Rank
BSSMM-N    0.5810   2
BSSMM-T    0.5816   3
BSSMM-S    0.5801   1
BSSMM-VG   0.5822   4
as a consequence, the corresponding links could be more suitable than the normal probit link. To illustrate the tail behavior of the cumulative distribution functions (CDF), Fig. 3 plots the CDFs of the normal, Student-t, slash and variance gamma distributions. We set the location and scale parameters to zero and one, respectively. For the shape parameter ν, we use the posterior mean of the respective model obtained from the MCMC output. Clear differences can be seen in the rates at which the CDFs approach zero and one. Fig. 4 shows the posterior smoothed mean of the states θt for each of the fitted models. The solid, dotted, tiny dotted and dashed lines indicate the posterior smoothed mean for the BSSMM-N, BSSMM-T, BSSMM-S and BSSMM-VG, respectively. There are noticeable differences between the posterior estimates of the BSSMM-VG and those of the BSSMM-N, BSSMM-T and BSSMM-S models, and differences among the latter three appear especially in the OFF periods. In Fig. 5, we plot the posterior smoothed mean of the probability of a correct response computed using the BSSMM-N (solid line), BSSMM-T (dotted line), BSSMM-S (tiny dotted line) and BSSMM-VG (dashed line) models. In this case the estimated probability is less constrained and tracks the data independently of the stimulation ON/OFF information. In all cases, the response curve lies on average around the 0.75 level, but decreases are observed at the end of the first stimulation-ON period around trial 375, at the end of the fourth OFF period around trial 950, and for the remainder of the experiment from trial 1100 onwards. All the models are able to account for the stimulation effect. The results indicate that stimulation has a positive influence on performance, although performance does not improve during the first stimulation period.
Overall, however, the results of all the models highlight an abrupt, step-like decline in performance towards the end of the experiment, around trial 950, followed by a significant increase during the final stimulation period and a final significant drop to zero. All these results are consistent with Smith et al. (2009). To assess the goodness of fit of the estimated models, we use the log predictive score (LPS), estimated with the particle filter method of Carvalho et al. (2010) with 5000 particles; the minimum value of the LPS indicates the best fit. We compare the BSSMM-N, BSSMM-T, BSSMM-S and BSSMM-VG models. From Table 6, the LPS selects the BSSMM-S model as the best model for the DBS data set. We then investigate the predictive ability of the BSSMM-N and BSSMM-S models. For this purpose, we split the data set into an initial group of 1200 observations, leaving the next 50 observations for out-of-sample forecasting. We fit the BSSMM-N and BSSMM-S models; parameter estimates for this subset are not reported here. We simulate out-of-sample states using the methods described in Section 3.5. As a byproduct of the MCMC simulation, we can obtain the conditional probability of success π1200+k for each of the next 50 observations under each model considered. The MSEF is 0.4619 for the BSSMM-S and 0.5758 for the BSSMM-N, indicating that the BSSMM-S model outperforms the BSSMM-N model.

6. Conclusions

In this paper we proposed a class of state space mixed models for longitudinal binary data using scale mixture of normal links as an extension of Czado and Song (2008). The models include both deterministic and random predictors. We studied three specific sub-classes, viz. the Student-t, slash and variance gamma links. Under a Bayesian perspective, we constructed an algorithm based on Markov chain Monte Carlo (MCMC) simulation methods to estimate all the parameters
and latent quantities of our proposed BSSMM-SMN links. The latent states are efficiently simulated using the simulation smoother of de Jong and Shephard (1995). Particle filtering methods are developed to calculate the predictive likelihood function and the log predictive score for model comparison. In the two examples analyzed, we found that the BSSMM-S provides the best fit. It is also important to emphasize that, in general, we do not advocate the use of the BSSMM-SMN models in all situations; rather, we recommend using the models discussed here to assess the robustness of the conclusions, replacing the normal assumption with a more flexible model if this provides a more appropriate analysis. This article makes certain contributions, but several extensions are still possible. First, we focused on symmetric links; if the rates of zeros and ones differ, skewed links such as the skew normal or the skew Student-t are good alternatives. A deeper investigation of those modifications is beyond the scope of the present paper, but provides stimulating topics for further research.

Acknowledgments

The authors would like to thank the Associate Editor and two anonymous referees for their constructive comments, which substantially improved the quality of this paper. The research of Carlos A. Abanto-Valle was supported by the CNPq grant 202052/2011-7. We thank P.X.-K. Song and Anne C. Smith for making the infant sleep and deep brain stimulation data sets available at http://www-personal.umich.edu/~pxsong/BOOKLDA.html and http://www.ucdmc.ucdavis.edu/anesthesiology/research/smith_Bayesian.html, respectively.

Appendix. The full conditionals

In this appendix, we describe the full conditional distributions for the parameters, the threshold variables Z1:T and the mixing variables λ1:T of the BSSMM-SMN class.

A.1.
Full conditional distribution of β, δ and τ²

For the common parameters of the BSSMM-SMN class, the prior distributions are set as β ∼ Nk(β0, Σ0), δ ∼ N(−1,1)(δ0, σδ²) and τ² ∼ IG(T0/2, S0/2). The full conditional of β is given by

\[ \boldsymbol{\beta} \mid Z_{1:T}, \theta_{0:T}, \lambda_{1:T} \sim N_k(\boldsymbol{\beta}_1, \Sigma_1), \tag{21} \]

where \( \Sigma_1 = \bigl[\Sigma_0^{-1} + \sum_{t=1}^{T} \lambda_t \mathbf{x}_t \mathbf{x}_t' \bigr]^{-1} \) and \( \boldsymbol{\beta}_1 = \Sigma_1 \bigl[\Sigma_0^{-1}\boldsymbol{\beta}_0 + \sum_{t=1}^{T} \lambda_t \mathbf{x}_t (Z_t - \theta_t)\bigr] \).

The full conditional of δ is given by

\[ p(\delta \mid \theta_{0:T}, \tau^2) \propto Q(\delta)\, \exp\left\{-\frac{a_\delta}{2\tau^2}\left(\delta - \frac{b_\delta}{a_\delta}\right)^{2}\right\} 1(|\delta| < 1), \tag{22} \]

where \( Q(\delta) = \sqrt{1-\delta^2}\, \exp\bigl\{-\frac{1}{2\tau^2}(1-\delta^2)\theta_0^2\bigr\} \), \( a_\delta = \sum_{t=1}^{T} \theta_{t-1}^2 + \tau^2/\sigma_\delta^2 \), \( b_\delta = \sum_{t=1}^{T} \theta_{t-1}\theta_t + \delta_0 \tau^2/\sigma_\delta^2 \), and 1(X ∈ A) denotes the indicator function, equal to 1 if the random variable X is contained in the set A and zero otherwise. As p(δ | θ0:T, τ²) in (22) does not have closed form, we sample it using the Metropolis–Hastings algorithm with the truncated normal N(−1,1)(bδ/aδ, τ²/aδ) as the proposal density.

Finally, the full conditional of τ² is

\[ \tau^2 \mid \theta_{0:T}, \delta \sim IG\left(\frac{T_1}{2}, \frac{S_1}{2}\right), \tag{23} \]

where \( T_1 = T_0 + T + 1 \) and \( S_1 = S_0 + (1-\delta^2)\theta_0^2 + \sum_{t=1}^{T} (\theta_t - \delta\theta_{t-1})^2 \).
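These three updates can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; the function names and the fallback choices are our own, and `theta` passed to `draw_beta` is the subvector θ1:T aligned with the rows of the design matrix:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def draw_beta(Z, theta, lam, X, beta0, Sigma0_inv):
    """Draw beta from the Gaussian full conditional in Eq. (21)."""
    prec = Sigma0_inv + (X * lam[:, None]).T @ X          # Sigma_1^{-1}
    Sigma1 = np.linalg.inv(prec)
    beta1 = Sigma1 @ (Sigma0_inv @ beta0 + X.T @ (lam * (Z - theta)))
    return rng.multivariate_normal(beta1, Sigma1)

def draw_tau2(theta, delta, T0, S0):
    """Draw tau^2 from the inverse-gamma full conditional in Eq. (23);
    here theta = (theta_0, ..., theta_T)."""
    T = len(theta) - 1
    T1 = T0 + T + 1
    S1 = S0 + (1 - delta**2) * theta[0]**2 \
        + np.sum((theta[1:] - delta * theta[:-1])**2)
    return 1.0 / rng.gamma(T1 / 2.0, 2.0 / S1)            # IG via 1/gamma draw

def draw_delta(delta, theta, tau2, delta0, sigma_d2):
    """One MH update for delta (Eq. 22): propose from the truncated normal
    N_(-1,1)(b/a, tau2/a); the acceptance ratio reduces to Q(prop)/Q(delta)."""
    a = np.sum(theta[:-1]**2) + tau2 / sigma_d2
    b = np.sum(theta[:-1] * theta[1:]) + delta0 * tau2 / sigma_d2
    m, s = b / a, np.sqrt(tau2 / a)
    prop = truncnorm.rvs((-1 - m) / s, (1 - m) / s, loc=m, scale=s,
                         random_state=rng)
    logQ = lambda d: 0.5 * np.log(1 - d**2) \
        - (1 - d**2) * theta[0]**2 / (2 * tau2)
    return prop if np.log(rng.uniform()) < logQ(prop) - logQ(delta) else delta
```

Because the Gaussian factor of (22) is used as the proposal, only the ratio of the Q(δ) terms enters the acceptance probability.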
A.2. Full conditional of Z1:T

Since the latent variables Z1:T are conditionally independent given θ0:T and λ1:T, the full conditional distribution of Z1:T is given by

\[ p(Z_{1:T} \mid Y_{1:T}, \theta_{0:T}, \lambda_{1:T}, \boldsymbol{\beta}) = \prod_{t=1}^{T} p(Z_t \mid \theta_t, \lambda_t, \boldsymbol{\beta}) = \prod_{t=1}^{T} \bigl\{ 1(Z_t \ge 0)\, 1(y_t = 1) + 1(Z_t < 0)\, 1(y_t = 0) \bigr\}\, \phi(Z_t \mid \mathbf{x}_t'\boldsymbol{\beta} + \theta_t, \lambda_t^{-1}), \tag{24} \]

where φ(x | µ, σ²) denotes the normal density with mean µ and variance σ². Thus each p(Zt | θt, λt, β) is a truncated normal density, with the truncation region determined by the value of Yt.
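Drawing from (24) can be sketched as below, reading λt as a precision (consistent with Eq. (21)); the function name is our own and the update is vectorized over t:

```python
import numpy as np
from scipy.stats import truncnorm

def draw_Z(y, mu, lam, rng):
    """Draw the threshold variables from Eq. (24): Z_t is normal with mean
    mu_t = x_t' beta + theta_t and variance 1/lam_t, truncated to
    [0, inf) when y_t = 1 and to (-inf, 0) when y_t = 0."""
    sd = 1.0 / np.sqrt(lam)
    lo = np.where(y == 1, -mu / sd, -np.inf)   # standardized truncation bounds
    hi = np.where(y == 1, np.inf, -mu / sd)
    return truncnorm.rvs(lo, hi, loc=mu, scale=sd, random_state=rng)
```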
A.3. Full conditional of λt and ν

• BSSMM-T case. As λt ∼ G(ν/2, ν/2), we have λt | Zt, θt, β, ν ∼ G((ν + 1)/2, ([Zt − x′tβ − θt]² + ν)/2). An important modeling assumption is the regularization penalty p(ν) on the tail thickness. A default Jeffreys prior was developed by Fonseca et al. (2008), with a number of desirable properties, particularly when learning a fat tail from a finite data set. The default Jeffreys prior for ν takes the form

\[ p(\nu) \propto \left(\frac{\nu}{\nu+3}\right)^{1/2} \left\{ \psi'\!\left(\frac{\nu}{2}\right) - \psi'\!\left(\frac{\nu+1}{2}\right) - \frac{2(\nu+3)}{\nu(\nu+1)^2} \right\}^{1/2}, \tag{25} \]

where \( \psi'(a) = \frac{d\,\psi(a)}{da} \) and \( \psi(a) = \frac{d \log \Gamma(a)}{da} \) are the trigamma and digamma functions, respectively. An interesting feature of this prior is its behavior as ν goes to infinity: it has polynomial tails of the form p(ν) ∝ ν⁻⁴. In this case, the tail of the prior decays rather fast for large values of ν, and assessing the degree of tail thickness can require prohibitively large samples. The full conditional of ν is then

\[ p(\nu \mid \lambda_{1:T}) \propto p(\nu)\, \frac{\left(\frac{\nu}{2}\right)^{T\nu/2}}{\Gamma\!\left(\frac{\nu}{2}\right)^{T}} \exp\left\{ -\frac{\nu}{2} \sum_{t=1}^{T} (\lambda_t - \log \lambda_t) \right\} 1(2 < \nu \le 40). \tag{26} \]

As (26) does not have closed form, we sample ν using the Metropolis–Hastings algorithm. The proposal density is N(2<ν<40)(µν, τν²), with µν = x − q′(x)/q″(x) and τν² = (−q″(x))⁻¹, where x is the value from the previous iteration, q(·) is the logarithm of the conditional posterior density, and q′(·) and q″(·) are its first and second derivatives, respectively.
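This MH step can be sketched as follows, with q′ and q″ obtained by finite differences. This is our own illustrative sketch, not the paper's code; the helper names and the fallback scale when the numerical curvature is not negative are assumptions:

```python
import numpy as np
from scipy.special import gammaln, polygamma
from scipy.stats import truncnorm

def log_post_nu(nu, lam):
    """Log of Eq. (26) up to a constant: the Jeffreys prior (25) times the
    product of G(nu/2, nu/2) densities of the mixing variables lam."""
    T = len(lam)
    fisher = (polygamma(1, nu / 2.0) - polygamma(1, (nu + 1.0) / 2.0)
              - 2.0 * (nu + 3.0) / (nu * (nu + 1.0) ** 2))
    log_prior = 0.5 * (np.log(nu) - np.log(nu + 3.0)) + 0.5 * np.log(fisher)
    log_lik = (T * nu / 2.0) * np.log(nu / 2.0) - T * gammaln(nu / 2.0) \
        - (nu / 2.0) * np.sum(lam - np.log(lam))
    return log_prior + log_lik

def _prop_params(x, q, h=1e-3):
    # mean and sd of the normal proposal from numerical q'(x) and q''(x)
    q1 = (q(x + h) - q(x - h)) / (2.0 * h)
    q2 = (q(x + h) - 2.0 * q(x) + q(x - h)) / h ** 2
    if q2 >= 0.0:                        # fallback if curvature is not negative
        return x, 1.0
    return x - q1 / q2, np.sqrt(-1.0 / q2)

def draw_nu(nu, lam, rng):
    """One MH step for nu with proposal N_(2,40)(mu, tau^2)."""
    q = lambda x: log_post_nu(x, lam)
    m_f, s_f = _prop_params(nu, q)
    a_f, b_f = (2.0 - m_f) / s_f, (40.0 - m_f) / s_f
    prop = truncnorm.rvs(a_f, b_f, loc=m_f, scale=s_f, random_state=rng)
    m_r, s_r = _prop_params(prop, q)     # reverse-move proposal parameters
    a_r, b_r = (2.0 - m_r) / s_r, (40.0 - m_r) / s_r
    log_acc = (q(prop) - q(nu)
               + truncnorm.logpdf(nu, a_r, b_r, loc=m_r, scale=s_r)
               - truncnorm.logpdf(prop, a_f, b_f, loc=m_f, scale=s_f))
    return prop if np.log(rng.uniform()) < log_acc else nu
```

Since the proposal parameters depend on the current value, the reverse-move proposal density must be recomputed at the proposed point for the acceptance ratio to be exact.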
• BSSMM-S case. Using the fact that λt ∼ Be(ν, 1), we have λt | Zt, θt, β ∼ G(0<λt<1)(ν + 1/2, [Zt − x′tβ − θt]²/2), the right-truncated gamma distribution. Assuming the prior distribution ν ∼ G(aν, bν), the full conditional distribution of ν is G(ν>1)(T + aν, bν − Σ_{t=1}^{T} log λt), the left-truncated gamma distribution.
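Both truncated-gamma draws can be obtained by the inverse-CDF method. A minimal sketch under our own naming, with a small floor on the squared residual added for numerical safety:

```python
import numpy as np
from scipy.stats import gamma

def draw_lambda_slash(nu, resid, rng):
    """lambda_t | . ~ G(nu + 1/2, resid_t^2 / 2) truncated to (0, 1],
    with resid_t = Z_t - x_t' beta - theta_t, drawn by inverse CDF."""
    shape = nu + 0.5
    scale = 2.0 / np.maximum(resid ** 2, 1e-12)   # scipy uses scale = 1/rate
    u = rng.uniform(0.0, gamma.cdf(1.0, shape, scale=scale))
    return gamma.ppf(u, shape, scale=scale)

def draw_nu_slash(lam, a_nu, b_nu, rng):
    """nu | . ~ G(T + a_nu, b_nu - sum_t log lambda_t) truncated to nu > 1."""
    shape = len(lam) + a_nu
    scale = 1.0 / (b_nu - np.sum(np.log(lam)))
    u = rng.uniform(gamma.cdf(1.0, shape, scale=scale), 1.0)
    return gamma.ppf(u, shape, scale=scale)
```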
• BSSMM-VG case. As λt ∼ IG(ν/2, ν/2), the full conditional of λt is given by

\[ p(\lambda_t \mid Z_t, \theta_t, \boldsymbol{\beta}, \nu) \propto \lambda_t^{-\frac{\nu}{2} + \frac{1}{2} - 1} \exp\left\{ -\frac{1}{2}\left( \lambda_t [Z_t - \mathbf{x}_t'\boldsymbol{\beta} - \theta_t]^2 + \frac{\nu}{\lambda_t} \right) \right\}, \tag{27} \]

which is the generalized inverse Gaussian distribution, GIG(−ν/2 + 1/2, [Zt − x′tβ − θt]², ν). Assuming the prior distribution ν ∼ G(aν, bν), the full conditional of ν is

\[ p(\nu \mid \lambda_{1:T}) \propto \nu^{a_\nu - 1}\, \frac{\left(\frac{\nu}{2}\right)^{T\nu/2}}{\Gamma\!\left(\frac{\nu}{2}\right)^{T}} \exp\left\{ -\frac{\nu}{2}\left[ \sum_{t=1}^{T} \left( \frac{1}{\lambda_t} + \log \lambda_t \right) + 2 b_\nu \right] \right\} 1(0 < \nu \le 40). \tag{28} \]
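The GIG full conditional in (27) can be sampled directly, for instance with scipy. The sketch below is ours, not the paper's code; scipy's `geninvgauss(p, omega)` fixes the two exponential-rate parameters to a common omega, so the general GIG(p, a, b) requires the rescaling noted in the comments:

```python
import numpy as np
from scipy.stats import geninvgauss

def draw_lambda_vg(nu, resid, rng):
    """lambda_t | . ~ GIG(p, a, b) with p = (1 - nu)/2, a = resid_t^2, b = nu,
    i.e. density prop. to x^{p-1} exp(-(a x + b / x) / 2), as in Eq. (27).
    scipy's geninvgauss(p, omega) has density prop. to
    y^{p-1} exp(-omega (y + 1/y) / 2), so we draw Y with omega = sqrt(a b)
    and rescale: X = sqrt(b / a) * Y."""
    a = np.maximum(resid ** 2, 1e-12)   # small floor for numerical safety
    b = nu
    p = 0.5 * (1.0 - nu)
    y = geninvgauss.rvs(p, np.sqrt(a * b), random_state=rng)
    return np.sqrt(b / a) * y
```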
Thus, we sample ν using the Metropolis–Hastings algorithm as in the BSSMM-T case, with proposal density N(0,40)(µν, τν²).

References

Albert, J., Chib, S., 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88, 669–679.
Basu, S., Mukhopadhyay, S., 2000a. Bayesian analysis of binary regression using symmetric and asymmetric links. Sankhyā: The Indian Journal of Statistics, Series B 62, 372–379.
Basu, S., Mukhopadhyay, S., 2000b. Binary response regression with normal scale mixture links. In: Dey, D.K., Ghosh, S.K., Mallick, B.K. (Eds.), Generalized Linear Models: A Bayesian Perspective. Marcel Dekker, New York, pp. 231–239.
Carlin, B.P., Polson, N.G., 1992. Monte Carlo Bayesian methods for discrete regression models and categorical time series. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, Vol. 4. Clarendon Press, Oxford, UK, pp. 577–586.
Carvalho, C.M., Johannes, M., Lopes, H.F., Polson, N.G., 2010. Particle learning and smoothing. Statistical Science 25, 88–110.
Chow, S.T.B., Chan, J.S.K., 2008. Scale mixtures distributions in statistical modelling. Australian & New Zealand Journal of Statistics 50, 135–146.
Czado, C., Song, P.X.-K., 2008. State space mixed models for longitudinal observations with binary and binomial responses. Statistical Papers 49, 691–714.
de Jong, P., Shephard, N., 1995. The simulation smoother for time series models. Biometrika 82, 339–350.
Delatola, E.-I., Griffin, J.E., 2011. Bayesian nonparametric modelling of the return distribution with stochastic volatility. Bayesian Analysis 6, 901–926.
Fahrmeir, L., 1992. Posterior mode estimation by extended Kalman filtering for multivariate dynamic generalized linear models. Journal of the American Statistical Association 87, 501–509.
Fonseca, T.C.O., Ferreira, M.A.R., Migon, H.S., 2008. Objective Bayesian analysis for the Student-t regression model. Biometrika 95, 325–333.
Geweke, J., 1992. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics, Vol. 4. Clarendon Press, Oxford, UK, pp. 169–193.
Gneiting, T., Raftery, A.E., 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359–378.
Good, I.J., 1952. Rational decisions. Journal of the Royal Statistical Society, Series B 14, 107–114.
Kim, S., Shephard, N., Chib, S., 1998. Stochastic volatility: likelihood inference and comparison with ARCH models. Review of Economic Studies 65, 361–393.
Lange, K.L., Sinsheimer, J.S., 1993. Normal/independent distributions and their applications in robust regression. Journal of Computational and Graphical Statistics 2, 175–198.
Lopes, H.F., Carvalho, C.M., Johannes, M., Polson, N.G., 2011. Particle learning for sequential Bayesian computation (with discussion). In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (Eds.), Bayesian Statistics, Vol. 9. Clarendon Press, Oxford, UK, pp. 317–360.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman and Hall, London.
Naranjo, L., Martín, J., Pérez, C.J., 2012. Bayesian binary regression with exponential power link. Computational Statistics and Data Analysis. URL: http://dx.doi.org/10.1016/j.csda.2012.07.022 (forthcoming).
Pemstein, D., Quinn, K.V., Martin, A.D., 2011. The Scythe statistical library: an open source C++ library for statistical computation. Journal of Statistical Software 42, 1–26.
Smith, A.C., Shah, S.A., Hudson, A.E., Purpura, K.P., Victor, J.D., Brown, E.N., Schiff, N.D., 2009. A Bayesian statistical analysis of behavioral facilitation associated with deep brain stimulation. Journal of Neuroscience Methods 183, 267–276.
Song, P.X.-K., 2000. Monte Carlo Kalman filter and smoothing for multivariate discrete state space models. The Canadian Journal of Statistics 28, 641–652.
Stoffer, D.S., Scher, M.S., Richardson, G.A., Day, N.L., Coble, P.A., 1988. A Walsh–Fourier analysis of the effects of moderate maternal alcohol consumption on neonatal sleep-state cycling. Journal of the American Statistical Association 83, 954–963.
West, M., Harrison, J., 1997. Bayesian Forecasting and Dynamic Models, second ed. Springer-Verlag, New York.
West, M., Harrison, P.J., Migon, H.S., 1985. Dynamic generalized linear models and Bayesian forecasting (with discussion). Journal of the American Statistical Association 80, 73–97.