Geometric ergodicity of random scan Gibbs samplers for hierarchical one-way random effects models

Alicia A. Johnson (Department of Mathematics, Statistics, and Computer Science, Macalester College, United States)
Galin L. Jones (School of Statistics, University of Minnesota, United States)

Article history: Received 14 May 2014; available online 11 June 2015.

Abstract. We consider two Bayesian hierarchical one-way random effects models and establish geometric ergodicity of the corresponding random scan Gibbs samplers. Geometric ergodicity, along with a moment condition, guarantees a central limit theorem for sample means and quantiles. In addition, it ensures the consistency of various methods for estimating the variance of the asymptotic normal distribution. Thus our results make available the tools for practitioners to be as confident in inferences based on the observations from the random scan Gibbs sampler as they would be with inferences based on random samples from the posterior.

AMS subject classifications: 60J05, 60J22, 65C40, 62F15

Keywords: Markov chain Monte Carlo; Convergence; Gibbs sampling; Geometric ergodicity; One-way random effects
1. Introduction

Suppose that for $i = 1, \ldots, K$,
$$Y_i \mid \theta_i, \gamma_i \overset{\text{ind}}{\sim} N(\theta_i, \gamma_i^{-1})$$
$$\theta_i \mid \mu, \lambda_\theta, \lambda_i \overset{\text{ind}}{\sim} N(\mu, \lambda_\theta^{-1}\lambda_i^{-1}) \qquad (1)$$
$$(\mu, \lambda_\theta, \gamma_1, \ldots, \gamma_K, \lambda_1, \ldots, \lambda_K) \sim p(\mu, \lambda_\theta, \gamma_1, \ldots, \gamma_K, \lambda_1, \ldots, \lambda_K)$$
where $p$ is a generic proper prior. Eventually we will consider two distinct choices for the prior $p$, but we leave the description until Section 2. In both cases, the hierarchy in (1) yields a proper posterior which is intractable in the sense that the posterior quantities, such as expectations or quantiles, required for Bayesian inference are not available in closed form. Thus we will consider the use of Markov chain Monte Carlo (MCMC) methods. Let $y$ denote all of the data, $\lambda = (\lambda_1, \ldots, \lambda_K)^T$, $\xi = (\theta_1, \ldots, \theta_K, \mu)^T$, and $\gamma = (\gamma_1, \ldots, \gamma_K)^T$. In Section 2 we will see that the specific forms of the posterior full conditional densities $f(\xi \mid \lambda_\theta, \lambda, \gamma, y)$, $f(\lambda_\theta \mid \xi, \lambda, \gamma, y)$, $f(\lambda \mid \xi, \lambda_\theta, \gamma, y)$ and $f(\gamma \mid \xi, \lambda_\theta, \lambda, y)$ are available and hence it is easy to construct Gibbs samplers to help perform posterior inference. Gibbs samplers can be implemented in either a deterministic scan or a random scan, among other variants [22,13].
Although deterministic scan MCMC algorithms are currently popular in the statistics literature, random scan algorithms were some of the first used in MCMC settings [25,9] and remain useful in applications [20,28]. Random scan Gibbs samplers can also be implemented adaptively while the deterministic scan version cannot. In addition, there has been recent interest in the theoretical properties of random scan algorithms [2,13,18,21,33,36].

We will study the random scan Gibbs sampler, which is now described. Let $r = (r_1, r_2, r_3, r_4)$ with $r_1 + r_2 + r_3 + r_4 = 1$ and each $r_i > 0$, where we call $r$ the selection probabilities. If $(\xi^{(n)}, \lambda_\theta^{(n)}, \lambda^{(n)}, \gamma^{(n)})$ is the current state of the Gibbs sampler, then the $(n+1)$st state is obtained as follows. Draw $U \sim \text{Uniform}(0,1)$ and call the realized value $u$.

If $u \le r_1$, draw $\xi' \sim f(\xi \mid \lambda_\theta^{(n)}, \lambda^{(n)}, \gamma^{(n)}, y)$ and set
$$(\xi^{(n+1)}, \lambda_\theta^{(n+1)}, \lambda^{(n+1)}, \gamma^{(n+1)}) = (\xi', \lambda_\theta^{(n)}, \lambda^{(n)}, \gamma^{(n)});$$
else if $r_1 < u \le r_1 + r_2$, draw $\lambda_\theta' \sim f(\lambda_\theta \mid \xi^{(n)}, \lambda^{(n)}, \gamma^{(n)}, y)$ and set
$$(\xi^{(n+1)}, \lambda_\theta^{(n+1)}, \lambda^{(n+1)}, \gamma^{(n+1)}) = (\xi^{(n)}, \lambda_\theta', \lambda^{(n)}, \gamma^{(n)});$$
else if $r_1 + r_2 < u \le r_1 + r_2 + r_3$, draw $\lambda' \sim f(\lambda \mid \xi^{(n)}, \lambda_\theta^{(n)}, \gamma^{(n)}, y)$ and set
$$(\xi^{(n+1)}, \lambda_\theta^{(n+1)}, \lambda^{(n+1)}, \gamma^{(n+1)}) = (\xi^{(n)}, \lambda_\theta^{(n)}, \lambda', \gamma^{(n)});$$
else if $r_1 + r_2 + r_3 < u \le 1$, draw $\gamma' \sim f(\gamma \mid \xi^{(n)}, \lambda_\theta^{(n)}, \lambda^{(n)}, y)$ and set
$$(\xi^{(n+1)}, \lambda_\theta^{(n+1)}, \lambda^{(n+1)}, \gamma^{(n+1)}) = (\xi^{(n)}, \lambda_\theta^{(n)}, \lambda^{(n)}, \gamma').$$
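To make the update scheme concrete, here is a minimal Python sketch of a single random scan step. The sketch is ours rather than part of the original algorithm statement; the arguments `draw_xi`, `draw_lam_theta`, `draw_lam` and `draw_gamma` are hypothetical stand-ins for samplers from the four full conditionals given in Section 2.

```python
import numpy as np

rng = np.random.default_rng()

def rsgs_step(state, r, draw_xi, draw_lam_theta, draw_lam, draw_gamma):
    """One random scan Gibbs update; exactly one block is refreshed per step.

    `state` is a dict with keys 'xi', 'lam_theta', 'lam', 'gamma';
    the draw_* arguments sample from the four full conditionals.
    """
    u = rng.uniform()
    new = dict(state)  # shallow copy; all other coordinates are held fixed
    if u <= r[0]:
        new['xi'] = draw_xi(state['lam_theta'], state['lam'], state['gamma'])
    elif u <= r[0] + r[1]:
        new['lam_theta'] = draw_lam_theta(state['xi'], state['lam'], state['gamma'])
    elif u <= r[0] + r[1] + r[2]:
        new['lam'] = draw_lam(state['xi'], state['lam_theta'], state['gamma'])
    else:
        new['gamma'] = draw_gamma(state['xi'], state['lam_theta'], state['lam'])
    return new
```

Note that exactly one coordinate block is refreshed per step, which is why one sweep of the deterministic scan costs roughly the same as four such steps (a point revisited in Section 3).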
Our goal is to investigate the conditions required for the random scan Gibbs sampler to produce reliable simulation results. Specifically, we will investigate conditions under which the Markov chain is geometrically ergodic, which we now define. Let $\mathsf{X} = \mathbb{R}^K \times \mathbb{R} \times \mathbb{R}_+ \times \mathbb{R}_+^K \times \mathbb{R}_+^K$ and let $\mathcal{B}(\mathsf{X})$ denote the Borel sets. Let $P^n : \mathsf{X} \times \mathcal{B}(\mathsf{X}) \to [0,1]$ denote the $n$-step Markov kernel for the random scan Gibbs sampler, so that if $A \in \mathcal{B}(\mathsf{X})$ and $n \ge 1$,
$$P^n((\xi^{(1)}, \lambda_\theta^{(1)}, \lambda^{(1)}, \gamma^{(1)}), A) = \Pr\left((\xi^{(n+1)}, \lambda_\theta^{(n+1)}, \lambda^{(n+1)}, \gamma^{(n+1)}) \in A \mid (\xi^{(1)}, \lambda_\theta^{(1)}, \lambda^{(1)}, \gamma^{(1)})\right).$$
Let $F$ denote the posterior distribution associated with (1) and $\|\cdot\|$ denote the total variation norm. Then the random scan Gibbs sampler is geometrically ergodic if there exist $M : \mathsf{X} \to [0,\infty)$ and $t \in [0,1)$ such that for all $(\xi, \lambda_\theta, \lambda, \gamma)$ and $n = 1, 2, \ldots$,
$$\|P^n((\xi, \lambda_\theta, \lambda, \gamma), \cdot) - F(\cdot)\| \le M(\xi, \lambda_\theta, \lambda, \gamma)\, t^n. \qquad (2)$$
Geometric ergodicity is a useful stability property for MCMC samplers [16,32] in that it ensures rapid convergence of the Markov chain since $t < 1$, the existence of a central limit theorem (CLT) [1,12,14,30], and consistency of various methods to estimate asymptotically valid Monte Carlo standard errors [5,7,8,15]. To see the connection between (2) and the CLT, let $g : \mathsf{X} \to \mathbb{R}$, let $f$ be the posterior density, and suppose we want to calculate
$$\mu_g := \int_{\mathsf{X}} g(\xi, \lambda_\theta, \lambda, \gamma) f(\xi, \lambda_\theta, \lambda, \gamma \mid y)\, d\xi\, d\lambda_\theta\, d\lambda\, d\gamma.$$
Assuming $\mu_g$ exists and the Markov chain is irreducible, aperiodic and Harris recurrent (see Meyn and Tweedie [26] for definitions and Section 2 for discussion of these conditions for our two random scan Gibbs samplers), then, as $n \to \infty$,
$$\mu_n := \frac{1}{n}\sum_{i=1}^n g(\xi^{(i)}, \lambda_\theta^{(i)}, \lambda^{(i)}, \gamma^{(i)}) \to \mu_g \quad \text{with probability 1.}$$
Of course, $\mu_n$ will be much more valuable if we can equip it with an asymptotically valid standard error. If the random scan Gibbs sampler is geometrically ergodic and
$$\int_{\mathsf{X}} g^2(\xi, \lambda_\theta, \lambda, \gamma) f(\xi, \lambda_\theta, \lambda, \gamma \mid y)\, d\xi\, d\lambda_\theta\, d\lambda\, d\gamma < \infty, \qquad (3)$$
then there exists $\delta_g^2 < \infty$ such that, as $n \to \infty$ and for any initial distribution,
$$\sqrt{n}(\mu_n - \mu_g) \overset{d}{\to} N(0, \delta_g^2). \qquad (4)$$
The quantity $\delta_g^2$ is complicated [10], but if the Markov chain is geometrically ergodic there are several methods for estimating it consistently [7,12,15]. This then allows construction of asymptotically valid interval estimates of $\mu_g$ to describe the precision of $\mu_n$ [5] and hence the reliability of the simulation. A similar approach is available for estimating posterior quantiles which, of course, are often useful for constructing posterior credible intervals; see the recent work of Doss et al. [3].
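As one concrete instance of the consistent estimators referenced above, the method of nonoverlapping batch means [7,15] can be sketched in a few lines. The Python sketch below is our illustration, not code from the cited papers (the mcmcse R package mentioned in Section 3 provides a full implementation); `g_values` denotes the sequence $g(\xi^{(i)}, \lambda_\theta^{(i)}, \lambda^{(i)}, \gamma^{(i)})$, $i = 1, \ldots, n$.

```python
import numpy as np

def batch_means_se(g_values):
    """Nonoverlapping batch means estimate of the Monte Carlo standard
    error of the sample mean of g_values (one value per MCMC iteration).

    Uses the common choice of batch length floor(sqrt(n))."""
    g = np.asarray(g_values, dtype=float)
    n = g.size
    b = int(np.floor(np.sqrt(n)))        # batch length
    a = n // b                           # number of batches
    means = g[: a * b].reshape(a, b).mean(axis=1)
    var_hat = b * np.var(means, ddof=1)  # estimates delta_g^2
    return np.sqrt(var_hat / n)          # standard error of mu_n
```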
There has been a fair amount of work on establishing geometric ergodicity of two-component deterministic scan Gibbs samplers (see, among others, [11,17,24,27,29,31,34,35]), while Doss and Hobert [4] and Jones and Hobert [17] considered geometric ergodicity of three-component deterministic scan Gibbs samplers. However, there has been almost no corresponding investigation for random scan Gibbs samplers. Liu et al. [23] did study geometric convergence of random scan Gibbs samplers, but the required regularity conditions have so far precluded application of their results outside of simple settings. More recently, Johnson et al. [13] and Tan et al. [36] studied geometric ergodicity of some simple two-component random scan Gibbs samplers. In contrast, the random scan Gibbs samplers we consider have more components, making the required analysis substantially more complicated.

The rest of the paper is organized as follows. In Section 2 we fully specify our two Bayesian hierarchical models and the random scan Gibbs samplers, and present our main results, where we prove geometric ergodicity of both samplers. In Section 3 we give some discussion and context for the results. Many technical details and calculations are deferred to the appendices.

2. Two models and random scan Gibbs samplers

We now turn our attention to describing the two hierarchical models, the associated random scan Gibbs samplers (RSGS), and our main results.

2.1. Two unknown variance components

Suppose for $i = 1, \ldots, K$,
$$Y_i \mid \theta_i, \gamma_i \overset{\text{ind}}{\sim} N(\theta_i, \gamma_i^{-1})$$
$$\theta_i \mid \mu, \lambda_\theta, \lambda_i \overset{\text{ind}}{\sim} N(\mu, \lambda_\theta^{-1}\lambda_i^{-1}) \qquad (5)$$
$$\mu \sim N(m_0, s_0^{-1})$$
$$\lambda_\theta \sim \text{Gamma}(a_1, b_1), \qquad \lambda_i \overset{\text{iid}}{\sim} \text{Gamma}(a_2, b_2), \qquad \gamma_i \overset{\text{iid}}{\sim} \text{Gamma}(a_3, b_3),$$
where we assume $a_1, a_2, a_3, b_1, b_2, b_3$ and $s_0$ are known positive constants while $m_0$ is a known scalar. (Note that if $X \sim \text{Gamma}(a, b)$, then it has density proportional to $x^{a-1}e^{-bx}$ for $x > 0$.) The hierarchy results in a proper posterior density $f(\xi, \lambda_\theta, \lambda, \gamma \mid y)$ and four full conditional densities $f(\lambda_\theta \mid \xi, \lambda, \gamma)$, $f(\lambda \mid \xi, \lambda_\theta, \gamma)$, $f(\gamma \mid \xi, \lambda_\theta, \lambda)$ and $f(\xi \mid \lambda_\theta, \lambda, \gamma)$, which are given by
$$\lambda_\theta \mid \xi, \lambda, \gamma \sim \text{Gamma}\left(a_1 + \frac{K}{2},\; b_1 + \frac{1}{2}\sum_{i=1}^K \lambda_i(\theta_i - \mu)^2\right)$$
$$\lambda_i \mid \xi, \lambda_\theta, \gamma \overset{\text{ind}}{\sim} \text{Gamma}\left(a_2 + \frac{1}{2},\; b_2 + \frac{\lambda_\theta}{2}(\theta_i - \mu)^2\right) \qquad (6)$$
$$\gamma_i \mid \xi, \lambda_\theta, \lambda \overset{\text{ind}}{\sim} \text{Gamma}\left(a_3 + \frac{1}{2},\; b_3 + \frac{1}{2}(\theta_i - y_i)^2\right)$$
$$\xi \mid \lambda, \lambda_\theta, \gamma \sim N_{K+1}(\xi_0, V),$$
where the components of $\xi_0$ and $V$ are reported in Appendix C. Let $\delta$ denote Dirac's delta and $r = (r_1, r_2, r_3, r_4)$ be the selection probabilities. Define
$$k(\xi', \lambda_\theta', \lambda', \gamma' \mid \xi, \lambda_\theta, \lambda, \gamma) = r_1 f(\xi' \mid \lambda_\theta, \lambda, \gamma)\,\delta(\lambda_\theta' - \lambda_\theta)\,\delta(\lambda' - \lambda)\,\delta(\gamma' - \gamma)$$
$$\qquad + r_2 f(\lambda_\theta' \mid \xi, \lambda, \gamma)\,\delta(\xi' - \xi)\,\delta(\lambda' - \lambda)\,\delta(\gamma' - \gamma) + r_3 f(\lambda' \mid \xi, \lambda_\theta, \gamma)\,\delta(\xi' - \xi)\,\delta(\lambda_\theta' - \lambda_\theta)\,\delta(\gamma' - \gamma)$$
$$\qquad + r_4 f(\gamma' \mid \xi, \lambda_\theta, \lambda)\,\delta(\xi' - \xi)\,\delta(\lambda' - \lambda)\,\delta(\lambda_\theta' - \lambda_\theta).$$
For $A \in \mathcal{B}(\mathsf{X})$ the Markov kernel for the RSGS is given by
$$P((\xi, \lambda_\theta, \lambda, \gamma), A) = \int_A k(\xi', \lambda_\theta', \lambda', \gamma' \mid \xi, \lambda_\theta, \lambda, \gamma)\, d\xi'\, d\lambda_\theta'\, d\lambda'\, d\gamma'$$
$$= r_1 P_\xi((\lambda_\theta, \lambda, \gamma), A) + r_2 P_{\lambda_\theta}((\xi, \lambda, \gamma), A) + r_3 P_\lambda((\xi, \lambda_\theta, \gamma), A) + r_4 P_\gamma((\xi, \lambda_\theta, \lambda), A),$$
where
$$P_\xi((\lambda_\theta, \lambda, \gamma), A) = \int_{\{\xi : (\xi, \lambda_\theta, \lambda, \gamma) \in A\}} f(\xi \mid \lambda_\theta, \lambda, \gamma)\, d\xi$$
and similarly for the remaining Gibbs updates $P_{\lambda_\theta}((\xi, \lambda, \gamma), A)$, $P_\lambda((\xi, \lambda_\theta, \gamma), A)$, and $P_\gamma((\xi, \lambda_\theta, \lambda), A)$.
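Sampling from the Gamma full conditionals in (6) is direct, and the $\xi$ update is a draw from a $(K+1)$-variate normal distribution. The Python sketch below is our illustration; the precision-matrix form used for the $\xi$ update is our own derivation from the hierarchy (5) (it is consistent with the moments reported in Appendix C), and note that numpy's gamma sampler is parameterized by shape and scale, so the rate parameters below are inverted.

```python
import numpy as np

rng = np.random.default_rng()

def draw_lam_theta(theta, mu, lam, a1, b1):
    K = len(theta)
    rate = b1 + 0.5 * np.sum(lam * (theta - mu) ** 2)
    return rng.gamma(a1 + K / 2, 1.0 / rate)

def draw_lam(theta, mu, lam_theta, a2, b2):
    rate = b2 + 0.5 * lam_theta * (theta - mu) ** 2
    return rng.gamma(a2 + 0.5, 1.0 / rate)       # one draw per component

def draw_gamma(theta, y, a3, b3):
    rate = b3 + 0.5 * (theta - y) ** 2
    return rng.gamma(a3 + 0.5, 1.0 / rate)

def draw_xi(lam_theta, lam, gam, y, m0, s0):
    """Draw xi = (theta_1, ..., theta_K, mu) from its (K+1)-variate
    normal full conditional, using its precision matrix Q."""
    K = len(y)
    Q = np.zeros((K + 1, K + 1))
    Q[np.arange(K), np.arange(K)] = gam + lam_theta * lam
    Q[:K, K] = Q[K, :K] = -lam_theta * lam
    Q[K, K] = s0 + lam_theta * np.sum(lam)
    b = np.append(gam * y, s0 * m0)              # linear term of the log density
    mean = np.linalg.solve(Q, b)                 # conditional mean xi_0
    # sample N(mean, Q^{-1}) by solving against a Cholesky factor of Q
    Lc = np.linalg.cholesky(Q)
    z = rng.standard_normal(K + 1)
    return mean + np.linalg.solve(Lc.T, z)
```

Plugging these functions into the `rsgs_step` sketch from Section 1 yields a runnable version of the RSGS for model (5).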
It is well known that the RSGS kernel $P$ is reversible with respect to the posterior $F$ (see e.g. [32]) and hence that the posterior is the invariant distribution for the RSGS. Let
$$h(\xi', \lambda_\theta', \lambda', \gamma' \mid \xi, \lambda_\theta, \lambda, \gamma) = f(\xi' \mid \lambda_\theta, \lambda, \gamma)\, f(\lambda_\theta' \mid \xi', \lambda, \gamma)\, f(\lambda' \mid \xi', \lambda_\theta', \gamma)\, f(\gamma' \mid \xi', \lambda_\theta', \lambda').$$
Note that $h$ is a strictly positive density on $\mathsf{X} = \mathbb{R}^K \times \mathbb{R} \times \mathbb{R}_+ \times \mathbb{R}_+^K \times \mathbb{R}_+^K$; in fact, $h$ is the Markov transition density for the deterministic scan Gibbs sampler. Next observe that for $A \in \mathcal{B}(\mathsf{X})$,
$$P^4((\xi, \lambda_\theta, \lambda, \gamma), A) \ge r_1 r_2 r_3 r_4 \int_A h(\xi', \lambda_\theta', \lambda', \gamma' \mid \xi, \lambda_\theta, \lambda, \gamma)\, d\xi'\, d\lambda_\theta'\, d\lambda'\, d\gamma'.$$
It follows immediately that the RSGS is irreducible, aperiodic and Harris recurrent.

We turn our attention to establishing geometric ergodicity of the RSGS. The tool we will use to do so is Lemma 15.2.8 in Meyn and Tweedie [26]. Geometric ergodicity will follow from finding a function $V : \mathbb{R}^{K+1} \times \mathbb{R}_+ \times \mathbb{R}_+^K \times \mathbb{R}_+^K \to \mathbb{R}_+$ which is unbounded off compact sets (that is, the level set $\{(\xi, \lambda_\theta, \lambda, \gamma) : V(\xi, \lambda_\theta, \lambda, \gamma) \le T\}$ is compact for every $T > 0$) and constants $0 < \rho < 1$ and $L < \infty$ such that
$$E[V(\xi', \lambda_\theta', \lambda', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \rho V(\xi, \lambda_\theta, \lambda, \gamma) + L. \qquad (7)$$
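For intuition about why a drift condition of the form (7) controls the chain, note that iterating (7) and applying the tower property gives, for any $n \ge 1$,
$$E[V(\xi^{(n)}, \lambda_\theta^{(n)}, \lambda^{(n)}, \gamma^{(n)}) \mid \xi^{(0)}, \lambda_\theta^{(0)}, \lambda^{(0)}, \gamma^{(0)}] \le \rho^n V(\xi^{(0)}, \lambda_\theta^{(0)}, \lambda^{(0)}, \gamma^{(0)}) + L(1 + \rho + \cdots + \rho^{n-1}) \le \rho^n V(\xi^{(0)}, \lambda_\theta^{(0)}, \lambda^{(0)}, \gamma^{(0)}) + \frac{L}{1 - \rho}.$$
This observation is standard and is not part of the formal argument below; it simply records that the chain drifts geometrically quickly back toward the compact level sets of $V$, which is the behavior that Lemma 15.2.8 of Meyn and Tweedie [26] converts into the geometric bound (2).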
See Jones and Hobert [16] for an accessible treatment of the connection between drift conditions, such as the one in (7), and geometric ergodicity.

Theorem 1. If $2a_1 + K - 2 > 0$ and $a_3 > 1$, then the RSGS is geometrically ergodic.

Proof. Define $V = \sum_{i=1}^{18} A_i w_i$, where the $w_i$ are defined as follows and the $A_i$ are positive constants that will be determined later. For $0 < c < \min\{b_1, b_2, b_3\}$,
$$w_1(\lambda_\theta) = \lambda_\theta^{-1} \qquad w_2(\lambda) = \sum_{i=1}^K \lambda_i^{-1/2} \qquad w_3(\xi) = \sum_{i=1}^K (\theta_i - \mu)^2$$
$$w_4(\xi, \gamma) = \sum_{i=1}^K \gamma_i(\theta_i - y_i)^2 \qquad w_5(\lambda_\theta) = e^{c\lambda_\theta} \qquad w_6(\lambda) = \sum_{i=1}^K e^{c\lambda_i}$$
$$w_7(\xi, \lambda_\theta) = \sum_{i=1}^K \left[\lambda_\theta(\theta_i - \mu)^2 + 2b_2\right]^{1/2} \qquad w_8(\xi, \lambda) = \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2 \qquad w_9(\lambda_\theta) = \lambda_\theta$$
$$w_{10}(\gamma) = \sum_{i=1}^K e^{c\gamma_i} \qquad w_{11}(\gamma) = \sum_{i=1}^K \gamma_i \qquad w_{12}(\lambda_\theta, \gamma) = \lambda_\theta \sum_{i=1}^K \gamma_i^{-1}$$
$$w_{13}(\lambda, \gamma) = \sum_{i=1}^K \lambda_i \gamma_i^{-1} \qquad w_{14}(\gamma) = \sum_{i=1}^K \gamma_i^{-1} \qquad w_{15}(\xi, \lambda_\theta) = \lambda_\theta \sum_{i=1}^K (\theta_i - y_i)^2$$
$$w_{16}(\lambda) = \sum_{i=1}^K \lambda_i \qquad w_{17}(\xi, \lambda) = \sum_{i=1}^K \lambda_i(\theta_i - y_i)^2 \qquad w_{18}(\xi) = \sum_{i=1}^K (\theta_i - y_i)^2.$$
We need to show that $V$ is unbounded off compact sets; that is, for every $T > 0$ the level set $L_T = \{(\xi, \lambda_\theta, \lambda, \gamma) : V(\xi, \lambda_\theta, \lambda, \gamma) \le T\}$ is compact. Let $T > 0$ be arbitrary and observe that $w_1 \to \infty$ as $\lambda_\theta \to 0$ and $w_5 \to \infty$ as $\lambda_\theta \to \infty$; $w_2 \to \infty$ as $\lambda_i \to 0$ and $w_6 \to \infty$ as $\lambda_i \to \infty$; $w_{14} \to \infty$ as $\gamma_i \to 0$ and $w_{10} \to \infty$ as $\gamma_i \to \infty$; $w_{18} \to \infty$ as $|\theta_i| \to \infty$; and, since each $\theta_i$ is bounded on $L_T$, $w_3 \to \infty$ as $|\mu| \to \infty$. It now follows from the continuity of $V$ that $L_T$ is closed and bounded, hence compact.

Now all that is left is to establish (7). It is sufficient to consider only $r_1 = r_2 = r_3 = r_4 = 1/4$, since if the RSGS is geometrically ergodic for some selection probabilities it is geometrically ergodic for all selection probabilities (Proposition 19 of [18]). Next observe that, due to the form of $k(\xi', \lambda_\theta', \lambda', \gamma' \mid \xi, \lambda_\theta, \lambda, \gamma)$, the required expectation can be expressed as
$$E[V(\xi', \lambda_\theta', \lambda', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{4}\int V(\xi', \lambda_\theta, \lambda, \gamma) f(\xi' \mid \lambda_\theta, \lambda, \gamma)\, d\xi' + \frac{1}{4}\int V(\xi, \lambda_\theta', \lambda, \gamma) f(\lambda_\theta' \mid \xi, \lambda, \gamma)\, d\lambda_\theta'$$
$$\qquad + \frac{1}{4}\int V(\xi, \lambda_\theta, \lambda', \gamma) f(\lambda' \mid \xi, \lambda_\theta, \gamma)\, d\lambda' + \frac{1}{4}\int V(\xi, \lambda_\theta, \lambda, \gamma') f(\gamma' \mid \xi, \lambda_\theta, \lambda)\, d\gamma'.$$
Hence we need to calculate expectations of the $w_i$ with respect to the posterior full conditional densities. These calculations are done in Appendix A, yielding
$$E[V(\xi', \lambda_\theta', \lambda', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] \le c_1 w_1(\lambda_\theta) + \cdots + c_{18} w_{18}(\xi) + L,$$
where $c_i = 3A_i/4$ for $i = 1, 2, 5, 6, 10$, $c_4 = A_4/2$,
$$c_3 = \frac{3}{4}A_3 + \frac{2a_1 + K}{8b_1}A_7 + \frac{2a_2 + 1}{8b_2}A_8 \qquad c_7 = \frac{1}{2}A_7 + \frac{\Gamma(a_2)}{4\sqrt{2}\,\Gamma(a_2 + 1/2)}A_2$$
$$c_8 = \frac{1}{2}A_8 + \frac{1}{4(2a_1 + K - 2)}A_1 \qquad c_9 = \frac{3}{4}A_9 + \frac{K(2s_0^{-1} + \Delta^2)}{4}A_7 + \frac{Kb_3}{2(2a_3 - 1)}A_{12} + \frac{K(s_0^{-1} + \Delta^2)}{4}A_{15}$$
$$c_{11} = \frac{3}{4}A_{11} + \frac{s_0^{-1} + \Delta^2}{4}A_4 \qquad c_{12} = \frac{1}{2}A_{12} + \frac{1}{4}A_7 + \frac{1}{4}A_{15} \qquad c_{13} = \frac{1}{2}A_{13} + \frac{1}{4}A_8 + \frac{1}{4}A_{17}$$
$$c_{14} = \frac{3}{4}A_{14} + \frac{1}{4}A_3 + \frac{2a_1 + K}{8b_1}A_{12} + \frac{2a_2 + 1}{8b_2}A_{13} + \frac{1}{4}A_{18} \qquad c_{15} = \frac{1}{2}A_{15} + \frac{1}{4(2a_3 - 1)}A_{12}$$
$$c_{16} = \frac{3}{4}A_{16} + \frac{2s_0^{-1} + \Delta^2}{4}A_8 + \frac{b_3}{2(2a_3 - 1)}A_{13} + \frac{s_0^{-1} + \Delta^2}{4}A_{17} \qquad c_{17} = \frac{1}{2}A_{17} + \frac{1}{4(2a_3 - 1)}A_{13}$$
$$c_{18} = \frac{3}{4}A_{18} + \frac{2a_3 + 1}{8b_3}A_4 + \frac{1}{4(2a_3 - 1)}A_{14} + \frac{2a_1 + K}{8b_1}A_{15} + \frac{2a_2 + 1}{8b_2}A_{17}$$
and
$$L = \frac{b_1}{2(2a_1 + K - 2)}A_1 + \frac{K}{4}\left(\frac{2}{s_0} + \Delta^2\right)A_3 + \frac{K}{4}A_4 + \frac{1}{4}\left(\frac{b_1}{b_1 - c}\right)^{a_1 + K/2}A_5 + \frac{K}{4}\left(\frac{b_2}{b_2 - c}\right)^{a_2 + 1/2}A_6$$
$$\qquad + \frac{K(2b_2 + 1)}{2}A_7 + \frac{2a_1 + K}{8b_1}A_9 + \frac{K}{4}\left(\frac{b_3}{b_3 - c}\right)^{a_3 + 1/2}A_{10} + \frac{K(2a_3 + 1)}{8b_3}A_{11} + \frac{Kb_3}{2(2a_3 - 1)}A_{14}$$
$$\qquad + \frac{K(2a_2 + 1)}{8b_2}A_{16} + \frac{K}{4}\left(\frac{1}{s_0} + \Delta^2\right)A_{18}.$$
Notice that our assumption that $2a_1 + K - 2 > 0$ and $a_3 > 1$ ensures that all of the above quantities are well defined. To establish (7) we need to establish the existence of $\rho < 1$. Recall that we still have not chosen the values of the $A_i$. Let $A_4 > 0$, $A_5 > 0$, $A_6 > 0$, $A_7 = A_8 = 1$, $A_{10} > 0$,
$$0 < A_1 < 2(2a_1 + K - 2), \qquad 0 < A_2 < \frac{2\sqrt{2}\,\Gamma(a_2 + 1/2)}{\Gamma(a_2)}, \qquad A_3 > \frac{2a_1 + K}{2b_1} + \frac{2a_2 + 1}{2b_2},$$
$$A_{15} > \frac{1}{4(2a_3 - 1) - 1}, \qquad \frac{1 + A_{15}}{2} < A_{12} < 2(2a_3 - 1)A_{15},$$
$$A_{17} > \frac{1}{4(2a_3 - 1) - 1}, \qquad \frac{1 + A_{17}}{2} < A_{13} < 2(2a_3 - 1)A_{17},$$
$$A_9 > K\left[\frac{2}{s_0} + \Delta^2 + \frac{2b_3}{2a_3 - 1}A_{12} + \left(\frac{1}{s_0} + \Delta^2\right)A_{15}\right], \qquad A_{11} > \left(\frac{1}{s_0} + \Delta^2\right)A_4,$$
$$A_{16} > \frac{2}{s_0} + \Delta^2 + \frac{2b_3}{2a_3 - 1}A_{13} + \left(\frac{1}{s_0} + \Delta^2\right)A_{17},$$
$$A_3 + \frac{2a_1 + K}{2b_1}A_{12} + \frac{2a_2 + 1}{2b_2}A_{13} + A_{18} < A_{14} < (2a_3 - 1)\left[A_{18} - \frac{2a_3 + 1}{2b_3}A_4 - \frac{2a_1 + K}{2b_1}A_{15} - \frac{2a_2 + 1}{2b_2}A_{17}\right]$$
and
$$A_{18} > \frac{1}{2a_3 - 2}\left(A_3 + \frac{2a_1 + K}{2b_1}A_{12} + \frac{2a_2 + 1}{2b_2}A_{13}\right) + \frac{2a_3 - 1}{2a_3 - 2}\left(\frac{2a_3 + 1}{2b_3}A_4 + \frac{2a_1 + K}{2b_1}A_{15} + \frac{2a_2 + 1}{2b_2}A_{17}\right).$$
The conditions on $A_1, \ldots, A_{18}$ ensure the existence of $\rho$ such that
$$\max\left\{\frac{c_1}{A_1}, \frac{c_2}{A_2}, \ldots, \frac{c_{18}}{A_{18}}\right\} \le \rho < 1,$$
and hence we have shown the existence of $0 < \rho < 1$ and an $L < \infty$ such that
$$E[V(\xi', \lambda_\theta', \lambda', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] \le c_1 w_1(\lambda_\theta) + \cdots + c_{18} w_{18}(\xi) + L$$
$$= \frac{c_1}{A_1}A_1 w_1(\lambda_\theta) + \cdots + \frac{c_{18}}{A_{18}}A_{18} w_{18}(\xi) + L$$
$$\le \max\left\{\frac{c_1}{A_1}, \frac{c_2}{A_2}, \ldots, \frac{c_{18}}{A_{18}}\right\} V(\xi, \lambda_\theta, \lambda, \gamma) + L$$
$$\le \rho V(\xi, \lambda_\theta, \lambda, \gamma) + L.$$
We conclude that the RSGS is geometrically ergodic.
2.2. One unknown variance component

Doss and Hobert [4] consider a hierarchical model which is a special case of the model presented in the previous section, that is, (5). Let $\lambda = (\lambda_1, \ldots, \lambda_K)^T$ and suppose that for $i = 1, \ldots, K$,
$$Y_i \mid \theta_i \overset{\text{ind}}{\sim} N(\theta_i, \gamma_i^{-1})$$
$$\theta_i \mid \mu, \lambda_\theta, \lambda_i \overset{\text{ind}}{\sim} N(\mu, \lambda_\theta^{-1}\lambda_i^{-1}) \qquad (8)$$
$$\mu \sim N(m_0, s_0^{-1}), \qquad \lambda_\theta \sim \text{Gamma}(a_1, b_1), \qquad \lambda_i \overset{\text{iid}}{\sim} \text{Gamma}(a_2, b_2),$$
where the $\gamma_i$ are known and positive, as are $s_0$, $a_1$, $a_2$, $b_1$ and $b_2$. Also, $m_0 \in \mathbb{R}$ is known. Doss and Hobert [4] use this model in the context of meta-analysis where it can be reasonable to assume the $\gamma_i$ are known. The posterior density is $f(\xi, \lambda_\theta, \lambda \mid y)$, yielding full conditional distributions
$$\lambda_\theta \mid \xi, \lambda \sim \text{Gamma}\left(a_1 + \frac{K}{2},\; b_1 + \frac{1}{2}\sum_{i=1}^K \lambda_i(\theta_i - \mu)^2\right)$$
$$\lambda_i \mid \xi, \lambda_\theta \overset{\text{ind}}{\sim} \text{Gamma}\left(a_2 + \frac{1}{2},\; b_2 + \frac{\lambda_\theta}{2}(\theta_i - \mu)^2\right) \qquad (9)$$
$$\xi \mid \lambda, \lambda_\theta \sim N_{K+1}(\xi_0, V),$$
where the components of $\xi_0$ and $V$ are reported in Appendix C. Notice that, due to the conditional independence assumptions, these are the same as the full conditionals in (6). Let $\delta$ denote Dirac's delta and $r = (r_1, r_2, r_3)$ denote the selection probabilities. Define
$$k(\xi', \lambda_\theta', \lambda' \mid \xi, \lambda_\theta, \lambda) = r_1 f(\xi' \mid \lambda_\theta, \lambda)\,\delta(\lambda_\theta' - \lambda_\theta)\,\delta(\lambda' - \lambda) + r_2 f(\lambda_\theta' \mid \xi, \lambda)\,\delta(\xi' - \xi)\,\delta(\lambda' - \lambda) + r_3 f(\lambda' \mid \xi, \lambda_\theta)\,\delta(\xi' - \xi)\,\delta(\lambda_\theta' - \lambda_\theta),$$
so that if $A \in \mathcal{B}(\mathsf{X})$ then the Markov kernel for the RSGS is given by
$$P((\xi, \lambda_\theta, \lambda), A) = \int_A k(\xi', \lambda_\theta', \lambda' \mid \xi, \lambda_\theta, \lambda)\, d\xi'\, d\lambda_\theta'\, d\lambda' = r_1 P_\xi((\lambda_\theta, \lambda), A) + r_2 P_{\lambda_\theta}((\xi, \lambda), A) + r_3 P_\lambda((\xi, \lambda_\theta), A),$$
where
$$P_\xi((\lambda_\theta, \lambda), A) = \int_{\{\xi : (\xi, \lambda_\theta, \lambda) \in A\}} f(\xi \mid \lambda_\theta, \lambda)\, d\xi$$
and similarly for the remaining Gibbs updates $P_{\lambda_\theta}((\xi, \lambda), A)$ and $P_\lambda((\xi, \lambda_\theta), A)$.
It is well known that the RSGS kernel $P$ is reversible with respect to the posterior (see e.g. [32]) and hence that the posterior is the invariant distribution for the RSGS. Let
$$h(\xi', \lambda_\theta', \lambda' \mid \xi, \lambda_\theta, \lambda) = f(\xi' \mid \lambda_\theta, \lambda)\, f(\lambda_\theta' \mid \xi', \lambda)\, f(\lambda' \mid \xi', \lambda_\theta'),$$
which is a strictly positive density on $\mathsf{X} = \mathbb{R}^K \times \mathbb{R} \times \mathbb{R}_+ \times \mathbb{R}_+^K$. Next observe that for $A \in \mathcal{B}(\mathsf{X})$,
$$P^3((\xi, \lambda_\theta, \lambda), A) \ge r_1 r_2 r_3 \int_A h(\xi', \lambda_\theta', \lambda' \mid \xi, \lambda_\theta, \lambda)\, d\xi'\, d\lambda_\theta'\, d\lambda',$$
and it follows that the RSGS is irreducible, aperiodic and Harris recurrent. As in Section 2.1 we will establish geometric ergodicity using Lemma 15.2.8 from Meyn and Tweedie [26]. Specifically, we construct a function $V : \mathbb{R}^{K+1} \times \mathbb{R}_+ \times \mathbb{R}_+^K \to \mathbb{R}_+$ which is unbounded off compact sets and constants $0 < \rho < 1$ and $L < \infty$ such that
$$E[V(\xi', \lambda_\theta', \lambda') \mid \xi, \lambda_\theta, \lambda] \le \rho V(\xi, \lambda_\theta, \lambda) + L. \qquad (10)$$
Theorem 2. If $2a_1 + K - 2 > 0$, then the RSGS is geometrically ergodic.

Proof. Define $V = \sum_{i=1}^{10} A_i w_i$, where the constants $A_i$ are to be determined below and the functions $w_i$ are as follows. Letting $0 < c < b_1 \wedge b_2$, $\alpha_i = 1/\gamma_i + 2/s_0 + \Delta^2$ and $\alpha = \sum_{i=1}^K \alpha_i$, set
$$w_1(\lambda_\theta) = \lambda_\theta^{-1} \qquad w_2(\lambda) = \sum_{i=1}^K \lambda_i^{-1/2} \qquad w_3(\xi) = \sum_{i=1}^K (\theta_i - \mu)^2 \qquad w_4(\xi) = \sum_{i=1}^K \gamma_i(\theta_i - y_i)^2$$
$$w_5(\lambda_\theta) = e^{c\lambda_\theta} \qquad w_6(\lambda) = \sum_{i=1}^K e^{c\lambda_i} \qquad w_7(\xi, \lambda_\theta) = \sum_{i=1}^K \left[\lambda_\theta(\theta_i - \mu)^2 + 2b_2\right]^{1/2}$$
$$w_8(\xi, \lambda) = \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2 \qquad w_9(\lambda_\theta) = \lambda_\theta \qquad w_{10}(\lambda) = \sum_{i=1}^K \alpha_i \lambda_i.$$
We begin by showing that $V$ is unbounded off compact sets, that is, that the set $L_T = \{(\xi, \lambda_\theta, \lambda) : V(\xi, \lambda_\theta, \lambda) \le T\}$ is compact for every $T > 0$. The argument is the same as one given by Doss and Hobert [4] in their proof of geometric ergodicity of the deterministic scan Gibbs sampler, but is included here for the sake of completeness since their drift function did not have our $w_8$, $w_9$ and $w_{10}$. Let $T > 0$ be arbitrary and observe that $w_5 \to \infty$ and $w_9 \to \infty$ as $\lambda_\theta \to \infty$ while $w_1 \to \infty$ as $\lambda_\theta \to 0$; $w_6 \to \infty$ and $w_{10} \to \infty$ as $\lambda_i \to \infty$ while $w_2 \to \infty$ as $\lambda_i \to 0$; $w_4 \to \infty$ as $|\theta_i| \to \infty$; and, since each $\theta_i$ is bounded on $L_T$, we see that $w_3 \to \infty$ and $w_8 \to \infty$ as $|\mu| \to \infty$. It now follows from the continuity of $V$ that $L_T$ is closed and bounded, hence compact.

Now we need to establish (10). It is sufficient to consider only $r_1 = r_2 = r_3 = 1/3$, since if the RSGS is geometrically ergodic for some selection probabilities it is geometrically ergodic for all selection probabilities (Proposition 19 of [18]). Notice that
$$E[V(\xi', \lambda_\theta', \lambda') \mid \xi, \lambda_\theta, \lambda] = r_1 \int V(\xi', \lambda_\theta, \lambda) f(\xi' \mid \lambda_\theta, \lambda)\, d\xi' + r_2 \int V(\xi, \lambda_\theta', \lambda) f(\lambda_\theta' \mid \xi, \lambda)\, d\lambda_\theta' + r_3 \int V(\xi, \lambda_\theta, \lambda') f(\lambda' \mid \xi, \lambda_\theta)\, d\lambda'.$$
Hence we need to calculate expectations of the $w_i$ with respect to the full conditional distributions. We calculate the required expectations in Appendix B. If
$$L = \frac{2b_1}{3(2a_1 + K - 2)}A_1 + \frac{\alpha}{3}A_3 + \frac{1}{3}\left[K + (\Delta^2 + s_0^{-1})\sum_{i=1}^K \gamma_i\right]A_4 + \frac{1}{3}\left(\frac{b_1}{b_1 - c}\right)^{a_1 + K/2}A_5$$
$$\qquad + \frac{K}{3}\left(\frac{b_2}{b_2 - c}\right)^{a_2 + 1/2}A_6 + \frac{2K(2b_2 + 1)}{3}A_7 + \frac{2a_1 + K}{6b_1}A_9 + \frac{(2a_2 + 1)\alpha}{6b_2}A_{10}$$
and
$$c_1 = \frac{2}{3}A_1 \qquad c_2 = \frac{2}{3}A_2 \qquad c_3 = \frac{2}{3}A_3 + \frac{2a_1 + K}{6b_1}A_7 + \frac{2a_2 + 1}{6b_2}A_8 \qquad c_4 = \frac{2}{3}A_4$$
$$c_5 = \frac{2}{3}A_5 \qquad c_6 = \frac{2}{3}A_6 \qquad c_7 = \frac{1}{3}A_7 + \frac{\Gamma(a_2)}{3\sqrt{2}\,\Gamma(a_2 + 1/2)}A_2 \qquad c_8 = \frac{1}{3}A_8 + \frac{1}{3(2a_1 + K - 2)}A_1$$
$$c_9 = \frac{2}{3}A_9 + \frac{\alpha}{3}A_7 \qquad c_{10} = \frac{2}{3}A_{10} + \frac{1}{3}A_8,$$
then our calculations yield
$$E[V(\xi', \lambda_\theta', \lambda') \mid \xi, \lambda_\theta, \lambda] \le c_1 w_1(\lambda_\theta) + \cdots + c_{10} w_{10}(\lambda) + L.$$
Now we need to choose the $A_i$ so as to ensure the existence of $\rho < 1$ in (10). To this end let $A_4 > 0$, $A_5 > 0$, $A_6 > 0$, $A_7 = A_8 = 1$, $A_9 > \alpha$, $A_{10} > 1$,
$$0 < A_1 < 2(2a_1 + K - 2), \qquad 0 < A_2 < \frac{2\sqrt{2}\,\Gamma(a_2 + 1/2)}{\Gamma(a_2)} \qquad \text{and} \qquad A_3 > \frac{2a_1 + K}{2b_1} + \frac{2a_2 + 1}{2b_2}.$$
Then there exists $\rho$ such that
$$\max\left\{\frac{c_1}{A_1}, \ldots, \frac{c_{10}}{A_{10}}\right\} \le \rho < 1.$$
Hence we have shown that
$$E[V(\xi', \lambda_\theta', \lambda') \mid \xi, \lambda_\theta, \lambda] \le c_1 w_1(\lambda_\theta) + \cdots + c_{10} w_{10}(\lambda) + L$$
$$= \frac{c_1}{A_1}A_1 w_1(\lambda_\theta) + \cdots + \frac{c_{10}}{A_{10}}A_{10} w_{10}(\lambda) + L$$
$$\le \max\left\{\frac{c_1}{A_1}, \ldots, \frac{c_{10}}{A_{10}}\right\} V(\xi, \lambda_\theta, \lambda) + L$$
$$\le \rho V(\xi, \lambda_\theta, \lambda) + L.$$
We conclude that the RSGS is geometrically ergodic.
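Before turning to the discussion, the sketches given earlier can be assembled into a hypothetical end-to-end driver: run the RSGS for model (5) and attach a batch means standard error to the posterior mean estimate of $\mu$. This is our illustration only; the data and hyperparameter values are placeholders (chosen so that $2a_1 + K - 2 > 0$ and $a_3 > 1$, as Theorem 1 requires), and it assumes the functions `rsgs_step`, `batch_means_se` and the `draw_*` samplers defined in the sketches above.

```python
import numpy as np

# Placeholder data and hyperparameters (K = 4, a1 = 2, a3 = 2 satisfy Theorem 1).
y = np.array([1.2, 0.4, -0.3, 2.1])
K = len(y)
a1 = b1 = a2 = b2 = a3 = b3 = 2.0
m0, s0 = 0.0, 1.0
r = (0.25, 0.25, 0.25, 0.25)

state = {'xi': np.zeros(K + 1), 'lam_theta': 1.0,
         'lam': np.ones(K), 'gamma': np.ones(K)}
mu_draws = []
for _ in range(10000):
    state = rsgs_step(
        state, r,
        draw_xi=lambda lt, lam, gam: draw_xi(lt, lam, gam, y, m0, s0),
        draw_lam_theta=lambda xi, lam, gam: draw_lam_theta(xi[:K], xi[K], lam, a1, b1),
        draw_lam=lambda xi, lt, gam: draw_lam(xi[:K], xi[K], lt, a2, b2),
        draw_gamma=lambda xi, lt, lam: draw_gamma(xi[:K], y, a3, b3),
    )
    mu_draws.append(state['xi'][K])   # mu is the last component of xi

print(np.mean(mu_draws), batch_means_se(mu_draws))
```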
3. Discussion

MCMC methods can produce asymptotically valid point estimates of posterior quantities such as means and quantiles under benign regularity conditions. Indeed, these conditions often hold by construction of the algorithm. However, ensuring reliability of the simulation requires some idea of the quality of the estimation, and this is best addressed through the construction of an asymptotically valid Monte Carlo standard error [5,16]. This requires a central limit theorem as in (4) and methods to consistently estimate the variance of the asymptotic normal distribution [15]. In general, it is much more difficult to establish a central limit theorem for Markov chains than for a random sample. However, when the Markov chain is reversible and geometrically ergodic, the same moment condition is required for both settings; recall (3). By establishing geometric ergodicity the practitioner has all of the tools available to perform rigorous output analysis to describe the quality of the estimation procedure. See Flegal and Hughes [6] for an R package which allows easy implementation of the suggested methodology.

We have focused on establishing geometric ergodicity of random scan Gibbs samplers for two Bayesian one-way random effects models. While the random scan Gibbs sampler is as easy to implement as the deterministic scan version, there has been little work comparing the two methods. However, Roberts and Rosenthal [33] have recently shown that in some examples the random scan version is more efficient while in other examples the deterministic scan is more efficient. Unfortunately, their results do not apply to the Gibbs samplers considered here. Generally, comparison of random scan with deterministic scan is difficult. There seem to be two technical reasons for this. First, the computational cost to complete one sweep of the deterministic scan algorithm is essentially the same as that required to complete 4 steps of the random scan version in Section 2.1 and 3 steps of the random scan version in Section 2.2. Second, the random scan Gibbs sampler is reversible with respect to the posterior while the deterministic scan Gibbs sampler is not. However, there has been speculation [13] that random scan Gibbs samplers and deterministic scan Gibbs samplers converge at the same qualitative rate in the two-component setting. There has been some progress in this direction [36], but little is known in the case where there are more than two components.
The RSGS considered in Section 2.1 has four components, and it is unknown whether the corresponding deterministic scan Gibbs sampler is geometrically ergodic. However, Hobert and Geyer [11] and Jones and Hobert [17] considered deterministic scan Gibbs samplers for the same hierarchy as in (5) except that they assumed each $\lambda_i = 1$, and hence our work required much new analysis. Interestingly, the conditions on the sample size in our Theorem 1 are slightly weaker than those obtained by either Hobert and Geyer [11] or Jones and Hobert [17] for the deterministic scan Gibbs sampler.

Doss and Hobert [4] consider the setting of Section 2.2 and prove that the deterministic scan Gibbs sampler is geometrically ergodic when $a_2 = b_2 = d/2$, $d \ge 1$ and $2a_1 + K - 2 > 0$. However, we think that their restrictions on $a_2$ and $b_2$ play no role in their argument. That is, the sufficient conditions for geometric ergodicity of the RSGS are the same as those Doss and Hobert [4] obtained for geometric ergodicity of the deterministic scan Gibbs sampler.

We note that establishing geometric ergodicity of random scan Gibbs samplers also has implications for other MCMC algorithms. For example, Latuszynski et al. [19] require the standard RSGS to be geometrically ergodic to implement asymptotically valid adaptive RSGS algorithms. Thus our work allows rigorous implementation of adaptive RSGS algorithms. Finally, our results imply geometric ergodicity of the class of random scan Metropolis-Hastings-within-Gibbs samplers where each ratio of the proposal density to the target full conditional is bounded; see Theorem 9 in Jones et al. [18] for the details.

Acknowledgments

The second author's research was supported by the National Science Foundation (DMS-13-10096) and the National Institutes of Health (NIBIB R01 EB012547).

Appendix A. Conditional expectations for the proof of Theorem 1

This section contains the calculation of the conditional expectations required for the proof of Theorem 1. Some supplementary calculations are given in Appendix C. Note that
$$E[w_1(\lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_1(\lambda_\theta) + \frac{1}{4}E[(\lambda_\theta')^{-1} \mid \xi, \lambda, \gamma] = \frac{3}{4}w_1(\lambda_\theta) + \frac{1}{4(2a_1 + K - 2)}w_8(\xi, \lambda) + \frac{b_1}{2(2a_1 + K - 2)},$$
$$E[w_2(\lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_2(\lambda) + \frac{1}{4}\sum_{i=1}^K E[(\lambda_i')^{-1/2} \mid \xi, \lambda_\theta, \gamma] = \frac{3}{4}w_2(\lambda) + \frac{1}{4}\sum_{i=1}^K \frac{\Gamma(a_2)}{\Gamma(a_2 + 1/2)}\left(b_2 + \frac{\lambda_\theta}{2}(\theta_i - \mu)^2\right)^{1/2} = \frac{3}{4}w_2(\lambda) + \frac{\Gamma(a_2)}{4\sqrt{2}\,\Gamma(a_2 + 1/2)}w_7(\xi, \lambda_\theta)$$
and
$$E[w_3(\xi') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_3(\xi) + \frac{1}{4}\sum_{i=1}^K E[(\theta_i' - \mu')^2 \mid \lambda_\theta, \lambda, \gamma] \le \frac{3}{4}w_3(\xi) + \frac{1}{4}\sum_{i=1}^K \left(\frac{1}{\gamma_i} + \frac{2}{s_0} + \Delta^2\right) \quad \text{by (12)}$$
$$= \frac{3}{4}w_3(\xi) + \frac{1}{4}w_{14}(\gamma) + \frac{K}{4}\left(\frac{2}{s_0} + \Delta^2\right).$$
Now
$$E[w_4(\xi', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_4(\xi, \gamma) + \frac{1}{4}E[w_4(\xi', \gamma) \mid \lambda_\theta, \lambda, \gamma] + \frac{1}{4}E[w_4(\xi, \gamma') \mid \xi, \lambda_\theta, \lambda],$$
and hence we consider the two expectations on the right individually:
$$E[w_4(\xi', \gamma) \mid \lambda_\theta, \lambda, \gamma] = \sum_{i=1}^K \gamma_i E[(\theta_i' - y_i)^2 \mid \lambda_\theta, \lambda, \gamma] \le \sum_{i=1}^K \gamma_i\left(\frac{1}{\gamma_i} + \frac{1}{s_0} + \Delta^2\right) = K + (s_0^{-1} + \Delta^2)w_{11}(\gamma) \quad \text{by (13)}$$
and
$$E[w_4(\xi, \gamma') \mid \xi, \lambda_\theta, \lambda] = \sum_{i=1}^K (\theta_i - y_i)^2 E[\gamma_i' \mid \xi, \lambda_\theta, \lambda] = \sum_{i=1}^K (\theta_i - y_i)^2 \frac{2a_3 + 1}{2b_3 + (\theta_i - y_i)^2} \le \frac{2a_3 + 1}{2b_3}w_{18}(\xi).$$
Therefore,
$$E[w_4(\xi', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{1}{2}w_4(\xi, \gamma) + \frac{s_0^{-1} + \Delta^2}{4}w_{11}(\gamma) + \frac{2a_3 + 1}{8b_3}w_{18}(\xi) + \frac{K}{4}.$$
Recall that $0 < c < \min\{b_1, b_2, b_3\}$. Thus
$$E[w_5(\lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_5(\lambda_\theta) + \frac{1}{4}E[e^{c\lambda_\theta'} \mid \xi, \lambda, \gamma] = \frac{3}{4}w_5(\lambda_\theta) + \frac{1}{4}\left(\frac{2b_1 + \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2}{2b_1 + \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2 - 2c}\right)^{a_1 + K/2} \le \frac{3}{4}w_5(\lambda_\theta) + \frac{1}{4}\left(\frac{b_1}{b_1 - c}\right)^{a_1 + K/2}.$$
Similarly,
$$E[w_6(\lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{3}{4}w_6(\lambda) + \frac{K}{4}\left(\frac{b_2}{b_2 - c}\right)^{a_2 + 1/2}.$$
Next,
$$E[w_7(\xi', \lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_7(\xi, \lambda_\theta) + \frac{1}{4}E[w_7(\xi', \lambda_\theta) \mid \lambda_\theta, \lambda, \gamma] + \frac{1}{4}E[w_7(\xi, \lambda_\theta') \mid \xi, \lambda, \gamma],$$
where
$$E[w_7(\xi', \lambda_\theta) \mid \lambda_\theta, \lambda, \gamma] = \sum_{i=1}^K E\left[\left(\lambda_\theta(\theta_i' - \mu')^2 + 2b_2\right)^{1/2} \mid \lambda_\theta, \lambda, \gamma\right]$$
$$\le \sum_{i=1}^K \left(\lambda_\theta E[(\theta_i' - \mu')^2 \mid \lambda_\theta, \lambda, \gamma] + 2b_2\right)^{1/2} \quad \text{by Jensen's inequality}$$
$$\le \sum_{i=1}^K \left(\lambda_\theta\left[\frac{1}{\gamma_i} + \frac{2}{s_0} + \Delta^2\right] + 2b_2\right)^{1/2} \quad \text{by (12)}$$
$$\le \sum_{i=1}^K \left(\lambda_\theta\left[\frac{1}{\gamma_i} + \frac{2}{s_0} + \Delta^2\right] + 2b_2 + 1\right) = K(2s_0^{-1} + \Delta^2)w_9(\lambda_\theta) + w_{12}(\lambda_\theta, \gamma) + K(2b_2 + 1)$$
and
$$E[w_7(\xi, \lambda_\theta') \mid \xi, \lambda, \gamma] = \sum_{i=1}^K E\left[\left(\lambda_\theta'(\theta_i - \mu)^2 + 2b_2\right)^{1/2} \mid \xi, \lambda, \gamma\right] \le \sum_{i=1}^K \left((\theta_i - \mu)^2 E[\lambda_\theta' \mid \xi, \lambda, \gamma] + 2b_2\right)^{1/2} \quad \text{by Jensen's inequality}$$
$$= \sum_{i=1}^K \left(\frac{(2a_1 + K)(\theta_i - \mu)^2}{2b_1 + \sum_{j=1}^K \lambda_j(\theta_j - \mu)^2} + 2b_2\right)^{1/2} \le \sum_{i=1}^K \left(\frac{2a_1 + K}{2b_1}(\theta_i - \mu)^2 + 2b_2 + 1\right) = \frac{2a_1 + K}{2b_1}w_3(\xi) + K(2b_2 + 1).$$
Hence
$$E[w_7(\xi', \lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{2a_1 + K}{8b_1}w_3(\xi) + \frac{1}{2}w_7(\xi, \lambda_\theta) + \frac{K(2s_0^{-1} + \Delta^2)}{4}w_9(\lambda_\theta) + \frac{1}{4}w_{12}(\lambda_\theta, \gamma) + \frac{K(2b_2 + 1)}{2}.$$
Further,
$$E[w_8(\xi', \lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_8(\xi, \lambda) + \frac{1}{4}E[w_8(\xi', \lambda) \mid \lambda_\theta, \lambda, \gamma] + \frac{1}{4}E[w_8(\xi, \lambda') \mid \xi, \lambda_\theta, \gamma],$$
where
$$E[w_8(\xi', \lambda) \mid \lambda_\theta, \lambda, \gamma] = \sum_{i=1}^K \lambda_i E[(\theta_i' - \mu')^2 \mid \lambda_\theta, \lambda, \gamma] \le \sum_{i=1}^K \lambda_i\left(\frac{1}{\gamma_i} + \frac{2}{s_0} + \Delta^2\right) = w_{13}(\lambda, \gamma) + \left(\frac{2}{s_0} + \Delta^2\right)w_{16}(\lambda) \quad \text{by (12)}$$
and
$$E[w_8(\xi, \lambda') \mid \xi, \lambda_\theta, \gamma] = \sum_{i=1}^K (\theta_i - \mu)^2 E[\lambda_i' \mid \xi, \lambda_\theta, \gamma] = \sum_{i=1}^K (\theta_i - \mu)^2 \frac{2a_2 + 1}{2b_2 + \lambda_\theta(\theta_i - \mu)^2} \le \frac{2a_2 + 1}{2b_2}w_3(\xi).$$
Hence
$$E[w_8(\xi', \lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{2a_2 + 1}{8b_2}w_3(\xi) + \frac{1}{2}w_8(\xi, \lambda) + \frac{1}{4}w_{13}(\lambda, \gamma) + \frac{2s_0^{-1} + \Delta^2}{4}w_{16}(\lambda).$$
Continuing with the calculations,
$$E[w_9(\lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_9(\lambda_\theta) + \frac{1}{4}E[\lambda_\theta' \mid \xi, \lambda, \gamma] = \frac{3}{4}w_9(\lambda_\theta) + \frac{1}{4} \cdot \frac{2a_1 + K}{2b_1 + \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2} \le \frac{3}{4}w_9(\lambda_\theta) + \frac{2a_1 + K}{8b_1},$$
$$E[w_{10}(\gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_{10}(\gamma) + \frac{1}{4}\sum_{i=1}^K E[e^{c\gamma_i'} \mid \xi, \lambda_\theta, \lambda] = \frac{3}{4}w_{10}(\gamma) + \frac{1}{4}\sum_{i=1}^K \left(\frac{2b_3 + (\theta_i - y_i)^2}{2b_3 + (\theta_i - y_i)^2 - 2c}\right)^{a_3 + 1/2} \le \frac{3}{4}w_{10}(\gamma) + \frac{K}{4}\left(\frac{b_3}{b_3 - c}\right)^{a_3 + 1/2}$$
and
$$E[w_{11}(\gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_{11}(\gamma) + \frac{1}{4}\sum_{i=1}^K E[\gamma_i' \mid \xi, \lambda_\theta, \lambda] = \frac{3}{4}w_{11}(\gamma) + \frac{1}{4}\sum_{i=1}^K \frac{2a_3 + 1}{2b_3 + (\theta_i - y_i)^2} \le \frac{3}{4}w_{11}(\gamma) + \frac{K(2a_3 + 1)}{8b_3}.$$
Further,
$$E[w_{12}(\lambda_\theta', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_{12}(\lambda_\theta, \gamma) + \frac{1}{4}E[w_{12}(\lambda_\theta', \gamma) \mid \xi, \lambda, \gamma] + \frac{1}{4}E[w_{12}(\lambda_\theta, \gamma') \mid \xi, \lambda_\theta, \lambda],$$
where
$$E[w_{12}(\lambda_\theta', \gamma) \mid \xi, \lambda, \gamma] = w_{14}(\gamma)E[\lambda_\theta' \mid \xi, \lambda, \gamma] = w_{14}(\gamma)\frac{2a_1 + K}{2b_1 + \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2} \le \frac{2a_1 + K}{2b_1}w_{14}(\gamma)$$
and
$$E[w_{12}(\lambda_\theta, \gamma') \mid \xi, \lambda_\theta, \lambda] = w_9(\lambda_\theta)\sum_{i=1}^K E\left[\frac{1}{\gamma_i'} \,\Big|\, \xi, \lambda_\theta, \lambda\right] = w_9(\lambda_\theta)\sum_{i=1}^K \frac{2b_3 + (\theta_i - y_i)^2}{2a_3 - 1} = \frac{2b_3 K}{2a_3 - 1}w_9(\lambda_\theta) + \frac{1}{2a_3 - 1}w_{15}(\xi, \lambda_\theta).$$
Hence
$$E[w_{12}(\lambda_\theta', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{b_3 K}{2(2a_3 - 1)}w_9(\lambda_\theta) + \frac{1}{2}w_{12}(\lambda_\theta, \gamma) + \frac{2a_1 + K}{8b_1}w_{14}(\gamma) + \frac{1}{4(2a_3 - 1)}w_{15}(\xi, \lambda_\theta).$$
Similarly,
$$E[w_{13}(\lambda', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_{13}(\lambda, \gamma) + \frac{1}{4}E[w_{13}(\lambda', \gamma) \mid \xi, \lambda_\theta, \gamma] + \frac{1}{4}E[w_{13}(\lambda, \gamma') \mid \xi, \lambda_\theta, \lambda],$$
where
$$E[w_{13}(\lambda', \gamma) \mid \xi, \lambda_\theta, \gamma] = \sum_{i=1}^K \frac{1}{\gamma_i}E[\lambda_i' \mid \xi, \lambda_\theta, \gamma] = \sum_{i=1}^K \frac{1}{\gamma_i} \cdot \frac{2a_2 + 1}{2b_2 + \lambda_\theta(\theta_i - \mu)^2} \le \frac{2a_2 + 1}{2b_2}w_{14}(\gamma)$$
and
$$E[w_{13}(\lambda, \gamma') \mid \xi, \lambda_\theta, \lambda] = \sum_{i=1}^K \lambda_i E\left[\frac{1}{\gamma_i'} \,\Big|\, \xi, \lambda_\theta, \lambda\right] = \sum_{i=1}^K \lambda_i \frac{2b_3 + (\theta_i - y_i)^2}{2a_3 - 1} = \frac{2b_3}{2a_3 - 1}w_{16}(\lambda) + \frac{1}{2a_3 - 1}w_{17}(\xi, \lambda).$$
Hence
$$E[w_{13}(\lambda', \gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{1}{2}w_{13}(\lambda, \gamma) + \frac{2a_2 + 1}{8b_2}w_{14}(\gamma) + \frac{b_3}{2(2a_3 - 1)}w_{16}(\lambda) + \frac{1}{4(2a_3 - 1)}w_{17}(\xi, \lambda).$$
Next,
$$E[w_{14}(\gamma') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_{14}(\gamma) + \frac{1}{4}\sum_{i=1}^K E\left[\frac{1}{\gamma_i'} \,\Big|\, \xi, \lambda_\theta, \lambda\right] = \frac{3}{4}w_{14}(\gamma) + \frac{1}{4}\sum_{i=1}^K \frac{2b_3 + (\theta_i - y_i)^2}{2a_3 - 1} = \frac{3}{4}w_{14}(\gamma) + \frac{1}{4(2a_3 - 1)}w_{18}(\xi) + \frac{Kb_3}{2(2a_3 - 1)}.$$
Further,
$$E[w_{15}(\xi', \lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_{15}(\xi, \lambda_\theta) + \frac{1}{4}E[w_{15}(\xi', \lambda_\theta) \mid \lambda_\theta, \lambda, \gamma] + \frac{1}{4}E[w_{15}(\xi, \lambda_\theta') \mid \xi, \lambda, \gamma],$$
where
$$E[w_{15}(\xi', \lambda_\theta) \mid \lambda_\theta, \lambda, \gamma] = \lambda_\theta \sum_{i=1}^K E[(\theta_i' - y_i)^2 \mid \lambda_\theta, \lambda, \gamma] \le \lambda_\theta \sum_{i=1}^K \left(\frac{1}{\gamma_i} + \frac{1}{s_0} + \Delta^2\right) = K\left(\Delta^2 + \frac{1}{s_0}\right)w_9(\lambda_\theta) + w_{12}(\lambda_\theta, \gamma) \quad \text{by (13)}$$
and
$$E[w_{15}(\xi, \lambda_\theta') \mid \xi, \lambda, \gamma] = w_{18}(\xi)E[\lambda_\theta' \mid \xi, \lambda, \gamma] = w_{18}(\xi)\frac{2a_1 + K}{2b_1 + \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2} \le \frac{2a_1 + K}{2b_1}w_{18}(\xi).$$
Hence
$$E[w_{15}(\xi', \lambda_\theta') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{K}{4}\left(\Delta^2 + \frac{1}{s_0}\right)w_9(\lambda_\theta) + \frac{1}{4}w_{12}(\lambda_\theta, \gamma) + \frac{1}{2}w_{15}(\xi, \lambda_\theta) + \frac{2a_1 + K}{8b_1}w_{18}(\xi).$$
Now
$$E[w_{16}(\lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_{16}(\lambda) + \frac{1}{4}\sum_{i=1}^K E[\lambda_i' \mid \xi, \lambda_\theta, \gamma] = \frac{3}{4}w_{16}(\lambda) + \frac{1}{4}\sum_{i=1}^K \frac{2a_2 + 1}{2b_2 + \lambda_\theta(\theta_i - \mu)^2} \le \frac{3}{4}w_{16}(\lambda) + \frac{K(2a_2 + 1)}{8b_2}.$$
Note that
$$E[w_{17}(\xi', \lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{1}{2}w_{17}(\xi, \lambda) + \frac{1}{4}E[w_{17}(\xi', \lambda) \mid \lambda_\theta, \lambda, \gamma] + \frac{1}{4}E[w_{17}(\xi, \lambda') \mid \xi, \lambda_\theta, \gamma],$$
where
$$E[w_{17}(\xi', \lambda) \mid \lambda_\theta, \lambda, \gamma] = \sum_{i=1}^K \lambda_i E[(\theta_i' - y_i)^2 \mid \lambda_\theta, \lambda, \gamma] \le \sum_{i=1}^K \lambda_i\left(\frac{1}{\gamma_i} + \frac{1}{s_0} + \Delta^2\right) = \left(\frac{1}{s_0} + \Delta^2\right)w_{16}(\lambda) + w_{13}(\lambda, \gamma) \quad \text{by (13)}$$
and
$$E[w_{17}(\xi, \lambda') \mid \xi, \lambda_\theta, \gamma] = \sum_{i=1}^K (\theta_i - y_i)^2 E[\lambda_i' \mid \xi, \lambda_\theta, \gamma] = \sum_{i=1}^K (\theta_i - y_i)^2 \frac{2a_2 + 1}{2b_2 + \lambda_\theta(\theta_i - \mu)^2} \le \frac{2a_2 + 1}{2b_2}w_{18}(\xi).$$
Hence
$$E[w_{17}(\xi', \lambda') \mid \xi, \lambda_\theta, \lambda, \gamma] \le \frac{1}{4}w_{13}(\lambda, \gamma) + \frac{1}{4}\left(\frac{1}{s_0} + \Delta^2\right)w_{16}(\lambda) + \frac{1}{2}w_{17}(\xi, \lambda) + \frac{2a_2 + 1}{8b_2}w_{18}(\xi).$$
Finally,
$$E[w_{18}(\xi') \mid \xi, \lambda_\theta, \lambda, \gamma] = \frac{3}{4}w_{18}(\xi) + \frac{1}{4}\sum_{i=1}^K E[(\theta_i' - y_i)^2 \mid \lambda_\theta, \lambda, \gamma] \le \frac{3}{4}w_{18}(\xi) + \frac{1}{4}\sum_{i=1}^K \left(\frac{1}{\gamma_i} + \frac{1}{s_0} + \Delta^2\right) \quad \text{by (13)}$$
$$= \frac{1}{4}w_{14}(\gamma) + \frac{3}{4}w_{18}(\xi) + \frac{K}{4}\left(\frac{1}{s_0} + \Delta^2\right).$$
Appendix B. Conditional expectations for the proof of Theorem 2

This section contains the calculation of the conditional expectations required for the proof of Theorem 2. Some supplementary calculations are given in Appendix C. Note that
$$E[w_1(\lambda_\theta') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_1(\lambda_\theta) + \frac{1}{3}E[w_1(\lambda_\theta') \mid \xi, \lambda] = \frac{2}{3}w_1(\lambda_\theta) + \frac{1}{3(2a_1 + K - 2)}w_8(\xi, \lambda) + \frac{2b_1}{3(2a_1 + K - 2)},$$
$$E[w_2(\lambda') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_2(\lambda) + \frac{1}{3}\sum_{i=1}^K E[(\lambda_i')^{-1/2} \mid \xi, \lambda_\theta] = \frac{2}{3}w_2(\lambda) + \frac{\Gamma(a_2)}{3\sqrt{2}\,\Gamma(a_2 + 1/2)}w_7(\xi, \lambda_\theta),$$
$$E[w_3(\xi') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_3(\xi) + \frac{1}{3}\sum_{i=1}^K E[(\theta_i' - \mu')^2 \mid \lambda_\theta, \lambda] \le \frac{2}{3}w_3(\xi) + \frac{1}{3}\sum_{i=1}^K \left(\frac{1}{\gamma_i} + \frac{2}{s_0} + \Delta^2\right) = \frac{2}{3}w_3(\xi) + \frac{\alpha}{3} \quad \text{by (12)}$$
and
$$E[w_4(\xi') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_4(\xi) + \frac{1}{3}\sum_{i=1}^K \gamma_i E[(\theta_i' - y_i)^2 \mid \lambda_\theta, \lambda] \le \frac{2}{3}w_4(\xi) + \frac{1}{3}\sum_{i=1}^K \gamma_i\left(\frac{1}{\gamma_i} + \frac{1}{s_0} + \Delta^2\right) \le \frac{2}{3}w_4(\xi) + \frac{1}{3}\left[K + (\Delta^2 + s_0^{-1})\sum_{i=1}^K \gamma_i\right] \quad \text{by (13)}.$$
Further,
$$E[w_5(\lambda_\theta') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_5(\lambda_\theta) + \frac{1}{3}E[e^{c\lambda_\theta'} \mid \xi, \lambda] = \frac{2}{3}w_5(\lambda_\theta) + \frac{1}{3}\left(\frac{b_1 + w_8(\xi, \lambda)/2}{b_1 - c + w_8(\xi, \lambda)/2}\right)^{a_1 + K/2} \le \frac{2}{3}w_5(\lambda_\theta) + \frac{1}{3}\left(\frac{b_1}{b_1 - c}\right)^{a_1 + K/2}$$
and, similarly,
$$E[w_6(\lambda') \mid \xi, \lambda_\theta, \lambda] \le \frac{2}{3}w_6(\lambda) + \frac{K}{3}\left(\frac{b_2}{b_2 - c}\right)^{a_2 + 1/2}.$$
Next,
$$E[w_7(\xi', \lambda_\theta') \mid \xi, \lambda_\theta, \lambda] = \frac{1}{3}w_7(\xi, \lambda_\theta) + \frac{1}{3}E[w_7(\xi, \lambda_\theta') \mid \xi, \lambda] + \frac{1}{3}E[w_7(\xi', \lambda_\theta) \mid \lambda_\theta, \lambda],$$
where
$$E[w_7(\xi, \lambda_\theta') \mid \xi, \lambda] = \sum_{i=1}^K E\left\{\left[\lambda_\theta'(\theta_i - \mu)^2 + 2b_2\right]^{1/2} \mid \xi, \lambda\right\} \le \sum_{i=1}^K \left[(\theta_i - \mu)^2 E(\lambda_\theta' \mid \xi, \lambda) + 2b_2\right]^{1/2} \quad \text{by Jensen's inequality}$$
$$= \sum_{i=1}^K \left(\frac{(2a_1 + K)(\theta_i - \mu)^2}{2b_1 + \sum_{j=1}^K \lambda_j(\theta_j - \mu)^2} + 2b_2\right)^{1/2} \le \sum_{i=1}^K \left(\frac{2a_1 + K}{2b_1}(\theta_i - \mu)^2 + 2b_2 + 1\right) = \frac{2a_1 + K}{2b_1}w_3(\xi) + K(2b_2 + 1)$$
and
$$E[w_7(\xi', \lambda_\theta) \mid \lambda_\theta, \lambda] = \sum_{i=1}^K E\left\{\left[\lambda_\theta(\theta_i' - \mu')^2 + 2b_2\right]^{1/2} \mid \lambda_\theta, \lambda\right\} \le \sum_{i=1}^K \left\{\lambda_\theta E[(\theta_i' - \mu')^2 \mid \lambda_\theta, \lambda] + 2b_2\right\}^{1/2} \quad \text{by Jensen's inequality}$$
$$\le \sum_{i=1}^K \left(\alpha_i \lambda_\theta + 2b_2\right)^{1/2} \le \sum_{i=1}^K \left(\alpha_i \lambda_\theta + 2b_2 + 1\right) = \alpha w_9(\lambda_\theta) + K(2b_2 + 1) \quad \text{by (12)}.$$
It follows that
$$E[w_7(\xi', \lambda_\theta') \mid \xi, \lambda_\theta, \lambda] \le \frac{1}{3}w_7(\xi, \lambda_\theta) + \frac{2a_1 + K}{6b_1}w_3(\xi) + \frac{\alpha}{3}w_9(\lambda_\theta) + \frac{2K(2b_2 + 1)}{3}.$$
Further,
$$E[w_8(\xi', \lambda') \mid \xi, \lambda_\theta, \lambda] = \frac{1}{3}w_8(\xi, \lambda) + \frac{1}{3}E[w_8(\xi', \lambda) \mid \lambda_\theta, \lambda] + \frac{1}{3}E[w_8(\xi, \lambda') \mid \xi, \lambda_\theta],$$
where
$$E[w_8(\xi', \lambda) \mid \lambda_\theta, \lambda] = \sum_{i=1}^K \lambda_i E[(\theta_i' - \mu')^2 \mid \lambda_\theta, \lambda] \le \sum_{i=1}^K \alpha_i \lambda_i = w_{10}(\lambda)$$
and
$$E[w_8(\xi, \lambda') \mid \xi, \lambda_\theta] = \sum_{i=1}^K (\theta_i - \mu)^2 E[\lambda_i' \mid \xi, \lambda_\theta] = \sum_{i=1}^K (\theta_i - \mu)^2 \frac{2a_2 + 1}{2b_2 + \lambda_\theta(\theta_i - \mu)^2} \le \frac{2a_2 + 1}{2b_2}w_3(\xi).$$
Putting this together we obtain
$$E[w_8(\xi', \lambda') \mid \xi, \lambda_\theta, \lambda] \le \frac{2a_2 + 1}{6b_2}w_3(\xi) + \frac{1}{3}w_8(\xi, \lambda) + \frac{1}{3}w_{10}(\lambda).$$
Continuing the calculations,
$$E[w_9(\lambda_\theta') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_9(\lambda_\theta) + \frac{1}{3}E[w_9(\lambda_\theta') \mid \xi, \lambda] = \frac{2}{3}w_9(\lambda_\theta) + \frac{1}{3} \cdot \frac{2a_1 + K}{2b_1 + \sum_{i=1}^K \lambda_i(\theta_i - \mu)^2} \le \frac{2}{3}w_9(\lambda_\theta) + \frac{2a_1 + K}{6b_1}$$
and
$$E[w_{10}(\lambda') \mid \xi, \lambda_\theta, \lambda] = \frac{2}{3}w_{10}(\lambda) + \frac{1}{3}\sum_{i=1}^K \alpha_i E[\lambda_i' \mid \xi, \lambda_\theta] = \frac{2}{3}w_{10}(\lambda) + \frac{1}{3}\sum_{i=1}^K \alpha_i \frac{2a_2 + 1}{2b_2 + \lambda_\theta(\theta_i - \mu)^2} \le \frac{2}{3}w_{10}(\lambda) + \frac{(2a_2 + 1)\alpha}{6b_2}.$$
Appendix C. Components of $\xi_0$ and $V$

Doss and Hobert [4] derive the components of $\xi_0$ and $V$ and upper bounds for them. Letting
$$t = \sum_{i=1}^K \frac{\gamma_i \lambda_\theta \lambda_i}{\lambda_\theta \lambda_i + \gamma_i},$$
$\xi_0$ has components
$$E[\theta_i \mid \lambda_\theta, \lambda] = \frac{\lambda_\theta \lambda_i}{\lambda_\theta \lambda_i + \gamma_i} \cdot \frac{1}{s_0 + t}\left(\sum_{j=1}^K \frac{\gamma_j \lambda_\theta \lambda_j y_j}{\lambda_\theta \lambda_j + \gamma_j} + m_0 s_0\right) + \frac{\gamma_i y_i}{\lambda_\theta \lambda_i + \gamma_i}$$
and
$$E[\mu \mid \lambda_\theta, \lambda] = \frac{1}{s_0 + t}\left(\sum_{j=1}^K \frac{\gamma_j \lambda_\theta \lambda_j y_j}{\lambda_\theta \lambda_j + \gamma_j} + m_0 s_0\right).$$
Let $\Delta$ denote the length of the convex hull of $\{y_1, \ldots, y_K, m_0\}$. Note that $E[\mu \mid \lambda_\theta, \lambda]$ is a convex combination of $y_1, \ldots, y_K, m_0$ and that $E[\theta_i \mid \lambda_\theta, \lambda]$ is a convex combination of $E[\mu \mid \lambda_\theta, \lambda]$ and $y_i$. Hence
$$(E[\theta_i \mid \lambda_\theta, \lambda] - E[\mu \mid \lambda_\theta, \lambda])^2 \le \Delta^2 \quad \text{and} \quad (E[\theta_i \mid \lambda_\theta, \lambda] - y_i)^2 \le \Delta^2 \quad \text{for } i = 1, \ldots, K. \qquad (11)$$
The components of $V$ are given by
$$\text{Var}(\mu \mid \lambda_\theta, \lambda) = \frac{1}{s_0 + t} \le \frac{1}{s_0}$$
$$\text{Var}(\theta_i \mid \lambda_\theta, \lambda) = \frac{1}{\lambda_\theta \lambda_i + \gamma_i}\left[1 + \frac{\lambda_\theta^2 \lambda_i^2}{(\lambda_\theta \lambda_i + \gamma_i)(s_0 + t)}\right] \le \frac{1}{\gamma_i} + \frac{1}{s_0}$$
$$\text{Cov}(\theta_i, \theta_j \mid \lambda_\theta, \lambda) = \frac{\lambda_\theta^2 \lambda_i \lambda_j}{(\lambda_\theta \lambda_i + \gamma_i)(\lambda_\theta \lambda_j + \gamma_j)(s_0 + t)} \le \frac{1}{s_0}$$
$$\text{Cov}(\theta_i, \mu \mid \lambda_\theta, \lambda) = \frac{\lambda_\theta \lambda_i}{(\lambda_\theta \lambda_i + \gamma_i)(s_0 + t)} \le \frac{1}{s_0}.$$
Now we can see that for $i = 1, \ldots, K$,
$$E[(\theta_i - \mu)^2 \mid \lambda_\theta, \lambda] = \text{Var}(\theta_i \mid \lambda_\theta, \lambda) + \text{Var}(\mu \mid \lambda_\theta, \lambda) - 2\,\text{Cov}(\theta_i, \mu \mid \lambda_\theta, \lambda) + [E(\theta_i \mid \lambda_\theta, \lambda) - E(\mu \mid \lambda_\theta, \lambda)]^2 \le \frac{1}{\gamma_i} + \frac{2}{s_0} + \Delta^2 \qquad (12)$$
and
$$E[(\theta_i - y_i)^2 \mid \lambda_\theta, \lambda] = \text{Var}(\theta_i \mid \lambda_\theta, \lambda) + [E(\theta_i \mid \lambda_\theta, \lambda) - y_i]^2 \le \frac{1}{\gamma_i} + \frac{1}{s_0} + \Delta^2. \qquad (13)$$
References

[1] K.S. Chan, C.J. Geyer, Comment on "Markov chains for exploring posterior distributions", Ann. Statist. 22 (1994) 1747-1758.
[2] P. Diaconis, K. Khare, L. Saloff-Coste, Gibbs sampling, exponential families and orthogonal polynomials, Statist. Sci. 23 (2008) 151-178.
[3] C.R. Doss, J.M. Flegal, G.L. Jones, R.C. Neath, Markov chain Monte Carlo estimation of quantiles, Electron. J. Stat. 8 (2014) 2448-2478.
[4] H. Doss, J.P. Hobert, Estimation of Bayes factors in a class of hierarchical random effects models using a geometrically ergodic MCMC algorithm, J. Comput. Graph. Statist. 19 (2010) 295-312.
[5] J.M. Flegal, M. Haran, G.L. Jones, Markov chain Monte Carlo: Can we trust the third significant figure?, Statist. Sci. 23 (2008) 250-260.
[6] J.M. Flegal, J. Hughes, mcmcse: Monte Carlo standard errors for MCMC, R package version 1.0-1, 2012. http://cran.r-project.org/web/packages/mcmcse/index.html.
[7] J.M. Flegal, G.L. Jones, Batch means and spectral variance estimators in Markov chain Monte Carlo, Ann. Statist. 38 (2010) 1034-1070.
[8] J.M. Flegal, G.L. Jones, Implementing Markov chain Monte Carlo: Estimating with confidence, in: S. Brooks, A. Gelman, G. Jones, X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo, CRC Press, Boca Raton, FL, 2011.
[9] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (1984) 721-741.
[10] O. Häggström, J.S. Rosenthal, Variance conditions for Markov chain CLTs, Electron. Commun. Probab. 12 (2007) 454-464.
[11] J.P. Hobert, C.J. Geyer, Geometric ergodicity of Gibbs and block Gibbs samplers for a hierarchical random effects model, J. Multivariate Anal. 67 (1998) 414-430.
[12] J.P. Hobert, G.L. Jones, B. Presnell, J.S. Rosenthal, On the applicability of regenerative simulation in Markov chain Monte Carlo, Biometrika 89 (2002) 731-743.
[13] A.A. Johnson, G.L. Jones, R.C. Neath, Component-wise Markov chain Monte Carlo, Statist. Sci. 28 (2013) 360-375.
[14] G.L. Jones, On the Markov chain central limit theorem, Probab. Surv. 1 (2004) 299-320.
[15] G.L. Jones, M. Haran, B.S. Caffo, R. Neath, Fixed-width output analysis for Markov chain Monte Carlo, J. Amer. Statist. Assoc. 101 (2006) 1537-1547.
[16] G.L. Jones, J.P. Hobert, Honest exploration of intractable probability distributions via Markov chain Monte Carlo, Statist. Sci. 16 (2001) 312-334.
[17] G.L. Jones, J.P. Hobert, Sufficient burn-in for Gibbs samplers for a hierarchical random effects model, Ann. Statist. 32 (2004) 784-817.
[18] G.L. Jones, G.O. Roberts, J.S. Rosenthal, Convergence of conditional Metropolis-Hastings samplers, Adv. Appl. Probab. 46 (2014) 422-445.
[19] K. Latuszynski, G.O. Roberts, J.S. Rosenthal, Adaptive Gibbs samplers and related MCMC methods, Ann. Appl. Probab. 23 (2013) 66-98.
[20] K.-J. Lee, G.L. Jones, B.S. Caffo, S. Bassett, Spatial Bayesian variable selection models for functional magnetic resonance imaging data, Bayesian Anal. 9 (2013) 699-732.
[21] R.A. Levine, G. Casella, Optimizing random scan Gibbs samplers, J. Multivariate Anal. 97 (2006) 2071-2100.
[22] J.S. Liu, W.H. Wong, A. Kong, Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes, Biometrika 81 (1994) 27-40.
[23] J.S. Liu, W.H. Wong, A. Kong, Covariance structure and convergence rate of the Gibbs sampler with various scans, J. R. Stat. Soc. Ser. B Stat. Methodol. 57 (1995) 157-169.
[24] D. Marchev, J.P. Hobert, Geometric ergodicity of van Dyk and Meng's algorithm for the multivariate Student's t model, J. Amer. Statist. Assoc. 99 (2004) 228-238.
[25] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (1953).
[26] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Cambridge University Press, Cambridge, 2009.
[27] O. Papaspiliopoulos, G.O. Roberts, Stability of the Gibbs sampler for Bayesian hierarchical models, Ann. Statist. 36 (2008) 95-117.
[28] S. Richardson, L. Bottolo, J.S. Rosenthal, Bayesian models for sparse regression analysis of high dimensional data, in: J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith, M. West (Eds.), Bayesian Statistics 9, Oxford University Press, Oxford, 2010.
[29] G.O. Roberts, N.G. Polson, On the geometric convergence of the Gibbs sampler, J. R. Stat. Soc. Ser. B Stat. Methodol. 56 (1994) 377-384.
[30] G.O. Roberts, J.S. Rosenthal, Geometric ergodicity and hybrid Markov chains, Electron. Commun. Probab. 2 (1997) 13-25.
[31] G.O. Roberts, J.S. Rosenthal, Convergence of slice sampler Markov chains, J. R. Stat. Soc. Ser. B Stat. Methodol. 61 (1999) 643-660.
[32] G.O. Roberts, J.S. Rosenthal, General state space Markov chains and MCMC algorithms, Probab. Surv. 1 (2004) 20-71.
[33] G.O. Roberts, J.S. Rosenthal, Surprising convergence properties of some simple Gibbs samplers under various scans, preprint, 2015.
[34] V. Roy, J.P. Hobert, Convergence rates and asymptotic standard errors for Markov chain Monte Carlo algorithms for Bayesian probit regression, J. R. Stat. Soc. Ser. B Stat. Methodol. 69 (2007) 607-623.
[35] A. Tan, J.P. Hobert, Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration, J. Comput. Graph. Statist. 18 (2012) 861-878.
[36] A. Tan, G.L. Jones, J.P. Hobert, On the geometric ergodicity of two-variable Gibbs samplers, in: G.L. Jones, X. Shen (Eds.), Advances in Modern Statistical Theory and Applications: A Festschrift in Honor of Morris L. Eaton, Institute of Mathematical Statistics, Beachwood, Ohio, 2013.