Multiple change-point detection of multivariate mean vectors with the Bayesian approach

Computational Statistics and Data Analysis 54 (2010) 406-415

Sooyoung Cheon (Korea University, Seoul, Republic of Korea) and Jaehee Kim* (Duksung Women’s University, Seoul, Republic of Korea)

Article history: Received 17 November 2008; received in revised form 31 August 2009; accepted 1 September 2009; available online 9 September 2009.

Abstract: Bayesian multiple change-point models are proposed for multivariate means. The models require that the data be from a multivariate normal distribution with a truncated Poisson prior for the number of change-points and conjugate priors for the distributional parameters. We apply the stochastic approximation Monte Carlo (SAMC) algorithm to the multiple change-point detection problems. Numerical results show that SAMC makes a significant improvement over RJMCMC for complex Bayesian model selection problems in change-point estimation.

1. Introduction

The change-point problem is of increasing interest in various fields such as economics, finance, pharmacology, psychology, geology, meteorology, environmental studies, and even daily life. However, identifying change-points is one of the most challenging statistical problems, since both the number of change-points and their locations are unknown. Parametric methods have been proposed by many researchers, including Hinkley (1970), who took a likelihood-based approach to the change-point problem. Chernoff and Zacks (1964) took a Bayesian change-point approach and considered a Markov model for normal observations. Smith (1975) also proposed a Bayesian method based on the posterior probabilities of the possible change-points for binomial and normal distributions. Carlin et al. (1992) extended Smith’s approach using Markov chain Monte Carlo (MCMC) methods for change-points having continuous support. Unlike classical statistical inference, where a fixed probabilistic mechanism of data generation is assumed, these Bayesian approaches assume a random mechanism that incorporates prior information.

In the multiple change-point setting, Venter and Steel (1996) identified multiple abrupt change-points in a sequence of observations via hypothesis testing. Hawkins (2001) developed an approach with maximum likelihood estimates of the change-points and within-segment parameters in the exponential family. Barry and Hartigan (1993) and Crowley (1997) applied the product partition model (PPM) to identify multiple change-points in normal means. Loschi and Cruz (2005) extended the PPM to calculate the probability of change-points in the means and variances of normal data with Gibbs sampling. Chen and Gupta (2000) considered testing for and estimating a single change-point in multivariate mean vectors. Son and Kim (2005) studied Bayesian single change-point detection in a sequence of multivariate normal observations using the Bayes factor. Gooijer (2005) proposed a general test statistic for detecting change-points in multi-dimensional stochastic processes with unknown variances.

This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST R01-2008-00010133-0).
* Corresponding author at: Duksung Women’s University, Seoul, Republic of Korea. Tel.: +82 2 901 8334; fax: +82 2 901 8331. E-mail address: [email protected] (J. Kim).

doi:10.1016/j.csda.2009.09.003


In many of the Bayesian approaches, Markov sampling techniques have been used to calculate the posterior probabilities. Because of the many possible partitions, the model space becomes complex with multiple modes, and traditional Monte Carlo methods are prone to get trapped in local energy minima. Unlike MCMC, stochastic approximation Monte Carlo (SAMC) is known to apply well to many complicated computational problems, such as neural network training (Liang, 2007a) and Bayesian model selection (Liang et al., 2007). However, its application to Bayesian change-point detection in the multivariate model has not been well explored.

This paper is concerned with the multiple change-point problem when the observations follow a multivariate normal distribution. A latent vector encoding the change-point positions is introduced to partition the observations, and the other parameters are integrated out with an appropriate choice of prior distributions. We show that, for model selection problems having a complex sample space, SAMC yields better results than RJMCMC (reversible jump Markov chain Monte Carlo).

This paper is organized as follows. Section 2 develops the Bayesian multiple change-point model for the multivariate normal distribution and the posterior distribution of the change-points. Section 3 provides the SAMC algorithm for finding an optimal solution over a parameter space of variable dimension, together with the sampling method for Bayesian model selection problems. Section 4 presents numerical results for multiple change-point detection on simulated and real data sets. Section 5 concludes the paper.

2. Bayesian multiple change-point model of multivariate normal observations

Let Z = (z_1, z_2, ..., z_n) denote the independent observation sequence ordered in time. There exists a partition of the set {1, 2, ..., n} that divides the data into blocks within which the observations follow the same distribution. The partition is expressed using a binary vector x = (x_1, x_2, ..., x_{n-1}) with x_{c_1} = x_{c_2} = ... = x_{c_k} = 1 and 0 elsewhere, where 0 = c_0 < c_1 < ... < c_k < c_{k+1} = n. There are k unknown change-points in the model. The multiple change-point model with independent z_i's can be written as

    z_i \sim f_r(\cdot\,|\,\theta_r), \qquad c_{r-1} < i \le c_r, \qquad (1)

for r = 1, 2, ..., k+1, where f_r depends on the parameters θ_r ∈ Θ. Thus the parameter values change at the indexes c_1 + 1, c_2 + 1, ..., c_{k-1} + 1, c_k + 1. Here c_1, c_2, ..., c_{k-1}, c_k are called the change-points. Let f_r be a d-dimensional multivariate normal density parameterized by θ_r = (µ_r, Σ_r), with d-dimensional mean vector µ_r and d × d covariance matrix Σ_r. Let x^{(k)} denote a configuration of x with k change-points, let η^{(k)} = (x^{(k)}, µ_1, Σ_1, ..., µ_{k+1}, Σ_{k+1}), let A_k be the space of models with k change-points, x^{(k)} ∈ A_k, and let χ = ∪_{k=0}^{n} A_k. The likelihood function of Z with d-dimensional observed vectors z_j is

    L(Z|\eta^{(k)}) = \prod_{j=c_0+1}^{c_1} f_1(z_j|\mu_1,\Sigma_1) \cdots \prod_{j=c_k+1}^{c_{k+1}} f_{k+1}(z_j|\mu_{k+1},\Sigma_{k+1}). \qquad (2)

The log-likelihood function of model η^{(k)} is

    \log L(Z|\eta^{(k)}) = -\frac{dn}{2}\log(2\pi) - \sum_{i=1}^{k+1}\left\{ \frac{c_i-c_{i-1}}{2}\log|\Sigma_i| + \frac{1}{2}\sum_{j=c_{i-1}+1}^{c_i} (z_j-\mu_i)'\Sigma_i^{-1}(z_j-\mu_i) \right\}. \qquad (3)
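For concreteness, the log-likelihood (3) can be evaluated segment by segment. The following minimal numpy sketch (the function name and data layout are ours, not the paper's) computes (3) for a given configuration of change-points and segment parameters:

    import numpy as np

    def log_likelihood(Z, cps, mus, Sigmas):
        """Evaluate (3). Z is an (n, d) array; cps = [c_1, ..., c_k] are the
        change-points; mus and Sigmas hold the k+1 segment parameters."""
        n, d = Z.shape
        bounds = [0] + list(cps) + [n]                # c_0 = 0, c_{k+1} = n
        ll = -0.5 * d * n * np.log(2 * np.pi)
        for i in range(len(bounds) - 1):
            seg = Z[bounds[i]:bounds[i + 1]]          # z_{c_{i-1}+1}, ..., z_{c_i}
            diff = seg - mus[i]
            inv = np.linalg.inv(Sigmas[i])
            _, logdet = np.linalg.slogdet(Sigmas[i])
            ll -= 0.5 * len(seg) * logdet             # (c_i - c_{i-1})/2 log|Sigma_i|
            ll -= 0.5 * np.einsum('nj,jk,nk->', diff, inv, diff)
        return ll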

We set the prior distribution for x^{(k)} in η^{(k)} as

    \pi(x^{(k)}) = \frac{\lambda^k (n-1-k)!}{(n-1)! \sum_{j=0}^{n-1} \lambda^j/j!}, \qquad k = 0, 1, \ldots, n-1. \qquad (4)

We set a Poisson distribution truncated at (n - 1) for A_k, with parameter λ, and specify that each model in A_k is equally likely. A vague prior and an inverse-Wishart IW(ν_0, Λ_0^{-1}) prior are placed on the µ_i's and Σ_i's, respectively. These priors are independent of each other. Then the log-prior density is

    \log\pi(\eta^{(k)}) = a_k - \sum_{i=1}^{k+1}\left\{ \frac{\nu_0+d+1}{2}\log|\Sigma_i| + \operatorname{tr}\left(\frac{1}{2}\Lambda_0\Sigma_i^{-1}\right) \right\} \qquad (5)

where

    a_k = -(k+1)\left[ \frac{\nu_0 d}{2}\log 2 + \frac{d(d-1)}{4}\log\pi + \sum_{u=1}^{d}\log\Gamma\left(\frac{\nu_0+1-u}{2}\right) - \frac{\nu_0}{2}\log|\Lambda_0| \right] + \log(n-1-k)! + k\log\lambda.


Here ν_0, Λ_0 and λ are fixed hyper-parameters. The log-posterior of η^{(k)} (up to an additive constant) can be obtained by adding (3) and (5). The combined log probability is

    \log P(Z|\eta^{(k)})\,P(\mu,\Sigma|x^{(k)})\,\pi(x^{(k)})
    = a_k - \sum_{i=1}^{k+1}\left\{ \frac{1}{2}\sum_{j=c_{i-1}+1}^{c_i}(z_j-\mu_i)'\Sigma_i^{-1}(z_j-\mu_i) + \frac{c_i-c_{i-1}+\nu_0+d+1}{2}\log|\Sigma_i| + \frac{1}{2}\operatorname{tr}(\Lambda_0\Sigma_i^{-1}) \right\}
    = a_k - \sum_{i=1}^{k+1}\left\{ \frac{c_i-c_{i-1}+\nu_0+d+1}{2}\log|\Sigma_i| + \operatorname{tr}\left[\left(\frac{\Lambda_0}{2} + \frac{c_i-c_{i-1}}{2}S_i\right)\Sigma_i^{-1}\right] + \frac{c_i-c_{i-1}}{2}(\mu_i-\bar{z}^i)'\Sigma_i^{-1}(\mu_i-\bar{z}^i) \right\}, \qquad (6)

since

    \sum_{j=c_{i-1}+1}^{c_i}(z_j-\mu_i)'\Sigma_i^{-1}(z_j-\mu_i) = \operatorname{tr}\left[\Sigma_i^{-1}(\mathbf{1}\mu_i'-z^i)'(\mathbf{1}\mu_i'-z^i)\right] = (c_i-c_{i-1})(\mu_i-\bar{z}^i)'\Sigma_i^{-1}(\mu_i-\bar{z}^i) + \operatorname{tr}\left[(c_i-c_{i-1})\Sigma_i^{-1}S_i\right],

where z_j = (z_{j1}, ..., z_{jd})' for j = c_{i-1}+1, ..., c_i,

    z^i = \begin{pmatrix} z_{c_{i-1}+1,1} & \cdots & z_{c_{i-1}+1,d} \\ \vdots & \ddots & \vdots \\ z_{c_i,1} & \cdots & z_{c_i,d} \end{pmatrix}, \qquad \mathbf{1}_{(c_i-c_{i-1})\times 1} = (1, \ldots, 1)',

    \bar{z}^i = \left( \frac{1}{c_i-c_{i-1}}\sum_{j=c_{i-1}+1}^{c_i} z_{j1}, \;\ldots,\; \frac{1}{c_i-c_{i-1}}\sum_{j=c_{i-1}+1}^{c_i} z_{jd} \right)', \qquad \mu_i = (\mu_{i1}, \ldots, \mu_{id})', \qquad (7)

for i = 1, ..., k+1, and

    S_i = \frac{1}{c_i-c_{i-1}} \sum_{j=c_{i-1}+1}^{c_i} (z_j - \bar{z}^i)(z_j - \bar{z}^i)'.

Integrating out µ_1, ..., µ_{k+1} from the full posterior distribution (6) and taking the logarithm (each Gaussian integral over µ_i contributes a factor (2π)^{d/2}|Σ_i|^{1/2}/(c_i - c_{i-1})^{d/2}) gives

    b_k - \sum_{i=1}^{k+1}\left\{ \frac{(c_i-c_{i-1})+\nu_0+d}{2}\log|\Sigma_i| + \frac{1}{2}\operatorname{tr}\left(\{\Lambda_0 + (c_i-c_{i-1})S_i\}\Sigma_i^{-1}\right) \right\},

where b_k = a_k + \frac{(k+1)d}{2}\log 2\pi - \frac{d}{2}\sum_{i=1}^{k+1}\log(c_i-c_{i-1}). Then integrating out Σ_1, ..., Σ_{k+1} from the full posterior distribution and taking the logarithm gives

    \log\pi(x^{(k)}|Z) = b_k + \sum_{i=1}^{k+1}\left\{ \frac{\nu_i d}{2}\log 2 + \frac{d(d-1)}{4}\log\pi + \sum_{u=1}^{d}\log\Gamma\left(\frac{\nu_i+1-u}{2}\right) - \frac{\nu_i}{2}\log|\Lambda_0 + (c_i-c_{i-1})S_i| \right\} \qquad (8)

where ν_i = c_i - c_{i-1} + ν_0 - 1. In the d-variate normal distribution case, we integrate out (3) and (5) with respect to (µ_1, Σ_1, ..., µ_{k+1}, Σ_{k+1}) to calculate the Bayes factors. Then, for a given k,

    e^{c_k}\,(2\pi)^{-\frac{d(k+1)}{2}} \prod_{i=1}^{k+1}\left[ 2^{\frac{\nu_i d}{2}}\,\pi^{\frac{d(d-1)}{4}} \prod_{u=1}^{d}\Gamma\left(\frac{\nu_i+1-u}{2}\right) \frac{|\Lambda_0 + (c_i-c_{i-1})S_i|^{-\frac{\nu_i}{2}}}{(c_i-c_{i-1})^{d/2}} \right] \qquad (9)

where

    c_k = -(k+1)\left[ \frac{\nu_0 d}{2}\log 2 + \frac{d(d-1)}{4}\log\pi + \sum_{u=1}^{d}\log\Gamma\left(\frac{\nu_0+1-u}{2}\right) - \frac{\nu_0}{2}\log|\Lambda_0| \right]. \qquad (10)
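The marginal posterior (8) is the quantity the sampler of Section 3 evaluates repeatedly. A minimal sketch of its computation, assuming scipy is available (the function and helper names are ours; the additive constant common to all models is dropped, as in a_k):

    import numpy as np
    from scipy.special import gammaln

    def log_mv_gamma(d, a):
        """Log multivariate gamma function; covers the log(pi) and
        log-Gamma terms appearing in a_k and (8)."""
        return d * (d - 1) / 4 * np.log(np.pi) + sum(
            gammaln(a + (1 - u) / 2) for u in range(1, d + 1))

    def log_posterior(Z, cps, nu0, Lambda0, lam):
        """log pi(x^(k)|Z) of (8), up to a model-independent constant."""
        n, d = Z.shape
        k = len(cps)
        bounds = [0] + list(cps) + [n]
        _, logdet0 = np.linalg.slogdet(Lambda0)
        # a_k: prior normalising terms of (5) plus the truncated-Poisson part of (4)
        ak = -(k + 1) * (nu0 * d / 2 * np.log(2) + log_mv_gamma(d, nu0 / 2)
                         - nu0 / 2 * logdet0)
        ak += gammaln(n - k) + k * np.log(lam)         # log(n-1-k)! = lgamma(n-k)
        lp = ak + (k + 1) * d / 2 * np.log(2 * np.pi)  # constant part of b_k
        for i in range(k + 1):
            seg = Z[bounds[i]:bounds[i + 1]]
            ni = len(seg)
            vi = ni + nu0 - 1
            Si = np.cov(seg.T, bias=True) if ni > 1 else np.zeros((d, d))
            _, logdetA = np.linalg.slogdet(Lambda0 + ni * Si)
            lp += (-d / 2 * np.log(ni)                 # remaining part of b_k
                   + vi * d / 2 * np.log(2) + log_mv_gamma(d, vi / 2)
                   - vi / 2 * logdetA)
        return lp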

We can choose the model that minimizes the Bayesian Information Criterion (BIC) among competing models for a data set. The model with the highest posterior probability may be the one that minimizes BIC = -2(log maximized likelihood) + (log n)(number of parameters). Note that if the candidate models have insignificant differences among their posterior probabilities but large gaps among their numbers of parameters, the model with the highest posterior probability may not have the minimum BIC.
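For illustration, assuming the parameter count is taken as the k change-point locations plus a free mean vector and covariance matrix per segment (the paper does not spell this count out), the criterion could be computed as:

    import numpy as np

    def bic(max_loglik, k, n, d):
        """BIC = -2 log(maximized likelihood) + (log n)(number of parameters).
        Assumed count: k change-point locations plus, per segment, a d-vector
        mean and d(d+1)/2 free covariance terms."""
        n_params = k + (k + 1) * (d + d * (d + 1) // 2)
        return -2.0 * max_loglik + np.log(n) * n_params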


3. Application of SAMC to multiple change-point estimation

3.1. SAMC algorithm

The basic idea of SAMC (Liang, 2007a,b) can be explained briefly as follows. Let

    f(x) = c\,\psi(x), \qquad x \in \chi, \qquad (11)

denote the target probability density/mass function, where χ is the sample space and c is an unknown constant. Let E_1, E_2, ..., E_m denote a partition of χ, and let w_i = \int_{E_i} \psi(x)\,dx for i = 1, ..., m. SAMC seeks to sample from the trial distribution

    f_w(x) = \sum_{i=1}^{m} \frac{g_i\,\psi(x)}{w_i}\,I(x \in E_i), \qquad (12)

where the g_i's are pre-specified constants such that g_i > 0 for all i and \sum_{i=1}^{m} g_i = 1; g = (g_1, g_2, ..., g_m) is called the desired sampling distribution of the subregions. If w_1, w_2, ..., w_m can be well estimated, sampling from f_w(x) results in a "random walk" in the space of subregions, with each subregion sampled with a frequency proportional to g_i. Hence, the local-trap problem can be essentially overcome, provided that the sample space is appropriately partitioned. The success of SAMC depends crucially on the estimation of the w_i. Let δ_{ti} denote the working estimate of log(w_i/g_i) obtained at iteration t, let δ_t = (δ_{t1}, δ_{t2}, ..., δ_{tm}), and let {γ_t} denote a positive, non-increasing sequence satisfying the conditions

    (i)\ \sum_{t=1}^{\infty} \gamma_t = \infty, \qquad (ii)\ \sum_{t=1}^{\infty} \gamma_t^{\zeta} < \infty \qquad (13)

for some ζ ∈ (1, 2). Since f_w(x) is invariant with respect to a scale change of w = (w_1, w_2, ..., w_m), the domain of δ_t can be kept in a compact set Ω in simulations by adjusting δ_t with a constant vector. In this paper we set Ω = [-10^{100}, 10^{100}]^m, although this is practically equivalent to setting Ω = R^m. One iteration of the SAMC algorithm consists of the following steps:

(a) (Sampling) Simulate a sample x_t by a single MH update with the target distribution

    f_{\delta_t}(x) \propto \sum_{i=1}^{m} \frac{\psi(x)}{e^{\delta_{ti}}}\,I(x \in E_i). \qquad (14)

(b) (Weight updating) Set

    \delta^* = \delta_t + \gamma_{t+1}(\tilde{e}_t - g), \qquad (15)

where \tilde{e}_t = (\tilde{e}_{t,1}, ..., \tilde{e}_{t,m}) and \tilde{e}_{t,i} = 1 if x_t ∈ E_i and 0 otherwise. If δ* ∈ Ω, set δ_{t+1} = δ*; otherwise, set δ_{t+1} = δ* + c*, where c* = (c*, ..., c*) can be an arbitrary vector satisfying the condition δ* + c* ∈ Ω.

For an effective implementation of SAMC in change-point detection, several issues need to be considered. The sample space should be partitioned according to the model index: e.g., E_1 = {x : k = 1}, E_2 = {x : k = 2}, .... In this paper, without loss of generality, we consider only models with k_min ≤ k ≤ k_max, where k is the number of change-points; k_min and k_max can be determined after a pilot study of the above algorithm. Outside this range, P(χ_i|Z) ≈ 0. The gain factor sequence is set to

    \gamma_t = \left( \frac{t_0}{\max(t_0, t)} \right)^{0.6}, \qquad t = 1, 2, \ldots, \qquad (16)

(16)

where t0 > 0 is a user-specified tuning parameter. A practical guideline for the choice of t0 is to examine the flatness of the histogram of the samples drawn at different subregions. A histogram is said to be flat if the sampling frequency at each subregion is not less than 80% of the average sampling frequency over subregions. If the histogram is not flat, SAMC should be re-run with a larger value of t0 , a larger number of iterations, or both. Refer to Liang et al. (2007) for the choice of t0 . 3.2. Application of SAMC to Bayesian model selection problems We apply the SAMC algorithm to the multiple change-point identification as a Bayesian model selection problem. For change-point detection, the maximum a posteriori (MAP) estimate of x(k) is often a reasonable solution to the problem. In addition to this estimate, we are interested in estimating the marginal posterior distribution P (χk |Z) to compare SAMC and RJMCMC in this paper. Let Ek = χk and ψ(·) ∝ P (χk |Z). From the convergence theorem of SAMC (Liang et al., 2007), w ˆ i(t ) /w ˆ j(t ) = eδti −δtj forms a (t ,l)

consistent estimator for the ratio P (χi |Z)/P (χj |Z) when t is large. The sampling step of SAMC is as follows. Let xk

denote


Fig. 1. A plot of simulated data from the bivariate normal distributions.

The next sample can be generated by the following procedure:

(a) Set j = k-1, k, or k+1 according to probabilities q_{k,j}, where q_{k,k} = 1/3 for k_min ≤ k ≤ k_max, q_{k_min,k_min+1} = q_{k_max,k_max-1} = 2/3, and q_{k,k+1} = q_{k,k-1} = 1/3 if k_min < k < k_max.

(b) Update x_k^{(t,l)} by a "death", "simultaneous" or "birth" move if j = k-1, k or k+1, respectively. The "death", "simultaneous" and "birth" moves are designed as described in Green (1995) and Liang (2007b). The new sample produced by the "death", "simultaneous" or "birth" move is denoted by x*_{k-1}, x*_k or x*_{k+1}, respectively.

The acceptance probabilities of the three types of moves are as follows. For the "death" move, it is

    \min\left\{ 1,\; \frac{e^{\delta_{t,k}}\,P(x^*_{k-1}|Z)\,q_{k-1,k}}{e^{\delta_{t,k-1}}\,P(x_k^{(t,l)}|Z)\,q_{k,k-1}} \cdot \frac{1}{c_{u+1}-c_u-1} \right\}. \qquad (17)

The acceptance probabilities of the "simultaneous" and "birth" moves are, respectively,

    \min\left\{ 1,\; \frac{P(x^*_k|Z)}{P(x_k^{(t,l)}|Z)} \right\} \quad\text{and}\quad \min\left\{ 1,\; \frac{e^{\delta_{t,k}}\,P(x^*_{k+1}|Z)\,q_{k+1,k}}{e^{\delta_{t,k+1}}\,P(x_k^{(t,l)}|Z)\,q_{k,k+1}}\,(c_{u+1}-c_u-1) \right\}, \qquad (18)

where u is drawn differently according to the "death", "simultaneous" or "birth" move.
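To make the three moves concrete, a minimal sketch of steps (a)-(b) is given below. It is our own rendering under stated assumptions: log_post stands for a function computing log P(x|Z) as in (8), and delta is a dict mapping the number of change-points k to the current estimate δ_{t,k}; the gap factors mirror the position counts in (17)-(18).

    import numpy as np

    def q(a, b, k_min, k_max):
        """Proposal probabilities q_{a,b} of step (a)."""
        if (a == k_min and b == k_min + 1) or (a == k_max and b == k_max - 1):
            return 2 / 3
        return 1 / 3

    def samc_move(cps, n, k_min, k_max, log_post, delta):
        """One 'death'/'simultaneous'/'birth' update of the sorted
        change-point list cps (values in 1..n-1)."""
        k = len(cps)
        if k == k_min:
            j = np.random.choice([k, k + 1], p=[1 / 3, 2 / 3])
        elif k == k_max:
            j = np.random.choice([k - 1, k], p=[2 / 3, 1 / 3])
        else:
            j = np.random.choice([k - 1, k, k + 1])
        bounds = [0] + list(cps) + [n]
        if j == k + 1:                            # "birth": insert a point in a gap
            u = np.random.randint(k + 1)
            gap = bounds[u + 1] - bounds[u] - 1   # candidate interior positions
            if gap < 1:
                return cps
            prop = sorted(cps + [np.random.randint(bounds[u] + 1, bounds[u + 1])])
            log_r = (delta[k] - delta[k + 1] + log_post(prop) - log_post(cps)
                     + np.log(q(k + 1, k, k_min, k_max) / q(k, k + 1, k_min, k_max))
                     + np.log(gap))               # birth ratio in (18)
        elif j == k - 1:                          # "death": delete a change-point
            u = np.random.randint(k)
            prop = cps[:u] + cps[u + 1:]
            gap = bounds[u + 2] - bounds[u] - 1   # positions of the reverse birth
            log_r = (delta[k] - delta[k - 1] + log_post(prop) - log_post(cps)
                     + np.log(q(k - 1, k, k_min, k_max) / q(k, k - 1, k_min, k_max))
                     - np.log(gap))               # ratio (17)
        else:                                     # "simultaneous": relocate a point
            u = np.random.randint(k)
            prop = sorted(cps[:u] + cps[u + 1:]
                          + [np.random.randint(bounds[u] + 1, bounds[u + 2])])
            log_r = log_post(prop) - log_post(cps)  # first ratio in (18)
        return prop if np.log(np.random.rand()) < log_r else cps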

4. Numerical results

4.1. A simulated example with bivariate normal observations

A total of 1000 observations were generated from

    z_1, ..., z_120 ~ i.i.d. N( (-3.5, -3.5)', [1.0 0.1; 0.1 1.3] ),      z_121, ..., z_210 ~ i.i.d. N( (5.5, 3.0)', [1.2 0.2; 0.2 1.2] ),
    z_211, ..., z_460 ~ i.i.d. N( (-4.0, -7.5)', [0.8 -0.1; -0.1 1.1] ),  z_461, ..., z_530 ~ i.i.d. N( (4.0, 0.0)', [1.1 0.1; 0.1 1.0] ),
    z_531, ..., z_615 ~ i.i.d. N( (-5.5, -7.0)', [0.9 0.2; 0.2 1.2] ),    z_616, ..., z_710 ~ i.i.d. N( (2.0, 5.0)', [1.1 -0.1; -0.1 1.1] ),
    z_711, ..., z_800 ~ i.i.d. N( (-4.0, -1.0)', [1.3 -0.1; -0.1 1.0] ),  z_801, ..., z_950 ~ i.i.d. N( (4.5, 7.5)', [1.0 0.2; 0.2 1.1] ),
    z_951, ..., z_1000 ~ i.i.d. N( (-3.0, -1.0)', [1.3 -0.1; -0.1 0.6] ).

The nine segments were generated independently, as shown in Fig. 1. This example illustrates multiple change-point identification with SAMC.

As a prior, we set the hyper-parameters (ν_0, Λ_0^{-1}) = (2, [0.1 0.01; 0.01 0.1]), which corresponds to a conjugate prior on Σ_i, and λ = 8. These hyper-parameters are held fixed in the simulation. We set k_min = 7 and k_max = 14; within this range, we partitioned the sample space by the model index into {E_1, E_2, ..., E_8}. SAMC was run for 10^6 iterations with t_0 = 1000, m = 8, and g_1 = ... = g_m = 1/m. The relative sampling frequency of each subregion is defined as N_i/N̄ × 100%, where N_i and N̄ denote the sampling frequency of subregion i and the average sampling frequency over all nonempty subregions, respectively.
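The simulation design above can be reproduced with a few lines of numpy (a minimal sketch; the seed choice is ours and arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)   # the seed is arbitrary
    ends = [120, 210, 460, 530, 615, 710, 800, 950, 1000]
    means = [(-3.5, -3.5), (5.5, 3.0), (-4.0, -7.5), (4.0, 0.0), (-5.5, -7.0),
             (2.0, 5.0), (-4.0, -1.0), (4.5, 7.5), (-3.0, -1.0)]
    covs = [[[1.0, 0.1], [0.1, 1.3]], [[1.2, 0.2], [0.2, 1.2]],
            [[0.8, -0.1], [-0.1, 1.1]], [[1.1, 0.1], [0.1, 1.0]],
            [[0.9, 0.2], [0.2, 1.2]], [[1.1, -0.1], [-0.1, 1.1]],
            [[1.3, -0.1], [-0.1, 1.0]], [[1.0, 0.2], [0.2, 1.1]],
            [[1.3, -0.1], [-0.1, 0.6]]]
    start, blocks = 0, []
    for end, mu, cov in zip(ends, means, covs):
        blocks.append(rng.multivariate_normal(mu, cov, size=end - start))
        start = end
    Z = np.vstack(blocks)            # the (1000, 2) observation matrix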


Table 1
Relative sampling frequencies of the subregions for the simulated example.

Index:          7         8         9         10        11        12        13        14
Frequency (%):  98.2480   100.3696  100.4056  100.3192  100.2280  100.1736  100.1184  100.1376

Table 2
The 10 models with the largest log-posterior values sampled by SAMC. The true model (No. 1) is ranked first in log-posterior value among all models sampled by SAMC.

No  Log-posterior  # change-points  Change patterns                                  BIC
1   4745.3955      8                (120, 210, 460, 530, 615, 710, 800, 950)         -9428.6212
2   4743.1221      9                (120, 210, 460, 530, 615, 710, 800, 939, 950)    -9417.1666
3   4742.9761      9                (120, 210, 460, 530, 615, 710, 800, 943, 950)    -9416.8746
4   4742.2905      9                (120, 210, 460, 530, 615, 710, 800, 941, 950)    -9415.5035
5   4742.0039      9                (120, 210, 460, 530, 615, 710, 800, 944, 950)    -9414.9303
6   4741.9253      9                (120, 210, 460, 530, 615, 710, 800, 940, 950)    -9414.7730
7   4741.6831      9                (120, 210, 460, 530, 540, 615, 710, 800, 950)    -9414.2887
8   4740.9971      9                (120, 210, 460, 466, 530, 615, 710, 800, 950)    -9412.9166
9   4740.8145      9                (120, 210, 460, 530, 615, 710, 800, 942, 950)    -9412.5514
10  4740.5957      9                (120, 210, 460, 530, 615, 710, 800, 945, 950)    -9412.1139


Fig. 2. Simulation results of SAMC for the change-point example: (a) the posterior histogram of change-point positions from the posterior samples generated by SAMC; (b) a comparison of the MAP estimates of the change-point positions with the true change-point positions for observations of the first variable, where the vertical (dotted) lines indicate the change-point positions identified by the MAP model and the horizontal (dashed) lines indicate the mean value of the observations separated by the change-point positions of the true model; (c) same as (b) for observations of the second variable.

Table 1 indicates that each subregion was sampled with approximately the same probability in each run. This result shows that the SAMC simulation converged and that our choices of t_0 and the total number of iterations are appropriate for this example.

The change-point patterns found by SAMC are summarized in Fig. 2 and Table 2. Fig. 2 shows the performance of the MAP estimates of the change-points, indicating that SAMC does well in estimating the change-points. Table 2 lists the ten models with the largest log-posterior values identified by SAMC; the true pattern is ranked first in log-posterior value among all models. Fig. 2(a) is the posterior histogram of the identified change-point positions for the models with high posterior probabilities. Fig. 2(b) and (c) show the subregions divided by the maximum a posteriori estimate of the change-points, (120, 210, 460, 530, 615, 710, 800, 950), which is exactly the same as the true pattern.


Table 3
The estimated posterior distributions for the change-point identification example. SD: standard deviation of the estimates. CPU: the CPU time (in seconds) used by a single run of the corresponding algorithm on a 2.0 GHz computer.

K        SAMC Prob (%)  SAMC SD   RJMCMC Prob (%)  RJMCMC SD
7        <0.0001        <0.0001   0.0032           0.0037
8        64.6114        5.3115    64.8828          13.5405
9        23.4853        2.1769    25.2688          7.5526
10       10.9576        6.8729    7.8493           7.4006
11       0.8989         0.7904    1.7261           2.6125
12       0.0436         0.0371    0.2502           0.4200
13       0.0312         0.0030    0.0180           0.0291
14       0.0001         0.0002    0.0015           0.0024
CPU (s)  4.25                     4.11

Fig. 3. The sample paths of the log-posterior for SAMC and RJMCMC.

To provide some basis for comparison, RJMCMC was also applied to the same simulated data with the same parameter settings, and the posterior distribution P(χ_k|Z) was estimated. Each algorithm was run 10 times independently. The two algorithms performed exactly the same number of energy evaluations in each run, and the probability estimates generated by SAMC and RJMCMC are consistent. However, as shown in Table 3, the SAMC estimates have smaller variances than the RJMCMC estimates. Note that SAMC samples equally from each model space, while RJMCMC samples from each model space proportionally to its probability. We also examined the evolving sample paths of the model under SAMC and RJMCMC. Fig. 3 shows that the samples generated by SAMC mix and reach the maximum value faster than those of the RJMCMC sampler.

SAMC differs from RJMCMC in the estimation of the model probabilities and in the way a new sample is accepted, i.e., in the acceptance probabilities of the three types of moves ("death", "simultaneous" and "birth"). SAMC accepts a new sample with an adjusted probability and works in the reverse direction of the estimation error of the model probability. This adaptability of SAMC overcomes difficulties in dimension-jumping moves and guarantees the convergence of the algorithm. The RJMCMC algorithm, in contrast, uses different acceptance probabilities. The acceptance probability of the "death" move in RJMCMC is

    \min\left\{ 1,\; \frac{P(x^*_{k-1}|Z)\,q_{k-1,k}}{P(x_k^{(t,l)}|Z)\,q_{k,k-1}} \cdot \frac{1}{c_{u+1}-c_u-1} \right\}. \qquad (19)

For the "simultaneous" and "birth" moves, the respective acceptance probabilities are

    \min\left\{ 1,\; \frac{P(x^*_k|Z)}{P(x_k^{(t,l)}|Z)} \right\} \quad\text{and}\quad \min\left\{ 1,\; \frac{P(x^*_{k+1}|Z)\,q_{k+1,k}}{P(x_k^{(t,l)}|Z)\,q_{k,k+1}}\,(c_{u+1}-c_u-1) \right\}, \qquad (20)

where u is drawn according to the "death", "simultaneous" or "birth" move. In RJMCMC each model is sampled with a frequency proportional to its probability, without the self-adjusting ability of SAMC. In simulations, RJMCMC often stays longer in a model with a significantly higher probability than its neighboring models. Since SAMC updates the estimates of the model probabilities on the logarithmic scale, it works better for models with large differences in probabilities (Table 3). RJMCMC, on the other hand, does not work as well for such models, since it is meant for a group of models with comparable probabilities (Liang et al., 2007).

4.2. Example of fat data from Hanwoo bulls and steers

Palatability of meat is generally described in terms of tenderness, juiciness and flavor scores (Forrest, 1975). Among the different parameters that determine palatability, marbling, or intramuscular fat content, has been a major component due to its relationship with beef flavor, tenderness and juiciness (Wheeler et al., 1994). Sensory evaluations were done on ten


Fig. 4. A plot of juiciness and tenderness.

Table 4
Relative sampling frequencies of the subregions for the fat data.

Index:          2        3        4        5         6         7         8         9         10
Frequency (%):  99.0401  99.5193  99.8748  100.1133  100.2294  100.2443  100.3172  100.3005  100.3613

Table 5
The 10 models with the largest log-posterior values sampled by SAMC. The second column shows the log-posterior values of the a posteriori models; the fifth column gives the BIC values for model selection. For model No. 1, the fat percentage corresponding to each change-point is given in parentheses.

No  Log-posterior  # change-points  Change patterns                                                             BIC
1   1124.8646      7                (2 (0.05), 3 (0.06), 80 (4.09), 83 (4.23), 233 (15.53), 245 (17.26), 251 (18.84))  -2204.8828
2   1124.6780      5                (2, 3, 233, 244, 251)                                                       -2215.7211
3   1124.3990      6                (8, 80, 81, 233, 245, 251)                                                  -2209.5575
4   1124.3872      5                (2, 3, 233, 245, 251)                                                       -2215.1396
5   1124.2974      8                (2, 3, 80, 83, 233, 245, 249, 251)                                          -2198.1425
6   1124.2969      4                (233, 245, 250, 251)                                                        -2220.5647
7   1123.9556      4                (233, 244, 250, 251)                                                        -2219.8821
8   1123.8123      8                (2, 3, 80, 81, 83, 233, 245, 251)                                           -2197.1723
9   1123.7988      6                (80, 81, 233, 246, 250, 251)                                                -2208.3570
10  1123.7576      7                (1, 2, 3, 233, 244, 250, 251)                                               -2202.6687

various cuts of Korean Hanwoo bulls and steers in Korea in 2006. In Korean cuisine, roasting is a popular cooking method for beef. The fat percentage of each cut was measured to analyze its relation to the taste of beef, given their association (Jennings et al., 1978; Indurain et al., 2009). Consumers were asked to taste and score the samples from 0 (dislike extremely) to 100 (like extremely) for tenderness, juiciness and flavor. Fig. 4 shows the plot of juiciness and tenderness. It is of interest to determine at which fat percentage levels the sensory evaluation of beef taste changes. We apply our method to find multiple change-points of fat percentage with the bivariate sensory evaluation data. The sequential design points were determined and sorted by the fat percentage.

We partitioned the sample space by the model index, setting k_min = 2, k_max = 10, λ = 3, and Λ_0^{-1} = [1 0.8; 0.8 1]. SAMC was run for 2.0 × 10^6 iterations. Table 4 shows the relative sampling frequencies obtained in one run. The approximate equality of the relative sampling frequencies across the subregions implies the appropriateness of the parameter settings for SAMC. The highest log-posterior value and the maximum posterior change-point estimates, (2, 3, 80, 83, 233, 245, 251), are shown in Table 5 along with the nine next-best models.

Fig. 5(a) shows that most of the models with high posterior probabilities include three clusters of change-points, around 2, 82 and 242. Since the change-point estimates 2 and 3, and 80 and 83, are adjacent points, we select as the best model the one with the 5 change-point estimates (3, 83, 233, 245, 251), by the maximum of the posterior density. Fig. 5(b) shows the plot of juiciness and tenderness, distinguished by the estimated change-points (3, 83, 233, 245, 251), i.e., (0.06, 4.23, 15.53, 17.26, 18.84) in fat percent. Fig. 5(c) and (d) show the locations of the change-points. To examine the Gaussian hypothesis, we perform the Shapiro–Wilk test for bivariate normality in each subregion divided by the change-points. Table 6 provides the p-values of the Shapiro–Wilk tests. Chi-square plots for the third subregion (the one with the smallest Shapiro–Wilk p-value) are shown in Fig. 6(a) and (b). Excluding the points with the eight highest Mahalanobis distances, the Shapiro–Wilk test gives a p-value of 0.1729. Table 6 gives the change-points of beef taste, characterized by juiciness and tenderness, according to the fat percent.
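A chi-square plot of the kind shown in Fig. 6 compares the ordered squared Mahalanobis distances of a subregion with chi-square quantiles. A minimal sketch (the function name is ours; matplotlib is assumed for the plot):

    import numpy as np
    from scipy.stats import chi2
    import matplotlib.pyplot as plt

    def chisq_plot(seg, drop_highest=0):
        """Plot ordered squared Mahalanobis distances of the (n, d) segment
        against chi-square(d) quantiles, optionally dropping the largest ones."""
        n, d = seg.shape
        diff = seg - seg.mean(axis=0)
        inv = np.linalg.inv(np.cov(seg.T))
        d2 = np.sort(np.einsum('nj,jk,nk->n', diff, inv, diff))
        if drop_highest:
            d2 = d2[:-drop_highest]
        q = chi2.ppf((np.arange(1, len(d2) + 1) - 0.5) / len(d2), df=d)
        plt.scatter(q, d2)
        plt.xlabel('Chi-square quantile')
        plt.ylabel('Squared Mahalanobis distance')
        plt.show()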



Fig. 5. The results for the change-point example: (a) the posterior histogram of change-point positions constructed using the posterior samples generated by SAMC; (b) a plot of juiciness and tenderness, distinguished by the maximum log-posterior estimate of the change-point positions; (c) a plot of the maximum log-posterior estimate of the change-point positions, with the change-point positions (vertical lines) and the means of juiciness in each subregion (horizontal lines); (d) same as (c) for tenderness.

Table 6
The mean and covariance estimates in each subregion separated by the estimated change-points.

Partition (no. of data)  Change-point position  Fat percentage  Mean (Juicy, Tender)  Covariance [Var(Juicy), Cov; Cov, Var(Tender)]  Bivariate normality test (p-value)
1 (3)                    3                      0.03-0.06       (74.6850, 63.5650)    [19.72627, 88.56120; 88.5612, 831.3330]         -
2 (80)                   83                     0.08-4.23       (69.7476, 62.1356)    [103.6607, 112.8811; 112.8811, 215.3725]        0.0870
3 (150)                  233                    4.39-15.53      (74.0979, 67.7791)    [79.5242, 94.9679; 94.9679, 176.3116]           0.0033
4 (12)                   245                    15.60-17.26     (83.8963, 83.2638)    [15.6461, 16.5521; 16.5521, 26.2289]            0.0181
5 (6)                    251                    17.66-18.84     (71.2708, 67.0792)    [76.7372, 20.8099; 20.8099, 125.6797]           0.0347
6 (21)                   -                      19.01-26.24     (80.9586, 80.6802)    [70.4805, 96.0703; 96.0703, 154.8323]           0.0680

5. Concluding remark

In this paper, we have proposed a Bayesian multiple change-point model with a multivariate normal distribution for multiple change-point identification. As a computational tool for the posterior calculation, we adopted SAMC, because it efficiently finds optimal values without getting stuck in local minima or maxima. The numerical results show that SAMC performs better than RJMCMC. The SAMC algorithm is a promising tool for Bayesian change-point estimation when the number of change-points is unknown.


Fig. 6. Chi-square plots for assessing the bivariate normality of the third subregion (84-233): (a) chi-square plot of all 150 observations, including the 8 with the highest Mahalanobis distances (p-value = 0.0033); (b) chi-square plot of the 142 observations remaining after removing the 8 with the highest Mahalanobis distances.

In an application of the methodology to the Hanwoo sensory evaluation data, meaningful change-points affecting the taste of beef were detected and estimated. Further work on multi-dimensional change-point estimation is required when the data do not follow a multivariate Gaussian distribution.

Acknowledgements

We thank Dr. Soohyun Cho and the Korean National Institute of Animal Science for providing the data on Hanwoo beef.

References

Barry, D., Hartigan, J.A., 1993. A Bayesian analysis for change-point problems. Journal of the American Statistical Association 88, 309–319.
Carlin, B.P., Gelfand, A.E., Smith, A.F.M., 1992. Hierarchical Bayesian analysis of changepoint problems. Applied Statistics 41, 389–405.
Chen, J., Gupta, A.K., 2000. Parametric Statistical Change Point Analysis. Birkhäuser, New York.
Chernoff, H., Zacks, S., 1964. Estimating the current mean of a normal distribution which is subject to changes in time. Annals of Mathematical Statistics 35, 999–1018.
Crowley, E.M., 1997. Product partition models for normal means. Journal of the American Statistical Association 92, 192–198.
Forrest, R.J., 1975. Effects of castration, sire and hormone treatments on the quality of rib roasts from Holstein–Friesian males. Canadian Journal of Animal Science 55–87.
Gooijer, J.G., 2005. Detecting change-points in multidimensional stochastic processes. Computational Statistics & Data Analysis 51 (3), 1892–1903.
Green, P.J., 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.
Hawkins, D.M., 2001. Fitting multiple change-point models to data. Computational Statistics & Data Analysis 37, 323–341.
Hinkley, D.V., 1970. Inference about the change-point in a sequence of random variables. Biometrika 57, 1–17.
Indurain, G., Carr, T.R., Goni, M.V., Insausti, K., Beriain, M.J., 2009. The relationship of carcass measurements to carcass composition and intramuscular fat in Spanish beef. Meat Science 82, 155–161.
Jennings, T.G., Berry, B.W., Joseph, A.L., 1978. Influence of fat thickness, marbling and length of aging on beef palatability and shelf-life characteristics. Journal of Animal Science 46 (3), 658–665.
Liang, F., 2007a. Annealing stochastic approximation Monte Carlo for neural network training. Machine Learning 68 (3), 201–233.
Liang, F., 2007b. Improving SAMC using smoothing methods: theory and applications to Bayesian model selection problems. Technical report. Texas A&M University.
Liang, F., Liu, C., Carroll, R.J., 2007. Stochastic approximation in Monte Carlo computation. Journal of the American Statistical Association 102, 305–320.
Loschi, R.H., Cruz, F.R.B., 2005. Extension to the product partition model: computing the probability of a change. Computational Statistics & Data Analysis 48, 255–268.
Smith, A.F.M., 1975. A Bayesian approach to inference about a change-point in a sequence of random variables. Biometrika 62, 407–416.
Son, Y.S., Kim, S.W., 2005. Bayesian single change point detection in a sequence of multivariate normal observations. Statistics 39 (5), 373–387.
Venter, J.H., Steel, S.J., 1996. Finding multiple abrupt change points. Computational Statistics & Data Analysis 22, 481–504.
Wheeler, T.L., Cundiff, L.V., Koch, R.M., 1994. Effect of marbling degree on beef palatability in Bos taurus and Bos indicus cattle. Journal of Animal Science 72, 3145–3151.