Bootstrap and Bayesian bootstrap clones for censored Markov chains

Bootstrap and Bayesian bootstrap clones for censored Markov chains

Journal of Statistical Planning and Inference 128 (2005) 459 – 474 www.elsevier.com/locate/jspi Bootstrap and Bayesian bootstrap clones for censored...

288KB Sizes 10 Downloads 353 Views

Journal of Statistical Planning and Inference 128 (2005) 459 – 474

www.elsevier.com/locate/jspi

Bootstrap and Bayesian bootstrap clones for censored Markov chains Cheng-Der Fuha , Edward H. Ipb;∗ a Institute

b Marshall

of Statistical Science, Academia Sinica, Taipei, Taiwan School of Business, University of Southern California, CA 90089, USA Received 13 September 2002; accepted 22 November 2003

Abstract This is a study of the behaviors of the naive bootstrap and the Bayesian bootstrap clones designed to approximate the sampling distribution of the Aalen–Johansen estimator of a nonhomogeneous censored Markov chain. The study shows that the approximations based on the Bayesian bootstrap clones and the naive bootstrap are 9rst-order asymptotically equivalent. The two bootstrap methods are illustrated by a marketing example, and their performance is validated by a Monte Carlo experiment. c 2003 Elsevier B.V. All rights reserved.  MSC: 62G05; 62F05 Keywords: Product estimator; Bayesian bootstrap clones; Transition probability matrix; Con9dence band

1. Introduction In studies in the medical and social sciences, one is often concerned with the evaluation of two or more time-dependent stochastic events and their relationships with one another. For example, cancer clinical trials commonly involve time until disease remission and/or relapse, time until treatment-related toxicity, and time until death. For this kind of study, Weiss and Zelen (1965) proposed a semi-Markov model in which the disease history of a patient was considered as a sequence of transitions from one disease state to another. To extend semi-Markov methods to censored data, Lagakos et al. (1978) developed non-parametric likelihood methods for multi-state stochastic ∗

Corresponding author. Department of Public Health Sciences, Medical Center Blvd, Wake Forest University, Winston-Salem NC 25157, USA. E-mail address: [email protected] (E.H. Ip). c 2003 Elsevier B.V. All rights reserved. 0378-3758/$ - see front matter  doi:10.1016/j.jspi.2003.11.009

460

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

processes. Based on a counting process approach, Voelkel and Crowley (1984) developed large-sample results for these methods. Methodologies for semi-Markov models have also developed in the contexts of demography and epidemiology, where a population process at the individual level can frequently be represented by a continuous-time Markov chain with a 9nite state space (cf. Hoem, 1976). Accordingly, the population phenomena may be described by the transition intensities of the model. Therefore, it is of great interest to estimate these functions. In the case of continuous-time censored Markov chains, which is a special case of semi-Markov processes, Aalen and Johansen (1978) introduced the so-called Aalen–Johansen estimator for general 9nite-state Markov processes. Their method allows rather general censoring patterns. Furthermore, they used product-integration, combined with tools from counting processes, martingales, and stochastic integrals, to study small- and large-sample properties of the estimator. Extendors of their work include Fleming (1978a, b), who de9ned the Aalen–Johansen estimator as an implicit solution to the Kolmogorov equation and derived large-sample results. Borgan and Ramlau-Hansen (1985) gave a thorough presentation of related estimators for intensities for continuous-time Markov models and a theory of incidence rates. A discrete-time version of the censored Markov processes was given by Phelan (1988), in which he applied this model to a case study of a malaria survey on a human population in Nigeria. In the context of Bayesian analysis, Hjort (1990) derived a non-parametric Bayes estimator for the integrated intensity in Markov process models by using the beta process prior. More recently, in an unpublished manuscript, James (1998) proposed a general bootstrap procedure that includes naive bootstrap and Bayesian bootstrap clones for approximating the 9nite sample distributional characteristics of the continuous time Aalen–Johnasen estimator. In particular, the author introduced a new sampling framework which induces a martingale bootstrap structure. Using the martingale property, large-sample results for the general bootstrap procedure were obtained. In this paper, we investigate the behaviors of bootstrap and Bayesian bootstrap clones that approximate the sampling distribution of the Aalen–Johansen estimator for discrete-time, non-homogeneous, censored Markov processes. Analogous results for the continuous time scale, as noted previously, were obtained in James (1998). Here we focus on discrete-time processes, and instead of using James’ martingale approach, we use a diGerent method to establish large-sample results. To set up notation, we follow Phelan (1988): Let n denote a time point for a Markov chain with 9nite state space E = {1; 2; : : : ; s}, and n 6 N . For n ¿ 1, let P n = (pijn ; i; j ∈ E) denote one-step transition probabilities at time n. Furthermore, let P = {P(l; m); 0 6 l 6 m} denote a family of Markov transition probabilities such that  P(l; m) = Pn ; l¡n6m

where P(l; m)=I if the product above is empty, and r denote the sample size. For each r ¿ 1 and 9xed N ¿ 0, let (; F; r ) denote a probability space on which a family of processes {(Xnk ; Jnk ); 0 6 n 6 N; k =1; 2; : : : ; r} is de9ned, and let F r ={Fnr ; 0 6 n 6 N } be a given increasing family of the sub--algebras of F. Note that (; F) is taken independent of r.

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

461

Assumptions. (1) For each k = 1; : : : ; r, J k = {Jnk ; Fnr ; 0 6 n 6 N } is a {0; 1}-valued stochastic process r ; i.e. J k is F r -predictable. in which Jnk ∈ Fn−1 (2) The random variables Xnk for k ∈ {k : Jnk =1} are independent; moreover, for i; j ∈ E k and {Xn−1 = i}, we have r P(Xnk = j|Fn−1 ) = pijn Jnk ;

1 6 n 6 N;

where Jnk is an indicator function about whether the kth process is “at risk” at time n. Here, it is assumed that when a realization of the process is censored, it is no longer at risk. For n = 1; : : : ; N and i; j ∈ E de9ne r  k LNn (i; j) = I (Xn−1 = i; Xnk = j; Jnk = 1) k=1

and Yn (i) =

r 

k I (Xn−1 = i; Jnk = 1):

k=1

For a given observation {(Xnk ; Jnk ); 1 6 k 6 r; 1 6 n 6 N }, and for any i; j ∈ E, de9ne the estimate pˆ nij of pijn by  LN (i; j)=Yn (i); 1 6 n 6 N; if Yn (i) ¿ 0; n pˆ ij = 0; otherwise: For 1 6 n 6 N , de9ne the random matrix Bˆ n = (Bˆ n (i; j); i; j ∈ E) by  n     pˆ lij ; for i = j;   l=1 Bˆ n (i; j) =     Bˆ n (i; j); for i = j: −   j=i

Following the idea in Aalen and Johansen (1978) for continuous-time, non-homogeneous, censored Markov chains, Phelan (1988) introduced the discrete-time Aalen– ˆ m); 0 6 l 6 m} as Johansen product estimator Pˆ = {P(l;  ˆ m) = P(l; (I + LBˆ n ); (1) l¡n6m

ˆ m)=I if product (1) is empty. The quantity Pˆ = P(l; ˆ m) where LBˆ n = Bˆ n − Bˆ n−1 and P(l; can be interpreted as the transition probabilities from time l to time m. According to Theorem 4.2 of Phelan (1988), the asymptotic distribution of the product estimator Pˆ in (1) is Gaussian, as r → ∞. Furthermore, the limiting covariance function is given by (4.6) of Phelan (1988), which involves the Kroneckor product of matrices.

462

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

In this paper, we investigate the 9rst-order eGect of using bootstrap and arbitrary positive random variables in Bayesian bootstrap for obtaining an approximation to ˆ While the limiting covariance function for Pˆ is availthe sampling distribution of P. able from large-sample theory, several motivations exist for considering bootstrap and Bayesian bootstrap methods. First, as pointed out by Efron and Tibshirani (1993), the bootstrap can be used as a substitute for mathematical calculation for covariance functions when the latter becomes intraction in complex situations. Second, in terms of computational complexity, the naive bootstrap method is O(Bns2 ) where s = Card(E), B is the number of bootstrap replications, while the standard method (Phelan’s equation (4.6)) is O(n3 s4 ). We performed a simulation experiment that showed that the CPU time required by the standard method could be several times higher than the bootstrap method for moderate values of n and s. For example, for n = 20, s = 15, the CPU times for the standard and bootstrap methods were, respectively, 14.4 and 2:8 s (B = 200). Finally, another motivation for considering the bootstrap and Bayesian bootstrap methods is their potential for providing better coverage for the population parameters when sample size is not large. First-order ePciency results of the bootstrap and Bayesian bootstrap methods are provided in Lo (1993). James (1998) surmised that the bootstrap method is also second-order ePcient. While presently there does not exist theoretical result on second-order ePciency of the bootstrap method, in this paper we provide simulation results that show that the bootstrap and Bayesian bootstrap methods generally outperform the standard method in terms of coverage probability (for a related report, see Lo, 1991). The remainder of this paper is organized as follows. In Section 2, we study√and report results on a naive bootstrap to approximate the sampling distribution of r(Pˆ − P). Results regarding a Bayesian bootstrap clones (BBC) procedure will be described in Section 3. In Section 4, we illustrate both bootstrap methods by a marketing example, in which brand switching probabilities are analyzed. Numerical simulations that compare the normal approximation, bootstrap, and BBC are described in Section 4. Finally, some open problems are outlined in the last section. 2. Bootstrap method In this section, we investigate the bootstrap method for censored Markov chains (as de9ned in Section 1), in which the observations are independent, identically distributed (i.i.d.) random processes. A substantial literature exists for bootstrap methods for Markov processes. For the approximation of sampling distribution of the pivot of the estimator of transition probabilities for a single observed Markov chain with 9nite states, Kulperger and Prakasa Rao (1990) proposed a parametric bootstrap method (cf. Efron, 1979); further investigation was given by Datta and McCormick (1992). The case for countable state space was investigated by Athreya and Fuh (1992), in which a 9rst-order large-sample theory was developed. Small-sample properties for bootstrapping Markov chains were further given by Fuh (1993). The approach described in this paper can be viewed as a generalization of Efron’s (1981) bootstrap method for the Kaplan–Meier estimator of survival data. When the

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

463

state space has only two states, one state represents the alive status, while the other state represents death (an absorbing state). Efron’s method of producing an approximation √ ˆ − S(t)) proceeds to the sampling distribution of the Kaplan–Meier estimator r(S(t) as follows: Obtain the bootstrap sample (x∗1 ; J ∗1 ); : : : ; (x∗r ; J ∗r ) by random sampling from (x1 ; J 1 ); : : : ; (xr ; J r ) with replacement, where J k = 1 or 0, respectively, denotes uncensored data or censored data, k = 1; : : : ; r. Construct a√Kaplan–Meier estimator ˆ Repeat Sˆ∗ (t) based on (x∗1 ; J ∗1 ); : : : ; (x∗r ; J ∗r ) and evaluate S∗ (t) = r(Sˆ∗ (t) − S(t)). this process B times to get S∗1 (t); : : : ; S∗B (t), and use the empirical distribution func√ ˆ − S(t)). tion of S∗1 (t); : : : ; S∗B (t) to approximate the (sampling) distribution of r(S(t) First-order large-sample theory for this procedure was discussed, independently, by Akrias (1986), Lo and Singh (1986), and Horvath and Yandell (1987). Second-order ePciency was given by Lai and Wang (1993) and Chen and Lo (1996). A Bayesian bootstrap for this problem has been investigated by Lo (1993), in which the algorithm was based on i.i.d. simulation from exponential distribution instead of resampling from observations. James (1997) derived Edgeworth expansions for a class of weighted bootstrap methods for the Kaplan–Meier estimator, of which the Aalen–Johansen estimator can be considered as a generalized form. Here, we consider a discrete-time Markov process and generalize Efron’s √ (1981) censored data bootstrap method to estimate the sampling distribution of r(Pˆ − P). The bootstrap algorithm for the Aalen–Johansen estimator can be described as follows. Let (x; j) = {(xkn ; jnk ); 0 6 n 6 N; k = 1; : : : ; r} be a realization from a non-homogeneous, censored Markov chain {(Xnk ; Jnk ); 0 6 n 6 N; k = 1; : : : ; r}. ∗k (1) Draw a family of bootstrap samples {(x∗k n ; jn ); 0 6 n 6 N; k = 1; : : : ; r} by inˆ the empirical distribution of dependently sampling r times with replacement from F, putting mass 1=r at each chain {(xkn ; jnk ); 0 6 n 6 N }. The bootstrap analogous to pˆ nij is de9ned by ∗ ∗ pˆ n∗ 1 6 n 6 N; ij = LNn (i; j)=Yn (i);  r r ∗k ∗k where LNn∗ (i; j) = k=1 I (Xn−1 = i; Xn∗k = j; Jn∗k = 1), and Yn∗ (i) = k=1 I (Xn−1 = ∗k i; Jn = 1): For 1 6 n 6 N , de9ne the random matrix Bˆ ∗n = (Bˆ ∗n (i; j); i; j ∈ E) by  n     pˆ l∗ for i = j;  ij ;  l=1 ∗ ˆ Bn (i; j) =     Bˆ ∗n (i; j); for i = j: −  j=i

The bootstrap estimator of P is given by  (I + LBˆ ∗n ): Pˆ ∗ (l; m) = l¡n6m

√ (2) Approximate the (sampling) distribution of r(Pˆ − P) by the bootstrap distri√ ∗ ˆ given (x; j). bution of r(Pˆ − P) Our 9rst result states that for censored Markov chain models the bootstrap distribution has a limiting distribution that is the same as the sampling distribution of

464

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474



r(Pˆ − P). In this sense, the bootstrap is consistent in approximating the sampling distribution for censored Markov chains with a 9nite state space. The following notations are needed in Theorem 1. For i ∈ E, let yi denote a strictly positive function on [0; N ], , denote N independent mean zero Gaussian and let Vn = (Vn (i; j); i; j ∈ E); n = 1; : : : ; N random matrices. For i ∈ E, Vn (i; i) = − j=i Vn (i; j), and for i = i; {Vn (i; j); j ∈ E} and {Vn (i ; j); j ∈ E} are independent. Assume that for i = j, Vn (i; j) has variance pijn (1−pijn )=yi (n), and that for j  = j, Vn (i; j) and Vn (i; j  ) has covariance −pijn pijn  =yi (n). De9ne the matrix-valued process U = {Un ; 1 6 n 6 N } by Un =

n 

P(0; l − 1)Vl P(l; n):

l=1

Theorem 1. Assume that a censored Markov chain model as de4ned in Section 1, which satis4es Assumptions 1 and 2, is given. Furthermore, for almost all sample sequences, suppose that for i ∈ E and n = 1; : : : ; N , Yn (i)=r → yi (n) as r → ∞. Then, as r → ∞, √ ∗ ˆ r(Pˆ − P)|(x; j) converges weakly to U: Proof. Note that √

ˆ n)) = r(Pˆ ∗ (0; n) − P(0;

n 

√ ˆ n): Pˆ ∗ (0; l − 1) r(LBˆ ∗l − LBˆ l )P(l;

(2)

l=1

Therefore, to prove weak convergence of verify that for given (x; j):



ˆ n)), by (2), we need to r(Pˆ ∗ (0; n) − P(0;

ˆ n)| → 0 in probability, as r → ∞. (i) max06n6N |Pˆ ∗ (0; n) − P(0; √ (ii) For each n = 1; : : : ; N , and l 6 n, the conditional distribution of r(LBˆ ∗l (i; j) − r converges to a normal distribution. LBˆ l (i; j)) given Fn−1 For the proof of (i), we have ˆ n)| 6 2N max |Bˆ ∗n − Bˆ n |: max |Pˆ ∗ (0; n) − P(0;

06n6N

06n6N

We continue to use superscript ? to indicate a bootstrap analogue. It is easy to see r that for i; j ∈ E, conditional on Fn−1 , Bˆ ∗n (i; j) − Bˆ n (i; j) is a martingale with respect to ∗r Fn−1 . By Doob’s inequality, we have, for any given " ¿ 0,

P ∗ max |Bˆ ∗n (i; j) − Bˆ n (i; j)| ¿ " 16n6N

1 1 6 2 E ∗ {Bˆ ∗N (i; j) − Bˆ N (i; j)}2 = 2 E ∗ " "

 N  l=1

2 (LBˆ ∗l (i; j)

− LBˆ l (i; j))

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

=



2 N N  LNl∗ (i; j) 1 ∗ ∗ l ˆ ∗l (i; j) − LBˆ l (i; j))2 = 1 E (L B E − p ˆ ij "2 "2 Yl∗ (i) l=1

=

465

l=1

N 1 1  l l ∗ ; p ˆ (1 − p ˆ )E ij ij "2 Yl∗ (i) l=1

ˆ Since Yl∗ (i)=r → yi (l) almost surely, by the where E ∗ (·) denotes expectation under F. ∗ ∗ dominated convergence theorem, E 1=Yl (i) → 0 as r → ∞. Hence, we obtain (i). ∗r For the proof of (ii), let i ∈ E, then conditional upon Fn−1 , {LNn∗ (i; j); j ∈ E} are independent, multinomial, random variables with the parameters {Yn∗ , pˆ ij ; j ∈ E}. Thus, when Yn∗ ¿ 0, we have   √ r(LBˆ ∗n (i; j) − LBˆ n (i; j))   → N (0; 1) in distribution as r → ∞:  (r=Yn∗ (i))pˆ nij (1 − pˆ nij )  (x; j)

By the assumptions and Theorems 4:1 and 4:2 of Phelan (1988), the asymptotic mean √ and variance of r(LBˆ ∗n (i; j) − LBˆ n (i; j)) are 0 and pij (1 − pij )=yi (n), respectively. Combining (i) and √ (ii), and applying (iii) of Theorem 3:4 of Jacod et al. (1982), ˆ | (x; j) converges weakly to U as r → ∞. we conclude that r(Pˆ ∗ − P)

3. Bayesian bootstrap method Our second result concerns the Bayesian bootstrap method for approximating the sampling distribution of the Aalen–Johansen estimator (1). The Bayesian bootstrap method was initially proposed by Rubin (1981) to approximate the posterior distribution for uncensored i.i.d. data with general priors. Lo (1987) showed that the Bayesian bootstrap and Efron’s bootstrap are 9rst-order asymptotically equivalent in approximating the sampling distribution of the sample mean. Weng (1989) showed that the Bayesian bootstrap approximation is superior to the standard mean with respect to a Dirichlet prior and studied its second-order property. Lo (1991) used arbitrary positive random variables in Bayesian bootstrap and examined its 9rst-order property. For random censoring models, Lo (1993) developed the Bayesian bootstrap based on simulation of i.i.d. exponential distribution. Fuh and Fan (1997) generalized the concept in their paper to 9nite-state Markov chains. They showed that the Bayesian bootstrap is second-order consistent in approximating the pivot of the posterior distribution of the transition probabilities. Fan (1995) considered the Bayesian bootstrap clones (BBC) procedure of the 9nite-state Markov chain problem. We show here that BBC is 9rst-order consistent, in the sense that the BBC distribution converges weakly to U . Let (x; j)={(xkn ; jnk ); 0 6 n 6 N; k=1; : : : ; r} be a realization from a non-homogeneous, censored Markov chain {(Xnk ; Jnk ); 0 6 n 6 N; k = 1; : : : ; r}, where {(x0k ; j0k ); k = 1; : : : ; r}

466

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

is assumed to be known. The key idea here is to rewrite pˆ nij = LNn (i; j)=Yn (i) as  LNn (i; j) t∈B (n) 1 n = Yn ij(i) ; (3) pˆ ij = Yn (i) t=1 1 j j where Bij (n) = { l=1 LNn (i; l − 1) + 1; : : : ; l=1 LNn (i; l)}, with LN sn (i; 0) = 0. Note that Bij (n) can be obtained by means of 9rst partitioning {1; 2; : : : ; i=1 Yn (i)} into s groups of sizes Yn (1); : : : ; Yn (s), say B1 (n); : : : ; Bs (n), and then dividing each Bi (n) into s disjoint subgroups of sizes LNn (i; 1); : : : ; LNn (i; s). These subgroups are denoted by Bij (n). The concept underlying BBC is to use simulation instead of resampling. That is, for each 9xed i, j we replace the “1’s” in (3) by i.i.d. positive random variables “Zit ’s” and obtain  t∈B (n) Zit ∗n : p˜ ij = Ynij(i) t=1 Zit For 1 6 n 6 N , de9ne the random matrix B˜ ∗n = (B˜ ∗n (i; j); i; j ∈ E) by  n     p˜ ∗l for i = j;  ij  l=1 ∗ B˜ n (i; j) =     B˜ ∗n (i; j) for i = j: −   j=i

Thus, the BBC estimator of P is  P˜ ∗ (l; m) = (I + LB˜ ∗n ): l¡n6m

The BBC algorithm for approximating the sampling distribution of Pˆ is described as follows: (1) Simulate i.i.d. positive random variables Zit with distribution G, for t=1; : : : ; Yn (i), i = 1; : : : ; s. (2) Replace the “1’s” in (3) by “Zit ’s” to obtain p˜ ∗n ij . (3) Repeat the previous two steps a large number of√times, say B times, to obtain ˜∗ ˆ P˜ ∗1 ; : : : ; P˜ ∗B , and use the √ empirical distribution based on %2 r(P b − P), for b = 1; : : : ; B to approximate that of r(Pˆ − P), where % = = and ,  denote the mean and variance of G. The next result states that for censored Markov chain models, the limiting distribution √ of the BBC distribution is the same as the sampling distribution of r(Pˆ − P). Theorem 2. Assume a censored Markov chain model as de4ned in Section 1. Under the conditions of Theorem 1, we have, for almost all sample sequences, that √ ˆ | (x; j) converges weakly to U: % r(P˜ ∗ − P)

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

467

Proof. Note that √

ˆ n)) = % r(P˜ ∗ (0; n) − P(0;

n 

√ ˆ n): P˜ ∗ (0; l − 1)% r(LB˜ ∗l − LBˆ l )P(0;

l=1

By an argument similar to that of Theorem 1, we obtain ˆ n) | → 0 in probability as r → ∞. (i) max06n6N | P˜ ∗ (0; n) − P(0; It remains to be shown that √ (ii) for each n = 1; : : : ; N , the conditional distribution of % r(LB˜ ∗l (i; j) − LBˆ l (i; j)) given the data converges to a normal distribution. For any 9xed n, i; j ∈ E, de9ne  SLNn (i;j) (n) ≡ (Zit − ); t∈Bij (n) Yn (i)

SYn (i);Yn (i)−LNn (i;j) (n) ≡



(Zit − ) −

t=1



(Zit − ):

t∈Bij (n)

Then, conditional upon (x; j), √ % r(LB˜ ∗n (i; j) − LBˆ n (i; j)) 





t∈Bij (n) Zit Yn (i) t=1 Zit

=% r



%r

= Yn (i) t=1

Zit

− pˆ nij

LNn (i; j) r

1−

LNn (i; j) Yn (i)

SLN (i;j) (n)  n LNn (i; j)

SY (i);Y (i)−LNn (i;j) (n) 1 LNn (i; j)  −√ Yn (i) − LNn (i; j) n n r Yn (i) Yn (i) − LNn (i; j)  =



r

Yn (i) t=1

 −

Zit

Yn (i) t=1

= (I) − (II)

SLN (i;j) (n) LNn (i; j) (1 − pˆ nij )  n Yn (i)  LNn (i; j) t=1 Zit



r



Zit

(say):

Yn (i) − LNn (i; j) n SYn (i);Yn (i)−LNn (i;j) (n) pˆ ij  Yn (i)  Yn (i) − LNn (i; j) t=1 Zit

468

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

Yn (i) Since t=1 Zit =Yn (i) →  almost surely, it is easy to check that for almost all sample sequences  yi (n)(I)  → N(0; 1) in distribution; (1 − pˆ nij ) pˆ nij  yi (n)(II)  → N(0; 1) in distribution: n pˆ ij 1 − pˆ nij Therefore, as r → ∞, √ % r(LB˜ ∗n (i; j) − LBˆ n (i; j)) → N(0; 1) pˆ nij (1 − pˆ nij )=yi (n)

in distribution:

This proves (ii). Remark. By using the Markov property and independence among r censored Markov chains, we obtain the likelihood L(data) ˙

N  

(pij )LNn (i; j) :

n=1 i; j

It can be proved equation beta processes are speci9ed as prior n that if in the likelihood n for Bn (i; j) = l=1 pijl , Bˆ n (i; j) = l=1 pˆ lij , where pˆ lij = LNl (i; j)=Yl (i), the posterior mean with respect to “Uat” prior, then the posterior of Bn (i; j) is also a beta process (Hjort, 1990, Theorem 5:2). However, the posterior of P is diPcult to clarify in the Bayesian framework. 4. A marketing example We illustrate the two bootstrap methods with an example from a marketing application. The data used in this example, collected from a national retail store chain in Japan, contained a history of household purchases of baby diapers. A goal of the analysis was to examine the transition intensities of brand switching among household diaper buyers. It was known that buyers sometimes switch to diGerent brands as the baby grows in size. Naturally, marketers are concerned about the evolution of the transition intensities in the entire lifecycle of diaper buyers. The original dataset contained a random sample of 20; 416 transactions (from January 1, 1996–December 31, 1999) generated by 2516 households. We only included households with at least 20 diaper purchases. Furthermore, to reduce a left-censoring problem, we only included households whose 9rst purchase was made 6 months after the January 1, 1996 start date. As a result, a sample of 121 households was eventually included in the analysis. Because each household started on a diGerent date, we aligned the data on the same time scale, so that patterns of the lifecycle could be examined. Approximately 14% of the data were right censored.

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

469

We examined the brand choice of baby diapers for these households that were recorded over a period of 20 months. There were two major and almost a dozen minor brands competing in the Japanese baby diaper market. We shall call the major brands M and R to protect anonymoity. Together, they controlled approximately 80 percent of the market share. To simplify our analysis, we grouped the brands into three categories: M, R and O (others). Accordingly, the Markov chain contains three distinct states and nine transition probabilities at each month-to-month transition. Fig. 1 shows the 95% con9dence bands of the naive bootstrap and the BBC (the distribution G = Exp(1)) methods for the nine transition probabilities in the Aalon –Johansen estimator P(0; m), m = 1; : : : ; 20. Each con9dence band was based on a bootstrap sample size of 200. In summary, we have the following observations: (1) The naive bootstrap and BBC con9dence bands generally agree quite well. (2) The widths of the con9dence bands remain relatively stable over time. (3) The con9dence bands suggest that within the evolutions of almost all of the paths there exist signi9cant changes—something that should be of general managerial interest. The programs for this analysis were written in Splus. 5. Numerical simulation We performed a simulation study to validate the equivalence of the naive bootstrap and the BBC methods. For a 2-state r censored Markov chains observed over a period of time, {(Xnk ; Jnk ); 0 6 n 6 10; k = 1; : : : ; r}, with initial state (X0k ; J0k ) = (1; 1), we compared the normal approximation, bootstrap, and BBC in approximating the con9dence interval for the transition probability matrix P. The data were generated by a non-homogeneous Markov chain with one-step transition probability at time n   1 1 1 2 − +  3 n+2 3 n+2  : Pn =  1 1 2 1  + − 3 n+6 3 n+6 The censoring time was uniformly distributed over [0; u]. For this small sample study, we used a sample size r = 100 with censoring probabilities that were 0.5, 0.25, and 0.125, respectively, at u = 20, 40 and 80. The bootstrap and BBC sample sizes were the same as the original sample size. Two parameters of interest were selected—p12 , p21 , where each is de9ned within the estimator:    p11 p12 n ; P ≡ P(0; 8) = p21 p22 0¡n68 where p11 = 1 − p12 , p22 = 1 − p21 . We simulated 95% con9dence intervals, along with the corresponding average lengths and empirical coverage probabilities. The results,

470

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

Fig. 1. The con9dence bands of the transition probabilities. The top three panels represent the Aalon– Johansen estimates and the respective con9dence bands for probabilities M → M, M → R, and M → O; the second line of panels represents R → M, R → R, and so on. The con9dence bands for BBC (G = Exp(1)) are in long dash, those for the naive bootstrap are in dots, and the solid line is the Aalon–Johansen estimate.

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

471

Table 1 Comparison of the approximately 95% con9dence intervals when censoring time is U (0; 20) r = 100

95% C.I.

A.L.

C.P.

p12 G = Exp(1) G = )(4; 1) G = )(16; 1) B.M. N.A.

(0.411,0.627) (0.413,0.630) (0.416,0.632) (0.410,0.627) (0.391,0.630)

0.216 0.217 0.216 0.217 0.239

0.886 0.879 0.873 0.880 0.905

p21 G= Exp(1) G = )(4; 1) G = )(16; 1) B.M. N.A.

(0.375,0.590) (0.373,0.588) (0.375,0.591) (0.372,0.590) (0.369,0.609)

0.215 0.215 0.216 0.218 0.240

0.888 0.899 0.884 0.882 0.915

Table 2 Comparison of the approximately 95% con9dence intervals when censoring time is U (0; 40) r = 100

95% C.I.

A.L.

C.P.

p12 G = Exp(1) G = )(4; 1) G = )(16; 1) B.M. N.A.

(0.417,0.611) (0.416,0.611) (0.412,0.611) (0.416,0.611) (0.405,0.618)

0.194 0.195 0.199 0.195 0.213

0.947 0.951 0.946 0.953 0.962

p21 G = Exp(1) G = )(4; 1) G = )(16; 1) B.M. N.A.

(0.393,0.586) (0.393,0.587) (0.388,0.584) (0.393,0.586) (0.382,0.595)

0.193 0.194 0.196 0.195 0.213

0.952 0.948 0.947 0.954 0.961

based on 1000 Monte Carlo trials, are listed in Tables 1–3. The following abbreviated notations are used in Tables 1–3: A.L.—average length, C.I.—con9dence interval, C.P.—coverage probability, N.A.—normal approximation, B.M.—bootstrap method. The results in Tables 1–3 suggest that the performance of naive bootstrap and BBC generally agree well in terms of coverage probabilities and length of con9dence intervals. As expected, the behaviors of the normal approximation, bootstrap, and BBC all get worse as u gets smaller—coverage probabilities get insuPcient for severe censoring. The bootstrap method does not perform well when data are heavily censored because the censored empirical distribution no longer well approximates the true distribution (Efron and Tibshirani, 1993, p.81). Among the three methods, the normal approximations tend to use slightly longer con9dence intervals, and as a result this

472

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

Table 3 Comparison of the approximately 95% con9dence intervals when censoring time is U (0; 80) r = 100

95% C.I.

A.L.

C.P.

p12 G = Exp(1) G = )(4; 1) G = )(16; 1) B.M. N.A.

(0.418,0.612) (0.417,0.612) (0.420,0.612) (0.418,0.612) (0.412,0.614)

0.194 0.195 0.192 0.194 0.202

0.953 0.948 0.941 0.952 0.956

p21 G = )(1; 1) G = )(4; 1) G = )(16; 1) B.M. N.A.

(0.392,0.585) (0.388,0.585) (0.387,0.584) (0.391,0.585) (0.385,0.588)

0.193 0.197 0.197 0.194 0.203

0.948 0.954 0.956 0.954 0.960

method covers the population values better than the bootstrap methods when censoring is heavy, but it also over-covers when censoring is moderate and slight. In general, the bootstrap and BBC both attain reasonably accurate coverage probabilities for the transition parameters. The programs for this simulation study were written in FORTRAN, and the computation was performed on a VAX-8650 computer. The IMSL subroutines generated the necessary random numbers.

6. Some open problems In this paper, we proposed the naive bootstrap and BBC procedures to construct approximate con9dence intervals for parameters of transition probabilities in censored Markov chains. The following are related research topics that are worthy of further investigation: (1) A theoretical study of the accuracy of bootstrap, which involves Edgeworth expansion for censored Markov chains. (2) Application of importance sampling to censored Markov chains. Importance sampling is a powerful technique for improving the ePciency of Monte Carlo simulation. In a related work Fuh et al. (1998) applied this technique to facilitate a reduction in the bootstrap replication size when simulating 9nite-state ergodic Markov chains. (3) Generalization of related work in Bayesian bootstrap to censored Markov chains. It is known that thecensored data Bayesian bootstrap survival function has the same distribution as j : t( j)6t (1 − *j ), where *j is the discrete hazard rate (cf. Lo, 1993). Some work in this direction and some second-order ePciency investigation have been conducted by the authors.

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

473

Acknowledgements This research was partially supported by the National Science Council of the Republic of China. We also acknowledge Professor Yukinobu Hamuro and the Pharma, Japan, for generously providing the data. References Aalen, O.O., Johansen, S., 1978. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand. J. Statist. 5, 141–150. Akrias, M.G., 1986. Bootstrapping the Kaplan–Meier estimator. J. Amer. Statist. Assoc. 83, 1032–1038. Athreya, K.B., Fuh, C.D., 1992. Bootstrapping Markov chains: countable case. J. Statist. Plan. Inference 33, 311–331. Borgan, O., Ramlau-Hansen, H., 1985. Demographic incidence rates and estimation of intensities with incomplete information. Ann. Statist. 13, 564–582. Chen, K., Lo, S.H., 1996. On bootstrap accuracy with censored data. Ann. Statist. 24, 569–595. Datta, S., McCormick, W.P., 1992. Bootstrap for a 9nite-state Markov chain based on i.i.d. resampling. In: Lepage, R., Billard, L. (Eds.), Exploring the Limits of Bootstrap. Wiley, New York, pp. 77–97. Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26. Efron, B., 1981. Censored data and the bootstrap. J. Amer. Statist. Assoc. 76, 312–319. Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman and Hall, New York. Fan, T.H., 1995. Bayesian bootstrap clones for 9nite state Markov chain. J. Statist. Comput. Simul. 53, 289–298. Fleming, T.R., 1978a. Nonparametric estimation for non-homogeneous Markov processes in the problem of competing risks. Ann. Statist. 6, 1057–1070. Fleming, T.R., 1978b. Asymptotic distribution results in competing risks estimation. Ann. Statist. 6, 1071–1079. Fuh, C.D., 1993. Statistical inquiry for Markov chains by bootstrap method. Statist. Sinica 3, 53–66. Fuh, C.D., Fan, T.H., 1997. A Bayesian bootstrap for 9nite-state Markov chains. Statist. Sinica 7, 1005–1019. Hjort, N.L., 1990. Nonparametric Bayes estimation based on beta processes in models for life history data. Ann. Statist. 18, 1259–1294. Hoem, J.M., 1976. The statistical theory of demographic rates. A review of current developments (with discussion). Scand. J. Statist. 3, 169–185. Horvath, L., Yandell, B.S., 1987. Convergence rates for the bootstrapped product limit process. Ann. Statist. 15, 1155–1173. Jacod, J., Klopotowski, A., MYemin, J., 1982. ThYeorZeme de la limite central et convergence fonctionelle vers un processus aZ accroissements indYependents: La mYethode des martingales. Inst. Henri PoincarYe 18, 1–45. James, J.F., 1997. A study of a class of weighted bootstrap for censored data. Ann. Statist. 25, 1595–1621. James, J.F., 1998. General exchangeable bootstraps for the Aalen–Johansen estimator. Unpublished manuscript. Kulperger, R.J., Prakasa Rao, B.L.S., 1990. Bootstrapping a 9nite state Markov chains. Sankhy[a Ser. A 51, 178–191. Lagakos, S.W., Sommer, C.J., Zelen, M., 1978. Semi-Markov models for partially censored data. Biometrika 65, 311–317. Lai, T.L., Wang, J.Q.Z., 1993. Edgeworth expansions for symmetric statistics with applications to bootstrap methods. Statist. Sinica 3, 517–542. Lo, A.Y., 1987. A large-sample study for the Bayesian bootstrap. Ann. Statist. 15, 360–375. Lo, A.Y., 1991. Bayesian bootstrap clones and a biometry function. Sankhya Ser. A 53, 320–333. Lo, A.Y., 1993. A Bayesian bootstrap for censored data. Ann. Statist. 21, 100–123. Lo, S.H., Singh, K., 1986. The product-limit estimator and the bootstrap: some asymptotic representations. Probab. Theory Related Fields 71, 455–465.

474

C. Fuh, E.H. Ip / Journal of Statistical Planning and Inference 128 (2005) 459 – 474

Phelan, M.J., 1988. Inference from censored Markov chains with applications to multi-wave panel data. Stochastic Process Appl. 29, 85–102. Rubin, D.B., 1981. The Bayesian bootstrap. Ann. Statist. 9, 130–134. Voelkel, J.G., Crowley, J.J., 1984. Nonparametric inference for a class of semi-Markov processes with censored observations. Ann. Statist. 12, 142–160. Weiss, G.H., Zelen, M., 1965. A semi-Markov model for clinical trial. J. Appl. Probab. 2, 269–285. Weng, C.S., 1989. On a second-order asymptotic property of the Bayesian bootstrap mean. Ann. Statist. 17, 705–710.