Estimation using plug-in of the stationary distribution and Shannon entropy of continuous time Markov processes




Journal of Statistical Planning and Inference 141 (2011) 2711–2725



Estimation using plug-in of the stationary distribution and Shannon entropy of continuous time Markov processes. Philippe Regnault, Laboratoire de Mathématiques Nicolas Oresme, Université de Caen, BP 5186, F-14032 Caen Cedex, France


Abstract

Article history: Received 29 April 2010; Received in revised form 31 January 2011; Accepted 22 February 2011; Available online 16 March 2011.

A natural way to deal with the uncertainty of an ergodic finite state space Markov process is to investigate the entropy of its stationary distribution. When the process is observed, it becomes necessary to estimate this entropy. We estimate both the stationary distribution and its entropy by plug-in of the estimators of the infinitesimal generator. Three situations of observation are discussed: one long trajectory is observed, several independent short trajectories are observed, or the process is observed at discrete times. The good asymptotic behavior of the plug-in estimators is established. We also illustrate the behavior of the estimators through simulation. © 2011 Elsevier B.V. All rights reserved.

Keywords: Ergodicity; Plug-in estimation; Pure jump Markov processes; Shannon entropy; Stationary distribution

1. Introduction

Shannon (1948) introduced the entropy of a random variable in the context of information theory in order to measure the amount of information, or uncertainty, of a source of information: if X is a random variable taking values in a finite set E with probability distribution P, the entropy $S(X)$ of X is the entropy $S(P)$ of its distribution, that is,
$$S(P) = -\sum_{x\in E} P(x)\log P(x). \qquad (1)$$
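As a small illustration of definition (1), a numpy sketch (the function name is ours); the convention $0\log 0 = 0$ is handled by masking zero probabilities:

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (1) of a finite distribution, in nats.
    Zero probabilities are masked, implementing 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

print(shannon_entropy([0.5, 0.5]))   # log 2 ~ 0.693, the maximum for two outcomes
print(shannon_entropy([1.0, 0.0]))   # 0: no uncertainty
```

The uniform distribution maximizes (1), a fact used repeatedly below when discussing degenerate limit laws.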

Since then, the entropy of a random variable has been widely used in numerous fields involving random variables, such as large deviations theory and computer science; see Cover and Thomas (2006) for properties of entropy and an overview of possible topics. In statistical theory, the maximum entropy principle, introduced by Jaynes (1957), is widely used to compute the distribution that best matches the observations, using all the available information without adding any; it favors the solution that can be realized in the greatest number of ways. The use of entropy as a measure of information can be extended to a continuous-time stochastic process $X = (X_t)_{t\in\mathbb{R}_+}$ taking values in a finite set E by considering the entropies of its marginal distributions, that is, the entropies of $X_t$ for $t\in\mathbb{R}_+$. If the process is stationary, its marginal distributions are equal to some common distribution $\pi$. Then the marginal entropies all equal its entropy $S(\pi)$, which measures the amount of information of the process at any time t. When the process is not stationary, its marginal entropies may differ, and the investigation of the amount of information of the process at any time becomes much more complicated. As an alternative, one can consider ergodic processes, whose asymptotic behavior is

E-mail address: [email protected]. URL: http://www.math.unicaen.fr/~regnault/. doi:10.1016/j.jspi.2011.02.022. © 2011 Elsevier B.V. All rights reserved.



described by some unique probability distribution. Then the entropy of this asymptotic distribution measures the information of the process at equilibrium, that is, when it stabilizes to its asymptotic behavior. The entropy of the marginal distribution of a stationary process and the entropy of the asymptotic distribution of an ergodic process are particularly useful for applications, such as stochastic dynamical systems; see Chiquet and Limnios (2006). In queueing systems theory, the maximum entropy method is used to estimate quantities of interest of the system at equilibrium, such as the distribution of the number of customers in the system, subject to some specific constraints (capacity of the system, processing rates, etc.); see Guiasu (1986) and El-Affendi and Kouvatsos (1983). Among ergodic processes, homogeneous Markov processes with finite state space are particularly interesting examples. Such processes satisfy the Markov property, which states that their future behavior, conditional on the past and present, depends only on the present. Precisely, for all $t\in\mathbb{R}_+$, $h>0$, and for all sequences $0\le t_1\le\cdots\le t_r = t$, $i_1,\dots,i_r\in E$ and $j\in E$,

$$P(X_{t+h}=j \mid X_t=i_r,\, X_{t_{r-1}}=i_{r-1},\dots,X_{t_1}=i_1) = P(X_h=j \mid X_0=i_r).$$
The behavior of such a process is characterized by an $E\times E$ matrix called the infinitesimal generator of the process. Its asymptotic distribution, which is also its stationary distribution, is a left eigenvector of the generator associated with the eigenvalue 0. The computation of the entropy of the stationary distribution then easily follows. When only observations of the process are available, the need to estimate entropy arises for its use in all applications. The asymptotic properties of the estimator of the entropy of a distribution P from independent and identically distributed (i.i.d.) observations, obtained by plugging the maximum likelihood estimator (MLE) of P into (1), were obtained long ago. Basharin (1959) derived the strong consistency and asymptotic normality of the plug-in estimator from the law of large numbers and the central limit theorem. Zubkov (1973) proved that asymptotic normality holds only if P is not uniform, that is, if the entropy is not maximal. Harris (1977) proved that if P is uniform, the plug-in estimator is asymptotically $\chi^2$-distributed. Very few results exist in the literature for non-i.i.d. sequences. Gao et al. (2008) computed several estimators of the entropy of random sequences, using the Lempel–Ziv compression algorithm and the context-tree weighting method. Due to the few assumptions made on the sequences, the behavior of these estimators remains partly unknown. Ciuperca and Girardin (2007) computed plug-in estimators of the entropy of the stationary distribution of a finite-state ergodic Markov chain. For such a sequence, the stationary distribution is an explicit function of the transition probabilities. Hence, the estimators are derived by plugging the MLE of the transitions into this expression.
The good asymptotic behavior of the estimators is established thanks to the ergodic theorem for Markov chains and the delta method; see also Girardin and Sesboüé (2009) for a detailed study of such estimators for two-state Markov chains. For ergodic continuous-time processes, Regnault (2009) established the good asymptotic behavior of plug-in estimators of the entropy of the stationary distribution of a two-state continuous-time ergodic Markov process, by extending the methods developed by Ciuperca and Girardin (2007) to continuous-time processes. To our knowledge, no other result exists in the literature on the estimation of the entropy of continuous-time Markov processes. This paper aims to contribute to filling this gap for continuous-time ergodic Markov processes with general finite state spaces. Three situations of observation are discussed, according to whether one long trajectory is observed, several independent trajectories are observed, or the process is observed at discrete times. The estimators are obtained by proving that both the stationary distribution and its entropy are explicit functions of the generator, and then by plugging estimators of the generator into these expressions. The ensuing plug-in estimators are proven to be strongly consistent, and their asymptotic distributions are established and illustrated numerically through simulation. For an estimation based on one long trajectory, problems may arise from the non-observation of some states, especially if the number of states is large; see Albert (1962). In practice, it is often simpler to observe many independent short trajectories of the process than a single long one, especially in survival data analysis, reliability, or stochastic dynamical systems; see Albert (1962) and also Chiquet and Limnios (2006). An alternative consists of observing the process at discrete times.
Interest in estimation from discrete observations, whether in theory or applications, has grown considerably in recent decades; see Bladt and Sørensen (2005) and Crommelin and Vanden-Eijnden (2009). The paper is organized as follows. In Section 2, we recall and complete several methods for estimating the infinitesimal generator and stationary distribution of an ergodic Markov process. The first is based on the observation of several independent trajectories of the process censored by a family of random times of observation. The second is based on the observation of one trajectory whose time of observation grows to infinity. Finally, we consider estimators built from discrete observations of the process. In Section 3, we introduce plug-in estimators of the entropy of the stationary distribution and prove the good asymptotic properties of these estimators. In Section 4, we illustrate the behavior of all the estimators, for each scheme of observation, through simulation.

2. Estimation of the generator and stationary distribution of the process

Before establishing the asymptotic properties of the plug-in estimators of the entropy of the stationary distribution of a Markov process in Section 3, some preliminary results on the estimation of the infinitesimal generator of the process and its stationary distribution are needed. This is the aim of the present section. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be an ergodic pure-jump Markov process with finite state space E; we will let $E = \{1,\dots,s\}$ for simplicity of notation. Let $A = (a_{i,j})_{(i,j)\in E^2}$ be its infinitesimal generator and $\pi$ its stationary distribution. Recall that the


entries of the infinitesimal generator are
$$a_{i,j} = \lim_{h\to 0}\frac{P(X_h = j \mid X_0 = i) - \delta_{i,j}}{h}, \quad (i,j)\in E^2,$$
where $\delta_{i,j}$ is the Kronecker symbol. The coefficient $a_{i,j}$ represents the speed with which the process jumps from state i to state j. From the relation $\pi\cdot A = 0$, Albert (1962) computed $\pi$ as an explicit function $\Pi$ of the generator A, explicitly
$$\pi = \Pi(A) = \left(\frac{c_{i,i}(A)}{\sum_{j\in E} c_{j,j}(A)}\right)_{i\in E}, \qquad (2)$$
where $c_{i,i}(A)$ is the (i,i)th cofactor of the generator, that is,
$$c_{i,i}(A) = \det A^{(i,i)}, \qquad (3)$$
where $A^{(i,i)}$ is the $(s-1)\times(s-1)$ matrix obtained from A by deleting both the ith row and the ith column. This leads to estimating $\pi$ by plugging estimators of A into (2). Due to the smoothness of $\Pi$, the estimators of $\pi$ so built inherit the asymptotic behavior of the estimators of A. In the following, the process X will be identified, when necessary, with the countable family of random variables (called the history of the process) $(Y_n,\Delta_n)_{n\in\mathbb{N}}$, where $(Y_n)_{n\in\mathbb{N}}$ is the embedded Markov chain of X, that is, the sequence of successive states the process visits, and $(\Delta_n)_n$ is the sequence of successive sojourn times; see Albert (1962) for details. Then, a trajectory of the process observed up to time T is identified with a realization of $((Y_0,\Delta_0),\dots,(Y_{N_T-1},\Delta_{N_T-1}),Y_{N_T})$, where $N_T$ denotes the (random) number of jumps of the process during the time interval [0,T]. See Fig. 1 for an illustration.
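The map $\Pi$ of (2)–(3) can be sketched in a few lines of numpy (the function name is ours): normalize the diagonal cofactors of the generator, and check the result against the defining relation $\pi\cdot A = 0$.

```python
import numpy as np

def cofactor_stationary(A):
    """Stationary distribution via (2)-(3): pi_i is proportional to the
    (i,i)th cofactor of the generator A, i.e. the determinant of A with
    row i and column i deleted."""
    s = A.shape[0]
    c = np.empty(s)
    for i in range(s):
        minor = np.delete(np.delete(A, i, axis=0), i, axis=1)
        c[i] = np.linalg.det(minor)
    return c / c.sum()

# Two-state example with jump rates 2 and 3 (the generator (19) of Section 4).
A = np.array([[-2.0, 2.0],
              [3.0, -3.0]])
pi = cofactor_stationary(A)
print(pi)        # [0.6, 0.4]
print(pi @ A)    # ~ [0, 0]: pi is a left null vector of A
```

For a generator, all diagonal cofactors share the same sign, so the normalization yields a genuine probability vector.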

2.1. Estimation from one long trajectory

Let us compute the empirical estimator and a plug-in estimator of the stationary distribution of the process from the observation of one long trajectory, and compare their asymptotic behaviors and computation times. Albert (1962) proved that the MLE of A is its empirical estimator, that is,
$$\hat A_T(i,j) = \begin{cases}\dfrac{n_T(i,j)}{r_T(i)} & \text{if } r_T(i)\ne 0,\\ 0 & \text{otherwise},\end{cases} \quad j\ne i, \qquad (4)$$
with $\hat A_T(i,i) = -\sum_{j\ne i}\hat A_T(i,j)$, where $n_T(i,j) = \sum_{m=0}^{N_T-1} 1_{Y_m=i,\,Y_{m+1}=j}$ is the number of jumps from i to j observed along the trajectory, for $(i,j)\in E^2_\star = \{(i,j)\in E^2 : j\ne i\}$, and $r_T(i) = \sum_{m=0}^{N_T-1}\Delta_m 1_{Y_m=i}$ is the total time the trajectory spends in i. It is strongly consistent, and $\sqrt{T}(\hat A_T - A)$ is asymptotically normal with diagonal variance matrix

$$\Sigma^2_{A,c,1}((i,j),(i,j)) = a_{i,j}/\pi_i, \quad (i,j)\in E^2_\star. \qquad (5)$$

Fig. 1. Trajectory of a five-state Markov process simulated up to time T = 10.
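The MLE (4) only needs the jump counts $n_T(i,j)$ and the occupation times $r_T(i)$ read off the history $(Y_n,\Delta_n)$. A minimal sketch (function name and toy data are ours):

```python
import numpy as np

def generator_mle(states, sojourns, s):
    """MLE (4) of the generator from one trajectory given as its history:
    the embedded chain `states` = (Y_0, ..., Y_{N_T}) and the completed
    sojourn times `sojourns` = (D_0, ..., D_{N_T - 1})."""
    n = np.zeros((s, s))   # n_T(i, j): number of observed jumps i -> j
    r = np.zeros(s)        # r_T(i): total time spent in state i
    for m, d in enumerate(sojourns):
        n[states[m], states[m + 1]] += 1.0
        r[states[m]] += d
    A_hat = np.zeros((s, s))
    for i in range(s):
        if r[i] > 0:
            A_hat[i] = n[i] / r[i]
    A_hat -= np.diag(A_hat.sum(axis=1))   # diagonal: minus the off-diagonal row sums
    return A_hat

# Toy history of a two-state process: four completed sojourns.
states, sojourns = [0, 1, 0, 1, 0], [0.5, 0.3, 0.6, 0.4]
A_hat = generator_mle(states, sojourns, 2)
print(A_hat)
```

Here $r_T(0) = 1.1$ and $r_T(1) = 0.7$, so the off-diagonal estimates are $2/1.1$ and $2/0.7$.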


Just as in Section 2.2, we build a plug-in estimator for $\pi$ from the estimator $\hat A_T$ of A by defining
$$\hat\pi_T = \Pi(\hat A_T), \qquad (6)$$
where T is the length of the observed trajectory and $\Pi$ is defined by (2). Then $\hat\pi_T$ inherits the asymptotic properties of $\hat A_T$.

Proposition 1. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be an ergodic Markov process with finite state space E, generator A and stationary distribution $\pi$. The plug-in estimator $\hat\pi_T = \Pi(\hat A_T)$ of $\pi$ obtained from one long trajectory converges almost surely to $\pi$ as T goes to infinity, and $\sqrt{T}(\hat\pi_T - \pi)$ converges in distribution to a centered normal distribution with asymptotic variance $\Sigma^2_{\pi,c,1} = D_\Pi(A)\,\Sigma^2_{A,c,1}\,D_\Pi(A)^t$. Moreover, the asymptotic normal distribution is never degenerate and the rate of convergence is optimal.

Proof. Since $\hat A_T$ converges almost surely to A as T goes to infinity and $\Pi$ is a continuous function, the continuous mapping theorem yields the strong consistency of $\hat\pi_T$. Moreover, the random variables $\sqrt{T}(\hat A_T - A)$ are known to converge in distribution to a centered normal distribution with variance $\Sigma^2_{A,c,1}$ given in (5). Hence, provided that $D_\Pi(A)$ is not null, the delta method applies and leads to the convergence in distribution of $\sqrt{T}(\hat\pi_T - \pi)$ to a centered normal distribution with variance $\Sigma^2_{\pi,c,1}$. See, e.g., Shao (2003) for a proof of the delta method in the discrete-time case, based on a second-order Taylor expansion of the function $\Pi$ at A; the extension to the continuous-time case holds similarly. We prove in Lemma 1 (see Appendix) that $D_\Pi(A)\ne 0$ for all A, which completes the proof. □

The computation of the plug-in estimator $\hat\pi_T$ of the stationary distribution involves cofactor computations. The total time complexity is of order $O(T\, s^{\ln 7/\ln 2})$, which must be compared with the complexity of computation of the empirical estimator

$$\hat\pi_{T,\mathrm{emp}}(i) = \frac{r_T(i)}{T}. \qquad (7)$$

This latter computation time is a linear function of the number of states s of the process and of the time of observation T. See Fig. 2 for a comparison of computation times. The strong consistency of the empirical estimator is given by the ergodic property of the process: $\hat\pi_{T,\mathrm{emp}}(i)$ converges almost surely to $\pi(i)$ for any $i\in E$. Unfortunately, the exact asymptotic distribution of $\hat\pi_{T,\mathrm{emp}}$ is not known: even though Taga (1963) proved that all variables $r_T(i)$, $i\in E$, are asymptotically normal with explicit variances, the covariances are not available. This constitutes a strong obstacle to its use in practical situations.

2.2. Estimation from several independent trajectories

This section deals with the estimation of the generator and the stationary distribution of a Markov process from the observation of k independent trajectories, say $\tau_1,\dots,\tau_k$, observed during the time intervals $[0,T_l]$, where the censoring times $T_l\in\mathbb{R}_+$, for $l\in\{1,\dots,k\}$, are supposed to be i.i.d. random variables drawn according to some distribution $\mu$. The estimators of both the generator and the stationary distribution are linked to the numbers of jumps from one state to another and the sojourn times in the different states. So, for $(i,j)\in E^2_\star$, let us define, for all $l\in\{1,\dots,k\}$, the number of jumps $n^{(l)}(i,j) = \sum_{m=0}^{N_{T_l}-1} 1_{Y^{(l)}_m=i,\,Y^{(l)}_{m+1}=j}$ from i to j observed along trajectory $\tau_l$, where $N_{T_l}$ is the number of jumps observed along the trajectory $\tau_l$; let $r^{(l)}(i) = \sum_{m=0}^{N_{T_l}-1}\Delta^{(l)}_m 1_{Y^{(l)}_m=i}$ be the total time the trajectory $\tau_l$ spends in i. Further, let us introduce

Fig. 2. Comparison of computation times (in seconds) for both estimators $\hat\pi_T$ (circles) and $\hat\pi_{T,\mathrm{emp}}$ (triangles) according to the number s of states.


the total number of jumps from i to j observed in the k trajectories, $N_k(i,j) = \sum_{l=1}^{k} n^{(l)}(i,j)$, and the total time spent in i, say $R_k(i) = \sum_{l=1}^{k} r^{(l)}(i)$. Chiquet and Limnios (2006) have shown that the maximum likelihood estimator (MLE) of the generator A is the empirical estimator, that is,
$$\hat A^c_k(i,j) = \begin{cases}\dfrac{N_k(i,j)}{R_k(i)} & \text{if } R_k(i)\ne 0,\\ 0 & \text{otherwise},\end{cases} \quad j\ne i, \qquad (8)$$
with $\hat A^c_k(i,i) = -\sum_{j\ne i}\hat A^c_k(i,j)$, where the index c means that the trajectories are observed continuously, as opposed to the discrete observations of Section 2.3 below. This estimator is also shown to be strongly consistent as the number of trajectories goes to infinity. Moreover, the family of random variables $\sqrt{k}(\hat A^c_k - A)$ converges in distribution to a centered normal distribution with diagonal variance
$$\Sigma^2_{A,c,\mu}((i,j),(i,j)) = a_{i,j}\left(\int_0^{+\infty}\!\!\int_0^s P(X_t=i)\,dt\,d\mu(s)\right)^{-1}, \quad (i,j)\in E^2_\star,$$
where the index $\mu$ means that several trajectories are observed. Note that this result is based on Albert (1962), who computed the MLE of the generator from k independent trajectories censored by a deterministic time of observation T and established its strong consistency and asymptotic normality. Now, considering the plug-in estimator
$$\hat\pi^c_k = \Pi(\hat A^c_k), \qquad (9)$$
where $\Pi$ is defined in (2), yields a strongly consistent estimator of the stationary distribution $\pi$.

Proposition 2. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be an ergodic Markov process with finite state space E, generator A and stationary distribution $\pi$. The plug-in estimator $\hat\pi^c_k = \Pi(\hat A^c_k)$ of $\pi$ built from k independent trajectories censored by the family of i.i.d. random variables $(T_l)_{l\in\mathbb{N}}$ converges almost surely to $\pi$ as k goes to infinity, and $\sqrt{k}(\hat\pi^c_k - \pi)$ converges in distribution to a centered normal distribution with asymptotic variance $\Sigma^2_{\pi,c,\mu} = D_\Pi(A)\,\Sigma^2_{A,c,\mu}\,D_\Pi(A)^t$. Moreover, the differential $D_\Pi(A)$ is never null. Thus, the asymptotic normal distribution of $\sqrt{k}(\hat\pi^c_k - \pi)$ is never degenerate and the rate of convergence is optimal.

Proof. These properties follow from the delta method and the properties of $\hat A^c_k$ in the same way as in the proof of Proposition 1. □
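Computationally, (8) differs from the single-trajectory MLE (4) only in that the counts $N_k(i,j)$ and occupation times $R_k(i)$ are pooled over the k trajectories. A minimal sketch (names and toy data are ours):

```python
import numpy as np

def pooled_generator_mle(trajectories, s):
    """MLE (8): pool the jump counts N_k(i,j) and sojourn times R_k(i)
    over k independently observed trajectories; each trajectory is a
    (states, sojourns) pair, as in the single-trajectory case."""
    N, R = np.zeros((s, s)), np.zeros(s)
    for states, sojourns in trajectories:
        for m, d in enumerate(sojourns):
            N[states[m], states[m + 1]] += 1.0
            R[states[m]] += d
    A = np.zeros((s, s))
    for i in range(s):
        if R[i] > 0:
            A[i] = N[i] / R[i]
    A -= np.diag(A.sum(axis=1))   # diagonal: minus the off-diagonal row sums
    return A

# Two short censored trajectories of a two-state process (toy data).
trajs = [([0, 1, 0], [0.4, 0.2]),
         ([1, 0, 1], [0.5, 0.6])]
A_hat_c = pooled_generator_mle(trajs, 2)
print(A_hat_c)
```

Pooling means states unvisited in one short trajectory can still be estimated from the others, which is the practical advantage of this observation scheme.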

2.3. Estimation from discrete observations of the process

In the case of a pure-jump Markov process, considering the embedded Markov chain $Y = (Y_n)$ of the process seems a natural approach to estimation from discrete observations. Unfortunately, this yields no satisfying result, since Y does not contain enough information about the process, as shown by the following simple example; see also Fig. 1. For a two-state Markov process, the embedded chain Y is deterministic conditionally on $X_0$. So the only information available from the knowledge of the embedded chain truncated at time T is the number of jumps, that is, the average speed of jumps under the stationary distribution (namely $a_{1,2}\pi_1 + a_{2,1}\pi_2$). But nothing is known about the time spent in the different states, and hence nothing about the generator.

A more productive way to deal with discrete observations of general continuous-time processes is to assume that the process is observed at discrete times $\Delta k$, $k\in\{1,\dots,n\}$, where $\Delta$ is a positive number representing the (constant) time between two successive observations. In the case of a Markov process, the sequence $(Z_n)_{n\in\mathbb{N}} = (X_{\Delta n})_{n\in\mathbb{N}}$ is known to be a Markov chain with transition matrix $P = \exp(\Delta A)$. The MLE $\hat P_n$ of P, given by
$$\hat P_n(i,j) = \frac{N_n(i,j)}{\sum_{k\in E} N_n(i,k)}, \quad (i,j)\in E^2_\star,$$
with $\hat P_n(i,i) = 1 - \sum_{j\ne i}\hat P_n(i,j)$ for $i\in E$, where $N_n(i,j)$ is the number of jumps from i to j before n+1, has good asymptotic properties; see Anderson and Goodman (1957). Thus, one can consider
$$\hat A^d_n = \exp^{-1}(\hat P_n)/\Delta \qquad (10)$$
for estimating A, provided it is well and uniquely defined, as stated right below.

Let $F_\Delta : A\mapsto\exp(\Delta A)$ be defined on the open set $\mathcal G$ of all ergodic generators. Let $\mathcal P' = \{e^{\Delta A} : A\in\mathcal G\}$ denote the image of $\mathcal G$ by $F_\Delta$, and let $\mathcal P'' = \{P\in\mathcal P' : \exists!\, A\in\mathcal G,\ P = e^{\Delta A}\}$. A transition matrix belonging to $\mathcal P''$ is said to be uniquely embeddable. The probability that $\hat A^d_n$ exists and is unique tends to 1 as n goes to infinity, provided that $\exp(\Delta A)$ belongs to the interior of $\mathcal P''$; see Bladt and Sørensen (2005).
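A sketch of (10) using scipy's matrix logarithm as $\exp^{-1}$ (the function name and toy check are ours); as the text warns, the result is only meaningful when the estimated transition matrix is uniquely embeddable:

```python
import numpy as np
from scipy.linalg import expm, logm

def generator_from_discrete(Z, s, delta):
    """Estimator (10): empirical transition matrix P_n of the sampled
    chain Z_k = X_{k*delta}, then logm(P_n)/delta.  Only meaningful when
    P_n is (uniquely) embeddable; otherwise logm(P_n)/delta may fail to
    be a generator (e.g. negative off-diagonal entries)."""
    counts = np.zeros((s, s))
    for a, b in zip(Z[:-1], Z[1:]):
        counts[a, b] += 1.0
    P_hat = counts / counts.sum(axis=1, keepdims=True)
    return logm(P_hat).real / delta

# Sanity check on the exact transition matrix P = expm(delta * A):
A = np.array([[-2.0, 2.0], [3.0, -3.0]])
delta = 0.1
A_back = logm(expm(delta * A)).real / delta
print(A_back)   # recovers A
```

For small $\Delta$, $\exp(\Delta A)$ is close to the identity and lies safely inside the embeddable region, so the inversion is stable; large $\Delta$ pushes $\hat P_n$ toward the boundary of $\mathcal P''$.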


For s = 2, Kingman (1962) showed that
$$\mathcal P'' = \mathcal P' = \left\{\begin{pmatrix}1-p & p\\ q & 1-q\end{pmatrix} : p,q\in[0,1],\ q < 1-p\right\},$$
and computations are easy in this case; see Regnault (2009) for an explicit expression of $\hat A^d_n$ and its asymptotic variance. Describing explicitly the set $\mathcal P''$ when s is larger than 2 is a difficult task, as noted in Bladt and Sørensen (2005), in which approximations of $\hat A^d_n$ are obtained using EM and MCMC algorithms. The literature on this topic is rich, whether assuming equidistant observations or not. Among others, Metzner et al. (2007) provide an EM algorithm computing approximations of local MLEs of the generator from non-equidistant observations of the process. Crommelin and Vanden-Eijnden (2009) provide an alternative method, computing the estimator of the generator that best matches the empirical transition matrix associated to discrete observations according to some convex quadratic minimization problem. This method is flexible enough to allow discrete observations drawn according to random sampling intervals, and large state spaces. Nevertheless, it requires the empirical transition matrix to be uniquely embeddable in order to provide an estimator (then equal to $\hat A^d_n$) with good asymptotic properties. Bladt and Sørensen (2005) proved that $\hat A^d_n$ is the MLE of A, is strongly consistent, and that $\sqrt{n}(\hat A^d_n - A)$ converges in distribution to a centered normal distribution with asymptotic variance
$$\Sigma^2_{A,d} = D_{F_\Delta^{-1}}(P)\, I_P^{-1}\, D_{F_\Delta^{-1}}(P)^t,$$
where $I_P$ is the Fisher information matrix of the chain. Now, the relation $\pi = \Pi(A)$, where $\Pi$ is given in (2), defines the plug-in estimator
$$\hat\pi^d_n = \Pi(\hat A^d_n). \qquad (11)$$

Proposition 3. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be an ergodic Markov process with finite state space E, generator A and stationary distribution $\pi$. Let $\exp(\Delta A)$ belong to the interior of $\mathcal P''$. The plug-in estimator $\hat\pi^d_n$ of $\pi$, built from equidistant observations of the process, satisfies the following properties:

1. The probability that $\hat\pi^d_n$ is well-defined tends to 1 as n goes to infinity, provided that $\exp(\Delta A)$ is uniquely embeddable.
2. When well-defined, $\hat\pi^d_n$ is a strongly consistent estimator of $\pi$, and $\sqrt{n}(\hat\pi^d_n - \pi)$ is asymptotically normal with asymptotic variance $\Sigma^2_{\pi,d} = D_\Pi(A)\,\Sigma^2_{A,d}\,D_\Pi(A)^t$. Moreover, the asymptotic normal distribution is never degenerate and the rate of convergence is optimal.

Proof. The events "$\hat\pi^d_n$ is well-defined" and "$\hat A^d_n$ is well-defined" are identical, which shows Point 1. Again, Point 2 is derived from the delta method applied to the family $(\hat A^d_n)_{n\in\mathbb{N}}$. Thanks to Lemma 1, the differential $D_\Pi(A)$ is never null, so that the rate of convergence is optimal. □

3. Estimating the entropy of the stationary distribution of the process

In Section 2, we have obtained estimators of the stationary distribution $\pi$ of an ergodic Markov process for three situations of observation. This now allows us to construct plug-in estimators of $S(\pi)$. Precisely, if one long trajectory of length T is observed, let us set
$$\hat S_T = S(\hat\pi_T) \qquad (12)$$
and
$$\hat S_{T,\mathrm{emp}} = S(\hat\pi_{T,\mathrm{emp}}), \qquad (13)$$
where $\hat\pi_T$ is the plug-in estimator defined by (6) and $\hat\pi_{T,\mathrm{emp}}$ is the empirical estimator of $\pi$ defined by (7); if k independent trajectories with i.i.d. random lengths are observed, then let us set
$$\hat S^c_k = S(\hat\pi^c_k) = -\sum_{i\in E}\hat\pi^c_k(i)\log\hat\pi^c_k(i), \qquad (14)$$
where $\hat\pi^c_k$ is the estimator of $\pi$ defined by (9). Finally, supposing we deal with equidistant discrete observations of the process, let us set
$$\hat S^d_n = S(\hat\pi^d_n), \qquad (15)$$
where $\hat\pi^d_n$ is defined by (11).


Note that, since $\pi = \Pi(A)$ is an explicit function of the infinitesimal generator, so is the entropy; precisely, $S(\pi) = S(A)$, where
$$S(A) = -\sum_{i\in E}\frac{c_{i,i}(A)}{\sum_j c_{j,j}(A)}\log\frac{c_{i,i}(A)}{\sum_j c_{j,j}(A)},$$
with $c_{i,i}(A)$ defined by (3). Due to the smoothness of this expression, each plug-in estimator inherits the asymptotic behavior of the estimator of the stationary distribution it is built from. The nature of the asymptotic distributions of these estimators depends on whether the differential $D_S(A)$ is null or not.

Theorem 1. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be an ergodic Markov process with finite state space E, generator A and stationary distribution $\pi$. Let $\hat S_T$, $\hat S_{T,\mathrm{emp}}$, $\hat S^c_k$ and $\hat S^d_n$ be the estimators of the entropy of $\pi$ defined by (12)–(15).

1. The probability that $\hat S^d_n$ is well-defined goes to 1 when n goes to infinity, provided that $\exp(\Delta A)$ is uniquely embeddable.
2. The estimators $\hat S_T$, $\hat S_{T,\mathrm{emp}}$, $\hat S^c_k$ and $\hat S^d_n$ are strongly consistent when well-defined. Explicitly:
(a) $\hat S_T$ and $\hat S_{T,\mathrm{emp}}$ converge almost surely to $S(\pi)$ as the time of observation T goes to infinity;
(b) $\hat S^c_k$ converges almost surely to $S(\pi)$ as the number k of independent trajectories goes to infinity;
(c) if well-defined, $\hat S^d_n$ converges almost surely to $S(\pi)$ as the number n of discrete observations goes to infinity.
3. If the differential $D_S(A)$ is not null, then
(a) $\sqrt{T}(\hat S_T - S(\pi))$ converges in distribution to a centered normal distribution with asymptotic variance $\Sigma^2_{S,c,1} = D_S(A)\,\Sigma^2_{A,c,1}\,D_S(A)^t$ as the time of observation T goes to infinity;
(b) $\sqrt{k}(\hat S^c_k - S(\pi))$ converges in distribution to a centered normal distribution with asymptotic variance $\Sigma^2_{S,c,\mu} = D_S(A)\,\Sigma^2_{A,c,\mu}\,D_S(A)^t$ as the number k of trajectories goes to infinity;
(c) $\sqrt{n}(\hat S^d_n - S(\pi))$ converges in distribution to a centered normal distribution with asymptotic variance $\Sigma^2_{S,d} = D_S(A)\,\Sigma^2_{A,d}\,D_S(A)^t$ as the number n of observations goes to infinity.
4. If $D_S(A)$ is null, then
(a) $2T(S(\pi) - \hat S_T)$ converges in distribution to $\sum_{(i,j)\in E^2_\star} a^{(c,1)}_{(i,j)} Y_{(i,j)}$ when T goes to infinity, where the random variables $Y_{(i,j)}$, $(i,j)\in E^2_\star$, are $\chi^2(1)$ distributed with one degree of freedom and the coefficients $a^{(c,1)}_{(i,j)}$ depend on $\Sigma^2_{A,c,1}$;
(b) $2(T_1+\cdots+T_k)[S(\pi) - \hat S^c_k]$ converges in distribution to $\sum_{(i,j)\in E^2_\star} a^{(c,\mu)}_{(i,j)} Y_{(i,j)}$ when k goes to infinity, where the random variables $Y_{(i,j)}$, $(i,j)\in E^2_\star$, are $\chi^2(1)$ distributed and the coefficients $a^{(c,\mu)}_{(i,j)}$ depend on $\Sigma^2_{A,c,\mu}$;
(c) $2n(S(\pi) - \hat S^d_n)$ converges in distribution to $\sum_{(i,j)\in E^2_\star} a^{(d)}_{(i,j)} Y_{(i,j)}$ when n goes to infinity, where the random variables $Y_{(i,j)}$, $(i,j)\in E^2_\star$, are $\chi^2(1)$ distributed with one degree of freedom and the coefficients $a^{(d)}_{(i,j)}$ depend on $\Sigma^2_{A,d}$.

Proof. Clearly, $\hat S^d_n$ is well-defined if and only if $\hat\pi^d_n$ is well-defined. Then Point 1 is a consequence of Proposition 3. Points 2–4 are obtained using the generalized delta method; see Shao (2003). □
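Composing (1) with (2) gives the plug-in map $S(A)$ directly. A small sketch (function name is ours), evaluated on a non-uniform and a uniform two-state generator:

```python
import numpy as np

def entropy_from_generator(A):
    """Plug-in entropy S(A) = S(Pi(A)): stationary distribution via the
    diagonal cofactors of A (Eq. (2)), then Shannon entropy (Eq. (1))."""
    s = A.shape[0]
    c = np.array([np.linalg.det(np.delete(np.delete(A, i, 0), i, 1))
                  for i in range(s)])
    pi = c / c.sum()
    return -np.sum(pi * np.log(pi))

A0 = np.array([[-2.0, 2.0], [3.0, -3.0]])   # non-uniform: pi = (0.6, 0.4)
A1 = np.array([[-1.0, 1.0], [1.0, -1.0]])   # uniform: pi = (0.5, 0.5)
print(entropy_from_generator(A0))   # ~ 0.673, below log 2
print(entropy_from_generator(A1))   # log 2 ~ 0.693, the two-state maximum
```

The uniform case, where $S(A) = \log 2$ is maximal, is exactly the degenerate case $D_S(A) = 0$ of Point 4 above.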

For a two-state Markov process, Regnault (2009) established that $D_S(A)$ is null if and only if the stationary distribution is uniform, that is, if and only if both off-diagonal coefficients of the generator are equal. The entropy is well known to be maximal when the distribution is uniform; the converse is proven by determining directly the set of points at which $D_S$ is null. For a three-state space, Lemma 3 states that $D_S(A)$ is null if and only if the stationary distribution of the process is uniform, and Lemma 4 gives the set of generators with a uniform stationary distribution; see Appendix. For a larger state space, due to the numerous variables involved in the computation, it is difficult to establish a similar explicit condition on the generator for the differential $D_S(A)$ to be null. In general, if the stationary distribution is uniform, then $D_S(A)$ is null. The set of corresponding generators is the $(s-1)^2$-dimensional subset of the set of all generators defined by the relation $(1,\dots,1)\cdot A = 0$.

4. Simulation

We will illustrate through simulation the behavior of the estimators considered in Section 3. We compare these estimators with the true value of the entropy, compare their empirical distributions with the true ones, and test their goodness of fit. We will first estimate the asymptotic variances and then detail each scheme of observation separately.


4.1. Estimation of the asymptotic variances

For comparing the empirical distributions of $\sqrt{T}(\hat S_T - S(\pi))$, $\sqrt{k}(\hat S^c_k - S(\pi))$ and $\sqrt{n}(\hat S^d_n - S(\pi))$ with the normal distributions obtained in Theorem 1, it is necessary to estimate the asymptotic variances $\Sigma^2_{S,c,1}$, $\Sigma^2_{S,c,\mu}$ and $\Sigma^2_{S,d}$. Theorem 1 states that $\Sigma^2_{S,c,1}$, $\Sigma^2_{S,c,\mu}$ and $\Sigma^2_{S,d}$ are functions of $\Sigma^2_{A,c,1}$, $\Sigma^2_{A,c,\mu}$ and $\Sigma^2_{A,d}$, respectively. In Sections 2.1, 2.2 and 2.3, $\Sigma^2_{A,c,1}$, $\Sigma^2_{A,c,\mu}$ and $\Sigma^2_{A,d}$ are shown to be explicit functions of the generator. Therefore, they can be estimated by plugging the estimators $\hat A_T$, $\hat A^c_k$ and $\hat A^d_n$ of the generator computed in (4), (8) and (10) into these functions. Precisely, the ensuing plug-in estimators are
$$\hat\Sigma^2_{A,c,1} = \left(\delta_{(i,j),(i',j')}\,\frac{\hat A_T(i,j)}{\hat\pi_T(i)}\right)_{((i,j),(i',j'))\in(E^2_\star)^2},\qquad \hat\Sigma^2_{A,c,\mu} = \left(\delta_{(i,j),(i',j')}\,\frac{k}{R_k(i)}\,\hat A^c_k(i,j)\right)_{((i,j),(i',j'))\in(E^2_\star)^2},\qquad \hat\Sigma^2_{A,d} = D_{F_\Delta^{-1}}(\hat P_n)\,I_{\hat P_n}^{-1}\,D_{F_\Delta^{-1}}(\hat P_n)^t,$$
where $I_{\hat P_n}$ is the Fisher information matrix of $\hat P_n$. Finally,
$$\hat\Sigma^2_{S,c,1} = D_S(\hat A_T)\,\hat\Sigma^2_{A,c,1}\,D_S(\hat A_T)^t \qquad (16)$$
and
$$\hat\Sigma^2_{S,c,\mu} = D_S(\hat A^c_k)\,\hat\Sigma^2_{A,c,\mu}\,D_S(\hat A^c_k)^t, \qquad (17)$$
$$\hat\Sigma^2_{S,d} = D_S(\hat A^d_n)\,\hat\Sigma^2_{A,d}\,D_S(\hat A^d_n)^t. \qquad (18)$$
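As a concrete illustration of (16): since $\hat\Sigma^2_{A,c,1}$ is diagonal, the sandwich reduces to a weighted sum of squared partial derivatives of $S$. The sketch below (names are ours) replaces the analytic differential $D_S(A)$ with a central finite-difference approximation taken along directions that keep the rows of the generator summing to zero; this shortcut is our assumption, not the paper's computation.

```python
import numpy as np

def entropy_of_generator(A):
    # S(A): entropy of the stationary distribution, via the
    # diagonal cofactors of A (Eqs. (2)-(3)).
    s = A.shape[0]
    c = np.array([np.linalg.det(np.delete(np.delete(A, i, 0), i, 1))
                  for i in range(s)])
    pi = c / c.sum()
    return -np.sum(pi * np.log(pi))

def entropy_variance_long_traj(A_hat, pi_hat, eps=1e-6):
    """Plug-in estimate (16) of Sigma^2_{S,c,1}: sum over (i,j), j != i,
    of (dS/da_{i,j})^2 * a_{i,j}/pi_i, with the derivative approximated
    by central finite differences in a generator-preserving direction."""
    s = A_hat.shape[0]
    var = 0.0
    for i in range(s):
        for j in range(s):
            if i == j:
                continue
            E = np.zeros((s, s))
            E[i, j], E[i, i] = 1.0, -1.0      # keep row sums at zero
            d = (entropy_of_generator(A_hat + eps * E)
                 - entropy_of_generator(A_hat - eps * E)) / (2.0 * eps)
            var += d ** 2 * A_hat[i, j] / pi_hat[i]
    return var

A0 = np.array([[-2.0, 2.0], [3.0, -3.0]])
v0 = entropy_variance_long_traj(A0, np.array([0.6, 0.4]))
A1 = np.array([[-1.0, 1.0], [1.0, -1.0]])
v1 = entropy_variance_long_traj(A1, np.array([0.5, 0.5]))
print(v0, v1)   # v1 ~ 0: D_S(A) vanishes when the stationary law is uniform
```

The vanishing of the estimate in the uniform case mirrors the degenerate regime of Theorem 1, Point 4, where the normal approximation fails and a $\chi^2$-type limit takes over.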

Proposition 4. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be an ergodic Markov process with finite state space E, generator A and stationary distribution $\pi$. Let $\hat\Sigma^2_{S,c,1}$, $\hat\Sigma^2_{S,c,\mu}$ and $\hat\Sigma^2_{S,d}$ be defined by (16)–(18).

1. The probability that $\hat\Sigma^2_{S,d}$ is well-defined tends to 1 as n tends to infinity, provided that $\exp(\Delta A)$ is uniquely embeddable.
2. If well-defined, $\hat\Sigma^2_{S,c,1}$, $\hat\Sigma^2_{S,c,\mu}$ and $\hat\Sigma^2_{S,d}$ are strongly consistent estimators of $\Sigma^2_{S,c,1}$, $\Sigma^2_{S,c,\mu}$ and $\Sigma^2_{S,d}$, respectively.
3. If $D_S(A)$ is not null, then
(a) $\sqrt{T}(\hat S_T - S(\pi))/\hat\Sigma_{S,c,1}$ converges in distribution to the standard normal distribution as the time of observation T goes to infinity;
(b) $\sqrt{k}(\hat S^c_k - S(\pi))/\hat\Sigma_{S,c,\mu}$ converges in distribution to the standard normal distribution as the number k of trajectories goes to infinity;
(c) $\sqrt{n}(\hat S^d_n - S(\pi))/\hat\Sigma_{S,d}$ converges in distribution to the standard normal distribution as the number n of observations goes to infinity.

Proof. Points 1 and 2 are consequences of the smoothness of the asymptotic variances with respect to the generator and of the continuous mapping theorem. Point 3 is a direct consequence of Slutsky's theorem (see, e.g., Shao, 2003, Theorem 1.11), Theorem 1 and Point 2. □

4.2. One long trajectory

For simulating a pure-jump finite-state Markov process, we have simulated the embedded chain $(Y_n)_n$ and the corresponding sojourn times $(\Delta_n)_n$ until a time T. Fig. 3 shows the point-wise convergence of the plug-in and empirical estimators of the entropy of the stationary distribution. We have fixed T = 5000. The 100 points of the figure have been obtained by computing $\hat S_T$ (circles) and $\hat S_{T,\mathrm{emp}}$ (triangles) from one trajectory of a finite-state Markov process (two states at the top of the figure, 10 at the bottom) from time 50 to time 5000 by steps of 50. On the left of Fig. 3, the generator of the process is non-uniform, given by
$$A_0 = \begin{pmatrix}-2 & 2\\ 3 & -3\end{pmatrix}, \qquad (19)$$
while on the right, the generator is uniform, given by
$$A_1 = \begin{pmatrix}-1 & 1\\ 1 & -1\end{pmatrix}. \qquad (20)$$
Fig. 4 (left) shows the empirical distribution of $\sqrt{T}(S(\pi)-\hat S_T)/\hat\Sigma_{S,c,1}$ compared with the standard normal distribution for a two-state Markov process with non-uniform generator $A_0$. Fig. 4 (right) shows the empirical distribution of $2T(S(\pi)-\hat S_T)$
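The simulation scheme described above (embedded chain plus exponential sojourn times) can be sketched as follows (names are ours); as a by-product, the completed sojourn times give the empirical estimator (7):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_trajectory(A, T, x0=0):
    """Simulate a pure-jump Markov process with generator A up to time T,
    via its embedded chain and exponential sojourn times: from state i,
    wait Exp(-A[i,i]), then jump to j with probability A[i,j]/(-A[i,i])."""
    states, sojourns, t, i = [x0], [], 0.0, x0
    while True:
        rate = -A[i, i]
        d = rng.exponential(1.0 / rate)
        if t + d > T:                 # sojourn censored at T: stop
            break
        sojourns.append(d)
        t += d
        p = A[i].clip(min=0.0) / rate  # jump law of the embedded chain
        i = int(rng.choice(len(p), p=p))
        states.append(i)
    return states, sojourns

A0 = np.array([[-2.0, 2.0], [3.0, -3.0]])    # generator (19)
states, sojourns = simulate_trajectory(A0, T=1000.0)
# Empirical estimator (7) of the stationary distribution:
r0 = sum(d for st, d in zip(states, sojourns) if st == 0)
print(r0 / 1000.0)    # close to pi(0) = 0.6
```

Feeding the simulated history into the estimators of Sections 2 and 3 reproduces the point-wise convergence displayed in Fig. 3.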


Fig. 3. Point-wise convergence of the plug-in estimator $\hat S_T$ (circles) and the empirical estimator $\hat S_{T,\mathrm{emp}}$ (triangles) for two-state (top) and 10-state (bottom) Markov processes with non-uniform (left) and uniform (right) generators.
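The trajectory simulation and plug-in computation used for these figures can be sketched in a few lines. The sketch below is an illustration, not the author's code; it assumes the standard maximum-likelihood estimator of the generator, $\hat a_{i,j} = N_{i,j}(T)/R_i(T)$ (jump counts over occupation times, in the spirit of Albert, 1962), and recovers the stationary distribution by solving $\hat\pi \hat A = 0$.

```python
import numpy as np

def simulate_trajectory(A, T, x0=0, rng=None):
    """Simulate a pure-jump Markov process with generator A on [0, T]
    through its embedded chain and exponential sojourn times."""
    rng = np.random.default_rng() if rng is None else rng
    s = A.shape[0]
    states, sojourns = [], []
    t, x = 0.0, x0
    while True:
        d = rng.exponential(-1.0 / A[x, x])   # sojourn time in state x
        if t + d >= T:                        # censor the last sojourn at T
            states.append(x)
            sojourns.append(T - t)
            return np.array(states), np.array(sojourns)
        states.append(x)
        sojourns.append(d)
        t += d
        p = A[x].copy(); p[x] = 0.0; p /= -A[x, x]   # embedded-chain step
        x = int(rng.choice(s, p=p))

def plugin_entropy(states, sojourns, s):
    """MLE of the generator, then plug-in stationary distribution and entropy."""
    N = np.zeros((s, s))                      # jump counts N_{i,j}(T)
    R = np.zeros(s)                           # occupation times R_i(T)
    for a, b in zip(states[:-1], states[1:]):
        N[a, b] += 1
    for x, d in zip(states, sojourns):
        R[x] += d
    A_hat = N / R[:, None]
    np.fill_diagonal(A_hat, -A_hat.sum(axis=1))
    # stationary distribution: solve pi A_hat = 0 together with sum(pi) = 1
    M = np.vstack([A_hat.T, np.ones(s)])
    pi = np.linalg.lstsq(M, np.append(np.zeros(s), 1.0), rcond=None)[0]
    return pi, -np.sum(pi * np.log(pi))

A0 = np.array([[-2.0, 2.0], [3.0, -3.0]])     # generator (19)
states, sojourns = simulate_trajectory(A0, T=5000.0, rng=np.random.default_rng(0))
pi_hat, S_hat = plugin_entropy(states, sojourns, s=2)
# for A0 the stationary distribution is (3/5, 2/5), whose entropy is about 0.673
```

For $T = 5000$ the estimate is close to the exact value $S(\pi) = -\tfrac35\log\tfrac35 - \tfrac25\log\tfrac25 \approx 0.673$, as visible in Fig. 3.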


Fig. 4. Empirical distribution of $\sqrt{T}\,(S(\pi)-\hat S_T)/\hat S_{S,c,1}$ versus $\mathcal N(0,1)$ (left) and $2T(S(\pi)-\hat S_T)$ versus $\chi^2(1)$ (right).

compared with the $\chi^2(1)$ distribution for a uniform generator. In both cases, the empirical distribution has been built from 200 independent trajectories of length $T = 1000$ of a two-state process.

We have tested the goodness of fit of the empirical distribution of $\sqrt{T}\,(S(\pi)-\hat S_T)/\hat S_{S,c,1}$ with the standard normal distribution through the Shapiro–Wilk normality test, and that of $2T(S(\pi)-\hat S_T)$ with the $\chi^2(1)$ distribution through the Kolmogorov–Smirnov test. Fig. 5 shows the values of the Shapiro–Wilk statistic $W$ and the Kolmogorov–Smirnov statistic $D$ and the corresponding p-values. Both hypotheses are accepted at the 0.01 level of type-I error since the p-values are greater than 0.01.
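Both goodness-of-fit tests are available in scipy.stats. In the sketch below, synthetic draws from the limit laws stand in for the 200 replicated standardized errors, so only the testing mechanics are illustrated, not the paper's data:

```python
import numpy as np
from scipy.stats import chi2, kstest, shapiro

rng = np.random.default_rng(1)

# Stand-ins for the replicated standardized errors: in the paper these are
# sqrt(T)(S(pi) - S_T_hat)/sigma_hat (non-uniform case, compared with N(0,1))
# and 2T(S(pi) - S_T_hat) (uniform case, compared with chi2(1)).
z = rng.standard_normal(200)
x = chi2(df=1).rvs(size=200, random_state=rng)

W, p_sw = shapiro(z)                  # Shapiro-Wilk statistic W and p-value
D, p_ks = kstest(x, chi2(df=1).cdf)   # Kolmogorov-Smirnov statistic D and p-value

# Each null hypothesis is retained at level 0.01 when its p-value exceeds 0.01.
normal_ok, chisq_ok = p_sw > 0.01, p_ks > 0.01
```

In the paper's setting, `z` and `x` would be computed from the 200 simulated trajectories rather than drawn directly from the limit distributions.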

4.3. Several trajectories censored by exponentially distributed length of observation

For the observation of several trajectories, we have chosen the censoring variables $T_l$, $l \in \{1,\dots,k\}$, to be exponentially distributed. To exhibit the good asymptotic properties of the estimator for a large number of trajectories, we have chosen a short mean time of observation $E(T_l) = 100$, $l \in \{1,\dots,k\}$, and $k = 1000$. We have simulated $k$ independent variables


Fig. 5. Shapiro–Wilk normality test for the empirical distribution of $\sqrt{T}\,(S(\pi)-\hat S_T)/\hat S_{S,c,1}$ (left) and Kolmogorov–Smirnov goodness-of-fit test of the empirical distribution of $2T(S(\pi)-\hat S_T)$ with the $\chi^2(1)$ distribution (right), for $T = 1000$.

Fig. 6. Point-wise convergence of the plug-in estimator $\hat S^c_k$ for two-state (top) and three-state (bottom) Markov processes with non-uniform (left) and uniform (right) generators.


Fig. 7. Empirical distributions of $\sqrt{k}\,(\hat S^c_k - S(\pi))/\hat S_{S,c,m}$ versus $\mathcal N(0,1)$ (left) and $2(T_1+\cdots+T_k)(S(\pi)-\hat S^c_k)$ versus $\chi^2(1)$ (right).

with exponential distribution $\mathcal E(1/100)$. For simulating a trajectory of a pure-jump finite state Markov process censored at time $T_l$, we have simulated the embedded chain and the corresponding sojourn times until $T_l$.

Fig. 6 shows the point-wise convergence of the plug-in estimator of the entropy of the stationary distribution. The 100 points of the figure have been obtained by computing $\hat S^c_k$ from $k = 1000$ independent trajectories of a finite state-space


Markov process (two states at the top of the figure, three at the bottom) by steps of 10 trajectories. On the left of Fig. 6, the generator of the process is non-uniform and defined by (19), while on the right, the generator is uniform, given by (20).

Fig. 7 (left) shows the empirical distribution of $\sqrt{k}\,(\hat S^c_k - S(\pi))/\hat S_{S,c,m}$ compared with the standard normal distribution. Fig. 7 (right) shows the empirical distribution of $2(T_1+\cdots+T_k)(S(\pi)-\hat S^c_k)$ compared with the $\chi^2(1)$ distribution. These empirical distributions have been obtained from 100 times 100 independent trajectories censored by $(T_l)_{l\in\{1,\dots,100\}}$ (with $T_l$ exponentially distributed with parameter 1/100) of a simulated two-state Markov process, respectively with non-uniform generator $A_0$ and uniform generator $A_1$.

We have tested the goodness of fit of the empirical distribution of $\sqrt{k}\,(S(\pi)-\hat S^c_k)/\hat S_{S,c,m}$ with the standard normal distribution through the Shapiro–Wilk normality test, and that of $2(T_1+\cdots+T_k)(S(\pi)-\hat S^c_k)$ with the $\chi^2(1)$ distribution through the Kolmogorov–Smirnov test. Fig. 8 shows the values of the Shapiro–Wilk statistic $W$ (left) and the Kolmogorov–Smirnov statistic $D$ (right) and the corresponding p-values. Both hypotheses are accepted at the 0.01 level of type-I error since the p-values are greater than 0.01.
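The multi-trajectory scheme above can be sketched by pooling jump counts and occupation times over the $k$ censored trajectories before plugging in. The pooled-MLE form below is our reading of the estimator, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
A0 = np.array([[-2.0, 2.0], [3.0, -3.0]])   # generator (19)
s, k = 2, 1000

N = np.zeros((s, s))   # jump counts pooled over trajectories
R = np.zeros(s)        # occupation times pooled over trajectories
for _ in range(k):
    T = rng.exponential(100.0)              # censoring time T_l, E(T_l) = 100
    t, x = 0.0, int(rng.integers(s))        # arbitrary initial state
    while True:
        d = rng.exponential(-1.0 / A0[x, x])
        if t + d >= T:                      # trajectory censored at T_l
            R[x] += T - t
            break
        R[x] += d
        t += d
        p = A0[x].copy(); p[x] = 0.0; p /= -A0[x, x]
        y = int(rng.choice(s, p=p))
        N[x, y] += 1
        x = y

A_hat = N / R[:, None]                      # pooled MLE of the generator
np.fill_diagonal(A_hat, -A_hat.sum(axis=1))
# plug-in: stationary distribution of A_hat, then its Shannon entropy
M = np.vstack([A_hat.T, np.ones(s)])
pi_hat = np.linalg.lstsq(M, np.append(np.zeros(s), 1.0), rcond=None)[0]
S_hat = -np.sum(pi_hat * np.log(pi_hat))
```

With $k = 1000$ trajectories of mean length 100, the total observation time is large, so the estimate is very close to $S(\pi) \approx 0.673$ for $A_0$.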

4.4. Discrete observations

For simulating discrete observations (at times $\Delta n$, $n \in \mathbb N$) of the Markov process with generator $A_0$ defined by (19), we have simulated a Markov chain with transition matrix $\exp(\Delta A_0)$. According to Bladt and Sørensen (2005), $\Delta$ must be chosen

Fig. 8. Shapiro–Wilk normality test for the empirical distribution of $\sqrt{k}\,(S(\pi)-\hat S^c_k)/\hat S_{S,c,m}$ (left) and Kolmogorov–Smirnov goodness-of-fit test of the empirical distribution of $2(T_1+\cdots+T_k)(S(\pi)-\hat S^c_k)$ with the $\chi^2(1)$ distribution (right), for $k = 100$.

Fig. 9. Point-wise convergence of the plug-in estimator $\hat S^d_n$ to the entropy of the stationary distribution $S(\pi)$ of a two-state Markov process.

Fig. 10. Empirical distribution of $\sqrt{n}\,(S(\pi)-\hat S^d_n)/\hat S_{S,d}$ versus $\mathcal N(0,1)$.


Fig. 11. Shapiro–Wilk normality test for the empirical distribution of $\sqrt{n}\,(S(\pi)-\hat S^d_n)/\hat S_{S,d}$, for $n = 1000$.

as small as possible; for large values of $\Delta$ (larger than 1), $\exp(\Delta A_0)$ tends to be degenerate and the estimation of $A_0$ fails. We take $\Delta = 1/10$.

Fig. 9 shows the point-wise convergence of the plug-in estimator $\hat S^d_n$ of the entropy of the stationary distribution of a two-state Markov process. The estimator has been computed from the $n = 2000$ first values of a Markov chain with transition matrix $P_0 = \exp(\Delta A_0)$, by steps of 20 coordinates.

Fig. 10 shows the empirical distribution of $\sqrt{n}\,(\hat S^d_n - S(\pi))/\hat S_{S,d}$ compared with the standard normal distribution. This empirical distribution has been obtained from the simulation of 200 independent trajectories of length $n = 1000$ of a Markov chain with transition matrix $P_0$.

We have tested the goodness of fit of the empirical distribution of $\sqrt{n}\,(S(\pi)-\hat S^d_n)/\hat S_{S,d}$ with the standard normal distribution through the Shapiro–Wilk normality test. Fig. 11 shows the value of the Shapiro–Wilk statistic $W$ and the corresponding p-value. The normality hypothesis is accepted at the 0.01 level of type-I error since the p-value is greater than 0.01.

Appendix

Lemma 1. Let $X = (X_t)_{t\in\mathbb R_+}$ be an ergodic Markov process with finite state space $E$, generator $A$ and stationary distribution $\pi = P(A)$, where $P$ is defined in (2). The derivative $D_P(A)$ of $P$ at $A$ is not null.

Proof. Let us prove that at least one partial derivative is not null. For any $i \in E$ and $(k,l) \in E^2$, the partial derivative of $P_i(A)$ with respect to $a_{k,l}$ is
$$\frac{\partial}{\partial a_{k,l}} P_i(A) = \Big[\sum_{j\in E} c_{j,j}(A)\Big]^{-2}\Big[\frac{\partial c_{i,i}(A)}{\partial a_{k,l}} \sum_{j\in E} c_{j,j}(A) - c_{i,i}(A)\sum_{j\in E}\frac{\partial c_{j,j}(A)}{\partial a_{k,l}}\Big].$$
Choosing $k = i$, the partial derivative of $c_{i,i}(A)$ with respect to $a_{i,l}$ is null for all $l\neq i$, and hence
$$\frac{\partial}{\partial a_{i,l}} P_i(A) = -\frac{c_{i,i}(A)}{\big[\sum_{j\in E} c_{j,j}(A)\big]^{2}} \sum_{j\neq i}\frac{\partial c_{j,j}(A)}{\partial a_{i,l}}.$$
Since $c_{i,i}(A)\neq 0$, it is sufficient to show that $\sum_{j\neq i}\partial c_{j,j}(A)/\partial a_{i,l}$ is not zero for at least one $l\neq i$. Actually, Lemma 2 below implies that none of the partial derivatives $\partial c_{j,j}(A)/\partial a_{i,l}$, $l\neq i$, are null and also that they are all of the same sign. That sign depends only on the number $s$ of states of the process; explicitly,

- first, $\partial c_{l,l}(A)/\partial a_{i,l} = \det(A_{(l,l)(i,i)})$, where $A_{(l,l)(i,i)}$ is the $(s-2)\times(s-2)$ matrix obtained from $A$ by cancelling both the $l$-th row and $l$-th column, and the $i$-th row and $i$-th column. This matrix satisfies the hypothesis of Lemma 2 with $p = s-2$, so $\partial c_{l,l}(A)/\partial a_{i,l}$ is not null and its sign is the sign of $(-1)^{s-2}$;
- if $j\neq l$, then $\partial c_{j,j}(A)/\partial a_{i,l} = \det\big[A_{(j,j)(i,l)}\big] + (-1)^{\delta(l)+i}\det\big[(A_{(j,j)})_{(i,\delta(l))}\big]$, with
$$\delta(l) = \begin{cases} l & \text{if } j > l,\\ l-1 & \text{if } j \le l.\end{cases}$$

In the same way as in the proof of Lemma 2 (expanding both determinants in cofactors with respect to the first column), we can prove that $\partial c_{j,j}(A)/\partial a_{i,l}\neq 0$ and that its sign is the sign of $(-1)^{s-2}$.

Consequently, not all of the partial derivatives $(\partial/\partial a_{i,l})P_i(A)$, for $l\in E\setminus\{i\}$, are null. □

Lemma 2. Let $M=(m_{u,v})$ be a $p\times p$ matrix of the form
$$M = \begin{pmatrix} -\sum_{v=1}^p m'_{1,v} & m'_{1,2} & \cdots & m'_{1,p}\\ m'_{2,1} & -\sum_{v=1}^p m'_{2,v} & \cdots & \vdots\\ \vdots & & \ddots & \end{pmatrix},$$
where $m'_{u,v}>0$ for all $u,v\in\{1,\dots,p\}$. Then the determinant of $M$ is positive if $p$ is even, negative if $p$ is odd.

Proof. By definition, $\det(M)=\sum_{\sigma\in S_p}\varepsilon(\sigma)\prod_{u=1}^p m_{u,\sigma(u)}$, where $S_p$ is the set of permutations of $\{1,\dots,p\}$ and $\varepsilon(\sigma)$ is the signature of $\sigma$. Let us introduce, for $q\in\{0,\dots,p\}$, the subset $S_p^{(q)}$ of $S_p$ of all permutations with $q$ fixed points. Then
$$\det(M)=\sum_{q=0}^p a_q, \qquad (21)$$
where
$$a_q=\sum_{\sigma\in S_p^{(q)}}\varepsilon(\sigma)\prod_{u=1}^p m_{u,\sigma(u)}.$$
Now, let us show that, due to the special form of the diagonal coefficients of the matrix, several terms in (21) vanish. First, for $q = p$, $S_p^{(p)}$ is reduced to the identity and
$$a_p=\prod_{u=1}^p m_{u,u}=\prod_{u=1}^p\Big(-\sum_{v=1}^p m'_{u,v}\Big)=(-1)^p\sum_{\eta\in E_p^{E_p}}\prod_{u=1}^p m'_{u,\eta(u)}, \qquad (22)$$
where $E_p=\{1,\dots,p\}$ and $E_p^{E_p}$ is the set of all applications from $E_p$ to $E_p$.

For $q\neq p$, let us introduce for $\sigma\in S_p^{(q)}$ the set $I_\sigma=\{u_1,\dots,u_q\}$ of all fixed points of $\sigma$. In order to separate the diagonal coefficients of $M$ from the others, we can write
$$a_q=\sum_{\sigma\in S_p^{(q)}}\varepsilon(\sigma)\prod_{u\in I_\sigma}m_{u,u}\prod_{u\notin I_\sigma}m_{u,\sigma(u)}. \qquad (23)$$
Moreover, since
$$\prod_{u\in I_\sigma}m_{u,u}=(-1)^q\prod_{u\in I_\sigma}\Big(\sum_{v=1}^p m'_{u,v}\Big)=(-1)^q\sum_{v\in E_p^q}\prod_{r=1}^q m'_{u_r,v_r},$$
identifying each $m'_{u_1,v_1}\cdots m'_{u_q,v_q}$ with the application $\mu$ from $I_\sigma$ to $E_p$ mapping $u_i$ to $v_i$ yields
$$\prod_{u\in I_\sigma}m_{u,u}=(-1)^q\sum_{\mu\in E_p^{I_\sigma}}\prod_{u\in I_\sigma}m'_{u,\mu(u)},$$
and hence
$$\prod_{u\in I_\sigma}m_{u,u}\prod_{u\notin I_\sigma}m_{u,\sigma(u)}=(-1)^q\sum_{\eta\in M_\sigma^{I_\sigma}}\prod_{u\in E_p}m'_{u,\eta(u)},$$
where $M_\sigma^{I_\sigma}$ is the subset of $E_p^{E_p}$ of all applications whose restriction to $I_\sigma^c$ is equal to $\sigma_{|I_\sigma^c}$. Therefore (23) becomes
$$a_q=(-1)^q\sum_{\sigma\in S_p^{(q)}}\varepsilon(\sigma)\sum_{\eta\in M_\sigma^{I_\sigma}}\prod_{u\in E_p}m'_{u,\eta(u)}.$$
In order to exchange the sums, let us introduce for any $F\subseteq E_p$ with cardinal $q$ the subset $S_p^{(F)}$ of $S_p$ containing the permutations whose set of fixed points is $F$. Then
$$a_q=(-1)^q\sum_{F\subseteq E_p,\,|F|=q}\;\sum_{\sigma\in S_p^{(F)}}\varepsilon(\sigma)\sum_{\eta\in M_\sigma^{F}}\prod_{u\in E_p}m'_{u,\eta(u)}.$$
Observe that $\bigcup_{\sigma\in S_p^{(F)}}M_\sigma^{F}$ can be identified with $M_F=\{\eta\in E_p^{E_p}: \eta_{|F^c}\in S_{F^c}^{(0)}\}$, where $S_{F^c}^{(0)}$ is the set of permutations of $F^c$ with no fixed point. So
$$a_q=(-1)^q\sum_{F\subseteq E_p,\,|F|=q}\;\sum_{\eta\in M_F}\varepsilon(\eta_{|F^c})\prod_{u\in E_p}m'_{u,\eta(u)}. \qquad (24)$$
Eqs. (22) and (24) together yield in (21)
$$\det(M)=(-1)^p\sum_{\eta\in E_p^{E_p}}\prod_{u\in E_p}m'_{u,\eta(u)}+\sum_{q=0}^{p-1}(-1)^q\sum_{F\subseteq E_p,\,|F|=q}\;\sum_{\eta\in M_F}\varepsilon(\eta_{|F^c})\prod_{u\in E_p}m'_{u,\eta(u)},$$
or
$$\det(M)=(-1)^p\sum_{\eta\in E_p^{E_p}}\prod_{u\in E_p}m'_{u,\eta(u)}+\sum_{q=0}^{p-1}(-1)^{q+1}\sum_{\eta\in A_{-1}^{(q)}}\prod_{u\in E_p}m'_{u,\eta(u)}+\sum_{q=0}^{p-1}(-1)^{q}\sum_{\eta\in A_{1}^{(q)}}\prod_{u\in E_p}m'_{u,\eta(u)},$$
where for any subset $F$ of $E_p$ we set $A_1^{(F)}=\{\eta\in M_F:\varepsilon(\eta_{|F^c})=1\}$ and $A_{-1}^{(F)}=\{\eta\in M_F:\varepsilon(\eta_{|F^c})=-1\}$, and also
$$A_1^{(q)}=\bigcup_{F\subseteq E_p,\,|F|=q}A_1^{(F)},\qquad A_{-1}^{(q)}=\bigcup_{F\subseteq E_p,\,|F|=q}A_{-1}^{(F)}.$$
Two different cases thus appear, according to whether $q+1$ and $p$ have the same parity or not. Let us suppose they have. Then
$$\det(M)=(-1)^p\Big[2\sum_{\eta\in A}\prod_{u=1}^p m'_{u,\eta(u)}+\sum_{\eta\in E_p^{E_p}\setminus A'}\prod_{u=1}^p m'_{u,\eta(u)}\Big], \qquad (25)$$
where
$$A=\Big(\bigcup_{q:\,2\mid(q+1-p)}A_{-1}^{(q)}\Big)\cup\Big(\bigcup_{q:\,2\mid(q-p)}A_1^{(q)}\Big),\qquad A'=\bigcup_{q=0}^{p-1}\big(A_1^{(q)}\cup A_{-1}^{(q)}\big).$$
Since $A'\neq E_p^{E_p}$, the latter sum in (25) is not empty and $\det(M)$ is of the sign of $(-1)^p$. The alternative case can be handled similarly. □

Lemma 3. Let $X=(X_t)_{t\in\mathbb R_+}$ be an ergodic three-state Markov process with state space $E$, generator $A$ and stationary distribution $\pi$. Let $S(\pi)=S(A)$ be its entropy. The differential of $S$ at $A$ is null if and only if $\pi$ is uniform on $E$.

Proof. Clearly, $D_S(A)=0$ if and only if
$$\frac{\partial}{\partial a_{i,j}}S(A)=-\sum_{k=1}^3\big[\log P_k(A)\big]\frac{\partial}{\partial a_{i,j}}P_k(A)=0,\qquad (i,j)\in E^2. \qquad (26)$$
First, let us assume that $\pi$ is uniform. Then $a_{(1,1)}=a_{(2,2)}=a_{(3,3)}$, and hence, since $\sum_{k=1}^3 P_k(A)=1$ for any ergodic generator $A$,
$$\frac{\partial}{\partial a_{i,j}}S(A)=\log 3\,\frac{\partial}{\partial a_{i,j}}\Big[\sum_{k=1}^3 P_k(A)\Big]=0,\qquad (i,j)\in E^2.$$
Conversely, let us assume that the stationary distribution is not uniform. Then at least two cofactors differ; we assume that $a_{(1,1)}<a_{(2,2)}\le a_{(3,3)}$ (if not, the order of the different states can be exchanged). We will show that at least one partial derivative in (26) is not null. The partial derivatives of $S$ are
$$\frac{\partial}{\partial a_{2,1}}S(A)=E_1\big[(a_{3,1}+a_{3,2})(a_{(2,2)}+a_{(3,3)})-a_{(1,1)}a_{1,3}\big]+E_3\big[a_{1,3}(a_{(1,1)}+a_{(2,2)})-(a_{3,1}+a_{3,2})a_{(3,3)}\big],$$
$$\frac{\partial}{\partial a_{2,3}}S(A)=E_1\big[a_{3,1}(a_{(2,2)}+a_{(3,3)})-a_{(1,1)}(a_{1,2}+a_{1,3})\big]+E_3\big[(a_{1,2}+a_{1,3})(a_{(1,1)}+a_{(2,2)})-a_{3,1}a_{(3,3)}\big],$$
where $E_1=\log\big[a_{(1,1)}/a_{(2,2)}\big]<0$ and $E_3=\log\big[a_{(3,3)}/a_{(2,2)}\big]\ge 0$. Therefore,
$$\frac{\partial}{\partial a_{2,1}}S(A)-\frac{\partial}{\partial a_{2,3}}S(A)=E_1\big[a_{3,2}(a_{(2,2)}+a_{(3,3)})+a_{(1,1)}a_{1,2}\big]-E_3\big[a_{1,2}(a_{(1,1)}+a_{(2,2)})+a_{3,2}a_{(3,3)}\big].$$
Since $X$ is ergodic, $a_{3,2}(a_{(2,2)}+a_{(3,3)})+a_{(1,1)}a_{1,2}>0$ and hence
$$E_1\big[a_{3,2}(a_{(2,2)}+a_{(3,3)})+a_{(1,1)}a_{1,2}\big]<0.$$
In the same way, $-E_3\big[a_{1,2}(a_{(1,1)}+a_{(2,2)})+a_{3,2}a_{(3,3)}\big]\le 0$. Thus, $(\partial/\partial a_{2,1})S(A)-(\partial/\partial a_{2,3})S(A)$ is negative and at least one of these derivatives is not null. □

Lemma 4. The ergodic generators with a uniform stationary distribution are those of the form
$$A=\begin{pmatrix}-a_{1,2}-a_{1,3} & a_{1,2} & a_{1,3}\\ a_{1,2}+a_{1,3}-a_{3,1} & -a_{1,2}-a_{1,3}+a_{3,1}-a_{2,3} & a_{2,3}\\ a_{3,1} & a_{1,3}+a_{2,3}-a_{3,1} & -a_{1,3}-a_{2,3}\end{pmatrix}.$$
Proof. The result follows easily from the relation $(\tfrac13,\tfrac13,\tfrac13)A=0$. □
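As a numerical sanity check (ours, not part of the paper), the sign statement of Lemma 2 and the uniform stationarity of the generators of Lemma 4 can be verified with NumPy; the rate values below are arbitrary choices keeping all off-diagonal entries of $A$ nonnegative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Lemma 2: with m'_{u,v} > 0 off the diagonal and diagonal -sum_v m'_{u,v},
# sign(det M) = (-1)^p.  Check on random instances for several sizes p.
for p in range(2, 7):
    Mp = rng.uniform(0.1, 2.0, size=(p, p))   # the m'_{u,v} > 0
    M = Mp.copy()
    np.fill_diagonal(M, -Mp.sum(axis=1))      # diagonal -sum over the full row
    assert np.sign(np.linalg.det(M)) == (-1) ** p

# Lemma 4: a three-state generator of the stated form has uniform
# stationary distribution, i.e. (1/3, 1/3, 1/3) A = 0.
a12, a13, a23, a31 = 0.5, 0.7, 0.4, 0.9
A = np.array([
    [-(a12 + a13),     a12,                       a13],
    [a12 + a13 - a31,  -(a12 + a13) + a31 - a23,  a23],
    [a31,              a13 + a23 - a31,           -(a13 + a23)],
])
pi = np.full(3, 1 / 3)
assert np.allclose(pi @ A, 0.0)
```

Both checks pass: the determinant sign alternates with the parity of $p$, and the uniform distribution is stationary for the displayed generator.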

References

Albert, A., 1962. Estimating the infinitesimal generator of a continuous time finite state Markov process. Ann. Math. Statist. 33, 727–753.
Anderson, T., Goodman, L., 1957. Statistical inference about Markov chains. Ann. Math. Statist. 28, 89–110.


Basharin, G., 1959. On a statistical estimation for the entropy of a sequence of independent random variables. Theor. Probab. Appl. 4, 333–336.
Bladt, M., Sørensen, M., 2005. Statistical inference for discretely observed Markov jump processes. J. Roy. Statist. Soc. B 67, 395–410.
Chiquet, J., Limnios, N., 2006. Estimating stochastic dynamical systems driven by a continuous-time jump Markov process. Methodol. Comput. Appl. Probab. 8, 431–447.
Ciuperca, G., Girardin, V., 2007. Estimation of the entropy rate of a countable Markov chain. Comm. Statist. Theory Methods 36, 1–15.
Cover, T., Thomas, J., 2006. Elements of Information Theory, second ed. Wiley, New Jersey.
Crommelin, D., Vanden-Eijnden, E., 2009. Data-based inference of generators for Markov jump processes using convex optimization. Multiscale Model. Simul. 7, 1751–1778.
El-Affendi, M., Kouvatsos, D., 1983. A maximum entropy analysis of the M/G/1 and G/M/1 queueing systems at equilibrium. Acta Inform. 19, 339–355.
Gao, Y., Kontoyiannis, I., Bienenstock, E., 2008. Estimating the entropy of binary time series: methodology, some theory and a simulation study. Entropy 10, 71–99.
Girardin, V., Sesboüé, A., 2009. Comparative construction of plug-in estimators of the entropy rate of two-state Markov chains. Methodol. Comput. Appl. Probab. 11 (2), 181–200.
Guiasu, S., 1986. Maximum entropy condition in queueing theory. J. Oper. Res. Soc. 37 (3), 293–301.
Harris, B., 1977. The statistical estimation of entropy in the non-parametric case. Colloq. Math. Soc. Janos Bolyai 16, 323–355.
Jaynes, E., 1957. Information theory and statistical mechanics. Phys. Rev. 106, 620–630.
Kingman, J., 1962. The imbedding problem for finite Markov chains. Z. Wahrscheinlichkeitstheor. 1, 14–24.
Metzner, P., Horenko, I., Schütte, C., 2007. Generator estimation of Markov jump processes based on incomplete observations non-equidistant in time. Phys. Rev. E 76, 066702.
Regnault, P., 2009. Plug-in estimator of the entropy rate of a pure-jump two-state Markov process. In: Bayesian Inference and Maximum Entropy Methods in Science and Engineering. AIP Conference Proceedings, vol. 1193, pp. 153–160.
Shannon, C., 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 379–423, 623–656.
Shao, J., 2003. Mathematical Statistics, second ed. Springer, New York.
Taga, Y., 1963. On the limiting distributions in Markov renewal processes with finitely many states. Ann. Inst. Statist. Math. 15, 1–10.
Zubkov, A., 1973. Limit distribution for a statistical estimator of the entropy. Theor. Probab. Appl. 18, 611–618.