Operations Research Letters 9 (1990) 305-313 North-Holland
September 1990
DISCRETE-TIME 'INVERSION' AND DERIVATIVE ESTIMATION FOR MARKOV CHAINS
Paul GLASSERMAN
AT&T Bell Laboratories, Holmdel, NJ 07733, USA
Received March 1989
Revised October 1989
In estimating functions of continuous-time Markov chains via simulation, one may reduce variance and computation by simulating only the embedded discrete-time chain. To estimate derivatives (with respect to transition probabilities) of functions of discrete-time Markov chains, we propose embedding them in continuous-time processes. To eliminate the additional variance and computation thereby introduced, we convert back to discrete time. For a restricted class of chains, we may embed in a continuous-time Markov chain and apply perturbation analysis derivative estimation. Embedding, instead, in a certain non-Markovian process yields an unbiased perturbation analysis estimate for general chains (but may have higher variance). When this last estimate is converted to discrete time, it turns into a likelihood ratio derivative estimate for the original, discrete-time chain, revealing a surprising connection between the two methods.
Markov chains * gradient estimation * simulation * perturbation analysis * likelihood ratios
1. Introduction

Hordijk, Iglehart and Schassberger [5] showed that in estimating functions of continuous-time Markov chains by simulation, one reduces both variance and computation by simulating only an embedded Markov chain and replacing state holding times by their (conditional) means. Fox and Glynn [1,2] generalized this technique and called it "discrete-time conversion". Intuitively, discrete-time conversion works because it both eliminates the need to generate holding times and removes the variability due to random holding times. Here, in order to estimate derivatives of functions of discrete-time Markov chains (with respect to transition probabilities), we invert this procedure and add variability to the holding times. This can make an otherwise discontinuous function of the chain continuous, a critical precondition for our approach to derivative estimation. To make our method more efficient, we then convert back to discrete time.

We first consider 'inverting' discrete-time conversion in the most obvious way: we embed the discrete-time chain in a continuous-time Markov chain. From the continuous-time chain we can estimate derivatives using the perturbation analysis estimate developed in [3]. This estimate is then easily converted to discrete time (as in [3]), yielding an algorithm that can be applied directly to the original chain, without ever actually randomizing holding times. But the estimate in [3] requires a condition on continuous-time chains which translates to an analogous condition in discrete time. Since this condition is not always satisfied, we consider other embeddings. We find a class of non-Markovian, continuous-time processes in which any chain can be embedded, and for which the resulting perturbation analysis estimate is unbiased. Interestingly, when this estimate is converted back to discrete time, the result is just a likelihood ratio derivative estimate for the original chain.
This reveals an unexpected connection between two approaches to derivative estimation.
0167-6377/90/$3.50 © 1990 - Elsevier Science Publishers B.V. (North-Holland)
2. Preliminaries
Let X_t be a discrete state space, continuous-time Markov chain with generator Q. Suppose X_t has no instantaneous or absorbing states. Denote by τ_n, n ≥ 0, the epoch of the n-th jump of X_t and let τ_0 = 0. Let Y = {Y_i, i = 0, 1, ...} be the jump chain embedded in X_t at τ_0, τ_1, .... A standard problem in simulation (and other areas) is that of estimating the expectation of the integral over [0, T] of a bounded, real-valued function, f, of X_t. If T = τ_n, then in terms of the embedded chain, this integral can be expressed as

L_n = ∫_0^{τ_n} f(X_t) dt = Σ_{i=0}^{n-1} f(Y_i)(τ_{i+1} − τ_i).    (1)

Taking the conditional expectation given Y, we get

E[ Σ_{i=0}^{n-1} f(Y_i)(τ_{i+1} − τ_i) | Y ] = Σ_{i=0}^{n-1} f(Y_i) q^{-1}(Y_i),
where q^{-1}(y) is the mean holding time in state y. (The holding time τ_{i+1} − τ_i in Y_i is exponentially distributed with mean q^{-1}(Y_i).) The last quantity is an unbiased estimate of E[L_n] which can be calculated from a simulation of Y only, and has smaller variance than (1). This is discrete-time conversion.

Now let {P(θ), θ ∈ Θ} be a parametric family of Markov transition matrices. Let E_θ denote expectation under P(θ) with fixed initial state Y_0 = y_0, and consider the problem of estimating dE_θ[L]/dθ, where L = Σ_{i=0}^n f(Y_i). The perturbation analysis approach to this type of problem tries to construct, on a common probability space, a family {Y(θ), θ ∈ Θ} of processes, each Y(θ) a Markov chain with transition matrix P(θ), so that L(θ) is, at each θ, differentiable with probability one. The 'sample' derivative dL/dθ is used to estimate dE_θ[L]/dθ. If E denotes expectation on the common probability space, then this estimate is unbiased if
E[dL/dθ] = dE[L]/dθ.    (2)

Equation (2) will typically fail to hold unless L is continuous in θ. But unless f is constant or P(θ) is independent of θ, no construction of Y(θ) makes L continuous: since L takes on only discrete values, it must be constant to be continuous. To put this another way, dL/dθ ≡ 0 wherever it exists. This suggests looking for an alternative approach. We may identify Y, in the obvious way, with a continuous-time, semi-Markov process in which every state holding time, δ_i, is identically one. Suppose we randomize the holding times and allow them to change with θ. This embeds each Y(θ) in a continuous-time process. It may be possible to choose the δ_i in such a way that
L̃(θ) = Σ_{i=0}^n f(Y_i(θ)) δ_i(θ)

satisfies E[L̃(θ)] = E[L(θ)] for all θ. Since L̃ may take a continuum of values, we cannot a priori exclude its continuity the way we can that of L. We may in fact be able to use dL̃/dθ as an unbiased estimate of dE[L]/dθ.
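The variance and work reduction from discrete-time conversion is easy to see numerically. The following sketch (the three-state chain, rates, and cost function are invented for illustration) compares the raw integral estimate (1) with its conditional expectation given Y:

```python
import random

# Numerical sketch of discrete-time conversion (invented example): q[x] is the
# total jump rate out of x, P[x][y] the jump-chain transition probability,
# f a bounded cost function.
q = {0: 1.0, 1: 2.0, 2: 0.5}
P = {0: {1: 0.75, 2: 0.25}, 1: {0: 0.5, 2: 0.5}, 2: {0: 1.0}}
f = {0: 1.0, 1: 3.0, 2: -2.0}

def sample_next(x, rng):
    u, acc = rng.random(), 0.0
    for y, p in P[x].items():
        acc += p
        if u < acc:
            return y
    return y    # guard against floating-point rounding of the row sum

def estimates(n, rng):
    """Return (raw, converted) estimates of E[integral of f(X_t) over n jumps]."""
    x, raw, conv = 0, 0.0, 0.0
    for _ in range(n):
        raw += f[x] * rng.expovariate(q[x])   # estimate (1): random holding time
        conv += f[x] / q[x]                   # its conditional mean given Y
        x = sample_next(x, rng)
    return raw, conv

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((z - m) ** 2 for z in v) / len(v)

rng = random.Random(1)
raws, convs = zip(*(estimates(20, rng) for _ in range(5000)))
# Both are unbiased for the same quantity; conditioning can only reduce
# variance, by the conditional-variance decomposition.
print(mean(raws), mean(convs), variance(raws), variance(convs))
```

On examples like this, the converted estimate shows a noticeably smaller variance at no extra sampling cost, since no holding times need be generated.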
3. Embedding in a Markov process
Suppose every P(θ) has only zero as a diagonal entry. From the transition matrix P(θ) define a generator Q(θ) by setting Q(x, y, θ) = P(x, y, θ) if x ≠ y, and every Q(x, x, θ) = −1. (If P(θ) had a non-zero diagonal entry we could construct Q(θ) as a uniformized generator, allowing transitions from a state to itself, but we will not carry that out here.) Under Q(θ), the mean holding time in every state is one; so if X_t(θ) has Q(θ) as its generator,

E[ ∫_0^{τ_{n+1}} f(X_t(θ)) dt ] = E[L(θ)].

Moreover, the derivatives of both sides are equal, wherever they exist. In [3], we showed how to construct continuous-time Markov chains (satisfying some simple conditions) so that
L̃(θ) = ∫_0^{τ_{n+1}} f(X_t(θ)) dt

is continuous throughout (θ_0 − h, θ_0 + h) with probability 1 − O(h²), for each θ_0 in an open interval Θ. The conditions we imposed on Q(θ) there translate into the following analogous conditions on P(θ):
(a) Every P(x, y, θ) is a differentiable function of θ on Θ.
(b) If P(x, y, θ) = 0 for some θ, it is zero for all θ ∈ Θ.
(c) For every pair of states x, y, if there is a z for which P(z, x)P(z, y) > 0, then there is a z' with P(x, z')P(y, z') > 0.
The first two of these are quite natural; the last is more restrictive, but is critical to the construction. Some indication of its role will be given below. In the meantime, from Lemma 8.1 of [3] we get the following lemma.

Lemma 1. Suppose {P(θ), θ ∈ Θ} satisfies (a)-(c). Suppose that, uniformly in x, y, and θ, every |P'(x, y, θ)| is bounded above and every non-zero P(x, y, θ) is bounded away from zero. Then for each θ_0 ∈ Θ, X_t(θ) can be constructed so that the following hold with probability 1 − O(h²): (i) every jump epoch τ_i is continuous throughout (θ_0 − h, θ_0 + h); and (ii) if Y_i = X_{τ_i+} has a discontinuity at some θ ∈ (θ_0 − h, θ_0 + h), then τ_{i+1}(θ) = τ_i(θ).

Rewriting L̃ as a sum, as in (1), it is not hard to see that L̃ is continuous wherever the two parts of Lemma 1 hold. In fact, under the conditions for Lemma 1, Theorem 8.1 of [3] shows that

E[dL̃/dθ] = dE[L̃]/dθ = dE[L]/dθ

(including the fact that the derivative on the far right exists). We also show in [3] how to compute dL̃/dθ from simulation of X, and how to convert to discrete time in order to compute E[dL̃/dθ | Y] from simulation of Y. This, too, is an unbiased estimate of dE[L]/dθ.

We now present the algorithm that results from embedding the chain as above and applying the version of the derivative estimate in [3] converted (back) to discrete time. For a detailed discussion and further motivation, see [3]. We need some notation. For states x and y satisfying the hypothesis of (c), let K(x, y) be a state z' satisfying the conclusion. When (c) holds, if y is some state reachable in one step from Y_{i−1}, but from Y_{i−1} the process moves instead to Y_i ≠ y, then there is a state z = K(Y_i, y) reachable from both Y_i and y. We need the inverse image (given Y_{i−1} and Y_i) of the mapping that takes y to z in this way. Hence, given Y_{i−1} and Y_i, let
S_i(z) = { y : K(Y_i, y) = z, P(Y_{i−1}, y) > 0 },

which may be empty. In the algorithm below, accumulate 'perturbations' in variables D_n(y). As Y is simulated or observed, apply the following steps:

Step 1. Initialize by setting every D_0(y) = 0 and dL_0 = 0.
Step 2. At the i-th transition, update by setting

D_i(Y_i) = D_{i−1}(Y_{i−1}) − P'(Y_{i−1}, Y_i)/P(Y_{i−1}, Y_i);
Volume 9, Number 5
OPERATIONS RESEARCHLETrERS
September 1990
for every x with P(Y_i, x) > 0 and |S_i(x)| > 0,

D_i(x) = D_i(Y_i) + Σ_{y ∈ S_i(x)} [ P(Y_{i−1}, y)/P(Y_i, x) ] (D_{i−1}(y) − D_i(Y_i));

while if P(Y_i, x) > 0 and |S_i(x)| = 0, then D_i(x) = D_i(Y_i). Set dL_i = dL_{i−1} + f(Y_{i−1})[D_i(Y_i) − D_{i−1}(Y_{i−1})].
Step 3. After n + 1 transitions, stop; the value of dL_{n+1} is E[dL̃/dθ | Y].

This solves the derivative estimation problem for chains satisfying (c) (and (a) and (b)). To get a more general estimate, we look more carefully at the implications of (c) and Lemma 1. Let δ_i = τ_{i+1} − τ_i denote the i-th holding time, and write L̃(θ) = Σ_i f(Y_i(θ)) δ_i(θ). Part (i) of Lemma 1 implies that each δ_i is (with high probability) continuous in θ. Roughly speaking, what (c) guarantees is that if, through a change in θ, Y_i becomes Y_i*, then
(Y_1, ..., Y_i, ..., Y_n)

becomes

(Y_1, ..., Y_i*, ..., Y_n);

that is, condition (c) allows us (in [3]) to construct Y(θ) so as to force the process to return to Y_{i+1} even after a discontinuity in Y_i. In order for L̃ to be continuous at a discontinuity in Y_i, it is, therefore, sufficient to have δ_i = 0 at a discontinuity of Y_i. This is exactly what part (ii) of Lemma 1 ensures. If condition (c) is dropped, a discontinuity in Y_i potentially introduces discontinuities in all of Y_{i+1}, ..., Y_n. Thus, we are led to try to construct the holding times δ_i so that at a discontinuity of Y_i, all of δ_i, ..., δ_n go to zero.
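For a finite chain, condition (c) and a choice of the map K(x, y) used in the algorithm above can be checked mechanically. A sketch (the four-state matrix is an invented truncation of the reflected random walk of Section 6):

```python
import itertools

# Sketch: checking condition (c) for a finite stochastic matrix and recording
# a choice of K(x, y), a common successor of x and y.  By condition (b) the
# zero pattern, and hence this check, does not depend on theta.
def common_successor_map(P):
    n = len(P)
    K = {}
    for x, y in itertools.permutations(range(n), 2):
        # (c) only constrains pairs x, y that share a common predecessor z
        if not any(P[z][x] > 0 and P[z][y] > 0 for z in range(n)):
            continue
        zs = [z for z in range(n) if P[x][z] > 0 and P[y][z] > 0]
        if not zs:
            return None           # condition (c) fails for this pair
        K[(x, y)] = zs[0]
    return K

p = 0.4
P = [[0.0, 1.0, 0.0, 0.0],
     [1 - p, 0.0, p, 0.0],
     [0.0, 1 - p, 0.0, p],
     [0.0, 0.0, 1.0, 0.0]]
K = common_successor_map(P)
print(K)
```

Here every constrained pair of states has a common successor, so (c) holds for this matrix.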
4. Embedding in a non-Markov process
Motivated by these observations, we now embed Y in a non-Markovian continuous-time process Z. The process Z stays in state Y_i for an interval of length δ_i, then jumps to Y_{i+1}. We will define the δ_i so that E[δ_i | Y] = 1; hence, if τ_{i+1} = Σ_{k=0}^i δ_k, then, applying (1),

E[L̃] = E[ ∫_0^{τ_{n+1}} f(Z_t) dt ] = E[ E[ Σ_{i=0}^n f(Y_i) δ_i | Y ] ] = E[ Σ_{i=0}^n f(Y_i) ] = E[L].    (3)
However, the δ_i will no longer be conditionally independent given Y, nor will they be exponentially distributed. For each state x, partition the unit interval into subintervals, one of length P(x, y) for each y with P(x, y) > 0. Let the y-th such interval be [a_xy, b_xy), where

a_xy = Σ_{y' < y} P(x, y')

and b_xy = a_xy + P(x, y). For 0 ≤ u < 1, let s_x(u) = y when a_xy ≤ u < b_xy, let δ_x(u) be the distance from u to the nearer endpoint of that interval, and let r_x(u) = 4δ_x(u)/P(x, s_x(u)). With U_1, U_2, ... independent and uniform on [0, 1), construct Y by setting Y_{i+1} = s_{Y_i}(U_{i+1}).
Fig. 1. Illustrating δ_x(u) when s_x(u) = y: the interval [a_xy, b_xy), of length P(x, y), inside [0, 1].
One checks easily that 4E[δ_x(U) | s_x(U)]/P(x, s_x(U)) = 1; that is, r_x(U) has conditional mean one, given s_x(U). With Y constructed as above, we define the δ_i by setting δ_0 ≡ 1 and, for i ≥ 0, δ_{i+1} = r_{Y_i}(U_{i+1}) δ_i. Some of these quantities are illustrated in Figure 1. As the figure suggests, continuous changes in the endpoints a_xy and b_xy can only change the value of s_x(u) by first reducing δ_x(u) (hence r_x(u)) to zero.

We need to verify that each δ_i indeed has conditional expectation equal to one. Note that, given Y, the r_{Y_i}(U_{i+1}) become conditionally independent because the U_i are independent. Moreover, each r_{Y_i}(U_{i+1}) depends on Y only through Y_i and Y_{i+1}. Thus,

E[δ_i | Y] = E[ r_{Y_0}(U_1) ⋯ r_{Y_{i−1}}(U_i) | Y ]
= E[r_{Y_0}(U_1) | Y_0, Y_1] ⋯ E[r_{Y_{i−1}}(U_i) | Y_{i−1}, Y_i]
= E[r_{Y_0}(U_1) | Y_0, s_{Y_0}(U_1)] ⋯ E[r_{Y_{i−1}}(U_i) | Y_{i−1}, s_{Y_{i−1}}(U_i)]
= 1 ⋯ 1 = 1.

Thus, (3) holds. If P depends on θ, then so do Y and the δ_i as constructed above; in fact, we have:
Lemma 2. If every P(x, y, θ) is continuous in θ, then so is L̃ = Σ_{i=0}^n f(Y_i) δ_i.

Proof. If P changes continuously then so do all the interval endpoints a_xy and b_xy. This makes δ_x(u) continuous in θ for every x and u; hence, also every r_x(u). Thus, for each k, so long as Y_1, ..., Y_k remain constant, δ_1, ..., δ_k are continuous. Consider, therefore, a point θ_0 where some Y_k is discontinuous; and let k be, in fact, the smallest such index. Let θ_m be any sequence in Θ converging to θ_0. For Y_k to be discontinuous at θ_0, s_{Y_{k−1}}(U_k) must be discontinuous, which can only happen if δ_{Y_{k−1}}(U_k) = 0. In fact, δ_{Y_{k−1}}(U_k) decreases continuously to zero as θ_m goes to θ_0. Thus, so does δ_k, and also every δ_j with j > k. This implies that

L̃(θ_m) = Σ_{i=0}^n f(Y_i(θ_m)) δ_i(θ_m) → Σ_{i=0}^{k−1} f(Y_i(θ_0)) δ_i(θ_0) = L̃(θ_0),

so that the discontinuity of Y does not impede the continuity of L̃.
□
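The interval construction and the conditional-mean-one property of r_x(U) can be checked directly. A sketch, assuming a single hypothetical row P(x, ·) = (0.25, 0.5, 0.25):

```python
import random

# Sketch of the interval partition of [0, 1): s_x(u) is the selected successor,
# delta_x(u) the distance from u to the nearer endpoint of its interval, and
# r_x(u) = 4 * delta_x(u) / P(x, s_x(u)).
def make_intervals(row):
    """row: list of (y, P(x, y)); returns list of (y, a_xy, b_xy)."""
    out, a = [], 0.0
    for y, p in row:
        if p > 0:
            out.append((y, a, a + p))
            a += p
    return out

def s_delta_r(intervals, u):
    for y, a, b in intervals:
        if a <= u < b:
            d = min(u - a, b - u)
            return y, d, 4.0 * d / (b - a)
    raise ValueError("u outside [0, 1)")

# Monte Carlo check that E[r_x(U) | s_x(U) = y] = 1 for each successor y.
rng = random.Random(7)
iv = make_intervals([(0, 0.25), (1, 0.5), (2, 0.25)])
tot, cnt = {}, {}
for _ in range(200_000):
    y, d, r = s_delta_r(iv, rng.random())
    tot[y] = tot.get(y, 0.0) + r
    cnt[y] = cnt.get(y, 0) + 1
cond_means = {y: tot[y] / cnt[y] for y in sorted(tot)}
print(cond_means)   # each entry close to 1
```

Given s_x(U) = y, the distance to the nearer endpoint is uniform on [0, P(x, y)/2), with mean P(x, y)/4; the factor 4/P(x, y) normalizes this to one.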
We also have the following lemma.

Lemma 3. For each θ, L̃ is, with probability 1, differentiable, and

dL̃/dθ = Σ_{i=0}^n f(Y_i) dδ_i/dθ.

Proof. The only points where dL̃/dθ fails to exist are those where some U_i, 1 ≤ i ≤ n, equals a_{Y_{i−1} Y_i} or b_{Y_{i−1} Y_i}. For each θ, such an event has probability zero. Everywhere else, the result follows by simply differentiating, recalling that df(Y_i)/dθ can only be zero. □
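Before turning to conditions for unbiasedness, the identity (3), E[L̃] = E[L], can itself be checked by simulation. A self-contained sketch with an invented three-state chain:

```python
import random

# delta_0 = 1 and delta_{i+1} = r_{Y_i}(U_{i+1}) * delta_i; compare the Monte
# Carlo means of L = sum f(Y_i) and L~ = sum f(Y_i) * delta_i.
P = {0: [(1, 0.75), (2, 0.25)], 1: [(0, 0.5), (2, 0.5)], 2: [(0, 1.0)]}
f = {0: 1.0, 1: 3.0, 2: -2.0}
n = 10

def step(x, u):
    """Return (next state, r_x(u)) from the interval partition of [0, 1)."""
    a = 0.0
    for y, p in P[x]:
        if a <= u < a + p:
            d = min(u - a, a + p - u)
            return y, 4.0 * d / p
        a += p
    raise ValueError("u outside [0, 1)")

def one_run(rng):
    x, delta, L, Lt = 0, 1.0, 0.0, 0.0
    for _ in range(n + 1):
        L += f[x]
        Lt += f[x] * delta
        x, r = step(x, rng.random())
        delta *= r
    return L, Lt

rng = random.Random(3)
Ls, Lts = zip(*(one_run(rng) for _ in range(100_000)))
print(sum(Ls) / len(Ls), sum(Lts) / len(Lts))   # means agree up to noise
```

As the text warns, L̃ has higher variance than L: the δ_i are products of mean-one factors, so their fluctuations compound with i.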
Volume 9, Number 5
September 1990
OPERATIONS RESEARCHLETTERS
Lemma 3 yields the perturbation analysis estimate obtained by embedding Y in Z. We now find conditions under which this estimate is unbiased. Differentiating r_x(u) with respect to θ, we get

r_x'(u) = 4 ( δ_x'(u) P(x, s_x(u)) − δ_x(u) P'(x, s_x(u)) ) / P(x, s_x(u))².    (4)

Note that δ_x'(u) is equal to either −a'_{x s_x(u)} or b'_{x s_x(u)}, depending on which endpoint u is closer to. Suppose there are positive constants B, M and p* such that the following hold, for all x, y and θ ∈ Θ:

|P'(x, y, θ)| ≤ B,    (5)

P(x, y, θ) > 0 ⟹ P(x, y, θ) ≥ p*,    (6)

and

|{ y : P(x, y, θ) > 0 }| ≤ M.    (7)

Writing

δ_i = r_{Y_0}(U_1) ⋯ r_{Y_{i−1}}(U_i)    (8)

and observing that |r_x(u)| ≤ 4/p*, we find that

|dδ_i/dθ| ≤ i (4/p*)^{i−1} (8BM/p*²).
Theorem 1. If P(θ) satisfies (5)-(7) and (a)-(b), then dE[L]/dθ exists and equals E[dL̃/dθ] under the construction above.

Proof. Using the bounds above, we see that |dL̃/dθ| is uniformly bounded on Θ by a constant, namely n²(sup |f|)(4/p*)^n(8BM/p*²). Together with the continuity of Lemma 2, this is enough to permit the interchange of differentiation and expectation. □

This solves our derivative estimation problem for a general class of chains and leads to the following simple algorithm:
Step 1. Initialize by setting δ_0 = 1, δ_0' = 0, dL_0 = 0, and fixing Y_0 = y_0.
Step 2. Generate the i-th transition by setting Y_i = s_{Y_{i−1}}(U_i). Let

δ_i = r_{Y_{i−1}}(U_i) δ_{i−1};  δ_i' = r'_{Y_{i−1}}(U_i) δ_{i−1} + r_{Y_{i−1}}(U_i) δ'_{i−1},

and

dL_i = dL_{i−1} + f(Y_i) δ_i'.    (9)

Step 3. After n transitions, stop; the value of dL_n is dL̃/dθ.

This algorithm has the shortcoming that it requires using a particular scheme to generate Y, and could not, for example, be applied to observations of Y generated by some other means. It also uses randomized holding times. To get around this, we condition the estimate dL̃/dθ on Y, which will make it a function of Y. That is, we convert to discrete time. Note that

E[dL̃/dθ | Y] = Σ_{i=0}^n f(Y_i) E[δ_i' | Y].
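For concreteness, here is a sketch of Steps 1-3 on a hypothetical two-state family with P(0, 1, θ) = θ. Between the exceptional points of Lemma 3, L̃(θ) is differentiable, so the recursion can be checked against a central finite difference computed from the same uniforms:

```python
import random

def row(x, theta):
    # Invented example: P(0,0) = 1 - theta, P(0,1) = theta; row 1 is constant.
    return [(0, 1.0 - theta), (1, theta)] if x == 0 else [(0, 0.5), (1, 0.5)]

def drow(x, theta):
    # Entrywise derivatives in theta of the probabilities above.
    return [-1.0, 1.0] if x == 0 else [0.0, 0.0]

def step(x, u, theta):
    """One transition: return (y, r_x(u), d r_x(u)/d theta)."""
    items = list(zip(row(x, theta), drow(x, theta)))
    a, da = 0.0, 0.0
    for j, ((y, p), dp) in enumerate(items):
        b = 1.0 if j == len(items) - 1 else a + p   # guard float rounding at 1
        db = da + dp
        if a <= u < b:
            if u - a < b - u:          # nearer left endpoint: delta = u - a
                d, dd = u - a, -da
            else:                      # nearer right endpoint: delta = b - u
                d, dd = b - u, db
            return y, 4.0 * d / p, 4.0 * (dd * p - d * dp) / p ** 2   # (4)
        a, da = b, db

f = {0: 1.0, 1: 5.0}

def L_tilde(us, theta):
    x, delta, total = 0, 1.0, 0.0
    for u in us:
        total += f[x] * delta
        x, r, _ = step(x, u, theta)
        delta *= r
    return total

def pa_derivative(us, theta):
    # Steps 1-3: carry (delta_i, delta_i') along the path via the product rule.
    x, delta, ddelta, dL = 0, 1.0, 0.0, 0.0
    for u in us:
        dL += f[x] * ddelta
        x, r, dr = step(x, u, theta)
        delta, ddelta = delta * r, dr * delta + r * ddelta
    return dL

rng = random.Random(11)
us = [rng.random() for _ in range(6)]
theta, h = 0.6, 1e-7
pa = pa_derivative(us, theta)
fd = (L_tilde(us, theta + h) - L_tilde(us, theta - h)) / (2 * h)
print(pa, fd)
```

With probability one the window (θ − h, θ + h) contains no kink of L̃, and the two numbers agree to many digits.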
Thus, the algorithm above would be modified simply by replacing δ_i' with E[δ_i' | Y] in (9), and generating Y by any means. To calculate E[δ_i' | Y], first note that

E[δ'_{Y_i}(U_{i+1}) | Y_i, Y_{i+1}] = ½(−a'_{Y_i Y_{i+1}}) + ½(b'_{Y_i Y_{i+1}}) = ½ P'(Y_i, Y_{i+1}).

Recalling that E[δ_{Y_i}(U_{i+1}) | Y] = ¼ P(Y_i, Y_{i+1}), from (4) we get

E[r'_{Y_i}(U_{i+1}) | Y] = E[r'_{Y_i}(U_{i+1}) | Y_i, Y_{i+1}]
= 4 ( ½ P'(Y_i, Y_{i+1}) P(Y_i, Y_{i+1}) − ¼ P(Y_i, Y_{i+1}) P'(Y_i, Y_{i+1}) ) / P(Y_i, Y_{i+1})²
= P'(Y_i, Y_{i+1}) / P(Y_i, Y_{i+1}).

Next, differentiating (8) and using the conditional independence of the r_{Y_i}(U_{i+1}), we get
E[δ_i' | Y] = E[ Σ_{j=0}^{i−1} ( ∏_{k≠j} r_{Y_k}(U_{k+1}) ) r'_{Y_j}(U_{j+1}) | Y ]
= Σ_{j=0}^{i−1} ∏_{k≠j} E[r_{Y_k}(U_{k+1}) | Y] · E[r'_{Y_j}(U_{j+1}) | Y]
= Σ_{j=0}^{i−1} P'(Y_j, Y_{j+1}) / P(Y_j, Y_{j+1}).

Thus,

E[dL̃/dθ | Y] = Σ_{i=0}^n f(Y_i) Σ_{j=0}^{i−1} P'(Y_j, Y_{j+1}) / P(Y_j, Y_{j+1}).    (10)
We recognize the right side of (10) as a likelihood ratio derivative estimate, in the sense of Glynn [4] and Reiman and Weiss [7]. To see the connection, write E[L] = E[f(Y_0)] + ⋯ + E[f(Y_n)] and

E[f(Y_i)] = Σ_{y_1,...,y_i} f(y_i) P(y_0, y_1) ⋯ P(y_{i−1}, y_i),

where the sum runs over all y_1, ..., y_i that make the product of P entries positive. (Recall that we have fixed Y_0 = y_0.) Differentiating, then multiplying and dividing by P(y_0, y_1) ⋯ P(y_{i−1}, y_i), we get

E[f(Y_i)]' = Σ_{y_1,...,y_i} f(y_i) [ Σ_{j=0}^{i−1} P'(y_j, y_{j+1}) / P(y_j, y_{j+1}) ] P(y_0, y_1) ⋯ P(y_{i−1}, y_i)
= E[ f(Y_i) Σ_{j=0}^{i−1} P'(Y_j, Y_{j+1}) / P(Y_j, Y_{j+1}) ].

From this we see directly that E[L]' is the expectation of the quantity on the right in (10).
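The estimate (10) is easy to test numerically. In the sketch below (a hypothetical two-state family with P(0, 1, θ) = θ), the Monte Carlo average of the right side of (10) is compared with a numerical derivative of E[L] computed from the exact distributional recursion:

```python
import random

theta0, n = 0.4, 5
f = [0.0, 1.0]

def P(theta):
    return [[1.0 - theta, theta], [0.75, 0.25]]

dP = [[-1.0, 1.0], [0.0, 0.0]]     # entrywise d/dtheta of P

def exact_EL(theta):
    """E[L] = sum over i = 0..n of E[f(Y_i)], by the distribution recursion."""
    M, mu, total = P(theta), [1.0, 0.0], 0.0   # Y_0 = 0
    for _ in range(n + 1):
        total += mu[0] * f[0] + mu[1] * f[1]
        mu = [mu[0] * M[0][0] + mu[1] * M[1][0],
              mu[0] * M[0][1] + mu[1] * M[1][1]]
    return total

def lr_estimate(rng):
    M, x, W, est = P(theta0), 0, 0.0, f[0]
    for _ in range(n):
        y = 1 if rng.random() < M[x][1] else 0
        W += dP[x][y] / M[x][y]    # accumulate the score sum in (10)
        x = y
        est += f[x] * W
    return est

rng = random.Random(5)
m = 200_000
mc = sum(lr_estimate(rng) for _ in range(m)) / m
h = 1e-6
exact = (exact_EL(theta0 + h) - exact_EL(theta0 - h)) / (2 * h)
print(mc, exact)
```

Note that the estimate touches only the simulated path and the ratios P'/P, so Y may indeed be generated by any means, as the text emphasizes.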
5. Initial distributions and hitting times

The embedding of Y in Z of the previous section has two interesting properties, in addition to Theorem 1 and the connection with likelihood ratios: the continuity of L̃ is preserved even if Y_0 is chosen from an initial distribution that depends on θ, or if f(Z_t) is integrated up to the time to reach some state, rather than up to τ_{n+1}. Continuity fails in both of these cases under the construction in [3] and every other application of perturbation analysis of which we are aware. We briefly indicate why the present example is different.

Throughout, we have assumed that Y_0 is fixed at y_0. Our results go through as before if Y_0 is chosen from a distribution, μ, independent of θ. But what if μ also changes with θ? We may partition [0, 1] into intervals of length μ(y, θ), and define s(u) and r(u) for this partition as we did s_x(u) and r_x(u) for the partition based on P(x, ·, θ). By taking δ_0 = r(U_0) (rather than δ_0 = 1), with U_0 uniform on [0, 1], we preserve the continuity of L̃. This construction changes the initial state by first reducing all holding times continuously to zero, so that L̃ is continuous even at a discontinuity in Y_0.

Hitting times work similarly. Let T_x be the time to reach x. Suppose at θ_0 the first entrance to x occurs at τ_j (so that Y_j = x), and that at θ_0+ it occurs at τ_k with k ≠ j. For this to happen, there must be some least i ≤ min(j, k) such that Y_i has a discontinuity at θ_0. But then δ_m(θ_0) = 0 for all m ≥ i, making every τ_m(θ_0), m > i, equal to τ_i(θ_0). This makes T_x(θ_0+) = τ_k(θ_0) = τ_i(θ_0) = T_x(θ_0). Similarly,
∫_0^{T_x} f(Z_t(θ)) dt    (11)
is continuous at θ_0. The continuity of (11) is important because through functions of this form we can get at steady-state values using the regenerative method.

6. An example and a comparison

We illustrate the derivative estimates of Sections 3 and 4 through a random walk example, for which the estimates are quite simple. The conditions for both methods will be in force, so either could be applied. One might question whether it is worth considering the estimate of Section 3 given that the one in Section 4 is more general. But the two methods behave quite differently when the time parameter, n, becomes large. One purpose of this section is to suggest that, in some cases, there may be advantages to the method of Section 3 when both are applicable.

Consider a homogeneous random walk on the non-negative integers, with reflection at the origin:

P(x, y, θ) = 1 if x = 0, y = 1;  p(θ) if x > 0, y = x + 1;  1 − p(θ) if x > 0, y = x − 1.
We take 0 < p(θ) < 1 for all θ, and always start at Y_0 = 0. To consider the estimate of Section 3, we need to evaluate D_i(Y_i) − D_{i−1}(Y_{i−1}). Away from the origin, D_{i+1}(Y_i + 1) = D_i(Y_{i−1} + 1) − p'/p and D_{i+1}(Y_i − 1) = D_i(Y_{i−1} − 1) + p'/(1 − p). But at every transition from 0 to 1, the third case of Step 2 applies, and D_i(0) is set equal to D_i(1). Combining these observations, it can be shown that D_{i+1}(Y_{i+1}) − D_i(Y_i) = Δ_i(Y_i − Y_{i−1}, Y_{i+1} − Y_i), where

Δ_i(1, 1) = −p'/p,
Δ_i(1, −1) = (i − N_i) p'/(p(1 − p)) + p'/(1 − p),
Δ_i(−1, 1) = −(i − N_i) p'/(p(1 − p)) − p'/p,
Δ_i(−1, −1) = p'/(1 − p);
and N_i is the time of the last visit to 0 before i. The resulting derivative estimate is just

Σ_{i=1}^n f(Y_i) Δ_i(Y_i − Y_{i−1}, Y_{i+1} − Y_i).    (12)

The estimate in (10) takes the form

Σ_{i=1}^n f(Y_i) R_i,    (13)
where R_i is p'/p times the number of up jumps from a non-zero state, plus −p'/(1 − p) times the number of down jumps (up to time i). Now divide (12) and (13) by n and consider what happens as n becomes large. Since (Σ_{i=1}^n f(Y_i))/n converges to a constant, it is reasonable to expect the behavior of Δ_n and R_n, for large n, to determine the stability of the two estimates. For Δ_n, note that only the middle two cases of the four considered above present possible difficulties. But note, also, that n − N_n is just the 'age' associated with the renewal process of returns to 0. Since the time between returns to 0 is geometrically distributed (on the even integers), it has finite moments of all orders. Thus, Var(n − N_n) and, hence, Var(Δ_n) remain bounded as n → ∞.

For R_n, we argue as in Section 4 of Glynn [4]. Note that, in general,

E[ P'(Y_{i−1}, Y_i)/P(Y_{i−1}, Y_i) | Y_0, ..., Y_{i−1} ] = 0,

and, in the present example, each term |P'(Y_{i−1}, Y_i)/P(Y_{i−1}, Y_i)| is bounded. Therefore, R_n satisfies a (martingale) central limit theorem (e.g., the one in Pollard [6], p. 171). That is,

n^{−1/2} R_n = n^{−1/2} Σ_{i=1}^n P'(Y_{i−1}, Y_i)/P(Y_{i−1}, Y_i) ⇒ σ N(0, 1)

for some constant σ. But then Var(R_n) ~ σ²n. This suggests that (12) does better than (13) for large n. This conclusion is supported by the empirical example in [3].
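As a check on the form (13), the sketch below simulates the reflected walk with p(θ) = θ (so p' = 1) and f(y) = y (parameter values invented for illustration), and compares the Monte Carlo average of (13) with a numerical derivative of E[L] obtained from the exact distribution recursion:

```python
import random

theta0, n = 0.3, 8

def push(mu, theta):
    """One step of the distribution recursion for the reflected walk."""
    out = [0.0] * (len(mu) + 1)
    out[1] += mu[0]                        # from 0, move to 1 w.p. 1
    for x in range(1, len(mu)):
        out[x + 1] += mu[x] * theta        # up with probability p(theta)
        out[x - 1] += mu[x] * (1 - theta)  # down with probability 1 - p(theta)
    return out

def exact_EL(theta):
    mu, total = [1.0], 0.0                 # Y_0 = 0
    for _ in range(n + 1):
        total += sum(x * m for x, m in enumerate(mu))   # f(y) = y
        mu = push(mu, theta)
    return total

def lr_estimate(rng):
    x, R, est = 0, 0.0, 0.0
    for _ in range(n):
        if x == 0:
            x = 1                          # P(0, 1) = 1: no score contribution
        elif rng.random() < theta0:
            x += 1; R += 1.0 / theta0      # up jump: p'/p with p' = 1
        else:
            x -= 1; R -= 1.0 / (1 - theta0)
        est += x * R                       # f(Y_i) * R_i
    return est

rng = random.Random(9)
m = 300_000
mc = sum(lr_estimate(rng) for _ in range(m)) / m
h = 1e-6
exact = (exact_EL(theta0 + h) - exact_EL(theta0 - h)) / (2 * h)
print(mc, exact)
```

Repeating this with growing n shows the variance growth of R_n that the martingale argument above predicts.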
References

[1] B. Fox and P.W. Glynn, "Discrete-time conversion for simulating semi-Markov processes", Oper. Res. Lett. 5, 191-196 (1986).
[2] B. Fox and P.W. Glynn, "Discrete-time conversion for simulating finite-horizon Markov processes", to appear in SIAM J. Appl. Math. (1990).
[3] P. Glasserman, "Derivative estimates from simulation of continuous-time Markov chains", submitted to Oper. Res., 1989.
[4] P.W. Glynn, "Likelihood ratio gradient estimation: An overview", Proc. Winter Simulation Conf., A. Thesen, H. Grant and W.D. Kelton (eds.), 366-374, 1987.
[5] A. Hordijk, D.L. Iglehart and R. Schassberger, "Discrete time methods for simulating continuous time Markov chains", Adv. in Appl. Probab. 8, 772-788 (1976).
[6] D. Pollard, Convergence of Stochastic Processes, Springer, New York, 1984.
[7] M.I. Reiman and A. Weiss, "Sensitivity analysis for simulations via likelihood ratios", Oper. Res. 37, 830-844 (1989).