D.P. Heyman, M.J. Sobel, Eds., Handbooks in OR & MS Vol. 2 © Elsevier Science Publishers B.V. (North-Holland) 1990
Chapter 3 Martingales and Random Walks
Howard M. Taylor Department of Mathematical Sciences, 515 Ewing Hall, The University of Delaware, Newark, DE 19716, U.S.A.
Introduction
In the simplest case, a sequence X = {X_n, n = 0, 1, ...} of random variables having finite means is a martingale if

E[X_{n+1} | X_0, ..., X_n] = X_n   for all n = 0, 1, ... .   (0)
One may think of X_n as representing the fortune after the nth play of a gambler, and the martingale property expressed in (0) captures one notion of the game being fair in that the player's fortune on the next play is, on the average, equal to his current fortune. Indeed, some early work in martingale theory was motivated in part by problems in gambling. For example, optional stopping theorems address the question as to whether it is possible to quit a fair game with a positive expected profit. The more general martingale systems theorems consider if an astute choice of betting strategy can turn a fair game into a favorable one, and the name 'martingale' derives from a French term for the particular strategy of doubling one's bets until a win is secured.

While martingale theory is still used to study games of chance, J.L. Doob's book on stochastic processes [1] showed convincingly that martingale theory has ramifications far beyond gambling. Today martingale theory has evolved from a specialized and advanced technique motivated in part by gambling problems into a fundamental tool of probability analysis that affects virtually all areas of probability theory and stochastic modeling. Part of the evolution in the theory has been to recognize the importance of weakening the equality in (0) into an inequality, and to allow the conditioning to take place in a more general manner. The supermartingale property for a sequence of random variables is a simple inequality analogous to (0) that corresponds to the game being unfair to the gambler, and the submartingale property reverses the inequality and corresponds to a game being favorable. The martingale property, or the weaker sub- and supermartingale properties, occur in numerous contexts other than gambling and have far reaching
consequences despite their deceptively mild appearances. Martingale theory has become a basic tool in both theoretical and applied probability as well as in other areas of mathematics such as functional analysis and partial differential equations. It is used for calculating absorption probabilities, deriving inequalities for stochastic processes, analyzing the path structure of continuous time Markov processes, analyzing sequential decision and control models, constructing and describing point processes that arise in queueing and other applications, and it provides the basis for the modern theory of stochastic integration.

In this chapter, we will begin by giving the modern and most general discrete time definitions of martingale, supermartingale and submartingale, and follow this with a number of examples. We next outline three of the main areas of martingale results: martingale transforms, including transformations under systems of optional stopping, martingale inequalities, and the martingale convergence theorems. In Section 3, some sample applications of these results in probability, statistics and management science are described. The optional stopping theorems find direct application in management science to decision problems of an optimal stopping format, and an example of a simple asset selling model is included.

For continuous time martingales and stochastic integration, many important topics and applications are very specialized and highly technical, and can be properly stated only after many definitions are made and a whole new vocabulary is introduced. We provide an introduction to this vocabulary beginning in Section 4. A heuristic principle of stochastic modeling asserts that every process can be viewed as a Markov process provided that enough history is included in the state description, and the modern theory of stochastic integration, in which martingale theory is basic, provides a framework for carrying out this program.
We finish our introduction to martingale theory by very briefly describing the terminology and framework of some of this work as it relates to the martingale approach to point processes, a topic that has recently become important in operations research through applications in statistical inference, queueing theory, and reliability and maintainability modeling.
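The doubling strategy from which the name 'martingale' derives can be put to a quick numerical test. The following sketch (plain Python, with illustrative parameter choices of our own, not part of the text) plays a fair coin game, doubling the stake after every loss and stopping at the first win or after a fixed number of rounds; the sample average of the final fortune stays near zero, as the systems theorems predict.

```python
import random

def doubling_until_win(max_rounds=10, seed=None):
    """Fair coin game: double the stake after each loss, stop at the first
    win or after max_rounds.  Net fortune is +1 on a win; otherwise the
    accumulated losses remain.  The expected net fortune is exactly 0."""
    rng = random.Random(seed)
    fortune, stake = 0, 1
    for _ in range(max_rounds):
        if rng.random() < 0.5:        # win: recoup all losses plus 1
            return fortune + stake
        fortune -= stake              # loss: double the stake and retry
        stake *= 2
    return fortune                    # ran out of rounds while behind

rng_mean = sum(doubling_until_win(seed=k) for k in range(200_000)) / 200_000
print(round(rng_mean, 2))   # near 0: doubling does not make the game favorable
```

The occasional large loss (here 2^10 - 1 units) exactly offsets the many small wins, which is the systems theorem in miniature.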
1. Martingales

Let X = {X_n; n = 0, 1, ...} be a sequence of random variables defined on a probability space (Ω, A, P). Let F = {F_n; n = 0, 1, ...} be an increasing sequence of sub-σ-algebras of A and suppose that X is adapted to the sequence F; that is, X_n is measurable with respect to F_n for each n. The sub-σ-algebra F_n represents the observable 'history' up to 'time' n, and events in F_n are said to be prior to n. The sequence X is called a supermartingale with respect to the sequence F if, for all n,
(i)  E[X_n^-] > -∞ ,  where x^- = min{x, 0} ,   (1)
and
(ii)
E[X_{n+1} | F_n] ≤ X_n .   (2)
Since the supermartingale property expressed by (ii) is in terms of conditional expectations, the inequality is meant almost surely. Property (ii) can be expressed in the equivalent integrated form

(ii')  ∫_B X_n dP ≥ ∫_B X_{n+1} dP   for B in F_n .   (3)
In the sense of (ii'), a supermartingale is expectation decreasing. Let X be a sequence adapted to F such that -X = {-X_n; n = 0, 1, ...} is a supermartingale. Then X is expectation increasing and is called a submartingale. If both X and -X are supermartingales, then X is called a martingale. For a martingale, E[|X_n|] < ∞ for all n.

Example 1. Sums of independent random variables. Let ξ_1, ξ_2, ..., be independent random variables having zero means, let F_n be the σ-algebra generated by ξ_1, ..., ξ_n, and set X_0 = 0 and X_n = ξ_1 + ··· + ξ_n for n ≥ 1. Then X is a martingale. The integrability of each X_n is an elementary consequence of the triangle inequality, and the independence of the summands implies that E[ξ_{n+1} | F_n] = 0, whence

E[X_{n+1} | F_n] = E[X_n | F_n] + E[ξ_{n+1} | F_n] = X_n + 0 = X_n .

Example 2. Martingales in Markov processes. Let ξ_0, ξ_1, ..., be a Markov process with transition kernel

P(x, A) = P{ξ_{n+1} ∈ A | ξ_n = x} .
A nonnegative function f for which

f(x) ≥ ∫ f(y) P(x, dy)   for all x   (4)

is called excessive or superharmonic. If f is excessive, then X_n = f(ξ_n) defines a nonnegative supermartingale. The integrability condition (i) is satisfied because f is nonnegative, and the supermartingale inequality condition (ii) is an elementary consequence of the Markov property and (4). More generally, suppose that f is an eigenvector and λ an eigenvalue of the transition kernel in the sense that

λf(x) = ∫ f(y) P(x, dy)   for all x .   (5)

Then

X_n = λ^{-n} f(ξ_n)   (6)

is a martingale, provided, of course, that E[|X_n|] < ∞ for all n.
To see the versatility and power of this technique, let ξ be the sequence of partial sums of some independent and identically distributed random variables Y_1, Y_2, ... (ξ_0 = Y_0 = 0), having a moment generating function φ(λ) = E[exp{λY_k}] that is finite for some λ ≠ 0. Then X_0 = 1, and

X_n = φ(λ)^{-n} exp{λ(Y_1 + ··· + Y_n)}   (7)

determines a martingale because the function f(y) = exp{λy} is an eigenfunction for the Markov process of partial sums ξ with associated eigenvalue φ(λ). For example, suppose Y_1, Y_2, ..., are independent and normally distributed with mean zero and variance one. Then

φ(λ) = E[exp{λY_1}] = exp{½λ²}   (8)

and therefore,

X_n = exp{λ(Y_1 + ··· + Y_n) - ½nλ²}   (9)

is a martingale.
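As a numerical sanity check of (7)-(9), the sketch below (with illustrative parameter choices λ = 0.3 and n = 10 of our own) estimates E[X_n] for the exponential martingale built from standard normal summands; the estimate should sit near E[X_0] = 1 for every n.

```python
import math, random

def exp_martingale_mean(lam=0.3, n=10, trials=100_000, seed=1):
    """Estimate E[X_n] for X_n = phi(lam)^(-n) * exp(lam * S_n), with S_n a
    sum of n standard normals and phi(lam) = exp(lam^2 / 2) as in (8).
    The martingale property forces E[X_n] = E[X_0] = 1."""
    rng = random.Random(seed)
    log_phi = 0.5 * lam * lam
    total = 0.0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, 1.0) for _ in range(n))
        total += math.exp(lam * s - n * log_phi)
    return total / trials

m = exp_martingale_mean()
print(round(m, 3))   # near 1
```

Larger values of λ or n make the estimate much noisier (the martingale is lognormal with rapidly growing variance), which is why small illustrative values are used here.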
Example 3. Likelihood ratios. Let ξ_0, ξ_1, ..., be independent and identically distributed random variables, and let f_0 and f_1 be probability density functions. A stochastic process of fundamental importance in the theory of testing statistical hypotheses is the sequence of likelihood ratios

X_n = [f_1(ξ_0) ··· f_1(ξ_n)] / [f_0(ξ_0) ··· f_0(ξ_n)] ,   n = 0, 1, ... .   (10)
When the common distribution of the ξ's has f_0 as its probability density function, then X is a martingale. When f_0 is the normal density with mean zero and variance one, and f_1 is the normal density with mean λ and variance one, then

f_1(y)/f_0(y) = exp{λy - ½λ²}  and  X_n = exp{λ(ξ_0 + ··· + ξ_n) - ½(n+1)λ²} ,   (11)

a martingale that was encountered in the previous example.

The definition of a supermartingale can be extended to any stochastic process X = {X(t); t in T}, where T is a subset of the real line. Thus we can consider continuous time martingales, X = {X(t); t ≥ 0}, for example. The basic results about discrete parameter martingales carry over to continuous time, but with measure theoretic complications. The nomenclature used in discussing continuous time martingales is described in Section 4. A stochastic process Z = {Z_n; n = 0, 1, ...} defined by Z_n = X_{-n}, where X = {..., X_{-1}, X_0} is a martingale, is termed a backward martingale.
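The behavior of the likelihood ratio (10) under f_0 is easy to see in simulation. The sketch below (hypothetical choices of our own: f_0 standard normal, f_1 normal with mean delta = 1) accumulates the logarithm of the factors in (11) along one sample path drawn from f_0; although the ratio is a mean-one martingale, an individual path collapses toward zero.

```python
import math, random

def likelihood_ratio_path(delta=1.0, n=200, seed=3):
    """One path of X_n = prod_k f1(xi_k)/f0(xi_k) with data drawn from f0,
    where f0 = N(0,1) and f1 = N(delta,1); each factor is
    exp(delta*xi_k - delta^2/2), as in (11)."""
    rng = random.Random(seed)
    log_x = 0.0
    for _ in range(n):
        log_x += delta * rng.gauss(0.0, 1.0) - 0.5 * delta * delta
    return math.exp(log_x)

x_final = likelihood_ratio_path()
print(x_final)   # essentially 0: the ratio collapses under f0
```

Working on the log scale avoids floating-point underflow until the final exponentiation, a standard device when tracking likelihood ratios over long samples.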
2. The major results of martingale theory

Three major areas of results in martingale theory are the martingale transform theorems, the martingale inequalities, and the martingale convergence theorems. The transform theorems give conditions under which certain transforms of martingales remain martingales. In the early literature, these were sometimes called martingale systems theorems because the transforms were often modeled after gambling or betting systems. The most important were optional stopping theorems, where the gambler was trying to leave the game at an advantageous time, and the optional sampling theorems.

In more basic applications, these systems theorems lead to the martingale inequalities, which in some sense might be viewed as extensions to stochastic sequences of Markov's inequality for single nonnegative random variables: P(Z > c) ≤ E[Z]/c for c > 0. Applying Markov's inequality to the nonnegative random variable Z = (X - E[X])² leads to Chebyshev's inequality, from which the weak law of large numbers follows easily, so one can begin to imagine how powerful the martingale inequalities might be. The martingale convergence theorems follow from the inequalities and assert that under broad and general conditions, martingales have limits as n → ∞ (or as n → -∞, where applicable).

Let X = {X_n; n = 0, 1, ...} be a supermartingale with respect to a sequence F = {F_n; n = 0, 1, ...} of sub-σ-algebras of A. A random variable T taking
values in {0, 1, ..., ∞} is called a Markov time or stopping time provided that the event {T ≤ n} is in F_n for n = 0, 1, ... . In words, stopping prior to n is an event prior to n. An optional stopping theorem relates the mean of X_T to that of X_0. Some examples will be stated formally as theorems.

Theorem 1. If X is a nonnegative supermartingale, then

E[X_0] ≥ E[X_T; T < ∞]

for all Markov times T. Here E[X_T; T < ∞] = ∫_{T<∞} X_T dP.

Theorem 2. If X is a uniformly integrable martingale and T is an almost surely finite Markov time, then E[X_T] = E[X_0].

Let T_1 ≤ T_2 ≤ ··· < ∞ be Markov times. The optional sampling theorem asserts that, under quite general conditions,

X_{T_1}, X_{T_2}, ... ,
is also a martingale. For example, this holds whenever the Markov times are uniformly bounded.

We turn to the more general martingale transform theorems. Let X be a martingale. Then X can be written in the form

X_0 = ξ_0  and  X_n = ξ_0 + ··· + ξ_n  for n ≥ 1 ,   (12)

where ξ_0, ξ_1, ..., called the martingale differences, are not necessarily independent, but satisfy E[ξ_{n+1} | F_n] = 0 for n ≥ 0. Conversely, the partial sums of an integrable sequence ξ_0, ξ_1, ..., for which E[ξ_{n+1} | F_n] = 0 for n ≥ 0, form a martingale. Given such a sequence ξ, now define the sequence

Y_0 = ξ_0  and  Y_{n+1} = Σ_{i=0}^{n} ξ_{i+1} f_i(ξ_0, ..., ξ_i) ,   (13)

where, for each i, f_i is a prescribed bounded function of i + 1 variables. The sequence Y is called a martingale transform of X, and it is an important result that Y is also a martingale. Martingale transforms can be used to model certain betting strategies or gambling systems. Let us suppose that ξ_i represents the amount won or lost per dollar bet in a fair game. The function f_i specifies the
amount bet on the (i + 1)st play as a function of the outcomes observed to date. Then the martingale transform Y = {Y_n} will be the player's fortune over time. The assertion that Y is a martingale is a simple example of what used to be called a martingale systems theorem: the gambling system represented by the functions f_i cannot change the fair game into a favorable one. But martingale transforms have far broader and more general applications than this example suggests. When the transform (13) is written in the form

Y_{n+1} = Σ_{i=0}^{n} g_i(X_0, ..., X_i)[X_{i+1} - X_i] ,   (14)

where g_i(X_0, ..., X_i) = f_i(ξ_0, ..., ξ_i), the resulting transform is a discrete analog to the stochastic integral

Y(t) = ∫_0^t g_s(X(u); u < s) dX(s)   (15)

and, indeed, this analogy can be made precise and begins to suggest the far reaching implications of martingale theory in stochastic integration.

By appropriately choosing the Markov time T in an optional stopping theorem, or the sequence of Markov times {T_n; n = 1, 2, ...} in the optional sampling theorem, a variety of powerful inequalities can be derived. The simple, but important, maximal inequality for submartingales asserts that

λ P( max_{0≤k≤n} X_k > λ ) ≤ E[X_n^+]

for any λ > 0 and submartingale X. It is derived by applying the optional stopping theorem for submartingales with the Markov time

T = min{k ≥ 0; X_k > λ}  if X_k > λ for some k = 0, ..., n ,
T = n  if X_k ≤ λ for k = 0, ..., n .
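The systems theorem behind the martingale transform (13)-(14) can be illustrated numerically. In the sketch below (an illustrative strategy of our own devising, not from the text) the stake depends on the history, betting more aggressively after losses, yet the average transformed fortune remains near zero.

```python
import random

def transform_mean(n=30, trials=100_000, seed=5):
    """Martingale transform (14) with a history-dependent stake: bet 1 unit
    while the fortune X is negative, 0.5 otherwise.  Since the stake is a
    bounded function of the past, Y stays a martingale and E[Y_n] = 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = 0, 0.0
        for _ in range(n):
            g = 1.0 if x < 0 else 0.5      # predictable stake g_i(X_0,...,X_i)
            step = 1 if rng.random() < 0.5 else -1
            y += g * step                  # Y gains g_i * (X_{i+1} - X_i)
            x += step
        total += y
    return total / trials

y_mean = transform_mean()
print(round(y_mean, 3))   # near 0: the system does not beat a fair game
```

The essential point is that the stake is computed before the next outcome is revealed; this is exactly the predictability requirement that reappears in the continuous time theory.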
Let ξ_1, ξ_2, ..., be independent and identically distributed random variables with mean zero and variance σ², and define S_0 = 0 and S_n = ξ_1 + ··· + ξ_n for n ≥ 1. Noting that the variance of S_n is nσ², Chebyshev's inequality gives

c² P(|S_n| > c) ≤ nσ²

for any c > 0. The maximal inequality applied to the submartingale S_n² yields a stronger bound, known as Kolmogorov's inequality, which plays a central role in the classical proof of the strong law of large numbers. Kolmogorov's inequality is

c² P( max_{0≤k≤n} |S_k| > c ) ≤ nσ²

for any c > 0.
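A small simulation makes Kolmogorov's inequality concrete. The sketch below (illustrative parameters n = 50 and c = 10 of our own, with ±1 steps so that σ² = 1) compares the empirical probability that the running maximum of |S_k| exceeds c against the bound nσ²/c².

```python
import random

def kolmogorov_check(n=50, c=10.0, trials=100_000, seed=7):
    """Empirical P(max_{k<=n} |S_k| > c) versus the Kolmogorov bound
    n*sigma^2/c^2, for +/-1 steps (sigma^2 = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s, peak = 0, 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            peak = max(peak, abs(s))
        if peak > c:
            hits += 1
    return hits / trials, n / (c * c)

prob, bound = kolmogorov_check()
print(prob, "<=", bound)   # the empirical probability sits below the bound
```

The bound controls the entire path maximum at the price of only the terminal variance, which is precisely what the classical proof of the strong law exploits.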
A second example of a martingale inequality is the fundamental upcrossings inequality, which places limits on the oscillations possible for a submartingale. Given a submartingale X, real numbers a < b, and a positive integer N, define V_{a,b} to be the number of pairs (i, j) with 0 ≤ i < j ≤ N for which X_i ≤ a < X_k < b ≤ X_j for i < k < j. Then V_{a,b} counts the number of times that X upcrosses the interval (a, b) during n = 0, 1, ..., N. The upcrossings inequality asserts that

E[V_{a,b}] ≤ ( E[(X_N - a)^+] - E[(X_0 - a)^+] ) / (b - a) .

The upcrossings inequality leads to the martingale convergence theorems.

Theorem 3. If X is a submartingale for which sup_n E[X_n^+] < ∞, then

lim_{n→∞} X_n = X_∞  exists almost surely.

Theorem 4. A uniformly integrable martingale converges almost surely and in L¹.
Theorem 5. A backward martingale converges almost surely and in L¹.

To mention one of the interesting applications of the backward martingale convergence theorem, let ξ_1, ξ_2, ..., be independent and identically distributed random variables having a finite mean. Let S_0 = 0, and introduce the partial sums S_n = ξ_1 + ··· + ξ_n. The strong law of large numbers can be rather easily derived from the observation that Z_n = S_n/n is a backward martingale, together with the backward martingale convergence theorem.

As indicated in equation (12), a discrete parameter martingale X can be written as the partial sums of the martingale differences ξ. The differences ξ_0, ξ_1, ..., are not necessarily independent, but satisfy E[ξ_{n+1} | F_n] = 0 for n ≥ 0. Since ξ_k for k ≤ n is measurable with respect to F_n, it easily follows that E[ξ_{n+1}ξ_k] = 0, so that the summands are orthogonal. Now, of course, orthogonality does not imply independence, but it is close, so that one might expect that martingales might obey central limit theorems, and this is indeed the case.

Theorem 6. For n = 1, 2, ..., let {ξ_nk; 1 ≤ k ≤ k(n)} be martingale differences with respect to the σ-algebras {F_nk}. If

Σ_{k=1}^{k(n)} E[ξ_nk² | F_{n,k-1}] = 1  a.s.  for each n ,

and

Σ_{k=1}^{k(n)} E[ξ_nk² 1{|ξ_nk| > c}] → 0  for each c > 0 ,

then

Σ_{k=1}^{k(n)} ξ_nk  converges in distribution to N(0, 1) ,

where N(0, 1) denotes a standard normal random variable.
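The simplest instance of Theorem 6 is the array ξ_nk = ε_k/√n with independent ±1 variables ε_k, for which the conditional variances sum to exactly one and the truncation condition holds trivially. The sketch below (illustrative sizes of our own) checks that the row sums look standard normal; it is only a degenerate, independent-increments instance of the theorem, but it shows what the normalization accomplishes.

```python
import random, statistics

def mg_clt_sample(n=200, trials=20_000, seed=11):
    """Row n of the array: xi_nk = eps_k / sqrt(n), eps_k = +/-1 fair flips.
    Conditional variances sum to exactly 1 and |xi_nk| <= 1/sqrt(n), so the
    truncation condition of Theorem 6 holds; row sums should be ~ N(0, 1)."""
    rng = random.Random(seed)
    scale = n ** -0.5
    sums = []
    for _ in range(trials):
        s = 0.0
        for _ in range(n):
            s += scale if rng.random() < 0.5 else -scale
        sums.append(s)
    return statistics.mean(sums), statistics.stdev(sums)

m, sd = mg_clt_sample()
print(round(m, 3), round(sd, 3))   # near 0 and 1
```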
3. Sample applications of martingale theory

Martingale theory so pervades all of probability and stochastic analysis that it is impossible in this brief chapter to give a representative sampling of all of its applications. The examples that we do give should be regarded as only a suggestion of the manifold possibilities.
Example 4. Referring to Example 1, let ξ_1, ξ_2, ..., be independent and identically distributed random variables taking the values +1 and -1 with probabilities p and q = 1 - p, respectively. When p = q = ½, the partial sums X_n = ξ_1 + ··· + ξ_n form a martingale (X_0 = 0). The random time

T = min{n ≥ 0; X_n = -a or X_n = b} ,

where a and b are positive integers, is a Markov time to which the optional stopping theorem applies. Let v_a be the probability that X reaches -a before it reaches b. When a and b are the initial fortunes of two competing players in a game of chance having unit bets, then determining v_a is called the gambler's ruin problem. Here X_n is the fortune of one of the players, and v_a is the probability that this player goes broke before the other player does. Invoking Theorem 2 (the stopped martingale is bounded, hence uniformly integrable), we deduce

0 = E[X_0] = E[X_T] = v_a(-a) + (1 - v_a)b ,

which solves to give v_a = b/(a + b). Z_n = X_n² - n is also a martingale, and

E[Z_T] = 0 = v_a a² + (1 - v_a)b² - E[T]

reduces to give E[T] = ab.
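Both conclusions of the gambler's ruin analysis, v_a = b/(a + b) and E[T] = ab, are easy to confirm by simulation; the sketch below uses the illustrative values a = 3 and b = 5 (our own choice).

```python
import random

def ruin_stats(a=3, b=5, trials=100_000, seed=13):
    """Simple (p = 1/2) random walk from 0, absorbed at -a or b.
    Optional stopping predicts v_a = b/(a+b) and E[T] = a*b."""
    rng = random.Random(seed)
    ruined, total_t = 0, 0
    for _ in range(trials):
        x, t = 0, 0
        while -a < x < b:
            x += 1 if rng.random() < 0.5 else -1
            t += 1
        if x == -a:
            ruined += 1
        total_t += t
    return ruined / trials, total_t / trials

v, et = ruin_stats()
print(round(v, 3), round(et, 2))   # near b/(a+b) = 0.625 and a*b = 15
```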
When p ≠ q, the gambler's ruin problem is solved by using the martingale (q/p)^{X_n}.

Example 5. Consistency of the likelihood ratio test. Let ξ_0, ξ_1, ..., be independent and identically distributed random variables having the probability density function f_0. The likelihood ratio given in equation (10) is a nonnegative martingale and thus, by the martingale convergence theorem (Theorem 3), it converges as n → ∞ almost surely to a finite limit. If f_0 = f_1 a.s., then this limit is necessarily one. Otherwise, the limit must be zero, all other states being transient. That is, unless the two distributions are identical, the likelihood ratio X_n converges to zero with probability one as n → ∞ under the density f_0. A statistician can use this result as follows. The statistician chooses a positive constant a and, if a given sample of size n yields X_n ≥ a, he or she acts as if f_1 were the correct density. Otherwise, he or she will act as if f_0 is correct. The limit that we have just demonstrated indicates that if n is large and f_0 is the correct density, then the likelihood ratio converges to zero and thus the statistician will incorrectly choose f_1 with small probability.

Example 6. Optimal stopping problems. Stopping rule problems occur in many areas of management science as special types of control problems in which the decision maker has only two available actions: 'stop' and 'continue'. Let X = {X_n; n = 0, 1, ...} be a Markov process with transition kernel P. We assume that we are given a discount factor β, with 0 < β ≤ 1, and that associated with each state x is a nonnegative reward r(x). The decision maker sequentially observes X_0, X_1, ... . At each stage the option is available either to stop or to continue observing. Stopping at stage n earns the decision maker the reward r(X_n). The present value (discounted) of the reward received is β^n r(X_n). Never stopping earns no reward. Of course the decision maker need not stop at a preselected fixed time n, but can stop at some random time based on the observed process.
Corresponding to a prescribed decision procedure, let T be the time of stopping, with T = ∞ identifying the event of never stopping. Clearly the problem is meaningful only when the event {T = n} is determined by what is observed prior to n; that is, when T is a Markov time. The decision maker wants to choose a Markov time T = T* that maximizes the expected present value of the reward received, E[β^T r(X_T)]. The following lemma describes a means for verifying that a given function is an upper bound on the expected rewards. Consistent with our assumption that never stopping carries no reward, we interpret E[r(X_T)] as an integration only over the event {T < ∞}.
Lemma. Let u be a function satisfying, for all x,

u(x) ≥ r(x)  and  u(x) ≥ β ∫ P(x, dy) u(y) .

Then for any Markov time T,

u(x) ≥ E[β^T r(X_T) | X_0 = x]   for all x .
To establish the lemma, one uses the technique of Example 2, especially equation (5), to show that β^n u(X_n) is a nonnegative supermartingale to which Theorem 1 applies. This gives the first inequality below. The second inequality in

u(x) ≥ E[β^T u(X_T) | X_0 = x] ≥ E[β^T r(X_T) | X_0 = x]

is because u(x) ≥ r(x) for all x by hypothesis.

The lemma provides an effective way for determining an optimal Markov time in many circumstances. Using heuristic arguments, one attempts to find a Markov time T* whose expected return

u(x) = E[β^{T*} r(X_{T*}) | X_0 = x]

satisfies the hypotheses of the lemma. Such a Markov time must then be optimal since no other time can achieve a higher return.

For example, consider the following simple asset selling model. Let Y_1, Y_2, ..., be independent and identically distributed positive random variables having a finite mean and a known distribution. These represent successive daily bids on an asset that one is trying to sell. The maximum bid to date is given by the Markov process X_n = max{Y_1, ..., Y_n}, and optimally choosing the bid to accept, assuming that one can recall earlier bids, corresponds to maximizing over Markov times T the expected discounted return

E[β^T X_T] ,

where β is a prescribed discount factor. Let a be the smallest value y for which y ≥ βE[max{y, Y_1}]. It is not difficult to argue heuristically that one would be indifferent between accepting or rejecting a bid whose value was Y = a, and that therefore an optimal policy would be to accept the first bid whose value was a or more. That is, the optimizing Markov time should be

T* = min{n > 0; Y_n ≥ a} .

One now sets out to prove this by defining u(x) = max{a, x}, and then showing that the function u satisfies the conditions of the lemma so that, in particular, for x = 0,
u(0) = a ≥ E[β^T X_T]   for all Markov times T .

To show that T* is optimal, all that remains is to show that

a = E[β^{T*} max{Y_1, ..., Y_{T*}}] ,

and this is a relatively easy computation based on the definitions of a and T*.
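For uniformly distributed bids the threshold a can be computed in closed form, since E[max{y, Y_1}] = (1 + y²)/2 when Y_1 is uniform on (0, 1), and the claim a = E[β^{T*} max{Y_1, ..., Y_{T*}}] can then be checked numerically. The sketch below uses the illustrative discount factor β = 0.9; the uniform-bid specialization is our own choice, not part of the text.

```python
import math, random

def threshold_value(beta=0.9, trials=200_000, seed=17):
    """Uniform(0,1) bids: the indifference point a solves
    y = beta * E[max(y, Y1)] = beta * (1 + y^2) / 2, giving
    a = (1 - sqrt(1 - beta^2)) / beta.  Accepting the first bid >= a
    should earn exactly a in expectation, matching the lemma."""
    rng = random.Random(seed)
    a = (1.0 - math.sqrt(1.0 - beta * beta)) / beta
    total = 0.0
    for _ in range(trials):
        t, best = 0, 0.0
        while True:
            t += 1
            y = rng.random()
            best = max(best, y)
            if y >= a:
                break
        total += beta ** t * best
    return a, total / trials

a, value = threshold_value()
print(round(a, 4), round(value, 4))   # the two numbers nearly agree
```

A short calculation confirms the agreement analytically: under the threshold policy T* is geometric with success probability 1 - a, the accepted bid is uniform on (a, 1), and E[β^{T*} Y_{T*}] = β(1 - a²)/(2(1 - aβ)), which equals a exactly when βa² - 2a + β = 0.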
4. Continuous time martingales

Continuous time martingales have become very important in applied probability and stochastic modeling. Here we introduce some of the relevant terminology while, at the same time, trying to avoid many of the details. Suppose that (Ω, A, P) is a complete probability space. A family of σ-algebras F = {F_t ⊂ A; 0 ≤ t ≤ ∞} is a filtration if
(i) F_s ⊆ F_t for s < t;
(ii) F is right continuous: ∩_{s>t} F_s = F_t;
(iii) F is complete: F_0 contains all P-null sets of A.
The collection (Ω, A, P, F) is called a stochastic basis. A stochastic process X = {X(t); t ≥ 0} is measurable if it is measurable as a function (t, ω) → X(t, ω) with respect to the product σ-algebra B ⊗ A, where B is the Borel σ-algebra on [0, ∞). The process X is progressively measurable if, for every t, the mapping X restricted to [0, t] × Ω is measurable with respect to the product σ-algebra B[0, t] ⊗ F_t, where B[0, t] is the Borel σ-algebra on [0, t]. The process X is adapted to the filtration F, or F-adapted, if X(t) is F_t-measurable for all t. Every adapted right-continuous process and every adapted left-continuous process is progressively measurable. A random variable T taking values in [0, ∞] is called a Markov time (with respect to F) if {T ≤ t} is in F_t for all 0 ≤ t < ∞. An adapted right-continuous process M with E[|M(t)|] < ∞ for all t is a martingale if
E[M(t) | F_s] = M(s)  a.s.  for all 0 ≤ s < t < ∞ .
Analogous definitions are made for supermartingales and submartingales. The optional sampling theorems and convergence theorems that were described for discrete time martingales carry over to continuous time with only technical modifications. As indicated earlier, the continuous time analog of the martingale transform is the stochastic integral. Here the going gets much tougher because the technicalities greatly increase in number and complexity. Stochastic integration began with the Itô integral, which defined integrals of the form
Y(t) = ∫_0^t f(s) dB(s)   (16)
where f is a continuous adapted random process and B is a Brownian motion. In differential form, (16) reads dY(t) = f(t) dB(t), so that integration with respect to differentials dY of processes Y of the form given in (16) may be reduced to integration with respect to differentials of a Brownian motion. Continuing in this manner, and using the stochastic calculus that he developed, Itô was able to extend his integral to differentials of smooth functions of diffusion processes having smooth coefficients by expressing these differentials in terms of those of a Brownian motion. Since the paths of a Brownian motion have unbounded variation a.s., the Riemann-Stieltjes theory cannot be applied, and the usual chain rule for differentiation must be modified into Itô's formula, which in the simple case of the integral in (16) is
dg(Y(t)) = g_y(Y(t)) dY(t) + ½ g_yy(Y(t)) f(t)² dt

for twice continuously differentiable functions g. It is the martingale property of the Brownian motion that is critical in the development of the Itô integral. The modern theory considers integrals of the form
Y(t) = ∫_0^t f(s) dX(s) ,
where both f and X are quite general random processes. The question is: what are the 'natural' and most general processes f and X for which a stochastic integral can be defined? Only recently has a completely satisfactory answer to this question been found. The natural integrands f are the predictable processes, and the most general differentials dX for which one has an integral are those arising from semimartingales X, as we shall now explain.

A process X is predictable if it is measurable with respect to the σ-algebra on (0, ∞) × Ω generated by the collection of adapted processes whose paths are left-continuous. A process X is optional if it is measurable with respect to the σ-algebra on (0, ∞) × Ω generated by the collection of adapted processes whose paths are right-continuous. When the stochastic basis is that of a Brownian motion, because of the continuous paths and the strong Markov property of Brownian motion, the predictable processes turn out to be the same as the optional processes. For this reason, it took many years to see that the assumption of predictability was the natural one in more general contexts. For a process having jumps, such as a counting process, the predictable processes typically are not the same as the optional ones, so that the distinction becomes important.

A process X has a property locally if there exists a localizing sequence of stopping times {T_k; k ≥ 1} such that (i) T_k → ∞ a.s. as k → ∞; and (ii) for each
k, the process X(· ∧ T_k) has the property. Thus, for example, we speak of local martingales, locally square integrable processes, etc. A semimartingale is a process X that can be written as a sum X = M + A of a local martingale M and a process of bounded variation A. The stochastic integral with respect to the differentials of a semimartingale is defined as a local martingale integral for M, and a path by path Lebesgue-Stieltjes integral for A. Under mild assumptions, semimartingales are the most general processes for which a stochastic integral can be defined.

A basic question then arises as to what processes are semimartingales. It is an elementary fact of real analysis that a function of bounded variation can be written as the difference of two increasing (meaning, nondecreasing) functions, and it is equally elementary that the difference of two martingales is again a martingale. Therefore, one starts by asking what processes can be written as the sum of a martingale and an increasing process. The fundamental tool for finding semimartingales in the theory of stochastic integration is the Doob-Meyer decomposition theorem and its later improvements. The Doob-Meyer result asserts that a submartingale satisfying an integrability condition can be uniquely written as the sum of a martingale and a predictable integrable increasing process. An improvement asserts that any submartingale can be decomposed into the sum of a local martingale and a predictable increasing process.

The very important Itô transformation formula can be generalized to stochastic integrals with respect to semimartingales, and is useful in a variety of applications. Unfortunately, the general formula is too long to describe in this brief survey. If M is a square integrable martingale, then M² is a submartingale, and the Doob-Meyer decomposition theorem implies the existence of a predictable increasing process ⟨M⟩, called the predictable variation of M, such that M² - ⟨M⟩ is a martingale.
For this reason, square integrable martingales, and locally square integrable martingales, play a major role in the development of the theory.
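The discrete analogy between martingale transforms and stochastic integrals suggests a direct numerical check of the simplest Itô integral. The sketch below (illustrative step counts of our own) approximates ∫_0^1 B dB by left-endpoint, i.e. predictable, Riemann sums and compares the result with the value ½(B(1)² - 1) given by Itô's formula.

```python
import random

def ito_integral_check(n=500, trials=5_000, seed=19):
    """Left-endpoint (predictable) Riemann sums for I = int_0^1 B dB.
    Ito's formula gives I = (B(1)^2 - 1)/2 exactly; the discrete sums
    should agree up to a small discretization error, and E[I] = 0."""
    rng = random.Random(seed)
    dt = 1.0 / n
    max_gap, total = 0.0, 0.0
    for _ in range(trials):
        b, integral = 0.0, 0.0
        for _ in range(n):
            db = rng.gauss(0.0, dt ** 0.5)
            integral += b * db        # integrand frozen at the left endpoint
            b += db
        max_gap = max(max_gap, abs(integral - 0.5 * (b * b - 1.0)))
        total += integral
    return max_gap, total / trials

gap, mean_i = ito_integral_check()
print(round(gap, 3), round(mean_i, 3))   # small gap, mean near 0
```

Evaluating the integrand at the right endpoint instead would shift the answer by the quadratic variation term, which is exactly the correction appearing in Itô's formula.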
4.1. Martingales and counting or point processes

A counting process is an adapted, nondecreasing, right-continuous, integer valued process N with N(0) = 0 a.s., and with all jumps of size one. The value N(t) counts the number of points occurring in the interval (0, t] in a random point process. A multivariate counting process (N_1, ..., N_r) is a finite collection of counting processes with the additional restriction that no two processes jump simultaneously.

A counting process N, being nondecreasing, is a submartingale and thus may be decomposed as the sum of a local martingale M and a nondecreasing predictable process A. The (unique) process A is called the compensator of N. Heuristically, the compensator expresses the cumulative conditional intensity of the point process associated with the
counting process in the sense that
P(dN(t) = 1 | F_{t-}) = dA(t) .
As an illustration of this structure, let us attempt to model a random lifetime X that arises from a hazard or failure rate, r(t), t ≥ 0, which is itself a random process. In a management science context, we might be describing the failure time of some equipment operating in a random environment. The counting process will be

N(t) = 0  for 0 ≤ t < X ,  and  N(t) = 1  for t ≥ X .
We construct X as follows. First, suppose that ξ is a unit exponentially distributed random variable, independent of the random hazard process r(t), t ≥ 0. We assume that ξ is measurable with respect to F_0, and that r(t) is measurable with respect to F_t. Define

Φ(t) = ∫_0^t r(u) du  for t ≥ 0 ,

and assume that Φ(t) → ∞ a.s. as t → ∞. The lifetime X is defined in terms of the random hazard rate process r and the unit exponential random variable ξ by

X = inf{u ≥ 0; Φ(u) ≥ ξ} ,

so that X > x if and only if Φ(x) < ξ. Then the compensator for N relative to F is given by
A(t) = Φ(t ∧ X) .

By the construction of X it follows that A is adapted, and since Φ is continuous, so is A, and thus A is predictable. For t < X, we have P(dN(t) = 1) = dA(t) = dΦ(t) = r(t) dt, so that the counting process is related to the random hazard rate as it should be. The differential dA(t) is always thought of as being in the forward direction, so that for t ≥ X we have dA(t) = 0, as it should be.

It is important to recognize that the compensator depends on the history, or filtration, that is used. In the above example, the σ-algebra F_t includes
knowledge of the unit exponential random variable ~ and the random hazard rate r(u), 0 <- u<~t up to time t. That is, it is assumed that the random environment was observable. What if this knowledge is not available? What if all that is known at time t is whether or not failure has occurred? The appropriate filtration here is termed the internal history, denoted F N = (Flu), where F1u is the o--algebra generated by the counting process N(u), 0 ~ u ~ t up to time t. If we let G(x) = P(X ~ x) = E [ e x p { - f o r(u) du} ] be the (unconditional) cumulative distribution function of the failure time X~ Then the compensator relative to the internal history is
A_1(t) = Φ_1(t ∧ X),   where   Φ_1(t) = ∫_0^t dF(u) / (1 − F(u)).

The martingale approach to point processes is used in queueing theory, the theory of statistical inference on point processes, and in reliability and maintainability applications.
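The construction above can be checked numerically. The sketch below (the function name sample_lifetime and the discretization scheme are our own, not part of the text) inverts a discretized cumulative hazard Φ to recover X = inf{u ≥ 0: Φ(u) ≥ ξ}. With a deterministic constant hazard r(t) = c we have Φ(t) = ct and the inversion must return X = ξ/c; in that case F(t) = 1 − e^{−ct}, and the internal-history cumulative hazard Φ_1(t) = −log(1 − F(t)) = ct agrees with Φ, as it should when there is no hidden environment.

```python
def sample_lifetime(rate_path, dt, xi):
    """Return X = inf{u >= 0 : Phi(u) >= xi} for a hazard path discretized
    as rate_path[i] ~ r(i*dt), where Phi(t) = integral_0^t r(u) du."""
    phi = 0.0
    for i, r in enumerate(rate_path):
        if phi + r * dt >= xi:
            return i * dt + (xi - phi) / r  # interpolate inside the step
        phi += r * dt
    return float("inf")  # Phi never reached xi on this horizon

# Deterministic check with constant hazard r(t) = c: Phi(t) = c*t, so the
# inversion gives X = xi / c.  Here Phi_1(t) = -log(1 - F(t)) = c*t as well,
# since the environment is not random.
c, dt = 2.0, 1e-4
xi = 0.7  # a fixed "unit-exponential draw", for reproducibility
x = sample_lifetime([c] * 100000, dt, xi)
assert abs(x - xi / c) < 1e-6
```

For a genuinely random environment one would draw ξ from a unit exponential and feed in a simulated path of r; the same inversion applies path by path.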
4.2. Martingales and diffusion processes

A diffusion process in R^d is usually specified through its diffusion coefficients a = a(t, x) and b = b(t, x), where a is a positive semidefinite symmetric matrix and b is a d-vector for each t and x. Given the coefficients a and b, we define the operator L_t, operating on smooth functions f of compact support, by

(L_t f)(x) = ½ Σ_{i,j} a_ij(t, x) ∂²f/∂x_i ∂x_j + Σ_j b_j(t, x) ∂f/∂x_j.
There are various ways of describing exactly what we mean by a diffusion process corresponding to the specified set of coefficients. Let Ω be the space of all R^d-valued continuous functions and let X be the evaluation mapping. The martingale approach of Stroock and Varadhan defines a solution to the martingale problem corresponding to the given coefficients and starting at x as a measure P on the stochastic basis for which P(X(0) = x) = 1, and for which, for each smooth f of compact support,

f(X(t)) − ∫_0^t (L_s f)(X(s)) ds

is a martingale.
When a is bounded, continuous and positive definite, and b is bounded and measurable, a solution to the martingale problem exists, is unique, and defines a strong Markov process. In the one-dimensional case the same result holds assuming only that a and b are bounded and measurable and that a is uniformly positive on compact sets.
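The defining property can be checked by simulation in the simplest case. In the Monte Carlo sketch below (our own illustration, not from the text) we take a(t, x) = 1 and b(t, x) = 0 in one dimension, so that X is standard Brownian motion and (L_s f)(x) = ½f″(x); with f(x) = x², the candidate martingale f(X(t)) − ∫_0^t (L_s f)(X(s)) ds = X(t)² − t should have constant expectation 0.

```python
import random

random.seed(1)
n_paths, n_steps, T = 20000, 50, 1.0
dt = T / n_steps

# Simulate X(T) by summing exact Brownian increments and average the
# candidate martingale f(X(T)) - integral_0^T (L_s f)(X(s)) ds = X(T)^2 - T.
total = 0.0
for _ in range(n_paths):
    x = 0.0
    for _ in range(n_steps):
        x += random.gauss(0.0, dt ** 0.5)
    total += x * x - T

estimate = total / n_paths  # should be near E[X(0)^2 - 0] = 0
assert abs(estimate) < 0.1  # standard error is about sqrt(2/n_paths) ~ 0.01
```

For general coefficients a and b one would replace the exact increments with an Euler scheme; the martingale property is then only approximate in the step size.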
5. Random walks

The partial sums S = {S_n; n ≥ 0} of independent and identically distributed random variables ξ = {ξ_n; n ≥ 1} are commonly referred to as a random walk. When the ξ's take the values ±1 with respective probabilities p and q = 1 − p, the process is called a Bernoulli random walk. A simple random walk refers to p = q = ½, and often a simple random walk, or its analog on the d-dimensional lattice, is called, simply, a random walk.

Obviously random walks arise wherever sums of independent identically distributed random variables arise. For example, the random walk hypothesis in the stock market asserts that successive daily changes in the price of a stock are independent and identically distributed random variables, so that the sequence of prices is a random walk. Under the random walk hypothesis, the past sequence of price changes provides no information about future changes. In the popular press, the random walk hypothesis is often confused with another property of an efficient fair market, that one can do as well in choosing a portfolio at random as based on public information. This is a distinct property, of course. Modern expressions of the random walk hypothesis are in terms of martingales related to successive ratios of prices.

The study of random walks divides itself into the study of long run properties, such as recurrence, and short term properties. Fluctuation theory is the study of the latter, of random variables of the form f(S_0, S_1, ..., S_n) defined on the sequence of partial sums. A number of key relationships in fluctuation theory derive from certain symmetry and combinatoric considerations. An example follows.

We assume that the summands are continuous random variables so that the possibility of ties among the partial sums can be ignored. Define

m_n = min{0, S_1, ..., S_n},
M_n = max{0, S_1, ..., S_n},

and recursively define

W_0 = 0   and   W_{n+1} = max{W_n + ξ_{n+1}, 0}.
The fundamental result to be obtained is that W_n and M_n have the same distribution. To see this, first observe that W_n and S_n − m_n have the same distribution (indeed, W_n = S_n − m_n for every path, by induction on n). Then
S_n − m_n = S_n − min{0, S_1, ..., S_n}
          = max_{0≤k≤n} {S_n − S_k}
          = max{0, ξ_n, ξ_n + ξ_{n−1}, ..., ξ_n + ··· + ξ_1},

which has the same distribution as max{0, S_1, S_2, ..., S_n} = M_n, since the independent and identically distributed summands may be taken in reverse order, as claimed.

Example 7. Queueing models. Suppose that the times between successive arrivals of customers to a service facility are independent and identically distributed random variables A_1, A_2, .... The service time of the nth customer is B_n, forming an independent and identically distributed sequence B_1, B_2, ..., independent of the arrival process. The waiting time W_n of the nth customer is the time from his arrival to the instant when his service begins; the total time spent by the customer at the server is W_n + B_n. Assume that customer number 0 arrives at time 0 to an empty server so that W_0 = 0. Suppose now that the nth customer arrives at time t and we know his waiting time W_n. His service begins at time t + W_n and terminates at time t + W_n + B_n. The next customer arrives at time t + A_{n+1} and has waiting time

W_{n+1} = max{0, W_n + B_n − A_{n+1}}.

Then the waiting time W_n has the same distribution as M_n, where the summands are ξ_{n+1} = B_n − A_{n+1}.

Example 8. Storage systems. Consider a storage system such as a reservoir or inventory system where the inputs in successive periods are independent and identically distributed random variables. The successive outputs are also independent and identically distributed, except that no output is possible when the system is empty. Denote by ξ_n the supply minus the demand during period n, and let W_n be the storage system level at the beginning of period n. Clearly W_{n+1} = max{W_n + ξ_n, 0}.

Thus, the distribution of M_n is important in many applications. A major tool in its study is Spitzer's identity: for |t| < 1 and u real,
Σ_{n=0}^∞ E[e^{iuM_n}] t^n = exp{ Σ_{k=1}^∞ (t^k/k) E[e^{iuS_k}; S_k ≥ 0] } · exp{ Σ_{k=1}^∞ (t^k/k) P(S_k < 0) }.

Related generating function and Fourier transform identities that have found use in management science applications are the Wiener-Hopf factorization and the Khintchine-Pollaczek formula.
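The pathwise relation underlying the result above and Examples 7 and 8, W_n = S_n − min{0, S_1, ..., S_n} = max of the reversed partial sums, is easy to confirm numerically. In this sketch (variable names are ours) the three quantities are computed independently from one simulated sequence of summands and must agree exactly; note that W_n and M_n themselves agree only in distribution, not path by path.

```python
import random

random.seed(7)
n = 500
xi = [random.uniform(-1.0, 1.0) for _ in range(n)]  # xi[k-1] plays the role of xi_k

# Partial sums S_0 = 0, S_k = xi_1 + ... + xi_k.
S = [0.0]
for x in xi:
    S.append(S[-1] + x)

# Lindley recursion W_0 = 0, W_{k+1} = max{W_k + xi_{k+1}, 0}.
W = [0.0]
for x in xi:
    W.append(max(W[-1] + x, 0.0))

# Pathwise: W_n = S_n - m_n = max{0, xi_n, xi_n + xi_{n-1}, ..., xi_n + ... + xi_1}.
m_n = min(S)
reversed_max = max(0.0, max(sum(xi[k:]) for k in range(n)))
assert abs(W[-1] - (S[-1] - m_n)) < 1e-9
assert abs(W[-1] - reversed_max) < 1e-9
```

With xi_k = B_{k−1} − A_k this is exactly the waiting-time recursion of Example 7, and with supply-minus-demand summands it is the storage recursion of Example 8.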
Simple random walk. Let ξ_1, ξ_2, ..., be independent and identically distributed random variables taking the values ±1. Let X_0 = 0, and for n ≥ 1, let X_n = ξ_1 + ··· + ξ_n. Because the step size is one, the paths of this Bernoulli random walk are continuous in the sense that the random walk cannot move
from integer point a to integer point b without visiting every intervening integer. Simple random walk, for which the moves up and down are equally likely, has an additional symmetry, and for this reason many functionals of the process can be computed exactly. We will give one example, using the reflection principle. Let M_n = max{X_0, ..., X_n} be the maximum of the process, and let us consider the problem of determining P(M_n ≥ a) for a positive integer a. Consider first paths for which X_n > a. By the continuity of the paths, such a path must assume the value a at some time prior to n. Starting at the first such time, reflect the remainder of the path about the level a. That is, where the original path moves up, the reflected path moves down, and vice versa. The result is a path, call it X^r, for which M^r_n ≥ a and X^r_n < a. By the symmetry, each of the paths X and X^r has the same probability, so that

P(M_n ≥ a, X_n < a) = P(M_n ≥ a, X_n > a) = P(X_n > a).
There are also paths for which M_n ≥ a and X_n = a, which reflection leaves unchanged. Thus we obtain

P(M_n ≥ a) = P(M_n ≥ a, X_n < a) + P(M_n ≥ a, X_n > a) + P(M_n ≥ a, X_n = a)
           = 2P(X_n > a) + P(X_n = a).
The last two probabilities are rather simple functions of the symmetric binomial distribution.
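Since all 2^n paths of the simple random walk are equally likely, the reflection identity can be confirmed by brute force for small n. A short enumeration (the illustrative values n = 9, a = 3 are our choice) counts paths on each side of the identity:

```python
from itertools import product

# Count paths to check P(M_n >= a) = 2 P(X_n > a) + P(X_n = a) exactly,
# as counts over the 2^n equally likely +/-1 step sequences.
n, a = 9, 3
lhs = gt = eq = 0
for steps in product((-1, 1), repeat=n):
    x = m = 0
    for s in steps:
        x += s
        m = max(m, x)
    lhs += (m >= a)   # path with running maximum at least a
    eq += (x == a)
    gt += (x > a)

assert lhs == 2 * gt + eq
```

Here gt and eq are the binomial counts C(9,7) + C(9,8) + C(9,9) = 46 and C(9,6) = 84, in line with the remark that the right side reduces to the symmetric binomial distribution.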
Bibliography

Doob's book [1] introduced martingales to the probability community at large and stands as a basic reference for discrete time. Karlin and Taylor [2] has a long non-measure-theoretic introduction to martingales but, unfortunately, neglects any mention of martingale central limit theorems. A clear and modern expository treatment of these is found in the monograph by Gänssler and Haeusler [3]. A thorough treatment of discrete time martingales is given by Neveu [4]. A number of excellent books have recently appeared that present the martingale theory of stochastic integration. A clear introduction is provided by Chung and Williams [5]. Metivier [6] gives a thorough pedagogic treatment. Ikeda and Watanabe [7] is also highly recommended. The martingale approach to point processes is the central theme in the book by Bremaud [8]; Karr [13] treats the statistical inference side. The recent text by Jacod and Shiryaev [9] provides a systematic exposition of weak convergence for semimartingales. Spitzer [10] and Feller [11] remain the best introductions to random walk. Applications in management science are presented in Prabhu [12].
References

[1] Doob, J.L. (1953). Stochastic Processes. Wiley, New York.
[2] Karlin, S. and Taylor, H.M. (1975). A First Course in Stochastic Processes. Academic Press, New York.
[3] Gänssler, P. and Haeusler, E. (1986). On Martingale Central Limit Theory. Springer-Verlag, Berlin-New York.
[4] Neveu, J. (1975). Discrete Parameter Martingales. North-Holland, Amsterdam.
[5] Chung, K.L. and Williams, R.J. (1983). Introduction to Stochastic Integration. Birkhäuser, Basel-Boston.
[6] Metivier, M. (1982). Semimartingales. De Gruyter, Berlin.
[7] Ikeda, N. and Watanabe, S. (1981). Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam.
[8] Bremaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, Berlin-New York.
[9] Jacod, J. and Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes. Springer-Verlag, Berlin-New York.
[10] Spitzer, F. (1964). Principles of Random Walk. Van Nostrand, New York.
[11] Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Vol. II. Wiley, New York.
[12] Prabhu, N.U. (1990). Queues and Inventories. Wiley, New York.
[13] Karr, A.F. (1986). Point Processes and their Statistical Inference. Dekker, New York.