On effective computation of expectations in large or infinite dimension


Journal of Computational and Applied Mathematics 31 (1990) 23-34 North-Holland

Nicolas BOULEAU

CERMA, École Nationale des Ponts et Chaussées, B.P. 105, F-93194 Noisy-le-Grand, France

Received 31 March 1989

Abstract: This study is an analysis of the natural difficulties of integration by Monte Carlo or quasi Monte Carlo methods. In spite of what is sometimes written, these methods work only in some precise cases. For the important problem of the computation of expectations of functionals of stochastic processes, we present the advantages of a method based on the implementation of the Bernoulli shift operator by pointers.

Résumé: This study is an analysis of the difficulties inherent in integration by Monte Carlo or quasi Monte Carlo methods. Contrary to what is sometimes written, these methods are practicable only in certain cases. For the important problem of computing the expectation of functionals of stochastic processes, we present the advantages of a simulation method in large or infinite dimension based on an implementation of the shift operator by pointers.

Keywords: Monte Carlo, quasi Monte Carlo, Riemann integrable, discrepancy, effectivity, infinite dimension, Bernoulli shift, pointers.

On the one hand, simulation methods are often presented as the only ones which work for solving stochastic problems when no direct computation is possible; on the other hand, they are known to be slow. Some reservations have to be made about both statements: first because these methods do not always work but need precise assumptions which are often not satisfied; second because several improvements allow, in some cases, significant speeding up. Section 1 is an attempt to make this analysis more accurate, to clarify what succeeds and what fails when computing the expectation of a random variable by simulation, i.e., by Monte Carlo or quasi Monte Carlo sampling and averaging. For quasi Monte Carlo methods this depends highly on the dimension of the space on which the random variable is defined. That leads to the concept of simulation morphisms. This is presented in Sections 1.1 and 1.2. But in both the Monte Carlo and the quasi Monte Carlo case a more fundamental difficulty comes from the fact that some random variables (often even very simple ones) are not Riemann integrable in the setting where they are naturally defined (see Examples 2, 3 and 6 in Section 1.3). Finding another random variable with the same law which is Riemann integrable is in general a difficult problem, which remains unsolved in many cases. As explained in Section 1.4, it is nevertheless possible to compute the expectation of non-Riemann integrable random variables without proceeding to such a reduction to the Riemann integrable case, if the random variable can be shown to be effective in L¹, that is,


can be effectively approximated in the L¹-norm by bounded continuous functions. In this part many examples are given and some open questions are set. Section 2 is devoted to the infinite-dimensional case, which is shown to be very usual when computing expectations of functionals of stochastic processes such as Markov chains or diffusions. This serves to introduce Section 3, in which the shift method is presented. Using Birkhoff's pointwise ergodic theorem instead of the law of large numbers is quite a natural idea, but what is more interesting is that, despite the fact that there is no standard speed of convergence for this theorem, the method runs surprisingly fast in usual cases, so much so that it is the only one available in some situations. This is due to the efficient use of pointers that this method allows. Like the law of large numbers, the shift method has its Monte Carlo and its quasi Monte Carlo version, the latter being in connection with the notion of rapidly normal numbers or sequences, which here replace low discrepancy sequences. The shift method was presented in a course of the author at the Université Paris VI in 1987-1988, and in two lectures at European universities in 1988. At present, this method raises several questions for fundamental research, especially on inequalities for the speed of convergence and on whether random sequences are the fastest or not in infinite dimension; a result of [4] shows that, in a certain sense, they are not far from being the best ones. The word "effective" in the title and throughout the paper is only a preformalized and naive version of recursive predicate theory.

1. Difficulties of simulation

1.1. Fast integration in small dimension

Our study being mainly devoted to other cases (integration in large or infinite dimension), we recall this case just for completeness. Here X is a real random variable defined on the space ([0,1]^s, B([0,1]^s), dx) and s is a small natural integer (roughly, s ≤ 15, say). Several papers tend to make the word "small" more precise (cf., for instance, [15,19]). Moreover, X is supposed to be Riemann integrable, i.e., it can be inserted between two continuous functions whose integrals differ from each other by no more than ε, for every ε > 0. In this case, which is quite usual in engineering applications, there is better to do than to pick random independent points U_n in [0,1]^s and to compute E(X) by the law of large numbers:

E(X) = ∫ X dx = lim_{N→∞} (1/N) Σ_{n=1}^{N} X(U_n),  a.s.,

that is to say, in practice, by using pseudorandom sequences. Indeed, if ξ = (ξ_n) is a uniformly distributed sequence on [0,1]^s, i.e., if the measures μ_N = (1/N) Σ_{n=1}^{N} δ_{ξ_n} converge narrowly to dx, then (1/N) Σ_{n=1}^{N} X(ξ_n) → E(X).

But for some sequences this convergence is faster than with independent uniform random samples. The discrepancy being defined by

D_N^*(ξ) = sup_{x ∈ [0,1]^s} | μ_N([0, x_1] × ··· × [0, x_s]) − x_1 ··· x_s |,

for independent samples it cannot be asymptotically better than √((log log N)/(2N)), by a form of the law of the iterated logarithm, while several sequences are known for which it is of the order c (log N)^s / N.

Examples. (a) Halton sequences.

Let r be an integer ≥ 2. If the r-adic expansion of n is

n = a_0 + a_1 r + ··· + a_p r^p,  a_i ∈ {0, 1, ..., r−1},

then Φ_r(n) is defined by

Φ_r(n) = a_0/r + a_1/r² + ··· + a_p/r^{p+1},

which belongs to [0, 1], and the Halton sequence is the following:

ξ_n = (Φ_{r_1}(n), ..., Φ_{r_s}(n)),

where r_1, ..., r_s are the first s prime numbers. Then [5]

D_N^*(ξ) ≤ (1/N) ∏_{j=1}^{s} ( r_j log(r_j N) / log r_j ),  as soon as N > r_s.
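As a small illustration (ours, not from the paper), here is a sketch of the radical-inverse function Φ_r and the Halton points, together with the exact one-dimensional star discrepancy computed from the sorted points; the function names are our choices.

```python
def radical_inverse(n, r):
    # Phi_r(n): reflect the r-adic digits of n about the radix point
    x, w = 0.0, 1.0 / r
    while n > 0:
        n, a = divmod(n, r)
        x += a * w
        w /= r
    return x

def halton(n, primes=(2, 3, 5)):
    # n-th point of the Halton sequence in dimension s = len(primes)
    return tuple(radical_inverse(n, p) for p in primes)

def star_discrepancy_1d(points):
    # exact D_N^* in dimension s = 1, via the classical formula
    # D_N^* = max_i max(i/N - x_(i), x_(i) - (i-1)/N) on the sorted points
    xs, N = sorted(points), len(points)
    return max(max((i + 1) / N - x, x - i / N) for i, x in enumerate(xs))
```

For r = 2 this gives the van der Corput sequence 1/2, 1/4, 3/4, 1/8, ..., whose star discrepancy decays like (log N)/N, in line with the bound above.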

(b) Hammersley sequences. The second time a simulation is run, it is generally possible to improve it by using finite sequences; this is the case for Halton sequences. If (ξ_m), 1 ≤ m ≤ n, is defined by

ξ_m = (m/n, Φ_{r_1}(m), ..., Φ_{r_{s−1}}(m)),

then

D_n^*(ξ) ≤ (1/n) ∏_{j=1}^{s−1} ( r_j log(r_j n) / log r_j ),  as soon as n > r_{s−1}

(cf. [6]).

(c) Irrational translations of the torus. Let α_i ∈ (0, 1), i = 1, ..., s, be such that 1, α_1, ..., α_s are independent over ℚ; then the sequence

ξ_n = ({n α_1}, ..., {n α_s})

(where {x} = x mod 1) satisfies

∀ε > 0 ∃c(ε, α):  D_N^*(ξ) ≤ c(ε, α) N^{ε−1}

(cf. [16]). With such sequences, moreover, the stopping criterions are deterministic when using the discrepancy with the Koksma-Hlawka inequality

| E(X) − (1/N) Σ_{n=1}^{N} X(ξ_n) | ≤ V(X) D_N^*(ξ)

(cf. [7,10]), where V(X) is the variation of X in the sense of Hardy-Krause (this is, modulo some technical points related to the mass on the boundary of the cube, the total mass of the measure ν, if it exists, such that ν([0, x_1] × ··· × [0, x_s]) = X(x) − E(X), where x = (x_1, ..., x_s) ∈ [0,1]^s). Other sequences are known (cf., for instance, [8]) satisfying the estimate

N D_N^*(ξ) ≤ C_s (log N)^s + O((log N)^{s−1}),  N ≥ 2,   (1)

some with constants C_s as small as possible, such as Faure sequences [3], themselves improved by Niederreiter [12]. But the practical difficulty comes from the fact that in (1) the right-hand side increases rapidly with the dimension s (despite the fact that C_s itself could be decreasing), so much so that for s = 20 several million iterations are necessary to reach results similar to those obtained with random sampling (see [15,19]).

Remark. It is nevertheless interesting to note that equidistributed sequences allow the use of the rejection method. More precisely, if Y is a real random variable which can be simulated by the rejection method using a dominating random variable simulated on [0,1]^p, and if the random variable X whose expectation is to be computed is given by

X = F(Y, U_1, ..., U_q),

where the U_i's are independent uniform variables independent of Y and F is supposed to be Riemann integrable, then E(X) can be computed by an equidistributed sequence on [0,1]^{p+q}; see for instance [1].

1.2. Changes of dimension

Let (Ω_1, A_1, P_1) and (Ω_2, A_2, P_2) be two Polish spaces equipped with their Borel σ-fields and probability measures P_1, P_2; then if Φ is a mapping from Ω_1 into Ω_2 such that
(i) P_1{ x: Φ is not continuous at x } = 0,
(ii) Φ_*P_1 = P_2,
then, if f is Riemann integrable from Ω_2 into ℝ, f ∘ Φ is Riemann integrable over Ω_1, and if ξ = (ξ_n) is P_1-uniformly distributed,

(1/N) Σ_{n=1}^{N} f ∘ Φ(ξ_n) → ∫ f ∘ Φ dP_1 = ∫ f dP_2,

hence (Φ(ξ_n)) is P_2-uniformly distributed over Ω_2. Φ can be called a simulation morphism.

Example 1. Let

x = Σ_{n=1}^{∞} a_n 2^{−n},  a_n ∈ {0, 1},  with ∀N: Σ_{n=N}^{∞} a_n ≠ 0.

Then define a map Φ_s from (0, 1] into [0, 1]^s by

Φ_s: x ∈ (0, 1] → (y_1, ..., y_s) ∈ [0, 1]^s  with  y_i = Σ_{n=1}^{∞} a_{s(n−1)+i} 2^{−n};

then Φ_s is a simulation morphism.
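A minimal numerical sketch (ours, not from the paper) of the digit-interleaving map Φ_s of Example 1, operating on a finite block of binary digits; the truncation length 48·s and the function name are our choices. The empirical check below verifies that each coordinate of the image is again (approximately) uniform.

```python
import random

def phi_s(bits, s):
    # Interleaving morphism of Example 1: y_i collects the digits
    # a_i, a_{s+i}, a_{2s+i}, ... (1-based), reweighted by 1/2, 1/4, ...
    ys = []
    for i in range(s):                    # i = 0 corresponds to y_1
        y, w = 0.0, 0.5
        for j in range(i, len(bits), s):  # digits a_{s(n-1)+i}, n = 1, 2, ...
            y += bits[j] * w
            w *= 0.5
        ys.append(y)
    return ys

# empirical check: if x is uniform, each y_i should have mean close to 1/2
rng = random.Random(0)
s, samples = 3, 20000
means = [0.0] * s
for _ in range(samples):
    bits = [rng.getrandbits(1) for _ in range(48 * s)]
    for i, y in enumerate(phi_s(bits, s)):
        means[i] += y / samples
```

This only tests the one-dimensional marginals; that Φ_s preserves the whole product measure follows from the independence of disjoint blocks of binary digits.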


Questions. (1) Are there simulation morphisms which from good low discrepancy sequences on [0,1] give good sequences on [0,1]^s? (2) Are there simulation morphisms Φ from [0,1] into [0,1]^s such that f ∘ Φ is of bounded variation as soon as f is of bounded variation on [0,1]^s?

1.3. Reduction of a non-Riemann integrable random variable

(a) Clearly, if X is a real random variable on ([0,1], B([0,1]), dx) which is not Riemann integrable, it is not possible to use X directly to know the law of X by simulation. If (ξ_n) is uniformly distributed, the average

(1/N) Σ_{n=1}^{N} X(ξ_n)

does not converge to E(X) in general.

Example 2. Let x ∈ [0, 1] be given by

x = Σ_{n=1}^{∞} a_n 2^{−n},  a_n ∈ {0, 1},  with ∀N: Σ_{n=N}^{∞} a_n ≠ 0.

Let us define

X(x) = sup_n (1/n) Σ_{k=1}^{n} a_k;

then X is not Riemann integrable. Indeed, let

A = { x: lim_n (1/n) Σ_{k=1}^{n} a_k = 1/2 and a_1 = 0 };

clearly for Lebesgue measure P(A) = 1/2, and if x ∈ A, then X(x) < 1. But in every neighbourhood of x there is a y with X(y) = 1.

Example 3. Here is a more probabilistic example. The X_n's being i.i.d. with law (1/2) dx on [−1, 1], let

S_n = X_1 + ··· + X_n

be the symmetric random walk. Let us put

X = ( sup_{n≥2} S_n / √(2n log log n) ∧ 2 ) ∨ 0;

then X is bounded but not Riemann integrable.

(b) To see where the difficulty lies, let us recall what can be called the fundamental theorem of simulation.


Theorem 4. For every probability measure μ on ℝ^d with compact support, there is a Riemann integrable random variable H with values in ℝ^d defined on ([0, 1], dx) with law μ.

This theorem is effective: its proof gives H explicitly if μ is known, see [1]. A more general theorem exists for stochastic processes, see [17].

(c) These facts lead us to introduce the following notion.

Definition 5. A non-Riemann integrable random variable X is said to be reduced if a Riemann integrable random variable Y can be exhibited with the same law as X.

Most almost sure convergence theorems of probability theory give examples of random variables whose reduction is known. The reduction is in general a less difficult operation than finding an analytical expression of the law, even if "analytical" is understood in a broad sense (allowing the use of the usual nonholomorphic functions).

Example 6. Let (U_n) be the coordinate mappings of ([0,1]^ℕ, dx^⊗ℕ). Let us consider the Markov chain

x_n = (1/(n+2)) ( (n+1) x_{n−1} + 1_{{U_n ≤ x_{n−1}}} ),  x_0 ∈ (0, 1),

which is a manner of writing Pólya's urn. If F_n = σ(U_i, i ≤ n), (x_n) is an F_n-martingale in [0, 1], and if x_∞ = lim sup_n x_n, then x_∞ is not Riemann integrable. Nevertheless if x_0 = 1/2, the law of x_∞ is the Lebesgue measure on [0, 1]. Hence x_∞ is reduced.

1.4. Computation of the expectations of nonbounded or non-Riemann integrable random variables: random variables effectively in L¹

(a) Nonbounded random variables are very commonly used in simulation. For instance the log function on [0, 1] is used for simulating the exponential law by

X = −(1/λ) log U.
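A one-line inverse-transform sketch (our illustration; the parameter name `lam` is our choice) of this simulation of the exponential law, with an empirical check of its mean 1/λ:

```python
import math
import random

def exponential_sample(u, lam):
    # X = -(1/lam) * log(U) has the exponential law with parameter lam
    return -math.log(u) / lam

rng = random.Random(0)
lam, n = 2.0, 200000
# 1 - random() lies in (0, 1], which keeps the unbounded log away from 0
est = sum(exponential_sample(1.0 - rng.random(), lam) for _ in range(n)) / n
```

With these settings the Monte Carlo mean `est` comes out close to 1/λ = 0.5.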

But the use of nonbounded random variables is impossible in simulation if the positions of the poles are unknown. Even when this position is known, the result will depend on the speed with which the uniformly distributed sequence nears the poles.

Example 7. To compute ∫_0^1 x^{−1/2} dx = 2 by simulation, let us use a uniformly distributed sequence ξ = (ξ_n) over (0, 1) and let us consider

(1/N) Σ_{n=1}^{N} ξ_n^{−1/2}.

If the sequence ξ_n is changed on a null set A ⊂ ℕ (i.e., such that (1/N) Σ_{n=1}^{N} 1_A(n) → 0), the new sequence is still uniformly distributed. Let us take ξ_{p²} = 1/p⁴; then ξ_{p²}^{−1/2} = p², so that (1/N) Σ_{n=1}^{N} ξ_n^{−1/2} ≥ (1/N) Σ_{p² ≤ N} p² ~ √N / 3 → +∞, and the averages no longer converge to 2.

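The divergence in Example 7 is easy to reproduce numerically. The sketch below (ours; we corrupt the sequence exactly at the perfect squares n = k², a density-zero set of indices) compares the clean and corrupted averages of ξ_n^{−1/2}:

```python
import math
import random

def average_inv_sqrt(N, corrupt, rng=random.Random(0)):
    # average of xi_n^(-1/2); if corrupt, set xi_{k^2} = 1/k^4 for k >= 2
    total = 0.0
    for n in range(1, N + 1):
        k = math.isqrt(n)
        if corrupt and k >= 2 and k * k == n:
            x = 1.0 / k**4          # the corrupted null set of indices
        else:
            x = 1.0 - rng.random()  # uniform in (0, 1], away from the pole
        total += 1.0 / math.sqrt(x)
    return total / N
```

The clean average stays near the true value 2, while the corrupted one grows like √N/3 (around 100 already for N = 100000).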

(b) Nevertheless, it is often possible to compute the expectation of nonbounded or even of non-Riemann integrable functions (without proceeding to a reduction to the Riemann integrable case).

Example 8. Let U_n be the coordinates of ([0,1]^ℕ, dx^⊗ℕ). Let us put

X_n = Σ_{i=1}^{n} (2U_i − 1)/i  and  Y = lim sup_n |X_n|.

It can be proved that Y is not bounded and that (Y ∧ 1) ∨ (−1) is not Riemann integrable. But the martingale X_n converges almost surely and in L², and

‖ Y − |X_n| ‖²_{L²} ≤ Σ_{i=n+1}^{∞} 1/i².

Hence to compute E(Y) with the precision ε > 0, it is sufficient to choose n large enough so that (Σ_{i=n+1}^{∞} 1/i²)^{1/2} < ε/2 and then to compute by simulation E(|X_n|) with the accuracy ε/2.

On the classical probability spaces used in simulation, which are compact, E(Y) can be computed as soon as a sequence of continuous functions f_n is effectively known such that

‖ Y − f_n ‖_{L¹} → 0.

In practice on [0,1]^ℕ, this property of being effective in L¹ splits into two properties:
(i) an effective bound decreasing to zero is known showing that lim_{k→∞} ∫_{{|Y|>k}} |Y| dP = 0;
(ii) for each k a sequence f_{n,k} of polynomials with rational coefficients is known such that the limit

lim_n ‖ f_{n,k} − (Y ∧ k) ∨ (−k) ‖_{L¹} = 0

is proved with an effective bound.

Remark. If it is only known that Y belongs to L¹ by Fatou's lemma, the variable Y is not necessarily effective in L¹. In general, the almost sure limits of nonuniformly integrable positive supermartingales are not effectively in L¹, unless they can also be obtained in another way. Even the dominated convergence theorem does not yield the f_{n,k}'s.
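A numerical sketch (ours) of the truncation scheme of Example 8: n is chosen from the effective tail bound Σ_{i>n} 1/i² < 1/n, and E(|X_n|) is then estimated by plain Monte Carlo (the Monte Carlo accuracy itself is not certified here; the sample size is our choice).

```python
import random

def estimate_E_Y(eps, samples=20000, rng=random.Random(0)):
    # choose n with (sum_{i>n} 1/i^2)^(1/2) <= (1/n)^(1/2) < eps/2,
    # i.e. n > 4/eps^2, so that || Y - |X_n| ||_{L^2} < eps/2
    n = int(4.0 / eps**2) + 1
    total = 0.0
    for _ in range(samples):
        x = 0.0
        for i in range(1, n + 1):       # X_n = sum_{i<=n} (2 U_i - 1)/i
            x += (2.0 * rng.random() - 1.0) / i
        total += abs(x)                  # |X_n| approximates Y
    return total / samples
```

For eps = 0.2 this uses n = 101 coordinates per sample; the estimate stabilizes near 0.6 under these assumptions.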

2. Integration of functionals of stochastic processes

In situations related to stochastic processes, one often has to deal with integration in infinite dimension. When computing (expectations of functions of) entrance times or entrance points of a stochastic process in a set or in an event, or, more generally, functionals of the sample paths, one is dealing with random variables which can be bounded and can be Riemann integrable but are defined on ([0,1]^ℕ, B([0,1]^ℕ), dx^⊗ℕ). Let us consider as a generic example the case of Markov chains (to which the case of diffusions also reduces, by discretization of SDEs) and let us look at how the classical simulation method works.


Let X_n be a Markov chain defined by

X_{n+1} = F(X_n, n, U_{n+1}),  X_0 = x,   (2)

where F: ℝ^d × ℕ × [0,1]^k → ℝ^d is given and where the U_n's are i.i.d. with law dx on [0,1]^k. If the following quantities have to be computed:
- E(T), E(X_T), E[G(X_T, T)] for T a stopping time such as T = inf{ n: (X_n, n) ∈ A },
- E(Σ_{k=0}^{T} C(X_k, X_{k+1})) for C: ℝ² → ℝ representing costs,
- or more generally E(H(X·)) where H: ℝ^ℕ → ℝ,
then by (2) X_n can actually be defined on the probability space ([0,1]^k)^ℕ with the U_n being the coordinates, and the problem is to find good double sequences

u_j = (u_{0j}, ..., u_{ij}, ...) ∈ ([0,1]^k)^ℕ

in such a way that

(1/m) Σ_{j=1}^{m} H(X·(u_j))

converges rapidly to E(H(X·)).

(a) Let us take k = 1 for simplicity. Then let us remark at first that if we take

u_{ij} = V_i + ξ_j mod 1,

where (V_i)_{i∈ℕ} is i.i.d. on [0,1] and (ξ_j)_{j≥1} is a (low-discrepancy) uniformly distributed sequence on [0,1], one gets a sequence (u_j) which is not uniformly distributed over [0,1]^ℕ for dx^⊗ℕ.

(b) On the contrary, if (u_{ij}) is chosen in such a way that for every q ∈ ℕ

u_j^q = (u_{0j}, ..., u_{qj})

is uniformly distributed over [0,1]^{q+1}, then (u_j) is uniformly distributed over [0,1]^ℕ in the narrow sense. With Halton sequences, we get infinite-dimensional Halton sequences, which were studied by Sobol' [18]. For such sequences (u_j) the average

(1/m) Σ_{j=1}^{m} g(u_j)

converges to E(g) for smooth g, i.e., for "almost cylindrical" functions g. Nevertheless these sequences are unusable in practice because, even for cylindrical functions, the convergence is too slow (or more exactly too late) in high dimension, as we have already seen in Section 1.
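For concreteness, here is the classical simulation method on a toy chain (our example, not the paper's): X_{n+1} = X_n + U_{n+1} with X_0 = 0 and T = inf{n: X_n > 1}, for which E(T) = e is known in closed form.

```python
import math
import random

def mc_expected_entrance_time(num_paths, rng=random.Random(0)):
    # independent sample paths; each path draws fresh uniforms until X_n > 1
    total = 0
    for _ in range(num_paths):
        x, n = 0.0, 0
        while x <= 1.0:
            x += rng.random()   # U_{n+1}
            n += 1
        total += n              # n = T for this path
    return total / num_paths
```

With 200000 independent paths the estimate lies within a few thousandths of e ≈ 2.71828; each path consumes its own fresh coordinates, which is exactly the waste the shift method of Section 3 avoids.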


3. The Bernoulli shift method

The infinite- or high-dimensional case turns out to have specific features which make it possible to avoid picking completely different sample paths and averaging over them.

3.1. An efficient method

Although the pointwise ergodic theorem has no typical speed of convergence (such as the law of the iterated logarithm), the method we shall now explain seems to be the most efficient one at present for computing high-dimensional functionals of stochastic processes: the implementation of this algorithm takes great advantage of the notion of pointers (or the equivalent device of stack management).

Let us consider the preceding Markov chain

X_{n+1} = F(X_n, n, U_{n+1}),  X_0 = x,

where the (U_n) are i.i.d. random variables with law dx on [0, 1]. As we saw, the process (X_n)_{n≥1} can be represented on ([0,1]^ℕ, dx^⊗ℕ). For the notations to be simple, it is actually convenient here to represent it on ([0,1]^{ℕ*}, dx^⊗ℕ*), the U_n's being the coordinate mappings, n ≥ 1. Let us take, for instance, an entrance time T, T = inf{ n ≥ 1: X_n ∈ A }, and a functional G(X_T, T). The expectation E(G(X_T, T)) can be computed by Birkhoff's theorem:

E(G(X_T, T)) = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} G(X_T, T) ∘ θ^n,  a.s.,   (3)

as soon as G(X_T, T) ∈ L¹, where θ is the shift operator satisfying

U_n ∘ θ(ω) = U_{n+1}(ω),  ∀ω ∈ [0,1]^{ℕ*}, ∀n ≥ 1.

The use of pointers can be explained as follows. For applying formula (3) we have to compute G(X_T, T) on successive points ω, θ(ω), θ²(ω), ... of [0,1]^{ℕ*}; these points are sequences of points in [0,1]:

ω = (U_1(ω), U_2(ω), ..., U_k(ω), ...),
θ(ω) = (U_2(ω), U_3(ω), ..., U_{k+1}(ω), ...).

Then if the stopping time T is finite, as we suppose here, only a finite initial segment of each of these sequences has to be computed, the sequence ω being computed until T(ω); and this is told by the test

(test)  X_k(ω) still outside A  ⟹  k < T(ω),
        X_k(ω) ∈ A for the first time  ⟹  k = T(ω).

Suppose the sequence ω = (U_1, U_2, ...) has been picked out, and the pseudorandom numbers U_1, U_2, ... have been put in pointers as indicated in Fig. 1. The following sequence θ(ω) = (U_2, U_3, ...) is partially already chosen; either it is long enough or it must be lengthened, in which case we have the new scheme (see Fig. 2).


Fig. 2.

In these pictures the arrows represent computations to do, that is, mainly, to compute

X_{n+1} = F(X_n, n, U_{n+1});

but in this computation of F(x, n, y) only the variable x (and n) is new, and partial computations depending only on y can be stored in the box of U_{n+1}, in such a way that only easy computations remain to be done. This storage has of course to be made during the lengthening of the sequence when it occurs, as explained above.
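In code the "boxes" and pointers can be rendered by a double-ended queue of stored uniforms: each shift discards the first box and lengthens the tail only when the test requires it, so consecutive averaging points share almost all of their coordinates. The sketch below (ours; it reuses the toy chain X_{n+1} = X_n + U_{n+1}, T = inf{n: X_n > 1}, with G(X_T, T) = T, so E(G) = e) applies Birkhoff's formula (3):

```python
import math
import random
from collections import deque

def shift_method(num_shifts, rng=random.Random(0)):
    boxes = deque()          # stored uniforms U_{j+1}, U_{j+2}, ... (the "boxes")
    total = 0.0
    for _ in range(num_shifts):
        x, k = 0.0, 0
        while x <= 1.0:      # the (test): X_k still outside A ?
            if k == len(boxes):
                boxes.append(rng.random())   # lengthen the stored sequence
            x += boxes[k]    # reuse the box instead of a fresh draw
            k += 1
        total += k           # G(X_T, T) o theta^j, here simply T
        boxes.popleft()      # apply the shift theta: drop the first box
    return total / num_shifts
```

The averages run over the dependent points ω, θω, θ²ω, ..., but Birkhoff's theorem still gives almost sure convergence to e, while each shift step now costs only O(1) fresh pseudorandom numbers on average.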

3.2. Speed of convergence

Let us quote the following two results on the speed of convergence for the Bernoulli shift on ([0,1]^ℕ, dx^⊗ℕ) (cf. [11]).

Proposition 9. For every sequence (α_n), α_n > 0, α_n → 0, there exists a continuous function f from [0,1]^ℕ into ℝ such that

(1/α_n) | (1/n) Σ_{i=0}^{n−1} f ∘ θ^i − ∫ f dP |  →  +∞,  a.s.

Proposition 10. For every sequence (c_n), c_n > 0, increasing to infinity, with c_1 > 2, there exists a Borel set A with P(A) = 1/2, on which

| (1/n) Σ_{i=0}^{n−1} 1_A ∘ θ^i − 1/2 |  ≤  c_n / n,  a.s., for n large enough.

They show that the speed of convergence can be arbitrarily slow or arbitrarily near O(1/n). Nevertheless, a speed of convergence can be obtained if the function whose expectation is to be computed fulfils additional assumptions. Some of these results will be published elsewhere.

3.3. Connection with normal numbers

Let us replace [0,1]^ℕ by {0,1}^ℕ for simplicity of the discussion. The best points for integration by the shift method are fast normal numbers. Let us recall that a real number x in [0,1], or a sequence of binary digits, is called normal if the sequence x ∘ θ^n is uniformly distributed over {0,1}^ℕ equipped with the measure ((1/2)δ_0 + (1/2)δ_1)^⊗ℕ. Explicit normal numbers are known, the Champernowne sequence for instance. There are other known normal sequences which are generalizations of Champernowne's (see, for instance, [2,13,14]). Some of these constructions make use of (s, m)-de Bruijn sequences, which are periodic mappings u from ℕ into {0, ..., m−1} of period m^s such that the m^s strings of length s

(u(n), ..., u(n+s−1)),  n = 0, 1, ..., m^s − 1,

are all different. Algorithms are known for generating infinite de Bruijn sequences (see [9] and its references), that is to say infinite sequences whose initial sections of length m^s, considered as loops, are (s, m)-de Bruijn sequences for every s. The efficiency of these sequences is difficult to evaluate, and is largely an open subject. One of the main difficulties is to summarize the performance of a sequence on a single scale while there is naturally an infinity of discrepancies, one on each finite product space. On the space {0,1}^ℕ, Flajolet et al. [4] give an asymptotic result which shows that, in a certain sense, random sequences are not far from being the best ones.

Concluding remarks

(a) What was explained for a Markov chain is of course valid for a large class of stochastic processes or stochastic fields, which, in usual cases, can be defined in a natural way on ([0,1]^ℕ, B([0,1]^ℕ), dx^⊗ℕ), after discretization if necessary.

(b) It can happen that the process whose functional is studied is itself mixing or simply ergodic. In this case the shift method can be applied to the process itself instead of to the underlying coordinates. This works as soon as a sample path of arbitrary length of the process can be simulated.

(c) It is well known that pointers are particularly convenient for representing tree structures, for instance binary trees. Then the space which takes the place of the product [0,1]^ℕ is the space [0,1]^B, or a space of the form A^B where A is some probability space and B = {l, r}^{(ℕ)} (finite sequences of l's and r's). Such a space, with a suitable extension of Birkhoff's theorem, is an interesting device for dealing with branching processes, which are of great practical interest.


Nevertheless the above-explained "linear" shift method actually works for branching processes as well, and in our experiments this "linear" implementation runs even faster.

(d) The shift method can be slightly improved by using the two-sided Bernoulli shift, which makes it possible never to throw a box away. The idea is to simulate two independent copies of the process, one with the classical right-to-left shift, the other with a left-to-right shift beginning with the box left by the first one. But the benefit is negligible in high dimension.

References

[1] N. Bouleau, Probabilités de l'Ingénieur, Variables Aléatoires et Simulation (Hermann, Paris, 1986).
[2] J.M. Dumont and A. Thomas, Une modification multiplicative des nombres g-normaux, Ann. Fac. Sci. Toulouse Math. (5) 3 (1986-87) 367-373.
[3] H. Faure, Discrépance de suites associées à un système de numération (en dimension s), Acta Arith. 41 (1982) 337-351.
[4] P. Flajolet, P. Kirschenhofer and R.F. Tichy, Deviations from uniformity in random strings, Probab. Theory Related Fields 80 (1988) 139-150.
[5] J.H. Halton, On the efficiency of certain quasi-random sequences of points in evaluating multidimensional integrals, Numer. Math. 2 (1960) 84-90.
[6] J.M. Hammersley, Monte Carlo methods for solving multivariate problems, Ann. New York Acad. Sci. 86 (1960) 844-874.
[7] E. Hlawka, Funktionen von beschränkter Variation in der Theorie der Gleichverteilung, Ann. Mat. Pura Appl. 54 (1961) 325-333.
[8] Hua Loo Keng and Wang Yuan, Application of Number Theory to Numerical Analysis (Springer, Berlin, 1981).
[9] A. Iványi, Construction of infinite de Bruijn arrays, Discrete Appl. Math. 22 (1988/89) 289-293.
[10] J.F. Koksma, Een algemeene stelling uit de theorie der gelijkmatige verdeeling modulo 1, Mathematica B (Zutphen) 11 (1942-43) 7-11.
[11] U. Krengel, Ergodic Theorems (de Gruyter, Berlin, 1985).
[12] H. Niederreiter, Low-discrepancy and low-dispersion sequences, J. Number Theory 30 (1) (1988) 51-70.
[13] G. Rauzy, Discrépance d'une suite complètement équirépartie, Ann. Fac. Sci. Toulouse Math. (3) (1981) 105-112.
[14] G. Rauzy, Propriétés statistiques des suites arithmétiques (Presses Univ. de France, Paris, 1985).
[15] P.K. Sarkar and M.A. Prasad, A comparative study of pseudo and quasi random sequences for the solution of integral equations, J. Comput. Phys. 68 (1987) 66-68.
[16] W.M. Schmidt, Simultaneous approximation to algebraic numbers by rationals, Acta Math. 125 (1970) 189-201.
[17] A.V. Skorohod, Studies in the Theory of Random Processes (Addison-Wesley, Reading, MA, 1965).
[18] I.M. Sobol', On the convergence of infinite-dimensional cubature and simulation of Markov chains, Voprosy Vychisl. i Prikl. Mat. (Tashkent) 32 (1975) 161-167.
[19] T. Warnock, Low-discrepancy point sets in transport codes, in: S.K. Zaremba, Ed., Applications of Number Theory to Numerical Analysis (Academic Press, New York, 1972).