Applied Mathematics and Computation 185 (2007) 658–666

The maximum entropy method applied to stationary density computation

Jiu Ding (a), Lawrence R. Mead (b)

(a) Department of Mathematics, The University of Southern Mississippi, Hattiesburg, MS 39406-5045, USA
(b) Department of Physics and Astronomy, The University of Southern Mississippi, Hattiesburg, MS 39406-5046, USA
Abstract

The maximum entropy method (maxent) is widely used in the context of the moment problem, which appears naturally in many branches of physics and engineering; it numerically recovers the least-biased density consistent with finitely many known moments. We introduce the basic idea behind this method and apply it to approximating fixed densities of Markov operators in stochastic analysis and of Frobenius–Perron operators in the ergodic theory of chaotic dynamics.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Maximum entropy; Markov operators
1. Introduction

The concept of entropy was first introduced by Clausius into thermodynamics in the middle of the nineteenth century, and later used in a different form by L. Boltzmann in his pioneering work on the kinetic theory of gases in 1866 [9]. It is a measure of the amount of information required to specify the state of a thermodynamic system. The famous Second Law of Thermodynamics says that in an isolated system the (thermodynamic) entropy never decreases.

A modern concept of entropy was established in information theory by C.E. Shannon in 1948. The Shannon entropy, defined for a finite sample space with events w_1, w_2, \ldots, w_n occurring with probabilities p_1, p_2, \ldots, p_n (a discrete information source), is

    H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log p_i,    (1)

which is a measure of the uncertainty of an information source. Based on the Shannon entropy, the Kolmogorov–Sinai–Ornstein entropy, defined for measurable transformations of probability spaces, was successfully used in solving the isomorphism problem of dynamical systems.
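For concreteness, (1) is trivial to evaluate numerically; a minimal Python sketch, where the two coin distributions are our own illustrative choices:

```python
import math

def shannon_entropy(p):
    """Shannon entropy (1): H = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

print(shannon_entropy([0.5, 0.5]))  # log 2 ~ 0.6931: a fair coin is maximally uncertain
print(shannon_entropy([0.9, 0.1]))  # ~ 0.3251: a biased coin carries less uncertainty
```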
In 1965, R. Adler, A. Konheim, and M. McAndrew introduced the topological entropy for continuous transformations of compact Hausdorff spaces (see, e.g., [5]).

The principle of maximum entropy was introduced in the context of information theory in 1957 by Jaynes. Since his seminal paper [8], the maximum entropy method has been widely used in many areas of science and technology. Basically, this numerical scheme recovers the required density function with least bias among all candidates that satisfy the given constraints. The maximum entropy method has diverse applications in physics and engineering (see [12,13] and the references therein).

In this paper we introduce the basic idea of the maximum entropy method and present some recent progress in applying it to a class of operator equations. In Section 2 we give the formulation of the method and study its properties; another formulation, based on the Galerkin projection, is also presented. In Section 3 we apply the method to the Markov operator fixed density problem, which is important in the stochastic analysis of deterministic dynamical systems. Error analysis and numerical experiments are contained in Section 4, and we conclude in Section 5.

2. The Boltzmann entropy and the maximum entropy method

Let (X, \Sigma, \mu) be a probability measure space. A nonnegative function f in L^1 \equiv L^1(X) such that

    \|f\|_1 = \int_X |f|(x)\,d\mu(x) = 1

is called a density. The set of all densities is denoted by D. Let

    g(u) = \begin{cases} -u \log u, & u > 0, \\ 0, & u = 0. \end{cases}    (2)

Definition 2.1. If f \ge 0, then the (Boltzmann) entropy of f is defined as

    H(f) = \int_X g(f(x))\,d\mu(x) = -\int_X f(x) \log f(x)\,d\mu(x).    (3)
Some basic properties of the entropy are [2,9]:

(i) H(f) is either finite or -\infty.
(ii) H : \{f \ge 0 : f \in L^1\} \to [-\infty, \infty) is a proper, upper semicontinuous concave function, strictly concave on its domain, which consists of all functions f \ge 0 with H(f) > -\infty.
(iii) For any a > -\infty, the (upper) level set

    \{f \ge 0 : H(f) \ge a\}    (4)

is weakly compact in L^1.

The concavity of the scalar function g in (2) leads to the Gibbs inequality

    u - u \log u \le v - u \log v \quad \text{for } u, v \ge 0,    (5)

which implies that

    \int_X f(x) \log f(x)\,d\mu(x) \ge \int_X f(x) \log g(x)\,d\mu(x) \quad \forall f, g \in D.    (6)

Moreover, we have the following sharp inequality [2]:

    \int_X f(x) \log \frac{f(x)}{g(x)}\,d\mu(x) \ge \frac{1}{2} \left( \int_X |f(x) - g(x)|\,d\mu(x) \right)^2, \quad f, g \in D.    (7)
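Inequality (7) is easy to probe numerically. A minimal sketch, where the two densities f(x) = 2x and g(x) = 3x^2 on [0,1] are our own arbitrary test choices:

```python
import numpy as np

x = np.linspace(1e-6, 1.0, 200001)
f = 2.0 * x        # a density on [0,1]
g = 3.0 * x**2     # another density on [0,1]

lhs = np.trapz(f * np.log(f / g), x)          # relative entropy of f with respect to g
rhs = 0.5 * np.trapz(np.abs(f - g), x) ** 2   # half the squared L1 distance
print(lhs, rhs, lhs >= rhs)                   # (7): lhs ~ 0.0945 >= rhs ~ 0.0439
```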
Let \{g_1, g_2, \ldots, g_N\} be a finite set of functions in L^\infty \equiv L^\infty(X), which is the dual space of L^1. The maximum entropy method refers to solving the following constrained optimization problem:

    \max H(f) \quad \text{s.t.} \quad f \in D, \quad \int_X f(x) g_n(x)\,d\mu = m_n, \quad 1 \le n \le N,    (8)

where m_1, \ldots, m_N are given constants. The Gibbs inequality gives the solution to this problem:
Proposition 2.1. Suppose \lambda_1, \ldots, \lambda_N are real numbers such that the function

    f_N(x) = \frac{\exp\left(-\sum_{n=1}^{N} \lambda_n g_n(x)\right)}{\int_X \exp\left(-\sum_{n=1}^{N} \lambda_n g_n(x)\right) \mu(dx)}    (9)

satisfies the constraints in (8), that is,

    \frac{\int_X g_i(x) \exp\left(-\sum_{n=1}^{N} \lambda_n g_n(x)\right) \mu(dx)}{\int_X \exp\left(-\sum_{n=1}^{N} \lambda_n g_n(x)\right) \mu(dx)} = m_i, \quad i = 1, \ldots, N.    (10)

Then f_N solves the maximum entropy problem (8).

Proof. For simplicity, set

    Z = \int_X \exp\left(-\sum_{n=1}^{N} \lambda_n g_n(x)\right) \mu(dx).

Then

    H(f_N) = \log Z + \sum_{n=1}^{N} \lambda_n m_n.

Let f \in D be any density that satisfies the constraints. Then, from inequality (6),

    H(f) \le -\int_X f(x) \log f_N(x)\,\mu(dx) = \int_X f(x) \left[\log Z + \sum_{n=1}^{N} \lambda_n g_n(x)\right] \mu(dx)
         = \log Z + \sum_{n=1}^{N} \lambda_n \int_X f(x) g_n(x)\,\mu(dx) = \log Z + \sum_{n=1}^{N} \lambda_n m_n = H(f_N). \qquad \square
Remark 2.1. Traditionally, the maximum entropy problem (8) with X = [0,1] and g_n(x) = x^n is used to numerically determine an unknown probability density p whose first N + 1 power moments are given:

    m_0 = 1, \quad m_n = \int_0^1 x^n p(x)\,dx, \quad n = 1, 2, \ldots, N.    (11)

The constrained information-theoretic entropy carried by p is

    S[p] = -\int_0^1 p(t) \ln p(t)\,dt + \sum_{n=0}^{N} \lambda_n \left[ m_n - \int_0^1 t^n p(t)\,dt \right].    (12)

According to the maximum entropy principle, the probability density p(x) that is "least biased" or "most likely" is the one that maximizes the entropy functional (12) with respect to p and with respect to the Lagrange multipliers \lambda_n, n = 0, 1, 2, \ldots, N, which enforce the moment constraints (11). Standard variational calculus yields a (nonlinear) set of equations determining p and the \lambda's:

    m_n = \int_0^1 x^n p_N(x)\,dx, \quad n = 0, 1, 2, \ldots, N,    (13)

where

    p_N(x) = Z^{-1} \exp\left(-\sum_{n=1}^{N} \lambda_n x^n\right)

is the (N + 1)-moment approximate density, and where we have denoted e^{\lambda_0 + 1} by Z. Mead and Papanicolaou [12] have shown that the maxent equations (13) have a unique solution, provided certain conditions hold which guarantee that a distribution exists satisfying the full moment problem with N = \infty.
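To make the remark concrete, here is a sketch of one way (13) might be solved in practice. Instead of the Newton iteration used later in Section 4, it minimizes the convex dual potential \log Z(\lambda) + \sum_n \lambda_n m_n with an off-the-shelf quasi-Newton routine; the test density p(x) = 2x (used only to generate moments), the 50-node quadrature, and all function names are our assumptions, not the authors':

```python
import numpy as np
from scipy.optimize import minimize

# Gauss-Legendre nodes and weights mapped to [0,1] for the integrals in (13).
nodes, weights = np.polynomial.legendre.leggauss(50)
x = 0.5 * (nodes + 1.0)
w = 0.5 * weights

def maxent_density(moments):
    """Return the density p_N = Z^{-1} exp(-sum_n lambda_n x^n) matching m_1..m_N."""
    N = len(moments)
    powers = np.vstack([x**n for n in range(1, N + 1)])   # shape (N, #nodes)

    def dual(lam):
        # Convex dual potential; its gradient is m_i - <x^i>_lambda,
        # so a minimizer enforces the moment conditions (13).
        logZ = np.log(np.dot(w, np.exp(-lam @ powers)))
        return logZ + np.dot(lam, moments)

    lam = minimize(dual, np.zeros(N), method="BFGS").x
    p = np.exp(-lam @ powers)
    return p / np.dot(w, p)

# Power moments of the test density p(x) = 2x: m_n = 2/(n + 2).
m = np.array([2.0 / (n + 2) for n in range(1, 5)])
pN = maxent_density(m)
print(np.dot(w, x * pN))   # ~ 2/3 = m_1: the first moment is reproduced
```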
The maximum entropy method (8) can also be interpreted as a Galerkin projection method, as follows. Let a (nonlinear) operator A : L^1 \to L^1 be defined by

    A h(x) = \exp h(x)    (14)

and consider the operator equation Ah = f^*, where f^* \in D satisfies

    \int_X f^*(x) g_n(x)\,d\mu(x) = m_n, \quad n = 1, 2, \ldots, N.

Let M_N = \mathrm{span}\{g_0, g_1, \ldots, g_N\} with g_0 \equiv 1. The Galerkin method associated with the subspace M_N for solving the operator equation (14) is to find h_N \in M_N such that

    \int_X [A h_N(x) - f^*(x)] g_n(x)\,d\mu(x) = 0, \quad n = 0, 1, 2, \ldots, N,

or, equivalently,

    \int_X \exp(h_N(x)) g_n(x)\,d\mu(x) = \int_X f^*(x) g_n(x)\,d\mu(x) = m_n, \quad n = 0, 1, 2, \ldots, N.

It follows that the solution h_N = \sum_{n=0}^{N} \lambda_n g_n satisfies the following algebraic equations:

    \int_X \exp\left(\sum_{n=0}^{N} \lambda_n g_n(x)\right) g_k(x)\,d\mu(x) = m_k, \quad k = 0, 1, 2, \ldots, N.

Thus, we have the following:

Proposition 2.2. The Galerkin method for the operator equation (14) with respect to the finite dimensional subspace M_N = \mathrm{span}\{g_0, g_1, \ldots, g_N\} gives the solution h_N \in M_N satisfying

    \int_X \exp(h_N(x)) g_n(x)\,d\mu(x) = m_n, \quad n = 0, 1, 2, \ldots, N.
So the density function f_N = \exp h_N is the maximum entropy solution of (8).

3. The maximum entropy method for Markov operator equations

The maximum entropy idea has been successfully applied to solving Fredholm integral equations [11], computing absolutely continuous invariant measures [4], and estimating Lyapunov exponents [6]. In this section we use the same approach to solve Markov operator equations.

A linear operator P : L^1 \to L^1 is called a Markov operator if PD \subset D. Markov operators describe the evolution of probability densities of dynamical systems. A special subclass of Markov operators is that of Frobenius–Perron operators P_S associated with a nonsingular transformation S : X \to X (that is, \mu(A) = 0 implies \mu(S^{-1}(A)) = 0), defined by

    \int_A P_S f\,d\mu = \int_{S^{-1}(A)} f\,d\mu \quad \forall A \in \Sigma.

Frobenius–Perron operators are mainly employed in ergodic theory for the study of statistical properties of chaotic dynamics [9]. Let P^* : L^\infty \to L^\infty be the dual operator of P. Then

    \int_X Pf(x) g(x)\,\mu(dx) = \int_X P^*g(x) f(x)\,\mu(dx)    (15)

for all f \in L^1 and g \in L^\infty. The dual operator of a Frobenius–Perron operator P_S is the so-called Koopman operator (or composition operator) U_S : L^\infty \to L^\infty defined by

    U_S g(x) = g(S(x)).    (16)

Our purpose is to solve numerically the fixed density problem Pf = f for a Markov operator P. A fixed density of the Markov operator P is called a stationary density of P, since it gives the asymptotic properties of the
iteration of P [9]. For simplicity of presentation, we only consider the case where X = [0,1] and \mu is the usual Lebesgue measure m. The maximum entropy formulation of the fixed density problem is based on the following simple lemma.

Lemma 3.1. Suppose f \in L^1(0,1) satisfies

    \int_0^1 x^n f(x)\,dx = 0, \quad n = 0, 1, \ldots

Then f = 0.

Proof. Let F(x) = \int_0^x f(t)\,dt. Then F(1) = 0. For n > 0, integration by parts gives

    0 = \int_0^1 x^n f(x)\,dx = [x^n F(x)]_0^1 - n \int_0^1 x^{n-1} F(x)\,dx = -n \int_0^1 x^{n-1} F(x)\,dx.

Since F is continuous, F \in L^2(0,1). The fact that the span of \{x^n\}_{n=0}^{\infty} is dense in L^2(0,1) then implies that F(x) \equiv 0, and hence f = 0. \qquad \square

Proposition 3.1. f^* \in D is a fixed point of P if and only if

    \int_0^1 f^*(x) (I - P^*) x^n\,dx = 0, \quad n = 1, 2, \ldots    (17)
Proof. Suppose P has a stationary density f^*. Then for n = 1, 2, \ldots,

    \int_0^1 x^n P f^*(x)\,dx = \int_0^1 x^n f^*(x)\,dx.

Since

    \int_0^1 x^n P f^*(x)\,dx = \int_0^1 P^* x^n f^*(x)\,dx,

we obtain

    \int_0^1 f^*(x) (I - P^*) x^n\,dx = 0, \quad n = 1, 2, \ldots

Conversely, suppose f^* \in D solves the above system. Then, by duality,

    \int_0^1 x^n (I - P) f^*(x)\,dx = 0, \quad n = 1, 2, \ldots

Since the above equality also holds for n = 0, Lemma 3.1 gives P f^* = f^*. \qquad \square

Hence, the fixed density problem and the (homogeneous) moment problem (17) are equivalent. This motivates the maximum entropy method for the fixed density problem of Markov operators:

Algorithm. Choose a natural number N and solve the maximum entropy problem: find f \in D which solves

    \max \left\{ H(f) : \int_0^1 f(x) (I - P^*) x^n\,dx = 0, \ 1 \le n \le N \right\}    (18)

to get an approximate solution f_N of the operator equation Pf = f. When P_S is the Frobenius–Perron operator associated with S : [0,1] \to [0,1], since (P_S)^* = U_S, the constraints in (18) are

    \int_0^1 f(x) [x^n - S(x)^n]\,dx = 0, \quad 1 \le n \le N.    (19)
Then, a direct consequence of Proposition 2.2 is
Corollary 3.1. For the Frobenius–Perron operator equation P_S f = f associated with S, we have g_n(x) = x^n - S(x)^n and m_n = 0, n = 1, 2, \ldots, N. Suppose f^* is a stationary density of P_S. Then the maximum entropy solution is f_N = \exp h_N, where h_N is the Galerkin solution for Ah = f^* on

    M_N = \mathrm{span}\left\{ 1, \ x - S(x), \ x^2 - S(x)^2, \ \ldots, \ x^N - S(x)^N \right\}.

The existence and uniqueness of the maximum entropy solution to the above system is guaranteed by

Theorem 3.1. Suppose f^*(x) > 0 on [0,1] and the functions \{(I - P^*) x^n\}_{n=1}^{N} are linearly independent. Then there exists a unique vector (\lambda_1, \ldots, \lambda_N) that satisfies the N constraints

    \int_0^1 [(I - P^*) x^i] \exp\left(-\sum_{n=1}^{N} \lambda_n (I - P^*) x^n\right) dx = 0, \quad i = 1, \ldots, N,    (20)

and the unique solution of (18) is

    f_N(x) = \frac{\exp\left(-\sum_{n=1}^{N} \lambda_n (I - P^*) x^n\right)}{\int_0^1 \exp\left(-\sum_{n=1}^{N} \lambda_n (I - P^*) x^n\right) dx}.    (21)
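Before turning to convergence, here is a sketch of how (20) and (21) might be realized for the cusp map S(x) = 1 - \sqrt{|2x - 1|} studied in Section 4. The paper solves (20) by Newton's method; as our own shortcut, the sketch instead finds the multipliers by minimizing the dual potential \log Z(\lambda), whose gradient vanishes exactly when the constraints (20) hold. The quadrature size and optimizer are our choices, not the authors':

```python
import numpy as np
from scipy.optimize import minimize

# Gauss-Legendre quadrature on [0,1].
nodes, weights = np.polynomial.legendre.leggauss(50)
x = 0.5 * (nodes + 1.0)
w = 0.5 * weights

def S(t):
    """Cusp map from Section 4."""
    return 1.0 - np.sqrt(np.abs(2.0 * t - 1.0))

def stationary_density(N):
    """Maxent approximation f_N of (21) with constraints g_n(x) = x^n - S(x)^n as in (19)."""
    g = np.vstack([x**n - S(x)**n for n in range(1, N + 1)])

    def logZ(lam):
        # All moments m_n are zero here, so the dual potential reduces to log Z(lambda).
        return np.log(np.dot(w, np.exp(-lam @ g)))

    lam = minimize(logZ, np.zeros(N), method="BFGS").x
    f = np.exp(-lam @ g)
    return f / np.dot(w, f)   # normalization as in (21)

f5 = stationary_density(5)
print(np.dot(w, f5))                          # 1.0: f_5 is a density
print(np.max(np.abs(f5 - 2.0 * (1.0 - x))))   # deviation from the exact f*(x) = 2(1 - x)
```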
The convergence of the maximum entropy method is based on a general theory developed by Borwein and Lewis in the 1990s [2,3].

Definition 3.1. Suppose X is a normed vector space. A concave functional f : X \to [-\infty, \infty) is strongly concave if it is upper semicontinuous, strictly concave on the domain \{x \mid f(x) > -\infty\}, with weakly compact (upper) level sets \{x \mid f(x) \ge a\} for a \in \mathbb{R}, and whenever x_n converges weakly to x \in X and f(x_n) converges to f(x) > -\infty, it follows that x_n converges to x in norm.

Example 3.1. If X = L^p with p \in [1, \infty), then f(x) = -\|x\|_p is strongly concave.

For the proof of the following key result we refer to [2].

Lemma 3.2. The entropy H : L^1 \to [-\infty, \infty) is strongly concave.

Let X be a locally convex topological space, let \{C_N\} be a nested decreasing sequence of closed subsets of X, and let W : X \to [-\infty, \infty). Consider the sequence of problems

    \max \{ W(h) : h \in C_N \}    (22)

and the limiting problem

    \max \{ W(h) : h \in \cap_{N=1}^{\infty} C_N \}.    (23)

Lemma 3.3. Suppose W has compact level sets. If h_N is an optimal solution to (22) and h_\infty is the unique optimal solution of (23) with W(h_\infty) > -\infty, then h_N \to h_\infty in the topology of X.

Theorem 3.2. Suppose f^* > 0 is the unique stationary density of the Markov operator P such that

    H(f^*) = -\int_0^1 f^*(x) \log f^*(x)\,dx > -\infty.

Then \lim_{N \to \infty} f_N = f^* strongly.

Remark 3.1. A sufficient condition for H(f^*) > -\infty is that f^* is a bounded function.

For an estimate of the convergence rate, we need
Definition 3.2. For each N define

    E_N = \inf \left\{ \left\| 1 + \log f^* - \sum_{n=1}^{N} \lambda_n (I - P^*) x^n \right\|_\infty : (\lambda_1, \ldots, \lambda_N) \in \mathbb{R}^N \right\}.    (24)

Theorem 3.3. Under the same conditions as in Theorem 3.2,

    \|f_N - f^*\|_1 \le E_N e^{E_N/2}.    (25)
Remark 3.2. Since it is hard to estimate E_N in (24), and the upper bound in (25) may not be sharp, an applicable error bound based on Taylor series can be obtained, as given in [7].

4. Numerical results

We use the maximum entropy method with a high precision Gaussian quadrature to calculate the stationary density f^* of a Markov operator. First we apply our method to the Markov operator P : L^1(0,1) \to L^1(0,1) with the stochastic kernel

    Pf(x) = \int_0^1 \frac{y e^{xy}}{e^y - 1} f(y)\,dy.

The unique stationary density f^* of P is

    f^*(x) = \frac{e^x - 1}{a x},

where the constant a \equiv \int_0^1 (e^x - 1)/x\,dx \approx 1.317902151. In implementing our algorithm we need to find P^* x^n explicitly. Since P is an integral operator, its dual operator P^* is given by

    P^* g(y) = \int_0^1 \frac{y e^{xy}}{e^y - 1} g(x)\,dx = \frac{y}{e^y - 1} \int_0^1 e^{xy} g(x)\,dx,

from which we find

    P^* x^n = \frac{x}{e^x - 1} \int_0^1 e^{xt} t^n\,dt = \frac{e^x \sum_{j=0}^{n} (-1)^j \frac{n!}{(n-j)!} x^{n-j} + (-1)^{n+1} n!}{x^n (e^x - 1)}.
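The closed form for P^* x^n is easy to sanity-check against direct numerical integration; a small sketch (the function names and the midpoint rule are our choices):

```python
import math

def dual_power_closed(x, n):
    """Closed form: (e^x sum_j (-1)^j n!/(n-j)! x^(n-j) + (-1)^(n+1) n!) / (x^n (e^x - 1))."""
    s = sum((-1) ** j * math.factorial(n) / math.factorial(n - j) * x ** (n - j)
            for j in range(n + 1))
    return (math.exp(x) * s + (-1) ** (n + 1) * math.factorial(n)) / (x ** n * (math.exp(x) - 1.0))

def dual_power_quad(x, n, m=20000):
    """P*x^n = x/(e^x - 1) * integral_0^1 e^(x t) t^n dt, by the midpoint rule."""
    h = 1.0 / m
    total = sum(math.exp(x * (i + 0.5) * h) * ((i + 0.5) * h) ** n for i in range(m))
    return x / (math.exp(x) - 1.0) * total * h

print(dual_power_closed(0.7, 3))   # the two values should agree closely
print(dual_power_quad(0.7, 3))
```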
Our next two examples concern the approximation of a fixed density of the Frobenius–Perron operator P_S and the corresponding Lyapunov exponent

    \lambda = \int_0^1 f(x) \ln |S'(x)|\,dx

for the continued fraction (Gauss) map [1]

    S_1(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor

on [0,1], where \lfloor 1/x \rfloor is the integer part of 1/x, and for the cusp map [1]

    S_2(x) = 1 - \sqrt{|2x - 1|}

on [0,1]. The stationary densities of P_{S_1} and P_{S_2} are

    f_1(x) = \frac{1}{(1 + x) \ln 2}, \qquad f_2(x) = 2(1 - x),

respectively, and their Lyapunov exponents are

    \lambda_1 = \frac{\pi^2}{6 \ln 2} \approx 2.373138, \qquad \lambda_2 = \frac{1}{2}.
The numerical computations were performed with double precision on the Cray C916/10 supercomputer at the Mississippi Center for Supercomputing Research. In the maximum entropy algorithm, Newton's method is used to solve the nonlinear system of equations (20) to high precision (up to 15 significant digits). To evaluate the derivative matrix required in the algorithm, we used a 50-node Gaussian quadrature for the numerical integrations. For each of the three examples, we compared the quantity

    a_N = \int_0^1 f_N(x) \frac{\sqrt{x}}{2}\,dx

with the exact average

    a^* = \int_0^1 f^*(x) \frac{\sqrt{x}}{2}\,dx

to estimate the error. In problems not involving chaos, averages such as a_N are known to converge rapidly as the number of input moments increases [12], as illustrated by our first example below. In Table 1 we list the quantities a_N versus N for the three examples, with the exact averages a^* for f^*, f_1, f_2 listed in the last row. The maxent algorithm worked very well for the integral Markov operator P, partly because this operator is compact. Table 2 lists the function values of f_5 and f^* at discrete nodes. For each of the two maps S_1 and S_2, we have computed the Lyapunov exponent according to the formula

    \lambda = \int_0^1 f(x) \ln |S'(x)|\,dx.    (26)

The results are shown in Table 3.
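As a cross-check on the exact values quoted above, both Lyapunov exponents can be evaluated directly from (26) with the known densities f_1 and f_2; a sketch (the grid size is our own choice; note that \ln|S'| is integrable though unbounded):

```python
import numpy as np

m = 2_000_000
x = (np.arange(m) + 0.5) / m   # midpoint grid on (0,1), avoiding the singular points

# Gauss map S_1(x) = 1/x - floor(1/x): |S_1'(x)| = 1/x^2, f_1(x) = 1/((1+x) ln 2).
lam1 = np.mean(-2.0 * np.log(x) / ((1.0 + x) * np.log(2.0)))
# Cusp map S_2(x) = 1 - sqrt|2x-1|: |S_2'(x)| = |2x-1|^(-1/2), f_2(x) = 2(1-x).
lam2 = np.mean(-0.5 * np.log(np.abs(2.0 * x - 1.0)) * 2.0 * (1.0 - x))

print(lam1, np.pi**2 / (6.0 * np.log(2.0)))   # both ~ 2.3731
print(lam2)                                   # ~ 0.5
```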
The results are shown in Table 3. Table 1 Averages of the computed fixed densities N
P
P S1
P S2
1 2 3 4 5 6
0.351112609 0.351052177 0.351051964 0.351051969 0.351051967
0.313330 0.310709 0.312135 0.312240 0.311654
0.3014 0.2931 0.2917 0.2817 0.2690
a*
0.351052239
0.309605
0.2667
Table 2
Density function values

x   | f*(x)  | f_5(x)
0.0 | 0.7588 | 0.7587
0.1 | 0.7980 | 0.7980
0.2 | 0.8400 | 0.8400
0.3 | 0.8849 | 0.8849
0.4 | 0.9330 | 0.9330
0.5 | 0.9845 | 0.9844
0.6 | 1.0397 | 1.0397
0.7 | 1.0989 | 1.0988
0.8 | 1.1624 | 1.1624
0.9 | 1.2306 | 1.2305
1.0 | 1.3038 | 1.3037
Table 3
Computed Lyapunov exponents and exact values

Map | N | \lambda_{exact}                  | \lambda_{maxent}
S_1 | 9 | \pi^2/(6 \ln 2) \approx 2.373138 | 1.999519
S_2 | 5 | 0.5                              | 0.5217
5. Conclusions

The maximum entropy method provides an alternative means of numerically determining, with good accuracy, a desired density of a physical process from even a small number of known moments. In most examples it is comparable to, or even better than, the famous Ulam method for computing invariant measures [10].

References

[1] C. Beck, F. Schlögl, Thermodynamics of Chaotic Systems, Cambridge University Press, 1993.
[2] J.M. Borwein, A.S. Lewis, Convergence of best entropy estimates, SIAM J. Optim. 1 (2) (1991) 191–205.
[3] J.M. Borwein, A.S. Lewis, On the convergence of moment problems, Trans. Amer. Math. Soc. 325 (1991) 249–271.
[4] J. Ding, A maximum entropy method for solving Frobenius–Perron operator equations, Appl. Math. Comput. 93 (1998) 155–168.
[5] J. Ding, T.-Y. Li, Entropy – an introduction, Nankai Series Pure Appl. Math. Theoret. Phys., vol. 4, World Scientific, 1993, pp. 26–53.
[6] J. Ding, L.R. Mead, Maximum entropy approximation for Lyapunov exponents of chaotic maps, J. Math. Phys. 43 (2002) 2518.
[7] M. Leaseburg, L.R. Mead, Error bounds in maximum entropy approximations, J. Math. Phys. 34 (1993) 6009.
[8] E.T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106 (1957) 620–630.
[9] A. Lasota, M. Mackey, Chaos, Fractals, and Noise, second ed., Springer-Verlag, New York, 1994.
[10] T.-Y. Li, Finite approximation for the Frobenius–Perron operator, a solution to Ulam's conjecture, J. Approx. Theory 17 (1976) 177–186.
[11] L.R. Mead, Approximate solutions of Fredholm integral equations by the maximum-entropy method, J. Math. Phys. 27 (12) (1986) 2903–2907.
[12] L.R. Mead, N. Papanicolaou, Maximum entropy in the problem of moments, J. Math. Phys. 25 (1984) 2404–2417.
[13] N. Wu, The Maximum Entropy Method, Springer, 1997.