Journal of Econometrics 37 (1988) 195-209. North-Holland

CALCULATION AND APPROXIMATION OF MAXIMUM ENTROPY DISTRIBUTIONS AND OF MARGINAL POSTERIOR DISTRIBUTIONS*

Arnold ZELLNER and Richard A. HIGHFIELD
University of Chicago, Chicago, IL 60637, USA

Received June 1986, final version received June 1987

It is shown how members of the broad class of Cobb-Koppstein-Chen probability density functions can be generated by maximizing Shannon's entropy subject to appropriate side conditions. A particular member of the class, the quartic exponential distribution, is obtained by maximizing Shannon's entropy subject to the conditions that the probability density function integrate to one and have given first four moments. An algorithm for computing the quartic exponential distribution is provided and applied to approximate marginal posterior probability densities. These approximations are compared with those provided by the Pearson density function approximation procedure and exact densities obtained by numerical integration. It is found that the maximum entropy method performs very well.

* Research financed by funds from the H.G.B. Alexander Endowment Fund, Graduate School of Business, University of Chicago and from NSF Grant SES791314. We thank T. Amemiya and the referees for helpful comments.
1. Introduction

Shannon (1948) has indicated how maximum entropy (ME) distributions can be derived by a straightforward application of calculus of variations techniques. In particular, he has shown that the ME distribution that maximizes entropy subject just to a normalization condition is the uniform distribution. When additional side conditions giving the first two moments of the distribution are imposed, the ME distribution is the normal distribution. In the present article, we provide an algorithm for computing the ME distribution subject to a normalization condition and side conditions on the first four moments of the distribution. The resulting distribution thus maximizes entropy and has four given moments as well as being normalized.

Jaynes (1982, p. 1; 1983, p. 317) has stated:

    It has long been recognized, or conjectured, that the notion of entropy defines a kind of measure on the space of probability distributions such that those of high entropy are in some sense favored over others. The basis for this was stated first in a variety of intuitive forms: that the distributions of high entropy represent greater 'disorder', that they are
    'smoother', that they 'assume less', according to Shannon's interpretation of entropy as an information measure, etc.
He goes on to write that 'probably most information theorists have considered it obvious that, in some sense, the possible distributions are concentrated strongly near the one of maximum entropy; i.e. that distributions with appreciably lower entropy than the maximum are atypical of those allowed by the constraints' [Jaynes (1982, p. 1)] and presents a 'Concentration Theorem' to make this point analytically. In the present context, there are other distributions that have given first four moments, say Pearson curves, but they differ from the ME distribution in having lower entropy. Also, while we have concentrated attention on the case of four given moments in order to make comparison with the Pearson approach, it is possible to use more moment constraints in deriving maximum entropy distributions. Use of four moment constraints does allow, however, for skewness, kurtosis and possible multiple modes in the maximum entropy distribution.

In what follows, we first present an algorithm for computing ME distributions subject to side conditions on the zeroth and first four moments. Then we present an example illustrating the procedure and compare the results with a distribution fitted by the Pearson approach. Our interest in this problem arose from the problem of calculating marginal posterior distributions from a high-dimensional joint posterior distribution. While this can be done using Monte Carlo numerical integration procedures as, for example, in Kloek and van Dijk (1980) and Zellner and Rossi (1984), this procedure is expensive in terms of computing time. An alternative procedure is to compute the first four moments of each variable in the joint distribution and to use a ME distribution or a Pearson distribution to approximate the marginal distribution of each variable. We compare the ME and Pearson approximations and then go on to relate the particular ME distribution that we employ to a class of distributions put forward by Cobb, Koppstein and Chen (1983). We also provide conditions under which the Cobb-Koppstein-Chen distributions are ME distributions.

2. Maximum entropy distributions
Shannon (1948) defines entropy (or uncertainty) as

    W = -∫ p(x) log p(x) dx,      (1)

where p(x) is a probability density function.¹
¹ It has come to be appreciated that entropy should be defined as W′ = -∫ p(x) log[p(x)/m(x)] dx, where m(x) is a measure function; see, e.g., Jaynes (1968). We assume m(x) = c, a constant. Then W′ = W + log c with W given in (1), and thus maximization of W′ is equivalent to maximization of W. Also, the integrals in (1) and in all subsequent equations, unless otherwise noted, are definite integrals over the range of x. In the examples in section 4, this range is the real line.
Maximizing W subject to various side conditions is well known in the literature as a method for deriving the forms of minimal information prior distributions; e.g., Jaynes (1968) and Zellner (1971, p. 43; 1977). Jaynes (1982) has extensively analyzed examples in the discrete case, while in Gokhale (1975), Kagan, Linnik and Rao (1973) and Lisman and van Zuylen (1972) continuous cases are considered. In our problem, it is the following maximization which we want to perform:

    maximize  W = -∫ p(x) log p(x) dx,      (2a)

subject to

    ∫ p(x) dx = μ₀′,      (2b)
    ∫ x p(x) dx = μ₁′,      (2c)
    ∫ x² p(x) dx = μ₂′,      (2d)
    ∫ x³ p(x) dx = μ₃′,      (2e)
    ∫ x⁴ p(x) dx = μ₄′,      (2f)
where μ₀′ = 1 and μ₁′, μ₂′, μ₃′ and μ₄′ are the given four non-central moments of the distribution. This problem can be expressed as the Lagrangian
    L = ∫ p(x) log p(x) dx + λ₀[∫ p(x) dx - 1] + λ₁[∫ x p(x) dx - μ₁′]
        + λ₂[∫ x² p(x) dx - μ₂′] + λ₃[∫ x³ p(x) dx - μ₃′] + λ₄[∫ x⁴ p(x) dx - μ₄′],      (3)

for which the necessary conditions for a stationary point are

    log p(x) + 1 + λ₀ + λ₁x + λ₂x² + λ₃x³ + λ₄x⁴ = 0,      (4)
plus the five constraints given in (2b) through (2f) above, with the matrix assumed positive definite. From (4) it is seen that the ME distribution of x will have the following general form:

    p(x|λ) = exp(-[1 + λ₀ + λ₁x + λ₂x² + λ₃x³ + λ₄x⁴]),      (5)
where λ′ = (λ₀, λ₁, λ₂, λ₃, λ₄) and λ₄ > 0. To specify the ME distribution completely, our next step would be to substitute (5) into (2b) through (2f) and solve this highly non-linear set of equations for the lambdas in terms of the four known moments. This is not trivial as it involves, among other things, the integration of an exponential function wherein the exponent is quartic.² Having attempted, without success, to arrive at a general analytic solution, we adopt the standard Newton method of solution that should lead to good approximate solutions in specific cases. That is, we view (5) as a function of the lambdas, expand it in a Taylor's series about trial values for the lambdas, drop the quadratic (in the lambdas) and higher-order terms, and solve iteratively. If we make the following definition:
    Gᵢ(λ) = ∫ xⁱ exp(-[1 + λ₀ + λ₁x + λ₂x² + λ₃x³ + λ₄x⁴]) dx,  for i = 0, 1, ..., 8,      (6)

our initial set of equations can then be expressed as

    μᵢ′ = Gᵢ(λ) ≈ Gᵢ(λ⁰) + (λ₀ - λ₀⁰)gᵢ₀ + (λ₁ - λ₁⁰)gᵢ₁ + (λ₂ - λ₂⁰)gᵢ₂
                + (λ₃ - λ₃⁰)gᵢ₃ + (λ₄ - λ₄⁰)gᵢ₄,  for i = 0, 1, 2, 3, 4,      (7)
where λ⁰′ = (λ₀⁰, λ₁⁰, λ₂⁰, λ₃⁰, λ₄⁰) is a vector of initial values, Gᵢ(λ⁰) is Gᵢ(λ) evaluated at λ⁰, and gᵢⱼ is ∂Gᵢ/∂λⱼ evaluated at λ⁰. It can now be seen that all of the partial derivatives gᵢⱼ are simply moments of the ME distribution for the given trial values for lambda, i.e., moments of p(x|λ⁰). Indeed we can state that

    -Gᵢ(λ⁰) = gₖⱼ

for all 0 ≤ k, j ≤ 4 such that k + j = i, and for i = 0, 1, 2, ..., 8. This allows us to restate our eq. (7) as the simple linear system

    [ G₀  G₁  G₂  G₃  G₄ ] [ δ₀⁰ ]   [ G₀ - μ₀′ ]
    [ G₁  G₂  G₃  G₄  G₅ ] [ δ₁⁰ ]   [ G₁ - μ₁′ ]
    [ G₂  G₃  G₄  G₅  G₆ ] [ δ₂⁰ ] = [ G₂ - μ₂′ ]      (8)
    [ G₃  G₄  G₅  G₆  G₇ ] [ δ₃⁰ ]   [ G₃ - μ₃′ ]
    [ G₄  G₅  G₆  G₇  G₈ ] [ δ₄⁰ ]   [ G₄ - μ₄′ ]
where δⱼ⁰ = λⱼ - λⱼ⁰ and all Gᵢ are evaluated at λ⁰. This system is solved for δ⁰; λ¹ = λ⁰ + δ⁰ then becomes our new vector of trial lambdas, and iterations continue until δ becomes appropriately small. Of course this requires that the Hankel matrix on the left side of (8) be non-singular. A proof is given in a footnote.³ Further, an upper bound on the distance between a computed solution and the actual solution can be computed as described in Stoer (1976, p. 225 ff.). The only difference between this procedure and the standard procedure for iterative solution of a non-linear system is the necessity of solving the nine integrals in (6) given the current trial lambdas at each iteration. These are calculated numerically using univariate Simpson's rule [see, e.g., Zellner (1971, p. 400) for a discussion of this technique]. The procedure has been programmed in FORTRAN for the DEC-20 computer at the University of Chicago Graduate School of Business. It has been applied to several sample problems, some of which are given below. In our experience to date, convergence to a solution accurate to five decimal places has generally required ten to twenty iterations and has consumed less than five seconds of CPU time when the four moments are given and 200 subdivisions along the x axis are used for every numerical integration.

² See O'Toole (1933) for work on this integration.

³ Let Q be the matrix on the left side of eq. (8), and let y′ = (y₀, y₁, y₂, y₃, y₄) be any non-null vector. Then

    y′Qy = y₀²G₀ + 2y₀y₁G₁ + (y₁² + 2y₀y₂)G₂ + 2(y₀y₃ + y₁y₂)G₃ + (y₂² + 2y₀y₄ + 2y₁y₃)G₄
           + 2(y₁y₄ + y₂y₃)G₅ + (y₃² + 2y₂y₄)G₆ + 2y₃y₄G₇ + y₄²G₈
         = ∫ [y₀ + y₁x + y₂x² + y₃x³ + y₄x⁴]² p(x|λ⁰) dx.

Since this quantity is always positive it follows that the matrix Q is positive definite.
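To make the steps concrete, the following minimal Python sketch mirrors the procedure just described. It is not the authors' FORTRAN program: the function name and defaults are ours, the finite integration range standing in for the real line must be supplied by the user, and the starting values (the two-moment, i.e. normal, ME solution) are our own choice.

    import numpy as np

    def fit_quartic_me(mu, lo, hi, n=201, tol=1e-8, max_iter=50):
        # Solve eqs. (6)-(8) for the quartic exponential ME density.
        #   mu     : given raw moments (mu0', ..., mu4'), with mu0' = 1
        #   lo, hi : finite integration range standing in for the real line
        #   n      : odd number of grid points, i.e., n - 1 subdivisions
        mu = np.asarray(mu, dtype=float)
        x = np.linspace(lo, hi, n)
        xp = x[None, :] ** np.arange(9)[:, None]    # rows hold x^0, ..., x^8

        # Simpson's-rule weights, matching the quadrature described above.
        h = (hi - lo) / (n - 1)
        w = np.ones(n)
        w[1:-1:2], w[2:-1:2] = 4.0, 2.0
        w *= h / 3.0

        # Trial values: the two-moment (normal) ME solution, lam3 = lam4 = 0.
        m, v = mu[1], mu[2] - mu[1] ** 2
        lam = np.array([m * m / (2 * v) + 0.5 * np.log(2 * np.pi * v) - 1.0,
                        -m / v, 1.0 / (2 * v), 0.0, 0.0])

        for _ in range(max_iter):
            p = np.exp(-(1.0 + lam @ xp[:5]))       # current density, eq. (5)
            G = (xp * p) @ w                        # G_0, ..., G_8 of eq. (6)
            Q = np.array([[G[i + j] for j in range(5)]
                          for i in range(5)])       # Hankel matrix of eq. (8)
            delta = np.linalg.solve(Q, G[:5] - mu)  # Newton step from (8)
            lam += delta                            # new trial lambdas
            if np.abs(delta).max() < tol:           # stop when steps are small
                break
        return lam

Each pass costs the nine quadratures of (6) plus one 5 x 5 linear solve, which is the accounting behind the iteration counts and CPU times reported above.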
3. Uniqueness of the ME distribution
The calculation of the parameters of the ME distribution involves the solution of a set of non-linear equations and thus the possibility of multiple solutions exists. Aroian (1948) and Matz (1978) consider the problem of calculating the maximum likelihood (ML) estimates of the parameters of a quartic exponential distribution such as (5) from a random sample from such a distribution. This ML problem is conceptually quite different from our problem of fitting a ME density to four known moments. It is simple to show, however, that the first-order conditions (FOC) for the ML estimates are mathematically the same as those given above in our ME procedure [with sample moments replacing the given moments in (2b) through (2f) above]. Since the maximum likelihood estimates for the exponential family are unique [see Barndorff-Nielsen (1973)], the problem of arriving at the ML estimates is one of discarding the inadmissible solutions to the FOC or showing that only one exists. Matz (1978) has shown that a unique solution to the FOC does exist for the special case of a symmetric quartic exponential distribution, and in simulations involving the general case has found that unique solutions to the FOC exist over a broad range of the parameter space.

The uniqueness issue can be resolved by noting that the matrix on the left-hand side of (8) is the Jacobian matrix for the mapping R⁴ × R₊ → R⁵ represented by the first-order condition eqs. (2b) through (2f) once (5) has been substituted for p(x). Since this Jacobian matrix is positive definite symmetric, we can appeal to the well-known result of Gale and Nikaido (1965) that this mapping is univalent in an arbitrary convex region of R⁵. Thus, if the FOC have a solution, it is unique over the parameter space.

4. Computed examples
Our examples come from the Zellner and Rossi (1984) Bayesian analysis of a two-parameter logit model. The analysis employs the population model

    yᵢ = 1 with probability Pᵢ,
       = 0 with probability 1 - Pᵢ,

where Pᵢ = F(β₀ + β₁xᵢ) = 1/[1 + exp{-(β₀ + β₁xᵢ)}], that is, F is the logistic cumulative distribution function. In that paper samples of sizes 50, 25 and 15 were generated with β₀ = -4.0 and β₁ = 0.8 and values of the independent variable xᵢ drawn from a normal distribution. Posterior distributions of β₀ and β₁, based on diffuse priors, were analyzed for the three samples (N = 50, 25 and 15).
Table 1
Moments of marginal posterior densities for β₀ and β₁.

               μ₁        μ₂        μ₃         μ₄
N = 50
    β₀       -3.82      5.21     -2.56       85.7
    β₁        0.845     0.221     0.023       0.154
N = 25
    β₀       -2.64      7.69     -4.40      184.0
    β₁        0.643     0.334     0.0503      0.354
N = 15
    β₀      -12.5      44.2    -164.       6250.0
    β₁        2.35      1.56      1.06        7.71
Table 2
Values of parameters in approximate ME marginal densities for β₀ and β₁.

              λ₀          λ₁          λ₂          λ₃          λ₄
N = 50
    β₀     2.15554     0.83048     0.13163     0.00302     0.00001
    β₁     0.82375    -4.41293     3.17480    -0.34097     0.00084
N = 25
    β₀     1.32644     0.34511     0.07888     0.00172     0.00001
    β₁    -0.08708    -2.02503     1.97757    -0.25099     0.01163
N = 15
    β₀     4.01779     0.54281     0.04258     0.00123     0.00001
    β₁     2.33347    -2.86409     1.19593    -0.18524     0.01177
The moments and 200 ordinates of the marginal posterior densities for β₀ and β₁ were obtained through numerical integration using a bivariate version of Simpson's rule. As explained in Zellner and Rossi (1984, p. 381, fn. 7), many bivariate numerical integrations are required to obtain the ordinates of these marginal densities. The means and higher moments about the means are given in table 1. Application of the above procedure with these given moments (converted to raw moments) results in the parameters for (5) given in table 2, the ME distribution. Table 3 shows selected ordinates of the exact marginal posterior distributions, obtained by numerical integration, and the corresponding ordinates for the ME distribution used as an approximation. For the purpose of comparison, the table also includes the ordinates for the curves from the Pearson system that were fitted using the moments in table 1. Details of the procedure for fitting the Pearson curves are presented in Highfield (1982) and an extensive discussion of the Pearson system can be found in Elderton and Johnson (1969). We note that Pearson Type IV curves, which have unlimited range, were fitted for both variables for sample sizes 50 and 25, whereas Pearson Type I curves, which have range limited in both directions, were fitted for both variables when the sample size is 15.
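Converting the central moments of table 1 to the raw moments required in (2b) through (2f) is a routine binomial expansion. A small Python helper (ours and hypothetical, reusing the fit_quartic_me sketch of section 2) illustrates the step for the N = 50, β₀ entry of table 1:

    def raw_from_central(mean, m2, m3, m4):
        # Raw moments mu1', ..., mu4' from the mean and central moments.
        return [1.0,
                mean,
                m2 + mean ** 2,
                m3 + 3 * mean * m2 + mean ** 3,
                m4 + 4 * mean * m3 + 6 * mean ** 2 * m2 + mean ** 4]

    # beta_0 at N = 50 in table 1: mean -3.82, mu2 = 5.21, mu3 = -2.56, mu4 = 85.7.
    mu = raw_from_central(-3.82, 5.21, -2.56, 85.7)
    lam = fit_quartic_me(mu, lo=-15.0, hi=5.0)   # abscissa range of table 3, panel A

The resulting lambdas should be close to the corresponding row of table 2, up to quadrature and convergence tolerances.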
Table 3
Ordinates for the exact and approximate marginal posterior probability density functions of the parameters of a two-parameter logit model.

A. Sample size n = 50

            Marginal density of β₀                        Marginal density of β₁
 Abscissa   Exactᵃ   Max. ent.ᵇ  Pearsonᶜ      Abscissa   Exactᵃ   Max. ent.ᵇ  Pearsonᶜ
   -15.0    0.0000    0.0000     0.0000          -1.0     0.0000    0.0001     0.0001
   -14.0    0.0001    0.0001     0.0001          -0.8     0.0007    0.0005     0.0006
   -13.0    0.0003    0.0003     0.0003          -0.6     0.0039    0.0034     0.0035
   -12.0    0.0009    0.0008     0.0008          -0.4     0.0168    0.0163     0.0164
   -11.0    0.0025    0.0023     0.0024          -0.2     0.0582    0.0587     0.0584
   -10.0    0.0064    0.0062     0.0063           0.0     0.1592    0.1614     0.1606
    -9.0    0.0152    0.0150     0.0150           0.2     0.3426    0.3446     0.3440
    -8.0    0.0325    0.0325     0.0324           0.4     0.5819    0.5800     0.5806
    -7.0    0.0618    0.0622     0.0619           0.6     0.7877    0.7824     0.7840
    -6.0    0.1027    0.1032     0.1029           0.8     0.8635    0.8598     0.8608
    -5.0    0.1461    0.1463     0.1463           1.0     0.7812    0.7823     0.7819
    -4.0    0.1747    0.1740     0.1744           1.2     0.5950    0.5990     0.5978
    -3.0    0.1718    0.1707     0.1712           1.4     0.3890    0.3923     0.3914
    -2.0    0.1360    0.1357     0.1359           1.6     0.2222    0.2233     0.2232
    -1.0    0.0852    0.0860     0.0857           1.8     0.1127    0.1122     0.1126
     0.0    0.0417    0.0426     0.0423           2.0     0.0515    0.0507     0.0510
     1.0    0.0159    0.0162     0.0161           2.2     0.0215    0.0209     0.0210
     2.0    0.0048    0.0047     0.0047           2.4     0.0082    0.0080     0.0080
     3.0    0.0011    0.0010     0.0011           2.6     0.0029    0.0029     0.0028
     4.0    0.0002    0.0002     0.0002           2.8     0.0010    0.0010     0.0009
     5.0    0.0000    0.0000     0.0000           3.0     0.0003    0.0003     0.0003

B. Sample size n = 25

            Marginal density of β₀                        Marginal density of β₁
 Abscissa   Exactᵃ   Max. ent.ᵇ  Pearsonᶜ      Abscissa   Exactᵃ   Max. ent.ᵇ  Pearsonᶜ
  -15.00    0.0001    0.0000     0.0000         -2.00     0.0000    0.0000     0.0000
  -13.95    0.0002    0.0002     0.0001         -1.75     0.0000    0.0000     0.0000
  -12.90    0.0005    0.0005     0.0006         -1.50     0.0000    0.0001     0.0001
  -11.85    0.0012    0.0012     0.0014         -1.25     0.0002    0.0009     0.0009
  -10.80    0.0030    0.0030     0.0033         -1.00     0.0071    0.0056     0.0057
   -9.75    0.0069    0.0069     0.0071         -0.75     0.0284    0.0259     0.0259
   -8.70    0.0145    0.0145     0.0145         -0.50     0.0863    0.0861     0.0859
   -7.65    0.0280    0.0280     0.0275         -0.25     0.2085    0.2129     0.2126
   -6.60    0.0491    0.0491     0.0480          0.00     0.3964    0.4014     0.4014
   -5.55    0.0774    0.0773     0.0763          0.25     0.5924    0.5907     0.5913
   -4.50    0.1087    0.1086     0.1084          0.50     0.7022    0.6948     0.6953
   -3.45    0.1345    0.1345     0.1355          0.75     0.6725    0.6674     0.6673
   -2.40    0.1453    0.1453     0.1470          1.00     0.5332    0.5347     0.5342
   -1.35    0.1353    0.1354     0.1369          1.25     0.3594    0.3643     0.3639
   -0.30    0.1075    0.1076     0.1071          1.50     0.2113    0.2151     0.2151
    0.75    0.0721    0.0720     0.0708          1.75     0.1110    0.1120     0.1121
    1.80    0.0402    0.0402     0.0394          2.00     0.0531    0.0523     0.0524
    2.85    0.0185    0.0184     0.0201          2.25     0.0236    0.0222     0.0223
    3.90    0.0069    0.0069     0.0074          2.50     0.0098    0.0087     0.0087
    4.95    0.0021    0.0021     0.0026          2.75     0.0039    0.0032     0.0032
    6.00    0.0005    0.0005     0.0008          3.00     0.0015    0.0011     0.0011

C. Sample size n = 15

            Marginal density of β₀                        Marginal density of β₁
 Abscissa   Exactᵃ   Max. ent.ᵇ  Pearsonᶜ      Abscissa   Exactᵃ   Max. ent.ᵇ  Pearsonᶜ
  -35.00    0.0015    0.0008     0.0008         -2.0      0.0000    0.0000     -ᵈ
  -32.95    0.0023    0.0016     0.0015         -1.5      0.0000    0.0000     -ᵈ
  -30.90    0.0035    0.0030     0.0027         -1.0      0.0009    0.0005     -ᵈ
  -28.85    0.0053    0.0051     0.0046         -0.5      0.0073    0.0062     0.0014
  -26.80    0.0078    0.0080     0.0075          0.0      0.0354    0.0357     0.0370
  -24.75    0.0114    0.0119     0.0117          0.5      0.1091    0.1133     0.1268
  -22.70    0.0164    0.0169     0.0173          1.0      0.2204    0.2250     0.2301
  -20.65    0.0230    0.0234     0.0245          1.5      0.3130    0.3127     0.3013
  -18.60    0.0313    0.0315     0.0331          2.0      0.3393    0.3344     0.3208
  -16.55    0.0411    0.0408     0.0422          2.5      0.3026    0.2972     0.2940
  -14.50    0.0512    0.0507     0.0510          3.0      0.2354    0.2330     0.2393
  -12.45    0.0598    0.0591     0.0576          3.5      0.1670    0.1680     0.1762
  -10.40    0.0639    0.0632     0.0605          4.0      0.1106    0.1141     0.1186
   -8.35    0.0606    0.0604     0.0580          4.5      0.0700    0.0735     0.0733
   -6.30    0.0492    0.0497     0.0494          5.0      0.0429    0.0444     0.0417
   -4.25    0.0329    0.0337     0.0359          5.5      0.0257    0.0244     0.0218
   -2.20    0.0174    0.0180     0.0204          6.0      0.0151    0.0117     0.0104
   -0.15    0.0070    0.0072     0.0075          6.5      0.0057    0.0046     0.0046
    1.90    0.0022    0.0020     0.0010          7.0      0.0000    0.0013     0.0018
    3.95    0.0005    0.0004     0.0000          7.5      0.0000    0.0003     0.0006
    6.00    0.0001    0.0000     -ᵈ              8.0      0.0000    0.0000     0.0002

ᵃ Taken from an example of a two-parameter logit model using simulated data found in Zellner and Rossi (1984). The population model is yᵢ = 1 with probability Pᵢ and yᵢ = 0 with probability 1 - Pᵢ, where Pᵢ = F(β₀ + β₁xᵢ) and F is the logistic cumulative distribution function. In the example β₀ = -4.0 and β₁ = 0.8. The moments and ordinates of the exact marginal posterior distribution were calculated using a bivariate version of Simpson's rule from the joint posterior distribution.
ᵇ Maximum entropy ordinates are obtained by deriving the maximum entropy distribution p(x) subject to the four known moments and a normalization constraint. The parameters of the result p(x|λ) = exp(-[1 + λ₀ + λ₁x + λ₂x² + λ₃x³ + λ₄x⁴]) were derived through Taylor's series expansion and solving iteratively.
ᶜ Pearson ordinates were obtained by fitting the curve from the Pearson system which has the same moments. The curves fit were Pearson Type IV curves for sample sizes 50 and 25, and Pearson Type I for sample size 15.
ᵈ The abscissa value is outside the range for the Pearson curve.
Table 4
Calculated entropies of exact, ME and Pearson densities.

                 Exact density   ME density   Pearson density
N = 50    β₀        2.240           2.240          2.240
          β₁        0.660           0.660          0.660
N = 25    β₀        2.435           2.435          2.435
          β₁        0.863           0.865          0.865
N = 15    β₀        3.270           3.282          3.279
          β₁        1.598           1.611          1.608
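The ME entries of table 4 can be recomputed from the fitted parameters with the same sort of quadrature. A minimal sketch (our hypothetical helper, using a trapezoidal rule for brevity, with lam the vector returned by the fit_quartic_me sketch above):

    import numpy as np

    def entropy(lam, lo, hi, n=201):
        # W = -integral of p log p dx for p(x|lam) of eq. (5).
        x = np.linspace(lo, hi, n)
        e = 1.0 + sum(lam[k] * x ** k for k in range(5))   # quartic exponent
        f = np.exp(-e) * e                                 # -p log p = p * e
        h = (hi - lo) / (n - 1)
        return h * (f.sum() - 0.5 * (f[0] + f[-1]))        # trapezoidal rule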
As can be seen in table 3, the ME approximations seem to capture the shape of the exact distributions quite well. Differences between the exact and approximating ordinates occur most often at the third and fourth decimal places. Although the largest relative errors of approximation appear to occur in the tails, they are still quite small. For sample sizes 50 and 25 we also note that there is little to choose between the ME and Pearson approximations. However, for sample size 15 the ME distribution provides a better 'fit', particularly in the tails, since the range for the Pearson curve is bounded.

The similarity of the ordinates of the three distributions is, perhaps, not surprising when one considers the values of their entropies, which are listed in table 4. As one would expect from Jaynes' (1982, 1983) comments on entropy concentration, the entropies of both the exact and Pearson densities are very close to the maximum.

While the results from this example are certainly encouraging, we should note that this is not necessarily indicative of a general result. We recommend that ordinates of the fitted ME distribution (or Pearson curve) always be checked against selected ordinates for the exact distribution. This will prevent, for example, the approximation of a bimodal exact distribution with a unimodal distribution. We suspect, however, that in most well-behaved problems the ME approximation will usually be quite good.

Finally, as noted above, a ML procedure such as that proposed by Matz (1978) and our ME procedure should result in the same parameter estimates for the quartic exponential distribution. Matz computes the estimates for two examples in his paper. We computed the parameters using our ME procedure, taking Matz' sample moments as given. As expected, we achieved the same result. We note that Matz' procedure is more complex than ours but may be more efficient computationally. Some linear relationships between the parameters of the likelihood function and its moments are exploited to reduce the number of non-linear equations to be solved to two, plus the calculation of a normalization constant.
5. Relationship to other distributions
In this section we relate the maximum entropy distribution derived above to a class of distributions recently considered by others, and in so doing will highlight some properties of the ME distribution. Cobb, Koppstein and Chen (1983) consider a general class of non-mixture multimodal densities which arise as the stationary density functions of a non-linear diffusion process of the form

    dxₜ = μ(xₜ) dt + σ(xₜ) dwₜ,      (9)
where wₜ is a standard Wiener process, 2μ(x) = g(x) - u′(x) and σ²(x) = u(x). Such densities are useful in catastrophe theory since the multiple modes can be interpreted as a result of multiple stable states in dynamical systems, rather than as the result of heterogeneous populations, the usual interpretation of multimodal mixture densities. The general class considered by Cobb, Koppstein and Chen can be expressed as
    fₖ(x|β) = ξ(β) exp(-∫[gₖ(x)/u(x)] dx),      (10)

where gₖ(x) = β₀ + β₁x + ... + βₖxᵏ, k > 0, β′ = (β₀, β₁, ..., βₖ), the integral in the exponential is indefinite, the domain of fₖ(x) is the open interval on which u(x) is positive, and where ξ(β) is a normalization constant given the values of the coefficients of g(x). The number of modes is determined by k, and the density approaches 0 or approaches ∞ at the roots of u(x), depending on the coefficients of g(x) [unless g(x) and u(x) have common roots]. Many common univariate distributions arise as special cases of (10). In particular, the members of the Pearson system can be expressed in this form with gₖ(x) being of degree 1 and u(x) being of degree ≤ 2. We shall show that the ME distribution given in (5) is also a special case.

Cobb, Koppstein and Chen (1983) restrict themselves to four types of densities within the general class. Each type corresponds to a particular specification for u(x), the variance of the underlying stochastic process:
    Type N:   u(x) = 1,          -∞ < x < ∞,
    Type G:   u(x) = x,           0 < x < ∞,
    Type I:   u(x) = x²,          0 < x < ∞,
    Type B:   u(x) = x(1 - x),    0 < x < 1.
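To make the class concrete (our illustration, not the authors'): taking the Type G specification with k = 1, so that g₁(x) = β₀ + β₁x and u(x) = x, (10) becomes

    f₁(x|β) = ξ(β) exp(-∫[(β₀ + β₁x)/x] dx) = ξ(β) x^(-β₀) e^(-β₁x),

which for β₀ < 1 and β₁ > 0 is a gamma density with shape 1 - β₀ and rate β₁. The gamma is the familiar ME density under given values of E(x) and E(log x), which anticipates the Gₖ row of table 5 below.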
The problem they consider is the estimation of the parameters of fₖ(x) given the type and k, the degree of gₖ(x). In their notation, for example, a Type B density with k = 3 is denoted by Type B₃.

The class of densities described by (10) is closely related to the class of densities arrived at by maximizing entropy subject to various side conditions. Indeed the ME distribution given in (5) is equivalent to a Type N₃ distribution. To see this rewrite (5) as follows:

    p(x|λ) = exp(-[1 + λ₀ - c]) × exp(-[λ₁x + λ₂x² + λ₃x³ + λ₄x⁴ + c]),      (11)

and we see that with u(x) = 1 we have

    ∫ g(x) dx = λ₁x + λ₂x² + λ₃x³ + λ₄x⁴ + c,      (12)

and so

    g(x) = λ₁ + 2λ₂x + 3λ₃x² + 4λ₄x³,      (13)

where β₀ = λ₁, β₁ = 2λ₂, β₂ = 3λ₃, β₃ = 4λ₄, and ξ(β) = exp(-[1 + λ₀ - c]). Thus we see that the Type N₃ density can be obtained by maximizing entropy subject to four moment constraints. Since u(x) = 1, the modal points of this density are the roots of (13) for which g(x) has a sign change from - to +. Further, the ME density is now seen to be the stationary density of a non-linear diffusion process. Similarly, it is easy to show that the other types considered by Cobb, Koppstein and Chen can be obtained by maximizing entropy subject to side conditions as indicated in table 5.
Table 5

Type   Maximize entropy subject to:
Nₖ     (k + 1)-moment constraints and normalization, for -∞ < x < ∞
Gₖ     k-moment constraints, normalization and the constraint that ∫ log x p(x) dx = c, a known constant, for 0 < x < ∞
Iₖ     (k - 1)-moment constraints, normalization and the constraints that ∫ log x p(x) dx = c₁ and ∫ (1/x) p(x) dx = c₂, c₁ and c₂ known constants, for 0 < x < ∞
Bₖ     (k - 1)-moment constraints, normalization and the constraints that ∫ log x p(x) dx = c₁ and ∫ log(1 - x) p(x) dx = c₂, c₁ and c₂ known constants, for 0 < x < 1
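Returning to the Type N₃ reading of the fitted quartic exponential, the modal points described below eq. (13) are directly computable from the lambdas. A brief sketch (our hypothetical helper, using the same λ ordering as table 2):

    import numpy as np

    def me_modes(lam, eps=1e-6):
        # Real roots of g(x) = lam1 + 2 lam2 x + 3 lam3 x^2 + 4 lam4 x^3
        # at which g changes sign from - to +, i.e., modes of p(x|lam).
        g = np.polynomial.Polynomial([lam[1], 2 * lam[2], 3 * lam[3], 4 * lam[4]])
        return [r.real for r in g.roots()
                if abs(r.imag) < 1e-9 and g(r.real - eps) < 0 < g(r.real + eps)]

For the unimodal examples of section 4 this returns a single mode; up to two modes can arise when the cubic in (13) has three real sign-changing roots.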
6. Conclusions

In this paper we have outlined a method by which the complete form of an approximate marginal posterior density can be derived from the first four moments of the actual density. Further, if one is willing to accept Shannon's entropy measure as a criterion, this approximation is optimal. However, it is certainly not the only method available for fitting densities. For methods involving approximations by series expansions or transformations to normality the reader is referred to Kendall and Stuart (1969, ch. 6) and Elderton and Johnson (1969). Also, in general situations in which information is available about the moments of a distribution, the maximum entropy distribution is a distribution that deserves serious consideration as a hypothesized population distribution. Further, the distribution approximation technique described in this paper can be applied to a broad range of problems, e.g., to approximate the distribution of the Durbin-Watson test statistic, as suggested by the editor, or any statistic or estimator with known moments. Finally, we have shown that particular ME densities are identical to those arising as stationary density functions of a rather general non-linear diffusion process.
Postscript

Recently, Professor Lawrence R. Mead of the Department of Physics, Washington University, St. Louis, Missouri has drawn our attention to his and N. Papanicolaou's paper, 'Maximum entropy in the problem of moments', Journal of Mathematical Physics 25 (8), August 1984, 2404-2417, in which an approach similar to ours is put forward and compared to Padé and other approximation approaches. The August 1982 version of our paper was presented to the meeting of the NBER-NSF Seminar on Bayesian Inference in Econometrics, held in October 1982.
References

Aroian, L.A., 1948, The fourth degree exponential distribution function, Annals of Mathematical Statistics 19, 589-592.
Barndorff-Nielsen, O., 1973, Exponential families and conditioning (S. Moller Christensen, A/S, Denmark); reprinted (Wiley, New York).
Cobb, L., P. Koppstein and N.H. Chen, 1983, Estimation and moment recursion relations for multimodal distributions of the exponential family, Journal of the American Statistical Association 78, 124-130.
Elderton, W.P. and N.L. Johnson, 1969, Systems of frequency curves (Cambridge University Press, Cambridge).
Gale, D. and H. Nikaido, 1965, The Jacobian matrix and global univalence of mappings, Mathematische Annalen 159, 81-93.
Gokhale, D.V., 1975, Maximum entropy characterizations of some distributions, in: G.P. Patil et al., eds., Statistical distributions in scientific work, Vol. 3 (Reidel, Dordrecht, Holland) 299-304.
Highfield, R.A., 1982, Approximating marginal density functions with curves from the Pearson system, Manuscript (H.G.B. Alexander Research Foundation, Graduate School of Business, University of Chicago, Chicago, IL).
Jaynes, E.T., 1968, Prior probabilities, IEEE Transactions on Systems Science and Cybernetics SSC-4, 227-241.
Jaynes, E.T., 1982, What is the problem?, Manuscript presented to the Econometrics and Statistics Colloquium, University of Chicago, March 1982; published in: Jaynes (1983).
Jaynes, E.T., 1983, Papers on probability, statistics and statistical physics, R.D. Rosenkrantz, ed. (Reidel Publishing Company, Dordrecht, Holland).
Kagan, A.M., Y.V. Linnik and C.R. Rao, 1973, Characterization problems in mathematical statistics (Wiley, New York).
Kendall, M.G. and A. Stuart, 1969, The advanced theory of statistics, 3rd ed., Vol. 1 (Hafner, New York).
Kloek, T. and H.K. van Dijk, 1980, Bayesian estimates of equation systems parameters: An application of integration by Monte Carlo, Econometrica 46 (1978) 1-19; reprinted in: A. Zellner, ed., Bayesian analysis in econometrics and statistics: Essays in honor of Harold Jeffreys (North-Holland, Amsterdam) 311-329.
Lisman, J.H.C. and M.C.A. van Zuylen, 1972, Note on the generation of most probable frequency distributions, Statistica Neerlandica 26, 19-23.
Matz, A.W., 1978, Maximum likelihood parameter estimation for the quartic exponential distribution, Technometrics 20, 475-484.
O'Toole, A.L., 1933, On the system of curves for which the method of moments is the best method of fitting, Annals of Mathematical Statistics 4, 80-93.
Shannon, C.E., 1948, The mathematical theory of communication, Bell System Technical Journal, July-Oct.; reprinted in: C.E. Shannon and W. Weaver, The mathematical theory of communication (University of Illinois Press, Urbana, IL) 3-91.
Stoer, J., 1976, Einführung in die numerische Mathematik I (Springer Verlag, New York).
Zellner, A., 1971, An introduction to Bayesian inference in econometrics (Wiley, New York).
Zellner, A., 1977, Maximal data information prior distributions, in: A. Aykac and C. Brumat, eds., New methods in the applications of Bayesian methods (North-Holland, Amsterdam).
Zellner, A. and P. Rossi, 1984, Bayesian analysis of dichotomous quantal response models, Journal of Econometrics 25, 365-393.