Economics Letters 6 (1980) 53-57 North-Holland Publishing Company
THE SYMMETRIC MAXIMUM ENTROPY DISTRIBUTION *
Henri THEIL
University of Chicago, Chicago, IL 60637, USA
Received 24 November 1980
The symmetric maximum entropy (ME) distribution is introduced based on symmetrized order statistics. Its properties (including higher-order moments) are compared with those of the non-symmetric ME distribution.
1. The symmetric ME distribution
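The maximum entropy (ME) distribution proposed by Theil and Laitinen (1980) is based on order statistics, $x^1, \ldots, x^n$. With $\bar{x}$ denoting the sample mean, define

$$\tilde{x}^i = \bar{x} + \tfrac{1}{2}\left(x^i - x^{n+1-i}\right), \qquad i = 1, \ldots, n, \tag{1}$$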
so that $\tilde{x}^1, \ldots, \tilde{x}^n$ are symmetrized order statistics which are located symmetrically around $\bar{x}$. The symmetric maximum entropy (SME) distribution is then constructed from the $\tilde{x}^i$'s in the same way that the non-symmetric ME distribution is obtained from the $x^i$'s. It should be possible to extend this approach to incorporate other known properties of the population. ¹

* Helpful discussions with Delores Conway, James F. Meisner and Patricia O'Brien are gratefully acknowledged.
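The symmetrization in (1) can be illustrated with a minimal Python sketch. The function name `symmetrized_order_statistics` and the three-point sample are hypothetical choices made for illustration only; the sketch assumes that $\bar{x}$ in (1) is the sample mean.

```python
def symmetrized_order_statistics(sample):
    """Symmetrized order statistics of eq. (1) (illustrative helper, not from the paper)."""
    x = sorted(sample)                 # order statistics x^1 <= ... <= x^n
    n = len(x)
    mean = sum(x) / n                  # sample mean, assumed to be x-bar in (1)
    # 0-based index i pairs with n - 1 - i, i.e. x^{n+1-i} in the 1-based notation of (1).
    return [mean + 0.5 * (x[i] - x[n - 1 - i]) for i in range(n)]


sample = [1.0, 2.0, 6.0]               # hypothetical data
sym = symmetrized_order_statistics(sample)
mean = sum(sample) / len(sample)

# Each pair (i, n+1-i) of symmetrized order statistics sums to twice the sample mean.
pair_sums = [sym[i] + sym[-1 - i] for i in range(len(sym))]
```

For odd $n$ the middle symmetrized order statistic coincides with the sample mean, and the symmetrized values keep the same mean as the original sample.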
¹ For example, take a sample of size $n$ from a $p$-variate normal distribution so that the $p$ marginal distributions are not only symmetric but also identical up to the mean and variance. This suggests using adjusted order statistics of the form $\bar{x}_h + c_i s_h$ for variable $h$, where $\bar{x}_h$ = sample mean, $\tilde{x}_h^i$ = symmetrized order statistic (1), $c_i$ = arithmetic average of $(\tilde{x}_h^i - \bar{x}_h)/s_h$ over $h = 1, \ldots, p$, and $s_h$ = standard deviation of $\tilde{x}_h^1, \ldots, \tilde{x}_h^n$. It would be worthwhile to extend Meisner's (1980) risk evaluation of ME moment matrices by including those associated with $\tilde{x}_h^i$ and $\bar{x}_h + c_i s_h$. In particular, it seems plausible that the ME moment matrix associated with $\bar{x}_h + c_i s_h$ is relatively precise when the population moment matrix is close to diagonal because the $c_i$'s are then averages of $p$ almost independent data.

2. Evidence and potential evidence

The SME median is obviously identical to the sample mean. A comparison of the mean square sampling errors of the ME and SME medians of samples from a normal population shows that the gain obtained from exploiting symmetry is very small for very small $n$, and that this gain increases for larger $n$. ² We should expect such an increasing gain because $x^i$ and $x^{n+1-i}$ for fixed $i$ become more and more independent as $n \to \infty$ so that $\tilde{x}^i$ in (1) is then obtained by combining independent data. Whereas the ME distribution is primarily effective for small $n$ by exploiting the population's continuity, the SME distribution should be particularly effective for moderate and large $n$ if the population density is indeed symmetric.

² For random samples from a standard normal population, the mean square error (MSE) of the SME median is $1/n$ and that of the ME median was tabulated by Theil and O'Brien (1980) for odd values of $n < 20$. For $n = 3$ the MSE of the ME median is 0.3405 and that of the SME median is 0.3333; for $n = 5$ these values are 0.2336 and 0.2000, and for $n = 19$ they are 0.0753 and 0.0526. A comparison with the corresponding MSEs of sample medians in Theil and O'Brien (1980) indicates that the incremental precision gain from exploiting symmetry in addition to continuity is very small at $n = 3$, but that it soon dominates the gain from exploiting continuity only as $n$ rises. Further evidence can be obtained from a comparison with the ME quartiles analyzed by O'Brien (1980) for values of $n$ so that $q = (n+1)/4$ is an integer (i.e., so that the sample upper quartile is well-defined). The SME interquartile range is identical to the ME interquartile range (even if the population density is asymmetric), while the SME upper quartile has an expectation equal to that of the ME upper quartile if the population density is symmetric. Preliminary calculations for random samples from a normal distribution, based on Teichroew's (1956) tables, indicate that the MSE of the SME upper quartile is only marginally below that of the ME upper quartile at $n = 3$ (or $q = 1$), but that the relative difference is much larger at $n = 7$ (or $q = 2$), in agreement with the results for ME and SME medians described above. These MSE calculations are laborious because each symmetrized order statistic, and hence also the SME upper quartile, involve all $n$ order statistics via $\bar{x}$ in (1).

Another way of assessing the merits of the SME distribution is by means of higher-order moments. The ME distribution is based on primary midpoints, $\xi_i = \tfrac{1}{2}(x^i + x^{i+1})$ for $i = 0, 1, \ldots, n$ with $x^0 = x^1$ and $x^{n+1} = x^n$, and on secondary midpoints, $\bar{x}^i = \tfrac{1}{2}(\xi_{i-1} + \xi_i)$ for $i = 1, \ldots, n$, so that the $r$th order moment of the secondary midpoints around the mean is $(1/n)\sum_i (X^i)^r$, where $X^i = \bar{x}^i - \bar{x}$. It is shown in section 3 that the corresponding moments $\mu_r$ of the ME distribution for $r = 2$, 3 and 4 are
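
$$\mu_2 = \frac{1}{n}\sum_{i=1}^{n} (X^i)^2 + \frac{1}{12n}\sum_{i=1}^{n} \Delta_i + \frac{1}{6n}\left(\Delta_1 + \Delta_n\right), \tag{2}$$

$$\mu_3 = \frac{1}{n}\sum_{i=1}^{n} (X^i)^3 + \frac{1}{4n}\sum_{i=1}^{n} X^i \Delta_i + \frac{1}{2n}\left(X_1 \Delta_1 + X_n \Delta_n\right), \tag{3}$$

$$\mu_4 = \frac{1}{n}\sum_{i=1}^{n} (X^i)^4 + \frac{1}{2n}\sum_{i=1}^{n} (X^i)^2 \Delta_i + \frac{1}{80n}\sum_{i=1}^{n} \Delta_i^2 + \frac{1}{n}\left(X_1^2 \Delta_1 + X_n^2 \Delta_n\right) + \frac{3}{10n}\left(\Delta_1^2 + \Delta_n^2\right), \tag{4}$$
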
where $\Delta_i = (\xi_i - \xi_{i-1})^2$, $X_1 = x^1 - \bar{x}$ and $X_n = x^n - \bar{x}$. The corresponding moments $\tilde{\mu}_r$ of the SME distribution take the same form except that they are calculated from the primary and secondary midpoints associated with the symmetrized order statistics (1). Obviously, $\tilde{\mu}_3 = 0$. We should expect that if the parent distribution is symmetric, kurtosis estimation is more efficient when based on the SME distribution (at least for large $n$) because the imposed symmetry will yield better estimates of the tails. Another reasonable presumption is that the SME distribution should have certain merits with respect to outliers: if either $x^1$ or $x^n$ is an outlier, then $\tilde{x}^1$ and $\tilde{x}^n$ are both less of an outlier.
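As a rough illustration of how the moments (2) to (4) might be evaluated, the following Python sketch computes them for a sample and for its symmetrized counterpart. The function names and the sample are hypothetical, and the coefficients used are those implied by the conditional moments of the uniform and exponential pieces described in section 3; they should be read as an assumption about the closed forms rather than a transcription. The sketch assumes $n \ge 2$.

```python
def symmetrized_order_statistics(sample):
    """Symmetrized order statistics of eq. (1) (illustrative helper)."""
    x = sorted(sample)
    n = len(x)
    mean = sum(x) / n
    return [mean + 0.5 * (x[i] - x[n - 1 - i]) for i in range(n)]


def me_moments(sample):
    """Return (mu2, mu3, mu4) of the ME distribution from the assumed forms (2)-(4)."""
    x = sorted(sample)
    n = len(x)
    mean = sum(x) / n
    ext = [x[0]] + x + [x[-1]]                                   # x^0 = x^1, x^{n+1} = x^n
    xi = [0.5 * (ext[i] + ext[i + 1]) for i in range(n + 1)]     # primary midpoints xi_0..xi_n
    mid = [0.5 * (xi[i - 1] + xi[i]) for i in range(1, n + 1)]   # secondary midpoints
    X = [m - mean for m in mid]                                  # X^i, deviations from the mean
    delta = [(xi[i] - xi[i - 1]) ** 2 for i in range(1, n + 1)]  # Delta_1..Delta_n
    X1, Xn = x[0] - mean, x[-1] - mean                           # extreme deviations X_1, X_n
    d1, dn = delta[0], delta[-1]

    mu2 = (sum(v ** 2 for v in X) / n
           + sum(delta) / (12 * n)
           + (d1 + dn) / (6 * n))
    mu3 = (sum(v ** 3 for v in X) / n
           + sum(v * d for v, d in zip(X, delta)) / (4 * n)
           + (X1 * d1 + Xn * dn) / (2 * n))
    mu4 = (sum(v ** 4 for v in X) / n
           + sum(v * v * d for v, d in zip(X, delta)) / (2 * n)
           + sum(d * d for d in delta) / (80 * n)
           + (X1 ** 2 * d1 + Xn ** 2 * dn) / n
           + 3 * (d1 ** 2 + dn ** 2) / (10 * n))
    return mu2, mu3, mu4


sample = [1.0, 2.0, 6.0]                                         # hypothetical data
me = me_moments(sample)                                          # non-symmetric ME moments
sme = me_moments(symmetrized_order_statistics(sample))           # SME moments
```

Applying the same function to the symmetrized order statistics gives the SME moments; the third of these vanishes up to rounding, in line with the symmetry imposed by (1).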
3. Derivations

The ME distribution consists of $n - 2$ uniform distributions over the intervals $I_2, \ldots, I_{n-1}$ with $I_i = (\xi_{i-1}, \xi_i)$ and two exponential distributions at the tails, $I_1 = (-\infty, \xi_1)$ and $I_n = (\xi_{n-1}, \infty)$. The result (2) follows directly from Theil (1980, eq. (4)). To verify (3) we note that for $i = 2, \ldots, n-1$,

$$\mathcal{E}(X^3 \mid X \in I_i) = \tfrac{1}{4}\left(\xi_{i-1}^3 + \xi_{i-1}^2 \xi_i + \xi_{i-1} \xi_i^2 + \xi_i^3\right) = (\bar{x}^i)^3 + \tfrac{1}{4}\bar{x}^i \Delta_i. \tag{5}$$
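It is shown in the next paragraph that

$$\mathcal{E}(X^3 \mid X \in I_1) = (\bar{x}^1)^3 + \tfrac{1}{4}\left(\bar{x}^1 + 2\xi_0\right)\Delta_1, \tag{6}$$

$$\mathcal{E}(X^3 \mid X \in I_n) = (\bar{x}^n)^3 + \tfrac{1}{4}\left(\bar{x}^n + 2\xi_n\right)\Delta_n. \tag{7}$$

Since each $I_i$ contains a fraction $1/n$ of the mass of the ME distribution, we obtain the third ME moment around zero by averaging (5), (6) and (7):

$$\mathcal{E}(X^3) = \frac{1}{n}\sum_{i=1}^{n} (\bar{x}^i)^3 + \frac{1}{4n}\sum_{i=1}^{n} \bar{x}^i \Delta_i + \frac{1}{2n}\left(\xi_0 \Delta_1 + \xi_n \Delta_n\right). \tag{8}$$
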
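Then, by subtracting from $\mu_3 = \mathcal{E}(X^3) - 3\bar{x}\mu_2 - \bar{x}^3$ the analogous equation for $(1/n)\sum_i (X^i)^3$, we obtain (3) using (2). To verify (7) we note that $\bar{x}^n$ is the mean and $\bar{x}^n - \xi_{n-1}$ is the standard deviation of $X \in I_n$. Then, using Johnson and Kotz (1970, pp. 210-211), we have

$$\mathcal{E}(X^3 \mid X \in I_n) = (\bar{x}^n)^3 + 3\bar{x}^n\left(\bar{x}^n - \xi_{n-1}\right)^2 + 2\left(\bar{x}^n - \xi_{n-1}\right)^3,$$
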
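from which (7) can be derived because $(\bar{x}^n - \xi_{n-1})^2 = \tfrac{1}{4}\Delta_n$ and $\xi_{n-1} = 2\bar{x}^n - \xi_n$. The proof of (6) is analogous. For $i = 2, \ldots, n-1$ we have

$$\mathcal{E}(X^4 \mid X \in I_i) = \tfrac{1}{5}\left(\xi_{i-1}^4 + \xi_{i-1}^3 \xi_i + \xi_{i-1}^2 \xi_i^2 + \xi_{i-1} \xi_i^3 + \xi_i^4\right) = (\bar{x}^i)^4 + \tfrac{1}{2}(\bar{x}^i)^2 \Delta_i + \tfrac{1}{80}\Delta_i^2. \tag{9}$$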
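The uniform-interval results (5) and (9) can be checked exactly with rational arithmetic. The sketch below is only an illustration: the endpoints `a` and `b` stand in for $\xi_{i-1}$ and $\xi_i$ and are hypothetical values, and the closed forms coded here are this sketch's reading of (5) and (9), namely $m^3 + \tfrac14 m\Delta$ and $m^4 + \tfrac12 m^2\Delta + \tfrac1{80}\Delta^2$ with $m$ the interval midpoint and $\Delta$ the squared interval length.

```python
from fractions import Fraction


def uniform_raw_moment(a, b, r):
    """E[X^r] for X uniform on (a, b): (b^(r+1) - a^(r+1)) / ((r+1)(b - a))."""
    return (b ** (r + 1) - a ** (r + 1)) / ((r + 1) * (b - a))


a, b = Fraction(1), Fraction(4)        # hypothetical xi_{i-1} and xi_i
m = (a + b) / 2                        # secondary midpoint of the interval
delta = (b - a) ** 2                   # Delta_i, the squared interval length

third_direct = uniform_raw_moment(a, b, 3)
third_closed = m ** 3 + m * delta / 4                         # form assumed for (5)

fourth_direct = uniform_raw_moment(a, b, 4)
fourth_closed = m ** 4 + m ** 2 * delta / 2 + delta ** 2 / 80  # form assumed for (9)
```

Because `Fraction` arithmetic is exact, agreement between the direct and closed-form values here is an identity check rather than a numerical approximation.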
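It may be shown along the lines of the previous paragraph that

$$\mathcal{E}(X^4 \mid X \in I_1) = (\bar{x}^1)^4 + \tfrac{1}{2}(\bar{x}^1)^2 \Delta_1 + \tfrac{5}{16}\Delta_1^2 + \xi_0^2 \Delta_1, \tag{10}$$

$$\mathcal{E}(X^4 \mid X \in I_n) = (\bar{x}^n)^4 + \tfrac{1}{2}(\bar{x}^n)^2 \Delta_n + \tfrac{5}{16}\Delta_n^2 + \xi_n^2 \Delta_n. \tag{11}$$

Averaging of (9), (10) and (11) yields

$$\mathcal{E}(X^4) = \frac{1}{n}\sum_{i=1}^{n} (\bar{x}^i)^4 + \frac{1}{2n}\sum_{i=1}^{n} (\bar{x}^i)^2 \Delta_i + \frac{1}{80n}\sum_{i=1}^{n} \Delta_i^2 + \frac{1}{n}\left(\xi_0^2 \Delta_1 + \xi_n^2 \Delta_n\right) + \frac{3}{10n}\left(\Delta_1^2 + \Delta_n^2\right). \tag{12}$$

Then, by subtracting from $\mu_4 = \mathcal{E}(X^4) - 4\bar{x}\,\mathcal{E}(X^3) + 6\bar{x}^2\mu_2 + 3\bar{x}^4$ the analogous equation for $(1/n)\sum_i (X^i)^4$, we obtain (4) using (2), (8) and (12).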
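The derivation can be cross-checked by building the ME moments piece by piece and comparing them with the closed forms. The following Python sketch is one way to do this under stated assumptions: the interior pieces are uniform on $(\xi_{i-1}, \xi_i)$, each tail is a shifted exponential whose mean is the adjacent secondary midpoint, the order statistics are distinct, and the closed forms coded in `closed_form_moments` are this sketch's reading of (2) to (4). All function names and the sample are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def midpoints(x):
    """Primary midpoints xi_0..xi_n and secondary midpoints for sorted x."""
    n = len(x)
    ext = [x[0]] + list(x) + [x[-1]]
    xi = [(ext[i] + ext[i + 1]) / 2 for i in range(n + 1)]
    mid = [(xi[i - 1] + xi[i]) / 2 for i in range(1, n + 1)]
    return xi, mid


def raw_moment(x, r):
    """E[X^r] of the ME distribution, averaging exact conditional moments."""
    n = len(x)
    xi, mid = midpoints(x)
    total = Fraction(0)
    for i in range(2, n):                       # uniform pieces I_2..I_{n-1}
        a, b = xi[i - 1], xi[i]
        total += (b ** (r + 1) - a ** (r + 1)) / ((r + 1) * (b - a))
    s_lo = xi[1] - mid[0]                       # exponential scale, left tail
    s_hi = mid[-1] - xi[n - 1]                  # exponential scale, right tail
    for k in range(r + 1):                      # binomial expansion, E[Y^k] = k! s^k
        c = comb(r, k) * factorial(k)
        total += c * xi[1] ** (r - k) * (-s_lo) ** k      # X = xi_1 - Y
        total += c * xi[n - 1] ** (r - k) * s_hi ** k     # X = xi_{n-1} + Y
    return total / n


def central_moments_direct(x):
    """Central moments from raw moments, as in the text's formulas for mu_3 and mu_4."""
    m = raw_moment(x, 1)
    mu2 = raw_moment(x, 2) - m ** 2
    mu3 = raw_moment(x, 3) - 3 * m * mu2 - m ** 3
    mu4 = raw_moment(x, 4) - 4 * m * raw_moment(x, 3) + 6 * m ** 2 * mu2 + 3 * m ** 4
    return mu2, mu3, mu4


def closed_form_moments(x):
    """Assumed closed forms (2)-(4)."""
    n = len(x)
    xi, mid = midpoints(x)
    mean = sum(x) / n
    X = [v - mean for v in mid]
    d = [(xi[i] - xi[i - 1]) ** 2 for i in range(1, n + 1)]
    X1, Xn = x[0] - mean, x[-1] - mean
    mu2 = sum(v ** 2 for v in X) / n + sum(d) / (12 * n) + (d[0] + d[-1]) / (6 * n)
    mu3 = (sum(v ** 3 for v in X) / n + sum(v * w for v, w in zip(X, d)) / (4 * n)
           + (X1 * d[0] + Xn * d[-1]) / (2 * n))
    mu4 = (sum(v ** 4 for v in X) / n + sum(v * v * w for v, w in zip(X, d)) / (2 * n)
           + sum(w * w for w in d) / (80 * n) + (X1 ** 2 * d[0] + Xn ** 2 * d[-1]) / n
           + 3 * (d[0] ** 2 + d[-1] ** 2) / (10 * n))
    return mu2, mu3, mu4


x = [Fraction(v) for v in (1, 2, 6, 7, 11)]     # hypothetical sorted sample
direct = central_moments_direct(x)
closed = closed_form_moments(x)
```

Using `Fraction` keeps the comparison exact, so any discrepancy between `direct` and `closed` would point to an algebraic mismatch rather than rounding error.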
References

Johnson, N.L., and S. Kotz, 1970, Continuous univariate distributions-1 (Houghton Mifflin Company, Boston, MA).
Meisner, J.F., 1980, A risk evaluation of the maximum entropy moment matrix, Economics Letters 5, no. 4, 329-333.
O'Brien, P.C., 1980, The quartiles of the maximum entropy distribution, Economics Letters, this issue.
Teichroew, D., 1956, Tables of expected values of order statistics and products of order statistics for samples of size twenty and less from the normal distribution, Annals of Mathematical Statistics 27, 410-426.
Theil, H., 1980, A simple form of the maximum entropy moment matrix and its inverse, Economics Letters 5, no. 1, 53-57.
Theil, H., and K. Laitinen, 1980, Singular moment matrices in applied econometrics, in: P.R. Krishnaiah, ed., Multivariate analysis (North-Holland, Amsterdam) 629-649.
Theil, H., and P.C. O'Brien, 1980, The median of the maximum entropy distribution, Economics Letters 5, no. 4, 345-347.