
Socio-Econ. Plan. Sci. Vol. 14, pp. 137-145. Pergamon Press Ltd. Printed in Great Britain

THE ENTROPIES: SOME ROOTS OF AMBIGUITY

KINGSLEY E. HAYNES
School of Public and Environmental Affairs and Department of Geography, Indiana University, Bloomington, IN 47405, U.S.A.

FRED Y. PHILLIPS
Market Research Corporation of America, 624 So. Michigan Avenue, Chicago, IL 60202, U.S.A.

and

JAMES W. MOHRFELD
Management Science, School of Business, University of Texas, Austin, TX 78712, U.S.A.

(Received 1 July 1979; in revised form 14 November 1979)

Abstract: Entropy concepts have developed in four general contexts: thermodynamics, communication theory, statistical information theory, and the social and life sciences. These concepts have five distinct mathematical forms. Ambiguity and complexity in utilization of the concept have been heightened by this multi-faceted heritage. Effective utilization may be strengthened by a sharper articulation of entropy through combinations of literal, mathematical and graphical modes of expression. Geography and planning as user disciplines have important responsibilities in enhancing cross-disciplinary communication of this important adisciplinary concept. Implications of these arguments for the social sciences include the need for caution in drawing social analogies from the ambiguous entropy concepts of statistical mechanics, and an indication of the potential benefits of social science models based on the unambiguous entropic constructs of statistical information theory.

INTRODUCTION

Geography has been a major borrower of isomorphic concepts from other fields that produce increased insights in the description and analysis of spatial form and process. The gravity model from physics, point pattern concepts from plant ecology, systems concepts from the life sciences and behavioral constructs from psychology are all evidence of this learning and borrowing phenomenon. Some concepts are technique-oriented, others are constructs for explanations of specific processes, others are structural or physical analogs and still others are relatively isomorphic in their wide range of purposeful applications. Geography is not alone in this absorption process, although it appears to be amazingly more receptive and perhaps less critical in these borrowing activities. In recent years one of the concepts that has been widely borrowed and utilized is entropy. The utilization of this concept is not isolated to geography. In fact the power of the concept can be seen in its survival through the shifting contexts of its application. Certainly entropy has been used to measure diverse phenomena. In the physical sciences, it has measured loss of energy in a system; in an information theoretic framework the concept has been related to the amount of information in particular probability distributions; in the life sciences it has been used as a measure of biological time; and in the social sciences entropy has been used as a measure of organization. Though these extensions of the original principle have been revealing and at times very useful, this continual re-definition has not been without problems of correspondence. A handful of spatial analysts have made consistent incursions into this field [1]. The entropy concept has been invoked in several contexts such as numerical map analysis, spatial interaction modeling and spatial hypothesis testing [2].

A major part of the work in numerical map analysis has employed entropy to measure pattern diversity in mapped distributions [3-8]. Somewhat similar to this approach is Curry's [9] attempt to measure "spatial information loss" between differing scales of maps. Much of the related research in spatial interaction and hypothesis testing has been carried out under the rubric of "maximum-entropy models" or "extremal techniques". Perhaps the key studies in this area are Wilson's [10] derivation of the gravity model by maximizing Boltzmann's H-function, and the later connection of the gravity model to statistical and mathematical programming theories, due to Charnes, Raike and Bettinger [11]. In adjacent areas of inquiry, Evans [12], Wilson and Senior [13] and Phillips, White and Haynes [14] have demonstrated certain elements of correspondence between the combinatorial maximum-entropy model and extremal solutions using constrained information theoretic characterizations of the same problem. Batty [15, 16] and Haynes and Storbeck [17] have used an entropy based information statistic to compare subjective entropy in testing hypotheses concerning the spatial distribution of urban populations. At different points in time, and as a consequence of the expanded use of this concept in spatial analysis, a number of summarizations have been developed as pedagogical tools. In particular Gould's [18] review of Wilson's entropy methods and Webber's [19] clear presentation and more general stance in the utilization of entropy maximizing models have been extremely effective. Similarly Cesario's [20] applied presentation to the companion field of planning has increased the practitioner's recognition and utilization of these concepts. However, confusion at the simplest level continues to obscure our systematic integration of these ideas, and it is hoped that an historical perspective with some cautionary notes may


be helpful for the new analyst and seasoned researcher alike. Although our original work on this subject was stimulated by attempts to compare Wilson's maximum-entropy models to Charnes, Raike and Bettinger's work on constrained information theoretic estimation procedures in the context of spatial interaction modeling [14], the issues were discovered to be more basic and pervasive than originally anticipated. The purpose of this paper is to (i) discuss the development of the entropy concepts with emphasis on the contexts in which these concepts evolved; (ii) discuss the basis of rapid developments in concept utilization in mathematical statistics; (iii) discuss some sources of the ambiguity associated with the word "entropy"; and (iv) suggest some regularities in our approach which may reduce ambiguity.

EVOLUTION

One may begin by noting the four sources and three contexts in which entropy concepts have been developed: thermodynamics, communication theory, (statistical) information theory, and the social and life sciences. Using Table 1, in the thermodynamics context entropy has been defined using two distinct function forms, (a) and (b). In the communication theory context two distinct function forms, (b) and (c), were used; and in the social and life sciences three function forms, (a), (b) and (d), have been used. Forms (e), (f) and (g) are those of statistical information theory. Each of the authors listed in Table 1 has developed his own concept of entropy by specifying a function and applying it in a particular context. In this paper the word "concept" is used to signify an author's intended meaning of the word "entropy", whether or not this meaning is explicitly based on a contextual application or interpretation of a function. Terms in Table 1 are defined below.

The single word "entropy" has been used in the systems literature to represent various concepts; however, often the basic form of the mathematical function on which the concept under discussion was based is not specified, nor is the reader informed of the context from which the concept has been taken. Taschdjian [21], for example, used several concepts of entropy drawn from three contexts:
• a thermodynamic context, i.e. "The friction inherent in any real system produces a decrease in free energy available to do work, and this increases its entropy" (p. 94);
• an informational context, i.e. "There is a close correspondence between entropy and uncertainty" (p. 94);
• a social and life science context, i.e. "This (environmental, mutation, technical or social change) means that the overall entropy of the system is not zero, but has a positive value" (p. 98).
The original concept of entropy was introduced into the physical sciences in the mid-nineteenth century by Rudolf Clausius in association with the second law of thermodynamics. Later, Ludwig Boltzmann developed a different function for measuring the entropy of a system, thereby introducing a second concept of entropy into the thermodynamics context. In the late 1940's Shannon and Weaver introduced their work on information theory, in which they defined a concept of entropy using a function proportional to Boltzmann's measure of entropy. At this same time Norbert Wiener developed a measure for information which he noted to be "the negative of the quantity usually defined as entropy in similar situations" ([22], p. 62). As this measure differed from both Clausius' function and from the basic form of Boltzmann's as well as Shannon and Weaver's function, Wiener added another concept of entropy to the literature. Also working in the 1940's, but in the context of the life sciences, Erwin Schrodinger [23] introduced another concept of entropy into the literature, one that was based on a function distinct from the three forms already mentioned. In 1951, Kullback and Leibler produced a seminal paper which resolved the ambiguities surrounding the Boltzmann function and the Wiener function, but did so in a context of statistical inference which was divorced from all physical arguments.

Thermodynamic entropy

A concept of thermodynamic entropy was originated by Clausius. He defined change in entropy from a standard state (dS) as a change in bound energy (dQ) per degree Kelvin (T), where bound energy is that energy which is no longer available to carry out the same task that free energy could ([24], p. 107). For example, when some quantity of free energy, the energy available to do work (i.e. the energy in a gallon of gasoline), is used to power an engine, two products, mechanical work and heat, are produced. The heat is classified as bound energy. According to Clausius' definition, a system with low entropy is one in which the bound energy is low and which is thermodynamically more efficient (i.e. has less energy loss) than one of higher entropy, under the assumption of the same temperature in both systems. It should be noted that Clausius did not rely on causal arguments to explain his definition; he developed an equation that mapped the thermodynamic aspect of systems as he observed it. Graphical representations of entropy measurements, using Clausius' function, for various media (i.e. air, steam, etc.) can be found in the temperature-entropy diagrams of most engineering handbooks.

Boltzmann employed statistical mechanics (probability theory) to explain that the cause of Clausius' macroscopic observations of heat phenomena is the microscopic locomotion of particles. He reasoned that Clausius' equation would be expressed in statistical terms as

S = k ln W

where S = entropy, k = 1.38 x 10^-16 ergs per degree, W = N!/(N_1! N_2! ... N_m!), N = the total number of particles in the system, N_i = the total number of particles in state i, and m = the total number of states in the system; or as S = kN Σ f_i ln f_i, where f_i = N_i/N; or as S = kNH, where H = Σ f_i ln f_i ([25], p. 190). Refer to Fig. 1 for a graphic representation of the summand of Boltzmann's H-function. Note that this function is proportional to Boltzmann's measure of entropy and that Boltzmann's entropy function is a summation and is therefore based on the assumption that the proportion of particles in each state (f_i) changes in discrete increments rather than continuously. Note also that when (f_1, f_2) = (0, 1) or (1, 0), H is 0; H increases to a maximum when (f_1, f_2) = (0.5, 0.5), i.e. when there is an equal number of particles in each state. Using Boltzmann's explanation, as the disorder of a system increases (i.e. as the particles become evenly distributed within the system), H increases, and thus entropy increases.
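As a purely numerical illustration (added in this edition, not part of the original text), the short Python sketch below compares the exact combinatorial quantity ln W with the summation form N Σ f_i ln f_i for a small two-state system. The particle counts, the use of natural logarithms and the printed output are arbitrary assumptions, and the sign convention for H varies among the works discussed in this paper.

```python
import math

def ln_W(counts):
    """Exact ln of the multinomial count W = N!/(N_1! ... N_m!), computed via log-gamma."""
    N = sum(counts)
    return math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)

def summation_form(counts):
    """N * sum_i f_i ln f_i with f_i = N_i / N, the summation quoted in the text."""
    N = sum(counts)
    return N * sum((n / N) * math.log(n / N) for n in counts if n > 0)

k = 1.38e-16  # ergs per degree, the constant quoted from Boltzmann [25]

for counts in [(50, 50), (90, 10), (100, 0)]:
    exact = ln_W(counts)                 # ln W
    stirling = -summation_form(counts)   # Stirling's approximation: ln W ~ -N * sum f_i ln f_i
    print(counts, "k ln W =", k * exact, "  k N |sum f ln f| =", k * stirling)
```

The evenly divided case (50, 50) gives the largest count W, in line with the statement that this measure of entropy is greatest when the particles are spread equally over the available states.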


Table 1. The basic forms of mathematical functions that have been used to define or measure entropy, the authors that have used these forms, and the contexts in which each has been applied.

Basic forms of functions used to measure entropy/information:
(a) ∫ dQ/T
(b) (Σ f_i ln f_i) k N (Boltzmann); -Σ p_i ln p_i (Shannon)
(c) -∫_{-∞}^{+∞} [log_2 f(x)] f(x) dx
(d) k log D
(e) Fisher's information measure
(f) I(1:2) = ∫ f_1(x) ln [f_1(x)/f_2(x)] λ(dx)
(g) Σ p_i ln [p_i/q_i]

Contexts and authors:
Thermodynamics: Clausius [24], form (a); Boltzmann [25], form (b).
Social and life science: Foster [67] and Leopold and Langbein [68], form (a); MacArthur [69], Raymond [30] and Tribus [70], form (b); Schrodinger [23], form (d).
Communication and information theory: Shannon and Weaver [28] (communications), form (b); Wiener [22] (communications), form (c); Fisher [71] (statistics), form (e); Kullback and Leibler [72] (statistics), forms (f) and (g).

Fig. 1. Boltzmann's H-function and Shannon's measure of entropy as a function of probabilities in a case where there are two possible states in a system.

Note, however, that Boltzmann and others after him have not been able to establish that Boltzmann's entropy function is equivalent to Clausius' unless it is assumed that entropy is a subsumptive variable, i.e. that it is additive, so that the entropy of two independent systems when combined equals the sum of the two entropy values. While subsumptive properties are meaningful for concrete measures such as physical dimensions, mass and system time, they have an uninterpreted meaning for relativistic measures such as chronological time, temperature, disorder, etc. ([26], pp. 145-146). As the assumption of subsumption for this concept of entropy has little meaning, it must be concluded that Boltzmann's equation differs from Clausius' and that Boltzmann produced a second definition of entropy, one that is based on the relative disorder of a system.

Informational entropy

Norbert Wiener wrote the first "generally accepted" (in Georgescu-Roegen's opinion [26], although see Khintchine [27]) definition of an amount of information that is associated with a probability density f(x) of a quantity lying between two points, x and x + dx. It is

∫_{-∞}^{+∞} [log_2 f(x)] f(x) dx

([22], p. 61).


As Wiener noted, this function multiplied by -1 is a measure of entropy. Yet this function differed from Boltzmann's function for entropy in that it is a continuous function, whereas Boltzmann's function is discrete; and the two functions cannot describe the same class of physical phenomena ([26], pp. 397-398). Thus, Wiener's work resulted in still another concept of entropy. See Fig. 2 for a graphic representation of the integrand of Wiener's measure of information and entropy. Note that when f(x) = 1 and when f(x) = 0 (i.e. when it is known with certainty that a quantity is in or is not in an interval), both information and entropy take the value of zero according to Wiener's equations. Note that this function is not symmetrical, nor does it peak at f(x) = 0.5. (This is due to the choice of 2 as the base of logarithms.)

Shannon wanted to obtain a measure of the capacity of a code system to transmit and store messages, and he used previously developed theory in statistical mechanics to reach this goal. He calculated the total number of typical messages in a system by the combinatorial formula

W = N!/(N_1! N_2! ... N_s!)

where N is the total number of signals, N_i = the total number of signals in state i with N_i = p_i N, p_i = the probability of state i occurring, and s = the total number of states in the system ([28], p. 21). Using this formula, which, because Shannon's p_i is equal to Boltzmann's f_i, is the same as the one used by Boltzmann to derive his H-function, Shannon defined a measure of the average information capacity of a system to be equal to the average entropy of that system. This measure is expressed mathematically as H = ln W/N or as H = -Σ p_i ln p_i.
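The contrast between the discrete and continuous measures can be made concrete with a short numerical sketch (an illustration added here, not drawn from the original paper). It evaluates Shannon's H = -Σ p_i log_2 p_i for a two-symbol source and Wiener's integral ∫ [log_2 f(x)] f(x) dx in closed form for uniform densities of different widths; the particular distributions are arbitrary assumptions.

```python
import math

def shannon_bits(p):
    """Shannon's discrete measure H = -sum p_i log2 p_i; never negative."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def wiener_information(width):
    """Wiener's integral of [log2 f(x)] f(x) dx for a uniform density of the given width.

    f(x) = 1/width on the interval, so the integral equals log2(1/width) * (1/width) * width.
    Multiplying by -1 gives the corresponding entropy, which is negative when width < 1.
    """
    f = 1.0 / width
    return math.log2(f) * f * width

print("Shannon H of (0.5, 0.5):", shannon_bits([0.5, 0.5]), "bits")  # 1.0
print("Shannon H of (1.0, 0.0):", shannon_bits([1.0, 0.0]), "bits")  # 0.0

for width in (2.0, 1.0, 0.5):
    info = wiener_information(width)
    print(f"uniform density of width {width}: information = {info:+.3f}, entropy = {-info:+.3f}")
```

The negative entropy obtained for the narrow density has no counterpart in the discrete case, which is one concrete way of seeing why the continuous and discrete forms cannot describe the same class of phenomena.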

Fig. 2. Graph of f(x) vs. [log_2 f(x)] f(x), the integrand of Wiener's measure of information and entropy.

Note that this function is equivalent to the Boltzmann H-function (H = Σ f_i ln f_i) except that Shannon put a negative sign in the equation to get a positively-valued measure for both entropy and information. See Fig. 1 for a graphic representation of Shannon's measure of entropy.

Social and life science entropy

Schrodinger initiated the use of a concept of entropy in the social and life sciences by developing a literal interpretation of the function entropy = k log D, or -entropy = k log (1/D), where D is a measure of disorder, 1/D is a measure of order and k is Boltzmann's constant ([23], pp. 73-74). With the latter equation, the concepts associated with "negative entropy" or "negentropy" were introduced. See Fig. 3 for a graphical representation of Schrodinger's measures of entropy and negentropy. One point of confusion that appears with Schrodinger's work is the definition of disorder. Schrodinger states that "to give an exact explanation of this quantity D in brief non-technical terms is well-nigh impossible" ([23], p. 73). Although not specifically stated by Schrodinger, it appears that if D is to be quantified, it must be quantified as a proportion, and thus would range from 0 to 1.

Fig. 3. Schrodinger’s

measures of entropy and negentropy.

-al Fig. 4. Schrodinger’s measures of entropy and negentropy when the horizontal axis is used to represent both D (from 0 to 1) and l/D (from I to infinity).

If so, entropy and negentropy values will follow the standard form of a logarithm function for this range of D. See Fig. 4. Note also that Schrodinger does not specify the reference point for any system at which order becomes disorder. From different points of view the same system can be described as having high order or high disorder.

Brillouin [29] followed Schrodinger's work with an attempt to enhance the interpretation of "entropy" by relating it to the second law of thermodynamics and to life, death and intelligence. However, he worked on a generalized level and did not refer directly to the various functions that had been developed to measure entropy, but only to the literal interpretations of those functions. Raymond [30] responded to Brillouin by referring to Schrodinger's function for entropy and using it as the basis for a literal argument in which he tried to combine all previously developed entropy concepts and relate them to open and closed systems. Ostow [31] responded to the literal interpretations of those two authors with the issue that information cannot take an "absolute form" as Brillouin suggested, but rather it takes a "distributed form." Ostow thereby implied that information is a relativistic rather than an absolute concept. One major issue to recognize here is that in 1950, as today, no universally accepted definition of information existed and none of the life-science definitions of entropy were composed of operationally measurable quantities. One exception proved this rule: the work of Miller [73] and of McCulloch and Pitts (available in Buckley [74]) on measuring the rate of information processing of human and animal nervous systems. These investigations used Shannon's well-defined information measures and extended their application to the case of communication (neural) networks with many stimuli and many receptors. This use of the entropy idea to measure functional capacity thus differed in intent from the previous biological applications, which were attempts to measure the structural complexity of the organism.
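The reference-point problem noted above can be illustrated with a minimal sketch (added here, not part of the original text): if D is read as a proportion, Schrodinger's entropy k log D and negentropy k log (1/D) are mirror images, and the numerical value depends entirely on the analyst's choice of what counts as disorder. The two values of D below are arbitrary assumptions.

```python
import math

k = 1.38e-16  # Boltzmann's constant in ergs per degree, as used by Schrodinger [23]

def entropy(D):
    """Schrodinger's literal form: entropy = k log D."""
    return k * math.log(D)

def negentropy(D):
    """Negative entropy: -entropy = k log (1/D)."""
    return k * math.log(1.0 / D)

# The same system judged from two reference points: one analyst calls it 30 per cent
# disordered, another 70 per cent. Neither choice is dictated by the theory itself.
for D in (0.3, 0.7):
    print(f"D = {D}: entropy = {entropy(D):+.3e}, negentropy = {negentropy(D):+.3e}")
```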

DEVELOPMENTS IN MATHEMATICAL STATISTICS

R. A. Fisher [32] was the first to introduce a technical definition of information in terms of mathematical statistics (see Table 1). As noted above, in other fields of study, both prior to and subsequent to Fisher's work, concepts of information appeared which seemed to be parallel to Fisher's, either because of similarities of logarithmic forms in the mathematical definition of information, because of the seeming universality of entropy/information in the newly developing conceptions of the physical world, or because of the everyday connotations of the word "information." The work of Boltzmann and others on entropy in thermodynamics and statistical mechanics is documented by Khintchine [27, 33] and Georgescu-Roegen [26]; and Fisher [32], Kullback [34] and Shannon and Weaver [28] attribute the mathematical basis of statistical information theory to this work in physical science. Although Shannon and Weaver's derivation of a logarithmic measure of the information in a coded message, in 1949, prompted the birth of information theory in its modern form by sparking interest in the mathematics of entropic measures, it was not at all clear at first that their work in communication theory was of any importance to statistics [35].

Khintchine [27], in work concurrent with and evidently independent of Shannon and Weaver, first solved the variational problem of determining the maximum-entropy density function when a single constraint is considered. This singular contribution was part of a mathematical formalization of statistical mechanics, and Khintchine did not pursue issues of statistical inference, considering his solution to be an application of probability theory. Khintchine was also the first, in 1953 and 1956, to fully correct and extend Shannon and Weaver's results in coding theory and problems of transmission over a noisy channel.

In 1951 Kullback and Leibler generalized Khintchine's and Shannon and Weaver's results to the case of general measures, and presented the derivation of the information number I(1:2) = ∫ f_1(x) ln [f_1(x)/f_2(x)] λ(dx). Their article proposed the use of I as a measure of the distance, or difficulty of discrimination, between the densities f_1 and f_2; proved its invariance under sufficient transformations; and indicated an application to hypothesis testing.† Kullback and Leibler suggested that a system of inference could be built around a rigorously defined measure of the amount of information (about an unknown parameter) provided by a particular set of observations.

†Kullback and Leibler cited Jeffreys' [36] demonstration of the symmetry and additivity properties of I and its positive definiteness and invariance under nonsingular transformations.
‡The information given by the log-likelihood ratio in favor of the null hypothesis is established by an argument involving Bayes' Theorem [35].
§In the 1961 paper, Rényi interpreted I(P, Q) as "a measure of the amount of information concerning the random variable ξ contained in the observation of the event E", where Q is the unconditional distribution of ξ, and P the conditional distribution of ξ given E. This is an interesting special use of Kullback's generalization. See also Rényi [39].
¶A 1967 article of Kullback presents an improved lower bound for the discrimination information in terms of variation, where the variation between two distributions is defined as V(P_1, P_2) = ∫ |f_1(x) - f_2(x)| λ(dx). A 1970 note corrects some errors in the text of the 1967 article. However, in addition to the corrections noted by Kullback, eqn (7) of the original article should read I(P_1, P_2) ≥ V^2(P_1, P_2)/2 + V^4(P_1, P_2)/36.


Remarking that much of the earlier probabilistic terminology could coincidentally be adopted directly to this task, Kullback and Leibler thus established information theory as a branch of mathematical statistics. In the above definition of I(1:2), the densities are defined over a probability sample space, the density f_2 must be absolutely continuous with respect to the density f_1, and the fixed measure λ must dominate f_1 and f_2. The discrete counterpart of I(1:2), Σ p_i ln (p_i/q_i), has an exactly parallel meaning as a measure of statistical information to the continuous form when discrete probability laws are addressed. (Recall the argument earlier in this paper that the discrete and continuous forms are not comparable as descriptors of thermodynamic entropy.) The statistical information number is, in either the discrete or the continuous case, merely the mean of the log-likelihood ratio under the null hypothesis represented by f_1.‡ The logarithmic transformation of the likelihood ratio yields the additivity property which is so desirable for a measure of information, i.e. that twice as many statistically independent observations of a random variable will yield twice as much information about its distribution. The introduction of the notion of independence, in the statistical context, provided the property of additivity that was wanted in the thermodynamic entropy of Boltzmann. It is a straightforward argument [14] that the Boltzmann H-function, and hence the "entropy" models of the geography literature, are a special case of the discrete form of I(1:2).

Kullback followed the 1951 paper with a series of articles which developed some of the inequalities of statistical information theory [35] and investigated applications to several areas of multivariate analysis [37]. These led to his 1959 book, which further developed the properties of the information measures (including an asymptotic distribution theory for the minimum value of I under constraints) and recast a substantial body of sampling, estimation, hypothesis testing and multivariate techniques in information theoretic terms. Despite Kullback's pioneering work, the general applicability of information theory has, until the present time, received greater recognition and attention in Japan, Hungary and the USSR than in the United States [38].

In 1961 Rényi showed that a theorem of Fadeev characterizing the Shannon entropy K Σ p log (1/p) was satisfied as well by the class of functions (1/(1 - α)) log (Σ p^α). Rényi called these functions "generalized entropies" and noted that the Shannon entropy was a limiting case (as α → 1). On this basis, Rényi proposed an "information of order α", (1/(α - 1)) log (Σ p_i^α q_i^(1-α)), of which the Kullback information number is a limiting case,§ set down properties of the α-order information measures, and provided the first characterization theorem. In 1967 Rényi used an information theoretic argument to establish a Bayesian version of the Fundamental Lemma of Neyman and Pearson. Arimoto in 1971 gave an alternative definition of generalized information, with applications to estimation and decision theory, and refers to additional untranslated papers of Rényi. Today formulations and characterizations of generalized information numbers proliferate [40], but applications have been few. The book of Mathai and Rathie presents an extensive bibliography on the theory and application of statistical information through 1975. Some recent extensions are due to Kullback [41],¶ Akaike [42, 44, 45] and Charnes and Cooper [46, 47].
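A brief numerical sketch (added for illustration and not part of the original text) may help fix these ideas. It computes the discrete information number I(P:Q) = Σ p_i ln (p_i/q_i), verifies the additivity property for two independent observations, and checks Kullback's lower bound in terms of the variation V = Σ |p_i - q_i| discussed in the footnote above; the two distributions are arbitrary assumptions.

```python
import math
from itertools import product

def info_number(p, q):
    """Discrete information number I(P:Q) = sum p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variation(p, q):
    """Variation V(P, Q) = sum |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

P = [0.5, 0.3, 0.2]   # hypothesized distribution (e.g. observed shares)
Q = [0.4, 0.4, 0.2]   # reference distribution (e.g. a null model)

I1 = info_number(P, Q)

# Additivity: two independent observations correspond to the product distributions
# P x P and Q x Q, and the information number doubles.
P2 = [a * b for a, b in product(P, P)]
Q2 = [a * b for a, b in product(Q, Q)]
I2 = info_number(P2, Q2)

V = variation(P, Q)
bound = V**2 / 2 + V**4 / 36   # Kullback's lower bound on I in terms of variation

print("I(P:Q)                 =", I1)
print("I for two observations =", I2, " (twice I(P:Q) =", 2 * I1, ")")
print("V(P,Q)                 =", V)
print("V^2/2 + V^4/36         =", bound, " <= I(P:Q):", bound <= I1)
```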


The work of Akaike [42] and others has demonstrated an asymptotic equivalence of maximum likelihood methods, Fisher-information principles, and Kullback-Leibler estimation. Akaike used the directed divergence between an estimated density function f(x, θ*) and a true density f(x, θ) as an economic loss function for the estimate θ* of θ. This approach yields an important unification of statistical estimation, hypothesis testing, and decision theory. Among the most important of recent advances, Akaike's unification of maximum likelihood and minimum discrimination information estimates broke the path for a variety of practical applications, via Akaike's extension of the maximum likelihood principle. The applications include [43] determination of the number of factors in a factor analysis; decision procedures for principal component analysis; decisions for the order of decomposition of variance in analysis of variance; similar decisions in multiple regression; and the fitting of autoregressive time series models. In a later paper [44], Akaike showed that the extended max-likelihood principle leads to a form of the James-Stein results (i.e. the inadmissibility of the ordinary max-likelihood estimate of the mean of a multivariate normal distribution) which does not depend on Bayesian arguments. More recently [45] Akaike has shown the optimality of an information criterion for selecting the best of a parametric class of models. In another 1978 paper [76] Sakamoto and Akaike have developed an objective information theoretic procedure, which extends Jeffreys' early work, for evaluating the prior distribution in a Bayesian model. All of these advances will have an impact on quantitative investigations in the social sciences (see also the work of Gokhale and Kullback [48] on the MDI analysis of contingency tables).

Most entropic models in geography and marketing† have involved the determination of the max-entropy or the minimum discrimination information (MDI) number in the presence of (usually linear) constraints. Charnes and Cooper [46] have developed a powerful mathematical programming duality theory which allows the constructive solution of arbitrarily complex problems of this type by minimizing an unconstrained convex function. This approach was specialized by Charnes, Haynes, Phillips and White [49] for the case of the "gravity model" of geography. Phillips [50] provided a detailed guide for the construction of statistical tests based on Kullback's MDI numbers, and cast a number of geographic and marketing problems in information theoretic terms. It has indeed been the case that many longstanding and successful rule-of-thumb methods in the social sciences have recently been proven equivalent to MDI estimates and procedures.‡

†See Phillips [50] for a survey of applications. Learner and Phillips (forthcoming) use the hypothesis-testing capabilities of MDI as the core of a comprehensive system of marketing planning.
‡Including classes of gravity models, brand switching models, individual choice theories, and production functions. See for example Charnes, Cooper and Learner [52] and Phillips [50] as additional references.
§See Sheppard [53], MacLean [54] and Bacharach [55] for discussion of this issue.
¶The celebrated controversy over "Maxwell's Demon" is not central to the presentation in this paper, but constituted an important tangent. The controversy was resolved by Szilard [75]; see also the readings in Buckley [74].
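As a concrete sketch of the constrained estimation just described (an illustration added to this edition, not taken from the works cited), the following code computes the minimum discrimination information interaction table with prescribed row and column totals relative to a seed matrix, using the biproportional scaling discussed by Bacharach [55] (iterative proportional fitting). The seed values, margins and tolerance are assumed for the example; Charnes and Cooper's duality approach reaches the same solution by minimizing an unconstrained convex function, which is not reproduced here.

```python
def mdi_biproportional(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Scale `seed` to match the given margins by alternating row and column adjustments.

    The limiting table is the minimum discrimination information (maximum entropy
    relative to the seed) distribution satisfying the marginal constraints.
    """
    t = [row[:] for row in seed]
    for _ in range(max_iter):
        for i, target in enumerate(row_totals):          # scale rows to their totals
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        max_err = 0.0
        for j, target in enumerate(col_totals):          # scale columns to their totals
            s = sum(row[j] for row in t)
            max_err = max(max_err, abs(s - target))
            for row in t:
                row[j] *= target / s
        if max_err < tol:
            break
    return t

# A 2 x 3 interaction table: the seed carries prior information (e.g. distance effects),
# and the margins are known origin and destination totals that the estimate must honor.
seed = [[1.0, 2.0, 1.0],
        [2.0, 1.0, 3.0]]
trips = mdi_biproportional(seed, row_totals=[60.0, 40.0], col_totals=[30.0, 50.0, 20.0])

for row in trips:
    print([round(x, 2) for x in row])
print("row sums:", [round(sum(r), 2) for r in trips])
print("col sums:", [round(sum(r[j] for r in trips), 2) for j in range(3)])
```

In the gravity-model setting of Wilson [10] and Charnes, Haynes, Phillips and White [49], the seed would encode the deterrence (distance) terms, and the fitted table is the doubly constrained trip distribution.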

Beyond the computational advantages just described, constrained MDI estimates yield the multiplicative (log-linear) models so prevalent in the social sciences, are distributed in exponential families, and preserve sufficient statistics [51]. Learner and Phillips (forthcoming) further detail the strong evidence that information theoretic statistics are especially appropriate for social science research. These considerations reinforce the general feeling of the universality of entropic concepts, and increase the importance of a solid basis for their use in the social sciences.

The lack of explicit error terms in constrained MDI models has led to some unfounded doubts as to whether these are indeed statistical models.§ The information number (the "directed divergence" in Kullback's definition) is itself the error term in these models, and is minimized to find the MDI distribution. The asymptotic theory of the MDI number then indicates the goodness of fit of the model. This consideration points up a weakness of the geographic models of Wilson [10], which are derived combinatorially and by analogy from statistical thermodynamics and include no consideration of goodness of fit. In statistical mechanics, the construction of the set of microstates of a system, and of the rules for grouping microstates into macrostates, are not unambiguous (see Georgescu-Roegen [26]). Indeed the Einstein-Bose and Fermi-Dirac statistics were developed as an alternative to Boltzmann's grouping scheme, in order to achieve better "fit" for particular situations. It is most evident that one is on shaky ground when uncritically adopting from physics a technique that is not even consistent within its original milieu. The derivation of Wilson's basic gravity model [10] is in accord with common sense (which counts for much in human-scale situations), but there is no assurance that this would be the case were the model extended to more complex social problems. In any case, goodness-of-fit measures are needed to deal with model specification error and sampling error; and for an "analog" model a good goodness-of-fit value would enable us to conclude that "microstates" and grouping rules have been adequately specified. We must look to statistical information theory for these measures.

The historical development of statistical mechanics occasioned the mentioned controversy over open and closed systems, grouping rules and allowable microstates; much skipping back and forth among alternative notions of the axiomatic and philosophical foundations of probability; and much confusion over ergodicity, all of which confounded the definition of entropy.¶ Shannon's entropy, of fundamental importance to coding theory and message transmission engineering, has never been effectively generalized to broader applications. Schrodinger's biological entropies involve unobservable quantities, and from a social science standpoint any such analog of the Clausius entropy may be so macroscopic in view as to preclude the analysis of any problem of a smaller scale than the decline and death of a species. Statistical information, as evidenced by its straightforward definition in terms of the likelihood ratio and by the work of Kullback and Akaike, has been, aside from Clausius' original entropy, the most logically consistent entropy construct of those discussed in this paper. Information theoretic measures have been shown to be of value for business and social science applications when regarded strictly as statistical concepts without reference to physical analogies [50].


Yet their formal similarities to the "other entropies" are conceptually stimulating and operationally useful, for example in testing the goodness of fit of an analog model, or indeed in determining whether in a particular case an analog model or a statistical model would be superior. The inferential power of statistical theory should certainly be available to enrich applications of entropy ideas in whatever field. That statistical information theory has split from communication theory and blossomed into an extensive body of knowledge in its own right has not become generally known. According to Bartlett [56], this fact "... has not, I think, been recognized sufficiently in some of the recent conferences on information theory (emphasis added), to which mathematical statisticians per se have not always been invited." Indeed a recently published large volume entitled The Maximum Entropy Formalism [57] would allow the reader to conclude that the state of the art includes no statistical concept of information. The momentum of recent developments in the field, due to Akaike, Charnes and Cooper, Gokhale and Kullback and others (op. cit.), is an encouraging sign that the situation will turn for the better.

AMBIGUITY ASSOCIATED WITH THE WORD "ENTROPY"

It can be seen from this discussion of the development of entropy concepts that as the number of concepts increased linearly, the ambiguity and confusion associated with the concepts seemed to increase exponentially. The unqualified, non-contextual application of the single term "entropy" to these concepts has been a major source of ambiguity in the literature. More specifically, ambiguity resulted from factors such as:
• failing to recognize the multiple meaning of the word "entropy", i.e. that it represents at least four forms of mathematical functions in at least three different contexts [23, 30, 58, 59];
• not specifying the bases (function and context) upon which a specific entropy concept and its literal interpretation were developed [21, 60-62];
• basing arguments qualitatively on hazy concepts suggested by the second law, such as order/chaos, random/systematic, evolution/degradation, which, although facile initially, are ultimately ambiguous and impossible to operationalize;
• quoting sources which quote other sources which may or may not quote the original source of the concept being discussed [31, 63, 64];
• attempting to develop a higher-order-synthesis, unified, literal interpretation of entropy, but succeeding only in creating additional concepts of entropy by not referring to original sources and/or functions [30, 65];
• presenting concepts in a literal mode while ignoring mathematical or graphic modes of communication [29, 66];
• communicating concepts on an interdisciplinary level without the existence of a unified vocabulary to deal with these concepts;
• rapidity of concept development and utilization (mathematics, statistics).

CONCLUSION

Referring back to Table 1, one can see that ambiguity has been associated with entropy concepts because, although many concepts have developed from four distinct function forms in three contexts, one word "entropy" has been applied to all these concepts. From a semantic point of view this singular map of several concepts is unacceptable; this practice increases the "entropy" of the reader.

In the scholarly development of literature, an effectively developed theme is usually started by an anchor article and then evolves as authors extend the theme like links in an anchor chain. In the case of the "entropy chain" many links have been added haphazardly to a chain that is not firmly connected to an anchor. The concepts of entropy have been developed without the developing community seeing the whole system. It is ironic that the systems community, which advocates studying the whole system, should be caught not practicing its philosophy.

Yet, despite the fact that the word "entropy" has been used with much confusion in the literature, it has been popular in most areas of science, and especially in the systems literature. This situation leads one to ponder the source of such popularity. In the systems literature it can be partially accredited to the fact that these entropy concepts are perceived as isomorphisms which can be used to integrate study among disciplines. Entropy and its counterpart negentropy make up a binary map that can be used to relate the physical and life sciences, a tool sought after by many scientists. Also, the logarithmic equation of Schrodinger and the summation series of Boltzmann, Shannon and Weaver have properties that have been used effectively in the social and life sciences to describe many activities. Furthermore, entropy concepts are usually associated with the second law of thermodynamics, and users of these concepts can (with caution) relate them and their own area of study to this law.

Much of the ambiguity associated with the word "entropy" can be eliminated by:
• recognizing the historical development of entropy concepts and qualifying usage of this word with the appropriate context and function;
• taking quotations and references from only clearly developed concepts that are supported with functions;
• compiling and using a cross-disciplinary vocabulary to deal with these concepts;
• presenting these concepts in several modes (literal, mathematical and graphical) to enhance communication;
• adopting new, analogy-free, mathematical or statistical definitions that are appropriately specified for general applications.
Similar considerations should be applied in the cases of other useful adisciplinary terms (e.g. information, hierarchy, growth, complex systems, etc.) to facilitate communication of the concepts associated with these terms.

Acknowledgements: The authors express their gratitude for comments and contributions to this paper by Drs. A. Charnes, R. Mather, T. Ruefli and O. R. Young of the University of Texas, Austin.
REFERENCES CONCLUSION

Referring back to Table 1, one can see that ambiguity has been associated with entropy concepts because although many concepts have developed from four distinct

1. B. Marchand, Information theory and geography. G~ographical Analysis, 4, 234-257.(1972) 2. R. Lee, A markovian entropy-maximizing model of population distribution. Ettuironment and Planning A. 6, 693-702. (1974)

144

KINGSLEYE. HAYNESet al.

3. Y. V. Medvedkov, The concept of entropy in settlement pattern analysis. Reg. Sci. Asoc. Papers, 18, 165-168. European congress, Vienna (1966). 4. G. P. Chapman, The application of information theorv to the analysis of population distributions in space. Economic Geoarauhv, 46. 317-331. (Suoolement) (IWO). 5. R. Keiih Temple, and R. G. Golledge,’ An analysis of entropy changes in a settlement pattern over time Economic Geography, 46, 157-180. (1970). 6. R. K. Semple, Recent trends in the spatial concentration of corporate headquarters, Economic Geography, 49, 309-318. (1973) 7. K. E. Haynes and W. T. Enders, Distance, direction and entropy in the evaluation of a settlement pattern. Economic Geography, 51, 357-365. (1975). 8. A. Getis and B. Boots, Models of Spatial Processes, Cambridge University Press, Cambridge (1978). 9. R. A. Curry, A spatial analysis of gravity flows. Rep. Studies, 6, 131-147: (197i). 10. A. G. Wilson, Entropy in Urban and Regional Modeling, Pion, London: (1970). 11. A. Chames, W. Raike and C. 0. Bettinger, An extremal and information-theoretic characterization of some zonal transfer models. Socio-Econ. Plan. Sci., 6, 531-537. (1972). 12. S. P. Evans, A relationship between the gravity model for trip distribution and transportation problems in linear programming. Transport Res. 7,3%1 (1973). 13. A. G. Wilson, and M. L. Senior, Some relationships between entropy maximizing models, mathematical programming models, and their duals. J. Reg. Sci. 14, 207-215 (1974). 14 F. Y. Phillips, G. M. White and K. E. Haynes, Extremal approaches to estimating spatial interaction. Geographical Analysis, 9, 185-200, (1976). 15. M. Batty, Spatial entropy. Geographical Analysis 6, l-32 (1974). 16. M. Batty, Urban density and entropy functions. J. Cybernetics 4, 41-55 (1974). 17. K. E. Haynes, and .I. S. Storbeck, The entropy paradox and the distribution of urban population. Socio-Econ. Plan. Sci. 12, 1-6 (1978). 18. P. Gould, Pedagogic review. Annals of the Assoc. of Am. Geographers, 62, 689-700 (1972). 19. M. J. Webber, Pedagogy again: What is entropy? Annals of the Assoc. of Am. Geographers, 67, 254-266 (1977). 20. F. Cesario, A primer on entropy modeling. J. of Am Institute of Planners, 41, (1974). 21. E. Taschdjian, The entropy of complex dynamic systems. Behavioral Sci. 19,93-99 (1974). 22. N. Wiener, Cybernetics, Wiley, New York (1948) 23. Erwin Schrodinger, What is Life? Cambridge University Press, Cambridge (1944). 24. R. Clausius, The Mechanical Theory of Heat, (trans. W. R. Brown) MacMillan, London, (1879). 25. L. Boltzmann, Vorlesungen Uber Gastheorie, I. theil., thans. A. Meiner, Verlag von Johann Ambrosius Barth, Leipzig (18%). 26. N. Georgescu-Roegen, The Entropy Law and Ihe Economic Process, Harvard University Press, Cambridge, Mass (1971). 27. A. 1. Khintchine, Mathematical Foundations of Statistical Mechanics, Dover, New York (1949). 28. C. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Illinois (1949). 29. L. Brillouin, Life, thermodynamics and cybernetics. Am. Scientist 37, 554-568 (1949). 30. R. C. Raymond, Communication, entropy and life. Am. Scientist, 38, 273-278 (1950). 31. M. Ostow, The entropy concept and the psychic function. Am. Scientist, 39, 140-144. (1951). 32. R. A. Fisher, The logic of inductive inference. J. Royal Statist. Sot. 98 (1935); No. 26 in Contributions to Mathematical Statist., Wiley, New York (1950). 33. A. I. Khintchine, Mathematical Foundations of Information

34. 35.

36. 37.

38.

39. 40.

41.

42.

43.

44.

45. 46.

47. 48. 49.

50.

51.

52.

53. 54. 55.

56. 57. 58.

Theory Dover, New York (1957). (New translations of Khintchine’s papers The entropy concept in probability theory and On the fundamentals of information theory originally published in Russian in Uspekhi Malematicheskikh Nauk, 8 (1953) and 11 (1956) respectively). S. Kullback, Information Theory and Statistics, Wiley, New York (1959). S. Kullback, Certain inequalities in information theory and the Cramer-Rao inequality, Annals of Mathemaf. Statist, 25, 745-75 1 ( 1954). H. Jeffreys, Theory of Probability 2nd Edn Oxford University Press, London (1948). S. Kullback, An application of information theory to multivariate analysis I and Il. Annals of Mathemat. Statist. I, Vol. 23. pp. 88 (1952) and II, Vol. 27, pp. 122. (1956) (With correction p. 860.) Lawrence M. Seiford, Entropic solutions and disruption solutions for N-person games. Ph.D. dissertation. The University of Texas,~ Austin(1977). A. RCnyi, Statistics and information theory Studies Sci. Math. Hung. 2,249-256 (1%7). A. A. Mathai. and P. N. Rathic Basic ConcepIs in Information Theory and Statics. Halsted Press, New York (1975). S. Kullback, A lower bound form discrimination information in terms of variation, IEEEE Transportation Information Theory, IT-13.126-127. (1%7). H. Akaike, Information theory and an extension of the maximum likelihood principle, In 2nd Annual International Symposium on Information Theory (Ed. by B. N. Petron and F. Csakii) pp. 267-281. Akademiai Kiado (1973). H. Akaike, On entropy maximization principle. In Applications of Statistics, (Edited by P. R. Krishnaiah) pp. 27-41, North Holland, Amsterdam, (1975). H. Akaike, An extension of the method of maximum likelihood and the stein’s problem. Ann. Inst. Statist. Math. 29, Part A, 153-164 (1977). H. Akaike, A new look at the Bayes procedure Biometrika 65, 53-59 (1978). A. Charnes and W. W. Cooper, Constrained Kullback-Leibier estimation; generalized Cobb Douglas balance, and unconstrained convex programming. Center for Cybernetic Studies Report CCS 167, (January 1975) and Rendiconti di Accademia Hazionale dei Lincei (April 1975a). A. Charnes and W. W. Cooper, Goal programming and constrained regression-A comment. Omega, 3 (1975b). D. V. Gokhale and S. Kullback, 7’he Information in CONtingency Tables, Marul Dekker, New York (1978). A. Charnes, K. E. Haynes, F. Y. Phillips and C. M. White Dual extended geometric programming problems and the gravity model, J. Reg. Sci. 17,71-76 (1978). F. Y. Phillips, Some information theoretical methods for management analysis in marketing and resources, Ph.D. dissertation, University of Texas, Austin (1978). P. L. Brockett, A. Chames and W. W. Cooper, M.D.I. estimation via unconstrained convex programming. Research Report CCS 326, Center for Cybernetic Studies, University of Texas at Austin (Nov. 1978). A. Charnes, W. W. Cooper and D. B. Learner, Constrained information theoretic characterizations in consumer purchase behavior. J. Op[. Res. Sot. 29, 833-842 (1978). E. S. Sheppard, Notes on spatial interaction. Professional Geographer 31 (I), 8-15 (1979). A. S. MacLean, Maximum likelihood and the gravity model. Transport. Research, 10, 287-297 (1976). Micheal Bacharach, Bipropotiional Matrices and InputOutput Change. Cambridge University Press, Cambridge (1970). M. S. Bartlett, Probability, Statistics and Time: A Collection of Essays. Chapman and Hall, London (1975). R. D. Levine, and M. Tribus, (Eds.) The Maximum Entropy Formalism, MIT Press, Cambridge, Mass (1979). R. 
Arngeim, Entropy and Art University of California, Berkeley, California (1971).


59. E. Goldsmith, The limits of growth in natural systems. General Systems 16, 69-76 (1971).
60. L. von Bertalanffy, General Systems Theory. University of Alberta, Edmonton, Canada (1968).
61. W. Buckley, Sociology and Modern Systems Theory. Prentice-Hall, Englewood Cliffs, New Jersey (1967).
62. J. W. S. Pringle, On the parallel between learning and evolution. General Systems 1, 90-110 (1956).
63. T. R. Young, Stratification and modern systems theory. General Systems 14, 113-117 (1969).
64. M. Braham, A general theory of organization. General Systems 18, 13-24 (1973).
65. L. Brillouin, Thermodynamics and information theory. Am. Scientist 38, 594-599 (1950).
66. D. Krech, Dynamic systems as open neurological systems. General Systems 1 (1956).
67. C. Foster, A. Rappoport and E. Trucco, Some unsolved problems in the theory of non-isolated systems. General Systems 2, 9-29 (1957).
68. L. B. Leopold and W. B. Langbein, The concept of entropy in landscape evolution. General Systems 9 (1964).
69. R. MacArthur, Fluctuations of animal populations and a measure of community stability. General Systems 3, 148-151 (1958).
70. M. Tribus, Information theory as the basis for thermostatics and thermodynamics. General Systems 6, 127-138 (1961).
71. R. A. Fisher, Theory of statistical estimation. Proc. Cambridge Philosophical Soc. 22 (1925); No. 11 in Contributions to Mathematical Statistics. Wiley, New York (1950).
72. S. Kullback and R. A. Leibler, On information and sufficiency. Annals of Mathemat. Statist. 22, 79-86 (1951).
73. G. A. Miller, The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Rev. 63, 81-97 (1956).
74. W. Buckley, Modern Systems Research for the Behavioral Scientist. Aldine, Chicago (1968).
75. L. Szilard, Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Zeitschr. f. Phys. 53, 840-856 (1929); translated by A. Rapoport and M. Knoller as "On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings." Behavioral Sci. 9, 302-310 (1964).
76. Y. Sakamoto and H. Akaike, Analysis of cross-classified data by A.I.C. Annals of the Institute of Statist. Math. 30B, 185-197 (1978).

FURTHER READING

S. Arimoto, Information theoretical considerations on estimation problems. Information and Control 19, 181-194 (1971).
L. Brillouin, Science and Information Theory, 2nd Edn. Academic Press, New York (1962).
A. Charnes, K. E. Haynes and F. Y. Phillips, A 'generalized distance' estimation procedure for intra-urban interaction. Geographical Analysis 8, 289-294 (1976).
S. Guiasu, Information Theory with Applications. McGraw-Hill, New York (1977).
R. V. L. Hartley, Transmission of information. Bell System Technical Journal 7, 535-563 (1928).
E. T. Jaynes, Information theory and statistical mechanics. Phys. Rev. 106, 620-630 (1957).
S. Kullback, Correction to "A lower bound for discrimination information in terms of variation". IEEE Trans. Information Theory IT-16, 652 (1970).
D. B. Learner and F. Y. Phillips, Information theoretic models for controlled forecasting in marketing. J. Marketing, forthcoming.
R. Lee, Entropy models in spatial analysis. Discussion Paper No. 15, University of Toronto, Dept. of Geography, mimeograph (1974).
F. Y. Phillips, G. M. White and K. E. Haynes, Transportation models for Texas coastal zone management. Presented to Canadian Association of Geographers, Annual Meetings, Toronto (1975).
H. Quastler, Information Theory in Psychology. Free Press, Glencoe, Illinois (1956).
A. Rényi, On measures of entropy and information. Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, 547 (1961).
R. Keith Semple and George Demko, The changing trade structure of eastern Europe: An information-theoretic approach. Presented at Association of American Geographers, Annual Meeting, Seattle (April 1974).
R. Keith Semple and George Demko, An information theoretic analysis: An application to Soviet-COMECON trade flows. Geographical Analysis 9 (1977).
M. L. Senior and A. G. Wilson, Explorations and syntheses of linear programming and spatial interaction models of residential location. Geographical Analysis 6, 209-238 (1974).
H. Theil, Economics and Information Theory. Rand McNally, Chicago (1967).
A. G. Wilson, A statistical theory of spatial distribution models. Transport. Research 1, 253-269 (1967).