17 June 2002
Physics Letters A 298 (2002) 369–374 www.elsevier.com/locate/pla
Thermostatistics based on Kolmogorov–Nagumo averages: unifying framework for extensive and nonextensive generalizations
Marek Czachor a,b, Jan Naudts b,∗
a Katedra Fizyki Teoretycznej i Metod Matematycznych, Politechnika Gdańska, 80-952 Gdańsk, Poland
b Departement Natuurkunde, Universiteit Antwerpen UIA, Universiteitsplein 1, B2610 Antwerpen, Belgium
Received 24 January 2002; received in revised form 11 April 2002; accepted 11 April 2002 Communicated by C.R. Doering
Abstract
We show that extensive thermostatistics based on Rényi entropy and Kolmogorov–Nagumo averages can be expressed in terms of Tsallis nonextensive thermostatistics. We use this correspondence to generalize thermostatistics to a large class of Kolmogorov–Nagumo means and suitably adapted definitions of entropy. As an application, we reanalyze linguistic data discussed in a paper by Montemurro.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 05.20.Gg; 05.70.Ce
Keywords: Nonextensive thermostatistics; Rényi entropy; Nonlinear averages; Zipf–Mandelbrot law
Generalized averages of the form
$$\langle x\rangle_\phi = \phi^{-1}\Big(\sum_k p_k\,\phi(x_k)\Big), \qquad (1)$$
where $\phi$ is an arbitrary continuous and strictly monotonic function, were introduced into statistics by Kolmogorov [1] and Nagumo [2], and further generalized by de Finetti [3], Jessen [4], Kitagawa [5], Aczél [6] and many others. Their first applications in information theory can be found in the seminal papers by Rényi [7,8], who employed them to define a one-parameter family of measures of information (α-entropies)
$$I_\alpha = \varphi_\alpha^{-1}\Big(\sum_k p_k\,\varphi_\alpha\big(\log_b \tfrac{1}{p_k}\big)\Big) = \frac{1}{1-\alpha}\log_b \sum_k p_k^{\alpha}. \qquad (2)$$
The Kolmogorov–Nagumo (KN) function is here $\varphi_\alpha(x) = b^{(1-\alpha)x}$, a choice motivated by a theorem [9] stating that only affine or exponential $\phi$ satisfy
$$\langle x + C\rangle_\phi = \langle x\rangle_\phi + C, \qquad (3)$$
where $C$ is a constant. The random variable
$$I_k = -\log_b p_k \qquad (4)$$
represents the amount of information received by learning that an event of probability $p_k$ took place [10,11]; $b$ specifies the units of information ($b = 2$ corresponds to bits; below we use $b = e$, which is more common in the physics literature). α-entropies were also derived in a purely pragmatic manner in [12] as measures of information for concrete information-theoretic problems.
The above derivation of $I_\alpha$ clearly shows the two elements which led Rényi to the idea of α-entropy: (1) one needs a generalized average, and (2) the random variable one averages is the logarithmic measure of information. The latter has a well-known heuristic explanation which goes back to Hartley [13]: to uniquely specify a single element of a set containing $N$ numbers one needs $\log_2 N$ bits; but if one splits the set into $n$ subsets containing, respectively, $N_1, \ldots, N_n$ elements ($\sum_i N_i = N$), then in order to specify only in which subset the element of interest is located it is enough to have $\log_2 N - \log_2 N_i = \log_2(N/N_i)$ bits of information. This construction ignores the information encoded in correlations between the subsets. For this reason the sum of the pieces of information characterizing the subsets is typically larger than the information characterizing the entire set. The idea is used in data-compression algorithms and is essential for the argument we will present below. In particular, we shall see that for systems with long-range correlations a natural candidate is $I_k = \ln_\alpha(1/p_k)$, where $\ln_\alpha(x) = (x^{1-\alpha} - 1)/(1-\alpha)$ is the deformed logarithm [14] ($\ln_1(\cdot) = \ln(\cdot)$).
Although α-entropies are occasionally used in statistical physics [15,16], the same cannot be said of KN-averages. Given the original motivation behind generalized entropies, one may wonder whether this is not logically inconsistent: when constructing statistical physics with α-entropies, one should consistently apply KN-averaging to all random variables, internal energy included.
∗ Corresponding author. E-mail addresses: [email protected] (M. Czachor), [email protected] (J. Naudts).
0375-9601/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0375-9601(02)00540-6
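A small numerical check can make the construction of (1), (2) and (4) concrete. The sketch below is an illustration written for this text, not code from the Letter: it evaluates Rényi's entropy once as the KN-average of $I_k = -\log_b p_k$ and once from the closed form in (2), and probes the translation property (3) for an exponential KN-function and for the non-exponential choice $\phi(x) = x^2$ on positive arguments (an arbitrary example picked here). The probability vectors and the helper names are hypothetical.

```python
import math


def kn_mean(xs, ps, phi, phi_inv):
    """Kolmogorov-Nagumo average (1): phi^{-1}(sum_k p_k * phi(x_k))."""
    return phi_inv(sum(p * phi(x) for p, x in zip(ps, xs)))


def renyi_via_kn(ps, alpha, b=math.e):
    """I_alpha as the KN-average in (2), using varphi_alpha(x) = b**((1 - alpha) * x)."""
    def phi(x):
        return b ** ((1 - alpha) * x)

    def phi_inv(y):
        return math.log(y, b) / (1 - alpha)

    info = [-math.log(p, b) for p in ps]  # I_k = -log_b p_k, Eq. (4)
    return kn_mean(info, ps, phi, phi_inv)


def renyi_closed_form(ps, alpha, b=math.e):
    """Right-hand side of (2): log_b(sum_k p_k**alpha) / (1 - alpha)."""
    return math.log(sum(p ** alpha for p in ps), b) / (1 - alpha)


def shift_defect(xs, ps, c, phi, phi_inv):
    """<x + C>_phi - (<x>_phi + C): zero for affine or exponential phi, cf. (3)."""
    shifted = kn_mean([x + c for x in xs], ps, phi, phi_inv)
    return shifted - (kn_mean(xs, ps, phi, phi_inv) + c)


if __name__ == "__main__":
    ps = [0.5, 0.25, 0.125, 0.125]
    for alpha in (0.5, 2.0):
        # The last two printed values agree up to rounding.
        print(alpha, renyi_via_kn(ps, alpha, b=2), renyi_closed_form(ps, alpha, b=2))

    xs, half = [1.0, 3.0], [0.5, 0.5]
    # Exponential KN-function: the translation property (3) holds.
    print(shift_defect(xs, half, 1.0, lambda x: math.exp(0.7 * x), lambda y: math.log(y) / 0.7))
    # phi(x) = x**2 on x > 0 is strictly monotonic but neither affine nor exponential.
    print(shift_defect(xs, half, 1.0, lambda x: x * x, math.sqrt))  # approx -0.0738
```

For a uniform distribution over four outcomes both routes give $I_\alpha = 2$ bits for every $\alpha \neq 1$, since all $I_k$ are then equal.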
Applying the procedure to thermostatistics, one may expect to arrive at a one-parameter family of equilibrium states which, in the limit $\alpha \to 1$, reproduce Boltzmann–Gibbs statistics. During the past ten years it has become quite clear that some generalization of standard thermostatistics is needed, as exemplified by the ongoing efforts in Tsallis' $q$-thermodynamics [17]. Systems with long-range correlations, memory effects or fractal boundaries are well described by $q \neq 1$ Tsallis-type equilibria. The gradual development of this theory has shown that there is indeed a link between generalized entropies and generalized averages. However, the averages used in Tsallis' statistics are the standard linear ones, expressed in terms of the so-called escort probabilities (see (14) below). So there is no direct link to KN-averages.
In what follows we present a thermostatistical theory based on KN-averages. The idea is to use maximum entropy principles in which KN-averages are applied on an equal footing to entropies and constraints. As we shall see, there is a link between such a theory and Tsallis' thermostatistics. Actually, many technical developments obtained within the Tsallis scheme have a straightforward application in the new framework. An important difference with respect to the Tsallis theory is that we can obtain both nonextensive and extensive generalizations, so one may expect the formalism to have an even wider scope of applications.
Rényi's definition of entropy (2) becomes more natural if one notices that KN-averages are invariant under $\phi(x) \to A\phi(x) + B$ ($A \neq 0$) and one replaces $\varphi_\alpha$ by
$$\phi_\alpha(x) = \frac{e^{(1-\alpha)x} - 1}{1-\alpha} \equiv \ln_\alpha\big(\exp(x)\big). \qquad (5)$$
The α-entropy written with the help of the modified KN-function (5) is
$$\phi_\alpha^{-1}\Big(\sum_k p_k\,\phi_\alpha(I_k)\Big) = \phi_\alpha^{-1}\Big(\sum_k p_k\,\phi_\alpha(-\ln p_k)\Big) \qquad (6)$$
$$= \phi_\alpha^{-1}\Big(\sum_k p_k \ln_\alpha(1/p_k)\Big) = \phi_\alpha^{-1}\left(\frac{\sum_k p_k^\alpha - 1}{1-\alpha}\right) = I_\alpha. \qquad (7)$$
It is interesting that, in the course of the calculation of $I_\alpha$, the expression for the Havrda–Charvát–Daróczy–Tsallis entropy [15,17–19]
$$S_\alpha(p) = \frac{\sum_k p_k^\alpha - 1}{1-\alpha} \qquad (8)$$
arises. This shows that in the context of KN-means there is an intrinsic relation between $I_\alpha$ and $S_\alpha$:
$$\phi_\alpha(I_\alpha) = S_\alpha. \qquad (9)$$
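The identities (5)–(10) can likewise be checked numerically. The following sketch (helper names such as `phi_alpha` and `renyi_entropy` are chosen here for illustration) computes $I_\alpha$ as the KN-average (6) with the modified function (5), and compares it with the closed form (7), with the Tsallis entropy via (9), and with the pointwise relation (10); it assumes $\alpha \neq 1$ and strictly positive probabilities.

```python
import math


def ln_alpha(x, alpha):
    """Deformed logarithm: (x**(1 - alpha) - 1) / (1 - alpha), for alpha != 1."""
    return (x ** (1 - alpha) - 1) / (1 - alpha)


def phi_alpha(x, alpha):
    """Modified KN-function (5): (exp((1 - alpha) x) - 1) / (1 - alpha)."""
    return math.expm1((1 - alpha) * x) / (1 - alpha)


def phi_alpha_inv(y, alpha):
    """Inverse of (5): log(1 + (1 - alpha) y) / (1 - alpha)."""
    return math.log1p((1 - alpha) * y) / (1 - alpha)


def tsallis_entropy(ps, alpha):
    """Havrda-Charvat-Daroczy-Tsallis entropy S_alpha, Eq. (8)."""
    return (sum(p ** alpha for p in ps) - 1) / (1 - alpha)


def renyi_entropy(ps, alpha):
    """I_alpha as the KN-average (6) of I_k = -ln p_k with the modified phi_alpha."""
    inner = sum(p * phi_alpha(-math.log(p), alpha) for p in ps)
    return phi_alpha_inv(inner, alpha)


if __name__ == "__main__":
    ps = [0.5, 0.3, 0.2]
    for alpha in (0.7, 2.5):
        i_alpha = renyi_entropy(ps, alpha)
        # Eq. (7): agrees with log(sum p**alpha) / (1 - alpha), i.e. (2) with b = e.
        print(i_alpha, math.log(sum(p ** alpha for p in ps)) / (1 - alpha))
        # Eq. (9): phi_alpha(I_alpha) reproduces the Tsallis entropy (8).
        print(phi_alpha(i_alpha, alpha), tsallis_entropy(ps, alpha))
```

Using `expm1` and `log1p` is only a numerical convenience that keeps (5) accurate when $(1-\alpha)x$ is small.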
Let us note that the formula
$$\phi_\alpha(I_k) = \ln_\alpha(1/p_k) \qquad (10)$$
may hold also for other pairs $(\phi_\alpha, I_k)$, with $\phi_\alpha$ not given by (5), and may be valid even for measures of information different from the Hartley–Shannon–Wiener random variable $I_k = -\ln p_k$. The key assumption of the present Letter is that the generalized theory is characterized by the properties (9) and (10). One can see (10) as a definition of $I_k$ in case $\phi_\alpha$ is given, or as a constraint on $\phi_\alpha$ if $I_k$ is given. In particular, the choice $\phi_\alpha(x) = x$ determines Tsallis' thermostatistics as one of the generalized thermostatistical formalisms under consideration here. Generalized thermodynamics is obtained by maximizing $I_\alpha$ under the constraint of fixed internal energy
$$\langle\beta_0 H\rangle_{\phi_\alpha} = \phi_\alpha^{-1}\Big(\sum_k p_k\,\phi_\alpha(\beta_0 E_k)\Big) = \beta_0 U, \qquad (11)$$
where $\beta_0$ is a constant needed to make the averaged energy dimensionless. Equivalently, using (9), the problem may be reformulated as maximizing $S_\alpha$, given by (8), under the constraint
$$\sum_k p_k\,\phi_\alpha(\beta_0 E_k) = \phi_\alpha(\beta_0 U). \qquad (12)$$
This problem is of the type originally considered by Tsallis [17]. However, since then the formalism of nonextensive thermostatistics has evolved. In particular, one has learned [20] that the optimization problem should be reparametrized, for the following reasons. The standard thermodynamic relation for temperature $T$ is
$$\frac{1}{T} = \frac{dS}{dU}, \qquad (13)$$
with $S$ and $U$, respectively, the entropy and energy calculated using the equilibrium averages. In generalized thermostatistics this definition of temperature is not necessarily correct. Recently it has been shown [21,22] that (13) is valid if the entropy is additive and must be modified in all other cases. The reparametrization of nonextensive thermostatistics, by introduction of escort probabilities, is such that the energy $U$ becomes generically an increasing function of some (unphysical) temperature $T^*$ (see, e.g., Proposition 3.5 of [23]), which is then related to the physical temperature $T$. The reparametrization is done by means of $q \leftrightarrow 1/q$ duality [20,24]. The escort probabilities $\rho_k$ are defined by
$$\rho_k = \frac{p_k^\alpha}{\sum_{k'} p_{k'}^\alpha}. \qquad (14)$$
One clearly also has the inverse relation
$$p_k = \frac{\rho_k^q}{\sum_{k'} \rho_{k'}^q}, \qquad (15)$$
with $q = 1/\alpha$. The above optimization problem is now equivalent to maximizing $S_q(\rho)$ under the constraint
$$\frac{\sum_k \rho_k^q\,\phi_{1/q}(\beta_0 E_k)}{\sum_k \rho_k^q} = \phi_{1/q}(\beta_0 U). \qquad (16)$$
This is so because $S_q(\rho)$ is maximal if and only if $S_{1/q}(p)$ is maximal (see [24]). The latter optimization problem is of the type studied in the new-style nonextensive thermostatistics [20]. The free energy $F$ is defined by
$$\beta_0 F = \frac{\sum_k \rho_k^q\,\phi_\alpha(\beta_0 E_k)}{\sum_k \rho_k^q} - \beta_0 T^* S_q(\rho). \qquad (17)$$
Minima of $\beta_0 F$, if they exist [23,25], are realized for distributions of the form [20]
$$\rho_k \sim \frac{1}{[1 + a x_k]^{1/(q-1)}} \quad \text{if } 1 < q, \qquad (18)$$
or
$$\rho_k \sim [1 - a x_k]_+^{1/(1-q)} \quad \text{if } 0 < q < 1. \qquad (19)$$
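As a sanity check of the reparametrization, the sketch below (an illustration written for this text; the probabilities and the values standing in for $x_k$ are arbitrary) maps $p_k$ to escort probabilities with (14), recovers $p_k$ with (15) for $q = 1/\alpha$, and confirms that the escort-weighted average on the left of (16) equals the linear average $\sum_k p_k x_k$ that appears in (12). It also follows directly from (14) that $\sum_k \rho_k^q = (\sum_k p_k^\alpha)^{-1/\alpha}$.

```python
def escort(ps, alpha):
    """Escort probabilities (14): rho_k = p_k**alpha / sum_j p_j**alpha."""
    norm = sum(p ** alpha for p in ps)
    return [p ** alpha / norm for p in ps]


def from_escort(rhos, q):
    """Inverse relation (15): p_k = rho_k**q / sum_j rho_j**q, with q = 1/alpha."""
    norm = sum(r ** q for r in rhos)
    return [r ** q / norm for r in rhos]


def escort_mean(values, rhos, q):
    """Left-hand side of (16): sum_k rho_k**q v_k / sum_k rho_k**q."""
    weights = [r ** q for r in rhos]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    ps = [0.4, 0.3, 0.2, 0.1]
    alpha = 0.6
    q = 1 / alpha
    rhos = escort(ps, alpha)
    # Hypothetical stand-ins for x_k = phi_alpha(beta0 E_k); any real values work here.
    xs = [0.0, 0.9, 2.1, 3.5]
    print(from_escort(rhos, q))  # recovers ps up to rounding
    # The escort-weighted mean in (16) equals the linear mean sum_k p_k x_k used in (12).
    print(escort_mean(xs, rhos, q), sum(p * x for p, x in zip(ps, xs)))
```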
Here $x_k = \phi_\alpha(\beta_0 E_k)$ and $[x]_+$ equals $x$ if $x$ is positive, zero otherwise. Expression (18), with $1/(q-1)$ replaced by $1 + \kappa$, is called the kappa-distribution or generalized Lorentzian distribution [26]. There are several reasons why this distribution is of interest. In the first place, the Gibbs distribution, which determines the equilibrium average in the standard setting of thermodynamics [27], is obtained in the limit $\kappa \to +\infty$, or $q \to 1$. Moreover, the kappa-distribution is frequently used; for example, in plasma physics it describes an excess of highly energetic particles [28]. Typical for distribution (19) is that the probabilities $p_k$ are identically zero whenever $a x_k \geq 1$. This cutoff for high values of $E_k$ is of interest in many areas of physics. In astrophysics it has been used [29] to describe stellar systems with finite average mass. A statistical description of an electron captured in a Coulomb potential requires the cutoff to mask scattering states [24,30]. In standard statistical mechanics the treatment of vanishing probabilities requires infinite energies, which lead to ambiguities. These can be avoided if distributions of the type (19) are used.
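The limits quoted above can be illustrated with unnormalized weights. In this sketch the scaling $a = |q-1|\,\beta$ is an assumption introduced here (the text only states that the Gibbs distribution is recovered as $q \to 1$); with it, both (18) and (19) approach $e^{-\beta x}$, while (19) vanishes identically once $a x \geq 1$. The value of $\beta$ and the sample points are arbitrary.

```python
import math


def kappa_weight(x, a, q):
    """Unnormalized rho_k of (18), for q > 1: [1 + a x]**(-1 / (q - 1))."""
    return (1 + a * x) ** (-1 / (q - 1))


def cutoff_weight(x, a, q):
    """Unnormalized rho_k of (19), for 0 < q < 1: [1 - a x]_+**(1 / (1 - q))."""
    return max(1 - a * x, 0.0) ** (1 / (1 - q))


if __name__ == "__main__":
    beta = 0.8                       # illustrative value only
    xs = [0.0, 0.5, 1.0, 2.0, 4.0]
    for eps in (1e-1, 1e-2, 1e-3):
        # Assumed scaling a = |q - 1| * beta; both deviations shrink as q -> 1.
        up = max(abs(kappa_weight(x, eps * beta, 1 + eps) - math.exp(-beta * x)) for x in xs)
        down = max(abs(cutoff_weight(x, eps * beta, 1 - eps) - math.exp(-beta * x)) for x in xs)
        print(eps, up, down)
    # Cutoff of (19): the weight vanishes as soon as a x >= 1.
    print([cutoff_weight(x, 0.25, 0.5) for x in xs])  # [1.0, 0.765625, 0.5625, 0.25, 0.0]
```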
The formulas that follow are based on results already found in the literature in many places, e.g., in [23]. The equilibrium average is the KN-average with $p_k$ given by
$$p_k = \frac{1}{Z_1}\Big[1 + a(1-\alpha)(x_k - u)\Big]_+^{1/(\alpha-1)}, \qquad (20)$$
with $x_k = \phi_\alpha(\beta_0 E_k)$ and with the normalization constant $Z_1$ given by the $n = 1$ case of
$$Z_n = \sum_k \Big[1 + a(1-\alpha)(x_k - u)\Big]_+^{-n+\alpha/(\alpha-1)}, \qquad n = 0, 1. \qquad (21)$$
The unknown parameters $a > 0$ and $u$ have to be fixed in such a way that (12) holds. This condition can be written as
$$\phi_\alpha(\beta_0 U) = u + \frac{1}{(1-\alpha)a}\left(\frac{Z_0}{Z_1} - 1\right). \qquad (22)$$
The entropy $I_\alpha$ follows from (8) with (20). One obtains
$$\phi_\alpha(I_\alpha) = \frac{1}{1-\alpha}\left(\frac{Z_0}{Z_1^\alpha} - 1\right). \qquad (23)$$
Temperature $T^*$ is given by (cf. Eq. (14) in [23])
$$T^* = \frac{Z_0^{(1+\alpha)/\alpha}}{a\alpha Z_1^2}. \qquad (24)$$
The set of Eqs. (20)–(24) is what is needed for applications.
Let us finally return to the specific case of Rényi's entropy, i.e., $I_k$ and $\phi_\alpha$ given, respectively, by (4) and (5). This choice is particularly interesting since only then are the following three conditions satisfied:
$$\langle\beta_0 H + \beta_0 E\rangle_{\phi_\alpha} = \langle\beta_0 H\rangle_{\phi_\alpha} + \beta_0 E, \qquad (25)$$
$$\langle\beta_0 H_{A+B}\rangle_{\phi_\alpha} = \langle\beta_0 H_A\rangle_{\phi_\alpha} + \langle\beta_0 H_B\rangle_{\phi_\alpha}, \qquad (26)$$
$$I_\alpha(A+B) = I_\alpha(A) + I_\alpha(B), \qquad (27)$$
where $A$ and $B$ are two uncorrelated noninteracting systems. Condition (25), combined with the explicit form of the equilibrium state, means that the equilibrium does not depend on the origin of the energy scale. The remaining two conditions imply that we have a one-parameter family of extensive generalizations of Boltzmann–Gibbs statistics, the latter being recovered in the limit $\alpha \to 1$. For $\alpha = q^{-1} \neq 1$ we obtain the well-known Tsallis-type kappa-distributions, but with energies $\beta E_k$ replaced by $\phi_\alpha(\beta_0 E_k)$.
In general, the equilibrium probabilities are not of the product form (there is one exception; see below). The product form is of course also absent in the standard formalism when there are correlations between subsystems. Nevertheless, if the correlations are not too strong, the system in equilibrium is still extensive. This is expressed by stating that the so-called thermodynamic limit exists. We expect that the thermodynamic limit also exists in the present formalism; we have checked this statement for the Curie–Weiss model [33]. Consider now the case $a = 1/(1 + (1-\alpha)u)$ in (20). This case is remarkable because the equilibrium distribution (20) becomes exponential. Indeed, one verifies that
$$p_k = \frac{1}{Z_1} e^{-\beta_0 E_k}, \qquad \text{with } Z_1 = \sum_k e^{-\beta_0 E_k}. \qquad (28)$$
The internal energy equals
$$\beta_0 U = \frac{1}{1-\alpha}\ln\left(\frac{1}{Z_1}\sum_k e^{-\alpha\beta_0 E_k}\right) = I_\alpha - \ln Z_1. \qquad (29)$$
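The closed forms (22) and (23), and the exponential case (28)–(29), lend themselves to a direct numerical check. The sketch below assumes the Rényi KN-function (5), a hypothetical list of energy levels and arbitrary values of $\alpha$, $\beta_0$, $a$ and $u$; it does not solve (22) for $a$ and $u$ at a prescribed $U$, but only compares the closed forms with direct evaluation of (12) and (8). The function names are illustrative.

```python
import math


def phi_alpha(x, alpha):
    """KN-function (5): (exp((1 - alpha) x) - 1) / (1 - alpha)."""
    return math.expm1((1 - alpha) * x) / (1 - alpha)


def phi_alpha_inv(y, alpha):
    """Inverse of (5)."""
    return math.log1p((1 - alpha) * y) / (1 - alpha)


def equilibrium(energies, alpha, beta0, a, u):
    """Return x_k, p_k of (20) and Z_0, Z_1 of (21) for given a > 0 and u."""
    xs = [phi_alpha(beta0 * e, alpha) for e in energies]
    bases = [1 + a * (1 - alpha) * (x - u) for x in xs]
    if alpha < 1 and min(bases) <= 0:
        raise ValueError("the bracket in (20) must stay positive for alpha < 1")
    bases = [max(b, 0.0) for b in bases]  # positive part [.]_+; a cutoff only occurs for alpha > 1
    # -n + alpha/(alpha-1) equals 1/(alpha-1) for n = 1 and alpha/(alpha-1) for n = 0.
    w1 = [b ** (1 / (alpha - 1)) if b > 0 else 0.0 for b in bases]
    w0 = [b ** (alpha / (alpha - 1)) if b > 0 else 0.0 for b in bases]
    z1, z0 = sum(w1), sum(w0)
    return xs, [w / z1 for w in w1], z0, z1


def residuals(energies, alpha, beta0, a, u):
    """Differences between direct evaluation and the closed forms (22) and (23)."""
    xs, ps, z0, z1 = equilibrium(energies, alpha, beta0, a, u)
    lhs22 = sum(p * x for p, x in zip(ps, xs))               # phi_alpha(beta0 U) via (12)
    rhs22 = u + (z0 / z1 - 1) / ((1 - alpha) * a)
    lhs23 = (sum(p ** alpha for p in ps) - 1) / (1 - alpha)  # S_alpha (8) = phi_alpha(I_alpha) by (9)
    rhs23 = (z0 / z1 ** alpha - 1) / (1 - alpha)
    return lhs22 - rhs22, lhs23 - rhs23


if __name__ == "__main__":
    energies = [0.0, 0.7, 1.3, 2.0, 3.1]   # hypothetical energy levels
    beta0 = 1.5
    print(residuals(energies, 0.6, beta0, a=0.8, u=0.1))  # both close to 0
    print(residuals(energies, 1.5, beta0, a=0.5, u=0.2))  # both close to 0

    # Exponential case (28)-(29): a = 1 / (1 + (1 - alpha) u).
    alpha, u = 0.6, 0.4
    xs, ps, _, _ = equilibrium(energies, alpha, beta0, 1 / (1 + (1 - alpha) * u), u)
    boltzmann = [math.exp(-beta0 * e) for e in energies]
    z = sum(boltzmann)
    print(max(abs(p - w / z) for p, w in zip(ps, boltzmann)))  # close to 0, Eq. (28)
    beta0_u = phi_alpha_inv(sum(p * x for p, x in zip(ps, xs)), alpha)  # KN-average (11)
    renyi = math.log(sum(p ** alpha for p in ps)) / (1 - alpha)
    print(beta0_u, renyi - math.log(z))  # equal up to rounding, Eq. (29)
```

In an actual application one would still need a root finder to fix the parameters so that (22) holds for the prescribed $U$; that step is not shown.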
The exponential form (28) means that for each system there exists a particular temperature at which the equilibrium state is factorizable. Still assuming that $\phi_\alpha$ is given by (5), one can easily calculate the thermodynamic temperature $T$, as given by (13). One finds
$$\beta_0 T = \frac{\frac{d}{da}\,\beta_0 U}{\frac{d}{da}\,I_\alpha} = \frac{1}{a\alpha}\,\frac{1 + a(1-\alpha)\big(\phi_\alpha(\beta_0 U) - u\big)}{1 + (1-\alpha)\phi_\alpha(\beta_0 U)}. \qquad (30)$$
This expression can be used to eliminate $a$ from (20). With some effort one obtains
$$p_k \sim \left[1 + (1-\alpha)\phi_\alpha(\beta_0 U) + \frac{1-\alpha}{\alpha\beta_0 T}\Big(\phi_\alpha(\beta_0 E_k) - \phi_\alpha(\beta_0 U)\Big)\right]^{1/(\alpha-1)}. \qquad (31)$$
Using definition (5) of $\phi_\alpha$, one obtains
$$p_k = A\left[1 - \lambda + \lambda\, e^{(1-\alpha)\beta_0(E_k - U)}\right]^{1/(\alpha-1)}, \qquad (32)$$
where $A$ is the appropriate normalization constant and $\lambda = (\alpha\beta_0 T)^{-1}$. From this result it is immediately clear that the Boltzmann–Gibbs distribution follows in the limit $\alpha \to 1$. From (30) one sees that the special temperature $T$ for which $a = (1 + (1-\alpha)u)^{-1}$ holds is $T = 1/(\alpha\beta_0)$, i.e., $\lambda = 1$, in which case (32) reduces to $p_k = A\,e^{-\beta_0(E_k - U)}$. Formula (32) also shows that $\beta_0$ controls the cross-over between different regimes of the temperature dependence of $p_k$ and is the energy analog of the cross-over time $t_0$ used in [32]. This clarifies the meaning of $\beta_0$. It is quite remarkable that (32) is exactly the probability density postulated in [31] on an ad hoc basis in order to improve theoretical fits to experimental protein-folding data [32]. Although the original motivation of [31] was to interpret (32) as a signature of nonextensivity, we have derived it on the basis of Rényi's entropy, which is extensive. However, as shown in [31], the long-tail data are still better described by a small deviation from (32). An appropriate nonextensive (although inequivalent to Tsallis') departure from Rényi's $\phi_\alpha$ is given by
$$\phi_{\alpha\rho}(x) = \ln_\alpha\big(\exp_\rho(x)\big) = \frac{1}{1-\alpha}\left(\big[1 + (1-\rho)x\big]^{(1-\alpha)/(1-\rho)} - 1\right), \qquad (33)$$
where $\exp_\rho(x) = [1 + (1-\rho)x]^{1/(1-\rho)}$ is the deformed exponential, the inverse of $\ln_\rho$. The corresponding equilibrium distribution is of the form (assuming $U = 0$, see [33])
$$p_k = A\left[1 - \lambda + \lambda\big(1 + (1-\rho)\beta_0 E_k\big)^{(1-\alpha)/(1-\rho)}\right]^{-1/(1-\alpha)}. \qquad (34)$$
In the limit $\rho \to 1$ this expression coincides with (32). The parameter $\rho$ controls the tail of the distribution $p_k$ (see Fig. 1). Recently, Montemurro [34] analyzed linguistic data using (32) and further generalizations proposed in [31]. From our point of view, (34) is the obvious generalization of (32). Therefore we have reanalyzed some of the fits made in [34]. Of particular interest are the compound data from a corpus of 2606 books in English, containing 448,359 different words. Surprisingly, a convincing 4-parameter fit to all 448,359 data points is possible using (34) in the limit $\alpha \to 0$, assuming $E_k = k$ (see Fig. 2). The fitting parameters are $\rho = 0.568$, $\beta_0 = 1/3086$, and $\lambda = 2630$. The slope of the tail equals $1/(1-\rho) \approx 2.32$, in agreement with the value 2.3 mentioned in [34]. The root mean square error of the fit in the log–log plot equals 0.27.
Fig. 1. Log–log plot of $p_k$, with $E_k = k$, $\beta_0 = 1/100$, $\alpha = 0.1$ and $\lambda = 500$, not normalized ($A = 1$), for different values of $\rho$: 0.1 dotted, 0.8 short-dashed, 1.0 solid, 1.08 long-dashed.
Fig. 2. Log–log plot of the frequency of words as a function of their ranking. Comparison of experimental data (solid line) and fitted curve (dotted line).
Let us summarize the results. We present a formalism of thermostatistics based on nonlinear KN-averages. Entropy is maximized under the constraint that the nonlinear average of the energy $E$ equals a given value $U$. If the energy does not fluctuate, then linear and nonlinear averages coincide and our approach reduces to the standard one. However, in interesting systems the energy does fluctuate, in which case we obtain new results. Our formalism simultaneously generalizes the Boltzmann–Gibbs and Tsallis theories. As opposed to the Tsallis case, which is always nonextensive, the KN approach allows for a family of extensive generalizations, which nevertheless lead to equilibrium states sharing many properties with Tsallis $q \neq 1$ distributions,
via the relation between $I_\alpha$ and $S_\alpha$. The extensive case corresponds to the choice $\phi_\alpha(x) = \ln_\alpha(\exp x)$, since then the average information coincides with Rényi's entropy. As proved by Rényi, his entropies and that of Shannon [10] are the only additive entropies. As shown in [21,22], additivity of entropy is a requirement for the physical temperature $T$ to be defined by the usual thermodynamic relation (13). The formalism generalizes to other nonexponential choices of $\phi$, provided the information measure is adapted in such a way that (9) and (10) still hold. In this more general context entropy is no longer additive. Tsallis' entropy appears naturally as a tool for calculating equilibrium averages. This offers the opportunity to reuse the knowledge gained in Tsallis-like thermostatistics. A tempting question is whether, in each of the many applications of Tsallis' thermostatistics, one can find a natural KN-average which maps the problem into the present formalism. In [33] we discuss the present formalism from a more fundamental point of view and give explicit examples. Here we mention only that the results for a two-level system and the Curie–Weiss model are satisfactory. Of course, more complicated examples should be studied.
For the sake of completeness, let us mention that Rényi's entropy has already been studied [16] in relation to escort probabilities (14). One of the conclusions of that paper is that the same results are obtained as in Tsallis' thermostatistics, which is not a surprise since Rényi's entropy and Tsallis' entropy are monotonic functions of each other. The cross-over property of our $p_k$, a consequence of the KN-averaged constraints of our formalism, is absent for the distributions found in [16], since their constraints employ linear averages.
Acknowledgements
We are grateful to Dr. Montemurro for making his numerical data available. One of the authors (M.C.) wishes to thank NATO for a research fellowship enabling his stay at the Universiteit Antwerpen.
References
[1] A. Kolmogorov, Atti R. Accad. Naz. Lincei 12 (1930) 388.
[2] M. Nagumo, Japan. J. Math. 7 (1930) 71.
[3] B. de Finetti, Giornale di Istituto Italiano del Attuarii 2 (1931) 369.
[4] B. Jessen, Acta Sci. Math. 5 (1931) 108.
[5] T. Kitagawa, Proc. Phys. Math. Soc. Japan 16 (1934) 117.
[6] J. Aczél, Bull. Amer. Math. Soc. 54 (1948) 392.
[7] A. Rényi, MTA III. Oszt. Közl. 10 (1960) 251, reprinted in [35], pp. 526–552.
[8] A. Rényi, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1961, pp. 547–561, reprinted in [35], pp. 565–580.
[9] G.H. Hardy, J.E. Littlewood, G. Pólya, Inequalities, Cambridge, 1934, Theorem 89.
[10] C.E. Shannon, Bell System Tech. J. 27 (1948) 379; C.E. Shannon, Bell System Tech. J. 27 (1948) 623.
[11] N. Wiener, Cybernetics, Wiley, New York, 1948.
[12] A. Rényi, Rev. Inst. Internat. Stat. 33 (1965) 1, reprinted in [35], pp. 304–318.
[13] R.V. Hartley, Bell System Tech. J. 7 (1928) 535.
[14] C. Tsallis, Quim. Nova 17 (1994) 468; E.P. Borges, J. Phys. A 31 (1998) 5281.
[15] A. Wehrl, Rev. Mod. Phys. 50 (1978) 221.
[16] E.K. Lenzi, R.S. Mendes, L.R. da Silva, Physica A 280 (2000) 337.
[17] C. Tsallis, J. Stat. Phys. 52 (1988) 479.
[18] J. Havrda, F. Charvát, Kybernetika 3 (1967) 30.
[19] Z. Daróczy, Inform. Control 16 (1970) 36.
[20] C. Tsallis, R.S. Mendes, A.R. Plastino, Physica A 261 (1998) 543.
[21] S. Abe, A. Martinez, F. Pennini, A. Plastino, Phys. Lett. A 281 (2001) 126; S. Martinez, F. Pennini, A. Plastino, Physica A 295 (2001) 246; S. Martinez, F. Pennini, A. Plastino, Physica A 295 (2001) 416.
[22] R. Toral, cond-mat/0106060.
[23] J. Naudts, Rev. Math. Phys. 12 (2000) 1305.
[24] J. Naudts, Chaos Solitons Fractals 13 (3) (2002) 445.
[25] J. Naudts, M. Czachor, in: S. Abe, Y. Okamoto (Eds.), Nonextensive Statistical Mechanics and its Applications, Lecture Notes in Physics, Vol. 560, Springer, 2001, pp. 243–252.
[26] A.V. Milovanov, L.M. Zelenyi, Nonlinear Processes Geophys. 7 (2000) 211.
[27] E.T. Jaynes, Phys. Rev. 106 (1957) 620.
[28] N. Meyer-Vernet, M. Moncuquet, S. Hoang, Icarus 116 (1995) 202.
[29] A.R. Plastino, A. Plastino, Phys. Lett. A 174 (1993) 384.
[30] L.S. Lucena, L.R. da Silva, C. Tsallis, Phys. Rev. E 51 (1995) 6247.
[31] C. Tsallis, G. Bemski, R.S. Mendes, Phys. Lett. A 257 (1999) 93.
[32] R.H. Austin et al., Phys. Rev. Lett. 32 (1974) 403.
[33] J. Naudts, M. Czachor, cond-mat/0110077.
[34] M. Montemurro, Physica A 300 (3) (2001) 567.
[35] Selected Papers of Alfréd Rényi, Vol. 2, Akadémiai Kiadó, Budapest, 1976.