Emergence of power laws from partitional dynamics


BioSystems 74 (2004) 63–71

Antonio García-Olivares a,∗, Pedro C. Marijuán b

a ICM (CSIC), Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona, Spain
b Fundación CIRCE, CPS Universidad de Zaragoza, Zaragoza 50018, Spain

Received 5 December 2003; accepted 17 February 2004

Abstract

This work explores the possibility of producing some specific power laws (with exponential cut-off) from a partitional dynamics. The data obtained from the partitions of integers in the interval 1–60 have been used to find the coefficients of a power law and its exponential cut-off, and also of a single hyperbolic form after renormalization. There is also the speculation that this type of power law, so easily derived from minimalist arithmetic operations, may underlie the communication exchanges of living cells and the structural games of self-constructing agents endowed with relative structural freedom.

© 2004 Elsevier Ireland Ltd. All rights reserved.

Keywords: Power law; Exponential cut-off; Self-organization; Complex systems; Partitional canon; Information; Self-construction

1. Introduction

The ubiquity of power laws in the most varied settings, whether physical, biological (biomolecular, vegetal, animal, neuronal), linguistic, social, economic, or technological, is an amazing scientific fact, often not very well explained in its dynamical assumptions (Gell-Mann, 1994). One can find, for instance, historical conceptualizations such as Pareto’s law, Zipf’s law, the Gutenberg–Richter law, or the classical debates on organismic allometries in biology. Also, the late Gordon Scarrott (1998) pointed out recursive organization as a basic trait in the emergence of human information systems, implying the subsequent presence of hyperbolic (or power-law) distributions in the most varied communicational and economic contexts of human societies. But perhaps the most fruitful approaches nowadays are based on self-similarity and critical phenomena (Bak, 1996; Sethna et al., 2001).

Assuming that the ubiquity of power laws can often be interpreted as self-organized criticality emerging from various types of dynamic equations (Bak, 1996), here we are going to explore the plausibility of a direct ‘arithmetic’ approach. Indeed, as argued by Marijuán et al. (1998), a very simple way to obtain a numerical distribution very close to a power law is by considering the total set of partitional summands of a given integer n (later on, we will discuss the intriguing differences that appear between both distributions). That is, in the elementary arithmetic operation of breaking an integer n into all of its possible countable parts, we obtain a series of elements known as partitions (see following text), each one composed of summands or parts that add up to the value n. Then, the distribution formed by all the summands or parts of all partitions, considered together in a single total set, shows regular ‘lawful’ features that can be related to power laws.

∗ Corresponding author. Tel./fax: +34-932309515. E-mail addresses: [email protected] (A. García-Olivares), [email protected] (P.C. Marijuán).

0303-2647/$ – see front matter © 2004 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.biosystems.2004.02.005

A. García-Olivares, P.C. Marijuán / BioSystems 74 (2004) 63–71

Although partitions were discovered long ago by the mathematician Leonhard Euler, only in recent decades have they become a subject of intense research, mostly for cryptographic purposes. However, research on different aspects of partitions was kept alive by several distinguished mathematicians, including Ramanujan and Erdős. In recent years, and working outside the mainstream, Karl Javorszky (1995, 2003) has produced a number of intriguing mathematical structures stemming from partitional operations. His ‘System M’ series of quasi-Fibonacci numbers, as well as other suggestions about the importance of partitional counting in the organization of biological phenomena, have motivated the present authors to explore the possibility that a general approach to self-organized criticality and extremal dynamics might also be obtained through partitional operations. The explanatory potential of partitions in relation to the communicational dynamics of living cells (the ‘language of cells’) and, in general, concerning the multiple self-construction processes based on ‘informational dynamics,’ has already been explored by these authors elsewhere (Marijuán et al., 1998; Marijuán and Villarroel, 1998).

In what follows, we will introduce algorithmic procedures to obtain the partitions of any integer n (in the interval 1–60) and to estimate the probability of each summand k in the total partitional set. We will subsequently observe a wide zone of quasi-linear behavior for small and middle-range values of k, suggesting a power-law behavior for these predominant values. For larger values of k, however, the behavior gradually changes into a steeper decay which resembles the “exponential cut-off” that has been observed in many power-law models of the real world. The general coefficients of the power-law fit have been estimated by means of the Mathematica program, as have the parameters of a single hyperbolic functional form (obtained after renormalization of the two coordinate axes for the whole set of curves).
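As a concrete illustration of the "total set of summands" mentioned above, the following Python sketch enumerates every partition of a small integer by brute force and counts how often each summand k occurs. The function names `partitions` and `summand_counts` are hypothetical, and this direct enumeration is only an illustration, not the Mathematica procedure used by the authors.

```python
from collections import Counter


def partitions(n, largest=None):
    """Yield every partition of n as a non-increasing tuple of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    # Choose the first (largest) part, then partition the remainder
    # using parts that do not exceed it.
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def summand_counts(n):
    """Count how many times each summand k appears in all partitions of n."""
    counts = Counter()
    for parts in partitions(n):
        counts.update(parts)
    return counts


counts = summand_counts(5)
total = sum(counts.values())
for k, c in sorted(counts.items()):
    # Empirical probability of summand k in the total partitional set of 5.
    print(k, c, c / total)
```

Brute-force enumeration grows very quickly with n, so it is only practical for small integers; the counting formulas of the next section avoid listing the partitions explicitly.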

2. Material and methods

A numerical partition of an integer n into summands is a sequence p1 ≥ p2 ≥ · · · ≥ pk > 0 such that p1 + p2 + · · · + pk = n. Each pi is called a part. As an example, the set of partitions of 5 into its summands is:

S = {{5}, {4, 1}, {3, 2}, {3, 1, 1}, {2, 2, 1}, {2, 1, 1, 1}, {1, 1, 1, 1, 1}}

which shows that the number 5 has one partition in one part, two partitions in two parts, two partitions in three parts, one partition in four parts and one partition in five parts. The total number of partitions of n in k parts, p(n, k), can be obtained by means of the following recursive algorithm:

$$p(n,k) = \begin{cases} 1, & k = n \\ 0, & k > n \ \text{or} \ (k = 0 \ \text{and} \ n > 0) \\ p(n-1,k-1) + p(n-k,k), & 0 < k < n \end{cases} \qquad (1)$$

The total number of partitions of n can then be calculated by means of the following expression:

$$p_t(n) = \begin{cases} 1, & n = 0 \\ \sum_{kk=1}^{n} p(n,kk), & n > 0 \end{cases} \qquad (2)$$

With this formula it is easy to confirm that the total number of partitions of 5 is 7. Ramanujan obtained a surprising and useful formula to approximate pt(n) when n is large (see Javorszky, 1995):

$$p_t(n) = \frac{1}{\pi\sqrt{2}} \sum_{k=1}^{\infty} \sqrt{k}\, A_k(n)\, \frac{d}{dn} \left[ \frac{\sinh\left( \frac{\pi}{k} \sqrt{\frac{2}{3}} \sqrt{n - \frac{1}{24}} \right)}{\sqrt{n - \frac{1}{24}}} \right] \qquad (3)$$

where

$$A_k(n) = \sum_{0 \le h < k,\ \gcd(h,k) = 1} \exp\left(\pi i\, s(h,k)\right) \exp\left(-\frac{2\pi i h n}{k}\right)$$

and

$$s(h,k) = \sum_{j=1}^{k-1} \left(\left(\frac{j}{k}\right)\right) \left(\left(\frac{hj}{k}\right)\right)$$

where

$$((x)) = \begin{cases} x - [x] - 1/2, & \text{if } x \text{ is not an integer} \\ 0, & \text{if } x \text{ is an integer} \end{cases}$$
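Expressions (1) to (3) can be sketched in Python as follows. This is a minimal sketch and not the authors' Mathematica code: the function names, the number of series terms kept (`terms`) and the analytic form of the derivative in (3) are assumptions made for illustration, since the text does not state how the series was truncated.

```python
import cmath
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def p(n, k):
    """Partitions of n into exactly k parts, recursion (1)."""
    if k == n:
        return 1
    if k > n or k == 0:
        return 0
    return p(n - 1, k - 1) + p(n - k, k)


def pt(n):
    """Total number of partitions of n, expression (2)."""
    if n == 0:
        return 1
    return sum(p(n, kk) for kk in range(1, n + 1))


def sawtooth(num, den):
    """((num/den)): x - [x] - 1/2 for non-integer x, 0 for integer x."""
    r = num % den
    return 0.0 if r == 0 else r / den - 0.5


def s(h, k):
    """The sum s(h, k) entering A_k(n)."""
    return sum(sawtooth(j, k) * sawtooth(h * j, k) for j in range(1, k))


def A(k, n):
    """A_k(n), summed over 0 <= h < k with gcd(h, k) = 1 (a real number)."""
    total = 0j
    for h in range(k):
        if math.gcd(h, k) == 1:
            total += cmath.exp(1j * math.pi * (s(h, k) - 2 * h * n / k))
    return total.real


def pt_series(n, terms=10):
    """Series (3) truncated after `terms` terms (assumed cut-off), n >= 1."""
    x = n - 1 / 24
    root = math.sqrt(x)
    total = 0.0
    for k in range(1, terms + 1):
        c = math.pi / k * math.sqrt(2 / 3)
        # d/dn of sinh(c*sqrt(x))/sqrt(x), written out analytically.
        deriv = (c * math.cosh(c * root) / (2 * x)
                 - math.sinh(c * root) / (2 * x * root))
        total += math.sqrt(k) * A(k, n) * deriv
    return total / (math.pi * math.sqrt(2))


for n in (5, 20, 60):
    exact = pt(n)
    approx = pt_series(n, terms=10)
    print(n, exact, approx, abs(approx - exact) / exact)
```

The size of the discrepancy between the two columns depends on how many terms are kept, so this sketch does not attempt to reproduce the error curves of Figs. 1 and 2.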

Fig. 1 compares the absolute error obtained by calculating pt(n) with Eq. (3) instead of using the exact definition (2). As can be observed, Ramanujan’s formula is not an accurate explicit expression. However, the relative error obtained by using this formula decreases when n becomes large, as can be observed in Fig. 2.

Fig. 1. Absolute error obtained by calculating pt (n) with Eq. (3) instead of using the exact definition (2). The horizontal axis represents increasing values of n.

Fig. 2. Relative error obtained by calculating pt (n) with Eq. (3) instead of using the exact definition (2). The horizontal axis represents increasing values of n.

According to the partitional dynamics proposed by Marijuán et al. (1998), a number that could prove far more interesting is ns, the number of times that a given summand k appears in the whole set of partitions of n. This number can be obtained by using the following property (see Herb Wilf, in the web page: www.theory.csc.uvic.ca/~cos/inf/nump/NumPartition.html):

$$n_s(n,k) = \begin{cases} 1, & n = k \\ p_t(n-k) + p_t(n-2k) + p_t(n-3k) + \cdots, & n > k \end{cases} \qquad (4)$$

where the sum runs over all the terms with a non-negative argument.

A program to generate the ns(n, k) numbers with this algorithm has been implemented by means of the Mathematica symbolic processor (Wolfram Research).

3. Number of times that a given summand k appears as a power law

Fig. 3 shows log–log plots of the values of ns(n, k) versus k for different n values, n = 1–60. From bottom to top, the curves represent the values obtained when n goes from 1 to 60. The horizontal axis represents the different values of k, from 1 to n. The figure shows a wide zone of quasi-linear behavior for each individual curve of constant n, especially for small values of k, suggesting a power-law behavior for these values. For large values of k, however, the behavior gradually changes into a steeper decay which resembles the “exponential cut-off” that has been observed in many power-law models of the real world (for an illustration, see pp. 96 and 111 in Bak, 1996). The behavior is similar for larger values of n, for instance n = 80, as can be observed in Fig. 4, which shows the same two zones by plotting ns(n, k) for n = 80 and k from 1 to 80. Fig. 5 shows k·ns(80, k) versus k. The plateau observed for small values of k indicates a tendency to the “equidistribution of mass” for the small parts of the partition.
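The quantities plotted in Figs. 3 to 5 can be generated with a few lines of Python. In this sketch the partition numbers pt(n) are tabulated with a standard dynamic-programming count (a swapped-in technique, not the recursion (1) of the paper), and ns(n, k) then follows from property (4). The names `partition_numbers` and `ns` are hypothetical.

```python
def partition_numbers(n_max):
    """Return [pt(0), pt(1), ..., pt(n_max)] by dynamic programming."""
    ways = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            ways[total] += ways[total - part]
    return ways


def ns(n, k, pt):
    """Number of times summand k appears in all partitions of n, property (4)."""
    # pt(n - k) + pt(n - 2k) + ... over the non-negative arguments.
    return sum(pt[n - m * k] for m in range(1, n // k + 1))


pt = partition_numbers(80)

# "Mass" k * ns(80, k) carried by each summand size (the quantity of Fig. 5).
for k in range(1, 9):
    print(k, ns(80, k, pt), k * ns(80, k, pt))

# Sanity check: the parts of all partitions of n add up to n * pt(n).
assert sum(k * ns(80, k, pt) for k in range(1, 81)) == 80 * pt[80]
```

The final assertion expresses why k·ns(n, k) can be read as a "mass": summing it over all k gives the total of all parts in all partitions, n·pt(n).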

Fig. 3. log(ns ) as a function of k for different n values. Different curves in the plot correspond to n going from 1 (bottom) to 60 (top). In every curve, the values for k go from 1 to n.


Fig. 4. Plot of ns (n, k) vs. k for n = 80.

Fig. 5. Plot of kns (n, k) vs. k for n = 80.

It is interesting to know if the number ns(n, k) has some explicit functional form. Given the recursive nature of its definition, it is a formidable task to try to obtain an explicit “general term” for ns as a function of n and k. However, we have already noticed that ns(n, k) for constant n behaves close to a power function for small k and close to an exponential function for large k. Therefore, a general power function, intended to represent the behavior in the range k < n/2, has been fitted to the numbers ns(n, k) obtained from (4), (2) and (1). The function to be adjusted is the following:

$$f(n,k) = a(n)\, k^{-b(n)} \qquad (5)$$

where the amplitude and the exponent are assumed to depend on n. The exponent b is easy to fit by plotting ln ns(n, k) versus ln k for several constant n and adjusting the plot to a straight line that minimizes the unexplained variance. The slope of this line is:

$$b(n) = \frac{\ln a(n) - \ln f(n,k)}{\ln k}$$

where a(n) = ns(n, k = 1) and f(n, k) is the observed value of ns(n, k) for some k close to 1. In our case, the value k = 10 has been taken to calculate b(n). With this formula it is easy to see that b takes the value 1.69 for n = 20, 1.45 for n = 40, 1.36 for n = 60 and 1.307 for n = 80. This behavior fits, with χ2 = 7 × 10−5, the following function:

$$b(n) = 1.2789 + 0.9637 \exp(-0.0427\, n) \qquad (6)$$

which, for large values of n, tends to 1.279, so that the exponent −b(n) in (5) tends to −1.279, a number that would be close to some of the exponents discussed for biological allometries, in the vicinity of the 4/3 value. An integrated relative error of the fit can be defined as the square root of the fractional variability not explained by the fit, that is, R = √χ2. This error is 0.008 in the fit, that is, lower than 1%. The amplitude fits well (with an integrated relative error R = 1 × 10−4) to the sum of a power law and an exponential law:

$$a(n) = \frac{k_1 \exp(k_2 n) + k_3\, n^{k_4}}{2} \qquad (7)$$

where k1 = 195.3, k2 = 0.17, k3 = 2.4 × 10−9 and k4 = 8.63. This fit was obtained by using the values ns(n, k = 1) for n varying from 1 to 60.

Fig. 6 shows the values of ns(n, k) versus n and k from n = 1 to 40 and k = 1 to 20, as obtained from its definition (4), and Fig. 7 shows the values of ns(n, k) as obtained from expression (5). The visual comparison of both figures confirms that the relative error of the fit is very small. However, the qualitative behavior of this error can be better observed by plotting the same two figures in logarithmic form. This is done in Figs. 8 and 9, respectively. In contrast with Fig. 3, which is a log–log plot, Figs. 8 and 9 are log–linear plots; therefore, linearity should not be expected here for small values of k. However, the comparison of these two figures shows clearly that expression (5) tends to overpredict ns(n, k) for n and k larger than n/2.

Fig. 6. Plot of ns (n, k) vs. n and k, from n = 1 to 40 and k = 1 to 20, as obtained from its definition (4).
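The slope estimate and the fitted expressions (5) to (7) of this section can be compared with the exact counts in a short Python sketch. The helper names are hypothetical, the partition numbers come from a standard dynamic-programming count rather than from recursion (1), and the reference summand `k_ref = 10` follows the choice stated in the text. The division by 2 in `a_fit` is read from the printed layout of (7) and should be treated as an assumption.

```python
import math


def partition_numbers(n_max):
    """Return [pt(0), ..., pt(n_max)] by dynamic programming."""
    ways = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            ways[total] += ways[total - part]
    return ways


def ns(n, k, pt):
    """Summand count of property (4)."""
    return sum(pt[n - m * k] for m in range(1, n // k + 1))


def slope_b(n, pt, k_ref=10):
    """b(n) = (ln a(n) - ln ns(n, k_ref)) / ln k_ref, with a(n) = ns(n, 1)."""
    return (math.log(ns(n, 1, pt)) - math.log(ns(n, k_ref, pt))) / math.log(k_ref)


def b_fit(n):
    """Fitted exponent, expression (6)."""
    return 1.2789 + 0.9637 * math.exp(-0.0427 * n)


def a_fit(n):
    """Fitted amplitude, expression (7); the factor 1/2 is an assumption."""
    k1, k2, k3, k4 = 195.3, 0.17, 2.4e-9, 8.63
    return (k1 * math.exp(k2 * n) + k3 * n ** k4) / 2


def f_fit(n, k):
    """Power-law model of expression (5)."""
    return a_fit(n) * k ** (-b_fit(n))


pt = partition_numbers(80)
for n in (20, 40, 60, 80):
    # Exact two-point slope next to the fitted exponent (6).
    print(n, round(slope_b(n, pt), 3), round(b_fit(n), 3))
for n in (20, 40, 60):
    # Exact amplitude ns(n, 1) next to the fitted amplitude (7).
    print(n, ns(n, 1, pt), round(a_fit(n)))
```

Because the published coefficients are rounded to a few digits, the fitted amplitude printed here should only be expected to track the exact values approximately.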


Fig. 9. Plot of ns (n, k) vs. n and k, from n = 1 to 40 and k = 1 to 20, as obtained from expression (5), in logarithmic scale.

4. Hyperbolic fit of ns (n, k) for any n and k

Fig. 7. Plot of ns (n, k) vs. n and k, from n = 1 to 40 and k = 1 to 20, as obtained from expression (5).

As can be observed in Fig. 3, three different regions can be distinguished in the curves ns(n, k) for constant n: an asymptotic tendency towards power-law behavior for small k (k < n/3), the “cut-off” for large values of k (k > 2n/3), and an intermediate transition zone (n/3 < k < 2n/3). In addition, this behavior is similar in all the curves obtained for different values of n, which becomes even more apparent if we collapse all the curves for different n values into a single one by means of a renormalization of the two coordinate axes. This renormalization is the following:

$$n_s'(n,k) = n_s(n,k)/n^p, \qquad k' = k/n^q$$

Fig. 10 shows ns′(n, k′) versus k′ for n from 11 to 60 when p = 0.71 and q = 0.35. All the curves tend to collapse into a hyperbolic form for large n values.
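The collapse of the curves can be explored numerically with the sketch below. It assumes, following the axis labels in the caption of Fig. 10, that the renormalization is applied to the logarithms, that is, the plotted coordinates are ln(k)/n^q and ln(ns(n, k))/n^p; this reading, like the helper names, is an assumption and not something the text states explicitly.

```python
import math


def partition_numbers(n_max):
    """Return [pt(0), ..., pt(n_max)] by dynamic programming."""
    ways = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            ways[total] += ways[total - part]
    return ways


def ns(n, k, pt):
    """Summand count of property (4)."""
    return sum(pt[n - m * k] for m in range(1, n // k + 1))


def renormalized_curve(n, pt, p=0.71, q=0.35):
    """Points (ln k / n^q, ln ns(n, k) / n^p) for k = 1..n (assumed Fig. 10 axes)."""
    return [(math.log(k) / n ** q, math.log(ns(n, k, pt)) / n ** p)
            for k in range(1, n + 1)]


pt = partition_numbers(60)
for n in (20, 40, 60):
    curve = renormalized_curve(n, pt)
    # First and last point of each renormalized curve.
    print(n, curve[0], curve[-1])
```

Plotting the three lists on common axes is a simple way to judge by eye how closely the curves overlap for the stated values of p and q.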

Fig. 8. Plot of ns (n, k) vs. n and k, from n = 1 to 40 and k = 1 to 20, as obtained from its definition (4), in logarithmic scale.

Fig. 10. Plot of ln(ns (n, k))/np vs. ln(k)/nq for n from 11 to 60 when p = 0.71 and q = 0.35.


This behavior suggests that the whole set of curves can be fitted to a single hyperbolic functional form after the renormalization of their axes. The general expression for a hyperbola with centre at (x0, y0), semiaxes a and b and rotation angle te, after collecting terms with cos²(te) + sin²(te) = 1, is the following:

$$Y = \frac{(a^2 + b^2)(x - x_0)\cos(te)\sin(te) + y_0\left(a^2\cos^2(te) - b^2\sin^2(te)\right) + \sqrt{a^2 b^2\left[(x - x_0)^2 - a^2\cos^2(te) + b^2\sin^2(te)\right]}}{a^2\cos^2(te) - b^2\sin^2(te)}$$

Therefore, it is possible to fit the parameters of this expression in order to minimize the distance to the curves shown in Fig. 3. The least-squares nonlinear fit has been obtained with the Mathematica program.

For n = 60, χ2 = 0.0005 is obtained for the logarithmic plot of ns′ versus k′ when the parameters take the following values: te = 0.719, a = 0.462, b = 0.515, x0 = 1.13, y0 = 0.816. The mean relative error of ns′ versus k′ is 2.8%. The final expression for ns′ is (in the following, x is the k′ variable):

$$n_s' = 179.862\left(0.2375x + \sqrt{0.0567x^2 - 0.1280x + 0.0718} - 0.2633\right)$$

For n = 40, χ2 = 0.0008 is obtained for the logarithmic plot of ns′ versus k′ when the parameters take the following values: te = 0.7188, a = 0.405, b = 0.512, x0 = 1.106, y0 = 0.765. The mean relative error of ns′ versus k′ is 3.0%. The final expression is:

$$n_s' = -48.692\left(0.2111x + \sqrt{0.0430x^2 - 0.0950x + 0.0534} - 0.2491\right)$$

For n = 20, χ2 = 0.001 is obtained for the logarithmic plot of ns′ versus k′ when the parameters take the following values: te = 0.718, a = 0.270, b = 0.474, x0 = 1.01, y0 = 0.623. The mean relative error of ns′ versus k′ is 4.0%. The final expression is:

$$n_s' = -17.867\left(0.1472x + \sqrt{0.0163x^2 - 0.0330x + 0.0176} - 0.1840\right)$$

Fig. 11 compares the curve ns(n, k) versus k for n = 40 with its functionally fitted counterpart. Similar plots are obtained for n = 60 and n = 20, showing that, with a relative error under 3%, the real and fitted plots are almost indistinguishable, except for very small values of k.

Fig. 11. Comparison of the exact (continuous line) and fitted functions ns (n, k) vs. k for n = 40.

A quadratic variation with a strong linear component can be observed in the values obtained for the parameters te, a, b, x0 and y0 in the fits implemented for n = 60, 40 and 20. Therefore, the following general expression can be obtained for ns′:

$$n_s' = \frac{N_0}{D_0} + (te - 0.7194)\left[\frac{0.131613\,(a^2 + b^2)(x - x_0) + \dfrac{0.991301\,(a^4 b^2 + a^2 b^4)}{2\sqrt{R_0}} - 0.991301\,(a^2 + b^2)\,y_0}{D_0} + \frac{0.991301\,(a^2 + b^2)\,N_0}{D_0^2}\right]$$

with the abbreviations

$$D_0 = 0.565807a^2 - 0.434193b^2$$
$$R_0 = -0.565807a^4 b^2 + 0.434193a^2 b^4 + a^2 b^2 x^2 - 2a^2 b^2 x x_0 + a^2 b^2 x_0^2$$
$$N_0 = 0.495651\,(a^2 + b^2)(x - x_0) + \sqrt{R_0} + 0.565807a^2 y_0 - 0.434193b^2 y_0$$

where for 1 < n < 60:

te = 0.7194 − 0.00003(60 − n)
a = 0.4622 − 0.00089(60 − n) − 0.00010(60 − n)²
b = 0.5153 + 0.00068(60 − n) − 0.00004(60 − n)²
x0 = 1.1278 + 0.00065(60 − n) − 0.000088(60 − n)²
y0 = 0.8157 − 0.000215(60 − n) − 0.000115(60 − n)²

The previous expression is a hyperbola in the x (or k′) variable, with parameters depending on n. This expression has been obtained by taking the following first-order approximations: sin(te) = sin(te(60)) + cos(te(60))(te − te(60)) and cos(te) = cos(te(60)) − sin(te(60))(te − te(60)), which are justified by the very slow variation of the te parameter with n.
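The fitted hyperbola can be evaluated with the Python sketch below. It uses the parameter polynomials quoted for te, a, b, x0 and y0 and solves the rotated-hyperbola equation for y on the branch with the positive square root. Unlike the expression above, it uses the exact sine and cosine of te rather than the first-order expansion around te(60); that substitution and the function names are assumptions made for illustration.

```python
import math


def hyperbola_params(n):
    """Fitted parameters (te, a, b, x0, y0) as functions of n."""
    d = 60 - n
    te = 0.7194 - 0.00003 * d
    a = 0.4622 - 0.00089 * d - 0.00010 * d ** 2
    b = 0.5153 + 0.00068 * d - 0.00004 * d ** 2
    x0 = 1.1278 + 0.00065 * d - 0.000088 * d ** 2
    y0 = 0.8157 - 0.000215 * d - 0.000115 * d ** 2
    return te, a, b, x0, y0


def hyperbola_branch(x, te, a, b, x0, y0):
    """y on the '+' branch of the rotated hyperbola centred at (x0, y0)."""
    c, s = math.cos(te), math.sin(te)
    denom = a * a * c * c - b * b * s * s
    dx = x - x0
    return y0 + ((a * a + b * b) * c * s * dx
                 + a * b * math.sqrt(dx * dx - denom)) / denom


def implicit_residual(x, y, te, a, b, x0, y0):
    """Residual of (u/a)^2 - (v/b)^2 = 1 in the rotated frame."""
    c, s = math.cos(te), math.sin(te)
    u = (x - x0) * c + (y - y0) * s
    v = -(x - x0) * s + (y - y0) * c
    return (u / a) ** 2 - (v / b) ** 2 - 1.0


for n in (20, 40, 60):
    params = hyperbola_params(n)
    for x in (0.0, 0.3, 0.6, 0.9):
        y = hyperbola_branch(x, *params)
        # The residual should vanish up to rounding error.
        print(n, x, y, implicit_residual(x, y, *params))
```

The residual check only confirms that the closed form lies on the rotated hyperbola; it says nothing about how well the hyperbola matches the partition data.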

5. Final comments

We have obtained the partitions of any integer n and estimated the probability of each summand k in the total partitional set. We have seen a wide zone of quasi-linear behavior for small and middle-range values of k, suggesting a power-law behavior, while for larger values of k the behavior gradually changes into another linear behavior with a much larger power exponent. The global aspect of this canon, which is actually hyperbolic in the logarithmic scale, is very close to the “power law with exponential cut-off,” a functional behavior that, seemingly, has been observed in many models of the real world, living beings included.

In processes where aggregation or dissociation energies act along a wide range of scales, a partitional model such as the one described here could be a fruitful zero-order approach as well as a good heuristic tool; for instance, in the fragmentation processes taking place along a wide range of scales in a crystalline material. To the extent that power laws with an exponential cut-off appear as ‘attractors’ for the structures of some very special dynamic entities, as if they were pushed towards this distribution by some ‘invisible hand,’ there is the possibility that simple arithmetic operations of gaining and losing component units (e.g. stochastically performed after the information exchanges with the environment) lead to the same result as ad hoc dynamic laws. In other words, the partitional canon (our term for the distribution of summands in the total partitional set) may be the hidden source of a robust order. It means that ‘mature’ distributions of populations of informational self-constructing entities might abide by a natural statistics: the partitional canon itself. Actually, this canon represents the easiest outcome of the easiest operation: a populational random walk through a landscape of successive minimal additions and subtractions in the structures of the participating agents.

The suspicion (Gell-Mann, 1994) that an entropy-like distribution related to a deeper phenomenon could underlie the natural universe of power laws has been interpreted by us by analogy with the Maxwell–Boltzmann statistics by which populations of gases abide. Populations of self-constructing agents (informational ones) endowed with enough organizational freedom tend to mature, as if pushed by an invisible hand, towards the partitional canon, a power law with an exponential cut-off. This is the hypothesis that these authors are going to check in future works.

The echoes of power laws can be found in a variety of biological structures and processes (Bak, 1996; Scarrott, 1998; Marijuán, 2003): from the energy flow through ecosystem levels, to the distribution of biomass among the different species in a niche, and to the metabolic organization within organisms and cells (Darveau et al., 2002); also in the physical structures of tissues and the circulatory and respiratory systems, and, amazingly, in the molecular components of the cell itself (both in the connectivity of protein networks and in the formation of protein complexes; Jeong et al., 2001; Maslov and Sneppen, 2002), as well as in the internal processes of cellular signalling systems (Villarroel, 2002; Marijuán, 2003).

Usually, hot debates and very difficult estimates (e.g. biological allometries, metabolic regimes) are involved in the determination of these power laws. In a number of cases, the emergence of the power laws (with or without exponential cut-off) can clearly be identified as corresponding to specific dynamic laws embedded in adjacent organization levels, while in other cases the hypothesis of an ‘invisible hand’ emerging from the simplest arithmetic operations of gaining and losing component parts, stochastically performed, might be a plausible option to explore. One cannot forget that the whole organization of the living cell is based on an interplay of production and degradation operations (the extent of protein degradation becoming almost comparable in its vastness and complexity to protein synthesis itself; Marijuán, 2003), and that the multicellular organization also encompasses a crucial balance between constructive and degradative processes (e.g. apoptosis, present in all tissues). Thus, the biological gaining and losing of components, stochastically performed along the vagaries of the incoming environmental information, may be a scenario highly predisposed to the emergence of power-law signatures.

In very different scientific realms, the exploration of the partitional approach might yield interesting results. Perhaps it is not too far-fetched to think that a variety of natural (and mathematical) structures and constants are waiting to be rediscovered along the partitional track. For instance, we have also found that, quite probably, the golden mean shows up in an average operation related to the power laws with exponential cut-off discussed here.

References

Bak, P., 1996. How Nature Works: The Science of Self-Organized Criticality. Copernicus, Springer-Verlag, New York.
Darveau, C.A., Suárez, R.K., Andrews, R.D., 2002. Allometric cascade as a unifying principle of body mass effects on metabolism. Nature 417, 166–170.
Gell-Mann, M., 1994. The Quark and the Jaguar. Adventures in the Simple and the Complex. Little Brown, London.
Javorszky, K., 1995. Granularity Algebra. Mackinger-Verlag, Vienna.
Javorszky, K., 2003. Information processing in auto-regulated systems. Entropy 5, 161–192.
Jeong, H., Mason, S.P., Barabási, A.L., Oltvai, Z.N., 2001. Lethality and centrality in protein networks. Nature 411, 41–42.
Marijuán, P.C., 2003. From inanimate molecules to living cells: the informational scaffolding of life. In: Musumeci, F., Brizhik, L.S., Ho, M.W. (Eds.), Energy and Information Transfer in Biological Systems. World Scientific, Singapore.
Marijuán, P.C., Villarroel, M., 1998. On information theory stumbling blocks. Cybernet. Hum. Knowing 5, 4.
Marijuán, P., Pastor, J., Villarroel, M., 1998. The language of cells: a partitional approach to cell-signalling. Symmetry: Culture Sci. 9, 383–392.
Maslov, S., Sneppen, K., 2002. Specificity and stability topology of protein networks. Science 296, 910–913.
Scarrott, G., 1998. The formulation of a science of information: an engineering perspective on the natural properties of information. Cybernet. Hum. Knowing 5, 4.
Sethna, J.P., Dahmen, K.A., Myers, C.R., 2001. Crackling noise. Nature 410, 242–250.
Villarroel, M., 2002. Information processing in a partitional framework. In: FIS 2002 Electronic Conference (http://www.mdpi.net/fis2002/browse.htm).