Annals of Physics 374 (2016) 138–161
From micro-correlations to macro-correlations

Iddo Eliazar
Smart Device Innovation Science Team, New Devices Group, Intel Corporation, Yakum, Israel
Article history: Received 9 January 2016; Accepted 19 July 2016; Available online 17 August 2016
Keywords: Overall correlation; Micro-correlation; Macro-correlation; Mixture models

Abstract
Random vectors with a symmetric correlation structure share a common value of pair-wise correlation between their different components. The symmetric correlation structure appears in a multitude of settings, e.g. mixture models. In a mixture model the components of the random vector are drawn independently from a general probability distribution that is determined by an underlying parameter, and the parameter itself is randomized. In this paper we study the overall correlation of high-dimensional random vectors with a symmetric correlation structure. Considering such a random vector, and terming its pair-wise correlation ‘‘micro-correlation’’, we use an asymptotic analysis to derive the random vector’s ‘‘macro-correlation’’: a score that takes values in the unit interval, and that quantifies the random vector’s overall correlation. The method of obtaining macro-correlations from micro-correlations is then applied to a diverse collection of frameworks that demonstrate the method’s wide applicability.
1. Introduction

Assume that, given a high-dimensional random vector, you are asked to evaluate the overall correlation between the vector’s components. In the case of a two-dimensional random vector the answer is straightforward. Indeed, the vector’s overall correlation is the correlation between its single pair of components. But what is the answer in the case of a random vector of arbitrary dimension d? And, moreover, what is the answer when the dimension d is very large? Well, as in the case of two-dimensional random vectors, we can consider the pair-wise correlations between the different components of the given random vector. There are d(d − 1)/2
such correlations, and these correlations constitute the vector’s correlation matrix. But how can we evaluate the vector’s overall correlation from its correlation matrix? Alternatively, given two random vectors, how can we determine which is more correlated than the other? Given two bodies, we can measure the temperature of each, and determine which is hotter and which is colder. Analogously, we are looking for a one-dimensional measure of overall correlation in the context of random vectors. Such a measure exists: it emanates from the notion of Wilks’ generalized variance [1–5], and it is based on the determinant of the random vector’s correlation matrix [6]. In particular, in the case of a two-dimensional random vector the overall correlation is the square of the correlation between the vector’s pair of components.

As in the Central Limit Theorem (CLT) [7–9], and as in Extreme Value Theory (EVT) [10–12], our key focus in this paper is the evaluation of overall correlation in the high-dimensional limit d → ∞. The CLT and EVT share a common bedrock model that considers the components of the given random vector to be independent and identically distributed (IID) random variables. The CLT addresses the aggregate of the vector’s components, whereas EVT addresses the maximum of the vector’s components. Both the CLT and EVT explore how the microscopic affects the macroscopic: the ‘‘microscopic’’ being the statistical distribution of the IID components; the ‘‘macroscopic’’ being the statistical distribution of the components’ aggregate and maximum, in the high-dimensional limit d → ∞.

Our goal in this paper – similarly to the CLT and EVT, yet in the context of correlation – is to explore how the microscopic affects the macroscopic. To that end the first step is to set a bedrock model. Evidently, the CLT and the EVT bedrock model of IID components is trivial and uninteresting, as it bears no dependencies and hence no correlations. Nonetheless, mixture models offer a neat way of transforming IID components into dependent components [13–15]. Specifically, in a mixture model the components are IID random variables that are drawn from a general probability distribution that is determined by an underlying parameter, and the underlying parameter is randomized before drawing the random variables. This randomization induces dependencies, and consequently mixture models yield a symmetric correlation structure: the pair-wise correlations between the different components of the given random vector share a common value.

Our bedrock model in this paper is the symmetric correlation structure. At first sight this bedrock model may appear somewhat too simple and restrictive. However, this bedrock model is actually surprisingly rich. Indeed, throughout the paper we shall provide a diverse collection of frameworks that demonstrate many applications of this bedrock model. Examples of the applications to be presented include: Maxwell–Boltzmann, Bose–Einstein, and Fermi–Dirac statistics; Lévy bridges and Lévy random probabilities; the aforementioned mixture models; regular and anomalous diffusion; regular and anomalous relaxation; power-laws; and conditional Lévy processes.

In what follows, given a high-dimensional random vector with a symmetric correlation structure, we consider the vector’s common pair-wise correlation value as our input, and term it ‘‘micro-correlation’’.
Then, using the aforementioned measure of overall correlation together with an asymptotic analysis, we derive in explicit form an output and term it ‘‘macro-correlation’’. This output is a score that takes values in the unit interval, and that quantifies the overall correlation of the random vector of interest, in the high-dimensional limit d → ∞. Thus, to summarize: in this paper we present a method for obtaining macro-correlations from micro-correlations, in the context of high-dimensional random vectors with a symmetric correlation structure. The method is established in Section 2. A wide range of the method’s applications is presented in Section 3, followed by a short conclusion in Section 4. The derivations of various results and formulae stated in Sections 2 and 3 are detailed in Section 5.
2. The method

The goal of this paper is to study the overall correlation of large systems with a symmetric correlation structure. In this section we describe the measurement of overall correlation and the notion of a symmetric correlation structure, and then present an asymptotic analysis that meets our goal.
2.1. Overall correlation

Consider a pair of real-valued random variables, X1 and X2. The most common measure of the inter-dependence of these random variables is their correlation r, defined as the ratio of their covariance to the product of their standard deviations. The correlation r is invariant with respect to affine transformations of the underlying measurement system, e.g. if the random variables are temperatures then it does not matter if the values are represented in Celsius, Fahrenheit, or Kelvin. Also, the correlation r features the following key properties: (I) it takes values in the interval −1 ≤ r ≤ +1; (II) it vanishes if and only if the random variables are uncorrelated: r = 0 if and only if E[X1 X2] = E[X1] E[X2]; (III) it yields its extreme values if and only if the random variables are affinely dependent: r = ±1 if and only if a1 X1 + a2 X2 = b, where {a1, a2, b} are real coefficients that are not all zero.

The random variables X1 and X2 can also be viewed as a random vector (X1, X2). From the vector perspective the inter-dependencies between the components of the vector (X1, X2) are given by the corresponding correlation matrix
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}.  (1)
In turn, the determinant of the correlation matrix is given by det(R) = 1 − r². Interestingly, 1 − det(R) = r², and the squared correlation r² displays features that are similar to the key properties of the correlation r. Indeed, the squared correlation r² satisfies the following properties: (I) it takes values in the unit interval: 0 ≤ r² ≤ 1; (II) it yields its lower bound if and only if the random variables are uncorrelated: r² = 0 if and only if r = 0; (III) it yields its upper bound if and only if the random variables are affinely dependent: r² = 1 if and only if r = ±1.

Let us now transcend from the two-dimensional case to the multi-dimensional case. Namely, instead of two real-valued random variables consider a collection of d real-valued random variables, X1, . . . , Xd, where d is an arbitrary dimension (greater than two). As in the two-dimensional case, the collection of random variables can be viewed as a d-dimensional random vector (X1, . . . , Xd). In turn, the corresponding correlation matrix is given by R = (r_ij)_{i,j=1}^{d}, where r_ij denotes the correlation between the random vector’s ith and jth components — the random variables Xi and Xj. On the one hand, in the transcendence from two dimensions (d = 2) to higher dimensions (d > 2) the single pair-wise correlation r is replaced by a set of d(d − 1)/2 pair-wise correlations r_ij (i ≠ j). On the other hand, regardless of the dimension d, the determinant det(R) is always a real number. Moreover, in [6] it was established that the aforementioned properties of the quantity ρ = 1 − det(R) – stated above in the case of two dimensions – hold valid in any dimension. Specifically, for all dimensions d the quantity ρ = 1 − det(R) satisfies the following key properties:
• It takes values in the unit interval: 0 ≤ ρ ≤ 1.
• It yields its lower bound if and only if the random variables are uncorrelated: ρ = 0 if and only if r_ij = 0 for all i ≠ j.
• It yields its upper bound if and only if the random variables are affinely dependent: ρ = 1 if and only if a1 X1 + · · · + ad Xd = b, where {a1, . . . , ad, b} are real coefficients that are not all zero.

Thus, the term ρ = 1 − det(R) is a quantitative score of the overall correlation between the random variables X1, . . . , Xd. The smaller ρ, the less inter-dependent the random variables — with ρ = 0 characterizing the case of mutually uncorrelated random variables. The larger ρ, the more inter-dependent the random variables — with ρ = 1 characterizing the case of affinely dependent random variables.

2.2. Symmetric correlation structure

Consider a system whose state is represented by a d-dimensional random vector (X1, . . . , Xd). We say that the random vector (X1, . . . , Xd) has a symmetric correlation structure if its different
components share a common pair-wise correlation: Corr(Xi, Xj) = r for all i ≠ j (i, j = 1, . . . , d). Consequently, the correlation matrix of the random vector (X1, . . . , Xd) is given by

R = \begin{pmatrix} 1 & r & \cdots & r \\ r & 1 & \cdots & r \\ \vdots & \vdots & \ddots & \vdots \\ r & r & \cdots & 1 \end{pmatrix}.  (2)
In a sense, the correlation matrix of Eq. (2) is the multi-dimensional analogue of the two-dimensional correlation matrix of Eq. (1). Analogous to the definition of a symmetric correlation structure, we say that the random vector (X1, . . . , Xd) has a symmetric covariance structure if its different components share a common variance and a common covariance: Var(Xi) = v (i = 1, . . . , d) where v is a positive variance value, and Cov(Xi, Xj) = c (i, j = 1, . . . , d, i ≠ j) where c is a real covariance value. Also, in probability theory, the random vector (X1, . . . , Xd) is said to be exchangeable if its distribution is invariant with respect to permutations of its components [16]. Evidently, exchangeability implies a symmetric covariance structure, and a symmetric covariance structure implies a symmetric correlation structure with r = c/v.

Using a result from the theory of circulant matrices [17], we calculate the determinant det(R) of the correlation matrix of Eq. (2) (see Section 5 for the details), and obtain that the corresponding overall correlation is given by
ρ = 1 − [1 + (d − 1) r] (1 − r)^{d−1}.  (3)
Eq. (3) provides an explicit representation of the overall correlation ρ = 1 − det(R) in terms of the dimension d and the pair-wise correlation r. The terms appearing on the right-hand-side of Eq. (3) are, in effect, the eigenvalues of the correlation matrix of Eq. (2). Indeed, this matrix has two real eigenvalues: λ = 1 + (d − 1) r with multiplicity 1, and λ = 1 − r with multiplicity d − 1.

Consider the overall correlation of Eq. (3), as well as the aforementioned eigenvalues, to be functions of the common pair-wise correlation value: ρ = ρ(r) and λ = λ(r). As noted in Section 2.1, the overall correlation takes values in the unit interval, and hence Eq. (3) implies that the function ρ(r) is defined over the range −1/(d − 1) ≤ r ≤ 1. Over this range the eigenvalue λ(r) = 1 + (d − 1) r is an affine function that increases linearly from λ(−1/(d − 1)) = 0 to λ(1) = d, and the eigenvalue λ(r) = 1 − r is an affine function that decreases linearly from λ(−1/(d − 1)) = d/(d − 1) to λ(1) = 0. The two affine functions intersect at the zero pair-wise correlation, r = 0, where they yield the eigenvalue λ(0) = 1. The function ρ(r) is a polynomial of order d, and its power-series expansion is given by
ρ(r) = \sum_{k=2}^{d−1} \binom{d}{k} (k − 1) (−r)^k + (d − 1) (−r)^d.  (4)

The behavior of the function ρ(r), over the range −1/(d − 1) ≤ r ≤ 1, is as follows: it initiates from one (ρ(−1/(d − 1)) = 1), decreases monotonically on the sub-range −1/(d − 1) ≤ r ≤ 0 to zero (ρ(0) = 0), and then increases monotonically on the sub-range 0 ≤ r ≤ 1 to one (ρ(1) = 1). Also, the function ρ(r) is convex in the sub-range −1/(d − 1) ≤ r ≤ 1/(d − 1) and is concave in the sub-range 1/(d − 1) ≤ r ≤ 1. Graphs of the function ρ(r), for various dimensions d, are plotted in Fig. 1.

In dimension d = 2 the function ρ(r) is defined over the range −1 ≤ r ≤ 1 and it is symmetric: ρ(−r) = ρ(r). In higher dimensions (d > 2) symmetry no longer holds: the range −1/(d − 1) ≤ r ≤ 1 is asymmetric, and the function ρ(r) is asymmetric and skewed. The asymmetry of the function ρ(r) implies an asymmetry between positive and negative values of the pair-wise correlation r — and the higher the dimension d, the greater the asymmetry. Indeed, assume that we want to attain a high degree of overall correlation: ρ(r) ≃ 1. In the case of positive correlations this target is met only with a large value of the pair-wise correlation: r ≃ 1. However, in the case of negative correlations this target is met with a minute value of the pair-wise correlation: r ≃ −1/(d − 1), provided that the dimension d is high.
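As an illustrative aside, Eqs. (2) and (3) are easy to verify numerically. The following is a minimal Python sketch (assuming NumPy is available; the helper names are ours, not the paper’s) that builds the correlation matrix of Eq. (2) and compares 1 − det(R) against the closed form.

```python
# Minimal check of Eq. (3): build the symmetric correlation matrix of
# Eq. (2) and compare 1 - det(R) with the closed-form expression.
import numpy as np

def overall_correlation(d, r):
    """1 - det(R) for the d-dimensional correlation matrix of Eq. (2)."""
    R = np.full((d, d), r) + (1.0 - r) * np.eye(d)  # 1 on diagonal, r off it
    return 1.0 - np.linalg.det(R)

def overall_correlation_closed_form(d, r):
    """The closed form of Eq. (3)."""
    return 1.0 - (1.0 + (d - 1) * r) * (1.0 - r) ** (d - 1)

for d, r in [(2, 0.5), (10, 0.3), (50, -0.02)]:
    print(overall_correlation(d, r), overall_correlation_closed_form(d, r))
    # the two values agree to machine precision
```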
Fig. 1. Graphs of the function ρ = ρ (r ) of Eq. (3). The horizontal axis represents the input r over the range −1 ≤ r ≤ 1, and the vertical axis represents the output ρ over the range 0 ≤ ρ ≤ 1. The different graphs manifest different values of the dimension parameter: (i) red line — d = 2; (ii) orange line — d = 4; (iii) green line — d = 6; (iv) blue line — d = 8; (v) purple line — d = 10. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Thus, in high dimensions, even a seemingly insignificant value of negative pair-wise correlation can result in a dramatic overall correlation.

2.3. Large systems

As stated in the opening of this section, our goal is to study the overall correlation of large systems with a symmetric correlation structure — large meaning high-dimensional. Having described the measurement of overall correlation, and having described the notion of a symmetric correlation structure, we are all set to meet our goal — and we shall apply an asymptotic analysis to do so.

To accommodate the asymptotic analysis consider a sequence of systems labeled by n = 1, 2, 3, . . . . Specifically, we consider system n to have a symmetric correlation structure with common pair-wise correlation rn, and to be of dimension dn — an integer that diverges to infinity in the limit n → ∞. Set ωn = (dn − 1) rn to be the scaled pair-wise correlation of system n, the scaling factor effectively being the system’s dimension. Eq. (3) implies that the overall correlation of system n is given by

ρn = 1 − (1 + ωn) (1 − ωn/(dn − 1))^{dn−1}.  (5)
Note that as the pair-wise correlation takes values in the range −1/(dn − 1) ≤ rn ≤ 1, the scaled pair-wise correlation takes values in the range −1 ≤ ωn ≤ dn − 1. Eq. (5) implies that a finite limit ρ∞ = limn→∞ ρn exists if and only if a finite limit ω∞ = limn→∞ ωn exists. Consequently, if the limit ω∞ exists then it takes values in the range −1 ≤ ω∞ < ∞, and we have
ρ∞ = 1 − (1 + ω∞) exp(−ω∞).  (6)
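To see this convergence concretely, here is a short sketch (assuming NumPy; the parameter values are illustrative) in which the scaled pair-wise correlation is held fixed while the dimension grows, so that Eq. (5) approaches Eq. (6).

```python
# Convergence of the finite-dimension overall correlation, Eq. (5), to the
# macro-correlation of Eq. (6), at a fixed scaled pair-wise correlation w.
import numpy as np

def rho_finite(d, w):
    """Eq. (5): overall correlation at dimension d, scaled correlation w."""
    return 1.0 - (1.0 + w) * (1.0 - w / (d - 1)) ** (d - 1)

def rho_macro(w):
    """Eq. (6): the limiting macro-correlation."""
    return 1.0 - (1.0 + w) * np.exp(-w)

w = 2.0
for d in (10, 100, 1000, 10000):
    print(d, rho_finite(d, w))  # tends to the limit below as d grows
print(rho_macro(w))             # 1 - 3*exp(-2), roughly 0.594
```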
Consider the limiting overall correlation to be a function of the limiting scaled pair-wise correlation: ρ∞ = ρ∞ (ω∞ ). Eq. (6) implies that the behavior of the function ρ∞ (ω∞ ) is as follows: it initiates from one (ρ∞ (−1) = 1), decreases monotonically on the sub-range −1 ≤ ω∞ ≤ 0 to zero (ρ∞ (0) = 0), and then increases monotonically on the sub-range 0 ≤ ω∞ < ∞ to one (ρ∞ (∞) = 1). Also, the function ρ∞ (ω∞ ) is convex in the sub-range −1 ≤ ω∞ ≤ 1 and is concave
Fig. 2. Graph of the function ρ∞ = ρ∞ (ω∞ ) of Eq. (6). The horizontal axis represents the input ω∞ over the range −1 ≤ ω∞ ≤ 9, and the vertical axis represents the output ρ∞ over the range 0 ≤ ρ∞ ≤ 1.
in the sub-range 1 ≤ ω∞ < ∞. As in the case of the function ρ(r) of Eq. (3), the function ρ∞(ω∞) of Eq. (6) is defined on an asymmetric range — over which it is asymmetric and skewed. The power-series expansion of the function ρ∞(ω∞) is given by

ρ∞(ω∞) = ω∞² \sum_{k=0}^{∞} \frac{(−ω∞)^k}{k! (k + 2)},  (7)
and its graph is plotted in Fig. 2. In practice, the above asymptotic analysis is applied to a large, yet finite, system with n ≫ 1. For such a system:
• rn is the microscopic pair-wise correlation — which we henceforth term, in short, ‘‘micro-correlation’’.
• The limit ω∞ is approximated by the product of the system’s dimension and micro-correlation: ω∞ ≃ dn rn.
• ρ∞ = ρ∞(ω∞) is the macroscopic overall correlation — which we henceforth term, in short, ‘‘macro-correlation’’.

Eq. (6) provides a closed-form formula facilitating the transition from micro-correlations to macro-correlations. With this formula we can quantify the system-level effect of pair-wise correlations that are present on the components-level — thus determining how the microscopic affects the macroscopic.

2.4. Two analogies

We conclude this section with two analogies of the asymptotic analysis of the symmetric-correlation-structure model: the Law of Small Numbers yielding the Poisson distribution [10–12], and the Central Limit Theorem yielding the Gauss distribution [7–9]. Both the Poisson distribution and the Gauss distribution are encountered ubiquitously across science and engineering. Here and hereinafter E[·] denotes the operation of mathematical expectation.
2.4.1. Poisson analogy

Consider a bulk of radioactive matter which we monitor with a Geiger counter for some fixed time period. The bulk is composed of d atoms, each atom has a probability p of undergoing a radioactive decay during the monitoring period, and the atoms decay independently of each other. Set Xk to be a binary random variable indicating the decay of the kth atom (k = 1, . . . , d). Specifically: Xk = 1 if atom k decayed during the monitoring period (with probability p), and Xk = 0 otherwise (with probability 1 − p). The sum S = X1 + · · · + Xd is the random output of the Geiger counter, and its Probability Generating Function (PGF) is given by
ψ(z) := E[z^S] = [1 − p (1 − z)]^d  (8)
(z real). In this radioactive-decay model the common decay probability p and the PGF ψ (z ) are, respectively, the analogues of the common pair-wise correlation r and the overall correlation ρ in the symmetric-correlation-structure model of Section 2.2. The number of atoms in the bulk is vast and calls for an asymptotic analysis. To that end we consider a sequence of bulks labeled by n = 1, 2, 3, . . . . For the nth bulk: the atoms’ common decay probability is pn , and the number of atoms is dn — an integer that diverges to infinity in the limit n → ∞. Also, we set λn = dn pn . Consequently, the PGF of the nth bulk is given by
ψn(z) = [1 − λn (1 − z)/dn]^{dn}  (9)
(z real). In turn, Eq. (9) implies that a finite PGF limit ψ∞ (z ) = limn→∞ ψn (z ) exists if and only if a finite limit λ∞ = limn→∞ λn exists — in which case we have
ψ∞(z) = exp[−λ∞ (1 − z)]  (10)
(z real and λ∞ positive). Eqs. (9) and (10) are, respectively, the analogues of Eqs. (5) and (6). The limiting PGF of Eq. (10) characterizes a Poisson distribution with rate λ∞. In the radioactive-decay model we shifted from the microscopic atoms’ level to the macroscopic bulk level. The micro-level is quantified by the atoms’ common decay probability pn, and the macro-level is quantified by the bulk’s Poisson rate λ∞. Analogously, in the symmetric-correlation-structure model the micro-level is quantified by the common pair-wise correlation rn, and the macro-level is quantified by the limit ω∞.

2.4.2. Gauss analogy

Consider a random walker over the real line. The walker initiates at the origin and makes d steps of size ∆. Each step is either left or right with equal probability (a fair coin toss), and the steps are independent of each other. Set Xk to represent the kth step (k = 1, . . . , d). Specifically: Xk = −∆ if step k is to the left (with probability 1/2), and Xk = +∆ if it is to the right (with probability 1/2). The sum S = X1 + · · · + Xd is the walker’s end location, and its Characteristic Function (CF) is given by
ϕ(z) := E[exp(izS)] = [cos(∆z)]^d  (11)
(z real). In this random-walk model the common step size ∆ and the CF ϕ(z) are, respectively, the analogues of the common pair-wise correlation r and the overall correlation ρ in the symmetric-correlation-structure model of Section 2.2. We now ‘zoom out’ and observe the random walker from afar via an asymptotic analysis. To that end we consider a sequence of walkers labeled by n = 1, 2, 3, . . . . For the nth walker: the common step size is ∆n, and the number of steps is dn — an integer that diverges to infinity in the limit n → ∞. Also, we set νn = dn ∆n². Consequently, the CF of the nth walker is given by
ϕn(z) = [cos(∆n z)]^{dn} ≈ [1 − νn z²/(2dn)]^{dn}  (12)
(z real). In turn, Eq. (12) implies that a finite CF limit ϕ∞ (z ) = limn→∞ ϕn (z ) exists if and only if a finite limit ν∞ = limn→∞ νn exists — in which case we have
ϕ∞(z) = exp(−ν∞ z²/2)  (13)
(z real and ν∞ positive). Eqs. (12) and (13) are, respectively, the analogues of Eqs. (5) and (6). The limiting CF of Eq. (13) characterizes a Gauss distribution with variance ν∞. In the random-walk model we shifted from the microscopic perspective to the macroscopic perspective. The micro-perspective is quantified by the walker’s common step size ∆n, and the macro-perspective is quantified by the walker’s variance ν∞. Analogously, in the symmetric-correlation-structure model the micro-perspective is quantified by the common pair-wise correlation rn, and the macro-perspective is quantified by the limit ω∞.

3. Applications

In this section we shall demonstrate various applications of the method described in Section 2. In particular, we shall focus on the transition from micro-correlations to macro-correlations.

3.1. Aggregation framework

Consider the following aggregation framework. The state of system n is represented by an n-dimensional random vector (X1, . . . , Xn) with a symmetric covariance structure. Also, the state of system n satisfies the sum constraint

X1 + · · · + Xn = sn,  (14)
where sn is a real value. Namely, the components of the random vector (X1, . . . , Xn) sum up to sn. The sum constraint of Eq. (14) implies that the micro-correlation is given by

rn = −1/(n − 1)  (15)

(see Section 5 for the details). Given a number 0 < f < 1 we observe a fraction f of the components of the random vector (X1, . . . , Xn) — which, with no loss of generality, we take to be the first components. Specifically, we observe the dn-dimensional random vector (X1, . . . , X_{dn}), where dn = ⌊fn⌋ is the integer part of fn. Consequently, we obtain that
ω∞ = −f.  (16)
So, it is the fraction f of the observed components that determines the value of the macro-correlation, and as a function of this fraction the macro-correlation is given by ρ∞ = ρ∞(−f) (0 < f < 1). Note that the function ρ∞(−f) is monotone increasing from the zero level ρ∞(−0) = 0 to the unit level ρ∞(−1) = 1. In what follows we describe three specific applications of the aggregation framework.

3.1.1. Particles and cells

A foundational topic in statistical physics is the random scattering of p particles into n cells. Denoting by Xk the number of particles in cell k, we obtain an n-dimensional random vector (X1, . . . , Xn) that satisfies the sum constraint of Eq. (14) with sn = p. The ‘‘random scattering’’ can be performed in three major methods [18–20]. In the first method each particle, independently of all the other particles, picks a cell at random; this method is called Maxwell–Boltzmann statistics. In the second method a configuration – i.e. a specific solution of the equation x1 + · · · + xn = p, with xk ∈ {0, 1, . . . , p} – is picked at random; this method is called Bose–Einstein statistics. In the third method each cell can accommodate no more than one particle (hence the number of particles must be no greater than the number of cells, p ≤ n), and a configuration – i.e. a specific solution of the equation x1 + · · · + xn = p, with xk ∈ {0, 1} – is picked at random; this method is called Fermi–Dirac statistics. All three scattering methods yield a symmetric covariance structure of the random vector (X1, . . . , Xn).
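As a numerical aside, the negative micro-correlation of Eq. (15) is easy to observe in a simulation of Maxwell–Boltzmann scattering; the sketch below uses hypothetical parameter values and is not part of the original derivation.

```python
# Monte Carlo sketch: scatter p particles uniformly and independently into
# n cells (Maxwell-Boltzmann statistics) and estimate the common pair-wise
# correlation of the occupancy counts; Eq. (15) predicts -1/(n-1).
import numpy as np

rng = np.random.default_rng(0)
n, p, trials = 20, 50, 20000

counts = np.zeros((trials, n))
for t in range(trials):
    cells = rng.integers(0, n, size=p)            # each particle picks a cell
    counts[t] = np.bincount(cells, minlength=n)   # occupancy of each cell

corr = np.corrcoef(counts, rowvar=False)          # n x n sample correlations
print(corr[~np.eye(n, dtype=bool)].mean())        # ~ -1/(n-1) = -0.0526...
```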
3.1.2. Lévy bridges

One of the most important and fundamental classes of continuous-time stochastic processes is that of Lévy processes [21–23]. The Lévy class – which includes the Poisson process, compound Poisson processes, Brownian motion, and stable motions – has numerous applications in science and engineering. A Lévy process (L(t))t≥0 is a stochastic process that initiates at the origin (L(0) = 0) and that is characterized by stationary and independent increments. The increment of the Lévy process over the time interval (a, b) is its displacement over this interval: L(b) − L(a) (with 0 ≤ a < b < ∞). The stationary-increments property means that increments over time intervals of the same length are equal in law. The independent-increments property means that increments over disjoint time intervals are independent random variables.

A Lévy bridge over the time interval (0, ∆) is obtained by disclosing the displacement of the Lévy process (L(t))t≥0 over this interval: L(∆) = l. Thus, knowing the displacement, the Lévy process ‘‘bridges’’ the origin (0, 0) and the endpoint (∆, l). Given a positive integer n we partition the time interval (0, ∆) into n sub-intervals of equal length ∆/n, and set Xk to be the increment over the kth sub-interval:
Xk = L(k∆/n) − L((k − 1)∆/n).  (17)
Evidently, the random vector (X1, . . . , Xn) satisfies the sum constraint of Eq. (14) with sn = l. The underlying Lévy structure induces a symmetric covariance structure of the random vector (X1, . . . , Xn).

3.1.3. Lévy random probabilities

A Lévy process (L(t))t≥0 is called a subordinator if its trajectories are monotone increasing with probability one. Given a Lévy subordinator and a time interval (0, ∆), a Lévy random probability is constructed as follows [24–26]. The random probability of a sub-interval (a, b) (with 0 ≤ a < b ≤ ∆) is the ratio of the Lévy subordinator’s increment over the sub-interval to the Lévy subordinator’s increment over the whole interval: [L(b) − L(a)]/L(∆). Consequently, given a positive integer n we partition the time interval (0, ∆) into n sub-intervals of equal length ∆/n, and set Xk to be the random probability of the kth sub-interval:
Xk = [L(k∆/n) − L((k − 1)∆/n)] / L(∆).  (18)

Evidently, the random vector (X1, . . . , Xn) satisfies the sum constraint of Eq. (14) with sn = 1. The underlying Lévy structure induces a symmetric covariance structure of the random probability vector (X1, . . . , Xn).

3.2. Conditional IID framework

Consider the following conditional IID framework. The state of system n is represented by an n-dimensional random vector (X1, . . . , Xn) with components that are conditionally IID. Specifically, provided an information In, the components of the random vector (X1, . . . , Xn) are IID random variables with common conditional mean Mn and with common conditional variance Vn. In statistics this framework is termed a mixture model [13–15]. The laws of conditional variance and conditional covariance imply a symmetric covariance structure with micro-correlation that is given by

rn = Var[Mn] / (Var[Mn] + E[Vn])  (19)
(see Section 5 for the details). In turn, setting dn = n, Eq. (19) implies that

ω∞ = lim_{n→∞} n Var[Mn]/E[Vn],  (20)

provided that the limit appearing on the right-hand-side of Eq. (20) exists.
We now turn to present an assortment of examples of the conditional IID framework. All the examples share the following setting. On the one hand there is an underlying statistical distribution Φθ that is determined by a real parameter θ. On the other hand the information In is a real-valued random variable Θn, with constant mean and with variance that is inversely proportional to the dimension: E[Θn] = µ where µ is a real constant, and Var[Θn] = σ²/n where σ is a positive constant. Provided the information In, the components of the random vector (X1, . . . , Xn) are IID random variables that are drawn from the statistical distribution Φθ with parameter θ = Θn.

The ‘‘Θn information setting’’ naturally arises from an averaging mechanism that is described as follows. In system n we have n agents, agent k arrives with its own ‘‘hidden parameter’’ ξk, and the agents’ hidden parameters are IID copies of a general random variable ξ with mean E[ξ] = µ and with variance Var[ξ] = σ². The agents decide to use a common parameter Θn, which is set to be the average of their individual hidden parameters:

Θn = (ξ1 + · · · + ξn)/n.  (21)
This averaging indeed implies that the common parameter Θn has mean E[Θn] = µ and variance Var[Θn] = σ²/n.
3.2.1. Bernoulli distribution

Consider a financial portfolio consisting of n contracts, where the bearer of each contract can either default or not. Let Xk denote the binary variable representing the kth contract: Xk = 1 if the bearer of this contract defaults, and Xk = 0 if not. The bearers of the contracts act independently of each other, and if we knew the exact state of the economy then the components of the random vector (X1, . . . , Xn) would be IID Bernoulli random variables. Namely: Pr(Xk = 1) = θ and Pr(Xk = 0) = 1 − θ, where the parameter θ is a default probability that is determined by the state of the economy. However, the exact state of the economy is not known, and instead of a precise default probability θ we are given the random variable Θn (taking values in the unit interval, 0 < Θn < 1). Thus, the common conditional mean is Mn = Θn, and the common conditional variance is Vn = Θn(1 − Θn). In turn, Eq. (20) yields the limit
ω∞ = σ²/[µ(1 − µ)].  (22)
In the field of credit risk this conditional IID setting is termed ‘‘exchangeable Bernoulli mixture model’’ [27].
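A brief Monte Carlo sketch of this model follows; the Beta mixing law and all parameter values are hypothetical choices, made so that the moments entering Eq. (19) are available in closed form.

```python
# Exchangeable Bernoulli mixture: draw a default probability Theta ~ Beta(a, b)
# per trial, draw n conditionally IID Bernoulli(Theta) indicators, and compare
# the sampled pair-wise correlation with Eq. (19), where M = Theta and
# V = Theta * (1 - Theta).
import numpy as np

rng = np.random.default_rng(1)
n, trials, a, b = 20, 200000, 2.0, 18.0

theta = rng.beta(a, b, size=trials)
X = (rng.random((trials, n)) < theta[:, None]).astype(float)

corr = np.corrcoef(X, rowvar=False)
sampled_r = corr[~np.eye(n, dtype=bool)].mean()

var_M = a * b / ((a + b) ** 2 * (a + b + 1))   # Var[Theta]
mean_V = a * b / ((a + b) * (a + b + 1))       # E[Theta(1 - Theta)]
print(sampled_r, var_M / (var_M + mean_V))     # both ~ 1/(a + b + 1)
```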
3.2.2. Poisson distribution

Consider n agents that were exposed, in an identical manner, to a radiation source. Each agent is equipped with a radiation counter, and Xk denotes the integer variable representing the radiation count of the kth agent. The agents were exposed to the radiation source independently of each other, and hence if we knew the exact radioactivity level of the source then the components of the random vector (X1, . . . , Xn) would be IID Poisson random variables. Namely: Pr(Xk = x) = exp(−θ) θ^x/x! (x = 0, 1, 2, . . .), where the parameter θ is a rate that is determined by the source’s radioactivity. However, the exact radioactivity level of the source is not known, and instead of a precise rate θ we are given the positive-valued random variable Θn. Thus, the common conditional mean is Mn = Θn, and the common conditional variance is Vn = Θn. In turn, Eq. (20) yields the limit
ω∞ = σ²/µ.  (23)
3.2.3. Exponential distribution

Consider n inter-quake durations of a fault, and let Xk denote the length of the kth duration. If we knew the exact geology of the fault then the components of the random vector (X1, . . . , Xn) would be IID exponential random variables. Namely: Pr(Xk > x) = exp(−x/θ) (x ≥ 0), where the parameter θ is an inverse rate that is determined by the fault’s geology. However, the exact geology of the fault is not known, and instead of a precise inverse rate θ we are given the positive-valued random variable Θn. Thus, the common conditional mean is Mn = Θn, and the common conditional variance is Vn = Θn². In turn, Eq. (20) yields the limit
ω∞ = σ²/µ².  (24)
3.2.4. Uniform distribution

Consider n points that are scattered randomly along a rugged curve. Let Xk denote the distance, along the curve, of point k from one of the curve’s ends. If we knew the exact length of the rugged curve then the components of the random vector (X1, . . . , Xn) would be IID uniform random variables. Namely: Pr(Xk ≤ x) = x/θ (0 ≤ x ≤ θ), where the parameter θ is the curve’s length. However, the exact length of the rugged curve is not known, and instead of the precise parameter θ we are given the positive-valued random variable Θn. Thus, the common conditional mean is Mn = Θn/2, and the common conditional variance is Vn = Θn²/12. In turn, Eq. (20) yields the limit

ω∞ = 3σ²/µ².  (25)
3.2.5. Rayleigh distribution

Consider n diffusive particles that are initiated from the origin of a plane. The diffusion is isotropic, after a fixed time we take a snapshot of the particles’ planar positions, and we denote by Xk the distance of particle k from the origin. If we knew the exact physics of the diffusive transport then the components of the random vector (X1, . . . , Xn) would be IID Rayleigh random variables. Namely: Pr(Xk > x) = exp[−(x/θ)²/2] (x ≥ 0), where the parameter θ is the particles’ diffusion coefficient. However, the exact physics of the diffusive transport is not known, and instead of the parameter θ we are given the positive-valued random variable Θn. Thus, the common conditional mean is Mn = \sqrt{π/2} Θn, and the common conditional variance is Vn = ((4 − π)/2) Θn². In turn, Eq. (20) yields the limit

ω∞ = (π/(4 − π)) σ²/µ².  (26)
3.2.6. Pareto distribution

Consider n rich citizens, and let Xk denote the wealth of the kth rich citizen. The distribution of wealth in human societies is known to be lognormal in its bulk and Pareto in its tail [28–30]. If we knew the exact threshold between the middle class and the rich then the components of the random vector (X1, . . . , Xn) would be IID Pareto random variables. Namely: Pr(Xk > x) = (θ/x)^ϵ (θ ≤ x < ∞), where the parameter θ is the Pareto cutoff point, and where ϵ is the Pareto exponent. However, the exact threshold between the middle class and the rich is not known, and instead of a precise threshold θ we are given the positive-valued random variable Θn. Thus, the common conditional mean is Mn = (ϵ/(ϵ − 1)) Θn, and the common conditional variance is Vn = (ϵ/[(ϵ − 1)²(ϵ − 2)]) Θn² (we consider the Pareto exponent to be greater than two). In turn, Eq. (20) yields the limit

ω∞ = ϵ(ϵ − 2) σ²/µ².  (27)
3.2.7. Scaling mechanism

The last four examples – the exponential, uniform, Rayleigh, and Pareto distributions – are special cases of a general scaling mechanism. Indeed, in these examples the statistical distribution of the random variable Xk was of the form Pr(Xk ≤ x) = Φ(x/θ), where Φ(z) is a cumulative distribution function defined over the real line. So, the parameter θ is a common underlying scale, and the following stochastic representation holds: Xk = θ Zk (k = 1, . . . , n), where {Z1, . . . , Zn} are IID random copies of a generic random variable Z that is governed by the statistical distribution Pr(Z ≤ z) = Φ(z). As above, the precise value of the common scale θ is not known, and instead of the parameter θ we are given the positive-valued random variable Θn. Thus, the common conditional mean is Mn = E[Z]Θn, and the common conditional variance is Vn = Var[Z]Θn². In turn, Eq. (20) yields the limit

ω∞ = γ σ²/µ²,  (28)
where γ = E[Z]²/Var[Z]. Note that in terms of the random variables ξ and Z – the random variable ξ representing the underlying averaging mechanism, and the random variable Z representing the underlying scaling mechanism – Eq. (28) can be written in the neat form

ω∞ = (Var[ξ]/E[ξ]²) / (Var[Z]/E[Z]²).  (29)
With the general notion of scaling at hand we shall now revisit the exponential, uniform, Rayleigh, and Pareto examples.
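Since the coefficient γ = E[Z]²/Var[Z] of Eq. (28) depends only on the first two moments of Z, it is a one-liner to evaluate; the sketch below (our own helper function, with a hypothetical Pareto exponent) reproduces the coefficients of Eqs. (24)–(27).

```python
# gamma = E[Z]^2 / Var[Z] from the first two moments of the generic random
# variable Z, evaluated for the four scaling examples above.
from math import sqrt, pi

def gamma_coeff(mean, second_moment):
    return mean ** 2 / (second_moment - mean ** 2)

eps = 3.0  # hypothetical Pareto exponent (must exceed two)
print(gamma_coeff(1.0, 2.0))             # exponential: gamma = 1
print(gamma_coeff(0.5, 1.0 / 3.0))       # uniform:     gamma = 3
print(gamma_coeff(sqrt(pi / 2), 2.0))    # Rayleigh:    gamma = pi/(4 - pi)
print(gamma_coeff(eps / (eps - 1),
                  eps / (eps - 2)))      # Pareto:      gamma = eps*(eps - 2)
```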
3.2.8. Relaxation

The exponential example is a special case of the scaling example, with the generic random variable Z being governed by the unit-mean exponential distribution: Pr(Z > z) = exp(−z) (z ≥ 0). This exponential distribution characterizes the ‘‘normal’’ Debye relaxation, which is omnipresent across the sciences. However, in addition to the ‘‘normal’’ Debye relaxation many natural processes and phenomena also exhibit ‘‘anomalous’’ forms of relaxation — and perhaps the most prevalent anomalous form is the stretched-exponential relaxation [31–33]. In terms of the random variable Z both the Debye relaxation and the stretched-exponential relaxation are represented by the Weibull distribution: Pr(Z > z) = exp(−z^ϵ) (z ≥ 0), where the exponent ϵ is a positive parameter that characterizes the type of the relaxation — ‘‘normal’’ when ϵ = 1, and ‘‘anomalous’’ when ϵ ≠ 1. More specifically, the exponent range 0 < ϵ < 1 characterizes stretched-exponential relaxation, and the exponent range 1 < ϵ < ∞ characterizes super-exponential relaxation. For this Weibull setting the coefficient of Eq. (28) is given by γ = 1/(γϵ − 1), where γϵ = Γ(1 + 2/ϵ)/Γ(1 + 1/ϵ)². Note that the Weibull exponent ϵ = 1 indeed yields the coefficient of the exponential example: ϵ = 1 ⇒ γ = 1.
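A quick evaluation of this coefficient, using only the standard library’s gamma function, is sketched below; the exponent values are illustrative.

```python
# Weibull relaxation coefficient of Eq. (28): gamma = 1/(gamma_eps - 1),
# with gamma_eps = Gamma(1 + 2/eps) / Gamma(1 + 1/eps)^2.
from math import gamma as G

def relaxation_gamma(eps):
    gamma_eps = G(1 + 2 / eps) / G(1 + 1 / eps) ** 2
    return 1 / (gamma_eps - 1)

for eps in (0.5, 1.0, 2.0):
    print(eps, relaxation_gamma(eps))
# eps = 1 recovers gamma = 1 (Debye relaxation); stretched-exponential
# relaxation (eps < 1) yields a smaller coefficient, hence a smaller omega.
```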
3.2.9. Scattering

The uniform example is a special case of the scaling example, with the generic random variable Z being governed by the uniform distribution over the unit interval: Pr(Z ≤ z) = z (0 ≤ z ≤ 1). The uniform example emanates, effectively, from the random scattering of points over a one-dimensional interval. Consider now the random scattering of n points over a δ-dimensional ball of radius θ, and denote by Xk the distance of point k from the ball’s center. This δ-dimensional random scattering admits the setting of the scaling example, with the generic random variable Z being governed by the following power-law distribution: Pr(Z ≤ z) = z^δ (0 ≤ z ≤ 1). For this power-law setting the coefficient of Eq. (28) is given by γ = δ(2 + δ). Note that the dimension δ = 1 indeed yields the coefficient of the uniform example: δ = 1 ⇒ γ = 3.
3.2.10. Diffusion

The Rayleigh example is a special case of the scaling example, with the generic random variable Z being governed by the unit-mode Rayleigh distribution: Pr(Z > z) = exp(−z²/2) (z ≥ 0). Also, as noted above, the Rayleigh example emanates from isotropic two-dimensional diffusion. Consider now the analogous δ-dimensional isotropic diffusion: n diffusive particles that are initiated from the origin of a δ-dimensional space, after a fixed time a snapshot of the particles’ spatial positions is taken, and Xk denotes the distance of particle k from the origin. The particles’ diffusion coefficient is the scale parameter θ, and the statistical distribution of the generic random variable Z is characterized as follows: its square, Z², is governed by a chi-square distribution with δ degrees of freedom. For this chi-square setting the coefficient of Eq. (28) is given by γ = 1/(γδ − 1), where γδ = (δ/2) Γ(δ/2)²/Γ((1 + δ)/2)². In particular: (i) in one dimension we have δ = 1 ⇒ γ = 2/(π − 2); (ii) in two dimensions we have δ = 2 ⇒ γ = π/(4 − π); and (iii) in three dimensions we have δ = 3 ⇒ γ = 8/(3π − 8).

The diffusion we considered so far was of the ‘‘normal’’ type — and ‘‘normal’’ diffusion is indeed ubiquitous across the sciences. However, in addition to ‘‘normal’’ diffusion many natural processes and phenomena also exhibit ‘‘anomalous’’ types of diffusion [34–36]. In fact, ‘‘anomalous’’ diffusion turns out to be rather normal and ubiquitous [37–39]. Both normal and anomalous isotropic diffusions admit the following stochastic representation. Consider a particle initiated from the origin, and let D(t) denote the particle’s distance from the origin at time t (t ≥ 0). Then: the random variable D(t) is equal, in law, to the random variable t^{ϵ/2} D(1), where the exponent ϵ is a positive parameter. The exponent ϵ characterizes the type of the diffusion — ‘‘normal’’ when ϵ = 1, and ‘‘anomalous’’ when ϵ ≠ 1. More specifically, the exponent range 0 < ϵ < 1 characterizes ‘‘sub-diffusion’’, and the exponent range 1 < ϵ < ∞ characterizes ‘‘super-diffusion’’.

A key parameter of a general diffusion is its intrinsic rate λ. Namely, the particle’s distance from the origin at time t is actually given by D(λt), rather than by D(t). Hence, the particle’s distance from the origin at time t = 1 is equal, in law, to the random variable λ^{ϵ/2} D(1). In the context of ‘‘normal’’ diffusion the particle’s diffusion coefficient is θ = √λ, and in the context of general diffusion the diffusion coefficient is given by the term θ = λ^{ϵ/2}. Thus, in addition to ‘‘normal’’ diffusion, also ‘‘anomalous’’ diffusion is a special case of the scaling example — with scaling parameter θ = λ^{ϵ/2}, and with generic random variable Z = D(1).
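The three dimensional cases quoted above follow directly from the chi-square setting; a short check (standard library only):

```python
# Diffusion coefficient of Eq. (28): gamma = 1/(gamma_delta - 1), with
# gamma_delta = (delta/2) * Gamma(delta/2)^2 / Gamma((1 + delta)/2)^2.
from math import gamma as G, pi

def diffusion_gamma(delta):
    g = (delta / 2) * G(delta / 2) ** 2 / G((1 + delta) / 2) ** 2
    return 1 / (g - 1)

print(diffusion_gamma(1), 2 / (pi - 2))        # one dimension
print(diffusion_gamma(2), pi / (4 - pi))       # two dimensions
print(diffusion_gamma(3), 8 / (3 * pi - 8))    # three dimensions
```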
3.2.11. Power laws

The Pareto example is a special case of the scaling example, with the generic random variable Z being governed by the unit-cutoff Pareto distribution: Pr(Z > z) = 1/z^ϵ (z ≥ 1). The Pareto distribution follows a power-law in its upper tail. In the example of random scattering we encountered a distribution that follows a power-law in its lower tail: Pr(Z ≤ z) = z^δ (0 ≤ z ≤ 1). In fact, the empirical distributions of many positive-valued quantities happen to exhibit power-laws in both their lower and upper tails [28], and this double power-law structure was shown to emerge universally from general transient and equilibrium mechanisms [40–43].

The beta distribution and the beta-prime distribution are key statistical models that yield a double power-law structure. The beta and beta-prime distributions correspond, respectively, to the cases of bounded and unbounded positive-valued quantities. On the one hand these distributions are analytically tractable and are determined by only two parameters — two positive exponents α and β. On the other hand these distributions are highly versatile and their densities span a variety of shapes. We now turn to exemplify the application of the general scaling mechanism in the context of the beta and beta-prime distributions. In what follows we use the shorthand notation c(α, β) = Γ(α + β)/[Γ(α)Γ(β)].

In the case of the beta distribution the generic random variable Z takes values in the unit interval, and is governed by the density c(α, β) z^{α−1}(1 − z)^{β−1} (0 ≤ z ≤ 1). For this beta setting the coefficient of Eq. (28) is given by γ = α(α + β + 1)/β. Note that the beta exponent β = 1 yields the power-law distribution we encountered in the example of random scattering, Pr(Z ≤ z) = z^α (0 ≤ z ≤ 1), with γ = α(α + 2).

In the case of the beta-prime distribution the generic random variable Z is positive-valued, and is governed by the density c(α, β) z^{α−1}/(1 + z)^{α+β} (0 ≤ z < ∞). For this beta-prime setting the
coefficient of Eq. (28) is given by γ = α(β − 2)/(α + β − 1) (we consider the beta-prime exponent β to be greater than two). Note that the beta-prime exponent α = 1 yields a Pareto distribution that is supported over the positive half-line, Pr(Z > z) = 1/(1 + z)^β (0 ≤ z < ∞), with γ = (β − 2)/β.

3.3. Conditional Lévy framework

We already presented applications to Lévy processes in the context of the aggregation framework of Section 3.1. In this subsection we shall present further applications to Lévy processes in the context of a conditional Lévy framework. Specifically, we consider a stochastic process (X(t))t≥0 that – provided an information I – is a Lévy process with conditional mean M = E[X(1)|I] and with conditional variance V = Var[X(1)|I]. Given a time interval (0, ∆) and a positive integer n we partition the interval into n sub-intervals of equal length ∆/n, and set Xk to be the increment of the stochastic process (X(t))t≥0 over the kth sub-interval:
Xk = X(k∆/n) − X((k − 1)∆/n).  (30)
So, we obtain an n-dimensional random vector (X1, . . . , Xn) whose components are the increments of a conditional Lévy process. The underlying Lévy structure, together with the conditional IID framework of Section 3.2, imply a symmetric covariance structure with micro-correlation that is given by

rn = Var[M] ∆² / (Var[M] ∆² + n E[V] ∆)  (31)
(see Section 5 for the details). In turn, setting dn = n, Eq. (31) implies that

ω∞ = (Var[M]/E[V]) ∆.  (32)
In what follows we describe four different applications of this framework.

3.3.1. Brownian motion

Brownian motion is one of the best known examples of Lévy processes, and is the predominant statistical model of ‘‘normal’’ diffusion [44–46]. The basic form of Brownian motion, {B(t)}t≥0, is a Lévy process with increments governed by the Gauss law — with zero means, and with variances that equal the increments’ lengths. The general form of Brownian motion, termed linear Brownian motion, is given by: L(t) = δt + √ν B(t) (t ≥ 0), where δ is a real ‘‘drift’’ parameter, and where ν is a positive ‘‘volatility’’ parameter. The exponentiation of linear Brownian motion yields geometric Brownian motion, which is the prototypical model of price processes in economics and finance, and which is the foundation of the Merton–Black–Scholes theory of option pricing [47–49].

Consider a real-valued random variable M and a positive-valued random variable V that are both independent of a basic Brownian motion {B(t)}t≥0. Setting X(t) = Mt + √V B(t) (t ≥ 0) yields a conditional linear Brownian motion with ‘‘random drift’’ M and with ‘‘random volatility’’ V. The random drift M is the conditional mean, and the random volatility V is the conditional variance.

3.3.2. Poisson process

The Poisson process is a key example of Lévy processes, and is the principal statistical model of sequences of events occurring randomly in time [50–52]. The basic form of the Poisson process, {N(t)}t≥0, is a Lévy process with increments governed by the Poisson law — with means that equal the increments’ lengths. The general form of the Poisson process is given by: L(t) = N(λt) (t ≥ 0), where λ is a positive ‘‘rate’’ parameter.

Consider a positive-valued random variable Λ that is independent of a basic Poisson process {N(t)}t≥0. Setting X(t) = N(Λt) (t ≥ 0) yields a conditional Poisson process with ‘‘random rate’’ Λ. Both the conditional mean and the conditional variance are given by the random rate: M = Λ and V = Λ. Poisson processes with random rates were introduced and pioneered by Cox [53], and are named in his honor.
3.3.3. Random paces

In the Poisson example we set a ‘‘random rate’’ into a basic Poisson process, thus obtaining a Cox process. In fact, this setting can be applied in the context of general Lévy processes. Indeed, consider a Lévy process {L(t)}t≥0 with mean E[L(1)] = µ and with variance Var[L(1)] = σ². Also, consider a positive-valued random variable Λ that is independent of the Lévy process {L(t)}t≥0. Setting X(t) = L(Λt) (t ≥ 0) yields a conditional Lévy process with an intrinsic ‘‘random pace’’ Λ. This setting implies that the conditional mean is given by the product of the random pace and the Lévy mean, and that the conditional variance is given by the product of the random pace and the Lévy variance: M = Λµ and V = Λσ². Consequently, Eq. (32) implies that

ω∞ = (Var[Λ]/E[Λ]) (µ²/σ²) ∆.  (33)
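As a numerical illustration of this setting, the Monte Carlo sketch below (hypothetical Gamma-distributed pace; our own parameter choices) simulates the increments of a random-pace Poisson process and compares their sampled pair-wise correlation with Eq. (31), the finite-n counterpart of Eq. (33).

```python
# Random-pace (Cox) process: per trial draw a rate Lambda, record the
# increments of N(Lambda * t) over n equal sub-intervals of (0, Delta), and
# compare the sampled pair-wise correlation with Eq. (31), where M = Lambda
# and V = Lambda.
import numpy as np

rng = np.random.default_rng(2)
n, trials, Delta = 10, 200000, 5.0

lam = rng.gamma(shape=4.0, scale=0.5, size=trials)   # E[Lambda]=2, Var[Lambda]=1
X = rng.poisson(lam[:, None] * (Delta / n), size=(trials, n))

corr = np.corrcoef(X, rowvar=False)
sampled_r = corr[~np.eye(n, dtype=bool)].mean()

var_M, mean_V = 1.0, 2.0                             # Var[Lambda], E[Lambda]
predicted_r = var_M * Delta**2 / (var_M * Delta**2 + n * mean_V * Delta)
print(sampled_r, predicted_r)                        # both ~ 0.2 here
```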
3.3.4. Compound Poisson processes

Compound Poisson processes are yet another key example of Lévy processes, and they constitute the fundamental statistical model of random walks in continuous time [54–56]. A compound Poisson process admits the form

L(t) = \sum_{k=1}^{N(λt)} Jk,  (34)
where: (i) {N(t)}t≥0 is a basic Poisson process, and λ is a positive ‘‘rate’’ parameter; (ii) {Jk}∞k=1 is a sequence of IID real-valued random variables, which are independent of the Poisson process. The form of the compound Poisson process has the following stochastic meaning: a particle is placed at the origin of the real line at time t = 0, and thereafter it starts ‘‘jumping around’’; its jumps occur randomly in time, and the jump epochs are governed by a Poisson process with rate λ; its jump-sizes are random variables, and Jk represents the size of the kth jump.

Consider a conditional compound Poisson process, {X(t)}t≥0, which is described as follows. Provided an information I, {X(t)}t≥0 is a compound Poisson process with ‘‘random rate’’ Λ, and its jumps are IID random variables with ‘‘random first moment’’ M1 and with ‘‘random second moment’’ M2. (Λ and M2 are positive-valued random variables, and M1 is a real-valued random variable.) This setting implies that the conditional mean is given by the product of the random rate and the random first moment, and that the conditional variance is given by the product of the random rate and the random second moment: M = ΛM1 and V = ΛM2.

3.4. Conditional aggregation framework

In this section we present a conditional version of the aggregation framework of Section 3.1. To that end we consider, at first, the following conditional symmetric framework. The state of system n is represented by an n-dimensional random vector (X1, . . . , Xn) with components that are conditionally symmetric. Specifically, provided an information In, the components of the random vector (X1, . . . , Xn) share a common conditional mean Mn, a common conditional variance Vn, and a common conditional covariance Cn. The laws of conditional variance and conditional covariance imply a symmetric covariance structure with micro-correlation that is given by

rn = (Var[Mn] + E[Cn]) / (Var[Mn] + E[Vn])  (35)
(see Section 5 for the details). The conditional aggregation framework is described as follows. A real-valued random variable Sn is drawn from an arbitrary distribution, and this random variable is the information In. Provided the information In, the n-dimensional random vector (X1, . . . , Xn) has a symmetric covariance structure that satisfies the sum constraint

X1 + · · · + Xn = Sn.  (36)
This conditional aggregation framework is a special case of the conditional symmetric framework, in which (see Section 5 for the details): (i) the common conditional mean Mn is given by Mn = Sn/n; and (ii) the common conditional variance and the common conditional covariance are coupled by Vn = −(n − 1) Cn. Consequently, setting dn = n and using Eq. (35) we obtain that

ω∞ = lim_{n→∞} Var[Sn]/(n E[Vn]) − 1,  (37)
provided that the limit appearing on the right-hand-side of Eq. (37) exists (see Section 5 for the details). In what follows we describe four different applications of this framework.

3.4.1. Particles and cells

In Section 3.1 we addressed the random scattering of particles into cells, leading to the Maxwell–Boltzmann, Bose–Einstein, and Fermi–Dirac statistics. We now return to these three statistics, this time having a random number of particles represented by an integer-valued random variable Sn. In the case of a fixed number of particles we saw that the three different statistics all yield a common limit ω∞. In the case of a random number of particles the different statistics yield different limits. Indeed, for the Maxwell–Boltzmann statistics we obtain that
ω∞ = lim_{n→∞} Var[Sn]/E[Sn] − 1,  (38)
provided that the limit appearing on the right-hand-side of Eq. (38) exists (see Section 5 for the details). Also, provided that the limit lim_{n→∞} Var[Sn]/E[Sn] exists and is finite, we obtain that

ω∞ = [lim_{n→∞} Var[Sn]/E[Sn]] / [1 ± lim_{n→∞} (1/n) E[Sn]] − 1,  (39)
where the ± sign appearing on the right-hand-side of Eq. (39) is + for the Bose–Einstein statistics, and is − for the Fermi–Dirac statistics (see Section 5 for the details).

3.4.2. Lévy bridges

In Section 3.1 we addressed Lévy bridges — which are the trajectories of Lévy processes conditioned on a given endpoint. We now return to Lévy bridges, this time conditioning on a random endpoint. Specifically, we draw a real-valued random variable Sn, and then consider a Lévy process (L(t))t≥0, over the time interval (0, ∆n), that is conditioned on the ‘‘endpoint information’’ L(∆n) = Sn. In the case of a fixed endpoint we saw that all Lévy bridges yield a common limit ω∞. In the case of a random endpoint different Lévy bridges yield different limits, as we shall now demonstrate with three specific examples.

The first example is the Brownian bridge, in which case the underlying Lévy process is the basic Brownian motion discussed in Section 3.3. For this bridge we obtain that
ω∞ = lim_{n→∞} Var[Sn]/∆n − 1,  (40)
provided that the limit appearing on the right-hand-side of Eq. (40) exists (see Section 5 for the details). The second example is the Poisson bridge, in which case the underlying Lévy process is the basic Poisson process discussed in Section 3.3. For this bridge we obtain that
ω∞ = lim_{n→∞} Var[Sn]/E[Sn] − 1,  (41)
provided that the limit appearing on the right-hand-side of Eq. (41) exists (see Section 5 for the details). The third example is the Gamma bridge, in which case the underlying Lévy process is the basic Gamma process. The increments of this process are governed by the Gamma law — with means and
variances that equal the increments’ lengths. For this bridge we obtain that
ω∞ = lim_{n→∞} (1 + ∆n)/(1 + E[Sn]²/Var[Sn]) − 1,  (42)
provided that the limit appearing on the right-hand-side of Eq. (42) exists (see Section 5 for the details).

3.4.3. A non-scaling scenario

Consider the following non-scaling scenario: Sn = S, where S is a real-valued random variable with mean E[S] = µ and variance Var[S] = σ². In the particles-and-cells context this means that we scatter, at random, S particles into n cells. For all three scattering statistics – Maxwell–Boltzmann, Bose–Einstein, and Fermi–Dirac – Eqs. (38) and (39) imply that this scenario yields the common limit
ω∞ = σ²/µ − 1.  (43)
In the Lévy-bridges context we couple this scenario with a fixed interval length: ∆n = ∆. Namely, we consider Lévy processes over the time interval (0, ∆), conditioned on the ‘‘endpoint information’’ L(∆) = S. Eqs. (40)–(42) imply the following limits:

ω∞ = \begin{cases} σ²/∆ − 1, & \text{Brownian bridge,} \\ σ²/µ − 1, & \text{Poisson bridge,} \\ (∆σ² − µ²)/(σ² + µ²), & \text{Gamma bridge.} \end{cases}  (44)
3.4.4. A scaling scenario

Consider the following scaling scenario: Sn = ξ1 + · · · + ξn, where the summands are IID copies of a general real-valued random variable ξ with mean E[ξ] = µ and variance Var[ξ] = σ². In the particles-and-cells context this means that cell k comes with a random number of balls ξk, and that all the cells’ balls are collected into one pool — and are then redistributed to the cells via random scattering. In this scenario Eqs. (38) and (39) imply the following limits:

ω∞ = \begin{cases} σ²/µ − 1, & \text{Maxwell–Boltzmann,} \\ σ²/[µ(1 + µ)] − 1, & \text{Bose–Einstein,} \\ σ²/[µ(1 − µ)] − 1, & \text{Fermi–Dirac.} \end{cases}  (45)
Note that in the case of the Fermi–Dirac statistics the random variable ξ is binary, i.e. ξ ∈ {0, 1}, and hence its mean takes values in the unit interval: 0 < µ < 1. In the Lévy-bridges context we couple this scenario with a scaling interval length: ∆n = n∆. Namely, we consider Lévy processes over the time interval (0, n∆), conditioned on the ‘‘endpoint information’’ L (∆) = ξ1 + · · · + ξn . Eqs. (40)–(42) imply the following limits:
$$\omega_\infty = \begin{cases} \dfrac{\sigma^2}{\Delta} - 1 & \text{Brownian bridge,} \\[6pt] \dfrac{\sigma^2}{\mu} - 1 & \text{Poisson bridge,} \\[6pt] \dfrac{\sigma^2 \Delta}{\mu^2} - 1 & \text{Gamma bridge.} \end{cases} \tag{46}$$
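The Gamma-bridge entry of Eq. (46) admits an analogous numerical probe: conditioned on its endpoint, the $n$ increments of the Gamma bridge equal the endpoint times a symmetric Dirichlet vector, whose one-dimensional marginals are the Beta laws quoted in Section 5.8. A minimal sketch, with illustrative exponential(1) summands so that $\mu = \sigma^2 = 1$ and the predicted limit is $\sigma^2\Delta/\mu^2 - 1 = 2$ for $\Delta = 3$:

import numpy as np

rng = np.random.default_rng(1)
n, trials, Delta = 200, 20000, 3.0

# Scaling scenario: S_n = xi_1 + ... + xi_n with exponential(1) summands.
S = rng.exponential(1.0, size=(trials, n)).sum(axis=1)

# Conditioned on the endpoint, the Gamma bridge over (0, n*Delta) has
# increments S times a symmetric Dirichlet with concentration Delta.
X = S[:, None] * rng.dirichlet(np.full(n, Delta), size=trials)

C = np.cov(X, rowvar=False)
v = C.diagonal().mean()
c = (C.sum() - np.trace(C)) / (n * (n - 1))
print((n - 1) * c / v)   # close to 2, up to finite-n and sampling error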
Comparing Eqs. (44) and (46), note that the Brownian and Poisson bridges yield the same limits $\omega_\infty$ in the non-scaling and scaling scenarios, whereas the Gamma bridge yields different limits $\omega_\infty$ in these different scenarios.

4. Conclusion

In this paper we studied the transition from micro-correlations to macro-correlations in large systems with a symmetric correlation structure. The state of such a system is represented by a high-dimensional random vector, with pair-wise correlations between its different components that share a common value — the ‘‘micro-correlation’’. Using a scaling procedure that takes into account the high dimension, an asymptotic analysis gave rise to the ‘‘macro-correlation’’ — a score that takes values in the unit interval, and that quantifies the overall correlation of the random vector. Thus, we established a method that, based on the microscopic pair-wise correlations taking place at the components-scale, calculates the macroscopic overall correlation on the system-scale. The use of this method was demonstrated in a diverse collection of applications — yielding closed-form formulae that produce macro-correlations from micro-correlations.

5. Methods

In this section we present detailed derivations of the results and formulae stated along the manuscript.

5.1. Proof of Eq. (3)

A circulant matrix of dimensions $d \times d$ is of the form
$$M = \begin{pmatrix} m_0 & m_1 & m_2 & \cdots & m_{d-1} \\ m_{d-1} & m_0 & m_1 & \cdots & m_{d-2} \\ \vdots & \vdots & \ddots & & \vdots \\ m_1 & m_2 & m_3 & \cdots & m_0 \end{pmatrix}, \tag{47}$$
where $\{m_0, m_1, \ldots, m_{d-1}\}$ are arbitrary real numbers [17]. Moreover [17]: (i) the characteristic function of the circulant matrix $M$ is defined as
$$\phi(z) := \sum_{k=0}^{d-1} m_k z^k \tag{48}$$
($z$ complex); and (ii) the determinant of the circulant matrix $M$ is given by

$$\det(M) = \phi(1) \cdot \prod_{k=1}^{d-1} \phi\!\left(\exp\left(2\pi\sqrt{-1}\,\frac{k}{d}\right)\right). \tag{49}$$
Consider the matrix $M = R - \lambda I$, where: $R$ is the correlation matrix of Eq. (2), $\lambda$ is a complex parameter, and $I$ is the identity matrix of dimensions $d \times d$. Evidently, the matrix $M$ is circulant with entries $m_0 = 1 - \lambda$ and $m_1 = \cdots = m_{d-1} = r$. Hence, the characteristic function of the circulant matrix $M$ is given by
$$\phi(1) = 1 - \lambda + r(1 + \cdots + 1) = 1 - \lambda + (d-1)\,r \tag{50}$$
for $z = 1$, and is given by
$$\phi(z) = 1 - \lambda + r\left(z + \cdots + z^{d-1}\right) = 1 - \lambda + r\left(\frac{1 - z^d}{1 - z} - 1\right) \tag{51}$$
for $z \neq 1$. Eq. (51) implies that
$$\phi\!\left(\exp\left(2\pi i\,\frac{k}{d}\right)\right) = 1 - \lambda - r \tag{52}$$
($k = 1, \ldots, d-1$). Substituting Eqs. (50) and (52) into Eq. (49) yields

$$\det(M) = \left[1 - \lambda + (d-1)\,r\right] \cdot (1 - \lambda - r)^{d-1}. \tag{53}$$
Setting $\lambda = 0$ in Eq. (53) yields

$$\det(R) = \left[1 + (d-1)\,r\right] \cdot (1 - r)^{d-1}. \tag{54}$$
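As a quick numerical sanity check of Eq. (54), and of the eigenvalue structure noted next, consider the following minimal Python/NumPy sketch; the dimension d and the value of r are arbitrary illustrative choices.

import numpy as np

d, r = 6, 0.3
R = (1 - r) * np.eye(d) + r * np.ones((d, d))   # the symmetric correlation matrix

# Determinant: numerical value versus the closed form of Eq. (54).
print(np.linalg.det(R), (1 + (d - 1) * r) * (1 - r) ** (d - 1))

# Spectrum: 1 + (d - 1)*r with multiplicity 1, and 1 - r with multiplicity d - 1.
print(np.sort(np.linalg.eigvalsh(R)))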
In turn, Eq. (54) yields Eq. (3). Note that, as a function of the parameter $\lambda$, $\det(M)$ is the characteristic polynomial of the correlation matrix $R$ — and hence the roots of the polynomial are the eigenvalues of the matrix. Specifically, Eq. (53) implies that: (i) $\lambda = 1 + (d-1)\,r$ is an eigenvalue with multiplicity 1; and (ii) $\lambda = 1 - r$ is an eigenvalue with multiplicity $d - 1$.

5.2. Proof of Eq. (15)

Consider the n-dimensional random vector $(X_1, \ldots, X_n)$ with its symmetric covariance structure, and denote by $\{v_n, c_n, r_n\}$ the respective variance, covariance, and correlation values. Using the basic properties of the variance, together with the sum constraint $\sum_{k=1}^{n} X_k = s_n$, we obtain that
$$0 = \mathrm{Var}[s_n] = \mathrm{Var}\left[\sum_{k=1}^{n} X_k\right] = \sum_{k=1}^{n} \mathrm{Var}[X_k] + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} \mathrm{Cov}[X_k, X_l] = \sum_{k=1}^{n} v_n + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} c_n = n v_n + \left(n^2 - n\right) c_n. \tag{55}$$
Eq. (55) implies that

$$0 = 1 + (n - 1)\,\frac{c_n}{v_n}. \tag{56}$$
In turn, as $r_n = c_n / v_n$, Eq. (56) yields Eq. (15).

5.3. Proof of Eq. (19)

Provided the information $I_n$, the components of the random vector $(X_1, \ldots, X_n)$ are IID random variables with common conditional mean $M_n$ and with common conditional variance $V_n$. Thus, considering a component $X_k$ of the random vector $(X_1, \ldots, X_n)$, the law of conditional variance implies that

$$\mathrm{Var}[X_k] = \mathrm{E}\left[\mathrm{Var}[X_k | I_n]\right] + \mathrm{Var}\left[\mathrm{E}[X_k | I_n]\right] = \mathrm{E}[V_n] + \mathrm{Var}[M_n]. \tag{57}$$
Also, considering different components $X_i$ and $X_j$ of the random vector $(X_1, \ldots, X_n)$, the law of conditional covariance implies that

$$\mathrm{Cov}[X_i, X_j] = \mathrm{E}\left[\mathrm{Cov}[X_i, X_j | I_n]\right] + \mathrm{Cov}\left[\mathrm{E}[X_i | I_n], \mathrm{E}[X_j | I_n]\right] = \mathrm{E}[0] + \mathrm{Cov}[M_n, M_n] = \mathrm{Var}[M_n]. \tag{58}$$
Eqs. (57) and (58) assert that the random vector $(X_1, \ldots, X_n)$ has a symmetric covariance structure with variance value $v_n = \mathrm{E}[V_n] + \mathrm{Var}[M_n]$ and covariance value $c_n = \mathrm{Var}[M_n]$. Consequently, the micro-correlation is given by Eq. (19).
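To illustrate Eq. (19) concretely, here is a minimal Monte Carlo sketch in Python/NumPy of a mixture model with exponential components and a randomized rate; the two-point rate distribution and the sample sizes are illustrative choices only.

import numpy as np

rng = np.random.default_rng(2)
n, trials = 20, 200000

# Mixture model: given the rate, the components are IID exponentials;
# the rate itself is randomized (here: 1 or 2 with equal probability).
lam = rng.choice([1.0, 2.0], size=trials)
X = rng.exponential(1.0 / lam[:, None], size=(trials, n))

C = np.cov(X, rowvar=False)
r_hat = (C.sum() - np.trace(C)) / (n * (n - 1)) / C.diagonal().mean()

# Eq. (19): r_n = Var[M_n] / (Var[M_n] + E[V_n]), with M = 1/lam and V = 1/lam**2.
M = 1.0 / np.array([1.0, 2.0])
V = M**2
print(r_hat, M.var() / (M.var() + V.mean()))   # both approximately 1/11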
5.4. Proof of Eq. (31)

Consider a general Lévy process $(L(t))_{t \geq 0}$. The characteristic properties of Lévy processes imply that [22,23]: (i) the random variable $L(t)$ has a finite mean if and only if the random variable $L(1)$ has a finite mean, in which case

$$\mathrm{E}[L(t)] = \mathrm{E}[L(1)] \cdot t \tag{59}$$

($t \geq 0$); and (ii) the random variable $L(t)$ has a finite variance if and only if the random variable $L(1)$ has a finite variance, in which case

$$\mathrm{Var}[L(t)] = \mathrm{Var}[L(1)] \cdot t \tag{60}$$

($t \geq 0$).
Focus now on the conditional Lévy process $(X(t))_{t \geq 0}$, with conditional mean $M = \mathrm{E}[X(1)|I]$ and conditional variance $V = \mathrm{Var}[X(1)|I]$. Eq. (59), applied to the increments of the process $(X(t))_{t \geq 0}$, implies that
$$\mathrm{E}[X_k | I] = \mathrm{E}\left[X\!\left(\tfrac{k}{n}\Delta\right) - X\!\left(\tfrac{k-1}{n}\Delta\right) \Big| I\right] = \frac{\Delta}{n}\,\mathrm{E}[X(1)|I] = M\,\frac{\Delta}{n}. \tag{61}$$
Also, Eq. (60), applied to the increments of the process $(X(t))_{t \geq 0}$, implies that

$$\mathrm{Var}[X_k | I] = \mathrm{Var}\left[X\!\left(\tfrac{k}{n}\Delta\right) - X\!\left(\tfrac{k-1}{n}\Delta\right) \Big| I\right] = \frac{\Delta}{n}\,\mathrm{Var}[X(1)|I] = V\,\frac{\Delta}{n}. \tag{62}$$
In turn, Eqs. (61) and (62) imply that the n-dimensional random vector $(X_1, \ldots, X_n)$ satisfies the conditional IID framework of Section 3.2 with information $I_n = I$, with conditional mean $M_n = M\Delta/n$, and with conditional variance $V_n = V\Delta/n$. Thus, Eq. (19) implies that
$$r_n = \frac{\mathrm{Var}[M_n]}{\mathrm{Var}[M_n] + \mathrm{E}[V_n]} = \frac{\mathrm{Var}\!\left[M\frac{\Delta}{n}\right]}{\mathrm{Var}\!\left[M\frac{\Delta}{n}\right] + \mathrm{E}\!\left[V\frac{\Delta}{n}\right]} = \frac{\mathrm{Var}[M]\left(\frac{\Delta}{n}\right)^2}{\mathrm{Var}[M]\left(\frac{\Delta}{n}\right)^2 + \mathrm{E}[V]\frac{\Delta}{n}} = \frac{\mathrm{Var}[M]\,\Delta^2}{\mathrm{Var}[M]\,\Delta^2 + n\,\mathrm{E}[V]\,\Delta}. \tag{63}$$
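Eq. (63) can be checked against a concrete conditional Lévy process: a Brownian motion whose drift is randomized. In the following minimal Python/NumPy sketch the standard-normal drift and the values of n and $\Delta$ are illustrative choices; given the drift $M$, the $n$ increments are IID normal with mean $M\Delta/n$ and variance $\Delta/n$, so that $V = 1$.

import numpy as np

rng = np.random.default_rng(3)
n, trials, Delta = 10, 200000, 5.0

# Randomized drift M, then the n conditionally-IID increments.
M = rng.normal(0.0, 1.0, size=trials)
X = M[:, None] * (Delta / n) + rng.normal(0.0, np.sqrt(Delta / n), size=(trials, n))

C = np.cov(X, rowvar=False)
r_hat = (C.sum() - np.trace(C)) / (n * (n - 1)) / C.diagonal().mean()

# Eq. (63) with Var[M] = 1 and E[V] = 1:
print(r_hat, Delta**2 / (Delta**2 + n * Delta))   # both approximately 1/3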
5.5. Proof of Eq. (35)

Provided the information $I_n$, the components of the random vector $(X_1, \ldots, X_n)$ are random variables with common conditional mean $M_n$, common conditional variance $V_n$, and common conditional covariance $C_n$. Thus, considering a component $X_k$ of the random vector $(X_1, \ldots, X_n)$, the law of conditional variance implies that Eq. (57) holds. Also, considering different components $X_i$ and $X_j$ of the random vector $(X_1, \ldots, X_n)$, the law of conditional covariance implies that

$$\mathrm{Cov}[X_i, X_j] = \mathrm{E}\left[\mathrm{Cov}[X_i, X_j | I_n]\right] + \mathrm{Cov}\left[\mathrm{E}[X_i | I_n], \mathrm{E}[X_j | I_n]\right] = \mathrm{E}[C_n] + \mathrm{Cov}[M_n, M_n] = \mathrm{E}[C_n] + \mathrm{Var}[M_n]. \tag{64}$$
Eqs. (57) and (64) assert that the random vector $(X_1, \ldots, X_n)$ has a symmetric covariance structure with variance value $v_n = \mathrm{E}[V_n] + \mathrm{Var}[M_n]$ and covariance value $c_n = \mathrm{E}[C_n] + \mathrm{Var}[M_n]$. Consequently, the micro-correlation is given by Eq. (35).
5.6. Proof of Eq. (37)

Consider the n-dimensional random vector $(X_1, \ldots, X_n)$ with its conditionally symmetric structure, and denote by $\{M_n, V_n, C_n\}$ the respective conditional mean, variance, and covariance. Recall that the information $I_n$ is the random variable $S_n$, which equals the sum of the random-vector's components: $X_1 + \cdots + X_n = S_n$. Using the basic properties of the conditional mean, together with the conditional aggregate-framework structure, we obtain that

$$S_n = \mathrm{E}[S_n | I_n] = \mathrm{E}\left[\sum_{k=1}^{n} X_k \Big| I_n\right] = \sum_{k=1}^{n} \mathrm{E}[X_k | I_n] = \sum_{k=1}^{n} M_n = n M_n. \tag{65}$$
Eq. (65) implies that the conditional mean is given by

$$M_n = \frac{1}{n}\,S_n. \tag{66}$$

Using the basic properties of the conditional variance, together with the conditional aggregate-framework structure, we obtain that
$$0 = \mathrm{Var}[S_n | I_n] = \mathrm{Var}\left[\sum_{k=1}^{n} X_k \Big| I_n\right] = \sum_{k=1}^{n} \mathrm{Var}[X_k | I_n] + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} \mathrm{Cov}[X_k, X_l | I_n] = \sum_{k=1}^{n} V_n + \sum_{\substack{k,l=1 \\ k \neq l}}^{n} C_n = n V_n + \left(n^2 - n\right) C_n. \tag{67}$$
Eq. (67) implies that the conditional variance and the conditional covariance are coupled by

$$V_n = -(n - 1)\,C_n. \tag{68}$$
Consequently, setting $d_n = n$ and using Eqs. (35), (66) and (68), we obtain that

$$\omega_\infty = \lim_{n \to \infty} (d_n - 1)\,r_n = \lim_{n \to \infty} (n - 1)\,\frac{\mathrm{Var}[M_n] + \mathrm{E}[C_n]}{\mathrm{Var}[M_n] + \mathrm{E}[V_n]} = \lim_{n \to \infty} \frac{(n-1)\,\mathrm{Var}\!\left[\frac{S_n}{n}\right] + (n-1)\,\mathrm{E}[C_n]}{\mathrm{Var}\!\left[\frac{S_n}{n}\right] + \mathrm{E}[V_n]} = \lim_{n \to \infty} \frac{(n-1)\,\frac{\mathrm{Var}[S_n]}{n^2} - \mathrm{E}[V_n]}{\frac{\mathrm{Var}[S_n]}{n^2} + \mathrm{E}[V_n]} = \lim_{n \to \infty} \frac{\left(1 - \frac{1}{n}\right)\frac{\mathrm{Var}[S_n]}{n\,\mathrm{E}[V_n]} - 1}{\frac{1}{n}\,\frac{\mathrm{Var}[S_n]}{n\,\mathrm{E}[V_n]} + 1}. \tag{69}$$
Eq. (69) implies that the limit $\omega_\infty$ exists and is finite if and only if the limit

$$l_\infty = \lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{n\,\mathrm{E}[V_n]} \tag{70}$$

exists and is finite — in which case Eq. (69) yields Eq. (37).
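As a finite-n illustration of the passage from Eq. (69) to Eq. (37), the following plain-Python sketch evaluates the last expression in Eq. (69) for the Maxwell–Boltzmann non-scaling scenario of Section 3.4.3, in which $n\,\mathrm{E}[V_n] = \mu(1 - 1/n)$ (see Eq. (72) below); the values $\mu = 4$ and $\sigma^2 = 12$ are arbitrary illustrative choices.

# Non-scaling scenario: S_n = S with E[S] = mu and Var[S] = sigma2, so that
# Var[S_n]/(n*E[V_n]) = sigma2/(mu*(1 - 1/n)) and l_inf - 1 = sigma2/mu - 1 = 2.
mu, sigma2 = 4.0, 12.0
for n in (10, 100, 1000, 10000):
    q = sigma2 / (mu * (1 - 1 / n))               # Var[S_n] / (n E[V_n])
    omega_n = ((1 - 1 / n) * q - 1) / (q / n + 1)
    print(n, omega_n)                             # tends to 2 as n grows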
5.7. Proof of Eqs. (38)–(39)

In the derivations of Eqs. (38)–(39) we use the following shorthands: (i) MB = Maxwell–Boltzmann statistics; (ii) FD = Fermi–Dirac statistics; (iii) BE = Bose–Einstein statistics. Provided the sum $S_n$, the variances of the three statistics are given by:
$$V_n = \begin{cases} \dfrac{S_n}{n}\left(1 - \dfrac{1}{n}\right) & \text{MB,} \\[8pt] \dfrac{S_n}{n}\left(1 - \dfrac{S_n}{n}\right) & \text{FD,} \\[8pt] \dfrac{n-1}{n+1}\,\dfrac{S_n}{n}\left(1 + \dfrac{S_n}{n}\right) & \text{BE.} \end{cases} \tag{71}$$
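The Bose–Einstein line of Eq. (71) admits an exact combinatorial check: for small $n$ and a fixed total $S_n = S$, one may enumerate all equally likely configurations (stars and bars) and compute the occupancy variance of a single cell directly. A minimal plain-Python sketch, with the illustrative values n = 4 and S = 5:

from itertools import combinations

n, S = 4, 5
occupancies = []
for bars in combinations(range(S + n - 1), n - 1):
    cuts = (-1,) + bars + (S + n - 1,)
    occupancies.append([cuts[i + 1] - cuts[i] - 1 for i in range(n)])

x = [occ[0] for occ in occupancies]   # occupancy of the first cell
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / len(x)

# Both printed values equal 27/16 = 1.6875.
print(var, (n - 1) / (n + 1) * (S / n) * (1 + S / n))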
In turn, Eq. (71) further implies that
$$n\,\mathrm{E}[V_n] = \begin{cases} \mathrm{E}[S_n]\left(1 - \dfrac{1}{n}\right) & \text{MB,} \\[8pt] \mathrm{E}[S_n] - \dfrac{1}{n}\,\mathrm{E}[S_n^2] & \text{FD,} \\[8pt] \dfrac{n-1}{n+1}\left(\mathrm{E}[S_n] + \dfrac{1}{n}\,\mathrm{E}[S_n^2]\right) & \text{BE.} \end{cases} \tag{72}$$
Noting that

$$\mathrm{E}[S_n^2] = \mathrm{Var}[S_n] + \mathrm{E}[S_n]^2, \tag{73}$$
we can re-write Eq. (72) in the form
$$n\,\mathrm{E}[V_n] = \begin{cases} \mathrm{E}[S_n]\left(1 - \dfrac{1}{n}\right) & \text{MB,} \\[8pt] \mathrm{E}[S_n] - \dfrac{1}{n}\,\mathrm{E}[S_n]^2 - \dfrac{1}{n}\,\mathrm{Var}[S_n] & \text{FD,} \\[8pt] \dfrac{n-1}{n+1}\left(\mathrm{E}[S_n] + \dfrac{1}{n}\,\mathrm{E}[S_n]^2 + \dfrac{1}{n}\,\mathrm{Var}[S_n]\right) & \text{BE.} \end{cases} \tag{74}$$
Hence: (i) for the MB statistics we have

$$\lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{n\,\mathrm{E}[V_n]} = \lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]\left(1 - \frac{1}{n}\right)} = \lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}; \tag{75}$$
(ii) for the FD statistics we have

$$\lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{n\,\mathrm{E}[V_n]} = \lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n] - \frac{1}{n}\,\mathrm{E}[S_n]^2 - \frac{1}{n}\,\mathrm{Var}[S_n]} = \lim_{n \to \infty} \frac{\frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}}{1 - \frac{1}{n}\,\mathrm{E}[S_n] - \frac{1}{n}\,\frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}} = \frac{\displaystyle\lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}}{\displaystyle 1 - \lim_{n \to \infty} \frac{1}{n}\,\mathrm{E}[S_n]}, \tag{76}$$
provided that the limit $\lim_{n \to \infty} \mathrm{Var}[S_n]/\mathrm{E}[S_n]$ exists and is finite; (iii) for the BE statistics we have

$$\lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{n\,\mathrm{E}[V_n]} = \lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{\frac{n-1}{n+1}\left(\mathrm{E}[S_n] + \frac{1}{n}\,\mathrm{E}[S_n]^2 + \frac{1}{n}\,\mathrm{Var}[S_n]\right)} = \lim_{n \to \infty} \frac{n+1}{n-1}\,\frac{\frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}}{1 + \frac{1}{n}\,\mathrm{E}[S_n] + \frac{1}{n}\,\frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}} = \frac{\displaystyle\lim_{n \to \infty} \frac{\mathrm{Var}[S_n]}{\mathrm{E}[S_n]}}{\displaystyle 1 + \lim_{n \to \infty} \frac{1}{n}\,\mathrm{E}[S_n]}, \tag{77}$$

provided that the limit $\lim_{n \to \infty} \mathrm{Var}[S_n]/\mathrm{E}[S_n]$ exists and is finite.
5.8. Proof of Eqs. (40)–(42)

Consider a Lévy process $(L(t))_{t \geq 0}$ over the time interval $(0, \Delta)$, provided the information $L(\Delta) = l$. Setting $0 < u < 1$, we consider the conditional distribution of the random variable $L(\Delta u)$ given the information $L(\Delta) = l$.

We focus on three specific Lévy processes: the basic Brownian motion, the basic Poisson process, and the basic Gamma process. The distribution of an increment of length $t$ is thus given by: (i) Brown — normal with mean 0 and variance $t$; (ii) Poisson — Poisson with mean $t$; (iii) Gamma — Gamma with mean $t$ and variance $t$. Consequently, the conditional distribution of the random variable $L(\Delta u)$, provided the information $L(\Delta) = l$, is given by: (i) Brown — normal with mean $lu$ and variance $\Delta u(1-u)$; (ii) Poisson — binomial with $l$ trials and success probability $u$; (iii) Gamma — Beta, supported on the interval $(0, l)$, with exponents $\Delta u$ and $\Delta(1-u)$.

Consider now the Lévy process $(L(t))_{t \geq 0}$ over the time interval $(0, \Delta_n)$, with the ‘‘endpoint information’’ $L(\Delta_n) = S_n$. Applying the aforementioned conditional distribution, while setting $u = 1/n$, we obtain that
$$V_n = \frac{1}{n}\left(1 - \frac{1}{n}\right) \cdot \begin{cases} \Delta_n & \text{Brown,} \\[6pt] S_n & \text{Poisson,} \\[6pt] \dfrac{S_n^2}{1 + \Delta_n} & \text{Gamma.} \end{cases} \tag{78}$$
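Before taking expectations, note that the Gamma line of Eq. (78) follows from the standard variance formula of the Beta law quoted above, namely $l^2 \alpha\beta / ((\alpha+\beta)^2(\alpha+\beta+1))$ with $\alpha = \Delta_n/n$ and $\beta = \Delta_n(1 - 1/n)$. A minimal symbolic check (Python with SymPy; the symbol D stands for $\Delta_n$ and S for $S_n$):

import sympy as sp

S, D, n = sp.symbols('S D n', positive=True)
a, b = D / n, D * (1 - 1 / n)

# Variance of a Beta variable with exponents (a, b), scaled to the interval (0, S).
beta_var = S**2 * a * b / ((a + b)**2 * (a + b + 1))

# Prints 0, confirming the Gamma line of Eq. (78).
print(sp.simplify(beta_var - S**2 * (1 / n) * (1 - 1 / n) / (1 + D)))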
Hence
$$n\,\mathrm{E}[V_n] = \left(1 - \frac{1}{n}\right) \cdot \begin{cases} \Delta_n & \text{Brown,} \\[6pt] \mathrm{E}[S_n] & \text{Poisson,} \\[6pt] \dfrac{\mathrm{E}[S_n^2]}{1 + \Delta_n} & \text{Gamma.} \end{cases} \tag{79}$$
Plugging Eq. (79) into Eq. (37), and taking the limit $n \to \infty$, yields Eqs. (40)–(42) (in the case of the Gamma bridge we make use of Eq. (73) in order to obtain Eq. (42)).

References

[1] S.S. Wilks, Biometrika 24 (1932) 471.
[2] S.S. Wilks, in: I. Olkin, S.G. Ghurye, W. Hoeffding, W.G. Madow, H.B. Mann (Eds.), Contributions to Probability and Statistics, Stanford University Press, Stanford, CA, 1960, pp. 486–503.
[3] S.S. Wilks, Collected Papers: Contributions to Mathematical Statistics, Wiley, New York, 1967.
[4] S. Kocherlakota, K. Kocherlakota, in: N.L. Johnson, S. Kotz (Eds.), Encyclopedia of Statistical Sciences, Vol. 3, Wiley, New York, 1983, pp. 354–357.
[5] A. Sen Gupta, Encyclopedia of Statistical Sciences, Wiley, New York, 2006, http://dx.doi.org/10.1002/0471667196.ess6053.pub2.
[6] I. Eliazar, Ann. Phys. 363 (2015) 164.
[7] P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, Paris, 1954.
[8] B.V. Gnedenko, A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Addison-Wesley, London, 1954.
[9] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, second ed., Wiley, New York, 1971.
[10] E.J. Gumbel, Statistics of Extremes, Dover, Mineola, NY, 2004; originally published by Columbia University Press in 1958.
[11] J. Galambos, Asymptotic Theory of Extreme Order Statistics, second ed., Krieger, Melbourne, FL, 1987.
[12] S. Kotz, S. Nadarajah, Extreme Value Distributions, Imperial College Press, London, 2000.
[13] B.S. Everitt, D.J. Hand, Finite Mixture Distributions, Chapman & Hall, London, 1981.
[14] G.J. McLachlan, D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[15] K.L. Mengersen, C.P. Robert, D.M. Titterington, Mixtures: Estimation and Applications, Wiley, New York, 2011.
[16] D.J. Aldous, Exchangeability and Related Topics, Springer, Berlin, 1985.
[17] P.J. Davis, Circulant Matrices, American Mathematical Society, Providence, RI, 2012.
[18] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, Wiley, New York, 1971.
[19] T.L. Hill, Statistical Mechanics, Dover, Mineola, NY, 1987.
[20] R.C. Tolman, The Principles of Statistical Mechanics, Dover, Mineola, NY, 2010.
[21] P. Lévy, Processus stochastiques et mouvement brownien, Gauthier-Villars, Paris, 1965.
[22] A. Janicki, A. Weron, Simulation and Chaotic Behavior of Stable Stochastic Processes, Marcel Dekker, New York, 1994.
[23] G. Samorodnitsky, M.S. Taqqu, Stable Non-Gaussian Random Processes, Chapman & Hall, New York, 1994.
[24] T.S. Ferguson, Ann. Statist. 1 (1973) 209.
[25] T.S. Ferguson, Ann. Statist. 2 (1974) 615.
[26] I. Eliazar, Physica A 356 (2005) 207.
[27] R. Frey, A.J. McNeil, J. Risk 6 (2003) 59.
[28] W.J. Reed, M. Jorgensen, Comm. Statist. Theory Methods 33 (2004) 1733.
[29] A. Chatterjee, S. Yarlagadda, B.K. Chakrabarti (Eds.), Econophysics of Wealth Distributions, Springer, Milan, 2005.
[30] B.K. Chakrabarti, A. Chakraborti, S.R. Chakravarty, A. Chatterjee, Econophysics of Income and Wealth Distributions, Cambridge University Press, Cambridge, 2013.
[31] J.C. Phillips, Rep. Progr. Phys. 59 (1996) 1133.
[32] J. Laherrere, D. Sornette, Eur. Phys. J. B 2 (1998) 525.
[33] W.T. Coffey, Yu.P. Kalmykov, Fractals, Diffusions and Relaxation in Disordered Systems, Wiley-Interscience, New York, 2006.
[34] J.P. Bouchaud, A. Georges, Phys. Rep. 195 (1990) 12.
[35] R. Metzler, J. Klafter, Phys. Rep. 339 (2000) 1.
[36] J. Klafter, I.M. Sokolov, Phys. World 18 (2005) 29.
[37] J.M. Sancho, A.M. Lacasta, K. Lindenberg, I.M. Sokolov, A.H. Romero, Phys. Rev. Lett. 92 (2004) 250601; Phys. Rev. Lett. 94 (2005) 188902.
[38] M. Khoury, A.M. Lacasta, J.M. Sancho, K. Lindenberg, Phys. Rev. Lett. 106 (2011) 090602.
[39] I. Eliazar, J. Klafter, Ann. Phys. 326 (2011) 2517.
[40] W.J. Reed, B.D. Hughes, Phys. Rev. E 66 (2002) 067103.
[41] W.J. Reed, Physica A 319 (2003) 469.
[42] I.I. Eliazar, M.H. Cohen, Physica A 391 (2012) 5598.
[43] I. Eliazar, M.H. Cohen, Phys. Rev. E 88 (2013) 052104.
[44] C. Gardiner, Handbook of Stochastic Methods, Springer, New York, 2004.
[45] N.G. Van Kampen, Stochastic Processes in Physics and Chemistry, third ed., North-Holland, Amsterdam, 2007.
[46] Z. Schuss, Theory and Applications of Stochastic Processes, Springer, New York, 2009.
[47] P.A. Samuelson, Ind. Manag. Rev. 6 (1965) 13.
[48] R.C. Merton, Bell J. Econ. Manag. Sci. 4 (1973) 141.
[49] F. Black, M. Scholes, J. Polit. Econ. 81 (1973) 637.
[50] J.F.C. Kingman, Poisson Processes, Oxford University Press, Oxford, 1993.
[51] D.R. Cox, V. Isham, Point Processes, CRC, Boca Raton, 2000.
[52] R.L. Streit, Poisson Point Processes, Springer, New York, 2010.
[53] D.R. Cox, J. R. Stat. Soc. Ser. B Stat. Methodol. 17 (1955) 129.
[54] E.W. Montroll, G.H. Weiss, J. Math. Phys. 6 (1965) 167.
[55] J. Klafter, I.M. Sokolov, First Steps in Random Walks: From Tools to Applications, Oxford University Press, Oxford, 2011.
[56] I. Eliazar, M.F. Shlesinger, Phys. Rep. 527 (2013) 101.