Proceedings of the 13th IFAC Symposium on Information Control Problems in Manufacturing Moscow, Russia, June 3-5, 2009
Approximation of continuous distribution via the generalized Erlang distribution Salah E. Elmagrhraby*, Rachid Benmansour**, Abdelhakim Artiba**, Hamid Allaoui*** *North Carolina State University, Raleigh, NC 27695-7906 (e-mail:
[email protected]) ** Institut Supérieur de Mécanique de Paris, France (e-mail:{rachid.benmansour; hakim.artiba}@supmeca.fr ) ***Université d’Artois, France (e-mail:
[email protected]) Abstract: In this paper, we study the approximation of a given probability distribution function by the generalized Erlang distribution (GED), which is a Phase-type Distribution (PH-D). In the method of moment we search equating the moments of a given probability distribution function and the moments of the GED which will approximate it. We will show conditions under which the approximation is feasible and will present a mathematical programming approach when the method of equating moments fails. The performance of this method is illustrated by numerical examples. Keywords: Moment method, Entropy, Phase-type distribution. 2.1 Generalized Erlang distribution with two phases(GED2)
1. INTRODUCTION Interest in the Generalized Erlang distribution (GED) stems from our need to model the project resource allocation problem as a continuous time Markov chain. A way for us to deal with this is to model a general distribution of the work content by a combination of exponential distributions, known as a phase type distribution (PH-D), see (Neuts, 1975). We have comfort that we can accomplish such approximation, since the set of Phase type distributions is dense in the set of nonnegative distributions (Nelson, 1995). A classical method in mapping a general probability distribution, G, into a PH-D, P, is to choose P such that some moments of P and G agree. Previous work (Altiok, 1985; Marie, 1980; Sauer, and Chandy, 1975; Whitt, 1982) has contributed a large number of moment matching algorithms. There are four measures of goodness of the approximation (Osogami, 2005) : number of moments matched, minimality of the number of phases, generality of the solution, and computational efficiency of the algorithm. While all of these algorithms excel with respect to some of these four measures, they all are deficient in at least one of them.
Consider a two-phase process in which the service in phase 1 is exponentially distributed with parameter λ1 and in phase 2 with parameter λ2. We assume that the system is blocked with one job in it. Denote the time in each phase by Xi; i = 1, 2, and denote the time over the two stages by Y. Given that FX and fX are respectively the cumulative function and the probability density function of the random variable X, the cumulative distribution function of Y is the convolution of the two distributions of X1 and X2 which we designate as GED2. y
FY ( y ) = ∫ FX 2 ( y − t ) f X 1 (t )dt
FY ( y ) = 1 + (λ 2 e − λ1 y − λ1 e − λ2 y ) (λ1 − λ 2 )
∞
µ k = ∫ (t − µ ) k λe −λt dt
(
f Y ( y ) = (λ1 λ 2 (λ1 − λ 2 )) e − λ2 y − e − λ1 y
(1)
Hence the first four moments are given by:
µ1 =
1
λ
, µ2 = σ 2 =
1
λ2
, µ3 =
2
λ3
, µ4 =
9
λ4
978-3-902661-43-2/09/$20.00 © 2009 IFAC
)
(5)
Calculating of the four central moment of the GED2 gives:
µ1 =
0
(4)
Hence the probability density function of Y is:
2. GENERALIZED ERLANG DISTRIBUTION The single phase generalized Erlang reduces to the exponential distribution with parameter λ. The kth central moments (given that the mean µ = 1/λ) are given by:
(3)
0
1
λ1
+
1
λ2
, σ2 =
1
λ12
+
1
λ 22
, µ3 =
2
λ13
+
2
λ32
, µ4 =
9
λ14
+
9
λ 42
2.2 Generalized Erlang distribution with three phases(GED3) We seek the probability density function of Z = X + Y, where X is exponentially distributed with parameter λ3 and Y is the GED2 with parameters λ1 and λ2. The cumulative density function of Z is secured from the convolution:
(2)
FZ ( z ) = ∫
z
y =0
240
FX ( z − y ) f Y ( y )dy
(6)
10.3182/20090603-3-RU-2001.0570
13th IFAC INCOM (INCOM'09) Moscow, Russia, June 3-5, 2009
FZ ( z ) = 1 + −
λ2 λ1 − λ 2
e −λ1 z −
λ1 λ1 − λ 2
λ1λ 2 λλ e −λ z + 1 2 (λ 2 − λ3 )(λ1 − λ3 ) λ1 − λ 2 3
N
e − λ2 z
Λ (Pexact , Papprox ) = ∑ pr log ( pr π r )
(7)
e − λ2 z e −λ1z − λ 2 − λ3 λ1 − λ3
Observe that a perfect fit in which pr=πr ∀r, yields the value 0.
Therefore, the probability density function is given by, fZ (z) =
λ1λ 2 (e − λ z − e − λ z ) λ1 − λ 2 2
3.1 Method of moments: two phases Assume two phases of the generalized Erlang distribution with parameters λ1, λ2. We equate the first two moments of the GED2 and the “exact” distribution.
1
(8)
λ1λ 2 λ3 λ λ λ e − λ z λ2e − λ z e − λ z + 1 2 1 − (λ 2 − λ3 )(λ1 − λ3 ) λ1 − λ 2 λ1 − λ3 λ 2 − λ3 1
+
(9)
r =1
2
3
1 λ1 + 1 λ 2 = µ
(10)
The four central moments of the GED3 are:
1 λ12 + 1 λ 22 = σ 2
(11)
The mean:
µ1 = 1 λ1 + 1 λ2 + 1 λ3
The variance:
µ2 = σ 2 = 1 λ12 + 1 λ22 + 1 λ23
The skewness:
µ3 = 2 λ13 + 2 λ32 + 2 λ33
where λ1, λ2 > 0 and µ, σ2 are, respectively, the mean and variance of the “exact” distribution. These are two nonlinear equations in two unknowns which solution is easily achieved provided the following restrictive condition is satisfied: (12) 2σ 2
The kurtosis:
µ4 = 9 λ14 + 9 λ42 + 9 λ43
1≤
µ
2
≤ 2
3.2 Method of moments: three phases We do not deal with more stages because the expressions become more complex with little gain in insight relative to methodology. 3. METHOD OF MOMENTS: THE CLASSICAL APPROACH The method of moments is a classical approach in curve fitting to empirical data in statistical studies. Here we adopt it to approximate a given probability density function with a GEDn, in which we demand that the first n moments of the two distributions are equal. The approach is feasible up to a point, beyond which it is infeasible. And even when it is feasible it may impose rather strict limitations on the exact distribution. However, we circumvent the problem of infeasibility by formulating a mathematical program which solution provides the best approximation given a specified set of penalties. To assess the goodness of the approximation, we offer a qualitative measure by “eye-balling” seeing whether the fit is good or otherwise, and a quantitative one by calculating a quasi-cross-entropy measure of the two distributions.
One has in hand the exact probability distribution function over the non-negative half line, with known central moments µ, σ2, µ3. Suppose we wish to approximate it with a GED3 with unknown parameters λ1, λ2, and λ3. One way to secure a good approximation is to equate the three moments: 1 1 1 + + = µ λ2 λ3 λ1 1 1 1 2 2 + 2 + 2 =σ λ λ λ 2 3 1 2 2 2 3 + 3 + 3 = µ3 λ 1 λ2 λ3
(13)
In general this system of equations is infeasible, which shall lead us to investigate deviation from exact equalities. 3.2.1 Example: The Beta Distribution The Beta distribution has the density function: f (y ) =
1
(b − a )
k1 + k 2 −1
( y − a )k −1 (b − y )k −1 , a ≤ 1
β (k 1 , k 2 )
2
y≤b
Where
By definition, if the exact distribution has (mass) probabilities {πr} and the approximation has (mass) probabilities {pr} (1≤r≤N), then the cross-entropy is given by:
β (k 1 , k 2 ) =
Γ (k 1 )Γ (k 2 ) Γ (k 1 + k 2 )
And Γ(.) is the Gamma function.
N
Λ (Pexact , Papprox ) = − ∑ pr log ( pr π r )
Suppose that:
r =1
The drawback of this expression is that gross underestimation of the exact density function in some region may be compensated by gross overestimation in another region, which would result is a small cross-entropy measure, giving the false impression of a “close to perfect” fit. To obviate such eventuality we take the absolute value of the logarithm to secure:
a = 0, b = 18, k1 = 1.45, k2 = 8.
(14)
The first two moments of the Beta distribution are given by: µ = a + (b − a ) σ 2 = (b − a) 2
241
k1 k1 + k 2
(15) k1 k 2
(k1 + k 2 ) 2 (k1 + k 2 + 1)
(16)
13th IFAC INCOM (INCOM'09) Moscow, Russia, June 3-5, 2009
We now analyze the feasible ranges of the shape parameters k1 and k2 in view of Condition (12). Henceforth we assume for simplicity that a = 0.
Since the normal distribution is symmetric we opt for λ1=λ2=λ. The ordinary Erlang distribution for k phases and parameter λ, EDk, is given by:
We substitute from (15) and (16), 2
2σ
=
2
µ
f ( x, k , λ ) =
2k2 k1 ( k1 + k 2 + 1)
µ
2
(19)
with expected value µe = 2/λ and variance σe2 = 2/λ2. For the given set of parameters for the normal distribution function we can either equating the means or the variance.
Parameters suggested in (14) satisfy Condition (12) since:
Equating the Means:
2 (4 . 0274 ) = 1.056 ∈ (1,2 ) (2 . 7619 )2 2
=
)
f ( x,2, λ ) = λ 2 xe − λx
2k 2 ≤2 1≤ k 1 ( k 1 + k 2 + 1)
2
Γ (k
To start, we set k = 2. The ED2 density function simplifies to:
Therefore, for condition (12) to be satisfied, we must have
2σ
λ k x k −1 e − λ x
2/λ =5.6555 implies λ = 0.353638
Then equations (10) and (11) yield:
Then the relative error in the variance is 0.047%
1 λ 2 = 1.05436 or 1 λ 2 = 1.707545.
Equating the Variances:
We opt for the second value of 1/λ2, which yields
2/λ2 =16 implies λ = 0.35355
λ1 = 0.5856, λ 2 = 0.9484
Then the relative error in the mean is -0.025%
The GED2 is superimposed on the Beta distribution in Fig. 1. As can be seen, the approximation is rather good.
Since the two relative errors are negligible, we opt for the former because of the tendency to demand equality in the means. The plot of the two functions is shown in Fig. 2. As can be seen, it is highly unsatisfactory. Following the procedure for approximating the given distribution with a generalized Erlang distribution, albeit still with only two phases, we have, following the arguments presented in subsection 3.1:
λ1 = 0.361550, λ2 = 0.346064 The plot of this GED2 approximation is superimposed on the ED2 approximation in Fig. 2.
Fig. 1. Approximating the Beta distribution with GED2 As quantitative measure of the goodness approximation, we calculate the cross-entropy: Λ (beta, GED2 ) ≈ 0.046246
of
the
(17)
Would an approximation with GED3 be better? We would need the solution to the system of equations given in (13). Here, the first three moments of the Beta distribution are given by: µ 1 = 2.7619 , σ 2 = 4.0274 , µ 3 = 8.7766 The system has no feasible solution. The method of equating moments fails, and we must appeal to some other device to rescue the approach. This is provided by a mathematical programming approach which is discussed in section 4. But first, we give another example in order to demonstrate the confining restrictions imposed by condition (12).
Fig. 2. Approximating the Normal distribution with ED2 and GED2. A quantitative measure of the quality of the approximation is the cross-entropy which is equal to: Λ (Normal , ED2 ) ≈ 0.166678
3.2.2 Example: The normal distribution
Λ (Normal , GED2 ) ≈ 0.16676
Consider a normal distribution with µ = 5.6555 and σ2 = 16. The probability density function is given by:
The improvement is marginal and the approximation is still unsatisfactory.
f (x) =
1 4
2π
e
( x − 5.6555 − 32
) 2
(18)
In the above discussion we confined ourselves to an Erlang distribution with k = 2 for two reasons. First, to enable us to compare ED2 and GED2. Second, it is the closest to have a mean and variance that are almost equal to the mean and
242
13th IFAC INCOM (INCOM'09) Moscow, Russia, June 3-5, 2009
variance of the normal distribution. However, it is known that EDk → the normal distribution as k → ∞. Experimenting with increasing values of k we conclude that maintaining equality of the means results in progressive degradation of the variance vis-a-vis the normal distribution. The density functions of the exact normal distribution and the EDk for k = 2, 4, 8, 16, 32 are plotted in Fig. 3. To be sure, convergence of EDk to the normal is evident; however, the approximation is unsatisfactory for all values of k.
If the nonlinear program yields a solution with equal or almost equal λ’s, then one should use an approximation via the EDk instead of the GEDk. 4.1 Example: a Beta distribution Suppose we have a normalized Beta distribution with parameters k1 = 2 and k2 = 5. Then the density function reduces to: f ( x ) = 30 x (1 − x ) ;0 ≤ x ≤ 1 4
The three moments of this Beta distribution are: µ = 0.2857, σ2 = 0.02551, µ 3 = 0.0170. In this case, system (13) of three nonlinear equations is infeasible. The mathematical program for this distribution is:
min z = h1 (s 1 + e 1 ) + h 2 (s 2 + e 2 ) + h3 (s 3 + e 3 ) Subject to the constraints given in equations (21)-(24). Put h1 = 10, h2 = 5, h3 = 1, meaning that we evaluate deviation from the mean 2 times as heavily as deviation from the variance which is 5 times as heavily as deviation from skewness.
Fig. 3. Erlang distributions with increasing value of k. 4. METHOD OF MOMENTS: A MATHEMATICAL PROGRAMMING APPROACH
The optimal solution (secured via Excel Solver 2007) gives all three λ’s equal so that the mean is satisfied,
It can be easily surmised from the above discussion that to circumvent the severe limitations imposed by constraints above on the shape of the exact density function, we have to deviate from strict equalities in the moments and permit deviations, at a price. To this end we propose a nonlinear program that permits such deviations from the specified moments. We present the model for three stages; generalization to more stages is straightforward albeit it becomes progressively cumbersome. 3
min z = ∑ h i (s i + ei )
λ 1 = λ 2 = λ 3 = 3 0.2857 = 10.500986 which results in an ED3 (not a GED3) approximation with: s1 = e1 = s 2 = e 3 = 0 , e 2 = 0.0017, s 3 = 0.005036
The Beta distribution and its ED3 approximation are shown in Fig. 4. The infeasibility has been removed. We have not only succeeded in finding an ED3 approximation but also the approximation appears to be excellent.
(20)
i =1
subject to the constraints: 3
1
∑λ i =1
i
3
1
∑λ i =1 3
2
∑λ i =1
2 i
3 i
+ s1 − e1 = µ
(21)
+ s 2 − e2 = σ 2
(22)
+ s 3 − e3 = µ 3
(23)
λi > 0, i = 1, 2, 3
Fig. 4. Approximating the Beta distribution with ED3.
(24)
A quantitative measure of the goodness of the approximation is provided by the cross-entropy measure which is:
with ei ≥ 0, si≥ 0 i=1, 2, 3. In this nonlinear program hi is the penalty coefficient, si is the “slack” variable, and ei is the “excess” variable. These variables are penalized in the objective function depending on the conceived importance of the deviation. We illustrate afterward the application of this program to the same two distributions discussed above, except that now we do not have to worry about satisfying condition (12).
Λ (Beta ( 0,1; 2,5), ED 3 ) ≈ 0.060813
4.2 Example: a normal distribution Consider a normal distribution with parameters µ=20 and ɐ2=16. This distribution yields negligible probabilities of having negative values of the argument x. Further, condition (12) is not satisfied.
243
13th IFAC INCOM (INCOM'09) Moscow, Russia, June 3-5, 2009
Put h1= 10, h2=1. The mathematical program for this distribution is to:
k = 1.2, λ = 2, θ = 0.
m in z = 10 (s 1 + e 1 ) + (s 2 + e 2 )
µ = 1.8813, σ2 = 2.4789, µ 3 = 5.9371, µ 4 = 143.607
Subject to the constraints:
Solution of equation (8) and (9) in this case yields:
2
1
∑λ i =1
i
2
1
∑λ i =1
2 i
Then the first four moments are given by:
λ1= 2.89745 and λ2= 0.65096.
+ s1 − e1 = 20
The Weibull distribution and its GED2 approximation are shown in Fig. 6.
+ s 2 − e 2 = 16
An initial feasible solution is secured by concentrating on equating the means at the expense of the variance:
λ = 2 20 = 0 .1 s 1 = e1 = s 2 = 0 , e 2 = 1 84 , z = 1 84 . Λ ( Normal (20,16), ED2 ) ≈ 1.421152
Can we do better? The answer is, fortunately, yes. Since the normal distribution is symmetric we opt to approximate it with the Erlang distribution EDk rather than the generalized Erlang distribution GEDk. Simple enumeration over the value of k reveals that k = 24 is the best integer value; continued search would have resulted in a fractional k that coincides with the given normal distribution. A summary of our search is presented in Table 1, and the approximating functions are shown in Fig. 5.
Table 1. The search for the best EDk approximation. Distribution ED2: λ=0.1 ED4: λ=0.2 ED8: λ=0.4 ED16 :λ=0.8 ED32: λ=1.6 ED24: λ=1.2
Mean 20 20 20 20 20 20
Variance 200 100 50 25 12.5 16.6
Cross-entropy 1,421152 0,880751 0,410256 0,125916 0,058972 0,046255
Fig. 6. Approximation of Weibull (1.2,2,0) with GED2. The approximation is rather good. As quantitative measure of the goodness of the approximation, we calculate the crossentropy following equation (9): Λ (Weibull(1.2,2,0), GED2 ) ≈ 0.046049
To seek the GED3 approximation of the Weibull distribution we need to solve the three nonlinear equations in (13). This system of equations has no feasible solution; we use then our mathematical programming approach. The following solution is found (with parameters h1=10, h2=5, h3=1):
λ1= 0.70357, λ2= 37204503607, λ3= 2.17388. The cross-entropy is secured from equation (9): Λ (Weibull(1. 2,2,0), GED3 ) ≈ 0.054683
The value of λ2 is too large to be useful in a practical case. We use then a GED’2 with parameters λ1 and λ3. We notice that this GED’2 approximation has the same quality as the previous one and that the cross-entropy remains almost equal, but we have avoided the handling of λ2. For the GED4 approximation, we use analysis similar to that performed in the previous cases with the new parameters: h1 = 10, h2 = 5, h3 = 1, h4=1. The optimal solution is:
λ1= 1,48498, λ2= 30,9729, λ3= 2,76236, λ4= 0,72577. Fig. 5. Approximation of N(20,16) with ED2, ED4 and ED24
Then the cross-entropy is:
4.3 Example: a Weibull distribution
Λ (Weibull(1.2,2,0), GED4 ) ≈ 0.144217
The probability density function is given by:
k x − θ f X ( x ) = λ λ
k −1
e
x −θ − λ
k
;x ≥θ
Suppose that the parameters of the Weibull distribution are:
The quality of the approximation has not improved and the approximation by GED2 remains the best one. From now on we choose parameters of the Weibull distribution so that condition (10) is not satisfied. Suppose that these parameters are: k = 3.5, λ = 2, θ = 0.
244
13th IFAC INCOM (INCOM'09) Moscow, Russia, June 3-5, 2009
The first four moments are given by: µ = 1.7995, σ2 = 0.3243, µ 3 = 0.00463, µ 4 = 12.8866. We give in Table 2. the results of approximating the Weibull distribution via ED2, ED3 and GED4.
Takahashi uses PH-distributions to study complicated queueing models (Takahashi, 1981). Our method can be used to find directly a solution of the renewal equation in the case of an approximation via GED2 or GED3.
Table 2. Approximating Weibull (3.5,2,0) via ED2, ED3 and GED4.
ED2
Parameters hi
Value of λi
Cross-entropy
Value of z
h1=10
λ1=1.111 λ2=1.111
0,801509
6.4739
λ1=1.667 λ2=1.667 λ3=1.667
0,518725
5.0657
λ1=1.4849 λ2=30.973 λ3=2.7623 λ4=0.7257
0,382571
4.66011
h2=5 h1=10
ED3
h2=5
REFERENCES
h3=1 h1=10
GED4
h2=5 h3=1 h4=1
The approximation of the Weibull distribution by ED3 as a first step and GED4 in the second improves the entropy by approximately 35.28% and 26.24% respectively. The overall improvement is of 52.27%, but Fig. 7 shows that the approximation is still unsatisfactory.
Altiok, T. (1985). On the phase-type approximations of general distributions. IIE Transactions, 17(2), 110-116. Marie, R.A. (1980). Calculating equilibrium probabilities for λ(n)/ck/1/n queues, In: Proceedings of the Performance Tools, 117-125. Nelson, R. (1995). Probability, Stochastic Processes, and Queueing Theory, Springer-Verlag. Neuts, M.F. (1975), Probability distributions of phase type. In: Liber amicorum Professor Emeritus H. Florin. Dept. of Mathematics, University of Louvain, 173–206. Puterman, M. L. (1994), Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series In: Probability and Statistics. Sauer, C. H. and K. M. Chandy (1975). Approximate analysis of central server models. IBM Journal of Research and Development, 19(3), 301-313. Takahashi, Y. (1981). Asymptotic exponentiality of the tail of the waiting-time distribution in a PH/PH/c queue. Advances in Applied Probability, 13, 619–630. Osogami, T. (2005), Analysis of Multi-server Systems via Dimensionality Reduction of Markov Chains, PhD thesis, School of Computer Science Carnegie Mellon University Pittsburgh. Whitt, W. (1982), Approximating a point process by a renewal process: Two basic methods. Operations Research, 30(1), 125-147.
Fig. 7. Approximation of Weibull (3.5,2,0) with ED2,ED3 and GED4. 5. CONCLUSION In this paper, a new approach of the moment matching method has been given. When the classical approach fails in mapping a general probability distribution into a phase type distribution, this new one provides an alternative way to rescue the approach via a mathematical program. The method suggested was found to be a very good tool to approximate a general distribution, for the examples considered. It is easy to implement this method for almost all usual distribution function and it has a broad applicability in engineering and operations research. Furthermore, many optimization problems in stochastic environment can be formulated as Markov decision (Puterman, 1994) by mapping general probability distributions into phase type distributions.
245