Suggestions of efficient self-similar generators


Simulation Modelling Practice and Theory 15 (2007) 328–353 www.elsevier.com/locate/simpat

Suggestions of efficient self-similar generators

Hae-Duck J. Jeong a,*, Jong-Suk R. Lee b, Don McNickle c, Krzysztof Pawlikowski c

a School of Information Science, Korean Bible University, Seoul, South Korea
b Grid Technology Research Department, Supercomputing Centre, Korea Institute of Science and Technology Information, Daejeon, South Korea
c University of Canterbury, Christchurch, New Zealand

Received 8 January 2006; received in revised form 9 October 2006; accepted 17 October 2006 Available online 28 November 2006

Abstract

The growth of Grid computing and the Internet has been exponential in recent years. These high-speed communication networks have had a tremendous impact on our civilisation. They offer a wide range of applications, such as multimedia and data-intensive applications, which differ significantly in their traffic characteristics and performance requirements. Many analytical studies have shown that self-similar network traffic can have a detrimental impact on network performance, including amplified queueing delays and packet loss rates in broadband wide area networks. Thus, a full understanding of self-similarity is an important issue in teletraffic engineering. This paper presents a detailed survey of self-similar generators proposed for generating sequential and fixed-length self-similar pseudo-random sequences for simulation in communication networks. We evaluate and compare the operational properties of the fixed-length and sequential generators of self-similar pseudo-random sequences. The statistical accuracy and the time required to produce long sequences are discussed theoretically and studied experimentally. The evaluation of the generators concentrated on two aspects: (i) how accurately self-similar processes can be generated (assuming a given mean, variance and self-similarity parameter H), and (ii) how quickly the generators can generate long self-similar sequences. Overall, our results have revealed that the fastest and most accurate generators of the six sequential and five fixed-length sequence generators considered are the SRP-FGN, FFT and FGN-DW methods.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Self-similar generator; Self-similar process; Hurst parameter; Sequential simulation; Complexity; Communication network

* Corresponding author. Address: 205 Sanggye 7-dong, Nowon-gu, Seoul 139-791, South Korea. Tel.: +82 2 950 5495; fax: +82 2 950 5411. E-mail address: [email protected] (Hae-Duck J. Jeong).

1569-190X/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.simpat.2006.10.002

1. Introduction

Over the past decade, many studies of traffic modelling and analysis have shown that self-similar (or fractal) processes may provide better models for teletraffic in modern communication networks than Poisson processes [1–3]. If this is not taken into account, conclusions about the performance of telecommunication networks can be inaccurate. Thus, an important requirement for conducting simulation studies of communication networks is the ability to accurately generate long synthetic stochastic self-similar sequences. Various methods for generating pseudo-random self-similar sequences have been proposed [4,5]. Generators of synthetic self-similar sequences can be divided into two practical classes: sequential generators and fixed-length sequence generators.

1.1. An overview of sequential generators

It is possible to construct a sequential Markovian model that mimics a self-similar sequence [6,7]. However, a disadvantage of this method is that the connection between the model's parameters and its self-similar properties is difficult to understand. Markovian models for self-similar traffic are forced to include several control parameters with a wide range of input values. These values are also harder to control, and accuracy is more sensitive to them, in sequential generators than in generators of fixed-length sequences of self-similar processes with a given Hurst parameter. For example, a method based on the superposition of two-state Markovian processes was proposed by Andersen and Nielsen [8]. They showed that it can be used to imitate a self-similar process with a certain Hurst parameter over three to five time scales. However, five control parameters are needed, and the resulting self-similarity can gradually disappear as the time scale increases. Thus, in this respect, their method is unable to adequately model self-similar counting processes. Lowen and Teich [9], and Ryu and Lowen [10,11] proposed four generators (i.e., fractal-binomial-noise-driven Poisson process [FBNDP], fractal renewal process [FRP], superposition of fractal renewal processes [SFRP], and fractal-shot-noise-driven Poisson process [FSNDP]).
We considered only SFRP and FBNDP because they are more flexible, more accurate and faster at generating self-similar processes than FRP and FSNDP; see [11,12]. A sequential generator based on renewal reward processes has been proposed by Mandelbrot [13] and Taqqu and Levy [3,14]. However, we excluded that generator because it requires O(nM) computations to generate n numbers, where M is an aggregation level. Further, the generator behaves as a fractional Brownian motion when $n \ll M$, for $M \ge 16{,}000$; see [3,14] for detailed discussions.

We considered and investigated the following efficient candidate sequential generators:

• A generator based on the fractal-binomial-noise-driven Poisson processes (FBNDP), proposed by Lowen and Teich [9], and Ryu and Lowen [10,11];
• A generator based on the superposition of fractal renewal processes (SFRP), proposed by Lowen and Teich [9], and Ryu and Lowen [10,11];
• A generator based on the M/G/∞ processes (MGIP), proposed by Cox and Isham [15], and Cox [16];
• A generator based on the Pareto-modulated Poisson processes (PMPP), proposed by Le-Ngoc and Subramanian [17];
• A generator based on the spatial renewal processes and fractional Gaussian noise (SRP-FGN), proposed by Taralp et al. [18]; and
• A generator based on the superposition of autoregressive processes (SAP), proposed by Granger [19].

1.2. An overview of fixed-length sequence generators

The most frequently studied models of self-similar traffic in the discrete-time case belong to the class of fractional autoregressive integrated moving-average (F-ARIMA) processes and the class of fractional Gaussian noise (FGN) processes because they require the Hurst parameter and variance; see [2,5,20]. F-ARIMA(p, d, q) processes were introduced by Hosking [21], where p is the order of autoregression in the ARIMA process, d is the degree of differencing, and q is the order of the moving average. Hosking showed that F-ARIMA processes are asymptotically self-similar with the Hurst parameter $H = d + \tfrac{1}{2}$, as long as $0 < d < \tfrac{1}{2}$.


To describe the FGN process, we first introduce the fractional Brownian motion (FBM) process $B_H(t)$, $t \ge 0$, which has a Hurst parameter $H$, $0 < H < 1$. The FBM process is a Gaussian process with zero mean, stationary increments and the autocovariance function

$$\mathrm{Cov}(B_H(t_1), B_H(t_2)) = \tfrac{1}{2}\left[t_1^{2H} + t_2^{2H} - |t_1 - t_2|^{2H}\right]\mathrm{Var}[B_H(1)],$$

where $t_1$ and $t_2$ are times. This is statistically self-similar in the sense that $B_H(at)$, $t \ge 0$, has the same finite-dimensional distributions as $a^H B_H(t)$, $t \ge 0$, for all $a > 0$. The FGN process Y is the incremental process of the FBM process. It is defined by $Y_i = B_H(i+1) - B_H(i)$, $i \ge 1$, and its properties of stationarity, zero mean and variance $E[Y_i^2] = E[B_H^2(1)] = \sigma_0^2$ are derived from the FBM process. The ACF of the FGN process is given by

$$\rho_k = \tfrac{1}{2}\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right]. \tag{1}$$

A different approach to generating synthetic self-similar sequences for packet traffic was proposed by Erramilli et al. [22–24], based on deterministic chaotic maps [25]. Chaos is present in a dynamic system if simple, low-order, nonlinear deterministic equations can produce behaviour that mimics random processes. In particular, Erramilli and Singh have shown that a simple, two-parameter nonlinear chaotic map, referred to as an intermittent map, can capture many of the fractal properties in actual packet traffic measurements. The generation of synthetic traffic via nonlinear chaotic maps makes the dynamic systems approach to packet traffic modelling particularly appealing. After an appropriate chaotic map has been derived from a set of traffic measurements, generating a packet stream for an individual source is generally quick and easy. Deriving an appropriate nonlinear chaotic map from a set of actual traffic measurements, however, currently requires considerable guessing and experimenting.
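As a small illustration of Eq. (1), the following Python sketch evaluates the FGN autocorrelation at a given lag. The function name `fgn_acf` and its argument names are ours, not the paper's; the formula is the one stated in Eq. (1), with the lag-0 value set to 1 by definition.

```python
def fgn_acf(k: int, hurst: float) -> float:
    """Autocorrelation of fractional Gaussian noise at lag k, following Eq. (1)."""
    if not 0.0 < hurst < 1.0:
        raise ValueError("hurst must lie in (0, 1)")
    k = abs(k)  # the ACF is symmetric in the lag
    if k == 0:
        return 1.0
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + (k - 1) ** two_h)


if __name__ == "__main__":
    # For H > 0.5 the correlations are positive and decay slowly with the lag.
    for lag in (1, 10, 100):
        print(lag, fgn_acf(lag, 0.8))
```

For H = 0.5 the expression vanishes at every non-zero lag, which corresponds to uncorrelated increments.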
We considered the following fully synthesised fixed-length sequence generators:

• A generator based on the fractional-autoregressive integrated moving average (F-ARIMA) process, proposed by Hosking [21];
• A generator based on the fast Fourier transform (FFT) algorithm, proposed by Paxson [20];
• A generator based on fractional Gaussian noise and Daubechies wavelets (FGN-DW), proposed by Jeong et al. [26,27];
• A generator based on the random midpoint displacement (RMD) algorithm and implemented by Lau et al. [28]; and
• A generator based on the successive random addition (SRA) algorithm, proposed by Saupe [29], in the version implemented by Jeong et al. [30].

Our comparative evaluation of self-similar pseudo-random teletraffic generators concentrates on two aspects: (i) how accurately self-similar processes can be generated, and (ii) how quickly the methods generate long self-similar sequences. We describe six sequential generators, based on FBNDP, SFRP, MGIP, PMPP, SRP-FGN and SAP, in Section 2, and five fixed-length generators of self-similar sequences, based on the F-ARIMA, FFT, FGN-DW, RMD and SRA methods, in Section 4. Then, in Sections 3 and 5, we concentrate on the least biased estimators, the wavelet-based H estimator and Whittle's MLE, as discussed in [31], when presenting the numerical results of a comparative analysis of the generated sequences. In Section 6, the fastest and most accurate sequential generator is compared with the most accurate fixed-length sequence generator; finally, conclusions are presented.

2. Sequential generators

2.1. Method based on fractal-binomial-noise-driven Poisson process

For the standard fractal renewal process (FRP), inter-event times are independent random variables. The marginal probability density function (PDF) of such a fractal renewal process assumes the form

$$f(t) = \begin{cases} 0, & t \le A, \\ d A^{d} t^{-(d+1)}, & t > A, \end{cases} \tag{2}$$

where $0 < d < 2$ [12]. However, the resulting IDC(t) (see [31]) has a dip near $t = t_0$, caused by the abrupt cutoff in the inter-event time PDF. The time instant $t_0$, which marks the lower limit for significant scaling behaviour in the IDC(t) and ACF, is also known as the fractal onset time. Furthermore, the power spectral density exhibits excessive oscillations for the same reason. Selecting $d$ in the range $1 < d < 2$ proves far superior to $0 < d < 1$ for the same required value of $\alpha$, but the form of the inter-event time PDF in Eq. (2) can be further improved. The improved PDF of the FRP decays as a power law given by

$$f(t) = \begin{cases} d A^{-1} e^{-dt/A}, & 0 < t \le A, \\ d e^{-d} A^{d} t^{-(d+1)}, & t > A, \end{cases} \tag{3}$$

which is continuous for all t and produces a smoother spectral density function than Eq. (2).

The FRP is recast as a real-valued process that alternates between two values, zero and R, R > 0 [11]. This alternating FRP starts at a value of zero ("OFF"), and then switches to a value of R ("ON") at a time corresponding to the first event in the FRP. At the second such event, the alternating FRP switches back to zero, and proceeds to switch back and forth at every successive event in the FRP. Thus, all ON/OFF periods are IID with the same heavy-tailed distribution as in the FRP. A method based on the fractal-binomial-noise-driven Poisson process (FBNDP) adds M IID alternating FRPs to generate a fractal binomial noise process that serves as the rate function for a Poisson process.

The FBNDP requires five input parameters to generate self-similar sequences: $A$, $d$, $R$, $\Delta t$ and $M$. The resulting Hurst parameter H assumes the value $(\alpha + 1)/2$. The algorithm advances by the intervals $\Delta t$. If $S$ is a simulation clock, which advances in time, and $S^{(j)}$ is the elapsed time of the jth FRP sequence, then $S^{(j)} = s_0^{(j)} + s_1^{(j)} + \cdots + s_k^{(j)}$ for some k and j = 1, 2, ..., M, where $s_k^{(j)}$ is the inter-arrival time. The sequence of self-similar pseudo-random numbers $X_0, X_1, \ldots$ is generated by the following steps:

Step 1. For each j = 1, 2, ..., M, generate $s_0^{(j)}$ from

$$s_0^{(j)} = \begin{cases} -d^{-1} A \log\left[U (dV - 1)(dV - U)^{-1}\right], & V \ge 1, \\ A V^{1/(1-d)}, & V < 1, \end{cases} \tag{4}$$

where

$$V = \frac{1 + (d - 1)e^{d}}{d}\, U, \tag{5}$$

and U is an IID uniformly distributed random variable over the unit interval [0, 1); set $S^{(j)} = s_0^{(j)}$.
Step 2. Find $j^*$ and $S^{(j^*)}$ such that $j^* = \arg\min_j \{S^{(j)}\}$.
Step 3. Calculate

$$x = \begin{cases} 0, & \text{if } S^{(j^*)} < A, \\ 1, & \text{if } S^{(j^*)} \ge A. \end{cases} \tag{6}$$

Step 4. If x = 1, then $X_0$ should be drawn from a Poisson probability distribution with $\lambda = 1$. If x = 0, then $X_0 = 0$.
Step 5. Set i = 1 and y = 0. Advance the simulation clock, i.e., $S \leftarrow S^{(j^*)}$.
Step 6. Construct a new inter-event time $s_i^{(j)}$ from

$$s_i^{(j)} = \begin{cases} -d^{-1} A \log[U], & U \ge e^{-d}, \\ e^{-1} A U^{-1/d}, & U < e^{-d}, \end{cases} \tag{7}$$

and set $S^{(j^*)} \leftarrow S^{(j^*)} + s_i^{(j)}$.
Step 7. Find a new $j^*$ such that $j^* = \arg\min_j \{S^{(j)}\}$, and compute $S^{(j^*)} - S$.
Step 8. Repeat Step 6 through Step 8 to obtain x as in Step 3.
Step 9. Advance the simulation clock, i.e., $S \leftarrow S^{(j^*)}$, and set y = y + x.
Step 10. Repeat Step 6 through Step 10 within a time slot of length $\Delta t$.
Step 11. Compute $X_i = \mathrm{POISS}(y)$, set y = 0, and i = i + 1.
Step 12. Repeat Step 6 through Step 11 until i = n, where n is the number of sample points.
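The two inter-event time samplers used in Step 1 and Step 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: we read Eq. (4) as $-d^{-1}A\log[U(dV-1)(dV-U)^{-1}]$ for $V \ge 1$ and $AV^{1/(1-d)}$ otherwise, with $V = [1 + (d-1)e^{d}]U/d$, and Eq. (7) as $-d^{-1}A\log U$ for $U \ge e^{-d}$ and $e^{-1}AU^{-1/d}$ otherwise. The function names, the restriction to $1 < d < 2$, and the use of a uniform variate on (0, 1] to avoid log(0) are our choices.

```python
import math
import random


def _check(d: float, A: float) -> None:
    if not 1.0 < d < 2.0 or A <= 0.0:
        raise ValueError("this sketch assumes 1 < d < 2 and A > 0")


def frp_first_interval(d: float, A: float, rng) -> float:
    """First inter-event time of one FRP, as we read Eqs. (4)-(5)."""
    _check(d, A)
    u = 1.0 - rng.random()  # uniform on (0, 1]
    v = (1.0 + (d - 1.0) * math.exp(d)) * u / d
    if v >= 1.0:
        value = -(A / d) * math.log(u * (d * v - 1.0) / (d * v - u))
        return max(0.0, value)  # guard against rounding just below zero
    return A * v ** (1.0 / (1.0 - d))


def frp_next_interval(d: float, A: float, rng) -> float:
    """Subsequent inter-event time of one FRP, as we read Eq. (7)."""
    _check(d, A)
    u = 1.0 - rng.random()  # uniform on (0, 1]
    if u >= math.exp(-d):
        return max(0.0, -(A / d) * math.log(u))  # exponential body, at most A
    return A * math.exp(-1.0) * u ** (-1.0 / d)  # power-law tail, above A


if __name__ == "__main__":
    rng = random.Random(1)
    print([round(frp_next_interval(1.5, 9.92, rng), 3) for _ in range(5)])
```

Both branches of each sampler meet at the cutoff A, which mirrors the continuity of the PDF in Eq. (3).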

An approximate self-similar sequence $\{X_0, X_1, X_2, \ldots\}$ is obtained from these steps. We assume that four input values of the FBNDP method are A = 9.92, R = 200, M = 4 to 14, and H = 0.6 to 0.9. Our results show that the appropriate aggregate level M is between 4 and 10, and that no aggregation level in this range is consistently better than the others. Without studying whether the marginal probability distributions of these mixtures of Poisson processes are close enough to normal distributions, we chose the aggregate level M = 10 when comparing this generator with others in Section 3. Thus, the selection of M for securing normality of the marginal distributions of the output processes from such a generator remains an open problem. Note that, for small input values of $\lambda$ of Poisson processes, only the Poisson approximation can be used, but for large input values of $\lambda$ we can use either the normal or the Poisson approximation. This implies that for large values of $\lambda$ it must be possible to approximate the Poisson distribution by the normal distribution; see [32, p. 190] for details.

Generation of a sample sequence of 1,048,576 numbers (so, about $524 \times 10^6$ inter-event times) took 9 min 38 s on a Pentium II (233 MHz, 512 MB). The FBNDP method requires O(n) computations to generate n numbers. For a more detailed discussion, see [12].

2.2. Method based on superposition of fractal renewal processes

The fractal renewal process (FRP) was described in Section 2.1. We now consider a method based on the superposition of fractal renewal processes (SFRP), proposed by Lowen and Teich [9] and Ryu and Lowen [10,11]. This self-similar process is defined as the superposition of M independent and identical FRPs, and it is characterised by M and the common inter-event PDF in Eq. (3).
This method requires three parameters, i.e., $\alpha$ and $A$ from the individual FRPs, and $M$, the number of FRPs superposed. The resulting Hurst parameter H, and the mean $\mu$ and variance $\sigma^2$ of the marginal output distribution of a related count process in the unit time interval, are given by $H = (\alpha + 1)/2$, $\mu = E[X_n] = \lambda$ and $\sigma^2 = \mathrm{Var}[X_n] = [1 + (1/t_0)^{\alpha}]\lambda$, where

$$\lambda = M d \left[1 + (d - 1)^{-1} e^{-d}\right]^{-1} A^{-1}$$

is the aggregated arrival rate of events in the unit time interval, and

$$t_0 = \left\{2^{-1} d^{-2} (d - 1)^{-1} (2 - d)(3 - d)\, e^{-d} \left[1 + (d - 1)e^{d}\right]^{2} A^{\alpha}\right\}^{1/\alpha}$$

is the value of time at which the resulting IDC(t) has a dip. If $S$ is a simulation clock, which advances in time, and $S^{(j)}$ is the elapsed time of the jth FRP sequence, then $S^{(j)} = s_0^{(j)} + s_1^{(j)} + \cdots + s_k^{(j)}$ for some k and j = 1, 2, ..., M. The inter-event times $X_i$ are generated by the following steps:

Step 1. For each j = 1, 2, ..., M, and i = 0, generate $s_0^{(j)}$ from Eqs. (4) and (5) in the FBNDP; set $S^{(j)} = s_0^{(j)}$.
Step 2. Find $j^*$ such that $j^* = \arg\min_j \{S^{(j)}\}$, and set $X_0 = S^{(j^*)}$.
Step 3. Advance the simulation clock, i.e., $S \leftarrow S^{(j^*)}$.
Step 4. Set i = i + 1. Construct a new inter-event time $s_i^{(j)}$ from Eq. (7) in the FBNDP and set $S^{(j^*)} \leftarrow S^{(j^*)} + s_i^{(j)}$.
Step 5. Find a new $j^*$ such that $j^* = \arg\min_j \{S^{(j)}\}$, and compute $X_i = S^{(j^*)} - S$.
Step 6. Advance the simulation clock, i.e., $S \leftarrow S^{(j^*)}$.
Step 7. Repeat Step 4 through Step 6 until a given i = n is reached, where n is the number of sample points.
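The SFRP steps can be sketched in Python roughly as below. This is an illustrative reading, not the authors' C implementation: the function name `sfrp_intervals`, the seed argument, the restriction to $1 < d < 2$, and the forms assumed for Eqs. (4), (5) and (7) (written out in the comments) are ours. The sketch returns exactly n inter-event times of the superposed process.

```python
import math
import random


def sfrp_intervals(n: int, M: int, d: float, A: float, seed=None) -> list:
    """Return n inter-event times of M superposed FRPs (Steps 1-7)."""
    if not 1.0 < d < 2.0 or A <= 0.0 or M < 1:
        raise ValueError("this sketch assumes 1 < d < 2, A > 0 and M >= 1")
    rng = random.Random(seed)

    def uniform() -> float:
        return 1.0 - rng.random()  # in (0, 1], so log() and powers are safe

    def first_interval() -> float:
        # Assumed Eqs. (4)-(5): V = [1 + (d - 1) e^d] U / d
        u = uniform()
        v = (1.0 + (d - 1.0) * math.exp(d)) * u / d
        if v >= 1.0:
            return max(0.0, -(A / d) * math.log(u * (d * v - 1.0) / (d * v - u)))
        return A * v ** (1.0 / (1.0 - d))

    def next_interval() -> float:
        # Assumed Eq. (7)
        u = uniform()
        if u >= math.exp(-d):
            return max(0.0, -(A / d) * math.log(u))
        return A * math.exp(-1.0) * u ** (-1.0 / d)

    clocks = [first_interval() for _ in range(M)]      # Step 1
    j = min(range(M), key=clocks.__getitem__)          # Step 2
    out = [clocks[j]]                                  # X_0
    now = clocks[j]                                    # Step 3
    while len(out) < n:
        clocks[j] += next_interval()                   # Step 4
        j = min(range(M), key=clocks.__getitem__)      # Step 5
        out.append(clocks[j] - now)
        now = clocks[j]                                # Step 6
    return out[:n]


if __name__ == "__main__":
    print(sfrp_intervals(5, M=10, d=1.4, A=3.8, seed=42))
```

A linear scan for the minimum clock is adequate for the small M values (4 to 10) discussed in the text; a heap would be the natural choice for much larger M.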


Using the previous steps, this method generates an approximate self-similar sequence $\{X_1, X_2, \ldots\}$. This method produces the most accurate result when the aggregation level M is between 4 and 10. It took 22 min 44 s to generate a sequence of 1,048,576 numbers (so, about $1362 \times 10^6$ inter-event times) on a Pentium II (233 MHz, 512 MB). The results were obtained assuming M = 10 and A = 3.8. The SFRP method requires O(n) computations to generate n numbers. For a more detailed discussion, see [12].

2.3. Method based on M/G/∞ processes

An M/G/∞ system is a queueing system in which a server is available immediately for every arriving customer, regardless of how many customers are already being served. Applying this process to generate LRD count sequences, we assume that new arrivals can enter the M/G/∞ system only at the beginning of a time slot of length $\Delta t$. We call this method MGIP. The method is based on the simulation of customers that arrive at an infinite-server queueing system according to a Poisson process with an arrival rate $\lambda$. It generates asymptotically self-similar sequences by counting the number of customers in the system, where the service time distribution G satisfies the heavy-tailed condition [16,22,33]. Cox [16] showed that an infinite-variance service time distribution results in an asymptotically self-similar process. Likhanov et al. [34] proposed a model for aggregate packet streams, based on combining sequences generated by several ON/OFF sources with a Pareto-distributed ON period. They showed that increasing the number of sources yields a limiting behaviour identical to the M/G/∞ input sequence with a Pareto distribution. To implement their findings we need to assume a given utilisation coefficient of the queueing system $\rho$, $0 < \rho < 1$, and a Pareto distribution of service times with finite mean and infinite variance, i.e., with the shape parameter $\alpha$, $1 < \alpha < 2$.
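A minimal Python sketch of the MGIP idea is given here for orientation. It assumes the parameterisation used for this method, namely arrival rate $\lambda = \rho(\alpha - 1)/\alpha$ and $\alpha = 3 - 2H$; the Pareto minimum of 1 (so that the mean service time is $\alpha/(\alpha - 1)$), the slot length default, the Knuth-style Poisson sampler and all function names are our assumptions rather than details taken from the paper.

```python
import heapq
import math
import random


def poisson(lam: float, rng) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mgip_counts(n: int, hurst: float, rho: float = 0.9, dt: float = 1.0, seed=None) -> list:
    """Number of customers in an infinite-server queue at the end of each slot."""
    if not 0.5 < hurst < 1.0 or not 0.0 < rho < 1.0:
        raise ValueError("this sketch assumes 0.5 < hurst < 1 and 0 < rho < 1")
    rng = random.Random(seed)
    alpha = 3.0 - 2.0 * hurst            # Pareto shape, 1 < alpha < 2
    lam = rho * (alpha - 1.0) / alpha    # arrival rate per unit time
    departures = []                      # min-heap of departure times
    counts = []
    for i in range(n):
        start = i * dt
        # New customers enter only at the beginning of the slot.
        for _ in range(poisson(lam * dt, rng)):
            service = rng.paretovariate(alpha)  # Pareto with minimum 1
            heapq.heappush(departures, start + service)
        end = start + dt
        while departures and departures[0] <= end:
            heapq.heappop(departures)           # customers who left during the slot
        counts.append(len(departures))          # X_i: customers still in service
    return counts


if __name__ == "__main__":
    print(mgip_counts(20, hurst=0.8, seed=3))
```

Keeping departure times in a heap makes each slot cost proportional to the number of arrivals and departures in it, so the total work grows linearly with the number of generated customers.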
The simulation will be advanced each time by $\Delta t$ seconds. Then, the MGIP method consists of the following steps:

Step 1. Given $\rho$, $\alpha$, $\Delta t$, i = 1.
Step 2. Simulate the performance of an M/G/∞ queueing system over $\Delta t$ seconds. This means: generate pseudo-random numbers representing the number of Poisson arrivals to the M/G/∞ queueing system within $\Delta t$ seconds, and pseudo-random numbers from the Pareto distribution representing the service times of these customers. Assume the arrival rate $\lambda = \rho(\alpha - 1)/\alpha$, where $\rho$ is the traffic intensity and $\alpha$ is the shape parameter of the Pareto distribution, and the service rate $(\alpha - 1)/\alpha$.
Step 3. Count the number of customers in the simulated M/G/∞ queueing system at the end of this time slot of length $\Delta t$. This is $X_i$, the ith number of the output LRD self-similar sequence in the scale $\Delta t$. LRD sequences in larger time scales, say s, can be obtained by counting the number of customers in the system at the end of each s time slots, i.e., by assuming a time lag equal to $\Delta t$.
Step 4. Set i = i + 1. Repeat Step 2 to Step 4, advancing the simulated time by the next $\Delta t$ seconds, while $i \le n$, where n is the number of sample points. Otherwise, stop.

A self-similar sequence $\{X_1, X_2, \ldots\}$ is obtained from these steps. We assume the MGIP method with input traffic intensity $\rho = 0.9$ and service rate $\mu = (\alpha - 1)/\alpha$, where $\alpha = 3 - 2H$, for H = 0.6, 0.7, 0.8 and 0.9, and time lag s = 4 to 14. Our results show that this method is most efficient at a time lag s between 4 and 8. Generation of an asymptotically self-similar sequence of 1,048,576 numbers with these time lags took 27 s on a Pentium II (233 MHz, 512 MB). O(n) computations are required to generate a self-similar sequence.

2.4. Method based on Pareto-modulated Poisson processes

This method is based on the fact that a Pareto-modulated Poisson process (PMPP), based on a switched Poisson process with two states, with sojourn times governed by an independent and identical Pareto distribution, asymptotically generates a self-similar sequence [17]. Fig. 1 shows a state diagram of the PMPP. The two states of the switched Poisson process can be viewed as intervals with the long and short burst rates of events. This process goes through consecutive cycles of being in State 1 and State 2. The time spent in each state is governed by a Pareto distribution characterised by $\alpha$, $1 < \alpha < 2$. These cycles have the mean length (ML) equal to

$$\mathrm{ML} = \frac{\alpha}{\alpha - 1} + \frac{\alpha}{\alpha - 1} = \frac{2\alpha}{\alpha - 1} \text{ time units.} \tag{8}$$

Fig. 1. State diagram of the Pareto-modulated Poisson process [17]. It is a two-state switched Poisson process (State 1 with rate $\lambda_1$, State 2 with rate $\lambda_2$) with the sojourn time in each state following an independent and identical Pareto distribution.

The mean numbers of Poisson events (MNPE) generated in state $S_1$ and state $S_2$ are

$$\text{MNPE in state } S_1 = \lambda_1 \frac{\alpha}{\alpha - 1}, \quad \text{and} \quad \text{MNPE in state } S_2 = \lambda_2 \frac{\alpha}{\alpha - 1}. \tag{9}$$

Thus, the mean number of Poisson events per cycle is given by

$$\lambda_1 \frac{\alpha}{\alpha - 1} + \lambda_2 \frac{\alpha}{\alpha - 1} = (\lambda_1 + \lambda_2)\frac{\alpha}{\alpha - 1}, \tag{10}$$

and using Eqs. (8) and (10), we get the mean number of Poisson events per time unit as

$$E = \frac{\lambda_1 + \lambda_2}{2}. \tag{11}$$
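The arithmetic behind Eqs. (8)–(11) can be checked with a few lines of Python. The function name `pmpp_cycle_stats` is ours; the only assumption beyond the equations is that the mean Pareto sojourn time per state is $\alpha/(\alpha - 1)$, as Eq. (8) implies.

```python
def pmpp_cycle_stats(lam1: float, lam2: float, alpha: float):
    """Mean cycle length, mean events per cycle and mean event rate of a PMPP."""
    if alpha <= 1.0:
        raise ValueError("the mean sojourn time is finite only for alpha > 1")
    sojourn = alpha / (alpha - 1.0)              # mean time spent in one state
    cycle_length = 2.0 * sojourn                 # Eq. (8)
    events_per_cycle = (lam1 + lam2) * sojourn   # Eqs. (9)-(10)
    rate = events_per_cycle / cycle_length       # Eq. (11): (lam1 + lam2) / 2
    return cycle_length, events_per_cycle, rate


if __name__ == "__main__":
    # Rates 100 and 120 give a mean of 110 events per time unit for any alpha in (1, 2).
    print(pmpp_cycle_stats(100.0, 120.0, 1.4))
```

The shape parameter cancels in the ratio, which is why the mean event rate in Eq. (11) depends only on the two Poisson rates.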

As mentioned, the PMPP can be used to generate asymptotically self-similar sequences. The quality of the generated sequences, in the sense of their closeness to exactly self-similar processes, depends on the size of the frames within which one counts the Poisson events occurring in the underlying PMPP; see Fig. 2. If frames have length T seconds, then the mean number of Poisson events occurring in a frame can be called the aggregation level of that method, given as

$$N = E \cdot T, \tag{12}$$

where E is the number of events per second. There should exist a minimum acceptable aggregation level $N_{\min}$ below which this method would not generate satisfactory self-similar sequences, but we leave this issue for further research. In our investigations we assume $\lambda_1 = 100$, $\lambda_2 = 120$, and $T \ge 100$. Thus, we assume $N_{\min} \ge 11{,}000$. The PMPP generator consists of the following steps. For given $\lambda_1$, $\lambda_2$, $\alpha$, T and n, let S be the time advance.

Step 1. Set i = 1, k = 1, S = 0, X i ¼ 0. Step 2. Generate the sojourn times sk and sk+ 1 for state Sk mod 2 and S(k+1) mod 2 of PMPP using the Pareto distribution with shape parameter a,1 < a < 2. Produce a sequence of Poisson arrivals within each state Sj with rate kj, j = 1, or 2. Calculate S = S + sk + sk+1 if S P iT then go to Step 3, otherwise assume k = k + 1 and repeat Step 2. A cycle State 1

State 2 * *

*

State 1

State 2 * *

State 1

* * * Time

X1 events (in Frame 1)

X2 events (in Frame 2)

Fig. 2. Graphical explanation of the concept of data aggregation in the generator based on PMPP. Note that d is an event in Poisson process with rate k1, and * is an event in Poisson process with rate k2. Xi is the number of Poisson events occurring in Frame i.

Hae-Duck J. Jeong et al. / Simulation Modelling Practice and Theory 15 (2007) 328–353

335

ð1Þ

Step 3. Count X i , the number of events that occur in the last frame of the length of T seconds. ð0Þ

ð1Þ

Xi ¼ Xi þ Xi :

ð13Þ ð0Þ X iþ1 ,

Count the number of events that occur in the remaining time interval (i T, S). This is the initial component of Xi+1. If i < n then assume i = i + 1, go to Step 2. Otherwise if i = n, then stop (the required number of pseudo-random self-similar numbers has been generated). We compare this generator with others in Section 3, assuming T = 300. Generating a sample sequence of 1,048,576 numbers took 7 min, 11 s on a Pentium II (233 MHz, 512 MB). The PMPP method requires O(n) computations to generate n numbers. For a more detailed discussion, see [17]. 2.5. Method based on spatial renewal processes and fractional gaussian noise The SRP-FGN generator is a hybrid method that uses a fractional Gaussian noise (FGN) generator based on the spatial renewal process (SRP) developed by Taralp et al. [18]. Before discussing SRP, we ﬁrst introduce the concept of sub-exponentiality. It means that the ACF of a stationary process decays not exponentially, but hyperbolically, for large lags [35]. For example, Jelenkovic´ [36] observed distinctive sub-exponential characteristics of MPEG video traﬃc in the functional behaviour of its scene length distributions. The SRP belongs to a class of sub-exponentially time-dependent stochastic processes. The SRP Z is composed of a chain of mutually independent renewal periods. For practical reasons, it is assumed that the SRP is a discrete time process and its ith period Ti has length ki, where ki is an integer, and the sample of Z during the period is represented by a sequence of ki numbers Y 1 ; Y 2 ; . .P . ; Y ki , governed by the normal distribution. The ki consecutive number of the output self-similar time series X i j¼1 Y j. To improve statistical properties of the output sequence (it ﬁt to normal distribution and the required correlation function), it has been proposed to aggregate a number of such sequences [18]. We investigated this suggestion experimentally by considering various levels of aggregation. 
Our results also show that the SRP-FGN method is most accurate if the level of aggregation is M = 10, supporting the advice of Taralp et al. [18], who wrote that the aggregation level need not be high (i.e., 10) to obtain accurate results. The aggregate output sequence $\{X_1, X_2, \ldots\}$ is computed by summing the sequences and normalising the sample variance to one. The SRP-FGN model has a normal marginal distribution, and is characterised by a sub-exponential ACF. Fig. 3 shows a block diagram of the SRP-FGN generator, which consists of the following steps:

Step 1. Given H, n, M, i = 0, m = 0, s = 0, l = 1, $X_0 = 0$.
Step 2. Generate a random length $k_i$ of the renewal cycle $T_i$ governed by the following cumulative probability distribution function $F_T(k)$:

$$F_T(k) = \begin{cases} 0, & 0 \le k < 1, \\ 1 - \dfrac{H\{(k+1)^{2H-1} - 2k^{2H-1} + (k-1)^{2H-1}\}}{2^{2H-1} - 2}, & 1 \le k. \end{cases} \tag{14}$$

Step 3. The SRP Z is composed of a chain of renewal periods where the ith period $T_i$ is $k_i$ in length. Generate $k_i$ random numbers $Y_1, Y_2, \ldots, Y_{k_i}$ governed by the normal distribution. Set $m = m + k_i$. If $m < M$, then $X_i = X_i + \sum_{j=1}^{k_i} Y_j$, set s = s + m, and go to Step 2. If $m \ge M$, then set m = m − M, and go to Step 4.
Step 4. The output value $X_i$ is computed as follows: If s = 0, then $X_i = \sum_{j=lM-M+1}^{lM} Y_j$, and set s = 0. If s > 0, then $X_i = X_i + \sum_{j=1}^{M-s} Y_j$, and set s = 0.
Step 5. Set i = i + 1, and $X_i = 0$. If i < n and m = 0, then go to Step 2. Otherwise,
- if $m < M$, then $X_i = \sum_{j=k_i-m+1}^{k_i} Y_j$, set s = m, l = 1, and go to Step 2;
- if $m \ge M$, then set m = m − M, s = 0, l = l + 1, and go to Step 4.
If i = n, where n is the number of sample points, then stop.

Fig. 3. Block diagram of the SRP-FGN method [18]. (Blocks: uniformly distributed random numbers on [0, 1] feed an inverse normal CDF, giving $Y_i$, and an inverse renewal CDF, giving $T_i$; the resulting independent FGN sequences Z are combined into the aggregate FGN sequence X.)

This generator produces approximately self-similar sequences $\{X_0, X_1, X_2, \ldots\}$. We obtained the points of the inverse renewal CDF using Eq. (14). In order to obtain more accurate results for the tail behaviour, we chose a number of intervals, T = 10,000, for the renewal CDF. The SRP-FGN renewal CDF $F_T(i)$ and complementary CDF are plotted in Figs. 4 and 5. $F_T(i)$ gradually has longer tails as the H value increases. The SRP-FGN method requires O(n) computations to generate a sample sequence $\{X_1, X_2, \ldots, X_n\}$. It took 26 s to generate a sequence of 1,048,576 numbers on a Pentium II (233 MHz, 512 MB).
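Drawing renewal lengths by inverting a tabulated renewal CDF, as described above, might look like the following Python sketch. It is hedged in two ways: the CDF is taken in the form $F_T(k) = 1 - H\{(k+1)^{2H-1} - 2k^{2H-1} + (k-1)^{2H-1}\}/(2^{2H-1} - 2)$ for $k \ge 1$, which is our reading of Eq. (14), and the function names, the restriction to $0.5 < H < 1$, and the truncation of the tail at the last tabulated point are our assumptions.

```python
import bisect
import random


def renewal_cdf(k: int, hurst: float) -> float:
    """Renewal-length CDF F_T(k), as we read Eq. (14); assumes 0.5 < hurst < 1."""
    if not 0.5 < hurst < 1.0:
        raise ValueError("this sketch assumes 0.5 < hurst < 1")
    if k < 1:
        return 0.0
    p = 2.0 * hurst - 1.0
    num = (k + 1) ** p - 2.0 * k ** p + (k - 1) ** p
    return 1.0 - hurst * num / (2.0 ** p - 2.0)


def make_length_sampler(hurst: float, n_points: int = 10_000, seed=None):
    """Build a sampler of renewal lengths from n_points tabulated CDF values."""
    rng = random.Random(seed)
    table = [renewal_cdf(k, hurst) for k in range(1, n_points + 1)]

    def sample() -> int:
        u = rng.random()
        # Smallest k with F_T(k) >= u; draws beyond the table are truncated.
        return min(bisect.bisect_left(table, u) + 1, n_points)

    return sample


if __name__ == "__main__":
    draw = make_length_sampler(0.8, seed=5)
    print([draw() for _ in range(10)])
```

With 10,000 tabulated points, each draw costs one binary search, so the table is built once per value of H and reused for the whole sequence.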

Fig. 4. Cumulative distribution function $F_T(i)$ of the SRP-FGN method for H = 0.6, 0.7, 0.8 and 0.9. (Axes: Time versus Renewal CDF.)

Fig. 5. Complementary cumulative distribution function of the SRP-FGN method for H = 0.6, 0.7, 0.8 and 0.9. (Axes: log10(Time) versus log10(Complementary Distribution).)


2.6. Method based on superposition of autoregressive processes The method based on the superposition of autoregressive processes (SAP) proposed by Granger [19] generates asymptotically self-similar sequences when aggregating several independent autoregressive processes. In the simplest case this can be the sum of two autoregressive processes of the ﬁrst order: z1i ¼ A1i z1;i1 þ y 1i ; z2i ¼ A2i z2;i1 þ y 2i ;

i ¼ 1; 2; . . .

ð15Þ

where A1i and A2i are randomly chosen from a beta distribution B(a1, a2) on [0, 1] with shape parameters a1 and a2, where a1 > 0, a2 > 0. y1i and y2i are a pair of IID sequences of random variables with a mean of zero and variance r2 = 1. As shown in [19], using the least-square ﬁtting it can be found that a2 = 7.7929 * log(H) + 4.9513. Thus, the Hurst parameter H is linearly dependent on the shape parameter a2 of the beta-distribution, while a1 can be selected arbitrary, for example, a1 = 1 in all cases that we investigated. The PDF f(x) of the beta distribution is given by ( a 1 x 1 ð1xÞa2 1 ; 0 < x < 1; bða1 ;a2 Þ f ðxÞ ¼ ð16Þ 0; otherwise; where b(a1, a2) is deﬁned by Z 1 Cða1 ÞCða2 Þ a 1 : bða1 ; a2 Þ ¼ xa1 1 ð1 xÞ 2 dx ¼ Cða1 þ a2 Þ 0 This method, as based on the superposition of the autoregressive processes, consists of the following steps: Given: a1, a2, i = 0. Step 1. Set i = i + 1. Determine z1i and z2i using Eq. (15). Step 2. Calculate the sum, X i ¼ z1i þ z2i ;

i ¼ 1; 2; . . . :

Step 3. Repeat Step 1 and Step 2 until i = n, where n is the number of sample points. Using the previous steps, the method based on the superposition of the autoregressive process generates an asymptotically self-similar sequence {X1, X2,. . .}. The CPU time required to generate 1,048,576 numbers was 35 s on a Pentium II (233 MHz, 512 MB). Unlike the other sequential generators, the SAP generator does not require an aggregation level to be assumed as an input parameter, but such input parameters as the shape parameter a1, instead. 3. Comparison of sequential generators All six sequential generators based on FBNDP, SFRP, MGIP, PMPP, SRP-FGN and SAP, generate approximately self-similar sequences. We investigate their properties in greater detail in this section. All have been implemented in C on a Pentium II (233 MHz, 512 MB) computer. The mean times required for generating sequences of a given length were obtained using the SunOS 5.7 time command and were averaged over 30 replications, each with sequences of 32,768 (215), 65,536 (216), 131,072 (217), 262,144 (218), 524,288 (219) and 1,048,576 (220) numbers. We have analysed the accuracy of the six methods. For each of H = 0.5, 0.6, 0.7, 0.8 and 0.9, sample sequences generated were analysed as follows. The FBNDP process was analysed with input M = 10, R = 200, and cutoﬀ parameter A = 9.92; the SFRP method required the following three parameters: M = 10, H = 0.6, 0.7, 0.8 and 0.9, and cutoﬀ parameter A = 3.8; the M/G/1 process (MGIP) with input trafﬁc intensity q = 0.9, service rate l = a/(a 1), where a = 3 2H, and time lag s = 8; the Pareto-modulated Poisson process (PMPP) with input T = 300, k1 = 100 and k2 = 120; the SRP-FGN methods with input


M = 10, and the interval number of the renewal CDF (T) = 10,000. The input of the superposition of the autoregressive process (SAP) with a beta distribution $B(a_1, a_2)$ on [0, 1] was B(1, 2.9), B(1, 8.1), B(1, 21.3), and B(1, 71.5).

3.1. Accuracy of generated sequences

A "good" estimator is not only one which produces an estimate whose expected value is close to the true parameter (low bias), but also one which has small variance. A comparative analysis of the most frequently used H estimation techniques, the wavelet-based H, Whittle's MLE, periodogram, R/S-statistic, variance-time and IDC(t) estimators, has been done [31]. The results have shown that the wavelet-based H estimator and Whittle's MLE are the least biased of the H estimation techniques. Thus, these least biased estimators were used when presenting the numerical results of a comparative analysis of the generated sequences. For each of H = 0.6, 0.7, 0.8 and 0.9, and for each of $a_2$ = 2.9, 8.1, 21.3 and 71.5, all results are averaged over 30 sequences. The relative inaccuracy $\Delta H$ was calculated using

$$ \Delta H = \frac{\hat{H} - H}{H} \times 100\%, \tag{17} $$

where H is the exact value assumed and $\hat{H}$ is its empirical mean value.

(a) Table 1 shows the results of the six sequential methods using the wavelet-based H estimator with the corresponding 95% confidence intervals $\hat{H} \pm 1.96\,\hat{\sigma}_{\hat{H}}$. For all input H and $a_2$ values, the SRP-FGN method produced sequences with the least biased H values compared with the other five methods.

For H = 0.6, 0.7 and 0.8, the absolute relative error for the FBNDP method was less than 5%, but for H = 0.9, it was greater than 5% (i.e., 5.5%). For H = 0.6, the estimated H value for the method was positively biased, but for H = 0.7, 0.8 and 0.9, the estimates became gradually more negatively biased as the H value increased.

For H = 0.6, 0.7, 0.8 and 0.9, the relative error for the SFRP method was +2.76%, +1.29%, −0.17% and −3.49%, respectively. As in the FBNDP method, estimated H values for the method ranged from positively biased to negatively biased as the H value increased.

A shortcoming of the MGIP method was that it generated approximately self-similar sequences with strongly biased H values. For H = 0.6, 0.7, 0.8 and 0.9, the relative error was −9.25%, −10.84%, +2.42% and +23.5%, respectively. Although these inter-event processes can be used to produce synthetic teletraffic with bursts appearing over a wider range of time scales than Poisson processes, the associated arrival processes do not appear to be self-similar when the aggregation level is low [37,38].

For H = 0.6, 0.7, 0.8 and 0.9, the values of the Hurst parameter from the sample sequences of the PMPP method were lower than the desired values. The relative error was −1.27%, −2.22%, −1.23% and −3.77%, respectively. Furthermore, this method required four control parameters (i.e., two Poisson arrival rates $\lambda_1$

Table 1. Mean values of estimated H using the wavelet-based H estimator for the six sequential generators for H = 0.6, 0.7, 0.8 and 0.9. The 95% confidence intervals for the means are given in parentheses.

| Method | H = 0.6: Ĥ (95% CI) | ΔH (%) | H = 0.7: Ĥ (95% CI) | ΔH (%) | H = 0.8: Ĥ (95% CI) | ΔH (%) | H = 0.9: Ĥ (95% CI) | ΔH (%) |
|---|---|---|---|---|---|---|---|---|
| FBNDP | .6086 (.581, .636) | +1.440 | .6875 (.660, .715) | −1.789 | .7827 (.755, .810) | −2.157 | .8502 (.823, .878) | −5.538 |
| SFRP | .6166 (.589, .644) | +2.759 | .7091 (.682, .737) | +1.296 | .7986 (.771, .826) | −0.174 | .8686 (.841, .896) | −3.491 |
| MGIP | .5445 (.517, .572) | −9.252 | .6241 (.597, .652) | −10.84 | .8194 (.792, .847) | +2.424 | 1.1120 (1.084, 1.139) | +23.50 |
| PMPP | .5924 (.565, .620) | −1.266 | .6845 (.657, .712) | −2.221 | .7902 (.763, .818) | −1.229 | .8661 (.839, .894) | −3.766 |
| SRP-FGN | .5942 (.567, .622) | −0.968 | .6960 (.669, .724) | −0.569 | .8056 (.778, .833) | +0.700 | .9031 (.876, .931) | +0.344 |
| SAP | .5989 (.595, .603) | −0.182 | .6852 (.680, .690) | −2.112 | .7845 (.781, .788) | −1.937 | .8971 (.894, .900) | −0.325 |


and $\lambda_2$, an aggregation number M, and the shape parameter $\alpha$), and generated a self-similar sequence that ranged from positively biased to negatively biased as the shape parameter $\alpha$ approached one.

For H = 0.6, 0.7, 0.8 and 0.9, the values of the Hurst parameter from the sample sequences of the SRP-FGN method match the required values well. The relative error was −0.97%, −0.57%, +0.70% and +0.34%, respectively.

For $a_2$ = 2.9, 8.1, 21.3 and 71.5, all values of the Hurst parameter from the sample sequences of the SAP method were lower than the required values. The relative error was −0.18%, −2.11%, −1.94% and −0.33%, respectively.

(b) Table 2 shows the results of the six sequential methods using Whittle's MLE with the corresponding 95% confidence intervals $\hat{H} \pm 1.96\,\hat{\sigma}_{\hat{H}}$. As for the results obtained from the wavelet-based H estimator, the SRP-FGN method produced sequences with the least biased H values compared with the other five methods.

For H = 0.6 and 0.7, the absolute relative error for the FBNDP method was less than 3%, while for H = 0.8 and 0.9, it was greater than 5% (i.e., 5.53% and 9.29%). For H = 0.6, the estimated H value for the method was positively biased; for H = 0.7, 0.8 and 0.9, the estimates gradually became more negatively biased as the H value increased.

For H = 0.6, 0.7, 0.8 and 0.9, the relative error for the SFRP method was +6.21%, +1.78%, −1.43% and −5.37%, respectively. As in the FBNDP method, estimated H values ranged from positively biased to negatively biased as the H value increased.

A shortcoming of the MGIP method was that it generated approximately self-similar sequences with biased H values for H = 0.7, similar to the results obtained from the wavelet-based H estimator. For H = 0.6, 0.7, 0.8 and 0.9, the relative error was −8.0%, −9.6%, −3.2% and +5.5%, respectively.

For H = 0.8 and 0.9, the values of the Hurst parameter from the sample sequences of the PMPP method were lower than the required values, but for H = 0.6 and 0.7, they were higher. The relative error for H = 0.6, 0.7, 0.8 and 0.9 was +7.18%, +2.24%, −1.28% and −5.59%, respectively. This method generated a self-similar sequence that ranged from positively biased to negatively biased as the shape parameter $\alpha$ approached one.

For H = 0.6, 0.7, 0.8 and 0.9, the Hurst parameter values from the sample sequences of the SRP-FGN method match the required values well. The relative error was +0.11%, +1.91%, +2.54% and +2.53%, respectively, so H was consistently overestimated.

For $a_2$ = 2.9 and 8.1, all values of the Hurst parameter from the sample sequences of the SAP method were higher than the required values. The relative error was +24.14% and +10.45%, respectively; thus, these results were overestimated. For $a_2$ = 21.3 and 71.5, the relative errors were −0.01% and −8.45%.

Our results show that all six sequential generators produced approximately self-similar sequences, but that the relative inaccuracy ($|\Delta H|$) increased as H increased.

Table 2. Mean values of estimated H using Whittle's MLE for the six sequential generators for H = 0.6, 0.7, 0.8 and 0.9. The 95% confidence intervals for the means are given in parentheses.

| Method | H = 0.6: Ĥ (95% CI) | ΔH (%) | H = 0.7: Ĥ (95% CI) | ΔH (%) | H = 0.8: Ĥ (95% CI) | ΔH (%) | H = 0.9: Ĥ (95% CI) | ΔH (%) |
|---|---|---|---|---|---|---|---|---|
| FBNDP | .6122 (.603, .622) | +2.028 | .6828 (.674, .692) | −2.452 | .7557 (.747, .765) | −5.531 | .8164 (.807, .826) | −9.291 |
| SFRP | .6372 (.628, .647) | +6.205 | .7124 (.703, .722) | +1.777 | .7886 (.779, .798) | −1.426 | .8517 (.843, .861) | −5.366 |
| MGIP | .5520 (.542, .562) | −7.995 | .6325 (.623, .642) | −9.641 | .7742 (.765, .783) | −3.225 | .9499 (.941, .959) | +5.549 |
| PMPP | .6431 (.634, .652) | +7.176 | .7157 (.706, .725) | +2.236 | .7898 (.781, .799) | −1.281 | .8497 (.841, .859) | −5.586 |
| SRP-FGN | .6007 (.591, .610) | +0.110 | .7133 (.704, .723) | +1.905 | .8203 (.811, .829) | +2.539 | .9227 (.914, .932) | +2.526 |
| SAP | .7448 (.736, .754) | +24.14 | .7731 (.764, .782) | +10.45 | .7999 (.791, .809) | −0.009 | .8239 (.815, .833) | −8.451 |
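The accuracy measures reported in Tables 1 and 2 can be reproduced from per-sequence estimates of H with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the standard error in the confidence interval is the sample standard deviation of the estimates divided by the square root of the number of replications, which the text does not state explicitly.

```python
import math
import statistics


def relative_inaccuracy(h_hat_mean, h_true):
    """Relative inaccuracy of Eq. (17), in percent."""
    return (h_hat_mean - h_true) / h_true * 100.0


def mean_with_ci(estimates, z=1.96):
    """Mean of the H estimates and a z-based confidence interval.

    Assumption of this sketch: the standard error is the sample standard
    deviation divided by sqrt(number of replications).
    """
    mean = statistics.fmean(estimates)
    std_err = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, (mean - z * std_err, mean + z * std_err)


# Example: the SRP-FGN entry of Table 1 for H = 0.9 (mean estimate 0.9031).
delta_h = relative_inaccuracy(0.9031, 0.9)
```

For the SRP-FGN entry at H = 0.9, the computed value agrees with the +0.344% listed in Table 1 to three decimal places.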


3.2. Complexity and speed of generation

Performance analysis of a generator can be classified into two types: time complexity and space complexity. Time complexity refers to a function describing how much time it will take the generator to execute, based on the parameters of its input. The exact value of this function is usually ignored in favour of its order, expressed in the so-called Big-O notation (this is based on the limit of the time complexity function as the values of its parameters increase without bound). Space complexity is the way in which the amount of storage space required by the generator varies with the size of the problem it is solving. The space complexity is normally expressed as an order of magnitude, e.g. $O(n^2)$ means that if the size of the problem (n) doubles then four times as much working storage will be needed. Only time complexity was considered in this paper.

The computational complexities of all six sequential generators of pseudo-random self-similar sequences of a given length n are O(n). However, the number of arithmetic operations per number required by each of them is different, as shown in Table 3. The MGIP, SAP and SRP-FGN methods are the fastest of the six sequential generators, requiring 4976, 4481 and 4173 operations per number, respectively. The SFRP method was the slowest (85,199 operations per number), while the FBNDP and PMPP methods fall in between, requiring 11,028 and 19,507 arithmetic operations per number, respectively. Fig. 6 shows the experimental mean running times of the six sequential generators. All require O(n) computations to generate n numbers. In summary, our results show that the generator based on the SRP-FGN algorithm is the fastest if long sequences of self-similar pseudo-random numbers are required.
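As a small worked example, some of the per-number operation counts quoted above can be recovered from the linear cost formulas listed in Table 3. The sketch below assumes that the "optimum values" are the parameter settings given in Section 3 (aggregation level M = 10 and time lag 8 for the M/G/∞ process); this is our reading, not something the table states, and the helper name is hypothetical.

```python
def ops_per_number(slope, intercept, parameter):
    """Evaluate a linear cost formula of the form slope * parameter + intercept."""
    return slope * parameter + intercept


# Cost formulas from Table 3, evaluated at the Section 3 settings (assumed).
fbndp_ops = ops_per_number(161, 9418, 10)    # 161M + 9418 with M = 10
sfrp_ops = ops_per_number(1378, 71419, 10)   # 1378M + 71,419 with M = 10
mgip_ops = ops_per_number(130, 3936, 8)      # 130*tau + 3936 with tau = 8

# Ratio of the slowest (SFRP) to the fastest (SRP-FGN, 4173 operations per number).
slowdown = sfrp_ops / 4173
```

Under these assumptions the three formulas give 11,028, 85,199 and 4976 operations per number, matching the values in the text, and the SFRP generator needs roughly 20 times as many operations per number as SRP-FGN.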
Table 3. Computational complexities and mean numbers of arithmetic operations required for each of the six sequential generators ($\tau$: the time lag in the M/G/∞ queueing system, M: aggregation level, T: the number of intervals in the renewal CDF).

| Method | Complexity | Operations per number | Mean operations per number when assuming optimum values |
|---|---|---|---|
| FBNDP | O(n) | 161M + 9418 | 11,028 |
| SFRP | O(n) | 1378M + 71,419 | 85,199 |
| MGIP | O(n) | 130τ + 3936 | 4976 |
| PMPP | O(n) | 130M + 4037 | 19,507 |
| SRP-FGN | O(n) | 14T + 4159 | 4173 |
| SAP | O(n) | 4481 | 4481 |

[Figure 6: log2(mean running time in seconds) plotted against log2(sequence length) for the FBNDP, SFRP, MGIP, PMPP, SRP-FGN and SAP generators.]

Fig. 6. Mean running times of the six sequential generators. Running times were obtained using the SunOS 5.7 time command on a Pentium II (233 MHz, 512 MB); each mean is averaged over 30 iterations, each with sequences of 32,768 ($2^{15}$), 65,536 ($2^{16}$), 131,072 ($2^{17}$), 262,144 ($2^{18}$), 524,288 ($2^{19}$) and 1,048,576 ($2^{20}$) numbers.

The MGIP and SAP methods are nearly as


fast. The SRP-FGN generator is also the most efficient when evaluated by Whittle's MLE. Therefore, the SRP-FGN generator was compared with the most efficient fixed-length sequence generator, which we discuss in Section 6. However, as pointed out in [39], there are general pitfalls in using generators of LRD processes based on FRPs, since the generated processes fail to capture fully the required autocorrelation structure. For a more detailed discussion, see [39].

4. Fixed-length sequence generators

The numerical results for the following fixed-length sequence generators are given in Section 5.

4.1. Fast Fourier transform method

The fast Fourier transform method generates approximately self-similar sequences based on the fast Fourier transform (FFT) and the fractional Gaussian noise (FGN) process. Its main weakness is in the accuracy of the power spectrum calculation, which involves an infinite summation. Paxson [20] has proposed a solution to this problem by applying a finite approximation. Another possible method to generate self-similar sequences is to run the FFT of white noise through the power spectrum, and then apply the inverse FFT. An overview of the FFT method is given as follows. For a more detailed discussion, see [20,30,40].

Step 1. Given: H. For i = 1, ..., n/2, calculate a sequence of values $\{f_1, \ldots, f_{n/2}\}$, where $f_i = \hat{f}(2\pi i/n; H)$, corresponding to the power spectrum of an FGN process for frequencies from $2\pi/n$ to $\pi$, with 1/2 < H < 1 and n the total length of the sequence generated. For an FGN process, the power spectrum $f(\lambda, H)$ is defined by Eqs. (18) and (19):

$$ f(\lambda; H) = 2 c_f \,(1 - \cos\lambda)\, B(\lambda, H) \tag{18} $$

with 0 < H < 1 and $-\pi \le \lambda \le \pi$, where

$$ c_f = \sigma^2 (2\pi)^{-1} \sin(\pi H)\,\Gamma(2H+1), \qquad B(\lambda, H) = \sum_{k=-\infty}^{\infty} |2\pi k + \lambda|^{-2H-1}, \tag{19} $$

$\sigma^2 = \mathrm{Var}[X_k]$ and $\Gamma(\cdot)$ is the gamma function; see [41]. As mentioned, the infinite summation in Eq. (18) for $B(\lambda, H)$ poses the main difficulty in computing the power spectrum exactly. Paxson [20] proposed to use the approximation given by Eq. (20):

$$ B(\lambda, H) \approx a_1^{d} + b_1^{d} + a_2^{d} + b_2^{d} + a_3^{d} + b_3^{d} + \frac{a_3^{d'} + b_3^{d'} + a_4^{d'} + b_4^{d'}}{8H\pi}, \tag{20} $$

where $d = -2H - 1$, $d' = -2H$, $a_i = 2i\pi + \lambda$ and $b_i = 2i\pi - \lambda$.

Step 2. Multiply each value in the sequence $\{f_1, \ldots, f_{n/2}\}$ by an exponential random variable with a mean of one to obtain $\{\hat{f}_i\}$. Paxson [20] used this step since, when estimating the power spectrum of a process using the periodogram, the power spectrum estimated for a given frequency is distributed asymptotically as an exponential random variable with a mean equal to the actual power (see also Beran [42, p. 409]).

Step 3. Generate $\{Z_1, \ldots, Z_{n/2}\}$, a sequence of complex values such that $|Z_i| = \sqrt{\hat{f}_i}$ and the phase of $Z_i$ is uniformly distributed between 0 and $2\pi$. This random phase technique, taken from Schiff [43], preserves the spectral density corresponding to $\{\hat{f}_i\}$. It also makes the marginal distribution of the final sequence normal, as proved by Lindeberg [44, p. 256], and defines the requirements for FGN.

Step 4. For i = 0, ..., n − 1, construct $\{Z'_0, \ldots, Z'_{n-1}\}$, an expanded version of $\{Z_1, \ldots, Z_{n/2}\}$:

$$ Z'_i = \begin{cases} 0, & \text{if } i = 0,\\ Z_i, & \text{if } 0 < i \le n/2,\\ \bar{Z}_{n-i}, & \text{if } n/2 < i < n, \end{cases} \tag{21} $$


where $\bar{Z}_{n-i}$ denotes the complex conjugate of $Z_{n-i}$. $\{Z'_i\}$ retains the properties of the power spectrum used to construct $\{Z_i\}$, but because $\{Z'_i\}$ is symmetric about $Z'_{n/2}$, it now corresponds to the FFT of a real-valued signal.

Step 5. Calculate the inverse FFT of $\{Z'_i\}$ to obtain the approximate FGN sequence $\{X_i\}$ with a mean of zero and variance of one. Then form the final sequence X1, X2, ..., Xn by assigning $X_i \leftarrow Z'_i$, i = 1, 2, ..., n.

This method generates an approximately self-similar sequence {X1, X2, ..., Xn} with the exact values. Generating a sample sequence of 1,048,576 numbers took 33 s on a Pentium II (233 MHz, 512 MB). The FFT method requires O(n log n) computations to generate n numbers because of the Fourier transform algorithm [20,45]. The Danielson-Lanczos FFT algorithm (see footnote 1) was used [46]. For a more detailed discussion, see [20].

4.2. Fractional-autoregressive integrated moving average method

Hosking [21,47] states that the F-ARIMA method (see footnote 2) is used to generate an approximately self-similar process with a Hurst parameter of $H = d + \frac{1}{2}$. We used the F-ARIMA(0, d, 0) method for generating self-similar sequences, where d is the fractional differencing parameter, $0 < d < \frac{1}{2}$. Hosking's algorithm is used to generate the process $X = \{X_i : i = 0, 1, 2, \ldots, n\}$ with a normal marginal distribution, a mean of zero and variance $\sigma_0^2$, and an autocorrelation function (ACF) $\{\rho_k\}$ ($k = 0, \pm 1, \ldots$) defined as

$$ \rho_k = \gamma_k/\gamma_0 = \frac{\Gamma(1-d)\,\Gamma(k+d)}{\Gamma(d)\,\Gamma(k+1-d)}, \quad \text{where} \quad \gamma_k = \sigma_0^2\, \frac{(-1)^k\,\Gamma(1-2d)}{\Gamma(k-d+1)\,\Gamma(1-k-d)}; \tag{22} $$

see [47] and page 63 of [41].

Step 0. Set $N_0 = 0$ and $D_0 = 1$. $X_0$, the first pseudo-random element in the output self-similar sequence, is generated from the normal distribution $N(0, \sigma_0^2)$, where $\sigma_0^2$ is the required variance of the $X_i$.

Step i (i = 1, ..., n − 1). Compute $\mathrm{mean}_i$ and $\mathrm{var}_i$ of $X_i$ recursively, using the following equations:

$$ N_i = \rho_i - \sum_{j=1}^{i-1} \phi_{i-1,j}\,\rho_{i-j}, \qquad D_i = D_{i-1} - N_{i-1}^2/D_{i-1}, \qquad \phi_{ii} = N_i/D_i, $$

$$ \phi_{ij} = \phi_{i-1,j} - \phi_{ii}\,\phi_{i-1,i-j}, \quad j = 1, \ldots, i-1, $$

where $\phi_{ij}$, $i = 0$, $j = 0, \ldots, n-1$, is given by

$$ \phi_{ij} = -\binom{i}{j}\,\frac{(j-d-1)!\,(i-d-j)!}{(-d-1)!\,(i-d)!}, $$

and

$$ \mathrm{mean}_i = \sum_{j=1}^{i} \phi_{ij}\,X_{i-j}, \qquad \mathrm{var}_i = (1 - \phi_{ii}^2)\,\mathrm{var}_{i-1}. $$

Generate $X_i$ from $N(\mathrm{mean}_i, \mathrm{var}_i)$. Increase i by 1. If i = n, then stop.

A self-similar sequence {X1, X2, ..., Xn} is obtained in n steps. The F-ARIMA method is too computationally intensive to generate long sample sequences. Generation of an F-ARIMA traffic sample sequence with 1,048,576 numbers took 41 h, 0 min and 22 s on a Pentium II (233 MHz, 512 MB). This method requires $O(n^2)$ computation time.

4.3. Random midpoint displacement method

The random midpoint displacement (RMD) method generates an approximately self-similar sequence in the time interval [0, T]. The RMD algorithm is an approximate fractional Brownian motion (FBM) generation method. The basic concept of the RMD algorithm is to interpolate the interval [0, T] recursively and calculate the values of the process at the midpoints from the values at the endpoints.
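Before moving on to the RMD steps, the Hosking recursion of Section 4.2 (Step 0 and Step i) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' C implementation: the function names are hypothetical, the ACF of Eq. (22) is evaluated through the ratio $\rho_k/\rho_{k-1} = (k-1+d)/(k-d)$ (which follows from the gamma-function form) instead of calling the gamma function directly, and the coefficients $\phi_{ij}$ are obtained from the recursion rather than from the closed form.

```python
import math
import random


def farima_acf(d, n):
    """ACF rho_0..rho_{n-1} of F-ARIMA(0, d, 0), via rho_k = rho_{k-1} (k-1+d)/(k-d)."""
    rho = [1.0] * n
    for k in range(1, n):
        rho[k] = rho[k - 1] * (k - 1 + d) / (k - d)
    return rho


def hosking_farima(n, hurst, sigma0=1.0, seed=None):
    """Generate n values of F-ARIMA(0, d, 0) with d = H - 1/2 (O(n^2) time)."""
    d = hurst - 0.5
    rng = random.Random(seed)
    rho = farima_acf(d, n)
    x = [rng.gauss(0.0, sigma0)]          # Step 0: X_0 ~ N(0, sigma0^2)
    phi = []                              # phi[j-1] holds phi_{i-1,j}
    var = sigma0 ** 2
    num_prev, den_prev = 0.0, 1.0         # N_0 = 0, D_0 = 1
    for i in range(1, n):                 # Step i
        num = rho[i] - sum(phi[j - 1] * rho[i - j] for j in range(1, i))
        den = den_prev - num_prev ** 2 / den_prev
        phi_ii = num / den
        phi = [phi[j - 1] - phi_ii * phi[i - j - 1] for j in range(1, i)] + [phi_ii]
        mean = sum(phi[j - 1] * x[i - j] for j in range(1, i + 1))
        var *= 1.0 - phi_ii ** 2
        x.append(rng.gauss(mean, math.sqrt(var)))
        num_prev, den_prev = num, den
    return x


sample = hosking_farima(512, hurst=0.8, seed=1)
```

The two nested sums over j at every step are what make the method quadratic in n, which is consistent with the very long running time reported above for 1,048,576 numbers.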

Footnote 1. Available at http://ita.ee.lbl.gov/.

Footnote 2. The autocorrelation functions of two other simple processes, F-ARIMA(1, d, 0) and F-ARIMA(0, d, 1), behave similarly at high lags, but the F-ARIMA(0, d, 1) autocorrelation function drops more sharply between lags 1 and 2. For a more detailed discussion, see [47].


Fig. 7 illustrates the first three steps of the process. This method leads to the generation of the sequence $(d_{3,1}, d_{3,2}, d_{3,3}, d_{3,4})$. The interval between 0 and 1 is subdivided to construct the increments governed by a normal distribution. Adding offsets to the midpoints makes the marginal distribution of the final result normal. For more detailed discussions of the RMD method, see [28,30,48].

Step 1. If the process Y is to be computed for any time instance t between 0 and 1, then begin by setting $Y_0 = 0$ and selecting $Y_1$ as a pseudo-random number from a normal distribution with mean 0 and variance $\mathrm{Var}[Y_1] = \sigma_0^2$. Then $\mathrm{Var}[Y_1 - Y_0] = \sigma_0^2$.

Step 2. Next, $Y_{1/2}$ is constructed as the average of $Y_0$ and $Y_1$ plus an offset, that is, $Y_{1/2} = \frac{1}{2}(Y_0 + Y_1) + d_1$. The offset $d_1$ is a normal random number (NRN), which is multiplied by a scaling factor $\frac{1}{2}$, with mean 0 and variance $S_1^2$. Compare the visualisation of this step and the next one in Fig. 7. For $\mathrm{Var}[Y_{t_2} - Y_{t_1}] = |t_2 - t_1|^{2H}\sigma_0^2$ to be true for $0 \le t_1 \le t_2 \le 1$, we require

$$ \mathrm{Var}[Y_{1/2} - Y_0] = \tfrac{1}{4}\,\mathrm{Var}[Y_1 - Y_0] + S_1^2, \qquad \left(\tfrac{1}{2}\right)^{2H}\sigma_0^2 = \tfrac{1}{4}\,\sigma_0^2 + S_1^2. $$

Thus $S_1^2 = \left(\frac{1}{2^1}\right)^{2H}(1 - 2^{2H-2})\,\sigma_0^2$.

Step 3. Reduce the scaling factor by $\sqrt{2}$; that is, now assume $\frac{1}{\sqrt{8}}$, and divide the two intervals from 0 to $\frac{1}{2}$ and from $\frac{1}{2}$ to 1 again. $Y_{1/4}$ is set as the average $\frac{1}{2}(Y_0 + Y_{1/2})$ plus an offset $d_{2,1}$, which is an NRN multiplied by the current scaling factor $\frac{1}{\sqrt{8}}$. The corresponding formula holds for $Y_{3/4}$, that is,

$$ Y_{3/4} = \tfrac{1}{2}(Y_{1/2} + Y_1) + d_{2,2}, $$

where $d_{2,2}$ is a random offset computed as before. Therefore, the variance $S_2^2$ of $d_{2,*}$ must be chosen such that

$$ \mathrm{Var}[Y_{1/4} - Y_0] = \tfrac{1}{4}\,\mathrm{Var}[Y_{1/2} - Y_0] + S_2^2, \qquad \left(\tfrac{1}{2^2}\right)^{2H}\sigma_0^2 = \tfrac{1}{4}\left(\tfrac{1}{2}\right)^{2H}\sigma_0^2 + S_2^2. $$

Thus $S_2^2 = \left(\frac{1}{2^2}\right)^{2H}(1 - 2^{2H-2})\,\sigma_0^2$.

Step 4. Proceed in the same manner: reduce the scaling factor by $\sqrt{2}$, that is, scale by $\frac{1}{\sqrt{16}}$. Then set

$$ Y_{1/8} = \tfrac{1}{2}(Y_0 + Y_{1/4}) + d_{3,1}, \qquad Y_{3/8} = \tfrac{1}{2}(Y_{1/4} + Y_{1/2}) + d_{3,2}, $$

$$ Y_{5/8} = \tfrac{1}{2}(Y_{1/2} + Y_{3/4}) + d_{3,3}, \qquad Y_{7/8} = \tfrac{1}{2}(Y_{3/4} + Y_1) + d_{3,4}. $$

In each formula, $d_{3,*}$ is computed as a different NRN multiplied by the current scaling factor $\frac{1}{\sqrt{16}}$. The following step computes Y at $t = \frac{1}{16}, \frac{3}{16}, \ldots, \frac{15}{16}$ using a scaling factor again reduced by $\sqrt{2}$, and continues as indicated above. The variance $S_3^2$ of $d_{3,*}$ is chosen such that

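[Figure 7: sketch of Y against t on [0, 1], with ticks at 0.25, 0.50, 0.75 and 1.00, showing the offsets $d_1$, $d_{2,1}$, $d_{2,2}$ and $d_{3,1}$ to $d_{3,4}$.]

Fig. 7. The first three stages in the RMD method.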


$$ \mathrm{Var}[Y_{1/8} - Y_0] = \tfrac{1}{4}\,\mathrm{Var}[Y_{1/4} - Y_0] + S_3^2, \qquad \left(\tfrac{1}{2^3}\right)^{2H}\sigma_0^2 = \tfrac{1}{4}\left(\tfrac{1}{2^2}\right)^{2H}\sigma_0^2 + S_3^2, $$

that is, $S_3^2 = \left(\frac{1}{2^3}\right)^{2H}(1 - 2^{2H-2})\,\sigma_0^2$. The variance $S_n^2$ of $d_{n,*}$ is therefore $\left(\frac{1}{2^n}\right)^{2H}(1 - 2^{2H-2})\,\sigma_0^2$.

Step 5. Calculate the values at the midpoints in the same manner until the given n is equal to $2^{\text{No. of steps}}$. Then form the final sequence X0, X1, X2, ... by assigning $X_i \leftarrow Y_{i/2^{\text{No. of steps}}}$, i = 0, 1, 2, ....

A self-similar sequence {X0, X1, ..., Xn} is obtained from the previous steps. Generation of an approximately self-similar sequence with 1,048,576 numbers took 17 s on a Pentium II (233 MHz, 512 MB). The theoretical algorithmic complexity is O(n) [49].

4.4. Successive random addition method

An alternative method for the direct generation of an FBM process is based on the successive random addition (SRA) algorithm [29,31]. The SRA method uses the midpoints as the RMD method does, but adds a displacement of a suitable variance to all the points [49]. Adding offsets to all points should make the resultant sequence self-similar and of normal distribution [49]. The SRA method consists of the following steps:

Step 1. If the process Y is to be computed for time instances t between 0 and 1, then begin by setting $Y_0 = 0$ and selecting $Y_1$ as a pseudo-random number from a normal distribution with mean 0 and variance $\mathrm{Var}[Y_1] = \sigma_0^2$. Then $\mathrm{Var}[Y_1 - Y_0] = \sigma_0^2$.

Step 2. Next, $Y_{1/2}$ is constructed by the interpolation of the midpoint, that is, $Y_{1/2} = \frac{1}{2}(Y_0 + Y_1)$.

Step 3. Add a displacement of a suitable variance to all points, i.e., $Y_0 = Y_0 + d_{1,1}$, $Y_{1/2} = Y_{1/2} + d_{1,2}$, $Y_1 = Y_1 + d_{1,3}$. The offsets $d_{1,*}$ are normal random numbers. For $\mathrm{Var}[Y_{t_2} - Y_{t_1}] = |t_2 - t_1|^{2H}\sigma_0^2$ to be true for any $t_1$, $t_2$ with $0 \le t_1 \le t_2 \le 1$, it is required that

$$ \mathrm{Var}[Y_{1/2} - Y_0] = \tfrac{1}{4}\,\mathrm{Var}[Y_1 - Y_0] + 2S_1^2, \qquad \left(\tfrac{1}{2}\right)^{2H}\sigma_0^2 = \tfrac{1}{4}\,\sigma_0^2 + 2S_1^2, $$

that is, $S_1^2 = \frac{1}{2}\left(\frac{1}{2^1}\right)^{2H}(1 - 2^{2H-2})\,\sigma_0^2$.

Step 4. Step 2 and Step 3 are repeated. Therefore, $S_n^2 = \frac{1}{2}\left(\frac{1}{2^n}\right)^{2H}(1 - 2^{2H-2})\,\sigma_0^2$, where $\sigma_0^2$ is the initial variance and 0 < H < 1.

Step 5. Calculate the values at the midpoints as noted previously until the given n is equal to $2^{\text{No. of steps}}$. Then form the final sequence X0, X1, X2, ... by assigning $X_i \leftarrow Y_{i/2^{\text{No. of steps}}}$, i = 0, 1, 2, ....
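The RMD steps of Section 4.3 and the SRA steps above differ only in where the random offsets are added and in a factor of one half in the offset variance, so both can be sketched with one routine. The code below is an illustration under stated assumptions, not the authors' implementation: the function and parameter names are hypothetical, the offsets are drawn directly with variance $S_k^2$ at level k, and the routine returns the sampled path $Y_0, \ldots, Y_n$ on a grid of $n = 2^{\text{levels}}$ intervals.

```python
import math
import random


def offset_variance(level, hurst, sigma0=1.0, method="rmd"):
    """Variance S_k^2 of the offsets at refinement level k (halved for SRA)."""
    var = (0.5 ** level) ** (2.0 * hurst) * (1.0 - 2.0 ** (2.0 * hurst - 2.0)) * sigma0 ** 2
    return 0.5 * var if method == "sra" else var


def midpoint_fbm(levels, hurst, sigma0=1.0, method="rmd", seed=None):
    """Approximate FBM path on [0, 1] with 2**levels intervals, by RMD or SRA."""
    if method not in ("rmd", "sra"):
        raise ValueError("method must be 'rmd' or 'sra'")
    rng = random.Random(seed)
    n = 2 ** levels
    y = [0.0] * (n + 1)
    y[n] = rng.gauss(0.0, sigma0)             # Step 1: Y_0 = 0, Y_1 ~ N(0, sigma0^2)
    step = n
    for level in range(1, levels + 1):
        half = step // 2
        s = math.sqrt(offset_variance(level, hurst, sigma0, method))
        for i in range(half, n, step):        # new midpoints at this level
            y[i] = 0.5 * (y[i - half] + y[i + half])
            if method == "rmd":
                y[i] += rng.gauss(0.0, s)     # RMD: displace the midpoints only
        if method == "sra":
            for i in range(0, n + 1, half):   # SRA: displace every point on the grid
                y[i] += rng.gauss(0.0, s)
        step = half
    return y


rmd_path = midpoint_fbm(10, hurst=0.8, method="rmd", seed=1)
sra_path = midpoint_fbm(10, hurst=0.8, method="sra", seed=1)
```

Each level touches every grid point a bounded number of times and the grid sizes form a geometric series, which is one way to see why the total work is linear in n.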

Using these steps, the SRA method generates an approximately self-similar sequence {X0, X1, ..., Xn}. It took 15 s to generate a sequence of 1,048,576 numbers on a Pentium II (233 MHz, 512 MB). The theoretical algorithmic complexity is O(n) [49].

4.5. Fractional Gaussian noise and Daubechies wavelets method

We present a new generator of pseudo-random self-similar sequences based on fractional Gaussian noise (FGN) and Daubechies wavelets (DW), called the FGN-DW method [26,27]. A pseudo-random generator of self-similar teletraffic based on Haar wavelet transforms has been proposed in [50–52]. We used Daubechies wavelets because the generator based on Daubechies wavelets produces more accurate self-similar sequences than one based on Haar wavelets. In other words, not only are the estimates of H obtained from the Daubechies wavelets closer to the true values than those from the Haar wavelets, but the variances obtained from the Daubechies wavelets are also lower. The reason is that the Daubechies wavelets produce smoother wavelet coefficients for use in the discrete wavelet transform than the Haar wavelets [53–55]. Haar wavelets are discontinuous, and they do not have good time–frequency localisation properties, since their Fourier transforms decay as $|\lambda|^{-1}$ for $\lambda \to \infty$, meaning that the resulting decomposition has a poor scale. Therefore, Daubechies wavelets produce more accurate coefficients than Haar wavelets; for a more detailed discussion, see [53,55].
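One concrete way to see the difference in smoothness between the two wavelet families is to compare their filter coefficients. The sketch below is illustrative only: the text does not state which Daubechies order was used, so the four-tap Daubechies filter is assumed here, and the helper names are hypothetical. It counts the vanishing moments of the high-pass (wavelet) filter, a standard indicator of wavelet regularity.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT3 = math.sqrt(3.0)

# Orthonormal low-pass (scaling) filters.
HAAR = [1.0 / SQRT2, 1.0 / SQRT2]
DB4 = [(1.0 + SQRT3) / (4.0 * SQRT2), (3.0 + SQRT3) / (4.0 * SQRT2),
       (3.0 - SQRT3) / (4.0 * SQRT2), (1.0 - SQRT3) / (4.0 * SQRT2)]


def highpass(lowpass):
    """Quadrature mirror filter: g[k] = (-1)^k * h[L - 1 - k]."""
    length = len(lowpass)
    return [(-1) ** k * lowpass[length - 1 - k] for k in range(length)]


def vanishing_moments(lowpass, tol=1e-12):
    """Number of leading moments sum_k k^p g[k] (p = 0, 1, ...) that are zero."""
    g = highpass(lowpass)
    count = 0
    for p in range(len(g)):
        moment = sum((k ** p) * gk for k, gk in enumerate(g))
        if abs(moment) > tol:
            break
        count += 1
    return count


haar_moments = vanishing_moments(HAAR)
db4_moments = vanishing_moments(DB4)
```

Under this assumption the Haar filter has one vanishing moment and the four-tap Daubechies filter has two, while both filters remain orthonormal (unit energy, coefficients summing to the square root of two).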


Our method for generating synthetic self-similar FGN sequences in a time domain is based on a discrete wavelet transform (DWT). Wavelets can provide compact representations for a class of FGN processes [56,57,54], because the structure of wavelets naturally matches the self-similar structure of long-range dependent processes [53,55,58]. We claim that the FGN-DW method is sufficiently fast for the practical generation of synthetic self-similar sequences that can be used as simulation input data. The general strategy behind our method is similar to Paxson's, who used the Fourier transform [20]. Fig. 8 graphically illustrates a discrete Fourier and a discrete wavelet transform. Wavelet analysis transforms a sequence onto a time-scale grid, where the term scale is used instead of frequency, because the mapping is not directly related to frequency as in the Fourier transform. The wavelet transform delivers good resolution in both time and scale, as compared to the Fourier transform, which provides only good frequency resolution. The algorithm consists of the following steps:

Step 1. Given: H. For i = 1, ..., n, calculate a sequence of values $\{f_1, f_2, \ldots, f_n\}$ using Eq. (23) below, where $f_i = \hat{f}(\pi i/n; H)$, corresponding to the spectral density of an FGN process for frequencies $f_i$ ranging between $\pi/n$ and $\pi$. The main difficulty with using Eq. (18) when computing the spectral density is that it requires evaluating an infinite summation. The approximation of $f(\lambda, H)$ is given in [41] as

$$ f(\lambda; H) = c_f\,|\lambda|^{1-2H} + O\!\left(|\lambda|^{\min(3-2H,\,2)}\right), \tag{23} $$

where $c_f$ is given in Eq. (19) and $O(\cdot)$ represents the residual error. This formula was used in the generation of self-similar sequences proposed in this paper. Another generator of self-similar sequences based on FGN was also proposed by Paxson [20], but his method was based on a more complicated approximation of $f(\lambda, H)$, as shown in Eq. (20). Eq. (23) can be used to determine $f(\lambda, H)$ for $\lambda \to 0$, or for $n \to \infty$ at $\lambda = \pi/n$. For a large value of $\lambda$, $f(\lambda, H)$ can be calculated by Eq. (18).

Step 2. Multiply $\{f_i\}$ by realisations of an independent exponential random variable with a mean of one to obtain $\{\hat{f}_i\}$, because the spectral density estimated for a given frequency is distributed asymptotically as an independent exponential random variable with mean $f(\lambda, H)$ [42].

Step 3. Generate a sequence $\{Y_1, Y_2, \ldots, Y_n\}$ of complex numbers such that $|Y_i| = \sqrt{\hat{f}_i}$ and the phase of $Y_i$ is uniformly distributed between 0 and $2\pi$. This random phase technique, taken from Schiff [43], preserves the spectral density corresponding to $\{\hat{f}_i\}$. It also makes the marginal distribution of the final sequence normal and produces the requirements for FGN.

[Figure 8: two panels, (a) frequency against time and (b) scale against time, each marked with a reference point at (t0, f0).]

Fig. 8. A graphical representation of a discrete Fourier transform and a discrete wavelet transform: (a) a discrete Fourier transform and (b) a discrete wavelet transform.


Step 4. Calculate the two synthetic coefficients of orthonormal Daubechies wavelets that are used in the inverse DWT (IDWT). The output sequence {X1, X2, ..., Xn}, representing an approximately self-similar FGN process in the time domain, is obtained by applying the IDWT operation to the sequence {Y1, Y2, ..., Yn}.

Using the previous steps, the proposed FGN-DW method quickly generates a sufficiently accurate self-similar FGN process {X1, X2, ..., Xn}. It took 16 s to generate a sequence of 1,048,576 numbers on a Pentium II (233 MHz, 512 MB). Its theoretical algorithmic complexity is O(n). Moreover, the accuracy of Daubechies wavelets is slightly better than that of Haar wavelets, but there is no difference in the time taken to obtain the same number of coefficients. For a more detailed discussion, see also [31].

5. Comparison of fixed-length generators

Paxson [20] and Lau et al. [28] suggest that the FFT- and RMD-based methods are sufficiently fast in the generation of simulation input data for practical applications. In this paper, we report on the properties of these two methods and the F-ARIMA-based method, and compare them with SRA and FGN-DW, two recently proposed alternative methods for the generation of pseudo-random self-similar sequences [26,30]. These five fixed-length sequence generators are comparable because all of them have the same statistical properties, such as normal marginal distributions, means and variances. They were implemented in C on a Pentium II (233 MHz, 512 MB) computer. The mean times required for generating sequences of a given length were obtained using the SunOS 5.7 time command and were averaged over 30 replications, each with sequences of 32,768 ($2^{15}$), 65,536 ($2^{16}$), 131,072 ($2^{17}$), 262,144 ($2^{18}$), 524,288 ($2^{19}$) and 1,048,576 ($2^{20}$) numbers. We have analysed the accuracy with which the five generators considered produce normal pseudo-random sequences with the required value of H.
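To make the comparison concrete, the five steps of the FFT method in Section 4.1 can be sketched with NumPy. This is an illustration under stated assumptions, not Paxson's code or the authors' C implementation: the function names are hypothetical; the infinite sum of Eq. (19) is simply truncated to a finite number of terms instead of using the approximation of Eq. (20); the term at frequency $\pi$ is forced to be real so that the inverse FFT is real-valued; and the output is rescaled to unit sample variance.

```python
from math import gamma

import numpy as np


def fgn_spectrum(lam, hurst, sigma2=1.0, terms=200):
    """FGN power spectrum of Eqs. (18)-(19), with the sum truncated to |k| <= terms."""
    lam = np.asarray(lam, dtype=float)
    c_f = sigma2 / (2.0 * np.pi) * np.sin(np.pi * hurst) * gamma(2.0 * hurst + 1.0)
    k = np.arange(-terms, terms + 1)
    b = np.sum(np.abs(2.0 * np.pi * k[None, :] + lam[:, None]) ** (-2.0 * hurst - 1.0), axis=1)
    return 2.0 * c_f * (1.0 - np.cos(lam)) * b


def fft_fgn(n, hurst, seed=None):
    """Approximate FGN of even length n >= 4 by the FFT method (Steps 1 to 5)."""
    if n < 4 or n % 2:
        raise ValueError("n must be an even integer >= 4")
    rng = np.random.default_rng(seed)
    half = n // 2
    lam = 2.0 * np.pi * np.arange(1, half + 1) / n                  # Step 1: 2*pi/n .. pi
    f_hat = fgn_spectrum(lam, hurst) * rng.exponential(1.0, half)   # Step 2
    z = np.sqrt(f_hat) * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, half))  # Step 3
    z[-1] = abs(z[-1])                        # sketch assumption: real term at frequency pi
    z_full = np.zeros(n, dtype=complex)       # Step 4, Eq. (21); Z'_0 = 0
    z_full[1:half + 1] = z
    z_full[half + 1:] = np.conj(z[:half - 1][::-1])
    x = np.fft.ifft(z_full).real              # Step 5
    return x / x.std()                        # sketch assumption: unit sample variance


sequence = fft_fgn(2 ** 12, hurst=0.8, seed=42)
```

The truncated sum builds an array with one row per frequency, which is convenient for a sketch but memory-hungry for sequences as long as those timed in the paper; a finite approximation such as Eq. (20) avoids that cost.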
For H = 0.6, 0.7, 0.8 and 0.9, each method was used to generate 30 sample sequences of 32,768 ($2^{15}$) numbers starting from different random seeds. Self-similarity and marginal distributions of the sequences generated were assessed by the same techniques as those used in Section 3.

5.1. Accuracy of generated sequences

A summary of the results of our analysis follows. The estimates of the Hurst parameter for the wavelet-based H estimator and Whittle's MLE are shown in Table 4. All results were averaged over 30 sequences.

(a) The results for the wavelet-based H estimator with the corresponding 95% confidence intervals $\hat{H} \pm 1.96\,\hat{\sigma}_{\hat{H}}$ (see Table 4) show that for all input H values, the F-ARIMA, the FFT and the FGN-DW methods produced sequences with less biased H values than the other methods.

Using the FFT method, for H = 0.6, 0.7 and 0.8, the values of the Hurst parameter from the sample sequences match the required values well, but for H = 0.9, the accuracy of the match is lower. The relative error was +0.08%,

Table 4. Mean values of estimated H using the wavelet-based H estimator for the five fixed-length sequence generators for H = 0.6, 0.7, 0.8 and 0.9. The 95% confidence intervals for the means are given in parentheses.

| Method | H = 0.6: Ĥ (95% CI) | ΔH (%) | H = 0.7: Ĥ (95% CI) | ΔH (%) | H = 0.8: Ĥ (95% CI) | ΔH (%) | H = 0.9: Ĥ (95% CI) | ΔH (%) |
|---|---|---|---|---|---|---|---|---|
| F-ARIMA | .5974 (.593, .601) | −0.427 | .6990 (.693, .704) | −0.142 | .7947 (.787, .801) | −0.663 | .8900 (.880, .899) | −1.115 |
| FFT | .6005 (.596, .604) | +0.083 | .6967 (.692, .700) | −0.469 | .7862 (.782, .790) | −1.719 | .8639 (.859, .867) | −4.012 |
| FGN-DW | .6013 (.574, .629) | +0.214 | .6987 (.671, .726) | −0.185 | .7962 (.769, .824) | −0.474 | .8938 (.866, .921) | −0.694 |
| RMD | .5963 (.591, .601) | −0.613 | .6907 (.684, .696) | −1.332 | .7805 (.773, .787) | −2.443 | .8592 (.852, .866) | −4.536 |
| SRA | .5848 (.579, .589) | −2.528 | .6797 (.674, .685) | −2.899 | .7700 (.763, .776) | −3.744 | .8499 (.842, .856) | −5.568 |


0.47%, 1.75% and 4.11%, respectively.The estimated values of the F-ARIMA method were similar to the FFT method. For H = 0.6, 0.7, 0.8 and 0.9, all values of the Hurst parameter from the sample sequences were lower than the required values. Relative error was 0.43%, 0.14%, 0.66% and 1.12%, respectively.The FGN-DW method demonstrated a high level of accuracy and was fast. For H = 0.6, 0.7, 0.8 and 0.9, the relative error was +0.21%, 0.19%, 0.47% and 0.69%, respectively.The RMD method generated approximately self-similar sequences [28,30]. For H = 0.6, 0.7, 0.8 and 0.9, the Hurst parameter tended to be lower than the required value. Relative error was 0.61%, 1.33%, 2.44% and 4.54%, respectively.The SRA method results were similar to the RMD results. SRA generated self-similar sequences with the most biased H values. For a more detailed discussion, see [30]. b 1:96^ (b) The results for Whittle’s MLE with the corresponding 95% conﬁdence intervals H r b , (see Table 5), H show that for all input H values, the FFT and the FGN-DW methods produced sequences with less biased H values than other methods.The FFT method demonstrated a high level of accuracy. For H = 0.6, 0.7, 0.8 and 0.9, the values of the Hurst parameter from the sample sequences match the required values very well. Relative error was +0.03%, +0.03%, +0.03% and +0.02%, respectively.Using the F-ARIMA method, for H = 0.6, 0.7, 0.8 and 0.9, all values of the Hurst parameter from the sample sequences were lower than the required values. Relative error was 3.28%, 5.31%, 6.64% and 7.51%, respectively.The FGN-DW method is more accurate than the F-ARIMA, RMD and SRA methods, but not the FFT method. For H = 0.6, 0.7, 0.8 and 0.9, the relative error was 2.52%, 3.92%, 4.75% and 5.22%, respectively.The RMD method generated approximately self-similar sequences [28,30]. For H = 0.6, 0.7, 0.8 and 0.9, the Hurst parameter tended to be lower than the required value. 
Relative error was −3.91%, −6.18%, −7.48% and −8.21%, respectively.

The SRA method results were similar to the RMD results. It generated self-similar sequences with the most biased H values. For a more detailed discussion, see [31].

Our results show that all five generators produced approximately self-similar sequences, with the magnitude of the relative inaccuracy ΔH increasing with H, but always remaining below 9%.

5.2. Complexity and speed of generation

The computational complexities of the five fixed-length sequence generators for generating pseudo-random self-similar sequences of a given length n are shown in Table 6. The number of arithmetic operations per number required by each of them is also shown in Table 6. The F-ARIMA method was the slowest, requiring 178n² + 3936 operations per number. FFT was also slower than the FGN-DW, RMD and SRA methods. The results of our experimental analysis of the mean times required by each generator are shown in Fig. 9. Our main conclusions are:

(a) The F-ARIMA method was the slowest of the five methods, as expected.

(b) On average, the FFT method was faster than F-ARIMA, but slower than the other three. This was caused by the relatively high complexity of the inverse FFT algorithm. FFT requires O(n log n) computations to generate n numbers [45] and 49 log n + 4175 operations per number.

Table 5
Mean values of estimated H using Whittle's MLE for the five fixed-length sequence generators for H = 0.6, 0.7, 0.8 and 0.9. We give 95% confidence intervals for the means in parentheses

| Method | Ĥ (H = 0.6) | ΔH (%) | Ĥ (H = 0.7) | ΔH (%) | Ĥ (H = 0.8) | ΔH (%) | Ĥ (H = 0.9) | ΔH (%) |
|---|---|---|---|---|---|---|---|---|
| F-ARIMA | .5803 (.571, .590) | −3.281 | .6628 (.654, .672) | −5.308 | .7469 (.738, .756) | −6.642 | .8324 (.823, .842) | −7.507 |
| FFT | .6002 (.591, .610) | +0.027 | .7002 (.691, .710) | +0.033 | .8003 (.791, .809) | +0.033 | .9002 (.891, .909) | +0.024 |
| FGN-DW | .5849 (.575, .594) | −2.521 | .6725 (.663, .682) | −3.924 | .762 (.753, .771) | −4.745 | .853 (.844, .862) | −5.223 |
| RMD | .5765 (.567, .586) | −3.910 | .6567 (.647, .666) | −6.180 | .7401 (.731, .749) | −7.482 | .8261 (.817, .835) | −8.214 |
| SRA | .5762 (.567, .586) | −3.965 | .6563 (.647, .666) | −6.249 | .7395 (.730, .749) | −7.567 | .8252 (.816, .834) | −8.311 |
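The ΔH values in Table 5 appear consistent with the usual relative error of the estimate, ΔH = 100 (Ĥ − H) / H, and the intervals with Ĥ ± 1.96σ̂. The following is a minimal sketch under that assumption; the function names are illustrative and not taken from the paper, and `sigma_hat` is a hypothetical standard-error input that the table does not report directly.

```python
def relative_error_pct(h_est, h_true):
    """Relative inaccuracy in percent, assuming dH = 100 * (H_est - H) / H."""
    return 100.0 * (h_est - h_true) / h_true


def ci95(h_est, sigma_hat):
    """Normal-approximation 95% interval: H_est +/- 1.96 * sigma_hat.

    sigma_hat is an assumed standard error of the mean estimate.
    """
    half_width = 1.96 * sigma_hat
    return (h_est - half_width, h_est + half_width)


# Mean estimates for H = 0.6 as printed in Table 5 (Whittle's MLE).
TABLE5_H06 = {
    "F-ARIMA": 0.5803,
    "FFT": 0.6002,
    "FGN-DW": 0.5849,
    "RMD": 0.5765,
    "SRA": 0.5762,
}

if __name__ == "__main__":
    for method, h_est in TABLE5_H06.items():
        print(f"{method:8s} dH = {relative_error_pct(h_est, 0.6):+.3f}%")
```

Because the printed Ĥ values are rounded to four decimals, the recomputed percentages agree with the tabulated ΔH only to within about 0.01 percentage points.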


Table 6
Computational complexities and arithmetic operations required by each of the five fixed-length sequence generators

| Method | Complexity | Operations per number |
|---|---|---|
| F-ARIMA | O(n²) | 178n² + 3936 |
| FFT | O(n log n) | 49 log n + 4175 |
| FGN-DW | O(n) | 4091 |
| RMD | O(n) | 4103 |
| SRA | O(n) | 4167 |
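To get a feel for how the Table 6 expressions scale over the sequence lengths used in the experiments (2^15 to 2^20), they can be evaluated directly. This is only a sketch that takes the tabulated expressions at face value; the base of the logarithm is not stated in the table, so base 2 is assumed here, and the function name is illustrative.

```python
import math

# Constant per-number costs for the O(n) generators, as listed in Table 6.
CONSTANT_OPS = {"FGN-DW": 4091, "RMD": 4103, "SRA": 4167}


def ops_per_number(method, n):
    """Evaluate the Table 6 'operations per number' expression for length n.

    Assumption: 'log n' in the FFT row is a base-2 logarithm.
    """
    if method == "F-ARIMA":
        return 178 * n**2 + 3936
    if method == "FFT":
        return 49 * math.log2(n) + 4175
    return CONSTANT_OPS[method]


if __name__ == "__main__":
    for exponent in range(15, 21):
        n = 2**exponent
        row = ", ".join(
            f"{m}: {ops_per_number(m, n):.0f}"
            for m in ("F-ARIMA", "FFT", "FGN-DW", "RMD", "SRA")
        )
        print(f"n = 2^{exponent}: {row}")
```

Under these assumptions the FFT cost grows only from 4910 to 5155 operations per number across the tested range, while the F-ARIMA expression grows quadratically, which matches the ordering of the generators reported in the text.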

[Figure: log2 (Mean Running Time in Seconds) plotted against log2 (Sequence of Numbers), from 15 to 20, for the F-ARIMA, FFT, FGN-DW, RMD and SRA generators.]

Fig. 9. Mean running times of the five fixed-length sequence generators. Running times were obtained using the SunOS 5.7 time command on a Pentium II (233 MHz, 512 MB); each mean is averaged over 30 iterations, each with sequences of 32,768 (2^15), 65,536 (2^16), 131,072 (2^17), 262,144 (2^18), 524,288 (2^19) and 1,048,576 (2^20) numbers.

(c) The FGN-DW, RMD and SRA methods were equally fast. The theoretical complexity of forming a spectral density, and constructing normally distributed complex numbers, is O(1), while the inverse DWT is O(n) [20,55]. Thus, the time complexity of FGN-DW is also O(n) and this method requires 4091 operations per number. The theoretical algorithmic complexity of the RMD and SRA methods is also O(n) [49] and they require 4103 and 4167 operations per number, respectively.

Overall, our results showed that the generator based on FGN-DW is the fastest of the five generators if long sequences of self-similar pseudo-random numbers are required.

6. Sequential generators versus fixed-length sequence generators

We have presented the results of a comparative analysis of six sequential generators of (long) pseudo-random self-similar sequences. All six sequential generators, based on the FBNDP, SFRP, MGIP, PMPP, SRP-FGN and SAP methods, generated approximately self-similar sequences; SRP-FGN was the most accurate. However, our results show that for most input H values, the MGIP and SAP-based generators were strongly biased. The FBNDP method was biased for H = 0.9.

The analysis of mean times required to generate sequences of a given length demonstrates that all six sequential generators are more attractive than the F-ARIMA-based generator for practical simulation studies of communication networks because they are much faster. However, these generators require more input parameters, and selecting appropriate values is a problem that remains. Furthermore, in the case of SAP, the question of how to define the relationship between the Hurst parameter and the two shape parameters (i.e., a1 > 0 and a2 > 0) of a beta-distribution remains.


We have also presented the results of a comparative analysis of five fixed-length generators of self-similar sequences. All five, based on the F-ARIMA, FFT, FGN-DW, RMD and SRA methods, generated approximately self-similar sequences, with the relative inaccuracy of the resultant Ĥ below 9% if 0.6 ≤ H ≤ 0.9. However, the analysis of mean times required to generate sequences of a given length shows that the FFT, FGN-DW, RMD, and SRA generators are more attractive for practical simulation studies of communication networks because they generate sequences much faster. When the wavelet-based H estimator and Whittle's MLE (the least biased of the H estimation techniques) are applied (see Chapter 3 in [31]), FFT produces the most accurate results, with the FGN-DW results almost as accurate. Thus, FFT and FGN-DW are the most practical in both accuracy and speed for simulation studies with self-similar input.

Table 7 and Fig. 10 show a comparison of the three fastest and most accurate generators of the six sequential and five fixed-length sequence generators: the SRP-FGN, FFT and FGN-DW generators. While estimated H values for the FGN-DW method obtained using the wavelet-based H estimator were more accurate than those for the SRP-FGN and FFT methods, those for the FFT method obtained using Whittle's MLE were the most accurate. Even though the SRP-FGN and FGN-DW methods have the same computational complexity, O(n), the FGN-DW method was faster than the SRP-FGN and FFT methods, as shown in Fig. 10. Furthermore, while the SRP-FGN method required three input parameters (i.e., H, M and T), the FFT and FGN-DW methods required only the Hurst parameter H. Thus, the FFT method was more accurate than the SRP-FGN and FGN-DW methods, and the FGN-DW method was faster than the other two. Overall, all three methods are more attractive for practical simulation studies of telecommunication networks than the other eight generators.

Table 7
Comparison of the most efficient SRP-FGN, FFT and FGN-DW methods. Relative inaccuracies ΔH (%) of mean values of estimated H obtained using the wavelet-based H estimator and Whittle's MLE

| Estimator | Method | H = 0.6 | H = 0.7 | H = 0.8 | H = 0.9 |
|---|---|---|---|---|---|
| Wavelet-based | SRP-FGN | −0.968 | −0.569 | +0.700 | +0.344 |
| Wavelet-based | FFT | +0.083 | −0.469 | −1.719 | −4.012 |
| Wavelet-based | FGN-DW | +0.214 | −0.185 | −0.474 | −0.694 |
| Whittle's MLE | SRP-FGN | +0.110 | +1.905 | +2.539 | +2.526 |
| Whittle's MLE | FFT | +0.027 | +0.033 | +0.033 | +0.024 |
| Whittle's MLE | FGN-DW | −2.521 | −3.924 | −4.745 | −5.223 |

[Figure: log2 (Mean Running Time in Seconds) plotted against log2 (Sequence of Numbers), from 15 to 20, for the SRP-FGN, FFT and FGN-DW generators.]

Fig. 10. Mean running times of the SRP-FGN, FFT and FGN-DW generators. Running times were obtained using the SunOS 5.7 time command on a Pentium II (233 MHz, 512 MB); each mean is averaged over 30 iterations, each with sequences of 32,768 (2^15), 65,536 (2^16), 131,072 (2^17), 262,144 (2^18), 524,288 (2^19) and 1,048,576 (2^20) numbers.

7. Conclusions

One of the problems that communication network researchers face when conducting simulation studies is how to generate long synthetic sequential self-similar sequences. Three aspects must be considered: (i) how accurately self-similar processes can be generated, (ii) how quickly the methods generate long self-similar sequences, and (iii) how appropriately self-similar processes can be used in sequential simulations.

Most of the existing synthetic methods for generating self-similar sequences require large amounts of CPU time. Some current methods must store either part, or all, of the sequence in memory before generating the numbers of a sequence. In addition, they are often inaccurate or inappropriate in simulation studies of communication networks. Sequential generators of self-similar sequences depend on the level of approximation, and need several input parameters to be assumed, while fixed-length sequence generators need only the Hurst parameter to be assumed to generate self-similar sequences. Certainly, more efficient and accurate generators of self-similar sequences of pseudo-random numbers are needed.

A comparative study of self-similar pseudo-random teletraffic generators was undertaken. Overall, our results, obtained using the wavelet-based H estimator and Whittle's MLE, which are the least biased of the H estimation techniques considered in [31], have revealed that the fastest and most accurate generators of the six sequential and five fixed-length sequence generators considered are the SRP-FGN, FFT and FGN-DW methods. However, these methods have both strengths and weaknesses. The FFT and FGN-DW methods are more attractive for non-sequential simulations, because they can generate the required number of sequences more accurately and quickly than the SRP-FGN method.
If the FFT and FGN-DW methods are used for sequential simulations, sufficient numbers of sequences must be generated before the simulation begins. However, the required number of sequences is not easy to predict in practical simulations. On the other hand, the SRP-FGN method is more attractive for sequential simulations, because it does not need to "know" the required number of sequences beforehand. Unfortunately, this method is less accurate and requires more generating time than the FFT method.

Network layer traffic generators that can generate packet arrival processes have been addressed in this paper. In practical simulation studies of communication networks, application-level traffic generators come closer to real network traffic. While the network layer traffic generators are more flexible [31], the application-layer traffic generators [59–61] fit only a specific process. On the other hand, for practical simulation studies of communication networks, self-similar processes with arbitrary marginal distributions are needed. To obtain these, a given sequence from the self-similar process must be transformed into a sequence that represents a realisation of a specific process. For example, the FFT and FGN-DW methods can be used to synthesise VBR video traffic, and the SRP-FGN method can be used to investigate the queueing behaviour in (steady-state) simulation studies of queueing systems fed by self-similar input.

Acknowledgements

We thank anonymous referees for their constructive remarks and valuable comments. This work was partially supported by the Korean Bible University's Research Grant, and Korea Institute of Science and Technology Information, Korea.

References

[1] T. Karagiannis, M. Molle, M. Faloutsos, Long-range dependence: ten years of internet traffic modelling, IEEE Internet Computing 8 (5) (2004) 57–64.
[2] W. Leland, M. Taqqu, W. Willinger, D. Wilson, On the self-similar nature of ethernet traffic (extended version), IEEE/ACM Transactions on Networking 2 (1) (1994) 1–15.
[3] W. Willinger, M. Taqqu, R. Sherman, D. Wilson, Self-similarity through high-variability: statistical analysis of ethernet LAN traffic at the source level, IEEE/ACM Transactions on Networking 5 (1) (1997) 71–86.


[4] R.E.A. Khayari, R. Sadre, B.R. Haverkort, A. Ost, The pseudo-self-similar traffic model: application and validation, Performance Evaluation 56 (1–4) (2004) 3–22.
[5] J. López-Ardao, C. López-García, A. Suárez-González, M. Fernández-Veiga, R. Rodríguez-Rubio, On the use of self-similar processes in network simulation, ACM Transactions on Modeling and Computer Simulation 10 (2) (2000) 125–151.
[6] S. Robert, J.-Y. LeBoudec, On a Markov modulated chain exhibiting self-similarities over finite timescale, Performance Evaluation 27–28 (1996) 159–173.
[7] S. Robert, J.-Y. LeBoudec, New models for pseudo self-similar traffic, Performance Evaluation 30 (1–2) (1997) 57–68.
[8] A. Andersen, B. Nielsen, An application of superpositions of two-state Markovian sources to the modelling of self-similar behaviour, in: Proceedings of IEEE INFOCOM'97, Kobe, Japan, 1997, pp. 196–204.
[9] S. Lowen, M. Teich, Estimation and simulation of fractal stochastic point processes, Fractals 3 (1) (1995) 183–210.
[10] B. Ryu, S. Lowen, Point process approaches to the modeling and analysis of self-similar traffic – Part I: model construction, in: Proceedings of IEEE INFOCOM'96, San Francisco, CA, 1996, pp. 1468–1475.
[11] B. Ryu, S. Lowen, Point process models for self-similar network traffic, with applications, Communications in Statistics-Stochastic Models 14 (3) (1998) 735–761.
[12] B. Ryu, Fractal Network Traffic: from Understanding to Implications, Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University, 1996.
[13] B. Mandelbrot, Long-run linearity, locally gaussian processes, H-spectra and infinite variances, International Economic Review 10 (1969) 82–113.
[14] M. Taqqu, J. Levy, Using renewal processes to generate long-range dependence and high variability, Dependence in Probability and Statistics 11 (1986) 73–89.
[15] D. Cox, V. Isham, Point Processes, Chapman and Hall, New York, 1980.
[16] D. Cox, Long-range dependence: a review, in: H.A. David, H.T. David (Eds.), Statistics: An Appraisal, Iowa State Statistical Library, The Iowa State University Press, 1984, pp. 55–74.
[17] T. Le-Ngoc, S. Subramanian, A pareto-modulated poisson process (PMPP) model for long-range dependent traffic, Computer Communications 23 (2000) 123–132.
[18] T. Taralp, M. Devetsikiotis, I. Lambadaris, A. Bose, Efficient fractional Gaussian noise generation using the spatial renewal process, in: Proceedings of IEEE International Conference on Communications (ICC'98), Atlanta, GA, USA, 1998, pp. 7–11.
[19] C. Granger, Long memory relationships and the aggregation of dynamic models, Journal of Econometrics 14 (1980) 227–238.
[20] V. Paxson, Fast, approximate synthesis of fractional gaussian noise for generating self-similar network traffic, Computer Communication Review, ACM SIGCOMM 27 (5) (1997) 5–18.
[21] J. Hosking, Modeling persistence in hydrological time series using fractional differencing, Water Resources Research 20 (12) (1984) 1898–1908.
[22] A. Erramilli, P. Pruthi, W. Willinger, Fast and physically-based generation of self-similar network traffic with applications to ATM performance evaluation, in: S. Andradottir, K.J. Healy, D.H. Withers, B.L. Nelson (Eds.), Proceedings of the 1997 Winter Simulation Conference, Atlanta, Georgia, 1997, pp. 997–1004.
[23] A. Erramilli, R. Singh, P. Pruthi, Chaotic maps as models of packet traffic, in: J. Labetoulle, J.W. Roberts (Eds.), The Fundamental Role of Teletraffic in the Evolution of Telecommunications Networks: Proceedings of the 14th International Teletraffic Congress-ITC 14, 1a, Elsevier, Amsterdam, Antibes Juan-les-Pins, France, 1994, pp. 329–338.
[24] W. Leland, W. Willinger, M. Taqqu, D. Wilson, Statistical analysis and stochastic modeling of self-similar data traffic, in: J. Labetoulle, J.W. Roberts (Eds.), The Fundamental Role of Teletraffic in the Evolution of Telecommunications Networks: Proceedings of the 14th International Teletraffic Congress-ITC 14, 1a, Elsevier, Amsterdam, Antibes Juan-les-Pins, France, 1994, pp. 319–328.
[25] E. Costamagna, L. Favalli, P. Gamba, G. Iacovoni, A simple model for VBR video traffic based on chaotic maps: validation through evaluation of ATM multiplexers QoS parameters, in: Proceedings of IEEE International Conference on Communications (ICC'98), Atlanta, GA, USA, 1998, pp. S16_6.1–S16_6.5.
[26] H.-D. Jeong, D. McNickle, K. Pawlikowski, Fast self-similar teletraffic generation based on FGN and wavelets, in: Proceedings of the IEEE International Conference on Networks, ICON'99, Brisbane, Australia, 1999, pp. 75–82.
[27] H.-D. Jeong, J.-S. Lee, D. McNickle, K. Pawlikowski, Distributed steady-state simulation of telecommunication networks with self-similar teletraffic, Simulation Practice and Theory 13 (3) (2005) 233–256.
[28] W.-C. Lau, A. Erramilli, J. Wang, W. Willinger, Self-similar traffic generation: the random midpoint displacement algorithm and its properties, in: Proceedings of IEEE International Conference on Communications (ICC'95), Seattle, WA, 1995, pp. 466–472.
[29] A. Crilly, R. Earnshaw, H. Jones, Fractals and Chaos, Springer-Verlag, New York, 1991.
[30] H.-D. Jeong, D. McNickle, K. Pawlikowski, A comparative study of three self-similar teletraffic generators, in: Proceedings of 13th European Simulation Multiconference, ESM'99, vol. 1, Warsaw, Poland, 1999, pp. 356–362.
[31] H.-D. Jeong, Modelling of Self-Similar Teletraffic for Simulation, Ph.D. thesis, Department of Computer Science, University of Canterbury, 2002.
[32] W. Feller, An Introduction to Probability Theory and Its Applications, third ed., vol. I, John Wiley & Sons, Inc., New York, 1968.
[33] M. Parulekar, A. Makowski, M/G/∞ input processes: a versatile class of models for network traffic, in: Proceedings of IEEE INFOCOM'97, Kobe, Japan, 1997, pp. 419–426.
[34] N. Likhanov, B. Tsybakov, N. Georganas, Analysis of an ATM buffer with self-similar ("fractal") input traffic, in: Proceedings of IEEE INFOCOM'95, Boston, Massachusetts, 1995, pp. 985–992.
[35] C. Klüppelberg, Subexponential distributions and integrated tails, Journal of Applied Probability 25 (1988) 132–141.


[36] P. Jelenković, The Effect of Multiple Time Scales and Subexponentiality on the Behavior of a Broadband Network Multiplexer, Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University, 1996.
[37] H. Leroux, M. Hassan, R. Egudo, Are inter-arrival times of internet traffic also self-similar?, in: Proceedings of IEEE International Conference on Telecommunications (ICT'99), vol. 1, Cheju, South Korea, 1999, pp. 549–553.
[38] V. Paxson, S. Floyd, Wide-area traffic: the failure of poisson modeling, IEEE/ACM Transactions on Networking 3 (3) (1995) 226–244.
[39] M. Roughan, J. Yates, D. Veitch, The mystery of the missing scales: pitfalls in the use of fractal renewal processes to simulate LRD processes, in: Applications of Heavy Tailed Distributions in Economics, Engineering and Statistics, American University, Washington, DC, 1999.
[40] S. Ledesma, D. Liu, Synthesis of fractional Gaussian noise using linear approximation for generating self-similar network traffic, Computer Communication Review, ACM SIGCOMM 30 (2) (2000) 4–17.
[41] J. Beran, Statistics for Long-Memory Processes, Chapman and Hall, New York, 1994.
[42] J. Beran, Statistical methods for data with long range dependence, Statistical Science 7 (4) (1992) 404–427.
[43] S. Schiff, Resolving time-series structure with a controlled wavelet transform, Optical Engineering 31 (11) (1992) 2492–2495.
[44] W. Feller, An Introduction to Probability Theory and Its Applications, second ed., vol. II, John Wiley & Sons Inc., New York, 1966.
[45] W. Press, B. Flannery, S. Teukolsky, W. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1986.
[46] W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Numerical Recipes in C, Cambridge University Press, Cambridge, 1999.
[47] J. Hosking, Fractional differencing, Biometrika 68 (1) (1981) 165–176.
[48] H.-O. Peitgen, H. Jurgens, D. Saupe, Chaos and Fractals: New Frontiers of Science, Springer-Verlag, New York, 1992.
[49] H.-O. Peitgen, D. Saupe, The Science of Fractal Images, Springer-Verlag, New York, 1988.
[50] V. Ribeiro, R. Riedi, M. Crouse, R. Baraniuk, Simulation of nonGaussian long-range-dependent traffic using wavelets, in: Performance Evaluation Review, Proceedings of ACM SIGMETRICS'99, vol. 27(1), 1999, pp. 1–12.
[51] R. Riedi, M. Crouse, V. Ribeiro, R. Baraniuk, A multifractal wavelet model with application to network traffic, IEEE Transactions on Information Theory 45 (3) (1999) 992–1018.
[52] D. Veitch, J.-A. Bäckar, J. Wall, J. Yates, M. Roughan, On-line generation of fractal and multifractal traffic, in: The PAM2000 Passive and Active Measurement Workshop, Hamilton, New Zealand, 2000.
[53] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM Press, Philadelphia, Pennsylvania, 1992.
[54] M. Roughan, D. Veitch, P. Abry, On-line estimation of the parameters of long-range dependence, in: Proceedings of GLOBECOM'98, Sydney, Australia, 1998, pp. 3716–3721.
[55] M. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A.K. Peters, Ltd., Wellesley, Massachusetts, 1994.
[56] P. Flandrin, Wavelet analysis and synthesis of fractional Brownian motion, IEEE Transactions on Information Theory 38 (2) (1992) 910–917.
[57] L. Kaplan, C.-C. Kuo, Fractal estimation from noisy data via discrete fractional Gaussian noise (DFGN) and the Haar Basis, IEEE Transactions on Signal Processing 41 (12) (1993) 3554–3562.
[58] P. Abry, D. Veitch, Wavelet analysis of long-range-dependent traffic, IEEE Transactions on Information Theory 44 (1) (1998) 2–15.
[59] J. Cao, W.S. Cleveland, Y. Gao, K. Jeffay, F.D. Smith, M. Weigle, Stochastic models for generating synthetic HTTP source traffic, in: Proceedings of IEEE INFOCOM'2004, Hong Kong, 2004, pp. 1547–1558.
[60] J. Wallerich, Design and Implementation of a WWW Workload Generator for the NS-2 Network Simulator, 2006.
[61] M. Weigle, P. Adurthi, F. Hernández-Campos, K. Jeffay, F. Smith, Tmix: a tool for generating realistic TCP application workloads in NS-2, Computer Communication Review, ACM SIGCOMM 36 (3) (2006) 65–76.

A Glossary of Terms and Acronyms

ACF: autocorrelation function
CDF: cumulative distribution function
CPU: central processing unit
DW: Daubechies wavelets
DWT: discrete wavelet transform
F-ARIMA: fractional autoregressive integrated moving-average
FBM: fractional Brownian motion
FBNDP: fractal-binomial-noise-driven Poisson process
FFT: fast Fourier transform
FGN: fractional Gaussian noise
FGN-DW: fractional Gaussian noise and Daubechies wavelets
FRP: fractal renewal process
FSNDP: fractal-shot-noise-driven Poisson process
IDC: index of dispersion for counts
IDWT: inverse discrete wavelet transform
IID: independent, identically distributed
LRD: long-range dependence
MB: megabyte
MGIP: M/G/∞ processes
ML: mean length
MLE: maximum likelihood estimator
MNPE: mean numbers of Poisson events
MPEG: moving picture experts group
NRN: normal random number
PDF: probability density function
PMPP: Pareto-modulated Poisson processes
RMD: random midpoint displacement
SAP: superposition of autoregressive processes
SFRP: superposition of fractal renewal processes
SRA: successive random addition
SRP-FGN: spatial renewal processes and fractional Gaussian noise
