On weak convergence of long-range-dependent traffic processes

Journal of Statistical Planning and Inference 80 (1999) 155–171

www.elsevier.com/locate/jspi

R.G. Addie ∗
Department of Mathematics & Computing, University of Southern Queensland, Toowoomba, Queensland 4250, Australia
Received 11 November 1997; accepted 19 June 1998

Abstract

A possible model for communication traffic is that the amounts of work arriving in successive time intervals are jointly Gaussian. This model seems to fly in the face of certain obvious and characteristic features of real traffic, such as the fact that it arrives in discrete bundles and that there is often a non-zero probability of zero traffic in a time interval of significant length. Also, the Gaussian model allows the possibility of negative traffic, which is clearly unrealistic. However, as the number of traffic sources and the quantity of traffic in communication networks increase, the deviation between the distribution of real traffic and the Gaussian model will, under suitable conditions, become smaller. The appropriate concept of topology/convergence must be used, or the result will be meaningless. To identify an appropriate convergence framework, the performance statistics associated with a network, namely cell loss, delay and, in general, statistics which can be expressed in terms of the buffers which accumulate in the network, may be used as a guide. Weak convergence of probability measures has the property that when the probability measures of traffic processes converge to that of a certain traffic process, the distributions of their performance characteristics, such as buffer occupancy, also converge in the same sense to the performance of the system to which they were converging. Real traffic appears, unambiguously, to be long-range dependent. There is an interesting example where aggregation of traffic does not seem to produce convergence to the queueing behaviour expected of Gaussian traffic; at any rate, the tail characteristics do not converge to those of the Gaussian result. However, in Section 5, it is shown that if the variance of each traffic stream is finite and, as a proportion of the variance of the whole traffic volume, tends to zero, then the traffic in networks can be expected to converge to Gaussian in the sense of weak convergence of probability measures.
It is then shown that, as a consequence, the traffic in the paradoxical example does converge in this sense as well. The paradox is explained by noticing that asymptotic tail behaviour may become increasingly irrelevant as traffic is aggregated. This fact should sound a warning concerning the cavalier use of tail behaviour as an indication of performance. Long-range dependence apparently places no inhibition on convergence to Gaussian behaviour. Convergence of increasing aggregates of traffic to a Gaussian distribution is only shown for discrete-time models. In fact, it appears that continuous-time Gaussian models do not share this property, and their use for modelling real traffic may be problematic. © 1999 Elsevier Science B.V. All rights reserved.

∗ Fax: +61-7-631-1775. E-mail address: [email protected] (R.G. Addie).

0378-3758/99/$ - see front matter. PII: S0378-3758(98)00248-1

1. Introduction

The paper by Leland et al. (1994), among others, has demonstrated that real traffic exhibits long-range dependence. One model which exhibits long-range dependence is fractional Brownian noise (FBN), which was proposed as a model for traffic in high-speed networks in Leland et al. (1994). Real traffic generated by a small number of sources often has a marginal distribution of the work arriving in a sample interval which is markedly different from the Gaussian distribution implied by fractional Brownian noise. For example, it is common for sources to issue no traffic at all for an extended period, so that a sampling process would observe a succession of zero samples. However, another point of view is that as networks grow, and the number of traffic streams multiplexed on a single link increases, the shape of the traffic must become closer and closer to Gaussian. This is the point of view put forward in this paper.

It has become quite common to use continuous-time models for traffic in communication networks. At first sight, it would seem that the choice between a continuous- and a discrete-time model should have no fundamental importance. In practice, however, there are some important distinctions between the continuous- and discrete-time frameworks which are particularly relevant to the topic of this paper. The main result of this paper, that traffic converges to a Gaussian distribution as it is aggregated, may not hold in the continuous-time framework at all. In fact, it is not even clear that the obvious continuous-time model, fractional Brownian noise (FBN), can be regarded as realistic, because under FBN the net traffic arriving in short time intervals would normally be negative for substantial periods of time. This is unrealistic.
(One way around this might be to adopt an autocorrelation function which is very close to one over an interval near the origin; however, as far as the author is aware, this approach remains untried.) Discrete-time Gaussian models suffer from the same defect (negative traffic) to a much lesser degree, and as more and more traffic is aggregated, the quantity of negative traffic implied by a Gaussian model with the same second-order characteristics and the same sampling interval as the real traffic becomes smaller. Therefore, it is possible for real traffic to converge to a discrete-time Gaussian model under aggregation, even though there is no such thing as negative traffic in the real world.

To identify an appropriate convergence framework for traffic in communication networks, the performance statistics associated with a network, namely cell loss, delay and, in general, statistics which can be expressed in terms of the buffers which accumulate traffic for short periods in the network, may be used as a guide. An appropriate topology probably should have the characteristic that when two traffic models are close, their associated performance characteristics are also close. Weak convergence, and the associated topology, has this characteristic, as we shall see below. Giving this degree of precedence to performance measures is fairly traditional, and there is no reason to suppose that the performance of networks is less important today than it used to be. Offered traffic continues to expand to meet the available bandwidth
and users continue to seek the best possible performance. If there is a situation where performance can sensibly be regarded as irrelevant, then a different definition of convergence of probability measures from the one adopted in this paper might be more appropriate; for the purpose of this paper, however, performance issues retain their customary pre-eminence. In particular, it is known (Kennedy, 1972; Whitt, 1974) that queues are continuous in the topology implied by weak convergence. More precisely, the function which maps the stochastic processes representing the arrival and service processes of a queue to its queue-length process is continuous in the topology of weak convergence. These results will be reviewed and revised to ensure that they apply to the type of convergence which is relevant in a communication network. In particular, in the context of a communication network we do not talk about arrival processes and service processes; instead, we talk about traffic processes. The next step of our program, in Section 5, is to confirm that weak convergence to a Gaussian probability measure does take place in a typical communication network.

Where does long-range dependence come into this? Because weak convergence is defined in terms of the finite-dimensional distributions of the probability measures in question, the long-range dependence of these measures does not significantly alter convergence questions. The probability measures we are likely to meet in a communication network in practice will be long-range dependent, and at first sight one would expect this fact to be relevant to the type of convergence which is of interest. However, we will show that the relevant convergence concept, weak convergence of measures, does not need to be modified for long-range-dependent processes.
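To make the continuity claim concrete in the discrete-time setting used later (Section 2), the following minimal Python sketch assumes a queue with net input $Y_n$ per slot, an initially empty buffer, and the indexing convention $V_n = (V_{n-1} + Y_n)^+$ with $V_{-1} = 0$. It computes the buffer sequence both by this recursion and as a maximum of backward partial sums, then checks numerically that perturbing each of the first $n$ inputs by at most $\epsilon/n$ moves each of the first $n$ buffer values by less than $\epsilon$ (the same $\epsilon/k$ bound used in the proof of Proposition 2). The function names and the Gaussian test input are illustrative choices, not part of the paper.

```python
import random


def lindley(y):
    """Buffer contents V_n = max(V_{n-1} + Y_n, 0), starting from V_{-1} = 0."""
    v, out = 0.0, []
    for yn in y:
        v = max(v + yn, 0.0)
        out.append(v)
    return out


def max_partial_sums(y):
    """Same sequence as the maximum over k of Y_k + ... + Y_n, including the
    empty sum (k = n + 1), which equals zero."""
    out = []
    for n in range(len(y)):
        best, s = 0.0, 0.0           # start from the empty sum
        for k in range(n, -1, -1):   # s runs through Y_k + ... + Y_n
            s += y[k]
            best = max(best, s)
        out.append(best)
    return out


def max_deviation(a, b):
    """Largest coordinate-wise difference between two equal-length sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


rng = random.Random(1)
y = [rng.gauss(-0.1, 1.0) for _ in range(50)]   # illustrative net input
eps = 1e-3
delta = eps / len(y)                            # eps/k neighbourhood, k = 50
y_pert = [yn + rng.uniform(-delta, delta) for yn in y]

print(max_deviation(lindley(y), max_partial_sums(y)))    # rounding error only
print(max_deviation(lindley(y), lindley(y_pert)) < eps)  # True
```

Each step of the recursion can enlarge the output error by at most the error in the current input, which is why an $\epsilon/n$ neighbourhood of the input suffices.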
Finally, we consider an example where, somewhat paradoxically, there appears to be evidence that the queueing behaviour does not converge to that expected of a Gaussian model. The evidence in this case is that the asymptotic shape of the tail of the queueing distribution has been shown to be inconsistent with the asymptotic shape which applies in the Gaussian case. A simple explanation of this paradox can be given, and it is quite revealing: queueing tail behaviour is not continuous. That is to say, if a sequence of queueing systems converges weakly to a specific queueing system, there is no reason to suppose that the tail behaviour of the sequence converges to the tail behaviour of the limiting queue. This reminds us of an all-too-obvious fact about asymptotic tail behaviour: it might be irrelevant and misleading. In case this seems to be a peripheral issue, recall that the results available for queues with anything but trivial autocorrelation properties, and for Gaussian queues in particular, are largely confined to tail approximations. For these results to have any real validity, we need grounds for believing that the tail behaviour is actually typical of the queue distribution as a whole. In the case of queues with Gaussian input, there is a significant body of evidence that the tail behaviour is characteristic of the queue as a whole. In the case of non-Gaussian queues, an implication of the convergence result in Section 5 is that when there is an inconsistency with the Gaussian
model, the tail must have vanishingly small significance as more and more traffic is aggregated.

2. The single server queue

The performance of many communication systems in networks is well modelled by a single server queue in discrete time,

$$ V_n = (V_{n-1} + Y_n)^+, \quad n \ge 0, \qquad V_{-1} = 0, $$

in which the net input to the queue, $Y_n$, represents the difference between the net arriving work and the net capacity to complete work in each time slot. Then

$$ V_n = \max_{0 \le k \le n+1} \sum_{j=k}^{n} Y_j, \quad n \ge 0. \tag{1} $$

(When $k = n+1$ the sum is taken to equal zero.) This representation is the key to obtaining large deviations results for the stationary distribution of $\{V_n\}_{n\ge0}$. Let $S_k^n = Y_k + \cdots + Y_n$, $k, n \ge 0$. Eq. (1) implies the following bounds:

$$ \max_{k \le n+1} P\{S_k^n > x\} \le P\{V_n > x\} \le \sum_{k=0}^{n} P\{S_k^n > x\}, \tag{2} $$

which gives us a lead on how to apply large deviations theory, because both bounds are expressed in terms of partial sums of the series $\{Y_n\}_{n\ge0}$. Note that Eq. (2) holds even when $n = \infty$, which is usually the case of most interest. If one term in the sum on the right-hand side of Eq. (2) were dominant, these bounds would be quite close together. This is not usually the case; however, the style of approximation sought in large deviations theory is somewhat relaxed, and in this context the bounds in Eq. (2) can be regarded as quite close even when they are numerically very different. A large deviations approximation for the probability $P\{V_n > x\}$ is typically an approximation of $\log P\{V_n > x\}$ which applies for large $x$. One of the conclusions of this paper is that there are some important systems in which a result which is asymptotically valid for large $x$ may be quite misleading.

3. Queueing results

Using methods from large deviations theory, several authors (Duffield and O'Connell, 1995; Glynn and Whitt, 1993; Kesidis et al., 1993; Norros, 1994) have obtained approximations for $P\{V_n > x\}$ in the large deviations sense. It is simplest to divide these results into two cases, the 'non-fractal' case and the 'fractal' case. More precisely, Case 1 is where

$$ \lim_{n\to\infty} \frac{\operatorname{Var}(S_0^n)}{n} < \infty, $$

and Case 2 is where this limit is infinite. In Case 1, Glynn and Whitt (1993) have obtained the following result:

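Proposition 1. Assume that the $\mathbb{R}$-valued arrival sequence $\{Y_n\}_{n\ge0}$ is stationary and ergodic and, setting

$$ \Lambda_n(\theta) = \frac{1}{n} \log E\{e^{\theta S_0^n}\}, $$

that there exist $\theta^* > 0$, $\epsilon > 0$ and a function $\Lambda(\theta)$ defined on $|\theta - \theta^*| < \epsilon$ such that

1. $\Lambda(\theta) = \lim_{n\to\infty} \Lambda_n(\theta)$ on $|\theta - \theta^*| < \epsilon$, and $\Lambda(\theta) < \infty$ on a neighbourhood of $\theta^*$;
2. $\Lambda$ is differentiable at $\theta^*$, with $\Lambda(\theta^*) = 0$ and $\Lambda'(\theta^*) > 0$; and
3. $\Lambda_n(\theta^*) < \infty$ for $n \ge 1$.

Then

$$ \lim_{x\to\infty} \frac{1}{x} \log P\{V_\infty > x\} = -\theta^*. $$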
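For Gaussian input the conditions of Proposition 1 can be checked by hand, which gives a useful sanity check. If $\{Y_n\}$ is stationary Gaussian with mean $m < 0$ and $v = \lim_n \operatorname{Var}(S_0^n)/n < \infty$, then $\Lambda(\theta) = m\theta + v\theta^2/2$, so $\Lambda(\theta^*) = 0$ gives $\theta^* = -2m/v$ and $\Lambda'(\theta^*) = -m > 0$. The Python sketch below assumes, purely for illustration, a Gaussian AR(1) input (coefficient `a`, innovation variance `s2`; neither appears in the paper), approximates $v$ from the exact finite-$n$ variance of a partial sum, and locates $\theta^*$ by bisection so that it can be compared with the closed form.

```python
def ar1_variance_rate(a, s2, n):
    """Var(Y_1 + ... + Y_n) / n for a stationary Gaussian AR(1) input with
    coefficient a (|a| < 1) and innovation variance s2."""
    g0 = s2 / (1.0 - a * a)                    # lag-0 autocovariance
    var = n * g0 + 2.0 * sum((n - h) * g0 * a ** h for h in range(1, n))
    return var / n                             # tends to s2 / (1 - a)**2


def theta_star(m, v, iters=200):
    """Positive root of Lambda(theta) = m*theta + v*theta**2/2 (m < 0, v > 0),
    located by bisection rather than by the closed form -2m/v."""
    lam = lambda th: m * th + 0.5 * v * th * th
    lo, hi = 1e-12, 1.0
    while lam(hi) <= 0.0:                      # expand until Lambda changes sign
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m = -0.2                                       # mean net input per slot (illustrative)
v = ar1_variance_rate(a=0.5, s2=1.0, n=2000)   # close to 1 / 0.5**2 = 4
th = theta_star(m, v)
print(v, th, -2.0 * m / v)                     # bisection agrees with -2m/v
```

Positive autocorrelation (here `a > 0`) inflates $v$ and therefore shrinks the decay rate $\theta^*$, which is the Case 1 effect of correlation on the tail.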

A preliminary result for Case 2 was obtained by Norros (1994) (in that paper time was continuous) and was then made more precise by Duffield and O'Connell (1995). In the particular case where the traffic is fractional Brownian noise, characterised by

$$ \operatorname{Var}(S_1^n) = \sigma^2 n^{2H}, \quad n \ge 0, $$

their result is equivalent (after translating to the discrete-time context of the present paper) to

$$ \lim_{t\to\infty} t^{-2(1-H)} \log P\{V_\infty > t\} = -\frac{1}{2\sigma^2}(1-H)^{-2} \left( \frac{H}{|(1-H)\,E\{Y_1\}|} \right)^{-2H}. \tag{3} $$

Since this result was obtained, more precise results for the continuous-time model have been obtained by Narayan (1998) and Norros et al. (1995), including methods to estimate the mass of the tail. In parallel with this work, the present author and others have pursued the analysis of discrete-time traffic models with a high degree of correlation by somewhat different methods. In particular, Case 1 was treated in Addie and Zukerman (1993, 1994), and a fairly explicit (albeit approximate) formula for the expected queueing performance was obtained for Gaussian traffic. This result relied not only on the traffic having a finite asymptotic index of dispersion but also on the tail being dominant. Under the assumptions that $E\{Y_1\} = m < 0$ and

$$ \lim_{n\to\infty} \frac{\operatorname{Var}(S_0^n)}{n} = v < \infty, $$

they showed in Addie and Zukerman (1993) that

$$ P\{V_\infty > t\} \approx -\frac{s^*\,\psi(-m,\sigma)}{\operatorname{erf}(-u/\sqrt{2})} \exp(s^* t), \tag{4} $$

where $s^* = 2m/v$, $u = m^2/v$ and

$$ \psi(x,\sigma) = \frac{\sigma}{\sqrt{2\pi}} e^{-x^2/(2\sigma^2)} - \frac{x}{2} \operatorname{erfc}\!\left( \frac{x}{\sqrt{2}\,\sigma} \right). $$

The basis for the approximation in Eq. (4) is that the coefficient in front of the exponential
tail, i.e. the weight of the tail, has been chosen as follows. When an approximation for the buffer contents of the form of Eq. (4) (but with an arbitrary weight for the exponential tail) is adopted, and the probability distribution of the buffer contents after one iteration of the basic cycle of the queueing process is calculated, the unique choice of weight which produces a consistent result for the mean buffer contents before and after the queue cycle is the one used in Eq. (4). More recently, a more precise result for this model has been obtained by Choe and Shroff (1999).

Case 2 was treated in Addie et al. (1995). Again, the traffic was assumed to be Gaussian, which restricts the generality of the result, and the result was an approximation; on the other hand, the result was fairly explicit. A formula was provided for both the exponent and the mass of an exponential-type formula for the stationary distribution of queued work. Here it is:

$$ P\{V_\infty > t\} \approx \frac{\sqrt{2}\,\psi(-m,\sigma)}{\sigma} \exp\!\left( -\frac{1}{2\sigma^2}\, |1-H|^{-2} \left( \frac{H}{|(1-H)m|} \right)^{-2H} t^{2-2H} \right). \tag{5} $$

The formula for the mass is consistent with the earlier result in the sense that, as $v \to \infty$, the mass coefficient in Eq. (4) tends to the mass coefficient in Eq. (5); this was how it was obtained. Note that there was an error in one of the formulae in Addie et al. (1995), which implied that the initial factor in the exponential in Eq. (5) should be $2/\sigma^2$ instead of $1/(2\sigma^2)$. Eq. (5) has recently been confirmed by simulation (Lavaud, 1998). The recent results for the continuous-time case (Narayan, 1998; Norros et al., 1995) appear to be inconsistent with Eq. (5): the mass of the tail in Eq. (5) can be arbitrarily small, whereas the mass of the continuous-time model tail is of the same order of magnitude as 1 for all choices of parameters. In effect, the continuous-time model appears to be satisfactory as a heavy-traffic approximation only, whereas the light-traffic case may really be of more interest.
This apparent inconsistency confirms that the continuous- and discrete-time models are quite different. To be specific, the negative-traffic feature of the continuous-time model is always important, whereas in the discrete-time model the quantity of negative traffic may be quite small. We shall return to the contrast between the continuous- and discrete-time models after the main results of the paper are presented in Sections 4 and 5.
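As a cross-check on the exponent shared by Eqs. (3) and (5), the following Python sketch evaluates the coefficient $c$ in $\log P\{V_\infty > t\} \approx -c\,t^{2-2H}$ and compares it with the algebraically equivalent form $|m|^{2H}/(2\sigma^2\kappa(H)^2)$, $\kappa(H) = H^H(1-H)^{1-H}$, which is, to our understanding, how the fractional Brownian storage result of Norros (1994) is commonly written. It also checks that $H = 1/2$ gives the exponential rate $2|m|/\sigma^2$, in line with $|s^*|$ in Eq. (4) and with $\theta^*$ of Proposition 1 for short-range-dependent Gaussian input with $v = \sigma^2$. Here `m` stands for the mean net input per slot (written $E\{Y_1\}$ in Eq. (3)); the function names are illustrative, and the mass factor in Eq. (5) is deliberately left out.

```python
import math


def eq5_exponent(H, sigma2, m):
    """c = |1-H|^(-2) (H / |(1-H) m|)^(-2H) / (2 sigma^2), as in Eqs. (3) and (5)."""
    return abs(1.0 - H) ** -2 * (H / abs((1.0 - H) * m)) ** (-2.0 * H) / (2.0 * sigma2)


def kappa_form_exponent(H, sigma2, m):
    """Equivalent form |m|^(2H) / (2 sigma^2 kappa(H)^2), kappa(H) = H^H (1-H)^(1-H)."""
    kappa = H ** H * (1.0 - H) ** (1.0 - H)
    return abs(m) ** (2.0 * H) / (2.0 * sigma2 * kappa ** 2)


def log_tail(H, sigma2, m, t):
    """Leading-order log P{V > t} = -c * t^(2 - 2H); ignores the mass factor."""
    return -eq5_exponent(H, sigma2, m) * t ** (2.0 - 2.0 * H)


for H in (0.5, 0.7, 0.9):
    c1 = eq5_exponent(H, 1.0, -0.2)
    c2 = kappa_form_exponent(H, 1.0, -0.2)
    print(f"H={H}: c={c1:.6g} (kappa form {c2:.6g}), "
          f"log P(V>100) ~ {log_tail(H, 1.0, -0.2, 100.0):.4g}")
```

For $H > 1/2$ the exponent $2 - 2H$ is below one, so the tail decays more slowly than any exponential; this is the Weibull-type behaviour that distinguishes Case 2 from Case 1.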

4. The suitability of weak convergence

Since the queueing performance of a communication system depends more on the upper tail of the sample distribution of the traffic than on the rest of the distribution,
it is conceivable that a sequence of traffic models could converge in distribution to Gaussian even though the queueing performance of the sequence does not converge to the queueing performance of the Gaussian traffic. The result of Kennedy (1972) and Whitt (1974) on the continuity of queues seems to ensure, however, that as long as a sequence of stochastic processes (arrival processes and service processes) converges in distribution (i.e. the probability measures converge weakly) to a Gaussian process, the queueing distributions will converge weakly. The case of concern here is where we aggregate traffic streams together, which is a little different from the situation considered in Kennedy (1972) and Whitt (1974). In particular, the function from one sample-path space to another which represents the queueing function is Eq. (1). In order to apply the continuous mapping principle (Billingsley, 1968, Section 5), we need to show that this function is continuous (or at least continuous almost everywhere).

Proposition 2. If the probability measures of the processes $\{Y_n^{(k)}\}_{n\ge0}$, $k = 1, 2, \ldots$, converge weakly to the probability measure of the process $\{Z_n\}_{n\ge0}$, then the probability measure of the buffer occupancy process driven by $\{Y_n^{(k)}\}_{n\ge0}$ converges weakly to that of the buffer occupancy process driven by $\{Z_n\}_{n\ge0}$.

Proof. By the continuous mapping principle of Billingsley (1968), what we need to prove here is the continuity of the mapping $f: \mathbb{R}^\infty \to \mathbb{R}^\infty$ defined by $f: \{Y_n\}_{n\ge0} \mapsto \{V_n\}_{n\ge0}$ using formula (1). The appropriate topology on $\mathbb{R}^\infty$ is the product topology, which is equivalent to the topology induced by the metric described in Appendix 1 of Billingsley (1968).
The sets $N_k^\epsilon(x) = \{y : |y_i - x_i| < \epsilon,\ i = 1, \ldots, k\}$, $x \in \mathbb{R}^k$, $k \ge 1$, $\epsilon > 0$, form a base for this topology in $\mathbb{R}^\infty$ (Billingsley, 1968). So, to prove continuity of the mapping, it is sufficient to show that, for each such neighbourhood centred on a point $x$ in the range of $f$ and each point $x_0$ with $f(x_0) = x$, there is a neighbourhood centred on $x_0$ which is mapped by $f$ into $N_k^\epsilon(x)$. Choose a specific neighbourhood of this form and suppose that $f(x_0) = x$. By Eq. (1), each of the first $k$ buffer values is a maximum of partial sums of at most $k$ of the first $k$ inputs, so perturbing each of those inputs by less than $\epsilon/k$ perturbs each of these buffer values by less than $\epsilon$. Hence the neighbourhood $N_k^{\epsilon/k}(x_0)$ is mapped into $N_k^\epsilon(x)$. This proves the continuity of $f$.

It follows from the continuous mapping principle (Billingsley, 1968) that if the input traffic processes to a queue converge weakly to Gaussian form, then the probability measures of the queue contents converge to the form expected when the input is Gaussian.

Note that so far we have not mentioned the stationary probability measure of the queueing processes associated with this traffic, which is where our interest really lies. It is essential for the convergence concept we use to have the property that when traffic processes converge to a certain limit, the stationary buffer-contents probability
measures of these systems also converge to the appropriate limit. This result does not follow readily, if at all, from Proposition 2. To achieve the widest possible applicability, we want a result which applies not just to the marginal stationary probability measure of the buffer contents at one point in time, but to the joint stationary probability measure of the buffer contents at all points in time. Note that this continuity result for stationary probability measures was not presented in Kennedy (1972) or Whitt (1974). Furthermore, it is this result, rather than Proposition 2, which is important in the sequel.

Proposition 3. Assume, in addition, that the traffic processes $\{Y_n^{(k)}\}_{n\ge0}$ and $\{Z_n\}_{n\ge0}$ are stationary and ergodic, and that the queueing systems into which they feed are stable. Then, as $k \to \infty$, the joint stationary probability measure of the queue lengths of the system with input $\{Y_n^{(k)}\}_{n\ge0}$ tends weakly to that of the system with input $\{Z_n\}_{n\ge0}$.

Proof. Suppose now that the traffic input processes are defined for negative as well as positive time indices, extending the probability measures to this space in the only way which is stationary and consistent with the probability measures of $\{Y_n^{(k)}\}_{n\ge0}$. Now define the following function from traffic samples to buffer contents:

$$ f(y) = \sup_n \sum_{j=0}^{n} y_{-j}, \tag{6} $$

which is just the buffer contents of the system at time zero, given that the net input to the queue up to now is $\{y_n\}_{n\le0}$. The assumption of stability, namely that the stationary mean satisfies $m^{(k)} = E\{Y_n^{(k)}\} < 0$, ensures that this function is well defined almost everywhere. The argument goes as follows. By Birkhoff's ergodic theorem, which is applicable because of the assumed ergodicity of all the traffic processes, for all $k \ge 0$,

$$ P\left\{ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n} Y_{-j}^{(k)} = m^{(k)} \right\} = 1. \tag{7} $$

Recall that $m^{(k)} = E\{Y_0^{(k)}\} < 0$. Expressed more formally, (7) means that, except on a set of probability zero, for any $\epsilon > 0$ (in particular, choose $\epsilon < |m^{(k)}|$) there exists $N > 0$ such that for all $n > N$, $\left| (1/n) \sum_{j=0}^{n} Y_{-j}^{(k)} - m^{(k)} \right| < \epsilon$. So, for $n > N$, $(1/n) \sum_{j=0}^{n} Y_{-j}^{(k)}$ is negative. Consequently, the supremum of $\sum_{j=0}^{n} Y_{-j}^{(k)}$, as $n$ varies over $\{0, 1, \ldots\}$, must occur for $n \le N$. It follows that, for all $k \ge 0$, the set of sample paths $y$ for which $\sup_n \sum_{j=0}^{n} y_{-j}$ is not attained for any $n$ has measure zero in the probability measure of the process $\{Y_n^{(k)}\}$. The argument is identical if applied to the process $\{Z_n\}_{n\ge0}$ and shows, in addition, the following. Let $A_N$ denote the set of sample paths where the supremum in Eq. (6) occurs when $n \le N$, i.e. $A_N = \{y : \sup_{n\le N} \sum_{j=0}^{n} y_{-j} = f(y)\}$. Then $P(Z \in \bigcup_{N\ge0} A_N) = 1$.

We want to show that the probability measure of $f(\{Y_n^{(k)}\}_{n\in\mathbb{Z}})$ converges weakly to that of $f(\{Z_n\}_{n\in\mathbb{Z}})$ as $k \to \infty$. By Theorem 5.1 of Billingsley (1968), this will follow if we can show that $f$ is continuous except on a set of measure zero in the probability
measure of $\{Z_n\}_{n\in\mathbb{Z}}$. But $f$ is continuous on $A_N$, for each $N$, by the same argument as in the previous proposition, and we have just seen that $P(\bigcup_{N\ge0} A_N) = 1$. Therefore $\bigcup_{N\ge0} A_N$ is the desired set, of probability 1, on which $f$ is continuous.

We are not quite done yet, because we want to show that the joint stationary probability measure of the buffer contents at all times also shares this continuity property. For this purpose it is sufficient to check the continuity property for the joint probability measure of any finite subset of buffer lengths, because weak convergence of the complete joint probability measure follows from weak convergence of the finite-dimensional projections (Billingsley, 1968, p. 19). Let us now define $f^{[J]} : \mathbb{R}^\infty \to \mathbb{R}^J$ to be the function $f^{[J]}(y) = (f_0(y), \ldots, f_{J-1}(y))$, $y \in \mathbb{R}^\infty$, where

$$ f_j(y) = \sup_n \sum_{l=-j}^{n} y_{-l}. \tag{8} $$

The argument already given for $f$ defined at Eq. (6) applies without significant change if we replace $f$ by $f^{[J]}$, so the joint stationary distribution also has the weak convergence property.

These two propositions are offered as evidence that weak convergence is the right concept of convergence to use for probability measures of traffic processes in high-speed networks. They say, effectively, that when two types of traffic are close in this sense, their performance characteristics will also be close; this applies both to the time-dependent performance characteristics observed as a system gradually tends to its stationary state and to the stationary performance characteristics. A great variety of performance measures may be defined in terms of the joint distribution of buffer contents; for example, jitter, which may be defined as the variance of delay, may be estimated as the variance of the buffer contents. Consequently, the distribution of any estimator of jitter based on a finite sample from the queueing systems with input $\{Y_n^{(k)}\}_{n\in\mathbb{Z}}$ must converge weakly to that of the corresponding estimator for the queueing system with input $\{Z_n\}_{n\in\mathbb{Z}}$, and, as a consequence, the mean of these estimates, i.e. the jitter itself (assuming an unbiased estimator is used), must also converge. It is not reasonable to insist that weak convergence is the only possible concept of convergence applicable to traffic processes. However, the results of this section show that it has a strong claim to providing a valid and useful concept of convergence and approximation, especially in a context where performance issues are the primary concern.

5. A Gaussian limit

The fact that the streams being aggregated exhibit long-range dependence apparently does not affect convergence of the finite-dimensional projections. A sufficient condition for being able to apply a Central Limit Theorem is that each traffic stream has finite variance, that the traffic streams are independent of each other, and that the
contribution of each individual traffic stream to the variance of the aggregate tends to zero. The Cramér–Wold device (Billingsley, 1968, Theorem 7.7) can be used to extend a one-dimensional CLT to a multi-dimensional one in any finite number of dimensions. This is the critical argument, and it relates directly to a matter of key interest in this paper, the long-range dependence of the traffic processes. For this reason, it is appropriate to quote the relevant theorem and to spell out the argument in detail. Theorem 7.7 of Billingsley (1968) is:

Theorem 1. In $\mathbb{R}^k$, $X_n$ converges in distribution to $X$ if and only if each linear combination of the components of $X_n$ converges in distribution to the corresponding linear combination of the components of $X$.

From this, together with Proposition 3, it is relatively straightforward to obtain:

Proposition 4. If the traffic streams have finite mean and variance, a common limiting autocorrelation function and mean, and the largest fraction of the variance contributed by any one traffic stream tends to zero, then the aggregate normalised traffic obtained by averaging increasing numbers of independent traffic streams converges weakly to a Gaussian form, and the stationary queueing distribution function also converges weakly to that of the corresponding Gaussian traffic.

Proof. Let us denote the individual traffic streams by $\{X_n^{(j)}\}_{n\in\mathbb{Z}}$, $j = 1, 2, \ldots$, and the aggregate normalised traffic by

$$ Y_n^{(k)} = \left( \sum_{j=1}^{k} X_n^{(j)} \right) \Bigg/ \sqrt{\sum_{j=1}^{k} \sigma_j^2}, \quad n \in \mathbb{Z}, $$

in which $\sigma_j^2$ denotes the variance of the $j$th traffic stream. Our objective is to prove that $\{Y_n^{(k)}\}_{n\in\mathbb{Z}}$ converges in distribution, as $k \to \infty$, to $\{Z_n\}_{n\in\mathbb{Z}}$, a Gaussian process with the common limiting mean and autocorrelation function of the streams $\{X_n^{(j)}\}_{n\in\mathbb{Z}}$. To make matters a little simpler, recall that weak convergence of a sequence of probability measures on $\mathbb{R}^\infty$ can be inferred from weak convergence of the finite-dimensional projections of these measures (Billingsley, 1968, p. 19). So, for the moment, let us limit our attention to the components of the traffic processes with indices $0, \ldots, N$. In order to apply Theorem 1, choose any linear combination of these components of the aggregate traffic stream, for example

$$ Y_a^{(k)} = \sum_{n=0}^{N} a_n Y_n^{(k)}, \quad k \ge 1. $$

We need to show that $Y_a^{(k)}$ converges in distribution to $Z_a$, the same linear combination of the components of $\{Z_n\}_{n\in\mathbb{Z}}$. Now

$$ Y_a^{(k)} = \left( \sum_{j=1}^{k} \sum_{n=0}^{N} a_n X_n^{(j)} \right) \Bigg/ \sqrt{\sum_{j=1}^{k} \sigma_j^2}, $$
so, by the one-dimensional Central Limit Theorem, $Y_a^{(k)}$ converges in distribution to $Z_a$. Since this was an arbitrary linear combination, Theorem 1 now implies that the vector $(Y_0^{(k)}, \ldots, Y_N^{(k)})$ converges in distribution to $(Z_0, \ldots, Z_N)$. This shows that the probability measure of the aggregate traffic streams converges weakly to a Gaussian distribution. The rest follows from Proposition 3.

Although the variance of the traffic demand from an individual workstation or networked device is probably infinite (this is suggested by the adequacy of the M/Pareto model in capturing second-order properties of network traffic), it is not practically possible for the traffic from one source to have infinite variance, because this variance is limited by the maximum rate at which the networking equipment is able to submit traffic to the network. However, for the conditions of Proposition 4 to be invalidated, it might be sufficient for the variance of the traffic typically generated in a network to grow without bound over time, and this is likely to happen because of improvements in technology. On the other hand, all we require for Proposition 4 to hold is that the fraction of the aggregate variance contributed by any one source should decrease to zero. This condition could fail if, for example, whenever a new, faster transmission technology is introduced, it is used not only in the central core of the network but also by individual hosts generating traffic streams which exercise this technology to the limit. In my opinion, it is unlikely to be in the best interest of network providers, or of the bulk of network users, to allow this to happen. It makes more sense, in fact, for network providers to ensure that the peak rate (and therefore also the variance) contributed by each individual user is limited to a value which ensures that the conditions of Proposition 4 can confidently be regarded as satisfied.
The results of this paper can be viewed as an argument in favour of such a practice.

Example 1 (A special case). Suppose the variance–time function of the process $\{X_n\}$, i.e. $n \mapsto \operatorname{Var}(\sum_{k=1}^{n} X_k)$, is $n \mapsto \sigma^2 n^{2H}$, and $\{Y_n^{(k)}\}$ is obtained by aggregating $k$ independent instances of $\{X_n\}$ and subtracting the service capacity per time slot, $\mu_k$. In order to produce a non-trivial result, we need to increase the speed of transmission as $k$ grows; we choose a specific speed for each level of aggregation which produces a simple result. Thus

$$ Y_n^{(k)} = \sum_{j=1}^{k} X_n^{(j)} - \mu_k, $$

where each $\{X_n^{(j)}\}$ is distributed like $\{X_n\}$ and all of these processes are independent. It follows that the variance–time function of $\{Y_n^{(k)}\}$ is $n \mapsto \sigma^2 k n^{2H}$. In order to identify a specific limit for the stationary distribution of work, it is convenient also to vary the unit in which work is expressed. Let us denote this unit
by $\gamma_k$. The variance–time curve of $\{Y_n^{(k)}\}$ in these units is therefore $n \mapsto \sigma^2 k n^{2H}/\gamma_k^2$. By choosing $\gamma_k = \sqrt{k}$ we can ensure that this curve remains fixed for all $k$. The mean net input into this system, in the chosen units, is

$$ m_k = \frac{kE\{X\} - \mu_k}{\sqrt{k}} = \sqrt{k}\,E\{X\} - \frac{\mu_k}{\sqrt{k}}. $$

By choosing

$$ \mu_k = kE\{X\} - \sqrt{k}\,m, \tag{9} $$

we can ensure that $m_k = m$, which is therefore also fixed, in the chosen units, for all $k$. This shows that, when expressed in the units $\gamma_k$, the stationary queueing distribution of the system is the same as that of the Gaussian queueing system with variance–time curve $n \mapsto \sigma^2 n^{2H}$ and mean net input $m$, for all $k$. If we use Eq. (5) to approximate this queueing distribution, we find that, expressed in the original time units, it is

$$ P\{V_\infty > t\} \approx \frac{\sqrt{2}\,\psi(-m,\sigma)}{\sigma} \exp\!\left( -\frac{|1-H|^{-2}}{2\sigma^2} \left( \frac{H}{|(1-H)m|} \right)^{-2H} (\sqrt{k}\,t)^{2-2H} \right). \tag{10} $$

The implications of this example are largely independent of Eq. (5). In order to achieve a stationary distribution which is stable in units proportional to $\sqrt{k}$ (which actually implies less and less buffering when measured in units proportional to the time it takes to transmit the work in the buffer), it is adequate for the margin above $kE\{X\}$ to increase at the somewhat slower rate of $\sqrt{k}$. Note that $m$ is negative, so $\sqrt{k}\,|m|$ is the performance margin of the server, i.e. the extra speed it requires in order to maintain performance at the desired standard. Eq. (9) shows that, as a proportion of the total system capacity, the performance margin steadily decreases. This is a concrete representation of the multiplexing gain which can be expected as networks become larger and traffic is aggregated. This formula for multiplexing gain will not apply until a Gaussian approximation is valid.

The continuous-time model can be viewed as a limiting case of the discrete-time model, as the discrete time steps become smaller and smaller. For this reason, the continuous-time model may provide an approximation to the discrete-time model; however, this is not necessarily the case. It depends on the length of the sampling interval. In effect, the continuous-time model lacks one parameter of the discrete-time model: the sampling rate.
The choice of this sampling interval is not arbitrary: if it is chosen too small, the unpleasant negative-traffic phenomenon of the Gaussian model can become quite significant, which reduces the realism of the model considerably. There is a comment in Narayan (1998) concerning this issue which seems to suggest that an additional coefficient of $\rho$ (the utilisation level of the server) is all that is
required to fix the problem. The argument which leads to this suggestion is somewhat incomplete. Multiplying the formula of Narayan (1998) by $\rho$ would make very little impact on the discrepancy between that result and those for the discrete-time model described above. The discrepancy is not due to errors or approximations in either formula; the models are simply different. It appears that, in order to model real traffic, the sampling rate required in a discrete-time Gaussian model separates it quite significantly from the corresponding continuous-time model. Another approach to discrete-time traffic models, which shares some of the features of the models considered in this paper and supports to a degree the results of Addie and Zukerman (1993, 1994), is presented in Choe and Shroff (1999).

The choice of sampling interval in a discrete-time Gaussian model has been discussed previously in Addie and Zukerman (1994) and is a delicate issue. The sampling interval must be long enough that the model does not assign an excessive probability to negative arriving traffic, but short enough that the queueing process is accurately modelled. If the standard deviation of the traffic arriving in a short period of time is large relative to the traffic arriving in a time interval which is already on the borderline of being too long for an adequate approximation of the queueing process, it may be impossible to choose a satisfactory sampling interval. Fortunately, as traffic is aggregated, the ratio of the standard deviation to the mean inevitably decreases, and consequently there should be a stage at which a range of suitable choices for the sampling interval is available.

On the other hand, perhaps the continuous-time models considered up to now can be improved by introducing higher levels of correlation at small time scales; for example, the fractional Brownian noise (FBN) model could be modified by adjusting its autocorrelation function near the origin. In this way it might be possible to limit the amount of negative traffic to a minimal level without having to select a sampling interval.
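The scaling argument behind Eqs. (9) and (10), and the sampling-interval trade-off just described, can be illustrated numerically. The Python sketch below is illustrative only and rests on assumptions that are not in the text: it takes the exponent of Eq. (10) to be the Gaussian large-deviation rate $\inf_{n>0}(t+|m|n)^2/(2\sigma^2 n^{2H})$ (whose minimum reproduces the closed form), it ignores the prefactor from Eq. (5), and it models the traffic one source delivers in a sampling interval of length `delta` as Gaussian with mean `a * delta` and variance `sigma**2 * delta**(2*H)`. All function and parameter names are hypothetical. It is meant to show that the normalised mean net input stays at $m$ for every $k$, that the margin fraction $\sqrt{k}|m|/\mu_k$ shrinks as $k$ grows, and that the model's probability of negative traffic falls as either $k$ or the sampling interval grows.

```python
import math


def service_rate(k: int, mean_x: float, m: float) -> float:
    """Eq. (9): mu_k = k E{X} - sqrt(k) m, with m < 0 the normalised mean net input."""
    return k * mean_x - math.sqrt(k) * m


def normalised_mean_input(k: int, mean_x: float, m: float) -> float:
    """m_k = (k E{X} - mu_k) / sqrt(k); with mu_k from Eq. (9) this equals m for all k."""
    return (k * mean_x - service_rate(k, mean_x, m)) / math.sqrt(k)


def margin_fraction(k: int, mean_x: float, m: float) -> float:
    """Performance margin sqrt(k)|m| as a fraction of the total capacity mu_k."""
    return math.sqrt(k) * abs(m) / service_rate(k, mean_x, m)


def exponent_closed_form(t: float, sigma: float, H: float, m: float) -> float:
    """Exponent of Eq. (10), with t already expressed in units of sqrt(k)."""
    return ((1 - H) ** -2 / (2 * sigma**2)) * (H / abs((1 - H) * m)) ** (-2 * H) * t ** (2 - 2 * H)


def exponent_numeric(t: float, sigma: float, H: float, m: float, iters: int = 200) -> float:
    """Assumed large-deviation form: minimise (t + |m| n)^2 / (2 sigma^2 n^(2H)) over n > 0.

    The objective decreases and then increases in n, so a ternary search on log(n) finds it.
    """
    def objective(log_n: float) -> float:
        n = math.exp(log_n)
        return (t + abs(m) * n) ** 2 / (2 * sigma**2 * n ** (2 * H))

    lo, hi = -20.0, 20.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if objective(a) < objective(b):
            hi = b
        else:
            lo = a
    return objective((lo + hi) / 2)


def prob_negative(k: int, a: float, sigma: float, H: float, delta: float) -> float:
    """P{aggregate input of k independent sources in one interval < 0} under the
    hypothetical per-source model: mean a*delta, variance sigma^2 * delta^(2H)."""
    z = math.sqrt(k) * a * delta ** (1 - H) / sigma  # aggregate mean / aggregate std dev
    return 0.5 * math.erfc(z / math.sqrt(2))  # standard normal CDF evaluated at -z


if __name__ == "__main__":
    mean_x, m, sigma, H = 1.0, -0.5, 1.0, 0.8
    for k in (1, 100, 10_000):
        print(k, normalised_mean_input(k, mean_x, m), margin_fraction(k, mean_x, m))
    print(exponent_closed_form(10.0, sigma, H, m), exponent_numeric(10.0, sigma, H, m))
    for delta in (0.01, 0.1, 1.0):
        print(delta, prob_negative(1, 1.0, sigma, H, delta), prob_negative(100, 1.0, sigma, H, delta))
```

The ternary search is used only as an independent cross-check of the closed-form exponent; any one-dimensional minimiser would serve.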

6. A paradox

An interesting model, in which random overlapping bursts, each with a Pareto-distributed length, arrive according to a Poisson process, was discussed in Likhanov et al. (1995). The traffic can be represented as the server occupancy process of an infinite-server queue into which flows a Poisson stream of jobs, each with a Pareto-distributed service time. Let us refer to this as an M/Pareto model. In Parulekar and Makowski (1996), it was shown that the asymptotic properties of the queued-work distribution of a system in which the input traffic is of this M/Pareto type are inconsistent with those of the fractional Brownian noise process and of the discrete-time Gaussian models (which have the same asymptotic tail shape). In the notation of this paper, the result of Parulekar and Makowski (1996) states that
$$\liminf_{t\to\infty} \frac{1}{\log t}\,\log P\{V_\infty > t\} \ge (\alpha - 1)E\{Y_n\},$$
in which $\alpha$, $1 < \alpha < 2$, is the parameter of the Pareto distribution, which can be defined, to be specific, by $P\{\tau > t\} = (t+1)^{-\alpha}$, where $\tau$ denotes a burst length. This result is inconsistent with Eq. (3). The quantity of traffic arriving in an interval in this model also has finite variance, and so the M/Pareto model falls in the domain of attraction of the Gaussian model. There is a discrepancy between these two models in that the M/Pareto model is a continuous-time model, whereas the Gaussian model of this paper is a discrete-time model. However, if we sample the M/Pareto model, or interpret the Pareto distribution as a distribution on the integers, it can be compared directly to the Gaussian model; in fact, the model was regarded as a discrete-time model in Parulekar and Makowski (1996). So what happens when we aggregate M/Pareto models? We get another M/Pareto model, with different parameters. And yet the CLT says that it should be becoming more and more like Gaussian traffic! It is true that, as the contribution of each source diminishes in significance, in discrete time, the M/Pareto model converges weakly to the discrete-time fractional Brownian noise model, i.e. the probability measures of the aggregate processes converge weakly to a Gaussian probability measure; so, by Proposition 3, the stationary distribution of $\{V_n\}_{n\ge 0}$ must also tend to the stationary distribution in the Gaussian case, which has the asymptotic tail properties set out in Eq. (3). There is only one possible explanation of this apparent discrepancy: the asymptotic tail properties of the stationary distribution of $\{V_n\}_{n\ge 0}$ may become less and less significant, so that the distribution as a whole may tend to something without these tail properties. This raises the issue that, for any distribution, asymptotic properties of the form $P\{\cdot > t\}$ for large $t$ can be misleading. What if $t$ has to be at least $10^{10^{10}}$ before these properties take hold?
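To make the M/Pareto model concrete, the following Python sketch samples it at integer times. It is a minimal illustration under stated assumptions, not the construction used in the cited papers: each active burst is assumed to contribute one unit of traffic, burst lengths are drawn by inverse transform from $P\{\tau > t\} = (t+1)^{-\alpha}$, and a finite warm-up period stands in for the stationary regime. The helper names are hypothetical. The usage block also illustrates the aggregation step discussed above: superposing independent M/Pareto sources with a common $\alpha$ gives, in law, a single M/Pareto source whose rate is the sum of the rates, because a superposition of independent Poisson processes with i.i.d. marks is again such a process.

```python
import math
import random


def burst_length(u: float, alpha: float) -> float:
    """Inverse-transform sample of tau with P{tau > t} = (t + 1)^(-alpha), t >= 0, for u in (0, 1]."""
    return u ** (-1.0 / alpha) - 1.0


def stationary_mean(rate: float, alpha: float) -> float:
    """Mean number of active bursts in steady state: rate * E{tau} = rate / (alpha - 1), alpha > 1."""
    return rate / (alpha - 1.0)


def sample_mpareto(rate: float, alpha: float, n_slots: int, warmup: int, rng: random.Random) -> list:
    """Number of bursts active at integer times warmup, ..., warmup + n_slots - 1.

    Bursts start as a Poisson process of the given rate on [0, warmup + n_slots) and each
    occupies [start, start + tau). The warm-up only approximates stationarity: with
    1 < alpha < 2, bursts that began before time 0 are missed and their effect decays slowly.
    """
    horizon = warmup + n_slots
    counts = [0] * n_slots
    t = rng.expovariate(rate)
    while t < horizon:
        end = t + burst_length(1.0 - rng.random(), alpha)  # 1 - random() lies in (0, 1]
        # integer times s with t <= s < end, clipped to the observation window
        for s in range(max(math.ceil(t), warmup), min(math.ceil(end), horizon)):
            counts[s - warmup] += 1
        t += rng.expovariate(rate)
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    alpha = 1.5
    # Two independent sources with rates 2 and 3 superpose to an M/Pareto source of rate 5.
    a = sample_mpareto(2.0, alpha, 1000, 5000, rng)
    b = sample_mpareto(3.0, alpha, 1000, 5000, rng)
    combined = [x + y for x, y in zip(a, b)]
    print(sum(combined) / len(combined), stationary_mean(5.0, alpha))
```

Because the burst lengths have infinite variance for $1 < \alpha < 2$, the sampled counts are long-range dependent, so time averages such as the one printed above settle only slowly towards the stationary mean.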
Naturally, the tail properties of $\{Y_n\}_{n\ge 0}$ and of sums of the $Y_n$ will affect those of $\{V_n\}_{n\ge 0}$. However, if the tail of $\{Y_n\}_{n\ge 0}$ we are speaking of is so far from the mean and so low in mass as to be irrelevant, then the same will be true of the implied tail of $\{V_n\}_{n\ge 0}$.

Example 2 (Two limits). In order to clarify the relationship between the two conflicting limit results we have for the M/Pareto model, let us consider a much simpler case with the same structure. Suppose we have both a weak convergence limit and an asymptotic tail estimate for the sequence of random variables $B_k$, as follows:
$$\lim_{t\to\infty} P\{B_k > t\}\,e^{\lambda t} = C_1, \qquad k \in \mathbb{N},$$
and
$$\lim_{k\to\infty} P\{B_k > t\} = C_2\,e^{-\nu t}, \qquad t > 0,$$
in which $\lambda \ne \nu$. The first limit is an asymptotic limit characterising the tail of the distribution of each $B_k$; the second is a weak convergence result which can be used to
approximate whole distributions. The two limits appear to be inconsistent. However, here is a sequence of distributions for the $B_k$ which satisfies both these limits:
$$P\{B_k > t\} = \begin{cases} C_2\,e^{-\nu t}, & t \le k,\\ \min\{C_2\,e^{-\nu k},\; C_1\,e^{-\lambda t}\}, & t > k. \end{cases}$$
Observe that for large $k$, the asymptotic approximation provided by the first limit becomes more and more artificial. In this example, we have patched together, in a manner that is perhaps artificial, an array of probabilities which tends to different limits depending on the order in which the limits are taken. Nevertheless, whenever asymptotic tail approximations which are inconsistent with results for the Gaussian case are obtained, we find ourselves facing a situation very similar to this example. Non-Gaussian asymptotic tail approximations are frequently proposed in the literature on performance evaluation of communication systems. This example suggests that caution should be exercised in their use as general-purpose approximations. The fact that there is a choice between a Gaussian approximation and a more specific tail approximation, and the possibility that the Gaussian approximation might prove to be better, was previously noted in passing in Montgomery and de Veciana (1996).

7. Concluding remarks

As the number of traffic streams being aggregated together in the large central pipes of the public networks of the future grows, we can expect the traffic being carried to look more and more Gaussian, not only from a superficial point of view, but also in the sense that the performance of multiplexers carrying this traffic is well approximated by a Gaussian model. All this can be stated and proved quite readily for a discrete-time model of a communication network. This presupposes the choice of a sampling interval, and it would be desirable to be able to address the same issues in the framework of continuous time.

One criticism of the convergence results given above should be mentioned. We have not obtained any rate-of-convergence results, so, until such results have been obtained or experiments have been conducted with real data, the convergence to a Gaussian result may be purely theoretical. The speed with which networks are growing, and with which more and more traffic is accumulating on shared paths, is staggering. This fact about the real world provides some grounds for believing that results referring to limits as more and more traffic is aggregated are important and provide useful insight even when support from other approaches is lacking. Also, recent results, to be reported in Addie et al. (1998), confirm that convergence occurs for aggregates of traffic easily within reach of current growth predictions.

One of the advantages of the M/Pareto model is that it is defined in a continuous-time framework, thereby avoiding the choice of an arbitrary sampling interval. On the other hand, it appears to be difficult to analyse its queueing performance, and analysis
based on tail behaviour produces a result which is inconsistent with the corresponding Gaussian model. Since the M/Pareto model can be viewed as an increasing aggregation of traffic, one expects the marginal distribution of its sampled traffic to tend to that of the corresponding Gaussian model. An explanation for this apparent anomaly is that the tail behaviour becomes less and less significant as the rate parameter of the M/Pareto model increases. The FBN model is another continuous-time model which appears attractive and appears to be very similar to the Gaussian discrete-time models discussed in this paper; however, the unusual behaviour of this model at fine time scales appears to have surprisingly strong implications, and as a consequence the queueing behaviour of such a model is quite different from that of a discrete-time Gaussian model fitted to a realistic communication system. Use of a discrete-time model, as in the present paper, provides us with a tractable model which is realistic, and the fact that it represents the limit of non-Gaussian discrete-time models as traffic accumulates appears to render it suitable for application to the core of real networks. Moreover, this model has simple but important implications concerning the architecture of high-speed networks as they become larger and larger.
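The patched sequence of Example 2 in Section 6 can be checked directly. The short Python sketch below encodes $P\{B_k > t\} = C_2 e^{-\nu t}$ for $t \le k$ and $\min\{C_2 e^{-\nu k}, C_1 e^{-\lambda t}\}$ for $t > k$, writing $\lambda$ for the rate in the tail limit and $\nu$ for the rate in the weak limit. The parameter values ($C_1 = 0.5$, $C_2 = 1$, $\lambda = 2$, $\nu = 1$) are arbitrary illustrative choices, not taken from the text, and the function name is hypothetical. For these values every $t > k$ lies in the region where the first limit holds exactly, so $P\{B_k > k^+\} = C_1 e^{-\lambda k}$ measures how much probability the tail approximation actually describes.

```python
import math


def patched_tail(t: float, k: int, c1: float, c2: float, lam: float, nu: float) -> float:
    """P{B_k > t} for the patched sequence of Example 2 (assumes c2 <= 1 and lam != nu)."""
    if t <= k:
        return c2 * math.exp(-nu * t)
    return min(c2 * math.exp(-nu * k), c1 * math.exp(-lam * t))


if __name__ == "__main__":
    c1, c2, lam, nu = 0.5, 1.0, 2.0, 1.0
    # First limit: for fixed k, P{B_k > t} e^{lam t} equals c1 once t is large enough.
    k = 5
    print([patched_tail(t, k, c1, c2, lam, nu) * math.exp(lam * t) for t in (10.0, 20.0, 40.0)])
    # Second limit: for fixed t, P{B_k > t} equals c2 e^{-nu t} as soon as k >= t.
    t = 3.0
    print([patched_tail(t, kk, c1, c2, lam, nu) for kk in (1, 2, 3, 10)], c2 * math.exp(-nu * t))
    # Probability just beyond t = k, where (for these values) the first limit is exact:
    # it vanishes as k grows, so the tail estimate describes less and less of the distribution.
    print([patched_tail(kk + 1e-9, kk, c1, c2, lam, nu) for kk in (1, 5, 10, 20)])
```

Both limits hold exactly for this sequence, yet for large $k$ the region governed by the first limit carries negligible probability, which is the situation the paradox of Section 6 suggests for the M/Pareto model.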

References

Addie, R.G., Zukerman, M., 1993. An approximation for performance evaluation of stationary single server queues. Proc. IEEE Infocom '93. IEEE, New York, March.
Addie, R.G., Zukerman, M., 1994. An approximation for performance evaluation of stationary single server queues. IEEE Trans. Commun.
Addie, R.G., Zukerman, M., Neame, T., 1998. Application of the Central Limit Theorem to communication networks. Proc. 6th IFIP Workshop on Performance Evaluation of ATM Networks.
Addie, R.G., Zukerman, M., Neame, T.M., 1995. Fractal traffic: measurements, modelling and performance evaluation. Proc. IEEE Infocom 1995. IEEE, New York, April.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.
Choe, J., Shroff, N.B., 1999. A central limit theorem based approach for analyzing queueing behaviour in high-speed networks. IEEE/ACM Trans. Networking, submitted.
Duffield, N.G., O'Connell, N., 1995. Large deviations and overflow probabilities for the general single-server queue, with applications. Math. Proc. Cambridge Philos. Soc. 118, 363–374.
Glynn, P.W., Whitt, W., 1993. Logarithmic asymptotics for steady-state tail probabilities in a single-server queue. J. Appl. Probab. 31A, 131–156.
Kennedy, D.P., 1972. The continuity of the single server queue. J. Appl. Probab. 9, 370–381.
Kesidis, G., Walrand, J., Chang, C.-S., 1993. Effective bandwidths for multiclass Markov fluids and other ATM sources. IEEE/ACM Trans. Networking 1, 424–428.
Lavaud, N., 1998. The fractional Brownian motion: traffic model and related studies. Master's Thesis, Monash University.
Leland, W.E., Taqqu, M.S., Willinger, W., Wilson, D.V., 1994. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Trans. Networking 2, 1–15.
Likhanov, N., Tsybakov, B., Georganas, N.D., 1995. Analysis of an ATM buffer with self-similar ("fractal") input traffic. Proc. IEEE Infocom 1995. IEEE, New York, April, pp. 1–15.
Montgomery, M., de Veciana, G., 1996. On the relevance of time scales in performance oriented traffic characterizations. Proc. IEEE Infocom '96.
Narayan, O., 1998. Exact asymptotic queue length distribution for fractional Brownian traffic. Adv. Performance Anal. 1.
Norros, I., Simonian, A., Veitch, D., Virtamo, J., 1995. A Benes formula for a buffer with fractional Brownian input. Proc. 9th ITC Specialists Seminar.
Norros, I., 1994. A storage model with self-similar input. Queueing Systems – Theory Appl. 16, 387–396.
Parulekar, M., Makowski, A.M., 1996. Tail probabilities for a multiplexer with self-similar traffic. Proc. IEEE Infocom '96, pp. 1452–1459.
Whitt, W., 1974. The continuity of queues. Adv. Appl. Probab. 6, 175–183.