ELSEVIER
Computer Communications 20 (1997) 1l-28
Research
Circuit emulation approach to traffic control in a B-ISDN Andrea Baiocchi”, Nicola BlCfari-Melazzib, Francesca Cuomoa, Marco Listantia7* “Dipartimento di Infocom, Universi@ of Roma Za Sapienza’, Via Euabssiana 18, 00184 Roma, Italy bDipartimento di Ingegneria Elettronica, University of Roma at Tor Vergata, Roma, Italy
(Received5 November 1994; revised 6 October 1995)
Abstract In this paper we propose a way to look at the congestion control problem in a B-ISDN that intends to gather the benefits of both circuit and packet transfer modes. We define a Circuit Emulation (CE) facility based on an admission policy at the edges of the network, and on a service discipline in the inner nodes that guarantees no information loss and no delay-jitter. On the other side, since a pure multirate-circuit emulation would be too inefficient for low bit rate bursty sources, we propose to use this facility only for a subset of the overall traffic. The CE facility is provided via the so called Worst Deterministic Pattern Allocation (WDPA) strategy, characterized by resource allocation rules as simple as those of the peak allocation. WDPA is based on the concept of preventively constraining information sources to emit their cells according to a superimposed deterministic mask. This forced regularity is exploited to control the cell transfer within the network. Practically, WDPA establishes the worst emission pattern of the source. Resource allocation is performed taking into account only the parameters of the declared deterministic mask. The same allocation rule is used in every network section. This target is achieved by means of an ad hoc defined Virtual Multiplexing Algorithm. Application of WDPA to an actual switching node based on classical output queueing is also discussed, and its compatibility with this architecture is demonstrated. A performance evaluation is carried out to highlight the benefits deriving from the proposed strategy. K~words:
Connection Admission Control; Traffic descriptors;Shaping; Deterministictrafficpattern; Periodicinput queues
1. Introduction
Congestion control in a B-ISDN for Constant Bit Rate (CBR) and Variable Bit Rate (VBR) sources is mainly based on call level control schemes. They aim at avoiding congestion through Connection Admission Control (CAC) functions regulating the admittance or denial of a connection on the basis of a set of user-declared Traffic Descriptors (TDs) at the call set-up phase [l]. The source peak cell rate is the simplest TD. It is easily predictable and monitorable, and it entails a very simple call acceptance rule. However, it allows only the exploitation of the multirate capability of ATM, and does not reap its potential statistical gain. Nevertheless, to avoid cell loss, the use of re-spacing equipment for each Virtual Channel (VC) in internal network sections is required [2]. To go beyond peak allocation, a statistical multiplexing scheme can be employed; however, a more complex set of * Email:
[email protected] 0140-3664/97/$17.00 0 1997 Elsevier Science B.V. All rights reserved PIZ SOl40-3664(96)01166-8
TDs is needed (e.g. long-term average cell rate, burst length statistics, etc.). Some drawbacks of this approach can be envisaged: a user can hardly predict the statistical characteristics of its own cell flow; the monitoring of TDs becomes harder and harder as the complexity of TDs grows, complicating the design of efficient and reliable Usage Parameter Control (UPC) equipment; the resource allocation rules are quite complex; in this context, the weight of a source flow, in terms of the amount of resources to be allocated, depends on the particular traffic mix currently loading a network link, so the allocation rules must be based on the solution of highly sophisticated queueing systems [3,4]; network performance are heavily biased by ‘bad’ sources (e.g. sources characterized by long bursts, high mean rates, etc.) imposing performance degradation also to all the other (possibly ‘better’) sources [5]; 9 excepting two levels of priority, ATM intends to offer the same QoS to every user. This is potentially inefficient l
l
l
l
12
l
A. Baiocchi et aLlComputer
since, for instance, there are many services that do not require 1O-9 cell loss ratio; finally, in a classical statistical multiplexing scheme, it is difficult to guarantee a negotiated QoS to a specific user: the network can guarantee average and long-term performance, but a single user, during a specific connection, can experience higher losses or higher delays.
In view of these considerations, it has been proposed to limit the goal of ATM and to focus on multirate-circuit emulation [6]. ATM would still have many advantages on STM paradigms (synchronization and multi-rate switching are not relevant problems in ATM), while enjoying the merits of this traditional framework. On the other side, a pure multirate-circuit emulation would be too inefficient for low bit rate, bursty sources. Most of the problems outlined above arise from the characteristics of part of the traffic that ATM will have to handle. Data networks were traditionally dimensioned under random traffic assumptions (e.g. Poisson arrivals), but the analysis of the traffic sources that the B-ISDN will have to support (voice, LAN traffic, video) has shown that, due to correlation effects. the arrival processes of these sources can not be modelled as Poisson ones. It this were possible, congestion control would not be a complex problem: simple queueing models (e.g. M/D/l queue) could be used to dimension and allocate resources. A look at the many performance studies that have appeared in the literature reveals that a fundamental parameter to look at is the ratio between the link bit rate and the peak bit rate of the sources to multiplex. Only when this ratio is high enough is statistical multiplexing convenient (or more precisely, feasible, without resorting to very complex mechanisms). This may seem obvious, but it is often overlooked. Correlations in the traffic flows or peculiar requirements of some sources can alter the range of the above ratio in which statistical multiplexing is convenient, but does not change this concept. In other words, ATM even intends to statistically multiplex, among others, sources with high bit rates and high burst lengths that could be more conveniently handled in a circuit-like way (especially if they demand very stringent QoS requirements). In this paper, we propose a way of looking at the congestion control problem that is not based on statistical multiplexing of all traffic sources. The proposed approach is faithful to the original ATM paradigm, and intends to gather the benefits of both circuit and packet transfer modes. The international bodies have defined three categories of traffic: Available Bit Rate (ABR), Constant Bit Rate (CBR) and Variable Bit Rate (VBR). With reference to this distinction, we propose to divide VBR traffic into two sub-classes: VBR,,, : sources that can be efficiently handled with a ‘statistical transfer mode’, with a high value of the ratio between the link bit rate and the peak bit rate of the sources; (modelled with very simple arrival processes, e.g. Poisson);
Communications
20 (1997) 11-28
VBRcrrc: sources that cannot be statistically multiplexed efficiently, or that require complex models to be represented satisfactorily, or that ‘prefer’ a circuitlike transfer mode (e.g. guaranteed QoS: very low delay jitter, no information loss). In this scenario we assume that sources belonging to the class VBRt, are handled according to a statistical allocation paradigm. Sources belonging to the class VBR,irc are handled emulating a multirate-circuit transfer mode, thus allocating a constant bit rate. In other words, we assume that the ATM network offers a Circuit Emulation (CE) facility. If there are no stringent delay requirements, the cell stream of a source belonging either to the VBR,, or VBR,-ir, class can possibly be shaped before entering the network. A shaping function can improve the traffic characteristics, thus reducing the network resources needed to obtain the desired QoS. In the case of VBF& sources using the CE facility, the only significant shaping function consists in reducing the peak cell rate, and thus the value of the requested bandwidth. Within this framework a B-ISDN should handle three categories of traffic: (i) sources handled with simple traffic assumptions (VBR,,,,) (shaped or not); (ii) sources handled by emulating multirate-circuit transfer mode (VBR,,,, shaped or not, and CBR traffic); (iii) ABR traffic. The advantages of this scheme are the following: (i) easy implementability; the core network can be very simple, since the traffic entering the network does not have nasty characteristics; all the complexity (including the shaping function) is confined in the access section; (ii) efficiency; statistical gain can be achieved for the VB$,,, traffic; for the VBRci,c traffic it is possible to exploit the activity statistics of each source via the shaping function so as to reduce its bandwidth demand with respect to the original peak bit rate. The relation between our approach and ABR traffic control (e.g. rate-based congestion control as standardized by the ATM Forum [7]) is discussed in Section 7. Our strategy, called the Worst Deterministic Pattern Allocation (WDPA), provides a CE service and allows the following goals to be achieved: possibility of providing different QoS to different sources; guaranteed QoS; no cell loss within the network for VBR,, and CBR traffic; adaptive redefinition of TDs and resource allocation re-negotiation for CBR and VBRtic sources; utilization of very simple, easily measurable and monitorable TDs; . possibility of allocation of any amount of bandwidth between mean and peak rate of a source; independence of sources in resource assignment
l
l
l
l
l
A. Baiocchi et aL/Computer Communications 20 (1997) II-28
procedure, so as to define resource allocation rules no more complex than those implied by peak allocation. Congestion control mechanisms based on the control of the source flows and on constrained multiplexing schemes similar to the WDPA have been proposed in the literature, e.g. the Virtual Framing (VF) scheme [8,9] and the Counter Based Congestion (CBC) control [lo]. Both these strategies allow the assignment of bandwidth to each connection independently of other existing connections. The VF scheme imposes a source to emit a maximum number of cells in a fixed time frame. CBC control allows each source to define its own frame independently, but a source is forced to emit a single cell in each frame. To avoid cell loss within the network, the network buffer sizes needed by both these mechanisms are strictly tied to the frame duration. This entails the need of a large amount of buffer to obtain a fine granularity in bandwidth allocation and, consequently, a high network efficiency. WDPA is a based on superimposing on the source flow a periodic, deterministic mask which enables or blocks cell emission, i.e. the mask defines the emission opportunities of the ATM cells. Obviously, an opportunity can be either used or missed (an opportunity is missed if there are no cells ready to be transmitted). This corresponds to establishing the worst emission pattern of the source. With respect to the VF and CBC schemes, in which the maximum number of cells to be emitted in a time window is declared a priori, the WDPA also requires that the time positions of the emission opportunities within the mask are fixed. This further constraint determines a sensitive saving in resource allocation. For each source, the resource allocation in a network section (access or internal) is performed, taking into account only the parameters of the declared deterministic mask, and it concerns the impact on both bandwidth and buffer consumption. In this sense, a source is seen by the network as a deterministic one. To impose the deterministic mask, a shaping device has to be provided at the source premises. The shaper is usercontrolled, and has the goal of disjoining the user environment from the network domain. The size of the shaper buffer and the value of the parameters of the imposed mask are dimensioned on the basis of the source characteristics and the tolerable cell loss and delay for the relevant application. At each multiplexing (or switching) stage, the time positions of the emission opportunities of each VC are properly preserved within a multiplexed flow. This target is achieved by means of an ad hoc defined Virtual Multiplexing Algorithm (VMA), which is based on a FIFO multiplexing of the emission opportunities of the various VCs. The main outcome of this approach consists in the utilization of the same acceptance rule in all the network sections crossed by a source-destination path. The WDPA is able to reach a sensitive efficiency gain, since it exploits the activity statistics of each source so as
13
to reduce their bandwidth demand with respect to the corresponding peak bit rates. Moreover, it allows a smaller buffer to be employed as compared to the VF and CBC schemes. In fact, the silence length and time position of the emission opportunities do not impact on the buffer sizing, but only the number of superimposed sources has to be considered. This paper is organized in eight sections. In Section 2, a general overview of the user functions is given, In particular, shaping and multiplexing are defined and some possible user configurations are discussed. Section 3 deals with the WDPA resource allocation rules. Section 4 is devoted to the description of the switching functions needed by the WDPA approach. In Section 5, we present some criteria for the dimensioning of the source shaper parameters, and compare other deterministic allocation strategies with the WDPA. Section 6 discusses the overall efficiency of the WDPA. Section 7 sketches a possible unifying framework that shows how the proposed approach can fit in with the current standardization activity and technology. Conclusions are drawn in Section 8.
2. User access scenario We refer to a slotted UNI, each slot containing a single ATM cell. The ATM link connecting the user and network sides of the UN1 carries a multiplexed stream composed of cells belonging to several VCs. Both shaping and multiplexing functions are performed at the user side of the UNI. In Section 2.1 the shaping function envisaged by the WDPA strategy for each VC is described; Section 2.2 deals with the multiplexing function; finally, Section 2.3 outlines different alternatives for the shaping and multiplexing functions distribution among the devices composing the user side of the UNI. 2.1. Shaping function The WDPA strategy establishes that each VC cell emission is specified by an emission pattern, obtained by means of the periodic repetition of a mask of length T slots. The definition of the pattern is under the complete control of the user, but once it has been fixed the user must strictly adhere to it, unless the pattern itself is successfully renegotiated. As a consequence, the network knows exactly what the worst case source emission can look like. Two situations can be envisaged for the user capabilities to choose a pattern profile that fits his needs: (i) stable, known sources (e.g. voice, specific video codec, etc.); (ii) time varying, unpredictable sources (e.g. LAN generated traffic). In the first case, the pattern parameters can be easily determined once for all that source class, whereas in the second case, thanks to the fast renegotiation capability of the proposed strategy, an adaptive sizing of the pattern profile can be performed.
14
A. Baiocchi et aLlComputer Communications 20 (1997) 11-28 Period - T
tI
4,
1
I
1
,
I
)
time Emission opportunities
re-negotiation of the pattern profile can eventually be performed to achieve a closer matching with the actual source activity. Anyway, such a potential inefficiency can be mitigated by utilizing the unused bandwidth to carry ABR traffic [7,11]. 2.2. Multiplexing function
Fig. 1. Example of Uniform (a, T)-pattern (a = 5, T = 18).
Let a denote the number of emission opportunities in a pattern period (1 5 a 5 T). A pattern is completely specified on the time axis, once T, a and the time positions of the emission opportunities are assigned as well as the starting time cpof the pattern. In the following, we refer to the so called uniform (a, T)-pattern, defined as follows. Let tn denote the time of the nth emission opportunity. Then
Fig. 1 depicts an example of uniform (a, T)-pattern with a = 5 and T = 18. Note that the dependence of (1) upon a and T suggests that it can be extended to the case where the bandwidth fraction takes arbitrary (non-rational) values p, simply by substituting T/u in (1) with l/p. Moreover, the uniform (a, T)-pattern remains unchanged if we multiply both a and T by the same integer factor j, since the emission epochs depend only on the ratio T/u. Therefore, in the following we assume, without loss of generality, a and T prime to each other. The rationale underlying (1) is to scatter the emission opportunities assigned to a source as uniformly as possible over the entire period T. That leads to the following advantages: (i) the shaper is guaranteed emptying opportunities with minimal variance of inter-opportunity intervals; this is desirable, since it is known that the performance of a queue with server vacations deteriorates as variance of the vacation times increases; (ii) if a VC requires a fraction q (0 c q 5 1) of the bandwidth of an ATM link, the uniform pattern must satisfy the constraint u/T 1 q; as the choice of a and T values does not have any impact on the buffer consumption, the relative excess bandwidth E = (u/T - q)/q can be made arbitrarily small, thus allowing a very precise tailoring of the pattern bandwidth to that actually required; (iii) the buffer consumption of a VC is minimized; in fact, given a VC characterized by a given bandwidth fraction q = u/T, the uniform (a, T)-pattern provides the minimum buffer consumption with respect to all other possible pattern configurations. Link utilization inefficiency arises because a source does not utilize all the emission opportunities of the negotiated pattern. In any case, tariffing criteria should take into account the amount of reserved resources, and a
If a multiplexing function is performed at the user side of the UNI, then the User-Network Interface supports a multiplexed flow of cells belonging to several VCs. Unfortunately, multiplexing alters the original VC patterns, e.g. clumping cells, otherwise spaced out in the original pattern, or imposing a variable delay on cells belonging to the same pattern. Furthermore, additional disorder has to be expected in each multiplexer or switching node crossed along the source-destination path. Two directions can be explored to face such a problem. First, a different allocation rule for each section of the network could be designed, based on the knowledge of the output traffic characteristics of the upstream nodes. Many efforts have been made [12,13] to understand and model the behaviour of an ATM multiplexer output. The results are rather disappointing, both in terms of achievable network efficiency and, above all, of traffic manageability and allocation rule complexity. Second, some mechanisms could be devised to adopt a unique (simple) allocation rule in all the network sections. Such a goal can be conceptually achieved by rebuilding at each network section the original VC patterns. That can be performed by simply removing the effect of multiplexing, i.e. equalizing the cell delays. Provided that the multiplexed flow is obtained from a set of rigorously patterned VCs, delay equalization requires knowledge of: (i) the maximum delay D a cell can suffer in crossing the upstream multiplexer; (ii) the actual delay of the first cell of each VC being multiplexed. In fact, assume that the first cell of a given VC shows up at the equalizing buffer at time r. and let dI denote its delay in the upstream multiplexer. Then the nth (n 5: 0) cell that VC can be extracted from the equalizing buffer at time T, = +ro+D - d, + t,(O), where t,(s) is given in Eq. (1). In this way, the pattern crosses the multiplexerequalizing buffer chain without ‘distorsion’, but only suffering a constant time shift (delay) equal to D cell transmission times. A possible delay equalization scheme is presented in Section 5. At the cost of increasing delays and buffer sizes, the above pattern reconstruction (regeneration in the following) can be simplified by removing the need of knowing dl, the delay suffered by the first cell of a given VC in the upstream multiplexing buffer. In fact, it suffices that the nth (n 2 0) cell of the VC be released from the regeneration buffer at time r, = 7. + D + t,(O). This ensures that no starving takes place at the regeneration buffer and implies that all VC cells cross the multiplexing-regeneration buffer chain
15
A. Baiocchi et al.lComputer Communications20 (1997) 11-28
Network
UNI
User side
side
(4
_1
Policer
1
WI
j
Policer
1
W
_1
Policer
1
-El-
(c)
Fig. 2. Network scenario. (a) Single-source/single-VC; @I) Multiple-sources/Concen ted-MUX, shaper at source premises; ($2) Multiple-sources/ Concentrated-MUX, integrated shaper and multiplexing; (c) Multiple-sources/Distributed-MUX. (GW: gateway; SH: shapes device; MUX: multiplexer.)
suffering a constant delay equal to D + di cell transmission times. It can also be shown that the regeneration buffer required to avoid cell loss is doubled with respect to the previous case in which d, is known. Note that no constraint has been posed on the multiplexing discipline up to now, provided D can be given a well defined value. Nor does any smart multiplexing algorithm improve delay performance in any way, because of the equalization carried out downstream the multiplexer. In this respect, all multiplexing algorithms are equivalent. Our approach is to adopt an emission opportunity multiplexing scheme, realized by means of the so called Virtual Multiplexing Algorithm (VMA). A multiplexer governed by the VMA determines in each output time slot the VC whose cell can be forwarded (if any), by simulating the output of a Virtual Queue (VQ). The VQ inputs are the VC patterns loading the actual multiplexer, each with its proper starting time: emission opportunities are queued up in the VQ according to arrival order. Clearly, the VQ realizes a FIFO multiplexing of the emission opportunities corresponding to the actually multiplexed VCs. A formal exposition of the VMA can be found in Section 4.
Simpler disciplines, like the FIFO handling of the incoming cells, are not considered, since they result in a multiplexer output stream that depends on source activity. This may cause some trouble both in the design of policing devices and in the implementation of equalizing buffers. While it does not impact on delay performance in any way, the choice of introducing a virtual multiplexing of emission opportunities instead of actual cells defines a logical resource to achieve guaranteed QoS cell delivery across the network. 2.3. Network access scenarios As far as the possible user access configurations are concerned, taking into account the impact of WDPA and VMA, three basic cases can be envisaged:
(i) single-source/single-VC; (ii) multiple-sources/concentrated-multiplexing; (iii) multiple-sources/distributed-multiplexing. These alternatives are depicted in Fig. 2. In the single-source-VC case (Fig. 2(a)), the WDPA
16
A. Baiocchi et al.lComputer
only imposes that the cell flow emitted by the source is structured according to the declared emission mask, to this end a shaper device is located downstream the source. In the second case (multiple-sources/concentratedmultiplexing) (Fig. 2(b)), a multiplexed cell stream flows through the UNI. This stream is emitted by a MUX equipment on the basis of the individual VCs generated by remote located sources. In this case, the WDPA imposes that the multiplexing of the shapered VCs be operated according to the VMA. The shaper functionality can be: (i) located at source premises (Fig. 2(bl)); (ii) integrated within the MUX (Fig. 2@2)). In the latter a shaping buffer saving can be reached. It is to be noted that a particular case of this user configuration is given by the single-source/multiple-VCs case (e.g. multimedia terminal). Each VC cell flow must be shapered and the overall cell flow must be organized according to the VMA rules. The last case (multiple-sources/distributed-multiplexing, Fig. 2(c)) concerns a more complex user scenario in which the information sources are connected to the UN1 through a structured Customer Premises Network (CPN) (e.g. LAN, MAN, etc.). In this case, gateway (GW) equipment has to be located in between the CPN and UNI. Such equipment has to perform both the shaper and VMA functions so as to achieve the complete decoupling between the CPN and the ATM network. It is useless to perform the shaper functions at the sources; in fact, in every case, it must be replicated within the GW. This is due to the fact that the distributed multiplexing operated within the CPN does not guarantee the correct application of VMA. At the network side of UNI, the policing functions aim at verifying that the actual emission pattern of each VC is consistent with that defined at the connection set-up, also taking into account the effects of VMA. In this sense, the policing unit has to know the number of VCs crossing the UN1 and their individual characteristics, i.e. the emission masks. It is worth noting that, although the policing equipment is not described in this paper, it is easily implementable, since its operation is based on the verification of deterministic events.
Communications
v(tl,tz) Q(t) Qmax
L
20 (1997) 11-28
number of arrivals at the MUX queue in the interval [tl, t2]; MUX buffer occupancy in cells at time t, in the case of an infinitely large buffer; maximum value of Q(t) over the entire time axis; number of active VCs.
As for the ith pattern, parameters: number pattern Ti ui(tl, t2) number pattern ai
we introduce
the following
of emission opportunities in a period; period; of arrivals at the MUX queue from the ith in the interval [tl, t2].
The MUX buffer occupancy Q(t) is clearly a periodic function, with the period equal to the least common multiple p, of the Ti, i = 1, . . ..L. Therefore Q,,, is the maximum of Q(t) over any time span of length p. Since we aim at achieving no cell loss, we require that the overall assigned bandwidth be no more than the MUX output link capacity, and that the queue length fluctuations due to concurrent arrivals be absorbed by the MUX buffer. Quantitatively, these requirements yield
(2) The numerical evaluation of Q,, should be done by means of an exhaustive search over the whole period )L, thus resulting in a time-consuming procedure, depending on the values of p and hence of I;:, i = 1, . .., L. Above all, Q, depends on the considered traffic mix in a non-linear way, so that the knowledge of Q,, for a given VC scenario cannot help determining the new value when a VC is eventually set-up or cleared. To obtain linear additive allocation rules with respect to the traffic mix, we resort to the evaluation of an upper bound of Q,,. In fact, a simple upper bound of Q,,, can be derived for the case at hand, as stated in the following:
Property 1. The maximum queue length achieved by an infinite buffer MUX, whose input consists of the superposition of L randomly phased uniform (a, T)-patterns, is upper bounded by the number of patterns, i.e. Q,, 5 L.
3. Call admission control according to the WDPA
Proof. Let G(x) be the survivor
scheme
occupancy.
The call acceptance rule associated to the WDPA strategy is described with reference to a Multiplexer (MUX) directly loaded by patterned cell streams. In the next section it will be shown how the same acceptance rule can be also utilized in all the network interfaces. We assume that time is slotted: the cell transmission time (time slot) is the time unit. Let c
B
MUX output link capacity; MUX buffer size in ATM cells;
G(x) =
$
function
of the buffer
It is known that [14]: r I vT,T+t-l)=t+X (
and
Q(?)=O}.
f=l
(3) We prove that there exists a (generally real) value X such thatitmustbev(T,T+l-l)
A. Baiocchi et aLlComputer Communications20 (1997) 11-28
17
where
Since 0 5 p(y) < 1, it follows that the right-hand sides of the inequalities (6) are all less than 1 and hence we can safely choose bi = 1, that yields Q,, 5 L. n In the end, we obtain the following simple buffer and bandwidth acceptance rules: (LIB 0
20
10
30
40
(7)
BUFFER POSITION a) Uniform (7,50)-pattern, L=7; b) Uniform (7,351) pattern, L=49; c) Non Uniform (a&pattern,
s=7, s=43, (T=50) /_=7;
d) Uniform (1,50) pattern, L=49. Fig. 3. Upper and lower bound of the survivor function of buffer occupancy vs. the buffer position: comparison between different values of L and different emission patterns.
Let us assume that, however we select a time interval [T, 7 + I - l] of length t slots, t 2 1, there exist a nonnegative quantity bi such that vi(7,7 + t - 1) < r; + bi. I
(4)
Summing up over the index i and reminding the bottom inequality in (2), it follows that ~(T,T + t - 1) < + bL. Therefore, according to the cont+bl+b,+-.. siderations at the beginning of the proof, X 2 bl+ b2+...+bLandsoQ,~‘:[bl+b2+.,.+bL1. There remains to find the b,‘s. Let us consider a window of length I time slots, say [T,T + t - 11. Since we aim at evaluating an upper bound of V~(T,7 + t - l), it can be assumed that an emission opportunity, say the noth one, no 2 0, issued at time t,,, occurs in the time slot T, i.e. &, = T (worst case phasing between the considered window and the arrival pattern). It is easy to verify that the validity of (4) can be ascertained just by requiring the (4) holds for the time points corresponding to arrivals in a single period of the emission pattern starting from t%I’ i.e. the following inequalities have to be satisfied simultaneously:
lsjlai. Simple algebraic manipulations yield
The key features of the allocation rule (7) are the linearity in the traffic mix and its extreme simplicity. Each VC can be attributed an individual contribution to bandwidth and buffer consumption; different VCs contributions simply sum up to give the overall assigned amount of resources, just as in the case of peak rate allocation. Finally, we estimate the upper bound of property 1 by comparing it with the buffer occupancy derived by the analysis of the queue CiDi(ai, Ti)lD/l, wherein Di(ai, Ti) denotes a deterministic process whose arrivals correspond to the emission opportunities of a uniform (a, T)-pattern. The analysis of this queue has been carried out by extending the results presented in Refs. [14,15] and can be found in the appendix. Fig. 3 shows the upper and lower bounds of the survivor functions of buffer occupancy vs. the buffer position in two cases: (a) L = 7 sources each characterized by an uniform pattern with a = 7 and T = 50 (curves a); (b) L = 49 sources each characterized by an uniform pattern with a = 7 and T = 351 (curves b). Results concerning case (a) are compared with those arising from the application of the non uniform (a,~)-pattern (this pattern is studied in Ref. [16]) (curve c.). In this pattern the emission opportunities (a = 7) are consecutive and the buffer consumption is greater (s is the length of the forced silence period, i.e. a + s = T). Instead, results concerning case (b) are compared with those relevant to the application of a (1,50)-pattern whose bandwidth is slightly larger but is characterized by the minimum value of a (curve d). The main comments on this figure are: (i) if the number of multiplexed sources is low (e.g. L = 7) the property 1 provides a tight upper bound to the actual buffer occupancy; (ii) the buffer saving achieved by means of the uniform (a, T)-pattern can be very sensitive with respect to other pattern configurations; (iii) if the number of multiplexed sources is high (e.g. L = 49) the actual buffer occupancy can be quite lower than the upper bound; (iv) there exists no appreciable difference between the actual buffer occupancy
A. Baiocchi et aLlComputer Communications20 (I 597) 11-28
18
NxM Switching node
a
P
“(a;: setof VCs supported by the i-th input link (Ll ,...,N); v,c setof VCs supported by the jth output link &I ,. ,M). Fig. 4. Conceptual
scheme of an N x M switching
caused by VC shapered according to uniform (a, T)-patterns with high values of a (e.g. a = 7) with respect to that resulting by the application of a pattern with approximately the same bandwidth but with a equal to the minimum value, i.e. a = 1.
4. Multiplexing and switching according to the WDPA The resource allocation rule, based on (7), has been demonstrated under the assumption that a multiplexer is loaded by L uniform (a, T)-patterns. As previously mentioned (see Section 2.2), the same rule can be also utilized in cases in which the input lines support multiplexed flows provided that a delay equalization and a complete rebuilding of the original patterns is performed. In other words, in each network node, besides the switching function, it is necessary to reshape the cell flows associated with the VCs that are going to be multiplexed on an output link, in such a way that the output cell flow is exactly the same as that resulting from the application of the VMA on the cell flows outgoing from source shapers. We first explain this approach with reference to a conceptual switching scenario: successively it will be applied to an actual switching node based on the classical output queueing architecture.
4.1. Conceptual switching scenario Let us consider a generic N x A4switching node, as shown in Fig. 4. Let OLand p be the input and output node interfaces, respectively. Also let yai (ysj) be the set of VCS supported by the ith input link, i = 1, . . ..N (jth output link, j = 1, . . . . M). The cell flow supported by the ith input link is generated by a multiplexer denoted by MU~i and equipped with a buffer of size B,i cells, i=l , . . . . N. Let us suppose that, each mi is directly
node for WDPA utilization.
loaded by the original patterns of VCs E yai, i = 1, . . .,N, and that they operate according to VMA. The switch operation can be decomposed into three logical functions: (i) demultiplexing and reshaping, operated by N devices, indicated as RDi, i = 1, . . ..N. these devices demultiplex the incoming cell streams in as many individual flows as the number of VCs, and reshape such flows to impose their own initial (a, T)-patterns anew; (ii) switching performed by an interconnection structure able to instantaneously carry the reshaped flows associated with each VC towards the relevant output ports; (iii) multiplexing, performed by M multiplexers (indicated as MUXsj, j = 1, . ..) M), which form the cell streams outgoing from the node. The reshaping function performed by the RDs essentially amounts to equalizing the delays suffered by the cells in the MUGi, i = 1, . . . . N. Let di (0 5 dj 5 B,J be the delay of a given emission opportunity of a tagged VC E yai. That emission opportunity must be delayed exactly Bei - di time slots in the RD,. This guarantees that the tagged VC pattern can be exactly rebuilt, while starvation of the RDi is avoided. Note that delay equalization can be performed only if the delay of the first cell of the VC considered is known. Dimensioning of the amount of memory BRDi required by the RDi, i = 1, . . . . N, to avoid cell loss can be based on the following statement: Property 2. The maximum number of cells that have to be stored in the RDi cannot be greater than B,i, i.e. it CUE be BRDi = B,i (i = 1, . . ..N).
The proof of this statement is based on the fact that no cell entering RDi can be delayed by more than the maximum delay of MUGI, i.e. B,i time slots (recall that emission opportunities are FIFO multiplexed). Let us consider a cell arriving at RDi at time t to find B,i cells already stored in the RDi buffer. Since at most one cell per time slot can enter the RDi buffer (serial input), the eldest cell must
19
A. Baiocchi et aLlComputer Communications20 (1997) II-28 NxM Switching node MUXal
=I%Bal
. . .
11
MlJXclN
EktN *
a
P
Fig. 5. Scheme of an output buffer, N x M switching
have been waiting for just B,i cell times in the RDi buffer, and hence it is necessarily going to leave at time t. Therefore, there can never be more than Bai cells in the n RDi buffer. As a consequence of reshaping, the multiplexers MUXa behave as if they were directly loaded by the original patterns associated to VC E yaj (1 5 j 5 M), i.e. in accordance with the VMA. The amount of buffer in MUXsj, i.e. l,..., M, is independent of the previous multiplexing stages, and its value is utilized for the acceptance rule relevant to the node output links. Hence, with reference to a generic output link, rule (7) generalizes to:
Bpj,j
=
[ IYsl SBl3
(8) where 151indicates the cardinality of the set 5. As for the amount of memory required by a node, according to Property 2, the RDi needs B,i cells i = 1, . . . . N. As for the output, the amount of memory of the jth output is BPj, j = 1, . . . . M. Therefore, the total amount of memory required by a node equals B,, cells, where (9) i=l
node for WDPA utilization
(PC: Port Controller).
delayed by d cell times in the MUX, buffer. According to the reshaping algorithm, we can say that the cell departs with the fist emission opportunity that occurs since the time it becomes the eldest cell in the RD relevant to the tagged VC. Such an opportunity surely occurs within B, - d time slots since the cell arrived at the RD. However, the cell considered could leave earlier: this happens if an unused emission opportunity occurs, while it is the eldest cell of the tagged VC stored in the RD. We can describe this phenomenon saying that cells belonging to a VC can ‘jump ahead’, if emission opportunities issued before the cell has been emitted have not been filled. It is to be noted that jumping ahead is only allowed within the emission opportunities of the same VC: in other words, cells belonging to a VC cannot take advantage of ‘holes’ left in the emission pattern of other VCs. Clearly, the first cell emitted during a connection suffers the entire delay B, time slots at each hop of the network path the connection has been routed on. Successive cells are delayed by no more than B, time slots for each hop, but can eventually be forwarded more quickly, thus ‘compressing’ the time interval over which the cells emitted during the connection are scattered. 4.2. Output queueing node
j=l
Eq (9) shows that the amount of node memory increases linearly with the number of node I/O lines. The buffering delay suffered by a cell crossing the chain of a MUX, and a RD, associated to an input interface, is limited by B, cell times. In fact, this is actually the constant value of the delay of all the cells of a VC, provided all the emission opportunities of that VC are used by the source. If the network path chosen by the routing algorithm at the call set-up is made up of H links, the maximum transfer delay equals the sum of the buffer sixes of the H crossed MUXs. If not all emission opportunities of a VC are used, some cells of the same VC could take advantage of the ‘holes’ in the emitted pattern. Let us consider a cell of a tagged VC,
Pig. 5 shows the general model of an output queueing node architecture. The switching function is performed by a temporally transparent, non-blocking switching fabric. Reshaping and multiplexing functions can be jointly performed in the buffers associated to each output line. Each of these buffers is divided into N logical regions, one for each input line. An incoming cell is transferred by the switching fabric towards the addressed output buffer and stored in the buffer region associated to the relevant input line. Each time slot, the head of line cell of a given buffer region is read according to the VMA. The following result holds: Property 3. In the case of an output queueing node, the
20
A. Baiocchi et aLlComputer Communications 20 (1997) 11-28
required amount of memory for the jth output buffer under the WDPA strategy is B, max+ Bpjj where B, max=
max{Bai,
i =
1, . . . . N}.
Proof. Let us consider the jth output buffer. The reshaping function of the VCs coming from the ith input line requires at most a delay equal to B,i cell times i = 1, . . .,N, while the multiplexing delay at the considered output buffer cannot be more than Bpj. Hence, the maximum delay suffered by a cell crossing the jth output buffer cannot exceed B, mm + Bpj. Let US now suppose that at any time t there were B, max+ Bpj + 1 cells stored in the considered output buffer. Then, no matter what the forwarding order might be, at least one cell is going to spend more than B cimax+ Bpj time slots in the buffer, thus violating the maximum delay constraint. n The Virtual Multiplexing Algorithm (VMA) aims at establishing, in a time slot, which VC has the emission opportunity of the node output line. The VMA is carried out by an output port controller. The emission opportunities in the multiplexed flows associated to the M output lines are assigned to VCs by M VMAs run independently of one another. Therefore, we concentrate on a generic output link, supporting the set of yp of VCs. The VMA performs the reshaping and multiplexing functions in an integrated manner, in such a way that the cell flow on the considered output link coincides with that of an access MUX loaded by the rigorously patterned VCs E yg. This is done by equalizing the delays in the MUG’s and emulating the queue of such an access MUX. As in any MU&, a cell is delayed at most B, max,the equalization is accomplished by delaying every cell up to B, ,,,= time slots. For each VC, the VMA needs the values of the following four parameters: (i) a; (ii) T; (iii) (P=,i.e. the phase displacement of the virtually reconstructed pattern at the input of the MUX,; (iv) the logical buffer region h of the output buffer j where the cells of the VC are stored. These four parameters are stored in a table P. The first two parameters, a and T, are declared by sources during the call set-up; the fourth is known after call set-up, while the phase information qua has to be acquired. The delay d suffered by the first cell of the VC in the MUX, must be known to estimate (Pi. The delay d (O
and table P is completed by writing (Pi = It is to be noted that these parameters are associated to the VCI of the new VC; then such a VCI serves as an entry point to retrieve the parameters when needed. The number 6 of active VCs is incremented after B a max- d time slots, i.e. at time tl + B, max- d. cell,
(tl - d)mod(T).
Information transfer phase. In each time slot t, and index k is run from 1 to 6 (following the set-up order) and the VCI of the kth VC is written in a virtual queue VQ iff there exist n, n = 0, . . ., ak - 1, such that (t - (P&-B (I,,)IIIOd(Tk) = [nTk/ak + l/2]. The head-of-line VCI in the VQ, i.e. VCI*, is compared with the VU of the head-of-line cell in the logical region h associated to VCI*. If they are equal that cell is emitted, otherwise the current output slot is left empty; in any case, the headof-line element of VQ is cancelled. Call tear-down. When an ending cell is received then, after waiting B, max+ Bp time slots, the line of the table P relevant to the torn-down VC is cancelled and S is decreased.
5. Performance evaluation The performance evaluation is focused on: (i) criteria for the dimensioning of the buffer size of a single VC shaper (Section 5.1); (ii) a comparison between the WDPA and other deterministic allocation strategies (Section 5.2). 5.1. Buffer sizing of a single VC shaper In previous sections, we assumed that cells belonging to a given VC are patterned according to the uniform (a, T)pattern. So, cells of a given VC can only be emitted at specific (discrete) times, called emission opportunities. This means that a VBR source that chooses to utilize the CE facility must request a bandwidth equal to its peak bit rate. If there are not stringent delay requirements, the cell stream of the source can be shaped before entering the network. In this case, the shaping function consists in reducing the peak cell rate, and thus the value of the requested bandwidth. A single VC shaper can be modelled as a queueing system with a finite buffer, comprising K cell positions, and a deterministic server operating according to the adopted deterministic mask with peak bit rate equal to C bit/s. As far as the arrival process is concerned, we consider: (a) a classical two-state Markovian On-Off source model (Section 5.1.1); (b) a LAN originated traffic (Section 5.1.2); (c) a VBR video source (Section 5.1.3). All analytical results are here obtained by means of the fluid flow approximation [17,18], where the source stream is modelled by a fluid flow stochastic process whose realizations are piecewise constant rate functions f (t), representing the instantaneous source bit rate. In the fluid flow context, the effect of the superimposed deterministic mask is taken
A. Baiocchi et aLlComputer Communications 20 (1997) 11-28
into account by a constant shaper output rate equal to TC bit/s, where q is the fraction of the link bandwidth that the source has been assigned. Instead, in the case of simulations, the discrete nature of the ATM cells forming the source stream has been taken into account, together with the deterministic emission pattern enforced by the shaper. 5.1.1. Two-state Markovian on-off sources The rate functions f(t) relevant to a two-state On-Off
source alternate between bursts (peak bit rate emission) and silences (no emission). Bursts and silences are assumed to be exponentially distributed. Three parameters are needed to define the source emission process: Fp source peak bit rate;
p
ratio between the mean and the peak bit rate (activity factor); b mean burst length in ATM cells. Let II be the cell loss probability at shaper and K = K/b, i.e. buffer size measured in mean burst length unit. By using the same approach as in Ref. [16], the following results arise: due to the choice of the uniform pattern, the fluid flow approximation implies negligible inaccuracies in the evaluation of II provided that K exceeds a few units; the uniform pattern minimizes II since it minimizes the variance of the inter-opportunity intervals; the minimum feasible amount of buffer space K~~(II~) under the requirement II I II0 is
0
20
40
60
80
Fig. 7. Comparison between a two-state Markov On-Off source and a source with a limited burst length for a bandwidth assignment equal to three times the mean bit rate of the source. Ethernet generated traffic.
Instead of using directly the abscissa 8, we introduce the normalized excess bandwidth x, defined as x = (0 - p)l (1 - p). It is clear that x ranges between 0 and 1 when
p c 6 5 1. The value x = 1 corresponds to a peak bit rate assignment, while n = 0 implies an average bandwidth assignment. Note also that the limit of K,~ as p -, O+ is simply %in,O(HO)= (1 - X>log[(l - x)&J.
xlog
(
i
l-----
-pie
1-P
Fig. 6 plots ~~~(10-~) 1-p/e
+---1-P
l-0
(10)
no >
where 0 = r) CIF,, sop < 0 5 1. 1
20 18
-
0
0.2
FRACTION
0.4 OF NORMALIZED
-
p=o.2
-
p = 0.5
-x_
pzo.7
-
p = 0.9
0.6
0.8 BANDWIDTH,
1 x
Fig. 6. Buffer size required to achieve a cell loss probability less than 10m9 vs. the fraction of normalized bandwidth, for different values of the activity factor.
100
BUFFER SIZE (CELLS)
as a function
(11) of X, for
Fp = 10 Mbit/s, C = 150 Mbit/s and for various values of
as well as the limiting curve relevant to ~~,~(10-~). It can be seen that, if buffer sizes and hence delays are to be kept small, a bandwidth near the peak bit rate must be allocated. If we choose K = 10 as an upper bound of the buffer size, Fig. 6 shows that the behaviour of ~,.,,h(lO-~) is almost linear no matter what the value of p, provided that p 5 0.1. As a thumb rule, we can say that for x 2 0.5 and p 5 0.2 + 0.3 the shaper dimensioning can be done using Eq. (ll), irrespective of the actual value of p. A remarkable decrease of the required buffer can be achieved if the maximum tolerable cell loss is relaxed. However, except for high activity factors, keeping delays small requires assigning a quite large fraction of the source peak bit rate. p,
5.1.2. LAN sources In the previous section, it has been shown that a large shaper buffer is needed to meet the requirement II I lop9 if the source is characterized by low values of X. This result deserves two considerations: (i) many applications do not require a cell loss probability less than 10m9;in this case the user can tailor the shaper parameters to his needs and negotiate a bandwidth
A. Baiocchi et aLlComputer Communications 20 (1997) 11-28
22
assignment far from the peak bit rate with lower values of the shaper buffer size; it is to be noted that this significant advantage cannot be achieved by means of a statistical ATM allocation scheme, since 10e9 cell loss constraint must be guaranteed for each traffic source, that causes a remarkable waste of resources; (ii) the two-state Markovian On-Off source model is somehow not realistic, since the On times can be arbitrarily long; instead, most probably, a real source does not remain in the On state for a time greater than a given value, depending on the maximum allowed size of the used Protocol Data Units. To this aim, we introduce a simple model that attempts to describe a real LAN originated traffic, setting a limit on the maximum possible value of the burst length. We add a fourth parameter to the three source descriptors, FP,p and b, introduced in the previous section, i.e. the maximum burst length in cells b,,. We further assume that the Off period is geometrically distributed, while the On period has the following probability distribution: (1 - q)qi-l, qLx-1,
i = 1, . . ..b., i=b
- 1
(12)
max,
where q is chosen so that the mean burst length is equal to a desired value b. In Fig. 7, the performance of this kind of source are compared with those of a two-state Markov On-Off model with the same values of the parameters FP, p and b. The curves relevant to the limited burst length sources are denoted by the subscript ~TUIIC. The source parameters are FP = 10 Mbit/s, p = 0.1, b = 20, b, = 30, as a matter of example they can refer to an Ethernet LAN originated traffic flowing through UN1 at C = 150 Mbit/s.
3
150
E. 3 E
100
4 I a 3
50
As for the two-state Markovian On-Off model, Fig, 7 depicts, as a function of the shaper buffer size: the cell loss probability, IIcen; . the probability that a burst is affected by one or more cell losses? nburst _comp. Instead, as for the limited burst length source, the same figures show the cell loss probability, II,,,,trun; the probability that a burst is affected by one or more cell losses, II burst _comp, tmnc * l
l
l
All curves, except IlIce,,,have been obtained by simulations; the 95% confidence intervals are always less than 5% and are not shown to improve the neatness of the figures. The assigned bandwidth is equal to three times the average source bit rate. In the case of limited burst length, the faster decay of the loss probability as a function of the shaper buffer size allows a cell loss probability less than lop5 to be reached with a buffer size equal to 100 cells. 51.3. Video sources As a final example of shaper buffer dimensioning, a VBR videophone and/or videoconference source is considered. A frame rate of 1/3Os and 250,000 pixels/frame are assumed. For the shaper analysis the fluid flow model of the source stream proposed in Ref. [19] is adopted. This stochastic fluid can be viewed as the superposition of 20 identical two-state Markov On-Off sources (Mini-Sources, MS), with the following parameter values: FP,Ms= 945 kbitls, pMs = 0.2014 and bMs = 715.624 cells. The overall peak and average bit rates of the video source model are FP = 18900 kbitls and FA = 3806.84 kbit/s, respectively. Fig. 8 displays the maximum delay &.&I,) = I~,,~K,~(II~)/[F~ + x(FP - FA)] corresponding to a buffer size of Key as a function of the normalized excess bandwidth, x for three values of I&, Lcell being the ATM cell length. The decay of these curves is much faster than in the case of two-state On-Off sources, due to the smoothing effect of the multilevel emission with respect to the peak to zero abrupt change exhibited by an On-Off source. For example, Fig. 8 yields that an assigned bandwidth equal to slightly more than 11 Mbit/s (about 60% of the source peak bit rate) is sufficient to meet the requirement I&,( 10A9) 5 100 ms. 5.2. Comparison with other deterministic allocation strategies
C 0.0
0.2
0.4
0.6
0.8
1.0
FRACTION OF NORMALIZED BANDWIDTH, x Fig. 8. Buffer size (in ms) required to achieve a cell loss probability less than 10e9, 10e6 and 10e3 vs. the fraction of normalized bandwidth for the video source model proposed in Ref. [19].
Congestion control mechanisms based on a control of the source flows and on constrained multiplexing schemes have been proposed in literature, among others the Virtual Framing (VF) strategy [8,9] and the Counter Based Congestion (CBC) control [lo]. Both these strategies allow the assignment of bandwidth to each VC independently of
A. Baiocchi et al.lComputer Communications 20 (1997) 11-28
other existing VCs (as in circuit switched networks). These strategies assume that a VC complies with a specified emission pattern at the network edge, and give multiplexing rules inside the network that ensure that the regularity of the traffic streams is maintained. As for the VF strategy, it is based on the definition of a time frame of duration T, and requires that a source emit no more than a specified number of cells, say a, in each fixed time frame. The VF strategy has been extended to encompass the case of multiple frame lengths, but the chosen values must be integer multiples of a basic time period. Let the sources be grouped into NC classes, where the ith class comprise Li sources, each of which is allowed to emit aj cells in a frame Ti (1 I i I NC). The assigned bandwidth for a class i source is a fraction ailTi of the total link capacity. The admission policy ensures the smoothness of the traffic; this property is maintained throughout the network by means of the so called Stop&Go queueing discipline. In any given network link, multiplexing can be performed without any cell loss, by using a buffer size of B cells, if the following allocation rules are adopted:
23
1.0 h $ ki 5
o.8
0.6
i a
0,4
E 5
0.2
0
0.2
0.4
0.6
0.8
Fig. 10. Virtual link efficiency as a function of the fraction of the link capacity required by each VC; comparison between WDPA and CBC. case are as follows:
~Li(~+~)
sB
[buffer allocation] (14)
NC
PiLiai 5 B
1
FRACTION OF REQUESTED BANDWIDTH PER VC
L&l
[buffer allocation] c
i=l
[bandwidth allocation] L
i=l
(13) [bandwidth allocation]
where it can be shown [8,9] that pi c 3. Rules (13) hold if T, = K~T~_~, i 2 2, where the Ki’s are arbitrary integer values. The CBC control allows each source to define its frame size independently of one another, but a source is forced to emit a single cell in each frame. The allocation rules in this
wherein k = 1.c.m.I TI, T2, . . ., TNC). Both these strategies need a large buffer to obtain a fine granularity in bandwidth allocation, whereas a reduction of the buffer requirements result in a proportional increase in the incremental steps of bandwidth allocation. The WDPA regulates the cell emission of a VC, also requiring that the time positions of the emission opportunities are fixed. This further constraint determines a sensitive saving in resource allocation. The allocation rules here considered are here rewritten: LiIB )
1.0
(15)
NC
Ic
c z c
[buffer allocation]
i=l
i=l
0.9
G Y
0.8
5 2 a
0.7
% g
0.6 0.5
0
0.05
0.10
0.15
FRACTION OF REQUESTED BANDWIDTH PER VC
Fig. 9. Virtual link efficiency as a function of the fraction of the link capacity required by each VC; comparison between WDPA and VP.
Li ;
5 1
[bandwidth allocation]
1
The comparison between these three strategies is performed by evaluating the total capacity that can be allocated on a link, called virtual link eficiency, and the amount of buffer required for a 100% virtual link efficiency, Fig. 9 plots the virtual link efficiency of WDPA and VF strategies as a function of fraction of the link capacity required by each VC, in an homogeneous case. A buffer size B of 300 cells is assumed for both strategies, while the frame length of VF has the optimal value T = 100. In fact, for the VF strategy the bandwidth granularity increases with T, while the virtual link efficiency decreases with T for T 2 B/3. The inefficiency of VF is due to the constrained value of T, which impairs the granularity of the allocable
bandwidth; this eventually leads to oversizing the assigned bandwidth. Fig. 10 is devoted to the comparison between CBC and WDPA in a homogeneous case. The buffer size is equal to 100 cells. In this case, the inefficiency of CBC is due to the stringent requirement that at most one opportunity per frame is issued; this reduces the flexibility in bandwidth allocation. Figs. 11 and 12 show the amount of buffer required to achieve a 100% virtual link efficiency in a mix of two traffic classes. Each VC of the first class requires 10% of the link capacity and uses a frame of length T_1 = 10 time slots and a_1 = 1 emission opportunity per frame; this causes no bandwidth and buffer waste. Fig. 11 refers to the WDPA and VF strategies. The amount of required buffer is plotted versus the reciprocal of the fraction of bandwidth required by each VC of the second class (η_2). For each value of η_2 the frame size T_2 is chosen under the constraint that it must be an integer multiple of the frame size T_1 and that the value a_2 is such that a_2/T_2 equals η_2. The large buffer size required by the VF strategy is due to the constraint on the frame size, which implies values of a_2 greater than 1. Fig. 12 compares WDPA and CBC. In this case the greater buffer requirement of CBC with respect to WDPA is due to the term proportional to the least common multiple of the frame sizes used in the considered link, i.e. T_2 and T_1 (see Eq. (14)). It is worth noting that the performance improvement brought about by WDPA is essentially due to the tighter constraints on the assumed source traffic profiles. This has a negligible impact on the performance (delay and cell loss) and dimensioning of the source shaper, if it is needed.

Fig. 11. Buffer requirements for a superposition of two traffic classes in a heterogeneous case as a function of the fraction of the link capacity required by one of them; comparison between WDPA and VF.

Fig. 12. Buffer requirements for a superposition of two traffic classes in a heterogeneous case as a function of the fraction of the link capacity required by one of them; comparison between WDPA and CBC.
6. Overall efficiency

In this section we discuss a method for improving the overall network efficiency for sources handled with the CE facility. It is possible to show that the network efficiency provided by our approach is always greater than that relevant to the peak allocation; moreover, the WDPA guarantees the QoS, while a simple peak allocation, without re-spacing mechanisms in internal network sections, does not [2]. The improvement in efficiency brought about by WDPA with respect to the peak allocation depends on traffic characteristics and on the system parameters. It is also worth noting that the network efficiency can be increased by adopting a fast bandwidth re-negotiation algorithm. This approach is allowed by WDPA due to its very simple allocation and de-allocation rules. However, the allocation rules derived in Sections 3 and 4 lead to rather strict bounds on the number of VCs that can be set up across any given network section. This is because we are accounting for the worst case of buffer occupancy and delays. Since the VC cell streams are randomly phased, it is extremely unlikely that the worst case will ever occur. If it is tolerated that the buffers can overflow with a probability ε (possibly very low, e.g. 10^-15, so that each VC is surely guaranteed to perceive the desired QoS), much looser constraints can be defined, thus obtaining further efficiency gains. Let us consider a MUX loaded with uniform (a, T)-patterns. The exact queueing model of the MUX is a Σ_i D_i(a_i, T_i)/D/1/B queue, where D_i(a_i, T_i) denotes a deterministic process whose arrivals correspond to the emission opportunities of a uniform (a_i, T_i)-pattern. The buffer overflow probability of such a queue can be overestimated by means of the survivor function of the Σ_i D_i(a_i, T_i)/D/1 queue (see the appendix). The main result is that, when the number L of VCs multiplexed on a link is large, the buffer occupancy level is much lower than the upper bound L given in Property 1. Therefore the same number of VCs can be allocated in a smaller buffer, if a buffer overflow with probability ε is tolerated. Furthermore, the buffer overflow probability of this queueing system can be upper bounded by using the much simpler M/D/1/B model, where the input Poisson process is chosen so as to match the mean cell arrival rates in the two queues. Then, instead of using the allocation rules proposed in Sections 3 and 4, a simple M/D/1/B model can be used. For instance, it can easily be obtained that a delay constraint of 1 ms per network section with C = 150 Mbit/s implies B = 175 ATM cells; so, if ε = 10^-9, the input mean load of a MUX must be limited to ρ* = 0.95. With this approach, rules (7) are replaced with the inequality Σ_i a_i/T_i ≤ ρ*. As an example, Fig. 13 shows the survivor function of the buffer occupancy vs. the buffer position for a superposition of L identical and randomly phased uniform (a, T)-patterns with L = 90, 270 and 810. The values of a and T are such that the overall mean offered load is always equal to Λ = 0.9. The buffer occupancy survivor function of the M/D/1 is also plotted. This figure confirms that the latter upper bound can be utilized for establishing an easy and convenient allocation rule.

Fig. 13. Comparison among the buffer occupancy survivor functions of a multiplexer loaded by L uniform (a, T)-patterns (upper bound) and the M/D/1 model for Λ = 0.9.
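As an illustration of how such a simplified rule could be applied, the sketch below evaluates the tail of a slotted M/D/1-like queue by iterating the queue-length distribution to its fixed point, and then applies the admission test Σ_i a_i/T_i ≤ ρ*. This is a purely illustrative numerical rendering: the slotted Poisson-batch model, the function names and the search grid are assumptions made for the example, while B = 175 cells, ε = 10^-9 and ρ* = 0.95 are the figures quoted above.

```python
# Illustrative numerical sketch of the simplified M/D/1-based allocation rule.
import numpy as np
from math import exp, factorial

def md1_survivor(rho, qmax=700, kmax=40, iters=50_000, tol=1e-10):
    """Queue-length survivor function P{Q > x} of the slotted analogue of an
    M/D/1 queue: Q' = max(Q + A - 1, 0), with A ~ Poisson(rho) arrivals per slot."""
    a = np.array([exp(-rho) * rho**k / factorial(k) for k in range(kmax + 1)])
    q = np.zeros(qmax + 1)
    q[0] = 1.0
    for _ in range(iters):
        full = np.convolve(q, a)            # distribution of Q + A
        nxt = np.zeros_like(q)
        nxt[0] = full[0] + full[1]          # 0 or 1 cells present -> empty after service
        nxt[1:] = full[2:qmax + 2]          # k >= 2 cells present -> k - 1 left
        nxt /= nxt.sum()                    # compensate the truncation at qmax
        if np.abs(nxt - q).sum() < tol:
            q = nxt
            break
        q = nxt
    return np.clip(1.0 - np.cumsum(q), 0.0, None)

def max_load(B, eps, grid=None):
    """Offline search for the largest load whose tail at level B stays below eps
    (a numerical stand-in for rho*)."""
    grid = np.arange(0.80, 0.995, 0.005) if grid is None else grid
    admissible = [rho for rho in grid if md1_survivor(rho, qmax=4 * B)[B] <= eps]
    return max(admissible) if admissible else 0.0

def ce_admissible(patterns, rho_star):
    """Admission test replacing the deterministic rules: the sum of the mask
    rates a_i/T_i, candidate connection included, must not exceed rho*."""
    return sum(a / T for (a, T) in patterns) <= rho_star
```

For example, ce_admissible([(1, 10)] * 9, 0.95) accepts nine CE connections each asking one opportunity every 10 slots (load 0.9), whereas a tenth identical connection would be refused.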
7. Unifying framework

In this section, we sketch a possible unifying framework for the control of the two VBR traffic classes (the one handled through the CE facility and the statistically multiplexed one), CBR and ABR traffic that can fit in with the current standardization activity and technology. The statistical buffer allocation proposed in the previous section for the CE-handled VBR and CBR traffic tends to exploit some of the potential statistical gain offered by the ATM, while maintaining small buffers inside the network. This allocation strategy could be the linking element with the statistical multiplexing of the remaining VBR traffic, which is again based on the utilization of small buffers, but does not require a CE facility to impose and maintain a uniformly patterned cell stream (thanks to the high value of the ratio between the link
bit rate and the peak bit rate of the sources, which allows us to model these sources with a Poisson arrival process [4,5,20]). Then, resource allocation could be handled by using the same kind of model for all the VBR and the CBR traffic (e.g. M/D/1/B models). Up to now, we have shown a possible way to unify the control of VBR and CBR traffic. It remains to discuss ABR traffic. Our strategy is not in contrast with the rate-based strategy ratified by the ATM Forum for congestion control. In fact, the latter strategy has been defined for the control of ABR traffic. The primary goal of the ABR service is the economical support of applications with vague requirements for throughputs and delays. ABR traffic is transmitted by using the capacity left available by other traffic, i.e. CBR and VBR. The latter capacity consists of two components: the capacity not assigned to CBR and VBR traffic; and the capacity assigned to VBR traffic but not used due to its bursty nature. The ABR rate control is based on feedback from the network to the traffic sources, and is thus conceptually independent of the control mechanisms of the CBR and VBR traffic [7]. In Section 2.1 we observed that in the WDPA, link utilization inefficiency arises when a source does not utilize all the emission opportunities of the negotiated pattern (i.e. always, unless it is a CBR source or a VBR source shaped to become a CBR one). Such a potential inefficiency can be mitigated by utilizing the unused bandwidth to carry the ABR traffic. In conclusion, our approach fits in with the current standardization activity and technology.
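The reuse of unused CE emission opportunities by ABR traffic can be pictured with the following toy per-slot scheduler. It is purely illustrative and not part of the proposal: the data structures and the assumption that each slot is reserved by at most one CE connection are introduced only for the example.

```python
# Toy per-slot scheduler: unused CE emission opportunities carry ABR cells.
from collections import deque

def serve_slot(slot, ce_connections, abr_queue):
    """ce_connections: list of dicts {'T': period, 'offsets': set of reserved
    slot positions modulo T, 'queue': FIFO of cells} -- an illustrative
    representation of a negotiated deterministic mask."""
    for vc in ce_connections:
        if slot % vc['T'] in vc['offsets']:     # slot reserved for this CE VC
            if vc['queue']:
                return ('CE', vc['queue'].popleft())
            break                               # reserved but unused: hand it to ABR
    if abr_queue:                               # unreserved or unused slot
        return ('ABR', abr_queue.popleft())
    return ('IDLE', None)

if __name__ == "__main__":
    vc = {'T': 10, 'offsets': {0, 5}, 'queue': deque(['c1'])}   # one CE mask
    abr = deque(['a1', 'a2'])
    print(serve_slot(0, [vc], abr))   # ('CE', 'c1'): opportunity used
    print(serve_slot(5, [vc], abr))   # ('ABR', 'a1'): unused opportunity reclaimed
```

A CBR-like CE source always consumes its opportunities, so ABR only sees the unreserved slots; a bursty CE source leaves idle opportunities that are immediately reclaimed.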
8. Conclusions

Circuit switching has some merits that cannot be given up, even when considering ATM as the transfer mode for the B-ISDN. Most evidently, the performance guarantees provided by circuit switching (e.g. no cell loss, limited delay, no delay jitter) are desirable for several teleservices. This work focuses on the definition and realization of a logical scheme apt to provide guaranteed performance (just like circuit switching does) in an ATM environment. Such a scheme should be a part of the traffic control procedures to be run in an ATM network, along with procedures handling those VBR sources that do not pose strict performance requirements and lend themselves to statistical multiplexing. A more quantitative definition of the various traffic classes is certainly a critical aspect, and is currently under investigation. The other main point of the work also derives from the desire to preserve two nice characteristics of circuit switched networks: (i) the resource allocation rules should be simple and the contribution of each traffic source should be explicit; (ii) the models used in performance assessment and dimensioning should be general and robust, and hence simple (the least possible number of assumptions and
parameters should be involved). The results recalled in this work converge to the conclusion that these goals can be pursued by using small buffers (e.g. one or at most 200 cells) inside the network; to be precise, buffers should be operated in the cell region [4,5,20]. The proposed CE facility clearly adheres to goals (i) and (ii) above.
Appendix

This appendix is devoted to the analysis of the Σ_i D_i(a_i, T_i)/D/1 queue, i.e. a single server queueing model with an infinite waiting room, constant service times and an arrival process consisting of the superposition of periodic, independent and heterogeneous sources. Each source generates cells according to a uniform (a, T)-pattern. The analysis presented is an extension of that in Refs. [14,15], where the arrival process consists of the superposition of periodic arrival streams with only one cell in every period. Here such results are generalized, allowing a source to emit more than one cell in a period. The sources are divided into N_C traffic classes, each comprising homogeneous sources. The constant value of the service time is taken as the time unit. By using the same notation as in Sections 3 and 5.2, the server utilization ρ (ρ ≤ 1) is given by

$$\rho = \sum_{i=1}^{N_C} L_i \frac{a_i}{T_i}. \qquad (A.1)$$

The phases of the sources of the ith class (i = 1, ..., N_C), with respect to a fixed time origin, are chosen at random with a uniform distribution in the interval [0, T_i), 1 ≤ i ≤ N_C. Let Q_t be the number of cells queued at time t and ν(u, t) the number of arrivals in the interval [u, t). The complementary queue length probability distribution G_t(x) = Pr{Q_t > x} can be expressed as [14,15]:

$$G_t(x) = \sum_{s=1}^{\infty} \Pr\{\nu(t-s, t) = x + s \;\text{and}\; Q_{t-s} = 0\} = \sum_{s=1}^{\infty} p_s(x)\,\pi_0(x, s) \qquad (A.2)$$

where

$$p_s(x) = \Pr\{\nu(t-s, t) = x + s\}, \qquad \pi_0(x, s) = \Pr\{Q_{t-s} = 0 \mid \nu(t-s, t) = x + s\}. \qquad (A.3)$$

Let us consider the probability p_s(x). The number of cell arrivals in the interval [t − s, t) consists of two parts: a deterministic part D(s) and a random part R(s). The deterministic part due to each source of class i is a_i⌊s/T_i⌋. The random part due to the jth source of class i, R_ij(s), is the number of arrivals in a randomly chosen interval of length s' = s − ⌊s/T_i⌋T_i, and can take values between 0 and a_i. So the distribution of R_ij(s) is a row vector of probabilities, denoted by q_{s,ij}, whose kth component denotes the probability that R_ij(s) = k, 0 ≤ k ≤ a_i. Notice that such a distribution is the same for all the sources of class i. The innovation with respect to previous works lies in the evaluation of q_{s,ij}. We associate a T_i-dimensional binary vector V_ij to the jth source of class i, where a '1' stands for cell emission, and a '0' means that cell emission is inhibited. For the uniform (a, T)-patterns, instead of giving closed formulae for q_{s,ij}, we resort to an algorithmic procedure. If s' = 0, then q_{s,ij}(0) = 1 and q_{s,ij}(h) = 0 for 1 ≤ h ≤ a_i; while for s' = 1, ..., T_i − 1:

Step 1: initialize q_{s,ij} by letting q_{s,ij}(h) = 0 for 0 ≤ h ≤ a_i;
Step 2: for m = 0 to T_i − 1 do
  if m + s' ≤ T_i then
    n = Σ_{h=m+1}^{m+s'} V_ij(h);  q_{s,ij}(n) = q_{s,ij}(n) + 1/T_i
  else
    n = Σ_{h=m+1}^{T_i} V_ij(h) + Σ_{h=1}^{m+s'-T_i} V_ij(h);  q_{s,ij}(n) = q_{s,ij}(n) + 1/T_i
  endif
endfor.

Let R_i(s) be the random part of cell arrivals due to all the sources belonging to the ith traffic class. The vector q_{s,i} (q_{s,i}(k) = Pr{R_i(s) = k}) is the convolution of the L_i row vectors q_{s,ij}, j = 1, ..., L_i (L_i − 1 convolutions). The number of components of q_{s,i} is therefore a_iL_i + 1. The overall random part due to the superposition of all the sources of the N_C traffic classes, denoted by R(s) = Σ_{i=1}^{N_C} Σ_{j=1}^{L_i} R_ij(s), has a distribution described by the row vector q_s, of L_1a_1 + ... + L_{N_C}a_{N_C} + 1 components, and is obtained by convoluting the N_C vectors q_{s,i}, i = 1, ..., N_C (N_C − 1 convolutions). In conclusion, the number of cell arrivals in the interval [t − s, t) is the sum of a deterministic part D(s), given by

$$D(s) = \sum_{i=1}^{N_C} D_i(s) = \sum_{i=1}^{N_C} L_i a_i \lfloor s/T_i \rfloor,$$

and of a random part R(s) with distribution q_s. The probability distribution p_s(x) in (A.3) can be expressed in terms of q_s:

$$p_s(x) = \begin{cases} q_s(x + s - D(s)), & \text{if } D(s) \le x + s \le D(s) + L_1a_1 + \cdots + L_{N_C}a_{N_C} \\ 0, & \text{otherwise.} \end{cases}$$

As for π_0(x, s) in (A.3), following Refs. [14,15], we use the inequality 1 − ρ ≤ π_0(x, s) ≤ 1 in (A.2) to obtain upper and lower bounds of the queue length probability distribution:

$$(1 - \rho) \sum_{s=1}^{\infty} p_s(x) \le G_t(x) \le \sum_{s=1}^{\infty} p_s(x). \qquad (A.4)$$

Formula (A.4) shows that, for ρ < 0.9, the relative error implied by the approximate bounds is confined within one order of magnitude. Such an error can be tolerated when the resource assignment is such that the overflow probability is less than 10^-9. Finally, if a_i (a_i ≥ 1) cells are emitted consecutively in a period T_i (non-uniform pattern, or On-Off sources with constant On and Off times [16]), the vectors q_{s,ij} can be evaluated with the following closed formulae:

• if s' = 0, q_{s,ij}(0) = 1 and q_{s,ij}(h) = 0 for 1 ≤ h ≤ a_i;
• if s' ≠ 0, for each value of s' = 1, ..., T_i − 1 we have:

$$q_{s,ij}(0) = \begin{cases} (T_i - a_i - s' + 1)/T_i, & \text{if } s' \le T_i - a_i \\ 0, & \text{otherwise;} \end{cases} \qquad (A.5)$$

if a_i = 1, q_{s,ij}(a_i) = s'/T_i; if instead a_i > 1, for k = 1, ..., a_i − 1 we have

$$q_{s,ij}(k) = \begin{cases} 0, & \text{if } s' < k \text{ or if } s' > T_i - a_i + k \\ 1 - \sum_{h=0}^{k-1} q_{s,ij}(h), & \text{if } s' = k \\ 2/T_i, & \text{if } k < s' \le T_i - a_i + k - 1 \\ (k + 1)/T_i, & \text{if } s' = T_i - a_i + k \end{cases} \qquad (A.6)$$

and

$$q_{s,ij}(a_i) = \begin{cases} (s' - a_i + 1)/T_i, & \text{if } s' \ge a_i \\ 0, & \text{otherwise.} \end{cases} \qquad (A.7)$$
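For readers who prefer code to pseudocode, the following is an illustrative Python rendering of the procedure above and of the convolutions that build q_s; the function names, the 0-based cyclic indexing and the (L_i, V_i) class representation are introduced only for this sketch.

```python
# Illustrative rendering of the q_{s,ij} procedure and of the convolutions for q_s.
import numpy as np

def q_sij(V, s):
    """Distribution of the arrivals of one source in an interval of length s,
    for a source with binary emission pattern V over its period T. Equivalent
    to Steps 1-2 above, with the wrap-around handled by a cyclic index instead
    of two partial sums."""
    V = np.asarray(V, dtype=int)
    T = V.size
    a = int(V.sum())
    s_prime = s % T
    q = np.zeros(a + 1)
    if s_prime == 0:
        q[0] = 1.0
        return q
    for m in range(T):                          # uniformly random phase
        window = np.arange(m, m + s_prime) % T
        n = int(V[window].sum())                # emissions captured by the window
        q[n] += 1.0 / T
    return q

def q_s(classes, s):
    """Distribution q_s of the overall random part R(s). `classes` is a list of
    pairs (L_i, V_i): L_i identical sources sharing the pattern V_i."""
    dist = np.array([1.0])
    for L, V in classes:
        per_source = q_sij(V, s)
        for _ in range(L):                      # convolve in one source at a time
            dist = np.convolve(dist, per_source)
    return dist

def deterministic_part(classes, s):
    """D(s) = sum_i L_i a_i floor(s / T_i)."""
    return sum(L * int(np.sum(V)) * (s // len(V)) for L, V in classes)
```

Feeding q_sij a pattern whose a_i ones are consecutive reproduces the closed formulae (A.5)-(A.7), which offers a quick numerical cross-check.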
References

[1] CCITT, Recommendation I.371, Traffic Control and Congestion Control in B-ISDN, Geneva, 1992.
[2] P.E. Boyer, F.M. Guillemin, M.J. Servel and J.P. Coudreuse, Spacing cells protects and enhances utilization of ATM network links, IEEE Network, Vol 6, No 5 (September 1992) pp 38-49.
[3] D. Hong and T. Suda, Congestion control and prevention in ATM networks, IEEE Network, Vol 5, No 4 (July 1991) pp 10-16.
[4] IEEE J. Selected Areas in Communications, Special issue on 'Teletraffic Analysis of ATM Systems', Vol 9, No 3 (April 1991).
[5] A. Baiocchi, N. Blefari-Melazzi, A. Roveri and F. Salvatore, Stochastic fluid analysis of an ATM multiplexer loaded with heterogeneous On-Off sources: an effective computational approach, INFOCOM 92, Florence, Italy, 4-8 May 1992, pp 405-414.
[6] C.T. Lea, What should be the goal for ATM, IEEE Network, Vol 6, No 5 (September 1992).
[7] IEEE Network, Special issue on 'ATM flow control: rate vs. credit', Vol 9, No 2 (March/April 1995).
[8] S.J. Golestani, Congestion-free communication in high-speed packet networks, IEEE Trans Communications, Vol 39, No 12 (December 1991) pp 1802-1812.
[9] S.J. Golestani, A framing strategy for congestion management, IEEE J Selected Areas in Communications, Vol 9, No 7 (September 1991) pp 1064-1077.
[10] I. Chlamtac and T. Zhang, A counter based congestion control (CBC) for ATM networks, Computer Networks & ISDN Systems, Vol 26, No 1 (September 1993) pp 5-27.
[11] P.E. Boyer and D.P. Tranchier, A reservation principle with application to the ATM traffic control, Computer Networks & ISDN Systems, Vol 24 (1992) pp 321-324.
[12] Y. Ohba, M. Murata and H. Miyahara, Analysis of interdeparture process for bursty traffic in ATM networks, IEEE JSAC, Vol 9, No 3 (April 1991) pp 468-476.
[13] H. Saito, The departure process of an N/G/1 queue, Perf. Eval., Vol 11, No 4 (November 1990) pp 241-251.
[14] J.W. Roberts and J.T. Virtamo, The superposition of periodic cell arrival streams in an ATM multiplexer, IEEE Trans Communications, Vol 39, No 2 (February 1991) pp 298-302.
[15] J.T. Virtamo and J.W. Roberts, Evaluating buffer requirements in an ATM multiplexer, GLOBECOM 89, Dallas, TX, paper 41.4, pp 1473-1477.
[16] A. Baiocchi, N. Blefari-Melazzi and M. Listanti, Worst deterministic pattern allocation strategy: a practical solution to go beyond peak allocation in ATM networks, 14th International Teletraffic Congress, Antibes Juan-les-Pins, France, June 6-10 1994, pp 571-580.
[17] D. Anick, D. Mitra and M.M. Sondhi, Stochastic theory of a data-handling system with multiple sources, Bell System Tech. J., Vol 61, No 8 (October 1982) pp 1871-1894.
[18] R.G. Tucker, Accurate method for analysis of a packet-speech multiplexer with limited delay, IEEE Trans Communications, Vol 36, No 4 (April 1988) pp 479-483.
[19] B. Maglaris, D. Anastassiou, P. Sen, G. Karlsson and J.D. Robbins, Performance models of statistical multiplexing in packet video communications, IEEE Trans Communications, Vol 36, No 7 (July 1988) pp 834-844.
[20] J.W. Roberts, Traffic control in the B-ISDN, Computer Networks & ISDN Systems, Vol 25, No 10 (May 1993).
Marco Listanti received his Dr. Eng. degree in electronics engineering from the University of Roma 'La Sapienza' in 1980. He joined the Fondazione Ugo Bordoni in 1981, where he was leader of the TLC network architecture group until 1991. In November 1991 he joined the University of Roma, where he is currently an Associate Professor in Switching Systems. He also holds lectures at the University of Roma 'Tor Vergata' on Communications Networks. His current research interests focus on multimedia broadband communications, high throughput switching architectures and integration between the intelligent network and the B-ISDN. Dr Listanti is a representative of the Italian PTT administration in work being carried out by international standardization organizations (ITU-T, ETSI).
Andrea Baiocchi received his 'Laurea' in electronics engineering in 1987 and his PhD in information and communications engineering in 1992, both from the University of Roma 'La Sapienza'. From 1991 to 1992 he was a Researcher at the Department of Mathematical Methods and Models for Applied Sciences of the University of Roma 'La Sapienza', where he held lectures in numerical analysis. In July 1992 he joined the INFOCOM Department in the same university as a Researcher in Communications, where he currently works in the area of communications networks. His main research interests lie in the field of multiaccess network protocols, traffic modelling and performance evaluation in broadband communications and mobile networks.
Nicola Blefari-Melazzi received his 'Laurea' in electronics engineering in 1989 and his PhD in information and communications engineering in 1994, both from the University of Roma 'La Sapienza'. Since 1993 he has been with the Elettronica Department of the University of Roma at Tor Vergata as a Researcher in the communications network area. His main interests focus on broadband and multimedia communications, traffic control in ATM networks and performance evaluation of interconnection networks.
Francesca Cuomo received her 'Laurea' in electronics engineering from the University of Roma 'La Sapienza' in 1993. Since 1993 she has held a scholarship to work with the Telecommunication Networks Group at the Dip. INFOCOM of the University of Roma 'La Sapienza'. In 1994 she entered a three year PhD program at the same department. Her current interests focus on traffic control schemes in integrated services broadband networks and integration between the intelligent network and the B-ISDN for the support of multimedia communications.