Simulation Modelling Practice and Theory 12 (2004) 239–261 www.elsevier.com/locate/simpat
Modeling and simulation of data transmission on a hybrid fiber coax cable network
M. Garcia a,*, D.F. Garcia a, V.G. Garcia a, R. Bonis b
a Department of Computer Science, University of Oviedo, Office 1.2.15, Campus de Viesques, 33204 Gijon, Spain
b TeleCable, s.a., Scientific Park, 33206 Gijon, Spain
Received 9 April 2003; received in revised form 15 September 2003; accepted 29 October 2003 Available online 5 June 2004
Abstract
This paper describes the steps followed in the development, simulation, and later validation of a model for a cable network based on hybrid fiber-coax (HFC) technology, used for data transmission. This work presents a representation of a communication system which has been growing dramatically over recent years and will continue to do so in the near future. The modeling process is based on the analysis of measurements of a cable operator, establishing a direct relationship between model parameters and network characteristics. The modeling technique produces a scalable model capable of simulating the evolution of the real cable network.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Hybrid fiber-coax network; Traffic modeling; Network simulation
* Corresponding author. Tel.: +34-985-182519; fax: +34-985-181986. E-mail address: [email protected] (M. Garcia).
1569-190X/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.simpat.2003.10.005

1. Introduction
This paper describes the steps followed in the development of a simulation model for a cable network providing data transmission services. Cable networks have traditionally been associated with television broadcasts; however, over the last decade the explosive growth of the Internet has sparked an interest in alternatives to traditional telephone lines for Internet access. The advantages of cable networks over telephone lines are their broader bandwidth and their large number of subscribers. Their main disadvantage is the absence of a return path, which is necessary to make the cable network bidirectional. In consolidated
and widely extended cable networks, overcoming this disadvantage requires great investments, making them a less viable alternative as an Internet access medium. However, there is a new group of cable networks, based on hybrid fiber-coax (HFC) technology, which represents a real alternative. HFC technology improves network capacity and reliability, which facilitates the implementation of a return path. These characteristics permit a cable operator using this kind of network to act as a global telecommunication operator, providing television, voice and Internet services over the same network.

Currently, the main problems faced by companies using HFC technology are coping with the rapid growth in the number of subscribers, and the demand for new data services, such as interactive multimedia services. Cable operators must be able to predict how many users could demand these services simultaneously without affecting the performance of the network, and to determine the influence of new services on the HFC network. The best way to answer these and other questions is to use a model of the cable network which includes the new service requirements. At the same time, the cable network model can be used by the cable operator to support tuning and capacity planning decisions.

In this paper, the development of a simulation model for data transmission on an HFC cable network is described. This model represents a generic HFC architecture, and has been validated in use by a cable operator. The main characteristics of the cable network model are:
• It captures the evolution of network performance over time, rather than at a fixed moment.
• The performance provided by the model is directly related to the network capacity (the percentage of utilization of the network channels).
• The parameters of the model are obtained from network characteristics, i.e. the number of network subscribers and the traffic measurements.
• The relationships established between the traffic measurements and the number of subscribers assigned to each channel make the use of the model very simple.

This paper follows the development of the simulation model. Section 2 provides a general description of the system to be represented. Other related works are summarized in Section 3. The development of the cable network model is divided into two parts: Section 4 describes the way the cable network is used, that is, the traffic model, and Section 5 describes the cable network model. In Section 6, the results obtained are presented and compared with the results obtained from the real cable network. Finally, Section 7 summarizes the main characteristics of the developed model and the conclusions obtained.
2. Description of the system
The simulation model represents a generic cable network based on HFC technology. This technology combines traditional coax cables with fiber optics, establishing
a hierarchical architecture. The following is a description of this architecture from the bottom to the top.

The coax cable, also called the "last mile", is the part nearest to the subscribers. In the subscribers' homes, data services are accessed through a cable modem connected to a home PC. This cable modem contains the hardware to receive and transmit signals over the HFC network. It also negotiates access to the HFC network, determining the maximum speed at which it can transmit. Each coax cable is shared by several subscribers and ends at a local optical node. Each optical node supports between 200 and 300 subscribers, in an area of 400–800 m in diameter. In the local optical node the electrical signal transmitted by the coax cable is transformed into an optical signal, to be transmitted over optical fiber towards the head-end channel switch (HCX). This structure constitutes an HFC branch.

The next level in the hierarchical structure is the connection between the different branches. This connection is made using fiber trunks with different topologies.

The model developed is based on the data network of the cable operator TELECABLE, INC., one of the most important cable operators in Spain. This company, created in 1995, began by providing cable TV services; later, with the growth in the demand for telecommunications, it evolved and became a global provider. This operator provides TV, voice and data services in an area including three cities, with half a million potential subscribers. Fig. 1 shows the general architecture of its cable network.

The operator's network is organized following an architecture very similar to the generic architecture previously described. However, it presents particular characteristics related to the technology it implements. The network is also organized in HFC branches, as can be seen in Fig. 1, and there are several HFC branches in each city. All the HFC branches in all the cities are connected by an ATM backbone. In one of the cities, and also connected to the ATM backbone, is the main head-end, where the servers and the Internet access are located.
Fig. 1. Architecture of the cable network.
The cable modem used in the subscribers' homes transforms Ethernet packets into ATM cells and vice versa; it controls access to the channel using a proprietary protocol very similar to the DOCSIS (Data Over Cable Service Interface Specifications) protocol. Data transmission on the HFC branch is bidirectional; there are two kinds of channel with very different characteristics: the upstream channel, used by the subscribers to send data requests, and the downstream channel, through which the subscribers receive data. The downstream channel is shared by all the subscribers and has a bandwidth of 30 Mbps. On the other hand, there are up to six upstream channels in each HFC branch, each one with a bandwidth of 1.9 Mbps. Each subscriber is assigned to one of these upstream channels.

All the traffic originating in the HFC branch is sent towards the branch controller (HCX in Fig. 1). This element distinguishes between the traffic whose destination is in the same branch (internal traffic) and the rest of the traffic. The internal traffic is sent back, and the rest of the traffic is sent towards the ATM switch to reach its destination.

The model of the cable network developed in this work is based on the analysis of the traffic measurements taken on all the channels of this network. When the measurements were taken, the cable network providing data services had 17 HFC branches and 17,369 subscribers. These dimensions are adequate to obtain representative information. The cable network works on a 24 h "flat rate" with "best effort" quality of service.
3. Related work
The aim of the developed model is to evaluate the cable network performance, represented mainly by channel bandwidth requirements, as the time of day and number of subscribers change. The use of HFC architecture for data transmission has not been explored in any depth; the main studies in this field are related to standardization efforts for the proposal of an upstream media access control (MAC) protocol. Apart from these studies, only a few papers are devoted to the performance of HFC cable networks. The most representative works are: [1], concerning traffic modeling and analysis of a real HFC branch devoted to telephony applications; [2], an analysis of the support for multimedia applications using ATM technology in an experimental HFC network; and finally [3], in which the performance perceived by the subscriber for web browsing and interactive applications is evaluated using analytic modeling and simulation.

Like the first two of these three works, the model described in this paper is based on a real system, and it evaluates performance. The differences between this work and related projects, however, are significant: instead of just a part of the cable network, the whole network is represented; performance is represented as a percentage of channel utilization and throughput; the model parameters are related to the number of subscribers assigned by the cable operator to each channel, and to the time of day considered; and the simulation model built is scalable and can evolve as the real cable network does.
4. Traffic modeling
There are two aspects to be considered in the model development: the physical behaviour of the cable network, and the way the cable network is being used by the subscribers. The data requests sent by the subscribers through the upstream channels of the cable network constitute the network workload. This workload must be represented by a model: the workload model or traffic model. This section is devoted to the development of the traffic model. In the next section this traffic model will be incorporated into the physical model to define the complete cable network model.

4.1. Background
There are many studies of traffic models, as they have been developed in parallel with the evolution of telecommunication systems. These traffic models can be divided into two groups: those developed before the work of Leland et al. [4], and those developed after. The authors in [4] proved that data traffic on modern networks exhibits the statistical property of self-similarity, that is, the traffic looks statistically the same regardless of the time scale considered. Subsequent studies [5–8] show that self-similarity can be considered an inherent property of data traffic in modern telecommunication systems.

In the first group, referred to as traditional traffic models, traffic models were associated mainly with Poisson or Markov processes. A complete review of this kind of model can be found in [9].

The second group began in the early 90s, when it was shown that in order to account for traffic self-similarity and produce valid results, traffic models must be able to reproduce not only first and second order moments (mean and variance), but also the autocorrelation function. Some of the most important traffic models developed are: the PT λ model, the M/Pareto model and the N-Burst model.

The PT λ model [10] represents traffic using a combination of a power tail (PT) distribution (in this case a Pareto distribution) and an exponential (λ) distribution. Using this combination of distributions, it is possible to obtain self-similar traffic, while at the same time the effect of the exponential distribution improves model performance.

The M/Pareto model, developed by Addie et al. [11], is used in the representation of variable bit rate (VBR) traffic. This model considers a superposition of bursts. The number of bursts follows a Poisson distribution with parameter λ, each burst has a constant traffic rate r, and the length of the burst follows a Pareto distribution. The main disadvantage of this model is that it uses four parameters obtained from three statistical traffic properties, so there is one degree of freedom left undetermined. This indeterminacy makes the process of parameter adjustment very complex.

The N-Burst model [12] is another traffic modeling alternative based on an ON/OFF process. During the ON period the system transmits and in the OFF period the system is silent. The N-Burst model considers the superposition of N sources of type ON/OFF. By selecting the distribution of the ON period, it is possible to produce
self-similar traffic. This model also considers an important parameter known as the "burstiness" parameter, which defines the relationship between the length of the ON and OFF periods.

All these models have two common characteristics: firstly, they consider homogeneous traffic, that is, traffic produced by only one kind of source or application, and secondly, the traffic model parameters are obtained from real traffic analysis, but with no relationship to the characteristics of the telecommunication system evaluated. This situation results in simplified traffic models that are difficult to apply to real telecommunication systems, because real traffic is aggregated and there is no clear correspondence between system characteristics and traffic model parameters.

The traffic model developed here to represent the traffic on the HFC cable network overcomes these limitations by considering the aggregated traffic on the HFC network, and by relating the traffic model parameters directly to the HFC network characteristics. The parameters of this traffic model have been related to the number of subscribers assigned by the cable operator to each channel, and to the time of day.

4.2. Traffic model construction
The objective of the traffic model is to represent the way in which the HFC cable network is used by subscribers. The traffic model distinguishes between the traffic on the upstream and downstream channels. The upstream traffic consists of the requests sent by the subscribers. The downstream traffic is produced as a response to these requests. Thus, the traffic model focuses on the upstream traffic, whereas the downstream traffic is obtained from the upstream traffic modified by a random factor. The distribution, mean value and standard deviation of this factor are calculated from the traffic analysis.
The traffic model is based on the analysis of the traffic measurements taken on all the channels of the cable network, the number of subscribers assigned to each channel, and the measurements of the IP address assignment to subscribers by the Dynamic Host Configuration Protocol (DHCP) server. These measurements were taken over two different periods of time: January 2001 and January 2002. In the interim, the cable network evolved from 8 to 17 branches and from 8331 to 17,369 subscribers. In spite of the changes in the cable network, the basic relationships found in the analysis are consistent over time. An extended analysis of the measurements can be found in [13].

In order to reproduce the real traffic values on the upstream channels the traffic model must consider three elements: the upstream channel characteristics, the media access control (MAC) protocol, and the user profile (the way in which subscribers demand service from the HFC network). In the following subsections the influence of each element on the development of the model is described.

4.2.1. Upstream channel
Each upstream channel is represented as a shared medium with a transmission payload of 1.5525 Mb/s. This is so because the HFC operator does not apply quality of service differences. As a result, all subscribers receive the "best effort" quality of service, that is, all of them compete for the channel.
4.2.2. Cable modem and MAC protocol
Each subscriber connects to the cable network through a cable modem, and communication is controlled by the MAC protocol. The communication on the upstream channel uses a proprietary protocol, whose main characteristics are:
• Access to the channel is organized in frames, which are repeated continuously; a frame is sent across the upstream channel every 102.4 ms.
• The frame is composed of 512 ATM cells, of which only 414 are devoted to data transmission. The rest are used for communication requests, opportunities to join the channel, synchronization, conflict resolution and control.
• Of the 414 useful ATM cells, each cable modem can use a maximum of 63 cells in each frame. This maximum value can be controlled by the cable operator depending on the type of contract offered.

Using this protocol, cable modems transmit requests towards the controller. Although subscriber contracts with different bandwidths are possible, on the dates when the measurements were taken all the subscribers had the same assigned bandwidth: 128/64 kb/s, that is, 128 kb/s for the downstream channel and 64 kb/s for the upstream channel. This bandwidth in the upstream channel corresponds to a maximum transmission value of 16 ATM cells per frame for each subscriber. The effective value is reduced as the number of subscribers using the network increases, because under the "best effort" quality of service the scheduling algorithm provides a fair allocation of bandwidth among all cable modems.

The way in which the information is sent produces a burst effect on the channel. Each cable modem works as an ON/OFF process. During its assigned time in the frame, the ON period, it can send up to 16 ATM cells. Then it must wait until the next frame for a new transmission; this waiting time is the OFF period. This pattern is repeated for all cable modems, so in each frame there are sequences of used cells followed by groups of unassigned or unused cells. The pattern is similar for each frame, and is modified as the number of subscribers on the channel changes. The global effect on the channel is a sequence of ON/OFF sources, as can be seen in Fig. 2.

4.2.3. Subscriber profile
This is the most relevant aspect in the development of the traffic model, because the diversity of subscribers' behavior is represented with a reduced set of parameters. Two factors must be considered: when and how the subscribers use the network.
Fig. 2. ON/OFF effect in the upstream channel. (Diagram: consecutive protocol frames, each containing ON periods separated by OFF periods.)
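The frame figures quoted in Section 4.2.2 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original model. It assumes that rates are counted over the 48-byte payload of each 53-byte ATM cell (the paper does not say which is used), and it treats the "fair allocation" as an idealized equal split of the 414 data cells. The function names `rate_bps` and `fair_share_cells` are hypothetical.

```python
# Frame constants quoted in Section 4.2.2.
FRAME_PERIOD_S = 0.1024        # one frame every 102.4 ms
CELLS_PER_FRAME = 512          # total ATM cells in a frame
DATA_CELLS_PER_FRAME = 414     # cells available for data
MAX_CELLS_PER_MODEM = 16       # per-modem limit for the 128/64 contract

# Assumption (not stated in the paper): only the 48-byte ATM cell
# payload is counted, not the 5-byte header.
ATM_PAYLOAD_BYTES = 48


def rate_bps(cells_per_frame):
    """Bit rate obtained by sending `cells_per_frame` cells in every frame."""
    return cells_per_frame * ATM_PAYLOAD_BYTES * 8 / FRAME_PERIOD_S


def fair_share_cells(active_modems):
    """Cells per frame for each modem under an idealized equal split.

    Assumption: the scheduler divides the data cells evenly among the
    modems that are transmitting, capped at the contract limit.
    """
    if active_modems <= 0:
        raise ValueError("active_modems must be positive")
    return min(MAX_CELLS_PER_MODEM, DATA_CELLS_PER_FRAME / active_modems)


if __name__ == "__main__":
    print("channel payload (b/s):", rate_bps(DATA_CELLS_PER_FRAME))
    print("full frame (b/s):", rate_bps(CELLS_PER_FRAME))
    print("per-modem limit (b/s):", rate_bps(MAX_CELLS_PER_MODEM))
    for n in (10, 25, 26, 50):
        print(n, "active modems ->", fair_share_cells(n), "cells/frame each")
```

Under this payload assumption, 414 data cells per frame give the 1.5525 Mb/s payload quoted in Section 4.2.1, and 16 cells per frame give about 60 kb/s per modem, slightly below the nominal 64 kb/s contract. With an equal split, the per-modem limit starts to shrink once more than 25 modems transmit in the same frame.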
4.2.3.1. When subscribers access the cable network.
The total number of subscribers connected to the cable network is known at any given moment from the measurements taken on the DHCP server. The DHCP server assigns an IP address on demand to any subscriber requesting access to the cable network. Thus, the DHCP server provides the global evolution of the number of connected subscribers, and can distinguish between IP addresses belonging to one of the cities and the group of the other two cities.

Fig. 3 represents the percentage of the total connected subscribers over time for the cities where the cable operator provides services. Both lines show approximately the same evolution, in spite of the difference in the total number of subscribers in each city. This pattern has been observed in all the sub-networks managed by the cable operator.

Fig. 3. Evolution of connected subscribers to the network. (Plot: connected subscribers (%) against time of day, for City-1 and Cities 2+3.)

These measurements support the assumption that the number of connected subscribers on all the upstream channels follows a similar pattern to that observed in the DHCP measurements. Thus, using the DHCP measurements, the connection pattern can be synthesized by applying Fourier analysis. The expression obtained is then used to determine the number of connected subscribers at each moment, for each upstream channel.

The evolution of the number of connected subscribers is reproduced in the traffic model following these steps:
(1) The number of connected subscribers in each sample period of the DHCP measurements is expressed as a percentage of the total number of subscribers.
(2) The values obtained are then represented by a Fourier series.
(3) In order to reduce the number of model parameters, the Fourier series used to synthesize the subscribers' evolution is truncated to 90% of the spectral power. This percentage represents a tradeoff between the number of coefficients used and the error incurred. The expression obtained is:

$$x[i] = \sum_{k=0}^{p} \operatorname{Re}X[k]\,\cos(2\pi k i/N) + \sum_{k=0}^{p} \operatorname{Im}X[k]\,\sin(2\pi k i/N) \qquad (1)$$

where Re X and Im X are the real and imaginary coefficients of the Fourier analysis, N is the number of elements in the Fourier analysis, and p is the number of terms needed to synthesize the function with 90% of the spectral power.
(4) Using Eq. (1), the percentage of connected subscribers is calculated for each sample period, i.
(5) Multiplying the number of subscribers assigned to each upstream channel by the percentage obtained in the previous step gives the number of connected subscribers.
(6) An analysis of the DHCP measurements shows that there are slight variations in the number of connected subscribers at the same time on different days. These variations follow a normal distribution, and have been included in the result obtained. The final number of connected subscribers is obtained from a normal distribution, which has as its mean the number of connected subscribers obtained in the previous step, and a standard deviation of 5% of the mean value.

4.2.3.2. How subscribers demand services.
The second step in the development of the subscriber profile is to determine how the subscribers demand services from the network. The subscriber profile is estimated from the traffic measurement analysis, which confirms the following assumptions. Firstly, the mean and peak traffic on the upstream channels are proportional to the number of subscribers assigned to the channel, which means that traffic can be considered nearly homogeneous among the subscribers. Secondly, traffic can be divided into two types: interactive and non-interactive traffic. Interactive traffic increases as the number of connected subscribers increases; this traffic is associated with applications which require human interaction, for example web browsing. On the other hand, non-interactive traffic remains almost constant although the number of connected subscribers varies, and is associated with more permanent applications, for example peer-to-peer services.

For the first assumption, Fig. 4 shows the graphs for the mean values and peak values of the upstream traffic, related to the number of subscribers assigned to each upstream channel.
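Steps (1) to (6) of Section 4.2.3.1 can be sketched in a few functions. This is a minimal illustration and not the authors' implementation. It makes several assumptions the paper does not state: the coefficients of Eq. (1) are scaled so that keeping every term reproduces the samples exactly, the "spectral power" excludes the constant term and counts each harmonic as half its squared amplitude, and the random draw is clamped to the number of assigned subscribers. All function names are hypothetical.

```python
import math
import random


def fourier_coefficients(samples):
    """Cosine and sine coefficients (Re X, Im X) for k = 0..N//2.

    Scaling is an assumption: it is chosen so that Eq. (1) rebuilds the
    samples exactly when all terms are kept.
    """
    n = len(samples)
    half = n // 2
    re, im = [], []
    for k in range(half + 1):
        cos_sum = sum(x * math.cos(2 * math.pi * k * i / n)
                      for i, x in enumerate(samples))
        sin_sum = sum(x * math.sin(2 * math.pi * k * i / n)
                      for i, x in enumerate(samples))
        is_edge = k == 0 or (n % 2 == 0 and k == half)
        scale = n if is_edge else n / 2
        re.append(cos_sum / scale)
        im.append(sin_sum / scale)
    return re, im


def truncation_order(re, im, fraction=0.90):
    """Smallest p whose harmonics 1..p hold `fraction` of the power.

    Assumption: the constant term (k = 0) is left out of the power tally.
    """
    power = [0.5 * (a * a + b * b) for a, b in zip(re, im)]
    total = sum(power[1:])
    if total == 0.0:
        return 0
    running = 0.0
    for k in range(1, len(power)):
        running += power[k]
        if running >= fraction * total:
            return k
    return len(power) - 1


def connected_percentage(re, im, p, i, n):
    """Eq. (1): percentage of connected subscribers at sample period i."""
    return sum(re[k] * math.cos(2 * math.pi * k * i / n)
               + im[k] * math.sin(2 * math.pi * k * i / n)
               for k in range(p + 1))


def connected_subscribers(assigned, percentage, rng=random, rel_sd=0.05):
    """Steps (5) and (6): scale by the channel size, then add normal noise."""
    mean = assigned * percentage / 100.0
    draw = rng.gauss(mean, abs(rel_sd * mean))
    # Clamping to [0, assigned] is an added safeguard, not from the paper.
    return min(max(draw, 0.0), float(assigned))
```

A typical use would compute the coefficients once from the DHCP percentages, pick `p` with `truncation_order`, and then call `connected_percentage` and `connected_subscribers` for each upstream channel as simulated time advances.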
For each graph the points are fitted by a linear regression model, whose equation is also shown in the graph. In the graphs two dotted lines mark the limits of the confidence interval for the predictions of the regression model, for a 95% level of confidence. This kind of relationship has also been observed on the downstream channels, and it is constant over time. This behaviour was observed both in the 2001 and in the 2002 measurements.

Fig. 4. Relationship of mean and peak traffic with number of subscribers assigned to the upstream channels: (a) mean upstream traffic values, (b) peak upstream traffic values. (Plots: channel bandwidth utilization (%) against number of subscribers on the channel; fitted lines (a) y = 0.1119x - 0.4076, R² = 0.5513, and (b) y = 0.2201x + 11.811, R² = 0.654.)

The second assumption is based on the traffic profile on the upstream channels, where a continuous level of traffic is observed throughout the day. The existence of two types of traffic has been confirmed by the analysis of the relationship between traffic and connected subscribers, and by the traffic measurements collected in the Internet access router.

From the DHCP measurements, the number of connected subscribers in each group of cities is known. The aggregated upstream traffic in each group is calculated by adding the traffic of all the upstream channels which belong to that group. Dividing the traffic by the number of connected subscribers provides the traffic per subscriber metric. Fig. 5 shows the evolution of this measurement over time: in periods of high traffic and high number of connected subscribers it remains almost constant, while in periods of low activity it exhibits a peak. This behaviour confirms that during the period of human activity interactive traffic dominates, while in the periods of low activity non-interactive traffic is more significant.

Fig. 5. Evolution of traffic per subscriber. (Plot: channel utilization per subscriber (%) against time of day; curves for traffic, traffic per subscriber and connected subscribers.)

The analysis of the traffic measurements taken on the Internet access router, broken down into network services, has also shown that the services can be classified into two types of patterns. The most important volume of traffic is associated with the
"peer to peer" application eDonkey, which represents 53.35% of the total upstream traffic. The traffic pattern associated with this application has an almost constant profile throughout the day. On the other hand, the traffic profile of most of the services changes over time, as the number of connected subscribers does.

The conclusion is that interactive traffic occurs mainly during the period of human activity; it evolves with the number of subscribers, and the impact of each subscriber can be considered constant. On the other hand, traffic during the period of human inactivity is of the non-interactive type, and a reduced number of subscribers generate a high volume of traffic.

The percentages of traffic belonging to each type are calculated by assuming that, during the period of human inactivity, all traffic is non-interactive. Thus, on each upstream channel we obtain a percentage of non-interactive traffic by averaging the traffic values between 5:35 a.m. and 6:30 a.m. Later, all the values obtained were plotted against the peak-to-mean traffic rate. This index was chosen because the greater the peak-to-mean rate, the higher the number of interactive subscribers. Fig. 6 shows the graph of non-interactive traffic against the peak-to-mean rate; the points are fitted by a potential (power-law) model whose equation and coefficient of determination are shown on the graph.

Fig. 6. Potential model which adjusts the percentage of non-interactive traffic. (Plot: non-interactive channel utilization (%) against the rate peak traffic/mean traffic; fitted curve y = 44.579x^(-3.312), R² = 0.5063.)

Fig. 7. Traffic estimation process based on the number of subscribers on the channel. (Block diagram: the number of subscribers on the channel feeds the mean traffic model and the peak traffic model; their estimates feed the non-interactive traffic model; together with the subscriber evolution function these yield the non-interactive traffic, interactive traffic and connected subscribers estimations.)

Considering the above assumptions and relationships, the traffic model parameters are calculated as follows:
(1) The mean traffic on each upstream channel is obtained from the relationship shown in Fig. 4(a):

$$\mathrm{Mean} = 0.1119 \cdot \mathrm{Subs} - 0.4076 \qquad (2)$$

(2) The peak traffic on each upstream channel is obtained from the relationship shown in Fig. 4(b):

$$\mathrm{Peak} = 0.2201 \cdot \mathrm{Subs} + 11.811 \qquad (3)$$

(3) The peak-to-mean rate is obtained by dividing the values obtained from Eqs. (3) and (2). From this value, the percentage of non-interactive traffic is estimated, using the relationship shown in Fig. 6:

$$\mathrm{Nint} = 44.579 \left( \frac{\mathrm{Peak}}{\mathrm{Mean}} \right)^{-3.312} \qquad (4)$$

(4) The interactive traffic is calculated as a percentage of interactive traffic per subscriber. Thus, the value of interactive traffic is obtained as the difference between the mean traffic, calculated with Eq. (2), and the non-interactive traffic, from Eq. (4). Then, the mean number of effective connected subscribers on the channel is calculated. Effective subscribers are the number of connected subscribers minus the non-interactive subscribers, that is, the interactive subscribers:

$$\mathrm{Int} = \frac{\mathrm{Mean} - \mathrm{Nint}}{(\mathrm{perc} - \mathrm{perc}_{\min}) \cdot \mathrm{Subs}} \qquad (5)$$
where perc and perc_min are the mean and minimum values of the connection evolution function, and Subs is the number of subscribers on the channel.

All the traffic model parameters are obtained from a known parameter: the number of subscribers assigned to each channel. Fig. 7 summarizes the calculation process described and the relationships between the traffic measurements considered.

4.3. Traffic model implementation
The traffic model has been implemented using the modeling and simulation language QNAP2.¹ This language is based on the queuing network paradigm and uses discrete-event simulation. Thus, the different elements of the traffic model have been represented by queues with complex services.

¹ QNAP2 was developed by INRIA, and is a trademark of SIMULOG.

The non-interactive traffic is represented by a source. This element produces the percentage of non-interactive traffic calculated in Eq. (4). At this source, it is necessary to determine the size of the requests and the time between requests in order to generate the percentage of non-interactive traffic. Considering the type of applications which produce non-interactive traffic, and in accordance with [8], the size of the non-interactive requests follows a Pareto distribution. The mean value of the Pareto distribution is chosen as the maximum traffic that a cable modem can send in a second, approximately 7500 bytes. The inter-request time is calculated to reach the required traffic volume.

The interactive traffic is generated by an infinite server sending requests from the active subscribers. In this case the inter-request time is fixed at 1 s, and the request size necessary to maintain the traffic rate per subscriber is obtained. The size of the requests follows a normal distribution, taking the obtained value as the mean and a standard deviation of 20%.

The upstream communication is represented by two servers. One of them implements the upstream channel, and the other considers the influence of the MAC protocol. Together, they constitute a load-dependent server, whose service time depends on the number of ATM cells assigned to each cable modem by the MAC protocol under the "best effort" quality of service.

This traffic model has been used to simulate the behaviour of each upstream channel of the cable network. The only parameter it requires is the number of subscribers assigned to the channel; the model parameters are calculated from it using the established relationships. As the simulation time evolves, the number of connected subscribers is recalculated following the connection evolution function. The result of the simulation is the estimated traffic profile in each upstream channel. In Section 6, these results are compared with the real values in order to validate the traffic model, concluding that the traffic model produces a traffic profile which is statistically indistinguishable from the real traffic at a 95% level of confidence.
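Because every parameter derives from the number of subscribers on the channel, the chain of Eqs. (2) to (5) and the sizing of the two sources in Section 4.3 can be sketched as one calculation. The code below is illustrative and is not the QNAP2 implementation. It assumes that `perc` and `perc_min` are fractions between 0 and 1, that utilization percentages refer to the 1.5525 Mb/s payload, and that the non-interactive inter-request time is simply the mean request size divided by the required rate. The function names are hypothetical.

```python
PAYLOAD_BPS = 1_552_500            # upstream payload, 1.5525 Mb/s (Section 4.2.1)
NONINTERACTIVE_MEAN_BYTES = 7500   # mean Pareto request size (Section 4.3)
INTERACTIVE_PERIOD_S = 1.0         # fixed interactive inter-request time


def traffic_parameters(subs, perc, perc_min):
    """Eqs. (2)-(5): utilization percentages from the subscriber count.

    Assumption: perc and perc_min are given as fractions (0..1).
    """
    mean = 0.1119 * subs - 0.4076                  # Eq. (2)
    peak = 0.2201 * subs + 11.811                  # Eq. (3)
    if mean <= 0:
        raise ValueError("too few subscribers for the regression to apply")
    nint = 44.579 * (peak / mean) ** -3.312        # Eq. (4)
    effective = (perc - perc_min) * subs           # interactive subscribers
    if effective <= 0:
        raise ValueError("perc must exceed perc_min")
    int_per_sub = (mean - nint) / effective        # Eq. (5)
    return {"mean": mean, "peak": peak, "nint": nint,
            "int_per_sub": int_per_sub}


def source_settings(params):
    """Source sizing under the assumptions stated in the lead-in.

    Returns the non-interactive inter-request time in seconds and the
    mean interactive request size in bytes per subscriber.
    """
    nint_bps = params["nint"] / 100.0 * PAYLOAD_BPS
    inter_request_s = NONINTERACTIVE_MEAN_BYTES * 8 / nint_bps
    int_bps = params["int_per_sub"] / 100.0 * PAYLOAD_BPS
    interactive_bytes = int_bps * INTERACTIVE_PERIOD_S / 8
    return inter_request_s, interactive_bytes


if __name__ == "__main__":
    p = traffic_parameters(subs=200, perc=0.40, perc_min=0.10)
    print(p)
    print(source_settings(p))
```

For a channel with 200 assigned subscribers, Eqs. (2) and (3) give a mean utilization of about 22% and a peak of about 56%. The values of `perc` and `perc_min` in the example are made up for illustration and are not taken from the measurements.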
5. Cable network model development
In this section the traffic model is joined to a physical description of the HFC network to produce the global model of the cable network. The proposed model represents a cable network with an architecture like that shown in Fig. 1, validated for the case study. This model represents a complex system, including the behaviour of more than 17,000 subscribers and 17 HFC branches with 17 downstream and 102 upstream channels. In order to deal with this complexity, a hierarchical approach is used. In the first step, a model for a simple HFC branch is developed. Later, several simple HFC branch models are combined and joined to other elements to represent the whole cable network.
5.1. A simple HFC branch model
A simple HFC branch of the cable network can be represented using queuing elements as shown in Fig. 8. This HFC model includes six upstream channels, each of which is represented by an upstream traffic model (the stations enclosed in the dotted line in Fig. 8) to be applied to the cable network. Using the facilities of the QNAP2
Fig. 8. Queuing model for a simple HFC branch (stations: Subscribers x 6 issuing non-interactive and interactive requests, Upstream, HCX, Exterior, Downstream; output: HFC branch OUT).
language, the traffic model is encapsulated in a simulation object, which receives the number of subscribers assigned to the upstream channel as its parameter. In this way, the simplicity and independence of the model are improved. The downstream queue represents the downstream channel, the HCX queue represents the HFC controller and the exterior queue represents the rest of the cable network. The downstream and HCX stations work as time-shared queues whose service time depends on the channel capacity for the downstream queue, and on the HFC controller specifications for the HCX queue.
The most important element in this model is the exterior queue, which calculates the size of the responses to the subscribers' requests. These responses make up the downstream traffic. The sizes of the responses are obtained from the rate (that is, the ratio) between downstream and upstream traffic measurements. Thus, the size of each response is calculated by multiplying the size of each subscriber request (obtained from the traffic model) by the rate. The rate for each simple HFC branch is calculated from the downstream traffic on the branch and the aggregated traffic of all the upstream channels in the branch.
Different rates are obtained for the two types of traffic. The non-interactive rate is the rate between downstream and upstream samples belonging to periods of low activity (5:35 a.m. to 6:30 a.m.). These samples give a mean value and a standard deviation for the non-interactive rate in each HFC branch. For the interactive traffic rate, the samples from periods of high activity (between 9:00 p.m. and 11:00 p.m.) are considered. The interactive rate is calculated using the expression:
\[ r = \frac{\mathrm{Down}_{\mathrm{high}} - \mathrm{Down}_{\mathrm{low}}}{\mathrm{Up}_{\mathrm{high}} - \mathrm{Up}_{\mathrm{low}}} \tag{6} \]
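As a numerical illustration, Eq. (6) can be read as the interactive downstream traffic (high-activity level minus low-activity level) divided by the interactive upstream traffic obtained in the same way. The Python sketch below applies this reading to a few hypothetical sample groups and derives the mean and standard deviation of the rate; the sample values and the function names are assumptions made only for illustration.

```python
from statistics import mean, stdev


def interactive_rate(down_high, down_low, up_high, up_low):
    """Eq. (6): interactive downstream traffic over interactive upstream traffic."""
    up_interactive = up_high - up_low
    if up_interactive <= 0:
        raise ValueError("upstream high-activity traffic must exceed the low level")
    return (down_high - down_low) / up_interactive


def response_size(request_size, rate):
    """Size of a response: the request size multiplied by the rate."""
    return request_size * rate


# Hypothetical sample groups in the same traffic units:
# (Down_high, Down_low, Up_high, Up_low)
samples = [
    (9.0, 3.0, 4.0, 1.5),
    (8.5, 2.9, 3.9, 1.4),
    (9.4, 3.1, 4.2, 1.6),
]
rates = [interactive_rate(*group) for group in samples]
rate_mean, rate_std = mean(rates), stdev(rates)
```

In a simulation run, each response would then be sized with a rate drawn around `rate_mean` with spread `rate_std`, rather than with a single fixed value.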
This expression calculates the rate between interactive traffic on both kinds of channels, downstream (Down) and upstream (Up). The interactive traffic on both channels is obtained as the difference between the maximum traffic (high) and the
minimum or non-interactive traffic (low). Applying Eq. (6) to each group of samples, several values for the rate are obtained. These values produce a mean value and a standard deviation for the interactive traffic rate. In this way, the simple HFC branch model reproduces the upstream traffic using the traffic model previously developed, and the exterior queue generates the traffic on the downstream channel of each HFC branch.
5.2. The cable network model
Using the modeling language facilities, the simple HFC branch model is encapsulated in a simulation object. This object is the basic element for building the global cable network model. Thus, the global cable network model contains as many HFC branch objects as there are HFC branches in the cable network. The cable network model is completed with a queue which represents the ATM backbone, which interconnects all the HFC branches, and a special network branch which represents the cable operator head-end. This modeling strategy produces a scalable model, which adapts easily to the cable network evolution. Fig. 9 shows a scheme of the whole cable network model.
The most important aspect of this model is the way the traffic is distributed in the cable network. There are several kinds of traffic: internal, local, and external. Internal traffic is that established between subscribers in the same HFC branch. It is filtered by the HCX element and does not migrate to the rest of the cable network. This traffic is very limited and can be considered negligible. Local traffic is the traffic established between subscribers in different HFC branches, and external traffic is
Fig. 9. Queuing model for the cable network (HFC branches 1 to N, each with Subscribers x 6, Upstream, HCX and Downstream stations; ATM backbone; head-end branch with ATM switch, Router 1, Router 2, Servers, Internet access 1 and 2, and Internet).
the traffic sent by the subscribers towards the head-end network branch. The external traffic in the head-end can go towards the servers or towards the Internet.
The volume of each kind of traffic is estimated using the available measurements. First, the sum of all upstream traffic is computed, that is, the traffic sent through all the upstream channels and directed towards the ATM backbone. This traffic comprises both local and external traffic. This global upstream traffic is then compared with the traffic registered in the Internet access router (Router 2 in Fig. 9). The evolution of both traffic flows is shown in Fig. 10. The graph shows that the traffic profiles in both parts of the network are almost the same. This indicates that the majority of the HFC cable network traffic is sent towards the Internet. The small differences between the two lines are due to the local traffic and the traffic towards the cable network server. To evaluate the volume of server traffic, the server log files for the same dates as the traffic measurements were analyzed. The average traffic value on the server was found to be 0.23 Mbps, far below the values registered in Fig. 10. This traffic is shared between the cable network subscribers (6.35% of requests and 4.41% of traffic volume) and external requests made from the Internet. The conclusion is that the majority of upstream traffic on the cable network is traffic towards the Internet; the local traffic represents between 1% and 2%, and the server traffic between 0.5% and 3%.
The downstream traffic associated with the responses to the subscribers is calculated as in the simple HFC branch model, but in this case the rate between the traffic entering the cable network from the Internet and the traffic leaving the cable network towards the Internet is considered; both are measured in the Internet access router (Router 2 in Fig. 9).
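The traffic split estimated above could be used to route simulated requests, as in the following minimal Python sketch. The shares of local and server traffic are assumed values chosen inside the reported ranges (1% to 2% and 0.5% to 3%), and the remainder is assigned to the Internet. The names TRAFFIC_SHARES and pick_destination are hypothetical and do not correspond to elements of the QNAP2 model.

```python
import random

# Assumed shares, picked inside the ranges reported in the text.
TRAFFIC_SHARES = {"local": 0.015, "server": 0.02}
# Everything that is neither local nor server traffic goes to the Internet.
TRAFFIC_SHARES["internet"] = 1.0 - sum(TRAFFIC_SHARES.values())


def pick_destination(rng, shares=TRAFFIC_SHARES):
    """Choose the destination of one upstream request according to the shares."""
    names = list(shares)
    weights = [shares[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
counts = {name: 0 for name in TRAFFIC_SHARES}
for _ in range(10000):
    counts[pick_destination(rng)] += 1
```

Internal traffic is left out of the sketch because it is filtered by the HCX element and considered negligible.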
As in the HFC branch model, a rate for each kind of traffic, interactive and non-interactive, must be obtained. For non-interactive traffic, the continuous part of the traffic is considered. This kind of traffic is produced by peer-to-peer applications, so the rate between the incoming and outgoing traffic produced by peer-to-peer applications gives the effect of non-interactive traffic. This rate has a mean value of 2.44 and a standard deviation of 0.21. In the case of interactive traffic, the continuous traffic is subtracted from the total traffic, for both the incoming and the outgoing traffic. The rate between the two resulting traffic flows gives the influence of the interactive traffic. This rate results in a mean value of
Fig. 10. Upstream and outgoing router traffic comparison (series: traffic towards Internet and aggregated upstream traffic; vertical axis: MBytes per second; horizontal axis: time of day).
2.23 and a standard deviation of 0.24. From the analysis of the traffic measurements, the traffic rates have been found to follow an approximately normal distribution.
The final cable network model generates requests in all the upstream elements, each of them represented by a traffic model, and in this way generates the upstream traffic on all the channels. This traffic is sent across the ATM backbone element mainly to the head-end branch, but also to the other HFC branches. In the head-end branch, the traffic is directed towards the server or Internet elements. In these elements the response traffic is calculated and sent back after a delay which represents the information access time. This traffic is sent back to the HFC branches through the downstream channels.
5.3. Model extensions
The cable network model has been developed based on the characteristics of TELECABLE INC.; however, it can be extended to other cable network architectures. The versatility of the model is a result of its modular structure and the definition of simulation objects which encapsulate some of the modules. The first module is formed by the traffic model. This model represents a complete upstream channel: the subscribers' requests, the MAC protocol, etc. This traffic model is encapsulated in the upstream simulation object, which receives as its parameter the number of subscribers assigned to the channel. The second module is constituted by the HFC branch model. This model represents the basic elements of an HFC branch, namely the downstream channel and the HFC controller, and defines as many upstream objects as there are upstream channels on the HFC branch. Finally, this model is encapsulated to build the HFC branch simulation object.
The modular nature of the model simplifies the changes needed to evaluate new working conditions. The following are some examples of possible working conditions that can be analyzed and their associated changes:
• If the proprietary MAC protocol were changed to the DOCSIS protocol, the change would require modifying the station associated with the MAC protocol in the traffic model to implement the characteristics of the DOCSIS protocol. Thus the changes would be included in the upstream simulation objects and incorporated into the cable network model.
• The existence of subscribers with different qualities of service can be included in the model by defining new classes of customers in the queuing model. Each class of customers would have a distinct service in the model stations.
• A similar situation occurs when the behaviour of a particular application on the cable network is to be studied. The definition of a new class with a different service will provide information about the studied application.
• Finally, the number of upstream channels on each HFC branch can be modified directly from a network description file. This file initializes the cable network model by defining the number of HFC branch objects, their associated upstream channels and the number of subscribers assigned to each upstream channel.
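The role of the network description file in the last item can be sketched as follows. The file format shown here (one branch per line, followed by the subscriber count of each upstream channel) is purely hypothetical, as are the class names, the branch names and the counts; the actual model is built from QNAP2 simulation objects, whose syntax is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpstreamChannel:
    subscribers: int  # the only parameter received by the upstream object


@dataclass
class HFCBranch:
    name: str
    upstream: List[UpstreamChannel] = field(default_factory=list)

    @property
    def subscribers(self) -> int:
        """Total number of subscribers assigned to the branch."""
        return sum(channel.subscribers for channel in self.upstream)


def parse_network(text: str) -> List[HFCBranch]:
    """Parse the hypothetical format 'name: n1 n2 ...', one branch per line."""
    branches = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, counts = line.split(":", 1)
        channels = [UpstreamChannel(int(c)) for c in counts.split()]
        branches.append(HFCBranch(name.strip(), channels))
    return branches


DESCRIPTION = """
# hypothetical branch names and subscriber counts
branch01: 180 175 160 170 165 150
branch02: 200 190 185 195 180 170
"""
network = parse_network(DESCRIPTION)
```

Under this layout, adding an upstream channel or a whole HFC branch amounts to editing one line of the description, which mirrors the scalability argument made for the model.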
Currently, the cable network model is used for the tuning and capacity planning of the bandwidth allocation between channels, because the model provides information over time rather than a fixed snapshot of the network situation. Using this model, the cable operator can make decisions about network growth, such as the inclusion of a new HFC branch or of new upstream channels. Other direct applications of the model are the study of the impact on the network of new services, such as multimedia interactive services, and of the performance obtained by the subscribers using these new services.
6. Results and model validation
The main result provided by the cable network model is the traffic, expressed as a percentage of channel utilization, in all the network channels: upstream and downstream channels, backbone link and Internet access channels. The model also allows the cable operator to obtain information about traffic throughput and the utilization of network devices. Some of these results can be directly compared with the values of the real cable network, and so the model can be validated.
The cable network model was developed using a hierarchical approach: first the upstream traffic model was developed, then it was included in the HFC branch model, which is the basic element of the cable network model. The model obtained at each stage was validated in order to improve the quality of the results obtained. The validation consists of three parts: traffic profile comparison, comparison using confidence intervals, and a comparison of the values obtained for some statistical properties (autocorrelation function and self-similarity coefficient). The validation based on confidence interval comparison is described in [14]. This method is based on defining a difference series (n = Real − Model) and calculating its confidence interval; if the calculated confidence interval includes zero, both series are statistically indistinguishable.
The first component to be validated is the upstream traffic model. Fig. 11 shows two examples of the comparison of traffic profiles on the upstream channels. Applying the comparison method based on confidence intervals to all the upstream channels, the interval obtained for a 95% level of confidence is [−11.33, 13.55], which includes zero. The statistical properties compared are the autocorrelation function and the coefficient of self-similarity. The agreement of the autocorrelation functions ranges from a perfect fit to a slight difference in the lower half of the graph for the worst cases; Fig. 12 shows the comparison of the autocorrelation function for the two upstream channels of Fig. 11. In the case of self-similarity, the values of the Hurst coefficient are very close. The difference between values is lower than 10% except in two cases, and the mean difference is 2.77%. In conclusion, the upstream traffic generated by the model is a statistically equivalent approximation to the real traffic on the upstream channels.
In the case of the HFC branch model, the downstream traffic obtained was also validated using the same methods; all the methods confirm the validity of the simple
Fig. 11. Examples of traffic comparisons on upstream channels: (a) channel GI01CC02-UP7, (b) channel GI02CC01-UP8 (series: Real and Model; vertical axis: channel utilization (%); horizontal axis: time of day).
HFC branch model. The results are not shown because the simple HFC branch model is an intermediate step towards the final cable network model.
The cable network model can be validated for the traffic on the upstream and downstream channels and for the traffic registered through the router which controls the access to the Internet. The results obtained for the upstream channels are the same as for the traffic model (Fig. 11). For the downstream traffic, Fig. 13(a) shows the comparison between the real and the simulated traffic profiles, giving an example of the level of approximation that can be obtained. The comparison based on confidence intervals provides more information about the level of agreement between the model and the real cable network. The confidence interval obtained on all the channels for a level of confidence of 95% includes zero ([−6.328, 14.159]), which means that both systems are statistically indistinguishable. The comparison of the autocorrelation function and the self-similarity shows insignificant differences in both cases. Fig. 13(b) depicts the worst fit among the autocorrelation functions in the downstream channels. The relative error of the self-similarity coefficients has a mean of 2.59% and a worst case of 10.59%.
Fig. 12. Comparison of autocorrelation coefficients: (a) channel GI01CC02-UP7, (b) channel GI02CC01-UP8 (series: Model and Real; vertical axis: autocorrelation coefficient; horizontal axis: lag, 1 to 61).
Fig. 13. Example of traffic comparison on a downstream channel: (a) traffic profile comparison, (b) traffic autocorrelation, worst case.
Fig. 14. Traffic comparison on the Internet access router: (a) outgoing traffic (towards Internet), (b) incoming traffic (from Internet) (series: Real and Model; vertical axis: Mbytes per second; horizontal axis: time of day).
Finally, as most of the traffic of the cable network is destined for the Internet, it is very important to assess the fit of the simulation results for this traffic. Fig. 14 compares the traffic through the router both to and from the Internet. The level of agreement between the real and the simulated results is high in both cases. The confidence intervals obtained for a 95% level of confidence include zero: they are [−0.337, 1.323] and [−1.685, 1.706], respectively. A comparison of statistical properties confirms the validity of the model: there is no difference between the autocorrelation functions in either case, and the relative differences between the self-similarity coefficients are 0.34% and 1.10%, respectively.
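The validation steps used in this section can be sketched in Python as follows. This is one plausible reading of the confidence interval comparison, not the exact procedure of [14]: the interval is built for the mean of the difference series Real − Model, and a normal quantile is used in place of the Student t quantile, which is an assumed simplification that is only reasonable for long series. The sample autocorrelation estimator, the function names and the two short series are likewise assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def difference_interval(real, model, confidence=0.95):
    """Confidence interval for the mean of the difference series Real - Model."""
    if len(real) != len(model) or len(real) < 2:
        raise ValueError("both series need the same length and two or more samples")
    diffs = [r - m for r, m in zip(real, model)]
    # Normal quantile (about 1.96 for 95%): assumed stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(diffs) / sqrt(len(diffs))
    centre = mean(diffs)
    return centre - half_width, centre + half_width


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of a series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must lie between 0 and len(series) - 1")
    m = mean(series)
    denominator = sum((x - m) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return numerator / denominator


# Hypothetical channel utilization samples (%), real network versus model.
real = [30.0, 42.0, 55.0, 61.0, 48.0, 35.0]
model = [32.0, 40.0, 57.0, 58.0, 49.0, 36.0]
low, high = difference_interval(real, model)
indistinguishable = low <= 0.0 <= high  # True when the interval includes zero
```

Comparing `autocorrelation(real, k)` with `autocorrelation(model, k)` over a range of lags gives the kind of curves shown in Figs. 12 and 13(b); the estimation of the Hurst coefficient is not covered by this sketch.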
7. Conclusions
This paper presents the design of a simulation model for a generic cable network, and its validation using the measurements of a real cable operator. The cable network model is built with a hierarchical structure based on simpler intermediate models; these intermediate models are independently validated, thus confirming the quality of the final model. Using the modeling language facilities, each intermediate model is represented as a simulation object, which can be directly incorporated into higher level models. Thus, the final cable network model has the same number of simulation objects as there are HFC branches in the real cable network. The cable network model is scalable, and can evolve as the real cable network does.
The parameters of the cable network model are obtained from the analysis of traffic measurements on all its channels. The traffic on the channels is related by simple regression models to the number of subscribers assigned to each channel. The result of the analysis is a simple procedure to obtain model parameters from cable network information. This procedure makes the model straightforward to use, because it relates known physical parameters to more complicated traffic parameters. The use of the final cable network model by the cable operator is very simple; the only information it requires is the number of subscribers assigned by the cable operator to each upstream channel. This information allows the cable operator to use the model for a process of continuous capacity planning, as the number of subscribers assigned to each channel evolves. This distinguishes the model from existing models, in which the model parameters are not always related to the network and do not have a direct meaning.
As a final conclusion, the developed model can be considered a valid tool to support performance decisions about the cable network studied. A simple procedure is established to obtain model parameters from the available measurements. The model is generic and has a hierarchical structure which makes it adaptable to the particular implementation of other cable networks.
References
[1] D.J. Houck, W.S. Lai, Traffic modeling and analysis of hybrid fiber-coax systems, Computer Networks and ISDN Systems 30 (1998) 821–834.
[2] I. Borges, F. Fontes, J. Bastos, J. Loureiro, Interactive services over hybrid fibre-coax networks, in: Proceedings of the International Conference on ATM, ICATM99, Colmar, France, June 1999.
[3] N.K. Shankaranarayanan, Z. Jiang, P. Mishra, User-perceived performance of Web-browsing and interactive data in HFC cable access networks, in: Proceedings of the IEEE International Conference on Communications, Helsinki, Finland, June 2001.
[4] W. Leland, M. Taqqu, W. Willinger, D. Wilson, On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking 2 (1) (1994) 1–15.
[5] M. Garrett, W. Willinger, Analysis, modeling and generation of self-similar VBR video traffic, in: Proceedings of ACM SIGCOMM, London, September 1994, pp. 269–280.
[6] V. Paxson, S. Floyd, Wide area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking 3 (3) (1995) 226–244.
[7] W. Willinger, M.S. Taqqu, W.E. Leland, D.V. Wilson, Self-similarity in high-speed packet traffic: analysis and modeling of Ethernet traffic measurements, Statistical Science 10 (1) (1995) 67–85.
[8] M.E. Crovella, A. Bestavros, Self-similarity in World Wide Web traffic: evidence and possible causes, IEEE/ACM Transactions on Networking 5 (6) (1997) 835–846.
[9] V.S. Frost, B. Melamed, Traffic modeling for telecommunications networks, IEEE Communications Magazine (1994) 70–81.
[10] P. Fiorini, Modeling telecommunication systems with self-similar data traffic, Ph.D. Thesis, Department of Computer Science and Engineering, University of Connecticut, 1997.
[11] R.G. Addie, M. Zukerman, T.D. Neame, Broadband traffic modeling: simple solutions to hard problems, IEEE Communications Magazine 36 (8) (1998) 88–95.
[12] L. Lipsky, H.-P. Schwefel, M. Greiner, M. Jobmann, Comparison of the analytic N-burst model with other approximations to self-similar telecommunications traffic, Technical Report, TUM and BRC, November 2000.
[13] M. Garcia, X.G. Pañeda, D.F. Garcia, V.G. Garcia, R. Bonis, Traffic analysis of data transmission on hybrid fiber coax network, in: Proceedings of the IASTED International Conference on Communication Systems and Networks, CSN 2002, Malaga, Spain, September 2002, pp. 172–177.
[14] A.M. Law, W.D. Kelton, Simulation Modeling & Analysis, second ed., McGraw-Hill International, 1991.