On the energy consumption computation in Content Delivery Networks


Sustainable Computing: Informatics and Systems xxx (2017) xxx–xxx


Sustainable Computing: Informatics and Systems journal homepage: www.elsevier.com/locate/suscom

Andrea Bianco, Reza Mashayekhi*, Michela Meo

Dip. di Elettronica e Telecomunicazioni, Politecnico di Torino, Italy

Article info

Article history: Received 27 July 2016; received in revised form 8 March 2017; accepted 23 August 2017; available online xxx.

Keywords: Content Delivery Networks; Energy consumption; Hierarchical Internet map; Synchronization energy consumption; CDN caching strategies

Abstract

Data distribution via Content Delivery Networks (CDNs) is one of the most significant energy-consuming sectors in the ICT area. A CDN system can be abstracted as a primary server housing the entire data set and several surrogate servers, each one caching a portion of the whole data set. We propose a new model to compute the total energy consumption of CDNs. The model is based on a hierarchical Internet representation, and includes the energy consumption needed to keep servers synchronized. We analyze the effect of synchronization and network topology on the total energy consumption of CDNs. Results reveal that increasing the number of surrogate servers reduces the transmission delay. However, this does not necessarily lead to a reduction of the total energy consumption. Furthermore, the effect of network topology and various caching strategies is described. Results show that CDN energy consumption strongly depends on the ratio between the number of content requests and content modifications, and that considering a hierarchical network topology highlights slightly different energy consumption trends with respect to those of the classical "flat" network representation.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

Energy consumption is becoming one of the biggest concerns throughout the world. One of the most energy-hungry sectors is Information and Communication Technology (ICT) [1]. A considerable share of ICT consumption is due to the Internet, which keeps growing at a dramatic pace. In this context, a very important role is played by data centers and data dissemination systems [2]. Delivering a vast amount of data from data servers around the world to a very large number of users demands considerable energy [3]. Thus, energy management of data distribution systems has recently become a hot research issue.

One of the most widely known data distribution systems is the Content Delivery Network (CDN). CDNs can be abstracted as a centrally managed pool of computing and storage resources with high-speed Internet access. CDN sites are distributed at strategically chosen locations throughout the Internet or within ISP domains [4]. While managing and distributing data, a CDN consumes energy in several ways: energy to store the contents, energy consumed by the servers to react to content requests, energy consumed to deliver the data to users, and finally energy consumed to keep data servers synchronized upon data modification and insertion.

In this paper we propose a simple model for a deep understanding of CDN energy consumption, which can be used to facilitate the design of more energy-efficient solutions. One of the key features of the model is that it includes synchronization energy in the computation of the total energy consumption of CDNs. Through the model, we assess the effect of synchronization energy consumption when the ratio between the number of content modifications and insertions and the number of content requests varies. We also consider the effect of the topology in the energy formulation, to demonstrate that using a more realistic hierarchical topology representation can have a non-negligible impact. As a consequence, we show that in some scenarios increasing the number of surrogate servers also increases the operational costs, besides the capital expenditures. Finally, the trade-off between energy consumption and data distribution delay is considered.

The paper stems from a previous preliminary contribution [5]. With respect to the previous version, the novelty of this paper consists of three aspects. First, the model is enhanced to also take into account the insertion of new contents; second, the model is validated through simulation; and, finally, additional scenarios are considered with different distributions of content popularity.

The paper is organized as follows. Section 2 discusses the related works. In Section 3 CDN architectures are briefly summarized, while Sections 4 and 5 present the new model to compute the total energy consumption of CDNs. In Section 6 the model is validated through simulation and in Section 7 performance results are presented and discussed. Finally, Section 8 concludes the paper and discusses possible future research activities.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

* Corresponding author. E-mail address: [email protected] (R. Mashayekhi).

http://dx.doi.org/10.1016/j.suscom.2017.08.008
2210-5379/© 2017 Elsevier Inc. All rights reserved.

Please cite this article in press as: A. Bianco, et al., On the energy consumption computation in Content Delivery Networks, Sustain. Comput.: Inform. Syst. (2017), http://dx.doi.org/10.1016/j.suscom.2017.08.008

2. Related works

Energy consumption of CDNs has been addressed in many previous research activities. In this work we target the aspects which have not been fully covered in previous works. The main differences are that we discuss the effect of a hierarchical topology instead of a random graph, and that we include synchronization energy consumption in the computation of the total energy consumption of CDNs.

In [4,6], the energy consumption of CDNs is computed without considering synchronization energy consumption. In [7], the authors propose a novel load adaptation technique for Caching Points (CPs) which not only enhances the content download rate but also reduces transmission energy consumption through random sleep cycles. They also ignore the synchronization energy consumption in CPs. The authors of [8] propose a multiphase sleep-wake mechanism (MSWM) for CDNs to achieve energy saving. Synchronization energy consumption is not considered in this work either.

The work in [2] studies energy distribution representing the Internet map through random graphs. Random graphs are used to analyze data distribution systems for networks with different structures, where the main network parameters can be synthetically controlled. However, topologies derived from random graphs represent a fairly simplified model. Moreover, they do not take into account the hierarchical structure of the Internet, which may have a non-negligible impact on the energy consumption related to data distribution. In this paper, we model the Internet map according to a three-tier model which represents the hierarchical architecture of today's Internet. In [9,10], the authors highlight the effect of exploiting a distributed data center controlled and managed by an ISP.
They discuss the usage of a hierarchical network topology to find the best placement of data caches and data centers to decrease the energy consumption, whereas we propose a hierarchical Internet topology model to realistically compute the CDN energy consumption.

Among the three major CDN content outsourcing approaches, namely cooperative push-based, cooperative pull-based, and non-cooperative pull-based, previous works targeted different approaches. For instance, the authors of [11] compared cooperative content replication with the non-cooperative case in a joint optimization problem in CDNs, while we use the cooperative push-based approach in our model.

As far as cache management in CDNs is concerned, several research works discuss the best strategy to distribute data among different servers. Tuncer et al. [12] investigate lightweight strategies that can be used by ISPs to manage the placement of contents in the various network caching locations according to user demand characteristics. Their proposed strategies depend on the volume and the nature of contents in the system. Baliga et al. [13] suggest that frequently used data are better replicated and kept close to end users, while rarely accessed data should be replicated less and kept in the primary server only. We consider three different caching strategies in this work: Static, Least Recently Used (LRU), and Least Used (LU).

3. Content Delivery Networks

A CDN is represented by a main server, named primary server, which stores the entire data set and is connected to several surrogate servers. Surrogate servers are positioned at the network edge, closer to end users. Surrogate servers store contents in their cache based on the caching policy and content outsourcing strategy exploited in the network. Contents to be stored in surrogates can be chosen uniformly among the whole data set, or according to the global content popularity. Storing contents based on their popularity makes it possible to save storage space, to balance load among servers, to reduce transmission energy consumption and consequently to reduce client download time.

Content distribution and management play an important role in CDNs. Indeed, the efficiency of the CDN approach can be determined by a smart content selection through clever caching strategies. The optimal placement of surrogate servers makes it possible to provide high quality of service and low CDN prices [14]. Given a set of surrogate servers properly placed in the network and a smart content selection policy, an efficient content outsourcing strategy should be defined.

Content outsourcing can be chosen among cooperative push-based, cooperative pull-based, and non-cooperative pull-based approaches. In cooperative push-based approaches, content is pushed to surrogate servers from the primary server. The primary server keeps a mapping between surrogate servers and the contents stored in each one. Therefore, each content request is directed to the closest surrogate server hosting that content. Only if the request cannot be served by any of the surrogate servers is it directed to the primary server. In the non-cooperative pull-based approach, client requests are always directed to their closest surrogate server. On each miss, the surrogate server pulls the missed content from the primary server. The difference between the cooperative and non-cooperative pull-based approaches is that in the cooperative approach surrogate servers cooperate to get the requested content in case of a cache miss: the surrogate with a cache miss pulls the content from one of the other surrogates instead of the primary server.
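The three outsourcing approaches differ only in where a request is directed and where a missed content is pulled from. The following Python sketch illustrates that routing logic under simplifying assumptions that are not part of the paper: the `Surrogate` class, the `distance` field (a hypothetical hop distance from the client) and the `route_request` function are illustrative names, cache eviction is ignored, and in the cooperative pull case the primary server is assumed to be the fallback when no other surrogate holds the content.

```python
from dataclasses import dataclass, field

APPROACHES = ("cooperative-push", "cooperative-pull", "non-cooperative-pull")


@dataclass
class Surrogate:
    name: str
    distance: int                      # hypothetical hop distance from the client
    cache: set = field(default_factory=set)


def route_request(content, surrogates, approach):
    """Return (server answering the client, server the content is pulled from or None)."""
    if approach not in APPROACHES:
        raise ValueError(f"unknown approach: {approach}")
    by_distance = sorted(surrogates, key=lambda s: s.distance)

    if approach == "cooperative-push":
        # The primary knows the content-to-surrogate mapping, so the request
        # goes to the closest surrogate that hosts the content, if any.
        for s in by_distance:
            if content in s.cache:
                return s.name, None
        return "primary", None

    if not by_distance:
        return "primary", None

    # Pull-based approaches: the request always goes to the closest surrogate.
    closest = by_distance[0]
    if content in closest.cache:
        return closest.name, None

    source = "primary"
    if approach == "cooperative-pull":
        # Assumption: pull from the nearest other surrogate holding the content,
        # and fall back to the primary server if none does.
        holders = [s for s in by_distance[1:] if content in s.cache]
        if holders:
            source = holders[0].name
    closest.cache.add(content)         # the missed content is now cached (no eviction here)
    return closest.name, source
```

For example, with two surrogates at distances 2 and 5, a push-based request for a content held only by the farther surrogate is answered by that surrogate, whereas a cooperative pull-based request is answered by the closer surrogate after it pulls the content from the farther one.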
According to the above descriptions, in the next section we present our model.

4. System and model assumptions

In this section we propose a model to compute the energy consumption of Content Delivery Networks [5]. The model is novel in two respects. First, it includes synchronization energy in the total energy consumption; synchronization energy is the energy consumed to keep all surrogate servers updated by propagating modified contents from the primary server to all surrogate servers housing that content. Second, it considers a network topology that represents the Internet map in a more realistic way than is usually done. The real Internet map is difficult to represent, due to a number of factors: the dynamic nature of the Internet, its huge size, and its hierarchical and administrative-based structure, which has an impact on data distribution policies. We represent the Internet topology with a three-tier model representing the layered ISP architecture. In what follows, we provide details about our model, the topology, surrogate server management, and system assumptions.

4.1. Internet map

In the Internet, three types of ISPs can be identified. In Tier 3 ISPs, which are the edge portion of the network, edge routers connect end users to the Internet. They are located in Points of Presence (PoPs), which are connected on one side to the Internet via border routers, and on the other side to Customer Edge routers or Subscriber Edge routers, which connect end users to the Internet. Border routers connect Tier 3 to Tier 2 ISPs. Tier 2 ISPs typically provide regional or national interconnection among PoPs. As such, "close-by" Tier 3 ISPs may be connected to the same Tier 2 ISP. Tier 1 networks are typically responsible for long-distance, international connections. The proposed hierarchical Internet map is sketched in Fig. 1.

Fig. 1. Three layer Internet map.

4.2. Surrogate servers management

Surrogate server management can be divided into three categories, namely surrogate server placement, content outsourcing, and cache management.

4.2.1. Surrogate server placement

We model a typical CDN with one primary server housing the entire data set and a number of surrogate servers, each one storing a part of the data set. Surrogate servers keep the contents closer to end users, with the main objective of improving the QoS (the delay for fetching a content). Surrogate server placement in our model follows a strategy that takes into account the hierarchical topology of the Internet. As sketched in Fig. 2, which reports a simplified representation of the surrogate server placement in the Internet map, we assume that a core network is available to provide global connectivity and that $T_2$ Tier 2 ISPs connect Tier 3 ISPs to the core. $S$ surrogate servers, randomly positioned in Tier 3 ISPs, are available, with $S \le T_2$. We assume that at most one surrogate server exists in each set of Tier 3 ISPs connected to the same Tier 2 ISP. The primary server is aware of the positions of the surrogate servers.

Fig. 2. Simplified map of the Internet with surrogate servers placement.

4.2.2. Content outsourcing

As described in Section 3, there are three major content outsourcing approaches: cooperative push-based, cooperative pull-based, and non-cooperative pull-based. We use the cooperative push-based approach in our model, where contents are sent to surrogate servers by the primary server, and the primary server keeps a mapping between contents and surrogates. In this scheme, each request from a user is directed to the closest server which stores the requested content. On each content modification, the primary server propagates the modified content to the surrogates that store the content, while on each new content insertion, the decision is made according to the caching strategy.

4.2.3. Cache management

Once the number of surrogate servers and their positions are determined, we must define the cache management strategy and the cache size of the surrogate servers. Each surrogate server is responsible for storing a fraction of the entire data set. The strategy used to choose the contents to be cached in each surrogate server is of fundamental importance. We consider three different caching policies, namely Static, Least Recently Used (LRU), and Least Used (LU).

• Static: Contents are pre-fetched according to a uniform distribution from the primary server to surrogate servers and they remain in the surrogates without any change during the network lifetime.
• Least Recently Used (LRU): On each cache miss in a surrogate server (i.e., the closest server to the requester ISP does not house the content), the surrogate server fetches the missed content from the closest surrogate server and stores that content in place of the cached content which has not been requested for the longest time. In this scheme, after a while, the most popular contents are stored in each surrogate, and this increases the hit probability.
• Least Used (LU): This strategy is very similar to LRU. The only difference is that the newly requested and missed content in each surrogate replaces the content which has been requested the fewest times. In this strategy the popularity of the stored contents is computed according to the number of requests, while in LRU the popularity is based on the age of recent requests.

4.3. Assumptions and notations

The main assumptions and notations in our model are as follows.

• There are $S$ surrogate servers located in Tier 3 ISPs and one primary server which hosts all $M$ contents, with a total storage capacity $B_{tot}$.
• Contents are of the same, fixed size of $B$ bits.
• The storage capacity of all surrogate servers is the same and equal to a fraction $S_C$ of the total storage capacity, $B_{tot}$, of the primary server.
• There are $T_3$ Tier 3 ISPs and each Tier 3 ISP is connected to $n$ end users. Accordingly, the total number of users is $N = nT_3$.
• Each group of $g_3$ Tier 3 ISPs is connected to one Tier 2 ISP. Therefore, there are $T_2 = T_3/g_3$ Tier 2 ISPs.
• All Tier 1 ISPs are considered as a single core network.
• The hit probability for each surrogate server, i.e., the probability that the requested content is hosted in the surrogate server, is denoted as $P_{hit}$ and depends on the cache management strategy. In the static caching strategy, the hit probability is equal to the server relative cache size $S_C = 0.4$, i.e., the fraction of the data stored in the surrogate server with respect to the total storage capacity of the primary server. In the LRU and LU caching strategies, with content requests following a Zipf distribution with parameter $\alpha$ and a server relative cache size $S_C$, the hit probability can be computed according to [15]. Let $M$ be the total number of contents. Let $P(i)$ be the probability that, given the arrival of a content request, the arriving request is made for content $i$. Let all the contents be ranked in decreasing order of popularity, where content 1 is the most popular content. We assume that $P(i)$, defined for $i = 1, 2, \ldots, M$, has a Zipf-like distribution given by:

$$P(i) = \frac{\Omega}{i^{\alpha}}, \qquad \text{where } \Omega = \left(\sum_{i=1}^{M} \frac{1}{i^{\alpha}}\right)^{-1}.$$

For a cache size equal to $S_C$, it is assumed that the most popular contents ($1 \le i \le S_C \cdot M$) are cached in each surrogate server. Therefore, the hit probability in each surrogate is the sum of $P(i)$ over $1 \le i \le S_C \cdot M$: $P_{hit} = \sum_{i=1}^{S_C \cdot M} P(i)$. The primary server has a hit probability equal to 1.
• On each content modification, the primary server instantaneously propagates the modified content to the surrogate servers hosting that content. On a new content insertion, the decision is made according to the caching strategy (LRU, LU, Static).
• We assume that the average path length in each ISP is constant.
• Requests are generated according to a Poisson arrival process with parameter $r_m$.
• Content modifications and insertions are generated according to Poisson arrival processes with parameters $m_m$ and $i_m$, respectively.

Table 1 summarizes the notation and the values used for numerical results. The values related to storage power consumption per bit and to link, router, and server energy consumption per bit are taken from [6].

5. Model formulation

The total energy consumption of a CDN is the sum of four components: storage, server, synchronization and transmission. Storage energy consumption is the energy consumed by the primary server and all surrogate servers to store the contents. Server energy consumption is the energy consumed by the servers to process user requests and react to them. Synchronization energy consumption is the energy consumed to propagate the modified contents, as well as new contents, from the primary server to all surrogate servers housing the modified content. Finally, transmission energy consumption is the energy consumed to fetch the requested contents from the closest server according to the caching policy and deliver them to the users. Accordingly, the total energy consumption is computed as:

$$E_{tot} = E_{storage} + E_{server} + E_{synch} + E_{tx} \qquad (1)$$
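The Zipf-based hit probability defined in Section 4.3 can be checked numerically. The Python sketch below evaluates $P(i)$ and $P_{hit}$ for the default values $M = 1000$ and $S_C = 0.4$; the function names are illustrative, and rounding $S_C \cdot M$ to the nearest integer is an assumption made here for non-integer cache sizes. With $\alpha = 0.8$ the result should land close to the value $P_{hit} = 0.79$ quoted in Section 6.

```python
def zipf_probabilities(num_contents, alpha):
    """P(i) = Omega / i**alpha for i = 1..M, with Omega normalising the sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, num_contents + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


def hit_probability(num_contents, cache_fraction, alpha=None):
    """Hit probability of one surrogate server.

    alpha=None models static caching with uniform requests (P_hit = S_C);
    otherwise the S_C * M most popular contents are assumed to be cached.
    """
    if alpha is None:
        return cache_fraction
    cached = round(cache_fraction * num_contents)   # assumption: round S_C * M
    return sum(zipf_probabilities(num_contents, alpha)[:cached])


if __name__ == "__main__":
    print("static, uniform requests:", hit_probability(1000, 0.4))
    print("Zipf, alpha = 0.8:", hit_probability(1000, 0.4, alpha=0.8))
```

The static case simply returns the relative cache size, which matches the lower end of the 0.4–0.79 range listed for $P_{hit}$ in Table 1.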

We now define each energy consumption component.

Storage energy consumption:

$$E_{storage} = \sum_{m} B\, n_m\, P_{st}\, t \qquad (2)$$

where $B$ is the size of each content, $n_m$ is the number of replicas of content $m$, $P_{st}$ is the storage power per bit, and $t$ is the period of the analysis (the time period over which the energy consumption is computed).

Server energy consumption:

$$E_{server} = \sum_{t} B\, r_m\, E_{sr} \qquad (3)$$

where $r_m$ is the number of content requests per second, and $E_{sr}$ is the server energy consumption per bit.

Synchronization energy consumption: On each content modification, the modified content is propagated from the primary server to all surrogate servers hosting the content.

$$E_{synch} = \sum_{t} B\, m_m\, n_m \left[E_r (H_{ps} + 1) + E_l H_{ps}\right] \qquad (4)$$

where $m_m$ is the number of content modifications per second, $H_{ps}$ is the average number of hops from the primary server to each surrogate server, and $E_r$ and $E_l$ are the router and link energy consumption per bit. When there is no surrogate server, $E_{synch} = 0$.

Transmission energy consumption: On each user request arrival in a Tier 3 ISP, three different cases must be considered, labeled as case A, B and C.


Table 1. Notation and parameters settings.

| Symbol | Default value | Description |
|---|---|---|
| $S$ | – | Number of surrogate servers |
| $S_C$ | 40% | Cache size (surrogate servers storage capacity) |
| $M$ | 1000 | Total number of contents in the data set |
| $B$ | $10^6$ bits | Size of each content |
| $t$ | 100 s | Time period of the analysis |
| $n_m$ | $S_C \cdot S$ | Number of replicas for content $m$ |
| $r_m$ | 10000 | Requests per second |
| $m_m$ | 10, 50, 100, 500, 1000 | Modifications per second |
| $i_m$ | 10, 50, 100, 500, 1000 | Insertions per second |
| $H_{sd}^{A}$ | 4 | Number of hops to fetch the content from the same Tier 3 ISP |
| $H_{sd}^{B}$ | 12 | Number of hops to fetch the content from the same Tier 2 ISP |
| $H_{sd}^{C}$ | 20 | Number of hops to fetch the content from the core network |
| $H_{ps}$ | – | Number of hops from the primary server to surrogate servers |
| $T_3$ | 1000 | Number of Tier 3 ISPs |
| $g_3$ | 20 | Number of Tier 3 ISPs connected to each Tier 2 ISP |
| $T_2$ | 50 | Number of Tier 2 ISPs |
| $N$ | 2,000,000 | Total number of end users |
| $n$ | 2000 | Number of end users per Tier 3 ISP |
| $P_{st}$ | $7.84 \cdot 10^{-12}$ W | Storage power consumption per bit |
| $E_r$ | $1.2 \cdot 10^{-8}$ J/bit | Router energy consumption per bit |
| $E_l$ | $1.48 \cdot 10^{-9}$ J/bit | Link energy consumption per bit |
| $E_{sr}$ | $2.81 \cdot 10^{-7}$ J/bit | Server energy consumption per bit |
| $P_{hit}$ | 0.4–0.79 | Probability to hit a content in a surrogate server |

1. Case A: There is a surrogate server in the same Tier 3 ISP where the request is generated, and the surrogate hosts the requested content. In this case, the requested content is hit in that server. The probability that this case happens is denoted as $P_A$:

$$P_A = \frac{S}{T_2} \cdot \frac{1}{g_3} \cdot P_{hit} \qquad (5)$$

2. Case B: There is no surrogate server in the same Tier 3 ISP where the request is generated; however, there is a surrogate server in the same Tier 2 ISP, and the surrogate hosts the requested content. In this case, the requested content is hit in that surrogate server. This happens with the probability denoted as $P_B$:

$$P_B = \frac{S}{T_2} \cdot \left(1 - \frac{1}{g_3}\right) \cdot P_{hit} \qquad (6)$$

3. Case C: There is a surrogate server neither in the same Tier 3 ISP nor in the same Tier 2 ISP, or there is a surrogate server in the same Tier 3 ISP or in the same Tier 2 ISP but the requested content is not hit in that surrogate server. In this case, the requested content must be fetched through the core network. The probability that this happens is denoted as $P_C$:

$$P_C = 1 - (P_A + P_B) \qquad (7)$$

Therefore, the transmission energy consumption when there are $S$ surrogate servers is:

$$E_{tx} = P_A \sum_{t} B\, r_m \left[E_r (H_{sd}^{A} + 1) + E_l H_{sd}^{A}\right] + P_B \sum_{t} B\, r_m \left[E_r (H_{sd}^{B} + 1) + E_l H_{sd}^{B}\right] + P_C \sum_{t} B\, r_m \left[E_r (H_{sd}^{C} + 1) + E_l H_{sd}^{C}\right]$$

where $H_{sd}^{A}$ is the number of hops to fetch the content from a surrogate in the same Tier 3 ISP, $H_{sd}^{B}$ is the number of hops to fetch the content from a surrogate in the same Tier 2 ISP but not the same Tier 3 ISP, and $H_{sd}^{C}$ is the number of hops to fetch the content from the core network.

6. Model validation through simulation

In this section we validate our model through simulation: we simulate a CDN and compute its total energy consumption. Since well-known CDN simulators such as CDNSim [16] support neither our hierarchical topology map nor the energy consumption computation (specifically, synchronization energy consumption is not considered), we designed our own simulator. The simulator, 3TierCDNSim, supports several possible network topologies and three caching strategies, namely static, LRU, and LU. Two different content request distributions can be selected, namely the uniform distribution and the Zipf distribution. With the uniform distribution, for each arriving request the requested content is chosen uniformly among the whole data set, while with the Zipf distribution the requested content is chosen according to a Zipf distribution. The Zipf content request distribution introduces a global content popularity in the system.

Fig. 3(a) compares the results obtained from the simulation and from the model for a uniform distribution of content requests and the static caching strategy. Fig. 3(b) reports the same comparison when the content request distribution is a Zipf distribution with parameter $\alpha = 0.8$ and the caching strategy is LRU. To model LRU caching and the Zipf content request distribution (content popularity) in our analytical computation, we need to estimate the hit probability. As presented in Section 4.3, with the Zipf parameter $\alpha$ equal to 0.8 and a surrogate cache size of 40%, the hit probability is $P_{hit} = 0.79$. Fig. 3 shows that the model and the simulation results differ by less than half a percent (0.5%). These results indicate that, despite being simple, the model is accurate in predicting the system performance.

7. Results and discussions

In this section we (1) show the impact of synchronization energy consumption in CDNs, (2) highlight the differences between our hierarchical model and a flat graph, (3) show the effect of different caching strategies and content request distributions (i.e., content popularity) on CDN energy consumption, (4) investigate the effect of data modification and insertion separately and jointly, and (5) analyze the trade-off between energy consumption and transmission delay in the network.
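The sweeps discussed in this section can be reproduced in spirit with a direct evaluation of the Section 5 model, Eqs. (1)-(7) plus the transmission term. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code: the `Params` class and function names are hypothetical; the sums over $t$ are read as constant per-second rates multiplied by the analysis period; the sum over $m$ is read as $M$ contents with the same $n_m = S_C \cdot S$; and, since Table 1 gives no default for $H_{ps}$, the value 12 used here is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Params:
    S_C: float = 0.4        # relative cache size
    M: int = 1000           # number of contents
    B: float = 1e6          # content size [bits]
    t: float = 100.0        # analysis period [s]
    r_m: float = 10000.0    # requests per second
    m_m: float = 100.0      # modifications per second
    H_A: int = 4            # hops, same Tier 3 ISP
    H_B: int = 12           # hops, same Tier 2 ISP
    H_C: int = 20           # hops, through the core
    H_ps: int = 12          # ASSUMPTION: Table 1 lists no value for H_ps
    g_3: int = 20
    T_2: int = 50
    P_st: float = 7.84e-12  # storage power per bit
    E_r: float = 1.2e-8     # router energy [J/bit]
    E_l: float = 1.48e-9    # link energy [J/bit]
    E_sr: float = 2.81e-7   # server energy [J/bit]
    P_hit: float = 0.4      # static caching, uniform requests


def path_energy_per_bit(p, hops):
    """E_r (H + 1) + E_l H: a path of H links crosses H + 1 routers."""
    return p.E_r * (hops + 1) + p.E_l * hops


def case_probabilities(p, S):
    """Eqs. (5)-(7): probabilities of cases A, B and C."""
    P_A = (S / p.T_2) * (1 / p.g_3) * p.P_hit
    P_B = (S / p.T_2) * (1 - 1 / p.g_3) * p.P_hit
    return P_A, P_B, 1 - (P_A + P_B)


def energy_components(p, S):
    """Storage, server, synchronization and transmission energy [J] for S surrogates."""
    n_m = p.S_C * S
    P_A, P_B, P_C = case_probabilities(p, S)
    return {
        "storage": p.M * p.B * n_m * p.P_st * p.t,                            # Eq. (2)
        "server": p.B * p.r_m * p.E_sr * p.t,                                 # Eq. (3)
        "synch": p.B * p.m_m * n_m * path_energy_per_bit(p, p.H_ps) * p.t,    # Eq. (4)
        "tx": p.B * p.r_m * p.t * (P_A * path_energy_per_bit(p, p.H_A)
                                   + P_B * path_energy_per_bit(p, p.H_B)
                                   + P_C * path_energy_per_bit(p, p.H_C)),
    }


def total_energy(p, S, with_synch=True):
    """Eq. (1), optionally leaving out the synchronization term."""
    e = energy_components(p, S)
    return sum(e.values()) - (0.0 if with_synch else e["synch"])


if __name__ == "__main__":
    for ratio in (0.001, 0.01, 0.1):
        p = Params(m_m=ratio * 10000.0)
        row = [round(total_energy(p, S)) for S in (0, 10, 25, 50)]
        print(f"m_m/r_m = {ratio}: E_tot [J] for S = 0, 10, 25, 50 -> {row}")
```

Because every term is linear in $S$ under these assumptions, whether the total grows or shrinks with the number of surrogates depends only on how the per-surrogate synchronization cost compares with the per-surrogate transmission saving, which is the trade-off examined in Section 7.1.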


Fig. 3. Simulation and analytical results comparison, total energy consumption with/without synchronization, (a) static caching and uniform content request distribution, (b) LRU caching and Zipf content request distribution.
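The caching policies compared in Fig. 3 and described in Section 4.2.3 can be illustrated with a small single-cache simulation. The Python sketch below is not 3TierCDNSim; it is a hypothetical, much reduced stand-in that draws Zipf-distributed requests and measures the hit ratio of one surrogate cache under the static, LRU and LU policies. The function names, the reduced problem size (200 contents, 80 cache slots) and the choice to count requests globally for LU are assumptions made here for illustration.

```python
import random
from collections import Counter, OrderedDict

POLICIES = ("static", "lru", "lu")


def zipf_weights(num_contents, alpha):
    """Unnormalised Zipf weights 1 / i**alpha for ranks i = 1..M."""
    return [1.0 / (i ** alpha) for i in range(1, num_contents + 1)]


def simulate_hit_ratio(policy, num_contents=200, cache_size=80,
                       alpha=0.8, num_requests=5000, seed=1):
    """Fraction of requests served from a single surrogate cache."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    contents = list(range(num_contents))
    requests = rng.choices(contents, weights=zipf_weights(num_contents, alpha),
                           k=num_requests)

    cache = OrderedDict()              # keys ordered from least to most recently used
    if policy == "static":
        # Static: contents are pre-fetched uniformly and never replaced.
        cache = OrderedDict((c, None) for c in rng.sample(contents, cache_size))

    counts = Counter()                 # assumption: LU counts every request seen so far
    hits = 0
    for content in requests:
        counts[content] += 1
        if content in cache:
            hits += 1
            cache.move_to_end(content)             # refresh recency
            continue
        if policy == "static":
            continue                               # a miss never changes a static cache
        if len(cache) >= cache_size:
            if policy == "lru":
                cache.popitem(last=False)          # evict the least recently used content
            else:
                victim = min(cache, key=lambda c: counts[c])
                del cache[victim]                  # evict the least requested content
        cache[content] = None
    return hits / num_requests


if __name__ == "__main__":
    for policy in POLICIES:
        print(policy, simulate_hit_ratio(policy))
```

Fixing the seed makes runs repeatable; the measured hit ratios include the cold-start misses of an initially empty LRU or LU cache, so they are expected to sit below the idealized $P_{hit}$ used in the analytical model.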

Fig. 4. Total energy consumption with/without synchronization, static caching, uniform content request distribution: (a) mm /rm = 0.001, (b) mm /rm = 0.01, (c) mm /rm = 0.1.

Fig. 5. The prototype IP backbone network, with N = 24 and  = 3.6.

7.1. Synchronization plays an important role Fig. 4 shows the total energy consumption with synchronization (denoted as Etot in the plots) and without synchronization (denoted as Etot−synch in the plots), for the static caching policy and a cache size of SC = 40%; uniform content request distribution is considered. Different plots refer to different values of the modification rate. The plots show that the energy consumption always decreases when adding surrogate servers, if we neglect synchronization. However, when the number of modifications for each

content mm is significant, see for example plot (c), synchronization energy consumption becomes relevant. By increasing the ratio with which contents are modified, the synchronization energy consumption increases. As a result, the total energy consumption (Etot ) may increase when adding surrogate servers. Fig. 4(a) and (b) report the total energy consumption of CDN system in the time period of the analysis t = 100 s for a ratio between number of content modifications (mm ) and number of requests to a content piece (rm ) equal to 0.001 (0.01). In both cases, the total energy consumption with and without considering synchronization energy consumption decreases, being the synchronization energy negligible with respect to server and transmission energy consumption. When the ratio mm /rm increases to 0.1 or to 1 (see Fig. 4(c)) considering synchronization energy consumption becomes important. Thus, when the frequency of modifications is high with respect to requests, increasing the number of surrogate servers increases the total energy consumption of the CDN. Indeed, by increasing surrogate servers, the number of replicas for each content increases, and on each content modification, more energy is consumed to update all replicas. 7.2. Hierarchical vs. prototype IP backbone network In this section, the results obtained by implementing the three layer model are compared with those of a non hierarchical Internet topology named prototype IP backbone network, presented in [6] and reported in Fig. 5. To compute the total energy consumption of

Please cite this article in press as: A. Bianco, et al., On the energy consumption computation in Content Delivery Networks, Sustain. Comput.: Inform. Syst. (2017), http://dx.doi.org/10.1016/j.suscom.2017.08.008

G Model

ARTICLE IN PRESS

SUSCOM-185; No. of Pages 10

A. Bianco et al. / Sustainable Computing: Informatics and Systems xxx (2017) xxx–xxx

7

Fig. 6. Total energy consumption with/without synchronization for the prototype IP backbone network with different ratios of mm /rm = 0.01, 0.1, 1.

Fig. 8. Average number of hops to fetch the requested contents. Comparison between LRU and static caching.

the prototype IP backbone network, the previous formulas need to be modified, because the topology is non-hierarchical. The storage and server energy consumption components depend only on the number of requests and on the cache size and are independent of the topology, so they need no modification. The synchronization and transmission energy consumption components, however, depend on the number of hops and must therefore be adapted to the topology. For this network, the average number of hops between nodes (Hr) is computed to represent the average distance between users and surrogate servers. Hr is derived as a power-law function of nm (the number of content replicas):

$$H_r(n_m) = A \left(\frac{N_r}{n_m}\right)^{\alpha}$$

where Nr is the number of nodes (the total number of routers considered in the previous formulas). In the prototype IP backbone network, Hr is estimated as:

$$H_r(n_m) = 0.35 \left(\frac{N_r}{n_m}\right)^{0.57}$$

As mentioned before, the storage and server energy components are computed according to the model, whereas the synchronization and transmission energy components are modified according to:

$$E_{synch} = \sum_{m} B \, m_m \, n_m \left[ E_r \left( H_r(n_m) + 1 \right) + E_l \, H_r(n_m) \right] t$$

and

$$E_{tx} = \sum_{m} B \, r_m \left[ E_r \left( H_r(n_m) + 1 \right) + E_l \, H_r(n_m) \right] t$$

Fig. 6 shows the total energy consumption with and without synchronization for the prototype IP backbone network. All parameters are set as in the model. Results show that, unlike in the three-layer topology, in the prototype IP backbone the total energy consumption always decreases when the ratio between the number of content modifications and content requests is equal to 0.1. The total energy consumption with synchronization increases only when the ratio becomes close to 1, whereas in the three-layer topology this already happens for a ratio of 0.1. Given the range of the y axis (0.4 to 1.4), the lines representing the total energy consumption without synchronization for the three cases mm/rm = 1, 0.1, 0.01 overlap, because the differences between them are very small. Finally, the graph reveals that, for a flat graph such as the prototype IP backbone network, the energy consumption increases more than linearly with the number of surrogate servers.
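As a rough illustration of how the two modified components behave in the flat topology, the following Python sketch evaluates the per-content terms of Esynch and Etx using the power-law hop estimate quoted above. The function names and all numeric values (content size, per-bit router and link energies, request rate, number of nodes) are assumptions chosen only for illustration; they are not the parameters used in the paper, and mm and rm are treated here as rates per second.

```python
"""Per-content energy terms for a flat (non-hierarchical) topology.

Sketch only: every numeric value below is an illustrative assumption,
not a parameter taken from the paper.
"""


def avg_hops(n_m, n_r, a=0.35, alpha=0.57):
    """Power-law hop estimate H_r(n_m) = a * (N_r / n_m) ** alpha."""
    return a * (n_r / n_m) ** alpha


def path_energy_per_bit(n_m, n_r, e_r, e_l):
    """E_r * (H_r + 1) + E_l * H_r, the bracketed factor of both formulas."""
    h = avg_hops(n_m, n_r)
    return e_r * (h + 1) + e_l * h


def e_synch(b, m_m, n_m, n_r, e_r, e_l, t):
    """Synchronization term: each modification is pushed to all n_m replicas."""
    return b * m_m * n_m * path_energy_per_bit(n_m, n_r, e_r, e_l) * t


def e_tx(b, r_m, n_m, n_r, e_r, e_l, t):
    """Transmission term: each request travels H_r(n_m) hops on average."""
    return b * r_m * path_energy_per_bit(n_m, n_r, e_r, e_l) * t


if __name__ == "__main__":
    # Hypothetical values (assumptions for illustration only).
    B = 8e6      # content size in bits
    E_R = 1e-8   # energy per bit per router, in joules
    E_L = 1e-9   # energy per bit per link, in joules
    N_R = 100    # number of nodes
    T = 100      # analysis period in seconds
    R_M = 10.0   # requests per second for the content
    for ratio in (0.01, 1.0):
        for n_m in (1, 5, 10, 20):
            tx = e_tx(B, R_M, n_m, N_R, E_R, E_L, T)
            sy = e_synch(B, ratio * R_M, n_m, N_R, E_R, E_L, T)
            print(f"mm/rm={ratio:<5} n_m={n_m:<3} E_tx={tx:8.2f} J  E_synch={sy:8.2f} J")
```

Under these assumptions the transmission term shrinks as replicas are added, while the synchronization term grows with nm; which of the two dominates depends on the ratio mm/rm.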

7.3. Comparison among caching strategies

As mentioned before, we simulated three different caching strategies. The simplest one is static caching, in which contents are first pushed to the surrogate servers and never replaced with other contents during the simulation. The contents to be cached are chosen uniformly among all contents in the primary server. Fig. 7 shows the total energy consumption with and without synchronization for the static, LRU, and LU caching strategies when requests to contents arrive according to a Zipf distribution. As expected, when the ratio between the number of modifications and requests is around 0.001 or even 0.01 (Fig. 7(a) and (b)), total energy

Fig. 7. Total energy consumption with/without synchronization, static, LRU, and LU caching, Zipf content request distribution, (a) mm /rm = 0.001, (b) mm /rm = 0.01, (c) mm /rm = 0.1.


Fig. 9. Energy consumption per request with synchronization (a) and without synchronization (b), LRU caching and Zipf content request distribution.

Fig. 10. Total energy consumption, Zipf content request distribution with different α, mm/rm = 0.1.

consumption with and without synchronization both decrease as more surrogate servers are added. This is no longer true when the ratio reaches about 0.1 (Fig. 7(c)). The main difference among the three caching strategies is that, for static caching, the total energy consumption with and without synchronization is higher than for the LRU or LU caching policies. The reason is that, since requests arrive according to a Zipf distribution, the most popular contents are requested far more often than the others.

LRU and LU caching strategies, as described in Section 4, replace the least popular contents with the most popular ones. Therefore, popular contents are usually housed in the surrogates and, consequently, they are closer to end users. This decreases the transmission energy consumption and, as a result, the total energy consumption with and without synchronization. Results also show that the difference between the energy consumption values for LRU and LU caching is very small. Fig. 8 confirms this behavior by showing the average number of hops needed to fetch the requested content from the closest surrogate server. With LRU caching, the gain obtained by increasing the number of servers is larger than with static caching, because the surrogates cache the most popular contents.

To get a clearer idea of the effect of synchronization, we simulated the network with more ratios between the number of content modifications and requests (i.e., 0.001, 0.005, 0.01, 0.05, 0.1). Fig. 9(a) shows the total energy consumption per request for different ratios and different numbers of surrogate servers, while Fig. 9(b) shows the total energy consumption per request without synchronization. The caching strategy is LRU and the content request distribution is Zipf with parameter α = 0.8. Fig. 9(a) shows that, beyond a threshold value of the ratio between modifications and requests, increasing the number of surrogate servers increases the total energy consumption. Under our assumptions, this threshold lies between 0.03 and 0.04. However, Fig. 9(b) reveals that, when synchronization energy consumption is not considered, increasing the number of surrogates always decreases the energy consumption.
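The difference between static and LRU caching under Zipf requests can be reproduced in a few lines. The sketch below is not the simulator used in the paper: it models a single cache, and the catalogue size, cache capacity, number of requests, and random seed are hypothetical values chosen for illustration. It compares the hit ratio of a static cache filled with uniformly chosen contents against an LRU cache of the same size.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_weights(n_contents, alpha):
    """Unnormalised Zipf popularity: the content of rank k has weight k ** -alpha."""
    return [k ** -alpha for k in range(1, n_contents + 1)]


def static_hit_ratio(requests, cached):
    """Static caching: the cached set never changes during the run."""
    return sum(1 for c in requests if c in cached) / len(requests)


def lru_hit_ratio(requests, capacity):
    """LRU caching: on a miss, evict the least recently used content."""
    cache = OrderedDict()
    hits = 0
    for c in requests:
        if c in cache:
            hits += 1
            cache.move_to_end(c)           # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used entry
            cache[c] = True
    return hits / len(requests)


def simulate(n_contents=1000, capacity=100, alpha=0.8, n_requests=50_000, seed=1):
    """Return (static hit ratio, LRU hit ratio); all defaults are illustrative."""
    rng = random.Random(seed)
    contents = list(range(n_contents))
    cum = list(accumulate(zipf_weights(n_contents, alpha)))
    requests = rng.choices(contents, cum_weights=cum, k=n_requests)
    static_set = set(rng.sample(contents, capacity))  # uniformly chosen contents
    return static_hit_ratio(requests, static_set), lru_hit_ratio(requests, capacity)


if __name__ == "__main__":
    static_hits, lru_hits = simulate()
    print(f"static hit ratio: {static_hits:.3f}, LRU hit ratio: {lru_hits:.3f}")
```

A higher hit ratio at the surrogate means fewer requests travel to the primary server, which is the mechanism behind the lower average hop count and transmission energy discussed above.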

Fig. 11. Total energy consumption, LRU caching, Zipf content request distribution, (a) modifications only, (b) insertions only, (c) both modifications and insertions.


Table 2
Wikipedia articles statistics related to content modifications and content requests.

Article                                                  Views     Modifications   mm/rm   Month
Wikipedia:WikiProject Spam/LinkReports                   40,561    12,867          0.3     December 2015
Wikipedia:Administrator intervention against vandalism   105,131   10,078          0.1     May 2016
2015 San Bernardino attack                               40,737    2,971           0.07    December 2015
Beverly Gray                                             3,025     155             0.05    May 2016
Tehran derby                                             2,111     24              0.01    June 2016
2016 FIVB Volleyball World League                        147,688   651             0.005   June 2016
El Clasico                                               256,846   267             0.001   April 2016
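The ratios in Table 2 follow from dividing the number of modifications by the number of views. The short Python sketch below recomputes them from the table entries and lists the articles whose ratio exceeds 0.04, the upper end of the 0.03 to 0.04 threshold range reported for LRU caching with Zipf requests. Applying that threshold to individual articles is an assumption made here for illustration; the function and variable names are hypothetical.

```python
# (article, views, modifications) as listed in Table 2.
TABLE_2 = [
    ("Wikipedia:WikiProject Spam/LinkReports", 40_561, 12_867),
    ("Wikipedia:Administrator intervention against vandalism", 105_131, 10_078),
    ("2015 San Bernardino attack", 40_737, 2_971),
    ("Beverly Gray", 3_025, 155),
    ("Tehran derby", 2_111, 24),
    ("2016 FIVB Volleyball World League", 147_688, 651),
    ("El Clasico", 256_846, 267),
]

# Upper end of the 0.03-0.04 range reported for LRU caching with Zipf
# requests; using it per article is an assumption of this sketch.
THRESHOLD = 0.04


def modification_ratio(views, modifications):
    """mm/rm for one article: modifications divided by views (requests)."""
    return modifications / views


def articles_above(threshold=THRESHOLD):
    """Names of the articles whose ratio exceeds the given threshold."""
    return [name for name, views, mods in TABLE_2
            if modification_ratio(views, mods) > threshold]


if __name__ == "__main__":
    for name, views, mods in TABLE_2:
        print(f"{name:<55} {modification_ratio(views, mods):.4f}")
    print("Above threshold:", articles_above())
```

The recomputed values agree with the rounded mm/rm column to within rounding, and the first four articles of the table fall above the assumed threshold.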

Fig. 12. Total energy consumption and average number of hops to hit the content. LRU caching, Zipf content request distribution, mm /rm = 0.1, 0.01, 0.001.

To investigate the effect of the popularity distribution, we consider different values of the Zipf parameter α. Fig. 10 shows the total energy consumption of a CDN using the LRU caching strategy, where content requests follow a Zipf distribution with different values of α. The ratio between the number of modifications and requests is equal to 0.1. The graph shows that the total energy consumption decreases as α increases. Indeed, a larger α increases the hit probability in the servers, so that the transmission energy consumption and, consequently, the total energy consumption decrease. However, the effect of the α parameter is marginal.

7.4. Synchronization as insertion and modification

Synchronization energy consumption is not limited to content modifications. It also includes content insertion. When a new content is inserted, depending on the caching strategy in use and on the popularity of the inserted content, it is fetched to some surrogates and takes the place of the least popular contents. This action consumes energy, which is not negligible in the computation of the total energy consumption. The reaction to a content insertion in a CDN system depends on the caching strategy. For instance, with static caching, inserted contents are delivered to users only from the primary server, whereas with LU or LRU caching, the newly inserted content is fetched to the closest surrogate server on its first request. The content to be deleted from the surrogate is also chosen according to the caching strategy. Therefore, insertion energy consumption, unlike the other energy consumption components, cannot be modeled in general for all CDNs. However, since insertion affects synchronization energy consumption, in this work we evaluate the impact of insertion energy consumption on the total energy consumption through simulation. We consider three different cases, namely modifications only, insertions only, and both modifications and insertions. Fig. 11 shows the total energy consumption for all three cases for different ratios.

Fig. 11(a) represents the total energy consumption for modifications only, where the ratio between modifications and requests takes the values 0.05, 0.01, 0.005, and 0.001. Fig. 11(b) represents the total energy consumption for insertions only, with the same ratios between insertions and requests, and Fig. 11(c) shows the total energy consumption for both modifications and insertions. The caching strategy in all cases is LRU, and requests to contents are distributed according to a Zipf distribution. When a new content is inserted into the primary server, it is considered to be the most popular content; therefore, it replaces the least popular content in the primary server. On each request arrival, if the requested content is the newly inserted one, the LRU caching strategy fetches the content to the closest surrogate server, from which it is delivered to the requester. Since the newly inserted contents are not stored a priori in any of the surrogate caches, and since they are the most popular contents, they have to be fetched from the primary server to the caches of most surrogate servers, which is energy consuming. When both modifications and insertions are considered in the CDN system, the synchronization effect is more evident, and a more dramatic increase in the total energy consumption can be observed when the number of surrogate servers increases (Fig. 11(c)).

7.5. Trade-off between energy consumption and delay

Previous results show that increasing the number of surrogates is not necessarily energy efficient. However, increasing the number of surrogates is always beneficial from the quality of service perspective. Indeed, the closer surrogate servers are to end users, the smaller the access delay, because the average number of hops needed to fetch the content is reduced. The average number of hops to hit the content depends on the probability of hitting the content in each network tier, and is computed as:

$$H_m = P_A H_{sd}^{A} + P_B H_{sd}^{B} + P_C H_{sd}^{C} \qquad (8)$$

where Hm is the average number of hops to hit the content. Simulation results confirm this average hop computation. Fig. 12 represents the relationship between the total energy consumption and the average number of hops to hit the content when mm/rm = 0.1, 0.01, 0.001 and the caching policy is LRU with Zipf content request distribution. There is a trade-off between energy consumption and average access delay, which should be considered when designing the network system and deciding on the number of surrogate servers.

7.6. Content modification frequency in real CDNs

Synchronization energy is paid when contents are modified or new contents are inserted. Fig. 11 shows that the total energy consumption follows a similar trend in the modifications-only and insertions-only cases, depending on the ratio between insertions/modifications and requests. The question that arises is which ratios are found in practice in real CDNs. We analyze here some


data available for YouTube and Wikipedia, and discuss other services available on CDNs. Based on the statistics published in May 2013 by YouTube, a well-known video content provider [17], YouTube exhibits a ratio between the number of content insertions and content requests of around 0.001 (mm/rm = 0.001). Indeed, statistics show that over 6 billion hours of video are watched per month on YouTube, while the amount of uploaded video is around 100 h per minute. In this scenario, content requests are largely dominant with respect to content modifications.

Wikipedia is a mainly text-based content provider in which contents are added or modified more regularly. Some articles in Wikipedia are stable, while others are modified very frequently. Statistics related to English Wikipedia for January 2017 [18] show that English Wikipedia had over 8 billion page views per month (content requests), while 824 new articles were inserted per day (content insertions) and 3.5 million articles were edited per month (content modifications). Considering these global statistics, Wikipedia experienced a ratio of around 0.0005. However, many articles in Wikipedia are edited more frequently. Using the tool provided by Wikipedia [19], Table 2 lists some examples of pages with various ratios between content modifications and requests during a month. The ratios vary from 0.001 to 0.3, which shows the importance of taking synchronization energy consumption into account for Wikipedia, as done in the proposed model. This observation can provide insights for system administrators on making energy-efficient content replication decisions.

There are also other content providers for which content insertion and modification rates are critical. Social networks such as Facebook, Twitter, and Instagram are some examples. In social networks, a huge amount of content such as photos, text posts, and videos is inserted every minute. These contents have to be propagated to surrogate servers according to the caching strategy, which is energy consuming. Content modifications in social networks can be the reactions to a shared content. For instance, the likes and comments that each content receives can be counted as modifications of that content. Unfortunately, data on these social networks are not available. Finally, as an example of other CDNs in which the modification rate is considerable, we cite live event reports such as live sports, which include a large number of user comments. In summary, the analysis in this work covers a wide set of parameter values, which includes realistic values of the ratio between insertions/modifications and requests.

8. Conclusions

In this paper we propose a simple, yet realistic, model to compute the energy consumption of CDNs. The model differs from previous ones in two major aspects. First, it introduces into the computation of the total energy consumption the synchronization energy, i.e., the energy consumed to keep all surrogate servers updated on each modification or insertion of a content in the system. Second, it uses a three-tier hierarchical network topology, which is more representative of the real Internet map. The model is validated through simulation results obtained with an ad hoc simulator. Results reveal that, depending on the ratio between the rate at which contents are modified and inserted and the request arrival rate, the total energy consumption does not always decrease when more surrogate servers are added. If this ratio exceeds a given

threshold (between 0.03 and 0.04 for the LRU caching policy and Zipf content request distribution), the total energy consumption of the network increases with the number of surrogate servers. This is due to the increase in the synchronization energy consumption. We also compare the total energy consumption of three different caching strategies, namely static, LRU and LU, with respect to two content popularity distributions. Results show that significant energy savings are possible with the LRU and LU strategies with respect to static caching. Finally, the results show that there is a trade-off between the total energy consumption and the average transmission delay. In summary, for an energy-aware CDN design, the contribution of the synchronization energy cannot always be neglected, being quite significant in many scenarios.

References

[1] C. Forster, I. Dickie, G. Maile, H. Smith, M. Crisp, Understanding the environmental impact of communication systems, Ofcom (2009).
[2] U. Lee, I. Rimac, D. Kilper, V. Hilt, K. KSE, Toward energy efficient content dissemination, IEEE Network (2011), http://dx.doi.org/10.1109/MNET.2011.5730523.
[3] U. Lee, I. Rimac, V. Hilt, Greening the internet with content-centric networking, IEEE INFOCOM (2012).
[4] A. Feldmann, A. Gladisch, M. Kind, C. Lange, G. Smaragdakis, F.-J. Westphal, Energy trade-offs among content delivery architectures, IEEE Telecommunications Internet and Media Techno Economics (CTTE) (2010), http://dx.doi.org/10.1109/CTTE.2010.5557700.
[5] A. Bianco, R. Mashayekhi, M. Meo, Energy consumption for data distribution in content delivery networks, IEEE International Conference on Communications (ICC) (2016), http://dx.doi.org/10.1109/FCST.2015.49.
[6] K. Guan, G. Atkinson, D.C. Kilper, E. Gulsen, On the energy efficiency of content delivery architectures, 2011 IEEE International Conference on Communications Workshops (ICC) (2011), http://dx.doi.org/10.1109/iccw.2011.5963557.
[7] S. Igder, H. Idjmayyel, B.R. Qazi, S. Bhattacharya, J.M.H. Elmirghani, Load adaptive caching points for a content distribution network, 9th International Conference on Next Generation Mobile Applications, Services and Technologies (2015), http://dx.doi.org/10.1109/NGMAST.2015.24.
[8] X. Zheng, H. Li, Z. Jiang, G. Wang, Energy efficient analysis for content distribution network with multiphase sleep-wake mechanism based on stochastic petri nets, 9th International Conference on Frontier of Computer Science and Technology (FCST) (2015), http://dx.doi.org/10.1109/FCST.2015.49.
[9] R. Modrzejewski, L. Chiaraviglio, I. Tahiri, F. Giroire, E. Le Rouzic, E. Bonetto, F. Musumeci, R. Gonzalez, C. Guerrero, Energy efficient content distribution in an isp network, IEEE Global Communications Conference (GLOBECOM) (2013), http://dx.doi.org/10.1109/GLOCOM.2013.6831508.
[10] V. Valancius, N. Laoutaris, L. Massoulié, C. Diot, P. Rodriguez, Greening the internet with nano data centers, in: ACM CoNEXT'09, Rome, Italy, 2009, http://dx.doi.org/10.1145/1658939.1658944.
[11] K. Lim, Y. Bang, J. Sung, J.-K. Rhee, Joint optimization of cache server deployment and request routing with cooperative content replication, IEEE International Conference on Communications (ICC) (2014), http://dx.doi.org/10.1109/ICC.2014.6883582.
[12] D. Tuncer, M. Charalambides, R. Landa, G. Pavlou, More control over network resources: an ISP caching perspective, 9th International Conference on Network and Service Management (CNSM) (2013), http://dx.doi.org/10.1109/CNSM.2013.6727806.
[13] J. Baliga, R. Ayre, K. Hinton, R.S. Tucker, Architectures for energy-efficient IPTV networks, Optical Fiber Communication OFC (2009), http://dx.doi.org/10.1364/OFC.2009.OThQ5.
[14] S. Sivasubramanian, V. Universiteit, G.C. Pierre, M. Van Steen, G. Alonso, Analysis of caching and replication strategies for web applications, IEEE Internet Computing (2007), http://dx.doi.org/10.1109/MIC.2007.3.
[15] L. Breslau, P. Cao, L. Fan, G. Phillips, S. Shenker, Web caching and Zipf-like distributions: evidence and implications, IEEE INFOCOM (1999), http://dx.doi.org/10.1109/INFCOM.1999.749260.
[16] CDNsim simulator, Available at https://sourceforge.net/projects/cdnsim/.
[17] YouTube statistics, Available at http://www.youtube.com/yt/press/statistics.html.
[18] English wikipedia at a glance January 2017, Available at https://stats.wikimedia.org/EN/SummaryEN.htm.
[19] Wikipedia pageview analysis tool, Available at https://tools.wmflabs.org/pageviews/.

Please cite this article in press as: A. Bianco, et al., On the energy consumption computation in Content Delivery Networks, Sustain. Comput.: Inform. Syst. (2017), http://dx.doi.org/10.1016/j.suscom.2017.08.008