Cellular traffic offloading utilizing set-cover based caching in mobile social networks


The Journal of China Universities of Posts and Telecommunications April 2016, 23(2): 46–55 www.sciencedirect.com/science/journal/10058885 http://jcup...



http://jcupt.bupt.edu.cn

Bao Xuyan, Zhou Xiaojin, Zhang Yong, Song Mei

School of Telecommunication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract To cope with explosive data demands, offloading cellular traffic through mobile social networks (MSNs) has become a promising approach to alleviating traffic load. In particular, repeated transmission of the same data causes a great deal of unnecessary traffic. Existing solutions generally adopt proactive caching and achieve traffic shifting by exploiting opportunistic contacts. The key challenge in maximizing the offloading utility is balancing the trade-off between the offloaded traffic and the users' delay requirements. Since current caching schemes rarely address this challenge, in this paper we first quantitatively interpret the offloading revenue on the cellular operator side in terms of the number of caching users, and then develop a centralized caching protocol to maximize this revenue. The protocol includes a caching-location selection algorithm based on set cover, a cached-data dissemination strategy based on multi-path routing, and a cache replacement policy based on data popularity. Experimental results on real-world mobility traces show that the proposed caching protocol outperforms existing schemes in the offloading scenario.

Keywords traffic offloading, set cover, caching, mobile social networks

1 Introduction

We envision a future in which the era of big data is driven by the overwhelming growth of mobile data traffic. Cisco forecasts that global mobile data traffic will increase nearly 10-fold from 2014 to 2019 and reach over 24.3 EB per month in 2019 (http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html). Such a volume of data traffic will put heavy pressure on cellular networks. One straightforward solution would be to deploy more base stations to increase cellular capacity, which is expensive and yields low financial returns. However, considering the proximity-based communication ability of mobile devices and the delay-tolerant nature of some data items, offloading cellular traffic through MSNs [1] has become a promising approach to alleviating traffic load [2–3].

Received date: 12-11-2015 Corresponding author: Zhang Yong, E-mail: [email protected] DOI: 10.1016/S1005-8885(16)60020-1

With a proper incentive mechanism [4], some users are willing to wait for a specified delay before downloading a data item. Thus, traffic offloading can be achieved by intentionally caching cellular data at a small set of users, which then disseminate the data item to other requesters upon opportunistic contacts. A critical concern of this offloading paradigm is that the data is not guaranteed to reach the requester before a given deadline, owing to intermittent transmission paths and highly dynamic user mobility. If the delay period expires, the remaining part of the data still has to be obtained via the cellular network. Therefore, the key challenge is to balance the trade-off between the offloaded traffic and the users' delay requirements. Meanwhile, Ref. [5] reveals that the same data is often delivered repeatedly to individual requesters, which exacerbates the traffic load on the cellular network. To address these challenges, we must answer the following questions: which users should be selected as caching nodes, and how many copies of the content should be injected into the mobile users?

Issue 2

Bao Xuyan, et al. / Cellular traffic offloading utilizing set-cover based caching in mobile social networks

In this paper, we first quantitatively interpret the offloading revenue on the cellular operator side in terms of the number of caching users. To maximize the offloading revenue, we propose a viable method that minimizes the number of caching locations needed to cover all the requesters while providing reliable data accessibility. Second, we define the α-path-set and the β-neighbor-set to measure the probabilistic relationship between a pair of nodes. Third, based on the β-neighbor-set, we formulate the problem of finding the optimal caching users as a set-cover problem; based on the α-path-set, we schedule the dissemination from the caching users to the requesters by exploiting multi-path routing with a controllable number of data copies. Finally, a cache replacement policy is designed based on both the requesting frequency and the expiration time. To summarize, our key contributions are threefold:

1) To the best of our knowledge, this is the first attempt to formulate the cache optimization problem as a classical set-cover problem, which can be solved as an integer linear program (ILP) and achieves the maximum offloading revenue.

2) We develop a centralized caching protocol that contains the following parts: a) a caching-location selection algorithm based on set cover; b) a cached-data dissemination strategy based on multi-path routing; c) a cache replacement policy based on data popularity.

3) We conduct extensive simulations on real-world mobility traces. The simulation results show that our proposed caching protocol is more efficient and more suitable for the offloading scenario than previous caching schemes.

The rest of the paper is organized as follows. Sect. 2 reviews the related work. Sect. 3 presents our motivation for caching in the offloading scenario and the network model. Sect. 4 describes the details of the set-cover based caching scheme. Sect. 5 evaluates the performance of the proposed scheme by trace-driven simulations. Sect. 6 concludes the paper.

2 Related work

There is a rich body of studies on cellular network offloading and delay tolerant networks (DTNs) that inspired our design. Mobile data offloading has become an indispensable means of reducing the traffic burden on cellular networks. A recent survey [6] classifies existing offloading strategies into two categories according to their requirements: AP-based and terminal-to-terminal based. In the former category, Hoteit et al. [7] evaluate the capacity and energy-saving gains of passpoint-hotspot offloading under different placement and AP-selection schemes. In Ref. [8], a deployment algorithm based on the density of user request frequency is proposed, which nearly achieves the optimal offloading ratio. In the latter category, offloading is realized by exploiting terminal-to-terminal connections in a delay-tolerant manner. Refs. [2,5,9] mainly focus on pre-determining a set of users as offloading assistants between the cellular network and other users. Moreover, a reverse auction-based incentive framework is designed in Ref. [4] to balance the offloaded traffic against users' satisfaction. An MSN is a specific type of DTN, which lacks stable and persistent end-to-end routing paths due to unpredictable node mobility. Meanwhile, cooperative caching for DTNs has been extensively studied in recent years. In Ref. [10], a cooperative cache-based content delivery framework is established based on maximizing the overall probability that subscriptions are served. Gao et al. [11] propose to actively cache data at some central locations in the network to improve data accessibility. Besides caching at users, Ying et al. [12] investigate where to deploy throwboxes in large-scale DTNs. Refs. [13–15] also design caching schemes by exploiting social attributes such as degree centrality and betweenness. Moreover, Zhuo et al. [15] propose a social-based caching framework that accounts for the impact of contact duration.
In comparison, in this paper we first interpret the offloading revenue on the cellular side, which is determined by the number of caching users. Second, we are the first to formulate the caching framework as an ILP by connecting it with the classical set-cover problem. Third, whereas caching protocols for DTNs operate in a distributed manner, in the offloading scenario the optimal caching performance can be achieved with the aid of centralized cellular control.

3 Motivation and network model

3.1 Motivation

In a cellular system, when a user requests a data item, the network operator immediately pushes the data to the user via the cellular base station. This mechanism therefore results in duplicated transmissions to different users that subscribe to the same data item, which brings unnecessary traffic load. See Fig. 1 for an illustration. The left part shows the traditional communication mode of the cellular system. Assume that the file size is $s$ and 6 users request the same file; obtaining separate copies consumes $6s$ of traffic. If we cache the file at some appropriate users, these selected users can then disseminate the file to the requesters by exploiting opportunistic contacts. The right part shows that we intentionally cache the file at 3 users, which consumes $3s$ of traffic; the remaining traffic can then be offloaded through communications among mobile users.

Fig. 1 Traffic offloading with aid of caching

This example inspires us to represent the offloading revenue on the cellular operator side. Denote a data item as $d$ and its size as $s$. Suppose there are $n$ requesters and we select $m$ caching users for offloading. Then the revenue of an operator with respect to $d$ can be defined as follows:

$$ R(d) = ns - ms \tag{1} $$

In the example of Fig. 1, $n = 6$ and $m = 3$, so $R(d) = 6s - 3s = 3s$. Obviously, to maximize $R$, an operator should select a minimal user set (minimizing $m$) as caching locations that cover all the requesters and provide data accessibility. To effectively offload data traffic, the route selection from the caching locations to the requesters is another critical issue. Existing routing schemes in MSNs aim to approach the performance of epidemic routing by exploiting forwarding metrics [16–17] or social relationships [18–19]. These routing schemes are mostly applied in pure MSN environments, where the distributed manner results in coarse-grained control of data replication, and social-metric based routing performs poorly because the social graph is estimated statically. To cope with these challenges, we provide a path filtering criterion based on path weight and then create a path set for proactive routing with a given transmission probability, which utilizes centralized cellular control and provides fine-grained control of replication overhead. In summary, this paper focuses on how to find a minimal node set and design an efficient routing scheme in order to offload as much traffic as possible via opportunistic contacts.

3.2 Network model

In this paper, we consider a general offloading scenario within a single cell. When a user queries a data item, the cellular operator provides two options: retrieve the data item promptly from the cellular network, or retrieve it from a caching user within a specified tolerable delay. Only in the latter case can the offloading revenue be achieved, by presetting some caching locations. However, as assumed in Refs. [2,11], not all nodes are willing to participate in data caching. Thus, we assume that the available buffer space of each user is limited. After the network operator actively caches the data item at a set of mobile users determined by the caching scheme, the caching users disseminate the data item to the requesters via intermittent multi-hop paths formed by a sequence of opportunistic contacts.

In MSNs, opportunistic contacts are described by a network contact graph $G(V, E)$, where $V$ is the set of $n$ mobile nodes and the stochastic contact process between each pair of nodes $v_i, v_j \in V$ is modeled as an edge $e_{ij} \in E$. The characteristics of an edge are determined by the pairwise inter-contact time. Following Refs. [11,20], we assume that the inter-contact time follows an exponential distribution. The contacts between $v_i$ and $v_j$ thus form a Poisson process with rate $\lambda_{ij}$, which is the weight of edge $e_{ij}$. Similar to Ref. [11], we then define the multi-hop opportunistic path on $G(V, E)$.

Definition 1 Opportunistic path. An $h$-hop opportunistic path $P_{AB} = (V_p, E_p)$ between nodes $A$ and $B$ consists of a node set $V_p = \{A, N_1, N_2, \ldots, N_{h-1}, B\} \subset V$ and an edge set $E_p = \{e_1, e_2, \ldots, e_h\} \subset E$ with edge weights $\{\lambda_1, \lambda_2, \ldots, \lambda_h\}$. The path weight $p_{AB}(T)$ is the probability that data is opportunistically transmitted from $A$ to $B$ along $P_{AB}$ within time $T$.

As previously described, the inter-contact time $J_k$ between nodes $N_k$ and $N_{k+1}$ follows an exponential distribution with probability density function (PDF) $p_{J_k}(x) = \lambda_k e^{-\lambda_k x}$. Thus, the transmission delay along $P_{AB}$ is $D = \sum_{k=1}^{h} J_k$, which (for pairwise distinct rates $\lambda_k$) follows a hypoexponential distribution:

$$ p_D(x) = \sum_{k=1}^{h} C_k^{(h)} \, p_{J_k}(x) \tag{2} $$

where the coefficients are $C_k^{(h)} = \prod_{a=1, a \neq k}^{h} \frac{\lambda_a}{\lambda_a - \lambda_k}$. Based on Eq. (2), the path weight can be computed as:

$$ p_{AB}(T) = \int_0^T p_D(x)\,dx = \sum_{k=1}^{h} C_k^{(h)} \left(1 - e^{-\lambda_k T}\right) \tag{3} $$

For simplicity, if $q$ is an opportunistic path connecting $A$ and $B$, $p_q(T)$ and $p_{AB}(T)$ are used interchangeably. This definition of path weight allows us to quantitatively measure the relationships between the requesters and the caching nodes. Next, we propose a novel caching scheme by connecting it with the set cover problem [21].

4 Caching scheme

In this section, we first define the data popularity and the node neighbor set, and then present in detail the caching scheme based on the set cover problem. The ultimate goal is to choose as few caching nodes as possible while still guaranteeing data accessibility for the requesters, so that the offloading revenue is maximized.

4.1 Data popularity

When the buffer space of the chosen node is adequate, we are free to cache any data item. However, when the buffer space is full, data popularity plays an important role in judging which data is more appropriate to cache. We define the data popularity by considering both the requesting frequency and the expiration time of the data. Denote a data item by $d$; its popularity $I(d)$ is computed as:

$$ I(d) = 1 - e^{-\lambda_d (t_e - t_c)} \tag{4} $$

In Eq. (4), $t_c$ is the current time and $t_e$ is the expiration time of $d$. We assume that the past $b$ requests over the time period $[t_s, t_c]$ follow a Poisson process with arrival rate $\lambda_d = b/(t_c - t_s)$, which indicates the requesting frequency. $I(d)$ represents the probability that this data will be requested again before expiration. This metric gives hot-spot and newly arrived data more opportunity to be cached.

4.2 Neighbor set

Based on the analysis in Sect. 3.1, the cellular operator proactively pushes data into as few caching nodes as possible, and the data is then opportunistically routed to the requesters. This offloading method has two parts: one chooses the best caching locations, and the other routes data from the caching users to the requesters. As we will see in Sect. 4.3, the main idea is to adopt the set cover methodology, which uses the smallest node set to cover all the requesters. The key to building an explicit model is to answer the question: what does it mean for ‘a node to cover another node’? In the original mathematical model, covering means that an element belongs or does not belong to a set. In MSNs, however, the relationship between the caching node and the requester can be expressed as ‘whether the data can be successfully delivered before expiration or not’. Thus, we describe the covering relationship probabilistically, as the transmission probability (β-neighbor) from the caching user to the requester by exploiting multi-path routing (α-path-set). To construct the covering matrix in Sect. 4.3, we now give the definition of the neighbor set of a node.

For any node $v_i$, the neighbor set, denoted by $N(v_i) \subseteq V$, is obtained by filtering the candidate node set $V_c$ according to the following definition. For any $v_j \in V_c$, without loss of generality, we denote by $P(v_i \to v_j)$ the path set that contains all the paths connecting $v_i$ and $v_j$. To proceed, we further define the α-path-set as follows:

Definition 2 The α-path-set of nodes $v_i$ and $v_j$ is denoted by $P_\alpha(v_i \to v_j)$ and defined as the set of paths whose path weight is larger than or equal to $\alpha$:

$$ P_\alpha(v_i \to v_j) = \{\, q : q \in P(v_i \to v_j) \wedge p_q(T) \geq \alpha \,\} \tag{5} $$

Then, we say that $v_j$ is a β-neighbor of $v_i$ if and only if the following condition holds:

$$ 1 - \prod_{q \in P_\alpha(v_i \to v_j)} \left[1 - p_q(T)\right] \geq \beta \tag{6} $$

Physically speaking, we filter out all the paths between $v_i$ and $v_j$ whose path weight is not less than $\alpha$, which means that the probability that the data is delivered before expiration along any single path in the α-path-set is not less than $\alpha$. We further measure the probabilistic distance based on the β-neighbor, which means that the probability that the data is successfully delivered by exploiting all the paths in the α-path-set is not less than $\beta$. From Eq. (3) and Definition 2, the α-path-set is determined by the expired time $T$ and the path-weight threshold $\alpha$. $T$ reflects the delay tolerance of the upper-layer application, and $\alpha$ can be preset by the cellular operator. If $\alpha$ is set too large, the α-path-set becomes empty; conversely, too many paths would incur extra copies. Based on Definition 2 and Eq. (6), we are able to screen out each node $v_j \in V_c$ that is a β-neighbor of $v_i$ and then form the neighbor set $N(v_i)$. In addition, we stipulate that $v_i$ is a neighbor of itself. It is noteworthy that $\alpha$ is a tunable parameter, controlled by the cellular operator to balance the number of data copies and the number of available paths, and $\beta$ is a metric that probabilistically measures the relationship between $v_i$ and $v_j$ instead of geographical distance.

Note that Eq. (6) gives the ideal delivery probability only if the end-to-end delays of all paths in $P_\alpha(v_i \to v_j)$ are independent, which implies that the paths are fully disjoint and there is no interference between packets routed on different paths. This assumption is also adopted in a prior study [22]. Compared with existing distributed routing in pure MSNs, paths with higher transmission probability can be picked out based on the α-path-set, which provides a better delivery ratio. Additionally, the number of data copies can be finely controlled through the cardinality of the α-path-set. In terms of transmission delay and offloading ratio, the caching location should be as near as possible to the requester, which guarantees that the data is delivered with a short delay. Compared with randomized or centrality-based selection, the path weight is dynamically estimated by measuring the contact density $\lambda$, so the probabilistic distance between the caching node and the requester can be quantitatively calculated, which provides fine-grained optimization in selection.

4.3 Set-cover based caching location

The set covering problem (SCP) is known to be NP-hard and can be formulated as an ILP problem [23]. We state the SCP mathematically as follows. Given a finite set $X$ and a family $F$ of subsets of $X$ such that every $x \in X$ belongs to at least one $f \in F$, a subset $C \subseteq F$ whose members cover $X$, i.e., $X = \bigcup_{f \in C} f$, is called a set cover. The goal is to find a minimum-size set cover $C$. In our offloading scenario, we can formulate the problem of finding the optimal caching node set as an SCP. Let $R = \{r_1, r_2, \ldots, r_m\}$ be the set of all requesters that subscribe to the same data item, and let $|V| = n$. We then define an $m \times n$ covering matrix $C$ as follows:

$$ c_{ij} = \begin{cases} 1, & r_i \in N(v_j) \\ 0, & \text{otherwise} \end{cases} \tag{7} $$

Eq. (7) states that $c_{ij}$ is one if requester $r_i$ is a neighbor of node $v_j$, i.e., $v_j$ covers $r_i$. Then our objective is to minimize the number of caching nodes that still cover all the requesters $R$:

$$ \min \sum_{j=1}^{n} z_j \quad \text{s.t.} \quad C\mathbf{z} \geq \mathbf{1}, \ \mathbf{z} \in \{0,1\}^n \tag{8} $$

where $z_j$ indicates whether the data item is cached at $v_j$, and $\mathbf{z}$ represents the optimal caching locations. Observing that every element of $C$ and $\mathbf{z}$ is an integer, we can use dynamic programming to solve this problem with $O(mn)$ time complexity.

4.4 Caching protocol

In this section, we present our set-cover based caching protocol in a practical offloading context. Assume that requesters $R = \{r_1, r_2, r_3, r_4\}$ query the cellular operator for access to data item $d$ with a specified delay $T$. Considering the limited energy and computational capacity of mobile devices, users with a wireless interface only detect their nearby peers, record the contact rates $\lambda_{ij}$ and build their own adjacency matrix. Fig. 2 illustrates all the essential steps of the caching procedure.

Fig. 2 An example of caching a data item
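Steps 2 and 3 below rely on three computations: the path weight of Eq. (3), the α-path-set filter of Eq. (5) and the β-neighbor test of Eq. (6). The following Python sketch illustrates them under stated assumptions: contact rates are kept in a dictionary keyed by unordered node pairs (a hypothetical representation), the rates along any path are pairwise distinct so that the closed form of Eq. (2) applies, and paths are enumerated by depth-first search with pruning instead of the BFS of Step 1. The pruning is safe because appending a hop adds a nonnegative delay and can therefore only lower a path's weight. The function names and the toy graph are hypothetical and only loosely modelled on the Step 3 example.

```python
import math

# Illustrative sketch; function names and data are hypothetical.


def path_weight(rates, T):
    """Eq. (3): probability that a path whose hops have contact rates `rates`
    delivers within time T (CDF of a hypoexponential delay).

    Assumes pairwise distinct rates, since the closed form divides by
    (lambda_a - lambda_k).
    """
    if len(set(rates)) != len(rates):
        raise ValueError("closed form of Eq. (2) needs distinct rates")
    total = 0.0
    for k, lam_k in enumerate(rates):
        coeff = 1.0  # C_k^(h) = prod_{a != k} lambda_a / (lambda_a - lambda_k)
        for a, lam_a in enumerate(rates):
            if a != k:
                coeff *= lam_a / (lam_a - lam_k)
        total += coeff * (1.0 - math.exp(-lam_k * T))
    return total


def alpha_path_set(rate, src, dst, T, alpha):
    """Eq. (5): simple paths src -> dst whose path weight is at least alpha.

    `rate` maps frozenset({u, v}) to the contact rate lambda_uv.
    """
    nodes = {n for pair in rate for n in pair}
    found = []

    def extend(path, rates):
        for nxt in nodes:
            pair = frozenset((path[-1], nxt))
            if nxt in path or pair not in rate:
                continue
            new_rates = rates + [rate[pair]]
            w = path_weight(new_rates, T)
            if w < alpha:
                continue  # prune: any longer extension is even less likely
            if nxt == dst:
                found.append((path + [nxt], w))
            else:
                extend(path + [nxt], new_rates)

    extend([src], [])
    return found


def is_beta_neighbor(paths, beta):
    """Eq. (6): 1 - prod_q (1 - p_q(T)) >= beta, assuming independent paths."""
    miss = 1.0
    for _, w in paths:
        miss *= 1.0 - w
    return 1.0 - miss >= beta


if __name__ == "__main__":
    # Hypothetical contact rates per time unit; T uses the same time unit.
    rate = {
        frozenset(("c1", "r1")): 0.5,
        frozenset(("c1", "v2")): 0.2,
        frozenset(("v2", "r1")): 0.1,
        frozenset(("c1", "v3")): 0.9,
        frozenset(("v3", "r1")): 0.7,
    }
    paths = alpha_path_set(rate, "c1", "r1", T=5.0, alpha=0.5)
    print(sorted(tuple(p) for p, _ in paths))  # [('c1', 'r1'), ('c1', 'v3', 'r1')]
    print(is_beta_neighbor(paths, beta=0.5))  # True
```

In this toy graph the two-hop path c1 → v2 → r1 has weight of about 0.15 and is excluded, mirroring how Step 3 drops a path whose weight is below α.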

Step 1 When the caching mechanism is triggered, the cellular operator starts by identifying the adjacency list of each requester $r_i$ and then conducts a breadth-first search (BFS) to traverse potential candidate nodes.

Step 2 The cellular operator presets the values of $\alpha$ and $\beta$. Once a node is traversed by the BFS, check whether Eq. (5) holds; if so, add the path into the α-path-set between the current node $v_i$ and the requester $r_i$; otherwise, remove the current node (leaf node) from the predecessor node's adjacency list and continue searching.

Step 3 After the traversal, check whether $r_i$ is a β-neighbor of $v_i$ based on Eq. (6); if so, add $r_i$ to $N(v_i)$ and record the corresponding α-path-set. For example, $c_1$ has two available paths to $r_1$, but $c_1 \to v_2 \to r_1$ is excluded because its path weight is less than $\alpha$.

Step 4 The cellular operator then builds the covering matrix, solves Eq. (8) by dynamic programming and determines the optimal caching locations. For example, $c_1$ and $c_2$ are selected as caching nodes.

Step 5 If caching node $c_i$ does not have enough space to accommodate $d$, it removes as much data as the size of $d$ from its buffer, choosing the data with the least popularity. If $d$ itself has the least popularity, the operator rebuilds the covering matrix excluding $c_i$ and goes back to Step 4.

Step 6 The operator then pushes $d$, together with its popularity computed by Eq. (4), to $c_i$, and routes $d$ to the requesters $r_i \in N(c_i)$ by exploiting every path in the α-path-set.

Step 7 If requester $r_i$ does not receive the complete $d$ from caching node $c_i$ within the specified time $T$, the remaining part of $d$ is obtained directly via the cellular network.

In the following, we give some explanations needed to fully implement the proposed caching protocol.

1) If there is only one requester for a specific data item, caching is meaningless. Thus, in practice, the cellular operator should use a ‘sliding window’ to accumulate data requests and trigger caching periodically.

2) All the computation in Steps 2–3 is transferred to the cellular side so that the lifetime of mobile devices is maximized. The adjacency list is uploaded to the base station, and the cellular operator then maintains the route table and makes decisions.

3) For Step 5, data popularity is used to manage limited caching space. We make the simplifying assumption that the data size is far less than the caching space, so that any new data item can be accommodated by deleting old ones.

4) For Step 6, to realize data dissemination with a given route entry, the caching node encapsulates the raw data with the route entry. For example, when $d$ is delivered from $c_2$ to $r_3$ using route entry $c_2 \to v_4 \to r_3$, $c_2$ finds that the next hop is $v_4$, deletes itself from the route entry, leaving $v_4 \to r_3$, and finally waits for a contact with $v_4$ and delivers the data with the new route entry. In this way, each intermediate node only checks the next hop in the route entry and waits for the contact opportunity. If any intermediate node contacts the requester, it delivers the data directly.
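Steps 4 and 5 can be sketched in Python as follows. Because the SCP is NP-hard (Sect. 4.3), no known algorithm solves every instance of Eq. (8) exactly in polynomial time, so this sketch substitutes two textbook techniques: exhaustive search, which is exact but only practical for a handful of candidate nodes, and the classical greedy approximation. Neither is the dynamic-programming procedure mentioned in Step 4. The eviction routine assumes a buffer represented as a dictionary from item to (size, popularity) and interprets ‘the popularity of $d$ is the least’ as: only items less popular than $d$ may be evicted. All names and the toy neighbor sets are hypothetical, chosen so that, as in the Step 4 example, $c_1$ and $c_2$ are selected.

```python
import math
from itertools import combinations

# Illustrative sketch; function names and data are hypothetical.


def covering_matrix(requesters, nodes, neighbor_sets):
    """Eq. (7): c[i][j] = 1 iff requester r_i is a beta-neighbor of node v_j."""
    return [[1 if r in neighbor_sets[v] else 0 for v in nodes] for r in requesters]


def exact_cover(C):
    """Smallest column set covering every row of C (exhaustive, tiny inputs only)."""
    m, n = len(C), len(C[0])
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            if all(any(C[i][j] for j in cols) for i in range(m)):
                return list(cols)
    return None  # some requester has no covering node


def greedy_cover(C):
    """Greedy approximation: repeatedly pick the column covering most uncovered rows."""
    uncovered, chosen = set(range(len(C))), []
    while uncovered:
        best = max(range(len(C[0])), key=lambda j: sum(C[i][j] for i in uncovered))
        gained = {i for i in uncovered if C[i][best]}
        if not gained:
            return None  # remaining requesters cannot be covered
        chosen.append(best)
        uncovered -= gained
    return chosen


def popularity(b, t_s, t_c, t_e):
    """Eq. (4) with lambda_d = b / (t_c - t_s)."""
    lam_d = b / (t_c - t_s)
    return 1.0 - math.exp(-lam_d * (t_e - t_c))


def make_room(buffer, capacity, size_d, pop_d):
    """Step 5: evict least-popular items, but only those less popular than d.

    buffer maps item -> (size, popularity). Returns True if d now fits;
    otherwise returns False and leaves the buffer untouched, in which case
    the node is excluded and Step 4 is rerun.
    """
    free = capacity - sum(size for size, _ in buffer.values())
    victims = []
    for item, (size, pop) in sorted(buffer.items(), key=lambda kv: kv[1][1]):
        if free >= size_d or pop >= pop_d:
            break
        victims.append(item)
        free += size
    if free < size_d:
        return False
    for item in victims:
        del buffer[item]
    return True


if __name__ == "__main__":
    requesters = ["r1", "r2", "r3", "r4"]
    nodes = ["c1", "c2", "v4"]
    neighbors = {"c1": {"r1", "r2"}, "c2": {"r3", "r4"}, "v4": {"r2", "r3"}}
    C = covering_matrix(requesters, nodes, neighbors)
    print([nodes[j] for j in exact_cover(C)])  # ['c1', 'c2']
    print([nodes[j] for j in greedy_cover(C)])  # ['c1', 'c2']
```

The greedy variant runs in polynomial time and returns a cover whose size is within roughly a factor of $\ln m$ of the optimum, which is why it is a common fallback when the number of candidate nodes makes exhaustive search impractical.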

5 Performance evaluation

5.1 Simulation settings

We evaluate the performance of our set-cover based caching (SCBC) protocol through trace-driven simulation. Our simulation is conducted on a Bluetooth trace collected on the campus of the University of Calabria in Rende, Italy. The dataset contains Bluetooth proximity data collected by 35 devices running an ad hoc Android application called SocialBlueConn. The Bluetooth range is about 10 m. Each device periodically discovers its nearby Bluetooth neighbors and records the results of device discovery, including the two contact parties and the timestamp of the contact. The detailed characteristics of the Bluetooth trace are shown in Table 1.

Table 1 Description of the realistic Bluetooth trace

Characteristics        Bluetooth trace
Device                 Bluetooth device with SocialBlueConn
Trace duration/d       8
Granularity/s          180
Number of devices      35
Number of contacts     21 920
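The contact rates $\lambda_{ij}$ used by Eqs. (3)–(6) are estimated from such a trace. A minimal Python sketch follows, assuming the trace is available as (device, device, timestamp-in-seconds) sightings: under the Poisson contact model of Sect. 3.2, the maximum-likelihood rate for a node pair is its number of contacts divided by the observation time. Because the trace is sampled every 180 s (Table 1), the sketch also assumes that sightings of the same pair no more than one scan interval apart belong to one contact. This merging rule, the record format and the function name are hypothetical, not part of the described setup.

```python
from collections import defaultdict

# Illustrative sketch; the record format and merging rule are assumptions.


def estimate_contact_rates(records, t_start, t_end, gap=180):
    """Poisson MLE of lambda_ij over [t_start, t_end): contacts / duration.

    records: iterable of (u, v, t) sightings with t in seconds.
    Sightings of one pair at most `gap` seconds apart are merged into a
    single contact (assumed: consecutive scans of the same encounter).
    """
    sightings = defaultdict(list)
    for u, v, t in records:
        if u != v and t_start <= t < t_end:
            sightings[frozenset((u, v))].append(t)
    duration = t_end - t_start
    rates = {}
    for pair, times in sightings.items():
        times.sort()
        # A new contact starts whenever the gap to the previous sighting exceeds `gap`.
        contacts = 1 + sum(1 for a, b in zip(times, times[1:]) if b - a > gap)
        rates[pair] = contacts / duration
    return rates


if __name__ == "__main__":
    trace = [("a", "b", 0), ("b", "a", 180), ("a", "b", 1000), ("a", "c", 50), ("a", "c", 5000)]
    t0, t1 = 0, 7200
    # Train on the first half of the trace only, as in the setup described next.
    training = estimate_contact_rates(trace, t0, t0 + (t1 - t0) / 2)
    for pair, lam in training.items():
        print(sorted(pair), lam)
```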

In the experiments, the simulation is driven by 1-min time slots, chosen with regard to the scale and contact density of the trace. For each time slot, we record the node pairs between which a contact exists in the above trace; such contacts can be exploited for data exchange in each simulation run. The first half of the trace is used for training to estimate the contact rates $\lambda_{ij}$. Both $\alpha$ and $\beta$ are set to 0.5. We generate 50 data items in the network and assume that all data items have the same size. The buffer space of each node is set within the range [200 MB, 600 MB]. Each node generates data requests independently, each attached with a finite expired time $T_e \in [0.5T, 1.5T]$, where $T_e$ is uniformly distributed and $T$ controls the distribution range. Request generation is implemented as a Poisson process, i.e., the inter-arrival time between successive requests follows an exponential distribution. In addition, we assume that the request pattern follows a Zipf distribution with default skewness parameter $w = 1$, where the request rate of the $y$th most popular data item is proportional to $y^{-w}$. We trigger our caching scheme periodically with a time step of 10 min, since a smaller time step would cause high computational complexity and degrade caching performance. Additionally, the data requests over the past 30 min are all recorded at each triggering point, which guarantees an adequate number of requests for each data item.

5.2 Comparison of schemes and evaluation metrics

In the simulation, we compare our caching scheme SCBC against two other schemes. The first is random cache (RC), in which the base station (BS) randomly chooses 10% of the nodes as caching nodes. The second is centrality-based cache (CC), in which the BS chooses the 10% of nodes with the highest degree centrality, where the degree of a node is measured by the number of neighbor nodes it has contacted. Unlike SCBC, which removes the message with the least popularity, both RC and CC use the drop-front policy for cache replacement: when the cache space is full, the message that entered the buffer first is dropped. We adopt Epidemic routing for both RC and CC; it is the most generic routing scheme in DTNs, in which a node replicates the data to every encountered node that has not yet received a copy. In this paper, the system performance is measured by the following metrics:

1) Replication overhead: the number of data copies generated in the network, i.e., how many data copies are injected into the MSN when multiple paths are exploited to route the same data.

2) Delivery delay: the average delay for getting responses to data requests.

3) Delivery ratio: the ratio of data requests satisfied through the DTN within the expired time.

4) Offloading ratio: the ratio of the offloaded cellular traffic to the overall traffic.

5.3 Simulation results

1) Impact of expired time

In this part, we demonstrate the impact of the expired time on system performance. Fig. 3 clearly shows that SCBC outperforms the RC and CC policies in all metrics. This is because SCBC replicates and caches popular data at candidate nodes and routes data to requesting nodes by exploiting every path in the path set, which results in lower delivery delay and a higher hit rate, allowing more traffic to be offloaded.


As shown in Fig. 3(a), the average replication overhead of the proposed SCBC is lower than that of RC and CC, and it grows as the expired time increases, mainly because a longer expired time allows more copies to be sent. Comparing the delivery ratio in Fig. 3(c), we find that SCBC surpasses RC and CC by 133.6% and 258.2%, respectively. This distinct advantage can be attributed to the optimal route appended to each data item, which in SCBC replaces random contact opportunities with deterministic ones. From Fig. 3(b), we can see that the delivery delay of RC and CC grows nearly linearly as the expired time increases: the majority of data packets in RC and CC fail to reach the destination through the DTN within the expired time and are instead delivered by the BS after the expired time. Fig. 3(d) shows that SCBC achieves a markedly higher offloading ratio. In the CC policy, many messages are pushed from the BS into the nodes with high centrality, but only a limited number of them can be cached owing to the scant caching capacity. In RC, the BS pushes data items to nodes randomly, which may produce redundant data items and reduce the offloading ratio, even below zero.

By and large, our proposed scheme outperforms the others owing to its ability to improve system performance in terms of data copies, delivery ratio, delivery delay and offloaded traffic.

Fig. 3 Caching performance with different expired time: (a) Replication overhead; (b) Delivery delay; (c) Delivery ratio; (d) Offloading ratio

2) Impact of data size

To investigate the impact of data size on each scheme, we vary the data size from 25 MB to 200 MB, as shown in Fig. 4. In general, the replication overhead, delivery delay and delivery ratio remain almost unchanged as the data size grows. In RC and CC, most data items are delivered via the BS after the expired time, so these metrics depend on the expired time rather than on the data size. SCBC, in contrast, consistently achieves a low delivery delay thanks to the optimal routes it selects. Moreover, because the data items in the caching space are updated frequently, performance is largely insensitive to the data size. As can be seen from Fig. 4(a), SCBC outperforms the other schemes in replication overhead, since the epidemic routing used by RC and CC is prone to generating more copies. For delivery delay and delivery ratio, Figs. 4(b) and 4(c) show that SCBC reduces the average delivery delay by about 49.0 % and 57.7 % and improves the average delivery ratio by about 126.6 % and 266.6 % compared with RC and CC, respectively. This advantage in delivery delay and ratio can be ascribed to the high probability that the selected paths to the destination are available. As shown in Fig. 4(d), the offloaded traffic of all three schemes degrades as the data size increases. With a fixed caching space, a larger data size means that fewer data items can be cached, so the BS has to resend data items that are evicted before they are successfully delivered, which reduces the offloaded traffic.

Fig. 4 Caching performance with different data size: (a) Replication overhead; (b) Delivery delay; (c) Delivery ratio; (d) Offloading ratio
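The data-size effect discussed above can be illustrated with a fixed-capacity cache. This sketch assumes a hypothetical 400 MB cache per node and a popularity-based replacement rule in which a new item may evict only strictly less popular items; SCBC's actual replacement policy is specified earlier in the paper and may differ. PopularityCache and items_that_fit are hypothetical names.

```python
from typing import Dict, List, Tuple


class PopularityCache:
    """Hypothetical fixed-capacity node cache that evicts the least popular items."""

    def __init__(self, capacity_mb: float) -> None:
        self.capacity_mb = capacity_mb
        self.sizes: Dict[str, float] = {}       # item id -> size in MB
        self.popularity: Dict[str, float] = {}  # item id -> popularity score

    def free_mb(self) -> float:
        return self.capacity_mb - sum(self.sizes.values())

    def insert(self, item: str, size_mb: float, pop: float) -> Tuple[bool, List[str]]:
        """Try to cache `item`; return (cached?, ids evicted to make room)."""
        free = self.free_mb()
        victims: List[str] = []
        # Walk from least to most popular, never evicting data at least as popular.
        for other in sorted(self.sizes, key=self.popularity.__getitem__):
            if free >= size_mb or self.popularity[other] >= pop:
                break
            victims.append(other)
            free += self.sizes[other]
        if free < size_mb:
            return False, []  # item does not fit; cache left unchanged
        for other in victims:
            del self.sizes[other], self.popularity[other]
        self.sizes[item] = size_mb
        self.popularity[item] = pop
        return True, victims


def items_that_fit(capacity_mb: float, size_mb: float) -> int:
    """How many equal-sized items a cache of the given capacity can hold."""
    return int(capacity_mb // size_mb)


for size in (25, 50, 100, 200):
    print(f"{size} MB items -> {items_that_fit(400, size)} fit in 400 MB")

cache = PopularityCache(400)
print(cache.insert("a", 200, 0.2))  # (True, [])
print(cache.insert("b", 200, 0.5))  # (True, [])
print(cache.insert("c", 200, 0.9))  # (True, ['a'])
print(cache.insert("d", 200, 0.1))  # (False, [])
```

Going from 25 MB to 200 MB items shrinks the number of cached items from 16 to 2, and the arrival of the more popular item c evicts a. If a had not yet reached its requester, the BS would have to resend it over the cellular link, which is the mechanism the text gives for the declining offloaded traffic in Fig. 4(d).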

Figs. 3 and 4 show that SCBC is superior to CC and RC; we now analyze the reasons. First, SCBC selects the paths with higher transmission probability, which reflect a tight relationship between the candidate node and the requester, and routing data along these paths achieves a better delivery ratio. Second, compared with the epidemic routing adopted in CC and RC, SCBC provides fine-grained control over data copies based on the α-path-set, which significantly decreases the replication overhead. Last, for delivery delay and offloading ratio, the caching location should be as close as possible to the requester. In SCBC, the path weight is dynamically estimated by measuring the local contact density, so the probabilistic distance between node pairs can be described quantitatively, which enables fine-grained optimization in the selection of caching nodes. In contrast, random selection and the centrality-based approach, in which node degree is estimated statically, can hardly reflect the relationship between a node pair precisely and thus lead to poor selection. If a caching node is far from a requester, the data has to be routed through many hops and may expire at an intermediate node; the requester then has to download the data directly from the cellular network, which decreases the offloading ratio.

In summary, with its strong performance on all metrics, our scheme is stable and efficient in real offloading scenarios such as road traffic updates [24], social data [25] and streaming content sharing [26], where SCBC can be applied to shift the data traffic to MSN nodes. This performance comes at two main costs: 1) each mobile node has to keep its wireless interface on to periodically detect its neighbor nodes, which consumes energy; 2) the cellular base station has to bear all the computational overhead and maintain a global routing table.
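The analysis above attributes SCBC's advantage to estimating path weights from contact density and to choosing caching nodes close to requesters. The sketch below shows one way these ideas could fit together; it is not the paper's algorithm. It assumes (i) exponentially distributed inter-contact times, with each hop's rate estimated as contacts per observation window (a hypothetical reading of "local contact density"), so the probability that data crosses a multi-hop path before expiry is a hypoexponential CDF; (ii) a candidate "covers" a requester when its best path succeeds with probability at least a threshold alpha (a hypothetical stand-in for the α-path-set, which the paper defines earlier); and (iii) caching nodes are chosen with the textbook greedy set-cover heuristic, as in [21]. contact_rate, path_delivery_prob, coverage_sets and greedy_set_cover are hypothetical names.

```python
import math
from typing import Dict, List, Set, Tuple


def contact_rate(num_contacts: int, window: float) -> float:
    """Hypothetical contact-density estimate: contacts observed per unit time."""
    return num_contacts / window


def path_delivery_prob(rates: List[float], ttl: float) -> float:
    """P(data crosses every hop before `ttl` expires), assuming independent
    exponential inter-contact times with pairwise-distinct rates, i.e. the
    CDF of a hypoexponential distribution evaluated at ttl."""
    if len(set(rates)) != len(rates):
        raise ValueError("closed form requires pairwise-distinct rates")
    survival = 0.0
    for i, li in enumerate(rates):
        coeff = 1.0
        for j, lj in enumerate(rates):
            if j != i:
                coeff *= lj / (lj - li)
        survival += coeff * math.exp(-li * ttl)
    return 1.0 - survival


def coverage_sets(paths: Dict[Tuple[str, str], List[List[float]]],
                  ttl: float, alpha: float) -> Dict[str, Set[str]]:
    """Candidate c covers requester r if its best c->r path succeeds with
    probability >= alpha before expiry (hypothetical coverage rule)."""
    cover: Dict[str, Set[str]] = {}
    for (cand, req), options in paths.items():
        best = max(path_delivery_prob(p, ttl) for p in options)
        cover.setdefault(cand, set())
        if best >= alpha:
            cover[cand].add(req)
    return cover


def greedy_set_cover(cover: Dict[str, Set[str]], requesters: Set[str]) -> List[str]:
    """Greedy heuristic: repeatedly pick the candidate covering the most
    still-uncovered requesters."""
    uncovered = set(requesters)
    chosen: List[str] = []
    while uncovered and cover:
        best = max(cover, key=lambda c: len(cover[c] & uncovered))
        gain = cover[best] & uncovered
        if not gain:
            break  # remaining requesters cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy input (hypothetical): per-hop contact rates of candidate -> requester paths.
paths = {
    ("A", "r1"): [[contact_rate(20, 20.0)]],  # one hop, rate 1.0
    ("A", "r2"): [[0.5, 0.4]],                # two slow hops
    ("B", "r2"): [[1.5]],
    ("B", "r3"): [[0.8]],
    ("C", "r3"): [[2.0]],
}
cover = coverage_sets(paths, ttl=2.0, alpha=0.7)
print(greedy_set_cover(cover, {"r1", "r2", "r3"}))  # ['B', 'A']
```

In this toy input the two-hop path from A to r2 succeeds within the TTL with probability of only about 0.22, illustrating why a distant caching node lets data expire at intermediate hops, whereas B reaches r2 and r3 directly and is selected first.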

6 Conclusions

This paper presents our effort to offload cellular traffic by exploiting opportunistic contacts between mobile users in MSNs. We first quantitatively interpret the offloading revenues on the cellular operator side associated with the scale of caching users, and then develop a centralized caching protocol to maximize the offloading revenues. The trace-driven simulation results show that the proposed caching protocol outperforms previous schemes in the offloading scenario.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61372117).

References

1. Xiao M J, Wu J, Huang L S. Community-aware opportunistic routing in mobile social networks. IEEE Transactions on Computers, 2014, 63(7): 1682−1695
2. Li Y, Qian M J, Jin D P, et al. Multiple mobile data offloading through disruption tolerant networks. IEEE Transactions on Mobile Computing, 2014, 13(7): 1579−1596
3. Sciancalepore V, Giustiniano D, Banchs A, et al. Offloading cellular traffic through opportunistic communications: analysis and optimization. IEEE Journal on Selected Areas in Communications, 2016, 34(1): 122−137
4. Zhuo X J, Gao W, Cao G H, et al. An incentive framework for cellular traffic offloading. IEEE Transactions on Mobile Computing, 2014, 13(3): 541−555
5. Li Z, Liu Y, Zhu H S, et al. Coff: contact-duration-aware cellular traffic offloading over delay tolerant networks. IEEE Transactions on Vehicular Technology, 2015, 64(11): 5257−5268
6. Rebecchi F, Dias de Amorim M, Conan V, et al. Data offloading techniques in cellular networks: a survey. IEEE Communications Surveys & Tutorials, 2015, 17(2): 580−603
7. Hoteit S, Secci S, Pujolle G, et al. Mobile data traffic offloading over passpoint hotspots. Computer Networks: The International Journal of Computer and Telecommunications Networking, 2015, 84(C): 76−93
8. Bulut E, Szymanski B K. WiFi access point deployment for efficient mobile data offloading. Proceedings of the 1st ACM International Workshop on Practical Issues and Applications in Next Generation Wireless Networks (PINGEN’12), Aug 22−26, 2012, Istanbul, Turkey. New York, NY, USA: ACM, 2012: 45−50
9. Lu X F, Pan H, Lio P. Offloading mobile data from cellular networks through peer-to-peer WiFi communication: a subscribe-and-send architecture. China Communications, 2013, 10(6): 35−46
10. Ma Y Z, Jamalipour A. A cooperative cache-based content delivery framework for intermittently connected mobile ad hoc networks. IEEE Transactions on Wireless Communications, 2010, 9(1): 366−373
11. Gao W, Cao G H, Iyengar A, et al. Cooperative caching for efficient data access in disruption tolerant networks. IEEE Transactions on Mobile Computing, 2014, 13(3): 611−625
12. Ying Z, Zhang C, Wang Y. Social based throwbox placement in large-scale throwbox-assisted delay tolerant networks. Proceedings of the 2014 IEEE International Conference on Communications (ICC’14), Jun 10−14, 2014, Sydney, Australia. Piscataway, NJ, USA: IEEE, 2014: 2472−2477
13. Wang Y S, Wu J, Xiao M J. Hierarchical cooperative caching in mobile opportunistic social networks. Proceedings of the 2014 IEEE Global Telecommunications Conference (GLOBECOM’14), Dec 8−12, 2014, Austin, TX, USA. Piscataway, NJ, USA: IEEE, 2014: 411−416
14. Le T, Lu Y, Gerla M. Social caching and content retrieval in disruption tolerant networks (DTNs). Proceedings of the 2015 International Conference on Computing, Networking and Communications (ICNC’15), Feb 16−19, 2015, Garden Grove, CA, USA. Piscataway, NJ, USA: IEEE, 2015: 905−910
15. Zhuo X J, Li Q H, Cao G H, et al. Social-based cooperative caching in DTNs: a contact duration aware approach. Proceedings of the 8th International Conference on Mobile Ad Hoc and Sensor Systems (MASS’11), Oct 17−22, 2011, Valencia, Spain. Piscataway, NJ, USA: IEEE, 2011: 92−101
16. Lindgren A, Doria A, Schelén O. Probabilistic routing in intermittently connected networks. ACM SIGMOBILE Mobile Computing and Communications Review, 2004, 7(3): 19−20
17. Elwhishi A, Ho P H, Naik K, et al. Self-adaptive contention aware routing protocol for intermittently connected mobile networks. IEEE Transactions on Parallel and Distributed Systems, 2013, 24(7): 1422−1435
18. Daly E M, Haahr M. Social network analysis for information flow in disconnected delay-tolerant MANETs. IEEE Transactions on Mobile Computing, 2009, 8(5): 606−621
19. Hui P, Crowcroft J, Yoneki E. BUBBLE Rap: social-based forwarding in delay-tolerant networks. IEEE Transactions on Mobile Computing, 2011, 10(11): 1576−1589
20. Balasubramanian A, Levine B N, Venkataramani A. Replication routing in DTNs: a resource allocation approach. IEEE/ACM Transactions on Networking, 2010, 18(2): 596−609
21. Cormen T H. Introduction to algorithms. 2nd ed. Cambridge, MA, USA: MIT Press, 2001
22. Tie X Z, Venkataramani A, Balasubramanian A. R3: robust replication routing in wireless networks with diverse connectivity characteristics. Proceedings of the 17th Annual International Conference on Mobile Computing and Networking (MobiCom’11), Sept 19−23, 2011, Las Vegas, NV, USA. New York, NY, USA: ACM, 2011: 181−192
23. Grossman T, Wool A. Computational experience with approximation algorithms for the set covering problem. European Journal of Operational Research, 1997, 101(1): 81−92
24. Whitbeck J, Amorim M, Lopez Y, et al. Relieving the wireless infrastructure: when opportunistic networks meet guaranteed delays. Proceedings of the 2011 IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM’11), Jun 20−24, 2011, Lucca, Italy. Piscataway, NJ, USA: IEEE, 2011: 10p
25. Han B, Hui P, Anil Kumar V S, et al. Cellular traffic offloading through opportunistic communications: a case study. Proceedings of the 5th ACM Workshop on Challenged Networks (CHANTS’10), Sept 20−24, 2010, Chicago, IL, USA. New York, NY, USA: ACM, 2010: 31−38
26. Keller L, Le A, Cici B, et al. MicroCast: cooperative video streaming on smartphones. Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services (MobiSys’12), Jun 25−29, 2012, Lake District, UK. New York, NY, USA: ACM, 2012

(Editor: Wang Xuying)