J. Parallel Distrib. Comput.
Contents lists available at SciVerse ScienceDirect. Journal homepage: www.elsevier.com/locate/jpdc
Gossip-based cooperative caching for mobile applications in mobile wireless networks

Xiaopeng Fan a,∗, Jiannong Cao b, Haixia Mao c, Yunhuai Liu a

a Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
b Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
c School of Automotive and Transportation Engineering, Shenzhen Polytechnic, China
Article info

Article history:
Received 31 May 2012
Received in revised form 10 January 2013
Accepted 16 January 2013
Available online xxxx

Keywords:
Cooperative caching
Data relation
Cache placement
Gossip-based protocol
Abstract

Cooperative caching is an efficient way to improve the performance of data access in mobile wireless networks: cache nodes select different data items for their limited storage so as to reduce the total access delay. With growing demand for sharing videos and other data, especially among mobile applications in an Internet-based Mobile Ad Hoc Network, considering the relations among data items in cooperative caching has become more important than before. However, most existing works do not consider the inherent relations among data items, such as logical, temporal, or spatial relations. In this paper, we present a novel solution, Gossip-based Cooperative Caching (GosCC), to address the cache placement problem while taking the sequential relation among data items into account. Each mobile node stores the IDs of the data items cached locally, together with the ID of the data item currently in use, in its progress report, and uses the progress reports of other nodes to decide whether a data item should be cached locally. These progress reports are propagated through the network in a gossip-based way. To improve the user experience, GosCC aims to provide users with an uninterrupted data access service. Simulation results show that GosCC outperforms Benefit-based Data Caching and HybridCache in terms of average interruption interval and average interruption times, at the cost of a certain increase in message overhead. © 2013 Elsevier Inc. All rights reserved.
1. Introduction

In Internet-based Mobile Ad Hoc Networks (IMANETs) [9], mobile users can access the Internet through multi-hop communications, without requiring infrastructure in the users' proximity. An IMANET combines a Mobile Ad Hoc Network (MANET) with the Internet to provide universal information accessibility. With the appearance of more and more mobile applications, such as mobile games, there is a great demand for sharing game data among multiple players. For example, most of the games in the Apple App Store or Android Market offer a multiplayer mode that lets users play together. These users can be connected by Bluetooth, Wi-Fi, or 3G, and can be considered different data sources. However, it is still challenging to improve the efficiency of data access in IMANETs, for the following reasons. First, wireless networks are notorious for contention and channel instability in wireless communication; mobile nodes therefore cannot afford to transmit many packets. Second, due to the
∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (X. Fan), [email protected] (J. Cao), [email protected] (H. Mao), [email protected] (Y. Liu).
0743-7315/$ – see front matter © 2013 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2013.01.006
limited size of the cache space on a mobile device, mobile nodes cannot cache all the frequently accessed data items locally. Third, mobile nodes move freely, so disconnections may occur frequently. Once the network partitions, mobile nodes in one partition cannot access the data items cached in another partition. Data accessibility in IMANETs is thus much lower than in fixed networks. Mobile users may cache data items in a cooperative way to improve the efficiency of data access. Cooperative caching [29,26,15,11,8,25,16,28,20,18] has been proved a very efficient way to improve the performance of data access. A typical cooperative caching strategy works as follows. Data sources transfer data copies to some nodes called cache nodes. Each cache node selects a subset of all the data items to cache, due to its limited cache size. Other nodes can then access data items from cache nodes instead of the data sources. Consequently, access delay decreases because of the service provided by these cache nodes, while message cost increases, and cache consistency must be maintained if data items change. With more demand for sharing videos and other data in IMANETs, the relations among data items become much more important than before. These relations reflect the logical, temporal, or spatial correlations among data items, and user experience depends heavily on them. For example, users prefer to watch a video without interruption, while they are not concerned with how
long it takes to fetch the requested segments of the video, unless fetching a segment delays its timely use. In some environments, users can also establish the correlations among data items themselves. For example, different groups of users in the same mobile game are concerned with different sets of data items according to their preferences. However, existing works on cooperative caching [29,26,15,11,8,25,16,28,20,18] do not consider the inherent relations among these data items. If these relations were considered in cooperative caching, the user experience could be improved considerably, though this requires more cooperation among users. Because most of the existing works aim to minimize total access delay, it is necessary to introduce new metrics that are more useful than total access delay for mobile applications that share data items with inherent relations, such as mobile games. In this paper, we consider two new metrics: (1) Average Interruption Interval (AII) and (2) Average Interruption Times (AIT). Access delay is defined as the interval between the time when a user sends out a data request and the time when it receives the reply. An interruption interval is the interval between the time when a user finishes using one data item and the time when it begins to use the next one in the sequential order. If a user fetches the next data item from its own cache, the delay is zero; thus, a user does not care how long a data item takes to fetch or where it comes from, as long as it arrives in time. For this reason, existing solutions cannot optimize total interruption intervals. In this paper, the Average Interruption Interval, the average of interruption intervals over all nodes, and the Average Interruption Times, the average number of interruptions experienced by mobile users in using all the data items, are introduced to evaluate the performance of cooperative caching for mobile applications in IMANETs.
We propose a solution to the cache placement problem for the case in which mobile users access all the data items in a sequential order. Our objective is to minimize the total interruption intervals for all the mobile users across all the data items. We name the proposed strategy Gossip-based Cooperative Caching (GosCC). Our algorithm introduces the progress report, which describes the progress of each node in consuming data items and the contents of the cache at each node. These progress reports are exchanged by gossiping [10,24,3,17], which to our knowledge is applied to cooperative caching for the first time. Gossiping is a suitable way to disseminate information in IMANETs. Our caching decision is based on these progress reports and constrained by the limited cache sizes at mobile nodes. GosCC has at least three advantages: (1) the proposed strategy is reliable in a mobile wireless environment, with a probabilistic reliability model provided by our gossip-based protocol; (2) GosCC implements cooperative caching in a randomized way with good load balancing; and (3) GosCC is independent of the underlying mobility model. The rest of the paper is organized as follows: Section 2 describes the system model and the problem formulation. In Section 3, we present the design of our algorithms. In Section 4, we give a theoretical analysis of the parameters in GosCC. Simulation results are reported in Section 5. Section 6 discusses related work. Finally, Section 7 concludes this paper.

2. Problem formulation

In this section, we first introduce the system model for our work. Then we formulate the cache placement problem in mobile wireless networks as the Sequential Relation Cache Placement problem (SRCP).

2.1. System model

Let G(V, E) be a graph representing an IMANET with n (n = |V|) nodes, V = {MN1, MN2, ..., MNn} (see Table 1). MNi is the i-th
Table 1. Notations used in the system model.

V: The set of mobile nodes in the network
E: The set of links in the network
n: The number of nodes
m: The number of data items
MNi: The i-th node in the network
D: The set of data items
di: The i-th data item
v: The version of di in our consistency model
Sd: The size of each data item
S: The size of the cache at each node for a cooperative caching system
SRi: The data source of the i-th data item
mobile node. An edge of the graph connects two mobile nodes that communicate directly with each other. Mobile users aim to access m data items D = {d1, d2, ..., dm} by following the sequential order from d1 to dm, such as the segments of a video file or a series of data items with timestamps. Let Sd be the size of each data item. Let S be the size of the cache at each node. Each data item is maintained by its data source, such as a gateway node (GW) or a game player. We define SRi as the data source of di. We assume that gateway nodes always maintain the most updated copies of the data items they cache from the Internet. Each data item di can be described as di = d(i, v), where the version v is a timestamp that distinguishes the different copies of di.

2.2. Problem formulation

Our objective is to determine which data items should be cached at which mobile nodes so that total interruption intervals are minimized. We name the proposed problem the Sequential Relation Cache Placement problem (SRCP). The interruption interval for data item dj at MNi is defined as the interval t(i, j) between the time when MNi finishes using data item dj−1 and the time when MNi begins to use data item dj in the sequential order (see Table 2). As mentioned in the Introduction, user experience is our major concern once the sequential data relation is introduced into cooperative caching. SRCP is subject to the limited cache sizes at individual nodes. The cache placement problem is to select a set of sets of cache nodes M = {M1, M2, M3, ..., Mm}, where each mobile node in Mj stores a copy of dj, to minimize the total interruption intervals T(G, M) as follows:

\[
\min \; T(G, M) = \sum_{i=1}^{n} \sum_{j=1}^{m} t(i, j) \tag{1}
\]
\[
\text{s.t.} \quad |\{M_j \mid MN_i \in M_j\}| \le \lfloor S/S_d \rfloor,
\]

i.e., MNi appears in at most ⌊S/Sd⌋ sets of M. Total interruption intervals should be smaller than total access delay, because a portion of the total access delay disappears whenever the access delay does not affect the use of a data item. However, finding the proper locations for cache copies is still the key problem, as in the facility location problem [1,2]; the essence of the problem is unchanged. We aim to minimize total interruption intervals by making mobile nodes access as many data items as possible from their local caches. But mobile nodes also need to consider the progress of other nodes in consuming data items and cooperate with each other, in order to reduce the probability that a mobile node cannot access the data items continuously, i.e., that its process of consuming data items is interrupted. In SRCP, t(i, j) is related to two parameters: the data item size Sd and the speed vu at which a mobile user consumes data items. Let tmax be the maximum access delay for fetching a data item. t(i, j) will be zero if mobile node MNi fetches data item dj subject to Eq. (2):

\[
S \ge S_d \ge v_u \times t_{max}. \tag{2}
\]
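As a concrete illustration of the objective and constraint above, the following sketch (our own code, not from the paper) evaluates Eq. (1) for a candidate placement and checks the cache-size constraint; the interval function `t` and the toy values are hypothetical.

```python
# Illustrative sketch: evaluating the SRCP objective T(G, M) of Eq. (1)
# and its cache-size constraint. A placement is given as a list M, where
# M[j] is the set of node indices caching data item j+1.

from math import floor

def total_interruption(t, n, m):
    """Eq. (1): sum of interruption intervals t(i, j) over all nodes/items."""
    return sum(t(i, j) for i in range(1, n + 1) for j in range(1, m + 1))

def placement_feasible(M, n, S, S_d):
    """Constraint: each node appears in at most floor(S / S_d) sets of M."""
    capacity = floor(S / S_d)
    return all(sum(1 for Mj in M if i in Mj) <= capacity
               for i in range(1, n + 1))

# Toy example: 2 nodes, 3 items; every item is ready in time except
# item 3 at node 2, which arrives 0.5 s late.
t = lambda i, j: 0.5 if (i, j) == (2, 3) else 0.0
print(total_interruption(t, n=2, m=3))                          # 0.5
print(placement_feasible([{1}, {2}, {1, 2}], n=2, S=4, S_d=2))  # True
```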
Table 2. Notations used in problem formulation.

t(i, j): The interval between the time when MNi finishes using data item dj−1 and the time when MNi begins to use data item dj in the sequential order
M: A set of sets of cache nodes
Mi: The set of cache nodes that store a copy of di
vu: The speed at which a mobile user consumes data items
tmax: The maximum access delay for fetching a data item
N: All the locations in the uncapacitated facility location problem
F: All the locations that can open a facility in the uncapacitated facility location problem
ci: The cost of opening a facility at location i
dmj: The facility demand at location j
O: The set of open facility locations
cij: The cost of shipping a unit of demand from i to j
SRCP is an instance of the cache placement problem [1], which is known to be NP-hard. We consider uniform-length data items in SRCP, and the cache placement problem with uniform-length data items has been proved MAXSNP-hard [1].

Theorem 1. The computation of SRCP is MAXSNP-hard.

Proof. The network topology of an IMANET is dynamic due to node mobility. At any time t, the metric uncapacitated facility location problem (UFL) can be described as follows. We are given a set N = {1, ..., n} of n locations and a subset F ⊆ N at which we may open a facility. For each location i in F, there is a cost ci for opening a facility at i. Each location j in N has demand dmj. Given a set O of open facility locations, the demand at any location j in N is served by the location in O nearest to j. For any two locations i and j, there is a cost cij of shipping a unit of demand from i to j. These costs form a metric. The objective in UFL is to determine a set of open facilities such that the total facility and shipping cost is minimized. According to Theorem 1 in [1], the data placement problem with uniform-length objects is MAXSNP-hard. In SRCP, each mobile node accesses each data item exactly once in sequential order, and all data items have the same size. Thus the demand of each mobile node for each data item is a constant. If we regard the demand at each node in SRCP as the unit of demand in the data placement problem with uniform-length objects, the total facility cost does not change, and the total shipping cost differs only by a constant coefficient applied to the distance between the demanding node and the cache node. The total facility and shipping cost still needs to be minimized. SRCP is therefore an instance of the cache placement problem with uniform-length data items, and hence SRCP is also MAXSNP-hard.

3. Gossip-based cooperative caching

In this section, we first present the mechanism for maintaining cache consistency in an IMANET for our caching system. Then we propose three heuristic rules for SRCP. Finally, we describe GosCC in detail, with its key data structures and the proposed algorithm.

3.1. Cache consistency maintenance

In our caching system, data sources are responsible for providing the data items to all the mobile users. Each data source maintains consistency among all its cache nodes. In this paper, we apply a TTL-based cache consistency strategy to maintain delta consistency among all the cache nodes [6,27,30,31]. In the proposed TTL-based caching strategy, each data copy is associated with an expiration time Tttl, i.e., its Time-To-Live (TTL) value. If a query arrives before the expiration time, the copy is considered valid and the reply message is generated locally. Otherwise, the copy is considered invalid until the next update from the data source, and the query should be forwarded to the data source or to another cache node that holds a valid cache copy.
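The TTL-based validity decision just described can be sketched as follows. This is our own minimal illustration (the class and field names are assumptions, not the paper's): a cached copy answers queries while its age is below Tttl, and an expired copy forces the query to be forwarded.

```python
import time

# Illustrative sketch of the TTL-based validity check: a cached copy is
# valid while its age is below its Tttl; otherwise the query must be
# forwarded to the data source or another node holding a valid copy.

class CacheEntry:
    def __init__(self, data_id, version, fetched_at, ttl):
        self.data_id = data_id
        self.version = version      # source version at synchronization time
        self.fetched_at = fetched_at
        self.ttl = ttl              # Tttl assigned by the data source

    def is_valid(self, now=None):
        now = time.time() if now is None else now
        return (now - self.fetched_at) < self.ttl

entry = CacheEntry(data_id=7, version=3, fetched_at=100.0, ttl=30.0)
print(entry.is_valid(now=120.0))  # True: age 20 s < TTL 30 s, answer locally
print(entry.is_valid(now=140.0))  # False: expired, forward the query
```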
There are several reasons why we use a TTL-based cache consistency strategy in an IMANET. Firstly, in a TTL-based strategy, there is no need for the data source to keep track of the locations of cache nodes. The strategy is therefore resilient to mobile nodes joining or leaving the set of cache nodes, and thus suitable for highly dynamic systems. Secondly, each node determines the validity of its cache copy autonomously, so the validation of expired cache copies can be performed with either the data source or other mobile nodes holding valid copies. Thirdly, the proposed TTL-based strategy guarantees that the staleness of the requested data item is time-bounded. It offers the flexibility to use different TTL values to cater for different consistency requirements, which may vary with the nature of the shared contents and user tolerance. Next, we define the delta consistency model as follows. Let S^t denote the version number of the source data and C_j^t the version number of the cached copy on cache node MNj at time t. Initially, the version number of the source data is set to zero; it is then increased by one upon each subsequent update. The version number of a cache copy is set to that of the source data at the time it is synchronized. Thus, delta consistency [6] can be defined as follows:
\[
\forall t, \forall j, \exists \tau, \ 0 \le \tau \le \delta, \quad \text{s.t.} \ S^{t-\tau} = C_j^t. \tag{3}
\]
3.2. Three heuristic rules in GosCC

In this section, we present three heuristic rules for SRCP. The three heuristic rules are driven by our objective of reducing total interruption intervals as much as possible. Before we explain the details of the three heuristic rules, all the notations used in this section are listed in Table 3. Three terms should be defined clearly before we state the rules. The first term is the "Buffer Window" (BW), defined as the number of cached data items in the buffer at each node. These cached data items form an array of data items that follow the one currently being used by a mobile node. The second term is the "Gossip Report" (GR). In GosCC, we apply gossiping as the way to exchange caching information among mobile nodes. A GR contains two kinds of digest: the cache digest and the progress digest. The cache digest describes the cache content of a mobile node at an instant t. Let C_DIGEST(i, t) be the cache digest of MNi at time t. C_DIGEST(i, t) is a vector of m bits, in which the j-th bit denotes whether dj is cached at MNi. The progress digest is described as follows. Let P_DIGEST(i, t) be the progress digest of MNi at time t in a GR. P_DIGEST(i, t) consists of n data item IDs and the corresponding timestamps, where the k-th data ID is the data ID currently consumed by mobile node MNk, as known by MNi at time t. The detailed structure of a gossip report is shown in Fig. 1. The third term is the "fanout" of gossiping, fi, defined as the number of gossiping targets for MNi. In GosCC, each mobile
Table 3. Notations used in heuristic rules.

BW: Buffer window, the number of data items in the buffer
GR: Gossip report describing the cache content at each node
C_DIGEST(i, t): The cache digest of MNi at time t in a GR
P_DIGEST(i, t): The progress digest of MNi at time t in a GR
fi: The fanout, the number of gossiping targets for MNi
p(i, j): The priority index for data item dj at mobile node MNi
MAX_ID: The maximum ID among all the data items in a progress digest
MIN_ID: The minimum ID among all the data items in a progress digest
VAR_RANGE: The variation range of a progress digest
THr: The threshold to evaluate the range level of a progress digest
BNS: Buffering-Next Scheme
Fig. 1. Gossip report.
node periodically gossips its own gossip reports to fi randomly selected nodes. In C_DIGEST, MNi describes its own cache content to other nodes, so that other mobile nodes can send their data requests to the closest cache node holding the desired data items. In P_DIGEST, MNi stores only the IDs of the data items currently consumed by each node, paired with the timestamp of the latest update. Each mobile node is only allowed to update its own currently used data item in P_DIGEST. To control redundant gossip messages, each gossip report should not be forwarded for more than a specified number of gossiping rounds R, i.e., each report is gossiped for at most R rounds. We will discuss these parameters in Section 4.

Rule 1: The next data item to be used by a mobile node is always the most important one.

GosCC applies the Buffering-Next Scheme (BNS) to make interruption intervals as small as possible. BNS is designed to collect the set of data items in the buffer window of a mobile node in a timely way. To minimize total interruption intervals, keeping the continuity with which users consume data items is the most important problem. Mobile nodes adopt a prefetching scheme that keeps the buffer in a state where the next data item can be fetched from the nearest cache node. Additionally, the length of the buffer is important; we discuss it in Section 4. In BNS, we classify all the data items for each node into different access priorities. Data item dj at mobile node MNi is assigned a priority index p(i, j). If dj is the next data item in the buffer window of MNi, we set p(i, j) to 1; if dj is not the next one but is in the buffer window of MNi, we set p(i, j) to 2; and if dj is not in the buffer window of MNi, we set p(i, j) to 3. In BNS, each mobile node checks its progress periodically and sends "active" cache requests with the corresponding priorities.
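The priority assignment of Rule 1 can be sketched as follows. This is our own formulation of the rule as stated above, not the paper's code:

```python
# Illustrative sketch: the BNS priority index p(i, j) of Rule 1. The buffer
# window holds the items immediately following the one currently in use;
# the very next item gets the highest priority.

def priority_index(j, current_id, bw):
    """Return p(i, j) for data item j at a node currently using current_id,
    with a buffer window of bw items after the current one."""
    if j == current_id + 1:
        return 1  # the next data item: most important
    if current_id + 1 < j <= current_id + bw:
        return 2  # inside the buffer window, but not next
    return 3      # outside the buffer window

# A node currently consuming item 5 with BW = 3:
print([priority_index(j, current_id=5, bw=3) for j in range(6, 11)])
# [1, 2, 2, 3, 3]
```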
The maximum number of cache requests in one round cannot exceed the maximum number of data items that fit in the node's available cache space.

Rule 2: When there is still space available to cache more data items, mobile nodes investigate the progress digest and send cache requests for the data item that will be the most popular in the future.

In Rule 1, each mobile node classifies all the data items into three levels according to its own progress. In Rule 2, mobile nodes need a cooperative way to minimize total interruption intervals. The length of the buffer is related to the speed at which a user consumes data items. If there is still available space except
for the buffer, GosCC should cache some popular data items on behalf of other mobile nodes. In GosCC, the progress digest provides mobile nodes with a global overview of the progress of all mobile nodes in using data items. MNi checks the progress digest in the latest gossip report it received. If data item dj has appeared the most times and there is still space available at MNi, MNi sends a cache request for data item dj+BW to the corresponding closest cache node.

Rule 3: When cache replacement is required, each data item whose ID is smaller than the minimum data ID in the progress digest should be replaced.

The cache size at each mobile node is limited. To improve the utilization of the cache space, we have to replace some data items to make room for more useful ones. In GosCC, we take a conservative strategy. The minimum ID in the progress report implies that no data item with a smaller ID will ever be used again by any node, so it is safe for each node to remove those data items. Once MNi receives a gossip report from MNj, MNi updates the progress digest and forwards the report to randomly selected mobile nodes if the report is still within its lifetime. MNi also checks its own cache content. If MNi finds data items whose IDs are smaller than the minimum data ID in the received progress report, MNi removes these data items from its own cache. These data items are useless because MNi now knows the progress of the "slowest" mobile node. In a gossip report, there are two important data item IDs: the maximum one, MAX_ID, and the minimum one, MIN_ID. A mobile node with a higher currently used data ID consumes data items at a higher speed, and vice versa. Let VAR_RANGE be the variation range of a progress digest, obtained by Eq. (4):

\[
VAR\_RANGE = MAX\_ID - MIN\_ID. \tag{4}
\]
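The replacement rule and the variation range defined in Eq. (4) can be sketched as follows. This is our own minimal illustration; the function names and toy progress values are assumptions:

```python
# Illustrative sketch: Rule 3's conservative replacement and the variation
# range of Eq. (4). A cached item with an ID below MIN_ID is behind even
# the slowest consumer, so it can be evicted safely.

def evict_stale(cache_ids, progress_digest):
    """Drop cached item IDs below the slowest node's current item."""
    min_id = min(progress_digest.values())
    return {d for d in cache_ids if d >= min_id}

def var_range(progress_digest):
    """Eq. (4): VAR_RANGE = MAX_ID - MIN_ID."""
    return max(progress_digest.values()) - min(progress_digest.values())

# Nodes 1-3 are currently consuming items 4, 6, and 9:
progress = {1: 4, 2: 6, 3: 9}
print(evict_stale({2, 3, 5, 7}, progress))  # {5, 7}: items 2 and 3 evicted
print(var_range(progress))                  # 5
```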
Obviously, the best case for VAR_RANGE is zero, which means that the only data item all mobile nodes need to cache is the next one. On the other hand, if the variation range is very large (VAR_RANGE close to m − 1), the range of data items that the mobile nodes need to cache spans from 1 to m. Therefore, we propose a threshold, the range threshold THr. If mobile node MNi detects that the variation range VAR_RANGE is greater than THr, MNi actively sends the data items that will be used next by the mobile node with MIN_ID.

3.3. Gossip-based cooperative caching

In this section, we describe GosCC in detail. GosCC consists of two important components: the key data structures and the detailed Algorithm 1 (see Table 4).

(1) Key Data Structures

Closest cache tables are the most important data structure in GosCC. Mobile node MNi is concerned with the closest cache node MNj that holds a copy, for
Table 4. Notations used in Algorithm 1.

Tg: Gossiping timer; each node gossips every Tg seconds
GossipReport(MNi, TS, GR): Gossip report sent by each node periodically, including C_DIGEST and P_DIGEST in Table 3
fi: The number of gossiping targets for mobile node MNi
Tr: Cache request timer; each node sends cache copy requests periodically
CacheRequest: Cache copy requests sent by each node according to the priority index
CacheUpdate(SRj, dj, version): Cache copy updates sent by the corresponding data source SRj
ClosestCacheTable: Each node maintains a table of the closest cache node for each data item
ProgressTable: Each node stores the progress information of the other mobile nodes
RC: Round counter attached to a gossip report, indicating how many times the report has been forwarded
R: The maximum number of rounds each gossip report is forwarded
CacheReply(MNi, dk): Cache copy dk sent by MNi to the requestor
x: The data ID that appears the maximum number of times in a gossip report
Tttl: The TTL (time-to-live) value attached to the cache copy sent by the data source
each data item dk in the network. Therefore, MNi maintains a closest_cache table, in which an entry has the form (dk, MNj), where MNj is the closest cache node known by MNi to hold a copy of dk. Note that if MNi is the data source or has cached dk itself, MNi is the closest cache node for dk.
The progress table is another important data structure maintained by each mobile node. In the progress table, mobile node MNi stores the progress information of other mobile nodes, obtained when MNi receives their progress digests. An entry in the progress table has the form (MNk, Current_ID, Timestamp).
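The two per-node tables can be sketched as follows. This is our own minimal illustration (the names and the merge policy shown are assumptions; in particular, we keep the first known cache node rather than computing hop distances):

```python
# Illustrative sketch of the two per-node tables: closest_cache maps a data
# item ID to the nearest known cache node holding it; progress maps a node
# ID to (currently used data item ID, timestamp).

closest_cache = {}   # d_k -> MN_j, closest node known to cache d_k
progress = {}        # MN_k -> (current_id, timestamp)

def update_from_report(sender, cache_digest, progress_digest):
    """Merge a received gossip report into the local tables."""
    for d_k, cached in enumerate(cache_digest, start=1):
        if cached:
            closest_cache.setdefault(d_k, sender)  # keep first known holder
    for node, (current_id, ts) in progress_digest.items():
        # Accept only fresher progress information.
        if node not in progress or ts > progress[node][1]:
            progress[node] = (current_id, ts)

# Node 2 reports caching items 1 and 3, and currently consuming item 1:
update_from_report(sender=2, cache_digest=[1, 0, 1],
                   progress_digest={2: (1, 10.0)})
print(closest_cache)  # {1: 2, 3: 2}
print(progress)       # {2: (1, 10.0)}
```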
(2) Gossip-based Cooperative Caching Algorithm

GosCC is mainly composed of four procedures: gossiping, sending cache requests, cache replacement, and maintaining cache consistency. The details are as follows. Every Tg seconds, MNi builds a GossipReport(MNi, TS, GR) message, where TS is the timestamp and GR is its current gossip report. MNi selects fi mobile nodes as its gossiping targets and sends the GossipReport(MNi, TS, GR) message to these targets. The fanout fi follows the Poisson distribution [14,22]. If MNj is one of MNi's gossiping targets, MNj receives GossipReport(MNi, TS, GR) and updates its ClosestCacheTable and ProgressTable. If the round counter attached to GR is smaller than the maximum round R, MNj forwards the GossipReport(MNi, TS, GR) to fj gossiping targets. The gossip operation finishes only when the round counter in the GR is equal to or greater than R. Every Tr seconds, mobile node MNi carries out BNS, builds several CacheRequest messages, and sends them to the corresponding closest cache nodes from MNi's ClosestCacheTable. Note that the maximum number of cache requests should be smaller than the maximum number of data items that can be cached in the available cache space. Once MNi receives GossipReport(MNj, TS, GR) from MNj, two operations follow. Firstly, MNi calculates the variation range VAR_RANGE and checks whether it is greater than THr. If so, MNi selects the mobile node with MIN_ID as the target to help; if MNi holds a copy of the next data item dk that this node may require, MNi sends a CacheReply message carrying dk to it. Secondly, MNi checks the progress digest in GossipReport(MNj, TS, GR). If data ID x has appeared the maximum number of times and there is still space available at MNi, MNi sends a CacheRequest message for data item dx+BW to its closest cache node.
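The dissemination procedure above can be sketched as a round-based simulation. This is our own simplification: we use a fixed fanout for clarity, whereas the paper draws the fanout from a Poisson distribution, and we fold the round counter into the loop bound.

```python
import random

# Illustrative sketch: one gossip dissemination under the infect-and-die
# model. Each node forwards the report to `fanout` random peers the first
# time it receives it; the report's round counter is capped at R.

def gossip(nodes, source, fanout, R, rng=None):
    """Return the set of nodes that receive the report within R rounds."""
    rng = rng or random.Random(42)
    received = {source}
    frontier = [source]
    for _ in range(R):
        next_frontier = []
        for node in frontier:
            peers = [n for n in nodes if n != node]
            for t in rng.sample(peers, min(fanout, len(peers))):
                if t not in received:   # infect-and-die: forward once only
                    received.add(t)
                    next_frontier.append(t)
        frontier = next_frontier
    return received

nodes = list(range(50))
reached = gossip(nodes, source=0, fanout=3, R=5)
print(len(reached) / len(nodes))  # typically close to 1.0 for f = 3, R = 5
```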
When the cache space is full and there are still other data items that need to be cached at MNi, MNi removes all the data items with IDs smaller than MIN_ID in its progress table. If there are no such data items, MNi removes the data items with the oldest timestamps. When the data source SRj finds that dj has changed, SRj sends a CacheUpdate message to its cache nodes. If MNi is one of the cache nodes of dj, it receives the CacheUpdate, updates its copy of dj, and sets the TTL Tttl.

4. Parameter analysis

In this section, we analyze the parameters of GosCC (see Table 5). Firstly, we give a brief introduction to Gossip-based Algorithms (GAs) and point out their key parameters. Secondly, we focus on two problems in GosCC: (1) what percentage of mobile nodes eventually receive a gossip message; and (2) how long it takes for a gossip message to be propagated to all the mobile nodes. Thirdly, we evaluate the probability that a Use-in-Time event happens, i.e., the probability that a data item can be accessed from the local cache when it is needed. Gossip-based Algorithms, i.e., epidemic algorithms, have been well studied in recent years [12]. Such an algorithm is considered a potentially efficient solution for disseminating information in large-scale systems [22,12]. Gossip-based Algorithms mimic the spread of a contagious disease. In addition to their inherent scalability, gossip-based algorithms are very easy to deploy, robust, and resilient to failures. It is possible to adjust the parameters of a gossip-based algorithm to achieve high reliability despite process crashes and disconnections, packet losses, and a dynamic network topology [7,19]. Gossiping [10,24,3,17] is well matched to the needs of ad hoc networks because it is a controlled form of flooding, i.e., messages are propagated through the whole network without
Fig. 2. Relation between gossiping reliability Rg and mean fanout f .
congesting the wireless medium. Gossiping is also independent of network topology. Gossip-based multicast protocols rely on a peerto-peer interaction model for multicasting a message, and they are scalable since the load is distributed among all participating nodes. Redundant messages are used to achieve reliability and fault tolerance. In our analysis, we apply the infect-and-die model [21,4]. Each node gossips the same gossip message exactly once, i.e., each node forwards the gossip message to its gossiping targets only for the first time when receiving the gossip message, even if the node maybe receives the copies of the message from other nodes again. The fact that a healthy individual is infected by an infectious, means a mobile node receives a gossip message from another mobile node successfully. Basically, each node forwards the gossip message with a limited number of times R. Each mobile node forwards the gossip message each time to a randomly selected set of mobile nodes of limited size f , the mean fanout of the dissemination. To simplify our analysis, the fanout f is the expectation of a Poisson distribution Po(f ) [14,22]. In this model, we mainly focus on two problems. The first one is what percentage of mobile nodes eventually receives a gossip message, i.e., the reliability of gossiping. The second one is how long a gossip message can be propagated to all the mobile nodes, i.e., how many rounds R is required to make all nodes receive a gossip message. In the first problem, let msg be a gossip message and let Rg be the proportion of mobile nodes that received msg after one execution of our gossiping algorithm, i.e., the probability that the message msg can be received by a mobile node after one execution of our gossiping algorithm. It has been proved that Rg can be calculated from the following equation [14,22]. Rg = 1 − e−fRg .
(5)
To explain the relation between Rg and f, we can transform Eq. (5) into Eq. (6):

f = ln(1/(1 − Rg))/Rg.
(6)
As Fig. 2 shows, increasing the mean fanout f dramatically improves the reliability of gossiping Rg. To make the reliability reach 99.99%, the mean fanout must be increased to a much greater value, such as 10 [14]. However, the message cost must also be considered, to balance the tradeoff between reliability and message cost.
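Eq. (6) makes this tradeoff concrete. The short sketch below (our own illustration, not the authors' code) reproduces the fanout of roughly 10 quoted above for 99.99% reliability:

```python
import math

def fanout_for_reliability(Rg):
    """Eq. (6): f = ln(1/(1 - Rg)) / Rg, the mean fanout needed for
    gossiping reliability Rg (a rearrangement of Eq. (5))."""
    return math.log(1.0 / (1.0 - Rg)) / Rg

f = fanout_for_reliability(0.9999)        # ~9.21, i.e. a fanout near 10
# Sanity check: Rg = 1 - e^(-f*Rg) from Eq. (5) holds at this fixed point.
assert abs((1.0 - math.exp(-f * 0.9999)) - 0.9999) < 1e-9
print(round(f, 2))
```

For comparison, a reliability of 99% already requires a fanout of only about 4.65, which illustrates how quickly the required fanout grows with the last few "nines" of reliability.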
Table 5
Notations used in parameter analysis.

R: The rounds of gossiping, i.e., the maximum number of times each node gossips a message
f: The mean fanout, the average number of gossiping targets for all the mobile nodes
Po(f): A Poisson distribution with expectation f
msg: A gossip message
Rg: The proportion of mobile nodes that received msg
UIT(dk, MNi): The Use-in-Time event that MNi uses the data item dk without any interruption interval
td: The time for a mobile node to consume a data item
vu: The speed at which a mobile user consumes data items
h: The average length of the shortest path from MNi to the closest cache node that holds a copy of dj
thop: The delay of each hop
X: The number of executions in which a mobile node receives the gossiping message msg during R rounds
PR: The probability that a mobile node receives the gossiping message msg at least once within R rounds
Tr: The timer for sending cache requests
Tg: The gossip timer
Ttotal: The total time of running GosCC
Mg: The total number of messages generated by gossiping
The second problem is the latency of infection, which evaluates the speed of gossiping. Bollobás [4] shows that the number of rounds R necessary to infect all of the nodes is given by Eq. (7):

R = ln(n)/ln(ln(n)) + O(1).    (7)

However, Eq. (7) is only a theoretical result. In practice, every intermediate node on the path from the sender to the receiver learns the content of the gossip message, so we consider R in Eq. (7) only as an upper bound. As Fig. 3 shows, the necessary number of rounds R for 100 nodes is almost 3.

Fig. 3. Relation between rounds R and number of nodes n.

Next, we evaluate the probability that a Use-in-Time event happens. Let dk be one of the data items. We define UIT(dk, MNi) as the Use-in-Time event in which MNi uses the data item dk without any interruption interval. Let td be the time a mobile node needs to consume a data item, and let vu be the speed at which a mobile user consumes data items, so that td = Sd/vu. The interruption interval t(i, j) is zero if MNi accesses data item dj from its own cache. If dj is not in the local cache, MNi must obtain a copy from other nodes, including the data source SRj. Let h be the average length of the shortest path from MNi to the closest cache node that holds a copy of dj. We assume that each hop delay is the same, denoted by thop. The premise for a successful retrieval is that each mobile node knows the cache contents of the other nodes, which is guaranteed by the gossiping reliability Rg in a probabilistic way. Therefore, total interruption intervals T(G, M) can be calculated as follows:

T(G, M) = Σ(i=1..n) Σ(j=1..m) (1 − P(UIT(dj, MNi))) × thop × h × Rg.    (8)

Our objective is to minimize total interruption intervals under the constraint of cache sizes. If the event UIT(dk, MNi) happens, dk must be cached before all the data items in the buffer window BW are consumed completely. However, the occurrence of UIT(dk, MNi) presumes that the gossip message msg has been received by all the mobile nodes within R rounds. If a gossip message is gossiped for only one round, we say that the gossiping algorithm has been executed once. Repeated executions are independent, so each execution can be viewed as one Bernoulli trial [14], and R rounds of executions can be considered as R Bernoulli trials. We define X as the number of executions in which a mobile node receives the gossiping message msg during the R rounds; we do not count how many times a node receives msg within one execution. Therefore, X follows a Binomial distribution B(R, Rg):

P(X = k) = C(R, k) × (Rg)^k × (1 − Rg)^(R−k),    k = 0, 1, 2, . . . , R.    (9)

We denote by PR the probability that a mobile node receives the gossiping message msg at least once within R rounds. Therefore, PR can be obtained from the following equation:

PR = P(X ≥ 1) = 1 − (1 − Rg)^R.    (10)
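Eqs. (7) and (10) are easy to check numerically. The sketch below is our own illustration (the O(1) term in Eq. (7) is dropped, so the bound is indicative only):

```python
import math

def rounds_upper_bound(n):
    """Eq. (7) without its O(1) term: R = ln(n) / ln(ln(n)), an indicative
    upper bound on the rounds needed to infect all n nodes."""
    return math.log(n) / math.log(math.log(n))

def p_receive(R, Rg):
    """Eq. (10): PR = 1 - (1 - Rg)**R, the probability that a node receives
    msg at least once in R independent executions."""
    return 1.0 - (1.0 - Rg) ** R

print(round(rounds_upper_bound(100), 2))   # ~3.02, matching Fig. 3
print(round(p_receive(3, 0.9), 3))         # 0.999
```

Even a modest per-execution reliability Rg compounds quickly: with Rg = 0.9, three rounds already push PR to 99.9%.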
Finally, the following condition in Eq. (11) should be satisfied when we calculate total interruption intervals in Eq. (8):

S/Sd ≥ BW ≥ 2h × thop × (1 − (1 − Rg)^R)/td.    (11)
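Condition (11) can be evaluated directly. The function below is our own sketch; the numerical values are illustrative only, not taken from the paper's experiments:

```python
def buffer_window_feasible(S, Sd, BW, h, t_hop, Rg, R, td):
    """Condition (11): S/Sd >= BW >= 2*h*t_hop*(1 - (1 - Rg)**R) / td.
    Parameter names follow Table 5; the function itself is our sketch."""
    lower = 2 * h * t_hop * (1 - (1 - Rg) ** R) / td
    return S / Sd >= BW >= lower

# Illustrative values: 750 kB cache, 1000-byte items, td = Sd/vu = 2.5 s,
# 3-hop paths with 50 ms per hop, Rg = 0.9 over R = 3 rounds.
print(buffer_window_feasible(S=750_000, Sd=1000, BW=10,
                             h=3, t_hop=0.05, Rg=0.9, R=3, td=2.5))
```

The upper bound S/Sd simply says the buffer window cannot exceed the number of items the cache can hold; the lower bound ensures the window covers the expected round-trip retrieval time relative to the consumption time td.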
However, a greater buffer window BW does not necessarily yield better performance. Since the number of cache requests sent by a mobile node depends on the size of the buffer window, we must also consider the message cost: there is a trade-off between total interruption intervals and total message cost. Therefore, we set the timer Tr for sending cache requests as follows:

Tr = (BW × Sd)/vu.    (12)
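Eq. (12) paces cache requests to the consumption speed. A one-line sketch (our own; the buffer window of 10 items is an assumed value):

```python
def request_timer(BW, Sd, vu):
    """Eq. (12): Tr = BW * Sd / vu -- the time a user needs to consume one
    buffer window of data, so requests are paced to the consumption speed."""
    return BW * Sd / vu

# With the paper's default item size (Sd = 1000 bytes) and consuming speed
# (vu = 400 bytes/s), an assumed window of 10 items gives a 25 s timer.
print(request_timer(BW=10, Sd=1000, vu=400))   # 25.0
```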
To determine the value of the gossip timer Tg , the number of messages caused by gossiping should be considered. Let Ttotal be the total time of running our proposed GosCC. Let Mg be the total
number of messages generated by gossiping. We can obtain Mg from Eq. (13):

Mg = (Ttotal/Tg) × n × f × R × h.    (13)
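Eq. (13) can be sketched as below. Note that reading the printed "f R" as the product f × R (each of the n nodes gossips to f targets for R rounds, every Tg seconds, over paths of h hops on average) is our interpretation of the formula, and all numbers are illustrative:

```python
def gossip_message_cost(T_total, Tg, n, f, R, h):
    """Eq. (13): Mg = (T_total / Tg) * n * f * R * h. Every Tg seconds each
    of the n nodes gossips to f targets for R rounds, and each transmission
    traverses h hops on average."""
    return (T_total / Tg) * n * f * R * h

# Illustrative values: a 1000 s run, Tg = 10 s, 100 nodes, fanout f = 5,
# R = 3 rounds, average path length h = 3 hops.
print(gossip_message_cost(T_total=1000, Tg=10, n=100, f=5, R=3, h=3))
```

The linear dependence on Ttotal/Tg is what makes the gossip timer Tg the main knob for trading message cost against freshness of the propagated progress reports.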
As Eq. (13) shows, the gossip timer Tg is related not only to the parameters of the gossiping algorithm but also to the network conditions. In real applications, GosCC needs to detect the average length of the shortest path from the source to the gossiping targets.

5. Simulation

In this section, we evaluate the performance of GosCC against Benefit-Based Data Caching (BDC) [26] and HybridCache [29] through simulations over randomly generated network topologies. To the best of our knowledge, BDC achieves the best performance among existing cooperative caching schemes, in terms of average query delay and caching overheads.

5.1. Simulation setting
Fig. 4. Average interruption intervals vs. data item size.
Our simulations are carried out using the NS2 simulator [13]. NS2 contains models for common ad hoc routing protocols, the IEEE 802.11 MAC layer protocol, and the two-ray ground reflection propagation model [5]. The DSDV routing protocol [23] is used in our work to provide routing services.

(1) Network setup. We simulated GosCC, BDC and HybridCache on a network of 100 mobile nodes in an area of 2000 × 500 m2. Mobile nodes move according to the random waypoint model [5] in NS2: each node selects a random destination and moves towards it at a speed selected randomly from (0 m/s, vmax m/s). After the node reaches its destination, it pauses for a period of time (100 s in our simulation) and then repeats the movement pattern. There are 1000 data items, each of size 1000 bytes. We set up two randomly placed data sources (servers), S0 and S1, as the gateway nodes, where S0 maintains the data items with even IDs and S1 maintains those with odd IDs.

(2) Client query model. Each mobile node is a client node and sends out a stream of HTTP requests following the sequential order of data items from d1 to d1000. Once a mobile node receives an HTTP reply or obtains the requested data from a passing-by reply message, it schedules the request for the next data item according to the data item size and the consuming data speed vu. We count the interruption interval for each data item and sum them up to obtain total interruption intervals.

(3) Data access pattern. Each data item should be accessed successfully from the local cache or from other nodes' replies; otherwise, the mobile node waits until it obtains the requested data item. In our simulation, each mobile node accesses the 1000 data items in sequential order from 1 to 1000.

(4) Consistency model. We apply a simple TTL-based cache consistency strategy in our simulation. The timeout value is set to 8 s.

(5) Performance metrics. Comparisons are performed according to three metrics: average interruption intervals, average interruption times, and caching overheads. Average interruption intervals and average interruption times are defined in the Introduction. Caching overheads include all of the packets in our caching system, i.e., data requests, data replies, and other messages of the caching system. Note that these packets do not include routing packets, because the three strategies use the same routing protocol (DSDV).
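The sequential client query model above can be sketched in a few lines. This is our own illustration of how interruption intervals accumulate, with purely illustrative numbers, not the simulator used in the paper:

```python
# Items d1..dN are consumed in order; a local-cache hit causes no stall,
# while a miss adds the remote fetch delay to the interruption interval.

def total_interruption(n_items, cached, fetch_delay):
    """cached: set of item IDs held locally; fetch_delay: seconds to fetch
    a missing item from another cache node or the data source."""
    total_time = stalls = 0
    for item_id in range(1, n_items + 1):
        if item_id not in cached:     # miss: consumption is interrupted
            total_time += fetch_delay
            stalls += 1
    return total_time, stalls

# 1000 items, the first 900 cached locally, 0.3 s per remote fetch:
t, k = total_interruption(1000, set(range(1, 901)), 0.3)
print(round(t, 1), k)   # about 30 s of interruption over 100 stalls
```

This is exactly the quantity that GosCC's buffer-window rule tries to drive to zero: if every item is cached before the buffer window runs out, both the stall count and the total interruption time vanish.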
Fig. 5. Average interruption times vs. data item size.
5.2. Simulation results

In this subsection, we present the simulation results comparing the three caching strategies, viz. BDC, GosCC and HybridCache, under the same data access pattern, and study the impact of various parameter values on the performance metrics. In all the plots, each data point represents an average of 20 runs, and the 95% confidence intervals are shown as error bars.

(1) Varying data item size Sd. In Figs. 4–6, we vary the data item size from 1000 bytes to 8000 bytes while keeping the total cache size at 750 kbytes, the maximum speed vmax at 2.0 m/s, and the consuming data speed vu at 400 bytes per second. We observe that GosCC outperforms BDC and HybridCache in terms of average interruption intervals and average interruption times. However, its message cost is higher than that of BDC. The simulation results show that the reliability of gossiping depends on redundant messages.

(2) Varying consuming data speed vu. In Figs. 7–9, we vary the consuming data speed vu from 100 to 1000 bytes per second in our sequential access pattern while keeping the cache size at 750 kbytes and the maximum speed vmax at 2.0 m/s. We observe that GosCC outperforms BDC and HybridCache in terms of average interruption intervals and average interruption times. Caching overheads of GosCC are better than those
Fig. 6. Caching overheads vs. data item size.
Fig. 9. Caching overheads vs. consuming data speed.
Fig. 7. Average interruption intervals vs. consuming data speed.
Fig. 10. Average interruption intervals vs. maximum speed.
Fig. 8. Average interruption times vs. consuming data speed.
of BDC and HybridCache once the consuming data speed exceeds 400 bytes per second. The simulation results show that the performance of GosCC is stable and almost independent of the consuming data speed.
(3) Varying maximum speed vmax. In Figs. 10–12, we vary the maximum speed vmax from 2 to 20 m/s in our data access pattern while keeping the cache size at 750 kbytes and the consuming data speed at 400 bytes per second. We observe that GosCC outperforms BDC and HybridCache in terms of average interruption intervals and average interruption times, but its caching overheads are worse than those of BDC and HybridCache. In summary, the simulation results show that GosCC achieves much better performance than BDC and HybridCache in terms of average interruption intervals and average interruption times, although its message cost is normally greater than that of BDC and HybridCache. The reasons are as follows. Firstly, we consider the sequential relation among data items, so GosCC is much ‘‘smarter’’ than other cooperative caching strategies: the progress of mobile users in consuming data items plays a very important role in making caching decisions, and a global view of the progress information is propagated throughout the whole network in a timely way. Secondly, redundant messages in gossiping provide reliability at the cost of message overheads; controlling redundant messages in a more fine-grained way will be part of our future work.
Fig. 11. Average interruption times vs. maximum speed.
Fig. 12. Caching overheads vs. maximum speed.
6. Related work

The cache placement problem has been widely studied in the context of both wired and wireless networks. In this paper, we mainly focus on the cache placement problem for sharing multiple data items in mobile wireless networks. Existing solutions fall into three categories: selfish schemes, cooperative schemes and global schemes. Firstly, selfish schemes let mobile nodes cache data items according to their own preferences only. Hara [16] proposes the Static Access Frequency (SAF) scheme, in which each node caches the items it accesses most frequently. Yin and Cao [28] propose the Greedy Scheme, which considers the impact of access frequencies and data item sizes on the caching decision: data items with higher access frequencies and smaller sizes are preferentially cached. Under such schemes, the performance is even worse when there are fewer client nodes and access frequencies are uniform across nodes. Secondly, cooperative schemes consider the requirements of both the node itself and other nodes. Hara [16] presents the Dynamic Access Frequency and Neighborhood (DAFN) scheme, which eliminates replica duplication among neighboring mobile hosts in order to improve SAF. Dynamic Connectivity based Grouping (DCG), a third cooperative caching scheme proposed in [28], aims at sharing replicas in a larger group of mobile hosts than
DAFN, which shares replicas only among neighboring nodes. However, it is not easy to find stable nodes to act as ‘‘central nodes’’ that collect neighboring information and determine cache placements when there are frequent failures and movements in an IMANET. The same problem arises in One-To-One Optimization (OTOO) [28] and the Reliable Neighbor (RN) scheme [28]. Yin and Cao [29] propose three distributed caching schemes, viz. CacheData, CachePath, and HybridCache. CacheData caches passing-by data items at each node; CachePath caches the path to the nearest cache of a passing-by data item; HybridCache caches the data item if its size is small enough, and otherwise caches the path to it. These three schemes are simple but efficient. However, CachePath depends on the modification of routing protocols, and sometimes the node that modified the route must reroute the request to the original data source. Chow et al. present Group-based Peer-to-peer Cooperative Caching (GroCoca) [8] to improve the performance of data access in mobile environments. In GroCoca, a tightly-coupled group is maintained as a collection of peers that possess a similar mobility pattern and display similar data affinity. However, GroCoca relies on a Mobile Support Station (MSS) and is therefore not suitable for IMANETs. Thirdly, global schemes consider the impact of a node's caching decision on the benefit to the whole system. Tang et al. present the Benefit-based Data Caching (BDC) scheme [26], which maximizes the benefit (i.e., the reduction in total access cost) instead of minimizing total access cost. To the best of our knowledge, it is the best solution that presents approximation algorithms for the cache placement problem for sharing multiple data items under memory constraints.
However, BDC has similar performance to other schemes in mobile environments, and it cannot determine the value of the benefit threshold in its distributed version. Moreover, the data servers periodically broadcast the latest cache list to the entire network, which results in considerable message cost. Du et al. propose the Cooperative Caching (COOP) scheme [11] for on-demand data access applications in MANETs. COOP increases the effective capacity of cooperative caching by minimizing caching duplication within the cooperation zone and accommodating more data varieties. However, data requests are resolved by flooding within a cooperative zone, which incurs considerable message cost in an IMANET.

7. Conclusions

In this paper, we first define the Sequential Relation Cache Placement (SRCP) problem, with the objective of minimizing total interruption intervals. SRCP differs from the traditional cache placement problem in that it takes user experience into consideration. To increase the probability that mobile nodes access data items from their local caches, we present Gossip-based Cooperative Caching, which makes mobile nodes exchange information on cache contents and the progress of data usage in a gossip-based way. We introduce three heuristic rules to minimize total interruption intervals. In our simulations, we evaluate GosCC in IMANETs with various settings, in terms of average interruption intervals, average interruption times, and caching overheads. The results demonstrate that GosCC outperforms BDC and HybridCache in terms of average interruption intervals and average interruption times, while sacrificing message cost to a certain degree. Additionally, GosCC delivers very stable performance, making it suitable for mobile wireless environments.
Acknowledgments This research is partially supported by the National Natural Science Foundation of China (No. 61202416), Shenzhen Strategic Emerging Industry Development Funds (No. JCYJ20120615130218295), the Hong Kong RGC under the GRF grant PolyU 5102/08E and the Hong Kong Polytechnic University under the grant 1-BB6C. The authors would like to thank Dr. Bin Tang for providing us with BDC and HybridCache simulation codes. References [1] I. Baev, R. Rajaraman, Approximation algorithms for data placement in arbitrary networks, in: Proc. of ACM-SIAM Symposium Discrete Algorithms, SODA’01, 2001. [2] S. Bespamyatnikh, B. Bhattacharya, D. Kirkpatrick, M. Segal, Mobile facility location, in: 4th International Workshop on Discrete Algorithms and Methods for Mobile Computing & Communications, 2000. [3] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, Y. Minsky, Bimodal multicast, ACM Transactions on Computer Systems 17 (2) (1999) 41–88. [4] B. Bollobás, Modern Graph Theory, Springer-Verlag, New York, pp. 215–251. [5] J. Broch, D.A. Maltz, D.B. Johnson, Y.-C. Hu, J. Jetcheva, A performance comparison of multi-hop wireless ad hoc network routing protocols, in: Proc. of MOBICOM, 1998. [6] J. Cao, Y. Zhang, L. Xie, G. Cao, Data consistency for cooperative caching in mobile environments, IEEE Computer (Apr.) (2007). [7] R. Chandra, V. Ramasubramanian, K. Birman, Anonymous gossip: improving multicast reliability in mobile ad hoc networks, in: Proc. of ICDCS, 2001, pp. 275–283. [8] C. Chow, H. Leong, A. Chan, GroCoca: group-based peer-to-peer cooperative caching in mobile environments, IEEE Journal on Selected Areas in Communications 25 (1) (2007). [9] M.S. Corson, J.P. Macker, G.H. Cirincione, Internet-based mobile ad hoc networking, IEEE Internet Computing (July–August) (1999) 63–70. [10] A.J. Demers, D.H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, D. Terry, Epidemic algorithms for replicated database maintenance, in: Proc. 
of PODC, 1987, pp. 1–12. [11] Y. Du, S. Gupta, G. Varsamopoulos, Improving on-demand data access efficiency in MANETs with cooperative caching, Ad Hoc Networks (2009) 579–598. [12] P. Eugster, R. Guerraoui, A.-M. Kermarrec, L. Massoulié, Epidemic information dissemination in distributed systems, IEEE Computer 37 (5) (2004). [13] K. Fall, K. Varadhan, NS notes and documentation, The VINT Project, U.C. Berkeley, LBL, USC/ISI, and Xerox PARC, 1997. [14] X. Fan, J. Cao, W. Wu, M. Raynal, On modeling fault tolerance of gossip-based reliable multicast protocols, in: Proc. of ICPP, 2008. [15] M. Fiore, F. Mininni, C. Casetti, C. Chiasserini, To cache or not to cache, in: Proc. of INFOCOM, 2009. [16] T. Hara, Effective replica allocation in ad hoc networks for improving data accessibility, in: Proc. of INFOCOM, 2001. [17] A.-M. Kermarrec, L. Massoulié, A.J. Ganesh, Probabilistic reliable dissemination in large-scale systems, IEEE Transactions on Parallel and Distributed Systems 14 (3) (2003). [18] Y. Lin, W. Lai, J. Chen, Effects of cache mechanism on wireless data access, IEEE Transactions on Wireless Communications 2 (6) (2003). [19] J. Luo, P.T. Eugster, J.P. Hubaux, Route driven gossip: probabilistic reliable multicast in ad hoc networks, in: Proc. of INFOCOM, 2003. [20] Y. Ma, A. Jamalipour, A cooperative cache-based content delivery framework for intermittently connected mobile ad hoc networks, IEEE Transactions on Wireless Communications 9 (1) (2010). [21] J.D. Murray, Mathematical Biology, second ed., Springer, Berlin, 1993. [22] M. Newman, S. Strogatz, D. Watts, Random graphs with arbitrary degree distributions and their applications, Physical Review E 64 (2001) 026118. [23] C. Perkins, P. Bhagwat, Highly dynamic destination-sequenced distance-vector (DSDV) routing for mobile computers, in: Proc. of SIGCOMM, 1994. [24] R. van Renesse, Y. Minsky, M. Hayden, A gossip-style failure detection service, in: Proc.
of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, 1998. [25] M. Taghizadeh, A. Plummer, S. Biswas, Cooperative caching for improving availability in social wireless networks, in: Proc. of MASS, 2010. [26] B. Tang, H. Gupta, S. Das, Benefit-based data caching in ad hoc networks, in: Proc. of ICNP, 2006. [27] Y. Wang, J. Wu, Z. Jiang, F. Li, A joint replication-migration-based routing in delay tolerant networks, in: Proc. of ICC, 2012.
[28] L. Yin, G. Cao, Balancing the tradeoffs between data accessibility and query delay in ad hoc networks, in: Proc. of SRDS, 2004. [29] L. Yin, G. Cao, Supporting cooperative caching in ad hoc networks, IEEE Transactions on Mobile Computing 5 (1) (2006) 77–89. [30] Y. Zhao, J. Wu, B-SUB: a practical bloom-filter-based publish-subscribe system for human networks, in: Proc. of ICDCS, 2010. [31] Y. Zhao, J. Wu, Socially-aware publish/subscribe system for human networks, in: Proc. of WCNC, 2010.
Xiaopeng Fan received the Ph.D. degree in Computer Science in 2010 from Hong Kong Polytechnic University. He received the BE and ME degrees in Computer Science from Xidian University, Xi'an, China, in 2001 and 2004, respectively. He is currently an Assistant Professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research interests include mobile computing, cloud computing, wireless communication, and software engineering. His recent research has focused on mobile cloud computing, data centers and mobile data management. He has served as a reviewer for several international journals and conference proceedings.
Jiannong Cao (M’93–SM’05) received the B.Sc. degree in Computer Science from Nanjing University, China, in 1982, and the M.Sc. and Ph.D. degrees in Computer Science from Washington State University, Pullman, Washington, in 1986 and 1990, respectively. He is currently a Professor in the Department of Computing at Hong Kong Polytechnic University, Hung Hom, Hong Kong, where he is also the director of the Internet and Mobile Computing Lab. Before joining Hong Kong Polytechnic University, he was on the faculty of computer science at James Cook University, the University of Adelaide in Australia, and the City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile and wireless computing, fault tolerance, and distributed software architecture. He has published more than 200 technical papers in these areas. His recent research has focused on mobile and pervasive computing systems, developing testbeds, protocols, middleware and applications. He is a senior member of the China Computer Federation, a senior member of the IEEE, including the IEEE Computer Society and the IEEE Communications Society, and a member of the ACM. He is also a member of the IEEE Technical Committee on Distributed Processing, the IEEE Technical Committee on Parallel Processing, and the IEEE Technical Committee on Fault-Tolerant Computing. He has served as a member of the editorial boards of several international journals, as a reviewer for international journals and conference proceedings, and as an organizing/program committee member for many international conferences.
Haixia Mao received the BE and ME degrees in Remote Sensing and Geo-informatics from Wuhan University, Wuhan, China, in 2001 and 2004, respectively. She received the Ph.D. degree in Remote Sensing and Geo-informatics in 2010 from Hong Kong Polytechnic University. She is currently an Assistant Professor in the School of Automotive and Transportation Engineering at Shenzhen Polytechnic, China. Her research interests focus on data management, remotely sensed image processing and massive geo-spatial data mining in geo-informatics.
Yunhuai Liu earned his B.S. degree from the Department of Computer Science and Technology, Tsinghua University, in July 2000. He then joined Hewlett Packard (China) in Beijing as a System Engineer. After nearly three years of working with HP-UX, HP servers and storage, he began to pursue his Ph.D. at the Department of Computer Science, the Hong Kong University of Science and Technology, completing it in July 2008 with the thesis ‘‘Probabilistic Topology Control in Wireless Sensor Networks’’. He then worked as a Research Assistant Professor at the Hong Kong University of Science and Technology. In August 2010, he joined the Institute of Advanced Computing and Digital Engineering, Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS). He is now an Associate Researcher.