An Adaptive Cooperative Caching Strategy (ACCS) for Mobile Ad Hoc Networks


Accepted Manuscript

PII: S0950-7051(17)30006-0
DOI: 10.1016/j.knosys.2017.01.005
Reference: KNOSYS 3782

To appear in: Knowledge-Based Systems

Received date: 8 November 2015
Revised date: 31 December 2016
Accepted date: 1 January 2017

Please cite this article as: Ahmed. I. Saleh, An Adaptive Cooperative Caching Strategy (ACCS) for Mobile Ad hoc Networks, Knowledge-Based Systems (2017), doi: 10.1016/j.knosys.2017.01.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Ahmed. I. Saleh
Computer Engineering & Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt

Abstract:

Data caching can remarkably improve data availability in mobile ad hoc networks (MANETs) by reducing access latency and bandwidth usage. Unfortunately, due to mobility and resource constraints, caching techniques designed for wired networks are not applicable to MANETs. Moreover, frequent data updates, the limited resources of Mobile Terminals (MTs), insufficient wireless bandwidth, and MT mobility make cache management a tedious task. However, as MTs in MANETs may have similar tasks and share common interests, cooperative caching, which allows the sharing and coordination of cached data among multiple MTs, can be used to reduce bandwidth and power consumption. Hence, a nearby MT can serve requests instead of the distant data source. The originality of this paper lies in introducing an adaptive cooperative caching strategy (ACCS) with novel cache replacement and prefetching policies. ACCS divides the network into non-overlapping clusters. Unlike other caching techniques that employ reactive routing protocols, ACCS employs a novel built-in table-driven routing strategy with no additional penalties, which significantly reduces the query delay. The secret lies in collecting the routing information during cluster formulation and then filling the routing tables accordingly. ACCS has been compared against recent cooperative caching strategies. Experimental results have shown that ACCS outperforms the other strategies, as it achieves the most cache hits as well as the lowest query latency.

Keywords: Ad hoc Networks, Caching, Routing.

1. Introduction


Recently, with the proliferation of portable computing devices and the massive development in wireless communication technologies, mobile ad hoc networks (MANETs) have gained worldwide attention [1]. A MANET is a multi-hop wireless network consisting of a collection of small wireless computing nodes, such as conventional computers (e.g., PCs, PDAs, or laptops) or embedded processor devices such as tiny, low-cost, low-power sensors [2]. These nodes dynamically form a temporary network for instant communication with no centralized administration or prior infrastructure [3]. Accordingly, MANETs can be easily deployed and expanded, allowing the establishment of temporary communications without any pre-installed infrastructure. They are mainly constructed for information sharing as well as task coordination among a group of people. However, due to the lack of infrastructure support, each node in a MANET acts as both a terminal and a router, so it can forward data packets for other nodes. MANETs have potential applications in civilian and military environments. For illustration, they are suitable for mobile conferencing, disaster and emergency relief, battlefield maneuvers, wireless offices, and mobile info-stations (in restaurants, tourist centers, and so on), making them an attractive research area [4].

Although the fundamental challenge in MANETs is centered on enhancing the discovery of routes among the network mobile terminals (MTs), caching has also become an attractive area of research [5]. This is because the ultimate goal of MANETs is to provide reliable data communication among MTs. Recent research in MANETs has recorded several challenging issues in data access due to frequent disconnections, limited bandwidth, high mobility of MTs, and the poor resources of MTs (e.g., battery life and storage capacity) [6]. In MANETs, due to frequent network partitioning, data availability is lower than in traditional wired networks. To solve this problem, two approaches can be used: data replication and data caching [7]. Data replication is the maintenance of multiple copies of data at multiple locations. Data caching, on the other hand, is an efficient solution to the data availability and query delay problems. It has been widely used in several fields, such as multi-processor systems, CPU design, memory architecture, and router design. Furthermore, the Internet uses cache placement and replacement in proxy servers to significantly reduce the network latency of data queries [8].


Caching is a suggested solution for increasing the efficiency of data access in MANETs: it allows MTs to copy the parts of the data they want from the data source and store them in local memory [5]. By caching some pieces of data, an MT does not need to ask the data source for them again if they are needed in the near future. Accordingly, the data access latency is reduced, as the service is provided locally [6]. However, an MT cannot cache all the data it expects to use in the future. Moreover, caching techniques that have been successfully applied to one-hop mobile environments are not suitable for multi-hop mobile environments such as MANETs, since data or requests may have to go through multiple hops. As a result, caching alone is not sufficient to guarantee low communication latency and high data accessibility in MANETs. A good solution to this issue is to support cooperation among MTs, which saves limited resources and copes with the dynamic nature of MANETs.


As MTs in MANETs may have similar tasks and usually share common interests, cooperative caching (COCA), which allows the sharing and coordination of cached data among multiple MTs, can be applied [9]. COCA allows MTs to collaborate with each other to improve system performance by forming a large unified cache, built by sharing the local caches of adjacent nodes. Accordingly, COCA allows an MT to obtain data items not only from its local cache but also from the caches of its neighbors. This reduces query latency as well as bandwidth and power consumption, since fewer MTs are involved in the query process. In addition, because the data center handles fewer requests, the workload is spread over the network, reducing the load on the data center. Furthermore, by applying COCA, MTs can coordinate caching tasks such as cache placement, cache replacement, redirection of data requests, and cache invalidation. Although COCA has been extensively studied in wired networks, little work has been done to apply it to MANETs [9].


Generally, designing an efficient cooperative caching mechanism for MANETs is challenged by three essential issues: unpredictable node mobility, quickly drained battery power, and limited bandwidth [10]. These issues directly affect data availability, as the caches of some MTs may become unavailable when the MTs move away, shut down because of a power shortage, or switch to sleep mode to save power. Accordingly, a good cooperative caching scheme for MANETs should satisfy two requirements: (i) the ability to operate under unpredictable network partitioning, node mobility, and node failures, as the network area may be large with no fixed infrastructure available; and (ii) support for real-time communication, as the tasks performed by a MANET, such as military operations and disaster relief, are often time-critical [9]. The design of a cooperative caching scheme for MANETs should therefore address several issues: (i) a good Cache Discovery Algorithm (CDA) is needed to discover and deliver requested data items among neighboring nodes [10]; the CDA is also used to decide which data items should be cached for future use. (ii) A Cache Replacement Algorithm (CRA) is needed to make room for new items by replacing one or more cached items when the cache is full [11-13]. (iii) A Cache Consistency Algorithm (CCA) should also be considered to ensure that cached data items are always up to date.


This paper introduces a new cooperative caching strategy for MANETs, called the Adaptive Cooperative Caching Strategy (ACCS). The ultimate goal of ACCS is to improve data availability, so that data is available on time while using minimum resources. ACCS divides the network into non-overlapping clusters, each managed by a Cluster Master (CM) node. It has salient properties that other COCA techniques lack: (i) ACCS minimizes cache misses by using efficient anti-cache-miss algorithms, namely anti-local miss, anti-zone miss, and anti-cluster miss. (ii) ACCS maximizes data availability by employing a new procedure called Keep Sufficient Copies (KSC). KSC allows the cluster to maintain a pre-defined number of copies of each data item. Hence, it keeps data availability at a sufficient level while saving cache space, in spite of node mobility (which may cause several caching nodes to leave the cluster) and the sudden death of nodes. Also, with KSC, redundant copies are eliminated by keeping only the pre-defined number of copies of each item within the cluster, which allows the cluster to cache as many distinct data items as possible. (iii) ACCS provides a new cache replacement policy based on a new fuzzy inference system, called Adaptive Cache Replacement (ACR). (iv) ACCS also maintains a new prefetching policy, so that pre-fetched items can be used in the near future by the node itself or by one of its neighbors. (v) ACCS has a built-in table-driven routing technique that fetches data through the shortest paths, which makes it suitable for time-sensitive and real-time applications. Generally, on-demand routing protocols such as AODV are preferred in MANETs; with them, however, transmitting data between two nodes requires extra time to set up the connection, which increases the query delay. Unlike other MANET caching strategies, ACCS fills the routing tables of the cluster masters with valid shortest routes to all cluster members during cluster formulation, which is a novel idea. As clusters are reformulated periodically, the routing tables are periodically updated to keep track of the dynamic topology of the MANET. This behavior minimizes the query delay and also maximizes the network lifetime under power limitations, as minimum resources are used to transfer data. (vi) Moreover, due to its simplicity, ACCS does not add any implementation overhead.


To summarize, this paper contributes in the following aspects:

• Introducing a complete caching strategy with a new built-in routing strategy that can efficiently discover the shortest path between each pair of nodes in the network.
• Developing a new clustering strategy that can quickly organize the network MTs into non-overlapping clusters. Accordingly, any node can easily be attached to the nearest cluster if it cannot register itself with the cluster master through the cluster discovery messages.
• Providing a set of consistent rules for data pre-fetching, which guarantees the maximum caching hit rate.
• Introducing a new adaptive cache replacement technique, which promotes cache space utilization.

Experimental results have shown that ACCS yields significant improvements compared to recent cooperative caching techniques designed for MANETs. The paper is organized as follows: section 2 concentrates on the concept of MANET clustering, section 3 summarizes recent work in MANET caching, section 4 explains the proposed caching strategy (i.e., ACCS) in detail, section 5 discusses the overheads and added penalties of ACCS, experimental results are presented in section 6, and our conclusions and future work are given in sections 7 and 8, respectively.

2. MANET Clustering


From the functional point of view, we can identify two basic models of MANETs: (i) Peer-to-Peer MANETs (P-MANETs) and (ii) Commanding-officer-based MANETs (C-MANETs). Our classification is based on the characteristics of the MTs (the MANET's nodes). The former model is formed by a set of peers with a limited level of power; a P-MANET is illustrated in figure 1(A). In a P-MANET, any two arbitrary MTs can share data if and only if they are within transmission range of each other or can cooperatively exchange data through intermediate nodes. In figure 1(A), for example, MTx can communicate with MTy through MTz. Formulating clusters in a P-MANET is a tedious task, as it requires flooding the network with control packets to elect the most suitable CMs and incurs several additional penalties for network setup, such as delay. A C-MANET, on the other hand, is formed by a set of limited-power MTs together with a set of commanding officer (CO) nodes, which play the role of the servers of traditional wired/wireless networks, as depicted in figure 1(B). COs are special MTs characterized by unlimited power and low mobility rates. Therefore, in cluster-based MANETs, COs are the most suitable nodes to be cluster heads. Most research on MANET routing and caching concentrates on P-MANETs. However, most MANET applications use the second model (i.e., C-MANET), which is the most used model in real-world MANETs.

Figure 1, P-MANET versus C-MANET. (A) P-MANET, with nodes MTx, MTy, and MTz. (B) C-MANET. Legend: MT; MT's transmission range; cluster boundary; official centers (cluster masters); remote data center.

C-MANETs do not require the “frozen period” for CM election that all Active Clustering (AC) schemes need. AC requires two phases: (i) cluster master election and (ii) cluster formulation. Employing the C-MANET model reduces AC to a single phase, cluster formulation, as it eliminates the CM election phase. This has a good impact on network performance, as it minimizes both the time needed to set up the clusters and the overhead caused by the control packet flooding used to elect CMs. Therefore, our proposed caching strategy (i.e., ACCS) is built upon the C-MANET model.


3. Related Work

A tremendous amount of research has been done on designing caching techniques for MANETs. However, little work has paid attention to cooperative caching as a way to gain more from the mobile environment. To facilitate data discovery in a cooperative cache, some schemes [14-16] are based on broadcasting, while others use the network clustering concept [17, 18].


Simple Cache (SC) is the most suitable caching technique for on-demand data access applications. It is similar to the client cache of the traditional client-server model in wired networks. SC resolves a data request by checking the local cache first and sending the request to the data server after a local cache miss [19]. It does not consider node mobility, limited wireless bandwidth, or the battery power of MANET nodes. Moreover, in SC, nodes do not use the cache space available in their neighbors. Generally, SC works well as long as the connection to the data server is reliable and not too expensive. Otherwise, it results in failed requests or request timeouts.

CacheData (CD) considers the cache placement policy at intermediate nodes on the route between the source and the requester [19]. An intermediate node caches a passing-by data item locally when it finds that the data item is popular (i.e., frequently requested) or when the node has enough free caching space. Generally, in the CD scheme, intermediate nodes cache data based on a conservative rule, which can be stated as follows: (i) a node does not cache a data item if all requests for that item come from the same node, and (ii) a request is answered by an intermediate node instead of the remote server when a valid copy of the requested item exists in the intermediate node's cache. However, this approach has several limitations: (i) there is no cooperative caching protocol among nodes, so each node independently performs caching tasks such as placement and replacement; (ii) the data could take a lot of caching space in intermediate nodes; and (iii) CD needs extra space to save the data, so it should be used prudently.

CachePath (CP) is also proposed for redirecting requests to the caching node [19]. To this end, forwarding nodes cache the path to the closest caching node instead of the faraway server and redirect future requests along that cached path. Accordingly, the CP scheme saves caching space compared to CD, as it caches only the path, not the data itself. However, as the caching node is dynamic (i.e., the network topology continuously changes), the cached paths could become obsolete due to the movement of MTs.


Hybrid Cache (HC) tries to avoid the weaknesses of the CD and CP schemes by deciding which scheme to use based on the properties of the passing-by data, such as its size and Time-to-Live (TTL) [20]. Generally, CD is preferred in two situations. First, if the size of the passing-by item is less than a threshold size [19], because the data item then needs only a very small part of the cache. Second, if the TTL of the item is less than a threshold value [20], because the data item may become invalid soon; as a result, CP may direct a request along a wrong path, and the request then has to be re-sent to the data server. Otherwise, CP is better [19].
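The HC decision rule described above can be sketched as a small function. This is only a minimal illustration under stated assumptions, not the implementation from [20]: the names `PassingItem` and `choose_scheme`, the field names, and the threshold values used in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassingItem:
    """A data item passing through an intermediate node (hypothetical structure)."""
    item_id: str
    size_bytes: int
    ttl_s: float  # remaining time-to-live, in seconds


def choose_scheme(item: PassingItem, size_threshold: int, ttl_threshold: float) -> str:
    """Pick CacheData for small or short-lived items, CachePath otherwise."""
    if item.size_bytes < size_threshold:
        # Small item: caching the data itself costs little cache space.
        return "CacheData"
    if item.ttl_s < ttl_threshold:
        # Item expires soon: a cached path would quickly become misleading.
        return "CacheData"
    return "CachePath"


# Example with assumed thresholds of 1 KiB and 60 seconds.
small = PassingItem("d1", size_bytes=200, ttl_s=3600.0)
large_stable = PassingItem("d2", size_bytes=50_000, ttl_s=3600.0)
print(choose_scheme(small, 1024, 60.0), choose_scheme(large_stable, 1024, 60.0))
```

The two tests are kept separate so that either condition alone is enough to select CD, matching the "two situations" in the text.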


NeighborCaching (NC) [21] utilizes the caching space of inactive neighbors. When a node fetches a data item from a remote node, it puts that item in its own caching space for reuse. This requires evicting the least valuable data item from the cache according to a replacement algorithm. With NC, the evicted data item is stored in the storage of an idle neighbor node. If the node needs the evicted data item again in the near future, it requests the item from the nearby neighbor that keeps a copy of it. Hence, NC utilizes the available cache space of neighbors to improve caching performance [21]. However, it lacks cache cooperation among the network nodes.


Zone Cooperative (ZC) caching [22] considers the progress of data discovery. The set of one-hop neighbors within transmission range forms a zone, and the nodes within the zone form a cooperative cache. Each node has a local cache to store frequently accessed data items, which satisfy not only the node's own requests but also the data requests of other nodes that pass through it. Data is updated only at the server. Once a data item is updated, the nodes' cached copies become invalid [22]. On a local cache miss, the node first searches for the data in its zone before forwarding the request to the next node on the path towards the server. In spite of its effectiveness, the latency may become longer if the neighbors of intermediate nodes do not have a valid copy of the requested data item.

In Group Caching (GC), each node and its one-hop neighbors form a group [23]. The group is maintained by periodically exchanging “Hello” messages among its nodes. For each group, a special node is elected as the group master, which communicates with group members directly through one-hop routing. The task of the group master is to increase the communication speed and reduce the delay among nodes within the group [23]. Periodically, the master node checks the caching status of its group members using caching control messages. Accordingly, the group can store more distinct data items, which increases data accessibility.


Global Cluster Cooperative (GCC) caching [24, 25] divides the network into several clusters based on geographical network proximity. Nodes in a cluster interact with each other, which enhances the caching performance of the system. For each cluster, a special node is selected to act as the Cache State Node (CSN), which maintains the global cache state (GCS) information of the different network clusters. The GCS of a network is the list of the available data items along with their TTL values. Accordingly, when a node caches or replaces a data item, the corresponding GCS is updated at the CSN. In GCC, when a node suffers a local cache miss, it looks for the required data item at the cluster members. If the node cannot find the required data item in the cluster members' caches (i.e., a cluster cache miss), it forwards the request to the CSN, which keeps the global cache state and maintains information about the nodes in the network that have a valid copy of the required data item [24]. If a cluster other than the requesting node's cluster has the requested data (i.e., a remote cache hit), it can serve the request without forwarding it further towards the data source. Otherwise, the request is answered by the data source [25].

In order to overcome the limitations of previous COCA techniques, we have designed a new cluster-based COCA strategy, called the Adaptive Cooperative Caching Strategy (ACCS). Our goal is to provide caching- and power-efficient services for MANETs. ACCS minimizes power consumption because it relies on a realistic C-MANET model; hence, no packet flooding is required to elect the cluster masters, as occurs in all other cluster-based caching techniques. This makes ACCS suitable for power-sensitive networks such as wireless sensor networks. Moreover, with the C-MANET model, no time penalties are added for installing the network or monitoring the changes that may happen in the network topology.


ACCS has several salient features that most other caching schemes do not have. First, ACCS is designed to reduce the redundancy of cached data. It checks the caching status of the cluster members when a new data item is received, and at most ζ copies of the same data item are maintained within the cluster caching space. As a result of this redundancy removal, ACCS can store more distinct data items, which increases cache space utilization while keeping data accessibility at a satisfactory level. Second, ACCS minimizes message overhead, as it eliminates network flooding when searching for a data item; it also minimizes the time penalty of the search, as the data is retrieved with minimum delay through the shortest path. Third, ACCS enhances caching performance by utilizing the cluster's cache space as soon as possible. Hence, when an MT joins the network (while its cache space is still empty), ACCS can use this fresh cache to store new data items that may be used by cluster members in the near future. Fourth, ACCS implements an efficient fuzzy-based cache replacement policy as well as a new prefetching policy, which maximize cache space utilization. Finally, unlike most cooperative caching strategies for MANETs, which implement reactive routing protocols (such as AODV), ACCS implements a built-in table-driven routing strategy that updates the CMs' routing tables during the formulation of the network clusters. Therefore, ACCS adds no penalties on the scarce bandwidth and can quickly provide the shortest path from source to destination.
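The first feature, capping the number of copies of an item inside a cluster at ζ, can be illustrated with a small bookkeeping sketch. It is not the paper's KSC procedure: the class `ClusterCopyIndex`, its method names, and the idea of a single in-memory index are assumptions made purely for illustration.

```python
class ClusterCopyIndex:
    """Tracks which cluster members hold a copy of each item (hypothetical helper)."""

    def __init__(self, zeta: int) -> None:
        if zeta < 1:
            raise ValueError("zeta must be at least 1")
        self.zeta = zeta
        self.holders = {}  # item_id -> set of node ids caching that item

    def should_cache(self, item_id: str, node_id: str) -> bool:
        """A node may cache the item only while fewer than zeta copies exist."""
        current = self.holders.get(item_id, set())
        return node_id not in current and len(current) < self.zeta

    def record_cached(self, item_id: str, node_id: str) -> None:
        self.holders.setdefault(item_id, set()).add(node_id)

    def record_removed(self, item_id: str, node_id: str) -> None:
        """Call when a node evicts the item or leaves the cluster."""
        self.holders.get(item_id, set()).discard(node_id)


index = ClusterCopyIndex(zeta=2)
for node in ("MT1", "MT2", "MT3"):
    if index.should_cache("d7", node):
        index.record_cached("d7", node)
print(sorted(index.holders["d7"]))  # MT3 is refused: two copies already exist
```

Removing a holder frees a slot, so a later request from another member is allowed to cache the item again.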


4. The Proposed Adaptive Cooperative Caching Strategy (ACCS)


In this section, our proposed adaptive cooperative caching strategy (ACCS) is introduced in more detail. ACCS is an adaptive strategy proposed to solve the caching problems in C-MANETs. Hence, the network has a pre-defined set of server nodes (i.e., commanding officers) that serve as CMs. Initially, the network is divided into non-overlapping clusters, each of which maintains one CM as well as a set of cluster slaves. Each slave continuously records all nodes in its zone, which is defined by its one-hop neighbors. Several algorithms will be introduced to handle the different caching issues, such as pre-fetching, replacement, and consistency. Other algorithms will be introduced to solve the cache miss problem, namely Anti-Local Miss (ALM), Anti-Zone Miss (AZM), and Anti-Cluster Miss (ACM). Moreover, ACCS proposes a new procedure for effectively discovering the network clusters. ACCS implements a quick and efficient path mining technique that ensures that data migrates through the shortest path.

4.1. Network model and Basic Definitions

We have considered a C-MANET environment, which can be modeled as a graph G = (V, E), where V represents the set of network MTs and E is the set of edges representing the available interconnecting links. An edge e = (MTx ↔ MTy) ∈ E, where MTx, MTy ∈ V, exists if and only if MTx is in the transmission range of MTy and vice versa. All links (edges) in E are bi-directional; hence, if MTx is in the transmission range of MTy, then MTy is also in the transmission range of MTx. The network is assumed to be in a connected state, so if it is partitioned, each partition is treated as an independent network.
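The graph model above can be made concrete with a short sketch that derives bi-directional links from node positions and splits the result into partitions. It assumes, for illustration only, that nodes lie on a 2D plane and share one common transmission range (which makes every link symmetric); the function names `build_links` and `partitions` are hypothetical.

```python
import math
from collections import deque


def build_links(positions, tx_range):
    """positions: dict node -> (x, y). Returns an adjacency dict of symmetric links."""
    adj = {node: set() for node in positions}
    nodes = list(positions)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if math.dist(positions[a], positions[b]) <= tx_range:
                adj[a].add(b)  # each edge is stored in both directions
                adj[b].add(a)
    return adj


def partitions(adj):
    """Connected components; each one would be treated as an independent network."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        parts.append(component)
    return parts


positions = {"MTx": (0.0, 0.0), "MTz": (1.0, 0.0), "MTy": (2.0, 0.0), "MTw": (10.0, 10.0)}
links = build_links(positions, tx_range=1.5)
print(links["MTx"], len(partitions(links)))
```

In this example MTx reaches MTy only through MTz, while the isolated MTw forms a partition of its own.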


The considered C-MANET is a realistic model of MANETs that consists of a set of MTs able to access the data items stored by each other. The network also includes a set of pre-defined promoted nodes that are usually semi-static and function as CMs. The original copies of the data items are stored in a static data center. Without cooperation among MTs, a data request initiated by an MT is forwarded hop-by-hop along the intermediate path until it reaches the data center, which sends back the requested data. However, to reduce network traffic and data access delay, MTs cooperate with each other to maximize data availability by caching some data locally [26]. In ACCS, the network is divided into several non-overlapping clusters. In each cluster, a pre-defined node is considered the CM. The number of cluster masters in the network is set during network installation. It is assumed that the cluster master has unlimited energy (a commanding officer, for example).

• Each mobile terminal (MT) is assigned a unique identifier in the network. The network has a total of M mobile terminals, denoted as MTi (1 ≤ i ≤ M).
• MTs can physically move at any time without notification, so there is no guarantee that MTi at time t will remain in the same cluster at time t+h.
• The devices (i.e., MTs) might be turned off or on at any time, so the set of active MTs (AMT) varies with time (i.e., |AMT| ≤ M), and the network has no fixed size.
• The day (24 hours) is divided into a set of equal-sized time periods (time stamps) of a fixed length.
• There is one remote data source in the network, which holds the original copies of the available data items. The set of available data items is denoted by D = {d1, d2, …, dr}, where r is the total number of data items and dj (1 ≤ j ≤ r) denotes a data item.
• Each MT has a cache space of C bytes.
• The network uses the C-MANET architecture; hence, there is a pre-defined number of cluster masters (assumed to be Z), and accordingly the network has Z clusters. No frozen period is therefore needed during network installation for cluster master election.
• Each data item is periodically updated at the data source. After a data item is updated, its cached copies (maintained on one or more MTs) become obsolete.

Definition 1: C-MANET Cluster: The considered C-MANET is subdivided into a set of sub-domains called clusters. Each cluster includes a set of geographically neighboring nodes, and are automatically constructed around a pre-defined cluster master, which is a representative of the underlying cluster. Remaining cluster members (excluding CM) are ordinary nodes and are also called cluster slaves. The cluster is built and maintained at the beginning of each time interval, in which each CM initializes a procedure to discover its cluster members. The boundaries of a cluster are defined by the transmission area of its CM, which is assumed to be of µ layers around CM. Cluster slaves that are located at the boundaries of the cluster are promoted to be gateways. They serve as intermediate for intra-cluster communication and play the dominant role of forwarding data among clusters.



Definition 2: Cluster Master (CM); CM of a C-MANET is a pre-defined MT that is responsible of the communication among MTs within its cluster as well as the communication to MTs of other clusters. It stores a vital routing information. Moreover, CM keeps a complete map about the cluster structure as well as the role of each cluster slave (e.g., ordinary node or gateway). It is usually characterized by un-limited power as well as low mobility rates.



Definition 3: Cluster Slave (CS); a MT that is under the control of the cluster master. It usually characterized by a limited power as well as a high mobility rate. Hence, a CS may leave its cluster at any time with no notification. A CS may be ordinary node of a gateway.



Definition 4: Time Stamp: ACCS divides the day (24 h) into a set of equal-sized time units (also called time periods or time stamps). The length of the time unit  (represented in seconds) is set by the system administrator. Each time unit has a unique ID that represents its position in the day. So, any time unit is expressed as (i)  i = 1, 2, 3,…, λ, in which λ represents the number of time units in the day. All network CMs are synchronized so that the time period α starts at all CMs on the same time.

AC

CE

PT

ED

M



6

ACCEPTED MANUSCRIPT

Definition 5: Cluster Cache State (CCS); for a cluster, all the data items included in all caches for all cluster slaves along with the path to fetch the corresponding item as well as its TTL are stored at CM in an index called cluster cache index (C2I) .



Definition 6: Local Cache Miss (LCM); LCM takes place when the MT cannot find the requested data item in its local cache.



Definition 7: Zone Cache Miss (ZCM); ZCM takes place when the MT cannot find the requested data item in any cache of the nodes inside its zone (MT’s transmission range).



Definition 8: Cluster Cache Miss (CCM); CCM takes place when the MT cannot find the requested data item in any cache of the members of its cluster. As a result, the requested data should be fetched remotely (e.g., from other clusters or from data source).



Definition 9: Local Cache Hit (LCH); LCH takes place when a MT requires a data and finds it in its local cache.



Definition 10: Zone Cache Hit (ZCH); ZCH takes place when the MT finds the data item in a cache of a node inside its zone (MT transmission range) after a local cache miss takes place.



Definition 11: Cluster Cache Hit (CCH); CCH takes place when the MT finds the data item in the cluster members’ caches after a zone cache miss takes place.

Definition 12: Remote Cache Hit (RCH); RCH takes place when the MT finds the data item in the cache of a node in another cluster after a cluster cache miss takes place.

Definition 13: Global Cache Hit (GCH); GCH takes place when the MT gets the data item from the remote data source.
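Definitions 6 to 13 describe a lookup cascade in which each miss passes the request on to the next, wider scope. The following is a minimal sketch of that cascade, assuming each cache is modelled as a plain set of item IDs; the names `Hit` and `classify_request` are illustrative and do not come from ACCS itself.

```python
from enum import Enum
from typing import Iterable, Set


class Hit(Enum):
    LCH = "local cache hit"
    ZCH = "zone cache hit"
    CCH = "cluster cache hit"
    RCH = "remote cache hit"
    GCH = "global cache hit"


def classify_request(item: str,
                     local: Set[str],
                     zone: Iterable[Set[str]],
                     cluster: Iterable[Set[str]],
                     remote: Iterable[Set[str]]) -> Hit:
    """Return the first scope of the hierarchy that can serve `item`."""
    if item in local:
        return Hit.LCH
    # Local cache miss (LCM): try the caches inside the transmission range.
    if any(item in cache for cache in zone):
        return Hit.ZCH
    # Zone cache miss (ZCM): try the other members of the cluster.
    if any(item in cache for cache in cluster):
        return Hit.CCH
    # Cluster cache miss (CCM): try caches held in other clusters.
    if any(item in cache for cache in remote):
        return Hit.RCH
    # No cache holds the item, so the data source serves the request.
    return Hit.GCH
```

For instance, an item that is absent locally and in the zone but present at some other cluster member is classified as a CCH.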




4.2. Network Installation: Setting up the Network Clusters


The proposed caching strategy divides the day (24 h) into a set of equal-sized time units (also called time periods or time stamps). The length of the time unit (expressed in seconds) is set by the system administrator. Each time unit has a unique ID that represents its position in the day. So, any time unit is expressed as ξ(i), ∀ i = 1, 2, 3, …, λ, in which λ represents the number of time units (time stamps) in the day; hence, λ equals the length of the day divided by the length of one time unit. All network CMs are synchronized, so that the time period ξ(i) starts at each CM at the same time. During the network installation, each CM tries to discover its slaves (i.e., the MTs under its control). Although the cluster slaves are initially assumed to be at most a pre-defined number of hops (µ hops) away from the cluster master, during the network identification, which starts at the beginning of each time stamp, the cluster may be expanded to include farther nodes.
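As a small worked illustration of the time-unit arithmetic, the sketch below computes λ and the ID of the time unit that contains a given instant. It assumes the unit length is given in seconds and divides the day evenly; the function names and the `seconds_since_midnight` clock are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def units_per_day(unit_seconds: int) -> int:
    """Number of equal-sized time units (lambda) in one day."""
    # Assumption: equal-sized units require the length to divide the day.
    if unit_seconds <= 0 or SECONDS_PER_DAY % unit_seconds != 0:
        raise ValueError("unit length must evenly divide the day")
    return SECONDS_PER_DAY // unit_seconds


def current_unit_id(seconds_since_midnight: int, unit_seconds: int) -> int:
    """1-based ID i of the time unit that contains the given instant."""
    units_per_day(unit_seconds)  # validates the unit length
    if not 0 <= seconds_since_midnight < SECONDS_PER_DAY:
        raise ValueError("instant must fall within the day")
    return seconds_since_midnight // unit_seconds + 1
```

With a 10-minute unit (600 s), the day holds 144 time units, and an instant 600 s after midnight falls in the unit with ID 2.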


In this section, a simple but effective clustering strategy, called Low Overhead Clustering (LOC), is introduced; it is proposed mainly for C-MANETs. At the beginning of each time period, each CM starts a topology discovery procedure to discover its cluster slaves, the path to each slave for intra-cluster communication, and the gateway nodes for inter-cluster communication. As illustrated in algorithm 1, LOC quickly divides the network into non-overlapping clusters through two sequential phases: (i) Initial Cluster Formulation (ICF), illustrated in figure 2 (A), and (ii) Extended Cluster Formulation (ECF), illustrated in figure 2 (B). Setting up the network clusters should be accomplished within a pre-defined installation time, which is called the Network Installation Time and denoted as τ=π1+π2, where π1 is the time needed for ICF and π2 is the time needed for ECF.

Figure 2, setting up the network clusters: (A) ICF, (B) ECF. The legend marks the masters and slaves of clusters X, Y, and Z, the cluster boundaries, the node zone, and the unclassified slaves.

Definition 14: Network Installation Time (τ): it is the time required by CMs to set up the network clusters, so that the network becomes ready to facilitate data exchange among the network nodes. It is the sum of two periods: the initial cluster formulation time (denoted as π1) and the extended cluster formulation time (denoted as π2). Hence, τ=(π1+π2), which is much shorter than the length of the time unit. For illustration, τ is 0.1 of the time-unit length.


During ICF, as depicted in figure 2 (A), the task is to define the initial boundaries of each cluster. As the network is of the C-MANET type, it has a set of predefined CMs. Each CM initiates a procedure to define the initial boundaries of its cluster, which are assumed to be µ hops away from the cluster master. This is accomplished by broadcasting a cluster discovery message (CDM), which propagates while registering the sequence of nodes it visits. CDM is initialized at CM, which is assumed to be in layer (level) “0”. Accordingly, as CDM moves from one node to the next, its level number is incremented, and the message is not allowed to exceed the permitted number of layers (i.e., µ). Two types of CDM are considered. The first discovers the cluster layers and moves “Forward”, while the second informs CM of the shortest path to the MT as well as the MT's cache state and moves “Backward”. When MTx receives a forward CDM, it picks the sequence registered in the body of CDM, adds its ID to the sequence, increments CDM's level number, then performs two basic tasks. The first is to broadcast the forward CDM after updating the sequence as well as the level number. The second is to clone the forward CDM, copy its cache state to the clone, then let the clone move backward toward CM. The backward CDM (i.e., the clone) follows the pre-registered path to CM. Upon arrival, CM stores the path to MTx in its routing table for intra-cluster communication, together with the registered cache state. As the network has several CMs, which discover their clusters at the beginning of each time stamp, CDM should carry the Cluster Master ID (CMI) as well as the current time stamp.


The format of CDM is shown in figure 3 (A). CDM consists of six fields: (i) current level (layer), (ii) visited sequence (starting from CM), (iii) current time stamp, (iv) cluster master ID (CMI), (v) the local cache state, and (vi) CDM direction, which may be forward (F) or backward (B). The local cache state field of CDM is used only when CDM moves backward (i.e., by the CDM clone), so that each MT can send its cache state to CM. The local cache state of the MT describes its cache contents (e.g., the IDs of the data items currently stored in the cache) as well as the corresponding TTLs. CMs are synchronized, so that they discover their cluster members at predefined checkpoints (i.e., at the beginning of each time stamp). Also, all network MTs are aware of the beginning of the time periods, so that they can discover their clusters when they are not explicitly assigned to a cluster. After the period π1 has expired, each CM has already discovered its initial cluster area, which extends µ layers away from it. However, there may be some isolated MTs, which are more than µ hops away from the nearest CM. This issue is handled during the period π2 of the network installation time, in which the task is to discover those isolated MTs, then register them to the most appropriate cluster. To this end, if an MT does not receive a CDM before the period π1 expires, it knows that it is outside the authority of any CM. To join the most appropriate cluster, it repeatedly performs a handshaking procedure during the period π2, then uses the Most Nearest Neighbor (MNN) procedure to detect its cluster, as illustrated in figure 2 (B) and algorithm 1. According to MNN, the un-joined MT asks its neighbors for the clusters they belong to. Then, the un-joined MT registers itself at the cluster to which most of its neighbors belong.
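The six CDM fields listed above map naturally onto a small record type. The sketch below is one possible in-memory representation, not a wire format defined by ACCS; the field names, the use of a tuple for the visited sequence, and the `item ID -> TTL` dictionary for the cache state are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class CDM:
    level: int                   # (i) current level (layer)
    sequence: Tuple[str, ...]    # (ii) visited sequence, starting from CM
    time_stamp: int              # (iii) current time stamp ID
    cmi: str                     # (iv) cluster master ID
    cache_state: Dict[str, int] = field(default_factory=dict)  # (v) item ID -> TTL
    direction: str = "F"         # (vi) "F" (forward) or "B" (backward)


def initial_cdm(cmi: str, time_stamp: int) -> CDM:
    """CDM as issued by a cluster master, which sits in level 0."""
    return CDM(level=0, sequence=(cmi,), time_stamp=time_stamp, cmi=cmi)


def backward_clone(cdm: CDM, cache_state: Dict[str, int]) -> CDM:
    """Clone a forward CDM and load the clone with an MT's cache state."""
    return replace(cdm, direction="B", cache_state=dict(cache_state))
```

Keeping the record immutable means a clone can be sent backward without affecting the forward copy that continues to be broadcast.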


As illustrated in algorithm 1 and flowchart 1, network CMs discover their cluster members at the beginning of each time stamp by broadcasting CDM. If MTx receives CDMi (i.e., the cluster discovery message of the ith time stamp) during the initial cluster formulation time period (π1), it first checks whether that CDM is valid by inspecting its time stamp field. If it is not valid (e.g., it relates to an older time stamp), the CDM is discarded. Otherwise, the valid CDM is checked for direction. Backward CDMs are allowed to propagate hop by hop to CM, carrying the path as well as the local cache state of a specific MT. Forward CDMs are treated differently. When MTx receives a valid forward CDMi, it picks the sequence carried by that CDM and adds its ID to the sequence. Then, if MTx has not received another CDMi before, it picks the level of CDM (denoted as LCDM) and records it as its own level (i.e., the MT's level, denoted as LMT, becomes equal to LCDM). At this point, CDMi carries the path from CM to MTx, and MTx stores that path as well as its level in its routing table. Afterward, CDMi is cloned with backward direction. The clone is then loaded with the cache state of MTx (i.e., the IDs of the items stored in MTx's cache along with the corresponding TTLs) and is allowed to propagate, carrying the local cache state and the path, to CM. Then, MTx increments LCDM of the forward CDM and broadcasts it if the length of the sequence carried by CDMi (denoted as LenCDM) is less than a threshold value (denoted as Lenmax).


On the other hand, if MTx has received another CDMi before, then MTx already owns a level as well as a valid path to CM. MTx compares its registered level (from the previously received CDMi) with the level of the new CDMi. MTx adopts the smaller level as well as the shorter path to CM, while the longer path is also kept in the MT's routing table as an alternative path. The path carried by the new CDMi is also sent to CM by cloning the forward CDMi and letting the clone (i.e., the backward CDMi) propagate to CM, carrying the path and the local cache state. The level of the forward CDMi is then updated to [min(LCDM, LMT)+1], and it is broadcast if LenCDM < Lenmax. When the level of CDMi reaches the predefined number of levels (layers), denoted as µ, CDMi becomes obsolete, as its function is to discover only µ layers around CM. Hence, an MT that receives such an invalid CDMi discards it. This scenario continues during the initial cluster formulation time period (π1). An illustrative example showing the contents of the CDM corresponding to the ith time stamp (CDMi) for the jth cluster master (CMj) is shown in figure 3 (B), assuming that the nodes C, E, D, R, and M did not receive an instance of CDMi before.
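The handling of a forward CDM at an MT, for both the first and later receipts, can be sketched as follows. Several points are assumptions of this sketch rather than statements of the text: the carried level is read as the layer of the receiving node (so CM emits level 1), forwarding stops once the next layer would exceed µ, the sequence length is counted in hops, and a CDM that already lists the receiving node is ignored. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CDM:
    level: int
    sequence: Tuple[str, ...]
    time_stamp: int
    cmi: str
    direction: str = "F"
    cache_state: Dict[str, int] = field(default_factory=dict)


@dataclass
class MTState:
    node_id: str
    cache_state: Dict[str, int] = field(default_factory=dict)
    level: Optional[int] = None                      # LMT
    shortest: Optional[Tuple[str, ...]] = None       # shortest path from CM
    alternatives: List[Tuple[str, ...]] = field(default_factory=list)


def handle_forward(mt: MTState, cdm: CDM, now: int, mu: int,
                   len_max: int) -> Tuple[Optional[CDM], Optional[CDM]]:
    """Return (backward clone, forward copy); both are None on discard."""
    # Discard stale, non-forward, or out-of-range messages.
    if cdm.time_stamp != now or cdm.direction != "F" or cdm.level > mu:
        return None, None
    # Assumption: ignore a CDM that already visited this node.
    if mt.node_id in cdm.sequence:
        return None, None
    path = cdm.sequence + (mt.node_id,)
    if mt.level is None:
        # First CDMi seen in this time stamp: adopt its level and path.
        mt.level, mt.shortest = cdm.level, path
    elif len(path) < len(mt.shortest):
        # New path is shorter: keep the old one as an alternative.
        mt.alternatives.append(mt.shortest)
        mt.level, mt.shortest = min(mt.level, cdm.level), path
    else:
        mt.level = min(mt.level, cdm.level)
        mt.alternatives.append(path)
    # The clone carries the path and the local cache state back to CM.
    clone = replace(cdm, sequence=path, direction="B",
                    cache_state=dict(mt.cache_state))
    next_level = mt.level + 1          # equals min(LCDM, LMT) + 1
    hops = len(path) - 1               # sequence length counted in hops
    forward = None
    if hops < len_max and next_level <= mu:
        forward = replace(cdm, sequence=path, level=next_level)
    return clone, forward
```

A node that hears CMj directly and later receives the same CDMi through a neighbor keeps level 1, records the longer path as an alternative, and still reports that path to CM through a second clone.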

Figure 3, Cluster Discovery Message (CDM) Format

After ICF has finished, µ layers of MTs are defined around each CM. However, some MTs may be unreachable. This situation may happen if an MT is far away from the pre-defined CMs. Accordingly, it cannot be registered by any CM, as it is more than µ hops away from the nearest CM. To overcome this problem, ECF begins immediately after ICF finishes. The task during ECF, which takes place during a pre-defined interval π2, is to register the unreachable MTs, which are outside the authority of any CM, to the most suitable cluster. Hence, each un-joined (isolated) MT is registered to the nearest CM. To this end, each of those isolated MTs tries to discover the other nodes in its zone (transmission range) through handshaking. Then, based on the belonging of its neighbors, it finds the cluster it belongs to using the Most Nearest Neighbor (MNN) procedure. If the unregistered MT finds that one of its neighbors has no belonging, it waits for a while, then asks that neighbor again about its belonging. After it has collected the belongings of all its neighbors (or of most of them, if the neighbors that have not replied cannot affect the decision), the unregistered MT identifies its cluster, which is the one to which most of its neighbors belong. At the end of ECF, nodes that can 'hear' two or more CMs become gateways and send an “I am a gateway” message to all CMs they recognize, while the remaining MTs are considered ordinary nodes. Details about LOC are illustrated in algorithm 1 as well as flowchart 1.
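The MNN decision described above is essentially a majority vote over the neighbors' answers, with the option of deciding early when the missing answers cannot change the result. A minimal sketch follows; the function name and the convention that `None` marks a neighbor with no belonging yet are assumptions, and ties are left undecided because the text does not say how to break them.

```python
from collections import Counter
from typing import Dict, Optional


def mnn_cluster(neighbor_clusters: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the majority cluster, or None while the vote is still open.

    `neighbor_clusters` maps each neighbor ID to the cluster it reported,
    or to None if that neighbor has no belonging yet.
    """
    votes = Counter(c for c in neighbor_clusters.values() if c is not None)
    if not votes:
        return None
    pending = sum(1 for c in neighbor_clusters.values() if c is None)
    ranked = votes.most_common(2)
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    lead = ranked[0][1] - runner_up
    # Decide only if the un-replied neighbors cannot overturn or tie the vote.
    if lead <= pending:
        return None
    return ranked[0][0]
```

When the function returns `None`, the un-joined MT would wait and ask the undecided neighbors again, as the text describes.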

Algorithm 1: Low Overhead Clustering (LOC)

Input:
o Ad hoc network with n nodes and m cluster masters.
o Cluster size = µ (number of hops that describes the initial cluster diameter).
o Lenmax: the maximum length of the path sequence carried by the cluster discovery message.
o π1: the initial cluster formulation time.
o π2: the extended cluster formulation time.
o ξi: the ith (current) time stamp.

Trigger:
o At the start of each time stamp.

Output:
o Dividing the network into m clusters, in which each cluster consists of one cluster master and a set of slaves.

Steps:
For each cluster master cm ∈ Network do
    cm sends the Cluster Discovery Message (CDMi) to all MTs in its transmission range.
    For each MTx ∈ cluster (a neighborhood of µ hops away from cm) do
        If π1 ≤ time < π1+π2 Then
            MTx discovers the belonging of its neighbors through handshaking
            If all neighbors belong to the same cluster
                Ask neighbors for the path to their unified CM
                MTx sends a registration message to CM
            Else
                Use MNN to discover its belonging
                MTx sends a registration message to the CM it belongs to
                MTx sends an “I am a Gateway” message to all CMs it identifies.
            End if
        Else
            When MTx receives CDMx do
                If Time_Stamp(CDMx) is not valid (e.g., x

Flowchart 1, Topology discovery procedure, where MT: mobile terminal; CDMi: cluster discovery message at the ith time stamp; CM: cluster master; B: backward direction; F: forward direction; LMT: the current level of MT; LCDM: the level carried by a newly arrived CDMi; π1: initial cluster formulation time; Lenseq: length of the sequence carried by CDMi; Lenmax: maximum allowed sequence length; µ: number of allowed levels.

Unlike existing active clustering methods, the proposed LOC relieves the network from the unnecessary burden of broadcasting control messages, especially for networks with a relatively static topology. This is achieved by adapting the cluster discovery period to the MTs' mobility pattern as well as the underlying application. LOC also has the advantage of completing the clustering procedure quickly, while preserving the battery power of cluster slaves, as no cluster master election procedure is needed. Moreover, at the end of the cluster formulation time, each CM owns the cache status of all members of its cluster as well as basic routing information (shortest and alternative paths from CM to all cluster members). As will be seen later, those paths pass through a quick mining process to discover the direct path between each pair of MTs in the cluster.

4.2.1. Gateway Nodes




As illustrated above, nodes that can 'hear' two or more CMs become gateways. Such a node sends an “I am a Gateway” message to all cluster masters it recognizes. Hence, gateways can be used to exchange cached items among adjacent clusters, so they have a vital role in inter-cluster communication. However, each gateway still belongs to one cluster. Certainly, from the data availability point of view, it would be better to make the gateways belong to all clusters they recognize during ICF, since the gateway's cache space would then be available to all members of those clusters. However, this would drain the battery power of the gateway rapidly, as it would communicate with a large number of nodes (the members of two or more clusters). Due to their vital role, it is better to save the power of gateways for inter-cluster communication, while they exchange data with the members of their own clusters like ordinary cluster slaves. This is why the principle of overlapped clusters is not employed.

Definition 15: Gateway Node is a special node that can observe several cluster masters. Accordingly, it can be used as a bridge to transmit data among different clusters.
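The gateway rule reduces to counting how many cluster masters each MT can hear. The sketch below assumes that this information is available as a mapping from MT ID to the set of CM IDs it heard; the mapping and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def classify_roles(heard: Dict[str, Set[str]]) -> Dict[str, str]:
    """An MT that hears two or more CMs is a gateway, otherwise ordinary."""
    return {mt: ("gateway" if len(cms) >= 2 else "ordinary")
            for mt, cms in heard.items()}


def gateway_notifications(heard: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """(gateway, CM) pairs, one per "I am a Gateway" message to be sent."""
    return sorted((mt, cm)
                  for mt, cms in heard.items() if len(cms) >= 2
                  for cm in cms)
```

Each gateway still registers with a single cluster; the notifications only tell the other masters that a bridge exists.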

4.2.2. Leave Alone Problem (LAP)


LAP takes place when all the discovered slaves of a cluster master, say CMx, also belong to other clusters. In this situation, CMx is demoted during the current time stamp to operate as a normal gateway. However, such a demoted cluster master tries to discover new slaves for its cluster at the start of the next time stamp in order to promote itself again. An illustration is given in figure 4, in which the cluster master 'A' demotes itself to become a gateway between clusters X and Y during the current timestamp.
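The demotion test for the leave alone problem can be sketched as a check over the slaves a master has discovered. The sketch assumes a mapping from each slave to the master it finally registered with, and it keeps a master with no discovered slaves in its role, a case the text does not cover; all names are hypothetical.

```python
from typing import Dict, Set


def role_for_time_stamp(cm_id: str, discovered: Set[str],
                        registered_with: Dict[str, str]) -> str:
    """Role of a cluster master for the current time stamp."""
    # Slaves with no recorded registration are treated as still ours.
    elsewhere = [mt for mt in discovered
                 if registered_with.get(mt, cm_id) != cm_id]
    # Assumption: a master that discovered no slaves at all is not demoted.
    if discovered and len(elsewhere) == len(discovered):
        return "gateway"
    return "cluster master"
```

The check is repeated at the start of each time stamp, so a demoted master regains its role as soon as at least one discovered slave registers with it.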

Figure 4, Leave alone problem

Figure 5, weak mobility (node x before and after mobility)

Figure 6, Strong mobility: (A) node x is still in its cluster, (B) node x moves outside its cluster

4.2.3. Host Mobility Considerations


There are two types of node mobility: weak mobility and strong mobility. In weak mobility, the moving MT still keeps at least one of the neighbors that were discovered at the beginning of the current timestamp. Hence, there is still a valid path to its cluster master. An illustration is given in figure 5, in which mobile host “x” moves while keeping one of its old neighbors (node y). On the other hand, strong mobility happens if the MT moves so that it loses all the neighbors detected during the last topology discovery process, as illustrated in figure 6 (A and B). This may happen if the MT moves (in the same direction) a distance greater than the diameter of its transmission range. Such movement may also cause the MT to leave its cluster entirely for a neighboring one. To work around this problem, when an MT discovers (through handshaking) that it has lost all the neighbors detected during the last topology discovery process, it uses the Most Nearest Neighbor (MNN) procedure to discover its current cluster, asks for the path to the CM of the cluster it currently belongs to, then sends its local cache state. If the MT can hear from several clusters after it has moved (either in a weak or strong manner), it immediately registers itself as a gateway at the identified CMs for the remaining time of the current timestamp.

4.2.4. Estimating the proper time stamp


Several issues can be considered to determine the suitable length of the time stamp, such as the mobility rate of the network MTs and the underlying application. Generally, for a MANET whose MTs move with a high mobility rate, the time stamp should be short, as the network topology changes rapidly. A basic consideration of ACCS is to reduce the volume of control messages circulated within the network. Minimizing broadcast messages saves not only the network bandwidth but also the MTs' battery power. The idea behind minimizing the amount of broadcast messages is based on the realistic observation that MANETs are seldom highly mobile. This is usually the case in MANETs connecting MTs in conferences, convention centers, or electronic classrooms. In other cases, a MANET's MTs usually move together, so they are considered semi-static; this is usually the case in MANETs used to connect a group of soldiers in military applications [1-4].


Recent active clustering techniques involve broadcasting periodic 'hello' messages to sense potential topological differences between two successive timestamps [16-21]. However, in relatively static MANETs, effective topology changes seldom occur. Hence, network bandwidth and power resources are consumed merely to verify that the existing topology is still valid. ACCS handles MT mobility (weak and strong) well, as the moved MT can register itself to the new cluster as mentioned before. Accordingly, ACCS can work with long timestamps with little performance degradation. However, an excessive number of strong mobility events may cause a complete change of the network topology. ACCS corrects this drawback by dynamically tuning the 'Hello' broadcast stamp (BS). In other words, the BS duration depends on the recent strong mobility pattern of the network MTs. For high strong mobility of MTs, BS is shortened to guarantee accurate topology information. However, with low mobility rates, BS is lengthened, relieving the network from unneeded control message flooding.

4.2.5. Stateful and Stateless Behaviors

The behavior of the network CMs guides the process of transferring data items among network MTs. Two possible behaviors are considered: stateful and stateless. In the former, CM maintains a complete map of its current cluster topology. In other words, it has a set of valid paths between each pair of MTs within its cluster. This is accomplished by discovering the cluster topology through a data mining process. Such a process is guided by the paths collected during the cluster discovery task, which describe the paths between CM and all cluster members. Although path mining may be a tedious task that consumes the computational power of CMs, it speeds up the data transmission among the cluster slaves, as data is transmitted through the minimum number of hops. Accordingly, each CM has a complete map of its cluster's topology. Moreover, it maintains a panoramic view of the complete network topology by discovering paths to its adjacent CMs through the gateways identified during the cluster discovery process. Hence, CM can simply identify the shortest path between any pair of nodes.

On the other hand, the stateless behavior allows the CM to have a valid path to each slave in its cluster as well as valid paths to adjacent CMs through the gateway nodes. CM in this case is not responsible for maintaining and discovering all valid paths among the MTs of its cluster. This behavior minimizes the consumed computational power and storage. However, it takes more time to transmit a data item between two MTs. Stateless behavior gives no consideration to the path length; to transfer a data item from MTx to MTy, the item is fetched through the path MTx→CM→MTy, as no direct path between MTx and MTy is available, unlike in the stateful behavior. The state of a CM is dynamic; accordingly, it can change its state from stateless to stateful during each time stamp.

Definition 16: Stateful Behavior: it is a behavior of the cluster master in which it maintains the complete topology of its cluster; in other words, it stores all valid paths between each pair of its cluster slaves. CM also has the ability to connect to the neighboring cluster masters via the gateway nodes.



Definition 17: Stateless Behavior: it is a behavior of the cluster master in which it maintains only the valid direct paths to each of its cluster slaves. CM also has the ability to connect to the neighboring cluster masters via the gateway nodes.
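The practical difference between the two behaviors shows up when a route between two slaves is requested. The sketch below assumes that a stateless CM holds one stored path per slave (each starting at the CM), while a stateful CM additionally holds a table of mined slave-to-slave paths; the `route` function and both table layouts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Path = List[str]


def route(src: str, dst: str, cm_paths: Dict[str, Path],
          mined: Optional[Dict[Tuple[str, str], Path]] = None) -> Path:
    """Pick the route for a transfer from `src` to `dst` inside one cluster.

    `cm_paths[x]` is the stored path from CM to x, starting with the CM.
    `mined` is the table of direct paths, present only in stateful mode.
    """
    if mined is not None and (src, dst) in mined:
        return mined[(src, dst)]            # stateful: direct mined path
    up = list(reversed(cm_paths[src]))      # src ... CM
    down = cm_paths[dst]                    # CM ... dst
    return up + down[1:]                    # stateless: relay through CM
```

With `cm_paths = {"A": ["CM", "A"], "E": ["CM", "B", "E"]}`, the stateless route from A to E passes through CM and takes three hops, while a mined path A, B, E would take two.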


To achieve the stateful behavior, a mining process should be performed on the collected paths just after the cluster formulation. However, completing the mining process takes extra time, which would delay the network operation. To overcome this problem, after the network clusters are set up, the network initially operates in a stateless manner until the path mining process is completed. Then, the system changes its behavior to stateful. Hence, it can use the shorter mined paths to allow fast data transfer among the network MTs. However, this depends on the power (battery level) of CM itself. If CM has unlimited or sufficient power, which is usually the case in C-MANETs, it performs the path mining process to discover all possible paths among its slaves (the other nodes inside its cluster), then changes its state to stateful. However, as the stateful state requires CM to consume additional processing power, if CM has limited power, its state remains stateless, which causes the data to be transmitted through longer paths.

4.2.6. Mining Paths from CM to Cluster Members


ACCS is a complete data exchange strategy for MANETs, as it considers the basic issues required to exchange data, namely how to cache and how to route data among the network MTs. To the best of our knowledge, no complete strategy that considers both issues has been introduced yet. In all previous efforts in MANET management, caching and routing were handled separately. Although on-demand routing protocols (such as AODV) are preferred in MANETs, extra time is needed to set up the connection before data can be transmitted between two nodes, which increases the query delay. A novel idea is to build a complete integrated strategy that considers both issues (i.e., caching and routing), so that they can work together efficiently. ACCS has a built-in table-driven routing technique that transfers data through the shortest paths, which makes it suitable for time-sensitive and real-time applications. The novelty here is to fill the routing tables of the network CMs with the valid and shortest routes to all cluster members during the cluster formulation, with no extra time delay. As ACCS reformulates the network clusters periodically, routing tables are also updated periodically, with no extra time, to keep track of the dynamic topology of the MANET. This behavior also maximizes the network lifetime in the case of power limitations, as the minimum resources are used to transfer data. After cluster formulation, each CM owns a complete map showing the shortest as well as the alternative paths to all members of its cluster; hence, the network operation starts in a stateless fashion. However, to improve the system's performance, it is better to change the behavior to stateful. Stateful behavior requires CM to discover the direct path between each pair of cluster members. This consumes a part of CM's computational power to mine the paths collected during the cluster formulation. However, the stateful mode has a good impact on the system performance, as it minimizes the time penalty for setting up a connection between MTs and accordingly speeds up the data movement across the network.

The aim of route (path) mining is to fill CM's routing table with the valid paths (shortest as well as alternatives) among the cluster members. The mining process starts just after the cluster formulation. The methodology followed is illustrated in Algorithm 2. Although the most common way to represent the paths among the network MTs is the directed acyclic graph (DAG), we have simplified the structure into a set of trees. The rationale for this representation is to simplify the search for the shortest and alternative paths among the cluster MTs using tree search strategies.




The basic idea is to use the paths collected among MTs during the cluster formulation to build a set of search trees. Each tree represents the available paths starting from a specific MT as the tree root. So, if the cluster has n MTs, n trees should be built. After building such trees, a simple search process can be followed to discover the shortest path as well as all alternative paths between the MT under scope (i.e., the tree root) and all cluster MTs.

Definition 18: Mobile Terminal Gate (MTG) set: the mobile terminal gate set for MTx is the set of MTs that appear directly after or directly before MTx in all paths collected by CM during the cluster formulation, and can be expressed as MTG(MTx)={MTi | MTi ∈ paths that contain MTx, while MTi is located directly after or directly before MTx}. MTG(MTx) can be used to detect Zone(MTx) by removing duplicated nodes. Then, Zone(MTx)=Remove_Duplication(MTG(MTx)).
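Definition 18 can be turned into a few lines of code that scan the collected paths for the nodes adjacent to a given MT. The sketch below assumes each collected path is a sequence of node IDs; the function names `mtg` and `zone` mirror the definition, and the four sample paths are a subset chosen only for illustration.

```python
from typing import List, Sequence


def mtg(node: str, paths: Sequence[Sequence[str]]) -> List[str]:
    """Nodes located directly before or after `node` in any path (with repeats)."""
    gate: List[str] = []
    for path in paths:
        for i, hop in enumerate(path):
            if hop != node:
                continue
            if i > 0:
                gate.append(path[i - 1])
            if i + 1 < len(path):
                gate.append(path[i + 1])
    return gate


def zone(node: str, paths: Sequence[Sequence[str]]) -> List[str]:
    """Zone(node) = MTG(node) with duplicates removed, first-seen order kept."""
    return list(dict.fromkeys(mtg(node, paths)))


sample_paths = [("CM", "A"), ("CM", "A", "D"), ("CM", "A", "B"), ("CM", "B", "A")]
print(zone("A", sample_paths))
```

For the sample paths, the zone of A contains CM, D, and B, each listed once even though CM and B occur several times in MTG(A).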

Route Learning Procedure

Input:
o Ad hoc network with n nodes and m cluster masters.
o A complete map of the paths (shortest and alternative) between the cluster master and the cluster members for all network clusters. MAP={MAP1, MAP2, ..., MAPm}, where MAPi is a set containing the paths of the ith cluster.

Trigger:
o Immediately after the extended cluster formulation phase finishes.

Output:
o The available paths (shortest path as well as alternative paths) between each pair of nodes in each cluster.

Steps:
For each cluster master CMi ∈ Network do
    CMi discovers all MTs ∈ ith cluster and stores them in the set MTCLU={MTi1, MTi2, ……}.
    For each MTij ∈ MTCLU do
        Tree(MTij) = Construct_Tree(MTij)
        Mine Tree(MTij) to detect all available paths from MTij to other cluster members.
        Store the detected paths corresponding to MTij in the routing table of CMi
    Next
Next

Procedure Construct_Tree(Node MTx)
    Root = MTx
    T = 0    // T is the current tree layer
    While (Uncirculated branches)
        For each MTy ∈ layer T
            Z = Discover_Zone(MTy)
            Add all nodes ∈ Z in the level (T+1)
        Next
        Prune Circulated branches from level (T+1)
        Terminate Haste branches from level (T+1)
        T++
    End While
End Procedure

Procedure Discover_Zone(Node MTy)
    Zone = Remove_Duplication(MTG(MTy))
    Return(Zone)
End Procedure

Algorithm Parameters:
MTG: Mobile Terminal Gate.
T: Current tree level.
MTCLU: Cluster Members.
Zone: The set of mobile terminals in the transmission range.

Algorithm 2: Route learning procedure

The procedure used for constructing the tree corresponding to a specific mobile terminal, say MTx, starts by letting MTx be the tree root (which represents level 0 of the tree). The zone of MTx is then identified by inspecting the available paths, finding MTG(MTx), then removing duplicated nodes. The nodes located in the zone of MTx are put in a new level under MTx (i.e., level 1). The procedure is repeated by finding the zone of every node (MT) located at level 1. Then, a new level is added to the tree, which creates new tree branches. However, the tree should be pruned so that circulated branches are terminated, which keeps the tree growth useful. A circulated tree branch, also called a loopback branch, is one that contains a repeated node, as illustrated in figure 7.

Moreover, to avoid expanding the tree unnecessarily, a branch is not allowed to grow further if the node at its head has appeared in upper layers of other branches; such branches are called “Haste branches” (note that if the node at the head of the branch has appeared in one of the upper layers of the same branch, this results in a loopback branch, and the branch is pruned by excluding the node at its head, as discussed before). Figure 7 gives an illustration. The rationale for stopping the growth of Haste branches is that shorter paths to the nodes of such a branch can be obtained from other branches of the tree, so there is no need to expand it. This scenario continues as long as a new level can be added to the tree (i.e., there are still uncirculated branches). After constructing the tree of each cluster node, a simple tree traversal procedure can be used to identify the available paths from the root (i.e., MTx) to the other cluster members.
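The tree construction with its two pruning rules can be sketched by storing each branch as the path from the root to the branch head. The sketch assumes the zones are already known as an adjacency mapping; it is one reading of the procedure, and the function name and the four-node topology used to exercise it are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]


def construct_tree_paths(root: str,
                         zones: Dict[str, Sequence[str]]) -> Dict[str, List[Path]]:
    """Grow the tree level by level; return the kept branches per node.

    For every node, the first listed path is the shortest and the rest
    are alternatives.
    """
    found: Dict[str, List[Path]] = {}
    seen_above = {root}               # nodes placed in levels 0..T
    frontier: List[Path] = [(root,)]  # growable branches whose head is at level T
    while frontier:
        next_frontier: List[Path] = []
        new_nodes = set()
        for branch in frontier:
            for nxt in zones.get(branch[-1], ()):
                if nxt in branch:
                    continue          # loopback branch: pruned
                grown = branch + (nxt,)
                found.setdefault(nxt, []).append(grown)
                new_nodes.add(nxt)
                if nxt not in seen_above:
                    next_frontier.append(grown)
                # otherwise: haste branch, kept as a path but not extended
        seen_above |= new_nodes
        frontier = next_frontier
    for paths in found.values():
        paths.sort(key=len)
    return found


zones = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
tree_paths = construct_tree_paths("A", zones)
```

In this small topology, the branch A, B, C is a haste branch (C already sits at level 1), so it is recorded as an alternative path to C and not grown further, while A, B, D remains the only path to D.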

[Figure 7 legend: tree levels 0 (root) to 4; a branch that can be extended; a branch that is not allowed to extend because the node at its head (e.g., node D) has appeared at level 3 of another branch; circulated (loopback) branches, which should be pruned.]

Figure 7, setting up the tree of mobile terminal "A"

Illustrative Example

In this subsection, a simple illustrative example shows how the CM mines the paths to its cluster members. The following assumptions are considered throughout the example.

The example is applied to a part of C-MANET that consists of 8 MTs grouped into one cluster, as well as one cluster master. The cluster slaves (ordinary nodes) are expressed by the set MTCLU={A, B, C, D, E, F, G, H}. Zone(A)={CM,B,D}, Zone(B)={CM,A,E}, Zone(C)={CM,G,H}, Zone(D)={A}, Zone(E)={B,F}, Zone(F)={E,G}, Zone(G)={C,F}, Zone(H)={C}. The maximum sequence length is Lenmax=4. The number of cluster layers is µ=2.

[Figure 8 content: the cluster topology (cluster master CM, mobile terminals A to H, the transmission range of CM, and cluster layers 1 and 2); Tree(A) drawn from level 0 (root) to level 4, with pruned branches marked "X" and the branches ending at G and F marked "stop"; and the two listings below.]

Collected paths during cluster formulation: CMA, CMAD, CMAB, CMABE, CMABEF, CMB, CMBE, CMBEF, CMBEFG, CMBA, CMBAD, CMC, CMCH, CMCG, CMCGF, CMCGFE

Mining paths between A and other cluster members:
Destination: CM B D C E F G H
Shortest path: Direct ACMC ABE ABEF ACMCG ACMCGH
Alternative paths: ABCM ACMB ---- ACMCGF ABEFG -----

Figure 8, constructing the tree of node A and mining paths for the illustrative example
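The tree construction and pruning rules described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the author's implementation: the function names `build_tree_paths` and `mine_paths` are hypothetical, and Zone(CM)={A,B,C} is an assumption inferred from the collected paths CMA, CMB, and CMC, since the example does not list it. A child that repeats a node of its own branch is dropped (loopback), and a child that already appeared at an upper level of the tree is kept as a path but not extended (Haste branch).

```python
def build_tree_paths(zones, root):
    """Return every branch kept in Tree(root), each as a list of node names."""
    first_level = {root: 0}   # level at which each node first appeared
    frontier = [[root]]       # branches that are still allowed to grow
    kept = []
    level = 0
    while frontier:
        level += 1
        next_frontier, new_nodes = [], set()
        for branch in frontier:
            for node in zones.get(branch[-1], []):
                if node in branch:
                    continue              # loopback branch: pruned
                grown = branch + [node]
                kept.append(grown)
                if node in first_level:
                    continue              # Haste branch: kept, not extended
                new_nodes.add(node)
                next_frontier.append(grown)
        # Nodes discovered at this level only count as "upper level" from now on.
        for node in new_nodes:
            first_level[node] = level
        frontier = next_frontier
    return kept


def mine_paths(zones, root):
    """Group the kept branches by destination, shortest path first."""
    table = {}
    for path in build_tree_paths(zones, root):
        table.setdefault(path[-1], []).append(path)
    for paths in table.values():
        paths.sort(key=len)
    return table


# Zones of the illustrative example; Zone(CM) is an assumption (see lead-in).
ZONES = {
    "CM": ["A", "B", "C"],
    "A": ["CM", "B", "D"], "B": ["CM", "A", "E"], "C": ["CM", "G", "H"],
    "D": ["A"], "E": ["B", "F"], "F": ["E", "G"], "G": ["C", "F"], "H": ["C"],
}
table = mine_paths(ZONES, "A")
for destination, paths in sorted(table.items()):
    print(destination, ["".join(p) for p in paths])
```

Under these assumptions, the first path listed for each destination plays the role of the shortest path and any remaining ones are alternative paths.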

The paths collected during cluster formulation are reorganized once per cluster member, so that a tree is built for each member. Hence, Tree(MTi) is the tree whose root is MTi and which shows all the available connections between MTi and all other members of the cluster. The resulting trees are mined in a breadth-first manner [27] to identify all the direct paths between the cluster members. Figure 8 illustrates how to construct Tree(A) and then identify the paths from node "A" to all other cluster members.

4.3. Mobile Node Caching and Prefetching Manager

Figure 9 depicts a typical MT caching and prefetching manager. Each MT employs a rule generating engine to derive prefetching and replacement rules from its own access log as well as the access logs of its neighbors. The derived rules are stored in the prefetching and replacement rule repositories of the MT, while a set of caching rules is stored in the caching rules repository. When a mobile terminal, denoted as MTR, requests an item IR and suffers from a local cache miss, MTR looks for the required data item at its zone members (other MTs within its transmission range, which are reachable in one hop). If there is no reply (i.e., a zone cache miss), MTR asks its cluster members for the required data item using a callback procedure, as follows. Initially, MTR asks its cluster master, denoted as CMR, for IR; CMR then searches its index, the Cluster Cache Index (C2I), for the item. If CMR finds the required item indexed, there are two possibilities according to its mode, statefull or stateless. In the statefull mode, CMR calls back MTR with the path to the mobile terminal that stores a copy of the item; the path here is the valid link between MTR and MTS, the terminal hosting IR. In the stateless mode, CMR has to fetch the required item first and then send it back to MTR. However, if IR is not indexed at CMR (i.e., a cluster cache miss), CMR starts a "Help Me" (HM) procedure, trying to fetch the item from neighboring clusters before asking the remote data source. More details about fetching a data item are given in the next section.

[Figure 9 labels: Request for data item; Request Processing Module; Local Request; Local Hit; Remote request in the case of local cache miss; Local Cache Repository; Cache Access Log; Caching Rules; Prefetching Rules; Replacement Rules; Prefetching Agent (PA) with its prefetching daemon; Replacement Agent (RA) with its replacement daemon; Trigger; Need Prefetch (Yes/No); Remote item to be cached; Suff. Space (Yes/No); Store item in local cache; Discard item; Trigger RA to choose victim; Perform Replacement; Stop.]

Figure 9, Mobile Terminal Caching and Prefetching Manager

During the HM procedure, CMR sends a "Please Help Me Message" (PHM2) to all its gateways. PHM2 has a unique ID, which consists of the CM's ID as well as the ID of IR. The gateway nodes then forward the request to the other CMs they observe. This procedure continues until a path to a valid copy of IR is identified (remote hit) or the query reaches the data center itself. The path to IR is returned to CMR, which sends the path to MTR (in a statefull or stateless manner).

4.4. Fetching a Data Item


In this section, a new cache discovery algorithm is introduced, which determines the path to a node holding the requested data, or to the data source, when an MT suffers from a local cache miss. The discovery algorithm is illustrated in flowchart 2 and is described as follows. Assume that MTR needs to access the data item Ix. Initially, MTR checks its own cache. If a valid copy of Ix is available in the local cache of MTR, the item can be used immediately. Otherwise, MTR checks with its neighbors, which are in its zone (transmission range), to see whether any of the neighboring MTs has a valid copy of the desired item Ix. If the item is not available at the zone level, MTR sends a lookup packet to its cluster master, CMR. Upon receiving the lookup message, CMR searches its cluster cache index (C2I) for Ix. If Ix is found, CMR replies with an acknowledge packet containing the ID of the MT that has cached the item (denoted as MTS). Then, based on the pre-defined behavior of CMR, one of two possible procedures is followed. If the behavior is "statefull", CMR sends MTR the direct path to MTS; after checking the availability of the path, MTR can retrieve Ix directly. On the other hand, if the behavior of CMR is "stateless", Ix is forwarded hop-by-hop along the routing path from MTS to MTR through CMR.


However, if a cluster cache miss takes place, CMR sends a Please Help Me Message (PHM2) to its gateways, hoping to find a path to a valid copy of Ix in the neighboring clusters. If any of the gateways finds that Ix is indexed at any of the neighboring clusters, Ix is fetched based on the behavior of the external cluster masters as well as CMR. If Ix is still not found, PHM2 is forwarded until the requested data item is found or is fetched from the data source. It is important to mention that, to avoid flooding in the case of a cluster cache miss, an item cancellation procedure is used, which is triggered when the missed item Ix has already been fetched from other clusters. The target of this procedure is to stop the discovery and return of further paths to the missed item.
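The lookup order described in this section (local cache, then zone, then cluster index, then neighboring clusters, then the data source) can be summarized by the following sketch. It is an illustration only: `locate_item` and its dictionary-based arguments are hypothetical names chosen for the example, and consistency checks, path validation, and message passing are deliberately left out.

```python
def locate_item(item, local_cache, zone_caches, cluster_index, neighbor_indexes):
    """Return (level, holder) for the first place where the item is found.

    local_cache      : set of items cached at MTR
    zone_caches      : {neighbor MT: set of items} for MTs in Zone(MTR)
    cluster_index    : {item: MT} view of the C2I held by CMR
    neighbor_indexes : {cluster master: {item: MT}} reached through PHM2
    """
    if item in local_cache:
        return ("local hit", None)
    for neighbor, cache in zone_caches.items():
        if item in cache:
            return ("zone hit", neighbor)
    if item in cluster_index:
        return ("cluster hit", cluster_index[item])
    for master, index in neighbor_indexes.items():
        if item in index:
            return ("remote hit", index[item])
    # No cache in the searched clusters holds the item: fall back to the source.
    return ("data source", None)


result = locate_item(
    "Ix",
    local_cache={"Ia"},
    zone_caches={"MT1": {"Ib"}, "MT2": {"Ic"}},
    cluster_index={"Id": "MT7"},
    neighbor_indexes={"CMi": {"Ix": "MT9"}},
)
print(result)
```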




Anti-Local Miss (ALM) Algorithm

Input:
  o Ad hoc network divided into a set of overlapped clusters, in which each cluster has a master node as well as a set of cluster slaves.
  o MTR: the requester mobile host, which suffers from a local cache miss.
  o MTS: the source mobile host, which exists within the transmission range of MTR, i.e., within Zone(MTR). It stores a valid copy of the missed item Ix in its cache.
  o Ix: a valid copy of the missed item, which exists inside Cache(MTS).

Trigger:
  o MTR suffers from a local cache miss.

Output:
  o path(MTR→MTS): the path from MTR to MTS (i.e., the path to the missed item).

Steps:
  1: Buffer = ∅
  2: For each MTj ∈ Zone(MTR) Do
  3:   MTR issues a handshaking procedure to ask MTj for the missed item (Ix).
  4:   If MTj has a valid copy of Ix Then
  5:     Get the Life_Time(Ix)
  6:     Add Life_Time(Ix) along with its argument to Buffer. // the argument is Ix's hosting node (i.e., MTj)
  7:   End if
  8: Next
  9: If Buffer = ∅ Then
  10:   Trigger a zone miss
  11: Else
  12:   MTS = argmin [Life_Time(Ix)] over all Life_Time(Ix) ∈ Buffer // choose the host which has the most fresh copy of Ix.
  13:   MTR fetches Ix from MTS
  14: End if

Algorithm 3: Anti-Local Miss (ALM)
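The steps of Algorithm 3 map onto a short function. The sketch below is hedged: `anti_local_miss` and the dictionary layout are illustrative assumptions, and Life_Time is treated simply as a number where smaller means fresher, following the argmin in step 12.

```python
def anti_local_miss(zone_caches, item):
    """Pick the zone neighbor with the smallest Life_Time for the item.

    zone_caches: {neighbor MT: {item: life_time}} for MTs in Zone(MTR).
    Returns the chosen MTS, or None to signal a zone miss.
    """
    # Steps 1-8: collect (life_time, host) pairs from neighbors holding the item.
    buffer = [
        (cache[item], neighbor)
        for neighbor, cache in zone_caches.items()
        if item in cache
    ]
    # Steps 9-10: an empty buffer means a zone miss.
    if not buffer:
        return None
    # Step 12: argmin over Life_Time (ties are broken by host name here).
    return min(buffer)[1]


zone = {"MT1": {"Ix": 12.0}, "MT2": {"Ix": 4.5}, "MT3": {"Iy": 1.0}}
print(anti_local_miss(zone, "Ix"))   # the host with the smallest Life_Time
print(anti_local_miss(zone, "Iz"))   # None, i.e. a zone miss
```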


[Flowchart 2 steps, in outline: MTR issues a request for item Ix and checks its local cache; on a local miss it checks the caches of the other MTs in its zone; on a zone miss the Cluster Cache Index (C2I) at CMR is checked; on a cluster cache hit, depending on the state of CMR, either CMR sends path(MTR→MTS) to MTR, which validates the path via handshaking and requests Ix from MTS (statefull), or CMR fetches Ix from MTS and sends it to MTR (stateless); on a cluster cache miss CMR sends PHM2 to its gateways GW(CMR), each gateway forwards PHM2 to the other CMs it observes, and PHM2 is forwarded to new gateways until Ix is found indexed at an observed CM or the data source is reached; the shortest direct path between MTR and MTS is then constructed through the considered gateways and CMs, and CMR starts the Item Cancellation Procedure (ICP). Consistency is checked at each hit, invalid paths trigger a search for another stored path or the reconstruction of a new one, and after MTR receives Ix the Store/Replace Procedure (SRP) stores it (with cache replacement if MTR lacks sufficient space), updates C2I, and initiates the KSC procedure.]

Abbreviations used in flowchart 2:
SRP: Store/Replace Procedure
C2I: Cluster Cache Index
PHM2: Please Help Me Message
ICP: Item Cancellation Procedure
MTR: Requester Mobile Terminal
MTS: Source Mobile Terminal (the node which has a valid copy of the requested item)
Ix: The requested data item
CMR: The cluster master of MTR
CMS: The cluster master of MTS

Flowchart 2, Fetching a data item

Several procedures are used to discover a valid path from the requester node MTR to the source node MTS, which holds the missed item Ix in its cache. Each procedure is triggered by a particular type of item miss. The procedures used to resolve the local, zone, and cluster cache misses are Anti-Local Miss (ALM), Anti-Zone Miss (AZM), and Anti-Cluster Miss (ACM), which are given in algorithms 3, 4, and 5 respectively, while the item cancellation procedure is given in algorithm 6. The different cases of fetching a missed item are depicted in figures 10 and 11.


It is important to mention that once a node receives the requested data, it triggers the Keep Sufficient Copies (KSC) procedure, which resides at the cluster masters. The aim of this procedure is to save cache space inside the cluster while maintaining data availability at a satisfactory level, as will be discussed in section 5.9. This is accomplished by continuously monitoring the items placed and replaced at the different nodes of the cluster, so that only ξ copies of each item are maintained within the cluster.

Anti-Zone Miss (AZM) Algorithm

Input:
  o Ad hoc network divided into a set of overlapped clusters, in which each cluster has a master node as well as a set of cluster slaves.
  o MTR: the requester mobile host, which suffers from a zone cache miss.
  o MTS: the source mobile host, which exists within Cluster(MTR). It stores a valid copy of the missed item Ix in its cache.
  o Ix: a valid copy of the missed item, which exists inside Cache(MTS).
  o CM(MTR): the master of Cluster(MTR).

Trigger:
  o MTR suffers from a zone cache miss.

Output:
  o path(MTR→MTS): the path from MTR to MTS (i.e., the path to the missed item).

Steps:
  1: MTR sends an Item Request Message for the item Ix to CM(MTR), denoted as IRM(MTR, Ix).
  2: Buffer = ∅
  3: For each MTj ∈ Cluster(MTR) Do
  4:   CM(MTR) searches its index in the entries of MTj for the missed item (Ix).
  5:   If MTj has a valid copy of Ix Then
  6:     Get the Life_Time(Ix)
  7:     Add Life_Time(Ix) along with its argument to Buffer. // the argument is Ix's hosting node (i.e., MTj)
  8:   End if
  9: Next
  10: If Buffer = ∅ Then
  11:   Trigger a cluster miss
  12: Else
  13:   MTS = argmin [Life_Time(Ix)] over all Life_Time(Ix) ∈ Buffer // choose the host which has the most fresh copy of Ix.
  14:   If CM(MTR).State is statefull Then
  15:     CM(MTR) sends the direct path(MTR→MTS) to MTR.
  16:     MTR fetches Ix from MTS
  17:   Else
  18:     CM(MTR) fetches Ix from MTS.
  19:     CM(MTR) sends Ix to MTR.
  20:   End if
  21: End if

Algorithm 4: Anti-Zone Miss (AZM)
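Algorithm 4 differs from Algorithm 3 mainly in where the search happens (the index at the cluster master) and in the statefull/stateless branch. The following sketch illustrates that branch under simplifying assumptions: the name `anti_zone_miss`, the tuple return values, and the `paths` lookup table are all hypothetical, and the actual transfer of Ix is not modeled.

```python
def anti_zone_miss(cluster_index, item, requester, paths, statefull):
    """Resolve a zone miss at the cluster master.

    cluster_index: {member MT: {item: life_time}}, the master's view of its members
    paths        : {(source, destination): [nodes]} direct paths known to the master
    Returns ("cluster miss", None), ("direct path", path) in the statefull case,
    or ("relay via master", MTS) in the stateless case.
    """
    buffer = [
        (entries[item], member)
        for member, entries in cluster_index.items()
        if item in entries
    ]
    if not buffer:
        return ("cluster miss", None)
    source = min(buffer)[1]          # host with the smallest Life_Time
    if statefull:
        # The master only hands out the direct path; MTR fetches Ix itself.
        return ("direct path", paths[(requester, source)])
    # Stateless: the master fetches Ix from MTS and forwards it to MTR.
    return ("relay via master", source)


index = {"B": {"Ix": 9.0}, "G": {"Ix": 3.0}}
known_paths = {("A", "G"): ["A", "CM", "C", "G"], ("A", "B"): ["A", "B"]}
print(anti_zone_miss(index, "Ix", "A", known_paths, statefull=True))
print(anti_zone_miss(index, "Ix", "A", known_paths, statefull=False))
```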

Keeping sufficient copies of the same data item promotes data availability: if one copy becomes unreachable (the hosting MT leaves the network, suffers from a sudden death, or replaces the data), the remaining copies can be used. In MANETs, if MTs connect to the network, their caches cannot be used immediately, and the overall caching utilization decreases. One important feature of the KSC procedure is that it employs efficient cache placement and replacement policies, which consider the available caching space and node mobility to improve data accessibility in a cluster. Accordingly, the cache hit ratio in the cluster can be increased and the average search latency can be reduced significantly. The procedure used to fetch a data item is illustrated in flowchart 2.


Anti-Cluster Miss (ACM) Algorithm

Input:
  o Ad hoc network divided into a set of overlapped clusters, in which each cluster has a master node as well as a set of cluster slaves.
  o MTR: the requester mobile host, which suffers from a cluster cache miss.
  o MTS: the source mobile host, which exists outside Cluster(MTR). It stores a valid copy of the missed item Ix in its cache.
  o Ix: a valid copy of the missed item, which exists inside Cache(MTS).
  o CM(MTR): the master of Cluster(MTR).

Trigger:
  o MTR suffers from a cluster cache miss.

Output:
  o path(MTR→MTS): the path from MTR to MTS (i.e., the path to the missed item).

Steps:
  1: NOCM = PHM(CM(MTR), MTR) // NOCM stands for Neighboring Observed Cluster Masters
  2: For each cluster master CM ∈ NOCM Do
  3:   CM searches its index for Ix
  4:   If CM has a valid copy of Ix OR CM is the data source Then
  5:     If CM.State is statefull Then
  6:       CM sends path(MTS→GR) to GR, where GR is the gateway that issues the request for Ix.
  7:     End if
  8:     Construct the path(MTR→MTS)
  9:     Send path(MTR→MTS) to MTR
  10:     MTR fetches Ix.
  11:     Start Item Cancellation Procedure (ICP)
  12:     Exit
  13:   End if
  14: Next
  15: TTF(NOCM, 1)
  16: If TTF(NOCM, 1) has failed to find Ix Then TTF(ROCM(1), 2) else Exit
  17: If TTF(ROCM(1), 2) has failed to find Ix Then TTF(ROCM(2), 3) else Exit
  18: If TTF(ROCM(2), 3) has failed to find Ix Then TTF(ROCM(3), 4) else Exit
  19: The procedure is repeated recursively until the item Ix is fetched from a remote node or from the data source.

Please Help Me (PHM) Procedure

Procedure PHM(CMi, Requester)
  1: CMi sends a Please Help Me Message to its gateways for the missed item Ix, denoted as PHM2(Ix).
  2: For each gateway G ∈ CMi.Gateways Do
  3:   G propagates PHM2(Ix) to all CMs it can observe, while discarding any CM that has received the same PHM2(Ix) before.
  4:   Store the observed cluster masters into the OCM set
  5:   If CMi.State is statefull Then
  6:     Send path(Requester→G) to G. (additionally, G has the path to CMi)
  7:   End if
  8: Next
  9: Return(OCM)
End Procedure

Try To Find (TTF) Procedure

Procedure TTF(OCM, i)
  1: For each cluster master CM ∈ OCM Do
  2:   ROCM(i) = PHM(CM, GR), where GR is the gateway that issues the request for Ix. // ROCM stands for Remote Observed Cluster Masters
  3:   For each cluster master CM ∈ ROCM(i) Do
  4:     CM searches its index for Ix
  5:     If CM has a valid copy of Ix OR CM is the data source Then
  6:       If CM.State is statefull Then
  7:         CM sends path(MTS→GR) to GR
  8:       End if
  9:       Construct the path(MTR→MTS)
  10:       Send path(MTR→MTS) to MTR
  11:       MTR fetches Ix.
  12:       Start Item Cancellation Procedure (ICP)
  13:       Exit
  14:     End if
  15:   Next
End Procedure

Algorithm 5: Anti-Cluster Miss (ACM)
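Viewed abstractly, Algorithm 5 expands the search one ring of cluster masters at a time (NOCM first, then ROCM(1), ROCM(2), and so on), discarding masters that have already received the same PHM2. The sketch below captures only that ring-by-ring expansion as a breadth-first search; `anti_cluster_miss`, the `neighbors` adjacency map (which master can be reached from which through a gateway), and the `indexes` map are illustrative assumptions, and path construction and cancellation are not modeled.

```python
def anti_cluster_miss(start_master, neighbors, indexes, item, data_source=None):
    """Search outward from CM(MTR) one ring of cluster masters at a time.

    neighbors: {master: [masters observed through its gateways]}
    indexes  : {master: set of items indexed in that master's C2I}
    Returns (master holding the item, ring number), or (None, rings searched).
    """
    seen = {start_master}      # masters that already received this PHM2
    ring = [start_master]
    depth = 0
    while ring:
        depth += 1
        next_ring = []
        for master in ring:
            for other in neighbors.get(master, []):
                if other in seen:
                    continue   # duplicate PHM2: discard
                seen.add(other)
                next_ring.append(other)
        for master in next_ring:
            if item in indexes.get(master, set()) or master == data_source:
                return master, depth
        ring = next_ring
    return None, depth


# Cluster masters of the later illustrative example: CMR reaches CMi and CMj,
# and CMi reaches CMS, whose cluster holds the item.
NEIGHBORS = {"CMR": ["CMi", "CMj"], "CMi": ["CMR", "CMS"], "CMj": ["CMR"], "CMS": ["CMi"]}
INDEXES = {"CMS": {"Ix"}}
print(anti_cluster_miss("CMR", NEIGHBORS, INDEXES, "Ix"))
```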


Assumptions:
  o Ix: an item that needs to be fetched.
  o MTR: the mobile host that suffers from a local cache miss for item Ix.
  o MTS: the mobile host that caches the required item Ix.
  o The cluster size is 3 hops away from the cluster master.

Different cases of fetching an item (Ix)

Case 1
  Status: a valid copy of Ix exists at MTS ∈ Zone(MTR).
  Trigger: MTR suffers from a local cache miss.
  Action: MTR asks every MTi ∈ Zone(MTR) for Ix.
  Result: MTS sends Ix to MTR (zone cache hit).

Case 2
  Status: a valid copy of Ix exists at MTS ∈ [Cluster(MTR) - Zone(MTR)].
  Trigger: MTR suffers from a zone cache miss.
  Action: MTR asks CM(MTR) for Ix.
  Result (statefull): CMR asks MTS to send Ix to MTR; CMR also sends the shortest valid path MTS→MTR to MTS, then MTS establishes the connection to send Ix to MTR.
  Result (stateless): CMR gets item Ix from MTS, then sends it to MTR.

Case 3
  Status: a valid copy of Ix exists at MTS ∈ Neighboring_Clusters(Cluster(MTR)).
  Trigger: MTR suffers from a cluster cache miss.
  Action: CM(MTR) asks the masters of neighboring clusters through the cluster gateways.
  Result (stateless): CMS gets item Ix from MTS, then sends it to the common gateway X, which sends Ix to CMR. Then CMR sends Ix to MTR.
  Result (statefull): CMS sends the shortest valid path between the common gateway Y and MTS to Y, then Y gets the shortest path to MTR from CMR. Finally, gateway Y concatenates the two paths to obtain the direct path between MTR and MTS, then Y sends the path MTS→MTR to MTS, which in turn sends Ix to MTR.

[Figure 10 legend: cluster master; cluster slave; cluster boundary; node zone; item source; item requester.]

Figure 10, different cases of fetching a data item

Case 4
  Status: a valid copy of Ix exists at a remote cluster, i.e., MTS ∈ [Network - Neighboring_Clusters(Cluster(MTR))].
  Trigger: MTR suffers from a neighboring clusters cache miss.
  Action: masters of neighboring clusters forward the request to their neighboring masters through the gateways.
  Result (stateless): all the embedded cluster masters are included in the path between MTR and MTS.
  Result (statefull): the missed item Ix is sent from MTS to MTR using the shortest path through the common gateways between the embedded clusters.

Case 5
  Status: a valid copy of Ix exists at the data source.
  Trigger: MTR suffers from a network cache miss.
  Action: masters of neighboring clusters forward the request to their neighboring masters through the gateways until the data source is reached.
  Result (stateless): all the embedded cluster masters are included in the path between the data source and MTR.
  Result (statefull): the missed item Ix is sent from the data source to MTR using the shortest path through the common gateways between the embedded clusters.

Figure 11, different cases of fetching a data item (Cont.)

Item Cancellation Procedure

Trigger:
  o When the missed item Ix has already been fetched remotely from other clusters (the miss here is due to a cluster cache miss).

Target:
  o Cancelling the continued discovery and return of new paths to the missed item.

Assumptions:
  o MTR: the mobile host that suffered from local, then zone, then cluster cache misses.
  o CMR: the master of the cluster to which MTR belongs.
  o Ix: the missed item.

Steps:
  o Immediately after the reception of the missed item Ix, CMR initializes the item cancellation procedure by sending an Item Cancellation Message (ICM) for the item Ix, i.e., ICM(CMR, Ix), to all its gateways.
  o Upon the reception of ICM, each gateway propagates ICM to the other cluster masters it can observe.
  o The other cluster masters propagate ICM to the gateways of their clusters that have previously received the request for Ix and have not received a reply yet, to block the return of new paths to Ix.
  o This procedure continues until all gateways have been informed to block the discovery of new paths for Ix.

Algorithm 6, Item Cancellation Procedure.

Illustrative Example

As the Anti-Cluster Miss (ACM) algorithm is the most complicated one, since it combines almost all situations and problems, it is explained through an illustrative example in the next paragraphs as well as in figures 12 and 13.

Assumptions:
  o MTR: a mobile terminal which suffers from local, zone, then cluster cache misses for item Ix.
  o MTS: a remote mobile terminal which owns a valid copy of Ix in its local cache.
  o Ix: the missed item.
  o CMR: the master of the cluster where MTR resides.
  o CMS: the master of the cluster where MTS resides.
  o The cluster size is assumed to be three hops away from the cluster master (i.e., µ=3).

Steps:
1. CMR sends a Please Help Me Message for item Ix, denoted as PHM2(CMR,Ix,MID), to all gateways of its cluster. MID refers to the Message Identification Number, which distinguishes the different requests from the same cluster master for the same item. This is illustrated in figure 12 (A), in which CMR has two gateways, nodes 'e' and 'v' (i.e., Gateways(CMR)={e,v}).
2. All Gateways(CMR) propagate PHM2(CMR,Ix,MID) to the other cluster masters they can observe. This is depicted in figure 12 (B), in which gateways 'e' and 'v' ask CMi and CMj for the item Ix. For simplicity, we follow the request of gateway 'e'.
3. At this point, gateway 'e' stores in its Request For Help (RFH) table the direct paths to CMR and CMi, which it identified during the last cluster installation process (both are listed in the RFH table of gateway 'e' in figure 12). Moreover, if the state of CMR is statefull, CMR sends Path(e→MTR) to gateway 'e'.
4. We now continue following PHM2(CMR,Ix,MID), which was sent to CMi by gateway 'e'. CMi immediately searches its Cluster Caching Index (C2I). Assuming that there is no valid copy of item Ix in C2I(CMi), CMi propagates PHM2(CMR,Ix,MID) to Gateways(CMi), except the requester gateway (i.e., gateway 'e') and any gateway that has received PHM2(CMR,Ix,MID) before. This is illustrated in figure 12 (C), in which CMi propagates the request for Ix to 'm', as it is the only gateway of CMi.
5. Gateway 'm' likewise stores in its Request For Help (RFH) table the direct paths to CMi and CMS, which it identified during the last cluster installation process (both are listed in the RFH table of gateway 'm' in figure 12). Moreover, if the state of CMi is statefull, CMi sends Path(e→m) to gateway 'm'.
6. Gateway 'm' propagates PHM2(CMR,Ix,MID) to the other cluster masters it can observe. So, as illustrated in figure 12 (D), gateway 'm' asks CMS for item Ix.
7. When CMS searches its C2I, it finds that there is a valid copy of Ix at MTS. Hence, it starts to concatenate the path from MTS to MTR. The result of this process is given in table 3 for all cases of the embedded cluster masters (i.e., CMR, CMi, CMS), where 'F' refers to statefull and 'L' refers to stateless. Those paths are shown graphically in figure 12 (E and F) and figure 13 (A to F).


As observed in table 3, the shortest path, which has the smallest number of hops, is obtained when the state of all embedded cluster masters (CMR, CMi, and CMS) is statefull; this appears in the last row of table 3. On the other hand, the longest path is obtained when all embedded cluster masters are stateless; this appears in the first row of the table. The conclusion is that although the statefull state requires more processing power, it leads to better decisions, as it yields shorter paths between the source and the destination of the missed item. To clarify the concept, we explain how the path MTR→MTS is concatenated when State(CMR)=Statefull, State(CMi)=Statefull, and State(CMS)=Stateless. Upon receiving the request for item Ix from gateway 'm', CMS searches its index for Ix and finds that a valid copy of Ix exists in the cache of MTS. Accordingly, CMS sends path(CMS→MTS) to gateway 'm'. As 'm' stores path(m→CMS) in its RFH, 'm' can derive path(m→MTS) by concatenating path(m→CMS) and path(CMS→MTS), which are m→n→o→p→CMS and CMS→q→r→MTS respectively. Hence, path(m→MTS) is m→n→o→p→CMS→q→r→MTS. On the other hand, as State(CMi) is statefull, gateway 'm' has the direct path to gateway 'e' (i.e., path(m→e)), which forwarded the request for item Ix from MTR. Gateway 'm' can simply derive path(e→m) by reversing path(m→e), which gives e→f→g→z→y→x→w→l→m. By concatenating path(e→m) and path(m→MTS), gateway 'm' obtains path(e→MTS), which is e→f→g→z→y→x→w→l→m→n→o→p→CMS→q→r→MTS. Then, gateway 'm' sends path(e→MTS) to gateway 'e'. As CMR is statefull, gateway 'e' has the direct path to MTR in its RFH, denoted as path(e→MTR), from which 'e' can simply obtain path(MTR→e), which is MTR→s→e. At this point, gateway 'e' can derive path(MTR→MTS) by concatenating path(MTR→e) and path(e→MTS), which gives MTR→s→e→f→g→z→y→x→w→l→m→n→o→p→CMS→q→r→MTS. This path is sent to MTR, which immediately requests the missed item Ix from MTS.
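The concatenation walked through above amounts to joining path segments that share an endpoint and reversing a stored path where needed. The following sketch replays the worked case (CMR and CMi statefull, CMS stateless) with the node sequences quoted in the text; the helper name `join_paths` is a hypothetical choice for this illustration.

```python
def join_paths(*segments):
    """Concatenate path segments, each starting where the previous one ends."""
    path = list(segments[0])
    for segment in segments[1:]:
        if path[-1] != segment[0]:
            raise ValueError("segments do not share an endpoint")
        path.extend(segment[1:])   # skip the shared node
    return path


# Paths known at the gateways, as quoted in the worked example.
m_to_cms = ["m", "n", "o", "p", "CMS"]                    # RFH of gateway 'm'
cms_to_mts = ["CMS", "q", "r", "MTS"]                     # sent by CMS
m_to_e = ["m", "l", "w", "x", "y", "z", "g", "f", "e"]    # known because CMi is statefull
e_to_mtr = ["e", "s", "MTR"]                              # known because CMR is statefull

m_to_mts = join_paths(m_to_cms, cms_to_mts)
e_to_mts = join_paths(m_to_e[::-1], m_to_mts)             # reverse path(m→e) first
mtr_to_mts = join_paths(e_to_mtr[::-1], e_to_mts)

print("→".join(mtr_to_mts))
print(len(mtr_to_mts), "nodes on the path")
```

The resulting node sequence contains 18 nodes, which agrees with the length reported for this state combination in table 3.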

Table 3, The available concatenated paths from MTS to MTR ('F' refers to statefull, while 'L' refers to stateless)

State(CMR)  State(CMi)  State(CMS)  Path(MTS → MTR)                                               Length
L           L           L           MTS→r→q→CMS→p→o→n→m→l→k→j→CMi→i→h→g→f→e→d→c→CMR→b→a→MTR     23 hops
L           L           F           MTS→m→l→k→j→CMi→i→h→g→f→e→d→c→CMR→b→a→MTR                   17 hops
L           F           L           MTS→r→q→CMS→p→o→n→m→l→w→x→y→z→g→f→e→d→c→CMR→b→a→MTR         22 hops
L           F           F           MTS→m→l→w→x→y→z→g→f→e→d→c→CMR→b→a→MTR                       16 hops
F           L           L           MTS→r→q→CMS→p→o→n→m→l→k→j→CMi→i→h→g→f→e→s→MTR               19 hops
F           L           F           MTS→m→l→k→j→CMi→i→h→g→f→e→s→MTR                             13 hops
F           F           L           MTS→r→q→CMS→p→o→n→m→l→w→x→y→z→g→f→e→s→MTR                   18 hops
F           F           F           MTS→m→l→w→x→y→z→g→f→e→s→MTR                                 12 hops

[Figure 12 panels: (A) CMR sends PHM2 to its gateways 'e' and 'v'; (B) the gateways forward the request to CMi and CMj; (C) CMi forwards the request to its gateway 'm'; (D) gateway 'm' asks CMS for Ix; (E) the resulting path for State(CMR)=L, State(CMi)=L, State(CMS)=L; (F) the resulting path for State(CMR)=F, State(CMi)=L, State(CMS)=L.]

RFH table of gateway 'e'
Dest.  Path                  State
CMR    e→d→c→CMR             Both states
CMi    e→f→g→h→i→CMi         Both states
MTR    e→s→MTR               CMR is statefull

RFH table of gateway 'm'
Dest.  Path                  State
CMi    m→l→k→j→CMi           Both states
CMS    m→n→o→p→CMS           Both states
e      m→l→w→x→y→z→g→f→e     CMi is statefull

Figure 12, Getting a data item for the illustrative Example

[Figure 13 panels (A) to (F): the resulting paths from MTS to MTR for the remaining combinations of State(CMR), State(CMi), and State(CMS) listed in table 3.]

Figure 13, Getting a data item for the illustrative Example (Cont).

4.5. Maintaining Cache Consistency

The proposed caching technique uses a simple weak consistency model based on time to live (TTL), in which an MT considers a cached copy up-to-date if its TTL has not expired [26]. The client removes the cached data when the TTL expires. A client refreshes a cached data item and its TTL if a fresh copy of the same data passes by, or if the cluster master detects that a new copy of the item is cached in one of the cluster members. If the fresh copy contains the same data but a newer TTL, the node updates only the TTL field of the cached data. If the data center has updated the data item, the node replaces both the cached data item and its TTL with the fresh copy [41].
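The TTL rules above can be illustrated with a small sketch. It is not taken from the paper: the `CachedItem` class, the use of an absolute expiry time in place of a TTL counter, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    data: bytes
    expires_at: float   # absolute time at which the TTL runs out (assumed representation)


def is_valid(entry, now):
    """A cached copy is considered up-to-date while its TTL has not expired."""
    return now < entry.expires_at


def refresh(entry, fresh_data, fresh_expires_at):
    """Apply a fresh copy that passes by, following the two cases in the text."""
    if fresh_data == entry.data:
        # Same data with a newer TTL: update only the TTL field.
        if fresh_expires_at > entry.expires_at:
            entry.expires_at = fresh_expires_at
    else:
        # The data center has updated the item: replace data and TTL.
        entry.data = fresh_data
        entry.expires_at = fresh_expires_at


entry = CachedItem(data=b"v1", expires_at=100.0)
refresh(entry, b"v1", 160.0)      # same data, newer TTL
same_data_expiry = entry.expires_at
refresh(entry, b"v2", 220.0)      # updated data
print(same_data_expiry, entry.data, entry.expires_at, is_valid(entry, 250.0))
```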


4.6. Adaptive Cache Replacement (ACR)

In cooperative caching, the aim is to cache the most useful data items, which are requested often not only locally by the node itself but also by neighboring nodes [11]. Replacement is required if the cache is full when a new data item needs to be cached. Hence, one or more items have to be removed from the cache to make room for the new item. The challenge is to choose the victim to be replaced by the new item. Generally, it is better to choose an item that will not be heavily used.


For our cache replacement policy, each data item in the local cache is assigned an Importance Value (IV). When an item must be replaced, the Least Important Item (LI2) is the victim. The key issue is how to estimate the IV of each cached item. Several parameters are combined to derive the item's IV: Local Access Frequency (LAF), Remote Access Frequency (RAF), Local Usage Recency Ratio (LUR2), and item Size (S).

Definition 19: Local Access Frequency (LAF) The local access frequency of an item is the number of times the item is accessed locally. It therefore reflects the item's usage by the node itself.

Definition 20: Remote Access Frequency (RAF) The remote access frequency of an item is the number of times the item is accessed remotely by other neighboring nodes. It therefore reflects the item's popularity.

The item's local access frequency is considered because the more frequently an item is requested locally, the higher the probability that it will be requested again in the near future. Generally, the items that are less frequently requested locally should be evicted. Moreover, in cooperative caching, data is not stored only on behalf of the caching node; the interest of the neighboring nodes should also be taken into account. Accordingly, LFU [11], LRU [12], and SXO [13] are not suitable for MANETs. Our emphasis is on removing the items whose removal has the least effect on both the requirements of the neighbors and the data availability within the cluster. To this end, the item's Remote Access Frequency (RAF), which reflects the item's popularity, is considered. It is better to replace the item with the maximum RAF among all the cached items, because removing the item with the highest RAF has no significant effect on data availability, as other neighboring nodes have already cached the same data item. Each time a data item is remotely accessed, its RAF is increased by one. Accordingly, this policy is better than LFU, LRU, and SXO, as it is based on the recent interests of the neighboring nodes. To maximize the cache hit ratio, it is better to replace the item with the larger size, as it takes more cache space; the cache can then accommodate more data items and satisfy more requests. Another parameter that should be taken into consideration is the item's Local Usage Recency (LUR). Items that have been heavily used recently will probably be used again in the near future; conversely, data items that have not been used for a long time will probably remain unused. Hence, it is better to remove the items with the maximum LUR values. However, as the TTL value of the item is also a vital parameter, it is combined with LUR in one parameter called Local Usage Recency Ratio (LUR2).


Definition 21: Local Usage Recency Ratio (LUR2) is the ratio between the time elapsed since the last local use of the item and the item's Time To Live (TTL). So, LUR2(Ix)=Time_Since_Last_Local_Use(Ix)/TTL(Ix). The aim is to keep the items with the minimum LUR2.

LUR2 overcomes the deficiency of the traditional LRU replacement policy, which considers only the local recency of the item with no awareness of the item's TTL. As it is better to keep the items that were recently used while also taking the item's TTL into account, it is a good choice to keep the items with the minimum LUR2.
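As a minimal sketch of Definition 21, the ratio can be computed as shown here. The function name, the item names, and the timing values are hypothetical and only serve to illustrate how two items with the same recency are separated by their TTL.

```python
def lur2(time_since_last_local_use: float, ttl: float) -> float:
    """Local Usage Recency Ratio: elapsed time since the last local use divided by TTL."""
    if ttl <= 0:
        raise ValueError("TTL must be positive")
    return time_since_last_local_use / ttl


# Hypothetical items: (seconds since last local use, TTL in seconds).
items = {"a": (30.0, 300.0), "b": (30.0, 60.0), "c": (120.0, 600.0)}

ratios = {name: lur2(elapsed, ttl) for name, (elapsed, ttl) in items.items()}

# The item with the minimum LUR2 is the one most worth keeping.
keep_first = min(ratios, key=ratios.get)
```

Items "a" and "b" were used equally recently, but "b" has a much shorter TTL, so its ratio is higher and it is the weaker candidate for staying in the cache.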


An important issue that should be considered is the duration over which the replacement parameters (e.g., LAF, RAF, LUR2, and S) are calculated. To the best of our knowledge, this issue was ignored by previous work in the literature, which measures such parameters cumulatively during the items' TTL or during the network lifetime [23-25]. However, for accurate replacement decisions, we claim that the replacement parameters should be calculated as close as possible to the time at which the replacement decision is taken. To accomplish this aim, the replacement parameters are calculated during the current as well as the previous time stamps.

On the other hand, as the replacement decision should be based on all the replacement parameters, a way is needed to combine those parameters when formulating the replacement decision. Fuzzy inference is suitable for uncertain or approximate reasoning. It allows decision making with estimated values under incomplete or uncertain information [28]. Hence, it can be successfully employed to assign an importance value to each data item stored in the cache. Then, the item with the lowest importance can be replaced by the new one. The fuzzy inference system is applied through three steps, as depicted in figure 14, which are: (i) fuzzification, (ii) fuzzy rule induction, and (iii) defuzzification.

[Figure 14 (block diagram): Crisp Input → Fuzzification (using the Input Membership Function) → Fuzzy Input → Rule Induction (Rules / Inference) → Fuzzy Output → Defuzzification (using the Output Membership Function) → Crisp Output]

Figure 14, Operation of Fuzzy System

a. Fuzzification

Four different fuzzy sets are considered, which are LAF, RAF, S, and IUR2. During the fuzzification process, the crisp input values are transformed into grades of membership for the linguistic terms “Low” and “High” of the considered fuzzy sets. For each fuzzy set, a membership function provides the degree of similarity of the crisp input to the fuzzy set. It returns a value between 0.0 (for non-membership) and 1.0 (for full membership). The membership functions for the four considered fuzzy sets are illustrated in figure 15. The considered values of α and β are given in table 4.


[Figure 15 (four panels, each plotting the membership grade between 0.0 and 1.0 for the terms Low and High, with the 0.5 level marked): μ(LAF) against Local Access Frequency with breakpoints αL and βL; μ(RAF) against Remote Access Frequency with breakpoints αR and βR; μ(S) against Size with breakpoints αS and βS; μ(IUR2) against Item Usage Recency with breakpoints αU and βU]

Figure 15, The membership functions for the considered fuzzy sets

Table 4, the assigned values of α and β

Parameter        αL   βL   αR   βR   αS   βS   αU     βU
Assigned value   1    5    1    5    5    15   0.25   0.5
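The fuzzification step can be sketched as follows, using the α and β values of table 4. The piecewise-linear shape (Low equals 1 up to α and falls linearly to 0 at β, with High as its complement) is an assumption; the function and variable names are illustrative only. Under this assumption, the item d1 of the paper's worked example (LAF=3, RAF=2, S=7, IUR2=0.4) gets Low grades of 0.5, 0.75, 0.8, and 0.4, which agree with the values read from the paper's figure except for RAF, which the figure shows as 0.7.

```python
def mu_low(x: float, alpha: float, beta: float) -> float:
    """Assumed 'Low' membership: 1 up to alpha, linear down to 0 at beta."""
    if x <= alpha:
        return 1.0
    if x >= beta:
        return 0.0
    return (beta - x) / (beta - alpha)


def mu_high(x: float, alpha: float, beta: float) -> float:
    """Assumed 'High' membership: the complement of 'Low'."""
    return 1.0 - mu_low(x, alpha, beta)


# (alpha, beta) pairs taken from table 4.
BREAKPOINTS = {"LAF": (1, 5), "RAF": (1, 5), "S": (5, 15), "IUR2": (0.25, 0.5)}


def fuzzify(item: dict) -> dict:
    """Map each crisp parameter of an item to its Low/High membership grades."""
    return {
        name: {"Low": mu_low(item[name], a, b), "High": mu_high(item[name], a, b)}
        for name, (a, b) in BREAKPOINTS.items()
    }


d1 = {"LAF": 3, "RAF": 2, "S": 7, "IUR2": 0.4}
grades = fuzzify(d1)
```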

b. Fuzzy Rule Induction


The output of the fuzzification process is the input for the fuzzy rule base. The considered rules are in the form: IF (X is A) AND (Y is B) AND (Z is C) ... THEN (M is D), where X, Y, and Z represent the input variables (e.g., LAF, RAF, S, and IUR2), while A, B, and C represent the corresponding linguistic terms (e.g., low or high), M represents the rule output, and finally D represents one of the linguistic terms (low, medium, or high). The L.H.S of the rule (before THEN) is called the “antecedent”, while the R.H.S (after THEN) is called the “consequent”. There are 16 rules, which are listed in table 5 (where ‘L’ refers to “Low”, ‘H’ refers to “High”, and ‘M’ refers to “Medium”). For illustration, the second rule in table 5 indicates that: IF LAF(Item) is Low AND RAF(Item) is Low AND Size(Item) is Low AND IUR2(Item) is High THEN Output is Low.

Table 5, The used fuzzy Rules

ID   LAF  RAF  S    IUR2  Rule output
1    L    L    L    L     M
2    L    L    L    H     L
3    L    L    H    L     H
4    L    L    H    H     M
5    L    H    L    L     L
6    L    H    L    H     L
7    L    H    H    L     M
8    L    H    H    H     L
9    H    L    L    L     H
10   H    L    L    H     M
11   H    L    H    L     H
12   H    L    H    H     H
13   H    H    L    L     M
14   H    H    L    H     L
15   H    H    H    L     H
16   H    H    H    H     M

Four different methods of fuzzy rule inference are available, which are max-product, max-min, sum-dot, and drastic product. The max-min method is the one considered in this paper. It uses the min operator for the conjunction in the premise of the rule as well as for the implication function, and the max operator for aggregation. For illustration, consider a simple case of two items of evidence per rule; the corresponding rules are illustrated in table 6.

Table 6, The fuzzy rules using two items of evidence per rule

Rule ID   Rule
1         IF X11 AND X12 THEN Y1
2         IF X21 AND X22 THEN Y2
...       ...
N         IF XN1 AND XN2 THEN YN

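Accordingly, the max-min compositional inference rule can be illustrated in (1), where the max operator performs the aggregation and the min operator performs the implication.

$$\mu_Y = \max_{j \in \{1,2,3,\ldots,N\}} \left[ \min\left( \mu_{X_{j1}}, \mu_{X_{j2}} \right) \right] \tag{1}$$

This yields (2):

$$\mu_Y = \max\left[ \min(\mu_{X_{11}}, \mu_{X_{12}}),\ \min(\mu_{X_{21}}, \mu_{X_{22}}),\ \ldots,\ \min(\mu_{X_{N1}}, \mu_{X_{N2}}) \right] \tag{2}$$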
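The max-min rule can be sketched over the 16 rules of table 5 as follows. The data layout, the names, and the membership grades of the example item are hypothetical; only the rule outputs are taken from table 5. Each rule fires with the minimum of its four antecedent grades, and each output term keeps the maximum strength among the rules that conclude it.

```python
from itertools import product

NAMES = ("LAF", "RAF", "S", "IUR2")

# Rule outputs of table 5, rows 1..16. itertools.product enumerates the
# antecedents with LAF varying slowest and IUR2 fastest, as in table 5.
OUTPUTS = "M L H M L L M L H M H H M L H M".split()
RULES = dict(zip(product(("L", "H"), repeat=4), OUTPUTS))


def max_min_inference(grades: dict) -> dict:
    """grades maps each input name to its {'L': ..., 'H': ...} membership grades."""
    out = {"L": 0.0, "M": 0.0, "H": 0.0}
    for antecedent, consequent in RULES.items():
        # min implements the AND of the four antecedent terms (rule strength).
        strength = min(grades[name][term] for name, term in zip(NAMES, antecedent))
        # max aggregates all rules that share the same output term.
        out[consequent] = max(out[consequent], strength)
    return out


# Hypothetical item: rarely used locally, popular remotely, not used recently.
example = {
    "LAF": {"L": 0.9, "H": 0.1},
    "RAF": {"L": 0.2, "H": 0.8},
    "S": {"L": 0.6, "H": 0.4},
    "IUR2": {"L": 0.3, "H": 0.7},
}
fuzzy_output = max_min_inference(example)
```

For this hypothetical item the Low output term receives the largest membership, which matches the intuition that such an item is a good eviction candidate.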

c. Defuzzification

Since the output of the inference engine is a fuzzy set, while crisp values are required for most real-life applications, the output of the fuzzy rules should be defuzzified. Defuzzification is the mapping from a space of fuzzy actions into a space of non-fuzzy (crisp) actions. The most common defuzzification techniques are max-criterion, center-of-gravity, and mean of maxima. The Center of Gravity (COG) method is the most popular defuzzification technique, in which the weighted average of the area bounded by the membership function curve is computed to be the crisp value of the fuzzy quantity. In our case, defuzzification was accomplished using the output membership function illustrated in figure 16. Hence, consider a data item di whose input parameters are LAFi, RAFi, Si, and IUR2i. The defuzzification process results in a crisp value that combines evidence from the item parameters and accordingly indicates the item's importance. So, the crisp output value of the defuzzification process is considered as the Importance Value of the item di, i.e., IV(di).

[Figure 16: the output membership functions μ(IV) for the terms Low, Medium, and High, plotted against the Importance Value (IV), with the marked points αo=4, βo=7, and γo=10]

Figure 16, The output membership function.
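A discrete center-of-gravity computation can be sketched as below. The exact shapes of the output sets are not recoverable from the text, so the shapes used here are assumptions: Low is flat up to αo=4 and falls to 0 at βo=7, Medium is a triangle peaking at βo=7, High rises from βo=7 to γo=10 and stays flat, and the IV axis is assumed to span 0 to 14. Because of these assumptions, the sketch is not expected to reproduce the numerical values reported in the paper.

```python
def out_low(x: float) -> float:
    """Assumed Low output set: 1 up to 4, linear down to 0 at 7."""
    return 1.0 if x <= 4 else max(0.0, (7 - x) / 3)


def out_medium(x: float) -> float:
    """Assumed Medium output set: triangle from 4 to 10 peaking at 7."""
    return max(0.0, 1 - abs(x - 7) / 3)


def out_high(x: float) -> float:
    """Assumed High output set: linear up from 7, equal to 1 from 10 onwards."""
    return 1.0 if x >= 10 else max(0.0, (x - 7) / 3)


def centre_of_gravity(low: float, medium: float, high: float,
                      x_max: float = 14.0, steps: int = 1400) -> float:
    """Clip each output set at its fuzzy membership, aggregate with max,
    and return the weighted average of x over the aggregated curve."""
    num = den = 0.0
    for k in range(steps + 1):
        x = x_max * k / steps
        mu = max(min(low, out_low(x)), min(medium, out_medium(x)), min(high, out_high(x)))
        num += x * mu
        den += mu
    if den == 0.0:
        raise ValueError("all output memberships are zero")
    return num / den


balanced = centre_of_gravity(0.5, 0.5, 0.5)
mostly_low = centre_of_gravity(1.0, 0.0, 0.0)
mostly_high = centre_of_gravity(0.0, 0.0, 1.0)
```

With the symmetric shapes assumed here, equal memberships give an importance value at the middle of the axis, while a dominant Low or High term pulls the value toward the corresponding end.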

Illustrative example

Consider a cache that is filled with 5 data items, as illustrated in table 7. The target is to replace one of those items with a new one. The issue now is how to choose the victim item that leaves the cache, hoping to free sufficient space to cache the newly required item. We will follow the steps of ACR to elect the most suitable victim. Figure 17 illustrates in detail the sequential steps that should be followed to calculate the importance value of a cached item, considering the item d1 as an example.

Table 7, The considered data items for the illustrated example.

Item   LAF   RAF   S (KB)   IUR2
d1     3     2     7        0.4
d2     1     4     8        0.6
d3     4     2     15       0.1
d4     2     3     10       0.8
d5     1     6     5        0.2

As illustrated in figure 17, the importance value of d1 was calculated as 7.13. Repeating this procedure four more times, the importance values of the items d2, d3, d4, and d5 are 3.24, 9.32, 4.51, and 4.6, respectively. Hence, the victim item will be d2. Another attractive feature of ACR is that the evicted data may be stored in neighboring MTs that have free space. Accordingly, when a decision is taken to replace an item, the Keep Sufficient Copies (KSC) procedure, which will be discussed in detail in section 4.9, is called to ensure that a sufficient number of copies of the evicted item still exists inside the cluster.
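The victim selection of this example can be sketched with the importance values stated above and the sizes of table 7. The helper select_victims, which keeps evicting in ascending order of importance until enough space is freed, is a hypothetical extension for the case where one eviction is not enough; the text itself only states that the item with the lowest importance is replaced.

```python
# Importance values reported for the five cached items of the example.
importance = {"d1": 7.13, "d2": 3.24, "d3": 9.32, "d4": 4.51, "d5": 4.6}
# Item sizes in KB, taken from table 7.
sizes_kb = {"d1": 7, "d2": 8, "d3": 15, "d4": 10, "d5": 5}

# The victim is the item with the lowest importance value.
first_victim = min(importance, key=importance.get)


def select_victims(importance: dict, sizes_kb: dict, needed_kb: float) -> list:
    """Hypothetical helper: evict the least important items until needed_kb is freed."""
    victims, freed = [], 0.0
    for item in sorted(importance, key=importance.get):
        if freed >= needed_kb:
            break
        victims.append(item)
        freed += sizes_kb[item]
    return victims


one_eviction = select_victims(importance, sizes_kb, needed_kb=8)
two_evictions = select_victims(importance, sizes_kb, needed_kb=12)
```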

[Figure 17, reconstructed from the original diagram]

Step 1: Find the fuzzy set membership values for the item d1 (LAF=3, RAF=2, S=7, IUR2=0.4)

Input       Breakpoints         μLow   μHigh
LAF=3       αL=1, βL=5          0.5    0.5
RAF=2       αR=1, βR=5          0.7    0.3
S=7         αS=5, βS=15         0.8    0.2
IUR2=0.4    αU=0.25, βU=0.5     0.4    0.6

Step 2: Calculate the fuzzy outputs for each rule.

LAF  RAF  S    IUR2  Rule output  Output membership
L    L    L    L     M            Min(0.5,0.7,0.8,0.4)=0.4
L    L    L    H     L            Min(0.5,0.7,0.8,0.6)=0.5
L    L    H    L     H            Min(0.5,0.7,0.2,0.4)=0.2
L    L    H    H     M            Min(0.5,0.7,0.2,0.6)=0.2
L    H    L    L     L            Min(0.5,0.3,0.8,0.4)=0.3
L    H    L    H     L            Min(0.5,0.3,0.8,0.6)=0.3
L    H    H    L     M            Min(0.5,0.3,0.2,0.4)=0.2
L    H    H    H     L            Min(0.5,0.3,0.2,0.6)=0.2
H    L    L    L     H            Min(0.5,0.7,0.8,0.4)=0.4
H    L    L    H     M            Min(0.5,0.7,0.8,0.6)=0.4
H    L    H    L     H            Min(0.5,0.7,0.2,0.4)=0.2
H    L    H    H     H            Min(0.5,0.7,0.2,0.6)=0.2
H    H    L    L     M            Min(0.5,0.3,0.8,0.4)=0.3
H    H    L    H     L            Min(0.5,0.3,0.8,0.6)=0.3
H    H    H    L     H            Min(0.5,0.3,0.2,0.4)=0.2
H    H    H    H     M            Min(0.5,0.3,0.2,0.6)=0.2

So that the output will be:
Low with membership=MAX(0.2,0.3,0.5)=0.5
Medium with membership=MAX(0.2,0.4)=0.4
High with membership=MAX(0.2,0.4)=0.4

Step 3: Defuzzification. Use the center of gravity method on the output membership functions μ(IV) (Low, Medium, High): IV=7.13. Hence the importance value of the item d1 is 7.13.

Figure 17, illustrative example showing how to calculate the importance value of a cached item using a fuzzy inference system.

4.7. Prefetching Policies

This section introduces a new prefetching policy. We consider the problem of caching text files, which are the most frequently cached type of data. Each item has an attached envelope. The item envelope stores Ф keywords that express the content of the entire item in stemmed form, as illustrated in figure 18. Several techniques can be followed to prepare the item's envelope (i.e., to elect the Ф keywords that express the item). The keywords contained in the item's envelope can be chosen manually by the item's creator, in which case they are specified at the item's creation time. If they are not specified by the item's creator, they can be set automatically at the data source by choosing the Ф most frequent keywords in the item; those keywords are then stemmed using WordNet [29]. The proper value of Ф can be set based on the processing power of the network MTs. The derived prefetching rules can be used to decide whether or not to cache an item that is passing by an MT. The MT decides to cache a passenger item if it is a friend of the items currently stored in its cache. The degree of friendship can be determined based on text similarity methods.

[Figure 18: a data item with its attached item envelope, which holds the item's Ф representative keywords]

Figure 18, Item Envelope

The idea is to calculate the similarity between the passenger (the passing-by item) and the items that reside in the MT's internal cache. If the similarity level exceeds a predefined value, it is worthwhile to prefetch the item. Otherwise, the item is neglected and allowed to pass without being prefetched. The proposed policy is called the Dynamic Prefetching Policy (DPP). Let Kji be the ith keyword of the jth item's envelope. The weight of Kji, denoted as KWji, can be calculated using (3) as:

$$KW_{ji} = \sum_{\rho=1}^{m} \left( \eta \cdot LAF_{\rho i} + \chi \cdot RAF_{\rho i} \right) \tag{3}$$

Where m is the number of items in whose envelopes the keyword Kji appears, LAFρi and RAFρi are respectively the local and remote access frequencies of the ρth item in whose envelope the keyword Kji appears, and η and χ are weighting factors that allow different weights to be given to the local and remote access frequencies. It is the task of the network administrator to assign their values. Generally, the value of η should be greater than χ, as it is logical to assign a higher weight to the local access frequencies than to the remote access frequencies. Then, the weight of the jth item in the MT's local cache, denoted as IWj, can be calculated using (4) as:

$$IW_j = \sum_{i=1}^{\Phi} KW_{ji} = \sum_{i=1}^{\Phi} \left( \sum_{\rho=1}^{m} \left( \eta \cdot LAF_{\rho i} + \chi \cdot RAF_{\rho i} \right) \right) \tag{4}$$

After calculating the weight of each data item currently in the MT's local cache, the average item weight, denoted as Wavg, is calculated. Assuming the local cache contains q items, Wavg can be calculated using (5) as:

$$W_{avg} = \frac{\sum_{r=1}^{q} IW_r}{q} = \frac{\sum_{r=1}^{q} \left( \sum_{i=1}^{\Phi} \left( \sum_{\rho=1}^{m} \left( \eta \cdot LAF_{\rho i} + \chi \cdot RAF_{\rho i} \right) \right) \right)}{q} \tag{5}$$

When a decision is needed on whether to prefetch a passenger (i.e., a data item) Ipass, the local cache is first checked to ensure that there is sufficient room to host Ipass. If sufficient space exists, the weight of the passenger, denoted as IWpass, is calculated using (6) based on the keywords inside its envelope. Then, IWpass is compared with Wavg, and Ipass is prefetched if IWpass ≥ Wavg.

$$IW_{pass} = \sum_{K_i \in Envelope(I_{pass})} KW_i \tag{6}$$
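Equations (3) to (6) can be sketched in a few lines. The cache contents below are hypothetical and differ from the paper's example; the function names and the dictionary layout are also assumptions. The free-space check that precedes the decision is left out, and a keyword that does not appear in any cached envelope is given a weight of 0.

```python
from collections import defaultdict


def keyword_weights(cache: list, eta: float = 1.0, chi: float = 1.0) -> dict:
    """Equation (3): sum eta*LAF + chi*RAF over the items whose envelope holds the keyword."""
    kw = defaultdict(float)
    for item in cache:
        score = eta * item["laf"] + chi * item["raf"]
        for keyword in set(item["envelope"]):
            kw[keyword] += score
    return dict(kw)


def item_weight(envelope: list, kw: dict) -> float:
    """Equations (4) and (6): sum of the weights of the envelope keywords."""
    return sum(kw.get(keyword, 0.0) for keyword in envelope)


def should_prefetch(cache: list, passenger_envelope: list,
                    eta: float = 1.0, chi: float = 1.0) -> bool:
    """Prefetch when the passenger weight reaches the average item weight, equation (5)."""
    if not cache:
        raise ValueError("the cache must hold at least one item")
    kw = keyword_weights(cache, eta, chi)
    w_avg = sum(item_weight(item["envelope"], kw) for item in cache) / len(cache)
    return item_weight(passenger_envelope, kw) >= w_avg


# Hypothetical cache with two keywords per envelope.
cache = [
    {"envelope": ["routing", "cluster"], "laf": 3, "raf": 1},
    {"envelope": ["routing", "cache"], "laf": 2, "raf": 2},
    {"envelope": ["video", "codec"], "laf": 1, "raf": 0},
]

related = should_prefetch(cache, ["routing", "cache"])
unrelated = should_prefetch(cache, ["video", "sensor"])
```

In this hypothetical cache the item weights are 12, 12, and 2, so the average is about 8.67; a passenger about routing and caching scores 12 and is prefetched, while one about video and sensors scores 1 and is allowed to pass.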

Illustrative Example

Through this subsection, an illustrative example showing how the prefetching process takes place is introduced. For simplicity, it is assumed that the cache of MTy contains five items denoted as Ix, x ∈ {1,2,3,4,5}. Also, the envelope of each item contains only three keywords (i.e., Ф=3). Figure 19 shows the contents of each item's envelope as well as the local and remote access frequencies of each cached item. Initially, the distinct keywords are identified. Then, based on the LAF and RAF of each cached item, the weight of each keyword is calculated assuming χ=η=1, which gives equal weight to the local and remote access frequencies. For illustration, the weight of the keyword “Network”, which is found in the 1st, 3rd, and 4th items, is calculated as KW11=(2+3)+(2+5)+(6+1)=19. After calculating the weight of each keyword that appears inside the envelopes of all items, the weight of each item is also calculated. For illustration, the weight of the 1st item (i.e., I1) is calculated as IW1=KW11+KW12+KW13=(19+12+5)=36. Then, the average item weight is calculated using (5); hence, Wavg=(36+19+37+44+26)/5=32.4.

[Figure 19, reconstructed from the original diagram]

Item id   Item envelope (Kj1, Kj2, Kj3)    LAF   RAF   Item weight (IW)
1         Network, Computer, Routing       3     2     IW1=36
2         System, Server, Program          4     0     IW2=19
3         Network, Database, Server        2     5     IW3=37
4         Database, Network, Program       6     1     IW4=44
5         Computer, Memory, Processor      4     3     IW5=26

Distinct keyword   Keyword weight (KW)
Network            19
Computer           12
Routing            5
System             4
Server             4
Program            11
Database           14
Memory             7
Processor          7

Figure 19, calculating the items' weights for the illustrative example

Now assume that a new passenger Ipass arrives whose envelope is Envelope(Ipass)={“Network”, “Database”, “Application”}. Based on the pre-calculated keyword weights of the items currently stored in the local cache, the weight of the passenger item can be calculated as IWpass=19+14+0=33, which is greater than Wavg. Hence, the decision is to prefetch Ipass. If there is sufficient space in the local cache of MTy, then Ipass is stored immediately; otherwise, the cache replacement procedure (i.e., ACR) is called to make room for Ipass. It is important to mention that in both cases, the Keep Sufficient Copies (KSC) procedure is called. In the first case (i.e., there is sufficient space to store Ipass in the local cache of MTy), KSC removes a copy of Ipass from the cluster, while if cache replacement takes place at MTy, KSC tries to find room for the replaced item at a remote MT.

4.8. Cluster Master Considerations


ACCS is built upon a C-MANET architecture. Accordingly, the network is managed by a pre-defined set of θ cluster masters (CMs), and hence the network has θ clusters. Although choosing the number of network clusters is a true challenge, the clustering technique used by ACCS is able to group the network MTs efficiently for any value of θ. Generally, to guarantee satisfactory performance, CMs should efficiently address three basic issues, which are managed by three different agents. Those agents run continuously at the CM and constitute the cluster master manager, as illustrated in figure 20. The three issues that should be addressed by each CM are: (i) holding routing and topology information, which relieves the cluster's other MTs of this requirement, (ii) cluster formulation, as setting up the network clusters is initiated by the pre-defined cluster masters, and (iii) redundancy management, as the CM is responsible for keeping sufficient copies (ζ copies) of the same data item, which promotes data availability. Hence, if one copy becomes obsolete (the hosting MT leaves the network, suffers from a sudden death, or replaces the data), the remaining copies can be used.

[Figure 20: the Cluster Master Manager, which consists of the Redundancy Remover Agent (RRA), the Cluster Formulation Agent (CFA), and the Data Routing Agent (DRA), together with the Cluster Database]

Figure 20, Cluster Master Manager

4.9. Keep Sufficient Copies (KSC) Procedure


Generally, multiple copies of the same data item should be cached at different locations in the cluster to ensure data accessibility. However, one of the most challenging problems that badly impacts the efficiency of caching techniques is redundancy. This problem not only consumes the local cache space of the network's mobile terminals but also causes consistency problems. Keeping sufficient copies of the same data item certainly promotes data availability: if one copy becomes obsolete (the hosting MT leaves the network, suffers from a sudden death, or replaces the data), the remaining copies can be used. However, keeping too many copies of the same data item requires additional overhead to keep those copies consistent.

In C-MANETs, the network is partitioned into non-overlapping clusters, each of which is under the full control of its cluster master. The role of the cluster master is to ensure the existence of an effective communication channel among the cluster slaves. In this situation, in which MTs in the same cluster are able to negotiate with each other for data exchange, it is helpful to save the cluster cache storage by maintaining only ζ copies of the same data item. A suitable value of ζ can be chosen according to the mobility rate of the network's mobile terminals. Accordingly, in highly dynamic networks, ζ should be high, as the network topology changes dramatically: several MTs that own copies of the same data item may leave the cluster at the same time, which causes availability problems. The Keep Sufficient Copies (KSC) procedure, which is depicted in flowchart 3, is used to maintain data availability while minimizing redundancy within the network clusters.

As illustrated in flowchart 3, the KSC procedure starts immediately after formulating the network clusters, as well as when a node replaces an item. Each cluster master asks its slaves for the current contents of their caches, the available consistency information, and the free cache size. Hence, each slave sends the identifiers (IDs) of the items currently held in its cache as well as the TTL of each item. Then, the CM filters the received items to identify the distinct items. The task is then to keep ζ copies of each item in the cluster. So, the CM should decrease the number of instances of an item that has more than ζ copies, while increasing the number of instances of an item that has fewer than ζ copies. The former action saves the cluster's cache space by reducing redundancy, while the latter maximizes data availability within the cluster. It should be noted that the maintained ζ copies of each item are all taken from the most recent copy of the item, while the other copies are discarded. Hence, the most recent copy of each item is identified (based on the item's TTL).

Consider the data item di whose number of copies within the cluster is ε. If ε=ζ, the most recent copy of di, denoted as diR, replaces the other old copies at the remaining ζ-1 MTs. If ε>ζ, diR replaces the ζ-1 most recent copies, excluding MT(diR), which is the MT that hosts diR; the remaining ε-ζ copies are then deleted. On the other hand, if ε<ζ, diR replaces the existing copies, and the task of the cluster master is then to choose the most suitable ζ-ε MTs to receive and keep diR. The decision is taken according to the available cache size of the cluster slaves: diR is sent to the ζ-ε MTs with the maximum available free cache space. If there are not enough MTs with sufficient free space to host diR, cache replacement can be used to free additional space for the additional copies of diR.

[Flowchart 3, reconstructed as a sequence of steps]
1. Start; cluster formulation.
2. Each cluster master receives the cache contents of the cluster slaves, the available free cache size, and the TTL information.
3. Identify the distinct data items; n = number of distinct items.
4. If n=0, stop. Otherwise, pick the next data item di.
5. Calculate ε = the number of copies of di, and identify diR, the most recent copy of di.
6. Check ε against ζ:
   (a) ε=ζ: send diR to all MTs that have a copy of di except MT(diR).
   (b) ε>ζ: identify the set MTRes, which contains the ζ-1 MTs (MT(diR) is not included) that have the most recent copies of di; send diR to all members of MTRes; delete di from the set MT(di) - MTRes; keep the copy diR at MT(diR).
   (c) ε<ζ: send diR to all MTs that have a copy of di except MT(diR); discover the set MLC, which contains the ζ-ε MTs with the least cached data; send diR to all members of MLC.
7. Return to step 4 for the remaining data items.

Flowchart 3, Steps of Keep Sufficient Copies (KSC) Procedure.
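The per-item decision of the KSC procedure can be sketched as a planning function run at the cluster master. The function name, its inputs, and the returned tuple are hypothetical. Two assumptions are made and should be checked against the actual design: a larger TTL is taken to indicate a more recent copy, and when fewer than ζ-ε MTs are available the function simply returns the ones it found instead of triggering cache replacement.

```python
def ksc_plan(holders: dict, free_space: dict, zeta: int):
    """Plan the KSC actions for one data item.

    holders:    {mt_id: ttl of the copy held by that MT}
    free_space: {mt_id: free cache space} for all cluster slaves
    Returns (source, refresh, delete, add):
      source  - MT holding the most recent copy (diR)
      refresh - MTs whose old copy is overwritten by diR
      delete  - MTs that drop their copy
      add     - MTs without a copy that receive diR
    """
    # Assumption: the copy with the largest TTL is the most recent one.
    ranked = sorted(holders, key=holders.get, reverse=True)
    source, others = ranked[0], ranked[1:]
    eps = len(holders)

    if eps > zeta:
        # Keep zeta copies: the source plus the zeta-1 next most recent holders.
        return source, others[:zeta - 1], others[zeta - 1:], []

    # eps <= zeta: refresh every old copy and add zeta-eps new hosts,
    # preferring the MTs with the most free cache space.
    candidates = sorted((mt for mt in free_space if mt not in holders),
                        key=free_space.get, reverse=True)
    return source, others, [], candidates[:zeta - eps]


too_many = ksc_plan({"m1": 50, "m2": 90, "m3": 10, "m4": 70}, {}, zeta=2)
too_few = ksc_plan({"m1": 50}, {"m1": 5, "m2": 40, "m3": 80, "m4": 10}, zeta=3)
exact = ksc_plan({"m1": 50, "m2": 90}, {"m3": 20}, zeta=2)
```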

5. ACCS Overheads and Added Penalties

The added overheads of the proposed ACCS strategy can be summarized in three main types, which are: (i) Time Penalty (TP), (ii) Storage Penalty (SP), and (iii) Bandwidth Penalty (BP). ACCS may suffer from several sources of TP. One source is the time taken to fill the CM's routing table with the valid paths (shortest as well as alternatives) among the cluster members. This mining process starts immediately after the cluster formulation. However, this task does not affect the network operation, since the network starts in the stateless mode and remains in it until the path mining process terminates. Another source of TP is the time spent by the fuzzy inference system in choosing the victim item that should be deleted from the local cache during the cache replacement process. However, the calculations can be pipelined: once the cache is filled with items, the calculations are done in advance, so that a victim has already been chosen for the next replacement by the time a new item is needed. On the other hand, there is no time overhead for applying the derived prefetching rules used to decide whether or not to cache an item that is passing by an MT. This is simply because such a process does not inhibit the node from doing its current work.

On the other hand, ACCS suffers from a small additional SP, which is imposed on the nodes' caches for storing the items' envelopes. However, the size of the items' envelopes does not have a significant impact on an MT's cache. As an envelope contains only a few keywords, its size can be neglected compared with the size of the item. Moreover, the proposed Keep Sufficient Copies (KSC) procedure, which resides at the cluster masters, compensates for this drawback, as it saves cache space inside the cluster while maintaining data availability at a satisfactory level. Finally, the BP of ACCS is due to the messages used by the KSC procedure. However, the messages used to apply KSC are unicast ones. So, they will not have a bad impact on the overall network performance, especially with the proposed built-in routing strategy, which avoids the packet flooding that reactive routing protocols use to set up a path for sending data, and which significantly minimizes the query delay.

6. Experimental Results

In this section, ACCS is compared against traditional caching schemes. The caching strategies have been evaluated using the NS2 simulator with the AODV routing protocol. NS2 has proven to be a good simulator that is able to validate new protocols in near-real situations and scenarios. It has become the basic tool used to implement almost all new techniques and protocols in both routing and caching, such as [30-32]. The different simulation parameters are listed in table 8.

Table 8, The different parameters used in the simulation.

Parameter                         Assigned value
Simulation area                   1000 x 1000 meters
Simulator                         NS2
Number of nodes                   50 to 100 nodes
Routing protocol                  AODV for all caching techniques except ACCS, which employs its built-in table-driven routing technique
Behavior                          Stateless then stateful
Communication range               300 meters
Client cache size                 200 to 1500 KB
Server database size              1000 items
Network bandwidth                 2 MBps
Mobility model                    Random WayPoint [33]
Average data item size            25 Kb
Average node speed                1 m/sec
Replacement policy                ACR
Size of data item                 5 to 10 KB
η and χ                           1
Number of network clusters (θ)    5 (randomly positioned)
Sufficient copies ζ               2
Length of time stamp ξ            120 seconds
Simulation time                   10 minutes

As illustrated in table 8, some of these parameters, such as the simulation area, the routing protocol, the mobility model, the node speed, and the communication range, are chosen based on previous efforts in the literature [1,9,13,19,22]. Others, such as the simulation time, cache size, and database size, are assumed to have a single value or a range of values, which are unified for all caching techniques. Other parameters are specific to the proposed caching strategy. Some of them are used with all their possible values, such as the behavior, which may be stateless or stateful. Others are chosen to save the local cache space of the MTs, such as the number of copies, which was set to two. In this section, we conducted extensive experiments aiming to answer the following four research questions:

Research Question 1: How can the performance of recent caching strategies in MANETs, as well as the proposed one, be evaluated; which performance metrics should be carefully evaluated; and which of these caching strategies performs better?

Research Question 2: What are the effects of each contribution on the performance of the proposed caching strategy? As the proposed caching strategy includes several contributions, it is essential to evaluate each contribution alone while discarding the others. Hence, it will be easy to study its effect on the overall system performance. To accomplish such aim, the proposed caching strategy has been evaluated in several scenarios (e.g., excluding/ including ACR and excluding/ including the proposed prefetching policy).




Research Question 3: What are the effects of the external parameters on the performance of the proposed caching strategy? As with all MANET caching strategies, several parameters can potentially affect the performance of ACCS. It is essential to study the effect of those parameters on the performance of the proposed caching strategy. Accordingly, we can determine the degree of sensitivity of the proposed strategy to a certain parameter. Within the context of this paper, several parameters are considered, which are the number of MTs, the number of clusters, and the MTs' pause time.

Research Question 4: Is the proposed caching strategy applicable in MANET?

Since ACCS includes several contributions, it is important to prove that those contributions can work together effectively. Hence, ACCS, including all its contributions, should be evaluated against recent caching strategies.

6.1. Performance Metrics

The performance metrics that will be used to compare the proposed caching strategy against the related schemes are the average query delay time, the average hop count, and the cache hit ratio of data items (which includes the local cache hit, the remote cache hit at a remote caching node, and the global cache hit at the server). Those performance metrics are illustrated in the following definitions.

Definition 22: Cache Hit Ratio (HR) is the ratio of the number of successful requests to the total number of requests (i.e., the ratio of requested data items being in cache (local or remote) and valid), which can be calculated by (7).

$$HR = \frac{N_{hit}}{N_{Total}} = \frac{N_{hit}}{N_{hit} + N_{miss}} \tag{7}$$

Where NTotal is the total number of requests, Nhit is the number of requested items that have been found in the local or remote cache and are valid, and Nmiss is the number of requested items that have been found in neither the local nor the remote caches, or found but invalid.

Definition 23: Cache Miss Ratio (MR) is the ratio of the number of failed requests to the total number of requests, which can be calculated by (8).

$$MR = \frac{N_{miss}}{N_{Total}} = \frac{N_{miss}}{N_{hit} + N_{miss}} \tag{8}$$

Definition 24: In Cache Ratio (ICR) is the ratio of requested data items being in cache (local or remote), valid or invalid, to all requested items. ICR can be calculated by (9).

$$ICR = \frac{N_{InCache}}{N_{Total}} = \frac{N_{hit} + N_{miss} - N_{nf}}{N_{hit} + N_{miss}} \tag{9}$$

Where Nnf is the number of times the requested data is not found in the local or remote caches.

Definition 25: Average Query Delay (AQD). The query delay is the time elapsed between the moment the query is sent by the requester and the moment the data is transmitted back to it (for successful queries only), while the average query delay is the query delay averaged over all successful queries sent by all requesters. AQD can be calculated using (10).

$$AQD = \frac{\sum_{succ\_Requests} T_{R \leftrightarrow S}}{N_{Succ}} \tag{10}$$

Where TR↔S is the query delay time and NSucc is the number of successful queries.

Definition 26: Average Hop Count (AHC). The hop count is the number of hops between the requester and the source of data (for successful requests only), while the average hop count is the hop count averaged over all successful queries sent by all requesters. AHC can be calculated using (11).

$$AHC = \frac{\sum_{succ\_Requests} H_{R \leftrightarrow S}}{N_{Succ}} \tag{11}$$

Where HR↔S is the hop count and NSucc is the number of successful queries.
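The metrics of Definitions 22 to 26 can be computed from simple counters, as in this sketch. The function name, its arguments, and the sample numbers are hypothetical. The in-cache count is taken as Nhit + Nmiss - Nnf, which assumes that every miss is either a not-found request or a found-but-invalid one.

```python
def cache_metrics(n_hit: int, n_miss: int, n_nf: int,
                  delays: list, hops: list) -> dict:
    """Compute HR, MR, ICR, AQD and AHC.

    n_hit:  requests answered from a valid local or remote cached copy
    n_miss: requests not found in any cache, or found but invalid
    n_nf:   requests not found in any cache (a subset of n_miss)
    delays: query delay of each successful query
    hops:   hop count of each successful query
    """
    n_total = n_hit + n_miss
    if n_total == 0 or not delays or len(delays) != len(hops):
        raise ValueError("need at least one request and matching delay/hop lists")
    return {
        "HR": n_hit / n_total,
        "MR": n_miss / n_total,
        "ICR": (n_total - n_nf) / n_total,
        "AQD": sum(delays) / len(delays),
        "AHC": sum(hops) / len(hops),
    }


# Hypothetical counters from one simulation run.
metrics = cache_metrics(n_hit=60, n_miss=40, n_nf=30,
                        delays=[0.1, 0.3], hops=[1, 3])
```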

6.2. Testing the Proposed Adaptive Cooperative Caching Strategy (ACCS) Excluding ACR

In this section, the aim is to test the proposed adaptive caching strategy (i.e., ACCS) against the recent and widely used caching techniques, which were explained in detail in section 3. For an accurate assessment of ACCS, we have excluded the proposed cache replacement technique (i.e., ACR) in this experiment. Instead, we have employed a unified cache replacement technique for all competitors, which is LRU [12]. Moreover, the proposed ACCS is implemented with no prefetching policy. Hence, the usefulness of ACCS as a caching strategy can be accurately highlighted.

The Random Way Point mobility model was generated using the scenario generation tool supported by NS2, in which MTs are configured with a constant pause interval of 100 seconds. Figure 21 shows the cache hit rate using 100 MTs (i.e., n=100) against cache sizes from 200 to 1600 KB. On the other hand, figure 22 illustrates the hit rate using a cache size of 1600 KB against different numbers of MTs. As illustrated in these figures, the cache hit ratio generally increases as the cache size increases.

It is noted that the performance of SC is the worst. This happens because SC does not utilize the neighbors' caches; a query is answered either by the MT's own cache or, in the case of a local cache miss, by the remote data source (server). CD, on the other hand, performs better than SC, especially for small data sizes. CD also outperforms CP, as the latter is highly sensitive to topology changes, which degrades its performance. However, as HC combines the advantages of cache data and cache path, it performs better than both of them. GC outperforms HC and CD as it is able to access the local caches of the group members, which in turn (virtually) increases the available cache size. In other words, the cumulative cache of a group is certainly larger than that of a single MT, which provides better performance. ZC, on the other hand, serves more requests through local and zone cache hits than through global hits at the server. By inspecting the performance of NC, we notice that it has a high hit rate as it utilizes the caching space of the inactive neighbors. It is also notable that GC introduces a higher cache hit ratio than the pre-mentioned ones. This is simply because it partitions the network into one-hop neighbor clusters. In such a caching strategy, a cluster state node (CSN) maintains the global cache state information, which summarizes the information of all cached content in the entire network, while the cluster members store distinct data in their caches. Accordingly, these cached data items improve the cache hit ratio based on local hit, cluster hit, and remote hit. ZC introduces good performance due to the good utilization of the zone cache in the case of a local miss. GCC also introduces good performance as it divides the network into several clusters based on geographical network proximity. Also, nodes in a cluster effectively interact with each other, which enhances the system caching performance. The use of the cache state node (CSN), which maintains the global cache state (GCS) information of the different network clusters, has a good impact on the remote hit rate.
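The hit categories mentioned above (local hit, cluster hit, and remote hit) can be illustrated with the following sketch. It assumes that a request is checked against the local cache first, then against the caches of the cluster members, then against the caches of other clusters, and only then falls back to the server; it also assumes that the hit ratio counts every request served from any cache. The function names and the set-based cache model are hypothetical and are not taken from any of the compared schemes.

```python
from collections import Counter


def classify_request(item, local_cache, cluster_caches, remote_caches):
    """Return where `item` is found: 'local', 'cluster', 'remote', or 'server'."""
    if item in local_cache:
        return "local"
    if any(item in cache for cache in cluster_caches):
        return "cluster"
    if any(item in cache for cache in remote_caches):
        return "remote"
    return "server"  # missed in every cache: fetched from the data source


def hit_ratio(outcomes):
    """Fraction of requests served from any cache rather than from the server."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts["server"]) / total
```

Under these assumptions, enlarging any of the three cache levels can only move requests away from the `server` outcome, which is the intuition behind the higher hit ratios of the cooperative schemes.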

Figure 22 shows the cache hit rate using a 1600 KB cache size against the number of nodes, from 50 to 100. By inspecting this figure, it is generally noted that the cache hit ratio increases as the number of MTs increases. In SC, increasing the number of MTs increases the network size, but it has little effect on the hit rate as SC does not utilize the cache contents of the requester's neighbors. Moreover, as an MT's local cache cannot hold a large amount of data, a request is usually answered by the server rather than by the MT's own cache. In CD, the cache size increases with the number of MTs. CP stores the path for data with the largest TTL and the smallest hop count. HC, on the other hand, performs better as it avoids the weaknesses of CD and CP. Increasing the number of MTs increases the cumulative cache size in zones and groups of MTs. Hence, ZC, GC, and NC are able to cache more data than the pre-mentioned caching techniques. When the number of MTs increases, GCC has more MTs in each cluster, which improves the cluster hit ratio and the remote hit ratio. Moreover, due to cluster cooperation, GCC outperforms the pre-mentioned techniques. It is also noted that the proposed ACCS outperforms all other caching techniques. It introduces the best hit rate, as depicted in figures 21 and 22, while keeping the miss rate at an acceptable level, as illustrated in figures 23 and 24. This is because of (i) the effective utilization of the cluster caching space due to redundancy removal, (ii) effective data availability, as it keeps sufficient copies of the cached items, and (iii) the good clustering as well as the perfect route learning policies.
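The effect of redundancy on the usable cluster caching space can be illustrated with a small worked example. This is only a sketch under simple assumptions: caches are modeled as sets of item identifiers, all items have the same size, and the helper names are hypothetical. It does not model how ACCS decides which copies to keep.

```python
def effective_cluster_capacity(member_caches):
    """Number of distinct items reachable within the cluster."""
    distinct = set()
    for cache in member_caches:
        distinct.update(cache)
    return len(distinct)


def redundancy(member_caches):
    """Fraction of occupied cache slots that hold duplicate copies."""
    slots = sum(len(cache) for cache in member_caches)
    if slots == 0:
        return 0.0
    return 1.0 - effective_cluster_capacity(member_caches) / slots
```

For three members with two slots each, a cluster that caches the same popular items everywhere exposes far fewer distinct items than one whose members hold different items. A strategy that keeps a few copies for availability, as reason (ii) above suggests, sits between these two extremes.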

Figure 25 illustrates the average query delay using 100 MTs against cache sizes from 200 to 1600 KB for all competing caching strategies. Generally speaking, the smaller the cache size, the fewer the required data items that can be stored, which in turn increases the average query delay as more items are fetched remotely. SC depends mainly on its own cache; if a node cannot find the data in its cache, it directs the request to the remote server, which incurs a long delay. CD and CP utilize the neighbors' caches, which in turn reduces the delay. HC performs better than CD and CP as it utilizes the strengths of both schemes. In GC, when an MT receives a request, it first checks its local cache. If a local miss takes place, it selects the proper group members using the group table stored at the master node. ZC utilizes the local cache and the zone members' caches using the MT's own table. Accordingly, ZC achieves a lower delay than GC. NC utilizes a strategy similar to that of ZC, which has a good impact in reducing the query delay. In GCC, each MT maintains its own cluster information. Moreover, the cluster state node maintains information about other clusters. Hence, the needed data is more frequently obtained through a local hit, cluster hit, or remote hit than by requesting it from the remote server. Figure 25 shows that GCC is better than the pre-mentioned techniques. Figure 25 also depicts the strength of the proposed caching strategy. As ACCS relies on a built-in table-driven routing strategy while the other strategies employ AODV, which is an on-demand routing protocol, the data transfer process is accelerated, which in turn minimizes the query delay for ACCS. By increasing the cache size, MTs are able to access most of their needed data from local, cluster, and remote caches, which minimizes the query delay.

Figure 26 illustrates the average query delay using a cache size of 1600 KB against the number of MTs, from 50 to 100. Generally, as the number of MTs increases, the delay also increases. This happens because more nodes compete for limited bandwidth, causing the average query delay to grow, albeit at different rates for the different schemes. The performance of SC is also degraded by increasing the number of MTs. SC utilizes only the contents of its local cache, while the remote server is called in the case of a local miss. With many MTs, the network size increases, which increases the number of hops from the MT to the remote server. Hence, replying to a remote query in the case of a local miss takes more time. CD and CP outperform SC since SC uses its local cache only. It is also noted that GC performs better than HC since GC utilizes the caches of the group members to access the required data. The query delay of ZC and NC is higher than that of GCC. This is clearly because increasing the number of MTs in the network increases the cluster size, and thus the size of the cluster cache and remote caches, which reduces the query delay in GCC. Figure 26 also records the effectiveness of ACCS. As illustrated in this figure, ACCS outperforms all the considered caching techniques as it introduces the minimum query delay.

Figure 21, Hit Rate against cache size using 100 MTs

Figure 22, Hit Rate against number of MTs using 1600 KB cache size

Figure 23, Miss Rate against cache size using 100 MTs

Figure 24, Miss Rate against number of MTs using 1600 KB cache size

Figures 27 and 28 illustrate the average hop count between source and destination against the cache size and the number of MTs, respectively. The hop count refers to the number of hops in a path between a query source and its supplier. To reduce energy usage and the query delay, the number of hops between the query source and the query destination should be as small as possible. For all caching techniques except ACCS, AODV provides the hop count information between the source and destination of the query. ACCS, however, implements a path learning strategy that learns the shortest path, which has a good impact in decreasing the hop count.
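To make the hop count metric concrete, the sketch below computes the minimum number of hops between a query source and its supplier on a static snapshot of the topology. It uses a plain breadth-first search, which is an assumption made for illustration; it models neither AODV route discovery nor the path learning strategy of ACCS. The adjacency-dictionary representation and the function name are hypothetical.

```python
from collections import deque


def hop_count(adjacency, source, destination):
    """Minimum number of hops from source to destination, or None if unreachable."""
    if source == destination:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == destination:
                return hops + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # no path exists in this snapshot of the topology
```

Averaging this value over all answered queries gives one possible reading of the average hop count plotted in figures 27 and 28. A query answered from the local cache contributes zero hops, which is why a higher local hit rate lowers the average.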

Figure 25, Average Query Delay against cache size using 100 MTs

Figure 26, Average Query Delay against number of MTs using 1600 KB cache size

Figure 27, Average Hop Count against cache size using 100 MTs

Figure 28, Average Hop Count against number of MTs using 1600 KB cache size

6.3. Studying the Effect of Mobility

The same scenario described in section 6.2 is repeated; however, we focus on the impact of two essential network parameters, namely MT density and mobility, on the considered performance measures. Initially, the Random Way Point mobility model is employed to investigate the effect of mobility on the relative performance of the competing caching strategies, assuming a cache size of 1600 KB and 100 MTs. To accomplish this aim, the different performance parameters are measured for the different caching strategies considering different pause times. Results are illustrated in figures 29-32.
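The role of the pause time in the Random Way Point model can be sketched as follows: a node repeatedly picks a random destination, travels to it at a random speed, and then pauses before moving again, so a longer pause time means lower mobility. The code is a simplified illustration and not the NS2 scenario generator; the area dimensions, the speed range, and the function name are assumed values chosen only for this example.

```python
import math
import random


def random_waypoint(steps, area=(1000.0, 1000.0), speed_range=(1.0, 20.0),
                    pause_time=100.0, seed=0):
    """Return a list of (arrival_time, x, y) waypoints for a single node."""
    rng = random.Random(seed)
    x, y = rng.uniform(0, area[0]), rng.uniform(0, area[1])
    t = 0.0
    trace = [(t, x, y)]  # initial position at time zero
    for _ in range(steps):
        nx, ny = rng.uniform(0, area[0]), rng.uniform(0, area[1])
        speed = rng.uniform(*speed_range)  # assumed range, metres per second
        t += math.hypot(nx - x, ny - y) / speed  # travel time to the waypoint
        x, y = nx, ny
        trace.append((t, x, y))  # arrival at the waypoint
        t += pause_time  # the node stays still before the next move
    return trace
```

Sweeping `pause_time` while keeping the other parameters fixed corresponds to the kind of experiment reported in this subsection, where the largest pause time represents the least mobile scenario.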

Figure 29, Hit Rate against Pause time using 1600 KB cache size for 100 MTs

Figure 30, Miss Rate against Pause time using 1600 KB cache size for 100 MTs

Figure 31, Average Query Delay against Pause time using 1600 KB cache size for 100 MTs

Figure 32, Average Hop Count against Pause time using 1600 KB cache size for 100 MTs

Figure 33, hit rate for SC against pause time and number of MTs

Figure 34, hit rate for CD against pause time and number of MTs

Figure 35, hit rate for CP against pause time and number of MTs

As depicted in figures 2931, it is found that all caching strategies behave well under low mobility rates, while their performance degrades in the case of highly mobile MTs. It is observed that the high mobility drastically affect the caching performance. The best hit rate as well as the minimum average query delay is obtained at the minimum mobility rate (e.g., when pause time equals 100 sec). It is also concluded that ACCS outperforms the other strategies even at high mobility rates. On the other hand, SC introduces the lowest performance as it badly affected by increasing MT‟s mobility rates.

To study the combined effect of the MTs' density and mobility, both the hit rate and the average query delay are measured considering different mobility rates and MT densities for all competing caching strategies. As illustrated in figures 33-50, our main conclusion is that all caching strategies introduce their best performance, in terms of the highest hit rate and the minimum average query delay, at the minimum MT mobility rate and the highest MT density. Results also show that ACCS demonstrates the best performance. It is found that ACCS is only slightly affected by the increase of the MTs' mobility, which makes it suitable for highly dynamic networks such as Vehicular Ad hoc Networks (VANETs). This is due to its efficient built-in routing strategy, which keeps several alternative paths between every pair of nodes, as well as the effective KSC procedure it employs. Although the performance of ACCS, like that of all other competing caching strategies, is badly affected by decreasing the number of MTs, it can be concluded that it introduces the minimal performance degradation compared with the others. In particular, this indicates the suitability of ACCS for cache management in low-density ad hoc networks.

Figure 36, hit rate for NC against pause time and number of MTs

Figure 37, hit rate for HC against pause time and number of MTs

Figure 38, hit rate for GC against pause time and number of MTs

Figure 39, hit rate for ZC against pause time and number of MTs

Figure 40, hit rate for GCC against pause time and number of MTs

Figure 41, hit rate for ACCS against pause time and number of MTs

Figure 42, Average Query Delay for SC against pause time and number of MTs

Figure 43, Average Query Delay for CD against pause time and number of MTs

Figure 44, Average Query Delay for CP against pause time and number of MTs

Figure 45, Average Query Delay for NC against pause time and number of MTs

Figure 46, Average Query Delay for HC against pause time and number of MTs

Figure 47, Average Query Delay for GC against pause time and number of MTs

Figure 48, Average Query Delay for ZC against pause time and number of MTs

Figure 49, Average Query Delay for GCC against pause time and number of MTs

Figure 50, Average Query Delay for ACCS against pause time and number of MTs

6.4. Testing the Proposed Adaptive Cooperative Caching Strategy (ACCS) Using ACR

In this section, the aim is to prove the compatibility of two of the main proposals of this paper, which are ACCS and ACR. The scenario discussed in section 6.2 is repeated again. However, ACCS is implemented twice, while excluding the prefetching policies. In the first implementation, when a local cache miss takes place with insufficient free space, ACCS replaces items using traditional replacement techniques. We have chosen six replacement techniques, three of which are uncoordinated, namely LRU [12], SXO [13], and LUV [34], while the remaining three are coordinated, namely TDS [35], LUV mi [34], and CV [36]. On the other hand, the second implementation of ACCS employs the proposed cache replacement technique (i.e., ACR). Hence, we can verify not only the effectiveness of ACR but also that ACR works effectively with ACCS. The same pre-mentioned parameters are measured, namely hit and miss rates, average query delay, and average hop count, against the cache size and the number of MTs.

As illustrated in figures 51 and 52, the cache hit rate generally increases with the cache size when implementing ACCS using any of the replacement schemes, while the miss rate decreases. On the other hand, the situation is reversed when increasing the number of MTs, due to the competition for the limited network resources, as illustrated in figures 53 and 54. Considering figures 51-54 again, it is concluded that implementing ACCS using ACR improves the hit rate and minimizes the miss rate against both the cache size and the number of MTs. However, it is also noted that the hit and miss rates of the different replacement strategies are close to each other. This is due to the effectiveness of the proposed ACCS: if a local cache miss takes place, ACCS searches for a valid copy remotely, which minimizes the miss rate and maximizes the hit rate for all techniques. The effectiveness of ACR lies in minimizing local cache misses. Hence, as illustrated in figures 55 and 56, ACR minimizes the average query delay as most requests can be served locally. Also, as illustrated in figures 57 and 58, ACR minimizes the average hop count for the same reason.

Figure 51, hit rate against cache size for different replacement techniques.

Figure 52, miss rate against cache size for different replacement techniques.

Figure 53, Hit rate against number of mobile terminals for different replacement techniques.

Figure 54, Miss rate against number of mobile terminals for different replacement techniques.

Figure 55, Average query delay against cache size for different replacement techniques.

Figure 56, Average query delay against the number of mobile terminals for different replacement techniques.

Figure 57, Average hop count against cache size for different replacement techniques.

Figure 58, Average hop count against number of mobile terminals for different replacement techniques.

6.5. Testing the Proposed Prefetching Policy

In this section, the aim is to test the efficiency of the proposed prefetching policy. To accomplish this aim, the proposed caching strategy (i.e., ACCS) is implemented in two different scenarios. In the first, ACCS is implemented with no prefetching policy, while in the second, ACCS is implemented using the proposed DPP. Hence, the impact of DPP on the overall system performance (i.e., hit and miss ratios, average query delay, and average hop count) can be clarified.

Figure 59, Hit Rate against cache size for ACCS with and without DPP prefetching using 100 MTs

Figure 60, Hit Rate against number of MTs for ACCS with and without DPP prefetching using 1600 KB cache size

Figure 61, Miss Rate against cache size for ACCS with and without DPP prefetching using 100 MTs

Figure 62, Miss Rate against number of MTs for ACCS with and without DPP prefetching using 1600 KB cache size

Figure 63, Average query delay against cache size for ACCS with and without DPP prefetching using 100 MTs

Figure 64, Average query delay against number of MTs for ACCS with and without DPP prefetching using 1600 KB cache size

Figure 65, Average hop count against cache size for ACCS with and without DPP prefetching using 100 MTs

Figure 66, Average hop count against number of MTs for ACCS with and without DPP prefetching using 1600 KB cache size

As illustrated in figures (5966), implementing the proposed caching strategy using DPP improves the system‟s performance as the hit rate was maximized, while the average miss rate, average query delay, and miss rate were minimized. This happened because, each MT has the ability to pre-fetch those items that are predicted for use in the near future. 6.6.

Measuring the systems performance against number of clusters

ED

M

AN US

CR IP T

The proposed clustering technique is able to group the network MTs regardless of the predefined number of clusters. It is nevertheless important to estimate the time required for network installation, i.e., the time needed to set up the network clusters (τ). Hence, we can show that the proposed clustering technique works efficiently with different numbers of clusters (θ). To accomplish this aim, network MTs were randomly deployed within the pre-defined simulation area, and the time needed to completely install the network was then measured against the number of network MTs using different numbers of clusters.

Figure 67, Network installation time against number of MTs for ACCS with different number of clusters.

As illustrated in figure 67, the network installation time increases as the number of MTs increases and as the number of clusters decreases. The cause is that organizing many MTs into clusters takes more time, especially for MTs that are far away from the pre-defined cluster masters, as they take more time to determine their clusters. On the other hand, by increasing the number of clusters, more clusters are formed in parallel, which causes the network nodes to be grouped and organized into clusters in less time. By inspecting figure 67, it is concluded that the proposed clustering technique is able to work efficiently even with a small number of clusters.

7. Conclusion

A MANET is a wireless network in which several interacting MTs are able to exchange information on the move in a peer-to-peer fashion. The main goal of this type of network is to set up a communication service for specialized and customized applications where there is no pre-installed infrastructure (such as battlefield or jungle explorations) or where the infrastructure has failed (e.g., earthquakes). Data caching has been suggested to reduce the data traffic as well as the access latency in MANETs. Cooperative caching can improve the data retrieval process since requests are served from the local or neighbors' caches rather than from the remote data source each time. Moreover, the latency of fetching the data from the remote data source is reduced.

The originality of this paper lies in introducing an adaptive cooperative caching strategy (ACCS) with novel cache replacement and prefetching policies. ACCS divides the network into non-overlapping clusters, each of which is formed by a set of neighboring mobile nodes. It has its own table-driven routing scheme, which minimizes the routing latency. ACCS has been compared against recent cooperative caching strategies. Experimental results have shown that ACCS outperforms the other strategies as it introduces the maximum cache hit rate as well as the minimum query delay.

8. Future Work

An extension of the work presented in this paper is to consider the effect of selfish nodes, which disturb the flow of data across the network. Selfish nodes are those malicious nodes that do not forward others' packets or by-passing data requests. They try to maximize their own benefits at the expense of all others, as they want to preserve their resources and energy. Several modifications should be made to allow the proposed caching system to deal successfully with this type of node, if present in the network, and to study their effect on the system's performance.

9. References

[1] H. Nishiyama, T. Ngo, N. Ansari, and N. Kato, "On Minimizing the Impact of Mobility on Topology Control in Mobile Ad Hoc Networks", Wireless Communications, IEEE Transactions, vol.(11), no.(3), pp. 1158-1166, 2012.
[2] A. Dorri, S. Kamel, and E. Kheyrkhah, "Security Challenges in Mobile Ad Hoc Networks: A Survey", International Journal of Computer Science & Engineering Survey (IJCSES), vol.(6), no.(1), pp. 15-29, 2015.
[3] S. E. Khawaga, A. I. Saleh, H. A. Ali, "An Administrative Cluster-based Cooperative Caching (ACCC) strategy for Mobile Ad Hoc Network", Journal of Network and Computer Applications, vol.(69), no.(3), pp. 54-76, 2016.
[4] N. Abbas, E. Ozen, and M. Ilkan, "The Impact of TCP Congestion Window Size on the Performance Evaluation of Mobile Ad Hoc (MANET) Routing Protocols", International Journal of Wireless & Mobile Networks (IJWMN), vol.(7), no.(2), pp. 29-43, 2015.
[5] B. Kim and I. Kim, "Multimedia Caching Strategy Based on Popularity in Mobile Ad-Hoc Network", International Journal of Multimedia and Ubiquitous Engineering, vol.(10), no.(4), pp. 173-182, 2015.
[6] L. Yao, J. Deng, J. Wang, G. Wu, "A-CACHE: An anchor-based Public Key Caching Scheme in Large Wireless Networks", Computer Networks, vol.(87), no.(2), pp. 78-88, 2015.
[7] R. Prabha, G. Ganesan, "Cluster based Efficient File Replica in P2P Mobile Ad Hoc Network", Communications on Applied Electronics, vol.(1), no.(3), pp. 33-35, 2015.
[8] J. Shim, P. Scheuermann, and R. Vingralek, "Proxy Cache Algorithms: Design, Implementation, and Performance", IEEE Transaction in Knowledge and Data Engineering, vol.(11), no.(4), pp. 1-32, 1999.
[9] M. Taghizadeh, K. Micinski, C. Ofria, E. Torng, and S. Biswas, "Distributed Cooperative Caching in Social Wireless Networks", IEEE Transactions on Mobile Computing, vol.(12), no.(6), pp. 1037-1053, 2013.
[10] J. Francisco, G. Cañete, E. Casilari, and A. Cabrera, "A Cross Layer Interception and Redirection Cooperative Caching Scheme for MANETs", EURASIP Journal on Wireless Communications and Networking, vol.(63), no.(2), pp. 1-21, 2012.
[11] P. Joy and K. Jacob, "Cache Replacement Policies for Cooperative Caching in Mobile Ad hoc Networks", IJCSI International Journal of Computer Science Issues, vol.(9), no.(2), pp. 1-6, 2012.
[12] A. Silberschatz, P.B. Galvin, and G. Gagne, "Operating System Concepts", John Wiley and Sons, 2004.
[13] L. Yin and G. Cao, "Supporting Cooperative Caching in Ad Hoc Networks", IEEE Transaction on Mobile Computing, vol.(5), no.(1), pp. 77-89, 2006.
[14] J. Zhao, P. Zhang, G. Cao, and C. Das, "Cooperative caching in wireless P2P networks: design, implementation and evaluation", IEEE Transactions in Parallel and Distributed Systems, vol.(21), no.(2), pp. 229-241, 2010.
[15] L. Yin and G. Cao, "Supporting cooperative caching in ad hoc networks", IEEE Transactions in Mobile Computing, vol.(5), no.(1), pp. 77-89, 2006.
[16] M. Fiore, F. Mininni, C. Casetti, D. Chiasserini, "To cache or not to cache?", Proceedings of the IEEE Conference on Computer and Communications (INFOCOM 2009), Rio de Janeiro, Brazil, 2009.
[17] M. Denko and J. Tian, "Cross-Layer Design for Cooperative Caching in Mobile Ad Hoc Networks", Proceeding of IEEE Consumer Communications and Networking Conference, 2008.
[18] J. Francisco, G. Cañete, E. Casilari, and A. Cabrera, "A Cross Layer Interception and Redirection Cooperative Caching Scheme for MANETs", EURASIP Journal on Wireless Communications and Networking, vol.(63), no.(2), pp. 1-21, 2012.
[19] G. Cao, L. Chita, and R. Das, "Cooperative Cache-Based Data Access in Ad Hoc Networks", Proceeding in journal of Computer, IEEE Society, vol.(27), no.(2), pp. 32-39, 2004.
[20] B. Tang, G. Das, "Benefit-Based Data Caching in Ad Hoc Networks", IEEE Transactions on Mobile Computing, vol.(7), no.(3), pp. 289-304, 2008.
[21] J. Cho, S. Oh, J. Kim, H. Ho, J. Lee, "Neighbor Caching in Multi-Hop Wireless Ad Hoc Networks", IEEE Communications Letters, vol.(7), no.(11), pp. 525-527, 2003.
[22] S. Umamaheswari and G. Radhamani, "Enhanced ANTSEC Framework with Cluster Based Cooperative Caching in Mobile Ad hoc Networks", IEEE International Journal in Communication and Networks, vol.(17), no.(1), pp. 40-46, 2015.
[23] N. Chand, R. Joshi, and M. Misra, "Efficient Cooperative Caching in Ad Hoc Networks", Proceeding of the first International Conference on Communication System Software and Middleware, New Delhi, pp. 1-8, August 2006.
[24] N. Chand, R. Joshi, and M. Misra, "A Cooperative Caching Strategy in Mobile Ad Hoc Networks Based on Clusters", International Journal of Mobile Computing and Multimedia Communications, vol.(3), no.(3), pp. 20-35, 2011.
[25] N. Chand and N. Chauhan, "Cooperative Caching in Mobile Ad Hoc Networks Through Clustering", International Conference on Software Engineering, Parallel and Distributed Systems, Cambridge University, UK, Feb. 20-22, pp. 78-83, 2011.
[26] C. Chow, H. Leong, and A. Chan, "Peer-to-Peer Cooperative Caching in Mobile Environments", Proceedings of the 24th International Conference on Distributed Computing Systems, pp. 528-533, 2004.
[27] C. Tsourosa and M. Satratzemia, "Tree search algorithms for the dominating vertex set problem", International Journal of Computer Mathematics, vol.(47), no.(3), pp. 127-133, 1993.
[28] R. Piroozfar, "Fuzzy Logic: A Rule-Based Approach, in Search of A Justified Decision-Making Process in Urban Planning", Doctoral Thesis, Institute For Architecture and Urban Studies, Berlin University of Technology (Tu-Berlin), 2012.
[29] https://wordnet.princeton.edu/.
[30] R. Kumar, P. Singh, "Performance analysis of AODV, TORA, OLSR and DSDV Routing Protocols using NS2", International Journal of Innovative Research in Science, Engineering and Technology, vol.(2), no.(8), pp. 56-68, 2013.
[31] D. Trung, W. Benjapolakul, D. Minh, "Performance evaluation and comparison of different ad hoc routing protocols", Elsevier Comput Commun, vol.(30), pp. 2478-2496, 2007.
[32] N. Venkatadri and K. Ramesh, "Performance Metrics Comparison for On-demand Routing Protocols using NS2", International Journal of Advanced Research in Computer Science and Software Engineering, vol.(5), no.(3), 2015.
[33] C. Bettstetter, G. Resta, and P. Santi, "The Node Distribution of the Random Waypoint Mobility Model for Wireless Ad Hoc Networks", IEEE Transactions on Mobile Computing, vol.(2), no.(3), pp. 257-269, 2003.
[34] N. Chand, R. Joshi, and M. Misra, "Cooperative caching in mobile ad hoc networks based on data utility", Mobile Information Systems, vol.(3), no.(2), pp. 19-37, 2007.
[35] S. Lim, W. Lee, G. Cao, and C. Das, "A Novel Caching Scheme for Improving Internet-Based Mobile Ad Hoc Networks Performance", Ad Hoc Networks, vol.(4), no.(3), pp. 225-239, 2006.
[36] P. Kumar and N. Chauhan, "A Review of Cooperative Cache Management Techniques in MANETs", International Conference on Recent Trends in Soft Computing and Information Technology, pp. 1-7, 2010.
