Efficient and robust serial query processing approach for large-scale wireless sensor networks


Accepted Manuscript

Efficient and Robust Serial Query Processing Approach for Large-Scale Wireless Sensor Network Applications

A. Boukerche, A. Mostefaoui, M. Melkemi

PII: S1570-8705(16)30116-0
DOI: 10.1016/j.adhoc.2016.04.012
Reference: ADHOC 1386

To appear in: Ad Hoc Networks

Received date: 21 November 2015
Revised date: 9 April 2016
Accepted date: 28 April 2016

Please cite this article as: A. Boukerche, A. Mostefaoui, M. Melkemi, Efficient and Robust Serial Query Processing Approach for Large-Scale Wireless Sensor Network Applications, Ad Hoc Networks (2016), doi: 10.1016/j.adhoc.2016.04.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Ad Hoc Networks 00 (2016) 1–31

A. Boukerche (a), A. Mostefaoui (b,a), M. Melkemi (c)

(a) PARADISE Research Laboratory, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario, Canada, K1N 6N5. Email: [email protected]
(b) Femto-ST Institute, University of Franche-Comte, 1 Rue Engel Gros, Belfort 90000, France. Email: [email protected]
(c) LMIA, University of Haute-Alsace, 4 rue des freres Lumiere, 68093, Mulhouse. Email: [email protected]

Abstract


The goal of query processing in WSNs is to obtain reliable information of interest from sensor nodes while preserving, as much as possible, the network resources, mainly energy. Among the various approaches proposed in the literature to tackle this issue, serial approaches, in which the query is carried out serially from node to node, have shown noticeable improvements in query responsiveness and communication overhead when compared to centralized and distributed ones. Nevertheless, they suffer from two main drawbacks: (a) they are intrinsically very vulnerable and (b) they require the construction of a Hamiltonian path through the network, which is known to be an NP-complete problem. In this paper, we investigate the issue of efficient and robust query processing by proposing a novel approach, which we refer to as GBT (Greedy & Boundary Traversal). GBT is serialized and localized (i.e., no node maintains any knowledge about the topology of the network). Furthermore, in GBT the selection of the next hop is totally independent of the previous hops (i.e., no path is defined in advance). This feature reinforces the robustness of GBT, as attested by the simulation results we obtained (a mean reduction of almost 50% in communication, energy and query response time in large-scale network topologies). We also provide a complexity analysis (time, space and communication) of our query processing algorithm as well as a formal proof of its correctness (i.e., termination and completeness).


Keywords: Wireless sensor networks, query processing, Serial Algorithms, localized algorithms, robustness

———————————————————————

1. Introduction

Wireless Sensor Networks (WSNs) consist of numerous sensors (from several hundred to several thousand) deployed in a region of interest in order to monitor physical or environmental conditions according to the requirements of an application. They are capable of generating and collecting an enormous amount of data. Although the ultimate goal is to derive information of interest through queries sent to the nodes containing such data, the method used to process these queries has a great impact on the overall performance of the network, particularly when the sensors have limited energy.

This paper deals with query processing in WSNs: upon a request for information issued by the base station, also called the sink, sensor nodes in the network collect data and send it back to the sink, which is in charge of answering end-application/user queries. This process is referred to as query processing [1]. Because of the network's inherent constraints (i.e., limited energy, low communication and node reliability, large deployment, etc.), the way the network performs this task has a deep impact on its overall performance (e.g., required communications, consumed energy, query responsiveness, etc.).

One of the most obvious ways to handle queries in sensor networks is known as warehousing [2], also called the centralized approach, where the access to the sensor network and the processing of queries are separated. In other terms, raw data is first collected at the sink, and a centralized DBMS can then be used to provide access to the collected data through classical database queries. While the warehousing approach has the advantage of being simple and efficient in terms of answering queries, it suffers from two limiting drawbacks: accumulation of highly redundant data at the sink and, more importantly, over-utilization of communication resources in the WSN. As communications are known to be the most energy-consuming tasks in a WSN [1], their usage must be handled carefully in order to prolong the network lifetime, since sensors are driven by limited energy provisions (i.e., batteries). Hence, instead of sending all individual data to the sink, it would be more efficient, in terms of reducing communications and consequently consumed energy, to partially process the query locally and send this information as a single packet to the sink. This process is referred to as in-network processing.

Three main design considerations have to be taken into account when developing an in-network query processing technique: (a) its cost in terms of resource usage, mainly energy, (b) its ability to scale with the growing number of nodes in the network (i.e., scalability) and (c) its robustness against link and node failures, as WSNs are known to be prone to frequent failures.


1.1. Related work

Distributed approaches [3, 4, 5, 6], based on in-network processing, were proposed as interesting alternatives to centralized approaches, which exhibit poor scalability and poor robustness against link and node failures. In such approaches, nodes do not need to hold global knowledge about the current network topology, since each node communicates only with its immediate neighbors. The query answer is then successively computed through local computation from the exchanged data. The advantages of such approaches are numerous: (a) no central base station is required, as every node holds the estimate of the unknown parameter; (b) multi-hop communications are avoided (only direct communications between neighbors are used) and consequently maintaining routing data is no longer required; (c) better behavior is observed in the face of unreliable communications and nodes.

Flooding [7], for instance, is the first and simplest approach for in-network distributed query processing. In this approach, each sensor node broadcasts its local data (e.g., stored and received data) to its immediate neighbors. The process is repeated until all nodes hold all sensed data. Afterwards, each node can locally compute the query answer. However, this technique requires a significant number of transmissions and a large storage memory, which wastes the network's resources and decreases its lifetime considerably.

Alternatively, iterative approaches [8, 9] were proposed to reduce local storage requirements. In these approaches, each node maintains a local estimate of the query answer (i.e., the unknown parameter) that is updated iteratively with weighted data received from immediate neighbors. After several iterations, depending on the application requirements, the process is proven to converge to the right value of the unknown parameter. Again, iterative approaches, while more robust, require a great deal of communication. Furthermore, because of their synchronized iterations, these approaches generate a significant number of packet collisions, which may decrease the overall network performance, in particular query responsiveness.

Another class of distributed approaches, named serial approaches [10, 11, 12], was specifically designed to reduce the communication overhead previously observed in both centralized and iterative approaches. The basic idea behind these approaches is that one node initiates the query and the answer is successively updated from node to node until all nodes in the network have contributed to it; i.e., the query visits all nodes serially. The last visited node holds the right answer to the query [11, 10]. As stated in [10], serial algorithms significantly reduce the communications needed in comparison to centralized and iterative approaches. Furthermore, because of their serial nature, these approaches do not require a sophisticated MAC layer, since all the communications in the network are serial; i.e., at each step, only one node is required to communicate. Our study indicates that this interesting feature noticeably improves the query response time (see Section 8).
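To make the serial principle concrete, the following is a minimal sketch, not the algorithm of any cited work. It assumes a visiting order is already given and that the query carries a small running state; the names `Estimate` and `serial_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Running state carried by the query (hypothetical structure)."""
    total: float = 0.0
    count: int = 0
    maximum: float = float("-inf")

    def update(self, reading: float) -> None:
        # Each visited node folds its own reading into the carried state.
        self.total += reading
        self.count += 1
        self.maximum = max(self.maximum, reading)

    @property
    def mean(self) -> float:
        return self.total / self.count


def serial_query(readings, visit_order):
    """Pass one estimate from node to node; return it with the hop count."""
    estimate = Estimate()
    hops = 0
    for position, node in enumerate(visit_order):
        estimate.update(readings[node])
        if position < len(visit_order) - 1:
            hops += 1  # one transmission to hand the query to the next node
    return estimate, hops
```

With four nodes visited in sequence, the query needs three transmissions, and the last node holds both the maximum and the mean. If the visiting order skips the node holding the largest reading, the returned maximum is wrong, which is why completeness matters for such queries.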


1.2. Motivations

Although the theoretical foundations of serial approaches, specifically convergence proofs, have been discussed and provided in [10, 11, 13], some practical issues, to the best of our knowledge, remain open. In fact, serial approaches require the construction of a Hamiltonian path through the network (we note that previous research works implicitly assume the existence of such a path, which does not hold for every network configuration). Furthermore, even when such a Hamiltonian path exists, finding it is known to be an NP-complete problem [14], which is hard to solve, particularly in sparse and large-scale networks. Finding such a path in a decentralized manner, to ensure scalability, could generate a prohibitive communication cost. To overcome this limitation, the authors in [12] propose to use space-filling curves to derive a path that is not necessarily Hamiltonian (i.e., it could visit a node more than once). Nevertheless, the proposed approach does not handle all network topologies. Indeed, it remains suitable for dense and regular network topologies without holes, whereas it is not able to ensure query completeness (i.e., visiting all nodes in the network) in every topology. To illustrate, we take the example of Figure 1.

Figure 1: Example of a network deployment where serial approaches based on space-filling curves fail to visit all nodes.

As in [12], we take the same scan curve, oriented from left to right and from top to bottom (see Figure 1). Hence, all the nodes are “mapped” to this curve according to their locations; i.e., each node has its curve index. Then, the selection of the next hop is performed according to the curve indices of the unvisited neighbors; i.e., the current node selects, among its unvisited neighbors, the one with the next coordinate in the curve order. In the example of Figure 1, the node labeled 3 will select node 9 instead of node 4 because it is far in the curve order. This process is repeated as long as there are unvisited neighbors. Similarly, when node 16 holds the query, it selects node 17 rather than node 21. This process is repeated until reaching a node that has no unvisited neighbors. This is the case of node 20. When no unvisited neighbor can be selected, the authors propose to perform backtracking until a node with unvisited neighbors is reached. In the example, backtracking, illustrated with blue arrows, stops at node 16 since its neighbor 21 has not been visited yet. The process can then continue. In this network configuration with a hole, all red-colored nodes (i.e., nodes labeled 4 to 8) are left unvisited and consequently do not contribute to the query. Assume that the query is looking for the maximum and that one of these nodes holds this maximum. In this case, the returned result is wrong.

We must note that the presence of holes in real WSN deployments is the norm and not the exception, due to many environmental factors (bad weather conditions, presence of obstacles, etc.). For this reason, we have taken into account, from the early stages of the design of our algorithm, the presence of holes as well as the query completeness requirement. In fact, many realistic applications impose query completeness. For instance, queries looking for the maximum, the minimum or the number of live nodes have to visit all nodes, whatever the network configuration is; otherwise they are not able to provide correct results.

Another inherent limitation of serial approaches is their vulnerability. In fact, as the query is carried out following a path (i.e., serial nature), any cut in this path, particularly near its end, will interrupt the query. This risk becomes higher when, on the one hand, the path is defined in advance (as is the case with previous approaches based on space-filling curves) and, on the other hand, the network experiences several perturbations (e.g., link failures). The question that then arises is: how could serial approaches meet WSN requirements (i.e., scalability and, to a certain extent, robustness) while maintaining their performance?

1.3. Contributions


In this paper, we propose to overcome the weaknesses of serial approaches mentioned earlier by designing a localized and robust serial query processing algorithm. The localized nature of our proposal means that each node receiving the query has to select the next hop based only on local information (immediate neighborhood information) that is already available [15]. Consequently, the query path is constructed gradually and is not required to be known in advance. In other words, the selection of the next hop is totally independent of any previous selection, in the sense that no path is defined in advance; this is in contrast to space-filling-based approaches, in which the path is defined in advance according to the coordinates of the nodes on the curve. This unique feature of our query processing brings two benefits: (a) scalability, which is a fundamental requirement for protocol applicability to large-scale sensor networks, as the decision on the next hop is taken locally; and (b) lower sensitivity to the risk of query interruption, since the current node does not require knowledge about previous hops. Hence, it reinforces resilience to link failures.

Furthermore, as for any distributed and localized algorithm, we have to prove its theoretical correctness: specifically, we have to prove that our proposal does not fall into looping (i.e., it terminates and delivers a result). Additionally, to meet the query completeness requirement, we have to prove that our proposal also guarantees that all connected nodes are visited. We provide in this paper a theoretical proof of the correctness of our approach.

Finally, in order to validate our proposal experimentally, we have conducted several series of simulations in which we have compared our approach to state-of-the-art approaches (centralized, distributed and serial ones). The results obtained clearly show, on the one hand, the better performance of serial approaches over centralized and other distributed approaches and, on the other hand, the effectiveness of our algorithm over other localized algorithms.

The rest of the paper is organized as follows: Section 2 presents some preliminaries. In Section 3, we present our algorithm. Section 4 presents an extended version to cope with query completeness. We provide in Section 5 and Section 6 a complexity analysis of our approach and a proof of its correctness, respectively. Sections 7 and 8 present performance results and Section 9 concludes the paper.


2. Preliminaries

In this section, we introduce the network and communication model we considered. We also present the boundary traversal algorithm we developed, since we use it in different parts of our proposed serial query processing approach.

2.1. Network and communication model

We consider a geographic wireless network represented as a graph $G(N, E)$, where $N = \{N_1, \ldots, N_n\}$ is a finite set of sensor nodes and $E = \{E_{ij} \mid N_i, N_j \in N\}$ is a finite set of links. It is assumed that each node knows its location by means of any localization process [16]. We also assume that each node is aware of its 2-hop information. For every node $N_i$, we denote its neighbor set by $V_i = \{N_j \mid E_{ij} \in E\}$.

The commonly used communication model in wireless sensor networks is of a broadcast nature [1]; that is, when a node sends a packet, all neighbors within its transmission range receive it. Hence, it is worthwhile to note that the listening process does not incur any additional transmission (no explicit request is sent to the transmitter). This feature of the communication model will be used later on in our approach to update, without any additional communication cost, the visited-neighbors table of each node.

In our serial approach, the first node that launches the query is called the QIN (“Query Initiator Node”). For convenience of presentation, we summarize in Table 1 all the abbreviations used in the rest of the paper.

Abbreviation    Meaning
SL              Sweeping Line
BTI             Boundary Traversal Initiator
BN              Boundary Node
NBN             Network Boundary Node
QIN             Query Initiator Node
RF              Rule Flag
BTR             Boundary Traversal Rule
HER             Hole Exploration Rule
HCN             Hole Closer Node
UF              Unvisited Flag
FVN             First Visited Nodes
HXR             Hole eXit Rule

Table 1: Abbreviations used in the paper.
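The network model of Section 2.1 can be illustrated with a short sketch. It assumes undirected links given as pairs of node identifiers; the helper names `neighbor_sets`, `two_hop_info` and `overhear` are hypothetical and only mirror the neighbor sets, the 2-hop information and the overhearing-based update of the visited-neighbors table described in the text.

```python
def neighbor_sets(links):
    """Build the neighbor set of every node from an iterable of undirected links."""
    neighbors = {}
    for i, j in links:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    return neighbors


def two_hop_info(neighbors, node):
    """Return, for each neighbor of `node`, that neighbor's own neighbor set."""
    return {j: set(neighbors[j]) for j in neighbors[node]}


def overhear(visited, neighbors, sender):
    """Mark `sender` as visited in the table of every node that hears it.

    No extra transmission is modeled: listeners simply observe the broadcast.
    """
    for listener in neighbors[sender]:
        visited.setdefault(listener, set()).add(sender)
```

On a four-node chain 1-2-3-4, node 2 learns from its 2-hop information that node 3 also reaches node 4, and a transmission by node 2 updates the visited tables of nodes 1 and 3 at no extra cost.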


2.2. Boundary traversal algorithm

One important component of our serial query processing is the use of a boundary (e.g., hole or network) traversal algorithm. The objective is to always keep the message on the boundary; that is, all visited nodes belong to the same boundary.

Definition 1. (Boundary) A sequence of nodes $S = (N_1, N_2, N_3, \ldots, N_s)$ is said to form a boundary in the direction $\angle N_1 N_2 N_3$ iff: (a) the line segments representing the communication links formed from the elements of this sequence (i.e., $E_{12}, E_{23}, \ldots$) are not crossed by any line segment resulting from a communication link $E_{ij}$ such that $N_i, N_j \notin S$, or (b) there exists a link $E_{ij}$ such that $N_i \notin S$, $N_j \in S$ and $N_i$ is located in the direction of the boundary.
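The crossing test behind condition (a) of Definition 1 can be sketched as follows. This is only an illustration using a standard orientation-based segment intersection test; the function names and the `pos` mapping from node identifiers to planar coordinates are assumptions, and condition (b) is not covered.

```python
def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 for CCW, -1 for CW, 0 if collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def segments_cross(p, q, r, s):
    """True if segments pq and rs intersect at a single interior point."""
    return (orientation(p, q, r) * orientation(p, q, s) < 0
            and orientation(r, s, p) * orientation(r, s, q) < 0)


def crossed_by_outside_link(sequence, links, pos):
    """True if a link of the sequence is crossed by a link whose two ends lie outside it."""
    members = set(sequence)
    boundary_links = list(zip(sequence, sequence[1:]))
    for i, j in links:
        if i in members or j in members:
            continue  # only links with both endpoints outside the sequence matter here
        for u, v in boundary_links:
            if segments_cross(pos[u], pos[v], pos[i], pos[j]):
                return True
    return False
```

For a sequence (1, 2) lying on the x-axis and an outside link (3, 4) that cuts vertically through it, the function reports a crossing, so the sequence does not satisfy condition (a).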

Figure 2: Boundaries illustration, panels (a) and (b).


From Figure 2-(a), the following boundaries can be derived: $S_1 = (N_1, N_2, N_3, N_6, N_7)$, $S_2 = (N_1, N_2, N_3, N_5, N_6, N_7)$, $S_3 = (N_1, N_2, N_3, N_4, N_6, N_7)$, .... In Figure 2-(b), however, neither the sequence $S_4 = (N_1, N_2, N_4, N_5)$ nor the sequence $S_5 = (N_1, N_2, N_3, N_4, N_5)$ is considered a boundary. We note that the identification of a boundary depends on the algorithm used.

In the literature, several research works have proposed boundary traversal algorithms, mainly in the context of geographic routing. The localized algorithms that we are aware of are either based on the Unit-Disk-Graph (UDG) model [17, 18, 19] or require a planarized graph [20]¹. Although approaches based on the UDG model require only 1-hop information, they are not able to manage arbitrary topologies due to the presence of obstacles; i.e., some nodes are within the theoretical communication range of each other but cannot communicate. Approaches based on graph planarization, in turn, introduce additional overhead that could be prohibitive in large-scale networks.

In this paper, we developed a novel localized approach that neither assumes the UDG model nor uses graph planarization. The process always starts from a node located on the boundary, called the Boundary Traversal Initiator (BTI) (i.e., the node labeled 1 in Figure 3). The BTI first hinges a Sweeping Line (SL) that is oriented in the direction of the boundary, as shown in Figure 3 (dashed line). From this position, the BTI sweeps the SL counterclockwise until a neighbor is hit. This neighbor is then selected as the next hop and will receive the message. The process is then repeated by each visited node, starting the SL from its corresponding starting angle, defined as follows:

Definition 2. (Starting Angle) We define the starting angle of node $N_{i+1}$, denoted $S_{i+1}$, as the angle $\angle N_i N_{i+1} N_{i-1}$ if $\angle N_i N_{i+1} N_{i-1} < \pi$, and 0 otherwise, where $N_{i-1}$ is the previous hop of node $N_i$ and $N_{i+1}$ is the next hop of $N_i$.

¹ A planarized graph is a graph in which all crossing edges are removed, while maintaining its connectivity.


For instance, in the example of Figure 3, $S_3 = 0$ since the angle $\angle N_2 N_3 N_1 > \pi$. However, $S_6 = \angle N_3 N_6 N_2$ since $\angle N_3 N_6 N_2 < \pi$. The use of the starting angle ensures that the algorithm makes progress at each hop and prevents it from falling into a loop, as is known for the right-hand rule [20]. From an implementation point of view, the starting angle of node $N_{i+1}$ is computed at the previous hop by node $N_i$ and sent within the message.
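The sweeping step itself can be sketched as follows. This is a minimal illustration under stated assumptions: nodes have planar coordinates, and the sweep starts from an absolute direction `start_direction` given in radians. How the starting angle of Definition 2 maps to such an absolute direction is not reproduced here, and the function name `sweep_next_hop` is hypothetical.

```python
import math


def sweep_next_hop(current, neighbors, start_direction):
    """Return the neighbor first hit when sweeping counterclockwise from start_direction.

    `current` is an (x, y) pair, `neighbors` maps a node name to its (x, y) position,
    and `start_direction` is an absolute angle in radians (an assumption of this sketch).
    """
    def ccw_offset(point):
        # Counterclockwise angular distance from the start direction to the neighbor.
        angle = math.atan2(point[1] - current[1], point[0] - current[0])
        return (angle - start_direction) % (2 * math.pi)

    return min(neighbors, key=lambda name: ccw_offset(neighbors[name]))
```

With neighbors to the east, north and west of the current node, a sweep starting at 45 degrees first hits the northern neighbor, while a sweep starting from the downward direction first hits the eastern one. The function raises `ValueError` when the neighbor mapping is empty.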

Figure 3: Sweeping Line boundary traversal algorithm.


Although this algorithm derives the true boundary in random network topologies, it fails in one particular case, called the intersection case, illustrated in Figure 2-(b). In this case, the traversal algorithm derives a false boundary; i.e., $(N_1, N_2, N_4, N_5)$. This misbehavior comes from the fact that the current node, $N_2$, cannot “see” that a node ($N_6$), located on the right side of the SL (and consequently in the boundary), has a neighbor on the other side (i.e., the left side of the SL), which is responsible for the crossing communication link ($N_3$). The only way to avoid this situation is to make the current node aware of it by using 2-hop information. For this reason, we have considered 2-hop information as a requirement for our approach. Hence, $N_2$, rather than selecting node $N_4$ and thereby missing node $N_6$, will select node $N_3$ instead (i.e., the generated sequence is $(N_1, N_2, N_3, N_6, N_3, N_4, N_5)$). By doing so, our traversal approach is able to resolve the intersection problem.


2.3. Boundary nodes

The localized nature of our algorithm requires that each node receiving the query executes a defined set of rules depending on its location inside the network. For this purpose, we have identified two categories of nodes: (a) boundary nodes, which are located on a given (i.e., network or hole) boundary, and (b) non-boundary nodes.

Definition 3. (Boundary Node) A node $N$ is said to be a boundary node in the direction $\angle N_i N N_j$, denoted BN, iff, when applying the boundary traversal algorithm from the farthest neighbor, the generated sequence contains the sub-sequence $(N_i, N, N_j)$.

For instance, in Figure 4, all colored nodes (green, orange and red) are boundary nodes. In the example, $N_i$ is facing two boundaries: (1) the one in the direction $\angle N_m N_i N_k$ and (2) the other one in the direction $\angle N_k N_i N$. We say that $N_k$ and $N$ are $N_i$'s neighbors on the hole, whereas $N_m$ and $N_k$ are its neighbors on the external boundary. With the help of this definition, every node in the network, after the initialization process, can locally determine whether it is a BN and, if so, how many boundaries it faces.

Definition 4. (Network Boundary Nodes) We define the set of Network Boundary Nodes, denoted NBN, as a set of BN nodes such that the closed region bounded by this non-self-intersecting polygonal set contains all nodes of the network.

Figure 4: Boundary Nodes.


For instance, the set NBN of Figure 4 is composed of all green-colored nodes. The identification of those nodes is necessary to our serial query processing approach, because they apply different rules than other nodes when they receive the query; it cannot, however, be performed locally, as was the case for boundary nodes. To this end, we developed a localized approach, which is necessary to maintain the scalability and robustness of our proposal. The key idea behind our proposal is to use geographic greedy routing to try to reach a virtual node located outside the network area, as illustrated in Figure 5.

Figure 5: Network Boundary Nodes detection process.


Hence, a message is initiated from a source node, usually the sink, to reach this virtual node. The source node, knowing the location of the destination node, sends the packet to its 1-hop neighbor that is closest to the destination. Knowing the destination location from the received message, every node on the path repeats this process until the message falls into a local minimum situation (i.e., the message cannot be delivered to a next hop, as the current node is closer to the destination than all its neighbors). From this node, also called the local minimum node, a recovery process is launched using our proposed boundary traversal algorithm, as explained in Section 2.2. This process is repeated until the currently selected node is closer to the virtual node than the local minimum node. Then, greedy forwarding can resume, and so on. Following this approach, and knowing that the virtual node is unreachable by construction, the message will make a cycle and return to the last local minimum node, denoted $N_{ll}$.

Theorem 1. The last local minimum node, denoted $N_{ll}$, belongs to the set NBN.

Proof 1. From the network connectivity condition (i.e., there exists a path between any pair of nodes in the network) on the one hand, and from the greedy routing process (i.e., based on the lowest distance to the virtual node) on the other hand, we deduce that the last local minimum node, $N_{ll}$, is the closest node to the virtual node among all nodes of the network. Hence, $N_{ll}$ belongs to the set NBN.
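The greedy phase of this detection process can be sketched as follows. The sketch covers only greedy forwarding toward the virtual node and the detection of a local minimum; the boundary traversal recovery is left out. The function name `greedy_forward` and the dictionary-based inputs are assumptions of this illustration.

```python
import math


def greedy_forward(source, destination, positions, neighbors):
    """Forward greedily toward `destination` and stop at a local minimum.

    `positions` maps node -> (x, y); `neighbors` maps node -> set of neighbor nodes;
    `destination` is the (x, y) location of the virtual node.
    Returns the path followed; its last element is the local minimum node.
    """
    def dist(node):
        return math.dist(positions[node], destination)

    path = [source]
    current = source
    while True:
        best = min(neighbors[current], key=dist, default=None)
        if best is None or dist(best) >= dist(current):
            return path  # no neighbor is closer to the destination: local minimum
        path.append(best)
        current = best
```

Because the distance to the virtual node strictly decreases at every hop, the loop always terminates. In the full process described above, the node at the end of the returned path would hand over to the boundary traversal algorithm of Section 2.2.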

Once a node has been identified as belonging to the set NBN, say node $N_{ll}$ ($N_{14}$ in the example of Figure 5), the second step of NBN identification can start from this node by using the boundary traversal process. Hence, all visited nodes then belong to the NBN set.

3. Greedy & Boundary Traversal (GBT) Approach

In this section, we present our proposed approach, named GBT (Greedy & Boundary Traversal). We first introduce the different steps of the initialization phase. We then discuss this version of our proposal, which has shown very good performance in randomly generated configurations. We provide examples to illustrate the different steps of our algorithm.


3.1. Initialization

As mentioned before, our query processing approach is of a serial nature. This means that the QIN initiates the query and, at each step, only the currently visited node, which receives the query, has to contribute to it. It then sends the new value to the next node, which is selected according to predefined rules (explained below). The exchanged message is composed of:

• The QIN identifier.

• The QIN coordinates, used later on to apply the Greedy Rule (GR) during the next hop selection.

• The current estimate value, updated by each visited node.

• The last BTI identifier. This information is used to detect query termination.

• A flag, called Rule Flag (RF), indicating the rule currently in use. We have used the following values in our implementation: “0” for the greedy rule, “1” for the boundary rule, “2” for the hole exploration rule and “3” for the exit rule.
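The message layout listed above can be sketched as a small data structure. The field names, the `QueryMessage` and `RuleFlag` types and the `contribute_max` helper are hypothetical; only the list of fields and the four flag values come from the text. The helper assumes a MAX query purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class RuleFlag(IntEnum):
    """Values of the Rule Flag (RF) as listed in the text."""
    GREEDY = 0
    BOUNDARY = 1
    HOLE_EXPLORATION = 2
    EXIT = 3


@dataclass
class QueryMessage:
    qin_id: int                           # QIN identifier
    qin_coordinates: Tuple[float, float]  # used by the greedy rule
    estimate: float                       # current estimate, updated at each visited node
    last_bti_id: Optional[int] = None     # last BTI identifier, used to detect termination
    rule_flag: RuleFlag = RuleFlag.GREEDY


def contribute_max(message: QueryMessage, reading: float) -> None:
    """Fold a node's reading into the estimate, assuming a MAX query."""
    message.estimate = max(message.estimate, reading)
```

A node receiving the message would call a helper such as `contribute_max` once, then forward the same message object with an updated `rule_flag` where needed.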


In addition, each node in the network extends its neighbor table with information about the neighbors visited for a given query (a boolean indicating whether the corresponding neighbor has been visited or not). We note that this information is not explicitly requested by the current node; it is made available by the broadcasting/listening communication model used in wireless sensor networks, as mentioned in Section 2.1.

3.2. Algorithm

The key idea behind GBT is to successively extend the set of already visited nodes, starting from the QIN, by adding at each step a node based on its distance to the QIN; i.e., the node which, among the unvisited neighbors, has the lowest distance to the QIN is selected as the next hop. When the greedy extension of the visited set is not possible, GBT performs a boundary traversal of this set, looking for other unvisited nodes to continue the visiting process. When all nodes have been visited, the last boundary sequence will produce a cycle and the process can then be stopped. Hereafter, we detail these rules:


• Greedy Rules (GR): This rule is applied first by the current node and states that among unvisited neighbors, the next hop is the closest to the QIN. In other words, based on the coordinates of the QIN that are available in the message, the current node computes the distances between the QIN and all its unvisited neighbors and chooses the one with the smallest distance. It sends then the message to the next hop. When a node is not able to apply the greedy rule (i.e., all its neighbors have been visited), it then applies the second rule, known as the Boundary Traversal Rule (BTR).

CR IP T

• Boundary Traversal Rule (BTR): The objective behind this rule is to allow the message to travel along the boundary of the visited region to nodes or regions of the network that have not yet been visited. The first node that triggers the boundary traversal process initiates the curved stick as shown in Figure 6 and sweeps it until a neighbor is hit. This neighbor is then selected as the next hop. Before sending it the message, the current node updates the flag RF in order to indicate that the current rule is the boundary traversal rule. This kind of message, which does not change the current query value, is known as the Boundary Traversal Rule (BTR) message. The next hop will apply the same rule whenever the greedy rule is not possible.

AN US

When a BTR message passes through a node belonging to the set NBN and it can not use the greedy rule, this node performs the following: (a) it verifies if it is the first NBN visited by this message by checking the BTI field in the message. If this is the case, it then records its identifier into the message (c.f., last visited node); (b) it changes the orientation of the curved stick toward the external boundary; (c) finally, it reapplies the BTR rule by selecting the next hop. And so on.

• Termination Condition: Because of the localized and distributed nature of our approach, its termination is not obvious and is not necessarily detected by the QIN itself. Hence, to detect termination, the query message has to complete a cycle on the external network boundary. In other words, when an NBN receives a message that it previously tagged with its own identifier in the BTI field, it detects termination.

3.3. Illustrative example

Let us consider the network depicted in Figure 6. Green nodes represent the set NBN. The QIN is N1. Starting from this node and applying GR, the first greedy sequence is $GS_1$ = {N1, N4, N2, N3, N5, N6, N7, N10, N17}. When N17 holds the query, it detects that all its neighbors have been visited. It then initiates the boundary traversal process: the curved stick is oriented in the opposite direction of the oriented segment [N17, QIN] and swept counterclockwise, as shown in the figure. The first boundary sequence is then $BS_1$ = {N17, N10}. This sequence is interrupted at N10 as it has unvisited neighbors. The second greedy sequence can then start from N10: $GS_2$ = {N8, N9, N16, N11, N12, N13, N14, N15}. Again, N15 is unable to apply the greedy rule. As N15 belongs to the set NBN, it updates the BTI field with its identity and similarly starts the second boundary sequence: $BS_2$ = {N15, N14, N13, ..., N10, N8, N15}. The second boundary sequence, composed of all NBNs, is a cyclic sequence. The message gets back to N15, which is then able to detect query termination by checking the BTI field previously tagged with its identity. It then sends the query result to the QIN by using any geographical routing method [19, 17, 18].

Figure 6: GBT example: (1) first greedy sequence, (2) first boundary sequence, (3) second greedy sequence, (4) termination sequence

3.4. Query completeness performance

GBT has shown a very efficient behavior in all the network configurations we considered (i.e., generated randomly within a square region, see Section 7.2 for details). It visited all the connected nodes and returned the correct value of the query. We did not observe any network configuration in which query completeness was violated. These experimental results do not, however, formally validate the query completeness property of GBT. In other words, there is no formal guarantee that this approach ensures query completeness in every network configuration. As mentioned before, this property is mandatory in applications where query precision is paramount (e.g., sensitive applications). For instance, queries such as Max, Min or Count need to visit all connected nodes; otherwise their precision is altered. These applications need a formal guarantee on the precision of the returned results. To satisfy this requirement while keeping the efficiency of GBT, we have extended it with additional rules that handle the query completeness requirement, as explained in the next section.

4. Extended Greedy & Boundary Traversal (EGBT) Algorithm


Before describing the added rules, we must mention that, while they may introduce additional communication overhead, their practical impact remained very low, as confirmed by our simulation evaluation (see Section 8), because the topologies that trigger them were very infrequent in randomly deployed networks.


4.1. Hole Exploration Rule (HER):


An example of a topology in which GBT fails to visit all nodes is given by the pear example (see Figure 7). We note that such situations result from particular dispositions of nodes around holes. For instance, in the figure, based on the greedy rule, N6 orients the greedy process toward N7 (i.e., the exterior of the hole), hence leaving N8 unvisited. As N8 has only one neighbor, namely N6, whether it gets visited depends on whether N6 is visited later by a boundary process. However, the neighbors of N6 (i.e., N7 and N5) prevent any boundary process from reaching this node, thus leading to missed nodes (here, N8) for this query. To overcome this problem, we force the query process to explore holes in order to ensure that they are free of unvisited nodes. To this end, we introduce the notion of Hole Closer Node (HCN), defined as follows:

Definition 5. (Hole Closer Node) A node is a Hole Closer Node (HCN) iff it satisfies the following three conditions: (a) it is located on the boundary of a hole H, (b) it currently holds the serial query and (c) two of its neighbors on the same hole H have already been visited.
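The three conditions of the HCN definition can be checked locally, as in the following sketch. It assumes, purely for illustration, that each node already knows which of its neighbors lie on the boundary of each hole it touches (the `hole_neighbors` mapping is hypothetical; the text does not specify how this information is stored).

```python
def closed_holes(holds_query, hole_neighbors, visited):
    """Return the holes for which the current node acts as an HCN.

    holds_query: True if the node currently holds the serial query (condition b).
    hole_neighbors: dict mapping a hole identifier to the node's neighbors on
                    that hole's boundary (hypothetical layout, condition a).
    visited: set of neighbor identifiers already marked as visited.
    """
    if not holds_query:
        return []
    closed = []
    for hole, nbrs in hole_neighbors.items():
        # Condition (c): two neighbors on the same hole are already visited.
        if sum(1 for n in nbrs if n in visited) >= 2:
            closed.append(hole)
    return closed
```

An empty result means the node is not an HCN; a result with several holes corresponds to a node that closes more than one hole at once.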

Figure 7: Pear example: (1) first greedy sequence, (2) first exploration sequence, (3) first boundary sequence, (4) second greedy sequence, (5) hole exploration termination sequence, (6) termination sequence

In other words, an HCN can be the last visited node on the boundary of a hole; i.e., it closes the hole, as illustrated in Figure 7 (N15). Hence, when a node detects that it is an HCN, it verifies that the hole is free of unvisited nodes. To this end, it operates in two steps:


Step 1: it launches a control packet of boundary traversal nature into the hole (i.e., in Figure 7, on the right side of the vector (N15, N2)). This packet contains the identity of the HCN and a flag (UF: Unvisited Flag) that indicates whether there is at least one unvisited node in the hole. The packet therefore travels along the hole boundary. Its objective is to mark all previously visited nodes in order to prevent them from applying the greedy rule whenever they have the opportunity (i.e., when they have unvisited neighbors). This mark is stored locally, only by previously visited nodes, to indicate that this hole is under exploration and that they must not apply the GR to messages coming from this hole. By doing so, we ensure that whenever the greedy process is triggered inside the hole, it remains inside it until the hole is entirely explored. In our example, N8 will modify the flag UF. When the hole exploration packet gets back to the HCN, the latter stops the hole exploration phase if there are no unvisited nodes, and goes to Step 2 otherwise.


Step 2: when this step is triggered by the HCN, it consists in sending the query message with additional information about the originating HCN. For this reason, the query message is extended with a list containing the identities of possible HCNs. This information means that the query process is in hole exploration mode and, consequently, whenever the boundary process is launched inside the hole, all previously marked nodes are prevented from using the greedy rule. In other words, the query remains in the hole until local termination is detected. By local termination, we mean the situation in which a previously unvisited node detects query message looping. For instance, in our example, N8, after contributing to the query, starts a boundary sequence as usual and detects query message looping. In this case, hole exploration is considered finished, and this node can release the query, in the sense that it removes from the message the HCN identity previously recorded by an HCN. From that instant, the query processing can continue normally. In the example, N7, after receiving the BTR message from N6, changes the boundary process orientation toward the external network boundary, since it belongs to the NBN set, and sends the message to N5. Hence, N7 will then be able to detect termination according to the previous termination condition, ensuring the contribution of all nodes in the network.


Nevertheless, HER raises an additional issue when the HCN is located on more than one hole: how can it locally (i.e., with only 1-hop information) identify the right hole to explore? In other words, what kind of information can the HCN use to distinguish the correct closed hole? To illustrate, we take the example of Figure 8. When N receives the query message from $N_j$, it detects that it is an HCN since $N_i$ and $N_j$, its two neighbors on the two holes (i.e., Hole 1 and Hole 2), have been visited. But its local view does not allow it to determine from which hole the query message comes. To address this issue without generating overhead, we propose to integrate this task within the hole exploration message as follows: we extend the hole exploration message with an integer and a flag. The role of the integer, named FVN (First Visited Nodes), is to record the total number of first visited nodes. Initially, the flag is set to counting mode and the FVN to zero. When a node receives the exploration message, it first checks whether the flag is in counting mode. If it is and the node was already visited, it increments the FVN value; otherwise, it sets the flag to no-counting mode, which stops the updates of the counter (i.e., it is the first unvisited node the exploration message meets). Of course, the boundary traversal process is initiated in the direction of the hole, and its orientation is set toward the last visited node, as illustrated in Figure 8. We note that the HCN must launch as many hole exploration messages as the number of holes it closes. For instance, in the example of Figure 8, N has to launch two hole exploration messages. When all the messages get back to the HCN, it chooses the hole with the largest FVN value. By doing so, we ensure that any closed hole is totally explored before the query process continues. Consequently, our approach prevents leaving unvisited nodes inside holes.
Of course, one can object that this approach has the drawback of additional overhead due to the exploration of several holes. Nevertheless, in practice, as shown in Section 8, we have noticed very few situations in which HCNs generate more than one hole exploration message. Their practical impact was insignificant, as reported by our simulation results.
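The FVN bookkeeping described above can be illustrated with a short sketch. It is a simplified model rather than the authors' implementation: the class `ExplorationMessage`, its field names, and the representation of a hole boundary as a list of visited/unvisited booleans are all assumptions made for the example. Carrying the UF flag of Step 1 in the same message is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ExplorationMessage:
    hcn_id: str                    # identity of the originating HCN
    hole_id: str                   # hypothetical label of the explored hole
    fvn: int = 0                   # First Visited Nodes counter
    counting: bool = True          # counting / no-counting mode flag
    unvisited_found: bool = False  # UF: at least one unvisited node met


def process_at_node(msg, node_visited):
    """Update the exploration message at one node of the hole boundary."""
    if msg.counting and node_visited:
        msg.fvn += 1
    else:
        # First unvisited node met (or counting already stopped).
        msg.counting = False
    if not node_visited:
        msg.unvisited_found = True
    return msg


def explore(hcn_id, hole_id, boundary_visited_flags):
    """Carry one exploration message along a hole boundary."""
    msg = ExplorationMessage(hcn_id, hole_id)
    for node_visited in boundary_visited_flags:
        process_at_node(msg, node_visited)
    return msg


def choose_hole(returned_messages):
    """The HCN selects the hole whose message reports the largest FVN."""
    return max(returned_messages, key=lambda m: m.fvn).hole_id
```

For instance, an HCN closing two holes would call `explore` once per hole and pass both returned messages to `choose_hole`.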

Figure 8: Hole Exploration: two messages launched towards two holes; i.e., Hole 1 and Hole 2.

4.2. Hole Exit Rule (HXR):


Similarly, in very particular topologies, such as the one reported in Figure 9, a query message could loop inside a hole. In this case, when a node detects that a message is looping inside a hole, it sets the flag RF to exit mode and sends the message using the boundary traversal algorithm. When a node located on another hole receives this message, it changes the message direction toward the other hole and sets the flag RF to boundary mode. And so on, until the message is “broken” by a greedy rule or encounters a node belonging to the NBN set, which in turn changes its orientation toward the outside and is then able to detect the termination condition. The overall serial query algorithm, executed by each node receiving the query message, is depicted in Algorithm 1. The algorithm summarizes the different rules presented above, in their order of execution.


Algorithm 1 GBT Algorithm.

Require: Receive message M from the previous hop

Greedy Rule (GR)
    if (M.HCN List is empty and there are unvisited neighbors) then
        Select the next hop based on GR
        Update M.RF and M.BTI
        Send the message M
    end if

Boundary Traversal Rule (BTR)
    if (M.HCN List is empty and M.BTI is null and BN is True) then
        M.BTI = MyId
    end if
    Send the message M

Hole Exploration Rule (HER)
    if (Marked is False) then
        Apply Greedy Rule
    else
        Apply Boundary Rule
    end if

Hole eXit Rule (HXR)
    if (M.RF is “Exit mode”) then
        if (The current node has other holes to explore) then
            Change the orientation toward the unexplored hole
            Update M.RF to Boundary mode
            Send the message M
        else
            Apply Boundary Rule
        end if
    end if

5. Complexity Analysis

In this section, we study the complexity of our approach from three points of view: space, time and communication. Regarding space complexity, the proposed approach extends the neighboring table with a boolean recording the visiting status of the corresponding neighbor. Hence, its space complexity is O(n), where n stands for the number of neighbors ($n = |V|$). Concerning time complexity, the proposed approach performs either the GR or the BTR. In the case of the GR, the current node selects, among the unvisited neighbors, the one closest to the QIN; this takes at most O(k), where k is the number of unvisited neighbors. In the worst case, k is equal to n, i.e., the number of neighbors. In the case of the BTR, the time complexity of selecting the next hop according to our boundary traversal algorithm is O(n log n). More precisely, only the first execution of our algorithm is O(n log n), because it involves sorting the neighbors according to their angles. Once this step is done, subsequent executions are O(n). Hence, in the worst case, the time complexity of our approach is O(n log n).

Figure 9: Example of Hole eXit Rule (HXR) behavior: (1) first greedy sequence, (2) first boundary sequence, (3) second greedy sequence, (4) looping sequence and (5) Hole Exit sequence.

Communication complexity, which reflects the number of communications required for one query execution, is more complex to study because it depends on several factors, such as the network topology, the underlying routing structures, etc. Nevertheless, we provide hereafter a general complexity analysis of our approach that abstracts away these factors. Our GBT algorithm alternates the use of GR and BTR. It generates a sequence of visited nodes of the form $S = (GS_1, BS_1, GS_2, BS_2, \cdots, GS_k, BS_k)$, where $GS_i$ and $BS_i$ are the sequences of nodes visited by GR and BTR, respectively. $BS_k$ stands for the termination sequence. The numbers of nodes visited by GR and BTR are $\sum_{i=1}^{k} |GS_i|$ and $\sum_{i=1}^{k} |BS_i|$, respectively. Since one communication is needed for each visited node, the total number of communications is:

\[ \sum_{i=1}^{k} \left( |GS_i| + |BS_i| \right) \tag{1} \]

Expression (1) involves two terms: $\sum_{i=1}^{k} |GS_i|$ and $\sum_{i=1}^{k} |BS_i|$. To find an upper bound of this expression, we first introduce the following definitions and properties:


Definition 6. (Network Boundaries Degree) The Network Boundaries Degree, denoted NBD, is defined as the maximum number of boundaries that any node in the network is facing; i.e., $NBD = \max_{N_i \in N} \mathrm{boundaries}(N_i)$, where $\mathrm{boundaries}(N_i)$ stands for the number of boundaries node $N_i$ is facing.

We denote $BS = \bigcup_{i=1}^{k} BS_i$; i.e., the sequence containing all nodes having received a BTR message. Note that BS can contain multiple occurrences of a single node.


Property 1. The number of occurrences of a node $N_i$ in BS is at most $2 \cdot NBD$.

Proof 2. We note that BTR is always, by construction, triggered from a boundary node delimiting the boundary of the current visited region; otherwise, GR could not have been stopped. Hence, all nodes occurring in BS are necessarily boundary nodes. Furthermore, a node cannot occur more than once in any sequence $BS_j$, where $1 \le j \le k-1$; i.e., in any sequence except the termination sequence $BS_k$. This is a direct consequence of the fact that any $BS_j$ sequence involves only nodes located on the current visited region, which is, by construction, not closed. Consequently, the number of occurrences of a node $N_i$ in $\bigcup_{j=1}^{k-1} BS_j$ is at most NBD. On the other hand, $BS_k$ is the last BTR sequence, which traverses the external boundary of the network. Hence, a node $N_i$ can occur in the sequence $BS_k$ at most $\mathrm{boundaries}(N_i)$ times. Finally, $2 \cdot NBD$ can be set as an upper bound on the number of occurrences of a node in BS.

Theorem 2. Starting from the QIN, the GBT algorithm visits the nodes of the network with a communication complexity of at most O(M), where M is the number of nodes in the network.

Proof 3. We have:

\[ \sum_{i=1}^{k} |GS_i| = M \tag{2} \]

\[ k \le M \tag{3} \]

Expressions (2) and (3) are direct consequences of the following facts: $GS_i \cap GS_j = \emptyset$ for all $i \ne j$, and $\bigcup_{i=1}^{k} GS_i = N$. This is true because GR hits a node only once and visits all the nodes of the network. On the other hand, thanks to Property 1, the number of communications generated by BTR is less than $2 \cdot NBD \cdot M$. Overall, GBT generates fewer than $M + 2 \cdot NBD \cdot M$ communications, and hence its communication complexity remains of the order of O(M).
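As a worked illustration, the steps of the proof can be chained into a single bound on Expression (1). This is only a restatement, under the assumption (not stated explicitly in the text) that NBD is bounded by a constant independent of M, which is what makes the final bound linear in M. The numerical values in the closing comment are hypothetical.

```latex
\begin{align*}
\sum_{i=1}^{k} \bigl( |GS_i| + |BS_i| \bigr)
  &= \sum_{i=1}^{k} |GS_i| + \sum_{i=1}^{k} |BS_i| \\
  &\le M + 2 \cdot NBD \cdot M
     && \text{(Expression (2) and Property 1)} \\
  &= (1 + 2 \cdot NBD)\, M .
\end{align*}
% Hypothetical values: M = 100 and NBD = 2 give at most (1 + 4) * 100 = 500 communications.
```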


In comparison to the other studied approaches², reported in Table 2, our approach does not introduce a significant overhead in either space or time. However, it improves the number of communications, which is likely of the order of M (the number of nodes in the network); this is not the case for the other approaches. For instance, the Flooding and Iterative approaches rely on the network diameter and the convergence factor, respectively. These factors, which depend on the network topology and the query precision, can be large, which dramatically increases the communication cost. The Warehouse approach improves over the two preceding approaches, but at the expense of constructing and maintaining routing structures, which is not needed in our approach. Furthermore, beyond complexity, our algorithm improves over the other approaches in terms of scalability and robustness. This theoretical result has been confirmed by the experimental study presented in Section 8.

Approaches    Space Compl.    Time Compl.    Com. Compl.
Warehouse     O(n)            O(n)           O(M)
Flooding      O(M)            O(M)           O(d · M)
Iterative     O(n)            O(n)           O(c · M)

Table 2: M, n, d and c denote the number of nodes in the network, the number of neighbors, the network diameter and the convergence factor, respectively.

² We must mention that the reported communication complexities are ideal and do not consider the communication collisions due to concurrent communications. Serial approaches do not generate communication collisions.

6. Proof of correctness

As for any distributed approach, we hereafter provide a proof of the correctness of our proposed approach. More specifically, we first show that our distributed and localized algorithm generates a finite sequence of nodes (i.e., it terminates and does not fall into looping). Afterwards, we show that all nodes in the network are visited; i.e., all nodes contribute to the query evaluation.


6.1. Notations, definitions and properties

We note that GBT generates a sequence of visited nodes, denoted S, which has the following form:

\[ S = \{ \underbrace{N_1^G, \ldots, N_k^G}_{GS_1}, \underbrace{N_i^B, N_{i+1}^B, \ldots, N_l^B}_{BS_1}, \ldots, \underbrace{N_b^B, \ldots, N_b^B}_{BS_{final}} \} \]

where $GS_i$ denotes a sub-sequence of nodes visited by the greedy rule and $BS_i$ denotes a sub-sequence of nodes visited by the boundary traversal rules (BTR, HER and HXR). $BS_{final}$ stands for the last boundary traversal sub-sequence, which is a cyclic sequence allowing termination detection.

Definition 7. (Visited region) We define the current visited region, denoted ψ, as the set of all currently visited nodes.


Initially we have ψ = {QIN}; then all sub-sequences $GS_i$ add nodes to ψ. Furthermore, from the connectivity assumption of the network (i.e., $\forall N_i, N_j \in N$ there exists a path linking $N_i$ to $N_j$) on the one hand, and the fact that $\psi \subset N$ on the other hand, the set ψ is always connected to the set N. In other terms, if $\psi \ne N$, there exists a link $E_{ij}$ such that $N_i \in \psi$ and $N_j \notin \psi$. The key idea behind our approach is to successively extend ψ by greedy sequences whenever possible. When ψ cannot be extended, the boundary traversal sequences move the “head” (i.e., the current node) of the greedy process to another node that can perform it again. The boundary traversal process can be seen as an intermediate process between two consecutive greedy processes.

Definition 8. (Boundary of the visited region) We define B(ψ) as the set of all nodes belonging to ψ that are located on its external boundary.
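The connectivity argument above (whenever ψ differs from N, some link leaves ψ) can be checked on a small graph with the following sketch. The adjacency-dictionary representation and the function name `frontier_links` are assumptions introduced for illustration only.

```python
def frontier_links(adjacency, visited):
    """List the links (i, j) with i in the visited region and j outside it.

    adjacency: dict mapping each node to the list of its 1-hop neighbors.
    visited: set of nodes forming the current visited region.
    """
    links = []
    for i in sorted(visited):
        for j in adjacency[i]:
            if j not in visited:
                links.append((i, j))
    return links
```

On a connected network the returned list is empty only when every node has been visited, which is the property the greedy and boundary traversal processes rely on to keep extending the visited region.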


Lemma 1. At the end of every greedy sequence $GS_i = \{N_i^G, N_{i+1}^G, \ldots, N_k^G\}$, the last node $N_k^G$ is a boundary node (BN).


Proof 4. Assume that $N_k^G$ is not a BN. According to Definition 3 of a boundary node, this means that all angularly adjacent neighbors of $N_k^G$ can communicate with each other, as shown in Figure 10. We observe that all its neighbors have already been visited, since the greedy rule stopped at this node. Let $W = \{N_x \in V_k \mid d(QIN, N_x) > d(QIN, N_k^G)\}$; W is the set of all neighbors of $N_k^G$ that are located farther from the query source than $N_k^G$ itself. Consequently, from the greedy rule and from the fact that all neighbors are connected, there is at least one neighbor in W that could not have been visited before $N_k^G$, which contradicts the fact that all its neighbors have been visited.

Figure 10: All neighbors of $N_k^G$ are connected to each other.

From Lemma 1, we have three cases to consider: (a) the case where $N_k^G \in NBN$; i.e., it is a network boundary node, (b) the case where $N_k^G \in H_k$; i.e., it belongs to a hole $H_k$ and at the same time it is located on the boundary of the current visited region (i.e., $N_k^G \in B(\psi)$) and (c) the case where it belongs to a hole $H_k$ but is not located on the boundary of the current visited region; i.e., $N_k^G \notin B(\psi)$.


6.2. Studying the three cases raised from Lemma 1

These three cases are illustrated in Figure 11. We examine them more precisely, starting with the most “problematic” case.

Figure 11: Three possible cases for the end of a greedy sequence.


Case (c): $N_k^G \in H_k$ and $N_k^G \notin B(\psi)$

The risk in this case is that the boundary sequence $BS_k$ that immediately follows falls into looping inside the hole. We show that this situation is not possible.


Lemma 2. If there exists a cyclic sub-sequence $BSS_i = \{N_i^B, N_{i+1}^B, \ldots, N_k^B\} \subset BS_k$, triggered inside the hole $H_i$, then there exists at least one node $N_e \in BSS_i$ such that $N_e \in H_i$ and $H_i \ne H_k$.


Proof 5. As $BSS_i = \{N_i^B, N_{i+1}^B, \ldots, N_k^B\}$ is a cyclic boundary sequence inside the hole $H_i$, this means that all the neighbors of the nodes belonging to it (i.e., $V_i = \bigcup \{V_j \mid N_j \in BSS_i\}$) have already been visited; otherwise $BSS_i$ could not exist, i.e., it would be broken by the GR. We assume that $BSS_i$ does not contain any BN in directions other than $H_i$ and derive a contradiction. According to Definition 3 (BN definition), and since no node of $BSS_i$ is a BN in other directions, there exists a path connecting the nodes of $V_i$ without passing through them. Let $N_{last}$ be the last visited node, since the boundary sequence has been triggered inside the hole $H_k$ (see Figure 12). However, unless it is a boundary node in other directions, $N_{last}$ has neighbors that could not, by construction, have been visited before it (nodes inside the grey region in Figure 12); otherwise there would be a violation of the GR. We therefore deduce that there exists at least one BN in $BSS_i$ that belongs to a hole other than $H_k$, which contradicts the previous assumption.

In other words, the boundary sequence that follows does not loop indefinitely, and there exists at least one node located on another hole which will change the boundary traversal orientation toward the outside of the current hole by using the HXR. Lemma 2 ensures that there exists at least one “exit node” when the boundary traversal process loops inside a hole. However, it does not ensure that the boundary traversal process will reach a node located on the boundary of the visited region to allow query progression.

Figure 12: $N_{last}$ is the last visited node in the boundary sequence $BSS_i$, triggered inside the hole.

Lemma 3. If there exists a cyclic sub-sequence $BSS_i = \{N_i^B, N_{i+1}^B, \ldots, N_k^B\} \subset BS_i$, then there exist at least one sequence of nodes $S_N = \{N_{e_1}, N_{e_2}, \ldots, N_{e_t}\}$ and one sequence of holes $S_H = \{H_1, H_2, \ldots, H_{t-1}\}$ such that: $N_{e_1} \in (H_1, H_2)$, $N_{e_2} \in (H_2, H_3)$, ..., $N_{e_t} \in (H_t, B(\psi))$.

Proof 6. We proceed by induction. Initially, we are in the presence of a cyclic sequence $BSS_i$ inside the “inner” hole. Hence, when there is only one hole, Lemma 2 ensures that there exists at least one boundary node that is able to change the boundary process orientation toward the outside of the current hole. Assume now that this is true for t nested holes, as illustrated in Figure 13. There exists therefore a sequence of nodes $S_N = \{N_{e_1}, N_{e_2}, \ldots, N_{e_t}\}$ such that $(N_{e_1} \in H_1 \wedge N_{e_1} \in H_2)$, $(N_{e_2} \in H_2 \wedge N_{e_2} \in H_3)$, ..., $(N_{e_t} \in H_t \wedge N_{e_t} \in H_{t+1})$. For hole $H_{t+1}$, two situations are possible: (a) either there are nodes which have unvisited neighbors, in which case the boundary sequence will be broken, or (b) there are boundary nodes which will change the boundary process orientation toward the outside of the hole. In both cases, the last node belongs to $B(\psi)$.

Figure 13: Illustration of nested holes: (1) first greedy sequence, (2) first boundary sequence, (3) second greedy sequence and (4) Hole Exit sequence.

Lemma 3 ensures that whenever the boundary traversal is triggered inside a hole, it always succeeds in getting out of the hole and reaching a node on the boundary of the visited region.


Cases (a) and (b): $N_k^G \in B(\psi)$

It is obvious that if the greedy sequence stops at a network boundary node $N_k^G$, then this node necessarily belongs to the set $B(\psi)$ (i.e., it is located on the boundary of the current visited region). Hence the next boundary sequence will start from a boundary node and consequently all the nodes it visits belong to $B(\psi)$. If $B(\psi) \ne NBN$, there exists at least one link $E_{ij}$ such that $N_i \in B(\psi)$ and $N_j \notin B(\psi)$. In this case, the boundary process will be “broken” by the greedy process at node $N_i$. Otherwise, when $B(\psi) = NBN$, the termination condition will stop the boundary process.

6.3. Main results

Theorem 3. Any boundary sequence $BS_i = \{N_i^B, N_{i+1}^B, \ldots, N_k^B\}$ is finite.


Proof 7. By means of Lemma 2 and Lemma 3, any boundary cyclic sequence triggered inside a hole will finish either by getting out of the hole or by another greedy process. On the other hand, our boundary traversal algorithm ensures boundary traversal; i.e., the boundary sequences are always progressing. In addition, for messages of HER (Hole Exploration Rule) or HXR (Hole eXit Rule) type, the two relative rules ensure their termination. Finally, for the termination sequence, the cyclic detection also allows the termination of this boundary sequence. In summary, whatever the boundary sequence is, it is always finite.


As mentioned before, our proposed serial approach alternately generates greedy sequences, each followed immediately by a boundary sequence. The greedy sequences are finite by construction and, thanks to Theorem 3, boundary sequences are finite too. Hence, our proposed approach generates a finite sequence and does not fall into looping. We now show that our approach visits all connected nodes in the network, given that there are no disconnected parts in the network (the network is not partitioned). The key idea behind our demonstration is that the greedy process extends the visited region, whereas the boundary process travels along its boundary in order to “meet” unvisited nodes.


Theorem 4. GBT approach visits all connected nodes.


Proof 8. We proceed by contradiction: assume that there exists a set of unvisited nodes U left by our approach GBT. Under the connectivity condition, the set U is, on the one hand, itself connected (i.e., there exists a path connecting any two nodes in U) and, on the other hand, connected to some visited nodes. From this, three situations are to be considered, as shown in Figure 14: (a) no node belonging to U is a boundary node. This means that nodes on the external boundary of U are connected to visited nodes, which is in contradiction with the greedy rule. (b) Some nodes in U are hole boundary nodes and the rest are connected to visited nodes. This configuration is clearly in contradiction with the greedy and hole exploration rules. (c) Finally, some nodes in U are network boundary nodes and the rest are connected to visited nodes. This configuration also contradicts the termination condition. In summary, all possible configurations lead to a contradiction.

Figure 14: Three possible cases for an unvisited region: (a) connected to the visited region, (b) on the boundary of a hole and (c) on the network boundary.

7. Performance Evaluation

In order to evaluate the effectiveness of our proposed approach, we have conducted several series of simulations using the OMNeT++ simulator [21] with the Castalia package [22].

7.1. Implemented Approaches

In addition to our GBT approach, we have implemented the following approaches, representative of centralized, distributed and localized approaches:


• Warehouse Approach (WA): This approach operates in two steps: (a) during the first step, the sink broadcasts the query in the network in order to reach all nodes; each node receiving the query rebroadcasts it in turn. (b) During the second step, upon receiving the query, nodes send their data to the sink. In our implementation, we have used the multi-path approach provided in the Castalia package for this purpose.


• Flooding Approach (FA): In this approach, as mentioned earlier, each node broadcasts its own data as well as data received during previous iterations. The process stops when all nodes hold all the data. The estimated parameter could then be computed locally.


• Iterative Approach (IA): Initially launched from the QIN, each node starts exchanging its own estimate with its immediate neighbors. At each iteration, the local estimate is updated with the weighted data received during the previous iteration, and so on. The process terminates when each node detects that the difference between two consecutive values of the estimated parameter is smaller than a certain threshold (fixed to $10^{-3}$ in our experiments).


• Depth First Search (DFS): As a 1-hop localized and serial approach, DFS [23] constructs a rooted tree by extending it, at each step, toward unvisited nodes whenever possible (i.e., when there is an unvisited neighbor). When a node has no unvisited neighbors, it sends the message back to its previous node (i.e., its “father node”). Hence, each node except the root (i.e., the QIN) has a parent node, and each node except the leaves has one or several children. Moreover, each node stores the constructed path locally (more precisely, it stores a reference to its father). To explore a network with N connected nodes, this algorithm requires 2 · (N − 1) communications.
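The DFS message count quoted in the last item can be reproduced with a small simulation. The sketch below is a simplified model, not the implementation evaluated in the paper: the adjacency-dictionary representation and the function name `dfs_serial_traversal` are assumptions, and the visiting status of neighbors is read from a shared set rather than learned through communication.

```python
def dfs_serial_traversal(adjacency, root):
    """Simulate a serial DFS query and count the messages exchanged.

    adjacency: dict mapping each node to the list of its 1-hop neighbors.
    root: the node that initiates the query (the QIN).
    Returns the visiting order and the number of messages sent.
    """
    visited = {root}
    parent = {root: None}  # each node stores a reference to its father
    order = [root]
    current = root
    messages = 0
    while True:
        nxt = next((n for n in adjacency[current] if n not in visited), None)
        if nxt is not None:
            # Forward the query to an unvisited neighbor.
            visited.add(nxt)
            parent[nxt] = current
            order.append(nxt)
            messages += 1
            current = nxt
        elif parent[current] is not None:
            # No unvisited neighbor left: send the message back to the father.
            messages += 1
            current = parent[current]
        else:
            # Back at the root with nothing left to visit.
            break
    return order, messages
```

On a connected graph, every node other than the root is entered once and left once toward its father, which gives the 2 · (N − 1) total.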

It has to be noted that all the implemented approaches ensure query completeness regardless of the network configuration; in other words, all nodes contribute to the query. We have not considered localized approaches other than DFS because they do not ensure query completeness, as is the case for the space-filling approaches [12] mentioned previously. For comparison purposes, we have used three main metrics: (a) communication efficiency, which reports the number of communications required in the network to accomplish one query, (b) the overall energy consumed in the network for one query, and (c) time-to-end, which records the time spent to accomplish one query.

7.2. Simulation scenarios and settings

To measure the degree of vulnerability of serial approaches in general, and of our GBT approach in particular, we have constructed two different classes of scenarios in our simulations:

• Safe Scenarios: in this class of scenarios, we have considered networks without explicit failures, in particular link failures. This means that the network is only required to be free of link failures during the query evaluation phase.

• Noisy Scenarios: in this class of scenarios, we have explicitly introduced failures. To this end, we have defined the Link Failure Probability, denoted LFP, as the probability that the current link fails. In practice, when the current node holds the query, it first selects the next hop according to the rule currently in use (e.g., GR for GBT). Afterwards, it generates a random number between 0 and 1. If the generated number is less than or equal to LFP, then the link is considered a failed link and the current node performs another next hop selection. Otherwise, the link is considered valid and the message is sent. Depending on the duration of the link failure, we have defined two sub-scenarios: (a) Temporal Link Failure (TLF), in which the link is considered unavailable only for the current selection, and (b) Permanent Link Failure (PLF), in which the link is considered lost for the rest of the query.
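The failure injection described for the noisy scenarios can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the simulation code itself: `select_next_hop`, `ranked_candidates` (the neighbors in the order the current rule would try them) and the injectable `draw` function are hypothetical names introduced here.

```python
import random
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Set


def select_next_hop(
    current: Hashable,
    ranked_candidates: Iterable[Hashable],
    lfp: float,
    permanent: bool = False,
    failed_links: Optional[Set[FrozenSet[Hashable]]] = None,
    draw: Callable[[], float] = random.random,
) -> Optional[Hashable]:
    """Return the first candidate whose link survives the LFP draw.

    permanent=False models TLF: a failed link is skipped for this selection only.
    permanent=True models PLF: a failed link is recorded in failed_links and
    ignored for the rest of the query (pass the same set on every call).
    """
    if failed_links is None:
        failed_links = set()
    for candidate in ranked_candidates:
        link = frozenset((current, candidate))
        if link in failed_links:
            continue  # link already lost permanently
        if draw() <= lfp:
            # The link is considered failed; try another next hop.
            if permanent:
                failed_links.add(link)
            continue
        return candidate  # link is valid, the message can be sent
    return None  # every candidate link failed


if __name__ == "__main__":
    lost: Set[FrozenSet[Hashable]] = set()
    hop = select_next_hop("a", ["b", "c", "d"], lfp=0.10, permanent=True, failed_links=lost)
    print(hop, lost)
```

Passing a deterministic `draw` makes a run reproducible, which is convenient when comparing TLF and PLF on the same sequence of random numbers.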

In addition, as mentioned before, serial approaches have the advantage of not generating packet collisions, precisely because of their serial nature; i.e., at any time only one node is sending data. For this reason, we have compared the studied serial approaches with the MAC layer enabled and disabled. We have used the CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) MAC protocol available in the Castalia package. Within this protocol, each node, prior to communicating with another node, checks whether the channel is clear (i.e., no other node is transmitting at that time). To this end, it first sends an RTS (Request To Send). The receiver answers with a CTS (Clear To Send), indicating that the channel is free. Then the sender starts the communication. However, if the channel is not clear, the node waits for a randomly chosen period of time before checking again. In our proposal, all this “negotiation” is no longer necessary, since only one node, the one holding the query, is able to communicate. Our objective, through this series of experiments, is to highlight the possible gains in terms of query responsiveness (i.e., reducing the query time-to-end). The overall simulation parameters are given in Table 3.
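To make the cost of this negotiation concrete, the following Python sketch contrasts a heavily simplified CSMA/CA exchange with a serial send. It is only an illustration of the steps described above, not the Castalia MAC module: the function names, the `channel_is_clear` callback, the 10 ms backoff bound and the attempt limit are all assumptions introduced here.

```python
import random
from typing import Callable, List, Optional, Tuple


def csma_ca_send(
    channel_is_clear: Callable[[], bool],
    max_backoff_s: float = 0.01,
    max_attempts: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], float]:
    """Simplified CSMA/CA: sense the channel, back off randomly while it is
    busy, then exchange RTS/CTS before sending the data.

    Returns the list of events and the total time spent backing off.
    """
    if rng is None:
        rng = random.Random()
    events: List[str] = []
    waited = 0.0
    for _ in range(max_attempts):
        if channel_is_clear():
            events += ["RTS", "CTS", "DATA"]
            return events, waited
        # Channel busy: wait for a randomly chosen period, then check again.
        waited += rng.uniform(0.0, max_backoff_s)
        events.append("BACKOFF")
    return events, waited  # gave up without sending


def serial_send() -> List[str]:
    """Serial case: only the query holder transmits, so no negotiation is needed."""
    return ["DATA"]


if __name__ == "__main__":
    busy_then_clear = iter([False, False, True])
    print(csma_ca_send(lambda: next(busy_then_clear), rng=random.Random(1)))
    print(serial_send())
```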

| Parameter | Value(s) |
|---|---|
| Network area (m^2) | 1000 x 1000 |
| Transmission range (m) | 150 |
| Node distribution | Uniform |
| Number of nodes | 100, 150, 200, ..., 500 |
| Link Failure Probability | 0%, 5%, 10%, 15%, ..., 50% |
| Query initiator location | Random |
| Packet size | 50 bytes |

Table 3: Simulation parameters

We have used a random uniform distribution to generate connected network deployments. We have fixed the communication range and varied the number of nodes from 10 to 500. We have noticed that below 100 nodes, the generated networks were not connected; i.e., not every node could reach every other node. Since network connectivity is a requirement for a fair comparison, we have not considered unconnected configurations.

For the energy consumption metric, we have used the well-known model proposed in [24]. In this model, the radio dissipates $E_{elec} = 50$ nJ/bit to run the transmitter/receiver circuitry and $\epsilon_{amp} = 100$ pJ/bit/m$^2$ for the transmitter amplifier. Thus, to transmit a $k$-bit message over a distance $d$ using this model, the radio consumes:

$$E_{Tx}(k, d) = E_{elec} \cdot k + \epsilon_{amp} \cdot k \cdot d^2$$

and to receive this message, the radio consumes:

$$E_{Rx}(k) = E_{elec} \cdot k$$
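The two expressions of this radio model translate directly into code. The following Python sketch is given as an illustration only; the function and constant names are hypothetical, while the numerical constants are those stated above.

```python
E_ELEC = 50e-9     # J/bit, transmitter/receiver circuitry
EPS_AMP = 100e-12  # J/bit/m^2, transmitter amplifier


def tx_energy(k_bits: int, d_m: float) -> float:
    """Energy (J) to transmit a k-bit message over a distance of d meters."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2


def rx_energy(k_bits: int) -> float:
    """Energy (J) to receive a k-bit message."""
    return E_ELEC * k_bits


if __name__ == "__main__":
    # A 50-byte packet (Table 3) sent over the full 150 m transmission range.
    k = 50 * 8
    print(f"Tx: {tx_energy(k, 150.0):.3e} J, Rx: {rx_energy(k):.3e} J")
```

For a 50-byte packet (400 bits) sent over 150 m, the transmission costs about 0.92 mJ, of which 0.90 mJ is due to the amplifier term, whereas the reception costs 0.02 mJ. At this range the distance-dependent term therefore dominates the per-hop energy.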

Finally, for each query configuration (i.e., network deployment and query initiator), we have run the studied approaches and recorded the obtained results. The results presented below are the mean values of 30 runs for each network configuration.

8. Results

Before presenting details about the obtained results, we first discuss the impact of holes on the overall performance of our approach.

8.1. Holes Impact

As mentioned in previous sections, in all the network configurations tested in our simulations (about 2000 network configurations), we have not encountered any configuration like those presented previously in Section 4. The HER, whenever used, has returned one loop; i.e., the hole had already been explored previously and the second step was not triggered by the HCN (see HER in Subsection 4.1). Furthermore, we have noted very few sparse configurations (less than 0.02% of the total number of studied configurations) in which the HCN has launched more than one hole exploration message during its first step (i.e., the HCN is located on more than one hole). In conclusion, the practical impact of holes (in the sense of using the HER and HXR rules) was not noticeable on the performance of our approach.

8.2. Safe Scenarios

Table 4 summarizes the obtained results for the required communications (hop count). As expected, distributed approaches (Flooding and Iterative) are by far the greatest consumers of communications. This is due to the fact that they were primarily designed to offer better robustness than centralized approaches in noisy environments; their distributed nature therefore generates a very large number of communications. We also note that the Warehouse approach requires many communications as well, in comparison to serial approaches. These results confirm the overall advantage of serial approaches over centralized and distributed ones, as reported in previous research works [10]. In Figure 15, we have explicitly plotted the results of serial approaches (DFS and GBT) together with the theoretical optimal curve representing (m − 1) hops for a network of m nodes. We can see that in sparse networks the DFS approach slightly outperforms the GBT approach. The reason for this is that in sparse configurations there are many linear paths, and our approach will by construction visit them at least twice: once by the greedy rule and once by the boundary rule. However, when the network becomes dense, GBT performs better than DFS. This result is most noticeable for 500 nodes (i.e., a reduction of nearly half of the communications). The reason behind this improvement is that, as the network is dense, each node has many neighbors and consequently the greedy rule is more often used. The limited use of the boundary traversal rule contributes to this improvement. We also note that the GBT curve increases linearly with the number of nodes in the network, which confirms the good scalability of our approach.

| #Nodes | WA | FA | IA | DFS | GBT | Opt. |
|---|---|---|---|---|---|---|
| 100 | 2517 | 68294 | 49489 | 198 | 202 | 99 |
| 150 | 7133 | 179568 | 266677 | 298 | 228 | 149 |
| 200 | 12544 | 387673 | 309449 | 398 | 286 | 199 |
| 250 | 22831 | 674752 | 532968 | 498 | 303 | 249 |
| 300 | 23860 | 1057230 | 733622 | 598 | 392 | 299 |
| 350 | 35005 | 1501175 | 994286 | 698 | 411 | 349 |
| 400 | 49151 | 2096460 | 1124348 | 798 | 458 | 399 |
| 450 | 66069 | 2768062 | 1491072 | 898 | 513 | 449 |
| 500 | 80437 | 3514371 | 1861666 | 998 | 564 | 499 |

Table 4: Communication Efficiency (# hops)

[Figure 15: Communication Performance for Serial Approaches in Safe Scenarios. Curves: DFS, GBT, Opt.; x-axis: Number of Nodes (100 to 500); y-axis: Total Number of Communications (# hops).]

In Tables 5 and 6, we present the obtained results for energy consumption and query responsiveness (i.e., query time-to-end), respectively. For the same reasons as those given above, serial approaches largely outperform the other approaches, and consequently can noticeably improve the lifetime of the network. We note the effectiveness of our GBT approach over DFS, particularly in dense networks, where the improvement results in a reduction of energy consumption of more than fifty percent. As expected for the query responsiveness, the iterative approach experiences relatively high delays in comparison to the other approaches. This is due to its iterative nature, where several iterations are needed before it converges to the right estimate (we have set the convergence threshold to $10^{-3}$). In Figure 16, we have plotted the results for the other approaches. The curves show that in sparse networks, the centralized approach (WA) slightly outperforms the other approaches because of the parallel communications it performs and the reduced number of nodes in the network. However, when the network becomes relatively dense, GBT shows a better behavior than the other approaches. Even though the query time-to-end increases with the number of nodes in the network for all approaches, this increase is more pronounced in FA, for instance, than in our GBT approach. This confirms once again the good scalability of our proposal with respect to query responsiveness.

| #Nodes | WA | FA | IA | DFS | GBT |
|---|---|---|---|---|---|
| 100 | 16.65 | 124.89 | 66.56 | 1.10 | 1.17 |
| 150 | 56.65 | 297.72 | 278.39 | 2.07 | 1.27 |
| 200 | 99.75 | 553.50 | 335.22 | 2.68 | 1.56 |
| 250 | 195.73 | 891.71 | 485.39 | 3.32 | 1.62 |
| 300 | 172.64 | 1318.13 | 559.67 | 4.10 | 2.08 |
| 350 | 265.33 | 1815.41 | 717.04 | 4.72 | 2.04 |
| 400 | 392.25 | 2403.04 | 773.57 | 5.33 | 2.27 |
| 450 | 547.65 | 3070.16 | 891.58 | 6.04 | 2.53 |
| 500 | 656.49 | 3815.71 | 1035.53 | 6.72 | 2.78 |

Table 5: Total Query Energy Consumption in the Network (J).

| #Nodes | WA | FA | IA | DFS | GBT |
|---|---|---|---|---|---|
| 100 | 0.76 | 1.43 | 153.51 | 1.92 | 1.94 |
| 150 | 1.47 | 3.57 | 179.26 | 2.84 | 2.22 |
| 200 | 2.11 | 5.39 | 266.35 | 3.83 | 2.85 |
| 250 | 3.48 | 6.72 | 306.52 | 4.79 | 2.87 |
| 300 | 3.36 | 8.15 | 393.77 | 5.64 | 3.76 |
| 350 | 5.64 | 9.38 | 494.23 | 6.73 | 3.93 |
| 400 | 5.70 | 11.96 | 517.41 | 7.77 | 4.36 |
| 450 | 7.72 | 13.84 | 625.55 | 8.77 | 4.96 |
| 500 | 7.88 | 15.82 | 699.07 | 9.89 | 5.55 |

Table 6: Query Time-To-End (s).

8.3. Link Failure Scenarios

[Figure 16: Query Responsiveness. Curves: WA, FA, DFS, GBT; x-axis: Number of Nodes (100 to 500); y-axis: Query Time-To-End (s).]

In this series of simulations, we have gradually introduced link errors in the network according to the scenario described previously. We have varied the LFP from 0.0 to 0.50 and measured the percentage of visited nodes in the network for both the TLF and PLF scenarios. The obtained results are shown in Table 7. Three remarks can be drawn from these results:

(a) In all considered configurations, GBT behaves better than DFS in the presence of temporal as well as permanent link failures. The poor behavior of DFS stems primarily from the fact that, in this approach, each node has to locally record its father. When a son holds the query and the link between it and its father is broken, the query is consequently stopped. In this case, the QIN, after a determined waiting time, has to launch another query, thus leading to poor performance in terms of communication overhead as well as longer query execution time. Conversely, in our approach, thanks to its localized nature (i.e., the next hop is selected locally among the available links), the query holder can continue the query processing even when a link is broken. For this reason, GBT better tolerates link perturbations, hence demonstrating its robustness.

(b) The performance of the GBT approach improves with the network size: when the network size increases, the percentage of nodes visited by GBT increases too, as shown in Figure 17. This is because in dense networks, GBT more frequently uses the Greedy Rule, which is less sensitive to link failures. For the DFS approach, however, the denser and larger the network, the lower its performance. This is due to the tree structure it constructs in order to visit all nodes. Such a structure is more sensitive to link failures and consequently, when its size increases, the risk of query interruption becomes significant. These results clearly confirm the scalability of our proposal even in noisy environments.

(c) As expected, in TLF scenarios both approaches have demonstrated better behavior than in PLF scenarios but, surprisingly, the difference between the two scenarios for each considered approach was not so significant. The reason behind this is that in our GBT approach, a node is rarely visited more than twice (except nodes on the boundaries), which limits the impact of permanent link failures for the considered query. For the DFS approach, as it uses a tree, the nature of the link failure makes a noticeable difference only during the expansion phase of the query (i.e., when a node is looking for its children); during the backward phase (i.e., when a node is looking for its father), TLF and PLF have the same impact.

In summary, these results validate the robustness of our proposed GBT approach.

8.4. MAC Layer Impact

To measure the impact of using a MAC protocol on the performance of serial approaches, in particular on the query responsiveness, we have conducted this series of simulations by enabling and disabling the MAC protocol on the same network configurations. Figure 18 shows the obtained results for the query time-to-end. It has to be noted that we have not observed any impact of the MAC protocol on the number of communications, nor on the energy consumption. The curves clearly show the potential gains that serial approaches could obtain from disabling the MAC protocol, especially in terms of query responsiveness: the query time-to-end has generally been reduced nearly sixfold. Such an optimization could have a great impact on very large-scale WSN deployments.

Temporal Link Failure (TLF); column headers give the approach and, in parentheses, the number of nodes in the network:

| LFP | DFS (100) | GBT (100) | DFS (150) | GBT (150) | DFS (200) | GBT (200) | DFS (250) | GBT (250) | DFS (300) | GBT (300) | DFS (350) | GBT (350) | DFS (400) | GBT (400) | DFS (500) | GBT (500) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 0.05 | 24% | 100% | 20% | 100% | 11% | 100% | 8% | 100% | 5% | 100% | 10% | 100% | 7% | 100% | 5% | 100% |
| 0.10 | 12% | 94% | 5% | 100% | 4% | 96% | 3% | 100% | 4% | 100% | 4% | 100% | 3% | 100% | 2% | 100% |
| 0.15 | 9% | 88% | 5% | 99% | 2% | 100% | 3% | 100% | 2% | 100% | 2% | 100% | 1% | 100% | 1% | 100% |
| 0.20 | 5% | 85% | 4% | 92% | 3% | 99% | 2% | 100% | 2% | 100% | 2% | 99% | 1% | 99% | 1% | 100% |
| 0.25 | 4% | 85% | 2% | 93% | 1% | 97% | 1% | 100% | 2% | 99% | 1% | 98% | 1% | 98% | 0% | 100% |
| 0.30 | 3% | 69% | 3% | 93% | 2% | 89% | 2% | 100% | 1% | 99% | 1% | 100% | 1% | 100% | 1% | 100% |
| 0.35 | 4% | 67% | 3% | 97% | 1% | 91% | 1% | 98% | 1% | 99% | 1% | 98% | 1% | 99% | 0% | 100% |
| 0.40 | 2% | 81% | 1% | 83% | 1% | 91% | 1% | 96% | 1% | 99% | 1% | 99% | 1% | 100% | 0% | 100% |
| 0.45 | 3% | 66% | 1% | 70% | 1% | 98% | 1% | 99% | 1% | 99% | 1% | 99% | 1% | 100% | 1% | 100% |
| 0.50 | 2% | 43% | 2% | 72% | 1% | 86% | 1% | 88% | 1% | 96% | 1% | 99% | 0% | 100% | 0% | 100% |

Permanent Link Failure (PLF):

| LFP | DFS (100) | GBT (100) | DFS (150) | GBT (150) | DFS (200) | GBT (200) | DFS (250) | GBT (250) | DFS (300) | GBT (300) | DFS (350) | GBT (350) | DFS (400) | GBT (400) | DFS (500) | GBT (500) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 0.05 | 24% | 93% | 18% | 100% | 9% | 100% | 8% | 100% | 5% | 100% | 9% | 100% | 7% | 100% | 5% | 100% |
| 0.10 | 12% | 90% | 5% | 97% | 3% | 96% | 3% | 100% | 4% | 100% | 4% | 100% | 3% | 100% | 2% | 100% |
| 0.15 | 13% | 88% | 5% | 100% | 2% | 99% | 2% | 100% | 2% | 100% | 2% | 100% | 1% | 100% | 1% | 100% |
| 0.20 | 6% | 82% | 4% | 81% | 3% | 93% | 1% | 100% | 2% | 100% | 2% | 100% | 1% | 100% | 1% | 100% |
| 0.25 | 4% | 71% | 2% | 93% | 1% | 96% | 2% | 100% | 2% | 100% | 1% | 100% | 1% | 98% | 0% | 100% |
| 0.30 | 3% | 65% | 3% | 90% | 2% | 94% | 2% | 98% | 1% | 100% | 1% | 100% | 1% | 100% | 0% | 100% |
| 0.35 | 5% | 75% | 2% | 73% | 1% | 93% | 1% | 99% | 1% | 99% | 1% | 100% | 1% | 99% | 1% | 100% |
| 0.40 | 2% | 61% | 2% | 77% | 1% | 88% | 1% | 98% | 1% | 99% | 1% | 98% | 1% | 100% | 0% | 100% |
| 0.45 | 3% | 37% | 1% | 66% | 1% | 91% | 1% | 92% | 1% | 91% | 1% | 100% | 1% | 100% | 1% | 100% |
| 0.50 | 2% | 35% | 2% | 62% | 1% | 81% | 1% | 82% | 1% | 90% | 1% | 91% | 1% | 97% | 0% | 100% |

Table 7: Percentage of Visited Nodes in Temporal Link Failure Scenario and Permanent Link Failure Scenario Respectively.

9. Conclusion and future work

This paper has addressed an important issue, namely query processing, in constrained WSNs where resources have to be preserved as long as possible in order to prolong the network lifetime. Serial approaches, compared with centralized and iterative approaches, have shown an interesting improvement in terms of reducing communications, and consequently preserving the network resources whilst enhancing query responsiveness. In this paper, we have proposed a distributed and localized serial approach that performs very efficiently in large-scale and dense networks. This approach has been especially designed to fulfill the scalability and robustness requirements, which are essential to large-scale WSN deployments. Moreover, we have shown the theoretical correctness of our approach, whilst the series of simulations we have conducted validated its effectiveness.

Although our approach has been primarily designed for networks that are static during the query evaluation, we believe that it can support node mobility to a certain extent, without generating overhead due to beaconing. In fact, based on previous node positions, the two most used rules, namely GR and BTR, could be resilient to inaccurate node positions, provided that the distances between the old and the new positions of neighbors remain relatively short. Many other parameters impact this issue, such as the nodes' communication range, movement velocity, network topology, etc. We believe that this important issue deserves more research effort; it is precisely the subject of our future investigation.

References

[1] A. Boukerche, Algorithms and Protocols for Wireless Sensor Networks, Wiley Series on Parallel and Distributed Computing, John Wiley & Sons, 2008.
[2] R. A. G. da Costa, C. E. Cugnasca, Use of data warehouse to manage data from wireless sensors networks that monitor pollinators, in: Proceedings of the Int’l Conf. on Mobile Data Management, 2010, pp. 402–406.
[3] E. Nakamura, A. F. Loureiro, A. Boukerche, A. Y. Zomaya, Localized algorithms for information fusion in resource constrained networks, Information Fusion 15 (2014) 2–4.
[4] S. Kar, J. M. F. Moura, Distributed consensus algorithms in sensor networks with imperfect communication: Link failures and channel noise, IEEE Transactions on Signal Processing 57 (1) (2009) 355–369.
[5] B. Oreshkin, M. Coates, M. Rabbat, Optimization and analysis of distributed averaging with short node memory, IEEE Transactions on Signal Processing 58 (5) (2010) 2850–2865.
[6] Z. Ye, A. A. Abouzeid, J. Ai, Optimal stochastic policies for distributed data aggregation in wireless sensor networks, IEEE/ACM Transactions on Networking 17 (5) (2009) 1494–1507.
[7] S. Guo, L. He, Y. Gu, B. Jiang, T. He, Opportunistic flooding in low-duty-cycle wireless sensor networks with unreliable links, IEEE Transactions on Computers 63 (11) (2014) 2787–2802.
[8] L. Xiao, S. Boyd, S. Lall, A scheme for robust distributed sensor fusion based on average consensus, in: Proceedings of the Int’l Conf. on Information Processing in Sensor Networks (IPSN), 2005, pp. 63–70.
[9] M. S. Talebi, M. Kefayati, B. H. Khalaj, H. R. Rabiee, Adaptive consensus averaging for information fusion over sensor networks, in: Proceedings of the IEEE Int’l Conf. on Mobile Adhoc and Sensor Systems (MASS), 2006, pp. 562–565.
[10] M. Rabbat, R. Nowak, Quantized incremental algorithms for distributed optimization, IEEE Journal on Selected Areas in Communications 23 (4) (2005) 798–808.
[11] A. Nedic, D. P. Bertsekas, Incremental subgradient methods for nondifferentiable optimization, SIAM Journal on Optimization 12 (1) (2001) 109–138.
[12] S. Patil, S. R. Das, A. Nasipuri, Serial data fusion using space-filling curves in wireless sensor networks, in: Proceedings of the First Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (IEEE SECON 2004), 2004, pp. 182–190.
[13] R. D. Nowak, Distributed EM algorithms for density estimation and clustering in sensor networks, IEEE Transactions on Signal Processing 51 (8) (2003) 2245–2253.
[14] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness, W. H. Freeman, New York, 1983.
[15] J. Suomela, Survey of local algorithms, ACM Computing Surveys (CSUR) 45 (2) (2013) 24.
[16] C. Gentile, N. Alsindi, R. Raulefs, C. Teolis, Geolocation Techniques, Principles and Applications, Springer, 2013.
[17] Q. Fang, J. Gao, L. Guibas, Locating and bypassing routing holes in sensor networks, in: Proceedings of IEEE INFOCOM, 2004, pp. 2458–2468.
[18] W. Liu, K. Feng, Greedy routing with anti-void traversal for wireless sensor networks, IEEE Transactions on Mobile Computing 8 (7) (2009) 910–922.
[19] A. Mostefaoui, M. Melkemi, A. Boukerche, Routing through holes in wireless sensor networks, in: Proceedings of the 15th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, ACM, 2012, pp. 395–402.
[20] B. Karp, H. T. Kung, GPSR: Greedy perimeter stateless routing for wireless networks, in: Proceedings of the 6th ACM International Conference on Mobile Computing and Networking, MobiCom’00, ACM, 2000, pp. 243–254.
[21] OMNeT++: Simulation environment (2015). URL http://www.omnetpp.org/
[22] Castalia: Wireless sensor network simulator (2015). URL http://castalia.research.nicta.com.au/index.php/en/
[23] S. Even, Graph Algorithms, Cambridge University Press, 2011.
[24] W. B. Heinzelman, A. P. Chandrakasan, H. Balakrishnan, An application-specific protocol architecture for wireless microsensor networks, IEEE Transactions on Wireless Communications 1 (4) (2002) 660–670.

[Figure 17: Number of Visited Nodes in Noisy Environment. Curves: GBT (LFP=0.05), GBT (LFP=0.50), DFS (LFP=0.05), DFS (LFP=0.50); x-axis: Number of Nodes (100 to 500); y-axis: Percentage of Visited Nodes (%).]

[Figure 18: MAC Protocol Impact on the Query Responsiveness. Curves: DFS with MAC, GBT with MAC, DFS without MAC, GBT without MAC; x-axis: Number of Nodes (100 to 500); y-axis: Query Time-To-End (s).]

Biography

Azzedine Boukerche (FEiC, FCAE, FAAAS) is a full professor and holds a Canada Research Chair position at the University of Ottawa, Ontario, Canada. He is the founding director of the PARADISE Research Laboratory and the scientific director of the DIVA Strategic Research Network at the University of Ottawa. His current research interests include sensor networks, mobile ad hoc networks, mobile computing, wireless multimedia, and distributed computing. He serves as an Associate Editor for several IEEE and ACM journals. He has received several awards for his work on wireless sensor networking and mobile computing, including the George S. Glinski Award for Excellence in Research, the Premier of Ontario Excellence in Research Award, the Ontario Distinguished Researcher Award, the IEEE Core Golden Award, and the University of Ottawa Award for Excellence in Research.

Ahmed Mostefaoui has been an associate professor at the University of Franche-Comte, France, since 2000. He received the M.S. and Ph.D. degrees in computer science from Ecole Normale Supérieure de Lyon (France) in 1996 and 2000, respectively. His research interests are in distributed algorithms for wireless ad hoc and sensor networks, emphasizing both practical and theoretical issues, and in multimedia systems and networking, in particular distributed architectures.

Mahmoud Melkemi received his PhD in applied mathematics from the University of Grenoble, France, in 1992. From 1993 to 2004, he worked as an Associate Professor at the Claude Bernard University, Lyon, France. He joined the Haute Alsace University, Mulhouse, France, in 2005 as a Professor. His main research interests are in the fields of pattern recognition, computer graphics and computational geometry, with applications in ad hoc and sensor networks.
