An efficient top-k query processing framework in mobile sensor networks


Heejung Yang, Chin-Wan Chung, Myoung Ho Kim

PII: S0169-023X(16)30001-5
DOI: 10.1016/j.datak.2016.02.001
Reference: DATAK 1557

To appear in: Data & Knowledge Engineering

Received date: 8 December 2014
Revised date: 28 September 2015
Accepted date: 5 February 2016

Please cite this article as: Heejung Yang, Chin-Wan Chung, Myoung Ho Kim, An Efficient Top-k Query Processing Framework in Mobile Sensor Networks, Data & Knowledge Engineering (2016), doi: 10.1016/j.datak.2016.02.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

An Efficient Top-k Query Processing Framework in Mobile Sensor Networks

Heejung Yang (a), Chin-Wan Chung (a,b,∗), Myoung Ho Kim (a)

(a) The School of Computing, KAIST, Daejeon 34141, Republic of Korea
(b) No.69 Hongguang Avenue, Banan District, Chongqing 400054, China

Abstract

Mobile sensor networks consist of a number of sensor nodes which are capable of sensing, processing, communicating and moving. These mobile sensor nodes move around and explore their surrounding areas. Top-k queries are useful in many mobile sensor network applications. However, the mobility of sensor nodes incurs new challenges in addition to the problems of static sensor networks (i.e., resource constraints). Since mobile sensor nodes tend to move continuously, the network condition changes frequently and they consume considerably more energy than static sensor nodes. In this paper, we propose an efficient top-k query processing framework in a mobile sensor network environment called mSensor. To construct an efficient routing topology, we devise a mobility-aware routing method. Using the semantics of the top-k query, we develop a filter-based data collection method which reduces the energy consumption and provides more accurate query results. We also devise a data compression method for disconnected sensor nodes to deal with the limited memory space of sensor nodes. The performance of our proposed approach is extensively evaluated using synthetic data sets and real data sets. The results show the effectiveness of our approach.

Keywords: Distributed databases, Database applications, Mobile sensor networks, Top-k query processing

∗ Corresponding author
Email addresses: [email protected] (Heejung Yang), [email protected] (Chin-Wan Chung), [email protected] (Myoung Ho Kim)

Preprint submitted to Journal of LaTeX Templates, February 15, 2016

1. Introduction

Recent advances in robotics and low power embedded systems have enabled the deployment of mobile sensor networks for many applications. Mobile sensor networks implant mobility into static sensor networks. In a static sensor network, static sensor nodes are deployed in a region and collect data periodically in their static locations. Static sensor nodes have capabilities of sensing, processing, and communication. On the other hand, mobile sensor networks are composed of mobile sensor nodes, and each mobile sensor node not only has the capabilities of a static sensor node, but also moves around the neighborhood. A mobile sensor node explores its surroundings and exchanges information with its peers through wireless communication.

A top-k query in sensor networks returns the k nodes with the highest (or lowest) sensor readings. Top-k queries are widely used for retrieving the k most interesting data in many mobile sensor network applications such as environmental monitoring, disaster and emergency management, and military surveillance. They can be used not only to monitor the data generated by sensors in real time but also to perform data analysis for archival purposes or research study. Consider an environmental monitoring example. Environmental engineers want to know about contaminated areas and to take appropriate actions for the contamination. However, these areas might be too dangerous to access in person. In this case, mobile sensors can be used. These mobile sensor nodes are equipped with chemical or pollution sensors to detect the amount of toxic chemicals and GPS sensors to obtain location information. Each mobile sensor node moves around and generates sensor data (nodeID, GPS data, sensing value). A top-k query is issued to find out the locations with the highest sensing values.

To compute a top-k query in a sensor network environment, it is necessary to retrieve sensor readings from multiple sensor nodes and compare their values to generate the results. A naive implementation of top-k query processing is to use a centralized approach in which all sensor readings are transmitted to the base station and then the base station computes the top-k result. If a certain sensor reading is smaller than the k-th sensor reading of the top-k result, the transmission of this data is unnecessary. However, this approach transmits all sensor readings regardless of their values, and thus it consumes too much energy. Since sensor nodes are battery-powered and deployed in an unattended manner, it is not easy to replace their batteries. Therefore, reducing the energy consumption is a major concern in sensor networks. Batteries of sensor nodes are depleted by sensing, computation, and communication. Among these tasks, communication is the primary source of the energy consumption [1–3]. Thus several techniques have been proposed to cope with the limited energy by reducing the communication. An in-network aggregation technique, initiated in TAG [4], is one of them. In this approach, a routing tree rooted at the base station is first established, and the data are transmitted along the routing tree. An intermediate sensor node compares the received data with its own sensor reading and sends the top-k sensor readings to the parent. Therefore, this approach can reduce the energy consumption compared to the centralized approach. Since users are often not interested in exact answers and small errors can be accepted in many scenarios, approximate top-k query processing techniques have been proposed. Silberstein et al. [3] developed a sampling-based approach. They use samples of the past sensor readings and formulate the problem of optimizing approximate top-k queries under the energy constraint as a linear program. FILA [5] is another approximation approach using a filter. The basic idea is to install a bounded filter at each sensor node to suppress unnecessary sensor updates. A sensor node transmits its sensor reading only when the sensor reading exceeds the bounded filter. However, all of these approaches are based on the static sensor network environment.

In mobile sensor networks, the mobility of sensor nodes incurs new challenges in addition to the problems of static sensor networks (i.e., resource constraints). Since mobile sensor nodes tend to continuously move around, the network conditions change frequently and they consume considerably more energy than static sensor nodes. In static sensor networks, once the routing topology is constructed, it can be used throughout the whole network lifetime, although some minor changes might be required because of node failures, bad link qualities, etc. However, in mobile sensor networks, the routing topology is changed frequently because of the movement of mobile sensor nodes. This incurs a significant routing overhead and increases the energy consumption. Therefore, we need a new routing method to deal with the network dynamics and the energy limitation of mobile sensor nodes. To reduce the energy consumption in data collection, it is important to avoid unnecessary transmissions. In top-k query processing, an unnecessary transmission is the transmission of data whose sensing value is smaller than the k-th sensing value in the top-k result. Therefore, we need an energy-efficient data collection method that considers the semantics of the top-k query. Some sensor nodes might be disconnected from the network. In this case, a disconnected sensor node has to store its data until the network is connected. Since we cannot store all data in the disconnected sensor node because of the limited memory space, we need a space-efficient data storing method.

In this paper, we propose mSensor, an efficient top-k query processing framework in mobile sensor networks. In order to deal with the network dynamics and the limited energy of sensor nodes, a mobility-aware routing method is proposed together with relevant routing criteria. The Analytical Hierarchy Process (AHP) is used to construct the best route from all routing factors according to the user's preferences of the routing factors. The filter-based data collection method using the semantics of the top-k query is devised to reduce the energy consumption. Since sensor data whose sensing value is smaller than the k-th sensing value in the local top-k result cannot be part of the global top-k result, it is not necessary to transmit that sensor data. That is, the k-th sensing value in the local top-k result can be used as a filter to filter out unnecessary transmissions. In order to deal with the limited memory of sensor nodes, a data compression method using the Haar wavelet transform and thresholding is proposed to store data that have not been transmitted. Since the Haar wavelet transform reduces the amount of data very effectively, it can be adapted well in resource limited sensor networks [6]. Thresholding that discards some detail coefficients from the Haar wavelet transform is used for compact representation of the data. Our approach can save energy in dealing with the problems caused by the mobility and resource constraints of mobile sensor nodes and provide a sufficiently accurate top-k result. The contributions of this paper are as follows:

• We propose mSensor which is an efficient top-k query processing framework in a mobile sensor network environment.

• We devise an efficient routing method to deal with the network dynamics and the energy limitation of sensor nodes. There are various factors that affect the quality of a route. We define routing criteria to represent important route characteristics for top-k query processing in mobile sensor networks. Based on the routing criteria, we provide an effective method to select the best route.

• We devise a filter-based data collection method to reduce the energy consumption which is more critical for mobile sensor nodes than static sensor nodes. We exploit the semantics of the top-k query to both increase the filtering efficiency and provide a more accurate result.

• We devise an effective data compression method for disconnected sensor nodes. Since sensor nodes have very limited memory, we provide a data compression method using the Haar wavelet transform, which can be performed with the low computing power of sensor nodes, and thresholding in a way that is advantageous to top-k query processing.

• Extensive experiments are conducted to evaluate the performance of our approach. The results show the effectiveness of our approach in the mobile sensor network environment. On average, the energy consumption of the proposed approach is approximately 57% less than that of a recent approach adapted to the mobile environment while providing nearly 100% accuracy of the query results.

The remainder of this paper is organized as follows. Section 2 reviews the related work on top-k query processing in sensor networks. In Section 3, we present mSensor, our top-k query processing framework in mobile sensor networks. The experimental results are shown in Section 4. Finally, in Section 5, we draw conclusions.


2. Related Work

Top-k query processing. There has been much work on top-k query processing in distributed networks. The main objective is to return the k highest answers efficiently. A typical assumption is that the ranking score of an object should be aggregated from a number of attribute values stored at distributed data sources, i.e., vertically or horizontally fragmented data sets. The first important approach is Fagin's algorithm (FA) [7]. The basic idea of FA is to read attribute values of the top-k objects from every sorted list until there are k objects which have been seen in all lists. There is no need to continue scanning the rest of the lists. For each seen object, a random access is performed to find the missing attribute. The ranking scores of the seen objects are computed and the k objects with the highest ranking scores are returned. The Threshold Algorithm (TA) [8], [9], [10] is the most well-known algorithm for sorted lists. TA is simple and efficient and provides a significant performance improvement over FA. The main difference between TA and FA is the stopping mechanism, which determines when to halt sorted access to the lists. The stopping mechanism of TA uses a threshold which is computed using the last local scores seen under sorted access in the lists. It halts sorted access when there are at least k objects seen whose ranking scores are higher than or equal to the threshold. However, TA may perform more random accesses than FA. To reduce the number of accesses, the Best Position Algorithm (BPA) [11] was proposed. The stopping mechanism of this algorithm takes into account the positions seen in the lists. Another algorithm is the Three Phase Uniform Threshold (TPUT) algorithm [12]. This algorithm reduces remote accesses in large networks. The communication cost is reduced by pruning away ineligible data items and restricting the number of round-trip messages between the query originator and the other nodes.

In sensor network environments, the execution of a query is typically associated with the transfer of data between sensor nodes along the routing topology.

However, sensor nodes have very limited energy resources, and communication is the primary source of the energy consumption. Therefore, it is important to conserve energy by reducing the communication. To reduce data transmission, an in-network aggregation technique, initiated in TAG [4], has been proposed. This approach reduces the overall energy consumption by performing computation within the network and reducing the size of the transmitted data. This can be applied to various aggregation functions, such as MIN, MAX, COUNT, SUM, and AVG. For top-k query processing, the intermediate sensor nodes of the routing topology calculate their local top-k results and transmit only k data to their parent sensor nodes. Since users are often not interested in exact answers and small errors can be accepted in many scenarios, approximation can be used to further reduce the energy consumption. Silberstein et al. [3] proposed a sampling-based approach. Using samples of past sensor readings, they formulate the problem of optimizing approximate top-k queries as a linear program. They encode the energy budget, topology, and other features such as local filtering and proofs as constraints. The resulting plan is generated to achieve the highest possible accuracy while exhausting no more energy than allocated to it. Wu et al. [5] proposed a filter-based approach (FILA). A bounded filter is installed at each sensor node to filter out unnecessary sensor updates. At every sampling instance, a sensor node transmits its sensor reading to the base station only when the sensor reading passes the filter. This can reduce the number of transmissions. However, all of these works are focused on efficient top-k query processing in static sensor networks.

Routing in mobile sensor networks. In mobile sensor networks, the network topology is changed frequently due to the movement of mobile sensor nodes, and thus a large amount of overhead is generated to construct a new route. However, mobile sensor nodes have very limited resources. Therefore, traditional routing protocols for mobile ad hoc networks such as DSDV [13], AODV [14], DSR [15], and ZRP [16] are not adequate for mobile sensor networks. Kim et al. [17] proposed a cluster-based routing protocol for mobile sensor networks called LEACH-Mobile by modifying the LEACH (Low Energy Adaptive Clustering Hierarchy) protocol [18], which is the most popular energy-efficient hierarchical clustering protocol for static sensor networks. The membership declaration is added to confirm whether a mobile sensor node is able to communicate with a specific cluster head. However, they focus on reducing the packet loss in a cluster-based environment. Lee et al. [6] proposed a routing protocol designed to send data from static sensor nodes to mobile sinks. Data is transmitted to relay nodes that are static sensor nodes along the predicted trajectory of the mobile sink. The relay nodes stash data until the mobile sink passes and picks up the data. Linear programming is used to find optimal relay nodes that minimize the number of necessary transmissions. However, this work mainly focuses on predicting the trajectory of mobile sinks to improve routing efficiency. Moeller et al. [19] presented a dynamic backpressure routing protocol, called BCP (Backpressure Collection Protocol), for static sensor networks and sensor networks having mobile sinks. Routing and forwarding decisions are made dynamically for each packet. The expected number of transmissions and the LIFO (Last-In-First-Out) queueing discipline are used to improve packet delays. However, they do not consider the energy saving aspect.

Data compression in sensor networks. Sensor nodes have very limited memory. Data compression can be a solution for efficient memory utilization. S-LZW (Sensor-LZW) [20] is a variant of the Lempel-Ziv-Welch (LZW) [21] compression algorithm for sensor nodes. S-LZW splits sensor data into fixed sized blocks and then compresses each block separately using a dictionary. The major concern of S-LZW is energy savings of sensor nodes. However, for disconnected sensor nodes, space-efficient memory utilization is important. In S-LZW, since sensor nodes keep dictionaries to compress sensor data, additional memory is required. In order to improve the compression efficiency, several approaches exploiting correlation of sensor data have been proposed. Dang et al. [22] proposed a compression method over a logical mapping, which assigns indices to sensor nodes based on the data content, in a cluster-based sensor network environment. A sensor node calculates the coefficient corresponding to its index and only transmits non-zero coefficients to the server. However, this work requires a high communication overhead because each sensor node has to learn the pattern of data of the whole cluster. Gandhi et al. [23] proposed a framework to compress a large amount of sensor data, called GAMPS, using correlation of sensor data. GAMPS dynamically groups multiple sensor data so that the data within each group are correlated and can be maximally compressed together. Data in each group are transformed to further improve the compression ratio using linear regression. However, they focus on efficient compression of multiple sensor data at the base station, and thus the energy saving aspect at the sensor nodes is not considered.

3. Proposed Approach

The goal of the approach is to provide an efficient top-k query processing framework in mobile sensor networks. A formal definition of the problem is given in Section 3.1. The mobility-aware routing, filter-based data collection, and data compression method for disconnected sensor nodes are discussed in detail in Sections 3.2, 3.3, and 3.4, respectively.

3.1. Problem Definition

This study considers a mobile sensor network, as depicted in Figure 1. It is assumed that the base station has a continuous power supply. In contrast, the sensor nodes are powered by battery. Each sensor node $n_i$ moves around and measures the local physical phenomenon $v_i$. Sensor data $d_i$ consists of (nodeID, GPS data, sensing value), where

• nodeID is the sensor node identifier.
• GPS data consists of the x and y location, the speed, the heading, and the time.
• sensing value is a sensor reading of a chemical or pollution sensor.

[Figure 1: System architecture. Mobile sensors n1, n2, n3, ..., nn monitor a chemical dispersion and report (GPS data, sensing value); the base station receives the top-k query and returns the top-k result.]

A routing tree is used to transmit sensor data to the base station. The communication range of a sensor node is constrained to its local area. Transmissions are based on TDMA (Time Division Multiple Access) to avoid collisions. To make a route to the base station, a sensor node selects one node as a parent among a set of sensor nodes which are within its communication range.

Given that each sensor node moves around, a parent-child relationship cannot be continued permanently. When the parent moves beyond the sensor node's communication range, a new route should be constructed. If a sensor node has no other sensor nodes within its communication range, it cannot transmit its data and therefore stores both the data received but not sent before the disconnection and its own sensor data generated after the disconnection in local memory.

For a mobile sensor network environment, we consider a top-k query that continuously requests the list of sensor data R with the highest sensor readings, that is, $R = \langle r_1, r_2, \ldots, r_k \rangle$ where $\forall i < j,\ v_i \geq v_j$ and $\forall l > k,\ v_l \leq v_k$.
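The definition of R can be illustrated with a minimal centralized sketch in Python. It is not part of mSensor; the `Reading` type, its field names, and the sample values are hypothetical and only show what a correct top-k answer looks like when every reading is available in one place.

```python
import heapq
from typing import List, NamedTuple


class Reading(NamedTuple):
    node_id: int   # hypothetical field name for nodeID
    value: float   # sensing value v_i


def top_k(readings: List[Reading], k: int) -> List[Reading]:
    """Return the k readings with the highest sensing values, highest first."""
    return heapq.nlargest(k, readings, key=lambda r: r.value)


# Illustrative data only; these values do not come from the paper.
readings = [Reading(1, 35.0), Reading(2, 80.0), Reading(3, 9.0),
            Reading(4, 75.0), Reading(5, 50.0)]
result = top_k(readings, k=3)
print([(r.node_id, r.value) for r in result])
```

The returned list is ordered so that earlier entries have values at least as large as later ones, and every reading left out is no larger than the k-th entry, which matches the two conditions on R.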

The result data $r_i$ consists of $(x_i, y_i, t_i, v_i)$, where $x_i$ and $y_i$ are the location of the sensor node, $t_i$ is the time, and $v_i$ is the sensing value. The top-k query results are used to identify the current interesting data or to perform data analysis for archival purposes or research study. Therefore, maintaining more accurate top-k results is necessary. The accuracy of the top-k query results is defined as the percentage of actual top-k results returned by the query. We devise an efficient top-k query processing framework while considering the mobility and resource constraints of sensor nodes. The top-k result is maintained and provided to users by the base station.

3.2. Mobility-aware Routing

In this section, a mobility-aware routing method is devised to deal with the network dynamics and the energy limitation of sensor nodes. To make a route to the base station, REQUEST and REPLY messages are used, where REQUEST is used to find a set of candidate sensor nodes for the parent and REPLY is used by a candidate sensor node to inform a requesting sensor node of its status. The message formats are as follows:

REQUEST {nodeID}

REPLY {nodeID, x, y, speed, heading, descNum, filterValue}

In a REPLY message, x, y, speed, and heading represent the mobility information of the sensor node. descNum and filterValue represent the number of descendant sensor nodes and the value of the filter, respectively. A sensor node $n_i$ sends a REQUEST message to sensor nodes that are within the communication range of $n_i$. Once a sensor node $n_j$ receives a REQUEST message, it sends a REPLY message to $n_i$. If a sensor node has no mobility information, it sends a REPLY message without setting the corresponding values. Then, $n_i$ calculates a score for each REPLY message only when there is no empty value in it and selects the sensor node with the highest score as a parent.
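The REPLY handling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Reply` class, the use of `None` for an unset value, and the `select_parent` helper are assumptions, and the scoring function is passed in as a parameter because the score itself is defined later in this section.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reply:
    """REPLY {nodeID, x, y, speed, heading, descNum, filterValue}.

    None marks a value the replying node did not set (an assumption of this sketch).
    """
    node_id: int
    x: Optional[float] = None
    y: Optional[float] = None
    speed: Optional[float] = None
    heading: Optional[float] = None
    desc_num: Optional[int] = None
    filter_value: Optional[float] = None

    def is_complete(self) -> bool:
        fields = (self.x, self.y, self.speed, self.heading,
                  self.desc_num, self.filter_value)
        return all(f is not None for f in fields)


def select_parent(replies: List[Reply],
                  score: Callable[[Reply], float]) -> Optional[int]:
    """Score only replies without empty values; return the best node ID."""
    complete = [r for r in replies if r.is_complete()]
    if not complete:
        return None  # no usable candidate in range
    return max(complete, key=score).node_id


replies = [
    Reply(1, 0.0, 0.0, 1.0, 0.0, 3, 10.0),
    Reply(2),                               # no mobility information set
    Reply(3, 5.0, 5.0, 0.5, 1.0, 1, 70.0),
]
# Placeholder score for illustration only: prefer the larger filter value.
parent = select_parent(replies, score=lambda r: r.filter_value)
```

With the placeholder score, node 2 is skipped because its REPLY has empty values, and node 3 is chosen.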

[Figure 2: Routing criteria. Goal: find the best parent. Routing criteria and their meaning: DistanceToBS (delay), ExpectedTime (stability), DescNum (accuracy), FilterValue (energy). Alternatives: candidate sensor nodes.]

To find the best parent, Routing Criteria C is defined, as shown in Figure 2. Routing Criteria C consists of the routing factors DistanceToBS, ExpectedTime, DescNum, and FilterValue. Let $n_i$ be a sensor node that is searching for a parent and $n_j$ be a candidate sensor node for a parent. The routing factors are as follows:

• DistanceToBS is the distance between $n_j$ and the base station.
• ExpectedTime is the maximum possible duration that $n_j$ stays within the communication range of $n_i$.
• DescNum is the number of descendant sensor nodes of $n_j$.
• FilterValue is the k-th sensing value in the local top-k result of $n_j$.

There are various factors that affect the quality of a route. From a network perspective, a good route is one in which the transmission delay to the base station is small and the stability of the link is good enough to deal with the mobility of the sensor nodes. Among the candidate sensor nodes for the parent, the node which is close to the base station can transmit data in a short time. Therefore, DistanceToBS is utilized as a routing factor that represents the transmission delay. This is calculated by

$$\sqrt{(x_j - x_{bs})^2 + (y_j - y_{bs})^2} \qquad (1)$$

where $(x_j, y_j)$ and $(x_{bs}, y_{bs})$ are the location of $n_j$ and the location of the base station, respectively.
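Equation (1) is the Euclidean distance in the plane. A minimal Python sketch, with a hypothetical function name and planar coordinates assumed:

```python
import math


def distance_to_bs(xj: float, yj: float, x_bs: float, y_bs: float) -> float:
    """DistanceToBS of candidate n_j: Euclidean distance to the base station (Eq. 1)."""
    return math.hypot(xj - x_bs, yj - y_bs)
```

For instance, a candidate at (3, 4) with the base station at the origin is at distance 5.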

As sensor nodes continuously move around, there are numerous occasions on which the previous parent-child relationship can be broken. Whenever the link is broken, it is necessary to create a new link. However, this causes a considerable amount of routing overhead, which increases the energy consumption. To reduce the routing overhead, the mobility information of a sensor node can be used. If a sensor node selects as a parent a candidate sensor node which will remain longer within its communication range, this parent-child relationship will last for a long time.

Therefore, ExpectedTime is used as a routing factor to represent the stability of the link. The next position of a sensor node $(x_{nextPosition}, y_{nextPosition})$ can be estimated by

$$x_{nextPosition} = x_{currentPosition} + s \cos\theta \, \Delta t \qquad (2)$$

$$y_{nextPosition} = y_{currentPosition} + s \sin\theta \, \Delta t \qquad (3)$$

where $(x_{currentPosition}, y_{currentPosition})$, $s$, and $\theta$ are the current position, speed, and heading of a sensor node, respectively. If the distance between the next position of $n_j$ and that of $n_i$ is smaller than the communication range of $n_i$, $n_j$ is considered to stay within the communication range of $n_i$. Therefore, ExpectedTime is the maximum $\Delta t$ which satisfies the following equation.

$$\sqrt{(x_{i_{nextPosition}} - x_{j_{nextPosition}})^2 + (y_{i_{nextPosition}} - y_{j_{nextPosition}})^2} < R \qquad (4)$$

where $(x_{i_{nextPosition}}, y_{i_{nextPosition}})$ and $(x_{j_{nextPosition}}, y_{j_{nextPosition}})$ are the next positions of the sensor nodes $n_i$ and $n_j$, respectively, and $R$ is the communication range of $n_i$.

From a query processing perspective, it is important to provide an accurate query result while reducing the energy consumption. DescNum and FilterValue are selected as routing factors to increase the accuracy of the query results and to reduce the energy consumption, respectively. To obtain a top-k query result, it is necessary to know the sensor values from all sensor nodes and sort them. However, in a mobile sensor network environment, some sensor nodes may be disconnected from the network. Disconnected sensor nodes can transmit neither their own sensor data nor received sensor data; therefore, these data are excluded from the current top-k query processing. In this case, the base station has no choice but to compute the top-k result based only on the received sensor data. Therefore, a sensor node which has many descendant sensor nodes is a better parent to provide an accurate query result. To represent this, DescNum is utilized as a routing factor.
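Returning to the ExpectedTime factor, Equations (2) to (4) admit a closed-form solution. Substituting the predicted positions into Equation (4) and squaring gives a quadratic inequality in $\Delta t$, and the largest admissible $\Delta t$ is its larger root. The Python sketch below works under assumptions that the text does not state: headings are in radians, both nodes keep their current speed and heading, and a candidate that is already at or beyond the range gets an ExpectedTime of 0. The function name and parameters are hypothetical.

```python
import math


def expected_time(xi: float, yi: float, si: float, theta_i: float,
                  xj: float, yj: float, sj: float, theta_j: float,
                  comm_range: float) -> float:
    """Largest dt for which n_j is predicted to stay within comm_range of n_i."""
    # Position and velocity of n_j relative to n_i (from Eqs. 2 and 3).
    px, py = xj - xi, yj - yi
    vx = sj * math.cos(theta_j) - si * math.cos(theta_i)
    vy = sj * math.sin(theta_j) - si * math.sin(theta_i)

    # Squared distance at time dt is a*dt^2 + b*dt + (c + comm_range^2).
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    c = px * px + py * py - comm_range ** 2

    if c >= 0.0:
        return 0.0       # already at or beyond the range (assumption of this sketch)
    if a == 0.0:
        return math.inf  # identical velocities: the distance never changes
    disc = b * b - 4.0 * a * c  # positive because a > 0 and c < 0
    return (-b + math.sqrt(disc)) / (2.0 * a)
```

Strictly speaking, Equation (4) uses a strict inequality, so the returned value is the supremum of the admissible $\Delta t$; for ranking candidates this distinction should not matter.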

By exploiting the semantics of the top-k query in route construction, we can reduce the energy consumption of sensor nodes. In top-k query processing, an unnecessary transmission is a transmission of sensor data that has a smaller sensing value than the k-th sensing value of the top-k result. Therefore, the k-th sensing value of the local top-k result can be used as a filter to suppress unnecessary transmissions. Each sensor node sends its data only when the sensing value of the data is larger than the filter. For example, consider a sensor node n that has sensor data {9, 20, 35, 40, 50, 75, 80, 100}. Suppose that there are three candidate sensor nodes $n_1$, $n_2$, and $n_3$ whose filter values are 10, 70, and 33, respectively. If $n_1$ is selected as a parent, only 9 is filtered out. If $n_2$ is selected as a parent, 9, 20, 35, 40, and 50 are filtered out. If $n_3$ is selected as a parent, 9 and 20 are filtered out. By selecting a sensor node which has a large filter value as a parent, the filtering capability increases and thus the energy consumption of sensor nodes decreases. Therefore, FilterValue is used as a routing factor.
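The example above can be replayed with a few lines of Python. The helper name `split_by_filter` is hypothetical; the data and filter values are the ones given in the text.

```python
from typing import List, Tuple


def split_by_filter(values: List[float],
                    filter_value: float) -> Tuple[List[float], List[float]]:
    """Split readings into (sent, suppressed); only values above the filter are sent."""
    sent = [v for v in values if v > filter_value]
    suppressed = [v for v in values if v <= filter_value]
    return sent, suppressed


data = [9, 20, 35, 40, 50, 75, 80, 100]
candidates = {"n1": 10, "n2": 70, "n3": 33}
for name, filter_value in candidates.items():
    sent, suppressed = split_by_filter(data, filter_value)
    print(name, "suppresses", suppressed, "and lets through", sent)
```

The larger the parent's filter value, the more readings are suppressed at the child, which is why FilterValue is treated as a proxy for energy savings.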

Based on Routing Criteria C, a sensor node computes a score for each candidate sensor node. The score of a candidate sensor node $n_j$ is

$$S(n_j) = \sum_{c_i \in C} w_i \cdot v(c_i) \qquad (5)$$

where $c_i$ is a routing factor in Routing Criteria C, $v(c_i)$ is the value of $c_i$, and $w_i$ is the weight of $c_i$. The value of each routing factor $v(c_i)$ is normalized between 0 and 1. Weights for routing factors are determined using the Analytical Hierarchy Process (AHP) [24]. AHP is a well-researched and widely applied technique used in multi-criteria decision analysis, a field of decision theory. It provides a simple yet systematic means of finding the overall best choice from all alternatives according to the decision maker's preferences of the alternatives.
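Equation (5) can be sketched in Python as below. The text only says that each factor value is normalized between 0 and 1, so two choices here are assumptions and not the paper's method: min-max normalization over the current candidate set, and inverting DistanceToBS so that a shorter distance yields a higher score. All factor values are assumed to be finite, and the weights are the ones reported in the Figure 3 example.

```python
from typing import Dict, List

FACTORS = ("DistanceToBS", "ExpectedTime", "DescNum", "FilterValue")
LOWER_IS_BETTER = {"DistanceToBS"}  # assumption: closer to the base station is better


def normalize(candidates: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Min-max normalize every factor to [0, 1] across the candidate set (assumption)."""
    normalized: List[Dict[str, float]] = [dict() for _ in candidates]
    for factor in FACTORS:
        values = [c[factor] for c in candidates]
        lo, hi = min(values), max(values)
        for cand, norm in zip(candidates, normalized):
            if hi == lo:
                norm[factor] = 0.0  # the factor does not distinguish the candidates
                continue
            v = (cand[factor] - lo) / (hi - lo)
            norm[factor] = 1.0 - v if factor in LOWER_IS_BETTER else v
    return normalized


def score(norm: Dict[str, float], weights: Dict[str, float]) -> float:
    """S(n_j) = sum over c_i in C of w_i * v(c_i), as in Eq. (5)."""
    return sum(weights[f] * norm[f] for f in FACTORS)


weights = {"DistanceToBS": 0.8533, "ExpectedTime": 0.4810,
           "DescNum": 0.0560, "FilterValue": 0.1930}
# Illustrative candidates only; these values do not come from the paper.
candidates = [
    {"DistanceToBS": 10.0, "ExpectedTime": 5.0, "DescNum": 2, "FilterValue": 70.0},
    {"DistanceToBS": 50.0, "ExpectedTime": 1.0, "DescNum": 1, "FilterValue": 10.0},
]
scores = [score(n, weights) for n in normalize(candidates)]
best = max(range(len(candidates)), key=lambda i: scores[i])
```

The candidate with the highest score is selected as the parent; here the first candidate is better on every factor and is therefore chosen.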

To calculate the weights, a pairwise comparison matrix A is formed in which the value of an element $a_{ij}$ represents the relative importance of $c_i$ as compared with $c_j$. Saaty's scale [24] is used to represent the user preferences of the routing factors in Routing Criteria C. Table 1 illustrates Saaty's scale as used here.

Table 1: Saaty's scale

Saaty's scale    The relative importance of the two sub-elements
1                Equally important
3                Moderately important with one over the other
5                Strongly important
7                Very strongly important
9                Extremely important
2, 4, 6, 8       For compromises between the above

When comparing routing factors $c_i$ to $c_j$, 1 indicates that $c_i$ and $c_j$ are equally preferred, 3 signifies moderate preference for $c_i$ over $c_j$, 5 indicates strong preference, 7 denotes very strong preference, and 9 indicates extreme preference. The inverse values 1/3, 1/5, 1/7, and 1/9 are used in the reverse order of the comparison, i.e., $c_j$ vs. $c_i$. Even numbers 2, 4, 6, and 8 represent compromise values between odd numbers. We compare the importance of two routing factors and set the corresponding element of the matrix. After making the pairwise comparison matrix, its principal eigenvector is computed. The principal eigenvector is an eigenvector corresponding to the maximum eigenvalue of the matrix. Weights for routing factors are obtained from this principal eigenvector.
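The principal eigenvector can be approximated without a linear algebra library. The sketch below uses power iteration, which is a swapped-in numerical technique (the paper's Figure 3 describes solving the characteristic equation instead). The matrix is the pairwise comparison matrix of the Figure 3 example, with rows and columns ordered DistanceToBS, ExpectedTime, DescNum, FilterValue. The weights reported there appear to be scaled to unit Euclidean length, so this sketch uses the same scaling; the function name is hypothetical.

```python
import math
from typing import List


def principal_eigenvector(a: List[List[float]], iterations: int = 200) -> List[float]:
    """Approximate the principal eigenvector of a positive matrix by power iteration.

    The result is scaled to unit Euclidean length.
    """
    n = len(a)
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        w = [x / norm for x in nxt]
    return w


# Pairwise comparison matrix from the Figure 3 example (order: DB, ET, DN, FV).
A = [
    [1.0, 3.0, 9.0, 5.0],
    [1 / 3, 1.0, 7.0, 5.0],
    [1 / 9, 1 / 7, 1.0, 1 / 7],
    [1 / 5, 1 / 5, 7.0, 1.0],
]
weights = principal_eigenvector(A)
print([round(x, 4) for x in weights])
```

The output should come out close to the weights reported in the Figure 3 example. Because all entries of a pairwise comparison matrix are positive, power iteration converges to the eigenvector of the largest eigenvalue.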

Figure 3 shows an example in which the weights are calculated for the routing criteria. In the pairwise comparison matrix A, DB, ET, DN, and FV represent DistanceToBS, ExpectedTime, DescNum, and FilterValue, respectively. The elements of A are set according to the user's preference of one routing factor over another, as shown in Figure 3. Based on this pairwise comparison matrix A, the calculated weights for the routing factors are w = {0.8533, 0.4810, 0.0560, 0.1930}.

Routing Criteria C = {DistanceToBS, ExpectedTime, DescNum, FilterValue}

A =
        DB     ET     DN     FV
  DB    1      3      9      5
  ET    1/3    1      7      5
  DN    1/9    1/7    1      1/7
  FV    1/5    1/5    7      1

w = (wDB, wET, wDN, wFV)
Solve Aw = λw by solving det(λI - A) = 0
Find an eigenvector with eigenvalue λmax
w = (0.8533, 0.4810, 0.0560, 0.1930)

Figure 3: An example of weight calculation

Some sensor nodes may fail. The failure of sensor nodes can break communication paths in the network and lead to the loss of sensor data from a certain region. When we construct a route, DescNum is used to reduce the number of unknown sensor data for query processing as much as possible. However, if a parent node that has many descendant sensor nodes fails, both the sensor data from its descendants and its own sensor data are lost, and the accuracy of the query results can be degraded. To deal with this problem, multipath routing can be used. Multipath routing is a routing technique that uses multiple alternative paths through a network to provide reliable data transmission. Instead of selecting one sensor node as a parent, a sensor node selects multiple parents with the highest scores among the candidate sensor nodes. A user specifies the number of parents considering the node failure rates. Sending sensor data to multiple parents reduces the possibility of sensor data loss and thus improves the transmission reliability. Although multipath routing increases the energy consumption of sensor nodes, it can provide more accurate query results. There has been some research on multipath routing in mobile sensor networks, such as DCBM [25] and APMPR [26]. These approaches can also be applied on top of our mobility-aware routing method to provide reliable data transmission.

3.3. Filter-based Data Collection

To reduce the energy consumption, we devise a filter-based data collection method. The basic idea is that the parent sensor node transmits a filter to each child sensor node to filter out unnecessary transmissions. A sensor node transmits sensor data only when the sensor value is larger than the filter value. In a mobile sensor network environment, the parent-child relationship changes frequently. Therefore, we take a distributed approach rather than the centralized approach used by FILA [5] in order to react quickly to network dynamics. In our distributed approach, the filter is transmitted from a parent to a child, whereas in the centralized approach the filter is transmitted from the base station to a sensor node. Consequently, our approach consumes less energy than the centralized approach.

To set the filter, a parent determines its local top-k result based on all received data and sets the filter to the k-th sensing value of that result. This filter is transmitted to the children. As sensor data having a smaller sensing value will not be included in the final top-k result, a sensor node does not transmit the sensor data if its sensing value is smaller than the filter value.

Let Down be the set of sensor data generated by the current sensor node, Dreceived be the set of sensor data received from the children of the current sensor node, and Dtransmitted be the set of data to be transmitted from the current sensor node to its parent. The data transmission works as follows. Each sensor node has its own sensor data Down and, if it has children, the data Dreceived received from them. To formulate Dtransmitted, the sensing value v of each data item in Dreceived is checked as to whether it is larger than the filter value f. If it is, the data item is added to Dtransmitted. To reduce the size of the transmitted data, a sensor node computes the difference between the sensing value and the filter value. Since the parent knows the filter value, it can recover the original sensing value from the received data. If the data type of the sensing value is an integer, 32 bits are required to represent it. However, the size of the data can be reduced by converting the difference value into a binary number. For example, when the sensing value and the filter are 90 and 70, respectively, the difference value diff is 20. This is converted into the binary number binaryDiff = 10100, which requires only 5 bits compared to 32 bits for the original. The same procedure is applied to the data in Down. For the filter transmission, we use one value as a filter instead of the two values used by the previous bounded filter approach to reduce the energy consumption.
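The filtering and difference encoding described above can be sketched as follows. This is a minimal illustration, assuming integer sensing values. The function names are hypothetical, and returning None when fewer than k values are available is an assumption, since the text does not specify that case.

```python
def local_filter(values, k):
    """Return the k-th largest sensing value of the local top-k result,
    or None when fewer than k values are known (assumed behaviour)."""
    top = sorted(values, reverse=True)[:k]
    return top[-1] if len(top) == k else None


def encode_for_parent(values, filter_value):
    """Keep only values larger than the filter and represent each one
    as the binary string of its difference from the filter."""
    encoded = []
    for v in values:
        if v > filter_value:
            encoded.append(format(v - filter_value, "b"))
    return encoded


def decode_at_parent(encoded, filter_value):
    """Recover the original sensing values; the parent knows the filter."""
    return [int(bits, 2) + filter_value for bits in encoded]


# The example from the text: value 90 with filter 70 gives diff 20 = 10100.
sent = encode_for_parent([90, 70, 65, 71], 70)
print(sent)
print(decode_at_parent(sent, 70))
```

In this sketch the values 70 and 65 are suppressed because they do not exceed the filter, and the parent recovers 90 and 71 by adding the filter value back.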

To further reduce the energy consumption of the filter transmission, a suppression scheme is applied to it. If the same parent-child relationship remains at the next time step and the current filter value is identical to the previous filter value, the parent does not transmit its current filter to the children. When a child sensor node does not receive a filter, it assumes that the filter value is unchanged from the previously received value. However, the parent may fail. In this case, the filter is not transmitted to the child, and the child cannot determine whether this non-report is a suppression or a failure. To deal with this problem, a heartbeat message is used. The heartbeat message is a simple notification packet that lets the child know that the parent is alive. If the child receives only the heartbeat message from the parent, the child knows that the filter transmission is suppressed. However, if the child receives neither the heartbeat message nor the filter from the parent, the child knows that a failure has occurred at the parent. Therefore, a failure is never mistaken for a suppression.
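A minimal sketch of the suppression scheme, seen from both sides of a parent-child link, is given below. The function names and the string labels are hypothetical. The parent-side rule assumes that a child which was not a child in the previous time step always needs the current filter.

```python
def children_to_notify(f_current, f_previous, curr_children, prev_children):
    """Parent side: decide which children receive the current filter."""
    if f_current == f_previous:
        # Unchanged filter: only newly attached children need it.
        return set(curr_children) - set(prev_children)
    # Changed filter: every current child needs the new value.
    return set(curr_children)


def interpret_epoch(got_filter, got_heartbeat):
    """Child side: classify what happened at the parent in this epoch."""
    if got_filter:
        return "update"         # use the newly received filter value
    if got_heartbeat:
        return "suppressed"     # parent alive, keep the previous filter
    return "parent_failed"      # neither message arrived


print(children_to_notify(70, 70, {"a", "b", "c"}, {"a", "b"}))
print(interpret_epoch(False, True))
```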

The optimized filter transmission works as follows. The current filter value fcurrent is set to the k-th sensing value in the local top-k result Rlocal. If fcurrent is unchanged from the previous filter value fprevious, the previous children prevChildren and the current children currChildren are compared. Let diffChildren = currChildren − prevChildren. Each sensor node n in diffChildren is a new child, and fcurrent is transmitted to it. If fcurrent has changed, fcurrent is transmitted to all sensor nodes in currChildren.

3.4. Data Compression Method for Disconnection

In a mobile sensor network environment, there exist some disconnected sensor nodes. These sensor nodes have to store their data in their memory until the network is connected. However, sensor nodes have very limited memory. Therefore, we propose an effective data compression method using wavelet transforms and thresholding.

Wavelet [27] is a mathematical tool for the hierarchical decomposition of functions. It is used for data compression. We use the Haar wavelet transform, which can be performed with the low computing power of sensor nodes. The Haar wavelet transform is obtained by recursive pairwise averaging and differencing at different resolutions. This principle is illustrated by means of a simple example. Consider the dataset D = [9, 7, 3, 5]. We first average the values pairwise to obtain a new lower-resolution representation of D with average values [8, 4]. Clearly, some information is lost in this averaging process. To be able to restore the original values of D, it is necessary to store some detail coefficients which capture the missing information. Each detail coefficient is calculated by subtracting the second value of a pair from the pairwise average computed for that pair. Thus, in this simple example, the detail coefficient for the first pair is 1, as 8 - 7 = 1. The second detail coefficient is -1, as 4 - 5 = -1. Repeating this process recursively on the averages gives the full decomposition as follows:

Resolution    Averages     Detail coefficients
4             [9 7 3 5]
2             [8 4]        [1 -1]
1             [6]          [2]
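The averaging and differencing steps above can be sketched as follows. This is an illustrative implementation of the unnormalized Haar transform exactly as described in the example; the function names are hypothetical, and the inverse is included only to show that the original values can be restored.

```python
def haar_transform(data):
    """Haar transform by recursive pairwise averaging and differencing.
    The input length must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1])
                 for i in range(0, len(averages), 2)]
        level_avg = [(a + b) / 2 for a, b in pairs]
        # Detail = pairwise average minus the second value of the pair.
        level_det = [avg - b for avg, (_, b) in zip(level_avg, pairs)]
        # Coarser levels go in front: order of increasing resolution.
        details = level_det + details
        averages = level_avg
    return averages + details


def inverse_haar(coeffs):
    """Rebuild the original values from the overall average and details."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        pos += len(values)
        rebuilt = []
        for avg, d in zip(values, level):
            rebuilt.extend([avg + d, avg - d])
        values = rebuilt
    return values


W = haar_transform([9, 7, 3, 5])
print(W)
print(inverse_haar(W))
```

For D = [9, 7, 3, 5] the sketch yields the overall average 6 followed by the detail coefficients 2, 1, and -1, and the inverse restores D.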

The result of the Haar wavelet transform is a single coefficient representing the overall average of D followed by the detail coefficients in the order of increasing resolution. Thus, the wavelet transform of D is given by WD = [6, 2, 1, -1].

During a disconnection, a sensor node has to store in local memory both the data received but not sent before the disconnection and its own sensor data generated after the disconnection. The Haar wavelet is applied to these data. First, blocks of x, y, and v are formulated. The size of each block is 2^n. Then, the Haar wavelet transform is applied to each block. For the timestamp data, since the sampling interval is known, only the first time and the last time of the disconnection are stored to reduce the size of the data.

No information is lost during the transform. In the previous simple example, the original data D has 4 values, as does the transformed result WD. If the memory space of a sensor node is not sufficient to store all of the data during the disconnection period, it becomes necessary to remove some detail coefficients to reduce the size of the data. The number of required removals is determined from the shortfall in memory space. The task of creating a compact representation of the data from the wavelet transform is known as thresholding. This process determines the best subset of detail coefficients to retain so that the approximation error is minimized. Therefore, thresholding is used to deal with the memory space limitation.

Table 2: Notation

Notation         Description
Nremoval         # of required removals
f                filter value
DreceivedBD      the data received but not sent before the disconnection
DownAD           the sensor's own data generated after the disconnection
Dcompressed      the result data
bSize            the size of one block
processedSize    the size of the processed data from Doriginal
bIndex           block index
countBi          # of sensing values that are smaller than f in i-th block
xB               the set of blocks of x
yB               the set of blocks of y
vB               the set of blocks of v
S                sample space for the random sampling
judgingList      the list of numbers that represents which block to select for the removal of a detail coefficient
countR           # of removed detail coefficients

Algorithm 1 Data Compression Algorithm
Input: Nremoval, f, DreceivedBD, DownAD, bSize
Output: Dcompressed
 1: Dcompressed = ∅
 2: Doriginal = DreceivedBD ∪ DownAD
 3: if !isPowerOfTwo(|Doriginal|) then
 4:     addedNum = # that can make Doriginal to the next-highest power of two
 5:     insert addedNum dummy data d = (0, 0, 0, 0) into Doriginal
 6: end if
 7: processedSize = 0
 8: bIndex = 0
 9: countBbIndex = 0
10: while processedSize < |Doriginal| do
11:     for i = processedSize; i < processedSize + bSize; i++ do
12:         di = (xi, yi, ti, vi) in Doriginal
13:         insert xi, yi, vi into xBbIndex, yBbIndex, vBbIndex, respectively
14:         if vi < f then
15:             countBbIndex++
16:         end if
17:     end for
18:     apply the Haar wavelet transform to xBbIndex, yBbIndex, vBbIndex
19:     processedSize += bSize
20:     bIndex++
21:     countBbIndex = 0
22: end while
23: calculate removal rate r0 : r1 : … : rn-1 = countB0 : countB1 : … : countBn-1
24: lastNum = Σ_{i=0}^{n-1} ri
25: S = {1, 2, …, lastNum}
26: judgingList = {r0, r0 + r1, …, r0 + r1 + … + rn-1}
27: countR = 0
28: while countR < Nremoval / 3 do
29:     num = randomly pick a number from S
30:     if num ≤ judgingList[0] then
31:         remove the smallest detail coefficient from xB0, yB0, vB0
32:     else if num > judgingList[j-1] && num ≤ judgingList[j] then   // 0 < j ≤ n - 1
33:         remove the smallest detail coefficient from xBj, yBj, vBj
34:     end if
35:     countR++
36: end while
37: insert xB, yB, vB into Dcompressed
38: return Dcompressed

Figure 4: Data compression algorithm

Figure 4 shows the data compression algorithm. The notations used in Figure 4 are given in Table 2. If the cardinality of Doriginal (|Doriginal|) is not a power of two (Line 3), dummy data d = (0, 0, 0, 0) are inserted to bring |Doriginal| up to the next-highest power of two (Lines 4-6) because the Haar wavelet transform is obtained by pairwise computation. When blocks xB, yB, and vB are formulated for x, y, and v, respectively (Lines 7-22), the number of sensing values in each block that are smaller than the filter value is counted (Lines 14-16). The counted value countB of a block represents the probability that the block will be excluded from the final top-k result. Based on the countB values of the blocks, the removal rates are calculated (Line 23). If a block has a large count value, it is less likely to contribute to the result, so a higher removal rate is assigned to that block. To remove detail coefficients according to the removal rates of the blocks, a sample space S is formulated (Lines 24-25). Since the number of detail coefficients to be removed from a block according to the removal rates may be a non-integer, we determine the number using a random sampling process that reflects the removal rates. Assume that there are only two blocks B1 and B2, and let the removal rates for B1 and B2 be 3:1. Then the sample space is S = {1, 2, 3, 4}. If the total number of required removals is Nremoval, Nremoval/3 detail coefficients are removed from each of xB, yB, and vB (Line 28) because x, y, and v are considered to be equally important.

For each block in xB, yB, and vB, the number of removed detail coefficients is determined according to the removal rates (Lines 29, 30, 32). If a number num randomly picked from S satisfies the condition judgingList[j−1] < num ≤ judgingList[j], where 0 < j ≤ n − 1 and n is the number of blocks in xB (also in yB and vB), then one detail coefficient is removed from each of the j-th blocks xBj, yBj, and vBj (Lines 32-34). If num ≤ judgingList[0], the detail coefficient is removed from the first block of xB, yB, and vB (Lines 30-31). In the above example, let Nremoval be 6. We randomly pick 6 numbers from S. If 1, 2, or 3 is picked from S, a detail coefficient in B1 is removed. Otherwise, a detail coefficient is removed from B2. To minimize the L2 error [28], the smallest detail coefficient is removed first from each block (Lines 31, 33). To minimize the L∞ error instead, a deterministic wavelet thresholding algorithm [29] can be applied.
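The random sampling step can be sketched for a single attribute as follows. This is a simplified illustration, not the full algorithm of Figure 4: it handles one set of blocks rather than xB, yB, and vB together, it marks a removed coefficient with None, and it interprets "smallest" as smallest in absolute value. These choices, the function names, and the construction of the judging list as cumulative sums of the counts are assumptions made for the example.

```python
import bisect
import random
from itertools import accumulate


def build_judging_list(counts):
    """Cumulative sums of the per-block counts, used as removal rates."""
    return list(accumulate(counts))


def pick_block(num, judging_list):
    """Index j such that judging_list[j-1] < num <= judging_list[j]."""
    return bisect.bisect_left(judging_list, num)


def remove_smallest_detail(block):
    """Mark the smallest-magnitude remaining detail coefficient as removed.
    block[0] is the overall average and is never removed."""
    remaining = [i for i in range(1, len(block)) if block[i] is not None]
    if not remaining:
        return False
    victim = min(remaining, key=lambda i: abs(block[i]))
    block[victim] = None
    return True


def threshold(blocks, counts, n_removals, rng=None):
    """Remove n_removals detail coefficients, choosing blocks at random
    in proportion to their counts."""
    rng = rng or random.Random()
    judging = build_judging_list(counts)
    if not judging or judging[-1] == 0:
        return blocks  # no block has values below the filter
    for _ in range(n_removals):
        num = rng.randint(1, judging[-1])  # a number from S = {1..lastNum}
        remove_smallest_detail(blocks[pick_block(num, judging)])
    return blocks


# Two blocks with removal rates 3:1, as in the example in the text.
blocks = [[6, 2, 1, -1], [5, 0.5, 3, -2]]
print(threshold(blocks, [3, 1], 2, random.Random(0)))
```

With rates 3:1 the judging list is [3, 4], so the numbers 1, 2, and 3 select the first block and 4 selects the second, which matches the example.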

4. Experimental Evaluation

An experimental analysis was conducted to validate the proposed approach using our own simulator.

4.1. Experimental Environment

Synthetic data sets and real data sets are used to evaluate the proposed approach. For the synthetic data sets, 500 mobile sensor nodes are randomly placed in an area of 1000m × 1000m. Within this area, the mobile sensor nodes move around according to the random waypoint model [15] with a maximum speed of 10m/s and zero pause time. Each sensor node randomly chooses a destination point and a speed and moves to this point at the chosen speed. The communication range of each sensor node is set to 50m and the links are assumed to be bi-directional. This is the most common setting in the networking community [2]. The base station is located at the upper left corner of the area.

To simulate the sensing values, a spatio-temporally correlated data set and a random data set are generated. A pyramid event [30] is used to represent the spatio-temporal correlation. This event generates a continuous, gradually increasing or decreasing trend in the sensing values in all directions, originating from a small region in the space. This can be considered as a situation such as a toxic chemical leak. For the random data set, no spatio-temporal correlation exists: even when two regions are close to each other, or the time difference at a given region is small, the sensing values are very different. The value of k varies from 50 to 250. The top-k query is run for 500 epochs with 1 second for each epoch.

Real data sets built from the Intel Berkeley Research Lab [31] were also used. These consist of environmental sensor values over a series of time epochs collected by 54 static sensor nodes spread around the lab. To simulate the mobile sensor network environment, we give mobility to the sensor nodes using the random waypoint model. As the locations of the static sensor nodes in the lab are known, sensing values are assigned to locations and are allowed to change over time. We employ 54 mobile sensor nodes. When a mobile sensor node moves to a certain location, its sensing value is set in accordance with its current location and timestamp. We assume that the base station is located at the upper left corner of the lab. As there are some missing and erroneous sensor data, those data are removed and 500 sensor data are extracted from each sensor node. The value of k varies from 5 to 25. The duration of the top-k query is 500 epochs.
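The random waypoint movement used to give mobility to the sensor nodes can be sketched as follows. This is only an illustration and not the simulator used in the experiments. The function name, the minimum speed of 0.1 m/s, and the simplification that a node which reaches its destination waits there until the next epoch are all assumptions.

```python
import math
import random


def random_waypoint(steps, width=1000.0, height=1000.0,
                    max_speed=10.0, dt=1.0, rng=None):
    """Generate one node's positions over `steps` epochs of length dt."""
    rng = rng or random.Random()
    x, y = rng.uniform(0, width), rng.uniform(0, height)
    dest, speed = None, 0.0
    path = [(x, y)]
    for _ in range(steps):
        if dest is None:
            # Zero pause time: choose the next destination and speed at once.
            dest = (rng.uniform(0, width), rng.uniform(0, height))
            speed = rng.uniform(0.1, max_speed)  # assumed minimum speed
        dx, dy = dest[0] - x, dest[1] - y
        dist = math.hypot(dx, dy)
        step = speed * dt
        if dist <= step:
            x, y = dest      # destination reached within this epoch
            dest = None
        else:
            x += dx / dist * step
            y += dy / dist * step
        path.append((x, y))
    return path


path = random_waypoint(500, rng=random.Random(42))
print(len(path))
```

The default arguments mirror the synthetic setting described above: a 1000m × 1000m area, a maximum speed of 10m/s, and 1-second epochs.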

The routing topology is constructed using the proposed mobility-aware routing algorithm. We assume that all routing factors have equal preferences. The default weights for the routing criteria are w = {0.5, 0.5, 0.5, 0.5}. To show the effects of the routing factors, the weights for them are varied.

The performance of our approach (mSensor) was compared with FILA [5]. Although FILA focuses on top-k query processing in a static sensor network, it is also based on a filtering scheme. Therefore, we choose it as a comparison scheme. We adapt FILA to the mobile sensor network environment by using our mobility-aware routing method. In addition, we compared mSensor with the work of Sasaki et al. [32], a two-phase filtering-based top-k query processing approach for mobile ad hoc networks. We denote their work as TwoPhase. In the first phase of TwoPhase, the base station collects data from all nodes and determines the threshold, which is the k-th highest data value. Then, in the second phase, the base station transmits a query with the threshold, and each node sends back only its own data whose values are equal to or larger than the threshold. The performance is measured in terms of the energy consumption and the accuracy of the query results. Each experiment is run 5 times and the results are averaged.

4.2. Effects of the Routing Factors

To show the effectiveness of the mobility-aware routing method, the preferences of the routing factors are varied and the quality of the routing tree is measured according to different weights of routing factors. As shown in Figure 2, each routing factor affects a different characteristic of the routing tree. The effect of each routing factor is tested by comparing the corresponding characteristics of two routing trees R1 and R2. R1 is the routing tree constructed by assigning the highest weight to the considered routing factor and the same weights to the other routing factors. R2 is the routing tree constructed by assigning the smallest weight to the considered routing factor and the same weights to the other ones.

For example, consider the effect of DistanceToBS (DB). DistanceToBS is used to control the transmission delay in the routing tree. The pairwise comparison matrices for calculating the weights of routing factors for R1 and R2 are shown in Figure 5. The superscripts of R1 and R2 (i.e., DB) represent the considered routing factor. For R1DB, DistanceToBS has extreme importance compared with the other routing factors, and the other routing factors (i.e., ExpectedTime, DescNum, and FilterValue) are equally important. Therefore, 9 is assigned to the elements aij of the pairwise comparison matrix where i = 1 and 1 < j ≤ 4, the inverse value 1/9 is assigned to aji, and 1 is assigned to the remaining elements. The calculated weights for R1DB and R2DB are w(R1DB) = {0.99, 0.11, 0.11, 0.11} and w(R2DB) = {0.06, 0.58, 0.58, 0.58}, respectively. For the experiments showing the effects of the other routing factors, the weights are calculated in the same way as for DistanceToBS. The weights used in each experiment are summarized in Table 3.

A(R1DB) =
        DB     ET     DN     FV
  DB    1      9      9      9
  ET    1/9    1      1      1
  DN    1/9    1      1      1
  FV    1/9    1      1      1

(a) Pairwise Comparison Matrix for R1DB

A(R2DB) =
        DB     ET     DN     FV
  DB    1      1/9    1/9    1/9
  ET    9      1      1      1
  DN    9      1      1      1
  FV    9      1      1      1

(b) Pairwise Comparison Matrix for R2DB

Figure 5: Pairwise comparison matrices to show the effect of DistanceToBS

Table 3: Weights to show the effects of routing factors

Considered Routing Factor    Weights
DistanceToBS (DB)            w(R1DB) = {0.99, 0.11, 0.11, 0.11}
                             w(R2DB) = {0.06, 0.58, 0.58, 0.58}
ExpectedTime (ET)            w(R1ET) = {0.11, 0.99, 0.11, 0.11}
                             w(R2ET) = {0.58, 0.06, 0.58, 0.58}
DescNum (DN)                 w(R1DN) = {0.11, 0.11, 0.99, 0.11}
                             w(R2DN) = {0.58, 0.58, 0.06, 0.58}
FilterValue (FV)             w(R1FV) = {0.11, 0.11, 0.11, 0.99}
                             w(R2FV) = {0.58, 0.58, 0.58, 0.06}

Table 4 shows the effect of DistanceToBS. DistanceToBS affects the transmission delay of the routing tree. Since the maximum transmission delay depends on the height of the constructed routing tree, the percentage of R1DB routing trees that have smaller heights than the corresponding R2DB routing trees is used to show how DistanceToBS affects the routing tree construction. As shown in Table 4, the R1DB trees are generally shorter than the R2DB trees. For the synthetic data sets, 85% of the R1DB routing trees have smaller heights than those of R2DB in the case of correlated data, and 83% in the case of random data. For all three cases of the real data sets, 99% of the R1DB routing trees have smaller heights than those of R2DB. Therefore, by assigning the highest weight to DistanceToBS, the transmission delay can be effectively controlled.

Table 4: The effect of the routing factor DistanceToBS

Data Sets                                          % of R1DB's Improvement over R2DB
Synthetic Data Sets (k = 50)    Correlated Data    85%
                                Random Data        83%
Real Data Sets (k = 5)          Temperature        99%
                                Humidity           99%
                                Light              99%

ExpectedTime (ET) affects the stability of the link. If we give the highest importance to ExpectedTime, a child sensor node will choose as its parent the sensor node that has the longest ExpectedTime among the candidate sensor nodes. This parent-child relationship will last for a long time. Therefore, the energy consumption caused by making a new route can be reduced. To show the effect of ExpectedTime, the energy consumed by constructing the routing tree RiET for i = 1, 2, which is denoted as Routing Energy, is compared. The routing energy is measured by the total amount of messages exchanged (i.e., REQUEST and REPLY messages) to construct the routing tree. As shown in Table 5, when we assign the highest weight to ExpectedTime, less energy is consumed to make the routing tree. The difference is relatively small for the real data sets because both the area in which the sensor nodes move and the number of sensor nodes are small.

Table 5: The effect of the routing factor ExpectedTime

Data Sets                                                  Routing Energy (Bytes)
Synthetic Data Sets (k = 50)    Correlated Data    R1ET    4874751
                                                   R2ET    5286994
                                Random Data        R1ET    4765021
                                                   R2ET    5340803
Real Data Sets (k = 5)          Temperature        R1ET    128920
                                                   R2ET    129153
                                Humidity           R1ET    124632
                                                   R2ET    124836
                                Light              R1ET    121988
                                                   R2ET    122676

In the case of DescNum (DN), since the accuracies of R1DN and R2DN are similar (near 100%), we do not show the results for them.

To show the effect of FilterValue (FV), the numbers of filtered data are compared between R1FV and R2FV. As shown in Table 6, when the highest weight is assigned to FilterValue, sensor nodes can aggressively filter out unnecessary sensor data by choosing as parents the sensor nodes that have larger filter values than the other candidate sensor nodes. Therefore, the number of filtered data in R1FV is larger than that in R2FV.

Table 6: The effect of the routing factor FilterValue

Data Sets                                                  # of Filtered Data
Synthetic Data Sets (k = 50)    Correlated Data    R1FV    42408
                                                   R2FV    41889
                                Random Data        R1FV    37422
                                                   R2FV    36578
Real Data Sets (k = 5)          Temperature        R1FV    13420
                                                   R2FV    13319
                                Humidity           R1FV    20280
                                                   R2FV    20223
                                Light              R1FV    19418
                                                   R2FV    19363

As shown in each experiment, the proposed mobility-aware routing method constructs an efficient routing topology according to the user preferences of routing factors.

4.3. Energy Consumption

In this experiment, the energy consumption of mSensor is compared with those of FILA and TwoPhase. Since the primary source of the energy consumption of a sensor node is communication, the energy consumption is measured using the amount of transmitted data. The energy consumption includes transmissions for constructing the routing topology and for processing the query. The basic data sizes used in the experiments are given below.

Component          Size (bits)
REQUEST            32
REPLY (mSensor)    320
REPLY (FILA)       256
Result data        224

As FILA and TwoPhase do not use the filter information when creating the routing topology, it is not necessary for them to include filter information in a REPLY message. Therefore, the sizes of a REPLY message for FILA and TwoPhase are smaller than that for the proposed approach. Figure 6 (a) and (b) show the results for the synthetic data sets. In both figures, mSensor consumes less energy than FILA and TwoPhase in all cases. FILA uses a bounded filter ([l, u]) for each sensor node. The base station computes a filter and sends it to the corresponding sensor node. Therefore, the filter update overhead is quite large. In the case of TwoPhase, the k-th sensing value of the global top-k is used as a filter. Since it uses one value as a filter, the size of the filter is smaller than that of FILA. However, the base station collects sensor data from all sensor nodes, determines a filter, and then transmits the filter to all sensor nodes. Thus, the filter setting and updating cost is high. In contrast, mSensor uses the k-th sensing value of the parent's top-k result as a filter, and the filter transmission is a single-hop communication between the parent and the child sensor nodes. Therefore, in mSensor, the size of the filter and the hop distance for filter transmission are smaller than those of FILA. Compared with TwoPhase, mSensor has a filter of the same size but a lower transmission cost for filter setting and update. To reduce the transmitted data size, mSensor compactly represents sensing values using a small number of bits. Furthermore, mSensor applies suppression to the filter transmission. Combining all of these features, mSensor significantly reduces the amount of energy consumed compared to FILA and TwoPhase.

Figure 6: Energy consumption comparison for various k values. Panels: (a) Correlated data, (b) Random data, (c) Temperature data, (d) Light data. Each panel plots Energy Consumption (Bytes) against k for mSensor, FILA, and TwoPhase.

The overall energy consumption for the correlated data set in Figure 6 (a) is less than that for the random data set in Figure 6 (b). For the correlated data set, the filter update overheads for both approaches are reduced because there are spatio-temporal patterns in the sensing values. However, in the case of the random data set, the sensing values change randomly. Therefore, filter updates occur more frequently than with the correlated data set.

The results for the real data sets are shown in Figure 6 (c) and (d). We vary the sensing attribute among temperature, humidity, light, and voltage. We show the results for temperature in Figure 6 (c) and for light in Figure 6 (d). The other results are similar to the temperature and light results. As with the synthetic data sets, mSensor consumes much less energy than FILA and TwoPhase.

To test the scalability of the proposed mobility-aware routing method, we conducted an experiment that varies the number of sensor nodes. The overall energy consumed by the mobility-aware routing method is measured. The number of sensor nodes is varied from 500 to 2000. The value of k is set to 250. Figure 7 shows the effect on energy when the size of the network changes. As the size of the network increases, the energy consumption also increases because the total number of sensor nodes increases. On average, when the number of sensor nodes is doubled, the energy consumption increases about 1.1 times. Because an increase in the number of sensor nodes does not directly mean an increase in the density of sensor nodes within the communication range of a sensor node, the energy consumption does not increase dramatically. For a sensor network of 2000 sensor nodes, the average density of sensor nodes within the communication range of a sensor node increases by about 22% compared with a sensor network of 1000 sensor nodes. However, once a sensor node finds a parent, it does not send a REQUEST message to other sensor nodes until the parent-child relationship is broken. Therefore, our mobility-aware routing method is not affected by the increased density of sensor nodes.

Figure 7: Energy consumption comparison for various numbers of sensor nodes. The plot shows Energy Consumption (Bytes) against the Number of Sensor Nodes (500 to 2000).

4.4. Accuracy

mSensor

60 40

0 100

150 k

AC CE P

50

200

Accuracy (%)

D

80

20

TwoPhase

80

60 40 20

0 250

50

(a) Correlated data mSensor

FILA

100

150 k

200

250

(b) Random data

TwoPhase

mSensor

100

100

80

80

Accuracy (%)

Accuracy (%)

FILA

100

TE

Accuracy (%)

100

60 40 20

FILA

TwoPhase

60 40 20

0

0 5

10

15 k

20

25

5

(c) Temperature data

10

15 k

20

25

(d) Light data

Figure 8: Accuracy comparison for various k values

The accuracy of top-k query results was evaluated as the percentage of the actual top-k results returned by the query. Figure 8 shows the accuracy for the synthetic data sets in Figure 8 (a), (b) and for the real data sets in Figure 8 (c), (d). mSensor shows better accuracy than FILA. In FILA, a bounded filter ([l, u]) is used for each sensor node. To maximize the filtering capability, the upper bound u_{i+1} of the top-(i+1) sensor node's filter is set to the lower bound l_i of the top-i sensor node's filter, where 1 ≤ i ≤ k. A sensor node updates its data with the base station only when the sensor value passes the filter. Based on the newly received updates and the previous top-k result, the base station determines the current top-k result. However, this bounded filter cannot adequately capture small changes in the sensing values. In TwoPhase and mSensor, a single sensing value is used as a filter: TwoPhase uses the k-th sensing value of the global top-k and mSensor uses the k-th sensing value of the parent's top-k. This filter is more sensitive than that of FILA. Since the global top-k is obtained by selecting the k highest sensing values from the union of local top-k results, the final top-k result of TwoPhase and that of mSensor are the same. Although TwoPhase can filter out unnecessary sensor data early during the data collection, there is no accuracy difference in filtering between TwoPhase and mSensor. However, as shown in Figure 6, TwoPhase consumes more energy than mSensor because it uses a centralized approach in which the base station collects sensor data from all sensor nodes, determines the filter, and then distributes it to all sensor nodes. To deal with the limited memory space of disconnected sensor nodes, the Haar wavelet transform and thresholding are applied after making blocks of x, y, and v. Since the removal rates are assigned using the semantics of the top-k query, the approximation is applied to blocks that have lower probabilities of appearing in the result. Moreover, the approximation error is minimized by removing the smallest detail coefficient first from each block. Therefore, mSensor can provide query results that are more accurate than those of FILA.

4.5. Memory Consumption

To show the efficiency of the proposed data compression method for a disconnected sensor node, the memory consumption of mSensor during a disconnection period is compared with that of S-LZW [20]. S-LZW is a dictionary-based compression algorithm developed for low-powered wireless sensor nodes. It divides sensor data into fixed-size blocks and then compresses each block individually. For the parameter values of S-LZW, the block size is set to 528 bytes and the dictionary size is set to 512 entries. When the dictionary gets full, a new dictionary entry is created.
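The block-wise, dictionary-based scheme described above can be illustrated with a simplified textbook LZW coder. This is only a sketch under stated assumptions and not the S-LZW implementation of [20]: the function names are hypothetical, codes are kept as Python integers rather than packed bits, and once the dictionary holds max_entries entries the sketch simply stops adding new ones, which may differ from the configuration used in the experiments. The defaults mirror the 528-byte block size and the 512-entry dictionary stated above.

```python
def lzw_compress_block(block, max_entries=512):
    """Compress one block with plain LZW and return a list of integer codes."""
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in block:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            # Assumption of this sketch: a full dictionary is left unchanged.
            if len(dictionary) < max_entries:
                dictionary[candidate] = len(dictionary)
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress_block(codes, max_entries=512):
    """Invert lzw_compress_block for a single block."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry the encoder created just before
            # emitting it, so it must be previous plus its own first byte.
            entry = previous + previous[:1]
        if len(dictionary) < max_entries:
            dictionary[len(dictionary)] = previous + entry[:1]
        output.append(entry)
        previous = entry
    return b"".join(output)


def compress_in_blocks(data, block_size=528, max_entries=512):
    """Split data into fixed-size blocks and compress each one independently."""
    return [
        lzw_compress_block(data[start:start + block_size], max_entries)
        for start in range(0, len(data), block_size)
    ]


# Hypothetical, highly repetitive byte stream standing in for sensor data.
data = bytes([20, 21, 20, 22] * 300)
blocks = compress_in_blocks(data)
restored = b"".join(lzw_decompress_block(block) for block in blocks)
print(len(blocks), sum(len(block) for block in blocks), restored == data)
```

Because every block starts from a fresh dictionary, each block can be decoded without the others, at the cost of keeping a dictionary in memory while a block is being compressed.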

The memory size of a sensor node is set to 4 KB according to the specification of the Mica2 mote. During a disconnection, a sensor node stores in its local memory both the data received but not sent before the disconnection and its own sensor data generated after the disconnection. The total number of sensor data items to be stored in the sensor node varies from 100 to 400. The percentage of data received from the other sensor nodes varies from 10% to 90%. For example, when the total number of sensor data items is 100 and the percentage of received data is 30%, 30 items are received from the other sensor nodes and 70 items are generated by the current sensor node.

Figure 9 and Figure 10 show the memory consumption for the correlated data and the random data, respectively. mSensor uses less memory than S-LZW in all cases. The memory consumption of S-LZW consists of two parts: 1) the memory used for the dictionary and 2) the memory used for the compressed data. For each bar of S-LZW, the bottom portion represents the memory consumption of the S-LZW dictionary and the top portion represents that of the compressed data. In practice, when the data memory of a sensor node is full, S-LZW cannot store any more sensor data in the sensor node. However, for comparison, it is assumed that the sensor node used for S-LZW has unlimited memory space. As the number of data items to be stored in the sensor node increases, the memory consumption of S-LZW also increases because both the size of the dictionary and the size of the compressed data increase. The memory requirement of the S-LZW dictionary is very high: on average, the dictionary accounts for 70.5% of the memory used by S-LZW for the correlated data set and 68.2% for the random data set. In contrast, mSensor does not require a dictionary to compress the sensor data. The memory consumption of S-LZW increases as the percentage of the received data increases because the correlation of the sensor data to be compressed decreases as more of the data comes from other sensor nodes. However, mSensor has the same memory consumption regardless of the percentage of the received data because it applies the thresholding when the memory space of the sensor node is not sufficient. On average, the memory usage of mSensor is approximately 57% and 84% less than that of S-LZW for the correlated data set and for the random data set, respectively.

Figure 9: Memory consumption comparison between mSensor and S-LZW for the correlated data. Panels: (a) the number of data = 100, (b) the number of data = 200, (c) the number of data = 300, (d) the number of data = 400. Each panel plots memory consumption (bytes) against the percentage of received data (10% to 90%) for the S-LZW dictionary, the S-LZW compressed data, and the mSensor compressed data.

Figure 10: Memory consumption comparison between mSensor and S-LZW for the random data. Panels: (a) the number of data = 100, (b) the number of data = 200, (c) the number of data = 300, (d) the number of data = 400. Each panel plots memory consumption (bytes) against the percentage of received data (10% to 90%) for the S-LZW dictionary, the S-LZW compressed data, and the mSensor compressed data.

Figure 11 shows the memory consumption for the temperature data of the real data set. The results for the humidity data and the light data are omitted because they are similar. As shown in Figure 11, mSensor uses less memory than S-LZW in all cases. S-LZW consumes approximately 71.2% of its memory for the dictionary. On average, mSensor uses approximately 49% less memory than S-LZW.
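The thresholding that bounds the memory use of mSensor can be illustrated on a single block of readings. The sketch below uses the unnormalized averaging and differencing form of the Haar transform and zeroes the detail coefficients of smallest magnitude first. It is an illustration under assumptions rather than the mSensor implementation: the function names and the sample block are hypothetical, and the per-block removal rates derived from the top-k semantics are not reproduced here.

```python
def haar_transform(values):
    """Unnormalized 1-D Haar decomposition of a power-of-two-length block.

    Returns the overall average followed by the detail coefficients,
    ordered from the coarsest level to the finest.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Rebuild the block from the output of haar_transform."""
    values = list(coeffs)
    length = 1
    while length < len(values):
        rebuilt = []
        for average, detail in zip(values[:length], values[length:2 * length]):
            rebuilt.extend([average + detail, average - detail])
        values[:2 * length] = rebuilt
        length *= 2
    return values


def threshold(coeffs, remove_count):
    """Zero the remove_count detail coefficients with the smallest magnitude."""
    result = list(coeffs)
    by_magnitude = sorted(range(1, len(result)), key=lambda i: abs(result[i]))
    for index in by_magnitude[:remove_count]:
        result[index] = 0.0
    return result


# Hypothetical block of four readings.
block = [9, 7, 3, 5]
coeffs = haar_transform(block)
approximation = inverse_haar(threshold(coeffs, 2))
print(coeffs)
print(approximation)
```

Coefficients set to zero need not be stored, so raising remove_count trades reconstruction error for memory; this is why the stored size can be held to a fixed budget regardless of how the data in the block were obtained.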

Figure 11: Memory consumption comparison between mSensor and S-LZW for the temperature data. Panels: (a) the number of data = 100, (b) the number of data = 200, (c) the number of data = 300, (d) the number of data = 400. Each panel plots memory consumption (bytes) against the percentage of received data (10% to 90%) for the S-LZW dictionary, the S-LZW compressed data, and the mSensor compressed data.

5. Conclusion

This paper proposes an efficient top-k query processing framework for mobile sensor networks. The mobility of sensor nodes makes top-k query processing more complex than it is in static sensor networks. In mobile sensor networks, it is necessary to consider both the resource constraints of sensor nodes and the frequent changes in network dynamics. To deal with the network dynamics, a mobility-aware routing method is proposed in which the routing criteria are defined. The Analytic Hierarchy Process (AHP) is used to determine the weights of the routing factors in the routing criteria. To reduce the energy consumption during the data collection process, a filter-based approach exploiting the semantics of the top-k query is devised. Some sensor nodes can become disconnected from the network due to the movement of the sensor nodes. In such a case, a disconnected sensor node must store data in its limited local memory until the network connection is restored. To deal with this problem, an effective data compression method using the Haar wavelet transform and thresholding is proposed. Experimental results confirm the effectiveness of the proposed approach.

References

[1] S. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong, Tinydb: an acquisitional query processing system for sensor networks, ACM Trans. Database Syst. 30 (1) (2005) 122–173.

[2] Y. Yao, J. Gehrke, Query processing in sensor networks, in: CIDR, 2003.

[3] A. S. Silberstein, R. Braynard, C. Ellis, K. Munagala, J. Yang, A sampling-based approach to optimizing top-k queries in sensor networks, in: ICDE, 2006.

[4] S. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong, Tag: a tiny aggregation service for ad-hoc sensor networks, OSDI 36 (SI).

[5] M. Wu, J. Xu, X. Tang, W.-C. Lee, Top-k monitoring in wireless sensor networks, IEEE Trans. on Knowl. and Data Eng. 19 (7) (2007) 962–976.

[6] H. Lee, M. Wicke, B. Kusy, O. Gnawali, L. J. Guibas, Data stashing: energy-efficient information delivery to mobile sinks through trajectory prediction, in: IPSN, 2010, pp. 291–302.

[7] R. Fagin, Combining fuzzy information from multiple systems, J. Comput. Syst. Sci. 58 (1) (1999) 83–99.

[8] R. Fagin, A. Lotem, M. Naor, Optimal aggregation algorithms for middleware, in: PODS, 2001.

[9] U. Güntzer, W.-T. Balke, W. Kießling, Towards efficient multi-feature queries in heterogeneous environments, in: ITCC, 2001.

[10] S. Nepal, M. Ramakrishna, Query processing issues in image (multimedia) databases, in: ICDE, 1999.

[11] R. Akbarinia, E. Pacitti, P. Valduriez, Best position algorithms for top-k queries, in: VLDB, 2007.

[12] P. Cao, Z. Wang, Efficient top-k query calculation in distributed networks, in: PODC, 2004.

[13] C. E. Perkins, P. Bhagwat, Highly dynamic destination-sequenced distance-vector routing (dsdv) for mobile computers, in: SIGCOMM, 1994, pp. 234–244.

[14] C. E. Perkins, E. M. Belding-Royer, Ad-hoc on-demand distance vector routing, in: WMCSA, 1999, pp. 90–100.

[15] D. B. Johnson, D. A. Maltz, Dynamic source routing in ad hoc wireless networks, in: Mobile Computing, Kluwer Academic Publishers, 1996, pp. 153–181.

[16] Z. Haas, A new routing protocol for the reconfigurable wireless networks, in: Proc. of IEEE 6th International Conference on Universal Personal Communications (ICUPC'97), 1997, pp. 562–566.

[17] D.-S. Kim, Y.-J. Chung, Self-organization routing protocol supporting mobile nodes for wireless sensor network, in: IMSCCS (2), 2006, pp. 622–626.

[18] W. R. Heinzelman, A. Chandrakasan, H. Balakrishnan, Energy-efficient communication protocol for wireless microsensor networks, in: HICSS, 2000.

[19] S. Moeller, A. Sridharan, B. Krishnamachari, O. Gnawali, Routing without routes: the backpressure collection protocol, in: IPSN, 2010, pp. 279–290.

[20] C. Sadler, M. Martonosi, Data compression algorithms for energy-constrained devices in delay tolerant networks, in: SenSys, 2006.

[21] T. A. Welch, A technique for high-performance data compression, IEEE Computer 17 (6) (1984) 8–19.

[22] X. T. Dang, N. Bulusu, W. chi Feng, Rida: A robust information-driven data compression architecture for irregular wireless sensor networks, in: EWSN, 2007, pp. 133–149.

[23] S. Gandhi, S. Nath, S. Suri, J. Liu, Gamps: compressing multi sensor data by grouping and amplitude scaling, in: SIGMOD, 2009.

[24] T. Saaty, Fundamentals of the Analytic Hierarchy Process, RWS Publications, 2000.

[25] A. Aronsky, A. Segall, A multipath routing algorithm for mobile wireless sensor networks, in: 3rd Joint IFIP Wireless and Mobile Networking Conference, WMNC, 2010.

[26] L. Yu, D. Chen, X. Huang, Multipath routing for mobile wireless sensor networks through area partition, Journal of Information and Computational Science 10 (9) (2013) 2735–2745.

[27] E. J. Stollnitz, T. D. DeRose, D. H. Salesin, Wavelets for computer graphics: A primer, part 2, IEEE Comput. Graph. Appl. 15 (4).

[28] E. J. Stollnitz, T. D. DeRose, D. H. Salesin, Wavelets for computer graphics: theory and applications, Morgan Kaufmann Publishers Inc., 1996.

[29] M. Garofalakis, A. Kumar, Deterministic wavelet thresholding for maximum-error metrics, in: PODS, 2004.

[30] W. Xue, Q. Luo, L. Chen, Y. Liu, Contour map matching for event detection in sensor networks, in: SIGMOD, 2006.

[31] http://db.csail.mit.edu/labdata/labdata.html.

[32] Y. Sasaki, T. Hara, S. Nishio, Two-phase top-k query processing in mobile ad hoc networks, in: NBiS, 2011.

Biography

Heejung Yang received the B.S. degree in Computer Engineering from Inha University in 2005, and the Ph.D. degree in the School of Computing from KAIST in 2016. She is currently working as a researcher at TmaxData. Her research interests include sensor networks, stream data management, and spatio-temporal databases.

Chin-Wan Chung is a professor in the Chongqing Liangjiang KAIST International Program at the Chongqing University of Technology (CQUT), China and a professor in the School of Computing at the Korea Advanced Institute of Science and Technology (KAIST), Korea. He received a Ph.D. degree in Computer Engineering from the University of Michigan, Ann Arbor, USA. He was a Senior Research Scientist and a Staff Research Scientist in the Computer Science Department at the General Motors Research Laboratories. He has published over 130 papers in international journals and conferences, and registered 25 international and domestic patents. He received the best paper award at ACM SIGMOD in 2013. He has been on program committees of major international conferences such as ACM SIGMOD, VLDB, IEEE ICDE, WWW, and ICWS. He was an Associate Editor of ACM TOIT, and is currently an Associate Editor of WWW Journal. In 2014, he was the General Chair of the International WWW Conference. His current research interests include Web, sensor network databases, graph databases, and multimedia databases.

Myoung Ho Kim is a professor in the School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. He received the B.S. and M.S. degrees in computer engineering from Seoul National University, Seoul, Korea in 1982 and 1984, respectively, and the Ph.D. degree in computer science from Michigan State University, East Lansing in 1989. He has worked in the areas of database and information processing, and his database laboratory was designated as a national research laboratory by the National Research Foundation of Korea in 2007. His current research interests include database systems and applications, distributed and parallel databases, multimedia databases, bioinformatics, information retrieval, and data mining.