Journal of Network and Computer Applications 35 (2012) 37–59
Practical data compression in wireless sensor networks: A survey

Tossaporn Srisooksai a,*, Kamol Keamarungsi b, Poonlap Lamsrichan c, Kiyomichi Araki d

a TAIST-Tokyo Tech, ICT for Embedded System Program, Department of Electrical Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
b Embedded System Technology Laboratory, National Electronics and Computer Technology Center, Thailand
c Department of Electrical Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
d Department of Electrical and Electronic Engineering, Tokyo Institute of Technology, Japan

* Corresponding author. E-mail address: [email protected] (T. Srisooksai).
Article history: Received 1 September 2010; Received in revised form 20 February 2011; Accepted 1 March 2011; Available online 16 March 2011

Abstract
Power consumption is a critical problem affecting the lifetime of wireless sensor networks. A number of techniques have been proposed to solve this issue, such as energy-efficient medium access control or routing protocols. Among those proposed techniques, the data compression scheme is one that can be used to reduce transmitted data over wireless channels. This technique leads to a reduction in the required inter-node communication, which is the main power consumer in wireless sensor networks. In this article, a comprehensive review of existing data compression approaches in wireless sensor networks is provided. First, suitable sets of criteria are defined to classify existing techniques as well as to determine what practical data compression in wireless sensor networks should be. Next, the details of each classified compression category are described. Finally, their performance, open issues, limitations and suitable applications are analyzed and compared based on the criteria of practical data compression in wireless sensor networks. © 2011 Elsevier Ltd. All rights reserved.

Keywords: Wireless sensor networks; Power consumption; Data compression; Energy efficiency
1. Introduction

Currently, we are in the third generation of computer evolution, called the ubiquitous computing era (Schmidt, 2010). In this era, computers have begun to disappear into surrounding objects, and people may not even know that they are interacting with a computer in their daily-life activities. One of the key technologies supporting ubiquitous computing is wireless sensor networks (WSNs). WSNs are a combination of wireless networks and various kinds of sensors equipped with small micro-controller boards, called sensor nodes. These sensor nodes are generally self-organized and form the wireless sensor network. A WSN system can be used to monitor the activities of a target and report events or information using radio communication to a base station or sink node, which takes charge of processing and sending that information via the Internet. Several real-world applications are being studied and developed using WSN systems to allow the realization of ubiquitous computing. Prominent examples are home automation systems (Song et al., 2008), health care monitoring systems (Gao et al., 2007) and environment monitoring for agriculture or precision farming systems (Riquelme et al., 2009). Home automation systems enable home appliances to interact intelligently with their residents. For example, lights are turned on automatically when the owner opens a specific door. Health care applications allow
doctors to diagnose patients not only in a hospital, but also from any location. For precision farming, a system based on WSNs allows farmers to monitor, control and manage their farms efficiently and conveniently. When designing a WSN system, there are a number of challenges, which can be broadly classified into three major issues (Baronti et al., 2007). First, an information management architecture has to be designed to address the information conflicts and interactions that occur when gathering information from many sensors. Additionally, information loss has to be minimized. Second, sensor nodes are often randomly spread out over specific regions and can work without human intervention. The downside of this is that they are prone to all kinds of malicious attacks, which raises the importance of network security. In order to keep communication secure, sensitive data should be encrypted and connections should be authenticated. Therefore, key management, which is a prerequisite for encryption and authentication, should be addressed carefully. Finally, since the sensor nodes in WSNs are powered by very low voltage batteries and are deployed in the order of hundreds to thousands of nodes, replacing and recharging the batteries of so many nodes may realistically be considered infeasible. This last issue raises the importance of power consumption, or power management, in WSNs. This paper focuses on tackling this power consumption issue, which affects the lifetime of WSNs. Generally, each sensor node consists of three sub-units: a sensing unit is used to acquire the target events or data of interest; a processing unit equipped with limited memory is used to manage the acquired data; and a communication unit, usually
a radio transceiver, is used to exchange information between nodes. The research work in Barr and Asanović (2006) pointed out that the radio transceiver on board a sensor node is the main source of power consumption. Thus far, several studies have aimed at reducing radio communication to achieve sufficient power saving. To reach this objective, two main approaches have been introduced: duty cycling and in-network processing. The duty cycle scheme coordinates and defines wake and sleep schedules among nodes in the network. Details about the application of these techniques in WSNs can be found in Anastasi et al. (2009). The in-network processing scheme addresses the issue by reducing the amount of data to be transmitted by means of aggregation techniques and/or data compression. The aggregation techniques involve different ways of routing data packets in order to combine them by exploiting the extracted features and statistics of data sets coming from different sensor nodes, e.g., maximum, minimum and average values. The aggregated data are then forwarded to the sink node(s). To achieve its objective, the aggregation approach requires three basic components: a routing algorithm, data aggregation and data representation/data compression (Fasolo et al., 2007). Note that there is some overlap between aggregation techniques and data compression techniques; however, this paper focuses only on data compression techniques. Although existing data compression techniques for wireless sensor networks have been surveyed in the literature, such as in Kimura and Latifi (2005), that survey is no longer up-to-date and contained algorithms that are not practical in WSNs. With recent growing interest in this field, several new techniques have been proposed. Therefore, this article surveys the new data compression algorithms and analyzes their relationships as described in Section 2. Based on our analysis, data compression algorithms can be classified into two categories: the distributed data compression approach and the local data compression approach. This survey also aims to consider practical algorithms based on real-world requirements. Since different applications often have different data characteristics, a suitable data compression algorithm which can minimize the power consumption in a particular application should be considered. The rest of this article is organized as follows. First, Section 2 describes the criteria for defining the relationships of in-network processing techniques and classifying the data compression algorithms according to our survey. Next, the critical requirements for designing data compression based on real-world applications of WSNs are described in Section 3. Section 4 then defines criteria, based on the requirements in Section 3, for reviewing, analyzing and comparing the performance, limitations and open issues of each data compression algorithm. After that, details of the distributed and local approaches to data compression in WSNs are described in Sections 5 and 6, respectively. In Section 7, the criteria of Section 4 are used to analyze and compare the performance of each existing data compression algorithm, including a report of its limitations, open issues and suitable applications. Finally, Section 8 concludes the article.
2. Data compression algorithm classification

Figure 1 illustrates the relationships between aggregation and data compression techniques. Aggregation techniques are usually adopted in dense sensor networks with a multi-hop topology, which require routing algorithms. Therefore, the data aggregation function, denoted by Set A in Fig. 1, is drawn on top of the routing algorithm. In the existing literature, this aggregation function was usually performed by extracting the maximum, minimum and
Fig. 1. The relationships between aggregation and data compression techniques.
average values of the aggregated data (Fasolo et al., 2007; Croce et al., 2008). In this way, the amount of data communicated in dense sensor networks, which affects the power consumption, can be reduced. However, this technique can lose much of the original structure of the extracted data because it provides only coarse statistics without local variations, such as the data distribution over an area (Guestrin et al., 2004). A number of papers address this issue by using rules or factors to tune the degree of aggregation. This means that the aggregation operates based on the network's information, such as the differences between consecutive data (Sharaf et al., 2004), the location of nodes (Cayirci, 2003) and the network capacity (Abdelzaher et al., 2004). However, these improved schemes may still lose the original data structure when the degree of aggregation is high. Data compression schemes, shown as Set B in Fig. 1, have been applied to solve this loss of the original data structure. Since this second set of data compression algorithms (Guestrin et al., 2004; Ciancio and Ortega, 2005) was extended from aggregation techniques operating on multi-hop network topologies, they usually distribute a compression algorithm throughout the network. Meanwhile, another set of existing algorithms, denoted by Set C in Fig. 1, does not require dense networks and a routing algorithm because the compression is performed at each local node independently of other nodes (Schoellhammer et al., 2004; Marcelloni and Vecchio, 2010, 2009; Liang and Peng, 2010). This last set of compression schemes works well on sparse-topology sensor networks. By analyzing the relationships among the various data compression algorithms described above, we can classify them into two categories: the distributed data compression approach and the local data compression approach. Figure 2 shows a taxonomy of our data compression algorithm classification in WSNs. How the subcategories of each approach are classified is explained in the respective sections.
3. Data compression algorithm in wireless sensor networks for real-world requirements

This section describes two sets of requirements in real-world WSN applications. The first set contains the common constraints of wireless sensor networks that should be considered in designing a data compression algorithm. The second set identifies the unique requirements related to data compression designed for each particular existing real-world application.

3.1. Common requirements in real-world WSN applications

3.1.1. Computational cost and power saving

Since the communication unit on a wireless sensor node is a major power consumer, a data compression scheme can be used to reduce the amount of information being exchanged in a network, resulting in power saving. The higher the ratio of
Fig. 2. Taxonomy of data compression classification in WSNs.
data compression, the higher the percentage of power saving. However, when applying a data compression algorithm, the sensor node's processing unit requires more power to run the algorithm. In Barr and Asanović (2006), the authors tested well-known data compression techniques on the Compaq Personal Server, which is a research version of the Compaq iPAQ personal digital assistant (PDA). The PDA was used for data collection in a similar manner to a wireless sensor node. Although the results of this experiment might not be readily applicable to WSNs, they still provide significant insight into the power consumption of data processing and transmission. The results showed that even a well-known algorithm with an impressive compression ratio, such as prediction with partial match using Markov modeling followed by arithmetic coding (PPMd) (Shkarin, 2005), can still require more power than a non-compression system. This is due to the complexity of the data compression algorithm, which requires substantial time and memory. Therefore, an efficient data compression algorithm needs to be designed by considering the trade-off between the computational cost and the power saved through the compression ratio. In other words, the power consumed by performing additional instructions to compress data has to be less than the power saved from transmitting the compressed data.
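As a rough illustration of this trade-off, the sketch below compares the energy spent executing compression instructions against the energy saved by transmitting fewer bits. The per-bit and per-instruction energy constants are hypothetical placeholders for platform-specific figures, not values taken from Barr and Asanović (2006).

```python
# Back-of-the-envelope check of the compression trade-off described above.
# All constants are hypothetical; substitute figures for your own platform.

E_TX_PER_BIT = 0.6e-6     # J per transmitted bit (assumed radio cost)
E_CPU_PER_INSTR = 1.2e-9  # J per executed instruction (assumed MCU cost)

def net_saving(raw_bits, compression_ratio, instr_per_bit):
    """Energy saved by sending compressed data minus energy spent compressing.

    compression_ratio: compressed size / raw size (0 < ratio <= 1)
    instr_per_bit:     instructions executed per input bit by the compressor
    """
    tx_saved = raw_bits * (1.0 - compression_ratio) * E_TX_PER_BIT
    cpu_cost = raw_bits * instr_per_bit * E_CPU_PER_INSTR
    return tx_saved - cpu_cost

# A light-weight coder (few instructions per bit) pays off; a heavy one may not.
print(net_saving(8_000, 0.5, 50))      # positive: compression worth it
print(net_saving(8_000, 0.9, 2_000))   # negative: cheaper to send raw data
```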
3.1.2. Commercial sensor node constraints

In real-world applications using commercial wireless sensor nodes, these nodes are often characterized by their small size, ability to sense environmental phenomena through a set of sensors, low data rate radio transceivers, and small batteries. With these characteristics, the common hardware constraints can be divided into two groups.

1. Processing constraints: The well-known wireless sensor node platforms, such as Mica, Telos and Tmote Sky (Company, 2010; Sentilla Company, 2010), are equipped with Atmel ATmega128L and Texas Instruments MSP430 micro-controllers, which have instruction memories of only 128 and 48 KB, respectively. Their processing clocks vary from 4 to 8 MHz. With these constraints, it is necessary to design a low-complexity and small code-size compression algorithm for wireless sensor network applications.

2. Sensor accuracy/error: Due to inherent noise in hardware, commercial sensors used in WSNs usually produce different readings even though they are sampling the same constant phenomenon. For this reason, sensor manufacturers specify not only the sensor's operating range but also the sensor's accuracy, which is expressed by a margin of error. However, they do not report a probability distribution for this margin of error. Thus, users can only be confident that the measured values are within the error margin; they do not know the magnitude of the error or the associated error probability (Schoellhammer et al., 2004). Due to this accuracy constraint, lossless data compression algorithms, which ensure the correctness of information during the compression and
decompression processes, can be inefficient for the sensors used in commercial nodes. On the other hand, lossy data compression algorithms, which may generate a loss of information, are more suitable under the condition that only noise is allowed to be lost (Marcelloni and Vecchio, 2010). This can be realized by using de-noising techniques. Hence, for applications that utilize a sensor with a high margin of error, it is reasonable to avoid lossless data compression and to apply lossy compression with de-noising techniques instead.

3.2. Specific requirements in real-world WSN applications

In Srivastava (2010), the purposes of wireless sensor networks implemented in real-world applications are classified into two categories: tracking and monitoring. Several research efforts have applied tracking in their applications, for example, enemy tracking in military fields, animal tracking in habitat applications, human tracking in home automation applications, and traffic or car tracking in public transportation. For monitoring purposes, there are many well-known applications, such as environmental monitoring applications, patient monitoring in health care applications, structural monitoring applications in civil engineering, and chemical or factory monitoring in industrial fields. In this section, we focus on the requirements concerning the design of a data compression algorithm based on the purposes of these real-world applications.
3.2.1. Statistical characteristics of real-world data

When designing data compression algorithms in WSNs, knowing the statistical characteristics of the measured data sets is absolutely essential. The data in tracking applications are usually characterized by small variance except when the sensing units capture the signal of the target. This means that there is a small difference between two consecutive signals in a normal situation and a big difference between two consecutive signals when special events occur. Since typical compression algorithms often work well on smooth data or data with low variation (Marcelloni and Vecchio, 2009), it is beneficial to apply data compression algorithms in tracking applications. However, tracking applications usually need real-time operation. Thus, the algorithm's speed is the most important factor to be considered when implementing it in this kind of application. Additionally, there are some tracking applications which are deployed in dense sensor networks. In these environments, sensors collaboratively process signal data to detect and classify targets. In one scenario, a network consists of a number of sensor arrays, with each array having a cluster of homogeneous sensors performing the target detection. Sensors in the array are geometrically close to each other, within a range of 100 m, while inter-array distances can be much larger. (Figure 3 shows a sensor cluster with seven sensors.) One sensor is elected as the head, which aggregates data from its members, e.g., node 1 in Fig. 3, while the head sensor node also observes the same event. In this scenario, there are clearly both spatial and temporal correlations
in the sensed data. When both types of data correlation exist, it may be useful to apply data compression distributed across the nodes to reduce redundancy in the data (Tang et al., 2006). For monitoring applications, there are various kinds of data types. The data in wireless sensor networks can be classified by two criteria: data variance and data source. Data variance: Using the data's variance, the data can be categorized into two types: smooth data and non-smooth data. Figure 4 shows the distribution plots of smooth data and Fig. 5 shows the distribution plot of non-smooth data. These data were obtained from the PDG 2008 Deployment data sets, which were sampled every 2 min between April 4, 2008 and April 20, 2008 by the wireless sensor network systems of the Sensor scope project (2010). The main point to be noted is that the standard deviation
Fig. 3. A sensor cluster with seven sensors.
of the smooth data sets is much lower than that of the non-smooth data sets. In this case, the smooth data are the temperature and relative humidity data sets, and both fit well with a Gaussian distribution. Using the same source of real-world data sets from the Sensor scope project (2010), the authors in Marcelloni and Vecchio (2009) pointed out that both smooth and non-smooth data have low means and low standard deviations, so their entropy is low. Thus, if an entropy compression algorithm is applied to a low-entropy data set, the compression ratio will be high. Although the raw-data distributions of the smooth and non-smooth types differ, the distributions of the differences between two consecutive samples (called residues) of both types are similar. This can be seen in Fig. 6. In Liang and Peng (2010), using the real-world data obtained by the system of the Davis project (2010), the authors showed that the residues of all data types fit a Laplacian distribution better than a Gaussian one; the Laplacian distribution's curves are narrower than those of the Gaussian distribution. The authors in Marcelloni and Vecchio (2009) pointed out that the compression ratio achieved by an entropy compression scheme on the residues of a data set, which have an even lower entropy, is also high. Data source: Based on the nature of the data's source, the data can be divided into two types: single data type per sensor node and multiple data types per sensor node. Tracking applications usually use more than two sensors combined into a single module (Hao et al., 2006). However, those sensors are usually of the same
Fig. 4. Distribution plots of the smooth data: (a) temperature data and (b) relative humidity data.
Fig. 5. Distribution plot of the non-smooth data; in this case, solar radiation data.
Fig. 6. Residue distribution plots of (a) temperature data, (b) humidity data and (c) solar radiation data.
Fig. 7. Distribution plot of multiple data types in a single node. There are two types in this case: temperature and humidity data.
sensor type, generating the same data type. Thus, it is easier to design a compression algorithm for such applications, knowing that the statistical characteristics of the data are the same. However, for monitoring applications, a single node can be equipped with several sensor types which generate multiple data types. For instance, the SHTxx series (Shtxx sensor, 2010), a well-known commercial sensor used in WSN applications, combines two sensor types, temperature and relative humidity, in the same package. If users deploy this type of sensor in a single node, the statistical characteristics of the data will have multiple modes, as shown in Fig. 7. This constraint could reduce the compression performance. Thus, when designing a data compression algorithm, multiple data types per sensor node are an important factor requiring an appropriate data manipulation solution. A few existing papers focus on this issue, such as Sornsiriaphilux et al. (2010).
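The residue statistics discussed in this subsection can be checked with a short experiment. The sketch below uses a synthetic, slowly drifting trace as a stand-in for the SensorScope-style data and compares the empirical entropy of raw samples against that of consecutive-sample residues.

```python
# Empirical entropy of raw samples vs. residues (consecutive differences).
# Synthetic data standing in for the temperature traces discussed above.
import math
import random
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy (bits/symbol) of an integer sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(1)
# Slowly drifting signal quantized to 0.1-degree steps (integer codes).
temp, x = [], 20.0
for _ in range(5000):
    x += random.gauss(0, 0.05)          # small step-to-step variation
    temp.append(round(x * 10))

residues = [b - a for a, b in zip(temp, temp[1:])]
print("raw     :", entropy_bits(temp))      # spread over many values: high
print("residues:", entropy_bits(residues))  # clustered near zero: low
```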
4. Comparison criteria for data compression algorithms

Based on all the real-world constraints presented in Section 3, the following criteria will be used to review, analyze and compare the performance, limitations and open issues of each data compression algorithm.

1. Compression performance: The scope of this article does not aim to set up an actual experiment or simulation to measure the compression performance of each algorithm. However, the
compression ratios which were reported and used to express the performance in earlier literature are summarized to demonstrate the comparison.

2. Power saved from reduced transmission: This criterion presents how and how much each algorithm's compression ratio affects the reduction of transmission. Additionally, the power saved due to compression performance is reported and compared.

3. Power used when performing data compression algorithms: This criterion reviews and compares the complexity of each compression algorithm within its own category. This can be measured by either analyzing the algorithm complexity or counting the number of basic operations used in each algorithm, e.g., additions, comparisons and shifts (see the sketch after this list). The power used to perform these instructions is then reported.

4. Net power saving: This criterion can be measured by calculating the difference between the power saved from reduced transmission and the power used for performing the algorithm. The result is an important factor used to determine whether a compression algorithm is suitable for use in WSN applications.

5. Algorithm code size: With the sensor node's memory space constraints, this criterion aims to analyze and compare the code size of each algorithm in its own category.

6. Suitability of data compression class (lossless or lossy): With the constraint of sensor accuracy described earlier, this criterion is used to analyze and compare whether each algorithm is properly designed for its corresponding data.

7. Suitability for multiple or single data types: This criterion reviews each algorithm to determine whether it is suitable for multiple data types per node or a single data type per node.
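As a minimal illustration of criterion 3, the sketch below counts the basic operations of a toy delta encoder and converts the count into an energy figure; the per-operation energy cost is an assumed placeholder, not a measured value.

```python
# Counting basic operations of a toy delta encoder, as one way to
# approximate criterion 3. The per-operation energy figure is hypothetical.

def delta_encode(samples):
    """Replace each sample with its difference from the previous one,
    counting subtractions as the dominant basic operations."""
    ops = 0
    out = [samples[0]]
    for a, b in zip(samples, samples[1:]):
        out.append(b - a)   # one subtraction per sample
        ops += 1
    return out, ops

encoded, ops = delta_encode([21, 21, 22, 24, 23, 23])
E_PER_OP = 1.2e-9   # J per basic operation (assumed, platform-specific)
print(encoded, ops * E_PER_OP)
```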
5. Distributed data compression approaches

This section presents an overview of algorithms in the distributed data compression category. A performance review and comparison of these algorithms based on the criteria described in Section 4 will be discussed in Section 7. Distributed data compression approaches in WSNs are usually applied in dense sensor networks and can be broadly classified into four main techniques: distributed source modeling (DSM), distributed transform coding (DTC), distributed source coding (DSC) and compressed sensing (CS). This classification is similar to the one given in Marcelloni and Vecchio (2010) and Wang et al. (2009b). Each technique is briefly discussed in turn, with greater focus placed on the details of the algorithms that explicitly address the power constraints and other requirements in WSNs mentioned in Section 3.
5.1. Distributed source modeling (DSM)

The DSM technique aims to search for a function/model that best fits a set of input measurements acquired by a specific group of sensor nodes, using parametric or non-parametric modeling. Using parametric modeling, an algorithm treats sensor data as a random process that has to be optimally estimated given knowledge of its statistical parameters, such as mean and variance. Parametric modeling can yield superior performance when the statistical structure of the random process (sensor data) being observed is known. On the other hand, non-parametric modeling utilizes kernel-based regression to represent the sensor data, where the regression coefficients are learned by treating the sensor data as input–output example pairs of some deterministic function observed in noise. In this case, very little prior information about the nature of the data is required, and the approach is considered very robust (Oka and Lampe, 2008).

5.1.1. Parametric modeling

This technique is often applied in distributed estimation systems. A simple example of a parametric modeling system is depicted in Fig. 8. The data $(d_1, d_2, \ldots, d_n)$ in the figure are measured by a specific group of sensor nodes and sent to an aggregation/sink node called a fusion center to estimate a variable $\theta$ which represents all input data. Under communication constraints, such as noise, the input data $(d_1, d_2, \ldots, d_n)$ cannot be sent as real-valued quantities without distortion (Gubner, 1993). Generally, quantization techniques are used to process the input data locally, resulting in quantized data $(q_1, q_2, \ldots, q_n)$ before sending them to the fusion center. A typical model used to represent this data transmission over the network is $d_n = \theta + s_n$, where $d_n$ is a random variable representing the sensor data, $\theta$ is the unknown expectation of $d_n$ and $s_n$ is noise. Eventually, the fusion center estimates the $\theta$ which yields a minimum error with respect to all of the quantized data. In this scheme, data compression based on quantization and parametric modeling techniques is used on both the sensor nodes and the aggregation/sink node(s). Therefore, the transmitted data are reduced, resulting in power saving. Existing papers that focused on designing and optimizing the quantization parameters are Xiao et al. (2006) and Li and AlRegib (2009). Papers that focused on estimation are Li and Alregib (2007) and Aysal and Barner (2008). Since this paper intends to review the data compression algorithms that explicitly addressed power constraints and the other relevant constraints presented in Section 3, we summarize the details of the related work as follows.
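The quantize-and-fuse pipeline of Fig. 8 can be sketched as follows: each node quantizes its noisy reading $d_n = \theta + s_n$ to $L_n$ bits over $[-W, W]$, and the fusion center forms a variance-weighted (quasi-BLUE-style) estimate. The quantizer and weights here are simplified stand-ins, not the exact probabilistic constructions of Xiao et al. (2006).

```python
# Toy version of the Fig. 8 pipeline: noisy readings -> local uniform
# quantization -> variance-weighted fusion. Simplified, not the exact
# probabilistic quantizer of Xiao et al. (2006).
import random

W = 10.0          # sensor signal range [-W, W]
theta = 3.7       # unknown parameter to estimate

def quantize(d, L):
    """Uniform L-bit quantization of d over [-W, W]."""
    levels = 2 ** L
    step = 2 * W / (levels - 1)
    idx = round((max(-W, min(W, d)) + W) / step)
    return -W + idx * step

random.seed(0)
sigmas = [0.5, 1.0, 2.0, 4.0]          # per-node observation noise
bits   = [6, 5, 4, 3]                  # per-node quantization lengths
q = [quantize(theta + random.gauss(0, s), L) for s, L in zip(sigmas, bits)]

# Fusion: weight each quantized reading by 1/(delta_n^2 + sigma_n^2),
# where delta_n = W / (2^L_n - 1) bounds the quantization error.
w = [1.0 / ((W / (2**L - 1)) ** 2 + s**2) for s, L in zip(sigmas, bits)]
estimate = sum(wi * qi for wi, qi in zip(w, q)) / sum(w)
print(estimate)   # close to theta = 3.7
```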
Fig. 9. Block diagram of OQET system.
(1.1) Optimizing quantization-based estimation target (OQET) scheme (Xiao et al., 2006): This algorithm aims to minimize the transmitted power by optimizing the level of adaptive quantization while meeting a target estimation error. The mean squared error (MSE) estimation is used in this case. The authors in Xiao et al. (2006) focused on adaptively searching for quantization parameters to minimize the communication power without affecting the performance of the estimation model. Figure 9 shows a block diagram of the OQET system (Xiao et al., 2006). There are N sensor nodes, each of which makes a measurement on a deterministic signal source ($\theta$). The assumption on measured data in this system is more realistic than in the system of Fig. 8. In this scheme, the measured signal is corrupted by additive noise (s). Thus, the measured data (d) can be described by the model $d_n = \theta + s_n$. These input data are quantized locally at each sensor node. The quantizer is designed based on a probabilistic uniform quantization scheme (Xiao et al., 2006). The quantization process is described by a mapping function $Q_n: d_n \mapsto q_n(d_n, L_n)$, where $L_n$ is the length in bits of the quantized data. Since the transmission power will be higher if the quantized data are longer, $L_n$ should be designed appropriately. The outputs of quantization $q_n$ are also corrupted by quantization noise ($u_n$). Each output of quantization can be described by $q_n = \theta + s_n + u_n$, in which $E(q_n) = \theta$ and $\mathrm{Var}(q_n) \le W^2/(2^{L_n}-1)^2 + \sigma_n^2 = \delta_n^2 + \sigma_n^2$, where $[-W, W]$ is the range of the sensor's signal, $\delta_n^2 = W^2/(2^{L_n}-1)^2$, and $\sigma_n^2$ is the variance of the observed value. After the quantization, the quantized data $q_n(d_n, L_n)$ are sent to the fusion center under the assumption that the wireless communication channel is corrupted by additive white Gaussian noise (AWGN). The received data at the fusion center are then used for estimation. The estimation technique used in this scheme is quasi-BLUE (Kay, 1993), in which the estimate $\hat{\theta}$ can be determined by the expression

$$\hat{\theta} = \hat{\theta}(q_1, q_2, \ldots, q_N) = \left(\sum_{n=1}^{N} \frac{1}{\delta_n^2 + \sigma_n^2}\right)^{-1} \sum_{n=1}^{N} \frac{q_n}{\delta_n^2 + \sigma_n^2}.$$

The MSE performance (D) of this estimator is bounded by

$$D = E\left(|\hat{\theta} - \theta|^2\right) \le \left(\sum_{n=1}^{N} \frac{1}{\delta_n^2 + \sigma_n^2}\right)^{-1}. \tag{1}$$

Since this OQET system assumed an un-coded quadrature amplitude modulation (QAM) scheme (Cui et al., 2005) for data transmission, the authors obtained the power consumption model (P) required for transmission with the relevant parameters, which are the sampling rate ($B_r$), channel path loss ($\alpha_n$), target bit error probability ($p_b^n$), power spectral density of the channel corrupted by AWGN ($c_n$) and length in bits of the quantized data ($L_n$), as follows:

$$P_n = B_r\, c_n\, \alpha_n \ln\!\left(\frac{2}{p_b^n}\right)\left(2^{L_n} - 1\right). \tag{2}$$
Fig. 8. Distributed estimation wireless sensor network system.
The objective of this scheme is to minimize the power consumption vector (P) while achieving an MSE performance ($D'$) that is not
greater than the targeted MSE performance ($D_0$). Using the $L_2$-norm, this optimization can be expressed by the following formula:

$$\min \|P\|_2 \quad \text{subject to } D' \le D_0, \qquad \text{where } \|P\|_2 = \sqrt{\sum_{n=1}^{N} P_n^2}. \tag{3}$$
This problem is then re-formulated into a convex problem by defining $r_n = 1/(\delta_n^2 + \sigma_n^2)$. Using convex optimization techniques (Boyd and Vandenberghe, 2004), the optimized value ($r_n^{opt}$) can be obtained. Thus, the optimal quantized length ($L_n^{opt}$) can be calculated as follows:

$$L_n^{opt} = \log_2\!\left(1 + \sqrt{\frac{W^2}{1/r_n^{opt} - \sigma_n^2}}\,\right) = \begin{cases} 0 & \text{for } n \ge N_1 + 1,\\[4pt] \log_2\!\left(1 + \dfrac{W}{\sigma_n}\left(\sqrt{\dfrac{\eta_0}{\alpha_n}} - 1\right)\right) & \text{for } n \le N_1, \end{cases} \tag{4}$$

where $N_1$ is the number of active sensors, $\alpha_n$ is the channel's path loss, and $\eta_0$ is a threshold. A detailed derivation can be found in Appendix B of Xiao et al. (2006). According to (2) and (4), the transmitted power for node n is given as

$$P_n = B_r\, c_n \ln\!\left(\frac{2}{p_b}\right) \frac{W \alpha_n}{\sigma_n} \left(\sqrt{\frac{\eta_0}{\alpha_n}} - 1\right)^{+}. \tag{5}$$

The $L_n$ in (4) and $P_n$ in (5) will be zero when $\eta_0/\alpha_n \le 1$, i.e., $\alpha_n \ge \eta_0$. Thus, if the channel path loss ($\alpha_n$) of a sensor node is higher than the threshold $\eta_0$, that node should discard its measured data in order to save power. Note that in (4) the length of quantized bits is proportional to the logarithm of the local signal-to-noise ratio (SNR) scaled by the channel path gain. Based on the key concept presented above, the implementation of the OQET scheme can be summarized as follows. First, the sink node broadcasts the threshold $\eta_0$. Second, using (4), each sensor node adaptively determines its quantization length $L_n$ based on its local information, namely the standard deviation of the observed value $\sigma_n$, the channel path loss $\alpha_n$ and the received threshold $\eta_0$. Finally, a node with $\alpha_n \ge \eta_0$ should be inactive and not transmit.

(1.2) Optimizing quantization-based power target (OQPT) scheme (Li and AlRegib, 2009): While OQET's objective is to optimize adaptive quantization under the constraint of the estimation performance target, OQPT's objective is to optimize adaptive quantization under the constraint of the total transmission power target. The OQPT system consists of similar components as shown in Fig. 9. However, this scheme attempts to minimize the upper bound of the MSE estimation under the target power constraint. The optimization formulation in (3) is changed to the following problem:

$$\min \sum_{n=1}^{K_1} D_n \quad \text{subject to} \quad \sum_{n=1}^{K_1} P_n \le P_0, \tag{6}$$
where $D_n$ is the MSE performance of the active sensor nodes described by (1), $P_n$ is the actual transmitted power of the active sensor nodes described by (2), and $P_0$ is the target total transmitted power of the system. To solve this problem by convex optimization techniques, the equivalent unit-energy MSE function is defined by Li and AlRegib (2009) as

$$g(\sigma_n^2, L_n, \alpha_n, c_n) = P(L_n, \alpha_n, c_n)\, f(\sigma_n^2, L_n), \tag{7}$$
where $P(L_n, \alpha_n, c_n) = P_n$, $L_n$ is the quantized length, $\alpha_n$ is the channel's path loss, $c_n$ is the power spectral density of the channel corrupted by AWGN, $\sigma_n^2$ is the variance of the observed value and $f(\sigma_n^2, L_n) = 1/r_n = \sigma_n^2 + W^2/(2^{L_n}-1)^2$. Therefore, the $D_n$ terms can be determined by

$$\sum_{n=1}^{K_1} \frac{1}{D_n} = \sum_{n=1}^{K_1} \frac{1}{f(\sigma_n^2, L_n)} = \sum_{n=1}^{K_1} \frac{P_n}{g(\sigma_n^2, L_n, \alpha_n, c_n)}. \tag{8}$$

Since the optimal solution of (6) cannot be written in closed form, this optimization problem is separated into two cases: homogeneous sensor networks and heterogeneous sensor networks. In homogeneous sensor networks, the noise variances of all sensors are identical, i.e., $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$. In this case, the optimal quantized message length ($L^{opt}$) and the optimal transmitted power ($P^{opt}$), which are the same for all nodes, can be obtained as

$$L^{opt} = \log_2\!\left(1 + \frac{W}{\sigma}\right), \qquad P^{opt} = \frac{c\,\alpha\, W}{\sigma^2}. \tag{9}$$
The optimal number of active sensor nodes ($K_1^{opt}$) can be calculated as

$$K_1^{opt} = \frac{P_0}{P^{opt}}. \tag{10}$$

As a result of the analysis of Li and AlRegib (2009), this scheme can be implemented in a fully distributed manner as follows. First, each sensor node adaptively determines the length of quantization and its transmitted power based on local information using (9); these parameters are the same for all nodes. Second, the optimal number of active sensor nodes ($K_1^{opt}$) is calculated using (10). Finally, in each task period, at most $K_1^{opt}$ sensor nodes are allowed to be active. This ensures that the actual total transmitted power does not exceed the target total transmitted power ($P_0$).
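As a quick numerical rendering of the homogeneous case, the sketch below evaluates the closed forms (9) and (10); all constants are illustrative assumptions.

```python
# Closed-form homogeneous OQPT parameters from Eqs. (9) and (10).
# All constant values are illustrative only.
import math

W, sigma = 10.0, 0.5        # signal range and common noise std deviation
c, alpha = 1.0e-7, 3.0      # channel PSD factor and path loss (assumed)
P0 = 2.0e-3                 # total transmit power budget

L_opt = math.log2(1 + W / sigma)        # optimal quantized length, Eq. (9)
P_opt = c * alpha * W / sigma**2        # per-node transmit power, Eq. (9)
K1_opt = int(P0 // P_opt)               # number of active nodes, Eq. (10)
print(L_opt, P_opt, K1_opt)
```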
In heterogeneous sensor networks, in contrast with the homogeneous case, the noise variances of the sensor nodes are not identical. In this case, the optimal quantized message length ($L_n^{opt}$), the optimal equivalent unit-energy MSE function $g(\sigma_n^2, L_n, \alpha_n, c_n)$, and the optimal transmitted power ($P_n^{opt}$) can be obtained as follows:

$$L_n^{opt} = \log_2\!\left(1 + \frac{W}{\sigma_n}\right), \qquad g_n^{opt} = 2\, c_n\, \alpha_n\, \sigma_n\, W, \qquad P_n^{opt} = \frac{c_n\, \alpha_n\, W}{\sigma_n^2}. \tag{11}$$
The optimal number of active sensor nodes ($K_1^{opt}$) is selected by the following procedure. First, all sensor nodes are sorted by their $g^{opt}$ from the smallest to the largest value. Second, an index $i$ is assigned to each sensor node, beginning with $i_1$ for the smallest $g^{opt}$ and continuing with $i_2, i_3, \ldots, i_n$ according to their order. $S_n$ is defined as the set of these indexes $\{i_1, i_2, i_3, \ldots, i_n\}$. Finally, the optimal number of active sensor nodes is obtained by

$$K_1^{opt} = \max_n \left\{ n \;\Big|\; \sum_{i \in S_n} P_i^{opt} \le P_0 \right\}. \tag{12}$$
Implementing this procedure in WSNs can be done as follows. First, the sink node determines the maximum equivalent
unit-energy MSE function $g_{th}^{opt}$ based on the information collected in the network using (13) and broadcasts this threshold value to all local nodes:

$$g_{th}^{opt} = \arg\max_{n \in S_{K^{opt}}} g_n^{opt}. \tag{13}$$
Second, each sensor node compares its equivalent unit-energy MSE value $g_n^{opt}$ with the received maximum equivalent unit-energy MSE value $g_{th}^{opt}$. If its value is higher than the received threshold, the sensor node is set to inactive mode. Otherwise, the node is active and locally determines its optimal quantized message length and optimal transmitted power using (11). In practice, however, $L_n^{opt}$ should be an integer, so the formulas in (11) are not directly suitable for sensor nodes. They are converted into the following integer format, which is more practical in WSNs:

$$L^{opt}(\sigma^2, \alpha, c) = \arg\min_{L \in \mathbb{Z}^+} g(\sigma^2, L, \alpha, c) = \arg\min_{L \in \mathbb{Z}^+} \left[ c\,\alpha\,(2^L - 1)\left( \sigma^2 + \frac{W^2}{(2^L - 1)^2} \right) \right],$$

$$g^{opt}(\sigma^2, \alpha, c) = g\!\left(\sigma^2, L^{opt}(\sigma^2, \alpha, c), \alpha, c\right), \qquad P^{opt}(\sigma^2, \alpha, c) = c\,\alpha\left(2^{L^{opt}(\sigma^2, \alpha, c)} - 1\right). \tag{14}$$
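Because (11) generally yields non-integer lengths, (14) restricts the search to positive integers. A direct way to realize this on a node is to enumerate a small range of integer lengths L and keep the minimizer of g, as sketched below with illustrative constants.

```python
# Integer-valued quantization length per Eq. (14): enumerate small L and
# keep the one minimizing g(sigma^2, L, alpha, c). Constants illustrative.

W = 10.0

def g(sigma2, L, alpha, c):
    """Equivalent unit-energy MSE, Eq. (7): P(L, alpha, c) * f(sigma^2, L)."""
    span = 2**L - 1
    P = c * alpha * span                 # transmit power term, cf. Eq. (2)
    f = sigma2 + W**2 / span**2          # per-node MSE term, 1/r_n
    return P * f

def integer_opt(sigma2, alpha, c, L_max=16):
    """Brute-force search over L in {1, ..., L_max}, per Eq. (14)."""
    return min(range(1, L_max + 1), key=lambda L: g(sigma2, L, alpha, c))

L = integer_opt(sigma2=0.25, alpha=3.0, c=1.0e-7)
print(L, g(0.25, L, 3.0, 1.0e-7))
```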
(1.3) Data representation-based hidden Markov model (DRHMM) scheme (Oka and Lampe, 2008): While both of the previous schemes model a set of data in sensor networks by focusing on the noise of the sensor and channel, the DRHMM scheme models a set of data by exploiting its temporal and spatial correlations. Another difference is that the above schemes use a linear model, whereas this scheme uses a hidden Markov model (HMM). In this case, an HMM manipulates observed values from sensor nodes in a WSN cluster in order to obtain a binary value. By applying this scheme in detection or monitoring applications, the binary value indicates whether a target event has occurred or not. Since the HMM filters all inputs to obtain just the event identification and its location, which are subsequently sent to the sink node, the volume of communication between sensor nodes and the sink node is decreased, resulting in power saving. However, in general, an HMM requires $O(2^{2N})$ computational complexity. This implies a high computational load on each sensor node and a large number of inter-communications among sensor nodes. The authors in Oka and Lampe (2008) proposed an approximated HMM algorithm which only requires $O(N)$ computational complexity, as summarized in the following. Let $X = \{X_s^t\}$ be the hidden random process and $Y = \{Y_s^t\}$ be the random process observed by the sensors, where $t$ and $s$ are the temporal and spatial indexes, respectively. These processes are observed in additive Gaussian clutter or noise. Thus, the observation process Y is related to the hidden process X according to the following HMM:

$$Y = X + V, \tag{15}$$
where V is a spatio-temporally white, independent and identically distributed (i.i.d.) Gaussian (AWG) clutter process that is independent of the hidden process X, with zero mean and variance $\sigma_v^2$. In this case, the signal-to-clutter ratio (SCR) is defined by $1/\sigma_v^2$. The transition probabilities of this HMM are specified as follows:

$$P(x^t \mid x^{t-1}) = \frac{Q(x^t, x^{t-1})}{q(x^{t-1})}. \tag{16}$$

To avoid the curse of dimensionality and allow the development of a scalable algorithm, $Q(x^t, x^{t-1})$ must be restricted to an exponential model with pair-wise interactions as expressed by (17), where $(\theta, W)$ have the partitioned forms defined in (18). The matrix W is sparse with a bounded radius of interaction and independent of N. This exponential model can be described with only $O(N)$ parameters:

$$Q(x^t, x^{t-1}; \theta, W) = Q(z; \theta, W) = \exp\!\left( z^T \theta + \tfrac{1}{2} z^T W z - C(\theta, W) \right), \tag{17}$$

$$W = \begin{pmatrix} W_{spat} & G \\ G^T & W_{spat} \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_{spat} \\ \theta_{spat} \end{pmatrix}, \tag{18}$$

where $z = [(x^t)^T, (x^{t-1})^T]^T$ and the log-partition function $C(\theta, W)$ is a normalization constant defined by $\log\left(\sum_z \exp\{ z^T \theta + \frac{1}{2} z^T W z \}\right)$ (Ackley et al., 1985). Using the Markovian independence properties of the chain, one can write the well-known forward filter equation or HMM filter of Baum et al. (1970):

$$p^t(x^t) = c\, P(x^t, y^t) = c\, P(y^t \mid x^t) \sum_{x^{t-1}} P(x^t \mid x^{t-1})\, p^{t-1}(x^{t-1}), \tag{19}$$

where c is a normalization constant and $p^t(x^t)$ is known as the propagated a posteriori probability mass function. Since the delay-free filtering problem can be defined as finding the optimal $X_s^t$ for every time t and each site s based on the observations up to time t, the a posteriori marginal of $X_s^t$ has to be calculated as follows:

$$p_s^t(x_s^t) = \sum_{x_{s'}^t,\; s' \ne s} p^t(x^t). \tag{20}$$
It is obvious that determining the a posteriori conditionally most likely value, such as $\arg\max_{x_s^t} p_s^t(x_s^t)$ when using the minimum error probability (MEP) criterion, incurs a high computational cost. Thus, this scheme proposes an approximation of the HMM filter for which the loss of performance is not significant. The a posteriori marginal of $X_s^t$ shown in (20) can be rewritten by substituting (16) into (19) as

$$p_s^t(x_s^t) = c \sum_{x_{s'}^t,\; s' \ne s,\; x^{t-1}} P(y^t \mid x^t)\, Q(x^t, x^{t-1})\, \frac{p^{t-1}(x^{t-1})}{q(x^{t-1})}, \tag{21}$$

where $q(x^{t-1})$ and its marginals are defined as

$$q_s(x_s^{t-1}) = \sum_{x_{s'}^{t-1},\; s' \ne s} q(x^{t-1}) \quad \forall s \in \{1, 2, \ldots, N\}, \qquad q(x^{t-1}) = \sum_{x^t} Q(x^t, x^{t-1}). \tag{22}$$

Then, using the definitions in (23) for the canonical parameters (log-likelihood ratios) $\alpha_s^t$, $\beta_s$ and $h_s^t$,

$$e^{x_s^{t-1} \alpha_s^{t-1}} \propto p_s^{t-1}(x^{t-1}), \qquad e^{x_s^{t-1} \beta_s} \propto q_s(x^{t-1}), \qquad e^{x_s^t h_s^t} \propto P(y_s^t \mid x_s^t), \tag{23}$$

(21) can be approximated by

$$p_s^t(x_s^t) \approx c \sum_{x_{s'}^t,\; s' \ne s,\; x^{t-1}} \exp\!\left( z^T\!\left(\theta + \begin{bmatrix} h^t \\ \alpha^{t-1} - \beta \end{bmatrix}\right) + \frac{1}{2} z^T W z - C\!\left(\theta + \begin{bmatrix} h^t \\ \alpha^{t-1} - \beta \end{bmatrix}, W\right) \right). \tag{24}$$
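To make the cost issue concrete, the sketch below runs the exact forward recursion (19)–(20) over the full joint state space of a toy binary network: the state space has $2^N$ configurations, which is exactly the exponential per-step cost that the approximation in (24) replaces with O(N). The transition and observation models here are randomly generated placeholders, not those of Oka and Lampe (2008).

```python
# Exact forward filter (19)-(20) on a toy binary network; the joint state
# space is 2^N, illustrating the cost the approximation in (24) avoids.
# Transition/observation models here are random placeholders.
import itertools
import math
import random

N = 3                                            # sensors (keep tiny!)
states = list(itertools.product([0, 1], repeat=N))
random.seed(0)

# Random joint transition matrix P(x_t | x_{t-1}), rows normalized.
T = {}
for prev in states:
    w = [random.random() for _ in states]
    z = sum(w)
    T[prev] = {s: wi / z for s, wi in zip(states, w)}

def likelihood(y, x, sigma=1.0):
    """P(y | x) under the Gaussian clutter model Y = X + V, Eq. (15)."""
    return math.prod(
        math.exp(-(yi - xi) ** 2 / (2 * sigma**2)) for yi, xi in zip(y, x))

pi = {s: 1.0 / len(states) for s in states}      # uniform prior
y = [0.9, 0.1, 1.2]                              # one observation vector
post = {s: likelihood(y, s) * sum(T[p][s] * pi[p] for p in states)
        for s in states}                          # Eq. (19), unnormalized
z = sum(post.values())
pi = {s: v / z for s, v in post.items()}

# Eq. (20): marginal of site 0 by summing out the other sites.
print(sum(v for s, v in pi.items() if s[0] == 1))
```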
The approximation in (24) can be implemented using the following algorithm:

(1) $\beta \leftarrow M_2(\theta, W)$ // Calculate and store the marginals of $q(\cdot)$ as in (22).
(2) $t \leftarrow 0$, $\alpha^t \leftarrow \beta$ // Initialize the propagated p.m.f. marginals.
(3) $t \leftarrow t + 1$.
(4) $\alpha^t \leftarrow M_1(\theta + [h^t;\, \alpha^{t-1} - \beta],\, W)$ // Update the propagated p.m.f. as in (24).
(5) Go to step 3 // Repeat the update for the next time epoch.

This algorithm is a non-linear discrete-time deterministic dynamical system in the state variables $\alpha^t$, starting from the initial state $\alpha^0 = \beta$ and driven by the random input sequence $h^t$. Although the computational cost of calculating the entire propagated p.m.f. $p_s^t(x_s^t)$ in (20) is reduced by using the approximated expression in (24), the subroutines $M_1$ and $M_2$ in the above algorithm still need to compute the marginals of a distribution $Q(z; \theta, W)$ defined in (17). Based on information geometry methods, it is shown that exact marginalization is an m-projection from $Q(z; \theta, W)$ to a submanifold $M_0$ (more detail on information geometry can be found in Amari and Nagaoka (2000)), which is an NP-hard problem. Thus, this scheme applies four popular linear-complexity distributed algorithms that can calculate a good approximation of the marginals. These algorithms are Gibbs sampling (GS) (Robert, 2004), mean field decoding (MFD) (Zhang and Fossorier, 2006), iterated conditional modes (ICM) (Dogandzic and Zhang, 2006) and broadcast belief propagation (BBP) (Ihler et al., 2005). Another related parametric-model approach uses a hidden Markov random field (HMRF) framework to model spatially distributed random phenomena in sensor network environments; details of this approach can be found in Dogandzic and Zhang (2006, 2005). Thus far, the details of three parametric modeling schemes that explicitly addressed the power constraints have been presented. The performance of the presented algorithms in terms of power consumption and practical implementation will be discussed in Section 7.

(1.4) Variational filtering in binary sensor network (VFBSN) scheme (Teng et al., 2010): While the DRHMM is relevant to some applications, such as the detection of plumes or oil slicks (Oka and Lampe, 2008), it does not exploit any of the benefits of network topology. The VFBSN is specifically designed for target tracking applications and exploits some characteristics of cluster-based WSNs. The main idea of the VFBSN scheme is therefore to adopt filtering techniques to compress the data that need to be exchanged within and between clusters of WSNs. This scheme considers the temporal states of the targets to be tracked as the complex dynamics of a random variable. Thus, the target tracking problem can be treated as an optimal filtering problem. This problem consists of recursively calculating the predictive distribution $p(x_t \mid z_{1:t-1})$ and updating the posterior distribution $p(x_t \mid z_{1:t})$ as follows:

Prediction:
$$p(x_t \mid z_{1:t-1}) = \int_{\mathbb{R}^{n_x}} p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}, \tag{25}$$

Update:
$$p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}, \tag{26}$$
where $x_t$ is the hidden target state, $x_t \in \mathbb{R}^{n_x}$ of dimension $n_x$, and $z_{1:t}$ is the sequence of observed data. According to the optimal filtering problem, the two models that influence the accuracy and the energy efficiency of the target tracking solution are the transition model $p(x_t \mid x_{t-1})$ in (25) and the observation model $p(z_t \mid x_t)$ in (26). In the VFBSN, a coarse but energy-efficient binary proximity observation model (BPOM) (Singh et al., 2007) combined with a cluster-based scheme is proposed. The sensor nodes in this approach are divided into clusters. Each cluster consists of a single cluster head and multiple cluster slaves. During every sampling period, only one cluster of sensors, located in the
proximity of the target, is activated to track the target based on a proposed non-myopic rule. Therefore, the power consumption of inactive clusters can be saved. When adopting the BPOM in VFBSN, the cluster's members communicate with each other using only 1 bit of binary value; therefore, the intra-cluster communication can be reduced. On the other hand, inter-cluster communication is also needed when a previously activated cluster exchanges information with a newly activated cluster. This is called a hand-off operation. To reduce inter-cluster costs, the VFBSN adopts a variational filtering (VF) algorithm to minimize the hand-off operation. As a result, the power consumption of communication between clusters can be reduced.

5.1.2. Non-parametric modeling

Detection, tracking and estimation have often been considered in the framework of parametric models, in which the statistics of the phenomena under observation are known to the system designer. Such assumptions are typically motivated by data or prior application-specific domain knowledge. However, when data are sparse or prior knowledge is vague, robust non-parametric methods are more desirable. The authors in Predd et al. (2006) presented an overview of the non-parametric framework. They focused on non-parametric distributed learning, which aims to search for the best-fit function of sensor data. They studied not only distributed learning in wireless sensor networks with a fusion center, but also distributed learning in ad-hoc wireless sensor network systems. For a better understanding of how to implement a non-parametric model in ad-hoc wireless sensor network systems, the work of Guestrin et al. (2004) is reviewed in this section. Furthermore, there are other interesting schemes that apply non-parametric modeling techniques (Nguyen et al., 2005; Predd et al., 2009; Perez-Cruz and Kulkarni, 2010). Although those algorithms did not explicitly address the power constraints, they have the potential to be applied to WSNs to alleviate the power problem. Thus, these works (Nguyen et al., 2005; Predd et al., 2009; Perez-Cruz and Kulkarni, 2010) were selected and are briefly presented in this section. (2.1) Distributed kernel linear regression model (DKLR) (Guestrin et al., 2004): Motivated by extracting more complete information (shape and data structure) than the data aggregation technique, this scheme aims to model the local correlation of data using a kernel linear regression function while using less communication. For example, a degree-two polynomial function, $\hat{f}(t) = w_0 + w_1 t + w_2 t^2$, is used to fit a set of measured data, where the set of basis functions is $\{1, t, t^2\}$, or $h_u$, and the set of coefficients is $\{w_0, w_1, w_2\}$, or $w_u$. In WSNs, there are two important correlations: temporal and spatial. By exploiting these correlations, the model used to represent local data is the regression function of x and t, where x is a vector of location (x, y, z) and t is a time sequence. This can be written as

$$\hat{f}(x,t) = \sum_u w_u h_u(x,t) \approx f(x,t), \tag{27}$$
where $f(x,t)$ is the set of measured data collected from the WSN at location x and time index t. The model's objective is to determine a set of $w_u$ that enables the basis functions to best fit the measured data $f(x,t)$. The problem can be formulated as an optimization using the root mean square (RMS) criterion as

$$w^* = \arg\min_w \sqrt{\frac{1}{m} \sum_{k=1}^{m} \left( f(x, t_k) - \hat{f}(x, t_k) \right)^2} = \arg\min_w \sqrt{\frac{1}{m} \sum_{k=1}^{m} \left( f(x, t_k) - \sum_u w_u h_u(x, t_k) \right)^2}, \tag{28}$$
where m is the number of measured data collected from the WSN. Converting the problem in Eq. (28) into matrix notation and setting its gradient to zero gives the optimal coefficients as follows:

$$w^* = \arg\min_w \|Hw - f\| = (H^T H)^{-1} H^T f = A^{-1} b, \qquad \text{where } A = H^T H,\; b = H^T f. \tag{29}$$
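As a concrete, centralized rendering of (28)–(29), the sketch below builds H from the degree-two polynomial basis {1, t, t²} and solves the normal equations on synthetic data; the distributed, kernel-weighted variant is developed next.

```python
# Centralized least-squares fit per Eqs. (28)-(29): H from basis {1, t, t^2},
# then w = (H^T H)^{-1} H^T f via the normal equations.
import numpy as np

t = np.linspace(0, 1, 50)                         # time indices
noise = 0.01 * np.random.default_rng(0).standard_normal(50)
f = 2.0 + 0.5 * t - 1.5 * t**2 + noise            # synthetic measurements

H = np.column_stack([np.ones_like(t), t, t**2])   # basis functions h_u(t)
w = np.linalg.solve(H.T @ H, H.T @ f)             # A w = b, A = H^T H, b = H^T f
print(w)   # close to the true coefficients (2.0, 0.5, -1.5)
```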
When solving the linear system $Aw = b$ by Gaussian elimination techniques (Golub and Loan, 1989), the coefficients $w_i$ can be obtained at any time. However, this requires a high computational cost. Instead, the problem can be solved by the kernel linear regression technique, which reduces the dimension of the matrix A while still preserving the essential structure of the sensor network data. The coverage area of the sensor network can be separated into regions, where a sensor node's location (x) can belong to one of j regions, $j = 1, 2, 3, \ldots, l$. It is possible that the regions overlap or that there are multiple nodes in the same region. Each region j owns a corresponding non-negative kernel function $K_j$. This kernel function can be normalized by the sum of the values of all regions at location x. This normalization is called the kernel weight $k_j(x)$, which represents the degree to which the location x is associated with the region j. Applying the kernel weight, the regression function in Eq. (27) becomes

$$\hat{f}(x,t) = \sum_{j=1}^{l} k_j(x) \sum_{h_{ju} \in H_j} w_{ju}\, h_{ju}(x,t) = \sum_{j=1}^{l} \sum_{h_{ju} \in H_j} w_{ju} \left[ k_j(x)\, h_{ju}(x,t) \right], \tag{30}$$
where $[k_j(x)\, h_{ju}(x,t)]$ is treated as H in Eq. (29). Since this scheme is extended from the data aggregation technique, the algorithm is placed on top of a routing algorithm. The routing algorithm is assumed to be a lossless, fixed routing tree with a sensor network localization algorithm which can recover a location in case the locations of nodes change over time. The algorithm can be divided into multiple phases, described as follows. In the first phase, called the dissemination phase, the following parameters are defined: a set of neighbor nodes $E_i$ of node $N_i$ in the routing tree, a number of regions or kernels j, a degree u of the basis function, a size of the time window for updating A and b, and a timer interval for exchanging information in a cluster. Additionally, a set $K_i$ and a cluster $C_i$ of kernel functions to which a node $N_i$ can belong are specified. For instance, $K_1 = \{K_1, K_2\}$ and $C_1 = \{1, 2\}$ imply that sensor node $N_1$ can access two kernel functions, $K_1$ and $K_2$, of cluster $C_1$. This information is broadcast over the routing tree using existing query dissemination techniques such as TinyDB (Madden, 2003) or directed diffusion (Intanagonwiwat et al., 2000). In the second phase, each node $N_i$ at location $x_i$ computes Eq. (29) to obtain its $\hat{f}(x,t)$. An example with two nodes $\{N_1, N_2\}$, three kernels $\{K_1, K_2, K_3\}$, two clusters $C_1 = \{1,2\}$, $C_2 = \{2,3\}$ and a third-order polynomial basis function is briefly demonstrated to help understand this computation. This is the same example presented in Guestrin et al. (2004). Before determining $\hat{f}(x,t)$, each node has to know $w_u$ by solving the problem $Aw = b$. Each node can locally compute its $A^{(i)}$ and $b^{(i)}$. Thus, for node $i = 1$ in this case, $A^{(1)}$ and $b^{(1)}$ are

$$A^{(1)} = \begin{pmatrix} a_{11}^{(1)} & a_{12}^{(1)} & 0 \\ a_{21}^{(1)} & a_{22}^{(1)} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad b^{(1)} = \begin{pmatrix} b_1^{(1)} \\ b_2^{(1)} \\ 0 \end{pmatrix}, \tag{31}$$

where

$$a_{zj}^{(i)} = \sum_{k=1}^{m} [k_z(x_i, t_k)\, h_z(x_i, t_k)][k_j(x_i, t_k)\, h_j(x_i, t_k)], \qquad b_z^{(i)} = \sum_{k=1}^{m} [k_z(x_i, t_k)\, h_z(x_i, t_k)]\, f(x_i, t_k).$$

Note that $a_{13}^{(1)} = 0$, since node $N_1$ does not belong to $K_3$; this means that $k_3(x_1) = 0$. However, according to (29) the computation requires $A = A^{(1)} + A^{(2)}$ and $b = b^{(1)} + b^{(2)}$, which are then solved by Gaussian elimination techniques (Golub and Loan, 1989) as follows:

$$\begin{pmatrix} a_{11}^{(1)} & a_{12}^{(1)} & 0 \\ a_{21}^{(1)} & a_{22}^{(1)} + a_{22}^{(2)} & a_{23}^{(2)} \\ 0 & a_{32}^{(2)} & a_{33}^{(2)} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} b_1^{(1)} \\ b_2^{(1)} + b_2^{(2)} \\ b_3^{(2)} \end{pmatrix},$$

$$\begin{pmatrix} a_{11}^{(1)} & a_{12}^{(1)} & 0 \\ 0 & a_{22}^{(1)} + a_{22}^{(2)} - \dfrac{a_{21}^{(1)}}{a_{11}^{(1)}} a_{12}^{(1)} & a_{23}^{(2)} \\ 0 & a_{32}^{(2)} & a_{33}^{(2)} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} = \begin{pmatrix} b_1^{(1)} \\ b_2^{(1)} + b_2^{(2)} - \dfrac{a_{21}^{(1)}}{a_{11}^{(1)}} b_1^{(1)} \\ b_3^{(2)} \end{pmatrix}. \tag{32}$$

By solving the linear system in (32), $w_u$ can be obtained. In order to compute the linear equations above, node $N_1$ computes the terms $a_{22}^{(1)} - (a_{21}^{(1)}/a_{11}^{(1)})\, a_{12}^{(1)}$ and $b_2^{(1)} - (a_{21}^{(1)}/a_{11}^{(1)})\, b_1^{(1)}$ locally and then sends both terms to node $N_2$. Node $N_2$ uses the received terms together with its local information, $A^{(2)}$ and $b^{(2)}$, to solve the linear system of rows two and three in (32) to obtain $w(C_2)$, which is $\{w_2, w_3\}$. Similarly, if node $N_2$ sends analogous terms to node $N_1$, it can solve the linear system of rows one and two in (32) to obtain $w(C_1)$, which is $\{w_1, w_2\}$.
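The exchange around (31)–(32) can be verified numerically: in the sketch below, node N1 forwards only the two scalar elimination terms named above, and N2 recovers {w2, w3} exactly as a centralized solve would. The $A^{(i)}$, $b^{(i)}$ matrices are random stand-ins with the zero patterns of (31).

```python
# Two-node solve mimicking (31)-(32): N1 forwards two scalar terms, N2
# solves the reduced 2x2 system for w(C2) = {w2, w3}. Random stand-ins
# are used for the locally computed A^(i), b^(i).
import numpy as np

rng = np.random.default_rng(1)
# Node N1 covers kernels {K1, K2}; node N2 covers kernels {K2, K3}.
A1 = np.zeros((3, 3)); b1 = np.zeros(3)
A1[:2, :2] = rng.random((2, 2)) + 2 * np.eye(2); b1[:2] = rng.random(2)
A2 = np.zeros((3, 3)); b2 = np.zeros(3)
A2[1:, 1:] = rng.random((2, 2)) + 2 * np.eye(2); b2[1:] = rng.random(2)

# Node N1 computes and transmits only these two scalars (one elimination step):
a_term = A1[1, 1] - A1[1, 0] / A1[0, 0] * A1[0, 1]
b_term = b1[1] - A1[1, 0] / A1[0, 0] * b1[0]

# Node N2 solves the remaining 2x2 system in (w2, w3) using local data only.
M = np.array([[a_term + A2[1, 1], A2[1, 2]],
              [A2[2, 1],          A2[2, 2]]])
rhs = np.array([b_term + b2[1], b2[2]])
w23 = np.linalg.solve(M, rhs)

# Check against the centralized solution of (A1 + A2) w = (b1 + b2).
print(w23, np.linalg.solve(A1 + A2, b1 + b2)[1:])
```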
After obtaining the basis function's coefficients in the second phase, any node in a kernel can determine $\hat{f}(x,t)$. This modeling compression technique allows the sensor nodes to communicate only the necessary information, reducing the transmitted power while preserving the original data structure. (2.2) Distributed non-parametric kernel-based scheme (DNKB) (Nguyen et al., 2005; Predd et al., 2009; Perez-Cruz and Kulkarni, 2010): While the DKLR scheme fits sensor data with linear regression functions, the DNKB scheme fits sensor data with non-linear regression functions. The first work that developed a non-parametric kernel-based methodology for designing decentralized detection systems in WSNs was Nguyen et al. (2005). In contrast to most works presented in the parametric modeling section, in which the joint distribution of sensor observations is assumed to be known, this work addressed the problem when only a set of empirical samples is available, without a priori parameters of the observed processes in the sensor network. The authors in Nguyen et al. (2005) proposed a novel algorithm based on a combination of ideas from reproducing-kernel Hilbert spaces (Aronszajn, 1950; Saitoh, 1988) and the framework of empirical risk minimization from non-parametric statistics. There are several examples in the literature that extend this scheme, such as Predd et al. (2006). In this survey article, we focus on recent works that applied this kind of scheme and have the potential to alleviate power consumption in WSNs. In Predd et al. (2009), the authors applied a non-parametric estimation based on
a non-parametric kernel-based scheme, in which they focused on an algorithm that trains each sensor node to achieve a minimal least-squares risk functional objective. However, the algorithm had a few drawbacks. After each node completes its training, the node needs to communicate its prediction set to all of its neighbors; the communication burden grows quadratically with the number of neighboring sensor nodes. Moreover, several nodes cannot be trained simultaneously; this is possible only if their sets of neighbors are disjoint, which further complicates the synchronization procedure. These limitations were solved by Perez-Cruz and Kulkarni (2010), who modified the network update rule. In that algorithm, each node sends the same message, instead of a prediction set, to all of its neighbors. This reduces the communication burden from quadratic to linear in the number of neighbors and allows asynchronous updates.

5.2. Distributed transform coding (DTC)

The transform coding technique decomposes a source output, based on transform theories, into components/coefficients that are then coded according to their individual characteristics. There are several well-known approaches, such as the Karhunen–Loeve transform, the cosine transform and wavelet transforms. In non-WSN applications without power constraints, such compression approaches are widely used, especially in image, video and audio compression algorithms. In WSN applications, these approaches are widely adopted in a certain type of WSN system called wireless multimedia sensor networks (WMSNs). The authors in Chew et al. (2008) reviewed and evaluated eight popular image compression algorithms. They pointed out that set-partitioning in hierarchical trees (SPIHT) wavelet-based image compression is the most suitable hardware-implemented image compression algorithm in wireless sensor networks due to its high compression efficiency and its simplicity in coding procedures. However, this survey article is not focused on compression algorithms for WMSNs. The scope of this survey is limited to compression algorithms for conventional WSNs based on the requirements for practical WSN applications presented in Section 3, focusing on algorithms that can fulfill the power saving requirements. Readers interested in multimedia compression algorithms for WMSNs are referred to Chew et al. (2008), Puri et al. (2006), Kang and Lu (2007), and Yeo and Ramchandran (2010). Generally, it is difficult to implement a full version of those popular transform-based compression algorithms in WSNs. The main reason is that they often require one of the sensor nodes to have knowledge of all measurements in a network to calculate the transform coefficients. This requirement can greatly increase the volume of inter-node communication, which affects communication costs and results in higher power usage. For this reason, there are several works in the literature that aim to approximate or modify those classic transform-based algorithms in order to make them suitably applicable to WSNs. In Gastpar et al. (2006), the authors introduced several approximated versions of the distributed Karhunen–Loeve transform (KLT) which have the potential to be applied in sensor networks, distributed image databases, hyper-spectral imagery, and data fusion. This KLT-based approach was studied and extended by several works, such as Nurdin et al. (2009), Goela and Gastpar (2009a,b), and Wiesel and Hero (2009). In more recent work, Amar et al.
In more recent work (Amar et al., 2010), the authors considered a data fusion problem in which each sensor node observes part of a data vector and compresses (encodes) its observation by means of the KLT. Once the compressed observations are sent to the fusion center, it reconstructs the entire data vector from them with minimal mean squared error (MSE). The work of
Amar et al. (2010) extended an iterative local KLT algorithm proposed in Gastpar et al. (2006). A greedy algorithm was proposed in Amar et al. (2010) under the assumption that the observations are corrupted by noise, whereas the observations in the iterative local KLT algorithm of Gastpar et al. (2006) were assumed to be noiseless. The greedy algorithm runs at the fusion center or sink node, which does not have power constraints. In each step, one of the encoding matrices is selected and updated by appending an additional row. A sensor node is selected based on the MSE of the reconstructed data vector, computed from the second-order statistics of the observations known to the fusion center: the sensor node whose appended row yields the largest decrease in MSE is selected in each step, until all the encoding matrices reach their predefined dimensions. Based on the simulation results of Amar et al. (2010), the greedy algorithm performed equally well in terms of MSE when compared to the local KLT algorithm. Its advantage is that its complexity is lower than that of the previous algorithm in Gastpar et al. (2006). Although both works presented data compression approaches based on the KLT in order to reduce the data transmitted over WSNs, neither evaluated power consumption performance. For wavelet transform approaches, several works in the literature propose a modified version of the distributed wavelet transform using the lifting scheme (Ciancio, 2006). The lifting scheme is an alternative method to compute bi-orthogonal wavelets. It allows a faster implementation of the wavelet transform along with a fully in-place calculation of the coefficients (Ciancio, 2006). The scheme consists of three steps: split, prediction and update. A block diagram of a wavelet transform implementation based on the lifting scheme is shown in Fig. 10. In the split step, the signal s(n) is split into even and odd samples, which are then processed in the prediction (P) and update (U) steps. Finally, the detail coefficients d(n) and smooth coefficients y(n) are obtained. Since this group of works considered the power constraints in designing their algorithms, their details are presented here. In Ciancio and Ortega (2004), the authors proposed using the lifting scheme to generate the 5/3 wavelet coefficients at each of the sensor nodes. Even-numbered sensor nodes correspond to the even samples and odd-numbered sensor nodes to the odd samples. In this implementation, each sensor node only needs data from its neighbors at a given scale. That means the lifting scheme for the wavelet can be performed in a two-step distributed way, as seen in Fig. 11. During the first step, the odd-numbered sensor nodes receive measurement data from their even-numbered neighbors and compute the corresponding detail coefficients. In the second step, these coefficients are sent to the even-numbered sensor nodes and to the central node. The even-numbered sensor nodes use the coefficients (along with their own measurements) to generate smooth coefficients, which are then transmitted to the central node. However, in multi-hop sensor networks, measurement data are sent in a particular direction, as shown in Fig. 12. Thus,
Fig. 10. Lifting implementation of wavelet transform.
Fig. 11. Two-step implementation of wavelet transform using lifting.
Fig. 12. Data flow in multi-hop sensor network.
Fig. 13. Partial coefficients are transmitted forward and updated at future sensors until the full coefficients are computed.
a two-step distributed approach as depicted in Fig. 11 is not suitable for implementation in this multi-hop WSN topology. The authors in Ciancio and Ortega (2005) proposed a suitable technique for multi-hop sensor networks, extended from Ciancio and Ortega (2004). For a multi-hop topology as illustrated in Fig. 13, let D(n) denote the measurement data at node n, and let D̄(n) and D_p(n) denote the full coefficient and the partial coefficient at the n-th sensor node, respectively. The 5/3 wavelet coefficients computed with the lifting scheme at the odd and even sensor nodes are given by

D̄(2n+1) = −(1/2)D(2n) + D(2n+1) − (1/2)D(2n+2),

D̄(2n) = (1/4)D̄(2n−1) + D(2n) + (1/4)D̄(2n+1).
In Fig. 12, sensor node 2n+1 does not have access to the data of sensor node 2n+2. Thus, according to the above equations, it can compute only the partial coefficient D_p(2n+1) = −(1/2)D(2n) + D(2n+1). When this partial coefficient arrives at sensor node 2n+2, it is updated to the full coefficient as D̄(2n+1) = D_p(2n+1) − (1/2)D(2n+2). For a 1-level 5/3 wavelet, the data available at each sensor node are illustrated in Fig. 13. A key concept of this work is to perform partial computations of the transform coefficients at each node, which greatly reduces unnecessary transmissions over the network and results in a significant reduction of the overall energy consumption of the WSN. However, this technique still has a weakness: the quantization of partial coefficients introduces extra distortion. This issue will not be addressed further in this article; interested readers are referred to Ciancio and Ortega (2005) and Ciancio (2006) for more details.
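As a rough sketch of this partial-computation idea (our illustration, not the authors' code; plain floats stand in for sensor readings):

```python
# Distributed 5/3 lifting along a multi-hop chain: node 2n+1 forwards a
# partial detail coefficient, and node 2n+2 completes it, so no extra
# backward transmissions are needed.
def partial_detail(d_2n: float, d_2n1: float) -> float:
    # At node 2n+1: D_p(2n+1) = -1/2 D(2n) + D(2n+1)
    return -0.5 * d_2n + d_2n1

def complete_detail(d_p: float, d_2n2: float) -> float:
    # At node 2n+2: full detail = D_p(2n+1) - 1/2 D(2n+2)
    return d_p - 0.5 * d_2n2

def smooth(detail_prev: float, d_2n: float, detail_next: float) -> float:
    # At node 2n: smooth = 1/4 detail(2n-1) + D(2n) + 1/4 detail(2n+1)
    return 0.25 * detail_prev + d_2n + 0.25 * detail_next
```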
The authors in Dong et al. (2006) presented an adaptive version of the algorithm in Ciancio and Ortega (2005) that selects the optimal wavelet compression parameters minimizing the total energy dissipation in the final stage. Another extension, in Ciancio et al. (2006), applied the technique of Ciancio and Ortega (2005) jointly with the routing tree. While the previous works considered the one-dimensional (1-D) wavelet transform based on the lifting scheme, subsequent works (Wagner et al., 2005; Wagner and Baraniuk, 2006; Shen and Ortega, 2008a,b) extended the lifting scheme to 2-D wavelet transforms applied jointly with the routing tree. Another distributed wavelet transform scheme for data-gathering WSN applications was proposed by Shen and Ortega (2002). In this work, some of the inefficiencies of existing lifting transforms were discussed. To address them, the authors defined a new Haar-like wavelet transform that is analogous to the standard Haar wavelet when applied to 1-D paths. Their experimental results showed that the new scheme outperformed existing works.

5.3. Distributed source coding (DSC)

Distributed source coding (DSC) is a popular approach for data compression in WSNs. This approach follows the Slepian-Wolf theorem (Slepian and Wolf, 2003), which proved that separate encoding is as efficient as joint encoding for lossless compression.
Fig. 14. DSC simple example.
Similar results were obtained by Wyner and Ziv (1976) with regard to lossy coding of jointly Gaussian sources. The technique is reviewed here using a simple example modified from Xiong et al. (2004). Suppose that there are two correlated eight-bit temperature readings measured by sensor nodes X and Y, as shown in Fig. 14. The temperature value x from sensor X and the temperature value y from sensor Y are related as follows: x ∈ {y−3, y−2, y−1, y, y+1, y+2, y+3, y+4}. This means the correlation between x and y is characterized by −3 ≤ x − y ≤ 4. In this case, x can take only eight different values relative to y, so x could be represented by just three bits instead of the original eight bits. In DSC, we simply take the temperature value x modulo eight, which also reduces the required bits to three. Specifically, let x = 33 and y = 30. Instead of transmitting both x and y at eight bits/sample without loss, in distributed coding we transmit y = 30 and x̂ = x mod 8 = 1. Consequently, x̂ indexes the set that x belongs to. In this case, the set with index 1 to which x belongs is {1, 9, 17, 25, 33, …, 249}, and the joint decoder picks the element x = 33, which is closest to y = 30. We can see that x and y are encoded separately at sensor nodes X and Y; nevertheless, the joint decoder, which knows the side information y, can recover x correctly. This approach keeps the encoder simple at the cost of increased complexity at the joint decoder.
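A minimal sketch of this coset encoding and joint decoding (our illustration, assuming 8-bit readings and the correlation bound above):

```python
# Coset-based DSC sketch: sensor X sends only x mod 8 (3 bits); the
# joint decoder picks the coset member closest to side information y.
def dsc_encode(x: int, coset_bits: int = 3) -> int:
    return x % (1 << coset_bits)

def dsc_decode(syndrome: int, y: int, coset_bits: int = 3, r_bits: int = 8) -> int:
    coset = range(syndrome, 1 << r_bits, 1 << coset_bits)  # e.g. {1, 9, ..., 249}
    return min(coset, key=lambda v: abs(v - y))

x, y = 33, 30
assert dsc_encode(x) == 1          # transmit 3 bits instead of 8
assert dsc_decode(1, y) == 33      # decoder recovers x exactly
```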
There are two well-known survey and tutorial articles on the DSC approach, Xiong et al. (2004) and Pradhan et al. (2002). In Pradhan et al. (2002), the authors mainly focus on a practical DSC scheme called distributed source coding using syndromes (DISCUS). In the other article (Xiong et al., 2004), the authors' main objective was to cover their work on DSC and other relevant research efforts motivated by DISCUS. First, the article reviews lossless source coding of discrete sources with side information at the decoder as a case of Slepian-Wolf coding. In this part, they mainly discussed a binary channel model of two correlated sources (X and Y) and how to establish a link with channel coding through Wyner's syndrome concept and other near-capacity channel codes such as turbo and LDPC codes. Second, lossy (Wyner-Ziv) source coding with side information at the decoder was considered. In this part, they discussed Wyner-Ziv coding design, especially quantizer design, for both discrete (binary symmetric case) and continuous (quadratic Gaussian case) sources. Finally, they discussed several applications of DSC for WSNs and raised the main issues for deploying such an approach in WSNs. In our article, we review practical DSC algorithms applied to WSNs which explicitly consider the power constraints. The early works that focused on energy-efficient distributed source coding schemes for wireless sensor networks are Tang et al. (2007) and Chou and Petrovic (2003). The former proposed an energy-efficient adaptive distributed source coding (EEADSC) technique for sensor clusters with high spatial correlation, as depicted in Fig. 3. It was designed for the specific application of sensor networks that detect a target using acoustic signals, e.g., automatic target recognition (ATR). Based on the same topology as shown in Fig. 3, the encoders of this technique are nodes 2-5 in the sensor cluster, and the decoder is node 1. Since node 1 observes the same event and is closely correlated with the other nodes in the cluster, it uses its own observed value as side information for the decoder. An important concept of this method is to find a bit sequence that conveys information about the coset index while minimizing the Lagrangian cost, a cost function of the bit rate, the distortion and the energy used in decoding and transmission. The work of Chou and Petrovic (2003) deployed DSC for simple networks such as a single cluster (star topology), where the aggregation node (decoder) does not suffer from power constraints. In the DSC scheme, since the compression rate directly depends on the amount of correlation among sources, which varies over time, it is desirable to have one underlying codebook that is not changed among the sensors but can still support multiple compression rates. Thus, the authors in Chou and Petrovic (2003) devised a tree-based distributed compression code that can provide variable-rate compression without the need to change the underlying codebook. In this work, the decoder tracks the correlation among sensor nodes in a cluster (which mostly affects the decoder's computational cost) and sends queries to the sensor nodes (encoders) to encode their observed values according to the tracked correlation information. However, the authors did not include this computational cost in the power consumption analysis of their experimental report. An interesting point in Chou and Petrovic (2003) is that they tested the proposed algorithm on multiple data types: temperature, humidity and light. The average power savings for these three data types were 67%, 45% and 12%, respectively. We might infer from these results that the algorithm did not deal well with multiple data types per sensor node, as discussed in Section 3. Both the Tang et al. (2007) and Chou and Petrovic (2003) techniques are practical in the case of a single-cluster topology as illustrated in Fig. 3. However, to extend these schemes to a more complex topology as shown in Fig. 15, the decoders (node number 1 in each cluster) clearly need low-complexity
Fig. 15. A wireless sensor network consisting of two clusters.
coding as well as the encoders, since both suffer from power constraints; this affects the decoder's performance. To enable a decoder to operate in a more complex topology, the DSC algorithms for multiple correlated sources proposed by Liveris et al. (2003) and Lan et al. (1969) can be used. The concept of Lan et al. (1969) was deployed for distributed data aggregation using Slepian-Wolf coding in the cluster-based wireless sensor networks of Zheng et al. (2010). Although these works proposed DSC schemes for larger-scale and more complex WSN topologies, they emphasized how to encode and decode correctly and did not consider the size of the codebook used in the DSC scheme, which is another important issue when implementing DSC for large-scale WSNs. As illustrated in Ramaswamy et al. (2010), a conventional DSC for 20 sensor nodes requires a codebook of size 20 × 2^40, or about 175 terabytes, which would be impractical for large-scale WSNs. To solve this problem, the authors in Ramaswamy et al. (2010) designed a bit-subset selector module combined with the DSC decoder at the sink node. This module's role is to extract a suitable subset of the received bits for decoding each individual source. As a result, the proposed approach requires a codebook 16 times smaller than that of a conventional DSC decoder. For the same reason as in Chou and Petrovic (2003), the DSC compression rate depends directly on the level of correlation among sources, which is not constant over time. This implies that it is beneficial to apply multi-rate distributed source coding in WSNs to enhance power saving. Preliminary studies of multi-rate distributed source coding can be found in Wang et al. (2007a,b). In both works, the authors proposed an interaction mechanism between the application layer protocol (DSC) and the lower layer protocols (medium access control and physical layers) of the WSN to make joint decisions on multi-rate transmission for better energy efficiency. However, these works focused on modifying existing WSN MAC protocols (e.g., T-MAC) to create a WSN multi-rate transmission platform; they did not consider multi-rate data coding for the DSC itself. For this reason, the authors in Rezayi and Abolhassani (2009) and Wang et al. (2009a) proposed multi-rate DSC compression schemes. The authors in Rezayi and Abolhassani (2009) applied multi-rate distributed source coding using low-density parity-check (LDPC) codes, reducing energy consumption for source outputs with high correlation and decreasing the bit error rate (BER) at low correlation, compared with single-rate DSC. Moreover, the energy consumption of their multi-rate scheme was lower than that of single-rate coding at a similar maximum BER. While Rezayi and Abolhassani (2009) treated the multi-rate transmission platform and multi-rate distributed source coding separately, an interaction algorithm that jointly optimizes the rates of both the transmission platform and the distributed source coding was proposed in Wang et al. (2009a). Although several works have studied the DSC approach in recent years, their treatment of power is often highly abstracted and does not sufficiently consider practical issues (Oldewurtel et al., 2010); for example, the power consumption of the processing unit is not taken into account in the analysis.
This motivated the authors in Oldewurtel et al. (2010) to study the energy consumption of DSC in various, more realistic topologies, together with cluster-head selection schemes for WSNs. Three topological models and three cluster-head selection schemes were evaluated. The first topological model is a random point process (PP) (Stoyan et al., 1995) based on the Poisson distribution; this rather unrealistic model was used for reference and comparison purposes. The second topological model, called the Thomas PP, is an
extension of the PP model. The authors argued that the Thomas PP model leads to a more realistic deployment model in the context of WSNs, since WSNs are often assumed to be clustered. The final topological model is a grid in which sensor nodes are placed on the vertices of a rectangular grid. In their simulation, the sensor nodes in these topologies were formed into clusters, and the cluster heads were selected by three selection schemes: random selection, closest-to-center-of-gravity, and closest-to-sink. To perform DSC, the compressing node is one of the cluster members and the reference node is its cluster head (see Fig. 14 for reference). Using a power consumption model based on measurements obtained from real experiments, the simulation results in Oldewurtel et al. (2010) showed that adopting the closest-to-center-of-gravity scheme on the Thomas PP topology strongly outperformed the other combinations: it saved up to 50.7% of power and extended the WSN's lifetime by up to 34.9%. Additionally, the work described optimal parameters that maximize the energy saving using DSC.

5.4. Compressed sensing (CS)

Prior knowledge of the precise correlations in the data is essential for compression schemes such as DSC. However, an emerging technique called compressed sensing or compressive sensing (CS) (Donoho, 2009) is based on a sampling theory that leverages compressibility without relying on any specific prior knowledge of or assumptions about the signals. The technique considers an n-sample signal vector x that is sparse: a vector x is said to be p-sparse if it has at most p non-zero entries, with p < n. Such a sparse n-sample signal can be accurately recovered from a small number of non-adaptive, randomized linear projection samples. Specifically, given a sparse signal x = (x_i)_{n×1}, we can find a random projection matrix A = (A_{i,j})_{k×n} with far fewer rows than columns (i.e., k ≪ n) to obtain a small compressed data set as

y = (y_i)_{k×1} = Ax.   (33)
Then, we can reconstruct an estimate of x by solving the optimization problem

x̂ = argmin_z ||z||_1   subject to   y = Az,
where ||z||_1 = Σ_{i=1}^{n} |z_i| denotes the ℓ1-norm. For the sake of simplicity, we use a simple example adapted from Chou et al. (2009) to demonstrate CS in multi-hop WSNs. A WSN with four sensor nodes and one sink node is shown in Fig. 16. Suppose the projection vector is A = [0.2, 0.1, 0.3, 0.4]; the projected value is then y = Ax = 0.2x1 + 0.1x2 + 0.3x3 + 0.4x4. The sink node can obtain this projected value without each sensor node sending its raw reading to the sink: the sink passes a message along the route S–1–2–4–3–S using source routing in the WSN.
Fig. 16. Example of multi-hop WSN.
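To make this concrete (a sketch under the assumption of ideal source routing; the numbers are hypothetical), one projection can be accumulated in-network along the route instead of collecting raw readings:

```python
# Sketch: the query message carries a running sum along the route, so
# the sink receives y = sum_i a_i * x_i without seeing any raw reading.
def gather_projection(route, coeffs, readings):
    y = 0.0
    for node in route:                  # e.g. the route S-1-2-4-3-S
        y += coeffs[node] * readings[node]
    return y

readings = {1: 20.0, 2: 21.5, 3: 19.0, 4: 22.0}    # hypothetical samples
a = {1: 0.2, 2: 0.1, 3: 0.3, 4: 0.4}               # projection vector A
y = gather_projection([1, 2, 4, 3], a, readings)   # 0.2x1+0.1x2+0.3x3+0.4x4
```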
Now consider the projection vector A = [0.2, 0, 0.3, 0]. The sink node obtains the projected value y = Ax = 0.2x1 + 0.3x3, and the route is S–1–3–S. Thus, A is one of the key parameters that must be chosen correctly in this scheme; this projection matrix A is referred to as a routing matrix in Quer et al. (2009). In some WSN applications, such as data gathering or monitoring, the observed signal x is not always sparse. Another key ingredient is therefore a suitable transformation that makes the signal sparse in some domain: there must exist an invertible transformation matrix T of size n × n such that we can write

x = Ts,   (34)
where s is sparse. Then, substituting (34) into (33), the compressed data can be obtained as y = Ax = ATs. In the reconstruction process, the optimization problem for obtaining a recovered version of s can be written as

ŝ = argmin_z ||z||_1   subject to   y = ATz.   (35)
It was shown that the above ℓ1-minimization can be solved with a linear programming (LP) technique (Haupt et al., 2008). Although the reconstruction complexity of the LP-based decoder is polynomial, it increases dramatically when n is very large; thus, several ongoing research efforts seek low-complexity reconstruction techniques (Tropp and Gilbert, 2007; Blumensath and Davies, 2008). Once a sparse solution ŝ is obtained, the original data x can be recovered through (34). In Fig. 16, if the projection vector is A = [0.2, 0.1, 0.3, 0.4], the sink node obtains the projected value y = Ax = 0.2x1 + 0.1x2 + 0.3x3 + 0.4x4 by passing a message along the route S–1–2–4–3–S using source routing in the WSN. Then, once T is known at the sink node, the sink can reconstruct x by computing (35) and (34). An overview of compressed sensing for network data can be found in Haupt et al. (2008), which also introduces the potential of compressive sampling theory for data aggregation in multi-hop WSNs; however, no concrete scheme was reported based on this initial idea. Based on our review above, the compressed sensing scheme does not require complicated computation at the sensor node. Moreover, the total amount of communication can be reduced greatly while the amount of information at the sink node is guaranteed, which makes it attractive for WSN applications. Recent literature has focused on applying this scheme to data gathering applications. In Quer et al. (2009) and Luo et al. (2009), the authors addressed the data gathering problem in WSNs, where routing was used in conjunction with compression to transport random projections of the data. While the work of Luo et al. (2009) simulated the proposed concept using synthetic sample signals, the work of Quer et al. (2009) quantified the performance of compressed sensing in multi-hop wireless networks using both synthetic and real signals. Using synthetic signals with sufficient sparsity, both works reported that compressed sensing achieved substantial gains compared to plain routing schemes. The work of Quer et al. (2009), however, also considered real signals from different environmental phenomena, where the main problem is to find the transform matrix T that makes the signal sparse. In their experiments, they tested two of their own proposed transformations along with two well-known ones: the discrete cosine transformation (DCT) and the Haar wavelet (recognized as the first known wavelet) transformation. None of these was able to sparsify the data while at the same time being incoherent with respect to the matrix A, i.e., the routing matrix. They also pointed out that finding a suitable transformation with good sparsity and incoherence properties remained an open problem for data gathering in static WSNs.
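For illustration only, the ℓ1-minimization in (35) can be recast as a linear program by splitting the unknown into non-negative parts; the sketch below (ours, using SciPy's generic LP solver rather than any decoder from the cited works) recovers a sparse vector from k ≪ n random projections:

```python
# Sketch: solve min ||z||_1 subject to B z = y (B = AT in Eq. (35)).
# With z = u - v and u, v >= 0, ||z||_1 = sum(u) + sum(v), a plain LP.
import numpy as np
from scipy.optimize import linprog

def bp_reconstruct(B, y):
    k, n = B.shape
    c = np.ones(2 * n)                  # objective: sum(u) + sum(v)
    A_eq = np.hstack([B, -B])           # enforce B(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]        # z = u - v

rng = np.random.default_rng(1)
s = np.zeros(10); s[[2, 7]] = [1.0, -0.5]     # a 2-sparse signal
A = rng.standard_normal((6, 10))              # k = 6 projections, n = 10
print(np.round(bp_reconstruct(A, A @ s), 3))  # approximately recovers s
```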
In addition to sparsity, the gathering cost also depends on the positions of the sensors whose samples are aggregated in a measurement. For example, in Fig. 16, suppose that A1 = [0.2, 0.1, 0, 0] and A2 = [0, 0, 0.3, 0.4]. Then the route used for A1 is S–1–3–S, while that for A2 is S–1–2–4–3–S. Even with the same number of samples, the routing matrix A2 incurs a higher energy cost. Based on this motivation, the authors in Lee et al. (2009) proposed a framework for efficient data gathering in WSNs using spatially localized sparse projections. To design distributed measurement strategies that are both sparse and spatially localized, the scheme divides the network into clusters of adjacent nodes and forces projections to be obtained only from nodes within a cluster; this means that the routing matrix A has a diagonal structure. Based on their analysis and experiments with independent and joint reconstruction methods, a joint reconstruction method was adopted for the localized projections. The localized projections introduce overlapping energy measurements among transform functions T, so the joint method can exploit measurements in multiple clusters corresponding to the energy of a given transform function T that overlaps those clusters. Lee et al. (2009) reported that their approach outperformed that of Quer et al. (2009), because it achieved power saving with localized aggregation and captured the more evenly distributed energy of the transform functions. While most of the previous works proposed static compressed sensing for WSNs, the authors in Chou et al. (2009) proposed an adaptive algorithm, based on the recently developed theory of adaptive compressive sensing (Ji et al., 2008; Kho et al., 2009), to collect information from WSNs in an energy-efficient manner. The key idea of the algorithm is to perform projections iteratively so as to maximize the amount of information gained per unit of energy expenditure. However, they proved that this maximization problem is NP-hard, so four heuristic algorithms were proposed to solve it. These algorithms were evaluated in a performance comparison using data from both simulations and an outdoor WSN testbed. The results showed that the proposed algorithm provided a more accurate approximation of the temporal-spatial field for a given energy expenditure.
6. Local data compression approaches

This section presents an overview of the algorithms in the local data compression category; performance reviews and comparisons based on the criteria described in Section 4 follow in Section 7. The local data compression approach performs data compression locally on each sensor node, without distributed collaboration among sensor nodes. As a result, these schemes usually exploit only the temporal correlation of the data and do not depend on specific WSN topologies, which makes them suitable for sparse WSNs with low spatial correlation. This category can be broadly classified into two main techniques: lossless compression algorithms, which preserve the information exactly through the compression and decompression processes, and lossy compression algorithms, which may introduce some loss of information. The outline of this section is as follows: for each technique, we briefly discuss the overall approach and then focus on the details of the algorithms that explicitly address the power constraints and the other WSN requirements mentioned in Section 3.

6.1. Lossless compression

In environmental monitoring tasks, particularly for new science discoveries, the accuracy of observations is often critical
for understanding the underlying physical processes. Scientists may not have a priori knowledge of the magnitude of observational error that is tolerable without affecting new research findings (Liang and Peng, 2010). Additionally, some critical application domains demand sensors with high accuracy and cannot tolerate measurements being corrupted by compression, such as body area networks (BANs), in which sensor nodes permanently monitor and log vital signs; every small variation of these signals has to be captured because it might provide crucial information for a diagnosis (Marcelloni and Vecchio, 2009). Thus, lossless data gathering in WSNs is essential and desirable. To the best of our knowledge, only a few works have discussed lossless compression among the local data compression approaches. The first type of such algorithms is lossless compression based on dictionary approaches (Marcelloni and Vecchio, 2009; Sadler, 2006); the second type is lossless compression based on predictive coding approaches (Liang and Peng, 2010; Huang and Liang, 2007). Typically, dictionary-based algorithms can be used to compress all kinds of data; however, care should be taken in their use. This approach is most useful when structural constraints restrict the frequently occurring patterns to a small subset of all possible patterns (Sayood, 2006). The well-known techniques consist of static dictionary techniques and adaptive dictionary techniques. While static dictionary techniques such as digram coding have a fixed code length, adaptive dictionary techniques have a dynamic code length. Well-known examples of adaptive-dictionary-based algorithms are LZ77, developed by Lempel and Ziv in 1977 (Ziv and Lempel, 1977); LZ78, developed by the same authors in 1978 (Ziv and Lempel, 1978); and LZW (Welch, 1984), developed by Welch as an extension of LZ78. The details of these algorithms can be found in Sayood (2006, Chapter 5). For WSNs, the authors in Barr and Asanović (2006) and Sadler (2006) analyzed the original versions of those algorithms and pointed out that they were not suitable for sensor nodes, because their processing and memory requirements exceed those available in commercial sensor nodes. Thus, approximated versions of these algorithms were proposed. In Sadler (2006) and Lzo (1987), the authors introduced S-LZW and miniLZO, purposely adapted versions of LZW and LZ77, respectively. The comparison results in Sadler (2006) reported that S-LZW outperformed miniLZO. The authors in Marcelloni and Vecchio (2009), who proposed an approximated version of the exponential-Golomb code (Teuhola, 1978) called LEC, showed that their algorithm outperformed S-LZW. Therefore, only LEC is described in detail here.

(1.1) A simple lossless entropy compression (LEC) scheme (Marcelloni and Vecchio, 2009): First, the measured signal from the sensing unit in a sensor node is converted by an analog-to-digital converter (ADC) into a binary representation r_i of R bits. Then, the LEC algorithm computes a difference d_i = r_i − r_{i−1}, which is the input to an encoder. A key concept of this scheme is to apply a modified version of the exponential-Golomb code in the encoder. The basic idea of the original exponential-Golomb code is to divide the alphabet of non-negative numbers into groups whose sizes increase exponentially.
Its codeword is a hybrid of unary and binary codes. In particular, the unary code (a variable-length code) specifies the group, while the binary code (a fixed-length code) represents the index within the group. The LEC scheme also divides numbers into groups whose sizes increase exponentially, but each group is entropy coded rather than unary coded as in the original version. Thus, the dictionary used in this scheme is called a prefix-free table.
Table 1
The dictionary used in LEC (Marcelloni and Vecchio, 2009).

ni   si              di
0    00              0
1    010             −1, +1
2    011             −3, −2, +2, +3
3    100             −7, …, −4, +4, …, +7
4    101             −15, …, −8, +8, …, +15
5    110             −31, …, −16, +16, …, +31
6    1110            −63, …, −32, +32, …, +63
7    11110           −127, …, −64, +64, …, +127
8    111110          −255, …, −128, +128, …, +255
9    1111110         −511, …, −256, +256, …, +511
10   11111110        −1023, …, −512, +512, …, +1023
11   111111110       −2047, …, −1024, +1024, …, +2047
12   1111111110      −4095, …, −2048, +2048, …, +4095
13   11111111110     −8191, …, −4096, +4096, …, +8191
14   111111111110    −16383, …, −8192, +8192, …, +16383
Additionally, the LEC scheme can be used with both negative and non-negative numbers. Based on the above concept, the representation of d_i is divided into two codes: s_i and a_i. First, the code s_i is a variable-length symbol generated from n_i using Huffman codes instead of the unary code, where n_i = ⌈log2(1 + |d_i|)⌉, which is not greater than R. Second, the code a_i, also of variable length, is a binary representation of d_i over n_i bits. When d_i > 0, a_i is given by the n_i low-order bits of the two's complement of d_i; when d_i < 0, a_i is given by the n_i low-order bits of the two's complement of (d_i − 1). When d_i = 0, s_i is coded as 00 and a_i is omitted. Finally, the bit sequence representing d_i is formed by the concatenation s_i | a_i. Table 1, an excerpt from Marcelloni and Vecchio (2009), is constructed based on the explanation in the previous paragraph. The codes s_i in the first 11 lines of the LEC table coincide with the table used in the baseline JPEG algorithm for compressing the DC coefficients (Pennebaker and Mitchell, 1992). This implies that the values measured by sensor nodes have statistical characteristics similar to the DC coefficients in the JPEG scheme. To demonstrate the LEC scheme, assume that d_1 = −7 is the input to the LEC encoder. Since d_1 is negative and belongs to group n_1 = ⌈log2(1 + |−7|)⌉ = 3, we obtain the code s_1 = 100 and the code a_1 = 000, which is given by the 3 low-order bits of the two's complement representation of −7 − 1 = −8. Thus, the bit sequence representing d_1 = −7 is s_1 | a_1 = 100|000.
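A compact sketch of the encoder just described (our illustration, with the s_i codes of Table 1 hard-coded) reproduces the worked example:

```python
# Sketch of the LEC encoder for one difference d_i, assuming the prefix
# codes s_i of Table 1 (groups n_i = 0..14, i.e. R <= 14 bits).
from math import ceil, log2

S = ["00", "010", "011", "100", "101", "110", "1110", "11110", "111110",
     "1111110", "11111110", "111111110", "1111111110", "11111111110",
     "111111111110"]

def lec_encode(d: int) -> str:
    if d == 0:
        return S[0]                           # s_i = "00", no a_i part
    n = ceil(log2(abs(d) + 1))                # group index n_i
    v = d if d > 0 else d - 1                 # two's-complement rule, d < 0
    a = format(v & ((1 << n) - 1), f"0{n}b")  # n_i low-order bits
    return S[n] + a

assert lec_encode(-7) == "100" + "000"        # the worked example above
```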
Fig. 17. Encoder/decoder block diagram of the predictive coding scheme (Elias, 1955).
The algorithm presented above is the first type of lossless compression. The second type is based on a predictive coding approach (Elias, 1955). This approach is motivated by the fact that environmental monitoring data collected in real-world situations are usually temporally correlated, which leads to information redundancy inside the observation data. This inherent temporal correlation gives rise to predictive data compression, in which future observations can be predicted from recent observations.

(1.2) Two-modal transmission (TMT) scheme: The works of Liang and Peng (2010) and Huang and Liang (2007) adopted for WSNs a modified version of the predictive coding scheme originally introduced by Elias (1955). Fig. 17 depicts a block diagram of traditional predictive coding for WSNs. The differences between currently observed values and predicted values, called error terms, are generally small. Occasionally, however, large error terms occur in real-world environmental observations and may not be predictable. This small number of large error terms, often called outliers, degrades the overall coding efficiency. Therefore, the authors of Huang and Liang (2007) proposed an approach called two-modal transmission for predictive coding. The first mode, called compressed mode, transmits the compressed bits of error terms falling inside the interval [−R, R]. The second mode, called non-compressed mode, transmits the original raw data of error terms falling outside the interval [−R, R]. Predictive coding modified with two-modal transmission thus avoids the loss of coding efficiency caused by poorly predicted large error terms. The remaining problem is to find an optimal interval [−R, R] that provides the maximum compression ratio. In Huang and Liang (2007), the authors found the optimal interval empirically using the following predictive coding configuration:

1. The data characteristics of the error terms were fitted with a Gaussian distribution.
2. A linear predictor was employed, due to the computational cost and the limited computing capability of the sensor nodes; its coefficient values were computed using a training set.
3. Arithmetic coding with a 10-based alphabet was chosen as the coding scheme, following the observation in Sayood (2006) that arithmetic coding achieves near-optimal performance when the alphabet is small and highly skewed.

They found that R = 10, i.e., the interval [−10, 10], was the optimal bound in this case. This scheme was extended by the authors in Liang and Peng (2010) using the following configuration:

1. The data characteristics of the error terms were fitted with a Laplacian distribution rather than a Gaussian distribution.
2. A linear predictor was again employed, with the sink node responsible for computing its coefficient values.
3. Arithmetic coding was again chosen as the coding scheme, but not fixed to a 10-based alphabet; instead, an optimal M-based alphabet was applied.

The key problem in this configuration is to search for the optimal R and M that yield the maximum compression ratio. The authors in Liang and Peng (2010) devised a simple heuristic method, computed adaptively at the sink node, to solve this constrained optimization.
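A minimal sketch of the two-modal decision at a sensor node (our illustration; the linear predictor weights and the bound R are placeholders for the sink-optimized parameters):

```python
# Two-modal transmission sketch: small error terms go to the arithmetic
# coder (compressed mode); outliers are transmitted as raw samples.
def tmt_step(history, sample, weights=(2.0, -1.0), R=10):
    # Hypothetical linear predictor over the last two readings.
    prediction = weights[0] * history[-1] + weights[1] * history[-2]
    error = round(sample - prediction)
    if -R <= error <= R:
        return ("compressed", error)   # to be entropy coded (M-based alphabet)
    return ("raw", sample)             # non-compressed mode
```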
To implement this scheme in WSNs, the sink node, which is not energy-limited, searches for the optimal predictor coefficients, the optimal bound R, and the optimal M for the M-based alphabet coding. These optimal parameters are then broadcast to the sensor nodes, which perform predictive coding based on the two-modal transmission algorithm.

6.2. Lossy compression

Although a lossy compression algorithm may cause some loss of information during the compression and decompression processes, it enables an encoder to achieve a compression rate higher than that of lossless compression. This is beneficial for applications that do not strictly require precise information. For WSNs, a few works, such as Schoellhammer et al. (2004) and Marcelloni and Vecchio (2010), have discussed lossy compression among the local data compression approaches.

(2.1) Lightweight temporal compression (LTC) scheme (Schoellhammer et al., 2004): This scheme was designed around two key observations. First, micro-climate data observed over a small enough window of time are linear. Second, commercial sensors exhibit noise, as mentioned in Section 3. Based on these observations, LTC was designed for climate monitoring applications by fitting micro-climate data over a short range of time (a window) with a sub-linear model. LTC was also designed to compress data when sensor accuracy is expressed as an error margin (e) and when the probability distribution of the error is either uniform or unknown. The algorithm is described here using the graphical example in Fig. 18, modified from the original demonstration in Schoellhammer et al. (2004). In the initial state of the algorithm, the sample at time t0 is represented by v0, and the sample at time t1 is represented by the range of values within the error margin [v1 + e, v1 − e]. This range is denoted by a vertical bar in the figure, where v1 + e is called the upper limit (UL) and v1 − e the lower limit (LL). The lines connecting v0 to the UL point of v1 and v0 to the LL point of v1 are called the highLine and lowLine, respectively; in Fig. 18 they are denoted l1 and l2. At t2, the point v2 is likewise represented by its UL and LL points. In this state, the line l3 connecting v0 to the UL point of v2 lies below the current highLine l1; hence, l3 becomes the new highLine. Line l2 is still the
lowLine, since it lies above the line connecting v0 to the LL point of v2. At t3, both the current highLine l3 and the current lowLine l2 pass above the lines connecting v0 to the UL and LL points of v3; in other words, the value range of v3 falls outside the cone between the current highLine and lowLine. At this point, the search in the first window stops and the first sub-linear model is obtained, denoted by the line m1 connecting v0 to the midpoint (MP) between the highLine and lowLine at t2. This MP is chosen as the new starting point of the search for the next sub-linear model in the second window; as the diagram shows, m2 is the next sub-linear model. In LTC, the window size varies over time: in Fig. 18, the second window is larger than the first. This technique is therefore analogous to run-length encoding in the sense that it attempts to represent a long sequence of similar data with a single symbol; where run-length encoding searches for strings of a repeated symbol, LTC searches for linear trends.
Fig. 18. LTC searches the best sub-linear models for data in time windows based on sensor margins.
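The window search just described can be sketched as follows (our paraphrase of the algorithm, with e denoting the sensor's error margin):

```python
# LTC sketch: narrow the cone of admissible lines (highLine/lowLine
# slopes) through the window's start point; when a sample's margin
# falls outside the cone, emit the midpoint line as the window's model.
def ltc(samples, e):
    out, i = [], 0
    while i < len(samples) - 1:
        t0, v0 = i, samples[i]
        hi, lo = float("inf"), float("-inf")
        j = i + 1
        while j < len(samples):
            up = (samples[j] + e - v0) / (j - t0)   # slope to upper limit
            dn = (samples[j] - e - v0) / (j - t0)   # slope to lower limit
            if max(dn, lo) > min(up, hi):           # outside the cone: stop
                break
            hi, lo = min(hi, up), max(lo, dn)
            j += 1
        slope = (hi + lo) / 2                       # midpoint (MP) line
        out.append((t0, v0, slope))                 # one sub-linear model
        i = j - 1
        samples[i] = v0 + slope * (i - t0)          # next window starts at MP
    return out
```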
(2.2) Differential pulse code modulation-based optimization (DPCM-optimization) scheme (Marcelloni and Vecchio, 2010): This scheme rests on three key points. First, consecutive observations collected by WSNs are typically strongly correlated. Second, several metrics matter in designing lossy data compression for WSNs, e.g., compression ratio, information loss and power consumption. Finally, similar to the LEC scheme, this scheme accounts for the noise present in commercial sensors, as mentioned in Section 3. Based on these three observations, the authors in Marcelloni and Vecchio (2010) applied an adapted version of differential pulse code modulation (DPCM) to WSNs. Note that DPCM is a scheme that typically exploits the strong correlation that usually exists between neighboring samples. To design a set of optimal combinations of the quantization parameters for DPCM that trades off three important factors (information entropy as the compression performance metric, signal-to-noise ratio for information loss, and quantization complexity for power consumption), they adopted one of the most popular multi-objective evolutionary algorithms (MOEAs), namely NSGA-II (Deb et al., 2002). In this off-line optimization, training sets collected by real-world WSNs were used. Additionally, de-noising techniques were applied to these data sets, so that the information lost by this lossy compression algorithm is only the noise. Thus, the scheme can achieve a higher compression ratio without losing relevant information.
7. Performance and comparison of data compression in WSNs

Comparisons of the existing data compression algorithms, and an analysis of whether each solution is suitable for the real world, are presented in this section. The following two tables compare the schemes presented in the previous sections based on the criteria described in Section 4: Tables 2 and 3 compare the distributed approaches and the local approaches, respectively. In certain works, the authors did not report exact performance numbers for each algorithm; in these cases, we use the O symbol, instead of a number, to denote that the algorithm satisfies a particular requirement. We explain the details of our comparison below.

7.1. Distributed source modeling (DSM)
OQPT/OQET (Xiao et al., 2006; Li and AlRegib, 2009): The compression rates of these two schemes depend on the bit-length L of the quantized data, which is determined by an optimization solution. The authors showed that communication power can be reduced and controlled by adaptive quantization and by trading off estimation performance against power consumption. Since the sensor nodes have to determine the optimal bit-length of the quantized data, as summarized in (14), each node incurs an O(L^2) computational cost. This might be a high load for sensor nodes if L were large; in WSNs, however, L takes values in a small finite set, so we can conclude that this computational cost does not affect the total power saving. Based on our observations, these schemes decode the received data using an estimation technique and do not require the data to be exact. Hence, the algorithms can be classified as lossy compression algorithms, which are suitable for tracking/detection applications. The authors in Xiao et al. (2006) and Li and AlRegib (2009) did consider the intrinsic noise of the sensors, but applied the algorithms to a single data type only.

DRHMM (Oka and Lampe, 2008): The authors in Oka and Lampe (2008) proposed an approximated HMM algorithm that requires only O(N) computational complexity. The algorithm demands a low computational load and a small number of inter-node communications. In their experiment, the authors compared the power consumption of their algorithm with distributed source coding using syndromes (DISCUS) (Pradhan et al., 2002); both algorithms were tested in detection applications. The results showed that DRHMM outperformed DISCUS in terms of power saving, from which we might infer that DRHMM is more suitable for detection applications than the DSC technique. An example application is the detection of plumes or oil slicks (Oka and Lampe, 2008), where the user needs only a few summary statistics, namely the identities and locations of the sensor nodes at which the plume is reliably detected.

VFBSN (Teng et al., 2010): This approach treats target tracking as an optimal filtering problem. Since the authors proposed approximated transition and observation models for optimizing this problem on clustered WSNs, the complexity is lower than that of other conventional approaches. They also evaluated target tracking accuracy, communication costs and computational complexity against state-of-the-art algorithms such as the Gaussian particle filter (Kotecha et al., 2003), the binary particle filter (Djurić and Bugallo, 2004) and SOI-KF (Ribeiro et al., 2006). While the accuracy is not significantly different, VFBSN has lower communication and computation costs.

DNKB (Perez-Cruz and Kulkarni, 2010): Although this non-parametric model-based algorithm did not explicitly address power constraints, it shows potential for alleviating the power problem in WSNs, particularly when applied in parameter estimation systems for tracking/detection applications. According to their experimental results, the sink node can estimate with a desired error rate using only local information about the network and communication with nearby sensor nodes, and the amount of communication does not increase as the number of sensor nodes grows.
Therefore, this scheme can save communication power. Moreover, the algorithm has a small code size and low complexity. In summary, we can conjecture that this scheme provides a total power saving in WSNs. Although distributed source modeling-based algorithms can be applied to reduce the data transmitted in the network, these algorithms are only suitable for tracking/detection applications. Additionally, most of them are lossy. Therefore, this type of algorithm is not applicable to application domains that demand sensed data with high accuracy and cannot tolerate measured data being corrupted by the compression process, such as body area networks and new science discoveries.

7.2. Distributed transform coding (DTC)

While modeling schemes represent data by functions/models, distributed transform coding schemes transform data into components/coefficients and represent them by suitable codes. Only a set of these codes is transmitted to the sink node instead of all the raw data, which enables sensor nodes to greatly reduce the transmitted data.

Distributed Karhunen-Loeve transform (KLT) (Amar et al., 2010): Although this KLT-based compression scheme aimed to reduce the data transmitted over WSNs, its power consumption performance was not evaluated. However, the algorithm runs mostly at the fusion center or sink node, which is not power-constrained, so the approach has potential for saving power. An important advantage is that the fusion center requires little processing time to perform the algorithm, which makes the scheme relevant to applications requiring real-time operation.

Distributed wavelet transform-based lifting (DWT-lifting) (Ciancio and Ortega, 2005, 2004): An important assumption in this scheme is that the distances between a sensor node and its neighbors are much shorter than the distance between the sensor node and the sink node. This assumption is required because sensor nodes need to communicate with each other in order to compute the coefficients, while the communication between the sensor nodes and the sink node is lower. In Ciancio and Ortega (2004), the authors considered three components of power consumption: computational cost (algorithm performance), communication cost among sensor nodes, and communication cost between sensor nodes and the sink node. They pointed out that computing a wavelet coefficient with the lifting scheme takes only two multiplications and four additions; this computational cost is very low compared to the other components of power consumption. The compression ratio, or average number of bits per sensor, of this scheme outperforms non-compressed processing, so the scheme yields a total power saving. However, both the achievable compression ratio and the degree of power saving depend on the signal-to-noise ratio (SNR) present in the network.

Distributed wavelet transform-based Haar (DWT-Haar) (Shen and Ortega, 2002): This scheme is similar to DWT-lifting (Ciancio and Ortega, 2005, 2004) in that its compression ratio and power saving depend on the SNR present in the network. The difference is that DWT-Haar provides better performance.
7.3. Distributed source coding (DSC)

A main similarity between distributed source modeling and distributed transform coding is that both require considerable inter-node communication in order to infer (in the HMM), learn (in the regression method) or compute (in distributed transform coding) the observed data in the network. This burden can be reduced by means of distributed source coding, since it does not require any
communication among the sensor nodes, as presented in Section 5.3.

EEADSC (Tang et al., 2007): The authors measured power using a research version of a Compaq iPAQ PDA. In an experiment with 234 frames of field data, the computational cost of the compression version increased to 508.79 mJ from 0 mJ in the non-compression system, but the communication cost decreased to 202.85 mJ from 1810 mJ. The total power saving of this DSC algorithm at a sensor node reached 66.77% relative to a non-compression sensor node.

Joint multi-rate transmission and multi-rate DSC (DSC-multirate) (Wang et al., 2009a): In this scheme, the authors aimed to enhance the information efficiency of the DSC-multirate algorithm, defined as the ratio of correctly decoded DSC information bits to overall energy consumption (bits/J), where a higher value is better. In their analysis, the authors included the computational costs and related overhead costs. In terms of information efficiency, the DSC-multirate algorithm outperformed the traditional DSC scheme, which does not consider different transmission rates for different source coding nodes. An interesting finding in this work was that packet errors in the communication channel caused more re-transmissions and hence increased power consumption. For this reason, the authors developed a modified scheme called jointly optimized DSC-multirate with a re-transmission limit, which outperformed DSC-multirate without a re-transmission limit in terms of both information efficiency and power consumption.

Oldewurtel10 (Oldewurtel et al., 2010): In this work, the authors simulated and evaluated the power consumption of the proposed approach using a power consumption model built on measurements from real experiments. By adopting the closest-to-center-of-gravity scheme on the Thomas PP topology, the combination achieved energy savings of up to 50.7% and extended the lifetime of the WSN by up to 34.9%.

7.4. Compressed sensing (CS)

A disadvantage of DSC is that prior knowledge of the precise correlations in the data is required, so DSC performance depends on specific assumptions. In contrast, compressed sensing theory provides data compression techniques for WSNs that do not rely on any specific prior knowledge of or assumptions about the signals. In Table 2, this kind of scheme satisfies our requirements, as explained in Section 5.4. However, most of the algorithms presented above used generated data with sufficient sparsity in their experiments. Only two works (Chou et al., 2009; Quer et al., 2009) tested their algorithms on real-world data. Although the results with real-world data were worse than with synthesized data, the reconstruction error e for a given energy expenditure was acceptable (e was lower than 0.3 in the case of Quer et al. (2009)). Based on our observations, the work in Quer et al. (2009) used multiple data types, while the work in Chou et al. (2009) used only a temperature data set; the results in Chou et al. (2009) appear to be better than those in Quer et al. (2009).

7.5. Local data compression

While the distributed data compression approaches have some limitations, as shown in Table 2, the local data compression approaches do not suffer from those issues. This kind of technique is universal and provides a robust data compression scheme.
Table 2
The comparison of distributed data compression approaches (O denotes that a requirement is satisfied for which no exact number was reported; n/a denotes not available).

DSM (limited to some specific applications and requires much inter-sensor communication):
- OQET/OQPT (Xiao et al., 2006; Li and AlRegib, 2009): compression rate: optimal quantized bit-length; communication saving: O; computational cost: O(L^2); net power saving: O; lossy; single data type.
- DRHMM/VFBSN (Oka and Lampe, 2008; Teng et al., 2010): communication saving: O; computational cost: O(N)/low cost; net power saving: O; single data type.
- DNKB (Predd et al., 2009; Perez-Cruz and Kulkarni, 2010): communication saving: O; computational cost: low; algorithm code size: small; single data type.

DTC (limited to some specific topologies and requires much inter-sensor communication):
- KLT (Amar et al., 2010): computational cost: low (at the fusion center); net power saving: n/a; single data type.
- DWT-lifting (Ciancio and Ortega, 2005): compression rate: depends on SNR; computational cost: low; net power saving: depends on SNR; single data type.
- DWT-Haar (Shen and Ortega, 2002): compression rate: depends on SNR; net power saving: depends on SNR; single data type.

DSC (lossless):
- EEADSC (Tang et al., 2007): communication saving: about 88%; net power saving: about 66%; single data type; limited to a star topology.
- DSC-multirate (Wang et al., 2009a): net power saving: depends on the packet error rate; single data type; needs to know the exact correlation among sensor nodes.
- Oldewurtel10 (Oldewurtel et al., 2010): net power saving: up to 50.7%; single data type; needs to know the exact correlation among sensor nodes.

CS (Chou et al., 2009; Quer et al., 2009; Luo et al., 2009; Lee et al., 2009): compression rate: depends on data sparsity; computational cost: low at the sensor nodes; Quer et al. (2009) is tested on real-world multiple data types; needs a suitable transformation to obtain good sparsity for real-world data.
Moreover, it can be combined with a distributed approach in order to exploit both spatial and temporal correlations in data compression for WSNs. In Marcelloni and Vecchio (2009), the authors tested the LEC scheme on real-world data of multiple types and compared its performance with the LTC scheme (Schoellhammer et al., 2004); the comparison is reported in Table 3. Although the compression ratios are not significantly different, the data correctness of the LEC scheme is better. In the TMT scheme (Liang and Peng, 2010; Huang and Liang, 2007), the optimization is computed at the sink node. The major computation at the sensor node is dominated by arithmetic encoding, which consists of four additions, two integer multiplications, two shifts and two comparisons in each encoding round. The authors included this computational cost, which amounted to only 4%, when determining the total power saving. Since the communication power saved by the TMT compression scheme was much larger than the computational cost, the total power saving reached 36% for 1-hop transmission. In the final scheme, DPCM-optimization (Marcelloni and Vecchio, 2010), the quantization parameters were optimized off-line using a training set, and only the optimized parameters were deployed to the sensor nodes. Thus, a low computational effort of approximately six instructions per saved bit is required to execute the algorithm. Although its performance is the best in terms of both compression ratio and total power saving, a limitation remains: the compression algorithm cannot adapt to changes in the data model, a consequence of the off-line optimization. To mitigate this problem, the training set should be collected by sensors of the same type and sampling frequency as the ones used in the real WSN application.
8. Conclusion

In this article, we surveyed existing data compression approaches that can be practically applied in WSN applications. We reviewed and analyzed these approaches based on requirements observed from real-world applications. The contributions of this survey can be summarized as follows.

First, each approach has advantages and disadvantages in different ways, so no single data compression approach is the most suitable for all WSNs. Consider the two main families: the distributed data compression approaches and the local data compression approaches. The distributed approaches usually require specific assumptions or models of the WSN, such as the observation model in the DSM-based HMM scheme (Oka and Lampe, 2008; Teng et al., 2010) or the correlation model in the DSC schemes (Tang et al., 2006; Shen and Ortega, 2002; Wang et al., 2009a). In practice, these assumed models cannot be kept correct over time, and uncontrollable situations that deviate from the assumptions can degrade the overall performance of the scheme. On the other hand, the local approaches (Marcelloni and Vecchio, 2010, 2009), which require fewer assumptions and models, are more robust to uncontrollable situations in WSNs. However, based on our observations, it cannot be assured that the local approaches perform better than the distributed ones, because the local approaches do not exploit any benefits of network routing and topology. This kind of data compression approach might therefore not give the best performance in dense and complex WSNs.
Table 3
The comparison of local data compression approaches: LEC (Marcelloni and Vecchio, 2009), TMT (Liang and Peng, 2010; Huang and Liang, 2007), LTC (Schoellhammer et al., 2004) and DPCM-optimization (Marcelloni and Vecchio, 2010), compared on compression rate, communication saving, computational cost/complexity, net power saving, algorithm code size, lossless/lossy operation, single/multiple data types, and limitations/issues. LEC and LTC achieve compression rates of approximately 45-75%, while DPCM-optimization reaches approximately 85-94% at a cost of about 6 instructions per saved bit; TMT achieves approximately 40% compression, with a per-round encoding cost of 4 additions, 2 integer multiplications, 2 shifts and 2 comparisons, and a net power saving of approximately 36%. LEC and TMT are lossless, whereas LTC and DPCM-optimization are lossy; only LEC is tested on real-world multiple data types. The common limitation of these schemes is that they do not exploit spatial correlation in WSNs.
In contrast, the distributed approaches might give the worst performance in sparse WSNs, because such networks lack the spatial correlation that routing- and topology-based compression relies on. Based on our survey, there are still opportunities for new research in developing more robust data compression schemes for dense WSNs by investigating and developing routing and topology algorithms for the local approaches.

Second, in Sections 5 and 6, we reviewed each approach mentioned above, focusing on the literature from 2005 to 2010. However, not all of the schemes outlined in those sections are practical for implementation in real-world WSNs. The aim of our survey is to identify a set of practical data compression algorithms based on our observations of real-world requirements. Thus, the approaches with high practical potential are selected and summarized in Tables 2 and 3; the reasons supporting their practicality are given in Section 7.

Third, an interesting issue raised by Sornsiriaphilux et al. (2010) is that each sensor node may be equipped with multiple sensors observing different environmental phenomena, and therefore generate multiple data types. Our survey showed that few works in the literature aim to compress multiple data types. The distributed schemes of Chou and Petrovic (2003) and Quer et al. (2009) were tested using multiple data types, but their performance on multiple data types was not good enough, as shown in Section 5.3. The local compression schemes of Marcelloni and Vecchio (2009) and Sornsiriaphilux et al. (2010) were also tested using multiple data types; the authors in Sornsiriaphilux et al. (2010) modified the scheme of Marcelloni and Vecchio (2009) to specifically address this multiple data type issue. Both algorithms showed better performance than the distributed schemes of Chou and Petrovic (2003) and Quer et al. (2009). However, these local algorithms exploit only temporal correlation in WSNs and are not suitable for dense WSNs. This issue is also open for further research study.

The effectiveness of data compression algorithms for a particular application is still an open issue requiring further investigation in WSNs. This article summarized and compared the details, performance, suitable applications, and limitations of recently developed schemes in this research topic. The surveyed results will be a useful guideline for designing new practical data compression algorithms for wireless sensor networks.
Acknowledgments

This research is financially supported by Thailand Advanced Institute of Science and Technology-Tokyo Institute of Technology
(TAIST-Tokyo Tech), National Science and Technology Development Agency (NSTDA), Tokyo Institute of Technology (Tokyo Tech), National Research Council of Thailand (NRCT) and Kasetsart University (KU).
References

Abdelzaher T, He T, Stankovic J. Feedback control of data aggregation in sensor networks. In: Conference on decision and control, 2004.
Ackley DH, Hinton GE, Sejnowski TJ. A learning algorithm for Boltzmann machines. Cogn Sci 1985;9.
Amar A, Leshem A, Gastpar M. Recursive implementation of the distributed Karhunen–Loève transform. IEEE Trans Signal Process 2010;58:5320–30.
Amari S, Nagaoka H. Methods of information geometry. Translations of mathematical monographs, vol. 191. Oxford University Press; 2000.
Anastasi G, Conti M, Di Francesco M, Passarella A. Energy conservation in wireless sensor networks: a survey. Ad Hoc Networks 2009;7(3):537–68.
Aronszajn N. Theory of reproducing kernels. Trans Am Math Soc 1950;68:337–404.
Aysal TC, Barner KE. Constrained decentralized estimation over noisy channels for sensor networks. IEEE Trans Signal Process 2008;56(4):1398–410.
Baronti P, Pillai P, Chook VWC, Chessa S, Gotta A, Hu YF. Wireless sensor networks: a survey on the state of the art and the 802.15.4 and ZigBee standards. Comput Commun 2007;30(7):1655–95.
Barr KC, Asanović K. Energy-aware lossless data compression. ACM Trans Comput Syst 2006;24:250–91.
Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 1970;41(1):164–71.
Blumensath T, Davies M. Gradient pursuits. IEEE Trans Signal Process 2008;56(6):2370–82.
Boyd S, Vandenberghe L. Convex optimization. Cambridge University Press; 2004.
Cayirci E. Data aggregation and dilution by modulus addressing in wireless sensor networks. IEEE Commun Lett 2003:355–7.
Chew LW, Ang LM, Seng KP. Survey of image compression algorithms in wireless sensor networks. In: International symposium on information technology (ITSim), vol. 4, 2008. p. 1–9.
Chou J, Petrovic D. A distributed and adaptive signal processing approach to reducing energy consumption in sensor networks. In: Proceedings of IEEE INFOCOM, 2003. p. 1054–62.
Chou CT, Rana R, Hu W. Energy efficient information collection in wireless sensor networks using adaptive compressive sensing. In: IEEE 34th conference on local computer networks (LCN), 2009. p. 443–50.
Ciancio A, Pattem S, Ortega A, Krishnamachari B. Energy-efficient data representation and routing for wireless sensor networks based on a distributed wavelet compression algorithm. In: International conference on information processing in sensor networks (IPSN). ACM Press; 2006. p. 309–16.
Ciancio A. Distributed wavelet compression algorithms for wireless sensor networks. PhD thesis, University of Southern California, USA; 2006.
Ciancio A, Ortega A. A distributed wavelet compression algorithm for wireless sensor networks using lifting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol. 4, 2004. p. iv633–6.
Ciancio A, Ortega A. A distributed wavelet compression algorithm for wireless multihop sensor networks using lifting. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol. 4, 2005. p. iv825–8.
Company xbow, online, <http://www.xbow.com/> [last accessed on 15.08.10].
Croce S, Marcelloni F, Vecchio M. Reducing power consumption in wireless sensor networks using a novel approach to data aggregation. Comput J 2008;51:227–39.
Cui S, Goldsmith AJ, Bahai A. Energy-constrained modulation optimization. IEEE Trans Wireless Commun 2005;4:2349–60.
Davis project, online, <http://cheas.psu.edu/data/ux/wcreek/wcreek2000met.txt> [last accessed on 15.08.10].
Deb KD, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 2002;6(2):182–97.
Djurić PM, Vemula M, Bugallo M. Signal processing by particle filtering for binary sensor networks. In: Proceedings of IEEE 11th digital signal processing workshop and IEEE signal processing education workshop, 2004.
Dogandzic A, Zhang B. Distributed signal processing for sensor networks using hidden Markov random fields. In: Proceedings of 39th annual conference on information science and systems, 2005.
Dogandzic A, Zhang B. Distributed estimation and detection for sensor networks using hidden Markov random field models. IEEE Trans Signal Process 2006;54(8):3200–15.
Dong H, Lu J, Sun Y. Adaptive distributed compression algorithm for wireless sensor networks, vol. 3. Los Alamitos, CA, USA: IEEE Computer Society; 2006. p. 283–6.
Donoho D. Compressed sensing. IEEE Trans Inf Theory 2006;52(4):1289–306.
Elias P. Predictive coding—I. IRE Trans Inf Theory 1955;1(1):16–24.
Fasolo E, Rossi M, Widmer J, Zorzi M. In-network aggregation techniques for wireless sensor networks: a survey. IEEE Wireless Commun 2007;14(2):70–87.
Gao T, Massey T, Selavo L, Crawford D, Chen B, Lorincz K, et al. The advanced health and disaster aid network: a light-weight wireless medical system for triage. IEEE Trans Biomed Circuits Syst 2007;1(3):203–16.
Gastpar M, Dragotti P, Vetterli M. The distributed Karhunen–Loève transform. IEEE Trans Inf Theory 2006;52(12):5177–96.
Goela N, Gastpar M. Distributed Karhunen–Loève transform with nested subspaces. In: Proceedings of the 2009 IEEE international conference on acoustics, speech and signal processing, ICASSP '09, 2009a. p. 2405–8.
Goela N, Gastpar M. Linear compressive networks. In: Proceedings of the 2009 IEEE international conference on symposium on information theory, ISIT '09, vol. 1, 2009b. p. 159–63.
Golub G, Loan CV. Matrix computations. Johns Hopkins; 1989.
Guestrin C, Bodik P, Thibaux R, Paskin M, Madden S. Distributed regression: an efficient framework for modeling sensor network data. In: International conference on information processing in sensor networks (IPSN), 2004.
Gubner JA. Distributed estimation and quantization. IEEE Trans Inf Theory 1993;39(4):1456–9.
Hao Q, Brady DJ, Guenther BD, Burchett JB, Shankar M, Feller S. Human tracking with wireless distributed pyroelectric sensors. IEEE Sensors J 2006;6(6):1683–96.
Haupt J, Bajwa WU, Rabbat M, Nowak R. Compressed sensing for networked data. IEEE Signal Process Mag 2008;25(2):92–101.
Huang F, Liang Y. Towards energy optimization in environmental wireless sensor networks for lossless and reliable data gathering. In: IEEE international conference on mobile adhoc and sensor systems (MASS), 2007. p. 1–6.
Ihler AT, Fisher JW III, Moses RL, Willsky AS. Nonparametric belief propagation for self-localization of sensor networks. IEEE J Sel Areas Commun 2005;23:809–19.
Intanagonwiwat C, Govindan R, Estrin D. Directed diffusion: a scalable and robust communication paradigm for sensor networks. In: MOBICOM, ACM; 2000. p. 56–67.
Ji S, Xue Y, Carin L. Bayesian compressive sensing. IEEE Trans Signal Process 2008;56(6):2346–56.
Kang LW, Lu CS. Multi-view distributed video coding with low-complexity inter-sensor communication over wireless video sensor networks. In: IEEE international conference on image processing (ICIP), vol. 3, 2007. p. III-13–16.
Kay SM. Fundamentals of statistical signal processing: estimation theory. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.; 1993.
Kho J, Rogers A, Jennings N. Decentralised control of adaptive sampling in wireless sensor networks. ACM Trans Sensor Networks 2009;5(3):19–53.
Kimura N, Latifi S. A survey on data compression in wireless sensor networks. In: International conference on information technology: coding and computing (ITCC), 2005.
Kotecha JH, Djurić PM. Gaussian particle filtering. IEEE Trans Signal Process 2003;51:2592–601.
Lan C, Liveris AD, Narayanan K, Xiong Z, Georghiades C. Slepian–Wolf coding of multiple M-ary sources using LDPC codes. In: IEEE data compression conference, 2004. p. 549.
Lee S, Pattem S, Sathiamoorthy M, Krishnamachari B, Ortega A. Spatially-localized compressed sensing and routing in multi-hop sensor networks. In: Proceedings of the third international conference on geosensor networks (GSN). Berlin, Heidelberg: Springer-Verlag; 2009. p. 11–20.
Li J, AlRegib G. Distributed estimation in energy-constrained wireless sensor networks. IEEE Trans Signal Process 2009;57(10):3746–58.
Li J, AlRegib G. Rate-constrained distributed estimation in wireless sensor networks. IEEE Trans Signal Process 2007;55(5-1):1634–43.
Liang Y, Peng W. Minimizing energy consumptions in wireless sensor networks via two-modal transmission. SIGCOMM Comput Commun Rev 2010;40(1):12–8.
Liveris A, Lan C, Narayanan KR, Xiong Z, Georghiades C. Slepian–Wolf coding of three binary sources using LDPC codes. In: International symposium on turbo codes, 2003. p. 63–6.
Luo C, Wu F, Sun J, Chen CW. Compressive data gathering for large-scale wireless sensor networks. In: MobiCom '09: proceedings of the 15th annual international conference on mobile computing and networking. New York, NY, USA: ACM; 2009. p. 145–56.
Lzo, online, <http://www.oberhumer.com/opensource/lzo/> [last accessed on 15.08.10].
Madden S. The design and evaluation of a query processing architecture for sensor networks. PhD thesis, UC Berkeley; 2003.
Marcelloni F, Vecchio M. An efficient lossless compression algorithm for tiny nodes of monitoring wireless sensor networks. Comput J 2009;52(8):969–87.
Marcelloni F, Vecchio M. Enabling energy-efficient and lossy-aware data compression in wireless sensor networks by multi-objective evolutionary optimization. Inf Sci 2010;180(10):1924–41.
Nguyen X, Wainwright MJ, Jordan MI. Nonparametric decentralized detection using kernel methods. IEEE Trans Signal Process 2005;53(11):4053–66.
Nurdin HI, Mazumdar RR, Bagchi A. Reduced-dimension linear transform coding of distributed correlated signals with incomplete observations. IEEE Trans Inf Theory 2009;55:2848–58.
Oka A, Lampe L. Energy efficient distributed filtering with wireless sensor networks. IEEE Trans Signal Process 2008;56(5):2062–75.
Oldewurtel F, Riihijärvi J, Mähönen P. Efficiency of distributed compression and its dependence on sensor node deployments. In: VTC spring, 2010. p. 1–5.
Pennebaker WB, Mitchell JL. JPEG still image data compression standard. 1st ed. Norwell, MA, USA: Kluwer Academic Publishers; 1992.
Perez-Cruz F, Kulkarni S. Robust and low complexity distributed kernel least squares learning in sensor networks. IEEE Signal Process Lett 2010;17(4):355–8.
Pradhan S, Kusuma J, Ramchandran K. Distributed compression in a dense microsensor network. IEEE Signal Process Mag 2002;19(2):51–60.
Predd J, Kulkarni S, Poor H. Distributed learning in wireless sensor networks. IEEE Signal Process Mag 2006;23(4):56–69.
Predd J, Kulkarni S, Poor H. A collaborative training algorithm for distributed learning. IEEE Trans Inf Theory 2009;55(4):1856–71.
Puri R, Majumdar A, Ishwar P, Ramchandran K. Distributed video coding in wireless sensor networks. IEEE Signal Process Mag 2006;23(4):94–106.
Quer G, Masierto R, Munaretto D, Rossi M, Widmer J, Zorzi M. On the interplay between routing and signal representation for compressive sensing in wireless sensor networks. In: Information theory and applications workshop, 2009. p. 206–15.
Ramaswamy S, Viswanatha K, Saxena A, Rose K. Towards large scale distributed coding. In: IEEE international conference on acoustics speech and signal processing (ICASSP), 2010. p. 1326–9.
Rezayi E, Abolhassani B. Multirate distributed source coding in wireless sensor networks using LDPC codes. In: Canadian conference on electrical and computer engineering (CCECE), 2009. p. 171–4.
Ribeiro A, Giannakis G, Roumeliotis S. SOI-KF: distributed Kalman filtering with low-cost communications using the sign of innovations. In: Proceedings of IEEE international acoustics, speech and signal processing, ICASSP 06, vol. 4, 2006. p. IV.
Riquelme JL, Soto F, Suardiaz J, Sanchez P, Iborra A, Vera J. Wireless sensor networks for precision horticulture in southern Spain. Comput Electron Agric 2009;68(1):25–35.
Robert CP. Monte Carlo statistical methods. New York: Springer; 2004.
Sadler CM, Martonosi M. Data compression algorithms for energy-constrained devices in delay tolerant networks. In: Proceedings of the ACM conference on embedded networked sensor systems (SenSys), 2006. p. 265–78.
Saitoh S. Theory of reproducing kernels and its applications. Harlow, UK: Longman; 1988.
Sayood K. Introduction to data compression. 3rd ed. US: Morgan Kaufmann; 2006.
Schmidt A. Ubiquitous computing: are we there yet? Computer 2010;43(2):95–7.
Schoellhammer T, Osterweil E, Estrin D, Greenstein B, Wimbrow M. Lightweight temporal compression of microclimate datasets. In: Proceedings of the 29th annual IEEE international conference on local computer networks, 2004. p. 516–24.
Sensor scope project, online, <http://sensorscope.epfl.ch> [last accessed on 10.10.10].
Sentilla company, online, <http://www.sentilla.com/moteiv-transition.html> [last accessed on 15.08.10].
Sharaf A, Beaver J, Labrinidis A, Chrysanthis K. Balancing energy efficiency and quality of aggregate data in sensor networks. VLDB J 2004;13:384–403.
Shen G, Ortega A. Transform-based distributed data gathering. IEEE Trans Signal Process 2002;58(7):3802–15.
Shen G, Ortega A. Optimized distributed 2D transforms for irregularly sampled sensor network grids using wavelet lifting. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), 2008a. p. 2513–6.
Shen G, Ortega A. Joint routing and 2D transform optimization for irregular sensor network grids using wavelet lifting. In: International conference on information processing in sensor networks (IPSN). Washington, DC, USA: IEEE Computer Society; 2008b. p. 183–94.
Shkarin D. Ppmd. <ftp://ftp.elf.stuba.sk/pub/pc/pack/ppmdi1.rar>; 2002.
Shtxx sensor, online, <http://www.sensirion.com/> [last accessed on 15.08.10].
Singh J, Madhow U, Kumar R, Suri S, Cagley R. Tracking multiple targets using binary proximity sensors. In: Proceedings of the sixth international conference on information processing in sensor networks, IPSN '07, 2007. p. 529–38.
Slepian D, Wolf J. Noiseless coding of correlated information sources. IEEE Trans Inf Theory 1973;19(4):471–80.
Song G, Zhou Y, Zhang W, Song A. A multi-interface gateway architecture for home automation networks. IEEE Trans Consum Electron 2008;54(3):1110–3.
Sornsiriaphilux P, Thanapatay D, Kaemarungsi K, Araki K. Performance comparison of data compression algorithms based on characteristics of sensory data in wireless sensor networks. In: International conference on information and communication technology for embedded systems (ICICTES), Thailand, 2010.
Srivastava N. Challenges of next-generation wireless sensor networks and its impact on society. J Telecommun 2010;1:128–33.
Stoyan D, Kendall WS, Mecke J. Stochastic geometry and its applications. Wiley; 1995.
Tang Z, Glover IA, Evans AN, He J. An energy-efficient adaptive DSC scheme for wireless sensor networks. Signal Process 2007;87(12):2896–910.
Tang Z, Glover I, Monro D, He J. An adaptive distributed source coding scheme for wireless sensor networks. In: Proceedings of 12th European wireless conference, 2006.
Teng J, Snoussi H, Richard C. Decentralized variational filtering for target tracking in binary sensor networks. IEEE Trans Mobile Comput 2010;9(10):1465–77.
Teuhola J. A compression method for clustered bit-vectors. Inf Process Lett 1978;7(6):308–11.
Tropp J, Gilbert A. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans Inf Theory 2007;53(12):4655–66.
Wagner R, Choi H, Baraniuk R. Distributed wavelet transform for irregular sensor network grids. In: IEEE statistical signal processing (SSP) workshop, 2005.
Wagner RS, Baraniuk RG. An architecture for distributed wavelet analysis and processing in sensor networks. In: International conference on information processing in sensor networks (IPSN), 2006.
Wang W, Peng D, Wang H, Sharif H, Chen HH. Energy efficient multirate interaction in distributed source coding and wireless sensor network. In: IEEE wireless communications and networking conference (WCNC), 2007a. p. 4091–5.
Wang W, Peng D, Wang H, Sharif H, Chen HH. Taming underlying design for energy efficient distributed source coding in multirate wireless sensor network. In: IEEE vehicular technology conference (VTC), 2007b. p. 124–9.
Wang W, Peng D, Wang H, Sharif H, Chen HH. Cross-layer multirate interaction with distributed source coding in wireless sensor networks. IEEE Trans Wireless Commun 2009a;8(2):787–95.
Wang YC, Hsieh YY, Tseng YC. Multiresolution spatial and temporal coding in a wireless sensor network for long-term monitoring applications. IEEE Trans Comput 2009b;58:827–38.
Welch TA. A technique for high-performance data compression. Computer 1984;17(6):8–19.
Wiesel A, Hero AO. Principal component analysis in decomposable Gaussian graphical models. In: IEEE international conference on acoustics, speech and signal processing (ICASSP '09), 2009. p. 1537–40.
Wyner AD, Ziv J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans Inf Theory 1976;22:1–10.
Xiao JJ, Cui S, Luo Z, Goldsmith AJ. Power scheduling of universal decentralized estimation in sensor networks. IEEE Trans Signal Process 2006;54:413–22.
Xiong Z, Liveris A, Cheng S. Distributed source coding for sensor networks. IEEE Signal Process Mag 2004;21(5):80–94.
Yeo C, Ramchandran K. Robust distributed multiview video compression for wireless camera networks. IEEE Trans Image Process 2010;19(4):995–1008.
Zhang J, Fossorier MPC. Mean field and mixed mean field iterative decoding of low-density parity-check codes. IEEE Trans Inf Theory 2006;52(7):3168–85.
Zheng J, Wang P, Li C. Distributed data aggregation using Slepian–Wolf coding in cluster-based wireless sensor networks. IEEE Trans Veh Technol 2010;59(5):2564–74.
Ziv J, Lempel A. A universal algorithm for sequential data compression. IEEE Trans Inf Theory 1977;23:337–43.
Ziv J, Lempel A. Compression of individual sequences via variable-rate coding. IEEE Trans Inf Theory 1978;24(5):530–6.