Knowledge-Based Systems 60 (2014) 44–57
Adaptive and online data anomaly detection for wireless sensor systems

Murad A. Rassam a,b,*, Mohd Aizaini Maarof a, Anazida Zainal a

a Information Assurance and Security Research Group (IASRG), Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
b Faculty of Engineering and Information Technology, Taiz University, 6803 Taiz, Yemen

* Corresponding author at: Information Assurance and Security Research Group (IASRG), Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia.
Article history: Received 3 June 2013; Received in revised form 18 November 2013; Accepted 3 January 2014; Available online 18 January 2014.

Keywords: Wireless sensor networks; One-Class Principal Component Classifier; Data anomaly detection; Adaptive anomaly detection; Sensor data analysis; Sensor data quality.
Abstract

Wireless sensor networks (WSNs) are increasingly used as platforms for collecting data from unattended environments and for monitoring important events in phenomena. However, sensor data are affected by anomalies that occur due to various reasons, such as node software or hardware failures, reading errors, unusual events, and malicious attacks. Therefore, effective, efficient, and real-time detection of anomalous measurements is required to guarantee the quality of the data collected by these networks. In this paper, two efficient and effective anomaly detection models, PCCAD and APCCAD, are proposed for static and dynamic environments, respectively. Both models utilize the One-Class Principal Component Classifier (OCPCC) to measure the dissimilarity between sensor measurements in the feature space. The proposed APCCAD model incorporates an incremental learning method that is able to track the dynamic normal changes of data streams in the monitored environment. The efficiency and effectiveness of the proposed models are demonstrated using real life datasets collected by real sensor network projects. Experimental results show that the proposed models have advantages over existing models in terms of efficient utilization of limited sensor resources. The results further reveal that the proposed models achieve better detection effectiveness in terms of high detection accuracy with low false alarms, especially for dynamic environmental data streams, compared to some existing models.

© 2014 Elsevier B.V. All rights reserved.
1. Introduction

Wireless sensor networks (WSNs) are formed by a number of small, cheap, battery-powered, and multi-functional devices called sensors, which are densely or sparsely deployed to collect information from environments or to monitor phenomena [1]. However, the constrained sensor resources in terms of storage, processing, bandwidth, and energy make WSNs vulnerable to different types of misbehavior or anomalies. An anomaly is defined in [2] as ''an observation that appears to be inconsistent with the remainder of a dataset''. These anomalies often correspond to sensor software or hardware faults, reading errors, and malicious attacks; they may also correspond to events of interest, such as sudden changes in the monitored parameters that indicate an unusual phenomenon. Therefore, it is desirable to efficiently and accurately identify anomalies in the sensor measurements to ensure the quality of these measurements for making the right decisions.

The default approach to detecting anomalies is to separate them from normal data using various types of classifiers. However, the
absence of ground truth labeled data hinders the direct classification of sensor measurements using traditional classifiers. Instead, the available normal data are modeled using one-class classifiers, as in [9–11], and any deviation from this model is then identified as an anomaly.

A variety of anomaly detection models for WSNs have been proposed in the literature; based on where the detection is performed, they can be classified into centralized and decentralized models. In the centralized models, the data is assumed to be available at a central location (such as the base station or a cluster head) for further analysis [3]: the normal model is built using the whole data sent by all nodes in a specific time period and used to detect any significant deviations. These models are not suitable for energy-constrained WSNs, because assuming the availability of the whole data at a central location causes prohibitive communication overhead; on the other hand, they may be useful as baseline models for comparison with different detection algorithms [4]. In the decentralized models, each node uses its own data to build a local normal model. The decentralized models can be further classified, based on the decision-making mechanism, into node-level and network-level models. In the network-level models [5–9], some parts of the processing are locally
performed in each node, and a cooperation mechanism among a group of nodes (i.e., within a cluster) is used to decide about potential anomalies. These distributed models aim to solve the high communication overhead of the centralized models by sending only a summary that represents the normal reference model at each node to the central location, where the global normal reference model is computed by merging the received local models and sent back to each node for local anomaly detection. The problem with some models of this category is data heterogeneity, which makes the global model an unsuitable representative of the network's normal behavior. Moreover, the size of the normal model exchanged between the nodes and the central location may increase the communication overhead, which is the main driver of quick sensor energy depletion. In the node-level models such as [4], the processing, analysis, and decision about anomalies are performed entirely locally at each node, and the decision is then sent to the central location for further remedial action. Besides, most existing distributed detection models such as [9–12] are not suitable for online detection, as their detection methods incur high computational complexity that quickly consumes the limited sensor energy.

The effectiveness of anomaly detection models is also affected by dynamic changes in the deployed environments: these changes increase the false alarm rate and therefore degrade detection effectiveness. Adaptive detection of anomalies in such environments is thus an important challenge for assuring the quality of sensor measurements. Few works have tackled the adaptive detection issue, e.g. [13,14], and these models have drawbacks such as the additional communication overhead incurred in distributed models and the high computational cost produced by their incremental learning procedures.

The contribution of this paper is threefold: (i) we show how the unsupervised Principal Component Classifier originally proposed in [29] can be adopted as a One-Class Principal Component Classifier suitable for anomaly detection in sensor measurements in the absence of ground truth labeled data; (ii) we propose a new efficient and online anomaly detection model for WSNs, the Principal Component Classifier-based Anomaly Detection (PCCAD) model, which is entirely local and does not incur any additional communication overhead; and (iii) we propose an adaptive APCCAD model that incorporates an incremental learning method into the design of the PCCAD model to track data changes in dynamic environments, reducing the misclassification error caused by such changes. The efficiency and effectiveness of the proposed models are validated and compared with some existing detection models using real life datasets collected from real sensor network deployments.

The remainder of this paper is structured as follows: related anomaly detection models in WSNs are presented in Section 2; the proposed models are described in Section 3; the experimental results, analysis, and evaluation of the proposed models are reported in Section 4; and Section 5 concludes the paper and suggests some directions for future research.
2. Related works

Anomaly detection in WSNs has many applications, including intrusion detection, event detection, fault detection, and outlier detection [3]. The term fault refers to a deviation from an expected value regardless of the cause of that fault; hence, data faults can be
considered as anomalies in the absence of ground truth values [15]. Three types of faults, short, noise, and constant readings, were studied in [15]. The authors evaluated the performance of three methods for data fault detection in WSNs that fall into three approaches: rule-based, estimation-based, and learning-based. It is reported that the methods worked well with high and medium intensity short injected faults and with high intensity noise injected faults; however, in most cases they failed to detect long or constant injected faults and low intensity short and noise injected faults. For the real world datasets, these methods performed generally well, as those datasets experienced high intensity faults.

A combination of the discrete wavelet transform (DWT) and self-organizing map (SOM) neural networks to detect data anomalies in WSNs was developed in [16], where data faults were considered as anomalies. The data measurements were first encoded at each node using the DWT and then sent to the base station, where the SOM was applied to a batch of wavelet coefficients. A similar combination of DWT and one-class support vector machines (OCSVM) was proposed in [17]: the DWT was used to encode the data measurements at each node as in [16], and the encoded measurements were then examined for anomalies by the OCSVM at the base station. In both models [16,17], the anomaly detection took place at the base station on batches of encoded measurements. These models can be considered distributed, as they apply the DWT to the data of each sensor before sending the encoded coefficients to the base station; they are also centralized, as they perform the anomaly detection on batches of coefficients at the base station. The high computational complexity of the OCSVM and SOM methods makes these models unsuitable for online detection in sensor nodes.

A segmented sequence analysis (SSA) algorithm was used to design an online anomaly detection model in [18]. Data anomalies were detected by comparing a piecewise linear model of the data collected in a fixed time period with a reference model using similarity metrics, and anomalies were flagged when there was a significant difference. Data from real world sensor network deployments was used to demonstrate the effectiveness and efficiency of the proposed model, and it is claimed that this model is efficient and effective in detecting data anomalies compared to some existing models [15]. This model can be considered a distributed model with two layers of detection: the first layer operates locally in the scope of each sensor node, and the second layer in the cluster head. However, there was no feedback from the cluster head to the local nodes, and this issue was left as future work.

In [5,6], two distributed anomaly detection models were proposed based on clustering ellipsoids of sensor measurements. The normal model is calculated locally in the sensors, and a summary of the model is sent to the cluster head, which computes the global normal model and sends it back to the nodes. The models in these works aimed to detect anomalies in heterogeneous sensor networks where the distribution of data evolves. A distance-based anomaly detection model for WSNs was proposed in [19], in which PCA was first applied to reduce the dimension of the data and a distance-based method was then applied to detect anomalies.
However, this model is static, since it detects anomalies in batches of sensor measurements over a fixed period of time. Recently, the authors of [20] proposed an online histogram-based anomaly detection model that detects anomalies in a distributed manner. It is claimed that the proposed model improves on existing histogram-based models in that the verification procedure is avoided through the use of probability estimation. Two important issues were not properly addressed in the works [19,20]: first, they assume a high correlation between
data measurements, so that the global reference model calculated at the cluster head will be a good representative of the normal reference models constructed at each sensor. This assumption implies that the sensor measurements obey the same distribution, which limits the applicability of the proposed models to homogeneous sensor network applications. Second, the proposed models cannot be generalized to multivariate data, since the histogram techniques do not handle the multivariate case, which limits their applicability to more general and recent sensor network applications.

Ensemble-based classifiers were utilized to detect anomalous sensor node behavior in [21,22]. The work of [22] investigated the design of an Ensemble Based System (EBS) using three binary classifiers: an average-based classifier, an autoregressive linear predictor-based classifier, and a neural network-based classifier. This work was extended in [21] by adding two classifiers: a neural network autoregressive predictor-based classifier and an adaptive neuro-fuzzy inference system (ANFIS)-based classifier. Two kinds of input data, the measurement time series collected by the sensor under investigation and the measurements gathered from neighboring sensors, were used as input to the ensemble classifier system, which was implemented at the base station. The authors concluded that the ensemble system assured the diversity of classifiers needed to build an efficient decision making system. Although the ensemble-based design improved the results over single classifiers in these works, problems related to high communication overhead and adaptation to dynamic changes need more investigation.

A PCA-based anomaly detection model was introduced in [23], in which PCA was used with a fixed-width clustering algorithm to establish the global normal profile. Another work, reported in [24], describes a data fault detection method for WSNs using multi-scale PCA: wavelet analysis was integrated with PCA, with the wavelet analysis capturing the time-frequency information and PCA detecting the faulty data. Existing PCA-based models considered only the univariate anomaly situation; in reality, some variables that appear normal individually are anomalous when considered together. Moreover, they used distance-based methods to measure the deviation in the feature space, which are computationally expensive for resource-constrained devices such as sensors. In [9–11,25–28], the one-class quarter-sphere support vector machine (QS-OCSVM) was used to design the anomaly detection models; however, the computational cost of solving the sphere problem of the SVM is too expensive for online applications.

Few works have tackled the adaptive detection issue in dynamic WSN applications, in which the data variables change dynamically over time, e.g. [13,14]. A sliding window based non-parametric anomaly detection model was proposed in [13], where adaptability was achieved by updating the normal reference model in each sliding window. In [14], three different strategies were used to keep the normal reference model adaptive. Recently, the authors of [4] proposed incremental elliptical boundary estimation methods for updating the normal reference model in order to cope with dynamic changes in monitored environments.
The update procedure of the detection model in some existing solutions incurs high computational cost in retraining and recalculating the normal model. Moreover, fixed sliding window based solutions may not be suitable for all types of applications because of the frequent changes in the sensed variables monitored by these applications. To conclude, most existing anomaly detection models adopt periodic batch detection at central locations (cluster head or base station). The transmission of data in batches increases the communication overhead and therefore the energy
consumption; moreover, it introduces a detection delay due to the time window. A few existing models perform the detection locally, but their major disadvantage is the high computational cost of their detection methods. In addition, adaptability to dynamic changes in the monitored environment has rarely been adequately addressed in previous studies.
3. The proposed models

In this section, we first introduce the One-Class Principal Component Classifier as a modification of the existing unsupervised Principal Component Classifier (UNPCC) proposed in [29]. After that, the two proposed anomaly detection models, PCCAD and APCCAD, are described.

3.1. One-Class Principal Component Classifier

The absence of ground truth datasets for evaluating anomaly detection models in WSNs has directed research towards one-class learning methods that classify sensor measurements as normal or anomalous. In this paper, the original UNPCC [29] is modified to deal with these two classes. Classifiers of this type, such as the one-class support vector machine (OCSVM) used in [9,10,17], are attractive for anomaly detection in WSNs, as they do not require pre-labeled data, which is expensive or difficult to obtain. In these classifiers, the normal data are modeled to represent the normal behavior of the sensor streams, and any deviation from this behavior is classified as anomalous. The modifications made to the original UNPCC algorithm are as follows (a sketch of the incremental PCA step follows this list):

1. The original UNPCC algorithm was designed around the original PCA algorithm, which involves calculating the covariance matrix and using singular value decomposition to project the data onto the feature space. The computational complexity of such a projection is not affordable on resource-constrained devices such as sensors. Therefore, a lightweight PCA variant called Candid Covariance-Free Incremental Principal Component Analysis (CCIPCA) is used instead. CCIPCA, originally proposed in [30], computes the principal components of a sequence of samples incrementally, without estimating the covariance matrix; it was successfully used for adaptive and efficient data reduction of multivariate WSN data in [31].

2. The UNPCC was developed for the multiclass problem of intrusion detection in computer networks, and uses an Automated Clustering Threshold Determination (ACTD) procedure to find the detection thresholds. Such a procedure is not suitable for sensors, since it incurs high computational complexity and hence high energy consumption. In the proposed PCCAD and APCCAD models, only one threshold is needed to separate normal data from anomalies; the details of obtaining this threshold are discussed in Section 3.2.
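To make the CCIPCA step concrete, the following is a simplified Python sketch of the incremental eigenvector update described in [30]; the function name, the default omission of the amnesic weighting, and the list-of-vectors representation are illustrative choices of ours, not from the paper:

```python
import numpy as np

def ccipca_update(V, u, n, amnesic=0.0):
    """One CCIPCA step: refine the k eigenvector estimates in V using the
    n-th (1-based) mean-centered sample u, without forming a covariance
    matrix. The i-th eigenvalue estimate is the norm of V[i]."""
    u = u.astype(float).copy()
    for i in range(min(len(V), n)):
        if i == n - 1:
            V[i] = u.copy()                      # initialize with the residual
        else:
            norm = np.linalg.norm(V[i])
            V[i] = ((n - 1 - amnesic) / n) * V[i] \
                 + ((1 + amnesic) / n) * (u @ V[i]) / norm * u
            v_hat = V[i] / np.linalg.norm(V[i])
            u -= (u @ v_hat) * v_hat             # deflate for the next component
    return V
```

Each update costs O(kN) for k components and N variables, which is what makes the classifier affordable on a sensor node.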
3.2. The PCCAD model

In this section, the OCPCC algorithm is used to design the proposed Principal Component Classifier-based Anomaly Detection (PCCAD) model. The PCCAD model has two main phases, a training phase and a detection phase, and is implemented locally in each sensor node. The training phase is conducted offline, once at initialization, using data measurements collected over a specific time period; the detection phase runs online whenever the sensor reads a new measurement. The following subsections explain each phase in detail.

3.2.1. Training phase

In this phase, normal data measurements are collected at each sensor node to build the normal reference model, which is used later in the detection phase for real-time anomaly detection. The normal measurement matrix $S_{Train}$ is first normalized by the mean ($\mu$) and standard deviation ($\sigma$) of the normal data. Then, the CCIPCA algorithm is applied to the normalized measurements to obtain the eigenvector matrix ($V_i$) and the corresponding eigenvalues ($\lambda_i$). The projection $Y_i$ of each training measurement onto the new feature space is calculated by Eq. (1):

$$Y_i = S_{Train}(i)\, V_{(i)} \qquad (1)$$

The number of PCs suitable for the application is chosen, and the dissimilarity measure $D_{train}$ is calculated for each measurement in the training set using Eq. (2):

$$D_{train} = \sum_i \frac{y_i^2}{\lambda_i} \qquad (2)$$

The formulation of the dissimilarity measure in Eq. (2) is motivated by the fact that the sum of squares of the normalized principal component scores $Y_i$ is equivalent to the Mahalanobis distance of the observation $S_{Train}(i)$ from the mean of the sample [32]. Based on this measure, the normal reference model representing the normal behavior of the sensor data is defined, and any deviation from it is considered an anomaly. We define the reference model by the maximum and minimum bounds, MaxDiss and MinDiss, of the dissimilarity vector $D_{train}$; these two bounds delimit the region covered by the training measurements. The reference model is stored at each node for use in the online detection phase, together with the normalization parameters ($\mu$, $\sigma$), the eigenvector matrix ($V$), and the corresponding eigenvalues ($\lambda$). Fig. 1 shows the pseudo code algorithm of the training phase of the proposed PCCAD model.

3.2.2. Detection phase

This phase operates in real time for each newly sensed measurement. Each new measurement is compared with the normal reference model built in the training phase and is classified as normal or anomalous based on the comparison of its projection on the feature space against the pair (MaxDiss, MinDiss). Fig. 2 shows the pseudo code algorithm of the online detection phase. As shown in Fig. 2, the new measurement is first normalized and centered using the normalization parameters ($\mu$, $\sigma$) calculated in the training phase and stored in each sensor. Then, the projection score of the measurement on the feature space is calculated using the reference model parameter $V$, as in Eq. (5), and the dissimilarity measure $D_{test}$ of the observation is calculated from $V$ and $\lambda$, as in Eq. (6). The new observation is classified by comparing its $D_{test}$ value with the bounds MaxDiss and MinDiss: if $D_{test}$ exceeds either bound, the observation is classified as an anomaly.
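The Mahalanobis equivalence invoked above can be made explicit. For a centered observation $x$ with scores $y = V^\top x$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$:

$$D(x) = \sum_{i=1}^{k} \frac{y_i^2}{\lambda_i} = y^\top \Lambda^{-1} y = x^\top V \Lambda^{-1} V^\top x,$$

which equals the squared Mahalanobis distance $x^\top \Sigma^{-1} x$ when all principal components are retained, and approximates it when only the top $k$ are kept.

A minimal Python sketch of both phases, under the assumption that batch PCA stands in for CCIPCA and with illustrative names (`train_pccad`, `detect_pccad`) that are not from the paper, might look as follows:

```python
import numpy as np

def train_pccad(S_train, k):
    """Training-phase sketch: normalize, keep the top-k principal
    components, and bound the dissimilarity of the training data."""
    mu, sigma = S_train.mean(axis=0), S_train.std(axis=0)
    Z = (S_train - mu) / sigma
    C = np.atleast_2d(np.cov(Z, rowvar=False))
    lam, V = np.linalg.eigh(C)
    idx = np.argsort(lam)[::-1][:k]              # top-k eigenpairs
    lam, V = lam[idx], V[:, idx]
    Y = Z @ V                                    # Eq. (1): projection scores
    D = (Y ** 2 / lam).sum(axis=1)               # Eq. (2): dissimilarity
    return dict(MaxDiss=D.max(), MinDiss=D.min(), V=V, lam=lam,
                mu=mu, sigma=sigma, stdtr=D.std())   # stdtr is used by APCCAD

def detect_pccad(x, model):
    """Online-detection sketch: flag x when its dissimilarity leaves the
    [MinDiss, MaxDiss] region learned during training."""
    z = (x - model["mu"]) / model["sigma"]
    y = z @ model["V"]                           # cf. Eq. (5)
    d = (y ** 2 / model["lam"]).sum()            # cf. Eq. (6)
    return d > model["MaxDiss"] or d < model["MinDiss"]
```

The per-measurement cost is one matrix-vector product plus a weighted sum, in line with the O(N) online complexity claimed in Section 4.3.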
3.3. The APCCAD model

The PCCAD model presented in Section 3.2 detects anomalous sensor measurements in real time, but its training is performed only once, at the initialization of the application. For WSN applications deployed in dynamic environments, an adaptive detection that can track dynamic data changes is required. In this section, we present the proposed adaptive APCCAD model, which incorporates an incremental learning method into the design of the PCCAD model. The incremental learning method learns statistical characteristics of the data, such as the mean and standard deviation, incrementally; these characteristics are used together with the normal reference model defined for the PCCAD model to detect anomalous measurements in real time.

Fig. 3 shows the general structure of the proposed adaptive model. The model has three phases: a first-time training phase, an online detection phase, and a retraining and update phase. The first-time training and online detection phases are similar to their counterparts in the PCCAD model; however, some components are added to achieve adaptability to dynamic changes. In particular, an additional classification criterion is added to the online detection phase that further checks the abnormality of a measurement based on the statistical characteristics of the environment. Finally, the measurements classified as normal are pooled into a buffer of size m to be used for classifier retraining.

Fig. 1. The pseudo code algorithm of the training phase of PCCAD model.

Fig. 2. The pseudo code algorithm of the online detection phase of PCCAD model.

Fig. 3. The general structure of the proposed adaptive anomaly detection model.

In the retraining and update phase, the buffered m normal measurements are used to retrain the classifier and produce a new normal reference model. The new reference model is compared
against the old model to decide whether an update is necessary. If the update criterion is satisfied, the reference model is replaced by the latest one and the process repeats.

Fig. 4 shows the pseudo code algorithm of the first-time training phase of the APCCAD model. The only difference from the training phase of the PCCAD model is a new component of the reference model, the standard deviation stdtr calculated from the training data; the normal reference model therefore comprises the tuple (MinDiss, MaxDiss, V, λ, μ, stdtr). Fig. 5 shows the online detection phase of the APCCAD model. In this phase, the incremental standard deviation stdts is computed for each new Dtest value and compared with the reference standard deviation stdtr produced in the first-time training phase. The new measurement is classified according to the criterion shown in step 5 of the algorithm in Fig. 5.
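Since the figures are not reproduced here, the following Python fragment gives one plausible reading of the step-5 criterion, under the assumption (ours, not stated explicitly in the text) that a measurement is flagged only when it leaves the dissimilarity bounds and the incremental standard deviation also exceeds the stored reference:

```python
def classify_apccad(d_test, std_ts, model):
    """Hedged sketch of APCCAD's dual criterion: combine the PCCAD bound
    check with the statistical-change check (std_ts vs. the stored stdtr).
    The exact rule is given in Fig. 5, step 5."""
    out_of_bounds = d_test > model["MaxDiss"] or d_test < model["MinDiss"]
    drifted = std_ts > model["stdtr"]
    return out_of_bounds and drifted      # anomalous only if both checks fire
```

Requiring both checks is consistent with the reported behavior: measurements pushed out of the static bounds by gradual normal drift do not raise std_ts above stdtr and are therefore no longer misclassified.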
The incremental standard deviation value stdts for each new measurement is calculated as in Eq. (8) in Fig. 5. The implementation of the incremental standard deviation procedure (inc_std) is shown in Fig. 6: it starts by initializing the mean μ with the first Dtest value, corresponding to the first measurement, and the standard deviation with zero; in subsequent steps, the mean and standard deviation for the current measurement are computed from those of the previous measurements.

Fig. 4. Pseudo code algorithm for the training phase of APCCAD model.

Fig. 5. Pseudo code algorithm for the online detection phase of APCCAD model.

Fig. 6. Pseudo code algorithm of incremental standard deviation [33].

In the retraining and update phase, the previous m normal-classified measurements are used to retrain the classifier and produce the new normal reference model parameters. The standard deviation of the new dissimilarity vector, stdts, is calculated incrementally using the procedure in Fig. 6 and compared with the reference parameter of the previous m measurements. The old stdtr value is replaced by the new reference parameter stdts if the update criterion is satisfied. In our experiments, we chose the update criterion as follows:
If (new_stdtr > old_stdtr) then replace.

This criterion implies that replacement (update) of the reference model parameter stdtr is needed only when the new m measurements used to retrain the model include measurements that lie outside the boundary (MaxDiss, MinDiss) representing the data normality region, in which case the boundary should be updated. Updating the reference model parameter stdtr based on this criterion satisfies the concept of ''update when required''.
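A Welford-style sketch of the incremental procedure (cf. Fig. 6) together with the retraining/update step might look as follows in Python; `train_pccad` is the hypothetical helper sketched in Section 3.2, and the buffer size m comes from the application:

```python
import numpy as np

def inc_std(d, state=None):
    """Incremental mean/std (cf. Fig. 6): the mean is initialized with the
    first Dtest value and the std with zero; each later update is O(1) and
    uses only the previous state (count, mean, sum of squared deviations)."""
    if state is None:
        return (1, d, 0.0)
    n, mean, m2 = state
    n += 1
    delta = d - mean
    mean += delta / n
    m2 += delta * (d - mean)
    return (n, mean, m2)

def std_of(state):
    n, _, m2 = state
    return (m2 / n) ** 0.5 if n > 1 else 0.0

def retrain_and_update(buffer, model, k):
    """Retraining and update phase: retrain on the m buffered
    normal-classified measurements and replace the reference model only
    when the update criterion (new stdtr > old stdtr) is satisfied."""
    new_model = train_pccad(np.asarray(buffer), k)
    return new_model if new_model["stdtr"] > model["stdtr"] else model
```

This realizes "update when required": windows whose dissimilarities stay inside the learned boundary leave the reference model untouched.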
4. Experiments and results

To validate the proposed models, data samples were extracted from three WSN deployments that represent static and dynamic environments. The next subsections introduce the datasets and explain the data labeling procedures.

4.1. Datasets and labeling approaches

The datasets used in this paper are extracted from the following WSN deployments:

(i) Intel Berkeley Research Lab (IBRL). The IBRL dataset [34] was collected from the WSN deployed at the Intel Berkeley Research Laboratory, University of Berkeley. The network consisted of 54 Mica2Dot sensor nodes and was deployed over about a month, from 28/02/2004 until 05/04/2004. Four types of measurements were collected: light, temperature, humidity, and voltage, sampled at 31 s intervals. In this research, subsets of this dataset were chosen for evaluating the proposed anomaly detection models.

(ii) Sensorscope Lausanne Urban Canopy Experiment (LUCE). The LUCE dataset [35] was collected by a sensorscope project on the École Polytechnique Fédérale de Lausanne (EPFL) campus between July 2006 and May 2007. The experiments aimed at a better understanding of micrometeorology and atmospheric transport in urban environments. The measurement system was based on a WSN of 110 sensor nodes deployed on the EPFL campus to measure key environmental quantities, including ambient temperature, surface temperature, and relative humidity. LUCE is a dynamic dataset, since its variables change frequently over time.

(iii) Networked Aquatic Microbial Observing System (NAMOS). Temperature and chlorophyll concentration sensors were deployed in nine buoys at Lake Fulmor, James Reserve, for 24 h in August 2006 [36]. The measurements were sampled every 10 s. Chlorophyll concentration measurements from buoy no. 103 were extracted for the purpose of this study.

The absence of ground truth labeled datasets for evaluating anomaly detection models in WSNs is a common difficulty that researchers face in this domain. Two labeling approaches were used to solve this problem: the simulated-based approach and the histogram-based approach.

Simulated-based labeling approach: In this approach, adopted in [8–10,19], the statistical characteristics (usually mean and standard deviation) of the normal data are determined and used to generate artificial anomalous measurements that deviate slightly from the normal data. The generated anomalies are then injected into the normal data traces.

Histogram-based labeling approach: This approach, adopted in [7,16,17], uses histogram-based methods and visual inspection to judge the abnormality of data traces and label them as anomalies. By plotting histograms of the data, the normality regions for each dataset are determined, and the dataset samples are labeled accordingly.

In this paper, for the simulated-based approach, we used the IBRL dataset and generated artificial anomalies to evaluate the proposed PCCAD model. For the histogram-based approach, we used the same labeled datasets as in [16,17], extracted from the IBRL, LUCE, and NAMOS datasets, to compare the results of our proposed models with those two models.

4.2. Results and analysis

Since two data labeling approaches are used, the experimental results and analysis for each approach are presented separately. For the simulated-based datasets, the PCCAD model is tested and compared with some existing anomaly detection models. For the histogram-based datasets, which contain real anomalies, the PCCAD model is first applied and compared with some existing models; the APCCAD model is then tested on the same histogram-based datasets to show the advantages of the proposed adaptive model in tracking dynamic changes and hence improving detection effectiveness. The results are analyzed in terms of detection effectiveness, represented by detection rate (DR), detection accuracy, and false negative rate (FNR), and in terms of efficiency, measured by the computational complexity and storage utilization of the model.

4.2.1. Experimental results with simulated-based datasets

Using the simulated-based labeling approach, three dataset samples D1, D2, and D3 were extracted from nodes N8, N9, and N10 in the IBRL deployment. Three variables, reporting temperature, humidity, and voltage, were chosen from each node. 100 artificial anomalies were randomly generated using a normal distribution and injected into each dataset sequentially. The statistical characteristics (mean and standard deviation) of both the normal data and the generated artificial anomalies are presented in Table 1. Since the anomalies were generated using a normally distributed random function, the experiments were repeated 15 times with different testing sets and the same training set to show the stability of the performance; the overall effectiveness is the average over the 15 runs. In the experiments, the size of the training set was 250 instances, while each testing set contained 150 instances: 50 normal instances and 100 artificially generated anomalous instances.
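As a hedged illustration of the two labeling approaches, the following Python sketch injects normally distributed anomalies shifted from the normal-data statistics (the shift size is an illustrative parameter; cf. Table 1) and labels samples outside a visually chosen normality region:

```python
import numpy as np

rng = np.random.default_rng(42)

def inject_anomalies(normal, n_anom=100, shift=1.0):
    """Simulated-based labeling: draw anomalies from a normal distribution
    whose mean is offset by `shift` standard deviations from the normal
    data, then append them to the trace with labels (1 = anomaly)."""
    mu, std = normal.mean(axis=0), normal.std(axis=0)
    anoms = rng.normal(mu + shift * std, std, size=(n_anom, normal.shape[1]))
    X = np.vstack([normal, anoms])
    y = np.r_[np.zeros(len(normal), dtype=int), np.ones(n_anom, dtype=int)]
    return X, y

def histogram_labels(x, lo, hi):
    """Histogram-based labeling: after inspecting the histogram of a data
    trace, mark samples outside the normality region [lo, hi]."""
    return ((x < lo) | (x > hi)).astype(int)
```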
Table 1. Statistical parameters of experimental datasets.

Dataset  Variable      Normal mean  Normal std  Anomaly mean  Anomaly std
D1       Temperature   18.21        0.3837      18.71         0.395
D1       Humidity      40.2         0.1797      40.5          0.188
D1       Voltage       2.686        0.00696     2.715         0.0075
D2       Temperature   18.37        0.3530      18.71         0.395
D2       Humidity      41.01        0.2030      40.5          0.244
D2       Voltage       2.755        0.0071      2.725         0.0075
D3       Temperature   18.38        0.3745      18.8          0.395
D3       Humidity      41.08        0.244       41.6          0.254
D3       Voltage       2.687        0.0065      2.695         0.007
Table 2. Average effectiveness of PCCAD model for simulated-based datasets.

Dataset  Accuracy (%)  DR (%)  FNR (%)
D1       98.5          97.8    2.2
D2       99.1          98.2    1.8
D3       94.2          95.3    4.7
The average of the effectiveness measures over the 15 runs for all datasets is shown in Table 2. As can be seen, average detection accuracies of 98.5%, 99.1%, and 94.2% were achieved for datasets D1, D2, and D3, respectively. The proposed model likewise achieves high detection rates for all datasets; the highest detection rate was reported for D2 and the lowest for D3. The FNR follows the detection rate: it decreases as the detection rate increases and vice versa. The small variations in the results are due to the datasets being obtained from different sensor traces, and the FPR was constant because the normal instances were fixed in all runs.
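For reference, the effectiveness measures used throughout this section can be computed from binary labels as follows (a small sketch; 1 denotes anomaly):

```python
import numpy as np

def effectiveness(y_true, y_pred):
    """DR, accuracy, FPR, and FNR from ground truth and predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return dict(DR=tp / (tp + fn),
                accuracy=(tp + tn) / len(y_true),
                FPR=fp / (fp + tn),
                FNR=fn / (fn + tp))
```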
4.2.1.1. Effectiveness evaluation. For this type of dataset, a comparison with two existing One-Class Support Vector Machine (OCSVM)-based models from the literature was conducted: the quarter-sphere one-class support vector machine (QS-OCSVM) model used by many researchers [9,10,14,25] and the hyper-plane one-class support vector machine (H-OCSVM) model. The difference between the two models is the use of plane geometry in H-OCSVM versus sphere geometry in QS-OCSVM to solve the quadratic optimization problem in the OCSVM formulation. These two models were chosen for the comparison because they are based on one-class classification, which does not require labeled data. Several experiments were conducted on these models using the simulated-based datasets to obtain the parameters yielding the best effectiveness.

Fig. 7 plots the comparison between the proposed PCCAD model and the QS-OCSVM and H-OCSVM models in terms of detection accuracy, detection rate, and false negative rate over 15 runs for all datasets D1–D3; each run uses a different testing set containing different generated artificial anomalies. The x-axis denotes the number of runs, while the y-axis represents the effectiveness measure (detection accuracy, detection rate (DR), or false negative rate (FNR)). The results in Fig. 7 suggest that the proposed PCCAD model outperforms both QS-OCSVM and H-OCSVM in terms of detection accuracy. In terms of false negatives and detection rate, however, the H-OCSVM model was better in some runs and in the average values for datasets D2 and D3; on the other hand, PCCAD performed better than QS-OCSVM in all runs. Overall, PCCAD performs well compared to both models on this type of dataset, and its results are consistent across runs for all datasets. Moreover, the efficient resource consumption of the proposed PCCAD model in terms of computational cost is another advantage over the existing models, as explored in Section 4.3.

Throughout the experiments, the parameters sigma and Nu in both OCSVM formulations affected the effectiveness of the two models. It was found that these two parameters need to be tuned for each dataset to obtain the best results, which makes the two models impractical. In contrast, the proposed PCCAD model has no parameters to tune, which makes it more suitable and practical for online and real-time WSN applications.
4.2.2. Results with histogram-based datasets

For these datasets, sensor measurements were extracted from real life WSN deployments and labeled using histogram-based methods. Three dataset samples, E1, E2, and E3, were used to evaluate the proposed PCCAD model. One variable, temperature, was selected from the IBRL dataset to form dataset E1. Similarly, ambient temperature readings from station no. 39 in the LUCE deployment form dataset E2. Finally, chlorophyll concentration measurements from buoy no. 103 in the NAMOS dataset form dataset E3. Different scenarios for selecting the training sets were investigated, as detailed below.

E1 contains temperature readings extracted from the IBRL dataset. Four scenarios were investigated, based on the size and position of the training set used to obtain the normal reference model; the results are provided in Table 3. E2 contains ambient temperature readings extracted from the LUCE deployment; two scenarios varying in training and testing set sizes and positions were examined, as shown in Table 4. For E3, which exhibits constant or long-period anomalies, two scenarios were considered, shown in Table 5.

Table 3 shows that the detection rate is 100% for all scenarios, because E1 exhibits short anomalies that are clearly distinguishable from normal observations and therefore easy for the proposed model to detect. However, the FPR is a concern here, as it increases with the size of the testing set. This indicates that the training sets do not represent the behavior of the whole data: the model, trained on the first 1000 or 2000 observations, misclassified later normal changes as anomalies because such behavior was not seen in the training sets. Similar to E1, E2 has short anomalies, which the PCCAD model easily detected, as shown in Table 4; the FPR is very low compared to E1 because the extracted data sample is less dynamic, so the chosen training sample was a good representative of the whole data sample. For E3, Table 5 shows that the FPR is quite high in the first scenario compared to E1 and E2, while the detection rate is 100% because the constant anomalies are clear and easy to detect.

The results in Tables 3–5 show that in most cases the PCCAD model achieves 100% DR for all datasets. The FPR is strongly affected by the samples used in the training set, since the normal reference model, represented by the (MaxDiss, MinDiss) bounds of the dissimilarity measures, depends on the chosen training measurements. The reference normal model therefore becomes rigid over time, and adaptive learning of the model is needed.
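As an illustration, scenario 1 for E1 (train on the first 1000 observations, test on the next 2000) could be driven with the hypothetical helpers sketched in Section 3.2 as follows; the file name is assumed, not from the paper:

```python
import numpy as np

# "e1_temperature.csv" is an assumed file name for the extracted trace.
data = np.loadtxt("e1_temperature.csv").reshape(-1, 1)
model = train_pccad(data[:1000], k=1)     # scenario 1: train on first 1000
flags = [detect_pccad(x, model) for x in data[1000:3000]]
print(f"fraction flagged as anomalous: {np.mean(flags):.3f}")
```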
4.2.2.1. Effectiveness evaluation. In this section, the proposed PCCAD model is compared with two existing anomaly detection models on the same dataset samples E1, E2, and E3, and evaluated in terms of detection accuracy, detection rate, false positive rate, and false negative rate. The evaluation of efficiency in terms of storage, computational complexity, and communication overhead is reported in Section 4.3. The two models chosen for the comparison are DWT + OCSVM [17] and DWT + SOM [16]; Table 6 presents the results. For the E1 dataset, a significant reduction in false positive alarms is achieved by the proposed PCCAD model, while all models achieve a 100% detection rate on the short anomalies in this dataset. Similarly, for the E2 dataset, which
Fig. 7. Comparison between PCCAD, QS-OCSVM, and H-OCSVM models: detection accuracy, DR, and FNR (y-axis) versus the number of runs (x-axis) for datasets D1–D3.
Table 3. Experimental results of PCCAD model on E1 dataset.

Scenario  Training set (size, position)  Testing set (size, position)  DR (%)  Accuracy (%)  FPR (%)  FNR (%)
1         1000 (1–1000)                  2000 (1001–3000)              100     96            4        0
2         2000 (1–2000)                  2000 (2001–4000)              100     100           0.3      0
3         1000 (1–1000)                  19000                         100     86            14       0
4         2000 (1–2000)                  18000                         100     87            13       0
Table 4. Experimental results of PCCAD model on E2 dataset.

Scenario  Training set (size, position)  Testing set size  DR (%)  Accuracy (%)  FPR (%)  FNR (%)
1         1600 (1–1600)                  30400             100     99.9          0.09     0
2         4000 (1–4000)                  28000             100     99.9          0.01     0
Table 5. Experimental results of the PCCAD model on E3 dataset.

Scenario  Training set (size, position)  Testing set size  DR (%)  Accuracy (%)  FPR (%)  FNR (%)
1         3000 (1–3000)                  7000              100     90.2          12       0
2         3000 (3001–6000)               7000              100     99.81         2.18     0
also has short anomalies, the proposed PCCAD model successfully reduced the false alarms to nearly zero (0.09%), while the other models achieved performance similar to that on E1, with higher false positive rates. For the E3 dataset, the PCCAD model outperforms the DWT + OCSVM model in reducing false positives, but DWT + SOM achieves a lower false positive rate than both models. Overall, as can be seen in Table 6, the proposed PCCAD model achieves good performance in terms of detection rate and detection accuracy with relatively low false positives. It outperforms the existing models on all datasets, with an advantage for DWT + SOM on dataset E3; however, the 1% FNR reported for the DWT + SOM model makes the proposed PCCAD model superior across all datasets. The results in Table 6 indicate that false positives are the main concern for these kinds of datasets: the number of false positives increases with the dynamicity of the data, as shown in Tables
Table 6. Comparison results of the proposed PCCAD model with DWT + SOM and DWT + OCSVM models.

Dataset  Model        DR (%)  Accuracy (%)  FPR (%)  FNR (%)
E1       DWT + OCSVM  100     98.3          1.9      0
E1       DWT + SOM    100     99            1.09     0
E1       PCCAD        100     99.7          0.3      0
E2       DWT + OCSVM  100     98.3          1.9      0
E2       DWT + SOM    100     99            1.09     0
E2       PCCAD        100     99.9          0.09     0
E3       DWT + OCSVM  100     88.6          12.8     0
E3       DWT + SOM    99      99.4          0.5      1
E3       PCCAD        100     90.2          11.5     0
3–5. From Tables 3–5, it was found that the FPR increases when the training set is not representative of the data behavior, so that the normal reference model becomes rigid; more specifically, when the training samples are selected from different positions in the dataset, the FPR changes substantially. To address the increase in FPR, two solutions are possible. The first is to select the training set carefully so that it represents the behavior of the data under all circumstances. This requires an investigation by domain experts of the characteristics of the data and its behavior in different circumstances, which is very expensive and impractical, especially for environmental phenomena where the normality of the data changes dynamically, or for unattended environments. The second is an automatic procedure that updates the normal reference model as the data behavior changes. This requires an adaptive method that tracks the dynamicity of the data and updates the normal reference model when necessary; the design of such an adaptive model should consider the tradeoff between effectiveness and efficiency. The results of applying the proposed APCCAD model to address dynamic data change are presented in Section 4.2.3.

4.2.3. Results of APCCAD model

This section presents the experimental results of applying the APCCAD model to address the dynamic data changes that the PCCAD model failed to handle in the previous section. The APCCAD model incorporates a method based on the incremental standard deviation of the training dissimilarity measure to track changes in normal measurements. The same datasets and scenarios used to evaluate the PCCAD model are used here; in each scenario, different positions and sizes of training and testing sets were used. Tables 7–9 present the results on these datasets.

For dataset E1, Table 7 shows that the detection rate remains 100% for all scenarios, while the false positives that were a concern for the PCCAD model are reduced to zero. Regardless of the training set size or position, the adaptive model successfully tracked the changes and reduced the false alarms to zero. Meanwhile, zero false negatives were reported for this dataset, and
therefore a detection accuracy of 100% was achieved by this model.

For dataset E2, two scenarios varying in training and testing set sizes and positions were examined. This dataset also has short anomalies that are distinguishable from normal measurements. Table 8 shows that a 100% detection rate was achieved in each scenario, because the short anomalies produced no false negatives. The FPR was also reduced to zero, as the adaptive model successfully tracked the dynamic changes in this dataset. Since the anomalies are short and clearly distinguishable from normal measurements, the reference parameter stdtr is always smaller than the stdts value, the incremental standard deviation of the dissimilarity vector Dtest for new measurements; the chosen training set size and position therefore have no impact on the detection rate.

Dataset E3 exhibits constant anomalies, expressed as low or high values relative to normal measurements over a period of time. The results of the adaptive model on this dataset are reported in Table 9 for two scenarios with different training set sizes and positions. Table 9 shows a detection rate of 100% for both scenarios; these anomalies are distinguishable and easy to detect, similar to short anomalies. In addition, very few false positives were reported for the first scenario and none for the second.

In summary, Tables 7–9 indicate that the APCCAD model successfully tracked the dynamic changes in normal data that caused the high false positive rates reported for the PCCAD model, while achieving high detection rates for all datasets.

4.2.3.1. Effectiveness evaluation. In this section, the adaptive APCCAD model is compared in terms of detection effectiveness with the PCCAD model and the same two existing models from the literature, DWT + OCSVM [17] and DWT + SOM [16]. Table 10 reports the comparison in terms of detection rate (DR), detection accuracy, false positive rate (FPR), and false negative rate (FNR); the best scenario for each dataset was selected for the comparison. Table 10 shows that for dataset E1, a significant improvement was achieved by the APCCAD model, which reduced the false positive alarms to zero. Similarly, for dataset E2, which also has short anomalies, the APCCAD model reduced the false alarms to zero, while the other models have higher false alarm rates (1.9% for DWT + OCSVM and 1.09% for DWT + SOM) on both E1 and E2. For dataset E3, the APCCAD model outperforms all models in terms of detection rate, detection accuracy, and false negative rate; however, it has a higher false positive rate (1.1%) than the DWT + SOM model (0.5%), which in turn incurs 1% FNR. To conclude, the proposed adaptive model effectively tracked the dynamic changes of normal measurements produced by normal variation of the monitored variables, reducing the FPR to zero for almost all datasets, as highlighted in Table 10.

4.2.3.2. Retraining and online update. As mentioned in the design of the APCCAD model, retraining is conducted when the number of
Table 7. Results of APCCAD model on E1 dataset.

Scenario  Training set (size, position)  Testing set (size, position)  DR (%)  Accuracy (%)  FPR (%)  FNR (%)
1         1000 (1–1000)                  2000 (1001–3000)              100     100           0        0
2         2000 (1–2000)                  2000 (2001–4000)              100     100           0        0
3         1000 (1–1000)                  19000                         100     100           0        0
4         2000 (1–2000)                  18000                         100     100           0        0
Table 8. Results of APCCAD model on E2 dataset.

Scenario  Training set (size, position)  Testing set size  DR (%)  Accuracy (%)  FPR (%)  FNR (%)
1         1600 (1–1600)                  30400             100     100           0        0
2         4000 (1–4000)                  28000             100     100           0        0
Table 9. Results of APCCAD model on E3 dataset.

Scenario  Training set (size, position)  Testing set size  DR (%)  Accuracy (%)  FPR (%)  FNR (%)
1         3000 (1–3000)                  7000              100     99.1          1.10     0
2         3000 (3001–6000)               7000              100     100           0        0
Table 10. Comparison of the APCCAD model with PCCAD and two existing models [16,17].

Dataset  Model        DR (%)  Accuracy (%)  FPR (%)  FNR (%)
E1       DWT + OCSVM  100     98.3          1.9      0
E1       DWT + SOM    100     99            1.09     0
E1       PCCAD        100     99.7          0.30     0
E1       APCCAD       100     100           0        0
E2       DWT + OCSVM  100     98.3          1.9      0
E2       DWT + SOM    100     99            1.09     0
E2       PCCAD        100     99.9          0.09     0
E2       APCCAD       100     100           0        0
E3       DWT + OCSVM  100     88.6          12.8     0
E3       DWT + SOM    99      99.4          0.5      1
E3       PCCAD        100     90.2          11.5     0
E3       APCCAD       100     99.1          1.10     0
normal-classified measurements in the buffer reaches m, which is specified by the application requirements. The online update of the reference model is conducted when the update criterion is satisfied, and the number of updates differs among datasets. The reference parameter, the standard deviation stdtr of the training data dissimilarity vector, is updated when the new stdtr value is greater than the previously stored one. Table 11 reports the change of the stdtr value using different time window sizes m for each of the datasets E1–E3. To show the required number of updates of the reference parameter stdtr, the incremental standard deviation stdts of each new measurement is plotted for each window of m normal measurements. Fig. 8 presents plots of eight windows, m1–m8, for dataset E1: the incremental standard deviation stdts in all windows never exceeds 0.4, whereas the stored stdtr value (1.6 for window m1 in Table 11) is well above it, so the stdts values in the subsequent windows never exceed the reference value. This indicates that no update of the reference parameter stdtr is required in any window for this dataset. Similarly, Fig. 9 shows the incremental standard deviation stdts for two windows of m measurements in dataset E3.
Table 11. The stdtr values per time window for different m values.

Time window  E1 (m = 2000)  E2 (m = 1000)  E3 (m = 1000)
m1           1.6            1.6            1.4
m2           2.4            1.6            4.8
m3           1.8            2.9            1.3
m4           1.6            2.6            1.7
m5           0.9            1.6            4.3
m6           1.1            1.5            1.5
m7           1.8            2              1.4
m8           4.5            3.7            1.2
As can be seen, some stdts values are greater than the reference stdtr value, which is 1.4 for dataset E3. Table 10 showed that an FPR of 1.10% was incurred by the APCCAD model on the E3 dataset; this rate was caused by the misclassification of some measurements that show high stdts values in Fig. 9. To further reduce the FPR on this dataset, online updates of the reference parameter stdtr are required at windows m2 and m5, which show stdts values greater than the reference stdtr value, as shown in Table 11.

From Table 11 and Figs. 8 and 9, it is concluded that the update is required, when the update criterion is satisfied, for datasets with dynamic change such as E3. For other datasets such as E1, the update is not necessary, as they exhibit normal changes that do not require updating the reference model. Unlike existing adaptive models that update after each time window, or on every detected change, the proposed adaptive model retrains the detection model per time window but updates the reference model only when necessary; it is therefore more efficient in utilizing the limited resources of sensors.

4.3. Efficiency analysis

To evaluate the efficiency of the proposed models, memory utilization, computational complexity, and communication overhead are discussed and compared with the QS-OCSVM and H-OCSVM models [9,10,14,25], the DWT + OCSVM model [17], and the DWT + SOM model [16].

(i) Memory utilization. In the PCCAD model, the parameters that represent the normal reference model are calculated offline during the training phase and used later during the detection phase. These parameters are: (1) the standardization parameters (μ, σ) calculated from the training data; (2) the eigenvector matrix V and the corresponding eigenvalues λ obtained by applying the CCIPCA algorithm to the training samples; and (3) the maximum and minimum borders, MaxDiss and MinDiss, of the dissimilarity vector Dtrain. For the APCCAD model, a few additional parameters are stored in each node: the standard deviation stdtr of the training dissimilarity vector Dtrain, the incremental standard deviation stdts of the test dissimilarity value Dtest, and the incremental mean μtest. The size of each parameter is fixed; however, an additional buffer of size m is required for classifier retraining, where m is also fixed and depends on the application requirements. The advancement of sensor technology provides sensors with sufficient memory to perform their basic operations and store
Fig. 8. The stdts values for each window of m measurements (m1–m8) in dataset E1.

Fig. 9. The stdts values for two windows of m measurements in dataset E3.
some data for processing, so storing the above-mentioned values is not a critical issue affecting the viability of the proposed models. In comparison, the existing models do not store any parameters at the nodes, as they perform their operations at the base station. Since the above parameters of the normal reference model are fixed in size, the space they occupy is also fixed.

(ii) Computational complexity. The proposed PCCAD model has two main phases, a training phase and an online detection phase, and the calculation of computational complexity is only applicable to the online detection phase. The main operations that determine the computational complexity at detection time are as follows:
(1) The standardization of the real-time observation using the statistical parameters (μ, σ) computed in the training phase. (2) The transformation of the real-time observation into the feature space using V and λ (as in Eq. (5)). (3) The calculation of the dissimilarity measure Dtest using y and λ (as in Eq. (6)). (4) The comparison of Dtest with MaxDiss and MinDiss to decide whether the new observation is normal or anomalous.

In steps 1–3, the only variable that affects the computational complexity is the size of the observation in terms of the number of observed variables, such as temperature and humidity. If the number
Table 12
Efficiency comparison summary.

Model         Computational complexity (batch)   Computational complexity (online)   Communication overhead
QS-OCSVM      O(N²)                              O(KMN)                              O(MN)
H-OCSVM       O(N³)                              O(KM³)                              O(MN)
DWT + OCSVM   O(N)                               O(N) + O(KM)                        O(MP)
DWT + SOM     O(N)                               O(N) + O(L)                         O(MP)
PCCAD         –                                  O(N)                                –
APCCAD        –                                  O(mN)                               –
If the number of observed variables is N, the computational complexity of these steps is O(N). Step 4 is a direct comparison between two numbers, namely the Dtest of the new observation and the stored MaxDiss and MinDiss values. As the number of measured variables is constant in most WSN applications, the PCCAD model incurs constant energy consumption over time.

For the APCCAD model, the online phase additionally maintains the incremental mean and the incremental standard deviation; each is updated from its previous value alone, so no loops are involved in the online detection phase. Additional computations are required for the retraining process, which involves applying the OCPCC classifier and recalculating the normal reference model parameters for the m new normal measurements. The complexity of this retraining depends on the number of sensed variables N and on the number of measurements in the buffer, specified by m. Other operations, such as additions, multiplications, and divisions, take constant time since no loops are involved. Therefore, the overall complexity of the APCCAD model is O(mN).

In the existing DWT + SOM and DWT + OCSVM models, the discrete wavelet transform (DWT) is applied in each node to encode the data observations, which are then sent to the base station for anomaly detection by the OCSVM or SOM. Only the complexity of applying the DWT in the nodes is considered here, ignoring the complexity of the OCSVM or SOM since they run at the base station. Given that the number of observed variables is N, the computational complexity of the DWT is O(N). The computational complexity of applying H-OCSVM online at each node is very high, as it requires O(KM³) calculations, where M is the number of observations and K is the complexity of the kernel function calculation. Similarly, the complexity of QS-OCSVM for online detection is O(KMN), where M is the number of observations and N is the number of observed variables. Finally, the computational complexity of the SOM is proportional to the size of the data observations (L) used to build the self-organizing map; hence its complexity is O(L).

(iii) Communication Overhead

The communication overhead is measured by the amount of data transmitted in the network between sensor nodes and the central node (cluster head or base station) or among the nodes themselves. In the proposed PCCAD model, nothing is transmitted in the network, as detection is performed locally. Similarly, the adaptive APCCAD model does not incur any communication overhead, since detection is also performed locally in each node. In the DWT + SOM and DWT + OCSVM models, however, the wavelet coefficients resulting from applying the DWT are transmitted to the base station for anomaly detection. Therefore, if the number of wavelet coefficients to be transmitted is P, and the number of observations in the time window is M, the communication overhead incurred by these models is O(MP). Table 12 summarizes the efficiency comparison between the proposed models and the existing ones.
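Returning to the APCCAD computations discussed above, the constant-time incremental statistics can be illustrated with a standard one-pass recurrence (Welford's method, of the kind given in Knuth [33]); the paper's exact formulation is not reproduced here, and the window_update_needed helper and its names are hypothetical.

```python
import math

class IncrementalStd:
    """Running mean/standard deviation, updated in O(1) per measurement."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, d: float) -> float:
        # Each update uses only the previous mean/variance: no loop over history.
        self.n += 1
        delta = d - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (d - self.mean)
        return math.sqrt(self.m2 / self.n)

def window_update_needed(dissimilarities, std_tr: float) -> bool:
    """Check the update criterion over one window of Dtest values: stdts > stdtr."""
    inc = IncrementalStd()
    std_ts = 0.0
    for d in dissimilarities:       # stream of Dtest values in the current window
        std_ts = inc.update(d)
    return std_ts > std_tr
```

Only when this criterion fires does APCCAD retrain on the m buffered measurements, which is what keeps the amortized cost well below that of models that retrain every window.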
The communication overhead is more critical than the computational complexity in sensor networks because, as stated in [37], transmitting one bit of data consumes the same amount of energy as processing thousands of bits in a sensor. This means that most of a sensor's energy is consumed by radio communication rather than by sensing or processing. This fact gives local detection of anomalies in sensor nodes an advantage over centralized detection. In comparison, the DWT + SOM and DWT + OCSVM models adopt batch learning, in which the whole set of collected data is encoded by the DWT and sent to the base station for detection. The delay spent collecting the data, transforming it into wavelet coefficients, and sending it to the base station is critical for some real-time WSN applications; during this delay, new types of data anomalies may evolve, and important events of interest may be missed. To conclude, the proposed PCCAD and APCCAD models are more efficient, and hence more suitable for online detection, than the OCSVM-based models that require solving quadratic or linear optimization problems.

(iv) Parameter Tuning

Through the comparison experiments on the QS-OCSVM, H-OCSVM, DWT + SOM, and DWT + OCSVM models, it was found that some parameters have an impact on the overall detection effectiveness for each dataset. For the OCSVM, two parameters, the kernel width σ and the regularization parameter ν, have substantial effects on the effectiveness and hence on the overall performance. This issue affects the usability and robustness of the model across different WSN applications, since each application needs different parameter values. Furthermore, dynamic data changes require retuning these parameters to cope with the changes, and such tuning is very difficult, especially for WSN applications in harsh and unattended environments. In contrast, the proposed PCCAD and APCCAD models do not require any parameter tuning during the detection process.
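This sensitivity is easy to reproduce. The short sketch below uses scikit-learn's OneClassSVM (not the authors' implementations), with synthetic data and parameter values chosen purely for illustration; the fraction of observations flagged as anomalous varies widely with the (ν, γ) choice.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
train = rng.normal(25.0, 0.5, size=(500, 1))   # synthetic "temperature" readings
test = np.vstack([train[:100], [[40.0]]])      # 100 normal points plus one anomaly

# The flagged fraction swings from near 0% to over 50% across these settings.
for nu, gamma in [(0.01, 0.1), (0.1, 10.0), (0.5, 100.0)]:
    model = OneClassSVM(nu=nu, gamma=gamma).fit(train)
    flagged = (model.predict(test) == -1).mean()   # -1 marks predicted anomalies
    print(f"nu={nu}, gamma={gamma}: {flagged:.1%} flagged anomalous")
```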
5. Conclusions

The design and development of anomaly detection models capable of detecting sensor data anomalies accurately, efficiently, and in a timely manner is a challenge. Most models proposed in the literature incur high energy consumption and hence cannot be used for online detection. They also produce low detection accuracy and high false alarm rates, which lead to low data quality. In this paper, an efficient online anomaly detection model (PCCAD) for WSNs was proposed based on the One-Class Principal Component Classifier. In experiments on datasets with simulated and real anomalies, the PCCAD model showed advantages in terms of low computational complexity while keeping memory utilization fixed. The model also showed consistent performance in terms of detection accuracy and detection rate. However, the FPR of the PCCAD model on dynamic datasets with real anomalies was affected by the chosen training period: the FPR increases when the training samples are not representative of the whole data. To solve this issue, a method for tracking these changes was designed and incorporated into an adaptive anomaly detection model (APCCAD). The experimental results showed that the adaptive model successfully tracked the dynamic data changes and reduced the FPR to zero in most cases. In future work, we intend to investigate exploiting the spatial correlations that exist in sensor data to further improve detection accuracy, especially for datasets with simulated anomalies. This can be implemented by distributing the detection process over the network.
Acknowledgments

The work of Dr. Murad Abdo Rassam was supported by the Post-Doctoral Fellowship Scheme for the project ''Adaptive Anomaly Detection Model for Wireless Sensor Networks'', Research Management Center (RMC), Universiti Teknologi Malaysia (UTM). The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.knosys.2014.01.003.

References

[1] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensor networks: a survey, Comput. Netw. 38 (2002) 393–422.
[2] V. Hodge, J. Austin, A survey of outlier detection methodologies, Artif. Intell. Rev. 22 (2004) 85–126.
[3] V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: a survey, ACM Comput. Surv. (2009).
[4] M. Moshtaghi, J.C. Bezdek, T.C. Havens, C. Leckie, S. Karunasekera, S. Rajasegarar, M. Palaniswami, Streaming analysis in wireless sensor networks, Wireless Commun. Mobile Comput. (2012), http://dx.doi.org/10.1002/wcm.2248.
[5] M. Moshtaghi, T.C. Havens, J.C. Bezdek, L. Park, C. Leckie, S. Rajasegarar, J.M. Keller, M. Palaniswami, Clustering ellipses for anomaly detection, Pattern Recogn. 44 (2011) 55–69.
[6] M. Moshtaghi, S. Rajasegarar, C. Leckie, S. Karunasekera, Anomaly detection by clustering ellipsoids in wireless sensor networks, in: 2009 5th International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2009, pp. 331–336.
[7] S. Rajasegarar, J.C. Bezdek, C. Leckie, M. Palaniswami, Elliptical anomalies in wireless sensor networks, ACM Trans. Sen. Netw. 6 (2010) 1–28.
[8] S. Rajasegarar, C. Leckie, J.C. Bezdek, M. Palaniswami, Distributed anomaly detection in wireless sensor networks, in: 10th IEEE Singapore International Conference on Communication Systems, ICCS 2006, 2006, pp. 1–5.
[9] S. Rajasegarar, C. Leckie, M. Palaniswami, J.C. Bezdek, Quarter sphere based distributed anomaly detection in wireless sensor networks, in: IEEE International Conference on Communications, ICC '07, 2007, pp. 3864–3869.
[10] Y. Zhang, N. Meratnia, P. Havinga, An online outlier detection technique for wireless sensor networks using unsupervised quarter-sphere support vector machine, in: International Conference on Intelligent Sensors, Sensor Networks and Information Processing, ISSNIP 2008, 2008, pp. 151–156.
[11] Y. Zhang, N. Meratnia, P. Havinga, Hyperellipsoidal SVM-based outlier detection technique for geosensor networks, in: Proceedings of the 3rd International Conference on GeoSensor Networks, Oxford, UK, 2009.
[12] J. Branch, B. Szymanski, C. Giannella, W. Ran, H. Kargupta, In-network outlier detection in wireless sensor networks, in: 26th IEEE International Conference on Distributed Computing Systems, ICDCS 2006, 2006.
[13] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, D. Gunopulos, Online outlier detection in sensor data using non-parametric models, in: VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases, 2006, pp. 187–198.
[14] Y. Zhang, N. Meratnia, P. Havinga, Adaptive and online one-class support vector machine-based outlier detection techniques for wireless sensor networks, in: International Conference on Advanced Information Networking and Applications Workshops, 2009, pp. 990–995.
[15] A.B. Sharma, L. Golubchik, R. Govindan, Sensor faults: detection methods and prevalence in real-world datasets, ACM Trans. Sen. Netw. 6 (2010) 1–39.
[16] S. Siripanadorn, W. Hattagam, N. Teaumroog, Anomaly detection in wireless sensor networks using self-organizing map and wavelets, Int. J. Commun. 4 (2010) 74–83.
[17] S. Takianngam, W. Usaha, Discrete wavelet transform and one-class support vector machines for anomaly detection in wireless sensor networks, in: 2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), 2011, pp. 1–6.
[18] Y. Yao, A. Sharma, L. Golubchik, R. Govindan, Online anomaly detection for sensor systems: a simple and efficient approach, Perform. Eval. 67 (2010) 1059–1075.
[19] M. Xie, S. Han, B. Tian, Highly efficient distance-based anomaly detection through univariate with PCA in wireless sensor networks, in: The 10th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2011), Changsha, China, 2011, pp. 564–571.
[20] X. Miao, H. Jiankun, T. Biming, Histogram-based online anomaly detection in hierarchical wireless sensor networks, in: 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2012, pp. 751–759.
[21] D.-I. Curiac, C. Volosencu, Ensemble based sensing anomaly detection in wireless sensor networks, Expert Syst. Appl. 39 (2012) 9087–9096.
[22] M. Plastoi, O. Banias, C. Volosencu, D.-I. Curiac, Experiences in ensemble-based decision systems for wireless sensor networks, in: ICSNC 2011, The Sixth International Conference on Systems and Networks Communications, 2011, pp. 63–67.
[23] M.A. Livani, M. Abadi, Distributed PCA-based anomaly detection in wireless sensor networks, in: 2010 International Conference for Internet Technology and Secured Transactions (ICITST), 2010, pp. 1–8.
[24] Y. Xie, X. Chen, J. Zhao, Data fault detection for wireless sensor networks using multi-scale PCA method (2011) 7035–7038.
[25] S. Rajasegarar, C. Leckie, J.C. Bezdek, M. Palaniswami, Centered hyperspherical and hyperellipsoidal one-class support vector machines for anomaly detection in sensor networks, IEEE Trans. Inf. Forensics Secur. 5 (2010) 518–533.
[26] Y. Zhang, N. Meratnia, P.J. Havinga, Ensuring high sensor data quality through use of online outlier detection techniques, Int. J. Sensor Networks 7 (2010) 141–151.
[27] N. Shahid, I. Naqvi, Energy efficient outlier detection in WSNs based on temporal and attribute correlations, in: 2011 7th International Conference on Emerging Technologies (ICET), 2011, pp. 1–6.
[28] N. Shahid, I.H. Naqvi, S.B. Qaisar, Quarter-sphere SVM: attribute and spatio-temporal correlations based outlier & event detection in wireless sensor networks, in: 2012 IEEE Wireless Communications and Networking Conference (WCNC), 2012, pp. 2048–2053.
[29] Z. Xie, T. Quirino, M.-L. Shyu, S.-C. Chen, L. Chang, UNPCC: a novel unsupervised classification scheme for network intrusion detection, in: Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence, 2006.
[30] W. Juyang, Z. Yilu, H. Wey-Shiuan, Candid covariance-free incremental principal component analysis, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 1034–1040.
[31] M.A. Rassam, A. Zainal, M.A. Maarof, An adaptive and efficient dimension reduction model for multivariate wireless sensor networks applications, Appl. Soft Comput. 13 (2013) 1878–1996.
[32] J.D. Jobson, Applied Multivariate Data Analysis, Categorical and Multivariate Methods, vol. II, Springer-Verlag, NY, 1992.
[33] D.E. Knuth, Seminumerical Algorithms, third ed., vol. 2, Addison-Wesley, Boston, 1998.
[34] IBRL, Intel Berkeley Research Lab Dataset, 2004 (accessed on 15.01.12).
[35] LUCE, Lausanne Urban Canopy Experiment (accessed on 20.02.12).
[36] NAMOS, Networked Aquatic Microbial Observing System Dataset, 2006 (accessed on 15.03.12).
[37] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, K. Pister, System architecture directions for networked sensors, SIGPLAN Not. 35 (2000) 93–104.