Expert Systems with Applications 40 (2013) 3248–3255
Data reconciliation in a smart home sensor network

Dorothy N. Monekosso a,*, Paolo Remagnino b

a Faculty of Computing and Engineering, University of Ulster, Shore Road, Co. Antrim, BT37 0QB, UK
b Faculty of Science, Engineering and Computing, Kingston University, Penrhyn Road, KT1 2EE, UK

* Corresponding author. Tel.: +44 117 3287126. E-mail: [email protected], [email protected]. doi: 10.1016/j.eswa.2012.12.037
Keywords: Sensor data analysis; Error detection; Principal component analysis; Canonical correlation analysis
Abstract: This paper describes a data-driven approach to sensor data validation. The data originates from a network of sensors embedded in an indoor environment such as an office, home, factory, public mall or airport. Data analysis is performed to automatically detect events and classify activities taking place within the environment. Sensor failure, and in particular intermittent failure caused by electrical interference, undermines the inference processes. PCA and CCA are compared for detecting intermittent faults and masking such failures. The fault detection relies on models built from historical data. As new sensor observations are collected, the model is updated and compared to that previously estimated; a difference is indicative of a failure. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction

This paper describes a data-driven approach to online data reconciliation and validation for a small network of sensors. The sensors are embedded in an environment, such as office facilities, to monitor activity and detect unusual activity or behaviors. In the language of automated diagnosis, the activity or behavior under observation corresponds to the process under observation. Statistical approaches are employed to analyze the data and infer the type of activity taking place. As sensor readings are collected, they are checked for validity in real time. These sensors are susceptible to interference and can fail, undermining the performance of the system. This paper presents a method to detect and mask readings deemed to be in error. Masking can be achieved in a number of ways; in this work statistical models are employed. A difficulty in this application results from the fact that the process (activity) under observation can cause deviations in the sensor outputs that are indistinguishable from noise and/or sensor faults. The approach adopted is to model the relationships between sensors, rather than individual sensors, and to use these relationships to cross-validate and correct incoming readings. Thus the first step is to determine the relationships between sensor outputs and construct a statistical model. The main contribution of this paper is a data-driven method for detecting permanent and transient faults on a small network of sensors; it exploits sensor–sensor relationships to deal with uncertainty in sensor observations caused by noise or sensor failure. Sensor–sensor relationships are discovered using historical data. The search for similarity extends
beyond sensors observing a similar process or proximity of sensors. It attempts to characterize any relationship over time. The methodology was tested offline on data from a smart home. Section 2 introduces related work, while Section 3 describes the methodology. Results are presented in Section 4 and discussed in Section 5. Finally, Section 6 concludes the paper.
2. Related work

The objective is to detect and correct measurement errors. The errors encountered are either systematic (gross errors) or random fluctuations. This work deals with random errors caused by electrical and other types of interference, and with gross errors caused by sensor faults. In a first step, one or more sensor readings are identified as anomalous; a second step reconciles the erroneous readings. The problem is essentially one of diagnosing faults, although, unlike most diagnosis, characterizing the nature of the fault is not necessary. Diagnosis of system faults is a field that has been widely studied in applications such as aerospace (Dearden et al., 2004), process control (Juricek, Seborg, & Larimore, 2004a), and electricity networks (Bauer, Botea, Grastien, Haslum, & Rintanen, 2011). With the proliferation of distributed communication networks, and in particular sensor networks, the diagnosis of such systems has also received much attention (Chen, Kher, & Somani, 2006; Gao, Xu, & Li, 2007; Hai Li, Price, Stott, & Marshall, 2007; Jiang, 2009; Krishnamachari & Iyengar, 2004; Lee & Choi, 2008). Most of these are large networks; however, similar techniques have been applied to smaller ones. Kim and Prabhakaran (2011) describe fault diagnosis for a very small network, a body sensor network (BSN). The characteristics of a BSN are similar to those of the sensor network presented here, albeit much smaller in size
than the environment network. There is a relatively small number of sensor nodes, each comprising a wireless communication unit and a sensing element. The papers mentioned above are broadly classified as model-based or data-driven, although there is some overlap incorporating both. In data-driven methods, a model may be created from historical data. The model-based approach may employ physical and other types of models. Irrespective of its type, the model is used to reason about the behavior of the system, comparing the observed behavior with expected (predicted) behaviors. Approaches to model-based diagnosis differ principally by the type of model employed. These include Bayesian models (Abreu, Zoeteweij, & van Gemund, 2009; Krishnamachari & Iyengar, 2004) and hidden Markov models (HMM) (Srivastava, 2005; Ying, Kirubarajan, Pattipati, & Patterson-Hine, 2000). Prediction filters have also been investigated, in the form of Kalman filters (Kim, Suk, & Kyung, 2010) and particle filters (Zhou & Liu, 2010). The diagnosis problem may also be cast as a feature classification problem and standard classification methods used, such as neural networks (Maidon, Jervis, Dutton, & Lesage, 1997; Venkatasubramanian, Vaidyanathan, & Yamamoto, 1990), fuzzy classifiers (Lo, Fung, & Wong, 2009), and clustering (Iverson, 2004). The work presented here uses a statistical model built from historical data. Statistical multivariate techniques (Ma, Wong, Jang, & Tseng, 2010) have been employed in data-driven diagnosis. A popular statistical approach is principal component analysis (PCA) (Wise, Gallagher, Butler, White, & Barna, 1999; Yue, Qin, Markle, Nauert, & Gatto, 2000; Zhang & Wang, 2004; Zhou, Zhang, & Wang, 2004). More recently, canonical correlation analysis (CCA) and variants have been applied to fault detection (Chen, Jiang, & Yoshihira, 2006; Juricek, Seborg, & Larimore, 2004b; Kang, Chen, & Jiang, 2010).
The model-based and data-driven methods have been adapted to wireless sensor networks (WSN). Fault detection in a WSN exploits the correlation in a network of sensor nodes. For example, two temperature sensors in close proximity in a network are likely to observe a similar temperature. Thresholds that must not be exceeded are defined for the difference between two sensor observations and for the increment in a unit time step (Lee & Choi, 2008). Jiang (2009) improves on this with a weighted average scheme. Krishnamachari and Iyengar (2004) exploit correlation using a Bayesian algorithm. The method proposed here also exploits correlation; however, it does not rely purely on proximity and/or sensor similarity. In many safety-critical systems, a redundant unit replaces a faulty unit. In a static redundancy scheme, a voting system selects one from at least three units without performing fault detection. In a dynamic scheme, the faulty unit is replaced by a redundant unit after fault detection has identified it. Fault hiding can be used to recover seamlessly from a fault (Guenab, Weber, Theilliol, & Zhang, 2011; Richter, 2011; Steffen, 2006). Once a sensor fault is isolated, the expected sensor output (from the model) is used to validate and, if necessary, mask the faulty value in subsequent analyses.
3. Methodology

The objective when analyzing smart environment data is to detect anomalous events or activities. The standard method is to build models of normal events; any newly detected activity is compared against these models, and if it has not been previously observed it is deemed abnormal. This assumes correct sensor measurements. Any significant deviation of the raw measurements from the norm that does not result from the process under observation will undermine the analyses. A method is proposed to disambiguate process deviation from sensor deviation, combining standard data
reconciliation techniques with failure detection techniques. The proposed method comprises two concurrent threads. One thread deals with random measurement fluctuations using a standard data reconciliation technique, while the second detects and locates the source of systematic deviations in the sensor readings (gross errors), which may be caused either by a faulty sensor or by an anomalous process. If a failed sensor cannot be found, it is assumed that the process under observation is the source of the systematic deviation. Models are constructed from historical data and refined with incoming sensor readings. Using these models, a sensor fault can be detected and the faulty sensor located. There is no need to identify the fault mechanism; thus expected behavior is modeled rather than fault behavior, as is normally the case in fault diagnosis. The failed sensor is located by modeling the relationships between the sensors in normal operation. The system dynamically searches for relationships between sensors and models the relationships found. Each new sensor reading is tested against the known relation to establish its validity. The challenge is to find, for each sensor, the set of sensors that correlate with it and their relationship, which may be dynamic. An intelligent environment contains a heterogeneous array of sensors. Consequently, the sensor readings differ in data type, amplitude, and frequency, and may comprise continuous (e.g. temperature), discrete (e.g. switch state), and compound (e.g. an image) values. A representation that caters for all these types is required. A suitable representation uses a probability distribution, allowing all sensors to be treated in the same manner and information to be combined.

3.1. Data reconciliation

The classic data reconciliation technique (Eq. (1)) is not suitable on its own, since reconciling a datum would also mask an underlying deviation in the process. Eq. (1) assumes that no systematic errors are present in the measurement and that the measurement noise is random. This is the case for interference seen within the sensor network, which comprises the sensors and all measurement-related equipment. For an ensemble of n measurements y_i (sensor readings)

$$y_i = x_i + \epsilon,$$

where y_i is the ith sensor measurement and x_i is the ith true (unmeasured) value. The value \epsilon is the measurement error, consisting of random fluctuations only, and has a Gaussian distribution. The objective is to minimize the least-square correction error

$$\min_{x,y} \sum_{i=1}^{n} \left(\frac{y_i - \hat{y}_i}{\sigma_i}\right)^2 \qquad (1)$$

subject to the activities' maximum duration of a day and to bounds on x and y, the sensor minimum and maximum values:

$$y_{\min} \le y \le y_{\max}, \qquad x_{\min} \le x \le x_{\max},$$

where \hat{y}_i is the corrected value of the ith sensor measurement, \sigma_i is the standard deviation, and y_i is the sensor measurement that maximizes the correlation with neighboring sensors. Neighborhood does not simply imply physical proximity of two or more sensors; it also covers sensors that exhibit some relationship (i.e. correlate). Canonical correlation analysis (CCA) is used to find correlating sensors.

3.2. Systematic error detection

Sensor signals are usually multi-dimensional and might span a large spectrum/space. Signatures of an unfolding scene tend to exist in a subspace. Methods have been proposed to discover the
underlying subspace and then exploit it to compare observations and ensembles thereof. Manifold learning is a relatively new area of research (Huo, Ni, & Smith, 2007; Lee & Verleysen, 2007) that has moved a long way from much simpler but still widely used techniques, such as principal component analysis (PCA). PCA makes the assumption that the measurements/signals are unimodal and that the underlying density function can be approximated with an elliptical (Gaussian) shape. The elliptical nature of a Gaussian is then exploited to discover the axes along which the ensemble is mostly scattered. Once such directions are estimated, they are subsequently used to project new data. In addition to the limitation imposed by the assumption on the underlying density function, PCA does not provide a straightforward means to compare two ensembles of observations of different size potentially lying in different subspaces. Canonical correlation analysis (CCA) extends the PCA method, compensating for this main shortcoming. The next two sections describe the two methods in detail.

3.2.1. Principal component analysis (PCA)

PCA estimates a transformation in a feature space such that a set of correlated variables is mapped into a set of uncorrelated variables; the latter are the principal components. PCA assumes that the underlying scatter of observations defining the cloud is Gaussian/elliptical. Eigenvalue decomposition is used to find the components, i.e. the directions with the largest variation. Given an ensemble of n observations of features in an m-dimensional space, an m × n matrix X can be defined. PCA applies singular value decomposition to decompose X into a matrix of eigenvectors W (the m × m eigenvectors of XX^T), a rectangular diagonal matrix S (m × n), and a matrix of eigenvectors V (the n × n eigenvectors of X^T X). The transformation Y that preserves dimensionality is provided by
$$Y^T = X^T W = V S^T W^T W = V S^T. \qquad (2)$$

Since W is an orthogonal matrix, each row of Y^T is simply a rotation of the corresponding row of X^T. Once the eigenvalues are extracted, they are sorted in decreasing order and the first l eigenvalues are used to define a subspace onto which observations can be projected. If we assume the first l eigenvalues are of interest, with W_l the matrix of the first l corresponding eigenvectors, then the projection can be written as

$$Y = W_l^T X = S_l V^T,$$

where S_l = I_{l×m} S, with I_{l×m} the l × m rectangular identity matrix. Projections onto the discovered subspace can then be used to compare observations and ensembles.

3.2.2. Canonical correlation analysis

In order to discover relationships between sensors, we explore the interactions between sensor groups. The group membership is based on domain knowledge: which sensors are expected to trigger for a given activity or related activity. We use canonical correlation analysis (CCA) (Hastie, Tibshirani, & Friedman, 2008; Johnson & Wichern, 2007) to investigate and identify patterns of inter-relationship between the sets of sensor outputs and between sets of sensors and activities, and to produce model equations that describe the sensor–sensor and sensor–activity relationships. PCA lacks a means to compare ensembles of different size. CCA is a method related to PCA. While PCA finds a transformation for a random variable x such that its components are uncorrelated, CCA works with pairs of random vectors x and y of n and m dimensions to search for linear transformations x' = x^T w_x and y' = y^T w_y such that one component within each set of transformed variables is correlated with a single component in the other set. CCA works by maximizing ρ in the following expression to estimate the canonical variates w_x and w_y
$$\rho = \frac{E[x'y']}{\sqrt{E[x'^2]\,E[y'^2]}} = \frac{E[w_x^T x\, y^T w_y]}{\sqrt{E[w_x^T x\, x^T w_x]\; E[w_y^T y\, y^T w_y]}}, \qquad (3)$$

where x x^T = cov(x, x), x y^T = cov(x, y) and y y^T = cov(y, y). CCA then estimates another pair of vectors to maximize the same correlation subject to the constraint that they are uncorrelated with the first pair of canonical variables. This procedure can be continued up to min{m, n} times. Ultimately, the mathematical manipulations follow more or less the same sequence of steps, constructing a Lagrangian and maximizing the functional. The problem addressed here is to detect permanent as well as transient faults using canonical correlation analysis (CCA) and principal component analysis (PCA); CCA is then employed to mask the fault.

4. Experimental results

The dataset used to test our method was obtained from a publicly available repository,1 courtesy of van Kasteren, Noulas, and Kröse (2008). A number of sensors are installed in the environment, recording unfolding activities. The sensors monitor door status (e.g. open or closed, for any type of door), the operational status of appliances, and motion. The sensors are Microwave (1), Hall-Toilet-door (2), Hall-Bathroom-door (3), Cups-cupboard (4), Fridge (5), Plates-cupboard (6), Front-door (7), Dishwasher (8), Toilet-Flush (9), Freezer (10), Pans-Cupboard (11), Washing-machine (12), Groceries-Cupboard (13), Hall-Bedroom-door (14).

4.1. Discovering relationships

4.1.1. Using PCA

Principal component analysis is performed on the full sensor set during a specific time window for which ground truth is available for the 'preparing breakfast' activity. The results of a PCA are the scores for each component (aka factor), which are the transformed variable values corresponding to a particular data point, and the loadings, which are the weights by which the original set of variables is multiplied to obtain the component score. Figs. 1 and 2 show pictorially how the loadings form the principal components; a scree plot and corresponding distribution (2 components only) are shown. Table 1 shows the loadings under no-fault and fault operation.
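As a rough illustration of the loading and proportion-of-variance computation discussed here, the sketch below applies PCA via SVD. It is a hypothetical example on synthetic data with 14 channels standing in for the sensor set, not the smart-home dataset itself:

```python
import numpy as np

def pca_variance(X):
    """Loadings and proportion of variance explained per principal component.

    X: (n_samples, n_sensors) matrix of sensor readings.
    """
    Xc = X - X.mean(axis=0)                          # centre each sensor channel
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                     # eigenvalues of the scatter matrix
    return Vt.T, var / var.sum()                     # loadings W, variance shares

# Synthetic stand-in for the 14-sensor data matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
X[:, 5] = X[:, 4] + 0.1 * rng.normal(size=200)       # one correlated sensor pair
W, pov = pca_variance(X)
print(np.round(pov[:3], 3))                          # share of variance, first 3 components
```

The variance shares correspond to the "proportion of variance" entries reported in Tables 1 and 2; a scree test inspects the same values plotted against component number.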
The first 3 components account for 64% (32%, 19%, and 13% respectively) of the total process variation in the no-fault case and 66% (32%, 20%, and 14% respectively) in the fault case. These factors were considered the most significant based on a scree test, which gives the number of significant factors to retain in PCA. Each eigenvalue defines the degree of spread along the corresponding eigenvector; thus, in the plot of eigenvalues against factor number (as shown in Figs. 1(a) and 2(b)), the point at which the steep slope levels off is an indication of the number of significant factors. This corresponds to the point at which the rate of change of variance slows in the scree plot. From this diagram the most significant sensors are sensors #2, #6, #10, and #11, which is in agreement with the activity taking place (ground truth).

4.1.2. Using CCA

The correlation among the full sensor set during the 'preparing breakfast' activity was investigated. The sensors are divided into two groups: group 1 and group 2. The sensors that never registered activity during the chosen activity were excluded from the
http://sites.google.com/site/tim0306/datasets.
Fig. 1. PCA Loadings (no fault).
Fig. 2. PCA Loadings (fault condition).
Table 1. Proportion of variance (relationships).

            Comp 1     Comp 2     Comp 3
No fault    0.316371   0.1910524  0.1328107
Fault       0.3237876  0.1955247  0.1357811
investigation. Group 1 contains all active sensors and group 2 contains all sensors expected to be particularly active for the chosen activity, in this case preparing/eating breakfast, thus making use of domain knowledge. Fig. 3 shows the cross correlation between group 1 (p variables) and group 2 (q variables). The XY correlation map is a (p × q) matrix showing the correlation between the X and Y variables. High negative correlation is indicated by blue, while high positive correlation is indicated by red-brown. Cyan and green indicate very low to zero correlation. The maps capture the expected relationship, confirming that during a breakfast preparation activity there is higher correlation between the sensors on the cups-cupboard and the fridge. They also capture other relationships not necessarily associated with the breakfast activity, such as that between the Hall-Toilet-door (2) and the Hall-Bathroom-door (3). The correlation landscape is a model of the inter-relationships between sensors, groups of sensors, and sensors and activities. It is more informative to analyze the evolution of the correlation function; intuitively it is expected to be dynamic and non-linear. We investigated the correlation as it evolves, estimating the CCA at every time step. While the previous maps show the instantaneous correlation for all sensors, the diagrams in Figs. 4 and 5 show the correlation (extracted from the successive landscapes) at successive time steps for sensor pairs (#2 and #3) and (#3 and #6)
Fig. 3. Correlation Map – breakfast activity (Act 5).
for the chosen activity. Fig. 4 shows a case of increasing correlation, which is expected given domain knowledge of sensor-activity dependence, while Fig. 5 shows a case of poor correlation again
Fig. 4. Evolution of 1st factor during fault condition.
Fig. 5. Evolution of 2nd factor during fault condition.
expected. In the first case the correlation settles to an upper limit. In Fig. 4, a high (relative) correlation is a positive value of 0.25 and above; most importantly, the correlation found is corroborated by the domain knowledge.

4.2. Systematic error (fault) detection

4.2.1. Using PCA

Experiment #1: A fault was injected into the dataset to represent a permanent fault on the toilet sensor for the 'using toilet' activity. The simulated fault on the sensor (#9) occurred from time index 107 onwards. The method exploits the fact that PCA is sensitive to outliers, on the assumption that a failure manifests as an outlier. The first 3 components account for 80% (37%, 29%, and 14% respectively) of the total process variation in the case of no fault and 80% (36%, 29%, and 15% respectively, see Table 2) in the case of fault.
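The permanent and transient fault injections used in these experiments can be sketched as follows. This is an illustrative reconstruction (the series and helper names are hypothetical), showing a stuck-at fault applied from time index 107, as in Experiments #1 and #2:

```python
import numpy as np

def inject_fault(series, start, duration=None, stuck_value=1):
    """Return a copy of a sensor time series with a stuck-at fault.

    start:    first affected time index (107 in the paper's experiments)
    duration: number of corrupted samples; None means a permanent fault
    """
    faulty = series.copy()
    end = len(series) if duration is None else min(start + duration, len(series))
    faulty[start:end] = stuck_value
    return faulty

sensor9 = np.zeros(200, dtype=int)                        # idle binary sensor, for illustration
permanent = inject_fault(sensor9, start=107)              # Experiment #1: permanent fault
transient = inject_fault(sensor9, start=107, duration=2)  # Experiment #2: two corrupted samples
```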
Online detection of a fault is achieved by detecting changes in the most significant factors (scores); thus the evolution of the first three factors was investigated. Figs. 6 and 7 show the evolution of the first two factors just prior to and after a fault occurring. The capability of the system to discriminate between the no-fault and fault conditions is investigated using the distance between the PC scores under the two conditions. The distance measure used is the Earth mover's distance (EMD). The graph in Fig. 8 shows the distance between the three distributions (representing each of the three factors) as a function of sample size (size of history considered), based on the 3 most significant factors. This curve is an indication of the discriminating properties of the method. An alternative method for discriminating between the no-fault and fault conditions uses the Hotelling T² statistic, a measure of how far from the center a sensor reading lies. The null hypothesis (there is no change in the most significant factors) is tested using Hotelling's statistic; a sensor fault should cause the null hypothesis to be rejected. Table 3 shows the Hotelling T² statistic as a function of sample size (of the history considered) for detection immediately before the fault. At the 5% significance level, the null hypothesis is accepted for all sample sizes, meaning that the fault cannot be detected. Table 4 shows the Hotelling T² statistic as a function of the duration of the fault before detection. At the 5% significance level, the null hypothesis is accepted for a fault duration of 2 samples but rejected for durations of 5 and 7: the fault is only detected once it has persisted for several samples. Experiment #2: A fault was injected into the dataset to represent a single transient sensor fault. The simulated fault, again on sensor (#9), occurred from time index 107; in the first instance a single datum was corrupted, and in a second instance two successive sensor readings were substituted with incorrect values to simulate the fault.
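A one-sample Hotelling T² test of this kind can be sketched as below. The test compares the mean of a window of recent factor scores against a historical mean; the data and window size are illustrative, not the paper's, and the T²-to-F conversion is the standard one for a one-sample test:

```python
import numpy as np
from scipy import stats

def hotelling_t2(scores, mu):
    """One-sample Hotelling T^2 test: are the recent factor scores still
    centred on the historical mean mu? Returns (T^2, p-value)."""
    n, p = scores.shape
    diff = scores.mean(axis=0) - mu
    S = np.cov(scores, rowvar=False)          # sample covariance of the scores
    t2 = n * diff @ np.linalg.solve(S, diff)
    f = (n - p) / (p * (n - 1)) * t2          # T^2 maps to an F(p, n - p) statistic
    return t2, stats.f.sf(f, p, n - p)

rng = np.random.default_rng(1)
history = rng.normal(size=(200, 3))           # scores on the 3 most significant factors
t2, p = hotelling_t2(history[-18:], history.mean(axis=0))            # no fault
t2_f, p_f = hotelling_t2(history[-18:] + 1.0, history.mean(axis=0))  # simulated shift
```

A sensor fault that shifts the dominant scores drives the p-value below the 5% level, which is the rejection criterion used with Tables 3 and 4.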
The results in Table 5 show that a transient fault lasting 1 time unit (equivalent to 1 sample time) will not be detected.

4.2.2. Using CCA

In this section, CCA for fault detection is presented and compared to PCA. A fault was injected into the dataset to represent a single sensor fault. The simulated fault on the sensor (#9) occurred at time index 107; one and then two successive sensor readings were replaced with incorrect values to simulate the fault. The new correlation map for the sensors activated during an activity is shown in Fig. 9. A failure is indicated by an abrupt change in the evolving correlation landscape. A visual inspection of the correlation map in Fig. 9 without a priori knowledge of the location of the fault yields little information. A more meaningful visualization is obtained by isolating the candidate sensors. The graph in Fig. 10 shows the evolving correlation between the two candidate sensors during the activity, before and after the fault injection. Comparing the maps quantitatively, however, yields more information on the candidate sensors to investigate further. These curves show the correlation as it evolves with and without the fault. In Fig. 10(b) the correlation drops at time t = 107 following injection of the fault. Note that the graph displays the relationship (extracted from the landscape) between sensors #2 and #9. The fall in correlation required to trigger a fault was predefined as a 20% decrease. Given that the
Table 2. Proportion of variance (faults).

            Comp 1     Comp 2     Comp 3
No fault    0.3708627  0.2940374  0.1392598
Fault       0.3599644  0.2916248  0.1495811
Table 4. Hotelling T² statistic vs. duration of undetected failure.

Duration    Hotelling T²    P-value
2           1.3382          0.4547
5           7.4981          0.01036
7           4.9014          0.01892
Table 5. Detecting a transient fault.

Duration of fault    EMD           Discriminate
1                    0             No
2                    0.06682704    ?
4                    0.1547841     Yes
Fig. 6. Distance between scores using the most significant factors.
Fig. 7. Correlation over time between two sensors triggered during Activity 2.
Fig. 9. Correlation Map - case 1: faulty sensor.
correlation curve is 'noisy', it is smoothed using a running average. In practice the smoothing window was preset, but in theory the correlation threshold and smoothing window could be learnt and made adaptive.
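The smoothing-and-threshold rule described here can be sketched as follows. This is an illustrative reconstruction (window length and synthetic correlation curve are assumptions): a running average smooths the correlation series, and a fault is flagged when the smoothed value falls more than 20% below its running peak:

```python
import numpy as np

def detect_drop(corr, window=5, drop=0.20):
    """Flag the first time index where the smoothed correlation falls more
    than `drop` (20% in the paper) below its running peak; None if never."""
    kernel = np.ones(window) / window
    smooth = np.convolve(corr, kernel, mode="valid")   # running average
    peak = smooth[0]
    for t, v in enumerate(smooth):
        peak = max(peak, v)
        if v < (1.0 - drop) * peak:
            return t + window - 1                      # index in the original series
    return None

# Synthetic correlation curve: steady at 0.30, dropping to 0.05 at t = 107
corr = np.concatenate([np.full(107, 0.30), np.full(20, 0.05)])
print(detect_drop(corr))                               # flags shortly after onset
```

The smoothing delays detection by roughly the window length, which is the trade-off between noise rejection and response time mentioned above.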
4.3. Fault isolation and recovery Fig. 8. Uncorralated sensors triggered during Activity 2.
Table 3. Hotelling T² statistic for different sample sizes.

History size    Hotelling T²    P-value
18              0.0015396       0.9999
13              0.034689        0.9911
10              0.19918         0.8956
8               0.31529         0.8141
4               0.5945          0.6413
If we can ascertain which sensor reading is anomalous (which sensor is faulty), then the failure can be masked. Given that we have models of the sensors, we can predict the expected sensor output value. Isolating the faulty sensor: the color maps shown in Fig. 11 give the difference between the correlation matrices at t = 100 and t = 114, i.e. before and after the fault. Color map (a) shows little variation (not measurable) under the no-fault condition, although small random fluctuations (noise) unrelated to faults can be expected. Greater variation is observed under the fault condition (b), with a significant color (difference) for sensors #2 and #9. The conclusion is that one of the two (sensor #2 or #9) may be faulty.
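The correlation-matrix differencing behind Fig. 11 can be sketched as below. It is a hypothetical illustration using plain Pearson correlation on synthetic six-sensor windows (the paper uses the CCA-derived landscape): subtracting the before/after correlation matrices highlights the sensor pair whose relationship changed:

```python
import numpy as np

def suspect_sensors(before, after, top=2):
    """Difference of the correlation matrices before and after a suspected
    fault; returns the sensors with the largest total change."""
    d = np.abs(np.corrcoef(before, rowvar=False) - np.corrcoef(after, rowvar=False))
    np.fill_diagonal(d, 0.0)
    return np.argsort(d.sum(axis=0))[::-1][:top]     # per-sensor change, largest first

rng = np.random.default_rng(2)

def window(fault=False):
    X = rng.normal(size=(200, 6))
    if not fault:                                    # sensors 1 and 2 normally track each other
        X[:, 2] = X[:, 1] + 0.1 * rng.normal(size=200)
    return X

suspects = suspect_sensors(window(), window(fault=True))
```

As in the paper, the differencing flags both partners of the broken relationship, so a further disambiguation step against a third sensor is still needed.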
Fig. 10. Correlation between sensors 2 & 9 during Activity 5: fault on sensor #9 @ Time = 107.
Fig. 12. Correlation curves for Sensors #2, #3, and #9 before and after failure.
To disambiguate between the two possible causes of failure, the correlation between each of the two suspects (#2 and #9) and a third sensor is investigated separately, e.g. between #2 and #3 and between #3 and #9. Sensor #9 is known to be active during the chosen activity. The results are shown in Fig. 12. From Fig. 12, it can be seen that there is no change for sensor #2 after the failure is detected, but there is a drop in correlation for sensor #9, which is indeed the faulty sensor.
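Once the faulty sensor is isolated, the masking step substitutes the model's expected value. A minimal sketch, assuming a simple linear model learnt from fault-free history (the helper names, the four-sensor data, and the least-squares model form are illustrative assumptions, not the paper's exact model):

```python
import numpy as np

def fit_mask_model(history, faulty, neighbours):
    """Least-squares model predicting a faulty sensor from correlated
    neighbours, learnt on fault-free historical data."""
    A = np.c_[history[:, neighbours], np.ones(len(history))]   # add intercept column
    coef, *_ = np.linalg.lstsq(A, history[:, faulty], rcond=None)
    return coef

def mask(reading, coef, neighbours):
    """Replace a flagged reading with the model's expected value."""
    return np.r_[reading[neighbours], 1.0] @ coef

rng = np.random.default_rng(3)
hist = rng.normal(size=(300, 4))
hist[:, 3] = 2.0 * hist[:, 0] - hist[:, 1] + 0.05 * rng.normal(size=300)
coef = fit_mask_model(hist, faulty=3, neighbours=[0, 1])
reading = np.array([0.5, -0.2, 0.0, 9.9])        # 9.9: implausible, flagged value
expected = mask(reading, coef, [0, 1])           # model-based replacement, near 1.2
```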
5. Discussion

The problem was to detect a single permanent fault and a transient fault on a sensor network and to mask any detected fault. Two techniques were employed, PCA and CCA. The CCA method has the advantage of detecting the fault without delay. Detection with PCA is achieved by extrapolating, i.e. assuming that the selected significant factors do not change by any significant amount in normal operation, other than small fluctuations. To test this, a model is built; the accuracy of the model employed for extrapolation is therefore important. A model based on historical data yields detection in no less than 4 sampling time units (Table 5).

Results have shown the feasibility of fault detection; the experiments described in this paper were performed on real data, but offline. For real-time operation, the data rates from the sensors as well as the computation time place additional constraints. All sensor modalities in the environment have very low output data rates. In addition, there is generally little or no activity on the sensors for long periods of time, followed by bursts. During pre-processing, the recorded sensor measurements were transformed from time-stamped values to a time series comprising a vector of dimension n, for n sensors. The sample rate thus dictates the real-time capability, together with the computation time. In the experiments described above, the sample rate was set to 10 s to cater for the short, quick bursts of some sensors, though oversampling many others. Ultimately the computational speed is the most important factor in determining the response time to a fault. The remaining issue is the concern that a faulty sensor can compromise the security of the system and/or the safety of the occupants. Fail-safe mechanisms operating at the hardware level can be used to deal with concerns over response time. Further work is needed to quantify the rate of detected faults in real time.

6. Conclusions

The main contribution of the work presented in this paper is addressing the issue of sensor failure and recovery for a relatively small network of sensors. The aim was to develop a methodology to deal with permanent and transient faults affecting one or more sensors in a small sensor network such as that found in a smart home, smart office building, or a somewhat larger public space such as a shopping mall or travel port. This paper described the results of the methodology applied to a smart home; however, the methodology can be generalised and scaled up to apply to a larger network of sensors. By transforming the data from the array of heterogeneous sensors into a single representation, subsequent analysis of these
Fig. 11. Difference color map – fault on sensor #9.
data is independent of the nature of the sensor type. In addition, the sensor system can be replaced by any type of system. For example the observations could be the direct output from any system. This work builds on methodologies to monitor and analyse (human) in an intelligent environment, making it possible to distinguish between anomalous (human) behaviour and sensor failure. References Abreu, R., Zoeteweij, P. & van Gemund, A. J. C. (2009). A Bayesian Approach to Diagnose Multiple Intermittent Faults. In Proceedings of international workshop on principles of diagnosis (pp. 27–33). Bauer, A., Botea, A., Grastien, A., Haslum, P. & Rintanen, J. (2011). Alarm processing with model-based diagnosis of discrete event systems. In Proceedings of the International Workshop on Principles of Diagnosis, (pp. 52–59). Chen, H., Jiang, G. & Yoshihira, K. (2006). Fault detection in distributed systems by representative subspace mapping. In Proceedings of the IEEE international conference on pattern recognition, vol. 4 (pp. 912–915). Chen, J., Kher, S., & Somani, A. (2006). Distributed Fault Detection of Wireless Sensor Networks. Proceedings of workshop on dependability issues in wireless ad hoc networks and sensor networks (DIWANS) La, CA (pp. 65–72). ACM. Dearden, R., Willeke, T., Simmons, R., Verma, V., Hutter, F., & Thrun, S. (2004). Realtime fault detection and situational awareness for rovers: Report on the Mars technology program task. In Proceedings of the IEEE aerospace conference, vol. 2 (pp. 826–840). Gao, J., Xu, Y., & Li, X. (2007). Weighted-median based distributed fault detection for wireless sensor networks. Journal of Software, 18, 1208–1217. Guenab, F., Weber, P., Theilliol, D., & Zhang, Y. (2011). Design of a fault tolerant control system incorporating reliability analysis and dynamic behaviour constraints. International Journal of Systems Science, 42(1), 219–233. Hai Li, Price, M. C., Stott, J. & Marshall, I. W. (2007). 
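The fault-detection scheme summarised above (a PCA model estimated from historical sensor readings, against which new observations are checked) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names, the use of the squared prediction error (Q-statistic) as the residual, and the fixed detection threshold are choices made here for clarity.

```python
import numpy as np

def fit_pca_model(X, k):
    """Fit a PCA model on historical sensor readings.
    X: (n_samples, n_sensors) array; k: number of retained components."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Principal directions via SVD of the centred data matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T  # (n_sensors, k) loading matrix
    return mu, P

def spe(x, mu, P):
    """Squared prediction error (Q-statistic) of a single reading x:
    squared distance from x to its projection onto the principal subspace.
    A reading that breaks the learned sensor-sensor correlations has a
    large SPE even if each individual value is in range."""
    xc = x - mu
    r = xc - P @ (P.T @ xc)  # residual off the principal subspace
    return float(r @ r)

def detect_faults(X_new, mu, P, threshold):
    """Flag incoming readings whose SPE exceeds a (hypothetical) threshold."""
    return np.array([spe(x, mu, P) > threshold for x in X_new])
```

A flagged reading can then be masked by replacing it with its reconstruction on the principal subspace, `mu + P @ (P.T @ (x - mu))`, which is one simple way to exploit the inter-sensor relationships for correction.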