Accepted Manuscript
A Bayesian Network Model for Data Losses and Faults in Medical Body Sensor Networks
Haibin Zhang, Jiajia Liu, Ai-Chun Pang
PII: S1389-1286(18)30493-6
DOI: 10.1016/j.comnet.2018.07.009
Reference: COMPNW 6539
To appear in:
Computer Networks
Received date: 6 September 2017
Revised date: 20 May 2018
Accepted date: 2 July 2018
Please cite this article as: Haibin Zhang, Jiajia Liu, Ai-Chun Pang, A Bayesian Network Model for Data Losses and Faults in Medical Body Sensor Networks, Computer Networks (2018), doi: 10.1016/j.comnet.2018.07.009
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
A Bayesian Network Model for Data Losses and Faults in Medical Body Sensor Networks
Haibin Zhang, Jiajia Liu¹
School of Cyber Engineering, Xidian University, No.2 South Taibai Road, Xi’an, Shaanxi, 710071, China.
Ai-Chun Pang
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Abstract
Medical body sensor networks (BSNs) are a promising and flexible platform for monitoring a person under natural physiological status. Due to limited resources, noise and unreliable links, sensor faults and data losses are common in BSNs. Most available works adopt schemes originating from traditional wireless sensor networks (WSNs) to detect faults and reconstruct data. However, these works either focus only on fault detection or fail to achieve satisfactory reconstruction accuracy due to the lack of information redundancy in BSNs. In light of this, a Bayesian network based data reconstruction scheme is proposed in this paper, which rebuilds data using conditional probabilities of body sensor readings to recover missing data and sensor faults, rather than the redundant information collected from a large number of sensors. Note that the limited number of sensors in BSNs significantly reduces the complexity of Bayesian learning and thus enables efficient structure and parameter estimation of the Bayesian network. Experiments on extensive online data sets have been conducted, and our results show that our scheme outperforms all available data reconstruction schemes.
Keywords: Reliability, Bayesian methods, fault detection, body sensor networks, medical diagnosis.
¹ Email addresses: [email protected], [email protected].
1. Introduction
In recent years, medical body sensor networks have gained increasing interest in both academia and industry [1, 2]. They are expected to monitor a person’s vital signs under natural physiological status using wireless sensors attached to or implanted in the body. Limited resources, noise, unreliable links and malicious attacks inevitably degrade the data quality of body sensor networks [3], causing sensor faults and data losses that make detection and repair important [4]. Poor data quality can heavily affect medical diagnosis, undermine the credibility of such monitoring applications and even threaten the life of the monitored person. Thus, data quality improvement is extremely important to ensure the reliability of body sensor data [5].
Fault detection [6, 7] in WSNs has been studied for many years, and several types of fault detection schemes have emerged [8]: (1) Statistical schemes, which assume that the data follow a statistical model. A data point is determined to be a fault if the probability that it was generated by the estimated model is very low. (2) Nearest neighbor based schemes [9], which diagnose a data point as a fault if it is located far from its neighbors. (3) Clustering based schemes [10], which identify a data point as a fault if it does not belong to any cluster or its cluster is too small. (4) Classification based schemes, which assign a data point to a class (normal/fault) using an estimated classification model.
Fault detection can only tell us which data may be faulty; it cannot restore a faulty value to its true value [11]. Data reconstruction is an important technique for recovering missing data and sensor faults, which usually uses spatial and temporal correlations to estimate the values of sensors in wireless sensor networks [12, 13]. Cover et al. [9] formalized a K nearest neighbor (KNN) method that can be used for data reconstruction. It uses the values of the nearest K neighbors to estimate the value of the missing data. Rajasegarar et al. [10] introduced a clustering based approach for data rebuilding, which groups similar data into clusters with similar behavior to estimate the values of sensors. Salem et al. [14] used linear regression to estimate the value of a sensor from its neighbours’ readings. Candes et al. [15] and Donoho [16] used a compressive sensing (CS) method to recover the whole data from a few training data. Kong et al. [12] used an environmental space time improved compressive sensing (ESTI-CS) method for data reconstruction, which recovers data by finding a data set that satisfies a predefined spatial-temporal correlation and minimizes the Euclidean distance to the diagnosed data set.
Most of the available works on data reconstruction for wireless sensor networks utilize redundant information to estimate the values of sensors. Because the number of body sensors is very limited, these works cannot provide satisfactory accuracy and are not yet suitable for body sensor networks.
A Bayesian network (BN) [17] based scheme learns a probabilistic graphical model from a set of training data and estimates a sensor value by calculating a conditional probability, which can aggregate readings from different sensors at different times to provide satisfactory estimation accuracy. There are few available works on data reconstruction for WSNs based on the Bayesian network model, as the large number of sensors in WSNs [18] makes the estimation too difficult. Janakiram et al. [19] and Elnahrawy et al. [20] established a framework for outlier detection in WSNs using a Bayesian network model. However, the Bayesian network they used only considered sensor readings, without the ground truth values. Krishnamachari et al. [21], Luo et al. [22] and Wu et al. [23] used distributed Bayesian algorithms for fault diagnosis of event region detection in WSNs. Each sensor they used to detect events possessed only a binary value.
In this paper, we are motivated to formalize a data reconstruction scheme
using Bayesian network model to recover missing data and sensor faults. The Bayesian network contains the ground truth values of vital signs and sensory readings, which enables us to do performance analysis of the data reconstruction scheme. The main contributions of this paper are summarized as follows.
• We formalize a Bayesian network model to capture the spatial and temporal correlations of body sensors, and provide the structure and parameter learning algorithms for the Bayesian network with training data. Based on this model, we first provide a data reconstruction scheme for data losses and then give a theoretical performance analysis of data reconstruction.
• We revise the data reconstruction scheme to rebuild sensor data containing faults based on a threshold, and provide a theoretical performance analysis to find the optimal threshold, which minimizes the fault rate after executing the data reconstruction process.
• We evaluate the performance of our data reconstruction scheme on an extensive online data set. The experimental results indicate that our scheme using the Bayesian network model outperforms other available schemes, such as KNN and ESTI-CS: it achieves the least deviation of the rebuilt data from the truth value, and reduces the fault rate to 35% of that before data reconstruction.
The rest of this paper is organized as follows. Section 2 introduces the Bayesian network model and the problem formulation. In Section 3, we formalize a data reconstruction scheme for data losses. Section 4 revises the data reconstruction scheme to recover sensor faults and provides theoretical performance analysis. In Section 5, we evaluate the performance of our scheme on an extensive online data set. Finally, Section 6 concludes this paper.
2. Problem Formulation and System Model
In this section, we first introduce the problem formulation, then define a Bayesian network model for data reconstruction and provide a Bayesian learning process using historical training data.
2.1. Problem Formulation
In medical body sensor networks, we use body sensors to monitor vital signs, e.g., heart activity, blood pressure, respiration rate and saturation of oxygen in the arterial blood, and transmit the collected data periodically to a server device, e.g., a smartphone or a PDA. Then, by wireless or wired connection, these data are streamed remotely to a medical doctor’s site for real time diagnosis, to a medical database for record keeping, or to the corresponding equipment that issues an emergency alert.
We use $Q = (Q_1, \cdots, Q_T)$ to describe the ground truth values of $n$ vital signs during $T$ time slots, where $Q_t = (q_{1t}, \cdots, q_{nt})$ is the set of ground truth values at time $t$:
$$Q = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1T} \\ q_{21} & q_{22} & \cdots & q_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ q_{n1} & q_{n2} & \cdots & q_{nT} \end{pmatrix}$$
We use $F = (F_1, \cdots, F_T)$ to describe the observations of the sensors used to monitor vital signs, where $F_t = (f_{1t}, \cdots, f_{nt})$ is the set of observations at time $t$:
$$F = \begin{pmatrix} f_{11} & f_{12} & \cdots & f_{1T} \\ f_{21} & f_{22} & \cdots & f_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nT} \end{pmatrix}$$
$f_{kt}$ is assigned the specific value 0 to indicate that the $k$th sensor reading is missing at time $t$. $f_{kt} \neq q_{kt}$ indicates that the $k$th sensor incorrectly reports the value of the vital sign at time $t$. There exist several types of data losses and faults that are caused by different factors [12]. Noise and collisions in BSNs may cause data losses and faults independently and randomly. Congestion may cause data losses and faults of adjacent sensor nodes during a period of time. Unreliable links, which are inevitable in BSNs, may cause frequent data losses and faults for some sensors. Damage or energy exhaustion may cause data losses and faults from a particular time slot onwards. The data reconstruction problem can be defined as follows.
• Data reconstruction problem. It is to rebuild the ground truth values of vital signs based on the gathered sensor readings.
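To make the matrix convention concrete, the following minimal sketch derives a reading matrix F from a ground truth matrix Q by randomly dropping readings (encoded as 0) and perturbing others. The function name `make_readings`, its parameters and the integer offset used for faults are illustrative assumptions, not part of the scheme described in this paper; the sketch also assumes that real vital-sign values are never 0, so that 0 can serve as the missing-data marker.

```python
import random

MISSING = 0  # marker for a lost reading; assumes true vital-sign values are nonzero


def make_readings(Q, loss_rate, fault_rate, max_offset=5, seed=0):
    """Return a reading matrix F for ground truth Q (n rows, T columns).

    Hypothetical helper: each entry is lost with probability loss_rate,
    faulty with probability fault_rate, and correct otherwise.
    """
    rng = random.Random(seed)
    F = []
    for row in Q:
        out = []
        for q in row:
            u = rng.random()
            if u < loss_rate:
                out.append(MISSING)  # data loss: f_kt = 0
            elif u < loss_rate + fault_rate:
                # fault: nonzero offset guarantees f_kt != q_kt
                offset = rng.randint(1, max_offset) * rng.choice((-1, 1))
                out.append(q + offset)
            else:
                out.append(q)  # correct reading: f_kt = q_kt
        F.append(out)
    return F
```

With `loss_rate=0.1` and `fault_rate=0.05`, roughly 10% of the entries of F would be 0 and roughly 5% would differ from Q, which corresponds to the random (independent) loss and fault type described above; the bursty and sensor-specific types would need a different sampling rule.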
A reconstructed matrix is a matrix $\bar{F} = (\bar{f}_{kt})_{n \times T}$ generated by data reconstruction of a sensor reading matrix $F$ to approximate the ground truth matrix $Q$.
2.2. Bayesian Network Model for Data Reconstruction
A Bayesian network is a pair $(G, \theta)$, where $G = (V, E)$ is a directed acyclic graph and $\theta$ is a set of parameters. An edge $e$ between two nodes in $V$ denotes a direct probabilistic relationship. A parameter on node $v \in V$ is a probability $P(v \mid \pi(v))$, where $\pi(v)$ is the parent set of $v$ [17]. If node $v$ has no parent, then the parameter on $v$ is $P(v)$.
We formalize the Bayesian network shown in Figure 1 to model the attributes of body sensors. In this model, $X_1, X_2, \cdots, X_n$ represent the current ground truth values of $n$ vital signs, and $Y_1, Y_2, \cdots, Y_n$ represent the current sensor readings of the $n$ vital signs. We select the first sensor as the diagnosed sensor. $X_{n+1}, X_{n+2}$ and $Y_{n+1}, Y_{n+2}$ respectively represent the ground truth values and sensor readings of the diagnosed vital sign at the previous and next times.
Suppose that each $X_k$ ($1 \le k \le n+2$) has $r_k$ possible values, each $Y_k$ has $g_k$ possible values, $\pi(X_k)$ has $u_k$ possible values, $X = \{X_1, \cdots, X_{n+2}\}$ has $w$ possible values, $Y = \{Y_1, \cdots, Y_{n+2}\}$ has $s_0$ possible values, and $Y' = \{Y_2, \cdots, Y_{n+2}\}$ has $s_1$ possible values. For convenience of description, we use $X_k = i$ ($1 \le i \le r_k$) to denote that $X_k$ is assigned its $i$th value, and similar expressions are used for $Y_k$, $\pi(X_k)$, $X$, $Y$ and $Y'$. In particular, $Y_k = 0$ expresses a missing sensor reading. Given a value $i$ of $X_k$ (or $Y_k$), we use $O(i)$ to denote the actual value that $i$ indexes. For example, if $X_k$ has the 4 possible values 5, 6, 7, 8, then $X_k = 3$ means that $X_k$ is currently assigned 7, the 3rd element of its possible values, so $O(3) = 7$. Given a value $j$ ($1 \le j \le s_0$) of $Y$, a rebuilt value of $j$ is a value that assigns a possible value to each $Y_k = 0$ while all sensors whose readings are not lost keep their original values; we use $S(j)$ to denote the set of all possible rebuilt values of $j$. Given a value $l$ ($1 \le l \le s_1$) of $Y'$, we use $S'(l)$ to denote the set of all possible rebuilt values of $l$. Given a value $i$ ($1 \le i \le w$) of $X$, we use $A(i, k)$ to denote the value $z$ of $\pi(X_k)$ such that the value of each $X_m \in \pi(X_k)$ in $z$ is the value of $X_m$ in $i$.

Figure 1: A Bayesian network for data reconstruction. $X_i$ ($1 \le i \le n$) describes the ground truth value of the $i$th vital sign, with $X_1$ being the diagnosed one. $Y_i$ represents the corresponding sensor reading. $X_{n+1}, X_{n+2}$ describe the ground truth values of the diagnosed vital sign at the previous and next times, and $Y_{n+1}, Y_{n+2}$ represent the corresponding sensor readings of the diagnosed vital sign. The broken circle denotes that the structure of the variables $X_1, \cdots, X_n$ inside it needs to be learned from training data.
In the following, we use $c^k_{ij}$ to denote the conditional probability on $X_k$:
$$c^k_{ij} = P(X_k = j \mid \pi(X_k) = i)$$
$b^k_{ij}$ to denote the observation probability for the $k$th vital sign:
$$b^k_{ij} = P(Y_k = j \mid X_k = i)$$
$a_{ij}$ to denote the transition probability on $X_{n+2}$:
$$a_{ij} = P(X_{n+2} = j \mid X_1 = i)$$
and $d_{ij}$ to denote the conditional probability $P(X_{n+1} = j \mid X_1 = i)$ on $X_{n+1}$, with
$$d_{ij} = \begin{cases} \dfrac{a_{ji}}{\sum_{l=1}^{r_1} a_{li}}, & \text{if } \sum_{l=1}^{r_1} a_{li} > 0 \\[2ex] \dfrac{1}{|r_1|}, & \text{otherwise} \end{cases}$$
Moreover, we use $I^k_i$ to denote the probability on $X_k$, which is the initial probability of the ground truth values of the $k$th vital sign, with $I^k_i = P(X_k = i)$.
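The piecewise definition of $d_{ij}$ can be sketched in a few lines. The function name `backward_transition` and the nested-list representation of the transition table are assumptions made for illustration only; indices are 0-based here, whereas the text counts values from 1.

```python
def backward_transition(a):
    """Compute d[i][j] from a transition table a, where a[i][j] stands for a_ij.

    d[i][j] = a[j][i] / sum_l a[l][i] when that column sum is positive,
    and 1 / r1 otherwise (r1 = number of possible values of X1).
    """
    r1 = len(a)
    d = [[0.0] * r1 for _ in range(r1)]
    for i in range(r1):
        col_sum = sum(a[l][i] for l in range(r1))
        for j in range(r1):
            if col_sum > 0:
                d[i][j] = a[j][i] / col_sum
            else:
                d[i][j] = 1.0 / r1  # no transition ever reaches value i
    return d
```

Each row of the returned table sums to 1, so it can be used directly as the conditional distribution of the previous-time value given the current one.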
2.3. Structure and Parameter Learning of Bayesian Network
To use the Bayesian network model for data reconstruction, the precondition is Bayesian learning with historical training data, which includes not only parameter learning but also structure learning. In addition, the training data for Bayesian learning may have missing values. We can use the supplemented expectation maximization (SEM) algorithm [24] to learn the Bayesian network from these incomplete training data. The SEM algorithm is divided into two steps: structure learning and parameter learning. For structure searching, the SEM algorithm uses expected sufficient statistics in place of the sufficient statistics that are unavailable, so that the scoring function can be decomposed, and finds a network structure with a higher score by local search. Then, the SEM algorithm finds the parameters with the maximum score on the selected Bayesian network structure.
Let $(G, \theta)$ be a Bayesian network, and $D = \{D_1, \cdots, D_m\}$ be a training data set. We define the BIC score as
$$\mathrm{BIC}(G, \theta \mid D) = \log P(D \mid G, \theta) - \frac{d(G)}{2} \log m$$
Suppose that $(\bar{G}, \bar{\theta})$ is a Bayesian network obtained by the SEM algorithm from an original Bayesian network, $D$ is the training data set with missing values, and $\bar{D}$ is a complete data set obtained from $D$ by repairing the missing values based on $(\bar{G}, \bar{\theta})$. Then the BIC score of $(G, \theta)$ on $\bar{D}$, written as $B(G, \theta \mid \bar{G}, \bar{\theta})$, is defined as
$$B(G, \theta \mid \bar{G}, \bar{\theta}) = \sum_{l=1}^{m} \sum_{X_l} P(X_l \mid D_l, \bar{G}, \bar{\theta}) \log P(D_l, X_l \mid G, \theta) - \frac{d(G)}{2} \log m$$
where $X_l$ is the set of variables without valuations in $D_l$. By Bayesian deduction, we obtain
$$B(G, \theta \mid \bar{G}, \bar{\theta}) = \sum_{k=1}^{n+2} \sum_{j=1}^{u_k} \sum_{i=1}^{r_k} \gamma^G_{kji} \log \theta_{kji} - \frac{d(G)}{2} \log m$$
where
$$\gamma^G_{kji} = \sum_{l=1}^{m} P(X_k = i, \pi_G(X_k) = j \mid D_l, \bar{G}, \bar{\theta})$$
$\pi_G(X_k)$ is the parent set of $X_k$ in $G$, and
$$\theta_{kji} = P(X_k = i \mid \pi_G(X_k) = j)$$

Algorithm 1 SEM($X$, $D$, $G_0$, $\theta_{0,0}$, $M$).
Require: $X$ is a set of variables, $D$ is a training data set with missing values, $G_0$ is the initial structure, $\theta_{0,0}$ is the initial parameters, $M$ is the number of steps for parameter optimization between two steps of structure optimization
Ensure: A Bayesian network
for $i = 0$ to $\infty$ do
    for $j = 0$ to $M - 1$ do
        $\theta_{i,j+1} = \arg\sup_{\theta} B(G_i, \theta \mid G_i, \theta_{i,j})$;
    end for
    $U = \{G \mid G$ is obtained from $G_i$ by adding, deleting or reversing an edge$\}$;
    $(G_{i+1}, \theta_{i+1,0}) = \arg\max_{G \in U} \sup_{\theta} B(G, \theta \mid G_i, \theta_{i,M})$;
    if $\mathrm{BIC}(G_{i+1}, \theta_{i+1,0} \mid D) \le \mathrm{BIC}(G_i, \theta_{i,M} \mid D)$ then
        return $(G_i, \theta_{i,M})$;
    end if
end for
Using training data $D$ with missing values, the SEM algorithm for finding the optimal Bayesian network $(G, \theta)$ from an initial Bayesian network is given as Algorithm 1.
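As a small illustration of the scoring function used during structure search, the sketch below evaluates the BIC penalty trade-off for complete data. It is not the SEM algorithm itself. Two assumptions are made here that the text does not state: $d(G)$ is read as the number of free parameters of the network (the usual BIC convention), and the helper names `multinomial_log_likelihood` and `bic_score` are hypothetical.

```python
import math


def multinomial_log_likelihood(counts):
    """Maximized log-likelihood of one discrete variable from its value counts."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def bic_score(log_likelihood, num_params, num_samples):
    """BIC = log P(D | G, theta) - d(G) / 2 * log m.

    num_params plays the role of d(G) (assumed: number of free parameters),
    num_samples plays the role of m.
    """
    return log_likelihood - 0.5 * num_params * math.log(num_samples)
```

For a fixed log-likelihood, a structure with more parameters receives a lower score, which is why the local search in Algorithm 1 stops adding edges once the likelihood gain no longer outweighs the penalty.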
3. Data Reconstruction for Data Losses
In this section, we first formalize the data reconstruction scheme for data losses and then provide theoretical performance analysis.
3.1. Data Reconstruction for Random Data Losses
We formalize a data reconstruction scheme for the scenario in which all sensors correctly measure the ground truth values of vital signs except for the missing data. For data reconstruction using the Bayesian network model, the first step is to estimate the parameters from historical training data. In the scenario where there is no fault in the sensor data, we can estimate the structure and the parameters $c^k_{ij}$, $a_{ij}$ and $d_{ij}$ from training data. Each $b^k_{ij}$ can be assigned directly as
$$b^k_{ij} = \begin{cases} 1, & \text{if } O(i) = O(j) \\ 0, & \text{if } O(i) \neq O(j) \end{cases}$$
The task of data reconstruction is to determine a ground truth value $H$ for the vital sign $X_1$ from the sensor readings $Y_1, \cdots, Y_{n+2}$, which contain missing values. For each possible ground truth value of the diagnosed vital sign, we first calculate the conditional probability of this ground truth value given the known sensor readings. We then select the ground truth value with the maximum conditional probability as the reconstructed data for the missing sensor reading. Given a value $i$ of $X_1$ and a value $j$ of $Y = \{Y_1, \cdots, Y_{n+2}\}$ with $Y_1 = j_1 \neq 0, \cdots, Y_{n+2} = j_{n+2} \neq 0$, we use $\alpha_{ij}$ to denote the expression
$$\alpha_{ij} = \sum_{k_2=1}^{r_2} \cdots \sum_{k_{n+2}=1}^{r_{n+2}} b^1_{i j_1} \cdots b^{n+2}_{k_{n+2} j_{n+2}} \, d_{i k_{n+1}} \, a_{i k_{n+2}} \prod_{l=1}^{n} c^l_{A(m,l)\, k_l}$$
where $k_1 = i$ and $m$ is the value of $X$ with $X_1 = i, X_2 = k_2, \cdots, X_{n+2} = k_{n+2}$. The following theorem gives a way to calculate the conditional probability used for data reconstruction.
Theorem 1. Given a value $j$ of sensor readings $Y$, the conditional probability of the estimated ground truth value $H$ for $X_1$ being a value $i$ under those sensor readings can be calculated as
$$P_{i|j} = P(H = i \mid Y = j) = \frac{\sum_{k \in S(j)} \alpha_{ik}}{\sum_{l=1}^{r_1} \sum_{k \in S(j)} \alpha_{lk}} \tag{1}$$
Proof: By the theory of Bayesian networks, the joint probability distribution of the Bayesian network in Figure 1 can be calculated as
$$P(X_1, \cdots, X_{n+2}, Y_1, \cdots, Y_{n+2}) = P(Y_1 \mid X_1) \cdots P(Y_{n+2} \mid X_{n+2}) \, P(X_{n+1} \mid X_1) \, P(X_{n+2} \mid X_1) \prod_{i=1}^{n} P(X_i \mid \pi(X_i))$$
Then we can calculate the probability $P(X_1, Y_1, \cdots, Y_{n+2})$ by variable elimination:
$$P(X_1, Y_1, \cdots, Y_{n+2}) = \sum_{X_2} \cdots \sum_{X_{n+2}} P(Y_1 \mid X_1) \cdots P(Y_{n+2} \mid X_{n+2}) \, P(X_{n+1} \mid X_1) \, P(X_{n+2} \mid X_1) \prod_{i=1}^{n} P(X_i \mid \pi(X_i))$$
The probability $P(H, Y_1, \cdots, Y_{n+2})$ can then be calculated as
$$P_{ij} = P(H = i, Y = j) = \sum_{k \in S(j)} P(H = i, Y = k) = \sum_{k \in S(j)} \alpha_{ik} \tag{2}$$
The probability $P(Y_1, \cdots, Y_{n+2})$ can be calculated by summing $P(H, Y_1, \cdots, Y_{n+2})$ over all values of $H$:
$$P_j = P(Y = j) = \sum_{l=1}^{r_1} P(H = l, Y = j) = \sum_{l=1}^{r_1} \sum_{k \in S(j)} \alpha_{lk} \tag{3}$$
By (2) and (3), we have
$$P_{i|j} = \frac{\sum_{k \in S(j)} \alpha_{ik}}{\sum_{l=1}^{r_1} \sum_{k \in S(j)} \alpha_{lk}}$$
□
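The posterior in (1) can be illustrated on a deliberately simplified network. The sketch below assumes a star structure in which every other variable (the other vital signs and the previous and next values of the diagnosed sign) depends only on $X_1$; in the paper the structure among $X_1, \cdots, X_n$ is learned, so this is an assumption made to keep the example short. The function names and the use of `None` for a missing reading (the text uses 0) are also illustrative choices. The reading $Y_1$ of the diagnosed sensor is taken to be the missing one and is therefore not passed in.

```python
def posterior_x1(prior, children, readings):
    """Posterior P(X1 = i | known readings) under an assumed star structure.

    prior[i]     : P(X1 = i)
    children[k]  : pair (cond, obs) with cond[i][x] = P(X_k = x | X1 = i)
                   and obs[x][y] = P(Y_k = y | X_k = x)
    readings[k]  : observed value index of Y_k, or None if that reading is lost
    """
    scores = []
    for i, p_i in enumerate(prior):
        score = p_i
        for (cond, obs), y in zip(children, readings):
            if y is None:
                continue  # summing over every rebuilt value contributes a factor 1
            # eliminate the hidden variable X_k
            score *= sum(cond[i][x] * obs[x][y] for x in range(len(obs)))
        scores.append(score)
    total = sum(scores)
    if total == 0:
        return [1.0 / len(prior)] * len(prior)  # readings impossible under the model
    return [s / total for s in scores]


def reconstruct(prior, children, readings):
    """Index of the value of X1 with the largest posterior probability."""
    post = posterior_x1(prior, children, readings)
    return max(range(len(post)), key=post.__getitem__)
```

Skipping a lost reading is the star-structure counterpart of summing $\alpha_{ik}$ over $k \in S(j)$: every possible rebuilt value is accounted for, and the corresponding factor sums to 1. When all readings are lost the posterior collapses to the prior, so the readings carry no information for the reconstruction.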
Given the estimated parameters of the Bayesian network and a value $j$ of all correlative sensor readings $Y_1, \cdots, Y_{n+2}$, we can calculate the probability of each possible value of $X_1$ by the Bayesian inference used in the proof of Theorem 1. Then we rebuild the missing data by the ground truth value $i$ that maximizes the probability $P_{i|j}$ in (1).
3.2. Data Reconstruction for Continuous Data Losses
Causes such as congestion may lead to continuous data losses of adjacent sensor nodes. If all sensor readings are continuously dropped during a period of time, then we cannot rebuild the missing data by the probability $P_{i|j}$: because all or most of $Y_1, \cdots, Y_{n+2}$ are missing, the probabilities $P_{i|j}$ for each possible value $i$ of $X_1$ are all the same. In this case, we can first rebuild the missing data from the existing sensor readings at the beginning of the missing period, and then rebuild the remaining missing data from these rebuilt ground truth values step by step.
Given $m$ sensors with missing readings, there will be $\prod_{k=1}^{m} r_{z_k}$ possible rebuilt values for $X_L = \{X_{z_1}, \cdots, X_{z_m}\}$. Suppose that $X_E = X \setminus (X_L \cup \{X_1\}) = \{X_{h_1}, \cdots, X_{h_N}\}$ ($N = n + 1 - m$), $l$ is a rebuilt value of $X_L$ with $X_{z_1} = l_1, \cdots, X_{z_m} = l_m$, and $j$ is a value of $Y$ with $Y_1 = j^*, Y_{h_1} = j_1, \cdots, Y_{h_N} = j_N, Y_{z_1} = j'_1, \cdots, Y_{z_m} = j'_m$. We use $\beta_{ijl}$ to denote the expression
$$\beta_{ijl} = \sum_{i_1=1}^{r_{h_1}} \cdots \sum_{i_N=1}^{r_{h_N}} b^1_{i j^*} \, b^{h_1}_{i_1 j_1} \cdots b^{h_N}_{i_N j_N} \, b^{z_1}_{l_1 j'_1} \cdots b^{z_m}_{l_m j'_m} \, d_{i k_1} \, a_{i k_2} \, c^1_{A(o,1)\, i} \prod_{k=1}^{N} c^{h_k}_{A(o,h_k)\, i_k} \prod_{k=1}^{m} c^{z_k}_{A(o,z_k)\, l_k}$$
where $o$ is the value of $X$ with $X_1 = i, X_{z_1} = l_1, \cdots, X_{z_m} = l_m, X_{h_1} = i_1, \cdots, X_{h_N} = i_N$, and $k_1$ ($k_2$) is the value of $X_{n+1}$ ($X_{n+2}$) in $X_L$ or $X_E$. Then we calculate a probability for the data reconstruction of $X_1$,
$$P_{i|jl} = P(H = i \mid Y = j, X_L = l) = \frac{\sum_{k \in S(j)} \beta_{ikl}}{\sum_{i'=1}^{r_1} \sum_{k \in S(j)} \beta_{i'kl}} \tag{4}$$
and rebuild the continuous missing data for $X_1$ by the ground truth value $i$ that maximizes the probability $P_{i|jl}$ in (4).
3.3. Performance Analysis of Data Reconstruction for Data Losses
To evaluate the performance of data reconstruction, a metric is defined as follows.
Definition 1. The deviation ratio (DR) is the metric used to measure the reconstruction deviation. We denote it by
$$\mathrm{DR} = \frac{\sqrt{\sum_{k,t:\, f_{kt} = 0} (\bar{f}_{kt} - q_{kt})^2}}{\sqrt{\sum_{k,t:\, f_{kt} = 0} (q_{kt})^2}}$$
Given sensor readings $j$ of $Y$, we use $R(j)$ to denote the reconstructed data $i$ that maximizes the probability $P_{i|j}$.
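Definition 1 translates directly into code. The following sketch computes DR over the entries that were missing in the reading matrix; the function name `deviation_ratio` and the list-of-rows matrix representation are assumptions made for illustration.

```python
import math


def deviation_ratio(Q, F, F_bar, missing=0):
    """DR over the positions (k, t) where the reading f_kt was missing.

    Q      : ground truth matrix
    F      : sensor reading matrix (missing entries marked with `missing`)
    F_bar  : reconstructed matrix
    """
    num = 0.0
    den = 0.0
    for q_row, f_row, r_row in zip(Q, F, F_bar):
        for q, f, r in zip(q_row, f_row, r_row):
            if f == missing:
                num += (r - q) ** 2
                den += q ** 2
    if den == 0:
        raise ValueError("no missing entries with nonzero ground truth")
    return math.sqrt(num) / math.sqrt(den)
```

A perfect reconstruction gives DR = 0, while filling every lost entry with 0 gives DR = 1, which provides a rough scale for reading the values reported later.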
Theorem 2. Given the estimated parameters of the Bayesian network, the deviation ratio can be calculated as
$$\mathrm{DR} = \frac{\sqrt{\sum_{i=1}^{r_1} \sum_{j=1}^{s_0} \sum_{l \in S(j)} \alpha_{il} \, (O(R(j)) - O(i))^2}}{\sqrt{\sum_{i=1}^{r_1} I^1_i \, (O(i))^2}} \tag{5}$$
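Proof: Suppose the rate of data losses is $p_l$. Then the number of missing data with the ground truth value $X_1 = i$ in $T$ time slots is
$$o_1 = p_l \cdot T \cdot P(X_1 = i)$$
in which the number of data with $Y = j$ is
$$o_2 = o_1 \cdot P(Y = j \mid X_1 = i)$$
Since $o_1 = p_l \cdot T \cdot I^1_i$ and $o_2 = p_l \cdot T \cdot P(X_1 = i, Y = j) = p_l \cdot T \sum_{l \in S(j)} \alpha_{il}$, the common factor $p_l \cdot T$ cancels, and DR can be calculated as
$$\mathrm{DR} = \frac{\sqrt{\sum_{i=1}^{r_1} \sum_{j=1}^{s_0} o_2 \, (O(R(j)) - O(i))^2}}{\sqrt{\sum_{i=1}^{r_1} o_1 \, (O(i))^2}} = \frac{\sqrt{\sum_{i=1}^{r_1} \sum_{j=1}^{s_0} \sum_{l \in S(j)} \alpha_{il} \, (O(R(j)) - O(i))^2}}{\sqrt{\sum_{i=1}^{r_1} I^1_i \, (O(i))^2}}$$
□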
The metric DR can distinguish the performance of different data reconstruction schemes. A better scheme should have a smaller DR. However, if the sensor readings contain faults, then the proposed data reconstruction scheme and the metric DR are no longer applicable. We formalize a revised data reconstruction scheme for sensor data containing faults in the following section.
4. Data Reconstruction for Sensor Data Containing Faults
The gathered readings of body sensors may not only have missing values, but also contain faults. In this scenario, we can first detect the faults using the data reconstruction scheme proposed above, and then rebuild all faults as the estimated values.
4.1. Sensor Faults Rebuilding
In the scenario where sensor data contain faults, a sensor reading may not be equal to the ground truth value of the vital sign. Thus, the training data used to estimate the structure and parameters of the Bayesian network model are not complete: they are composed only of sensor readings, without the ground truth values of vital signs. To estimate the structure and parameters of the Bayesian network, we can use Algorithm 1 after completing each sample by adding ground truth values and weights of vital signs.
Given sensor readings $j$ of $Y$ with $Y_1 = j_1$ and a threshold $0 \le \delta \le 1$, if the estimated value $R(j)$ for $X_1$ obtained by calculating the probability $P_{i|j}$ in (1) satisfies
$$|O(R(j)) - O(j_1)| > \delta \tag{6}$$
then the sensor reading $j_1$ is diagnosed as a fault and rebuilt as the estimated value $R(j)$. Otherwise, $j_1$ is used as the rebuilt value of the sensor reading, which is considered to correctly report the ground truth value of the vital sign.
4.2. Performance Analysis for Optimal Threshold Scheme
Let $TF$ be the number of faults correctly rebuilt, $FF$ be the number of faults that are still faults after reconstruction, $FC$ be the number of correct data that are rebuilt to faults, and $TC$ be the number of correct data that are rebuilt as themselves. To evaluate the performance of the data reconstruction for sensor faults and to find the optimal threshold, three metrics are defined as follows.
Definition 2. The fault rate after reconstruction (FRR) is the proportion of faults among the total number of sensory data after reconstruction. We denote it by
$$\mathrm{FRR} = \frac{FF + FC}{TF + TC + FF + FC}$$
Definition 3. The true fault reconstruction rate (TFRR) is the proportion of faults that are correctly rebuilt. We denote it by
$$\mathrm{TFRR} = \frac{TF}{TF + FF}$$
Definition 4. The false reconstruction rate for correct data (FRRC) is the proportion of correct data that are incorrectly rebuilt to faults. We denote it by
$$\mathrm{FRRC} = \frac{FC}{FC + TC}$$
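The threshold rule in (6) and the three metrics can be sketched together. The sketch works on actual values rather than value indices, and it treats a rebuilt value as correct only when it equals the ground truth within a tolerance `tol` (default 0); both choices, and the function names, are assumptions made for illustration.

```python
def rebuild_with_threshold(reading, estimate, delta):
    """Rule (6): replace the reading by the estimate if they differ by more than delta."""
    return estimate if abs(estimate - reading) > delta else reading


def fault_metrics(truth, readings, rebuilt, tol=0):
    """Count TF, FF, FC, TC and derive FRR, TFRR and FRRC."""
    tf = ff = fc = tc = 0
    for q, f, r in zip(truth, readings, rebuilt):
        was_fault = f != q             # the reading was faulty
        is_fault = abs(r - q) > tol    # the rebuilt value is still wrong
        if was_fault and not is_fault:
            tf += 1
        elif was_fault and is_fault:
            ff += 1
        elif is_fault:
            fc += 1
        else:
            tc += 1
    total = tf + ff + fc + tc
    return {
        "FRR": (ff + fc) / total if total else 0.0,
        "TFRR": tf / (tf + ff) if tf + ff else 0.0,
        "FRRC": fc / (fc + tc) if fc + tc else 0.0,
    }
```

For example, with ground truth 70 at four time slots, readings 70, 90, 90, 70, estimates 71, 70, 88, 80 and threshold 5, the rebuilt values are 70, 70, 90, 80: one fault is repaired, one fault is missed, and one correct reading is turned into a new fault, so all three metrics equal 0.5.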
An efficient data reconstruction scheme should have a high TFRR, as the faults that are not correctly rebuilt will be left behind. In addition, it should have a small FRRC, as correct data may be rebuilt into newly produced faults.
For theoretical performance analysis, we define two functions. Given sensor readings $j$ of $Y$ with $Y_1 = j_1$, a ground truth value $i$ of $X_1$ and a threshold $\delta$,
$$L(j) = \begin{cases} 1, & \text{if } |O(R(j)) - O(j_1)| \ge \delta \\ 0, & \text{otherwise} \end{cases}$$
and
$$L'(j) = \begin{cases} 1, & \text{if } |O(R(j)) - O(j_1)| \ge \delta \text{ and } |O(R(j)) - O(i)| \le \delta \\ 0, & \text{otherwise} \end{cases}$$
To calculate the theoretical TFRR and FRRC, let us first consider the probability $P(Y' \mid Y_1, X_1)$. Given a value $l$ of $Y' = \{Y_2, \cdots, Y_{n+2}\}$, we have
$$P_{l|ki} = P(Y' = l \mid Y_1 = k, X_1 = i) = \frac{P(Y' = l, Y_1 = k, X_1 = i)}{P(Y_1 = k, X_1 = i)} = \frac{\sum_{m \in S(j)} \alpha_{im}}{\sum_{l'} \alpha_{ij'}}$$
where $j$ is a value of $Y$ with $Y_1 = k$, $Y' = l$, and $j'$ is a value of $Y$ with $Y_1 = k$, $Y' = l'$.
Theorem 3. Given the estimated parameters of the Bayesian network and the fault rate $p$ of the diagnosed sensor, TFRR can be calculated as
$$\mathrm{TFRR} = \sum_{i=1}^{r_1} \sum_{k(\neq i)=1}^{g_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L'(j) \sum_{j' \in S(j)} \alpha_{ij'}}{p \cdot \sum_{l' \in S'(l)} \alpha_{im}} \tag{7}$$
and FRRC as
$$\mathrm{FRRC} = \sum_{i=1}^{r_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L(j) \sum_{j' \in S(j)} \alpha_{ij'}}{(1 - p) \cdot \sum_{l' \in S'(l)} \alpha_{im}} \tag{8}$$
where $j$ is a value of $Y$ with $Y_1 = k$, $Y' = l$, and $m$ is a value of $Y$ with $Y_1 = k$, $Y' = l'$. $k(\neq i)$ in TFRR denotes $O(k) \neq O(i)$, and $k$ in FRRC satisfies $O(k) = O(i)$.
Proof: Suppose that the number of received sensor data is $T$. Then the number of diagnosed sensor readings with the ground truth value $X_1 = i$ is
$$o_1 = T \cdot P(X_1 = i)$$
The number of faulty sensor readings with $Y_1 = k$ and $O(k) \neq O(i)$ in those $o_1$ readings is
$$o_2 = o_1 \cdot P(Y_1 = k \mid X_1 = i)$$
The number of faulty sensor readings with $Y' = l$ in those $o_2$ readings is
$$o_3 = o_2 \cdot P(Y' = l \mid X_1 = i, Y_1 = k)$$
in which the number of faults that can be detected and correctly rebuilt is
$$o_4 = o_3 \cdot L'(j)$$
where $j$ is a value of $Y$ with $Y_1 = k$, $Y' = l$. Then we can calculate TFRR as
$$\mathrm{TFRR} = \frac{\sum_{i=1}^{r_1} \sum_{k(\neq i)=1}^{g_1} \sum_{l=1}^{s_1} o_4}{p \cdot T} = \frac{\sum_{i=1}^{r_1} \sum_{k(\neq i)=1}^{g_1} \sum_{l=1}^{s_1} L'(j) \, P(X_1 = i) \, P(Y_1 = k \mid X_1 = i) \, P_{l|ki}}{p} = \sum_{i=1}^{r_1} \sum_{k(\neq i)=1}^{g_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L'(j) \sum_{j' \in S(j)} \alpha_{ij'}}{p \cdot \sum_{l' \in S'(l)} \alpha_{im}}$$
In the previous $o_1$ sensor readings with $X_1 = i$, the number of correct data with $Y' = l$ is
$$o_5 = o_1 \cdot P(Y' = l \mid Y_1 = k(= i), X_1 = i)$$
where $k(= i)$ denotes $O(k) = O(i)$, and the number of sensor readings in $o_5$ that are incorrectly diagnosed as faults is
$$o_6 = o_5 \cdot L(j)$$
where $j$ is a value of $Y$ with $Y_1 = k$, $Y' = l$, and the FRRC can be calculated as follows.
Figure 2: Performance comparison between our scheme (BN) and the available data reconstruction schemes (KNN, ESTI-CS) for data losses. Both panels plot the deviation ratio against the missing data and fault rates of heart rate. (a) Comparison of deviation ratio on a dataset of 9000 groups of medical data. (b) Comparison of deviation ratio on a dataset of 5000 groups of medical data.
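$$\mathrm{FRRC} = \frac{\sum_{i=1}^{r_1} \sum_{l=1}^{s_1} o_6}{(1 - p) \cdot T} = \frac{\sum_{i=1}^{r_1} \sum_{l=1}^{s_1} L(j) \, P(X_1 = i) \, P_{l|ki}}{1 - p} = \sum_{i=1}^{r_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L(j) \sum_{j' \in S(j)} \alpha_{ij'}}{(1 - p) \cdot \sum_{l' \in S'(l)} \alpha_{im}}$$
□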
After executing the data reconstruction process, there exist three types of faults: faults that are not detected, detected faults that are incorrectly rebuilt, and faults that are newly introduced by incorrect reconstruction. FRR can be calculated as
$$\mathrm{FRR} = p \cdot (1 - \mathrm{TFRR}) + (1 - p) \cdot \mathrm{FRRC} \tag{9}$$
The last question is how to select the threshold, that is, how to find a threshold $\delta$ that minimizes FRR. FRR in (9) is a function of the threshold $\delta$, so finding the optimal threshold amounts to minimizing the function $\mathrm{FRR}(\delta)$. It is very difficult to obtain the minimum value of $\mathrm{FRR}(\delta)$ directly. Here, we provide an approximate process to find the optimal threshold. We first select a discrete threshold set $\Theta$ and then calculate the FRR for each $\delta \in \Theta$. The threshold whose FRR is the smallest is selected as the optimal threshold.
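The approximate threshold search described above amounts to a scan over a finite candidate set. In the sketch below, FRR is computed from the fault rate p, TFRR and FRRC (the false reconstruction rate for correct data) as in (9); the callback `rates_for`, which returns the pair (TFRR, FRRC) for a given threshold, is a hypothetical stand-in for either the theoretical formulas or a simulation run.

```python
def frr_from_rates(p, tfrr, frrc):
    """FRR = p * (1 - TFRR) + (1 - p) * FRRC."""
    return p * (1 - tfrr) + (1 - p) * frrc


def best_threshold(thresholds, p, rates_for):
    """Return the candidate threshold with the smallest FRR.

    thresholds : iterable of candidate values (the discrete set Theta)
    p          : fault rate of the diagnosed sensor
    rates_for  : hypothetical callable, delta -> (TFRR, FRRC)
    """
    return min(thresholds, key=lambda delta: frr_from_rates(p, *rates_for(delta)))
```

The quality of the result depends on how finely the candidate set covers the range of thresholds: a small threshold tends to repair more faults but also to alter more correct readings, and the scan picks the best trade-off among the candidates only.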
5. Numerical Results
In this section, we evaluate the performance of our data reconstruction scheme for data losses and sensor faults, compared with other available schemes, using a simulator developed in C++.
5.1. Experimental Methodology
5.1.1. Ground Truth Values
We use two data sequences, of 9000 and 5000 time slots respectively, each containing 3 attributes (mean blood pressure, heart rate and oxygen saturation), as the ground truth values of vital signs. These data are selected from an online medical dataset of the PhysioNet database [25].
5.1.2. Artificial Data Losses and Sensor Faults
For the 9000 groups of online medical data, we select 8000 as the training data to estimate the Bayesian network, and 1000 as diagnosed data. For the 5000 groups of online medical data, we select 4800 as the training data and 200 as diagnosed data. For the simulation of data reconstruction for data losses, we artificially delete some of the diagnosed data to produce missing data. For the simulation of data reconstruction for sensor faults, we use injected faults, which are obtained by artificially modifying some of the diagnosed data. The locations of the injected faults and missing data, and the values of the injected faults, are all selected at random.
5.1.3. Experiment Procedure The procedure of this experiment is: (1) Learn the structure and parameters
AC
of the Bayesian network using the historical training data. (2) Generate the ground truth matrix Q according to the diagnosed data. (3) Generate the sensor
280
reading matrix F according to the ground truth matrix Q, artificial data losses and sensor faults. (4) Construct the rebuilt matrix F¯ by the data reconstruction
scheme. (5) Calculate the metrics DR, FRR, TFRR and FRRC.
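Step (5) can be illustrated with the Python sketch below. The formal metric definitions are given earlier in the paper; here we assume simplified readings of them: a value counts as a fault if it deviates from the ground truth by more than a tolerance `tol`, FRR is the share of all entries that are faults after reconstruction, TFRR is the share of original faults that are rebuilt correctly, and FRRC is the share of originally correct entries that become faults. The function name, `tol`, and the flattened data layout are hypothetical, and DR is omitted.

```python
from typing import Sequence, Tuple


def reconstruction_metrics(
    truth: Sequence[float],
    readings: Sequence[float],
    rebuilt: Sequence[float],
    tol: float = 0.0,
) -> Tuple[float, float, float]:
    """Return (FRR, TFRR, FRRC) under the assumed definitions."""

    def is_fault(value: float, reference: float) -> bool:
        return abs(value - reference) > tol

    n = len(truth)
    faulty = [k for k in range(n) if is_fault(readings[k], truth[k])]
    correct = [k for k in range(n) if not is_fault(readings[k], truth[k])]

    frr = sum(is_fault(rebuilt[k], truth[k]) for k in range(n)) / n
    tfrr = (sum(not is_fault(rebuilt[k], truth[k]) for k in faulty) / len(faulty)
            if faulty else 0.0)
    frrc = (sum(is_fault(rebuilt[k], truth[k]) for k in correct) / len(correct)
            if correct else 0.0)
    return frr, tfrr, frrc


# Made-up example: 4 of 10 readings are faults; 3 are repaired, 1 correct value is damaged.
truth = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
readings = [0.0, 0.0, 0.0, 0.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
rebuilt = [10.0, 20.0, 30.0, 0.0, 50.0, 60.0, 70.0, 80.0, 90.0, 0.0]
frr, tfrr, frrc = reconstruction_metrics(truth, readings, rebuilt)
print(frr, tfrr, frrc)
```

Under these assumed definitions, FRR = p(1 − TFRR) + (1 − p)FRRC for a fault rate p. With the values reported in Table 1 for p = 0.40 (TFRR 71.6%, FRRC 1.22%), this gives about 12.1%, which agrees with the reported FRR.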
To verify the effectiveness of our data reconstruction scheme, we evaluate the performance of two classic methods, ESTI-CS [12] and KNN [9], for comparison. Given a sensor reading matrix F, ESTI-CS is an algorithm that finds a ground truth matrix Q minimizing ||F − Q||, where || · || is the Frobenius norm used to measure the deviation between F and Q. The KNN method uses the values of the K nearest neighbors to estimate the ground truth value of missing data.

Three series of experiments are conducted in this section. The first two evaluate the performance of the different methods in data reconstruction for data losses and for sensor faults. The third verifies the agreement between our theoretical prediction and the simulation results.

5.2. Performance Comparisons for Data Losses

To evaluate the performance of our data reconstruction scheme for data losses, we first artificially delete some data in the 1000 and 200 groups of diagnosed data, with the missing rates of mean blood pressure and oxygen saturation fixed at 10%; we then rebuild the data and calculate the metric DR. For comparison, we evaluate the performance of the ESTI-CS and KNN methods. The K in KNN is set to 3 in our simulation, which yields a better DR.

Figure 2 shows that our data reconstruction scheme has the best performance. Even when 50% of the data have been lost, our scheme can rebuild the data with a DR of less than 8%. In the simulation on 9000 groups of medical data, the DRs of ESTI-CS and KNN are close to 15%. In the simulation on 5000 groups of medical data, the DRs of ESTI-CS and KNN are both over 26%. Our scheme thus outperforms the other schemes in data reconstruction. However, when the sensor data contain faults, direct data reconstruction and the metric DR are not applicable. We conduct another simulation for data reconstruction of sensor faults in the following subsection.

5.3. Performance Comparisons for Data Faults

To evaluate the performance of our data reconstruction scheme for sensor data containing faults, we artificially delete data and inject faults into the 1000 and 200 groups of diagnosed data, with the missing data rate and fault rate of mean blood pressure and oxygen saturation fixed at 3% and 6%, respectively.

Figure 3: Performance comparison between our scheme and the available data reconstruction schemes on 9000 groups of medical data. (a) Comparison of FRR: fault rate after reconstruction. (b) Comparison of TFRR: true fault reconstruction rate. (c) Comparison of FRRC: false reconstruction rate for correct data. (Each panel plots KNN, ESTI-CS and BN against the missing data and fault rates of heart rate.)

To verify the effectiveness of our scheme, ESTI-CS and KNN are also selected for comparison. In this simulation, the threshold δ for ESTI-CS and KNN is set to 3, which approximately yields the smallest FRR, and the missing data rate is set to 50% of the sensor fault rate.
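For reference, the KNN baseline estimates a missing value from its K nearest neighbors. The Python sketch below shows one possible reading of that idea with K = 3. The neighbor definition (complete time slots ranked by Euclidean distance over the attributes that are observed) and the averaging rule are assumptions made for illustration; the paper does not specify these details, and the sample values are made up.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill None entries with the mean of the k nearest complete rows.

    Assumes at least one row in `data` has no missing entries.
    """
    complete = [row for row in data if all(v is not None for v in row)]
    filled: List[Row] = []
    for row in data:
        observed = [j for j, v in enumerate(row) if v is not None]
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            filled.append(list(row))
            continue

        def dist(other: Row) -> float:
            # Distance over the attributes observed in the current row only.
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbors = sorted(complete, key=dist)[:k]
        new_row = list(row)
        for j in missing:
            new_row[j] = sum(nb[j] for nb in neighbors) / len(neighbors)
        filled.append(new_row)
    return filled


# Columns: mean blood pressure, heart rate, oxygen saturation (made-up values).
samples: List[Row] = [
    [80.0, 70.0, 97.0],
    [82.0, 72.0, 98.0],
    [81.0, 71.0, 97.0],
    [120.0, 110.0, 90.0],
    [None, 71.0, 97.0],
]
print(knn_impute(samples, k=3)[4])
```

In this example the three nearest complete rows are the first three, so the outlying fourth row does not influence the estimate.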
Figure 3 gives simulation results on the 9000 groups of medical data. Figure 3(a) shows that our scheme has the best FRR, with the number of faults reduced to 35% of the faults before data reconstruction. Figure 3(b) indicates that our scheme has the best TFRR, while ESTI-CS and KNN have a very small TFRR, less than 30%. From our simulation, we find that the fault detection rates (the proportion of faults that are correctly identified as such) of ESTI-CS and KNN are both between 75% and 80%, but the false reconstruction rate for detected faults (the proportion of detected faults that are incorrectly rebuilt) is around 65% for KNN and over 50% for ESTI-CS. Thus, the proportion of faults that are correctly rebuilt by ESTI-CS and KNN is very small. Figure 3(c) shows that our scheme has the smallest FRRC. The FRRC of ESTI-CS and KNN is very large, even exceeding the fault rate. The simulation indicates that a large number of faults are left or newly produced after data reconstruction. This leads to a very large fault rate, which is even larger than that before fault reconstruction.

Figure 4: Performance comparison between our scheme and the available data reconstruction schemes on 5000 groups of medical data. (a) Comparison of FRR: fault rate after reconstruction. (b) Comparison of TFRR: true fault reconstruction rate. (c) Comparison of FRRC: false reconstruction rate for correct data. (Each panel plots KNN, ESTI-CS and BN against the missing data and fault rates of heart rate.)

Figure 4 gives simulation results of another experiment on the 5000 groups of medical data, which shows similar performance to the first simulation on 9000 groups of data. Figure 4(b) indicates that our scheme has a better TFRR, which is over 75%. The TFRR of ESTI-CS and KNN is still small. The simulation tells us that the fault detection rates of ESTI-CS and KNN are over 95% and 90%, respectively. The false reconstruction rates for detected faults of ESTI-CS and KNN are around 80% and 70%, respectively. We can conclude that the high FRR of ESTI-CS and KNN is caused by inefficient data reconstruction rather than by poor fault detection, as confirmed by their large FRRC and small TFRR. The results of this simulation are averaged over 200 randomized experiments. We also calculate the confidence intervals at a 95% confidence level. Table 1 shows the average values and the confidence intervals of FRR, TFRR and FRRC for different fault rates; the lower and upper confidence limits are very close to the average values.

Table 1: The confidence intervals of the simulation with 5000 groups of data.

Fault probability                0.05         0.10         0.15         0.20
FRR(%)   The average             1.16         2.09         2.88         4.39
         Confidence interval     [1.07,1.25]  [1.94,2.23]  [2.70,3.06]  [4.15,4.62]
TFRR(%)  The average             86.7         84.6         84.1         80.8
         Confidence interval     [84.9,88.4]  [83.2,85.9]  [83.0,85.3]  [80.0,81.9]
FRRC(%)  The average             0.51         0.60         0.64         0.68
         Confidence interval     [0.48,0.56]  [0.54,0.67]  [0.51,0.67]  [0.59,0.77]

Fault probability                0.25         0.30         0.35         0.40
FRR(%)   The average             5.94         7.32         9.71         12.1
         Confidence interval     [5.69,6.20]  [7.06,7.59]  [9.41,10.0]  [11.7,12.5]
TFRR(%)  The average             78.6         77.7         73.9         71.6
         Confidence interval     [77.6,79.6]  [76.8,78.6]  [73.1,74.8]  [70.7,72.4]
FRRC(%)  The average             0.79         0.90         0.95         1.22
         Confidence interval     [0.68,0.90]  [0.79,1.02]  [0.77,1.03]  [1.05,1.39]
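A 95% confidence interval for an averaged metric can be computed as in the Python sketch below. The paper does not state which interval formula was used, so the normal approximation (mean ± 1.96 s/√n) is an assumption here, and the 200 per-run FRR values are synthetic numbers generated only for the example.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def mean_confidence_interval(
    samples: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal approximation."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - half_width, mean + half_width


# Synthetic per-run FRR values for 200 randomized experiments (not the paper's data).
rng = random.Random(0)
runs = [rng.gauss(0.12, 0.03) for _ in range(200)]
mean, lower, upper = mean_confidence_interval(runs)
print(f"mean={mean:.4f}, 95% CI=[{lower:.4f}, {upper:.4f}]")
```

With 200 runs the half-width shrinks in proportion to 1/√200, which is why intervals of this kind sit close to the average.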

5.4. Theoretical Prediction Against Simulation Results

The performance of the proposed data reconstruction scheme for data containing faults is greatly affected by the threshold, and it is difficult to select an appropriate threshold. In the previous section, we gave a process to calculate the theoretical optimal threshold. Is the calculated optimal threshold truly optimal? To verify this, we calculate the theoretical optimal threshold and the theoretical performance for the experiments in the previous subsection, and then compare the results with those of the simulation.

The performances of the proposed scheme in Figure 3 are simulation results under the optimal thresholds, which are obtained as follows: (1) For each fault rate p, we select Θ = {1, ..., 10} as the discrete threshold set. (2) For each threshold δ ∈ Θ, we conduct 300 simulations. In each simulation, we first inject faults at random, with the number of injected faults determined by the fault rate p. Then we reconstruct the data using the proposed scheme and calculate the metric FRR. The performance is the average over those 300 randomized simulations. (3) We select the threshold δ that minimizes the simulated FRR as the optimal threshold. On the other hand, we can calculate the theoretical optimal threshold with the Bayesian network estimated from the historical training data. For the experiment on 9000 groups of data, the optimal thresholds obtained by the simulation and by the theoretical algorithm are both 3 for fault rates from 5% to 40%. The performances of both the simulations and the theoretical predictions of the proposed data reconstruction scheme under the optimal thresholds are shown in Figure 5, which shows that the simulation results and the theoretical predictions match well.

Body sensors usually rely on batteries for power supply, so energy consumption is one of their major problems. Data collection and transmission are the two main sources of energy consumption for body sensors. Using the data reconstruction scheme based on the Bayesian network, we can lengthen the sampling interval to extend battery life. If we need data between two sampling points, we can substitute data reconstructed by the Bayesian network.

Figure 5: Theoretical prediction against simulation results. (a) Comparison of FRR: fault rate after reconstruction, and TFRR: true fault reconstruction rate. (b) Comparison of FRRC: false reconstruction rate for correct data. (Each panel plots theoretical and simulated results against the missing data and fault rates of heart rate.)

6. Conclusions

The medical body sensor network is used for personal monitoring under natural physiological status. Limited resources, noise, unreliable links and malicious attacks lead to poor data quality, which can heavily affect the monitoring effect or even threaten the life of the monitored person. In this paper, we formalize a Bayesian network based data reconstruction scheme for body sensor networks. We first formalize a data reconstruction scheme for data losses and then give a theoretical performance analysis. Further, we revise our data reconstruction scheme to rebuild data for sensor faults. Finally, we evaluate the performance of our data reconstruction scheme on an online dataset; the results indicate that our scheme outperforms the available data reconstruction schemes used for comparison.

Acknowledgement

This work has been supported in part by the National Natural Science Foundation of China (No. 61771373, 61771374, 61601357), the China 111 Project (No. B16037), the Fundamental Research Fund for the Central Universities (No. JB181508, JB171501, JB181506, JB181507), and the “13th Five-Year” Plan Equipment Pre-Research Foundation of China (No. 6140134040216HT76001).

References

[1] X. Cao, J. Chen, Y. Zhang and Y. Sun. “Development of an integrated wireless sensor network micro-environmental monitoring system,” ISA Transactions, vol. 47, no. 3, 2008, pp. 247-255.
[2] H. Zhang, J. Liu and N. Kato. “Threshold tuning based wearable sensor fault detection for reliable medical monitoring using Bayesian network model,” IEEE Systems Journal, 2016, DOI: 10.1109/JSYST.2016.2600582.
[3] D. Zeng, P. Li, S. Guo, T. Miyazaki, J. Hu and Y. Xiang. “Energy minimization in multi-task software-defined sensor networks,” IEEE Transactions on Computers, vol. 64, no. 11, 2015, pp. 3128-3139.
[4] K. Zhang, K. Yang, X. Liang, Z. Su, X. Shen and H. H. Luo. “Security and privacy for mobile healthcare networks: from a quality of protection perspective,” IEEE Wireless Communications, vol. 22, no. 4, 2015, pp. 104-112.
[5] O. Salem, Y. Liu and A. Mehaoua. “Anomaly detection in medical wireless sensor networks,” Journal of Computing Science and Engineering, vol. 7, no. 4, 2013, pp. 272-284.
[6] H. Zhang, J. Liu and R. Li. “Fault detection for medical body sensor networks under Bayesian network model,” in Proc. IEEE MSN, 2015, pp. 37-42.
[7] K. Xing, S. Zhang, L. Shi, H. Zhu and Y. Wang. “A localized backbone renovating algorithm for wireless ad hoc and sensor networks,” in Proc. IEEE INFOCOM, 2013.
[8] Y. Zhang, N. Meratnia and P. Havinga. “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, 2010, pp. 159-170.
[9] T. Cover and P. Hart. “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, 1967, pp. 21-27.
[10] S. Rajasegarar, C. Leckie, M. Palaniswami and J. C. Bezdek. “Distributed anomaly detection in wireless sensor networks,” in Proc. IEEE ICCS, 2006.
[11] H. Zhang, J. Liu and A. Pang. “A data reconstruction model addressing loss and faults in medical body sensor networks,” in Proc. IEEE GLOBECOM, 2016, DOI: 10.1109/GLOCOM.2016.7841491.
[12] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu and X. Liu. “Data loss and reconstruction in sensor networks,” in Proc. IEEE INFOCOM, 2013.
[13] J. Liu and N. Kato. “A Markovian analysis for explicit probabilistic stopping based information propagation in post-disaster ad hoc mobile networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 1, 2015, pp. 81-90.
[14] O. Salem, A. Guerassimov, A. Mehaoua, A. Marcus and B. Furht. “Sensor fault and patient anomaly detection and classification in medical wireless sensor networks,” in Proc. IEEE ICC, 2013.
[15] E. J. Candes and T. Tao. “Near-optimal signal recovery from random projections: Universal encoding strategies?” IEEE Transactions on Information Theory, vol. 52, no. 12, 2006, pp. 5406-5425.
[16] D. L. Donoho. “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, 2006, pp. 1289-1306.
[17] Z. Yang and R. N. Wright. “Privacy-preserving computation of Bayesian networks on vertically partitioned data,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, 2006, pp. 1253-1264.
[18] D. Zeng, L. Gu, S. Guo, Z. Cheng and S. Yu. “Joint optimization of task scheduling and image placement in fog computing supported software-defined embedded system,” IEEE Transactions on Computers, vol. 65, no. 12, 2016, pp. 3702-3712.
[19] D. Janakiram, A. M. Reddy and A. P. Kumar. “Outlier detection in wireless sensor networks using Bayesian belief networks,” in Proc. IEEE COMSWARE, 2006.
[20] E. Elnahrawy and B. Nath. “Context-aware sensors,” in Proc. EWSN, 2004.
[21] B. Krishnamachari and S. Iyengar. “Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks,” IEEE Transactions on Computers, vol. 53, no. 3, 2004, pp. 241-250.
[22] X. Luo, M. Dong and Y. Huang. “On distributed fault-tolerant detection in wireless sensor networks,” IEEE Transactions on Computers, vol. 55, no. 1, 2006, pp. 58-70.
[23] W. Wu, X. Cheng, M. Ding, K. Xing, F. Liu and P. Deng. “Localized outlying and boundary data detection in sensor networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, 2007, pp. 1145-1157.
[24] C. W. Duarte, Y. C. Klimentidis, J. J. Harris, M. Cardel and J. R. Fernández. “Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm,” Journal of the American Statistical Association, vol. 86, no. 416, 1991, pp. 899-909.
[25] “Physionet,” http://www.physionet.org/cgi-bin/atm/ATM.

Haibin Zhang received his B.Sc. degree from the Ocean University of China in 2003 and his Ph.D. degree from Xidian University in 2007. He joined the School of Computer Science and Technology of Xidian University as a lecturer in 2008. He is currently an associate professor with the School of Cyber Engineering of Xidian University. His research interests include formal verification and wireless sensor networks.