A Bayesian network model for data losses and faults in medical body sensor networks


Accepted Manuscript

A Bayesian Network Model for Data Losses and Faults in Medical Body Sensor Networks

Haibin Zhang, Jiajia Liu, Ai-Chun Pang

PII: S1389-1286(18)30493-6
DOI: 10.1016/j.comnet.2018.07.009
Reference: COMPNW 6539

To appear in: Computer Networks

Received date: 6 September 2017
Revised date: 20 May 2018
Accepted date: 2 July 2018

Please cite this article as: Haibin Zhang, Jiajia Liu, Ai-Chun Pang, A Bayesian Network Model for Data Losses and Faults in Medical Body Sensor Networks, Computer Networks (2018), doi: 10.1016/j.comnet.2018.07.009

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Haibin Zhang, Jiajia Liu (1)
School of Cyber Engineering, Xidian University, No.2 South Taibai Road, Xi'an, Shaanxi, 710071, China.

Ai-Chun Pang
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

Abstract

The medical body sensor network (BSN) is a promising and flexible platform for monitoring a person under natural physiological conditions. Due to limited resources, noise and unreliable links, sensor faults and data losses are common in BSNs. Most available works adopt schemes that originated in traditional wireless sensor networks (WSNs) to detect faults and reconstruct data. However, these works either focus only on fault detection or fail to achieve satisfactory reconstruction accuracy because of the lack of information redundancy in BSNs. In light of this, a Bayesian network based data reconstruction scheme is proposed in this paper. It rebuilds data using conditional probabilities of body sensor readings to recover missing data and sensor faults, rather than redundant information collected from a large number of sensors. Note that the limited number of sensors in BSNs significantly reduces the complexity of Bayesian learning and thus enables efficient structure and parameter estimation of the Bayesian network. Experiments on extensive online data sets have been conducted, and our results show that our scheme outperforms the available data reconstruction schemes.

Keywords: Reliability, Bayesian methods, fault detection, body sensor networks, medical diagnosis.

(1) Email addresses: [email protected], [email protected].

Preprint submitted to Journal of LaTeX Templates, July 6, 2018

1. Introduction

In recent years, medical body sensor networks have attracted increasing interest in both academia and industry [1, 2]. They are expected to monitor a person's vital signs under natural physiological conditions using wireless sensors attached to or implanted in the body. Limited resources, noise, unreliable links and malicious attacks inevitably degrade the data quality of body sensor networks [3], producing sensor faults and data losses that make detection and repair important [4]. Poor data quality can heavily affect medical diagnosis and the credibility of such monitoring applications, and can even threaten the life of the monitored person. Thus, data quality improvement is extremely important to ensure the reliability of body sensor data [5].

Fault detection [6, 7] in WSNs has been studied for many years, which has produced several types of fault detection schemes [8]: (1) Statistics based schemes, which assume that the distribution of the data fits a statistical model. A data item is determined to be a fault if the probability that it was generated by the estimated model is very low. (2) Nearest neighbor based schemes [9], which diagnose a data item as a fault if it is located far from its neighbors. (3) Clustering based schemes [10], which identify a data item as a fault if it does not belong to any cluster or its cluster is too small. (4) Classification based schemes, which assign a data item to a class (normal/fault) using an estimated classification model.

Fault detection can only tell us which data may be faulty; it cannot rebuild a fault to its true value [11]. Data reconstruction is an important technique for recovering missing data and sensor faults, and it usually uses spatial and temporal correlations to estimate the values of sensors in wireless sensor networks [12, 13]. Cover et al. [9] formalized a K nearest neighbor (KNN) method which can be used for data reconstruction. It uses the values of the nearest K neighbors to estimate the value of the missing data. Rajasegarar et al. [10] introduced a clustering based approach for data rebuilding, which groups data with similar behavior into clusters to estimate the values of sensors. Salem et al. [14] used linear regression to estimate the value of a sensor from its neighbors' readings. Candes et al. [15] and Donoho [16] used a compressive sensing (CS) method to recover the whole data set from a small amount of training data. Kong et al. [12] used an environmental space time improved compressive sensing (ESTI-CS) method for data reconstruction, which recovers data by finding a data set that satisfies a predefined spatial-temporal correlation and minimizes the Euclidean distance to the diagnosed data set.

Most of the available data reconstruction schemes for wireless sensor networks rely on redundant information to estimate sensor values. They cannot provide satisfactory accuracy in body sensor networks, where the number of body sensors is very limited.

A Bayesian network (BN) [17] based scheme learns a probabilistic graphical model from a set of training data and estimates a sensor value by calculating a conditional probability. It can aggregate readings from different sensors at different times to provide satisfactory estimation accuracy. Few works on data reconstruction for WSNs are based on the Bayesian network model, because the large number of sensors in WSNs [18] makes the estimation too difficult. Janakiram et al. [19] and Elnahrawy et al. [20] established a framework for outlier detection in WSNs using a Bayesian network model, but the Bayesian network they used considered only sensor readings, without the ground truth values. Krishnamachari et al. [21], Luo et al. [22] and Wu et al. [23] used distributed Bayesian algorithms for fault diagnosis in event region detection in WSNs, where each sensor used to detect an event had only a binary value.

In this paper, we are motivated to formalize a data reconstruction scheme using a Bayesian network model to recover missing data and sensor faults. The Bayesian network contains both the ground truth values of vital signs and the sensor readings, which enables performance analysis of the data reconstruction scheme. The main contributions of this paper are summarized as follows.

• We formalize a Bayesian network model to capture the spatial and temporal correlations of body sensors, and provide the structure and parameter learning algorithms for the Bayesian network with training data. Based on this model, we first provide a data reconstruction scheme for data losses and then give a theoretical performance analysis of data reconstruction.

• We revise the data reconstruction scheme to rebuild sensor data containing faults based on a threshold, and provide a theoretical performance analysis to find the optimal threshold, which minimizes the fault rate after the data reconstruction process.

• We evaluate the performance of our data reconstruction scheme on an extensive online data set. The experimental results indicate that our scheme using the Bayesian network model outperforms other available schemes, such as KNN and ESTI-CS: it achieves the least deviation of the rebuilt data from the truth value, and reduces the fault rate to 35% of that before data reconstruction.

The rest of this paper is organized as follows. Section 2 introduces the Bayesian network model and the problem formulation. In Section 3, we formalize a data reconstruction scheme for data losses. Section 4 revises the data reconstruction scheme to recover sensor faults and provides a theoretical performance analysis. In Section 5, we evaluate the performance of our scheme on an extensive online data set. Finally, Section 6 concludes this paper.

2. Problem Formulation and System Model

In this section, we first introduce the problem formulation, then define a Bayesian network model for data reconstruction and provide a Bayesian learning process using historical training data.

2.1. Problem Formulation

In medical body sensor networks, we use body sensors to monitor vital signs, e.g., heart activity, blood pressure, respiration rate and oxygen saturation in the arterial blood, and transmit the collected data periodically to a server device, e.g., a smartphone or a PDA. Then, over a wireless or wired connection, these data are streamed remotely to a medical doctor's site for real-time diagnosis, to a medical database for record keeping, or to the corresponding equipment that issues an emergency alert.

We use $Q = (Q_1, \cdots, Q_T)$ to describe the ground truth values of $n$ vital signs during $T$ time slots, where $Q_t = (q_{1t}, \cdots, q_{nt})$ is the set of ground truth values at time $t$.

\[
Q = \begin{pmatrix}
q_{11} & q_{12} & \cdots & q_{1T} \\
q_{21} & q_{22} & \cdots & q_{2T} \\
\vdots & \vdots & \ddots & \vdots \\
q_{n1} & q_{n2} & \cdots & q_{nT}
\end{pmatrix}
\]

We use $F = (F_1, \cdots, F_T)$ to describe the observations of the sensors used to monitor the vital signs, where $F_t = (f_{1t}, \cdots, f_{nt})$ is the set of observations at time $t$.

\[
F = \begin{pmatrix}
f_{11} & f_{12} & \cdots & f_{1T} \\
f_{21} & f_{22} & \cdots & f_{2T} \\
\vdots & \vdots & \ddots & \vdots \\
f_{n1} & f_{n2} & \cdots & f_{nT}
\end{pmatrix}
\]

$f_{kt}$ is assigned the specific value 0 to indicate that the $k$th sensor reading is missing at time $t$. $f_{kt} \neq q_{kt}$ indicates that the $k$th sensor incorrectly reports the value of the vital sign at time $t$. There exist several types of data losses and faults that are caused by different factors [12]. Noise and collisions in BSNs may cause data losses and faults independently and randomly. Congestion may cause data losses and faults at adjacent sensor nodes during a period of time. Unreliable links, which are inevitable in BSNs, may cause frequent data losses and faults for some sensors. Damage or energy exhaustion may cause data losses and faults from a particular time slot onward. The data reconstruction problem can be defined as follows.

• Data reconstruction problem. It is to rebuild the ground truth values of vital signs based on the gathered sensor readings.
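As a small illustration of this notation, the following Python sketch stores $Q$ and $F$ as nested lists and separates lost readings (the reserved value 0) from faulty ones ($f_{kt} \neq q_{kt}$). The function names and the sample values are hypothetical and chosen only for illustration; indices are zero-based here, unlike in the text.

```python
MISSING = 0  # the text reserves the reading 0 for a lost sample


def missing_positions(F):
    """Return the (k, t) index pairs whose sensor reading is missing."""
    return [(k, t) for k, row in enumerate(F)
            for t, f in enumerate(row) if f == MISSING]


def faulty_positions(Q, F):
    """Return the (k, t) pairs where a reading is present but differs from the truth."""
    return [(k, t) for k, row in enumerate(F)
            for t, f in enumerate(row)
            if f != MISSING and f != Q[k][t]]


# Hypothetical ground truth for n = 2 vital signs over T = 4 time slots.
Q = [[72, 73, 74, 75],   # heart rate
     [97, 97, 96, 96]]   # oxygen saturation
# Matching sensor readings with two losses and one fault.
F = [[72, 0, 74, 90],
     [97, 97, 0, 96]]

print(missing_positions(F))
print(faulty_positions(Q, F))
```

A data reconstruction scheme would fill in the positions returned by `missing_positions` and, in the fault setting of Section 4, revise those returned by `faulty_positions` without having access to `Q`.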

A reconstructed matrix is a matrix $\bar{F} = (\bar{f}_{kt})_{n \times T}$ generated by data reconstruction from a sensor reading matrix $F$ to approximate the ground truth matrix $Q$.

2.2. Bayesian Network Model for Data Reconstruction

A Bayesian network is a pair $(G, \theta)$, where $G = (V, E)$ is a directed acyclic graph and $\theta$ is a set of parameters. An edge $e$ between two nodes in $V$ denotes a direct probabilistic relationship. The parameter on a node $v \in V$ is a probability $P(v|\pi(v))$, where $\pi(v)$ is the parent set of $v$ [17]. If node $v$ has no parent, then the parameter on $v$ is $P(v)$.

We formalize the Bayesian network shown in Figure 1 to model the attributes of body sensors. In this model, $X_1, X_2, \cdots, X_n$ represent the current ground truth values of $n$ vital signs, and $Y_1, Y_2, \cdots, Y_n$ represent the current sensor readings of the $n$ vital signs. We select the first sensor as the diagnosed sensor. $X_{n+1}, X_{n+2}$ and $Y_{n+1}, Y_{n+2}$ respectively represent the ground truth values and sensor readings of the diagnosed vital sign at the previous and next times.

Suppose that each $X_k$ ($1 \le k \le n+2$) has $r_k$ possible values, each $Y_k$ has $g_k$ possible values, $\pi(X_k)$ has $u_k$ possible values, $X = \{X_1, \cdots, X_{n+2}\}$ has $w$ possible values, $Y = \{Y_1, \cdots, Y_{n+2}\}$ has $s_0$ possible values, and $Y' = \{Y_2, \cdots, Y_{n+2}\}$ has $s_1$ possible values. For convenience, we write $X_k = i$ ($1 \le i \le r_k$) to denote that $X_k$ is assigned its $i$th value, and use similar expressions for $Y_k$, $\pi(X_k)$, $X$, $Y$ and $Y'$. In particular, $Y_k = 0$ expresses a missing sensor reading. Given a value $i$ of $X_k$ (or $Y_k$), we use $O(i)$ to denote the actual value indexed by $i$. For example, if $X_k$ has the 4 possible values 5, 6, 7, 8, then $X_k = 3$ means that $X_k$ is currently assigned 7, the 3rd element of its possible values. Given a value $j$ ($1 \le j \le s_0$) of $Y$, a rebuilt value of $j$ is a value that assigns a possible value to each $Y_k = 0$ while all sensors whose readings are not lost keep their original values; we use $S(j)$ to denote the set of all possible rebuilt values of $j$. Given a value $l$ ($1 \le l \le s_1$) of $Y'$, we use $S'(l)$ to denote the set of all possible rebuilt values of $l$. Given a value $i$ ($1 \le i \le w$) of $X$, we use $A(i, k)$ to denote the value $z$ of $\pi(X_k)$ such that the value of each $X_m \in \pi(X_k)$ in $z$ is the value of $X_m$ in $i$.

Figure 1: A Bayesian network for data reconstruction. $X_i$ ($1 \le i \le n$) describes the ground truth value of the $i$th vital sign, with $X_1$ being the diagnosed one. $Y_i$ represents the corresponding sensor reading. $X_{n+1}, X_{n+2}$ describe the ground truth values of the diagnosed vital sign at the previous and next times, and $Y_{n+1}, Y_{n+2}$ represent the corresponding sensor readings of the diagnosed vital sign. The dashed circle denotes that the structure of the variables $X_1, \cdots, X_n$ inside it needs to be learned from training data.

In the following, we use $c^k_{ij}$ to denote the conditional probability on $X_k$:

\[ c^k_{ij} = P(X_k = j \mid \pi(X_k) = i) \]

$b^k_{ij}$ to denote the observation probability for the $k$th vital sign:

\[ b^k_{ij} = P(Y_k = j \mid X_k = i) \]

$a_{ij}$ to denote the transition probability on $X_{n+2}$:

\[ a_{ij} = P(X_{n+2} = j \mid X_1 = i) \]

and $d_{ij}$ to denote the conditional probability $P(X_{n+1} = j \mid X_1 = i)$ on $X_{n+1}$, with

\[
d_{ij} =
\begin{cases}
\dfrac{a_{ji}}{\sum_{l=1}^{r_1} a_{li}}, & \text{if } \sum_{l=1}^{r_1} a_{li} > 0 \\[2ex]
\dfrac{1}{r_1}, & \text{otherwise}
\end{cases}
\]

Moreover, we use $I^k_i$ to denote the probability on $X_k$, which is the initial probability of the ground truth values of the $k$th vital sign, with $I^k_i = P(X_k = i)$.
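The definition of $d_{ij}$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `previous_state_probs` and the sample transition matrix are hypothetical, and indices are zero-based.

```python
def previous_state_probs(a):
    """Build d[i][j] = P(X_{n+1} = j | X_1 = i) from a[i][j] = P(X_{n+2} = j | X_1 = i).

    Follows the piecewise definition of d_ij: normalise column i of `a`,
    and fall back to the uniform value 1 / r1 when that column sums to zero.
    """
    r1 = len(a)
    d = [[0.0] * r1 for _ in range(r1)]
    for i in range(r1):
        col_sum = sum(a[l][i] for l in range(r1))
        for j in range(r1):
            d[i][j] = a[j][i] / col_sum if col_sum > 0 else 1.0 / r1
    return d


# Hypothetical transition matrix; the third value is never reached,
# so the third row of d uses the uniform fallback.
a = [[0.9, 0.1, 0.0],
     [0.2, 0.8, 0.0],
     [0.5, 0.5, 0.0]]
d = previous_state_probs(a)
for row in d:
    print([round(x, 4) for x in row])
```

Each row of `d` sums to one, so it can be used directly as the conditional distribution of the previous-time value given the current one.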

2.3. Structure and Parameter Learning of Bayesian Network

A precondition for using the Bayesian network model in the data reconstruction scheme is Bayesian learning from historical training data, which comprises both structure learning and parameter learning. In addition, the training data for Bayesian learning may have missing values. We can use the supplemented expectation maximization (SEM) algorithm [24] to learn the Bayesian network from such incomplete training data. The SEM algorithm is divided into two steps: structure learning and parameter learning. For structure search, the SEM algorithm replaces the sufficient statistics that are unavailable with expected sufficient statistics so that the scoring function becomes decomposable, and finds a network structure with a higher score by local search. Then, the SEM algorithm finds the parameters with the maximum score on the selected Bayesian network structure.

Let $(G, \theta)$ be a Bayesian network, and $D = \{D_1, \cdots, D_m\}$ be a training data set. We define the BIC score as

\[ BIC(G, \theta \mid D) = \log P(D \mid G, \theta) - \frac{d(G)}{2} \log m \]

Suppose that $(\bar{G}, \bar{\theta})$ is a Bayesian network obtained by the SEM algorithm from an original Bayesian network, $D$ is the training data set with missing values, and $\bar{D}$ is a complete data set obtained from $D$ by repairing the missing values based on $(\bar{G}, \bar{\theta})$. Then the BIC score of $(G, \theta)$ on $\bar{D}$, written as $B(G, \theta \mid \bar{G}, \bar{\theta})$, is defined as

\[ B(G, \theta \mid \bar{G}, \bar{\theta}) = \sum_{l=1}^{m} \sum_{X_l} P(X_l \mid D_l, \bar{G}, \bar{\theta}) \log P(D_l, X_l \mid G, \theta) - \frac{d(G)}{2} \log m \]

where $X_l$ is the set of variables without valuations in $D_l$. By Bayesian deduction, we obtain

\[ B(G, \theta \mid \bar{G}, \bar{\theta}) = \sum_{k=1}^{n+2} \sum_{j=1}^{u_k} \sum_{i=1}^{r_k} \gamma^G_{kji} \log \theta_{kji} - \frac{d(G)}{2} \log m \]

where

\[ \gamma^G_{kji} = \sum_{l=1}^{m} P(X_k = i, \pi_G(X_k) = j \mid D_l, \bar{G}, \bar{\theta}) \]

Algorithm 1 SEM($X$, $D$, $G_0$, $\theta_{0,0}$, $M$).
Require: $X$ is a set of variables, $D$ is a training data set with missing values, $G_0$ is the initial structure, $\theta_{0,0}$ is the initial parameters, $M$ is the number of steps for parameter optimization between two steps of structure optimization
Ensure: A Bayesian network
for $i = 0$ to $\infty$ do
    for $j = 0$ to $M - 1$ do
        $\theta_{i,j+1} = \arg\sup_{\theta} B(G_i, \theta \mid G_i, \theta_{i,j})$;
    end for
    $U = \{G \mid G$ is obtained from $G_i$ by adding, deleting or reversing an edge$\}$;
    $(G_{i+1}, \theta_{i+1,0}) = \arg\max_{G \in U} \sup_{\theta} B(G, \theta \mid G_i, \theta_{i,M})$;
    if $BIC(G_{i+1}, \theta_{i+1,0} \mid D) \le BIC(G_i, \theta_{i,M} \mid D)$ then
        return $(G_i, \theta_{i,M})$;
    end if
end for

Here $\pi_G(X_k)$ is the parent set of $X_k$ in $G$, and

\[ \theta_{kji} = P(X_k = i \mid \pi_G(X_k) = j) \]

Using training data $D$ with missing values, the SEM algorithm for finding the optimal Bayesian network $(G, \theta)$ from an initial Bayesian network is given as Algorithm 1.
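To make the BIC score concrete, the sketch below evaluates it for a small discrete network on complete data, using maximum likelihood parameters. It is a simplified illustration, not the SEM procedure itself: it ignores missing values, and it assumes that $d(G)$ is the number of free parameters of the network, which the text does not state explicitly. The function name and the data are hypothetical.

```python
import math
from collections import Counter


def bic_score(data, parents, card):
    """BIC of a discrete Bayesian network on complete data with MLE parameters.

    data:    list of dicts mapping variable name -> observed value
    parents: dict mapping variable name -> tuple of parent names
    card:    dict mapping variable name -> number of possible values
    Assumption: d(G) is taken to be the number of free parameters.
    """
    m = len(data)
    loglik = 0.0
    dim = 0
    for node, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_val, _), count in joint.items():
            # count * log of the MLE estimate P(node = value | parents = pa_val)
            loglik += count * math.log(count / marg[pa_val])
        parent_configs = 1
        for p in pa:
            parent_configs *= card[p]
        dim += (card[node] - 1) * parent_configs
    return loglik - dim / 2 * math.log(m)


# Hypothetical complete samples in which X2 always equals X1.
data = [{"X1": 0, "X2": 0}, {"X1": 0, "X2": 0},
        {"X1": 1, "X2": 1}, {"X1": 1, "X2": 1}]
card = {"X1": 2, "X2": 2}
independent = bic_score(data, {"X1": (), "X2": ()}, card)
with_edge = bic_score(data, {"X1": (), "X2": ("X1",)}, card)
print(independent, with_edge)
```

On this toy data the structure with the edge from X1 to X2 scores higher, because the gain in likelihood outweighs the penalty for the extra parameter. A local search such as the one in Algorithm 1 compares candidate structures in this way.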

3. Data Reconstruction for Data Losses

In this section, we first formalize the data reconstruction scheme for data losses and then provide a theoretical performance analysis.

3.1. Data Reconstruction for Random Data Losses

We formalize a data reconstruction scheme for the scenario in which all sensors correctly measure the ground truth values of vital signs except for the missing data. For data reconstruction using the Bayesian network model, the first step is to estimate the parameters from historical training data. In the scenario where there is no fault in the sensor data, we can estimate the structure and the parameters $c^k_{ij}$, $a_{ij}$ and $d_{ij}$ from training data. Each $b^k_{ij}$ can be assigned directly as

\[
b^k_{ij} =
\begin{cases}
1, & \text{if } O(i) = O(j) \\
0, & \text{if } O(i) \neq O(j)
\end{cases}
\]

The task of data reconstruction is to determine a ground truth value $H$ for the vital sign $X_1$ from the sensor readings $Y_1, \cdots, Y_{n+2}$, which contain missing values. For each possible ground truth value of the diagnosed vital sign, we first calculate the conditional probability of this ground truth value given the known sensor readings. We then select the ground truth value with the maximum conditional probability as the reconstructed data for the missing sensor reading. Given a value $i$ of $X_1$ and a value $j$ of $Y = \{Y_1, \cdots, Y_{n+2}\}$ with $Y_1 = j_1 \neq 0, \cdots, Y_{n+2} = j_{n+2} \neq 0$, we use $\alpha_{ij}$ to denote the expression

\[
\sum_{k_2=1}^{r_2} \cdots \sum_{k_{n+2}=1}^{r_{n+2}} b^1_{i j_1} \cdots b^{n+2}_{k_{n+2} j_{n+2}} \, d_{i k_{n+1}} \, a_{i k_{n+2}} \prod_{l=1}^{n} c^l_{k_l A(m,l)}
\]

where $k_1 = i$ and $m$ is the value of $X$ with $X_1 = i, X_2 = k_2, \cdots, X_{n+2} = k_{n+2}$. The following theorem gives a way to calculate the conditional probability used for data reconstruction.

Theorem 1. Given a value $j$ of the sensor readings $Y$, the conditional probability that the estimated ground truth value $H$ for $X_1$ is a value $i$ under those sensor readings can be calculated as

\[
P_{i|j} = P(H = i \mid Y = j) = \frac{\sum_{k \in S(j)} \alpha_{ik}}{\sum_{l=1}^{r_1} \sum_{k \in S(j)} \alpha_{lk}} \tag{1}
\]

Proof: By the theory of Bayesian networks, the joint probability distribution of the Bayesian network in Figure 1 can be calculated as

\[
P(X_1, \cdots, X_{n+2}, Y_1, \cdots, Y_{n+2}) = P(Y_1 \mid X_1) \cdots P(Y_{n+2} \mid X_{n+2}) \, P(X_{n+1} \mid X_1) \, P(X_{n+2} \mid X_1) \prod_{i=1}^{n} P(X_i \mid \pi(X_i))
\]

Then we can calculate the probability $P(X_1, Y_1, \cdots, Y_{n+2})$ by variable elimination.

\[
P(X_1, Y_1, \cdots, Y_{n+2}) = \sum_{X_2} \cdots \sum_{X_{n+2}} P(Y_1 \mid X_1) \cdots P(Y_{n+2} \mid X_{n+2}) \, P(X_{n+1} \mid X_1) \, P(X_{n+2} \mid X_1) \prod_{i=1}^{n} P(X_i \mid \pi(X_i))
\]

The probability $P(H, Y_1, \cdots, Y_{n+2})$ can then be calculated as

\[
P_{ij} = P(H = i, Y = j) = \sum_{k \in S(j)} P(H = i, Y = k) = \sum_{k \in S(j)} \alpha_{ik} \tag{2}
\]

The probability $P(Y_1, \cdots, Y_{n+2})$ can be calculated by summing $P(H, Y_1, \cdots, Y_{n+2})$ over the values of $H$.

\[
P_j = P(Y = j) = \sum_{l=1}^{r_1} P(H = l, Y = j) = \sum_{l=1}^{r_1} \sum_{k \in S(j)} \alpha_{lk} \tag{3}
\]

By (2) and (3), we have

\[
P_{i|j} = \frac{\sum_{k \in S(j)} \alpha_{ik}}{\sum_{l=1}^{r_1} \sum_{k \in S(j)} \alpha_{lk}}
\]

□
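The inference in Theorem 1 can be illustrated by brute-force enumeration on a toy network. The sketch below is a minimal example under several assumptions that are not part of the paper: there are only two vital signs with the fixed structure X1 → X2, all variables share the same number of values, lost readings are marked with `None` rather than 0 so that value indices can start at 0, and all names and probabilities are hypothetical.

```python
from itertools import product


def posterior_x1(prior_x1, c2, d, a, readings):
    """Return [P(X1 = i | readings)] for a toy net X1 -> X2, X1 -> Xprev, X1 -> Xnext.

    prior_x1[i]     = P(X1 = i)
    c2[i][j]        = P(X2 = j | X1 = i)
    d[i][j], a[i][j] = P(Xprev = j | X1 = i), P(Xnext = j | X1 = i)
    readings = (y1, y2, y_prev, y_next); None marks a lost reading, and every
    received reading is assumed to equal its ground truth value (no faults).
    """
    r = len(prior_x1)
    weights = [0.0] * r
    for x1, x2, xp, xn in product(range(r), repeat=4):
        state = (x1, x2, xp, xn)
        # b = 0 whenever a received reading disagrees with the hidden value.
        if any(y is not None and y != x for y, x in zip(readings, state)):
            continue
        weights[x1] += prior_x1[x1] * c2[x1][x2] * d[x1][xp] * a[x1][xn]
    total = sum(weights)
    return [w / total for w in weights]


prior = [0.5, 0.5]
c2 = [[0.9, 0.1], [0.2, 0.8]]
d = [[0.8, 0.2], [0.3, 0.7]]
a = [[0.8, 0.2], [0.3, 0.7]]
# The diagnosed reading and the next-time reading are lost.
post = posterior_x1(prior, c2, d, a, readings=(None, 1, 1, None))
rebuilt = max(range(len(post)), key=post.__getitem__)
print(post, rebuilt)
```

Summing over every hidden assignment that is consistent with the received readings plays the role of the sum over $S(j)$ in (1), and the final normalization corresponds to the denominator. Enumeration is only practical because a BSN has few sensors.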

Given the estimated parameters of the Bayesian network and a value $j$ of all correlated sensor readings $Y_1, \cdots, Y_{n+2}$, we can calculate the probability of each possible value of $X_1$ by the Bayesian inference used in the proof of Theorem 1. Then we rebuild the missing data as the ground truth value $i$ that maximizes the probability $P_{i|j}$ in (1).

3.2. Data Reconstruction for Continuous Data Losses

Causes such as congestion may lead to continuous data losses at adjacent sensor nodes. If all sensor readings are continuously dropped during a period of time, then we cannot rebuild the missing data by the probability $P_{i|j}$: because $Y_1, \cdots, Y_{n+2}$ are all or mostly missing, the probabilities $P_{i|j}$ for each possible value $i$ of $X_1$ are all the same. In this case, we can first rebuild the missing data using the existing sensor readings at the beginning of the missing period, and then rebuild the remaining missing data step by step from these rebuilt ground truth values.

Given $m$ sensors with missing readings, there are $\prod_{k=1}^{m} r_{z_k}$ possible rebuilt values for $X_L = \{X_{z_1}, \cdots, X_{z_m}\}$. Suppose that $X_E = X \setminus (X_L \cup \{X_1\}) = \{X_{h_1}, \cdots, X_{h_N}\}$ ($N = n + 1 - m$), $l$ is a rebuilt value of $X_L$ with $X_{z_1} = l_1, \cdots, X_{z_m} = l_m$, and $j$ is a value of $Y$ with $Y_1 = j^*, Y_{h_1} = j_1, \cdots, Y_{h_N} = j_N, Y_{z_1} = j'_1, \cdots, Y_{z_m} = j'_m$. We use $\beta_{ijl}$ to denote the expression

\[
\sum_{i_1=1}^{r_{h_1}} \cdots \sum_{i_N=1}^{r_{h_N}} b^1_{i j^*} \, b^{h_1}_{i_1 j_1} \cdots b^{h_N}_{i_N j_N} \, b^{z_1}_{l_1 j'_1} \cdots b^{z_m}_{l_m j'_m} \, d_{i k_1} \, a_{i k_2} \, c^1_{i A(o,1)} \prod_{k=1}^{N} c^{h_k}_{i_k A(o,h_k)} \prod_{k=1}^{m} c^{z_k}_{l_k A(o,z_k)}
\]

where $o$ is the value of $X$ with $X_1 = i, X_{z_1} = l_1, \cdots, X_{z_m} = l_m, X_{h_1} = i_1, \cdots, X_{h_N} = i_N$, and $k_1$ ($k_2$) is the value of $X_{n+1}$ ($X_{n+2}$) in $X_L$ or $X_E$. Then we calculate a probability for the data reconstruction of $X_1$,

\[
P_{i|jl} = P(H = i \mid Y = j, X_L = l) = \frac{\sum_{k \in S(j)} \beta_{ikl}}{\sum_{i'=1}^{r_1} \sum_{k \in S(j)} \beta_{i'kl}} \tag{4}
\]

and rebuild the continuous missing data for $X_1$ as the ground truth value $i$ that maximizes the probability $P_{i|jl}$ in (4).

3.3. Performance Analysis of Data Reconstruction for Data Losses

To evaluate the performance of data reconstruction, a metric is defined as follows.

Definition 1. The deviation ratio (DR) is the metric used to measure the reconstruction deviation. We denote it by

\[
DR = \frac{\sqrt{\sum_{k,t:\, f_{kt} = 0} (\bar{f}_{kt} - q_{kt})^2}}{\sqrt{\sum_{k,t:\, f_{kt} = 0} (q_{kt})^2}}
\]

Given sensor readings $j$ of $Y$, we use $R(j)$ to denote the reconstructed data $i$ that maximizes the probability $P_{i|j}$.
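Definition 1 translates directly into code. The following Python sketch computes DR over the positions where the reading was lost; the function name and the sample matrices are hypothetical, and the sketch assumes at least one lost position with a nonzero ground truth value so that the denominator is not zero.

```python
import math


def deviation_ratio(Q, F, F_bar, missing=0):
    """Deviation ratio over the positions where the sensor reading was lost.

    Q: ground truth matrix, F: sensor reading matrix, F_bar: rebuilt matrix,
    all given as lists of rows with the same shape.
    """
    num = 0.0
    den = 0.0
    for q_row, f_row, fb_row in zip(Q, F, F_bar):
        for q, f, fb in zip(q_row, f_row, fb_row):
            if f == missing:          # only lost readings contribute
                num += (fb - q) ** 2
                den += q ** 2
    return math.sqrt(num) / math.sqrt(den)


# Hypothetical single-sensor example: two lost readings, one rebuilt exactly
# and one rebuilt with an error of 1; the received reading is ignored.
Q = [[3, 4, 80]]
F = [[0, 0, 80]]
F_bar = [[3, 5, 80]]
print(deviation_ratio(Q, F, F_bar))
```

A smaller value means that the rebuilt data lie closer to the ground truth at the lost positions.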

Theorem 2. Given the estimated parameters of the Bayesian network, the deviation ratio can be calculated as

\[
DR = \frac{\sqrt{\sum_{i=1}^{r_1} \sum_{j=1}^{s_0} \sum_{l \in S(j)} \alpha_{il} \, (O(R(j)) - O(i))^2}}{\sqrt{\sum_{i=1}^{r_1} I^1_i \, (O(i))^2}} \tag{5}
\]

Proof: Suppose the rate of data losses is $p_l$. Then the number of missing data with the ground truth value $X_1 = i$ in $T$ time slots is

\[ o_1 = p_l \cdot T \cdot P(X_1 = i) \]

in which the number of data with $Y = j$ is

\[ o_2 = o_1 \cdot P(Y = j \mid X_1 = i) \]

Then DR can be calculated as

\[
DR = \frac{\sqrt{\sum_{i=1}^{r_1} \sum_{j=1}^{s_0} o_2 \, (O(R(j)) - O(i))^2}}{\sqrt{\sum_{i=1}^{r_1} o_1 \, (O(i))^2}}
= \frac{\sqrt{\sum_{i=1}^{r_1} \sum_{j=1}^{s_0} \sum_{l \in S(j)} \alpha_{il} \, (O(R(j)) - O(i))^2}}{\sqrt{\sum_{i=1}^{r_1} I^1_i \, (O(i))^2}}
\]

□

The metric DR can distinguish the performance of different data reconstruction schemes: a better scheme should have a smaller DR. However, if the sensor readings contain faults, then the proposed data reconstruction scheme and the metric DR are no longer applicable. We formalize a revised data reconstruction scheme for sensor data containing faults in the following section.

4. Data Reconstruction for Sensor Data Containing Faults

The gathered readings of body sensors may not only have missing values but also contain faults. In this scenario, we can first detect the faults using the data reconstruction scheme proposed above, and then rebuild all faults as the estimated values.

4.1. Sensor Faults Rebuilding

When sensor data contain faults, a sensor reading may not equal the ground truth value of the vital sign. Thus, the training data used to estimate the structure and parameters of the Bayesian network model are incomplete: they consist only of sensor readings, without the ground truth values of vital signs. To estimate the structure and parameters of the Bayesian network, we can use Algorithm 1 after completing each sample by adding ground truth values and weights of vital signs.

Given sensor readings $j$ of $Y$ with $Y_1 = j_1$ and a threshold $0 \le \delta \le 1$, if the estimated value $R(j)$ for $X_1$ obtained by calculating the probability $P_{i|j}$ in (1) satisfies

\[ |O(R(j)) - O(j_1)| > \delta \tag{6} \]

then the sensor reading $j_1$ is diagnosed as a fault and rebuilt as the estimated value $R(j)$. Otherwise, $j_1$ is kept as the rebuilt value of the sensor reading, which is considered to correctly report the ground truth value of the vital sign.

4.2. Performance Analysis for Optimal Threshold Scheme

Let $TF$ be the number of faults correctly rebuilt, $FF$ be the number of faults that are still faults after reconstruction, $FC$ be the number of correct data that are rebuilt to faults, and $TC$ be the number of correct data that are rebuilt as themselves. To evaluate the performance of data reconstruction for sensor faults and to find the optimal threshold, three metrics are defined as follows.

Definition 2. The fault rate after reconstruction (FRR) is the proportion of faults among the total number of sensor data after reconstruction. We denote it by

\[ FRR = \frac{FF + FC}{TF + TC + FF + FC} \]

Definition 3. The true fault reconstruction rate (TFRR) is the proportion of faults that are correctly rebuilt. We denote it by

\[ TFRR = \frac{TF}{TF + FF} \]

Definition 4. The false reconstruction rate for correct data (FRRC) is the proportion of correct data that are incorrectly rebuilt to faults. We denote it by

\[ FRRC = \frac{FC}{FC + TC} \]

An efficient data reconstruction scheme should have a high TFRR, as the faults that are not correctly rebuilt will be left behind. In addition, it should have a small FRRC, as correct data may be rebuilt to newly produced faults.
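The threshold rule and Definitions 2 to 4 can be combined into one short Python sketch. It is a simplified illustration with hypothetical names and data: a rebuilt value counts as correct only when it equals the ground truth exactly, the threshold is expressed in the same units as the readings, and the sample is assumed to contain at least one faulty and one correct reading so that no denominator is zero.

```python
def rebuild(reading, estimate, delta):
    """Replace the reading by the estimate when they differ by more than delta."""
    return estimate if abs(estimate - reading) > delta else reading


def reconstruction_metrics(truth, readings, estimates, delta):
    """Count TF, FF, FC, TC and return FRR, TFRR and FRRC as a dict."""
    tf = ff = fc = tc = 0
    for q, f, est in zip(truth, readings, estimates):
        rebuilt = rebuild(f, est, delta)
        if f != q:                    # the reading was a fault
            if rebuilt == q:
                tf += 1               # fault correctly rebuilt
            else:
                ff += 1               # still a fault after reconstruction
        else:                         # the reading was correct
            if rebuilt == q:
                tc += 1               # correct data kept as itself
            else:
                fc += 1               # correct data rebuilt to a fault
    total = tf + tc + ff + fc
    return {"FRR": (ff + fc) / total,
            "TFRR": tf / (tf + ff),
            "FRRC": fc / (fc + tc)}


# Hypothetical heart-rate samples: three faulty readings and two correct ones.
truth = [70, 70, 70, 70, 70]
readings = [70, 90, 50, 70, 71]
estimates = [70, 70, 60, 80, 70]
metrics = reconstruction_metrics(truth, readings, estimates, delta=5)
print(metrics)
```

In this example one fault is repaired, one fault is rebuilt to a wrong value, one small fault stays below the threshold, and one correct reading is spoiled by a poor estimate. This shows how a single threshold trades TFRR against FRRC.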

For the theoretical performance analysis, we define two functions. Given sensor readings $j$ of $Y$ with $Y_1 = j_1$, a ground truth value $i$ of $X_1$ and a threshold $\delta$,

\[
L(j) =
\begin{cases}
1, & \text{if } |O(R(j)) - O(j_1)| \ge \delta \\
0, & \text{otherwise}
\end{cases}
\]

\[
L'(j) =
\begin{cases}
1, & \text{if } |O(R(j)) - O(j_1)| \ge \delta \text{ and } |O(R(j)) - O(i)| \le \delta \\
0, & \text{otherwise}
\end{cases}
\]

To calculate the theoretical TFRR and FRRC, let us first consider the probability $P(Y' \mid Y_1, X_1)$. Given a value $l$ of $Y' = \{Y_2, \cdots, Y_{n+2}\}$, we have

\[
P_{l|ki} = P(Y' = l \mid Y_1 = k, X_1 = i) = \frac{P(Y' = l, Y_1 = k, X_1 = i)}{P(Y_1 = k, X_1 = i)} = \frac{\sum_{m \in S(j)} \alpha_{im}}{\sum_{l'} \alpha_{ij'}}
\]

where $j$ is the value of $Y$ with $Y_1 = k, Y' = l$, and $j'$ is the value of $Y$ with $Y_1 = k, Y' = l'$.

Theorem 3. Given the estimated parameters of the Bayesian network and the fault rate $p$ of the diagnosed sensor, TFRR can be calculated as

\[
TFRR = \sum_{i=1}^{r_1} \sum_{\substack{k=1 \\ k \neq i}}^{g_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L'(j) \sum_{j' \in S(j)} \alpha_{ij'}}{p \sum_{l' \in S'(l)} \alpha_{im}} \tag{7}
\]

and FRRC as

\[
FRRC = \sum_{i=1}^{r_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L(j) \sum_{j' \in S(j)} \alpha_{ij'}}{(1 - p) \sum_{l' \in S'(l)} \alpha_{im}} \tag{8}
\]

where $j$ is the value of $Y$ with $Y_1 = k, Y' = l$, and $m$ is the value of $Y$ with $Y_1 = k, Y' = l'$. The condition $k \neq i$ in TFRR denotes $O(k) \neq O(i)$, and $k$ in FRRC satisfies $O(k) = O(i)$.

Proof: Suppose that the number of received sensor data is $T$. Then the number of diagnosed sensor readings with the ground truth value $X_1 = i$ is

\[ o_1 = T \cdot P(X_1 = i) \]

The number of faulty sensor readings with $Y_1 = k$ and $O(k) \neq O(i)$ among those $o_1$ readings is

\[ o_2 = o_1 \cdot P(Y_1 = k \mid X_1 = i) \]

The number of faulty sensor readings with $Y' = l$ among those $o_2$ readings is

\[ o_3 = o_2 \cdot P(Y' = l \mid X_1 = i, Y_1 = k) \]

in which the number of faults that can be detected and correctly rebuilt is

\[ o_4 = o_3 \cdot L'(j) \]

where $j$ is the value of $Y$ with $Y_1 = k, Y' = l$. Then we can calculate TFRR as

\[
TFRR = \frac{\sum_{i=1}^{r_1} \sum_{\substack{k=1 \\ k \neq i}}^{g_1} \sum_{l=1}^{s_1} o_4}{p \cdot T}
= \frac{\sum_{i=1}^{r_1} \sum_{\substack{k=1 \\ k \neq i}}^{g_1} \sum_{l=1}^{s_1} L'(j) \, P(X_1 = i) \, P(Y_1 = k \mid X_1 = i) \, P_{l|ki}}{p}
= \sum_{i=1}^{r_1} \sum_{\substack{k=1 \\ k \neq i}}^{g_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L'(j) \sum_{j' \in S(j)} \alpha_{ij'}}{p \sum_{l' \in S'(l)} \alpha_{im}}
\]

Among the previous $o_1$ sensor readings with $X_1 = i$, the number of correct data with $Y' = l$ is

\[ o_5 = o_1 \cdot P(Y' = l \mid Y_1 = k, X_1 = i) \]

where $k$ satisfies $O(k) = O(i)$, and the number of sensor readings in $o_5$ that are incorrectly diagnosed as faults is

\[ o_6 = o_5 \cdot L(j) \]

where $j$ is the value of $Y$ with $Y_1 = k, Y' = l$. The FRRC can then be calculated as follows.

Figure 2: Performance comparison between our scheme and the available data reconstruction schemes for data losses. Panels (a) and (b) compare the deviation ratio on datasets of 5000 and 9000 groups of medical data. Each panel plots the deviation ratio (vertical axis) against the missing data and fault rates of heart rate, from 0.1 to 0.5 (horizontal axis), for KNN, ESTI-CS and BN.

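\[
FRRC = \frac{\sum_{i=1}^{r_1} \sum_{l=1}^{s_1} o_6}{(1 - p) \cdot T}
= \frac{\sum_{i=1}^{r_1} \sum_{l=1}^{s_1} L(j) \, P(X_1 = i) \, P_{l|ki}}{1 - p}
= \sum_{i=1}^{r_1} \sum_{l=1}^{s_1} \frac{I^1_i \, b^1_{ik} \, L(j) \sum_{j' \in S(j)} \alpha_{ij'}}{(1 - p) \sum_{l' \in S'(l)} \alpha_{im}}
\]

□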

After the data reconstruction process, there exist three types of faults: faults that are not detected, detected faults that are incorrectly rebuilt, and faults that are newly introduced by incorrect reconstruction. FRR can be calculated as

\[ FRR = p \cdot (1 - TFRR) + (1 - p) \cdot FRRC \tag{9} \]

The last question is how to select the threshold, that is, how to find a threshold $\delta$ that minimizes FRR. FRR in (9) is a function of the threshold $\delta$, so finding the optimal threshold amounts to minimizing the function $FRR(\delta)$. It is very difficult to obtain the minimum value of $FRR(\delta)$ exactly. Here, we provide an approximate process to find the optimal threshold. We first select a discrete threshold set $\Theta$ and then calculate the FRR for each $\delta \in \Theta$. The threshold whose FRR is the smallest is selected as the optimal threshold.
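The approximate search over a discrete threshold set can be sketched as follows. The helper names and the two toy curves for TFRR and FRRC are hypothetical placeholders; in practice those values would come from (7) and (8) or from measurements on diagnosed data.

```python
def frr(p, tfrr, frrc):
    """Fault rate after reconstruction, following equation (9)."""
    return p * (1 - tfrr) + (1 - p) * frrc


def best_threshold(thresholds, p, tfrr_of, frrc_of):
    """Return the threshold in the candidate set with the smallest FRR.

    tfrr_of and frrc_of map a threshold to the TFRR and FRRC it achieves.
    """
    return min(thresholds, key=lambda delta: frr(p, tfrr_of(delta), frrc_of(delta)))


# Hypothetical curves: a larger threshold repairs fewer faults (lower TFRR)
# but also spoils fewer correct readings (lower FRRC).
def toy_tfrr(delta):
    return max(0.0, 0.9 - 0.1 * delta)


def toy_frrc(delta):
    return 0.4 / (1 + delta)


theta = [0, 1, 2, 3, 4, 5]
delta_star = best_threshold(theta, p=0.2, tfrr_of=toy_tfrr, frrc_of=toy_frrc)
print(delta_star)
```

The quality of the result depends on how finely the candidate set covers the range of thresholds, since only the listed values are compared.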

17

ACCEPTED MANUSCRIPT

5. Numerical Results In this section, we evaluate the performance of our data reconstruction scheme for data losses and sensor faults compared with other available schemes

260

CR IP T

by a simulator developed in C++. 5.1. Experimental Methodology 5.1.1. Ground Truth Values

We respectively utilize data sequences containing 3 attributes: mean blood

pressure, heart rate and oxygen saturation of 9000 and 5000 time slots as the

265

AN US

ground truth values of vital signs. Those data are selected from an online medical dataset of the PhysioNet database [25]. 5.1.2. Artificial Data Losses and Sensor Faults

For the 9000 groups of online medical data, we select 8000 as the training data to estimate the Bayesian network, and 1000 as diagnosed data. For the 5000 groups of online medical data, we select 4800 as the training data and 200 as diagnosed data. For the simulation of data reconstruction for data losses, we artificially delete some of the diagnosed data to produce missing data. For the simulation of data reconstruction for sensor faults, we use injected faults, which are obtained by artificially modifying some of the diagnosed data. The locations of injected faults and missing data, and the values of injected faults, are all selected by random numbers.
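The random deletion and fault injection described above might look like the following minimal sketch. It assumes a single attribute with integer-valued (discretized) readings, marks a deleted reading as `None`, and keeps loss and fault positions disjoint; the function name, the value range and the seed are hypothetical choices for illustration.

```python
import random
from typing import List, Optional, Set, Tuple


def inject_losses_and_faults(
    truth: List[int],
    loss_rate: float,
    fault_rate: float,
    value_range: Tuple[int, int],
    seed: Optional[int] = None,
) -> Tuple[List[Optional[int]], Set[int], Set[int]]:
    """Return (readings, lost positions, faulty positions) for one attribute."""
    rng = random.Random(seed)
    n = len(truth)
    n_loss = round(loss_rate * n)
    n_fault = round(fault_rate * n)
    # Draw all affected positions at once so losses and faults never overlap.
    picked = rng.sample(range(n), n_loss + n_fault)
    lost, faulty = set(picked[:n_loss]), set(picked[n_loss:])

    readings: List[Optional[int]] = list(truth)
    for i in lost:
        readings[i] = None  # deleted reading
    lo, hi = value_range  # assumed lo < hi so a different value always exists
    for i in faulty:
        wrong = rng.randint(lo, hi)
        while wrong == truth[i]:  # a fault must differ from the true value
            wrong = rng.randint(lo, hi)
        readings[i] = wrong
    return readings, lost, faulty


# Example: 100 heart-rate-like samples, 10% losses and 5% faults.
truth = [70 + (i % 5) for i in range(100)]
readings, lost, faulty = inject_losses_and_faults(truth, 0.10, 0.05, (40, 180), seed=1)
```

Drawing both index sets from one `sample` call is a design choice of this sketch; the original simulator may allow a reading to be selected for both operations.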

5.1.3. Experiment Procedure

The procedure of this experiment is: (1) Learn the structure and parameters of the Bayesian network using the historical training data. (2) Generate the ground truth matrix Q according to the diagnosed data. (3) Generate the sensor reading matrix F according to the ground truth matrix Q, the artificial data losses and the sensor faults. (4) Construct the rebuilt matrix F̄ using the data reconstruction scheme. (5) Calculate the metrics DR, FRR, TFRR and FRRC.
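Step (5) can be illustrated for the three fault metrics with the sketch below. It works on flattened one-dimensional versions of Q, F and F̄ and assumes that an entry counts as faulty whenever it differs from the ground truth; missing entries are not treated separately and DR is omitted. The function and variable names are hypothetical. Under these counting definitions the relation in Eq. (9) holds exactly.

```python
from typing import Dict, Sequence


def reconstruction_metrics(
    truth: Sequence[int], readings: Sequence[int], rebuilt: Sequence[int]
) -> Dict[str, float]:
    """Compute the fault rate p, FRR, TFRR and FRRC by direct counting."""
    n = len(truth)
    faulty_before = [r != q for q, r in zip(truth, readings)]
    faulty_after = [b != q for q, b in zip(truth, rebuilt)]
    n_fault = sum(faulty_before)
    n_correct = n - n_fault
    # Faults that the reconstruction turned back into the true value.
    fixed = sum(1 for fb, fa in zip(faulty_before, faulty_after) if fb and not fa)
    # Correct readings that the reconstruction turned into faults.
    broken = sum(1 for fb, fa in zip(faulty_before, faulty_after) if not fb and fa)
    return {
        "p": n_fault / n,
        "FRR": sum(faulty_after) / n,
        "TFRR": fixed / n_fault if n_fault else 0.0,
        "FRRC": broken / n_correct if n_correct else 0.0,
    }


# Toy data: 3 injected faults (positions 2, 4, 7); two are repaired,
# one is rebuilt wrongly, and one correct reading (position 9) is damaged.
truth = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
readings = [1, 2, 0, 4, 0, 6, 7, 0, 9, 10]
rebuilt = [1, 2, 3, 4, 5, 6, 7, 1, 9, 0]
m = reconstruction_metrics(truth, readings, rebuilt)
eq9 = m["p"] * (1 - m["TFRR"]) + (1 - m["p"]) * m["FRRC"]
```

Here p = 0.3, TFRR = 2/3 and FRRC = 1/7, so Eq. (9) gives 0.3 · (1/3) + 0.7 · (1/7) = 0.2, which equals the directly counted FRR of 2 faults in 10 entries.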

To verify the effectiveness of our data reconstruction scheme, we evaluate two classic methods, ESTI-CS [12] and KNN [9], for comparison. Given a sensor reading matrix F, ESTI-CS is an algorithm that finds a ground truth matrix Q minimizing ||F − Q||, where || · || is the Frobenius norm used to measure the deviation between F and Q. The KNN method uses the values of the K nearest neighbors to estimate the ground truth value of the missing data.

Three series of experiments are conducted in this section. The first two experiments evaluate the performance of the different methods in data reconstruction for data losses and for sensor faults. The third experiment verifies the match between our theoretical prediction and the simulation results.

5.2. Performance Comparisons for Data Losses

To evaluate the performance of our data reconstruction scheme for data losses, we first artificially delete some data in the 1000 and 200 groups of diagnosed data, with the missing rates of mean blood pressure and oxygen saturation fixed at 10%; then we rebuild the data and calculate the metric DR. For comparison, we evaluate the performance of the ESTI-CS and KNN methods. The K in KNN is set to 3 in our simulation, which gives a better DR.

Figure 2 shows that our data reconstruction scheme has the best performance. Even when 50% of the data have been lost, our scheme can rebuild the data with a DR of less than 8%. In the simulation on 9000 groups of medical data, the DRs of ESTI-CS and KNN are close to 15%. In the simulation on 5000 groups of medical data, the DRs of ESTI-CS and KNN are both over 26%. Our scheme is better than the other schemes in data reconstruction. However, when the sensor data contain faults, direct data reconstruction and the metric DR are not applicable. We conduct another simulation for data reconstruction of sensor faults in the following subsection.

5.3. Performance Comparisons for Data Faults

To evaluate the performance of our data reconstruction scheme for sensor data containing faults, we artificially delete data and inject faults into the 1000 and 200 groups of diagnosed data, with the missing data rate and fault rate of mean blood pressure and oxygen saturation fixed at 3% and 6%, respectively. To verify the effectiveness of our scheme, ESTI-CS and KNN are again selected for comparison. In this simulation, the threshold δ for ESTI-CS and KNN is set to 3, which approximately yields the smallest FRR, and the missing data rate is set to 50% of the sensor fault rate.

Figure 3 gives simulation results on the 9000 groups of medical data. Figure 3(a) shows that our scheme has the best FRR, with the number of faults reduced to 35% of the faults before data reconstruction. Figure 3(b) indicates that our scheme has the best TFRR, while ESTI-CS and KNN have a very small TFRR of less than 30%. From our simulation, we find that the fault detection rates of ESTI-CS and KNN (the proportion of faults that are correctly identified as such) are both between 75% and 80%, but the false reconstruction rate for detected faults (the proportion of detected faults that are incorrectly rebuilt) is around 65% for KNN and over 50% for ESTI-CS. Thus, the proportion of faults that are correctly rebuilt by ESTI-CS and KNN is very small. Figure 3(c) shows that our scheme has the smallest FRRC. The FRRC of ESTI-CS and KNN is very large, even exceeding the fault rate. The simulation indicates that a large number of faults are left or newly produced after data reconstruction, which leads to a fault rate even larger than that before reconstruction.

Figure 3: Performance comparison between our scheme and the available data reconstruction schemes on 9000 groups of medical data. (a) Comparison of FRR: fault rate after reconstruction. (b) Comparison of TFRR: true fault reconstruction rate. (c) Comparison of FRRC: false reconstruction rate for correct data. Horizontal axis in each panel: missing data and fault rates of heart rate (0.0 to 0.4). Legend: KNN, ESTI-CS, BN.

Figure 4 gives simulation results of another experiment on the 5000 groups of medical data, which shows performance similar to that of the first simulation on 9000 groups of data. Figure 4(b) indicates that our scheme has a better TFRR, which is over 75%. The TFRR of ESTI-CS and KNN is still small. The simulation shows that the fault detection rates of ESTI-CS and KNN are over 95% and 90%, respectively, while their false reconstruction rates for detected faults are around 80% and 70%, respectively. We can conclude that it is inefficient data reconstruction, rather than poor fault detection, that causes the high FRR of ESTI-CS and KNN, as confirmed by their large FRRC and small TFRR.

Figure 4: Performance comparison between our scheme and the available data reconstruction schemes on 5000 groups of medical data. (a) Comparison of FRR: fault rate after reconstruction. (b) Comparison of TFRR: true fault reconstruction rate. (c) Comparison of FRRC: false reconstruction rate for correct data. Horizontal axis in each panel: missing data and fault rates of heart rate (0.0 to 0.4). Legend: KNN, ESTI-CS, BN.

This simulation is the average result of 200 randomized experiments. We also calculate the confidence intervals at a 95% confidence level. Table 1 shows the average values and the confidence intervals of FRR, TFRR and FRRC for different fault rates; the lower and upper confidence limits are very close to the average values.

Table 1: The confidence intervals of the simulation with 5000 groups of data.

| Fault probability | FRR (%) average | FRR (%) confidence interval | TFRR (%) average | TFRR (%) confidence interval | FRRC (%) average | FRRC (%) confidence interval |
|---|---|---|---|---|---|---|
| 0.05 | 1.16 | [1.07, 1.25] | 86.7 | [84.9, 88.4] | 0.51 | [0.48, 0.56] |
| 0.10 | 2.09 | [1.94, 2.23] | 84.6 | [83.2, 85.9] | 0.60 | [0.54, 0.67] |
| 0.15 | 2.88 | [2.70, 3.06] | 84.1 | [83.0, 85.3] | 0.64 | [0.51, 0.67] |
| 0.20 | 4.39 | [4.15, 4.62] | 80.8 | [80.0, 81.9] | 0.68 | [0.59, 0.77] |
| 0.25 | 5.94 | [5.69, 6.20] | 78.6 | [77.6, 79.6] | 0.79 | [0.68, 0.90] |
| 0.30 | 7.32 | [7.06, 7.59] | 77.7 | [76.8, 78.6] | 0.90 | [0.79, 1.02] |
| 0.35 | 9.71 | [9.41, 10.0] | 73.9 | [73.1, 74.8] | 0.95 | [0.77, 1.03] |
| 0.40 | 12.1 | [11.7, 12.5] | 71.6 | [70.7, 72.4] | 1.22 | [1.05, 1.39] |
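Averaging repeated randomized runs and attaching a 95% confidence interval can be sketched as follows. The text does not state which interval construction was used, so this sketch assumes the common normal approximation (mean ± 1.96 standard errors); the function name and the synthetic run results are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def mean_with_ci(samples: Sequence[float], z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Return the sample mean and a normal-approximation confidence interval."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean from the sample standard deviation.
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)


# Synthetic stand-in for the FRR values of 200 randomized runs
# (made-up numbers, for illustration only).
rng = random.Random(0)
runs = [rng.gauss(0.06, 0.01) for _ in range(200)]
avg, (low, high) = mean_with_ci(runs)
print(f"average = {avg:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

With 200 runs the half-width shrinks by a factor of about 14 relative to the per-run standard deviation, which is consistent with the narrow intervals reported in Table 1.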

5.4. Theoretical Prediction Against Simulation Results

As we know, the performance of the proposed data reconstruction scheme for data containing faults is greatly affected by the threshold, and it is difficult to select an appropriate threshold. We gave a process to calculate the theoretical optimal threshold in the previous section. Is the calculated optimal threshold the true optimal threshold? To verify this, we calculate the theoretical optimal threshold and the theoretical performance for the experiments in the previous subsection, and then compare the results with those of the simulation. The performance of the proposed scheme in Figure 3 is the simulation result under the optimal thresholds, which are obtained as follows: (1) For each fault rate p, we select Θ = {1, · · · , 10} as the discrete threshold set. (2) For each threshold δ ∈ Θ, we conduct 300 simulations. In each simulation, we first inject faults by random numbers, with the number of injected faults determined by the fault rate p. Then we reconstruct the data using the proposed scheme and calculate the metric FRR. The performance is the average result of those 300 randomized simulations. (3) We select the threshold δ that minimizes the simulated FRR as the optimal threshold. On the other hand, we can calculate the theoretical optimal threshold with the Bayesian network estimated from the historical training data. For the experiment on 9000 groups of data, the optimal thresholds obtained by the simulation and by the theoretical algorithm are all 3 when the fault rates range from 5% to 40%. The performance of both the simulations and the theoretical predictions of the proposed data reconstruction scheme under the optimal thresholds is shown in Figure 5, which shows that the results of simulation and theoretical prediction match well.

Body sensors usually rely on batteries for power, so energy consumption is one of the major problems for body sensors. Data collection and transmission are the two main sources of energy consumption. Using the data reconstruction scheme based on the Bayesian network, we can lengthen the sampling interval to extend battery life. If we need some data between two sampling points, we can use the substitute data reconstructed by the Bayesian network.
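As a rough consistency check of Eq. (9) against simulation, the sketch below plugs the average TFRR and FRRC values reported in Table 1 (the 5000-group experiment) into FRR = p(1 − TFRR) + (1 − p)FRRC and compares the result with the reported average FRR. The assignment of the table rows to FRR, TFRR and FRRC is an assumption of this sketch, and the variable names are hypothetical; the check uses simulated averages, not the theoretical values plotted in Figure 5.

```python
# (fault probability, FRR %, TFRR %, FRRC %) averages as read from Table 1.
reported = [
    (0.05, 1.16, 86.7, 0.51),
    (0.10, 2.09, 84.6, 0.60),
    (0.15, 2.88, 84.1, 0.64),
    (0.20, 4.39, 80.8, 0.68),
    (0.25, 5.94, 78.6, 0.79),
    (0.30, 7.32, 77.7, 0.90),
    (0.35, 9.71, 73.9, 0.95),
    (0.40, 12.1, 71.6, 1.22),
]


def frr_percent_from_eq9(p: float, tfrr_pct: float, frrc_pct: float) -> float:
    """Evaluate Eq. (9) with percentages converted to fractions and back."""
    return 100.0 * (p * (1.0 - tfrr_pct / 100.0) + (1.0 - p) * frrc_pct / 100.0)


gaps = []
for p, frr_pct, tfrr_pct, frrc_pct in reported:
    predicted = frr_percent_from_eq9(p, tfrr_pct, frrc_pct)
    gaps.append(abs(predicted - frr_pct))
    print(f"p={p:.2f}  reported FRR={frr_pct:5.2f}%  Eq.(9)={predicted:5.2f}%")

max_gap = max(gaps)  # largest absolute difference, in percentage points
```

Under this row assignment, every value obtained from Eq. (9) lies within 0.1 percentage point of the reported average FRR, which is about what rounding of the table entries would allow.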

Figure 5: Theoretical prediction against simulation results. (a) Comparison of FRR: fault rate after reconstruction, and TFRR: true fault reconstruction rate. (b) Comparison of FRRC: false reconstruction rate for correct data. Horizontal axis in both panels: missing data and fault rates of heart rate (0.0 to 0.4). Each panel plots theoretical results against simulated results.

6. Conclusions

The medical body sensor network is used for personal monitoring under natural physiological status. Limited resources, noise, unreliable links and malicious attacks lead to poor data quality, which can heavily affect the monitoring effect or even threaten the life of the monitored person. In this paper, we formalize a Bayesian network based data reconstruction scheme for body sensor networks. We first formalize a data reconstruction scheme for data losses and then give a theoretical performance analysis. Further, we revise our data reconstruction scheme to rebuild data for sensor faults. Finally, we evaluate the performance of our data reconstruction scheme on an online dataset; the results indicate that our scheme outperforms the compared data reconstruction schemes, ESTI-CS and KNN.

Acknowledgement

Part of this work has been supported by the National Natural Science Foundation of China (No. 61771373, 61771374, 61601357) and the China 111 Project (No. B16037), and in part by the Fundamental Research Fund for the Central Universities (No. JB181508, JB171501, JB181506, JB181507) and the “13th Five-Year” Plan Equipment Pre-Research Foundation of China (No. 6140134040216HT76001).

References

[1] X. Cao, J. Chen, Y. Zhang and Y. Sun. “Development of an integrated wireless sensor network micro-environmental monitoring system,” ISA Transactions, vol. 47, no. 3, 2008, pp. 247-255.

[2] H. Zhang, J. Liu and N. Kato. “Threshold tuning based wearable sensor fault detection for reliable medical monitoring using Bayesian network model,” IEEE Systems Journal, 2016, DOI: 10.1109/JSYST.2016.2600582.

[3] D. Zeng, P. Li, S. Guo, T. Miyazaki, J. Hu and Y. Xiang. “Energy minimization in multi-task software-defined sensor networks,” IEEE Transactions on Computers, vol. 64, no. 11, 2015, pp. 3128-3139.

[4] K. Zhang, K. Yang, X. Liang, Z. Su, X. Shen and H. H. Luo. “Security and privacy for mobile healthcare networks: from a quality of protection perspective,” IEEE Wireless Communications, vol. 22, no. 4, 2015, pp. 104-112.

[5] O. Salem, Y. Liu and A. Mehaoua. “Anomaly detection in medical wireless sensor networks,” Journal of Computing Science and Engineering, vol. 7, no. 4, 2013, pp. 272-284.

[6] H. Zhang, J. Liu and R. Li. “Fault detection for medical body sensor networks under Bayesian network model,” in Proc. IEEE MSN, 2015, pp. 37-42.

[7] K. Xing, S. Zhang, L. Shi, H. Zhu and Y. Wang. “A localized backbone renovating algorithm for wireless ad hoc and sensor networks,” in Proc. IEEE INFOCOM, 2013.

[8] Y. Zhang, N. Meratnia and P. Havinga. “Outlier detection techniques for wireless sensor networks: a survey,” IEEE Communications Surveys and Tutorials, vol. 12, no. 2, 2010, pp. 159-170.

[9] T. Cover and P. Hart. “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, 1967, pp. 21-27.

[10] S. Rajasegarar, C. Leckie, M. Palaniswami and J. C. Bezdek. “Distributed anomaly detection in wireless sensor networks,” in Proc. IEEE ICCS, 2006.

[11] H. Zhang, J. Liu and A. Pang. “A data reconstruction model addressing loss and faults in medical body sensor networks,” in Proc. IEEE GLOBECOM, 2016, DOI: 10.1109/GLOCOM.2016.7841491.

[12] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu and X. Liu. “Data loss and reconstruction in sensor networks,” in Proc. IEEE INFOCOM, 2013.

[13] J. Liu and N. Kato. “A Markovian analysis for explicit probabilistic stopping based information propagation in post-disaster ad hoc mobile networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 1, 2015, pp. 81-90.

[14] O. Salem, A. Guerassimov, A. Mehaoua, A. Marcus and B. Furht. “Sensor fault and patient anomaly detection and classification in medical wireless sensor networks,” in Proc. IEEE ICC, 2013.

[15] E. J. Candes and T. Tao. “Near-optimal signal recovery from random projections: universal encoding strategies?” IEEE Transactions on Information Theory, vol. 52, no. 12, 2006, pp. 5406-5425.

[16] D. L. Donoho. “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, 2006, pp. 1289-1306.

[17] Z. Yang and R. N. Wright. “Privacy-preserving computation of Bayesian networks on vertically partitioned data,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, 2006, pp. 1253-1264.

[18] D. Zeng, L. Gu, S. Guo, Z. Cheng and S. Yu. “Joint optimization of task scheduling and image placement in fog computing supported software-defined embedded system,” IEEE Transactions on Computers, vol. 65, no. 12, 2016, pp. 3702-3712.

[19] D. Janakiram, A. M. Reddy and A. P. Kumar. “Outlier detection in wireless sensor networks using Bayesian belief networks,” in Proc. IEEE COMSWARE, 2006.

[20] E. Elnahrawy and B. Nath. “Context-aware sensors,” in Proc. EWSN, 2004.

[21] B. Krishnamachari and S. Iyengar. “Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks,” IEEE Transactions on Computers, vol. 53, no. 3, 2004, pp. 241-250.

[22] X. Luo, M. Dong and Y. Huang. “On distributed fault-tolerant detection in wireless sensor networks,” IEEE Transactions on Computers, vol. 55, no. 1, 2006, pp. 58-70.

[23] W. Wu, X. Cheng, M. Ding, K. Xing, F. Liu and P. Deng. “Localized outlying and boundary data detection in sensor networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, 2007, pp. 1145-1157.

[24] C. W. Duarte, Y. C. Klimentidis, J. J. Harris, M. Cardel and J. R. Fernández. “Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm,” Journal of the American Statistical Association, vol. 86, no. 416, 1991, pp. 899-909.

[25] “Physionet,” http://www.physionet.org/cgi-bin/atm/ATM.

Haibin Zhang received his B.Sc. degree from the Ocean University of China in 2003 and his Ph.D. degree from Xidian University in 2007. He joined the School of Computer Science and Technology of Xidian University as a lecturer in 2008. He is currently an associate professor with the School of Cyber Engineering of Xidian University. His research interests concentrate on formal verification and wireless sensor networks.