
Accepted Manuscript

PII: S0378-7788(16)30506-0
DOI: http://dx.doi.org/10.1016/j.enbuild.2016.06.017
Reference: ENB 6752
To appear in: Energy and Buildings
Received date: 6-1-2016
Revised date: 4-6-2016
Accepted date: 6-6-2016

Please cite this article as: Dan Li, Yunxun Zhou, Guoqiang Hu, Costas J. Spanos, Fault Detection and Diagnosis for Building Cooling System With A Tree-structured Learning Method, Energy and Buildings (2016), http://dx.doi.org/10.1016/j.enbuild.2016.06.017

Fault Detection and Diagnosis for Building Cooling System With A Tree-structured Learning Method

Dan Li (a), Yunxun Zhou (b), Guoqiang Hu (a), Costas J. Spanos (b)

(a) School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.
(b) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.

Abstract

In order to save energy and improve the performance of building environment regulation, there is an increasing need for fault detection and diagnosis (FDD). This paper investigates the effectiveness of a tree-structured learning method for FDD of the building cooling system. Researchers have been tackling the building FDD task with a wide variety of techniques, such as analytical model-based, signal-based and knowledge-based methods. Recently, data-driven methods have shown their advantage in dealing with complex systems with random penetrations. Existing work on data-driven FDD merely formulates the task as a pure fault-type classification problem, whereas fault severity levels and their inter-dependence have long been ignored. We propose a novel data-driven strategy that adopts structured labelling to include the dependence information and describe the severity levels in a large margin learning framework. A Tree-structured Fault Dependence Kernel (TFDK) method is derived and a corresponding on-line learning algorithm is developed for streaming data. As an improvement over traditional classification methods (e.g. SVM), TFDK encodes the tree-structured fault dependence in its feature mapping and takes a regularized misclassification cost as its learning objective. Following the ASHRAE Research Project 1043 (RP-1043), the strategy is applied to the FDD of a 90-ton centrifugal water-cooled chiller. Experimental results show that, compared to previous data-driven methods, TFDK can greatly improve the FDD performance as well as recognize the fault severity levels with high accuracy.

Keywords: Fault Detection and Diagnosis (FDD), Building Cooling System, Data-driven Method, Pattern Classification, Machine Learning Method.

This research is funded by the Republic of Singapore's National Research Foundation under its Campus for Research Excellence and Technological Enterprise (CREATE) programme through a grant to the Berkeley Education Alliance for Research in Singapore (BEARS) for the Singapore-Berkeley Building Efficiency and Sustainability in the Tropics (SinBerBEST) Program. BEARS has been established by the University of California, Berkeley as a center for intellectual excellence in research and education in Singapore.

1. Introduction

Building energy consumption contributes to more than 40% of the total energy usage worldwide [1, 2]. Almost 32% of the total energy consumption in industrialized countries is used by heating, ventilation, and air-conditioning (HVAC) systems [3]. The newly published ASHRAE Handbook has put special emphasis on automated fault detection and diagnosis (FDD) for smart building systems. In particular, the new standard highlights the necessity of maintaining the whole building system in good working condition through FDD techniques, as well as the significance of saving energy and improving occupant comfort and building safety via automated FDD systems [4]. Therefore, there is an increasing need to study automated fault identification in buildings, aiming at saving energy and offering a more comfortable and safe dwelling environment [5, 6]. In the past decades, researchers have spared no effort to develop algorithms and strategies that detect and diagnose HVAC faults to prevent unnecessary economic losses and maintain the system's working efficiency [7, 8].

In the literature, miscellaneous FDD methods have been proposed, mainly comprising three techniques and their combinations: analytical model-based, signal-based and knowledge-based methods [9, 10, 11, 12, 13]. The model-based method relies on an explicit description of the system. Despite significant theoretical advancement in this direction, few of the solutions can be directly integrated into the Building Management System (BMS) to conduct real-time monitoring [10, 12].

Figure 1: Data-driven building FDD scheme, including the deployed sensor network, database management, and a decision support system.

The signal-based FDD method investigates the correlation between faults and system output signals, and improved performance can be achieved by adding the signal pattern of the healthy status as a prior [10, 12]. The knowledge-based FDD method discovers the underlying knowledge and system features that represent the information redundancy among the system's variables by learning from empirical data; for this reason, the knowledge-based method is commonly referred to as the data-driven method [10, 13]. The empirical data, which records outside environmental factors, internal loads, and mechanical system working conditions, is collected through a sensor network and stored in the BMS [14, 15]. Experts and researchers analyze the empirical data and give feedback to building operators if any fault is found. A common data-driven FDD system for smart buildings is depicted in Figure 1, including the deployed sensor network, database management, and a decision support system.

Figure 2: Schematic diagram of chiller components and refrigerant flow paths; a typical centrifugal chiller system consists of an evaporator, compressor, condenser, economizer, motor, pumps, fans, and distribution pipes, etc.

Recently, a wide range of statistical and machine learning techniques have been explored as data-driven methods in the building FDD field, including Principal Component Analysis (PCA) [16, 17, 18], Statistical Process Control (SPC) [19, 20, 21], Multivariate Regression Models [22], Bayes Classifiers [23, 24, 25], Neural Networks (NN) [26, 27, 28], Fisher Discriminant Analysis (FDA) [29], Gaussian Mixture Models [30], Support Vector Data Description (SVDD) [31, 32], and Support Vector Machines (SVM) [33, 34, 35, 36, 37]. Among these approaches, PCA and SPC are unsupervised methods that do not require expert knowledge for fault labelling, whereas others like NN and FDA are supervised multi-class classification methods that depend on the availability of labelled training data. Once the hypothesis/model is fitted in the training phase, new measurements are tested by the classifiers and assigned to the corresponding categories (normal or faulty) automatically. Although existing work on data-driven FDD has shown promising results in both detection accuracy and efficiency, two important issues, namely fault interdependence and severity levels, are often ignored or over-simplified with homogeneity assumptions [13, 38, 39].
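To make the supervised setting concrete, the sketch below shows the "flat" multi-class training/testing pipeline that the supervised methods mentioned above follow. It is only an illustration, assuming scikit-learn's SVC (not necessarily the toolchain used in this paper) and hypothetical arrays X_train, y_train, X_test holding labelled sensor records.

```python
# Minimal sketch of a flat multi-class FDD classifier (illustrative only).
# Assumes labelled sensor records as NumPy arrays:
#   X_train: (n_samples, 24) measurements, y_train: condition labels,
#   X_test:  new measurements to be classified.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 24))       # hypothetical training records
y_train = rng.integers(0, 8, size=300)     # 0 = normal, 1-7 = fault types
X_test = rng.normal(size=(10, 24))         # hypothetical new measurements

clf = SVC(kernel="rbf", C=1.0)             # one flat classifier over all classes
clf.fit(X_train, y_train)                  # training phase
predicted_condition = clf.predict(X_test)  # assign normal/faulty category
print(predicted_condition)
```

Such a flat classifier treats every class label as unrelated to the others, which is exactly the simplification this paper seeks to remove.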

Table 1: Definitions of 24 essential variables in a typical cooling system (I)

Label   Description                                   Units
TEI     Temperature of entering evaporator water      F
TEO     Temperature of leaving evaporator water       F
TCI     Temperature of entering condenser water       F
TCO     Temperature of leaving condenser water        F
kW      Compressor motor power consumption            kW
FWC     Condenser water flow rate                     gpm
FWE     Evaporator water flow rate                    gpm
TEA     Evaporator approach temperature               F
TCA     Condenser approach temperature                F
TRE     Refrigerant temperature in evaporator         F
PRE     Pressure of refrigerant in evaporator         psig
TRC     Refrigerant temperature in condenser          F

First of all, although it is quite intuitive to establish fault dependence by analysing the connections and structures of each component of the HVAC system, this prior knowledge is rarely considered in the current data-driven FDD literature. For example, Zhao proposed a chiller fault detection method based on Support Vector Data Description (SVDD), which is a one-class classification technique describing the support of a data distribution [31]; by training SVDD models for each fault type, they extended a similar idea to a chiller fault diagnosis strategy in [32]. Noticing that training a one-class classification model for each specific fault type is computationally costly, an alternative is to formulate the FDD issue directly as a multi-class classification problem.
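The cost contrast noted above can be seen in the following sketch, which trains one one-class model per fault type in the style of the SVDD-based diagnosis. scikit-learn's OneClassSVM is used here only as a stand-in for SVDD (an assumption of this illustration), and fault_data is a hypothetical dictionary of per-fault training records.

```python
# Illustrative contrast: one one-class model per fault type (SVDD-style diagnosis).
# fault_data is a hypothetical dict {"CF": array, "EO": array, ...}, each (n_i, 24).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
fault_data = {name: rng.normal(size=(100, 24)) for name in
              ["CF", "EO", "FWC", "FWE", "NC", "RL", "RO"]}

models = {}
for name, records in fault_data.items():   # one model fitted per fault class
    models[name] = OneClassSVM(kernel="rbf", nu=0.1).fit(records)

x_new = rng.normal(size=(1, 24))            # a new measurement
scores = {name: m.decision_function(x_new)[0] for name, m in models.items()}
diagnosis = max(scores, key=scores.get)     # fault whose description fits best
print(diagnosis)
```

Every additional fault type (or severity level) adds another model to train and evaluate, which motivates the single multi-class formulation adopted in the rest of the paper.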

Table 2: Definitions of 24 essential variables in a typical cooling system (II)

Label     Description                                    Units
PRC       Pressure of refrigerant in condenser           psig
T_suc     Refrigerant suction temperature                F
TRC_sub   Subcooling temperature                         F
Tsh_suc   Refrigerant suction superheat temperature      F
TR_dis    Refrigerant discharge temperature              F
Tsh_dis   Refrigerant discharge superheat temperature    F
P_lift    Pressure lift across compressor                F
TO_sump   Temperature of oil in sump                     F
TO_feed   Temperature of oil feed                        F
PO_feed   Pressure of oil feed                           F
TWCD      Condenser temperature                          F
TWED      Evaporator temperature                         F
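For later reference, each data record used by the classifiers is simply the vector of these 24 channel readings. A minimal sketch of assembling such records is given below; it assumes the measurements are available in a pandas DataFrame whose column names (hypothetical here) follow the labels of Tables 1 and 2.

```python
# Sketch: assembling the 24-dimensional feature vector x from the variables
# defined in Tables 1 and 2 (column names are assumptions for illustration).
import numpy as np
import pandas as pd

FEATURES = ["TEI", "TEO", "TCI", "TCO", "kW", "FWC", "FWE", "TEA", "TCA",
            "TRE", "PRE", "TRC", "PRC", "T_suc", "TRC_sub", "Tsh_suc",
            "TR_dis", "Tsh_dis", "P_lift", "TO_sump", "TO_feed", "PO_feed",
            "TWCD", "TWED"]

# Hypothetical sensor log sampled every 10 seconds.
log = pd.DataFrame(np.random.randn(100, len(FEATURES)), columns=FEATURES)

X = log[FEATURES].to_numpy()   # each row is one record x in R^24
print(X.shape)                 # (100, 24)
```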

To list a few, Du proposed to utilize Fisher Discriminant Analysis (FDA) and Principal Component Analysis (PCA) to diagnose multiple sensor faults in AHUs [29]. Keigo employed semi-supervised FDA to detect building energy faults, and adopted Decision Boundary Analysis (DBA) to discover the hidden relationship between the extracted features and the corresponding faults [40]. However, all of the aforementioned work is restricted to modelling each type of fault separately with single (flat) class labels and ignores valuable prior information on fault dependence, which could otherwise be exploited (fused) to improve the detection performance of the machine learning method [41]. Moreover, when dealing with complex building systems, the number of fault types (classes) is expected to be large, while usually only a small amount of labelled data is available for each fault class. From a statistical learning perspective, adopting a flat multi-class learning method and ignoring prior information results in a loss of valuable information, thus leading to degraded performance [42].

Secondly, the presence of different fault severities is well acknowledged in experiments but has long been ignored for FDD purposes. In a real building cooling system, faults naturally exhibit various levels of severity due to different system/component degradations [43, 44, 45, 46]. For instance, in the research on typical chiller faults, condenser fouling is a physical obstruction caused by the aggregation of non-decomposable chemical substances in the condenser tubes. It lowers the effective heat transfer coefficient and decreases the water flow rate in a manner consistent with the degree of aggregation. Hence the severity/degree of a fault provides researchers and system managers valuable information to optimize maintenance actions, as well as to set priorities for different system scenarios. On the other hand, the advancement of sensor network technology has greatly improved the capability to monitor temperature, flow rate, pressure, etc. with a refined spatial-temporal granularity [44]. In short, detecting severity levels in a data-driven framework is not only favorable, but also doable. Until now, however, no work has tried to identify how serious the identified fault is.

In this paper we emphasize the importance of incorporating the prior knowledge of fault dependence (derived from chiller system characteristics). The objective of this paper is to design a novel data-driven FDD method, so as to: (1) recognize faulty working conditions and identify the fault type, and (2) determine how serious the identified fault is. We propose a unified framework, namely the Tree-structured Fault Dependence Kernel (TFDK) method, to include the inter-class information and describe the fault severity levels. The tree-structured labels regard the severity levels as child nodes of each fault type rather than viewing them as independent classes. In addition, an on-line learning method is developed to train a multi-class classifier with streaming sensor measurement data. The effectiveness of TFDK has been evaluated on the experimental data of ASHRAE Research Project RP-1043, and the results show significant improvement over state-of-the-art approaches.

Figure 3: Schematic of the cooling system test facility and sensors mounted in the related water circuits.

Compared with previous building FDD works, this paper makes its contributions in several ways. (1) A TFDK method is derived to make use of the fault dependence information, thus achieving higher FDD accuracy compared with other methods. (2) An on-line learning method is developed to accommodate streaming data, which enables seamless integration into the BMS and sequential decision-making for HVAC scheduling. (3) Detailed information about the building performance is provided by identifying fault severity levels, hence giving researchers and building managers more options for taking actions to handle the faults.

In Section II, we present the formulation of structured dependence information in the building cooling system. The derivation of TFDK, which is based on the structured building FDD formulation, is given in Section III. Section IV presents the FDD results obtained by TFDK and compares it with other multi-class classification methods. Section V summarizes the paper and suggests possible future work.

Figure 4: Raw data of five variables under normal condition collected by sensors mounted in the cooling system.

2. Structured Class Information in the Building Cooling System


2.1. Cooling System with Centrifugal Water-cooled Chiller

In tropical areas such as Singapore, cooling systems, especially those with centrifugal chillers, account for a large portion of the energy usage of HVAC systems [47]. As shown in Figure 2, a typical centrifugal chiller system consists of an evaporator, compressor, condenser, economizer, motor, pumps, fans, and distribution pipes, etc. Figure 2 also depicts the chiller refrigerant flow paths. At the beginning of the refrigerant cycle, liquid refrigerant is distributed along the evaporator and sprayed with high pressure through small holes in a distributor to uniformly coat each evaporator tube. Here the liquid refrigerant absorbs enough heat from the chilled water circulating through the evaporator tubes to turn into refrigerant vapor; the chilled water is cooled down during this process. The gaseous refrigerant is then drawn through the eliminators (which remove droplets of liquid refrigerant from the gas) and delivered into the impellers, where the gas is compressed. Once the compression is completed, the gas is discharged into the condenser, where baffles distribute the compressed refrigerant gas evenly across the condenser tube bundle. Cooling tower water circulating through the condenser tubes absorbs heat from the refrigerant, thus turning the gaseous refrigerant into liquid.

Figure 5: Outliers of "pressure of oil feed (PO_feed)" are removed by the Thompson Tau method; the raw data is collected under 8 working conditions.

Figure 6: Temperature of leaving evaporator water pre-processed by wavelet de-noising (level = 5); the raw data is collected under 8 working conditions.

The liquid refrigerant then drains from the bottom of the condenser and passes through an expansion valve, where its pressure and temperature are reduced. Finally, the low-pressure mixture enters the evaporator and the next cycle starts.

Figure 7: Temperature of leaving evaporator water pre-processed by wavelet de-noising (level = 10); the raw data is collected under 8 working conditions.

Figure 8: Temperature of leaving evaporator water pre-processed by wavelet de-noising (level = 15); the raw data is collected under 8 working conditions. It can be seen that the periodic patterns are removed when the wavelet decomposition level is relatively high.

In order to minimize the energy consumed by the cooling system, the entering condenser water temperature set-point should be as low as possible. At the same time, it should be at or above the lowest temperature attainable by the cooling tower at a given (wet-bulb) air temperature, so as not to waste fan energy on a saturated value. The chilled water supply (leaving evaporator water) temperature is maintained at its set-point by regulating the cooling coil inlet valve position. The valve motor opens or closes to tune the valve position through a feedback controller, which maintains the pre-set chilled water supply temperature based on the difference between the set-point and the measured temperature value [48].

A great number of sensors are installed within cooling systems for the purpose of monitoring and control. In order to detect and diagnose typical faults, this study analyzes the sensor measurements of the most essential variables of the cooling system. In this work, we analyze 24 essential variables according to [31] and [47]. They are common parameters for controlling and

The cooling system studied in this paper is a typical centrifugal water-cooled

160

M

chiller system with motor-driven compressor. Following the ASHRAE RP-1043, the sensor measured raw data is in the form of time series, and presents periodic patterns mostly due to the on/off (open/close) states of some components in the

d

system. The raw data of five variables is shown in Figure 4, where periodic pat-

te

terns and obvious outliers can be viewed. Before formulating the tree-structure FDD of the cooling system, we propose to pre-process raw data by removing periodic patterns and outliers so as to avoid the non-negligible side effect of confusing patterns and outliers on data analytics [49].

Ac ce p

165

2.2.1. Periodic Pattern Removing In this study, Wavelet-based De-noising is utilized to remove the confusing

periodic patterns. Wavelet Transform is an infinite set of various transforms ψ (t), and is obtained from a single orthonormal wavelet, called mother wavelet or basic function, by scaling and shifting (translation). The wavelet series can be defined as

1 ψa,b (t) = √ ψ a



t−b a

 (1)

where a and b represent the scale and translation parameters respectively. In the discrete case where the Wavelet Transform can be used for denosing, the

12

Page 12 of 42

scale and translation parameters are discretized as a = 2m and b = n2m . The

ψa,b (t) =ψm,n (t) = 2−m/2 ψ 2−m t − n



ip t

dilated and translated version of the mother wavelet ψ (t) can be written as: (2)

cr

where m and n denote the scale and translation parameters respectively. Given

an original signal f (t), its wavelet coefficients are obtained through the inner Z∞ W (a, b) =

∗ ψa,b (t) f (t)dt

us

product operation:

−∞

(3)

where ∗ is the complex conjugate symbol and ψ is the basic function, which

170

an

can be chosen according to the properties of the given function f . The choice of mother wavelet (e.g. Haar, Daubechies, Coiflets, Symlet, Biorthogonal and

M

etc.) determines the final waveform shape, and in our case we choose Symlet as the basic function.

The essence of de-noising using Discrete Wavelet Transform (DWT) is to

d

reduce the noise in the wavelet transform domain [50]. Define the noisy obser-

te

vations as W = [w1 , w2 , ..., wN ], satisfying W=f+ε

(4)

Ac ce p

where f = [f1 , f2 , ..., fN ] is the desired noise-free signal, and ε = [ε1 , ε2 , ..., εN ]

is the observation noise. Firstly, we apply DWT to the noisy signal to produce

175

the noisy wavelet coefficients to the level which we can properly distinguish the signal pattern. Then we inverse wavelet transform of the filtered wavelet coefficients to obtain a de-noised signal. As shown in Figures 6-8, the raw data of leaving evaporator water temperature is pre-processed by Wavelet-based Denoising with increasing levels of wavelet decomposing. We can see from those

180

figures that the periodic patterns can be removed from the original raw data when the wavelet decomposition level is relatively high. 2.2.2. Outlier Removing Sensor data for each variable is treated as a column vector and thus obvious outliers can be detected by the Modified Thompson’s Tau method [51], which is 13

Page 13 of 42

based on the absolute deviation of each record from the mean of the entire vector. The strength of this method lies in the fact that it takes into account a data

ip t

set’s standard deviation and average, and provides a statistically determined

rejection zone; thus providing an objective method to determine whether a data

cr

point is an outlier. The rejection zone is given by tα/2 (n − 1) τ=√ q n n − 2 + t2α/2

us

(5)

where tα/2 is the critical value from the Student’s t distribution [52], and n is

an

the sample size. The absolute deviation of the data set is σ = |(X − mean (X))/S|

(6)

M

where S is the sample standard deviation. If σ > τ , the data point is an outlier. As shown in Figure 5, the variable “pressure of oil feed (PO feed)” is 185

measured by corresponding sensors under 8 conditions (normal condition and

d

7 faulty conditions), and the time series sensor measurements are smoother

te

without outliers compared with the raw data. 2.3. Tree Structure Formulation

Ac ce p

A large number of possible faults and failures have been identified by ASHRAE

190

RP-1043, while not all of them would be practical for further examination as part of the FDD scheme [44]. It is expected that the faults chosen for experimental testing could be detected and diagnosed by monitoring the thermodynamic states of the chiller. Based on how often one fault occurs and how much economical loss it causes, we pick 7 typical faults as our research content:

195

• Condenser fouling (CF) • Excess oil (EO) • Reduced condenser water flow rate (FWC) • Reduced evaporator water flow rate (FWE)

14

Page 14 of 42

4 Compressor

Air Handling Unit

Cooling Coil Water System

1

Cooling Tower Water System

Cooling Tower

cr

2 Evaporator

Condenser

3

5

ip t

7

6

an

us

1. Condenser fouling 2. Reduced condenser water flow rate 3. Non-condensable in refrigerant 4. Excess oil 5. Refrigerant leakage 6. Refrigerant overcharge 7. Reduced evaporator water flow rate

Figure 9: Seven typical faults and their locations in the cooling system. Faults 1 and 2 occur in the cooling tower water circle; faults 3, 5, and 6 occur in the refrigerant circle; fault 4 occurs

M

in the compressor; and fault 7 occurs in the cooling coil water circle.

• Non-condensable in the refrigerant (NC) • Refrigerant leak/undercharge (RL)

d

200

te

• Refrigerant overcharge (RO)

This paper aims to distinguish the 7 faulty working conditions from the

Ac ce p

normal working condition for a typical cooling system, and also recognize different fault severity levels based on the pre-defined severity level information.

205

Unlike previous FDD methods which assign the faults and their severity levels with plain labels and formulate the FDD task as simple multi-class classification problem, we include the fault dependence information into the feature mapping and encode the fault types as well as their severity levels with tree-structured labels.

210

In the light of the expert knowledge about the cooling system configuration,

the chosen faults happen in different places within the cooling system, which leads to structured inter-class relationships. As shown in Figure 9, chiller water flows through the evaporator pipes and the cooling coil in Air Handling Unit. Therefore the FWE fault, which occurs in the cooling coil water circuit, is 215

relatively not closely related to other faults that occur in other components or 15

Page 15 of 42

Refrigerant Fault

EO

O\L

NC

CF

FWC

RO

us

RL

Condenser Fault

Normal

cr

FWE

ip t

Cooling System

each fault type.

30

31

14-17

6-9

M

Root

an

Figure 10: Chiller faults with tree labelling; gradient arrows represent severity levels under

38

1

d

37

33

22-25

26-29

35

36

2-5

10-13

18-21

Ac ce p

te

32

34

39

Figure 11: Structured labels as a tree for typical chiller faults and corresponding severity levels.

subsystems. Similarly, the EO fault which happens in the compressor motor oil tank is also relatively not closely related to other faults. The NC fault and the RL/RO fault are correlated since they are relevant to the refrigerant. The CF fault, which means condenser pipes are partly blocked, and the FWC fault share

220

the closest correlation because they locate in the cooling tower water circuit and will influence the condenser performance in the first place once happen. On the basis of those prior expert knowledge, we can describe the relationship among the faults and their severity levels with a “tree”, where different fault types as

16

Page 16 of 42

well as the normal condition are described as the branch nodes (non-leaf nodes) 225

and severity levels for one fault are regarded as the leaf nodes rooted from the

ip t

same parent node. The “tree” is depicted in Figure 10, in which the gradient

cr

arrows represent severity levels under each fault type.

3. Feature Mapping for Tree-structured Fault Dependence

us

3.1. Feature Mapping

In this section, we introduce a feature mapping that incorporates the prior knowledge about faults and severity level dependence.

To begin with, let

be a set of labelled training data, where xi ∈ Rd denotes the

an

n {(xi , yi )}i=1

ith record of sensor measurements (d streams) under system condition yi ∈ Y , {1, ..., q} (Detailed data description is presented in Section IV). The tree-

M

structured relationships between chiller faults depicted in Figure 10 can be encoded as Figure 11. Each node, including the leaves for severity levels but except

d

the root, is numbered with an integer k ∈ {1, 2, · · · , s}. In our case, nodes 1−29 are classification categories, the normal situation is encoded as node 1, and the

te

seven faults with their four severity levels are encoded as nodes 2 − 29. Nodes 30 − 39 are intermediate nodes that represent the fault dependence. Next, to

Ac ce p

incorporate the tree structure information in each data sample we consider an attributes reweighing vector Λ (y) ∈ Rs and the transformation Φ : Rd → Rd×s according to [53], such that

230

Φ (x, y) = Λ (y) ⊗ x

(7)

where ⊗ denotes a tensor product, i.e. Φ (x, y) ∈ Rd×s is a vector containing all products of coefficients from the first and second vector argument. Writing out Φ (x, y),



λ1 (y) × x

   λ2 (y) × x Φ (x, y) =    ...  λs (y) × x

       

(8)

17

Page 17 of 42

in which the attributes reweighing vector is defined as

ip t

  v , ifz  y z λz (y) =  0, otherwise

(9)

235

cr

where the relation  denotes that a node z is y or the ancestor of y. The reweighting parameter vz ≥ 0 could be used to include the different influence of

node z on node y. In the simplest case it can be set to 1, and λz becomes an

us

indicator function. In a more refined configuration, one can set vz to a positive number that reflects the depth of node z in the tree.

an

Based on the above transformation one normal class and four severity levels for each of the seven faults are numbered as 29 categories and their dependence constitutes the additional 10 parent nodes in the tree. For example, the labelling

M

and the transformation for level 1 of the RL fault is

Λ (22) = [0, · · · , v22 , · · · , v32 , · · · , v37 , v38 , 0]

T T

d

Φ (x, 22) = [0, · · · , v22 x, · · · , v32 x, · · · , v37 x, v38 x, 0]

With the feature mapping, we consider a general version of discriminant

te

functions F for classification purpose,

Ac ce p

F (x, y; w) , hw, Φ (x, y)i

(10)

For simplicity, let hw, Φ (x, y)i = hwy , xi. It is a straightforward consequence

of the linearity of Eq. (10) to show that one can re-write F as an additive superposition of linear discriminant as follows, F (x, y; w) =

s X

λz (y) hwz , xi

(11)

z=1

where wz ∈ Rd is a weight vector associated with the rth class attribute. As a

concrete example, the discriminant function for node 22 in Figure 11 is hw, Φ (x, 22)i = hw22 , xi + hw32 , xi + hw37 , xi + hw38 , xi

18

Page 18 of 42

Algorithm 1 On-line Update Algorithm Input (xt+1 , yt+1 )

ip t

St+1 ⇐ ∅ while S1 ...St+1 still change do

cr

for ı = t + 1 : −1 : 1 do t+1 P P w= αjy0 δΦj (y 0 ) j=1 y 0 ∈Sj

H (y) = (1 − hw, δΦi (y)i) ∆ (yi , y)

us

y ∗ = arg max H (y) y

ξi = max {H (y)} y∈Si

an

if H (y ∗ ) > ξi + ε then Si ← Si ∪ {y ∗ } αS ← Solve dual with S

M

end if end for

te

Output S1 0 ...St 0 , St+1 0

d

end while

3.2. TFDK Learning Method

Ac ce p

The learning objective is to find optimal parameters w for the classification function f , which can be written as, f (x; w) , arg max F (x, y; w)

(12)

y∈Y

In this work we adopt a large margin learning formulation [54]. Firstly the

multi-class margin of a data sample (xi , yi ) with respect to a parameterization

w can be defined as

γi , F (xi , yi ; w) − max F (xi , y; w) y6=yi

(13)

Consider a category dependent cost ∆(yi , y) for misclassifying yi as y (which is clarified in subsection 4.2, and interested readers can refer to [55] for more information), one arrives at the following L2 regularized soft-margin learning

19

Page 19 of 42

objective n CX 1 2 min kwk + ξi 2 n i=1

240

 

γi (w) ≥ 1 −

ip t

ξi ≥ 0

∀i ξi ∆(yi ,y)

(15)

cr

s. t.

  

(14)

where C is a hyper-parameter that tunes the margin loss penalty.

us

According to Eq. (10) and Eq. (13) γi (w) = hw, Φ (xi , yi )i − max hw, Φ (xi , y)i y6=yi

(16)

an

≤ hw, Φ (xi , yi )i − hw, Φ (xi , y)i (∀y 6= yi ) ∆

Letting δΦi (y) = Φ (xi , yi ) − Φ (xi , y), for ∀i, ∀y 6= yi , we can get

M

hw, δΦi (y)i ≥ γi (w) ≥ 1 −

ξi ∆ (yi , y)

(17)

Thus the second constraint of Eq. (15) can be rewritten as ξi ≥0 ∆ (yi , y)

(18)

d

hw, δΦi (y)i − 1 +

The dual formulation of the above primal problem with the Lagrangian

te

multiplier method [56] is

Ac ce p

n n P P 2 L (w, ξ, α, η) = 21 kwk + C ξi − η i ξi n i=1 i=1   n P P − αiy hw, δφi (y)i − 1 + ∆(yξii ,y)

(19)

i=1 y6=yi

Computing the derivations of L with respect to the primal variables by KKT

conditions [57] results in

n X X ∂L =0⇒w= αiy δΦi (y) ∂w i=1

(20)

X ∂L C αiy = 0 ⇒ ηi = − ∂ξi n ∆ (yi , y)

(21)

y6=yi

y6=yi

Plugging w and ηi into Eq. (19), one can get L = − 21 +

P

P

αiy αjy0 hδΦi (y) , δΦj (y 0 )i

i,j y6=yi ,y 0 6=yj n P P

(22)

αiy

i=1 y6=yi

20

Page 20 of 42

Since Eq. (14) is equivalent to min −L, with the condition that the primal α

ηi =

ip t

variables are non-negative, one can get X C αiy − ≥0 n ∆ (yi , y)

(23)

y6=yi

α

XX 1X X αiy αjy0 hδΦi (y), δΦj (y 0 )i − αiy 2 i,j i y6=yi

y6=yi y 0 6=yj

  αiy ≥ 0 s. t.  P

∀ i,

(24)

∀ y 6= yi ≤

C n

∀i

an

αiy y6=yi ∆(yi ,y)

us

min

cr

Thus the primal-dual transition of the soft-margin learning objective is

Notice that the dual form only involves the inner product [58] of δΦi (y) and

M

δΦi (y 0 ), which admits direct calculation as follows

hδΦi (y), δΦj (y 0 )i = hΛ(y) − Λ(yi ), Λ(y 0 ) − Λ(yj )ihxi , xj i

(25)

Ac ce p

te

d

Hence we define the Tree-structured Fault Dependence Kernel (TFDK) as   0 if y = yi or y 0 = yj K(i,y)(j,y0 ) = (26)  hδΦi (y), δΦj (y 0 )i otherwise where K(i,·)(j,·) is a |Y| × |Y| matrix and constitute the (i, j)th block of the

245

overall kernel matrix K. It’s straightforward to check that K is positive semi-

definite, and thus the learning problem belongs to a convex quadratic program (QP). Although various methods exist in literature to solve convex QP, for the problem at hand, the presence of many linear constraints and the requirement for learning with streaming data motivate us to design an online active set

250

algorithm. Key update steps for the algorithm is summarized in Algorithm 1, where one can simply use empty set for the initialization of active sets. The convergence argument is included in the Appendix. As shown in Figure 12, to integrate the FDD tool into BMS and update its results successively, labeled historical data is firstly accumulated into a fault

255

library and stored in a database as the “batch training” data set, based on 21

Page 21 of 42

Communication System

ip t

Data Collection

Labeled Historical Data

cr

Data Pre-processing

BMS

Archiving

Monitoring System

us

DATABASE

an

On-line Monitoring

Fault Library Training

HVAC Control System

d

Feedback

Operation

TFDK-based Method

M

FDD TOOL

te

Figure 12: Schematic showing how to integrate the FDD tool into BMS.

Ac ce p

which an initial TFDK model is obtained (by running Algorithm 1). Then at each round of detection, real time sensor measurements and monitoring data are collected as the input of a stand-alone program, which implements the classifier, to conduct FDD. The detection results are provided to building managers

260

and operators for further inspection and possible action taken. Their feedback, i.e., another labeled instance, constitutes the “real time training” input of the algorithm, which is designed incrementally. With this online training phase, the TFDK model is refined and is used for future FDD. The process goes on iteratively as mentioned above.

265

Our team - the Building Efficiency and Sustainability in the Tropics (SinBerBEST) program

1

- is operating an Integrated Cyber-Physical test-bed lo-

1 http://sinberbest.berkeley.edu/research/thrust-6-test-bed-integration

22

Page 22 of 42

Table 3: Fault severity levels for each fault and corresponding experimental methods

SL

NM

Description

ip t

Name

Test run under normal condition

cr

CF 1/2/3/4 Plugged 20/33/49/74 tubes (out of 164) in the condenser EO 1/2/3/4 Oil charge 14%/32%/50%/68% more than nominal

us

FWC 1/2/3/4 Reduce condenser water flow rate by 10%/20%/30%/40%

FWE 1/2/3/4 Reduce evaporator water flow rate by 10%/20%/30%/40%

an

NC 1/2/3/4 Adding 0.1/0.16/0.22/0.54 lbs Nitrogen to the refrigerant; displacing about 1.0%/1.8%/2.4%/5.6% of the volume

M

at room temperature

RL 1/2/3/4 Refrigerant charge 10%/20%/30%/40% less than nominal

d

RO 1/2/3/4 Refrigerant charge 10%/20%/30%/40% more than nominal

te

cated in Singapore, and we are working towards implementing the designed FDD strategy in the test-bed together with a set of fault-prevention/reaction

Ac ce p

control laws.

270

4. Validation Results and Comparison 4.1. The Experimental Data The proposed fault detection framework is tested with the data collected

from ASHRAE RP-1043 project. As a brief introduction, one primary goal of the project was to obtain state measurement for a typical cooling system under

275

normal, and various faulty conditions. A 90-ton centrifugal water-cooled chiller is used, which is relatively small such that a comprehensive experiment design is possible, and it also bears enough representatives of chillers used in larger installations [44]. The experiment was conducted in an indoor environment with a nearly constant ambient temperature of 72◦ F , and the specifications of ARI 23

Page 23 of 42

RS-232 to RS-485 Converter

ip t

RS-485

JCI AHU Controller

us

cr

RS-232

PC Running VisSim

an

Centrifugal Chiller with MicroTech Controller

Figure 13: Schematic showing chiller test standard and control interface.

(Air-Conditioning and Refrigeration Institute) Standard 550 for Centrifugal and

M

280

Rotary Screw Water-Chilling Packages were adopted as the test requirement [44]. Sensor measurement is transferred to a database from the MicroTech

d

controllers, which are mounted on the chiller. The test standard is controlled

285

te

by three Johnson Controls Inc Air Handling Unit (JCI AHU) controllers on an N2 bus which is an RS-485 network. As shown in Figure 13, RS-485 is connected

Ac ce p

to the PC through COM Port 1 via an RS-485 to RS-232 converter. During the experiment, 9 typical faults suggested in the ASHRAE RP-1043

were introduced at multiple severity levels. More than 60 tests were conducted, and for each test, 64 variables, including direct sensor measurement and calcu-

290

lated physical indexes, were recorded once every 10 seconds. In this paper, 7 commonly encountered faults are taken into account. Those faults are emulated by various experimental methods as is summarized in Table 3. For example, in the original ASHRAE RP-1043 experiment the NC fault was introduced by incrementally adding Nitrogen to the refrigerant.

295

Considering the availability of sensor measurement in a more practical situation, 24 most accessible variables listed in Table 1 and Table 2 are chosen as algorithm input features. With TFDK method, the desired output includes not

24

Page 24 of 42

30

31

14-17

6-9

38

1

37

39

34

35

36

2-5 32

33 2

4

26-29

10-13

us

22-25

18-21 3

cr

5

ip t

Root

1

an

Figure 14: Structured labels as a tree for typical chiller faults and corresponding severity levels; examples of misclassification cost among severity levels and fault types.

M

only normal/fault types, but also 4 severity levels if a fault is detected. Data groups of all fault types with four severity levels as well as data collected under 300

normal condition are defined as the training data sets, i.e. 29 categories in total.

d

We encode those categories with tree-structured labels as depicted in Figure 11.

te

In order to justify the adopted Thompson Tau and Wavelet-based De-noising methods, we compare two cases with and without the data pre-processing tech-

Ac ce p

niques mentioned in section II. 305

4.2. Evaluation Measures

To evaluate the effectiveness of the tree-structured classification method on

a more rigorous basis, we employ two measures: testing accuracy and testing cost.

4.2.1. Testing Accuracy

The goal is to estimate the chance that the predictor f (x) is correct on

future unseen data, i.e., the generalization performance of the predictor. In this work, we use the empirical accuracy on a batch testing data set as an unbiased estimator. Let sign [., .] be 1 if the predicted label of one testing data point

25

Page 25 of 42

(a) Pre-processed Data

100

(b) Raw Data

100

ip t

90

90

80

70

60

50

TFDK MSVM DT NN AB QDA LA

30

60

50

40

TFDK MSVM DT NN AB QDA LA

30

20

20

10 0

50

100

150

200

0

50

100

150

200

Training Sample Size

an

Training Sample Size

us

40

70

cr

Classification Accuracy

Classification Accuracy

80

Figure 15: Classification accuracy as a function of training sample size by different methods; TFDK generates the highest accuracy. Figure (a) shows the classification accuracy of different

M

methods by data that is pre-processed by de-nosing and outlier removing; figure(b) is directly by raw data.

d

accorded with its original label and 0 otherwise, the testing accuracy is n

te

Accu (f ) =

1X sign [f (xi ) , yi ] n i=1

Ac ce p

  1 sign [f (xi ) , yi ] =  0

310

f (xi ) = yi

(27)

(28)

f (xi ) 6= yi

where f (xi ) is the predicted label for testing data point xi as in Eq. (12), which represents that the data point is recognized as a certain severity level of one fault type, and yi is the true label that records the real experiment condition.

4.2.2. Testing Cost

While testing accuracy is an unbiased estimator of classification correctness,

315

it treats all errors equally important, i.e., all types of errors induce the same cost. In practice, however, the seriousness and the consequence of committing different types of errors may vary significantly. In particular based on the tree-structured relationships between different faults in Figure 11, the misclassification of one category to another category will cause different losses. In order 26

Page 26 of 42

100 Pre-processed data Raw data

90

ip t

80

60

50

cr

Classification Accuracy

70

40

20

10

0 TFDK

MSVM

DT

NN

AB

QDA

LA

an

Training Sample Size: 30

us

30

Figure 16: Comparison of classification accuracy by pre-processed data and raw data. Data

320

M

pre-processing helps to improve the classification accuracy.

to incorporate this consideration, we define the cost of misclassification among severity levels under the same fault type to be the lowest, the cost of misclassi-

d

fication among fault types derived from different parent nodes to be higher, and

te

the cost of recognizing fault as normal to be the highest. Especially, one can assign a cost that is proportional to the node distance in the tree depicted in Figure 14. For instance, the cost of misclassification among leaf nodes 26 − 29

Ac ce p

325

is 1; and the cost among leaf nodes 26 − 29 and 18 − 21 is 3. Putting all the

defined costs in a cost matrix ∆, the misclassification cost can be characterized by a loss function as

q P q P

∆ij g (yj , f (xi ))

i=1 j=1

Fcost (f ) = P P

∆ij g 0 (yj , f (xi ))

(29)

where g (yj , f (xi )) is in fact a confusion matrix in which each row represents the

330

number of samples in predicted class while each column represents the samples in actual (true) class, g 0 (yj , f (xi )) represents how many testing data points will be classified to category j if testing data from category i is averagely classified to other categories, and ∆ij is the cost of classifying test data point from category

i to j (∆ij = 0 if i = j), and here Fcos t (f ) is the absolute cost value for the 27

Page 27 of 42

0.6

Misclassification Cost

Misclassification Cost

0.5

TFDK MSVM DT NN AB QDA LA

0.7

0.4

0.3

0.5

0.4

cr

0.6

(b) Raw Data

0.8 TFDK MSVM DT NN AB QDA LA

0.3

0.2

0.1

0

0 0

50

100

150

200

0

50

100

150

200

Training Sample Size

an

Training Sample Size

us

0.2

0.1

ip t

(a) Pre-processed Data

0.7

Figure 17: Misclassification cost as a function of training sample size by different methods; TFDK generates the lowest cost. Figure (a) shows the misclassification cost of different

M

methods by data that is pre-processed by de-nosing and outlier removing; figure (b) is directly by raw data.

0.25

d

Pre-processed data Raw data

te

0.15

Ac ce p

Misclassification Cost

0.2

0.1

0.05

0

TFDK

MSVM

DT

NN

AB

QDA

LA

Training Sample Size: 30

Figure 18: Comparison of misclassification cost by pre-processed data and raw data. Data pre-processing helps to reduce the misclassification cost.

335

classifier f . Notice that the misclassification cost is considered as one optimization constraint in Eq. (24).

28

Page 28 of 42

4.3. Results and Comparison

340

ip t

In order to justify the statistical performance of the proposed FDD framework, we adopt a classical “training-testing” procedure. The pre-processed data

is randomly divided into two parts, one for fitting the TFDK model and the

cr

other one for testing the attained model on unseen data set. Since labeled data usually has limited availability in practice, we train the TFDK classifier with

345

us

various sample sizes to analyze its impact on testing accuracy. Given that the raw data of ASHRAE RP-1043 are collected every 10 seconds, sample size also represents the time duration spent on data collection. For example, within 10

an

minutes, sensors can collect 60 data samples, each with 24 channels. In this work, we train the classifier with 8 different sample sizes (i.e. 6, 12, 18, 30, 48, 90, 120, and 180). For each configuration, the testing data is randomly chosen from the pre-processed testing data set and testing sample size is 1600 for each

M

350

fault type (400 for each severity level) and 400 for the normal condition.

d

4.3.1. Comparison of Accuracy and Cost Among Different Methods

te

We compare TFDK with other state-of-the-art methods, including Multiclass SVM (MSVM) with RBF kernel, Decision Tree (DT), Neural Network (NN), Ada Boost (AB), Quadratic Discriminant Analysis (QDA), and Logistic

Ac ce p

355

Regression (LR). Figure 15 shows the classification accuracy as a function of training sample size for all the methods. In order to demonstrate the effectiveness of de-trending and de-noising, two sets of experiments were conducted with (Figure 15 (a)) and without (Figure 15(b)) the proposed pre-processing

360

technique. Similarly the results of testing cost as a function of training sample size for all the methods are shown in Figure 17. It is seen that TFDK outperforms all the other methods in terms of testing

accuracy and cost under different training configurations. More specifically, TFDK achieves 1.49% to 9.19% improvement in accuracy and 10.69% to 75%

365

decrease in testing cost compared to the runner-up method. The enhancement is more significant when the sample size is larger. Although it appears that the improvement is not obvious under small sample size (≤ 12), TFDK has 29

Page 29 of 42

(a) TFDK: small sample size Accuracy=69.64%, Cost=0.1604

(b) TFDK: large sample size Accuracy=99.12%, Cost=0.0175 1500

3

13

5

3

4

3

5

NM 400

0

0

0

0

0

0

0

1515 26

21

6

12

6

11

CF

0

1596

4

0

0

0

0

0

8

EO

4

22 1494 18

16

11

12

23

FWC

1

23

26 1507

7

13

12

11

FWE

5

25

31

21 1461 18

13

26

NC

5

13

24

12

4

1508 11

23

RL

1

6

17

5

7

8

1113 443

0

9

16

12

10

10

368 1175

RO

1000

EO

0

3

1596

0

1

0

0

0

FWC

0

0

1

1599

0

0

0

0

FWE

0

1

0

0

1599

0

0

0

NC

0

0

1

0

0

1599

0

0

RL

0

0

2

0

0

0

1590

8

0

2

1

0

1

0

1500

ip t

CF

1000

500

RO

500

10 1586

cr

NM 359

0

0

NM CF EO FWCFWE NC RL RO

(c) MSVM: small sample size Accuracy=68.08%, Cost=0.2090

(d) MSVM: large sample size Accuracy=89.19%, Cost=0.0700

us

NM CF EO FWCFWE NC RL RO

1500

NM 351

43

5

CF

1

1174 211

0

0

0

0

1

5

35

3

139

32

EO

99

125 1131 10

12

166

41

16

FWC

0

68

53 1265 27

84

16

87

FWE

0

4

58

39 1440

0

56

3

NC

3

6

3

0

75 1122 334

57

RL

0

72

20

1

20

41 1197 249

0

228

13

1

0

8

1500

NM 390 CF

1

1

9

1518 27

0

0

0

0

0

11

12

19

0

EO

4

83 1472

0

11

6

24

FWC

0

74

85 1285 80

13

16

47

FWE

0

1

84

2

1

2

NC

0

5

0

19

87 1397 40

52

RL

0

30

4

114

57

181 923 291

1

157

4

22

0

65

4

1506

an

1000

RO

0

12

500

RO

305 1045 0

500

33 1318 0

NM CF EO FWCFWE NC RL RO

M

NM CF EO FWCFWE NC RL RO

1000

Figure 19: Confusion matrix of TFDK and MSVM among fault types under small training sample size and large training sample respectively. In (a) and (c), both TFDK and MSVM

d

are trained with small training sample size, and they generate similar classification accuracy, 69.64% and 68.08%. However, TFDK presents very little misclassification among fault types.

te

In (b) and (d), TFDK and MSM are trained with relatively large training sample size. TFDK presents very high classification accuracy, while MSVM still presents obvious misclassification.

Ac ce p

extra advantage of being robust to inter fault type misclassification, as will be revealed later with confusion matrix.

370

As expected the testing accuracy/cost increases/decreases accordingly with

the increment of training samples. For instance, the testing accuracy of TFDK has boosted from 69.64% (6 training samples) to 99.12% (180 training samples); similar trends can be observed for the other methods, which reaffirms the intuition that accumulating more training data is beneficial to data-driven

375

FDD.

Comparing the two sub-plots (a) and (b) of Figure 15 and Figure 17, we view that those methods with pre-processed data present better results in general. Specifically, we look into the case when the training sample size is 30, and compare the testing accuracy and cost for different methods in Figure 16 and

30

Page 30 of 42

(a) Severity Level Recognition of EO Fault (TFDK Accuracy=69.64%) 350

EO_1

2

2

374

3

2

4

4

2

2

2

3 300

0

9

0

356

11

1

6

5

3

3

6 200

EO_3

1

8

1

13

354

0

3

5

2

5

8

EO_4

1

3

4

5

3

363

5

4

4

2

6

NM

CF

EO_1

EO_2

EO_3

EO_4

FWC

FWE

NC

RL

RO

150 100

cr

50

ip t

250

EO_2

0

us

(b) Severity Level Recognition of EO Fault (MSVM Accuracy=68.08%)

350

EO_1

1

66

258

63

1

0

0

2

0

9

0

300 250

10

30

213

1

1

EO_3

8

49

0

30

226

25

EO_4

0

0

0

19

0

264

NM

CF

EO_1

EO_2

EO_3

EO_4

6

9

0

27

13

an

90

200

4

0

50

5

3

0

1

116

0

0

FWC

FWE

NC

RL

RO

M

EO_2

150 100 50 0

Figure 20: Confusion matrix of TFDK and MSVM for the severity levels of the EO fault under small training sample size. To inspect the severity level identification rates of EO fault under

d

small training sample size, (a) shows that most of TFDK’s misclassification occurs among its

te

four severity levels; while (b) shows that MSVM presents misclassification to both its four severity levels and other fault types.

Figure 18, respectively. It is seen that the proposed pre-processing techniques

Ac ce p

380

greatly improve the performance of several methods such as TFDK, MSVM and AB.

4.3.2. Advantages of Incorporating Fault Dependence Tree To further investigate the benefit of including the prior knowledge of fault

385

dependence, we compare detailed classification results for TFDK and MSVM. The comparative results are able to reflect the effect of tree-structured fault dependence information because TFDK can be viewed as a hierarchical variation of the traditional large margin SVM. Figure 19 (a) and (c) are the confusion matrixes of MSVM and TFDK respectively when the training sample size is 6,

390

which is the smallest training sample size in our test; and Figure 19 (b) and (d)

31

Page 31 of 42

are the confusion matrixes for MSVM and TFDK respectively under the largest training sample size of our test, which is 180.

ip t

As mentioned earlier, in the case of small training sample size, TFDK does

not bear notable improvement in accuracy compared to MSVM. However, close

scrutiny of Figure 19 (a) vs. (c) and Figure 20 (a) vs. (b) reveals that TFDK

cr

395

presents much lower misclassification rate among fault types. In Figure 20 (a)

us

and (b), we show the detailed prediction assignment for EO fault by TFDK and MSVM. Indeed, the errors of TFDK mainly occur among severity levels while the correct fault types have already been assigned (Figure 20 (a)). On the other hand, quite a few errors committed by MSVM occur among different fault types

an

400

(Figure 20 (b)).

In the case of larger training sample size, the classification accuracy of

M

MSVM is 89.19% which appears relatively high from the FDD perspective, nevertheless Figure 19 (d) presents that MSVM still generates significant mis405

classification rate among fault types under the large training sample size situa-

d

tion. Among all the methods, when the training sample size is 180 the proposed

te

TFDK behaves with extremely high classification accuracy (99.12%) and very

Ac ce p

low misclassification cost, which is shown in Figure 19 (b).

5. Conclusions and Future Work

In this paper, we have proposed a novel data-driven FDD method and developed a corresponding on-line learning algorithm for streaming data. The integration of fault dependence information and the task of severity level detection are considered for the first time in this work. Instead of using traditional classification methods, which give each category a plain label and ignore the relationships among different faults, we derive a hierarchical kernel learning method that assigns tree-structured labels to the faults. Specifically, we encode the fault dependence information as a "tree" and describe the severity levels as child nodes of each fault type rather than treating them as independent classes. With that, the prior knowledge of the system and the task of identifying fault severity levels are treated in a unified framework.

We have formulated the tree-structured learning method to diagnose typical faults of the building cooling system. In future work, this method will be applied to identify faults in more building sub-systems, for example, to monitor the performance of the whole building HVAC system and to recognize all the typical faults in the cooling, AHU, and VAV sub-systems with a single classifier. In addition, beyond utilizing expert knowledge to build the tree-structured relationships among faults, we will also explore hidden information that cannot be directly described by the physical structure of the building system. We intend to introduce random forest or fixed-point models to capture this hidden information, and to combine it with expert knowledge to capture a more refined structure of common faults.
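As a concrete, purely illustrative view of the tree-structured labels described above, the sketch below stores severity levels as child nodes of each fault type and derives a simple tree-induced distance between labels. The fault names other than EO, the number of severity levels, and the distance function are assumptions made for illustration; they are not the exact Delta(y_i, y) used in this paper.

```python
# Minimal sketch of a tree-structured label space: the root branches into fault
# types, and each fault type has its severity levels as child nodes.
# Fault names other than "EO" and the 4-level depth are illustrative placeholders.

FAULT_TREE = {
    "normal": [],
    "EO": ["EO_1", "EO_2", "EO_3", "EO_4"],
    "fault_B": ["fault_B_1", "fault_B_2", "fault_B_3", "fault_B_4"],
}

def path_to_root(label):
    """Return the ancestor path of a label, e.g. 'EO_2' -> ['root', 'EO', 'EO_2']."""
    for fault, levels in FAULT_TREE.items():
        if label == fault or label in levels:
            return ["root", fault] + ([label] if label in levels else [])
    raise ValueError(f"unknown label: {label}")

def tree_distance(a, b):
    """A simple tree-induced loss: 0 if identical, small if the two labels share
    the same fault type, larger if the fault types differ."""
    pa, pb = path_to_root(a), path_to_root(b)
    shared = sum(1 for x, y in zip(pa, pb) if x == y)
    return (len(pa) + len(pb) - 2 * shared) / 2.0

print(tree_distance("EO_2", "EO_3"))       # same fault type, different severity -> small loss
print(tree_distance("EO_2", "fault_B_1"))  # different fault types -> larger loss
```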

6. Appendix

6.1. Convergence Argument of the On-line Update Algorithm

6.1.1. Step One

Our notation in this paper follows the large margin formulation in [54]. Interested readers are referred to [59, 60] for more background information. To prove that sufficient improvement can be obtained for the objective function Eq. (24) in each iteration, firstly consider the dual formulation in Eq. (22) as

$$J(\alpha) = -\tfrac{1}{2}\,\alpha^{T}K\alpha + n^{T}\alpha \qquad (30)$$

Define $\beta$ as the update step size and $\tau$ as the update direction. We have

$$\delta J(\beta) \triangleq J(\alpha + \beta\tau) - J(\alpha) = -\beta\,\alpha^{T}K\tau - \tfrac{1}{2}\beta^{2}\,\tau^{T}K\tau + \beta\, n^{T}\tau \qquad (31)$$

Thus, denoting $\langle \nabla J(\alpha), \tau\rangle = n^{T}\tau - \alpha^{T}K\tau$ and setting the derivative to zero,

$$\frac{\partial\,\delta J(\beta)}{\partial \beta} = -\beta\,\tau^{T}K\tau - \left(\alpha^{T}K\tau - n^{T}\tau\right) = 0 \;\Rightarrow\; \beta^{*} = \frac{n^{T}\tau - \alpha^{T}K\tau}{\tau^{T}K\tau} = \frac{\langle\nabla J(\alpha), \tau\rangle}{\tau^{T}K\tau} \qquad (32)$$

Now substitute $\beta^{*}$ into $\delta J(\beta)$:

$$\delta J(\beta^{*}) = \frac{1}{2}\cdot\frac{\left(\langle\nabla J(\alpha), \tau\rangle\right)^{2}}{\tau^{T}K\tau} = \frac{1}{2}\cdot\frac{D_{\alpha\tau}^{2}}{\tau^{T}K\tau} \qquad \left(D_{\alpha\tau} \triangleq \langle\nabla J(\alpha), \tau\rangle\right) \qquad (33)$$

Since $\beta$ is restricted to the bounded interval $0 \le \beta \le B$:

(I) If $\beta^{*} \le B$, then

$$\delta J(\beta^{*}) = \frac{1}{2}\cdot\frac{D_{\alpha\tau}^{2}}{\tau^{T}K\tau}; \qquad (34)$$

(II) If $\beta^{*} \ge B$, the best feasible step is $\beta = B$ (since $\delta J$ is a concave quadratic in $\beta$), and

$$\delta J(B) = B\left(n^{T}\tau - \alpha^{T}K\tau\right) - \tfrac{1}{2}B^{2}\,\tau^{T}K\tau = BD_{\alpha\tau} - \frac{B^{2}}{2}\,\tau^{T}K\tau \qquad (35)$$

Note that $\tau^{T}K\tau > 0$ and $B \le \beta^{*} = \frac{D_{\alpha\tau}}{\tau^{T}K\tau}$, so

$$\delta J(B) \ge B\left(D_{\alpha\tau} - \frac{1}{2}\cdot\frac{D_{\alpha\tau}}{\tau^{T}K\tau}\cdot\tau^{T}K\tau\right) = \frac{1}{2}BD_{\alpha\tau} \qquad (36)$$

Hence, from (I) and (II), we get

$$\max_{0\le\beta\le B}\delta J(\beta) \ge \frac{1}{2}\min\left\{\frac{D_{\alpha\tau}^{2}}{\tau^{T}K\tau},\; BD_{\alpha\tau}\right\} = \frac{D_{\alpha\tau}}{2}\min\left\{\frac{D_{\alpha\tau}}{\tau^{T}K\tau},\; B\right\} \qquad (37)$$
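The bounded line search of Step One can also be checked numerically. The sketch below is purely illustrative and assumes synthetic data (a random positive-definite K, arbitrary alpha, tau, and bound B); it is not part of the paper's implementation. It computes the unconstrained optimal step of Eq. (32), clips it to [0, B], and verifies that the attained improvement is no smaller than the bound in Eq. (37).

```python
# Numeric sketch of Step One under assumed data: optimal step beta* of Eq. (32),
# clipped to [0, B], compared against the lower bound of Eq. (37).

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = A @ A.T + 1e-3 * np.eye(5)        # a positive-definite kernel matrix (assumption)
n_vec = np.ones(5)
alpha = 0.1 * rng.random(5)
tau = rng.standard_normal(5)
B = 0.05                              # assumed upper bound on the step size

def J(a):
    """Dual objective J(alpha) = -0.5 * a^T K a + n^T a, as in Eq. (30)."""
    return -0.5 * a @ K @ a + n_vec @ a

D = n_vec @ tau - alpha @ K @ tau     # D_{alpha,tau} = <grad J(alpha), tau>
beta_star = D / (tau @ K @ tau)       # Eq. (32), unconstrained maximizer
beta = float(np.clip(beta_star, 0.0, B))

improvement = J(alpha + beta * tau) - J(alpha)
lower_bound = 0.5 * min(D**2 / (tau @ K @ tau), B * D) if D > 0 else 0.0
print(f"beta*={beta_star:.4f}, clipped beta={beta:.4f}")
print(f"improvement={improvement:.6f} >= bound={lower_bound:.6f}")
```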

6.1.2. Step Two

At each step, assume $(x_i, y_i)$ is newly added. Optimize $\alpha_{iy}$ in Eq. (24) with the upper bound

$$\alpha_{iy} \le \Delta(y_i, y)\,\frac{C}{n} \triangleq B \qquad (38)$$

Consider the dual formulation in Eq. (22). It is easy to see that

$$\frac{\partial L(\alpha)}{\partial \alpha_{iy}} = 1 - \sum_{j,y'}\alpha_{jy'}K_{(i,y)(j,y')} = 1 - \langle w, \delta\Phi_i(y)\rangle \qquad (39)$$

Since $H(y) = \left(1 - \langle w, \delta\Phi_i(y)\rangle\right)\Delta(y_i, y)$ and $H(y^{*}) \ge \xi_i + \varepsilon$, then

$$\frac{\partial L(\alpha)}{\partial \alpha_{iy}} \ge \frac{\xi_i + \varepsilon}{\Delta(y_i, y)} \ge \frac{\varepsilon}{\Delta(y_i, y)} \qquad \left(\Delta(y_i, y) > 0,\ \xi_i \ge 0\right) \qquad (40)$$

Taking the update direction $\tau$ as the unit vector along the coordinate $\alpha_{iy}$, we can derive

$$D_{\alpha\tau} = n^{T}\tau - \alpha^{T}K\tau = 1 - \sum_{j,y'}\alpha_{jy'}K_{(i,y)(j,y')} = \frac{\partial L(\alpha)}{\partial \alpha_{iy}} \qquad (41)$$

Substituting Eq. (38) and Eq. (41) into the result of Step One (Eq. (37)), and writing $K$ for an upper bound on $\tau^{T}K\tau = K_{(i,y)(i,y)}$, we get

$$\delta L(\beta) \ge \frac{1}{2}\min\left\{\frac{1}{K}\cdot\frac{\partial L(\alpha)}{\partial \alpha_{iy}},\; \frac{C}{n}\,\Delta(y_i, y)\right\}\cdot\frac{\partial L(\alpha)}{\partial \alpha_{iy}} \qquad (42)$$

Due to Eq. (40),

$$\delta L(\beta) \ge \frac{1}{2}\min\left\{\frac{1}{K}\cdot\frac{\varepsilon}{\Delta(y_i, y)},\; \frac{C}{n}\,\Delta(y_i, y)\right\}\cdot\frac{\varepsilon}{\Delta(y_i, y)} = \frac{1}{2}\min\left\{\frac{\varepsilon^{2}}{K\left[\Delta(y_i, y)\right]^{2}},\; \frac{C\varepsilon}{n}\right\} \qquad (43)$$

If $(x_i, y_i)$ is already in the active set, the search direction $\tau$ can be tuned and, with a similar argument, we obtain

$$\delta L(\beta) \ge \frac{1}{2}\min\left\{\frac{\varepsilon^{2}}{4K\left[\Delta(y_i, y)\right]^{2}},\; \frac{C\varepsilon}{n}\right\} \qquad (44)$$

6.1.3. Step Three

By denoting Eq. (24) as $L(\alpha)$ and Eq. (14) as $P(w)$, based on primal-dual theory we know that

$$L(\alpha) \le \min_{w} P(w) \qquad (45)$$

Let $w = 0$; according to Eq. (15), feasibility then requires $\xi_i \ge \Delta(y_i, y)$ for every $y$, so taking $\xi_i = \max_{y}\Delta(y_i, y)$ gives a feasible primal point and

$$\min_{w} P(w) \le \frac{1}{2}\|0\|^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_i = \frac{C}{n}\sum_{i=1}^{n}\max_{y}\Delta(y_i, y) \qquad (46)$$

Therefore

$$L(\alpha) \le \min_{w} P(w) \le \frac{C}{n}\sum_{i=1}^{n}\max_{y}\Delta(y_i, y) \le C\cdot\max_{i,y}\Delta(y_i, y) = C\cdot\Delta_{\max} \qquad (47)$$

Hence the total improvement of $L(\alpha)$ is at most $C\cdot\Delta_{\max}$, while each step improves it by at least $\frac{1}{2}\min\left\{\frac{\varepsilon^{2}}{4K\Delta_{\max}^{2}},\; \frac{C\varepsilon}{n}\right\}$, as depicted in Eq. (44). We can therefore conclude that the algorithm converges within the following number of steps:

$$\frac{C\cdot\Delta_{\max}}{\frac{1}{2}\min\left\{\frac{\varepsilon^{2}}{4K\Delta_{\max}^{2}},\; \frac{C\varepsilon}{n}\right\}} = 2\max\left\{\frac{4CK\Delta_{\max}^{3}}{\varepsilon^{2}},\; \frac{n\Delta_{\max}}{\varepsilon}\right\} \qquad (48)$$
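To get a feel for how the bound in Eq. (48) scales, the following sketch simply evaluates it numerically; the parameter values are arbitrary assumptions chosen for illustration and are not taken from the paper's experiments.

```python
# Illustrative evaluation of the convergence bound in Eq. (48); all parameter
# values below are assumptions, not values reported in the paper.

def iteration_bound(C, K, delta_max, n, eps):
    """Upper bound on the number of update steps: the total possible improvement
    C*delta_max divided by the per-step improvement of Eq. (44)."""
    per_step = 0.5 * min(eps**2 / (4 * K * delta_max**2), C * eps / n)
    bound = C * delta_max / per_step
    # Equivalently, 2 * max{4*C*K*delta_max^3 / eps^2, n*delta_max / eps}
    closed_form = 2 * max(4 * C * K * delta_max**3 / eps**2, n * delta_max / eps)
    assert abs(bound - closed_form) < 1e-6 * closed_form
    return bound

print(iteration_bound(C=10.0, K=1.0, delta_max=2.0, n=180, eps=0.1))
```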


References

[1] G. Mantovani, L. Ferrarini, Temperature control of a commercial building with model predictive control techniques, IEEE Transactions on Industrial Electronics 62 (4) (2015) 2651–2660.

[2] J. Yao, G. T. Costanzo, G. Zhu, B. Wen, Power admission control with predictive thermal management in smart buildings, IEEE Transactions on Industrial Electronics 62 (4) (2015) 2642–2650.

[3] A. Schumann, J. Hayes, P. Pompey, O. Verscheure, Adaptable fault identification for smart buildings, in: Artificial Intelligence and Smarter Living, AAAI Workshop, 2011.

[4] ASHRAE, HVAC Applications, ASHRAE Handbook, Fundamentals.

[5] H. Dibowski, J. Ploennigs, K. Kabitzsch, Automated design of building automation systems, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3606–3613.

[6] T. Novak, A. Gerstinger, Safety- and security-critical services in building automation and control systems, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3614–3621.

[7] M. Comstock, J. Braun, E. Groll, The sensitivity of chiller performance to common faults, HVAC&R Research 7 (3) (2001) 263–279.

[8] S. Wang, J. Cui, A robust fault detection and diagnosis strategy for centrifugal chillers, HVAC&R Research 12 (3) (2006) 407–428.

[9] S. Katipamula, M. R. Brambley, Review article: Methods for fault detection, diagnostics, and prognostics for building systems - a review, part II, HVAC&R Research 11 (2) (2005) 169–187.

[10] X. Dai, Z. Gao, From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis, IEEE Transactions on Industrial Informatics 9 (4) (2013) 2226–2238.

[11] Y. Yu, D. Woradechjumroen, D. Yu, A review of fault detection and diagnosis methodologies on air-handling units, Energy and Buildings 82 (2014) 550–562.

[12] Z. Gao, C. Cecati, S. X. Ding, A survey of fault diagnosis and fault-tolerant techniques - part I: Fault diagnosis with model-based and signal-based approaches, IEEE Transactions on Industrial Electronics 62 (6) (2015) 3757–3767.

[13] Z. Gao, C. Cecati, S. X. Ding, A survey of fault diagnosis and fault-tolerant techniques - part II: Fault diagnosis with knowledge-based and hybrid/active approaches, IEEE Transactions on Industrial Electronics.

[14] D. J. Cook, S. K. Das, How smart are our environments? An updated look at the state of the art, Pervasive and Mobile Computing 3 (2) (2007) 53–73.

[15] A. Purarjomandlangrudi, A. H. Ghapanchi, M. Esmalifalak, A data mining approach for fault diagnosis: An application of anomaly detection algorithm, Measurement 55 (2014) 343–352.

[16] S. Wu, J. Sun, A top-down strategy with temporal and spatial partition for fault detection and diagnosis of building HVAC systems, Energy and Buildings 43 (9) (2011) 2134–2139.

[17] Y. Hu, H. Chen, J. Xie, X. Yang, C. Zhou, Chiller sensor fault detection using a self-adaptive principal component analysis method, Energy and Buildings 54 (2012) 252–258.

[18] S. Li, J. Wen, A model-based fault detection and diagnostic methodology based on PCA method and wavelet transform, Energy and Buildings 68 (2014) 63–71.

[19] B. Sun, P. B. Luh, Q.-S. Jia, Z. O'Neill, F. Song, Building energy doctors: An SPC and Kalman filter-based method for system-level fault detection in HVAC systems, IEEE Transactions on Automation Science and Engineering 11 (1) (2014) 215–229.

[20] B. Sun, P. B. Luh, Z. O'Neill, F. Song, Building energy doctors: SPC and Kalman filter-based fault detection, in: IEEE Conference on Automation Science and Engineering (CASE), IEEE, 2011, pp. 333–340.

[21] H. Wang, Y. Chen, C. W. Chan, J. Qin, An online fault diagnosis tool of VAV terminals for building management and control systems, Automation in Construction 22 (2012) 203–211.

[22] G. Mustafaraj, J. Chen, G. Lowry, Development of room temperature and relative humidity linear parametric models for an open office using BMS data, Energy and Buildings 42 (2010) 348–356.

[23] D. J. Hill, B. S. Minsker, E. Amir, Real-time Bayesian anomaly detection for environmental sensor data, in: Proceedings of the Congress-International Association for Hydraulic Research, Vol. 32, Citeseer, 2007, p. 503.

[24] Y. Zhao, F. Xiao, S. Wang, An intelligent chiller fault detection and diagnosis methodology using Bayesian belief network, Energy and Buildings 57 (2013) 278–288.

[25] F. Xiao, Y. Zhao, J. Wen, S. Wang, Bayesian network based FDD strategy for variable air volume terminals, Automation in Construction 41 (2014) 106–118.

[26] B. Fan, Z. Du, X. Jin, X. Yang, Y. Guo, A hybrid FDD strategy for local system of AHU based on artificial neural network and wavelet analysis, Building and Environment 45 (12) (2010) 2698–2708.

[27] Y. Zhu, X. Jin, Z. Du, Fault diagnosis for sensors in air handling unit based on neural network pre-processed by wavelet and fractal, Energy and Buildings 44 (2012) 7–16.

[28] Z. Du, B. Fan, X. Jin, J. Chi, Fault detection and diagnosis for buildings and HVAC systems using combined neural networks and subtractive clustering analysis, Building and Environment 73 (2014) 1–11.

[29] Z. Du, X. Jin, Multiple faults diagnosis for sensors in air handling unit using Fisher discriminant analysis, Energy Conversion and Management 49 (12) (2008) 3654–3665.

[30] P. Jaikumar, A. Gacic, B. Andrews, M. Dambier, Detection of anomalous events from unlabeled sensor data in smart building environments, in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2011, pp. 2268–2271.

[31] Y. Zhao, S. Wang, F. Xiao, Pattern recognition-based chillers fault detection method using support vector data description (SVDD), Applied Energy 112 (2013) 1041–1048.

[32] Y. Zhao, F. Xiao, J. Wen, Y. Lu, S. Wang, A robust pattern recognition-based fault detection and diagnosis (FDD) method for chillers, HVAC&R Research 20 (7) (2014) 798–809.

[33] J. Liang, R. Du, Model-based fault detection and diagnosis of HVAC systems using support vector machine method, International Journal of Refrigeration 30 (6) (2007) 1104–1114.

[34] H. Han, Z. Cao, B. Gu, N. Ren, PCA-SVM-based automated fault detection and diagnosis (AFDD) for vapor-compression refrigeration systems, HVAC&R Research 16 (3) (2010) 295–313.

[35] K.-Y. Chen, L.-S. Chen, M.-C. Chen, C.-L. Lee, Using SVM based method for equipment fault detection in a thermal power plant, Computers in Industry 62 (1) (2011) 42–50.

[36] K. Yan, W. Shen, T. Mulumba, A. Afshari, ARX model based fault detection and diagnosis for chillers using support vector machines, Energy and Buildings 81 (2014) 287–295.

[37] T. Mulumba, A. Afshari, K. Yan, W. Shen, L. K. Norford, Robust model-based fault diagnosis for air handling units, Energy and Buildings 86 (2015) 698–707.

[38] D. Dietrich, D. Bruckner, G. Zucker, P. Palensky, Communication and computation in buildings: A short introduction and overview, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3577–3584.

[39] S. Yin, S. X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Transactions on Industrial Electronics 61 (11) (2014) 6418–6428.

[40] Y. Keigo, I. Minoru, Y. Takehisa, M. Kazuo, S. Masaki, M. Yoshio, Identification of causal variables for building energy fault detection by semi-supervised LDA and decision boundary analysis, in: IEEE International Conference on Data Mining Workshops (ICDMW'08), IEEE, 2008, pp. 164–173.

[41] I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun, Support vector machine learning for interdependent and structured output spaces, in: Proceedings of the Twenty-First International Conference on Machine Learning, ACM, 2004, p. 104.

[42] S. Dumais, H. Chen, Hierarchical classification of web content, in: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2000, pp. 256–263.

[43] L. K. Norford, J. A. Wright, R. A. Buswell, D. Luo, C. J. Klaassen, A. Suby, Demonstration of fault detection and diagnosis methods for air-handling units, HVAC&R Research 8 (1) (2002) 41–71.

[44] M. Comstock, J. Braun, Fault detection and diagnostic (FDD) requirements and evaluation tools for chillers, West Lafayette, IN: ASHRAE.

[45] S. Li, J. Wen, X. Zhou, C. J. Klaassen, Development and validation of a dynamic air handling unit model, part 1 (RP-1312), ASHRAE Transactions 116 (1) (2010) 45.

[46] S. Li, J. Wen, X. Zhou, C. J. Klaassen, Development and validation of a dynamic air handling unit model, part 2 (RP-1312), ASHRAE Transactions 116 (1) (2010) 57.

[47] S. Wang, J. Cui, Sensor-fault detection, diagnosis and estimation for centrifugal chiller systems using principal-component analysis method, Applied Energy 82 (3) (2005) 197–213.

[48] Y. Jia, Model-based generic approaches for automated fault detection, diagnosis, evaluation (FDDE) and for accurate control of field-operated centrifugal chillers, Ph.D. thesis (2002).

[49] X. Li, C. P. Bowers, T. Schnier, Classification of energy consumption in buildings with outlier detection, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3639–3644.

[50] H. Xie, L. E. Pierce, F. T. Ulaby, SAR speckle reduction using wavelet denoising and Markov random field modeling, IEEE Transactions on Geoscience and Remote Sensing 40 (10) (2002) 2196–2212.

[51] J. M. Cimbala, Modified Thompson tau used for determination of outliers, Penn State University.

[52] D. R. Cox, D. V. Hinkley, Theoretical Statistics, CRC Press, 1979.

[53] L. Cai, T. Hofmann, Hierarchical document categorization with support vector machines, in: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, ACM, 2004, pp. 78–87.

[54] K. Crammer, Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines, The Journal of Machine Learning Research 2 (2002) 265–292.

[55] K. Wang, S. Zhou, S. C. Liew, Building hierarchical classifiers using class proximity, in: Proceedings of the 25th International Conference on Very Large Data Bases (VLDB-99).

[56] R. Bellman, Dynamic programming and Lagrange multipliers, Proceedings of the National Academy of Sciences 42 (10) (1956) 767–769.

[57] J. C. Platt, Using analytic QP and sparseness to speed training of support vector machines, Advances in Neural Information Processing Systems (1999) 557–563.

[58] S. Fine, K. Scheinberg, Efficient SVM training using low-rank kernel representations, The Journal of Machine Learning Research 2 (2002) 243–264.

[59] C. J. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2).

[60] Y. Zhou, J. Y. Baek, D. Li, C. J. Spanos, Optimal Training and Efficient Model Selection for Parameterized Large Margin Learning, Springer, 2016, pp. 52–64.