Accepted Manuscript
Title: Fault Detection and Diagnosis for Building Cooling System With A Tree-structured Learning Method
Authors: Dan Li, Yunxun Zhou, Guoqiang Hu, Costas J. Spanos
PII: S0378-7788(16)30506-0
DOI: http://dx.doi.org/doi:10.1016/j.enbuild.2016.06.017
Reference: ENB 6752
To appear in: ENB
Received date: 6-1-2016
Revised date: 4-6-2016
Accepted date: 6-6-2016
Please cite this article as: Dan Li, Yunxun Zhou, Guoqiang Hu, Costas J. Spanos, Fault Detection and Diagnosis for Building Cooling System With A Tree-structured Learning Method, (2016), http://dx.doi.org/10.1016/j.enbuild.2016.06.017 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Fault Detection and Diagnosis for Building Cooling System With A Tree-structured Learning Method
Dan Li (a), Yunxun Zhou (b), Guoqiang Hu (a), Costas J. Spanos (b)
(a) School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.
(b) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.
Abstract
In order to save energy and improve the performance of building environment regulation, there is an increasing need for fault detection and diagnosis (FDD). This paper investigates the effectiveness of a tree-structured learning method for FDD of building cooling systems. Researchers have been tackling the building FDD task with a wide variety of techniques, such as analytical model-based, signal-based and knowledge-based methods. Recently, data-driven methods have shown their advantage in dealing with complex systems with random penetrations. Existing work on data-driven FDD merely formulates the task as a pure fault type classification problem, whereas fault severity levels and their inter-dependence have long been ignored. We propose a novel data-driven strategy that adopts structured labelling to include the dependence information and describe the severity levels in a large margin learning framework. A Tree-structured Fault Dependence Kernel (TFDK) method is derived and a corresponding on-line learning algorithm is developed for streaming data. As an improvement on traditional classification methods (e.g. SVM), TFDK encodes tree-structured fault dependence in its feature mapping, and takes the regularized misclassification cost as its learning objective. Following the ASHRAE Research Project 1043 (RP-1043), the strategy is applied to the FDD of a 90-ton centrifugal water-cooled chiller. Experimental results show that, compared to previous data-driven methods, TFDK can greatly improve the FDD performance as well as recognize the fault severity levels with high accuracy.

* This research is funded by the Republic of Singapore's National Research Foundation under its Campus for Research Excellence and Technological Enterprise (CREATE) programme through a grant to the Berkeley Education Alliance for Research in Singapore (BEARS) for the Singapore-Berkeley Building Efficiency and Sustainability in the Tropics (SinBerBEST) Program. BEARS has been established by the University of California, Berkeley as a center for intellectual excellence in research and education in Singapore.

Preprint submitted to Journal of LaTeX Templates, June 7, 2016
Keywords: Fault Detection and Diagnosis (FDD), Building Cooling System, Data-driven Method, Pattern Classification, Machine Learning Method.
1. Introduction
Building energy consumption contributes to more than 40% of the total energy usage worldwide [1, 2]. Almost 32% of the total energy consumption in industrialized countries is used by heating, ventilation, and air-conditioning (HVAC) systems [3]. The newly published ASHRAE Handbook has put special emphasis on automated fault detection and diagnosis (FDD) for smart building systems. In particular, the new standard highlights the necessity of maintaining the whole building system in good working condition through FDD techniques, as well as the significance of saving energy and improving occupant comfort and building safety via automated FDD systems [4]. Therefore, there is an increasing need to study automated fault identification in buildings, with the aim of saving energy and offering a more comfortable and safe dwelling environment [5, 6]. In the past decades, researchers have spared no effort to develop algorithms and strategies that can detect and diagnose HVAC faults to prevent unnecessary economic losses and maintain the system's working efficiency [7, 8].

In the literature, miscellaneous FDD methods have been proposed, mainly including three techniques and their combinations: analytical model-based, signal-based and knowledge-based methods [9, 10, 11, 12, 13]. The model-based method relies on an explicit description of the system. Despite significant theoretical advancement made in this direction, few of the solutions can be directly inserted into the Building Management System (BMS) to conduct real time
Figure 1: Data-driven building FDD scheme, including deployed sensor network, database management, and a decision support system.
monitoring [10, 12]. The signal-based FDD method investigates the correlation between faults and system output signals, and improved performance can be achieved by adding the signal pattern of the healthy status as a prior [10, 12]. The knowledge-based FDD method discovers the underlying knowledge and system features that represent the information redundancy among the system's variables by learning from empirical data. For this reason, the knowledge-based method is commonly referred to as the data-driven method [10, 13]. The empirical data, which records outside environmental factors, internal loads, and mechanical system working conditions, is collected through a sensor network and stored in the BMS [14, 15]. Experts and researchers analyze the empirical data and give feedback to building operators if any fault is found. A common data-driven FDD system for smart buildings is depicted in Figure 1, including the deployed sensor network, database management, and a decision support system.

Recently, a wide range of statistical and machine learning techniques have
Figure 2: Schematic diagram of chiller components and refrigerant flow paths; a typical centrifugal chiller system consists of an evaporator, compressor, condenser, economizer, motor, pumps, fans, distribution pipes, etc.
been explored as data-driven methods in the building FDD field, including Principal Component Analysis (PCA) [16, 17, 18], Statistical Process Control (SPC) [19, 20, 21], Multivariate Regression Models [22], Bayes Classifier [23, 24, 25], Neural Networks (NN) [26, 27, 28], Fisher Discriminant Analysis (FDA) [29], Gaussian Mixture Model [30], Support Vector Data Description (SVDD) [31, 32], and Support Vector Machines (SVM) [33, 34, 35, 36, 37]. Among these approaches, PCA and SPC are unsupervised methods that do not require expert knowledge for fault labelling, whereas others such as NN and FDA are supervised multi-class classification methods that depend on the availability of labelled training data. Once the hypothesis/model is fitted in the training phase, new measurements are tested by the classifiers and assigned to the corresponding categories (normal or faulty) automatically. Although existing work on data-driven FDD has shown promising results in both detection accuracy and efficiency, two important issues, namely fault interdependence and
Table 1: Definitions of 24 essential variables in a typical cooling system (I)

Label   Description                                 Units
TEI     Temperature of entering evaporator water    F
TEO     Temperature of leaving evaporator water     F
TCI     Temperature of entering condenser water     F
TCO     Temperature of leaving condenser water      F
FWC     Condenser water flow rate                   gpm
kW      Compressor motor power consumption          kW
TEA     Evaporator approach temperature             F
FWE     Evaporator water flow rate                  gpm
TRE     Refrigerant temperature in evaporator       F
TCA     Condenser approach temperature              F
PRE     Pressure of refrigerant in evaporator       psig
TRC     Refrigerant temperature in condenser        F
severity levels, are often ignored or over-simplified with homogeneity assumptions [13, 38, 39].

First of all, although it is quite intuitive to build fault dependence by analysing the connections and structures of the components of an HVAC system, this prior knowledge is rarely considered in the current data-driven FDD literature. For example, Zhao proposed a chiller fault detection method based on Support Vector Data Description (SVDD), which is a one-class classification technique describing the support of the data distribution [31]. By training SVDD models for each fault type, they extended a similar idea to a chiller fault diagnosis strategy in [32]. Since training a one-class classification model for each specific fault type is computationally costly, an alternative is to formulate the FDD issue directly as a multi-class classification problem. To list a few, Du
Table 2: Definitions of 24 essential variables in a typical cooling system (II)

Label     Description                                    Units
PRC       Pressure of refrigerant in condenser           psig
T_suc     Refrigerant suction temperature                F
TRC_sub   Subcooling temperature                         F
TR_dis    Refrigerant discharge temperature              F
Tsh_dis   Refrigerant discharge superheat temperature    F
P_lift    Pressure lift across compressor                F
Tsh_suc   Refrigerant suction superheat temperature      F
TO_sump   Temperature of oil in sump                     F
PO_feed   Pressure of oil feed                           F
TWCD      Condenser temperature                          F
TWED      Evaporator temperature                         F
TO_feed   Temperature of oil feed                        F
proposed to utilize Fisher Discriminant Analysis (FDA) and Principal Component Analysis (PCA) to diagnose multiple sensor faults in AHU [29]. Keigo employed semi-supervised FDA to detect building energy faults, and adopted Decision Boundary Analysis (DBA) to discover the hidden relationship between the extracted features and the corresponding faults [40]. However, all of the aforementioned work is restricted to modelling each type of fault separately with single (flat) class labels and ignores valuable prior information on fault dependence, which could otherwise be exploited (fused) to improve the detection performance of the machine learning method [41]. Moreover, when dealing with complex building systems, the number of fault types (classes) is expected to be large, while usually only a small number of labelled data for each fault class is available. From a statistical learning perspective, adopting a flat multi-class learning method and ignoring prior information will result in loss of valuable information, thus leading to degraded performance [42].

Secondly, the presence of different fault severity is well acknowledged in experiments but has long been ignored for FDD purposes. In a real building cooling system, faults naturally appear at various levels of severity due to different system/component degradations [43, 44, 45, 46]. For instance, in the research on typical chiller faults, condenser fouling is a physical obstruction caused by the aggregation of non-decomposable chemical substances in the condenser tubes. It lowers the effective heat transfer coefficient and decreases the water flow rate in a manner consistent with the degree of aggregation. Hence the severity/degree of a fault provides researchers and system managers with valuable information to optimize maintenance actions, as well as to set priorities for different system scenarios. On the other hand, the advancement of sensor network technology has greatly improved the capability to monitor temperature, flow rate, pressure, etc. with a refined spatial-temporal granularity [44]. In short, detecting the severity level in a data-driven framework is not only favorable, but also doable. Until now no work has tried to identify how serious the identified fault is.

In this paper we emphasize the importance of incorporating the prior knowledge of fault dependence (derived from chiller system characteristics). The objective of this paper is to design a novel data-driven FDD method, so as to: (1) recognize faulty working conditions and identify the fault type, and (2) determine how serious the identified fault is. We propose a unified framework, namely the Tree-structured Fault Dependence Kernel (TFDK) method, to include the inter-class information and describe the fault severity levels. The tree-structured labels regard the severity levels as child nodes of each fault type rather than viewing them as independent classes. In addition, an on-line learning method is developed to train a multi-class classifier with streaming sensor measurement data. The effectiveness of TFDK has been evaluated on the experimental data of ASHRAE Research Project RP-1043, and the results show significant improvement over the state-of-the-art approaches.
Figure 3: Schematic of the cooling system test facility and sensors mounted in the related water circuits.
Compared with previous building FDD works, this paper makes the following contributions. (1) A TFDK method is derived to make use of the fault dependence information, thus achieving higher FDD accuracy compared with other methods; (2) An on-line learning method is developed to accommodate streaming data, which enables seamless integration into the BMS and sequential decision-making for HVAC scheduling; (3) Detailed information about the building performance is provided by identifying fault severity levels, hence providing researchers and building managers with more options for handling the faults.

In Section 2, we present the formulation of structured dependence information in the building cooling system. The derivation of TFDK, which is based on the structured building FDD formulation, is given in Section 3. Section 4 presents the FDD results obtained by TFDK and compares them with other multi-class classification methods. Section 5 summarizes the paper and suggests possible future work.
Figure 4: Raw data of five variables (TEO, TCI, kW, PRC, PO_feed) under normal condition collected by sensors mounted in the cooling system.
2. Structured Class Information in the Building Cooling System
2.1. Cooling System with Centrifugal Water-cooled Chiller
In tropical areas such as Singapore, cooling systems, especially those with centrifugal chillers, account for a large portion of the energy usage of HVAC systems [47]. As shown in Figure 2, a typical centrifugal chiller system consists of an evaporator, compressor, condenser, economizer, motor, pumps, fans, distribution pipes, etc. Figure 2 also depicts the chiller refrigerant flow paths. At the beginning of the refrigerant cycle, liquid refrigerant is distributed along the evaporator and sprayed at high pressure through small holes in a distributor to uniformly coat each evaporator tube. Here the liquid refrigerant absorbs enough heat from the chilled water circulating through the evaporator tubes to turn into refrigerant vapor. The chilled water is cooled down during this process. Then the gaseous refrigerant is drawn through the eliminators (which remove droplets of liquid refrigerant from the gas) and delivered into the impellers, where the gas is compressed. Once the compression is completed, the gas is discharged into the condenser, where baffles distribute the compressed refrigerant gas evenly across the condenser tube bundle. Cooling tower water which circulates through the condenser tubes absorbs heat from
Figure 5: Outliers of "pressure of oil feed (PO_feed)" are removed by the Thompson Tau method; the raw data is collected under 8 working conditions. [Panels: raw data and outlier-removed data; x-axis: number of data points; y-axis: PO_feed (F).]

Figure 6: Temperature of leaving evaporator water pre-processed by wavelet de-noising (level=5); the raw data is collected under 8 working conditions. [Panels: original data, de-noised data, residuals; x-axis: number of data points; y-axis: TEO (F).]
the refrigerant, thus turning the gaseous refrigerant into liquid. The liquid refrigerant then drains from the bottom of the condenser and passes through an expansion valve, where its pressure and temperature are reduced. Finally, the low-pressure mixture enters the evaporator and starts the next cycle.

In order to minimize the energy consumed by the cooling system, the entering
Figure 7: Temperature of leaving evaporator water pre-processed by wavelet de-noising (level=10); the raw data is collected under 8 working conditions. [Panels: original data, de-noised data, residuals; x-axis: number of data points; y-axis: TEO (F).]

Figure 8: Temperature of leaving evaporator water pre-processed by wavelet de-noising (level=15); the raw data is collected under 8 working conditions. It can be seen that periodic patterns can be removed when the wavelet decomposition level is relatively high. [Panels: original data, de-noised data, residuals; x-axis: number of data points; y-axis: TEO (F).]
condenser water temperature set-point should be as low as possible. At the same time, it should be at or above the lowest temperature attainable by the cooling tower at a given (wet-bulb) air temperature, to avoid wasting fan energy in pursuit of an unattainable (saturated) value. The chilled water supply (leaving evaporator water) temperature is maintained at its set-point by regulating the cooling coil inlet valve position. The valve motor opens or closes to tune the valve position under a feedback controller, which maintains the pre-set chilled water supply temperature based on the difference between the set-point and the measured temperature value [48].

There are a great number of sensors installed within cooling systems for the purpose of monitoring and control. In order to detect and diagnose typical faults, analysis of sensor measurements for the most essential variables of the cooling system is included in this study. In this work, we analyze 24 essential variables according to [31] and [47]. They are common parameters for controlling and monitoring cooling systems, as listed in Table 1 and Table 2.

2.2. Pre-processing Methods for Chiller Sensor Data
The cooling system studied in this paper is a typical centrifugal water-cooled chiller system with a motor-driven compressor. Following ASHRAE RP-1043, the raw sensor data is in the form of time series and presents periodic patterns, mostly due to the on/off (open/close) states of some components in the system. The raw data of five variables is shown in Figure 4, where periodic patterns and obvious outliers can be seen. Before formulating the tree-structured FDD of the cooling system, we propose to pre-process the raw data by removing periodic patterns and outliers, so as to avoid the non-negligible side effect of confusing patterns and outliers on data analytics [49].
2.2.1. Periodic Pattern Removing

In this study, wavelet-based de-noising is utilized to remove the confusing periodic patterns. The wavelet transform uses a family of functions obtained from a single orthonormal wavelet $\psi(t)$, called the mother wavelet or basic function, by scaling and shifting (translation). The wavelet series can be defined as

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{1}$$

where $a$ and $b$ represent the scale and translation parameters respectively. In the discrete case, where the wavelet transform can be used for de-noising, the scale and translation parameters are discretized as $a = 2^m$ and $b = n 2^m$. The dilated and translated version of the mother wavelet $\psi(t)$ can be written as:

$$\psi_{a,b}(t) = \psi_{m,n}(t) = 2^{-m/2}\,\psi\!\left(2^{-m}t - n\right) \tag{2}$$

where $m$ and $n$ denote the scale and translation parameters respectively. Given an original signal $f(t)$, its wavelet coefficients are obtained through the inner product operation:

$$W(a,b) = \int_{-\infty}^{\infty} \psi_{a,b}^{*}(t)\, f(t)\, dt \tag{3}$$

where $*$ is the complex conjugate symbol and $\psi$ is the basic function, which can be chosen according to the properties of the given function $f$. The choice of mother wavelet (e.g. Haar, Daubechies, Coiflets, Symlet, Biorthogonal, etc.) determines the final waveform shape, and in our case we choose Symlet as the basic function.
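As a rough illustration of the dyadic decomposition in Eq. (2), the sketch below runs a multi-level discrete wavelet transform, zeroes small detail coefficients, and inverts the transform. It is not the pre-processing code used in this work: it assumes the Haar wavelet instead of Symlet, a hand-picked hard threshold, and a signal length divisible by 2^levels, and all function names are hypothetical.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)  # Haar normalisation, the 2^(-m/2) factor per level


def haar_step(signal):
    """One DWT level: scaled pairwise sums (approximation) and differences (detail)."""
    approx = [(signal[k] + signal[k + 1]) * SQRT_HALF for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) * SQRT_HALF for k in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one DWT level, interleaving the two reconstructed samples of each pair."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out


def haar_denoise(signal, levels, threshold):
    """Decompose to `levels`, zero detail coefficients with |c| <= threshold, reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # Assumed filtering rule (hard threshold); the paper does not specify one.
    details = [[c if abs(c) > threshold else 0.0 for c in d] for d in details]
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx


# Hypothetical readings: a constant 5.0 with a small alternating ripple.
noisy = [5.1, 4.9] * 4
smooth = haar_denoise(noisy, levels=1, threshold=0.5)
exact = haar_denoise(noisy, levels=2, threshold=0.0)
```

With the threshold at 0.5 the ripple is removed and `smooth` is flat at about 5.0, whereas a zero threshold keeps every coefficient and `exact` reproduces the input.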
The essence of de-noising using the Discrete Wavelet Transform (DWT) is to reduce the noise in the wavelet transform domain [50]. Define the noisy observations as $W = [w_1, w_2, \dots, w_N]$, satisfying

$$W = f + \varepsilon \tag{4}$$

where $f = [f_1, f_2, \dots, f_N]$ is the desired noise-free signal, and $\varepsilon = [\varepsilon_1, \varepsilon_2, \dots, \varepsilon_N]$ is the observation noise. Firstly, we apply the DWT to the noisy signal to produce the noisy wavelet coefficients, down to the level at which we can properly distinguish the signal pattern. Then we apply the inverse wavelet transform to the filtered wavelet coefficients to obtain a de-noised signal. As shown in Figures 6-8, the raw data of leaving evaporator water temperature is pre-processed by wavelet-based de-noising with increasing levels of wavelet decomposition. We can see from those figures that the periodic patterns can be removed from the original raw data when the wavelet decomposition level is relatively high.

2.2.2. Outlier Removing

Sensor data for each variable is treated as a column vector, and obvious outliers can be detected by the Modified Thompson's Tau method [51], which is based on the absolute deviation of each record from the mean of the entire vector. The strength of this method lies in the fact that it takes into account a data set's standard deviation and average, and provides a statistically determined rejection zone, thus providing an objective method to determine whether a data point is an outlier. The rejection zone is given by

$$\tau = \frac{t_{\alpha/2}\,(n-1)}{\sqrt{n}\,\sqrt{n-2+t_{\alpha/2}^{2}}} \tag{5}$$

where $t_{\alpha/2}$ is the critical value from the Student's t distribution [52], and $n$ is the sample size. The absolute deviation of the data set is

$$\sigma = \left| \frac{X - \operatorname{mean}(X)}{S} \right| \tag{6}$$

where $S$ is the sample standard deviation. If $\sigma > \tau$, the data point is an outlier. As shown in Figure 5, the variable "pressure of oil feed (PO_feed)" is measured by the corresponding sensors under 8 conditions (the normal condition and 7 faulty conditions), and the time series of sensor measurements is smoother after outlier removal than the raw data.

2.3. Tree Structure Formulation
A large number of possible faults and failures have been identified by ASHRAE RP-1043, but not all of them are practical for further examination as part of the FDD scheme [44]. It is expected that the faults chosen for experimental testing can be detected and diagnosed by monitoring the thermodynamic states of the chiller. Based on how often a fault occurs and how much economic loss it causes, we pick 7 typical faults for this study:
• Condenser fouling (CF)
• Excess oil (EO)
• Reduced condenser water flow rate (FWC)
• Reduced evaporator water flow rate (FWE)
• Non-condensable in the refrigerant (NC)
• Refrigerant leak/undercharge (RL)
• Refrigerant overcharge (RO)

Figure 9: Seven typical faults and their locations in the cooling system. Faults 1 and 2 occur in the cooling tower water circuit; faults 3, 5, and 6 occur in the refrigerant circuit; fault 4 occurs in the compressor; and fault 7 occurs in the cooling coil water circuit. [Legend: 1. Condenser fouling; 2. Reduced condenser water flow rate; 3. Non-condensable in refrigerant; 4. Excess oil; 5. Refrigerant leakage; 6. Refrigerant overcharge; 7. Reduced evaporator water flow rate.]
This paper aims to distinguish the 7 faulty working conditions from the normal working condition for a typical cooling system, and also to recognize different fault severity levels based on the pre-defined severity level information. Unlike previous FDD methods, which assign plain labels to the faults and their severity levels and formulate the FDD task as a simple multi-class classification problem, we include the fault dependence information in the feature mapping and encode the fault types as well as their severity levels with tree-structured labels.

In the light of expert knowledge about the cooling system configuration, the chosen faults happen in different places within the cooling system, which leads to structured inter-class relationships. As shown in Figure 9, chilled water flows through the evaporator pipes and the cooling coil in the Air Handling Unit. Therefore the FWE fault, which occurs in the cooling coil water circuit, is only loosely related to other faults that occur in other components or
Figure 10: Chiller faults with tree labelling; gradient arrows represent severity levels under each fault type. [Node labels: Cooling System, Normal, FWE, EO, Refrigerant Fault (NC, RL, RO), Condenser Fault (CF, FWC).]

Figure 11: Structured labels as a tree for typical chiller faults and corresponding severity levels.
subsystems. Similarly, the EO fault, which happens in the compressor motor oil tank, is also only loosely related to other faults. The NC fault and the RL/RO faults are correlated since they all concern the refrigerant. The CF fault, which means the condenser pipes are partly blocked, and the FWC fault share the closest correlation because both are located in the cooling tower water circuit and primarily affect the condenser performance once they occur. On the basis of this prior expert knowledge, we can describe the relationship among the faults and their severity levels with a "tree", where the different fault types as well as the normal condition are described as the branch nodes (non-leaf nodes) and the severity levels of one fault are regarded as the leaf nodes rooted at the same parent node. The "tree" is depicted in Figure 10, in which the gradient arrows represent severity levels under each fault type.
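The tree described above can be held in a simple child-to-parent map. The sketch below is one plausible layout reconstructed from the text, not the exact node numbering of the paper: the grouping node named "RL/RO", the "-1" to "-4" severity suffixes, and the function name are assumptions made for illustration.

```python
ROOT = "Cooling System"
FAULTS = ["CF", "EO", "FWC", "FWE", "NC", "RL", "RO"]

# Child -> parent links for the fault-type and grouping nodes.
PARENT = {
    "Normal": ROOT,
    "FWE": ROOT,
    "EO": ROOT,
    "Refrigerant Fault": ROOT,
    "Condenser Fault": ROOT,
    "NC": "Refrigerant Fault",
    "RL/RO": "Refrigerant Fault",  # assumed grouping of the two charge faults
    "RL": "RL/RO",
    "RO": "RL/RO",
    "CF": "Condenser Fault",
    "FWC": "Condenser Fault",
}

# Four severity levels hang under every fault type as leaves.
for fault in FAULTS:
    for level in range(1, 5):
        PARENT[f"{fault}-{level}"] = fault


def path_to_root(node):
    """Return the node followed by its ancestors, excluding the root."""
    path = []
    while node != ROOT:
        path.append(node)
        node = PARENT[node]
    return path


parents = set(PARENT.values())
leaves = [n for n in PARENT if n not in parents]
inner = [n for n in PARENT if n in parents]
rl_level_1 = path_to_root("RL-1")
```

Under these assumptions the map has 29 leaves (the normal condition plus 7 faults with 4 levels each) and 10 intermediate nodes, and a severity leaf such as "RL-1" shares its ancestors with every other refrigerant-related label.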
3. Feature Mapping for Tree-structured Fault Dependence
3.1. Feature Mapping
In this section, we introduce a feature mapping that incorporates the prior knowledge about fault and severity level dependence. To begin with, let $\{(x_i, y_i)\}_{i=1}^{n}$ be a set of labelled training data, where $x_i \in \mathbb{R}^d$ denotes the $i$th record of sensor measurements ($d$ streams) under system condition $y_i \in \mathcal{Y} \triangleq \{1, \dots, q\}$ (a detailed data description is presented in Section 4). The tree-structured relationships between chiller faults depicted in Figure 10 can be encoded as in Figure 11. Each node, including the leaves for severity levels but excluding the root, is numbered with an integer $k \in \{1, 2, \dots, s\}$. In our case, nodes 1-29 are classification categories: the normal condition is encoded as node 1, and the seven faults with their four severity levels are encoded as nodes 2-29. Nodes 30-39 are intermediate nodes that represent the fault dependence. Next, to incorporate the tree structure information in each data sample, we consider an attribute reweighting vector $\Lambda(y) \in \mathbb{R}^s$ and the transformation $\Phi : \mathbb{R}^d \times \mathcal{Y} \to \mathbb{R}^{d \times s}$ according to [53], such that

$$\Phi(x, y) = \Lambda(y) \otimes x \tag{7}$$

where $\otimes$ denotes a tensor product, i.e. $\Phi(x, y) \in \mathbb{R}^{d \times s}$ is a vector containing all products of coefficients from the first and second vector arguments. Writing out $\Phi(x, y)$,

$$\Phi(x, y) = \begin{bmatrix} \lambda_1(y)\, x \\ \lambda_2(y)\, x \\ \vdots \\ \lambda_s(y)\, x \end{bmatrix} \tag{8}$$

in which the attribute reweighting vector is defined as

$$\lambda_z(y) = \begin{cases} v_z, & \text{if } z \succeq y \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

where the relation $z \succeq y$ denotes that node $z$ is $y$ or an ancestor of $y$. The reweighting parameter $v_z \geq 0$ can be used to express the different influence of node $z$ on node $y$. In the simplest case it can be set to 1, and $\lambda_z$ becomes an indicator function. In a more refined configuration, one can set $v_z$ to a positive number that reflects the depth of node $z$ in the tree.
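To make Eqs. (7)-(9) concrete, the following sketch builds $\Lambda(y)$ and $\Phi(x, y)$ for a toy tree. The five-node tree, the default $v_z = 1$, and the function names are hypothetical and far smaller than the 39-node tree of Figure 11.

```python
# Toy tree (hypothetical): nodes 1-3 are categories, nodes 4-5 are intermediate.
# PARENT[z] is the parent of node z; None means the parent is the unnumbered root.
PARENT = {1: None, 2: 4, 3: 4, 4: 5, 5: None}
S = 5  # number of numbered nodes


def reweighting_vector(y, v=None):
    """Lambda(y): entry z equals v_z if z is y or an ancestor of y, else 0 (Eq. 9)."""
    v = v or {z: 1.0 for z in range(1, S + 1)}
    lam = [0.0] * S
    node = y
    while node is not None:
        lam[node - 1] = v[node]
        node = PARENT[node]
    return lam


def feature_map(x, y, v=None):
    """Phi(x, y): s stacked copies of x, each scaled by lambda_z(y) (Eqs. 7-8)."""
    return [lam_z * x_k for lam_z in reweighting_vector(y, v) for x_k in x]


lam_2 = reweighting_vector(2)        # node 2 with ancestors 4 and 5
phi_2 = feature_map([2.0, 3.0], 2)   # d = 2 measurements, s = 5 blocks
```

Labels that share ancestors (here nodes 2 and 3, both under node 4) share non-zero blocks in $\Phi$, which is how the mapping lets related categories share parameters.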
Based on the above transformation, one normal class and four severity levels for each of the seven faults are numbered as 29 categories, and their dependence constitutes the additional 10 parent nodes in the tree. For example, the labelling and the transformation for level 1 of the RL fault are

$$\Lambda(22) = [0, \cdots, v_{22}, \cdots, v_{32}, \cdots, v_{37}, v_{38}, 0]^T$$

$$\Phi(x, 22) = [0, \cdots, v_{22}\,x, \cdots, v_{32}\,x, \cdots, v_{37}\,x, v_{38}\,x, 0]^T$$

With the feature mapping, we consider a general version of discriminant functions $F$ for classification purposes,

$$F(x, y; w) \triangleq \langle w, \Phi(x, y) \rangle \tag{10}$$

For simplicity, let $\langle w, \Phi(x, y) \rangle = \langle w_y, x \rangle$. It is a straightforward consequence of the linearity of Eq. (10) that one can rewrite $F$ as an additive superposition of linear discriminants as follows,

$$F(x, y; w) = \sum_{z=1}^{s} \lambda_z(y)\, \langle w_z, x \rangle \tag{11}$$

where $w_z \in \mathbb{R}^d$ is a weight vector associated with the $z$th class attribute. As a concrete example, taking $v_z = 1$, the discriminant function for node 22 in Figure 11 is

$$\langle w, \Phi(x, 22) \rangle = \langle w_{22}, x \rangle + \langle w_{32}, x \rangle + \langle w_{37}, x \rangle + \langle w_{38}, x \rangle$$
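The equivalence between the stacked form in Eq. (10) and the additive form in Eq. (11) can be checked numerically. The sketch below does so on a hypothetical five-node tree with $v_z = 1$; the tree, the weight values, and the function names are assumptions chosen only for illustration.

```python
# Hypothetical toy tree: node 2 has ancestors 4 and 5; None marks a child of the root.
PARENT = {1: None, 2: 4, 3: 4, 4: 5, 5: None}
S = 5


def lam(y):
    """Indicator version of Lambda(y), i.e. v_z = 1 for y and each of its ancestors."""
    vec = [0.0] * S
    node = y
    while node is not None:
        vec[node - 1] = 1.0
        node = PARENT[node]
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def discriminant_stacked(w_blocks, x, y):
    """Eq. (10): inner product of the stacked weight vector with Phi(x, y)."""
    w = [w_k for block in w_blocks for w_k in block]
    phi = [lam_z * x_k for lam_z in lam(y) for x_k in x]
    return dot(w, phi)


def discriminant_additive(w_blocks, x, y):
    """Eq. (11): sum of per-node linear discriminants weighted by lambda_z(y)."""
    return sum(lam_z * dot(w_z, x) for lam_z, w_z in zip(lam(y), w_blocks))


# One weight block w_z per node (made-up numbers) and one measurement record.
W = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4], [0.6, 0.0]]
X = [2.0, 3.0]
f_stacked = discriminant_stacked(W, X, 2)
f_additive = discriminant_additive(W, X, 2)
```

For node 2 both forms reduce to $\langle w_2, x \rangle + \langle w_4, x \rangle + \langle w_5, x \rangle$, which mirrors the node 22 example above.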
18
Page 18 of 42
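The feature map and its additive form can be illustrated with a minimal sketch. It assumes a small hypothetical tree (not the 39-node tree of this paper), unit weights $v_z = 1$ by default, and illustrative function names that do not come from the paper. The root is treated as an ancestor of every node here; it then adds the same term to every class and does not change the arg max.

```python
import numpy as np

# Hypothetical toy tree (NOT the paper's 39-node tree): parent[z] is the
# parent of node z, and node 0 plays the role of the root.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
s = len(parent)  # number of nodes (class attributes)

def ancestors_or_self(y):
    """Return y together with all of its ancestors."""
    path = []
    while y is not None:
        path.append(y)
        y = parent[y]
    return path

def reweighting_vector(y, v=None):
    """Lambda(y): entry z equals v_z if z is y or an ancestor of y, else 0."""
    v = np.ones(s) if v is None else np.asarray(v, dtype=float)
    lam = np.zeros(s)
    for z in ancestors_or_self(y):
        lam[z] = v[z]
    return lam

def feature_map(x, y, v=None):
    """Phi(x, y) = Lambda(y) (tensor) x, stacked as a vector of length s * d."""
    return np.kron(reweighting_vector(y, v), x)

def discriminant(W, x, y, v=None):
    """Additive form of Eq. (11); W has shape (s, d) and row z is w_z."""
    return float(reweighting_vector(y, v) @ (W @ x))

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=d)
W = rng.normal(size=(s, d))

# Inner product with the stacked weights (Eq. (10)) versus the additive form.
direct = float(W.reshape(-1) @ feature_map(x, 3))
additive = discriminant(W, x, 3)
```

The two values `direct` and `additive` agree up to floating-point error, which is the content of Eq. (11): only the blocks of $w$ that belong to a label and its ancestors contribute to the score.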
Algorithm 1 On-line Update Algorithm
Input $(x_{t+1}, y_{t+1})$
$S_{t+1} \Leftarrow \emptyset$
while $S_1, \ldots, S_{t+1}$ still change do
    for $i = t+1 : -1 : 1$ do
        $w = \sum_{j=1}^{t+1} \sum_{y' \in S_j} \alpha_{jy'}\, \delta\Phi_j(y')$
        $H(y) = (1 - \langle w, \delta\Phi_i(y) \rangle)\, \Delta(y_i, y)$
        $y^* = \arg\max_y H(y)$
        $\xi_i = \max_{y \in S_i} \{H(y)\}$
        if $H(y^*) > \xi_i + \varepsilon$ then
            $S_i \leftarrow S_i \cup \{y^*\}$
            $\alpha_S \leftarrow$ Solve dual with $S$
        end if
    end for
end while
Output $S_1', \ldots, S_t', S_{t+1}'$
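The inner selection step of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: the function name and array layout are hypothetical, an empty active set is assumed to give $\xi_i = 0$ (consistent with $\xi_i \ge 0$), and the "Solve dual with $S$" step is deliberately left out.

```python
import numpy as np

def most_violated_label(W, Lam, x_i, y_i, Delta, S_i, eps=1e-3):
    """One inner step of the update loop for sample i (sketch only).

    W     : (s, d) array, row z is w_z
    Lam   : (q, s) array, row y is Lambda(y) for candidate label y
    Delta : (q, q) array of misclassification costs, Delta[y_i, y]
    S_i   : set of labels already in the active set of sample i
    Returns (y_star, add), where add says whether y_star should join S_i.
    """
    scores = Lam @ (W @ x_i)           # F(x_i, y; w) for every label y
    margins = scores[y_i] - scores     # <w, delta Phi_i(y)> for every y
    H = (1.0 - margins) * Delta[y_i]   # H(y); zero at y = y_i since Delta = 0
    y_star = int(np.argmax(H))
    # Assumption: an empty active set yields xi_i = 0, matching xi_i >= 0.
    xi_i = max(max((H[y] for y in S_i), default=0.0), 0.0)
    return y_star, bool(H[y_star] > xi_i + eps)
```

In a full implementation, a returned `add` of `True` would be followed by adding `y_star` to `S_i` and re-solving the dual restricted to the active sets, as the algorithm box states.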
3.2. TFDK Learning Method

The learning objective is to find optimal parameters $w$ for the classification function $f$, which can be written as

$$f(x; w) \triangleq \arg\max_{y \in \mathcal{Y}} F(x, y; w) \tag{12}$$
In this work we adopt a large margin learning formulation [54]. First, the multi-class margin of a data sample $(x_i, y_i)$ with respect to a parameterization $w$ can be defined as

$$\gamma_i \triangleq F(x_i, y_i; w) - \max_{y \ne y_i} F(x_i, y; w) \tag{13}$$

Considering a category-dependent cost $\Delta(y_i, y)$ for misclassifying $y_i$ as $y$ (which is clarified in subsection 4.2; interested readers can refer to [55] for more information), one arrives at the following L2-regularized soft-margin learning
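objective

$$\min_{w, \xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{n} \sum_{i=1}^{n} \xi_i \tag{14}$$

$$\text{s.t.} \quad \gamma_i(w) \ge 1 - \frac{\xi_i}{\Delta(y_i, y)}, \quad \xi_i \ge 0, \quad \forall i \tag{15}$$

where $C$ is a hyper-parameter that tunes the margin loss penalty.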
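According to Eq. (10) and Eq. (13),

$$\gamma_i(w) = \langle w, \Phi(x_i, y_i) \rangle - \max_{y \ne y_i} \langle w, \Phi(x_i, y) \rangle \le \langle w, \Phi(x_i, y_i) \rangle - \langle w, \Phi(x_i, y) \rangle \quad (\forall y \ne y_i) \tag{16}$$

Letting $\delta\Phi_i(y) \triangleq \Phi(x_i, y_i) - \Phi(x_i, y)$, for $\forall i$, $\forall y \ne y_i$, we get

$$\langle w, \delta\Phi_i(y) \rangle \ge \gamma_i(w) \ge 1 - \frac{\xi_i}{\Delta(y_i, y)} \tag{17}$$

Thus the margin constraint of Eq. (15) can be rewritten as

$$\langle w, \delta\Phi_i(y) \rangle - 1 + \frac{\xi_i}{\Delta(y_i, y)} \ge 0 \tag{18}$$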
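The dual formulation of the above primal problem is obtained with the Lagrangian multiplier method [56]. The Lagrangian is

$$L(w, \xi, \alpha, \eta) = \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \eta_i \xi_i - \sum_{i=1}^{n} \sum_{y \ne y_i} \alpha_{iy} \left( \langle w, \delta\Phi_i(y) \rangle - 1 + \frac{\xi_i}{\Delta(y_i, y)} \right) \tag{19}$$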
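Computing the derivatives of $L$ with respect to the primal variables and applying the KKT conditions [57] results in

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \sum_{y \ne y_i} \alpha_{iy}\, \delta\Phi_i(y) \tag{20}$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \eta_i = \frac{C}{n} - \sum_{y \ne y_i} \frac{\alpha_{iy}}{\Delta(y_i, y)} \tag{21}$$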
Plugging $w$ and $\eta_i$ into Eq. (19), one can get

$$L = -\frac{1}{2} \sum_{i,j} \sum_{y \ne y_i,\, y' \ne y_j} \alpha_{iy} \alpha_{jy'} \langle \delta\Phi_i(y), \delta\Phi_j(y') \rangle + \sum_{i=1}^{n} \sum_{y \ne y_i} \alpha_{iy} \tag{22}$$
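For readers who want the intermediate step, the substitution of Eq. (20) and Eq. (21) into Eq. (19) can be written out as below. This is a worked restatement using only the symbols defined above, not an additional result.

```latex
\begin{align*}
L &= \tfrac{1}{2}\|w\|^2
     - \sum_{i}\sum_{y \ne y_i} \alpha_{iy} \langle w, \delta\Phi_i(y) \rangle
     + \sum_{i}\sum_{y \ne y_i} \alpha_{iy}
     + \sum_{i} \xi_i \Big( \tfrac{C}{n} - \eta_i
       - \sum_{y \ne y_i} \tfrac{\alpha_{iy}}{\Delta(y_i, y)} \Big) \\
  &= \tfrac{1}{2}\|w\|^2 - \|w\|^2 + \sum_{i}\sum_{y \ne y_i} \alpha_{iy} + 0 \\
  &= -\tfrac{1}{2} \sum_{i,j} \sum_{y \ne y_i,\, y' \ne y_j}
       \alpha_{iy} \alpha_{jy'} \langle \delta\Phi_i(y), \delta\Phi_j(y') \rangle
     + \sum_{i}\sum_{y \ne y_i} \alpha_{iy}
\end{align*}
```

The second line uses Eq. (20), which gives $\sum_{i}\sum_{y \ne y_i} \alpha_{iy} \langle w, \delta\Phi_i(y) \rangle = \|w\|^2$, and Eq. (21), which makes the coefficient of every $\xi_i$ vanish.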
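Since Eq. (14) is equivalent to $\min_\alpha -L$, with the condition that the Lagrange multipliers are non-negative, one can get

$$\eta_i = \frac{C}{n} - \sum_{y \ne y_i} \frac{\alpha_{iy}}{\Delta(y_i, y)} \ge 0 \tag{23}$$

Thus the primal-dual transition of the soft-margin learning objective is

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i,j} \sum_{y \ne y_i,\, y' \ne y_j} \alpha_{iy} \alpha_{jy'} \langle \delta\Phi_i(y), \delta\Phi_j(y') \rangle - \sum_{i} \sum_{y \ne y_i} \alpha_{iy} \\ \text{s.t.} \quad & \alpha_{iy} \ge 0 \quad \forall i,\ \forall y \ne y_i \\ & \sum_{y \ne y_i} \frac{\alpha_{iy}}{\Delta(y_i, y)} \le \frac{C}{n} \quad \forall i \end{aligned} \tag{24}$$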
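Notice that the dual form only involves the inner product [58] of $\delta\Phi_i(y)$ and $\delta\Phi_j(y')$, which admits direct calculation as follows

$$\langle \delta\Phi_i(y), \delta\Phi_j(y') \rangle = \langle \Lambda(y) - \Lambda(y_i), \Lambda(y') - \Lambda(y_j) \rangle \langle x_i, x_j \rangle \tag{25}$$

Hence we define the Tree-structured Fault Dependence Kernel (TFDK) as

$$K_{(i,y)(j,y')} = \begin{cases} 0, & \text{if } y = y_i \text{ or } y' = y_j \\ \langle \delta\Phi_i(y), \delta\Phi_j(y') \rangle, & \text{otherwise} \end{cases} \tag{26}$$

where $K_{(i,\cdot)(j,\cdot)}$ is a $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix and constitutes the $(i, j)$th block of the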
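The factorization in Eq. (25) means a kernel entry never requires the stacked feature vector. The sketch below computes one entry both ways on random data; the function names, the array layout and the random $\Lambda$ rows are hypothetical and serve only as a numerical check.

```python
import numpy as np

def tfdk_entry(Lam, X, labels, i, y, j, y_prime):
    """K_{(i,y)(j,y')} following Eq. (25) and Eq. (26).

    Lam    : (q, s) array, row y is Lambda(y)
    X      : (n, d) array of samples
    labels : length-n sequence of true labels y_i
    """
    if y == labels[i] or y_prime == labels[j]:
        return 0.0
    label_part = (Lam[y] - Lam[labels[i]]) @ (Lam[y_prime] - Lam[labels[j]])
    return float(label_part * (X[i] @ X[j]))

def delta_phi(Lam, X, labels, i, y):
    """Explicit delta Phi_i(y) = Phi(x_i, y_i) - Phi(x_i, y); for checking only."""
    return np.kron(Lam[labels[i]] - Lam[y], X[i])

rng = np.random.default_rng(1)
Lam = rng.integers(0, 2, size=(5, 7)).astype(float)  # hypothetical Lambda rows
X = rng.normal(size=(3, 4))
labels = [0, 2, 4]

k_fast = tfdk_entry(Lam, X, labels, 0, 1, 1, 3)
k_slow = float(delta_phi(Lam, X, labels, 0, 1) @ delta_phi(Lam, X, labels, 1, 3))
```

Here `k_fast` and `k_slow` coincide up to rounding, while `k_fast` only touches vectors of length $s$ and $d$ instead of $s \cdot d$.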
overall kernel matrix $K$. It is straightforward to check that $K$ is positive semi-definite, and thus the learning problem is a convex quadratic program (QP). Although various methods exist in the literature to solve convex QPs, for the problem at hand the presence of many linear constraints and the requirement for learning with streaming data motivate us to design an online active set algorithm. Key update steps of the algorithm are summarized in Algorithm 1, where one can simply use empty sets to initialize the active sets. The convergence argument is included in the Appendix.

As shown in Figure 12, to integrate the FDD tool into the BMS and update its results successively, labeled historical data is first accumulated into a fault library and stored in a database as the “batch training” data set, based on which an initial TFDK model is obtained (by running Algorithm 1). Then, at each round of detection, real-time sensor measurements and monitoring data are collected as the input of a stand-alone program, which implements the classifier, to conduct FDD. The detection results are provided to building managers and operators for further inspection and possible action. Their feedback, i.e., another labeled instance, constitutes the “real time training” input of the algorithm, which is designed to be incremental. With this online training phase, the TFDK model is refined and used for future FDD. The process continues iteratively in this manner.

Figure 12: Schematic showing how to integrate the FDD tool into BMS.
Our team, the Building Efficiency and Sustainability in the Tropics (SinBerBEST) program¹, is operating an Integrated Cyber-Physical test-bed located in Singapore, and we are working towards implementing the designed FDD strategy in the test-bed together with a set of fault-prevention/reaction control laws.

¹ http://sinberbest.berkeley.edu/research/thrust-6-test-bed-integration

Table 3: Fault severity levels for each fault and corresponding experimental methods

| Name | SL | Description |
|------|----|-------------|
| NM | | Test run under normal condition |
| CF | 1/2/3/4 | Plugged 20/33/49/74 tubes (out of 164) in the condenser |
| EO | 1/2/3/4 | Oil charge 14%/32%/50%/68% more than nominal |
| FWC | 1/2/3/4 | Reduce condenser water flow rate by 10%/20%/30%/40% |
| FWE | 1/2/3/4 | Reduce evaporator water flow rate by 10%/20%/30%/40% |
| NC | 1/2/3/4 | Adding 0.1/0.16/0.22/0.54 lbs Nitrogen to the refrigerant; displacing about 1.0%/1.8%/2.4%/5.6% of the volume at room temperature |
| RL | 1/2/3/4 | Refrigerant charge 10%/20%/30%/40% less than nominal |
| RO | 1/2/3/4 | Refrigerant charge 10%/20%/30%/40% more than nominal |
4. Validation Results and Comparison

4.1. The Experimental Data

The proposed fault detection framework is tested with the data collected from the ASHRAE RP-1043 project. As a brief introduction, one primary goal of the project was to obtain state measurements for a typical cooling system under normal and various faulty conditions. A 90-ton centrifugal water-cooled chiller is used, which is small enough that a comprehensive experiment design is possible, yet sufficiently representative of chillers used in larger installations [44]. The experiment was conducted in an indoor environment with a nearly constant ambient temperature of 72°F, and the specifications of ARI (Air-Conditioning and Refrigeration Institute) Standard 550 for Centrifugal and Rotary Screw Water-Chilling Packages were adopted as the test requirement [44]. Sensor measurements are transferred to a database from the MicroTech controllers, which are mounted on the chiller. The test standard is controlled by three Johnson Controls Inc Air Handling Unit (JCI AHU) controllers on an N2 bus, which is an RS-485 network. As shown in Figure 13, RS-485 is connected to the PC through COM Port 1 via an RS-485 to RS-232 converter.

Figure 13: Schematic showing chiller test standard and control interface.

During the experiment, 9 typical faults suggested in the ASHRAE RP-1043 were introduced at multiple severity levels. More than 60 tests were conducted, and for each test, 64 variables, including direct sensor measurements and calculated physical indexes, were recorded once every 10 seconds. In this paper, 7 commonly encountered faults are taken into account. Those faults are emulated by various experimental methods, as summarized in Table 3. For example, in the original ASHRAE RP-1043 experiment the NC fault was introduced by incrementally adding nitrogen to the refrigerant.
Considering the availability of sensor measurements in a more practical situation, the 24 most accessible variables listed in Table 1 and Table 2 are chosen as algorithm input features. With the TFDK method, the desired output includes not only normal/fault types, but also 4 severity levels if a fault is detected. Data groups of all fault types with four severity levels, as well as data collected under normal condition, are defined as the training data sets, i.e. 29 categories in total. We encode those categories with tree-structured labels as depicted in Figure 11.

Figure 14: Structured labels as a tree for typical chiller faults and corresponding severity levels; examples of misclassification cost among severity levels and fault types.

In order to justify the adopted Thompson Tau and Wavelet-based De-noising methods, we compare two cases with and without the data pre-processing techniques mentioned in section II.
4.2. Evaluation Measures
To evaluate the effectiveness of the tree-structured classification method on
a more rigorous basis, we employ two measures: testing accuracy and testing cost.
4.2.1. Testing Accuracy
The goal is to estimate the chance that the predictor $f(x)$ is correct on future unseen data, i.e., the generalization performance of the predictor. In this work, we use the empirical accuracy on a batch testing data set as an unbiased estimator. Let $\mathrm{sign}[\cdot, \cdot]$ be 1 if the predicted label of a testing data point agrees with its original label and 0 otherwise; the testing accuracy is

$$\mathrm{Accu}(f) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{sign}[f(x_i), y_i] \tag{27}$$

$$\mathrm{sign}[f(x_i), y_i] = \begin{cases} 1, & f(x_i) = y_i \\ 0, & f(x_i) \ne y_i \end{cases} \tag{28}$$

where $f(x_i)$ is the predicted label for testing data point $x_i$ as in Eq. (12), which represents that the data point is recognized as a certain severity level of one fault type, and $y_i$ is the true label that records the real experiment condition.

Figure 15: Classification accuracy as a function of training sample size by different methods; TFDK generates the highest accuracy. Panel (a) shows the classification accuracy of different methods on data that is pre-processed by de-noising and outlier removal; panel (b) uses the raw data directly.
4.2.2. Testing Cost
While testing accuracy is an unbiased estimator of classification correctness, it treats all errors as equally important, i.e., all types of errors induce the same cost. In practice, however, the seriousness and the consequences of committing different types of errors may vary significantly. In particular, based on the tree-structured relationships between different faults in Figure 11, misclassifying one category as another incurs different losses. In order to incorporate this consideration, we define the cost of misclassification among severity levels under the same fault type to be the lowest, the cost of misclassification among fault types derived from different parent nodes to be higher, and the cost of recognizing a fault as normal to be the highest. In particular, one can assign a cost that is proportional to the node distance in the tree depicted in Figure 14. For instance, the cost of misclassification among leaf nodes 26–29 is 1, and the cost between leaf nodes 26–29 and 18–21 is 3. Putting all the defined costs in a cost matrix $\Delta$, the misclassification cost can be characterized by a loss function as

$$F_{cost}(f) = \frac{\sum_{i=1}^{q} \sum_{j=1}^{q} \Delta_{ij}\, g(y_j, f(x_i))}{\sum_{i=1}^{q} \sum_{j=1}^{q} \Delta_{ij}\, g'(y_j, f(x_i))} \tag{29}$$

where $g(y_j, f(x_i))$ is in fact a confusion matrix in which each row represents the number of samples in a predicted class while each column represents the samples in an actual (true) class; $g'(y_j, f(x_i))$ represents how many testing data points would be classified to category $j$ if testing data from category $i$ were evenly distributed among the other categories; $\Delta_{ij}$ is the cost of classifying a test data point from category $i$ as $j$ ($\Delta_{ij} = 0$ if $i = j$); and $F_{cost}(f)$ is the absolute cost value for the classifier $f$. Notice that the misclassification cost is considered as one optimization constraint in Eq. (24).

Figure 16: Comparison of classification accuracy by pre-processed data and raw data (training sample size: 30). Data pre-processing helps to improve the classification accuracy.

Figure 17: Misclassification cost as a function of training sample size by different methods; TFDK generates the lowest cost. Panel (a) shows the misclassification cost of different methods on data that is pre-processed by de-noising and outlier removal; panel (b) uses the raw data directly.

Figure 18: Comparison of misclassification cost by pre-processed data and raw data (training sample size: 30). Data pre-processing helps to reduce the misclassification cost.
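A minimal sketch of Eq. (29) is given below. Several choices are assumptions made for illustration and are not confirmed by the paper: the toy tree and its node names are hypothetical, the cost between two leaves is taken as the number of levels climbed to reach their lowest common ancestor (which gives cost 1 for siblings), `conf[i, j]` counts test points of true class $i$ predicted as class $j$, and $g'$ spreads the test points of each true class evenly over the other $q - 1$ classes.

```python
import numpy as np

def lca_cost(parent, a, b):
    """Levels climbed from leaf a to reach the lowest common ancestor of a and b.
    Assumes a and b are leaves at the same depth."""
    anc_b = set()
    node = b
    while node is not None:
        anc_b.add(node)
        node = parent[node]
    steps, node = 0, a
    while node not in anc_b:
        node = parent[node]
        steps += 1
    return steps

def normalized_cost(conf, Delta):
    """Eq. (29) with conf[i, j] = test points of true class i predicted as j."""
    conf = np.asarray(conf, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    q = conf.shape[0]
    n_i = conf.sum(axis=1, keepdims=True)          # test points per true class
    uniform = np.repeat(n_i / (q - 1), q, axis=1)  # g': errors spread evenly
    np.fill_diagonal(uniform, 0.0)
    return float((Delta * conf).sum() / (Delta * uniform).sum())

# Hypothetical toy tree: two fault types (A, B), two severity levels each.
parent = {"R": None, "A": "R", "B": "R",
          "a1": "A", "a2": "A", "b1": "B", "b2": "B"}
leaves = ["a1", "a2", "b1", "b2"]
Delta = np.array([[lca_cost(parent, a, b) for b in leaves] for a in leaves],
                 dtype=float)
conf = [[8, 2, 0, 0], [1, 9, 0, 0], [0, 0, 10, 0], [0, 3, 0, 7]]
cost = normalized_cost(conf, Delta)
```

With this normalization a perfect classifier scores 0 and a classifier whose errors are spread evenly over all wrong classes scores 1, so confusing severity levels within a fault is penalized less than confusing fault types.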
4.3. Results and Comparison
To assess the statistical performance of the proposed FDD framework, we adopt a classical “training-testing” procedure. The pre-processed data is randomly divided into two parts, one for fitting the TFDK model and the other for testing the attained model on an unseen data set. Since labeled data usually has limited availability in practice, we train the TFDK classifier with various sample sizes to analyze their impact on testing accuracy. Given that the raw data of ASHRAE RP-1043 are collected every 10 seconds, the sample size also represents the time spent on data collection. For example, within 10 minutes, sensors can collect 60 data samples, each with 24 channels. In this work, we train the classifier with 8 different sample sizes (i.e. 6, 12, 18, 30, 48, 90, 120, and 180). For each configuration, the testing data is randomly chosen from the pre-processed testing data set, and the testing sample size is 1600 for each fault type (400 for each severity level) and 400 for the normal condition.
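The arithmetic behind this setup can be made explicit with a short sketch. It assumes that a training sample size counts consecutive 10-second records, so that size maps directly to collection time; the constant and function names are illustrative.

```python
SAMPLING_INTERVAL_S = 10  # one record every 10 seconds
TRAIN_SIZES = [6, 12, 18, 30, 48, 90, 120, 180]

def collection_minutes(sample_size, interval_s=SAMPLING_INTERVAL_S):
    """Minutes of data collection needed for a given number of records."""
    return sample_size * interval_s / 60

# Collection time for every training sample size used in the experiments.
durations = {m: collection_minutes(m) for m in TRAIN_SIZES}

# Test set: 7 fault types x 4 severity levels x 400 points, plus 400 normal.
N_FAULTS, N_LEVELS, PER_LEVEL, N_NORMAL = 7, 4, 400, 400
test_set_size = N_FAULTS * N_LEVELS * PER_LEVEL + N_NORMAL
```

Under that assumption the smallest configuration corresponds to one minute of data and the largest to thirty minutes, while every configuration is evaluated on the same number of test points.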
4.3.1. Comparison of Accuracy and Cost Among Different Methods
We compare TFDK with other state-of-the-art methods, including Multiclass SVM (MSVM) with RBF kernel, Decision Tree (DT), Neural Network (NN), AdaBoost (AB), Quadratic Discriminant Analysis (QDA), and Logistic Regression (LR). Figure 15 shows the classification accuracy as a function of training sample size for all the methods. In order to demonstrate the effectiveness of de-trending and de-noising, two sets of experiments were conducted with (Figure 15 (a)) and without (Figure 15 (b)) the proposed pre-processing technique. Similarly, the results of testing cost as a function of training sample size for all the methods are shown in Figure 17.

It is seen that TFDK outperforms all the other methods in terms of testing accuracy and cost under different training configurations. More specifically, TFDK achieves 1.49% to 9.19% improvement in accuracy and 10.69% to 75% decrease in testing cost compared to the runner-up method. The enhancement is more significant when the sample size is larger. Although the improvement is not obvious under small sample sizes (≤ 12), TFDK has the extra advantage of being robust to inter-fault-type misclassification, as will be revealed later with the confusion matrices.

Figure 19: Confusion matrices of TFDK and MSVM among fault types under small and large training sample sizes. Panels: (a) TFDK, small sample size, Accuracy=69.64%, Cost=0.1604; (b) TFDK, large sample size, Accuracy=99.12%, Cost=0.0175; (c) MSVM, small sample size, Accuracy=68.08%, Cost=0.2090; (d) MSVM, large sample size, Accuracy=89.19%, Cost=0.0700. In (a) and (c), both TFDK and MSVM are trained with a small training sample size, and they generate similar classification accuracy, 69.64% and 68.08%. However, TFDK presents very little misclassification among fault types. In (b) and (d), TFDK and MSVM are trained with a relatively large training sample size. TFDK presents very high classification accuracy, while MSVM still presents obvious misclassification.
As expected, the testing accuracy increases and the testing cost decreases as the number of training samples grows. For instance, the testing accuracy of TFDK rises from 69.64% (6 training samples) to 99.12% (180 training samples); similar trends can be observed for the other methods, which reaffirms the intuition that accumulating more training data is beneficial to data-driven FDD.

Comparing the two sub-plots (a) and (b) of Figure 15 and Figure 17, we observe that the methods with pre-processed data present better results in general. Specifically, we look into the case when the training sample size is 30, and compare the testing accuracy and cost for different methods in Figure 16 and Figure 18, respectively. It is seen that the proposed pre-processing techniques greatly improve the performance of several methods such as TFDK, MSVM and AB.

Figure 20: Confusion matrices of TFDK and MSVM for the severity levels of the EO fault under small training sample size. Panels: (a) Severity Level Recognition of EO Fault (TFDK Accuracy=69.64%); (b) Severity Level Recognition of EO Fault (MSVM Accuracy=68.08%). To inspect the severity level identification rates of the EO fault under small training sample size, (a) shows that most of TFDK’s misclassification occurs among its four severity levels, while (b) shows that MSVM misclassifies both among its four severity levels and into other fault types.
4.3.2. Advantages of Incorporating Fault Dependence Tree

To further investigate the benefit of including the prior knowledge of fault dependence, we compare detailed classification results for TFDK and MSVM. The comparison reflects the effect of tree-structured fault dependence information because TFDK can be viewed as a hierarchical variation of the traditional large margin SVM. Figure 19 (a) and (c) are the confusion matrices of TFDK and MSVM respectively when the training sample size is 6, which is the smallest training sample size in our test; and Figure 19 (b) and (d) are the confusion matrices for TFDK and MSVM respectively under the largest training sample size of our test, which is 180.

As mentioned earlier, in the case of small training sample size, TFDK does not show a notable improvement in accuracy compared to MSVM. However, close scrutiny of Figure 19 (a) vs. (c) and Figure 20 (a) vs. (b) reveals that TFDK presents a much lower misclassification rate among fault types. In Figure 20 (a) and (b), we show the detailed prediction assignment for the EO fault by TFDK and MSVM. Indeed, the errors of TFDK mainly occur among severity levels while the correct fault types have already been assigned (Figure 20 (a)). On the other hand, quite a few errors committed by MSVM occur among different fault types (Figure 20 (b)).

In the case of larger training sample size, the classification accuracy of MSVM is 89.19%, which appears relatively high from the FDD perspective; nevertheless, Figure 19 (d) shows that MSVM still generates a significant misclassification rate among fault types under the large training sample size situation. Among all the methods, when the training sample size is 180 the proposed TFDK achieves extremely high classification accuracy (99.12%) and very low misclassification cost, as shown in Figure 19 (b).
5. Conclusions and Future Work
In this paper, we have proposed a novel data-driven FDD method and developed a corresponding on-line learning algorithm for streaming data. The integration of fault dependence information and the task of severity level detection are considered for the first time in this work. Instead of using traditional classification methods, which give each category plain labels and ignore the relationship among different faults, we derive a hierarchical kernel learning method which assigns tree-structured labels to the faults. To be specific, we encode the fault dependence information as a “tree” and describe the severity levels as child nodes of each fault type rather than treating them as independent classes. With that, the prior knowledge of the system and the task of identifying fault severity levels are treated in a unified framework.

We have formulated the tree-structured learning method to diagnose typical faults of building cooling systems. This method will be applied to identify faults for more building sub-systems in future work, for example, to monitor the performance of the whole building HVAC system and recognize all the typical faults in the cooling sub-system, AHU sub-system, and VAV sub-system with a uniform classifier. In addition, other than utilizing expert knowledge to build the tree-structured relationship among faults, the hidden information that cannot be directly described by the physical structure of the building system will also be explored. We intend to introduce random forest or fixed-point models to capture the hidden information, and combine that information with expert knowledge to capture a more refined structure of common faults.
6. Appendix
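6.1. Convergence Argument of the On-line Update Algorithm

6.1.1. Step One

Our notation in this paper follows the large margin formulation in [54]. Interested readers are referred to [59, 60] for more background information.

To prove that sufficient improvement can be obtained for the objective function Eq. (24) in each iteration, first consider the dual formulation in Eq. (22) as

$$J(\alpha) = -\frac{1}{2} \alpha^T K \alpha + n^T \alpha \tag{30}$$

Define $\beta$ as the update step size and $\tau$ as the update direction. We have

$$\begin{aligned} \delta J(\beta) &\triangleq J(\alpha + \beta\tau) - J(\alpha) \\ &= -\tfrac{1}{2} \alpha^T K \beta\tau - \tfrac{1}{2} \tau^T \beta^T K \alpha - \tfrac{1}{2} \tau^T \beta^T K \beta\tau + \beta n^T \tau \\ &= -\beta \alpha^T K \tau - \tfrac{1}{2} \beta^2 \tau^T K \tau + \beta n^T \tau \end{aligned} \tag{31}$$

Thus, by denoting $\langle \nabla J(\alpha), \tau \rangle = n^T \tau - \alpha^T K \tau$,

$$\frac{\partial\, \delta J(\beta)}{\partial \beta} = -\beta \tau^T K \tau - \alpha^T K \tau + n^T \tau = 0 \;\Rightarrow\; \beta^* = \frac{n^T \tau - \alpha^T K \tau}{\tau^T K \tau} = \frac{\langle \nabla J(\alpha), \tau \rangle}{\tau^T K \tau} \tag{32}$$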
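Now substitute $\beta^*$ into $\delta J(\beta)$:

$$\delta J(\beta^*) = \frac{1}{2} \frac{(\langle \nabla J(\alpha), \tau \rangle)^2}{\tau^T K \tau} \triangleq \frac{1}{2} \cdot \frac{D_{\alpha\tau}^2}{\tau^T K \tau} \quad (D_{\alpha\tau} \triangleq \langle \nabla J(\alpha), \tau \rangle) \tag{33}$$

Since $\beta$ is within a bounded section $0 \le \beta \le B$:

(I) If $\beta^* \le B$, then

$$\delta J(\beta^*) = \frac{1}{2} \cdot \frac{D_{\alpha\tau}^2}{\tau^T K \tau}; \tag{34}$$

(II) If $\beta^* \ge B$, since $J$ is a concave quadratic (so $\beta^*$ is its maximizer along $\tau$),

$$\delta J(\beta^*) \ge \delta J(B) = B \left( n^T \tau - \alpha^T K \tau \right) - \frac{1}{2} B^2 \tau^T K \tau = B D_{\alpha\tau} - \frac{B^2}{2} \cdot \tau^T K \tau \tag{35}$$

Note that $\tau^T K \tau > 0$ and $B \le \beta^* = \frac{D_{\alpha\tau}}{\tau^T K \tau}$, then

$$\delta J(\beta^*) \ge B \left( D_{\alpha\tau} - \frac{1}{2} \cdot \frac{D_{\alpha\tau}}{\tau^T K \tau} \cdot \tau^T K \tau \right) = \frac{1}{2} B D_{\alpha\tau} \tag{36}$$

Hence, from (I) and (II), we get

$$\max_{0 \le \beta \le B} \delta J(\beta) \ge \frac{1}{2} \min \left\{ \frac{D_{\alpha\tau}^2}{\tau^T K \tau},\; B D_{\alpha\tau} \right\} = \frac{D_{\alpha\tau}}{2} \min \left\{ \frac{D_{\alpha\tau}}{\tau^T K \tau},\; B \right\} \tag{37}$$

6.1.2. Step Two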
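At each step, assume $(x_i, y_i)$ is newly added. Optimize $\alpha_{iy}$ in Eq. (24) with the upper bound

$$\alpha_{iy} \le \Delta(y_i, y) \frac{C}{n} \triangleq B \tag{38}$$

Consider the dual formulation in Eq. (22). It is easy to see that

$$\frac{\partial L(\alpha)}{\partial \alpha_{iy}} = 1 - \sum_{j, y'} \alpha_{jy'} K_{(i,y)(j,y')} = 1 - \langle w, \delta\Phi_i(y) \rangle \tag{39}$$

Since $H(y) = (1 - \langle w, \delta\Phi_i(y) \rangle)\, \Delta(y_i, y)$ as in Algorithm 1 and $H(y^*) \ge \xi_i + \varepsilon$, then

$$\frac{\partial L(\alpha)}{\partial \alpha_{iy}} \ge \frac{\xi_i + \varepsilon}{\Delta(y_i, y)} \ge \frac{\varepsilon}{\Delta(y_i, y)} \quad (\Delta(y_i, y) > 0,\ \xi_i \ge 0) \tag{40}$$

Assuming the update direction $\tau = 1$, we can derive

$$D_{\alpha\tau} = n^T \tau - \alpha^T K \tau = 1 - \alpha^T K = \frac{\partial L(\alpha)}{\partial \alpha_{iy}} \tag{41}$$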
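Substituting Eq. (38) and Eq. (41) into the result of Step One (Eq. (37)), we get

$$\delta L(\beta) \ge \frac{1}{2} \min \left\{ \frac{1}{K} \cdot \frac{\partial L(\alpha)}{\partial \alpha_{iy}},\; \frac{C}{n} \cdot \Delta(y_i, y) \right\} \cdot \frac{\partial L(\alpha)}{\partial \alpha_{iy}} \tag{42}$$

Due to Eq. (40),

$$\delta L(\beta) \ge \frac{1}{2} \min \left\{ \frac{\varepsilon}{\Delta(y_i, y)} \cdot \frac{1}{K},\; \frac{C}{n} \cdot \Delta(y_i, y) \right\} \cdot \frac{\varepsilon}{\Delta(y_i, y)} = \frac{1}{2} \min \left\{ \frac{\varepsilon^2}{K [\Delta(y_i, y)]^2},\; \frac{C\varepsilon}{n} \right\} \tag{43}$$

If $(x_i, y_i)$ is already in the active set, the search direction $\tau$ can be tuned, and with a similar argument we obtain

$$\delta L(\beta) \ge \frac{1}{2} \min \left\{ \frac{\varepsilon^2}{4K [\Delta(y_i, y)]^2},\; \frac{C\varepsilon}{n} \right\} \tag{44}$$

6.1.3. Step Three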
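By denoting Eq. (24) as $L(\alpha)$ and Eq. (14) as $P(w)$, based on primal-dual theory we know that

$$L(\alpha) \le \min P(w) \tag{45}$$

Let $w = 0$; according to Eq. (15), the constraints then require $\xi_i \ge \Delta(y_i, y)$, and taking the smallest feasible slack gives

$$P(w)\big|_{w=0} = \frac{1}{2}\|w\|^2 + \frac{C}{n} \sum_{i=1}^{n} \xi_i = 0 + \frac{C}{n} \sum_{i=1}^{n} \Delta(y_i, y) \tag{46}$$

$$L(\alpha) \le \min P(w) \le \frac{C}{n} \sum_{i=1}^{n} \Delta(y_i, y) \le C \cdot \max_{y} \Delta(y_i, y) \triangleq C \cdot \Delta_{max} \tag{47}$$

Hence the optimal improvement of $L(\alpha)$ is at most $C \cdot \Delta_{max}$. For each step, the improvement is at least $\frac{1}{2} \min \left\{ \frac{\varepsilon^2}{4K\Delta_{max}^2}, \frac{C\varepsilon}{n} \right\}$, as depicted in Eq. (44). Now we can conclude that the algorithm will converge within the following number of steps:

$$\frac{C \cdot \Delta_{max}}{\frac{1}{2} \min \left\{ \frac{\varepsilon^2}{4K\Delta_{max}^2},\; \frac{C\varepsilon}{n} \right\}} = 2 \max \left\{ \frac{4CK\Delta_{max}^3}{\varepsilon^2},\; \frac{n\Delta_{max}}{\varepsilon} \right\} \tag{48}$$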
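The step bound in Eq. (48) can be evaluated numerically. The sketch below computes both sides of the equality for given constants; the function name is illustrative, `K` is treated as a scalar bound on the relevant kernel entry as in the derivation above, and the example values are arbitrary.

```python
import math

def iteration_bound(C, K, delta_max, eps, n):
    """Return (ratio, closed_form) for the two sides of Eq. (48)."""
    # Guaranteed improvement per step, Eq. (44) with Delta replaced by Delta_max.
    per_step = 0.5 * min(eps**2 / (4 * K * delta_max**2), C * eps / n)
    # Largest possible total improvement, C * Delta_max.
    total = C * delta_max
    closed_form = 2 * max(4 * C * K * delta_max**3 / eps**2, n * delta_max / eps)
    return total / per_step, closed_form

# Arbitrary example constants, chosen only to exercise the formula.
ratio, closed_form = iteration_bound(C=1.0, K=2.0, delta_max=3.0, eps=0.1, n=100)
agree = math.isclose(ratio, closed_form, rel_tol=1e-9)
```

The bound grows as $1/\varepsilon^2$ when the first term dominates, so a looser tolerance $\varepsilon$ trades accuracy of the active sets for fewer update steps.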
References
[1] G. Mantovani, L. Ferrarini, Temperature control of a commercial building with model predictive control techniques, IEEE Transactions on Industrial Electronics 62 (4) (2015) 2651–2660.

[2] J. Yao, G. T. Costanzo, G. Zhu, B. Wen, Power admission control with predictive thermal management in smart buildings, IEEE Transactions on Industrial Electronics 62 (4) (2015) 2642–2650.

[3] A. Schumann, J. Hayes, P. Pompey, O. Verscheure, Adaptable fault identification for smart buildings, in: Artificial Intelligence and Smarter Living, AAAI Workshop, 2011.

[4] A. Handbook, Hvac applications, ASHRAE Handbook, Fundamentals.

[5] H. Dibowski, J. Ploennigs, K. Kabitzsch, Automated design of building automation systems, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3606–3613.

[6] T. Novak, A. Gerstinger, Safety-and security-critical services in building automation and control systems, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3614–3621.

[7] M. Comstock, J. Braun, E. Groll, The sensitivity of chiller performance to common faults, HVAC & R Research 7 (3) (2001) 263–279.

[8] S. Wang, J. Cui, A robust fault detection and diagnosis strategy for centrifugal chillers, HVAC & R Research 12 (3) (2006) 407–428.

[9] S. Katipamula, M. R. Brambley, Review article: Methods for fault detection, diagnostics, and prognostics for building systems - a review, part ii, HVAC & R Research 11 (2) (2005) 169–187.

[10] X. Dai, Z. Gao, From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis, IEEE Transactions on Industrial Informatics 9 (4) (2013) 2226–2238.
[11] Y. Yu, D. Woradechjumroen, D. Yu, A review of fault detection and diagnosis methodologies on air-handling units, Energy and Buildings 82 (2014) 550–562.

[12] Z. Gao, C. Cecati, S. X. Ding, A survey of fault diagnosis and fault-tolerant techniques-part i: fault diagnosis with model-based and signal-based approaches, IEEE Transactions on Industrial Electronics 62 (6) (2015) 3757–3767.

[13] Z. Gao, C. Cecati, S. Ding, A survey of fault diagnosis and fault-tolerant techniques part ii: Fault diagnosis with knowledge-based and hybrid/active approaches, IEEE Transactions on Industrial Electronics.

[14] D. J. Cook, S. K. Das, How smart are our environments? an updated look at the state of the art, Pervasive and mobile computing 3 (2) (2007) 53–73.

[15] A. Purarjomandlangrudi, A. H. Ghapanchi, M. Esmalifalak, A data mining approach for fault diagnosis: An application of anomaly detection algorithm, Measurement 55 (2014) 343–352.

[16] S. Wu, J. Sun, A top-down strategy with temporal and spatial partition for fault detection and diagnosis of building hvac systems, Energy and Buildings 43 (9) (2011) 2134–2139.

[17] Y. Hu, H. Chen, J. Xie, X. Yang, C. Zhou, Chiller sensor fault detection using a self-adaptive principal component analysis method, Energy and buildings 54 (2012) 252–258.

[18] S. Li, J. Wen, A model-based fault detection and diagnostic methodology based on pca method and wavelet transform, Energy and Buildings 68 (2014) 63–71.

[19] B. Sun, P. B. Luh, Q.-S. Jia, Z. O’Neill, F. Song, Building energy doctors: An spc and kalman filter-based method for system-level fault detection in hvac systems, IEEE Transactions on Automation Science and Engineering 11 (1) (2014) 215–229.
[20] B. Sun, P. B. Luh, Z. O’Neill, F. Song, Building energy doctors: Spc and kalman filter-based fault detection, in: IEEE Conference on Automation Science and Engineering (CASE), IEEE, 2011, pp. 333–340.

[21] H. Wang, Y. Chen, C. W. Chan, J. Qin, An online fault diagnosis tool of vav terminals for building management and control systems, Automation in Construction 22 (2012) 203–211.

[22] G. Mustafaraj, J. Chen, G. Lowry, Development of room temperature and relative humidity linear parametric models for an open office using bms data, Energy and Buildings 42 (2010) 348–356.

[23] D. J. Hill, B. S. Minsker, E. Amir, Real-time bayesian anomaly detection for environmental sensor data, in: Proceedings of the Congress-International Association for Hydraulic Research, Vol. 32, Citeseer, 2007, p. 503.

[24] Y. Zhao, F. Xiao, S. Wang, An intelligent chiller fault detection and diagnosis methodology using bayesian belief network, Energy and Buildings 57 (2013) 278–288.

[25] F. Xiao, Y. Zhao, J. Wen, S. Wang, Bayesian network based fdd strategy for variable air volume terminals, Automation in Construction 41 (2014) 106–118.

[26] B. Fan, Z. Du, X. Jin, X. Yang, Y. Guo, A hybrid fdd strategy for local system of ahu based on artificial neural network and wavelet analysis, Building and environment 45 (12) (2010) 2698–2708.

[27] Y. Zhu, X. Jin, Z. Du, Fault diagnosis for sensors in air handling unit based on neural network pre-processed by wavelet and fractal, Energy and buildings 44 (2012) 7–16.

[28] Z. Du, B. Fan, X. Jin, J. Chi, Fault detection and diagnosis for buildings and hvac systems using combined neural networks and subtractive clustering analysis, Building and Environment 73 (2014) 1–11.
[29] Z. Du, X. Jin, Multiple faults diagnosis for sensors in air handling unit using fisher discriminant analysis, Energy Conversion and Management. 49 (12) (2008) 3654–3665.
[30] P. Jaikumar, A. Gacic, B. Andrews, M. Dambier, Detection of anomalous events from unlabeled sensor data in smart building environments, in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2011, pp. 2268–2271.
[31] Y. Zhao, S. Wang, F. Xiao, Pattern recognition-based chillers fault detection method using support vector data description (svdd), Applied Energy 112 (2013) 1041–1048.
[32] Y. Zhao, F. Xiao, J. Wen, Y. Lu, S. Wang, A robust pattern recognition-based fault detection and diagnosis (fdd) method for chillers, HVAC&R Research 20 (7) (2014) 798–809.
[33] J. Liang, R. Du, Model-based fault detection and diagnosis of hvac systems using support vector machine method, International Journal of Refrigeration 30 (6) (2007) 1104–1114.
[34] H. Han, Z. Cao, B. Gu, N. Ren, Pca-svm-based automated fault detection and diagnosis (afdd) for vapor-compression refrigeration systems, HVAC&R Research 16 (3) (2010) 295–313.
[35] K.-Y. Chen, L.-S. Chen, M.-C. Chen, C.-L. Lee, Using svm based method for equipment fault detection in a thermal power plant, Computers in Industry 62 (1) (2011) 42–50.
[36] K. Yan, W. Shen, T. Mulumba, A. Afshari, Arx model based fault detection and diagnosis for chillers using support vector machines, Energy and Buildings 81 (2014) 287–295.
[37] T. Mulumba, A. Afshari, K. Yan, W. Shen, L. K. Norford, Robust model-based fault diagnosis for air handling units, Energy and Buildings 86 (2015) 698–707.
[38] D. Dietrich, D. Bruckner, G. Zucker, P. Palensky, Communication and computation in buildings: A short introduction and overview, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3577–3584.
[39] S. Yin, S. X. Ding, X. Xie, H. Luo, A review on basic data-driven approaches for industrial process monitoring, IEEE Transactions on Industrial Electronics 61 (11) (2014) 6418–6428.
[40] Y. Keigo, I. Minoru, Y. Takehisa, M. Kazuo, S. Masaki, M. Yoshio, Identification of causal variables for building energy fault detection by semi-supervised lda and decision boundary analysis, in: IEEE International Conference on Data Mining Workshop (ICDMW’08), IEEE, 2008, pp. 164–173.
[41] I. Tsochantaridis, T. Hofmann, T. Joachims, Y. Altun, Support vector machine learning for interdependent and structured output spaces, in: Proceedings of the twenty-first international conference on Machine learning, ACM, 2004, p. 104.
[42] S. Dumais, H. Chen, Hierarchical classification of web content, in: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2000, pp. 256–263.
[43] L. K. Norford, J. A. Wright, R. A. Buswell, D. Luo, C. J. Klaassen, A. Suby, Demonstration of fault detection and diagnosis methods for air-handling units, HVAC&R Research 8 (1) (2002) 41–71.
[44] M. Comstock, J. Braun, Fault detection and diagnostic (fdd) requirements and evaluation tools for chillers, West Lafayette, IN: ASHRAE.
[45] S. Li, J. Wen, X. Zhou, C. J. Klaassen, Development and validation of a dynamic air handling unit model, part 1 (rp-1312), ASHRAE Transactions 116 (1) (2010) 45.
[46] S. Li, J. Wen, X. Zhou, C. J. Klaassen, Development and validation of a dynamic air handling unit model, part 2 (rp-1312), ASHRAE Transactions 116 (1) (2010) 57.
[47] S. Wang, J. Cui, Sensor-fault detection, diagnosis and estimation for centrifugal chiller systems using principal-component analysis method, Applied Energy 82 (3) (2005) 197–213.
[48] Y. Jia, Model-based generic approaches for automated fault detection, diagnosis, evaluation (fdde) and for accurate control of field-operated centrifugal chillers, Ph.D. thesis (2002).
[49] X. Li, C. P. Bowers, T. Schnier, Classification of energy consumption in buildings with outlier detection, IEEE Transactions on Industrial Electronics 57 (11) (2010) 3639–3644.
[50] H. Xie, L. E. Pierce, F. T. Ulaby, Sar speckle reduction using wavelet denoising and markov random field modeling, IEEE Transactions on Geoscience and Remote Sensing 40 (10) (2002) 2196–2212.
[51] J. M. Cimbala, Modified thompson tau used for determination of outliers, Penn State University.
[52] D. R. Cox, D. V. Hinkley, Theoretical statistics, CRC Press, 1979.
[53] L. Cai, T. Hofmann, Hierarchical document categorization with support vector machines, in: Proceedings of the thirteenth ACM international conference on Information and knowledge management, ACM, 2004, pp. 78–87.
[54] K. Crammer, Y. Singer, On the algorithmic implementation of multiclass kernel-based vector machines, The Journal of Machine Learning Research 2 (2002) 265–292.
[55] K. Wang, S. Zhou, S. C. Liew, Building hierarchical classifiers using class proximity, in: 25th International Conference on Very Large Data Bases, Vol. Proceedings of VLDB-99.
[56] R. Bellman, Dynamic programming and lagrange multipliers, Proceedings of the National Academy of Sciences 42 (10) (1956) 767–769.
[57] J. C. Platt, Using analytic qp and sparseness to speed training of support vector machines, Advances in Neural Information Processing Systems (1999) 557–563.
[58] S. Fine, K. Scheinberg, Efficient svm training using low-rank kernel representations, The Journal of Machine Learning Research 2 (2002) 243–264.
[59] C. J. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2).
[60] Y. Zhou, J. Y. Baek, D. Li, C. J. Spanos, Optimal Training and Efficient Model Selection for Parameterized Large Margin Learning, Springer, 2016, pp. 52–64.