Building and Environment 163 (2019) 106319
Building and Environment journal homepage: www.elsevier.com/locate/buildenv
Fault diagnosis based operation risk evaluation for air conditioning systems in data centers
Xu Zhu, Zhimin Du*, Xinqiao Jin, Zhijie Chen
School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
ARTICLE INFO
Keywords: Building thermal environment; Fault diagnosis; Random forest; Decoupling feature; Data center air conditioning system
ABSTRACT
Air conditioning systems in data centers are essential, energy-hungry infrastructure that keeps server racks below their critical temperature limits. Faulty operation may result in increased energy use, reduced thermal comfort and higher maintenance costs. This paper presents a novel hybrid-model-based fault diagnosis and estimation approach to manage the operation risk of data center air conditioning systems. First, four decoupling features are developed for the typical faults of the investigated system. Each feature is affected by a single fault only and deviates from its fault-free value when the corresponding fault occurs. Then a fault diagnosis model that integrates the decoupling features with a random forest classifier is used to identify the fault type. Subsequently, fault intensity estimators built with random forest regression are used to estimate the fault severity, which represents the magnitude of the operation risk. The performance of the proposed models is validated with experimental data. The results show that the proposed fault diagnosis model has a significant performance advantage over two purely data-driven models and is more robust to the various operation conditions that may occur in practical applications. Under new operation conditions, the overall correct diagnosis rate is still 94.17%, and the false alarm rates are within 5%. In addition, the mean absolute errors of the four fault intensity estimators are less than 4%.
1. Introduction
Building thermal environment has become an increasingly important topic, and a large number of studies have focused on issues related to indoor thermal comfort [1–4]. Heating, ventilation and air conditioning (HVAC) systems are essential, energy-hungry infrastructure that creates a desirable indoor environment in various buildings. In particular, the air conditioning systems in data centers keep server racks below their critical temperature limits. However, faults inevitably occur in data center air conditioning systems after long-term operation. Faulty operation may raise the ambient temperature around the servers, which can result in the failure of critical electronic devices. Therefore, it is of great significance to establish an effective fault diagnosis and isolation mechanism for air conditioning systems in data centers. Building architects and managers have already made efforts in this direction and provided some valid methods for regular manual maintenance. However, manual maintenance may not be the most efficient method because of increased equipment downtime and the financial cost of scheduled engineers. Automated fault detection and diagnosis (FDD) technology [5,6] is an instantaneous, more sensitive and potentially cheaper alternative for abnormality monitoring and health management. It can therefore serve as an effective and reliable technical tool to manage the operation risk of data center air conditioning systems.
Automated FDD research in the HVAC field began in the late 1980s and early 1990s. In the past two decades, various FDD strategies [7–9] have been proposed in the open literature to address common faults. Generally, existing research can be divided into three categories: rules-based, model-based and data-driven approaches. The rules-based approaches [10–12] typically identify faults according to a set of specific or fuzzy logic rules, which are established from expert knowledge or practical experience. The diagnostic rules are generally tailored to specific system structures and may be invalid for systems with different structures. Consequently, building a complete rule library is the key to field applications of the rules-based approaches. The model-based approaches [13–15] require accurate mathematical models that match the real physical processes, which is not an easy task. Semi-empirical models built on simplified hypotheses can reduce model complexity; however, the diagnostic ability may decrease as the physical models are simplified. Recently, data-driven approaches have attracted extensive attention with the rapid development of
* Corresponding author. E-mail addresses: [email protected] (X. Zhu), [email protected] (Z. Du), [email protected] (X. Jin), [email protected] (Z. Chen).
https://doi.org/10.1016/j.buildenv.2019.106319 Received 9 May 2019; Received in revised form 5 July 2019; Accepted 29 July 2019 Available online 30 July 2019 0360-1323/ © 2019 Published by Elsevier Ltd.
X. Zhu, et al.
Nomenclature
P    pressure (Pa)
T    temperature (°C)
h    enthalpy (kJ kg−1)
f    frequency (Hz)
m    mass flow rate (kg s−1)
V    volume flow rate (m3 s−1)
W    power consumption (kW)
ν    specific volume (m3 kg−1)
α    heat loss coefficient
a    empirical coefficient of heat loss coefficient model
b    empirical coefficient of fault-free pressure drop model
CRAC    computer room air conditioning
CVL    compressor valve leakage
CF    condenser fouling
EA    evaporator airflow reduction
LL    liquid line restriction
NOF    no fault
CART    classification and regression tree
RF    random forest
OOB    out of bag
ER    error rate for classification
MSE    mean square error for regression
MAE    mean absolute error
CR    correct rate
HR    hit rate
FAR    false alarm rate
FI    fault intensity
FDD    fault detection and diagnosis

Subscripts
act    actual
com    compressor
cd    condenser
ca    condenser air side
db    dry-bulb
dis    discharge of compressor
eva    evaporator
ea    evaporator air side
eev    electronic expansion valve
id    indoor
in    inlet
ll    liquid line
od    outdoor
out    outlet
pre    prediction
ref    refrigerant
res    residual
shl    compressor shell
suc    suction of compressor
wb    wet-bulb
pattern recognition and data mining techniques. The data-driven approaches [16–19] can be constructed from historical operation data without requiring insight into the physical structure, which greatly widens their applicability to various HVAC systems. The early data-driven approaches originated from quality control methods, including contribution maps [20], partial least squares [21] and cumulative sum control charts [22]. Subsequently, statistical models such as principal component analysis [23] and linear discriminant analysis [24] gradually gained attention. In recent years, various machine learning models have been used for fault detection and diagnosis, mainly the following algorithms and their derivatives: support vector machines [25–27], Bayesian networks [28], neural networks [29,30], decision trees [31], clustering analysis [16] and association mining [32]. Yan et al. [33] combined auto-regression with exogenous variables and support vector machines to diagnose chiller faults; the method achieved high prediction accuracy and a low false alarm rate. Kocyigit [34] combined a fuzzy inference system and a multi-layer neural network to diagnose the faults of vapor compression systems. Zhao et al. [35] developed a three-layer chiller diagnostic structure based on a Bayesian belief network, which makes full use of the available information about the chiller and effectively diagnoses faults under uncertain information.
Generally, data-driven approaches can obtain good diagnostic results if massive data are available. However, a key shortcoming is that they rely heavily on training data and are sensitive to variations in operation conditions. To ensure good diagnostic ability, the historical data should be rich enough to represent the common variability of actual operation conditions, and conventional data-driven approaches may not work well when a practical air conditioning system operates under new operation conditions. Unfortunately, it is generally difficult to obtain historical data covering all operation conditions in real systems, which greatly limits their practical application.
In this paper, a novel hybrid-model-based fault diagnosis and estimation approach is presented for data center air conditioning systems. To overcome the weakness of conventional data-driven approaches, model-based decoupling features are developed and integrated with data-driven machine learning models to enhance robustness. A fault diagnosis model and four fault intensity estimators are presented to diagnose and to estimate, respectively, the typical thermal faults of data center air conditioning systems. The performance of the proposed models is validated with experimental data under various operation conditions. In addition, the performance of the proposed fault diagnosis model is compared with that of two purely data-driven models.

2. System description and faults investigated
2.1. The air conditioning system studied
The layout of a typical room in the data center is shown in Fig. 1. The computer room air conditioning (CRAC) units supply cooled air to the cabinets through the elevated floor and provide the temperature environment required for reliable operation of the servers. This paper studies the data center air conditioning system comprising the indoor CRAC unit and its outdoor unit.

Fig. 1. Layout of a typical room in the data center building.

Fig. 2 presents the studied air-cooled data center air conditioning system, which mainly consists of an evaporator, a condenser, an inverter compressor, an electronic expansion valve (EEV), and indoor and outdoor fans. Various temperature sensors, pressure transmitters, mass flow meters and power meters are available. The condenser in the outdoor unit rejects heat, and the evaporator in the CRAC unit supplies cooled air to the cabinets to keep the hottest regions of the cloud servers below critical temperature limits. The variable-frequency compressor adjusts its speed as the load changes. The indoor and outdoor fans operate at constant speed, with rated volume flow rates of 500 m3 h−1 and 1800 m3 h−1, respectively.

Fig. 2. The studied air-cooled air conditioning system in the data center.

2.2. The faults investigated
Compressor valve leakage (CVL), condenser fouling (CF), evaporator airflow reduction (EA) and liquid line restriction (LL) are four typical faults of the studied system. These four thermal faults do not shut the system down; instead, they cause its operating performance to deviate. They usually develop gradually and are therefore difficult to identify at an early stage. To simulate the four thermal faults, experiments are carried out at various fault intensities. The fault implementation methods, fault intensity definitions and ranges are shown in Table 1.

Table 1
The faults investigated and fault intensity range.
Fault | Fault implementation method | Fault intensity definition | Fault intensity range (%)
CVL | Partially opening bypass valve | % reduction in refrigerant flow rate from fault-free value | 0.2–9.8
CF | Blocking the condenser surface | % of coil area blocked | 10–30
EA | Slowing the indoor fan frequency using installed speed controller | % reduction in air flow rate from fault-free value | 3.4–28.9
LL | Increasing resistance through regulating liquid line valve | % increase in pressure drop of liquid line from fault-free value | 4.1–38.1

Fig. 3. The learning framework of the random forest classifier and regressor.

The CVL fault refers to the leakage of refrigerant from the high-pressure chamber to the low-pressure chamber inside the compressor, which may be caused by component wear or improper lubrication. The bypass valve shown in Fig. 2 regulates the refrigerant mass flow from the high-pressure side to the low-pressure side to simulate the valve leakage. The fault intensity is defined as the ratio of the change in refrigerant mass flow rate to the fault-free refrigerant mass flow rate.
Condenser fouling reduces the air flow rate across the condenser. The CF fault is simulated by blocking the external surface of the condenser coils, and its fault intensity is defined as the ratio of the blocked area to the total area.
The EA fault refers to the reduction of air flow rate across the evaporator, which may be caused by a fan fault or a dirty air duct. It is introduced by decreasing the indoor fan speed to reduce the volume flow rate in the wind tunnel. The fault intensity is defined as the ratio of the reduction in air flow rate to the fault-free value.
The LL fault refers to a liquid pipe blockage caused by impurities that may enter the refrigerant system during improper initial installation or charging. The liquid line valve shown in
Fig. 2 is partially closed to increase the pressure drop of the liquid line and so simulate the LL fault. The fault intensity is defined as the ratio of the increase in liquid line pressure drop to the fault-free value.

3. Methodology of fault diagnosis and estimation
3.1. Decoupling features
Model-based decoupling features are developed for the four faults studied. Each feature is affected by a single fault only and deviates from its fault-free value when the corresponding fault occurs. The four decoupling features developed in this section are later integrated with machine learning techniques.
3.1.1. Compressor valve leakage
The residual between the measured and the estimated normal compressor discharge temperatures can be used to break the coupling between compressor valve leakage and the other faults [36]. The normal discharge temperature Tdis,normal is estimated as follows.
$$h_{suc} = h(T_{suc}, P_{suc}) \tag{1}$$
$$h_{dis,normal} = h_{suc} + \frac{W_{com}(1 - \alpha_{los})}{m_{ref}} \tag{2}$$
$$T_{dis,normal} = T(P_{dis}, h_{dis,normal}) \tag{3}$$
where hdis,normal is the estimated normal discharge enthalpy, and the compressor power consumption Wcom and refrigerant mass flow rate mref are estimated using the empirical compressor maps. In this paper, the heat loss coefficient αlos is approximated from the temperature difference between the compressor shell Tshl (the mean of the suction and discharge temperatures) and the outdoor ambient temperature Tod,db.
$$T_{shl} = \frac{T_{suc} + T_{dis}}{2} \tag{4}$$
$$\alpha_{los} = a_0 + a_1 (T_{shl} - T_{od,db}) + a_2 (T_{shl} - T_{od,db})^2 \tag{5}$$
The coefficients a0, a1 and a2 can be determined by regression on the fault-free data. The decoupling feature of the CVL fault is expressed as Eq. (6).
$$\Delta T_{dis,res} = T_{dis} - T_{dis,normal} \tag{6}$$
3.1.2. Condenser fouling
The reduction of condenser air flow rate caused by fouling is an independent feature [37], and it is used as the decoupling feature of the CF fault, expressed by Eq. (8). The condenser air flow rate is estimated by Eq. (7).
$$V_{ca} = \frac{\nu_{ca}\, m_{ref} (h_{cd,in} - h_{cd,out})}{c_{ca} (T_{ca,out} - T_{ca,in})} \tag{7}$$
$$\Delta V_{ca,res} = V_{ca} - V_{ca,normal} \tag{8}$$
where Vca is the estimated condenser air flow rate, νca and cca are the specific volume and specific heat of the condenser air, and Tca,in and Tca,out are the air temperatures at the condenser inlet and outlet, respectively. The enthalpies hcd,in and hcd,out are determined from the temperature and pressure measurements at the condenser inlet and outlet. The normal condenser air flow rate Vca,normal is 1800 m3 h−1 in the studied system.
3.1.3. Evaporator airflow reduction
The residual between the estimated and normal evaporator air flow rates can be used as the decoupling feature of the EA fault. The evaporator air flow rate can be estimated by Eq. (9), and the decoupling feature of the EA fault is expressed as Eq. (10).
$$V_{ea} = \frac{\nu_{ea}\, m_{ref} (h_{eva,out} - h_{eva,in})}{h_{ea,in} - h_{ea,out}} \tag{9}$$
$$\Delta V_{ea,res} = V_{ea} - V_{ea,normal} \tag{10}$$
where νea is the specific volume of the evaporator-side air, and heva,in and heva,out are the refrigerant enthalpies at the evaporator inlet and outlet, respectively. The air enthalpies hea,in and hea,out are determined from the dry-bulb and wet-bulb temperature measurements at the evaporator inlet and outlet. The normal evaporator air flow rate Vea,normal is 500 m3 h−1 in the system investigated.
3.1.4. Liquid line restriction
The pressure drop between the condenser outlet and the expansion valve inlet can represent the LL fault. The residual between the measured and normal pressure drops is used as the decoupling feature of the LL fault, expressed as Eq. (12).
$$\Delta P_{ll} = P_{cd,out} - P_{eev,in} \tag{11}$$
$$\Delta^2 P_{ll,res} = \Delta P_{ll} - \Delta P_{ll,normal} \tag{12}$$
where Pcd,out is the pressure at the condenser outlet and Peev,in is the pressure at the expansion valve inlet. The normal liquid line pressure drop ΔPll,normal is estimated using the ten-coefficient empirical model of Eq. (13), whose coefficients can be determined by regression on the fault-free data.
$$\Delta P_{ll,normal} = b_0 + b_1 T_{id,db} + b_2 T_{od,db} + b_3 f_{com} + b_4 T_{id,db}^2 + b_5 T_{od,db}^2 + b_6 f_{com}^2 + b_7 T_{id,db} T_{od,db} + b_8 T_{id,db} f_{com} + b_9 T_{od,db} f_{com} \tag{13}$$
3.2. Random forest framework
Random forest (RF) is an ensemble learner that integrates a large number of classification and regression trees (CART) [38]. It is a powerful nonparametric data mining algorithm that handles both regression and multi-class classification problems, depending on whether the base learner is a regression tree (RT) or a classification tree (CT). RF has better learning capacity than some conventional machine learning techniques, such as support vector machines, radial basis functions and logistic regression [39]. First, because RF uses a bagging learning framework, it is effective at overcoming over-fitting. Second, it is robust to the training dataset: the input features can be non-linear, discrete or continuous. Third, bootstrap sampling (random sampling with replacement) is used for model training, which improves the generalization capacity. Finally, it computes quickly because the CARTs can be built in parallel. Consequently, RFs for classification and regression are used to learn the behavior of the four decoupling features in the different categories.
3.2.1. Random forest classifier
The learning framework of the RF classifier is depicted in Fig. 3. Consider a training set of N samples D = {(Xi, Yi)} (i = 1, ..., N), where Xi is a vector of training inputs and Yi is the corresponding class label (normal or one of four faulty modes). The RF classifier is constructed as follows.
(1) Bootstrap samples (random samples with replacement) are drawn from the original dataset D and used as the training data of a single classification tree.
(2) Rather than using all M features for training, a random subset of m features (m being the square root of M) is selected as the training input features.
(3) At each node of the single tree, dataset S is divided into two parts, S1 and S2. For a given feature A and split point s, the Gini impurity of the split of dataset S is calculated as follows.
$$\mathrm{Gain\_Gini}_{A,s}(S) = \frac{N_{S1}}{N_S}\,\mathrm{Gini}(S_1) + \frac{N_{S2}}{N_S}\,\mathrm{Gini}(S_2) \tag{14}$$
where NS is the number of samples in dataset S, and NS1 and NS2 are the numbers of samples in subsets S1 and S2, respectively. The Gini impurities of subsets S1 and S2 are calculated as follows.
$$\mathrm{Gini}(S_1) = 1 - \sum_{k=1}^{K} (p_{1k})^2 \tag{15}$$
$$\mathrm{Gini}(S_2) = 1 - \sum_{k=1}^{K} (p_{2k})^2 \tag{16}$$
where p1k and p2k are the frequencies of the kth-category samples in subsets S1 and S2, respectively, and K is the total number of categories. There are five classification modes (normal and four faulty modes) in this study, so K = 5. All possible features and split points are traversed, and the optimal feature A and split point s are determined as follows.
$$G_{cla}(S) = \arg\min_{A,s} \mathrm{Gain\_Gini}_{A,s}(S) \tag{17}$$
This process is repeated until the single tree is fully grown without pruning.
(4) Steps (1)–(3) are repeated until Ntree such CTs are fully constructed.
3.2.2. Random forest regressor
The learning framework of the RF regressor is also illustrated in Fig. 3. Consider a training set of N samples D = {(Xi, Yi)} (i = 1, ..., N), where Xi is a vector of training inputs and Yi is the corresponding fault intensity. Steps (1) and (2) of the RF regressor are the same as in the training of the RF classifier. Whereas the RF classifier outputs a discrete class label, the RF regressor outputs a continuous fault intensity. Consequently, in step (3), the splitting criterion at each node of the single tree is the minimum square error instead of the Gini impurity [38]. For a given feature A and split point s, the total square error of dataset S is calculated as follows.
$$\mathrm{Gain\_E}_{A,s}(S) = E(S_1) + E(S_2) \tag{18}$$

Fig. 4. The proposed three-stage fault diagnosis and estimation logic.
$$E(S_1) = \sum_{X_i \in S_1} (Y_i - \mu_1)^2 \tag{19}$$
$$E(S_2) = \sum_{X_i \in S_2} (Y_i - \mu_2)^2 \tag{20}$$
where Yi is the fault intensity of the ith sample, and μ1 and μ2 are the mean fault intensities of subsets S1 and S2, respectively. All possible features and split points are traversed, and the optimal feature A and split point s are those that minimize the total square error of dataset S.
$$G_{reg}(S) = \arg\min_{A,s} \mathrm{Gain\_E}_{A,s}(S) \tag{21}$$
This process is repeated at each node until the single RT is fully grown, and steps (1)–(3) are repeated until Ntree such RTs are fully constructed.

Table 2
Confusion matrix of a five-classification case (rows: happened class; columns: diagnosed class).
Happened class \ Diagnosed class | Class 1 | Class 2 | Class 3 | Class 4 | Class 5
Class 1 | C11 | C12 | C13 | C14 | C15
Class 2 | C21 | C22 | C23 | C24 | C25
Class 3 | C31 | C32 | C33 | C34 | C35
Class 4 | C41 | C42 | C43 | C44 | C45
Class 5 | C51 | C52 | C53 | C54 | C55

Table 3
Test conditions of the experimental system.
No. | Condition mode | Tod,db (°C) | Tod,wb (°C) | fcom (Hz)
1 | Low temperature | 23 | 15 | 30
2 | Low temperature | 23 | 15 | 40
3 | Low temperature | 23 | 15 | 50
4 | Medium temperature | 27 | 17 | 30
5 | Medium temperature | 27 | 17 | 50
6 | Medium temperature | 27 | 17 | 70
7 | High temperature | 31 | 20 | 50
8 | High temperature | 31 | 20 | 70
9 | High temperature | 31 | 20 | 90

Table 4
Sample number of the fault-free and faulty tests.
Test | NOF | CVL | CF | EA | LL | Total
Sample number | 12653 | 11007 | 10860 | 8679 | 7689 | 50888
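The node-splitting criteria of Sections 3.2.1 and 3.2.2 can be illustrated with a short Python sketch. It is written directly from Eqs. (14)–(21) for a single input feature and is not the authors' implementation. The function names are hypothetical. Using midpoints between consecutive distinct feature values as the candidate split points s is also an assumption. A full RF would additionally apply bootstrap sampling (step 1), the random feature subset of size √M (step 2), and recursive growth of every tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of one subset, Eqs. (15)-(16): 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Gain_Gini_{A,s}(S) of Eq. (14): subset Gini values weighted by size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def squared_error(values):
    """E(S) of Eqs. (19)-(20): squared deviations from the subset mean."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((y - mu) ** 2 for y in values)


def total_squared_error(left, right):
    """Gain_E_{A,s}(S) of Eq. (18): E(S1) + E(S2)."""
    return squared_error(left) + squared_error(right)


def best_split(xs, ys, criterion):
    """Return (s, score) minimising `criterion` over split points of one feature.

    Mirrors the arg-min of Eqs. (17) and (21) for a single feature A.
    Candidate split points are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(xs, ys))
    best_s, best_score = None, float("inf")
    for (x_lo, _), (x_hi, _) in zip(pairs, pairs[1:]):
        if x_lo == x_hi:
            continue  # no threshold can separate identical feature values
        s = (x_lo + x_hi) / 2
        left = [y for x, y in pairs if x <= s]
        right = [y for x, y in pairs if x > s]
        score = criterion(left, right)
        if score < best_score:
            best_s, best_score = s, score
    return best_s, best_score
```

As an example, take a few hypothetical NFLL values labelled NOF near zero and LL near 0.2–0.3. Calling best_split with weighted_gini on them returns a threshold between the two clusters, with a weighted Gini impurity of zero. Passing total_squared_error with fault intensities as targets applies the regression criterion of Eq. (21) instead.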
3.2.3. Out-of-bag error estimate
In particular, bootstrap samples (random samples with replacement) are drawn from the original dataset at the beginning of constructing each tree. Therefore, for each tree, roughly 1 − e−1 (about 63.2%) of the original data are sampled as training samples, and the remaining data are left out of the bootstrap sample [38]. These unselected samples constitute the out-of-bag (OOB) samples. For each input Xi of the training data D = {(Xi, Yi)} (i = 1, ..., N), the ensemble prediction t̂OOB(Xi) is calculated by aggregating only the trees for which Xi is an OOB sample. The error rate for classification (EROOB) and the mean square error for regression (MSEOOB) are given by Eqs. (22) and (23). During learning, EROOB and MSEOOB can be used to evaluate the performance of the trained RF classifier and regressor, respectively.
$$ER_{OOB} = \frac{\sum_{i=1}^{N} I\left(\hat{t}_{OOB}(X_i) \neq Y_i\right)}{N} \tag{22}$$
$$MSE_{OOB} = \frac{\sum_{i=1}^{N} \left\{\hat{t}_{OOB}(X_i) - Y_i\right\}^2}{N} \tag{23}$$
where Yi is the class label for classification and the fault intensity for regression, and I(·) is the indicator function.
3.3. Hybrid model based fault diagnosis and estimation approach
The fault diagnosis and estimation logic shown in Fig. 4 mainly consists of three stages: model training, fault diagnosis and fault estimation.
3.3.1. Stage one: model training
3.3.1.1. Steady state detection. All measurements, including steady-state and transient data, are collected into the history database. Because the decoupling features are derived from steady-state equations, the transient data should be filtered out by an effective steady-state detector. The moving window algorithm [40] is used to judge whether the studied system operates at steady state. For each fault scenario, training samples within the moving time window are collected only once the studied system reaches the new steady state. The transient data at the beginning of the fault process are not used as training samples.
3.3.1.2. Feature extraction. Feature extraction derives high-level features from the original high-dimensional measurements. The high-level features preserve as much of the class-discriminatory information as possible in a low-dimensional space, so using them as inputs enhances the classification ability of the RF. The four decoupling features are first obtained from the steady-state measurements and then transformed into the dimensionless normalized features of Eqs. (24)–(27), which serve as the high-level features in this study.
$$NF_{CVL} = \frac{\Delta T_{dis,res}}{T_{dis,normal}} \tag{24}$$
$$NF_{CF} = \frac{\Delta V_{ca,res}}{V_{ca,normal}} \tag{25}$$
$$NF_{EA} = \frac{\Delta V_{ea,res}}{V_{ea,normal}} \tag{26}$$
$$NF_{LL} = \frac{\Delta^2 P_{ll,res}}{\Delta P_{ll,normal}} \tag{27}$$
3.3.1.3. Parameter tuning. The fault diagnosis task is to distinguish normal operation (NOF) from four faults (CVL, CF, EA and LL), which is formulated as a five-class pattern recognition problem. The RF classifier used for fault diagnosis takes the four normalized features as input and outputs the class label. The RF regressors used for fault estimation take the four normalized features as input and output the corresponding fault intensity. EROOB and MSEOOB have been shown to be unbiased estimates of the test error and can be used as performance indexes for parameter tuning [41–43]. The tree number Ntree and the minimum number of observations per leaf Nleaf are two key parameters of the RF, and the grid search method [44] is used to obtain their optimal values. The RF classifier with optimal parameters is used to recognize the fault type in stage two, and the four RF regressors with optimal parameters are used to estimate the fault intensity of the identified type in stage three.

Fig. 5. Clustering performance of the normalized features under three temperature condition modes.
Fig. 6. Parameter tuning of the RF algorithms based on the grid search method.
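Once the fault-free reference values are available, the decoupling features of Section 3.1 and their normalization in Eqs. (24)–(27) reduce to a few lines of arithmetic. The Python sketch below makes two assumptions. First, Tdis,normal has already been obtained from Eqs. (1)–(5); those steps need a refrigerant property library for h(T, P) and T(P, h), which is not shown. Second, the coefficients b0–b9 of Eq. (13) have already been fitted to fault-free data. The names SteadyStateSample, normalized_features, condenser_air_flow, evaporator_air_flow and dp_ll_normal are hypothetical. So are the unit conventions in the comments: air flow in m3 h−1, enthalpy in kJ kg−1 and specific heat in kJ kg−1 K−1.

```python
from dataclasses import dataclass
from typing import Sequence

V_CA_NORMAL = 1800.0  # m3/h, normal condenser air flow rate (Section 3.1.2)
V_EA_NORMAL = 500.0   # m3/h, normal evaporator air flow rate (Section 3.1.3)


def condenser_air_flow(nu_ca, m_ref, h_cd_in, h_cd_out, c_ca, t_ca_in, t_ca_out):
    """Eq. (7), converted from m3/s to m3/h.

    Assumed units: nu_ca m3/kg, m_ref kg/s, enthalpies kJ/kg,
    c_ca kJ/(kg K), air temperatures in °C.
    """
    return 3600.0 * nu_ca * m_ref * (h_cd_in - h_cd_out) / (c_ca * (t_ca_out - t_ca_in))


def evaporator_air_flow(nu_ea, m_ref, h_eva_in, h_eva_out, h_ea_in, h_ea_out):
    """Eq. (9), converted from m3/s to m3/h (same unit assumptions as above)."""
    return 3600.0 * nu_ea * m_ref * (h_eva_out - h_eva_in) / (h_ea_in - h_ea_out)


def dp_ll_normal(b: Sequence[float], t_id_db: float, t_od_db: float, f_com: float) -> float:
    """Fault-free liquid-line pressure drop, Eq. (13), with coefficients b0..b9."""
    terms = (1.0, t_id_db, t_od_db, f_com,
             t_id_db ** 2, t_od_db ** 2, f_com ** 2,
             t_id_db * t_od_db, t_id_db * f_com, t_od_db * f_com)
    return sum(bi * ti for bi, ti in zip(b, terms))


@dataclass
class SteadyStateSample:
    """One steady-state sample (hypothetical container)."""
    t_dis: float         # measured discharge temperature
    t_dis_normal: float  # fault-free estimate from Eqs. (1)-(5)
    v_ca: float          # condenser air flow from Eq. (7), m3/h
    v_ea: float          # evaporator air flow from Eq. (9), m3/h
    dp_ll: float         # measured liquid-line pressure drop, Eq. (11)
    dp_ll_normal: float  # fault-free pressure drop from Eq. (13)


def normalized_features(s: SteadyStateSample):
    """Return (NF_CVL, NF_CF, NF_EA, NF_LL) following Eqs. (24)-(27)."""
    nf_cvl = (s.t_dis - s.t_dis_normal) / s.t_dis_normal  # Eq. (6) / T_dis,normal
    nf_cf = (s.v_ca - V_CA_NORMAL) / V_CA_NORMAL           # Eq. (8) / V_ca,normal
    nf_ea = (s.v_ea - V_EA_NORMAL) / V_EA_NORMAL           # Eq. (10) / V_ea,normal
    nf_ll = (s.dp_ll - s.dp_ll_normal) / s.dp_ll_normal    # Eq. (12) / dP_ll,normal
    return nf_cvl, nf_cf, nf_ea, nf_ll
```

For a sample that matches all fault-free references, the four features are zero. In a constructed example where only the estimated condenser air flow drops to 1440 m3 h−1, NFCF becomes −0.2 while the other three features stay at zero. This is the single-fault sensitivity that the decoupling features are designed to provide.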
3.3.2. Stage two: fault diagnosis
The hybrid-model-based fault diagnosis is implemented by integrating the decoupling features with the trained RF classifier. First, the test data, which include transient information, are filtered by the moving window algorithm [40], and the remaining steady-state data serve as the effective test data. The decoupling features are then determined from the steady-state test samples and transformed into the normalized features, which are used as the test inputs of the trained RF classifier with optimal parameters. Each CT gives a predicted class label for the test sample of normalized features, and the predictions of the Ntree CTs are aggregated by majority voting. Consequently, as shown in Eq. (28), the class Y (normal or one of the four faulty modes) with the most votes is the recognized category for the test input sample x. Once the system is diagnosed with a type of fault, it may be at operation risk.
$$C(x) = \arg\max_{Y} \sum_{j=1}^{N_{tree}} I\left(CT_j(x) = Y\right) \tag{28}$$
where CTj is the jth classification tree, I(·) is the indicator function, and C(x) is the recognized class of the test input sample x.
3.3.3. Stage three: fault estimation
The hybrid-model-based fault intensity estimator integrates the decoupling features with the trained RF regressor. When the data center air conditioning system is recognized as having a type of fault, the RF regressor for the diagnosed fault is used to estimate the fault intensity. The normalized features are used as the test inputs of the trained RF regressor with optimal parameters. Each RT provides an estimated fault intensity for the test input sample, and the estimated fault intensity in Eq. (29) is the average of the estimates of all Ntree RTs. The estimated fault intensity represents the magnitude of the operation risk.
$$FI(x) = \frac{\sum_{j=1}^{N_{tree}} RT_j(x)}{N_{tree}} \tag{29}$$
where RTj is the jth regression tree and FI(x) is the estimated fault intensity.
3.4. Evaluation index
The confusion matrix for five-class pattern recognition is shown in Table 2. Each entry Cij represents the number of samples that are diagnosed as class j when class i actually happens. The correct diagnosis rate (CR), Eq. (30), is the percentage of correctly diagnosed samples among the total samples of all classes; it is an overall performance index. The hit rate (HR), Eq. (31), is the percentage of correctly diagnosed samples among all samples in which a given kth class happened; it is an individual performance index, and HRk represents the correct diagnosis ability for the kth class. The false alarm rate (FAR), Eq. (32), is the percentage of samples falsely diagnosed as the kth class among all samples in which the kth class did not happen; FARk represents the false alarm rate for the kth class.
$$CR = \frac{\sum_{i=1}^{5} C_{ii}}{\sum_{i=1}^{5}\sum_{j=1}^{5} C_{ij}} \tag{30}$$
$$HR_k = \frac{C_{kk}}{\sum_{j=1}^{5} C_{kj}} \tag{31}$$
$$FAR_k = \frac{\sum_{i=1}^{5} C_{ik} - C_{kk}}{\sum_{i=1}^{5}\sum_{j=1}^{5} C_{ij} - \sum_{j=1}^{5} C_{kj}} \tag{32}$$
The mean absolute error (MAE) is used to evaluate the performance of the four fault intensity estimators. The MAE in Eq. (33) is the average of the absolute residuals between the predicted and actual intensities of N samples.
$$MAE = \frac{\sum_{i=1}^{N} \left| FI_{pre,i} - FI_{act,i} \right|}{N} \tag{33}$$

Table 5
Optimal parameters of the RF classifier and regressors.
Model | Ntree | Nleaf | Minimum of EROOB or MSEOOB
RF classifier | 33 | 3 | 4.6447 × 10−4
RF regressor for CVL | 48 | 5 | 0.0400
RF regressor for CF | 48 | 1 | 0.0123
RF regressor for EA | 39 | 3 | 0.2472
RF regressor for LL | 27 | 2 | 0.1879

4. Validation data and data preparation
4.1. Experimental data
Experimental data are used to validate the performance of the proposed fault diagnosis and estimation approach. Nine operation conditions covering the low, medium and high temperature modes are selected as the test conditions shown in Table 3. The experimental system is first tested under fault-free conditions, and the four faults are then tested at various intensities under the conditions listed in Table 3. Each test lasts at least 20 min, and the data acquisition system samples at an interval of 1 s. Each test contains all the temperature, pressure, flow rate and power measurements of the experimental data center air conditioning system. All of the collected original samples constitute the experimental database. Table 4 presents the sample numbers of the fault-free and faulty tests.
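The following Python sketch shows how the aggregation rules of Eqs. (28)–(29) and the evaluation indexes of Section 3.4 could be computed. The helper names are hypothetical. The confusion matrix follows the convention stated in Section 3.4: C[i][j] counts samples diagnosed as class j when class i actually happened. Under that convention, HR divides by the row sum of class k and the FAR numerator uses the column sum. All indexes are returned as fractions rather than percentages.

```python
from collections import Counter


def majority_vote(tree_labels):
    """Eq. (28): the class with the most votes (ties go to the label seen first)."""
    return Counter(tree_labels).most_common(1)[0][0]


def average_intensity(tree_estimates):
    """Eq. (29): mean of the N_tree regression-tree estimates."""
    return sum(tree_estimates) / len(tree_estimates)


def confusion_matrix(actual, diagnosed, classes):
    """C[i][j] = number of samples diagnosed as classes[j] when classes[i] happened."""
    index = {c: k for k, c in enumerate(classes)}
    C = [[0] * len(classes) for _ in classes]
    for a, d in zip(actual, diagnosed):
        C[index[a]][index[d]] += 1
    return C


def correct_rate(C):
    """Eq. (30): correctly diagnosed samples over all samples."""
    total = sum(map(sum, C))
    return sum(C[i][i] for i in range(len(C))) / total


def hit_rate(C, k):
    """Eq. (31): correct diagnoses among samples in which class k happened."""
    return C[k][k] / sum(C[k])


def false_alarm_rate(C, k):
    """Eq. (32): samples wrongly diagnosed as k among samples where k did not happen."""
    total = sum(map(sum, C))
    diagnosed_as_k = sum(row[k] for row in C)
    return (diagnosed_as_k - C[k][k]) / (total - sum(C[k]))


def mean_absolute_error(predicted, actual):
    """Eq. (33): mean absolute residual between predicted and actual intensities."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

For instance, with two classes, actual labels NOF, NOF, NOF, CVL, CVL and diagnoses NOF, NOF, CVL, CVL, CVL, the matrix is [[2, 1], [0, 2]]. This gives a CR of 0.8, an HR for NOF of 2/3 and a FAR for CVL of 1/3.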
Fig. 7. Diagnosis results of the proposed hybrid model based fault diagnosis method. 9
Building and Environment 163 (2019) 106319
X. Zhu, et al.
Fig. 8. Diagnosis results of the pure CART.
Fig. 9. Diagnosis results of the pure RF.
Fig. 10. Comparison between the proposed hybrid model and two pure data mining methods based on the validation data.
Fig. 11. Comparison between the proposed hybrid model and two pure data mining methods based on the test data.
4.2. Data segmentation
To test the robustness of the presented models, only three operation conditions (No. 1, No. 5 and No. 9), covering the low, medium and high temperature modes, are used as the training conditions; the remaining six operation conditions are used as the extended conditions. Seventy percent of the data under the training conditions are then randomly selected as the training data, which constitute the historical database for model training shown in Fig. 4. The remaining data in the experimental database are used as the validation and test data, which constitute the validation & test database in Fig. 4. The validation data (the remaining 30% under the training conditions) and the test data (all data under the extended conditions) are used to evaluate the performance of the presented models for the training and extended conditions, respectively.
5. Validation and results
5.1. Clustering performance of the decoupling features
It is generally difficult to visualize four-dimensional normalized feature samples in three-dimensional space. Consequently, the three-dimensional normalized feature samples (NF_CF, NF_EA, NF_LL) obtained from the training data are taken as an example to show the clustering performance of the decoupling features. As illustrated in Fig. 5, the different categories of normalized feature samples are clearly distinguished. The fault-free samples are clustered near the origin of the coordinate system, because the normalized features are close to zero when the studied system operates normally. In contrast, the samples of the CF, EA and LL faults are mainly distributed along the coordinate axes of the corresponding normalized features, since each decoupling feature is uniquely affected by its individual fault. The good class separability of the normalized features enhances the learning performance of the integrated RF classifier. In addition, the distribution patterns of the normalized feature samples are similar under the three temperature modes, indicating that the clustering performance is little affected by condition changes.
5.2. Parameter tuning of the random forest
The tree number Ntree and the minimum number of observations at a leaf node Nleaf are two important parameters of the RF. The two key parameters are tuned by grid search on the training data set. Within a given search region, the parameter set (Ntree, Nleaf) that minimizes ER_OOB (classification) or MSE_OOB (regression) is the optimal one. As illustrated in Fig. 6, ER_OOB and MSE_OOB decrease rapidly as Ntree increases, which indicates that both are more sensitive to Ntree than to Nleaf. The optimal parameter set of the RF classifier is (33, 3), with a minimum ER_OOB of 4.65 × 10−4. The RF regressor for the CVL fault is taken as an example of the tuning process for the fault intensity estimators: its optimal parameter set is (48, 5), with a minimum MSE_OOB of 0.04. The optimal parameter sets of the other fault diagnosis and estimation models are presented in Table 5.
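The data segmentation of Section 4.2 and the out-of-bag grid search of Section 5.2 can be sketched together as follows. This is a minimal sketch under stated assumptions, not the authors' code: scikit-learn's `RandomForestClassifier` is a swapped-in implementation whose `n_estimators` and `min_samples_leaf` play the roles of Ntree and Nleaf, the helper names `segment` and `grid_search_oob` are hypothetical, and the demo data are synthetic rather than drawn from the experimental database.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Section 4.2: conditions No. 1, No. 5 and No. 9 (low, medium and high
# temperature modes) are the training conditions; the other six are extended.
TRAINING_CONDITIONS = (1, 5, 9)


def segment(condition_ids, train_fraction=0.7, seed=0):
    """Return (train, validation, test) sample indices.

    A random 70% of the samples from the training conditions form the training
    set and the remaining 30% the validation set; every sample from an extended
    condition goes to the test set.
    """
    condition_ids = np.asarray(condition_ids)
    in_train_cond = np.isin(condition_ids, TRAINING_CONDITIONS)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.flatnonzero(in_train_cond))
    n_train = int(round(train_fraction * len(shuffled)))
    return (np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:]),
            np.flatnonzero(~in_train_cond))


def grid_search_oob(X, y, n_tree_grid, n_leaf_grid, seed=0):
    """Grid search over (Ntree, Nleaf) minimising the out-of-bag error ER_OOB."""
    best = None
    for n_tree, n_leaf in itertools.product(n_tree_grid, n_leaf_grid):
        model = RandomForestClassifier(n_estimators=n_tree, min_samples_leaf=n_leaf,
                                       oob_score=True, random_state=seed)
        model.fit(X, y)
        er_oob = 1.0 - model.oob_score_  # oob_score_ is the OOB accuracy
        if best is None or er_oob < best[2]:
            best = (n_tree, n_leaf, er_oob)
    return best  # (Ntree, Nleaf, minimum ER_OOB)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    conditions = np.repeat(np.arange(1, 10), 100)      # 9 conditions x 100 samples
    labels = rng.integers(0, 5, size=conditions.size)  # 0 = fault-free, 1-4 = faults
    # Synthetic 4-D "normalized features": fault-free samples sit near the
    # origin, and fault k shifts only feature k-1 (mimicking decoupling).
    offsets = np.vstack([np.zeros(4), 3.0 * np.eye(4)])
    X = rng.normal(size=(conditions.size, 4)) + offsets[labels]
    train, val, test = segment(conditions)
    print(len(train), len(val), len(test))             # 210 90 600
    print(grid_search_oob(X[train], labels[train], [20, 40], [1, 3]))
```

Splitting by operation condition before the random 70/30 split is what keeps the extended conditions entirely unseen during training, which is the property the test data are meant to probe. For the fault intensity estimators, the same loop could use `RandomForestRegressor(oob_score=True)` and score MSE_OOB as the mean squared difference between the targets and its `oob_prediction_` attribute.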
5.3. Fault diagnosis results
5.3.1. Fault diagnosis using the hybrid model based fault diagnosis method
Fig. 7 shows the diagnosis results of the proposed hybrid model based fault diagnosis method on the validation and test data. The proposed method performs well on the validation data under the training conditions, with an overall CR as high as 99.94%. When the hybrid model is evaluated on the test data under the extended conditions, a small percentage of the samples under normal and faulty modes are misdiagnosed as the LL fault. Even so, the overall CR is still 94.17%. In general, the hybrid model based fault diagnosis method performs well under both the training and extended conditions, which suggests that it may be robust to new operation conditions in practical applications.
5.3.2. Fault diagnosis using the pure data mining methods
Instead of integrating the decoupling features with the RF classifier, two pure data mining methods, CART [45] and RF, are used to recognize the faults. The fault diagnosis task is still treated as a five-class pattern recognition problem. The inputs of the two pure data-driven models are all the variables used to develop the four decoupling features, and the output is the corresponding class label. The pure CART is pruned to prevent overfitting. The optimal parameter set (Ntree, Nleaf) of the pure RF is (40, 7). As illustrated in Fig. 8, the pure CART performs well on the validation data, with an overall CR of 99.70%. However, its CR is only 37.20% when it is evaluated on the test data under the extended conditions. Similarly, as shown in Fig. 9, the overall CR of the pure RF is as high as 99.89% for the validation data but only 45.87% for the test data. Since the RF model is an ensemble learner that integrates a large number of CARTs, the pure RF performs better than the pure CART.
5.3.3. Comparison between the hybrid model and pure data mining methods
The diagnosis results of the proposed hybrid model based fault diagnosis method and the two pure data mining methods are first compared on the validation data. As shown in Fig. 10, all three methods have excellent classification capacity for the validation data under the training conditions: their HRs are all above 99%, and their FARs are less than 0.2%. Even for the worst-performing model, CART, only 0.17% of the faulty samples are missed (diagnosed as normal), and 0.1% of the fault-free and other faulty samples are misdiagnosed as the LL fault. On these data, the proposed hybrid diagnosis model therefore has only a slight performance advantage over the two pure data mining methods. However, the pure CART and RF perform poorly when evaluated on the test data under the extended conditions, as shown in Fig. 11, with significant degradation in both HR and FAR. Notably, the pure RF gives good diagnosis results for the CVL and EA faults but performs poorly for the other faults. This indicates that pure data-driven methods without physical meaning may provide unbalanced diagnosis results across fault categories. The proposed hybrid model based fault diagnosis method has good recognition capacity for every class and outperforms the pure CART and RF: on the test data, its HRs are above 90%, and its FARs are less than 5%.
5.4. Fault estimation results
5.4.1. Fault estimation of the four estimators based on the validation data
The performance of the four fault intensity estimators on the validation data under the training conditions is illustrated in Fig. 12. The maximum estimation errors are within ± 2%, and the overall MAEs are less than 0.5%. The four fault intensity estimators show similar prediction results under the low, medium and high temperature modes. Although the validation data are not used for training, the four fault intensity estimators still achieve high prediction accuracy.
Fig. 12. Fault estimation results of the four fault intensity estimators based on the validation data under the training conditions.
5.4.2. Fault estimation of the four estimators based on the test data
As illustrated in Fig. 13, the performance of the four fault intensity estimators is evaluated on the test data under the extended conditions. Compared with the fault estimation results in Fig. 12, the performance of the four estimators is degraded. The RF model is influenced by the untrained conditions, which increases the estimation errors for the extended samples. In addition, the decoupling features may be calculated with large errors under the untrained conditions, which further decreases the estimation accuracy when these features are used as the inputs of the RF. Even so, the four models still maintain good fault estimation performance for the extended conditions. Although the CVL fault intensity estimator shows large estimation errors in some cases, its overall MAE is still 1.22%. The LL fault intensity estimator has the largest MAE of the four models, 3.67%; since the actual LL fault intensity ranges from 4.1% to 38.1%, this is still good performance.
Fig. 13. Fault estimation results of the four fault intensity estimators based on the test data under the extended conditions.
6. Conclusions
To manage the operation risk of the studied air-cooled air conditioning system in the data center, a novel hybrid model based fault diagnosis and estimation approach is presented in this paper. Its diagnosis and estimation performance is validated with experimental data under various operation conditions, and the conclusions can be
drawn as follows.
1) The four decoupling features can serve as high-level features that achieve maximal class separation while retaining as much category-discriminatory information as possible in a low-dimensional space. The good clustering performance of the decoupling features strengthens the learning ability and enhances the robustness of the integrated RF algorithms.
2) The hybrid model based fault diagnosis method performs well under various conditions. The overall CRs for the training and extended conditions are 99.94% and 94.17%, respectively. Compared with the pure data-driven CART and RF methods, the proposed fault diagnosis method is more robust to the various operation conditions that may occur in practical applications.
3) The four hybrid model based fault intensity estimators have high prediction accuracy under the training conditions and also perform well under the extended conditions. For new operation conditions, the MAEs of the four fault estimation models are still less than 4%. Compared with fault diagnosis, the estimation errors are relatively high because intensity estimation is more difficult and is disturbed by untrained conditions.
The novel hybrid model based fault diagnosis and estimation approach, which integrates the decoupling features and random forests, is effective for the faults investigated. In future research, we intend to test and compare other machine learning methods.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 51876119) and the Shanghai Pujiang Program (No. 17PJD017).
References
[1] X. Zhao, W. Liu, D. Lai, Q. Chen, Optimal design of an indoor environment by the CFD-based adjoint method with area-constrained topology and cluster analysis, Build. Environ. 138 (2018) 171–180.
[2] W. Liu, T. Zhang, Y. Xue, Z.J. Zhai, J. Wang, Y. Wei, Q. Chen, State-of-the-art methods for inverse design of an enclosed environment, Build. Environ. 91 (2015) 91–100.
[3] J. Yin, S. Zhu, P. MacNaughton, J.G. Allen, J.D. Spengler, Physiological and cognitive performance of exposure to biophilic indoor environment, Build. Environ. 132 (2018) 255–262.
[4] W.X. Pan, S.M. Liu, S.S. Li, X.L. Cheng, H. Zhang, Z.W. Long, T.F. Zhang, Q.Y. Chen, A model for calculating single-sided natural ventilation rate in an urban residential apartment, Build. Environ. 147 (2019) 372–381.
[5] K.P. Lee, B.H. Wu, S.L. Peng, Deep-learning-based fault detection and diagnosis of air-handling units, Build. Environ. 157 (2019) 24–33.
[6] S.Y. Wu, J.Q. Sun, Cross-level fault detection and diagnosis of building HVAC systems, Build. Environ. 46 (8) (2011) 1558–1566.
[7] Y. Zhao, T.T. Li, C. Fan, J.C. Lu, X.J. Zhang, C.B. Zhang, S.Q. Chen, A proactive fault detection and diagnosis method for variable-air-volume terminals in building air conditioning systems, Energy Build. 183 (2019) 527–537.
[8] Z.K. Liu, Y.H. Liu, D.W. Zhang, B.P. Cai, C. Zheng, Fault diagnosis for a solar assisted heat pump system under incomplete data and expert knowledge, Energy 87 (2015) 41–48.
[9] E.K. Alexandersen, M.R. Skydt, S.S. Engelsgaard, M. Bang, M. Jradi, H.R. Shaker, A stair-step probabilistic approach for automatic anomaly detection in building ventilation system operation, Build. Environ. 157 (15) (2019) 165–171.
[10] H. Yang, S. Cho, C.S. Tae, M. Zaheeruddin, Sequential rule based algorithms for temperature sensor fault detection in air handling units, Energy Convers. Manag. 49 (2008) 2291–2306.
[11] F. Xiao, C.Y. Zheng, S.W. Wang, A fault detection and diagnosis strategy with enhanced sensitivity for centrifugal chillers, Appl. Therm. Eng. 31 (2011) 3963–3970.
[12] H.T. Wang, Y.M. Chen, C.W.H. Chan, J.Y. Qin, An online fault diagnosis tool of VAV terminals for building management and control systems, Autom. Constr. 22 (2012) 203–211.
[13] H.R. Li, J.E. Braun, Development, evaluation, and demonstration of a virtual refrigerant charge sensor, HVAC R Res. 15 (2009) 117–136.
[14] S. Li, J. Wen, A model-based fault detection and diagnostic methodology based on PCA method and wavelet transform, Energy Build. 68 (2014) 63–71.
[15] X.B. Yang, X.Q. Jin, Z.M. Du, Y.H. Zhu, A novel model-based fault detection method for temperature sensor using fractal correlation dimension, Build. Environ. 46 (4) (2011) 970–979.
[16] Z.M. Du, B. Fan, X.Q. Jin, J.L. Chi, Fault detection and diagnosis for buildings and HVAC systems using combined neural networks and subtractive clustering analysis, Build. Environ. 73 (2014) 1–11.
[17] A. Beghi, R. Brignoli, L. Cecchinato, G. Menegazzo, M. Rampazzo, F. Simmini, Data-driven fault detection and diagnosis for HVAC water chillers, Contr. Eng. Pract. 53 (2016) 79–91.
[18] M. Najafi, D.M. Auslander, P. Haves, M.D. Sohn, A statistical pattern analysis framework for rooftop unit diagnostics, HVAC R Res. 18 (2012) 406–416.
[19] S.C. Gao, M.C. Zhou, Y.R. Wang, J.J. Cheng, H. Yachi, J.H. Wang, Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction, IEEE T. Neur. Net. Lear. 30 (2) (2019) 601–614.
[20] S.W. Wang, J.T. Cui, Sensor-fault detection, diagnosis and estimation for centrifugal chiller systems using principal-component analysis method, Appl. Energy 82 (2005) 197–213.
[21] S.M. Namburu, M.S. Azam, J.H. Luo, K. Choi, K.R. Pattipati, Data-driven modeling, fault diagnosis and optimal sensor selection for HVAC chillers, IEEE Trans. Autom. Sci. Eng. 4 (2007) 469–473.
[22] Z.W. Li, C.J.J. Paredis, G. Augenbroe, G.S. Huang, A rule augmented statistical method for air-conditioning system fault detection and diagnostics, Energy Build. 54 (2012) 154–159.
[23] Z.M. Du, X.Q. Jin, L.Z. Wu, Fault detection and diagnosis based on improved PCA with JAA method in VAV systems, Build. Environ. 42 (2007) 3221–3232.
[24] D. Li, G.Q. Hu, C.J. Spanos, A data-driven strategy for detection and diagnosis of building chiller faults using linear discriminant analysis, Energy Build. 128 (2016) 519–529.
[25] H. Han, Z.K. Cao, B. Gu, N. Ren, PCA-SVM-based automated fault detection and diagnosis (AFDD) for vapor-compression refrigeration systems, HVAC R Res. 16 (2010) 295–313.
[26] W.Y. Zhang, H.G. Zhang, J.H. Liu, K. Li, D.S. Yang, H. Tian, Weather prediction with multiclass support vector machines in the fault detection of photovoltaic system, IEEE-CAA J. Automatic. 4 (3) (2017) 520–525.
[27] P.Y. Zhang, S. Shu, M.C. Zhou, An online fault detection model and strategies based on SVM-Grid in clouds, IEEE-CAA J. Automatic. 5 (2) (2018) 445–456.
[28] S.W. He, Z.W. Wang, Z.W. Wang, X.W. Gu, Z.F. Yan, Fault detection and diagnosis of chiller using Bayesian network classifier with probabilistic boundary, Appl. Therm. Eng. 107 (2016) 37–47.
[29] S.B. Shi, G.N. Li, H.X. Chen, J.Y. Liu, Y.P. Hu, L. Xing, W.J. Hu, Refrigerant charge fault diagnosis in the VRF system using Bayesian artificial neural network combined with ReliefF filter, Appl. Therm. Eng. 112 (2017) 698–706.
[30] Z.J. Huang, Z.S. Wang, H.G. Zhang, Multilevel feature moving average ratio method for fault diagnosis of the microgrid inverter switch, IEEE-CAA J. Automatic. 4 (2) (2017) 177–185.
[31] R. Yan, Z.J. Ma, Y. Zhao, G. Kokogiannakis, A decision tree based data-driven diagnostic strategy for air handling units, Energy Build. 133 (2016) 37–45.
[32] G.N. Li, Y.P. Hu, H.X. Chen, H.R. Li, M. Hu, Y.B. Guo, J.Y. Liu, S.B. Sun, M. Sun, Data partitioning and association mining for identifying VRF energy consumption patterns under various part loads and refrigerant charge conditions, Appl. Energy 185 (2017) 846–861.
[33] K. Yan, W. Shen, T. Mulumba, A. Afshari, ARX model based fault detection and diagnosis for chillers using support vector machines, Energy Build. 81 (2014) 287–295.
[34] N. Kocyigit, Fault and sensor error diagnostic strategies for a vapor compression refrigeration system by using fuzzy inference systems and artificial neural network, Int. J. Refrig. 50 (2015) 69–79.
[35] Y. Zhao, F. Xiao, S.W. Wang, An intelligent chiller fault detection and diagnosis methodology using Bayesian belief network, Energy Build. 57 (2013) 278–288.
[36] H.R. Li, J.E. Braun, Decoupling features and virtual sensors for diagnosis of faults in vapor compression air conditioners, Int. J. Refrig. 30 (2007) 546–564.
[37] X.Z. Zhao, M. Yang, H.R. Li, Field implementation and evaluation of a decoupling-based fault detection and diagnostic method for chillers, Energy Build. 72 (2014) 419–430.
[38] G. Biau, E. Scornet, A random forest guided tour, Test-Spain 25 (2016) 197–227.
[39] J. Maroco, D. Silva, A. Rodrigues, M. Guerreiro, I. Santana, A. de Mendonca, Data mining methods in the prediction of Dementia: a real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests, BMC Res. Notes 4 (2011) 299.
[40] M. Kim, S.H. Yoon, W.V. Payne, P.A. Domanski, Cooling Mode Fault Detection and Diagnosis Method for a Residential Heat Pump vol. 1087, NIST Special Publication, 2008.
[41] A.L. Boulesteix, S. Janitza, J. Kruppa, I.R. Konig, Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics, Wires Data Min. Knowl. 2 (2012) 493–507.
[42] C. Lindner, P.A. Bromiley, M.C. Ionita, T.F. Cootes, Robust and accurate shape model matching using random forest regression-voting, IEEE T. Pattern Anal. 37 (2015) 1862–1874.
[43] E. Scornet, G. Biau, J.P. Vert, Consistency of random forests, Ann. Stat. 43 (4) (2015) 1716–1741.
[44] L. Torgo, Data Mining with R: Learning with Case Studies, Chapman and Hall/CRC, 2011.
[45] W.Y. Loh, Classification and regression trees, Wires Data Min. Knowl. 1 (1) (2011) 14–23.