International Journal of Refrigeration 107 (2019) 63–72
Gradual fault early stage diagnosis for air source heat pump system using deep learning techniques
Zhe Sun a, Huaqiang Jin a, Jiangping Gu a, Yuejin Huang a, Xinlei Wang b, Xi Shen a,∗
a School of Mechanical Engineering, Zhejiang University of Technology, 288 Liuhe Road, Hangzhou 310023, China
b Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Pennsylvania Avenue, Urbana IL 61801, United States
Article info
Article history: Received 16 April 2019; Revised 10 June 2019; Accepted 22 July 2019; Available online 29 July 2019
Keywords: ASHP; Gradual fault; Early stage diagnosis; Deep learning; Intelligent modeling
Abstract
Because gradual faults in air source heat pump (ASHP) systems develop slowly and show no evident characteristics, existing methods are insufficient for detecting them at early stages, so many ASHPs run under minor gradual fault. Gradual fault, including minor gradual fault, decreases efficiency, increases energy consumption, reduces environmental thermal comfort, and increases carbon emissions. This paper proposes a novel gradual fault diagnosis approach with three main contributions. Firstly, a convolution-sequence (C-S) model is proposed for ASHP modeling. Secondly, a pre-processing approach for fault diagnosis is proposed, which gives the diagnosis method a more suitable dataset. Finally, a convolutional neural network with an optimized convolution kernel (one-dimensional convolution kernel) is used to diagnose the specific failure of the ASHP. The hyperparameters are selected through repeated trials, and different fault diagnosis models are compared in detail. The last part of the results and discussion examines how the accuracy of the C-S model affects the diagnosis effectiveness. The results indicate that the proposed method is a feasible and high-precision detection and diagnosis method for gradual fault in ASHP systems. © 2019 Elsevier Ltd and IIR. All rights reserved.
Diagnostic précoce des défaillances graduelles d’un système de pompe à chaleur aérothermique en utilisant des techniques d’apprentissage approfondi Mots-clés: Pompe à chaleur aérothermique (ASHP); Défaillance graduelle; Diagnostic précoce; Apprentissage approfondi; Modélisation intelligente
1. Introduction
Energy consumption of buildings has contributed a large proportion to the increasing total energy usage. It represents more than 40% of the total energy consumption in the U.S. and Europe (Hong et al., 2015; Zhao and Magoulès, 2012). With the rapid development of the Chinese economy, it is also evident that building energy consumption is increasing. In 2010, approximately 27.3% of the total energy was consumed by building systems, making China the second largest building energy consumer in the world after the U.S. In particular, heating, ventilation, and air-conditioning (HVAC)
∗ Corresponding author. E-mail address: [email protected] (X. Shen).
https://doi.org/10.1016/j.ijrefrig.2019.07.020 0140-7007/© 2019 Elsevier Ltd and IIR. All rights reserved.
systems account for almost half of building energy consumption (Cao et al., 2016), and energy saving has long been a key research focus in the HVAC field. Air source heat pumps (ASHP) are an important component of HVAC systems. Because the Chinese government has advocated a coal-to-electricity policy, the energy consumption of building heating will continue to increase. Therefore, ASHP energy saving is an important research subject. Running ASHP systems in a faulty state can cause inefficient operation, increased energy costs, user discomfort, shorter equipment life, and increased carbon emissions. Recognition and remediation of faulty conditions can reduce energy consumption and carbon emissions by 15–30% in ASHP systems (Kim and Katipamula, 2017; Lazarova-Molnar and Mohamed, 2017; Lazarova-Molnar et al., 2016). In recent years, China has continued to implement energy saving and emission reduction initiatives, green buildings, and building energy
Z. Sun, H. Jin and J. Gu et al. / International Journal of Refrigeration 107 (2019) 63–72
Nomenclature
Tdis        discharge temperature
Tsuc        suction temperature
Tc          condensing temperature
Te          evaporating temperature
Tin|c       temperature of condenser inlet
Tout|c      temperature of condenser outlet
Tin|e       temperature of evaporator inlet
Tout|e      temperature of evaporator outlet
pdis        discharge pressure
psuc        suction pressure
Toutdoor    outdoor temperature
Tindoor     indoor temperature
S           speed of compressor
υ           opening of expansion valve
Tren|c      temperature of condenser return air
Tsuy|c      temperature of condenser supply air
Tren|e      temperature of evaporator return air
Tsuy|e      temperature of evaporator supply air
E           electric energy
P           electric power
I           current
V           voltage
efficiency policies. Therefore, a fault diagnosis study of ASHP systems has great potential for increased energy saving and environmental protection. Common ASHP faults can be classified into two kinds: abrupt fault and gradual fault. The former occurs in a very short time, such as fan breakoff, compressor breakoff, etc., which will have a huge impact on the system. Therefore, it must be detected rapidly. Fortunately, abrupt fault has many obvious characteristics, which makes it easily detectable. The latter occurs slowly and accumulates continually, such as condenser fouling, refrigerant leakage, etc. This kind of fault occurs in a gradual change process, which makes it difficult to be detected at an early stage. Investigation shows that over 20% of HVAC systems are running under early gradual fault and present energy wastage over 15% (Yu et al., 2014). In industry, periodic maintenance is a normal way to eliminate the impact of gradual fault. Periodic maintenance is an insufficient and blind maintenance strategy that also causes resources to be wasted. Evaluating the degree of gradual fault accurately and using it to direct maintenance can surely increase efficiency and reduce maintenance costs. Thus, the aim of this paper is diagnosis of gradual fault.
1.1. Related works
The theoretical research and practical application of the automatic fault diagnosis technology over the past four decades have provided a new way to improve system reliability and safety. Compared with traditional fault diagnosis methods, automatic fault diagnosis methods have low cost, fast speed, and less dependence on knowledge, which greatly liberates manpower and improves efficiency. Automatic fault diagnosis began to appear in the early seventies. Early automatic fault diagnosis technology was used in many fields, such as aerospace and automotive engineering, nuclear energy, and national defense. More recently, it has also been used in the HVAC industry. Katipamula and Brambley (2005a, b) conducted a detailed review of fault detection and diagnosis studies of building systems. In the review, all of the methods are
divided into three categories: quantitative model-based, qualitative model-based, and data-driven methods. This classification provided a good overview of the earlier fault diagnosis methods. After 2005, new methods of fault diagnosis have also been organized according to this classification. Although data-driven methods began to develop rapidly after 2005, model-based approaches are still very common. For example, many researchers, such as Müller et al. (2013), still used model-based methods to diagnose faults of different components and systems (Kim et al., 2008; O’Neill et al., 2014; Schein et al., 2006; Sterling et al., 2014; Yang et al., 2011; Zhou et al., 2009). In the following years, new mathematical methods and new signal processing methods were introduced into the field of fault diagnosis. Du et al. (2007) used the principal component analysis (PCA) method for fault detection. This method transforms data into subspaces so that the data are orthogonal to each other and the coupling is reduced. Thus, new data can be sorted according to the contribution rate. PCA is capable not only of reducing the dimension but also of detecting abnormal data very well (Li et al., 2016b; Li and Wen, 2014). Kocyigit (2015) proposed a system fault diagnosis method using fuzzy inference systems. Widodo and Yang (2007) introduced a support vector machine into the fault classification field and obtained good results. Fan et al. (2010) trained a neural network model with fault-labeled data, and used the trained model and wavelet analysis to classify sensor faults. Cai et al. (2016) used Bayesian networks for real-time fault diagnosis of complex systems, which can be applied to heat pump systems. With the reduction of sensor prices and the development of the internet in recent years, more and more historical data has been stored in databases, which makes data mining and data-driven methods readily available.
A large number of data-based fault diagnosis methods emerged after 2006. Hou et al. (2006) used data mining methods to analyze historical data for fault diagnosis. Cai et al. (2017) proposed a Bayesian network based data-driven fault diagnosis methodology in three-phase inverters for PMSM drive systems, and the research has shown encouraging results. Namburu et al. (2007) combined principal component analysis, a support vector machine, a least-squares method, and a genetic algorithm to analyze historical data and diagnose faults, which provided a new means of fault detection and diagnosis. Li et al. (2016a) used historical data to train decision trees and classify faults. Cai et al. (2014) proposed a multi-source information fusion based fault diagnosis method for ground-source heat pumps using a Bayesian network. This method can increase the diagnostic accuracy for single faults and can correct wrong results for multiple simultaneous faults. In the field of data-driven research, the neural network has always been a mainstream method. Wang and Chen (2002) were the first to propose a fault classification method based on a neural network. Studies that optimize neural network training through different methods, such as rule-based methods, fuzzy optimization methods, and wavelet analysis, to achieve higher accuracy and faster convergence have appeared frequently (Du et al., 2009; Mavromatidis et al., 2013; Zhu et al., 2012). There are various diagnosis targets, including sensor-level, component-level, and system-level assessments. Given the rise of big data, intelligent fault diagnosis methods have also appeared in large numbers after 2014. Hu et al. (2018) proposed a Bayesian network model for refrigerant charge fault diagnosis of variable refrigerant flow air conditioning systems. Guo et al. (2018) used a deep belief network (DBN) model to diagnose faults of variable refrigerant flow air conditioning systems. Wang et al.
(2018) used a Bayesian network to select features for chiller fault diagnosis. Although many intelligent methods have shown a higher performance in fault diagnosis, gradual fault diagnosis at early stages
is lacking. Thus, the research emphasis of this study is to enhance the timely diagnosis of gradual fault at very early stages.
1.2. Deep learning
Deep learning is one kind of machine learning and a hotspot in artificial intelligence. Unlike other machine learning methods, such as the kernel method or Bayesian analysis, deep learning relies mainly on neural network based methods. Prior to 2006, artificial neural networks mainly referred to shallow neural networks based on connectionism and back-propagation algorithms, which were mainly represented by Back-Propagation neural networks. Shallow neural networks are trained with a back-propagation algorithm and gradient descent and can be fit to simple nonlinear systems. Due to their simple structure, they cannot accurately fit highly complex systems. One way to solve this problem is to add hidden layers and neurons, making the structure more complex. However, the gradient gradually decreases in the process of backward propagation. When the number of layers is large, the gradient eventually tends to 0, which is known as the ‘vanishing gradient’. The vanishing gradient makes it impossible to train the layers farthest from the output, which renders the added layers useless. In 2006, Hinton showed that a kind of neural network called a deep belief network could be efficiently trained by using a strategy called greedy layer-wise pre-training (Hinton et al., 2006). This strategy can address the ‘vanishing gradient’ problem. It therefore allows researchers to train deeper neural networks, with more hidden layers, that can fit more complex models. The proposal of this strategy has also made the term ‘deep learning’ widely known (Goodfellow et al., 2016). With the advent of faster CPUs, general-purpose GPUs, and the explosive growth of data, the advantages of deep learning have become increasingly prominent.
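The vanishing gradient effect described above can be illustrated with a rough bound. The sketch below is not taken from the paper: it assumes logistic sigmoid activations (whose derivative never exceeds 0.25) and ignores the weight matrices, so the function name and the 0.25 factor are illustrative assumptions only.

```python
def max_gradient_factor(n_layers: int, max_derivative: float = 0.25) -> float:
    """Upper bound on the product of activation derivatives over n_layers.

    0.25 is the maximum slope of the logistic sigmoid (an assumed activation);
    the contribution of the weights is deliberately ignored in this sketch.
    """
    return max_derivative ** n_layers


# The bound shrinks geometrically as layers are added.
for n in (1, 5, 10, 20):
    print(n, max_gradient_factor(n))
```

Under these assumptions the gradient signal reaching the first layer of a 10-layer network is already below one millionth of the signal at the output, which is why simply stacking layers did not help before pre-training strategies were introduced.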
For the deep learning approach, many advanced neural networks have been proposed, including convolutional neural networks (CNN) (O’Toole et al., 2018), recurrent neural networks (RNN) (De Mulder et al., 2015), and deep belief networks (DBN) (Rizk et al., 2018). All of these have achieved very good effects in many research areas, such as speech recognition, image processing, and machine translation. This paper merges CNN, encoder-decoder, and RNN, proposing a convolution-sequence model to fit ASHP and, then, uses CNN to achieve fault diagnosis, which is typically a kind of deep learning method.
Table 1 The details of three sub-datasets.
Kind of variables       Name of variables
Environment variable    Toutdoor, Tindoor
Control variable        S, υ
State variable          Tdis, Tsuc, Tc, Te, Tout|c, Tin|e, pdis, psuc
1.3. The contributions of this study
The novelty of this study mainly manifests in three aspects. Firstly, CNN, encoder-decoder, and RNN are merged, and a convolution-sequence (C-S) model is proposed. Secondly, based on the C-S model, a pre-processing approach for fault diagnosis is proposed, which results in a more suitable dataset for the diagnosis method. Finally, a convolutional neural network with an optimized convolution kernel (one-dimensional convolution kernel) is used to diagnose the specific failure of ASHP systems. All three novelties contribute to a real-time intelligent fault diagnosis method and greatly improve the diagnosis accuracy, which can be used to achieve early stage diagnosis of gradual fault.
1.4. Organization of this paper
The paper is organized as follows: Section 2 is dedicated to deriving the proposed fault diagnosis method; Section 3 outlines the structure of the experimental platform and the fault simulation method; Section 4 presents the performance of the proposed method; and some concluding remarks are given in Section 5.
2. Proposed method
2.1. Architecture
Fig. 1 shows the structure of the diagnostic strategy. Two models need to be trained: the C-S model and the CNN model.
2.1.1. C-S model training
The training data set was the operating data and the control data of the ASHP system, and this model was fit to a working and healthy ASHP operation. The output of this model was the theoretical operating data values of a healthy system at the next time point.
2.1.2. CNN model training
The training set for this model was the residual values of operating data, being the difference between the theoretical values from the C-S model and the measured values of the ASHP. The training set consisted of both normal data and faulty data. The output of this model was the fault category of the target system.
The trained models can be used in the diagnosis process. The diagnosis flow is as follows:
Step 1: Data are collected from the target system and divided into three sub-sets: the environmental variable set, the control variable set, and the state variable set. All the data are normalized before use.
Step 2: The three datasets are fed to the C-S model, and the output is the theoretical state data at the current time.
Step 3: The measured values are subtracted from the theoretical values to get the residual values.
Step 4: The residual data is fed to the CNN model to get the result of fault diagnosis.
The details of the three sub-datasets are shown in Table 1, and the two models are described in the following sections.
2.2. Convolution-sequence model
Because the changes are slight, early stage gradual fault is hard to diagnose. The key problem is how to accurately predict the theoretical values of the operation parameters, that is, the ideal state values of a healthy system. Due to strong coupling and pure delay, an ASHP is hard to model through a physical method. Data-driven methods, especially deep learning, can learn from operational data and are thus very well suited for heat pump modeling. A heat pump is a kind of pure delay system, so the current state parameters of the system are greatly affected by the previous moment. On the other hand, a system with early gradual fault shows only a slight change in these characteristics, so the difference between a normal system and the target system is not obvious in a short period of time. For this reason, a good solution is to extend the evaluation period and calculate the cumulative difference.
This paper proposes a convolution-sequence model (C-S model) using deep learning techniques, which can predict theoretical state variables based on control variables and environmental variables. The model structure is shown in Fig. 2. The model includes three main parts: a convolution layer, an encoder-decoder, and a recurrent neural network. The state variables in a period of time are fed to the convolution layer. Then, the output of the convolution layer is encoded as a vector. This vector contains both transient and hysteretic characteristics of the system. Using this vector as an initial state to calculate the system state values of the next period can greatly improve the prediction accuracy. The last part of the C-S model is a recurrent neural network (RNN). Long short-term memory (LSTM) is one kind of optimized RNN. The input parameters are control variables and environmental variables. The length of the RNN is equal to the number of predicted steps. The output of each step is the ideal system state values at that time point, which means that if the system is healthy, the output will be very similar to the measured state values. If the system is faulty, the difference between the predicted and measured values will grow continuously as the time steps increase.
Therefore, recording more time steps results in greater detection accuracy, provided that the C-S model is precise enough. However, too many time steps for the RNN is unrealistic, as it is limited by the model precision and time cost. The specific value of time steps is determined in Section 4.1.
In conclusion, the proposed model has two main advantages. Firstly, the convolution layer makes a more accurate initial state detection. Secondly, the RNN part amplifies the difference between the ideal system and the target system. Both advantages can improve the accuracy of fault detection.
Fig. 1. Structure of the diagnostic strategy based on deep learning.
Fig. 2. Structure of the C-S model.
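The residual step of the diagnosis flow and the idea of accumulating the difference over several time steps can be sketched as follows. This is a minimal illustration, not the authors' code: the min-max normalization scheme, the function names, and the toy numbers are all assumptions, and a constant list stands in for the C-S model output.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = time steps, columns = state variables


def min_max_normalize(rows: Sequence[Sequence[float]],
                      lo: Sequence[float],
                      hi: Sequence[float]) -> Matrix:
    """Step 1 (assumed scheme): scale every variable to [0, 1] with fixed bounds."""
    return [[(x - l) / (h - l) for x, l, h in zip(row, lo, hi)] for row in rows]


def residuals(theoretical: Sequence[Sequence[float]],
              measured: Sequence[Sequence[float]]) -> Matrix:
    """Step 3: theoretical (model) value minus measured value, element by element."""
    return [[t - m for t, m in zip(t_row, m_row)]
            for t_row, m_row in zip(theoretical, measured)]


def cumulative_abs_residual(res: Sequence[Sequence[float]]) -> List[float]:
    """Accumulate |residual| over the evaluation period for each state variable."""
    return [sum(abs(row[j]) for row in res) for j in range(len(res[0]))]


# Hypothetical normalized data: 4 output time steps, 2 state variables.
theoretical = [[0.50, 0.30]] * 4  # stand-in for the C-S model output (Step 2)
healthy = [[0.50, 0.30], [0.51, 0.30], [0.50, 0.29], [0.49, 0.30]]
drifting = [[0.51, 0.30], [0.52, 0.30], [0.53, 0.30], [0.54, 0.30]]

healthy_score = cumulative_abs_residual(residuals(theoretical, healthy))
drifting_score = cumulative_abs_residual(residuals(theoretical, drifting))
print(healthy_score, drifting_score)
```

In this toy case each single-step deviation of the drifting series is of the same order as the measurement noise of the healthy series, but the accumulated residual of the first variable is several times larger, which is the effect the longer evaluation period is meant to exploit.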
2.3. 1D-kernel-CNN based fault diagnosis
Because of the many interfering factors and dynamic operating conditions, diagnosing faults directly from operational data has limitations. This paper therefore pre-processes the operational data with the C-S model and diagnoses faults from the operation residual data, which is the output of the pre-processing step. This kind of processing yields very good results.
Unlike a fully connected neural network, a CNN has a parameter sharing mechanism. The convolution kernel holds the shared parameters and gives the CNN the ability to extract features of several adjacent parameters. This paper proposes that a CNN with a one-dimensional convolution kernel can improve the diagnosis accuracy significantly. As shown in Fig. 3, every table is an input matrix of the CNN, and the rows of the tables are the system state variables, such as Tdis and Tsuc. The columns of the tables are time steps, and the interval time is 3 min. The convolution kernel is (5, 1), which denotes a two-dimensional matrix, and the strides are (1, 1). Every convolution transforms 5 data points into 1 floating-point value, and this value represents the process state of five time points for a specific variable. Compared with a 2-D convolution kernel, a 1-D convolution kernel retains as much independent information of the different parameters as possible and, at the same time, extracts the trend characteristics within a time period. Thus, the performance of the 1D-kernel-CNN is much better than that of the normal 2D-kernel-CNN in this study. In layer selection, a pooling layer is generally added after the convolution layer for down sampling. For this method, however, down sampling loses much important information. In the experiments, a single convolution layer gave the best accuracy.
In conclusion, the CNN-based fault diagnosis has two advantages. Firstly, diagnosis through residual data has better immunity to interference than diagnosis with operation data directly. Secondly, a CNN has a better ability to extract time features than a fully connected neural network. So, this diagnosis method has a very satisfactory effect, which will be shown in Section 4. All kinds of residual data are used to train this model, including normal data, single fault data, and multiple fault data. The trained model outputs fault labels (0 for healthy system, 1 for condenser fouling, 2 for refrigerant leakage, etc.) from the residual data of the target system.
Fig. 3. Calculation steps of convolution.
3. System overview
3.1. Experiment platform structure
The experimental platform included three sub-systems: the refrigeration system, the electrical and electronic system, and the software system. The refrigeration system was a small-sized ASHP with one condenser and one evaporator, each of which was a finned-tube heat exchanger. A small reciprocating compressor with R134A was selected. For the expansion valve, a thermostatic expansion valve adjusted by an external nitrogen source was selected. The details of the heat pump system are shown in Table 2.
Table 2 Specifications of heat pump system.
Part                              Specifications
Compressor                        Constant-speed reciprocating compressor; Displacement (cm3): 12
Condenser side (Indoor unit)      Fin-tube heat exchanger; Row/step: 2/10; Width/length/height (mm): 250/120/240; Heat exchange area (m2): 3.4
Evaporator side (Outdoor unit)    Fin-tube heat exchanger; Row/step: 1/9; Width/length/height (mm): 250/120/240; Heat exchange area (m2): 2
Expansion device                  Thermostatic expansion valve (adjusted by external nitrogen source)
The electrical and electronic system mainly included data acquisition, DA output, and power control modules. The ADVANTECH data collection system was selected for temperature and pressure value acquisition. An electric energy meter and an electric power meter were used for power control and energy measurement. The software system was designed in LabVIEW 2014, which integrates measurement, control, and visualization in one system. Convenient operation of the human interface and strong support from the National Instruments company make this software system friendly to users. All of the measurement parameters were collected once every 30 s and then stored in a MySQL database. The schematic of the experimental platform is shown in Fig. 4. There are 6 temperature sensors (T1–T6) and 2 pressure sensors (P1 and P2). A picture of the platform is shown in Fig. 5.
Fig. 4. The schematic of experimental platform.
3.2. Measurement parameters
The measurement parameters mainly include five parts: temperature, pressure, electrical, control, and secondary parameters. The secondary parameters are those calculated from other measurement parameters. The details of all kinds of measurement parameters are shown in Table 3. The typical winter weather in Hangzhou, China was selected as the experimental condition, and the details are presented in Table 4. The working conditions at the beginning of system operation varied significantly, and thus, this part of the data could be harmful for model training. In order to ensure the quality of the training data, the first 2 h of collected data were discarded.
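To connect the acquisition settings above (one sample every 30 s, first 2 h discarded) with the CNN input of Section 2.3 (3 min time steps, a kernel spanning five time points of one variable), a minimal pure-Python sketch is given below. Several points are assumptions rather than facts from the paper: how the 3 min steps are derived from the 30 s samples is not stated, so taking every sixth sample is only one possibility; the kernel weights are illustrative, whereas a trained CNN learns them; and the helper names are hypothetical.

```python
from typing import List, Sequence

SAMPLE_PERIOD_S = 30        # acquisition period stated in Section 3.1
WARMUP_S = 2 * 60 * 60      # first 2 h discarded (Section 3.2)
STEP_S = 3 * 60             # time-step interval stated in Section 2.3


def prepare_series(samples: Sequence[float]) -> List[float]:
    """Drop the warm-up samples, then keep one sample per 3 min (assumed rule)."""
    skip = WARMUP_S // SAMPLE_PERIOD_S    # 240 samples
    stride = STEP_S // SAMPLE_PERIOD_S    # every 6th sample
    return list(samples[skip::stride])


def conv_along_time(matrix: Sequence[Sequence[float]],
                    kernel: Sequence[float]) -> List[List[float]]:
    """Stride-1 'valid' convolution (cross-correlation, as in CNN layers).

    Each inner list of `matrix` is the time series of one state variable, and
    the kernel slides over that series only, so variables are never mixed.
    """
    k = len(kernel)
    return [[sum(w * x for w, x in zip(kernel, row[t:t + k]))
             for t in range(len(row) - k + 1)]
            for row in matrix]


# 300 raw samples (150 min) leave 10 time steps after warm-up removal.
steps = prepare_series(list(range(300)))

# Illustrative slope-like kernel over five time points.
trend_kernel = [-2.0, -1.0, 0.0, 1.0, 2.0]
features = conv_along_time([[1.0] * 7, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]],
                           trend_kernel)
print(len(steps), features)
```

With this kernel the constant series maps to zeros and the steadily rising series maps to a constant positive value, which shows how a kernel that covers five time points of a single variable can pick up a trend while keeping the variables independent.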
In Table 3, it can be seen that the speed of the compressor is one of the measurement parameters. Although the compressor is a constant-speed type, the rotor speed still fluctuates under varying load, and ignoring the speed variation would reduce the accuracy of the model. Thus, a speed measurement method for constant-speed hermetic compressors, proposed in previous research, was also used in this paper (Zhe et al., 2018).
Table 3 Measurement parameters.
Part                     Parameter
Temperature parameter    Tdis, Tsuc, Tout|c, Tin|e, Toutdoor, Tindoor
Pressure parameter       pdis, psuc
Electrical parameter     E, P, I, V
Control parameter        S, υ
Secondary parameter      Tc, Te
Table 4 Experimental condition.
Variables                            Value
Refrigerant                          R134A
Indoor dry bulb temperature (°C)     23
Indoor wet bulb temperature (°C)     15
Outdoor dry bulb temperature (°C)    2
Outdoor wet bulb temperature (°C)    1
Fig. 5. Picture of experiment platform.
3.3. Fault simulation
Three kinds of faults were simulated on the platform: condenser fouling, evaporator fouling, and refrigerant leakage. The simulation methods are introduced as follows. For the fouling simulation, a slice of hardboard was used to block the cooling air. The feasibility of this simulation method is explained as follows. The main reason for finned-tube exchanger fouling is wet-particle deposition on the air side (Zhan et al., 2018), and this phenomenon causes a pressure drop in the cooling air. Using a hardboard to block the cooling air causes the same phenomenon as wet-particle deposition, and both decrease the heat transfer capacity. Thus, the fouling simulation method was reasonable. The area of the hardboard was approximately 10% of the exchanger sectional surface. In consideration of design redundancy, this type of fault simulation is atypical of early stage fouling.
Refrigerant leakage fault is easier to simulate. Extracting part of the refrigerant in the system can improve the measurement and make for a satisfactory fault simulation. Less than 5% was extracted to make sure the simulation is in accordance with early stage fault. All three single faults can exist at the same time, and multiple fault diagnosis was an important part of this study.
3.4. Algorithm operating environment
The deep learning algorithm was coded in Python 3.6 in the PyCharm 2017 development environment. Keras is a well-known deep learning framework, and version 2.2.4 was chosen for this study. Although a GPU environment can obviously improve training speed for large neural networks, it is not effective for small neural networks, such as the neural network in this paper. In previous attempts to train the same neural network on both CPU and GPU, the CPU was slightly faster than the GPU. It was determined that data reading and writing between internal storage and the GPU has a high time cost. For large scale networks, the acceleration by GPU is quite large, so the time cost for data reading and writing can be neglected. On the other hand, a CPU is more suitable for small scale networks, as the acceleration in network training is offset by the data reading and writing time cost. The deep learning algorithm was run on an Intel(R) Xeon(R) E5-1650 v3 CPU. The computer memory was 16 GB, and the operating system was Windows 7 x64.
4. Results and discussion
4.1. Accuracy of modeling approach
In Section 2.2, the framework of the C-S model was introduced. However, hyper-parameter selection for the C-S model also had a great influence on the modeling performance. In this section, the accuracy of the C-S model under the five changeable hyper-parameters is discussed, in order to choose the best group for practical application. Each of the networks was trained on 8000 groups of operational data and tested on a 2000 group data set. First of all, the accuracy of the C-S model with varying activation functions and optimizers was determined. The basic idea of activation functions and optimizers can be found in the book Deep Learning (Goodfellow et al., 2016). For the different combinations of activation function and optimizer, the validation loss, calculated as the mean squared error (MSE), is shown in Fig. 6.
Fig. 6. Performances of C-S model with different activation function and optimizer.
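For reference, the mean squared error used as the validation loss can be written in a few lines. The sketch below is a plain-Python illustration with hypothetical numbers; it averages over every time step and state variable, which is an assumption about how the multi-output loss is aggregated rather than a detail given in the paper.

```python
from typing import Sequence


def mse(y_true: Sequence[Sequence[float]],
        y_pred: Sequence[Sequence[float]]) -> float:
    """Mean squared error over every time step and state variable."""
    squared = [(t - p) ** 2
               for t_row, p_row in zip(y_true, y_pred)
               for t, p in zip(t_row, p_row)]
    return sum(squared) / len(squared)


# Hypothetical targets and predictions (2 samples, 2 variables).
y_true = [[0.0, 1.0], [2.0, 3.0]]
y_pred = [[0.0, 1.0], [2.0, 5.0]]
print(mse(y_true, y_pred))  # (0 + 0 + 0 + 4) / 4 = 1.0
```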
Generally, lower loss means better performance. However, the differences between the loss values were too large, so a logarithmic axis was used to present the loss data. Each value was multiplied by −1 to make sure every data point was positive. After these transformations, a higher value indicates better performance. From this figure, it can be seen that the best performance is achieved when the activation function is selu and the optimizer is Adam. Thus, this combination was used in the later work. The LSTM nodes are the number of neurons of the RNN component. Generally, more LSTM nodes result in better performance. However, limited by the training data set size, too many nodes will make the performance drop and cost too much training time. As such, the number of nodes needs to be studied. From Fig. 7a, it can be observed that when the number of nodes is larger than 200, there is little room left for reducing the loss, while the time cost can still grow. To balance the time cost and performance, 200 was chosen as the number of nodes. Batch size is the number of samples used in one training step. A large batch size will shorten training time but also increase the validation loss, which results in worse performance. In Fig. 7b, it can be found that when the batch size is between 100 and 250, the loss value is very similar, but the time cost is at the minimum when the batch size is equal to 100. Thus, a batch size of 100 gives the best trade-off. The number of training epochs is the number of passes over all of the training data. With the increase of training epochs, the validation loss will decrease first and then increase later. This is because too many training epochs lead to overfitting, and an overfitted neural network generalizes poorly. In Fig. 7c, it can be observed that when the epoch number is under 40, the loss is relatively large. As the epoch number increases, the loss decreases gradually and decreases sharply after 50 epochs.
Between 70 and 100, the loss value has little change, and it also decreases slowly after 150. However, the time cost of network training grows steadily and increases sharply after 150 epochs. Thus, 100 epochs were chosen in the later work. The convolutional layer shape is the key to detecting the initial state of the system. The shape is equal to the number of state parameters times the number of input time steps. The number of state parameters depends on the data collection system and, in this paper, was 8 (shown in Table 1). Thus, the main factor was the input time steps. This parameter needs to adapt to the system property and control strategy, which means that it should describe the system hysteretic characteristics. In Fig. 7d, it can be seen that with the increase of input time steps, the time cost rose slightly, but the increment can be neglected. The lowest loss was observed at a length of 5 time steps. Therefore, the optimal number of input time steps was 5. As discussed in Section 2.2, the length of the RNN, which is also called output time steps, is also a hyper-parameter of the C-S model. Although increasing the output time steps benefits fault diagnosis, it also causes more time cost and higher validation loss. Fig. 7e shows that with the increase of output time steps, the validation loss rose continuously, and the time cost presented the same tendency. Thus, too many time steps will decrease model precision. From the figure, it can be observed that when the output time steps was equal to 4, the validation loss presented a local minimum. Therefore, for comprehensive consideration, 4 was chosen as the value for output time steps. By applying these hyper-parameters to the C-S model, the final version of the model reached a validation loss value of 4.2 × 10^−5. The time cost, model size, and memory cost of the final version are also shown in Table 5.
It can be determined that with the calculation power of modern computers, the training memory cost and model size can be neglected. The evaluation time of the C-S model can also be neglected. The training time cost may be a little long,
Fig. 7. The performance of the C-S model with varying hyper-parameters. (a: activation=selu; optimizer=Adam; epochs=100; batch_size=100; input time steps=5; output time steps=4). (b: activation=selu; optimizer=Adam; epochs=100; LSTM_nodes=150; input time steps=5; output time steps=4). (c: activation=selu; optimizer=Adam; batch_size=100; LSTM_nodes=150; input time steps=5; output time steps=4). (d: activation=selu; optimizer=Adam; epochs=100; batch_size=100; LSTM_nodes=150; output time steps=4). (e: activation=selu; optimizer=Adam; epochs=100; batch_size=100; LSTM_nodes=150; input time steps=5).
Table 5 The running performance of the C-S model and fault diagnosis model.

                             C-S model    Fault diagnosis model
Training data                4000         6400
Testing data                 2000         3200
Training epochs              150          100
Batch size                   200          200
Model size (Mb)              39.1         22.9
Training memory cost (Mb)    63.5         36.7
Training time cost (s)       180.12       44.26
Testing time cost (s)        0.47         0.605
Fig. 9. The performance of the M-CNN model under different kernel length and stride. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. The validation loss of each CNN with different kernel shape.
whereas the neural network does not need training as frequently. Hence, the time cost is also acceptable.

4.2. The results and comparisons of fault diagnosis

In this section, the effect of the proposed method is discussed and compared with three similar methods. The hyper-parameters of each diagnosis model were chosen according to the method described in the previous section.

First, the advantages of the 1D kernel are discussed through a comparison between the 1D-kernel-CNN and the normal CNN. Because both the 1D-kernel-CNN and the normal CNN reach an accuracy rate of almost 100%, the validation loss was used for comparison in this part. Four CNN models with (3,1), (5,1), (3,3), and (5,5) kernels were tested, all with (1,1) strides. It is widely accepted that lower loss indicates better performance. From Fig. 8, it is noted that the validation loss of the 1D-kernel-CNN is generally lower than that of the normal CNN (2D-kernel-CNN). Therefore, the CNN with a 1D kernel is better for ASHP fault diagnosis.

As the key factor, the shape of the convolution kernel is very important. In Section 2.3, the (5,1) kernel was used as an example, and in this section, the optimal shape is discussed. The stride also influences performance. The results for the combinations tested are shown in Fig. 9. The color of each area represents the validation loss of the model. For better visualization, the logarithm of the loss has been taken
and converted to a positive number. In Fig. 9, the red area indicates better performance, and the blue area indicates worse performance. It is easy to observe that the best combination appears at the bottom right, where the kernel length is 7 and the stride is 1.

The proposed method diagnoses faults based on the C-S model: residual data are first generated from the C-S model, and the residual data are then diagnosed with the CNN model. This method is called M-CNN. In this section, two advantages of the M-CNN model are discussed. The first is the advantage of using the 1D-kernel-CNN (CNN for short in this section). It is compared with a model based on a fully connected neural network (ANN for short) to show the benefit of using CNN. The second is the advantage of using the C-S model to pre-process data; for this comparison, ANN and CNN models without the C-S model are used as baselines. The hyper-parameters of the four models are shown in Table 6. Each model was trained on 12,800 groups of data, which included 6400 groups of validation data, and was tested on 3200 groups. The performance for each kind of fault and the total performance are shown in Fig. 10. The category of each label is as follows: NO.1: health; NO.2: evaporator fouling; NO.3: condenser fouling; NO.4: refrigerant leakage; NO.5: condenser and evaporator fouling; NO.6: refrigerant leakage and evaporator fouling; NO.7: refrigerant leakage and condenser fouling; NO.8: three concurrent faults.

It can be clearly seen that the ANN model showed the worst performance for all kinds of faults. Compared with the CNN model, its total accuracy rate was lower by 8.8. Comparing M-ANN with M-CNN, the total accuracy rate was lower by 7.3. For each specific fault, CNN also showed better performance. Comparing ANN and CNN, both without C-S model pre-processing, with their M-ANN and M-CNN counterparts, it can be observed that, for both total and specific faults, the models with the C-S component performed better than the models without it.
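The two ingredients of M-CNN described above, residual generation and a one-dimensional (k,1) kernel, can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it assumes that residuals are measured minus predicted values, that rows index time steps and columns index state parameters, and that the kernel weights are fixed placeholders rather than learned. The names `residuals` and `conv_k_by_1` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # rows: time steps, columns: state parameters (assumed)


def residuals(measured: Grid, predicted: Grid) -> Grid:
    """Element-wise difference between measurements and model predictions."""
    return [
        [m - p for m, p in zip(m_row, p_row)]
        for m_row, p_row in zip(measured, predicted)
    ]


def conv_k_by_1(grid: Grid, kernel: List[float], stride: int = 1) -> Grid:
    """Slide a (k,1) kernel down each column independently (no padding)."""
    k = len(kernel)
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for top in range(0, n_rows - k + 1, stride):
        # Each output value mixes k consecutive rows of a single column only.
        out.append([
            sum(kernel[i] * grid[top + i][col] for i in range(k))
            for col in range(n_cols)
        ])
    return out


# Placeholder 10 x 10 inputs, matching the CNN input shape listed in Table 6.
measured = [[1.0] * 10 for _ in range(10)]
predicted = [[0.5] * 10 for _ in range(10)]
res = residuals(measured, predicted)
features = conv_k_by_1(res, kernel=[1.0] * 7, stride=1)
```

Because the kernel has width 1, values from different columns are never mixed inside the convolution, which is the practical difference from a (3,3) or (5,5) kernel. With a kernel length of 7 and a stride of 1, a 10 × 10 input yields a 4 × 10 feature map.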
To better demonstrate the diagnostic effect, the M-CNN model was validated with the validation dataset of 3200 groups. The mean error and the standard deviation error are shown in Table 7. From the table, it can be seen that the mean error of each temperature parameter was less than 0.1 °C, and the dispersion degree was also satisfactory. In conclusion, both the CNN and C-S
Table 6 The structure of each diagnosis model.

Model    Input shape    Dimensionality    Nodes of full connection layers    Batch size    Epochs    Activation function    Optimizer
ANN      1 × 10         1D                (200,30)                           100           80        selu                   Adam
M-ANN    1 × 10         1D                (200,30)                           100           80        selu                   Adam
CNN      10 × 10        2D                (100)                              100           80        selu                   Adam
M-CNN    10 × 10        2D                (100)                              100           80        selu                   Adam
Fig. 10. The performance of each diagnosis model.

Table 7 The mean value and standard deviation of predicted errors by M-CNN model.

Parameter        μ          σ
Tdis / °C        −0.015     0.241
Tsuc / °C        −0.096     0.181
Tin|e / °C       −0.029     0.076
Tout|e / °C      −0.082     0.181
Tin|c / °C       0.056      0.185
Tout|c / °C      0.002      0.075
Te / °C          −0.030     0.061
Tc / °C          0.009      0.079
pdis / MPa       0.0002     0.0027
psuc / MPa       −0.0003    0.0007
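The statistics reported in Table 7 are a mean μ and a standard deviation σ of the prediction error for each parameter. A minimal sketch of that computation follows, under stated assumptions: the error values are invented placeholders, not the paper's data, and the population standard deviation is used because the paper does not say which estimator was applied. The name `error_stats` is hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def error_stats(errors: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, population standard deviation) of the errors per parameter."""
    return {
        name: (statistics.fmean(values), statistics.pstdev(values))
        for name, values in errors.items()
    }


# Placeholder errors (predicted minus measured); not taken from the paper.
demo_errors = {
    "T_dis": [0.2, -0.2, 0.1, -0.1],
    "p_dis": [0.002, -0.002, 0.0, 0.0],
}
demo_stats = error_stats(demo_errors)
```

A mean close to zero indicates little systematic bias, while σ describes the dispersion that the text refers to; swapping `statistics.pstdev` for `statistics.stdev` would give the sample estimate instead.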
declined too. With the CNN accuracy rate of 98.88% as the reference, model-1 to model-3 were below this value. Therefore, to improve diagnosis performance, the loss of the C-S model must be kept below 0.01. This gives the C-S model an important role in gradual fault early stage diagnosis.

5. Conclusions
Fig. 11. The accuracy rates of different diagnosis models.
model improved the accuracy rate, which contributed to the very high accuracy rate of the proposed method. The running performance of the fault diagnosis model is shown in Table 5. Similar to the above analysis, the time cost and memory cost are also acceptable and fully meet the demand of real-time diagnosis.

4.3. The effect of the model accuracy

In Section 4.2, it was shown that the diagnosis method with C-S model pre-processing performed better. However, the influence of the C-S model accuracy on the diagnosis result has not yet been discussed. In this section, six C-S models with different accuracies were trained by adjusting the number of nodes, the activation function, and the optimizer. The validation loss of each model was as follows: model-1: 0.56, model-2: 0.35, model-3: 0.13, model-4: 0.01, model-5: 5.9 × 10⁻³, model-6: 3.5 × 10⁻⁴. Each of these diagnosis methods was compared with the CNN model without the C-S model. Fig. 11 shows the accuracy rate of each diagnosis model. It can be observed that, as the accuracy of the C-S model declined, the accuracy rate
A novel gradual fault early stage diagnosis model for ASHP systems was established by using deep learning. There were three innovations in the proposed method. Firstly, a C-S model was proposed. Secondly, based on the C-S model, a pre-processing approach for fault diagnosis was proposed, which gives the diagnosis method a more suitable dataset. Finally, a convolutional neural network with an optimized convolution kernel (one-dimensional convolution kernel) was used to diagnose the specific failure of the ASHP. The main conclusions were as follows:

(1) Using the 1D-kernel-CNN to diagnose faults was shown to give higher performance than ANN, and using a C-S model to pre-process operation data can improve accuracy rates. Compared with CNN, ANN, and M-ANN, the M-CNN model showed the best performance.

(2) The accuracy of the C-S model had a great influence on the fault diagnosis performance. Compared with the CNN model, only a C-S model with a loss lower than 0.01 could help improve the accuracy rate.

(3) It was found that the hyper-parameters also had an influence on the diagnosis performance. At present, there is no specific criterion for selecting the parameters of a neural network, and models are usually developed through many attempts. It was determined that the model showed the best performance when the activation function is selu and the optimizer is Adam. A recommended number of nodes, batch size, and number of epochs are also provided, demonstrating a satisfactory effect.

In summary, the method proposed in this paper is a feasible and practical diagnosis method for gradual fault in ASHP systems. It can be used to intelligently diagnose ASHP faults so that the system can be maintained in a timely manner. Energy wastage resulting from faulty
conditions in buildings can be minimized. Therefore, this method can reduce energy consumption, reduce maintenance costs, and increase equipment life, which gives it very broad application prospects.

Acknowledgments

The authors gratefully acknowledge the support of Zhejiang Provincial Natural Science Foundation of China (Grant Nos. LGG18E050024 and LGG19E050020).