Anomaly detection for wind turbines based on the reconstruction of condition parameters using stacked denoising autoencoders


Renewable Energy 147 (2020) 1469-1480

Contents lists available at ScienceDirect

Renewable Energy journal homepage: www.elsevier.com/locate/renene

Junsheng Chen a, Jian Li b,*, Weigen Chen b, Youyuan Wang b, Tianyan Jiang b

a College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
b State Key Laboratory of Power Transmission Equipment & System and New Technology, Chongqing University, Chongqing, 400044, China

Article info

Article history: Received 5 January 2019; received in revised form 10 September 2019; accepted 13 September 2019; available online 14 September 2019

Keywords: Wind turbine; SCADA data; Anomaly detection; Stacked denoising autoencoders; Moving window; Multiple noise levels

Abstract

This paper proposes an approach for detecting anomalies in a wind turbine (WT) based on multivariate analysis. First, a stacked denoising autoencoders (SDAE) model with a moving window and multiple noise levels is developed to reconstruct the normal operating data. With moving window processing, the correlations among multiple variables and the temporal dependency inherent in each variable can be captured simultaneously. By training with multiple noise levels, both the coarse-grained and fine-grained features of the input data can be learned. Then, a monitoring indicator is derived from the reconstruction error. The threshold of the monitoring indicator is determined by statistical analysis of its values during normal operation. To identify the parameter most relevant to a detected anomaly in the WT, the degree to which each parameter contributes to the exceedance of the threshold is calculated. Finally, the abnormal level is quantified according to the overlap between the test behavior distribution and the baseline condition to support the operation and maintenance planning of the WT. A demonstration on real SCADA data collected from a wind farm in Eastern China shows that the proposed method is effective for the anomaly detection and early warning of an actual WT. © 2019 Published by Elsevier Ltd.

* Corresponding author. School of Electrical Engineering, Chongqing University, No.174 Shazhengjie, Shapingba, Chongqing, 400044, China. E-mail addresses: [email protected] (J. Chen), [email protected] (J. Li), [email protected] (W. Chen), [email protected] (Y. Wang), jiangtianyan@cqu.edu.cn (T. Jiang). https://doi.org/10.1016/j.renene.2019.09.041 0960-1481/© 2019 Published by Elsevier Ltd.

1. Introduction

Wind energy has emerged as a promising alternative to burning fossil fuels [1]. With the rapid expansion of wind farms, operation and maintenance (O&M) issues have drawn extensive attention. The O&M costs of onshore wind turbines (WTs) account for approximately 10%-15% of the overall generation cost, while the proportion is about 20%-25% for offshore WTs [2,3]. High O&M costs reduce the cost effectiveness of WTs. To reduce O&M costs and unscheduled downtime and to ensure high WT availability, advanced condition monitoring and anomaly detection approaches are in high demand.

The existing approaches for condition monitoring and anomaly detection of WTs can be categorized as model-based approaches and data-driven approaches [4,5]. Model-based approaches rely on establishing accurate mathematical or physical models, and typically include observer-based estimators [6], Kalman filters [7,8] and parity equations [9]. For instance, Sanchez et al. [6] proposed a fault diagnosis approach for WTs that takes model parametric uncertainty and noise into account, based on interval observers and analytical redundancy relations. Additionally, several model-based fault reconstruction schemes and advanced fault-tolerant control frameworks were developed under component faults and system uncertainties [10-12]. A new Laplace ℓ1 Huber-based Kalman filter was developed in Ref. [13] for the attitude estimation of small satellites with model error and heavy-tailed noise. These contributions have greatly enriched and promoted the research on anomaly detection of WTs. However, accurate mathematical or physical models of WTs can seldom be built owing to the complex relationships and characteristics of all related components in a WT, so such model-based approaches are difficult to apply in practice.

Conversely, data-driven approaches do not require prior knowledge about the system and instead use measurement data to assess the condition of WTs. They are therefore more suitable for WT systems with complex coupling and highly nonlinear dynamics. Recently proposed data-driven approaches, including vibration analysis [14,15] and acoustic analysis [16], are difficult to use widely in actual wind farms because of limited data storage capacity and the high investment required to install additional sensors. By contrast, supervisory control and data acquisition (SCADA) systems have been widely implemented in actual wind farms to collect WT operating data. The SCADA system of a WT can provide a large amount of operating data [17], and no additional hardware investment is needed when developing a SCADA data-driven method for WT condition monitoring. Extracting the useful information contained in SCADA data is therefore a cost-effective way to monitor WT performance. The large volume of monitoring data collected by SCADA systems also advances the development and application of data-driven approaches. In practice, most SCADA data are collected during healthy operating conditions of WTs, while faulty data are usually scarce and sometimes unavailable. Proposing an effective condition monitoring and anomaly detection method based on normal data only is therefore one of the urgent problems at present.

Various data-driven approaches based on SCADA data, such as neural networks (NN) [18-20], support vector machine (SVM) [21], nonlinear state estimation technology (NSET) [22], adaptive network-based fuzzy inference system (ANFIS) [23] and cointegration analysis [24], have been used for condition monitoring of WTs. In Ref. [22], a generator temperature model was constructed using NSET to monitor the health of the WT generator; significant residuals between the model estimates and the measured generator temperature indicated an incipient generator failure. These methods mostly target a single component by modeling the corresponding variable, so each parameter in a WT needs to be analyzed one by one, and a high degree of automation is required to reduce the effort. Additionally, the nonlinear correlations among relevant SCADA data and the temporal dependency of each parameter were seldom taken into consideration.
The SCADA data of a WT are highly correlated because of the interaction and dependence between different components in a WT system [25]. The operating condition of a WT may be reflected in multiple relevant parameters: if a malfunction occurs in a certain component, multiple condition parameters may vary simultaneously, and the correlations among relevant SCADA data may also change. Thus, monitoring the correlations among multiple parameters is an effective and convenient way to detect anomalies in a WT. These correlations are difficult to obtain directly, but they are reflected in the multivariate reconstruction error. This paper therefore focuses on multivariate reconstruction modeling and on anomaly detection based on the reconstruction error.

Deep learning has attracted considerable attention in feature extraction for condition monitoring and anomaly detection owing to its capability of handling large-scale data and learning high-level representations [26]. Deep learning-based methods extract hierarchical representations from input data through deep architectures with multiple layers of nonlinear transformations, which suits WT systems with strong nonlinearity and correlation. Different deep learning networks, such as deep NN (DNN) [27], convolutional NN (CNN) [28,29], deep belief network (DBN) [30] and stacked denoising autoencoders (SDAE) [31,32], have been reported in condition monitoring and anomaly detection. For instance, a deep network architecture was proposed in Ref. [26] to monitor WT gearboxes and identify their imminent faults. However, the temporal dependence hidden in each parameter was ignored. In industrial process monitoring, a common way to account for temporal information is to incorporate previous data into the current observation vector [33,34]. We adopt this solution to incorporate the temporal dependency of each parameter.

The motivation of this manuscript lies in the detection of WT anomalies based on SCADA data. In this study, we propose an approach for anomaly detection of WTs based on multivariate reconstruction using SDAE. The reconstruction model is trained offline using normal operating data to learn the features of the multivariate data. Anomalies are then detected online with the trained reconstruction model by comparing monitoring indicators derived from the reconstruction error (RE). An SDAE using a moving window and multiple noise levels is built to reconstruct the status data of the WT. The monitoring indicator is defined as the Mahalanobis distance (MD) of the reconstruction errors. The threshold of the monitoring indicator is determined from the probability density function (PDF) of the MD, estimated with kernel density estimation (KDE). We use the duration of out-of-limit to detect the anomaly conditions of WTs. The parameters related to an anomaly are identified from the contribution of each variable to the out-of-limit. A new index named abnormal level (AL) is defined to quantify the abnormal level of the WT operating condition. The main contributions of this paper are as follows:

1) A multivariate reconstruction model is developed to detect the anomalies of WTs instead of a univariate prediction model. Additionally, moving window processing and a multiple-noise-level training method are used to improve the detection accuracy of the model.
2) The abnormal level of a WT is quantified based on the overlap between the distribution of a known normal behavior and that of the current testing behavior.

The remainder of the manuscript is structured as follows. The SCADA data that can be used for WT anomaly identification are discussed in Section 2. Section 3 presents the moving window SDAE model trained under multiple noise levels for representation learning. The online WT fault detection procedure based on the trained model is illustrated in Section 4. The proposed method is validated with actual SCADA data in Section 5.
The conclusions are drawn in Section 6.

2. Parameters description and classification

The SCADA system records comprehensive WT condition parameters that contain abundant information concerning the health of WTs. Fig. 1 shows the corresponding sensor positions in the SCADA system of a wind farm located in Eastern China. The typical 22 WT condition parameters measured and gathered by the SCADA system are given in Table 1. These condition parameters contain the operational status information of the critical components. They can be classified into two main types according to how strongly environmental factors influence each parameter. Type 1 parameters comprise the various component temperatures and the WT output power, which are strongly affected by environmental conditions. Type 2 parameters include yaw angle, pitch angle, gearbox lubrication pressure and hydraulic oil pressure, which have no obvious relationship with environmental conditions. Since bad data identification and anomaly detection for Type 2 condition parameters can easily be done by setting threshold values from experience, only Type 1 condition parameters are discussed in this study.

Fig. 1. The main components and sensor positions of the WT.

3. Moving window SDAE model with multiple noise levels for multivariable reconstruction

In this work, a normal behavior model is first built using normal operating data, and the testing data are then evaluated against it to detect anomalies in WTs. An SDAE model using a moving window and multiple noise levels, named moving window stacked multilevel denoising AE (MW-SMDAE), is presented in this section. This model aims to discover the hidden latent structure in the condition monitoring data during normal operation of the WT and to robustly reconstruct the original data. Moving window processing is adopted to extract the nonlinear correlations among the variables and the temporal dependencies of each variable simultaneously. Multiple noise levels are used in training so that the model learns both fine-grained and coarse-grained features of the data.
When the raw normal operating data contain a large amount of bad data (such as missing data, outliers or power-limited operating data), the correlations among the variables cannot be captured accurately. Therefore, recent high-quality operating data of the WT should be selected as the training data. If the required training data cannot be selected from the raw SCADA data, the quality of the raw data must be improved to ensure high detection accuracy.

Table 1
WT condition parameters studied in this paper.

No. | WT condition parameter | Unit | Type
1 | Temp. of main bearing a (rotor side) | °C | 1
2 | Temp. of main bearing b (gearbox side) | °C | 1
3 | Temp. of gearbox input shaft | °C | 1
4 | Temp. of gearbox output shaft | °C | 1
5 | Temp. of gearbox oil | °C | 1
6 | Temp. of gearbox cooling water | °C | 1
7 | Temp. of generator winding ph.1 (U) | °C | 1
8 | Temp. of generator winding ph.2 (V) | °C | 1
9 | Temp. of generator winding ph.3 (W) | °C | 1
10 | Temp. of generator bearing a (front) | °C | 1
11 | Temp. of generator bearing b (back) | °C | 1
12 | Temp. of generator cooling air | °C | 1
13 | Temp. of control cabinet | °C | 1
14 | Temp. of converter controller | °C | 1
15 | Temp. of blade motor | °C | 1
16 | Output power | kW | 1
17 | Wind speed | m/s | -
18 | Ambient temperature | °C | -
19 | Yaw angle | ° | 2
20 | Pitch angle | ° | 2
21 | Gearbox lubrication pressure | bar | 2
22 | Hydraulic oil pressure for yaw | bar | 2
23 | Hydraulic oil pressure for rotor brake | bar | 2

3.1. SDAE

A basic autoencoder (AE) is a feedforward NN with three layers. The AE, which is composed of an encoder and a decoder, reproduces its inputs as its outputs and is used to learn efficient data codings in an unsupervised manner. The training data contain a small amount of unnatural data. Hence, we introduce a modified version of the basic AE, the denoising autoencoder (DAE) [35], which reconstructs the raw inputs from corrupted ones. Corrupting the original inputs prevents the AE from simply learning an identity mapping between the inputs and the reconstructed outputs, and helps it capture more hidden information and acquire a robust representation from noisy data. Multiple DAEs are stacked to constitute a deep learning network called SDAE in order to capture the highly nonlinear and complex patterns in the data and improve the learned representation. The status data are reconstructed according to the correlations among the condition parameters captured during normal operation of the WT. The testing data are evaluated with the corresponding reconstruction errors obtained by the trained MW-SMDAE model.

Generally, an input example $x \in \mathbb{R}^m$ is first converted to a corrupted version $\tilde{x}$ with the function $\tilde{x} \sim q_D(\tilde{x} \mid x)$. Then a hidden representation $h \in \mathbb{R}^d$ is obtained by the encoding process,

$$h = s(W_1 \tilde{x} + b_1) \tag{1}$$

where $W_1 \in \mathbb{R}^{d \times m}$ is the weight matrix, $b_1 \in \mathbb{R}^{d}$ is the bias vector, and $s(\cdot)$ is a nonlinear activation function for the encoder, for instance the sigmoid function. The decoder attempts to map the hidden representation $h$ back to a reconstruction $z$:

$$z = d(W_2 h + b_2) \tag{2}$$

where $W_2 \in \mathbb{R}^{m \times d}$ is the weight matrix and $b_2 \in \mathbb{R}^{m}$ is the bias vector of the decoder, and $d(\cdot)$ is the activation function of the decoder. Note that the key difference between the DAE and the classical AE is that $z$ is the reconstruction of the corrupted $\tilde{x}$ rather than of the original input $x$, so the reconstruction $z$ does not simply reproduce the input $x$.

Training the DAE means finding the optimal parameters $W_1$, $W_2$, $b_1$ and $b_2$ that minimize the overall cost function. A weight decay term (also called a regularization term) is generally introduced to prevent overfitting. We define the overall cost function to be

$$J = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{2}\left\| z^{(k)} - x^{(k)} \right\|^2 + \frac{\lambda}{2}\sum_{l=1}^{2}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l}\left( W_{ji}^{(l)} \right)^2 \tag{3}$$

where the first term in the definition of $J$ is an average sum-of-squares error term (in this equation $m$ counts the training samples), and $x^{(k)}$ and $z^{(k)}$ are the kth original input and the reconstruction of the corresponding corrupted input, respectively. The second term is a weight decay term that tends to decrease the magnitude of the weights, and $\lambda$ is the weight decay coefficient. $W_{ji}^{(l)}$ denotes the weight associated with the connection between unit $j$ in layer $l$ and unit $i$ in layer $l+1$; $s_1$, $s_2$ and $s_3$ are the numbers of neurons in the input, hidden and output layers, respectively.

The minimization of $J$ is usually carried out by stochastic gradient descent (SGD) or by more advanced optimization techniques such as limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). L-BFGS is reported to perform better than SGD, being more stable, faster to train and easier to check for convergence [36]. Therefore, we employ the L-BFGS algorithm in this paper.

A shallow network has limited ability to address complex problems, whereas the SDAE provides a deep nonlinear mapping that can capture more hidden information and high-order features. In the encoder, the hidden representation of the (n-1)th DAE is used as the input of the nth DAE. An SDAE network can be established by decoding the nth DAE, (n-1)th DAE, ..., 1st DAE in order. The greedy layer-wise procedure is adopted for the unsupervised learning of each DAE. Hence, faulty data are not required as label information.

3.2. Moving window processing

As mentioned before, the condition parameters of a WT collected by the SCADA system are highly and nonlinearly correlated. In addition, the monitoring parameters, especially the temperature parameters, are time series with temporal dependence: the current value is significantly affected by the previous values. Ignoring the temporal information inherent in time series data leads to unreliable detection results. For this reason, we first process the multivariate time series data with a moving window. The previous information of the WT is incorporated into the current observation vector to generate an augmented condition parameter matrix, which is taken as the input of the SDAE model. Thus, the SDAE model can simultaneously capture the nonlinear correlations among multiple variables and the temporal dependencies of each variable. Fig. 2 shows the framework of the proposed SDAE model with moving window processing, named moving window SDAE (MW-SDAE).

Concretely, the steps of moving window processing are as follows. Let $X = \{x_i^{(j)}\}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$) be the collected n-variate normal operation status data of a WT, where $n$ denotes the number of condition parameters and $m$ is the number of data samples. The data of the ith condition parameter, $X_i$, can be denoted by

$$X_i = \left[ x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(m)} \right] \tag{4}$$

Assume that the width of the moving window is $b$ and that the moving step is one. Hence, $m$ data samples yield $(m - b + 1)$ moving windows. Let $S_i^{(l)}$ ($i = 1, 2, \ldots, n$) denote the data of parameter $i$ collected in the lth moving window,

$$S_i^{(l)} = \left[ x_i^{(l)}, x_i^{(l+1)}, \ldots, x_i^{(l+b-1)} \right]^T \tag{5}$$

The data of the lth moving window can be denoted by

$$S^{(l)} = \left[ S_1^{(l)}, S_2^{(l)}, \ldots, S_n^{(l)} \right]^T \tag{6}$$

As shown in Fig. 6, the augmented status data matrix (ASDM) $Y$ corresponding to the original SCADA data $X$ is obtained by moving window processing,

$$Y = \left[ S^{(1)}, S^{(2)}, \ldots, S^{(m-b+1)} \right] \tag{7}$$

The matrix $Y$ obtained by moving window processing contains the temporal information of each variable. Hence, the trained SDAE model can extract not only the nonlinear correlations among variables but also the temporal dependency of each variable. The learned relations are embedded in the optimal weight matrices and bias vectors of the model. In particular, if $b$ is set to one, $Y$ is simply the original SCADA data $X$.
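The moving window construction of Eqs. (4)-(7) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `build_asdm` and the list-of-lists layout (one row of X per condition parameter) are assumptions made here for clarity.

```python
from typing import List, Sequence


def build_asdm(X: Sequence[Sequence[float]], b: int) -> List[List[float]]:
    """Build the augmented status data matrix Y from X (hypothetical helper).

    X holds n rows (one per condition parameter) of m samples each.
    The result has n*b rows and (m - b + 1) columns; column l stacks the
    windows S_1^(l), ..., S_n^(l) of Eqs. (5)-(7).
    """
    n = len(X)
    m = len(X[0])
    if any(len(row) != m for row in X):
        raise ValueError("all parameters must have the same number of samples")
    if not 1 <= b <= m:
        raise ValueError("window width b must satisfy 1 <= b <= m")

    num_windows = m - b + 1
    columns = []
    for l in range(num_windows):          # window index (0-based here)
        col: List[float] = []
        for i in range(n):                # parameter index
            col.extend(X[i][l:l + b])     # b consecutive samples of parameter i
        columns.append(col)

    # Transpose so that rows are augmented features and columns are windows.
    return [[columns[l][r] for l in range(num_windows)] for r in range(n * b)]


X = [[1.0, 2.0, 3.0, 4.0],       # parameter 1, m = 4 samples
     [10.0, 20.0, 30.0, 40.0]]   # parameter 2
Y = build_asdm(X, b=2)           # 4 rows (n*b) and 3 columns (m-b+1)
```

With this layout, rows b(i-1)+1 to b·i of Y belong to the ith condition parameter, and calling `build_asdm(X, b=1)` returns the original data unchanged.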

3.3. Training with multiple noise levels

The noise level is a key tuning parameter of a DAE network, because the DAE learns different feature representations at different noise levels. The noise level of a DAE or SDAE is usually chosen by experience and kept fixed throughout training. Ref. [37] has shown that a network tends to learn general features when the inputs are corrupted with a high noise level during training, and detailed features when the noise level is low. The abnormal information contained in condition parameters is easily masked by their fluctuations. To obtain more of the information hidden in SCADA data, learn the general and detailed features simultaneously and improve the detection performance, we train the SDAE model with multiple noise levels instead of a single fixed one. In this manuscript, the AE trained with multiple noise levels is named multilevel denoising AE (MDAE).

The DAE model is first trained with a high initial noise level $c_0$ to learn the general features of the input data, and is then retrained with successively decreasing noise levels from $c_1$ to $c_T$ to learn the detailed features gradually. The initial noise level $c_0$ is chosen high enough to corrupt most of the input, so the model attempts to reconstruct the original data from only a few input neurons. Hence, it learns only general features of the data, like the big hills and valleys of the manifold. As training progresses and the noise is reduced, the autoencoder sees more input features, which helps it reconstruct the original input. Since more and more input features become available during each pass, the autoencoder incorporates the new knowledge into its existing knowledge about the data. This refines the curvature of the hills and valleys in the manifold and can be viewed as fine-tuning.
The procedure of training with multiple noise levels is as follows:

Fig. 2. The framework of the SDAE model using moving window.


Step 1. The ASDM $Y$ is converted to the corrupted version $Y_c$ with noise level $c_0$. The L-BFGS algorithm is used to obtain the optimal parameters (weight matrix and bias vector) and the hidden representation.
Step 2. The ASDM $Y$ is corrupted to $Y_c$ with noise level $c_t$, where $c_t = c_{t-1} - \Delta c$. The weight matrix and bias vector are initialized with the parameters obtained at noise level $c_{t-1}$, and the DAE network is fine-tuned at noise level $c_t$. The optimal parameters and hidden representation are updated.
Step 3. Step 2 is repeated until $c_t = c_T$. The final optimal parameters and hidden representation of the DAE are thus obtained with multiple noise levels.

A deep network named stacked multilevel denoising AE (SMDAE), formed by stacking multiple MDAEs, is established to obtain high-level representations from SCADA data. The SMDAE network is trained layer by layer, and the DAE at each stacked layer is trained using multiple noise levels. In this way, each AE can uncover more hidden information in the SCADA data. The training process of a two-layer SMDAE is shown in Fig. 3. First, the two DAEs are trained under the initial noise level $c_0$, and the hidden layer of the first DAE is treated as the input of the second DAE. Second, the first DAE is further trained with multiple noise levels from $c_1$ to $c_T$ in order, and the weight matrix and bias vector of the model are sequentially fine-tuned until the final noise level. Then, the second DAE is trained using the same multiple-noise-level training scheme. Eventually, the optimal model parameters are obtained, and the general and detailed feature representations of the nonlinear correlations among multiple variables and of the temporal dependencies of each variable are learned.

4. WT fault detection based on MW-SMDAE model

The normal SCADA data are used to train the MW-SMDAE model and thereby develop a normal behavior model of the WT. Even when the WT is operating normally, the collected raw SCADA data may contain bad data, such as missing data and noisy data, for various reasons including communication failures, lost data and standstill. The performance of data-driven approaches is highly sensitive to data quality, and analysis using poor-quality operating data may lead to biased monitoring results and incorrect decision-making. To improve the quality of the training data and ensure the performance of anomaly detection, data preprocessing is necessary before training to identify and correct the bad data in the normal SCADA data. Besides, the data used for model training should cover all possible normal operating regions of WTs.

Normal data will be reconstructed with low reconstruction errors, while abnormal data will yield high errors. Therefore, a monitoring indicator derived from the reconstruction error can be used to detect potential anomalies of WTs. To detect anomalies effectively, a univariate monitoring indicator obtained from the multivariate reconstruction errors is needed.

4.1. Offline analysis

Let $X_N$ be the preprocessed normal SCADA data. The corresponding ASDM $Y_N$ of data set $X_N$ is obtained via moving window processing. The MW-SMDAE model is used to reconstruct $Y_N$. The multivariate reconstruction errors $E_N$ of the normal operating data are defined as the difference between the inputs $Y_N$ and the reconstructed outputs $Z_N$:

$$E_N = Y_N - Z_N \tag{8}$$
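The decreasing noise schedule of Section 3.3 (Steps 1-3) and the reconstruction error of Eq. (8) can be sketched together as follows. This is an illustrative outline under stated assumptions, not the authors' code: masking noise (zeroing entries with probability equal to the noise level) is assumed because the text does not name the corruption type, and `fit` is a hypothetical stand-in for one warm-started L-BFGS training run of a DAE.

```python
import random
from typing import Any, Callable, List

Matrix = List[List[float]]


def noise_schedule(c0: float, c_T: float, dc: float) -> List[float]:
    """Return [c_0, c_1, ..., c_T] with c_t = c_{t-1} - dc."""
    if dc <= 0 or c_T > c0:
        raise ValueError("need dc > 0 and c_T <= c0")
    levels = [c0]
    while levels[-1] - dc >= c_T - 1e-12:
        levels.append(round(levels[-1] - dc, 10))   # avoid float drift
    return levels


def corrupt(Y: Matrix, level: float, rng: random.Random) -> Matrix:
    """Masking noise (assumed): zero each entry with probability `level`."""
    return [[0.0 if rng.random() < level else v for v in row] for row in Y]


def train_multilevel(Y: Matrix, levels: List[float],
                     fit: Callable[[Matrix, Matrix, Any], Any],
                     seed: int = 0) -> Any:
    """Warm-started training over the noise schedule.

    `fit(Y_corrupted, Y_clean, params)` is a hypothetical routine that
    trains one DAE starting from `params` (None means a fresh start) and
    returns the updated parameters.
    """
    rng = random.Random(seed)
    params = None
    for level in levels:                 # Step 1 for c_0, Step 2 for c_1..c_T
        params = fit(corrupt(Y, level, rng), Y, params)
    return params


def reconstruction_error(Y: Matrix, Z: Matrix) -> Matrix:
    """Element-wise E = Y - Z, as in Eq. (8)."""
    return [[y - z for y, z in zip(ry, rz)] for ry, rz in zip(Y, Z)]
```

Passing the previous parameters back into `fit` is what turns the lower noise levels into fine-tuning passes rather than independent training runs.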

Fig. 3. The flow chart of two-layer SDAE training with multiple noise levels.

To detect the anomalies in WTs conveniently and effectively, a univariate monitoring indicator extracted from the multivariate errors is essential. The MD is unitless and scale-invariant, and takes into account the correlations of the data [38-40]. Using the MD of the reconstruction errors transforms the multivariate data into univariate data, which significantly enhances the computational efficiency. Therefore, we select the MD of the reconstruction error as the monitoring indicator. The monitoring indicator $t_k$ of the kth data point in $Y_N$ can be calculated by:

$$t_k = \sqrt{ \left( E_N^{(k)} - u \right)^T S^{-1} \left( E_N^{(k)} - u \right) } \tag{9}$$

where $E_N^{(k)}$ denotes the reconstruction error of the kth sample, $u$ is the mean of the reconstruction errors of the normal samples, and $S^{-1}$ is the inverse covariance matrix of the normal samples. The monitoring indicator values of all normal data are obtained by Eq. (9) and denoted as $\{t_k\}$, $k = 1, 2, \ldots, m-b+1$. However, the distribution of the monitoring indicator is unknown. To determine the detection threshold, KDE is employed to analyze the probability density distribution of the monitoring indicator during normal operation.
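A small standard-library sketch may help make the Mahalanobis-distance indicator and the KDE-based threshold concrete. It is written under assumptions that are not taken from the paper: all function names are hypothetical, the covariance is the unbiased sample covariance, the kernel is the standard Gaussian, the bandwidth `h` is supplied by the caller (the AMISE-based selection of Ref. [41] is not reproduced), and the threshold is found by simple midpoint-rule integration from 0 up to a confidence level `alpha`.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_and_cov(E: Sequence[Vector]) -> Tuple[Vector, List[Vector]]:
    """Mean u and sample covariance S of error vectors (one per sample)."""
    k, p = len(E), len(E[0])
    u = [sum(e[j] for e in E) / k for j in range(p)]
    S = [[sum((e[a] - u[a]) * (e[c] - u[c]) for e in E) / (k - 1)
          for c in range(p)] for a in range(p)]
    return u, S


def solve(S: List[Vector], d: Vector) -> Vector:
    """Solve S x = d by Gaussian elimination with partial pivoting."""
    p = len(d)
    A = [row[:] + [d[i]] for i, row in enumerate(S)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p + 1):
                A[r][j] -= f * A[c][j]
    x = [0.0] * p
    for r in reversed(range(p)):
        x[r] = (A[r][p] - sum(A[r][j] * x[j] for j in range(r + 1, p))) / A[r][r]
    return x


def mahalanobis(e: Vector, u: Vector, S: List[Vector]) -> float:
    """t = sqrt((e - u)^T S^{-1} (e - u)), the monitoring indicator."""
    d = [ei - ui for ei, ui in zip(e, u)]
    x = solve(S, d)                       # x = S^{-1} d without forming S^{-1}
    return math.sqrt(max(0.0, sum(di * xi for di, xi in zip(d, x))))


def kde_pdf(t: float, samples: Sequence[float], h: float) -> float:
    """Gaussian-kernel density estimate at t with bandwidth h."""
    c = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((t - tk) / h) ** 2) for tk in samples)


def kde_threshold(samples: Sequence[float], h: float, alpha: float,
                  step: float = 1e-3) -> float:
    """Smallest grid point t_d whose integral of the KDE from 0 reaches alpha."""
    t, area = 0.0, 0.0
    upper = max(samples) + 10.0 * h       # density is negligible beyond this
    while t < upper:
        area += kde_pdf(t + 0.5 * step, samples, h) * step   # midpoint rule
        t += step
        if area >= alpha:
            return t
    return upper
```

In use, `mean_and_cov` would be applied once to the reconstruction errors of the normal data, `mahalanobis` to every error vector, and `kde_threshold` to the resulting indicator values. Solving the linear system instead of inverting the covariance matrix is a numerical convenience, not something the paper prescribes.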


Let the estimated PDF of MD obtained by normal status data is b f ðtÞ. The estimated PDF using KDE at point t is defined as:

ci ¼

 mbþ1  1 X t  tk b f ðtÞ ¼ K nh h

where, ci denotes the contribution degree of the ith condition parameter of WT to out-of-limit, a is the number of data points of

(10)

k¼1



j¼1

ðjÞ

ðjÞ

t M  t M;i



(15)

1 ¼ pffiffiffiffiffiffie h 2p

ðtt Þ  i2

ðjÞ

t M;i is the monitoring indicator value of the jth data point excluding the ith condition parameter. Accordingly, the data from (b,(i-1)þ ðjÞ

1)th row to the (b,i)th row in EM should be excluded when calcuðjÞ

2

2h

(11)

The detection threshold td can be obtained according to the PDFs of monitoring indicator at a given confidence level a by

ðt

a ¼ Pðt < td Þ ¼ bf ðtÞdt

which monitoring indicators exceed the threshold, t M is the monitoring indicator value of the jth data point in the testing YM,

lating t M;i .

! t  ti K h

a

ðjÞ

where, tk is monitoring indicator of the kth data point in ASDM YN, h is the bandwidth parameter, K(∙) is the kernel function. The optimal bandwidth is selected with the approximate mean integrated square error (AMISE) based method described in Ref. [41]. The Gaussian kernel is selected in this paper:



1Xa 

(12)

0

Although preprocessed normal SCADA data are selected as training data, few noisy data still exist in the training data, which will also cause monitoring indicators to exceed threshold. In fact, the over-limited because of the noisy data only last for a short time compared with that owing to the anomalies such as potential faults. Hence, the duration of out-of-limit is taken into account to evaluate the operation status of WTs. The threshold Qd of duration is also determined by the analysis results of normal data's MDs. It is obvious that one data point in ASDM YN obtained by moving window processing contains the information of consecutive b original SCADA data points XN. Thus, the minimum value of threshold Qd should be b. Therefore, anomalies in WTs can be predicated when the monitoring indicators and their durations both exceed their corresponding thresholds.

The process of the WTs anomalies detection is shown in Fig. 4. If the monitoring indicator value tM of testing data is less than threshold td, and the duration value QM of out-of-limit is less than the threshold Qd, the WT is operating normally. The testing data is identified as noisy data on condition that tM is greater than td, and QM is less than Qd. When tM exceeds the threshold td, and QM is greater than Qd, a potential fault may be occur in WT. The most related variables or components of WT related to the detected anomaly can be identified based on the base of the contribution degree of each WT monitoring variable. 4.3. Abnormal level quantification It is desirable to provide an abnormal level for detected anomaly in a WT to wind farm operators. To quantify the abnormal level of WT, the amount of overlap or the degree of similarity between the current behavior distribution and a known normal behavior

4.2. Online detection The testing data set XM is converted to ASDM YM by applying the moving window processing with the same width. The reconstruction outputs ZM are obtained with the inputs YM according to the trained MW-SMDAE model. The reconstruction errors of testing ASDM YM can be denoted as:

EM ¼ Y M  ZM

(13)

The monitoring indicator of the jth data point in the testing data YM is calculated by: ðjÞ tM

sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  ffi  T ðjÞ ðjÞ 1 EM  u ¼ EM  u S

(14)

ðjÞ

where, EM denotes the reconstruction errors of the jth data point in EM, u and S1 are the mean value and the inverse covariance matrix of normal sample reconstruction errors respectively. If tM exceeds the detection threshold td, meanwhile, the duration QM of out-of-limit is greater than threshold Qd, a potential fault of WT is detected. Then, for diagnosing the most relevant parameters or components of WT related to the detected fault, the contribution degree of each parameter to out-of-limit is calculated as:

Fig. 4. The procedure of WT anomaly detection.
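The monitoring indicator of Eq. (14) and the decision logic of Fig. 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`solve`, `monitoring_indicator`, `classify`) are hypothetical, the duration QM is assumed to be the number of consecutive out-of-limit points, and a duration exactly equal to Qd is treated as noise because the text leaves that boundary case open.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def monitoring_indicator(e, u, s):
    """Mahalanobis distance of one reconstruction error e, as in Eq. (14).

    u is the mean and s the covariance matrix of the reconstruction errors
    of normal samples. Solving s @ y = d avoids forming the inverse of s.
    """
    d = [ei - ui for ei, ui in zip(e, u)]
    y = solve(s, d)
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))


def classify(t_values, t_d, q_d):
    """Label each point as 'normal', 'noise' or 'fault'.

    Assumption: Q_M is the current run length of consecutive points with
    t_M > t_d, and a run of length <= q_d is still regarded as noise.
    """
    labels, run = [], 0
    for t in t_values:
        run = run + 1 if t > t_d else 0
        if run == 0:
            labels.append("normal")
        elif run <= q_d:
            labels.append("noise")
        else:
            labels.append("fault")
    return labels
```

With this reading, a short excursion above td is reported as noise, and only an excursion that lasts longer than Qd consecutive points is reported as a potential fault.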

J. Chen et al. / Renewable Energy 147 (2020) 1469–1480

distribution can be employed. In this manuscript, the amount of overlap is calculated with the correlation coefficient of the two distributions. The larger the overlap or degree of similarity between the two distributions, the more normal the operating condition of the WT, and AL is accordingly close to 0. Conversely, the smaller the overlap or degree of similarity, the more serious the abnormal level of the WT, and AL approaches 1. The abnormal level is calculated by the following equation:

$$AL = 1 - \frac{\int F(x) \cdot G(x)\,dx}{\sqrt{\int F^{2}(x)\,dx}\,\sqrt{\int G^{2}(x)\,dx}} \tag{16}$$

where F(x) represents the test (current) behavior distribution and G(x) is the baseline condition of the WT. Because F(x) and G(x) are nonnegative, the Cauchy-Schwarz inequality bounds the ratio in Eq. (16) between 0 and 1, so the abnormal level also lies between 0 and 1, and a greater AL represents a larger distance, or smaller overlap, between the two distributions. The probability density functions of the two distributions are needed to calculate their correlation coefficient. The Gaussian mixture model (GMM), a linear combination of multiple Gaussian distribution functions, is a widely used parametric model for approximating an arbitrary distribution even when there is no particular knowledge or assumption regarding the parametric form of the density. Theoretically, a GMM can approximate the probability density function of an arbitrary distribution. The monitoring indicator of a WT does not follow any known distribution. Therefore, a GMM can be used to decompose the non-Gaussian feature set of WTs into a combination of normal density functions. The PDF of the GMM can be described mathematically as:

$$G(x) = \sum_{i=1}^{n} k_i\, N(x \mid \mu_i, \sigma_i) \tag{17}$$

where $k_i$ is the weight of the ith component, satisfying $\sum_i k_i = 1$ and $0 \le k_i \le 1$, and $N(x \mid \mu_i, \sigma_i)$ denotes the ith Gaussian distribution component with mean $\mu_i$ and covariance matrix $\sigma_i$. The expectation maximization (EM) algorithm is the most common approach to estimating the parameters of a GMM [42]. The EM algorithm alternates between an expectation step and a maximization step. The expectation step computes the posterior probability, or the expectation, of the latent variables using the current estimate of the parameters. The maximization step then obtains the updated parameters by maximizing the log-likelihood found in the expectation step. These parameter estimates are then used to determine the distribution of the latent variables in the next expectation step. More information about the GMM can be found in Ref. [41]. In addition, the Bayesian Information Criterion (BIC) [43] is employed to determine the optimal number n of mixture components.


5. Validation on SCADA data

The SCADA data come from an actual onshore wind farm located in Eastern China. This wind farm consists of 33 WTs with doubly fed induction generators (DFIG) and a nominal power of 1.6 MW, marked WT 1 to WT 33. The cut-in, rated and cut-out wind speeds of the WTs are 2 m/s, 10 m/s and 20 m/s, respectively. The 10 min SCADA data are utilized to verify the proposed approach. WT 12 suffered a sudden malfunction on Sep 21, 2016 due to the overheating of generator bearing b. According to the maintenance records, this WT had been in normal operation for five months before this malfunction. The healthy SCADA data collected from May 1, 2016 to Sep 13, 2016 are selected as the training data to build the normal behavior model. Each sample contains the 1st to 18th variables in Table 1.

To build the normal behavior model, the normal SCADA data are first preprocessed with bad data identification and correction to improve the data quality. Each variable in the normal operating data is then linearly scaled to the range [0, 1]. The moving window processing is utilized to generate the ASDM of the normal operating data so as to incorporate the temporal information. The width b of the moving time window is set as 60 min, which means that 6 data points are contained in the moving window. For the 18 variables, the proposed MW-SMDAE model therefore has 108 input nodes and 108 output nodes. The SMDAE model is developed to reconstruct the ASDM of normal data. The number of hidden layers and the number of neurons in each hidden layer both have little influence on the detection result, as discussed later in this section. The number of hidden layers of the MW-SMDAE model is set as 2, and the numbers of neurons in the two hidden layers are set as 20 and 10. The maximum number of L-BFGS iterations is set as 500. Fig. 5 presents the monitoring indicator values of the 19584 training data of WT 12 obtained by the established MW-SMDAE model and the corresponding threshold td with confidence level 99.7%. The MW-SMDAE model consists of two DAEs. The initial and terminal noise levels c0 and cT are set as 0.45 and 0.05 respectively, and Δc is defined as 0.05. The threshold Qd of the out-of-limit duration is determined to be 6 according to the maximum out-of-limit duration in the training data.

Fig. 6 a) shows the histogram and the PDF of the monitoring indicator estimated with KDE. The estimated PDFs of the monitoring indicator with different numbers of hidden layers, numbers of units in the hidden layers and denoising levels are shown in Fig. 6 b), c) and d). The monitoring indicators derived by the MDAE model with moving window are generally less than those derived by MW-SMDAE. The more network layers there are, the more concentrated the probability density distribution of the normal-data monitoring indicators and the longer its tail, which is conducive to determining the monitoring indicator threshold. The reason may be that the deep nonlinear mapping ability and high-level representation learning ability of deep networks improve the reconstruction accuracy of the model and thus make the distribution of monitoring indicators more concentrated. Moreover, the distributions of the monitoring

Fig. 5. The monitoring indicator values of training data and corresponding threshold according to equation (12).


Fig. 6. The distributions of monitoring indicator obtained with different model parameters.

indicator obtained by the two-layer and three-layer MW-SMDAE models are similar. Therefore, the network with two layers is already able to capture the features of the input data. A deep network architecture can help capture more hidden information in SCADA data, but too many hidden layers are unnecessary in consideration of the calculation cost. As shown in Fig. 6 c), the PDFs obtained from the models with different numbers of hidden-layer nodes are very close, which demonstrates that the number of nodes in the hidden layers has little effect on the value of the monitoring indicator. The testing data are reconstructed with the trained MW-SMDAE model to detect anomalies of the WT. The test samples are rescaled according to the maximum and minimum values of the training data to ensure that both data sets lie in a similar range. Then the moving window processing with the same width is applied to the scaled testing data to obtain the corresponding ASDM. The monitoring indicator values during Sep 17 to 21, 2016 (5 days before the fault) are derived from the reconstruction of the test data with the trained MW-SMDAE model. Fig. 7 shows the 715 monitoring indicator values obtained by the MW-SMDAE model when the width is set as 6. It can

be seen that the monitoring indicator value exceeds the threshold td at the 577th data point of the ASDM (corresponding to the 572nd data point in the raw testing data), and then drops below the threshold in about four hours. However, the out-of-limit occurs again at the 655th data point and continues until the downtime of WT 12. The anomaly in the WT is detected nearly 1 day (143 data points) ahead of the overheating malfunction of generator bearing b. Fig. 8 and Fig. 9 present the monitoring results with the SMDAE model and the MW-SDAE model, respectively. The anomaly of the WT can also be detected with these two models. However, noisy data and abnormal data are hard to distinguish in the SMDAE model because both lead to the occurrence of over-limit. The MW-SDAE model tends to learn detailed features when the noise level is fixed as 0.05, so the general features of the input data are ignored and the detection results may sometimes be unreliable. Moreover, the anomalies are detected by these two models only close to the downtime, in which case there is not enough time left for the operator to take timely maintenance actions. Compared with the monitoring results of these models, the proposed MW-SMDAE model can provide earlier warning of an impending failure of the WT to

Fig. 7. The monitoring indicator values of testing data obtained by MW-SMDAE model with b setting as 6.

Fig. 8. The monitoring indicator values of testing data obtained by SMDAE model.


Fig. 9. The monitoring indicator values of testing data obtained by MW-SDAE model with fixed noise level 0.05.

avoid further deterioration or catastrophic failure and enable timely maintenance planning. A total of 29 fault cases that occurred in the wind farm from 2016 to 2017 and 30 normal cases are selected to further investigate the influence of several parameters on the detection accuracy, including the width b of the moving window, the initial noise level c0, the final noise level cT and the interval Δc. A single detection result refers to the detection result for the 59 groups of testing samples, which include 29 faulty data samples and 30 normal data samples. The accuracy of a single detection is defined as:

$$\mathrm{Accuracy} = \frac{tp + tn}{59} \tag{18}$$

where tp is the number of correctly detected faulty instances and tn is the number of correctly detected normal instances. The average detection accuracy of ten repeated detection results is utilized in the following analysis. The detection accuracy with different widths of the moving window is given in Table 2. In this table, the initial noise level c0, the final noise level cT and the interval Δc are set as 0.5, 0.05 and 0.05, respectively. It can be seen that the detection accuracy improves clearly as the width increases. The larger the width, the more temporal dependence of each parameter is contained in the moving window, which further validates the necessity of incorporating the temporal dependence of each parameter. However, a larger width also produces a higher computational cost. Therefore, the width of the moving window is set as 6 in this case. Table 3 gives the detection accuracy under different combinations of c0, cT and Δc. The interval Δc is equal to 0.05 when the final noise level cT is 0.05. Otherwise, Δc is 0.1. Training the proposed MW-SMDAE model with multiple noise levels effectively improves the detection accuracy. The accuracy is greater than 96% when c0 is greater than 0.4 and cT is less than 0.2. When the input data are corrupted by an excessive noise level, the information hidden in the input data may be lost. In practice, the initial noise level can be set between 0.4 and 0.5. The smaller the interval Δc, the more input features are obtained by the SDAE in the multiple noise levels training. The proposed anomaly detection method with moving window


processing and multiple noise levels training enables the SDAE to learn the manifold in more neighborhoods, thus improving the detection accuracy.

As shown in Fig. 10, the contribution degree of each parameter to the out-of-limit of the monitoring indicator is calculated according to equation (15) using the monitoring indicators presented in Fig. 7. The generator bearing b temperature scores highest, which is consistent with the actual overheating fault. The generator bearing a temperature and the winding temperatures also have large contribution degrees due to the interaction among parameters of the same component. Therefore, the contribution degree index proposed in this paper is effective in identifying the parameters or components of the WT most relevant to the detected fault.

A prediction model of the generator bearing b temperature using back propagation neural networks (BPNN) is developed for comparison with the proposed model. The prediction model is trained with the most recent normal SCADA data of WT 12. The lower bound and upper bound of the 99.7% confidence interval are 1.8753 °C and 1.9132 °C. Fig. 11 shows the prediction error during Sep 17 to Sep 21, 2016. The prediction error of the generator bearing b temperature increases clearly on Sep 21, 2016. Although the prediction error of the generator bearing b temperature can also be used to identify the abnormal condition of WT 12, there are frequent misdiagnoses on Sep 20, 2016. Fig. 12 shows that the wind speed of WT 12 increases rapidly from 14:00 to 14:30 on Sep 20, 2016. The highest fluctuation value reaches 12 m/s. Fig. 13 presents the statistical distribution of wind speed variations within 30 min during the period to which the training samples of the prediction model of the generator bearing b temperature belong. It can be seen that the wind speed variations within 30 min are mostly distributed between -5 m/s and 5 m/s. The wind speed variations within 30 min of WT 12 and WT 7 from Sep 2015 to Sep 2016 are shown in Fig. 14. It can be seen that only a few wind speed variations are greater than 10 m/s. Therefore, when the training samples lack rapidly fluctuating wind speed data, the corresponding prediction method can hardly provide accurate predictions. In this case, the detection results for the operational condition of WTs are unreliable.

The 29 fault cases and 30 normal cases are also adopted to further validate the presented method. Table 4 compares the detection results of each method. The total accuracy of the proposed anomaly detection method is 98.14%, which demonstrates the effectiveness of the anomaly identification method. It can be seen that the univariable prediction-based method needs the least training time. However, in practice a prediction model has to be developed for each parameter of a WT one by one, so the total time consumed increases greatly. Consequently, the proposed method can detect the anomalies in WTs accurately and quickly.

Fig. 15 shows the calculated ALs of two different periods. The AL value is 0.3768 during Sep 17 to 19, 2016, and increases to 0.7550 during Sep 19 to 21, 2016. Obviously, the correlations among the parameters change more significantly when the fault is impending. Meanwhile, the reconstruction errors increase gradually, and the MDs of the reconstruction errors move steadily away from the normal

Table 2
Detection accuracy (%) with different widths of moving window.

                 b=0    b=1    b=2    b=3    b=4    b=5    b=6    b=7    b=8
Normal           76.67  91.67  93.33  94.33  96.67  98.67  99.33  99.33  99.67
Abnormal         83.10  87.24  89.66  92.76  93.10  94.83  96.90  97.24  97.59
Total accuracy   79.83  89.49  91.53  93.56  94.92  96.78  98.14  98.31  98.64
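As a worked check of Eq. (18) against the tabulated values, the short sketch below recomputes the total accuracy from the per-class rates, using the 29 faulty and 30 normal cases stated in the text. The function names are hypothetical, and fractional counts are allowed because the reported rates are averages over ten repeated detections.

```python
def accuracy(tp, tn, total=59):
    """Eq. (18): share of correctly detected cases among all test cases."""
    return (tp + tn) / total


def total_accuracy_pct(abnormal_pct, normal_pct, n_fault=29, n_normal=30):
    """Total accuracy in percent from the per-class detection rates in percent."""
    tp = abnormal_pct / 100.0 * n_fault    # average number of detected faulty cases
    tn = normal_pct / 100.0 * n_normal     # average number of detected normal cases
    return 100.0 * accuracy(tp, tn, n_fault + n_normal)


# Width b = 6: abnormal 96.90 %, normal 99.33 %
total_b6 = round(total_accuracy_pct(96.90, 99.33), 2)
```

For b = 6 this corresponds to about 28.1 of 29 faulty cases and 29.8 of 30 normal cases, that is 57.9 of 59 cases, in agreement with the reported total of 98.14%.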


Table 3
Detection accuracy (%) under different c0, cT and Δc (Δc is equal to 0.05 only when cT is 0.05; otherwise, Δc is equal to 0.1).

Parameters in multiple
noise ratio training     c0=0    c0=0.05  c0=0.1  c0=0.2  c0=0.3  c0=0.4  c0=0.5  c0=0.6
cT=0                     83.73   –        –       –       –       –       –       –
cT=0.05                  –       92.20    93.90   95.08   96.10   97.29   98.14   97.46
cT=0.1                   –       –        92.37   93.56   95.08   96.95   97.46   96.27
cT=0.2                   –       –        –       92.54   94.58   95.25   96.61   96.78
cT=0.3                   –       –        –       –       92.37   94.24   95.93   96.10
cT=0.4                   –       –        –       –       –       92.03   94.07   95.76
cT=0.5                   –       –        –       –       –       –       91.86   93.73
cT=0.6                   –       –        –       –       –       –       –       91.36
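The noise settings compared in Table 3 can be read as a decreasing sequence of noise levels from c0 down to cT in steps of Δc. The sketch below generates such a sequence. It reflects one reading of the text rather than the authors' training code: the names `noise_schedule` and `delta_for` are hypothetical, and how each level is applied during training is not specified here.

```python
def noise_schedule(c0, c_t, delta_c):
    """Return the noise levels c0, c0 - delta_c, ..., c_t in decreasing order."""
    if delta_c <= 0 or c0 < c_t:
        raise ValueError("need delta_c > 0 and c0 >= c_t")
    steps = round((c0 - c_t) / delta_c)
    # Rounding removes floating-point residue such as 0.15000000000000002.
    return [round(c0 - i * delta_c, 10) for i in range(steps + 1)]


def delta_for(c_t):
    """Interval rule from the Table 3 caption: 0.05 if cT is 0.05, else 0.1."""
    return 0.05 if abs(c_t - 0.05) < 1e-12 else 0.1


# Best-performing setting in Table 3: c0 = 0.5, cT = 0.05, hence delta_c = 0.05.
levels = noise_schedule(0.5, 0.05, delta_for(0.05))
```

Under this reading, the setting c0 = 0.5, cT = 0.05 trains with ten noise levels, whereas a diagonal entry of Table 3 (c0 equal to cT) corresponds to a single fixed noise level.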

Fig. 10. The contribution degree of each condition parameter.

Fig. 11. Prediction errors of generator bearing b temperature during Sep 17 to Sep 21, 2016.

Fig. 12. Wind speed of WT 12 in September 2016.

Fig. 13. Statistical distribution of wind speed variation within 30 min of normal sample data.

Fig. 14. The statistical distributions of wind speed variation within 30 min of WT 12 and WT 7 from Sep 2015 to Sep 2016.

condition. Hence, the overlap between the test behavior distribution and the baseline condition becomes smaller and smaller. The operating condition of the WT can be estimated by monitoring the changes in AL. Therefore, the presented method for abnormal level quantification can provide a quantitative abnormal level and effectively track the development trend of the operational condition of a WT.
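The preprocessing used throughout this validation (scaling with the training extrema, then a moving window of width b = 6 over 18 variables) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and each window is flattened by simple concatenation of consecutive samples, which may differ from the exact ASDM layout used by the authors.

```python
def fit_min_max(train_rows):
    """Per-variable minimum and maximum of the training data."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(rows, lo, hi):
    """Linearly scale each variable with the training extrema.

    Test values outside the training range map outside [0, 1];
    constant variables are mapped to 0.0 to avoid division by zero.
    """
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def moving_window(rows, b):
    """Concatenate b consecutive samples into one row (assumed layout)."""
    return [
        [v for row in rows[j:j + b] for v in row]
        for j in range(len(rows) - b + 1)
    ]


# Five days of 10 min data: 720 samples of 18 variables (synthetic values).
raw = [[float(i + k) for k in range(18)] for i in range(720)]
lo, hi = fit_min_max(raw)
asdm = moving_window(scale(raw, lo, hi), 6)
```

With 720 raw samples and b = 6 this yields 715 windows of 108 values each, which matches the 715 monitoring indicator values and the 108 input nodes mentioned in the text.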


Table 4
Detection accuracy and training time of each method.

Methods                   Abnormal (%)  Normal (%)  Total (%)  Training time (s)
Univariable prediction    75.17         79.33       77.29      206
SMDAE                     86.55         77.67       82.03      964
MW-SDAE                   90.69         93.67       92.20      1172
MW-SMDAE (Two-layer)      96.90         99.33       98.14      1738

Fig. 15. The overlap between the test behavior distribution and baseline distribution.

6. Conclusions

An innovative approach for anomaly detection of WTs based on multivariable analysis was presented in this study. An SMDAE with moving window processing and multiple noise levels, named MW-SMDAE, has been developed for multivariable reconstruction. The moving window processing helps capture the correlations among the variables and the temporal dependency inherent in each variable simultaneously. Training the model with multiple noise levels helps it learn both the general and the detailed features of the inputs. Incorporating the temporal information and learning both the coarse-grained and fine-grained features have greatly improved the detection performance. The MD of the reconstruction error is selected as the monitoring indicator for detecting anomalies in a WT conveniently and effectively. The threshold value of the monitoring indicator and the maximum duration of exceedance of the monitoring indicator are determined by statistical analysis of the values of the monitoring indicator during normal operation. The parameter most relevant to the detected anomaly is identified by calculating the contribution degree of each variable. The abnormal level is quantified based on the overlap between the test behavior distribution and the baseline condition of the WT. Validation on the SCADA data collected from a wind farm in Eastern China has been conducted. The results show that the proposed approach is effective for the anomaly detection and early warning of actual WTs. The presented abnormal level can accurately reflect the operational risk of a WT and so enable timely maintenance planning. Notable advantages of the proposed method are its easy integration into the SCADA system and its capacity for handling large-scale data and monitoring online. We hope our work is useful for researchers who are interested in WT condition monitoring and fault detection, and helpful for wind farm operators in monitoring the health condition of WTs.

Due to the limited faulty data collected at present, the presented approach for detecting anomalies in a WT was verified only with actual SCADA data of a wind farm located in Eastern China. The applicability of the proposed approach will be further validated with SCADA data of other wind farms in our future work. Moreover, it is necessary to analyze the influence of various control strategies, such as power limiting and frequency modulation, on the operating condition of a WT.

Acknowledgments

The work presented in this paper was jointly supported by the National Natural Science Foundation of China (51637004), the National Key Basic Research Program of China (973 Program) (2012CB215205) and the National 111 Project of the Ministry of Education of China (B08036).

References

[1] A. Chehouri, R. Younes, Review of performance optimization techniques applied to wind turbines, Appl. Energy 142 (2015) 361–388. http://doi.org/10.1016/j.apenergy.2014.12.043.
[2] W. Qiao, D. Lu, A survey on wind turbine condition monitoring and fault diagnosis – Part II: signals and signal processing methods, IEEE Trans. Ind. Electron. 62 (10) (2015) 6546–6557. http://doi.org/10.1109/TIE.2015.2422394.
[3] M.D.P. Gil, O. Gomis-bellmunt, A. Sumper, Technical and economic assessment of offshore wind power plants based on variable frequency operation of clusters with a single power converter, Appl. Energy 125 (2014) 218–229. http://doi.org/10.1016/j.apenergy.2014.03.031.
[4] Z. Hameed, Y.S. Hong, Y.M. Cho, S.H. Ahn, C.K. Song, Condition monitoring and fault detection of wind turbines and related algorithms: a review, Renew. Sustain. Energy Rev. 13 (1) (2009) 1–39. http://doi.org/10.1016/j.rser.2007.05.008.
[5] A. Kusiak, Z. Zhang, A. Verma, Prediction, operations, and condition monitoring in wind energy, Energy 60 (0) (2013) 1–12. http://doi.org/10.1016/j.energy.2013.07.051.
[6] H. Sanchez, T. Escobet, V. Puig, et al., Fault diagnosis of an advanced wind turbine benchmark using interval-based arrs and observers, IEEE Trans. Ind. Electron. 62 (6) (2015) 3783–3793. https://doi.org/10.1109/TIE.2015.2399401.
[7] Z. Feng, S. Qin, M. Liang, Time-frequency analysis based on Vold-Kalman filter and higher order energy separation for fault diagnosis of wind turbine planetary gearbox under nonstationary conditions, Renew. Energy 85 (2016) 45–56. https://doi.org/10.1016/j.renene.2015.06.041.
[8] P. Cross, X.D. Ma, Nonlinear system identification for model-based condition monitoring of wind turbines, Renew. Energy 71 (11) (2014) 166–175. https://doi.org/10.1016/j.renene.2014.05.035.
[9] C.W. Chan, H. Song, H. Zhang, Application of fully decoupled parity equation in fault detection and identification of DC motors, IEEE Trans. Ind. Electron. 53 (4) (2006) 1277–1284. http://doi.org/10.1109/TIE.2006.878304.
[10] B. Xiao, X. Yang, H.R. Karimi, et al., Asymptotic tracking control for a more representative class of uncertain nonlinear systems with mismatched uncertainties, IEEE Trans. Ind. Electron. 66 (12) (2019) 9417–9427. http://doi.org/10.1109/TIE.2019.2893852.
[11] B. Xiao, S. Yin, H. Gao, Reconfigurable tolerant control of uncertain mechanical systems with actuator faults: a sliding mode observer-based approach, IEEE Trans. Control Syst. Technol. 26 (2018) 1249–1258. http://doi.org/10.1109/TCST.2017.2707333.
[12] B. Xiao, S. Yin, Exponential tracking control of robotic manipulators with uncertain dynamics and kinematics, IEEE Trans Ind Inform 15 (2) (2019) 689–698. http://doi.org/10.1109/TII.2018.2809514.
[13] L. Cao, D. Qiao, X. Chen, Laplace ℓ1 Huber based cubature Kalman filter for attitude estimation of small satellite, Acta Astronaut. 148 (2018) 48–56. https://doi.org/10.1016/j.actaastro.2018.04.020.
[14] A. Abouhnik, A. Albarbar, Wind turbine blades condition assessment based on vibration measurements and the level of an empirically decomposed feature, Energy Convers. Manag. 64 (4) (2012) 606–613. https://doi.org/10.1016/j.enconman.2012.06.008.
[15] P. Caselitz, G.W. Bussel, F. Spinato, Rotor condition monitoring for improved operational safety of offshore wind energy converters, J Sol Energy Eng Trans 127 (2) (2005) 253–261. http://doi.org/10.1115/1.1850485.
[16] S. Soua, P.V. Lieshout, A. Perera, T.H. Gan, B. Bridge, Determination of the combined vibrational and acoustic emission signature of a wind turbine gearbox and generator shaft in service as a pre-requisite for effective condition monitoring, Renew. Energy 51 (2013) 175–181. http://doi.org/10.1016/j.renene.2012.07.004.
[17] A. Kusiak, A. Verma, A data-mining approach to monitoring wind turbines, IEEE Trans Sustain Energy 3 (1) (2012) 150–157. http://doi.org/10.1109/tste.2011.2163177.
[18] A. Zaher, S.D.J. Mcarthur, D.G. Infield, Y. Patel, Online wind turbine fault detection through automated SCADA data analysis, Wind Energy 12 (6) (2010) 574–593. http://doi.org/10.1002/we.319.
[19] P. Bangalore, L.B. Tjernberg, An artificial neural network approach for early fault detection of gearbox bearings, IEEE Trans Smart Grid 6 (2) (2017) 980–987. http://doi.org/10.1109/TSG.2014.2386305.
[20] P. Sun, J. Li, C. Wang, X. Lei, A generalized model for wind turbine anomaly identification based on SCADA data, Appl. Energy 168 (2016) 550–567. http://doi.org/10.1016/j.apenergy.2016.01.133.
[21] Y.Q. Liu, J. Shi, Y.P. Yang, W. Lee, Short-term wind-power prediction based on wavelet transform-support vector machine and statistic-characteristics analysis, IEEE Trans. Ind. Appl. 48 (2012) 1136–1141. http://doi.org/10.1109/TIA.2012.2199449.
[22] G. Peng, D. Infield, X. Yang, Wind turbine generator condition-monitoring using temperature trend analysis, IEEE Trans Sustain Energy 3 (1) (2012) 124–133. http://doi.org/10.1109/tste.2011.2163430.
[23] M. Schlechtingen, I.F. Santos, S. Achiche, Wind turbine condition monitoring based on SCADA data using normal behavior models. Part 1: system description, Appl. Soft Comput. 13 (1) (2013) 259–270. http://doi.org/10.1016/j.asoc.2012.08.033.
[24] P.B. Dao, W.J. Staszewski, T. Barszcz, T. Uhl, Condition monitoring and fault detection in wind turbines based on cointegration analysis of SCADA data, Renew. Energy 116 (2018) 107–122. http://doi.org/10.1016/j.renene.2017.06.089.
[25] W.X. Yang, R. Court, J.S. Jiang, Wind turbine condition monitoring by the approach of SCADA data analysis, Renew. Energy 53 (2013) 365–376. http://doi.org/10.1016/j.renene.2012.11.030.
[26] R. Zhao, R. Yan, Z. Chen, K. Mao, R.X. Gao, Deep learning and its applications to machine health monitoring, Mech. Syst. Signal Process. 115 (2019) 213–237. http://doi.org/10.1016/j.ymssp.2018.05.050.
[27] L. Wang, Z. Zhang, H. Long, J. Xu, R. Liu, Wind turbine gearbox failure identification with deep neural networks, IEEE Trans Ind Inform 13 (3) (2017) 1360–1368. http://doi.org/10.1109/TII.2016.2607179.
[28] T. Ince, S. Kiranyaz, L. Eren, et al., Real-time motor fault detection by 1-D convolutional neural networks, IEEE Trans. Ind. Electron. 63 (11) (2016) 7067–7075. http://doi.org/10.1109/TIE.2016.2582729.
[29] S. Li, G. Liu, X. Tang, J. Lu, J. Hu, An ensemble deep convolutional neural network model with improved D-S evidence fusion for bearing fault diagnosis, Sensors 17 (8) (2017) 1729. http://doi.org/10.3390/s17081729.
[30] M. Gan, C. Wang, C. Zhu, Construction of hierarchical diagnosis network based on deep learning and its application in the fault pattern recognition of rolling element bearings, Mech. Syst. Signal Process. 72 (2016) 92–104. http://doi.org/10.1016/j.ymssp.2015.11.014.
[31] C. Lu, Z.Y. Wang, W.L. Qin, J. Ma, Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification, Signal Process. 130 (2017) 377–388. http://doi.org/10.1016/j.sigpro.2016.07.028.
[32] L. Wang, Z. Zhang, J. Xu, R. Liu, Wind turbine blade breakage monitoring with deep autoencoders, IEEE Trans Smart Grid 9 (4) (2018) 2824–2833. http://doi.org/10.1109/TSG.2016.2621135.
[33] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometr. Intell. Lab. Syst. 30 (1) (1995) 179–196. http://doi.org/10.1016/0169-7439(95)00076-3.
[34] W. Lin, Y. Qian, X. Li, Nonlinear dynamic principal component analysis for online process monitoring and diagnosis, Comput. Chem. Eng. 24 (2000) 423–429. http://doi.org/10.1016/s0098-1354(00)00433-6.
[35] P. Vincent, H. Larochelle, Y. Bengio, et al., Extracting and composing robust features with denoising autoencoders, in: Proc 25th Int Conf Mach Learn (ICML), 2008, pp. 1096–1103. http://doi.org/10.1145/1390156.1390294.
[36] Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng, On optimization methods for deep learning, in: Proc 28th Int Conf Mach Learn (ICML), 2011, pp. 265–272. http://doi.org/10.1.1.220.8705.
[37] B. Chandra, R.K. Sharma, Adaptive noise schedule for denoising autoencoder, in: Proc 21st Int Conf Neural Inf Process (ICONIP), 2014, pp. 535–542. http://doi.org/10.1007/978-3-319-12637-1_67.
[38] R.R. Hermosa González de la, Wind farm monitoring using Mahalanobis distance and fuzzy clustering, Renew. Energy 123 (2018) 526–540. http://doi.org/10.1016/j.renene.2018.02.097.
[39] J. Lin, Q. Chen, Fault diagnosis of rolling bearings based on multifractal detrended fluctuation analysis and Mahalanobis distance criterion, Mech. Syst. Signal Process. 38 (2) (2013) 515–533. http://doi.org/10.1016/j.ymssp.2012.12.014.
[40] N. Patil, D. Das, M. Pecht, Anomaly detection for IGBTs using Mahalanobis distance, Microelectron. Reliab. 55 (7) (2015) 1054–1059. http://doi.org/10.1016/j.microrel.2015.04.001.
[41] P.E.P. Odiowei, Y. Cao, Nonlinear dynamic process monitoring using canonical variate analysis and kernel density estimations, IEEE Trans Ind Inform 6 (1) (2010) 36–45. http://doi.org/10.1109/TII.2009.2032654.
[42] R. Singh, B.C. Pal, R.A. Jabr, Statistical representation of distribution system loads using Gaussian mixture model, IEEE Trans. Power Syst. 25 (1) (2010) 29–37. http://doi.org/10.1109/TPWRS.2009.2030271.
[43] L. Liao, J. Lee, A novel method for machine performance degradation assessment based on fixed cycle features test, J. Sound Vib. 326 (2009) 894–908. http://doi.org/10.1016/j.jsv.2009.05.005.