A semi-supervised Support Vector Data Description-based fault detection method for rolling element bearings based on cyclic spectral analysis


Mechanical Systems and Signal Processing 140 (2020) 106682


Chenyu Liu a,b, Konstantinos Gryllias a,b,*

a Faculty of Engineering Science, Department of Mechanical Engineering, KU Leuven, Celestijnenlaan 300, Box 2420, 3001 Leuven, Belgium
b Dynamics of Mechanical and Mechatronic Systems, Flanders Make, Belgium

Article info

Article history: Received 5 June 2019 Received in revised form 18 November 2019 Accepted 26 January 2020

Keywords: Condition monitoring Rolling element bearings Fault detection Cyclic Spectral Correlation Cyclic Spectral Coherence Support Vector Data Description

Abstract

The early and accurate detection of rolling element bearing faults, which is closely linked to timely maintenance and repair before a sudden breakdown, is still one of the key challenges in the area of condition monitoring. Nowadays, in the frame of research, advanced signal processing techniques are combined with high level machine learning approaches, focusing towards automatic fault diagnosis and decision making. A plethora of Health Indicators (HIs) have been proposed to feed machine learning models in order to track the system degradation. Cyclic Spectral Analysis (CSA), including the Cyclic Spectral Correlation (CSC) and the Cyclic Spectral Coherence (CSCoh), has been proved a powerful tool for rotating machinery fault detection. Due to the periodic mechanism of bearing fault impacts, the HIs extracted from the Cyclostationary (CS) domain can detect bearing defects even at a premature stage. On the other hand, supervised machine learning approaches with labelled training and testing datasets cannot yet be realistically applied in industrial applications. In order to overcome this limitation, a novel semi-supervised Support Vector Data Description with negative samples (NSVDD) fault detection approach is proposed in this paper. The NSVDD model utilizes CS indicators to build the feature space, and fits a hyper-sphere used to calculate Euclidean distances in order to isolate the healthy and the faulty data. A uniform object generation method is adopted to generate artificial outliers as negative samples for the NSVDD. A systematic fault detection decision strategy is proposed to estimate the bearing status simultaneously with the detection of the fault initiation. Furthermore, a multi-level anomaly detection framework is built based on data at i) single sensor level, ii) machine level and iii) entire machine fleet level. Three run-to-failure bearing datasets including signals from twelve bearings are used to implement the proposed fault detection methodology. The results show that the CS-based indicators outperform time indicators and Fast Kurtogram (FK) based Squared Envelope Spectrum (SES) indicators. Moreover, the proposed NSVDD model shows superior characteristics in anomaly detection compared to three classification methodologies, i.e. the Back-Propagation Neural Network, the Random Forest and the K-Nearest Neighbour. © 2020 Elsevier Ltd. All rights reserved.

* Corresponding author at: Faculty of Engineering Science, Department of Mechanical Engineering, KU Leuven, Celestijnenlaan 300, Box 2420, 3001 Leuven, Belgium. E-mail address: [email protected] (K. Gryllias). https://doi.org/10.1016/j.ymssp.2020.106682 0888-3270/© 2020 Elsevier Ltd. All rights reserved.


1. Introduction

Rolling element bearings are important components of rotating machinery and are highly sensitive to the external operating environment. Bearing failures caused by mounting errors, poor lubrication, debris contamination, wear etc. may trigger a breakdown of the machinery or even a fatal accident [1]. Therefore, early bearing fault detection arises as a critical mission in the frame of predictive maintenance, and has received extensive attention during the last years. Data-driven fault detection technologies decrease the reliance on dedicated physical and mathematical models, and therefore have been widely applied in the field of condition monitoring of complex mechanical and mechatronic systems. According to recent literature [2–4], the framework of data-driven fault detection consists of four steps: i) data acquisition, ii) construction of Health Indicators (HIs), iii) determination of the decision threshold, and iv) detection of the anomaly.

Vibration signals can be measured in the data acquisition step to detect rolling element bearing faults. HIs, derived from different domains, are used to track the bearing fault progression. Time domain HIs, such as the Root Mean Square (RMS), kurtosis, skewness, etc. have been widely employed to represent the statistical features of the time sequence. However, the time domain indicators can neither characterize the nonlinearities of bearing vibrations, nor provide high class separability of the anomalies [5]. On the other hand, a plethora of signal processing methods have been developed to enhance the embedded defect information, including Envelope Analysis [6], Spectral Kurtosis [7], the Short-Time Fourier Transform (STFT) [8], the Wavelet Transform [9], Cyclic Spectral Analysis (CSA) [10] etc. These approaches allow for the extraction of sophisticated HIs which can detect, diagnose and track the evolution of defects in various industrial applications.

Among the above mentioned signal processing methods, Cyclic Spectral Analysis has recently gained much attention due to its capability of revealing the second-order Cyclostationary (CS) periodicities of rolling element bearing signals. It has been proved that the vibration signals of faulty bearings, especially those with localized faults, exhibit cyclostationary behaviour [11,12]. The Cyclic Spectral Correlation (CSC) and the Cyclic Spectral Coherence (CSCoh) have been developed as efficient tools for cyclostationary analysis. They represent the potential fault modulation information in frequency-frequency domain bivariable maps. The integration over the spectral frequency leads to the estimation of the Enhanced Envelope Spectrum (EES), which reveals the modulation frequencies and their harmonics [13]. Due to the periodic mechanism of the bearing faults' impacts, the EES can provide a clearer representation of bearing faults compared to other methods such as the Squared Envelope Spectrum (SES) [14]. Some approaches have been proposed to construct HIs from the EES. In [15], a self-running bearing diagnosis framework with scalar indicators derived from the EES has been set up. Moreover, an indicator equal to the sum of the amplitudes of a number (e.g. three) of harmonics of the characteristic fault frequencies in the EES has been used initially for diagnosis and has further been extended for prognosis [16].

Another key issue is the definition of the threshold which triggers the detection based on the HIs estimated on signals emitted by faulty bearings.
On the one hand, the traditional statistical decision threshold strongly depends on expert knowledge of the indicators' statistical distribution as well as on an empirical understanding of bearing faults. On the other hand, the detection threshold should be able to precisely isolate the faults while tolerating outliers, in order to maintain a robust performance in terms of false alarm and misdetection rate under different external environments. Nowadays machine learning approaches show great potential to achieve this comprehensive target. Moreover, in the area of automatic fault diagnosis, machine learning algorithms are considered as black-box models which can extract system behaviours from data. Therefore, they can provide continuous information on the rapid deterioration of bearings and detect anomalies.

The Support Vector Machine (SVM) is one of the initial machine learning attempts in real-world fault detection applications [17]. The major principle of the SVM is to seek an optimal hyper-plane as the decision surface that maximizes the margin of separation between two classes. In the case of bearing fault detection, the hyper-plane of the SVM has been used as the detection threshold to segment the indicators extracted from bearing signals into healthy and faulty classes [18,19]. With the rise of the deep learning concept, Deep Neural Networks (DNN) have been developed to fulfill the task of fault detection in different scenarios. A supervised non-linear autoregressive neural network with exogenous input was trained in [20] to model the healthy condition of bearings, while setting up the detection threshold using the Mahalanobis distance in order to detect the bearing faults. Furthermore, a Long Short-Term Memory (LSTM) network was explored as a distribution estimator to find the dependencies in time sequences. By defining a threshold based on the distribution deviation, the appearance of bearing faults can be detected using the trained LSTM model [21]. Although DNNs have gained much attention in recent research, their drawbacks are still evident and significant: extremely time consuming fine-tuning of the hyperparameters, difficult interpretation, and high vulnerability to overfitting. On the other hand, in real industrial conditions, the bearings are operating most of the time in healthy conditions and the faulty datasets are rather limited and usually not labelled. Therefore standard supervised machine learning approaches, which are based on known labels of the classes, cannot be used for fault detection.

Semi-supervised fault detection based on the Support Vector Data Description (SVDD) technique is one of the solutions to overcome these limitations. SVDD has been proposed by Tax and Duin [22] and has been successfully used in one-class classification problems for various applications. SVDD is inspired by the SVM but uses a hyper-sphere, instead of a hyper-plane, to solve the non-linearly separable problem. SVDD provides a geometric description of the observed data by calculating the Euclidean distances between the data and the center of the hyper-sphere. It is trained by minimizing the volume of the sphere, which includes the majority of the training data. When testing with the trained SVDD, any data found to have a larger distance than the boundary of the sphere are considered to be anomalies. In bearing fault detection, the hyper-sphere of the SVDD can be used as the detection threshold.


Furthermore, Tax and Duin [23] also proposed the SVDD with negative samples (NSVDD), which uses extra constraints with negative samples to give a more accurate margin of the data. By employing two slack variables, the NSVDD has been proved to reach higher classification accuracy with a varying offset of the radius of the hyper-sphere [24].

In this paper, advanced cyclostationary diagnostic indicators, based on the Cyclic Spectral Correlation (CSC) and the Cyclic Spectral Coherence (CSCoh), are estimated and are combined for the first time with a powerful semi-supervised learning method, the Support Vector Data Description with negative samples (NSVDD), in order to detect bearing faults early and accurately. Moreover, the uniform object generation method is used, in the context of semi-supervised learning, to generate artificial outliers as negative samples during the training of the NSVDD classifier. The parameters of the NSVDD model are then optimized to minimize the target rejection and the outlier acceptance rate. A comprehensive bearing status decision strategy is further proposed within this framework. The methodology is tested and evaluated on experimental data from run-to-failure bearing datasets, under three novel concepts: sensor level, machine level and fleet level. The results demonstrate the efficacy of the method, presenting a high detection rate with low false alarm and misdetection rates. The main contributions of this paper can be summarized as follows:

1. Advanced CS-based indicators are used for the first time in combination with a semi-supervised learning method (NSVDD) for fault detection. Moreover, CS-based indicators, indicators based on the Fast Kurtogram based Squared Envelope Spectrum (FK-based SES) and classical time indicators are used as inputs to the NSVDD and the results are compared, leading to the conclusion that the first ones provide the best results.
2. The NSVDD model, which is constructed with artificial outliers using the uniform object generation algorithm, is proposed instead of the classical SVDD model. The two models are compared and the former achieves better classification performance. Moreover, the NSVDD outperforms methods based on the Back-Propagation Neural Network (BPNN), the Random Forest and the K-Nearest Neighbour (KNN).
3. An advanced anomaly decision making strategy is fused into the novel detection method, providing three stages of the fault development, i.e. "Healthy", "Warning" and "Faulty". The strategy provides robust detection results and can find immediate application in industry.
4. The methodology is applied and evaluated on the well-known NASA IMS data in a different concept compared to the majority of the relevant literature. First of all, all the signals, healthy and faulty, are processed, evaluating the ratio of false alarms and missed detections. Additionally, the 3 datasets, captured by 4–8 sensors, are used at three different levels of training and testing: a) at a sensor level independently, b) at a machine level and c) at a fleet level, assuming that the three tests form a fleet.

The rest of the paper is organised as follows: the background of Cyclic Spectral Analysis is discussed in Section 2 and the theory of the SVDD is introduced in Section 3. The proposed methodology and the experimental setup are presented in Sections 4 and 5 respectively. The results of the method are presented and analysed in Section 6. Finally, the paper closes with some conclusions.

2. Cyclic spectral analysis

The concept of cyclostationarity was first proposed in the field of telecommunications and was later introduced to the mechanical engineering community. As a subcategory of non-stationary processes, a cyclostationary process is a stochastic process that exhibits hidden periodicities embedded in the system. When a fault is generated in a rolling element bearing at different locations (inner race, outer race or rolling element), a repetitive series of shocks is generated, modulated by other frequencies such as the shaft speed. This phenomenon can be described as cyclostationary and can be exploited to detect bearing damage. Different orders of cyclostationarity are defined based on the order of the moments. The first-order cyclostationarity (CS1) is described as:

$R_{1x}(t) = R_{1x}(t + T) = E\{x(t)\}$    (1)

where the first-order moment $R_{1x}(t)$, called the statistical mean, is periodic with period $T$, and $x(t)$ represents the time signal. $E\{x(t)\}$ is the ensemble average, which represents the average of the same stochastic process over repeated experiments. CS1 manifests as periodic waveforms in the vibration signals of rotating machinery and can be generated by imbalance, misalignment or flexible couplings [12]. Moreover, second-order cyclostationarity (CS2) describes the periodicity of the second-order moment, the autocorrelation function:

$R_{2x}(t, \tau) = R_{2x}(t + T, \tau) = E\{x(t)\, x^{*}(t - \tau)\}$    (2)

where $\tau$ is the time-lag and the star $({}^{*})$ denotes complex conjugation. CS2 distinguishes stochastic processes with amplitude or frequency modulation. CS2 has been proved extremely effective for the diagnosis of rotating components that are not completely phase-locked with the shaft speed, such as rolling element bearings.
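To make the CS2 behaviour tangible, the following minimal sketch simulates a simplified outer-race-fault-like signal: a train of resonance-excited impacts repeating at a fault rate, with random impact amplitudes, so that the waveform itself is not periodic while its second-order statistics are. All parameter values are illustrative assumptions and not taken from the paper.

```python
import numpy as np

fs = 20480            # sampling frequency [Hz] (illustrative)
fault_rate = 236.0    # impact repetition rate, e.g. a BPFO-like frequency [Hz]
f_res = 3000.0        # excited structural resonance [Hz]
damping = 800.0       # decay rate of each impact response [1/s]
T = 1.0               # signal duration [s]

t = np.arange(0, T, 1 / fs)
x = np.zeros_like(t)

# Add one decaying oscillation per impact; the random amplitude makes the waveform
# non-periodic, while the regular impact spacing creates the hidden (CS2) periodicity.
rng = np.random.default_rng(0)
for ti in np.arange(0, T, 1 / fault_rate):
    amp = 1.0 + 0.3 * rng.standard_normal()
    tau = t[t >= ti] - ti
    x[t >= ti] += amp * np.exp(-damping * tau) * np.sin(2 * np.pi * f_res * tau)

x += 0.2 * rng.standard_normal(len(t))   # background measurement noise
```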


The Cyclic Spectral Correlation (CSC) is designed as a tool to describe CS1 and CS2 signals in the frequency-frequency domain. It is defined as the autocorrelation function of two frequency variables using the two-dimensional Fourier transform, as shown in Eq. (3):

$\mathrm{CSC}(\alpha, f) = \lim_{W \to \infty} \dfrac{1}{W}\, E\{F_{W}[x(t)]\, F_{W}[x(t + \tau)]^{*}\}$    (3)

where $\alpha$ is the cyclic frequency related to the modulation and $f$ is the spectral frequency representing the carrier. $F$ stands for the Fourier transform and $W$ is the time duration of the signal. Furthermore, a normalization procedure can be applied to the CSC in order to minimize uneven distributions, which leads to the Cyclic Spectral Coherence (CSCoh). The CSCoh is estimated as:

$\mathrm{CSCoh}(\alpha, f) = \dfrac{\mathrm{CSC}(\alpha, f)}{\sqrt{\mathrm{CSC}(0, f)\,\mathrm{CSC}(0, f - \alpha)}}$    (4)

The integration of the CSC and the CSCoh over the spectral frequency $f$, from zero Hz up to the Nyquist frequency, leads to the Enhanced Envelope Spectrum (EES), which is an improved version of the Squared Envelope Spectrum. If the integration is realised in a specific bandwidth $[F_1, F_2]$, then the spectrum is termed Improved Envelope Spectrum (IES). The CSC- and CSCoh-based IES are described in Eqs. (5) and (6) respectively:

$\mathrm{IES}_{CSC}(\alpha) = \dfrac{1}{F_2 - F_1} \int_{F_1}^{F_2} |\mathrm{CSC}(\alpha, f)|\, df$    (5)

$\mathrm{IES}_{CSCoh}(\alpha) = \dfrac{1}{F_2 - F_1} \int_{F_1}^{F_2} |\mathrm{CSCoh}(\alpha, f)|\, df$    (6)
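A minimal sketch of the integration in Eqs. (5)–(6), assuming a pre-computed coherence map `cscoh` of shape (n_alpha, n_f) on frequency axes `alpha` and `f` (for example produced by a fast spectral correlation estimator); the variable and function names are assumptions, not part of the original implementation.

```python
import numpy as np

def improved_envelope_spectrum(cscoh, f, f1, f2):
    """Integrate |CSCoh(alpha, f)| over the spectral-frequency band [f1, f2] (Eq. 6).

    cscoh : complex or real array, shape (n_alpha, n_f)
    f     : spectral frequency axis, shape (n_f,)
    """
    band = (f >= f1) & (f <= f2)
    # Band-averaged magnitude, one value per cyclic frequency alpha.
    return np.trapz(np.abs(cscoh[:, band]), f[band], axis=1) / (f2 - f1)

# Integrating over the full band (0 Hz up to Nyquist) yields the EES instead of the IES;
# replacing cscoh by the CSC map gives Eq. (5).
```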

Both forms of the IES enhance the non-zero cyclic components, i.e. the characteristic frequencies of bearing faults. Due to the regularization process, the CSCoh provides better results for detection while the CSC is better for fault trending.

3. Support Vector Data Description (SVDD)

3.1. Theory of SVDD

Semi-supervised learning approaches focus on partially labelled datasets. The objective of a semi-supervised model is to classify some of the unlabelled data, leveraging information from the labelled part. SVDD has been proposed as a one-class classifier in order to distinguish one class of data from the rest, and can therefore be adapted in the framework of semi-supervised learning to exploit information from both labelled and unlabelled samples. SVDD can be seen as an extension of the SVM, which uses a hyper-sphere instead of a hyper-plane as the classification decision surface [23]. A sketch of the SVDD classifier dealing with the one-class classification problem is shown in Fig. 1. As shown in the sketch map, the hyper-sphere is characterized by its center $a$ and radius $R$, which are the decision variables. The labelled data, the "targets", are regarded as the training set of the SVDD. The data lying exactly on the boundary are the support vectors and the data outside are the "outliers". The primary goal of the SVDD is to construct a hyper-sphere with a minimum radius which simultaneously contains the maximum number of targets. The objective function can be described by Eq. (7):

$F(R, a) = \min\; R^{2} + C \sum_{i=1}^{n} \xi_i$    (7)

This function is subject to two constraints: $\|x_i - a\|^{2} \le R^{2} + \xi_i,\ \forall i = 1, \dots, n$ and $\xi_i \ge 0,\ \forall i = 1, \dots, n$. Here $x_i \in \mathbb{R}$ is the training dataset with $n$ samples and $\xi_i$ is the slack variable which represents the tolerance of targets lying outside the

Fig. 1. Sketch map of SVDD.


boundary, in order to avoid an extremely large radius of the hyper-sphere that would reduce its descriptive ability. $C$ is the penalty factor that controls the trade-off between the radius and the number of rejected data points. Eq. (7) is a convex quadratic programming problem which cannot be solved directly for the unknown $R$. However, it can be transformed into an equivalent dual problem using Lagrange duality. By applying Lagrange multipliers, the constraints can be fused into the objective function, which leads to the dual form of Eq. (8):

$L = \max\; \sum_{i=1}^{n} \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$    (8)

where $(x_i \cdot x_j)$ stands for the inner product of $x_i$ and $x_j$, and $\alpha_i \in \mathbb{R}$ are the Lagrange multipliers. New conditions are set for the dual form: $\sum_{i=1}^{n} \alpha_i = 1$ and $0 \le \alpha_i \le C,\ \forall i = 1, \dots, n$. The dual form is composed of the data itself, which makes the problem solvable. The training data $x_i$ and their corresponding $\alpha_i$ are related to the radius $R$ and the center $a$ of the hyper-sphere according to Eq. (9):

$\|x_i - a\|^{2} < R^{2} \iff \alpha_i = 0$
$\|x_i - a\|^{2} = R^{2} \iff 0 < \alpha_i < C$    (9)
$\|x_i - a\|^{2} > R^{2} \iff \alpha_i = C$

The inner product $(x_i \cdot x_j)$ in the objective function of Eq. (8) can be replaced with kernel functions $K(x_i, x_j)$ in order to describe non-linear datasets. The kernel function maps the data to a higher dimensional feature space, which makes the non-linear data separable. Among all the kernel functions, the Gaussian kernel is the most common choice and is also adopted in this paper, as described in Eq. (10):

$K(x_i, x_j) = \exp\left(-\dfrac{\|x_i - x_j\|^{2}}{2 s^{2}}\right)$    (10)

where $s$ represents the kernel width parameter. By using the Gaussian kernel, the squared distance of any observation $z$ to the center of the sphere can be described as follows:

$\mathrm{dist}^{2}(z) = K(z, z) - 2 \sum_{i} \alpha_i K(x_i, z) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$    (11)
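For clarity, the distance of Eq. (11) can be evaluated directly from the Gaussian kernel, the support vectors and their multipliers. In the sketch below, `support_vectors`, `alphas` and the kernel width `s` are assumed to come from an already trained SVDD; the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(a, b, s):
    """Gaussian kernel of Eq. (10) between two sample matrices a (n_a, d) and b (n_b, d)."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * s**2))

def squared_distance_to_center(z, support_vectors, alphas, s):
    """Squared distance of test samples z to the SVDD center, following Eq. (11)."""
    k_zz = np.ones(len(z))                               # K(z, z) = 1 for the Gaussian kernel
    k_zx = gaussian_kernel(z, support_vectors, s)        # K(x_i, z)
    k_xx = gaussian_kernel(support_vectors, support_vectors, s)
    return k_zz - 2 * k_zx @ alphas + alphas @ k_xx @ alphas
```

Comparing this quantity with $R^2$ then reproduces the inside/boundary/outside decision rule discussed next.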

Considering the previous spherical data boundary conditions in Eq. (9), $\mathrm{dist}^{2}(z) < R^{2}$ represents a position inside the sphere and $\mathrm{dist}^{2}(z) = R^{2}$ a position on the boundary of the sphere. When $\mathrm{dist}^{2}(z) > R^{2}$, then $z$ is recognized as an outlier. It should be noticed that the width $s$ of the kernel function is a key user-defined parameter which can significantly influence the classification performance through the target rejection fraction $f_{T-}$. It has been proved in [22] that, for an SVDD classifier with a Gaussian kernel, when $f_{T-}$ is calculated as a function of $s$, its value is close to the fraction of support vectors given by Eq. (12):

$E[P(\text{error} \mid \text{target set})] = \dfrac{\#SV}{N}$    (12)

where $\#SV$ is the number of support vectors and $N$ is the total number of samples in the target set. The fraction of objects which become support vectors can be estimated with a leave-one-out method on the target samples, and can therefore be used to measure $f_{T-}$. On the other hand, the variable $\nu$ can be defined as the upper bound for the "targets" that lie outside of the SVDD, i.e. the rejection fraction of targets [25]. The relationship between $\nu$ and $C$ can be described as:

$\nu = \dfrac{1}{NC}$    (13)

A flexible boundary of the hyper-sphere, which rejects a certain fraction of the target samples, can thus be established for various values of $s$ and $\nu$. The number of support vectors will be used to estimate and minimize the error of the classifier.

3.2. SVDD with negative samples

When the training set contains not only "target" samples (positive samples) but also data from the "outlier" class (negative samples), an NSVDD model can be built to obtain a more accurate description, considering the slack variables of both the target and the outlier classes. The following equation is derived to represent the NSVDD:

$F(R, a) = \min\; R^{2} + C_1 \sum_{i=1}^{n} \xi_i + C_2 \sum_{l=1}^{m} \xi_l$    (14)

where $\xi_i$ and $\xi_l$ are slack variables which represent, respectively, the tolerances of the target and the outlier samples, and $C_1$ and $C_2$ are the corresponding penalty factors. The constraints are:


$\|x_i - a\|^{2} \le R^{2} + \xi_i, \quad \xi_i \ge 0,\ i = 1, 2, \dots, n$
$\|x_l - a\|^{2} \ge R^{2} - \xi_l, \quad \xi_l \ge 0,\ l = 1, 2, \dots, m$    (15)

where $x_i$ and $x_l$ denote, respectively, the positive and the negative training samples. The objective function can be described by:

$L = \max\; \sum_{i=1}^{n} \alpha_i (x_i \cdot x_i) - \sum_{l=1}^{m} \alpha_l (x_l \cdot x_l) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) + 2 \sum_{l,j} \alpha_l \alpha_j (x_l \cdot x_j) - \sum_{l,m} \alpha_l \alpha_m (x_l \cdot x_m)$    (16)

In order to obtain a unified objective function for the standard SVDD and the NSVDD, the Lagrange multipliers can be modified with the labels of the data, i.e. $\alpha_i' = y_i \alpha_i$, where $y_i = 1$ for positive samples and $y_i = -1$ for negative samples. Then Eqs. (8) and (16) become equivalent. Thus, when negative samples are available for training, the objective function of the NSVDD can be derived by using $\alpha_i'$ instead of $\alpha_i$ in Eq. (8). A classifier should have a generalization ability, which can isolate the "target" samples that are far from the "target" distribution, while accepting the "outliers" that fall into it. As discussed in [22], there are two kinds of errors for a one-class classifier: i) the rejection fraction of the target samples $f_{T-}$ and ii) the acceptance fraction of the outliers $f_{O+}$. Considering the two penalty factors, the variable $\nu$ from Eq. (13) can be extended to the upper bound both for the targets outside of the SVDD and for the outliers inside the SVDD, i.e. $f_{T-}$ and $f_{O+}$ [25]. The relationship between $\nu$, $C$ and the number of target samples $N$ can then be extended to:

$\nu_1 = \dfrac{1}{N C_1}, \qquad \nu_2 = \dfrac{1}{N C_2}$    (17)

where $\nu_1$ and $\nu_2$ represent, respectively, the upper bounds of $f_{T-}$ and $f_{O+}$. Considering both the rejected targets and the accepted outliers, a general form of the error of an NSVDD classifier can be described as:

$\Lambda = \lambda f_{T-} + (1 - \lambda) f_{O+}$    (18)

where $\lambda$ represents the weight corresponding to the numbers of targets and outliers. When $\lambda = \frac{1}{2}$, the numbers of target and outlier objects are considered to be the same, and the errors for the targets and the outliers are weighted equally [25]. In the NSVDD framework with parameters $s$, $\nu_1$ and $\nu_2$, the error can be described as:

$\Lambda(s, \nu_1, \nu_2) = \lambda \dfrac{\#SV}{N} + (1 - \lambda)\, f_{O+}$    (19)

A standard SVDD classifier can only be optimized with respect to a target rejection rate, by computing the fraction of the support vectors. In order to take $f_{O+}$ into account for a more comprehensive optimization, artificial outliers are planted into the proposed NSVDD model, as will be illustrated in Section 4.2.

4. Proposed methodology

4.1. Health Indicator construction

In order to realise the fault detection, Health Indicators (HIs) need to be constructed to evaluate the current condition as well as the degradation level of the monitored bearing. The goodness of HIs can be characterized by their monotonicity, their trendability and their prognosability. In real applications, the HIs should be able to detect bearing faults at an early stage while minimizing the false alarm and misdetection rate. Time domain statistical features, such as the Root Mean Square (RMS), the kurtosis and the skewness, are classically used HIs but their effectiveness is usually limited. To overcome this limitation, a number of signal processing tools which exploit the cyclostationarity of rotating machinery vibrations have been proposed. In this paper, the HIs are constructed based on features extracted from the EES, as shown in Table 1. The CS-based HIs are defined as the amplitudes of the harmonics of the bearing characteristic frequencies obtained from the EES, i.e. the Shaft Rotation speed (SR), the Ball Pass Frequency Outer race (BPFO), the Ball Pass Frequency Inner race (BPFI) with sidebands at the SR and the Ball Spin Frequency (BSF) with sidebands at the SR.

Table 1
Health indicators from the CSC and the CSCoh domain.

S1R: EES peak at SR
S2R: EES peak at 2nd harmonic of SR
S3R: EES peak at 3rd harmonic of SR
S4R: EES peak at 4th harmonic of SR
SSR: Sum of 4 EES peaks of SR harmonics
S1O: EES peak at BPFO
S2O: EES peak at 2nd harmonic of BPFO
S3O: EES peak at 3rd harmonic of BPFO
S4O: EES peak at 4th harmonic of BPFO
SOR: Sum of 4 EES peaks of BPFO harmonics
S1I: EES peak at BPFI and sidebands
S2I: EES peak at 2nd harmonic of BPFI and sidebands
S3I: EES peak at 3rd harmonic of BPFI and sidebands
S4I: EES peak at 4th harmonic of BPFI and sidebands
SIR: Sum of 4 EES peaks of BPFI harmonics and sidebands
SIOR: Sum of 4 EES peaks of BPFO and BPFI harmonics
S2B: EES peak at two times BSF and sidebands at FTF
S4B: EES peak at four times BSF and sidebands at FTF
S6B: EES peak at six times BSF and sidebands at FTF
S8B: EES peak at eight times BSF and sidebands at FTF
SBF: Sum of 4 EES peaks of BSF and sidebands

Table 2
Health indicators from the time domain.

RMS: $\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
Kurtosis: $KU = \dfrac{\sum_{i=1}^{n}(x_i - m)^4}{(n-1)\,\sigma^4}$
Skewness: $SK = \dfrac{\sum_{i=1}^{n}(x_i - m)^3}{(n-1)\,\sigma^3}$
Variance: $VR = \dfrac{\sum_{i=1}^{n}(x_i - m)^2}{n-1}$
Mean: $m = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Crest Factor: $CF = \dfrac{\max|x_i|}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}}$
Shape Factor: $SF = \dfrac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}}{\frac{1}{n}\sum_{i=1}^{n}|x_i|}$
Impulse Factor: $IF = \dfrac{\max|x_i|}{\frac{1}{n}\sum_{i=1}^{n}|x_i|}$
Clearance Factor: $CLF = \dfrac{\max|x_i|}{\left(\frac{1}{n}\sum_{i=1}^{n}\sqrt{|x_i|}\right)^2}$
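A compact sketch of the Table 2 indicators for a single vibration record `x` (a 1-D NumPy array); the definitions follow the table above, with the sample standard deviation used for $\sigma$.

```python
import numpy as np

def time_domain_indicators(x):
    """Compute the nine time-domain health indicators of Table 2 for one record."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = x.mean()
    sigma = x.std(ddof=1)
    rms = np.sqrt(np.mean(x**2))
    abs_mean = np.mean(np.abs(x))
    peak = np.max(np.abs(x))
    return {
        "RMS": rms,
        "Kurtosis": np.sum((x - m)**4) / ((n - 1) * sigma**4),
        "Skewness": np.sum((x - m)**3) / ((n - 1) * sigma**3),
        "Variance": np.sum((x - m)**2) / (n - 1),
        "Mean": m,
        "Crest Factor": peak / rms,
        "Shape Factor": rms / abs_mean,
        "Impulse Factor": peak / abs_mean,
        "Clearance Factor": peak / np.mean(np.sqrt(np.abs(x)))**2,
    }
```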

Fig. 2. Sketch map of the NSVDD with artificial outliers.

The first four (4) harmonics and their sum are chosen to keep the HIs robust and to facilitate the fault detection as early as possible. In order to compare the classification performance obtained using different health indicators, features from both the CSC-based EES and the CSCoh-based EES are extracted. Meanwhile, nine (9) statistical features from the time domain, as shown in Table 2, and the same twenty-one (21) frequency domain features from the Fast Kurtogram (FK) based SES are also calculated to form a comparison group. The results from these indicators are illustrated in the following section.
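As a hedged sketch of how the Table 1 indicators can be read from an (enhanced) envelope spectrum, the peak amplitude is picked within a small tolerance band around each harmonic of a characteristic frequency and the first four harmonics are summed. The tolerance value and the function names are illustrative assumptions; the sideband peaks used for the BPFI and BSF families are omitted for brevity.

```python
import numpy as np

def peak_near(alpha, ees, freq, tol=2.0):
    """Maximum EES amplitude within +/- tol Hz of a target cyclic frequency."""
    band = (alpha >= freq - tol) & (alpha <= freq + tol)
    return float(np.max(ees[band])) if np.any(band) else 0.0

def harmonic_indicators(alpha, ees, char_freq, n_harmonics=4, tol=2.0):
    """EES peaks at the first harmonics of a characteristic frequency, plus their sum."""
    peaks = [peak_near(alpha, ees, k * char_freq, tol) for k in range(1, n_harmonics + 1)]
    return peaks + [sum(peaks)]

# Example for the BPFO family (S1O ... S4O and SOR in Table 1), assuming `alpha` and
# `ees` come from Eq. (5) or (6) and BPFO = 236 Hz:
# s1o, s2o, s3o, s4o, sor = harmonic_indicators(alpha, ees, 236.0)
```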

4.2. Artificial outlier generation for NSVDD

As discussed in the previous section, the NSVDD is able to obtain a more accurate description by means of two slack variables. In order to extend the standard SVDD to the NSVDD without information about real outliers in the training set, artificial outliers can be used in the feature space. In this paper, the uniform object generation algorithm is applied in the proposed NSVDD model, based on the research of [25]. The sketch of the NSVDD is shown in Fig. 2. Before generating the outliers, an SVDD hyper-sphere with radius $R$ is pre-trained with the target samples $X$ using a linear kernel. An artificial outlier set $Y$ is generated surrounding $X$ and follows a Gaussian distribution, i.e. $Y \sim N(0, 1)$. $Y$ is constrained in another hyper-sphere with the same center and a larger radius $R + \Delta R$. The squared distances of $Y$, i.e. $(R + \Delta R)^2$, should follow a $\chi^2_d$ distribution with $d$ degrees of freedom. $(R + \Delta R)^2$ can then be transformed to $N(0, 1)$ using the cumulative distribution of $\chi^2_d$. Following this method, $Y$ is rescaled and transformed into a unit hyper-sphere in $d$ dimensions [25].

To give a visual comparison of the classification results achieved by the standard SVDD and the NSVDD, an example is demonstrated using the above mentioned uniform object generation algorithm. As shown in Fig. 3, a one-class 2D circular dataset with 500 uniformly distributed samples, marked with circles, is generated. These data are considered as the real targets in the training set. Another 50 samples, marked with cross symbols, are generated from the same distribution but closely surrounding the targets both outside and inside the circle, to simulate real outliers existing in the training set. The SVDD is trained with the target rejection fraction $\nu = 0.1$. On the other hand, 500 artificial outliers are generated during the training of the NSVDD with the same target rejection fraction $\nu_1 = 0.1$ and outlier rejection fraction $\nu_2 = 0.1$. The radius of the artificial outliers is set 0.3 times larger than that of the pre-trained hyper-sphere. It should be noticed that the artificial outliers are not shown in Fig. 3. The projection of the decision boundaries can be observed as the discriminant for different kernel widths $s$. It can be visually observed that, with decreasing $s$, both the SVDD and the NSVDD classifiers describe more details of the dataset. However, the NSVDD captures the target objects more precisely, with a lower acceptance of the real outliers compared to the standard SVDD using the same kernel width parameter and target rejection fraction. In the bearing fault anomaly detection scenario, the features of the faulty data are closely distributed around the healthy ones during the premature damage stage. Therefore the capability to obtain a robust estimation of the boundary is critical for the classifier. The example shows that artificial outliers can significantly improve the robustness of the classifier and can therefore provide a boost of the classification performance. Testing results using real bearing data will be discussed in a later section to validate the effectiveness of the NSVDD model.
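A minimal sketch of artificial-outlier generation in a d-dimensional hyper-sphere of radius R + ΔR around a pre-computed center, following the general idea described above (Gaussian directions normalized and rescaled so that the points are uniform inside the ball); the exact scaling used in [25] may differ, so this is an assumption-laden illustration rather than the paper's implementation.

```python
import numpy as np

def generate_artificial_outliers(center, radius, delta_r, n_outliers, seed=None):
    """Sample points uniformly inside a hyper-sphere of radius (radius + delta_r)."""
    rng = np.random.default_rng(seed)
    center = np.asarray(center, dtype=float)
    d = len(center)
    # Gaussian samples normalized to unit length give uniformly distributed directions.
    directions = rng.standard_normal((n_outliers, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii proportional to U^(1/d) give a uniform density inside the d-dimensional ball.
    radii = (radius + delta_r) * rng.uniform(size=(n_outliers, 1)) ** (1.0 / d)
    return center + radii * directions

# Example (hypothetical names): 500 outliers around a pre-trained SVDD with center a and
# radius R, with the enlarged radius set 0.3 times larger, as in the 2D example above:
# Y = generate_artificial_outliers(a, R, 0.3 * R, 500)
```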


Fig. 3. Trained classifiers of the circular dataset under different kernel widths s: (a) s = 1, (b) s = 0.75, (c) s = 0.5, (d) s = 0.25.

4.3. Parameter optimization with grid search cross-validation

There are three hyperparameters that can be optimized in the proposed NSVDD model: the kernel width $s$, the target rejection fraction $\nu_1$ and the outlier acceptance fraction $\nu_2$. These parameters are related to the error of the NSVDD classifier according to Eq. (19); therefore the optimization goal is to search for a minimum error $\Lambda$. A grid search cross-validation method is adopted in the proposed optimization framework with the three parameters grouped in different combinations, as shown in Fig. 4. The search range of $s$ is set from a small value close to zero ($10^{-4}$ in this paper) to the maximum Euclidean distance of the training set, and is divided evenly. Meanwhile $\nu_1$ and $\nu_2$ are searched in [0, 0.05, 0.1], according to the research of [25]. Considering the computation time, $s$ is searched with 9 values in this paper and the minimum value of $\Lambda$ can be obtained from a 9 by 9 matrix, as marked by a red square in Fig. 4. K-fold cross-validation is combined with the grid search method to obtain an average error value as the performance of the classifier. The entire NSVDD training algorithm is shown in Algorithm 1.

Algorithm 1: Training process of the proposed NSVDD with artificial outliers. The kernel width $s$, the upper bound of the rejected target fraction $\nu_1$ and the accepted outlier fraction $\nu_2$ are hyper-parameters.
1. Pre-train a linear-kernel SVDD classifier $W$ with the training set $X$
2. Calculate the center $a$ of the hyper-sphere and its radius $R$
3. Generate Gaussian-distributed artificial outliers with center $a$ and radius $R + \Delta R$ to form a new training set $X'$
4. Set up the search ranges of the parameters $s$, $\nu_1$ and $\nu_2$
   for $s_i$, $\nu_{1j}$ and $\nu_{2j}$ do
     for the $k$-th fold do
       Train the NSVDD classifier $W_k$ using $X'$ with $s_i$, $\nu_{1j}$ and $\nu_{2j}$
       Calculate the error: $\Lambda_k(s_i, \nu_{1j}, \nu_{2j}) = \lambda \frac{\#SV}{N} + (1 - \lambda) f_{O+}$
       Find the minimum $\Lambda_k$ with parameters $s_{opt}$, $\nu_{1opt}$ and $\nu_{2opt}$
     end for
   end for
5. Train the NSVDD using $X$ with $s_{opt}$, $\nu_{1opt}$ and $\nu_{2opt}$ to obtain the final model
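A schematic Python version of the search loop in Algorithm 1, assuming hypothetical `fit_fn` and `error_fn` callables that train the NSVDD on one fold and return the error of Eq. (19); it only illustrates the grid-search-with-cross-validation structure, not the actual NSVDD solver.

```python
import numpy as np
from itertools import product

def grid_search_nsvdd(X_aug, s_grid, nu1_grid, nu2_grid, fit_fn, error_fn,
                      k_folds=5, seed=0):
    """Grid search with k-fold cross-validation over (s, nu1, nu2), minimizing Eq. (19)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_aug)), k_folds)
    best_params, best_err = None, np.inf
    for s, nu1, nu2 in product(s_grid, nu1_grid, nu2_grid):
        fold_errors = []
        for k in range(k_folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
            model = fit_fn(X_aug[train_idx], s, nu1, nu2)         # train NSVDD on k-1 folds
            fold_errors.append(error_fn(model, X_aug[folds[k]]))  # Lambda on held-out fold
        mean_err = float(np.mean(fold_errors))
        if mean_err < best_err:
            best_err, best_params = mean_err, (s, nu1, nu2)
    return best_params, best_err

# Example grids, following Section 4.3: nine evenly spaced kernel widths and nu values
# in [0, 0.05, 0.1] (zero replaced by a small positive value for numerical stability).
```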


Fig. 4. Optimization of kernel width using grid search with cross-validation.

4.4. Bearing fault detection procedure

Machine learning classifiers can label the testing samples, but for the practical need of anomaly detection a systematic decision strategy is needed as a complement to the NSVDD model based on the CS-based indicators. The flowchart of the proposed bearing fault detection process is shown in Fig. 5. The entire procedure consists of three (3) major parts: feature extraction, classification and fault detection. The CS-based HIs are extracted from the EES to construct the feature space in the feature extraction stage. The features are then normalized by subtracting the mean value and dividing by the standard deviation, to avoid the calculated Euclidean distances being governed by particular features in the NSVDD classifier. Semi-supervised methods are able to learn from fewer labelled data points with the help of a large number of unlabelled data points; therefore the training data are selected from the healthy conditions, which can normally be gathered at the beginning of the bearing operation. In the classification stage, the entire dataset is split into training and testing sets with respectively 25% and 75% of the whole dataset. The training set includes the validation part used for the parameter optimization and is labelled as healthy. Once the NSVDD classifier is trained, a moving window is applied on the testing set. The data in the window are sent to the NSVDD model to calculate the Euclidean distances from the center of the hyper-sphere. Considering the distances and the threshold, i.e. the boundary of the hyper-sphere, a fault detection decision strategy is built based on the following three conditions:

(1) 50% or more of the distances in the window are above the threshold.
(2) 50% or more of the distances in the window continuously pass the threshold.
(3) The average distance of the data in the window is equal to or greater than the threshold.

These conditions keep the detection threshold robust against the influence of random dominant outliers, while reducing the misdetection rate. Three bearing statuses are then defined according to the conditions. When all three conditions are fulfilled, the status of the bearing is considered as "Faulty". If one or two conditions are satisfied, the status is considered as "Warning". If the distances of the bearing data satisfy none of the three conditions, the bearing is considered to be in "Healthy" status. The starting point of the warning status represents the existence of a premature anomaly and the starting point of the faulty status is the detection point of a mature bearing defect. When applying the NSVDD classifier to the testing set, some metrics can be derived from the confusion matrix to evaluate the quality of the output, as shown in Fig. 6. The metrics, including the Precision, the Recall, the Accuracy and the F-measure, can be calculated based on the estimated labels and the true labels. A measure of the classification performance can then be obtained from these metrics; high values indicate a better classification performance.

4.5. Different levels of training of the anomaly detection models

The performance of a machine learning model is closely related to the quality and the quantity of the training dataset. From a general perspective, extending the amount of training data will lead to a better performing model.
However, when the training data come from multiple sources under different conditions, the model will encounter the multi-source domain adaptation problem [26], and may behave very differently compared to a single-source dataset. In an industrial application, a number of sensors can be mounted on a unit and a number of similar units are operating under similar operating conditions. The unit can be a factory machine, a wind turbine, a compressor etc. Therefore, in the frame of monitoring and anomaly detection, a question which arises is at which level the training data should be used: a) at a sensor level, where data are used to train a model specifically for this sensor, b) at a machine level, where data from all the sensors mounted on the unit are used to train a model at the unit level, and c) at a fleet level, where data from all the sensors of all the machines are used to train a model. Therefore in this paper, the effectiveness of the proposed anomaly detection model, based on the NSVDD model combined with the CS-based indicators, is evaluated at the three levels, as shown in Fig. 7. Thus the training datasets are prepared at three levels: data from a single sensor (sensor level), data from multiple sensors of one machine (machine level) and data from several sensors of several machines of a fleet (fleet level). Following the three levels, three NSVDD models are trained with CS-based indicators and then sensor data are used in order to evaluate the performance of the method.

Fig. 5. Procedure for the NSVDD-based fault detection.

Fig. 6. Calculation of the evaluation metrics.

5. Experimental setup and dataset

The proposed NSVDD-based semi-supervised methodology is tested and validated using the NASA Intelligent Maintenance Systems (IMS) dataset [27]. The IMS dataset was collected during a rolling element bearing endurance experiment using a dedicated test rig at the University of Cincinnati. The layout of the test rig is shown in Fig. 8, and consists of an electric motor coupled with a rub belt operating at 2000 rpm, four (4) double row Rexnord ZA-2115 bearings mounted on a common shaft lubricated by a circulation system, a 6000 lbs radial load applied on Bearings 2 and 3, and a shaft. Eight PCB 353B33 High Sensitivity Quartz ICP accelerometers are mounted horizontally and vertically on the bearing housings. Three run-to-failure tests have been performed. The collection of data was stopped when the accumulated debris on a magnetic plug exceeded a certain level, and that event was considered as the end of the bearing life. The physical parameters and the characteristic frequencies can be found in Fig. 8. Some details concerning the datasets should be highlighted. According to the document describing the experiment, the sampling frequency was 20 kHz, but the duration of each signal is one second (1 s) and its length is 20,480 samples. Therefore the sampling frequency was probably not 20 kHz but 20.48 kHz, and the latter is used in this paper. Moreover, the measurements of Dataset 1 are not continuous; several interruptions can be detected. These interruptions influence the behaviour of some HIs and this issue will be discussed in the following section. Each dataset signal is marked with the capturing time (Day) and the number of the signal (# signal number).
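As a rough illustration of how the three training levels of Section 4.5 can be assembled from the IMS recordings, the following sketch groups per-channel feature matrices into sensor-, machine- and fleet-level training sets. The `load_channel_features` helper, the healthy-fraction split and the exact grouping are assumptions for illustration only, not part of the original methodology.

```python
import numpy as np

# Hypothetical helper: returns an (n_records, n_features) array of CS-based HIs
# for one accelerometer channel of one run-to-failure test (machine).
def load_channel_features(test_id: int, channel: int) -> np.ndarray:
    raise NotImplementedError("replace with the actual feature extraction pipeline")

# Channels available per IMS test: Dataset 1 has 8 channels, Datasets 2 and 3 have 4.
CHANNELS = {1: range(1, 9), 2: range(1, 5), 3: range(1, 5)}

def sensor_level(test_id, channel, healthy_fraction=0.25):
    """Training set: healthy (early-life) portion of a single channel."""
    feats = load_channel_features(test_id, channel)
    return feats[: int(healthy_fraction * len(feats))]

def machine_level(test_id, healthy_fraction=0.25):
    """Training set: healthy portions of all channels of one test (machine)."""
    return np.vstack([sensor_level(test_id, ch, healthy_fraction)
                      for ch in CHANNELS[test_id]])

def fleet_level(healthy_fraction=0.25):
    """Training set: healthy portions of all channels of all three tests (fleet)."""
    return np.vstack([machine_level(t, healthy_fraction) for t in CHANNELS])
```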

6. Application of the method and analysis of the results

The three datasets, including signals from twelve (12) bearings, are processed using the proposed method. The results are discussed in three sub-sections according to the three levels of model training. At the sensor level, the data from each single channel are divided into a training set and a testing set, to train and test the NSVDD model. At the machine level, the data from all the sensors of each test (three in total) are used to train the model, but the testing set contains only data from each single sensor channel. Moreover, the three tests can be regarded as data from three independent machines which form a fleet. Therefore, the fleet level anomaly detection is performed by training with data from all the tests, and testing on each single sensor's data. The classification performance of a method is usually evaluated based on a number of metrics, e.g. the Precision, the Recall, the Accuracy and the F-measure. In the condition monitoring context, the true time instance at which a fault is generated is practically unknown and absent from all publicly available datasets. In order to be able to use metrics to evaluate in a consistent way the efficacy of the proposed semi-supervised learning classification method, a time of detection is needed as a reference.

Fig. 7. The anomaly detection framework with NSVDD model, evaluated on three levels: i) sensor level, ii) machine level and iii) fleet level.

Physical parameters (Fig. 8, right): Pitch diameter 71.5 mm; Rolling element diameter 8.4 mm; Number of rolling elements per row 16; Contact angle 15.7°; Static load 26690 N.
Characteristic frequencies (Fig. 8, right): Shaft rotation speed 33 Hz; Ball pass frequency outer race (BPFO) 236 Hz; Ball pass frequency inner race (BPFI) 297 Hz; Ball spin frequency (BSF) 139 Hz; Fundamental train frequency (FTF) 15 Hz.

Fig. 8. IMS test rig layout (left) and characteristics of bearings (right).

Table 3
Fault detection results for the NASA IMS dataset based on a literature review from 2012 till November 2019. Rows: SVM [28], OC-SVM [21], ν-SVM [29], PCA-SVDD [21], HMM-DPCA [30], HMM-PCA [30], Self-EES [15], Discrepancy analysis [32], Spectral model [31], Auto-encoder [33], LSTM [21], GAN [34]. Columns: Dataset 1 (Bearings 1–4, Channels 1–8), Dataset 2 (Bearings 1–4, Channels 1–4) and Dataset 3 (Bearings 1–4, Channels 1–4). Entries report the time of first detection as a signal number and in days; the "–" symbol is used when the result for the corresponding channel is not given in the corresponding reference.


Table 4
Parameters of the three classifiers: Back-Propagation Neural Network, Random Forest, K-Nearest Neighbour.

BPNN: Number of layers: 3 (100-20-1); Activation function: Sigmoid; Learning rate: 0.01.
Random Forest: Number of trees: 500; Criterion: Mean Squared Error (MSE).
KNN: K = 20; Weights: Uniform.
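For reference, the three comparison classifiers of Table 4 can be instantiated roughly as follows with scikit-learn; the mapping of the listed settings onto scikit-learn arguments (the 100-20 hidden-layer sizes, "logistic" for the Sigmoid activation, and the criterion handling) is an assumption of this sketch, not a statement about the authors' implementation.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Back-Propagation Neural Network: 3 layers (100-20-1), sigmoid activation, lr = 0.01.
bpnn = MLPClassifier(hidden_layer_sizes=(100, 20), activation="logistic",
                     learning_rate_init=0.01)

# Random Forest: 500 trees. (Table 4 lists an MSE criterion, which in scikit-learn applies
# to regression forests; a classification forest would use e.g. the default "gini".)
random_forest = RandomForestClassifier(n_estimators=500)

# K-Nearest Neighbour: K = 20, uniform weights.
knn = KNeighborsClassifier(n_neighbors=20, weights="uniform")
```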

Fig. 9. Feature maps of Channel 5, measured on Bearing 3 of Dataset 1: (a) Time domain (b) FK based SES domain (c) CSC domain (d) CSCoh domain.

Fig. 10. Detection results of Channel 5, measured on Bearing 3 of Dataset 1, with CSC HIs: (a) distances to the center of the NSVDD (b) decision results.

Therefore an extensive literature review has been realised and the results of the recently published papers are presented in the following section.

6.1. Literature results and comparative methods

The NASA IMS dataset has been widely used in order to evaluate the performance of diagnostic indicators and anomaly detection methodologies. Detection results published from 2012 till now (November 2019) are summarized in Table 3, mentioning the time of first detection in days and as a signal number. The "–" symbol is used to signal that the result of the corresponding channel is not given in the corresponding reference.


Fig. 11. Detection results of Channel 5, measured on Bearing 3 of Dataset 1, with CSCoh HIs: (a) distances to the center of the NSVDD (b) decision results.

Fig. 12. Detection results of Dataset 1 Bearing 3 Channel 5 with Time HIs (a) distances to the center of the NSVDD (b) decision results.

Fig. 13. Detection results of Dataset 1 Bearing 3 Channel 5 with SES HIs (a) distances to the center of the NSVDD (b) decision results.

The results depict that various anomaly detection techniques based on machine learning algorithms [28,21,29,4], statistical methods [15,30–32] and deep learning frameworks [21,33,34] have been tested on the IMS datasets. Despite the performance of these methods, it is surprising to realise that so far no research has been published with a comprehensive analysis of the complete dataset, including the 12 bearings and the 3 datasets. The existing literature where the IMS dataset is used tends to concentrate more on the faulty bearings (Bearing 3 and Bearing 4 of Dataset 1, Bearing 1 of Dataset 2 and Bearing 3 of Dataset 3).


Fig. 14. ROC curves and the AUC for different domain HIs.

Table 5
Classification performance for different classifiers on Dataset 1 Bearing 3.

Classifier | Precision | Recall | Accuracy | F-measure | AUC
CSC based NSVDD | 0.896 | 1.000 | 0.920 | 0.945 | 0.983
CSCoh based NSVDD | 0.882 | 0.994 | 0.905 | 0.935 | 0.961
Time based NSVDD | 0.841 | 0.961 | 0.851 | 0.897 | 0.916
FK based SES NSVDD | 0.617 | 0.985 | 0.697 | 0.759 | 0.932
CSC based SVDD | 0.884 | 0.945 | 0.871 | 0.914 | 0.958
BPNN | 0.808 | 0.937 | 0.810 | 0.868 | 0.975
Random Forest | 0.826 | 0.887 | 0.785 | 0.855 | 0.921
KNN | 0.567 | 0.904 | 0.619 | 0.697 | 0.905

Fig. 15. Feature maps of Dataset 1 Bearing 4 Channel 7 (a) Time domain (b) FK based SES domain (c) CSC domain (d) CSCoh domain.


Fig. 16. Detection results of Dataset 1 Bearing 4 Channel 7 with CSC HIs (a) distances to the center of the NSVDD (b) decision results.

Table 6
Fault detection results of IMS Dataset 1.

Method | Bearing 1, Ch. 1 | Bearing 1, Ch. 2 | Bearing 2, Ch. 3 | Bearing 2, Ch. 4 | Bearing 3, Ch. 5 | Bearing 3, Ch. 6 | Bearing 4, Ch. 7 | Bearing 4, Ch. 8
CSC-based NSVDD | #2140 (34.1 Days) | No detection | #1840 (31.2 Days) | #2112 (33.4 Days) | #1796 (30.9 Days) | #1823 (31.1 Days) | #831 (18.7 Days) | #899 (23.0 Days)
CSCoh-based NSVDD | No detection | No detection | #2123 (34.0 Days) | #2127 (34.0 Days) | #1796 (30.9 Days) | #1884 (31.6 Days) | #845 (18.8 Days) | #878 (19.1 Days)
Time feature NSVDD | #2152 (34.2 Days) | No detection | #2105 (33.4 Days) | #2123 (34.0 Days) | #1852 (31.3 Days) | #1926 (31.9 Days) | #1449 (28.0 Days) | #1462 (28.1 Days)
FK-based SES NSVDD | No detection | No detection | #1827 (31.1 Days) | #2147 (34.1 Days) | #1804 (31.0 Days) | #1897 (31.7 Days) | #1032 (23.9 Days) | #1043 (24.0 Days)
CSC-based SVDD | #2140 (34.1 Days) | No detection | #1811 (31.0 Days) | #2120 (33.9 Days) | #1799 (30.9 Days) | #1842 (31.2 Days) | #847 (18.9 Days) | #872 (19.0 Days)
BPNN | #2145 (34.1 Days) | No detection | #1828 (31.1 Days) | #2120 (33.9 Days) | #1802 (30.9 Days) | #1834 (31.2 Days) | #831 (18.7 Days) | #912 (23.1 Days)
Random Forest | #2157 (34.2 Days) | No detection | #2025 (32.7 Days) | No detection | #1858 (32.8 Days) | #2079 (33.2 Days) | #1461 (28.1 Days) | #1557 (29.1 Days)
KNN | No detection | No detection | #2130 (34.0 Days) | No detection | #2026 (32.7 Days) | #2070 (33.2 Days) | #1473 (28.2 Days) | #1591 (29.3 Days)

However, it is worth examining the data of the healthy bearings within the same detection framework in order to evaluate the rate of false alarms. Moreover, the effective evaluation of a detection method should be conducted based on the ground truth of the bearing fault status, i.e. the exact time point of fault generation. In this paper, the detection results from the first row of Table 3 are regarded as the reference, since the methodology used in [28] has been consistently applied on two cases of the faulty bearings. As mentioned before, this specific reference was selected, based on the authors' literature knowledge and experience, in order to be able to evaluate the anomaly detection method in a better way using dedicated metrics. The interested reader could of course select another reference for comparison. Moreover, in order to better evaluate its classification and anomaly detection performance, the proposed methodology is compared with three machine learning methods: the Back-Propagation Neural Network (BPNN), the Random Forest (RF) and the K-Nearest Neighbour (KNN). It should be mentioned that SVDD is a semi-supervised learning method, but the trick of the generation of the artificial outliers allows for a direct and consistent comparison with supervised learning methods. The selection of the key parameters of the three classifiers is shown in Table 4.

6.2. Sensor level detection

6.2.1. Dataset 1

In order to compare the classification results of different HIs, the time domain HIs, the FK based SES domain HIs, the CSC domain HIs and the CSCoh domain HIs, after normalization, estimated by processing the data of Channel 5 of Bearing 3 of Dataset 1, are presented in the maps of Fig. 9. It should be noted that the time axes are not in equal intervals for these maps, as well as for the other figures involving Dataset 1 in this section, due to the interruptions mentioned before. The time axis represents the real time points in days from the beginning (Day 0) till the end of the experiment (Day 34.5), not the recording time. Analysing the time features, the FK based SES features, the CSC features and the CSCoh features, it can be visually observed that there is a significant step change before Day 7.1 (#157), due to a long period of interruption.


Fig. 17. Feature map of Dataset 2 Bearing 1 Channel 1 (a) Time domain (b) FK based SES domain (c) CSC domain (d) CSCoh domain.

Fig. 18. Detection results of Dataset 2 Bearing 1 with CSC HIs (a) distances to the center of the NSVDD (b) decision results.

Therefore, the data before Day 7.1 are abandoned to avoid contamination of the healthy training set, which is selected from Day 7.4 (#200) to Day 17 (#600). The remaining data are used for testing. The training and testing results of Bearing 3 of Dataset 1 are demonstrated by plotting the distances of the samples to the center of the NSVDD, as well as the damage stages based on the proposed decision strategy. The results of the CSC and the CSCoh HIs are shown, respectively, in Figs. 10 and 11. The automatic parameter optimization method is also used in order to estimate the optimal values of $s$, $\nu_1$ and $\nu_2$. It can be found that the warning stage for the CSC HIs lasts from Day 29.1 (#1538) till Day 30.9 (#1795) and the fault is detected at Day 30.9 (#1796). Meanwhile, for the CSCoh HIs, the warning stage is about 5 days and the fault is also detected at Day 30.9 (#1796). The HIs from the time and the FK based SES domains are also used to train the NSVDD model, as shown in Figs. 12 and 13. Compared to the results of the CSC HIs, it is obvious that the distances of the time and the FK based SES indicators exhibit wide fluctuations around the threshold. The outliers above the threshold spread over the whole testing range for both HI groups. Hence, the fault detection results show longer warning stages for the two groups of, respectively, 13.3 and 8 days. The time indicators detect the fault at Day 31.3 (#1851) and the FK based SES indicators give a detection at Day 31.0 (#1801).


Fig. 19. Detection results of Dataset 2 Bearing 1 with CSCoh HIs (a) distances to the center of the NSVDD (b) decision results.

Table 7
Fault detection results of IMS Dataset 2.

Method | Bearing 1 (Ch. 1) | Bearing 2 (Ch. 2) | Bearing 3 (Ch. 3) | Bearing 4 (Ch. 4)
CSC-based NSVDD | #532 (3.7 Days) | #817 (5.7 Days) | #656 (4.5 Days) | #703 (4.9 Days)
CSCoh-based NSVDD | #534 (3.7 Days) | #952 (6.6 Days) | #701 (4.9 Days) | #709 (4.9 Days)
Time feature NSVDD | #550 (3.8 Days) | #954 (6.6 Days) | #682 (4.7 Days) | #709 (4.9 Days)
FK-based SES NSVDD | #541 (3.7 Days) | #952 (6.6 Days) | #701 (4.9 Days) | #706 (4.9 Days)
CSC-based SVDD | #540 (3.7 Days) | #820 (5.7 Days) | #663 (4.6 Days) | #702 (4.9 Days)
BPNN | #570 (3.9 Days) | #955 (6.6 Days) | #683 (4.7 Days) | #725 (5.0 Days)
Random Forest | #653 (4.5 Days) | No detection | #709 (4.9 Days) | #862 (6.0 Days)
KNN | #612 (4.2 Days) | No detection | #706 (4.9 Days) | #812 (5.6 Days)

The detection results show that the time and the FK based SES indicators are sensitive to changes in the external environment due to the recording interruptions. On the contrary, the distances of the NSVDD results from both the CSC and the CSCoh indicators remain steady during most of the testing period, which indicates that the CS-based HIs are robust in detecting the bearing faults. The Receiver Operating Characteristic (ROC) curve is used to evaluate the classification performance of the features from the different domains, as shown in Fig. 14. The corresponding Area Under the ROC Curve (AUC) is used as the metric to evaluate the classification performance. The classification results from the confusion matrices, together with the AUC values for the different domain HIs and the three other classifiers, can be found in Table 5. It can be noted that the CSC indicators achieve better performance compared to the other indicators in the NSVDD frame. Additionally, it is worth mentioning that the recall value of the standard SVDD decreases when using the same input HIs, which indicates the presence of more false negative samples compared to the NSVDD. Due to the lack of a constraint on the margin of the outliers, the standard SVDD fails to describe a more accurate hyper-sphere and results in a low recall value. Compared to the other types of classifiers, it can be concluded that the CSC-NSVDD model presents superior results in the accurate detection of the bearing faults.

The second reported faulty bearing of Dataset 1 is Bearing 4, presenting a rolling element defect. The feature maps of this bearing for the different domain HIs are shown in Fig. 15. The CSC indicators detect the fault initiation at Day 17.4 (#658) and the fault becomes mature at Day 18.7 (#831), as shown in Fig. 16. The detailed detection results for the four bearings of Dataset 1 are listed in Table 6. The first two rows indicate the results of the CSC and the CSCoh HIs and the 3rd and 4th rows represent the results from the time and the FK based SES domain features. To compare the results between the standard SVDD and the proposed NSVDD, the CSC HIs are also used as inputs to a standard SVDD model, resulting in the 5th row. Furthermore, the BPNN, the Random Forest and the KNN are also applied on the CSC features, as shown in the 6th to the 8th rows. The earliest detection for each channel is marked in bold.

Although Bearing 3 and Bearing 4 are the bearings reported as faulty, anomalies are also detected on the other two bearings. The same phenomenon is also found in the other two datasets. The detection of such "false alarms" could be due to two reasons:

(1) Since the four bearings are assembled on the same shaft, the vibration caused by the faulty bearing may be transmitted to the healthy ones.


Fig. 20. CSC HIs of Dataset 2 Bearing 1 with different kernel widths s: (a) s = 2.63 (b) s = 5.26 (c) s = 7.90 (d) s = 10.54.

(2) Cross-talk may occur in the vibration measurement system, causing one channel to capture the signal of the others.

6.2.2. Dataset 2

The HI maps of Bearing 1 of Dataset 2 are shown in Fig. 17. Since the measurement of this dataset is continuous, without any interruption, all the indicators behave more monotonically than in Dataset 1. A clear separation, corresponding to a dramatic increase of the HIs, can be seen at Day 4.9 for all the indicators. On the other hand, the HIs from the CSC and the CSCoh show an earlier increase at Day 3.8 for the indicators constructed with the harmonics and the sum of the BPFO, which is the fault type in this case.

The distances from the NSVDD and the fault detection results of the CSC-based indicators are shown in Fig. 18. The training set is selected from Day 0.3 (#50) to Day 2.1 (#300) and the remaining data are used for testing. Fewer distances cross the threshold, highlighting the monotonicity of the indicators. The warning stage lasts 0.1 days, from Day 3.6 (#521) to Day 3.7 (#531), and is relatively short because the indicators present a clearer trend. The bearing fault is then detected at Day 3.7 (#532). The CSCoh domain HIs show similar results, with a detection at Day 3.7 (#534), as shown in Fig. 19. More results can be found in Table 7. The CS-based indicators provide the earliest detection compared to the time indicators (Day 3.8, #550) and the FK based SES indicators (Day 3.7, #541). The NSVDD also outperforms the standard SVDD and the three other classifiers. The amplitudes of the characteristic frequencies and their harmonics in the EES present higher monotonicity and trendability, which supports the construction of the feature space during the training of the NSVDD model. As a result, the anomaly detection performs more proactively when the CS-based indicators are used.

In order to validate the proposed parameter optimization method, NSVDD models with various kernel widths s are tested with the CSC-based HIs on the data of Bearing 1 of Dataset 2. The distances of some of the trained models are plotted in Fig. 20. In this test, the other two parameters ν1 and ν2 are kept at 10^-4 since they have a comparatively minor influence on the classification performance. It can be observed that, as s increases, the distances of the testing set are separated more clearly into two different stages. Meanwhile, the threshold also changes with the kernel width towards an optimal value with respect to the target rejection and the outlier acceptance fractions. Table 8 shows the influence of s on the training error K and the testing performance (AUC). A lower training error leads to a higher AUC and therefore to a better classification model.
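To make this parameter sweep concrete, the short sketch below reproduces its logic under stated assumptions: scikit-learn's OneClassSVM with an RBF kernel is used as a stand-in for the NSVDD (which is not available as an off-the-shelf library model), the healthy and faulty feature matrices are synthetic placeholders for the CSC HIs, and the mapping gamma = 1/(2 s^2) as well as the nu value are illustrative choices; only the kernel widths are those of Table 8.

    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X_train = rng.normal(0.0, 1.0, size=(250, 6))              # healthy HIs only
    X_test = np.vstack([rng.normal(0.0, 1.0, size=(400, 6)),   # healthy records
                        rng.normal(3.0, 1.0, size=(100, 6))])  # faulty records
    y_test = np.r_[np.zeros(400), np.ones(100)]                # 1 = faulty

    for s in [2.63, 5.26, 7.90, 10.54, 13.17, 15.80, 18.43, 21.07]:
        gamma = 1.0 / (2.0 * s ** 2)             # RBF width -> scikit-learn gamma
        model = OneClassSVM(kernel="rbf", gamma=gamma, nu=1e-2).fit(X_train)
        train_err = np.mean(model.predict(X_train) == -1)  # rejected target samples
        scores = -model.decision_function(X_test)          # larger = more anomalous
        print(f"s = {s:5.2f}  training error = {train_err:.3f}  "
              f"AUC = {roc_auc_score(y_test, scores):.3f}")

As in Table 8, the width that minimises the training error (and maximises the AUC on the labelled test set) would be the one retained for the final model.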


Table 8
Training error and testing performance with different kernel width s.

s    | 10^-4 | 2.63  | 5.26  | 7.90  | 10.54 | 13.17 | 15.80 | 18.43 | 21.07
K    | 0.892 | 0.436 | 0.133 | 0.073 | 0.071 | 0.060 | 0.058 | 0.057 | 0.058
AUC  | 0.523 | 0.655 | 0.762 | 0.803 | 0.852 | 0.968 | 0.980 | 0.982 | 0.973

Fig. 21. Feature map of Dataset 3 Bearing 3 (a) Time domain (b) FK based SES domain (c) CSC domain (d) CSCoh domain.

Fig. 22. Detection results of Dataset 3 Bearing 3 with CSC HIs (a) distances to the center of the NSVDD (b) decision results.

6.2.3. Dataset 3

The fault of Bearing 3 of Dataset 3 develops at the end of the test, as can be seen from the feature maps in Fig. 21. All four groups of HIs remain steady until around Day 40, close to the end of the test. The detection results of the CSC and the CSCoh HIs are shown in Figs. 22 and 23, and the detection results for all the bearings of Dataset 3 are listed in Table 9.


Fig. 23. Detection results of Dataset 3 Bearing 3 with CSCoh HIs (a) distances to the center of the NSVDD (b) decision results.

Fig. 24. Machine level detection for (a) Dataset 1 Bearing 3 (b) Dataset 1 Bearing 4 (c) Dataset 2 Bearing 1 (d) Dataset 3 Bearing 3.

6.3. Machine level detection

The machine level detection aims to train a single model with all the sensor level training sets of one dataset, which is then tested independently with the testing set of each sensor. For instance, for the machine level detection of Dataset 1, the NSVDD model is trained with the healthy data (#200 to #600) of all eight channels, which form a joint training set of 3208 samples, as would be the case in a real application. The trained NSVDD model is then tested with the data range #601 to #2156 of each channel. The CSC HIs are used for the machine level detection, as shown in Fig. 24, and the detection results for the faulty bearings are listed in Table 10.
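A minimal sketch of this pooling scheme, under stated assumptions, is given below: scikit-learn's OneClassSVM again stands in for the NSVDD, the per-channel feature matrices are synthetic placeholders sized as in Dataset 1 (401 healthy records per channel for training, 1556 remaining records per channel for testing), and the kernel parameters are purely illustrative. The fleet level scheme of Section 6.4 follows the same pattern, with the training pool extended to the sub-sets of all 12 bearings.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(1)
    n_channels, n_features = 8, 6
    # healthy training windows per channel (records #200 to #600 in the paper)
    train_per_channel = [rng.normal(0.0, 1.0, size=(401, n_features))
                         for _ in range(n_channels)]
    # remaining records per channel, screened for anomalies after training
    test_per_channel = [rng.normal(0.0, 1.0, size=(1556, n_features))
                        for _ in range(n_channels)]

    X_train = np.vstack(train_per_channel)       # joint training set (3208 x 6)
    model = OneClassSVM(kernel="rbf", gamma=0.05, nu=1e-2).fit(X_train)

    for ch, X_test in enumerate(test_per_channel, start=1):
        rejected = model.predict(X_test) == -1   # True where a record is flagged
        first = int(np.argmax(rejected)) if rejected.any() else None
        print(f"channel {ch}: first flagged record = {first}")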


Table 9
Fault detection results of IMS Dataset 3.

Method              | Bearing 1 (Ch. 1)  | Bearing 2 (Ch. 2)  | Bearing 3 (Ch. 3)  | Bearing 4 (Ch. 4)
CSC-based NSVDD     | #5977 (41.5 Days)  | #6153 (42.7 Days)  | #5973 (41.5 Days)  | #6080 (42.2 Days)
CSCoh-based NSVDD   | #6015 (41.8 Days)  | #6173 (42.9 Days)  | #6189 (42.9 Days)  | #6102 (42.2 Days)
Time feature NSVDD  | #6132 (42.6 Days)  | #6203 (43.0 Days)  | #6227 (43.2 Days)  | #6105 (42.2 Days)
FK-based SES NSVDD  | #6020 (41.8 Days)  | #6173 (42.9 Days)  | #6231 (43.2 Days)  | #6094 (42.1 Days)
CSC-based SVDD      | #5977 (41.5 Days)  | #6157 (42.7 Days)  | #5980 (41.5 Days)  | #6092 (42.3 Days)
BPNN                | #6107 (42.4 Days)  | #6172 (42.4 Days)  | #6196 (42.3 Days)  | #6098 (42.3 Days)
Random Forest       | #6225 (43.2 Days)  | #6226 (43.2 Days)  | #6213 (43.1 Days)  | #6219 (43.2 Days)
KNN                 | #6172 (42.9 Days)  | #6190 (43.0 Days)  | #6182 (42.9 Days)  | #6131 (42.6 Days)

Table 10
Test level fault detection results of CSC-NSVDD model.

                | Dataset 1 Bearing 3 | Dataset 1 Bearing 4 | Dataset 2 Bearing 1 | Dataset 3 Bearing 3
Detection date  | Day 31.2 (#1837)    | Day 28.3 (#1487)    | Day 5.1 (#736)      | Day 43.6 (#6275)

Fig. 25. Fleet level detection for (a) Dataset 1 Bearing 3 (b) Dataset 1 Bearing 4 (c) Dataset 2 Bearing 1 (d) Dataset 3 Bearing 3.

It can be found that the machine level detection results are close to the sensor level ones, but with a delay of 0.5 to 1.3 days in the detection of the mature bearing damage. On the one hand, the relatively small variance of the delay shows that the CSC HIs remain robust when the training set is more general and contains data from other sensors. On the other hand, training over a set of sensors appears to slightly decrease the detection accuracy.

Table 11
Fleet level fault detection results of CSC-NSVDD model.

                | Dataset 1 Bearing 3 | Dataset 1 Bearing 4 | Dataset 2 Bearing 1 | Dataset 3 Bearing 3
Detection date  | Day 32 (#2083)      | Day 29.6 (#1627)    | Day 3.9 (#556)      | Day 43.8 (#6315)

6.4. Fleet level detection

The fleet level detection is furthermore proposed in order to evaluate the NSVDD model under the assumption that data from several similar machines are available. Since the three tests of the IMS dataset have been realised independently with the same setup, they can be regarded as a fleet of three identical machines. The training set of the fleet level detection includes all the sub-set training data of the 12 bearings, and each single bearing dataset is then tested with the trained model. The distances to the NSVDD center for the faulty bearings are shown in Fig. 25, and the fault detection results are listed in Table 11. Even with the training set enlarged with multi-source data, the NSVDD classifier with the CSC HIs can still detect all the bearing faults, which further proves the robustness and the efficacy of the method. However, most of the detections are delayed compared to the sensor and the machine level results, with the exception of Bearing 1 of Dataset 2, whose fault is detected 1.2 days earlier than at the machine level. A possible reason is that the features from the other sources reduce the radius of the hyper-sphere, which makes the NSVDD reject more samples that are supposed to belong to the target class and finally leads to an earlier detection.

7. Conclusion

In this paper, a novel automatic fault detection approach for rolling element bearings is proposed by combining powerful cyclostationary-based indicators with a semi-supervised SVDD model with artificial outliers. Experimental results from run-to-failure bearing datasets prove that bearing faults can be accurately detected with the proposed methodology. The proposed NSVDD model utilizes artificial outliers to achieve a better classification performance compared to the standard SVDD, which has been demonstrated on both artificial and real bearing test datasets. An automatic parameter optimization algorithm is additionally embedded in the framework to improve the classification accuracy. Compared to the indicators from the time and the FK based SES domains, the cyclostationary-based diagnostic indicators are more robust to changes in the external environment and improve the performance of the NSVDD classifier. The proposed methodology outperforms other classifiers, including the BPNN, the Random Forest and the KNN. Furthermore, in order to transform the classification results into a reasonable anomaly detection decision, a fault detection decision strategy has been proposed, which maps the classification results into three stages representing different damage severity levels. In addition, the CSC-NSVDD method is able to detect faults based on models trained at the sensor, the machine and the fleet level. To conclude, the method appears to have strong practical significance for industrial applications, since it can provide information for both bearing failure warning and damage detection.

Acknowledgement

Chenyu Liu would like to acknowledge the support from the China Scholarship Council.
Gryllias, A discrepancy analysis methodology for rolling element bearing diagnostics under variable speed conditions, Mech. Syst. Signal Process. 116 (2019) 40–61. [33] R.M. Hasani, G. Wang, R. Grosu, An automated auto-encoder correlation-based health-monitoring and prognostic method for machine bearings, 2017. arXiv preprint arXiv:1703.06272. . [34] S. Baggeröhr, W. Booyse, P. Heyns, D. Wilke, Novel bearing fault detection using generative adversarial networks, Condition Monitor. Diagnostic Eng. Manage. (2018) 243.