A novel geodesic flow kernel based domain adaptation approach for intelligent fault diagnosis under varying working condition




Journal Pre-proof

Zhongwei Zhang, Huaihai Chen, Shunming Li, Zenghui An, Jinrui Wang

PII: S0925-2312(19)31355-4
DOI: https://doi.org/10.1016/j.neucom.2019.09.081
Reference: NEUCOM 21336

To appear in: Neurocomputing

Received date: 22 February 2019
Revised date: 6 July 2019
Accepted date: 28 September 2019

Please cite this article as: Zhongwei Zhang, Huaihai Chen, Shunming Li, Zenghui An, Jinrui Wang, A novel geodesic flow kernel based domain adaptation approach for intelligent fault diagnosis under varying working condition, Neurocomputing (2019), doi: https://doi.org/10.1016/j.neucom.2019.09.081

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

Zhongwei Zhang¹, Huaihai Chen¹, Shunming Li², Zenghui An², Jinrui Wang³
1. State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
2. College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
3. College of Mechanical and Electronic Engineering, Shandong University of Science and Technology, Qingdao 266590, China

Abstract: Domain adaptation techniques have drawn much attention in mechanical defect diagnosis in recent years. Nevertheless, traditional domain adaptation approaches may suffer from two shortcomings: (1) many of them perform poorly when samples are insufficient; (2) their diagnosis results are unstable, that is, they may have poor robustness. To overcome these deficiencies, we propose a novel domain adaptation model named DAGSZ, based on the geodesic flow kernel (GFK), strengthened feature extraction and Z-score normalization. Firstly, time-domain averaging and squaring of the power spectral density (PSD) matrix are applied to preprocess the original vibration data and learn more representative features. Then, GFK, an unsupervised domain adaptation method, is adopted to learn transferable features. Finally, Z-score normalization is employed to normalize the learned transferable features, and softmax regression is used to classify the health conditions. Real-world gear and bearing datasets are employed to validate the effectiveness and robustness of our method. The results show that DAGSZ obtains fairly high detection accuracies and is superior to existing methods for mechanical fault detection.
Keywords: Domain adaptation; GFK; strengthened feature extraction; Z-score normalization.

1. Introduction
Due to the rapid development of mechanical technology, agricultural mechanization has become an important feature of modern agriculture. The fault rate of agricultural machinery is relatively high because of harsh and varied working conditions. The engine is the vital component that provides power for agricultural machinery, and any engine fault may result in unwanted fatal breakdowns, expensive repairs and even additional human labor. Defects in key engine components (e.g. bearings and gears) therefore need to be diagnosed as early as possible. Consequently, defect detection for bearings and gears has drawn extensive attention as a way to ensure the normal operation of the engine [1-4].

In recent years, mechanical defect diagnosis has entered the era of Big Data [5], which is characterized by large volume, diversity and high velocity. Thus, how to extract features rapidly and accurately from mechanical big data has become an urgent research subject. Unfortunately, traditional defect diagnosis approaches perform poorly when analyzing faults of complex structures, and they depend heavily on prior knowledge of signal processing techniques and diagnostic expertise. To overcome these shortcomings, mechanical defect diagnosis has moved from traditional technologies to machine learning techniques, such as deep belief networks (DBNs) [6], stacked auto-encoders (SAEs) [7], restricted Boltzmann machines (RBMs) [8] and convolutional neural networks (CNNs) [9, 10]. These techniques depend less on human knowledge and achieve good performance in many fault diagnosis problems. Jia et al. [11] proposed a five-layer SAE-based deep neural network (DNN) framework that takes frequency-domain data as input for motor bearing defect detection. Li et al. [12] employed a deep random forest to fuse the outputs of two deep Boltzmann machines (DBMs) for gearbox fault diagnosis.

Machine learning has recently achieved marvelous success in defect diagnosis [13]. However, traditional machine learning algorithms perform well only under a common assumption: the training and test data are drawn from the same probability distribution. In real-world applications, raw vibration data are usually acquired under different working conditions, which violates this assumption and leads to large distribution divergences that may cause the performance of intelligent defect diagnosis approaches to drop dramatically. This phenomenon is usually known as the cross-domain learning problem. A cross-domain learning problem involves two distinct domains: the source domain and the target domain. The source domain includes a large amount of labeled data used to build the model, although in many real-world applications acquiring enough labeled data is expensive or even impossible. The target domain contains a related dataset with a different distribution, to which the model is applied. To build robust classifiers, it is critical to account for the shift between the source and target domains. This issue is called domain adaptation (DA).

The learning strategies of most published DA works can be roughly classified into two categories. (1) Instance-based approaches [14, 15] reweight the source data according to the information shared with the target data and then perform further analysis on the reweighted source data [19-21]. Peng et al. [14] performed instance reweighting by imposing an ℓ2,1 norm on the embedding matrix. Zhang et al. [15] developed a novel framework based on the multi-instance learning (MIL) algorithm that takes the distribution change into account. (2) Subspace-based approaches [16-18] usually learn a common feature subspace so that a classifier trained in the source domain generalizes well in the target domain [22-31]. A domain space transfer extreme learning machine (DST-ELM) was developed for unsupervised domain adaptation problems; it requires only a small number of hidden nodes [29]. Das and Lee [28] proposed a novel graph-matching metric to minimize the cross-domain discrepancy, and then refined the model by exploiting confident unlabeled target samples and their pseudo-labels. Han et al. [30] developed a deep transfer network (DTN) with joint distribution adaptation (JDA), which converges smoothly and avoids negative adaptation compared with marginal distribution adaptation (MDA). Gong et al. [31] proposed a geodesic flow kernel (GFK)-based approach that infers important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain.

Domain adaptation has been applied in many areas, such as object recognition, image classification and natural language processing [32]. Nevertheless, only a few studies have applied domain adaptation to defect diagnosis [20, 22, 26, 38, 40]. Lu et al. [38] proposed a deep neural network for domain adaptation in fault diagnosis (DAFD) that extracts representative information from the original data. Shen et al. [20] used source instances as auxiliary data to assist target data classification. An et al. [40] proposed a model based on a multilayer multiple kernel variant of the maximum mean discrepancy (MMD) for rolling bearing defect diagnosis.
Most classical domain adaptation approaches work well only under the common assumption that a large amount of labeled source domain data and unlabeled target domain data covering all categories are available to train and test the model. However, many real fault detection applications violate this assumption, which may lead to poor detection performance. Taking bearing fault diagnosis as an example, only normal-condition samples can be easily obtained in practice. In addition, traditional domain adaptation approaches may have poor robustness in fault detection. To this end, our method is developed to overcome these two deficiencies of the subspace learning category. Considering actual defect diagnosis applications, we propose a novel domain adaptation model named DAGSZ, based on GFK, strengthened feature extraction and Z-score normalization, to learn transferable features from both the source and target domains. Softmax regression is then employed as a classifier to distinguish mechanical health conditions based on the learned transferable features. Our method significantly outperforms state-of-the-art domain adaptation methods. The main contributions of the proposed method are summarized as follows.
(1) To simulate actual applications of mechanical fault detection, the target domain consists only of normal data under the working load. Only a small amount of labeled source domain data and unlabeled target domain data are used to extract the transferable features, and high classification accuracy is finally obtained.
(2) The representative features of the raw vibration data are strengthened by time-domain averaging and squaring of the PSD matrix.
(3) Defect diagnosis of rolling bearings and gears is carried out separately to validate the effectiveness and robustness of the proposed method.

The rest of this paper is organized as follows. Section 2 briefly introduces the time-frequency transform, the geodesic flow kernel and the softmax regression algorithm. Section 3 presents the process and framework of the proposed approach (DAGSZ). In Section 4, the effectiveness of our method is first verified by fault diagnosis on a bearing dataset, and its robustness is then validated by defect diagnosis on a gear dataset. Finally, conclusions are drawn in Section 5.

2. Theoretical background
2.1. Time-frequency transform
Non-stationary vibration signals are commonly encountered in real-world applications, and time-frequency transform approaches generally perform well for processing such signals.
The classical time-frequency methods mainly include the short-time Fourier transform (STFT), the wavelet transform (WT) and the Hilbert-Huang transform [33]. The STFT is implemented as follows: (1) the vibration signal is segmented by a window function; (2) the Fourier transform is applied to each segment; (3) the relationship between signal frequency and time is obtained by shifting the position of the window function. Finally, the time-frequency characteristics are obtained [34]. Note that the vibration signal is assumed to be stationary within the window. The STFT is defined as

$$S_t(f) = \int_{-\infty}^{\infty} s(t-\tau)\, h(\tau)\, e^{-j 2\pi f \tau}\, d\tau \tag{1}$$

Where h( ) denotes the window function, and the distinct window functions are employed to obtain different short time spectra. 2.2. Geodesic flow kernel (GFK) The specific processes of geodesic flow kernel (GFK) algorithm are described as follows: (1) obtain the optimal dimension of the subspaces; (2) build geodesic flow; (3) calculate the geodesic flow kernel; (4) construct the classifier. (1) Obtain the optimal dimension of the subspaces Generally, we assume the data can be transformed into a low-dimensional linear subspace. The optimal dimension of the subspaces is required by aligning as much as possible the subspaces of the source and target domains. In addition, the optimal dimension can be confirmed by a subspace disagreement metric (SDM). The SDM is defined as follows, ( ) , (2) Where represents the d-th principle angle between the PCAS and PCAS+T, and between the PCAT and PCAS+T [36].


To find the optimal d, we employ a greedy strategy:
$$d^{*} = \min\left\{\, d \mid \mathcal{D}(d) = 1 \,\right\} \tag{3}$$

(2) Build geodesic flow

[Figure: geodesic flow between the source and target subspaces on the Grassmann manifold]
Fig. 1 Description of geodesic flow for domain adaptation [36]

In Euclidean space, the shortest path between two points is a straight line (dashed line in Fig. 1). On a Grassmann manifold, however, the distance between two points (subspaces) is measured along a geodesic (solid red line in Fig. 1). Let P_S, P_T ∈ ℝ^{D×d} denote the orthonormal bases obtained by principal component analysis (PCA) in the source and target domains, respectively, where D is the dimensionality of the data and d is the dimensionality of the subspace. In addition, R_S ∈ ℝ^{D×(D−d)} denotes the orthogonal complement to P_S, i.e. R_S^T P_S = 0. Using the canonical Euclidean metric of the Riemannian manifold, the geodesic flow Φ: t ∈ [0, 1] → Φ(t) ∈ G(d, D) is constrained by Φ(0) = P_S and Φ(1) = P_T; for other t, it is given by [35, 36]
$$\Phi(t) = P_S U_1 \Gamma(t) - R_S U_2 \Sigma(t) = \begin{bmatrix} P_S & R_S \end{bmatrix} \begin{bmatrix} U_1 \Gamma(t) \\ -U_2 \Sigma(t) \end{bmatrix} \tag{4}$$
where U_1 ∈ ℝ^{d×d} and U_2 ∈ ℝ^{(D−d)×d} are orthonormal matrices obtained by the singular value decompositions (SVDs)
$$P_S^{T} P_T = U_1 \Gamma V^{T}, \qquad R_S^{T} P_T = -U_2 \Sigma V^{T} \tag{5}$$
Γ and Σ are d×d diagonal matrices whose diagonal elements are cos θ_i and sin θ_i, respectively, for i = 1, 2, ..., d, where θ_i are the principal angles between P_S and P_T:
$$0 \le \theta_1 \le \theta_2 \le \cdots \le \theta_d \le \pi/2 \tag{6}$$
Furthermore, Γ(t) and Σ(t) are diagonal matrices whose diagonal elements are cos(tθ_i) and sin(tθ_i), respectively.

(3) Calculate the geodesic flow kernel (GFK)
According to Eq. (4), the geodesic flow can be viewed as a collection of infinitely many subspaces that change gradually from the source domain to the target domain, i.e. from Φ(0) = P_S to Φ(1) = P_T. For two original D-dimensional feature vectors x_i and x_j, their projections Φ(t)^T x_i and Φ(t)^T x_j are computed for continuous t from 0 to 1, and all the projections are concatenated into the infinite-dimensional feature vectors z_i^∞ and z_j^∞. The inner product between the two projected feature vectors defines the geodesic flow kernel [35]:
$$\langle z_i^{\infty}, z_j^{\infty} \rangle = \int_0^1 \left(\Phi(t)^{T} x_i\right)^{T} \left(\Phi(t)^{T} x_j\right) dt = x_i^{T} G\, x_j \tag{7}$$
where G ∈ ℝ^{D×D} is a positive semidefinite matrix:
$$G = \int_0^1 \Phi(t)\, \Phi(t)^{T}\, dt \tag{8}$$
G can be computed in closed form from the previously defined matrices:
$$G = \begin{bmatrix} P_S U_1 & R_S U_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & \Lambda_2 \\ \Lambda_2 & \Lambda_3 \end{bmatrix} \begin{bmatrix} U_1^{T} P_S^{T} \\ U_2^{T} R_S^{T} \end{bmatrix} \tag{9}$$
where Λ_1 to Λ_3 are diagonal matrices whose diagonal elements are
$$\lambda_{1i} = \frac{1}{2}\left(1 + \frac{\sin 2\theta_i}{2\theta_i}\right), \qquad \lambda_{2i} = \frac{1}{2}\cdot\frac{\cos 2\theta_i - 1}{2\theta_i}, \qquad \lambda_{3i} = \frac{1}{2}\left(1 - \frac{\sin 2\theta_i}{2\theta_i}\right) \tag{10}$$
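The closed form of Eqs. (4)-(10) can be checked numerically. The NumPy sketch below is an illustration under stated assumptions rather than the reference GFK implementation: R_S is taken from a complete QR factorization of P_S, U_2 is recovered column by column from R_S^T P_T V = −U_2 Σ (columns with θ_i ≈ 0 are set to zero because their Λ_2 and Λ_3 weights vanish), and the helper names are hypothetical.

```python
import numpy as np


def _gfk_factors(Ps, Pt, eps=1e-12):
    """R_S, U_1, U_2 and the principal angles theta of Eqs. (5)-(6)."""
    D, d = Ps.shape
    # Orthogonal complement R_S (D x (D - d)) of the source basis P_S.
    Q, _ = np.linalg.qr(Ps, mode="complete")
    Rs = Q[:, d:]
    # P_S^T P_T = U_1 Gamma V^T with Gamma = diag(cos theta_i), theta ascending.
    U1, cos_theta, Vt = np.linalg.svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    # R_S^T P_T = -U_2 Sigma V^T with Sigma = diag(sin theta_i): solve for U_2 per column.
    B = -Rs.T @ Pt @ Vt.T
    sin_theta = np.sin(theta)
    U2 = np.zeros_like(B)
    ok = sin_theta > eps
    U2[:, ok] = B[:, ok] / sin_theta[ok]
    return Rs, U1, U2, theta


def geodesic_flow(Ps, Pt, t):
    """Phi(t) of Eq. (4); Phi(0) spans P_S and Phi(1) spans P_T."""
    Rs, U1, U2, theta = _gfk_factors(Ps, Pt)
    return Ps @ U1 @ np.diag(np.cos(t * theta)) - Rs @ U2 @ np.diag(np.sin(t * theta))


def gfk_kernel(Ps, Pt):
    """Closed-form G of Eqs. (9)-(10), i.e. the integral of Phi(t) Phi(t)^T over [0, 1]."""
    Rs, U1, U2, theta = _gfk_factors(Ps, Pt)
    th = np.maximum(theta, 1e-12)  # avoids 0/0; limits are lambda1 = 1, lambda2 = lambda3 = 0
    lam1 = 0.5 * (1.0 + np.sin(2 * th) / (2 * th))
    lam2 = 0.5 * (np.cos(2 * th) - 1.0) / (2 * th)
    lam3 = 0.5 * (1.0 - np.sin(2 * th) / (2 * th))
    A1, A2 = Ps @ U1, Rs @ U2
    # A * lam scales column i of A by lam[i], i.e. A @ diag(lam).
    return ((A1 * lam1) @ A1.T + (A1 * lam2) @ A2.T
            + (A2 * lam2) @ A1.T + (A2 * lam3) @ A2.T)
```

Averaging `geodesic_flow(Ps, Pt, t) @ geodesic_flow(Ps, Pt, t).T` over a fine grid of t in [0, 1] reproduces `gfk_kernel(Ps, Pt)`, which is a direct check of Eq. (8); for P_T = P_S the kernel reduces to P_S P_S^T.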

As Eq. (9) shows, the closed-form expression of the geodesic flow kernel is convenient to use and does not rely on user-selected parameters such as the bandwidth of a Gaussian RBF kernel.

2.3. Softmax regression
The softmax regression algorithm [37] handles multi-class classification problems and is therefore widely used as a classifier in deep learning research. Assume that the training set of the softmax regression consists of m samples, {(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ..., (x^{(m)}, y^{(m)})}, where x^{(i)} is the input feature vector, i = 1, 2, ..., m, and the label y^{(i)} ∈ {1, 2, ..., k}, where k is the number of categories. The hypothesis function gives the probability p(y^{(i)} = j | x^{(i)}) for every category j, i.e. the probability of each possible classification result of x^{(i)}, and the category with the maximum probability is the output. The output h_θ(x^{(i)}) is therefore
$$h_\theta\left(x^{(i)}\right) = \begin{bmatrix} p\left(y^{(i)}=1 \mid x^{(i)};\theta\right) \\ p\left(y^{(i)}=2 \mid x^{(i)};\theta\right) \\ \vdots \\ p\left(y^{(i)}=k \mid x^{(i)};\theta\right) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} \exp\left(\theta_j^{T} x^{(i)}\right)} \begin{bmatrix} \exp\left(\theta_1^{T} x^{(i)}\right) \\ \exp\left(\theta_2^{T} x^{(i)}\right) \\ \vdots \\ \exp\left(\theta_k^{T} x^{(i)}\right) \end{bmatrix} \tag{11}$$

where θ_1, θ_2, ..., θ_k are the model parameters, which can be collected into the matrix θ = [θ_1, θ_2, ..., θ_k]^T, and p(y^{(i)} = j | x^{(i)}; θ) denotes the probability that sample x^{(i)} belongs to category j. The parameters θ are obtained by minimizing a cost function, after which the category of a new sample can be predicted. The cost function is
$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\left\{y^{(i)}=j\right\} \log\frac{\exp\left(\theta_j^{T}x^{(i)}\right)}{\sum_{l=1}^{k}\exp\left(\theta_l^{T}x^{(i)}\right)}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n}\theta_{ij}^{2} \tag{12}$$
where m is the number of samples, k is the number of categories, n is the index of the last column of the weight matrix θ (columns are indexed from 0), λ is the weight decay parameter, and 1{·} is the indicator function, i.e. 1{true} = 1 and 1{false} = 0. Gradient descent is generally adopted to minimize J(θ), using
$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\left\{y^{(i)}=j\right\} - p\left(y^{(i)}=j \mid x^{(i)};\theta\right)\right)\right] + \lambda\theta_j \tag{13}$$
where ∇_{θ_j} J(θ) is a vector whose elements are the partial derivatives of J(θ) with respect to the elements of θ_j.
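Eqs. (11)-(13) translate into a short batch gradient-descent routine. This is a sketch, assuming integer labels 0, ..., k−1 (the text uses 1, ..., k) and a constant feature x_0 = 1 prepended to every sample so that column 0 of θ acts as a bias; the learning rate, iteration count and function names are illustrative, not values from the paper.

```python
import numpy as np


def add_bias(X):
    """Prepend the constant feature x_0 = 1 (assumed bias convention)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def log_softmax(theta, Xb):
    """Log of Eq. (11): row i holds log p(y(i) = j | x(i); theta) for j = 0..k-1."""
    scores = Xb @ theta.T
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilizes exp; p is unchanged
    return scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))


def cost_and_grad(theta, Xb, y, lam):
    """Cost J(theta) of Eq. (12) and its gradient (Eq. 13) for all theta_j at once."""
    m, k = Xb.shape[0], theta.shape[0]
    log_p = log_softmax(theta, Xb)
    Y = np.eye(k)[y]  # Y[i, j] = 1{y(i) = j}
    cost = -np.sum(Y * log_p) / m + 0.5 * lam * np.sum(theta ** 2)
    grad = -(Y - np.exp(log_p)).T @ Xb / m + lam * theta
    return cost, grad


def train_softmax(X, y, k, lam=1e-4, lr=0.1, iters=1000):
    """Batch gradient descent on J(theta), starting from theta = 0."""
    Xb = add_bias(X)
    theta = np.zeros((k, Xb.shape[1]))
    for _ in range(iters):
        _, grad = cost_and_grad(theta, Xb, y, lam)
        theta -= lr * grad
    return theta


def predict(theta, X):
    """Return the category with the maximum probability for each row of X."""
    return np.argmax(log_softmax(theta, add_bias(X)), axis=1)
```

A finite-difference check of `cost_and_grad` is a quick way to confirm that the analytic gradient matches Eq. (12), including the derivative λθ_j of the weight decay term.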

3. Proposed framework
This section describes the proposed method in detail. DAGSZ consists of the following steps: (1) strengthen the representative features of the raw vibration data; (2) compute the geodesic flow kernel (GFK); (3) apply Z-score normalization; (4) construct the softmax regression classifier.

(1) Strengthen the representative features of the raw vibration data
In this paper, the short-time Fourier transform (STFT) is applied to preprocess the raw vibration data. However, the STFT output is a two-dimensional matrix, which is not suitable for the GFK algorithm, so we convert it into a one-dimensional vector. Firstly, the power spectral density (PSD) matrix P_i ∈ ℝ^{m×n} is obtained from Eq. (1), where m is the dimensionality of the frequency domain and n is the dimensionality of the time domain. Secondly, the PSD matrix is converted to decibels (dB), which reveals more information about the vibration signal, especially at low frequencies and small amplitudes:
$$L_i = 10\log_{10}\left(P_i\right) \tag{14}$$


where P_i denotes the PSD matrix of the i-th sample, and L_i ∈ ℝ^{m×n} is still a two-dimensional matrix. Thirdly, the two-dimensional matrix is converted into a one-dimensional vector X_i ∈ ℝ^m:
$$X_i = \frac{1}{n}\sum_{j=1}^{n}\left[L_i(:, j)\right]^{2} \tag{15}$$
where L_i(:, j) is the j-th column (time frame) of L_i and the square is taken element-wise. X_i can be regarded as the closed-form expression of a time-domain average of L_i. Compared with time-domain data, the STFT result contains both frequency-domain and time-domain information and therefore carries more representative features. Because the STFT segments the vibration signal with a window function, averaging over the time segments reduces the randomness of the frequency-domain data. The amplitude of L_i is squared because it is too small to train the geodesic flow kernel. Thus, time-domain averaging and squaring strengthen the representative features of the raw vibration data. We name this STFT-based feature strengthening approach STFT-FS. The original vibration data are preprocessed by STFT-FS to obtain the strengthened representative features X_S ∈ D_S for the source domain and X_T ∈ D_T for the target domain.

(2) Compute the geodesic flow kernel (GFK)
In this paper, the fault identification settings for calculating the geodesic flow kernel are as follows: (a) the source domain data X_S contain plenty of labeled samples from all categories; (b) only the normal-category data in the target domain, denoted X_T^{(normal)}, are employed to compute the GFK, which is consistent with actual fault diagnosis situations. Note that the unlabeled target domain data X_T, which contain all categories, are used to test the softmax regression classifier. First, the optimal dimensionality of the subspaces representing the domains is determined by Eqs. (2) and (3). Then, an equivalent finite-dimensional domain-invariant feature space is extracted by Eq. (9), so that classifiers that do not rely on kernel inner products (such as logistic regression) can be constructed.

(3) Z-score normalization
Z-score normalization is applied to obtain the training data Tr and the testing data Te of the softmax regression classifier.
Tr and Te are calculated as Tr = Z(F_S) and Te = Z(F_T), where F_S = G X_S and F_T = G X_T are the learned transferable features, which contain all categories; G is the geodesic flow kernel constructed from the source domain data X_S (all categories) and the normal-category target domain data X_T^{(normal)}; and Z(·) denotes Z-score normalization, calculated as
$$Z(X) = \frac{X - \bar{X}}{\sigma} \tag{16}$$
where X represents the finite domain-invariant feature representation F_S or F_T (all categories), X̄ is the mean value of X and σ is its standard deviation. Z-score normalization rescales the domain-invariant features to zero mean and unit standard deviation. For many machine learning approaches, certain weights can then be updated faster during gradient descent: Z-score normalization may speed up convergence to the optimal solution, ease the tuning of the learning rate and improve classification accuracy. Therefore, Z-score normalization is applied separately to the learned features of the training and test data of the softmax regression classifier.

(4) Construct the softmax regression classifier


For multi-class problems, the softmax regression classifier performs better than the standard SVM, which is a binary classifier; therefore, softmax regression is adopted for defect diagnosis in this paper. The training data Tr are drawn from the source domain, and the test data Te, which contain all categories, are used to evaluate the softmax model. The probability p(y^{(i)} = j | x^{(i)}; θ) of every category j is obtained from Eq. (11); that is, the probability of each possible classification result of the test data is calculated, and the category with the maximum probability is the predicted target label. Finally, classification performance is evaluated by comparing the predicted target labels with the true target labels. The procedure of the proposed method is shown in Fig. 2.

[Figure: flowchart. Source domain data (all categories) and target domain data (normal category) are segmented. Step 1: optimized STFT (time-frequency domain data). Step 2: compute the geodesic flow kernel matrix G, giving the source and target domain learned features (all categories). Step 3: Z-score normalization of both. Step 4: train the softmax regression classifier on the source features, test it on the target features and output the diagnosis results.]
Fig. 2 Structure of DAGSZ

In summary, the steps of the proposed algorithm are as follows.
Algorithm 1: DAGSZ
Input: labeled source domain data D_S, unlabeled target domain data D_T
Train:
(1) Preprocess the raw vibration data D_S and D_T by STFT using Eq. (1);
(2) Process the PSD matrix according to Eq. (14);
(3) Strengthen the representative features by Eq. (15) to obtain the source and target domain representations X_S and X_T;
(4) Use X_S and X_T^{(normal)}, which contains only normal data, to train the geodesic flow kernel;
(5) Compute the geodesic flow kernel G according to Eq. (9);
(6) Compute the training data Tr and test data Te for the softmax classifier by Eq. (16);
Classify: Return the predicted labels for the unlabeled target domain data.

4. Experiment results and analysis
4.1 Case 1: Bearing fault diagnosis


A. Experimental setup and data preparation
The bearing dataset was employed to validate the effectiveness of DAGSZ. The original bearing vibration signals used in this case were provided by Case Western Reserve University. The vibration signals of the bearings at the fan end and the drive end were measured by accelerometers, with the sampling frequency set to 12 kHz. The drive-end bearing vibration data were used in this case; they were measured under four conditions: (1) normal condition; (2) outer ring defect (OD); (3) inner ring defect (ID); (4) roller defect (RD). In addition, vibration data for three distinct defect sizes (0.007, 0.014 and 0.021 inches) were collected separately. Furthermore, the bearing vibration data were obtained under four distinct loads, 0, 1, 2 and 3 hp, corresponding to four speeds (1797, 1772, 1750 and 1730 rpm), respectively. The same health condition under the four different speeds was considered as one class. Thus, ten health conditions (the normal condition and nine fault types) were obtained; their details are given in Table 1. Each health condition at one rotating speed contained 100 samples, and every sample included 1200 points. As a result, the bearing dataset consisted of 4000 samples in total, with 1000 samples under each speed. To simulate the domain adaptation scenario, the data for training the GFK and constructing the softmax regression classifier were selected as follows:
(1) Data for training GFK
(a) Source domain data: the labeled source domain data X_S are composed of the normal data and nine types of faulty data, X_S = {normal, FI_0.007, FO_0.007, FB_0.007, FI_0.014, FO_0.014, FB_0.014, FI_0.021, FO_0.021, FB_0.021}.
(b) Target domain data: the available target domain data employed for training GFK consist of the normal data, X_T^{(normal)} = {normal}.
(2) Data for constructing the softmax regression classifier
(a) Training data: the training data Tr have the same composition as the source domain data X_S.
(b) Testing data: the unlabeled testing data Te contain the normal data and the other nine types of faulty data, Te = {normal, FI_0.007, FO_0.007, FB_0.007, FI_0.014, FO_0.014, FB_0.014, FI_0.021, FO_0.021, FB_0.021}.

Since the raw bearing vibration data were collected under four distinct loads (0, 1, 2 and 3 hp), the 12 domain adaptation tasks shown in Table 2 were employed in this section. For clarity, the domain adaptation task load0-1 is taken as an example to explain the data selection for training the GFK and constructing the softmax regression classifier. For task load0-1, the source domain data for training the GFK are X_S = load0{normal, FI_0.007, FO_0.007, FB_0.007, FI_0.014, FO_0.014, FB_0.014, FI_0.021, FO_0.021, FB_0.021}, which contain 1000 samples. The target domain data are X_T^{(normal)} = load1{normal}, which contain only 100 samples. The training data for constructing the softmax regression classifier are Tr = X_S, and the testing data are Te = load1{normal, FI_0.007, FO_0.007, FB_0.007, FI_0.014, FO_0.014, FB_0.014, FI_0.021, FO_0.021, FB_0.021}.

Table 1 Description of bearing dataset

| Fault location | Category label | Fault diameter (inch) | Load0 | Load1 | Load2 | Load3 |
|---|---|---|---|---|---|---|
| Normal | 1 | 0 | 100 | 100 | 100 | 100 |
| Ball | 2 | 0.007 | 100 | 100 | 100 | 100 |
| Ball | 3 | 0.014 | 100 | 100 | 100 | 100 |
| Ball | 4 | 0.021 | 100 | 100 | 100 | 100 |
| Inner Race | 5 | 0.007 | 100 | 100 | 100 | 100 |
| Inner Race | 6 | 0.014 | 100 | 100 | 100 | 100 |
| Inner Race | 7 | 0.021 | 100 | 100 | 100 | 100 |
| Outer Race | 8 | 0.007 | 100 | 100 | 100 | 100 |
| Outer Race | 9 | 0.014 | 100 | 100 | 100 | 100 |
| Outer Race | 10 | 0.021 | 100 | 100 | 100 | 100 |
| Total | | | 1000 | 1000 | 1000 | 1000 |
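The STFT-FS preprocessing of Eqs. (1), (14) and (15) can be sketched in NumPy for one 1200-point sample, using the settings listed under Parameter selection in Part B (Hamming window, 75% overlap, 512-point FFT). This sketch assumes that the window length coefficient of 0.03 means 0.03 s, i.e. 360 points at 12 kHz; with a hop of 90 points and no edge padding, that reading yields exactly the 257 × 10 PSD matrix reported there. The one-sided density scaling, the small `eps` inside the logarithm and the function names are further assumptions.

```python
import numpy as np


def stft_psd(x, fs=12_000, win_sec=0.03, overlap=0.75, nfft=512):
    """One-sided PSD matrix (frequency x time) from a Hamming-windowed STFT, Eq. (1)."""
    nperseg = int(round(win_sec * fs))         # assumed: 0.03 s at 12 kHz -> 360 points
    hop = int(round(nperseg * (1 - overlap)))  # 75% overlap -> hop of 90 points
    n_frames = 1 + (len(x) - nperseg) // hop   # frames that fit without edge padding
    w = np.hamming(nperseg)
    frames = np.stack([x[j * hop:j * hop + nperseg] for j in range(n_frames)])
    spec = np.fft.rfft(frames * w, n=nfft, axis=1)   # each frame zero-padded to nfft points
    psd = np.abs(spec) ** 2 / (fs * np.sum(w ** 2))  # density scaling (assumed)
    psd[:, 1:-1] *= 2                                # one-sided: fold negative frequencies (nfft even)
    return psd.T                                     # shape (nfft // 2 + 1, n_frames)


def stft_fs_features(x, eps=1e-20, **stft_kwargs):
    """STFT-FS vector: dB conversion (Eq. 14), then the time average of Eq. (15)."""
    P = stft_psd(x, **stft_kwargs)
    L = 10.0 * np.log10(P + eps)    # Eq. (14); eps avoids log10(0)
    return np.mean(L ** 2, axis=1)  # element-wise square, averaged over the n time frames
```

With a 1200-point sample, `stft_psd` returns a 257 × 10 matrix and `stft_fs_features` a 257-dimensional vector, which is the D-dimensional input passed on to the GFK step.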

B. Experimental results
Comparison methods: Several successful methods are compared with DAGSZ to validate its effectiveness: (1) sparse filtering (SF), a one-layer deep learning approach; (2) stacked autoencoders (SAEs), a representative multi-layer deep learning method; (3) subspace alignment (SA); (4) deep neural network for domain adaptation in fault diagnosis (DAFD), which adds an MMD term and a weight regularization term to SAEs; (5) transfer component analysis (TCA), which searches for a shared feature subspace; (6) a basic version of the GFK model, which uses the nearest neighbor classifier; (7) a deep neural network for bearing fault diagnosis using a multiple kernel method [40]; (8) distribution matching machine (DMM) [41], which jointly learns the transfer classifier and transferable knowledge with statistical guarantees. Methods (1) and (2) are classical unsupervised deep learning methods that do not consider the domain adaptation problem, whereas methods (3)-(8) are domain adaptation methods that have been applied to many fault diagnosis problems.
Parameter selection: To ensure fair conditions for all comparison methods, the raw vibration signals are first segmented in the same way, and the segmented data are then preprocessed by the fast Fourier transform. Note that 30 trials were carried out to reduce the randomness of the experiment. For sparse filtering, the input and output dimensions are both 600, and the softmax parameter is λ = 1e-5. For SAEs, we adopt a three-layer neural network with two hidden layers and layer dimensions {600, 200, 100}, with softmax regression as the classifier. For methods (3)-(8), the training and test datasets are selected in the same way as for DAGSZ. For SA, the optimal subspace dimension is 90.
For DAFD, a two-layer neural network with a hidden layer size of 1000 is adopted, and the key parameters of the cost function are selected according to [38]. For TCA, a suitable subspace dimension is selected from {10, 20, 40, 60, 80, 100}. For GFK and DAGSZ, the optimal dimension of the geodesic flow kernel is selected by the subspace disagreement measure (SDM). The structure of method (7) is [1000, 600, 100, 10], and the keep probability p is set to 0.8. For DMM, the subspace dimension r and the penalty parameter are set to 20 and 10^-5, respectively. For the STFT of DAGSZ, a Hamming window is adopted, with a window length coefficient of 0.03 and an overlap rate of 75% between frames. The number of points used for the STFT is fixed at 512, so the PSD matrix is 257 × 10. In addition, the sigmoid function was adopted as the activation function for methods (1)-(8). The fault detection results for all domain adaptation tasks are given in Table 2.

Table 2 Detection results of the bearing datasets

| Method | Load1-0 | Load2-0 | Load3-0 | Load0-1 | Load2-1 | Load3-1 |
|---|---|---|---|---|---|---|
| SF | 74.5% | 63.1% | 69.3% | 70.2% | 67.8% | 54.1% |
| SAEs | 68.8% | 63.2% | 60.2% | 67% | 64.8% | 59.1% |
| SA | 82.6% | 76.6% | 62.8% | 85.1% | 79.5% | 71.4% |
| DAFD | 83.9% | 78.1% | 68.5% | 82.3% | 75.7% | 67.3% |
| TCA | 89.8% | 69.8% | 83.3% | 88.7% | 85.7% | 82.1% |
| GFK | 93.8% | 90.4% | 85.4% | 94.2% | 89.3% | 83.0% |
| Method [40] | 95.3% | 92.6% | 90.0% | 86.8% | 85.3% | 93.2% |
| DMM | 95.7% | 86.7% | 93.6% | 90.7% | 85.3% | 94.9% |
| DAGSZ | 100% | 99.8% | 99.3% | 99.4% | 99.7% | 99.3% |

Table 2 Detection results of the bearing datasets (continued)

| Method | Load0-2 | Load1-2 | Load3-2 | Load0-3 | Load1-3 | Load2-3 | Average |
|---|---|---|---|---|---|---|---|
| SF | 62% | 70.9% | 67.9% | 58.6% | 65.6% | 66.8% | 65.9% |
| SAEs | 61.2% | 70.2% | 65.6% | 69% | 63.1% | 64.7% | 64.7% |
| SA | 74.5% | 77.1% | 65.4% | 59.3% | 67.4% | 81.2% | 73.6% |
| DAFD | 78.2% | 84% | 72.8% | 68.1% | 75.2% | 80.8% | 76.2% |
| TCA | 74.2% | 79.5% | 85.9% | 83.5% | 77.7% | 86.9% | 82.25% |
| GFK | 86% | 90.6% | 92.3% | 85.7% | 89.8% | 92.1% | 89.38% |
| Method [40] | 86.4% | 86.7% | 87.9% | 96.1% | 83.5% | 93.7% | 89.79% |
| DMM | 82.1% | 95.1% | 82.7% | 96.4% | 86.4% | 91.6% | 90.10% |
| DAGSZ | 99.0% | 99.1% | 99.8% | 99.6% | 98.9% | 100% | 99.49% |

Results: As shown in Table 2, the average classification accuracy of DAGSZ reaches 99.49%. The stable detection results across different transfer scenarios validate the effectiveness and robustness of the proposed method, which significantly outperforms the other listed methods. DAGSZ therefore predicts not only the fault types but also the defect severities effectively. Relatively few samples are used in this experiment, and they are acquired by alternate segmentation, which may reduce the detection accuracies of many of the listed methods. The classical deep learning methods, sparse filtering and SAEs, achieve average accuracies below 70% over the twelve scenarios and thus perform poorly in the domain adaptation setting. SA and DAFD perform better, with average accuracies of 73.6% and 76.2%, respectively. The classical feature subspace learning method TCA achieves 82.25%, which is 17.24% lower than the proposed method. A major deficiency of TCA is that it does not effectively reduce the difference between the fault feature distributions; DAGSZ overcomes this deficiency and performs better in domain adaptation problems. Compared with the baseline GFK, DAGSZ achieves a transfer improvement of 10.11%. For the baseline GFK, the input data may not be represented accurately, because a sufficiently small subspace dimension is required for the subspaces to transit smoothly along the geodesic flow. The method of [40] reaches an average accuracy of 89.79%, which is still 9.7% lower than the proposed approach. The accuracies of method (7) on tasks load 1-0, load 2-0, load 3-1, load 0-3 and load 2-3 are all higher than 92.5%, but it performs poorly on the other tasks. DMM reaches an average accuracy of 90.1%, but its robustness is poor: its accuracy ranges from 82.1% to 96.4% across tasks.
DAGSZ achieves higher classification accuracy by learning a more representative subspace. In addition, the conventional frameworks are not robust: their detection performance varies widely across domain adaptation tasks. For example, among the twelve transfer scenarios of the bearing dataset, sparse filtering and SAEs perform better on tasks load 0-1 and load 1-0 than on tasks load 0-3 and load 3-0. This is reasonable, because the rotational speed difference between load 0 and load 1 is much smaller than that between load 0 and load 3, so the data in task load 0-1 share a more similar feature subspace and higher detection accuracy can be obtained. This phenomenon illustrates an inherent shortcoming of the traditional detection methods: they rely heavily on the similarity between the source domain and target domain data, whereas large discrepancies across domains are inevitable in the real world. Finally, the baseline domain adaptation methods also show poor robustness, so improving the robustness of defect diagnosis is an important concern in domain adaptation. By employing STFT-FS and Z-score normalization together with GFK to obtain a more representative subspace, DAGSZ achieves better and more robust performance.
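The averages and gaps quoted above can be recomputed from the per-task accuracies in Table 2. The sketch below does this for three of the methods and also reports the sample standard deviation and the range across the twelve tasks; using the spread as a robustness indicator is an illustrative choice, not a metric defined in the paper.

```python
import numpy as np

# Per-task accuracies (%) copied from Table 2, tasks ordered as
# 1-0, 2-0, 3-0, 0-1, 2-1, 3-1, 0-2, 1-2, 3-2, 0-3, 1-3, 2-3
TABLE2 = {
    "GFK":   [93.8, 90.4, 85.4, 94.2, 89.3, 83.0, 86.0, 90.6, 92.3, 85.7, 89.8, 92.1],
    "DMM":   [95.7, 86.7, 93.6, 90.7, 85.3, 94.9, 82.1, 95.1, 82.7, 96.4, 86.4, 91.6],
    "DAGSZ": [100.0, 99.8, 99.3, 99.4, 99.7, 99.3, 99.0, 99.1, 99.8, 99.6, 98.9, 100.0],
}

def summarize(acc):
    """Mean, sample standard deviation and range of per-task accuracies."""
    a = np.asarray(acc, dtype=float)
    return {
        "mean": round(float(a.mean()), 2),
        "std": round(float(a.std(ddof=1)), 2),
        "range": round(float(a.max() - a.min()), 2),
    }

for name, acc in TABLE2.items():
    print(name, summarize(acc))

gap = summarize(TABLE2["DAGSZ"])["mean"] - summarize(TABLE2["GFK"])["mean"]
print(f"DAGSZ - GFK: {gap:.2f} points")  # prints "DAGSZ - GFK: 10.11 points"
```

For these rows the range across tasks is about 1.1 points for DAGSZ, 11.2 for GFK and 14.3 for DMM, which is consistent with the robustness discussion above.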


[Figure: confusion matrix with true labels (Type-1 to 10) on the vertical axis and predicted labels on the horizontal axis; every diagonal cell is 100 and every off-diagonal cell is 0]
Fig 3 Confusion matrix of the scenario load 0-3 bearing dataset.

[Figure: 3-D scatter plot of the learned features (axes Dimension 1, Dimension 2, Dimension 3); legend: Normal, FB0.007, FB0.014, FB0.028, FI0.007, FI0.014, FI0.028, FO0.007, FO0.014, FO0.028]

Fig 4 Visualization maps based on the learned characteristics

To show the diagnosis information more clearly, the scenario load 0-3 is taken as an example, and its confusion matrix is displayed in Fig. 3. Each fault type contains 100 samples, and no sample is misclassified. To evaluate the feature learning ability of DAGSZ, t-SNE [39] is used to map the 257-dimensional feature vectors onto a 3-dimensional space. The results for the bearing dataset are displayed in Fig. 4: most features of the same health condition under different speeds gather in a common cluster, and the clusters are well separated from each other.
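A confusion matrix such as Fig. 3 counts, for each true label, how many test samples receive each predicted label, and the overall accuracy is the diagonal total divided by the number of samples. A minimal NumPy sketch follows; the 0-based label indices are an assumption for illustration (the figure labels the classes Type-1 to 10).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # handles repeated index pairs
    return cm

def accuracy(cm):
    """Fraction of samples lying on the diagonal."""
    return np.trace(cm) / cm.sum()

y_true = np.repeat(np.arange(10), 100)  # 10 health conditions x 100 samples (0-based labels)
cm = confusion_matrix(y_true, y_true, n_classes=10)
print(np.diag(cm), accuracy(cm))
```

With ten classes of 100 samples each and every prediction correct, as reported for scenario load 0-3, every diagonal cell holds 100 and the accuracy is 1.0.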

[Figure: four panels plotting amplitude against sample index (0 to 1000): (a) time domain, (b) frequency domain, (c) STFT (time-frequency domain), (d) Z-score normalization]
Fig 5 The distribution of features extracted by distinct domain analysis approaches: (a) time-domain method, (b) FFT method, (c) STFT method, (d) Z-score normalization

[Figure panels: (a) Time-domain method, (b) FFT method, (c) STFT method]
Fig 6 The visualization of features extracted by distinct domain analysis approaches: (a) time-domain method, (b) FFT method, (c) STFT method.

The distribution of features extracted by the different domain analysis approaches is depicted in Fig. 5, where the abscissa is the sample index (each fault type contains 100 samples) and the ordinate is the amplitude of the fault features. The time-domain features are distributed around the zero line, and many fault features have similar overall amplitudes, which makes it difficult to distinguish them. The frequency-domain features perform better and more fault features can be recognized, but some defect features are still similar. In contrast, the feature distributions of different faults differ considerably in the time-frequency domain, so the defect features can be recognized more accurately; that is, more discriminative and representative features are obtained from the time-frequency domain data. However, the distributions of a few defect features are still not clearly distinguishable. To further study the fault features obtained by time series statistical analysis, FFT and STFT, their visualizations are shown in Fig. 6, in which the features extracted by STFT are more evident and separable than those obtained by the other two approaches. The matrix G is obtained by training the GFK algorithm, the learned features are obtained by multiplying G with the time-frequency domain data, and the learned features are then normalized by Z-score normalization. As can be seen from Fig. 5(d), compared with the time-frequency domain data, the distribution of the Z-score normalized learned features is more discriminative and representative, and all ten defect types can be recognized accurately, which leads to satisfactory fault detection performance. As a result, the proposed approach performs much better than state-of-the-art domain adaptation methods.

C. Effectiveness Analysis

Unbalanced bearing dataset: The experiments above assume that the numbers of samples and classes in the source and target domains are balanced, so the effectiveness of DAGSZ when they are unbalanced needs to be examined. The details of the bearing dataset with an unbalanced distribution are given in Table 3. The basic GFK model is used as the comparison method, and the parameters of GFK and DAGSZ are set as described in the section "Parameter selection". The average diagnosis results of all domain adaptation tasks are depicted in Fig 7; each result is averaged over 30 trials to reduce the randomness of the experiment. As Fig. 7 shows, the average classification accuracy of the baseline GFK is below 90%, which is 13.31% lower than that of DAGSZ. The average accuracy of DAGSZ on the unbalanced bearing dataset is only 0.55% lower than on the balanced dataset, and the results across transfer scenarios are stable, which confirms the robustness of DAGSZ. Thus, DAGSZ is an effective and robust approach when the samples and classes in the source and target domains are unbalanced.
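The step in which the matrix G is learned and multiplied with the time-frequency data, followed by Z-score normalization, can be sketched with the closed-form geodesic flow kernel described by Gong et al. [31]. This is an illustrative NumPy sketch under stated assumptions, not the authors' code: PCA bases of dimension d stand in for the source and target subspaces (the SDM-based choice of d is omitted), samples are rows, and the Z-score is applied per feature column within each domain.

```python
import numpy as np

def pca_basis(X, d):
    """Orthonormal basis (D x d) of the top-d principal directions of X (rows = samples)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:d].T

def gfk_kernel(Ps, Pt, eps=1e-12):
    """Closed-form geodesic flow kernel G (D x D) between subspaces Ps and Pt."""
    D, d = Ps.shape
    u, _, _ = np.linalg.svd(Ps, full_matrices=True)
    Rs = u[:, d:]                                  # orthogonal complement of span(Ps)
    U1, gamma, Vt = np.linalg.svd(Ps.T @ Pt)       # Ps^T Pt = U1 diag(cos theta) V^T
    theta = np.arccos(np.clip(gamma, -1.0, 1.0))   # principal angles
    sin_t = np.sin(theta)
    B = Rs.T @ Pt @ Vt.T                           # Rs^T Pt V = -U2 diag(sin theta)
    ok = sin_t > eps
    U2 = np.zeros_like(B)
    U2[:, ok] = -B[:, ok] / sin_t[ok]
    t2 = 2.0 * theta
    with np.errstate(divide="ignore", invalid="ignore"):
        lam1 = np.where(ok, 1.0 + np.sin(t2) / t2, 2.0)   # theta -> 0 limits: 2, 0, 0
        lam2 = np.where(ok, (np.cos(t2) - 1.0) / t2, 0.0)
        lam3 = np.where(ok, 1.0 - np.sin(t2) / t2, 0.0)
    Omega = np.hstack([Ps @ U1, Rs @ U2])          # D x 2d
    Lam = np.block([[np.diag(lam1), np.diag(lam2)],
                    [np.diag(lam2), np.diag(lam3)]])
    return Omega @ Lam @ Omega.T

def zscore(F, eps=1e-12):
    """Standardize each feature column to zero mean and unit variance."""
    return (F - F.mean(axis=0)) / (F.std(axis=0) + eps)

def gfk_features(Xs, Xt, d):
    """Project both domains with G, then Z-score normalize each domain."""
    G = gfk_kernel(pca_basis(Xs, d), pca_basis(Xt, d))
    return zscore(Xs @ G), zscore(Xt @ G)

rng = np.random.default_rng(0)
Xs = rng.standard_normal((100, 257))               # placeholder data, not bearing signals
Xt = Xs + 0.5 * rng.standard_normal((100, 257))
Fs, Ft = gfk_features(Xs, Xt, d=20)
print(Fs.shape, Ft.shape)
```

When the two subspaces coincide, all principal angles are zero and G reduces to 2 Ps Ps^T, which is a convenient sanity check for any implementation.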


Table 3 Description for the bearing dataset with unbalanced distribution
Fault location | Category label | Fault diameter (inch) | Load0 | Load1 | Load2 | Load3
Normal | 1 | 0 | 100 | 100 | 100 | 100
Ball | 2 | 0.007 | 30 | 10 | 50 | 100
Ball | 3 | 0.014 | 20 | 15 | 50 | 100
Ball | 4 | 0.021 | 10 | 10 | 50 | 100
Inner Race | 5 | 0.007 | 30 | 10 | 30 | 100
Inner Race | 6 | 0.014 | 20 | 15 | 30 | 100
Inner Race | 7 | 0.021 | 10 | 10 | 30 | 100
Outer Race | 8 | 0.007 | 30 | 10 | 20 | 100
Outer Race | 9 | 0.014 | 20 | 15 | 20 | 100
Outer Race | 10 | 0.021 | 10 | 10 | 20 | 100
Total | | | 280 | 205 | 400 | 1000

[Figure: accuracy (70% to 100%) of GFK and DAGSZ over the transfer scenarios Load1-0, 2-0, 3-0, 0-1, 2-1, 3-1, 0-2, 1-2, 3-2, 0-3, 1-3, 2-3]

Fig 7 Detection performances of the bearing dataset with unbalanced distribution

Discussion for DAGSZ: As Fig. 5 and Fig. 6 show, STFT yields more discriminative and representative features, so the softmax regression classifier can identify the different health conditions more accurately. In addition, Z-score normalization rescales the features to zero mean and unit variance, matching the first two moments of a standard normal distribution, which strengthens the classification ability of the softmax regression classifier. Together, these techniques give the proposed approach good defect diagnosis performance. To study how STFT, Z-score normalization and softmax regression individually affect performance, several single-factor experiments were carried out; their results are given in Table 4. Note that FFT is applied in the first three methods and STFT in DAGSZ. According to Table 4, the average accuracy of GFK with the softmax regression classifier is 88.97%, which is 2.48% higher than that of GFK with the k-nearest neighbor (KNN) classifier. Applying Z-score normalization before the softmax classifier gives a further 2.15% improvement over the second method. Finally, replacing FFT with STFT for preprocessing the raw vibration data, as in DAGSZ, gives an 8.37% improvement over the third method.

Table 4 Diagnosis results of different GFK based methods
Method | Load1-0 | Load2-0 | Load3-0 | Load0-1 | Load2-1 | Load3-1
GFK+KNN | 89.6% | 88.4% | 84.6% | 87.6% | 86.3% | 82.6%
GFK+softmax | 92.8% | 90.4% | 85.4% | 90.2% | 89.3% | 83.0%
GFK+zscore+softmax | 93.4% | 91.8% | 89.4% | 92.3% | 91.5% | 87.1%
DAGSZ | 100% | 99.8% | 99.3% | 99.4% | 99.7% | 99.3%

Method | Load0-2 | Load1-2 | Load3-2 | Load0-3 | Load1-3 | Load2-3 | Average
GFK+KNN | 83.7% | 85.7% | 89.9% | 84.2% | 87.2% | 88.1% | 86.49%
GFK+softmax | 86.0% | 90.6% | 92.3% | 85.7% | 89.8% | 92.1% | 88.97%
GFK+zscore+softmax | 87.4% | 91.2% | 93.4% | 87.9% | 92.3% | 95.7% | 91.12%
DAGSZ | 99.0% | 99.1% | 99.8% | 99.6% | 98.9% | 100.00% | 99.49%
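The softmax regression classifier that receives the Z-score normalized features can be sketched as multinomial logistic regression trained by full-batch gradient descent. The learning rate, number of epochs and weight decay below are hypothetical defaults, not the settings used in the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.5, epochs=500, weight_decay=1e-4):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                 # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / n                      # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + weight_decay * W)   # weight_decay * W is the L2 penalty gradient
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Class with the largest logit for each row of X."""
    return np.argmax(X @ W + b, axis=1)
```

Standardizing the inputs before training keeps the gradient steps well scaled across features, which is one practical reason Z-score normalization can help this classifier.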

Discussions for time-frequency transform methods: In this section, three time-frequency analysis methods, namely the STFT-FS used in DAGSZ, the continuous wavelet transform (CWT) and the Hilbert-Huang transform (HHT), are compared in terms of representation effectiveness. CWT is a linear time-frequency representation that uses a wavelet basis instead of sinusoidal functions; here the complex Morlet wavelet is selected and the scale length is set to 256. HHT consists of two steps: empirical mode decomposition (EMD) of the time series and construction of the Hilbert spectrum. Each of the three transforms is used for the strengthened feature extraction of DAGSZ, and their detection performances are depicted in Fig. 8. The average accuracies of the CWT- and HHT-based DAGSZ are lower than that of STFT-FS, which indicates that STFT-FS extracts more representative features than CWT and HHT. The proposed method also gives more stable accuracies across domain adaptation tasks than the CWT- and HHT-based variants, and it requires less computation time than the other two. Thus, STFT-FS is selected for our method in this paper.
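For comparison with STFT-FS, the CWT variant can be sketched as a frequency-domain convolution with an analytic complex Morlet wavelet. The centre frequency `w0 = 6` and the reading of "scale length 256" as the integer scales 1 to 256 are assumptions; the paper does not give these details, and this is not the authors' implementation.

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """CWT with an analytic complex Morlet wavelet, computed via the FFT.

    Returns a complex array of shape (len(scales), len(x)); scales are in samples.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n)   # angular frequency in rad/sample
    x_hat = np.fft.fft(x)
    coeffs = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        # Fourier transform of the scaled Morlet wavelet, kept only for
        # positive frequencies (analytic wavelet); it is real-valued here
        psi_hat = (np.pi ** -0.25 * np.sqrt(2.0 * np.pi * s)
                   * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0))
        coeffs[i] = np.fft.ifft(x_hat * np.conj(psi_hat))
    return coeffs

scales = np.arange(1, 257)                  # assumed reading of "scale length 256"
sample = np.random.default_rng(0).standard_normal(512)
scalogram = np.abs(morlet_cwt(sample, scales)) ** 2
print(scalogram.shape)                      # prints (256, 512)
```

Each scale s responds most strongly near the normalized frequency w0 / (2π s) cycles per sample, so every 512-point sample yields a 256 × 512 (scales × time) coefficient map, considerably larger than the 257-dimensional STFT-FS vector.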

[Figure: accuracy (85% to 100%) of CWT, HHT and STFT-FS over the domain adaptation tasks load1-0, 2-0, 3-0, 0-1, 2-1, 3-1, 0-2, 1-2, 3-2, 0-3, 1-3, 2-3]
Fig 8 Detection performances of the bearing dataset with different time-frequency transform approaches

4.2. Case 2: gear fault diagnosis

A. Experimental setup and data preparation

Common gear failures mainly consist of pitting corrosion, tooth breakage, wear and compound faults. Accurate gear fault diagnosis is critical to ensuring the normal operation of agricultural machinery [42]. In this section, the effectiveness and robustness of DAGSZ are validated by gear defect diagnosis under different speeds. The raw vibration data were collected on the gearbox platform depicted in Fig. 9, which consists of a gearbox, an engine, a bearing seat, a flexible coupling, the base, etc. An agricultural machinery engine was used to control the gearbox speed. Two gears were assembled in the gearbox; their parameters are listed in Table 5.


Fig 9 Platform of multi-fault gearbox

Table 5 The parameters of the two gears
Gear name | Teeth number | Modulus (mm) | Pressure angle (°) | Material
Pinion | 55 | 2 | 20 | S45C
Wheel | 75 | 2 | 20 | S45C

Table 6 The speeds for distinct defect types
Case | Type1 | Type2 | Type3 | Type4 | Type5
Case1 (rpm) | 800 | 825 | 834 | 812 | 830
Case2 (rpm) | 820 | 849 | 850 | 842 | 854
Case3 (rpm) | 852 | 864 | 866 | 860 | 868

The gear dataset used in this section mainly includes four kinds of defects: pinion wear; compound failure of gearwheel pitting corrosion and pinion wear; gearwheel pitting corrosion; and compound defect of gearwheel tooth breakage and pinion wear. These are named Type2, Type3, Type4 and Type5, respectively, and the normal condition is named Type1. As Table 6 shows, the gear data were collected under three cases, and within each case the speed differs between fault types. The raw vibration data were acquired by an acceleration sensor mounted on the gearbox surface at a sampling frequency of 5120 Hz. Every defect type includes 100 samples of 512 data points each, cut consecutively from the raw vibration data. For the STFT, a Hamming window with a length coefficient of 0.03 and an 80% overlap between frames was used. To simulate domain adaptation, the training and testing data are selected as follows:
1) Training data: the source domain consists of the normal data and the four types of faulty data collected under one of cases A-C, i.e. {type1, type2, type3, type4, type5}. The target domain data available for training consist only of the normal data collected under another of cases A-C, i.e. {type1}.
2) Testing data: the testing data consist of the normal data and the four types of faulty data collected under the target case, Td = {type1, type2, type3, type4, type5}.

B. Experimental results
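The construction of the training and testing sets described in Section A can be sketched as follows. The record layout (`records[case][type]`), the helper names and the reuse of the same normal-condition segments for target training and testing are hypothetical; the paper does not state whether those segments are disjoint.

```python
import numpy as np

SAMPLE_LEN = 512          # data points per sample (from the text)
N_PER_TYPE = 100          # samples per health condition (from the text)
TYPES = [1, 2, 3, 4, 5]   # Type1 (normal) to Type5

def segment(record, n_samples=N_PER_TYPE, length=SAMPLE_LEN):
    """Cut consecutive, non-overlapping samples from a 1-D vibration record."""
    record = np.asarray(record, dtype=float)
    needed = n_samples * length
    if record.size < needed:
        raise ValueError(f"record has {record.size} points, need {needed}")
    return record[:needed].reshape(n_samples, length)

def build_task(records, source_case, target_case):
    """Assemble one transfer task, e.g. A-B, from records[case][type] (hypothetical layout)."""
    Xs = np.vstack([segment(records[source_case][t]) for t in TYPES])
    ys = np.repeat(TYPES, N_PER_TYPE)
    Xt_train = segment(records[target_case][1])    # target: normal data only
    Xt_test = np.vstack([segment(records[target_case][t]) for t in TYPES])
    yt_test = np.repeat(TYPES, N_PER_TYPE)
    return Xs, ys, Xt_train, Xt_test, yt_test

# Random placeholder records (10 s at 5120 Hz), not real gear data
rng = np.random.default_rng(0)
records = {case: {t: rng.standard_normal(5120 * 10) for t in TYPES} for case in "ABC"}
Xs, ys, Xt_train, Xt_test, yt_test = build_task(records, "A", "B")
print(Xs.shape, Xt_train.shape, Xt_test.shape)
```

For a transfer scenario such as A-B, `build_task(records, "A", "B")` returns 500 labelled source samples, 100 normal-condition target samples for training and 500 target samples for testing.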


[Figure: accuracy (40% to 100%) of GFK and DAGSZ over the transfer scenarios A-B, A-C, B-C, C-A, C-B, B-A]
Fig 10 Detection performances of gear by the two methods

Table 7 Detection results of the gear datasets
Method | A-B | A-C | B-C | C-A | C-B | B-A | Average
GFK | 92% | 74.5% | 93.5% | 75.6% | 96% | 94% | 87.6%
Proposed method | 100% | 100% | 99.4% | 100% | 100% | 100% | 99.9%

For clarity, datasets A, B and C denote case 1, 2 and 3, respectively. Each dataset contains five defect types, each with 100 samples. The baseline GFK is used for comparison with DAGSZ to validate its effectiveness. For the baseline GFK, the raw vibration signals are segmented in the same way as for DAGSZ and then preprocessed by the fast Fourier transform, giving 600 Fourier coefficients. Softmax regression is used as the classifier for both methods. The results of all domain adaptation tasks are given in Table 7 and Fig. 10. The average classification accuracy of the baseline GFK is 87.6%, which is 12.3% lower than that of the proposed method, and its results vary widely across transfer scenarios (from 74.5% to 96%), which indicates poor robustness. The proposed method achieves an average accuracy of 99.9%, and its stable results across transfer scenarios validate its effectiveness and robustness; it significantly outperforms the baseline GFK. The accuracy on transfer scenario B-C is 99.4%, lower than in the other scenarios, which may be caused by the large difference between the distributions of datasets B and C. Overall, the proposed method detects gear faults effectively and robustly.

5. Conclusions

In this paper, DAGSZ uses a source domain containing all fault types and a target domain containing only normal data to obtain transferable characteristics. Bearing and gear datasets are used to verify the effectiveness and robustness of the method. The conclusions can be summarized as follows:
(1) DAGSZ can effectively learn representative and robust characteristics for cross-domain problems, and the diagnosis results on real-world datasets validate the superiority of the proposed approach.
(2) The effectiveness and robustness of DAGSZ are further validated by comparison with other classical domain adaptation methods.
(3) Owing to its high detection accuracy and low computation time, the proposed method may be a simple and effective domain adaptation method for cross-domain problems in real-world fault diagnosis.

Acknowledgments
The research was supported by the National Natural Science Foundation of China (51675262), the Project of National Key Research and Development Plan of China "New energy-saving environmental protection agricultural engine development" (2016YFD0700800), the Fundamental Research Funds for the Central Universities (NP2018304) and the Major National Science and Technology Projects (2017-IV-0008-0045).

References
[1] Z. He, J. Chen, T. Wang, F. Chu. Theory and application of mechanical fault diagnosis. Beijing: Higher Education Press, 2010.
[2] B. Zhong, R. Huang. Mechanical Fault Diagnosis (the third edition). Beijing: Machinery Industry Press, 2006.
[3] Z. Zhu. Research on the application of wavelet analysis in automotive gear transmission fault diagnosis. Hefei: Hefei University of Technology, 2002.
[4] X. Jiao. The wavelet analysis and its application in fault diagnosis of gear box. Guangzhou: South China University of Technology, 2014.
[5] J. K. Sinha, K. Elbhbah. A future possibility of vibration based condition monitoring of rotating machines. Mechanical Systems and Signal Processing, 34 (2013) 231-240.
[6] Z. Gao, C. Ma, D. Song, et al. Deep quantum inspired neural network with application to aircraft fuel system fault diagnosis. Neurocomputing, 238 (2017) 13-23.
[7] J. Wang, S. Li, Z. An, et al. Batch-normalized deep neural networks for achieving fast intelligent fault diagnosis of machines. Neurocomputing, 329 (2019) 53-65.
[8] L. Liao, W. Jin, R. Pavel. Enhanced restricted Boltzmann machine with prognosability regularization for prognostics and health assessment. IEEE Transactions on Industrial Electronics, 63 (2016) 7076-7083.
[9] G. Liu, Z. Yin, Y. Jia, et al. Passenger flow estimation based on convolutional neural network in public transportation system. Knowledge-Based Systems, 123 (2017) 102-115.
[10] O. Janssens, V. Slavkovikj, B. Vervisch, et al. Convolutional neural network based fault detection for rotating machinery. Journal of Sound and Vibration, 377 (2016) 331-345.
[11] F. Jia, Y. Lei, J. Lin, et al. Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mechanical Systems and Signal Processing, 72-73 (2016) 303-315.
[12] C. Li, R. V. Sanchez, G. Zurita, et al. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals. Mechanical Systems and Signal Processing, 76-77 (2016) 283-293.
[13] Z. Gao, C. Cecati, S. X. Ding. A survey of fault diagnosis and fault-tolerant techniques-Part II: Fault diagnosis with knowledge-based and hybrid/active approaches. IEEE Transactions on Industrial Electronics, 62 (6) (2015) 3768-3774.
[14] J. Peng, W. Sun, L. Ma, Q. Du. Discriminative transfer joint matching for domain adaptation in hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters, 99 (2019) 1-5.
[15] W. Zhang, J. Li, L. Liu. Distributionally robust multi-instance learning with stable instances. Machine Learning, 2019.
[16] J. Zhu, F. Gao. Similar batch process monitoring with orthogonal subspace alignment. IEEE Transactions on Industrial Electronics, 65 (10) (2018) 8173-8183.
[17] Y. Liu, Y. Zhang, S. Coleman, J. Chi. Joint transfer component analysis and metric learning for person re-identification. Electronics Letters, 54 (13) (2018) 821-823.
[18] Y. T. Hsieh, S. Y. Tao, YHH. Tsai, YR. Yeh, YCF Wang. Recognizing heterogeneous cross-domain data via generalized joint distribution adaptation. IEEE International Conference on Multimedia and Expo, 2016.
[19] B. Tan, Y. Zhang, S. J. Pan, Q. Yang. Distant domain transfer learning. In AAAI, (2017) 2604-2610.
[20] F. Shen, C. Chen, R. Yan, R. X. Gao. Bearing fault diagnosis based on SVD feature extraction and transfer learning classification. In Proc. Prognostics and System Health Management Conference, Oct. (2015) 1-6.
[21] W. S. Chu, F. De la Torre, J. F. Cohn. Selective transfer machine for personalized facial action unit detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2013) 3515-3522.
[22] S. Shao, S. McAleer, R. Yan, P. Baldi. Highly accurate machine fault diagnosis using deep transfer learning. IEEE Transactions on Industrial Informatics, 15 (4) (2019) 2446-2455.
[23] D. Das, C. S. G. Lee. Sample-to-sample correspondence for unsupervised domain adaptation. Engineering Applications of Artificial Intelligence, 73 (2018) 80-91.
[24] D. Das, C. S. G. Lee. Unsupervised domain adaptation using regularized hyper-graph matching. IEEE International Conference on Image Processing (ICIP 2018), 2018.
[25] N. Courty, R. Flamary, D. Tuia, A. Rakotomamonjy. Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39 (9) (2017) 1853-1865.
[26] P. Cao, S. Zhang, J. Tang. Preprocessing-free gear fault diagnosis using small datasets with deep convolutional neural networks-based transfer learning. Neural and Evolutionary Computing, 6 (2017) 26241-26253.
[27] J. Wei, J. Liang, R. He, J. Yang. Learning discriminative geodesic flow kernel for unsupervised domain adaptation. 2018 IEEE International Conference on Multimedia and Expo (ICME), 1 (2018) 1-6.
[28] D. Das, C. S. G. Lee. Graph matching and pseudo-label guided deep unsupervised domain adaptation. International Conference on Artificial Neural Networks (ICANN), 3 (2018) 342-352.
[29] Y. Chen, S. Song, S. Li, L. Yang, C. Wu. Domain space transfer extreme learning machine for domain adaptation. IEEE Transactions on Cybernetics, 99 (2018) 1-14.
[30] T. Han, C. Liu, W. Yang, D. Jiang. Deep transfer network with joint distribution adaptation: A new intelligent fault diagnosis framework for industry application. 2018.
[31] B. Gong, K. Grauman, F. Sha. Learning kernels for unsupervised domain adaptation with applications to visual object recognition. International Journal of Computer Vision, 109 (1-2) (2014) 3-27.
[32] V. M. Patel, R. Gopalan, R. Li, R. Chellappa. Visual domain adaptation: a survey of recent advances. IEEE Signal Processing Magazine, 32 (3) (2015) 53-69.
[33] P. J. Loughlin. Methods and applications of time-frequency analysis. Asian Pacific Journal of Allergy & Immunology, 107 (5) (2000) 30-36.
[34] Y. Li. Theory and application of time-frequency transform. Xi'an: Northwestern Polytechnical University, 2003.
[35] R. Gopalan, R. Li, R. Chellappa. Domain adaptation for object recognition: An unsupervised approach. In Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, (2011) 999-1006.
[36] P. Samat, J. Gamba, et al. Geodesic flow kernel support vector machine for hyperspectral image classification by unsupervised subspace feature transfer. Remote Sensing, 8 (3) (2016) 234.
[37] M. Jiang, Y. Liang, X. Feng. Text classification based on deep belief network and softmax regression. Neural Computing and Applications, (2016) 1-10.
[38] W. Lu, B. Liang, Y. Cheng, et al. Deep model based domain adaptation for fault diagnosis. IEEE Transactions on Industrial Electronics, 64 (3) (2017) 2296-2305.
[39] L. van der Maaten, G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9 (2008) 2579-2605.
[40] Z. An, S. Li, J. W, Y. Xin, K. Xu. Generalization of deep neural network for bearing fault diagnosis under different working conditions using multiple kernel method. Neurocomputing, 352 (2018) 42-53.
[41] Y. Cao, M. Long, J. Wang. Unsupervised domain adaptation with distribution matching machines. AAAI Conference on Artificial Intelligence (AAAI), 2018.
[42] X. Jiang, S. Li, Y. Wang. A novel method for self-adaptive feature extraction using scaling crossover characteristics of signals and combining with LS-SVM for multi-fault diagnosis of gearbox. Journal of Vibroengineering, 17 (4) (2015) 1861-1878.


BIOGRAPHIES:

Zhongwei Zhang received the M.S. degree from Shandong University (SDUT), Jinan, China, in 2014. He is now a Ph.D. candidate at Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China. His current research interests include rotating machinery fault diagnosis and mechanical signal processing.

Huaihai Chen is a Professor at Nanjing University of Aeronautics and Astronautics, Nanjing, China. His current research interests include random vibration control and vibration fatigue.

Shunming Li received the Ph.D. degree in mechanics from Xi'an Jiaotong University, China, in 1988. He is a Professor at Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China. His current research interests include noise and vibration analysis and control, signal processing, machine fault diagnosis, sensing and measurement technology, and intelligent vehicles.

Zenghui An received the B.S. and M.S. degrees from the University of Jinan, Jinan, China, in 2013 and 2016, respectively. He is now a Ph.D. candidate with the College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China. His current research interests include mechanical fault diagnosis and deep learning.


Jinrui Wang received the Ph.D. degree from Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China, in 2019. He is a postdoctoral researcher at Shandong University of Science and Technology, Qingdao, China. His current research interests focus on intelligent fault diagnosis of machines.
