Microelectronics Reliability 82 (2018) 179–189
Fault diagnosis for the motor drive system of urban transit based on improved Hidden Markov Model
Huang Darong⁎, Ke Lanyan, Chu Xiaoyan, Zhao Ling, Mi Bo
Institute of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, PR China
Keywords: Predictive neural network; Intuitionistic fuzzy sets (IFS); Hidden Markov Model (HMM); Fault diagnosis; Motor drive system

ABSTRACT
Fault diagnosis for the motor drive system of urban rail transit can reduce hidden dangers and avoid disastrous events as far as possible. In this paper, an improved Hidden Markov Model (HMM) algorithm is proposed for fault diagnosis of the motor equipment of urban rail transit. In this approach, the initial value of the observation matrix B in the HMM is selected based on a predictive neural network and intuitionistic fuzzy sets. Firstly, using the predictive neural network, the observation probability matrix B is described qualitatively based on its mathematical interpretation. Then, the quartering approach is introduced to define the rules between the non-membership degree and the observation probability matrix B, which yields the matrix B quantitatively. Next, the selection algorithm for matrix B is given. Finally, experiments on fault diagnosis of the motor drive system of urban rail transit are carried out to prove the feasibility of the proposed algorithm.
1. Introduction

Urban rail transit has become the first choice for relieving traffic congestion due to its high speed, high safety, good punctuality, large capacity, etc. As the power equipment of urban rail transit, the normal operation of the motor drive system is of great significance for the safe operation of the whole system. In its running process, the most common faults are bearing related faults, rotor winding inter-turn short circuit faults, motor broken bar faults and other faults [1]. If these faults are not detected and effective protective measures are not taken in time, the fault nodes will deteriorate even further, which will result in huge losses of personnel and property or even lead to disastrous consequences [2,3]. Therefore, it is of great theoretical and practical significance to reconstruct the motor fault so that fault-tolerant control (FTC) can be carried out subsequently.
As is known from experience, vibration signature analysis is one of the most widely used methods for condition monitoring of rotating machinery in actual operation. Machine vibration arises from the action-reaction forces acting on the surface-to-surface contacts of moving machine parts [4,5]. A healthy machine should keep a low level of vibration; once there is a security risk in the equipment, the running signal includes various types of vibration signatures. Based on the above discussion, several approaches to motor fault diagnosis have been proposed, such as expert systems, artificial neural networks, fuzzy diagnosis, support vector machines, Bayesian methods, etc. [6]. As one of the
intelligent diagnosis methods, the HMM has been widely utilized because of its unique characteristics: it not only has a rich mathematical structure and a solid theoretical foundation, but also offers practical advantages. For example, the real model can be explained reasonably, the training sample can be small and the classification precision is high. It is for these reasons that HMMs are widely used in the field of fault diagnosis. Recently, there have been many reports on fault detection and diagnosis for various motors based on HMMs [7–13]. The authors in [7,8] present a novel method based on the HMM technique and pattern recognition features to detect and diagnose motor bearing faults. In order to reduce the computational burden, an optimal method to select the order of the HMM based on information entropy is proposed in [9]. For the short circuit fault of a motor, a diagnosis method for insulation failure of motor windings was constructed by combining impulse testing and HMM-based pattern recognition in [10]. Meanwhile, an HMM-based fault diagnosis algorithm was proposed to monitor the safety of the speed-up and speed-down processes of rotating machinery in [11]. In reference [12], a method for fault diagnosis of rotating machinery based on a 2-dimensional HMM is put forward. In order to classify the running condition, a condition classification algorithm based on HMMs processing information obtained from vibration signals was developed in [13]; the authors designed an on-line fault classification system with an adaptive model and re-estimation algorithm, which achieves a good recognition rate in motor drive systems.
⁎ Corresponding author. E-mail address: [email protected] (H. Darong).
https://doi.org/10.1016/j.microrel.2018.01.017 Received 19 October 2017; Received in revised form 15 January 2018; Accepted 28 January 2018 0026-2714/ © 2018 Elsevier Ltd. All rights reserved.
Nomenclature

HMMs — Hidden Markov Models
N — number of states in HMMs
T — length of the observation sequence
B — observation probability matrix
πi — initial state distribution of state i
bij — element of B
aij — state transition probability
pi — input of hidden layer
μik — mean vector of N(·)
Uik — covariance matrix of N(·)
ωij — weights between j and i
ωki — weights between i and k
xj — input vectors of network
fi — activation function of network
θk — bias of each node k
mi — output of each hidden node i
outputk — output of each output node k
E — error sum of squares for output
eij — element of E
Si — fault mode of MDS
O — judgment matrix of membership
A — intuitionistic fuzzy set
μA — membership function of A
γA — non-membership function of A
πA — hesitancy degree of x belonging to A
δA — non-hesitation degree index
norm(·) — L2 normalization
E′ — normalization of E
ξ — first demarcation point in [0,1]
η — second demarcation point in [0,1]
θ — third demarcation point in [0,1]
e(ξ, η, θ) — mapping relation corresponding to every variable (ξ, η, θ)
p·(·) — probability density of a random variable
ξt(i, j) — probability of state Si at t and state Sj at t + 1
γt(·) — mean probability of the running state at t
αt(·) — local probability of the running state at t
βt(·) — probability of the observation sequence at t
Φ(·) — distribution function of probability
P(·|·) — conditional probability
The main idea of these works is that feature vectors obtained by the FFT, wavelet transform, bispectrum, etc. are used as fault features. The obtained fault features are then used as the observation sequences which are fed to the HMMs. In practice, every HMM is trained by the Baum-Welch algorithm [14,15] with training samples selected randomly from the fault characteristics. Subsequently, the test samples are taken as inputs of the trained models to obtain the likelihood of the various models via the Viterbi algorithm [16,17]. Finally, the fault diagnosis results are obtained based on the criterion of maximum likelihood. The flow diagram is shown in Fig. 1.
Fig. 1 indicates that model training is one of the most important problems in fault diagnosis. Although the Baum-Welch algorithm can realize the model training properly, it still has some defects, so it is necessary to overcome these difficulties with improved algorithms.
Fig. 1. Flow diagram of fault diagnosis based on HMM.
To solve this problem, some studies have demonstrated that the training results of an HMM are closely related to the selection of the initial parameters [18–20]. Moreover, due to the non-stationarity of the output state sequence, the HMM training results are greatly affected by the initial value of matrix B. If it is selected inappropriately, uncontrolled errors will appear and the re-estimation algorithm will loop ceaselessly. Aiming at this problem, the K-means algorithm is widely used to obtain the initial value of matrix B. But the K-means algorithm is not perfect. Firstly, the estimation of the initial value K is a challenge in real applications. Secondly, the final results are greatly affected by the selection of the initial clustering centers; once the initial values are chosen inappropriately, it is difficult to obtain effective results. Thirdly, during the adjustment of the sample classification, new clustering centers have to be calculated continuously, so the amount of calculation and the computing time are large. From the perspective of optimization, the results are more accurate and reliable if the initial values are selected around the optimal solution.
Based on the aforesaid investigations, an optimized Baum-Welch algorithm is proposed by introducing gene expression programming in reference [21]. The transition probability and emission probability are optimized to avoid local optima and to improve the training quality of the Baum-Welch algorithm. In order to reduce the complexity of the algorithm, a neural network is introduced to predict the matrix B in [22]. However, that method only gives a qualitative prediction of the observation probability matrix B and provides no quantitative calculation method. To solve this problem, a new method for predicting the matrix B based on a predictive neural network and Intuitionistic Fuzzy Sets (IFS) is proposed in this paper. Firstly, in accordance with the analysis of the output of the predictive neural network and the mathematical interpretation of matrix B, the relation between the predictive results and the elements of matrix B is explained qualitatively. Secondly, based on the predictive results of the neural network, the range of values of matrix B is given quantitatively by means of IFS. Thirdly, based on the results above, rules are formulated to calculate the matrix B quantitatively. Additionally, the algorithm for the selection of matrix B is presented. Finally, the algorithm is applied to the HMM and the new model is used in experiments on motor fault diagnosis. The experimental results prove that the method not only can recognize the fault mode, but also can improve the recognition accuracy.
The rest of the paper is organized as follows. Section 2 introduces the theoretical foundations of this study. Section 3 gives the initial value predictive model, containing the qualitative description model, the quantitative calculation model and the specific algorithm. Contrast experiments are carried out in Section 4. Finally, some conclusions and directions for future work are discussed in Section 5.

2. Theoretical foundations
2.1. The theory of Hidden Markov Models

According to Nefian and Hayes [Face Detection and Recognition Using Hidden Markov Models, Proc. IEEE Int. Conf. on Image Processing (ICIP), 1998, pp. 141–145], HMMs are a set of statistical models used to characterize the statistical properties of a signal. An HMM consists of an underlying, unobservable Markov chain with a finite number of states, a state transition probability matrix, an initial state probability distribution, and a set of probability density functions associated with each state. The basic principle of HMMs is introduced in detail as follows.
Suppose that N denotes the number of states in the HMM and S is the set of states, i.e. S = (S1, S2, ⋯, SN). The state of the model at time t is given by qt ∈ S, 1 ≤ t ≤ T, where T is the length of the observation sequence. Let π = {πi} describe the initial state distribution; then
πi = P[q1 = Si], 1 ≤ i ≤ N    (1)

Next, the state transition probability is defined as

aij = P[qt = Sj | qt−1 = Si], 1 ≤ i, j ≤ N    (2)

where aij ≥ 0 and ∑_{j=1}^{N} aij = 1, 1 ≤ i ≤ N.
In a continuous-density HMM the states are characterized by continuous observation densities, and the probability density function of each state is a finite mixture of the form

bi(Ot) = ∑_{k=1}^{M} cik N(Ot, μik, Uik), 1 ≤ i ≤ N    (3)
where cik is the mixture coefficient of the kth mixture in state i and B = {bj(Ot)} is the observation probability matrix. Without loss of generality, N(Ot, μik, Uik) is assumed to be a Gaussian probability density function with mean vector μik and covariance matrix Uik.
In real applications, how to select B is very important for diagnosing system faults. At present, an artificial neural network (ANN) is usually used to train and test the actual data in order to obtain this initial value.
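As an illustration of the mixture density in Eq. (3), the following minimal MATLAB sketch evaluates bi(Ot) for one state, assuming diagonal covariance matrices; all numerical values below are placeholders rather than data from the paper.

d  = 3;  M = 2;                          % observation dimension and number of mixtures
Ot = rand(d, 1);                         % one observation vector O_t
c  = [0.4; 0.6];                         % mixture coefficients c_ik (sum to 1)
mu = rand(d, M);  U = ones(d, M);        % mean vectors and diagonal covariance entries
b  = 0;
for k = 1:M
    df = Ot - mu(:, k);
    gk = exp(-0.5 * sum(df.^2 ./ U(:, k))) / sqrt((2*pi)^d * prod(U(:, k)));
    b  = b + c(k) * gk;                  % weighted sum over the M mixture components, Eq. (3)
end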
Next, we describe the structure and training process of the predictive neural network.

2.2. Predictive neural network

An ANN is a complex network interconnected by a large number of neurons; its structure is shown in Fig. 2. As can be seen from Fig. 2, the input layer has m nodes, the hidden layer has q nodes and the output layer has l nodes.

Fig. 2. The topology structure for ANN.

Suppose that pi represents the input of hidden node i; it may then be presented as follows.

pi = ∑_{j=1}^{m} ωij xj, i = 1, 2, ⋯, q    (4)

where ωij is the connection weight between input node j and hidden node i, and xj is the input vector. Accordingly, the output of the hidden layer may be constructed as follows.

mi = f1(∑_{j=1}^{m} ωij xj − θi), i = 1, 2, ⋯, q    (5)

where f1 denotes the activation function of the hidden layer and θi is the threshold of each node. Similarly, suppose that ωki is the connection weight between hidden node i and output node k; then the output value of the output layer may be written as follows.

outputk = f2(∑_{i=1}^{q} ωki mi − θk), k = 1, 2, ⋯, l    (6)

where f2 is the activation function of the output layer and θk is the bias of each output node.
In fact, a neural network can not only simulate the human ability to represent and store knowledge, but also imitate human reasoning behavior. The objective function is established by comparing the actual output with the ideal output, and the interconnection weights between the neurons are revised by repeated learning to guarantee convergence within a given range. Therefore, the neural network may be used as a predictor to predict the state of a system. Based on this idea, we may construct a predictive neural network to train and predict fault modes in engineering. For instance, the fault mode matrix and the fault state characteristic vectors can be taken as the inputs and outputs of the trained network, respectively. Hence, the objective function can be expressed by the error sum of squares of the output, denoted by E. Moreover, the element bjk of matrix B denotes the conditional probability that the observed value is Ok given that the motor is in state Sj. So the performance of the neural network can be described by E: the larger E is, the worse the performance of the neural network is. In other words, the larger E is, the smaller the probability that the observation Ok is obtained from the neural network under state Sj. Obviously, the observation probability matrix B can thus be estimated qualitatively. However, due to the ambiguity of the error it is difficult to obtain the matrix B quantitatively by the predictive neural network alone, so a fuzzy-mathematics tool has to be adopted to calculate the matrix B.
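For concreteness, the forward pass of Eqs. (4)–(6) and the squared-error objective can be sketched as follows; the logistic activations, random weights and data sizes are illustrative assumptions, not the paper's implementation.

f   = @(z) 1 ./ (1 + exp(-z));           % logistic activation used for f1 and f2 (assumption)
m = 5;  q = 11;  l = 7;                  % numbers of input, hidden and output nodes (example)
W1  = randn(q, m);  th1 = zeros(q, 1);   % hidden-layer weights ω_ij and thresholds θ_i
W2  = randn(l, q);  th2 = zeros(l, 1);   % output-layer weights ω_ki and biases θ_k
x   = rand(m, 1);                        % one input vector (e.g. a fault-mode vector S_j)
p   = W1 * x;                            % Eq. (4): hidden-layer inputs p_i
mh  = f(p - th1);                        % Eq. (5): hidden-layer outputs m_i
out = f(W2 * mh - th2);                  % Eq. (6): network outputs output_k
Ok  = rand(l, 1);                        % desired observation vector (placeholder)
E1  = sum((out - Ok).^2);                % error sum of squares for this sample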
Next, the basic theory of intuitionistic fuzzy sets is introduced in detail.

2.3. Theory of intuitionistic fuzzy sets

2.3.1. The definition of intuitionistic fuzzy sets
Definition 1. [23,24] For a given universe X, the set A can be regarded as an intuitionistic fuzzy set if A meets the demands below:

A = {< x, μA(x), γA(x) > | x ∈ X}    (7)

where μA(x) : X → [0, 1] and γA(x) : X → [0, 1] denote the membership and non-membership functions of A, respectively, and for ∀ x ∈ X, 0 ≤ μA(x) + γA(x) ≤ 1.
If X is defined on a continuous space, the intuitionistic fuzzy set A is expressed as follows.

A = ∫_X < μA(x), γA(x) > / x, x ∈ X    (8)

Accordingly, if X is defined on a discrete space, i.e. X = {x1, x2, …, xn}, the set A is expressed as follows.

A = ∑_{i=1}^{n} < μA(xi), γA(xi) > / xi, xi ∈ X    (9)

If A = < x, μA, γA > or A = < μA, γA > / x is an intuitionistic fuzzy set, then every ordinary fuzzy subset has a corresponding intuitionistic fuzzy set of the form (7). Therefore, for every intuitionistic fuzzy subset of A, the intuitionistic index of x can be described as follows.

πA(x) = 1 − μA(x) − γA(x)    (10)

where πA(x) denotes the hesitancy degree that x belongs to A. For an ordinary fuzzy set, where γA(x) = 1 − μA(x), the hesitancy degree reduces to

π̃A(x) = 1 − μA(x) − [1 − μA(x)] = 0, ∀ x ∈ X    (11)

2.3.2. Correlation analysis among membership degree, non-membership degree and hesitation degree
According to the definition of intuitionistic fuzzy sets [25,26], the relationship among the membership function μA(x), the non-membership function γA(x) and the intuitionistic index πA(x) can be expressed as follows.

μA(x) + γA(x) + πA(x) = 1    (12)

i.e.

γA(x) = 1 − μA(x) − πA(x)    (13)

Let δA(x) = 1 − πA(x); thus

γA(x) = δA(x) − μA(x)    (14)

where δA(x) denotes the non-hesitation degree index and satisfies 0 ≤ δA(x) ≤ 1. Obviously, if the membership function μA(x) is determined, the non-membership function of the intuitionistic fuzzy set can be calculated through formulas (12)–(14). In fact, many research results show that in most cases a constant intuitionistic index is enough to solve real problems. So, let

πA(x) = a (0 ≤ a ≤ 1)    (15)

Then, the non-hesitation degree can be obtained by the following formula:

δA(x) = 1 − πA(x) = 1 − a    (16)

Obviously, the non-hesitation degree is also a constant. Similarly, letting c = 1 − a,

γA(x) = c − μA(x)    (17)

So, the non-membership degree can be obtained once the intuitionistic index is given.
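A tiny numerical sketch of Eqs. (15)–(17): with a constant intuitionistic index the non-membership degree follows directly from the membership degree (the values a = 0.1 and μA(x) = 0.65 are only examples).

a    = 0.1;               % constant hesitancy degree π_A(x) = a, Eq. (15)
c    = 1 - a;             % non-hesitation degree δ_A(x), Eq. (16)
muA  = 0.65;              % a given membership degree μ_A(x)
gamA = c - muA;           % non-membership degree γ_A(x), Eq. (17)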
3. Initial value predictive model based on neural network and intuitionistic fuzzy sets

3.1. Qualitative description of observation probability matrix B for motor faults

In real engineering, the most common motor faults of urban transit are bearing related faults, rotor winding inter-turn short circuit faults, motor broken bar faults and other faults. For the sake of simplicity, we define the motor states as Si (i = 1, 2, ⋯, n), so every state may be described as Si = (0, ⋯, i, ⋯, 0). The predictive neural network is trained by samples of the fault modes, and once the network is trained successfully according to the specified conditions, the trained model may diagnose the faults. In addition, the conditional probability that the observed value is Ok given that
the motor is in state Sj may be denoted as bjk = P(Ok | q = Sj), j = 1, 2, ⋯, m. The observation matrix, obtained by processing the data collected from the related equipment, is denoted as X, while the fault mode matrix, which consists of the fault mode vectors, is presented as S. S and X are regarded as the input and the output of the neural network, respectively. Thus, E is negatively related to the performance of the neural network according to the theoretical analysis in Section 2.2, and the observation probabilities of the HMM can be estimated from it. Notice that when the first column of the matrix S is input into the neural network, the corresponding output is regarded as the state observation O1. The second-order norm between the real observation value and the desired observation is taken as the objective function, and for state Si it can be expressed by the vector

ei = [ei1, ei2, ⋯, eil]    (18)

Therefore, the error matrix E can be expressed as follows.

E = [e11  e12  ⋯  e1l;
      ⋮    ⋮   ⋱   ⋮;
     em1  em2  ⋯  eml]    (19)

where ejk denotes the error when the output is Ok and the equipment is in state Sj (j = 1, 2, …, m). Due to the inconsistency of the errors, the error matrix E should be normalized through Definition 2.

Definition 2. norm(e) is the L2 norm of the vector e = (e1, e2, …, en) if it meets the following requirement:

norm(e) = √(e1² + e2² + ⋯ + en²)    (20)

Based on Definition 2, the unit L2 normalization of e = (e1, e2, …, en) can be expressed as formula (21):

1 = norm(e′) = √((e1/norm(e))² + (e2/norm(e))² + ⋯ + (en/norm(e))²) = √(e1² + e2² + ⋯ + en²)/norm(e)    (21)

Obviously, the mapping relation between ei and ei′ is

ei′ = ei / norm(e)    (22)

So the error matrix may be normalized row by row through (21) and (22), yielding the normalized error matrix E′.
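A short sketch of Definition 2 and Eqs. (20)–(22): each error vector (row of E) is divided by its L2 norm so that the normalized row has unit norm. The matrix below is a random placeholder, not the paper's data.

E  = rand(5, 7);                          % hypothetical m-by-l error matrix
Ep = zeros(size(E));
for i = 1:size(E, 1)
    Ep(i, :) = E(i, :) / norm(E(i, :));   % Eq. (22): e_i' = e_i / norm(e_i)
end                                       % every row of Ep now has unit norm, Eq. (21)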
Unfortunately, the mathematical meaning of ejk is fuzzy, so it is difficult to forecast the matrix B precisely through the neural network alone. Therefore, it is necessary to obtain bjk by means of fuzzy theory in order to set the initial value of matrix B.

3.2. Quantitative calculation model of observation probability matrix B

Although the obtained error matrix E′ can reflect the matrix B to some extent, the ambiguity of the error matrix makes it difficult to calculate the matrix B quantitatively. To solve this problem, fuzzy mathematics is adopted to calculate the matrix B. The basic idea for computing the membership degree is the quartering method. Quartering is a fuzzy statistical method that determines the membership function and the non-membership function with four fuzzy concepts. The four concepts are small, smaller, normal and large, which correspond to the large, larger, normal and small observation probability fuzzy sets, respectively. Based on this method, the given universe X = [0,1] is divided into four parts through a fuzzy process. Each demarcation point may be selected randomly, and its probability distribution is regarded as the corresponding membership function. Suppose that ξ, η and θ denote the demarcation points in the interval [0,1]; their values must satisfy 0 < ξ < η < θ < 1. If the probability value satisfies P(ξ ≤ η ≤ θ) = 1 for the random variables (ξ, η, θ), the mapping relation corresponding to every variable (ξ, η, θ) may be described as follows.

e(ξ, η, θ): U → {A1, A2, A3, A4}    (23)

In other words, the mapping relation can be expressed in detail as follows.

e(ξ, η, θ)(x) = A1 if x ≤ ξ;  A2 if ξ < x ≤ η;  A3 if η < x ≤ θ;  A4 if θ < x    (24)

Accordingly, the membership degrees of x with respect to the four sets are

μA1(x) = P(x ≤ ξ) = ∫_x^{+∞} pξ(s) ds
μA2(x) = P(ξ < x ≤ η) = ∫_{−∞}^{x} pξ(s) ds − ∫_{−∞}^{x} pη(s) ds
μA3(x) = P(η < x ≤ θ) = ∫_{−∞}^{x} pη(s) ds − ∫_{−∞}^{x} pθ(s) ds
μA4(x) = P(θ < x) = 1 − μA1(x) − μA2(x) − μA3(x)    (25)

where pξ(x), pη(x) and pθ(x) denote the probability densities of ξ, η and θ, respectively. It is assumed that ξ, η and θ obey normal distributions, i.e. ξ ~ N(a1, σ1²), η ~ N(a2, σ2²), θ ~ N(a3, σ3²). Then the membership functions defined by formula (25) can be rewritten as follows:

μA1(x) = 1 − Φ((x − a1)/σ1)
μA2(x) = Φ((x − a1)/σ1) − Φ((x − a2)/σ2)
μA3(x) = Φ((x − a2)/σ2) − Φ((x − a3)/σ3)
μA4(x) = Φ((x − a3)/σ3)    (26)

where Φ(·) denotes the standard normal distribution function. Obviously, x belongs to the fuzzy set A* with the largest membership degree, i.e.

μA*(x) = max{μA1(x), μA2(x), μA3(x), μA4(x)}    (27)

Once the class information has been obtained by (27), the range of the observation probability may be determined by Rule 1 as follows.
Rule 1. If x belongs to the set A1, the corresponding value of the observation probability belongs to the interval [0.75,1]; if x belongs to the set A2, the corresponding value belongs to the interval [0.5,0.75]; if x belongs to the set A3, the corresponding value belongs to the interval [0.25,0.5]; if x belongs to the set A4, the corresponding value belongs to the interval [0,0.25].
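The classification of a single normalized error value by Eqs. (25)–(27) and Rule 1 can be sketched as follows, assuming Φ(·) is the standard normal distribution function; the value x = 0.42 is only an example.

Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));          % standard normal distribution function
a   = [0.25 0.50 0.75];  sig = [5/48 5/48 5/48];  % means and standard deviations of ξ, η, θ
x   = 0.42;                                       % one normalized error value (example)
F   = Phi((x - a) ./ sig);                        % Φ_ξ(x), Φ_η(x), Φ_θ(x)
mu  = [1 - F(1), F(1) - F(2), F(2) - F(3), F(3)]; % Eq. (26): μ_A1(x) ... μ_A4(x)
[~, cls]  = max(mu);                              % Eq. (27): index of the winning fuzzy set
lo  = [0.75 0.50 0.25 0.00];  hi = [1.00 0.75 0.50 0.25];
bjk_range = [lo(cls), hi(cls)];                   % Rule 1: admissible interval for b_jk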
Obviously, a fault judgment matrix may then be constructed according to Rule 1 as follows.

O = (μA*(x))_{m×l}    (28)

Thus, the network outputs can be evaluated by the formulas above. Next, an algorithm to obtain the observation probability matrix B needs to be designed.
3.3. Algorithm for the acquisition of observation probability matrix B

The observation probability selection method based on the predictive neural network, which simulates the elements of the matrix B, regards the system states as the network inputs and the corresponding observations as the network outputs, with the sum of squared errors as the objective function. The smaller the sum of squared errors is, the larger the probability that the observation is the output under the corresponding state. The sum of squared errors reflects the simulation effect intuitively, and the error is inversely proportional to the simulation effect. However, the fuzziness of the error makes it difficult to predict the matrix B by means of the predictive neural network alone. Therefore, a primary calculation algorithm for B may be designed based on the predictive neural network as follows.
Step 1. Set an appropriate network structure and training objective function. Let S and X be regarded as the network's input and output, respectively, to train the network.
Step 2. Compute the error matrix E for the test samples by formula (19), and obtain the normalized error matrix E′ by (20)–(22).
Step 3. Set the demarcation points ξ, η and θ randomly in [0,1], and then compute the membership degrees μA(x) of the errors ejk at state Sj by (23)–(26).
Step 4. Compute the membership degrees μAi(E′) and the fault judgment matrix O by formulas (27)–(28).
Step 5. Process the fault judgment matrix O in accordance with Rule 1, obtaining the range of the initial value of B.
Step 6. Output the simulation results.

Obviously, a reasonable computing flow for the observation probability selection method may be designed as in Fig. 3.
Fig. 3. Improved framework for the observation probability selection method.
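A high-level sketch of Steps 1–6 (cf. Fig. 3) is given below. It reuses the toolbox calls of Section 4.1; the data, sizes and variable names are placeholders, and Steps 3–5 are only indicated by a comment (see the membership sketch in Section 3.2).

S   = eye(5);                          % Step 1: fault-mode vectors as network inputs
X   = rand(7, 5);                      % corresponding observation vectors (placeholder)
net = newff(minmax(S), [15, 7], {'logsig', 'logsig'}, 'trainscg');
net.trainParam.epochs = 1000;  net.trainParam.goal = 0.01;
net = train(net, S, X);
out = sim(net, S);                     % network outputs for every state S_j
m = size(S, 2);  l = size(X, 2);  E = zeros(m, l);  Ep = zeros(m, l);
for j = 1:m
    for k = 1:l
        E(j, k) = norm(out(:, j) - X(:, k));   % Step 2: error e_jk, Eqs. (18)-(19)
    end
    Ep(j, :) = E(j, :) / norm(E(j, :));        % normalized error matrix E', Eqs. (20)-(22)
end
% Steps 3-5: classify every element of Ep with Eqs. (23)-(28) and Rule 1
% to obtain the fault judgment matrix O and the interval matrix B'.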
4. Experimental results and analysis

To verify the performance of the algorithm, simulation equipment of the motor drive system of urban transit is used to generate the test data. The test system is shown in Fig. 4. The test rig of the urban transit system is constructed according to the straddle-type monorail transportation in Chongqing, China; the test rig of the motor drive system is shown in Fig. 5.
As we know, the faults of motor drive systems are caused by various factors, and the fault types add up to tens, so it is difficult to diagnose and detect the running state of the system using all these factors. For simplicity, we consider five states, namely the bearing related fault, the rotor winding inter-turn short circuit fault, the motor broken bar fault, other faults and the fault-free condition, to verify the effectiveness and practicability of the presented algorithm. They are represented symbolically by S1, S2, S3, S4 and S5, respectively. The predictive neural network may then be trained by the samples of the fault modes; once the network is trained successfully according to the specified conditions, the trained model may diagnose the faults. More specifically, we set S1 = (1 0 0 0 0), S2 = (0 1 0 0 0), S3 = (0 0 1 0 0), S4 = (0 0 0 1 0) and S5 = (0 0 0 0 1). The error matrix E and the observation probability matrix B are then obtained by processing the data collected from the related equipment, while the fault mode matrix, which consists of the fault mode vectors, is presented as S. Next, the error matrix E is computed by the designed algorithm.
4.1. Acquisition of the error matrix E

In our simulations, the numbers of input nodes and output nodes are chosen as five and seven for the test environment, respectively. The number of hidden nodes is determined according to the rule that it equals one plus double the number of output nodes, i.e. the number of hidden nodes is 15. To verify the forecasting performance, networks with 13 and 17 hidden nodes are compared with the network with 15 hidden nodes; if the neural network with the smallest predictive error is regarded as the optimal predictive neural network, the effect of the number of hidden nodes on the network's performance can be illustrated. The maximum number of training epochs is 1000 and the objective training error is 0.01. The network is then established and trained as follows.

net = newff(minmax(S), [No(i), 7], {'logsig', 'logsig'}, 'trainscg');   % No(i): candidate hidden node number (13, 15 or 17)
net.trainParam.epochs = 1000;
net.trainParam.goal = 0.01;
net = train(net, S, X);

where S and X represent the input and the output, respectively. The training results are listed in Table 1.

Table 1
Training results.
Epoch      MSE            Gradient
0/1000     0.448/0.01     0.0927/1e-06
14/1000    0.00676/0.01   0.00546/1e-06
21/1000    0.00729/0.01   0.00807/1e-06
60/1000    0.00774/0.01   0.0020/1e-06
217/1000   0.00629/0.01   0.00665/1e-06
307/1000   0.00463/0.01   0.0123/1e-06

Obviously, the objective error meets the given demand after 10 training epochs, and the specific training performance is shown in Fig. 6. Meanwhile, the neural networks with 13 and 17 hidden nodes are also trained for comparison when the optimal training results are selected; their training performance is shown in Figs. 7 and 8, respectively. Comparing Figs. 6–8, the training performance with 15 hidden nodes is better than that of the other two networks, so the neural network with 15 hidden nodes is selected to calculate the output error matrix, i.e.
Fig. 4. Simulation rig of the urban transit system.
E = [0.0904  0.9188  0.6355  0.7192  0.7522;
     0.9271  0.2379  0.5174  1.2200  0.6350;
     0.5784  0.5633  0.1774  0.9409  0.4405;
     0.7499  1.2478  1.0371  0.1175  0.8404;
     0.8645  0.7824  0.3599  1.1778  0.4407]    (29)
Thus, the normalized error matrix E' can be achieved as
E′ = [0.0591  0.6007  0.4155  0.4702  0.4918;
      0.5287  0.1357  0.2950  0.6957  0.3621;
      0.4357  0.4243  0.1336  0.7087  0.3318;
      0.3790  0.6306  0.5241  0.0594  0.4247;
      0.4934  0.4465  0.2054  0.6722  0.2515]    (30)
Fig. 5. Test rigs of motor driven systems.
4.2. Achievement of the matrix B
Notice that ejk is an important index used to judge the observation probability bjk, so the fuzzy mode, and hence the degree, is determined by ejk. The elements of the observation probability matrix are separated into four modes in accordance with the memberships of the fuzzy sets, and the four modes are denoted by large A1, larger A2, normal A3 and small A4, respectively. So, the range of the observation probability may be determined by Rule 1. Suppose that the universe is X = [0,1]; the interval is then divided into four parts. If the demarcation points are set as ξ = 0.25, η = 0.50 and θ = 0.75, respectively, the four intervals corresponding to the four modes are [0,0.25], [0.25,0.5], [0.5,0.75] and [0.75,1]. The mean values of ξ, η and θ are a1 = 0.25, a2 = 0.5 and a3 = 0.75, and the variances of the three variables are σ1 = σ2 = σ3 = 5/48. Therefore, the corresponding distribution probabilities Φξ, Φη and Φθ are as follows.
Fig. 6. Training performance with 15 hidden nodes (best training performance 0.0079243 at epoch 10; mean squared error vs. epochs).
Φξ = Φ((E′ − a1)/σ1) =
     [0.7143  0.8132  0.2660  0.8861  0.8226;
      0.7138  0.9800  0.4886  0.6567  0.9963;
      0.7817  0.9445  0.6350  0.9743  0.8137;
      0.7789  0.0048  0.4480  0.7181  0.9348;
      0.8157  0.9451  0.9298  0.1023  0.8294]    (31)

Φη = Φ((E′ − a2)/σ2) =
     [0.0005  0.5862  0.4061  0.6198  0.2635;
      0.1334  0.8909  0.3843  0.0087  0.5481;
      0.6566  0.9410  0.5816  0.4598  0.7750;
      0.6071  0.7356  0.1206  0.0005  0.9204;
      0.2435  0.6554  0.5056  0.8734  0.2225]    (32)

Fig. 7. Training performance with 13 hidden nodes (best training performance 0.0090504 at epoch 37; mean squared error vs. epochs).
Fig. 8. Training performance with 17 hidden nodes (best training performance 0.0092933 at epoch 25; mean squared error vs. epochs).

Φθ = Φ((E′ − a3)/σ3) =
     [0.2341  0.1712  0.0221  0.5100  0.7411;
      0.4010  0.9024  0.0003  0.9996  0.8537;
      0.4010  0.9121  0.9487  0.5404  0.0012;
      0.0067  0.9855  0.8799  0.1375  0.0292;
      0.7502  0.3537  0.4350  0.9990  0.3870]    (33)

So, the degrees to which the elements of matrix E′ belong to the fuzzy sets A1, A2, A3 and A4 can be described as follows:

μA1(E′) = [0.2857  0.1868  0.7340  0.1039  0.1774;
           0.2862  0.0200  0.5520  0.3433  0.0037;
           0.2183  0.0555  0.3750  0.0257  0.1863;
           0.2211  0.9952  0.5120  0.2819  0.0652;
           0.1843  0.0549  0.0702  0.8977  0.1706]    (34)

μA2(E′) = [0.7138  0.2270  0.6721  0.2663  0.5591;
           0.5804  0.0891  0.0637  0.6654  0.4482;
           0.1251  0.0035  0.0434  0.5145  0.0387;
           0.1718  0.7404  0.6086  0.7176  0.0144;
           0.5724  0.2897  0.4242  0.9757  0.6069]    (35)

μA3(E′) = [0.0005  0.0290  0.9160  0.5724  0.6406;
           0.8932  0.0085  0.5114  0.3125  0.5908;
           0.6162  0.0289  0.0079  0.9998  0.7738;
           0.9940  0.2403  0.3640  0.0005  0.9200;
           0.6379  0.3017  0.5056  0.0709  0.2225]    (36)

μA4(E′) = [0.2341  0.1712  0.0221  0.5900  0.7422;
           0.4010  0.9024  0.0003  0.9996  0.8537;
           0.0404  0.9121  0.9487  0.5404  0.0012;
           0.0067  0.9855  0.8799  0.1375  0.0292;
           0.7502  0.3537  0.4350  0.9990  0.3870]    (37)

Notice that the maximum of the corresponding elements of μA1(E′), μA2(E′), μA3(E′) and μA4(E′) assigns each element to one mode of the fuzzy sets {A1, A2, A3, A4}, so the fault judgment matrix O may be obtained by formulas (27)–(28) as follows.

O = [A2  A4  A3  A4  A4;
     A3  A4  A1  A4  A4;
     A3  A4  A4  A3  A3;
     A3  A4  A4  A2  A3;
     A4  A4  A3  A4  A2]    (38)

Subsequently, since the degree to which eij belongs to the fuzzy sets {A1, A2, A3, A4} is known, the observation probability intervals can be obtained by Rule 1 and denoted as B′, i.e.

B′ = [[0.50,0.75]  [0.00,0.25]  [0.25,0.50]  [0.00,0.25]  [0.00,0.25];
      [0.00,0.25]  [0.00,0.25]  [0.75,1.00]  [0.00,0.25]  [0.00,0.25];
      [0.25,0.50]  [0.00,0.25]  [0.00,0.25]  [0.25,0.50]  [0.25,0.50];
      [0.25,0.50]  [0.00,0.25]  [0.00,0.25]  [0.50,0.75]  [0.25,0.50];
      [0.00,0.25]  [0.00,0.25]  [0.25,0.50]  [0.00,0.25]  [0.50,0.75]]    (39)

To sum up, the range of the initial value of the observation probability matrix B is obtained, and it can then be used in the HMM to reduce the training time.

4.3. Motor fault diagnosis experiment

4.3.1. Algorithm flow of Baum-Welch
In real-world scenarios, the initial state vector of the fault diagnosis based on the improved Hidden Markov Model is selected, in combination with the proposed algorithm, as

π = (1/5, 1/5, 1/5, 1/5, 1/5)

and the initial observation probability matrix is calculated by the proposed method. The models are then trained by the EM algorithm. In general, the test samples are input into the trained models to obtain the log-likelihood under the various states, and the state corresponding to the maximum log-likelihood is regarded as the system state. To decrease the selection time, the initial value of matrix B selected by the proposed method is taken as the initial value of the Baum-Welch algorithm; meanwhile, once the initial state vector is selected, the state transition matrix may be given randomly. Therefore, the parameters of the model
may be optimized gradually by the EM algorithm to maximize the log-likelihood. The basic steps of the algorithm are designed as follows:

Step 1. Initialization. Let π = (1/5, 1/5, 1/5, 1/5, 1/5), let the state transition matrix A be given randomly, and let bjk be assigned by the presented method. Meanwhile, the initial model is obtained; let i = 0.
Step 2. EM steps.
E step: Let ξt(i, j) represent the probability of being in state Si at time t and in state Sj at time t + 1, given the Hidden Markov Model and the observation sequence of the system, and let γt(i) be the probability of being in state Si at time t. Let

ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)    (40)

γt(i) = ∑_{j=1}^{N} ξt(i, j)    (41)

Thus

ξt(i, j) = P(qt = Si, qt+1 = Sj, O | λ) / P(O | λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / P(O | λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / ∑_{i=1}^{N} ∑_{j=1}^{N} αt(i) aij bj(Ot+1) βt+1(j)    (42)

where αt(i) denotes the local probability of the forward algorithm and βt+1(j) denotes the probability of the local observation sequence in the backward algorithm, respectively:

αt(i) = P(o1, ⋯, ot, qt = Si | λ)    (43)

βt(i) = ∑_{j=1}^{N} aij bj(Ot+1) βt+1(j), t = T − 1, T − 2, ⋯, 1, 1 ≤ i ≤ N    (44)

M step: Combining the results calculated in the E step, π, A and B are re-estimated by the following formulas.

πi = p(q1 = Si) = γ1(i)    (45)

aij = ∑_{t=1}^{T−1} ξt(i, j) / ∑_{t=1}^{T−1} γt(i)    (46)

bj(k) = ∑_{t=1}^{T} γt(j) δ(ot, vk) / ∑_{t=1}^{T} γt(j)    (47)

So λi+1 may be deduced iteratively.
Step 3. Cycle of design. The EM step is repeated until π, A and B converge. Once the trained HMMs have been obtained, the test samples are taken as inputs of the HMMs to calculate the likelihood of each mode by the Viterbi algorithm, and the fault diagnosis is implemented according to the maximum likelihood criterion.
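The re-estimation of Eqs. (40)–(47) for a discrete-observation HMM can be sketched compactly as follows; scaling of the forward-backward variables is omitted, and the sequence, sizes and initial matrices are placeholders (in practice B would be initialized from the intervals B′ obtained above).

N = 5;  M = 4;  T = 30;
O = randi(M, 1, T);                          % observation sequence (placeholder)
pi0 = ones(N, 1) / N;                        % Step 1: uniform initial state distribution
A   = rand(N, N);   A = A ./ sum(A, 2);      % random row-stochastic transition matrix
B   = rand(N, M);   B = B ./ sum(B, 2);      % emission matrix (initialize from B' in practice)
for it = 1:50                                % Step 3: cycle until convergence
    % E step: forward/backward probabilities, Eqs. (43)-(44)
    alpha = zeros(N, T);  beta = ones(N, T);
    alpha(:, 1) = pi0 .* B(:, O(1));
    for t = 2:T, alpha(:, t) = (A' * alpha(:, t-1)) .* B(:, O(t)); end
    for t = T-1:-1:1, beta(:, t) = A * (B(:, O(t+1)) .* beta(:, t+1)); end
    gamma = alpha .* beta;  gamma = gamma ./ sum(gamma, 1);          % Eq. (41)
    xi = zeros(N, N, T-1);                                           % Eq. (42)
    for t = 1:T-1
        x = (alpha(:, t) * (B(:, O(t+1)) .* beta(:, t+1))') .* A;
        xi(:, :, t) = x / sum(x(:));
    end
    % M step: re-estimation, Eqs. (45)-(47)
    pi0 = gamma(:, 1);
    A   = sum(xi, 3) ./ sum(gamma(:, 1:T-1), 2);
    for k = 1:M, B(:, k) = sum(gamma(:, O == k), 2) ./ sum(gamma, 2); end
end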
4.3.2. Fault diagnosis results and analysis
The five kinds of motor states considered are bearing related faults S1, rotor winding inter-turn short circuit faults S2, motor broken bar faults S3, other faults S4 and the fault-free condition S5. 300 groups of processed fault characteristic vectors, consisting of 60 groups of data for every type of fault, are chosen randomly. In each 60 groups of data, 50 groups are used to train the corresponding Hidden Markov Model and the other 10 groups are used to verify the method.

Fig. 9. Diagnosis results on fault modes (logarithmic likelihood over ten experimental runs for S1–S5): (a) diagnosis results selecting B randomly; (b) diagnosis results selecting B by the new algorithm.

Table 2
Diagnosis results for five states.
State symbol   S1   S2   S3   S4   S5   Recognition rate
S1             9    0    1    0    0    90%
S2             1    9    0    0    0    90%
S3             1    1    7    0    1    70%
S4             0    0    0    9    1    90%
S5             1    0    1    0    8    80%
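As a small check, the recognition-rate column of Table 2 and the roughly 84% average reported below can be reproduced from the confusion counts (10 test groups per state):

C    = [9 0 1 0 0; 1 9 0 0 0; 1 1 7 0 1; 0 0 0 9 1; 1 0 1 0 8];   % Table 2 counts
rate = diag(C) ./ sum(C, 2);     % per-state recognition rates: 0.90 0.90 0.70 0.90 0.80
avg  = mean(rate);               % average recognition rate, approximately 0.84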
The fault mode experiment is taken as an example to verify the diagnosis effect in comparison with the traditional method; the specific results are shown in Fig. 9. It can be seen from Fig. 9(a) that the motor is diagnosed as state S1 in six tests and as state S2 in two tests, so the accuracy of fault recognition is nearly 80% in these tests. Similarly, Fig. 9(b) shows that only one test sample is wrongly diagnosed as S2; in other words, the probability that fault S1 is diagnosed as fault S1 is 90%. This shows that the improved algorithm proposed in this paper can improve the diagnosis rate to some extent. Moreover, it can shorten the training time owing to the determination of the initial value of matrix B.
Based on the proposed method, the specific diagnosis results for the five motor states are shown in Table 2. The diagnosis results in Table 2 show that the average recognition rate is approximately 84%. The rates of the bearing related faults S1, the rotor winding inter-turn short circuit faults S2 and the other faults S4 can be up to 90%. However, the recognition rate of the motor broken bar faults S3 is only 70%. Although the small sample size affects the experimental results, the rate is effective on the whole.

5. Conclusion

In accordance with the analysis of the Baum-Welch algorithm, a selection method for the observation probability matrix B is put forward. Firstly, the related theory of the predictive neural network and intuitionistic fuzzy sets is introduced. Subsequently, the qualitative description model for the observation probability matrix is established by the predictive neural network. Based on this, the calculation model for computing the value range B′ of the observation probability matrix B is given by the theory of intuitionistic fuzzy sets. Finally, motor state recognition experiments are carried out to verify the method proposed in this paper, and the experimental results prove that the proposed method is feasible and can improve the recognition rate to some extent.
However, owing to the fact that the information data come from different parts of the motor drive systems of urban transit, the running situation of the whole system is affected. Thereby, the investigation of how to logically distribute the information of the systems remains an interesting area for further research. Meanwhile, in the practical application of motor drive systems in urban transit, users always hope that the diagnosis algorithm can forecast and recognize all faults caused by all factors. So, how to fuse the incomplete and complete information of the MDS to diagnose faults is also a very important problem in urban transit systems.

Acknowledgements

The work is sponsored by the Natural Science Foundation of China P.R. [Grants 61304104, 61573076 and 61663008]; the Program for Excellent Talents of Chongqing Higher School of China P.R. [Grant 2014-18]; and the Chongqing Natural Science Foundation of China P.R. [Grant cstc2015jcyjA0504]. The authors would also like to thank the anonymous referees for their invaluable suggestions on improving this paper.

References
[1] Fanzhong Wang, Motor Fault Diagnosis Based on Wavelet Neural Network and Support Vector Machine, Shanxi: Taiyuan, China P.R. (2011). [2] Talel Bessaoudi, Faycal Ben Hmida, Chien-Shu Hsieh, Robust state and fault estimation for linear descriptor stochastic systems with disturbances: a DC motor application, IET Control Theory Appl. 11 (5) (2017) 601–610. [3] Zhiwei Gao, Carlo Cecati, Steven X. Ding, A survey of fault diagnosis and faulttolerant techniques—part ii: fault diagnosis with knowledge-based and hybrid/active approaches, IEEE Trans. Ind. Electron. 62 (6) (2015) 3768–3774. [4] X. Zhang, Q. Miao, Z.W. Liu, Remaining useful life prediction of lithium-ion battery using an improved UPF method based on MCMC, Microelectron. Reliab. 75 (2017) 288–295. [5] Yuchen Song, Datong Liu, Chen Yang, Yu Peng, Data-driven hybrid remaining useful life estimation approach for spacecraft lithium-ion battery, Microelectron. Reliab. 75 (2017) 142–153. [6] Yan-xia Shen, Zhi-cheng Ji, Jian-guo Jiang, Review of artificial intelligence methods in motor fault diagnosis, Tech. Rev. 2 (2004) 39–42. [7] Tianjian Yu, Tefang Chen, Yating Chen, Shu Cheng, Fault diagnosis of bearing failure using HMMs, J. Harbin Inst. Technol. 48 (2) (2016) 184–188. [8] Tianjian Yu, Yating Chen, Tefang Chen, Chunyang Chen, Research on motor fault diagnosis based on HMM, J. Railw. Sci. Eng. 11 (4) (2014) 104–108. [9] Hu Wei, Gao Lei, Fu Li, Research on motor fault detection method based on optimal order Hidden Markov Model, Chin. J. Sci. Instrum. 34 (3) (2013) 524–530. [10] Hisahide Nakamura, Masaaki Chihara, Tomokazu Inoki, et al., Impulse testing for detection of insulation failure of motor winding and diagnosis based on Hidden Markov Model, IEEE Trans. Dielectr. Electr. Insul. 17 (5) (2010) 1619–1627. [11] Z. Li, Z. Wu, Y. He, C. Fulei, Hidden Markov Model-based fault diagnosis method in speed-up and speed-down process for rotating machinery, Mech. Syst. Signal Process. 19 (2005) 329–339. [12] D. Ye, Q. Ding, Z. Wu, New method for faults diagnosis of rotating machinery based on 2-dimension Hidden Markov Model, Proc. Int. Symp. Precis. Mech. Meas. 4 (2002) 391–395 (Hefei, China). [13] Q. Miao, V. Makis, Condition monitoring and classification of rotating machinery using wavelets and Hidden Markov Models, Mech. Syst. Signal Process. 21 (2) (2007) 840–855. [14] S.Z. Yu, H. Kobayshi, An efficient forward-backward algorithm for an explicitduration Hidden Markov Model, Signal Proc. IEEE Lett. 10 (1) (2003) 11–14. [15] Rui Wang, Li Yanxiao, Sun Hui, et al., Performance analysis of switched flight control system based on Hidden Markov Model, J. Electron. Inf. Technol. 39 (4) (2017) 989–996. [16] G.D.J. Forney, The Viterbi algorithm, Proc. IEEE 61 (5) (1973) 268–278. [17] Rory Telford, Stuart Galloway, Fault classification and diagnostic system for unmanned aerial vehicle electrical networks based on Hidden Markov Models, IET Electr. Syst. Transp. 5 (3) (2015) 103–111. [18] Zhang Yi, Yao Yuan, Yuan Luo, An improved HMM dynamic hand gesture recognition algorithm based on B parameter, J. Huazhong Univ. Sci. Technol. 43 (S1) (2015) 416–419. [19] Zaijian Wang, Yuning Dong, Zhang Hui, A multimedia service classification algorithm based on improved Hidden Markov Model, J. Electron. Inf. 37 (2) (2015) 499–503. [20] Huang Jingde, Hao Xue, Huang Yi, Potential fault recognition based on improved Hidden Markov Model, Chin. J. Sci. Instrum. 32 (11) (2011) 2481–2486. 
[21] Zhang Zengyin, Yuan Chang An, Hu Jianjun, et al., HMM training method based on GEP and Baum-Welch algorithm, Comput. Eng. Des. 31 (9) (2010) 2019–2027. [22] Xi Xiaojing, Kunhui Lin, Zhou Changle, et al., Artificial Intelligence Applications and Innovations [M], Springer US, 2005. [23] Peizhuang Wang, Fuzzy Set Theory and Its Application [M], Shanghai science and Technology Press, Shanghai, 1986. [24] Chuan Li, Luiz Ledo, Myriam Delgado, Mariela Cerrada, Fannia Pacheco, Diego Cabrera, René-Vinicio Sánchez, José Valente de Oliveira, A Bayesian approach to consequent parameter estimation in probabilistic fuzzy systems and its application to bearing fault classification, Knowl.-Based Syst. 129 (2017) 19–60. [25] Xing Qinghua, Liu Fuxian, Method of determining membership and nonmembership function of intuitionistic fuzzy sets, Control. Decis. 24 (3) (2009) 393–397. [26] Lei Yang, Hua Jixue, Yin Hongyan, Lei Yingjie, Method of determining IFS nonmembership function based on three - point method [J], Comput. Therm. Sci. 3 (01) (2009) 128–130.