Digital Signal Processing 16 (2006) 402–418 www.elsevier.com/locate/dsp
A hybrid PSO-GD based intelligent method for machine diagnosis ✩

Qian-jin Guo a,b, Hai-bin Yu a,∗, Ai-dong Xu a

a Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, People's Republic of China
b Graduate School of the Chinese Academy of Sciences, Beijing 100039, People's Republic of China
Available online 2 February 2006
Abstract

This paper presents an intelligent methodology for diagnosing incipient faults in rotating machinery. In this fault diagnosis system, wavelet neural network techniques are used in combination with a new evolutionary learning algorithm. The new algorithm hybridizes the constriction factor approach (CFA) for particle swarm optimization (PSO) with the gradient descent (GD) technique, and is therefore called HGDPSO. HGDPSO is constructed so that the CFA for PSO serves as the base-level search, giving a good direction toward the global optimal region, while a local gradient descent (GD) search performs the fine tuning that determines the optimal solution in the final stage. The effectiveness of the HGDPSO based WNN is demonstrated through the classification of fault signals in rotating machinery. The simulated results show its feasibility and validity.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Wavelet neural network; Particle swarm optimization; Gradient descent algorithm; Machine diagnosis
1. Introduction

With the development of industry, machines and production systems are becoming more and more complicated, and fault diagnosis for such large and complicated systems is becoming correspondingly more difficult. Any know-how that makes fault diagnosis of large-scale systems easier to handle and implement, and that improves diagnosis accuracy, is clearly desirable. Recently, neural networks have become a popular tool in fault diagnosis due to their fault tolerance and their capacity for self-organization. The multilayer perceptron (MLP) [1], along with the back-propagation (BP) training algorithm, is probably the most frequently used type of neural network in practical applications. Unfortunately, these ANNs have some inherent defects, such as low learning speed, existence of local minima, and difficulty in choosing the proper size of network to suit a given problem. To remedy these defects, we combine wavelet theory with the neural network to form a wavelet neural network (WNN) whose activation functions are drawn from a family of wavelets. The wavelet neural network (WNN) has been proposed as a novel universal tool for functional approximation [2]; it shows surprising effectiveness in solving the conventional problem of poor convergence or even divergence encountered in other kinds of neural networks, and it can dramatically increase convergence speed [3–5].
✩ Project supported by the National High-Tech R&D Program for CIMS, China (Grant 2003AA414210).
* Corresponding author.
E-mail address: [email protected] (H. Yu).

1051-2004/$ – see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.dsp.2005.12.004
Advances in wavelet theory, algorithms, and applications have greatly influenced the development of many disciplines of science and technology, including mathematics, physics, and engineering, and will continue to influence other areas as well [6–8].

How to determine the structure and parameters of a neural network promptly and efficiently has always been a difficult issue in neural network research [9]. In previous literature [10–15,20], the standard PSO algorithm has been used to train MFNNs. Other variations of PSO have also been used to train MFNNs [16–18], and the performance was acceptable. This paper applies the HGDPSO, which combines PSO with a gradient search algorithm, to training the WNN.

Particle swarm optimization (PSO), developed by Kennedy and Eberhart [19,20], is one of the modern heuristic algorithms belonging to the family of evolutionary algorithms (EAs), and it has gained much attention in various engineering applications [21–25]. Generally, the PSO is characterized as a simple heuristic with a well-balanced mechanism and the flexibility to enhance and adapt both global and local exploration abilities [23]. It is a stochastic search technique with reduced memory requirements, and it is computationally efficient and easier to implement than other EAs. PSO shares some of the common features of other EAs, including a selection procedure in some adaptive PSO versions [26,27]. Although PSO and other EAs appear to be good methods for solving optimization problems, the solutions they obtain are sometimes only near the global optimum. To overcome this drawback, a hybrid method that integrates PSO with a gradient descent algorithm is proposed in this paper. The hybrid method uses the property of PSO that it can provide a good solution at the beginning even when the problem has many local optima. Then, the local searching property of the GD algorithm is used to obtain the final solution. Basically, the hybrid method can be divided into two parts: the first part employs PSO to obtain a near-global solution, while the second part employs the GD algorithm to determine the optimal solution. To validate the performance of the proposed approach, fault diagnosis of a rotating machine is tested, and the results are compared with those obtained using the PSO and GD techniques.

This paper is organized as follows. In Section 2, the architecture of the wavelet neural network is introduced. Overviews of the PSO and GD algorithms are given in Sections 3 and 4, respectively. Section 5 describes the HGDPSO learning algorithm for wavelet neural network design. In Section 6, WNNs designed by HGDPSO are simulated and applied to intelligent fault diagnosis. Finally, conclusions are presented in Section 7.

2. Neural network for fault diagnosis

The WNN employed in this study is designed as a three-layer structure with an input layer, a wavelet layer (hidden layer), and an output layer. The topological structure of the WNN is illustrated in Fig. 1, where wjk denotes the weights connecting the input layer and the hidden layer, and uij denotes the weights connecting the hidden layer and the output layer. In this WNN model, the hidden neurons have wavelet activation functions of different resolutions, and the output neurons have sigmoid activation functions. A similar architecture can be used for general-purpose function approximation and system identification.
The activation functions of the wavelet nodes in the wavelet layer are derived from a mother wavelet ψ(x). Suppose that ψ(x) ∈ L²(R), the space of square-integrable (measurable) functions on the real line, and that it satisfies the admissibility condition [28]:

\int_{-\infty}^{+\infty} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega < \infty,   (1)

where \hat{\psi}(\omega) denotes the Fourier transform of ψ(x). The output of the wavelet neural network Y is represented by the following equation:

y_i(t) = \sigma(x_n) = \sigma\Big( \sum_{j=0}^{M} u_{ij}\, \psi_{a,b}\Big( \sum_{k=0}^{L} w_{jk}\, x_k(t) \Big) \Big), \quad i = 1, 2, \ldots, N,   (2)

where

\sigma(x_n) = \frac{1}{1 + e^{-x_n}}.   (3)
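For concreteness, the forward pass of Eqs. (2)–(3) can be sketched in plain NumPy as follows; the Morlet mother wavelet introduced later in Section 4 is assumed here, and all names and shapes are illustrative rather than taken from the paper.

import numpy as np

def morlet(t):
    # Morlet mother wavelet (cf. Eq. (9)): cos(1.75 t) * exp(-t^2 / 2)
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2)

def wnn_forward(x, W, U, a, b):
    """Forward pass of the three-layer WNN, Eqs. (2)-(3).

    x : (L,) input vector
    W : (M, L) input-to-hidden weights w_jk
    U : (N, M) hidden-to-output weights u_ij
    a, b : (M,) dilation and translation of each hidden wavelon
    """
    net = W @ x                        # net_j = sum_k w_jk x_k
    hidden = morlet((net - b) / a)     # psi_{a,b}(net_j)
    return 1.0 / (1.0 + np.exp(-(U @ hidden)))   # sigmoid output, Eq. (3)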
Here y_i denotes the ith component of the output vector, x_k denotes the kth component of the input vector, u_{ij} is the connection weight between output unit i and hidden unit j, w_{jk} is the weight between hidden unit j and input unit k, a_j and b_j are the dilation and translation coefficients of the wavelons in the hidden layer, and L, M, and N are the numbers of input, hidden, and output nodes, respectively.

Fig. 1. The WNN topology structure.

3. Particle swarm optimization

3.1. Basic concept of particle swarm optimization

Particle swarm optimization (PSO) is an evolutionary computation technique motivated by the simulation of social behavior [19,20]. Each individual (agent) uses two important kinds of information in its decision process. The first is its own experience: it has tried the choices, knows which state has been best so far, and knows how good it was. The second is the experience of the other agents: it knows how the agents around it have performed, which choices its neighbors have found most positive so far, and how positive the best pattern of choices was. In the PSO system, each agent makes its decisions according to its own experience and the experience of other agents.

The system is initialized with a population of random solutions. Each potential solution, called a particle (agent), is given a random velocity and is flown through the problem space. The agents have memory: each agent keeps track of its previous best position (called pbest) and the corresponding fitness. There is one pbest for each agent in the swarm, and the agent with the greatest fitness is called the global best (gbest) of the swarm. Each particle is treated as a point in a D-dimensional space. The ith particle is represented as X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}). The best previous position of the ith particle (pbest_i), i.e., the position giving the best fitness value, is represented as P_i = (p_{i1}, p_{i2}, \ldots, p_{iD}). The best particle among all particles in the population is represented by P_g = (p_{g1}, p_{g2}, \ldots, p_{gD}). The velocity, i.e., the rate of position change, of particle i is represented as V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}). During the iteration procedure, the velocity of particle i is updated according to the following equation:

v_{id}^{k+1} = w\, v_{id}^{k} + c_1\,\mathrm{rand} \times \big(p_{id} - x_{id}^{k}\big) + c_2\,\mathrm{rand} \times \big(p_{gd} - x_{id}^{k}\big),   (4)

where V_i^k is the current velocity of agent i at iteration k, V_i^{k+1} is the modified velocity of agent i at iteration k + 1, rand is a random number between 0 and 1, X_i^k is the current position of agent i at iteration k, p_{id} is the pbest of agent i, p_{gd} is the gbest of the group, w is the inertia weight for the velocity of agent i, c_1 and c_2 are the weight coefficients for each term, and k is the current iteration number.
Using the above equation, a certain velocity that gradually gets close to pbest and gbest can be calculated. The current position (searching point in the solution space) can be modified by the following equation:

x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}.   (5)
Velocities of particles on each dimension are clamped to a maximum velocity V_max. If the sum of accelerations would cause the velocity on a dimension to exceed V_max, which is a parameter specified by the user, then the velocity on that dimension is limited to V_max. PSO performance is sensitive to V_max: a larger V_max facilitates global exploration, while a smaller V_max encourages local exploitation [29]. The constants c_1 and c_2 represent the weighting of the stochastic acceleration terms that pull each particle toward the pbest and gbest positions. Low values allow particles to roam far from target regions before being tugged back, while high values result in abrupt movement toward, or past, the target regions. Hence, the acceleration constants c_1 and c_2 are often set to 2.0 according to past experience. The inertia weight w was introduced in [30] to improve PSO performance. The model using (6) is called the "inertia weights approach (IWA)." Suitable selection of the inertia weight w provides a balance between global and local exploration and exploitation, and on average results in fewer iterations being required to find a sufficiently optimal solution. As originally developed, the inertia weight w is set according to the following equation:

w = w_{\max} - \frac{w_{\max} - w_{\min}}{iter_{\max}} \times k,   (6)
where w_max is the initial weight, w_min is the final weight, and iter_max is the maximum iteration number.

3.2. Constriction factor approach (CFA) [20,31,32]

The basic system equations of PSO ((4)–(6) in IWA) can be considered a kind of difference equation. In later studies, to ensure convergence, a mathematical analysis of the algorithm was given by Clerc [31,32], who proposed the use of a constriction factor χ. The algorithm was named the constriction factor approach (CFA). The velocity update of CFA (simplest constriction) can be expressed as follows:

v_{id}^{k+1} = \chi \big( v_{id}^{k} + c_1\,\mathrm{rand} \times (p_{id} - x_{id}^{k}) + c_2\,\mathrm{rand} \times (p_{gd} - x_{id}^{k}) \big),
x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1},
\chi = \frac{2\kappa}{\big|2 - \varphi - \sqrt{\varphi^{2} - 4\varphi}\big|}, \qquad \varphi = c_1 + c_2,\ \varphi > 4.   (7)
The variable κ can range in [0, 1]; a value of 1.0 works fine, as does a value of ϕ = 4.1. Thus, if ϕ = 4.1 and κ = 1, then χ ≈ 0.73, simultaneously damping the previous velocity term and the random acceleration terms. The CFA of PSO is not defined for ϕ ≤ 4.0. As ϕ increases above 4.0, χ gets smaller; for instance, if ϕ = 5, then χ ≈ 0.38, and the damping effect is even more pronounced. The convergence characteristic of the system can be controlled by ϕ. Namely, Clerc et al. [32] found that the system behavior could be controlled so that it has the following features: (a) the system does not diverge in a real-valued region and finally converges; (b) the system can search different regions efficiently by avoiding premature convergence. Unlike other EC methods, the CFA of PSO ensures the convergence of the search procedure on the basis of mathematical theory. CFA can generate higher-quality solutions than PSO with IWA [33]. Eberhart and Shi compared the performance of PSO with its different versions and concluded that the best approach is to use the constriction factor while limiting the maximum velocity V_max to the dynamic range of the variable X_max on each dimension [33]; for instance, it is convenient to take V_max = X_max. However, CFA only considers the dynamic behavior of one agent, not the effect of the interaction among agents; that is, the equations were developed with fixed best positions (pbest and gbest), although pbest and gbest can change during the search procedure in the basic PSO equations.
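As an illustration of the CFA update in Eq. (7), a minimal one-particle step might look like the following sketch (NumPy; the function name and defaults are illustrative, with c1 = c2 = 2.05 giving ϕ = 4.1 and χ ≈ 0.73):

import numpy as np

def cfa_step(x, v, pbest, gbest, c1=2.05, c2=2.05, kappa=1.0):
    """One constriction-factor PSO step (Eq. (7)) for a single particle."""
    phi = c1 + c2                       # must be greater than 4
    chi = 2.0 * kappa / abs(2.0 - phi - np.sqrt(phi ** 2 - 4.0 * phi))
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v_new = chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
    return x + v_new, v_new             # new position and new velocity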
Fig. 2. Typical neighborhood topologies.
3.3. Neighborhood types

Eberhart and Kennedy called the above-mentioned basic method the "gbest model." In 1999, Kennedy studied how different social network structures can influence the performance of the PSO algorithm, arguing that a manipulation affecting the topological distance among particles might affect the rate and degree to which individuals are attracted toward a particular solution [34]. The influence of various neighborhood topologies on the PSO has been studied extensively [34–40]. Figure 2 illustrates the best-known neighborhood topologies.

(1) Star (see Fig. 2a), also known as gbest, is a fully connected neighborhood relation. Each particle has all the other particles as neighbors; this implies that the global best particle position is identical for all particles.
(2) Circle (see Fig. 2b), also known as lbest, connects each particle to its K immediate neighbors. For K = 2 the circle topology corresponds to a ring (a sketch of the neighbor indexing is given after this list). The "flow of information" is heavily reduced compared to the star topology. It will, for instance, take swarmsize/2 time steps for a new global best position to propagate to the other side of the ring.
(3) Von Neumann (see Fig. 2c): the Von Neumann architecture, taking the form of a grid with wrap-around, considers the particles above, below, to the left, and to the right to be within a given particle's neighborhood. The more densely connected the neighborhood, the more quickly information about good solutions is communicated among particles in the swarm. Neighborhood topologies such as lbest and Von Neumann result in superior solutions at the cost of slower convergence, since diversity within the swarm is maintained longer.
(4) Wheel (see Fig. 2d) connects one central particle to all the others, where the peripheral particles are not connected with each other. Even though the number of connections is smaller in this topology than in the circle topology (swarmsize − 1 compared to swarmsize), information flows faster, since it takes no more than two time steps for a new global best position to reach any other particle.
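The ring indexing of the circle (lbest) topology mentioned in item (2) can be sketched as follows; ring_neighbors is a hypothetical helper, not part of the paper.

def ring_neighbors(i, swarm_size, k=2):
    """Indices of the K immediate neighbors of particle i on a ring (lbest)."""
    half = k // 2
    return [(i + d) % swarm_size for d in range(-half, half + 1) if d != 0]

# Example: in a swarm of 10 particles, particle 0 has neighbors [9, 1]
print(ring_neighbors(0, 10))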
In comparisons of lbest and gbest canonical algorithms, a difference has been noted in convergence speed and in the ability to search over local optima [20]. The gbest topology (i.e., the biggest neighborhood possible) has often been shown to converge on optima more quickly than lbest, but gbest is also more susceptible to the attraction of local optima, as the population rushes unanimously toward the first good solution found.

4. Gradient descent (GD) algorithm

Gradient descent is a long-established search/learning technique and is commonly used to train multilayer feedforward neural networks [41,42]. The network is trained with the gradient descent (GD) algorithm in batch mode. During the training phase, the wavelet node parameters a, b and the WNN weights w_{jk}, u_{ij} are adjusted to minimize the least-square error. Given d_i^p as the ith desired target output for the pth input pattern, the cost function can be written as

E = \frac{1}{2} \sum_{p=1}^{P} \sum_{i=1}^{N} \big( d_i^p - y_i^p \big)^2.   (8)
The selection of the mother wavelet is very important and depends on the particular application. There are a number of well-defined mother wavelets such as the Morlet, Haar, Mexican hat, and Meyer wavelets. Groups of them are called families, such as the Daubechies, Biorthogonal, Coiflet, and Symlet families [43]. For this wavelet neural network, the Morlet wavelet has been chosen as the basis function of the network's hidden layer; it has been the preferred choice in most work dealing with WNNs because of its simple explicit expression:

\psi_{a,b}(t) = \cos(1.75\, t_z)\, e^{-t_z^2/2},   (9)

where t_z = (t - b)/a, and its derivative with respect to t_z is

\psi'_{a,b}(t_z) = -1.75 \sin(1.75\, t_z)\, \exp\!\big(-t_z^2/2\big) - t_z \cos(1.75\, t_z)\, \exp\!\big(-t_z^2/2\big).   (10)

Denote net_j = \sum_{k=0}^{L} w_{jk} x_k. Thus

\psi_{a,b}(net_j) = \psi\!\left( \frac{net_j - b_j}{a_j} \right),   (11)

y_i(t) = \sigma\!\Big( \sum_{j=0}^{M} u_{ij}\, \psi_{a,b}(net_j) \Big), \quad i = 1, 2, \ldots, N,   (12)

\sigma'(u) = \frac{\partial \sigma(u)}{\partial u} = \sigma(u)\big(1 - \sigma(u)\big).   (13)
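Since the derivative in Eq. (10) is easy to get wrong, the following small sketch (plain NumPy, illustrative) checks it against a central finite difference:

import numpy as np

def morlet(tz):
    # Eq. (9)
    return np.cos(1.75 * tz) * np.exp(-tz ** 2 / 2)

def morlet_prime(tz):
    # Eq. (10)
    return (-1.75 * np.sin(1.75 * tz) - tz * np.cos(1.75 * tz)) * np.exp(-tz ** 2 / 2)

tz = np.linspace(-3.0, 3.0, 7)
h = 1e-6
fd = (morlet(tz + h) - morlet(tz - h)) / (2 * h)     # numerical derivative
print(np.max(np.abs(fd - morlet_prime(tz))))         # should be tiny (~1e-9)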
Such that

\delta_{u_{ij}} = \frac{\partial E}{\partial u_{ij}} = -\sum_{p=1}^{P} \big( d_i^p - y_i^p \big)\, y_i^p \big( 1 - y_i^p \big)\, \psi_{a,b}\big( net_j^p \big),   (14)

\delta_{w_{jk}} = \frac{\partial E}{\partial w_{jk}} = -\sum_{p=1}^{P} \sum_{i=1}^{N} \big( d_i^p - y_i^p \big)\, y_i^p \big( 1 - y_i^p \big)\, u_{ij}\, \psi'_{a,b}\big( net_j^p \big)\, \frac{x_k^p}{a_j},   (15)

\delta_{b_j} = \frac{\partial E}{\partial b_j} = \sum_{p=1}^{P} \sum_{i=1}^{N} \big( d_i^p - y_i^p \big)\, y_i^p \big( 1 - y_i^p \big)\, u_{ij}\, \psi'_{a,b}\big( net_j^p \big)\, \frac{1}{a_j},   (16)

\delta_{a_j} = \frac{\partial E}{\partial a_j} = \sum_{p=1}^{P} \sum_{i=1}^{N} \big( d_i^p - y_i^p \big)\, y_i^p \big( 1 - y_i^p \big)\, u_{ij}\, \psi'_{a,b}\big( net_j^p \big)\, \frac{net_j^p - b_j}{a_j^2}.   (17)
The learning rate and momentum are set as η and μ, respectively, in the experiments. The parameter increments are

\Delta w_{jk}(t+1) = -\eta \frac{\partial E}{\partial w_{jk}} + \mu\, \Delta w_{jk}(t),   (18)
\Delta u_{ij}(t+1) = -\eta \frac{\partial E}{\partial u_{ij}} + \mu\, \Delta u_{ij}(t),   (19)
\Delta a_j(t+1) = -\eta \frac{\partial E}{\partial a_j} + \mu\, \Delta a_j(t),   (20)
\Delta b_j(t+1) = -\eta \frac{\partial E}{\partial b_j} + \mu\, \Delta b_j(t).   (21)

Then the parameters are updated as follows:

w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}(t+1),   (22)
u_{ij}(t+1) = u_{ij}(t) + \Delta u_{ij}(t+1),   (23)
a_j(t+1) = a_j(t) + \Delta a_j(t+1),   (24)
b_j(t+1) = b_j(t) + \Delta b_j(t+1).   (25)
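A compact sketch of the batch update rules in Eqs. (18)–(25) is given below; the dictionaries holding parameters, gradients (Eqs. (14)–(17)), and previous increments are assumed containers, not the paper's own data structures.

def gd_momentum_step(params, grads, deltas, eta=0.1, mu=0.9):
    """One batch GD step with momentum for all WNN parameter groups.

    params, grads, deltas: dicts keyed by 'w', 'u', 'a', 'b' holding arrays of
    matching shape (current values, dE/d(param), previous increments).
    """
    for key in params:
        # Eqs. (18)-(21): increment = -eta * gradient + mu * previous increment
        deltas[key] = -eta * grads[key] + mu * deltas[key]
        # Eqs. (22)-(25): apply the increment
        params[key] = params[key] + deltas[key]
    return params, deltas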
5. Hybrid particle swarm optimization

In their original work, Eberhart and Kennedy applied the PSO technique to optimizing artificial neural networks [20]. They found that particle swarm optimization was comparable to the gradient descent algorithm for training neural networks, and was even superior in some respects; this is why particle swarm optimization was chosen as a training method for this project. However, it is important to note that their neural networks were very limited: the network described in their paper used only 13 connection weights. The neural networks used here have on the order of 150 connection weights, so it remains to be seen whether particle swarm optimization is comparable to the gradient descent algorithm in a non-trivial context.

The stochastic nature of the particle swarm optimizer makes properties such as global convergence more difficult to prove (or disprove). Ozcan and Mohan published the first mathematical analyses regarding the trajectory of a PSO particle [44,45]. From the theoretical analysis [32], the trajectory of the particle x_i(t) converges onto a weighted mean of pbest and gbest. Solis and Wets [46] studied the convergence of stochastic search algorithms, most notably pure random search algorithms, providing criteria under which algorithms can be considered global search algorithms or merely local search algorithms. Van den Bergh [18] used their definitions extensively in the study of the convergence characteristics of the PSO and the guaranteed convergence PSO (GCPSO); he proved that the PSO is not even guaranteed to converge to a local extremum, whereas GCPSO can converge on a local extremum. Cui and Zeng [47] proposed a new particle swarm optimizer, called stochastic PSO (SPSO), which is guaranteed to converge to the global optimum with probability one, and provided a global convergence analysis of SPSO using Solis and Wets' results. Inspired by all of these, we use a hybrid algorithm integrating PSO with the gradient descent algorithm for WNN training. We call this algorithm HGDPSO in the following sections.

The constriction PSO method results in particle convergence over time; that is, the amplitude of an individual particle's oscillations decreases as it focuses on a previous best point [20]. Though this kind of particle converges to a point over time, another factor in the paradigm prevents collapse of the trajectory: the target "best" point is actually a stochastically weighted average of two points, pbest(i) and gbest [20]. If those two points are near one another, then the particle will cycle around a single center, eventually converging on the region of the two points. On the other hand, if the swarm is still exploring various optima and a particle's own previous best is in a different region from the neighborhood's previous best, that is, pbest(i) is distant from gbest, then the particle's cycles will remain wide; it cannot converge on a target that keeps moving around the search space. Thus a particle with a built-in tendency to converge will continue to explore when the "social" conditions are not conformist [20]. As the members of a neighborhood begin to cluster in the same optimal region, the particle trajectories become narrower, intensely exploiting a focused region of the search space [20]. So, if x_i(t) = pbest(i) = gbest, the velocity of particle i tends to be quite small. To improve the global search capability, we conserve the current best position of
the swarm, gbest, and use the gradient descent algorithm to optimize particle i's position x_i(t + 1), while the other particles are manipulated according to (7). That is: if pbest(i) = gbest, then particle i's position x_i(t + 1) continues to be optimized using the gradient descent algorithm and the other particles are manipulated according to (7); if pbest(i) ≠ gbest and gbest does not change, then all particles are manipulated according to (7); if pbest(i) ≠ gbest and gbest changes, there exists an integer k satisfying x_k(t + 1) = pbest(k) = gbest, and particle k's position continues to be optimized using the gradient descent algorithm while the other particles are manipulated according to (7). In this way the global search capability is enhanced.

It is assumed that the WNN is chosen for all application cases in this study. Without loss of generality, denote by W the connection weight matrix between the input layer and the hidden layer, by U the one between the hidden layer and the output layer, by A the dilation coefficients of the wavelons in the hidden layer, and by B the translation coefficients of the wavelons in the hidden layer. It is further denoted that X = {W, U, B, A}. When PSO is used to train the WNN, the ith particle is represented as X_i = {W_i, U_i, B_i, A_i}. The best previous position (i.e., the position giving the best fitness values) of any particle is recorded and represented as P_i = {P_i^w, P_i^u, P_i^b, P_i^a}. The index of the best particle among all particles in the population is represented by the symbol g; hence P_g = {P_g^w, P_g^u, P_g^b, P_g^a} is the best matrix found in the population. The rate of position change (velocity) for particle i is represented as V_i = {V_i^w, V_i^u, V_i^b, V_i^a}. Let m and n denote the index of matrix row and column, respectively. The particles are manipulated according to Eq. (7).

The pseudo code of the proposed solution methodology that integrates the CFA of PSO with the GD algorithm for WNN training can be summarized as follows:

(1) Initialize the dilation parameters a_j, translation parameters b_j, and node connection weights w_{jk}, u_{ij} to random values. The dilation parameters a_j are limited to the interval (1, 3), and the translation parameters b_j are limited to the interval (1, 2). Randomly initialize the particle positions X_i ∈ D in R. Randomly initialize the particle velocities with 0 ≤ v_i ≤ v_max. Let k indicate the iteration count; set the constants k_max, c_1, c_2, V_max, w, d. Set i = 1.
(2) Input the learning samples X_n(t) and the corresponding output values O_n to the WNN, where t varies from 1 to L, representing the number of input nodes, and n represents the nth data sample of the training set.
(3) Initialize a population (array) of particles with random positions and velocities in the d-dimensional problem space.
(4) For each particle, evaluate the desired optimization fitness function in d variables.
(5) Compare each particle's fitness evaluation with its pbest. If the current value is better than pbest, then set the pbest value equal to the current value and the pbest location equal to the current location in d-dimensional space.
(6) Compare the fitness evaluation with the population's overall previous best. If the current value is better than gbest, then reset gbest to the current particle's array index and value.
(7) For every particle, define a neighborhood and determine the local best fitness value (lbest) and its coordinates.
(8) If pbest(i) ≠ gbest, then all particles are manipulated according to Eq. (7); go to step (10).
(9) If pbest(i) = gbest, then particle i's position x_i(t + 1) is optimized using the gradient descent algorithm, and the other particles are manipulated according to Eq. (7).
(10) (Check the stop criterion.) Repeat from step (4) until a criterion is met, usually a sufficiently good fitness or a maximum number of iterations/epochs.
(11) Save the training parameters; the whole training of the WNN is then completed.

As a result, the network model in this paper is constructed by using HGDPSO as the training algorithm and the Morlet mother wavelet as the node activation function.

6. Experiments

The practical task of fault diagnosis is to realize a mapping between symptoms and faults, and this mapping can be implemented by function approximation. There are two ways to solve such approximation problems. One is the parametric approach, that is, to obtain answers via an analytical expression. The other is the non-parametric approach; neural networks are an efficient way to turn this idea into reality. As a test of the HGDPSO based WNN described above, we simulated a few examples with different methods for comparison.
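Before turning to the experiments, the training loop summarized in steps (1)–(11) of Section 5 can be sketched as follows; fitness, cfa_step, and gd_refine are illustrative placeholders standing in for the MSE of Eq. (8), the CFA update of Eq. (7), and a few gradient descent iterations of Eqs. (18)–(25), respectively.

import numpy as np

def hgdpso_train(particles, velocities, fitness, cfa_step, gd_refine,
                 max_iter=10000, target_error=0.001):
    """Hybrid CFA-PSO + gradient descent (HGDPSO) training loop, sketched."""
    pbest = [p.copy() for p in particles]
    pbest_fit = [fitness(p) for p in particles]
    g = int(np.argmin(pbest_fit))                  # index of gbest

    for _ in range(max_iter):
        for i in range(len(particles)):
            if i == g:
                # pbest(i) == gbest: refine this particle by gradient descent
                particles[i] = gd_refine(particles[i])
            else:
                # all other particles follow the CFA update of Eq. (7)
                particles[i], velocities[i] = cfa_step(
                    particles[i], velocities[i], pbest[i], pbest[g])
            f = fitness(particles[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = particles[i].copy(), f
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] <= target_error:           # stop criterion, step (10)
            break
    return pbest[g]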
Fig. 3. The architecture of CBM system.
Fig. 4. The sketch of the experimental rotor-bearing system.
6.1. Case 1. Experiments for fault diagnosis of a rotating system

Here the work uses a CBM (condition based maintenance) system that we have developed to test the WNN method for fault diagnosis of rotating machinery. This system is designed in a modular fashion (see Fig. 3); it contains a data acquisition module, a signal processing module, a feature extraction module, an intelligent diagnosis module, a training module, a prognostics module, a user module, etc.

6.1.1. Data acquisition

Experiments were performed on a machinery fault simulator, which can simulate the most common faults, such as misalignment, unbalance, resonance, radial rubbing, oil whirling, and so on. The schematic of the test apparatus is shown in Fig. 4. It mainly consists of a motor, a coupling, bearings, discs, and a shaft. In this system, the rotor is driven by an electromotor, and the bearing is a journal bearing.

The fault samples are obtained by simulating the corresponding faults on the experimental rotating system. For example, adjusting the height and angle of the simulator plane simulates the misalignment fault, and adding an unbalance weight to the disc in the normal condition creates the unbalance fault. A radial acceleration signal was picked up from an accelerometer located at the top of the right bearing housing. The shaft speed was obtained with a laser speedometer. The acceleration, velocity, or displacement measurements from the rotating equipment are acquired by an NI digital signal acquisition module and then collected in an embedded controller. A total of six conditions were tested: resonance, stable condition after resonance, bearing housing looseness, misalignment, oil whirling, and unbalance. Each condition was measured a given number of times continuously. The sampling frequency of the signal is 5000 Hz and the number of sampled data points is 1024.

6.1.2. Feature extraction

The features of the vibration signals are extracted with the fast Fourier transform (FFT); the frequency spectrum of the data is computed and fed into the training stage, in which 6 faults and 7 frequency bands are selected to form a feature vector. These feature vectors are used as the input and output of the BP network or WNN. Figure 5 shows the user interface.

Fig. 5. User interface.

6.1.3. Fault diagnosis and analysis

6.1.3.1. Experimental parameters. According to the theory [2,4], the number of nodes in the hidden layer of the network is equal to the number of wavelet bases. If the number is too small, the WNN may not be able to reflect the complex functional relationship between the input data and the output values. On the contrary, too large a number may create such a complex network that it leads to a very large output error caused by over-fitting of the training samples. The network architecture used for fault diagnosis consists of 7 inputs corresponding to the 7 different ranges of the frequency spectrum of a fault signal (Tables 1 and 2), 10 hidden nodes, and 6 outputs corresponding to 6 respective faults, namely unbalance, misalignment, oil whirling, oil oscillating, radial rubbing, and twin looseness. In the experiment, 400 groups of feature data are acquired on the machinery fault simulator; 100 groups of the data are used as the training set, and the remaining 300 groups are used as the diagnosis set. Feature vectors are used as the input and output of the BP network or WNN. After all possible normal operating modes of the faults are learned, the system enters the fault diagnosis stage, in which the machinery vibration data are obtained and subjected to the preprocessing and feature extraction methods described in the signal processing and feature extraction stages.

6.1.3.2. Selection of parameters for training algorithms. The selection of optimal parameters plays an important role in the various training algorithms. A single parameter choice can have a tremendous effect on the rate of convergence. For this paper, the optimal parameters are determined by trial-and-error experimentation.
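As an illustration of the feature extraction stage of Section 6.1.2, the relative-frequency band energies of Table 1 might be computed roughly as follows; the rotating frequency f0, the band edges, and the normalization are assumptions made for the sketch, not values reported in the paper.

import numpy as np

def band_features(signal, fs=5000.0, f0=25.0):
    """Spectral energy in 7 bands expressed in multiples of the rotating
    frequency f0 (illustrative edges), normalized to [0, 1]."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs) / f0   # in units of f0
    edges = [(0.0, 0.40), (0.40, 0.50), (0.51, 0.99), (0.99, 1.01),
             (1.99, 2.01), (3.0, 5.0), (5.0, np.inf)]
    e = np.array([spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in edges])
    return e / e.max()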
Table 1
The sets of training samples

Fault class        <0.40ω0   0.4–0.5ω0   0.51–0.99ω0   1ω0       2ω0       3–5ω0     >5ω0
Unbalance          0.00000   0.00000     0.00000       1.00000   0.00420   0.00350   0.00000
Misalignment       0.00000   0.00000     0.00000       0.90000   1.00000   0.00100   0.00000
Oil whirling       0.00000   0.71000     0.00000       1.00000   0.02001   0.00660   0.00000
Oil oscillating    0.00000   0.96520     0.00000       1.00000   0.01000   0.00500   0.00000
Radial rubbing     0.21000   0.18000     0.16520       1.00000   0.17650   0.19000   0.20000
Twin looseness     0.00000   0.00000     0.00000       0.20000   0.15000   0.43620   0.25330
Table 2
Desired output of the networks

Fault class        1         2         3         4         5         6
Unbalance          1.00000   0.00000   0.00000   0.00000   0.00000   0.00000
Misalignment       0.00000   1.00000   0.00000   0.00000   0.00000   0.00000
Oil whirling       0.00000   0.00000   1.00000   0.00000   0.00000   0.00000
Oil oscillating    0.00000   0.00000   0.00000   1.00000   0.00000   0.00000
Radial rubbing     0.00000   0.00000   0.00000   0.00000   1.00000   0.00000
Twin looseness     0.00000   0.00000   0.00000   0.00000   0.00000   1.00000
In HGDPSO, a particle X is a set of WNN parameters denoted as {W^1, W^2, B, A}. The domains of the weights w_{jk}, the dilations a_j, and the translations b_j are different, so this work divides X into X_1, X_2, and X_3, where X_1 = {W^1, W^2}, X_2 = {B}, and X_3 = {A}; the rate of position change (velocity) V for particle X is accordingly divided into V_1, V_2, and V_3, where V_1 = {V_i^{w1}, V_i^{w2}}, V_2 = {V_i^b}, and V_3 = {V_i^a}. The search space available for X_1 is defined as (−100, 100), for X_2 as (0, 100), and for X_3 as (1, 100); that is, if x_d > X_max, then x_d = X_max and v_d = −α v_d, where α is between 0 and 1, and similarly for X_min (a code sketch of this rule is given at the end of this subsection). Thus the particles cannot move out of this range in any dimension. The other settings of the HGDPSO algorithm are as follows: the swarm size was set to 50; c_1 and c_2 were set to 2 and 2.1, respectively, so that ϕ = c_1 + c_2 = 4.1; with κ = 1 the constriction coefficient χ is thus 0.729 according to Eq. (7). The maximum number of epochs was limited to 10,000.

In the CFA of PSO, the population size and initial individuals are the same as those used in HGDPSO. For a fair comparison, the parameters c_1, c_2, ϕ, and χ are also the same as those used in HGDPSO, and the evolution is processed for 10,000 iterations. In APSO [48], the population size and neighborhood size are equal to 20 and 3, respectively; the coefficient ϕ is 4.1 and the constriction coefficient χ is thus 0.729. The stop criterion is 3000 function evaluations for APSO. In CPSO [16], the population size and initial individuals are the same as those used in HGDPSO; for a fair comparison, the parameters c_1, c_2, ϕ, and χ are also the same as those used in HGDPSO, and the evolution is processed for 10,000 iterations. The GD algorithm has two parameters to be set: the learning rate and the momentum. The maximum epoch is set to 10,000.

6.1.3.3. Network training and determination. In this experiment, two types of neural networks, namely wavelet neural networks and BP networks, are trained for fault classification using various training algorithms. In the first series of experiments, we test the performance of the WNN against that of the BP network. In the second series of experiments, we test the performance of the various training algorithms for the WNN. In all experiments, a swarm size of 20 was used. Each experiment was run 50 times for the given number of iterations, and the results were averaged to account for stochastic differences. Instead of counting the number of iterations of the outer loop, the number of times that the objective function is evaluated is counted [18]. For PSO-based perceptrons, this "evaluation time" is equivalent to the product of the population size and the number of evolved generations. In order to compare the performance of the PSO- and BP-based perceptrons, the training histories are recorded with the same number of fitness evaluations F for both perceptrons.
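The reflect-and-clamp boundary rule mentioned above (a coordinate forced back to the range limit while its velocity is reversed and damped by α) can be sketched as follows; the helper name is illustrative.

import numpy as np

def clamp_reflect(x, v, x_min, x_max, alpha=0.5):
    """Keep coordinates inside [x_min, x_max]; reflect and damp the velocity
    on dimensions that hit a bound (alpha in (0, 1) is illustrative)."""
    out_of_range = (x > x_max) | (x < x_min)
    x = np.clip(x, x_min, x_max)
    v = np.where(out_of_range, -alpha * v, v)
    return x, v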
Fig. 6. Convergence curves using GD algorithm for WNN (GWNN) and BP (GBP).
Fig. 7. Convergence curves using HGDPSO with lbest version for WNN(LGPWNN) and BP(LGPBP) network.
Table 3
Comparison of the WNN and BP methods

Method     Diagnosis accuracy (%)   Sum error   Evaluation time (F)
GBP        90.33                    0.001       8900
GWNN       95.33                    0.001       5550
APWNN      94.67                    0.001       7920
LGPBP      93.67                    0.001       6600
LGPWNN     96.67                    0.001       5120
PWNN       89.33                    0.005       60,000
GGPWNN     96.33                    0.001       5020
CPWNN      95.33                    0.001       7800
For a standard PSO algorithm,

F = S × I,   (26)

where F is the number of times the objective function is evaluated, S is the swarm size, and I is the number of iterations that the algorithm is run. For the HGDPSO algorithm,

F = S_1 × I_1 × K_1 + S_2 × I_2 × K_2,   (27)
where F is the number of times the objective function is evaluated, S_1 and S_2 are the swarm sizes used by the PSO and GD parts, I_1 and I_2 are the numbers of iterations for which the PSO and GD parts are run, and K_1 and K_2 are the corresponding evaluation factors of the PSO and GD parts.

(1) Comparison of the WNN and the BP network. For comparison with the WNN, the same 300 groups of data are processed with a 7-10-6 BP network; the expected output error threshold is 0.001, the MSE function is the same as Eq. (8), and the other parameters are the same as for the WNN. Training processes terminate at the given number of fitness evaluations. Figures 6 and 7 show the training histories and the performance of the WNN and BP networks using GD and HGDPSO, respectively. From the shapes of the curves in Fig. 6 and the values in Table 3, it is easy to see that the WNN trained with the GD algorithm converges more quickly than the BP network trained with GD. As seen in Fig. 7 and in Table 3, the simulation time of the WNN trained with the lbest version of the HGDPSO algorithm is clearly smaller than that of the BP network trained with the lbest version of HGDPSO.

Table 3 shows the final values reached by each algorithm after 10,000 iterations were performed; it gives the results of a convergence test. Here, a training session is said to converge when it fails to improve for 10,000 consecutive epochs. The second column of the table lists the diagnosis accuracy on the 300 actual sample data. The WNN trained
Fig. 8. Convergence curves using CPSO (CPWNN), gbest version GDPSO (GGPWNN), and lbest version HGDPSO (LGPWNN) for training WNN.
Fig. 9. Convergence curves using PSO (PWNN), GD (GWNN), APSO (APWNN), and gbest version HGDPSO (GGPWNN) for training WNN.
Fig. 10. Convergence curves using APSO (APWNN), and lbest version GDPSO (LGPWNN) for training WNN.
with the GD algorithm achieves a higher diagnosis accuracy on the tested set (95.33%) than the BP network trained with GD (90.33%). The WNN trained with the lbest version of the HGDPSO algorithm achieves a higher diagnosis accuracy on the tested set (96.67%) than the BP network trained with the lbest version of HGDPSO (93.67%). The test results confirm that, in both cases, the proposed WNN has a better generalization capability than the BP methods. The third column of the table lists the average number of error function evaluations (WNN and BP networks) used during training, until training terminated with the network converged or the number of epochs exceeding the maximum. Table 3 shows that the HGDPSO based WNN method outperformed all the other architectures. As can be seen in Table 3, the WNN architecture is decidedly superior, yielding errors that are smaller than those of the BP architecture. Another benefit of the WNN approach is the reduced training time. From the comparison of the training histories of the WNN and BP networks trained with the various methods, it can be seen that, for the given errors, the WNN presents better convergence performance than the BP networks. Considering the classification accuracy, the results produced by the WNN model are closer to the original data than those produced by the BP networks. In general, the WNN generates more accurate classifications and presents better performance than the BP network does.

(2) Comparison of HGDPSO with the GD, APSO, CPSO, and PSO algorithms. To show the effectiveness and efficiency of HGDPSO, which integrates the GD algorithm with the CFA of PSO, WNNs designed by the GD algorithm, APSO [48], CPSO [16], and the CFA of PSO are also applied to the same fault diagnosis problem. In training the WNN, the main goal is to obtain a set of weights w, u, dilations a, and translations b that minimize the mean squared error (MSE). As shown in Figs. 8–10 and Table 3, the MSE error of 0.001 was reached by the lbest version of the HGDPSO algorithm (LGPWNN) after an average of 5120 evaluations in the training phase; the mean squared error reached the same level after 5550 evaluations with the GD algorithm (GWNN), after 7800 evaluations with the CPSO algorithm (CPWNN), and after 7920 evaluations with the APSO algorithm (APWNN). Meanwhile, after 60,000 evaluations, the mean squared error reached by the PSO technique (PWNN) was still not less than 0.005. From the shapes of the curves in Fig. 9, it is easy to see that the PSO converges quickly in the early training phase but slows down when approaching the optima. The PSO lacks sufficient search ability at the end of a run, so that a method is required to jump out of local minima in some cases. The performance of HGDPSO, which combines PSO with the GD algorithm, is greatly improved and gives better results than both the PSO and GD algorithms.

In the testing phase, the lbest version of the HGDPSO trained WNN (LGPWNN) was able to discriminate the fault examples with an accuracy higher than 96% (see Table 3), compared to GD's 95.33%, CPSO's (CPWNN) 95.33%, and APSO's (APWNN) 94.67%, while the PSO trained WNN (PWNN) achieved an accuracy of less than 90%. From Table 3, we can see that the classification error of HGDPSO with the lbest version was the smallest. For the training problem, the
Fig. 11. Convergence curves using gbest version GPSO (GGPWNN), and lbest version GPSO (LGPWNN) for training WNN.
Fig. 12. The structure of HGDPSO based wavelet neural network for machine health assessment.
performance of the HGDPSO algorithm with the lbest version was better than that of the GD, APSO, CPSO, and PSO algorithms. Figure 11 and Table 3 show that the lbest version of the HGDPSO algorithm converges somewhat more slowly than the gbest version, but it is less likely to become trapped in an inferior local minimum. The results clearly show that the HGDPSO is a stronger optimizer than the CFA of PSO, the APSO, the CPSO, and the implemented GD algorithm for WNN training. The HGDPSO represents a clear and substantial improvement over the CFA of PSO, APSO, and the GD algorithm, not only in the final solutions but also in the speed with which they are found. The purpose of this work is to demonstrate a novel approach, HGDPSO, that substantially improves the basic algorithm. The comparison and potential combination of HGDPSO with other PSO improvements is part of ongoing research and will be a subject of future work.

6.2. Case 2. Intelligent system for machine health assessment

Figure 12 shows the intelligent system we developed. It consists of three parts: (a) data acquisition and preprocessing, (b) feature extraction, and (c) classification using a HGDPSO based wavelet neural network.
Fig. 13. Convergence curves using gbest version GDPSO (GGPWNN) and lbest version GDPSO (LGPWNN) for training WNN.
The wavelet layer is responsible for feature extraction from the vibration signals. The feature extraction process has two stages.

(1) Wavelet packet decomposition. For the wavelet packet decomposition of the vibration signals, the tree structure was used as a binary tree of depth m = 8. Wavelet packet decomposition was applied to the vibration signals using the Daubechies-1 wavelet packet filters with the Shannon entropy [49] as defined in Eq. (28), where s is the vibration signal and s_i are the coefficients of the wavelet packet decomposition of s. In this way 2^8 = 256 terminal node signals are obtained.

E(s) = -\sum_i s_i^2 \log s_i^2.   (28)
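A hedged sketch of this decomposition, using the PyWavelets package (assumed available), together with the Shannon entropy of Eq. (28) and the norm entropy of Eq. (29) introduced in the next stage, could look as follows; parameter values mirror those reported in the text.

import numpy as np
import pywt  # PyWavelets, assumed available

def wp_terminal_nodes(signal, depth=8):
    """Terminal-node coefficient arrays of a depth-8 Daubechies-1 wavelet
    packet decomposition (2**depth = 256 nodes)."""
    wp = pywt.WaveletPacket(data=signal, wavelet='db1', maxlevel=depth)
    return [node.data for node in wp.get_level(depth, order='freq')]

def shannon_entropy(s):
    # Eq. (28): E(s) = -sum_i s_i^2 log(s_i^2), ignoring zero coefficients
    s2 = np.asarray(s) ** 2
    s2 = s2[s2 > 0]
    return -np.sum(s2 * np.log(s2))

def norm_entropy(s, p=1.5, n=20):
    # Eq. (29): E(s) = sum_i |s_i|^p / N, with 1 <= p < 2
    return np.sum(np.abs(s) ** p) / n

# Feature vector: one norm entropy per terminal node (256 values per signal)
# features = np.array([norm_entropy(s) for s in wp_terminal_nodes(signal)])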
(2) Wavelet packet entropy. An entropy-based criterion describes information-related properties for an accurate representation of a given signal. Entropy is a common concept in many fields, mainly in signal processing [50]. A method for measuring the entropy appears to be an ideal tool for quantifying the ordering of non-stationary signals. We next calculated the norm entropy, as defined in Eq. (29), of the waveforms at the terminal node signals obtained from the wavelet packet decomposition:

E(s) = \frac{\sum_i |s_i|^P}{N},   (29)

where the wavelet packet entropy E is a real number, s is the terminal node signal, and s_i are the waveform values of the terminal node signal. In the norm entropy, P is the power and must satisfy 1 ≤ P < 2. During the HGDPSO based WNN learning process, the parameter P is updated together with the weights to minimize the error. The resulting entropy data were normalized with N = 20. Thus, the feature vector was extracted by computing the 256 wavelet packet entropy values per vibration signal. The HGDPSO based WNN learning layer then realizes the intelligent classification using the features from the wavelet layer.

We performed experiments using 300 samples taken from four fault vibration signals. Part of the fault vibration signal samples were used for training and another part for testing the HGDPSO based WNN. In these experiments, nearly 100% correct classification was obtained in the HGDPSO based WNN training among the fault vibration signal classes. This clearly indicates the effectiveness and reliability of the proposed approach for extracting features from fault vibration signals. The feature choice was motivated by the realization that the HGDPSO based WNN is essentially a representation of a signal at a variety of resolutions. In brief, the wavelet packet decomposition has been demonstrated to be an effective tool for extracting information from the fault vibration signals, and the proposed feature extraction method is robust against noise in the fault vibration signals. In this paper, the application of the wavelet packet entropy in the wavelet layer to adaptive feature extraction from the fault vibration signals was shown. The wavelet packet entropy proved to be a very useful feature for characterizing the vibration signals; furthermore, the information obtained from the wavelet packet entropy is related to the energy and consequently to the amplitude of the signal. This means that with
this method, new information can be accessed with an approach different from the traditional analysis of the amplitude of vibration signals.

Figure 13 shows the HGDPSO based WNN training performance. It can be seen that the HGDPSO based WNN achieves significantly better solutions. The most important aspects of the HGDPSO based intelligent system are the ability of the WNN to self-organize without requiring programming and the immediate response of a trained net during real-time applications. These features make the intelligent system suitable for automatic classification in the interpretation of fault vibration signals.

7. Conclusions

A HGDPSO-based wavelet neural network approach is developed for fault diagnosis. The approach uses a novel optimization algorithm, the HGDPSO algorithm, to train the wavelet neural network. The feasibility and effectiveness of this new approach are validated and illustrated by case studies of fault diagnosis on a rotating machine. The data measured on a machinery fault simulator by the data acquisition module are chosen as the original data to test the performance of the proposed model and to validate the numerical simulations. The numerical results show that the HGDPSO-based WNN has better training performance, a faster convergence rate, and a better diagnosis ability than the other selected methods.

References

[1] D.S. Chen, R.C. Jain, A robust back propagation learning algorithm for function approximation, IEEE Trans. Neural Networks 5 (3) (1994) 467–479.
[2] Q. Zhang, A. Benveniste, Wavelet networks, IEEE Trans. Neural Networks 3 (1992) 889–898.
[3] Y.C. Pati, P.S. Krishnaprasad, Analysis and synthesis of feedforward neural networks using affine wavelet, IEEE Trans. Neural Networks 4 (1) (1993) 73–75.
[4] J. Zhang, G.G. Walter, Y. Miao, W.N.W. Lee, Wavelet neural networks for function learning, IEEE Trans. Signal Process. 43 (1995) 1485–1497.
[5] B. Delyon, A. Juditsky, A. Benveniste, Accuracy analysis for wavelet approximations, IEEE Trans. Neural Networks 6 (1995) 332–348.
[6] H.L. Shashidhara, S. Lohani, V.M. Gadre, Function learning using wavelet neural networks, in: Proceedings of IEEE International Conference on Industrial Technology, vol. 2, 2000, pp. 335–340.
[7] Q. Zhang, Using wavelet network in nonparametric estimation, IEEE Trans. Neural Networks 8 (1997) 227–236.
[8] C.K. Chui, et al., Wavelets: Theory Algorithms and Application, Academic Press, San Diego, 1994.
[9] M.M. Johansen, Evolving neural networks for classification, in: Topics of Evolutionary Computation 2002, Collection of Student Reports, 2002, pp. 57–63.
[10] C. Zhang, H. Shao, Particle swarm optimisation in feedforward neural network, in: Artificial Neural Networks in Medicine and Biology (ANNIMAB), 2001, pp. 327–332.
[11] D.K. Agrafiotis, W.W. Cedeño, Feature selection for structure-activity correlation using binary particle swarms, J. Med. Chem. 45 (2002) 1098–1107.
[12] V.G. Gudisz, G.K. Venayagamoorthy, Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks, in: IEEE Swarm Intelligence Symposium 2003 (SIS 2003), Indianapolis, IN, 2003, pp. 110–117.
[13] Z. He, C. Wei, L. Yang, X. Gao, S. Yao, R. Eberhart, Y. Shi, Extracting rules from fuzzy neural network by particle swarm optimization, in: Proceedings of IEEE International Conference on Evolutionary Computation, Anchorage, AK, 1998, pp. 74–77.
[14] F. van den Bergh, A.P. Engelbrecht, Cooperative learning in neural networks using particle swarm optimizers, Proc. SAICSIT (2000) 84–90.
[15] K.E. Parsopoulos, M.N. Vrahatis, Modification of the particle swarm optimizer for locating all the global minima, in: V. Kurkova, N.C. Steele, R. Neruda, M. Karnay (Eds.), Artificial Neural Networks and Genetic Algorithm, Springer, 2001, pp. 324–327.
[16] B.B. Thompson, R.J.I. Marks, M.A. El-Sharkawi, W.J. Fox, R.T. Miyamoto, Inversion of neural network underwater acoustic model for estimation of bottom parameters using modified particle swarm optimizers, in: Proceedings of International Joint Conference on Neural Networks (IJCNN 2003), 2003, pp. 1301–1306.
[17] F. van den Bergh, A.P. Engelbrecht, Training product unit networks using cooperative particle swarm optimizers, in: Proceedings of IJCNN 2001, Washington, DC, 2001, pp. 126–131.
[18] F. van den Bergh, An analysis of particle swarm optimizers, Ph.D. dissertation, University of Pretoria, Pretoria, 2001.
[19] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proceedings of IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.
[20] J. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, Morgan Kaufmann, 2001.
[21] H. Yoshida, K. Kawata, Y. Fukuyama, S. Takayama, Y. Nakanishi, A particle swarm optimization for reactive power and voltage control considering voltage security assessment, IEEE Trans. Power Syst. 15 (4) (2000) 1232–1239.
[22] M.A. Abido, Optimal design of power system stabilizers using particle swarm optimization, IEEE Trans. Energy Convers. 17 (3) (2002) 406–413.
[23] M.A. Abido, Optimal power flow using particle swarm optimization, Int. J. Electr. Power Energy Syst. 24 (7) (2002) 563–571.
[24] R.C. Eberhart, X. Hu, Human tremor analysis using particle swarm optimization, in: Proceedings of Congress on Evolutionary Computation 1999, Washington, DC, IEEE Service Center, Piscataway, NJ, 1999, pp. 1927–1930.
[25] H. Yoshida, K. Kawata, Y. Fukuyama, S. Takayama, Y. Nakanishi, A particle swarm optimization for reactive power and voltage control considering voltage security assessment, IEEE Trans. Power Syst. 15 (4) (2000) 1232–1239.
[26] P.J. Angeline, Using selection to improve particle swarm optimization, in: Proceedings of IEEE International Conference on Evolutionary Computation, Anchorage, AK, May 4–9, 1998, pp. 84–89.
[27] G.C. Onwubolu, TRIBES application to the flow shop scheduling problem, in: New Optimization Techniques in Engineering, Springer, Heidelberg, 2004, pp. 517–536.
[28] I. Daubechies, The wavelet transform, time-frequency localization, and signal analysis, IEEE Trans. Inform. Theory 36 (1990) 961–1005.
[29] K.E. Parsopoulos, M.N. Vrahatis, Recent approaches to global optimization problems through particle swarm optimization, Natural Comput. 1 (2002) 235–306.
[30] Y. Shi, R. Eberhart, A modified particle swarm optimizer, in: Proceedings of IEEE International Conference on Evolutionary Computation, Anchorage, AK, 1998, pp. 69–73.
[31] M. Clerc, The swarm and queen: Towards a deterministic and adaptive particle swarm optimization, in: Proceedings of IEEE Congress on Evolutionary Computation, 1999, pp. 1951–1957.
[32] M. Clerc, J. Kennedy, The particle swarm: Explosion, stability, and convergence in a multi-dimensional complex space, IEEE Trans. Evolut. Comput. 6 (1) (2002) 58–73.
[33] R. Eberhart, Y. Shi, Comparing inertia weights and constriction factors in particle swarm optimization, in: Proceedings of Congress on Evolutionary Computation (CEC 2000), 2000, pp. 84–88.
[34] J. Kennedy, Small worlds and mega-minds: Effects of neighborhood topology on particle swarm performance, in: Proceedings of IEEE Congress on Evolutionary Computation, Washington, DC, 1999, pp. 1931–1938.
[35] J. Kennedy, R. Mendes, Population structure and particle swarm performance, in: Proceedings of IEEE Congress on Evolutionary Computation (CEC 2002), vol. 2, Honolulu, HI, 2002, pp. 1671–1676.
[36] J. Kennedy, Stereotyping: Improving particle swarm performance with cluster analysis, in: Proceedings of 2000 Congress on Evolutionary Computation, IEEE Service Center, Piscataway, NJ, 2000, pp. 1507–1512.
[37] E.S. Peer, F. van den Bergh, A.P. Engelbrecht, Using neighbourhoods with the guaranteed convergence PSO, in: Proceedings of IEEE Swarm Intelligence Symposium (SIS), Indianapolis, IN, 2003, pp. 235–242.
[38] Y. Shi, R.C. Eberhart, Empirical study of particle swarm optimization, in: Proceedings of IEEE International Congress on Evolutionary Computation, 1999, pp. 101–106.
[39] P.N. Suganthan, Particle swarm optimiser with neighbourhood operator, in: IEEE Congress of Evolutionary Computation, vol. 3, 1999, pp. 1958–1962.
[40] F. van den Bergh, A. Engelbrecht, A new locally convergent particle swarm optimizer, in: Proceedings of IEEE Conference on Systems, Man and Cybernetics, Hammamet, Tunisia, 2002, pp. 96–101.
[41] S. Kaniel, Estimates for some computational techniques in linear algebra, Math. Comput. 20 (1996) 369–378.
[42] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, Nature 323 (1986) 533–536.
[43] M. Misiti, Y. Misiti, et al., Wavelet Toolbox User's Guide, The Math Works Inc., 1996.
[44] E. Ozcan, C.K. Mohan, Analysis of a simple particle swarm optimization system, Intell. Eng. Syst. Artificial Neural Networks (1998) 253–258.
[45] E. Ozcan, C.K. Mohan, Particle swarm optimization: Surfing the waves, in: Proceedings of Congress on Evolutionary Computation, 1999, pp. 1939–1944.
[46] F. Solis, R. Wets, Minimization by random search techniques, Math. Oper. Research 6 (1981) 19–30.
[47] Z. Cui, J. Zeng, A guaranteed global convergence particle swarm optimizer, Rough Sets Current Trends Comput. (2004) 762–767.
[48] W. Zhang, Y. Liu, M. Clerc, An adaptive PSO algorithm for reactive power optimization, in: Proceedings of Advances in Power System Control Operation and Management (APSCOM), Hong Kong, 2003, pp. 302–307.
[49] R.K.H. Galvao, F. Yoneyama, A neural classifier employing biased wavelets, in: Proceedings of Fifth Brazilian Symposium on Neural Networks, 1998, pp. 106–111.
[50] A. Bakhtazad, A. Palazoglu, J.A. Romagnoli, Process data de-noising using wavelet transform, Intell. Data Anal. 3 (1999) 267–285.
Qian-jin Guo is currently pursuing the Ph.D. degree in mechatronic engineering at the Shenyang Institute of Automation, Chinese Academy of Sciences. His research interests are in the areas of signal processing, machinery surveillance, and diagnostics.

Hai-bin Yu holds a Ph.D. degree in automatic control from Northeastern University, Shenyang, China. He is a Professor at the Shenyang Institute of Automation, Chinese Academy of Sciences. His research has involved wireless sensor networks, networked control systems, and networked manufacturing.

Ai-dong Xu is a Professor at the Shenyang Institute of Automation, Chinese Academy of Sciences. His research has involved wireless sensor networks and industrial communication.