Mechanical Systems and Signal Processing 21 (2007) 809–823
An adaptive predictor for dynamic system forecasting

Wilson Wang
Department of Mechanical Engineering, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, Canada P7B 5E1

Received 23 July 2005; received in revised form 11 December 2005; accepted 14 December 2005. Available online 8 February 2006.
Abstract

A reliable and real-time predictor is very useful to a wide array of industries to forecast the behaviour of dynamic systems. In this paper, an adaptive predictor is developed based on the neuro-fuzzy approach to dynamic system forecasting. An adaptive training technique is proposed to improve forecasting performance, accommodate different operation conditions, and prevent possible trapping due to local minima. The viability of the developed predictor is evaluated by using both gear system condition monitoring and material fatigue testing. The investigation results show that the developed adaptive predictor is a reliable and robust forecasting tool. It can capture the system's dynamic behaviour quickly and track the system's characteristics accurately. Its performance is superior to other classical forecasting schemes.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Neuro-fuzzy forecasting scheme; Adaptive training; Machinery condition monitoring; Model uncertainty; Fatigue testing
1. Introduction

A reliable predictor is very useful to a wide range of industries to forecast the upcoming states of a dynamic system. In mechanical systems, for example, the forecasting information can be used for: (1) condition monitoring, to provide an accurate alarm before a fault reaches critical levels so as to prevent machinery performance degradation, malfunction, or catastrophic failure; (2) scheduling of repairs and predictive/preventive maintenance in manufacturing facilities; and (3) predictive and fault-tolerant control. System state forecasting utilises available observations to predict the future states of a dynamic system. The observations can be patterns from such information carriers as temperature, acoustic signals, or vibration. The vibration-based approach, however, is the most commonly used technique because of the ease of measurement and analysis; it is therefore also used in this study. Time-series forecasting can be performed for one-step or multiple-step predictions. The more steps ahead, the less reliable the forecast becomes, because multiple-step prediction is built up from repeated one-step operations. This research therefore focuses on one-step forecasting operations.

Several techniques have been proposed in the literature for time-series prediction. The classical approaches are stochastic models [1] and dynamics-based techniques [2,3]; these methods are usually easy to implement but have difficulty forecasting the behaviour of complex dynamic systems.
Fig. 1. The architecture of the forecasting tool based on the adaptive predictor (blocks: dynamic system, data collection, signal processing, post-processing, predictor, database, training).
Recent interest in time-series forecasting has focused on the use of flexible models such as neural networks (NNs). After being properly trained, NNs can represent the non-linear relationship between the previous states and the future states of a dynamic system [4]. NN-based predictors have two typical network architectures, feedforward and recurrent networks, both of which have been employed in several applications [5,6]. Further investigation has indicated that recurrent network predictors perform better than those based on feedforward networks [7]. NN forecasting schemes, however, have some disadvantages: their forecasting operation is opaque to users, and the convergence of training is usually slow and not guaranteed. To address these problems, synergetic schemes based on NNs and fuzzy logic have been adopted in time-series forecasting [8]. In such synergetic schemes, fuzzy logic provides the NNs with a structural framework of high-level if-then rule-based thinking and reasoning, whereas the NNs provide the fuzzy systems with learning capability [9,10]. Jang et al. [11] proposed a neuro-fuzzy (NF) scheme for time-series forecasting. By simulation, they found that the NF predictor performed better than both the stochastic models and the feedforward NNs. The author and his research team developed an NF prognostic system for machinery condition applications [12]. Their investigation indicated that if an NF predictor is properly trained, it performs even better than both the feedforward and the recurrent network forecasting schemes.

Even though NF predictors have demonstrated properties superior to other classical time-series forecasting schemes, further research is needed in a few aspects before they can be applied to general real-time industrial applications [13]: (1) improving their application robustness to accommodate different system conditions; (2) mitigating the requirements for representative data sets; and (3) improving their convergence properties, especially for complex operation applications. Consequently, the aim of this paper is to develop an adaptive predictor to solve these problems in order to provide industries with a more reliable and real-time forecasting tool.

Fig. 1 schematically shows the architecture of the forecasting tool based on the proposed adaptive predictor. Signals are collected using the corresponding sensors. After being properly filtered and sampled, the signals are fed into a computer. In further processing, the first step is to generate representative features from the collected signals by applying different signal processing techniques. Post-processing is then performed to enhance the feature characteristics and derive monitoring indices for forecasting operations. All the involved signal processing techniques and forecasting schemes are coded in MATLAB and then translated to a C++ environment for general application purposes.

This presentation starts with a description of the adaptive predictor and the corresponding adaptive training technique. Next, the predictor is implemented for real-time monitoring applications. The viability of this new predictor is verified by online experimental tests related to gear condition monitoring and material fatigue testing.

2. The adaptive NF predictor

In the proposed adaptive predictor, the forecasting reasoning is performed by fuzzy logic, whereas the fuzzy system parameters are trained by using NNs.
To make it compatible with those in the author's prior work [12], four input variables $\{x_{-3r},\, x_{-2r},\, x_{-r},\, x_0\}$ are utilised in this case, where $x_0$ represents the current state of the dynamic system and $r$ denotes the time step. If two membership functions (MFs), small and large, are assigned to each input variable, then $2^4 = 16$ rules will be formulated to predict the future state of a dynamic system, $x_{+r}$:

$$ R^j:\ \text{IF } x_{-3r} \text{ is } M_3^j,\ x_{-2r} \text{ is } M_2^j,\ x_{-r} \text{ is } M_1^j, \text{ and } x_0 \text{ is } M_0^j, \text{ THEN } x_{+r}^j = c_0^j x_0 + c_1^j x_{-r} + c_2^j x_{-2r} + c_3^j x_{-3r} + c_4^j, \qquad (1) $$

where $M_i^j$ denote MFs and $c_i^j$ are constants; $i = 0, 1, \ldots, 3$, $j = 1, 2, \ldots, 16$.
Fig. 2. The network architecture of the adaptive predictor.
The network architecture of this adaptive predictor is schematically shown in Fig. 2. It is a 5-layer network in which each node performs a particular activation function on the incoming signals. The links, with unity weights, represent the flow direction of signals between nodes. The nodes in layer 1 transmit the input signals to the next layer. Each node in layer 2 acts as an MF, which is either a single node that performs a simple activation function or multilayer nodes that perform a complex function. Different from the general NF predictor proposed in [12], this adaptive predictor has a feedback link to each node in layer 2 to improve the forecasting accuracy and convergence property (stability). The signal in each feedback link represents the node output at the previous time step (i.e., at $n-1$). For example, if sigmoid MFs with parameters $\{a_i^j,\, b_i^j\}$ are utilised in this case, then

$$ \mu_{M_i^j}\big(x_{-ir}^{(n)}\big) = \frac{1}{1 + \exp\!\big[-a_i^j (X_i - b_i^j)\big]}, \qquad (2) $$

where

$$ X_i = x_{-ir}^{(n)} + \mu_{M_i^j}\big(x_{-ir}^{(n-1)}\big) = x_{-ir}^{(n)} + \frac{1}{1 + \exp\!\big[-a_i^j \big(x_{-ir}^{(n-1)} - b_i^j\big)\big]}, \qquad (3) $$

and $\mu_{M_i^j}\big(x_{-ir}^{(n)}\big)$ and $\mu_{M_i^j}\big(x_{-ir}^{(n-1)}\big)$ represent the MF values at the current and previous time steps, respectively.
Each node in layer 3 performs a fuzzy T-norm operation. If a max-product operator is applied in this case, the rule firing strength is

$$ \mu_j = \prod_{i=0}^{3} \mu_{M_i^j}(x_{-ir}), \qquad j = 1, 2, \ldots, 16. \qquad (4) $$
All the rule firing strengths are normalised in layer 4. After a linear combination of the input variables in layer 5, the predicted output $x_{+r}$ is obtained by using centroid defuzzification:

$$ x_{+r} = \sum_{j=1}^{16} \bar{\mu}_j \big( c_0^j x_0 + c_1^j x_{-r} + c_2^j x_{-2r} + c_3^j x_{-3r} + c_4^j \big), \qquad (5) $$

where $\bar{\mu}_j = \mu_j \big/ \sum_{j=1}^{16} \mu_j$ denotes the normalised rule firing strength.
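For readers who prefer a computational view, the inference pass of Eqs. (2)-(5) can be sketched as follows. This is a minimal illustration in Python/NumPy (the paper's own implementation is in MATLAB and C++); the function name, array shapes, and the convention x[i] = x_{-ir} are illustrative assumptions, not the author's code.

```python
import numpy as np

def nf_forward(x, a, b, c, mu_prev):
    """One inference pass of the adaptive NF predictor (Eqs. (2)-(5)).

    x       : inputs ordered so that x[i] = x_{-i*r}, i.e. x[0] is the current state x_0
    a, b    : sigmoid MF parameters a_i^j, b_i^j, arrays of shape (16, 4)
    c       : consequent parameters c_0^j..c_4^j, array of shape (16, 5)
    mu_prev : layer-2 MF outputs from the previous time step, shape (16, 4)
    Returns the one-step prediction x_{+r} and the current MF outputs.
    """
    # Eq. (3): the feedback link adds the previous MF value to the current input
    X = x[None, :] + mu_prev
    # Eq. (2): sigmoid membership grades
    mu_mf = 1.0 / (1.0 + np.exp(-a * (X - b)))
    # Eq. (4): rule firing strengths (product over the four antecedents)
    mu = np.prod(mu_mf, axis=1)
    # Eq. (5): normalised firing strengths and linear consequents
    mu_bar = mu / mu.sum()
    consequents = c[:, :4] @ x + c[:, 4]   # c_0^j*x_0 + c_1^j*x_-r + ... + c_4^j
    return float(mu_bar @ consequents), mu_mf
```

At each step the returned MF outputs would be stored and fed back as mu_prev at the next step, mirroring the feedback links in layer 2.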
The input variables as well as the forecasted output $x_{+r}$ at each time step are stored in the training database. The predictor, as represented in Eq. (1), can also be adopted for variable-step forecasting operations. If the time interval is $kr$, where $k$ is an integer, the input variables to the predictor become $\{x_{-3kr},\, x_{-2kr},\, x_{-kr},\, x_0\}$. The predicted state $x_{+kr}$ is equivalent to a multiple-step forecasting operation, and the corresponding forecasting rules will be represented by

$$ R^j:\ \text{IF } x_{-3kr} \text{ is } M_3^j,\ x_{-2kr} \text{ is } M_2^j,\ x_{-kr} \text{ is } M_1^j, \text{ and } x_0 \text{ is } M_0^j, \text{ THEN } x_{+kr}^j = C_j, \qquad (6) $$

where $C_j = c_0^j x_0 + c_1^j x_{-kr} + c_2^j x_{-2kr} + c_3^j x_{-3kr} + c_4^j$; $i = 0, 1, \ldots, 3$, $j = 1, 2, \ldots, 16$. $M_i^j$ and $c_i^j$ may differ from those in Eq. (1), and can be obtained by using training data recalibration and adaptive training as discussed in the next section.

3. The adaptive training technique

3.1. The training algorithm

Before being applied to real-time applications, a forecasting scheme should be properly trained using representative data sets, which should cover all the possible application conditions [14,15]. Such a requirement is usually difficult to achieve in real-world applications because most machinery operates in a noisy and uncertain environment. Furthermore, the classical forecasting schemes are mainly used for time-invariant systems or systems with slowly varying model parameters, whereas machinery dynamic characteristics may change suddenly, for example just after repair or regular maintenance [16]. Therefore, a real-time predictor should have sufficient adaptive capability to accommodate different operation conditions [17]. In this section, an adaptive training technique is adopted to address these problems.

According to [12], an NF forecasting scheme can be properly trained as long as the number of representative data pairs is more than five times the number of parameters to be updated. The developed NF predictor, as represented in Eq. (1), has 96 unknown parameters (16 MF parameters and 80 consequent parameters). Therefore, to improve the training efficiency and facilitate database management, the size of the training data set in this adaptive predictor is limited to $N = 500$. The data sets in the training database are represented as

$$ T_d = \begin{bmatrix} x_{-3r}^{(1)} & x_{-2r}^{(1)} & x_{-r}^{(1)} & x_{0}^{(1)} & x_{+r}^{(1)} \\ x_{-3r}^{(2)} & x_{-2r}^{(2)} & x_{-r}^{(2)} & x_{0}^{(2)} & x_{+r}^{(2)} \\ & & \vdots & & \\ x_{-3r}^{(N)} & x_{-2r}^{(N)} & x_{-r}^{(N)} & x_{0}^{(N)} & x_{+r}^{(N)} \end{bmatrix}. \qquad (7) $$

For the $n$th input data pair $\{x_{-3r}^{(n)},\, x_{-2r}^{(n)},\, x_{-r}^{(n)},\, x_{0}^{(n)}\}$, $n = 1, 2, \ldots, N$, the forecasted state $x_{+r}^{(n)}$ is computed by Eq. (5):

$$ x_{+r}^{(n)} = \sum_{j=1}^{16} \bar{\mu}_j \big( c_0^j x_{0}^{(n)} + c_1^j x_{-r}^{(n)} + c_2^j x_{-2r}^{(n)} + c_3^j x_{-3r}^{(n)} + c_4^j \big). \qquad (8) $$
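As an illustration of how the training database of Eq. (7) could be assembled from a monitoring-index series, the following sketch uses a simple sliding window; the function name and the fixed window size are assumptions for illustration only, not part of the paper.

```python
import numpy as np

def build_training_db(series, step=1, max_rows=500):
    """Assemble the training database T_d of Eq. (7) from a 1-D index series.

    Each row is [x_-3r, x_-2r, x_-r, x_0, x_+r], with samples taken 'step'
    points apart; only the most recent 'max_rows' rows are kept (N = 500).
    """
    x = np.asarray(series, dtype=float)
    rows = [x[k - 3*step: k + step + 1: step]        # 5 equally spaced samples
            for k in range(3*step, len(x) - step)]
    Td = np.array(rows)
    return Td[-max_rows:]                            # keep the newest N rows
```

Setting step = k > 1 would give the variable-step database used by the forecasting rules of Eq. (6).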
Different from other online training methods that use definite weight functions [8], this adaptive training technique uses an exponential gain function to highlight the contribution of the data sets from a time-variant series:

$$ g_n = \frac{1}{1 + \exp\!\big[-a\,(n/2 - b)\big]}, \qquad (9) $$

where $g_n \in [0, 1]$, $a = 5 \times 10^{-5} N$, $b = N/2$, and $N$ is the total number of training data pairs.
Eq. (9) is illustrated in Fig. 3. The recent data sets are given larger gain factors, whereas the old data pairs contribute less to training. All the gain factors are organised in a gain matrix G:

$$ G = \begin{bmatrix} g_1 & 0 & \cdots & 0 \\ 0 & g_2 & \cdots & 0 \\ \vdots & & \ddots & \\ 0 & 0 & \cdots & g_N \end{bmatrix}, \qquad (10) $$

where $g_N = 1$. If $d^{(n)}$ denotes the desired output value for the $n$th training data set (i.e., $d^{(n)} = x_0^{(n+1)}$), the forecasting error is defined as

$$ E = \sum_{n=1}^{N} E_n = \frac{1}{2} \sum_{n=1}^{N} g_n \big( x_{+r}^{(n)} - d^{(n)} \big)^2. \qquad (11) $$
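A small sketch of the gain weighting of Eqs. (9)-(11) is given below. Because the constants in Eq. (9) are difficult to recover exactly from the typeset original, the defaults here simply mirror the values stated above (a = 5e-5*N, b = N/2) and should be treated as assumptions.

```python
import numpy as np

def gain_vector(N=500, a=None, b=None):
    """Exponential gain factors g_n of Eq. (9); recent data pairs weigh more."""
    a = 5e-5 * N if a is None else a
    b = N / 2 if b is None else b
    n = np.arange(1, N + 1)
    return 1.0 / (1.0 + np.exp(-a * (n / 2 - b)))

def weighted_error(x_pred, d, g):
    """Gain-weighted squared forecasting error E of Eq. (11)."""
    x_pred, d = np.asarray(x_pred), np.asarray(d)
    return 0.5 * np.sum(g * (x_pred - d) ** 2)

# The gain matrix G of Eq. (10) is simply np.diag(gain_vector(N)).
```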
If sigmoid MFs with parameters $\{a_i^j,\, b_i^j\}$ are applied, the fuzzy MF parameters are updated by using the gradient method:

$$ a_i^j(m+1) = a_i^j(m) - \eta_a \sum_{n=1}^{N} \frac{\partial E_n}{\partial a_i^j}, \qquad (12) $$

$$ b_i^j(m+1) = b_i^j(m) - \eta_b \sum_{n=1}^{N} \frac{\partial E_n}{\partial b_i^j}, \qquad (13) $$

where $m$ denotes the $m$th training epoch; $\eta_a$ and $\eta_b$ are the step sizes; and

$$ \frac{\partial E_n}{\partial a_i^j} = g_n \big( x_{+r}^{(n)} - d^{(n)} \big)\, \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2}\, \big( x_{-ir}^{(n)} - b_i^j \big) \big( 1 - \mu_{M_i^j} \big) \mu_j, \qquad (14) $$

$$ \frac{\partial E_n}{\partial b_i^j} = g_n \big( x_{+r}^{(n)} - d^{(n)} \big)\, \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2}\, \big( \mu_{M_i^j} - 1 \big) \mu_j a_i^j, \qquad (15) $$

where $i = 0, 1, \ldots, 3$, $j = 1, 2, \ldots, 16$; the derivation of Eqs. (14) and (15) can be found in the appendix.
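The backward-pass gradients of Eqs. (14) and (15) can be written compactly as in the sketch below; this is a hedged illustration for a single data pair, reusing the input convention x[i] = x_{-ir} from the earlier inference sketch, with illustrative names only.

```python
import numpy as np

def mf_gradients(x, d, a, b, c, mu_mf, g_n):
    """Gradients of E_n with respect to the sigmoid MF parameters (Eqs. (14)-(15)).

    x      : inputs with x[i] = x_{-i*r} (x[0] is the current state x_0)
    d      : desired output d^(n) for this data pair
    a, b   : MF parameters, shape (16, 4); c : consequent parameters, shape (16, 5)
    mu_mf  : current MF grades mu_{M_i^j}, shape (16, 4)
    g_n    : gain factor of this data pair from Eq. (9)
    """
    mu = np.prod(mu_mf, axis=1)                        # rule firing strengths, Eq. (4)
    C = c[:, :4] @ x + c[:, 4]                         # rule consequents C_j^(n)
    S = mu.sum()
    x_pred = (mu @ C) / S                              # predictor output, Eq. (8)
    err = g_n * (x_pred - d)
    common = (C * S - mu @ C) / S**2                   # (C_j*sum(mu) - sum(mu*C)) / sum(mu)^2
    dE_da = err * common[:, None] * (x[None, :] - b) * (1.0 - mu_mf) * mu[:, None]   # Eq. (14)
    dE_db = err * common[:, None] * (mu_mf - 1.0) * a * mu[:, None]                  # Eq. (15)
    return dE_da, dE_db

# Eqs. (12)-(13): accumulate these gradients over all N pairs in an epoch, then
# a -= eta_a * grad_a and b -= eta_b * grad_b with suitable step sizes.
```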
Fig. 3. The gain function (gain level versus sample span).
Given the values of the MF parameters and $N$ training data pairs $\{x_{-3r}^{(n)},\, x_{-2r}^{(n)},\, x_{-r}^{(n)},\, x_{0}^{(n)},\, d^{(n)}\}$, $n = 1, 2, \ldots, N$, to the adaptive predictor, $N$ linear equations in terms of the consequent parameters $h$ can be formed as

$$ A h = d, \qquad (16) $$

where $A$ is the matrix resulting from the inference operation of the adaptive predictor:

$$ A = \begin{bmatrix} \bar{\mu}_1^{(1)} x_0^{(1)} & \bar{\mu}_1^{(1)} x_{-r}^{(1)} & \bar{\mu}_1^{(1)} x_{-2r}^{(1)} & \bar{\mu}_1^{(1)} x_{-3r}^{(1)} & \bar{\mu}_1^{(1)} & \cdots & \bar{\mu}_{16}^{(1)} x_{-2r}^{(1)} & \bar{\mu}_{16}^{(1)} x_{-3r}^{(1)} & \bar{\mu}_{16}^{(1)} \\ \bar{\mu}_1^{(2)} x_0^{(2)} & \bar{\mu}_1^{(2)} x_{-r}^{(2)} & \bar{\mu}_1^{(2)} x_{-2r}^{(2)} & \bar{\mu}_1^{(2)} x_{-3r}^{(2)} & \bar{\mu}_1^{(2)} & \cdots & \bar{\mu}_{16}^{(2)} x_{-2r}^{(2)} & \bar{\mu}_{16}^{(2)} x_{-3r}^{(2)} & \bar{\mu}_{16}^{(2)} \\ & & & & \vdots & & & & \\ \bar{\mu}_1^{(N)} x_0^{(N)} & \bar{\mu}_1^{(N)} x_{-r}^{(N)} & \bar{\mu}_1^{(N)} x_{-2r}^{(N)} & \bar{\mu}_1^{(N)} x_{-3r}^{(N)} & \bar{\mu}_1^{(N)} & \cdots & \bar{\mu}_{16}^{(N)} x_{-2r}^{(N)} & \bar{\mu}_{16}^{(N)} x_{-3r}^{(N)} & \bar{\mu}_{16}^{(N)} \end{bmatrix}. \qquad (17) $$
$h$ is a vector whose elements are the predictor's consequent parameters to be updated:

$$ h = \big[\, c_0^1 \;\; c_1^1 \;\; c_2^1 \;\; c_3^1 \;\; c_4^1 \;\; \cdots \;\; c_0^{16} \;\; c_1^{16} \;\; c_2^{16} \;\; c_3^{16} \;\; c_4^{16} \,\big]^{\mathrm{T}}. \qquad (18) $$

$d$ represents the vector of the desired forecasting states:

$$ d = \big[\, d^{(1)} \;\; d^{(2)} \;\; \cdots \;\; d^{(N)} \,\big]^{\mathrm{T}}. \qquad (19) $$
Because the row vectors in $A$ and the associated elements in $d$ are obtained sequentially, $h$ in Eq. (16) can be computed recursively. Let $a_k$ denote the $k$th row vector in matrix $A$, $A_k$ be the submatrix of $A$ composed of the first $k$ rows of $A$, $d_k$ be the subvector of $d$ composed of the first $k$ elements of $d$, $d^{(k)}$ be the $k$th element of $d$, and $G_k$ be the submatrix of $G$ composed of the first $k$ rows of $G$. Then the current output of the predictor is

$$ x_{+r}^{(k)} = a_k h_k, \qquad (20) $$

where $k = 0, 1, \ldots, N-1$. Correspondingly, Eq. (11) can be rewritten as

$$ E = \frac{1}{2} \sum_{k=0}^{N-1} g_k \big( a_k h_k - d^{(k)} \big)^2 = \frac{1}{2} (A h - d)^{\mathrm{T}} G (A h - d) = \frac{1}{2} \big( h^{\mathrm{T}} A^{\mathrm{T}} G A h - 2 d^{\mathrm{T}} G A h + d^{\mathrm{T}} G d \big). \qquad (21) $$

To minimise the forecasting error, set $\partial E / \partial h = 0$; then

$$ h = \big( A^{\mathrm{T}} G A \big)^{-1} A^{\mathrm{T}} G d. \qquad (22) $$
To derive the formula for a recursive estimate, corresponding to the $k$th input data pair, $h_k$ can be computed by Eq. (22), that is,

$$ h_k = \big( A_k^{\mathrm{T}} G_k A_k \big)^{-1} A_k^{\mathrm{T}} G_k d_k. \qquad (23) $$

Corresponding to the $(k+1)$th data pair $\{a_{k+1}^{\mathrm{T}},\, d^{(k+1)}\}$, Eq. (23) becomes

$$ h_{k+1} = \left( \begin{bmatrix} A_k \\ a_{k+1}^{\mathrm{T}} \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} G_k & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A_k \\ a_{k+1}^{\mathrm{T}} \end{bmatrix} \right)^{-1} \begin{bmatrix} A_k \\ a_{k+1}^{\mathrm{T}} \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} G_k & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} d_k \\ d^{(k+1)} \end{bmatrix} = \big( A_k^{\mathrm{T}} G_k A_k + a_{k+1} a_{k+1}^{\mathrm{T}} \big)^{-1} \big( A_k^{\mathrm{T}} G_k d_k + a_{k+1} d^{(k+1)} \big). \qquad (24) $$

Let $S_k = \big( A_k^{\mathrm{T}} G_k A_k \big)^{-1}$, that is, $S_k^{-1} = A_k^{\mathrm{T}} G_k A_k$; then

$$ S_{k+1} = \left( \begin{bmatrix} A_k \\ a_{k+1}^{\mathrm{T}} \end{bmatrix}^{\mathrm{T}} \begin{bmatrix} G_k & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A_k \\ a_{k+1}^{\mathrm{T}} \end{bmatrix} \right)^{-1} = S_k - S_k a_{k+1} \big( I + a_{k+1}^{\mathrm{T}} S_k a_{k+1} \big)^{-1} a_{k+1}^{\mathrm{T}} S_k. \qquad (25) $$
Eq. (24) then becomes

$$ h_{k+1} = S_{k+1} A_k^{\mathrm{T}} G_k d_k + S_{k+1} a_{k+1} d^{(k+1)} = h_k + S_{k+1} a_{k+1} \big( d^{(k+1)} - a_{k+1}^{\mathrm{T}} h_k \big), \qquad (26) $$

where $k = 0, 1, \ldots, N-1$. The least-squares estimate of $h$ is $h_N$. The initial conditions for Eqs. (25) and (26) are $h_0 = 0$ and $S_0 = \alpha I$, respectively, where $\alpha \in [10^2, 10^6]$ and $I$ is an identity matrix (80 × 80 in this case, matching the 80 consequent parameters).

3.2. Training processes and error measurement

Consider a given ($i$th) data set $\{x_{-3r}^{(i)},\, x_{-2r}^{(i)},\, x_{-r}^{(i)},\, x_{0}^{(i)},\, d^{(i)}\}$, where $d^{(i)}$ is the desired system output. As illustrated in Fig. 2, the relative forecasting error (i.e., the point residual error) is measured by

$$ e_i = \frac{x_{+r}^{(i)} - d^{(i)}}{d^{(i)}}, \qquad (27) $$

where $x_{+r}^{(i)}$ is the predictor output obtained by Eq. (5), and $d^{(i)} = x_0^{(i+1)}$. The rate of change of the forecasting error for this $i$th data pair is

$$ \dot{e}_i = \frac{\mathrm{d} e_i}{\mathrm{d} t} = \frac{x_{+r}^{(i)} - d^{(i)}}{d^{(i)}} - \frac{x_{+r}^{(i-1)} - d^{(i-1)}}{d^{(i-1)}}. \qquad (28) $$
Adaptive training of the predictor is triggered if the point residual error $e_i$ exceeds a tolerance $\varepsilon_t$ and the error is increasing, that is,

$$ [\, e_i > \varepsilon_t \,] \wedge [\, \dot{e}_i > 0 \,]. \qquad (29) $$
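To make the training trigger and the recursive consequent update concrete, a minimal sketch is given below; it assumes the point residual of Eq. (27), the trigger of Eq. (29), and the recursions of Eqs. (25) and (26), with illustrative names and an arbitrary initial value of alpha.

```python
import numpy as np

def should_retrain(x_pred, d, x_pred_prev, d_prev, eps_t=0.05):
    """Trigger of Eq. (29): the point residual error (Eq. (27)) exceeds the
    tolerance and is still growing (Eq. (28))."""
    e_i = (x_pred - d) / d
    e_prev = (x_pred_prev - d_prev) / d_prev
    return (e_i > eps_t) and (e_i - e_prev > 0)

def rls_update(h, S, a_row, d_new):
    """One recursive least-squares step of Eqs. (25)-(26) for the consequents h.

    h     : current consequent estimate, shape (80,)
    S     : current matrix S_k, shape (80, 80)
    a_row : new regressor row a_{k+1} of matrix A in Eq. (17), shape (80,)
    d_new : new desired output d^(k+1)
    """
    Sa = S @ a_row
    S_new = S - np.outer(Sa, Sa) / (1.0 + a_row @ Sa)    # Eq. (25)
    h_new = h + S_new @ a_row * (d_new - a_row @ h)      # Eq. (26)
    return h_new, S_new

# Initial conditions per the text: h_0 = 0 and S_0 = alpha * I, alpha in [1e2, 1e6].
h0, S0 = np.zeros(80), 1e4 * np.eye(80)
```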
In training, the predictor's consequent parameters are optimised in the forward pass of each training epoch by using Eqs. (25) and (26), whereas the MF parameters are fine-tuned in the backward pass by using Eqs. (12) and (13). The training error $E$ in Eq. (11) is minimised successively. If the mismatch is caused by, for example, convergence deterioration due to a local minimum, the error-making rules, which take the most active part in producing the current forecast, are penalised by reducing their contributions to the following operations. The corresponding rule boundary properties in the decision space are modified accordingly. To prevent possible trapping due to that local minimum, an optimal search direction is generated based on both the forecasting error states and the error change trend orientations over the long term. The training process is terminated once the change in training error falls below a specified tolerance level (e.g., $10^{-5}$ in this case):

$$ \Delta E = E(m) - E(m-1) \leq 10^{-5}, \qquad (30) $$

where $E(m)$ and $E(m-1)$ represent, respectively, the training errors for the $m$th and $(m-1)$th epochs. The general forecasting accuracy of the predictor can also be estimated over a specified region (e.g., over $M$ data sets) according to the following regional residual error:

$$ e_R = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( \frac{x_{+r}^{(i)} - d^{(i)}}{d^{(i)}} \right)^{2} }. \qquad (31) $$
3.3. Modelling uncertainty analysis

When training data sets are employed to identify the predictor model parameters, there always exist some errors due to imperfections in model assumptions and training processes, as well as noise (i.e., signals not related to the information of interest) in measurement and analysis. This effect is termed model uncertainty. Several techniques have been proposed in the literature for analysing model uncertainty problems [18]. For linear models, in which the model's output is linear (or approximately linear) in terms of its parameters, the model
uncertainty processing techniques can be classified into active and passive approaches [19]. The noise in active approaches is assumed to be random in nature and is characterised by stochastic representations such as probability density functions; reference observers are then constructed for model uncertainty analysis [20]. In real-world applications such as machinery systems, however, noise signals may not be exactly random in nature, and they are difficult to characterise stochastically. The passive approach applies error thresholds for noise analysis, and a noise signal is characterised by its upper and lower bounds. The well-accepted passive approaches include the ellipsoidal outer bounding technique, the orthotopic outer bounding algorithm, and the exact parameter description method [18]. In this study, because the NF predictor has a complete architecture (i.e., 16 complementary rules corresponding to four input variables, each having two MFs) and is adaptively trained, a simplified bounded-error method is adopted for model uncertainty analysis. Based on the specific application requirements, a point error threshold $\varepsilon_t$ is chosen (e.g., $\varepsilon_t = 0.05$). For a given data set, if the relative residual error is less than this a priori threshold, that is,

$$ e_i = \left| \frac{x_{+r}^{(i)} - d^{(i)}}{d^{(i)}} \right| \leq \varepsilon_t, \qquad (32) $$

the monitoring is assumed to be in its normal condition. Adaptive training is performed as long as Eq. (29) holds. After the training process is completed, the relative model uncertainty error can be determined by Eq. (31):

$$ e_R = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_{+r}^{(i)} - d^{(i)}}{d^{(i)}} \right)^{2} }, \qquad (33) $$

where $N$ represents the number of training data sets. In normal forecasting operations, for a given ($i$th) data set, the point relative residual signal will satisfy

$$ \frac{x_{+r}^{(i)} - d^{(i)}}{d^{(i)}} \in [-\varepsilon_t,\ \varepsilon_t]. \qquad (34) $$

Correspondingly, the confidence interval for the predictor output $x_{+r}^{(i)}$ is

$$ (1 - \varepsilon_t)\, d^{(i)} \leq x_{+r}^{(i)} \leq (1 + \varepsilon_t)\, d^{(i)}. \qquad (35) $$
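A brief computational sketch of this bounded-error check (Eqs. (32)-(35)) is given below, assuming NumPy arrays of predictions and desired values and an illustrative threshold value.

```python
import numpy as np

def uncertainty_bounds(x_pred, d, eps_t=0.05):
    """Bounded-error uncertainty check of Eqs. (32)-(35) over a batch of points.

    Returns the regional residual error e_R (Eq. (33)), a mask of points whose
    relative residual lies in [-eps_t, eps_t] (Eq. (34)), and the confidence
    interval bounds for each prediction (Eq. (35))."""
    x_pred, d = np.asarray(x_pred, float), np.asarray(d, float)
    rel = (x_pred - d) / d
    e_R = np.sqrt(np.mean(rel ** 2))
    within = np.abs(rel) <= eps_t
    return e_R, within, ((1 - eps_t) * d, (1 + eps_t) * d)
```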
4. Performance evaluation

In this section, the proposed adaptive predictor is implemented for real-time forecasting applications. Online tests are conducted corresponding to gear system monitoring and material fatigue testing, respectively. For comparison, the forecasting results from a general NF predictor with offline training, as proposed in [12], are also presented; that general NF predictor has demonstrated performance superior to other classical forecasting schemes.

4.1. Initial training

Assuming that sufficient representative data sets are not available at the starting stage (the general case), the initial training data pairs in the training database $T_d$ in Eq. (7) are generated from the Mackey–Glass differential equation [21] with initial condition $x(0) = 1$ and delay $\tau = 2$:

$$ \frac{\mathrm{d} x(t)}{\mathrm{d} t} = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t). \qquad (36) $$
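The initial training data can be generated by numerically integrating Eq. (36); the sketch below uses a simple Euler scheme with a constant history, which is an assumption of convenience rather than the author's procedure.

```python
import numpy as np

def mackey_glass(n_samples=1000, tau=2.0, dt=0.1, x0=1.0):
    """Generate a Mackey-Glass series (Eq. (36)) by Euler integration,
    taking x(t) = x0 for all t <= 0 as the initial history."""
    lag = int(round(tau / dt))
    x = np.full(n_samples + lag, x0)
    for k in range(lag, n_samples + lag - 1):
        x_tau = x[k - lag]                                        # delayed state x(t - tau)
        x[k + 1] = x[k] + dt * (0.2 * x_tau / (1.0 + x_tau**10) - 0.1 * x[k])
    return x[lag:]
```

The resulting series can then be windowed into the training database T_d of Eq. (7), as in the earlier sliding-window sketch.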
The data sets from Eq. (36) have chaotic, non-linear, and non-convergent properties, which make them suitable to train forecasting schemes for general application purposes. In the test cases in this section, both predictors are initially trained using the same data set from Eq. (36). In the adaptive predictor, the
Mackey–Glass data pairs in the training database are gradually replaced by the new data sets that become available from testing, and the adaptive training is conducted as long as Eq. (29) holds.

4.2. Gear system monitoring

4.2.1. Experimental set-up

Fig. 4 schematically shows the experimental set-up used for online gear system monitoring. The system is driven by a 3-hp DC motor. The load is provided by a heavy-duty magnetic brake system. A two-stage gearbox is tested in this work. The numbers of teeth are 17, 25, 19, and 23, respectively, for gears #1 to #4. The shafts are mounted to the gearbox housing by journal bearings. A magnetic pick-up sensor is mounted in the radial direction of gear #4 to provide a reference signal (one pulse per tooth period) for time synchronous average filtering. The gap between the magnetic pick-up sensor and the gear top land is in the range of 0.3–1.0 mm. The motor rotation is controlled by a speed controller, which allows the tested gear system to operate in the range of 20–3600 rpm. The speed of the drive motor and the load of the magnetic brake system are adjusted automatically to accommodate the range of speed/load operation conditions. The vibration is measured using two accelerometers mounted at both ends of the gearbox housing. The signals from the different sources are collected using an intelligent data acquisition device made by the author's research team [17], which consists of a micro-controller and circuits for sampling, anti-aliasing filtering, and AD/DA conversion. The preconditioned signals are then fed into a computer for further processing.

4.2.2. Monitoring index determination

The measured vibration from the experimental set-up is an overall signal associated with different vibratory sources, such as the magnetic brake, shafts, journal bearings, and gear mesh. As discussed in [17], each component generates a vibratory signal with specific spectral characteristics, and the characteristic frequencies of a gear train are located in the highest bandwidth in a gearbox. Therefore, the gear signal can be derived by filtering out those frequency components lower than the gear mesh frequency $f_r Z$ Hz, where $f_r$ denotes the gear rotation speed in Hz and $Z$ is the number of teeth of the gear of interest. The monitoring in this study is conducted gear by gear. Thus, an important procedure is to differentiate the signal specific to each gear by using time synchronous average filtering [22]. This process is performed with the help of a reference signal related to the rotation of the gear of interest. In this case, the rotation reference for each gear is computed from the corresponding transmission ratio with respect to gear #4. Through this
Fig. 4. The experimental set-up for gear system monitoring.
synchronous average filtering process, all the signals non-synchronous to the rotation of the gear of interest are removed, and each gear signal is derived and represented over one full revolution (i.e., the signal average).

In condition monitoring, the monitoring indices should be sensitive to pattern modulation due to machinery faults but insensitive to noise [13]. Several signal processing techniques have been proposed in the literature for gear fault diagnosis, but each has its own merits and limitations. According to the investigation in [23], one of the most robust gear fault detection techniques is the beta kurtosis, and it is applied as an example in this paper to demonstrate the monitoring process. If $\mu$ and $\sigma^2$ represent the mean and variance of the gear signal average, respectively, the beta kurtosis index can be represented as

$$ x_b = \frac{\alpha \beta (\alpha + \beta + 2)(\alpha + \beta + 3)}{3 (\alpha + \beta + 1)\big( 2\alpha^2 - 2\alpha\beta + \alpha^2\beta + \alpha\beta^2 + 2\beta^2 \big)}, \qquad (37) $$

where $\alpha = (\mu / \sigma^2)(\mu - \mu^2 - \sigma^2)$ and $\beta = \big((1 - \mu) / \sigma^2\big)(\mu - \mu^2 - \sigma^2)$. The details of these signal processing techniques can be found in [23]; a brief computational sketch of this index is also given below.

Three gear cases are tested in this study, as represented in Fig. 5: (a) healthy gears, (b) cracked gears, and (c) pitted gears. Many tests have been conducted for each gear condition, and two typical examples are illustrated in the following sections. In the following analysis, the predictor is considered properly trained if the regional residual error $e_R \leq 0.01$, where $e_R$ is computed by Eq. (31) over the training data sets. In the following forecasting operations, the threshold for the point residual error is chosen as $\varepsilon_t = 5\%$, and the threshold for the regional residual error is $\varepsilon_R = 2\%$. Therefore, the predictor is considered to have captured the dynamic characteristics of the monitored system as long as $e_i \leq 0.05$ and $e_R \leq 0.02$, where $e_R$ and $e_i$ are computed by Eqs. (31) and (32), respectively.

4.2.3. Cracked gear monitoring

At first, all the gears in the gearbox are in a healthy condition. The tests are conducted under load levels from 0.5 to 3 hp and motor speeds from 100 to 3600 rpm. During testing, the motor speed and load levels are randomly changed to simulate practical machinery operation conditions.
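The computational sketch of the beta kurtosis index referred to above follows Eq. (37) directly; it assumes the gear signal average has already been normalised to the interval (0, 1) so that the moment-matched beta parameters are meaningful (see [23] for the full procedure), and the function name is illustrative.

```python
import numpy as np

def beta_kurtosis(signal_average):
    """Beta kurtosis index x_b of Eq. (37) for a gear signal average in (0, 1)."""
    s = np.asarray(signal_average, dtype=float)
    m, v = s.mean(), s.var()
    alpha = (m / v) * (m - m**2 - v)                 # moment-matched beta parameters
    beta = ((1.0 - m) / v) * (m - m**2 - v)
    num = alpha * beta * (alpha + beta + 2.0) * (alpha + beta + 3.0)
    den = 3.0 * (alpha + beta + 1.0) * (2.0*alpha**2 - 2.0*alpha*beta
                                        + alpha**2*beta + alpha*beta**2 + 2.0*beta**2)
    return num / den
```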
Fig. 5. Gear conditions tested: (a) healthy gear, (b) cracked gear, (c) pitted gear.

Fig. 6. The test results for the cracked gear (solid curves; beta kurtosis index versus time span in hours): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve).
The monitoring time-step is set at $r = 0.5$ h; that is, both predictors are applied automatically every 0.5 h to forecast the upcoming values of the normalised beta kurtosis index $x_b$. After about 150 h, a transverse cut of 10% of the tooth root thickness is introduced to one tooth of gear #1 to simulate an initial fatigue crack. The test is then resumed and continues until the damaged tooth breaks off about 206 h later. Fig. 6 shows the forecasting results of the normalised monitoring index $x_b$ for gear #1. It is seen that both predictors can capture the system's dynamic behaviour. After the fault is introduced, the gear mesh dynamics change correspondingly. The general predictor (Fig. 6b) takes about 5 samples to recapture the system's dynamic characteristics, whereas the adaptive predictor (Fig. 6a) takes only a couple of steps. As more data sets become available during testing, the adaptive predictor performs better than the general predictor because of the adaptive training process. During the last monitoring section, large fluctuations appear because the gear mesh dynamics change dramatically just before and after the tooth failure. The adaptive and the general predictors provide alarms at about 5–6 samples and 3–4 samples, respectively, prior to the tooth breakage. These are very valuable indications for gear system condition monitoring.
4.2.4. Pitted gear monitoring

The prior testing is associated with a localised gear fault; this test addresses distributed fault condition monitoring. A new pair of gears #1 and #2 is mounted and tested. After about 150 h, several pits are introduced on a tooth surface of gear #1 to simulate a pitting fault. The tests resume and proceed until serious noise occurs due to the pitting damage. Fig. 7 shows the forecasting results. Both predictors perform reasonably well during the healthy period. After the pitting defect is introduced, the adaptive predictor (Fig. 7a) responds more quickly than the general predictor (Fig. 7b) in re-capturing the gear system's new dynamic characteristics. As the pitting fault propagates, the monitoring indicator $x_b$ becomes smaller, even though the noise level becomes stronger. This healing phenomenon is misleading in condition monitoring; it is associated with the properties of the fault signal. After pitting occurs, the pitted area can no longer carry load, and the unpitted area has to take the extra load. Consequently, the unpitted area is prone to fatigue, and the pitting fault propagates rapidly over the entire tooth surface and to other gear teeth. Correspondingly, the localised fault becomes a distributed surface failure. From the point of view of signal properties, when a localised fault occurs, some high-amplitude pulses are generated due to impacts, which are relatively easy for a signal processing technique to detect. As the pitting propagates, the overall energy of the fault signal increases, but it often becomes more wideband in nature and difficult to detect in the presence of the other vibratory components of the machine. This example identifies a characteristic of currently used signal processing techniques: it is usually easier to detect a distinct low-level narrowband tone than a high-level wideband signal in the presence of other signals or noise. Therefore, a possible solution to this healing problem is to integrate the information from both vibration monitoring and acoustic analysis. Furthermore, from examining Fig. 7, it is clear that the general predictor (Fig. 7b) loses track of the gear system's behaviour during the healing section. This is because, during this specific condition, the general predictor
Fig. 7. The test results for the pitted gear (solid curves; beta kurtosis index versus time span in hours): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve).
cannot adapt itself to the system's new dynamics, so its forecasting operation is trapped by local minima. The adaptive predictor (Fig. 7a), on the other hand, can prevent this local trapping problem because of its effective training process. It should be stated that if variable time steps, such as 8 × 0.5, 20 × 0.5, or 100 × 0.5 h, are applied, as represented in Eq. (6), the proposed adaptive predictor can also be used for long-term forecasting operations. That information is more helpful for scheduling machinery maintenance and repairs so as to eliminate routine machine shutdowns and check-outs.

4.3. Material fatigue testing

The generality of the developed adaptive predictor is evaluated in this section by applying it to other cases characterised by different scales and monitoring parameters. One example is material fatigue testing, which is usually a time-consuming process; a reliable forecasting tool is very useful for quickly identifying the material properties so that the experimental time can be shortened. This test is performed on a specimen with a thickness of 3 mm, as shown in Fig. 8. The left end of the specimen is fixed to an experimental set-up, whereas the load is applied to its right end using an exciter. An initial crack of about 3 mm is introduced in the middle area, close to the supported end of the plate. Both predictors are again initially trained by using a Mackey–Glass data set. The monitoring time-step is set at $r = 3000$ cycles. Fig. 9 shows the forecasting results for the crack propagation trend obtained by direct (offline) measurement. In the starting period, the general predictor (Fig. 9b) gives large forecasting errors due to its slow convergence. The adaptive predictor (Fig. 9a), however, can overcome this problem through the adaptive training process. The same fatigue crack propagation trend is also measured indirectly (online) at a location close to the crack with the help of a special electric circuit. The forecasting results are illustrated in Fig. 10. It is seen that both predictors can capture and track the system's characteristics. However, the adaptive predictor (Fig. 10a) performs
Fig. 8. A specimen with an initial crack for material fatigue testing.
Fig. 9. The test results for crack propagation using direct measurement (solid curves; monitoring index in mm versus cycles ×10^5): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve).
Fig. 10. The test results for crack propagation using indirect measurement (solid curves; monitoring index in volts versus cycles ×10^5): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve).
better than the general predictor (Fig. 10b), thanks to its adaptive network architecture and effective training process. Without a doubt, the forecasting performance in the aforementioned cases can be further improved if the predictors are properly trained using representative data sets corresponding to the specific applications.

5. Conclusion

To provide a wide array of industries with a more reliable and real-time forecasting tool, an adaptive predictor was developed in this paper based on the NF approach to predict the behaviour of dynamic systems. An adaptive training technique is proposed to further improve the forecasting efficiency. The adaptive predictor has been implemented for both gear condition monitoring and material fatigue testing, where the gear conditions include both localised and distributed faults. Test results showed that the developed adaptive predictor is a reliable forecasting tool. It can capture the system's dynamic behaviour quickly and track the system's characteristics accurately. It is also a robust forecasting tool in terms of its capability to accommodate different system operation conditions and variations in the system's dynamic characteristics. The adaptive training technique is efficient in improving the forecasting performance by modifying the properties of the decision space boundaries and by preventing possible trapping due to local minima. The forecasting accuracy of the proposed adaptive predictor is higher than that of the general NF predictor, which is, in turn, superior to other classical forecasting schemes. Further research is currently underway to implement the adaptive predictor in complex industrial facilities and to develop new strategies for multiple-step predictions.

Acknowledgements

The author wishes to thank Drs. F. Ismail and F. Golnaraghi from the Department of Mechanical Engineering at the University of Waterloo for their support of this work. Appreciation is also given to the project students, Mrs. D. Simatovic and M. Puric, for their help in conducting the tests. Financial support of this project has been provided by the Natural Sciences and Engineering Research Council of Canada and by MC Technologies Inc.

Appendix A. Derivation of some equations

For the $n$th training data pair $\{x_{-3r}^{(n)},\, x_{-2r}^{(n)},\, x_{-r}^{(n)},\, x_{0}^{(n)},\, d^{(n)}\}$, $n = 1, 2, \ldots, N$, if, for example, sigmoid MFs with parameters $\{a_i^j,\, b_i^j\}$ are applied,
$$ \mu_{M_i^j}\big(x_{-ir}^{(n)}\big) = \frac{1}{1 + \exp\!\big( -a_i^j \big( x_{-ir}^{(n)} - b_i^j \big) \big)}, \qquad i = 0, 1, \ldots, 3,\ \ j = 1, 2, \ldots, 16, \qquad \text{(A.1)} $$
the following equations can be obtained:

$$ \frac{\partial \mu_{M_i^j}}{\partial a_i^j} = \frac{\big( x_{-ir}^{(n)} - b_i^j \big) \exp\!\big( -a_i^j \big( x_{-ir}^{(n)} - b_i^j \big) \big)}{\Big[ 1 + \exp\!\big( -a_i^j \big( x_{-ir}^{(n)} - b_i^j \big) \big) \Big]^2} = \big( x_{-ir}^{(n)} - b_i^j \big) \big( 1 - \mu_{M_i^j} \big) \mu_{M_i^j}, \qquad \text{(A.2)} $$

$$ \frac{\partial \mu_{M_i^j}}{\partial b_i^j} = \frac{-a_i^j \exp\!\big( -a_i^j \big( x_{-ir}^{(n)} - b_i^j \big) \big)}{\Big[ 1 + \exp\!\big( -a_i^j \big( x_{-ir}^{(n)} - b_i^j \big) \big) \Big]^2} = \big( \mu_{M_i^j} - 1 \big) a_i^j \mu_{M_i^j}. \qquad \text{(A.3)} $$

From Eq. (4),

$$ \frac{\partial \mu_j}{\partial \mu_{M_i^j}} = \frac{\partial}{\partial \mu_{M_i^j}} \prod_{k=0}^{3} \mu_{M_k^j} = \prod_{k=0,\, k \neq i}^{3} \mu_{M_k^j} = \frac{\mu_j}{\mu_{M_i^j}}. \qquad \text{(A.4)} $$

From Eq. (5),

$$ \frac{\partial x_{+r}^{(n)}}{\partial a_i^j} = \frac{\partial}{\partial a_i^j} \left( \frac{\sum_{k=1}^{16} \mu_k C_k^{(n)}}{\sum_{k=1}^{16} \mu_k} \right) = \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \frac{\partial \mu_j}{\partial a_i^j} = \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \frac{\partial \mu_j}{\partial \mu_{M_i^j}} \frac{\partial \mu_{M_i^j}}{\partial a_i^j} = \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \big( x_{-ir}^{(n)} - b_i^j \big) \big( 1 - \mu_{M_i^j} \big) \mu_j, \qquad \text{(A.5)} $$

where $C_j^{(n)} = c_0^j x_0^{(n)} + c_1^j x_{-r}^{(n)} + c_2^j x_{-2r}^{(n)} + c_3^j x_{-3r}^{(n)} + c_4^j$. Similarly,
$$ \frac{\partial x_{+r}^{(n)}}{\partial b_i^j} = \frac{\partial}{\partial b_i^j} \left( \frac{\sum_{k=1}^{16} \mu_k C_k^{(n)}}{\sum_{k=1}^{16} \mu_k} \right) = \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \frac{\partial \mu_j}{\partial b_i^j} = \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \big( \mu_{M_i^j} - 1 \big) \mu_j a_i^j. \qquad \text{(A.6)} $$
Therefore,

$$ \frac{\partial E_n}{\partial a_i^j} = g_n \big( x_{+r}^{(n)} - d^{(n)} \big) \frac{\partial x_{+r}^{(n)}}{\partial a_i^j} = g_n \big( x_{+r}^{(n)} - d^{(n)} \big) \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \big( x_{-ir}^{(n)} - b_i^j \big) \big( 1 - \mu_{M_i^j} \big) \mu_j, \qquad \text{(A.7)} $$
$$ \frac{\partial E_n}{\partial b_i^j} = g_n \big( x_{+r}^{(n)} - d^{(n)} \big) \frac{\partial x_{+r}^{(n)}}{\partial b_i^j} = g_n \big( x_{+r}^{(n)} - d^{(n)} \big) \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\big( \sum_{k=1}^{16} \mu_k \big)^2} \big( \mu_{M_i^j} - 1 \big) \mu_j a_i^j. \qquad \text{(A.8)} $$
Eqs. (A.7) and (A.8) are Eqs. (14) and (15), respectively.

References

[1] M. Pourahmadi, Foundations of Time Series Analysis and Prediction Theory, Wiley, New York, 2001.
[2] D. Chelidze, J. Cusumano, A dynamical systems approach to failure prognosis, Journal of Vibration and Acoustics 126 (2004) 1–7.
[3] C. Li, H. Lee, Gear fatigue crack prognosis using embedded model, gear dynamic model and fracture mechanics, Mechanical Systems and Signal Processing 19 (2005) 836–846.
[4] D. Husmeier, Neural Networks for Conditional Probability Estimation: Forecasting beyond Point Prediction, Springer-Verlag London Ltd., 1999.
[5] A. Atiya, S. El-Shoura, S. Shaheen, M. El-Sherif, A comparison between neural-network forecasting techniques: case study: river flow forecasting, IEEE Transactions on Neural Networks 10 (1999) 402–409.
[6] J. Connor, R. Martin, L. Atlas, Recurrent neural networks and robust time series prediction, IEEE Transactions on Neural Networks 5 (1994) 240–254.
[7] P. Tse, D. Atherton, Prediction of machine deterioration using vibration based fault trends and recurrent neural networks, Journal of Vibration and Acoustics 121 (1999) 355–362.
[8] F. Karray, C. de Silva, Soft Computing and Intelligent Systems Design: Theory, Tools, and Applications, Pearson Education Publishing Inc., 2004.
[9] A. Ruano, Intelligent Control Systems using Computational Intelligence Techniques, IEE, London, 2005.
[10] J. Korbicz, Fault Diagnosis: Models, Artificial Intelligence, Applications, Springer, Berlin, 2004.
[11] J. Jang, C. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1997.
[12] W. Wang, F. Golnaraghi, F. Ismail, Prognosis of machine health condition using neuro-fuzzy systems, Mechanical Systems and Signal Processing 18 (2004) 813–831.
[13] W. Wang, F. Ismail, F. Golnaraghi, A neuro-fuzzy approach for gear system monitoring, IEEE Transactions on Fuzzy Systems 16 (2004) 710–723.
[14] D. Nauck, Adaptive rule weights in neuro-fuzzy systems, Journal of Neural Computing and Applications 9 (2000) 60–70.
[15] M. Figueiredo, R. Ballini, S. Soares, M. Andrade, F. Gomide, Learning algorithms for a class of neurofuzzy network and applications, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 34 (2004) 293–301.
[16] W. Wang, F. Golnaraghi, F. Ismail, Condition monitoring of a multistage printing press, Journal of Sound and Vibration 270 (2004) 755–766.
[17] W. Wang, F. Golnaraghi, F. Ismail, A real-time condition monitoring system for multistage machinery, US Patent #6,901,335, 2005.
[18] E. Walter, L. Pronzato, Identification of Parametric Models from Experimental Data, Springer, Berlin, 1997.
[19] M. Kowal, J. Korbicz, Robust fault detection using neuro-fuzzy networks, in: Proceedings of the 16th IFAC World Congress, Prague, Czech Republic, 2005.
[20] R. Patton, J. Chen, Robust Model-based Fault Diagnosis for Dynamic Systems, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.
[21] M. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287–289.
[22] P. McFadden, Interpolation techniques for time domain averaging of gear vibration, Mechanical Systems and Signal Processing 3 (1989) 87–97.
[23] W. Wang, F. Ismail, F. Golnaraghi, Assessment of gear damage monitoring techniques using vibration measurements, Mechanical Systems and Signal Processing 15 (2001) 905–922.