Applied Soft Computing 7 (2007) 957–967 www.elsevier.com/locate/asoc
CPBUM neural networks for modeling with outliers and noise

Chen-Chia Chuang a, Jin-Tsong Jeng b,*

a Department of Electrical Engineering, National Ilan University, 1, Sec. 1, Shen-Lung Road, I-Lan 260, Taiwan, ROC
b Department of Computer Science and Information Engineering, National Formosa University, 64, Wen-Hua Road, Huwei Jen, Yunlin County 632, Taiwan, ROC

* Corresponding author. Tel.: +886 5 6315573; fax: +886 5 6330456. E-mail addresses: [email protected] (C.-C. Chuang), [email protected] (J.-T. Jeng).

Received 1 July 2005; received in revised form 16 March 2006; accepted 6 April 2006; available online 18 July 2006. doi:10.1016/j.asoc.2006.04.009
Abstract

In this study, CPBUM neural networks with an annealing robust learning algorithm (ARLA) are proposed to improve on conventional neural networks for modeling with outliers and noise. In general, the training data obtained in real applications may contain outliers and noise. Although the CPBUM neural networks converge quickly, they are difficult to apply when the training data contain outliers and noise. Hence, the robustness of the CPBUM neural networks must be enhanced. Additionally, the ARLA overcomes the problems of initialization and cut-off point selection in traditional robust learning algorithms and can deal with models subject to outliers and noise. In this study, the ARLA is used as the learning algorithm to adjust the weights of the CPBUM neural networks. It turns out that the CPBUM neural networks with the ARLA converge faster and are more robust against outliers and noise than conventional neural networks with a robust mechanism. Simulation results are provided to show the validity and applicability of the proposed neural networks.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Outliers; Modeling; Annealing robust learning algorithm; CPBUM neural networks
1. Introduction

Conventional neural networks are often used for system modeling because of their simplicity and nonlinearity [1-3]. These conventional neural networks include the multi-layered perceptron (MLP), wavelet networks, the radial basis function network (RBF), piecewise smooth networks (PWSN), the time-delay input multi-layered perceptron, the general regression neural network (GRNN), recurrent neural networks, and so on. In those approaches, the task is to obtain networks that behave as closely as possible to the system to be modeled. Since these conventional neural networks approximate functions without requiring a mathematical description of how the outputs functionally depend on the inputs, they are often referred to as model-free estimators [4]. That is, the basic modeling philosophy of model-free estimators is that they build systems from input-output patterns directly or, more abstractly, they learn from examples without any knowledge of the model type. This kind of learning scheme used for neural
networks can also be called data learning. Such learning schemes try to find functions that match all training data as closely as possible, no matter whether the data are trustworthy or not. That is, these methods use conventional feedforward and recurrent neural networks for modeling nonlinear systems, and these conventional neural networks can approximate continuous functions to any desired precision. However, these methods suffer from major drawbacks: slow learning speed, the lack of a systematic design method, and difficulty in handling training data that contain outliers and noise. Our previous results [5,6] proposed the CPBUM neural networks for function approximation. The CPBUM neural networks can overcome the above problems. That is, approximate transformable techniques, consisting of both direct and indirect transformations, were proposed to obtain the CPBUM neural networks from feedforward/recurrent neural networks [5]. Specifically, the unified model can be obtained by way of the approximate transformable technique when the activation function of the neural network is Riemann integrable. Hence, the unified model can be represented as a Chebyshev polynomial functional link network. Besides, the CPBUM neural networks not only have the same capability as a universal approximator, but also have a faster learning speed
than the conventional feedforward/recurrent neural networks. For the recurrent neural network, the feedback parameter is regarded as an input parameter in the proposed CPBUM neural networks. In addition, the relationship between the single-layered neural network and the multi-layered perceptron neural network is derived in Ref. [5]. Besides, Patra and Kot [7] used a Chebyshev functional link network for identification problems. Though the Chebyshev functional link network addresses learning speed and offers a systematic design method, it is difficult to apply when the model is subject to outliers and noise. That is, this study extends the CPBUM neural networks to modeling with outliers and noise. In general, for scientific and engineering applications, the obtained training data may be subject to outliers. The intuitive definition of an outlier [8] is ''an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism''. Outliers may occur for various reasons, such as erroneous measurements or noisy data from the tail of noise distribution functions. When outliers exist, problems remain in traditional neural network approaches [9,10]. Hence, various robust learning approaches [9-11] have been proposed to overcome the problems of traditional neural network approaches when facing outliers. Those robust approaches can indeed improve the learning performance when the training data contain outliers. Nevertheless, in the use of robust learning algorithms, there also exist the problems of initialization (i.e. the initial parameters in the networks) and the selection of cut-off points (i.e. the values beyond which errors are rejected as outliers). Besides, the MLP neural networks are often referred to as universal approximators. Nevertheless, if the training data are corrupted by large noise, such as outliers, traditional back-propagation learning schemes may not always come up with acceptable performance. Even though various robust learning algorithms have been proposed in the literature, those approaches still suffer from the initialization problem. On the other hand, Loo et al. [12] proposed a robust incremental growing multi-experts network to deal with function approximation with outliers. Englund and Verikas [13] proposed a hybrid approach with a fuzzy expert system and neural networks to detect outliers. These approaches need some expert knowledge in the learning structure for function approximation with outliers. In this study, the ARLA, which adopts the annealing concept into the robust learning algorithms, is proposed to deal with the problem of modeling under the existence of outliers. The purpose of this study is to develop an annealing robust CPBUM neural network for modeling with outliers and noise that not only achieves better performance than existing robust neural networks, but also requires fewer computing epochs than some existing conventional neural networks with a robust mechanism. That is, to overcome the problems of the robust neural network approaches for modeling with outliers and noise, a novel approach, based on the ARLA for the CPBUM neural networks, is proposed. In this proposed approach, an ARLA is applied to adjust the synaptic weights of the CPBUM neural networks so as to improve the learning performance. Because the ARLA has
been proposed to overcome the problems of initialization and cut-off point selection in the robust back-propagation learning algorithms, its results have shown superiority over existing approaches with or without outliers [14]. In this study, not only is the annealing process adopted into the robust learning algorithm, but the annealing schedule k/t, which has the best performance among the annealing schedules examined in Ref. [14], is also used. Here, k is a constant and t is the epoch number. Hence, the CPBUM neural networks with the ARLA converge quickly and are robust against outliers. Finally, five simulation results are provided to show that the proposed method has a fast convergence speed and can cope with modeling with outliers and noise.

2. Statement of problems

Some feedforward neural networks are given by

$$\hat{y}(x_{in}) = \sum_{i=1}^{h} w_i f_i(net), \qquad net = W_j x_{in} + u_i, \qquad (1)$$
where x_in is the input vector, w_i the weight from the hidden layer to the output layer, W_j the weight matrix from the input layer to the hidden layer, u_i the bias constant, ŷ the output vector, f_i the activation function, and h the number of hidden neurons. Typical examples are the MLP, wavelet networks, the RBFN, and PWSN. Besides, some feedforward neural networks, such as general regression neural networks and probabilistic neural networks, are expressed by

$$\hat{y}(x_{in}) = \frac{\sum_{i=1}^{h} w_{ip} f_i(net)}{\sum_{j=1}^{h} w'_{jp} f_j(net)}, \qquad (2)$$

where w_ip and w'_jp are the weights from the input unit to the pattern unit for general regression neural networks, and f_j is the activation function. Recurrent neural networks are given by

$$\hat{y}(x_{in}) = \sum_{i=1}^{h} w_i f_i(net), \qquad net = W_j x_{in} + W'_j u + u_i, \qquad (3)$$

where W'_j is the weight matrix from the input layer to the hidden layer for the feedback input and u is the feedback input from ŷ. Typical examples are the time-delay MLP, recurrent neural networks, and recurrent RBFNs. Assume that the activation functions of the neural networks satisfy the Riemann integrable condition. The objective of this study is to obtain the unified model based on Chebyshev polynomials given below,

$$\hat{y} \approx \hat{W} X(T_i(x_{in})), \qquad (4)$$
such that the CPBUM neural networks are not only capable of functional approximation, but are also faster in learning speed than conventional feedforward/recurrent networks. When the target function has a single input,

$$X(T_i(x_{in})) = [\,T_0(x)\; T_1(x)\; T_2(x)\; \cdots\; T_{P(o)-1}(x)\; T_{P(o)}(x)\,]^{T}.$$

When the target function has multiple inputs,

$$X(T_i(x_{in})) = [\,1 \mid T_1(x_1) \cdots T_1(x_n) \mid T_2(x_1) \cdots T_2(x_n)\; T_1(x_1)T_1(x_2) \cdots T_1(x_{n-1})T_1(x_n) \mid \cdots \mid T_{P(o)}(x_1) \cdots T_{P(o)}(x_n)\,]^{T}_{1\times L},$$

where P(o) is the maximum order of the approximating polynomials, T_i(x) is the Chebyshev polynomial of order i, n is the number of inputs, and x_in is the input vector. The representation of X(T_i(x_in)) can be found in Ref. [5]. The objective of this study is to use a new cost function for the CPBUM neural networks, defined as

$$E_N(t) = \frac{1}{N} \sum_{i=1}^{N} \rho[e_i(t); \beta(t)], \qquad (5)$$
where

$$e_i(t) = y_i - \hat{y}(x_{in}) = y_i - \hat{W} X(T_i(x_{in})), \qquad (6)$$

t is the epoch number, e_i(t) is the error between the ith desired output and the ith output of the CPBUM neural networks at epoch t, and ρ(·) is a logistic loss function defined as

$$\rho[e_i; \beta] = \frac{\beta}{2} \ln\!\left(1 + \frac{e_i^2}{\beta}\right), \qquad (7)$$

where β(t) is a deterministic annealing schedule acting like the cut-off points. The cut-off points determine how large an error can be before it is regarded as an outlier. If an error is regarded as an outlier, it is reduced by the loss function. Hence, the magnitude of the weight update is smaller than it would otherwise be when the error is larger than the cut-off points. That is, the magnitude of the error is decreased by the loss function when it is larger than the cut-off points; otherwise, the error is left unaffected by the loss function. Moreover, the ARLA is proposed to overcome the problems of initialization and cut-off point selection in the robust learning algorithm and to deal with models subject to outliers and noise [14]. It turns out that the CPBUM neural networks with the ARLA converge faster and are more robust against outliers and noise than conventional neural networks with a robust mechanism.

3. The CPBUM neural networks

The development of the CPBUM for three-layered feedforward/recurrent neural networks can be found in Lee and Jeng [5]; the case of general feedforward/recurrent neural networks can be found in Jeng and Lee [6]. The CPBUM neural networks are summarized as follows:

Theorem 1. If every activation function satisfies the Riemann integrable condition for the standard MLP and the recurrent neural network, then the MLP/recurrent neural network can always be represented as CPBUM neural networks [6]:

$$\hat{y}(x_{in}) = \hat{W} X(T_i(x_{in})), \qquad (8)$$

where

$$X(T_i(x_{in})) = [\,1 \mid T_1(x_1) \cdots T_1(x_n) \mid T_2(x_1) \cdots T_2(x_n)\; T_1(x_1)T_1(x_2) \cdots T_1(x_{n-1})T_1(x_n) \mid \cdots \mid T_{P(o)}(x_1) \cdots T_{P(o)}(x_n)\,]^{T}_{1\times L},$$

$$L = \sum_{k=0}^{P(o)} \frac{(k+n-1)!}{k!\,(n-1)!}, \qquad \hat{W} = [\,w_0\; w_1\; w_2\; \cdots\; w_{L-1}\; w_L\,].$$

The standard MLP neural network can be described as a nested nonlinear function as follows:

$$N(x_{in}, w) = \hat{y}(x_{in}, w), \qquad (9)$$

where

$$\hat{y}(x_{in}, w) = [\,\hat{y}_1\; \hat{y}_2\; \cdots\; \hat{y}_l\,]^{T}, \qquad \hat{y}_i = f\!\Big(\sum_{h_n} w_{n h_n} f\Big(\sum_{h_{n-1}} w_{(n-1) h_{n-1}} f\big(\cdots f(net)\cdots\big)\Big)\Big), \qquad net = W_1 x_{in} + u,$$

x_in is the input vector, W_n the weight of the hidden layer to the output layer, W_1 the weight matrix of the input layer to the hidden layer, u_i the bias constant, y the output vector, f_i the activation function, and h the number of hidden neurons. The recurrent neural network can be described as a nested nonlinear function as follows:

$$N(x_{in}, w) = \hat{y}(x_{in}, w), \qquad (10)$$

where ŷ(x_in, w) is the same nested expression but with net = W_1 x_in + W_j u + u_i, and W_j is the weight matrix of the input layer to the hidden layer for the feedback input. According to Theorem 1, we have

$$\hat{y}(x_{in}, w) \approx \hat{W} X(T_i(x_{in})). \qquad (11)$$

The proposed CPBUM neural networks are shown in Fig. 1. They consist of two parts, namely a numerical transformation part in part (a) and a learning part in part (b). The numerical transformation maps the input layer to the hidden layer by the approximate transformable method. Hence, there is no learning in this part. As a result, the Chebyshev polynomial basis can be viewed as a new input vector, and this transformation greatly reduces the learning time. Part (b) is a functional link network based on Chebyshev polynomials. Because this model is single-layered, its learning speed is fast. Note that, from Eq. (11), the CPBUM neural network has a linear-parameter, nonlinear-input structure.

Fig. 1. The architecture of the CPBUM neural networks.
However, the CPBUM neural networks in Fig. 1 are easily extended to a nonlinear-parameter, nonlinear-input structure. The modification lets the output layer in Fig. 1 use a nonlinear activation function, such as the sigmoid function or the hyperbolic tangent. This new structure can be represented by

$$\hat{y}_{new} \approx a(\hat{y}(x_{in}, w)) = a(x_{in}, \hat{W}), \qquad (12)$$

where a is a nonlinear activation function.
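To make the construction of the unified model concrete, the following Python/NumPy sketch shows one way to form the single-input Chebyshev basis X(T_i(x_in)) of Theorem 1 and the weight count L; the helper names, the zero initial weights, and the rescaling of inputs into [-1, 1] (the natural domain of the Chebyshev polynomials) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from math import comb

def chebyshev_basis_1d(x, order):
    # T_0(x) = 1, T_1(x) = x, T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x)
    x = np.asarray(x, dtype=float)
    T = [np.ones_like(x), x]
    for _ in range(2, order + 1):
        T.append(2.0 * x * T[-1] - T[-2])
    return np.stack(T[:order + 1], axis=-1)        # shape (len(x), order + 1)

def basis_length(order, n_inputs):
    # L = sum_{k=0}^{P(o)} (k + n - 1)! / (k! (n - 1)!), the size of X(T_i(x_in))
    return sum(comb(k + n_inputs - 1, n_inputs - 1) for k in range(order + 1))

x = np.linspace(-1.0, 1.0, 5)                      # inputs already scaled into [-1, 1]
X = chebyshev_basis_1d(x, order=20)                # P(o) = 20 gives 21 basis terms
W_hat = np.zeros(X.shape[-1])                      # adjustable weights of part (b) in Fig. 1
y_hat = X @ W_hat                                  # y_hat = W_hat . X(T_i(x)), Eq. (8)
print(X.shape, basis_length(20, 1))                # (5, 21) 21, matching Example 2
```

For multi-input targets, the basis of Theorem 1 additionally contains the cross-product terms such as T_1(x_1)T_1(x_2), which is exactly why L grows according to the combinatorial formula above.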
4. The annealing robust learning algorithm for the CPBUM neural networks

In the CPBUM neural networks, an ARLA is used as the learning algorithm in this study. An important feature of the ARLA is that it adopts the annealing concept into the cost function of the robust back-propagation learning algorithm, as proposed in Ref. [14]. Hence, the ARLA can overcome the existing problems in the robust back-propagation learning algorithm. Based on gradient-descent learning, the synaptic weights w_j are updated as

$$\Delta w_j = -\eta \frac{\partial E_{ARCNNs}}{\partial w_j} = -\eta \sum_{i=1}^{N} \varphi(e_i; \beta) \frac{\partial e_i}{\partial w_j}, \qquad (13)$$

$$\varphi(e_i; \beta) = \frac{\partial \rho(e_i; \beta)}{\partial e_i} = \frac{e_i}{1 + e_i^2/\beta(t)}, \qquad (14)$$
where η is a learning constant and φ(·) is usually called the influence function. When outliers exist, they have a great impact on the approximation results; such an impact can be understood through an analysis of the influence function. The loss function of Eq. (7) and its influence function are shown in Fig. 2. In the ARLA, the properties of the annealing schedule β(t) are summarized as follows [14]:

(A) β_initial = c·max{|e_i|_initial}, for i = 1, ..., N, where |e_i|_initial is the absolute value of the error for the ith training datum at the first epoch and c is a constant; in this study, the constant c is chosen as 2;
(B) β(t) → 0+ as t → ∞;
(C) β(t) = k/t at epoch t, where k is a constant.

For property (A), the annealing schedule uses larger cut-off points in the early training stage and smaller cut-off points in the later training stage. At the beginning of the process, the function to be modeled is completely unknown, so the error measure used to discriminate against outliers may not be correct. Thus, it is better to use larger cut-off points, so as to include all possible points, or even not to use the loss function at all in the early stage.
Fig. 2. The logistic loss function (7) and its influence function are shown.
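The logistic loss of Eq. (7), its influence function of Eq. (14), and the annealing schedule can be written down directly; the following is a minimal Python/NumPy sketch. The function names are illustrative, and combining properties (A) and (C) into one helper by taking k = β_initial is only one plausible reading, since in the examples below the constant k is simply fixed (e.g. β(t) = 100/t).

```python
import numpy as np

def logistic_loss(e, beta):
    # rho(e; beta) = (beta / 2) * ln(1 + e^2 / beta), Eq. (7)
    return 0.5 * beta * np.log1p(e ** 2 / beta)

def influence(e, beta):
    # phi(e; beta) = d rho / d e = e / (1 + e^2 / beta), Eq. (14)
    # large errors (likely outliers) are pushed back toward zero influence
    return e / (1.0 + e ** 2 / beta)

def annealing_schedule(t, initial_errors, c=2.0):
    # property (A): beta_initial = c * max|e_i| at the first epoch (c = 2 here)
    # property (C): beta(t) = k / t; taking k = beta_initial is an assumed choice
    k = c * np.max(np.abs(initial_errors))
    return k / max(t, 1)

e = np.linspace(-10.0, 10.0, 401)
for beta in (100.0, 10.0, 1.0):
    # as beta is annealed toward zero, large errors contribute less and less
    print(beta, float(np.abs(influence(e, beta)).max()))
```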
Hence, a suitable initial value of the cut-off points is selected as β_initial = c·max{|e_i|_initial}, for i = 1, ..., N, where |e_i|_initial is the absolute value of the error for the ith training datum at the first epoch and c is a constant; in this paper, the constant c is chosen as 2. For property (B), a similar concept has also been taken into account in traditional BP learning algorithms. For example, so-called early-stopping approaches, which stop the training process under certain conditions, are often proposed to overcome the overfitting phenomenon [15]. Furthermore, it has been shown in robust statistics that, when β(t) → 0+ as t → ∞, the Huber M-estimator is equivalent to a linear L1-norm estimator [16]. For property (C), how fast the annealing schedule should decay must be decided. When the decay is too quick, the approximation of the majority of the data may not have enough time to converge and the training data may mostly be degraded. If the decay is too slow, the robust learning algorithm may not discriminate against the outliers in time, before overfitting occurs. Additionally, the differences and similarities between the robust BP learning algorithm and the ARLA are summarized in Table 1.
Table 1
The differences and similarities between the ARLA and the robust BP learning algorithm

Differences:
  ARLA: overcomes the problems of initialization and cut-off point selection in the robust BP learning algorithm [12].
  Robust BP learning algorithm: the problems of initialization and cut-off value remain [12]; an initialization procedure is needed.
Similarities:
  ARLA: deals with outliers.
  Robust BP learning algorithm: deals with outliers.

The procedure of the CPBUM neural networks with the ARLA is stated as follows (a minimal code sketch of the whole procedure is given after the steps):

Step 1. Initialize the CPBUM neural network structure with random weights.
Step 2. Compute the estimated result and its error via Eq. (6) for all training data.
Step 3. Iteratively update the synaptic weights w_j via Eqs. (13) and (14); in this process, the influence of the outliers is detected and discriminated against.
Step 4. Determine the value of the annealing schedule β(t) = k/t.
Step 5. Compute the robust cost function E_N defined in Eq. (5).
Step 6. If the termination conditions are not satisfied, go to Step 2; otherwise, terminate the learning process.
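The following Python/NumPy sketch shows Steps 1-6 applied to the linear-in-parameters CPBUM model of Eq. (8); the design matrix Phi collects the Chebyshev basis vectors of the training inputs. The helper name, the fixed epoch budget used as the termination condition, and the 1/N averaging kept inside the update are illustrative choices, not specifics from the paper.

```python
import numpy as np

def arla_fit(Phi, y, k=100.0, lr=0.01, epochs=50):
    """Annealing robust learning for y_hat = Phi @ w (Phi: (N, L) Chebyshev basis matrix)."""
    N, L = Phi.shape
    rng = np.random.default_rng(0)
    w = 0.01 * rng.standard_normal(L)                 # Step 1: random initial weights
    for t in range(1, epochs + 1):
        e = y - Phi @ w                               # Step 2: errors e_i(t), Eq. (6)
        beta = k / t                                  # Step 4: annealing schedule beta(t) = k / t
        phi = e / (1.0 + e ** 2 / beta)               # influence function, Eq. (14)
        w += lr * (Phi.T @ phi) / N                   # Step 3: gradient step of Eq. (13)
        cost = np.mean(0.5 * beta * np.log1p(e ** 2 / beta))   # Step 5: robust cost, Eq. (5)
        rmse = np.sqrt(np.mean(e ** 2))               # Eq. (15), useful for monitoring
        # Step 6: here the termination condition is simply a fixed number of epochs
    return w
```

Because the CPBUM model is linear in its parameters, each epoch amounts to a single matrix-vector product plus the influence weighting; only the weights of part (b) in Fig. 1 are adjusted.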
5. Simulation results

In this section, five examples are used to verify the proposed approach. The simulations were conducted in the Matlab environment. The root mean square error (RMSE) on the testing data is used to measure the performance of the learned networks (generalization capability). The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N}}, \qquad (15)$$

where y_i is the desired value at x_i and ŷ_i is the output of the CPBUM neural networks. The learning constant η is chosen as 0.01 for all examples.

5.1. Example 1

Consider a dynamic system whose model is written as

$$y(x[k]) = \sin(5x[k]) \cos^{-1}(x[k]) \cos(3x[k-1]) + G + H, \qquad k = 0, 1, 2, \ldots, 50, \qquad (16)$$
where G is normally distributed noise and H denotes the outliers. In this example, G ~ N(0, 1) and there are five outliers H. The input sequence x[k] is generated by equally sampling the interval [-1, 1] for k = 0, 1, 2, ..., 50. The initial condition for the system is y(x[0]) = 3 and β(t) = 100/t. Case 1 adds only the five outliers. The learning result under five outliers is shown in Fig. 3(c). The target data with five outliers and the initial condition are shown in Fig. 3(a) and (b), respectively. Besides, the proposed structure needs no more than 15 epochs to obtain Fig. 3(c). The RMSE versus epoch is shown in Fig. 3(d). On the contrary, when the conventional neural networks with the robust BP learning algorithm are used to obtain Fig. 3(c), more than 3000 epochs are needed [14]. In Case 2, we use the same parameters, but the function is corrupted by both outliers and noise. The noise is normally distributed with mean zero and unit variance. The training data, learning results, and error curve are shown in Fig. 4. No more than 10 epochs are needed to obtain Fig. 4(c). From this example, the proposed robust learning algorithm can indeed improve the learning performance when the training data contain outliers and noise. Table 2 shows the RMSE errors of the CPBUM neural networks with the ARLA and of the robust BP with different initial epochs for Example 1 with outliers and noise. From Table 2, it is also evident that the performance of the CPBUM neural networks with the ARLA is better than that of the robust learning algorithms, no matter when the learning algorithms are switched. Besides, the computer used is a Pentium 4 at 2.4 GHz. The computational time of the proposed approach for this example with outliers and noise is 7.543 s, whereas the robust BP needs 3.255 min. Besides, the conventional neural networks and the CPBUM neural networks without a robust mechanism have difficulty dealing with modeling with outliers and noise and may not always come up with acceptable performance.
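As an illustration of how such training data can be produced, the sketch below samples one reading of Eq. (16) and corrupts it with G and H; the outlier positions and magnitudes, and the random seed, are assumptions for illustration only, since the paper does not state the exact values used.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(-1.0, 1.0, 51)                    # equally sampled inputs, k = 0, 1, ..., 50
y = np.empty_like(x)
y[0] = 3.0                                        # initial condition y(x[0]) = 3
y[1:] = np.sin(5 * x[1:]) * np.arccos(x[1:]) * np.cos(3 * x[:-1])   # one reading of Eq. (16)

y += rng.normal(0.0, 1.0, size=x.shape)           # G ~ N(0, 1)

out = rng.choice(x.size, size=5, replace=False)   # five outliers H (positions/values assumed)
y[out] += rng.choice([-1.0, 1.0], size=5) * rng.uniform(5.0, 10.0, size=5)
```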
Fig. 3. The simulation results for example 1 with five outliers are shown.
Fig. 4. The simulation results for example 1 with five outliers and noise are shown.
5.2. Example 2

The complicated function is presented in Ref. [17]:

$$f(x) = \sin(2\pi \cdot 5x) + \sin(2\pi \cdot 10x) + G + H, \qquad x \in [0, 0.3]. \qquad (17)$$

This example was used in the previous paper [6] for comparison with conventional neural networks, without outliers and noise. In Ref. [17], 40 hidden neurons are needed to approximate this function without G and H. In this study, P(o) equals 20; that is, the total number of weights in the unified model is 21. The number of sampling data is 121 and β(t) = 50/t. Case 1 adds only H; there are two outliers in this case.

Table 2
The RMSE errors of the CPBUM neural networks with the ARLA and of the robust BP with different initial epochs for Example 1 with outliers and noise

Learning algorithms                                                     RMSE
Robust BP with initial epoch = 1000                                     0.1432
Robust BP with initial epoch = 2000                                     0.1273
Robust BP with initial epoch = 3000                                     0.1153
Robust BP with initial epoch = 4000                                     0.1130
Robust BP with initial epoch = 5000                                     0.1116
CPBUM neural networks with the ARLA (β(t) = 100/t) for epoch = 50       0.0910
CPBUM neural networks with the ARLA (β(t) = 10/t) for epoch = 50        0.0932
The learning result under two outliers is shown in Fig. 5(c). The target data with two outliers and the initial condition are shown in Fig. 5(a) and (b), respectively. Besides, the proposed structure needs no more than 10 epochs to obtain Fig. 5(c). The RMSE versus epoch is shown in Fig. 5(d). On the contrary, when the conventional neural networks with the robust BP learning algorithm are used to obtain Fig. 5(c), more than 3500 epochs are needed. In Case 2, we use the same parameters, but the function is corrupted by both outliers and noise. The noise is normally distributed with mean zero and unit variance. The training data, learning results, and error curve are shown in Fig. 6. No more than 10 epochs are needed to obtain Fig. 6(c). From this example, the proposed robust learning algorithm can indeed improve the learning performance when the training data contain outliers and noise. Finally, the computational time of the proposed approach for this example with outliers and noise is 8.126 s, whereas the robust BP needs 4.325 min.
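Putting the earlier sketches together, Example 2 could be reproduced in outline as follows, assuming the arla_fit helper sketched in Section 4; the outlier values, the input rescaling to [-1, 1], and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.linspace(0.0, 0.3, 121)                               # 121 samples on [0, 0.3]
y = np.sin(2 * np.pi * 5 * x) + np.sin(2 * np.pi * 10 * x)   # Eq. (17) without G and H
y[rng.choice(x.size, size=2, replace=False)] += np.array([6.0, -6.0])   # two outliers (Case 1)

x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale inputs into [-1, 1]
Phi = np.polynomial.chebyshev.chebvander(x_scaled, 20)       # 21-term basis, P(o) = 20

w = arla_fit(Phi, y, k=50.0, lr=0.01, epochs=50)             # beta(t) = 50 / t, as in the text
print("fit RMSE on the training grid:", np.sqrt(np.mean((y - Phi @ w) ** 2)))
```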
5.3. Example 3: approximation of a two-dimensional function

Consider the two-dimensional function given by

$$f(x_1, x_2) = (x_1^2 - x_2^2) \sin(0.5 x_1) + G + H, \qquad x_1, x_2 \in [-10, 10]. \qquad (18)$$

This two-dimensional example was also used in the previous paper [6] for comparison with conventional neural networks, without outliers and noise. In this study, the number of sampling data is 1681, P(o) equals 38, and β(t) = 258/t.
Fig. 5. The simulation results for example 2 with two outliers are shown.
Case 1 adds only H; there are five outliers in this case. The learning result under five outliers is shown in Fig. 7(c). The target data without outliers and the target data with outliers are shown in Fig. 7(a) and (b), respectively. Besides, the proposed structure needs no more than 10 epochs to obtain Fig. 7(c). The RMSE versus epoch is shown in Fig. 7(d). On the contrary, when the conventional neural networks with the robust BP learning algorithm are used to obtain Fig. 7(c), more than 4000 epochs are needed.
Fig. 6. The simulation results for example 2 with two outliers and noise are shown.
Fig. 7. The simulation results for example 3 with five outliers are shown.
In Case 2, we use the same parameters, but the function is corrupted by both outliers and noise. The noise is distributed as 5·N(0, 1). The training data, learning results, and error curve are shown in Fig. 8. No more than 10 epochs are needed to obtain Fig. 8(c). From this example, the proposed robust learning algorithm can indeed improve the learning performance when the training data contain outliers and noise. Finally, the computational time of the proposed approach for this example with outliers and noise is 88.635 min, whereas the robust BP needs 125.178 min.
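For the two-dimensional examples, the sampling grid and the corrupted targets can be built in the same spirit; the sketch below uses a 41 x 41 grid (1681 points, matching the number of samples stated above), with outlier positions and magnitudes that are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

g = np.linspace(-10.0, 10.0, 41)                       # 41 x 41 = 1681 samples on [-10, 10]^2
X1, X2 = np.meshgrid(g, g)
F = (X1 ** 2 - X2 ** 2) * np.sin(0.5 * X1)             # Eq. (18) without G and H

y = F.ravel() + 5.0 * rng.normal(0.0, 1.0, F.size)     # Case 2 noise, read here as 5 * N(0, 1)
out = rng.choice(F.size, size=5, replace=False)        # five outliers H (values assumed)
y[out] += rng.choice([-1.0, 1.0], size=5) * rng.uniform(150.0, 300.0, size=5)
```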
Fig. 8. The simulation results for example 3 with five outliers and noise are shown.
Fig. 9. The simulation results for example 4 with five outliers are shown.
5.4. Example 4

The two-variable sinc function is considered. sinc(x1, x2) is defined as

$$y = \frac{\sin(x_1)\,\sin(x_2)}{x_1\, x_2} + G + H, \qquad -5 \le x_1, x_2 \le 5. \qquad (19)$$

In this example, G ~ N(0, 1) and there are five outliers H. Besides, P(o) equals 25 and β(t) = 108/t. The number of sampling data is 1225. Case 1 adds only H; there are five outliers in this case. The learning result under five outliers is shown in Fig. 9(c). The target data without outliers and the target data with outliers are shown in Fig. 9(a) and (b), respectively. Besides, the proposed structure needs no more than five epochs to obtain Fig. 9(c).
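Note that the target of Eq. (19) has removable singularities along x1 = 0 and x2 = 0, which a uniform grid on [-5, 5] does hit; one safe way to evaluate it (grid resolution inferred from the 1225 = 35 x 35 samples) is sketched below.

```python
import numpy as np

g = np.linspace(-5.0, 5.0, 35)                 # 35 x 35 = 1225 samples on [-5, 5]^2
X1, X2 = np.meshgrid(g, g)
# sin(x)/x equals np.sinc(x / pi); np.sinc returns 1 at the removable singularity x = 0
F = np.sinc(X1 / np.pi) * np.sinc(X2 / np.pi)
```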
Fig. 10. The simulation results for example 4 with five outliers and noise are shown.
Fig. 11. The simulation results for example 5 with five outliers are shown.
The RMSE versus epoch is shown in Fig. 9(d). On the contrary, when the conventional neural networks with the robust BP learning algorithm are used to obtain Fig. 9(c), more than 2500 epochs are needed. In Case 2, we use the same parameters, but the function is corrupted by both outliers and noise. The noise is distributed as 0.05·N(0, 1). The training data, learning results, and error curve are shown in Fig. 10. No more than five epochs are needed to obtain Fig. 10(c).
Fig. 12. The simulation results for example 5 with five outliers and noise are shown.
From this example, the proposed robust learning algorithm can indeed improve the learning performance when the training data contain outliers and noise.

5.5. Example 5

Consider the two-dimensional function given by

$$f(x, y) = 0.2xy + 1.2\sin(x^2 + y^2) + G + H, \qquad x, y \in [1, 3]. \qquad (20)$$

In this example, G ~ N(0, 1) and there are five outliers H. Besides, P(o) equals 47 and β(t) = 108/t. The number of sampling data is 1681. Case 1 adds only H; there are five outliers in this case. The learning result under five outliers is shown in Fig. 11(c). The target data without outliers and the target data with outliers are shown in Fig. 11(a) and (b), respectively. Besides, the proposed structure needs no more than five epochs to obtain Fig. 11(c). The RMSE versus epoch is shown in Fig. 11(d). On the contrary, when the conventional neural networks with the robust BP learning algorithm are used to obtain Fig. 11(c), more than 5000 epochs are needed. In Case 2, we use the same parameters, but the function is corrupted by both outliers and noise. The noise is distributed as N(0, 1). The training data, learning results, and error curve are shown in Fig. 12. No more than 10 epochs are needed to obtain Fig. 12(c). From this example, the proposed robust learning algorithm can indeed improve the learning performance when the training data contain outliers and noise. From the above results, the CPBUM neural networks with the ARLA converge quickly and are robust against outliers and noise; that is, the proposed CPBUM neural networks with the ARLA converge faster than conventional robust neural networks.

6. Conclusions

In this study, the CPBUM neural networks with the ARLA are developed to improve on conventional robust neural networks for function approximation with outliers and noise. In the proposed approach, an ARLA is applied to improve the performance of the CPBUM neural networks. That is, the ARLA overcomes the problems of initialization and cut-off point selection that exist in conventional robust neural networks when the data contain outliers and noise. Hence, the proposed neural networks have a fast convergence speed and are robust against outliers and noise. Simulation results are provided to show the validity and applicability of the proposed neural networks.

Acknowledgement

This work was supported by the National Science Council under Grant NSC92-2213-E-150-007.

References
[1] C.C. Chuang, J.T. Jeng, T.T. Lee, Robust adaptive tracking control via CPBUM neural network for MIMO nonlinear systems, Int. J. Electr. Eng. 12 (2005) 313–324.
[2] C.C. Chuang, J.T. Jeng, P.T. Lin, Annealing robust radial basis function networks for function approximation with outliers, Neurocomputing 56 (2004) 123–139.
[3] N. Roy, R. Ganguli, Filter design using radial basis function neural network and genetic algorithm for improved operational health monitoring, Appl. Soft Comput. J. 6 (2006) 154–169.
[4] P. Melin, O. Castillo, Adaptive intelligent control of aircraft systems with a hybrid approach combining neural networks, fuzzy logic and fractal theory, Appl. Soft Comput. J. 3 (2003) 353–362.
[5] T.T. Lee, J.T. Jeng, The Chebyshev-polynomial-based unified model neural networks for function approximation, IEEE Trans. Syst. Man Cybernet. Part B: Cybernet. 28 (1998) 925–935.
[6] J.T. Jeng, T.T. Lee, Control of magnetic bearing systems via the Chebyshev polynomial-based unified model (CPBUM) neural network, IEEE Trans. Syst. Man Cybernet. Part B: Cybernet. 30 (2000) 85–92.
[7] J.C. Patra, A.C. Kot, Nonlinear dynamic system identification using Chebyshev functional link artificial neural networks, IEEE Trans. Syst. Man Cybernet. Part B: Cybernet. 32 (2002) 505–511.
[8] D.M. Hawkins, Identification of Outliers, Chapman and Hall, 1980.
[9] C.C. Lee, P.C. Chung, J.R. Tsai, C.I. Chang, Robust radial basis function neural networks, IEEE Trans. Syst. Man Cybernet. 29 (1999) 674–685.
[10] C.C. Chuang, S.F. Su, J.T. Jeng, C.C. Hsiao, Robust support vector regression networks for function approximation with outliers, IEEE Trans. Neural Networks 13 (2002) 1322–1330.
[11] D.S. Chen, R.C. Jain, A robust back-propagation learning algorithm for function approximation, IEEE Trans. Neural Networks 5 (1994) 467–479.
[12] C.K. Loo, M. Rajeswari, M.V.C. Rao, Robust incremental growing multi-experts network, Appl. Soft Comput. 6 (2006) 139–153.
[13] C. Englund, A. Verikas, A hybrid approach to outlier detection in the offset lithographic printing process, Artif. Intell. 18 (2005) 759–768.
[14] C.C. Chuang, S.F. Su, C.C. Hsiao, The annealing robust backpropagation (ARBP) learning algorithm, IEEE Trans. Neural Networks 11 (2000) 1067–1077.
[15] W.S. Sarle, Stopped training and other remedies for overfitting, in: Proceedings of the 27th Symposium on the Interface of Computing Science and Statistics, 1995, pp. 352–360.
[16] W. Li, J.J. Swetits, Linear L1 estimator and Huber M-estimator, SIAM 29 (1995) 316–325.
[17] Y.C. Pati, P.S. Krishnaprasad, Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations, IEEE Trans. Neural Networks 4 (1993) 73–85.