Expert Systems with Applications 36 (2009) 8925–8931
Hybrid robust approach for TSK fuzzy modeling with outliers

Chen-Chia Chuang (a), Jin-Tsong Jeng (b,*), Chin-Wang Tao (a)

(a) Department of Electrical Engineering, National Ilan University, 1, Sec. 1, Shen-Lung Road, I-Lan 260, Taiwan
(b) Department of Computer Science and Information Engineering, National Formosa University, 64, Wen-Hua Road, Huwei Jen, Yunlin County 632, Taiwan
Keywords: TSK fuzzy model; Robust clustering algorithm; Hybrid robust approach; Robust learning algorithm; Outliers
Abstract

This study proposes a hybrid robust approach for constructing Takagi–Sugeno–Kang (TSK) fuzzy models with outliers. The approach consists of a robust fuzzy C-regression model (RFCRM) clustering algorithm in the coarse-tuning phase and an annealing robust back-propagation (ARBP) learning algorithm in the fine-tuning phase. The RFCRM clustering algorithm modifies the fuzzy C-regression models (FCRM) clustering algorithm by incorporating a robust mechanism, the input data distribution, and a robust similarity measure. Due to the robust mechanism and the consideration of the input data distribution, the fuzzy subspaces and the parameters of the functions in the consequent parts are simultaneously identified by the proposed RFCRM clustering algorithm, and the obtained model is not significantly affected by outliers. Furthermore, the robust similarity measure is used in the clustering process to reduce redundant clusters. Consequently, the RFCRM clustering algorithm can generate a better initialization for the TSK fuzzy models in the coarse-tuning phase. Then, the ARBP algorithm is employed to obtain a more precise model in the fine-tuning phase. Our simulation results clearly show that the proposed robust TSK fuzzy model approach is superior to existing approaches in learning speed and in approximation accuracy.

© 2008 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +886 5 6315573; fax: +886 5 6330456.
E-mail addresses: [email protected] (C.-C. Chuang), [email protected] (J.-T. Jeng).
doi:10.1016/j.eswa.2008.11.053

1. Introduction

In the past, fuzzy modeling techniques have been successfully employed for modeling complex systems. Generally, the TSK type of fuzzy model proposed in Sugeno and Yasukawa (1993) and Takagi and Sugeno (1985) has attracted considerable attention from the fuzzy modeling community due to its good performance in various applications. Besides, the online identification of the TSK fuzzy model can be found in Angelov and Filev (2004) and Kim, Whang, Park, Kim, and Park (2005). In the literature, various approaches (Babuska, 1998; Chang & Liu, 2008; Dickerson & Kosko, 1996; Klawonn & Kruse, 1997; Kwak & Kim, 2006) for constructing TSK fuzzy rules have been proposed. Traditionally, the input space is first divided into fuzzy subspaces through an unsupervised clustering algorithm based on only the input portion of the training data (Bezdek, 1981; Jain & Dubes, 1988). The fuzzy subspaces are then tuned through a supervised learning algorithm. In another approach (Delgado, Gómez Skarmeta, & Vila, 1996; Jain & Dubes, 1988; Simpson, 1993; van den Bergh & van den Berg, 2000), in order to account for the interaction between the input and output variables, the authors considered the product space of the input and output variables instead of only the input variable space. However, those approaches still define fuzzy subspaces in a clustering manner and do not take into account the functional properties of the TSK fuzzy models. As a result, the number of fuzzy subspaces may tend to be more than enough.

Recently, a novel approach was proposed in Euntai, Minkee, Seunghwan, and Mignon (1997) and Hathaway and Bezdek (1993), in which the fuzzy subspaces and the parameters of the functions in the consequent parts are simultaneously identified via the fuzzy C-regression model (FCRM) clustering algorithm. FCRM defines fuzzy subspaces in terms of errors to the functions defined in the TSK fuzzy models instead of errors to the cluster centers. However, users still need to assign the number of clusters in the FCRM clustering algorithm. Moreover, the FCRM clustering algorithm does not incorporate optimization operations into the modeling process, and as a result the modeling performance is inadequate. Hence, in our study, a supervised learning algorithm based on the principle of least-squares error minimization is employed to improve the modeling accuracy.

In real applications, the obtained data may be subject to outliers. Outliers occur for various reasons, such as erroneous measurements or noisy data from the heavy tails of noise distribution functions (Evans, Hastings, & Peacock, 1993; Hawkins, 1980). The effects of outliers are not considered in the above-mentioned approaches. In order to reduce the effects of outliers, the clustering and learning algorithms used must be equipped with robust mechanisms. In the past, several robust approaches for the TSK
fuzzy modeling (Chuang, Su, & Chen, 2001; Kim, Kyung, Park, Kim, & Park, 2004; Leski, 2004, 2005) have been proposed to degrade the effects of outliers. Kim et al. (2004) proposed an algorithm in which an additional term is added to the objective function of the noise clustering algorithm to obtain robustness against outliers. In Chuang et al. (2001), the robust fuzzy regression agglomeration (RFRA) clustering algorithm was proposed to define the fuzzy subspaces in a fuzzy regression manner with robust capability against outliers. Since the concept of competitive agglomeration is incorporated into the cost function of the RFRA clustering algorithm for combining clusters, this approach requires more computation time due to the complicated formulas involved. Also, since the input data distribution is not considered, properties related to the input data distribution are lost and the algorithm may sometimes fail to obtain a satisfactory initialization for the TSK fuzzy model. To enhance the modeling accuracy, a robust back-propagation (BP) learning algorithm was proposed; this robust BP learning algorithm has been successfully applied to fuzzy neural networks (Wang, Lee, Liu, & Wang, 1997). Nevertheless, in the use of robust BP learning algorithms, there remain problems regarding initialization and the selection of the cut-off value (Chuang, Su, & Hsiao, 2000). In Leski (2004, 2005), the ε-insensitive loss function of support vector regression (SVR) is incorporated into the FCRM for fuzzy modeling. This approach leads to C simultaneous quadratic programming problems with bound constraints and one linear equality constraint. However, the parameter selection problem of SVR also appears in this approach: Chuang, Su, Jeng, and Hsiao (2002) and Jeng (2006) illustrated that the performance of SVR is sensitive to its hyper-parameters. On the other hand, Gu and Wang (2007) proposed a hybrid approach for fuzzy modeling.
However, that approach did not consider outliers in fuzzy modeling. To overcome the above problems, this study proposes a hybrid robust approach for TSK fuzzy modeling with outliers. First, the RFCRM clustering algorithm is employed to obtain the fuzzy subspaces and the parameters of the functions in the consequent parts of the TSK fuzzy models. To reduce the number of clusters in the RFCRM clustering process, cluster-merging approaches must be used. Previously, several similarity measures for merging clusters have been proposed (Backer & Jain, 1981; Frigui & Krishnapuram, 1996). In this study, the robust similarity measure of Frigui and Krishnapuram (1996) is used to reduce the number of clusters. Notably, the cluster-merging approach is not incorporated into the cost function of the RFCRM clustering algorithm. Consequently, the complicated formulas of the RFRA clustering algorithm and their computation time can be largely reduced; in other words, the proposed coarse-tuning approach is computationally simpler than the RFRA clustering algorithm (Chuang et al., 2001). Additionally, the number of clusters is obtained by using the robust similarity measure. Second, to enhance the modeling accuracy and to overcome the problems of the robust BP learning algorithm, an ARBP learning algorithm is employed to fine-tune the obtained TSK fuzzy model. As a result, the proposed hybrid approach provides an efficient approach for robust TSK fuzzy modeling with outliers. The simulation results illustrate that the proposed hybrid robust approach indeed exhibits superior performance in learning speed and in approximation accuracy.

The remainder of this paper is organized as follows. Section 2 discusses the robust TSK fuzzy modeling concept. Section 3 presents the proposed RFCRM clustering algorithm. The ARBP learning algorithm is briefly discussed in Section 4.
In Section 5, various examples are considered to demonstrate the superiority of the proposed approach. Finally, Section 6 concludes this paper.
2. The concepts of robust TSK fuzzy modeling

The considered problem is to obtain a model \hat{y} from a set of observations \{(\tilde{x}(1), y_1), (\tilde{x}(2), y_2), \ldots, (\tilde{x}(N), y_N)\} with \tilde{x}(i) \in R^n and y_i \in R, where N denotes the number of training data, \tilde{x}(i) = [x_1(i), x_2(i), \ldots, x_n(i)] represents the ith input vector, and y_i is the desired output for the input \tilde{x}(i). Suppose that those observations are obtained from an unknown function y = f(x_1, x_2, \ldots, x_n). Ideally, the task is to construct a \hat{y} that can accurately characterize f in terms of input-output relationships. In this study, a TSK fuzzy model is used to represent \hat{y}.

Generally, a TSK fuzzy model consists of a set of IF-THEN rules of the form

R^i: If x_1 is A_{i1} and x_2 is A_{i2} and \ldots and x_n is A_{in}, then h^i = f_i(x_1, x_2, \ldots, x_n; \tilde{a}_i) = f_i(\tilde{x}; \tilde{a}_i) = a_{i0} + a_{i1} x_1 + \cdots + a_{in} x_n,    (1)

for i = 1, 2, \ldots, C, where C denotes the number of rules, A_{ij} is the corresponding fuzzy set, and \tilde{a}_i = (a_{i0}, \ldots, a_{in}) is the parameter set in the consequent part. The predicted output of the fuzzy model is inferred as

\hat{y} = \frac{\sum_{i=1}^{C} w^i h^i}{\sum_{i=1}^{C} w^i},    (2)

where h^i = f_i(\tilde{x}(j); \tilde{a}_i) = a_{i0} + a_{i1} x_1(j) + \cdots + a_{in} x_n(j) is the output of the rule R^i, and the weight w^i is obtained as \min_j A_{ij}(\tilde{x}(j)). Both the parameters in the premise parts and those in the consequent parts of the TSK fuzzy model need to be identified. In addition, the number of rules must be specified. As mentioned earlier, many clustering approaches, such as the FCM and the FCRM clustering algorithms, have been proposed to find the parameters in the premise and consequent parts. However, outliers may affect these clustering results. Thus, various robust clustering algorithms have been proposed to deal with this problem (Dave & Krishnapuram, 1997; Frigui & Krishnapuram, 1999). Similarly, traditional supervised learning algorithms cannot overcome the effects of outliers, and various robust learning algorithms (Chen & Jain, 1994; David & Sanchez, 1995; Wang et al., 1997) have also been proposed. Usually, the so-called M-estimators are employed in the training process of the robust approaches for both the clustering algorithms and the supervised learning algorithms. The basic idea of M-estimators is to replace the squared error term (L2 norm) in the cost function by a robust loss function (Hampel, Ronchetti, Rousseeuw, & Stahel, 1986; Rousseeuw & Leroy, 1987) so that the effects of outliers may be degraded. In this study, such a robust concept is used in the RFCRM clustering algorithm for the coarse-tuning phase and in the ARBP learning algorithm for the fine-tuning phase. The detailed algorithms are introduced and discussed in the following sections.

3. Robust fuzzy C-regression model clustering algorithm

In the RFCRM clustering algorithm, the robust loss function of regression errors and the input data distribution are considered. The cost function of the RFCRM clustering algorithm is defined as
J = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^2 \, \phi_i\!\left( \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right),    (3)

subject to

\sum_{i=1}^{C} u_{ij} = 1, \quad \text{for } 1 \le j \le N,    (4)
where u_{ij} denotes the firing strength of the ith rule for the jth training pattern, \phi_i(\cdot) represents a robust loss function, \alpha is a balance parameter, and d_{ij} is the distance between the jth input data and the center of the ith cluster (\tilde{b}_i), i.e.,

d_{ij} = \|\tilde{x}(j) - \tilde{b}_i\|,    (5)

and r_{ij} is the error between the jth desired output of the modeled system (i.e., y_j) and the output of the ith rule for the jth input data, i.e.,

r_{ij} = y_j - f_i(\tilde{x}(j); \tilde{a}_i), \quad i = 1, 2, \ldots, C \text{ and } j = 1, 2, \ldots, N.    (6)
According to the concept of the FCRM clustering algorithm, the regression errors r_{ij} are considered in the cost function of the RFCRM clustering algorithm. In order to enhance the robust capability of the RFCRM clustering algorithm, a robust mechanism (i.e., a robust loss function) is incorporated into its cost function. If the input data distribution were not considered, the properties of the input data distribution would be lost and the algorithm might sometimes fail to obtain a satisfactory initialization for the TSK fuzzy model. Therefore, the distances between the input data and the centers of the clusters (d_{ij}) are also considered in the cost function of the RFCRM clustering algorithm. Since the scales of d_{ij}^2 and r_{ij}^2 are different, a balance parameter \alpha is used in the cost function, as shown in Eq. (3). Generally, the balance parameter is selected as 0 \le \alpha \le 1 depending on the application. If \alpha is chosen as 1, Eq. (3) becomes the cost function of the fuzzy C-means clustering algorithm with a robust mechanism; if \alpha is chosen as 0, Eq. (3) becomes the cost function of the FCRM clustering algorithm with a robust mechanism. That is, both are special cases of the RFCRM clustering algorithm. In this study, the robust loss function is chosen as Tukey's biweight function (Frigui & Krishnapuram, 1999; Rousseeuw & Leroy, 1987), defined as
8 " # 2 3 > > < 1 1 1 sij ; jsij j 6 MADi MADi : /i ðs2ij Þ ¼ 3 > > :1; jsij j > MADi 3
ð7Þ
Additionally, the derivative of Tukey’s biweight function is obtained as
wi ðs
2 ij Þ
8 > <
2 sij 2 1 d/i 1 ; jsij j 6 MADi 2 MADi : ¼ 2 ¼ MADi dsij > : 0; jsij j > MADi
ð8Þ
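As an illustration, Eqs. (7) and (8) can be sketched in Python; the function and variable names are ours, not the paper's, and NumPy stands in for whatever environment the authors used:

```python
import numpy as np

def tukey_biweight(s, mad):
    """Tukey's biweight loss phi(s**2) with cut-off MAD  (Eq. (7))."""
    s = np.asarray(s, dtype=float)
    loss = np.full_like(s, mad**2 / 3.0)          # constant beyond the cut-off
    inside = np.abs(s) <= mad
    loss[inside] = (mad**2 / 3.0) * (1.0 - (1.0 - (s[inside] / mad) ** 2) ** 3)
    return loss

def tukey_weight(s, mad):
    """psi = d(phi)/d(s**2)  (Eq. (8)): residuals beyond MAD get zero weight."""
    s = np.asarray(s, dtype=float)
    w = np.zeros_like(s)
    inside = np.abs(s) <= mad
    w[inside] = (1.0 - (s[inside] / mad) ** 2) ** 2
    return w

residuals = np.array([0.1, -0.2, 0.05, 6.0])               # one gross outlier
mad = np.median(np.abs(residuals - np.median(residuals)))  # robust scale
```

Because the weight in Eq. (8) vanishes outside the MAD cut-off, a gross outlier contributes nothing to the subsequent update equations.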
Here, MADi is the median of absolute deviations of sij (Hampel et al., 1986; Rousseeuw & Leroy, 1987). The Lagrange multiplier method is applied to minimize J in Eq. (3) subject to Eq. (4). Then, the Lagrange function is defined as
L = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^2 \, \phi_i\!\left( \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right) - \sum_{j=1}^{N} \lambda_j \left( \sum_{i=1}^{C} u_{ij} - 1 \right).    (9)

This method solves \nabla L = 0, where \nabla L := \left[ \partial L/\partial \tilde{a}_i, \; \partial L/\partial u_{ij}, \; \partial L/\partial \tilde{b}_i, \; \partial L/\partial \lambda_j \right]^T. The necessary conditions for minimizing J are

\frac{\partial L}{\partial \tilde{a}_i} = \sum_{j=1}^{N} u_{ij}^2 \, \psi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] (1-\alpha) \frac{\partial (r_{ij}^2)}{\partial \tilde{a}_i} = 0,    (10)

\frac{\partial L}{\partial u_{ij}} = 2 u_{ij} \, \phi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] - \lambda_j = 0,    (11)

\frac{\partial L}{\partial \tilde{b}_i} = \sum_{j=1}^{N} u_{ij}^2 \, \psi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] 2\alpha \, d_{ij} \frac{\partial d_{ij}}{\partial \tilde{b}_i} = 0,    (12)

and

\frac{\partial L}{\partial \lambda_j} = \sum_{i=1}^{C} u_{ij} - 1 = 0.    (13)
Note that \partial r_{ij}^2 / \partial \tilde{a}_i in Eq. (10) can be obtained using Eq. (6) as

\frac{\partial r_{ij}^2}{\partial \tilde{a}_i} = 2 r_{ij} \frac{\partial r_{ij}}{\partial \tilde{a}_i} = 2 \left( y_j - f_i(\tilde{x}(j); \tilde{a}_i) \right) \frac{\partial r_{ij}}{\partial \tilde{a}_i}.    (14)

Substituting Eq. (14) into Eq. (10) yields

\sum_{j=1}^{N} u_{ij}^2 \, \psi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] (1-\alpha) \, y_j \frac{\partial r_{ij}}{\partial \tilde{a}_i} - \sum_{j=1}^{N} u_{ij}^2 \, \psi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] (1-\alpha) \, f_i(\tilde{x}(j); \tilde{a}_i) \frac{\partial r_{ij}}{\partial \tilde{a}_i} = 0, \quad i = 1, 2, \ldots, C.    (15)

Eq. (15) can then be rewritten in matrix form as

X Q_i Y - (X Q_i X^T) \, \tilde{a}_i = 0, \quad \text{for } i = 1, 2, \ldots, C.    (16)
Here, X, Y, and Q_i in Eq. (16) are defined as

X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1(1) & x_1(2) & \cdots & x_1(N) \\ \vdots & \vdots & & \vdots \\ x_n(1) & x_n(2) & \cdots & x_n(N) \end{bmatrix}_{(n+1) \times N}, \qquad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}_{N \times 1},

and

Q_i = \mathrm{diag}(q_{i1}, q_{i2}, \ldots, q_{iN})_{N \times N}, \quad \text{where } q_{ik} = u_{ik}^2 \, \psi_i\!\left( \alpha d_{ik}^2 + (1-\alpha) r_{ik}^2 \right) \text{ for } k = 1, \ldots, N.
Hence, the parameter vector \tilde{a}_i for the consequent part of the ith rule is obtained as

\tilde{a}_i = \left[ X Q_i X^T \right]^{-1} X Q_i Y, \quad i = 1, 2, \ldots, C.    (17)
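Numerically, Eq. (17) is a weighted least-squares solve with the robust weights q_{ik} on the diagonal of Q_i. A minimal sketch, with illustrative names:

```python
import numpy as np

def consequent_update(X, Y, q):
    """a_i = (X Q_i X^T)^{-1} X Q_i Y  (Eq. (17)).

    X : (n+1, N) design matrix whose first row is all ones
    Y : (N,)     desired outputs
    q : (N,)     robust weights q_ik for cluster i (diagonal of Q_i)
    """
    XQ = X * q                                  # right-multiply by diag(q)
    return np.linalg.solve(XQ @ X.T, XQ @ Y)    # avoids the explicit inverse

# toy check: with unit weights this reduces to ordinary least squares
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
X = np.vstack([np.ones_like(x), x])     # rows: [1; x]
Y = 2.0 + 3.0 * x                       # exact linear target
a = consequent_update(X, Y, np.ones_like(x))
```

Solving the normal equations directly rather than forming the inverse is the standard numerically safer choice.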
To solve Eq. (11), the solution u_{ij} can be represented as

u_{ij} = \frac{\lambda_j}{2 \phi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right]}.    (18)

Besides, \lambda_j can be solved by substituting Eq. (18) into Eq. (4), and then

\lambda_j = \frac{1}{\sum_{k=1}^{C} \dfrac{1}{2 \phi_k\!\left[ \alpha d_{kj}^2 + (1-\alpha) r_{kj}^2 \right]}}.    (19)

The parameter \lambda_j in Eq. (18) can be eliminated by using Eq. (19). Then, Eq. (18) can be rewritten as

u_{ij} = \frac{1 \big/ \left( 2 \phi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] \right)}{\sum_{k=1}^{C} 1 \big/ \left( 2 \phi_k\!\left[ \alpha d_{kj}^2 + (1-\alpha) r_{kj}^2 \right] \right)}.    (20)
To solve Eq. (12), the solution \tilde{b}_i can be represented as

\tilde{b}_i = \frac{\sum_{j=1}^{N} \alpha \, u_{ij}^2 \, \psi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right] \tilde{x}(j)}{\sum_{j=1}^{N} \alpha \, u_{ij}^2 \, \psi_i\!\left[ \alpha d_{ij}^2 + (1-\alpha) r_{ij}^2 \right]}.    (21)
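One alternating pass over Eqs. (20) and (21) can be sketched as follows. The names are illustrative; `phi` and `psi` stand for the robust loss and its derivative, and a small constant guards the division when the loss is exactly zero (a case Eq. (20) leaves implicit):

```python
import numpy as np

def update_memberships(d2, r2, phi, alpha, eps=1e-12):
    """u_ij of Eq. (20).  d2, r2: (C, N) arrays of d_ij**2 and r_ij**2."""
    g = 1.0 / (2.0 * phi(alpha * d2 + (1.0 - alpha) * r2) + eps)
    return g / g.sum(axis=0, keepdims=True)    # normalize over the C clusters

def update_centers(u, psi_vals, x, alpha):
    """b_i of Eq. (21).  u, psi_vals: (C, N); x: (N, n) input vectors."""
    w = alpha * u**2 * psi_vals                # per-pattern robust weights
    return (w @ x) / w.sum(axis=1, keepdims=True)

# toy check with phi(s) = s (no robustness): two clusters, alpha = 0
d2 = np.ones((2, 1))
r2 = np.array([[1.0], [3.0]])
u = update_memberships(d2, r2, lambda s: s, alpha=0.0)

# center of one cluster over two 1-D points at 0 and 2, equal weights
b = update_centers(np.ones((1, 2)), np.ones((1, 2)), np.array([[0.0], [2.0]]), alpha=0.5)
```

With alpha = 0 and a plain quadratic loss the memberships reduce to the familiar inverse-error weighting of the FCRM algorithm, as the special cases discussed after Eq. (6) suggest.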
In this study, a robust similarity measure that adopts the concept of cardinality is used to reduce the redundant clusters. The robust similarity measure (Frigui & Krishnapuram, 1996) is discussed here. Let G_i denote the set of all feature vectors belonging to the ith cluster, G_{ij} = G_i \cup G_j denote the set of all feature vectors belonging to either the ith cluster or the jth cluster, and |G_i| = \sum_{\tilde{x}(k) \in G_i} u_{ik} be the cardinality of cluster i. The similarity measure between cluster i and cluster j can be defined as

S_{ij} = 1 - \frac{\sum_{\tilde{x}(k) \in G_{ij}} |u_{ik} - u_{jk}|}{|G_i| + |G_j|}.    (22)

It is easy to show that 0 \le S_{ij} \le 1. When clusters i and j are identical, the similarity measure S_{ij} = 1, and the two clusters are merged into a single cluster. Meanwhile, when clusters i and j are disjoint (i.e., G_i \cap G_j = \emptyset), the similarity measure S_{ij} = 0, and the two clusters are not merged. In our clustering process, clusters i and j are merged when S_{ij} \ge (1 - \epsilon); (1 - \epsilon) is regarded as the merging threshold.

Assume that Gaussian membership functions are used in the premise parts of the TSK fuzzy models, i.e.,

A_{ij}(\theta^i_{j1}, \theta^i_{j2}) = \exp\!\left( -\frac{(x_j - \theta^i_{j1})^2}{2 (\theta^i_{j2})^2} \right),

where \theta^i_{j1} and \theta^i_{j2} are two adjustable parameters of the jth membership function of the ith fuzzy rule. The parameters in the premise parts of the TSK fuzzy models can then easily be obtained from u_{ij} (Euntai, Minkee, Seunghwan, & Mignon, 1997):

\theta^i_{j1} = \frac{\sum_{k=1}^{N} (u_{ik})^2 \, x_j(k)}{\sum_{k=1}^{N} (u_{ik})^2}    (23)

and

\theta^i_{j2} = \sqrt{ \frac{\sum_{k=1}^{N} (u_{ik})^2 \, (x_j(k) - \theta^i_{j1})^2}{\sum_{k=1}^{N} (u_{ik})^2} }.    (24)

First, the RFCRM clustering algorithm is used. In the algorithm, C(t) is the number of clusters at the tth iteration. The initial value of C is set to a large value (say C_max). The procedure of the RFCRM clustering algorithm in the coarse-tuning phase is stated as follows:

Step 1: Set the parameters t = 0, C(0) = C_max, the initialization of \tilde{b}_i for i = 1, \ldots, C, and the stop criterion. In our simulation, the parameter \epsilon and the balance parameter \alpha are experimentally chosen as 0.3 and 0.2, respectively.
Step 2: Compute the consequent parameter sets \tilde{a}_i by Eq. (17), d_{ij} by Eq. (5), and r_{ij} by Eq. (6) for 1 \le i \le C(t) and 1 \le j \le N.
Step 3: Update the weights u_{ij} by Eq. (20) and \tilde{b}_i by Eq. (21).
Step 4: Compute S_{ij} by Eq. (22). If S_{ij} \ge (1 - \epsilon), clusters i and j are merged and the number of clusters is updated as C(t + 1).
Step 5: If the stop criterion is not satisfied, go to Step 2; otherwise, go to Step 6.
Step 6: Compute the parameters in the premise parts by Eqs. (23) and (24).

4. Annealing robust back-propagation learning algorithm

After the TSK fuzzy rules are obtained, the adjustable parameters of the TSK fuzzy models can be adjusted by a supervised learning algorithm with a robust mechanism to improve the modeling accuracy. The proposed hybrid robust approach employs an ARBP algorithm to adjust the parameters of the TSK fuzzy rules in the fine-tuning phase. An important feature of a robust learning algorithm is the use of a loss function in place of the quadratic form of the cost function in a BP algorithm. Thus, the cost function of the ARBP learning algorithm is defined as

E_{ARBP}(t) = \frac{1}{N} \sum_{i=1}^{N} \rho[e_i(t); \xi(t)],    (25)

where t denotes the epoch number, e_i(t) represents the error between the ith desired output and the ith output of the TSK fuzzy model at epoch t, \xi(t) is a time-varying parameter that acts like the cut-off point, and \rho(\cdot) denotes the logistic loss function, defined as

\rho[e_i; \xi] = \frac{\xi^2}{2} \ln\!\left[ 1 + \left( \frac{e_i}{\xi} \right)^2 \right].    (26)

In the ARBP learning algorithm, the properties of the time-varying parameter \xi(t) are set as follows: (A) \xi(t = 1) = \xi_{initial}, where \xi_{initial} is determined by the maximum of the absolute errors for the initialization of the TSK fuzzy models; (B) \xi(t) \to 0 as t \to \infty; and (C) \xi(t) = k/t for any epoch t, where the constant k is \xi_{initial}.

Ideally, the time-varying parameter \xi(t) uses a large cut-off point in the early training stage and a small cut-off point in the later training stage. Since the function to be modeled is completely unknown, the error measure used to discriminate against outliers may be incorrect at the beginning of the process. Consequently, it is better to use a large cut-off point that includes all possible points, or even not to use the loss function at all. A concept similar to property (B) has also been used in traditional BP learning algorithms: the so-called early stopping, which halts the training process under certain conditions and is often employed to overcome overfitting. Property (C) defines the decay of the time-varying parameter \xi(t). When the decay is too quick, the approximation of the majority may not have enough time to converge, and the training data may mostly be degraded. If the decay is too slow, the robust learning algorithm may not discriminate against the outliers before overfitting occurs. A suitable time-varying parameter \xi(t) = k/t has been proposed in Chuang et al. (2000).

Since the TSK fuzzy models are described by Eq. (1) and their output is obtained by Eq. (2), the parameters in the premise parts are updated as (Euntai et al., 1997)

\Delta \theta^i_{jk} = \eta \, \rho'(y - \hat{y}) \, (y^i - \hat{y}) \, \frac{1}{\sum_{i=1}^{C} w^i} \, \frac{\partial w^i}{\partial \theta^i_{jk}},    (27)

where \rho'(\cdot) represents the derivative of the logistic loss function \rho(\cdot), \eta denotes the learning constant, y represents the desired output, \hat{y} is the output of the TSK fuzzy model, y^i denotes the ith rule output of the TSK fuzzy model, and \partial w^i / \partial \theta^i_{jk} is defined as follows. Let j^* be the index j at which the minimization in w^i occurs, i.e., j^* = \arg\min_j A_{ij}(\tilde{x}(j)). Then, when j is equal to j^*,

\frac{\partial w^i}{\partial \theta^i_{j1}} = \frac{x(j) - \theta^i_{j1}}{(\theta^i_{j2})^2} \exp\!\left( -\frac{(x(j) - \theta^i_{j1})^2}{2 (\theta^i_{j2})^2} \right) \quad \text{and} \quad \frac{\partial w^i}{\partial \theta^i_{j2}} = \frac{x(j) - \theta^i_{j1}}{\theta^i_{j2}} \, \frac{\partial w^i}{\partial \theta^i_{j1}}.    (28)

When j is not equal to j^*, \partial w^i / \partial \theta^i_{j1} = \partial w^i / \partial \theta^i_{j2} = 0.
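The logistic loss of Eq. (26), its derivative, and the annealing schedule of properties (A)-(C) can be sketched as follows (illustrative names):

```python
import numpy as np

def logistic_loss(e, xi):
    """rho[e; xi] = (xi**2 / 2) * ln(1 + (e / xi)**2)   (Eq. (26))."""
    return 0.5 * xi**2 * np.log1p((e / xi) ** 2)

def logistic_loss_grad(e, xi):
    """d(rho)/d(e) = e / (1 + (e / xi)**2): bounded, so large errors
    (potential outliers) contribute little to the parameter update."""
    return e / (1.0 + (e / xi) ** 2)

def annealing_schedule(t, xi_initial):
    """xi(t) = k / t with k = xi_initial  (properties (A)-(C))."""
    return xi_initial / t
```

For a fixed xi the influence function peaks at e = xi with value xi/2; annealing xi(t) therefore tolerates every residual early on and progressively discounts large residuals as training proceeds.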
The parameters of the consequent parts are updated as

\Delta a_{ij} = \zeta \, \rho'(y - \hat{y}) \, \frac{w^i}{\sum_{i=1}^{C} w^i} \, x(j),    (29)

where \zeta is a learning constant. The procedure of the ARBP learning algorithm in the fine-tuning phase is stated as follows:

Step 1: Set the initial constant k = \xi_{initial} in the time-varying parameter \xi(t) and the stop criterion.
Step 2: For the training pair (\tilde{x}(k), y_k), compute the estimated result \hat{y}_k by Eqs. (1) and (2) and its error (y_k - \hat{y}_k).
Step 3: Update the premise and the consequent parameters using Eqs. (27)-(29).
Step 4: Compute \xi(t) = \xi_{initial}/t (acting like the cut-off point).
Step 5: Compute the robust cost function E_{ARBP} defined by Eq. (25).
Step 6: If the stop criterion is not satisfied, go to Step 2; otherwise, terminate the fine-tuning process.
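As a schematic of Steps 1-6 above: the sketch below runs the annealed robust update on a plain linear model rather than the full TSK rule base of Eqs. (27)-(29), so it illustrates only the annealing mechanism; all names and constants are ours:

```python
import numpy as np

def arbp_fit(x, y, epochs=200, eta=0.3):
    """Schematic ARBP loop (Steps 1-6) for a linear stand-in model a0 + a1*x."""
    a = np.zeros(2)
    xi_initial = np.max(np.abs(y))                 # Step 1: k = xi_initial
    for t in range(1, epochs + 1):
        xi = xi_initial / t                        # Step 4: annealed cut-off
        e = y - (a[0] + a[1] * x)                  # Step 2: errors
        g = e / (1.0 + (e / xi) ** 2)              # rho'(e): bounded influence
        a += eta * np.array([g.mean(), (g * x).mean()])  # Step 3: update
    return a

# linear data with a few gross outliers
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100)
y = 1.0 + 2.0 * x
y[:5] += 8.0                                       # inject outliers
a = arbp_fit(x, y)
```

Early epochs, where xi is large, behave like ordinary least squares; as xi shrinks, the five contaminated points are progressively ignored, and the fit settles near the uncontaminated line.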
5. Simulation results

In this study, three examples are used to verify the validity of the proposed hybrid robust approach. For Example 1, a simple function with a noise distribution is considered:

Y = g(x) + F, \quad -2 \le x \le 2,    (30)

where g(x) = x^2/3 is the original function and F is a gross error model. The gross error model is defined as

F = (1 - \delta) G + \delta H,    (31)
where G and H are probability distributions that occur with probabilities 1 - \delta and \delta, respectively. In this example, the values used in the gross error model are \delta = 0.05, G \sim N(0, 0.05), and H \sim N(0, 1). 201 data pairs are generated as the training data. After training, another 401 data pairs are used to evaluate the performance of the learned model. The index used for evaluating the performance is the root mean square error (RMSE), defined as
\mathrm{RMSE} = \sqrt{ \frac{\sum_{j=1}^{N} (y_j - \hat{y}_j)^2}{N} },    (32)
where N denotes the number of test data, y_j represents the actual output, and \hat{y}_j is the predicted output of the learned TSK fuzzy model for the jth pattern.

In this study, the RFCRM clustering algorithm is first applied to obtain a rough approximation of the TSK fuzzy model. In the coarse-tuning phase, the initial number of clusters is chosen as 40. Furthermore, users can adjust the balance parameter \alpha in the RFCRM clustering algorithm according to their preference. Table 1 lists the testing RMSE for the initialization of the TSK fuzzy model by the RFCRM clustering algorithm with different \alpha and for the final TSK fuzzy model adjusted by the ARBP learning algorithm. Additionally, the cross-validation method is employed to determine the \alpha value. The testing RMSE with different \alpha values using fivefold cross-validation in the coarse-tuning phase is summarized in Table 2 for Example 1. The TSK fuzzy model obtained using the RFCRM clustering algorithm with the balance parameter \alpha = 0.2 is shown in Fig. 1a. In this example, the number of clusters (i.e., rules) obtained in the coarse-tuning phase is 3. The testing RMSE for the final TSK fuzzy model in the fine-tuning phase is 0.0574. Furthermore, the final results of the TSK fuzzy models using the proposed hybrid robust approach are shown in Fig. 1b. For comparison, the RFRA clustering algorithm with the robust BP learning algorithm (Chuang et al., 2001) is also considered in this study. The number of clusters and the testing RMSE for the rough approximation of the TSK fuzzy model by the RFRA clustering algorithm are 5 and 0.1639, respectively. Additionally, the testing RMSE of the final TSK fuzzy model after the robust BP learning algorithm is 0.0622.

The second example is a simple nonlinear autoregressive (NAR) time series with a gross error model. This example has been used in Chuang et al. (2001) and is defined as
y_k = x(k) + v_k,    (33)

where

x(k) = 1.5 \, x(k-1) \, \exp\!\left( -\frac{x^2(k-1)}{4} \right) + e_k,    (34)
Table 2
The testing RMSE with different \alpha values using fivefold cross-validation in the coarse-tuning phase for Example 1.

Balance parameter \alpha | Testing RMSE for the initialization of TSK fuzzy models (mean/std)
0   | 0.0784/0.0816
0.2 | 0.0677/0.0116
0.4 | 0.0810/0.0222
0.6 | 0.0863/0.0213
0.8 | 0.1202/0.0423
1.0 | 0.1500/0.0284

Note: The mean and standard deviation of the testing RMSE using fivefold cross-validation in the coarse-tuning phase are denoted 'mean' and 'std', respectively.
Table 1
The testing RMSE of the TSK fuzzy models obtained by the proposed hybrid approach with different balance parameters for Example 1.

Balance parameter \alpha | Testing RMSE for the initialization of TSK fuzzy models | Testing RMSE of final TSK fuzzy models
0   | 0.0730 | 0.0721
0.2 | 0.0648 | 0.0574
0.4 | 0.0858 | 0.0638
0.6 | 0.0870 | 0.0687
0.8 | 0.1627 | 0.0799
1.0 | 0.2288 | 0.0866

Note that the testing RMSE for the initialization and the final result of the TSK fuzzy models using the RFRA clustering algorithm and the robust BP learning algorithm are obtained as 0.1639 and 0.0662, respectively.
Fig. 1a. The training data pairs generated by Eq. (30) with the gross error model (represented as 'o'), the true function, and the initialization of the TSK fuzzy models using the RFCRM clustering algorithm with \alpha = 0.2.
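For concreteness, the synthetic data of Examples 1 and 2 can be generated as follows. This is a sketch: we treat the second argument of N(·, ·) as a standard deviation, which the paper does not state, and all names are ours:

```python
import numpy as np

def gross_error_noise(n, delta, sigma_g, sigma_h, rng):
    """F = (1 - delta)*G + delta*H  (Eq. (31)): with probability delta a
    sample comes from the heavy outlier distribution H instead of G."""
    from_h = rng.random(n) < delta                 # outlier indicator
    noise = rng.normal(0.0, sigma_g, n)
    noise[from_h] = rng.normal(0.0, sigma_h, from_h.sum())
    return noise

rng = np.random.default_rng(0)

# Example 1: Y = x**2/3 + F on 201 equally spaced points in [-2, 2]  (Eq. (30))
x1 = np.linspace(-2.0, 2.0, 201)
y1 = x1**2 / 3.0 + gross_error_noise(x1.size, 0.05, 0.05, 1.0, rng)

# Example 2: NAR series x(k) = 1.5*x(k-1)*exp(-x(k-1)**2/4) + e_k  (Eq. (34))
x2 = np.zeros(200)
for k in range(1, 200):
    x2[k] = 1.5 * x2[k - 1] * np.exp(-x2[k - 1] ** 2 / 4.0) + rng.normal()
y2 = x2 + gross_error_noise(x2.size, 0.05, 0.05, 3.0, rng)   # Eq. (33)
```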
ven. Hence, in this study we only present the comparison between the proposed hybrid robust approach and the previous approach (the RFRA clustering algorithm with a robust learning algorithm). The compared results of the proposed hybrid approach and the RFRA clustering algorithm with a robust learning algorithm for Examples 2 and 3 are shown in Figs. 2, 3a, 3b, respectively. Since the true function does not exist for Example 3, the cross-validation method is applied to evaluate the performance of the final TSK fuzzy models. Fivefold cross-validation is employed to evaluate the performance. According to the cross-validation method, the performance of the proposed robust approach and the RFRA clustering algorithm with the robust BP learning algorithm are obtained as 30.4591 and 37.4771, respectively. Based on these results, it is obvious that the final result of the proposed hybrid robust approach is superior to that of previous algorithms. Finally, the simulation results for Example 2 are listed in Table 3. From Table 3, the simulation results show that the testing RMSE of the proposed RFCRM clustering algorithm is lower than that of the RFRA clusterFig. 1b. The final results of the TSK fuzzy models using the proposed hybrid robust approach are shown.
k denotes the time index, ek is generated by N(0, 1) and vk is generated by the gross error model with e = 0.05, G N(0, 0.05) and H N(0, 3). Fig. 2a shows a time series of length N = 200 generated from (33) and (34). There are 200 training data pairs are used. The used test data have 401 points in this example. In the third example, a real data set, motor vehicles engines and parts/CPI Canada, 1976.5–1991.12. IV = 164, is considered. The original data set can be downloaded form http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/. This data set, as presented in Fig. 3a and represented as ‘o’, consists of 188 CPI values of motor vehicles engines. The data input is month (i.e., 1976.5–1991.12) and has been normalized into the interval [0,18.7]. Similarly, the data output is the CPI value of engines and parts of motor vehicles in Canada and has been divided by 10 for normalization. The initial number of clusters is chosen as 40 for Examples 2 and 3. The numbers of cluster (i.e., rules) are obtained as 6 and 13 for Examples 2 and 3, respectively, in the coarse-tuning phase. For the fine-tuning phase, 3000 training epochs are performed for those examples. In our previous work (Chuang et al., 2001), the comparisons between that approach and other approaches are gi-
Fig. 3a. The data set of Motor vehicles engines and parts/CPI Canada, 1976.5– 1991.12. IV = 164 (‘o’), the initialization of the robust TSK fuzzy modeling using the RFCRM clustering algorithm (‘.’) and the RFRA clustering algorithm (‘.’) are shown.
8 6 4
y(t)
2
RFRA
RFCRM 0
true model -2 -4 -6 -6
-4
-2
0
2
4
6
8
y(t-1) Fig. 2. The true model (‘’)and the final results of Example 2 using the proposed hybrid approach for the robust TSK fuzzy modeling (‘.’), the RFRA clustering algorithm with robust learning algorithm (‘.’) and the scatter plot of yt versus yt1 (‘o’) are shown.
Fig. 3b. The final results of the robust TSK fuzzy modeling using the proposed hybrid robust approach (‘.’) and the RFRA clustering algorithm with robust learning algorithm (‘.’) are shown.
Table 3
The testing RMSE of the TSK fuzzy models obtained by different robust approaches for Example 2.

Algorithm                                                                                  RMSE    Rules
Initialization of the TSK fuzzy model using the RFRA clustering algorithm                  0.5317  5
Final TSK fuzzy model using the RFRA clustering algorithm with the robust learning algorithm   0.4200  5
Initialization of the TSK fuzzy model using the RFCRM clustering algorithm                 0.4619  6
Final TSK fuzzy model using the RFCRM clustering algorithm with the robust learning algorithm  0.4046  6
Initialization of the TSK fuzzy model using the FCRM clustering algorithm                  0.7070  5
Final TSK fuzzy model using the FCRM clustering algorithm with the BP learning algorithm   0.7897  5

Note that the number of rules is equal to the number of clusters in the coarse-tuning phase.
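The testing RMSE figures reported in Table 3 are the usual root-mean-square error between the model output and the test targets. For reference, a short sketch (the function name is ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between targets and model outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

A lower testing RMSE at equal rule count, as for the RFCRM-initialized models in Table 3, indicates better generalization of the fitted TSK model.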
ing algorithm in the fine-tuning phase. Additionally, the testing RMSE of the final TSK fuzzy models using the proposed hybrid robust approach is also superior to that of the RFRA clustering algorithm with the robust BP learning algorithm.

6. Conclusion

This study developed a hybrid robust approach for constructing TSK fuzzy models with outliers. In the coarse-tuning phase, the RFCRM clustering algorithm considers both the regression error and the input data distribution; in other words, the RFCRM algorithm simultaneously defines the fuzzy subspaces and identifies the parameters in the consequent parts of the TSK rules. Moreover, the robust similarity measure is used to reduce the number of redundant clusters in the clustering process. Consequently, the proposed approach in the coarse-tuning phase simplifies the computational formula of the RFRA clustering algorithm. Additionally, an ARBP learning algorithm is employed to obtain a more precise model in the fine-tuning phase. Thus, when an initial structure of the TSK fuzzy model is obtained via the RFCRM clustering algorithm with the robust similarity measure, the resulting TSK fuzzy model trained with the ARBP learning algorithm is robust against outliers and outperforms the one obtained using the RFRA clustering algorithm.

References

Angelov, P. P., & Filev, D. P. (2004). An approach to on-line identification of TS fuzzy models. IEEE Transactions on Systems, Man and Cybernetics: Part B, 34, 484–498.
Babuska, R. (1998). Fuzzy modeling for control. Boston: Kluwer Academic Publishers.
Backer, E., & Jain, A. K. (1981). A clustering performance measure based on the fuzzy set decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 66–74.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.
Chang, P. C., & Liu, C. H. (2008). A TSK type fuzzy rule based system for stock price prediction. Expert Systems with Applications, 34, 135–144.
Chen, D. S., & Jain, R. C. (1994). A robust back-propagation learning algorithm for function approximation. IEEE Transactions on Neural Networks, 5, 467–479.
Chuang, C. C., Su, S. F., & Chen, S. S. (2001). Robust TSK fuzzy modeling for function approximation with outliers. IEEE Transactions on Fuzzy Systems, 9, 810–821.
Chuang, C. C., Su, S. F., & Hsiao, C. C. (2000). The annealing robust backpropagation (ARBP) learning algorithm. IEEE Transactions on Neural Networks, 11, 1067–1078.
Chuang, C. C., Su, S. F., Jeng, J. T., & Hsiao, C. C. (2002). Robust support vector regression networks for function approximation with outliers. IEEE Transactions on Neural Networks, 13, 1322–1330.
Dave, R. N., & Krishnapuram, R. (1997). Robust clustering methods: A unified view. IEEE Transactions on Fuzzy Systems, 5, 270–293.
David, V., & Sanchez, A. (1995). Robustization of learning method for RBF networks. Neurocomputing, 9, 85–94.
Delgado, M., Gómez Skarmeta, A. F., & Vila, A. (1996). On the use of hierarchical clustering in fuzzy modeling. International Journal of Approximate Reasoning, 14, 237–259.
Dickerson, J. A., & Kosko, B. (1996). Fuzzy function approximation with ellipsoidal rules. IEEE Transactions on Systems, Man and Cybernetics, 26, 542–560.
Euntai, K., Minkee, P., Seunghwan, J., & Mignon, P. (1997). A new approach to fuzzy modeling. IEEE Transactions on Fuzzy Systems, 5, 328–337.
Evans, M., Hastings, N., & Peacock, B. (1993). Statistical distributions. Wiley.
Frigui, H., & Krishnapuram, R. (1996). A robust algorithm for automatic extraction of an unknown number of clusters from noisy data. Pattern Recognition Letters, 17, 1223–1232.
Frigui, H., & Krishnapuram, R. (1999). A robust competitive clustering algorithm with applications in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 450–465.
Gu, H., & Wang, H. (2007). Fuzzy prediction of chaotic time series based on singular value decomposition. Applied Mathematics and Computation, 185, 1171–1185.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions. Wiley.
Hathaway, R. J., & Bezdek, J. C. (1993). Switching regression models and fuzzy clustering. IEEE Transactions on Fuzzy Systems, 1, 195–204.
Hawkins, D. M. (1980). Identification of outliers. Chapman and Hall.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Englewood Cliffs, NJ: Prentice Hall.
Jeng, J. T. (2006). Hybrid approach of selecting hyper-parameters of support vector machine for regression. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36, 699–709.
Kim, K., Kyung, K. M., Park, C. W., Kim, E., & Park, M. (2004). Robust TSK fuzzy modeling approach using noise clustering concept for function approximation. Lecture Notes in Computer Science (Computational & Information Science), 3314, 538–543.
Kim, K., Whang, E. J., Park, C. W., Kim, E., & Park, M. (2005). A TSK fuzzy inference algorithm for on line identification. Lecture Notes in Computer Science (Computational & Information Science), 3613, 179–188.
Klawonn, F., & Kruse, R. (1997). Constructing a fuzzy controller from data. Fuzzy Sets and Systems, 85, 177–193.
Kwak, K. C., & Kim, D. K. (2006). TSK-based linguistic fuzzy model with uncertain model output. IEICE Transactions on Information and Systems, E89, 2919–2923.
Leski, J. M. (2004). Epsilon-insensitive fuzzy C-regression models: Introduction to epsilon-insensitive fuzzy modeling. IEEE Transactions on Systems, Man and Cybernetics, Part B, 34, 4–15.
Leski, J. M. (2005). TSK fuzzy modeling based on e-insensitive learning. IEEE Transactions on Fuzzy Systems, 13, 181–193.
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: Wiley.
Simpson, P. K. (1993). Fuzzy min–max neural networks – Part 2: Clustering. IEEE Transactions on Fuzzy Systems, 1, 32–45.
Sugeno, M., & Yasukawa, T. (1993). A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1, 7–31.
Takagi, T., & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 15, 223–231.
van den Bergh, W. M., & van den Berg, J. (2000). Competitive exception learning using fuzzy frequency distributions. ERIM Report Series Research in Management.
Wang, W. Y., Lee, T. T., Liu, C. L., & Wang, C. H. (1997). Function approximation using fuzzy neural networks with robust learning algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part B, 27, 740–747.