Expert Systems with Applications 39 (2012) 8909–8928
Fuzzy linear regression based on Polynomial Neural Networks

Seok-Beom Roh a, Tae-Chon Ahn a,*, Witold Pedrycz b,c

a School of Electronic and Control Engineering, Wonkwang University, Iksan, Chon-Buk 570-749, Republic of Korea
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2G7
c Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Keywords: Fuzzy linear regression; Polynomial Neural Networks; Particle Swarm Optimization; Fuzzy Least Square Estimation (LSE)
Abstract: In this study, we introduce an estimation approach to determine the parameters of the fuzzy linear regression model. The analytical solution for estimating the values of the parameters has been studied. The issue of negative spreads in fuzzy linear regression makes the problem NP-complete. To deal with this problem, an iterative refinement of the model parameters based on gradient-descent optimization has been introduced. In the proposed approach, we use a hierarchical structure composed of dynamically accumulated simple nodes based on Polynomial Neural Networks, whose structure is very flexible. We propose a new methodology of fuzzy linear regression based on the design method of Polynomial Neural Networks, which divides the complicated analytical approach to estimating the parameters of fuzzy linear regression into several simple analytic approaches. The fuzzy linear regression is implemented by Polynomial Neural Networks with fuzzy numbers, which are formed by exploiting clustering and Particle Swarm Optimization. It is shown that the design strategy produces a model exhibiting sound performance. © 2012 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +82 63 850 6344; fax: +82 63 853 2196. E-mail addresses: [email protected] (S.-B. Roh), [email protected] (T.-C. Ahn), [email protected] (W. Pedrycz).
doi:10.1016/j.eswa.2012.02.016

1. Introduction

In recent years, the problem of modeling and prediction from observed data has been one of the most commonly encountered research topics in machine learning and data analysis (Guvenir & Uysal, 2000). A simple way to describe a system is regression analysis (Yu & Lee, 2010). In classical regression, both independent and dependent variables are treated as real numbers. However, in many real-world situations, where the complexity of the physical system calls for a more general viewpoint, regression variables are specified in the form of non-numeric (granular) entities such as linguistic variables (Cheng & Lee, 2001). The well-known and commonly encountered classical regression cannot handle such situations (Bardossy, 1990; Bardossy, Bogardi, & Duckstein, 1990). Fuzzy regression, which can deal with non-numerical entities, especially linguistic variables, was proposed by Imoto, Yabuuchi, and Watada (2008), Tanaka, Uejima, and Asai (1982), Toyoura, Watada, Khalid, and Yusof (2004), and Watada (2001).

The fuzzy linear regression proposed by Tanaka is composed of numeric input variables and linguistic (granular) coefficients, which are treated as fuzzy numbers (in particular, ones described by triangular membership functions). The linguistic coefficients of the regression lead to a linguistic output of the regression model; in other words, the output of a fuzzy linear regression model is also a triangular fuzzy number. In essence, the fuzziness of the output of the regression model emerges because of the lack of a perfect fit of numeric data to the assumed linear form of the relationship under consideration. Through the introduction of triangular numbers (the parameters of the model), fuzzy regression reflects the deviations between the data and the linear model. Computationally, the estimation of the fuzzy parameters of the regression is carried out by linear programming (Bargiela, Pedrycz, & Nakashima, 2007). Diamond developed a simple regression model for triangular fuzzy numbers under the conceptual framework
F(R^m) → F(R)   (1)

where F(R) denotes a family of fuzzy numbers (in our case triangular ones) defined in the space of real numbers R. For the conceptual framework formed by (1), the various analytical formulae quantifying the values of the parameters of the regression model had to address the issue of negative spreads (Diamond & Koerner, 1997), which complicates the algorithms significantly and makes them difficult to apply to high-dimensional data.
Fig. 1. An overall structure of the Polynomial Neural Network with fuzzy data.
Fig. 2. An overall structure of the PNN.
Considering the optimization standpoint, Bargiela et al. (2007) revised the mapping between the independent variables and the dependent variable to be expressed as follows:

F(R) × F(R) × ··· × F(R) → F(R)   (2)

In addition, to deal with the issue of negative spreads, Bargiela proposed a re-formulation of the regression problem as a gradient-descent optimization task, which enables a generalization of the simple regression model to multiple regression models in a computationally feasible way (Toyoura et al., 2004). The iterative refinement based on the gradient-descent approach to estimating the parameters of fuzzy linear regression is a modification of the conventional gradient-descent optimization. The drawback of gradient-descent optimization is well known: its optimization performance
Main procedure: Polynomial_Neural_Network
  Decide on the design parameters:
    (a) the number K of input variables coming to each node
    (b) the total number M of candidates (which can be chosen as the input variables of the next layer)
    (c) the number L of layers of the Polynomial Neural Network
  Repeat
    a. Call procedure Polynomial_Neuron
    b. Sort the modeling performance indices obtained from Polynomial_Neuron in descending order
    c. Choose M candidates and set the selected M nodes as new input variables for the next layer
  Until the predefined maximum number of layers L is reached
End procedure Polynomial_Neural_Network

Procedure: Polynomial_Neuron
  Repeat
    a. Determine the structure of a polynomial (the selected input variables and the order of the polynomial)
    b. Estimate the coefficients of the polynomial by the weighted LSE
    c. Calculate the modeling performance index
  Until S = mCK · T = (m!/((m − K)!·K!)) · T candidates have been evaluated (T = 4; m is the total number of input variables)
  Return the sorted modeling performance indices to Polynomial_Neural_Network
End procedure Polynomial_Neuron

Fig. 3. Pseudo code used in the design of Polynomial Neural Networks.
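The stopping count S in the pseudo code above follows directly from the combination formula. A minimal sketch of this bookkeeping (the function name is illustrative, not from the paper; `math.comb` requires Python 3.8+):

```python
from math import comb

def num_candidate_neurons(m, K=2, T=4):
    """Number of polynomial neurons evaluated per layer: every K-subset of
    the m current inputs is tried with each of the T polynomial types."""
    return comb(m, K) * T
```

For example, with m = 4 inputs, K = 2, and T = 4 polynomial types, 24 candidate neurons are fitted and ranked per layer.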
Fig. 4. Structure of particle.
Fig. 5. Construction of triangular fuzzy sets around the projections of the prototypes defined by particle.
Fig. 6. Example of a particle.
mainly depends on the shape of the error surface, the starting point of the candidate solution, and the learning coefficient (Seiffert & Michaelis, 2002). In this paper, to overcome the drawbacks of the analytical estimation approach and of the iterative refinement approach to estimating the parameters of the model, we introduce a new estimation technique based on the concept of Polynomial Neural Networks (PNNs). When dealing with high-order nonlinear and multivariable equations of the model, we require a vast amount of data for estimating all the parameters (Cherkassky, Gehring, & Mulier, 1996;
Table 1. Selected numeric values of the parameters of the proposed model.

Polynomial Neural Networks:
  No. of rules (k): 5, 10, 15, 20, 25, 30
  No. of layers (L): 2, 3, 4, 5
  No. of input variables (S): 2
  No. of candidates (M): 10

Particle Swarm Optimization:
  No. of particles (n): 50
  No. of generations (G): 100
[Fig. 7: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 7. The performance of the proposed model in terms of environmental parameters such as the number of rules and the number of layers for training data.
[Fig. 8: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 8. The performance of the proposed model in terms of environmental parameters such as the number of rules and the number of layers for testing data.
Z11 = 52.688 − 4.575·X1 − 0.0156·X2
Z12 = 56.2056 − 0.2675·X1 − 0.0099·X4
Z13 = 47.859 − 0.0080·X2 − 0.1860·X3
Z14 = 56.0016 − 0.0114·X3 − 0.0098·X4
Z21 = −0.1178 + 0.0497·Z11 + 0.9548·Z12
Z22 = −0.0093 + 0.0577·Z13 + 0.9426·Z14
Ŷ = 0.9812 + 0.3929·Z21 + 0.5691·Z22

Fig. 9. The topology of the Polynomial Neural Networks for fuzzy linear regression when L = 3 and R = 3.
Dickerson & Kosko, 1996). To help alleviate these problems, one of the first approaches along the line of a systematic design of nonlinear relationships between a system's inputs and outputs comes under the name of the Group Method of Data Handling (GMDH). GMDH was developed in the late 1960s by Ivakhnenko (Ivakhnenko, 1971; Ivakhnenko & Madala, 1994; Ivakhnenko & Ivakhnenko, 1995; Ivakhnenko, Ivakhnenko, & Muller, 1994) as a vehicle for identifying nonlinear relations between input and output variables. GMDH-type algorithms have been used extensively since the mid-1970s for predicting and modeling complex nonlinear processes. The GMDH algorithm generates an optimal structure of the model through successive generations of Partial Descriptions (PDs) of data, regarded as quadratic regression polynomials of two input variables. While providing a systematic design
[Fig. 10: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 10. The performance of the proposed model optimized by PSO in terms of environmental parameters such as the number of rules and the number of layers for training data.
procedure, GMDH comes with some drawbacks. First, it tends to generate quite complex polynomials even for relatively simple systems (experimental data). Second, owing to its limited generic structure (quadratic two-variable polynomials), GMDH also tends to produce an overly complex network (model) for highly nonlinear systems. Third, if there are fewer than three input variables, the GMDH algorithm does not generate a versatile structure. To alleviate the problems associated with GMDH, PNNs were introduced by Oh and Pedrycz (2002) and Oh, Pedrycz, and Park (2003) as a new category of neural networks. In a nutshell, these networks come with a high level of flexibility: each node (a processing element forming a PD, or PN) can have a different number of input variables as well as exploit a different order of the polynomial (say, linear, quadratic, cubic, etc.). In comparison with well-known neural networks, whose topologies are commonly selected and kept fixed prior to all detailed (parametric) learning, the PNN architecture is not fixed in advance but becomes fully optimized, both structurally and parametrically. As a consequence, PNNs show superb performance in comparison with previously presented intelligent models. Although the PNN has a flexible architecture whose potential can be fully utilized through a systematic design, it is difficult to obtain a structurally and parametrically optimized network because of the limited design of the polynomial neurons (PNs) located in each layer of the PNN. In other words, when we construct the PNs of each layer in the conventional PNN, such parameters as the number of input variables (nodes), the order of the polynomial, and the input variables available within a PN are fixed (selected) in advance by the designer. Accordingly, the PNN algorithm exhibits a tendency to produce overly complex networks, as well as a repetitive computational load caused by trial and error and/or repetitive parameter adjustment by the designer, as in the original GMDH algorithm. In order to generate a structurally and parametrically optimized network, such parameters need to be optimal.

We augment the conventional PNNs (which focus on numeric data) to process fuzzy variables. The fuzzy variables are formed on the basis of available numeric data by using clustering and Particle Swarm Optimization (PSO) (Juang & Wang, 2009; Kaveh & Laknejadi, 2011). Through clustering we capture the distribution of the data; PSO finds the optimal fuzzy variables, which afterwards represent the relationships between the input and output variables.

The paper is organized in the following manner. First, in Section 2, we introduce the concept of fuzzy linear regression. The
[Fig. 11: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 11. The performance of the proposed model optimized by PSO in terms of environmental parameters such as the number of rules and the number of layers for testing data.
architecture and development of the PNNs for fuzzy linear regression are studied in Section 3. In Section 4, PSO is described. In Section 5, we report on a comprehensive set of experiments. Finally, concluding remarks are covered in Section 6.
2. Linear regression with fuzzy data

In order to generalize simple linear regression to the case of imprecise (non-numeric) independent and dependent variables, we follow the approach proposed by Diamond (Bargiela et al., 2007) and adopt a subfamily of fuzzy sets, called fuzzy numbers, as a formal framework for the representation of imprecise data. A fuzzy number can be formally defined as follows (Bargiela et al., 2007):

Definition 1. A fuzzy subset Ã of the set of real numbers R with membership function μ_Ã : R → [0, 1] is called a fuzzy number if

(i) Ã is normal, i.e., there exists an element z0 such that μ_Ã(z0) = 1
(ii) Ã is fuzzy convex, i.e., μ_Ã(λ·z1 + (1 − λ)·z2) ≥ μ_Ã(z1) ∧ μ_Ã(z2), ∀z1, z2 ∈ R, ∀λ ∈ [0, 1]
(iii) μ_Ã is upper semi-continuous
(iv) supp(Ã) = {z ∈ R : μ_Ã(z) > 0} is bounded

A fuzzy number Ã can be represented as a family of sets called α-cuts, Ã_α, defined as

Ã_α = {z ∈ R : μ_Ã(z) ≥ α}   (3)

and giving rise to the following set-based representation:

Ã = ∪_{α∈(0,1]} Ã_α   (4)

From the definition of a fuzzy number, it can easily be seen that every α-cut of a fuzzy number Ã is a closed interval Ã_α = [Ã^L(α), Ã^U(α)], where

Ã^L(α) = inf{z ∈ R : μ_Ã(z) ≥ α}   (5)
Ã^U(α) = sup{z ∈ R : μ_Ã(z) ≥ α}   (6)

In the sequel, for two fuzzy numbers Ã and B̃ with α-cuts Ã_α = [Ã^L(α), Ã^U(α)] and B̃_α = [B̃^L(α), B̃^U(α)], we define a distance between Ã and B̃ to be
[Fig. 12: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 12. The performance of the proposed model with FCM in terms of environmental parameters such as the number of rules and the number of layers for training data.
d(Ã, B̃) = sqrt( ∫₀¹ (Ã^L(α) − B̃^L(α))² dα + ∫₀¹ (Ã^U(α) − B̃^U(α))² dα )   (7)
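The distance (7) is straightforward to evaluate numerically once the α-cuts are available in closed form. The sketch below assumes triangular fuzzy numbers stored as (left, mode, right) triples, whose α-cut follows from (5)-(6); the function names are illustrative, not taken from the paper:

```python
import math

def alpha_cut(tri, a):
    """Closed interval [A^L(a), A^U(a)] of a triangular fuzzy number."""
    left, mode, right = tri
    return left + a * (mode - left), right - a * (right - mode)

def distance(A, B, steps=1000):
    """d(A, B) from Eq. (7), with the integrals over alpha in (0, 1]
    approximated by a midpoint rule."""
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) / steps
        al, au = alpha_cut(A, a)
        bl, bu = alpha_cut(B, a)
        total += ((al - bl) ** 2 + (au - bu) ** 2) / steps
    return math.sqrt(total)
```

For two identical numbers the distance is 0; for the crisp singletons 0 and 1 (degenerate triangles) it evaluates to √2, since both integrands equal 1.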
The fuzzy linear regression model with multiple independent variables can be defined as follows (Bargiela et al., 2007):

Ỹ = b0 + b1·X̃1 + b2·X̃2 + ··· + bm·X̃m   (8)

where m is the number of independent variables, Ỹ, X̃1, X̃2, ..., X̃m are all fuzzy numbers (in particular, triangular fuzzy numbers), and b0, b1, b2, ..., bm are real numbers. The parameters b0, b1, b2, ..., bm are evaluated by minimizing the cost function H(·), defined as the squared distance between the fuzzy observations and the corresponding fuzzy dependent variable evaluated from (8):

min H(b0, b1, ..., bm) = Σ_{i=1}^{k} d²(Ỹi, b0 + b1·X̃i1 + b2·X̃i2 + ··· + bm·X̃im)   (9)

The minimum and maximum values of the α-cut intervals change with the model parameters b0, b1, b2, ..., bm. To show how to estimate the model parameters, we consider simple linear regression with only one input variable:

Ỹ = b0 + b1·X̃   (10)

The objective function to evaluate the model parameters is defined as follows:

H(b0, b1) = Σ_{i=1}^{k} d²(Ỹi, b0 + b1·X̃i)   (11)

We minimize this objective function with respect to b0 and b1, that is,

min H(b0, b1)   (12)

The minimum and maximum values of the α-cut intervals of the fuzzy set defined by the linear regression b0 + b1·X̃ depend on the sign of the parameter b1.

If b1 > 0, the minimum and maximum values of the α-cut intervals of the fuzzy set b0 + b1·X̃ are b0 + b1·X̃^L(α) and b0 + b1·X̃^U(α), respectively. In this case, the objective function is defined as (13):

min H⁺(b0, b1) = Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) − (b0 + b1·X̃i^L(α)))² dα + Σ_{i=1}^{k} ∫₀¹ (Ỹi^U(α) − (b0 + b1·X̃i^U(α)))² dα   (13)
[Fig. 13: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 13. The performance of the proposed model (with the FCM used) versus the number of rules and the number of layers; the plots concern testing data.
If b1 < 0, the minimum and maximum values of the α-cut intervals of the fuzzy set b0 + b1·X̃ are b0 + b1·X̃^U(α) and b0 + b1·X̃^L(α), respectively. In this case, the objective function is defined as (14):

min H⁻(b0, b1) = Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) − (b0 + b1·X̃i^U(α)))² dα + Σ_{i=1}^{k} ∫₀¹ (Ỹi^U(α) − (b0 + b1·X̃i^L(α)))² dα   (14)

After assuming the sign of the parameter b1, we choose between the two objective functions H⁺ and H⁻ and minimize the chosen one to estimate the parameters b0 and b1, as follows.

If b1 > 0, the parameters can be calculated from

∂H⁺(b0, b1)/∂b0 = 4k·b0 − 2·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) + Ỹi^U(α)) dα + 2·b1·Σ_{i=1}^{k} ∫₀¹ (X̃i^L(α) + X̃i^U(α)) dα = 0

which yields

b0 = Ȳ − b1·X̄

where

Ȳ = (1/(2k))·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) + Ỹi^U(α)) dα,
X̄ = (1/(2k))·Σ_{i=1}^{k} ∫₀¹ (X̃i^L(α) + X̃i^U(α)) dα

and

∂H⁺(b0, b1)/∂b1 = 4k·b0·X̄ − 2·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^L(α) + Ỹi^U(α)·X̃i^U(α)) dα + 2·b1·Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα = 0

so that

b1 = (Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^L(α) + Ỹi^U(α)·X̃i^U(α)) dα − 2k·X̄·Ȳ) / (Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα − 2k·X̄²)   (15)

If b1 < 0, the parameters are calculated as follows:
[Fig. 14: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 14. The performance of the proposed model (PSO realization) for varying values of the number of rules and the number of layers – training data.
∂H⁻(b0, b1)/∂b1 = 4k·b0·X̄ − 2·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^U(α) + Ỹi^U(α)·X̃i^L(α)) dα + 2·b1·Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα = 0   (16)

so that

b1 = (Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^U(α) + Ỹi^U(α)·X̃i^L(α)) dα − 2k·X̄·Ȳ) / (Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα − 2k·X̄²)

with b0 = Ȳ − b1·X̄ as before, by symmetry of the derivative with respect to b0.
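As a sanity check on the closed-form estimates, the b1 > 0 case of (15) can be sketched as follows for triangular fuzzy data stored as (left, mode, right) triples. The α-cut integrals are approximated by a midpoint rule; the helper names are illustrative, not taken from the paper:

```python
def cut(tri, a):
    """Alpha-cut [X^L(a), X^U(a)] of a triangular fuzzy number."""
    l, m, u = tri
    return l + a * (m - l), u - a * (u - m)

def fit_positive_slope(X, Y, n=1000):
    """b0, b1 from Eq. (15), assuming b1 > 0; X and Y are lists of triples."""
    k = len(X)
    alphas = [(i + 0.5) / n for i in range(n)]
    # sample means of the alpha-cut endpoint integrals (X-bar, Y-bar)
    Xbar = sum(sum(cut(x, a)) for x in X for a in alphas) / (2 * k * n)
    Ybar = sum(sum(cut(y, a)) for y in Y for a in alphas) / (2 * k * n)
    num = den = 0.0
    for x, y in zip(X, Y):
        for a in alphas:
            xl, xu = cut(x, a)
            yl, yu = cut(y, a)
            num += (yl * xl + yu * xu) / n   # cross-product integral
            den += (xl * xl + xu * xu) / n   # squared-endpoint integral
    b1 = (num - 2 * k * Xbar * Ybar) / (den - 2 * k * Xbar * Xbar)
    return Ybar - b1 * Xbar, b1
```

For crisp data (degenerate triangles with left = mode = right) the estimates reduce to ordinary least squares, which gives a quick way to validate an implementation.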
3. Polynomial Neural Networks with fuzzy data

Although it is possible to determine the analytical solutions (15) and (16), in the case of a large number of regression variables the above estimation is not straightforward, as the number of objective functions that have to be considered grows exponentially as 2^m (where m is the number of regression variables). In other words, forming an analytical solution to the fuzzy linear regression is NP-complete in the number of regression variables. To overcome this problem, Bargiela et al. introduced an iterative refinement of the parameters of the regression model based on the gradient-descent approach (Bargiela et al., 2007). This iterative refinement is a modification of the conventional gradient-descent optimization and therefore exhibits the same drawbacks as the original: it is well known that gradient-descent optimization depends on the shape of the error surface, the starting point (viz. the values of the randomly initialized weights), and additional parameters such as the learning rate (Seiffert & Michaelis, 2002).

In this study, we introduce an estimation approach based on Polynomial Neural Networks (PNNs) (Park, Pedrycz, & Oh, 2007). We iteratively accumulate several PNs with two fuzzy variables each, overcoming the drawbacks of the gradient-descent approach by using the analytic approach to estimate the parameters of each PN (for a fuzzy linear regression of two variables, only 2² = 4 objective functions have to be considered, a problem that is not difficult to solve).

An overall design scheme of the Polynomial Neural Networks with fuzzy data is illustrated in Fig. 1. In this figure, p1, p2, ..., pk are the apexes of clusters obtained by using a clustering method and PSO. The numeric data set (x1, x2, ..., xm, y) is converted into the fuzzy numbers x̃1, x̃2, ..., x̃m, ỹ by the use of these apexes.
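The computational saving claimed above is easy to quantify: the full m-variable fuzzy regression needs one objective function per sign pattern of the slopes, whereas each two-input PN needs only four. A hypothetical back-of-the-envelope helper (names are illustrative):

```python
def objective_functions_full(m):
    """Full analytical solution: one objective per sign pattern of m slopes."""
    return 2 ** m

def objective_functions_pnn(num_nodes):
    """PNN decomposition: each two-input PN considers only 2**2 = 4 cases."""
    return 4 * num_nodes
```

For m = 20 inputs the full approach would face 1,048,576 objective functions, while a layer of all C(20, 2) = 190 two-input PNs requires only 760.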
[Fig. 15: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 15. The performance of the proposed model (PSO optimization) versus the number of rules and the number of layers – results reported for testing data.
The PNN is constructed by iteratively estimating the parameters of PNs with fuzzy data using an analytic estimation approach like (15) and (16). Let us stress that the parameters of the PNN are numeric, not fuzzy. When fuzzy data are applied to the PNN as inputs, the outputs of the PNN are also fuzzy numbers; in other words, when the fuzzy variables x̃1, x̃2, ..., x̃m are the inputs of the PNN, its output is ỹ̂. The numeric output ŷ results from the numeric input variables x1, x2, ..., xm.
3.1. Conventional Polynomial Neural Networks

In order to alleviate the lack of structural flexibility of GMDH, self-organizing Polynomial Neural Networks (PNNs) were proposed by Oh et al. The structure of a PNN is similar to that of a feed-forward neural network, but a PNN is not a static architecture whose topology is predefined and left unchanged. On the contrary, we encounter dynamically generated networks whose topologies can be adjusted during the design process. The PNN algorithm is based on the GMDH method and utilizes some classes of polynomials such as linear, modified quadratic, and cubic. By choosing the most significant input variables and selecting an order of the polynomial among these various available forms, we can construct the best polynomial neuron (PN) of each layer. Additional layers are generated until the best performance of the extended model has been reached. This type of design methodology leads to an optimal PNN structure.

Let us recall that the input–output data are given in the form

(Xi, yi) = (x1i, x2i, ..., xmi, yi),  i = 1, 2, ..., N

The input–output relationship of the above data as handled by the PNN algorithm can be described in the following manner:

y = f(x1, x2, ..., xm)

The estimated output ŷ reads as

ŷ = f̂(x1, x2, ..., xm) = a0 + Σ_{k=1}^{m} ak·xk + Σ_{k=1}^{m} Σ_{l=1}^{m} akl·xk·xl + Σ_{j=1}^{m} Σ_{k=1}^{m} Σ_{l=1}^{m} ajkl·xj·xk·xl + ···

where a0, ak, akl, ajkl, ... denote the coefficients of the model. To determine the output of the network ŷ, we construct a PN for each pair of independent variables in the first iteration, according to the predefined number (K) of the input variables
[Fig. 16: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 16. The performance of the proposed model with prototypes defined by FCM in terms of environmental parameters such as the number of rules and the number of layers for training data.
available to the PN node. Here, one determines the parameters of the PN by invoking the weighted least squares method on the training data. In this way, we choose the optimal model forming the first layer. In the sequel, we construct new PNs using intermediate variables (for example, z_m) generated at the current iteration. Afterwards, we take another pair of new input variables, and repeat the construction of PNs until the stopping criterion has been satisfied. Once the final layer has been constructed, the node characterized by the best performance is selected as the output node. All remaining nodes in that layer are discarded. Furthermore, all nodes of previous layers that do not influence the estimated output node are also removed by tracing the data flow at each iteration. The essence of the design is that simple functions are combined at all nodes of each layer of the PNN, which leads to more complex forms: the outputs obtained from the nodes of the same layer are combined to produce a higher-order polynomial.

3.2. Fuzzy linear regression based on Polynomial Neural Networks

As mentioned in Section 3.1, in the case of numeric data, PNNs have the advantage of being able to choose the meaningful input variables among all input variables and to form various structures of the networks. In the case of fuzzy data, the advantages of PNNs are intensified because of the NP-complete nature of the problem at hand. Let us again stress that the 2^m analytic approaches required to evaluate the model parameters of the fuzzy linear regression can be divided into several 2²-sized analytic approaches for PNNs. Comparing PNNs with gradient-descent optimization, here it is not necessary to set up initial points to run an iterative scheme, nor to set up further parameters such as a learning coefficient. As mentioned before, these simple functions are combined at all nodes of each layer of the PNN, which leads to more complex relationships.

In Fig. 2, we display an overall structure of the PNNs, which has been implemented in an iterative fashion. In Fig. 1, X̃j is the triangular fuzzy set defined in the jth input variable space. Ỹij is the triangular fuzzy set produced by the jth Polynomial Node (PN) in the ith layer. Here m denotes the number of input variables, s is defined as m(m − 1)/2, and M is the number of candidates for the input variables of the next layer. The 'Sort and Selection Operation' step is based on the performance index given as
[Fig. 17: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 17. The performance of the proposed model with prototypes defined by FCM in terms of environmental parameters such as the number of rules and the number of layers for testing data.
PI = Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) − Ŷi^L(α))² dα + Σ_{i=1}^{k} ∫₀¹ (Ỹi^U(α) − Ŷi^U(α))² dα   (17)
For simplicity, the number of input variables for each PN is set to 2. When the number of input variables is 2, 2² analytical approaches are carried out to estimate the parameters of the fuzzy linear regression. The pseudo code used to build the fuzzy linear regression Polynomial Neural Networks is presented in Fig. 3.
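Putting Sections 2 and 3 together, one layer of the network can be sketched as below. Here `fit_node` and `perf_index` are placeholders for the fuzzy LSE of Section 2 and the index (17), since the paper does not publish reference code; the sketch only shows the pair-enumerate, rank, and keep-M selection of the 'Sort and Selection Operation':

```python
from itertools import combinations

def build_layer(inputs, fit_node, perf_index, M):
    """One PNN layer: fit a candidate neuron on every pair of current
    inputs, rank by performance index, keep the best M as next inputs."""
    candidates = []
    for i, j in combinations(range(len(inputs)), 2):
        node = fit_node(inputs[i], inputs[j])   # fuzzy LSE on two inputs
        candidates.append((perf_index(node), node))
    candidates.sort(key=lambda t: t[0])          # smaller PI is better
    return [node for _, node in candidates[:M]]
```

Stacking such layers L times, and finally keeping only the single best output node, reproduces the flow of Fig. 3.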
4. Particle Swarm Optimization

The socially-inspired optimization technique of particle swarm optimization (PSO) has been proposed relatively recently (Aliev, Guirimov, Fazlollahi, & Aliev, 2009; Angeline, 1998; Juang, Chung, & Hsu, 2007; Kennedy, 1997; Kennedy & Eberhart, 1995; Ozcan & Mohan, 1998; Shi & Eberhart, 1998). PSO is a modern heuristic algorithm that belongs to the category of Swarm Intelligence methods. It has been motivated by the behavior of organisms, as observed in fish schooling and bird flocking. Generally, PSO is a simple concept that is easy to implement and computationally efficient. Unlike other heuristics, PSO comes with a flexible and well-balanced mechanism to enhance its global and local exploration abilities (Abido, 2002). In what follows, we offer a concise description of the method.

Step (1) (Initialization): Set the time counter t = 0 and generate n random particles {xj(0), j = 1, 2, ..., n}, where xj(0) = [xj1(0), xj2(0), ..., xjm(0)]. Each xjk(0) is generated by randomly selecting a value with uniform probability over the kth parameter search space [xk_min, xk_max]. Similarly, generate random initial velocities of all particles {vj(0), j = 1, 2, ..., n}, where vj(0) = [vj1(0), vj2(0), ..., vjm(0)] and each vjk(0) is generated by randomly selecting a value with uniform probability over [−vk_max, vk_max]. Each particle in the initial population is evaluated using the objective function J. For each particle, set pbestj(0) = xj(0) and Jbestj = Jj, j = 1, 2, ..., n. Search for the best value of the objective function, Jbest, and set the particle associated with it as the global best, gbest(0), with objective value gJbest. Set the initial value of the inertia weight w(0).

Step (2) (Time updating): Update the time counter t = t + 1.
Fig. 18. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for training data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
Step (3) (Weight updating): Update the inertia weight w(t).
Function (1): w(t) = a*w(t - 1), where a is a decrement constant smaller than, but close to, 1.
Function (2): w(t) = w_max - ((w_max - w_min)/iter_max)*t, where iter_max is the maximum number of iterations (generations).
Step (4) (Velocity updating): Using the global best and individual best, the jth particle velocity in the kth dimension is updated according to the following equation:
v_jk(t) = w(t)*v_jk(t - 1) + c1*r1*(pbest_jk(t - 1) - x_jk(t - 1)) + c2*r2*(gbest_k(t - 1) - x_jk(t - 1))   (18)
Check the velocity limits. If the velocity violates its limit, set it to the corresponding limit. It is worth mentioning that the second term represents the cognitive part of PSO, where the particle changes its velocity based on its own thinking and memory. The third term represents the social part of PSO, where the particle changes its velocity based on the social-psychological adaptation of knowledge. Step (5) (Position updating): Based on the updated velocities, each particle changes its position according to the following equation:
x_jk(t) = v_jk(t) + x_jk(t - 1)   (19)

x_k^min <= x_jk(t) <= x_k^max   (20)
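The velocity-limit check and the updates of Eqs. (18)-(20) can be condensed into a minimal NumPy sketch. The function name and the optional clipping arguments are illustrative, not part of the paper:

```python
import numpy as np

def update_particles(x, v, pbest, gbest, w, c1=2.0, c2=2.0,
                     v_max=None, x_min=None, x_max=None, rng=None):
    """One PSO step: velocity per Eq. (18), position per Eqs. (19)-(20).
    x, v, pbest are (n, m) arrays; gbest is an (m,) array."""
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    # Eq. (18): inertia + cognitive + social terms
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    if v_max is not None:
        v = np.clip(v, -v_max, v_max)       # velocity-limit check
    x = x + v                               # Eq. (19)
    if x_min is not None:
        x = np.clip(x, x_min, x_max)        # Eq. (20)
    return x, v
```

Calling this once per time step, followed by the individual- and global-best bookkeeping of Steps (6) and (7), yields the complete algorithm.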
Step (6) (Individual best updating): Each particle is evaluated at its updated position. If J_j < Jbest_j, j = 1, ..., n, then update the individual best as pbest_j(t) = x_j(t) and Jbest_j = J_j; go to Step 7. Step (7) (Global best updating): Search for the minimum value J_min among the Jbest_j, where min is the index of the best particle, i.e., min ∈ {j; j = 1, ..., n}. If J_min < gJbest, then update the global best as gbest(t) = pbest_min(t) and gJbest = J_min, and go to Step 8; else go to Step 9. Step (8) (Stopping criteria): If one of the stopping criteria is satisfied, then stop; otherwise go to Step 2. Step (9) (Optimal parameter): The particle that generated the latest gbest corresponds to the optimal parameters. In Fig. 4, we present the overall structure of the particle, which includes the apexes of the fuzzy sets defined in the input and output spaces. In Fig. 5, the triangular fuzzy sets constructed on the basis of the example particle shown in Fig. 6 are depicted. Fig. 6 shows the
Fig. 19. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for testing data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
example structure of a particle, which represents the prototypes defined in the input space and the output space.

5. Experimental studies

We have carried out a suite of experiments to quantify the performance of the PNN with fuzzy data and contrast its behavior with other models available in the literature. The experiments deal with some selected data coming from the machine learning repository (http://www.ics.uci.edu/mlearn/MLSummary.html). Each experiment is completed in a 5-fold cross-validation mode for the purpose of tuning the key parameters of each method. In this cross-validation, we report the values of error obtained on the testing data. The parameter values that yield the best testing results are chosen. We investigate and report the results of each experiment expressed in terms of the mean and the standard deviation of the performance index. We consider some predefined values of the parameters, which are summarized in Table 1. The choice of these particular numeric values has been motivated by the need to investigate the performance of the model in a fairly comprehensive range of scenarios. In this study, we chose the number of rules (R) and the number of layers by cross-validation.
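The prototypes carried by a particle (or produced by FCM) serve as apexes of triangular fuzzy sets. The paper does not spell out the exact membership construction, so the following sketch — where neighboring apexes bound each set's support, a common convention — is an assumption for illustration:

```python
import numpy as np

def triangular_memberships(x, prototypes):
    """Membership degrees of scalar x in triangular fuzzy sets whose
    apexes are the sorted prototypes; each set's support is bounded by
    its neighboring apexes, with shoulders at the two extremes."""
    p = np.sort(np.asarray(prototypes, float))
    mu = np.zeros(p.size)
    for k in range(p.size):
        left = p[k - 1] if k > 0 else -np.inf
        right = p[k + 1] if k < p.size - 1 else np.inf
        if x <= p[k]:
            # rising edge (or left shoulder for the first set)
            mu[k] = 1.0 if left == -np.inf else max(0.0, (x - left) / (p[k] - left))
        else:
            # falling edge (or right shoulder for the last set)
            mu[k] = 1.0 if right == np.inf else max(0.0, (right - x) / (right - p[k]))
    return mu
```

With apexes at, say, 0, 5, and 10, an input of 2.5 belongs to the first two sets with degree 0.5 each, so the memberships of interior points always sum to 1 — a property often exploited in fuzzy-rule premises.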
The number of iterations was chosen to be in the range of 100–200, in which case no further significant improvement of the performance has been reported. As a matter of fact, most of the optimization realized by the PSO occurs at the beginning of the overall process. After 50–60 iterations, further reduction of the values of the performance index is quite limited (Pedrycz & Hirota, 2007). The performance of the proposed model can be evaluated from two points of view. The first one is how small the difference is between the fuzzy output of the PNN (ŷ) and the fuzzy variable (ỹ) defined by the fuzzification method. The second one is the standard performance index expressed as the Root Mean Square Error (RMSE), given as
RMSE = sqrt( (1/N) * sum_{i=1}^{N} (y_i - ŷ_i)^2 )   (21)
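Eq. (21) amounts to the following one-liner (the function name is ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Square Error of predictions y_hat against targets y, Eq. (21)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))
```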
In this paper, we use the RMSE as the performance index to evaluate the performance of the proposed model.

(1) Automobile Miles Per Gallon (MPG) Data
We consider the well-known automobile MPG data (ftp://ics.uci.edu/pub/machine-learning-databased/auto-mpg) with the output being the automobile's fuel consumption
Fig. 20. The performance of the proposed model with prototypes defined by FCM in terms of environmental parameters such as the number of rules and the number of layers for training data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
expressed in miles per gallon. The dataset includes 392 input–output pairs (after removing incomplete instances), where the input space involves eight input variables. Fig. 7 shows the performance of the proposed model for the training data according to the experiment environmental parameters such as k and L (the role of these parameters is explained above). The prototypes of the triangular fuzzy sets for the proposed model, named "fuzzy linear regression based on PNNs", are defined by Fuzzy C-Means clustering. The performance of the proposed model for the testing data is shown in Fig. 8. In Fig. 9, we show the topology of the PNNs for fuzzy linear regression when the number of layers is 3 and the number of rules is 3. The final output Ŷ of the networks depicted in Fig. 9 is a fuzzy set, not a numerical value. The final output Ŷ is calculated by using the following expression:

Ŷ = 54.6562 - 0.109*X1 - 5.673e-4*X2 - 0.0122*X3 - 0.009*X4   (22)

To calculate the numerical output of the networks, numerical inputs should be substituted for the fuzzy sets X1, X2, X3, X4 in the networks.
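Substituting numerical inputs into Eq. (22) is then a direct evaluation of the polynomial center (the minus signs, lost in the original typesetting, are assumed here; the spreads of the fuzzy output are omitted for brevity):

```python
def predict(x1, x2, x3, x4):
    """Evaluate the center of the fuzzy output of Eq. (22) at a numeric input.
    Coefficient signs are assumed from context; spreads are not modeled."""
    return (54.6562 - 0.109 * x1 - 5.673e-4 * x2
            - 0.0122 * x3 - 0.009 * x4)
```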
Up to now, the fuzzy sets used as the inputs of the fuzzy linear regression have been defined by using the FCM approach. The modeling performance of fuzzy linear regression mainly depends on the information granules defined by the fuzzy sets. If the information granules represent the characteristics of the given data set well, the modeling performance of the fuzzy linear regression improves. Figs. 10 and 11 show the performance of the proposed model with regard to the values of the parameters such as k and L (the role of these parameters has been explained above) for the training data and the testing data, respectively. The prototypes of the triangular fuzzy sets for the proposed model, named "fuzzy linear regression based on PNNs", are constructed by running the PSO algorithm. Fig. 11 illustrates the performance of the proposed model with prototypes optimized by PSO for the testing data.

(2) Boston Housing Data
This dataset concerns real estate in the Boston area (ftp://ftp.ics.uci.edu/pub/machine-learningdatabases/housing/housing.data). There are 13 continuous attributes and 1 binary-valued attribute. The output variable MEDV denotes the median value of owner-occupied homes. The dataset provides 506 instances.
Fig. 21. Performance of the model versus number of layers. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
Fig. 12 shows the performance of the proposed model for some selected values of the parameters. Fig. 13 displays the performance of the proposed model for the testing data, where the prototypes were formed by the FCM. The performance of the model is quantified through the plots shown in Figs. 14 and 15.

(3) Medical Imaging System (MIS) Data
This data set consists of 390 software modules written in Pascal and FORTRAN. It uses eleven input variables being typical software measures that include the number of lines of code, the number of characters, the McCabe complexity measure (known as cyclomatic complexity), and a bandwidth complexity measure. The performance of the proposed model with prototypes of fuzzy sets defined by the fuzzy C-means clustering approach is shown in Fig. 16. Fig. 17 illustrates the performance of the proposed model with prototypes defined by FCM for the testing data. The performance of the model is shown in Figs. 18 and 19.

(4) CPU Performance Data
This dataset deals with data describing the relative central processing unit (CPU) performance of computer hardware
(ftp://ftp.ics.uci.edu/pub/marchine-learning-databases/cpuperfomance/). Each data point characterizes a published relative performance (PRP) of the hardware, expressed in terms of its six features. The PRP is the output variable. The obtained results are reported in Fig. 20. Here the prototypes of the triangular fuzzy sets are constructed by running the FCM method. The details as to the performance of the model are covered in Fig. 21. Figs. 22 and 23 show the performance of the proposed model with the prototypes optimized by PSO. Table 2 summarizes the performance of the proposed model vis-à-vis the performance of other models such as linear regression and fuzzy linear regression optimized by the gradient descent approach (Bargiela et al., 2007) for the four data sets. The performance of "Fuzzy Linear Regression with GD" is worse than that of linear regression. Note, however, that the fuzzy linear regression with GD focuses on the minimization of the objective function (17), not the RMSE index (21). This choice of objective function is what degrades the RMSE performance of the fuzzy linear regression whose coefficients are estimated by the gradient descent method.
Fig. 22. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for training data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
Table 2. Results of comparative analysis. Columns: model, design parameters, testing data RMSE (mean ± std). The bold and italic values in the original mark the best performance.

Automobile MPG data
  Linear regression                    O = 1           3.401 ± 0.462
  Fuzzy linear regression with GD      R = 10          11.560 ± 1.491
  Proposed model (FCM)                 R = 20, L = 5   3.459 ± 0.534
  Proposed model (PSO)                 R = 30, L = 3   3.334 ± 0.587

Boston housing data
  Linear regression                    O = 1           4.886 ± 0.488
  Fuzzy linear regression with GD      R = 15          14.500 ± 2.472
  Proposed model (FCM)                 R = 25, L = 5   5.899 ± 1.531
  Proposed model (PSO)                 R = 5, L = 5    4.974 ± 0.7728

Medical imaging system data
  Linear regression                    O = 1           6.815 ± 1.367
  Fuzzy linear regression with GD      R = 30          19.106 ± 6.288
  Proposed model (FCM)                 R = 10, L = 3   6.995 ± 1.306
  Proposed model (PSO)                 R = 20, L = 5   6.495 ± 1.447

CPU performance data
  Linear regression                    O = 1           71.289 ± 26.333
  Fuzzy linear regression with GD      R = 30          176.319 ± 31.449
  Proposed model (FCM)                 R = 25, L = 5   77.584 ± 29.409
  Proposed model (PSO)                 R = 20, L = 5   62.146 ± 21.958
Fig. 23. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for testing data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
6. Conclusions

We have presented a new estimation approach for fuzzy linear regression. The proposed approach is based on Polynomial Neural Networks, which are hierarchically grown networks whose final model is represented by a polynomial. From the experiments, we can conclude that the proposed estimation approach can handle the issue of negative spreads, which makes the analytic estimation of the parameters of fuzzy linear regression impossible when the number of input variables is large. The experiments ascertain that the fuzzy linear regression based on Polynomial Neural Networks performs better than the fuzzy linear regression optimized by the gradient descent approach.

Acknowledgement

This paper was supported by Wonkwang University in 2009.

References

Abido, M. A. (2002). Optimal design of power-system stabilizers using particle swarm optimization. IEEE Transactions on Energy Conversion, 17(3), 406–413.
Aliev, R. A., Guirimov, B. G., Fazlollahi, B., & Aliev, R. R. (2009). Evolutionary algorithm-based learning of fuzzy neural networks. Fuzzy Sets and Systems, 160, 2553–2566.
Angeline, P. (1998). Evolutionary optimization versus particle swarm optimization: Philosophy and performance differences. In Proc. 7th ann. conf. evolutionary program (pp. 601–610).
Bardossy, A. (1990). Note on fuzzy regression. Fuzzy Sets and Systems, 37, 65–75.
Bardossy, A., Bogardi, I., & Duckstein, L. (1990). Fuzzy regression in hydrology. Water Resources Research, 26, 1497–1508.
Bargiela, A., Pedrycz, W., & Nakashima, T. (2007). Multiple regression with fuzzy data. Fuzzy Sets and Systems, 158, 2168–2188.
Cheng, C. B., & Lee, E. S. (2001). Fuzzy regression with radial basis function network. Fuzzy Sets and Systems, 119(2), 291–301.
Cherkassky, V., Gehring, D., & Mulier, F. (1996). Comparison of adaptive methods for function estimation from samples. IEEE Transactions on Neural Networks, 7, 969–984.
Diamond, P., & Koerner, R. (1997). Extended fuzzy linear models and least squares estimates. Computers and Mathematics with Applications, 33(9), 15–32.
Dickerson, J. A., & Kosko, B. (1996). Fuzzy function approximation with ellipsoidal rules. IEEE Transactions on Systems, Man, and Cybernetics, SMC-26, 542–560.
Guvenir, H. A., & Uysal, I. (2000). Regression on feature projections. Knowledge-Based Systems, 13, 207–214.
Imoto, S., Yabuuchi, Y., & Watada, J. (2008). Fuzzy regression model of R&D project evaluation. Applied Soft Computing, 8, 1266–1273.
Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, SMC-1, 364–378.
Ivakhnenko, A. G., & Madala, H. R. (1994). Inductive learning algorithms for complex systems modeling. Boca Raton, FL: CRC Press.
Ivakhnenko, A. G., & Ivakhnenko, G. A. (1995). The review of problems solvable by algorithms of the group method of data handling (GMDH). Pattern Recognition and Image Analysis, 5(4), 527–535.
Ivakhnenko, A. G., Ivakhnenko, G. A., & Muller, J. A. (1994). Self-organization of neural networks with active neurons. Pattern Recognition and Image Analysis, 4(2), 185–196.
Juang, C. F., Chung, I. F., & Hsu, C. H. (2007). Automatic construction of feedforward/recurrent fuzzy systems by clustering-aided simplex particle swarm optimization. Fuzzy Sets and Systems, 158, 1979–1996.
Juang, C. F., & Wang, C. Y. (2009). A self-generating fuzzy system with ant and particle swarm cooperative optimization. Expert Systems with Applications, 36, 5362–5370.
Kaveh, A., & Lakanejadi, K. (2011). A novel hybrid charge system search and particle swarm optimization method for multi-objective optimization. Expert Systems with Applications, 38, 15475–15488.
Kennedy, J. (1997). The particle swarm: Social adaptation of knowledge. In Proc. IEEE int. conf. evolutionary comput. (pp. 303–308).
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proc. IEEE int. conf. neural networks IV (pp. 1942–1948).
Oh, S. K., & Pedrycz, W. (2002). The design of self-organizing Polynomial Neural Networks. Information Sciences, 141, 237–258.
Oh, S. K., Pedrycz, W., & Park, B. J. (2003). Polynomial Neural Networks architecture: Analysis and design. Computers and Electrical Engineering, 29(6), 703–725.
Ozcan, E., & Mohan, C. (1998). Analysis of a simple particle swarm optimization system. Intelligent Engineering Systems Through Artificial Neural Networks, 8, 253–258.
Park, H. S., Pedrycz, W., & Oh, S. K. (2007). Evolutionary design of hybrid self-organizing fuzzy polynomial neural networks with the aid of information granulation. Expert Systems with Applications, 33, 830–846.
Pedrycz, W., & Hirota, K. (2007). Fuzzy vector quantization with the particle swarm optimization: A study in fuzzy granulation–degranulation information processing. Signal Processing, 87, 2061–2074.
Seiffert, U., & Michaelis, B. (2002). On the gradient descent in back propagation and its substitution by a genetic algorithm. In Proceedings of the IASTED international conference applied informatics, Austria (pp. 821–826).
Shi, Y., & Eberhart, R. (1998). Parameter selection in particle swarm optimization. In Proc. 7th ann. conf. evolutionary program (pp. 591–600).
Tanaka, H., Uejima, S., & Asai, K. (1982). Linear regression analysis with fuzzy model. IEEE Transactions on Systems, Man, and Cybernetics, 12, 903–907.
Toyoura, Y., Watada, J., Khalid, M., & Yusof, R. (2004). Formulation of linguistic regression model based on natural words. Soft Computing, 681–688.
Watada, J. (2001). The thought and model of linguistic regression. In Proceedings of the 9th world congress of the international fuzzy systems association, Canada (pp. 340–346).
Yu, J. R., & Lee, C. W. (2010). Piecewise regression for fuzzy input–output data with automatic change-point detection by quadratic programming. Applied Soft Computing, 10, 111–118.