Expert Systems with Applications 39 (2012) 8909–8928
Fuzzy linear regression based on Polynomial Neural Networks

Seok-Beom Roh a, Tae-Chon Ahn a,*, Witold Pedrycz b,c

a School of Electronic and Control Engineering, Wonkwang University, Iksan, Chon-Buk 570-749, Republic of Korea
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada T6G 2G7
c Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Keywords: Fuzzy linear regression; Polynomial Neural Networks; Particle Swarm Optimization; Fuzzy Least Square Estimation (LSE)
Abstract: In this study, we introduce an estimation approach to determine the parameters of the fuzzy linear regression model. The analytical solution for estimating the values of the parameters has been studied. The issue of negative spreads in fuzzy linear regression makes the problem NP-complete. To deal with this problem, an iterative refinement of the model parameters based on gradient-descent optimization has been introduced. In the proposed approach, we use a hierarchical structure composed of dynamically accumulated simple nodes based on Polynomial Neural Networks, whose structure is very flexible. We propose a new methodology of fuzzy linear regression based on the design method of Polynomial Neural Networks, which divides the complicated analytical approach to estimating the parameters of fuzzy linear regression into several simple analytic approaches. The fuzzy linear regression is implemented by Polynomial Neural Networks with fuzzy numbers, which are formed by exploiting clustering and Particle Swarm Optimization. It is shown that the design strategy produces a model exhibiting sound performance. © 2012 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +82 63 850 6344; fax: +82 63 853 2196. E-mail addresses: [email protected] (S.-B. Roh), [email protected] (T.-C. Ahn), [email protected] (W. Pedrycz).
doi:10.1016/j.eswa.2012.02.016

1. Introduction

In recent years, the problem of modeling and prediction from observed data has been one of the most commonly encountered research topics in machine learning and data analysis (Guvenir & Uysal, 2000). A simple way to describe a system is regression analysis (Yu & Lee, 2010). In classical regression, both independent and dependent variables are treated as real numbers. However, in many real-world situations, where the complexity of the physical system calls for a more general viewpoint, regression variables are specified in the form of non-numeric (granular) entities such as linguistic variables (Cheng & Lee, 2001). The well-known and commonly encountered classical regression cannot handle such situations (Bardossy, 1990; Bardossy, Bogardi, & Duckstein, 1990). Fuzzy regression, which can deal with non-numerical entities, especially linguistic variables, was proposed by Imoto, Yabuuchi, and Watada (2008), Tanaka, Uejima, and Asai (1982), Toyoura, Watada, Khalid, and Yusof (2004), and Watada (2001).

The fuzzy linear regression proposed by Tanaka is composed of numeric input variables and linguistic (granular) coefficients, which are treated as fuzzy numbers (in particular, ones described by triangular membership functions). The linguistic coefficients of the regression lead to a linguistic output of the regression model; in other words, the output of a fuzzy linear regression model is also a triangular fuzzy number. In essence, the fuzziness of the output of the regression model emerges because of the lack of a perfect fit of numeric data to the assumed linear form of the relationship under consideration. Through the introduction of triangular numbers (the parameters of the model), fuzzy regression reflects the deviations between the data and the linear model. Computationally, the estimation of the fuzzy parameters of the regression is carried out by linear programming (Bargiela, Pedrycz, & Nakashima, 2007). Diamond developed a simple regression model for triangular fuzzy numbers under the conceptual framework
F(R^m) → F(R)   (1)

where F(R) denotes a family of fuzzy numbers (in our case triangular ones) defined in the space of real numbers R. For the conceptual framework formed by (1), the various analytical formulae quantifying the values of the parameters of the regression model had to address the issue of negative spreads (Diamond & Koerner, 1997), which complicates the algorithms significantly and makes them difficult to apply to high-dimensional data.
Fig. 1. An overall structure of the Polynomial Neural Network with fuzzy data.
Fig. 2. An overall structure of the PNN.
Considering the optimization standpoint, Bargiela et al. (2007) revised the mapping between the independent variables and the dependent variable to be expressed as follows:

F(R) × F(R) × ··· × F(R) → F(R)   (2)

In addition, to deal with the issue of negative spreads, Bargiela proposed a re-formulation of the regression problem as a gradient-descent optimization task, which enables a generalization of the simple regression model to multiple regression models in a computationally feasible way (Toyoura et al., 2004). The iterative refinement based on the gradient-descent approach to estimating the parameters of fuzzy linear regression is a modification of the conventional gradient-descent optimization. The drawback of gradient-descent optimization is well known: its optimization performance
Main procedure: Polynomial_Neural_Network
  Decide on the design parameters:
    (a) the number K of input variables coming to each node
    (b) the total number M of candidates (which can be chosen as the input variables of the next layer)
    (c) the number L of layers of the Polynomial Neural Network
  Repeat
    a. Call procedure Polynomial_Neuron
    b. Sort the modeling performance indices obtained from Polynomial_Neuron in descending order
    c. Choose M candidates and set the selected M nodes as new input variables for the next layer
  Until the predefined maximum number of layers L is reached
End procedure Polynomial_Neural_Network

Procedure: Polynomial_Neuron
  Repeat
    a. Determine the structure of a polynomial (the selected input variables and the order of the polynomial)
    b. Estimate the coefficients of the polynomial by the weighted LSE
    c. Calculate the modeling performance index
  Until S = mCK · T = (m!/((m − K)!·K!)) · T candidates have been evaluated (T = 4; m is the total number of input variables)
  Return the sorted modeling performance indices to Polynomial_Neural_Network
End procedure Polynomial_Neuron

Fig. 3. Pseudo code used in the design of Polynomial Neural Networks.
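The stopping count S in the pseudo code above follows directly from the combination formula. A minimal sketch of this bookkeeping (the function name is illustrative, not from the paper; `math.comb` requires Python 3.8+):

```python
from math import comb

def num_candidate_neurons(m, K=2, T=4):
    """Number of polynomial neurons evaluated per layer: every K-subset of
    the m current inputs is tried with each of the T polynomial types."""
    return comb(m, K) * T
```

For example, with m = 4 inputs, K = 2, and T = 4 polynomial types, 24 candidate neurons are fitted and ranked per layer.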
Fig. 4. Structure of particle.
Fig. 5. Construction of triangular fuzzy sets around the projections of the prototypes defined by particle.
Fig. 6. Example of a particle.
mainly depends on the shape of the error surface, the starting point of the candidate solution, and the learning coefficient (Seiffert & Michaelis, 2002). In this paper, to overcome the drawbacks of the analytical estimation approach and of the iterative refinement approach to estimating the parameters of the model, we introduce a new estimation technique based on the concept of Polynomial Neural Networks (PNNs). When dealing with high-order nonlinear and multivariable equations of the model, we require a vast amount of data for estimating all the parameters (Cherkassky, Gehring, & Mulier, 1996;
Table 1. Selected numeric values of the parameters of the proposed model.

Polynomial Neural Networks:
  No. of rules (k): 5, 10, 15, 20, 25, 30
  No. of layers (L): 2, 3, 4, 5
  No. of input variables (S): 2
  No. of candidates (M): 10

Particle Swarm Optimization:
  No. of particles (n): 50
  No. of generations (G): 100
[Fig. 7: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 7. The performance of the proposed model in terms of environmental parameters such as the number of rules and the number of layers for training data.
[Fig. 8: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 8. The performance of the proposed model in terms of environmental parameters such as the number of rules and the number of layers for testing data.
Z11 = 52.688 − 4.575·X1 − 0.0156·X2
Z12 = 56.2056 − 0.2675·X1 − 0.0099·X4
Z13 = 47.859 − 0.0080·X2 − 0.1860·X3
Z14 = 56.0016 − 0.0114·X3 − 0.0098·X4
Z21 = −0.1178 + 0.0497·Z11 + 0.9548·Z12
Z22 = −0.0093 + 0.0577·Z13 + 0.9426·Z14
Ŷ = 0.9812 + 0.3929·Z21 + 0.5691·Z22

Fig. 9. The topology of the Polynomial Neural Networks for fuzzy linear regression when L = 3 and R = 3.
Dickerson & Kosko, 1996). To help alleviate these problems, one of the first approaches along the line of a systematic design of nonlinear relationships between a system's inputs and outputs comes under the name of the Group Method of Data Handling (GMDH). GMDH was developed in the late 1960s by Ivakhnenko (Ivakhnenko, 1971; Ivakhnenko & Madala, 1994; Ivakhnenko & Ivakhnenko, 1995; Ivakhnenko, Ivakhnenko, & Muller, 1994) as a vehicle for identifying nonlinear relations between input and output variables. GMDH-type algorithms have been used extensively since the mid-1970s for predicting and modeling complex nonlinear processes. The GMDH algorithm generates an optimal structure of the model through successive generations of Partial Descriptions (PDs) of data, regarded as quadratic regression polynomials of two input variables. While providing a systematic design
[Fig. 10: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 10. The performance of the proposed model optimized by PSO in terms of environmental parameters such as the number of rules and the number of layers for training data.
procedure, GMDH comes with some drawbacks. First, it tends to generate quite complex polynomials even for relatively simple systems (experimental data). Second, owing to its limited generic structure (quadratic two-variable polynomials), GMDH also tends to produce an overly complex network (model) for highly nonlinear systems. Third, if there are fewer than three input variables, the GMDH algorithm does not generate a versatile structure. To alleviate the problems associated with GMDH, PNNs were introduced by Oh and Pedrycz (2002) and Oh, Pedrycz, and Park (2003) as a new category of neural networks. In a nutshell, these networks come with a high level of flexibility: each node (a processing element forming a PD, or PN) can have a different number of input variables as well as exploit a different order of the polynomial (say, linear, quadratic, cubic, etc.). In comparison with well-known neural networks, whose topologies are commonly selected and kept fixed prior to all detailed (parametric) learning, the PNN architecture is not fixed in advance but becomes fully optimized, both structurally and parametrically. As a consequence, PNNs show superb performance in comparison with previously presented intelligent models. Although the PNN has a flexible architecture whose potential can be fully utilized through a systematic design, it is difficult to obtain a structurally and parametrically optimized network because of the limited design of the polynomial neurons (PNs) located in each layer of the PNN. In other words, when we construct the PNs of each layer in the conventional PNN, such parameters as the number of input variables (nodes), the order of the polynomial, and the input variables available within a PN are fixed (selected) in advance by the designer. Accordingly, the PNN algorithm exhibits a tendency to produce overly complex networks, as well as a repetitive computational load caused by trial and error and/or repetitive parameter adjustment by the designer, as in the original GMDH algorithm. In order to generate a structurally and parametrically optimized network, such parameters need to be optimal.

We augment the conventional PNNs (which focus on numeric data) to process fuzzy variables. The fuzzy variables are formed on the basis of available numeric data by using clustering and Particle Swarm Optimization (PSO) (Juang & Wang, 2009; Kaveh & Laknejadi, 2011). Through clustering we capture the distribution of the data; PSO finds the optimal fuzzy variables, which afterwards represent the relationships between the input and output variables.

The paper is organized in the following manner. First, in Section 2, we introduce the concept of fuzzy linear regression. The
[Fig. 11: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 11. The performance of the proposed model optimized by PSO in terms of environmental parameters such as the number of rules and the number of layers for testing data.
architecture and development of the PNNs for fuzzy linear regression are studied in Section 3. In Section 4, PSO is described. In Section 5, we report on a comprehensive set of experiments. Finally, concluding remarks are covered in Section 6.
2. Linear regression with fuzzy data

In order to generalize simple linear regression to the case of imprecise (non-numeric) independent and dependent variables, we follow the approach proposed by Diamond (Bargiela et al., 2007) and adopt a subfamily of fuzzy sets, called fuzzy numbers, as a formal framework for the representation of imprecise data. A fuzzy number can be formally defined as follows (Bargiela et al., 2007):

Definition 1. A fuzzy subset Ã of the set of real numbers R with membership function μ_Ã : R → [0, 1] is called a fuzzy number if

(i) Ã is normal, i.e., there exists an element z0 such that μ_Ã(z0) = 1
(ii) Ã is fuzzy convex, i.e., μ_Ã(λ·z1 + (1 − λ)·z2) ≥ μ_Ã(z1) ∧ μ_Ã(z2), ∀z1, z2 ∈ R, ∀λ ∈ [0, 1]
(iii) μ_Ã is upper semi-continuous
(iv) supp(Ã) = {z ∈ R : μ_Ã(z) > 0} is bounded

A fuzzy number Ã can be represented as a family of sets called α-cuts, Ã_α, defined as

Ã_α = {z ∈ R : μ_Ã(z) ≥ α}   (3)

and giving rise to the following set-based representation:

Ã = ∪_{α∈(0,1]} Ã_α   (4)

From the definition of a fuzzy number, it can easily be seen that every α-cut of a fuzzy number Ã is a closed interval Ã_α = [Ã^L(α), Ã^U(α)], where

Ã^L(α) = inf{z ∈ R : μ_Ã(z) ≥ α}   (5)
Ã^U(α) = sup{z ∈ R : μ_Ã(z) ≥ α}   (6)

In the sequel, for two fuzzy numbers Ã and B̃ with α-cuts Ã_α = [Ã^L(α), Ã^U(α)] and B̃_α = [B̃^L(α), B̃^U(α)], we define a distance between Ã and B̃ to be
[Fig. 12: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 12. The performance of the proposed model with FCM in terms of environmental parameters such as the number of rules and the number of layers for training data.
d(Ã, B̃) = sqrt( ∫₀¹ (Ã^L(α) − B̃^L(α))² dα + ∫₀¹ (Ã^U(α) − B̃^U(α))² dα )   (7)
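The distance (7) is straightforward to evaluate numerically once the α-cuts are available in closed form. The sketch below assumes triangular fuzzy numbers stored as (left, mode, right) triples, whose α-cut follows from (5)-(6); the function names are illustrative, not taken from the paper:

```python
import math

def alpha_cut(tri, a):
    """Closed interval [A^L(a), A^U(a)] of a triangular fuzzy number."""
    left, mode, right = tri
    return left + a * (mode - left), right - a * (right - mode)

def distance(A, B, steps=1000):
    """d(A, B) from Eq. (7), with the integrals over alpha in (0, 1]
    approximated by a midpoint rule."""
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) / steps
        al, au = alpha_cut(A, a)
        bl, bu = alpha_cut(B, a)
        total += ((al - bl) ** 2 + (au - bu) ** 2) / steps
    return math.sqrt(total)
```

For two identical numbers the distance is 0; for the crisp singletons 0 and 1 (degenerate triangles) it evaluates to √2, since both integrands equal 1.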
The fuzzy linear regression model with multiple independent variables can be defined as follows (Bargiela et al., 2007):

Ỹ = b0 + b1·X̃1 + b2·X̃2 + ··· + bm·X̃m   (8)

where m is the number of independent variables, Ỹ, X̃1, X̃2, ..., X̃m are all fuzzy numbers (in particular, triangular fuzzy numbers), and b0, b1, b2, ..., bm are real numbers. The parameters b0, b1, b2, ..., bm are evaluated by minimizing the cost function H(·), defined as the squared distance between the fuzzy observations and the corresponding fuzzy dependent variable evaluated from (8):

min H(b0, b1, ..., bm) = Σ_{i=1}^{k} d²(Ỹi, b0 + b1·X̃i1 + b2·X̃i2 + ··· + bm·X̃im)   (9)

The minimum and maximum values of the α-cut intervals change with the model parameters b0, b1, b2, ..., bm. To show how to estimate the model parameters, we consider simple linear regression with only one input variable:

Ỹ = b0 + b1·X̃   (10)

The objective function to evaluate the model parameters is defined as follows:

H(b0, b1) = Σ_{i=1}^{k} d²(Ỹi, b0 + b1·X̃i)   (11)

We minimize this objective function with respect to b0 and b1, that is,

min H(b0, b1)   (12)

The minimum and maximum values of the α-cut intervals of the fuzzy set defined by the linear regression b0 + b1·X̃ depend on the sign of the parameter b1.

If b1 > 0, the minimum and maximum values of the α-cut intervals of the fuzzy set b0 + b1·X̃ are b0 + b1·X̃^L(α) and b0 + b1·X̃^U(α), respectively. In this case, the objective function is defined as (13):

min H⁺(b0, b1) = Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) − (b0 + b1·X̃i^L(α)))² dα + Σ_{i=1}^{k} ∫₀¹ (Ỹi^U(α) − (b0 + b1·X̃i^U(α)))² dα   (13)
[Fig. 13: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 13. The performance of the proposed model (with the FCM used) versus the number of rules and the number of layers; the plots concern testing data.
If b1 < 0, the minimum and maximum values of the α-cut intervals of the fuzzy set b0 + b1·X̃ are b0 + b1·X̃^U(α) and b0 + b1·X̃^L(α), respectively. In this case, the objective function is defined as (14):

min H⁻(b0, b1) = Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) − (b0 + b1·X̃i^U(α)))² dα + Σ_{i=1}^{k} ∫₀¹ (Ỹi^U(α) − (b0 + b1·X̃i^L(α)))² dα   (14)

After assuming the sign of the parameter b1, we choose between the two objective functions H⁺ and H⁻ and minimize the chosen one to estimate the parameters b0 and b1, as follows.

If b1 > 0, the parameters can be calculated from

∂H⁺(b0, b1)/∂b0 = 4k·b0 − 2·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) + Ỹi^U(α)) dα + 2·b1·Σ_{i=1}^{k} ∫₀¹ (X̃i^L(α) + X̃i^U(α)) dα = 0

which yields

b0 = Ȳ − b1·X̄

where

Ȳ = (1/(2k))·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) + Ỹi^U(α)) dα,
X̄ = (1/(2k))·Σ_{i=1}^{k} ∫₀¹ (X̃i^L(α) + X̃i^U(α)) dα

and

∂H⁺(b0, b1)/∂b1 = 4k·b0·X̄ − 2·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^L(α) + Ỹi^U(α)·X̃i^U(α)) dα + 2·b1·Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα = 0

so that

b1 = (Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^L(α) + Ỹi^U(α)·X̃i^U(α)) dα − 2k·X̄·Ȳ) / (Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα − 2k·X̄²)   (15)

If b1 < 0, the parameters are calculated as follows:
[Fig. 14: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 14. The performance of the proposed model (PSO realization) for varying values of the number of rules and the number of layers – training data.
∂H⁻(b0, b1)/∂b1 = 4k·b0·X̄ − 2·Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^U(α) + Ỹi^U(α)·X̃i^L(α)) dα + 2·b1·Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα = 0   (16)

so that

b1 = (Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α)·X̃i^U(α) + Ỹi^U(α)·X̃i^L(α)) dα − 2k·X̄·Ȳ) / (Σ_{i=1}^{k} ∫₀¹ ((X̃i^L(α))² + (X̃i^U(α))²) dα − 2k·X̄²)

with b0 = Ȳ − b1·X̄ as before, by symmetry of the derivative with respect to b0.
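As a sanity check on the closed-form estimates, the b1 > 0 case of (15) can be sketched as follows for triangular fuzzy data stored as (left, mode, right) triples. The α-cut integrals are approximated by a midpoint rule; the helper names are illustrative, not taken from the paper:

```python
def cut(tri, a):
    """Alpha-cut [X^L(a), X^U(a)] of a triangular fuzzy number."""
    l, m, u = tri
    return l + a * (m - l), u - a * (u - m)

def fit_positive_slope(X, Y, n=1000):
    """b0, b1 from Eq. (15), assuming b1 > 0; X and Y are lists of triples."""
    k = len(X)
    alphas = [(i + 0.5) / n for i in range(n)]
    # sample means of the alpha-cut endpoint integrals (X-bar, Y-bar)
    Xbar = sum(sum(cut(x, a)) for x in X for a in alphas) / (2 * k * n)
    Ybar = sum(sum(cut(y, a)) for y in Y for a in alphas) / (2 * k * n)
    num = den = 0.0
    for x, y in zip(X, Y):
        for a in alphas:
            xl, xu = cut(x, a)
            yl, yu = cut(y, a)
            num += (yl * xl + yu * xu) / n   # cross-product integral
            den += (xl * xl + xu * xu) / n   # squared-endpoint integral
    b1 = (num - 2 * k * Xbar * Ybar) / (den - 2 * k * Xbar * Xbar)
    return Ybar - b1 * Xbar, b1
```

For crisp data (degenerate triangles with left = mode = right) the estimates reduce to ordinary least squares, which gives a quick way to validate an implementation.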
3. Polynomial Neural Networks with fuzzy data

Although it is possible to determine the analytical solutions (15) and (16), in the case of a large number of regression variables the above estimation is not straightforward, as the number of objective functions that have to be considered grows exponentially as 2^m (where m is the number of regression variables). In other words, forming an analytical solution to the fuzzy linear regression is NP-complete in the number of regression variables. To overcome this problem, Bargiela et al. introduced an iterative refinement of the parameters of the regression model based on the gradient-descent approach (Bargiela et al., 2007). This iterative refinement is a modification of the conventional gradient-descent optimization and therefore exhibits the same drawbacks as the original: it is well known that gradient-descent optimization depends on the shape of the error surface, the starting point (viz. the values of the randomly initialized weights), and additional parameters such as the learning rate (Seiffert & Michaelis, 2002).

In this study, we introduce an estimation approach based on Polynomial Neural Networks (PNNs) (Park, Pedrycz, & Oh, 2007). We iteratively accumulate several PNs with two fuzzy variables each, overcoming the drawbacks of the gradient-descent approach by using the analytic approach to estimate the parameters of each PN (for a fuzzy linear regression of two variables, only 2² = 4 objective functions have to be considered, a problem that is not difficult to solve).

An overall design scheme of the Polynomial Neural Networks with fuzzy data is illustrated in Fig. 1. In this figure, p1, p2, ..., pk are the apexes of clusters obtained by using a clustering method and PSO. The numeric data set (x1, x2, ..., xm, y) is converted into the fuzzy numbers x̃1, x̃2, ..., x̃m, ỹ by the use of these apexes.
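The computational saving claimed above is easy to quantify: the full m-variable fuzzy regression needs one objective function per sign pattern of the slopes, whereas each two-input PN needs only four. A hypothetical back-of-the-envelope helper (names are illustrative):

```python
def objective_functions_full(m):
    """Full analytical solution: one objective per sign pattern of m slopes."""
    return 2 ** m

def objective_functions_pnn(num_nodes):
    """PNN decomposition: each two-input PN considers only 2**2 = 4 cases."""
    return 4 * num_nodes
```

For m = 20 inputs the full approach would face 1,048,576 objective functions, while a layer of all C(20, 2) = 190 two-input PNs requires only 760.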
[Fig. 15: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 15. The performance of the proposed model (PSO optimization) versus the number of rules and the number of layers – results reported for testing data.
The PNN is constructed by iteratively estimating the parameters of PNs with fuzzy data using an analytic estimation approach like (15) and (16). Let us stress that the parameters of the PNN are numeric, not fuzzy. When fuzzy data are applied to the PNN as inputs, the outputs of the PNN are also fuzzy numbers; in other words, when the fuzzy variables x̃1, x̃2, ..., x̃m are the inputs of the PNN, its output is ỹ̂. The numeric output ŷ results from the numeric input variables x1, x2, ..., xm.
3.1. Conventional Polynomial Neural Networks

In order to alleviate the lack of structural flexibility of GMDH, self-organizing Polynomial Neural Networks (PNNs) were proposed by Oh et al. The structure of a PNN is similar to that of a feed-forward neural network, but a PNN is not a static architecture whose topology is predefined and left unchanged. On the contrary, we encounter dynamically generated networks whose topologies can be adjusted during the design process. The PNN algorithm is based on the GMDH method and utilizes some classes of polynomials such as linear, modified quadratic, and cubic. By choosing the most significant input variables and selecting an order of the polynomial among these various available forms, we can construct the best polynomial neuron (PN) of each layer. Additional layers are generated until the best performance of the extended model has been reached. This type of design methodology leads to an optimal PNN structure.

Let us recall that the input–output data are given in the form

(Xi, yi) = (x1i, x2i, ..., xmi, yi),  i = 1, 2, ..., N

The input–output relationship of the above data as handled by the PNN algorithm can be described in the following manner:

y = f(x1, x2, ..., xm)

The estimated output ŷ reads as

ŷ = f̂(x1, x2, ..., xm) = a0 + Σ_{k=1}^{m} ak·xk + Σ_{k=1}^{m} Σ_{l=1}^{m} akl·xk·xl + Σ_{j=1}^{m} Σ_{k=1}^{m} Σ_{l=1}^{m} ajkl·xj·xk·xl + ···

where a0, ak, akl, ajkl, ... denote the coefficients of the model. To determine the output of the network ŷ, we construct a PN for each pair of independent variables in the first iteration, according to the predefined number (K) of the input variables
[Fig. 16: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 16. The performance of the proposed model with prototypes defined by FCM in terms of environmental parameters such as the number of rules and the number of layers for training data.
available to the PN node. Here, one determines the parameters of the PN by invoking the weighted least squares method on the training data. In this way, we choose the optimal model forming the first layer. In the sequel, we construct new PNs using intermediate variables (for example, z_m) generated at the current iteration. Afterwards, we take another pair of new input variables, and repeat the construction of PNs until the stopping criterion has been satisfied. Once the final layer has been constructed, the node characterized by the best performance is selected as the output node. All remaining nodes in that layer are discarded. Furthermore, all nodes of previous layers that do not influence the estimated output node are also removed by tracing the data flow at each iteration. The essence of the design is that simple functions are combined at all nodes of each layer of the PNN, which leads to more complex forms: the outputs obtained from the nodes of the same layer are combined to produce a higher-order polynomial.

3.2. Fuzzy linear regression based on Polynomial Neural Networks

As mentioned in Section 3.1, in the case of numeric data, PNNs have the advantage of being able to choose the meaningful input variables among all input variables and to form various structures of the networks. In the case of fuzzy data, the advantages of PNNs are intensified because of the NP-complete nature of the problem at hand. Let us again stress that the 2^m analytic approaches required to evaluate the model parameters of the fuzzy linear regression can be divided into several 2²-sized analytic approaches for PNNs. Comparing PNNs with gradient-descent optimization, here it is not necessary to set up initial points to run an iterative scheme, nor to set up further parameters such as a learning coefficient. As mentioned before, these simple functions are combined at all nodes of each layer of the PNN, which leads to more complex relationships.

In Fig. 2, we display an overall structure of the PNNs, which has been implemented in an iterative fashion. In Fig. 1, X̃j is the triangular fuzzy set defined in the jth input variable space. Ỹij is the triangular fuzzy set produced by the jth Polynomial Node (PN) in the ith layer. Here m denotes the number of input variables, s is defined as m(m − 1)/2, and M is the number of candidates for the input variables of the next layer. The 'Sort and Selection Operation' step is based on the performance index given as
[Fig. 17: plots of RMSE versus the number of rules, panels (a)-(d) for L = 2, 3, 4, 5; figure omitted.]
Fig. 17. The performance of the proposed model with prototypes defined by FCM in terms of environmental parameters such as the number of rules and the number of layers for testing data.
PI = Σ_{i=1}^{k} ∫₀¹ (Ỹi^L(α) − Ŷi^L(α))² dα + Σ_{i=1}^{k} ∫₀¹ (Ỹi^U(α) − Ŷi^U(α))² dα   (17)
For simplicity, the number of input variables for each PN is set to 2. When the number of input variables is 2, 2² analytical approaches are carried out to estimate the parameters of the fuzzy linear regression. The pseudo code used to build the fuzzy linear regression Polynomial Neural Networks is presented in Fig. 3.
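Putting Sections 2 and 3 together, one layer of the network can be sketched as below. Here `fit_node` and `perf_index` are placeholders for the fuzzy LSE of Section 2 and the index (17), since the paper does not publish reference code; the sketch only shows the pair-enumerate, rank, and keep-M selection of the 'Sort and Selection Operation':

```python
from itertools import combinations

def build_layer(inputs, fit_node, perf_index, M):
    """One PNN layer: fit a candidate neuron on every pair of current
    inputs, rank by performance index, keep the best M as next inputs."""
    candidates = []
    for i, j in combinations(range(len(inputs)), 2):
        node = fit_node(inputs[i], inputs[j])   # fuzzy LSE on two inputs
        candidates.append((perf_index(node), node))
    candidates.sort(key=lambda t: t[0])          # smaller PI is better
    return [node for _, node in candidates[:M]]
```

Stacking such layers L times, and finally keeping only the single best output node, reproduces the flow of Fig. 3.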
4. Particle Swarm Optimization

The socially-inspired optimization technique of particle swarm optimization (PSO) has been proposed relatively recently (Aliev, Guirimov, Fazlollahi, & Aliev, 2009; Angeline, 1998; Juang, Chung, & Hsu, 2007; Kennedy, 1997; Kennedy & Eberhart, 1995; Ozcan & Mohan, 1998; Shi & Eberhart, 1998). PSO is a modern heuristic algorithm that belongs to the category of Swarm Intelligence methods. It has been motivated by the behavior of organisms, as observed in fish schooling and bird flocking. Generally, PSO is a simple concept that is easy to implement and computationally efficient. Unlike other heuristics, PSO comes with a flexible and well-balanced mechanism to enhance its global and local exploration abilities (Abido, 2002). In what follows, we offer a concise description of the method.

Step (1) (Initialization): Set the time counter t = 0 and generate n random particles {xj(0), j = 1, 2, ..., n}, where xj(0) = [xj1(0), xj2(0), ..., xjm(0)]. Each xjk(0) is generated by randomly selecting a value with uniform probability over the kth parameter search space [xk_min, xk_max]. Similarly, generate random initial velocities of all particles {vj(0), j = 1, 2, ..., n}, where vj(0) = [vj1(0), vj2(0), ..., vjm(0)] and each vjk(0) is generated by randomly selecting a value with uniform probability over [−vk_max, vk_max]. Each particle in the initial population is evaluated using the objective function J. For each particle, set pbestj(0) = xj(0) and Jbestj = Jj, j = 1, 2, ..., n. Search for the best value of the objective function, Jbest, and set the particle associated with it as the global best, gbest(0), with objective value gJbest. Set the initial value of the inertia weight w(0).

Step (2) (Time updating): Update the time counter t = t + 1.
Fig. 18. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for training data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
Step (3) (Weight updating): Update the inertia weight w(t).
Function (1): w(t) = a*w(t - 1), where a is a decrement constant smaller than, but close to, 1.
Function (2): w(t) = w_max - ((w_max - w_min)/iter_max)*t, where iter_max is the maximum number of iterations (generations).
Step (4) (Velocity updating): Using the global best and individual best, the jth particle velocity in the kth dimension is updated according to the following equation:
v_jk(t) = w(t)*v_jk(t - 1) + c1*r1*(pbest_jk(t - 1) - x_jk(t - 1)) + c2*r2*(gbest_k(t - 1) - x_jk(t - 1))   (18)
Check the velocity limits. If the velocity violates its limit, set it to the corresponding limit. It is worth mentioning that the second term represents the cognitive part of PSO, where the particle changes its velocity based on its own thinking and memory. The third term represents the social part of PSO, where the particle changes its velocity based on the social-psychological adaptation of knowledge. Step (5) (Position updating): Based on the updated velocities, each particle changes its position according to the following equation:
x_jk(t) = v_jk(t) + x_jk(t - 1)   (19)

x_k^min <= x_jk(t) <= x_k^max   (20)
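The velocity-limit check and the updates of Eqs. (18)-(20) can be condensed into a minimal NumPy sketch. The function name and the optional clipping arguments are illustrative, not part of the paper:

```python
import numpy as np

def update_particles(x, v, pbest, gbest, w, c1=2.0, c2=2.0,
                     v_max=None, x_min=None, x_max=None, rng=None):
    """One PSO step: velocity per Eq. (18), position per Eqs. (19)-(20).
    x, v, pbest are (n, m) arrays; gbest is an (m,) array."""
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    # Eq. (18): inertia + cognitive + social terms
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    if v_max is not None:
        v = np.clip(v, -v_max, v_max)       # velocity-limit check
    x = x + v                               # Eq. (19)
    if x_min is not None:
        x = np.clip(x, x_min, x_max)        # Eq. (20)
    return x, v
```

Calling this once per time step, followed by the individual- and global-best bookkeeping of Steps (6) and (7), yields the complete algorithm.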
Step (6) (Individual best updating): Each particle is evaluated at its updated position. If J_j < Jbest_j, j = 1, ..., n, then update the individual best as pbest_j(t) = x_j(t) and Jbest_j = J_j; go to Step 7. Step (7) (Global best updating): Search for the minimum value J_min among the Jbest_j, where min is the index of the best particle, i.e., min ∈ {j; j = 1, ..., n}. If J_min < gJbest, then update the global best as gbest(t) = pbest_min(t) and gJbest = J_min, and go to Step 8; else go to Step 9. Step (8) (Stopping criteria): If one of the stopping criteria is satisfied, then stop; otherwise go to Step 2. Step (9) (Optimal parameter): The particle that generated the latest gbest corresponds to the optimal parameters. In Fig. 4, we present the overall structure of the particle, which includes the apexes of the fuzzy sets defined in the input and output spaces. In Fig. 5, the triangular fuzzy sets constructed on the basis of the example particle shown in Fig. 6 are depicted. Fig. 6 shows the
Fig. 19. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for testing data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
example structure of a particle, which represents the prototypes defined in the input space and the output space.

5. Experimental studies

We have carried out a suite of experiments to quantify the performance of the PNN with fuzzy data and contrast its behavior with other models available in the literature. The experiments deal with some selected data coming from the machine learning repository (http://www.ics.uci.edu/mlearn/MLSummary.html). Each experiment is completed in a 5-fold cross-validation mode for the purpose of tuning the key parameters of each method. In this cross-validation, we report the values of error obtained on the testing data. The parameter values that yield the best testing results are chosen. We investigate and report the results of each experiment expressed in terms of the mean and the standard deviation of the performance index. We consider some predefined values of the parameters, which are summarized in Table 1. The choice of these particular numeric values has been motivated by the need to investigate the performance of the model in a fairly comprehensive range of scenarios. In this study, we chose the number of rules (R) and the number of layers by cross-validation.
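The prototypes carried by a particle (or produced by FCM) serve as apexes of triangular fuzzy sets. The paper does not spell out the exact membership construction, so the following sketch — where neighboring apexes bound each set's support, a common convention — is an assumption for illustration:

```python
import numpy as np

def triangular_memberships(x, prototypes):
    """Membership degrees of scalar x in triangular fuzzy sets whose
    apexes are the sorted prototypes; each set's support is bounded by
    its neighboring apexes, with shoulders at the two extremes."""
    p = np.sort(np.asarray(prototypes, float))
    mu = np.zeros(p.size)
    for k in range(p.size):
        left = p[k - 1] if k > 0 else -np.inf
        right = p[k + 1] if k < p.size - 1 else np.inf
        if x <= p[k]:
            # rising edge (or left shoulder for the first set)
            mu[k] = 1.0 if left == -np.inf else max(0.0, (x - left) / (p[k] - left))
        else:
            # falling edge (or right shoulder for the last set)
            mu[k] = 1.0 if right == np.inf else max(0.0, (right - x) / (right - p[k]))
    return mu
```

With apexes at, say, 0, 5, and 10, an input of 2.5 belongs to the first two sets with degree 0.5 each, so the memberships of interior points always sum to 1 — a property often exploited in fuzzy-rule premises.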
The number of iterations was chosen to be in the range of 100–200, in which case no further significant improvement of the performance has been reported. As a matter of fact, most of the optimization realized by the PSO occurs at the beginning of the overall process. After 50–60 iterations, further reduction of the values of the performance index is quite limited (Pedrycz & Hirota, 2007). The performance of the proposed model can be evaluated from two points of view. The first one is how small the difference is between the fuzzy output of the PNN (ŷ) and the fuzzy variable (ỹ) defined by the fuzzification method. The second one is the standard performance index expressed as the Root Mean Square Error (RMSE), given as
RMSE = sqrt( (1/N) * sum_{i=1}^{N} (y_i - ŷ_i)^2 )   (21)
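Eq. (21) amounts to the following one-liner (the function name is ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Square Error of predictions y_hat against targets y, Eq. (21)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))
```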
In this paper, we use the RMSE as the performance index to evaluate the performance of the proposed model.

(1) Automobile Miles Per Gallon (MPG) Data
We consider the well-known automobile MPG data (ftp://ics.uci.edu/pub/machine-learning-databased/auto-mpg) with the output being the automobile's fuel consumption
Fig. 20. The performance of the proposed model with prototypes defined by FCM in terms of environmental parameters such as the number of rules and the number of layers for training data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
expressed in miles per gallon. The dataset includes 392 input–output pairs (after removing incomplete instances), where the input space involves eight input variables. Fig. 7 shows the performance of the proposed model for the training data according to the experiment environmental parameters such as k and L (the role of these parameters is explained above). The prototypes of the triangular fuzzy sets for the proposed model, named "fuzzy linear regression based on PNNs", are defined by Fuzzy C-Means clustering. The performance of the proposed model for the testing data is shown in Fig. 8. In Fig. 9, we show the topology of the PNNs for fuzzy linear regression when the number of layers is 3 and the number of rules is 3. The final output Ŷ of the networks depicted in Fig. 9 is a fuzzy set, not a numerical value. The final output Ŷ is calculated by using the following expression:

Ŷ = 54.6562 - 0.109*X1 - 5.673e-4*X2 - 0.0122*X3 - 0.009*X4   (22)

To calculate the numerical output of the networks, numerical inputs should be substituted for the fuzzy sets X1, X2, X3, X4 in the networks.
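Substituting numerical inputs into Eq. (22) is then a direct evaluation of the polynomial center (the minus signs, lost in the original typesetting, are assumed here; the spreads of the fuzzy output are omitted for brevity):

```python
def predict(x1, x2, x3, x4):
    """Evaluate the center of the fuzzy output of Eq. (22) at a numeric input.
    Coefficient signs are assumed from context; spreads are not modeled."""
    return (54.6562 - 0.109 * x1 - 5.673e-4 * x2
            - 0.0122 * x3 - 0.009 * x4)
```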
Up to now, the fuzzy sets used as the inputs of the fuzzy linear regression have been defined by using the FCM approach. The modeling performance of fuzzy linear regression mainly depends on the information granules defined by the fuzzy sets. If the information granules represent the characteristics of the given data set well, the modeling performance of the fuzzy linear regression improves. Figs. 10 and 11 show the performance of the proposed model with regard to the values of the parameters such as k and L (the role of these parameters has been explained above) for the training data and the testing data, respectively. The prototypes of the triangular fuzzy sets for the proposed model, named "fuzzy linear regression based on PNNs", are constructed by running the PSO algorithm. Fig. 11 illustrates the performance of the proposed model with prototypes optimized by PSO for the testing data.

(2) Boston Housing Data
This dataset concerns real estate in the Boston area (ftp://ftp.ics.uci.edu/pub/machine-learningdatabases/housing/housing.data). There are 13 continuous attributes and 1 binary-valued attribute. The output variable MEDV denotes the median value of owner-occupied homes. The dataset provides 506 instances.
Fig. 21. Performance of the model versus number of layers. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
Fig. 12 shows the performance of the proposed model for some selected values of the parameters. Fig. 13 displays the performance of the proposed model for the testing data, where the prototypes were formed by the FCM. The performance of the model is quantified through the plots shown in Figs. 14 and 15.

(3) Medical Imaging System (MIS) Data
This data set consists of 390 software modules written in Pascal and FORTRAN. It uses eleven input variables being typical software measures that include the number of lines of code, the number of characters, the McCabe complexity measure (known as cyclomatic complexity), and a bandwidth complexity measure. The performance of the proposed model with prototypes of fuzzy sets defined by the fuzzy C-means clustering approach is shown in Fig. 16. Fig. 17 illustrates the performance of the proposed model with prototypes defined by FCM for the testing data. The performance of the model is shown in Figs. 18 and 19.

(4) CPU Performance Data
This dataset deals with data describing the relative central processing unit (CPU) performance of computer hardware
(ftp://ftp.ics.uci.edu/pub/marchine-learning-databases/cpuperfomance/). Each data point characterizes a published relative performance (PRP) of the hardware, expressed in terms of its six features. The PRP is the output variable. The obtained results are reported in Fig. 20. Here the prototypes of the triangular fuzzy sets are constructed by running the FCM method. The details as to the performance of the model are covered in Fig. 21. Figs. 22 and 23 show the performance of the proposed model with the prototypes optimized by PSO. Table 2 summarizes the performance of the proposed model vis-à-vis the performance of other models such as linear regression and fuzzy linear regression optimized by the gradient descent approach (Bargiela et al., 2007) for the four data sets. The performance of "Fuzzy Linear Regression with GD" is worse than that of linear regression. Note, however, that the fuzzy linear regression with GD focuses on the minimization of the objective function (17), not the RMSE index (21). This choice of objective function is what degrades the RMSE performance of the fuzzy linear regression whose coefficients are estimated by the gradient descent method.
Fig. 22. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for training data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
Table 2. Results of comparative analysis. Columns: model, design parameters, testing data RMSE (mean ± std). The bold and italic values in the original mark the best performance.

Automobile MPG data
  Linear regression                    O = 1           3.401 ± 0.462
  Fuzzy linear regression with GD      R = 10          11.560 ± 1.491
  Proposed model (FCM)                 R = 20, L = 5   3.459 ± 0.534
  Proposed model (PSO)                 R = 30, L = 3   3.334 ± 0.587

Boston housing data
  Linear regression                    O = 1           4.886 ± 0.488
  Fuzzy linear regression with GD      R = 15          14.500 ± 2.472
  Proposed model (FCM)                 R = 25, L = 5   5.899 ± 1.531
  Proposed model (PSO)                 R = 5, L = 5    4.974 ± 0.7728

Medical imaging system data
  Linear regression                    O = 1           6.815 ± 1.367
  Fuzzy linear regression with GD      R = 30          19.106 ± 6.288
  Proposed model (FCM)                 R = 10, L = 3   6.995 ± 1.306
  Proposed model (PSO)                 R = 20, L = 5   6.495 ± 1.447

CPU performance data
  Linear regression                    O = 1           71.289 ± 26.333
  Fuzzy linear regression with GD      R = 30          176.319 ± 31.449
  Proposed model (FCM)                 R = 25, L = 5   77.584 ± 29.409
  Proposed model (PSO)                 R = 20, L = 5   62.146 ± 21.958
Fig. 23. The performance of the proposed model with prototypes defined by PSO in terms of environmental parameters such as the number of rules and the number of layers for testing data. [Panels (a)-(d): RMSE versus the number of rules for L = 2, 3, 4, 5.]
6. Conclusions

We have presented a new estimation approach for fuzzy linear regression. The proposed approach is based on Polynomial Neural Networks, which are hierarchically grown networks whose final model is represented by a polynomial. From the experiments, we can conclude that the proposed estimation approach can handle the issue of negative spreads, which makes the analytic estimation of the parameters of fuzzy linear regression impossible when the number of input variables is large. The experiments ascertain that the fuzzy linear regression based on Polynomial Neural Networks performs better than the fuzzy linear regression optimized by the gradient descent approach.

Acknowledgement

This paper was supported by Wonkwang University in 2009.

References

Abido, M. A. (2002). Optimal design of power-system stabilizers using particle swarm optimization. IEEE Transactions on Energy Conversion, 17(3), 406–413.
Aliev, R. A., Guirimov, B. G., Fazlollahi, B., & Aliev, R. R. (2009). Evolutionary algorithm-based learning of fuzzy neural networks. Fuzzy Sets and Systems, 160, 2553–2566.
Angeline, P. (1998). Evolutionary optimization versus particle swarm optimization: Philosophy and performance differences. In Proc. 7th ann. conf. evolutionary program (pp. 601–610).
Bardossy, A. (1990). Note on fuzzy regression. Fuzzy Sets and Systems, 37, 65–75.
Bardossy, A., Bogardi, I., & Duckstein, L. (1990). Fuzzy regression in hydrology. Water Resources Research, 26, 1497–1508.
Bargiela, A., Pedrycz, W., & Nakashima, T. (2007). Multiple regression with fuzzy data. Fuzzy Sets and Systems, 158, 2168–2188.
Cheng, C. B., & Lee, E. S. (2001). Fuzzy regression with radial basis function network. Fuzzy Sets and Systems, 119(2), 291–301.
Cherkassky, V., Gehring, D., & Mulier, F. (1996). Comparison of adaptive methods for function estimation from samples. IEEE Transactions on Neural Networks, 7, 969–984.
Diamond, P., & Koerner, R. (1997). Extended fuzzy linear models and least squares estimates. Computers and Mathematics with Applications, 33(9), 15–32.
Dickerson, J. A., & Kosko, B. (1996). Fuzzy function approximation with ellipsoidal rules. IEEE Transactions on Systems, Man, and Cybernetics, SMC-26, 542–560.
Guvenir, H. A., & Uysal, I. (2000). Regression on feature projections. Knowledge-Based Systems, 13, 207–214.
Imoto, S., Yabuuchi, Y., & Watada, J. (2008). Fuzzy regression model of R&D project evaluation. Applied Soft Computing, 8, 1266–1273.
Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, SMC-1, 364–378.
Ivakhnenko, A. G., & Madala, H. R. (1994). Inductive learning algorithms for complex systems modeling. Boca Raton, FL: CRC Press.
Ivakhnenko, A. G., & Ivakhnenko, G. A. (1995). The review of problems solvable by algorithms of the group method of data handling (GMDH). Pattern Recognition and Image Analysis, 5(4), 527–535.
Ivakhnenko, A. G., Ivakhnenko, G. A., & Muller, J. A. (1994). Self-organization of neural networks with active neurons. Pattern Recognition and Image Analysis, 4(2), 185–196.
Juang, C. F., Chung, I. F., & Hsu, C. H. (2007). Automatic construction of feedforward/recurrent fuzzy systems by clustering-aided simplex particle swarm optimization. Fuzzy Sets and Systems, 158, 1979–1996.
Juang, C. F., & Wang, C. Y. (2009). A self-generating fuzzy system with ant and particle swarm cooperative optimization. Expert Systems with Applications, 36, 5362–5370.
Kaveh, A., & Lakanejadi, K. (2011). A novel hybrid charge system search and particle swarm optimization method for multi-objective optimization. Expert Systems with Applications, 38, 15475–15488.
Kennedy, J. (1997). The particle swarm: Social adaptation of knowledge. In Proc. IEEE int. conf. evolutionary comput. (pp. 303–308).
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proc. IEEE int. conf. neural networks IV (pp. 1942–1948).
Oh, S. K., & Pedrycz, W. (2002). The design of self-organizing Polynomial Neural Networks. Information Sciences, 141, 237–258.
Oh, S. K., Pedrycz, W., & Park, B. J. (2003). Polynomial Neural Networks architecture: Analysis and design. Computers and Electrical Engineering, 29(6), 703–725.
Ozcan, E., & Mohan, C. (1998). Analysis of a simple particle swarm optimization system. Intelligent Engineering Systems Through Artificial Neural Networks, 8, 253–258.
Park, H. S., Pedrycz, W., & Oh, S. K. (2007). Evolutionary design of hybrid self-organizing fuzzy polynomial neural networks with the aid of information granulation. Expert Systems with Applications, 33, 830–846.
Pedrycz, W., & Hirota, K. (2007). Fuzzy vector quantization with the particle swarm optimization: A study in fuzzy granulation–degranulation information processing. Signal Processing, 87, 2061–2074.
Seiffert, U., & Michaelis, B. (2002). On the gradient descent in back propagation and its substitution by a genetic algorithm. In Proceedings of the IASTED international conference applied informatics, Austria (pp. 821–826).
Shi, Y., & Eberhart, R. (1998). Parameter selection in particle swarm optimization. In Proc. 7th ann. conf. evolutionary program (pp. 591–600).
Tanaka, H., Uejima, S., & Asai, K. (1982). Linear regression analysis with fuzzy model. IEEE Transactions on Systems, Man, and Cybernetics, 12, 903–907.
Toyoura, Y., Watada, J., Khalid, M., & Yusof, R. (2004). Formulation of linguistic regression model based on natural words. Soft Computing, 681–688.
Watada, J. (2001). The thought and model of linguistic regression. In Proceedings of the 9th world congress of the international fuzzy systems association, Canada (pp. 340–346).
Yu, J. R., & Lee, C. W. (2010). Piecewise regression for fuzzy input–output data with automatic change-point detection by quadratic programming. Applied Soft Computing, 10, 111–118.