
Design of Hybrid Radial Basis Function Neural Networks (HRBFNNs) Realized With the Aid of Hybridization of Fuzzy Clustering Method (FCM) and Polynomial Neural Networks (PNNs)

Wei Huang 1,2, Sung-Kwun Oh 3, Witold Pedrycz 4,5

1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
2 School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin 300384, China
3 Department of Electrical Engineering, The University of Suwon, San 2-2 Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, South Korea
4 Department of Electrical & Computer Engineering, University of Alberta, Edmonton T6R 2G7, Canada
5 Systems Science Institute, Polish Academy of Sciences, Warsaw, Poland

Abstract
In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of a fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering, used to form information granules, is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, a genetic algorithm (GA) is exploited here to optimize the essential design parameters of the network, including the fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs. To reduce the dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments in which we use several modeling benchmarks of different levels of complexity (different numbers of input variables and of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy than that produced by some models reported previously in the literature.

Keywords: Hybrid Radial Basis Function Neural Networks (HRBFNNs), Fuzzy Clustering Method (FCM), Polynomial fuzzy neurons (PFNs), Principal Component Analysis (PCA), Genetic algorithm (GA)

1. Introduction

The past decades have witnessed a continuously growing interest in the development of diverse models based on techniques of Computational Intelligence (CI), whose predominant technologies include neural networks, fuzzy sets, and evolutionary methods [1-3]. The techniques of CI gave rise to a number of new methodological insights and increased our awareness of the crucial tradeoffs that one has to consider in system modeling [4-5]. As one of the successful classes of neural networks based on the principles of CI, Radial Basis Function Neural Networks (RBFNNs) have been applied in many fields such as engineering, medical engineering, and social science [6-7]. In the design of RBFNNs, the activation functions of the hidden nodes are typically realized by using Gaussian functions, while the output comes as a weighted sum of the activation levels of the individual RBFs [8], [9]. Such an architecture, albeit quite flexible, is not free from drawbacks. One limitation is that the discriminant functions generated by RBFNNs have a relatively simple geometry, which is implied by the limited geometric variability of the underlying receptive fields (radial basis functions) located at the hidden layer of the RBF network [10].

To enhance the capabilities of information granulation, Fuzzy Radial Basis Function Neural Networks (FRBFNNs) have been proposed. Unlike in the RBFNNs, in the FRBFNNs FCM is used to realize information granulation in the premise part. FRBFNNs exhibit some advantages, including global optimal approximation and classification capabilities and a rapid convergence of the learning procedures; see [11-12]. However, FRBFNNs cannot easily generate complex nonlinear functions (e.g., high-order nonlinear discriminant functions). Some enhancements of the FRBFNN have been proposed, such as Polynomial Fuzzy Radial Basis Function Neural Networks (PFRBFNNs), yet the problem of generating high-order discriminant functions remains open.

The Group Method of Data Handling (GMDH) developed by Ivakhnenko [13] is a convenient vehicle for identifying high-order nonlinear relations between input and output variables. GMDH forms a viable alternative to other models, as such networks can easily cope with the nonlinear nature of data; this becomes an evident advantage over low-order polynomials, such as linear and quadratic functions, which are usually used in a "standard" regression polynomial fuzzy model [14]. Pioneering studies by Mital [15], Nagata et al. [16], Iwasaki et al. [17], and Elattar et al. [18] led to different improved GMDH models. In spite of their advantages, GMDH models come with some limitations [19]: first, they tend to generate quite complex polynomials even for relatively simple relationships; second, they tend to produce an overly complex network when it comes to highly nonlinear systems due to their limited generic structure, and they do not generate a highly versatile structure if there are fewer than three input variables. To overcome these limitations, in a previous

study, we proposed an architecture of Fuzzy Polynomial Neural Networks (FPNNs) [20]. In the sequel, Radial Basis Function-based Polynomial Neural Networks (RBF-PNNs) [21], which are a certain type of FPNNs with radial basis function neurons, were developed as well. Nevertheless, in most cases all these advanced models are still not free from limitations.

In this study, we develop hybrid radial basis function neural networks (HRBFNNs), which use a fuzzy clustering method (Fuzzy C-Means, FCM, to be specific) to form information granules for the premise part of the network. The consequent parts of HRBFNNs are designed by using fuzzy polynomial neural networks. Along with the enhanced architecture, we take advantage of both polynomial neural networks and fuzzy clustering. The FPNNs of the HRBFNNs are formed with the use of polynomial fuzzy neurons (PFNs). Furthermore, a genetic algorithm (GA) is exploited to optimize the parameters of HRBFNNs. To evaluate the performance of the proposed model, we discuss several experimental studies exploiting well-known data already used in the realm of neuro-fuzzy modeling. Principal component analysis (PCA) is utilized to preprocess the data. The obtained results demonstrate the superiority of the proposed networks over models developed previously.

This paper is organized in the following manner. Section 2 introduces the underlying idea of hybrid radial basis function neural networks. Section 3 describes the architecture of HRBFNNs. Section 4 gives an overall description of the detailed design methodology of HRBFNNs using genetic optimization. Section 5 reports on a comprehensive set of experiments. Finally, concluding remarks are covered in Section 6.

2. Hybrid Radial Basis Function Neural Networks: an idea

In this section, we first briefly review several essential classes of networks, namely the RBFNN, the PFRBFNN, and the FPNN, and then focus on the underlying idea of HRBFNNs.

2.1 Background of Hybrid Fuzzy Polynomial Neural Networks
1) RBFNN. The RBFNN, introduced by Broomhead and Lowe, is an artificial neural network that embraces three different layers: an input layer, a hidden layer with a nonlinear RBF activation function, and a linear output layer (viz. neurons with linear activation functions).
Input layer: the input variables are represented as a vector of real numbers.
Hidden layer: this layer consists of several hidden neurons, whose RBF activation functions are typically realized in the form of some Gaussian function. The center and width (spread) determine the location of an RBF in the input space, and the number of receptive fields is provided in advance. Fuzzy clustering (here FCM) may be used to determine the number of RBFs, the positions of their centers, and the values of the widths of these functions [8], [9].
Output layer: the output of the RBFNN is a weighted sum of the activation levels of the individual RBFs. A learning algorithm (e.g., a gradient method) is used to adjust the weights of the links coming from the hidden layer to the output layer [8], [9].
2) PFRBFNN. The structure of the PFRBFNN is similar to the conventional RBFNN, while there are two key differences between them. First, FCM is used to form the receptive fields (RBFs) in the PFRBFNN. In this manner, the RBFs do not assume any explicit functional form (such as Gaussian, ellipsoidal, triangular, and others) but are of implicit nature, as the activation levels (degrees of membership) result from computing the relevant distances between data and the corresponding prototypes (centers of the receptive fields) [10], [39]. Second, four types of polynomials, namely constant (zero-order), linear (first-order), quadratic (second-order), and reduced quadratic (second-order), are used as the local models [10]. The reason for using the reduced quadratic form is that in the case of quadratic functions the dimensionality of the problem increases quite quickly, especially when dealing with a large number of input variables. As pointed out by Jang, RBFNNs and fuzzy inference systems having rules with constant consequents (a zero-order Sugeno model) are functionally equivalent [22]. Along with this observation, a PFRBFNN can also be represented as follows:

$$R^i: \text{IF } \mathbf{x} \text{ is in cluster } A_i \text{ THEN } y_i = f_i(\mathbf{x}) \qquad (1)$$

where $R^i$ is the ith fuzzy rule, i = 1, ..., n, n being the number of fuzzy rules (the number of clusters), and $f_i(\mathbf{x})$ is the consequent polynomial of the ith fuzzy rule, i.e., a local model representing the input-output relationship of the ith sub-space (local area). The highest polynomial order of $f_i(\mathbf{x})$ is equal to 2.
3) FPNN. The FPNN is based on the GMDH and is a dynamically formed model, whose topology is neither predefined nor kept unchanged [20-21]. Unlike static neural networks, an FPNN consists of several different layers, namely an input layer, an output layer, and several hidden (intermediate) layers (e.g., 1st layer, 2nd layer, 3rd layer, etc.). Each intermediate layer includes several polynomial neurons (PNs) with multiple inputs and one output. The output of a PN in the current layer is used as the input of PNs located at the next layer. In particular, the inputs of the PNs in the 1st layer are the input variables of the input layer. The output of a PN is one of a class of base polynomials, such as linear, quadratic, etc. Assuming that $X = (x_1, x_2, \ldots, x_l)$ is the input of a PN, the output of the PN is one type of base polynomial whose general form can be described as follows:

$$y = f(x_1, x_2, \ldots, x_l) = a_0 + \sum_{i=1}^{l} a_i x_i + \sum_{i=1}^{l}\sum_{j=1}^{l} a_{ij} x_i x_j \qquad (2)$$

Furthermore, consider that the input-output data are given in the following fashion:
$$(X_i, y_i) = (x_{1i}, x_{2i}, \ldots, x_{li}, y_i), \quad i = 1, 2, \ldots, N \qquad (3)$$

The estimated output $\hat{y}$ of the FPNN reads as follows:
$$\hat{y} = \hat{f}(x_1, x_2, \ldots, x_l) = a_0 + \sum_{i=1}^{l} a_i x_i + \sum_{i=1}^{l}\sum_{j=1}^{l} a_{ij} x_i x_j + \sum_{i=1}^{l}\sum_{j=1}^{l}\sum_{k=1}^{l} a_{ijk} x_i x_j x_k + \cdots \qquad (4)$$
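To make the general form of (2) concrete, the following sketch (our illustration in Python; the function name and all coefficient values are made up for the example) evaluates the second-order base polynomial of a PN for a single input vector:

```python
import numpy as np

def base_polynomial(x, a0, a, A):
    """Second-order base polynomial of a PN (cf. Eq. (2)):
    y = a0 + sum_i a_i x_i + sum_i sum_j a_ij x_i x_j."""
    x = np.asarray(x, dtype=float)
    return a0 + a @ x + x @ A @ x

# Toy example with two inputs; the coefficients are purely illustrative.
x = np.array([0.5, -1.0])
a0 = 0.1
a = np.array([0.2, -0.3])       # linear coefficients a_i
A = np.array([[0.05, 0.01],     # quadratic coefficients a_ij
              [0.01, 0.02]])
print(base_polynomial(x, a0, a, A))
```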

2.2 A Concept of Hybrid Radial Basis Function Neural Networks

In the "conventional" RBFNNs, the input variables coming from the experimental data are used directly. It is apparent that this methodology may limit the approximation abilities. By using FCM to form the receptive fields (RBFs), the FRBFNN clearly has the advantage of providing a thorough coverage of the entire input space in comparison with the RBFNN. However, the resulting accuracy of the model remains limited due to the low order (less than or equal to two) of the polynomials used in the local models of the PFRBFNN. If higher-order polynomials were used as local models, one could easily develop a fuzzy model with higher approximation ability once the coefficients of such a model had been estimated. Nevertheless, estimating the coefficients of such high-order local models could become very difficult. When constructing such a nonlinear model of high-order character, the FPNN introduced by Oh [20] comes as one of the effective neural networks that help alleviate the problem. In other words, estimating the coefficients of high-order polynomials (local models) can, to some extent, be realized by using the FPNN. In this regard, we can develop an HRBFNN, which applies the FPNN as the consequent part of the PFRBFNN. The HRBFNN is expressed in the form

$$R^i: \text{IF } \mathbf{x} \text{ is in cluster } A_i \text{ THEN } y_i = a_0 + \sum_{i=1}^{l} a_i x_i + \sum_{i=1}^{l}\sum_{j=1}^{l} a_{ij} x_i x_j + \sum_{i=1}^{l}\sum_{j=1}^{l}\sum_{k=1}^{l} a_{ijk} x_i x_j x_k + \cdots \qquad (5)$$

Furthermore, there remain two open problems in the design of the HRBFNN. First, the number of PNs goes up substantially with an increasing number of input variables. Second, several parameters need to be determined in the design of the neural network. Here PCA is used to reduce the dimensionality of the input space, while the GA is utilized to obtain a "good choice" of the parameters.

3. Architecture of Hybrid Radial Basis Function Neural Networks

In (5), we proposed a novel architecture of hybrid radial basis function neural networks. The general architecture of HRBFNNs consists of five functional components: input, premise part, conclusion part, aggregation part, and output, as shown in Figure 1(a). The premise and conclusion parts relate to the formation of the fuzzy rules. The aggregation part is concerned with the fuzzy inference (mapping procedure). It has to be stressed that the HRBFNNs become the "conventional" polynomial radial basis function neural networks (pRBFNNs) [10] when there is only a single node in the FPNN. In other words, the HRBFNNs emerge as extended neural networks of pRBFNNs. It has to be noted that the fuzzy polynomial neural network also exhibits a new architecture, which is essentially an improved fuzzy polynomial neural network (FPNN) [20-21] with novel polynomial fuzzy neurons. In Figure 1, k stands for the number of input variables coming from the original system, n is the number of clusters resulting from the partitioning of the input space by FCM, $w_i$ ($i \in [1, n]$) are the membership grades of the fuzzy clusters, and m is the number of PFNs used for constructing the PFNs of the next layer. $Z_{i1}, Z_{i2}, \ldots, Z_{iP}$ are new virtual input variables, which are the output variables of PFNs. As the consequent part of HRBFNNs, we use a linear function expressed in the following way:

$$F_j(Z_{i1}, Z_{i2}, \ldots, Z_{iP}) = a_{j0} + a_{j1} Z_{i1} + a_{j2} Z_{i2} + \cdots + a_{jP} Z_{iP}, \quad (i = 1, 2, \ldots, m;\; j = 1, 2, \ldots, n) \qquad (6)$$

where P denotes the number of virtual input variables; this number is selected through genetic optimization (see Section 4). In this study, Fuzzy C-Means is used to partition the input space to form the premise part, while the minimization of the LSE is used to estimate the coefficients of the polynomials in the aggregation part. In general, the development of an effective neural network becomes more complicated in the presence of an increasing number of input variables. The use of PCA [23], a viable and quite commonly considered vehicle to carry out dimensionality reduction, is a sound choice in our study. The PCA is thus used to preprocess the data sets to realize data reduction and extraction of features.

3.1 Partition of the Input Space via Fuzzy C-Means (FCM)
In the design of the premise part, Fuzzy C-Means (FCM) is utilized to partition the space of input data. The FCM clustering method is widely used as a vehicle to cluster data into overlapping clusters described by a certain partition matrix. Let us briefly recall the essence of the FCM method. Consider the data set $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p, \ldots, \mathbf{x}_m\}$, $\mathbf{x}_p = \{x_{p1}, x_{p2}, \ldots, x_{pj}, \ldots, x_{pl}\}$, $1 \le p \le m$, $1 \le j \le l$, where m is the number of input data and l is the number of input variables. The input vectors of $X$ are partitioned into n clusters, where the clusters are represented by the prototypes (center values) $\mathbf{v}_i = \{v_{i1}, v_{i2}, \ldots, v_{il}\}$, $1 \le i \le n$. A fuzzy partition matrix $U = [u_{ij}]$ satisfies the following conditions:
$$\sum_{i=1}^{n} u_{ij} = 1, \quad 1 \le j \le m \qquad (7)$$
and
$$0 < \sum_{j=1}^{m} u_{ij} < m, \quad 1 \le i \le n \qquad (8)$$

The FCM algorithm minimizes the following objective function:
$$J_m = \sum_{i=1}^{n}\sum_{j=1}^{m} (u_{ij})^r\, d(\mathbf{x}_j, \mathbf{v}_i), \quad 1 < r \qquad (9)$$

where r is the fuzzification coefficient (r > 1) and $d(\mathbf{x}_p, \mathbf{v}_i)$ is a distance between the input vector $\mathbf{x}_p \in X$ and the prototype (centroid) $\mathbf{v}_i$. The distance is taken as the weighted Euclidean distance:
$$d(\mathbf{x}_p, \mathbf{v}_i) = \|\mathbf{x}_p - \mathbf{v}_i\|^2 = \sum_{j=1}^{l} \frac{(x_{pj} - v_{ij})^2}{\sigma_j^2} \qquad (10)$$
In (10), $\sigma_j^2$ is the variance of the jth input variable.


Fig. 1. Architecture of HRBFNNs: (a) the overall network, with input, premise part (PCA and FCM), conclusion part (an FPNN built from PFNs), aggregation part, and output; (b) the structure of a single PFN.

Under the assumption of the weighted Euclidean distance, the necessary conditions for solutions (U, V) of $\min\{J_m(\mathbf{U}, \mathbf{V})\}$ come in the form
$$u_{ip} = w_{ip} = \frac{1}{\displaystyle\sum_{j=1}^{n} \left( \frac{\|\mathbf{x}_p - \mathbf{v}_i\|}{\|\mathbf{x}_p - \mathbf{v}_j\|} \right)^{2/(r-1)}}, \quad 1 \le p \le m, \; 1 \le i \le n \qquad (11)$$
and
$$\mathbf{v}_i = \frac{\sum_{p=1}^{m} u_{ip}^r\, \mathbf{x}_p}{\sum_{p=1}^{m} u_{ip}^r}, \quad 1 \le i \le n \qquad (12)$$
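For concreteness, the alternating updates (11)-(12) together with the weighted distance (10) can be sketched as follows; this is a minimal illustration in Python (the function name, initialization, and stopping tolerance are our own choices), not the authors' implementation:

```python
import numpy as np

def fcm(X, n_clusters, r=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal FCM sketch following Eqs. (7)-(12); d(., .) is the
    variance-weighted Euclidean distance of Eq. (10)."""
    rng = np.random.default_rng(seed)
    sigma2 = X.var(axis=0) + 1e-12                 # sigma_j^2 of Eq. (10)
    U = rng.random((n_clusters, X.shape[0]))
    U /= U.sum(axis=0)                             # enforce condition (7)
    for _ in range(n_iter):
        Ur = U ** r
        V = (Ur @ X) / Ur.sum(axis=1, keepdims=True)            # Eq. (12)
        d = (((X[None, :, :] - V[:, None, :]) ** 2) / sigma2).sum(axis=2)
        d = np.fmax(d, 1e-12)                      # guard against d = 0
        inv = d ** (-1.0 / (r - 1.0))              # Eq. (11), squared-distance form
        U_new = inv / inv.sum(axis=0)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V

# Toy usage: two well-separated groups of 2-D points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
U, V = fcm(X, n_clusters=2)
print(V)   # two prototypes, one near (0, 0) and one near (5, 5)
```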

3.2 Estimating the Coefficients of the Polynomials
In the design of the aggregation part, the least squares error (LSE)-based method is utilized to estimate the values of the parameters of the polynomials forming the local models. The objective function (performance index) reads as follows:
$$J_G = \sum_{k=1}^{m} \left( y_k - \sum_{i=1}^{n} w_{ik} f_i(\mathbf{x}_k) \right)^2 \qquad (13)$$

where $w_{ik}$ is the normalized firing (activation) level of the ith rule. The performance index $J_G$ can be expressed in a concise matrix form as follows:
$$J_G = (\mathbf{Y} - \mathbf{X}\mathbf{a})^T (\mathbf{Y} - \mathbf{X}\mathbf{a}) \qquad (14)$$

where $\mathbf{a}$ is the vector of the coefficients of the polynomials, $\mathbf{Y}$ is the output vector of the data, and $\mathbf{X}$ is a matrix that arranges the input data, information granules (centers of each cluster), and activation levels. In case all consequent polynomials are linear (first-order polynomials), $\mathbf{X}$ and $\mathbf{a}$ can be expressed as follows:
$$\mathbf{X} = \begin{bmatrix} u_{11} & u_{11}x_{11} & \cdots & u_{11}x_{l1} & \cdots & u_{n1} & u_{n1}x_{11} & \cdots & u_{n1}x_{l1} \\ \vdots & & & & & & & & \vdots \\ u_{1m} & u_{1m}x_{1m} & \cdots & u_{1m}x_{lm} & \cdots & u_{nm} & u_{nm}x_{1m} & \cdots & u_{nm}x_{lm} \end{bmatrix}$$
where the first block of columns corresponds to the first fuzzy rule and the last block to the nth fuzzy rule, and
$$\mathbf{a} = [a_{10}\; a_{11}\; \cdots\; a_{1l}\; \cdots\; a_{n0}\; a_{n1}\; \cdots\; a_{nl}]^T$$

The optimal values of the coefficients of the consequents are determined in the form
$$\mathbf{a} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y} \qquad (15)$$
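As a minimal sketch of (15) for linear consequents (the function and variable names are ours; `np.linalg.lstsq` replaces the explicit normal-equation inverse for numerical robustness):

```python
import numpy as np

def lse_consequents(U, X_data, y):
    """Estimate the consequent coefficients a of Eq. (15).
    U      : (n, m) normalized activation levels w_ik
    X_data : (m, l) input data
    y      : (m,)   target outputs
    Row k of the regression matrix stacks, for each rule i,
    the block [w_ik, w_ik*x_k1, ..., w_ik*x_kl]."""
    n, m = U.shape
    l = X_data.shape[1]
    Xmat = np.empty((m, n * (l + 1)))
    for i in range(n):
        Xmat[:, i * (l + 1)] = U[i]                                 # w_ik * 1
        Xmat[:, i * (l + 1) + 1:(i + 1) * (l + 1)] = U[i][:, None] * X_data
    # a = (X^T X)^(-1) X^T Y; lstsq solves the same least-squares problem.
    a, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return a
```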

3.3 Developing Fuzzy Polynomial Neural Networks via Polynomial Fuzzy Neurons
In the design of the conclusion part, we note that the PFN is essentially a fuzzy radial basis function neural network. In the second and successive layers, PFNs are utilized as the generic processing units, and their outputs are viewed as virtual input variables, which are no longer the input variables of the original system. The input set of a PFN is a collection of a specific subset of the input variables coming from the original system. The PFNs are developed based on fuzzy inference and fuzzy partitions (spaces). In some sense, the general topology of PFNs comes as an extended structure of the conventional fuzzy inference system (FIS), as shown in Figure 1. Unlike the conventional FIS, the premise part of PFNs is realized with the aid of information granulation, while the consequence part of PFNs is constructed based on different types of polynomials. In the design of the premise part, information granules (the centers of the individual clusters) and activation levels (degrees of membership) are determined by means of the FCM. In the design of the consequence part, four types of polynomials, namely a constant type (a zero-order polynomial), a linear type (a first-order polynomial), a quadratic type (a second-order polynomial), and a modified quadratic type (a modified second-order polynomial), are viewed as local models representing the input-output relationship in the sub-space of the corresponding antecedent. One of the four types is selected for each sub-space as the result of the optimization, which will be described later in this study. It has to be stressed that, in comparison with the conventional fuzzy neurons in the FPNN [20], the PFNs do not suffer from the curse of dimensionality (as all variables are considered en-block) [10], [28]. As a result, we may construct more accurate and compact models with only a small number of fuzzy rules by using relatively high-order polynomials. More specifically, PFNs can be represented in the form of "if-then" fuzzy rules:
$$R^i: \text{IF } \mathbf{x}_k \text{ is } A_i \text{ THEN } y_{ki} = f_i(\mathbf{x}_k) \qquad (16)$$
where $R^i$ is the ith fuzzy rule, i = 1, ..., n, n is the number of fuzzy rules (the number of clusters), $f_i(\mathbf{x}_k)$ is the consequent polynomial of the ith fuzzy rule, i.e., a local model representing the input-output relationship of the ith sub-space (local area), $w_{ik}$ is the degree of membership (activation level) of the ith local model, and $\mathbf{v}_i = [v_{i1}\; v_{i2}\; \cdots\; v_{il}]^T$ is the ith prototype.

Table 1. Types of polynomials used in the local models (conclusions)

| Order | Type | Polynomial |
|---|---|---|
| 1 | C: Constant | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0}$ |
| 2 | L: Linear | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0} + a_{i1}x_{k1} + a_{i2}x_{k2} + \cdots + a_{il}x_{kl}$ |
| 3 | Q: Quadratic | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0} + a_{i1}x_{k1} + \cdots + a_{il}x_{kl} + a_{i(l+1)}x_{k1}^2 + \cdots + a_{i(2l)}x_{kl}^2 + a_{i(2l+1)}x_{k1}x_{k2} + \cdots + a_{i((l+1)(l+2)/2)}x_{k(l-1)}x_{kl}$ |
| 4 | M: Modified quadratic | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0} + a_{i1}x_{k1} + \cdots + a_{il}x_{kl} + a_{i(l+1)}x_{k1}x_{k2} + \cdots + a_{i(l(l+1)/2)}x_{k(l-1)}x_{kl}$ |

As mentioned earlier, we admit the consequent polynomials to be in one of the forms shown in Table 1. The numeric output of the model, based on the activation levels of the rules, is given in the form
$$\hat{y}_k = \sum_{i=1}^{n} w_{ik}\, f_i(x_{k1}, x_{k2}, \ldots, x_{kl}) \qquad (17)$$
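The computation of (17) can be sketched as follows for the linear type (L) of Table 1; the function name and the coefficient layout are illustrative assumptions:

```python
import numpy as np

def pfn_output(U, X_data, coeffs):
    """Model output of Eq. (17) with linear local models (type L):
    y_k = sum_i w_ik * (a_i0 + a_i1 x_k1 + ... + a_il x_kl).
    U : (n, m) activation levels; coeffs : (n, l+1) rows [a_i0, ..., a_il]."""
    ones = np.ones((X_data.shape[0], 1))
    F = np.hstack([ones, X_data]) @ coeffs.T    # (m, n): f_i(x_k) per rule
    return (U.T * F).sum(axis=1)                # weighted sum over the rules
```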

The learning of the PFNs involves the development of the premise as well as the consequent part of the rules. The premise part is formed through fuzzy clustering realized in the input space (here the fuzzification coefficient is equal to 2), while the learning of the consequent part is implemented by minimizing the LSE.

4. Genetic Optimization of Hybrid Radial Basis Function Neural Networks

Genetic Algorithms (GAs) have been shown to be useful when dealing with nonlinear and complex problems. A GA is a stochastic search technique based on the principles of evolution, natural selection, and recombination. A GA is exploited here to optimize the parameters of HRBFNNs. In this section, we elaborate on the details of the design method by considering the functionality of FCM, PFNs, and GA in the architecture of HRBFNNs. The design procedure of HRBFNNs comprises the following steps.
[Step 1] Preprocess the data set using PCA
To achieve dimensionality reduction, principal component analysis is exploited here to preprocess the data sets for data reduction and extraction of features.
[Step 2] Form the training and testing data sets
For convenience, an input-output data set is denoted as (xi, yi) = (x1i, x2i, ..., xni, yi), i = 1, 2, ..., N, where N is the number of data points. Here we divide the data set into three parts, namely training data, validation data, and testing data. The training and validation data are used to construct the HRBFNNs, while the testing data are utilized to evaluate the quality of the network.


The objective function (performance index) involves both the training data and the validation data and comes as a convex combination of these two components:
$$f(PI, VPI) = \theta \cdot PI + (1 - \theta) \cdot VPI \qquad (18)$$
where $\theta$ is a weighting factor that allows us to form a sound balance between the performance of the model for the training and validation data. We use two performance indexes, viz. the root mean squared error (RMSE) and the mean squared error (MSE):
$$PI\ (VPI\ \text{or}\ EPI) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_i - y_i^*)^2} \quad (\text{RMSE}) \qquad (19)$$
$$PI\ (VPI\ \text{or}\ EPI) = \frac{1}{m}\sum_{i=1}^{m}(y_i - y_i^*)^2 \quad (\text{MSE})$$
where $y_i^*$ is the output of the neural network, m is the total number of data, and i is the data index.
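A short sketch of these measures follows; the weighting symbol of (18) is rendered here as `theta`, and the value 0.5 is purely illustrative rather than a setting prescribed by the paper:

```python
import numpy as np

def rmse(y, y_hat):
    """RMSE variant of Eq. (19)."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def objective(pi, vpi, theta=0.5):
    """Convex combination of Eq. (18): theta*PI + (1 - theta)*VPI."""
    return theta * pi + (1.0 - theta) * vpi

print(objective(rmse([1, 2, 3], [1.1, 1.9, 3.2]),
                rmse([4, 5], [4.3, 4.8])))
```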

[Step 3] Decide upon the generic parameters used in the conclusion part
Here we decide upon the essential design parameters of HRBFNNs. The choice of the essential design parameters of the FPNN shown in Table 2 allows us to select the best model with regard to the characteristics of the data, the model design strategy, nonlinearity, and predictive capabilities.

Table 2. Summary of design parameters used in the design of the network

| Index | Parameter | Criterion |
|---|---|---|
| FPNN | Number of layers (NL) | 1~3 |
| FPNN | Max. no. of nodes (w) created in a layer | 15 |
| PFN | No. of input variables | 2 |
| PFN | Fuzzification coefficient (f) | 2 |
| PFN | No. of clusters (c) | 2 |
| PFN | Polynomial order (PO) | L, Q, M |

[Step 4] Determine the system's input variables
Define the system's input variables x1, x2, ..., xn related to the output variable y.
[Step 5] Design of PFNs
For the selection of r inputs, the number of nodes (PFNs) to be generated in each layer becomes equal to
$$k = \frac{n!}{(n-r)!\, r!}$$
where n is the number of total inputs and r stands for the number of the chosen input variables. For instance, in the case of PFNs with two inputs and a linear polynomial, we have k partial descriptions such as (20); refer to Figure 1 and Table 1:
$$\hat{z}_k = a_0 + a_1(\mathbf{x}_p - \mathbf{v}_1^T) + a_2(\mathbf{x}_q - \mathbf{v}_2^T), \quad k = 1, 2, \ldots, \frac{n!}{(n-2)!\, 2!} \qquad (20)$$
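The count of PFNs per layer is simply the binomial coefficient; a quick check (our illustration, with an assumed n = 7 candidate inputs):

```python
from itertools import combinations
from math import comb

n, r = 7, 2                                  # 7 candidate inputs, 2 per PFN
print(comb(n, r))                            # k = n!/((n-r)! r!) = 21 PFNs
subsets = list(combinations(range(n), r))    # one input subset per PFN
print(subsets[:3])                           # [(0, 1), (0, 2), (0, 3)]
```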

[Step 6] Check the termination criterion
The size of the network (expressed in terms of the maximal polynomial order for prediction, or the maximal number of PFN layers) is used as the termination condition, which controls the polynomial order of the HRBFNNs. As far as the depth of the network is concerned, the generation process is stopped at a depth of no more than three layers of PFNs. This size of the network has been found experimentally to form a sound compromise between the high accuracy of the resulting neural network and its complexity as well as its generalization abilities.
[Step 7] Select the nodes with the best predictive capability and construct the corresponding layer
To select the nodes (PFNs) exhibiting the highest predictive capability, we use the following steps:
[Step 7-1] Estimate the polynomial coefficients (a0, a1, ..., a5) of each PFN by using the subset of the training data and validation data.
[Step 7-2] Determine the identification error (EPI) of each PFN with the aid of the testing data set.
[Step 7-3] Sort and rearrange all PFNs in ascending order of their error values (EPI1, EPI2, ..., EPIn!/((n-r)!r!)), so that the best-performing nodes come first.
[Step 7-4] Select the best w nodes, which are used for constructing the PFNs of the next layer; w is a predefined number of nodes with better predictive capability (see the sketch after this list). There are two cases as to the number of the preserved nodes in each layer:
i) If the number of the created nodes satisfies n!/((n-r)!r!) > w, then the number of the nodes retained for the next layer is equal to w. Namely, w of all generated nodes are selected according to the low-end order of EPI, and the remaining (n!/((n-r)!r!)) - w nodes are discarded.
ii) If the number of the created nodes satisfies n!/((n-r)!r!) ≤ w, then the number of the retained nodes for the next layer is equal to n!/((n-r)!r!).
Through conditions i) and ii), one can effectively reduce a large number of nodes and avoid a significant computing overhead.
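The node selection announced above reduces to ranking the candidate PFNs by their EPI values and truncating the list at w; a minimal sketch (ours):

```python
def select_nodes(epi_values, w):
    """Steps 7-3/7-4: rank candidate PFNs by testing error EPI
    (best, i.e., lowest, first) and keep at most w of them."""
    ranked = sorted(range(len(epi_values)), key=lambda i: epi_values[i])
    return ranked[:w]                 # indexes of the retained PFNs

print(select_nodes([0.42, 0.17, 0.93, 0.25], w=2))   # -> [1, 3]
```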

(a) Arrangement of the chromosome used in HRBFNNs: [fuzzification coefficient | number of clusters | number of input variables | input variables | ... | input variables]
(b) Example of interpretation of the chromosome [1.10 | 3 | 4 | 6 4 1 3 5 7 2 8]:
① fuzzification coefficient: 1.10
② the number of clusters: 3
③ the number of input variables: 4
④ the selected input variables: [6 4 1 3], where 6, 4, 1, and 3 are indexes of input variables

Fig. 2. Chromosome composition of the genetically optimized HRBFNNs and its interpretation

[Step 8] Select the structure of the HRBFNN by means of genetic optimization
With regard to the optimization of HRBFNNs, the maximal number of layers in the FPNN is set to three. We consider the following parameters: the number of clusters (fuzzy rules), the fuzzification coefficient of FCM, the number of input variables for PFNs, and a collection of the specific subset of input variables. The first two parameters come from the premise part, while the other parameters are from the conclusion part (the first layer of the FPNN). Figure 2 depicts the arrangement of the chromosomes used in the GA optimization. Fig. 2(b) offers an interpretation of the chromosome in the case where the upper bound of the number of fuzzy clusters in the search space is 3: the fuzzification coefficient is 1.10, the number of clusters is 3, and the first four values present in part ④ are selected. The structure selection of HRBFNNs is formed through a series of the following steps (a sketch of the chromosome decoding appears after this list):
[Step 8-1] Form the optimal subset of virtual input variables (Z1, Z2, ..., ZP) with the aid of the GA.
[Step 8-2] Determine the two premise parameters (the fuzzification coefficient and the number of clusters) by means of the GA, and then construct the partition matrix W based on the original input variables x1, x2, ..., xn by using FCM.
[Step 8-3] Compute the output of the neural network by using the weighted average defuzzifier (decoding scheme).
[Step 9] Output the final result
An overall design flowchart of the proposed HRBFNNs is shown in Fig. 3.
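The decoding of a chromosome of Fig. 2 may be sketched as follows (the function and the flat-list layout are our assumptions for illustration):

```python
def decode_chromosome(ch):
    """Decode a chromosome arranged as in Fig. 2:
    [fuzzification coefficient | no. of clusters | no. of input
    variables | candidate input-variable indexes ...]."""
    f = ch[0]                            # fuzzification coefficient
    c = int(ch[1])                       # number of clusters
    k = int(ch[2])                       # number of selected input variables
    selected = [int(v) for v in ch[3:3 + k]]
    return f, c, selected

# The example of Fig. 2(b):
print(decode_chromosome([1.10, 3, 4, 6, 4, 1, 3, 5, 7, 2, 8]))
# -> (1.1, 3, [6, 4, 1, 3])
```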


5. Experimental Studies

To evaluate the performance of the model, the proposed HRBFNNs are experimented with using a series of numerical data sets [28-34]. In the first experiment, the proposed network is applied to the MIS data, a well-known software engineering data set. The second and third experiments proceed with two data sets selected from the Machine Learning repository. Finally, the experimental results for several other real-world problems of varying levels of complexity (different numbers of input variables and available data) are summarized in Appendix I. For each experiment, two design scenarios are considered. In the first case, we completed the experiments based on the original data set (without the PCA reduction), while in the second case the data set is preprocessed by using PCA. In all cases, the experiment was repeated 10 times, leading to a random sub-sampling validation (one round of this split is sketched below). Each data set is divided into three parts: 50% of the data set is used as the training data, 30% is utilized as the validation data, and the remaining 20% is considered as the testing data. For convenience, some notation is summarized as follows: PO is the polynomial order of PFNs, PI denotes the performance index for the training data set, VPI represents the performance for the validation data set, and EPI concerns the performance for the testing data set.
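One round of the 50%-30%-20% random sub-sampling can be sketched as follows (the function name and seeds are illustrative):

```python
import numpy as np

def split_50_30_20(n_samples, seed=None):
    """Randomly split indexes into 50% training, 30% validation,
    and 20% testing, as used in the experiments."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_tr, n_va = int(0.5 * n_samples), int(0.3 * n_samples)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

# Repeated 10 times in the experiments; the seeds here are arbitrary.
splits = [split_50_30_20(390, seed=s) for s in range(10)]
```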

Table 3. Generic parameters of the genetic algorithm (GA)

| Item | Criterion |
|---|---|
| Population size | 100 |
| Crossover probability | 0.65 |
| Mutation probability | 0.1 |
| Generations (iterations) | 300 |
| Selection | Roulette wheel |

Table 3 lists the parameters of the GA. The selection mechanism of the GA uses a roulette wheel, and the generic parameters of the GA are set up as follows: we use 300 generations and a population of 100 individuals for the optimization of HRBFNNs. The crossover rate and the mutation probability are set to 0.65 and 0.1, respectively (the choice of these specific values is a result of intensive experimentation; as a matter of fact, these values are in line with those reported in the literature).


[Fig. 3 (flowchart): (1) preprocess the data using PCA; (2) divide the data into training, validation, and testing sets; (3) set the initial information for constructing the HRBFNN architecture (number of PFN layers, number of nodes in each layer ≤ w, number of input variables 2 ≤ r ≤ 4, number of clusters 2 ≤ c ≤ 5, polynomial type L, Q, M); (4) obtain the entire system's input variables x1, x2, ..., xk; (5) construct the PFNs (PFN1, PFN2, ..., PFNm, with m = k!/((k-r)! r!)); (6) check the termination criterion; (7) select the nodes z1, ..., zp from the m candidates; (8) select the HRBFNN architecture using the GA (selection of input variables, fuzzification coefficient 1.2 ≤ f ≤ 4.0, number of clusters 2 ≤ c ≤ 12), with (8-1) local models, (8-2) Fuzzy C-Means clustering, and (8-3) defuzzification; (9) output the final result ŷ.]

Fig. 3. An overall flowchart of the design of HRBFNNs

5.1 Medical Imaging System (MIS)
In this section, the proposed HRBFNNs are applied to software engineering data. Here we consider the Medical Imaging System data set, which involves 390 software modules written in Pascal and FORTRAN. Each module is described by 11 input variables: total lines of code including comments (LOC), total code lines (CL), total character count (TChar), total comments (TComm), number of comment characters (MChar), number of code characters (DChar), Halstead's program length (N), Halstead's estimated program length (N̂), Jensen's estimator of program length (NF), McCabe's cyclomatic complexity (V(G)), and Belady's bandwidth metric (BW) [28]. The output variable of the model is the number of reported changes, i.e., change reports (CRs). We use the RMSE defined by (19) as the performance index.

Table 4. Performance index of HRBFNNs for the MIS data

(a) Without the use of PCA

| PO (PFN) | PI (NL=1) | VPI (NL=1) | EPI (NL=1) | PI (NL=2) | VPI (NL=2) | EPI (NL=2) | PI (NL=3) | VPI (NL=3) | EPI (NL=3) |
|---|---|---|---|---|---|---|---|---|---|
| L | 0.7074±0.104 | 1.4268±0.204 | 1.0576±0.438 | 0.6995±0.106 | 1.4413±0.209 | 1.0596±0.447 | 0.6996±0.107 | 1.4322±0.198 | 1.0541±0.471 |
| Q | 0.7069±0.115 | 1.5074±0.194 | 1.0000±0.363 | 0.6860±0.099 | 1.5574±0.244 | 1.0356±0.444 | 0.6815±0.093 | 1.4756±0.214 | 1.0585±0.403 |
| M | 0.8077±0.089 | 1.8921±0.204 | 1.1146±0.481 | 0.8023±0.086 | 1.9971±0.305 | 1.1187±0.459 | 0.7861±0.099 | 2.6056±1.598 | 1.1960±0.525 |

(b) With the use of PCA

| PO (PFN) | PI (NL=1) | VPI (NL=1) | EPI (NL=1) | PI (NL=2) | VPI (NL=2) | EPI (NL=2) | PI (NL=3) | VPI (NL=3) | EPI (NL=3) |
|---|---|---|---|---|---|---|---|---|---|
| L | 0.2802±0.049 | 0.5656±0.098 | 0.4264±0.221 | 0.2769±0.048 | 0.5608±0.099 | 0.4299±0.226 | 0.2739±0.048 | 0.5603±0.098 | 0.4321±0.226 |
| Q | 0.2908±0.048 | 0.5618±0.094 | 0.4274±0.231 | 0.2797±0.047 | 0.5559±0.094 | 0.4104±0.219 | 0.2763±0.049 | 0.5545±0.094 | 0.4460±0.259 |
| M | 0.2853±0.049 | 0.5937±0.119 | 0.4668±0.303 | 0.2720±0.051 | 0.6723±0.171 | 0.5493±0.292 | 0.2656±0.051 | 1.9336±3.369 | 3.6293±6.653 |

The results are listed in Table 4; the model with the best PI is considered as the optimal model. As shown in Table 4, the optimal model (in boldface) emerges according to the layer of the assigned PFN. Without the use of PCA, the optimal performance is PI = 0.6815±0.093, VPI = 1.4756±0.214, and EPI = 1.0585±0.403, while with the use of PCA the optimal performance comes with PI = 0.2763±0.049, VPI = 0.5545±0.094, and EPI = 0.4460±0.259, reported for the three-layer FPNN.


Fig. 4. Performance index of HRBFNNs for the MIS data without the use of PCA: (a) training data (PI), (b) validation data (VPI), and (c) testing data (EPI) versus the number of FPNN layers

Fig. 5. Performance index of HRBFNNs for the MIS data with the use of PCA: (a) training data (PI), (b) validation data (VPI), and (c) testing data (EPI) versus the number of FPNN layers

Fig. 6. Optimal HRBFNN architectures for the MIS data: (a) without the use of PCA, where FCM partitions the original input space into two clusters (w1, w2), the first layer of the FPNN is built over the selected inputs x3-x9, and the conclusion uses local functions F1 and F2 of the virtual inputs (z3, z8, z14); (b) with the use of PCA, where the 11 original inputs are reduced to four principal components x1-x4 and the local functions F1 and F2 depend on the virtual inputs (z1, z2, z4)

Table 5. Results of comparative analysis (MIS)

| Model | | PI | VPI | EPI | Index |
|---|---|---|---|---|---|
| SONFN [28] | Simplified | 40.753 | | 17.898 | MSE |
| | Linear | 35.745 | | 17.807 | |
| FPNN [20] | SI = 2 | 32.195 | | 18.462 | MSE |
| | SI = 3 | 32.251 | | 19.622 | |
| GA-based FSONN [29] | PN-based FSONN | 18.043 | | 11.898 | MSE |
| | FPN-based FSONN | 23.739 | | 9.090 | |
| Regression model [30] | All input | 40.05 | | 36.32 | MSE |
| HPGA-optimized FIS [30] | Joint | 35.23 | | 28.03 | |
| | Successive | 32.34 | | 25.98 | |
| Incremental model [28] | Linear regression | 5.877±0.626 | | 6.570±1.024 | RMSE |
| | Incremental model | 4.620±0.896 | | 6.624±0.773 | |
| GO-FPNN [31] (3rd layer) | SI = 2 | 2.2141±0.879 | | 3.4630±1.490 | RMSE |
| | SI = 3 | 0.8852±0.082 | | 3.4690±0.780 | |
| | SI = 4 | 0.7818±0.085 | | 3.4211±1.306 | |
| Our model (HRBFNNs), without PCA, PO = Q | NL = 1 | 0.7069±0.115 | 1.5074±0.194 | 1.0000±0.363 | RMSE |
| | NL = 2 | 0.6860±0.099 | 1.5574±0.244 | 1.0356±0.444 | |
| | NL = 3 | 0.6815±0.093 | 1.4756±0.214 | 1.0585±0.403 | |
| Our model (HRBFNNs), with PCA, PO = Q | NL = 1 | 0.2908±0.048 | 0.5618±0.094 | 0.4274±0.231 | RMSE |
| | NL = 2 | 0.2797±0.047 | 0.5559±0.094 | 0.4104±0.219 | |
| | NL = 3 | 0.2763±0.049 | 0.5545±0.094 | 0.4460±0.259 | |

For the HRBFNN with NL equal to 3, the impact of the number of FPNN layers on the performance of the networks is shown in Fig. 4 and Fig. 5. Increasing the number of layers leads to a reduction of the error for the training and validation sets, while a different tendency holds for the testing set: the training error decreases with the increasing approximation abilities of the HRBFNN, whereas the testing error becomes higher. Fig. 6 compares the optimal topologies of the HRBFNN at the third layer of the FPNN. It is apparent that only four input variables remain with the use of PCA, while the performance of the HRBFNN is better than that of the HRBFNN without the use of PCA. Table 5 reports the performance of the proposed model vis-à-vis the performance of other models. In all cases, the proposed model provides substantially better approximation and generalization capabilities.

5.2 Abalone Data (ABA)
The second data set is the Abalone machine learning data, which concerns predicting the age of abalone from physical measurements [28]. This is a larger data set consisting of 4177 input-output pairs and seven input variables (Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, and Shell weight). The performance index is the MSE defined by (19). For the ABA data set, the performance of the proposed HRBFNNs is summarized in Table 6. As shown there, the tendencies in the performance for PFNs of different polynomial orders are similar to those observed for the MIS data. That is to say, the proposed HRBFNNs lead to stable performance when considering a sound balance between the approximation and generalization capabilities. With the use of PCA, the dimensionality of the input space was reduced from seven to three. It has to be stressed that the performance of the model for which the PCA reduction has been completed is considerably better in comparison with the results produced by the model without the use of PCA.

Table 6. Performance index of HRBFNNs obtained for the ABA data

(a) Without the use of PCA

| PO (PFN) | PI (NL=1) | VPI (NL=1) | EPI (NL=1) | PI (NL=2) | VPI (NL=2) | EPI (NL=2) | PI (NL=3) | VPI (NL=3) | EPI (NL=3) |
|---|---|---|---|---|---|---|---|---|---|
| L | 5.1252±0.342 | 4.3092±0.394 | 3.2088±0.649 | 5.1324±0.359 | 4.2631±0.433 | 3.2170±0.772 | 5.0647±0.375 | 4.2146±0.470 | 3.0610±0.688 |
| Q | 5.1379±0.355 | 4.3177±0.394 | 3.2020±0.660 | 5.0776±0.339 | 4.2137±0.428 | 3.2858±0.895 | 5.0508±0.363 | 4.2367±0.427 | 3.0705±0.778 |
| M | 4.7720±0.293 | 4.1700±0.411 | 3.2713±1.174 | 4.7153±0.295 | 4.1556±0.402 | 3.3350±1.165 | 4.6917±0.298 | 4.1377±0.430 | 3.5023±1.317 |


(b) With the use of PCA

| PO (PFN) | PI (NL=1) | VPI (NL=1) | EPI (NL=1) | PI (NL=2) | VPI (NL=2) | EPI (NL=2) | PI (NL=3) | VPI (NL=3) | EPI (NL=3) |
|---|---|---|---|---|---|---|---|---|---|
| L | 0.1456±0.008 | 0.1664±0.010 | 0.1683±0.045 | 0.1401±0.011 | 0.1640±0.013 | 0.1587±0.356 | 0.1371±0.012 | 0.1605±0.013 | 0.1608±0.045 |
| Q | 0.1403±0.010 | 0.1586±0.012 | 0.1529±0.019 | 0.1231±0.014 | 0.1462±0.016 | 0.1302±0.014 | 0.1202±0.014 | 0.1409±0.019 | 0.1288±0.015 |
| M | 0.1347±0.016 | 0.1309±0.015 | 0.1350±0.016 | 0.1315±0.016 | 0.1333±0.015 | 0.1377±0.018 | 0.1311±0.016 | 0.1332±0.015 | 0.1366±0.015 |

Figures 7 and 8 depict the tendency of the changes in the performance index of the best neural networks. They show that, with the increasing number of layers in the FPNN, the model produces a better approximation, while the prediction ability decreases to some extent.

Fig. 7. Performance index of HRBFNNs for the ABA data without the use of PCA: (a) training data (PI), (b) validation data (VPI), and (c) testing data (EPI) versus the number of FPNN layers

Fig. 8. Performance index of HRBFNNs for the ABA data with the use of PCA: (a) training data (PI), (b) validation data (VPI), and (c) testing data (EPI) versus the number of FPNN layers


Fig. 9. Optimal HRBFNN architectures for the ABA data: (a) without the use of PCA, with the inputs x1, x4, x5, x7 selected and the local functions F1 and F2 depending on the virtual inputs (z2, z3, z7); (b) with the use of PCA, with three principal components x1-x3 as inputs and the local functions depending on the virtual inputs (z1, z2, z3)

Figure 9 visualizes the architectures of the optimal HRBFNNs for the ABA data set. It is clear that the optimal HRBFNN with the use of PCA leads to better performance and a simpler structure in comparison with the HRBFNN without the use of PCA. The comparative analysis in Table 7 contrasts the proposed model with other models; given the approximation and generalization aspects of the model, the proposed networks form the preferred architecture for modeling the ABA data set.


Table 7. Results of comparative analysis (ABA)

| Model | | PI | VPI | EPI |
|---|---|---|---|---|
| Linear regression [32] | | 14.15±0.07 | | 17.22±0.20 |
| RBFN [32] | H=30 | 10.36±0.02 | | 10.48±0.01 |
| RBFN + context-free clustering [32] | H=30 | 10.54±0.01 | | 10.58±0.006 |
| Boosting of granular model [32] | P=5, C=6 | 8.39±0.008 | | 8.68±0.014 |
| RBFNN [33] | | 6.36±0.24 | | 6.94±0.31 |
| RBFNN with context-free clustering [33] | | 5.52±0.25 | | 6.91±0.45 |
| Linguistic modeling [33] | Without optimization | 5.21±0.12 | | 6.14±0.28 |
| | One-loop optimization | 4.80±0.52 | | 5.22±0.58 |
| | Multi-step optimization | 4.12±0.35 | | 5.32±0.96 |
| RBFNN I [34] | H=30 | 4.496±0.195 | | 4.94±0.245 |
| Functional RBFNN [34] | m=2.0; H=33 | 3.846±0.113 | | 4.800±0.235 |
| | m=2.5; H=18 | 4.147±0.127 | | 4.892±0.205 |
| | m=4.0; H=33 | 4.369±0.190 | | 4.643±0.202 |
| PSO-based PNN [22] | | 4.338±0.231 | 4.398±0.282 | 7.997±9.465 |
| PSO-based FPNN [22] | | 4.217±0.142 | 4.340±0.188 | 2e+7±5e+7 |
| RBFPNN [22] | | 3.605±0.169 | 4.561±0.272 | 4.710±0.224 |
| Our model (HRBFNNs), without PCA, PO = M | NL = 1 | 4.7720±0.293 | 4.1700±0.411 | 3.2713±1.174 |
| | NL = 2 | 4.7153±0.295 | 4.1556±0.402 | 3.3350±1.165 |
| | NL = 3 | 4.6917±0.298 | 4.1377±0.430 | 3.5023±1.317 |
| Our model (HRBFNNs), with PCA, PO = Q | NL = 1 | 0.1403±0.010 | 0.1586±0.012 | 0.1529±0.019 |
| | NL = 2 | 0.1231±0.014 | 0.1462±0.016 | 0.1302±0.014 |
| | NL = 3 | 0.1202±0.014 | 0.1409±0.019 | 0.1288±0.015 |

5.3 Boston Housing Data
The third data set is the well-known Boston Housing (BH) data [33-35], which deals with house prices and consists of 506 observations, 13 independent variables, and a single output variable. In our experiments, the performance of the proposed neural network is evaluated in terms of the average and standard deviation of the RMSE. Table 8 summarizes the performance of HRBFNNs; the best performance is shown in bold. There is a clearly visible tendency for the values of the performance index to get lower with the increase of the order of the polynomial. This effect is not surprising, as too many information granules (higher granularity) might contribute to a potential memorization effect. In most cases, an appropriate increase of the polynomial order (complexity) improves the prediction abilities (EPI) of the proposed model. It is also apparent that the PCA preprocessing has an important impact on the enhanced performance of the neural networks.


Table 8. Performance index of HRBFNNs for the BH data

(a) Without the use of PCA

| PO (PFN) | PI (NL=1) | VPI (NL=1) | EPI (NL=1) | PI (NL=2) | VPI (NL=2) | EPI (NL=2) | PI (NL=3) | VPI (NL=3) | EPI (NL=3) |
|---|---|---|---|---|---|---|---|---|---|
| L | 2.1476±0.643 | 4.0877±1.243 | 4.2276±1.639 | 1.9638±0.585 | 3.8616±1.174 | 9.4547±9.108 | 1.8692±0.556 | 3.7816±1.141 | 4.1689±1.267 |
| Q | 2.3483±0.395 | 4.4705±0.763 | 4.3438±0.298 | 1.9672±0.106 | 4.1808±0.662 | 3.9196±0.593 | 1.8839±0.086 | 4.3140±0.782 | 4.2471±0.319 |
| M | 2.5399±0.365 | 4.5112±0.469 | 3.9173±0.455 | 2.0898±0.310 | 4.2564±0.318 | 3.6317±0.213 | 1.9151±0.277 | 4.4086±0.583 | 4.7049±1.406 |

(b) With the use of PCA

| PO (PFN) | PI (NL=1) | VPI (NL=1) | EPI (NL=1) | PI (NL=2) | VPI (NL=2) | EPI (NL=2) | PI (NL=3) | VPI (NL=3) | EPI (NL=3) |
|---|---|---|---|---|---|---|---|---|---|
| L | 0.3758±0.017 | 0.4767±0.035 | 0.4990±0.051 | 0.3715±0.012 | 0.4751±0.032 | 0.5001±0.074 | 0.3731±0.011 | 0.4734±0.036 | 0.4871±0.065 |
| Q | 0.3369±0.018 | 0.4681±0.050 | 0.4933±0.075 | 0.3226±0.021 | 0.4687±0.053 | 0.4801±0.094 | 0.3226±0.022 | 0.4601±0.055 | 0.5001±0.075 |
| M | 0.3790±0.023 | 0.4862±0.045 | 0.5286±0.084 | 0.3403±0.023 | 0.4787±0.052 | 0.6663±0.500 | 0.3205±0.021 | 0.4694±0.055 | 0.8349±1.001 |

Fig. 10. Performance index of HRBFNNs for the BH data without the use of PCA: (a) training data (PI), (b) validation data (VPI), and (c) testing data (EPI) versus the number of FPNN layers

Fig. 11. Performance index of HRBFNNs for the BH data with the use of PCA: (a) training data (PI), (b) validation data (VPI), and (c) testing data (EPI) versus the number of FPNN layers


Fig. 12. Optimal HRBFNN architecture for the BH data: (a) without the use of PCA, with the inputs x2 (Zn), x5 (Nox), x6 (Rm), x10 (Tax), x11 (Ptratio), and x13 (Lstat) selected and the local functions F1 and F2 depending on the virtual inputs (z11, ..., z15); (b) with the use of PCA, with seven principal components x1-x7 as inputs and the local functions depending on the virtual inputs (z1, z5, z9)

Fig. 10 and Fig. 11 display the values of the performance index (both in terms of its mean value and standard deviation) of the HRBFNNs with an increasing number of FPNN layers. As shown in these figures, the value of the training error decreases with the increasing approximation abilities of the HRBFNN. The testing error is reduced when NL (the number of layers in the FPNN) is equal to two, while it becomes high when NL reaches three. This tendency shows that a substantial increase of the polynomial order of the proposed model reduces the training error, while it sometimes worsens the prediction abilities. The optimal HRBFNNs formed for the BH data set are shown in Figure 12. With the use of PCA, the number of input variables is reduced to seven. The results demonstrate that the use of PCA is beneficial to the enhanced prediction abilities of the model in contrast to the results formed by the HRBFNN without the use of PCA. Table 9 reports the results of a comparative analysis of the performance of the proposed network with some other models. The experimental results reveal that the proposed network outperforms the existing models both in terms of its approximation capabilities as well as its generalization abilities.

Table 9. Results of comparative analysis (BH)

| Model | | PI | VPI | EPI | Index |
|---|---|---|---|---|---|
| RBFNN [33] | | 6.36±0.24 | | 6.94±0.31 | RMSE |
| RBFNN with context-free clustering [33] | | 5.52±0.25 | | 6.91±0.45 | RMSE |
| Linguistic modeling [33] | Without optimization | 5.21±0.12 | | 6.14±0.28 | RMSE |
| | One-loop optimization | 4.80±0.52 | | 5.22±0.58 | |
| | Multi-step optimization | 4.12±0.35 | | 5.32±0.96 | |
| Incremental model [28] | Linear regression | 4.535±0.240 | | 5.043±0.396 | RMSE |
| | Polynomial (2nd order) | 3.815±0.264 | | 4.455±0.399 | |
| | Incremental model | 3.279±0.177 | | 4.298±0.439 | |
| RBFNN I [34] | H=30 | 5.105±1.816 | | 5.907±2.311 | RMSE |
| RBFNN II [34] | H=30 | 7.031±2.202 | | 7.558±3.063 | RMSE |
| Functional RBFNN [34] | m=2.0 | 4.724±0.644 | | 14.064±0.820 | MSE |
| | m=2.5 | 8.079±1.762 | | 14.825±1.361 | |
| | m=3.5 | 8.450±1.029 | | 14.523±1.563 | |
| PSO-based PNN [34] | | 2.129±0.304 | 3.738±0.625 | 7.661±9.690 | RMSE |
| PSO-based FPNN [34] | | 2.779±0.325 | 3.298±0.327 | 7.479±7.596 | RMSE |
| RBFPNN [22] | | 1.398±0.244 | 3.637±0.298 | 4.827±1.474 | RMSE |
| QANFN [35] | Type-1 hybrid | 2.60 | | 3.63 | RMSE |
| | Type-2 hybrid | 2.35 | | 3.87 | RMSE |
| Our model (HRBFNNs), without PCA, PO = Q | NL = 1 | 2.3483±0.395 | 4.4705±0.763 | 4.3438±0.298 | RMSE |
| | NL = 2 | 1.9672±0.106 | 4.1808±0.662 | 3.9196±0.593 | |
| | NL = 3 | 1.8839±0.086 | 4.3140±0.782 | 4.2471±0.319 | |
| Our model (HRBFNNs), with PCA, PO = Q | NL = 1 | 0.3369±0.018 | 0.4681±0.050 | 0.4933±0.075 | RMSE |
| | NL = 2 | 0.3226±0.021 | 0.4687±0.053 | 0.4801±0.094 | |
| | NL = 3 | 0.3226±0.022 | 0.4601±0.055 | 0.5001±0.075 | |


6. Concluding Remarks

Classical polynomial neural networks (PNNs) have placed substantial emphasis on the generation of complex nonlinear functions. However, they do not exhibit abilities to deal with granular information. In this regard, fuzzy sets bring some useful opportunities. By combining PNNs and FCM, we take advantage of the two technologies and have proposed the architecture of the HRBFNN to address a series of prediction problems. The work contributes to the research on neuro-fuzzy models, and the main findings can be briefly summarized as follows.

First, we have investigated hybrid radial basis function neural networks. On the one hand, when compared with the conventional PNN, the HRBFNN helps overcome the lack of information granulation of PNNs. Information granules, being essentially reflective of the characteristics of the system (experimental data), are a suitable way of realizing abstraction that converts the original problem into a series of manageable subtasks. On the other hand, when compared with the conventional FRBFNN, the HRBFNN obtains the capability to estimate the complex functional (highly polynomial) character of the relationship between the input and output data, especially its high-order nonlinear character. Instead of conventional neurons, PFNs are constructed to generate polynomial functions for better predictive capability.

Second, we have investigated the HRBFNNs optimized by genetic algorithms. In the design of the HRBFNN, there are a number of parameters that need to be determined. With the use of GAs, the proposed HRBFNN produces optimal parameters.

Third, we have used PCA to preprocess the data sets. As an important step in data processing, PCA is used for reducing the dimensionality of the input variables. With the use of PCA, the proposed HRBFNN leads to better performance of the developed network; on average, we have observed at least a 50% improvement of the performance.

The experimental studies using several well-known data sets show a superb performance of the HRBFNN when compared with some recent neuro-fuzzy models, especially for data sets such as the Medical Imaging System data and the Abalone machine learning data. More importantly, with the proposed network endowed with information granulation, one can efficiently develop the optimal topology of the network (optimization of the structural and parametric network architecture), which is crucial to improving the performance of the resulting model.

For future study, HRBFNNs may be improved by constructing new fuzzy neurons or new architectures. Furthermore, multi-objective evolutionary algorithms can be used to optimize the HRBFNNs.


Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61301140), by the Open Funding Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China (Grant No. BUAA-VR-14KF-11), by the GRRC program of Gyeonggi province [GRRC Suwon 2013-B2, Center for U-city Security & Surveillance Technology], and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012R1A1B3003568).

References
[1] K. D. Karatzas, S. Kaltsatos, "Air pollution modelling with the aid of computational intelligence methods in Thessaloniki, Greece", Simulation Modelling Practice and Theory, Vol. 15, pp. 1310-1319, 2007.
[2] G. Bao, S. Wen, Z. Zeng, "Robust stability analysis of interval fuzzy Cohen-Grossberg neural networks with piecewise constant argument of generalized type", Neural Networks, Vol. 33, pp. 32-34, 2012.
[3] C. Riziotis, A. V. Vasilakos, "Computational intelligence in photonics technology and optical networks: A survey and future perspectives", Information Sciences, Vol. 177, pp. 5292-5315, 2007.
[4] R. del-Hoyo, B. Martin-del-Brio, N. Medrano, J. Fernandez-Navajas, "Computational intelligence tools for next generation quality of service management", Neurocomputing, Vol. 72, pp. 3631-3639, 2009.
[5] S. Dick, A. Tappenden, C. Badke, O. Olarewaju, "A granular neural network: Performance analysis and application to re-granulation", International Journal of Approximate Reasoning, Vol. 54, No. 8, pp. 1149-1167, 2013.
[6] G. W. Chang, C. I. Chen, Y. F. Teng, "Radial-basis-function-based neural network for harmonic detection", IEEE Transactions on Industrial Electronics, Vol. 57, No. 6, pp. 2171-2179, 2010.
[7] G. Simone, F. C. Morabito, "RBFNN-based hole identification system in conducting plates", IEEE Transactions on Neural Networks, Vol. 12, No. 6, pp. 1445-1454, 2001.
[8] F. Behloul, B. P. F. Lelieveldt, A. Boudraa, J. H. C. Reiber, "Optimal design of radial basis function neural networks for fuzzy-rule extraction in high dimensional data", Pattern Recognition, Vol. 35, pp. 659-675, 2002.
[9] A. Staiano, R. Tagliaferri, W. Pedrycz, "Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering", Neurocomputing, Vol. 69, pp. 1570-1581, 2006.
[10] S. K. Oh, W. D. Kim, W. Pedrycz, B. J. Park, "Polynomial-based radial basis function neural networks (P-RBFNNs) realized with the aid of particle swarm optimization", Fuzzy Sets and Systems, Vol. 163, pp. 54-77, 2011.
[11] H. G. Han, Q. L. Chen, J. F. Qiao, "An efficient self-organizing RBF neural network for water quality prediction", Neural Networks, Vol. 24, pp. 717-725, 2011.
[12] C. Y. Liu, C. Chen, C. T. Chang, L. M. Shih, "Single-hidden-layer feed-forward quantum neural network based on Grover learning", Neural Networks, Vol. 45, pp. 144-150, 2013.
[13] A. G. Ivakhnenko, G. A. Ivakhnenko, "The review of problems solvable by algorithms of the Group Method of Data Handling (GMDH)", Pattern Recognition and Image Analysis, Vol. 5, No. 4, pp. 527-535, 1995.
[14] B. J. Park, W. Pedrycz, S. K. Oh, "Fuzzy polynomial neural networks: hybrid architectures of fuzzy modeling", IEEE Transactions on Fuzzy Systems, Vol. 10, No. 5, pp. 607-621, 2002.
[15] A. Mital, "Prediction of human static and dynamic strengths by modified basic GMDH algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 14, pp. 773-776, 1984.
[16] M. Nagata, K. Takada, M. Sakuda, "Nonlinear interpolation of mandibular kinesiographic signals by applying sensitivity method to a GMDH correction model", IEEE Transactions on Biomedical Engineering, Vol. 38, No. 4, pp. 326-329, 1991.
[17] M. Iwasaki, H. Takei, N. Matsui, "GMDH-based modeling and feedforward compensation for nonlinear friction in table drive systems", IEEE Transactions on Industrial Electronics, Vol. 50, No. 6, pp. 1172-1178, 2003.
[18] E. E. Elattar, J. Goulermas, Q. H. Wu, "Generalized locally weighted GMDH for short term load forecasting", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 42, No. 3, pp. 345-356, 2012.
[19] S. K. Oh, W. Pedrycz, B. J. Park, "Polynomial neural networks architecture: analysis and design", Computers and Electrical Engineering, Vol. 29, pp. 703-725, 2003.
[20] S. K. Oh, W. Pedrycz, "Fuzzy polynomial neuron-based self-organizing neural networks", International Journal of General Systems, Vol. 32, No. 3, pp. 237-250, 2003.
[21] S. K. Oh, H. S. Park, W. D. Kim, W. Pedrycz, "A new approach to radial basis function-based polynomial neural networks: analysis and design", Knowledge and Information Systems, Vol. 157, pp. 121-151, 2013.
[22] J. S. R. Jang, C. T. Sun, "Functional equivalence between radial basis function networks and fuzzy inference systems", IEEE Transactions on Neural Networks, Vol. 4, No. 1, pp. 156-158, 1993.
[23] Y. Li, Q. Zhang, "The application of principal component analysis on financial analysis in real estate listed company", Procedia Engineering, Vol. 15, pp. 4499-4503, 2011.
[24] M. Filippone, "Dealing with non-metric dissimilarities in fuzzy central clustering algorithm", International Journal of Approximate Reasoning, Vol. 50, No. 2, pp. 363-384, 2009.
[25] W. Pedrycz, P. Rai, "Collaborative clustering with the use of Fuzzy C-Means and its quantification", Fuzzy Sets and Systems, Vol. 159, pp. 2399-2427, 2008.
[26] B. Wiswedel, M. R. Berthold, "Fuzzy clustering in parallel universes", International Journal of Approximate Reasoning, Vol. 45, No. 3, pp. 439-454, 2007.
[27] H. J. Song, C. Y. Miao, Z. Q. Shen, W. Roel, D. H. Maja, C. Francky, "Design of fuzzy cognitive maps using neural networks for predicting chaotic time series", Neural Networks, Vol. 23, No. 10, pp. 1264-1275, 2010.
[28] W. Pedrycz, K. C. Kwak, "The development of incremental models", IEEE Transactions on Fuzzy Systems, Vol. 15, No. 3, pp. 507-518, 2007.
[29] S. K. Oh, H. S. Park, C. W. Jeong, S. C. Joo, "GA-based feed-forward self-organizing neural network architecture and its applications for multi-variable nonlinear process systems", KSII Transactions on Internet and Information Systems, Vol. 3, No. 3, pp. 309-330, 2009.
[30] J. N. Choi, S. K. Oh, W. Pedrycz, "Identification of fuzzy models using a successive tuning method with a variant identification ratio", Fuzzy Sets and Systems, Vol. 159, pp. 2873-2889, 2008.
[31] S. K. Oh, W. D. Kim, B. J. Park, W. Pedrycz, "A design of granular-oriented self-organizing hybrid fuzzy polynomial neural networks", Neurocomputing, Vol. 119, pp. 292-307, 2013.
[32] W. Pedrycz, K. C. Kwak, "Boosting of granular models", Fuzzy Sets and Systems, Vol. 157, pp. 2943-2953, 2006.
[33] W. Pedrycz, K. C. Kwak, "Linguistic models as a framework of user-centric system modeling", IEEE Transactions on Systems, Man, and Cybernetics - Part A, Vol. 36, No. 4, pp. 727-745, 2006.
[34] W. Pedrycz, H. S. Park, S. K. Oh, "A granular-oriented development of functional radial basis function neural networks", Neurocomputing, Vol. 72, pp. 420-435, 2008.
[35] S. S. Kim, K. C. Kwak, "Development of quantum-based adaptive neuro-fuzzy networks", IEEE Transactions on Systems, Man, and Cybernetics - Part B, Vol. 40, No. 1, pp. 91-100, 2010.
[36] R. Alcala, P. Ducange, F. Herrera, B. Lazzerini, F. Marcelloni, "A multiobjective evolutionary approach to concurrently learn rule and data bases of linguistic fuzzy-rule-based systems", IEEE Transactions on Fuzzy Systems, Vol. 17, pp. 1106-1122, 2009.
[37] I. Robles, R. Alcala, J. M. Benitez, F. Herrera, "Evolutionary parallel and gradually distributed lateral tuning of fuzzy rule-based systems", Evolutionary Intelligence, Vol. 2, pp. 5-19, 2009.
[38] W. Pedrycz, H. S. Park, S. K. Oh, "A granular-oriented development of functional radial basis function neural networks", Neurocomputing, Vol. 72, pp. 420-435, 2008.
[39] H. Frigui, O. Bchir, N. Baili, "An overview of unsupervised and semi-supervised fuzzy kernel clustering", International Journal of Fuzzy Logic and Intelligent Systems, Vol. 13, No. 4, pp. 254-268, 2013.

APPENDIX

To further evaluate the proposed model, we test four larger, well-known machine learning data sets of different complexity, i.e., with different numbers of input variables and of available data. Table 10 summarizes the main characteristics of the four data sets, while Table 11 offers a comparative analysis involving some existing models. It is evident that the proposed model compares favorably in terms of accuracy, prediction capabilities, and stability. Throughout this comparison, MSE/2, the performance index used in [36], [37], and [38], is also adopted as the performance index of our models (HRBFNNs).
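For clarity, the following minimal sketch shows how such an MSE/2 index can be computed; it assumes NumPy, the conventional definition MSE/2 = (1/(2N)) Σ(y_k − ŷ_k)², and the convention that PI, VPI, and EPI denote the index evaluated on the training, validation, and testing data, respectively. The function name, the model object, and the data splits are illustrative placeholders, not artifacts of the original experiments.

```python
import numpy as np

def half_mse(y_true, y_pred):
    # MSE/2 index: (1 / (2 * N)) * sum of squared errors over N data points
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 0.5 * np.mean((y_true - y_pred) ** 2)

# Illustrative usage (model and data splits are placeholders):
# PI  = half_mse(y_train, model.predict(X_train))   # training data
# VPI = half_mse(y_valid, model.predict(X_valid))   # validation data
# EPI = half_mse(y_test,  model.predict(X_test))    # testing data
```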

Table 10. Descriptions of four selected machine learning data sets.

Datasets             Name    Variables    Patterns
Treasury             TR      15           1049
Mortgage             MO      15           1049
Weather Izmir        IZ      9            1462
Computer Activity    CA      21           8192

Table 11. Comparison of performance with some selected models.

Model                                  Dataset           PCA            NL     PI               VPI              EPI
The best model listed in references    TR [36][37]       -              -      0.08±0.04        Unknown          0.14±0.15
The best model listed in references    MO [36][37]       -              -      0.05±0.02        Unknown          0.09±0.10
The best model listed in references    IZ [36][37][38]   -              -      1.48±0.34        Unknown          1.64±0.34
The best model listed in references    CA [36]           -              -      11.99±2.99       Unknown          13.43±4.66
Our model (HRBFNNs)                    TR                Without PCA    NL=1   0.105±0.008      0.116±0.016      0.115±0.020
Our model (HRBFNNs)                    TR                Without PCA    NL=2   0.102±0.006      0.113±0.153      0.116±0.020
Our model (HRBFNNs)                    TR                Without PCA    NL=3   0.102±0.007      0.108±0.140      0.118±0.023
Our model (HRBFNNs)                    TR                With PCA       NL=1   0.083±0.014      0.069±0.017      0.083±0.012
Our model (HRBFNNs)                    TR                With PCA       NL=2   0.080±0.015      0.067±0.018      0.081±0.011
Our model (HRBFNNs)                    TR                With PCA       NL=3   0.079±0.014      0.067±0.019      0.080±0.011
Our model (HRBFNNs)                    MO                Without PCA    NL=1   0.058±0.002      0.054±0.003      0.057±0.005
Our model (HRBFNNs)                    MO                Without PCA    NL=2   0.056±0.004      0.053±0.005      0.058±0.009
Our model (HRBFNNs)                    MO                Without PCA    NL=3   0.053±0.004      0.051±0.003      0.057±0.005
Our model (HRBFNNs)                    MO                With PCA       NL=1   0.0079±0.0007    0.0072±0.0009    0.0078±0.0013
Our model (HRBFNNs)                    MO                With PCA       NL=2   0.0074±0.0008    0.0071±0.0011    0.0074±0.0015
Our model (HRBFNNs)                    MO                With PCA       NL=3   0.0070±0.0006    0.0064±0.0010    0.0073±0.0015
Our model (HRBFNNs)                    IZ                Without PCA    NL=1   0.562±0.020      0.620±0.013      0.600±0.052
Our model (HRBFNNs)                    IZ                Without PCA    NL=2   0.555±0.017      0.616±0.016      0.603±0.055
Our model (HRBFNNs)                    IZ                Without PCA    NL=3   0.550±0.019      0.607±0.014      0.600±0.065
Our model (HRBFNNs)                    IZ                With PCA       NL=1   0.078±0.008      0.092±0.007      0.086±0.008
Our model (HRBFNNs)                    IZ                With PCA       NL=2   0.075±0.008      0.088±0.008      0.084±0.004
Our model (HRBFNNs)                    IZ                With PCA       NL=3   0.073±0.008      0.089±0.008      0.125±0.129
Our model (HRBFNNs)                    CA                Without PCA    NL=1   11.46±0.443      11.53±0.575      11.99±0.986
Our model (HRBFNNs)                    CA                Without PCA    NL=2   11.03±0.452      11.10±0.582      12.03±0.972
Our model (HRBFNNs)                    CA                Without PCA    NL=3   11.21±0.427      11.24±0.624      12.23±0.954
Our model (HRBFNNs)                    CA                With PCA       NL=1   0.107±0.026      0.118±0.034      0.114±0.034
Our model (HRBFNNs)                    CA                With PCA       NL=2   0.094±0.030      0.107±0.040      0.099±0.036
Our model (HRBFNNs)                    CA                With PCA       NL=3   0.088±0.033      0.101±0.043      0.108±0.036
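As a complementary illustration of the "With PCA" setting in Table 11, the minimal sketch below shows how the input variables of a data set may be projected onto a reduced set of principal components before model construction. It assumes scikit-learn and NumPy; the synthetic data and the 95% retained-variance threshold are our assumptions for illustration, not values taken from the original experiments.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a data set of 1049 patterns and 15 input variables
# (the size of the TR and MO sets in Table 10); purely illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((1049, 15))

# Retain the principal components explaining 95% of the variance
# (an assumed threshold, not one reported in this study).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```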