Design of Hybrid Radial Basis Function Neural Networks (HRBFNNs) Realized With the Aid of Hybridization of Fuzzy Clustering Method (FCM) and Polynomial Neural Networks (PNNs)

Wei Huang 1,2
1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
2 School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin 300384, China

Sung-Kwun Oh 3
3 Department of Electrical Engineering, The University of Suwon, San 2-2 Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, South Korea

Witold Pedrycz 4,5
4 Department of Electrical & Computer Engineering, University of Alberta, Edmonton T6R 2G7, Canada
5 Systems Science Institute, Polish Academy of Sciences, Warsaw, Poland
Abstract

In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of a fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering, used to form information granules, is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, a genetic algorithm (GA) is exploited to optimize the essential design parameters of the network, including the fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and the specific subset of input PFNs. To reduce the dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments in which we use several modeling benchmarks of different levels of complexity (different numbers of input variables and of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy than models reported previously in the literature.
Keywords: Hybrid Radial Basis Function Neural Networks (HRBFNNs), Fuzzy Clustering Method (FCM), Polynomial Fuzzy Neurons (PFNs), Principal Component Analysis (PCA), Genetic Algorithm (GA)
1. Introduction

The past decades have witnessed a continuously growing interest in the development of diverse models based on techniques of Computational Intelligence (CI), whose predominant technologies include neural networks, fuzzy sets, and evolutionary methods [1-3]. The techniques of CI have given rise to a number of new methodological insights and increased our awareness of the crucial tradeoffs one has to consider in system modeling [4-5]. As one of the successful neural network families based on the principles of CI, Radial Basis Function Neural Networks (RBFNNs) have been applied in many fields such as engineering, medical engineering, and social science [6-7]. In the design of RBFNNs, the activation functions of the hidden nodes are typically realized with Gaussian functions, while the output comes as a weighted sum of the activation levels of the individual RBFs [8], [9]. Such an architecture, albeit quite flexible, is not free from drawbacks. One limitation is that the discriminant functions generated by RBFNNs have a relatively simple geometry, implied by the limited geometric variability of the underlying receptive fields (radial basis functions) located at the hidden layer of the RBF network [10].

To enhance the capabilities of information granulation, Fuzzy Radial Basis Function Neural Networks (FRBFNNs) have been proposed. Unlike in the RBFNNs, in the FRBFNNs FCM is used to realize information granulation in the premise part. FRBFNNs exhibit some advantages, including global optimal approximation and classification capabilities and a rapid convergence of the learning procedures; see [11-12]. However, FRBFNNs cannot easily generate complex nonlinear functions (e.g., high-order nonlinear discriminant functions). Some enhancements of the FRBFNN have been proposed, such as Polynomial Fuzzy Radial Basis Function Neural Networks (PFRBFNNs), yet the problem of generating high-order discriminant functions remains open.

The Group Method of Data Handling (GMDH) developed by Ivakhnenko [13] is a convenient vehicle for identifying high-order nonlinear relations between input and output variables. GMDH forms a viable alternative to other models, as such networks can easily cope with the nonlinear nature of data; this becomes an evident advantage over the low-order polynomials, such as linear and quadratic functions, usually used in a "standard" regression polynomial fuzzy model [14]. Pioneering studies by Mital [15], Nagata et al. [16], Iwasaki et al. [17], and Elattar et al. [18] led to different improved GMDH models. In spite of its advantages, GMDH comes with some limitations [19]: first, it tends to generate quite complex polynomials even for relatively simple relationships; second, it tends to produce an overly complex network when it comes to highly nonlinear systems due to its limited generic structure, and it does not generate a highly versatile structure if there are fewer than three input variables. To overcome these limitations, in a previous study we proposed an architecture of Fuzzy Polynomial Neural Networks (FPNNs) [20]. In the sequel, Radial Basis Function-based Polynomial Neural Networks (RBF-PNNs) [21], which are a certain type of FPNNs with Radial Basis Function neurons, were developed as well. Nevertheless, in most cases all these advanced models are still not free from limitations.

In this study, we develop hybrid radial basis function neural networks (HRBFNNs), which use a fuzzy clustering method (Fuzzy C-Means, FCM, to be specific) to form information granules for the premise part of the network. The consequent parts of HRBFNNs are designed by using fuzzy polynomial neural networks. Along with the enhanced architecture, we take advantage of both polynomial neural networks and fuzzy clustering. The FPNNs of the HRBFNNs are formed with the use of polynomial fuzzy neurons (PFNs). Furthermore, a genetic algorithm (GA) is exploited to optimize the parameters of HRBFNNs. To evaluate the performance of the proposed model, we discuss several experimental studies exploiting well-known data already used in the realm of neuro-fuzzy modeling. Principal component analysis (PCA) is utilized to preprocess the data. The obtained results demonstrate the superiority of the proposed networks over already developed models.

This paper is organized in the following manner. Section 2 introduces the underlying idea of hybrid radial basis function neural networks. Section 3 describes the detailed architecture, while Section 4 gives an overall description of the design methodology of HRBFNNs using genetic optimization. Section 5 reports on a comprehensive set of experiments. Finally, concluding remarks are covered in Section 6.
2. Hybrid Radial Basis Function Neural Networks: an idea In this section, we first briefly review several essential classes of networks such as RBFNN, PFRBFNN, and FPNN, and then focus on the underlying idea of HRBFNNs.
2.1 Background of Hybrid Fuzzy Polynomial Neural Networks

1) RBFNN. The RBFNN, introduced by Broomhead and Lowe, is an artificial neural network that embraces three different layers: an input layer, a hidden layer with a nonlinear RBF activation function, and a linear output layer (viz. neurons with linear activation functions).

Input layer: the input variables are represented as a vector of real numbers.

Hidden layer: this layer consists of several hidden neurons, whose RBF activation functions are typically realized in the form of Gaussian functions. The center and width (spread) determine the location of an RBF in the input space, and the number of receptive fields is usually provided in advance. Fuzzy clustering (here FCM) may be used to determine the number of RBFs, the positions of their centers, and the values of the widths of these functions [8], [9].

Output layer: the output of the RBFNN is a weighted sum of the activation levels of the individual RBFs. A learning algorithm (e.g., a gradient method) is used to adjust the weights of the links coming from the hidden layer to the output layer [8], [9].

2) PFRBFNN. The structure of the PFRBFNN is similar to the conventional RBFNN, with two key differences. First, FCM is used to form the receptive fields (RBFs) in the PFRBFNN. In this manner, the RBFs do not assume any explicit functional form (such as Gaussian, ellipsoidal, triangular, and others) but are of implicit nature, as the activation levels (degrees of membership) result from computing the relevant distances between data and the corresponding prototypes (centers of the receptive fields) [10], [39]. Second, four types of polynomials, namely constant (zero-order), linear (first-order), quadratic (second-order), and reduced quadratic (second-order), are used as the local models [10]. The reason for using the reduced quadratic form is that, in case of full quadratic functions, the dimensionality of the problem increases quite quickly, especially when dealing with a large number of input variables. As pointed out by Jang, RBFNNs and fuzzy inference systems having rules with constant consequents (a zero-order Sugeno model) are functionally equivalent [22]. Along with this observation, a PFRBFNN can also be represented as follows:
$$R^i: \text{IF } \mathbf{x} \text{ is in cluster } A_i \text{ THEN } y_i = f_i(\mathbf{x}) \qquad (1)$$

where $R^i$ is the $i$th fuzzy rule, $i = 1, \cdots, n$, $n$ is the number of fuzzy rules (the number of clusters), and $f_i(\mathbf{x})$ is the consequent polynomial of the $i$th fuzzy rule, i.e., a local model representing the input-output relationship of the $i$th sub-space (local area). The highest polynomial order of $f_i(\mathbf{x})$ is equal to 2.

3) FPNN. The FPNN is based on the GMDH; it is a dynamically formed model whose topology is not predefined and kept unchanged [20-21]. Unlike static neural networks, the FPNN consists of several different layers, namely an input layer, an output layer, and several hidden (intermediate) layers (1st layer, 2nd layer, 3rd layer, etc.). Each intermediate layer includes several polynomial neurons (PNs) with multiple inputs and one output. The output of a PN in the current layer is used as the input of the PNs located at the next layer. In particular, the inputs of the PNs in the 1st layer are the input variables of the input layer. The output of a PN belongs to a class of base polynomials (linear, quadratic, etc.). Assuming that $\mathbf{X} = (x_1, x_2, \ldots, x_l)$ is the input of a PN, the output of the PN is one type of base polynomial whose general form can be described as follows:

$$y = f(x_1, x_2, \ldots, x_l) = a_0 + \sum_{i=1}^{l} a_i x_i + \sum_{i=1}^{l}\sum_{j=1}^{l} a_{ij} x_i x_j \qquad (2)$$
Furthermore, consider the input-output data given in the following fashion:

$$(\mathbf{X}_i, y_i) = (x_{1i}, x_{2i}, \ldots, x_{li}, y_i), \quad i = 1, 2, \ldots, N \qquad (3)$$
The estimated output $\hat{y}$ of the FPNN reads as follows:

$$\hat{y} = \hat{f}(x_1, x_2, \ldots, x_l) = a_0 + \sum_{i=1}^{l} a_i x_i + \sum_{i=1}^{l}\sum_{j=1}^{l} a_{ij} x_i x_j + \sum_{i=1}^{l}\sum_{j=1}^{l}\sum_{k=1}^{l} a_{ijk} x_i x_j x_k + \cdots \qquad (4)$$
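To make the PN computation concrete, here is a minimal sketch of Eq. (2) in Python (assuming NumPy; the function name and coefficient values are ours and hypothetical, not taken from the paper):

```python
import numpy as np

def pn_output(x, a0, a, A):
    """Base polynomial of a polynomial neuron (PN), Eq. (2):
    y = a0 + sum_i a_i*x_i + sum_{i,j} a_ij*x_i*x_j."""
    x = np.asarray(x, dtype=float)
    return a0 + a @ x + x @ A @ x

# Two-input PN with hypothetical coefficients:
y = pn_output([1.0, 2.0],
              a0=0.5,
              a=np.array([0.1, -0.2]),
              A=np.array([[0.05, 0.3],
                          [0.0, 0.0]]))
```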
2.2 A Concept of Hybrid Radial Basis Function Neural Networks
In the "conventional" RBFNNs, the input variables coming from the experimental data are used directly. It is apparent that this methodology may limit the approximation abilities of the network. By using FCM to form the receptive fields (RBFs), the FRBFNN has the advantage of providing a thorough coverage of the entire input space in comparison with the RBFNN. However, the resulting accuracy of the model is limited by the low order (less than or equal to second order) of the polynomials used in the local models of the PFRBFNN. If higher-order polynomials were used as local models, one could develop a fuzzy model with higher approximation ability once the coefficients of this model had been estimated. Nevertheless, estimating the coefficients of such high-order local models could become very difficult. When constructing such a highly nonlinear model, the FPNN introduced by Oh [20] comes as one of the effective neural networks that helps alleviate this problem: estimating the coefficients of high-order polynomials (local models) can, to some extent, be realized by using the FPNN. In this regard, we can develop an HRBFNN, which applies the FPNN as the consequent part of the PFRBFNN. The HRBFNN is expressed in the form

$$R^i: \text{IF } \mathbf{x} \text{ is in cluster } A_i \text{ THEN } y_i = a_0 + \sum_{i=1}^{l} a_i x_i + \sum_{i=1}^{l}\sum_{j=1}^{l} a_{ij} x_i x_j + \sum_{i=1}^{l}\sum_{j=1}^{l}\sum_{k=1}^{l} a_{ijk} x_i x_j x_k + \cdots \qquad (5)$$
Furthermore, two open problems remain in the design of the HRBFNN. First, the number of PNs goes up substantially with the number of input variables. Second, several parameters need to be determined in the design of the neural network. Here PCA is used to reduce the dimensionality of the input space, while the GA is utilized to obtain a "good choice" of parameters.
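The reason stacking such neurons yields high-order models can be seen in one variable: composing two quadratics gives a fourth-order polynomial, yet each layer only ever estimates low-order coefficients. A minimal illustration (coefficients are hypothetical):

```python
import numpy as np

p1 = np.poly1d([1.0, -0.5, 0.2])  # first-layer quadratic (hypothetical coefficients)
p2 = np.poly1d([0.3, 1.0, 0.0])   # second-layer quadratic
composite = p2(p1)                # composition of the two layers
print(composite.order)            # 4: the overall mapping is fourth-order
```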
3. Architecture of Hybrid Radial Basis Function Neural Networks

In (5), we proposed a novel architecture of hybrid radial basis function neural networks. A general architecture of HRBFNNs consists of five functional components: input, premise part, conclusion part, aggregation part, and output, as shown in Figure 1(a). The premise and conclusion parts relate to the formation of the fuzzy rules. The aggregation part is concerned with fuzzy inference (a mapping procedure). It has to be stressed that the HRBFNNs reduce to the "conventional" polynomial radial basis function neural networks (pRBFNNs) [10] when there is only a single node in the FPNN; in other words, the HRBFNNs emerge as extensions of the pRBFNNs. It has to be noted that the fuzzy polynomial neural network also exhibits a new architecture, which is essentially an improved fuzzy polynomial neural network (FPNN) [20-21] with novel polynomial fuzzy neurons.

In Figure 1, k stands for the number of input variables coming from the original system, n is the number of clusters coming from the partitioning of the input space by FCM, $w_i$ ($i \in [1, n]$) are the membership grades of the fuzzy clusters, and m is the number of PFNs used for constructing the PFNs in the next layer. $Z_{i1}, Z_{i2}, \ldots, Z_{iP}$ are new virtual input variables, which are the output variables of PFNs. As the consequent part of the HRBFNNs, we use a linear function in the following way:

$$F_j(Z_{i1}, Z_{i2}, \ldots, Z_{iP}) = a_{j0} + a_{j1}Z_{i1} + a_{j2}Z_{i2} + \cdots + a_{jP}Z_{iP}, \quad i = 1, 2, \ldots, m, \; j = 1, 2, \ldots, n \qquad (6)$$

where P denotes the number of virtual input variables, selected through genetic optimization (see Section 4). In this study, Fuzzy C-Means is used to partition the input space and form the premise part, while the minimization of the LSE is used to estimate the coefficients of the polynomials in the aggregation part. In general, the development of an effective neural network becomes more complicated in the presence of an increasing number of input variables. The use of PCA [23], a viable and quite commonly considered vehicle for dimensionality reduction, is a sound alternative in our study. PCA is thus used to preprocess the data sets to realize data reduction and feature extraction.
3.1 Partitioning the Input Space via Fuzzy C-Means (FCM)

In the design of the premise part, Fuzzy C-Means (FCM) is utilized to partition the space of input data. The FCM clustering method is widely used as a vehicle to cluster data into overlapping clusters described by a certain partition matrix. Let us briefly recall the essence of the FCM method. Consider the data set

$$\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p, \ldots, \mathbf{x}_m\}, \quad \mathbf{x}_p = \{x_{p1}, x_{p2}, \ldots, x_{pj}, \ldots, x_{pl}\}, \quad 1 \le p \le m, \; 1 \le j \le l,$$

where m is the number of input data and l is the number of input variables. The input vectors of $\mathbf{X}$ are partitioned into n clusters, where the clusters are represented by the prototypes (center values) $\mathbf{v}_i = \{v_{i1}, v_{i2}, \ldots, v_{il}\}$, $1 \le i \le n$. A fuzzy partition matrix $\mathbf{U} = [u_{ij}]$ satisfies the following conditions:

$$\sum_{i=1}^{n} u_{ij} = 1, \quad 1 \le j \le m \qquad (7)$$

and

$$0 < \sum_{j=1}^{m} u_{ij} < m, \quad 1 \le i \le n \qquad (8)$$

The FCM algorithm minimizes the following objective function:

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{m} (u_{ij})^r \, d(\mathbf{x}_j, \mathbf{v}_i), \quad 1 < r \qquad (9)$$
where r is the fuzzification coefficient (r > 1) and $d(\mathbf{x}_p, \mathbf{v}_i)$ is a distance between the input vector $\mathbf{x}_p \in \mathbf{X}$ and the prototype (centroid) $\mathbf{v}_i$. The distance is taken as the weighted Euclidean distance

$$d(\mathbf{x}_p, \mathbf{v}_i) = \|\mathbf{x}_p - \mathbf{v}_i\|^2 = \sum_{j=1}^{l} \frac{(x_{pj} - v_{ij})^2}{\sigma_j^2} \qquad (10)$$

where $\sigma_j^2$ is the variance of the jth input variable.

[Figure 1: (a) the overall architecture, comprising the input, data processing (PCA), premise part (FCM producing clusters 1, ..., n with membership grades w1, ..., wn), conclusion part (FPNN), aggregation part, and output ŷk; (b) the FPNN built from a 1st layer and higher layers of PFNs producing F1(z1, ..., zP), ..., Fn(z1, ..., zP); (c) a PFN realized through FCM-based fuzzy rules A1, ..., An with local models f1(xk1, ..., xkl), ..., fn(xk1, ..., xkl).]
Fig. 1. Architecture of HRBFNNs

Under the assumption of the weighted Euclidean distance, the necessary conditions for solutions (U, V) of $\min\{J_m(\mathbf{U}, \mathbf{V})\}$ come in the form

$$u_{ip} = w_{ip} = \frac{1}{\sum_{j=1}^{n} \left( \dfrac{\|\mathbf{x}_p - \mathbf{v}_i\|}{\|\mathbf{x}_p - \mathbf{v}_j\|} \right)^{2/(r-1)}}, \quad 1 \le p \le m, \; 1 \le i \le n \qquad (11)$$

and

$$\mathbf{v}_i = \frac{\sum_{p=1}^{m} u_{ip}^r \, \mathbf{x}_p}{\sum_{p=1}^{m} u_{ip}^r}, \quad 1 \le i \le n \qquad (12)$$
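The two updates (11) and (12) can be iterated directly; the following minimal NumPy version (a sketch, not the authors' implementation) alternates them under the weighted Euclidean distance of (10):

```python
import numpy as np

def fcm(X, n_clusters, r=2.0, iters=100, seed=0):
    """Minimal FCM sketch: alternates the membership update of Eq. (11)
    and the prototype update of Eq. (12) under the distance of Eq. (10)."""
    rng = np.random.default_rng(seed)
    m, l = X.shape
    sigma2 = X.var(axis=0) + 1e-12                    # per-variable variances, Eq. (10)
    V = X[rng.choice(m, n_clusters, replace=False)]   # initial prototypes from the data
    for _ in range(iters):
        d2 = (((X[:, None, :] - V[None, :, :]) ** 2) / sigma2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                    # guard against a zero distance
        w = d2 ** (-1.0 / (r - 1.0))
        U = w / w.sum(axis=1, keepdims=True)          # Eq. (11): rows sum to 1
        Ur = U ** r
        V = (Ur.T @ X) / Ur.sum(axis=0)[:, None]      # Eq. (12): weighted prototype update
    return U, V
```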
3.2 Estimating the Coefficients of the Polynomials

In the design of the aggregation part, the least squares error (LSE) method is utilized to estimate the values of the parameters of the polynomials forming the local models. The objective function (performance index) reads as follows:

$$J_G = \sum_{k=1}^{m} \left( y_k - \sum_{i=1}^{n} w_{ik} f_i(\mathbf{x}_k) \right)^2 \qquad (13)$$
where $w_{ik}$ is the normalized firing (activation) level of the ith rule. The performance index $J_G$ can be expressed in a concise matrix form as follows:

$$J_G = (\mathbf{Y} - \mathbf{X}\mathbf{a})^T (\mathbf{Y} - \mathbf{X}\mathbf{a}) \qquad (14)$$

where $\mathbf{a}$ is the vector of the coefficients of the polynomials, $\mathbf{Y}$ is the output vector of the data, and $\mathbf{X}$ is a matrix arranging the input data, the information granules (centers of each cluster), and the activation levels. In case all consequent polynomials are linear (first-order polynomials), $\mathbf{X}$ and $\mathbf{a}$ can be expressed as follows:

$$\mathbf{X} = \begin{bmatrix}
\underbrace{1 \;\; u_{11}x_{11} \; \cdots \; u_{11}x_{l1}}_{\text{the first fuzzy rule}} & \cdots & \underbrace{1 \;\; u_{n1}x_{11} \; \cdots \; u_{n1}x_{l1}}_{\text{the } n\text{th fuzzy rule}} \\
\vdots & & \vdots \\
1 \;\; u_{1m}x_{1m} \; \cdots \; u_{1m}x_{lm} & \cdots & 1 \;\; u_{nm}x_{1m} \; \cdots \; u_{nm}x_{lm}
\end{bmatrix}$$

$$\mathbf{a} = [a_{10} \; a_{11} \; \cdots \; a_{1l} \; \cdots \; a_{n0} \; a_{n1} \; \cdots \; a_{nl}]^T$$

The optimal values of the coefficients of the consequents are determined in the form

$$\mathbf{a} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y} \qquad (15)$$
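As a sketch of this step (assuming NumPy; not the authors' code), the design matrix can be assembled rule-block by rule-block and the system (15) solved by least squares. One assumption to flag: here the activation level multiplies the bias column as well, since with normalized memberships a plain shared constant column per rule would make the matrix rank-deficient:

```python
import numpy as np

def fit_linear_consequents(X, y, U):
    """Eqs. (13)-(15): global LSE for linear local models.
    X: inputs (m, l); y: targets (m,); U: normalized activation levels (m, n)."""
    m, l = X.shape
    n = U.shape[1]
    X1 = np.hstack([np.ones((m, 1)), X])               # [1, x_1, ..., x_l] per datum
    D = np.hstack([U[:, [i]] * X1 for i in range(n)])  # one activation-weighted block per rule
    a, *_ = np.linalg.lstsq(D, y, rcond=None)          # stable solution of Eq. (15)
    return a.reshape(n, l + 1)                         # row i: [a_i0, a_i1, ..., a_il]
```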
3.3 Developing Fuzzy Polynomial Neural Networks via Polynomial Fuzzy Neurons

In the design of the conclusion part, we note that the PFN is essentially a fuzzy radial basis function neural network. In the second and successive layers, PFNs are utilized as the generic processing units, while their outputs are viewed as virtual input variables, which are no longer the input variables of the original system. The input set of a PFN is a specific subset of the input variables coming from the original system. The PFNs are developed based on fuzzy inference and fuzzy partitions (spaces). In some sense, the general topology of PFNs comes as an extended structure of the conventional fuzzy inference system (FIS), as shown in Figure 1. Unlike in the conventional FIS, the premise part of a PFN is realized with the aid of information granulation, while its consequence part is constructed based on different types of polynomials. In the design of the premise part, information granules (the centers of individual clusters) and activation levels (degrees of membership) are determined by means of FCM. In the design of the consequence part, four types of polynomials, namely a constant type (a zero-order polynomial), a linear type (a first-order polynomial), a quadratic type (a second-order polynomial), and a modified quadratic type (a modified second-order polynomial), are viewed as local models representing the input-output relationship in the sub-space of the corresponding antecedent. One of the four types is selected for each sub-space as the result of the optimization, which will be described later in this study. It has to be stressed that the PFNs do not suffer from the curse of dimensionality (as all variables are considered en bloc) [10], [28] in comparison with the conventional fuzzy neurons in the FPNN [20]. As a result, we may construct more accurate and compact models with only a small number of fuzzy rules by using relatively high-order polynomials. More specifically, PFNs can be represented in the form of "if-then" fuzzy rules:

$$R^i: \text{IF } \mathbf{x}_k \text{ is } A_i \text{ THEN } y_{ki} = f_i(\mathbf{x}_k) \qquad (16)$$

where $R^i$ is the $i$th fuzzy rule, $i = 1, \cdots, n$, $n$ is the number of fuzzy rules (the number of clusters), $f_i(\mathbf{x}_k)$ is the consequent polynomial of the $i$th fuzzy rule, i.e., a local model representing the input-output relationship of the $i$th sub-space (local area), $w_{ik}$ is the degree of membership (activation level) of the $i$th local model, and $\mathbf{v}_i = [v_{i1} \; v_{i2} \; \cdots \; v_{il}]^T$ is the $i$th prototype.
Table 1. Types of polynomials used in the local models (conclusions)

| Order | Type | Polynomial |
|---|---|---|
| 1 | C: Constant | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0}$ |
| 2 | L: Linear | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0} + a_{i1}x_{k1} + a_{i2}x_{k2} + \cdots + a_{il}x_{kl}$ |
| 3 | Q: Quadratic | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0} + a_{i1}x_{k1} + \cdots + a_{il}x_{kl} + a_{i(l+1)}x_{k1}^2 + \cdots + a_{i(2l)}x_{kl}^2 + a_{i(2l+1)}x_{k1}x_{k2} + \cdots + a_{i((l+1)(l+2)/2)}x_{k(l-1)}x_{kl}$ |
| 4 | M: Modified quadratic | $f_i(x_{k1}, \ldots, x_{kl}) = a_{i0} + a_{i1}x_{k1} + \cdots + a_{il}x_{kl} + a_{i(l+1)}x_{k1}x_{k2} + \cdots + a_{i(l(l+1)/2)}x_{k(l-1)}x_{kl}$ |
As mentioned earlier, we admit the consequent polynomials to be in one of the forms shown in Table 1. The numeric output of the model, based on the activation levels of the rules, is given in the form

$$\hat{y}_k = \sum_{i=1}^{n} w_{ik} f_i(x_{k1}, x_{k2}, \ldots, x_{kl}) \qquad (17)$$
The learning of the PFNs involves the development of the premise as well as the consequent part of the rules. The premise part is formed through fuzzy clustering realized in the input space (here the fuzzification coefficient is equal to 2), while the learning of the consequent part is implemented by minimizing the LSE.
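A minimal sketch of the four consequent forms of Table 1 and the aggregation of Eq. (17) follows (assuming NumPy; the cross-term ordering x1x2, ..., x(l-1)xl is taken from Table 1, and the function names are ours):

```python
import numpy as np
from itertools import combinations

def pfn_features(x, ptype):
    """Regressor vector of one local model from Table 1 (C, L, Q, or M)."""
    x = np.asarray(x, dtype=float)
    cross = np.array([x[i] * x[j] for i, j in combinations(range(len(x)), 2)])
    if ptype == "C":
        return np.array([1.0])
    if ptype == "L":
        return np.concatenate([[1.0], x])
    if ptype == "Q":                      # constant + linear + squares + cross terms
        return np.concatenate([[1.0], x, x ** 2, cross])
    if ptype == "M":                      # modified quadratic: cross terms, no squares
        return np.concatenate([[1.0], x, cross])
    raise ValueError(ptype)

def pfn_output(x, w, coeffs, ptype):
    """Eq. (17): y_hat = sum_i w_i * f_i(x), one coefficient vector per rule."""
    phi = pfn_features(x, ptype)
    return sum(w_i * (c_i @ phi) for w_i, c_i in zip(w, coeffs))
```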
4. Genetic Optimization of Hybrid Radial Basis Function Neural Networks
Genetic Algorithms (GAs) have been shown to be useful when dealing with nonlinear and complex problems. The GA is a stochastic search technique based on the principles of evolution, natural selection, and recombination, and it is exploited here to optimize the parameters of HRBFNNs. In this section, we elaborate on the details of the design method by considering the functionality of FCM, PFNs, and the GA in the architecture of HRBFNNs. The design procedure of HRBFNNs comprises the following steps.

[Step 1] Preprocess the data set using PCA
To achieve dimensionality reduction, principal component analysis is exploited to preprocess the data sets for data reduction and feature extraction.

[Step 2] Form the training and testing data sets
For convenience, an input-output data set is denoted as (xi, yi) = (x1i, x2i, ..., xni, yi), i = 1, 2, ..., N, where N is the number of data points. Here we divide the data set into three parts, namely training data, validation data, and testing data. The training and validation data are used to construct the HRBFNNs, while the testing data are utilized to evaluate the quality of the network. The objective function (performance index) includes both the training data and the validation data and comes as a convex combination of these two components:

$$f(PI, VPI) = \theta \cdot PI + (1 - \theta) \cdot VPI \qquad (18)$$

where $\theta$ is a weighting factor that allows us to strike a sound balance between the performance of the model on the training and validation data. We use two performance indexes, viz. the standard root mean squared error (RMSE) and the mean squared error (MSE):

$$PI\;(VPI \text{ or } EPI) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_i - y_i^*)^2} \;\; (RMSE), \qquad \frac{1}{m}\sum_{i=1}^{m}(y_i - y_i^*)^2 \;\; (MSE) \qquad (19)$$

where $y_i^*$ is the output of the neural network, m is the total number of data, and i is the index of a data point; a small sketch of computing these indexes is given after Step 3.

[Step 3] Decide upon the generic parameters used in the conclusion part
Here we decide upon the essential design parameters of the HRBFNNs. The choice of the essential design parameters of the FPNN shown in Table 2 allows us to select the best model with regard to the characteristics of the data, the model design strategy, nonlinearity, and predictive capabilities.
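A small sketch of (18)-(19) (assuming NumPy; the weighting symbol θ is our labeling of the factor in (18)):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error of Eq. (19)."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def mse(y, y_hat):
    """Mean squared error of Eq. (19)."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def objective(pi, vpi, theta=0.5):
    """Eq. (18): convex combination of training (PI) and validation (VPI) errors."""
    return theta * pi + (1.0 - theta) * vpi
```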
Table 2. Summary of design parameters used in the design of the network

| Index | Parameter | Criterion |
|---|---|---|
| FPNN | Number of layers (NL) | 1~3 |
| FPNN | Max. number of nodes (w) created in a layer | 15 |
| PFN | Number of input variables | 2 |
| PFN | Fuzzification coefficient (f) | 2 |
| PFN | Number of clusters (c) | 2 |
| PFN | Polynomial order (PO) | L, Q, M |
[Step 4] Determine the system's input variables
Define the system's input variables x1, x2, ..., xn related to the output variable y.

[Step 5] Design the PFNs
For a selection of r inputs, the number of nodes (PFNs) to be generated in each layer becomes equal to $k = \frac{n!}{(n-r)!\,r!}$, where n is the total number of inputs and r stands for the number of chosen input variables. For instance, in the case of a PFN with two inputs and a linear polynomial, we have k partial descriptions such as (20); refer to Figure 1 and Table 1:

$$\hat{z}_k = a_0 + a_1(\mathbf{x}_p - \mathbf{v}_1^T) + a_2(\mathbf{x}_q - \mathbf{v}_2^T), \quad k = 1, 2, \ldots, \frac{n!}{(n-2)!\,2!} \qquad (20)$$
[Step 6] Check the termination criterion
The size of the network (expressed in terms of the maximum polynomial order used for prediction, or the maximal number of PFN layers) is used as the termination condition, which controls the polynomial order of the HRBFNNs. As far as the depth of the network is concerned, the generation process is stopped at a depth of at most three layers of PFNs. This size of the network has been found experimentally to form a sound compromise between the high accuracy of the resulting neural network and its complexity as well as its generalization abilities.

[Step 7] Select the nodes with the best predictive capability and construct the corresponding layer
To select the nodes (PFNs) exhibiting the highest predictive capability, we use the following steps:

[Step 7-1] Estimate the polynomial coefficients (a0, a1, ..., a5) of each PFN by using the subset of training data and validation data.
[Step 7-2] Determine the identification error (EPI) of each PFN with the aid of the testing data set.
[Step 7-3] Sort and rearrange all PFNs in ascending order of their error (EPI1, EPI2, ..., EPIn!/((n-r)!r!)).
[Step 7-4] Select the best w nodes, which are used for constructing the PFNs of the next layer; w is a predefined number of nodes with better predictive capability. There are two cases regarding the number of nodes preserved in each layer:
i) If the number of created nodes satisfies n!/((n-r)!r!) > w, then the number of nodes retained for the next layer is equal to w. Namely, w of all the generated nodes are selected according to the low end of the EPI ranking, and the remaining (n!/((n-r)!r!)) - w nodes are discarded.
ii) If the number of created nodes satisfies n!/((n-r)!r!) ≤ w, then the number of nodes retained for the next layer is equal to n!/((n-r)!r!).
Through conditions i) and ii), one can effectively limit the number of nodes and avoid a significant computing overhead; a sketch of this selection is given below.
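The selection in Steps 7-3 and 7-4 amounts to ranking the candidate nodes by EPI and truncating the list; a minimal sketch (the node objects are placeholders of ours):

```python
import math

def select_best_nodes(candidates, w):
    """candidates: list of (epi, node) pairs for one layer; keep at most w nodes."""
    ranked = sorted(candidates, key=lambda pair: pair[0])  # ascending EPI
    return ranked[:w]                                      # covers cases i) and ii) of Step 7-4

# Number of candidate PFNs per layer (Step 5): k = n! / ((n - r)! r!)
print(math.comb(7, 2))  # e.g., 21 candidate two-input PFNs for seven inputs
```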
[Figure 2(a) shows the arrangement of the chromosome used in HRBFNNs: the fuzzification coefficient, the number of clusters, the number of input variables, and the candidate input variables. Figure 2(b) gives an example of its interpretation: ① fuzzification coefficient: 1.10; ② number of clusters: 3; ③ number of input variables: 4; ④ selected input variables: [6 4 1 3], where 6, 4, 1, and 3 are indexes of input variables.]
Fig. 2. Chromosome composition of genetically optimized HRBFNNs and its interpretation

[Step 8] Select the structure of the HRBFNN by means of genetic optimization
With regard to the optimization of HRBFNNs, the maximal number of layers in the FPNN is set to three. We consider eight parameters, i.e., the number of clusters (fuzzy rules), the fuzzification coefficient of FCM, the number of input variables for the PFNs, and the specific subset of input variables. The first two parameters come from the premise part, while the other parameters come from the conclusion part (the first layer of the FPNN). Figure 2 depicts the arrangement of the chromosome used in the GA optimization. Fig. 2(b) offers an interpretation of the chromosome in case the upper bound of the search space of the fuzzy clusters is 3: the fuzzification coefficient is 1.1, the number of clusters is 3, and the first four values present in part ④ are selected (a sketch of this decoding is given after Step 9). The structure selection of HRBFNNs is formed through the following steps:

[Step 8-1] Form the optimal subset of virtual input variables (Z1, Z2, ..., ZP) with the aid of the GA.
[Step 8-2] Determine the two premise parameters (the fuzzification coefficient and the number of clusters) by means of the GA, and then construct the partition matrix W based on the original input variables x1, x2, ..., xn by using FCM.
[Step 8-3] Compute the output of the neural network by using the weighted average defuzzifier (decoding scheme).

[Step 9] Output the final result
An overall design flowchart for the proposed HRBFNNs is shown in Fig. 3.
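The decoding of Fig. 2(b) can be sketched as follows (the gene layout is assumed from the figure; the values are the figure's example):

```python
def decode_chromosome(genes):
    """Decode the chromosome of Fig. 2: [fuzzification coefficient,
    no. of clusters, no. of input variables, candidate input indices ...]."""
    f = genes[0]                                 # part 1: fuzzification coefficient
    c = int(genes[1])                            # part 2: number of clusters
    p = int(genes[2])                            # part 3: number of input variables
    selected = [int(g) for g in genes[3:3 + p]]  # part 4: first p candidate indices
    return f, c, selected

print(decode_chromosome([1.10, 3, 4, 6, 4, 1, 3, 5, 7, 2, 8]))  # (1.1, 3, [6, 4, 1, 3])
```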
5. Experimental Studies
To evaluate the performance of the model, the proposed HRBFNNs are experimented with on a series of numerical data sets [28-34]. In the first experiment, the proposed network is applied to the MIS data, a well-known software engineering data set. The second and third experiments proceed with two data sets selected from the Machine Learning Repository. Finally, the experimental results for several other real-world problems of varying levels of complexity (different numbers of input variables and of available data) are summarized in the Appendix. For each experiment, two design scenarios are considered: in the first case, the experiments are completed on the original data set (without the PCA reduction), while in the second case, the data set is preprocessed using PCA. In all cases, the experiment was repeated 10 times, leading to a random sub-sampling validation; a sketch of this preprocessing and splitting is given below. Each data set is divided into three parts: 50% of the data set is used as training data, 30% as validation data, and the remaining 20% as testing data. For convenience, the following notation is used: PO is the polynomial order of the PFNs, PI denotes the performance index for the training data set, VPI represents the performance for the validation data set, and EPI concerns the performance for the testing data set.
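A minimal sketch of the preprocessing and the random 50/30/20 sub-sampling used in the experiments (assuming scikit-learn's PCA and NumPy arrays; not the authors' code):

```python
import numpy as np
from sklearn.decomposition import PCA

def prepare(X, y, n_components, rng):
    """PCA on the inputs, then a random split: 50% training, 30% validation, 20% testing."""
    Z = PCA(n_components=n_components).fit_transform(X)
    idx = rng.permutation(len(Z))
    n_tr, n_va = int(0.5 * len(Z)), int(0.8 * len(Z))
    parts = idx[:n_tr], idx[n_tr:n_va], idx[n_va:]
    return [(Z[p], y[p]) for p in parts]   # [(train), (validation), (test)]

# Repeated 10 times (e.g., rng = np.random.default_rng(run)) for sub-sampling validation.
```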
Table 3. Generic parameters of the genetic algorithm (GA)

| Item | Criterion |
|---|---|
| Population size | 100 |
| Crossover probability | 0.65 |
| Mutation probability | 0.1 |
| Generations (iterations) | 300 |
| Selection | Roulette wheel |
Table 3 lists the parameters of the GA. The selection mechanism of the GA uses a roulette wheel, while the generic parameters of the GA are set up as follows: we use 300 generations and a population of 100 individuals for the optimization of HRBFNNs. The crossover rate and mutation probability are set to 0.65 and 0.1, respectively (the choice of these specific values is a result of intensive experimentation; as a matter of fact, these values are in line with those reported in the literature). A sketch of the resulting evolutionary loop follows.
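A sketch of an evolutionary loop with the settings of Table 3 (roulette-wheel selection, crossover 0.65, mutation 0.1, 300 generations); the real-valued encoding, fitness mapping, and mutation noise are our placeholder assumptions, not the authors' implementation:

```python
import numpy as np

def evolve(pop, objective, rng, generations=300, pc=0.65, pm=0.1):
    """pop: array (pop_size, n_genes); objective: smaller is better, cf. Eq. (18)."""
    for _ in range(generations):
        fit = 1.0 / (1e-12 + np.array([objective(ind) for ind in pop]))
        p = fit / fit.sum()                              # roulette-wheel probabilities
        children = []
        while len(children) < len(pop):
            a, b = pop[rng.choice(len(pop), size=2, p=p)]
            child = a.copy()
            if rng.random() < pc:                        # one-point crossover
                cut = int(rng.integers(1, len(child)))
                child[cut:] = b[cut:]
            mask = rng.random(len(child)) < pm           # gene-wise mutation
            child[mask] += rng.normal(0.0, 0.1, mask.sum())
            children.append(child)
        pop = np.array(children)
    return pop
```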
[Figure 3 depicts the design flow: (1) preprocess the data using PCA; (2) divide the data into training, validation, and testing parts; (3) set the initial information for constructing the HRBFNN architecture (number of PFN layers 0 ≤ q ≤ 3, number of nodes ≤ w; PFN parameters: number of input variables 2 ≤ r ≤ 4, number of clusters 2 ≤ c ≤ 5, polynomial type L, Q, M); (4) obtain the entire system's input variables x1, x2, ..., xk; (5) construct the PFNs (PFN1, PFN2, ..., PFNm with m = k!/((k - r)! r!)); (6)-(7) select nodes and check the termination criterion; (8) select the HRBFNN architecture using the GA (8-1 selection of input variables, 8-2 fuzzification coefficient 1.2 ≤ f ≤ 4.0 and number of clusters 2 ≤ c ≤ 12 for Fuzzy C-Means clustering and the local models, 8-3 defuzzification); (9) produce the final output ŷ.]
Fig. 3. An overall flowchart of the design of HRBFNNs

5.1 Medical Imaging System (MIS)

In this section, the proposed HRBFNNs are applied to software engineering data. Here we consider a medical imaging system data set, which involves 390 software modules written in Pascal and FORTRAN. Each module is described by 11 input variables, that is, total lines of code including comments (LOC), total code lines (CL), total character count (TChar), total comments (TComm), number of comment characters (MChar), number of code characters (DChar), Halstead's program length (N), Halstead's estimated program length ($\hat{N}$), Jensen's estimator of program length (NF), McCabe's cyclomatic complexity (V(G)), and Belady's bandwidth metric (BW) [28]. The output variable of the model is the number of reported changes, i.e., change reports (CRs). We consider the RMSE defined by (19) as the performance index.
Table 4. Performance index of HRBFNNs for the MIS data

(a) Without the use of PCA

| PO (PFN) | NL (FPNN) | PI | VPI | EPI |
|---|---|---|---|---|
| L | 1 | 0.7074±0.104 | 1.4268±0.204 | 1.0576±0.438 |
| L | 2 | 0.6995±0.106 | 1.4413±0.209 | 1.0596±0.447 |
| L | 3 | 0.6996±0.107 | 1.4322±0.198 | 1.0541±0.471 |
| Q | 1 | 0.7069±0.115 | 1.5074±0.194 | 1.0000±0.363 |
| Q | 2 | 0.6860±0.099 | 1.5574±0.244 | 1.0356±0.444 |
| Q | 3 | 0.6815±0.093 | 1.4756±0.214 | 1.0585±0.403 |
| M | 1 | 0.8077±0.089 | 1.8921±0.204 | 1.1146±0.481 |
| M | 2 | 0.8023±0.086 | 1.9971±0.305 | 1.1187±0.459 |
| M | 3 | 0.7861±0.099 | 2.6056±1.598 | 1.1960±0.525 |

(b) With the use of PCA

| PO (PFN) | NL (FPNN) | PI | VPI | EPI |
|---|---|---|---|---|
| L | 1 | 0.2802±0.049 | 0.5656±0.098 | 0.4264±0.221 |
| L | 2 | 0.2769±0.048 | 0.5608±0.099 | 0.4299±0.226 |
| L | 3 | 0.2739±0.048 | 0.5603±0.098 | 0.4321±0.226 |
| Q | 1 | 0.2908±0.048 | 0.5618±0.094 | 0.4274±0.231 |
| Q | 2 | 0.2797±0.047 | 0.5559±0.094 | 0.4104±0.219 |
| Q | 3 | 0.2763±0.049 | 0.5545±0.094 | 0.4460±0.259 |
| M | 1 | 0.2853±0.049 | 0.5937±0.119 | 0.4668±0.303 |
| M | 2 | 0.2720±0.051 | 0.6723±0.171 | 0.5493±0.292 |
| M | 3 | 0.2656±0.051 | 1.9336±3.369 | 3.6293±6.653 |
The results are listed in Table 4. Here the experimental results are displayed in the same manner as in the subsequent examples, and the model with the best PI is considered the optimal model. As shown in Table 4, the optimal model emerges according to the layer of the assigned PFN. Without the use of PCA, the optimal performance is PI = 0.6815±0.093, VPI = 1.4756±0.214, and EPI = 1.0585±0.403, while with the use of PCA the optimal performance comes with PI = 0.2763±0.049, VPI = 0.5545±0.094, and EPI = 0.4460±0.259, reported for the three-layer FPNN.
Fig. 4. Performance index of HRBFNNs for the MIS data without the use of PCA: (a) training data error (PI), (b) validation data error (VPI), (c) testing data error (EPI), each versus the number of FPNN layers.
Fig. 5. Performance index of HRBFNNs for the MIS data with the use of PCA: (a) training data error (PI), (b) validation data error (VPI), (c) testing data error (EPI), each versus the number of FPNN layers.
[Figure 6 shows the optimal topologies (FCM premise with two clusters and a two-layer FPNN of PFNs): (a) without PCA, built on the original input variables and producing F1(z3, z8, z14) and F2(z3, z8, z14); (b) with PCA, built on the four reduced variables x1, ..., x4 and producing F1(z1, z2, z4) and F2(z1, z2, z4).]

Fig. 6. Optimal HRBFNN architectures for the MIS data: (a) without the use of PCA; (b) with the use of PCA.
Table 5. Results of comparative analysis (MIS)

| Model | | PI | VPI | EPI | Index |
|---|---|---|---|---|---|
| SONFN [28] | Simplified | 40.753 | - | 17.898 | MSE |
| | Linear | 35.745 | - | 17.807 | MSE |
| FPNN [20] | SI = 2 | 32.195 | - | 18.462 | MSE |
| | SI = 3 | 32.251 | - | 19.622 | MSE |
| GA-based FSONN [29] | PN-based FSONN | 18.043 | - | 11.898 | MSE |
| | FPN-based FSONN | 23.739 | - | 9.090 | MSE |
| Regression model [30] / HPGA-optimized FIS [30] | All input | 40.05 | - | 36.32 | MSE |
| | Joint | 35.23 | - | 28.03 | MSE |
| | Successive | 32.34 | - | 25.98 | MSE |
| Incremental model [28] | Linear regression | 5.877±0.626 | - | 6.570±1.024 | RMSE |
| | Incremental model | 4.620±0.896 | - | 6.624±0.773 | RMSE |
| GO-FPNN [31] (3rd layer) | SI = 2 | 2.2141±0.879 | - | 3.4630±1.490 | MSE |
| | SI = 3 | 0.8852±0.082 | - | 3.4690±0.780 | MSE |
| | SI = 4 | 0.7818±0.085 | - | 3.4211±1.306 | MSE |
| Our model (HRBFNNs), without PCA, PO=Q | NL=1 | 0.7069±0.115 | 1.5074±0.194 | 1.0000±0.363 | RMSE |
| | NL=2 | 0.6860±0.099 | 1.5574±0.244 | 1.0356±0.444 | RMSE |
| | NL=3 | 0.6815±0.093 | 1.4756±0.214 | 1.0585±0.403 | RMSE |
| Our model (HRBFNNs), with PCA, PO=Q | NL=1 | 0.2908±0.048 | 0.5618±0.094 | 0.4274±0.231 | RMSE |
| | NL=2 | 0.2797±0.047 | 0.5559±0.094 | 0.4104±0.219 | RMSE |
| | NL=3 | 0.2763±0.049 | 0.5545±0.094 | 0.4460±0.259 | RMSE |
For the HRBFNN with NL equal to 3, the impact of the number of FPNN layers on the performance of the networks is shown in Fig. 4 and Fig. 5, respectively. Increasing the number of layers leads to a reduction of the error for the training and validation sets, while a different tendency holds for the testing set: the training error decreases with the increasing approximation abilities of the HRBFNN, whereas the testing error becomes higher. Fig. 6 compares the optimal topologies of the HRBFNN at the third layer of the FPNN. It is apparent that only four input variables remain with the use of PCA, while the performance of this HRBFNN is better than that of the HRBFNN without the use of PCA. Table 5 reports the performance of the proposed model vis-à-vis the performance of other models. In all cases, the proposed model provides substantially better approximation and generalization capabilities.
5.2 Abalone Data (ABA)

The second data set is the Abalone machine learning data, which concerns predicting the age of abalone from physical measurements [28]. This is a larger data set consisting of 4177 input-output pairs and seven input variables (Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, and Shell weight). The performance index is the MSE defined by (19). For the ABA data set, the optimal performance of the proposed HRBFNNs is summarized in Table 6. As shown there, the tendencies in the performance of PFNs of different polynomial orders are similar to those observed for the MIS data. That is to say, the proposed HRBFNNs lead to stable performance when considering a sound balance between the approximation and generalization capabilities. With the use of PCA, the dimensionality of the input space was reduced from seven to three. It has to be stressed that the performance of the model with the PCA reduction is considerably better than the results produced by the model without the use of PCA.

Table 6. Performance index of HRBFNNs obtained for the ABA data
(a) Without the use of PCA

| PO (PFN) | NL (FPNN) | PI | VPI | EPI |
|---|---|---|---|---|
| L | 1 | 5.1252±0.342 | 4.3092±0.394 | 3.2088±0.649 |
| L | 2 | 5.1324±0.359 | 4.2631±0.433 | 3.2170±0.772 |
| L | 3 | 5.0647±0.375 | 4.2146±0.470 | 3.0610±0.688 |
| Q | 1 | 5.1379±0.355 | 4.3177±0.394 | 3.2020±0.660 |
| Q | 2 | 5.0776±0.339 | 4.2137±0.428 | 3.2858±0.895 |
| Q | 3 | 5.0508±0.363 | 4.2367±0.427 | 3.0705±0.778 |
| M | 1 | 4.7720±0.293 | 4.1700±0.411 | 3.2713±1.174 |
| M | 2 | 4.7153±0.295 | 4.1556±0.402 | 3.3350±1.165 |
| M | 3 | 4.6917±0.298 | 4.1377±0.430 | 3.5023±1.317 |

(b) With the use of PCA

| PO (PFN) | NL (FPNN) | PI | VPI | EPI |
|---|---|---|---|---|
| L | 1 | 0.1456±0.008 | 0.1664±0.010 | 0.1683±0.045 |
| L | 2 | 0.1401±0.011 | 0.1640±0.013 | 0.1587±0.356 |
| L | 3 | 0.1371±0.012 | 0.1605±0.013 | 0.1608±0.045 |
| Q | 1 | 0.1403±0.010 | 0.1586±0.012 | 0.1529±0.019 |
| Q | 2 | 0.1231±0.014 | 0.1462±0.016 | 0.1302±0.014 |
| Q | 3 | 0.1202±0.014 | 0.1409±0.019 | 0.1288±0.015 |
| M | 1 | 0.1347±0.016 | 0.1309±0.015 | 0.1350±0.016 |
| M | 2 | 0.1315±0.016 | 0.1333±0.015 | 0.1377±0.018 |
| M | 3 | 0.1311±0.016 | 0.1332±0.015 | 0.1366±0.015 |
Figures 7 and 8 depict the tendency of the changes in the performance index of the best neural networks. They show that, with the increasing number of layers in the FPNN, the model produces a better approximation while the prediction ability decreases to some extent.
Fig. 7. Performance index of HRBFNNs for the ABA data without the use of PCA: (a) training data error (PI), (b) validation data error (VPI), (c) testing data error (EPI).
Fig. 8. Performance index of HRBFNNs for the ABA data with the use of PCA: (a) training data error (PI), (b) validation data error (VPI), (c) testing data error (EPI).
[Figure 9 shows the optimal topologies: (a) without PCA, FCM with two clusters over x1, x4, x5, x7 and a two-layer FPNN producing F1(z2, z3, z7) and F2(z2, z3, z7); (b) with PCA, FCM with two clusters over the three reduced variables x1, x2, x3 and a two-layer FPNN producing F1(z1, z2, z3) and F2(z1, z2, z3).]

Fig. 9. Optimal HRBFNN architectures for the ABA data: (a) without the use of PCA; (b) with the use of PCA.
Figure 9 visualizes the architectures of the optimal HRBFNNs for the ABA data set. It is clear that the optimal HRBFNN with the use of PCA leads to better performance and a simpler structure in comparison with the HRBFNN without the use of PCA. The comparative analysis in Table 7 contrasts the proposed model with other models; considering both the approximation and generalization aspects, the proposed models are the preferred architectures in the modeling of the ABA data set.
Table 7. Results of comparative analysis (ABA)

| Model | | PI | VPI | EPI |
|---|---|---|---|---|
| Linear regression [32] | | 14.15±0.07 | - | 17.22±0.20 |
| RBFN [32] | H=30 | 10.36±0.02 | - | 10.48±0.01 |
| RBFN + context-free clustering [32] | H=30 | 10.54±0.01 | - | 10.58±0.006 |
| Boosting of granular model [32] | P=5, C=6 | 8.39±0.008 | - | 8.68±0.014 |
| RBFNN [33] | | 6.36±0.24 | - | 6.94±0.31 |
| RBFNN with context-free clustering [33] | | 5.52±0.25 | - | 6.91±0.45 |
| Linguistic modeling [33] | Without optimization | 5.21±0.12 | - | 6.14±0.28 |
| | One-loop optimization | 4.80±0.52 | - | 5.22±0.58 |
| | Multi-step optimization | 4.12±0.35 | - | 5.32±0.96 |
| RBFNN I [34] | H=30 | 4.496±0.195 | - | 4.94±0.245 |
| Functional RBFNN [34] | m=2.0; H=33 | 3.846±0.113 | - | 4.800±0.235 |
| | m=2.5; H=18 | 4.147±0.127 | - | 4.892±0.205 |
| | m=4.0; H=33 | 4.369±0.190 | - | 4.643±0.202 |
| PSO-based PNN [22] | | 4.338±0.231 | 4.398±0.282 | 7.997±9.465 |
| PSO-based FPNN [22] | | 4.217±0.142 | 4.340±0.188 | 2e+7±5e+7 |
| RBFPNN [22] | | 3.605±0.169 | 4.561±0.272 | 4.710±0.224 |
| Our model (HRBFNNs), without PCA, PO=M | NL=1 | 4.7720±0.293 | 4.1700±0.411 | 3.2713±1.174 |
| | NL=2 | 4.7153±0.295 | 4.1556±0.402 | 3.3350±1.165 |
| | NL=3 | 4.6917±0.298 | 4.1377±0.430 | 3.5023±1.317 |
| Our model (HRBFNNs), with PCA, PO=Q | NL=1 | 0.1403±0.010 | 0.1586±0.012 | 0.1529±0.019 |
| | NL=2 | 0.1231±0.014 | 0.1462±0.016 | 0.1302±0.014 |
| | NL=3 | 0.1202±0.014 | 0.1409±0.019 | 0.1288±0.015 |
5.3 Boston Housing Data

The third data set is the well-known Boston Housing (BH) data [33-35], which deals with house prices and comprises 506 observations, 13 independent variables, and a single output variable. In our experiments, the performance of the proposed neural network is evaluated in terms of the average and standard deviation of the RMSE. Table 8 summarizes the performance of the HRBFNNs; the best performance is shown in bold. There is a clearly visible tendency for the values of the performance index to get lower with the increase of the order of the polynomial. This effect is not surprising, as too many information granules (higher granularity) might contribute to a potential memorization effect. In most cases, an appropriate increase of the polynomial order (complexity) improves the prediction abilities (EPI) of the proposed model. It is also apparent that PCA preprocessing has an important impact on the enhanced performance of the neural networks.
Table 8. Performance index of HRBFNNs for the BH data

(a) Without the use of PCA

| PO (PFN) | NL (FPNN) | PI | VPI | EPI |
|---|---|---|---|---|
| L | 1 | 2.1476±0.643 | 4.0877±1.243 | 4.2276±1.639 |
| L | 2 | 1.9638±0.585 | 3.8616±1.174 | 9.4547±9.108 |
| L | 3 | 1.8692±0.556 | 3.7816±1.141 | 4.1689±1.267 |
| Q | 1 | 2.3483±0.395 | 4.4705±0.763 | 4.3438±0.298 |
| Q | 2 | 1.9672±0.106 | 4.1808±0.662 | 3.9196±0.593 |
| Q | 3 | 1.8839±0.086 | 4.3140±0.782 | 4.2471±0.319 |
| M | 1 | 2.5399±0.365 | 4.5112±0.469 | 3.9173±0.455 |
| M | 2 | 2.0898±0.310 | 4.2564±0.318 | 3.6317±0.213 |
| M | 3 | 1.9151±0.277 | 4.4086±0.583 | 4.7049±1.406 |

(b) With the use of PCA

| PO (PFN) | NL (FPNN) | PI | VPI | EPI |
|---|---|---|---|---|
| L | 1 | 0.3758±0.017 | 0.4767±0.035 | 0.4990±0.051 |
| L | 2 | 0.3715±0.012 | 0.4751±0.032 | 0.5001±0.074 |
| L | 3 | 0.3731±0.011 | 0.4734±0.036 | 0.4871±0.065 |
| Q | 1 | 0.3369±0.018 | 0.4681±0.050 | 0.4933±0.075 |
| Q | 2 | 0.3226±0.021 | 0.4687±0.053 | 0.4801±0.094 |
| Q | 3 | 0.3226±0.022 | 0.4601±0.055 | 0.5001±0.075 |
| M | 1 | 0.3790±0.023 | 0.4862±0.045 | 0.5286±0.084 |
| M | 2 | 0.3403±0.023 | 0.4787±0.052 | 0.6663±0.500 |
| M | 3 | 0.3205±0.021 | 0.4694±0.055 | 0.8349±1.001 |
Fig. 10. Performance index of HRBFNNs for the BH data without the use of PCA: (a) training data error (PI), (b) validation data error (VPI), (c) testing data error (EPI).
Fig. 11. Performance index of HRBFNNs for the BH data with the use of PCA: (a) training data error (PI), (b) validation data error (VPI), (c) testing data error (EPI).
[Figure 12 shows the optimal topologies: (a) without PCA, FCM with two clusters over x1, ..., x13 and a two-layer FPNN over x2 (Zn), x5 (Nox), x6 (Rm), x10 (Tax), x11 (Ptratio), and x13 (Lstat), producing F1(z11, ..., z15) and F2(z11, ..., z15); (b) with PCA, FCM with two clusters over the seven reduced variables x1, ..., x7 and a two-layer FPNN producing F1(z1, z5, z9) and F2(z1, z5, z9).]

Fig. 12. Optimal HRBFNN architectures for the BH data: (a) without the use of PCA; (b) with the use of PCA.
Fig. 10 and Fig. 11 display the values of the performance index (both the mean value and the standard deviation) of the HRBFNNs for an increasing number of FPNN layers. As shown in these figures, the training error decreases with the increasing approximation abilities of the HRBFNN. The testing error is reduced when NL (the number of layers in the FPNN) is equal to two, while it becomes high when NL reaches three. This tendency shows that a substantial increase of the polynomial order of the proposed model reduces the training error, while it sometimes worsens the prediction abilities. The optimal HRBFNNs formed for the BH data set are shown in Figure 12. With the use of PCA, the number of input variables is reduced to seven. The results demonstrate that the use of PCA is beneficial to the prediction abilities of the model in contrast to the results produced by the HRBFNN without the use of PCA. Table 9 reports the results of a comparative analysis of the performance of the proposed network against some other models. The experimental results reveal that the proposed network outperforms the existing models both in terms of its approximation capabilities and its generalization abilities.
Table 9. Results of comparative analysis (BH)

| Model | | PI | VPI | EPI | Index |
|---|---|---|---|---|---|
| RBFNN [33] | | 6.36±0.24 | - | 6.94±0.31 | RMSE |
| RBFNN with context-free clustering [33] | | 5.52±0.25 | - | 6.91±0.45 | RMSE |
| Linguistic modeling [33] | Without optimization | 5.21±0.12 | - | 6.14±0.28 | RMSE |
| | One-loop optimization | 4.80±0.52 | - | 5.22±0.58 | RMSE |
| | Multi-step optimization | 4.12±0.35 | - | 5.32±0.96 | RMSE |
| Incremental model [28] | Linear regression | 4.535±0.240 | - | 5.043±0.396 | RMSE |
| | Polynomial (2nd order) | 3.815±0.264 | - | 4.455±0.399 | RMSE |
| | Incremental model | 3.279±0.177 | - | 4.298±0.439 | RMSE |
| RBFNN I [34] | H=30 | 5.105±1.816 | - | 5.907±2.311 | RMSE |
| RBFNN II [34] | H=30 | 7.031±2.202 | - | 7.558±3.063 | RMSE |
| Functional RBFNN [34] | m=2.0 | 4.724±0.644 | - | 14.064±0.820 | MSE |
| | m=2.5 | 8.079±1.762 | - | 14.825±1.361 | MSE |
| | m=3.5 | 8.450±1.029 | - | 14.523±1.563 | MSE |
| PSO-based PNN [34] | | 2.129±0.304 | 3.738±0.625 | 7.661±9.690 | RMSE |
| PSO-based FPNN [34] | | 2.779±0.325 | 3.298±0.327 | 7.479±7.596 | RMSE |
| RBFPNN [22] | | 1.398±0.244 | 3.637±0.298 | 4.827±1.474 | RMSE |
| QANFN [35] | Type-1 hybrid | 2.60 | - | 3.63 | RMSE |
| | Type-2 hybrid | 2.35 | - | 3.87 | RMSE |
| Our model (HRBFNNs), without PCA, PO=Q | NL=1 | 2.3483±0.395 | 4.4705±0.763 | 4.3438±0.298 | RMSE |
| | NL=2 | 1.9672±0.106 | 4.1808±0.662 | 3.9196±0.593 | RMSE |
| | NL=3 | 1.8839±0.086 | 4.3140±0.782 | 4.2471±0.319 | RMSE |
| Our model (HRBFNNs), with PCA, PO=Q | NL=1 | 0.3369±0.018 | 0.4681±0.050 | 0.4933±0.075 | RMSE |
| | NL=2 | 0.3226±0.021 | 0.4687±0.053 | 0.4801±0.094 | RMSE |
| | NL=3 | 0.3226±0.022 | 0.4601±0.055 | 0.5001±0.075 | RMSE |
6. Concluding Remarks

Classical polynomial neural networks (PNNs) have placed substantial emphasis on the generation of complex nonlinear functions. However, they do not exhibit abilities to deal with granular information. In this regard, fuzzy sets bring some useful opportunities. By combining PNNs and FCM, we take advantage of the two technologies and propose the architecture of HRBFNNs to address a series of prediction problems. The work contributes to the research on neuro-fuzzy models, and the main findings can be briefly summarized as follows.

First, we have investigated hybrid radial basis function neural networks. On the one hand, when compared with the conventional PNN, the HRBFNN helps overcome the PNN's limitations with regard to information granulation. Information granules, which are essentially reflective of the characteristics of the system (experimental data), offer a suitable way of realizing abstraction that converts the original problem into a series of manageable subtasks. On the other hand, when compared with the conventional FRBFNN, the HRBFNN obtains the capability to estimate the complex functional (highly polynomial) character of the relationship between the input and output data, especially its high-order nonlinear character. Instead of conventional neurons, PFNs are constructed to generate polynomial functions for better predictive capability.

Second, we have investigated HRBFNNs optimized by genetic algorithms. In the design of the HRBFNN, a number of parameters are required to be determined. With the use of GAs, the proposed HRBFNN produces optimal parameters.

Third, we have used PCA to preprocess the data sets. As an important step in data processing, PCA is used for reducing the dimensionality of the input variables. With the use of PCA, the proposed HRBFNN leads to better performance of the developed network; on average, we have observed at least a 50% improvement of the performance.

The experimental studies using several well-known data sets show a superb performance of the HRBFNN when compared with some recent neuro-fuzzy models, especially for data sets such as the medical imaging system data and the abalone machine learning data. More importantly, with the proposed network with information granulation, one can efficiently develop the optimal topology of the network (optimization of the structural and parametric network architecture), which is crucial to improving the performance of the resulting model. In future studies, HRBFNNs may be improved by constructing new fuzzy neurons or new architectures. Furthermore, multi-objective evolutionary algorithms can be used to optimize the HRBFNNs.
Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61301140), by the Open Funding Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China (Grant No. BUAA-VR-14KF-11), by the GRRC program of Gyeonggi province [GRRC Suwon 2013-B2, Center for U-city Security & Surveillance Technology], and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012R1A1B3003568).
References

[1] K. D. Karatzas, S. Kaltsatos, "Air pollution modelling with the aid of computational intelligence methods in Thessaloniki, Greece", Simulation Modelling Practice and Theory, Vol. 15, pp. 1310-1319, 2007.
[2] G. Bao, S. Wen, Z. Zeng, "Robust stability analysis of interval fuzzy Cohen-Grossberg neural networks with piecewise constant argument of generalized type", Neural Networks, Vol. 33, pp. 32-34, 2012.
[3] C. Riziotis, A. V. Vasilakos, "Computational intelligence in photonics technology and optical networks: A survey and future perspectives", Information Sciences, Vol. 177, pp. 5292-5315, 2007.
[4] R. del-Hoyo, B. Martin-del-Brio, N. Medrano, J. Fernandez-Navajas, "Computational intelligence tools for next generation quality of service management", Neurocomputing, Vol. 72, pp. 3631-3639, 2009.
[5] S. Dick, A. Tappenden, C. Badke, O. Olarewaju, "A granular neural network: Performance analysis and application to re-granulation", International Journal of Approximate Reasoning, Vol. 54, No. 8, pp. 1149-1167, 2013.
[6] G. W. Chang, C. I. Chen, Y. F. Teng, "Radial-basis-function-based neural network for harmonic detection", IEEE Transactions on Industrial Electronics, Vol. 57, No. 6, pp. 2171-2179, 2010.
[7] G. Simone, F. C. Morabito, "RBFNN-based hole identification system in conducting plates", IEEE Transactions on Neural Networks, Vol. 12, No. 6, pp. 1445-1454, 2001.
[8] F. Behloul, B. P. F. Lelieveldt, A. Boudraa, J. H. C. Reiber, "Optimal design of radial basis function neural networks for fuzzy-rule extraction in high dimensional data", Pattern Recognition, Vol. 35, pp. 659-675, 2002.
[9] A. Staiano, R. Tagliaferri, W. Pedrycz, "Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering", Neurocomputing, Vol. 69, pp. 1570-1581, 2006.
[10] S. K. Oh, W. D. Kim, W. Pedrycz, B. J. Park, "Polynomial-based radial basis function neural networks (P-RBFNNs) realized with the aid of particle swarm optimization", Fuzzy Sets and Systems, Vol. 163, pp. 54-77, 2011.
[11] H. G. Han, Q. L. Chen, J. F. Qiao, "An efficient self-organizing RBF neural network for water quality prediction", Neural Networks, Vol. 24, pp. 717-725, 2011.
[12] C. Y. Liu, C. Chen, C. T. Chang, L. M. Shih, "Single-hidden-layer feed-forward quantum neural network based on Grover learning", Neural Networks, Vol. 45, pp. 144-150, 2013.
[13] A. G. Ivakhnenko, G. A. Ivakhnenko, "The review of problems solvable by algorithms of the Group Method of Data Handling (GMDH)", Pattern Recognition and Image Analysis, Vol. 5, No. 4, pp. 527-535, 1995.
[14] B. J. Park, W. Pedrycz, S. K. Oh, "Fuzzy polynomial neural networks: hybrid architectures of fuzzy modeling", IEEE Transactions on Fuzzy Systems, Vol. 10, No. 5, pp. 607-621, 2002.
[15] A. Mital, "Prediction of human static and dynamic strengths by modified basic GMDH algorithm", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 14, pp. 773-776, 1984.
[16] M. Nagata, K. Takada, M. Sakuda, "Nonlinear interpolation of mandibular kinesiographic signals by applying sensitivity method to a GMDH correction model", IEEE Transactions on Biomedical Engineering, Vol. 38, No. 4, pp. 326-329, 1991.
[17] M. Iwasaki, H. Takei, N. Matsui, "GMDH-based modeling and feedforward compensation for nonlinear friction in table drive systems", IEEE Transactions on Industrial Electronics, Vol. 50, No. 6, pp. 1172-1178, 2003.
[18] E. E. Elattar, J. Goulermas, Q. H. Wu, "Generalized locally weighted GMDH for short term load forecasting", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 42, No. 3, pp. 345-356, 2012.
[19] S. K. Oh, W. Pedrycz, B. J. Park, "Polynomial neural networks architecture: analysis and design", Computers and Electrical Engineering, Vol. 29, pp. 703-725, 2003.
[20] S. K. Oh, W. Pedrycz, "Fuzzy polynomial neuron-based self-organizing neural networks", International Journal of General Systems, Vol. 32, No. 3, pp. 237-250, 2003.
[21] S. K. Oh, H. S. Park, W. D. Kim, W. Pedrycz, "A new approach to radial basis function-based polynomial neural networks: analysis and design", Knowledge and Information Systems, Vol. 157, pp. 121-151, 2013.
[22] J. S. R. Jang, C. T. Sun, "Functional equivalence between radial basis function networks and fuzzy inference systems", IEEE Transactions on Neural Networks, Vol. 4, No. 1, pp. 156-158, 1993.
[23] Y. Li, Q. Zhang, "The application of principal component analysis on financial analysis in real estate listed company", Procedia Engineering, Vol. 15, pp. 4499-4503, 2011.
[24] M. Filippone, "Dealing with non-metric dissimilarities in fuzzy central clustering algorithms", International Journal of Approximate Reasoning, Vol. 50, No. 2, pp. 363-384, 2009.
[25] W. Pedrycz, P. Rai, "Collaborative clustering with the use of Fuzzy C-Means and its quantification", Fuzzy Sets and Systems, Vol. 159, pp. 2399-2427, 2008.
[26] B. Wiswedel, M. R. Berthold, "Fuzzy clustering in parallel universes", International Journal of Approximate Reasoning, Vol. 45, No. 3, pp. 439-454, 2007.
[27] H. J. Song, C. Y. Miao, Z. Q. Shen, W. Roel, D. H. Maja, C. Francky, "Design of fuzzy cognitive maps using neural networks for predicting chaotic time series", Neural Networks, Vol. 23, No. 10, pp. 1264-1275, 2010.
[28] W. Pedrycz, K. C. Kwak, "The development of incremental models", IEEE Transactions on Fuzzy Systems, Vol. 15, No. 3, pp. 507-518, 2007.
[29] S. K. Oh, H. S. Park, C. W. Jeong, S. C. Joo, "GA-based feed-forward self-organizing neural network architecture and its applications for multi-variable nonlinear process systems", KSII Transactions on Internet and Information Systems, Vol. 3, No. 3, pp. 309-330, 2009.
[30] J. N. Choi, S. K. Oh, W. Pedrycz, "Identification of fuzzy models using a successive tuning method with a variant identification ratio", Fuzzy Sets and Systems, Vol. 159, pp. 2873-2889, 2008.
[31] S. K. Oh, W. D. Kim, B. J. Park, W. Pedrycz, "A design of granular-oriented self-organizing hybrid fuzzy polynomial neural networks", Neurocomputing, Vol. 119, pp. 292-307, 2013.
[32] W. Pedrycz, K. C. Kwak, "Boosting of granular models", Fuzzy Sets and Systems, Vol. 157, pp. 2943-2953, 2006.
[33] W. Pedrycz, K. C. Kwak, "Linguistic models as a framework of user-centric system modeling", IEEE Transactions on Systems, Man, and Cybernetics - Part A, Vol. 36, No. 4, pp. 727-745, 2006.
[34] W. Pedrycz, H. S. Park, S. K. Oh, "A granular-oriented development of functional radial basis function neural networks", Neurocomputing, Vol. 72, pp. 420-435, 2008.
[35] S. S. Kim, K. C. Kwak, "Development of quantum-based adaptive neuro-fuzzy networks", IEEE Transactions on Systems, Man, and Cybernetics - Part B, Vol. 40, No. 1, pp. 91-100, 2010.
[36] R. Alcala, P. Ducange, F. Herrera, B. Lazzerini, F. Marcelloni, "A multiobjective evolutionary approach to concurrently learn rule and data bases of linguistic fuzzy-rule-based systems", IEEE Transactions on Fuzzy Systems, Vol. 17, pp. 1106-1122, 2009.
[37] I. Robles, R. Alcala, J. M. Benitez, F. Herrera, "Evolutionary parallel and gradually distributed lateral tuning of fuzzy rule-based systems", Evolutionary Intelligence, Vol. 2, pp. 5-19, 2009.
[38] W. Pedrycz, H. S. Park, S. K. Oh, "A granular-oriented development of functional radial basis function neural networks", Neurocomputing, Vol. 72, pp. 420-435, 2008.
[39] H. Frigui, O. Bchir, N. Baili, "An overview of unsupervised and semi-supervised fuzzy kernel clustering", International Journal of Fuzzy Logic and Intelligent Systems (IJFIS), Vol. 13, No. 4, pp. 254-268, 2013.
APPENDIX

To further evaluate the proposed model, we test four well-known, larger data sets of different complexity (different numbers of variables and of available data). Table 10 shows the main characteristics of the four data sets. Table 11 provides a comparative analysis considering some existing models. It is evident that the proposed model compares favorably in terms of accuracy, prediction capabilities, and stability. Here MSE/2, used as the performance index in references [36], [37], and [38], is also adopted as the performance index of our models (HRBFNNs).
Table 10. Descriptions of the four selected machine learning data sets

| Name | Abbreviation | Variables | Patterns |
|---|---|---|---|
| Treasury | TR | 15 | 1049 |
| Mortgage | MO | 15 | 1049 |
| Weather Izmir | IZ | 9 | 1462 |
| Computer Activity | CA | 21 | 8192 |
Table 11. Comparison of performance with some selected models

| Model | Data set | | PI | VPI | EPI |
|---|---|---|---|---|---|
| Best model listed in the references | TR [36][37] | | 0.08±0.04 | unknown | 0.14±0.15 |
| | MO [36][37] | | 0.05±0.02 | unknown | 0.09±0.10 |
| | IZ [36][37][38] | | 1.48±0.34 | unknown | 1.64±0.34 |
| | CA [36] | | 11.99±2.99 | unknown | 13.43±4.66 |
| Our model (HRBFNNs), without PCA | TR | NL=1 | 0.105±0.008 | 0.116±0.016 | 0.115±0.020 |
| | | NL=2 | 0.102±0.006 | 0.113±0.153 | 0.116±0.020 |
| | | NL=3 | 0.102±0.007 | 0.108±0.140 | 0.118±0.023 |
| Our model (HRBFNNs), with PCA | TR | NL=1 | 0.083±0.014 | 0.069±0.017 | 0.083±0.012 |
| | | NL=2 | 0.080±0.015 | 0.067±0.018 | 0.081±0.011 |
| | | NL=3 | 0.079±0.014 | 0.067±0.019 | 0.080±0.011 |
| Our model (HRBFNNs), without PCA | MO | NL=1 | 0.058±0.002 | 0.054±0.003 | 0.057±0.005 |
| | | NL=2 | 0.056±0.004 | 0.053±0.005 | 0.058±0.009 |
| | | NL=3 | 0.053±0.004 | 0.051±0.003 | 0.057±0.005 |
| Our model (HRBFNNs), with PCA | MO | NL=1 | 0.0079±0.0007 | 0.0072±0.0009 | 0.0078±0.0013 |
| | | NL=2 | 0.0074±0.0008 | 0.0071±0.0011 | 0.0074±0.0015 |
| | | NL=3 | 0.0070±0.0006 | 0.0064±0.0010 | 0.0073±0.0015 |
| Our model (HRBFNNs), without PCA | IZ | NL=1 | 0.562±0.020 | 0.620±0.013 | 0.600±0.052 |
| | | NL=2 | 0.555±0.017 | 0.616±0.016 | 0.603±0.055 |
| | | NL=3 | 0.550±0.019 | 0.607±0.014 | 0.600±0.065 |
| Our model (HRBFNNs), with PCA | IZ | NL=1 | 0.078±0.008 | 0.092±0.007 | 0.086±0.008 |
| | | NL=2 | 0.075±0.008 | 0.088±0.008 | 0.084±0.004 |
| | | NL=3 | 0.073±0.008 | 0.089±0.008 | 0.125±0.129 |
| Our model (HRBFNNs), without PCA | CA | NL=1 | 11.46±0.443 | 11.53±0.575 | 11.99±0.986 |
| | | NL=2 | 11.03±0.452 | 11.10±0.582 | 12.03±0.972 |
| | | NL=3 | 11.21±0.427 | 11.24±0.624 | 12.23±0.954 |
| Our model (HRBFNNs), with PCA | CA | NL=1 | 0.107±0.026 | 0.118±0.034 | 0.114±0.034 |
| | | NL=2 | 0.094±0.030 | 0.107±0.040 | 0.099±0.036 |
| | | NL=3 | 0.088±0.033 | 0.101±0.043 | 0.108±0.036 |