Neurocomputing 119 (2013) 292–307
A design of granular-oriented self-organizing hybrid fuzzy polynomial neural networks

Sung-Kwun Oh a,⁎, Wook-Dong Kim a, Byoung-Jun Park b, Witold Pedrycz c,d

a Department of Electrical Engineering, The University of Suwon, San 2-2 Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do 445-743, South Korea
b Telematics & Vehicle-IT Convergence Research Department, IT Convergence Technology Research Laboratory, Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-dong, Yuseong-gu, Daejeon 305-350, South Korea
c Department of Electrical & Computer Engineering, University of Alberta, Edmonton, Canada T6R 2G7
d Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Article info
Article history: Received 27 July 2012; received in revised form 11 March 2013; accepted 24 March 2013; available online 4 June 2013. Communicated by B. Apolloni.

Abstract
In this study, we introduce a new design methodology of granular-oriented self-organizing hybrid fuzzy polynomial neural networks (HFPNN), based on a multi-layer perceptron with context-based polynomial neurons (CPNs) or polynomial neurons (PNs). In contrast to the typical architectures encountered in polynomial neural networks (PNN), our main objective is to develop a design strategy of HFPNN as follows: (a) The first layer of the proposed network consists of context-based polynomial neurons (CPNs). A CPN is fully reflective of the structure encountered in numeric data, which are granulated with the aid of the context-based fuzzy c-means (C-FCM) clustering method. The context-based clustering supporting the design of information granules is completed in the space of the input data (input variables), while the formation of the clusters is guided by a collection of some predefined fuzzy sets (so-called contexts) specified in the output space. (b) The proposed design procedure, applied to each layer of HFPNN, leads to the selection of the preferred nodes of the network (CPNs or PNs), whose local characteristics (such as the number of contexts, the number of clusters, a collection of a specific subset of input variables, and the order of the polynomial) can be easily adjusted. These options contribute to the flexibility as well as the simplicity and compactness of the resulting architecture of the network. For the evaluation of the performance of the proposed HFPNN, we use well-known data coming from the machine learning repository.

© 2013 Elsevier B.V. All rights reserved.
Keywords: Granular-oriented self-organizing hybrid fuzzy polynomial neural networks; Context-based polynomial neuron; Polynomial neuron; Context-based fuzzy c-means clustering method; Information granule; Machine learning data
1. Introductory remarks

With the continuously growing demand for models of complex systems inherently associated with nonlinearity, high-order dynamics, time-varying behavior, and imprecise measurements, there is a need for a relevant modeling environment. For these problems, traditional artificial intelligence methods were used to build logic-based architectures (involving knowledge, plans, inference, and so on); however, they are very difficult to apply to real systems operating in dynamic and uncertain environments. To help alleviate these problems, the methodology and techniques of Computational Intelligence (CI) have attracted attention in various areas of science and engineering. The research has demonstrated that various CI techniques can support the design of models with better performance (e.g., accuracy or interpretability) than those obtained when using a single "conventional" technique [1–5]. In this study, we introduce a new
⁎ Corresponding author. Tel.: +82 31 229 8162; fax: +82 31 220 2667. E-mail address: [email protected] (S.-K. Oh).

http://dx.doi.org/10.1016/j.neucom.2013.03.029
concept of granular-oriented self-organizing hybrid fuzzy polynomial neural networks to deal with nonlinear dynamic systems of considerable complexity and high dimensionality. The construction of the network exploits fundamental technologies of CI, namely polynomial neural networks and context-based fuzzy c-means clustering [10–15]. When dealing with high-order nonlinear and multivariable equations of the model, we require a vast amount of data for estimating all its parameters. The group method of data handling (GMDH), introduced by Ivakhnenko [16], is one of the approaches that help alleviate this problem. GMDH is a technique for identifying nonlinear relationships between a system's inputs and outputs. While providing a systematic design procedure, GMDH has some drawbacks. First, it tends to generate quite complex polynomials for relatively simple systems. Second, owing to its limited generic structure (a quadratic polynomial of two variables), GMDH also tends to produce an overly complex network when it comes to highly nonlinear systems. To alleviate these problems of the GMDH algorithms, polynomial neural networks (PNN) [6–9] were introduced as a new class of networks by Oh et al. The structure of the PNN is similar to that of a feedforward neural network. There is a certain important difference, though: a PNN is not
a statically organized network whose topology has been predefined and then left intact. On the contrary, in the design of PNN we encounter dynamically generated networks whose topologies can be adjusted throughout the entire design process. PNN networks come with a high level of flexibility, as each node of the network can have a different number of input variables and exploit a different order of the polynomial (linear, quadratic, cubic, etc.). In comparison with well-known neural networks, whose topologies are in general set up prior to all detailed (parametric) learning and left unchanged during the optimization, the PNN architecture is not fixed in advance but becomes fully optimized (both structurally and parametrically). As a result, these networks have exhibited significantly improved performance in comparison with previously constructed models [6–9]. In this study, we introduce a new modeling architecture called granular-oriented self-organizing hybrid fuzzy polynomial neural networks (HFPNN), constructed with the aid of context-based fuzzy c-means (C-FCM) clustering as a tool of data processing for information granulation, and PNN used to express the nonlinearity of a complex system. For effective modeling and to achieve accurate prediction abilities of the proposed model, the underlying design strategy comprises several aspects.
With respect to the network design, the first layer of the proposed network consists of context-based polynomial neurons (CPNs). A CPN is fully reflective of the structure encountered in numeric data, which are granulated with the aid of the C-FCM clustering method. The context-based clustering supporting the design of information granules is completed in the space of the input data (input variables), while the formation of the clusters is guided by a collection of some predefined fuzzy sets (so-called contexts) defined in the output space. In other words, the aim of CPN is an effective usage of granulation, which occurs at the level of the output variables (where we exploit fuzzy c-means clustering) and the input variables (where we employ C-FCM clustering that reflects upon the structure already formed in the output space). The proposed design procedure, applied at each layer of HFPNN, leads to the selection of the preferred nodes of the network (CPNs or PNs), whose local characteristics (such as the number of contexts, the number of clusters, a collection of a specific subset of input variables, and the order of the polynomial) can be easily adjusted. These options contribute to the flexibility as well as the simplicity and compactness of the resulting architecture of the network.
To evaluate the performance of the proposed model, we present three experimental studies exploiting well-known machine learning datasets already used in the realm of neural network modeling (namely, Automobile Miles Per Gallon, Boston Housing Price, and Medical Imaging System) [21]. The obtained results demonstrate the superiority of the proposed networks over already developed models. This paper is organized in the following manner. First, Section 2 provides an introduction to the architecture and the development of the CPN and PN layers of the HFPNN. Section 3 gives an overall description of the detailed design methodology of HFPNN. In Section 4, we report on a comprehensive set of experiments. Finally, concluding remarks are covered in Section 5.
2. The architecture of granular-oriented self-organizing HFPNN

In this section, we elaborate on the design and architecture of the granular-oriented self-organizing hybrid fuzzy polynomial neural networks (HFPNN). This network emerges on the basis of the context-based polynomial neuron (CPN) and the polynomial neuron (PN), regarded as generic types of processing units. Namely, the architecture of the resulting HFPNN combines CPNs located at the first layer of the network with PNs forming the remaining layers, as shown in Fig. 1. In order to enhance the abilities of data processing with the aid of information granulation, the CPNs building the first layer of the HFPNN are based on polynomial neural networks (PNN [6–9]). A CPN uses the concept of context-based fuzzy c-means (C-FCM [10–15]) clustering, which brings forward an idea of grouping data focused on structure discovery through a simultaneous treatment of the input and output data. It reflects the characteristics of the system under modeling (the experimental data, to be more specific) more fully than PNN does.
2.1. Data-driven development method

Roughly speaking, information granules are formed as linked collections of objects (data points, in particular) drawn together by the criteria of indistinguishability, similarity or functionality [22]. Information granulation is a process of extracting meaningful concepts (information granules) from large collections of numeric data. It comes as an inherent activity of human beings carried out with the purpose of a better comprehension of the underlying problem. The most commonly used formalisms of information granulation include sets, fuzzy sets, rough sets, random sets, and probabilistic granules. Here we are concerned with fuzzy sets formed with the aid of fuzzy c-means (FCM) [17,23,24]. We use FCM clustering to determine a structure in the output space and afterwards exploit the specialized fuzzy clustering, referred to as C-FCM, in the input space, making sure that the resulting information granules formed there are directly associated with the clusters already constructed in the output space.
2.1.1. Information granulation of output space using fuzzy c-means clustering

In this section, we are concerned with FCM clustering. FCM is one of the most widely used clustering methods for data preprocessing; it analyzes the features of the given data by splitting the data more efficiently than traditional preprocessing methods, and it attempts to find the most representative data point in each cluster [17]. The FCM clustering partitions a collection of output data, that is {target_k}, k = 1, 2, …, N, into P clusters, where P stands for the number of contexts. In order to form the membership (partition) matrix T, the FCM clustering algorithm is invoked as outlined below.

[Step 1] Determine the number of contexts and the membership (partition) matrix T^(r), which is initialized in a random fashion:

\[ T^{(r)} = \left\{ t_{ik} \in [0,1] \;\middle|\; \sum_{i=1}^{P} t_{ik} = 1 \ \forall k, \quad 0 < \sum_{k=1}^{N} t_{ik} < N \ \forall i \right\} \]  (1)
[Step 2] Calculate the centers (prototypes) and the partition matrix for each context based on the values of t_ik resulting from Step 1:

\[ v_i^{(r)} = \frac{\sum_{k=1}^{N} (t_{ik})^m \, x_k}{\sum_{k=1}^{N} (t_{ik})^m}, \quad i = 1, 2, \ldots, P \]  (2)
Fig. 1. Architecture of the granular-oriented self-organizing HFPNN.
\[ t_{ik} = \left[ \sum_{j=1}^{P} \left( \frac{\|x_k - v_i^{(r)}\|}{\|x_k - v_j^{(r)}\|} \right)^{2/(m-1)} \right]^{-1} \]  (3)

where m > 1 is a fuzzification coefficient (fuzzification factor) and v_i^(r) denotes the center vector in the output space. In this paper, we consider m = 2.0, which is a commonly used value.

[Step 3] Update the partition matrix. These modifications (updates) are based on the Euclidean distance determined between the data points and the center of each context:

\[ d_{ik} = d(x_k, v_i^{(r)}) = \left[ \sum_{q} (x_{kq} - v_{iq}^{(r)})^2 \right]^{1/2} \]  (4)

[Step 4] Check the satisfaction of the termination criterion. If the condition

\[ \| T^{(r+1)} - T^{(r)} \| \le \varepsilon \ \text{(tolerance level)} \]  (5)

is satisfied, then stop; otherwise set r = r + 1 and return to [Step 2].

2.1.2. Information granulation of input space using context-based fuzzy c-means clustering

The context-based clustering supporting the design of information granules is completed in the space of the input data, while the development of the clusters is guided by a collection of some predefined fuzzy sets (so-called contexts) expressed in the output space [10–15]. This means that we first decide upon some granulation of the output variable and afterwards produce information granules induced by the successive fuzzy sets already formed for the output variable. By taking the contexts into account, the clustering in the input space becomes navigated (focused) by the predefined fuzzy sets of contexts. This helps reveal the relationships between the regions of the input space and the output space. We use FCM clustering to form the contexts in the output space, as described in Section 2.1.1. We then cluster the input data x_k (k = 1, 2, …, N) into c clusters using the C-FCM clustering algorithm. Let us introduce a family of partition matrices induced by the j-th context and denote it by U(T_j):

\[ U(T_j) = \left\{ u_{ik} \in [0,1] \;\middle|\; \sum_{i=1}^{c} u_{ik} = t_{jk} \ \forall k, \quad 0 < \sum_{k=1}^{N} u_{ik} < N \ \forall i \right\} \]  (6)

Recall that c is the number of clusters in the input data and t_jk denotes the membership value of the k-th datum to the j-th context, obtained by FCM clustering in the output space. The objective function of the C-FCM clustering is defined as follows:

\[ V = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^m \, \|x_k - v_i\|^2 \]  (7)

where m (m > 1) and v_i denote a fuzzification coefficient and a prototype located in the input space, respectively. The minimization of the performance index V is realized under the constraints expressed by (6), so in essence we end up with P separate clustering tasks implied by the corresponding contexts. Briefly speaking, we have

\[ \min V \ \text{subject to} \ U(T_j), \quad j = 1, 2, \ldots, P \]  (8)

The minimization of V, as completed by the C-FCM clustering, is realized by iteratively updating the values of the partition matrix and the centers. The update of the partition matrix is completed as follows:

\[ u_{ik} = \frac{t_{jk}}{\sum_{q=1}^{c} \left( \|x_k - v_i\| / \|x_k - v_q\| \right)^{2/(m-1)}}, \quad i = 1, 2, \ldots, c; \ k = 1, 2, \ldots, N \]  (9)

Here, in contrast with the update formula of the generic FCM algorithm, the numerator is not 1.0 but the context membership t_jk (taken from the j-th row of the partition matrix obtained as described in Section 2.1.1), while x_k is the vector of the selected input variables. This exhibits the relationships between the regions of the input space and the output space, as the input space is partitioned based on the contexts for information granulation.
The values of the centers (v_1, v_2, …, v_c) are calculated in the following form:

\[ v_i = \frac{\sum_{k=1}^{N} u_{ik}^m \, x_k}{\sum_{k=1}^{N} u_{ik}^m}, \quad i = 1, 2, \ldots, c \]  (10)
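To make the two-stage granulation of Sections 2.1.1 and 2.1.2 concrete, the following Python sketch implements generic FCM on the (one-dimensional) output data to obtain the contexts (Eqs. (1)–(5)) and then context-based FCM in the input space (Eqs. (6), (9) and (10)). This is a minimal illustration under the notation of the text, not the authors' implementation; the synthetic data at the bottom are purely hypothetical.

```python
import numpy as np

def fcm(data, P, m=2.0, eps=1e-6, max_iter=100):
    """Generic FCM (Eqs. (1)-(5)); data has shape (N, d)."""
    T = np.random.rand(P, data.shape[0])
    T /= T.sum(axis=0)                                   # columns sum to 1 (Eq. (1))
    for _ in range(max_iter):
        w = T ** m
        v = (w @ data) / w.sum(axis=1, keepdims=True)    # prototypes (Eq. (2))
        d = np.linalg.norm(data[None, :, :] - v[:, None, :], axis=2) + 1e-12
        T_new = 1.0 / d ** (2.0 / (m - 1.0))
        T_new /= T_new.sum(axis=0)                       # membership update (Eq. (3))
        if np.abs(T_new - T).max() <= eps:               # termination (Eq. (5))
            break
        T = T_new
    return T_new, v

def context_fcm(X, t_j, c, m=2.0, eps=1e-6, max_iter=100):
    """C-FCM for the j-th context; t_j is the j-th row of T (Eqs. (6), (9), (10))."""
    U = np.random.rand(c, X.shape[0])
    U = U / U.sum(axis=0) * t_j                          # enforce sum_i u_ik = t_jk (Eq. (6))
    for _ in range(max_iter):
        w = U ** m
        v = (w @ X) / w.sum(axis=1, keepdims=True)       # prototypes in input space (Eq. (10))
        d = np.linalg.norm(X[None, :, :] - v[:, None, :], axis=2) + 1e-12
        U_new = 1.0 / d ** (2.0 / (m - 1.0))
        U_new = U_new / U_new.sum(axis=0) * t_j          # context-guided update (Eq. (9))
        if np.abs(U_new - U).max() <= eps:
            break
        U = U_new
    return U_new, v

# Hypothetical usage: P contexts in the output space, then c clusters per context.
X = np.random.rand(200, 4)                               # synthetic inputs (N = 200, 4 variables)
y = np.random.rand(200, 1)                               # synthetic output
T, _ = fcm(y, P=3)
granules = [context_fcm(X, T[j], c=2) for j in range(3)]
```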
2.2. Context-based polynomial neuron (CPN)

The network of the proposed CPN dwells on the concept of C-FCM. The fuzzy partition formed for the selected variables gives rise to the topology visualized in Fig. 2. In particular, the structure of the CPN shown in Fig. 2 comes with two variables selected out of four input variables, two contexts, and two clusters for each context. Alluding to the terminology of fuzzy rule-based systems, if the polynomial of the consequent part of the fuzzy rules is linear (1st
Fig. 2. A general architecture of the CPN based HFPNN.
Fig. 3. A general architecture of the PN based HFPNN.
order polynomial), this structure translates into the following collection of rules:

\[ R_i: \ \text{If } x \text{ is in the } p\text{-th context and the } c\text{-th cluster, then } \hat{z}_i = a_0 + \mathbf{a}_i^T x_i \]  (11)

where ẑ_i is the i-th local model (i = 1, 2, …, Pc), x is the selected input set of the p-th context, and R_i stands for the i-th rule. When taking into account all rules (as represented by the network shown in Fig. 2), we arrive at the expression governing the output of the CPN:

\[ \hat{z}(x) = \sum_{i=1}^{c} u_i(x)\,(a_0 + \mathbf{a}_i^T x_1) + \sum_{i=c+1}^{2c} u_i(x)\,(a_0 + \mathbf{a}_i^T x_2) + \cdots + \sum_{i=(p-1)c+1}^{pc} u_i(x)\,(a_0 + \mathbf{a}_i^T x_p) \]  (12)
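As a small illustration of Eq. (12), the output of a CPN can be computed as a membership-weighted sum of local linear models. The sketch below assumes the activation levels u_i(x) (obtained from the C-FCM memberships) and the coefficient vectors are already available; the helper name and the values are hypothetical, not from the paper.

```python
import numpy as np

def cpn_output(x, u, coeffs):
    """Weighted aggregation of local linear models (Eq. (12)).

    u      : activation levels u_i(x), one per rule (length P*c)
    coeffs : array of shape (P*c, d+1); row i holds [a_i0, a_i1, ..., a_id]
    """
    x1 = np.concatenate(([1.0], x))      # prepend the bias term
    local = coeffs @ x1                  # z_i = a_i0 + a_i^T x for every rule
    return u @ local                     # sum_i u_i(x) * z_i

# Hypothetical example: P*c = 4 rules over a 3-variable input.
u = np.array([0.1, 0.4, 0.3, 0.2])
A = np.random.rand(4, 4)
print(cpn_output(np.array([0.5, 0.2, 0.9]), u, A))
```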
The performance index of the CPN to be investigated comes in the form of the root mean square error (RMSE):

\[ \text{RMSE} = \sqrt{ \frac{1}{N} \sum_{k=1}^{N} \left( \text{target}_k - \hat{z}_k \right)^2 } \]  (13)

where target_k is the k-th original output datum and ẑ_k stands for the k-th actual output of the network for this specific data point. In addition, standard least square error (LSE) estimation is used to determine the coefficients of each of the local models. As shown in Figs. 1 and 2, ẑ_i obtained by (12) is treated simultaneously as an output of a CPN node in the first layer and an input of a PN node in the second layer.

2.3. Polynomial neuron (PN)

As underlined, the PN-based layers of HFPNN are based on the GMDH method and utilize a class of polynomials (linear, quadratic, modified quadratic, etc.) to describe the basic processing realized there. By choosing the most significant input variables and an order of the polynomial among the various available forms, we can form the best architecture. Fig. 3 illustrates the basic form of a PN formed by two inputs (z_p, z_q) and the quadratic form of the polynomial. The PN is the basic processing unit forming the second and higher layers of the HFPNN. When the selected inputs come into a PN node, the node selects one of the three types of polynomials shown in Table 1 and constructs a partial description [6–9]. The coefficients (a_i) of the polynomial are optimized through the minimization of (13).
Table 1
Types of polynomial equations.

Order  Type                    Polynomial equation
1      L: Linear               f(x) = a0 + a1 xi + a2 xj
2      Q: Quadratic            f(x) = a0 + a1 xi + a2 xj + a3 xi xj + a4 xi^2 + a5 xj^2
2      M: Modified quadratic   f(x) = a0 + a1 xi + a2 xj + a3 xi xj
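The three partial descriptions of Table 1 differ only in the regressor vector they use; a minimal sketch (the helper name is ours, not from the paper) follows.

```python
import numpy as np

def regressors(xi, xj, poly_type):
    """Regressor vector for the polynomial types of Table 1."""
    if poly_type == "L":        # linear
        return np.array([1.0, xi, xj])
    if poly_type == "M":        # modified quadratic (cross-term, no pure squares)
        return np.array([1.0, xi, xj, xi * xj])
    if poly_type == "Q":        # full quadratic
        return np.array([1.0, xi, xj, xi * xj, xi ** 2, xj ** 2])
    raise ValueError(f"unknown polynomial type: {poly_type}")
```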
The optimization is realized by selecting nodes at each layer and eventually generating additional layers until the best performance has been reached. Such a methodology leads to an optimal PN-based layer of the HFPNN.

2.4. Characteristics of the architecture and the ensuing learning technique

The proposed model is focused on improving performance as well as simplifying the architecture in comparison with the previously proposed hybrid models based on GMDH or PNN. There are a number of evidently distinct features of the proposed HFPNN, enumerated as follows:

(a) Differences in terms of the architecture and the underlying design methodology. The first layer of the proposed model (HFPNN) consists of context-based polynomial neurons (CPNs), while the second and higher layers are composed of polynomial neurons (PNs); in contrast, all layers of the recently proposed RBF-based PNN (RBFPNN) are made up of radial basis polynomial neurons (RPNs) [27]. CPNs are based on context-based FCM, while RPNs are based on FCM. The PNs of the second and higher layers are designed based on the three types of polynomials. However, in all layers of the conventional RBF-based PNNs, a number of parameters have to be optimized within the RPNs, such as the fuzzification coefficient, the number of clusters and the order of the polynomial, because all layers of the network consist of RPNs. Consequently, the proposed model exhibits a simpler and more compact architecture.

(b) Differences in terms of the learning techniques used. The learning of both models is carried out by least square estimation (LSE). In the proposed model, LSE-based learning is used within the CPNs of the first layer, designed as C-FCM-based RBFNNs, and within the PNs of the second and higher layers, while in the RBFPNN it is used within the RPNs of all layers, designed as FCM-based RBFNNs. In the case of the CPNs of the first layer, each node is based on the context-based fuzzy c-means clustering algorithm supporting the design of information granules instead of receptive fields (RBFs). C-FCM operates in the space of the input variables (input space) and requires some predefined fuzzy sets (so-called contexts) defined in the output space. In general, the contexts are defined by using the K-means or FCM clustering methods in the output space. In the case of the RPNs of all layers, designed as FCM-based RBFNNs, the FCM algorithm is used only in the space of the input data and the partition matrix of FCM is directly used to produce the levels of activation of the receptive fields (RBFs).

(c) Advantages in terms of the architecture and the performance of the proposed model.
Table 2
Generic parameters used in the construction of the HFPNN.

Item                                     Criterion
Stopping condition                       Comparison of the fitness of optimal nodes; maximal number of layers = 3
Maximal number of inputs for a node      2–4
Number of nodes (W) forming a layer      30
CPN: number of contexts (P)              2–5
CPN: number of clusters (c)              2–5
CPN: order of polynomial                 L
PN: order of polynomial                  L, Q, M
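For orientation, the design parameters of Table 2 amount to the following search space; the dictionary below is a hypothetical configuration of our own, not code from the paper.

```python
# Hypothetical configuration mirroring Table 2.
HFPNN_CONFIG = {
    "max_layers": 3,                      # stopping condition: maximal number of layers
    "inputs_per_node": (2, 3, 4),         # maximal number of inputs for a node
    "nodes_per_layer_W": 30,              # number of preserved nodes per layer
    "cpn": {"contexts_P": range(2, 6),    # 2-5 contexts
            "clusters_c": range(2, 6),    # 2-5 clusters per context
            "poly": ("L",)},              # CPNs use linear local models
    "pn": {"poly": ("L", "Q", "M")},      # PNs may use any of the three types
}
```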
When designing the architecture of the conventional RBFPNN, the structural and parametric factors (such as the number of clusters and the fuzzification coefficient of FCM, the number of input variables and the order of the polynomial equation) have to be taken into account in all nodes of each layer. On the other hand, because the second and higher layers of the proposed HFPNN are composed of polynomial equations, the structural complexity of the model is considerably decreased and, at the same time, the performance of the proposed HFPNN is better than that of the RBFPNN. Moreover, in the proposed model the over-fitting problem when dealing with testing datasets can be alleviated (as evidenced by the performance results compared with those produced by the conventional model).

Fig. 4. An overall flowchart of the HFPNN algorithm.
3. The design of the granular-oriented self-organizing HFPNN

In this section, we elaborate on the algorithmic details of the design method by considering the functionality of FCM, C-FCM and PNN in the architecture of HFPNN, in order to process information granulation effectively for the characteristics of the given data. The methodology behind the design of the previous PNN [6–9] resulted in a more advanced GMDH model, whereas the proposed methodology, applying the FCM and C-FCM clustering methods, gives rise to a new concept of polynomial neural network. The design procedure of HFPNN with CPNs and PNs comprises the following steps.

[Step 1] Determine the system's input variables. Define the system's input variables x1, x2, …, xn related to the output variable y. If required, normalization of the input data can be completed.

[Step 2] Form the training and testing data sets. The input–output data set (xi, yi) = (x1i, x2i, …, xni, yi), i = 1, 2, …, N (with N being the total number of data points), is divided into training and testing data sets. Denote their sizes by NPI and NEPI, respectively; obviously N = NPI + NEPI. The training data set is used to construct the HFPNN; the testing data set is then used to evaluate the quality of the network.

[Step 3] Decide upon the basic parameters used in the construction of the HFPNN. Here we decide upon the values of the essential design parameters of HFPNN; the values of the parameters are presented in Table 2.

[Step 4] Choose a structure of the HFPNN. We determine the number of input variables of a node out of the n input variables x1, x2, …, xn, the number of contexts and clusters of the CPNs in the first layer, and the orders of the polynomials of the CPNs and PNs; for details refer to Table 2. For the selection of r inputs, the number of nodes to be generated in each layer becomes k = n!/((n−r)!·r!), where n is the total number of inputs and r stands for the number of chosen input variables. For instance, in the case of a PN with 2 inputs and a quadratic polynomial, we have k partial descriptions;
refer also to Fig. 3 and Table 1:

\[ \hat{z}_k = a_0 + a_1 x_p + a_2 x_q + a_3 x_p x_q + a_4 x_p^2 + a_5 x_q^2, \quad k = 1, 2, \ldots, \frac{n!}{(n-2)!\,2!} \]  (14)
The choice of the essential design parameters of HFPNN shown in Table 2 helps us choose the best model with regard to the characteristics of the data, the model design strategy, nonlinearity and predictive abilities.

[Step 5] Estimate the coefficients of the polynomial in a node. Using the training data subset, the coefficients used in (12) and (14) are derived by minimizing (13) with the standard least squares estimation method. Evidently, the optimal values of the coefficients of each node in the layer are expressed in the form

\[ A = (X^T X)^{-1} X^T Y \]  (15)

where Y = [y1, y2, …, yNPI]^T, X = [x1, x2, …, xNPI]^T, and A = [a0, a1, …, a5]^T.

[Step 6] Select the nodes with the best predictive capability and construct the corresponding layer. Each node (CPN or PN) whose polynomial coefficients (a0, a1, …, a5) were estimated using the training data subset is evaluated by computing an identification error (EPI) on the testing data set. Then we compare the EPIs and choose the several nodes that give the best predictive performance (the lowest EPI). Namely, the nodes obtained on the basis of the calculated performance (EPI1, EPI2, …, EPI_{n!/((n−r)!r!)}) are rearranged in ascending order, and we choose the nodes characterized by the best performance, starting from the lowest EPI. Here we use a predefined number, W, of nodes with better predictive capability that are preserved for the next iteration of the HFPNN. The number of preserved nodes in each layer is governed by the following selection rule:

(i) If the number of created nodes satisfies n!/((n−r)!r!) > W, then the number of nodes retained for the next layer is equal to W. Namely, W of all generated nodes are selected according to the lowest values of EPI and the remaining (n!/((n−r)!r!)) − W nodes are discarded.
(ii) If n!/((n−r)!r!) ≤ W, then the number of retained nodes for the next layer is equal to n!/((n−r)!r!).

Conditions (i)–(ii) effectively reduce the large number of nodes and avoid a large amount of time-consuming iteration over layers.
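Steps 4–6 can be summarized in a short sketch: enumerate the input combinations, estimate the coefficients by LSE (Eq. (15)), score each candidate node by its testing error (Eq. (13)), and keep the W best ones. This is a simplified illustration (quadratic PNs with two inputs only), not the authors' code.

```python
import numpy as np
from itertools import combinations

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))              # Eq. (13)

def design_matrix(Z2):
    """Quadratic partial description for two inputs (Eq. (14))."""
    xp, xq = Z2[:, 0], Z2[:, 1]
    return np.column_stack([np.ones_like(xp), xp, xq, xp * xq, xp ** 2, xq ** 2])

def build_layer(Z_tr, y_tr, Z_te, y_te, r=2, W=30):
    """Steps 4-6: generate candidate nodes, fit by LSE, retain the W best."""
    candidates = []
    for idx in combinations(range(Z_tr.shape[1]), r):      # k = n!/((n-r)! r!) nodes
        X_tr, X_te = design_matrix(Z_tr[:, idx]), design_matrix(Z_te[:, idx])
        A, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)    # Eq. (15)
        epi = rmse(y_te, X_te @ A)                         # identification error on testing data
        candidates.append((epi, idx, A))
    candidates.sort(key=lambda t: t[0])                    # ascending order of EPI
    return candidates[:W]                                  # rules (i)/(ii): keep at most W nodes
```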
Table 3
Performance index of HFPNN for the MPG data (each cell lists Ty, PI, EPI for the given number of contexts P; values are mean ± standard deviation of RMSE).

SI  c |  P = 2                             |  P = 3                             |  P = 4                             |  P = 5
2   2 |  Q  2.3735 ± 0.177  2.4459 ± 0.296 |  Q  1.8340 ± 0.190  2.0403 ± 0.413 |  Q  1.5385 ± 0.198  1.6741 ± 0.400 |  Q  1.0454 ± 0.177  1.4029 ± 0.234
2   3 |  Q  2.1994 ± 0.159  2.5793 ± 0.148 |  L  1.7413 ± 0.147  1.9499 ± 0.271 |  Q  1.3517 ± 0.204  1.7351 ± 0.302 |  M  1.0223 ± 0.145  1.4853 ± 0.226
2   4 |  M  2.2801 ± 0.145  2.4598 ± 0.322 |  Q  1.5906 ± 0.237  2.0274 ± 0.267 |  L  1.3235 ± 0.172  1.6632 ± 0.237 |  L  0.9102 ± 0.093  1.7650 ± 0.285
2   5 |  L  2.0699 ± 0.154  2.6373 ± 0.384 |  L  1.5760 ± 0.086  2.0380 ± 0.269 |  Q  1.2289 ± 0.108  1.7846 ± 0.208 |  M  0.8603 ± 0.157  1.7880 ± 0.331
3   2 |  Q  2.0976 ± 0.197  2.1922 ± 0.268 |  Q  1.6437 ± 0.127  1.7596 ± 0.280 |  M  1.3946 ± 0.127  1.4404 ± 0.263 |  M  0.9411 ± 0.135  1.1806 ± 0.233
3   3 |  M  2.0879 ± 0.159  2.2048 ± 0.185 |  Q  1.5282 ± 0.158  1.8895 ± 0.336 |  M  1.2068 ± 0.128  1.6449 ± 0.213 |  M  0.8777 ± 0.167  1.2271 ± 0.232
3   4 |  Q  1.9054 ± 0.190  2.2922 ± 0.272 |  Q  1.3368 ± 0.156  1.9597 ± 0.286 |  L  1.0823 ± 0.189  1.6938 ± 0.299 |  M  0.7284 ± 0.099  1.4591 ± 0.316
3   5 |  Q  1.8914 ± 0.123  2.3196 ± 0.144 |  M  1.2781 ± 0.110  2.0754 ± 0.156 |  M  1.0694 ± 0.153  1.8778 ± 0.307 |  M  0.5809 ± 0.100  1.8754 ± 0.575
4   2 |  Q  2.0324 ± 0.144  2.1188 ± 0.119 |  Q  1.5194 ± 0.163  1.8831 ± 0.193 |  Q  1.2565 ± 0.134  1.4657 ± 0.246 |  M  0.9547 ± 0.117  1.1270 ± 0.163
4   3 |  Q  1.8432 ± 0.138  2.3727 ± 0.256 |  M  1.5430 ± 0.149  1.7710 ± 0.255 |  M  1.0470 ± 0.143  1.6721 ± 0.295 |  L  0.8118 ± 0.141  1.7360 ± 0.822
4   4 |  M  1.8826 ± 0.075  2.2298 ± 0.126 |  L  1.3660 ± 0.155  1.8847 ± 0.239 |  L  1.0692 ± 0.147  2.0713 ± 0.404 |  L  0.6132 ± 0.074  2.0297 ± 0.555
4   5 |  M  1.7540 ± 0.129  2.2967 ± 0.281 |  M  1.1632 ± 0.171  2.1870 ± 0.284 |  L  0.8691 ± 0.137  2.0775 ± 0.331 |  L  0.4898 ± 0.054  2.8165 ± 0.972
[Step 7] Check the stopping criterion. The termination condition that controls the growth of the model consists of two components: the performance index (represented in terms of a comparison of the fitness of optimal nodes) and the size of the network (expressed in terms of the maximal number of layers). As far as the performance index is concerned (capturing the aspect of accuracy), the termination condition is straightforward and comes in the form

\[ F_1 \ge F^* \]  (16)

where F_1 denotes the minimal error (fitness) occurring at the current layer, whereas F^* stands for the performance of the optimal node with the minimal error that occurred at the previous layer. As far as the depth of the network is concerned, the generation process is stopped at a depth of no more than three layers. This size of the network has been experimentally found to build a sound compromise between the high accuracy of the resulting model and its complexity as well as generalization abilities.

[Step 8] Determine the new input variables for the next layer. If the stopping criterion has not been satisfied, the model is expanded. The outputs of the preserved nodes (z1i, z2i, …, zWi) serve as the new inputs of the next layer (x1j, x2j, …, xWj), namely z1i = x1j, z2i = x2j, …, zWi = xWj, where j = i + 1. The HFPNN algorithm is carried out by repeating Steps 4–8. Once the stopping criterion has been met, for the polynomial of the node with the optimal performance F^* in the last layer, the output equation of the corresponding node in the previous layer is substituted into the input of the optimal node. After the same operation is repeated back to the first layer, the final model ŷ results. An overall design flowchart for the proposed HFPNN architecture is shown in Fig. 4.
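Continuing the sketch from Step 6 (with build_layer and design_matrix as defined there), the growth and stopping logic of Steps 7–8 could look as follows; again, this is a hedged illustration rather than the authors' implementation.

```python
import numpy as np

def grow_network(Z_tr, y_tr, Z_te, y_te, max_layers=3, W=30):
    """Steps 7-8: grow layers until Eq. (16) fires or the maximal depth is reached."""
    best_prev = np.inf                                     # F* of the previous layer
    for _ in range(max_layers):
        nodes = build_layer(Z_tr, y_tr, Z_te, y_te, W=W)   # from the Step 6 sketch
        best_curr = nodes[0][0]                            # F1: minimal EPI at this layer
        if best_curr >= best_prev:                         # F1 >= F* (Eq. (16)): stop growing
            break
        best_prev = best_curr
        # Step 8: outputs of the preserved nodes become the inputs of the next layer.
        Z_tr = np.column_stack([design_matrix(Z_tr[:, idx]) @ A for _, idx, A in nodes])
        Z_te = np.column_stack([design_matrix(Z_te[:, idx]) @ A for _, idx, A in nodes])
    return best_prev
```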
Fig. 5. Performance index of the proposed model versus the increasing number of clusters: (a) number of selected input variables = 2; (b) number of selected input variables = 3; (c) number of selected input variables = 4.
4. Experimental studies

The proposed architecture of the network, its development, and the resulting performance are illustrated with the aid of a series of numeric experiments. This series of experiments is concerned with selected data sets from the Machine Learning Repository, University of California at Irvine [21]. The first is the automobile miles per gallon (MPG) data, the second deals with the Boston housing data, and the last is the medical imaging system (MIS) data. In all the experiments, the results are reported for ten repetitions of random sub-sampling validation, where we randomly divide each data set into training (60%) and testing (40%) parts. To come up with a quantitative evaluation of the resulting HFPNN, we used the standard RMSE performance index as expressed by (13).
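The evaluation protocol used throughout this section (ten repetitions of a random 60/40 split, with PI and EPI reported as mean ± standard deviation of the RMSE) can be sketched as follows; model_fit is a hypothetical callable standing in for the HFPNN construction.

```python
import numpy as np

def evaluate(model_fit, X, y, repeats=10, train_frac=0.6, seed=0):
    """Repeated random sub-sampling: mean and std of training (PI) and testing (EPI) RMSE."""
    rng = np.random.default_rng(seed)
    pis, epis = [], []
    for _ in range(repeats):
        perm = rng.permutation(len(y))
        n_tr = int(train_frac * len(y))
        tr, te = perm[:n_tr], perm[n_tr:]
        pi, epi = model_fit(X[tr], y[tr], X[te], y[te])    # returns (PI, EPI) RMSE values
        pis.append(pi); epis.append(epi)
    return (np.mean(pis), np.std(pis)), (np.mean(epis), np.std(epis))
```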
4.1. Automobile miles per gallon (MPG)
Fig. 7. HFPNN architecture (SI = 4, P = 5, c = 2, PI = 0.7895, EPI = 0.9991).
We use the well-known automobile MPG data, with the output being the automobile's fuel consumption expressed in miles per gallon. Here we consider 392 input–output pairs (after removing incomplete data points). The number of input variables is seven, namely cylinders, displacement, horsepower, weight, acceleration, model year, and origin. Table 3 summarizes the performance of HFPNN according to the settings of Table 2. Here SI, c, P and Ty denote the number of selected input variables, the number of clusters, the number of contexts and the type of polynomial of a neuron, respectively. PI is the performance index for the training data set, and EPI concerns the performance for the testing data set. The performance is evaluated in terms of the average and standard deviation of the RMSE determined over ten repetitions of random sub-sampling validation, dividing the 392 data points into 235 training data (60%) and 157 testing data (40%). The results represent the performance of the best model with an optimal combination of the input variables and the type of polynomial, obtained at the output of the last (third) layer. As indicated in Table 3, there is a clearly visible tendency for the values of the performance index to get lower with the increase of the number of contexts; namely, the best performance emerges when the model has 5 contexts in this study. This effect is not surprising, as too many information granules (leading to a higher level of information granularity) might contribute to a potential memorization effect. In contrast, the error of the model is reduced by decreasing the number of clusters for each context. This means that the proposed model with a simple structure of the CPNs forming the first layer of HFPNN performs better than the model with a complex structure.
When the number of selected inputs is equal to two, the optimal performance (quantified as PI = 1.0454 ± 0.177 and EPI = 1.4029 ± 0.234) is reported for the network with five contexts, two clusters per each context and the Q type of polynomial. The lowest value among the ten evaluations of the optimal network is PI = 0.8035 and EPI = 1.0459. In the case of three inputs, the best network, with five contexts, three clusters per each context and the M type of polynomial, has PI = 0.8777 ± 0.167 and EPI = 1.2271 ± 0.232 (the best of the ten runs is quantified by PI = 0.6972 and EPI = 0.9462). For SI = 4, the preferred network emerges as the one with five contexts, two clusters per each context and the M type of polynomial, with PI = 0.9547 ± 0.117 and EPI = 1.1270 ± 0.163; among the ten runs, the best performance comes with PI = 0.7895 and EPI = 0.9991. Fig. 5 displays the values of the performance index (both its mean value and standard deviation) vis-à-vis the number of clusters at 5 contexts, as listed in Table 3. As shown in this figure, the value of the training error improves with the increase of the number of clusters, whereas the testing error becomes higher. The scatter plots shown in Fig. 6 visualize the performance of the constructed networks. The selected network with 4 inputs, 5 contexts, 2 clusters per each context and the M type of polynomial comes with PI = 0.7895 and EPI = 0.9991 for the training and testing data, respectively. Fig. 7 illustrates the detailed topology of the optimal third-layer HFPNN obtained with 4 inputs, 5 contexts, 2 clusters per each context and the M type of polynomial, as in Fig. 6. Table 4 covers the results of a comparative analysis of the performance of the proposed network versus some other models.
Fig. 6. Scatter plots showing the model output versus the original output (SI = 4, P = 5, c = 2, PI = 0.7895, EPI = 0.9991).
In the RBFNN, the hidden layer consists of receptive fields and the optimal values of the connections (weights) are estimated using the least squares estimation (LSE) method. In the case of the RBFNN realized with context-free clustering, the FCM algorithm is used as context-free clustering in the hidden layer, and the connections are estimated in the same manner as before. The linguistic model, the incremental model, and the functional RBFNN are based on the use of the context-based FCM algorithm. The linguistic model uses an iterative descent method to learn the parameters, while the incremental model and the functional RBFNN exploit LSE to estimate the values of the coefficients or the connections. In the case of the HFSPNN, a fuzzy model combined with information granulation is included within the PNs, and the PNN architecture of the network is optimized using a genetic algorithm based on the extended group method of data handling (GMDH) algorithm. The experimental results reveal that the proposed network outperforms the existing models in terms of both its approximation and generalization abilities.
4.2. Boston housing price (BHP)

The Boston house price data were first published by D. Harrison and D.L. Rubinfeld in their paper "Hedonic prices and the demand for clean air" [25]. The dataset deals with house prices and consists of 506 observations with 13 independent variables and a single output variable. For the BHP dataset, the optimal performance of the proposed HFPNN is summarized in Table 5. As mentioned previously, the 506 data points are partitioned randomly into 304 training data (60%) and 202 testing data (40%). As before, the performance index of the proposed model was computed over ten repetitions of the experiment. As shown by the results, the tendency concerning the number of contexts and clusters is similar to the one observed for the MPG data. When considering a sound balance between the approximation and generalization capabilities of the proposed model, the best performance is reported for 5 contexts.
Table 4
Comparative performance analysis of various models for MPG.

Model (configuration)                       PI                EPI               Index
RBFNN [12]                                  3.24 ± 0.24       3.62 ± 0.31       RMSE
RBFNN with context-free clustering [12]     3.21 ± 0.21       3.51 ± 0.27       RMSE
Linguistic modeling [12]
  Without optimization                      3.78 ± 1.52       4.22 ± 1.22       RMSE
  One-loop optimization                     2.90 ± 0.32       3.17 ± 1.01       RMSE
  Multi-step optimization                   2.86 ± 0.83       3.14 ± 0.98       RMSE
Incremental model [13]
  Linear regression                         3.383 ± 0.194     3.472 ± 0.295     RMSE
  Polynomial (2nd order)                    2.807 ± 0.122     2.972 ± 0.196     RMSE
  Incremental model                         2.390 ± 0.142     3.060 ± 0.285     RMSE
Functional RBFNN [14]
  m = 2.0                                   2.905 ± 0.441     12.024 ± 1.897    MSE
  m = 2.05                                  2.369 ± 0.339     13.841 ± 1.452    MSE
  m = 2.75                                  5.804 ± 0.394     7.936 ± 1.100     MSE
HFSPNN [8] (3rd layer)
  Triangular MF                             1.962 ± 0.214     2.396 ± 0.180     RMSE
  Gaussian MF                               1.886 ± 0.133     2.479 ± 0.192     RMSE
PSO-based RBFPNNs [27]
  Case I                                    1.185 ± 0.181     2.682 ± 0.122     RMSE
Proposed model (3rd layer)
  SI = 2                                    1.0454 ± 0.1774   1.4029 ± 0.2346   RMSE
  SI = 3                                    0.8777 ± 0.1677   1.2271 ± 0.2322   RMSE
  SI = 4                                    0.9547 ± 0.1170   1.1270 ± 0.1634   RMSE
Table 5
Performance index of HFPNN for the BHP data (each cell lists Ty, PI, EPI for the given number of contexts P; values are mean ± standard deviation of RMSE).

SI  c |  P = 2                             |  P = 3                             |  P = 4                             |  P = 5
2   2 |  Q  2.9357 ± 0.179  2.9864 ± 0.211 |  Q  2.1053 ± 0.261  2.1887 ± 0.417 |  Q  1.5546 ± 0.065  1.6915 ± 0.060 |  Q  1.3055 ± 0.081  1.4096 ± 0.127
2   3 |  Q  2.6403 ± 0.249  3.0044 ± 0.614 |  Q  1.9413 ± 0.131  2.2476 ± 0.203 |  Q  1.4015 ± 0.156  1.6796 ± 0.182 |  Q  1.2509 ± 0.113  1.4628 ± 0.193
2   4 |  Q  2.6553 ± 0.298  3.0506 ± 0.204 |  Q  1.8171 ± 0.252  2.2316 ± 0.259 |  Q  1.2886 ± 0.185  1.9430 ± 0.278 |  Q  1.1128 ± 0.161  1.7463 ± 0.238
2   5 |  Q  2.3951 ± 0.219  2.9944 ± 0.249 |  Q  1.6974 ± 0.096  2.2605 ± 0.175 |  Q  1.2076 ± 0.164  2.1532 ± 0.469 |  Q  0.9050 ± 0.181  2.1147 ± 0.452
3   2 |  Q  2.5786 ± 0.235  2.5509 ± 0.115 |  Q  1.8483 ± 0.101  1.9857 ± 0.100 |  Q  1.4414 ± 0.097  1.5249 ± 0.174 |  M  1.2680 ± 0.050  1.3433 ± 0.080
3   3 |  Q  2.2588 ± 0.110  2.5652 ± 0.197 |  Q  1.7301 ± 0.154  2.0241 ± 0.197 |  M  1.3717 ± 0.142  1.6217 ± 0.147 |  L  1.1772 ± 0.073  1.4033 ± 0.243
3   4 |  Q  2.0700 ± 0.134  2.5602 ± 0.246 |  Q  1.6119 ± 0.155  2.0700 ± 0.245 |  M  1.1419 ± 0.116  1.7422 ± 0.276 |  M  0.9546 ± 0.132  1.5690 ± 0.176
3   5 |  Q  2.0030 ± 0.104  2.5296 ± 0.073 |  M  1.5080 ± 0.175  2.1043 ± 0.252 |  M  0.9620 ± 0.112  1.9916 ± 0.453 |  M  0.6985 ± 0.056  2.0965 ± 0.634
4   2 |  M  2.3779 ± 0.147  2.4825 ± 0.181 |  Q  1.7397 ± 0.109  1.8777 ± 0.210 |  M  1.3663 ± 0.070  1.5025 ± 0.127 |  M  1.1291 ± 0.066  1.3276 ± 0.203
4   3 |  M  2.0739 ± 0.171  2.3460 ± 0.127 |  M  1.6635 ± 0.104  1.8313 ± 0.128 |  Q  1.1557 ± 0.133  1.7482 ± 0.222 |  M  0.9738 ± 0.078  1.3827 ± 0.159
4   4 |  M  1.8909 ± 0.170  2.4098 ± 0.144 |  M  1.4144 ± 0.097  1.9425 ± 0.148 |  M  0.9761 ± 0.085  1.8331 ± 0.315 |  L  0.9794 ± 0.143  2.1847 ± 0.525
4   5 |  Q  1.7701 ± 0.117  2.4181 ± 0.159 |  M  1.3422 ± 0.136  2.0573 ± 0.179 |  M  0.8082 ± 0.053  2.5218 ± 0.596 |  L  0.6641 ± 0.075  3.0536 ± 0.926
With the selection of two inputs, the HFPNN with 5 contexts, 2 clusters per each context and the M type of polynomial is preferred, and its performance quantifies as PI = 1.3055 ± 0.081 and EPI = 1.4096 ± 0.127 for the training and testing data, respectively. For three inputs, five contexts, four clusters per each context and the M type of polynomial are selected, with PI = 0.9546 ± 0.132 and EPI = 1.5690 ± 0.176. For four inputs, the optimal network has the performance PI = 0.9738 ± 0.078 and EPI = 1.3827 ± 0.159, at five contexts, three clusters per each context and the M type of polynomial. The tendency of the changes in the performance index versus the number of contexts is visualized in Fig. 8. Fig. 9 visualizes the architecture of a selected granular-oriented self-organizing polynomial neural network with 4 inputs, 5 contexts and 3 clusters per each context, coming with PI = 0.9716 and EPI = 1.1443 for the training and testing data, respectively. As shown in Fig. 9, 12 input variables (all except the "Indus" variable) are linked to the first layer, which consists of 12 CPNs selected among the overall number of 30 CPNs. In the second layer, four PNs are selected, and the 17th PN of the third layer constitutes the best node, performing better than any other node. Fig. 10 visualizes the difference between the original output and the output produced by the network presented in Fig. 9. The comparative analysis reported in Table 6 contrasts the proposed model with other models. The proposed models are preferred architectures in the modeling of the BHP dataset, given both the approximation and generalization aspects of the model.
Fig. 8. Performance index of the proposed model vis-à-vis the increase of the number of contexts: (a) number of selected input variables = 2, c = 2; (b) number of selected input variables = 3, c = 4; (c) number of selected input variables = 4, c = 3.
Fig. 9. Architecture of the selected HFPNN (SI = 4, P = 5, c = 3, PI = 0.9716, EPI = 1.1443).
Fig. 10. Error values for the selected HFPNN (SI = 4, P = 5, c = 3, PI = 0.9716, EPI = 1.1443): (a) PI (training data); (b) EPI (testing data).
Table 6
Comparison of the performance of the proposed model with other models for Boston Housing Price.

Model (configuration)                       PI                EPI               Index
RBFNN [12]                                  6.36 ± 0.24       6.94 ± 0.31       RMSE
RBFNN with context-free clustering [12]     5.52 ± 0.25       6.91 ± 0.45       RMSE
Linguistic modeling [12]
  Without optimization                      5.21 ± 0.12       6.14 ± 0.28       RMSE
  One-loop optimization                     4.80 ± 0.52       5.22 ± 0.58       RMSE
  Multi-step optimization                   4.12 ± 0.35       5.32 ± 0.96       RMSE
Incremental model [13]
  Linear regression                         4.535 ± 0.240     5.043 ± 0.396     RMSE
  Polynomial (2nd order)                    3.815 ± 0.264     4.455 ± 0.399     RMSE
  Incremental model                         3.279 ± 0.177     4.298 ± 0.439     RMSE
Functional RBFNN [14]
  m = 2.0                                   4.724 ± 0.644     14.064 ± 0.820    MSE
  m = 2.5                                   8.079 ± 1.762     14.825 ± 1.361    MSE
  m = 3.5                                   8.450 ± 1.029     14.523 ± 1.563    MSE
HFSPNN [8] (3rd layer)
  Triangular MF                             2.576 ± 0.236     3.372 ± 0.344     RMSE
  Gaussian MF                               2.475 ± 0.138     3.228 ± 0.349     RMSE
PSO-based RBFPNNs [27]
  Case I                                    1.346 ± 0.214     3.419 ± 0.401     RMSE
Proposed model (3rd layer)
  SI = 2                                    1.3055 ± 0.081    1.4096 ± 0.127    RMSE
  SI = 3                                    0.9546 ± 0.132    1.5690 ± 0.176    RMSE
  SI = 4                                    0.9738 ± 0.078    1.3827 ± 0.159    RMSE
4.3. Medical imaging system (MIS)

In this section, we consider a medical imaging system data set which involves 390 software modules written in Pascal and FORTRAN. Each module is described by 11 input variables, that is, total lines of code including comments (LOC), total code lines (CL), total character count (TChar), total comments (TComm), number of comment characters (MChar), number of code characters (DChar), Halstead's program length (N), Halstead's estimated program length (N̂), Jensen's estimator of program length (NF), McCabe's cyclomatic complexity (V(G)), and Belady's bandwidth metric (BW). The output variable of the model is the number of reported changes, i.e., change reports (CRs). Applying the proposed design methodology, the given dataset is randomly partitioned to produce two data sets: 60% of the data is used for training the models, while the remaining 40%, the testing data set, is used to quantify the predictive quality (generalization ability) of the fitted models. As before, we consider the RMSE (13) as the performance index. Table 7 reports the results obtained with respect to the number of selected inputs, the number of contexts, the number of clusters per each context and the type of polynomial; the performance indexes were obtained at the third (last) layer of the network. As shown in Table 7, the optimal model emerges by looking at the number of assigned contexts. For SI set to 2, the optimal performance is PI = 2.2141 ± 0.879 and EPI = 3.4630 ± 1.490, reported for 3 contexts, 3 clusters per each context and the Q type of polynomial. When the number of selected inputs is 3, the optimal network, with 4 contexts, 4 clusters per each context and the L type of polynomial, has PI = 0.8852 ± 0.082 and EPI = 3.4690 ± 0.780. The preferred HFPNN with 4 inputs is represented by five contexts, two clusters per each context and the L type of polynomial, and it shows PI = 0.7818 ± 0.085 and EPI = 3.4211 ± 1.306, a sound balance between approximation and prediction capabilities.
Table 7
Performance index of HFPNN for the MIS data (each cell lists Ty, PI, EPI for the given number of contexts P; values are mean ± standard deviation of RMSE).

SI  c |  P = 2                             |  P = 3                             |  P = 4                             |  P = 5
2   2 |  Q  3.8042 ± 0.971  3.7107 ± 1.569 |  Q  3.0087 ± 1.355  3.6565 ± 1.949 |  Q  2.1539 ± 1.375  3.9090 ± 1.942 |  Q  2.2623 ± 1.413  3.9270 ± 3.047
2   3 |  Q  3.5895 ± 0.933  3.9399 ± 1.281 |  Q  2.2141 ± 0.879  3.4630 ± 1.490 |  M  1.1735 ± 0.102  5.1932 ± 1.652 |  M  0.8488 ± 0.122  4.0393 ± 1.355
2   4 |  Q  2.8516 ± 0.636  4.3973 ± 1.242 |  Q  1.5371 ± 0.151  4.1474 ± 1.798 |  L  1.0073 ± 0.091  4.7168 ± 1.182 |  L  0.7215 ± 0.025  8.1134 ± 4.543
2   5 |  Q  2.5207 ± 0.137  4.1963 ± 1.614 |  L  1.4143 ± 0.077  4.8611 ± 2.449 |  L  0.8993 ± 0.098  6.7447 ± 2.321 |  L  0.6760 ± 0.048  13.877 ± 6.413
3   2 |  Q  2.9576 ± 0.685  3.9099 ± 1.186 |  Q  2.2769 ± 1.116  3.6015 ± 1.743 |  Q  1.1816 ± 0.259  3.4633 ± 1.558 |  M  0.8200 ± 0.046  3.6600 ± 1.670
3   3 |  Q  2.6442 ± 0.490  3.7421 ± 0.834 |  M  1.5077 ± 0.167  3.2926 ± 1.171 |  M  1.0330 ± 0.147  3.4231 ± 1.787 |  L  0.7024 ± 0.037  4.4706 ± 1.980
3   4 |  M  2.3396 ± 0.138  3.5751 ± 1.072 |  L  1.2672 ± 0.083  4.5340 ± 1.280 |  L  0.8852 ± 0.082  3.4690 ± 0.780 |  L  0.6719 ± 0.043  13.341 ± 6.654
3   5 |  M  2.1512 ± 0.134  4.4355 ± 1.279 |  L  1.1821 ± 0.056  5.1748 ± 3.149 |  L  0.7773 ± 0.111  9.2453 ± 6.608 |  L  0.6038 ± 0.046  22.665 ± 21.05
4   2 |  M  2.5068 ± 0.253  3.5936 ± 0.838 |  M  1.8839 ± 0.698  3.2451 ± 1.542 |  L  1.3622 ± 0.839  3.3851 ± 1.805 |  L  0.7818 ± 0.085  3.4211 ± 1.306
4   3 |  M  2.4168 ± 0.171  3.8240 ± 1.794 |  L  1.5527 ± 0.580  4.1391 ± 1.505 |  L  0.9040 ± 0.108  4.3324 ± 1.738 |  L  0.6723 ± 0.024  4.3141 ± 2.470
4   4 |  M  2.1200 ± 0.137  3.7162 ± 1.143 |  L  1.1503 ± 0.046  8.1796 ± 13.20 |  L  0.7970 ± 0.120  7.9154 ± 4.125 |  L  0.6067 ± 0.032  27.028 ± 20.66
4   5 |  L  2.0009 ± 0.131  5.0114 ± 2.038 |  L  1.0571 ± 0.075  5.4978 ± 3.651 |  L  0.7105 ± 0.082  12.644 ± 12.03 |  L  0.5770 ± 0.042  39.355 ± 23.04
Fig. 11. Performance index of the proposed model versus the number of contexts: (a) number of selected input variables = 2; (b) number of selected input variables = 3; (c) number of selected input variables = 4.
The number of contexts impacts the performance of the networks, as shown in Fig. 11. Their increase leads to the reduction of the error for the training set, while there is a different tendency for the testing set. Here we select the best architecture of HFPNN, with the values PI = 0.8246 and EPI = 1.6261, obtained for four inputs, five contexts, two clusters per each context and the L type of polynomial; its detailed topology is visualized in Fig. 12. Fig. 13 shows the output of the selected network obtained for the training and testing data. Table 8 reports the performance of the proposed model vis-à-vis the performance of other models. In all cases, the proposed model provides substantially better approximation and generalization capabilities. In the SONFN model, a fuzzy relation-based fuzzy neural network is combined with the polynomial neural network; the output of the fuzzy relation-based fuzzy neural network is used as the input of the first layer of the polynomial neural network. In the case of the FPNN model, the nodes of the model are built as fuzzy set-based polynomial neurons. The GA-based FSONN model consists of FPNs (fuzzy polynomial neurons), and in each FPN the parameters such as the number of input variables, the selected input variables, and the order of the polynomial are optimized by a genetic algorithm.
Table 8
Results of comparative analysis.

Model (configuration)          PI                EPI               Index
SONFN [18]
  Simplified                   40.753            17.898            MSE
  Linear                       35.745            17.807            MSE
FPNN [19]
  SI = 2                       32.195            18.462            MSE
  SI = 3                       32.251            19.622            MSE
GA-based FSONN [20]
  PN based FSONN               18.043            11.898            MSE
  FPN based FSONN              23.739            9.090             MSE
Incremental model [13]
  Linear regression            5.877 ± 0.626     6.570 ± 1.024     RMSE
  Incremental model            4.620 ± 0.896     6.624 ± 0.773     RMSE
Proposed model (3rd layer)
  SI = 2                       2.2141 ± 0.879    3.4630 ± 1.490    RMSE
  SI = 3                       0.8852 ± 0.082    3.4690 ± 0.780    RMSE
  SI = 4                       0.7818 ± 0.085    3.4211 ± 1.306    RMSE

Fig. 12. HFPNN architecture (SI = 4, P = 5, c = 2, PI = 0.8246, EPI = 1.6261).

Fig. 13. Model output versus output data (SI = 4, P = 5, c = 2, PI = 0.8246, EPI = 1.6261): (a) PI (training data); (b) EPI (testing data).
5. Concluding remarks

In this study, we have proposed and investigated a new architecture and design methodology of granular-oriented self-organizing HFPNN, driven by the context-based fuzzy c-means (C-FCM) clustering method, predicated on an effective use of experimental data through a prudent process of information granulation, and by self-organizing polynomial neural networks (PNN) regarded as a modeling vehicle for nonlinear and complex systems. Granulation of information is a suitable way of realizing abstraction that helps solve problems in a hierarchical fashion as well as convert the original problem into a series of manageable subtasks. Information granules are reflective of the characteristics of the system under modeling (the experimental data, to be more specific) [26]. The key features of this approach can be highlighted as follows.

With respect to the network design, the main difference lies in an effective and comprehensive usage of experimental data and their prudent granulation, which occurs at the level of the output variables (where we exploit fuzzy c-means) and the input variables (where we employ C-FCM clustering that reflects upon the structure already formed in the output space). Namely, the C-FCM clustering method used to construct the context-based polynomial neurons (CPNs) in the first layer of HFPNN deals effectively with the given experimental dataset by granulating the input space with respect to the fuzzy sets (contexts) defined in the output space. A new methodology of advanced computational intelligence is built by combining the conventional PNN and C-FCM clustering for the effective data processing of information granulation. For better predictive capability, the HFPNN with CPNs and PNs driven by information granulation leads to the selection of the optimal network in terms of the number of selected inputs of a node, the number of contexts, the number of clusters per each context and the type of polynomial. These parameters contribute to the flexibility as well as the simplicity and compactness of the resulting architecture of the network. As shown through the experimental studies, the approximation capability of the proposed model is improved by increasing the number of layers, whereas the generalization ability of the model may deteriorate. In addition, the core parameter values assigned in CPN and PN that yield better performance do not follow a regular pattern but depend on the characteristics of the information granulation of a given dataset.

The experimental studies involving well-known datasets show a superb performance of the proposed network in comparison with the existing fuzzy and neuro-fuzzy models. In particular, for the automobile miles per gallon data, the Boston Housing Price data, and the Medical Imaging System data, the proposed model is superior to the conventional models. Most importantly, through the proposed framework of information granulation, we can efficiently construct the optimal network architecture (a structurally and parametrically optimized network), and this becomes crucial in improving the performance of the resulting model.
Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2012-003568), by the GRRC program of Gyeonggi province [GRRC Suwon 2013-B2, Center for U-city Security & Surveillance Technology], and by the University of Suwon in 2012.

References

[1] K.D. Karatzas, S. Kaltsatos, Air pollution modelling with the aid of computational intelligence methods in Thessaloniki, Greece, Simulation Model. Pract. Theory 15 (2007) 1310–1319.
[2] C. Riziotis, A.V. Vasilakos, Computational intelligence in photonics technology and optical networks: a survey and future perspectives, Inf. Sci. 177 (2007) 5292–5315.
[3] R. del-Hoyo, B. Martin-del-Brio, N. Medrano, J. Fernandez-Navajas, Computational intelligence tools for next generation quality of service management, Neurocomputing 72 (2009) 3631–3639.
[4] D. Srinivasan, C.W. Chan, P.G. Balaji, Computational intelligence-based congestion prediction for a dynamic urban street network, Neurocomputing 72 (2009) 2710–2716.
[5] M.R. AlRashidi, M.E. El-Hawary, Applications of computational intelligence techniques for solving the revived optimal power flow problem, Electr. Power Syst. Res. 79 (2009) 694–702.
[6] S.K. Oh, W. Pedrycz, The design of self-organizing polynomial neural networks, Inf. Sci. 141 (3–4) (2002) 237–258.
[7] S.K. Oh, W. Pedrycz, B.J. Park, Polynomial neural networks architecture: analysis and design, Comput. Electr. Eng. 29 (6) (2003) 703–725.
[8] S.K. Oh, W. Pedrycz, S.B. Roh, Hybrid fuzzy set-based polynomial neural networks and their development with the aid of genetic optimization and information granulation, Appl. Soft Comput. 9 (3) (2009) 1068–1089.
[9] H.S. Park, W. Pedrycz, S.K. Oh, Evolutionary design of hybrid self-organizing fuzzy polynomial neural networks with the aid of information granulation, Expert Syst. Appl. 33 (4) (2007) 830–846.
[10] W. Pedrycz, Conditional fuzzy c-means, Pattern Recognit. Lett. 17 (1996) 625–631.
[11] W. Pedrycz, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw. 9 (4) (1998) 601–612.
[12] W. Pedrycz, K.C. Kwak, Linguistic models as a framework of user-centric system modeling, IEEE Trans. Syst. Man Cybern. A 36 (4) (2006) 727–745.
[13] W. Pedrycz, K.C. Kwak, The development of incremental models, IEEE Trans. Fuzzy Syst. 15 (3) (2007) 507–518.
[14] W. Pedrycz, H.S. Park, S.K. Oh, A granular-oriented development of functional radial basis function neural networks, Neurocomputing 72 (2008) 420–435.
[15] H.S. Park, W. Pedrycz, S.K. Oh, Granular neural networks and their development through context-based clustering and adjustable dimensionality of receptive fields, IEEE Trans. Neural Netw. 20 (10) (2009) 1604–1616.
[16] A.G. Ivakhnenko, The group method of data handling; a rival of the method of stochastic approximation, Sov. Autom. Control 1–3 (1968) 43–55.
[17] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York, 1981.
[18] S.K. Oh, W. Pedrycz, B.J. Park, Relation-based neurofuzzy networks with evolutionary data granulation, Math. Comput. Model. 40 (7–8) (2004) 891–921.
[19] S.K. Oh, W. Pedrycz, Fuzzy polynomial neuron-based self-organizing neural networks, Int. J. Gen. Syst. 32 (3) (2003) 237–250.
[20] S.K. Oh, H.S. Park, C.W. Jeong, S.C. Joo, GA-based feed-forward self-organizing neural network architecture and its applications for multi-variable nonlinear process systems, KSII Trans. Internet Inf. Syst. 3 (3) (2009) 309–330.
[21] UCI Machine Learning Repository, 〈http://archive.ics.uci.edu/ml〉.
[22] L.A. Zadeh, Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets Syst. 90 (1997) 111–117.
[23] W. Pedrycz, P. Rai, A multifaceted perspective at data analysis: a study in collaborative intelligent agents, IEEE Trans. Syst. Man Cybern. B 39 (4) (2009) 834–844.
[24] P. Maji, S.K. Pal, Rough set based generalized fuzzy c-means algorithm and quantitative indices, IEEE Trans. Syst. Man Cybern. B 37 (6) (2007) 1529–1540.
[25] D. Harrison, D.L. Rubinfeld, Hedonic housing prices and the demand for clean air, J. Environ. Econ. Manage. 5 (1978) 81–102.
[26] W. Pedrycz, G. Vukovich, Granular neural networks, Neurocomputing 36 (2001) 205–224.
[27] S.K. Oh, H.S. Park, W.D. Kim, W. Pedrycz, A new approach to radial basis function-based polynomial neural networks: analysis and design, Knowl. Inf. Syst. 36 (1) (2013) 121–151.
Sung-Kwun Oh (M'98) received the B.Sc., M.Sc., and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1981, 1983, and 1993, respectively. During 1983–1989, he was a Senior Researcher in the R&D Laboratory of Lucky-Goldstar Industrial Systems Co., Ltd. From 1996 to 1997, he was a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada. He is currently a Professor with the Department of Electrical Engineering, University of Suwon, South Korea. His research interests include fuzzy systems, fuzzy-neural networks, automation systems, advanced computational intelligence, and intelligent control. He currently serves as an Associate Editor of the KIEE Transactions on Systems and Control, the International Journal of Fuzzy Logic and Intelligent Systems of the KFIS, and the International Journal of Control, Automation, and Systems of the ICASE, South Korea.

Wook-Dong Kim received the B.Sc. and M.Sc. degrees in Electrical Engineering from Suwon University, Korea, in 2009 and 2011, respectively. He is currently a Ph.D. student in Electrical Engineering at Suwon University, Korea. His research interests include fuzzy sets, neural networks, evolutionary algorithms, computational intelligence and statistical learning.

Byoung-Jun Park received the B.S. and M.S. degrees in Control and Instrumentation Engineering and the Ph.D. degree in Electrical Engineering from Wonkwang University, Iksan, Korea, in 1998, 2000, and 2003, respectively. He is currently a senior member of the engineering staff in the Telematics & Vehicle-IT Convergence Research Department, IT Convergence Technology Research Laboratory, Electronics and Telecommunications Research Institute (ETRI), Korea. From 2005 to 2006, he held a position of Postdoctoral Fellow in the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. His research interests include computational intelligence, fuzzy modeling, neurofuzzy systems, pattern recognition, granular and relational computing, and proactive idle stop.

Witold Pedrycz (M'88-SM'90-F'99) received the M.Sc., Ph.D., and D.Sci. degrees from the Silesian University of Technology, Gliwice, Poland. He is a Professor and Canada Research Chair (CRC) in Computational Intelligence in the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada. He is also with the Systems Research Institute of the Polish Academy of Sciences, Warsaw, Poland. His research interests encompass computational intelligence, fuzzy modeling, knowledge discovery and data mining, fuzzy control including fuzzy controllers, pattern recognition, knowledge-based neural networks, granular and relational computing, and
software engineering. He has published numerous papers in these areas. He is also the author of 9 research monographs. Witold Pedrycz has been a member of numerous program committees of IEEE conferences in the area of fuzzy sets and neurocomputing. He currently serves as an Associate Editor of the IEEE Transactions on Systems, Man, and Cybernetics, the IEEE Transactions on Neural Networks, and the IEEE Transactions on Fuzzy Systems. He is also the Editor-in-Chief of Information Sciences and of the IEEE Transactions on Systems, Man, and Cybernetics Part A, and President of IFSA.