Neurocomputing 64 (2005) 397–431
Multi-layer hybrid fuzzy polynomial neural networks: a design in the framework of computational intelligence

Sung-Kwun Oh(a), Witold Pedrycz(b,c,*), Ho-Sung Park(a)

(a) School of Electrical and Electronic Engineering, Wonkwang University, 344-2 Shinyong-Dong, Iksan, Chon-Buk 570-749, South Korea
(b) Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2G6, Canada
(c) Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Received 27 March 2004; received in revised form 2 August 2004; accepted 6 August 2004
Communicated by K. Cios. Available online 7 December 2004
Abstract

We introduce a new architecture of hybrid fuzzy polynomial neural networks (HFPNN) that is based on a genetically optimized multi-layer perceptron, and we develop a comprehensive design methodology involving mechanisms of genetic optimization. The construction of HFPNN exploits fundamental technologies of computational intelligence (CI), namely fuzzy sets, neural networks, and genetic algorithms (GAs). The architecture of the resulting genetically optimized HFPNN (gHFPNN) results from a synergistic usage of a hybrid system generated by combining fuzzy polynomial neuron (FPN)-based fuzzy neural networks (FNN) with polynomial neuron (PN)-based polynomial neural networks (PNN). The design of the conventional HFPNN exploits the extended group method of data handling (GMDH), with some essential parameters of the network provided by the designer and kept fixed throughout the overall development process. This restriction may hamper the possibility of producing an optimal architecture of the model. The augmented gHFPNN results in a structurally optimized architecture and comes with a higher level of flexibility in comparison to the conventional HFPNN. The GA-based
design procedure applied at each layer of the HFPNN leads to the selection of preferred nodes (FPNs or PNs) available within the HFPNN. In the sequel, two general optimization mechanisms are explored: first, the structural optimization is realized via GAs, whereas the ensuing detailed parametric optimization is carried out in the setting of standard least-squares-based learning. The performance of the gHFPNN is quantified through experimentation in which we exploit data coming from pH neutralization and NOx emission processes; these datasets have already been used quite intensively in fuzzy and neurofuzzy modeling. The obtained results demonstrate the superiority of the proposed networks over the existing fuzzy and neural models.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Hybrid fuzzy polynomial neural networks (HFPNN); Fuzzy neural networks (FNN); Polynomial neural networks (PNN); Multi-layer perceptron; Group method of data handling; Fuzzy polynomial neuron (FPN); Polynomial neuron (PN); Genetic algorithms; Design procedure

*Corresponding author. Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2G6, Canada. Tel.: +1-780-492-3333. E-mail address: [email protected] (W. Pedrycz).

doi:10.1016/j.neucom.2004.08.001
1. Introduction

Recently, a great deal of attention has been directed towards advanced technologies of computational intelligence (CI) and their usage in system modeling. The challenge lies in the multitude of demanding and conflicting objectives we wish to satisfy. The problem of designing models that exhibit significant approximation and generalization abilities and are also easy to comprehend has been with the community for decades. Neural networks, fuzzy sets, and evolutionary computing, regarded as the leading technologies of CI, have expanded and enriched the field of system modeling quite immensely. They have given rise to a number of new methodological issues and increased our awareness of the tradeoffs one has to make in system modeling [1,3,10]. The most successful approaches to hybridizing fuzzy systems and endowing them with learning and adaptive abilities have been realized in the realm of CI. In particular, neural fuzzy systems and genetic fuzzy systems enhance the variety of mechanisms of approximate reasoning encountered in fuzzy systems with the learning capabilities of neural networks and with structural optimization supported by evolutionary algorithms [2].

When the dimensionality of the model goes up (the number of the system's variables increases), so do the difficulties. In particular, when dealing with high-order nonlinear and multivariable equations of the model, we require a vast amount of data to estimate all its parameters. To build models with good predictive as well as approximation capabilities, there is consequently a need for advanced tools. One representative and advanced design approach is the family of multi-layer self-organizing neural networks [16-20,26], such as hybrid fuzzy polynomial neural networks (HFPNN), polynomial neural networks (PNN), and fuzzy polynomial neural networks (FPNN), treated as a new category of neuro-fuzzy networks. The design procedure of the multi-layer self-organizing neural networks exhibits some tendency to produce overly complex networks. The design also comes with a repetitive computational load caused by the trial
and error method being a part of the development process. In essence, as inherited from the original GMDH algorithm [7,4], the design process requires some repetitive parameter adjustment by the system designer. In this study, addressing the above problems of the conventional multi-layer self-organizing neural networks, in particular the HFPNN, we introduce a new genetic design approach. Having this new design in mind, we will be referring to these networks as genetically optimized HFPNN ('gHFPNN' for short). The determination of the optimal values of the parameters available within an individual PN and FPN (viz. the number of input variables, the order of the polynomial, and a collection of preferred nodes) leads to a structurally and parametrically optimized network. As a result, this network becomes more flexible and exhibits a simpler topology in comparison to the conventional HFPNN, FPNN, and PNN discussed in previous research. Moreover, the hybrid architecture is designed far more flexibly, which helps reach a sound compromise between the approximation and generalization abilities of the network. In essence, this leads to an important tradeoff between the accuracy and the complexity of the overall network. Let us reiterate that the objective of this study is to develop a general design methodology of gHFPNN modeling, come up with a logic-based structure of such a model, and propose a comprehensive evolutionary development environment in which the optimization of the models can be efficiently carried out both at the structural and at the parametric level [25].

This paper is organized in the following manner. Section 2 provides a brief introduction to the architecture and development of the FPN-based and PN-based layers of the HFPNN. Section 3 introduces the genetic optimization used in the HFPNN. Section 4 presents the genetic design of the HFPNN, with an overall description of a detailed design methodology based on the genetically optimized multi-layer perceptron architecture. In Section 5, we report on a comprehensive set of experiments; to evaluate the performance of the proposed model, we discuss experimental studies exploiting well-known data already used in the realm of fuzzy or neurofuzzy modeling [28,13,19,6,11,9,21-23,27]. Finally, concluding remarks are covered in Section 6.
2. The architecture and development of the hybrid fuzzy polynomial neural networks

Proceeding with the overall HFPNN architecture, essential design decisions have to be made with regard to the number of input variables, the order of the polynomial, and a collection of the specific subset of input variables. We distinguish between two kinds of layers in the HFPNN architecture: the PN-based layer and the FPN-based layer.

2.1. The architecture of the fuzzy polynomial neuron (FPN)-based layer of HFPNN

In this section, we introduce the fuzzy polynomial neuron (FPN). This neuron, regarded as a generic type of processing unit, dwells on the concept of fuzzy sets.
Fuzzy sets realize a linguistic interface by linking the external world (numeric data) with the processing unit. Neurocomputing manifests itself in the form of a local polynomial unit realizing nonlinear processing. We show that the FPN encapsulates a family of nonlinear 'if-then' rules. When arranged together, FPNs build the first layer of the HFPNN. As visualized in Fig. 1, the FPN consists of two basic functional modules. The first one, labeled F, is a collection of fuzzy sets that form an interface between the input numeric variables and the processing part realized by the neuron; in this figure, x_q and x_p denote input variables. The second module (denoted here by P) concentrates on function-based nonlinear (polynomial) processing. This nonlinear processing involves some input variables (x_i and x_j). Quite commonly, we will be using a polynomial form of the nonlinearity, hence the name of the fuzzy polynomial processing unit. The use of polynomials is motivated by their generality; in particular, they include constant and linear mappings as special cases (used quite often in rule-based systems). The number of input variables may vary. As mentioned, the set of input variables being transformed by the fuzzy sets (here {A_l} and {B_k}) may be different from, the same as, or partially overlapping with the variables processed by the second, polynomial module of the neuron. This way of treating inputs adds extra flexibility to the entire processing unit. Bearing in mind the collection of fuzzy sets used to transform the input variables, the FPN realizes a family of multiple-input single-output rules. Each rule, refer again to Fig. 1, reads in the form

if x_p is A_l and x_q is B_k then z is P_lk(x_i, x_j, a_lk),   (1)
where a_lk is a vector of the parameters of the conclusion part of the rule, while P_lk(x_i, x_j, a_lk) denotes the regression polynomial forming the consequence part of the fuzzy rule; the polynomial may assume one of several forms (linear, quadratic, and modified quadratic) besides the constant function forming the simplest version of the consequence; refer to Table 1.
Fig. 1. A general topology of the generic FPN module; note its fuzzy set-based processing part (the module denoted by F) and the polynomial form of the mapping (P).
Table 1
Different forms of regression polynomials building the PN and FPN

Order of the polynomial   FPN      PN       1 input     2 inputs        3 inputs
0                         Type 1   -        Constant    Constant        Constant
1                         Type 2   Type 1   Linear      Bilinear        Trilinear
2 (basic)                 Type 3   Type 2   Quadratic   Biquadratic-1   Triquadratic-1
2 (modified)              Type 4   Type 3   -           Biquadratic-2   Triquadratic-2

The suffixes 1 and 2 of the quadratic forms denote the basic and the modified type, respectively.
The polynomial types read as follows:

Bilinear: z = c_0 + c_1 x_1 + c_2 x_2
Biquadratic-1: z = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_1 x_2 + c_4 x_1^2 + c_5 x_2^2
Biquadratic-2: z = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_1 x_2
Trilinear: z = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3
Triquadratic-1: z = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_1 x_2 + c_5 x_2 x_3 + c_6 x_1 x_3 + c_7 x_1^2 + c_8 x_2^2 + c_9 x_3^2
Triquadratic-2: z = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_1 x_2 + c_5 x_2 x_3 + c_6 x_1 x_3
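For concreteness, here is a minimal sketch of the two-input consequent polynomials listed above; the function names and the coefficient-vector convention are ours, not the paper's:

```python
# A minimal sketch of the two-input regression polynomials of Table 1;
# c is a coefficient vector, function names are illustrative only.
def bilinear(x1, x2, c):
    return c[0] + c[1]*x1 + c[2]*x2

def biquadratic_1(x1, x2, c):
    # basic quadratic form: adds the cross-term and both pure squares
    return (c[0] + c[1]*x1 + c[2]*x2 + c[3]*x1*x2
            + c[4]*x1**2 + c[5]*x2**2)

def biquadratic_2(x1, x2, c):
    # modified quadratic form: keeps the cross-term, drops the pure squares
    return c[0] + c[1]*x1 + c[2]*x2 + c[3]*x1*x2
```

The three-input forms follow the same pattern with the additional variable x3.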
The maximal number of rules one can accommodate in a single FPN is determined by the number of fuzzy sets defined for each variable (that is, their combinations) and the number of the variables themselves. Alluding to the input variables of the FPN, especially the way in which they interact with its two functional blocks, we use the notation FPN(x_p, x_q; x_i, x_j) to explicitly point at the variables. The processing of the FPN is governed by the following expressions, in line with the rule-based computing commonly encountered in the literature, cf. [24,15]:

(a) The activation of rule K is computed as an and-combination of the activations of the fuzzy sets standing in the rule. This combination of the subconditions is realized through any t-norm, with the minimum and product being in common use. Denote the resulting activation level by μ_K.

(b) The activation levels of the rules contribute to the output of the FPN, computed as a weighted average of the individual conclusion parts (functional transformations) P_K (note that the rule index K is a shorthand notation for the two indexes of fuzzy sets used in rule (1), that is, K = (l, k)):

z = Σ_{K=1}^{all rules} μ_K P_K(x_i, x_j, a_K) / Σ_{K=1}^{all rules} μ_K = Σ_{K=1}^{all rules} μ̃_K P_K(x_i, x_j, a_K),   (2)

where μ̃_K denotes the normalized activation level of the Kth rule,

μ̃_K = μ_K / Σ_{L=1}^{all rules} μ_L.   (3)
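A minimal sketch of the FPN output computation of (2)-(3) may be helpful. It assumes triangular membership functions with ordered parameters a < b < c and at least one active rule; all names are illustrative, not from the paper:

```python
import numpy as np

def triangular_mf(x, a, b, c):
    """Triangular membership grade with support [a, c] and peak at b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fpn_output(activations, consequents, x):
    """activations: rule activation levels mu_K (t-norm of antecedent grades);
    consequents: callables P_K(x); returns the weighted average of Eq. (2)."""
    mu = np.asarray(activations, dtype=float)
    mu_tilde = mu / mu.sum()          # Eq. (3): normalized activation levels
    return sum(m * P(x) for m, P in zip(mu_tilde, consequents))
```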
The topology of the FPN-based layer of HFPNN implies the ensuing learning mechanisms; in the description below we indicate some of the learning issues that permeate the overall architecture. The FPN itself promotes a number of interesting design options, see Fig. 2. These alternatives fall into two categories, designer-based and GA-based. The former concerns the choice of the membership function (MF) type, the consequent input structure of the fuzzy rules, and the number of MFs per input variable. The latter relates to the choice of the number of inputs, the collection of the specific subset of input variables, and the associated order of the polynomial realizing the consequence part of the rules, based on the fuzzy inference method. Proceeding with the FPN-based layer of the HFPNN architecture, see Fig. 3, essential design decisions have to be made with regard to the number of input variables and the order of the polynomial forming the conclusion part of the rules, as well as a collection of the specific subset of input variables. The consequence part can be expressed by a linear, quadratic, or modified quadratic polynomial, as mentioned previously. For the consequence part, we consider two kinds of input vector formats in the conclusion part of the fuzzy rules of the 1st layer, namely (i) selected inputs and (ii) entire system inputs, see Table 2.

(i) The input variables of the consequence part of the fuzzy rules are the same as the input variables of the premise part.

(ii) The input variables of the consequence part of the fuzzy rules in a node of the 1st layer are the same as the entire system input variables, while the input variables of the consequence part of the fuzzy rules in a node of the 2nd or higher layers are the same as the input variables of the premise part.
Fig. 2. The design alternatives available within a single FPN.
Fig. 3. A general topology of the FPN-based layer of HFPNN.
Table 2
Polynomial type of the model classified with regard to the number of input variables in the conclusion part of the fuzzy rules

Type of the consequence polynomial input vector   Input variables of the premise part   Input variables of the consequence part
Type T                                            A                                     A
Type T*                                           A                                     B
Here the following notation is used:
A: vector of the selected input variables (x_1, x_2, ..., x_i);
B: vector of the entire system input variables (x_1, x_2, ..., x_i, x_j, ...);
Type T: f(A) = f(x_1, x_2, ..., x_i), the type of polynomial function standing in the consequence part of the fuzzy rules;
Type T*: f(B) = f(x_1, x_2, ..., x_i, x_j, ...), the type of polynomial function occurring in the consequence part of the fuzzy rules.
2.2. The architecture of the polynomial neuron (PN)-based layer of HFPNN

As underlined, the PNN algorithm used in the PN-based layer of HFPNN is based on the GMDH method and utilizes a class of polynomials (linear, quadratic, modified quadratic, etc.) to describe the basic processing realized there. By choosing the most significant input variables and an order of the polynomial among the various forms available, we obtain the best local model, known as a partial description (PD). The network is realized by selecting nodes at each layer and generating additional layers until the best performance has been reached. Such a methodology leads to an optimal PNN structure. Let us recall that the input-output data are given in the form

(X_i, y_i) = (x_1i, x_2i, ..., x_Ni, y_i), i = 1, 2, 3, ..., n,   (4)

where N is the number of input variables, i indexes the data points, and n denotes the number of data points in the dataset. The input-output relationship realized by the PNN algorithm for the above data can be described as

y = f(x_1, x_2, ..., x_N),   (5)

where x_1, x_2, ..., x_N denote the outputs of the 1st layer of nodes (that is, the inputs of the 2nd-layer PN nodes). The estimated output ŷ reads as

ŷ = c_0 + Σ_{i=1}^{N} c_i x_i + Σ_{i=1}^{N} Σ_{j=1}^{N} c_{ij} x_i x_j + Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{k=1}^{N} c_{ijk} x_i x_j x_k + ...,   (6)

where C = (c_0, c_i, c_{ij}, c_{ijk}, ...) and X = (x_i, x_j, x_k, ...), with i, j, k, ... = 1, 2, ..., N, are the vectors of coefficients and input variables of the resulting multi-input single-output (MISO) system, respectively. The design of the PNN structure proceeds further and involves the generation of additional layers. These layers consist of PNs (PDs) for which the number of input variables, the polynomial order, and the collection of the specific subset of input variables are genetically optimized across the layers. The detailed PN involving a certain regression polynomial is shown in Table 1. The architecture of the PN-based layer of HFPNN is visualized in Fig. 4. The structure of the PNN is genetically optimized on the basis of the design alternatives available within a PN occurring in each layer. In the sequel, the PNN embraces diverse topologies of PNs selected on the basis of the number of input variables, the order of the polynomial, and the collection of the specific subset of input variables (as shown in Table 1). The choice of the number of input variables, the polynomial order, and the input variables available within each node helps select the best model with respect to the characteristics of the data, the model design strategy, nonlinearity, and predictive capabilities.
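As an illustration of the polynomial expansion of (6), the following sketch builds the design matrix of a partial description truncated at a given order; it is a hypothetical helper, not code from the paper:

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, order=2):
    """X: (n_samples, N) input matrix -> design matrix with columns
    [1, x_i, x_i*x_j, ...] up to the requested order (with repetitions)."""
    n, N = X.shape
    cols = [np.ones(n)]
    for d in range(1, order + 1):
        for idx in combinations_with_replacement(range(N), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)
```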
Fig. 4. A general topology of the PN-based layer of HFPNN; note a biquadratic polynomial in the partial description (z: intermediate variable).
3. Genetic optimization of HFPNN

The task of optimizing any complex model involves two main phases. First, a class of optimization algorithms has to be chosen that is applicable to the requirements implied by the problem at hand. Second, various parameters of the optimization algorithm need to be tuned in order to achieve its best performance. Genetic algorithms (GAs) are optimization techniques based on the principles of natural evolution. In essence, they are search algorithms that use operations found in biology to guide a comprehensive search over the parameter space. GAs have been theoretically and empirically demonstrated to provide robust search capabilities in complex spaces, thus offering a valid solution strategy for problems requiring efficient and effective search. For optimization in real-world problems, many
methods are available, including gradient-based and direct search, which rest on techniques of mathematical programming [12]. In contrast to these, genetic algorithms aim at a stochastic global search that involves a structured information exchange [5]. It is instructive to highlight the main features that tell GAs apart from other optimization methods:

(1) GAs operate on codes of the variables (genotypes) rather than on the original variables themselves.

(2) GAs search for optimal points starting from a group (population) of points in the search space (potential solutions) rather than from a single point.

(3) The GA search is directed only by a fitness function, whose form can be quite complex; we do not require any specific format of this function (in particular, we do not require its differentiability, as commonly encountered in gradient-based optimization).

In this study, for the optimization of the HFPNN model, the GA uses binary encoding, roulette-wheel selection, one-point crossover, and binary inversion (complementation) as the mutation operator. To retain the best individual and carry it over to the next generation, we use an elitist strategy [8]. The overall genetically driven structural optimization process of HFPNN is shown in Fig. 5. Fig. 6 depicts the genetic optimization procedure for the generation of the optimal nodes in the corresponding layer; as shown there, all nodes of the corresponding layer of the HFPNN architecture are constructed through genetic optimization. As mentioned, when constructing the PNs and FPNs of each layer in the conventional HFPNN, such parameters as the number of input variables (nodes), the order of the polynomial, and the input variables available within a PN or an FPN are fixed (selected) in advance by the designer. This has frequently contributed to the difficulties in the design of an optimal network. To overcome this apparent drawback, we resort to genetic optimization; refer to Fig. 7 for a more detailed flow of the development activities.
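To make the operators concrete, here is a minimal sketch of one GA generation using roulette-wheel selection, one-point crossover, bit-flip (inversion) mutation, and elitism; the rates echo Table 3 later in the paper, while the helper names and the population handling are our own scaffolding:

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_select(pop, fitness):
    p = fitness / fitness.sum()                     # fitness-proportional
    return pop[rng.choice(len(pop), size=len(pop), p=p)]

def one_point_crossover(a, b, rate=0.65):
    if rng.random() < rate:
        cut = rng.integers(1, len(a))
        a, b = np.r_[a[:cut], b[cut:]], np.r_[b[:cut], a[cut:]]
    return a, b

def mutate(chrom, rate=0.1):
    flip = rng.random(len(chrom)) < rate
    return np.where(flip, 1 - chrom, chrom)         # binary inversion

def next_generation(pop, fitness):
    elite = pop[np.argmax(fitness)].copy()          # elitist strategy [8]
    selected = roulette_select(pop, fitness)
    children = []
    for i in range(0, len(selected) - 1, 2):
        a, b = one_point_crossover(selected[i], selected[i + 1])
        children += [mutate(a), mutate(b)]
    children = children[:len(pop) - 1] + [elite]    # keep the best individual
    return np.array(children)
```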
Fig. 5. Overall genetically-driven structural optimization process of HFPNN.
Fig. 6. The genetic optimization used in the generation of the optimal nodes in the given layer of the network.
Fig. 7. A general flow of genetic design of HFPNN.
4. The algorithm and design procedure of genetically optimized HFPNN (gHFPNN)

The genetically driven HFPNN comes with a highly versatile architecture, both in the flexibility of the individual nodes and in the interconnectivity between the nodes and the organization of the layers. Overall, the design procedure of the HFPNN based on the genetically optimized multi-layer perceptron architecture comprises the following steps.

Step 1. Determine the system's input variables. Define the system's input variables x_i (i = 1, 2, ..., n) related to the output variable y. If required, normalization of the input data is carried out as well.

Step 2. Form the training and testing data. The input-output data set (x_i, y_i) = (x_1i, x_2i, ..., x_ni, y_i), i = 1, 2, ..., N (with N being the total number of data points), is divided into two parts, a training and a testing dataset. Denote their sizes by N_t and N_c, respectively; obviously N = N_t + N_c. The training data set is used to construct the HFPNN; the testing data set is then used to evaluate the quality of the network.

Step 3. Decide upon the initial information for constructing the HFPNN structure. The design parameters of the HFPNN structure include the following (an illustrative container for these choices is sketched after this list):

(a) the stopping criterion, for which two termination methods are exploited:
- a criterion level for comparing the minimal identification error of the current layer with that of the previous layer of the network;
- the maximum number of layers (predetermined by the designer), with an intent to achieve a sound balance between model accuracy and complexity;
(b) the maximum number of input variables coming to each node in the corresponding layer;
(c) the total number W of nodes to be retained (selected) at the next generation of the HFPNN algorithm;
(d) the depth of the HFPNN, selected to reduce the conflict between overfitting and the generalization abilities of the developed HFPNN;
(e) the depth and width of the HFPNN, selected as a result of a tradeoff between accuracy and complexity of the overall model.

In addition, in the case of the FPN-based layer of HFPNN, the following items are considered besides those mentioned above:

(f) the initial information for the fuzzy inference method and fuzzy identification:
- the fuzzy inference method;
- the MF type: triangular or Gaussian-like MF;
- the number of MFs per input of a node (FPN);
- the structure of the consequence part of the fuzzy rules.
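An illustrative container for these designer-set choices might look as follows; the field names and defaults are ours, with numeric values echoing the experimental setup of Table 3:

```python
from dataclasses import dataclass

@dataclass
class HFPNNConfig:
    max_layers: int = 5           # depth bound, Step 3(a)
    max_inputs: int = 4           # Max: inputs per node (2-5 in the experiments)
    retained_nodes: int = 30      # W: nodes kept per layer
    mf_type: str = "triangular"   # or "gaussian"
    mfs_per_input: int = 2
    consequent_input: str = "T"   # Type T (selected) or "T*" (entire inputs)
```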
Step 4. Decide the structure of the PN-based and FPN-based layers of HFPNN using the genetic design. This step concerns the selection of the number of input variables, the polynomial order, and the input variables to be assigned to each node of the corresponding layer. These decisions are carried out through an extensive genetic optimization. When it comes to the organization of the chromosome representing (mapping) the structure of the HFPNN, we divide the chromosome used for genetic optimization into three sub-chromosomes, as shown in Figs. 8 and 9. The 1st sub-chromosome contains the number of input variables, the 2nd sub-chromosome carries the order of the polynomial of the node, and the 3rd sub-chromosome (the remaining bits) encodes the input variables coming to the corresponding node (PN or FPN). All these elements are optimized when running the GA. In the nodes (PNs and FPNs) of each layer of HFPNN, we adhere to the notation of Fig. 10: 'PNn' or 'FPNn' denotes the nth node (PN or FPN) of the corresponding layer, 'N' denotes the number of nodes (inputs or PNs/FPNs) coming to the corresponding node, and 'T' denotes the order of the polynomial used within the corresponding node.
Fig. 8. The PN design used in the HFPNN architecture—structural considerations and mapping the structure on a chromosome.
Fig. 9. The FPN design used in the HFPNN architecture: structural considerations and mapping the structure on a chromosome.
Fig. 10. Formation of each PN or FPN in the HFPNN architecture: a node receives inputs x_i, x_j, ..., produces the output z, and is annotated with the number of inputs N and the polynomial order (Type T).
Each sub-step of the genetic design of the three types of parameters available within the PN and the FPN is structured as follows.

Step 4.1. Selection of the number of input variables (1st sub-chromosome)

Sub-step 1. The first three bits of the given chromosome are assigned to the selection of the number of input variables. The size of this bit field depends on the number of input variables; with three bits, for example, the maximum number of input variables is limited to 7.

Sub-step 2. The three (randomly generated) bits are decoded into the decimal value

b = 2^2 bit(3) + 2^1 bit(2) + 2^0 bit(1),   (7)

where bit(1), bit(2), and bit(3) denote the binary values ("0" or "1") at the respective bit positions.

Sub-step 3. The decimal value b is normalized and rounded off,

γ = (b/α)(Max - 1) + 1,   (8)

where Max denotes the maximal number of input variables entering the corresponding node (PN or FPN), while α is the decoded decimal value obtained when all bits of the 1st sub-chromosome are set to 1.

Sub-step 4. The normalized integer value is then treated as the number of input variables (or input nodes) coming to the corresponding node. Evidently, the maximal number Max of input variables is at most the number of all system input variables (x_1, x_2, ..., x_n) coming to the 1st layer, that is, Max ≤ n.

Step 4.2. Selection of the order of the polynomial (2nd sub-chromosome)

Sub-step 1. The three bits of the 2nd sub-chromosome are assigned to the selection of the order of the polynomial.

Sub-step 2. The three bits are decoded into a decimal value using (7).

Sub-step 3. The decimal value is normalized by means of (8) and rounded off; here Max is replaced with 3 (in case of a PN) or 4 (in case of an FPN), refer to Table 1.

Sub-step 4. The normalized integer value is taken as the selected polynomial order when constructing each node of the corresponding layer.

Step 4.3. Selection of input variables (3rd sub-chromosome)

Sub-step 1. The remaining bits are assigned to the selection of input variables.

Sub-step 2. The remaining bits are divided into as many fields as the number of inputs obtained in Step 4.1. If the bits are not evenly divisible, we apply the following rule: for example, if 22 bits remain and the number of input variables obtained in Step 4.1 is 4, the 1st, 2nd, and 3rd bit fields for the selection of input variables are assigned six bits each, while the last (4th) bit field is assigned four bits.

Sub-step 3. Each bit field is decoded into decimal through relationship (7).
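A small sketch of this decoding mechanics, Eqs. (7)-(8), is given below before the remaining sub-steps; the helper names are ours, and bits are listed most-significant first, matching (7):

```python
def decode_bits(bits):
    """Binary bits [bit(3), bit(2), bit(1)] -> decimal value b, Eq. (7)."""
    return int("".join(str(b) for b in bits), 2)

def normalize(b, n_bits, max_val):
    """Decimal value -> integer in [1, max_val], Eq. (8), rounded off."""
    alpha = 2 ** n_bits - 1           # all bits of the sub-chromosome set to 1
    return round((b / alpha) * (max_val - 1) + 1)

chromosome = [1, 0, 1,  0, 1, 1]                                  # 1st and 2nd sub-chromosomes
n_inputs = normalize(decode_bits(chromosome[:3]), 3, max_val=4)   # Step 4.1
order    = normalize(decode_bits(chromosome[3:6]), 3, max_val=4)  # Step 4.2 (FPN)
```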
Sub-step 4. Each decimal value obtained in sub-step 3 is then normalized following (8) and rounded off; here Max is replaced with the total number of inputs (input variables or input nodes), n (or W), in the corresponding layer. Note that the total number of input variables is the number of the overall system inputs, n, in the 1st layer, and the number of selected nodes, W, retained as the output nodes of the preceding layer in the 2nd or higher layers.

Sub-step 5. The normalized integer values are then taken as the selected input variables while constructing each node of the corresponding layer. If some selected input variables are duplicated, the duplicates (i.e., identical input numbers) are treated as a single input variable and the remaining copies are discarded.

Step 5. Estimate the coefficient parameters of the polynomial in the selected node (PN or FPN).

Step 5.1. In case of a PN (PN-based layer). The vector of coefficients C_i is derived by minimizing the mean squared error between y_i and z_mi,

E = (1/N_tr) Σ_{i=1}^{N_tr} (y_i - z_mi)^2.   (9)

Using the training data subset, this gives rise to the set of linear equations

Y = X_i C_i.   (10)

Evidently, the coefficients of the PN of the nodes in each layer are expressed in the form

C_i = (X_i^T X_i)^{-1} X_i^T Y,   (11)

where

Y = [y_1 y_2 ... y_{n_tr}]^T,
X_i = [X_1i X_2i ... X_ki ... X_{n_tr}i]^T,
X_ki^T = [1 x_ki1 x_ki2 ... x_kin ... x_ki1^m x_ki2^m ... x_kin^m],
C_i = [c_0i c_1i c_2i ... c_n'i]^T,

with the following notation: i, node number; k, data number; n_tr, number of training data points; n, number of selected input variables; m, maximum order; n', number of estimated coefficients.
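In practice, (11) is solved without forming the normal equations explicitly; a minimal sketch, assuming a helper such as the poly_features() above builds the design matrix X_i:

```python
import numpy as np

def fit_coefficients(X_design, y):
    """Least-squares estimate of C_i in Y = X_i C_i, Eqs. (9)-(11)."""
    C, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return C

# e.g.: C = fit_coefficients(poly_features(X_train, order=2), y_train)
```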
Step 5.2. In case of an FPN (FPN-based layer). At this step, the regression polynomial inference is considered. This inference method deals with regression polynomial functions viewed as the consequents of the rules. The regression polynomials (and, in the simplest case, a constant value) standing in the conclusion part of the fuzzy rules are given as Types 1, 2, 3, or 4, see Table 1. In the fuzzy inference, we consider two types of membership functions, namely triangular and Gaussian-like membership functions. The consequence part can be expressed by a linear, quadratic, or modified quadratic polynomial, as shown in Table 1. The use of the regression polynomial inference method gives rise to the expression

R^i: if x_1 is A_i1, ..., x_k is A_ik, then y_i = f_i(x_1, x_2, ..., x_k),   (12)

where R^i is the ith fuzzy rule, x_l (l = 1, 2, ..., k) is an input variable, A_ik is a membership function of the fuzzy sets, k denotes the number of input variables, and f_i(.) is a regression polynomial function of the input variables. The numeric output of the model is computed in the well-known form

ŷ = Σ_{i=1}^{n} μ_i f_i(x_1, x_2, ..., x_k) / Σ_{i=1}^{n} μ_i = Σ_{i=1}^{n} μ̂_i f_i(x_1, x_2, ..., x_k),   (13)

where n is the number of fuzzy rules, ŷ is the inferred value, μ_i is the premise fitness of R^i, and μ̂_i is the normalized premise fitness of μ_i. Here we consider a regression polynomial function of the input variables; the consequence parameters are produced by the standard least squares method. The procedure described above is implemented iteratively for all nodes of the layer and for all layers of the HFPNN, starting from the input layer and moving towards the output layer.

Step 6. Select nodes (PNs or FPNs) with the best predictive capability and construct the corresponding layer. All nodes of the corresponding layer of the HFPNN architecture are constructed through genetic optimization. The generation process can be organized as the following sequence of sub-steps.

Sub-step 1. We set up the initial genetic information necessary for the generation of the HFPNN architecture: the number of generations and populations, the mutation rate, the crossover rate, and the length of the chromosome.

Sub-step 2. The nodes (PNs or FPNs) are generated through the genetic design. Here, a single individual of the population assumes the same role as a node (PN or FPN) of the HFPNN architecture; the underlying processing is visualized in Figs. 8 and 9. The optimal parameters of the corresponding polynomial are computed by the standard least squares method.

Sub-step 3. To evaluate the performance of the nodes (PNs or FPNs) constructed on the training dataset, the testing dataset is used. Based on this performance index, we calculate the fitness function

F (fitness function) = 1 / (1 + EPI),   (14)

where EPI denotes the performance index on the testing (or validation) data; the model is built with the training data, and EPI is then computed on the testing (or validation) data of the HFPNN model so constructed.

Sub-step 4. To move on to the next generation, we carry out selection, crossover, and mutation using the initial genetic information and the fitness values obtained in sub-step 3.
Sub-step 5. The nodes (PNs or FPNs) are rearranged in descending order of their calculated fitness values (F_1, F_2, ..., F_z). Among the rearranged nodes, we unify nodes with duplicated fitness values (viz. when one node has the same fitness value as another). We then choose the nodes (PNs or FPNs) characterized by the best fitness values; here we use the predefined number W of nodes with better predictive capability that are preserved to assure optimal operation at the next iteration of the HFPNN algorithm. The outputs of the retained nodes serve as inputs to the next layer of the network. Two cases arise as to the number of retained nodes:

(i) if W* < W, then the number of nodes (PNs or FPNs) retained for the next layer is equal to W*, where W* denotes the number of nodes left in the layer after the nodes with duplicated fitness values have been unified;
(ii) if W* ≥ W, then the number of nodes retained for the next layer is equal to W.

Sub-step 6. For the elitist strategy, we select the node with the highest fitness value among the selected nodes (W*).

Sub-step 7. We generate the new population of the next generation using the GA operators of sub-step 4, together with the elitist strategy; this sub-step is carried out by repeating sub-steps 2-6. In particular, in sub-step 5 we replace the node with the lowest fitness value in the current generation by the node that reached the highest fitness value in the previous generation (obtained in sub-step 6).

Sub-step 8. We combine the nodes (W individuals) obtained in the previous generation with the nodes (W individuals) obtained in the current generation, and from these 2W candidates select the W nodes with the higher fitness values; this amounts to repeating sub-step 5.

Sub-step 9. These operations (sub-steps 7-8) are repeated until the last generation; the iterative process generates the optimal nodes of the given layer of the HFPNN.

Step 7. Check the termination criterion. The termination condition that controls the growth of the model consists of two components: the performance index and the size of the network (expressed in terms of the maximal number of layers). As far as the performance index is concerned (reflecting the numeric accuracy of the layers), termination is straightforward and comes in the form

F_1 ≤ F*,   (15)

where F_1 denotes the maximal fitness value occurring at the current layer, whereas F* stands for the maximal fitness value that occurred at the previous layer. As far as the depth of the network is concerned, the generation process is stopped at a depth of
less than five layers. This size of the network has been experimentally found to strike a sound compromise between the high accuracy of the resulting model and its complexity as well as generalization abilities. In this study, we use the mean squared error (MSE) as the performance index,

E (PI_s or EPI_s) = (1/N) Σ_{p=1}^{N} (y_p - ŷ_p)^2,   (16)

where y_p is the pth target output and ŷ_p stands for the pth actual output of the model for this specific data point; N is the number of training (PI_s) or testing (EPI_s) input-output data pairs, and E is the overall (global) performance index defined as the sum of the errors over the N data pairs.

Step 8. Determine new input variables for the next layer. If (15) has not been met, the model is expanded. The outputs of the preserved nodes (z_1i, z_2i, ..., z_Wi) serve as new inputs to the next layer (x_1j, x_2j, ..., x_Wj), j = i + 1. This is captured by the expression

x_1j = z_1i, x_2j = z_2i, ..., x_Wj = z_Wi.   (17)
The overall HFPNN design is carried out by repeating steps 4-8.
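A high-level sketch of this layer-growing loop (Steps 4-8) is given below; evolve_layer and evaluate stand for the GA node search and the EPI computation and are user-supplied placeholders rather than the paper's code:

```python
def build_ghfpnn(X_tr, y_tr, X_te, y_te, cfg, evolve_layer, evaluate):
    """Grow layers until the performance stops improving (Eq. 15) or the
    depth bound cfg.max_layers is reached (Step 7)."""
    inputs, best_prev, layers = X_tr, float("inf"), []
    for _ in range(cfg.max_layers):
        layer = evolve_layer(inputs, y_tr, cfg)   # Steps 4-6: GA + LSE fitting
        epi = evaluate(layer, X_te, y_te)         # testing-data performance
        if epi >= best_prev:                      # Step 7: no improvement
            break
        layers.append(layer)
        best_prev = epi
        inputs = layer.transform(inputs)          # Step 8: outputs feed next layer
    return layers
```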
5. Simulation studies

In this section, we present two examples with data coming from the NOx emission process of a gas turbine power plant [28,14,21,22] and a pH neutralization process [6,9,11,13,23,27], well-documented datasets used in the realm of fuzzy and neurofuzzy modeling, to illustrate the characteristics and capabilities of the HFPNN. We also contrast the performance of the model introduced here with models existing in the literature.

5.1. NOx emission process of a gas turbine power plant

The NOx emission process is modeled using data from gas turbine power plants. To date, NOx emission has mostly been described by 'standard' mathematical models developed to obtain regulation data for the control process; such models, however, do not capture the relationships between the variables of the NOx emission process and the parameters of its model in an effective manner. A NOx emission process of a GE gas turbine power plant located in Virginia, USA, is chosen in this modeling study. The input variables include ambient temperature at site (AT), compressor speed (CS), low pressure turbine speed (LPTS), compressor discharge pressure (CDP), and turbine exhaust temperature (TET). The output variable is NOx [28,14,21,22]. We consider 260 pairs of the original input-output data. The performance index is
defined by (16). One hundred and thirty out of the 260 pairs of input-output data are used as the training set; the remaining part serves as the testing set. Using the NOx emission process data, the regression model reads as follows:

y = 163.77341 - 0.06709x_1 + 0.00322x_2 + 0.00235x_3 + 0.26365x_4 + 0.20893x_5.   (18)
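For illustration, evaluating this baseline with the MSE index (16) amounts to the following sketch (data loading assumed; the coefficients are those of (18)):

```python
import numpy as np

coef = np.array([-0.06709, 0.00322, 0.00235, 0.26365, 0.20893])

def regression_baseline(X):
    """X: (n, 5) matrix of [AT, CS, LPTS, CDP, TET] measurements."""
    return 163.77341 + X @ coef

def perf_index(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)        # Eq. (16)

# PI  = perf_index(y_train, regression_baseline(X_train))   # reported: 17.68
# EPI = perf_index(y_test,  regression_baseline(X_test))    # reported: 19.23
```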
This simple model comes with PI = 17.68 and EPI = 19.23. We will use these results as a reference point when discussing the GA-based HFPNN models. Table 3 summarizes the list of parameters used in the genetic optimization of the PN-based and FPN-based layers of HFPNN. In the optimization of each layer, we use 100 generations, a population of 60, a string of 36 bits, a crossover rate of 0.65, and a mutation probability of 0.1. A chromosome used in the genetic optimization consists of three sub-chromosomes: the 1st contains the number of input variables, the 2nd contains the order of the polynomial, and the 3rd contains the input variables. The numbers of bits allocated to the sub-chromosomes are 3, 3, and 30, respectively. The population size selected from the total population (60) is equal to 30. The process is realized as follows: 60 nodes (PNs or FPNs) are generated in each layer of the network; the parameters of all nodes generated in each layer are estimated, and the network is evaluated using both the training and the testing data sets; we then compare these values and choose the 30 nodes (PNs or FPNs) that produce the best (lowest) values of the performance index. The maximal number (Max) of inputs to be selected is confined to two to five (2-5). In the case of the PN-based layer of HFPNN, the order of the polynomial is chosen from three types, Type 1, Type 2, and Type 3 (refer to Table 1), while in the case of the FPN-based layer, the polynomial order of the consequent part of the fuzzy rules is chosen from four types, Type 1 to Type 4, as shown in Table 1. As usual in fuzzy systems, we may exploit a variety of membership functions in the condition part of the rules, which is another factor contributing to the flexibility of the network. Overall, triangular and Gaussian fuzzy sets are of general interest: the first class, of triangular membership functions, provides a very simple implementation, while the second class becomes useful because of the infinite support of its fuzzy sets. As mentioned previously in Table 2, we consider two kinds of input vector formats for the regression polynomial function of the conclusion part of the fuzzy rules in the 1st layer, namely selected inputs (Type T) and entire system inputs (Type T*). Fig. 11 shows an example of the FPN design driven by a specific chromosome; refer to the case with PI = 0.009 and EPI = 0.133 in the 1st layer (FPN-based layer) when using triangular MFs and Max = 4 with Type T* in Table 5(a). In each FPN of the 1st layer, two triangular membership functions are used for each input variable. Here, the number of entire input variables (the entire system input variables) considered in the 1st layer is 5. The polynomial order selected is Type 3. Refer to sub-step 4 of step 4.3 of the introduced design
Table 3
Computational aspects of the genetic optimization of the PN-based and FPN-based layers of HFPNN (identical settings are used for the 1st through 5th layers)

GA
  Maximum generation: 100
  Total population size: 60
  Selected population size (W): 30
  Crossover rate: 0.65
  Mutation rate: 0.1
  String length: 3 + 3 + 30 bits

PN-based layer
  Maximal no. (Max) of inputs to be selected: 1 ≤ l ≤ Max (2-5)
  Polynomial type (Type T): 1 ≤ T ≤ 3

FPN-based layer
  Maximal no. (Max) of inputs to be selected: 1 ≤ l ≤ Max (2-5)
  Polynomial type (Type T) of the consequent part of fuzzy rules: 1 ≤ T ≤ 4
  Consequent input type to be used for Type T: Type T or Type T*
  Membership function (MF) type: triangular or Gaussian
  No. of MFs per input: 2

l, T, and Max are integers; for the polynomial and input-vector types, refer to Tables 1 and 2, respectively.
Fig. 11. The example of the FPN design guided by some chromosome (in case of using triangular MF and Max = 4 with Type T* (entire system input vector format) in the 1st layer).
process. As mentioned previously, the maximal number (Max) of input variables for the selection is confined to 4, and three variables (among them x_4 and x_5) were selected. The parameters of the conclusion part (polynomial) of the fuzzy rules are determined by the standard least-squares method. Fig. 12 shows an example of the PN design driven by a specific chromosome; refer to the case with PI = 0.007 and EPI = 0.060 in the 2nd layer when using triangular MFs and Max = 4, as shown in Table 5(a). In the 2nd or higher layers (PN-based layers), the number of entire input variables equals W = 30, i.e., the number of nodes selected in the preceding layer as its output nodes. The maximal number of input variables for the selection is confined to 4 over the nodes of the 2nd or higher layers of the network. Tables 4 and 5 summarize the results for the two kinds of input vector formats: according to the maximal number of inputs to be selected (Max = 2-5), we report the selected node numbers, the selected polynomial type, and the corresponding performance index (PI and EPI) obtained when the genetic optimization was carried out for each layer. 'Node' denotes the nodes for which the fitness value is maximal in each layer. For
Fig. 12. The example of the PN design guided by some chromosome (in case of using triangular MF and Max = 4 with Type T* in the 2nd layer).
example, in case of Table 4(b), the fitness value in layer 5 is maximal for Max = 5 when nodes 14 and 25 of the previous layer are selected as the node inputs in the present layer. Only two inputs of Type 2 (quadratic) were selected as a result of the genetic optimization. Here, node '0' indicates that it has not been selected by the genetic operation; therefore the width (the number of nodes) of the layer can be lower in comparison to the conventional HFPNN, which contributes immensely to the compactness of the resulting network. In that case, the minimal values of the performance index at the node, PI = 0.056 and EPI = 0.123, are obtained. Figs. 13 and 14 show the values of the performance index vis-à-vis the number of layers of the gHFPNN with respect to the maximal number of inputs to be selected, for the optimal architectures of each layer of the network included in Tables 4 and 5. In Figs. 13 and 14, A(·)-D(·) denote the optimal node numbers at each layer of the network, namely those with the best predictive performance: the node numbers of the 1st layer (FPN-based layer) represent system input numbers, while the node numbers of the 2nd or higher layers (PN-based layers) represent the output node numbers of the preceding layer, i.e., the optimal nodes with the best output performance in the current layer. Figs. 15 and 16 illustrate the detailed optimal topologies of the network with two layers. In Fig. 15, when using the selected input vector format (Type T) and Max equal to 5, the performance of the gHFPNN was quantified by PI = 0.150 and EPI = 0.342 for the triangular MF, and PI = 0.113 and EPI = 0.405 for the Gaussian-like MF,
Table 4
Performance index of the network of each layer versus the increase of the maximal number of inputs to be selected (Type T: selected input vector formats)

Table 5
Performance index of the network of each layer versus the increase of the maximal number of inputs to be selected (Type T*: entire system input vector formats)
Fig. 13. Performance index according to the increase of the number of layers (Type T): (a) triangular MF, (b) Gaussian-like MF. In each panel, A-D mark the maximal number of inputs to be selected (Max = 2, 3, 4, 5) together with the optimal node numbers at each layer.
whereas in Fig. 16, when using the entire system input vector format (Type T*) and Max equal to 4, the performance of the gHFPNN architectures was reported as PI = 0.007 and EPI = 0.060 for the triangular MF, and PI = 0.016 and EPI = 0.091 for the Gaussian-like MF. As shown in Figs. 15 and 16, the genetic design procedure applied at each stage (layer) of the gHFPNN leads to the selection of preferred nodes (FPNs or PNs) with optimal local characteristics (the number of input variables, the order of the polynomial of the consequent part of the fuzzy rules, and a collection of the specific subset of input variables). Therefore, the width (number of nodes) of each layer as well as the depth (number of layers) of the network can be lower in comparison to the conventional network, which contributes immensely to the compactness of the resulting network.
Fig. 14. Performance index according to the increase of the number of layers (Type T*): (a) triangular MF, (b) Gaussian-like MF. In each panel, A-D mark the maximal number of inputs to be selected (Max = 2, 3, 4, 5) together with the optimal node numbers at each layer.
Fig. 17 illustrates the optimization process by visualizing the performance index in successive generations of the genetic optimization for Max = 5 and Gaussian-like MFs with Type T*; it also shows the optimized network architecture over five layers. Table 6 contrasts the performance of the genetically developed network with other fuzzy and neuro-fuzzy models studied in the literature. The experimental results clearly reveal that the proposed approach and the resulting model outperform the existing networks both in terms of better approximation capabilities (lower values of the performance index on the training data, PIs) and superb generalization abilities (expressed by the performance index on the testing data, EPIs). Moreover, the structurally optimized gHFPNN effectively reduces the depth of the network as well as the width of its layers, and avoids a substantial amount of time-consuming iterations otherwise needed for finding the most preferred network in the conventional multi-layer self-organizing neural networks.
Fig. 15. gHFPNN architecture in case of using the selected input vector format (Type T): (a) gHFPNN with two layers for Max = 5 and triangular MF; (b) gHFPNN with two layers for Max = 5 and Gaussian-like MF.
Fig. 16. gHFPNN architecture in case of using the entire system input vector format (Type T*): (a) gHFPNN with two layers for Max = 4 and triangular MF; (b) gHFPNN with two layers for Max = 4 and Gaussian-like MF.
Fig. 17. The optimization process reported in terms of PI and EPI (in case of using Gaussian-like MF, Max = 5 and Type T*): (a) training error (PI), (b) testing error (EPI).
Table 6
Comparison of performance with other modeling methods

Model                                                      PI       PIs      EPIs
Regression model                                                    17.68    19.23
Oh's model [14]        FNN                                 5.835
                       AIM                                 8.420
FNN [21]               Simplified (y = 0.4)                         6.269    8.778
(GAs+complex)          Linear (y = 0.2)                             3.725    5.291
Multi-FNN [22]         Linear (y = 0.75)                            0.720    2.025
Proposed gHFPNN
  Type T               Triangular (Max = 5)   2nd layer            0.150    0.342
                                              5th layer            0.071    0.131
                       Gaussian (Max = 5)     2nd layer            0.113    0.405
                                              5th layer            0.056    0.123
  Type T*              Triangular (Max = 4)   2nd layer            0.007    0.060
                                              5th layer            0.005    0.032
                       Gaussian (Max = 4)     2nd layer            0.016    0.091
                                              5th layer            0.002    0.039
PI_s (EPI_s) is defined as the mean squared error (MSE) between the experimental data and the respective outputs of the model (network).

5.2. pH neutralization process

To demonstrate the high modeling accuracy of the genetically optimized network, we apply it to a highly nonlinear pH neutralization process of a weak acid and a strong base. Such a model can be found in a variety of practical areas including wastewater treatment, biotechnology processing, and chemical processing [6,9,11,23,27]. pH is the measurement of the acidity or alkalinity of a solution containing a proportion of water. The system inputs of the gHFPNN structure consist of the delayed terms of F_b(t) and y_pH(t), the input and output of the process, i.e.,

ŷ_pH(t) = φ(F_b(t-3), F_b(t-2), F_b(t-1), y_pH(t-3), y_pH(t-2), y_pH(t-1)),   (19)

where ŷ_pH and y_pH denote the model output and the actual process output, respectively. Five hundred data pairs are generated, all of which are used for training. To come up with a quantitative evaluation of the network, we use the standard MSE performance index of Eq. (16). The parameters used for the optimization of this process model are almost the same as in the previous experiments, and the GA-based design procedure is carried out in the same manner.
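A sketch of assembling the delayed-term regressors of (19) from the recorded series; the array and helper names are ours:

```python
import numpy as np

def narx_dataset(F_b, y_pH, lags=(3, 2, 1)):
    """Builds X with columns F_b(t-3..t-1), y_pH(t-3..t-1) and target y_pH(t)."""
    t0 = max(lags)
    X = np.column_stack([F_b[t0 - d:len(F_b) - d] for d in lags] +
                        [y_pH[t0 - d:len(y_pH) - d] for d in lags])
    return X, y_pH[t0:]
```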
Tables 7 and 8 summarize the detailed results for Type T and Type T*, respectively, while Figs. 18 and 19 pertain to the values of the performance index. As shown in Tables 7 and 8, the performance of the genetically optimized network obtained with Gaussian-like MFs is almost the same as that obtained with triangular MFs. In the case of triangular MFs (Table 7(a)), the best network in the 1st layer is obtained when using Max = 4 with Type 3 polynomials (quadratic functions) and four nodes at the input (node numbers 3, 4, 5, 6); this network comes with a value of PI equal to 1.4e−4. When using Gaussian-like MFs, the best network in the 1st layer comes with PI = 1.3e−4, reported again for Max = 4 with 2nd-order polynomials and four nodes at the input (node numbers 1, 2, 3, 6). Table 9 covers a comparative analysis including several previous fuzzy and neuro-fuzzy models. Compared with these models, the gHFPNN emerges as the one with a very high accuracy.
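The parametric phase of each selected node reduces to ordinary least squares; as a sketch, a Type 3 (quadratic) polynomial neuron over its selected inputs can be fitted as follows. The helper names are ours, and the full quadratic basis shown is an assumption consistent with 2nd-order polynomials.

    import numpy as np
    from itertools import combinations_with_replacement

    def quadratic_design(X):
        """Basis: constant, linear terms, and all second-order products x_i * x_j."""
        n, d = X.shape
        cols = [np.ones(n)] + [X[:, i] for i in range(d)]
        cols += [X[:, i] * X[:, j]
                 for i, j in combinations_with_replacement(range(d), 2)]
        return np.column_stack(cols)

    def fit_polynomial_neuron(X, y):
        """Least-squares estimate (LSE) of the node's polynomial coefficients."""
        A = quadratic_design(X)
        coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coeffs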
6. Concluding remarks

In this study, we have introduced and investigated a class of HFPNN driven by genetic optimization, regarded as a modeling vehicle for nonlinear and complex systems. The genetically optimized HFPNNs are constructed by combining FNNs with PNNs. In contrast to the conventional HFPNN structures and their learning, the proposed model comes with two types of fuzzy inference-based FNNs (consisting of FPNs) as well as a diversity of local characteristics of PNs that are extremely useful when coping with various nonlinear characteristics of the system under consideration. In other words, through the consecutive generation of layers in the growth process (iteration) of the gHFPNN, the depth (number of layers) and width (number of nodes in each layer) of the network can be flexibly selected based on the local characteristics of the preferred FPNs and PNs (such as the number of input variables, the order of the consequent polynomial of the rules/the polynomial order, and a specific subset of input variables) available within the HFPNN.

The design methodology combines hybrid structural optimization (based on the GMDH method and genetic optimization) and parametric learning, viewed as two fundamental phases of the design process. The GMDH method now comprises a structural phase (a self-organizing and evolutionary algorithm) and the ensuing parametric phase of least square estimation (LSE)-based learning. In the sequel, the resulting network achieves a sound balance between model accuracy and complexity, and reduces the conflict between overfitting and the generalization abilities of the developed model.

The experimental studies involving well-known datasets quantify a superb performance of the network in comparison to the existing fuzzy and neuro-fuzzy models. Most importantly, through the proposed framework of genetic optimization we can efficiently search for the optimal network architecture (a structurally and parametrically optimized network), which becomes crucial in improving the performance of the resulting model.
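As a rough illustration of the structural phase, each candidate node can be encoded by the three structural parameters named above and scored with the performance index; the sketch below (all identifiers are ours, and the decoding shown is only one plausible scheme, not the exact encoding used here) indicates how a chromosome maps onto a node configuration.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_node(n_candidate_inputs, max_inputs):
        """Decode one candidate node: (no. of inputs, polynomial order, input subset)."""
        k = int(rng.integers(2, max_inputs + 1))   # number of inputs, bounded by Max
        order = int(rng.integers(1, 4))            # polynomial Type 1, 2, or 3
        subset = np.sort(rng.choice(n_candidate_inputs, size=k, replace=False))
        return k, order, subset

    # Within the genetic loop, a population of such nodes is evaluated layer by
    # layer (fitness derived from the MSE index), and the best-performing nodes
    # are retained as inputs to the next layer.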
Table 7
Performance index of the network of each layer versus the increase of the maximal number of inputs to be selected (Type T: selected input vector formats)
Table 8
Performance index of the network of each layer versus the increase of the maximal number of inputs to be selected (Type T*: entire system input vector formats)
Fig. 18. Performance index according to the increase of the number of layers (Type T): (a) triangular MF, (b) Gaussian-like MF.
Fig. 19. Performance index according to the increase of the number of layers (Type T*): (a) triangular MF, (b) Gaussian-like MF.
Table 9
Comparison of performance with other modeling methods

Model                                                                Performance index
Nie's model [13]      USOCPN                                         0.230
                      SSOCPN                                         0.012
SOPNN [16]            Basic SOPNN (15th layer)       Case 1          0.0015
                                                     Case 2          0.0052
                      Modified SOPNN (10th layer)    Case 1          0.0039
                                                     Case 2          0.0124
Proposed gHFPNN       Type T (1st layer)     Triangular (Max = 4)    0.000141
                                             Gaussian (Max = 4)      0.000130
                      Type T* (1st layer)    Triangular (Max = 2)    0.000149
                                             Gaussian (Max = 2)      0.000136
Acknowledgements This work was supported by Wonkwang University (2005).
References

[1] V. Cherkassky, D. Gehring, F. Mulier, Comparison of adaptive methods for function estimation from samples, IEEE Trans. Neural Networks 7 (1996) 969–984.
[2] O. Cordon, et al., Ten years of genetic fuzzy systems: current framework and new trends, Fuzzy Sets Syst. 141 (1) (2004) 5–31.
[3] J.A. Dickerson, B. Kosko, Fuzzy function approximation with ellipsoidal rules, IEEE Trans. Syst. Man Cybern. Part B 26 (1996) 542–560.
[4] S.J. Farlow, The GMDH algorithm, in: S.J. Farlow (Ed.), Self-organizing Methods in Modeling: GMDH Type Algorithms, Marcel Dekker, New York, 1984, pp. 1–24.
[5] J.H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
[6] R.C. Hall, D.E. Seborg, Modeling and self-tuning control of a multivariable pH neutralization process, in: Proceedings of the ACC, 1989, pp. 1822–1827.
[7] A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Trans. Syst. Man Cybern. SMC-1 (1971) 364–378.
[8] K.A. De Jong, Are genetic algorithms function optimizers?, in: R. Manner, B. Manderick (Eds.), Parallel Problem Solving from Nature 2, North-Holland, Amsterdam, 1992.
[9] C.L. Karr, E.J. Gentry, Fuzzy control of pH using genetic algorithms, IEEE Trans. Fuzzy Syst. 1 (1993) 46–53.
[10] S. Kleinsteuber, N. Sepehri, A polynomial network modeling approach to a class of large-scale hydraulic systems, Comput. Elect. Eng. 22 (1996) 151–168.
[11] T.J. McAvoy, Time optimal and Ziegler–Nichols control, Ind. Eng. Chem. Process Des. Develop. 11 (1972) 71–78.
[12] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin, Heidelberg, 1996.
[13] J. Nie, A.P. Loh, C.C. Hang, Modeling pH neutralization processes using fuzzy-neural approaches, Fuzzy Sets Syst. 78 (1996) 5–22.
[14] S.-K. Oh, A study on the development of intelligent models and simulator concerning the pattern of an air pollutant emission in a thermal power plant, Technical Report, Korea Energy Management Corporation, Korea, 2003 (in Korean).
[15] S.-K. Oh, W. Pedrycz, Identification of fuzzy systems by means of an auto-tuning algorithm and its application to nonlinear systems, Fuzzy Sets Syst. 115 (2) (2000) 205–230.
[16] S.-K. Oh, W. Pedrycz, The design of self-organizing polynomial neural networks, Inform. Sci. 141 (2002) 237–258.
[17] S.-K. Oh, W. Pedrycz, Fuzzy polynomial neuron-based self-organizing neural networks, Int. J. General Syst. 32 (3) (2003) 237–250.
[18] S.-K. Oh, W. Pedrycz, D.-W. Kim, Hybrid fuzzy polynomial neural networks, Int. J. Uncertainty Fuzziness Knowledge-Based Syst. 10 (3) (2002) 257–280.
[19] S.-K. Oh, W. Pedrycz, T.-C. Ahn, Self-organizing neural networks with fuzzy polynomial neurons, Appl. Soft Comput. 2 (1) (2002) 1–10.
[20] S.-K. Oh, W. Pedrycz, B.-J. Park, Polynomial neural networks architecture: analysis and design, Comput. Electr. Eng. 29 (6) (2003) 703–725.
[21] S.-K. Oh, W. Pedrycz, H.-S. Park, Hybrid identification in fuzzy-neural networks, Fuzzy Sets Syst. 138 (2) (2003) 399–426.
[22] S.-K. Oh, W. Pedrycz, H.-S. Park, Rule-based multi-FNN identification with the aid of evolutionary fuzzy granulation, Knowledge-Based Syst. 17 (1) (2004) 1–13.
[23] G.A. Pajunen, Comparison of linear and nonlinear adaptive control of a pH-process, IEEE Control Syst. Mag. (1987) 39–44.
[24] W. Pedrycz, An identification algorithm in fuzzy relational systems, Fuzzy Sets Syst. 13 (1984) 153–167.
[25] W. Pedrycz, M. Reformat, Evolutionary optimization of fuzzy models, in: V. Dimitrov, V. Korotkich (Eds.), Fuzzy Logic: A Framework for the New Millennium, Studies in Fuzziness and Soft Computing, vol. 8, Physica-Verlag, Wurzburg, 1996, pp. 51–67.
[26] B.-J. Park, W. Pedrycz, S.-K. Oh, Fuzzy polynomial neural networks: hybrid architectures of fuzzy modeling, IEEE Trans. Fuzzy Syst. 10 (5) (2002) 607–621.
[27] F.G. Shinskey, pH and pION Control in Process and Waste Streams, Wiley, New York, 1973.
[28] G. Vachtsevanos, V. Ramani, T.W. Hwang, Prediction of gas turbine NOx emissions using polynomial neural network, Technical Report, Georgia Institute of Technology, Atlanta, 1995.
Witold Pedrycz is a Professor and Canada Research Chair (CRC) in the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada. He is also with the Systems Research Institute of the Polish Academy of Sciences. He is actively pursuing research in Computational Intelligence, fuzzy modeling, knowledge discovery and data mining, fuzzy control including fuzzy controllers, pattern recognition, knowledge-based neural networks, relational computation, bioinformatics, and software engineering. He has published numerous papers in these areas and is the author of eight research monographs covering various aspects of Computational Intelligence and software engineering. Witold Pedrycz has been a member of numerous program committees of conferences in the area of fuzzy sets and neurocomputing. He currently serves as an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics and IEEE Transactions on Fuzzy Systems, and is the Editor-in-Chief of Information Sciences. He is the President-elect of IFSA and the President of NAFIPS.
Sung-Kwun Oh received the BSc, MSc, and PhD degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1981, 1983, and 1993, respectively. From 1983 to 1989, he worked as a Senior Researcher at the R&D Laboratory of Lucky-Goldstar Industrial Systems Co., Ltd. He was a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, University of Manitoba, Canada, from 1996 to 1997. He is currently a Professor in the School of Electrical, Electronic and Information Engineering, Wonkwang University, Korea. His research interests include fuzzy systems, fuzzy-neural networks, automation systems, advanced Computational Intelligence, and intelligent control. He is a member of IEEE. He currently serves as an Associate Editor for the Korean Institute of Electrical Engineers (KIEE) and the Institute of Control, Automation & Systems Engineers (ICASE), Korea.

Ho-Sung Park received his B.S. and M.S. degrees in Control and Instrumentation Engineering from Wonkwang University, Korea, in 1999 and 2001, respectively. He is currently a PhD student at the same university. His research interests include fuzzy and hybrid systems, neurofuzzy models, genetic algorithms, and computational intelligence. He is a member of KIEE and ICASE, Korea.