Advanced Engineering Informatics 18 (2004) 29–39 www.elsevier.com/locate/aei
A new approach to self-organizing multi-layer fuzzy polynomial neural networks based on genetic optimization

Sung-Kwun Oh a, Witold Pedrycz b,c,*

a Department of Electrical, Electronic and Information Engineering, Wonkwang University, 344-2, Shinyong-Dong, Iksan, Chon-Buk 570-749, South Korea
b Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2G6, Canada
c Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Received 22 December 2003; revised 15 May 2004; accepted 15 May 2004

* Corresponding author. Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 2G6, Canada. Tel.: +780-492-3333; fax: +780-492-1811. E-mail address: [email protected] (W. Pedrycz).
Abstract

In this paper, we introduce a new topology of Fuzzy Polynomial Neural Networks (FPNN) that is based on a genetically optimized multi-layer perceptron with fuzzy polynomial neurons (FPNs), and we discuss its comprehensive design methodology involving mechanisms of genetic optimization, especially genetic algorithms (GAs). The development of 'conventional' FPNNs uses an extended Group Method of Data Handling. Such a network exploits a fixed fuzzy inference type in each FPN of the FPNN and considers a fixed number of input nodes at the FPNs (or nodes) located in each layer. In contrast, the proposed FPNN gives rise to a structurally optimized network and comes with a substantial level of flexibility in comparison to the one we encounter in conventional FPNNs. The structural optimization is realized via GAs, whereas for the parametric optimization we proceed with standard least-squares-method-based learning. The performance of the networks is quantified through experimentation involving two time series datasets already used in fuzzy modeling. The results demonstrate their superiority over existing fuzzy and neural models.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Fuzzy polynomial neural networks; Multi-layer perceptron; Fuzzy polynomial neuron; Genetic optimization
1. Introduction

Recently, much attention has been devoted to advanced techniques for modeling complex systems. Ideally, such models should come with significant approximation and generalization abilities as well as a high level of interpretability. Along this line, neural networks, fuzzy sets, and evolutionary computing, regarded as the fundamental technologies of Computational Intelligence (CI), have expanded and enriched the field of modeling quite immensely. They have also given rise to a number of new methodological issues and increased our awareness of the tradeoffs one has to make in system modeling [1-4]. In particular, it is evident that the hybridization of fuzzy sets and neurocomputing is one of the most promising pursuits, combining representation issues with learning abilities. Especially, neural fuzzy systems and genetic fuzzy systems hybridize the mechanisms of approximate reasoning available through fuzzy systems with the learning capabilities of neural networks and evolutionary algorithms [5]. When the dimensionality of the model goes up (say, the number of variables increases), so do the difficulties. Fuzzy sets capture an important aspect of the transparency of the models. They also help to accommodate the design experience of a system designer, whose prior domain knowledge can then be quite effectively articulated and quantified in the language of fuzzy sets. One of the existing hybrid (neurofuzzy) systems comes in the form of fuzzy polynomial neurons (FPNs) and the ensuing networks called self-organizing fuzzy polynomial neural networks (FPNNs) [6]. Although an FPNN comes with a significant level of structural flexibility, it is difficult to design a fully structurally and parametrically optimized network because of the limited design mechanisms of the FPNs themselves. When we construct the FPNs located at each layer of the conventional FPNN, their essential design parameters, such as the number of input variables (nodes), the order of the consequent polynomial of the fuzzy rules, and the collection of input variables available within an FPN, have to be fixed
(selected) in advance by the designer. As a consequence, the FPNN algorithm exhibits a tendency to produce overly complex networks. These design restrictions may also lead to computational overhead caused by the use of trial-and-error methods and/or a repetitive parameter adjustment process carried out by the system designer. Essentially, this is the same drawback that we have already encountered when using the original Group Method of Data Handling (GMDH) algorithm. In this study, to address the design problems identified above that reside with the 'conventional' SOPNN (especially the FPN-based SOPNN called 'FPNN' [6,9]) as well as with the GMDH algorithm itself, we introduce a new genetic design approach; as a consequence, we will be referring to the resulting networks as GA-based FPNNs (called 'gFPNN'). The determination of the optimal values of the parameters of the individual FPNs (viz. the number of input variables, the order of the polynomial, and the collection of input variables) leads to a structurally and parametrically optimized network. We present a detailed design process and then contrast the performance of such networks with other neurofuzzy models reported in the literature.
2. The architecture and development of fuzzy polynomial neural networks

In this section, we elaborate on the architecture and the development process of the FPNN. This network emerges from the genetically optimized multi-layer perceptron architecture based on GAs and the extended GMDH method [18,19].

2.1. FPNN based on fuzzy polynomial neurons

We start with a brief discussion of the FPN. This neuron, regarded as a generic type of neural processing unit, dwells on the concepts of fuzzy sets and neural networks. Fuzzy sets form its linguistic interface by linking the external world (numeric data) with the processing unit. Neurocomputing manifests itself in the form of a local polynomial unit realizing nonlinear processing. The FPN encapsulates a family of nonlinear 'if-then' rules. When put together, FPNs result in a self-organizing FPNN. As visualized in Fig. 1, the FPN consists of two basic functional modules. The first one, labeled F, is a collection of fuzzy sets (here {A_l} and {B_k}) that form an interface between the input numeric variables and the processing part realized by the neuron. Here x_q and x_p denote input variables. The second module (denoted here by P) is concerned with function-based nonlinear (polynomial) processing, which involves some input variables (x_i and x_j). In other words, the FPN realizes a family of multiple-input single-output rules. Each rule, referring again to Fig. 1, reads in the form

if x_p is A_l and x_q is B_k then z is P_lk(x_i, x_j, a_lk)    (1)
Fig. 1. A general topology of the generic FPN module (F, fuzzy set-based processing part; P, the polynomial form of mapping).
where a_lk is a vector of the parameters of the conclusion part of the rule, while P_lk(x_i, x_j, a_lk) denotes the regression polynomial forming the consequent part of the fuzzy rule, which uses several types of high-order polynomials (linear, quadratic, and modified quadratic); for details refer to Table 1. The activation levels of the rules contribute to the output of the FPN, computed as a weighted average of the individual conclusion parts (functional transformations) P_K (note that the index of the rule, namely K, is treated as shorthand notation for the two indexes of fuzzy sets used in rule (1), that is, K = (l, k)):

z = Σ_{K=1}^{all rules} μ_K P_K(x_i, x_j, a_K) / Σ_{K=1}^{all rules} μ_K = Σ_{K=1}^{all rules} μ̃_K P_K(x_i, x_j, a_K)    (2)

In the above expression, we use the abbreviated notation μ̃_K for the normalized activation level of the Kth rule,

μ̃_K = μ_K / Σ_{L=1}^{all rules} μ_L    (3)
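To make (1)-(3) concrete, the following minimal Python sketch computes the output of a single FPN with two inputs, two triangular fuzzy sets per input, and a linear (Type 2) consequent polynomial. The function names, the membership-function parameterization, and the consequent layout are illustrative assumptions, not the authors' implementation.

def tri_mf(x, center, width):
    # Triangular membership function (one of the two MF types considered here).
    return max(1.0 - abs(x - center) / width, 0.0)

def fpn_output(xp, xq, centers_p, centers_q, width, coeffs):
    """Weighted-average FPN output, Eqs. (2)-(3), for a 2 x 2 rule base.

    coeffs[l][k] holds the parameter vector a_lk of the Type 2 consequent
    P_lk(xp, xq) = a0 + a1*xp + a2*xq (an illustrative layout)."""
    num = den = 0.0
    for l, cp in enumerate(centers_p):
        for k, cq in enumerate(centers_q):
            mu = tri_mf(xp, cp, width) * tri_mf(xq, cq, width)  # rule activation
            a0, a1, a2 = coeffs[l][k]
            num += mu * (a0 + a1 * xp + a2 * xq)  # mu_K * P_K(x_i, x_j, a_K)
            den += mu
    return num / den if den > 0.0 else 0.0

# Example: four rules over two fuzzy sets per input.
z = fpn_output(0.3, 0.7,
               centers_p=[0.0, 1.0], centers_q=[0.0, 1.0], width=1.0,
               coeffs=[[(0.1, 0.5, -0.2), (0.0, 0.3, 0.4)],
                       [(0.2, -0.1, 0.6), (0.3, 0.2, 0.1)]])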
The FPNN comes with a highly versatile architecture, both in terms of the flexibility of the individual nodes (which are essentially banks of nonlinear 'if-then' rules) and in terms of the interconnectivity between the nodes and the organization of the layers. The learning of FPNNs is based on the extended GMDH algorithm. When developing a certain FPN, we use a genetic algorithm to construct the optimized network; this is accomplished by selecting such essential structural parameters as the number of input variables, the order of the polynomial, and a specific subset of input variables.

Table 1. Different forms of the regression polynomials standing in the consequent part of the fuzzy rules

Order of the polynomial    No. of inputs: 1    2                 3
0 (Type 1)                 Constant            Constant          Constant
1 (Type 2)                 Linear              Bilinear          Trilinear
2 (Type 3)                 Quadratic           Biquadratic-1     Triquadratic-1
2 (Type 4)                 -                   Biquadratic-2     Triquadratic-2

1, basic type; 2, modified type.
Fig. 2. Overall genetically driven structural optimization process of FPNN.
Based on the number of nodes (input variables) and the polynomial order (refer to Table 1), we build the optimized self-organizing network architectures of the FPNNs. Proceeding with each layer of the FPNN, the design alternatives available within a single FPN can be exercised with regard to the entire collection of input variables or a selected subset of them, as they occur in the consequent part of the fuzzy rules encountered at the first layer. Following these criteria, we distinguish between two fundamental types of structure (called Type T and Type T*):

Type T: the input variables in the conditional part of the fuzzy rules are the same as those occurring in the conclusion part of the fuzzy rules.
Type T*: the entire collection of input variables is kept as input variables in the conclusion part of the fuzzy rules.

2.2. Genetic optimization of FPNN

GAs are stochastic search techniques based on the principles of evolution, natural selection, and genetic recombination, in which an optimal solution is found by simulating a process of 'survival of the fittest' in a population of potential solutions (individuals). GAs are capable of a global exploration of a solution space. They help to pursue potentially fruitful paths while examining random points, with the intent of reducing the likelihood of becoming stuck in local minima. The main features of genetic algorithms concern individuals viewed as strings, population-based optimization (search through the genotype space), and stochastic search
mechanisms (such as selection and crossover). In order to enhance the learning of the FPNN and augment its performance, we use genetic algorithms to carry out the structural optimization of the network by optimally selecting parameters such as the number of input variables (nodes), the order of the polynomial, and the input variables within an FPN. In this study, for the optimization of the FPNN model, the GA uses binary strings processed serially, roulette-wheel selection, one-point crossover, and binary inversion (complementation) as the mutation operator [7]. To retain the best individual and carry it over to the next generation, we use an elitist strategy [8]. The overall genetically driven structural optimization process of the FPNN is visualized in Fig. 2.
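A compact sketch of these operators follows: roulette-wheel selection, one-point crossover with rate 0.65, bit-flip mutation with rate 0.1, and the elitist copy. Chromosomes are plain bit lists, and all helper names are illustrative assumptions rather than the authors' code.

import random

def roulette(pop, fitness):
    # Roulette-wheel selection: pick one individual with probability
    # proportional to its (positive) fitness value.
    total = sum(fitness)
    r, acc = random.uniform(0, total), 0.0
    for ind, f in zip(pop, fitness):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def one_point_crossover(a, b, rate=0.65):
    # One-point crossover applied with the given crossover rate.
    if random.random() < rate:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chrom, rate=0.1):
    # Binary inversion (bit-flip) mutation.
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

def next_generation(pop, fitness):
    # Elitist strategy: the best individual survives unchanged.
    elite = max(zip(pop, fitness), key=lambda t: t[1])[0]
    new_pop = [elite[:]]
    while len(new_pop) < len(pop):
        p1, p2 = roulette(pop, fitness), roulette(pop, fitness)
        c1, c2 = one_point_crossover(p1, p2)
        new_pop.extend([mutate(c1), mutate(c2)])
    return new_pop[:len(pop)]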
3. The algorithms and design procedure of genetically optimized FPNN

We use the PNN as a basic building block in the structure of the FPNN. Each neuron of the network realizes a polynomial type of partial description (PD) of the mapping between input and output variables. The input-output relation formed by the PNN algorithm can then be described in the following manner

y = f(x_1, x_2, ..., x_n)    (4)

The estimated output ŷ of the actual output y reads as

ŷ = f̂(x_1, x_2, ..., x_n) = c_0 + Σ_{k1} c_{k1} x_{k1} + Σ_{k1,k2} c_{k1k2} x_{k1} x_{k2} + Σ_{k1,k2,k3} c_{k1k2k3} x_{k1} x_{k2} x_{k3} + ...    (5)

where the c_k's are the coefficients of the model to be optimized.
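Each PD is linear in its parameters, so its coefficients can be estimated with ordinary least squares. Below is a minimal sketch for a two-input, second-order PD; the particular regressor layout is an illustrative choice corresponding to one row of Table 1, not necessarily the exact basis used in the experiments.

import numpy as np

def pd_regressors(X):
    # Regressor matrix for y ≈ c0 + c1*x1 + c2*x2 + c3*x1^2 + c4*x2^2 + c5*x1*x2.
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def fit_pd(X, y):
    # Least-squares estimate of the PD coefficients c_k.
    coeffs, *_ = np.linalg.lstsq(pd_regressors(X), y, rcond=None)
    return coeffs

def eval_pd(coeffs, X):
    # Evaluate the fitted partial description on new inputs.
    return pd_regressors(X) @ coeffs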
The framework of the design procedure of the FPNN based on the genetically optimized multi-layer perceptron architecture consists of the following steps.

Step 1. Determine the system's input variables.

Step 2. Form training and testing data. The input-output data set (x_i, y_i) = (x_1i, x_2i, ..., x_ni, y_i), i = 1, 2, ..., N (with N being the total number of data points), is divided into two parts, that is, a training and a testing dataset.

Step 3. Decide upon the design parameters that determine the FPNN topology. The essential design parameters include:
(a) Fuzzy inference method and fuzzy identification:
• Fuzzy inference method: the regression polynomial inference method is used.
• Membership function (MF) type: triangular or Gaussian-like MFs.
• Number of MFs for each input of a node (or FPN): two MFs are used.
• Structure of the consequent part of fuzzy rules: based on the consequent polynomial type.
(b) Stopping criterion; two termination methods are exploited:
• A criterion level for comparison of the minimal identification error of the current layer with that occurring at the previous layer of the network.
• The maximum number of layers (predetermined by the designer), with the intent of achieving a sound balance between model accuracy and complexity.
(c) The maximum number of input variables coming to each node in the corresponding layer.
(d) The total number W of nodes to be retained (selected) at the next generation of the FPNN algorithm.
(e) The depth of the FPNN, selected to reduce a possible conflict between overfitting and the generalization abilities of the developed FPNN.
(f) The depth and width of the FPNN, selected as a result of a tradeoff between accuracy and complexity of the overall model.

Step 4. Choose an FPN structure using genetic design. When it comes to the organization of the chromosome representing (mapping) the structure of the FPNN, we divide the chromosome used for genetic optimization into three sub-chromosomes. The first sub-chromosome contains the number of input variables, the second sub-chromosome encodes the order of the polynomial of the node, and the third sub-chromosome (the remaining bits) encodes the input variables coming to the corresponding node (FPN). All these elements are optimized when running the GA. In the nodes (FPNs) of each layer of the FPNN, we adhere to the notation visualized in Fig. 3.
Fig. 3. Formation of each FPN in FPNN architecture.
'FPNn' denotes the nth FPN (node) of the corresponding layer, N denotes the number of nodes (inputs or FPNs) coming to the corresponding node, and T denotes the polynomial order of the conclusion part of the fuzzy rules used in the corresponding node. Each sub-step of the genetic design of the three types of parameters available within the FPN is structured as follows.

Step 4-1. Selection of the number of input variables (first sub-chromosome).
Sub-step (1). The first three bits of the given chromosome are assigned to the selection of the number of input variables.
Sub-step (2). These three bits are decoded into their decimal equivalent

b = (2^2 × bit(3)) + (2^1 × bit(2)) + (2^0 × bit(1))    (6)

where bit(1), bit(2), and bit(3) denote the locations of the three bits, each taking the value '0' or '1'.
Sub-step (3). The decimal value b is normalized and rounded off

γ = (b/α) × (Max − 1) + 1    (7)

where Max denotes the maximal number of input variables entering the corresponding node (FPN), while α is the decoded decimal value corresponding to the situation when all bits of the first sub-chromosome are set to 1.
Sub-step (4). The normalized integer value is then treated as the number of input variables (or input nodes) coming to the corresponding node.
Step 4-2. Selection of the polynomial order of the consequent part of fuzzy rules (second sub-chromosome).
Sub-step (1). The three bits of the second sub-chromosome are assigned to the selection of the order of the polynomial.
Sub-step (2). The three bits are decoded into a decimal format using (6).
Sub-step (3). The decimal value b is normalized and rounded off by means of (7), with Max replaced by 4. The mapping is straightforward:
(a) A normalized integer value of 1 means the polynomial order of the consequent part of the fuzzy rules is 0 (constant: Type 1).
(b) A normalized integer value of 2 means the polynomial order is 1 (linear function: Type 2).
(c) A normalized integer value of 3 means the polynomial order is 2 (quadratic function: Type 3).
(d) A normalized integer value of 4 means the polynomial order is 2 (modified quadratic function: Type 4).
Sub-step (4). The normalized integer value is taken as the selected polynomial order when constructing each node of the corresponding layer.

Step 4-3. Selection of input variables (third sub-chromosome).
Sub-step (1). The remaining bits are assigned to the selection of input variables.
Sub-step (2). The remaining bits are divided into as many groups as the number of input variables obtained in Step 4-1.
Sub-step (3). Each bit group is decoded into decimal through relationship (6).
Sub-step (4). Each decimal value obtained in Sub-step (3) is then normalized following (7).
Sub-step (5). The normalized integer values are taken as the selected input variables when constructing each node of the corresponding layer. If some selected input variables are duplicated, the duplicated input variables (viz. the same input numbers) are treated as a single input variable and the remaining copies are discarded.
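Putting Steps 4-1 to 4-3 together, a minimal decoding sketch could look as follows. The bit ordering (bit(1) as the least significant bit) follows (6); all function names and the chromosome representation (a flat list of bits) are illustrative assumptions.

def decode_bits(bits):
    # Relationship (6): binary-to-decimal, with bit(1) as the least significant bit.
    return sum(b << i for i, b in enumerate(bits))

def normalize(b, n_bits, max_val):
    # Relationship (7): map the decimal value b onto 1..max_val and round off.
    alpha = (1 << n_bits) - 1          # all bits of the sub-chromosome set to 1
    return int(round((b / alpha) * (max_val - 1) + 1))

def decode_fpn(chrom, max_inputs, n_candidates):
    """Split a chromosome into the three sub-chromosomes of Step 4."""
    n_in = normalize(decode_bits(chrom[0:3]), 3, max_inputs)   # Step 4-1
    order = normalize(decode_bits(chrom[3:6]), 3, 4)           # Step 4-2 (Max = 4)
    rest = chrom[6:]
    per_var = len(rest) // n_in        # Step 4-3, sub-step (2); leftover bits unused
    chosen = []
    for i in range(n_in):
        bits = rest[i * per_var:(i + 1) * per_var]
        chosen.append(normalize(decode_bits(bits), per_var, n_candidates))
    # Duplicated selections collapse to a single input variable (sub-step (5)).
    return n_in, order, sorted(set(chosen))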
Step 5. Carry out fuzzy inference and estimate the coefficient parameters for fuzzy identification in the selected node (FPN). The regression polynomials (a polynomial or, in the specific case, a constant value) occurring in the conclusion part of the fuzzy rules come in different types (Type 1, 2, 3, or 4), see Table 1. In each fuzzy inference, we consider two types of membership functions, namely triangular and Gaussian-like membership functions. The consequent part can be expressed by a linear, quadratic, or modified quadratic polynomial equation, as shown in Table 1. The use of the regression polynomial inference method gives rise to the expression

R^i: If x_1 is A_{i1}, ..., x_k is A_{ik} then y_i = f_i(x_1, x_2, ..., x_k)    (8)

The numeric output ŷ is determined in the same way as in the previous approach, namely

ŷ = Σ_{i=1}^{n} μ_i f_i(x_1, x_2, ..., x_k) / Σ_{i=1}^{n} μ_i = Σ_{i=1}^{n} μ̂_i f_i(x_1, x_2, ..., x_k)    (9)

where f_i(·) is a regression polynomial function of the input variables; here we consider a linear regression polynomial function of the input variables. The consequent parameters are produced by the standard least squares method.

Step 6. Select nodes (FPNs) with the best predictive capability and construct their corresponding layer. The generation process can be organized as the following sequence of steps.
Sub-step (1). We set up the initial genetic information necessary for generation of the FPNN architecture. This concerns the number of generations, the population size, the mutation rate, the crossover rate, and the length of the chromosome.
Sub-step (2). The nodes (FPNs) are generated through the genetic design.
Sub-step (3). The performance of the FPNs (nodes) constructed using the training dataset is evaluated on the testing dataset. Based on this performance index, we calculate the fitness function, which reads as

F (fitness function) = 1 / (1 + EPI)    (10)

where EPI denotes the performance index for the testing data (or validation data). In this case, the model is built with the training data, and EPI is obtained from the testing data (or validation data) of the FPNN model constructed with the training data.
Sub-step (4). To move on to the next generation, we carry out selection, crossover, and mutation operations using the initial genetic information and the fitness values obtained in Sub-step (3).
Sub-step (5). The nodes (FPNs) are rearranged in descending order on the basis of their calculated fitness values (F_1, F_2, ..., F_z). We unify nodes with duplicated fitness values (viz. when one node has the same fitness value as other nodes) among the rearranged nodes. We then choose several FPNs characterized by the best fitness values. Here, we use the predefined number W of FPNs with better predictive capability that need to be preserved to assure optimal operation at the next iteration of the FPNN algorithm. The outputs of the retained nodes (FPNs) serve as inputs to the next layer of the network. There are two cases as to the number of retained FPNs:
(i) If W* < W, then the number of nodes (FPNs) retained for the next layer is equal to z. Here, W* denotes the number of nodes retained in each layer after the nodes with duplicated fitness values have been removed.
(ii) If W* ≥ W, then the number of nodes (FPNs) retained for the next layer is equal to W.
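A minimal sketch of the fitness evaluation (10) and of the retention of the W best nodes in Sub-step (5) is given below. The unification of duplicated fitness values is simplified to skipping exact repeats, and the node representation is left abstract; both are illustrative assumptions.

def fitness(epi):
    # Fitness function (10): larger is better, driven by the testing error EPI.
    return 1.0 / (1.0 + epi)

def retain_best(nodes, testing_errors, W):
    """Rank candidate FPNs by fitness and keep at most W of them (Sub-step (5))."""
    scored = sorted(((fitness(e), n) for n, e in zip(nodes, testing_errors)),
                    key=lambda t: t[0], reverse=True)
    kept, seen = [], set()
    for f, node in scored:
        if f in seen:            # unify nodes with duplicated fitness values
            continue
        seen.add(f)
        kept.append(node)
        if len(kept) == W:
            break
    return kept                  # their outputs feed the next layer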
Sub-step (6). For the elitist strategy, we select the node that has the highest fitness value among the selected nodes (W).
Sub-step (7). We generate new populations of the next generation using the GA operators of Sub-step (4), together with the elitist strategy. This sub-step is carried out by repeating Sub-steps (2)-(6). In particular, in Sub-step (5), we replace the node that has the lowest fitness value in the current generation with the node that reached the highest fitness value in the previous generation, as obtained in Sub-step (6).
Sub-step (8). We combine the nodes (W populations) obtained in the previous generation with the nodes (W populations) obtained in the current generation. In the sequel, the W nodes with the higher fitness values among them (2W) are selected. That is, this sub-step is carried out by repeating Sub-step (5).
Sub-step (9). This process (Sub-steps (7)-(8)) is repeated until the last generation. The iterative process generates the optimal nodes of the given layer of the FPNN.

Step 7. Check the termination criterion. The termination condition that controls the growth of the model consists of two components, that is, the performance index and the size of the network (expressed in terms of the maximal number of layers). As far as the performance index is concerned (which reflects the numeric accuracy of the layers), the termination condition is straightforward and comes in the form

F_l ≤ F_p    (11)

where F_l denotes the maximal fitness value occurring at the current layer, whereas F_p stands for the maximal fitness value that occurred at the previous layer. As far as the depth of the network is concerned, the generation process is stopped at a depth of no more than five layers. This size of the network has been found experimentally to build a sound compromise between the high accuracy of the resulting model and its complexity as well as its generalization abilities. In this study, we use two measures (performance indexes), namely the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE):

(i) Mean squared error

E (PI_s or EPI_s) = (1/N) Σ_{p=1}^{N} (y_p − ŷ_p)^2    (12)

(ii) Root mean squared error

E (PI_s or EPI_s) = sqrt[(1/N) Σ_{p=1}^{N} (y_p − ŷ_p)^2]    (13)

where y_p is the pth target output datum and ŷ_p stands for the pth actual output of the model for this specific data point, N is the number of training (PI_s) or testing (EPI_s) input-output data pairs, and E is the overall (global) performance index defined as the sum of the individual errors.

Step 8. Determine new input variables for the next layer. If (11) has not been met, the model is expanded. The outputs of the preserved nodes (z_1i, z_2i, ..., z_Wi) serve as new inputs to the next layer (x_1j, x_2j, ..., x_Wj), j = i + 1. This is captured by the expression

x_1j = z_1i, x_2j = z_2i, ..., x_Wj = z_Wi    (14)
The overall FPNN algorithm is carried out by repeating Steps 4-8.
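Schematically, Steps 4-8 compose into the layer-growing loop sketched below. Here design_layer is a hypothetical stand-in for the genetic design of Steps 4-6; the sketch only shows how the termination criterion (11) and the input hand-over (14) interact, under the assumption that layer outputs are returned as numpy arrays of shape (N, W).

import numpy as np

def rmse(y, y_hat):
    # Root mean squared error, the performance index of (13).
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def grow_gfpnn(design_layer, X_tr, y_tr, X_te, y_te, max_layers=5):
    """Layer-by-layer growth of the network (Steps 4-8).

    design_layer(Z_tr, y_tr, Z_te, y_te) is assumed to run the genetic design
    of one layer and return the retained FPNs together with their outputs on
    the training and testing data."""
    layers = []
    Z_tr, Z_te = X_tr, X_te
    F_prev = -np.inf
    for _ in range(max_layers):
        fpns, Z_tr_new, Z_te_new = design_layer(Z_tr, y_tr, Z_te, y_te)
        # Maximal fitness (10) at this layer, judged on the testing data.
        F_curr = max(1.0 / (1.0 + rmse(y_te, z)) for z in Z_te_new.T)
        if F_curr <= F_prev:          # termination criterion (11)
            break
        layers.append(fpns)
        F_prev = F_curr
        # Outputs of the preserved nodes become the next layer's inputs (14).
        Z_tr, Z_te = Z_tr_new, Z_te_new
    return layers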
4. Experimental studies

The performance of the GA-based FPNN is illustrated with the aid of two well-known and widely used datasets: the chaotic Mackey-Glass time series [10-17] and the NOx emission process of gas turbine power plants [20-23]. First, we demonstrate how the gFPNN can be utilized to predict future values of the chaotic Mackey-Glass time series. The performance of the network is also contrasted with some other models existing in the literature [10-16]. The time series is generated by the chaotic Mackey-Glass differential delay equation, which comes in the form

ẋ(t) = 0.2x(t − τ) / (1 + x^10(t − τ)) − 0.1x(t)    (15)
The prediction of future values of this series is a benchmark problem that has been used and reported by a number of researchers. To obtain the time series value at each time instant, we applied the fourth-order Runge-Kutta method to find a numerical solution to (15). From the series x(t), we then extracted 1000 input-output data tuples organized in the following format: [x(t − 18), x(t − 12), x(t − 6), x(t); x(t + 6)], where t = 118 to 1117. The first 500 pairs were used as the training data set, while the remaining 500 pairs formed the testing data set. To come up with a quantitative evaluation of the network, we use the standard RMSE performance index (13). Table 2 summarizes the list of parameters used in the genetic optimization of the network. In the optimization of each layer, we used 100 generations, a population of 60, a string of 36 bits, a crossover rate equal to 0.65, and a mutation probability of 0.1; such values are commonly used in genetic optimization. A chromosome used in the genetic optimization consists of a string comprising three sub-chromosomes: the first contains the number of input variables, the second the order of the polynomial, and the third the input variables. The numbers of bits allocated to the sub-chromosomes are 3, 3, and 30, respectively. The population size of the individuals selected from the total population (60) is equal to 30. This process is realized as follows. Sixty nodes (FPNs) are generated in each layer of the network. The parameters of all nodes generated in each layer are estimated, and the network is evaluated using both the training and testing data sets. Then we compare these values and choose the 30 FPNs that produce the best (lowest) values of the performance index. The maximal number (Max) of inputs to be selected is confined to the range of two to five (2-5) entries.
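For reproducibility, a minimal sketch of this data preparation is given below. The delay τ = 17 and the initial condition x(0) = 1.2 are the customary benchmark settings; they are assumed here because the text does not restate them, and the interpolation of the delayed term inside each Runge-Kutta step is a common simplification rather than necessarily the authors' exact scheme.

import numpy as np

def mackey_glass(n=1200, tau=17, h=0.1, x0=1.2):
    """Integrate (15) with the fourth-order Runge-Kutta method on a grid of
    step h; the delayed term is linearly interpolated between stored values."""
    spu = round(1.0 / h)                  # grid steps per unit of time
    d = round(tau / h)                    # delay expressed in grid steps
    x = np.empty(n * spu + 1)
    x[0] = x0

    def f(xt, x_del):
        return 0.2 * x_del / (1.0 + x_del ** 10) - 0.1 * xt

    for k in range(n * spu):
        xd0 = x[k - d] if k >= d else 0.0            # x(t - tau); zero history
        xd1 = x[k + 1 - d] if k + 1 >= d else 0.0    # x(t + h - tau)
        xdm = 0.5 * (xd0 + xd1)                      # x(t + h/2 - tau), interpolated
        k1 = f(x[k], xd0)
        k2 = f(x[k] + 0.5 * h * k1, xdm)
        k3 = f(x[k] + 0.5 * h * k2, xdm)
        k4 = f(x[k] + h * k3, xd1)
        x[k + 1] = x[k] + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x[::spu]                       # values of x at integer time instants

# 1000 tuples [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6), t = 118, ..., 1117;
# the first 500 pairs train the network, the remaining 500 test it.
series = mackey_glass()
t = np.arange(118, 1118)
X = np.stack([series[t - 18], series[t - 12], series[t - 6], series[t]], axis=1)
y = series[t + 6]
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]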
Table 2. Summary of the parameters of the genetic optimization

       Parameters                                          First layer          Second to fifth layer
GA     Maximum generation                                  100                  100
       Total population size                               60                   60
       Selected population size (W)                        30                   30
       Crossover rate                                      0.65                 0.65
       Mutation rate                                       0.1                  0.1
       String length                                       3 + 3 + 30           3 + 3 + 30
FPNN   Maximal no. (Max) of inputs to be selected          1 ≤ l ≤ Max (2-5)    1 ≤ l ≤ Max (2-5)
       Polynomial type (Type T) of the consequent
       part of fuzzy rules                                 1 ≤ T ≤ 4            1 ≤ T ≤ 4
       Consequent input type to be used for Type T (*)     Type T*              Type T
       Membership function (MF) type                       Triangular or        Triangular or
                                                           Gaussian             Gaussian
       No. of MFs per input                                2                    2

l, T, Max: integers. T* means that the entire set of system inputs is used for the polynomial in the conclusion part of the rules.
The polynomial order of the consequent part of the fuzzy rules is chosen from the four types, that is, Types 1-4 (as shown in Table 1). Table 3 summarizes the performance of the first and the second layer of the network when changing the maximal number of selected inputs; here Max assumes values between 2 and 5. Fig. 4 visualizes the performance index of each layer of the gFPNN as the maximal number of inputs to be selected increases. In Fig. 4, A(·)-D(·) denote the optimal node numbers at each layer of the network, namely those with the best predictive performance. Fig. 5(a) illustrates the detailed optimal topology of the gFPNN for the first layer when using Max = 4 and triangular MFs; the results for this network are PI = 0.0017 and EPI = 0.0016. Fig. 5(b) illustrates the detailed optimal topology of the gFPNN for one layer when using Max = 3 and Gaussian-like MFs. This network comes with values of PI and EPI equal to 0.0014 and 0.0013, respectively. As shown in Fig. 5, the proposed approach yields a structurally simpler network than the conventional FPNN. Fig. 6 illustrates the optimization process by visualizing the values of the performance index obtained in successive generations of the GA. It shows the optimized architecture when using Gaussian-like MFs (the maximal number (Max) of inputs to be selected is set to 4, with the structure composed of five layers). As shown in Fig. 6, the variation ratio (slope) of the performance of the network remains almost unchanged from the third layer through the fifth. Therefore, in this case, the stopping criterion can be applied at no more than two layers in order to effectively reduce the number of nodes as well as the number of layers of the network. Table 4 provides a comparative summary of the network. The experimental results clearly reveal that it outperforms the existing models, both in terms of better approximation capabilities (lower values of the performance index on the training data, PI_s) and in terms of superb generalization abilities (expressed by the performance index on the testing data, EPI_s). PI_s (EPI_s) is defined as the RMSE computed for the experimental data and the respective outputs of the network.

In the next example, we are concerned with a fuzzy model of an NOx emission process, which is usually represented with the aid of data from gas turbine power plants. To date, most NOx emission process models have been 'standard' mathematical models. However, such models do not fully capture the relationships between the variables of the NOx emission process and the parameters of its model in an effective manner. For this numeric experiment, we considered the NOx emission process of a GE gas turbine power plant located in Virginia, USA. The input variables include Ambient Temperature at site (AT), Compressor Speed (CS), Low Pressure Turbine Speed (LPTS), Compressor Discharge Pressure (CDP), and Turbine Exhaust Temperature (TET).
Table 3. Performance index of the network of each layer versus the increase of the maximal number of inputs to be selected.
Fig. 4. Performance index of gFPNN with respect to the increase of the number of layers: (a) Triangular MF: (a-1) Training error; (a-2) Testing error. (b) Gaussian-like MF: (b-1) Training error; (b-2) Testing error.
Fig. 5. Genetically optimized FPNN (gFPNN) architecture: (a) Triangular MF (one layer); (b) Gaussian-like MF.
Fig. 6. The optimization process of each performance index by the genetic algorithms: (a) Training error; (b) Testing error.
The output variable is the concentration of NOx [20-23]. The performance index is defined in terms of (12). Using the emission process data, the regression equation (model) reads as follows

y = −163.77341 − 0.06709x_1 + 0.00322x_2 + 0.00235x_3 + 0.26365x_4 + 0.20893x_5    (16)

This simple model comes with PI = 17.68 and EPI = 19.23. We will use these results as a reference point when discussing the genetically optimized FPNN models. The parameters used for the optimization of this process model are almost the same as those used in the previous
experiments. The GA-based design procedure is carried out in the same manner as in the previous experiments. Fig. 7 depicts the values of the performance index vis-à-vis the number of layers of the gFPNN, taken with respect to the maximal number of inputs to be selected, for the optimal architectures of each layer of the network when using Gaussian-like MFs. Considering the training and testing data sets, the best results were obtained for a network whose second layer used triangular MFs with Max set to five. This happens for the structure of Type 1 (constant) with four input nodes (i.e. nodes 2, 3, 6, and 16). Here we obtain PI = 0.006 and EPI = 0.044.
Table 4. Comparative analysis of the performance of the network; considered are models reported in the literature

Model                                            PI        PIs       EPIs      NDEI(a)
Wang's model [10]                                0.044
                                                 0.013
                                                 0.010
Cascaded-correlation NN [11]                                                   0.06
Backpropagation MLP [11]                                                       0.02
Sixth-order polynomial [11]                                                    0.04
ANFIS [12]                                                 0.0016    0.0015    0.007
FNN model [13]                                             0.014     0.009
Recurrent neural network [14]                    0.0138
SONN(b) [15]
  Basic (fifth layer), Case 1                              0.0011    0.0011    0.005
  Basic (fifth layer), Case 2                              0.0027    0.0028    0.011
  Modified (fifth layer), Case 1                           0.0012    0.0011    0.005
  Modified (fifth layer), Case 2                           0.0038    0.0038    0.016
Proposed gFPNN
  Triangular, second layer (Max = 4)             0.0014    0.0015    0.0014    0.0061
  Triangular, fifth layer (Max = 4)              0.0010    0.0011    0.0011    0.0045
  Triangular, second layer (Max = 5)             0.0011    0.0014    0.0013    0.0051
  Triangular, fifth layer (Max = 5)              0.0007    0.0009    0.0010    0.0031
  Gaussian, first layer (Max = 3)                0.0014    0.0014    0.0013    0.0064
  Gaussian, first layer (Max = 4)                0.0005    0.0005    0.0006    0.0023
  Gaussian, first layer (Max = 5)                0.0005    0.0005    0.0006    0.0023

(a) The non-dimensional error index (NDEI), as used in [11], is defined as the root mean square error divided by the standard deviation of the target series.
(b) Conventional optimized FPNN.
Fig. 7. Performance index according to the increase of number of layers (in case of using Gaussian-like MF): (a) training error; (b) testing error.
Fig. 8. The optimization process of each performance index by the GAs (in case of Max ¼ 5 and Gaussian-like MF): (a) Training error; (b) Testing error.
The best results for the network with five layers, PI = 0.002 and EPI = 0.023, were reported when using triangular MFs and Max equal to five, with Type 1 and four input nodes (nodes 4, 15, 23, and 29). In addition, the most preferred results for the network when using Gaussian-like MFs and Max = 5 with Type 2 come with PI = 0.012 and EPI = 0.094 in the second layer (the corresponding nodes are 9 and 18) and with PI = 0.006 and EPI = 0.035 in the fifth layer (nodes 5 and 7). Fig. 8 illustrates the optimization process by visualizing the performance index obtained in successive generations of the gFPNN. As shown there, the variation ratio (slope) of the performance of the gFPNN model changes very little from the third layer through the fifth. Therefore, to effectively reduce the number of nodes and avoid a large amount of time-consuming iteration over gFPNN layers, the stopping criterion can be applied at no more than the second layer. Table 5 contrasts the performance of the gFPNN network with previous models studied in the literature. Again, we note a significant improvement provided by the genetically optimized network.
Table 5. Comparison of performance with other modeling methods

Model                                                        PI       PIs      EPIs
Regression model                                             17.68             19.23
Oh's model [21]            FNN                               5.835
                           AIM                               8.420
FNN [22] (GAs + complex)   Simplified (u = 0.4)                       6.269    8.778
                           Linear (u = 0.2)                           3.725    5.291
Multi-FNN [23]             Linear (u = 0.75)                          0.720    2.025
Our model                  Triangular (second layer, Max = 5)         0.006    0.044
                           Gaussian (second layer, Max = 5)           0.012    0.094
5. Concluding remarks

In this study, a comprehensive GA-based design procedure for the FPNN has been introduced. The GA-based design procedure applied at each stage (layer) of the FPNN leads to the selection of a subset of preferred nodes (FPNs) with specific local characteristics (such as the number of input variables, the order of the consequent polynomial of the fuzzy rules, and the input variables themselves). The detailed development process contributes to the flexibility of the architecture of the network. The design methodology comes as a hybrid of structural optimization and parametric learning, viewed as the two fundamental phases of the design process. The GMDH method now comprises two intuitively appealing phases: the structural development, supported by a self-organizing and evolutionary algorithm, and the ensuing parametric phase of Least Square Estimation (LSE)-based learning. The experimental studies involving well-known datasets demonstrate the superb performance of the network in comparison with a number of existing fuzzy and neurofuzzy models. Most importantly, through the framework of genetic optimization we can efficiently search for the optimal network architecture (a structurally and parametrically optimized network), and this helps improve the performance of the model.
Acknowledgements

This work has been supported by KESRI (R-2003-B-274), which is funded by MOCIE (Ministry of Commerce, Industry, and Energy).
References

[1] Cherkassky V, Gehring D, Mulier F. Comparison of adaptive methods for function estimation from samples. IEEE Trans Neural Netw 1996;7:969-84.
[2] Dickerson JA, Kosko B. Fuzzy function approximation with ellipsoidal rules. IEEE Trans Syst Man Cybern Part B 1996;26:542-60.
[3] Sommer V, Tobias P, Kohl D, Sundgren H, Lundstrom L. Neural networks and abductive networks for chemical sensor signals: a case comparison. Sensors Actuators 1995;28:217-22.
[4] Kleinsteuber S, Sepehri N. A polynomial network modeling approach to a class of large-scale hydraulic systems. Comput Electr Eng 1996;22:151-68.
[5] Cordon O, Gomide F, Herrera F, Hoffmann F, Magdalena L. Ten years of genetic fuzzy systems: current framework and new trends. Fuzzy Sets Syst 2004;141(1):5-31.
[6] Oh S-K, Pedrycz W. Self-organizing polynomial neural networks based on PNs or FPNs: analysis and design. Fuzzy Sets Syst 2004;142(2):163-98.
[7] Michalewicz Z. Genetic algorithms + data structures = evolution programs. Berlin: Springer; 1996.
[8] De Jong KA. Are genetic algorithms function optimizers? In: Manner R, Manderick B, editors. Parallel problem solving from nature 2. Amsterdam: North-Holland; 1992.
[9] Oh S-K, Pedrycz W. Fuzzy polynomial neuron-based self-organizing neural networks. Int J Gen Syst 2003;32(3):237-50.
[10] Wang LX, Mendel JM. Generating fuzzy rules from numerical data with applications. IEEE Trans Syst Man Cybern 1992;22(6):1414-27.
[11] Crowder RS III. Predicting the Mackey-Glass time series with cascade-correlation learning. In: Touretzky D, Hinton G, Sejnowski T, editors. Proceedings of the 1990 Connectionist Models Summer School. Carnegie Mellon University; 1990. p. 117-23.
[12] Jang JSR. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 1993;23(3):665-85.
[13] Maguire LP, Roche B, McGinnity TM, McDaid LJ. Predicting a chaotic time series using a fuzzy neural network. Inf Sci 1998;112:125-36.
[14] James Li C, Huang T-Y. Automatic structure and parameter training methods for modeling of mechanical systems by recurrent neural networks. Appl Math Model 1999;23:933-44.
[15] Oh S-K, Pedrycz W, Ahn T-C. Self-organizing neural networks with fuzzy polynomial neurons. Appl Soft Comput 2002;2(1):1-10.
[16] Lapedes AS, Farber R. Non-linear signal processing using neural networks: prediction and system modeling. Technical report LA-UR-87-2662. Los Alamos, NM: Los Alamos National Laboratory; 1987.
[17] Mackey MC, Glass L. Oscillation and chaos in physiological control systems. Science 1977;197:287-9.
[18] Oh S-K, Pedrycz W. The design of self-organizing polynomial neural networks. Inf Sci 2002;141:237-58.
[19] Oh S-K, Pedrycz W, Park B-J. Polynomial neural networks architecture: analysis and design. Comput Electr Eng 2003;29(6):703-25.
[20] Vachtsevanos G, Ramani V, Hwang TW. Prediction of gas turbine NOx emissions using polynomial neural network. Technical report. Atlanta: Georgia Institute of Technology; 1995.
[21] Oh S-K, Ahn T-C. A thesis of emission pattern model about the atmosphere pollution material of a power plant. Technical report. Electrical Engineering and Science Research Institute; 1997 [in Korean].
[22] Oh S-K, Pedrycz W, Park H-S. Hybrid identification in fuzzy-neural networks. Fuzzy Sets Syst 2003;138(2):399-426.
[23] Oh S-K, Pedrycz W, Park H-S. Rule-based multi-FNN identification with the aid of evolutionary fuzzy granulation. J Knowl-Based Syst 2004;17(1):1-13.