A new algorithm for structure optimization in fuzzy neural networks

B. Pizzileo, K. Li, G. W. Irwin

School of Electronics, Electrical Engineering and Computer Science,

Queen’s University of Belfast, UK

(e-mail: {bpizzileo01, k.li, g.irwin}@qub.ac.uk).

Abstract: Reduced complexity in a fuzzy neural network eases the computational burden of construction and training from data, while enhancing the interpretability of the final model. Such structure optimization can be achieved by adjusting either the number of inputs or the size of the rule set, and in the literature these have generally been addressed independently (Sugeno and Yasukawa [1993], Hong and Harris [2003]). This paper presents a new algorithm in which both structural parameters of a fuzzy neural network model are optimized together. Results from simulation examples are given to illustrate the new approach and confirm its advantage over existing methods.

1. INTRODUCTION

Fuzzy inference systems (FIS) are one of the best known applications of fuzzy logic and fuzzy set theory (Zadeh [1965]). They can replace differential equations when the system is uncertain or ill-defined, since they do not require any precise quantitative analysis. There are two possible FIS model designs, semantic (Koczy and Hirota [1991]) and data-driven (Takagi and Sugeno [1985]); both require a rule induction and a structure optimization process. The semantic-driven design follows the original idea of Zadeh (Zadeh [1973]), where the rules are built from expert knowledge, whilst the data-driven design builds the rules using some procedure which fits the data.

Data-driven model design became popular when fuzzy models were found to be universal approximators (Wang [1992]), i.e. they were proven to be able to approximate any continuous function with arbitrary accuracy. Unfortunately this requires an unbounded number of rules (Klement et al. [1999]). Therefore, if a huge number of rules is initially generated, two issues arise: loss of interpretability and high complexity in the final model. The latter makes it infeasible to compare the fuzzy and the classical modelling approaches in terms of speed, since the number of parameters involved in fuzzy model construction becomes extremely high. This is the so-called "curse of dimensionality" (Bellman [1957]), and it raises the need for structure optimization as part of FIS model design.

In data-driven model designs, a reduction in complexity can generally be achieved either by input selection or by rule selection. Input selection can be performed in either a global or a local manner: in the first case a variable is removed entirely, so that it is used by none of the rules; in the second case, input selection is done at the rule level, leading to incomplete rules. Input selection can be achieved using the regularity criterion (Sugeno and Yasukawa [1993]), the geometric criterion (Emami et al. [1998]), individual discrimination power (Hong and Chen [1999]), the entropy variation index

(Pal [1999]) or by extending Gram-Schmidt orthogonal least squares (Hong and Harris [2001]). Two approaches have mainly been used for rule selection: orthogonal least-squares-based techniques (Yen and Wang [1999], Hong and Harris [2003]) and singular-value-decomposition-based methods (Yam et al. [1999], Baranyi et al. [2003]).

As stated earlier, rule induction is also needed for FIS model design. Fuzzy neural networks (FNNs) provide one such method, having the numerical accuracy of neural networks while avoiding their black-box nature (Mitra and Hayashi [2000]). Fuzzy neural networks (Jang et al. [1997]) represent a large class that combines the advantages of associative memory networks (e.g. B-splines, radial basis functions and support vector machines, whose linear parameters can be trained online with good convergence and stability properties) with improved transparency, a critical issue for nonlinear dynamic modelling applications. The basis functions of fuzzy neural networks are in fact associated with linguistic rules, and every numerical result can therefore admit a linguistic interpretation.

For fuzzy neural networks, finding methods to reduce the complexity while still maintaining good modelling accuracy has been the subject of much research. In an earlier paper, Pizzileo and Li [2007] proposed an algorithm for forward rule selection which chose correlated rules one at a time. This was subsequently improved in Pizzileo et al. [2009] by adding a refinement procedure and by integrating the rule selection with a backward method (Li et al. [2006]) for global input selection, performed prior to the rule selection; in that case the two phases were treated as independent. This paper presents a new algorithm in which the two alternatives for FNN structure optimization, i.e. local input selection and rule selection, are considered strictly related and are therefore performed together.

The paper is organized as follows. In Section 2 some preliminaries about fuzzy neural network construction are introduced. Section 3 explains the problem of input selection, and two of the best known approaches are briefly

introduced. In Section 4 a new algorithm for structure optimization is described, while Section 5 contains results from a simulation example to show the effectiveness of the proposed approach. The paper ends in Section 6 with some conclusions.

2. PRELIMINARIES

In a fuzzy neural network with $n_R$ rules and $n$ inputs, if $N$ samples are used for training, the network can be described by:

- The membership degree $\mu_{i,j}(t) \in [0, 1]$ (associated with the $i$th rule, $i = 1, \ldots, n_R$, the $j$th input, $j = 1, \ldots, n$, and the $t$th sample, $t = 1, \ldots, N$).
- The fuzzy set associated with the $j$th input, denoted $A_{j,l}$, $j = 1, \ldots, n$, $l = 1, \ldots, k_j$, where $k_j$ is the number of fuzzy sets for the $j$th input.

The construction of a fuzzy neural network involves three steps: i) fuzzification; ii) rule evaluation; iii) aggregation of rules and defuzzification. Defuzzification is commonly achieved by the centroid technique, which produces the centre of gravity

$$\hat{y}(t) = \frac{\sum_{i=1}^{n_R} \omega_i(t)\, y_i(t)}{\sum_{i=1}^{n_R} \omega_i(t)} = \sum_{i=1}^{n_R} \phi_i(t)\, y_i(t) \qquad (1)$$

where $\hat{y}(t)$ is the estimated crisp value for the $t$th sample; $\phi_i(t) = \omega_i(t) \big/ \sum_{i=1}^{n_R} \omega_i(t)$ is the fuzzy basis function associated with the $i$th rule, with $\omega_i(t) = \prod_{j=1}^{n} \mu_{i,j}(t)$ and $n_R = \prod_{j=1}^{n} k_j$; and $y_i(t)$ is the output associated with the $i$th rule and the $t$th sample, whose form depends on the particular choice of model structure. For example, a linear model will have

$$y_i(t) = \sum_{j=1}^{n} g_{i,j}\, x_j(t), \qquad i = 1, \ldots, n_R \qquad (2)$$

where $g_{i,j}$ is the consequent parameter associated with the $j$th input $x_j$ and the $i$th rule. For convenience of notation, the following definitions are introduced:

(i) $X = [\,x_1 \ldots x_n\,]$ with $x_j = (\,x_j(1) \ldots x_j(N)\,)^T$, for $j = 1, \ldots, n$;
(ii) the scalars $\varphi_{i,j}(t) = \phi_i(t)\, x_j(t)$, for $t = 1, \ldots, N$, $i = 1, \ldots, n_R$ and $j = 1, \ldots, n$;
(iii) the vectors $\phi_{i,j} = (\,\varphi_{i,j}(1) \ldots \varphi_{i,j}(N)\,)^T$ and $g_i = (\,g_{i,1} \ldots g_{i,n}\,)^T$;
(iv) the full column rank sub-matrix $\Phi_i = [\,\phi_{i,1} \ldots \phi_{i,n}\,]$, $i = 1, \ldots, n_R$, associated with the $i$th fuzzy rule.

Using this notation, the full fuzzy matrix $\Phi$, with all candidate rules included, can be written as $\Phi = [\,\Phi_1 \ldots \Phi_{n_R}\,]$.

Further, from equations (1) and (2), for all $t \in \{1, \ldots, N\}$ the observed output of the fuzzy neural network can be expressed as

$$y(t) = f(x(t)) = \hat{y}(t) + \varepsilon(t) = \sum_{i=1}^{n_R} \sum_{j=1}^{n} \varphi_{i,j}(t)\, g_{i,j} + \varepsilon(t)$$

where $x(t) = (\,x_1(t) \ldots x_n(t)\,)^T$ is the FNN input vector and $\varepsilon(t)$ is the model residual sequence. If $N$ data samples $\{x(t), y(t)\}_{t=1}^{N}$ are used for network construction and training, (2) can then be reformulated as $y = \Phi\Theta + \Xi$ with

$$y = (\,y(1) \ldots y(N)\,)^T, \quad \Theta = (\,g_{1,1} \ldots g_{1,n} \ldots g_{n_R,1} \ldots g_{n_R,n}\,)^T, \quad \Xi = (\,\varepsilon(1) \ldots \varepsilon(N)\,)^T$$

A quadratic function is now defined as

$$E(\Theta) = \sum_{t=1}^{N} \Big( y(t) - \sum_{i=1}^{n_R} \sum_{j=1}^{n} \varphi_{i,j}(t)\, g_{i,j} \Big)^2 \qquad (3)$$

to represent the overall cost function associated with the whole network, where all possible terms are included in the fuzzy model.
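As an illustration of how equations (1)-(2) and the matrix $\Phi$ fit together, the following is a minimal NumPy sketch. Triangular membership functions on a uniform grid, and a rule base formed from every combination of fuzzy sets, are assumptions made for the example; the function names are likewise illustrative, not the paper's implementation.

```python
# Minimal sketch of eqs. (1)-(2): fuzzy basis functions and the full
# regression matrix Phi. Membership shape and all names are assumptions.
import numpy as np
from itertools import product

def triangular_memberships(x, centres):
    """Membership degrees, shape (k, N), of samples x (N,) in triangular
    fuzzy sets centred at `centres` (k,), assuming uniform grid spacing."""
    width = centres[1] - centres[0]
    return np.maximum(0.0, 1.0 - np.abs(x[None, :] - centres[:, None]) / width)

def build_phi(X, centres_per_input):
    """X: (N, n) input samples. Returns Phi of shape (N, p), p = n_R * n,
    whose columns are the scalars phi_{i,j}(t) = phi_i(t) * x_j(t)."""
    N, n = X.shape
    mu = [triangular_memberships(X[:, j], c)
          for j, c in enumerate(centres_per_input)]        # mu[j]: (k_j, N)
    rules = list(product(*[range(len(c)) for c in centres_per_input]))
    # rule firing strengths omega_i(t) = prod_j mu_{i,j}(t), n_R = prod k_j
    omega = np.stack([np.prod([mu[j][l] for j, l in enumerate(r)], axis=0)
                      for r in rules])                     # (n_R, N)
    phi = omega / omega.sum(axis=0, keepdims=True)         # basis functions, eq. (1)
    cols = [phi[i] * X[:, j] for i in range(len(rules)) for j in range(n)]
    return np.column_stack(cols)                           # (N, p)
```

With $\Phi$ assembled this way, the consequent parameters minimizing the cost (3) follow from ordinary least squares, e.g. `np.linalg.lstsq(Phi, y, rcond=None)`.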

3. INPUT SELECTION

As mentioned in Section 1, variable selection can be achieved using a number of alternative methods: the regularity criterion, the geometric criterion, individual discrimination power and the entropy variation index. As the last two are restricted to classification problems, only the regularity and geometric criteria are described briefly below.

3.1 The Regularity Criterion

In Sugeno and Yasukawa [1993] the Regularity Criterion (RC) is presented. First the training dataset is divided into two groups, say A and B. The RC index is then calculated as

$$RC = \frac{1}{2}\left[ \frac{1}{N_A}\sum_{t=1}^{N_A} \big(y_t^A - y_t^{AB}\big)^2 + \frac{1}{N_B}\sum_{t=1}^{N_B} \big(y_t^B - y_t^{BA}\big)^2 \right] \qquad (4)$$

Here $N_A$ and $N_B$ are the numbers of data points in groups A and B, $y_t^A$ and $y_t^B$ are the output data from groups A and B, and $y_t^{AB}$ (or $y_t^{BA}$) is the model output for group A (or B) estimated by the model identified using the group B (or A) data. Both structure and parameter identification are therefore needed to calculate $y_t^{AB}$ (or $y_t^{BA}$) and consequently the overall RC value.

The Regularity Criterion algorithm involves several steps. A fuzzy model with one input is first attempted; in this case the $n$ inputs lead to $n$ models to be identified, one for each particular input. Following structure and parameter identification, the RC value for each model is calculated, and the selected input is the one with the smallest RC value. Next, the chosen input is fixed and a second input, selected from the remaining $n - 1$ candidates, is added to the fuzzy model; the second input is again selected according to the minimum value of RC. The process continues until the value of RC increases, the number of selected inputs being $m \le n$. The advantage of this method is that structure identification comprises $m(m+1)/2$ cases instead of the $2^n - 1$ of an exhaustive search. Fig. 1 shows a system with $n = 4$ initial inputs, where the inputs $x_1$, $x_2$, $x_3$ and $x_4$ respectively are selected at the end of the first, second, third and fourth stages of the RC process. This requires at most a total of $n(n+1)/2$ identifications. Although the RC approach is computationally more efficient than an exhaustive search, from eq. (4) two models have to be built for each of the $n(n+1)/2$ identifications, which can be time-consuming.
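As a hedged illustration of eq. (4), the sketch below computes the RC index for one candidate model; `fit_model` and `predict` stand in for whatever structure and parameter identification procedure is used, and are assumptions of the sketch rather than part of the original method.

```python
# Sketch of the Regularity Criterion bookkeeping of eq. (4): a model is
# identified on each group and evaluated on the other. fit_model/predict
# are placeholder callables, assumed to be supplied by the user.
import numpy as np

def regularity_criterion(XA, yA, XB, yB, fit_model, predict):
    model_A = fit_model(XA, yA)        # identified using group A data
    model_B = fit_model(XB, yB)        # identified using group B data
    yAB = predict(model_B, XA)         # group-A output estimated by model B
    yBA = predict(model_A, XB)         # group-B output estimated by model A
    NA, NB = len(yA), len(yB)
    return 0.5 * (np.sum((yA - yAB) ** 2) / NA
                  + np.sum((yB - yBA) ** 2) / NB)
```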

3.2 The geometric criterion (Emami et al. [1998])

Here the authors argued that the regularity criterion is indeed time-consuming, especially for real systems with a large number of potential input variables. Unlike the RC, the input selection is separated from structure identification and is considered an inherent property of the system itself. The significance of each input variable is given by a geometric criterion, summarized in the following statement.

Corollary: In a fuzzy system model, the necessary and sufficient condition for an input variable to be non-significant is that it has a convex membership grade of one all over its domain in all IF-THEN rules.

From this, a quantitative index can be defined as an overall measure of the non-significance of an input variable in the fuzzy system as follows, for $j = 1, \ldots, n$:

$$\pi_j = \sum_{i=1}^{n_R} \frac{\Gamma_{ij}}{\Gamma_j} \qquad (5)$$

where $\Gamma_{ij}$ is the range over which the membership function $\mu_{i,j}$ is one and $\Gamma_j$ is the entire range of the variable $x_j$. Small values of $\pi_j$ indicate a more effective variable and vice versa. While $\pi_j$ captures the overall effect of $x_j$ on the system, leading to a global input selection, each individual ratio $\Gamma_{ij}/\Gamma_j$ provides meaningful information on the effect of $x_j$ on the $i$th rule, i.e. it supports a local input selection. Although the geometric criterion is faster than the regularity one, it has the limitation of being feasible only if trapezoidal membership functions are used: with triangular or Gaussian functions the quantity $\Gamma_{ij}/\Gamma_j$, and consequently $\pi_j$, will be close to zero for all $i = 1, \ldots, n_R$.
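For trapezoidal membership functions, eq. (5) reduces to simple interval arithmetic. The sketch below assumes each trapezoid is represented by its plateau interval [b, c] on which the membership grade is one; this representation, and the function name, are illustrative assumptions.

```python
# Sketch of the non-significance index pi_j of eq. (5), assuming trapezoidal
# membership functions represented by their unit-grade plateau [b, c].
import numpy as np

def non_significance(plateaus_j, domain_j):
    """plateaus_j: list of (b_i, c_i) plateaus of input x_j, one per rule
    i = 1..n_R; domain_j: (lo, hi) range of x_j. Returns pi_j of eq. (5)."""
    gamma_j = domain_j[1] - domain_j[0]                    # entire range of x_j
    gamma_ij = np.array([c - b for (b, c) in plateaus_j])  # ranges with mu = 1
    return float(np.sum(gamma_ij / gamma_j))
```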

4. THE PROPOSED ALGORITHM, LIS

From (1) and (2), the total number of unknown parameters in the FNN is $p = n \times n_R$. For a complex problem $p$ tends to be very large, so the resultant fuzzy network may become overly complex and computationally expensive to train and implement. Moreover, a fuzzy neural network with too many parameters tends to have poor generalization. There are two possible solutions: reducing the number $n$ of inputs or reducing the number $n_R$ of rules. In Pizzileo and Li [2007] an algorithm for rule selection which selected $M < n_R$ rules was proposed; however, such a one-by-one rule selection does not lead to an optimal model. Subsequently, in Pizzileo et al. [2009], an algorithm to refine the subset of rules produced by the earlier method was proposed, integrated with a global input selection in which every $i$th rule contains the same number $m$ of selected inputs, $i = 1, \ldots, M$; there, the global input selection was performed prior to the rule selection. In a local input selection, by contrast, each $i$th rule contains its own number $m_i$ of inputs, where in general $m_i \neq m_j$ for $i \neq j$. In the following, an algorithm for simultaneous local input and rule selection is described.

First, a condition to terminate the algorithm has to be chosen. This can be a threshold cost function $E_{MAX}$, a maximum number $M_{MAX}$ of rules to include in the model, or a maximum number $k_{MAX}$ of parameters to be estimated. In the first case the procedure described below terminates when the error associated with the number $l$ of selected terms is smaller than $E_{MAX}$, where $l = \sum_{i=1}^{M} m_i$, $M$ being the number of selected rules and $l \le p$. In the second case the procedure terminates when the number $M$ of selected rules reaches $M_{MAX} \le M$, while in the last case it terminates when $k_{MAX} \le k$.

The objective of fuzzy local input selection is to choose, one by one, the terms which make the biggest contribution to reducing the cost function (3). This can be achieved by the following iterative approach.

(i) The procedure starts by setting $k$, the number of selected terms, to $k = 0$ and initializing the intermediate matrices and vectors: $P_0$ is an empty matrix, $C_0 = \Phi = [\,\phi_{1,1} \ldots \phi_{n_R,n}\,] \triangleq [\,\varphi_1 \ldots \varphi_p\,] \in \mathbb{R}^{N \times (p-k)}$, $R_0 = I \in \mathbb{R}^{N \times N}$, $A_0 = \{a_{hj}\}_0 \in \mathbb{R}^{k \times p}$, $a_0 = \{a_h\}_0 \in \mathbb{R}^{p \times 1}$, and the cost function $E_0 = y^T y$.

(ii) The index is incremented, $k = k + 1$, and the following are updated for $i = 1, \ldots, p - k + 1$:

$$P_{k,i} = [\,P_{k-1}\ \ \varphi_i\,] \triangleq [\,P_{k-1}\ \ p_{k,i}\,], \qquad C_{k,i} = [\Phi] - [P_{k,i}] \triangleq [\,\varphi_{1,i} \ldots \varphi_{p-k,i}\,] \qquad (6)$$

In (6) the symbol "$-$" means subtraction of regressor term sets.

(iii) The following are then calculated for $i = 1, \ldots, p - k + 1$:

- The residual matrices $R_{k,i} \in \mathbb{R}^{N \times N}$:
$$R_{k,i} = R_{k-1} - \frac{[R_{k-1}]\,(p_{k,i})\,(p_{k,i})^T\,[R_{k-1}]}{(p_{k,i})^T\,[R_{k-1}]\,(p_{k,i})}$$

- The matrices $A_{k,i} = \{a_{hj}\}_{k,i} \in \mathbb{R}^{k \times p}$, with $\{a_{hj}\}_{k,i} = 0$ for $h = k$, $j < k$, and for $h < k$:
$$\{a_{hj}\}_{k,i} = \begin{cases} \{a_{h,k+i-1}\}_{k-1}, & h < k,\ j = k \\ \{a_{hk}\}_{k-1}, & h < k,\ j = k + i - 1 \\ \{a_{hj}\}_{k-1}, & \text{elsewhere} \end{cases}$$

- The vectors $a_{k,i} = \{a_h\}_{k,i} \in \mathbb{R}^{p \times 1}$, updated analogously by exchanging the entries at positions $k$ and $k + i - 1$ of $\{a_h\}_{k-1}$.

- The errors $E_{k,i} = E_{k-1} + \delta E_{k,i}$, with
$$\delta E_{k,i} = -\frac{\Big( y^T (p_{k,i}) - \sum_{h=1}^{k-1} (a_h\, a_{hk})/a_{hh} \Big)^2}{(p_{k,i})^T (p_{k,i}) - \sum_{h=1}^{k-1} (a_{hk})^2/a_{hh}} \qquad (7)$$
where in (7), for convenience of notation, $a_h \triangleq \{a_h\}_{k,i}$, $a_{hk} \triangleq \{a_{hk}\}_{k,i}$ and $a_{hh} \triangleq \{a_{hh}\}_{k,i}$.

(iv) Next $p_k = \arg\min_{p_{k,i}} \{E_{k,i}\}$ is found and $P_k$ is updated as
$$P_k \triangleq [\,P_{k-1}\ \ p_k\,] = [\,p_1 \ldots p_k\,]$$
where $p_i$, $i = 1, \ldots, k$, are the selected terms. The remaining unselected terms from $\Phi$ are the vectors $\varphi_i$, $i = 1, \ldots, p - k$, and they are grouped in the matrix
$$C_k = [\Phi] - [P_k] \triangleq [\,\varphi_1 \ldots \varphi_{p-k}\,] \qquad (8)$$
to constitute the candidate pool. In (8) the symbol "$-$" again means the set difference.

(v) The following updates are now performed:

- The residual matrix $R_k \in \mathbb{R}^{N \times N}$:
$$R_k = R_{k-1} - \frac{[R_{k-1}]\,(p_k)\,(p_k)^T\,[R_{k-1}]}{(p_k)^T\,[R_{k-1}]\,(p_k)}$$

- The matrix $A_k = \{a_{hj}\}_k \in \mathbb{R}^{k \times p}$, with $\{a_{hj}\}_k = 0$ for $h = k$, $j < k$, and for $h < k$:
$$\{a_{hj}\}_k = \begin{cases} \{a_{h,k+i-1}\}_{k-1}, & h < k,\ j = k \\ \{a_{hk}\}_{k-1}, & h < k,\ j = k + i - 1 \\ \{a_{hj}\}_{k-1}, & \text{elsewhere} \end{cases}$$
where $i$ is the index of the selected candidate $p_k = p_{k,i}$.

- The vector $a_k = \{a_h\}_k \in \mathbb{R}^{p \times 1}$, updated in the same way.

- The error $E_k = \min_i \{E_{k,i}\}$:
$$E_k = E_{k-1} + \delta E_k, \qquad \delta E_k = -\frac{\Big( y^T (p_k) - \sum_{h=1}^{k-1} (a_h\, a_{hk})/a_{hh} \Big)^2}{(p_k)^T (p_k) - \sum_{h=1}^{k-1} (a_{hk})^2/a_{hh}} \qquad (9)$$
In (9), for convenience of notation, $a_h \triangleq \{a_h\}_k$, $a_{hk} \triangleq \{a_{hk}\}_k$ and $a_{hh} \triangleq \{a_{hh}\}_k$.

(vi) If the chosen termination condition has not yet been met (i.e. $E_k$ is still larger than $E_{MAX}$, or $M < M_{MAX}$, or $k < k_{MAX}$, depending on the criterion in use), step (ii) is repeated. Otherwise the procedure stops and the final quantities are recorded, with $l = k$:
$$P_l = [\,p_1 \ldots p_l\,] \qquad (10)$$
where $p_k$, $k = 1, \ldots, l$, are the selected terms. The remaining unselected terms are $\varphi_i$, $i = 1, \ldots, p - l$, grouped in the matrix
$$C_l = [\Phi] - [P_l] \triangleq [\,\varphi_1 \ldots \varphi_{p-l}\,] \qquad (11)$$
In (11) the symbol "$-$" means the set difference.

Once the procedure has terminated, the matrix $P_l$ in eq. (10) has to be rearranged to constitute the matrix $\hat{\Phi}$ of selected rules with local inputs. For this purpose every selected term $p_k$ has to be associated with the indexes $i$ and $j$ such that $p_k = \phi_{i,j}$. We can then identify the sub-matrices $\hat{\Phi}_i \subseteq \Phi_i$, where $\hat{\Phi}_i$ groups the vectors $p_k$ characterized by the same index $i$, in ascending order of their index $j$.

Example: Suppose there are originally $n = 7$ inputs and $n_R = 49$ rules initially. Further, consider that the proposed algorithm terminated with the following selected terms:
$$P_l = [\,p_1 \ldots p_6\,] = [\,\phi_{1,2}\ \ \phi_{4,6}\ \ \phi_{1,4}\ \ \phi_{1,3}\ \ \phi_{3,7}\ \ \phi_{1,6}\,]$$
Then the matrix $\hat{\Phi}$ will be
$$\hat{\Phi} = [\,\phi_{1,2}\ \ \phi_{1,3}\ \ \phi_{1,4}\ \ \phi_{1,6}\ \ \phi_{3,7}\ \ \phi_{4,6}\,]$$
where $\hat{\Phi}_1 = [\,\phi_{1,2}\ \phi_{1,3}\ \phi_{1,4}\ \phi_{1,6}\,]$, $\hat{\Phi}_3 = [\,\phi_{3,7}\,]$ and $\hat{\Phi}_4 = [\,\phi_{4,6}\,]$. Based on the above, the final number of selected rules is $M = 3$, the number of selected terms is $l = 6$, and the numbers of local inputs are $m_1 = 4$, $m_2 = 1$, $m_3 = 1$. Only the original 6th input is shared between the first and third selected rules, whilst all other selected local inputs differ between the rule sets.

An analysis of the computational complexity can be carried out by referring to Li et al. [2005]. If $N \gg l$ and $p \gg l$, the total number $C_{LIS}$ of operations with the proposed Local Input Selection (LIS) method is $C_{LIS} = pNl$.
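To make the selection loop concrete, here is a minimal Python sketch of steps (i)-(vi). It is a transparent but deliberately naive equivalent: rather than the recursive $R_k$, $A_k$ and $a_k$ updates above, it refits an ordinary least-squares problem for every candidate, so it selects the same terms as the fast procedure while exceeding the $C_{LIS} = pNl$ operation count; all names are illustrative.

```python
# Naive sketch of the LIS forward loop: greedily pick the column of Phi
# that most reduces the squared-error cost of eq. (3). Equivalent in the
# terms selected, but slower than the paper's recursive formulation.
import numpy as np

def lis_select(Phi, y, l_max):
    """Phi: (N, p) full fuzzy matrix; y: (N,) output; l_max: number of
    terms to select. Returns the selected column indices, in order."""
    N, p = Phi.shape
    selected, remaining = [], list(range(p))
    for _ in range(l_max):
        best_err, best_col = np.inf, None
        for i in remaining:                     # try each candidate term
            cols = selected + [i]
            theta, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
            err = np.sum((y - Phi[:, cols] @ theta) ** 2)   # E_{k,i}
            if err < best_err:
                best_err, best_col = err, i
        selected.append(best_col)               # p_k = argmin E_{k,i}
        remaining.remove(best_col)
    return selected
```

Each selected column index then maps back to a (rule, input) pair via `i = idx // n` and `j = idx % n` (0-based), which is exactly the bookkeeping used to assemble $\hat{\Phi}$ in the example above.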

5. SIMULATION EXAMPLE

Consider the following benchmark nonlinear system to be identified (Hong and Harris [2001], Pizzileo et al. [2009]):

$$y(t) = \frac{2.5\, y(t-1)\, y(t-2)}{1 + y^2(t-1) + y^2(t-2)} + 0.3 \cos\big(0.5\,(y(t-1) + y(t-2))\big) + 1.2\, u(t-1) + e(t) \qquad (12)$$

where the system input and noise sequences are respectively

$$u(t) = \tfrac{1}{2}\,\big[\sin(\pi t/20) + \sin(\pi t/50)\big], \qquad e(t) \sim N(0,\ 0.05^2)$$

A data sequence of 1000 samples was generated. The first 500 samples were used for training, the remaining 500 being reserved for validation.
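For reproducibility, the benchmark data of eq. (12) can be generated along the following lines; the random seed and the zero initial conditions are assumptions of the sketch.

```python
# Sketch generating the benchmark data of eq. (12) under the stated
# input and noise sequences. Seed and initial conditions are assumed.
import numpy as np

rng = np.random.default_rng(0)
N = 1000
t = np.arange(1, N + 1)
u = 0.5 * (np.sin(np.pi * t / 20) + np.sin(np.pi * t / 50))
e = rng.normal(0.0, 0.05, N)

y = np.zeros(N)
for k in range(2, N):                 # y(0), y(1) initialised to zero
    y[k] = (2.5 * y[k-1] * y[k-2] / (1 + y[k-1]**2 + y[k-2]**2)
            + 0.3 * np.cos(0.5 * (y[k-1] + y[k-2]))
            + 1.2 * u[k-1] + e[k])

y_train, y_val = y[:500], y[500:]     # first 500 for training, rest for validation
```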

Initially, with $n_y = 3$ and $n_u = 2$, the full input vector was set to

$$x(t) = [\,y(t-1),\ y(t-2),\ y(t-3),\ u(t-1),\ u(t-2)\,]^T$$

A first-order B-spline neurofuzzy network was used to approximate the system. The knot vectors for the input domain were defined for $y(t)$ as $\{-1.2, -1.0, -0.1, 0.8, 1.7, 2.6, 2.8\}$ and for $u(t)$ as $\{-1.2, -1.1, -0.6, 0, 0.6, 1.1, 1.2\}$. Here the number of inputs was $n = 5$, each of which was clustered using $k_j = 5$, $j = 1, \ldots, n$, fuzzy sets. The initial number of fuzzy rules was therefore 3125 and the initial number of parameters to be estimated was 15625.

In the following, results from the new algorithm of Section 4 are compared to other approaches. The geometric criterion was not able to give any useful information, since the membership functions were not trapezoidal. The input selection results obtained from applying the regularity criterion are summarized in Table 1, with the corresponding search tree shown in Fig. 1. At each step the case marked by an "O" is the chosen one, being associated with the lowest RC. A case marked by an "X" has an RC value larger than the smallest RC value at the previous step, and is therefore eliminated from the candidate input sets at the next step (Sugeno and Yasukawa [1993]). As a result, the four inputs y(t-1), y(t-3), u(t-1), u(t-2) were selected, while the input y(t-2) was excluded from the input set. Comparing eq. (12) with the results in Table 1 shows that the RC neither found the correct variables nor identified m = 3 as the correct number of inputs, which was instead taken to be m = 4.

In applying the new local input selection (LIS) algorithm, the stop criterion was chosen to be l, the number of parameters to be estimated, as an indicator of the computational complexity of the fuzzy neural network. Table 2 compares this algorithm with a forward rule selection (FRS, Pizzileo and Li [2007]) and a backward rule selection algorithm (BRS, Pizzileo et al. [2009]) for different numbers l of parameters to be estimated. In Table 3 the proposed algorithm is also compared with the RC for different values of l; here the regularity criterion was first used to select the global inputs and was then integrated with either the FRS or the BRS. In these tables, values marked by a "*" are related to the best performance. The LIS generally showed better performance over both the training and the validation datasets in terms of Sum Squared Error (SSE).

Fig. 1. Search tree for input selection

Table 1. Structure identification using the RC

Step     Input variables                       RC
step 1   y(t-1)                                1.848 x 10^-2   O
         y(t-2)                                5.122 x 10^-2
         y(t-3)                                9.862 x 10^-2
         u(t-1)                                6.579 x 10^-1
         u(t-2)                                6.480 x 10^-1
step 2   y(t-1), y(t-2)                        1.552 x 10^-2
         y(t-1), y(t-3)                        1.391 x 10^-2   O
         y(t-1), u(t-1)                        1.756 x 10^-2
         y(t-1), u(t-2)                        1.822 x 10^-2
step 3   y(t-1), y(t-3), y(t-2)                1.392 x 10^-2   X
         y(t-1), y(t-3), u(t-1)                1.380 x 10^-2   O
         y(t-1), y(t-3), u(t-2)                1.390 x 10^-2
step 4   y(t-1), y(t-3), u(t-1), u(t-2)        1.049 x 10^-2   O

Table 2. Comparison of FRS, BRS and the proposed LIS algorithm

                         SSE
l     Method    Training    Validation
25    FRS       41.6        39.8
      BRS       38.9        39.1
      LIS        3.6*        4.1*
50    FRS        9.6        12.1
      BRS        5.1         8.5
      LIS        1.4*        2.9*
75    FRS        3.2         4.9
      BRS        2.4         4.2
      LIS        1.0*        3.7*

Table 3. Comparison of the RC integrated with a FRS or BRS, and the LIS

                          SSE
l     Method     Training    Validation
20    RC+FRS     34.5        31.9
      RC+BRS     32.1        27.5
      LIS         5.9*        5.6*
40    RC+FRS      6.5         5.5
      RC+BRS      5.3         5.1
      LIS         1.7*        3.1*
60    RC+FRS      2.8         3.4
      RC+BRS      2.0         2.7*
      LIS         1.2*        3.2

6. CONCLUSIONS AND FUTURE WORK

This paper has proposed a new algorithm for structure optimization in fuzzy neural networks. In contrast to existing methods, local input selection is considered strictly related to rule selection, and these two approaches for reducing the overall network complexity are therefore performed together. Simulation results suggest that the new algorithm for reducing the FNN complexity can improve the modelling accuracy compared to alternatives from the literature. Future work will consider a wider range of applications to further validate the benefits of the proposed Local Input Selection approach.

ACKNOWLEDGEMENTS

Barbara Pizzileo would like to acknowledge the support of the Intelligent Systems and Control (ISAC) group at Queen's University of Belfast.

REFERENCES

P. Baranyi, Y. Yam, A. Várkonyi-Kóczy, R. J. Patton, P. Michelberger and M. Sugiyama. SVD based reduction to MISO-TS fuzzy models. IEEE Trans. on Industrial Electronics, volume 50, pages 232-242, 2003.
R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
M. R. Emami, I. B. Türkşen and A. A. Goldenberg. Development of a systematic methodology of fuzzy logic modeling. IEEE Trans. Fuzzy Syst., volume 6, pages 346-361, 1998.
T.-P. Hong and J.-B. Chen. Finding relevant attributes and membership functions. Fuzzy Sets Syst., volume 103, pages 389-404, 1999.
X. Hong and C. J. Harris. Variable selection algorithm for the construction of MIMO operating point dependent neurofuzzy networks. IEEE Trans. Fuzzy Syst., volume 9, pages 88-101, 2001.
X. Hong and C. J. Harris. A neurofuzzy network knowledge extraction and extended Gram-Schmidt algorithm for model subspace decomposition. IEEE Trans. Fuzzy Syst., volume 11, pages 528-541, 2003.
J.-S. R. Jang, C.-T. Sun and E. Mizutani. Neuro-Fuzzy and Soft Computing. Prentice Hall, Englewood Cliffs, NJ, 1997.
E. P. Klement, L. T. Koczy and B. Moser. Are fuzzy systems universal approximators? Int. J. General Systems, volume 28, no. 2-3, pages 259-282, 1999.
L. T. Koczy and K. Hirota. Approximate reasoning by linear rule interpolation and general approximation. Int. J. Approx. Reason., volume 9, pages 197-225, 1991.
K. Li, J. Peng and G. W. Irwin. A fast nonlinear model identification method. IEEE Trans. on Automat. Contr., volume 50, number 8, pages 1211-1216, 2005.
K. Li, J. Peng and E. W. Bai. A two-stage algorithm for identification of nonlinear dynamic systems. Automatica, volume 42, number 7, pages 1189-1197, 2006.
S. Mitra and Y. Hayashi. Neuro-fuzzy rule generation: survey in soft computing framework. IEEE Trans. Neural Networks, volume 11, pages 748-768, 2000.
N. R. Pal. Soft computing for feature analysis. Fuzzy Sets Syst., volume 103, pages 201-221, 1999.
B. Pizzileo and K. Li. A new fast algorithm for fuzzy rule selection. IEEE International Conference on Fuzzy Systems, 2007.
B. Pizzileo, K. Li and G. W. Irwin. A new rule selection procedure for fuzzy-neural modelling. 15th IFAC Symposium on System Identification, Saint Malo, France, July 6-8, 2009.
M. Sugeno and T. Yasukawa. A fuzzy-logic-based approach to qualitative modelling. IEEE Trans. Fuzzy Syst., pages 7-31, 1993.
T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications to modelling and control. IEEE Trans. Syst., Man, Cybern., volume 15, pages 116-132, 1985.
L.-X. Wang. Fuzzy systems are universal approximators. Proc. of the IEEE Int. Conf. on Fuzzy Systems, pages 1163-1169, 1992.
Y. Yam, P. Baranyi and C.-T. Yang. Reduction of fuzzy rule base via singular value decomposition. IEEE Trans. on Fuzzy Systems, volume 7, pages 120-131, 1999.
J. Yen and L. Wang. Simplifying fuzzy rule-based models using orthogonal transformation methods. IEEE Trans. Systems, Man and Cybernetics-Part B: Cybernetics, volume 29, no. 1, pages 13-23, 1999.
L. A. Zadeh. Fuzzy sets. Information and Control, volume 8, no. 3, pages 338-353, 1965.
L. A. Zadeh. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. on SMC, volume 3, pages 28-44, 1973.