An online Bayesian Ying–Yang learning applied to fuzzy CMAC

Neurocomputing 72 (2008) 562–572

M.N. Nguyen (a,*), D. Shi (a), J. Fu (b)

(a) School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore
(b) School of Electric and Information Engineering, Heilongjiang Institute of Science and Technology, Harbin, China

Received 12 April 2007; received in revised form 29 September 2007; accepted 23 November 2007. Communicated by D. Wang. Available online 2 January 2008.

Abstract

This paper proposes an online Bayesian Ying–Yang (OBYY) clustering algorithm, which is then applied to the fuzzy cerebellar model articulation controller (FCMAC). Inspired by the ancient Chinese Ying–Yang philosophy, Xu's Bayesian Ying–Yang (BYY) learning has been successfully applied to clustering by harmonizing the visible input data (Yang) and the invisible clusters (Ying). In this research, the original BYY is extended to dynamically recruit, adjust, and merge the fuzzy clusters to achieve maximum harmony and the highest membership values. The proposed online FCMAC-BYY offers the following advantages. First, the antecedents of the fuzzy rules are dynamically constructed and optimized by the OBYY algorithm during the operation of the system. Second, credit assignment is employed in the learning of the neurons to greatly speed up the learning process. These features give the online FCMAC-BYY an optimized structure with a fast learning speed, enabling online learning and making it suitable for real-time applications. Experimental results on several benchmark datasets show that the proposed model outperforms existing representative techniques.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Bayesian Ying–Yang learning; FCMAC; Online learning; Credit assignment

1. Introduction

With the ever-growing volume of data, automatic adaptation techniques and online learning schemes for stream data are attracting increasing research interest [8,31]. Online identification tasks require adapting some model parameters through incremental learning steps as new data arrive, because completely rebuilding the models from time to time with all recorded measurements results in unacceptable computational effort. Other requirements for incremental learning include the refinement of existing knowledge-based models with data, the prevention of virtual-memory overload in the case of huge databases, and the automatic improvement of the accuracy and generalization capability of models initially trained on only a handful of data [6,20].

Corresponding author. Tel.: +65 91434740. E-mail address: [email protected] (M.N. Nguyen).

doi:10.1016/j.neucom.2007.11.016

Online learning is a major aspect of designing a fuzzy neural network (FNN)-based controller for dealing with dynamically changing problems. In general, the performance of a fuzzy neural network depends strongly on the fuzzy system rules and the membership functions. Typically, the fuzzy rules are constructed either by predefined paradigms [24] or by automatic methods for generating the fuzzy rules [4,9,14,28,32]. Since the latter require no prior knowledge of the fuzzy rules or fuzzy sets, they have attracted increasing interest. Fuzzy ART [3], introduced by Carpenter et al., and discrete incremental clustering (DIC) [28], proposed by Tung and Quek, are self-organizing clustering models based on unsupervised learning, which group a given set of input patterns into categories. They are noise tolerant and do not require prior knowledge of the number of clusters present in the training dataset. However, the number of generated clusters is very sensitive to the predefined vigilance threshold and the overlapping degree parameter. Another problem of these models is that they cannot delete fuzzy clusters once they are generated.


Because clusters cannot be deleted, the number of fuzzy rules grows over time and, as a result, many redundant rules accumulate. To avoid redundant clusters, paradigms based on reinforcement learning, in which not only the parameters are adjusted but also the structure is self-adaptive during learning, have been presented [4,9]. This approach uses reinforcement signals that give rewards and penalties to the learner for performing a given action in a given state. The most popular method is Q-learning, which estimates the reward for future actions from given states based on temporal-difference (TD) learning. However, it is difficult to deal with continuous-valued actions, and too many thresholds have to be specified or defined by users.

Generally speaking, if we have no knowledge about the system, identical membership functions whose domains cover the input space evenly are chosen, and a fuzzy rule has to be considered for every possible combination of input fuzzy variables. The number of rules then increases exponentially with the number of input variables. As a consequence, the fuzzy neural network often includes many redundant or improper membership functions and fuzzy rules, a phenomenon called "the curse of dimensionality" [21]. This motivates us to develop a fuzzification method capable of determining the fuzzy rules dynamically. In addition, for real-time applications, not only must the network be trained on the current patterns, but the learning speed is also particularly important.

The cerebellar model articulation controller (CMAC) [1,2] was first proposed by Albus. Fuzzy logic was later introduced into CMAC to obtain a new fuzzy neural system, known as fuzzy CMAC, or the fuzzy cerebellar model articulation controller (FCMAC) [15,25,35], which has been demonstrated to have good local generalization capability and rapid learning convergence. Although the learning time is significantly reduced in FCMAC, it is still not fast enough for some online applications [26,27]. To overcome this shortcoming, credit assignment is employed to greatly speed up the learning process.

In this paper, a novel online Bayesian Ying–Yang (OBYY) learning method for fuzzification is proposed, in which credit assignment is employed to speed up learning so that the advanced FCMAC can perform online learning. In the proposed online FCMAC-Bayesian Ying–Yang (BYY), not only is the precondition part constructed online, but the structure of the neural network can also be generated and self-adapted during learning.

This paper is organized as follows. Section 2 introduces our previous work on fuzzification based on the BYY learning method. In Section 3, we propose the OBYY learning for fuzzification, which automatically and dynamically generates the precondition part of the fuzzy rules. The architecture of the online FCMAC-BYY system is then given in detail in Section 4. Section 5 presents the simulation results and comparative studies with other representative techniques. Lastly, conclusions are drawn in Section 6.


2. Fuzzification based on Bayesian Ying–Yang learning

Inspired by the ancient Chinese Ying–Yang philosophy, Xu [33,34] introduced BYY learning for clustering by harmonizing the visible input data (Yang) and the invisible clusters (Ying). In our previous work, BYY learning was introduced into the fuzzification phase of the fuzzy CMAC [14]. Fuzzification discriminates the fuzzy clusters from the training patterns. If we treat the training data x and the fuzzy clusters c as random processes, BYY fuzzification considers two complementary representations of the joint distribution of the input pattern x and the fuzzy cluster c, as shown in Fig. 1. The first is the forward/training model $M_1$: $p(c) = \int p(c|x)\,p(x)\,dx$. It is called the Yang (visible) model and focuses on the mapping of the visible input data x into an invisible cluster representation c via a forward propagation distribution p(c|x). The second is the backward/running model $M_2$: $p(x) = \int p(x|c)\,p(c)\,dc$. It is called the Ying (invisible) model and focuses on the generation of the visible input data x from an invisible cluster representation c via a backward propagation distribution p(x|c). Under the principle of Ying–Yang harmony, the difference between these two models should be minimized. The difference between Ying and Yang can be evaluated by the Kullback–Leibler divergence:

$$KL(M_1, M_2) = \iint p(c|x)\,p(x)\,\ln\frac{p(c|x)\,p(x)}{p(x|c)\,p(c)}\,dx\,dc \qquad (1)$$

The above difference can be minimized by iterative execution of the following two steps:

Step 1: Fix $M_2 = M_2^{\text{old}}$, get $M_1^{\text{new}} = \arg\min_{M_1} KL(M_1, M_2)$;
Step 2: Fix $M_1 = M_1^{\text{old}}$, get $M_2^{\text{new}} = \arg\min_{M_2} KL(M_1, M_2)$.

[Fig. 1. Fuzzification based on Ying–Yang learning.]

Iterating these two steps leads KL(M_1, M_2) to converge. In the case of a free p(c|x), the probabilities p(x|c), p(c|x), and p(c) can be calculated as follows:

$$p(x|c) = P(x|\theta_c), \qquad p(c|x) = \sum_{j=1}^{k} P(c_j|x)\,\delta(c-j), \qquad p(c) = \sum_{j=1}^{k} \alpha_j\,\delta(c-j), \qquad (2)$$


subject to

$$P(c_j|x) > 0, \quad \sum_{j=1}^{k} P(c_j|x) = 1; \qquad \alpha_j > 0, \quad \sum_{j=1}^{k} \alpha_j = 1, \qquad (3)$$

where $\theta_c$ stands for all the unknown parameters of cluster c (in this case, the mean and covariance), k is the number of fuzzy clusters, and $\alpha_j$ is the prior probability of the jth cluster. When the number of patterns N is large enough, the minimization of KL(M_1, M_2) given by Eq. (1) is equivalent to $\min_{\Theta_k} KL(\Theta_k)$ with $\Theta_k = \{\alpha_j, \theta_j\}_{j=1}^{k}$ and

$$KL(\Theta_k) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{k} P(c_j|x_i)\ln P(c_j|x_i) - \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{k} P(c_j|x_i)\ln P(x_i|\theta_j) - \sum_{j=1}^{k}\alpha_j\ln\alpha_j. \qquad (4)$$

Eq. (4) is equivalent to $\max_{\Theta_k} L(\Theta_k)$ with

$$L(\Theta_k) = \frac{1}{N}\sum_{i=1}^{N}\ln p(x_i;\Theta_k), \quad \text{where} \quad p(x_i;\Theta_k) = \sum_{j=1}^{k}\alpha_j\,p(x_i|\theta_j). \qquad (5)$$

The maximization of Eq. (5) can be implemented by the Expectation–Maximization (EM) algorithm [11,19]:

E step: $P(c_j|x_i) = \dfrac{\alpha_j\,p(x_i|\theta_j)}{p(x_i;\Theta_k)}, \qquad \alpha_j = \dfrac{1}{N}\sum_{i=1}^{N} P(c_j|x_i)$;

M step: $\theta_j^{(\text{new})} = \arg\max_{\theta_j}\sum_{i=1}^{N} P(c_j|x_i)\ln P(x_i|\theta_j)$.
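For concreteness, the following is a minimal Python sketch of this EM loop for one-dimensional Gaussian clusters — an illustration under our own naming, not the authors' code; `em_fuzzify`, `means`, `variances`, and `priors` are stand-ins for $\theta_j = (m_j, \sigma_j)$ and $\alpha_j$.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """p(x | theta_j) for a one-dimensional Gaussian cluster."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_fuzzify(x, k, n_iter=50):
    """Batch EM maximizing Eq. (5) for k fuzzy clusters on data x of shape (N,)."""
    n = len(x)
    means = np.random.choice(x, k)          # cluster centers m_j
    variances = np.full(k, np.var(x))       # cluster widths (variances)
    priors = np.full(k, 1.0 / k)            # prior probabilities alpha_j
    for _ in range(n_iter):
        # E step: P(c_j | x_i) = alpha_j p(x_i | theta_j) / p(x_i; Theta_k)
        lik = np.array([priors[j] * gaussian_pdf(x, means[j], variances[j])
                        for j in range(k)])          # shape (k, N)
        post = lik / lik.sum(axis=0, keepdims=True)  # posteriors P(c_j | x_i)
        # M step: re-estimate alpha_j and theta_j from the weighted data
        nj = post.sum(axis=1)                        # effective cluster sizes
        priors = nj / n
        means = post @ x / nj
        variances = post @ (x ** 2) / nj - means ** 2
    return means, variances, priors
```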

3. Online Bayesian Ying–Yang learning for fuzzification

The original fuzzification based on Ying–Yang learning suffers from two main problems. First, the parameter learning phase can only be carried out once all the training data have been obtained, so it is not suitable for online applications in which the data are not always available during the training phase. Second, the fuzzy clusters of the original Ying–Yang fuzzification are determined systematically by the cluster number selection phase; in some applications, however, the number of fuzzy clusters needs to be controlled and adjusted to reduce the output errors. Addressing these two problems leads to our proposed OBYY learning for fuzzification, which is derived from the original Ying–Yang fuzzification and is able to handle online data as well as to control the number of fuzzy clusters so as to satisfy the output error criterion.

[Fig. 2. Illustration of an online Bayesian Ying–Yang fuzzification.]

As shown in Fig. 2, the OBYY learning for fuzzification consists of two optimization steps. In the first step, input data harmonization, the fuzzy clusters are adjusted by the cluster adjustment to maximize the log likelihood, that is, the degree to which the input patterns belong to the fuzzy clusters. This step allows the online FCMAC-BYY to produce the optimal fuzzy clusters and rules that achieve the highest harmony between the input data and the fuzzy clusters. In the second step, output error optimization, the output error criterion is taken into account in cluster creation and cluster pruning. This step allows the proposed model to control its overall performance.

The membership value is defined as a Gaussian basis function acting on each input dimension:

$$G(x^{(t)}, c_j) = \exp\left[-\frac{1}{2}\left(\frac{x^{(t)} - m_j^{(t)}}{\sigma_j^{(t)}}\right)^2\right], \quad j = 1, 2, \ldots, J(i), \qquad (6)$$

where J(i) is the number of clusters, $x^{(t)}$ is the input vector, and $m_j^{(t)}$ is the center of the jth cluster with width $\sigma_j^{(t)}$ at time t.
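As a small illustration, Eq. (6) can be evaluated directly; the sketch below uses illustrative names and assumes scalar inputs per dimension.

```python
import numpy as np

def membership(x, center, width):
    """Gaussian membership value of Eq. (6) for input x and cluster (m_j, sigma_j)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Example: an input of 0.7 against a cluster centered at 0.5 with width 0.2
print(membership(0.7, 0.5, 0.2))  # about 0.607
```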

3.1. Cluster creation

For each incoming training pattern, the OBYY fuzzification determines whether or not a new fuzzy cluster should be created based on the following two criteria. The first is the total firing strength of the activated fuzzy rules:

$$F(x^{(t)}) = \sum_{j=1}^{a(t)} \ln p(x^{(t)}|\theta_j), \qquad (7)$$

where $x^{(t)}$ is the incoming pattern at time t, $\ln p(x^{(t)}|\theta_j)$ is the log likelihood indicating the degree to which the input datum $x^{(t)}$ belongs to the jth cluster for $j \in [1, a(t)]$, a(t) is the number of activated clusters at time t, and $\theta_j$ stands for all the parameters of that cluster. When a new datum is fed into the network, the total firing strength is computed and compared with a pre-specified threshold ε. If $F(x^{(t)}) < \varepsilon$, the total firing strength of the existing network is not satisfactory, and a new fuzzy cluster should be created.
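A minimal sketch of this creation test follows; it also anticipates the second, error-based criterion described next. The helper `gaussian_pdf`, the threshold names `eps` and `k_e`, and the function signature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def should_create_cluster(x, activated, eps, y_out, y_des, k_e):
    """True when a new fuzzy cluster should be created for pattern x.

    activated: (mean, var) pairs of the currently activated clusters;
    eps: firing-strength threshold; k_e: output-error threshold.
    """
    # Criterion 1 (Eq. 7): F(x) = sum_j ln p(x | theta_j) over activated clusters
    firing = sum(np.log(gaussian_pdf(x, m, v)) for m, v in activated)
    if firing < eps:
        return True
    # Criterion 2 (see next paragraph): network error exceeds the threshold k_e
    return abs(y_out - y_des) > k_e
```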


The second criterion is based on the error constraint. It is not sufficient to consider the total firing strength of the fuzzy rules as the only criterion for generating new fuzzy clusters; a new fuzzy cluster should also be created when the error of the network violates the error criterion $|y_o^{(t)} - y_d^{(t)}| > k_e$, where $k_e$ is an error threshold and $y_o^{(t)}$ and $y_d^{(t)}$ are the actual and desired outputs at time t, respectively.

3.2. Cluster adjustment

In this section, the original BYY-based fuzzification is modified to handle online training on the current patterns. Assuming that the number of clusters at time t is k(t), the total probability at time t is

$$p(x^{(t)}|\Theta_K^{(t)}) = \sum_{j=1}^{k(t)} \alpha_j^{(t)}\,p(x^{(t)}|\theta_j) \qquad (8)$$

and the posterior probability in the Expectation step of the EM algorithm becomes

$$P(c_j|x^{(t)}) = \frac{\alpha_j^{(t-1)}\,p(x^{(t)}|\theta_j)}{p(x^{(t)};\Theta_k)}, \qquad (9)$$

where $\alpha_j^{(t-1)}$ is the prior probability at the preceding time step, obtained online from the previous calculation for the jth cluster. The probability $\alpha_j^{(t)}$ at time t can then be computed as follows [30]:

$$\alpha_j^{(t)} = \frac{1}{t}\sum_{i=1}^{t} P(c_j|x^{(i)}) = \frac{1}{t}\left[(t-1)\,\alpha_j^{(t-1)} + P(c_j|x^{(t)})\right], \qquad (10)$$

where

$$\alpha_j^{(t-1)} = \frac{1}{t-1}\sum_{i=1}^{t-1} P(c_j|x^{(i)}). \qquad (11)$$

At the beginning, if we assume that the prior probability $\alpha_j^{(0)}$ is uniform, then the posterior probability for the first datum, $P(c_j|x^{(1)})$ in Eq. (9), can be obtained, and so can the probability $\alpha_j^{(1)}$ in Eq. (10). These values are then stored for the next computation. Based on the above analysis, the online learning of $\theta_c = (m_c, \sigma_c)$ can be carried out in the modified Maximization step of the EM algorithm:

$$m_j^{(t)} = \frac{\sum_{i=1}^{t} P(c_j|x^{(i)})\,x^{(i)}}{\sum_{i=1}^{t} P(c_j|x^{(i)})} = \frac{\sum_{i=1}^{t-1} P(c_j|x^{(i)})\,x^{(i)} + P(c_j|x^{(t)})\,x^{(t)}}{\alpha_j^{(t)}\,t} = \frac{(t-1)\,\alpha_j^{(t-1)}\,m_j^{(t-1)} + P(c_j|x^{(t)})\,x^{(t)}}{\alpha_j^{(t)}\,t}, \qquad (12)$$

$$\sigma_j^{(t)} = \frac{\sum_{i=1}^{t} P(c_j|x^{(i)})\,(x^{(i)} - m_j^{(t)})^2}{\sum_{i=1}^{t} P(c_j|x^{(i)})} = \frac{\sum_{i=1}^{t-1} P(c_j|x^{(i)})\,(x^{(i)})^2 + P(c_j|x^{(t)})\,(x^{(t)})^2}{\alpha_j^{(t)}\,t} - \left(m_j^{(t)}\right)^2 = \frac{(t-1)\,\alpha_j^{(t-1)}\left[\sigma_j^{(t-1)} + (m_j^{(t-1)})^2\right] + P(c_j|x^{(t)})\,(x^{(t)})^2}{\alpha_j^{(t)}\,t} - \left(m_j^{(t)}\right)^2. \qquad (13)$$

Therefore, all the parameters at time t can be computed from the parameters at time t−1. This feature allows the proposed OBYY learning for fuzzification to handle online applications, which require automatic improvement of the accuracy and generalization capability of models during the running period.

[Fig. 3. Flowchart of the online Bayesian Ying–Yang learning for fuzzification: generate the first cluster; for each pattern, check the ε-completeness and error-checking criteria and perform cluster creation when either fails; compute membership values; perform cluster adjustment; and, after a certain number of patterns, perform cluster pruning.]

3.3. Cluster pruning

The objective of the OBYY learning for fuzzification is to minimize the difference between the two models, Ying and Yang, which is equivalent to maximizing the likelihood. Therefore, the prior probability $\alpha_j$ of each fuzzy cluster becomes a natural criterion for judging the performance of that cluster.


The prior probability is checked periodically: after a certain number of patterns have been presented to the network, the prior probabilities of all fuzzy clusters are computed, and the cluster with the smallest prior probability becomes a candidate for pruning. Cluster pruning at time t is performed as follows:

Step 1: Find the cluster $c_j$ with the minimal $\alpha_j$.
Step 2: Temporarily delete cluster $c_j$.
Step 3: Perform the error check $|y_o^{(t)} - y_d^{(t)}| < k_e$. If the online FCMAC-BYY still passes the error criterion, the fuzzy clusters over-cover the input patterns and the network has low generalization capability; that cluster should therefore be deleted. On the contrary, if the network cannot pass the error criterion, the network needs that cluster to maintain its accuracy, and the cluster should not be deleted.
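Putting Sections 3.2 and 3.3 together, the following sketch shows the recursive updates of Eqs. (9)–(13) and the periodic pruning test. It is an illustration under our own naming, not the authors' implementation; `error_without` is a hypothetical callback that re-evaluates the network error with a candidate cluster removed.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

class Cluster:
    """One fuzzy cluster: center m_j, width sigma_j (kept as a variance), prior alpha_j."""
    def __init__(self, mean, var, prior):
        self.mean, self.var, self.prior = mean, var, prior

def adjust_clusters(clusters, x, t):
    """One online EM step at time t = 1, 2, ... for the incoming pattern x."""
    # E step (Eq. 9): posteriors computed from the priors of time t-1
    lik = np.array([c.prior * gaussian_pdf(x, c.mean, c.var) for c in clusters])
    post = lik / lik.sum()
    for c, p in zip(clusters, post):
        m_old, v_old, a_old = c.mean, c.var, c.prior
        c.prior = ((t - 1) * a_old + p) / t                          # Eq. (10)
        c.mean = ((t - 1) * a_old * m_old + p * x) / (c.prior * t)   # Eq. (12)
        c.var = (((t - 1) * a_old * (v_old + m_old ** 2) + p * x ** 2)
                 / (c.prior * t)) - c.mean ** 2                      # Eq. (13)

def prune_weakest(clusters, error_without, k_e):
    """Periodic pruning: drop the smallest-prior cluster if the error still passes."""
    weakest = min(clusters, key=lambda c: c.prior)                   # Step 1
    if error_without(weakest) < k_e:                                 # Steps 2-3
        clusters.remove(weakest)                                     # cluster is redundant
```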

3.4. Algorithm of OBYY-based fuzzification

The OBYY learning for fuzzification can be summarized as in Fig. 3.

[Fig. 4. The architecture of the online FCMAC-BYY: real-valued input data enter the Input Layer X, are turned into linguistic fuzzy labels by OBYY fuzzification in the Fuzzification Layer F, pass through the Rule Layer R and the Activation Layer A via fuzzy inference and CMAC mapping of weighted real values, and are defuzzified in the Output Layer O into real-valued outputs.]

[Fig. 5. The training process of the conventional CMAC: the error between the desired output y_d and the actual output y_o for input data x is distributed to the activated cells A_1, ..., A_n with weights w_1, ..., w_n.]

4. OBYY-based fuzzy CMAC

Based on their training schemes, existing fuzzy neural networks can be divided into two categories: global learning and local learning. In the former, the output of the network is computed by all the neurons of the previous layers. Parameter learning is then performed on all the neurons, and all the weights are adjusted at each training step; in other words, an incoming training pattern affects the whole network. Examples of this category are FALCON [12,30], GenSoFNN [28], and POPFNN [18]. In local training structures, on the other hand, the output of the network is computed only by the "activated" neurons, i.e., those activated by the input pattern.


For each training step, only the subset of neurons that effectively contribute to the output is trained, which yields a more efficient algorithm and reduced computational costs. Examples of this category are EFuNN and dmEFuNN [10], and FCMAC [16].

In global training schemes, the learning process is concerned only with parameter-level adaptation within a fixed structure, and it is time consuming. For large-scale online data problems, it is too complicated to determine the optimal premise-consequent structures, rule numbers, and so on. Local learning structures, on the other hand, have fast learning ability with one-pass (single-epoch) training. Moreover, an important feature of these structures is local generalization, which allows only a subset of the neurons to be changed to adapt to a new training pattern while the other neurons remain unchanged. This makes them highly suitable for online learning, which requires only the necessary changes to adapt to incoming data while the rest of the network retains the historical information [22].

4.1. System architecture

In this research, the online FCMAC-BYY, which inherits this local generalization, is built by incorporating the OBYY algorithm. The proposed model has five layers of neurons, as shown in Fig. 4. In this network model, all nodes and connections are created as input data are presented. The online BYY-based FCMAC is essentially a combination of fuzzy linguistic representation and the CMAC model. On the one hand, the proposed system has the localized learning ability and connectionist structure of the CMAC model, which uses mathematical relationships and mappings to learn weighted real values. On the other hand, it has the capability of human-oriented representation using linguistic fuzzy labels and fuzzy value outputs.

(1) Input Layer: The input variable of the network, X = [x1, x2, ..., xn], is obtained from the raw dataset as crisp values.

(2) Fuzzification Layer: The second layer represents the fuzzy quantization of each input variable.


To obtain fuzzy labels, the fuzzy input nodes transfer the input values into membership values, real numbers between 0 and 1, using the Gaussian function. Each node in this layer corresponds to one linguistic value (small, medium, large, etc.) of one of the input variables in Layer 1. All clusters in this layer can be created, adjusted, aggregated, and pruned dynamically based on the OBYY fuzzification.

(3) Rule Layer: The rule layer contains rule nodes that evolve through the learning process. Each combination of fuzzy clusters in the Fuzzification Layer becomes a rule node representing a fuzzy rule.

(4) Activation Layer: Each activated rule node in the Rule Layer becomes a firing neuron in this layer, and all the neurons here are fully connected to the Output Layer. A novel parameter named "firing degree" is introduced, related to the number of times a neuron is involved in the output calculation. This parameter is motivated by human behavior: not all neurons behave equally, and for a particular task the neurons that fire more frequently than others should maintain their stability; the details are described in the next sub-section.

Table 1. Prediction results of online learning models on Mackey–Glass test data

| Method | Fuzzy rules (DENFIS, online FCMAC-BYY), rule nodes (EFuNN), or units (others) | NDEI on test data |
|---|---|---|
| Neural gas | 1000 | 0.062 |
| RAN | 113 | 0.373 |
| RAN | 24 | 0.17 |
| ESOM | 114 | 0.32 |
| ESOM | 1000 | 0.044 |
| EFuNN | 193 | 0.401 |
| EFuNN | 1125 | 0.094 |
| DENFIS | 58 | 0.276 |
| DENFIS | 883 | 0.042 |
| DENFIS with rule insertion | 883 | 0.033 |
| Online FCMAC-BYY | 16 | 0.072 |

[Fig. 6. The Mackey–Glass time series dataset (x(t) for t = 0–6000).]


[Fig. 7. Traffic density of three straight lanes along the PIE: normalized density of Lanes 1–3 over time (min), with the cross-validation segments CV1, CV2, and CV3 marked.]

(5) Output Layer: The defuzzified values of the output variables are computed by the center of area (COA) defuzzification method.

[Fig. 8. Illustration of rules evolving during the training period at τ = 5 min: number of rules versus data index.]

4.2. Credit assignment applied to the online FCMAC-BYY

The learning speed plays an important role in producing a good learning model for real-time applications within real-time constraints. In this research, credit assignment, a reinforcement-style scheme, is introduced into the original CMAC to speed up the learning process. In conventional CMAC learning schemes, the errors are distributed equally to all the neurons activated by the current input pattern, as shown in Fig. 5. In this way, the historical contributions of individual neurons to past patterns are not taken into account. To overcome this shortcoming, credit assignment is applied to account for the responsibilities of individual neurons. In the learning process of the online FCMAC-BYY, a variable named f_freq is added to each neuron to count the number of times the neuron has been fired. Using this technique, neurons that are fired more frequently learn at a reduced rate; instead of an equal split, the weights are updated according to the firing records of all the activated neurons. The following formula is applied to update the weights in the online FCMAC-BYY:

$$w_j^{(t)} = w_j^{(t-1)} + \frac{\alpha\left(\sum_{l=1}^{n} f(l)\,/\,f(j)\right)}{\eta}\,\left(y_d^{(t)} - y_o^{(t)}\right) \qquad (14)$$

and

$$\eta = \sum_{l=1}^{n}\left(\frac{\sum_{m=1}^{n} f(m)}{f(l)}\right), \qquad (15)$$

where $w_j^{(t)}$ is the weight of the jth activated neuron at time t, α is the learning rate, $y_d^{(t)}$ and $y_o^{(t)}$ are the desired and calculated outputs, respectively, and f(l) returns the variable f_freq of the lth activated neuron. Using the normalizer η derived from the n activated neurons, Eq. (14) proportionally reduces the learning rate of an activated neuron as its firing frequency increases.

5. Experimental results

We now compare the performance of the online FCMAC-BYY with other representative fuzzy neural networks. Our experiments were conducted on the chaotic Mackey–Glass time series dataset and on a real-time traffic prediction application.

5.1. Mackey–Glass dataset

In this section, the online FCMAC-BYY is applied to modeling and predicting the future values of a chaotic time series, the Mackey–Glass (MG) dataset [13], which has been used as a benchmark in the areas of neural networks, fuzzy systems, and hybrid systems. The time series is created with the MG time-delay differential equation:

$$\dot{x}(t) = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t). \qquad (16)$$

To obtain values at integer time points, the fourth-order Runge–Kutta method was used to find the numerical solution to the above MG equation.
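A sketch reproducing such a series with fourth-order Runge–Kutta (holding the delayed term fixed within each step, a common simplification), together with the NDEI measure used below, is given here; the defaults follow the setup described next, and the function names are ours.

```python
import numpy as np

def mackey_glass(n_steps, tau=17.0, dt=0.1, x0=1.2):
    """Integrate Eq. (16) by fourth-order Runge-Kutta, with x(t) = 0 for t < 0."""
    delay = int(round(tau / dt))                  # delayed samples (tau = 17, dt = 0.1)
    x = np.zeros(n_steps + 1)
    x[0] = x0                                     # x(0) = 1.2
    f = lambda xt, xd: 0.2 * xd / (1.0 + xd ** 10) - 0.1 * xt
    for i in range(n_steps):
        xd = x[i - delay] if i >= delay else 0.0  # x(t - tau), zero before t = 0
        k1 = f(x[i], xd)
        k2 = f(x[i] + 0.5 * dt * k1, xd)
        k3 = f(x[i] + 0.5 * dt * k2, xd)
        k4 = f(x[i] + dt * k3, xd)
        x[i + 1] = x[i] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

def ndei(y_true, y_pred):
    """Non-dimensional error index: RMSE divided by the target's standard deviation."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)
```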

Here, we assume that the time step is 0.1, x(0) = 1.2, τ = 17, and x(t) = 0 for t < 0. The task is to predict the value x(t+85) from the input vector [x(t−18), x(t−12), x(t−6), x(t)] for any value of t. As shown in Fig. 6, the following experiment was conducted: 3000 data points, for t = 201–3200, are extracted from the time series and used as learning (training) data, while 500 data points, for t = 5001–5500, are used as test data. For each of the online models considered, the learning data are used for the online learning process, and the testing data are then used with the recall procedure. To demonstrate the performance of the proposed model, we compare against existing online learning models applied to the same task: Neural gas [5], RAN [17], ESOM, EfuNN, and DENFIS [10]. The non-dimensional error index (NDEI) [13], defined as the root mean-square error (RMSE) divided by the standard deviation of the target series, is used as the measure. As shown in Table 1, the proposed online FCMAC-BYY achieves a promising result with far fewer rules than the other models.

5.2. Traffic prediction

In this section, the online FCMAC-BYY is applied to a real-time traffic prediction application.


The data were collected at a site (Site 29) located at exit 15 along the eastbound Pan Island Expressway (PIE) in Singapore, using loop detectors embedded beneath the road surface [15]. There are a total of five lanes at the site: two exit lanes and three straight lanes for the main traffic. For this experiment, only the traffic flow data for the three straight lanes were considered. The traffic dataset has four input attributes: the time and the traffic density of each of the three lanes. The purpose of this simulation is to model the traffic flow trend at the site using the online FCMAC-BYY network, which is then used to predict the traffic density of each lane at time t+τ, where τ = 5, 15, 30, 45, and 60 min. Fig. 7 shows a plot of the traffic flow density data for the three straight lanes spanning a period of six days, from 16:20 on 5 September to 10:30 on 10 September 1996. For the simulation, three cross-validation groups of training and test sets are used: CV1, CV2, and CV3. The square of the Pearson product-moment correlation (denoted R²) [7] is used to compute the accuracy of the predicted traffic trends obtained with the online FCMAC-BYY network. The evolution of the rules of the proposed online FCMAC-BYY during the training period of lane 1 at τ = 5 min on CV1 is shown in Fig. 8.

[Fig. 9. Prediction and squared errors at τ = 5 min: actual versus FCMAC-BYY and online FCMAC-BYY predictions of lane 1 normalized density, and the corresponding squared errors, over time (min).]


Starting from a single rule, the number of rules increases rapidly and is pruned within the first 200 data points; it then remains unchanged over the last 250 data points as the structure reaches a stable state. The prediction and squared errors of the lane 1 density using the online FCMAC-BYY are shown in Fig. 9 for τ = 5 min and in Fig. 10 for τ = 60 min on CV1. The mean square errors (MSEs) against time interval τ = 5–60 min for lane 1, using CV1 as the training set, for the online FCMAC-BYY, FCMAC-BYY, and GenSoFNN-CRI(S) [28] are shown in Fig. 11. The "Var" indicator (the change in the average R² value from τ = 5 min to τ = 60 min, expressed as a percentage of the former) and the "Avg Var" indicator (the mean of the "Var" values across all three lanes) are used for benchmarking the various systems. These two indicators reflect the consistency of the predictions made by the benchmarked systems over different time intervals across the three lanes. The online FCMAC-BYY prediction model is compared with FCMAC-BYY, the Falcon class of networks [29], and the GenSoFNN-CRI(S) network [28]. Table 2 shows that the online FCMAC-BYY outperforms all the other networks.
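For clarity, a small sketch of the "Var" indicator as we read its definition (the drop in average R² from τ = 5 min to τ = 60 min, as a percentage of the former — the direction of the change is our assumption):

```python
def var_indicator(avg_r2_at_5min, avg_r2_at_60min):
    """'Var': change in average R^2 from tau = 5 min to tau = 60 min, in percent."""
    return 100.0 * (avg_r2_at_5min - avg_r2_at_60min) / avg_r2_at_5min

def avg_var(per_lane_var):
    """'Avg Var': mean of the per-lane 'Var' values across the three lanes."""
    return sum(per_lane_var) / len(per_lane_var)
```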

[Fig. 11. Mean squared errors of online FCMAC-BYY, FCMAC-BYY, and GenSoFNN-CRI(S) for τ = 5–60 min.]

[Fig. 10. Prediction and squared errors at τ = 60 min: actual versus FCMAC-BYY and online FCMAC-BYY predictions of lane 1 normalized density, and the corresponding squared errors, over time (min).]


Table 2. Simulation results of traffic prediction

| Network | Lane 1 Var (%) | Lane 2 Var (%) | Lane 3 Var (%) | Avg Var (%) |
|---|---|---|---|---|
| Falcon-FCM(CL) | 24.17 | 9.32 | 30.47 | 21.32 |
| Falcon-MLVQ(CL) | 36.41 | 25.94 | 30.21 | 30.85 |
| Falcon-FKP(CL) | 23.87 | 22.09 | 35.19 | 27.05 |
| Falcon-PFKP(CL) | 20.78 | 21.05 | 28.25 | 25.70 |
| Falcon-MART | 20.78 | 15.47 | 20.58 | 18.94 |
| GenSoFNN-CRI(S) | 19.64 | 19.58 | 21.09 | 20.10 |
| FCMAC-BYY | 10.75 | 8.95 | 15.36 | 11.68 |
| Online FCMAC-BYY | 10.34 | 8.57 | 13.84 | 10.92 |

6. Conclusion and future work

This paper presents a nature-inspired application of the Ying–Yang philosophy to the online fuzzification phase of the online FCMAC-BYY, in which the antecedents of the fuzzy rules are dynamically constructed and credit assignment is employed to speed up the learning process. The performance of the proposed model is validated on the Mackey–Glass dataset and on real traffic flow data. Simulation results show that the online FCMAC-BYY can approximate the data dynamics accurately and consistently. The proposed system demonstrated superiority when compared with Neural gas [5], RAN [17], ESOM, EFuNN, and DENFIS [10] on the Mackey–Glass dataset, and with the Falcon class of networks [29] and the GenSoFNN-CRI(S) network [28] on the traffic flow data. Our future work includes the application of the proposed online FCMAC-BYY to time series data analysis, such as speech signal processing [23].

References

[1] J.S. Albus, Data storage in the cerebellar model articulation controller (CMAC), Trans. ASME, Dyn. Syst. Meas. Control 97 (3) (1975) 228–233.
[2] J.S. Albus, A new approach to manipulator control: the cerebellar model articulation controller (CMAC), Trans. ASME, Dyn. Syst. Meas. Control 97 (3) (1975) 220–227.
[3] G.A. Carpenter, S. Grossberg, D.B. Rosen, Fuzzy ART: fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Netw. 4 (1991) 759–771.
[4] M.J. Er, C. Deng, Online tuning of fuzzy inference systems using dynamic fuzzy Q-learning, IEEE Trans. Syst. Man Cybern. Part B 34 (2004) 1478–1489.
[5] B. Fritzke, A growing neural gas network learns topologies, Adv. Neural Inform. Process. Syst. 7 (1995).
[6] J.B. Gao, S.R. Gunn, C.J. Harris, Mean field method for the support vector machine regression, Neurocomputing 50 (2003) 391–405.
[7] R.N. Goldman, J.S. Weinberg, Statistics: An Introduction, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[8] X. Hong, C.J. Harris, A neurofuzzy network knowledge extraction and extended Gram–Schmidt algorithm for model subspace decomposition, IEEE Trans. Fuzzy Syst. 11 (2003) 528–541.
[9] C.F. Juang, Combination of online clustering and Q-value based GA for reinforcement fuzzy systems, IEEE Trans. Fuzzy Syst. 13 (2005) 289–302.
[10] N. Kasabov, Evolving Connectionist Systems, Springer, 2003.
[11] N.M. Laird, A.P. Dempster, D.B. Rubin, Maximum-likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B 39 (1977) 1–38.
[12] C.-T. Lin, C.S.G. Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. 40 (1991) 1320–1336.
[13] M.C. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287–289.
[14] M.N. Nguyen, D. Shi, C. Quek, G.S. Ng, Traffic prediction using Ying–Yang fuzzy cerebellar model articulation controller, in: Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006.
[15] M.N. Nguyen, D. Shi, C. Quek, FCMAC-BYY: fuzzy CMAC using Bayesian Ying–Yang learning, IEEE Trans. Syst. Man Cybern. Part B 36 (2006) 1180–1190.
[16] M.N. Nguyen, J.F. Guo, D. Shi, ESOFCMAC: evolving self-organizing fuzzy cerebellar model articulation controller, in: Proceedings of the International Joint Conference on Neural Networks, 2006.
[17] J. Platt, A resource allocating network for function interpolation, Neural Comput. 3 (1991) 213–225.
[18] C. Quek, M. Pasquier, B.B.S. Lim, POP-TRAFFIC: a novel fuzzy neural approach to road traffic analysis and prediction, IEEE Trans. Intell. Transportation Syst. 7 (2006) 133–146.
[19] R.A. Redner, H.F. Walker, Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev. 26 (1984) 195–239.
[20] D. Shi, S.R. Gunn, R.I. Damper, Off-line handwritten Chinese character recognition by radical decomposition, ACM Trans. Asian Lang. Inf. Process. 2 (2003) 27–48.
[21] D. Shi, S.R. Gunn, R.I. Damper, Handwritten Chinese radical recognition using nonlinear active shape models, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 277–280.
[22] D. Shi, D.S. Yeung, J.B. Gao, Sensitivity analysis applied to the construction of radial basis function network, Neural Netw. 18 (2005) 951–957.
[23] D. Shi, F. Chen, G.S. Ng, J. Gao, The construction of wavelet network for speech signal processing, Neural Comput. Appl. 15 (2006) 217–222.
[24] K.S. Lum, M.N. Nguyen, D. Shi, GA based FCMAC-BYY model for bank solvency analysis, in: IEEE Congress on Evolutionary Computation (CEC), Singapore, 2007.
[25] D. Shi, C. Quek, R. Tilani, J. Fu, Product demand forecasting with a novel fuzzy CMAC, Neural Process. Lett. 25 (2007) 63–78.
[26] S.-F. Su, T. Tao, T.-H. Hung, Credit assigned CMAC and its application to online learning robust controllers, IEEE Trans. Syst. Man Cybern. Part B 33 (2003) 202–213.
[27] S.-F. Su, Z.-J. Lee, Y.-P. Wang, Robust and fast learning for fuzzy cerebellar model articulation controllers, IEEE Trans. Syst. Man Cybern. Part B 36 (2006) 203–208.
[28] W.L. Tung, C. Quek, PACL-FNNS: a novel class of falcon-like fuzzy neural networks based on positive and negative exemplars, in: C.T. Leondes (Ed.), Intelligent Systems: Technology and Applications. Fuzzy Systems, Neural Networks and Expert Systems, vol. II, CRC Press, Boca Raton, 2002, pp. 257–320 (Chapter 10).
[29] W.L. Tung, C. Quek, GenSoFNN: a generic self-organizing fuzzy neural network, IEEE Trans. Neural Netw. 13 (5) (2002).
[30] W.L. Tung, C. Quek, Falcon: neural fuzzy control and decision systems using FKP and PFKP clustering algorithms, IEEE Trans. Syst. Man Cybern. Part B 34 (2004) 686–695.
[31] D.H. Wang, P. Bao, Robust impulse control of uncertain singular systems by decentralized output feedback, IEEE Trans. Autom. Control 45 (2000) 500–503.
[32] S. Wu, M.J. Er, Y. Gao, A fast approach for automatic generation of fuzzy rules by generalized dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst. 9 (2001) 578–594.
[33] L. Xu, BYY harmony learning, structural RPCL, and topological self-organizing on mixture models, Neural Netw. 15 (2002) 1125–1151.


[34] L. Xu, Advances on BYY harmony learning: information theoretic perspective, generalized projection geometry, and independent factor autodetermination, IEEE Trans. Neural Netw. 15 (2004) 885–902.
[35] H. Xu, C.M. Kwan, L. Haves, J.D. Pryor, Real-time adaptive on-line traffic incident detection, Fuzzy Sets Syst. (1998) 173–183.

Minh Nhut Nguyen received the B.Eng. and M.Phil. degrees in Computer Engineering from Ho Chi Minh City University of Technology, Vietnam, in 2001 and 2005, respectively. He is currently pursuing a Ph.D. at the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include machine learning, fuzzy set theory, pattern recognition, and neural networks.

Daming Shi received the Ph.D. degree in mechanical control from Harbin Institute of Technology, China, and the Ph.D. degree in computer science from the University of Southampton, United Kingdom. He has been serving as an Assistant Professor at Nanyang Technological University, Singapore, since 2002. His current research interests include machine learning, medical image processing, pattern recognition, and neural networks. Dr. Shi is a co-chair of the Technical Committee on Intelligent Internet Systems, IEEE Systems, Man and Cybernetics Society.

Jiacai Fu is the Chair Professor and Dean of the School of Electric and Information Engineering, Heilongjiang Institute of Science and Technology. His current research interests include automation and control engineering.