Application of fuzzy Naive Bayes and a real-valued genetic algorithm in identification of fuzzy model


Information Sciences 169 (2005) 205–226
www.elsevier.com/locate/ins

Yongchuan Tang a,b,*, Yang Xu b

a College of Computer Science, Zhejiang University, Hangzhou, Zhejiang 310027, PR China
b Department of Applied Mathematics, Southwest Jiaotong University, Chengdu, Sichuan 610031, PR China

Received 5 January 2003; received in revised form 8 March 2004; accepted 12 May 2004

Abstract

We present a method to identify a fuzzy model from data by using the fuzzy Naive Bayes and a real-valued genetic algorithm. The identification of a fuzzy model comprises the extraction of "if–then" rules followed by the estimation of their parameters. The parameters involved include those which determine the membership functions of the fuzzy sets and the certainty factors of the fuzzy if–then rules. In our method, once the fuzzy partition of the input–output space is given, the certainty factor of each rule is computed as the fuzzy conditional probability of the consequent conditioned on the antecedent, using the fuzzy Naive Bayes, which is a generalization of the Naive Bayes. The fuzzy model involves the rules characterized by the highest values of the certainty factors. Because the certainty factor of each rule is a fuzzy conditional probability, it reflects the inner relationship between the antecedent and the consequent. In order to improve the accuracy of the fuzzy model, the real-valued genetic algorithm is incorporated into our identification process. This process concerns the optimization of the membership functions occurring in the rules. Only the parameters of the membership functions of the fuzzy sets enter the real-valued genetic algorithm, since the certainty factor of each rule can be computed automatically. The performance of the model is shown for the backing-truck problem and the prediction of the Mackey–Glass time series.
© 2004 Elsevier Inc. All rights reserved.

Keywords: Fuzzy model; Fuzzy Naive Bayes; Real-valued genetic algorithm; Conditional probability; Backing-truck; Time series prediction

This work has been supported by the National Natural Science Foundation of China (Grant No. 60074014) and the Chinese 973 project (Grant No. 2002CB312106). Corresponding author: Yongchuan Tang. Address: College of Computer Science, Zhejiang University, Hangzhou, Zhejiang 310027, PR China. Tel./fax: +86 0571 87952639. E-mail address: [email protected] (Y. Tang). doi:10.1016/j.ins.2004.05.004

1. Introduction

When additional parameters are incorporated into fuzzy modelling, the power of knowledge representation is strengthened greatly, since in applications knowledge acquisition often involves imprecision [25]. The imprecision may derive from several sources: the knowledge acquisition process used, the availability of the domain experts, and the knowledge representation and reasoning method being employed. It is therefore necessary to introduce some additional parameters. A popular parameter in fuzzy modelling is the certainty factor of each fuzzy if–then rule, which describes how certain the relationship between the antecedent and the consequent of the rule is. The certainty factors can also fine-tune the final fuzzy model, since the influence of a certainty factor is local [16]. Although the advantage of the certainty factor is clear, its determination is still a difficult problem.

In general, the identification of the fuzzy model is a key problem in applications. There are mainly two approaches to this problem. One is to directly summarize the operators' or experts' experience and translate their knowledge into fuzzy rules; the knowledge acquisition and verification processes, however, are difficult and time-consuming. The other is to obtain fuzzy rules through machine learning, by which knowledge can be automatically generated or extracted from sample cases or examples.

In previous research on the automatic extraction of fuzzy models from data, the certainty factors were determined independently [1,2]; that is, those works did not take into account the explicit relationship between the certainty factors and the fuzzy partition of the input–output space. Omitting this relationship increases the number of parameters that must be determined independently, and fails to give theoretical guidance in designing the fuzzy model. One possible way to relate the certainty factors to the other parts of the fuzzy model is the fuzzy Naive Bayes presented in Section 3.3.


Research on Bayesian Belief Networks is an active field in current artificial intelligence. Uncertainty due to randomness can be dealt with by the theory of probability (subjective or objective), and human knowledge in this situation can be represented by a Bayesian Belief Network. Because of its mathematical foundation, the Bayesian Belief Network has become an important tool for reasoning under uncertainty; it has been used in expert systems [8] and in classification systems [5]. It should be clear that uncertainty often includes both fuzziness and randomness, and the two often coexist. Neither the tools based on fuzzy set theory nor the techniques based on probability theory are a cure-all for every kind of uncertainty, so the two theories can live side by side in providing tools for uncertainty analysis in complex, real-world problems; that is, they can operate together to deal with uncertainty [3,4,14,20,22,24,26]. Many researchers have discussed this cooperation: Zadeh first provided the concepts of a fuzzy event and its probability [26], the term "fuzzy Bayesian inference" was introduced in [20] to mean the generalization of Bayesian statistics to fuzzy data, and an in-depth discussion of this topic was presented in [24].

A closer observation of the fuzzy Naive Bayes reveals that fuzzy if–then rules can be extracted from it, so the certainty factor of each rule can be explained as the fuzzy conditional probability of the consequent conditioned on the antecedent. An in-depth analysis shows that this explanation is reasonable; furthermore, we can extract a concise and consistent fuzzy model from all possible rules according to the certainty factors. Thus the relationship between the certainty factors and the other parts of a fuzzy model can be built up, and the problem of rule extraction is solved at the same time. The only remaining problem is to determine the fuzzy partition of the input–output space.

Although many machine learning methods exist for learning the fuzzy partition of the input–output space, we take the real-valued genetic algorithm as our alternative, since genetic algorithms (GAs) have been proven to be robust, domain-independent mechanisms for optimization. Genetic algorithms mimic the natural evolution process and have a high probability of arriving at the global solution. Conventional GAs require the optimized parameters to be coded as a finite-length string over some finite alphabet, which results in low precision and time-consuming computation due to the coding and decoding processes. Besides, when the search space is large, conventional GAs usually take a long time to reach the region of the global optimum. These problems motivate us to use a real-valued genetic algorithm, which uses real-valued genes; this real-valued genetic algorithm is utilized to tune the fuzzy partition of the input–output space.

The paper is organized as follows. Section 2 describes the fuzzy model we consider. Section 3 introduces the Naive Bayes and the fuzzy Naive Bayes. Section 4 presents the identification process of the fuzzy model using the fuzzy Naive Bayes. Section 5 describes the implementation of a real-valued genetic algorithm in fuzzy modelling. Section 6 reports our experimental studies on the backing-truck control problem and the Mackey–Glass time series prediction problem.


2. Fuzzy modelling

Fuzzy modelling is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping then provides a basis from which a decision can be made or patterns discerned. There are two types of fuzzy models, or fuzzy inference systems: Mamdani-type and Sugeno-type. The two types vary somewhat in the way outputs are determined. In this investigation we use the Mamdani-type fuzzy model.

Assume we have a complex, nonlinear multiple-input single-output relationship (see Fig. 1). The Mamdani-type fuzzy model allows us to represent this system by partitioning the input space and the output space. Thus, if $X_1, X_2, \ldots, X_n$ are the input variables and $Y$ is the output variable, we can represent the nonlinear function by a collection of $M$ rules of the form
$$R^{(r)}:\ \text{If } (X_1 \text{ is } A^1_r) \text{ and } (X_2 \text{ is } A^2_r) \text{ and } \cdots \text{ and } (X_n \text{ is } A^n_r) \text{ then } Y \text{ is } B_r \text{ with certainty factor } \alpha_r, \tag{1}$$
where, if $U_i$ is the universe of discourse of $X_i$, then $A^i_j$ is a fuzzy subset of $U_i$, and with $V$ the universe of discourse of $Y$, $B_i$ is a fuzzy subset of $V$. Here $r = 1, 2, \ldots, M$, where $M$ is the total number of rules, and $\alpha_r$ ($0 < \alpha_r \leq 1$) is the certainty factor of rule $R^{(r)}$.
Fig. 1. A basic system.

The output of the fuzzy model for a crisp input $(x_1, x_2, \ldots, x_n)$ is computed in the following steps:

(1) Calculate the firing level $s_j$ of each rule:
$$s_j = \bigwedge_i A^i_j(x_i) \quad \text{or} \quad s_j = \prod_i A^i_j(x_i). \tag{2}$$

(2) Associate the importance or certainty factor $\alpha_j$ with $s_j$:
$$p_j = s_j \cdot \alpha_j. \tag{3}$$

(3) Calculate the output of each rule as a fuzzy subset $F_j$ of $Y$, where
$$F_j(y) = p_j \wedge B_j(y). \tag{4}$$

(4) Aggregate the individual rule outputs to get a fuzzy subset $F$ of $Y$, where
$$F(y) = \bigvee_j F_j(y). \tag{5}$$

(5) Defuzzify the aggregate output fuzzy subset:
$$\bar{y} = \frac{\sum_i y_i F(y_i)}{\sum_i F(y_i)}. \tag{6}$$

The last step of the process obtains a singleton value for $Y$ representative of the set $F$; this step is usually called defuzzification. Yager and Filev [21] have investigated this issue in considerable detail. The most commonly used defuzzification procedure is the center of gravity (COA) method, whose calculation is given in the last step above.

In general, the certainty factor of each rule represents how certain the relationship between the antecedent and the consequent of the rule is. On the one hand, the knowledge representation power of a fuzzy model incorporating certainty factors is strengthened; on the other hand, the additional certainty factors make it convenient to fine-tune the fuzzy model. In order to get the expected performance of the system, a learning process is necessary. Given a rule base, tuning any fuzzy set $A$ will influence all rules that involve that particular fuzzy set, so such tuning schemes have a global impact on the rule base; but tuning only the certainty factors is not enough to reach the expected performance. A learning process therefore often begins by adjusting the fuzzy sets in the rule base, then proceeds to tune the certainty factors, or learns all parameters simultaneously [1,2,6,18,23]. A hard problem remains: how to choose an initial fuzzy rule base. Clearly, the number of learned parameters increases when the additional parameters are introduced. In fact, the certainty factors are related to the fuzzy sets in the fuzzy rule base. After describing the fuzzy Naive Bayes, we will give another explanation of the certainty factors.
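To make the inference procedure concrete, the following Python sketch implements steps (1)–(5) above for Gaussian membership functions on a sampled output universe. The rule encoding, the choice of the product firing level, and all names are our own illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def gauss(x, a, sigma):
    """Gaussian membership function in the style of Eq. (14)."""
    return np.exp(-(x - a) ** 2 / (2.0 * sigma ** 2))

def mamdani_infer(x, rules, y_grid):
    """Weighted Mamdani inference, steps (1)-(5).

    x      : 1-D array of crisp inputs [x_1, ..., x_n]
    rules  : list of (antecedents, consequent, alpha) where
             antecedents = [(a_1, sigma_1), ..., (a_n, sigma_n)],
             consequent  = (a, sigma) of the output fuzzy set B_r,
             alpha       = certainty factor of the rule
    y_grid : sampled points y_i of the output universe V
    """
    F = np.zeros_like(y_grid)
    for ants, (b_a, b_sigma), alpha in rules:
        # (1) firing level: here the product form of Eq. (2)
        s = np.prod([gauss(xi, a, sg) for xi, (a, sg) in zip(x, ants)])
        # (2) weight by the certainty factor, Eq. (3)
        p = s * alpha
        # (3) clip the consequent, Eq. (4)
        Fj = np.minimum(p, gauss(y_grid, b_a, b_sigma))
        # (4) aggregate by max, Eq. (5)
        F = np.maximum(F, Fj)
    # (5) center-of-gravity defuzzification, Eq. (6)
    return np.sum(y_grid * F) / np.sum(F)
```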


3. Fuzzy Naive Bayes

3.1. Bayesian Belief Network

First we introduce the concept of a Bayesian Belief Network. A Bayesian Belief Network consists of a graphical structure augmented by a set of probabilities. The graphical structure is a directed acyclic graph in which nodes represent domain variables with a finite number of possible values. Prior probabilities are assigned to source nodes, and conditional probabilities are associated with arcs. In particular, for each source node $X_i$ (i.e., a node without incoming arcs) there is a prior probability function $P(X_i)$; for each node $X_i$ with one or more direct predecessors $\pi_i$ there is a conditional probability function $P(X_i \mid \pi_i)$. These probability functions are represented in explicit tables called conditional probability tables (CPTs). A general Bayesian Belief Network is represented as $(V, A, P)$, where $V$ is the set of variables (i.e., vertices or nodes), $A$ the set of arcs between variables, and $P$ the set of probabilities. A Bayesian Belief Network is capable of representing the probabilities over any discrete sample space, such that the probability of any sample point in that space can be computed from the probabilities in the network.

The key feature of Bayesian Belief Networks is their explicit representation of the conditional independence among events. A Bayesian Belief Network represents a full joint probability space over the $n$ event variables in the network, and it states that the joint probability $p(X_1, X_2, \ldots, X_n)$ factorizes into the product of the conditional probabilities of the variables given their respective parents, i.e.,
$$p(X_1, X_2, \ldots, X_n) = \prod_i p(X_i \mid \pi_i). \tag{7}$$

Let $\theta_{ijk}$ denote $p(X_i = j \mid \pi_i = k)$, where $j$ is a value of variable $X_i$ and $k$ is a combination of the values of the parents of $X_i$. For convenience, we shall say that $k$ is a value of $\pi_i$ and call $\theta_{ijk}$ a parameter pertaining to variable $X_i$; $\theta$ is the vector of all parameters $\theta_{ijk}$. The parameter vector $\theta$ is to be estimated from a collection $D$ of data cases $D_1, D_2, \ldots, D_l$ that are independent given $\theta$, each data case being a set of variable–value pairs. The estimate $\theta^*$ of $\theta$ can be obtained by setting [9]
$$\theta^*_{ijk} = \frac{f(X_i = j, \pi_i = k)}{\sum_j f(X_i = j, \pi_i = k)} \quad \text{for all } i, j, \text{ and } k, \tag{8}$$
where $f(X_i = j, \pi_i = k)$ stands for the number of data cases in $D$ where $X_i = j$ and $\pi_i = k$.
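As a concrete illustration of Eq. (8), the sketch below estimates a conditional probability table by relative-frequency counting; the dictionary-based data representation is an assumption made for this example only.

```python
from collections import Counter

def estimate_cpt(data, var, parents):
    """Estimate theta_{ijk} = P(var = j | parents = k) by counting, Eq. (8).

    data    : list of dicts mapping variable names to discrete values
    var     : name of the variable X_i
    parents : list of names of the parents pi_i of X_i
    Returns a dict {(j, k): probability} with k a tuple of parent values.
    """
    joint = Counter()    # f(X_i = j, pi_i = k)
    margin = Counter()   # sum_j f(X_i = j, pi_i = k)
    for case in data:
        j = case[var]
        k = tuple(case[p] for p in parents)
        joint[(j, k)] += 1
        margin[k] += 1
    return {(j, k): n / margin[k] for (j, k), n in joint.items()}
```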


Although an arc from a node $X$ to a node $Y$ is frequently used to express that $X$ causes $Y$, this interpretation of arcs in Bayesian Belief Networks is not the only one possible. For example, $Y$ may be only correlated with $X$, not caused by it. Thus, although Bayesian Belief Networks are able to represent causal relationships, they are not restricted to causal interpretations. In this regard, Bayesian Belief Networks can be viewed as a representation for probabilistic rule-based systems. Many researchers have studied the structure and parameter learning problems of Bayesian Belief Networks [10,13] and their applications [8].

3.2. Naive Bayes

The simplest Bayesian Belief Network is the so-called Naive Bayes. It has just two levels of nodes and the simple structure $(V, A, P)$ shown in Fig. 2. The Naive Bayes has been successfully applied to classification problems with remarkable effect. It learns from observed data the conditional probability of each variable $X_i$ given the value of the variable $Y$. The probability $P(Y \mid X_1, \ldots, X_n)$ can then be computed by applying the Bayes rule; this computation is made feasible by the strong assumption that the variables $X_i$ are conditionally independent given the value of $Y$:
$$P(Y \mid X_1, \ldots, X_n) = \frac{P(X_1, \ldots, X_n \mid Y)\, P(Y)}{P(X_1, \ldots, X_n)} = \frac{\prod_i P(X_i \mid Y)\, P(Y)}{P(X_1, \ldots, X_n)}. \tag{9}$$

3.3. Fuzzy Naive Bayes

The fuzzy Naive Bayes is a simple and direct generalization of the Naive Bayes. Both have the same graphical structure (see Fig. 2); the only difference between them is that the variables in the fuzzy Naive Bayes are linguistic variables which can take fuzzy subsets as their values. Given the observed data set, Eq. (8) provides a method to estimate the conditional probabilities of the Bayesian Belief Network. In the following we consider the computation of the conditional probabilities of each variable in the fuzzy Naive Bayes.

Fig. 2. Naive Bayes: the class node $Y$ with child nodes $X_1, X_2, \ldots, X_n$.


In the fuzzy Naive Bayes, each variable takes linguistic values, each of which is associated with a membership function. Let $\{A^i_j : j = 1, 2, \ldots, k_i\}$ be the fuzzy partition of the domain of the variable $X_i$, and $\{B_i : i = 1, 2, \ldots, m\}$ the fuzzy partition of the domain of the variable $Y$. Assume the elements of the observed data set $D$ have the form $\mathcal{X} = [X, y]$, where $X$ is an $n$-dimensional vector $[x_1, x_2, \ldots, x_n]$. One way to compute the prior probabilities assigned to the node $Y$ is
$$P(Y = B_k) = \frac{\sum_{\mathcal{X} \in D} B_k(y)}{\sum_{k=1}^{m} \sum_{\mathcal{X} \in D} B_k(y)}. \tag{10}$$

The conditional probabilities of the other nodes can be estimated from the observed data as
$$P(X_i = A^i_j \mid Y = B_k) = \frac{\sum_{\mathcal{X} \in D} A^i_j(x_i)\, B_k(y)}{\sum_{j=1}^{k_i} \sum_{\mathcal{X} \in D} A^i_j(x_i)\, B_k(y)}, \tag{11}$$
where $x_i$ is the $i$th component of the vector $X \in D$. It should be pointed out that Eq. (11) is a direct generalization of Eq. (8).
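The following Python sketch estimates the probabilities of Eqs. (10) and (11) from data, using Gaussian membership functions as in Eq. (14); the array layout and parameter names are illustrative assumptions.

```python
import numpy as np

def gauss(x, a, sigma):
    # Gaussian membership function, as in Eq. (14)
    return np.exp(-(x - a) ** 2 / (2.0 * sigma ** 2))

def fit_fuzzy_naive_bayes(X, y, input_sets, output_sets):
    """Estimate the fuzzy Naive Bayes probabilities, Eqs. (10)-(11).

    X           : (T, n) array of input samples
    y           : (T,) array of output samples
    input_sets  : input_sets[i] is a list of (a, sigma) pairs, the k_i fuzzy
                  subsets A^i_j partitioning the domain of variable X_i
    output_sets : list of m (a, sigma) pairs, the fuzzy subsets B_k of Y
    Returns (prior, cond) with prior[k] = P(Y = B_k) and
    cond[i][j, k] = P(X_i = A^i_j | Y = B_k).
    """
    # memberships of every sample in every output fuzzy set: shape (T, m)
    B = np.column_stack([gauss(y, a, s) for (a, s) in output_sets])
    prior = B.sum(axis=0) / B.sum()                        # Eq. (10)
    cond = []
    for i, sets_i in enumerate(input_sets):
        # memberships of x_i in every A^i_j: shape (T, k_i)
        A = np.column_stack([gauss(X[:, i], a, s) for (a, s) in sets_i])
        num = A.T @ B                                      # sum over D of A(x_i) B(y)
        cond.append(num / num.sum(axis=0, keepdims=True))  # Eq. (11)
    return prior, cond
```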

4. Identifying the fuzzy model

In this section we discuss our method to extract a concise and consistent fuzzy model from data using the fuzzy Naive Bayes introduced in Section 3. For fuzzy rules of the form of Eq. (1), two main tasks must be completed: the first is deciding which rules should be extracted from all possible rules; the second is determining the certainty factor of each rule. These two tasks are in fact related. In our method we first determine the certainty factors of all possible rules, and then extract the executable rules according to the values of the certainty factors.

Considering the system and its fuzzy model of Sections 2 and 3, for each possible consequent there are $\prod_{i=1}^{n} k_i$ possible rules. For each possible rule of the form of Eq. (1), the certainty factor $\alpha_r$ can be computed as the conditional probability of the consequent conditioned on the antecedent by using the corresponding fuzzy Naive Bayes:
$$\alpha_r = P(Y = B_r \mid X_1 = A^1_r, X_2 = A^2_r, \ldots, X_n = A^n_r). \tag{12}$$

According to the Bayes rule, the following equation holds:
$$\begin{aligned}
P(Y = B_r \mid X_1 = A^1_r, \ldots, X_n = A^n_r)
&= \frac{P(X_1 = A^1_r, \ldots, X_n = A^n_r \mid Y = B_r)\, P(Y = B_r)}{P(X_1 = A^1_r, \ldots, X_n = A^n_r)} \\
&= \frac{\prod_{i=1}^{n} P(X_i = A^i_r \mid Y = B_r)\, P(Y = B_r)}{\sum_{B_r = B_1}^{B_m} \prod_{i=1}^{n} P(X_i = A^i_r \mid Y = B_r)\, P(Y = B_r)},
\end{aligned} \tag{13}$$

where the probabilities $P(X_i = A^i_r \mid Y = B_r)$ and $P(Y = B_r)$ are stored in the fuzzy Naive Bayes.

In order to achieve a concise and consistent rule base, we extract a few executable rules from all possible rules as the final rule base. There are two basic demands on the rule base: the first is that for each possible consequent there is at least one executable rule; the second is that the rules in the rule base have sufficiently large certainty factors. According to these two demands, rules are extracted from all possible rules as follows (a code sketch is given after this passage):

(1) Compute the certainty factor of each possible rule, using Eqs. (12) and (13).
(2) Extract rules from all possible rules. For each possible consequent "$Y$ is $B_i$" ($i = 1, 2, \ldots, m$), there are $\prod_{i=1}^{n} k_i$ possible rules with this consequent; we take the $s$ (for example, $s$ = 2 or 3) rules with the largest certainty factors. The rule base therefore contains $ms$ executable rules.

From the above procedure, as long as the fuzzy partition of the input–output space is given, the identification of the fuzzy model from data is simple and direct. An appropriate choice of $s$ guarantees a concise and consistent rule base. When machine learning methods are used to learn the fuzzy partition of the input–output space, the accuracy of the fuzzy model improves greatly. In the next section, we discuss the real-valued genetic algorithm that is incorporated into the learning process of the fuzzy model.
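Below is a minimal sketch of the extraction procedure, built on the fit_fuzzy_naive_bayes estimates sketched earlier (Eqs. (12) and (13)); the itertools enumeration and the rule tuple format are our own assumptions.

```python
import itertools
import numpy as np

def extract_rules(prior, cond, s=2):
    """Extract the ms executable rules with the largest certainty factors.

    prior : (m,) array, P(Y = B_k), from Eq. (10)
    cond  : list of n arrays, cond[i][j, k] = P(X_i = A^i_j | Y = B_k)
    s     : number of rules kept per consequent
    Returns a list of (antecedent_indices, consequent_index, alpha).
    """
    m = len(prior)
    candidates = []
    # enumerate all prod_i k_i antecedent combinations
    for ante in itertools.product(*[range(c.shape[0]) for c in cond]):
        # numerator of Eq. (13) for every consequent B_r at once
        joint = prior * np.prod([cond[i][j, :] for i, j in enumerate(ante)], axis=0)
        alpha = joint / joint.sum()          # Eq. (12) via Eq. (13)
        for r in range(m):
            candidates.append((ante, r, alpha[r]))
    rules = []
    for r in range(m):                       # keep the top s rules per consequent
        same = [c for c in candidates if c[1] == r]
        rules += sorted(same, key=lambda c: c[2], reverse=True)[:s]
    return rules
```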

5. Real-valued genetic algorithm

The genetic algorithm is a search and optimization technique based on the evolutionary principles of natural chromosomes.


The evolution of chromosomes through the action of crossover and mutation, and the natural selection of chromosomes based on Darwin's survival-of-the-fittest principle, are artificially simulated to constitute a robust search and optimization procedure. GAs are very different from the traditional search and optimization methods used in engineering design problems. Because of their simplicity, ease of operation, minimal requirements, and global perspective, GAs have been successfully used in a wide variety of problem domains [18].

GAs work in a simple way. They begin with a population of string structures created at random. Each string in the population is then evaluated. The population is operated on by three main operators––reproduction, crossover, and mutation––to create a (hopefully) better population. The new population is further evaluated and tested for termination. If the termination criteria are not met, the population is again operated on by the three operators and evaluated. This procedure continues until the termination criteria are met. One cycle of these operations and the evaluation procedure is known as a generation in GA terminology.

Conventional GAs usually represent the chromosome as a discrete or binary string. In general, the length of the string has a great influence on the accuracy of the solution and on the time consumption. In our optimization problem the optimized parameters of the fuzzy model are continuous, so we take the real-valued genetic algorithm as our optimization tool. The real-valued genetic algorithm manipulates the parameters in the original real-valued space, and each chromosome is a $2(\sum_{i=1}^{n} k_i + m)$-dimensional vector composed of the parameters of the optimization problem to be solved, where $n$ is the number of input variables of the fuzzy model, $k_i$ is the number of fuzzy subregions of the $i$th input variable, and $m$ is the number of fuzzy subregions of the output variable.

In Section 4 we discussed the method of identifying the fuzzy model from data given the fuzzy partition of the input–output space, and pointed out that the certainty factor of each rule is the fuzzy conditional probability of the consequent given the antecedent. In order to achieve an optimized fuzzy partition of the input–output space, we adopt the real-valued genetic algorithm to optimize the parameters which determine the shapes of the membership functions of the fuzzy sets in the input–output space. We have assumed that the domain of the $i$th variable $X_i$ is divided into $k_i$ fuzzy sets and the domain of the output variable $Y$ is divided into $m$ fuzzy sets. Though any type of membership function (e.g., triangle-shaped, trapezoid-shaped, or bell-shaped) could be used, we employ Gaussian fuzzy sets with the following membership functions:
$$A^i_j(x_i) = \exp\left(-\frac{(x_i - a^i_j)^2}{2(\sigma^i_j)^2}\right), \qquad B_i(y) = \exp\left(-\frac{(y - a_i)^2}{2(\sigma_i)^2}\right). \tag{14}$$


So each fuzzy set is determined by two parameters $(a, \sigma)$, and the total fuzzy partition of the input–output space is determined by $2(\sum_{i=1}^{n} k_i + m)$ parameters––$(\sum_{i=1}^{n} k_i + m)$ pairs of the form $(a, \sigma)$. In the real-valued genetic algorithm, therefore, each chromosome is a $2(\sum_{i=1}^{n} k_i + m)$-dimensional vector in its original form. We assume the population consists of $N$ individuals or chromosomes. Each chromosome in the population is evaluated by its fitness: a good chromosome has a high fitness, and a bad chromosome has a low fitness. For our optimization problem, we define the fitness of a chromosome as
$$F = \frac{1}{E + c} = \frac{1}{\sum_{j=1}^{T} (y_j - \bar{y}_j)^2 + c}, \tag{15}$$
where $F$ is the fitness of the chromosome, $E$ is the least-squares error function, $T$ is the total number of observed data, $y_j$ is the desired output, $\bar{y}_j$ is the output of the fuzzy model determined by this chromosome (see Eq. (6)), and $c$ is a positive number. The population is initialized by creating $N$ random $2(\sum_{i=1}^{n} k_i + m)$-dimensional real vectors. In the following we briefly describe the three operators acting on the population.

5.1. Reproduction

Reproduction is usually the first operator applied to a population. Reproduction selects good chromosomes from the population and forms a mating pool. The essential idea of this operation is that above-average chromosomes are picked from the current population and multiple copies of them are inserted into the mating pool. Since multiple copies of the good chromosomes are made, bad chromosomes are eliminated from further consideration. The reproduction operator thus acts as an exploiting operation on the good chromosomes of the population. In the real-valued genetic algorithm we take the commonly used proportionate selection operator, in which a chromosome of the current population is selected with a probability proportional to its fitness. Thus, the $i$th chromosome is selected with probability proportional to $F_i$, its fitness (see Eq. (15)). Since the population size is kept fixed, the cumulative selection probability over all chromosomes must be one; therefore, the probability of selecting the $i$th chromosome is $F_i / \sum_{i=1}^{N} F_i$, where $N$ is the population size.
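A minimal sketch of the fitness evaluation of Eq. (15) and the proportionate (roulette-wheel) selection just described; the model-evaluation callback predict is a placeholder assumption standing in for the fuzzy model of Section 2.

```python
import numpy as np

def fitness(chromosome, data, predict, c=1e-3):
    """Fitness of Eq. (15): F = 1 / (sum_j (y_j - ybar_j)^2 + c).

    predict(chromosome, x) is assumed to run the fuzzy model (e.g.,
    mamdani_infer) with the fuzzy partition decoded from the chromosome.
    """
    X, y = data
    y_hat = np.array([predict(chromosome, x) for x in X])
    return 1.0 / (np.sum((y - y_hat) ** 2) + c)

def select(population, fitnesses, rng):
    """Proportionate selection: pick N chromosomes with P_i = F_i / sum F_i."""
    probs = np.asarray(fitnesses) / np.sum(fitnesses)
    idx = rng.choice(len(population), size=len(population), p=probs)
    return [population[i].copy() for i in idx]
```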


5.2. Crossover

The crossover operator is applied next, to the chromosomes of the mating pool; it is mainly responsible for the search aspect of the genetic algorithm. A number of crossover operators exist in GAs, but in almost all of them two chromosomes are picked from the mating pool at random and some portions of the chromosomes are exchanged between them. In a single-point crossover operator, this is performed by randomly choosing a crossing site along the chromosome and exchanging all genes to the right of the crossing site. In a two-point crossover operator, two random sites are chosen and the contents bracketed by these sites are exchanged between the two parents. This idea can be extended to a multi-point crossover operator, the extreme of which is known as the uniform crossover operator. Our choice is the single-point crossover operator. However, in order to preserve some good chromosomes found in the mating pool, not all chromosomes in the population undergo crossover. Let $p_c$ be the crossover probability; then $p_c N$ chromosomes of the population are used in the crossover operation and $(1 - p_c)N$ chromosomes are simply copied to the new population.

5.3. Mutation

With a small mutation probability $p_m$, the mutation operator alters the value at each position along the chromosomes of the population. Mutation is needed to keep diversity in the population; furthermore, it may be found useful for the local improvement of a solution. In our approach, since the chromosomes are represented as $2(\sum_{i=1}^{n} k_i + m)$-dimensional real vectors, the mutation mechanism proceeds by adding a small or a large amount of noise to the selected positions of the chromosomes.
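A sketch of the two remaining operators for real-valued chromosomes; the Gaussian perturbation is our own assumption standing in for the paper's unspecified "small or large amount of noise".

```python
import numpy as np

def crossover(pool, p_c, rng):
    """Single-point crossover on p_c * N chromosomes; the rest are copied."""
    new_pop = [ch.copy() for ch in pool]
    n_cross = int(p_c * len(pool))
    for a, b in zip(range(0, n_cross, 2), range(1, n_cross, 2)):
        site = rng.integers(1, len(pool[a]))  # crossing site
        new_pop[a][site:] = pool[b][site:]
        new_pop[b][site:] = pool[a][site:]
    return new_pop

def mutate(population, p_m, scale, rng):
    """Perturb each gene with probability p_m by additive Gaussian noise."""
    for ch in population:
        mask = rng.random(len(ch)) < p_m
        ch[mask] += rng.normal(0.0, scale, size=mask.sum())
    return population

# One GA generation, combining the operators defined above:
# pop = select(pop, [fitness(ch, data, predict) for ch in pop], rng)
# pop = mutate(crossover(pop, p_c=0.8, rng=rng), p_m=0.1, scale=0.1, rng=rng)
```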

6. Experimental studies

In this section we test our approach on the backing-truck control problem and the Mackey–Glass time series prediction problem. We take the $\wedge$ operator to compute the firing level of the input to the fuzzy model; see Eq. (2).

6.1. Experiment one: backing truck control

It is a difficult task to back a truck up to a loading dock [19,15]. The kinematics of the truck can be approximated by the following equations:
$$x(t+1) = x(t) + \cos[\phi(t) + \theta(t)] + \sin[\theta(t)]\sin[\phi(t)], \tag{16}$$
$$y(t+1) = y(t) + \sin[\phi(t) + \theta(t)] - \sin[\theta(t)]\cos[\phi(t)], \tag{17}$$
$$\phi(t+1) = \phi(t) - \arcsin\left[\frac{2\sin[\theta(t)]}{b}\right], \tag{18}$$

Fig. 3. The diagram of the simulated truck: the loading zone spans $x = 0$ to $x = 20$, with the dock at $(x = 10, \phi = 90^\circ)$; the truck state is its position $(x, y)$, truck angle $\phi$, and steering angle $\theta$. (For color see online version.)

where $\phi$ is the angle of the truck, $b$ is the length of the truck ($b = 4$ in our experiments), $x$ and $y$ give the Cartesian position of the rear center of the truck, and $\theta$ is the steering angle of the truck. Fig. 3 shows a computer-screen image of the simulated truck and loading zone. The goal is to design a controller whose inputs are $\phi(t) \in [-90^\circ, 270^\circ]$ and $x(t) \in [0, 20]$, and whose output is $\theta(t) \in [-40^\circ, 40^\circ]$, such that the final state will be $(x_f, \phi_f) = (10, 90^\circ)$.

In order to design the fuzzy model––a fuzzy controller––we obtain training data by controlling the truck in a trial-and-error way. In our experiment we generated 13 successful control trajectories, whose initial states $(x, \phi)$ are (0, ±90), (0, 180), (5, ±30), (5, 120), (5, 210), (10, 60), (10, 90), (10, 145), (15, ±45), (15, 135). There are 643 data points of the form $(x, \phi, \theta)$ in the training data. We then use the real-valued genetic algorithm described in Section 5 to identify the parameters of the fuzzy model and, at the same time, the method described in Section 4 to identify the structure of the fuzzy model. In the real-valued genetic algorithm, the population size $N$ is 100, the crossover probability $p_c$ is 0.8, and the mutation probability $p_m$ is 0.1. Each variable domain is divided into 5 subregions by 5 Gaussian fuzzy subsets as defined in Eq. (14), so there are 125 possible rules. For each possible rule consequent, we extract the 2 rules with the largest certainty factors from the 25 possible rules with that consequent. After 1000 generations, we obtain an optimized fuzzy partition of the input–output space (see Table 1) and 10 rules as the final rule base of the fuzzy model (see Table 2).
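A sketch of the truck kinematics of Eqs. (16)–(18) driven by a controller; the degree/radian handling and the controller callback are our own assumptions (the callback could wrap, e.g., the mamdani_infer sketch over the 10 identified rules).

```python
import numpy as np

def truck_step(x, y, phi_deg, theta_deg, b=4.0):
    """One step of the truck kinematics, Eqs. (16)-(18); angles in degrees."""
    phi, theta = np.radians(phi_deg), np.radians(theta_deg)
    x_new = x + np.cos(phi + theta) + np.sin(theta) * np.sin(phi)
    y_new = y + np.sin(phi + theta) - np.sin(theta) * np.cos(phi)
    phi_new = phi - np.arcsin(2.0 * np.sin(theta) / b)
    return x_new, y_new, np.degrees(phi_new)

def run_trajectory(x, y, phi, controller, steps=300):
    """Back the truck from (x, y, phi); controller maps (x, phi) -> theta."""
    path = [(x, y, phi)]
    for _ in range(steps):
        theta = controller(x, phi)
        x, y, phi = truck_step(x, y, phi, theta)
        path.append((x, y, phi))
    return path
```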


Table 1
The distribution of fuzzy subsets of the fuzzy controller, $A^i_j = (a^i_j, \sigma^i_j)$ and $B_j = (a_j, \sigma_j)$

        x (i = 1)         φ (i = 2)         θ
j = 1   (5.328, 5.242)    (3.037, 0.218)    (0.613, 0.072)
j = 2   (9.528, 5.07)     (0.786, 0.836)    (0.691, 0.053)
j = 3   (2.704, 4.672)    (0.484, 1.483)    (0.524, 0.303)
j = 4   (4.683, 3.14)     (1.811, 1.242)    (0.211, 0.053)
j = 5   (15.6, 7.156)     (1.026, 0.725)    (0.211, 0.056)

Table 2
The rule base of the fuzzy controller

        Antecedent                Consequent
        if x is     and φ is      then θ is     With certainty factor
R(1)    $A^1_4$     $A^2_5$       $B_1$         0.622
R(2)    $A^1_3$     $A^2_5$       $B_1$         0.564
R(3)    $A^1_5$     $A^2_5$       $B_2$         0.604
R(4)    $A^1_2$     $A^2_5$       $B_2$         0.538
R(5)    $A^1_4$     $A^2_1$       $B_3$         1.000
R(6)    $A^1_1$     $A^2_1$       $B_3$         1.000
R(7)    $A^1_3$     $A^2_4$       $B_4$         0.093
R(8)    $A^1_3$     $A^2_2$       $B_4$         0.085
R(9)    $A^1_3$     $A^2_4$       $B_5$         0.089
R(10)   $A^1_3$     $A^2_2$       $B_5$         0.081

Three randomly chosen initial states, $(x, \phi)$ = (3, 20), (8, 160), and (18, 80), are used to test the fuzzy model learned from the training data and identified by the real-valued genetic algorithm and the fuzzy Naive Bayes method. Fig. 4 illustrates the truck trajectories from these initial states. Further tests also show that the control results are successful and our method is encouraging. Kong and Kosko [7] reported that their multi-layer neural controller was trained with 35 training trajectories and needed more than 10,000 iterations to train; even after that laborious training, the truck controlled by the neural network sometimes followed an irregular path. Park and Pao [17] also designed a neural controller without a hidden layer to control the truck. The control performance improved, but two redundant input variables were used, and the neural network still needed 5000 training iterations. In comparison with the above-mentioned controllers, our fuzzy controller, with 10 fuzzy rules and 1000 generations of training, always moves the truck properly without deviation from the path.

In Figs. 5 and 6 we show comparisons of the values of the control action $\theta$ generated by the classical fuzzy controller proposed in [7] and by our proposed fuzzy controller for the trajectories (18, 80) and (10, 120).


Fig. 4. Truck trajectories controlled by the fuzzy model identified by GAs and fuzzy Naive Bayes.

In these figures we can see that the control action of our proposed fuzzy controller is smoother and more human-like than that of the classical fuzzy controller of [7].

6.2. Time series prediction

We construct a fuzzy model to predict a time series generated by the following Mackey–Glass (MG) time-delay differential equation:
$$\dot{x}(t) = \frac{0.2\, x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\, x(t). \tag{19}$$

This time series is chaotic, with no clearly defined period. The series neither converges nor diverges, and the trajectory is highly sensitive to initial conditions. It is a benchmark problem in the neural network and fuzzy modelling research communities [18]. To obtain the time series values at integer points, we applied the fourth-order Runge–Kutta method to find the numerical solution of the above MG equation, assuming $x(0) = 1.2$, $\tau = 17$, and $x(t) = 0$ for $t < 0$.
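The series can be generated with a standard fourth-order Runge–Kutta integrator; the unit step size and the treatment of the delayed term as constant within each step are our own implementation assumptions, not details given in the paper.

```python
import numpy as np

def mackey_glass(n, tau=17, x0=1.2, dt=1.0):
    """Generate n samples of the Mackey-Glass series, Eq. (19), via RK4.

    The delayed term x(t - tau) is looked up in a history buffer and held
    constant within each step (a common simplification for dt = 1).
    """
    hist = int(tau / dt)
    x = np.zeros(n + hist)          # x(t) = 0 for t < 0
    x[hist] = x0                    # x(0) = 1.2
    f = lambda xt, xd: 0.2 * xd / (1.0 + xd ** 10) - 0.1 * xt
    for i in range(hist, n + hist - 1):
        xd = x[i - hist]            # x(t - tau)
        k1 = f(x[i], xd)
        k2 = f(x[i] + 0.5 * dt * k1, xd)
        k3 = f(x[i] + 0.5 * dt * k2, xd)
        k4 = f(x[i] + dt * k3, xd)
        x[i + 1] = x[i] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x[hist:]
```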


Fig. 5. Comparison of the control action $\theta \in [-\pi/2, 3\pi/2]$ between the classical fuzzy controller (dotted line) of [7] and our proposed fuzzy controller (solid line) for trajectory (18, 80). (For color see online version.)


Fig. 6. Comparison of control action h 2 [p/2,3p/2] between the classical fuzzy controller (dotted line) in [7] and our proposed fuzzy controller (solid line) for trajectory (10,120). (For color see online version.)

In time-series prediction we want to use known values of the time series up to some point in time, say $t$, to predict the value at some point in the future, say $t + P$. The standard method for this type of prediction is to create a mapping from $D$ sample data points, sampled every $\Delta$ units in time, $(x(t - (D-1)\Delta), \ldots, x(t - \Delta), x(t))$, to a predicted future value $x(t + P)$.
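A sketch of this windowing scheme; the function name and array conventions are our own.

```python
import numpy as np

def make_pairs(series, D=4, delta=6, P=6):
    """Build (X, y) pairs: X(t) = [x(t-(D-1)*delta), ..., x(t)], y(t) = x(t+P)."""
    start = (D - 1) * delta
    X = np.array([series[t - start : t + 1 : delta]
                  for t in range(start, len(series) - P)])
    y = series[start + P :]
    return X, y
```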


Following the conventional settings for predicting the MG time series, we set $D = 4$ and $\Delta = P = 6$. For each $t$, the input training data for the fuzzy model is the four-dimensional vector $X(t) = [x(t-18), x(t-12), x(t-6), x(t)]$, and the output training data is the trajectory prediction $y(t) = x(t+6)$. With $t$ ranging from 118 to 1117, there are 1000 input/output data values. We use the first 500 data values to train the fuzzy model (the training data set), while the others are used as checking data for testing the identified fuzzy model.

When constructing a fuzzy model for this prediction problem, the domain of each variable is divided into 4 fuzzy subregions, and together these subregions constitute a fuzzy partition over the five variable domains. Each fuzzy subregion is represented by a Gaussian membership function as defined in Eq. (14). As in the identification of the fuzzy controller for the backing truck, for each possible consequent we keep the 2 executable rules with that consequent having the largest certainty factors, so there are 8 rules in the rule base of the final fuzzy model.

It is notable that the fuzzy model with certainty factors identified by the method of Section 4 can predict the trend of the time series even without the GA learning process: for example, with a uniform fuzzy partition, the fuzzy model with 8 rules predicts the trend of the time series approximately (see Figs. 7 and 8); we have tried other fuzzy partitions, with similar results. We therefore expect the fuzzy model trained by the real-valued genetic algorithm to predict the time series more accurately. When the real-valued genetic algorithm is applied to train the fuzzy model, the crossover probability $p_c$ is 0.8, the mutation probability $p_m$ is 0.1, and the population size is 100. After 1000 generations the GA terminates and the final fuzzy model is obtained. Figs. 9 and 10 show the learning error and the testing error between the desired output and the predicted output. The final fuzzy partition and rule base are shown in Tables 3 and 4. A closer observation of the rule base reveals that each input space is in effect divided into only 3 fuzzy subregions, not 4: the fuzzy subsets $A^i_3$, $i = 1, 2, 3, 4$, have no effect on the rule base. This means that our method can reduce the redundancy of the fuzzy model.

In this experiment we notice that the prediction accuracy of the fuzzy model with certainty factors is superior to that of the fuzzy model without certainty factors. The rules in the two fuzzy models have the same antecedents and consequents; the difference between them is that the certainty factors in the first model are the fuzzy conditional probabilities, while in the second model they are all set to one. The computational cost of the first model is clearly higher than that of the second, but in practical applications the training phase is completed offline, and the prediction accuracy is more important.


Fig. 7. Mackey–Glass prediction result of the training data without GA learning: thin line is the desired output, thick line is the predicted output.


Fig. 8. Mackey–Glass prediction result of the testing data without GA learning: thin line is the desired output, thick line is the predicted output.

As mentioned above, the certainty factor of each fuzzy rule is explained as the conditional fuzzy probability of the consequent conditioned on the antecedent. This explanation reveals the inner relationship among the components of a fuzzy rule, and one method to identify the fuzzy model, based on this explanation, is presented in this paper. To our knowledge, previous work on fuzzy models failed to consider the relationship between the certainty factors and the bodies of the rules.


Fig. 9. The learning error. (For color see online version.)


Fig. 10. The testing error. (For color see online version.)

Our work presented in this paper is thus an attempt to build up this relationship. Of course, the conditional fuzzy probability is not the only possible explanation of the certainty factor; other explanations may exist, but none has been established so far.


Table 3
The distribution of fuzzy subsets of the fuzzy predictor, $A^i_j = (a^i_j, \sigma^i_j)$ and $B_j = (a_j, \sigma_j)$

        X1 (i = 1)        X2 (i = 2)        X3 (i = 3)        X4 (i = 4)        Y
j = 1   (0.966, 0.216)    (0.772, 0.134)    (0.792, 0.207)    (1.111, 0.294)    (0.997, 0.183)
j = 2   (1.251, 0.099)    (0.959, 0.331)    (0.528, 0.262)    (0.647, 0.201)    (0.455, 0.188)
j = 3   (1.165, 0.204)    (0.886, 0.283)    (0.781, 0.253)    (0.774, 0.293)    (1.279, 0.133)
j = 4   (0.758, 0.216)    (0.728, 0.257)    (0.935, 0.235)    (0.463, 0.199)    (0.764, 0.243)

Table 4
The rule base of the fuzzy predictor

        Antecedent                                            Consequent
        if X1 is    and X2 is    and X3 is    and X4 is       then Y is    With certainty factor
R(1)    $A^1_4$     $A^2_1$      $A^3_2$      $A^4_4$         $B_1$        0.536
R(2)    $A^1_4$     $A^2_1$      $A^3_2$      $A^4_2$         $B_1$        0.532
R(3)    $A^1_2$     $A^2_2$      $A^3_1$      $A^4_4$         $B_2$        0.530
R(4)    $A^1_2$     $A^2_2$      $A^3_4$      $A^4_4$         $B_2$        0.529
R(5)    $A^1_4$     $A^2_1$      $A^3_4$      $A^4_1$         $B_3$        0.488
R(6)    $A^1_4$     $A^2_1$      $A^3_1$      $A^4_1$         $B_3$        0.454
R(7)    $A^1_2$     $A^2_1$      $A^3_2$      $A^4_4$         $B_4$        0.599
R(8)    $A^1_2$     $A^2_1$      $A^3_2$      $A^4_2$         $B_4$        0.592

The method presented in this paper can also cooperate with other identification methods. Such a hybrid method works as follows: first, the antecedent and consequent of each fuzzy rule are identified from the training data set by a conventional identification method; then the certainty factor is computed via the conditional fuzzy probability. For example, we can adopt the method developed in [11] to construct the bodies of the rules, and use the method of this paper to obtain the values of the certainty factors.

7. Conclusions

In this paper we present a method to identify a fuzzy model from data. The key issue of this method is identifying the certainty factor of each fuzzy "if–then" rule. Previous work failed to consider the inner relationship between the certainty factors and the other parts of the fuzzy rules. In this paper, the certainty factor of each rule is computed as the fuzzy conditional probability of the consequent conditioned on the antecedent, using the fuzzy Naive Bayes. The final fuzzy model involves the rules with the highest values of the certainty factors. The probabilistic explanation of the certainty factors reduces the number of parameters that must be determined independently. In order to improve the accuracy of the fuzzy model, the real-valued genetic algorithm is incorporated into the identification process to train the fuzzy model. The performance of the model is shown for the backing-truck problem and the prediction of the Mackey–Glass time series.

References

[1] A. Blanco, M. Delgado, I. Requena, A learning procedure to identify weighted rules by neural networks, Fuzzy Sets and Systems 69 (1995) 29–36.
[2] F. Cheong, R. Lai, Constraining the optimization of a fuzzy logic controller using an enhanced genetic algorithm, IEEE Transactions on Systems, Man, and Cybernetics––Part B: Cybernetics 30 (2000) 31–46.
[3] J.A. Drakopoulos, Probabilities, possibilities, and fuzzy sets, Fuzzy Sets and Systems 75 (1995) 1–15.
[4] D. Dubois, H. Prade, Random sets and fuzzy interval analysis, Fuzzy Sets and Systems 42 (1991) 87–101.
[5] N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers, Machine Learning 29 (1997) 131–163.
[6] N. Kasabov, J. Kim, R. Kozma, A fuzzy neural network for knowledge acquisition in complex time series, Control and Cybernetics 27 (1998) 593–611.
[7] S.-G. Kong, B. Kosko, Adaptive fuzzy systems for backing up a truck-and-trailer, IEEE Transactions on Neural Networks 3 (2) (1992) 211–223.
[8] G.D. Kleiter, Bayesian diagnosis in expert systems, Artificial Intelligence 54 (1992) 1–32.
[9] G.D. Kleiter, Propagating imprecise probabilities in Bayesian networks, Artificial Intelligence 88 (1996) 143–161.
[10] P. Larranaga, C.M.H. Kuijpers, R.H. Murga, Y. Yurramendi, Learning Bayesian network structures by searching for the best ordering with genetic algorithms, IEEE Transactions on Systems, Man, and Cybernetics––Part A: Systems and Humans 26 (1996) 487–493.
[11] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller––Part 1, IEEE Transactions on Systems, Man, and Cybernetics 20 (1990) 404–419.
[12] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller––Part 2, IEEE Transactions on Systems, Man, and Cybernetics 20 (1990) 419–435.
[13] R.M. Neal, Connectionist learning of belief networks, Artificial Intelligence 56 (1992) 71–113.
[14] H.T. Nguyen, Fuzzy sets and probability, Fuzzy Sets and Systems 90 (1997) 129–132.
[15] D. Nguyen, B. Widrow, The truck backer-upper: an example of self-learning in neural networks, IEEE Control Systems Magazine 10 (1990) 18–23.
[16] K. Pal (née Dutta), N.R. Pal, Learning of rule importance for fuzzy controllers to deal with inconsistent rules and for rule elimination, Control and Cybernetics 27 (1998) 521–543.
[17] G.H. Park, Y.H. Pao, Training the neural-net controller with the help of trajectories generated from fuzzy rules (demonstrated with the truck backup task), Neurocomputing 18 (1998) 91–105.
[18] M. Russo, Genetic fuzzy learning, IEEE Transactions on Evolutionary Computation 4 (2000) 259–273.
[19] M.-C. Su, H.-T. Chang, Application of neural networks incorporated with real-valued genetic algorithms in knowledge acquisition, Fuzzy Sets and Systems 112 (2000) 85–97.
[20] R. Viertl, Is it necessary to develop a fuzzy Bayesian inference? in: R. Viertl (Ed.), Probability and Bayesian Statistics, Plenum Publishing Company, New York, 1987, pp. 471–475.
[21] R.R. Yager, On the issue of defuzzification and selection based on a fuzzy set, Fuzzy Sets and Systems 55 (1993) 255–272.


[22] R.R. Yager, D.P. Filev, Including probabilistic uncertainty in fuzzy logic controller modeling using Dempster–Shafer theory, IEEE Transactions on Systems, Man and Cybernetics 25 (1995) 1221–1230.
[23] R.R. Yager, D.P. Filev, Unified structure and parameter identification of fuzzy models, IEEE Transactions on Systems, Man and Cybernetics 23 (1993) 1198–1205.
[24] S. Frühwirth-Schnatter, On fuzzy Bayesian inference, Fuzzy Sets and Systems 60 (1993) 41–58.
[25] D.S. Yeung, E.C.C. Tsang, Weighted fuzzy production rules, Fuzzy Sets and Systems 88 (1997) 299–313.
[26] L.A. Zadeh, Probability measures of fuzzy events, Journal of Mathematical Analysis and Applications 10 (1968) 421–427.