Application of neural networks incorporated with real-valued genetic algorithms in knowledge acquisition

Fuzzy Sets and Systems 112 (2000) 85–97 www.elsevier.com/locate/fss

Mu-Chun Su*, Hsiao-Te Chang
Department of Electrical Engineering, Tamkang University, Tamsui, Taiwan
Received November 1997; received in revised form May 1998

* Corresponding author. Tel.: +886-2-2621-5656, ext. 2615; fax: +886-2-2622-1565. E-mail address: [email protected] (M.-C. Su).

Abstract

Often a major difficulty in the design of rule-based systems is the process of acquiring the requisite knowledge in the form of If–Then rules. This paper presents a class of fuzzy degraded hyperellipsoidal composite neural networks (FDHECNNs) that are trained to provide appealing solutions to the problem of knowledge acquisition. The values of the network parameters, after sufficient training, are then utilized to generate If–Then rules on the basis of preselected meaningful features. In order to avoid the risk of getting stuck in local minima during the training process, a real-valued genetic algorithm is proposed to train FDHECNNs. The effectiveness of the method is demonstrated on two problems, namely, the “truck backer-upper” problem and a real-world application of a hypothesis regarding the pathophysiology of diabetes. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Neural networks; Pattern recognition; Medicine; Fuzzy control

1. Introduction

Rule-based expert systems are a rather practical development in the artificial intelligence (AI) field. They are based on the premise that expertise can be encapsulated in a set of If–Then statements. Expert systems have already proven useful in many applications such as decision making, pattern recognition, speech understanding, fuzzy control, and so on. An area where expert systems find exciting applications is medical diagnosis, because many diagnostic processes are guided by precompiled If–Then rules. In many cases, we have to deal with data which are incomplete, imprecise, uncertain, or vague. In order to handle such kinds of data, probability-based methods are often adopted. An important and well-known example is the MYCIN expert system [24], which introduces the concept of “certainty factors” to deal with uncertainty. However, humans do not think in terms of probability values but in terms of such expressions as “large”, “very hot”, “slow”, and so on. This motivates the incorporation of fuzzy sets and/or fuzzy logic into expert systems, forming so-called fuzzy systems.

The fuzzy approach is, in a sense, matched to human reasoning or decision-making. The performance of rule-based systems (either conventional expert systems or fuzzy systems) is highly related to the completeness of the knowledge base. The construction of a complete knowledge base involves the process of knowledge acquisition. Traditionally, the design of rule-based expert systems involves a process of interaction between a domain expert and a knowledge engineer who formalizes the expert's knowledge as inference rules and encodes it in a computer. However, there are several difficulties in obtaining an adequate set of rules from human experts. Experts may not know, or may be unable to articulate, what knowledge they actually use in solving their problems. Often, the development of an expert system is time-consuming, so the process of building an expert system requires much effort. Another important problem is that it is difficult to determine whether the knowledge base is correct, consistent, and/or incomplete. One way to alleviate these problems is to use machine learning to automate the process of knowledge acquisition [20].

Neural networks are attracting a lot of interest in the scientific community because of their dynamical nature, robustness, capability of generalization, and fault tolerance. A very appealing aspect of neural networks is that they can inductively acquire concepts from examples; therefore, neural networks have long been considered a suitable framework for machine learning. Basically, a neural network is a massively parallel distributed processor that can improve its performance by adjusting its synaptic weights. In particular, a feedforward multilayer neural network with sufficient hidden nodes has been proven to be a universal approximator [4,8,14,22]. Although neural networks have many appealing properties, they have three main disadvantages. The first is that there is no systematic way to set up the topology of a neural network. The second is that it usually takes a long time to train a neural network. The third and most apparent disadvantage is that a trained neural network is unable to explain its response, because the knowledge is encoded in the values of the synaptic weights. In other words, a neural network cannot justify its response on the basis of explicit rules or a logical reasoning process.

There have been several attempts to overcome this problem. One approach is to interpret or extract rules from a trained backpropagation network [7]. The algorithm proposed by Bochereau and Bourgine can only extract some sets of Boolean logic rules [3]. Goodman et al. considered problems of extracting rules that relate to a set of discrete feature variables [12]. Gallant developed a neural network expert system, MACIE (MAtrix Controlled Inference Engine), which possesses features that are usually associated with conventional expert systems [9,10]. Basically, the above-mentioned methods extract crisp rules. A newer approach, which has been attracting the growing interest of researchers, is to integrate neural networks and fuzzy systems into an intelligent system. This neuro-fuzzy synergistic integration reaps the benefits of both neural networks and fuzzy systems [1,2,15,16,23,29]. The integrated systems possess the advantages of neural networks (e.g. learning abilities, capability of generalization, optimization abilities, and connectionist structures) and of fuzzy systems (e.g. high-level If–Then fuzzy rule thinking and reasoning), although each component technique also has its own disadvantages.
In this paper we propose to train fuzzy degraded hyperellipsoidal composite neural networks (FDHECNNs) so as to provide an appealing solution to the problem of knowledge acquisition. The values of the network parameters, after sufficient training, can then be utilized to generate fuzzy If–Then rules on the basis of preselected meaningful features. A FDHECNN is a two-layer feedforward neural network. As is well known, a multilayer feedforward neural network trained with a gradient-based algorithm is apt to get stuck in local minima during the training procedure. In order to increase the chance of arriving at the global minimum, genetic algorithms (GAs) provide a feasible alternative, since GAs have proved to be robust, domain-independent mechanisms for optimization. While GAs have the advantage of not getting stuck in local optima, conventional GAs have shortcomings. Conventional GAs require the natural parameter set of an optimization problem to be coded as a finite-length string over some finite alphabet; this results in imprecision and in the computational load caused by the coding and decoding processes. Besides, when the search space is large, GAs usually take a long time to get into the region of the global optimum and then arrive at it. These problems motivated us to propose a real-valued genetic algorithm which uses real-valued genes. This special real-valued GA is then utilized to train FDHECNNs.

This paper is organized into five sections. The following section, Section 2, introduces the characteristics of the class of FDHECNNs. Section 3 presents the proposed real-valued GA. Experimental results are given in Section 4. We first demonstrate how to use the real-valued GA to train a FDHECNN to extract fuzzy rules for backing a truck to a loading dock. We then apply FDHECNNs to an example of a hypothesis regarding the pathophysiology of diabetes. The last section of this paper presents a brief summary and conclusions based on our results.

2. Neuro-fuzzy systems

A fuzzy model describes the system under consideration by establishing relationships between the linguistic values of the input and output variables. These relationships are expressed in the form of If–Then fuzzy rules. The Mamdani fuzzy model [19], the Takagi–Sugeno fuzzy model [27], and the singleton fuzzy model are the three most widely employed types of fuzzy models. The major difference between these three fuzzy models lies in the consequents of their fuzzy rules, and thus their aggregation and defuzzification procedures differ accordingly. In general, any one of the above types of fuzzy models employs fuzzy grid partitions of the continuous domains of the input and output variables, as shown in Fig. 1. This partition strategy encounters problems when we have a moderately large number of inputs. First, the number of rules increases exponentially with the number of input and output variables and the number of partitions of the domains of these variables. In addition, Takagi and Hayashi stated that the grid partition strategy cannot capture the possible correlations between the input variables [26]. In order to overcome this weak point, they claimed that the input space should be partitioned appropriately in order to reflect the existence of such correlations. Therefore they suggested that the conventional antecedent of a fuzzy rule,

If x_1 is A_1 and ... and x_n is A_n,    (1)

should be modified to

If x is A,    (2)

where x = [x_1, x_2, ..., x_n]^T, and A_i and A are linguistic values characterized by a one-dimensional fuzzy set and a multidimensional fuzzy set defining an arbitrarily shaped fuzzy region in the input space, respectively. This type of fuzzy partition offers the possibility of capturing the correlations between the input variables. Fig. 2 illustrates such a multidimensional fuzzy partition. Obviously, the complexity of defining this kind of partition increases drastically. Takagi and Hayashi proposed training feedforward neural networks in order to implement this type of fuzzy partition. The price one has to pay in this case is that the synaptic weights of the trained network have no clear physical meaning.

In order to make a satisfactory trade-off between capturing the correlations and the complexity of representation, one option is to use an aggregation of hyperellipsoids to fit a multidimensional fuzzy set or a multidimensional fuzzy region, as shown in Fig. 3. It turns out that the fuzzy model so obtained can be viewed as a two-layer FDHECNN. The class of FDHECNNs is a modified version of the class of neural networks with quadratic junctions; the inherent architectural efficiency of this kind of neural network was demonstrated in [5,6,25]. The term "degraded hyperellipsoid" refers to a hyperellipsoid whose principal axes are parallel to the input coordinates. A two-layer feedforward fuzzy degraded hyperellipsoidal composite neural network (FDHECNN), shown in Fig. 4, is described by

net_j(x) = d_j^2 − Σ_{i=1}^n (w_{ji} x_i + b_{ji})^2,    (3)

m_j(x) = exp{−s_j^2 [max(d_j^2, d_j^2 − net_j(x)) − d_j^2]},    (4)

Fig. 1. An example of traditional fuzzy rule partitions.

Fig. 2. An example of arbitrarily shaped fuzzy rule partitions.

Fig. 3. An aggregation of ellipses to partition the input space.

Fig. 4. The symbolic representation of a two-layer FDHECNN.

and

Out(x) = Σ_{j=1}^J w_j m_j(x) + θ,    (5)

where b_{ji}, w_{ji}, and d_j are adjustable weights, s_j is the slope value which regulates how fast m_j(x) falls off, w_j is the connection weight from hidden node j to the output, and θ is a bias term. If we assume d_j is zero, then FDHECNNs are isomorphic to radial basis function (RBF) networks using Gaussian potential functions. RBF networks using Gaussian potential functions have been proven to be universal approximators [22]. The essential difference between the Gaussian function and m_j(x) is illustrated in Figs. 5 and 6. Obviously, m_j(x) offers more flexibility: its shape can change from the Gaussian function depicted in Fig. 5 to the step-like function depicted in Fig. 6. Therefore, FDHECNNs offer significantly more flexibility in approximating functions.
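To make the mapping concrete, here is a short Python sketch of the forward pass defined by Eqs. (3)–(5). It is our own rendering, not the authors' code; the NumPy representation and the variable names are assumptions.

```python
import numpy as np

def fdhecnn_output(x, W, B, d, s, w_out, theta):
    """Forward pass of a two-layer FDHECNN, following Eqs. (3)-(5).

    x     : (n,)   input pattern
    W, B  : (J, n) hidden-node weights w_ji and b_ji
    d, s  : (J,)   hyperellipsoid sizes d_j and slope values s_j
    w_out : (J,)   connection weights w_j from hidden nodes to the output
    theta : scalar bias term
    """
    # Eq. (3): net_j(x) = d_j^2 - sum_i (w_ji * x_i + b_ji)^2
    quad = np.sum((W * x + B) ** 2, axis=1)
    net = d ** 2 - quad

    # Eq. (4): membership is 1 inside the hyperellipsoid (net >= 0) and
    # decays outside it, with s_j setting how sharply it falls off.
    m = np.exp(-(s ** 2) * (np.maximum(d ** 2, d ** 2 - net) - d ** 2))

    # Eq. (5): weighted sum of the memberships plus the bias term.
    return float(np.dot(w_out, m) + theta), m
```

The second return value exposes the memberships m_j(x), which are exactly the quantities used when fuzzy rules are read off the trained network below.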

Fig. 5. An example of m_j(x) when d_j = 0.

Fig. 6. An example of m_j(x) when d_j is a small number.

After sufficient training, the values of the synaptic weights of a FDHECNN can be utilized to extract a set of If–Then fuzzy rules. The function m_j(x) measures the degree to which an input pattern x is close to the n-dimensional hyperellipsoid defined by Σ_{i=1}^n (w_{ji} x_i + b_{ji})^2 = d_j^2. We may therefore define a fuzzy set HE_j characterized by m_j(x) in the input space, with m_j(x) representing the grade of membership of x ∈ R^n in the fuzzy set HE_j. Then the presentation of an input pattern to the jth node is equivalent to firing the following fuzzy rule:

If (x is HE_j) Then Out(x) is w_j.    (6)

In order to numerically combine these fuzzy rules and to compute a crisp output, we use the following defuzzification procedure:

Out(x) = Σ_{j=1}^J w_j m_j(x) + θ.    (7)
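As an illustration of how rules of the form (6) can be read off a trained network, the sketch below (our own hypothetical helper, not part of the paper) rewrites each hidden node's hyperellipsoid Σ_i (w_{ji} x_i + b_{ji})^2 = d_j^2 in the center/semi-axis form used in Section 4, i.e. Σ_i (x_i − c_{ji})^2 / a_{ji}^2 = d_j^2 with c_{ji} = −b_{ji}/w_{ji} and a_{ji} = 1/|w_{ji}|, assuming all w_{ji} are nonzero.

```python
import numpy as np

def extract_rules(W, B, d, w_out, names):
    """Print one If-Then rule per hidden node, in the center/semi-axis form
    sum_i (x_i - c_ji)^2 / a_ji^2 = d_j^2 (cf. Eq. (6) and Section 4).

    W, B  : (J, n) trained weights w_ji and b_ji (all w_ji assumed nonzero)
    d     : (J,)   trained d_j values
    w_out : (J,)   consequent weights w_j
    names : list of n input-variable names
    """
    rules = []
    for j, (w_row, b_row) in enumerate(zip(W, B), start=1):
        centers = -b_row / w_row          # c_ji = -b_ji / w_ji
        axes = 1.0 / np.abs(w_row)        # a_ji = 1 / |w_ji|
        antecedent = " + ".join(
            f"({name} - {c:.2f})^2/({a:.2f})^2"
            for name, c, a in zip(names, centers, axes)
        )
        rules.append(
            f"Rule {j}: If x is near HE_{j}: {antecedent} = ({d[j-1]:.2f})^2 "
            f"Then Out(x) is {w_out[j-1]:.2f}"
        )
    return rules
```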

The defuzzification procedure defined by Eq. (7) is computationally equivalent to the most popular "center average defuzzification",

Out(x) = [Σ_{j=1}^J Out_j m_j(x)] / [Σ_{j=1}^J m_j(x)],    (8)
except for the bias term θ. The reason for including θ in the architecture of FDHECNNs is that it increases the efficiency of FDHECNNs.

The performance of a FDHECNN depends on the chosen fuzzy hyperellipsoids defined by Σ_{i=1}^n (w_{ji} x_i + b_{ji})^2 = d_j^2 and on the connection weights w_j, for 1 ≤ j ≤ J. The fuzzy hyperellipsoids should suitably sample the input domain and reflect the data distribution. There are two approaches to adjusting the weights. The first approach is to use gradient-based methods, which can be subdivided into two subclasses. The first subclass is to fix the weights b_{ji}, d_j, and w_{ji}, and to update only the connection weights w_j in real time using the recursive least mean-square (LMS) algorithm. However, it is advantageous to update all weights b_{ji}, d_j, w_{ji}, and w_j simultaneously, because this significantly improves the modeling capability: the boundaries of the hyperellipsoids can then be adjusted to sample the data distribution efficiently. On the other hand, such gradient-based training is prone to getting stuck in local minima, since the network's error surface is high-dimensional and usually contains many local minima. The second approach is based on genetic algorithms, which usually avoid local minima by searching in several regions simultaneously [11]. The only information they need is some performance value that determines how good a given set of weights is (no gradient information). We will discuss GAs in the following section.
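For the first subclass (fixed hyperellipsoids, output weights adapted online), a minimal sketch of the LMS step is given below. This is the generic LMS rule applied to the linear output layer of Eq. (5); the learning rate eta, the function name, and the decision to also adapt θ are our assumptions rather than details given in the paper.

```python
def lms_update(w_out, theta, m, target, out, eta=0.01):
    """One LMS step on the output layer of Eq. (5).

    w_out  : list of output weights w_j
    theta  : bias term (adapting it is an assumption)
    m      : list of hidden-node memberships m_j(x) for the current pattern
    target : desired output for the current pattern
    out    : network output Out(x) for the current pattern
    eta    : learning rate (assumed value)
    """
    error = target - out
    # Out(x) is linear in w_j and theta, so the stochastic-gradient (LMS) step is:
    new_w = [wj + eta * error * mj for wj, mj in zip(w_out, m)]
    return new_w, theta + eta * error
```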

3. Real-valued genetic algorithms

Genetic algorithms are derivative-free stochastic optimization methods based loosely on the concepts of natural selection. Their popularity can mainly be attributed to their freedom from dependence on functional derivatives. They use operations found in natural genetics to guide themselves through paths in the search space; the three main operations are reproduction, crossover, and mutation. GAs work on a population of individuals, each representing a possible (trial) solution to the problem. Each individual is assigned a "fitness score" according to how good a solution to the problem it is. Only individuals that are competitive (according to their fitness values) get the chance to survive long enough to produce offspring by crossover and mutation and thus to transfer their genetic material to the next generation. After a number of generations, the population contains members with better fitness scores; this is analogous to Darwinian models of evolution by random mutation and natural selection.

Conventional GAs usually represent trial solutions in the form of a discrete or binary string called a chromosome. For large problems, binary encoding results in very long strings, which can slow down the evolution process. On the other hand, if the string is not long enough, it is possible for a GA to get near the region of the global optimum but not to arrive at it. That is, one drawback of conventional GAs is that they have difficulty fine-tuning the parameters. Since most optimization problems in the physical world involve real-valued parameters, it is better to manipulate them directly in the original real-valued space instead of a discretized space. In the following, we first briefly describe the three operators composing the proposed real-valued GA and then generalize them to manipulate real-valued parameters. In our real-valued GA, each chromosome represents an n-dimensional vector composed of the parameters of the optimization problem to be solved.

Fig. 7. The proposed reproduction mechanism with the "bomb effect".

Fig. 8. An example of conventional two-point crossover mechanism.

3.1. Reproduction

Reproduction is an exploitative mechanism. Exploitation is achieved by favoring chromosomes that are more fit, so as to improve the chances of converging towards an optimal region. In conventional reproduction, individual strings are copied exactly into the new population according to their fitness values. In our approach, for each chromosome, only part of its offspring are exact copies of the original one; the remaining offspring are disturbed by a very small amount. In other words, the reproduction process produces a "bomb effect" that increases the chance of arriving at the global optimum. Fig. 7 illustrates the proposed reproduction mechanism with the bomb effect. There are many variations of the reproduction mechanism depending on the bomb effect. In one version, the bomb effect decreases as the number of generations increases, in order to avoid divergence and increase the chance of arriving at the global optimum.

3.2. Crossover

To exploit the potential of the current gene pool, we use crossover operators to generate new chromosomes that, we hope, retain good features from the previous generation. There are two types of conventional crossover operators: (1) one-point crossover, where a crossover point on the gene code is selected at random and the two parent chromosomes are interchanged at this point, and (2) two-point crossover, where two crossover points are selected and the part of the chromosome string between these two points is swapped to generate two children. Fig. 8 illustrates the conventional two-point crossover mechanism. In the real-valued setting, a crossover operator may either bring two chromosomes closer together or push them farther apart. Therefore, in our approach, the crossover mechanism is defined as follows:

Pull away:  x_1' = x_1 + δ(x_1 − x_2),  x_2' = x_2 − δ(x_1 − x_2),    (9)

or

Get closer:  x_1' = x_1 + δ(x_2 − x_1),  x_2' = x_2 − δ(x_2 − x_1),    (10)
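A minimal Python sketch of these two operators follows. It is our own rendering; the noise scale of the bomb effect, the range of δ, and the even split between exact and perturbed copies are assumptions, since the paper does not give concrete values.

```python
import numpy as np

rng = np.random.default_rng(0)

def reproduce_with_bomb(parent, n_copies, bomb_scale=0.01):
    """Reproduction with the 'bomb effect' (Section 3.1): some offspring are
    exact copies of the parent, the rest are perturbed by a very small amount."""
    exact = n_copies // 2                       # split ratio is assumed
    offspring = [parent.copy() for _ in range(exact)]
    for _ in range(n_copies - exact):
        offspring.append(parent + bomb_scale * rng.standard_normal(parent.shape))
    return offspring

def crossover(x1, x2, delta_max=0.1):
    """Real-valued crossover of Eqs. (9)-(10): with equal probability either
    pull the two chromosomes apart or move them closer together."""
    delta = rng.uniform(0.0, delta_max)         # small positive delta
    if rng.random() < 0.5:                      # pull away, Eq. (9)
        return x1 + delta * (x1 - x2), x2 - delta * (x1 - x2)
    else:                                       # get closer, Eq. (10)
        return x1 + delta * (x2 - x1), x2 - delta * (x2 - x1)
```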

where x_1 and x_2 are the two selected chromosomes that are going to undergo crossover, and x_1' and x_2' are the two chromosomes produced after crossing over x_1 and x_2. The parameter δ is a randomly chosen small positive number. There are also different versions of the proposed crossover mechanism. One option is to apply it on a dimension-by-dimension basis, tossing a fair coin to select the variables that undergo crossover. The other option is to apply it to whole chromosomes exactly as described by Eqs. (9) and (10).

3.3. Mutation

Mutation provides a mechanism for maintaining population diversity by spontaneously generating new chromosomes. The conventional way of implementing mutation is to flip a bit with a probability equal to a very low given mutation rate. In our approach, since the chromosomes are represented as n-dimensional vectors, the mutation mechanism proceeds by adding a small or large amount of noise to the selected chromosome. Again, we may choose to mutate the chromosome on a dimension-by-dimension basis or to add an n-dimensional noise vector directly to the original chromosome. It should be emphasized that the mutated chromosome should still lie in its domain space.

4. Experimental results

In this section, we present two examples to show how to use the proposed real-valued GA to train FDHECNNs: the truck backer-upper control problem and a medical diagnostic problem.

4.1. Example 1: Truck backer-upper control

It is a difficult task to back a truck up to a loading dock. The kinematics of the truck can be approximated by the following equations:

x(t + 1) = x(t) + cos[φ(t) + θ(t)] + sin[θ(t)] sin[φ(t)],    (11)

y(t + 1) = y(t) + sin[φ(t) + θ(t)] − sin[θ(t)] cos[φ(t)],    (12)

φ(t + 1) = φ(t) − arcsin( 2 sin[θ(t)] / b ),    (13)

where φ is the angle of the truck, b is the length of the truck (b = 4 in our experiments), x and y are the Cartesian position of the rear center of the truck, and θ is the control (steering) angle of the truck.

Fig. 9 shows a computer-screen image of the simulated truck and loading zone. The goal is to design a controller whose inputs are φ(t) ∈ [−90°, 270°] and x(t) ∈ [0, 20], and whose output is θ(t) ∈ [−40°, 40°], such that the final state will be (x_f, φ_f) = (10, 90°). In order to train a FDHECNN, a training data set composed of successful control examples is required. We first wrote a small program that simulates a controller based on a simple control strategy and generates several successful control trajectories from different initial states. A brief description of this simple controller is given below.

Fig. 9. Diagram of simulated truck and loading zone.

Fig. 10. Truck trajectories using the neural-fuzzy controller implemented as a two-layer FDHECNN.

Procedure Try and Error
Begin
  Read initial state [x(0), φ(0)]
  t := 0
  While (t < STEP) Do
    min_error := ∞
    For (θ := min_angle; θ < max_angle; θ := θ + 1)
      new_x := F_x(x(t), φ(t), θ)
      new_φ := G_φ(φ(t), θ)
      error := [Weight_x × (new_x − 10)^2 + Weight_φ × (new_φ − 90)^2]^(1/2)
      If (error < min_error)
        min_error := error
        best_angle := θ
      End If
    End For
    x(t + 1) := F_x(x(t), φ(t), best_angle)
    φ(t + 1) := G_φ(φ(t), best_angle)
    t := t + 1
  End While
End Procedure Try and Error

where F_x(·) is defined by Eq. (11) and G_φ(·) is defined by Eq. (13).
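For concreteness, here is a self-contained Python sketch of this data generator (our own rendering; the step limit, the 1-degree search resolution, and the error weights are assumed values that the paper does not specify).

```python
import math

B = 4.0  # truck length b used in the experiments

def step(x, phi_deg, theta_deg):
    """One step of the truck kinematics, Eqs. (11) and (13); y is not needed
    by the controller, whose inputs are only x and phi."""
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    x_next = x + math.cos(phi + theta) + math.sin(theta) * math.sin(phi)
    phi_next_deg = phi_deg - math.degrees(math.asin(2.0 * math.sin(theta) / B))
    return x_next, phi_next_deg

def generate_trajectory(x0, phi0, steps=200, w_x=1.0, w_phi=1.0):
    """Trial-and-error controller: at each step pick the steering angle in
    [-40, 40] degrees that brings (x, phi) closest to the dock state (10, 90)."""
    x, phi = x0, phi0
    samples = []                      # (x, phi, chosen theta) training triples
    for _ in range(steps):
        best_theta, min_error = None, float("inf")
        for theta in range(-40, 41):  # 1-degree resolution (assumed)
            nx, nphi = step(x, phi, theta)
            error = math.sqrt(w_x * (nx - 10.0) ** 2 + w_phi * (nphi - 90.0) ** 2)
            if error < min_error:
                min_error, best_theta = error, theta
        samples.append((x, phi, best_theta))
        x, phi = step(x, phi, best_theta)
    return samples
```

Samples of (x, φ, θ) collected along such trajectories form the training set for the FDHECNN.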

We generated 26 successful control trajectories starting from the following initial states (x, φ): (3, 20), (3, 30), (3, 40), (3, 60), (4, 40), (5, 50), (6, 20), (6, 40), (6, 60), (7, 70), (9, 20), (9, 40), (9, 60), (11, 110), (11, 120), (11, 140), (11, 160), (12, 120), (13, 130), (14, 120), (14, 140), (14, 160), (15, 150), (17, 120), (17, 140), and (17, 160). We then used the real-valued GA to train a two-layer FDHECNN with five hidden nodes to construct our neuro-fuzzy controller. In our experiments, the population size is 2000, the crossover probability is 0.6, and the mutation probability is 0.0333. After 200 generations, the following five fuzzy rules are extracted:

Rule 1: If (x, φ) is near HE_1^d: (x − 7.48)^2/(7.08)^2 + (φ − 34.35)^2/(73.52)^2 = (0.52)^2
        Then θ is 3.15.

Rule 2: If (x, φ) is near HE_2^d: (x − 11.84)^2/(3.06)^2 + (φ + 32.82)^2/(−73.46)^2 = (0.77)^2
        Then θ is −9.06.

Rule 3: If (x, φ) is near HE_3^d: (x − 12.87)^2/(6.62)^2 + (φ + 45.09)^2/(1.71)^2 = (0.57)^2
        Then θ is −8.76.

Rule 4: If (x, φ) is near HE_4^d: (x − 5.12)^2/(6.94)^2 + (φ − 46.81)^2/(−37.39)^2 = (0.92)^2
        Then θ is 10.64.

Rule 5: If (x, φ) is near HE_5^d: (x − 13.41)^2/(7.39)^2 + (φ + 77.91)^2/(59.54)^2 = (0.59)^2
        Then θ is −9.11.

Two randomly chosen initial states, (x_0, φ_0) = (3.5, 35) and (12.5, 125), were used to test the neuro-fuzzy controller. Fig. 10 illustrates the truck trajectories from these two initial states. The truck backer-upper problem has previously been solved by Nguyen and Widrow [21] using a two-layer neural network with 25 hidden nodes, by Kong and Kosko [17] using a fuzzy control strategy employing 35 fuzzy rules, and by Wang and Mendel [28] using a five-step procedure for constructing a fuzzy controller composed of 27 fuzzy rules. Based on a comparison of the number of rules or the number of hidden nodes, our approach of implementing a neuro-fuzzy controller as a two-layer FDHECNN seems very encouraging.

4.2. Medical diagnostic problem

Diabetes mellitus is one of the major non-communicable chronic diseases and, as such, it constitutes a major medical challenge. Diabetes mellitus is a syndrome involving both metabolic and vascular abnormalities. It results in a range of complications affecting the circulatory system, eyes, and nerves. There are two primary categories of diabetes mellitus: Type I, insulin-dependent diabetes mellitus (IDDM), and Type II, noninsulin-dependent diabetes mellitus (NIDDM). The pathogenesis of IDDM is well understood; nevertheless, little progress has been made in discovering the mechanisms which trigger NIDDM. In the hope of discovering a method of early detection of future NIDDM onset, and ultimately a means to delay or avert the onset of this catastrophic metabolic disease, Hansen et al. [13] are studying its pathogenesis in Rhesus monkeys. Currently, nine phases of progression have been identified in the development of Type II diabetes mellitus. These phases are identified by fluctuations in various measurements taken on the Rhesus monkeys. Unfortunately, these relationships are rather complex and difficult for human experts to describe. Therefore, the class of FDHECNNs was used to elicit the knowledge for building a diagnosis expert system and eventually free human experts from tedious diagnosis loads.

The data were collected from 42 monkeys which had been followed through the course of various phases of diabetes mellitus. The set of physiological variables used for phase identification is as follows: (1) age – monkey's age (years); (2) wgt – monkey's weight (kg); (3) fpg – fasting plasma glucose (mg/dl); (4) fpi – fasting plasma insulin (mU/ml); and (5) gdr – glucose disappearance rate. A total of 479 observations were used in the process of building the diagnosis expert system. All of the data entries contain the above five parameters and the corresponding diabetes phase classification identified by human experts.

Table 1
The number of rules and the generalization performance

Phase:                            1     2     3     4     5     6     7     8     9    Average
Number of rules                   3     5     4     3    10     3     2     3     3    4.0
Generalization performance (%)   95.8  91.7  83.8  89.6  89.6  93.8  93.8  91.7  91.7  91.2

Because of the low number of data points per phase, it was decided to partition the complete data set in a 90%/10% ratio for training and testing, respectively. Generalization performance is evaluated by presenting the testing patterns to the trained neural network and comparing the network's classification with the desired classification. The training procedure was implemented as a C program and run on a SUN SPARC II. In our simulations, the population size is 2000, the crossover probability is 0.55, and the mutation probability is 0.01. After 200 generations, the training procedure was terminated. The same data base was also used to train a backpropagation network with 39 nodes [18]. Table 1 tabulates the number of extracted rules for each phase and the generalization performance. The generalization performance is greatly improved from 70% (achieved by the trained backpropagation network) to 91.2% (achieved by the trained FDHECNNs). More importantly, we can build a diagnosis expert system directly from the trained FDHECNNs. The most representative rule for identifying each phase is given as follows, with x = (age, wgt, fpg, fpi, gdr)^T:

Phase 1: If x is near the HE defined by
(x_1 − 4.10)^2/(4.10)^2 + (x_2 − 5.10)^2/(5.10)^2 + (x_3 − 138.90)^2/(109.60)^2 + (x_4 − 197.90)^2/(197.90)^2 + (x_5 − 3.49)^2/(3.30)^2 = (1.00)^2
Then the monkey is in phase 1.

Phase 2: If x is near the HE defined by
(x_1 − 10.38)^2/(7.43)^2 + (x_2 − 15.72)^2/(7.27)^2 + (x_3 − 35.00)^2/(35.00)^2 + (x_4 − 51.54)^2/(17.03)^2 + (x_5 − 3.81)^2/(2.81)^2 = (1.00)^2
Then the monkey is in phase 2.

Phase 3: If x is near the HE defined by
(x_1 − 13.82)^2/(4.48)^2 + (x_2 − 29.98)^2/(10.03)^2 + (x_3 − 73.21)^2/(38.16)^2 + (x_4 − 284.24)^2/(256.47)^2 + (x_5 − 4.68)^2/(3.67)^2 = (1.00)^2
Then the monkey is in phase 3.

Phase 4: If x is near the HE defined by
(x_1 − 10.59)^2/(8.89)^2 + (x_2 − 12.43)^2/(2.88)^2 + (x_3 − 120.04)^2/(61.96)^2 + (x_4 − 346.14)^2/(263.56)^2 + (x_5 − 2.27)^2/(0.67)^2 = (1.00)^2
Then the monkey is in phase 4.

Phase 5: If x is near the HE defined by
(x_1 − 18.94)^2/(3.86)^2 + (x_2 − 26.03)^2/(7.63)^2 + (x_3 − 56.45)^2/(26.45)^2 + (x_4 − 343.09)^2/(105.96)^2 + (x_5 − 3.04)^2/(2.42)^2 = (1.00)^2
Then the monkey is in phase 5.

Phase 6: If x is near the HE defined by
(x_1 − 25.61)^2/(4.39)^2 + (x_2 − 18.01)^2/(4.86)^2 + (x_3 − 35.00)^2/(35.00)^2 + (x_4 − 142.50)^2/(55.92)^2 + (x_5 − 2.52)^2/(0.57)^2 = (1.00)^2
Then the monkey is in phase 6.

Phase 7: If x is near the HE defined by
(x_1 − 26.25)^2/(3.75)^2 + (x_2 − 13.08)^2/(10.58)^2 + (x_3 − 112.27)^2/(33.23)^2 + (x_4 − 313.39)^2/(286.39)^2 + (x_5 − 3.07)^2/(2.23)^2 = (1.00)^2
Then the monkey is in phase 7.

Phase 8: If x is near the HE defined by
(x_1 − 13.57)^2/(11.74)^2 + (x_2 − 5.10)^2/(5.10)^2 + (x_3 − 243.69)^2/(124.31)^2 + (x_4 − 502.33)^2/(491.68)^2 + (x_5 − 1.29)^2/(1.29)^2 = (1.00)^2
Then the monkey is in phase 8.

Phase 9: If x is near the HE defined by
(x_1 − 18.48)^2/(9.88)^2 + (x_2 − 13.30)^2/(12.80)^2 + (x_3 − 317.51)^2/(132.99)^2 + (x_4 − 23.2)^2/(23.2)^2 + (x_5 − 0.77)^2/(0.66)^2 = (1.00)^2
Then the monkey is in phase 9.
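Rules of this form can be applied directly as a nearest-hyperellipsoid classifier. The sketch below is our own illustration: only Phase 1's parameters are filled in, and a shared unit slope s is assumed because the per-rule s_j values are not reported in the paper. It evaluates the membership of Eq. (4) for each phase rule and returns the phase with the largest membership.

```python
import math

# (centers, semi-axes) per variable (age, wgt, fpg, fpi, gdr); Phase 1 values
# are taken from the rule above; the other phases would be filled in likewise.
PHASE_RULES = {
    1: ([4.10, 5.10, 138.90, 197.90, 3.49], [4.10, 5.10, 109.60, 197.90, 3.30]),
    # 2: (...), ..., 9: (...)
}

def membership(x, centers, axes, d=1.0, s=1.0):
    """Membership of Eq. (4) for a rule written in center/semi-axis form,
    sum_i ((x_i - c_i)/a_i)^2 = d^2; s is an assumed common slope value."""
    q = sum(((xi - c) / a) ** 2 for xi, c, a in zip(x, centers, axes))
    return math.exp(-(s ** 2) * (max(d ** 2, q) - d ** 2))

def classify(x):
    """Return the phase whose rule gives the highest membership for x."""
    return max(PHASE_RULES, key=lambda p: membership(x, *PHASE_RULES[p]))
```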

5. Concluding remarks

In this paper, a class of FDHECNNs is proposed to integrate neural networks and fuzzy systems so as to provide an appealing solution to the problem of knowledge acquisition. The propagation of the input values through a trained FDHECNN is equivalent to making fuzzy inferences with a rule base consisting of the fuzzy rules extracted from the values of the synaptic weights of the trained FDHECNN. In order to increase the chance of finding a global minimum when training a FDHECNN, a real-valued GA is proposed. The first example shows how to design a neuro-fuzzy controller directly from numerical data. The diagnostic example shows that a high prediction accuracy can be achieved by trained FDHECNNs. In addition, we can build a diagnosis expert system based on the extracted rules, so as to free doctors from tedious diagnosis loads.

Acknowledgements

We would like to thank Dr. DeClaris and Dr. Hansen for providing us with the diabetes data set.

References

[1] H. Bersini, J.P. Nordvik, A. Bonarini, A simple direct adaptive fuzzy controller derived from its neural equivalent, Proc. IEEE Int. Conf. Fuzzy Systems, vol. 1, San Francisco, 1993, pp. 345–350.
[2] J.C. Bezdek, S.K. Pal (Eds.), Fuzzy Models for Pattern Recognition, IEEE Press, New York, 1992.
[3] L. Bochereau, P. Bourgine, Extraction of semantic features and logical rules from a multilayer neural network, Proc. IJCNN 90, Washington DC, 1990, pp. 579–582.

[4] G. Cybenko, Approximation by superpositions of sigmoidal functions, Math. Control Signals Systems 2 (1989) 315–341.
[5] N. DeClaris, M.-C. Su, A novel class of neural networks with quadratic junctions, IEEE Int. Conf. on Systems, Man and Cybernetics, 1991, pp. 1557–1562 (received the 1992 Franklin V. Taylor Award).
[6] N. DeClaris, M.-C. Su, Introduction to the theory and application of neural networks with quadratic junctions, IEEE Int. Conf. on Systems, Man and Cybernetics, Chicago, USA, 1992, pp. 1320–1325.
[7] L.M. Fu, Rule learning by searching on adapted nets, AAAI'91, 1991, pp. 590–595.
[8] K. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Networks 2 (1989) 183–192.
[9] S.I. Gallant, Connectionist expert systems, Commun. ACM 31 (1988) 152–169.
[10] S.I. Gallant, Neural Network Learning and Expert Systems, MIT Press, Cambridge, MA, 1993.
[11] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[12] R.M. Goodman, C.M. Higgins, J.W. Miller, Rule-based neural networks for classification and probability estimation, Neural Comput. 4 (1992) 781–804.
[13] B.C. Hansen, N.L. Bodkin, Heterogeneity of insulin responses: phases leading to type 2 (noninsulin-dependent) diabetes mellitus in the Rhesus monkey, Diabetologia 29 (1986) 713–719.
[14] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989) 359–366.
[15] J.S.R. Jang, C.T. Sun, Functional equivalence between radial basis function networks and fuzzy inference systems, IEEE Trans. Neural Networks 4 (1) (1993) 156–159.
[16] Y. Kajitani, K. Kuwata, R. Katayama, Y. Nishida, An automatic fuzzy modeling with constraints of membership functions and a model determination for neuro and fuzzy model by plural performance indices, in: T. Terano, M. Sugeno, M. Mukaidono, K. Shigemasu (Eds.), Fuzzy Engineering toward Human Friendly Systems, IOS Press, Tokyo, 1991, pp. 586–597.
[17] S.G. Kong, B. Kosko, Comparison of fuzzy and neural truck backer-upper control systems, Proc. IJCNN-90, 1990, pp. 349–358.
[18] C.S. Lin, Implementation of a novel neural network design for a demanding medical problem, Master's Thesis, University of Maryland, College Park, 1991.
[19] E.H. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man–Mach. Studies 7 (1) (1975) 1–13.
[20] R.S. Michalski, R.L. Chilausky, Learning by being told and learning from examples: an experimental comparison, Int. J. Policy Anal. Inform. Systems 4 (2) (1980) 125–160.
[21] D. Nguyen, B. Widrow, The truck backer-upper: an example of self-learning in neural networks, IEEE Control Systems Mag. 10 (3) (1990) 18–23.
[22] J. Park, I.W. Sandberg, Universal approximation using radial-basis-function networks, Neural Comput. 3 (1991) 246–255.
[23] W. Pedrycz, Fuzzy neural networks and neurocomputations, Fuzzy Sets and Systems 56 (1993) 1–28.
[24] E.H. Shortliffe, Computer-Based Medical Consultations: MYCIN, Elsevier, New York, 1976.
[25] M.C. Su, Network training using quadratic neural-type junctions, Master's Thesis, University of Maryland, August 1990.
[26] H. Takagi, I. Hayashi, NN-driven fuzzy reasoning, Int. J. Approx. Reason. 5 (3) (1991) 191–212.
[27] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Systems Man Cybernet. 15 (1) (1985) 116–132.
[28] L.X. Wang, J.M. Mendel, Generating fuzzy rules from numerical data, with applications, USC SIPI Report, no. 169, 1991; also, IEEE Trans. Systems Man Cybernet. 22 (6) (1992) 1414–1427.
[29] P.J. Werbos, Neurocontrol and fuzzy logic: connections and designs, Int. J. Approx. Reason. 6 (1992) 185–219.