
Fuzzy Sets and Systems 69 (1995) 125-139

Induction of fuzzy decision trees

Yufei Yuan a, Michael J. Shaw b

a Michael G. DeGroote School of Business, McMaster University, Hamilton, Ont., Canada L8S 4M4
b Beckman Institute of Advanced Technology and Science, University of Illinois, Urbana-Champaign, IL, USA

Received January 1994; revised May 1994

Abstract

Most decision tree induction methods used for extracting knowledge in classification problems do not deal with cognitive uncertainties such as vagueness and ambiguity associated with human thinking and perception. In this paper, cognitive uncertainties involved in classification problems are explicitly represented, measured, and incorporated into the knowledge induction process. A fuzzy decision tree induction method, based on the reduction of classification ambiguity with fuzzy evidence, is developed. Fuzzy decision trees represent classification knowledge in a way closer to human thinking and are more robust in tolerating imprecise, conflicting, and missing information.

Keywords: Possibility theory; Measures of information; Expert systems; Knowledge acquisition and learning

1. Introduction

Many methods have been developed for constructing decision trees from collections of examples [22]. Although the decision trees generated by these methods are useful in building knowledge-based expert systems, they often express and handle inadequately the vagueness and ambiguity associated with human thinking and perception. As pointed out by Quinlan [18], "the results (of decision trees) are categorical and so do not convey potential uncertainties in classification. Small changes in the attribute values of a case being classified may result in sudden and inappropriate changes to the assigned class. Missing or imprecise information may apparently prevent a case being classified at all". To overcome these shortcomings, Quinlan [18] suggested a probabilistic method to construct decision trees as probabilistic classifiers. Within his framework, inaccuracies of attribute values are treated as noise, branch thresholds are softened, and the final classifications are assigned either a central or a pessimistic probability estimate. The limitation of his framework, however, is that the uncertainties arising in classification problems are not necessarily probabilistic, appearing as randomness or noise. The various kinds of uncertainty can be classified into two broad categories: statistical and cognitive. Statistical uncertainty deals with information or phenomena which arise from the random



behaviour of physical systems. Cognitive uncertainty, unlike statistical uncertainty, deals with phenomena arising from human thinking, reasoning, cognition and perception processes, or cognitive information in general [5]. Cognitive uncertainty can be further classified into two subcategories: vagueness and ambiguity. In general, vagueness is associated with the difficulty of making sharp or precise distinctions in the world, i.e., some domain of interest is vague if it cannot be delimited by sharp boundaries. Ambiguity, on the other hand, is associated with one-to-many relations, i.e., situations with two or more alternatives such that the choice between them is left unspecified [8]. The objective of this paper is to explicitly represent, measure, and incorporate cognitive uncertainties into the knowledge induction process for classification problems. The rest of the paper is organized as follows. In Section 2, the representation of cognitive uncertainties in classification problems is introduced. In Section 3, the measures of vagueness and ambiguity are introduced. In Section 4, the truth level of fuzzy rules and the ambiguity of fuzzy classification are discussed. In Section 5, the method of inducing fuzzy decision trees is developed. To conclude, the advantages and limitations of fuzzy decision tree induction are discussed in Section 6.

2. The representation of cognitive uncertainties in classification problems

2.1. The classical classification problem

A typical classification problem can be described as follows. A universe of objects or cases U = {u} is described by a collection of attributes A = {A_1, ..., A_K}. Each attribute A_k measures some important feature of an object and is limited to a usually small set of discrete linguistic terms T(A_k) = {T_1^k, ..., T_{S_k}^k}. T(A_k) is, in other words, the domain of the attribute A_k. Each object u in the universe is classified by a set of classes C = {C_1, ..., C_L}. A classification rule can be written in the form:

IF (A_1 is T_{i_1}) AND ... AND (A_K is T_{i_K}) THEN (C is C_j).

A set of classification rules can be induced by using a machine learning method from a training set of objects whose class is known. The classification rules then can be used to classify objects based on the values of their attributes.

2.2. The vagueness and ambiguity involved in the classification problem

Most classification problems assume that each object takes one of the mutually exclusive values for each attribute and each object is classified into only one of the mutually exclusive classes [17]. As an example, an object of Saturday's weather (adapted from [17] with some modifications) can have four attributes

A = {Outlook, Temperature, Humidity, Wind } and each attribute has values

Outlook = {Sunny, Cloudy, Rain}, Temperature = { Cool, Mild, Hot }, Humidity = { Humid, Normal }, Wind = { Windy, Not_windy}. The classification can be the sport to play on the weekend, such as

C = {Swimming, Volleyball, Weight lifting}.


Since here all the attributes and classifications represent human perception and desire, they are vague by nature. For instance, people's feeling of cool, mild, and hot is vague and there is no crisp boundary between them. Although the vagueness of temperature can be avoided by numerical measurement, a rule induced with a crisp decision tree may then have an artificial crisp boundary, such as "IF temperature ≥ 20°C THEN swimming". But what about when the temperature is 19°C? Should a person definitely not go swimming? Obviously the artificial crisp boundary is not always desirable. Although there may be no vagueness between the sports swimming and volleyball, the classification, when interpreted as the desire to play, can still be vague. For instance, the weather can be perfect or just okay for playing volleyball. Classification ambiguity may also occur. For instance, the weather could be very good for both swimming and volleyball and one may find it hard to select one.

2.3. Fuzzy set theory

The cognitive uncertainties can be well represented by Zadeh's fuzzy set theory [27]. Some basic concepts are summarized here. Let U be a collection of objects denoted generically by {u}. U is called the universe of discourse and u represents the generic element of U.

Definition 1. A fuzzy set A in a universe of discourse U is characterized by a membership function μ_A which takes values in the interval [0, 1]. For u ∈ U, μ_A(u) = 1 means that u is definitely a member of A, μ_A(u) = 0 means that u is definitely not a member of A, and 0 < μ_A(u) < 1 means that u is partially a member of A. If either μ_A(u) = 0 or μ_A(u) = 1 for all u ∈ U, A is a crisp set.

Definition 2. Let A and B be two fuzzy sets in U with membership functions μ_A and μ_B, respectively. The union A ∪ B is defined for all u ∈ U by μ_{A∪B}(u) = max{μ_A(u), μ_B(u)}. The intersection A ∩ B is defined by μ_{A∩B}(u) = min{μ_A(u), μ_B(u)}. The complement of A, denoted Ā, is defined by μ_Ā(u) = 1 − μ_A(u). A is a subset of B if and only if μ_A(u) ≤ μ_B(u) for all u ∈ U.

Definition 3. The cardinality measure (or sigma count) of a fuzzy set A is defined by M(A) = Σ_{u∈U} μ_A(u), which is the measure of the size of A.
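To make the conventions of Definitions 1-3 concrete, here is a minimal Python sketch (ours, not part of the original paper); the min/max/sigma-count operations below are the ones used throughout the rest of the paper, with fuzzy sets represented simply as lists of memberships over U.

```python
def f_union(a, b):         # membership of A ∪ B (Definition 2)
    return [max(x, y) for x, y in zip(a, b)]

def f_intersection(a, b):  # membership of A ∩ B (Definition 2)
    return [min(x, y) for x, y in zip(a, b)]

def f_complement(a):       # membership of the complement of A (Definition 2)
    return [1.0 - x for x in a]

def cardinality(a):        # sigma count M(A) (Definition 3)
    return sum(a)

# two small fuzzy sets over a universe of four objects (illustrative values)
hot  = [1.0, 0.6, 0.8, 0.3]
mild = [0.0, 0.4, 0.2, 0.7]
print(f_intersection(hot, mild), cardinality(hot))   # -> [0.0, 0.4, 0.2, 0.3] and 2.7
```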

2.4. A fuzzy classification problem

Fuzzy concepts can be introduced into a classical classification problem if either the objects or the classes are fuzzy [16]. An object is said to be fuzzy if at least one of its features (attributes) is fuzzy. A class is said to be fuzzy if it can be represented in fuzzy terms. In the fuzzy classification problem we study here, both objects and classes are fuzzy. Each class C_i is defined as a fuzzy set on the universe of objects U. The membership function μ_{C_i}(u) specifies the degree to which object u belongs to class C_i. Each attribute A_k is defined as a linguistic variable which takes linguistic values from T(A_k) = {T_1^k, ..., T_{S_k}^k}. Each linguistic value T_s^k is also a fuzzy set defined on U. The membership μ_{T_s^k}(u) indicates the degree to which object u's attribute A_k is T_s^k. The membership of a linguistic value can be subjectively assigned or transformed from numerical values by a membership function defined on the range of the numerical value. An example of a small training data set with fuzzy membership values is shown in Table 1. It should be mentioned that membership is not probability, and the sum of the membership values of all linguistic terms for an attribute may not equal 1.


Table 1
A small training set

            Outlook               Temperature          Humidity           Wind                    Plan
Case   Sunny  Cloudy  Rain     Hot   Mild  Cool     Humid  Normal     Windy  Not_windy    Volleyball  Swimming  W_lifting
  1     0.9    0.1    0.0      1.0   0.0   0.0      0.8    0.2        0.4    0.6           0.0         0.8       0.2
  2     0.8    0.2    0.0      0.6   0.4   0.0      0.0    1.0        0.0    1.0           1.0         0.7       0.0
  3     0.0    0.7    0.3      0.8   0.2   0.0      0.1    0.9        0.2    0.8           0.3         0.6       0.1
  4     0.2    0.7    0.1      0.3   0.7   0.0      0.2    0.8        0.3    0.7           0.9         0.1       0.0
  5     0.0    0.1    0.9      0.7   0.3   0.0      0.5    0.5        0.5    0.5           0.0         0.0       1.0
  6     0.0    0.7    0.3      0.0   0.3   0.7      0.7    0.3        0.4    0.6           0.2         0.0       0.8
  7     0.0    0.3    0.7      0.0   0.0   1.0      0.0    1.0        0.1    0.9           0.0         0.0       1.0
  8     0.0    1.0    0.0      0.0   0.2   0.8      0.2    0.8        0.0    1.0           0.7         0.0       0.3
  9     1.0    0.0    0.0      1.0   0.0   0.0      0.6    0.4        0.7    0.3           0.2         0.8       0.0
 10     0.9    0.1    0.0      0.0   0.3   0.7      0.0    1.0        0.9    0.1           0.0         0.3       0.7
 11     0.7    0.3    0.0      1.0   0.0   0.0      1.0    0.0        0.2    0.8           0.4         0.7       0.0
 12     0.2    0.6    0.2      0.0   1.0   0.0      0.3    0.7        0.3    0.7           0.7         0.2       0.1
 13     0.9    0.1    0.0      0.2   0.8   0.0      0.1    0.9        1.0    0.0           0.0         0.0       1.0
 14     0.0    0.9    0.1      0.0   0.9   0.1      0.1    0.9        0.7    0.3           0.0         0.0       1.0
 15     0.0    0.0    1.0      0.0   0.0   1.0      1.0    0.0        0.8    0.2           0.0         0.0       1.0
 16     1.0    0.0    0.0      0.5   0.5   0.0      0.0    1.0        0.0    1.0           0.8         0.6       0.0

Vagueness of each linguistic term:
        0.19   0.37   0.21     0.22  0.35  0.13     0.32   0.32       0.41   0.41          0.27        0.31      0.18

Ambiguity of each attribute:
        Outlook 0.13            Temperature 0.17     Humidity 0.23     Wind 0.31            Plan 0.20

3. The measure of cognitive uncertainties

Once fuzzy sets are introduced, the cognitive uncertainties represented by fuzzy sets can be measured. Two cognitive uncertainty measures have been suggested in the literature [8]: the vagueness measure E_v and the ambiguity measure E_a.

3.1. The measures of vagueness

The vagueness or fuzziness of a fuzzy set can be measured by a fuzzy entropy [4], similar to Shannon's entropy measure of randomness [23].

Definition 4. Vagueness measurement: Let A denote a fuzzy set on the universe U with membership function μ_A(u) for all u ∈ U. If U is a discrete set U = {u_1, u_2, ..., u_n} and μ_i = μ_A(u_i), the vagueness or fuzziness of the fuzzy set A is defined by

$$E_v(A) = -\frac{1}{n}\sum_{i=1}^{n}\bigl[\mu_i \ln \mu_i + (1-\mu_i)\ln(1-\mu_i)\bigr]. \qquad (1)$$


E_v(A) measures the fuzziness or vagueness of a fuzzy set A. When μ_A(u) = 0.5 for all u ∈ U, E_v(A) takes its maximum value, which represents the greatest fuzziness. When μ_A(u) = 1 or 0 for all u ∈ U, E_v(A) = 0, which represents no fuzziness. The properties of the fuzziness measure are discussed in [8]. Since all linguistic terms and classes are represented as fuzzy sets, their vagueness can be measured using Definition 4. For instance, over all the objects in U, the vagueness of each linguistic term T_s^k can be measured by E_v(T_s^k) and the vagueness of each class C_i can be measured by E_v(C_i). The vagueness of each linguistic term and class in our small training data set is shown in Table 1. We see that in our example E_v(Mild) = 0.35 and E_v(Cool) = 0.13, which indicates that the term Mild is vaguer than the term Cool. In general, a rule with a vaguer condition may lead to a vaguer conclusion.
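As a concrete illustration, the vagueness measure of Eq. (1) can be computed with a few lines of Python. This is a minimal sketch (the function name is ours, not from the paper); the data are the memberships of the term Cool over the 16 cases of Table 1.

```python
import math

def vagueness(memberships):
    """Fuzzy entropy E_v of a fuzzy set given its membership values (Eq. (1))."""
    def h(m):
        # the term m*ln(m) is taken as 0 when m is 0 or 1
        return 0.0 if m in (0.0, 1.0) else -(m * math.log(m) + (1 - m) * math.log(1 - m))
    return sum(h(m) for m in memberships) / len(memberships)

# Membership of the linguistic term Cool for cases 1-16 (Table 1)
cool = [0.0, 0.0, 0.0, 0.0, 0.0, 0.7, 1.0, 0.8,
        0.0, 0.7, 0.0, 0.0, 0.0, 0.1, 1.0, 0.0]
print(round(vagueness(cool), 2))   # -> 0.13, the value reported in Table 1
```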

3.2. Possibility distribution and the measure of ambiguity

A fuzzy membership function μ(x) of a fuzzy variable Y defined on X can also be interpreted as the possibility that Y takes the value x among all elements of X (see [28]). In this case, π(x) = μ(x) for all x ∈ X can be viewed as a possibility distribution of Y on X. π(x) = 1 means that Y = x is fully possible and π(x) = 0 means that Y = x is fully impossible. The higher π(x) is, the more possible that Y = x. For any crisp set B ⊆ X, we define π(B) = max_{x∈B} π(x) as the possibility of taking a value for Y from B. We have π(A ∪ B) = max(π(A), π(B)) and π(A ∩ B) = min(π(A), π(B)). The possibility distribution π is normalized so that max_{x∈X} π(x) = 1.

The ambiguity or nonspecificity of a possibility distribution can be defined according to Higashi and Klir [7] as follows.

Definition 5. Ambiguity or nonspecificity measure: Let π = (π(x) | x ∈ X) denote a normalized possibility distribution of Y on X = {x_1, x_2, ..., x_n}. The possibilistic measure of ambiguity or nonspecificity is defined as

$$E_a(Y) = g(\pi) = \sum_{i=1}^{n} (\pi_i^* - \pi_{i+1}^*)\,\ln i, \qquad (2)$$

where π* = {π_1^*, π_2^*, ..., π_n^*} is the permutation of the possibility distribution π = {π(x_1), π(x_2), ..., π(x_n)} sorted so that π_i^* ≥ π_{i+1}^* for i = 1, ..., n, and π_{n+1}^* = 0. The nonspecificity measure defined here is also called U-uncertainty. It is the only function that satisfies nine requirements for a possibilistic measure of uncertainty, including expansibility, additivity, monotonicity, branching, and normalization [9, 10]. From the definition we have E_a(Y) ≥ 0. If π_2^* = 0, then E_a(Y) = 0, which indicates no ambiguity since only one value is possible for Y. If π_n^* = 1, then E_a(Y) = ln(n), which indicates that all values are fully possible for Y, representing the greatest ambiguity.

When there is overlapping between the linguistic terms of an attribute or between classes, ambiguity exists. To measure the ambiguity (overlapping) of an attribute A among its linguistic terms T(A) = {T_1, ..., T_S}, we interpret the memberships {μ_{T_1}(u_i), μ_{T_2}(u_i), ..., μ_{T_S}(u_i)} as a possibility distribution for object u_i to take a linguistic term on the term label space T(A) = {T_1, ..., T_S}. To normalize the possibility distribution, let

$$\pi_{T_s}(u_i) = \mu_{T_s}(u_i) \Big/ \max_{1 \le j \le S} \{\mu_{T_j}(u_i)\}, \qquad s = 1, \ldots, S. \qquad (3)$$

The ambiguity of the attribute A for object u_i can therefore be measured by

$$E_a(A(u_i)) = g(\pi_T(u_i)). \qquad (4)$$


The ambiguity of attribute A then is

$$E_a(A) = \frac{1}{m}\sum_{i=1}^{m} E_a(A(u_i)). \qquad (5)$$

The ambiguity of classes can be measured in the same way as attributes. As an example, to calculate the ambiguity of the sport Plan for u_1 (case 1) in the training data of Table 1, we have μ_Volleyball(u_1) = 0, μ_Swimming(u_1) = 0.8, μ_Weight_lifting(u_1) = 0.2. The maximum membership among the three sports is 0.8. The normalized possibility distribution of sports for u_1 is π_Volleyball(u_1) = 0/0.8 = 0, π_Swimming(u_1) = 0.8/0.8 = 1, and π_Weight_lifting(u_1) = 0.2/0.8 = 0.25. The sorted possibilities of the sports are π*_Plan(u_1) = {1, 0.25, 0}, and the Plan ambiguity for u_1 then is E_a(Plan(u_1)) = (1 − 0.25)·ln(1) + (0.25 − 0)·ln(2) + (0 − 0)·ln(3) = 0.17. With similar calculations, we have E_a(Plan(u_2)) = 0.49, E_a(Plan(u_3)) = 0.41, etc. The average ambiguity of Plan for the entire training set is E_a(Plan) = 0.20. The ambiguity of each attribute is listed in Table 1. It shows that the attribute Outlook is less ambiguous than the attribute Wind in our sample data. In general, deriving a less ambiguous classification requires less ambiguous data, but less ambiguous data may not guarantee a less ambiguous classification.
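The nonspecificity measure g and the normalization of Eq. (3) can be sketched in Python as follows (a minimal sketch; the function names are ours). It reproduces the worked value E_a(Plan(u_1)) = 0.17 from the memberships of case 1.

```python
import math

def nonspecificity(pi):
    """U-uncertainty g(pi) of a normalized possibility distribution (Eq. (2))."""
    p = sorted(pi, reverse=True) + [0.0]          # pi*_1 >= ... >= pi*_n, with pi*_{n+1} = 0
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

def attribute_ambiguity(term_memberships):
    """E_a of one attribute for one object: normalize (Eq. (3)), then apply g (Eq. (4))."""
    top = max(term_memberships)
    pi = [m / top for m in term_memberships]
    return nonspecificity(pi)

# Plan memberships of case 1: Volleyball 0.0, Swimming 0.8, Weight_lifting 0.2
print(round(attribute_ambiguity([0.0, 0.8, 0.2]), 2))   # -> 0.17, matching the worked example
```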

4. Fuzzy classification rules and classification ambiguity

4.1. Fuzzy rules and the degree of truth of fuzzy rules

Definition 6. A fuzzy rule takes the form "IF A THEN B", which defines a fuzzy relation from a condition fuzzy set A to a conclusion fuzzy set B.

An example rule could be: "IF Temperature is Hot AND Outlook is Sunny THEN Swimming". Here, "Temperature is Hot AND Outlook is Sunny" defines a composite condition fuzzy set A = Hot ∩ Sunny, and "Swimming" defines a conclusion fuzzy set B. A rule "IF A THEN B" being true means that A implies B, i.e. A ⇒ B. The implication operator can be interpreted in many different ways [21]. In our interpretation, the implication A ⇒ B holds if and only if A is B's subset, i.e. A ⊆ B. From Definition 2, A ⊆ B if and only if μ_A(u) ≤ μ_B(u) for all u ∈ U. When μ_A(u) > μ_B(u) for some u ∈ U, the rule will not be absolutely true but may still be partially true. The degree of truth of the rule "IF A THEN B" can therefore be measured by the subsethood of A in B.

Definition 7. The fuzzy subsethood S(A, B) measures the degree to which A is a subset of B [12]:

$$S(A, B) = \frac{M(A \cap B)}{M(A)} = \frac{\sum_{u \in U}\min(\mu_A(u), \mu_B(u))}{\sum_{u \in U}\mu_A(u)}. \qquad (6)$$

Kosko [13] proves the subsethood theorem

$$S(A, B) = \frac{M(B)\,S(B, A)}{M(A)}, \qquad (7)$$

which can be viewed as a fuzzy equivalent of Bayes' theorem

$$p(B \mid A) = \frac{p(B)\,p(A \mid B)}{p(A)}, \qquad (8)$$

where p(B|A) is the conditional probability of B given condition A.


It should be mentioned that to calculate the subsethood S(A, B), A and B should be defined on the same universe. In our classification problem, all classes and linguistic values are defined as fuzzy sets on the same universe of objects, so the subsethood calculation among them is valid. For instance, in our example, Hot and Sunny should be interpreted as Hot days and Sunny days, each a fuzzy set defined on the universe of all weekend days, and Swimming can be interpreted as Swimming days, a fuzzy set defined on the same universe of all weekend days. Obviously, if Hot and Sunny days is a subset of Swimming days, the rule "IF Temperature is Hot AND Outlook is Sunny THEN Swimming" is true. With the training data set in Table 1, we have S(Hot ∩ Sunny, Swimming) = 0.854, which can be used as the truth level of the rule "IF Temperature is Hot AND Outlook is Sunny THEN Swimming". The rule should be interpreted as: "If a weekend day's weather is Hot and Sunny, go swimming on that day", since Hot, Sunny and Swimming all refer to a particular day.
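A minimal Python sketch of the subsethood computation of Eq. (6) (the function name is ours); the column data are taken from Table 1 and the snippet reproduces the truth level 0.854 quoted above.

```python
def subsethood(a, b):
    """Fuzzy subsethood S(A, B) = M(A ∩ B) / M(A) (Eq. (6)); a, b are membership lists over U."""
    return sum(min(x, y) for x, y in zip(a, b)) / sum(a)

# Truth level of "IF Temperature is Hot AND Outlook is Sunny THEN Swimming" (Table 1)
hot      = [1.0, 0.6, 0.8, 0.3, 0.7, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.2, 0.0, 0.0, 0.5]
sunny    = [0.9, 0.8, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 1.0, 0.9, 0.7, 0.2, 0.9, 0.0, 0.0, 1.0]
swimming = [0.8, 0.7, 0.6, 0.1, 0.0, 0.0, 0.0, 0.0, 0.8, 0.3, 0.7, 0.2, 0.0, 0.0, 0.0, 0.6]

hot_and_sunny = [min(h, s) for h, s in zip(hot, sunny)]
print(round(subsethood(hot_and_sunny, swimming), 3))   # -> 0.854
```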

4.2. Classification possibility with fuzzy evidence

We view classification as a rational action that a decision maker will take consistently with his or her knowledge. The desirability of an action (not the probability of an outcome) is closely related to the concept of possibility. As Hagg [6] points out: "To a certain extent this possibility (or feasibility) of an action is described by the probabilities of the different outcomes. But the probability concepts and their estimation methods seem to be most narrowly connected with the situation where an inactive observer is regarding the outcome of random, i.e. more or less unpredictable and uncontrollable events. The possibility concept, on the other hand, is more connected with a situation where an active decision maker is facing obstacles, which to a certain extent baffle his endeavors". Using our example, if the weather is unknown, all sports are possible to take, and the ambiguity of the plan is the greatest. With the knowledge (a fuzzy evidence) that "the Outlook is Sunny", the possibility of taking each sport will be different and the ambiguity of the plan will be reduced. We now define fuzzy evidence and specify how to calculate the possibility and ambiguity of classification with fuzzy evidence.

Definition 8. In a classification problem, a fuzzy evidence is a condition fuzzy subset defined on the object space, which represents the linguistic values taken by one or more attributes. For instance, in our example, a fuzzy evidence E could be Hot ∩ Sunny, representing the condition that "the Temperature is Hot and the Outlook is Sunny".

Definition 9. Given fuzzy evidence E, the possibility of classifying an object to class C_i can be defined as

$$\pi(C_i \mid E) = S(E, C_i) \Big/ \max_{j} S(E, C_j), \qquad (9)$$

where S(E, C_i) represents the degree of truth of the classification rule "IF E THEN C_i", and π(C|E) = {π(C_i|E), i = 1, ..., L} is a normalized possibility distribution on the nonfuzzy label space C = {C_1, ..., C_L}.

For example, based on our sample data in Table 1, with fuzzy evidence "the Temperature is Hot", we have S(Hot, Volleyball) = 0.38, S(Hot, Swimming) = 0.67, and S(Hot, Weight_lifting) = 0.20. The normalized possibility distribution of Plan with fuzzy evidence Hot is π(Plan|Hot) = {0.56, 1, 0.29}.

4.3. Classification ambiguity with fuzzy evidence and fuzzy partitioning

Knowing a single evidence, such as a particular value of an attribute, the classification ambiguity can be defined as follows.


Definition 10. The classification ambiguity with fuzzy evidence E is defined as G(E) = g(π(C|E)), which is measured based on the possibility distribution π(C|E).

As an example, knowing π(Plan|Hot) = {0.56, 1, 0.29} calculated above, we have G(Hot) = g(π(Plan|Hot)) = 0.51, the classification ambiguity with the evidence "the Temperature is Hot".
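Putting Eq. (9) and Definition 10 together, the classification ambiguity with evidence Hot can be reproduced from the truth levels quoted above with a short Python sketch (names are ours, not the authors').

```python
import math

def nonspecificity(pi):
    """U-uncertainty g(pi) of a normalized possibility distribution (Eq. (2))."""
    p = sorted(pi, reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

# Truth levels S(Hot, C_i) from Section 4.2, normalized per Eq. (9)
s_hot = {"Volleyball": 0.38, "Swimming": 0.67, "Weight_lifting": 0.20}
top = max(s_hot.values())
pi_hot = {c: v / top for c, v in s_hot.items()}             # pi(Plan | Hot)
print(round(nonspecificity(list(pi_hot.values())), 2))      # -> 0.51 = G(Hot), Definition 10
```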

Knowing a set of evidences, such as the linguistic values of one attribute, we can partition all the objects based on the linguistic values of that attribute. A fuzzy partition can also be applied to another fuzzy set. For instance, Outlook = {Sunny, Cloudy, Rain} can be used to partition Hot. The fuzzy partition of a fuzzy set can be defined as follows.

Definition 11. Given a fuzzy evidence F and a set of fuzzy evidences P = {E_1, ..., E_k} defined on the object space U, the fuzzy partition of P on F is defined as P|F = {E_1 ∩ F, ..., E_k ∩ F}, where each object in F is partitioned to E_i with the membership μ_{E_i ∩ F}. When F = U we simply write P|U = P.

The classification ambiguity of a fuzzy partition can be defined as follows.

Definition 12. The classification ambiguity with fuzzy partitioning P = {E_1, ..., E_k} on fuzzy evidence F, denoted G(P|F), is the weighted average of the classification ambiguity over the subsets of the partition:

$$G(P \mid F) = \sum_{i=1}^{k} w(E_i \mid F)\, G(E_i \cap F), \qquad (10)$$

where G(E_i ∩ F) is the classification ambiguity with fuzzy evidence E_i ∩ F, and w(E_i|F) is the weight representing the relative size of the subset E_i ∩ F in F:

$$w(E_i \mid F) = M(E_i \cap F) \Big/ \sum_{j=1}^{k} M(E_j \cap F). \qquad (11)$$

As an example, we have G(Outlook|Hot) = w(Sunny|Hot)·G(Sunny ∩ Hot) + w(Cloudy|Hot)·G(Cloudy ∩ Hot) + w(Rain|Hot)·G(Rain ∩ Hot) = 0.52.

The overlapping between linguistic terms may lead to high classification ambiguity. To reduce ambiguity, the classification of an object should be determined based on its strong rather than weak evidences. Here, an evidence is strong if its membership exceeds a certain significant level.

Definition 13. Given fuzzy evidence E with membership μ_E(u), define E_α, the fuzzy evidence at significant level α, with the membership

$$\mu_{E_\alpha}(u) = \begin{cases} \mu_E(u) & \text{if } \mu_E(u) \ge \alpha, \\ 0 & \text{if } \mu_E(u) < \alpha. \end{cases} \qquad (12)$$

For a given set of fuzzy evidences {E_1, ..., E_k}, we partition a fuzzy set F at significant level α with the following definition.

Definition 14. Given a fuzzy evidence F and a set of fuzzy evidences P = {E_1, ..., E_k} defined on the object space U, the fuzzy partition at significant level α is defined as P_α|F_α = {E_{α1} ∩ F_α, ..., E_{αk} ∩ F_α}, where E_{αi} is the evidence E_i at significant level α and F_α is the evidence F at significant level α. The classification ambiguity with fuzzy evidence or fuzzy partition at significant level α is calculated in a similar way as in Definitions 10 and 12.


The significant level α provides a filter to reduce the ambiguity (the overlapping) in partitioning. The higher the α, the lower the ambiguity. For instance, at significant level α = 0.5, we have G(Hot) = 0.45 and G(Outlook|Hot) = 0.42, which are lower than the corresponding values calculated before at α = 0 (G(Hot) = 0.51 and G(Outlook|Hot) = 0.52). However, it should be noted that a high α may lead to the omission of some objects that have no evidence exceeding the significant level and that therefore do not belong to any subset of the partition. To keep the notation simple, we will omit the subscript α attached to each evidence while keeping in mind that a significant level α is applied to all the evidences under consideration. The classification ambiguity measure will be used to guide the search for classification rules in the next section.
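The following Python sketch (our own helper names, not the paper's) combines the significant-level filter of Eq. (12) with the partition ambiguity of Eqs. (10)-(11). With the Table 1 columns it reproduces the values quoted above: G(Hot) = 0.45 at α = 0.5 and G(Outlook|Hot) = 0.52 at α = 0.

```python
import math

def subsethood(a, b):
    return sum(min(x, y) for x, y in zip(a, b)) / sum(a)

def nonspecificity(pi):
    p = sorted(pi, reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

def at_level(e, alpha):
    """Evidence filtered at significant level alpha (Eq. (12))."""
    return [m if m >= alpha else 0.0 for m in e]

def g_evidence(e, classes):
    """G(E) = g(pi(C | E)) (Definition 10); classes maps class name -> membership list."""
    s = [subsethood(e, mu) for mu in classes.values()]
    top = max(s)
    return nonspecificity([v / top for v in s])

def g_partition(evidences, f, classes):
    """G(P | F): size-weighted average of G(E_i ∩ F) (Eqs. (10)-(11))."""
    subsets = [[min(x, y) for x, y in zip(e, f)] for e in evidences]
    sizes = [sum(s) for s in subsets]
    total = sum(sizes)
    return sum((m / total) * g_evidence(s, classes) for s, m in zip(subsets, sizes) if m > 0)

# Table 1 columns (memberships over the 16 cases)
sunny  = [0.9, 0.8, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 1.0, 0.9, 0.7, 0.2, 0.9, 0.0, 0.0, 1.0]
cloudy = [0.1, 0.2, 0.7, 0.7, 0.1, 0.7, 0.3, 1.0, 0.0, 0.1, 0.3, 0.6, 0.1, 0.9, 0.0, 0.0]
rain   = [0.0, 0.0, 0.3, 0.1, 0.9, 0.3, 0.7, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.1, 1.0, 0.0]
hot    = [1.0, 0.6, 0.8, 0.3, 0.7, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.2, 0.0, 0.0, 0.5]
plan = {
    "Volleyball":     [0.0, 1.0, 0.3, 0.9, 0.0, 0.2, 0.0, 0.7, 0.2, 0.0, 0.4, 0.7, 0.0, 0.0, 0.0, 0.8],
    "Swimming":       [0.8, 0.7, 0.6, 0.1, 0.0, 0.0, 0.0, 0.0, 0.8, 0.3, 0.7, 0.2, 0.0, 0.0, 0.0, 0.6],
    "Weight_lifting": [0.2, 0.0, 0.1, 0.0, 1.0, 0.8, 1.0, 0.3, 0.0, 0.7, 0.0, 0.1, 1.0, 1.0, 1.0, 0.0],
}

print(round(g_evidence(at_level(hot, 0.5), plan), 2))           # -> 0.45, G(Hot) at alpha = 0.5
print(round(g_partition([sunny, cloudy, rain], hot, plan), 2))  # -> 0.52, G(Outlook | Hot) at alpha = 0
```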

5. The induction of fuzzy decision trees

5.1. The main idea

We construct a fuzzy decision tree in the process of reducing classification ambiguity with accumulated fuzzy evidences. Each fuzzy evidence is the knowledge about a particular attribute. The selection of each additional fuzzy evidence is based on its contribution to reducing the classification ambiguity. The method is similar to nonfuzzy decision tree induction methods such as ID3 (see [20]), with the use of information entropy as the heuristic induction criterion replaced by the measurement of classification ambiguity.

It should be mentioned that the method suggested here is different from other fuzzy extensions of ID3 reported in the literature [2, 24-26]. In [24], the ID3 algorithm is incorporated into fuzzy modelling. The ID3 algorithm is used in a traditional way to select effective input (numerical) variables and their splitting intervals for output classification. These intervals are then used as fuzzy boundaries. Multiple regression is used in each subspace of the input space to form fuzzy rules, where the premise is the corresponding fuzzy subspace of the input space and the conclusion is a linear combination of fuzzified numerical input variables. In [2], the classification problem under study has numerical attributes and two complementary crisp classes (yes or no). A continuous ID3 algorithm is used to convert a decision tree into a layer of a feedforward neural network. In the neural network, a neuron with a sigmoid output function can be viewed as a hyperplane with a fuzzy boundary. Kosko's fuzzy entropy [13] is used to measure the fuzziness (i.e. the vagueness) of the classification by the neuron. (Here, the use of a vagueness measure rather than an ambiguity measure can be justified since the classification involves only one class and its complement.) Nodes within the hidden layer are generated until the fuzzy entropy is reduced to zero. (This can be achieved only when the classes become crisp.) New hidden layers are generated until there is only one node at the output layer. In [25, 26], the classification problem under study may have nominal or numerical attributes or conclusions. No fuzziness is involved with nominal data. Fuzzy-ID3 is applied for the cases where attributes or conclusions have numerical values. Numerical values are fuzzified to fuzzy terms before induction. The probability of a fuzzy event is used to replace the probability of a crisp term for numerical attribute values, and a generalized measure of disorder based on [4] is used as an entropy function for numerical decision values. The algorithm and its justification, however, are not reported in detail. It seems that the fuzzy entropy used there captures only the fuzziness (i.e. the vagueness) of the fuzzified decision variable, not the ambiguity of classification. The branches from a decision node appear to overlap, but the overlap is not treated as a fuzzy partitioning.

There are several differences between our approach and the above-mentioned approaches. In terms of the problems under study, our approach can handle classification problems with both fuzzy attributes and fuzzy classes represented in linguistic fuzzy terms. It can also handle other situations in a uniform way, where numerical values can be fuzzified to fuzzy terms and crisp categories can be treated as a special case of fuzzy terms with zero fuzziness. The major difference between our approach and other fuzzy ID3 methods is the use of


classification ambiguity as the fuzzy entropy. The classification ambiguity directly measures the quality of the classification rules at the decision node. It can be calculated under fuzzy partitioning and multiple fuzzy classes without any restrictions. Another advantage of our approach is the use of the significant level of evidence and the truth level threshold (to be discussed later), which provide effective control during the induction process.

The fuzzy decision tree induction process suggested here consists of the following steps:
(1) Fuzzifying the training data.
(2) Inducing a fuzzy decision tree.
(3) Converting the decision tree into a set of rules.
(4) Applying the fuzzy rules for classification.

5.2. Fuzzifying the training data

In a classification problem, the training data can be either categorical or numerical. When the data are numerical, they need to be fuzzified into linguistic terms. In fact, fuzzification is a process of conceptualization, which is often used by people to reduce information overload in the decision making process. For instance, numerical salary data may be perceived in linguistic terms such as high, average, and low. The membership functions can be approximately determined based on experts' opinions or people's common perception. Alternatively, the membership function may be derived from statistical data [3]. Fuzzy clustering based on self-organized learning can also be used to determine membership functions [11, 15]. Here a simple algorithm is used to generate triangular membership functions on numerical data. Assume attribute A has numerical value x. The numerical values of attribute A for all objects u ∈ U can then be represented by X = {x(u), u ∈ U}. We want to cluster X into k linguistic terms T_i, i = 1, ..., k. Each linguistic term T_i has a triangular membership function as follows:

$$\mu_{T_1}(x) = \begin{cases} 1, & x \le m_1, \\ (m_2 - x)/(m_2 - m_1), & m_1 < x < m_2, \\ 0, & x \ge m_2, \end{cases} \qquad (13)$$

$$\mu_{T_k}(x) = \begin{cases} 0, & x \le m_{k-1}, \\ (x - m_{k-1})/(m_k - m_{k-1}), & m_{k-1} < x < m_k, \\ 1, & x \ge m_k, \end{cases} \qquad (14)$$

$$\mu_{T_i}(x) = \begin{cases} 0, & x \le m_{i-1}, \\ (x - m_{i-1})/(m_i - m_{i-1}), & m_{i-1} < x \le m_i, \\ (m_{i+1} - x)/(m_{i+1} - m_i), & m_i < x < m_{i+1}, \\ 0, & x \ge m_{i+1}, \end{cases} \qquad 1 < i < k. \qquad (15)$$

The slopes of the triangular membership functions are selected so that adjacent membership functions cross at the membership value 0.5. In this case, the only parameters that need to be determined are the set of k centres M = {m_i, i = 1, ..., k}. The centres m_i can be calculated using Kohonen's feature-maps algorithm [11]. At time 0, the centres m_i[0] are initially set to be evenly distributed over the range of X:

$$m_i[0] = \min\{x, x \in X\} + \bigl(\max\{x, x \in X\} - \min\{x, x \in X\}\bigr)\cdot\frac{i-1}{k-1}, \qquad i = 1, \ldots, k.$$


The centres are then adjusted iteratively in order to reduce the total distance of X to M, defined as

$$D(X, M) = \sum_{x \in X} \min_i \| x - m_i \|.$$

Each iteration at time t consists of three steps:
(1) randomly draw one sample x from X, denoted x[t];
(2) find the centre closest to x[t], i.e. find c such that ‖x[t] − m_c[t]‖ = min_i ‖x[t] − m_i[t]‖;
(3) adjust m_c[t + 1] = m_c[t] + η[t](x[t] − m_c[t]) and keep m_i[t + 1] = m_i[t] for i ≠ c, where t is the iteration time and η[t] is a monotonically decreasing scalar learning rate.
The iteration continues until D(X, M) converges.
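A minimal sketch of this fuzzification step, assuming a one-dimensional attribute and k ≥ 2 terms. The function names and the linearly decaying learning-rate schedule are our own choices; the paper only requires η[t] to decrease monotonically.

```python
import random

def kohonen_centres(xs, k, iters=2000, eta0=0.5):
    """Estimate k cluster centres of a 1-D numerical attribute (feature-map style update)."""
    lo, hi = min(xs), max(xs)
    m = [lo + (hi - lo) * i / (k - 1) for i in range(k)]       # evenly spread initial centres
    for t in range(iters):
        x = random.choice(xs)                                   # (1) draw a sample
        c = min(range(k), key=lambda i: abs(x - m[i]))          # (2) nearest centre
        eta = eta0 * (1 - t / iters)                            # decreasing learning rate
        m[c] += eta * (x - m[c])                                # (3) move the winning centre
    return sorted(m)

def triangular(x, m, i):
    """Membership of x in the i-th linguistic term (Eqs. (13)-(15)); m is sorted ascending."""
    k = len(m)
    if (i > 0 and x <= m[i - 1]) or (i < k - 1 and x >= m[i + 1]):
        return 0.0
    if x <= m[i]:
        return 1.0 if i == 0 else (x - m[i - 1]) / (m[i] - m[i - 1])
    return 1.0 if i == k - 1 else (m[i + 1] - x) / (m[i + 1] - m[i])

centres = kohonen_centres([15.0, 18.0, 21.0, 24.0, 27.0, 30.0], k=3)   # made-up temperatures
print(round(triangular(26.0, centres, 2), 2))                           # membership in the top term
```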

5.3. Inducing the fuzzy decision tree

With a given evidence significant level α and truth level threshold β, the induction process consists of the following steps:

Step 1: Measure the classification ambiguity associated with each attribute and select the attribute with the smallest classification ambiguity as the root decision node.

Step 2: Delete all empty branches of the decision node. For each nonempty branch of the decision node, calculate the truth level of classifying all objects within the branch into each class. If the truth level of classifying into one class is above the given threshold β, terminate the branch as a leaf. Otherwise, investigate whether an additional attribute will further partition the branch (i.e., generate more than one nonempty branch) and further reduce the classification ambiguity. If yes, select the attribute with the smallest classification ambiguity as a new decision node from the branch. If not, terminate this branch as a leaf. At the leaf, all objects are labelled with the class of the highest truth level.

Step 3: Repeat Step 2 for all newly generated decision nodes until no further growth is possible; the decision tree is then complete.

All the above steps are carried out at a given significant level α. An object belongs to a branch only when the corresponding membership is greater than α. The ambiguity measure is also calculated at the significant level α. The parameter α plays a very important role in filtering insignificant evidences, thereby eliminating insignificant branches and leaves. The truth level threshold β controls the growth of the tree. A lower β may lead to a smaller tree but with lower classification accuracy; a higher β may lead to a larger tree with higher classification accuracy. However, when β increases beyond a certain point, no further gain in accuracy can be reached. The selection of α and β depends on the individual situation.

We illustrate the induction process using the small training data shown in Table 1. Let the evidence significant level α = 0.5 and the truth level threshold β = 0.7. Calculating the classification ambiguity with each attribute, we have G(Outlook) = 0.52, G(Temperature) = 0.48, G(Humidity) = 0.84, and G(Wind) = 0.52. Since the attribute Temperature has the smallest classification ambiguity, it is selected as the root. There are three branches (Hot, Mild, and Cool) from the root Temperature. At the branch Hot, the classification truth levels for the classes are S(Hot, Volleyball) = 0.38, S(Hot, Swimming) = 0.67 and S(Hot, Weight_lifting) = 0.20. No classification based on Hot can exceed the truth level β, so further partitioning of the branch with different attributes needs to be considered. We have the classification ambiguities G(Hot) = 0.45, G(Outlook|Hot) = 0.42, G(Humidity|Hot) = 0.50, and G(Wind|Hot) = 0.52. Since adding the attribute Outlook reduces the classification ambiguity (G(Outlook|Hot) = 0.42 < G(Hot) = 0.45), it is selected as a decision node at the branch Hot. Further induction is carried out for the branches from the new decision node Outlook|Hot. There are three branches (Sunny, Cloudy, Rain). At the branch Sunny, the classification truth level for Swimming is 0.85 ≥ β, so it becomes a leaf with label Swimming. At the branch Cloudy, the classification truth level for Swimming is 0.72 ≥ β, so it becomes a leaf with label Swimming.


At the branch Rain, the truth level for Weight_lifting is 0.73 ≥ β, so it becomes a leaf with label Weight_lifting. At the branch Mild from the root, no classification can exceed the truth level threshold β (S(Mild, Volleyball) = 0.52, S(Mild, Swimming) = 0.30 and S(Mild, Weight_lifting) = 0.54), and further partitioning with an additional attribute should be considered. We have the classification ambiguities G(Mild) = 0.83, G(Outlook|Mild) = 0.85, G(Humidity|Mild) = 0.80, and G(Wind|Mild) = 0.36 < G(Mild). The decision node Wind is added to the branch Mild. From this node, the branch Windy terminates with label Weight_lifting (truth level = 0.81) and the branch Not_windy terminates with label Volleyball (truth level = 0.78). At the branch Cool from the root, the truth level for class Volleyball is 0.21, for Swimming is 0.07, and for Weight_lifting is 0.88 ≥ β, so the branch Cool terminates as a leaf and Weight_lifting is selected as its label. Finally we have completed the fuzzy decision tree shown in Fig. 1(A). The selection of the significant level α affects the induction process. For instance, if we select α = 0, the attribute Temperature will still be selected as the root. However, the classification ambiguity cannot be further reduced at the branch Hot, which terminates as the leaf Swimming with truth level 0.67. The branch Mild will be further partitioned by Wind as before and the branch Cool will terminate as the leaf Weight_lifting.
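The induction procedure of Steps 1-3 can be sketched schematically in Python as follows. This is our own illustration, not the authors' implementation: the helper names are invented, the empty-branch and "more than one nonempty branch" checks are simplified, and the exact way the significant level α enters each quantity may differ in detail from the calculations above.

```python
import math

def subsethood(a, b):
    return sum(min(x, y) for x, y in zip(a, b)) / sum(a)

def nonspecificity(pi):
    p = sorted(pi, reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

def g_evidence(e, classes):
    s = [subsethood(e, mu) for mu in classes.values()]
    top = max(s)
    return nonspecificity([v / top for v in s])

def induce(evidence, attributes, classes, alpha, beta):
    """Grow one node of the fuzzy decision tree (Section 5.3, Steps 1-3), schematically.

    evidence   -- membership list of the objects reaching this node
    attributes -- dict: attribute name -> {term name -> membership list}
    classes    -- dict: class name -> membership list
    """
    truth = {c: subsethood(evidence, mu) for c, mu in classes.items()}
    best_class, best_truth = max(truth.items(), key=lambda kv: kv[1])
    if best_truth >= beta or not attributes:
        return ("leaf", best_class, round(best_truth, 2))

    def partitioned(terms):
        # branch memberships E_alpha,i ∩ F for every linguistic term of one attribute
        return {t: [min(x, (y if y >= alpha else 0.0)) for x, y in zip(evidence, mu)]
                for t, mu in terms.items()}

    def ambiguity(subsets):
        sizes = {t: sum(s) for t, s in subsets.items()}
        total = sum(sizes.values())
        if total == 0:
            return float("inf")
        return sum(sizes[t] / total * g_evidence(s, classes)
                   for t, s in subsets.items() if sizes[t] > 0)

    candidates = {a: partitioned(terms) for a, terms in attributes.items()}
    best_attr = min(candidates, key=lambda a: ambiguity(candidates[a]))
    if ambiguity(candidates[best_attr]) >= g_evidence(evidence, classes):
        return ("leaf", best_class, round(best_truth, 2))       # no attribute reduces ambiguity

    rest = {a: t for a, t in attributes.items() if a != best_attr}
    return ("node", best_attr,
            {t: induce(s, rest, classes, alpha, beta)
             for t, s in candidates[best_attr].items() if sum(s) > 0})   # drop empty branches
```

With the Table 1 term columns as attributes, the Plan columns as classes, an all-ones evidence list at the root, α = 0.5 and β = 0.7, this sketch mirrors the selection logic of the worked example, although rounding and the α details may shift individual numbers.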

5.4. Converting the decision tree to a set of classification rules

Each path of branches from root to leaf can be converted into a rule whose condition part represents the attributes on the branches from the root to the leaf and whose conclusion part represents the class at the leaf with the highest classification truth level. For the decision tree shown in Fig. 1(A), the converted rules are shown in Fig. 1(B). The decision tree and the corresponding classification rules can be further simplified. Methods for simplifying decision trees for nonfuzzy classification problems can be found in [19]. These techniques can be adapted to simplify fuzzy decision trees with some modifications. Here we use a simple rule simplification technique. For each rule we try to simplify it by removing one attribute term at a time from the IF part, selecting for removal the term that gives the simplified rule with the highest truth level.

A. Fuzzy decision tree

Temperature? (G(Temperature) = 0.48)
  Hot (G(Hot) = 0.45): Outlook? (G(Outlook|Hot) = 0.42)
    Sunny: Swimming (S = 0.85)
    Cloudy: Swimming (S = 0.72)
    Rain: Weight_lifting (S = 0.73)
  Mild (G(Mild) = 0.83): Wind? (G(Wind|Mild) = 0.36)
    Windy: Weight_lifting (S = 0.81)
    Not_windy: Volleyball (S = 0.78)
  Cool (G(Cool) = 0.20): Weight_lifting (S = 0.88)

Note: G is the classification ambiguity measure at the decision node; S is the classification truth level at the leaf.

B. Fuzzy rules converted from the fuzzy decision tree

Rule 1: IF Temperature is Hot AND Outlook is Sunny THEN Swimming (S = 0.85)
Rule 2: IF Temperature is Hot AND Outlook is Cloudy THEN Swimming (S = 0.72)
Rule 3: IF Temperature is Hot AND Outlook is Rain THEN Weight_lifting (S = 0.73)
Rule 4: IF Temperature is Mild AND Wind is Windy THEN Weight_lifting (S = 0.81)
Rule 5: IF Temperature is Mild AND Wind is Not_windy THEN Volleyball (S = 0.78)
Rule 6: IF Temperature is Cool THEN Weight_lifting (S = 0.88)

Note: Rule 3 can be simplified to rule 3':
Rule 3': IF Outlook is Rain THEN Weight_lifting (S = 0.89)

Fig. 1. The induced fuzzy decision tree and fuzzy rules.


If the truth level of the new rule is not lower than the threshold β or than the truth level of the original rule, the simplification is successful. The process continues until no further simplification is possible for any of the rules. In our example, rule 3: "IF Temperature is Hot AND Outlook is Rain THEN Weight_lifting" can be simplified to rule 3': "IF Outlook is Rain THEN Weight_lifting". The truth level of rule 3' is 0.89, higher than 0.73, the truth level of the original rule 3. No other rules can be further simplified. After simplification, the rules no longer correspond to the original tree. Simplifying rules without compromising their accuracy is desirable because a simplified rule with fewer conditions is more general, more likely to classify more objects, and hopefully classifies them better. (The accuracy of classification during simplification is improved or maintained at a certain level for the training cases; the accuracy of classification for unknown cases, however, depends on the representativeness of the training data, as is always the case for any learning system.) A simplified rule is also more likely to tolerate missing or imprecise data. The collection of simplified rules can be stored in a rule base in a fuzzy expert system.
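The simplification loop can be sketched as follows (our code, not the authors'; we use the stricter reading of the acceptance test, requiring the simplified rule to stay at or above both β and the original rule's truth level). With the Table 1 columns it reproduces the rule 3 to rule 3' example.

```python
def rule_truth(condition_terms, conclusion):
    """Truth level of "IF (T_1 AND ... AND T_r) THEN C" as the subsethood of the condition in C."""
    cond = [min(vals) for vals in zip(*condition_terms.values())]
    return sum(min(c, b) for c, b in zip(cond, conclusion)) / sum(cond)

def simplify_rule(condition_terms, conclusion, beta):
    """Greedily drop condition terms while the truth level stays at or above the floor."""
    best = dict(condition_terms)
    floor = max(beta, rule_truth(best, conclusion))
    while len(best) > 1:
        trials = {t: rule_truth({k: v for k, v in best.items() if k != t}, conclusion)
                  for t in best}
        t, s = max(trials.items(), key=lambda kv: kv[1])
        if s < floor:
            break
        del best[t]
    return best

hot       = [1.0, 0.6, 0.8, 0.3, 0.7, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.2, 0.0, 0.0, 0.5]
rain      = [0.0, 0.0, 0.3, 0.1, 0.9, 0.3, 0.7, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.1, 1.0, 0.0]
w_lifting = [0.2, 0.0, 0.1, 0.0, 1.0, 0.8, 1.0, 0.3, 0.0, 0.7, 0.0, 0.1, 1.0, 1.0, 1.0, 0.0]

print(list(simplify_rule({"Hot": hot, "Rain": rain}, w_lifting, beta=0.7)))
# -> ['Rain']: the truth level rises from 0.73 to 0.89, reproducing rule 3 -> 3'
```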

5.5. Applying rules for classification

In the nonfuzzy case, only one rule is applied to each particular object. In the fuzzy case, however, many rules can be applied at the same time, so an object may be classified into different classes with different degrees. The classification for a given object is obtained with the following steps:

Step 1: For each rule, calculate the membership of the condition for the object based on its attributes. The membership of the conclusion (the classification to a class) is set equal to the membership of the condition.

Table 2
Learning result from the small training set

        Classification known               Classification with                Classification with no
        in training data                   learned rules                      information on Wind
Case    Volleyball  Swimming  W_lifting    Volleyball  Swimming  W_lifting    Volleyball  Swimming  W_lifting
  1       0.0         0.8       0.2          0.0         0.9       0.0          0.0         0.9       0.0
  2       1.0         0.7       0.0          0.4         0.6       0.0 a        0.4         0.6       0.4 a
  3       0.3         0.6       0.1          0.2         0.7       0.3          0.2         0.7       0.3
  4       0.9         0.1       0.0          0.7         0.3       0.3          0.7         0.3       0.7 b
  5       0.0         0.0       1.0          0.3         0.1       0.9          0.3         0.1       0.9
  6       0.2         0.0       0.8          0.3         0.0       0.7          0.3         0.0       0.7
  7       0.0         0.0       1.0          0.0         0.0       1.0          0.0         0.0       1.0
  8       0.7         0.0       0.3          0.2         0.0       0.8 a        0.2         0.0       0.8 a
  9       0.2         0.8       0.0          0.0         1.0       0.0          0.0         1.0       0.0
 10       0.0         0.3       0.7          0.1         0.0       0.7          0.3         0.0       0.7
 11       0.4         0.7       0.0          0.0         0.7       0.0          0.0         0.7       0.0
 12       0.7         0.2       0.1          0.7         0.0       0.3          1.0         0.0       1.0 b
 13       0.0         0.0       1.0          0.0         0.2       0.8          0.8         0.2       0.8 b
 14       0.0         0.0       1.0          0.3         0.0       0.7          0.9         0.0       0.9 b
 15       0.0         0.0       1.0          0.0         0.0       1.0          0.0         0.0       1.0
 16       0.8         0.6       0.0          0.5         0.5       0.0 b        0.5         0.5       0.5 b

Classification ambiguity:   0.20               0.23                               0.40

a Wrong classification.
b Cannot distinguish between two or more classes.


Step 2: When two or more rules are applied to classify the object into the same class with different memberships, take the maximum as the membership to the class.

Step 3: An object may be classified into several classes with different degrees. When classification to only one class is required, select the class with the highest membership.

In fact, it is not necessary to apply all the rules to a particular object in order to find the class with the highest membership. A branch-bound-backtrack algorithm for fuzzy decision tree calculation is suggested in [1]; it can be used to improve the calculation efficiency for a very large fuzzy decision tree. Finally, when the classification needs to be converted into a numerical value, defuzzification methods such as the max criterion method, the mean of maximum method, and the centre of area method may be applied [14].

With the derived and simplified six classification rules, the classification results for the training data are shown in Table 2. Among the 16 training cases, 13 (all except cases 2, 8, 16) are correctly classified. The classification accuracy is 81%. The classification ambiguity is 0.23, greater than the original classification ambiguity of 0.20. To observe the impact of missing data, assuming that information about the attribute Wind is not available, we set the memberships of both Windy and Not_windy to 1. The classification results are also shown in Table 2. Five cases are affected (cases 4, 12, 13, 14, and 16). The missing information does not simply lead to a wrong classification: it increases the classification ambiguity and leaves the choice between Volleyball and Weight_lifting undecided for these cases. The average classification ambiguity over all the cases increases from 0.23 to 0.40.
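Steps 1-3 can be sketched in Python as follows (our illustration; the rule encoding is invented). Applied to case 1 of Table 1 with the six rules of Fig. 1(B) (rule 3 in its simplified form 3'), it reproduces the first row of the "learned rules" columns in Table 2.

```python
def classify(rules, attribute_values):
    """Apply fuzzy rules to one object (Section 5.5, Steps 1-3).

    rules            -- list of (condition, class) pairs, where condition is a list of
                        (attribute, term) pairs making up the IF part
    attribute_values -- dict: attribute -> {term -> membership of the object}
    """
    memberships = {}
    for condition, cls in rules:
        degree = min(attribute_values[attr][term] for attr, term in condition)   # Step 1
        memberships[cls] = max(memberships.get(cls, 0.0), degree)                # Step 2
    return max(memberships, key=memberships.get), memberships                    # Step 3

rules = [
    ([("Temperature", "Hot"),  ("Outlook", "Sunny")],  "Swimming"),
    ([("Temperature", "Hot"),  ("Outlook", "Cloudy")], "Swimming"),
    ([("Outlook", "Rain")],                            "Weight_lifting"),
    ([("Temperature", "Mild"), ("Wind", "Windy")],     "Weight_lifting"),
    ([("Temperature", "Mild"), ("Wind", "Not_windy")], "Volleyball"),
    ([("Temperature", "Cool")],                        "Weight_lifting"),
]

# Case 1 of Table 1
case1 = {"Outlook":     {"Sunny": 0.9, "Cloudy": 0.1, "Rain": 0.0},
         "Temperature": {"Hot": 1.0, "Mild": 0.0, "Cool": 0.0},
         "Humidity":    {"Humid": 0.8, "Normal": 0.2},
         "Wind":        {"Windy": 0.4, "Not_windy": 0.6}}
print(classify(rules, case1))
# -> ('Swimming', {'Swimming': 0.9, 'Weight_lifting': 0.0, 'Volleyball': 0.0})
```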

6. Conclusion

The fuzzy decision tree induction method proposed here has the following advantages:
(1) It allows the representation of cognitive uncertainties in classification problems.
(2) It provides more information to the decision maker regarding the truth levels of the rules and the memberships of the classification.
(3) It uses linguistic terms with soft boundaries to accommodate vagueness and ambiguity in human thinking and perception.
(4) It tolerates missing, conflicting, and imprecise data by degrading gracefully rather than failing, and is therefore more robust.
(5) It provides an efficient and effective way of building fuzzy expert systems, compared with manual rule acquisition or the massive generation of all possible combinations of attribute values.
(6) It provides a measurement of the quality of the training data. A small, but not large, amount of vagueness and ambiguity is desirable for generating fuzzy rules with good classification capability and robustness.

The classification accuracy may be further improved through tuning the membership functions. One way is to add linguistic modifiers [27] such as "Very", "More or Less", "Not so", "Between", etc. to linguistic terms during the induction process. Another way is to convert the membership functions and fuzzy rules into neural networks and use the learning mechanism of the neural networks to tune the membership functions [15]. The first method is more suitable for originally categorical data and the second method is more suitable for originally numerical data.

Acknowledgement

Financial support for this research was provided by the Natural Sciences and Engineering Research Council of Canada. The authors wish to thank Ms. Qian Zhao for her computer programming to implement the fuzzy induction algorithm.


References

[1] R.L.P. Chang and T. Pavlidis, Fuzzy decision tree algorithms, IEEE Trans. Systems Man Cybernet. SMC-7 (1977) 28-35.
[2] K.J. Cios and L.M. Sztandera, Continuous ID3 algorithm with fuzzy entropy measures, Proc. IEEE Internat. Conf. on Fuzzy Systems (San Diego, CA, 8-12 March 1992) 469-476.
[3] M.R. Civanlar and H.J. Trussell, Constructing membership functions using statistical data, Fuzzy Sets and Systems 18 (1986) 1-14.
[4] A. De Luca and S. Termini, A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory, Inform. and Control 20 (1972) 301-312.
[5] M.M. Gupta, Twenty-five years of fuzzy sets and systems: A tribute to Professor Lotfi A. Zadeh, Fuzzy Sets and Systems 40 (1991) 409-413.
[6] C. Hagg, Possibility and cost in decision analysis, Fuzzy Sets and Systems 1 (1978) 81-86.
[7] M. Higashi and G.J. Klir, Measures of uncertainty and information based on possibility distributions, Internat. J. Gen. Systems 9 (1983) 43-58.
[8] G.J. Klir, Where do we stand on measures of uncertainty, ambiguity, fuzziness and the like?, Fuzzy Sets and Systems 24 (1987) 141-160.
[9] G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, and Information (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[10] G.J. Klir and M. Mariano, On the uniqueness of possibilistic measure of uncertainty and information, Fuzzy Sets and Systems 24 (1987) 197-219.
[11] T. Kohonen, Self-Organization and Associative Memory (Springer, Berlin, 1988).
[12] B. Kosko, Fuzzy entropy and conditioning, Inform. Sci. 30 (1986) 165-174.
[13] B. Kosko, Neural Networks and Fuzzy Systems (Prentice-Hall, Englewood Cliffs, NJ, 1992).
[14] C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller, Part II, IEEE Trans. Systems Man Cybernet. 20 (1990) 419-435.
[15] C.-T. Lin and C.S.G. Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. 12 (1991) 1320-1336.
[16] W. Meier, R. Weber and H.-J. Zimmermann, Fuzzy data analysis - methods and industrial applications, Fuzzy Sets and Systems 61 (1994) 19-28.
[17] J.R. Quinlan, Induction of decision trees, Mach. Learning 1(1) (1986) 81-106.
[18] J.R. Quinlan, Decision trees as probabilistic classifiers, Proc. 4th Internat. Workshop on Machine Learning (Morgan Kaufmann, Los Altos, CA, 1987) 31-37.
[19] J.R. Quinlan, Simplifying decision trees, Internat. J. Man-Mach. Studies 27 (1987) 221-234.
[20] J.R. Quinlan, Decision trees and decision making, IEEE Trans. Systems Man Cybernet. 20 (1990) 339-346.
[21] D. Ruan and E.E. Kerre, Fuzzy implication operators and generalized fuzzy method of cases, Fuzzy Sets and Systems 54 (1993) 23-37.
[22] S.R. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Trans. Systems Man Cybernet. 21 (1991) 660-674.
[23] C.E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948) 379-423, 623-656.
[24] T. Tani and M. Sakoda, Fuzzy modeling by ID3 algorithm and its application to prediction of heater outlet temperature, Proc. IEEE Internat. Conf. on Fuzzy Systems (San Diego, CA, 8-12 March 1992) 923-930.
[25] R. Weber, Automatic knowledge acquisition for fuzzy control applications, Proc. Internat. Symp. on Fuzzy Systems (Iizuka, Japan, 12-15 July 1992) 9-12.
[26] R. Weber, Fuzzy-ID3: a class of methods for automatic knowledge acquisition, Proc. 2nd Internat. Conf. on Fuzzy Logic & Neural Networks (Iizuka, Japan, 17-22 July 1992) 265-268.
[27] L.A. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338-353.
[28] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3-28.