Managing uncertainty in a fuzzy expert system


Int. J. Man-Machine Studies (1988) 29, 129-148

J. J. BUCKLEY

Mathematics Department, University of Alabama at Birmingham, Birmingham, AL 35294, USA

(Received 24 December 1986 and in revised form 18 August 1987)

All data and rules in a fuzzy expert system are accompanied by their degree of confidence values. This paper is concerned with processing these confidence values during one round of rule firing in a fuzzy expert system. Part I discusses determining the final confidence in the left hand side of a rule, which includes: (1) pattern evaluation; (2) finding the confidence in the antecedent; and (3) combining rule and antecedent confidence. Part II discusses the maintenance of memory in a fuzzy expert production system, in both deductive reasoning systems (with sequential rule-firing schemes) and in inductive reasoning systems (with rules fired in parallel).

Introduction

The purpose of this paper is to discuss methods of managing uncertainty in a fuzzy expert system (FES). Although we attempt to describe fuzzy expert systems in general, from time to time our discussion will center on our FES shell called FLOPS (Buckley, Siler & Tucker, 1986b; Siler & Tucker, 1986; Siler, Buckley & Tucker, 1987; Siler, Tucker & Buckley, 1987). The data in working memory are organized in memory elements, each of which contains one or more attributes. An example of a memory element that we shall use throughout this paper is ANIMAL with three attributes WEIGHT, SIZE, and COLOR. There may be many different instances, or types, of ANIMAL in working memory owing to the memory maintenance system (discussed in Part II) or owing to the FES processing multiple data sets, originating from different problems, simultaneously. In order to differentiate between different instances of a memory element each memory element is assigned a time tag (TT). Time tags also indicate when this instance of a memory element was created within the system. Let us suppose the instance of ANIMAL we will be discussing has TT = 128 and will therefore be referred to as ANIMAL (128). The time tag of 128 is automatically assigned to all the attributes in ANIMAL (128) and whenever the values of the attributes change ANIMAL (128) gets a new time tag with the old copy deleted from the system. The value of an attribute can be a string, number, fuzzy number or a discrete fuzzy set. Let us assume we have for ANIMAL (128) WEIGHT = 120 pounds, COLOR = white and,

SIZE = {s1/small, s2/medium, s3/large},    (1)

where SIZE is a discrete fuzzy set with si ∈ [0, 1], 1 ≤ i ≤ 3.
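A discrete fuzzy set of this kind, and a memory element carrying one, can be sketched in a few lines of Python. This is purely illustrative (the names `FuzzySet` and `animal_128` are ours, not FLOPS syntax), and the membership grades anticipate the numerical example developed below.

```python
# Illustrative sketch only -- not FLOPS code.  A discrete fuzzy set maps
# each member to a membership grade s_i in [0, 1], as in equation (1).

class FuzzySet(dict):
    """member -> membership grade in [0, 1]"""

# A memory element instance with its time tag and three attributes.
animal_128 = {
    "time_tag": 128,
    "WEIGHT": 120,          # pounds
    "COLOR": "white",
    "SIZE": FuzzySet(small=0.3, medium=0.6, large=0.9),
}
```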


Each attribute value has attached to it a degree of confidence, or cf-value. If W is an attribute, then cf(W) represents our confidence in the given value for W. Confidence values could be: (1) a single number in [0, 1]; (2) dual values L (lower), U (upper) where 0 ≤ L ≤ U ≤ 1; (3) a fuzzy number in [0, 1]; or (4) dual fuzzy numbers L̄ and Ū. In this paper we will discuss only single cf-values in [0, 1]. A number of authors (Adlassnig, Kolarz, Scheithauer, Effenberger & Grabner, 1985; Applebaum & Ruspini, 1985; Baldwin, 1986; Farreny, Prade & Wyss, 1986; Martin-Clouaire & Prade, 1985; Schwartz, 1985) have considered employing dual confidence values. We will restrict our attention to single cf-values because: (1) we have experience with this situation through our FES shell FLOPS; and (2) it is the natural place to start in the hierarchy of types of possible cf-values. For ANIMAL (128) assume cf(WEIGHT) = 0.7, cf(SIZE) = 0.8, cf(COLOR) = 0.9 and then this memory element is completely specified as: ANIMAL (128) WEIGHT = (120, 0.7),

SIZE = ({0.3/small, 0.6/medium, 0.9/large}, 0.8),    (2)

COLOR = (white, 0.9). Notice that in ANIMAL (128) our confidence in the fuzzy set for SIZE is 0.8 but our confidence in the animal's size being small is 0.3, medium is 0.6, and large is 0.9. If an attribute's value is a fuzzy number N, then our confidence in this value is cf(N) but our confidence in any real number x belonging to N would be the value of the membership function for N evaluated at x. Default cf-values are one except for members of discrete fuzzy sets whose default cf-value is zero. In ANIMAL (128) by default we would have cf(WEIGHT) = cf(SIZE) = cf(COLOR) = 1 but cf(small) = cf(medium) = cf(large) = 0 = si, 1 ≤ i ≤ 3. Each rule R has its conditional degree of confidence cf(R | β) and its unconditional degree of confidence cf(R). The conditional degree of confidence expresses our confidence in the consequence of R depending upon our final confidence in the antecedent (β), and cf(R) is the consequent confidence independent of antecedent confidence. The problem we face in a FES is to process all of these cf-values through each round of rule firing to obtain final cf-values for all the attributes in working memory. The problems to be solved have been addressed by other authors (Bonissone & Tong, 1985; Gaines, 1985), but no satisfactory solution has been proposed for all of these problems. In Part I below we will discuss finding the final cf-value for the left hand side of a rule and in Part II we will present memory maintenance systems designed to update working memory after one round of rule firing. But first let us introduce some notation that will be used in the paper. Let T be any t-norm and C any co-t-norm. Appendix A1 presents the definitions and basic properties of t-norms and co-t-norms. Since T and C are associative they may be extended to n arguments, so T(x1, . . . , xn) and C(x1, . . . , xn) are defined. Specific T and C normally used are: (1) max and min; (2) probabilistic AND

and probabilistic OR; and (3) Lukasiewicz AND and OR. Let,

LAND (x1, . . . , xn) = max(x1 + · · · + xn − n + 1, 0),    (3)

and,

LOR (x1, . . . , xn) = min(x1 + · · · + xn, 1),    (4)

which are the extensions of Lukasiewicz AND and OR to n arguments. Also let,

PAND (x1, . . . , xn) = x1 x2 · · · xn,    (5)

and,

POR (x1, . . . , xn) = Σi xi − Σi<j xi xj + · · · + (−1)^(n+1) x1 x2 · · · xn,    (6)
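The extended connectives in equations (3)-(6) translate directly into code; the following is an illustrative Python sketch with our own function names (for equation (6), the inclusion-exclusion sum collapses to one minus the product of the (1 − xi)).

```python
import math

def land(*xs):
    """Lukasiewicz AND, equation (3)."""
    return max(sum(xs) - len(xs) + 1, 0.0)

def lor(*xs):
    """Lukasiewicz OR, equation (4)."""
    return min(sum(xs), 1.0)

def pand(*xs):
    """Probabilistic AND, equation (5)."""
    return math.prod(xs)

def por(*xs):
    """Probabilistic OR, equation (6), computed as 1 - prod(1 - xi)."""
    return 1.0 - math.prod(1.0 - x for x in xs)
```

For example, with x1 = 0.7 and x2 = 0.8: land gives 0.5, pand gives 0.56 and min gives 0.7, illustrating the ordering LAND ≤ PAND ≤ min that appears in equation (7) below.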

which are the extended probabilistic AND and OR. When necessary† we use probabilistic arguments and analogies to indicate how we should process uncertainties in a FES. Others (Farreny et al., 1986; Martin-Clouaire & Prade, 1985) employ possibility theory in managing uncertainties. At this time we are unable to determine which procedure (probabilistic or possibilistic) is best suited for different types of problems. Suppose Ω is a space of elementary outcomes, P is a probability defined on Ω, and the Ai are events with P(Ai) = xi, 1 ≤ i ≤ n. Then,

LAND (x1, . . . , xn) ≤ P[A1 ∩ · · · ∩ An] ≤ min (x1, . . . , xn),    (7)

and,

max (x1, . . . , xn) ≤ P[A1 ∪ · · · ∪ An] ≤ LOR (x1, . . . , xn).    (8)
Equation (7) says that the maximum, and minimum, possible value for the probability of the intersection of the Ai is given by min, and LAND, respectively. Equation (8) says the maximum, and minimum, possible value for the probability of the union of the Ai is given by LOR and max, respectively. The inequalities in equations (7) and (8) for two events (n = 2), and their relation to managing uncertainty in an expert system, were first observed by Ruspini (1982). In order to be able to write an equality for these probabilities we now introduce the AND and OR operators. Define (Buckley & Siler, 1987)

AND (x1, . . . , xn; F) = PAND (x1, . . . , xn) + F(min (x1, . . . , xn) − PAND (x1, . . . , xn)), F ≥ 0,
AND (x1, . . . , xn; F) = PAND (x1, . . . , xn) + F(PAND (x1, . . . , xn) − LAND (x1, . . . , xn)), F ≤ 0,    (9)

and,

OR (x1, . . . , xn; F) = POR (x1, . . . , xn) + F(LOR (x1, . . . , xn) − POR (x1, . . . , xn)), F ≥ 0,
OR (x1, . . . , xn; F) = POR (x1, . . . , xn) + F(POR (x1, . . . , xn) − max (x1, . . . , xn)), F ≤ 0,    (10)

† Only in Sections 1 and 2 in Part I.


where −1 ≤ F ≤ 1. Using these operators we may now write the probabilities in equations (7) and (8) as,

P[A1 ∩ · · · ∩ An] = AND (x1, . . . , xn; F),    (11)

and,

P[A1 ∪ · · · ∪ An] = OR (x1, . . . , xn; F).    (12)

F is a measure of the association between the events, or a measure of overlap between the sets Ai. Equation (11) says that there is some value of F, between −1 and 1, so that the probability of the intersection of the Ai is equal to equation (9). Equation (12) says that there is some value of F, between −1 and 1, so that the probability of the union of the Ai is equal to equation (10). In this way, using the AND and OR operators, we have replaced the inequalities in equations (7) and (8) with equalities in equations (11) and (12). No independence assumptions are needed. If the events are positively associated as much as possible, or the sets overlap as much as possible, then F = 1 and AND = min and OR = max. When the events are negatively associated as much as possible, or the sets have minimal overlap, then F = −1 and AND = LAND and OR = LOR. When they are independent we have F = 0. For positive association we use a value of F in (0, 1] and F ∈ [−1, 0) for negative association. The functions AND and OR are the simplest (piecewise linear) functions connecting min-PAND-LAND for intersection and LOR-POR-max for union. The selection of a value of F in a fuzzy expert system is left up to the user. The main application of the AND and OR operators is in determining antecedent confidence, which is discussed below and in more detail in Part I. In determining antecedent confidence we often know positive, or negative, association but we hardly ever know the exact value to use for F in (0, 1], or in [−1, 0), respectively. However, the procedure is still useful because without an exact value for F input by the user the system could default to F = 1 for known positive association and default to F = −1 for known negative association. Assume each Ai corresponds to some fuzzy logical assertion in a FES (Buckley & Siler, 1987). For example, in Part I below Ai might be a pattern in the antecedent of a rule.
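As a sketch (under the same notation, with our own function names), the AND and OR operators of equations (9) and (10) can be written as:

```python
import math

def op_and(xs, F):
    """Equation (9): for F >= 0 interpolate PAND toward min; for
    F <= 0 interpolate PAND toward Lukasiewicz AND.  -1 <= F <= 1."""
    p = math.prod(xs)
    if F >= 0:
        return p + F * (min(xs) - p)
    land = max(sum(xs) - len(xs) + 1, 0.0)
    return p + F * (p - land)

def op_or(xs, F):
    """Equation (10): for F >= 0 interpolate POR toward LOR; for
    F <= 0 interpolate POR toward max."""
    p = 1.0 - math.prod(1.0 - x for x in xs)
    if F >= 0:
        return p + F * (min(sum(xs), 1.0) - p)
    return p + F * (p - max(xs))
```

With xs = (0.5, 0.5): op_and gives 0.5 at F = 1 (min), 0.25 at F = 0 (PAND) and 0 at F = −1 (LAND), matching the endpoints described above.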
Let xi denote our confidence in Ai, or the probability that Ai is true. Also let CONF[ ] be the confidence in [ ]. We wish to find CONF [A1 AND · · · AND An] and CONF [A1 OR · · · OR An]. These two values of CONF[ ] will be the basic ingredients in determining antecedent confidence. From probability theory we conclude that:

LAND (x1, . . . , xn) ≤ CONF [A1 AND · · · AND An] ≤ min (x1, . . . , xn),    (13)

and,

max (x1, . . . , xn) ≤ CONF [A1 OR · · · OR An] ≤ LOR (x1, . . . , xn).    (14)

We might now use some t-norm to find the confidence in equation (13) and a co-t-norm for the confidence in equation (14), but instead (Buckley & Siler, 1987) we set,

CONF [A1 AND · · · AND An] = AND (x1, . . . , xn; F),    (15)

and,

CONF [A1 OR · · · OR An] = OR (x1, . . . , xn; F).    (16)


For positive association among the assertions we use some F > 0 and an F < 0 for negative association. For two assertions, or events, equations (15) and (16) simplify to:

CONF [A1 AND A2] = AND (x1, x2; F),    (17)

and,

CONF [A1 OR A2] = OR (x1, x2; F).    (18)

If F ≠ 0, 1, then AND (x1, x2; F) and OR (x1, x2; F) satisfy all the conditions, except associativity, to be a t-norm, or a co-t-norm, respectively.

Part I

FINAL LEFT HAND SIDE CONFIDENCE

We first describe the three problems to be solved to determine final left hand side confidence and then discuss in detail how final left hand side confidence will be a function of pattern confidence, antecedent confidence, and prior rule confidence. Rules are of the form (Buckley et al., 1986a)

If [ ] * [ ] * · · · * [ ], then . . . ,    (19)

where [ ] is a pattern and each * represents AND or OR. Antecedent confidence is the confidence across all patterns in the antecedent of a rule and final left hand side confidence is a term we use for combined antecedent and prior rule confidence. We do not interpret our rules as a logical implication nor as a modus ponens. A rule in our FES shell FLOPS is a description (the actions in the right hand side of the rule) of how to alter working memory, or create new rules, given that antecedent confidence exceeds some threshold value. Given that antecedent confidence exceeds threshold, the final left hand side confidence is assigned to all the actions in the right hand side of the rule, and the resulting changes in working memory are the topic of Part II of this paper. The structure of a pattern is: (1) a single attribute A; (2) A(R)L where A is an attribute, L is a literal, and R is some relation; or, (3) A(R)B where A and B are attributes and R is some relation. A literal L acts like a constant to which we may compare attribute values using R. The cf-value of L is always one. Let cf(A) = a and cf(B) = b. Given the data A and B in working memory a pattern is evaluated, in the three cases mentioned above, as:

1. ψ(A) = a,

2. ψ(A(R)L) = ψ(a, ARL, 1),    (20)

3. ψ(A(R)B) = ψ(a, ARB, b),

where ψ is the function, with values in [0, 1], which determines final pattern confidence, and ARL (ARB) is the value of the relation R, in [0, 1], obtained by comparing the values of A and L (A and B). The relation R may be a binary relation (values zero or one) for comparing numbers or strings or a fuzzy relation (values between zero and one) for comparing fuzzy numbers or discrete fuzzy sets. For example, let R be some fuzzy equality relation between fuzzy numbers; then ARB is a measure of equality between fuzzy numbers A and B. Equality and inequality relations, both binary and fuzzy, are supplied by the FES shell but the


user has the option of constructing other relations to be employed in the patterns. The ψ function aggregates the cf-values of the input attributes together with the outcome of the comparison (if any) to compute our confidence in the pattern. Let α be our final confidence in the pattern and let r = ARL or r = ARB. We must specify:

ψ(a, r, b) = α,    (21)

for all a, r, b in [0, 1]. We obtain ψ(a, r, 1) by simply putting b = 1 in ψ(a, r, b). At present FLOPS employs min for ψ. Selection of other possible ψ functions will be discussed below. We do require one ψ function for all patterns. Alternatively, one might consider using different ψ functions depending on the relation R employed in the pattern. After all patterns have been evaluated, the rule becomes

If α1 * α2 * · · · * αn, then . . . ,    (22)

where αi is our confidence in the ith pattern. If β is our final confidence in the antecedent, then,

CONF [α1 * α2 * · · · * αn] = β.    (23)

The CONF function must also be specified and its selection is discussed below. We will argue that, depending on any prior associations between the patterns, different expressions should be used in finding β. Finally, prior rule confidence and antecedent confidence must be combined, producing the final confidence in the rule's left hand side, which will then be assigned to the rule's consequence. But first we must discuss thresholding, where rules with low confidence are not stacked for firing. Let τ ∈ [0, 1] be the rule threshold value. The threshold value would be user assigned and in our FES shell FLOPS the default value of τ is 0.50. The thresholding procedure is: (1) if β > τ, then stack the rule for firing and combine prior rule confidence and antecedent confidence to be assigned to the rule's consequence; (2) if β ≤ τ, do not stack the rule for firing. In FLOPS we threshold on antecedent confidence. Alternatively, one could consider thresholding on the rule's final left hand side confidence. Assuming β > τ, let,

θ(cf(R | β), β) = γ,    (24)

which is our final confidence in the left hand side of the rule, where cf(R | β) is the rule's prior conditional confidence. The θ function combines antecedent confidence and prior conditional rule confidence into γ, which will be our confidence in the consequence. Other researchers use cf(R | 1), which is interpreted as the confidence in the consequence given that the confidence in the antecedent is one. Some of these authors then use min for θ (Hajek, 1985), others employ the product for θ (Negoita, 1985; Ogawa, Fu & Yao, 1985; Shortliffe & Buchanan, 1975), and still another system gives a value for γ somewhere between the product and min (Lesmo, Saitta & Torasso, 1985). Selection of the θ function is also discussed below.
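The thresholding step is a simple filter on antecedent confidence; a minimal sketch (the (name, beta) rule representation is ours, purely for illustration):

```python
TAU = 0.50  # FLOPS's default rule threshold

def stack_fireable(rules, tau=TAU):
    """Keep only rules whose antecedent confidence beta exceeds tau.
    Each rule is represented here as a (name, beta) pair."""
    return [(name, beta) for name, beta in rules if beta > tau]
```

Note that a rule with β exactly equal to τ is not stacked; β must strictly exceed the threshold.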

1. Pattern confidence

The objective in this section is to discover the minimum number of reasonable properties to impose on the ψ function which will uniquely, except for possibly the


values of some parameters, determine the structure of this function. Two basic properties the ψ function must possess are:

1. ψ(a, r, b) ≤ min (a, r, b),    (25)

and,

2. ψ(a, r, b) is non-decreasing in a, r, b.    (26)

The first property says that final pattern confidence will not exceed input confidence or our confidence in the comparison between A and B. The second property means that if our confidence in A or B increases, or our confidence in the comparison between A and B increases, then our confidence in the pattern will not decrease. If we also assume that ψ(x, x, x) = x, or that ψ is idempotent, then it easily follows that ψ = min. However, we will not assume that ψ is idempotent. Our first basic assumption on the structure of ψ is:

ψ = H(r, Λ),    (27)

Λ = F(a, b),    (28)

where F(a, b) = Λ ∈ [0, 1] is the combined confidence in the input attributes and H combines this with r to produce final pattern confidence. That is, ψ is a function only of combined input confidence and the result of the comparison between A and B. Based on probabilistic reasoning (details in Appendix A2) the simplest structure for H is:

ψ(a, r, b) = AND (r, Λ; F).    (29)

Notice that if r = 1, then ψ(a, 1, b) = F(a, b) for any F ∈ [−1, 1]. Using two more reasonable assumptions we can show that F must be, except for associativity, a t-norm. Let us tentatively assume that,

ψ(a, r, b) = ψ(b, r, a),    (30)

and,

LAND (a, r, b) ≤ ψ(a, r, b).    (31)

The first assumption says that ψ is symmetric with respect to the confidence in the input attributes. The second assumption is that the smallest value for ψ that we will accept is the Lukasiewicz AND of a, r, and b. Alternatively, we might try to justify the second assumption on probabilistic grounds using equation (7). Let r = 1 so that ψ(a, 1, b) = F(a, b). Assuming equations (25), (26) and (30) hold and using equation (31) we obtain:

a = LAND (a, 1, 1) ≤ F(a, 1) ≤ a,    (32)

so F satisfies all the conditions for a t-norm except associativity. We will not make the assumptions in equations (30) and (31) but instead now introduce our last assumption on ψ. Consider a pattern,

[N = N],    (33)

where the relation R is equality and N is the value of some attribute. The value of the relation NRN = [N = N] should be one. If cf(N) = n, then the confidence in this


pattern would be ψ(n, 1, n). It seems reasonable that ψ(n, 1, n) should equal n. That is:

ψ[N = N] = cf(N).    (34)

But this implies that F(n, n) = n since r = 1, so F is idempotent, and it immediately follows that F = min. We may rewrite this last assumption as:

ψ(a, 1, b) = a,    (35)

whenever a = b. Recall, as we mentioned above, we wish to use the same ψ function for all patterns. If we want equation (34) to hold when R is equality, then it follows that F must be min. We now summarize the results of this section. The basic assumptions on ψ are:

1. Equations (25) and (26);
2. ψ(a, r, b) = H(r, F(a, b));
3. H(r, Λ) = AND (r, Λ; F);
4. ψ[N = N] = cf(N).

Then:

ψ(a, r, b) = AND (r, Λ; F),  Λ = min (a, b).    (36)

In FLOPS we now default to F = 1, which gives ψ = min. Alternatively, one might consider r and Λ to be usually independent and default to probabilistic AND given by F = 0.
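Equation (36) can be sketched directly on top of the AND operator of equation (9) (illustrative code, our own function names):

```python
import math

def op_and(xs, F):
    """Equation (9) (sketch)."""
    p = math.prod(xs)
    if F >= 0:
        return p + F * (min(xs) - p)
    return p + F * (p - max(sum(xs) - len(xs) + 1, 0.0))

def pattern_confidence(a, r, b=1.0, F=1.0):
    """Equation (36): psi(a, r, b) = AND(r, min(a, b); F).  The default
    b = 1 handles patterns of the form A(R)L, and the FLOPS default
    F = 1 reduces psi to min(a, r, b)."""
    return op_and((r, min(a, b)), F)
```

Note that pattern_confidence(n, 1.0, n) returns n for any F, so assumption 4 (ψ[N = N] = cf(N)) holds.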

2. Antecedent confidence

We could have:

CONF [α1 AND α2] = T(α1, α2),    (37)

and,

CONF [α1 OR α2] = C(α1, α2),    (38)

for some t-norm T and co-t-norm C. The use of brackets around groups of patterns in the left hand side of a rule will be important. Suppose we want

CONF [(α1 OR α2) AND α3] = CONF [(α1 AND α3) OR (α2 AND α3)],    (39)

and,

CONF [α1 OR (α2 AND α3)] = CONF [(α1 OR α2) AND (α1 OR α3)].    (40)

Then,

T[C(α1, α2), α3] = C(T(α1, α3), T(α2, α3)),    (41)

and,

C(α1, T(α2, α3)) = T(C(α1, α2), C(α1, α3)).    (42)

Adding the reasonable assumptions that T, C are continuous and T(x, x), C(x, x) are strictly increasing, then (Bellman & Giertz, 1973) we must have T = min and C = max. Since we do not wish always to use min for AND and max for OR we will not assume equations (39) and (40) must hold. We maintain that any prior association between the patterns must be taken into account in finding antecedent confidence. Consider the rule:

If ([SIZE is small] OR [SIZE is large]) AND [POSITION is center], then . . .    (43)


In this rule fuzzy numbers are the values of the attributes SIZE and POSITION and small, large and center are literals all defined by fuzzy numbers. Let α1, α2, α3 be our confidence in the first, second, and third pattern, respectively. Now the patterns [SIZE is small] and [SIZE is large] are negatively associated as much as possible. If one is true, the other is false. Therefore, we would employ the Lukasiewicz OR and obtain,

CONF [α1 OR α2] = LOR (α1, α2).    (44)

One might believe that the size of an object and its position are independent and then use probabilistic AND to combine the confidence in SIZE with that of POSITION. It follows that this rule's antecedent confidence would be:

β = α3 min (α1 + α2, 1).    (45)

In general then we suggest,

CONF [α1 AND α2] = AND (α1, α2; F),    (46)

and,

CONF [α1 OR α2] = OR (α1, α2; F).    (47)

For the rule given in equation (43) we first use OR (α1, α2; −1) and then combine the result with α3 using AND with F = 0. If brackets are employed to associate the patterns, as in equation (43), then we could use equations (46) and (47) to find the antecedent confidence. However, brackets are not always used, especially in rules where all the patterns are connected by ANDs. We then suggest:

CONF [α1 AND α2 · · · AND αn] = AND (α1, . . . , αn; F) = β,    (48)

where F is a measure of association between all the patterns. We believe that the measure of association should be 0 < F ≤ 1 because in a "good" rule the patterns will be positively associated. If some pattern P1 is negatively associated with another pattern P2 in the antecedent of a rule, then this rule will never fire (not a "good" rule) because when P1 tends to be true (α1 close to 1) P2 will tend to be false (α2 close to 0), and then β will be close to zero and this rule will never pass threshold. In FLOPS we now default to F = 1 and CONF = min in equation (48). One might also consider defaulting to probabilistic AND with F = 0. Nevertheless, a FES shell should make available to the user a wide range of operators to find antecedent confidence (Applebaum & Ruspini, 1985).
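For the rule in equation (43) the suggested computation is Lukasiewicz OR on the two SIZE patterns followed by probabilistic AND with the POSITION pattern; as a self-contained sketch:

```python
def antecedent_confidence(a1, a2, a3):
    """Equation (45) for the rule in equation (43): the two SIZE
    patterns are maximally negatively associated (LOR, i.e. OR with
    F = -1) and SIZE is taken as independent of POSITION
    (probabilistic AND, i.e. F = 0)."""
    size = min(a1 + a2, 1.0)   # LOR(a1, a2)
    return a3 * size           # probabilistic AND with a3
```

For instance a1 = 0.6, a2 = 0.3, a3 = 0.8 gives β = 0.8 × 0.9 = 0.72.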

3. Final left hand side confidence

Assume we have a rule R, with conditional confidence cf(R | β), which has passed threshold. Recall that cf(R | β) is our confidence in the rule's consequence, or our confidence in the actions to be performed by the right hand side of the rule, conditioned on our confidence β in the rule's antecedent. The value of cf(R | β) need not be one even if β equals one. Let,

cf(R | β) = f(β),    (49)

where f(β) ∈ [0, 1], f(0) = 0, and f is non-decreasing. Equation (49) says our confidence in the consequence is a non-decreasing function of our confidence in the antecedent. An important, and useful, example of conditional confidence is:

f(β) = 0.9, if 0.9 ≤ β ≤ 1,
f(β) = 0.7, if 0.6 ≤ β < 0.9,    (50)
f(β) = 0,   if 0 ≤ β < 0.6.

The conditional confidence given in equation (50) is simply expressed as: (1) our confidence in the consequence is high (0.9) if our confidence in the antecedent is high (0.9 ≤ β ≤ 1); (2) our confidence in the consequence is moderate (0.7) if our confidence in the antecedent is moderate (0.6 ≤ β < 0.9); and (3) our confidence in the consequence is zero otherwise.

    (52)

where again β is the antecedent confidence for input V. Using g(V) we will uncondition the conditional fuzzy set to obtain the confidence in the consequence θ. Let Eβ = {V | μ(V) = β}. Given that we have observed Eβ (antecedent confidence β) we calculate,

γ = sup {min (μ(θ | V), μ(V)) : V ∈ Eβ},    (53)

which is the final left hand confidence to be assigned to θ and all the attributes specified in θ. If θ is not related to any V in Eβ, then γ = 0. The calculation in equation (53) simplifies to,

γ = min (f(β), β),    (54)

for any θ related to a V in Eβ. Therefore, the θ function discussed in Part I (equation (24)) may be taken as the minimum of f(β) and β. Alternatively, one could employ another t-norm for AND in equation (53), instead of min, and obtain γ = T(f(β), β). Equation (53) may be difficult to understand for those readers familiar only with probabilistic arguments so, in Appendix A3, we present a simple example illustrating the calculations leading up to equation (53) and also discuss the probabilistic analogue to this development.
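The stepwise conditional confidence of equation (50) and the unconditioning of equation (54) can be sketched together:

```python
def f(beta):
    """Equation (50): stepwise conditional confidence cf(R | beta)."""
    if beta >= 0.9:
        return 0.9
    if beta >= 0.6:
        return 0.7
    return 0.0

def final_lhs_confidence(beta):
    """Equation (54): gamma = min(f(beta), beta)."""
    return min(f(beta), beta)
```

For β = 0.95, γ = min(0.9, 0.95) = 0.9; for β = 0.65, γ = min(0.7, 0.65) = 0.65. In either case γ never exceeds the antecedent confidence itself.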


A rule's unconditional confidence is,

cf(R) = sup {min (μ(θ | V), μ(V)) : all inputs V},    (55)

which simplifies to cf(R) = f(1) because f is non-decreasing. We therefore obtain the usual interpretation of cf(R) as the confidence in any consequence given our confidence in the antecedent is one. Using any t-norm for AND in equation (55) we still get cf(R) = f(1).

Part II. Memory maintenance systems

In any production system the state of memory at any one time may make more than one rule concurrently fireable. We consider two basic modes of operation: (1) deductive, in which one rule is selected for firing by some reasonable scheme and the rest are stacked for backtracking; and (2) inductive, in which all fireable rules are fired. A third, or mixed, mode is possible, in which some rules are fired, selected by a clustering algorithm, and the rest are stacked for backtracking. In selecting a single rule for firing, the rules are ordered for "fireability" by any of a number of algorithms available for non-fuzzy systems. For a fuzzy system, we propose that rules be selected for firing based on the antecedent confidence (β) above threshold, and that rules be first ordered for firing based on net confidence in the antecedent and in the rule itself (γ). (Subsequent ordering to break ties may use any of the non-fuzzy algorithms.) The actions specified on the right-hand side of a rule include most importantly those which modify memory, including DELETE, MODIFY and MAKE memory elements. In a fuzzy expert system, before memory may be modified the state of memory before a rule is fired must be considered. This is especially important in an inductive or parallel system; memory conflicts may occur when a number of actions, all referring to the same attribute in working memory, attempt to execute concurrently. We need an algorithm for resolving such conflicts; such an algorithm will be called a memory maintenance system. Since at present parallel computers are not readily available for fuzzy expert systems, the parallelism must be emulated on a sequential von Neumann machine; although we consider rules to be fired simultaneously in the parallel or mixed modes, memory update must be accomplished sequentially.
We propose that a condition for the validity of a memory maintenance system is that the final state of working memory must be independent of the sequence in which the modifications called for by the rules are presented to the system. This is equivalent to the condition that the memory maintenance system be implementable on a true parallel machine. Let us consider an example which will illustrate the possible memory conflicts. We have a memory element ANIMAL (128) with an attribute WEIGHT whose current value is (120, 0.70). Three rules are fireable whose right-hand sides wish to modify both the value of WEIGHT (120 pounds) and its confidence (0.70) by storing: (I) (150, 0.60); (II) (160, 0.70); (III) (180, 0.80). What action should we take?


The sequence used to execute these actions could be (I)-(II)-(III), or (III)-(II)-(I), etc., and the final state of working memory should be independent of the order chosen. Notice also that these actions desire to modify both the value of WEIGHT and its confidence. Most authors (except Bouchon & Lauriere, 1985) have ignored the possibility of changing the value of an attribute and have concentrated only on the attribute's confidence. Other complications in the memory update problem are: (1) multiple instances of ANIMAL may exist in working memory; and (2) right-hand actions may be "joint" in that they must all execute together or none is executed. In the first section we will present a number of different memory maintenance systems and then in the following section we discuss selection of memory maintenance systems for a FES. But first let us discuss the general mechanism of any memory update algorithm. Let Rk, 1 ≤ k ≤ K, be all the fireable rules, each having final right hand side confidence γk. DELETES and MODIFIES in the right hand side of an Rk refer to instances of memory elements whose attributes were used to fire Rk. Suppose some attributes in ANIMAL (128) were input into the left hand side of R3. Then a DELETE and a MODIFY in the right hand side of R3 which refer to ANIMAL are in fact referring to ANIMAL (128). A MAKE statement may act like a DELETE or MODIFY discussed above, or it can create entirely new instances of memory elements. We may execute all DELETES in any sequence because the system will not allow a DELETE to affect currently executable MAKES or MODIFIES. A DELETE refers to a whole memory element and effectively sets all cf-values for all attributes to zero but does not alter values or time tags.
Suppose a DELETE refers to ANIMAL (128) and is executed before some MAKES and MODIFIES, also referring to ANIMAL (128), can be executed. The system makes all initial cf-values for all the attributes in ANIMAL (128) available to these MAKES and MODIFIES in the memory update algorithm. Otherwise, we would have a major problem in sequencing DELETES, MAKES and MODIFIES. We may also perform all the MAKES in any sequence. A MAKE creates a new instance of a memory element (with a new time tag) and by default it uses confidence value γk from the rule. The order in which MAKES are executed affects only the time tags on the memory elements. Alternatively, one might consider combining MAKES and MODIFIES on the same memory element in order to reduce the number of new instances of memory elements created by firing a group of rules. Therefore, a memory maintenance system is only needed for MODIFIES. Consider a memory element M with attributes Ai, 1 ≤ i ≤ m. We assume a number of instances M(TTj), 1 ≤ j ≤ n, exist in working memory, where TTj is M's time tag. The memory maintenance system operating on all the MODIFIES, in the right hand side of a fireable rule Rk, is:

(1) Collect all the MODIFIES referring to M(TT1),
    (a) collect all the MODIFIES referring to A1 in M(TT1),
    (b) update working memory for A1,
    (c) repeat (a) and (b) for all Ai, 2 ≤ i ≤ m,
    (d) create new instance(s) of M with new time tag(s),


(2) Repeat (1) for M(TT_j), 2 <= j <= n,
(3) Repeat (1) and (2) for all memory elements.
We will discuss (1)(b) and (1)(d) in more detail below.

1. MEMORY MAINTENANCE SYSTEMS

The memory update system employed will depend on how the values of attributes are specified. In our example ANIMAL (128) the value of an attribute could be a variable fuzzy set. We may allow multiple values for WEIGHT, such as (120, 0.7) and (170, 0.6), and then we would represent this attribute's value as the discrete fuzzy set:

    {0.7/120, 0.6/170}.    (56)

This fuzzy set could be allowed to grow to,

    {0.7/120, 0.6/170, 0.9/200},    (57)

or shrink to,

    {0.7/120} = (120, 0.7),    (58)

as the result of firing rules. The fuzzy set for SIZE might be fixed in that we do not allow adding members such as "huge" nor do we allow deleting any of its members. Suppose initially, before the actions in the right hand side of fireable rules are executed, we have for ANIMAL (128) in working memory WEIGHT = (120, 0.7), or,

    WEIGHT = ({0.7/120}, 1),    (59)

or, under the variable fuzzy set assumption,

    WEIGHT = ({0.7/120, 0.6/170}, 1),    (60)

and SIZE and COLOR are specified as in equation (2). A default value of one was assigned for our confidence in the fuzzy set for WEIGHT. Our assumption is that: (1) WEIGHT and COLOR have single values and SIZE is a fixed fuzzy set; or (2) WEIGHT is a variable fuzzy set, SIZE is a fixed fuzzy set and COLOR has a single value. The second assumption is needed in the next section. We will now consider two memory maintenance systems: (1) based on value; and (2) based on confidence. 1.1 Value based Suppose after a group of rules fire we have some MODIFIES, referring to WEIGHT in ANIMAL (128), which wish to store the following values and confidences:

(120, 0.6), (150, 0.8), (200, 0.6), (120, 0.75), (150, 0.7), (200, 0.9).

(61)

Grouping on value first and then taking maximum confidence we obtain: (120, 0.75), (150, 0.8), (200, 0.9).

(62)
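The grouping-and-maximum step that takes the pairs of equation (61) to equation (62) can be sketched in Python. This is a minimal illustration of the value based bookkeeping, not FLOPS code; the function name and data layout are ours:

```python
def group_by_value(pairs):
    """Group (value, confidence) pairs on value, keeping the maximum
    confidence proposed for each value (value based update)."""
    best = {}
    for value, cf in pairs:
        best[value] = max(cf, best.get(value, 0.0))
    return sorted(best.items())

# The six pairs the MODIFIES wish to store, as in equation (61) ...
pairs = [(120, 0.6), (150, 0.8), (200, 0.6),
         (120, 0.75), (150, 0.7), (200, 0.9)]
# ... reduce to the three pairs of equation (62).
print(group_by_value(pairs))  # [(120, 0.75), (150, 0.8), (200, 0.9)]
```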


If WEIGHT is a variable fuzzy set we union the fuzzy set in equation (60) with (62) to obtain,

    WEIGHT = ({0.75/120, 0.8/150, 0.6/170, 0.9/200}, 1).    (63)
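The union in equation (63) is the member-wise maximum of the stored fuzzy set for WEIGHT and the incoming set of equation (62); a sketch (the dict representation and helper name are ours):

```python
def fuzzy_union(a, b):
    """Union of two discrete fuzzy sets (dicts mapping value -> membership),
    taking the maximum membership for each value."""
    return {v: max(a.get(v, 0.0), b.get(v, 0.0)) for v in set(a) | set(b)}

stored = {120: 0.7, 170: 0.6}               # initial fuzzy set for WEIGHT
incoming = {120: 0.75, 150: 0.8, 200: 0.9}  # the pairs of equation (62)
weight = fuzzy_union(stored, incoming)
# weight == {120: 0.75, 150: 0.8, 170: 0.6, 200: 0.9}, as in equation (63)
```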

When WEIGHT is a single value (120, 0.7) we would keep each pair in equation (62) and create new instances of ANIMAL. However, we first must consider what happens to SIZE and COLOR. Let us assume that no changes are indicated for COLOR but there are MODIFIES referring to the members of SIZE which indicate that: cf(small) = 0.6, cf(medium) = 0.8, cf(large) = 0.7, cf(small) = 0.5. Taking maximum confidence we obtain:

    SIZE = {0.6/small, 0.8/medium, 0.9/large}.    (64)

When WEIGHT is a variable fuzzy set we obtain one new instance of ANIMAL (a new time tag) with WEIGHT given by equation (63), SIZE by equation (64) and no change for COLOR. If WEIGHT is a single value we obtain three new instances of ANIMAL, all with the same SIZE and COLOR as above, but one instance has WEIGHT = (120, 0.75), another (150, 0.8), and still another (200, 0.9). Whenever any attribute in ANIMAL (128) is modified a new instance is created and the old instance (TT = 128) is deleted. We would call a value based system strongly monotonic because both the number of values, and their confidences, assigned to attributes do not decrease.

1.2 Confidence based
This procedure is essentially the same for fixed fuzzy sets (like SIZE) so let us concentrate on the attribute WEIGHT in ANIMAL (128) whose initial value and confidence are (120, 0.7). The confidence based method does not allow for variable fuzzy sets. The fireable rules R_k have MODIFIES in their right hand sides which wish to store (w_k, y_k) for WEIGHT, where w_k is the value and y_k is the confidence in w_k. Let y_1* = max (y_k). We may also have joint MODIFIES in the right hand side such as:

    MODIFY (WEIGHT = (w, y_a) AND COLOR = (green, y_b)).

(65)

We must be able to execute both MODIFIES or neither will execute. Both may execute if and only if y_a >= 0.7 (initial confidence in WEIGHT) and y_b >= 0.9 (initial confidence in COLOR). Now let (w_s, y_s), s in S, be all the weight-confidence pairs for WEIGHT in the executable joint MODIFIES and set y_S* = max (y_s | s in S) and y* = max (y_1*, y_S*). If S is empty, then y* will be y_1*. Joint MODIFIES are not employed in the value based truth maintenance system when variable fuzzy sets are allowed. Memory update algorithms for WEIGHT can now be described in terms of y*


as:
1. If y* > 0.7, then keep all the (w_t, y_t) whose y_t = y* and delete all the rest, including the original (120, 0.7);
2. If y* < 0.7, then keep only the initial (120, 0.7); and
3. If y* = 0.7, then,
   (a) (WM1) keep all the (w_t, y_t) with y_t = 0.7 and also keep (120, 0.7), or,
   (b) (WM2) keep all the (w_t, y_t) with y_t = 0.7 but delete (120, 0.7), or,
   (c) (WM3) keep only (120, 0.7).
As an example of these three procedures assume the fireable rules want to store (150, 0.7) and (200, 0.7) for WEIGHT. Then,

1. WM1 gives (120, 0.7), (150, 0.7), (200, 0.7),
2. WM2 gives (150, 0.7), (200, 0.7), and
3. WM3 gives (120, 0.7).
After generating all the new value-confidence pairs for WEIGHT and COLOR, and the new (fixed) fuzzy set for SIZE, we take all possible combinations of one value-confidence pair for WEIGHT and one for COLOR, together with SIZE, to create all the new instances of ANIMAL. If any value or cf-value in ANIMAL is MODIFIED, a new instance is created and the old one, ANIMAL (128), is deleted. We call these memory update procedures weakly monotonic because the cf-values do not decrease. Still another method would be always to delete (120, 0.7) and keep all the (w_t, y_t) with y_t = y*, regardless of whether y* is larger or smaller than the original confidence. We would call this last procedure non-monotonic (NM) because confidences may decrease. In FLOPS we now employ WM2 except for recursive calculations on numbers, where we employ NM. Therefore, one may use different memory update methods within a FES. We also call WM2 replacement logic because we allow replacement of the original attribute value (120 pounds for weight) when the maximum confidence in the possible replacements equals our confidence in the initial value.

2. SELECTING A MEMORY MAINTENANCE SYSTEM
A memory maintenance system is absolutely necessary in the parallel, or mixed, modes of operation but it is also required in the sequential mode. When we first developed our FES shell FLOPS it only had the sequential mode and the memory update procedure adopted was WM2. Later, when the parallel mode was added, we simply expanded WM2 to handle all the possible memory conflicts. However, after considering various other possible memory maintenance systems, we are now proceeding to change over to the value based method.
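For reference, the confidence based rules of Section 1.2, including the WM1/WM2/WM3 tie-breaking variants, can be sketched in Python. This is a simplified single-attribute illustration with our own naming, not FLOPS code:

```python
def update(initial, candidates, policy="WM2"):
    """Confidence based memory update for a single-valued attribute.

    initial:    the stored (value, cf) pair, e.g. (120, 0.7)
    candidates: (value, cf) pairs the fireable MODIFIES wish to store
    policy:     tie-breaking variant used when y* equals the initial cf
    Returns the list of surviving (value, cf) pairs.
    """
    if not candidates:
        return [initial]
    y_star = max(cf for _, cf in candidates)
    best = [(v, cf) for v, cf in candidates if cf == y_star]
    cf0 = initial[1]
    if y_star > cf0:            # rule 1: replace the original outright
        return best
    if y_star < cf0:            # rule 2: keep only the original
        return [initial]
    # rule 3: y* == cf0, behaviour depends on the chosen variant
    if policy == "WM1":         # keep original and all tying candidates
        return [initial] + best
    if policy == "WM2":         # replacement logic: drop the original
        return best
    return [initial]            # WM3: keep only the original

cands = [(150, 0.7), (200, 0.7)]
print(update((120, 0.7), cands, "WM1"))  # [(120, 0.7), (150, 0.7), (200, 0.7)]
print(update((120, 0.7), cands, "WM2"))  # [(150, 0.7), (200, 0.7)]
print(update((120, 0.7), cands, "WM3"))  # [(120, 0.7)]
```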
We have decided on the value based method, for MODIFIES and non-recursive calculations, where all data items are described by variable or fixed fuzzy sets, because: (1) the system will keep all attribute values generated by the rules; (2) confidence values will not decrease; and (3) when the rules call for altering an attribute value only one new instance of the memory element will be created. The value based method, with data as variable, or fixed, fuzzy sets processes all the ambiguities and uncertainties to produce a preliminary fuzzy set of conclusions.


The result could give a maximum number of conflicts within the fuzzy set of conclusions. If crisp output is required, then one should not necessarily pick the conclusion with highest confidence. What we have done (Buckley et al., 1986a; Siler et al., 1987) is to feed the conflicting conclusions into secondary blocks of rules, together with more data, specifically designed to break the conflicts. We have found that the final (crisp) conclusion is not always the one with highest confidence in the preliminary set of fuzzy conclusions. Therefore, the value based method is used to obtain the preliminary conclusions and more rules and data are then employed, when necessary, for crisp output. We have no general criteria for selecting a memory maintenance system except our feeling that it should be monotonic in values and confidences until the system decides to "change its mind". Confidences can be decreased and values deleted, or the system can "change its mind", through the use of DELETE and MAKE commands. That is, the FES should be monotonic in executing the MODIFY command. The only condition mentioned above for the validity of a memory maintenance system is that the final state of working memory should be independent of the sequence in which modifications called for by the rules are presented to the system in one round of rule firing. The memory maintenance systems discussed above only describe the final state of working memory after one round of rule firing. The validity condition is then a condition on the internal algorithm employed to achieve a certain final state of working memory and not a criterion on memory maintenance systems.

Summary and conclusions This paper describes how a fuzzy expert system may process uncertainties and ambiguities to produce as output a preliminary fuzzy set of conclusions. The uncertainties are specified as confidence values in the data and rules and the ambiguities are handled by fuzzy sets. Part I discusses methods of processing the uncertainties through a rule to obtain final left hand side confidence which is then assigned to all actions in the rule's consequence. Part II presents methods needed to update working memory after one round of rule firing in either the sequential, mixed, or parallel mode of operation. The power of the fuzzy set procedure, over traditional expert systems, is in its ability to handle uncertainties and ambiguities. This should be further exploited in future fuzzy expert system shell design.

References
ADLASSNIG, K.-P., KOLARZ, G., SCHEITHAUER, W., EFFENBERGER, H. & GRABNER, G. (1985). CADIAG: Approaches to computer-assisted medical diagnosis. Computers in Biology and Medicine, 15, 315-335.
APPLEBAUM, L. & RUSPINI, E. H. (1985). ARIES: A tool for inference under conditions of imprecision and uncertainty. Proceedings SPIE-International Society of Optical Engineering, 548, 161-165.
BALDWIN, J. F. (1986). Support logic programming. In A. JONES, A. KAUFMANN & H.-J. ZIMMERMANN, Eds, Fuzzy Sets Theory and Applications, pp. 133-170. Dordrecht, Holland: D. Reidel.


BELLMAN, R. & GIERTZ, M. (1973). On the analytic formalism of the theory of fuzzy sets. Information Sciences, 5, 149-156.
BELLMAN, R. E. & ZADEH, L. A. (1970). Decision-making in a fuzzy environment. Management Science, 17, 141-164.
BONISSONE, P. P. & TONG, R. M. (1985). Editorial: Reasoning with uncertainty in expert systems. International Journal of Man-Machine Studies, 22, 241-250.
BOUCHON, B. & LAURIERE, J.-L. (1985). Symbolic normalized acquisition and representation of knowledge. Information Sciences, 37, 85-94.
BUCKLEY, J. J., SILER, W. & TUCKER, D. (1986a). A fuzzy expert system. Fuzzy Sets and Systems, 20, 1-16.
BUCKLEY, J. J., SILER, W. & TUCKER, D. (1986b). FLOPS, a fuzzy expert system: applications and perspectives. In H. PRADE & C. V. NEGOITA, Eds, Fuzzy Logic in Knowledge Engineering, pp. 256-274. Köln: Verlag TÜV Rheinland.
BUCKLEY, J. J. & SILER, W. (1987). Fuzzy operators for possibility interval sets. Fuzzy Sets and Systems, 22, 215-227.
DUBOIS, D. & PRADE, H. (1984a). Fuzzy-set-theoretic differences and inclusions and their use in the analysis of fuzzy equations. Control and Cybernetics, 13, 129-146.
DUBOIS, D. & PRADE, H. (1984b). Criteria aggregation and ranking of alternatives in the framework of fuzzy set theory. In H. ZIMMERMANN, L. A. ZADEH & B. GAINES, Eds, Fuzzy Sets and Decision Analysis, TIMS Studies in the Management Sciences, Volume 20, pp. 209-240. Amsterdam: Elsevier.
DUBOIS, D. & PRADE, H. (1985). A review of fuzzy set aggregation connectives. Information Sciences, 36, 85-121.
FARRENY, H., PRADE, H. & WYSS, E. (1986). Approximate reasoning in a rule-based expert system using possibility theory: a case study. International Conference on Business Applications of Approximate Reasoning, Paris.
GAINES, B. R., Ed. (1985). Reasoning with Uncertainty in Expert Systems. Guest Editor P. P. BONISSONE. Special Issue of International Journal of Man-Machine Studies, 22(3), March 1985.
HAJEK, P. (1985). Combining functions for certainty degrees in consulting systems. International Journal of Man-Machine Studies, 22, 59-76.
LESMO, L., SAITTA, L. & TORASSO, P. (1985). Evidence combination in expert systems. International Journal of Man-Machine Studies, 22, 307-326.
MARTIN-CLOUAIRE, R. & PRADE, H. (1985). On the problems of representation and propagation of uncertainty in expert systems. International Journal of Man-Machine Studies, 22, 251-264.
NEGOITA, C. V. (1985). Expert Systems and Fuzzy Systems. Menlo Park, CA: Benjamin-Cummings.
OGAWA, H., FU, K. S. & YAO, J. T. P. (1985). An inexact inference for damage assessment of existing structures. International Journal of Man-Machine Studies, 22, 295-306.
RUSPINI, E. H. (1982). Possibility theory approaches for advanced information systems. Computer, 15, 83-91.
SCHWARTZ, D. G. (1985). The case for an interval-based representation of linguistic truth. Fuzzy Sets and Systems, 17, 153-165.
SCHWEITZER, B. & SKLAR, A. (1963). Associative functions and abstract semigroups. Publicationes Mathematicae Debrecen, 10, 69-81.
SHORTLIFFE, E. & BUCHANAN, B. G. (1975). A model of inexact reasoning in medicine. Mathematical Biosciences, 23, 351-377.
SILER, W. & TUCKER, D. (1986). FLOPS, A Fuzzy Logic Production System User's Manual. Birmingham, AL: Kemp-Carraway Heart Institute.
SILER, W., BUCKLEY, J. & TUCKER, D. (1987). Functional requirements for a fuzzy expert system shell. In L. ZADEH & E. SANCHEZ, Eds, Artificial Intelligence: Applications of Quantitative Reasoning, pp. 21-31. Oxford: Pergamon Press.
SILER, W., TUCKER, D. & BUCKLEY, J. (1987). A parallel rule firing fuzzy production system with resolution of memory conflicts by weak fuzzy monotonicity, applied to the classification of multiple objects characterized by multiple uncertain features. International Journal of Man-Machine Studies, 26, 321-332.


Appendix A1
In this appendix we define t-norms and co-t-norms and review some of their basic properties. A t-norm T (Schweitzer & Sklar, 1963) is a mapping T: [0, 1] x [0, 1] -> [0, 1] satisfying:
(a) T(a, b) = T(b, a);
(b) T(0, 0) = 0, T(a, 1) = a;
(c) T(a, b) <= T(c, d) if a <= c and b <= d; and
(d) associativity.
These axioms attempt to capture the basic properties of set intersection. The smallest t-norm is:

    T_l(a, b) = { a, if b = 1; b, if a = 1; 0, otherwise }.    (A1.1)

The basic t-norms are T_l(a, b), LAND (a, b) = max (a + b - 1, 0), PAND (a, b) = ab and min (a, b), which satisfy the inequalities

    T_l(a, b) <= LAND (a, b) <= PAND (a, b) <= min (a, b).    (A1.2)

T-norms are considered appropriate functions to model fuzzy set intersection (Dubois & Prade, 1984a, b, 1985). A co-t-norm C (Schweitzer & Sklar, 1963) is a function C: [0, 1] x [0, 1] -> [0, 1] satisfying:
(a) C(a, b) = C(b, a);
(b) C(1, 1) = 1, C(a, 0) = a;
(c) C(a, b) <= C(c, d) if a <= c and b <= d; and
(d) associativity.
These axioms attempt to capture the basic properties of set union. The largest co-t-norm is:

    C_u(a, b) = { a, if b = 0; b, if a = 0; 1, otherwise }.    (A1.3)

The basic co-t-norms are max (a, b), POR (a, b) = a + b - ab, LOR (a, b) = min (a + b, 1) and C_u(a, b), which satisfy the inequalities

    max (a, b) <= POR (a, b) <= LOR (a, b) <= C_u(a, b).    (A1.4)

Co-t-norms have been considered as suitable functions for modeling fuzzy set union (Dubois & Prade, 1984a, b, 1985).
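The basic t-norms and co-t-norms, together with the inequality chains (A1.2) and (A1.4), are easy to check numerically. A small Python sketch (the function names are ours; exact rational arithmetic avoids floating point ties):

```python
from fractions import Fraction

def t_l(a, b):
    """Smallest t-norm, equation (A1.1)."""
    return a if b == 1 else (b if a == 1 else Fraction(0))

def land(a, b):
    """Lukasiewicz AND: max(a + b - 1, 0)."""
    return max(a + b - 1, Fraction(0))

def pand(a, b):
    """Probabilistic AND: ab."""
    return a * b

def c_u(a, b):
    """Largest co-t-norm, equation (A1.3)."""
    return a if b == 0 else (b if a == 0 else Fraction(1))

def por(a, b):
    """Probabilistic OR: a + b - ab."""
    return a + b - a * b

def lor(a, b):
    """Lukasiewicz OR: min(a + b, 1)."""
    return min(a + b, Fraction(1))

# Spot-check the orderings (A1.2) and (A1.4) on a grid of exact points.
pts = [Fraction(i, 10) for i in range(11)]
for a in pts:
    for b in pts:
        assert t_l(a, b) <= land(a, b) <= pand(a, b) <= min(a, b)
        assert max(a, b) <= por(a, b) <= lor(a, b) <= c_u(a, b)
```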

Appendix A2
We will use probabilistic arguments to establish equation (29). The structure of the pattern is A(R)B, so let 𝒜 (ℬ) be all the possible attribute inputs to the left hand side (right hand side) of the pattern. Also let 𝒟 = 𝒜 x ℬ. Given cf(A) = a, cf(B) = b,


Λ = F(a, b) and A R B = r, define

    C1 = {(X, Y) in 𝒟 | X R Y = r},    (A2.1)
    C2 = {(X, Y) in 𝒟 | F(cf(X), cf(Y)) = Λ}.    (A2.2)

Next let P be a probability defined on 𝒟 with P[C1] = r and P[C2] = Λ. We have observed C1 and C2 and we wish to find the probability of C1 AND C2. Using equation (11) we have,

    P[C1 ∩ C2] = AND (r, Λ; F).    (A2.3)

This then is the expression we will use for H. We could also argue the same result using fuzzy sets. Define fuzzy subsets, INPUT and RELATION, of 𝒟 whose membership functions are μ_I(X, Y) = F(cf(X), cf(Y)) and μ_R(X, Y) = X R Y, respectively. We wish to find the fuzzy set OUTPUT where,

    OUTPUT = (INPUT) ∩ (RELATION).    (A2.4)

For the membership function for OUTPUT we choose

    AND (μ_R(X, Y), μ_I(X, Y); F).    (A2.5)

Appendix A3
In this appendix we wish to present a simple example illustrating equation (53) and then discuss the probabilistic analogue to this equation. Consider the rule given in equation (43) now with the following consequence

    If ([SIZE is small] OR [SIZE is large]) AND [POSITION is center], then MODIFY (CLASSIFICATION = G2),    (A3.1)

where G2 is an element in the fuzzy set,

    CLASSIFICATION = {g1/G1, g2/G2, g3/G3}.    (A3.2)

We have a computer vision problem where we are trying to classify various regions in a picture and any region may be G1, G2 or G3 with confidence g1, g2 or g3, respectively. The input into this rule's antecedent is V = (N, M), where N is a fuzzy number for the region's size and M is another fuzzy number representing the region's position. The output θ from this rule is simply g2, our confidence that this region is G2, which will equal γ, the unknown final left hand side confidence. Therefore, 𝒜0 will be all V = (N, M), where N can be any fuzzy number for the size of the region and M is any fuzzy number representing the region's position. 𝒜1 will be all possible γ values, which theoretically could be any number between zero and one. We have a fuzzy set defined on 𝒜0 whose membership function is μ(V) = β, which is the resulting antecedent confidence obtained by putting V into the patterns in the rule's antecedent (see equation (45)). We also have a fuzzy set defined on 𝒜1, for each V in 𝒜0, whose membership function is μ(θ | V) = f(β), since f(β) is our confidence in the consequence given the confidence in the antecedent is β. We now observe β, or we observe E_β which contains all inputs V

148

J.J. BUCKLEY

producing the same antecedent confidence β. We then uncondition μ(θ | V), based on the information that V belongs to E_β, as shown in equation (53), producing

    γ = μ(θ | V in E_β).    (A3.3)

In probability theory μ(θ | V) would be a conditional probability distribution on 𝒜1 for each V in 𝒜0 and μ(V) would be a probability distribution on 𝒜0. Given that event E_β occurred, we would multiply μ(θ | V) and μ(V) and then integrate the product over E_β to obtain the probability of θ given that V was in E_β. However, in fuzzy set theory we take min instead of product and supremum in place of integration (Bellman & Zadeh, 1970).
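The fuzzy unconditioning step, with min in place of product and supremum in place of integration, can be sketched on a discretised input space. The discretisation, the membership values and the confidence transfer function below are illustrative assumptions of ours, not values from the paper:

```python
def f(beta):
    """Illustrative monotone confidence transfer: consequence
    confidence as a function of antecedent confidence."""
    return beta ** 2

# Toy discretisation: each candidate input V carries mu(V) = beta,
# its antecedent confidence.
inputs = {"V1": 0.6, "V2": 0.8, "V3": 0.8, "V4": 0.3}

beta_observed = 0.8
# E_beta: all inputs producing the observed antecedent confidence.
E_beta = [v for v, beta in inputs.items() if beta == beta_observed]

# Fuzzy unconditioning: gamma = sup over E_beta of min(f(beta), mu(V)),
# replacing the probabilistic product-and-integrate.
gamma = max(min(f(inputs[v]), inputs[v]) for v in E_beta)
print(gamma)  # approximately 0.64 = min(f(0.8), 0.8)
```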