Fuzzy Sets and Systems 92 (1997) 223-240
Bayesian conditioning in possibility theory

Didier Dubois*, Henri Prade

Institut de Recherche en Informatique de Toulouse (I.R.I.T.) - C.N.R.S., Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex 4, France
Received April 1997
Abstract

In this paper, possibility measures are viewed as upper bounds of ill-known probabilities, since a possibility distribution is a faithful encoding of a set of lower bounds of probabilities bearing on a nested collection of subsets. Two kinds of conditioning can be envisaged in this framework, namely revision and focusing. On the one hand, revision by a sure event corresponds to adding an extra constraint enforcing that the contrary event is impossible. On the other hand, focusing amounts to a sensitivity analysis on the conditioned probability measures (induced by the lower bound constraints). When focusing on a particular situation, the generic knowledge encoded by the probability bounds is applied to this situation, without aiming at modifying the generic knowledge. This contrasts with revision, where the generic knowledge is modified by the new constraint. This paper proves that focusing applied to a possibility measure yields a possibility measure again, which means that the conditioning of a family of probabilities, induced by lower bounds bearing on probabilities of nested events, can be faithfully handled on the possibility representation itself. Relationships with similar results in the belief function setting are pointed out. Lastly, the application of possibilistic focusing to exception-tolerant inference is suggested. © 1997 Elsevier Science B.V.

Keywords: Probability bounds; Possibility theory; Conditioning; Default rules
1. Introduction

Information can be generic (i.e., pertaining to a class of situations) or evidential (i.e., pertaining to a particular case). This distinction between evidential and generic information is crucial for a proper understanding of belief revision processes and commonsense inference. In the following the term
*Corresponding author. Tel.: +33-61556765; fax: +33-61556258; e-mail: [email protected].
"knowledge" refers to generic information, while information about a case is referred to as "(factual) evidence". Let K denote the generic knowledge and E the particular "evidential" information on a case at hand. K is often represented as a rule-base, or as a probability distribution (encoded as a Bayesian network). In this paper it will be represented by a family of probability distributions induced by imprecise probabilistic knowledge, and more specifically by a family of probability distributions representable by a possibility measure. An example of probabilistic knowledge base that can be represented by means of a family of probability measures
S0165-0114/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0165-0114(97)00172-3
consists in a set of statements of the form "most A_i's are B_i's", "few A_i's are B_i's", etc. More generally, K = {"Q_i A_i's are B_i's", i = 1, …, n} where Q_i is an imprecise quantifier of the form [α, 1] (for "most") or [0, β] (for "few"), or [α, β], expressing an ill-known proportion. Each such statement can be modelled by a constraint of the form P(B_i|A_i) ∈ Q_i. The evidence E is supposed to be incomplete but not uncertain here. It is represented by a proposition or an event. Assume a new piece of information arrives. The problem addressed here is what to do in order to account for the new information in the pair (K, E) so as to produce new plausible conclusions. When the input information is evidential it only affects the evidence E. An operation we call focusing [16] consists in applying the generic knowledge to the context of interest as described by the evidential information (including the new piece), in order to answer questions of interest pertaining to this particular situation. In such a process the knowledge should remain completely unchanged. Namely, we know that the case at hand is in class A. Then, considering the above probabilistic example, suppose it is asked whether the case at hand has property B. What is to be computed is P(B|A). However, since there may be more than one probability measure compatible with the statements in K, only upper and lower bounds of P(B|A) can be computed. This has to be contrasted with the situation where the input information is generic. It should then be understood as a new constraint refining the available knowledge. In such a case, a genuine revision takes place since the knowledge has to be modified in order to incorporate the new piece of information as a new constraint (except if it is redundant with respect to what is already known). In the case of revision, the new piece of information is of the form P(A) = 1, which means that all cases (that the knowledge base refers to) satisfy property A.
Then K becomes K ∪ {P(A) = 1}, and new values for the upper and lower bounds of P(B), where B is a property of interest, can then be computed. These bounds do not coincide with the ones of P(B|A) calculated above in the focusing case, and may even not exist if K implies that P(A) ≠ 1. These remarks make it clear that revising is not focusing. A revision process can be iterated, and it progressively modifies the initial knowledge base. Iterated focusing only refines the body of evidence, and the same generic knowledge is applied to a new reference class of situations pointed at by the evidence. In some problems, only evidential information is available, and new pieces of possibly uncertain evidence lead to a non-trivial revision process due to uncertainty, as in [37]. The distinction between focusing and (generic) revision can only be made in uncertainty frameworks where the difference between generic knowledge and factual evidence can be captured. For both revision and focusing on the basis of uncertainty functions, such as probability measures, the basic tool is the conditioning operation; see [16, 23] for general discussions about focusing vs. revision. Possibility theory is now acknowledged as one of the major non-probabilistic theories of uncertainty, and the only one that is, in some sense, simpler than probability theory [42]. One of its strengths is the multiplicity of its interpretive settings, ranging from purely ordinal ones to purely numerical ones. In particular, a possibility measure can be viewed as a particular case of upper probability function [13, 15]. One of the natural steps in the development of possibility theory is to understand the conditioning operation in connection with Bayesian conditioning. This is the topic of this paper. In the following, we first briefly recall the upper probability view of possibility theory; then focusing and revision are explained in the framework of upper and lower probability systems, before considering these two operations in possibility theory. The main result of this paper proves that focusing applied to a possibility measure yields a possibility measure again. It means that the conditioning of a family of probabilities, induced by lower bounds bearing on probabilities of nested events, can be faithfully handled at the level of the possibility representation itself.
Lastly, the application of possibilistic focusing to exception-tolerant inference is suggested. This paper is an expanded version of [22].
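The contrast between focusing and revision described above can be made concrete computationally. The sketch below uses a hypothetical two-rule base K = {P(B|A) ≥ 0.8, P(C|B) ≥ 0.6} over three boolean attributes; the attribute names, the numbers and the sampling scheme are ours, chosen only for illustration. Focusing queries the range of P(C|A) over all probabilities compatible with K, while revision adds P(A) = 1 as an extra constraint and queries P(C).

```python
import random

# Toy rule base (hypothetical bounds): K = { P(B|A) >= 0.8, P(C|B) >= 0.6 }
atoms = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def prob(P, pred):
    return sum(p for atom, p in zip(atoms, P) if pred(atom))

def cond(P, hyp, given):
    den = prob(P, given)
    return prob(P, lambda t: hyp(t) and given(t)) / den if den > 0 else None

def satisfies_K(P):
    x = cond(P, lambda t: t[1] == 1, lambda t: t[0] == 1)  # P(B|A)
    y = cond(P, lambda t: t[2] == 1, lambda t: t[1] == 1)  # P(C|B)
    return x is not None and x >= 0.8 and y is not None and y >= 0.6

# Focusing: P(C|A) is almost unconstrained -- two feasible models give 1 and 0.
P_hi = [1.0 if t == (1, 1, 1) else 0.0 for t in atoms]
P_lo = [{(1, 1, 0): 0.1, (0, 1, 1): 0.9}.get(t, 0.0) for t in atoms]
assert satisfies_K(P_hi) and satisfies_K(P_lo)
assert cond(P_hi, lambda t: t[2] == 1, lambda t: t[0] == 1) == 1.0
assert cond(P_lo, lambda t: t[2] == 1, lambda t: t[0] == 1) == 0.0

# Revision: adding P(A) = 1 to K entails P(C) >= 0.8 * 0.6 = 0.48 in every model.
random.seed(0)
hits = 0
for _ in range(20000):
    w = [random.random() if t[0] == 1 else 0.0 for t in atoms]  # support inside A
    s = sum(w)
    P = [x / s for x in w]
    if satisfies_K(P):
        hits += 1
        assert prob(P, lambda t: t[2] == 1) >= 0.48 - 1e-9
assert hits > 0
```

The two explicit models show that focusing leaves P(C|A) totally unknown, while the sampling loop illustrates the entailed revision bound P(C) ≥ 0.48.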
2. Possibility as upper probability

Possibility theory [12] is one of the simplest theories of uncertainty. It relies on the use of
set-functions called possibility measures, that are characterized by their max-decomposability [43], namely Π(A ∪ B) = max(Π(A), Π(B)), for A, B ⊆ U where U is a referential set. Π is said to be "maxitive". On infinite sets a sup-decomposability must be stated for infinite unions (what could be called σ-maxitivity or "supitivity"). Like probability measures, possibility measures are entirely characterized by possibility distributions that attach to each element u ∈ U a level of possibility π(u), and it holds that Π(A) = max{π(u) | u ∈ A} in the finite case (sup instead of max in the infinite case). By duality, a necessity measure N(A) = 1 − Π(Ā) is associated with Π. Necessity measures satisfy the characteristic property N(A ∩ B) = min(N(A), N(B)). Possibility theory can be interpreted either as a qualitative approach to uncertainty based on a linear ordering of uncertainty levels (where events can only be ranked according to their levels of possibility and their levels of necessity defined on an ordinal scale), or as a numerical framework for uncertainty which then may be related to probability theory [5]. Mathematically speaking, numerical possibility measures, like probabilities, are indeed remarkable particular cases of Shafer plausibility functions, and can also be viewed as a simple, but non-trivial, system of upper probabilities; see [13] for a classification. In other words, a possibility measure is viewed as the upper bound of an unknown probability value. This upper approximation agrees with the idea that something should first be possible before being probable [43]. The present paper adopts this particular view of a possibility measure. See [18] for a more general discussion about fuzzy sets, possibility and probability. More recently, Dubois et al. [11] have linked together possibility theory, likelihood functions and imprecise probabilities.
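The two characteristic properties just recalled (maxitivity of Π and min-decomposability of N on intersections) are easy to check on a toy example; the distribution values below are illustrative, not from the paper.

```python
from itertools import combinations

# Hypothetical possibility distribution on a four-element universe.
pi = {"a": 1.0, "b": 0.7, "c": 0.4, "d": 0.1}
U = set(pi)

def Pi(event):
    """Possibility measure: Pi(A) = max of pi over A (0 on the empty set)."""
    return max((pi[u] for u in event), default=0.0)

def N(event):
    """Dual necessity measure: N(A) = 1 - Pi(complement of A)."""
    return 1.0 - Pi(U - set(event))

# Maxitivity of Pi and min-decomposability of N, checked over all pairs of events:
events = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]
for A in events:
    for B in events:
        assert Pi(A | B) == max(Pi(A), Pi(B))
        assert N(A & B) == min(N(A), N(B))
```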
From a formal point of view, a possibility measure with a finite number of levels of possibility can be viewed as a particular type of upper probability system in the following sense. Namely, consider an upper probability function P* induced by a finite set of lower bounds {P(A_i) ≥ α_i, i = 1, …, n}, i.e., P*(B) = sup{P(B) | P ∈ 𝒫},
where 𝒫 = {P | P(A_i) ≥ α_i, i = 1, …, n} and A_i ⊆ U for some referential set U. Note that the lower probability is defined by P∗(B) = inf{P(B) | P ∈ 𝒫} = 1 − P*(B̄). P* is a particular case of an upper envelope [9]. The set-function P* is a possibility measure if and only if there exists a nested set {A_1, …, A_n}, e.g., A_1 ⊆ A_2 ⊆ ⋯ ⊆ A_n, and a set of weights α_1 ≤ α_2 ≤ ⋯ ≤ α_n ∈ [0, 1] such that 𝒫 = {P | P(A_i) ≥ α_i, i = 1, …, n}; see [15] for a detailed exposition and proof. An example of such a set of nested constraints is provided by confidence intervals. The possibility distribution associated to P* is given by

π(u) = 1 − α_{i−1}   ∀u ∈ A_i\A_{i−1}, for i = 2, …, n,

and

π(u) = 1   ∀u ∈ A_1.

This result presupposes that α_i ≤ α_{i+1}, which can always be assumed since A_i ⊆ A_{i+1} (otherwise the constraint P(A_{i+1}) ≥ α_{i+1} would be redundant). Conversely, a family of probability distributions can be induced by any numerical possibility measure Π on a finite set, namely 𝒫(Π) = {P | P(A) ≤ Π(A), ∀A}, and it holds that P*(B) = sup{P(B) | P ∈ 𝒫(Π)} = Π(B) for any event B. The nested sets A_i from which P* can be defined are nothing but the level-cuts of the possibility distribution π; see [7] for an extension of these results to infinite families of nested sets. Mathematical aspects of possibility theory are studied by Mesiar [32] and in a very long paper by De Cooman [6].
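The correspondence between nested lower-bound constraints and a possibility distribution can be checked numerically. In the sketch below, the nested sets and the bounds α_i are hypothetical; the sampling loop verifies that every probability satisfying the constraints is dominated by the induced Π on all events.

```python
import itertools
import random

# Nested constraints P(A_i) >= alpha_i on U = {1,2,3,4}: A1 ⊆ A2 ⊆ A3 (illustrative).
U = [1, 2, 3, 4]
nested = [({1}, 0.5), ({1, 2}, 0.7), ({1, 2, 3}, 0.9)]

def possibility_from_nested(nested, universe):
    """pi(u) = 1 on A1; pi(u) = 1 - alpha_{i-1} on A_i \\ A_{i-1}; 1 - alpha_n outside A_n."""
    sets = [s for s, _ in nested]
    alphas = [a for _, a in nested]
    pi = {}
    for u in universe:
        inside = [i for i, s in enumerate(sets) if u in s]
        if not inside:
            pi[u] = 1.0 - alphas[-1]
        elif inside[0] == 0:
            pi[u] = 1.0
        else:
            pi[u] = 1.0 - alphas[inside[0] - 1]
    return pi

pi = possibility_from_nested(nested, U)
assert pi[1] == 1.0 and abs(pi[2] - 0.5) < 1e-9 and abs(pi[3] - 0.3) < 1e-9

def Pi(event):
    return max((pi[u] for u in event), default=0.0)

# Every sampled probability satisfying the constraints is dominated by Pi.
random.seed(0)
accepted = 0
for _ in range(20000):
    w = [random.random() for _ in U]
    s = sum(w)
    P = dict(zip(U, (x / s for x in w)))
    if all(sum(P[u] for u in A) >= a for A, a in nested):
        accepted += 1
        for r in range(1, len(U) + 1):
            for event in itertools.combinations(U, r):
                assert sum(P[u] for u in event) <= Pi(event) + 1e-9
assert accepted > 0
```

Sampling only confirms dominance (P*(B) ≤ Π(B)); the equality P*(B) = Π(B) is the theoretical result recalled above.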
3. Focusing vs. revision in upper and lower probability systems

One of the basic difficulties in the study of non-additive uncertainty models is that of devising suitable counterparts to the notion of conditioning. Let us briefly analyze the situation in the frameworks of belief functions and of upper and lower probabilities. Belief and plausibility functions [37] are, at the mathematical level, special kinds of lower and upper envelopes, respectively, denoted Bel and Pl. Namely, Bel is monotone of infinite order, in the terminology of Denneberg [9], that is,
for all n > 1,

Bel(A_1 ∪ A_2 ∪ ⋯ ∪ A_n) ≥ Σ_{i=1,…,n} (−1)^{i+1} Σ_{I ⊆ {1,2,…,n}, |I|=i} Bel(∩_{j∈I} A_j).

In the setting of belief functions, two different definitions of conditioning have emerged:
- What is now known as Dempster's [8] conditioning, such that Pl(B|A) = Pl(B ∩ A)/Pl(A), Bel(B|A) = 1 − Pl(B̄|A).
- Another conditioning also proposed by Dempster, but revived by Fagin and Halpern [24] and De Campos et al. [4], and such that

Pl_A(B) = Pl(B ∩ A) / (Pl(B ∩ A) + Bel(A ∩ B̄)),

where Bel(B) = 1 − Pl(B̄). By duality,

Bel_A(B) = 1 − Pl_A(B̄) = Bel(B ∩ A) / (Bel(B ∩ A) + Pl(A ∩ B̄)).
This conditioning is called Bayesian by Jaffray [29]. While there has been some dispute about which rule is best (e.g., [30, 36]), we claim that it is not surprising that there should be two rules, because each one serves a distinct purpose. However, in Bayesian probability theory it turns out that the two above conditioning rules both reduce to Bayes' conditioning rule, which is unfortunately used for these two purposes, hence the current confusion.
3.1. The maximum likelihood rule: Belief revision by Dempster conditioning

Dempster's rule of conditioning can be cast in the setting of upper and lower probabilities. In the setting of 2-monotone set-functions P∗, Gilboa and Schmeidler [25] have proposed to select, within the set 𝒫 = {P | P ≥ P∗} of probabilities dominating P∗, only those probability measures P such that the probability P(A) of the conditioning event is maximal. Letting 𝒫_A = {P ∈ 𝒫 | P(A) = P*(A)}, where P*(A) = 1 − P∗(Ā), define the maximum likelihood conditioning rule as

P*(B|A) = sup{P(A ∩ B)/P*(A) | P ∈ 𝒫_A}.

When P* is a plausibility function, P*(B|A) = P*(B ∩ A)/P*(A) coincides with Dempster's revision rule [25]. We have claimed that Dempster's rule is well-adapted to the revision problem, i.e., for the purpose of revising uncertain prior knowledge upon arrival of new generic information [23]. Namely, let 𝒫 represent our uncertain knowledge and suppose the input information is A and takes the form of a constraint stating that P(A) = 1. Revision by a constraint A comes down to modifying 𝒫 into 𝒫_A^+ = 𝒫 ∩ {P | P(A) = 1} and evaluating lower bounds P∗(B) = inf{P(B) | P ∈ 𝒫_A^+} and upper bounds P*(B) = sup{P(B) | P ∈ 𝒫_A^+} for B ⊆ A. This scheme works only if 𝒫_A^+ ≠ ∅. In that case P*(A) = 1 and P*(B|A) = sup{P(B) | P ∈ 𝒫_A^+} indeed. If P*(A) < 1, the constraint P(A) = 1 is inconsistent with 𝒫, and one can envisage a maximum likelihood revision as described above, keeping only the probability functions such that the probability P(A) of the conditioning event is maximal. These probabilities are in some sense the "most plausible ones" in view of the new information P(A) = 1 [33]. Note that the closed form of the maximum likelihood revision rule P*(B|A) = P*(B ∩ A)/P*(A) is not recovered for all kinds of upper envelopes, and generally not when 𝒫 = {P | P(B_i|A_i) ∈ Q_i, i = 1, …, n} where Q_i is an interval, because the induced lower bounds are not 2-monotone; see [34] for a systematic study of belief revision with convex probability sets.
3.2. The focusing rule: Question answering via Bayesian conditioning

On the contrary, the other form of conditioning is good for answering "what if" types of questions, on the basis of uncertain knowledge and of evidence pertaining to a particular situation; see again [23]. The computation of Pl_A(B) can be achieved via sensitivity analysis provided that Bel(A) > 0: it is the upper bound of the set of conditional probabilities obtained by conditioning all probabilities dominated by the set-function Pl (i.e., Pl_A(B) = sup{P(B|A) | P ≥ Bel}). The latter definition is thus well in accordance with the setting of upper and lower envelopes [41], but the set-function Pl_A(B) is still a plausibility function [24, 29].
This conditioning is called Bayesian conditioning since it relies on usual conditioning for the probabilities P ≥ Bel.¹ In the general case, Bayesian conditioning of a set 𝒫 of probabilities comes down to computing the range of P(B|A) for P ∈ 𝒫, provided that P(A) > 0 for all P ∈ 𝒫. This range generally differs from, and contains, the range [P∗(B|A), P*(B|A)] obtained with the maximum likelihood rule (e.g., [30]). The closed form for Bel_A(B) holds again for more general lower envelopes, namely those which are monotone of order 2 (supermodular in the terminology of Denneberg [9]). Pl_A(B) evaluates the plausibility (or upper probability) of the truth of B in situations where A is true. But it is not assumed that the plausibility of "not A" is zero in that case. Indeed it is not assumed that P(A) = 1 when computing the bounds of P(B|A). On the contrary, with Dempster's rule, Pl(B|A) evaluates the plausibility of B "now that we know that Ā is impossible". Thus, Dempster's conditioning is a revision rule, while the other form of conditioning can be called a focusing rule. Revision modifies a body of uncertain a priori knowledge (by entering as a new constraint the fact that one event has become impossible), while focusing operates a simple change of reference class from U to A (in the sense of Kyburg) without altering the original knowledge ("focusing on class A"). To see it, consider the upper and lower probability system defined by probability bounds. Such bounds can be, as said earlier, induced by numerically quantified conditional sentences (e.g., [35]), i.e., 𝒫 = {P | P(B_i|A_i) ∈ Q_i, i = 1, …, n} where Q_i is an interval. Such a system represents generic knowledge about A_i's which are B_i's probably; note that we may have A_i = U for some i in case of unconditional pieces of knowledge.
Example. Consider the following small knowledge base: P(young | student) ≥ 0.9, P(single | young) ≥ 0.7. The focusing rule solves the following problem: Tom is a student; what is the probability that he is single? It can be checked that P(single | student) is totally unknown: P∗(single | student) = 0, P*(single | student) = 1. In particular, nothing is known about P(student). The revision rule consists in entering the new piece of knowledge P(student) = 1, which expresses the fact that the knowledge base applies only to a population of students. Then it can be deduced that P(young) = P(young | student) ≥ 0.9 and P(single) = P(single | student) ≥ P(single | young) · P(young) ≥ 0.63. However, it is a mistake to model "Tom is a student" by P(student) = 1, since the former statement applies to Tom only while P(student) = 1 refers to the whole population under concern. Another example illustrating the difference between focusing and revision can be given in the setting of belief functions: a die has been thrown by a player but he does not know the outcome yet. The player expects a rather high number (5 or 6), does not believe too much in a medium number (3, 4), and almost rules out the possibility of getting a small number (1, 2). He assigns a priori probability masses m({5, 6}) = 0.7, m({3, 4}) = 0.2 and m({1, 2}) = 0.1 to these focal elements. If he asks himself what if the outcome were not a six, then he has to focus on the situations where he would get no 6 and compute Bel_A(B), Pl_A(B) for A = {1, 2, 3, 4, 5} and B ⊆ {1, 2, …, 5}, where Bel(B) = Σ_{E⊆B} m(E). For instance, Bel_A({5}) = 0 = Bel({5}),
¹ Rigorously, this conditioning should be called "robust Bayesian conditioning" since upper and lower probabilities are used in robust statistics. Bayesian conditioning applies, stricto sensu, to probability distributions only, and Dempster (or maximum likelihood) conditioning also extends probabilistic Bayesian conditioning. But we keep the term Bayesian conditioning for the focusing rule, as a simplification.
Pl_A({5}) = 0.7/(0.7 + 0.3) = 0.7 = Pl({5}).

This is because he cannot rule out the situation where his belief in outcome 5 precisely would be zero. But if a friend tells him that the outcome is not a 6, then by Dempster's conditioning one gets Bel({5}|A) = 0.7. What happens is that, insofar as
outcome 5 receives all the mass, 6 is now definitely ruled out.
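The die example can be run end to end. The sketch below implements both conditionings for a finite mass assignment; the function names and the encoding with frozensets are ours, not the paper's.

```python
from fractions import Fraction as F

# Mass assignment of the die example: m({5,6}) = 0.7, m({3,4}) = 0.2, m({1,2}) = 0.1.
m = {frozenset({5, 6}): F(7, 10),
     frozenset({3, 4}): F(2, 10),
     frozenset({1, 2}): F(1, 10)}

def bel(B, masses=m):
    """Bel(B) = sum of masses of focal sets included in B."""
    return sum((w for E, w in masses.items() if E <= frozenset(B)), F(0))

def pl(B, masses=m):
    """Pl(B) = sum of masses of focal sets intersecting B."""
    return sum((w for E, w in masses.items() if E & frozenset(B)), F(0))

def focus_pl(B, A):
    """Focusing (Fagin-Halpern): Pl_A(B) = Pl(B∩A) / (Pl(B∩A) + Bel(A∩not-B))."""
    num = pl(set(B) & set(A))
    return num / (num + bel(set(A) - set(B)))

def dempster(A, masses=m):
    """Dempster revision: intersect every focal set with A and renormalize."""
    new = {}
    for E, w in masses.items():
        EA = E & frozenset(A)
        if EA:
            new[EA] = new.get(EA, F(0)) + w
    total = sum(new.values())
    return {E: w / total for E, w in new.items()}

A = {1, 2, 3, 4, 5}
# Focusing leaves the evaluation of outcome 5 unchanged (Pl_A = Pl, Bel_A = Bel = 0)...
assert focus_pl({5}, A) == pl({5}) == F(7, 10)
assert bel({5}) == 0
# ...while Dempster revision transfers the whole mass of {5,6} onto {5}:
assert bel({5}, dempster(A)) == F(7, 10)
```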
3.3. Conditioning in Dempster theory

In Dempster's approach to belief function theory, a finite probability space (Ω, P) is projected to a set U through a multiple-valued mapping Γ, thus inducing belief and plausibility functions on U, such that

Bel(B) = P(B∗)/P(U*),   ∀B ⊆ U, and
Pl(B) = P(B*)/P(U*),

where B∗ = {ω | Γ(ω) ⊆ B}, B* = {ω | Γ(ω) ∩ B ≠ ∅} are the lower and upper images of B via Γ. Revision by a constraint A ⊆ U comes down to modifying the multiple-valued mapping Γ: Ω → 2^U into Γ_A: Ω → 2^A such that Γ_A(ω) = Γ(ω) ∩ A, and normalizing the result. Denoting B∗_A and B*_A the lower and upper images of B via Γ_A yields

Pl(B|A) = P(B*_A)/P(A*_A) = P((B ∩ A)*)/P(A*),

since B*_A = (B ∩ A)* = {ω | Γ(ω) ∩ A ∩ B ≠ ∅} as defined above. On the contrary, focusing on A ⊆ U comes down to envisaging all the possible inverse images of A through mappings f: Ω → U compatible with Γ, letting

Γ⁻¹(A) = {f⁻¹(A) | f ∈ Γ},

where f ∈ Γ means ∀ω, f(ω) ∈ Γ(ω), and to conditioning P on all possible such f⁻¹(A) in Ω. Noticing that Γ⁻¹(A) = {C | A∗ ⊆ C ⊆ A*}, it can be proved that [19]:

Theorem 1. Pl_A(B) = max_{A∗ ⊆ C ⊆ A*} P(B*|C).

Proof. Let 𝒫 = {P' | P' ≥ Bel}, where P' is a probability function on U. Let f ∈ Γ and P_f be the probability measure on U induced by P, the probability measure on Ω, via f, i.e., P_f(B) = P(f⁻¹(B)), ∀B ⊆ U. Let 𝒫_Γ = {P_f | f ∈ Γ}. It is well known that 𝒫 is the convex hull of 𝒫_Γ [8]. Now,

Pl_A(B) = sup{P'(B|A), P' ∈ 𝒫}
= max{P'(B|A), P' ∈ 𝒫_Γ}, since 𝒫 is a convex polyhedron,
= max{P(f⁻¹(B)|f⁻¹(A)), f ∈ Γ}, since f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B).

Besides, f⁻¹(B) ⊆ B* and A∗ ⊆ f⁻¹(A) ⊆ A* by the definition of A∗, A*. Hence, Pl_A(B) ≤ max{P(B*|C), A∗ ⊆ C ⊆ A*}. What is left to prove is that for all C, B* and C can be put under the form f⁻¹(B) and f⁻¹(A), respectively. To see it, note that in order to verify f⁻¹(A) = C, what is requested is that for all ω ∈ C, f(ω) ∈ A, and for all ω ∉ C, f(ω) ∉ A. This is possible since for all ω ∈ C, Γ(ω) ∩ A ≠ ∅ because C ⊆ A*, and for all ω ∉ C, Γ(ω) ⊄ A since A∗ ⊆ C. The condition f⁻¹(A) = C is thus verified for any f ∈ Γ_C, where Γ_C(ω) = Γ(ω) ∩ A for all ω ∈ C, and Γ_C(ω) = Γ(ω) ∩ Ā if ω ∉ C. The condition f⁻¹(B) = B* can be simply verified by forcing f(ω) ∈ B for all ω ∈ B*. Now if ω ∈ B* ∩ C, it is possible to let f(ω) ∈ B ∩ A since Γ(ω) ∩ B ∩ A ≠ ∅, and f(ω) ∈ Γ_C(ω) since Γ_C(ω) ∩ B = Γ(ω) ∩ B ∩ A. If ω ∈ B* ∩ C̄, we let f(ω) ∈ B ∩ Ā, and f ∈ Γ_C is again possible. Hence, for all C, there is f ∈ Γ such that

P(B* ∩ C)/P(C) = P(f⁻¹(B) ∩ f⁻¹(A))/P(f⁻¹(A)),

that is, the supremum of P(B*|C) is attained at Pl_A(B). Note that choosing C = ∅ is possible, when A∗ = ∅, and then it is possible to choose f such that f⁻¹(A ∩ B) = f⁻¹(A) = {ω̂} for some ω̂ ∈ Ω. To see it, notice that if A ∩ B ≠ ∅, then A* ∩ B* ≠ ∅. Let ω̂ ∈ A* ∩ B* and force f(ω̂) ∈ A ∩ B. Now, for all ω ≠ ω̂ it is possible to let f(ω) ∉ A since A∗ = ∅. In that case f⁻¹(A) = {ω̂}, Pl_A(B) = 1, and the maximum of P(B*|C) is attained for C = {ω̂}. □

This proof is published in [19] with many typographical errors and is recalled here for the sake of completeness. In other words, focusing on A ⊆ U comes down to conditioning on an ill-known event in Ω, due to the imprecision expressed via the multiple-valued mapping. On the contrary, Dempster's conditioning rule assumes that C = B* in the above theorem. This result may help in the practical computation of the focusing rule since it comes down to maximizing over a finite set (Ω is assumed to be finite).
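Theorem 1 can be checked by brute force on the die example of Section 3.2; the encoding of Ω, Γ and P below is ours. Both sides of the theorem are maximized by exhaustive enumeration.

```python
from itertools import product, combinations
from fractions import Fraction as F

# Dempster setup for the die example: P on Omega, focal sets Gamma(w).
P = {"w1": F(7, 10), "w2": F(2, 10), "w3": F(1, 10)}
Gamma = {"w1": {5, 6}, "w2": {3, 4}, "w3": {1, 2}}
A, B = {1, 2, 3, 4, 5}, {5}
omegas = sorted(Gamma)

# Left-hand side: maximize P(f^{-1}(B) | f^{-1}(A)) over all selections f in Gamma.
best_f = F(0)
for choice in product(*[sorted(Gamma[w]) for w in omegas]):
    f = dict(zip(omegas, choice))
    fA = [w for w in omegas if f[w] in A]
    pA = sum((P[w] for w in fA), F(0))
    if pA > 0:
        best_f = max(best_f, sum((P[w] for w in fA if f[w] in B), F(0)) / pA)

# Right-hand side: maximize P(B^*|C) over all C with A_* ⊆ C ⊆ A^*.
A_low = {w for w in omegas if Gamma[w] <= A}   # A_* : Gamma(w) ⊆ A
A_up = {w for w in omegas if Gamma[w] & A}     # A^* : Gamma(w) ∩ A ≠ ∅
B_up = {w for w in omegas if Gamma[w] & B}     # B^*
best_C = F(0)
free = sorted(A_up - A_low)
for k in range(len(free) + 1):
    for extra in combinations(free, k):
        C = A_low | set(extra)
        pC = sum((P[w] for w in C), F(0))
        if pC > 0:
            best_C = max(best_C, sum((P[w] for w in C & B_up), F(0)) / pC)

# Both maxima agree with Pl_A({5}) = 0.7 from the example.
assert best_f == best_C == F(7, 10)
```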
4. Bayesian conditioning in possibility theory
Since numerical possibility measures are particular cases of plausibility functions that are max-decomposable, it makes sense to envisage possibility degrees as upper bounds of probabilities and to apply Bayesian conditioning to possibility and necessity functions:

∀B, Π_A(B) = Π(B ∩ A) / (Π(B ∩ A) + N(B̄ ∩ A)),   (1)

N_A(B) = N(B ∩ A) / (N(B ∩ A) + Π(B̄ ∩ A)).   (2)

When N(A) > 0, (1) and (2) do correspond to Bayesian conditioning, namely Π_A(B) = sup{P(B|A) | P ∈ 𝒫(Π)} and N_A(B) = inf{P(B|A) | P ∈ 𝒫(Π)}, where 𝒫(Π) = {P | P(B) ≤ Π(B), ∀B ⊆ U}. The following result, which is one of the main contributions of this paper, shows that despite its strong probabilistic flavor, Bayesian conditioning preserves the characteristic property of possibility measures.

Theorem 2. Given a possibility measure Π on a set U, the conditional set-functions Π_A and N_A still define a possibility measure and a necessity measure respectively, with possibility distribution

π_A(u) = max(π(u), π(u)/(π(u) + N(A)))   if u ∈ A,
π_A(u) = 0   if u ∉ A.   (3)

The proof of this result, first claimed (without proof) in [17, 18], was published in the finite case in [22], and is here proved for σ-maxitive possibility measures (see Appendix). A similar result was independently proved in [24, 29] for belief functions: conditional belief functions in the Bayesian sense are still belief functions; see also [35] for a textbook containing this proof. This result was also recently published by Walley [42] for possibility measures, who uses an alternative, shorter proof based on the following lemma.

Lemma 1.

Π_A(B) = max(Π(A ∩ B), Π(A ∩ B)/(Π(A ∩ B) + N(A))).

Proof.

Π_A(B) = Π(A ∩ B) / (Π(A ∩ B) + min(N(A), N(B̄)))
= max(Π(A ∩ B)/(Π(A ∩ B) + N(B̄)), Π(A ∩ B)/(Π(A ∩ B) + N(A))).

Now N(B̄) = min(N(B̄ ∪ A), N(B̄ ∪ Ā)), hence Π(A ∩ B) + N(B̄) = min(Π(A ∩ B) + N(B̄ ∪ A), Π(A ∩ B) + N(B̄ ∪ Ā)). Noticing that Π(A ∩ B) + N(B̄ ∪ Ā) = 1 and N(B̄ ∪ A) ≥ N(A) yields the result, since

Π(A ∩ B)/(Π(A ∩ B) + N(B̄)) = max(Π(A ∩ B), Π(A ∩ B)/(Π(A ∩ B) + N(B̄ ∪ A))). □
So the max- (and even the sup-) decomposability of Π_A follows from the continuity and increasingness of the mapping f(x) = max(x, x/(x + N(A))). The fact that Π_A is still a possibility measure means that the conditioning of a family of probability measures induced by lower bounds bearing on nested sets can be faithfully performed on the equivalent representation in terms of a possibility distribution, and that the computed bounds on events delimit a family of probability measures of the same kind. How π_A is modified with respect to π is pictured in Fig. 1. From (3), we get

π_A(u) = π(u)   if u ∈ A and π(u) ≥ 1 − N(A) = Π(Ā),
= π(u)/(π(u) + N(A))   if u ∈ A and π(u) ≤ Π(Ā),
= 0   if u ∉ A.
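Theorem 2 and Eq. (3) lend themselves to a numerical check. The distribution and the conditioning event below are hypothetical; we verify that Π_A computed from Eq. (1) coincides on every event with the maximum of π_A (hence Π_A is maxitive), and, anticipating Dempster's rule of conditioning (Eq. (5) below), that focusing is never more specific than revision.

```python
from itertools import combinations

# Hypothetical possibility distribution and conditioning event.
pi = {"a": 1.0, "b": 0.8, "c": 0.5, "d": 0.3}
U = set(pi)
A = {"a", "c", "d"}

def Pi(event):
    return max((pi[u] for u in event), default=0.0)

def N(event):
    return 1.0 - Pi(U - set(event))

def Pi_A(B):
    """Bayesian (focusing) conditioning, Eq. (1)."""
    num = Pi(set(B) & A)
    return num / (num + N(A - set(B))) if num > 0 else 0.0

def pi_A(u):
    """Conditional possibility distribution of Theorem 2, Eq. (3)."""
    return max(pi[u], pi[u] / (pi[u] + N(A))) if u in A else 0.0

# Pi_A is the max of pi_A on every event, hence still a possibility measure:
for r in range(1, len(U) + 1):
    for B in combinations(sorted(U), r):
        assert abs(Pi_A(B) - max(pi_A(u) for u in B)) < 1e-9

# Dempster revision (Eq. (5)): pi(u|A) = pi(u)/Pi(A) on A; focusing dominates it:
for u in U:
    demp = pi[u] / Pi(A) if u in A else 0.0
    assert pi_A(u) >= demp - 1e-9
```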
Unsurprisingly, the support of π_A is included in A, as expected. In the finite case, let B_1 ⊆ ⋯ ⊆ B_n be the β-cuts of π, i.e., π(u) = β_1 = 1 if u ∈ B_1, π(u) = β_i if u ∈ B_i\B_{i−1}, i = 2, …, n. Let B_k be such
Fig. 1.
that B_{k−1} ⊆ A and B_k ∩ Ā ≠ ∅. Then Π(Ā) = β_k. Hence, if u ∈ B_k, π_A(u) = π(u). Only the weights of the β-cuts B_j ∩ A of π_A are modified with respect to the ones of the β-cuts B_j of π, for j = k + 1, …, n. Indeed, if u ∈ A\B_k, then π(u) = β_j for some j > k and π_A(u) = β_j/(β_j + 1 − β_k). This quantity lies in [β_j, β_k] since β_j ≤ β_k on the one hand, and β_j/(β_j + 1 − β_k) increases from 0 (β_j = 0) to β_k (β_j = β_k) when β_j increases. So, π_A(u) differs from π(u) only on A\B_k (and on Ā where π_A ≡ 0); see Fig. 1. The following properties are easy to check and are expected as characterizing a focusing operation:
• if N(A) = 0, π_A(u) = 1, ∀u ∈ A (total ignorance inside A when A is not an accepted belief, i.e., when N(A) > 0 does not hold). This means that if nothing is known about A, focusing on A does not bring any information. However, this behavior of Eq. (1) cannot be justified in terms of Bayesian conditioning since P(A) = 0 may occur;
• N_A(B) = N_A(A ∩ B) = N_A(Ā ∪ B) (same property for Π_A). It means that when we focus on A, inquiring on B, on A ∩ B or on Ā ∪ B are the same. This is due to the fact that, like in probability theory, N_A(B) is the certainty degree of the conditional event B|A [27], viewed as the family {X | B ∩ A ⊆ X ⊆ Ā ∪ B}. Indeed, N_A(B) only depends on N(A ∩ B) and N(Ā ∪ B).
Other, more usual forms of conditioning exist in possibility theory. In case of a numerical possibility scale, we can also use Dempster's rule of conditioning, specialized to possibility measures, i.e., the consonant plausibility measures of Shafer [36]. It leads
to the definition:

∀B, B ∩ A ≠ ∅, Π(B|A) = Π(A ∩ B)/Π(A),   (4)

provided that Π(A) ≠ 0. The conditional necessity function is defined by N(B|A) = 1 − Π(B̄|A), by duality. The corresponding revised possibility distribution is

π(u|A) = π(u)/Π(A)   ∀u ∈ A,
= 0   otherwise.   (5)

Note that π_A ≥ π(·|A). Indeed, if N(A) = 0 then π_A = 1 on A, and if N(A) > 0, π_A ≥ π = π(·|A) on A since Π(A) = 1. It points out that focusing is not to be confused with revision: focusing does not bring new information, since we lose information when focusing on some ill-informed subsets (since π_A ≥ π on A), and ignorance is preserved. On the contrary, revision makes information more precise. The notion of conditional possibility measure goes back to Hisdal [28], who introduces the set function Π(·|A) through the equality
∀B, B ∩ A ≠ ∅, Π(A ∩ B) = min(Π(B|A), Π(A)).   (6)

This equation is similar to the one yielding (4), changing min into product. However, (6) makes sense for qualitative possibility measures that take values on an ordinal bounded totally ordered scale
L whose maximal element is denoted 1 and minimal element denoted 0. This scale may be finite. The ordinal conditional possibility measure Π(B|°A) is often defined (see [12]) as the least specific solution to (6), the one which allocates the greatest allowed possibility degrees, that is, when Π(A) > 0:

Π(B|°A) = 1   if Π(A ∩ B) = Π(A) > 0,
= Π(A ∩ B)   otherwise.   (7)

The ordinal conditional necessity function is defined by N(B|°A) = n(Π(B̄|°A)), where n is an order-reversing function on L with n(1) = 0 and n(0) = 1. The possibility distribution associated with (7) is given by

π(u|°A) = 1   if π(u) = Π(A), u ∈ A,
= π(u)   if π(u) < Π(A), u ∈ A,
= 0   if u ∉ A.   (8)

If Π(A) = 0 then π(·|°A) is still a solution to (6) and is equal to μ_A, the {0, 1}-valued characteristic function of A. In this case, μ_A is simply substituted for π. Adopting the above ordinal definition of conditioning creates some technical problems in the infinite case due to the lack of continuity of Gödel implication, since Π(B|°A) = Π(A) → Π(A ∩ B), where a → b = 1 if a ≤ b. Indeed, Π(B|°A) = sup_{u∈B} π(u|°A) (σ-maxitivity) may then fail to hold for non-compact events B. The restriction to compact events is no longer necessary with the product-based view of conditioning [3]. Note that for both Dempster and ordinal conditionings, N(B|A) > 0 (resp. N(B|°A) > 0) ⟺ Π(A ∩ B) > Π(A ∩ B̄), which expresses that B is somewhat certain in the context A if and only if B is more possible than B̄ when A is true.

5. Modelling rules with exceptions

Conditional possibility in the sense of (7) or (4) plays an important role for modelling default rules of the form "A's are B's generally" (see, e.g., [2]). The usual meaning of such a rule is statistical, that is, "most A's are B's", which comes down to expressing a restriction on the probability P(B|A), usually a lower bound (as first studied by George Boole himself) or a fuzzy quantifier [44]. An alternative meaning of the rule can be that in the reference class A, a situation where B occurs is more normal than a situation where its contrary occurs. This idea can be expressed in possibility theory, as suggested above:
- either in a purely qualitative setting, using the qualitative conditioning (7), by the constraint N(B|°A) > 0, which is equivalently expressed as

Π(A ∩ B) > Π(A ∩ B̄);   (9)

- or in a numerical setting, using Dempster rule (4) and a numerical weight α ≠ 1, i.e.,

N(B|A) ≥ α > 0 ⟺ Π(A ∩ B) ≥ (1/(1 − α)) · Π(A ∩ B̄).   (10)

Indeed,

N(B|A) ≥ α ⟺ 1 − α ≥ Π(B̄|A) = Π(A ∩ B̄)/max(Π(A ∩ B), Π(A ∩ B̄)) = min(1, Π(A ∩ B̄)/Π(A ∩ B)).

Each of (9) or (10) expresses that the situation where A ∩ B is true is strictly more possible than the situation where A ∩ B̄ is true, which agrees with our intuitive understanding of a default rule. The qualitative approach (9) does not use any idea of strength of the conclusion B in the presence of A, since N(B|°A) > 0 expresses only the acceptance of B in the reference class A. The interpretation of N(B|°A) > 0 in terms of acceptance is supported by the fact that the set of events {B | N(B|°A) > 0} is deductively closed (see [21]). On the contrary, (10) introduces a strength level α for the rule. Noticing that n = 1/(1 − α) > 1 whenever α > 0, this type of rule comes down to claiming that the situation A ∩ B is n times more plausible than A ∩ B̄. Using the principle of minimal specificity [14], the least informative possibility distribution
Using the principle of minimal specificity [14], the least informative possibility distribution such that Π(A∩B) ≥ (1/(1−α))·Π(A∩B̄) is

π*(ω) = 1 if ω ∈ Ā∪B,
      = 1−α otherwise,    (11)

i.e., the rule is viewed as an uncertain material implication (the latter is recovered if α = 1). The set of possibility measures compatible with the default rule is described by the possibility distributions π such that π ≤ π*.

The possibilistic modelling of a default rule in terms of focusing is motivated by Theorem 2, which establishes that the focusing rule is internal to the possibility calculus. It leads to another type of constraint, namely, for α ≠ 1,

N_A(B) ≥ α ⟺ (1−α)·N(A∩B) ≥ α·Π(A∩B̄)
        ⟺ 1−α ≥ max(Π(A∩B̄), α·Π(A∩B̄) + (1−α)·Π(Ā)).    (12)

The second equivalence is easily obtained by noticing that N(A∩B) = 1 − Π(Ā∪B̄) = 1 − Π(Ā∪(A∩B̄)). N_A(B) ≥ α is a stronger constraint than N(B|A) ≥ α, since N_A(B) ≤ N(B|A) (this inequality holds for belief functions as well). Hence, any π such that N_A(B) ≥ α is such that π ≤ π*. The constraint (12) is also more complex than (9), since it can be split into two more elementary constraints on Π(A∩B̄):

Π(A∩B̄) ≤ 1−α,
Π(A∩B̄) ≤ ((1−α)/α)·(1 − Π(Ā)) = ((1−α)/α)·N(A).

Which one of these constraints is effective depends on whether we assume that N(A) < α or not. In the case of a single rule, the least specific possibility distribution satisfying N_A(B) ≥ α does not generally exist, as opposed to other forms of constraints like (9) and (10). This has been rightly noticed by Maung [31]. To see it, notice that the greater the value of Π(A∩B̄), the smaller the possible values for Π(Ā) in (12). Let x = Π(A∩B̄), where x ∈ [0, 1−α]. When x is fixed, the least specific possibility distribution exists and is defined by

π_x(ω) = 1 if ω ∈ A∩B,
       = x if ω ∈ A∩B̄,
       = 1 − αx/(1−α) if ω ∈ Ā.    (13)

Clearly, if x ≠ y ∈ [0, 1−α], then neither π_x ≤ π_y nor π_y ≤ π_x holds. Hence, the following result is obtained.

Theorem 3. The maximal elements of the set {π | N_A(B) ≥ α > 0}, for the ordering π ≥ π′, form the set {π_x | x ∈ [0, 1−α]}, where π_x is defined by (13).

For x = 0, one recovers π₀ = max(1 − μ_A, μ_B), i.e., the material implication. For x = 1−α,

π_{1−α}(ω) = 1 if ω ∈ A∩B,
           = 1−α otherwise,

which is more specific than π*, the least specific possibility distribution obeying (10). In fact, for instance, as soon as Π(A∩B̄) > 0, Π(Ā) < 1 is enforced. Then Π(A∩B) must be 1 so as to preserve normalization. So the possibility that the rule does not apply (Π(Ā)) is constrained, while this restriction does not exist for (9) and (10).

It is interesting to compare the constraint N_A(B) ≥ α with the probabilistic constraint P(B|A) ≥ α. Namely, let 𝒫 = {P | P(B|A) ≥ α} be the set of probabilities induced by the latter, and 𝒞 = {π | N_A(B) ≥ α} be the set of possibility distributions induced by N_A(B) ≥ α. Clearly, P(B|A) ≥ α is equivalent to P(A∩B) ≥ α·(P(A∩B) + P(A∩B̄)), and then to (1−α)·P(A∩B) ≥ α·P(A∩B̄). The set 𝒫 can thus be parametrized with two parameters y and t such that

P(A∩B) = y, with y ∈ [0, 1],
P(A∩B̄) = ((1−α)/α)·y − t, with t ≥ 0,
P(Ā) = 1 + t − y/α,

where t and y are subject to constraints ensuring that 0 ≤ P(A) = y/α − t ≤ 1.
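The family π_x of (13) and its focusing behaviour can be illustrated with a small numerical sketch (our own; the three-block encoding and helper names are assumptions, not the paper's):

```python
# Illustrative sketch of the family pi_x of (13) on the three-block
# partition {A∩B, A∩notB, notA}; names and the sample alpha are ours.
alpha = 0.6

def pi_x(x):
    assert 0 <= x <= 1 - alpha
    return {'AB': 1.0, 'AnotB': x, 'notA': 1 - alpha * x / (1 - alpha)}

def focused_necessity(p):
    """N_A(B) = N(A∩B) / (N(A∩B) + Pi(A∩notB)), with N(A∩B) = 1 - Pi((A∩B)^c)."""
    n_ab = 1 - max(p['AnotB'], p['notA'])
    denom = n_ab + p['AnotB']
    return n_ab / denom if denom > 0 else 1.0

# every member of the family reaches the focusing bound N_A(B) = alpha ...
for x in (0.05, 0.2, 0.4):
    assert abs(focused_necessity(pi_x(x)) - alpha) < 1e-9

# ... and two distinct members are incomparable (Theorem 3):
p, q = pi_x(0.1), pi_x(0.3)
assert p['AnotB'] < q['AnotB'] and p['notA'] > q['notA']
```

This mirrors the non-existence of a single least specific solution: raising the possibility of A∩B̄ forces the possibility of Ā down, and conversely.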
It is easy to see that P*(A∩B) = sup{P(A∩B) | P ∈ 𝒫} = 1 and P*(Ā) = sup{P(Ā) | P ∈ 𝒫} = 1. Moreover, P*(A∩B̄) = sup{P(A∩B̄) | P ∈ 𝒫} = 1−α. Indeed, enforcing P(Ā) = 0 yields P(A∩B̄) = 1−y and t = y/α − 1, which implies y ≥ α since t ≥ 0. Hence, P(A∩B̄) is at most 1−α. It is obvious that the same bounds are obtained using Π*(X) = sup{Π(X) | π ∈ 𝒞} for X = A∩B, A∩B̄, Ā, and that Π* is the possibility measure induced by π* as defined by (11) (the only maximal element of {π | N(B|A) ≥ α}). Let π_x be a maximal element of 𝒞 = {π | N_A(B) ≥ α} = ∪_{x∈[0,1−α]} {π | π ≤ π_x}, and let P be such that P ≥ N_x, where N_x is the necessity measure induced by π_x. Then it does hold that P ≥ N_x implies P ∈ 𝒫. Indeed, P ≥ N_x implies that

P(A∩B̄) ≤ Π_x(A∩B̄) = x ≤ 1−α,
P(Ā) ≤ Π_x(Ā) = 1 − αx/(1−α),
P(A∩B) ≥ 1 − Π_x(Ā∪B̄) = 1 − max(Π_x(A∩B̄), Π_x(Ā)) = 1 − max(x, 1 − αx/(1−α)) = αx/(1−α),

since x ≤ 1 − αx/(1−α) when x ≤ 1−α. So P ≥ N_x implies P(A∩B) ≥ αx/(1−α) and P(A∩B̄) ≤ x, which implies α·P(A∩B̄) ≤ (1−α)·P(A∩B), which is equivalent to P(B|A) ≥ α. Conversely, let P ∈ 𝒫, i.e., such that P(B|A) ≥ α. We may try to see if there is π ∈ 𝒞 such that P ≤ Π. The safest choice is π = π_x for some x ∈ [0, 1−α]. For x = 0, Π ≥ P for any P such that P(A∩B̄) = 0. For x > 0, it is enough to check the inequality for A∩B̄ and Ā∪B̄, since Π_x(A∪B) = 1, Π_x(Ā) = 1 − αx/(1−α) ≥ Π_x(A∩B̄) = x > 0, and for x ≤ 1−α we have Π_x(Ā∪B̄) = max(Π_x(Ā), Π_x(A∩B̄)) = Π_x(Ā). Using the above parametered representation of 𝒫, P(A∩B̄) = ((1−α)/α)·y − t, and we let x = ((1−α)/α)·y − t, so that P(A∩B̄) = Π_x(A∩B̄). This is possible since ((1−α)/α)·y − t ≤ 1−α ⟺ (1−α)·(y/α − 1) ≤ t, which holds due to y/α − 1 ≤ t. Now we have P(Ā∪B̄) = 1 − y ≤ Π_x(Ā∪B̄) = 1 − α·(((1−α)/α)·y − t)/(1−α) = 1 − y + αt/(1−α). Hence, we have proved the following result.

Theorem 4. The set of probability measures 𝒫 = {P | P(B|A) ≥ α} induced by a conditional probability lower bound is equivalently characterized by 𝒫 = ∪_{x∈[0,1−α]} {P | P ≤ Π_x}, and is contained in {P | P ≤ Π*}, where Π* is the least specific possibility measure such that N(B|A) ≥ α.

Hence, the constraint N_A(B) ≥ α (as inducing probability bounds) completely agrees with the constraint P(B|A) ≥ α, which is in turn tighter than the constraint N(B|A) ≥ α.

6. Reasoning with Bayesian possibilistic conditionals

In previous papers, the qualitative modelling of default rules via (9) has been studied at length (e.g., [2, 20]). Modelling default rules using (10) is also investigated by Goldszmidt and Pearl [26] by means of non-negative integers κ(u) evaluating degrees of impossibility of u (such that π(u) = 2^(−κ(u))), following a proposal by Spohn [38]. It is also possible to envisage the modelling of a set of uncertain rules Δ = {(A_i → B_i, α_i) | i = 1,…,n} as a set of constraints of the form suggested by Maung [31]:

N_{A_i}(B_i) ≥ α_i,  i = 1,…,n.
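The inclusion stated in Theorem 4 can be spot-checked numerically; the sketch below is ours (illustrative α, three-block encoding, and the proof's choice x = P(A∩B̄)): it samples probabilities satisfying P(B|A) ≥ α and verifies domination by some Π_x.

```python
# Numerical spot-check (our sketch) of Theorem 4: any P with P(B|A) >= alpha
# is dominated by some Pi_x, x in [0, 1-alpha], on the partition {A∩B, A∩notB, notA}.
import random

random.seed(0)
alpha = 0.6

def dominated(P, x):
    """P <= Pi_x on the events of the 3-block algebra, checked via the binding
    bounds Pi_x(A∩notB) = x and Pi_x(notA) = 1 - alpha*x/(1-alpha)."""
    p_ab, p_anb, p_na = P
    top = 1 - alpha * x / (1 - alpha)
    return (p_anb <= x + 1e-12
            and p_na <= top + 1e-12
            and p_anb + p_na <= max(x, top) + 1e-12)

for _ in range(1000):
    # sample a probability on the three blocks with P(B|A) >= alpha
    p_ab = random.uniform(0, 1)
    p_anb = random.uniform(0, min(1 - p_ab, (1 - alpha) * p_ab / alpha))
    p_na = 1 - p_ab - p_anb
    # the safest dominating member uses x = P(A∩notB), as in the proof
    assert dominated((p_ab, p_anb, p_na), p_anb)
```

Note that the sampling construction guarantees P(A∩B̄) ≤ 1−α, so the chosen x always lies in the admissible interval.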
The question-answering problem is to compute the lower bound of N_E(C) for any pair of events (E, C). It aims at computing the degree of certainty of event C on the basis of the evidence E and of the knowledge Δ; see [1, 39, 40] for the same problem in terms of conditional probabilities, and [19] in terms of belief functions. Solving the above question-answering problem in full generality is beyond the scope of this paper. We shall consider a simple but typical example, which we call the Bayesian possibilistic syllogism (BΠS), the possibilistic counterpart of the probabilistic syllogism:

N_A(B) ≥ α
N_B(A) ≥ β
N_B(C) ≥ γ
N_C(B) ≥ δ
――――――――――
N_A(C) ≥ x
N_C(A) ≥ y    (BΠS)

where the problem is to compute the least values of the bounds x and y; see [10, 40] for an extensive
study of this syllogism with conditional probabilities. This reasoning pattern is similar to the cut rule in logic (also called transitivity), since the first line describes a reasoning link between A and B, the second one a link between B and C, and the syllogism cancels B. (BΠS) can be rewritten using (12) as follows:

1−α ≥ Π(A∩B̄),
1−β ≥ Π(Ā∩B),
1−γ ≥ Π(B∩C̄),
1−δ ≥ Π(B̄∩C),
1−α ≥ α·Π(A∩B̄) + (1−α)·Π(Ā),
1−β ≥ β·Π(Ā∩B) + (1−β)·Π(B̄),
1−γ ≥ γ·Π(B∩C̄) + (1−γ)·Π(B̄),
1−δ ≥ δ·Π(B̄∩C) + (1−δ)·Π(C̄).    (14)
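For concreteness, the system (14), written on the atom possibilities x₁,…,x₈ as in Appendix B, can be encoded as a simple feasibility test (an illustrative sketch; the function name and the sample weights are ours):

```python
# Feasibility test for the constraints (14) on the atom possibilities
# x1..x8 (numbering as in Appendix B); an illustrative sketch, ours.
def satisfies_14(x, a, b, g, d):
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    return all((
        1 - a >= max(x6, x2),
        1 - b >= max(x7, x3),
        1 - g >= max(x4, x7),
        1 - d >= max(x2, x5),
        1 - a >= a * max(x6, x2) + (1 - a) * max(x3, x7, x5, x8),
        1 - b >= b * max(x3, x7) + (1 - b) * max(x2, x6, x5, x8),
        1 - g >= g * max(x4, x7) + (1 - g) * max(x2, x6, x5, x8),
        1 - d >= d * max(x2, x5) + (1 - d) * max(x4, x6, x7, x8),
    ))

a, b, g, d = 0.7, 0.6, 0.8, 0.5
# x1 = 1, x4 = 1 - gamma, all other atoms 0: feasible (this is case B.1 below)
assert satisfies_14([1.0, 0, 0, 1 - g, 0, 0, 0, 0], a, b, g, d)
# pushing x4 above 1 - gamma violates the third constraint
assert not satisfies_14([1.0, 0, 0, 0.5, 0, 0, 0, 0], a, b, g, d)
```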
In Appendix B, it is proved that the optimal lower bounds are

N_A(C) ≥ min(α, γ, max(β, β/(1−γ+β))),
N_C(A) ≥ min(δ, β, max(γ, γ/(1−β+γ))).

If γ ≤ β, then N_A(C) ≥ min(α, γ). Otherwise, N_A(C) ≥ min(α, β/(1−γ+β)). Note that the bound δ on N_C(B) does not affect the bound on N_A(C). Moreover, when β = 0, that is, when the rule "if B then A" is absent, N_A(C) is unknown, and no chaining occurs that links A to C, except if γ = 1. Even if α = 1 we may have N_A(C) = 0. Indeed, even if "all A's are B's", as soon as the rule "B implies C" has exceptions, these exceptions may precisely be the A's. This is exactly what happens with probabilities, although the expression for the exact lower bound is different, namely P(C|A) ≥ α·max(0, 1 − (1−γ)/β), which is clearly lower² than the exact lower bound of N_A(C) (e.g., [10]). This behavior contrasts with that of the lower bound of N(C|A) under the constraints

N(B|A) ≥ α, N(A|B) ≥ β, N(C|B) ≥ γ, N(B|C) ≥ δ.

This bound is easily obtained by applying the minimum specificity principle and is just N(C|A) ≥ min(α, γ) (see the footnote in Appendix B). Here the chaining effect always occurs, regardless of the value of β. It indicates that inference based on Bayesian conditioning is more cautious than inference based on Dempster conditioning (and on ordinal conditioning as well, see [2, 20]). The latter corresponds to a default inference which assumes everything is as normal as possible (the case where the exceptions to the rule "B implies C" are the situations where A is true being then viewed as abnormal). Finally, it is interesting to notice that N_A(C) and N(C|A) have the same lower bound if and only if γ ≤ β. That is, chaining with Bayesian conditioning from A to C is allowed, with a strength depending only on the two rules "if A then B" and "if B then C", as soon as the weight of the rule "if B then A" is large enough to dominate the rule "if B then C".

² Indeed, min(γ, max(β, β/(1−γ+β))) ≥ max(0, 1 − (1−γ)/β), since γβ ≥ β+γ−1 and β/(1−γ+β) ≥ (β+γ−1)/β ⟺ (γ−1)² ≥ 0.
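A randomized sanity check of the optimal bound (ours; the sample sizes and rule weights are arbitrary) can be run by drawing feasible atom assignments and verifying that N_A(C) never falls below min(α, γ, max(β, β/(1−γ+β))):

```python
# Randomized one-sided check (ours) of the optimal lower bound on N_A(C):
# no assignment satisfying (14) should fall below the closed-form bound.
import random

random.seed(1)
a, b, g, d = 0.7, 0.6, 0.8, 0.5
bound = min(a, g, max(b, b / (1 - g + b)))

def feasible(x):  # the constraints (14); x = (x1, ..., x8)
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    return (1 - a >= max(x6, x2) and 1 - b >= max(x7, x3)
            and 1 - g >= max(x4, x7) and 1 - d >= max(x2, x5)
            and 1 - a >= a * max(x6, x2) + (1 - a) * max(x3, x7, x5, x8)
            and 1 - b >= b * max(x3, x7) + (1 - b) * max(x2, x6, x5, x8)
            and 1 - g >= g * max(x4, x7) + (1 - g) * max(x2, x6, x5, x8)
            and 1 - d >= d * max(x2, x5) + (1 - d) * max(x4, x6, x7, x8))

def N_A_C(x):
    n = 1 - max(x[2:])                 # N(A∩C) = 1 - max(x3, ..., x8)
    return n / (n + max(x[3], x[5]))   # Pi(A∩notC) = max(x4, x6)

worst = 1.0
for _ in range(20000):
    x = (1.0,) + tuple(random.uniform(0, 0.4) for _ in range(7))
    if feasible(x):
        worst = min(worst, N_A_C(x))
assert worst >= bound - 1e-9
```

With these weights the bound evaluates to min(0.7, 0.8, max(0.6, 0.75)) = 0.7, attained for instance at x₆ = 1−α, the B.1-type distribution of Appendix B.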
7. Concluding remarks

This paper has introduced a Bayesian conditioning operation in numerical possibility theory, adapted to the idea of focusing a body of knowledge on a reference class described by a body of evidence. It can be justified by viewing a possibility measure as an upper bound of an ill-known probability. The result of this operation is still a possibility measure, which shows that Bayesian conditioning remains coherent with possibility theory. This conditioning operation is especially devoted to question-answering when the possibility distribution is a good approximation of a family of probability measures that represent generic knowledge, and some piece of evidence is available on the case at hand. An application to the representation of default rules has been briefly suggested, but the clarification of its merits with respect to other
approaches based on revision-type conditioning still requires further research.
Appendix A. Min-decomposability of conditional necessity measures in the Bayesian sense

First observe that

N_B(A) = min(N(A), N(B)) / [min(N(A), N(B)) + Π(Ā∩B)]

since N is a necessity measure (N(A∩B) = min(N(A), N(B))). Noticing that the function x/(x + a) is increasing with x (if a > 0), we get

N_B(A) = min( N(A)/[N(A) + Π(Ā∩B)], N(B)/[N(B) + Π(Ā∩B)] ).

Given an arbitrary family of sets {A_i | i ∈ I}, it must be proved that if N(∩_{i∈I} A_i) = inf_{i∈I} N(A_i), then N_B(∩_{i∈I} A_i) = inf_{i∈I} N_B(A_i). Noticing that Π[(∪_{i∈I} Ā_i)∩B] = sup_{i∈I} Π(Ā_i∩B), and due to the decreasingness of x/(x + a) w.r.t. a, we get

N_B(∩_{i∈I} A_i) = min(N(B), inf_{i∈I} N(A_i)) / [min(N(B), inf_{i∈I} N(A_i)) + sup_{j∈I} Π(Ā_j∩B)]
                = inf_{i∈I} N(A_i∩B) / [inf_{i∈I} N(A_i∩B) + sup_{j∈I} Π(Ā_j∩B)].

We thus have to prove that

inf_{i∈I} N(A_i∩B) / [inf_{i∈I} N(A_i∩B) + sup_{j∈I} Π(Ā_j∩B)] = inf_{i∈I} N(A_i∩B) / [N(A_i∩B) + Π(Ā_i∩B)],    (*)

where the left-hand side of (*) also writes inf_{i∈I} N(A_i∩B)/[N(A_i∩B) + sup_{j∈I} Π(Ā_j∩B)] and is obviously less than or equal to the right-hand side. To prove the equality we need the following lemma.

Lemma. ∀A, A′, B: Π(Ā′∩B) > Π(Ā∩B) implies N(A′∩B) ≤ N(A∩B).

Proof. Let us show that in possibility theory it is impossible to have both Π(Ā′∩B) > Π(Ā∩B) and N(A′∩B) > N(A∩B). In the sequel, we use the following equivalence:

max(a, b) > max(a, c) ⟺ b > a and b > c.    (**)

Using the notations introduced in Fig. 2, Π(Ā∩A′∩B) = u, Π(Ā∩Ā′∩B) = t, …, we have

Π(Ā′∩B) > Π(Ā∩B) ⟺ max(t, v) > max(t, u) ⟺ v > max(u, t),

N(A′∩B) > N(A∩B) ⟺ max(Π(Ā′), Π(B̄)) < max(Π(Ā), Π(B̄))
⟺ Π(Ā) > Π(B̄) and Π(Ā) > Π(Ā′)
⟺ max(u, t, x, y) > max(x, y, w, z) and max(u, t, x, y) > max(x, t, v, z).

We use (**), first with a = max(t, x), and then with a = max(x, y). This leads to the following system of constraints, which cannot be satisfied simultaneously:

max(u, y) > max(t, x);
max(u, y) > max(v, z);
max(u, t) > max(w, z);
max(u, t) > max(x, y);
v > max(u, t).

It is easy to see that this system is incoherent by observing that all the variables t, u, v, w, x, y, z appear in the right-hand parts of the inequalities, which makes it impossible to satisfy all the constraints together. Indeed, taking the "max" of the five inequalities, we get max(t, u, v, y) > max(t, u, v, y, x, w, z), which is impossible. □

Fig. 2. [Venn diagram assigning the possibility degrees t, u, v, w, x, y, z to the atoms generated by A, A′ and B; not reproduced.]
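The lemma can also be checked by brute force on random possibility assignments to the eight atoms generated by A, A′ and B (an illustrative sketch; the bit encoding of the atoms is ours):

```python
# Brute-force check (ours) of the lemma:
# Pi(notA' ∩ B) > Pi(notA ∩ B)  forces  N(A'∩B) <= N(A∩B).
import random

random.seed(2)
for _ in range(5000):
    pi = {cell: random.random() for cell in range(8)}
    m = max(pi.values())
    pi = {c: v / m for c, v in pi.items()}  # normalize: max possibility = 1

    # cell encoding: bit0 = membership in A, bit1 = in A', bit2 = in B
    def poss(pred):
        return max(v for c, v in pi.items() if pred(c))

    P_nA_B = poss(lambda c: not (c & 1) and (c & 4))
    P_nA2_B = poss(lambda c: not (c & 2) and (c & 4))
    N_A_B = 1 - poss(lambda c: not ((c & 1) and (c & 4)))
    N_A2_B = 1 - poss(lambda c: not ((c & 2) and (c & 4)))
    if P_nA2_B > P_nA_B:
        assert N_A2_B <= N_A_B + 1e-12
```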
Now, N_B(∩_{i∈I} A_i) can be written as

inf_{i,j∈I, i≠j} min( N(A_i∩B)/[N(A_i∩B) + Π(Ā_j∩B)], N(A_j∩B)/[N(A_j∩B) + Π(Ā_i∩B)],
                     N(A_i∩B)/[N(A_i∩B) + Π(Ā_i∩B)], N(A_j∩B)/[N(A_j∩B) + Π(Ā_j∩B)] ).

Fix i ≠ j, and assume Π(Ā_i∩B) > Π(Ā_j∩B), without loss of generality. Then, using the above lemma, N(A_i∩B) ≤ N(A_j∩B). So the least term of the four is N(A_i∩B)/[N(A_i∩B) + Π(Ā_i∩B)]. If Π(Ā_j∩B) > Π(Ā_i∩B), the same property holds (up to a renaming of i and j). Hence each term of the form N(A_i∩B)/[N(A_i∩B) + Π(Ā_j∩B)], for j ≠ i, is at least as large as a term of the form N(A_i∩B)/[N(A_i∩B) + Π(Ā_i∩B)]. Hence, (*) holds.

Similarly, Π_B is a possibility measure. Indeed, we have

Π_B(Ā) = Π(Ā∩B)/[Π(Ā∩B) + N(A∩B)] = 1 − N(A∩B)/[N(A∩B) + Π(Ā∩B)] = 1 − N_B(A).

The associated possibility distribution is

π_B(u) = Π_B({u}) = π(u)/[π(u) + N({u}ᶜ∩B)]
       = π(u)/[π(u) + min(N({u}ᶜ), N(B))]
       = π(u)/[π(u) + min(1 − π(u), N(B))]
       = π(u)/min(1, π(u) + N(B))
       = max(π(u), π(u)/(π(u) + N(B))).

Appendix B. Exact lower bounds in the Bayesian possibilistic syllogism

Let ω₁, ω₂, …, ω₈ be the set of atoms of the Boolean algebra generated by the sets A, B, C, as pictured in the following Venn diagram, i.e., ω₁ = A∩B∩C, ω₂ = A∩B̄∩C, …, etc. [Venn diagram of A, B and C not reproduced.] Let x_i = Π({ω_i}), i = 1,…,8. Then (BΠS) comes down to the following set of constraints on the x_i:

1−α ≥ max(x₆, x₂),
1−β ≥ max(x₇, x₃),
1−γ ≥ max(x₄, x₇),
1−δ ≥ max(x₂, x₅),
1−α ≥ α·max(x₆, x₂) + (1−α)·max(x₃, x₇, x₅, x₈),
1−β ≥ β·max(x₃, x₇) + (1−β)·max(x₂, x₆, x₅, x₈),
1−γ ≥ γ·max(x₄, x₇) + (1−γ)·max(x₂, x₆, x₅, x₈),
1−δ ≥ δ·max(x₂, x₅) + (1−δ)·max(x₄, x₆, x₇, x₈).

The four first constraints are more simply written as follows³:

1−α ≥ x₆,
1−max(α, δ) ≥ x₂,
1−β ≥ x₃,
1−max(β, γ) ≥ x₇,
1−γ ≥ x₄,
1−δ ≥ x₅.

³ It is easy to check that, using Dempster's rule of conditioning for modelling the rules, the constraints obtained on the possibility distributions are precisely these ones, and only these ones.

Finding the least value of N_A(C) under the above constraints comes down to minimizing

N_A(C) = N(A∩C)/[N(A∩C) + Π(A∩C̄)]
       = [1 − max(x₃, x₄, x₅, x₆, x₇, x₈)] / [1 − max(x₃, x₄, x₅, x₆, x₇, x₈) + max(x₄, x₆)],

or, equivalently, to maximizing

b = max(x₄, x₆) / [1 − max(x₃, x₄, x₅, x₆, x₇, x₈)].

Note that the above expression is non-decreasing in each of x₃, …, x₈, so that the least value of N_A(C) will be attained by maximizing these quantities. This is in agreement with the minimal specificity principle. Note that (BΠS) does not constrain x₁ in general. It is always possible to choose x₁ = 1, which implies that x₃, x₄, x₅, x₆ and x₇ can all differ from 1. If α > 0, β > 0, γ > 0, δ > 0, assuming x₈ = 1 is impossible (otherwise N(A∩B) = N_A(B) = 0 = N(B∩C) = N_B(C), which contradicts (BΠS) and trivializes the constraints). In that case x₁ should be 1. However, we have not yet considered the other constraints induced by (BΠS). In any case b*, the least upper bound of b, is attained for the maximal value of one of

max(x₄, x₆)/(1−x₃), max(x₄, x₆)/(1−x₄), max(x₄, x₆)/(1−x₅),
max(x₄, x₆)/(1−x₆), max(x₄, x₆)/(1−x₇), max(x₄, x₆)/(1−x₈),

according to the highest value attained among x₃, x₄, x₅, x₆, x₇, x₈. It is thus enough to consider each assumption of the form

x_i ≥ max{x_j | j ∈ {3, 4, 5, 6, 7, 8}, j ≠ i}

and check the constraints in (BΠS).

B.1. x₄ ≥ max(x₃, x₅, x₆, x₇, x₈)

Then notice that we can assume x₃ = x₅ = x₆ = x₇ = x₈ = 0 without altering the value of b. We then consider the family of possibility distributions such that x₁ = 1, x₄ ∈ [0, 1−γ], and x_i = 0 otherwise. It is obvious to check that each of these distributions obeys all the constraints of (BΠS). Under this assumption, the best value of b is (1−γ)/γ. Hence, inf N_A(C) ≤ γ. If x₆ ≥ max(x₃, x₄, x₅, x₇, x₈), a similar analysis can be done and inf N_A(C) ≤ α.

B.2. x₃ ≥ max(x₄, x₅, x₆, x₇, x₈)

It splits into two cases, according to whether x₄ ≥ x₆ or not in the numerator of b:
(i) x₄ ≥ x₆. Then the constraints of (BΠS) read x₃ ≥ x₄ and

1−α ≥ α·max(x₆, x₂) + (1−α)·x₃,
1−β ≥ β·x₃ + (1−β)·max(x₂, x₆, x₅, x₈),
1−γ ≥ γ·max(x₄, x₇) + (1−γ)·max(x₂, x₆, x₅, x₈),
1−δ ≥ δ·max(x₂, x₅) + (1−δ)·max(x₄, x₇, x₈).

Since x₂ is irrelevant to b, it can always be assumed to be 0 for the computation of N_A(C). Moreover, in order to let x₄ and x₃ be as big as possible under the assumption of this paragraph, we may also let x₅ = x₆ = x₇ = x₈ = 0. All possibility distributions such that x₁ = 1, x₃ ∈ [0, 1−β], x₄ ∈ [0, 1−γ], and x_i = 0 otherwise, are feasible. If β ≤ γ, then the value of b* under these assumptions is (1−γ)/β, and inf N_A(C) ≤ β/(1−γ+β). Otherwise x₄ = x₃ = 1−β. Hence, inf N_A(C) ≤ max(β, β/(1−γ+β)).

(ii) x₃ ≥ x₆ ≥ x₄. A similar reasoning leads to assume x₂ = x₄ = x₅ = x₇ = x₈ = 0, x₁ = 1, x₆ ∈ [0, 1−α], x₃ ∈ [0, 1−β]. These possibility distributions are solutions to (BΠS) if and only if x₃ ≥ x₆ and

1−α ≥ α·x₆ + (1−α)·x₃,
1−β ≥ β·x₃ + (1−β)·x₆.

The problem is thus to maximize x₆/(1−x₃) under the above constraints on the rectangle (x₃, x₆) ∈ [0, 1−β] × [0, 1−α]. It is easy to see that 1−α ≥ α·x₆ + (1−α)·x₃ also writes x₆/(1−x₃) ≤ (1−α)/α. Assume α ≥ β; then it is easy to check that x₃ = x₆ = 1−α is a feasible solution that is optimal. Otherwise, if α < β, then the maximal value for x₆ is 1−β, as well as for x₃. Hence, b* = min((1−α)/α, (1−β)/β), and inf N_A(C) ≤ max(α, β).

B.3. x₅ ≥ max(x₃, x₄, x₆, x₇, x₈)

It splits into two cases, according to whether x₄ ≥ x₆ or not in the numerator of b.

(i) x₄ ≥ x₆. Then the constraints of (BΠS) reduce to x₅ ≥ x₄, x₄ ∈ [0, 1−γ], x₅ ∈ [0, 1−δ] and

1−γ ≥ γ·x₄ + (1−γ)·x₅,
1−δ ≥ δ·x₅ + (1−δ)·x₄.

This case is the same as (B.2, ii) and leads to inf N_A(C) ≤ max(γ, δ).

(ii) x₆ ≥ x₄. Then the constraints of (BΠS) reduce to x₅ ≥ x₆, x₆ ∈ [0, 1−α], x₅ ∈ [0, 1−δ] and

1−α ≥ α·x₆ + (1−α)·x₅,
1−δ ≥ δ·x₅ + (1−δ)·x₆.

This case is the same as (B.2, ii) and leads to inf N_A(C) ≤ max(α, δ).

B.4. x₇ ≥ max(x₃, x₄, x₅, x₆, x₈)

It splits into two cases, according to whether x₄ ≥ x₆ or not in the numerator of b.

(i) x₄ ≥ x₆. Then the constraints of (BΠS) reduce to x₇ ≥ x₄, x₄ ∈ [0, 1−γ], x₇ ∈ [0, 1−max(β, γ)], and the other constraints are redundant. Hence, as in case B.1, the best values are x₄ = x₇ = 1−max(β, γ), which leads to inf N_A(C) ≤ max(β, γ).

(ii) x₆ ≥ x₄. Then the constraints of (BΠS) reduce to x₇ ≥ x₆, x₆ ∈ [0, 1−α], x₇ ∈ [0, 1−max(β, γ)] and

1−α ≥ α·x₆ + (1−α)·x₇,
1−β ≥ β·x₇ + (1−β)·x₆,
1−γ ≥ γ·x₇ + (1−γ)·x₆.

This case is similar to (B.2, ii) and leads to inf N_A(C) ≤ max(α, β, γ).

B.5. x₈ ≥ max(x₃, x₄, x₅, x₆, x₇)

Assuming x₄ ≥ x₆, we get the constraint

1−γ ≥ γ·x₄ + (1−γ)·x₈,

yielding b* = (1−γ)/γ. With x₆ ≥ x₄ we get b* = (1−α)/α.

On the whole,

inf N_A(C) = min(α, γ, max(β, β/(1−γ+β)), max(α, β), max(γ, δ), max(α, δ), max(β, γ), max(α, β, γ))
           = min(α, γ, max(β, β/(1−γ+β))).

Similarly, exchanging A and C, that is, exchanging α and δ, and γ and β, one gets inf N_C(A).

Acknowledgements

The authors are grateful to the referees for their careful reading, which led to significant improvements in the presentation. Thanks also to Gert De Cooman for pointing out the alternative proof of Theorem 2.
References

[1] S. Amarger, D. Dubois, H. Prade, Constraint propagation with imprecise conditional probabilities, Proc. of the 7th Conf. on Uncertainty in AI, Los Angeles, CA, Morgan Kaufmann, San Francisco, CA, 1991, pp. 26-34.
[2] S. Benferhat, D. Dubois, H. Prade, Representing default rules in possibilistic logic, Proc. of the 3rd Internat. Conf. on Principles of Knowledge Representation and Reasoning (KR'92), Cambridge, MA, Morgan Kaufmann, San Francisco, CA, 1992, pp. 673-684.
[3] B. De Baets, E. Tsiporkova, R. Mesiar, The surprising possibilistic nature of the algebraic product, Proc. of the 4th Europ. Congress on Intelligent Techniques and Soft Computing (EUFIT'96), Aachen, Germany, 2-5 September, 1996, pp. 549-553.
[4] L.M. de Campos, M.T. Lamata, S. Moral, The concept of conditional fuzzy measure, Internat. J. of Intelligent Systems 5(3) (1990) 237-246.
[5] G. De Cooman, The formal analogy between possibility and probability theory, in: G. De Cooman, D. Ruan, E.E. Kerre (Eds.), Foundations and Applications of Possibility Theory, World Scientific, Singapore, 1995, pp. 71-87.
[6] G. De Cooman, Possibility theory - Part I: The measure- and integral-theoretic groundwork; Part II: Conditional possibility; Part III: Possibilistic independence, Internat. J. General Systems 25(4) (1997) 291-371.
[7] G. De Cooman, D. Aeyels, On the coherence of supremum preserving upper previsions, Proc. of the 6th Internat. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'96), Granada, Spain, 1-5 July, 1996, pp. 1405-1410.
[8] A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Stat. 38 (1967) 325-339.
[9] D. Denneberg, Non Additive Measure and Integral, Kluwer Academic, Dordrecht, The Netherlands, 1994.
[10] D. Dubois, L. Godo, R. Lopez de Mantaras, H. Prade, Qualitative reasoning with imprecise probabilities, J. of Intelligent Information Systems 2 (1993) 319-363.
[11] D. Dubois, S. Moral, H. Prade, A semantics for possibility theory based on likelihoods, Proc. of the Internat. Joint Conf. of the 4th IEEE Internat. Conf. on Fuzzy Systems (FUZZ-IEEE'95) and the 2nd Internat. Fuzzy Engineering Symp. (IFES'95), Yokohama, Japan, 20-24 March, 1995, pp. 1597-1604; extended version to appear in Internat. J. Math. Anal. Appl.
[12] D. Dubois, H. Prade, Possibility Theory - An Approach to Computerized Processing of Uncertainty, Plenum Press, New York, 1988.
[13] D. Dubois, H. Prade, Modelling uncertainty and inductive inference, Acta Psychologica 68 (1988) 53-78.
[14] D. Dubois, H. Prade, Fuzzy sets in approximate reasoning - Part I: Inference with possibility distributions, Fuzzy Sets and Systems 40 (1991) 143-202.
[15] D. Dubois, H. Prade, When upper probabilities are possibility measures, Fuzzy Sets and Systems 49 (1992) 65-74.
[16] D. Dubois, H. Prade, Evidence, knowledge and belief functions, Internat. J. Approximate Reasoning 6(3) (1992) 295-319.
[17] D. Dubois, H. Prade, Focusing and updating: Two concepts of conditioning in the setting of non-additive uncertainty models (extended abstract), 6th Internat. Conf. on the Foundations and Applications of Utility, Risk and Decision Theory (FUR-VI), Cachan, Paris, 15-18 June, 1992.
[18] D. Dubois, H. Prade, Fuzzy sets and probability: Misunderstandings, bridges and gaps, Proc. of the 2nd IEEE Internat. Conf. on Fuzzy Systems (FUZZ-IEEE'93), San Francisco, CA, 28 March-1 April, 1993, pp. 1059-1068.
[19] D. Dubois, H. Prade, Focusing versus updating in belief function theory, in: R.R. Yager, M. Fedrizzi, J. Kacprzyk (Eds.), Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, pp. 71-95.
[20] D. Dubois, H. Prade, Possibilistic logic and plausible inference, in: G. Coletti, D. Dubois, R. Scozzafava (Eds.), Mathematical Models for Handling Partial Knowledge in Artificial Intelligence, Plenum Press, New York, 1995, pp. 209-229.
[21] D. Dubois, H. Prade, Numerical representations of acceptance, Proc. of the 11th Conf. on Uncertainty in AI, Montreal, Quebec, 1995, Morgan Kaufmann, San Francisco, CA, pp. 149-156.
[22] D. Dubois, H. Prade, Focusing vs. revision in possibility theory, Proc. of the 5th IEEE Internat. Conf. on Fuzzy Systems (FUZZ-IEEE'96), New Orleans, LA, 9-12 September, 1996.
[23] D. Dubois, H. Prade, P. Smets, Representing partial ignorance, IEEE Trans. Systems, Man and Cybernet. 26 (1996) 361-378.
[24] R. Fagin, J.Y. Halpern, A new approach to updating beliefs, Research Report RJ 7222, IBM Research Division, San Jose, CA, 1989.
[25] I. Gilboa, D. Schmeidler, Updating ambiguous beliefs, Proc. of the 4th Conf. on Theoretical Aspects of Reasoning about Knowledge (TARK'92), Monterey, CA, 1992, Morgan Kaufmann, San Francisco, CA, pp. 143-162.
[26] M. Goldszmidt, J. Pearl, Rank-based systems: A simple approach to belief revision, belief update, and reasoning about evidence and actions, Proc. of the 3rd Internat. Conf. on Principles of Knowledge Representation and Reasoning (KR'92), Cambridge, MA, 1992, Morgan Kaufmann, San Francisco, CA, pp. 661-672.
[27] I.R. Goodman, H.T. Nguyen, E. Walker, Conditional Inference and Logic for Intelligent Systems, North-Holland, Amsterdam, 1991.
[28] E. Hisdal, Conditional possibilities - Independence and non-interactivity, Fuzzy Sets and Systems 1 (1978) 283-297.
[29] J.Y. Jaffray, Bayesian updating and belief functions, IEEE Trans. Systems, Man and Cybernet. 22 (1992) 1144-1152.
[30] H.E. Kyburg, Jr., Bayesian and non-Bayesian evidential updating, Artificial Intelligence 31 (1987) 271-294.
[31] I. Maung, Two characterizations of a minimum-information principle for possibilistic reasoning, Internat. J. Approximate Reasoning 12 (1995) 133-156.
[32] R. Mesiar, On the integral representation of fuzzy possibility measures, Internat. J. General Systems 23 (1995) 109-121.
[33] S. Moral, L. de Campos, Updating uncertain information, in: B. Bouchon-Meunier, R.R. Yager, L.A. Zadeh (Eds.), Uncertainty in Knowledge Bases, Lecture Notes in Comput. Sci., vol. 521, Springer, Berlin, 1991, pp. 58-67.
[34] S. Moral, N. Wilson, Revision rules for convex sets of probabilities, in: G. Coletti, D. Dubois, R. Scozzafava (Eds.), Mathematical Models for Handling Partial Knowledge in Artificial Intelligence, Plenum Press, New York, 1995, pp. 113-128.
[35] J. Paris, The Uncertain Reasoner's Companion, Cambridge Univ. Press, Cambridge, UK, 1994.
[36] J. Pearl, Reasoning with belief functions: An analysis of compatibility, Internat. J. Approximate Reasoning 4(5/6) (1990) 363-389.
[37] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
[38] W. Spohn, Ordinal conditional functions: A dynamic theory of epistemic states, in: W. Harper, B. Skyrms (Eds.), Causation in Decision, Belief Change and Statistics, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1988, pp. 105-134.
[39] H. Thöne, Precise conclusions under uncertainty and incompleteness in deductive database systems, Doctoral Dissertation, University of Tübingen, Germany, 1994.
[40] H. Thöne, U. Güntzer, W. Kießling, Towards precision of probabilistic bounds propagation, Proc. of the 8th Conf. on Uncertainty in Artificial Intelligence, Washington, DC, 1992, Morgan Kaufmann, San Francisco, CA, pp. 315-322.
[41] P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman & Hall, London, 1991.
[42] P. Walley, Measures of uncertainty in expert systems, Artificial Intelligence 83 (1996) 1-58.
[43] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3-28.
[44] L.A. Zadeh, Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions, IEEE Trans. Systems, Man and Cybernet. 15 (1985) 754-763.