European Journal of Operational Research 25 (1986) 345-356 North-Holland
Fuzzy sets and statistical data Didier DUBOIS and Henri PRADE Language and System Informatics, University Paul Sabatier 31062 Toulouse, France
Abstract: Specific features of probability and possibility theories are discussed with emphasis on semantical aspects. Instead of putting forward the acknowledged usefulness of possibility theory for the non-statistical modelling of subjective categories, we try to figure out how statistical data and possibility theory could be matched. As a result, procedures for constructing weak possibilistic substitutes of probability measures, and for processing imprecise statistical data, are outlined. They provide new insights on the relationship between fuzzy set and probability theories.
1. Introduction
One of the main problems encountered by people who want to know about fuzzy set theory [30] is to grasp the intuition of what a fuzzy membership function may mean, and to distinguish it from a probability function at a semantical level. The concept of probability is usually related to the frequency of occurrence of events, captured by repeated experiments whose outcomes are recorded. In contrast with this, Zadeh [31] claimed that fuzzy set theory could provide the appropriate framework to assess the possibility of events rather than their probability. Zadeh's understanding of possibility does not relate to the existence of statistical experiments, but to the imprecision of natural language statements translated by soft constraints restricting the values of variables. In the late fifties, an English economist, Shackle [22], outlined a theory of individual decision-making in which the concept of possibility also
Most of the ideas and results contained in this paper were the topic of a communication at the 5th European Conference on Operations Research, held in Lausanne (Switzerland), on July 11-14, 1982. Received October 1982; revised June 1984
played a crucial role. But his feeling of what 'possibility' means radically departs from Zadeh's concerns in [31,32]. According to Shackle [22, p. 67], "a man cannot, in general, tell what will happen, but his conception of the nature of things, the nature of men and their institutions and affairs, and of the non-human would enable him to form a judgement as to whether any suggested thing can happen". The possibility of an event is then viewed as expressing to what extent it is not surprising for some individual. Such grades of possibility should assess subjective belief about the occurrence of events. Clearly, they cannot be thought of independently of the frequency of occurrence of these events, as observed by the individual concerned, since things which often occur are judged less surprising than things which seldom occur. It is noticeable that, starting from independent intuitions, Zadeh and Shackle came up with the same mathematical model, in which the maximum and the minimum operators are used to combine grades of possibility. In contrast with this, traditional approaches to decision analysis modelling put forward an additive model of subjective belief, known under the name of subjective probability (e.g. Savage [21]). The mathematics of subjective probability are basically the same as those of frequential (objective) probability, sometimes up to a scaling transformation. However, attempts to
0377-2217/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)
D. Dubois, H. Prade / Fuzzy sets and statistical data
justify the additivity rule of subjective probability axiomatically either failed or were not intuitively satisfactory (see Fine [11] for a critical review on this topic). Actually, it turns out that this prejudice in favor of additivity has historical grounds. Shafer [24], in a most interesting paper, established that scientists until Bernoulli distinguished between 'chance' (randomness) and 'probability' (an attribute of opinion). The first concept can be described by the frequency of occurrence of events and leads to additive grades. The second is a feature of our subjective knowledge and need not be additive. Both concepts were unrelated until the 18th century, and Shafer [24] indicates how the additive point of view, especially motivated by Bayes' method for statistical inference, became the only intelligible approach to subjective probability. Shafer's theory of evidence [23] is a revival of forgotten studies on non-additive probability and encompasses both theories of probability and possibility.

This paper is meant to investigate the links which can exist between possibility theory and the observation of the frequency of events. It is claimed that although possibility theory and fuzzy sets can be relevant in situations when no statistical data are available (but only, for instance, verbal statements about facts), it is legitimate to derive grades of possibility from observed frequencies. Section 2 investigates natural consistency conditions between grades of possibility, the dual grades of necessity, and grades of probability. Links with Shafer's theory of evidence and with Savage's qualitative probabilities are pointed out. Section 3 gives a method for deriving grades of possibility from a probability distribution obtained from a histogram. Intuitive and theoretical justifications of this method are provided.
The last section points out that if unrelated random trials have imprecise but consistent outcomes, then whereas no probability distribution can be obtained, a possibility distribution can easily be derived which defines bounds on unknown probability values.
2. Probability and possibility

In the following we investigate the meaning of possibility, deduce specific axioms for a possibilistic assessment of subjective knowledge and compare it to probabilistic rules. It is stressed that Savage's qualitative probability axioms are not consistent with possibility theory. Then it is recalled how Shafer's theory encompasses both probability and possibility.
2.1. The meaning of possibility

Traditionally, the word 'possible' conveys two meanings: physical and epistemic possibility. The former pertains to whether it is difficult or not to perform some action, i.e. questions such as "is it possible to squeeze eight tennis balls in this box?". However, there is another meaning of possibility, underlying questions such as "may it rain tomorrow?", closer to Shackle's ideas [22]. As opposed to physical possibilities, which are an objective feature of the world, we consider a subjective judgement pertaining to the possibility of occurrence of events. As noticed by Zadeh [31], epistemic possibility can be related to imprecise verbal statements such as "John is tall". 'Tall' denotes a fuzzy set of possible values for John's size (see [32] for the analysis of more complex statements). From the knowledge of the possibility distribution of John's size, provided by the verbal statement, Zadeh [31] points out that some answer to questions such as "may John be shorter than me?" can be provided. However, natural language statements are not the only reasonable source of knowledge about the possibility of occurrence of events. The evidence gathered by some individual comes from his own past observations of objective facts, as well as from verbal opinions he heard from other people. Hence the frequency of occurrence of events, as perceived by him, can have some noticeable influence on his judgments, i.e. epistemic possibility and statistical data are not completely unrelated.

Let $A, B, C, \ldots$ be a finite collection of (nonfuzzy) events, which form the set of subsets of a finite set $\Omega$. $\Omega$ is the sure event; the empty set $\emptyset$ models the impossible event. In the following, $P(A)$ denotes the probability of $A$, understood as how frequent $A$ is. $\Pi(A)$ denotes the possibility of occurrence of $A$, i.e. a number assessing someone's knowledge about this possibility, in reply to the question "may A occur?".
We shall adopt the scale [0, 1] for these numbers, with

$$P(\Omega) = \Pi(\Omega) = 1, \tag{1}$$

$$P(\emptyset) = \Pi(\emptyset) = 0. \tag{2}$$

Let $\bar{A}$ be the event 'not A'. Since $P(A)$ is in
accordance with the relative frequency of occurrence of A, we have
$$P(A) + P(\bar{A}) = 1, \tag{3}$$

and more generally if $A$ and $B$ are two mutually exclusive events, i.e. $A \cap B = \emptyset$,

$$P(A \cup B) = P(A) + P(B) \tag{4}$$
where $A \cup B$ means 'A or B', modeled by the union of sets $A$ and $B$. Similarly, 'not A' is represented by the complementary set of $A$, and mutually exclusive events by disjoint sets. Turning to grades of possibility, it is clear that, considering complementary questions such as "may A occur?" and "may 'not A' occur?", nothing prevents us from giving a positive answer to both. On the other hand, it is forbidden to give a negative answer to both without being self-contradictory. In other words, at least one of $A$ and $\bar{A}$ must be possible, but both may be simultaneously judged as equally possible as well. Possibility is thus a weak notion of evidence. In particular, what is probable must be possible but not conversely, so that we may require grades of probability to act as lower bounds on grades of possibility. These remarks, together with equation (3), naturally lead us to assume
$$\Pi(A) + \Pi(\bar{A}) \geq 1. \tag{5}$$
Hence there is but a weak relationship between $\Pi(A)$ and $\Pi(\bar{A})$. Now the question "may 'not A' not occur?" can be naturally expressed as "should A occur?". Here we point out the fact that possibility has a dual companion which is 'necessity': "A is necessary" (= A is bound to occur) is equivalent to "not A is impossible". Thus, the epistemic necessity of events, or the certainty of their occurrence, can be assessed by numbers $N(A)$ such that

$$\forall A, \quad N(A) = 1 - \Pi(\bar{A}). \tag{6}$$
From (5) it is clear that

$$\forall A, \quad N(A) + N(\bar{A}) \leq 1 \tag{7}$$
and from (6), (5) is equivalent to $\Pi(A) \geq N(A)$, i.e. the necessity of occurrence of an event is always less than or equal to its possibility of occurrence, which is in accordance with our intuition.

Remark. Modal logics (Hughes and Cresswell [14]) consider possibility and necessity as basic categories. (6) is a numerical translation of one of their basic definitions. It is noticeable that possibility theory is a multi-valued approach (i.e. degrees of possibility are allowed) while usual modal logic is a 0-1 approach (there are only two 'degrees' of possibility: complete possibility and complete impossibility). See Prade [20] for a more detailed account of the consistency between possibility theory and modal Aristotelian semantics.
2.2. Axioms for possibility

The problem of finding a possibilistic substitute for the additive rule (4) of probability has been considered by Zadeh and Shackle independently. They both came up with the same answer, namely

$$\forall A, B, \ A \cap B = \emptyset: \quad \Pi(A \cup B) = \max(\Pi(A), \Pi(B)). \tag{8}$$
Note that (8) can be equivalently written without assuming $A$ and $B$ are disjoint (see [6]). A set-function satisfying (1), (2) and (8) is called a possibility measure. Reasons for choosing (8) are the following. First, let $g(A)$ assess the state of knowledge regarding the occurrence of $A$, with $g(\Omega) = 1$, $g(\emptyset) = 0$; $g$ may be a probability, a possibility function or anything else. Let $A$ and $B$ be events such that $A$ implies $B$, denoted $A \subseteq B$. Then we know at least as much about the occurrence of $B$ as about that of $A$, i.e.

$$\forall A, \forall B, \quad A \subseteq B \ \Rightarrow \ g(A) \leq g(B). \tag{9}$$
Set functions satisfying (9) were studied by Sugeno [26] under the name 'fuzzy measure'. A direct consequence of (9) is
$$\forall A, \forall B, \quad g(A \cup B) \geq \max(g(A), g(B)). \tag{10}$$
Hence (8) appears as a limit case of a general knowledge assessment function; it corresponds to a prudent attitude since it assigns to A U B the least possible number. The statement "A is possible" is indeed a minimal commitment. On the other hand, some may argue that subjective grades of possibility cannot be numerically combined. The only thing we can know is whether some event is judged more possible, less possible than the other, or both are equally possible. The use of the maximum operator in (8) is, due to (9), the only possible solution which is consistent with the idea
that subjective grades of possibility can only be compared. Note that while probability theory takes advantage of all mathematical properties of the scale [0, 1] (especially the semi-group structure), possibility theory only needs a complete distributive lattice (which [0, 1], equipped with its usual order, is indeed). Some pseudo-complementation is necessary to define necessity grades as in (6). From (8), the axiom of necessity is
$$\forall A, B, \ A \cup B = \Omega: \quad N(A \cap B) = \min(N(A), N(B)), \tag{11}$$

i.e. a limit case of (9) since for any fuzzy measure

$$\forall A, B, \quad g(A \cap B) \leq \min(g(A), g(B)). \tag{12}$$
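The axioms so far can be checked numerically on a small frame. The sketch below is only an illustration under stated assumptions: the three-element frame and the grades in `pi` are hypothetical, and the possibility of an event is taken as the largest grade among its elementary events, which is what (8) gives on a finite set. Necessity is obtained by the duality (6).

```python
from itertools import combinations

# Hypothetical possibility grades of the elementary events (illustrative values).
pi = {"w1": 1.0, "w2": 0.7, "w3": 0.2}
omega = frozenset(pi)

def possibility(event):
    """Pi(A): largest elementary grade in A; 0 for the empty event, as in (2)."""
    return max((pi[w] for w in event), default=0.0)

def necessity(event):
    """N(A) = 1 - Pi(not A), the duality (6)."""
    return 1.0 - possibility(omega - frozenset(event))

def all_events(frame):
    """Every subset of the frame, from the empty set to the sure event."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(omega)
tol = 1e-12
for a in events:
    not_a = omega - a
    assert possibility(a) + possibility(not_a) >= 1.0 - tol  # (5)
    assert necessity(a) + necessity(not_a) <= 1.0 + tol      # (7)
    assert necessity(a) <= possibility(a) + tol              # N(A) <= Pi(A)
    for b in events:
        # Maxitivity (8), here without requiring A and B to be disjoint.
        assert possibility(a | b) == max(possibility(a), possibility(b))
        # Dual min rule for necessity, in the form valid for all A, B.
        assert abs(necessity(a & b) - min(necessity(a), necessity(b))) < tol
```

The grade 1.0 on `w1` is what makes `possibility(omega)` equal to 1 as required by (1); a distribution without any fully possible element would violate that condition.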
(11) is consistent with the fact that the statement "A is necessary" underlies a very strong commitment. (11) is equivalent to

$$N(A \cap B) = \min(N(A), N(B))$$

for all $A$, $B$.

Remark. The problem of finding all possible substitutes for the additivity rule of probabilities for combining subjective degrees of knowledge has been considered by the authors in [8], with (9) as a minimal requirement. It was found that there are only two classes of subjective knowledge assessment functions:
- pseudo-probabilities, such that $\forall A$, $g(\bar{A})$ can be calculated from $g(A)$. They are additive up to an isomorphism which may express the distortion of additivity by subjectivity.
- pseudo-possibilities, such that $g(A)$ and $g(\bar{A})$ are only weakly related by the equation $\max(g(A), g(\bar{A})) = 1$, and their duals $\hat{g}(A) = 1 - g(\bar{A})$, which satisfy $\min(\hat{g}(A), \hat{g}(\bar{A})) = 0$.
Thus, pseudo-possibilities (resp. pseudo-necessities) satisfy (5) (resp. (7)). Moreover,

$$\hat{g}(A) > 0 \ \Rightarrow \ g(A) = 1 \quad \text{and} \quad g(A) < 1 \ \Rightarrow \ \hat{g}(A) = 0,$$

i.e. the occurrence of an event must be completely possible before being somewhat necessary, which is natural.

2.3. Consistency between possibility and probability

As noticed earlier, common sense intuitively admits that some links exist between the possibility and the probability (understood as the relative frequency) of events. Namely, reducing the possibility of occurrence of events also reduces their probability; similarly, enhancing the tendency of occurrence of events (by relaxing constraints acting on them) tends to increase their probability of occurrence. A consistency index between probability and possibility was first suggested by Zadeh [31]. He assumes $\Omega = \{\omega_i \mid i = 1, \ldots, n\}$ to be finite and composed of $n$ elementary events. Let

$$\pi_i = \Pi(\{\omega_i\}), \quad p_i = P(\{\omega_i\}), \quad i = 1, \ldots, n,$$

be the grades of possibility and of probability of elementary events; the degree of consistency between $\pi = (\pi_1 \cdots \pi_n)$ and $p = (p_1 \cdots p_n)$ is

$$\gamma = \sum_{i=1}^{n} \pi_i p_i \quad (\in [0, 1]). \tag{13}$$

$\gamma = 1$ (maximal consistency) means $\pi_i = 1$, $\forall i$ such that $p_i > 0$. Of course, $\pi$ can also be viewed as a fuzzy set. Another way of mathematically expressing the verbal requirement "the more necessary an event, the more probable, and the more probable, the more possible" consists in the following inequalities (Dubois and Prade [6,7,8]):

$$\forall A, \quad N(A) = 1 - \Pi(\bar{A}) \leq P(A) \leq \Pi(A). \tag{14}$$

For a finite $\Omega$, $\gamma = 1$ clearly implies (14). Sufficient conditions for $\Pi$ and $P$ to satisfy (14) are given in [9]. Methods which derive possibility measures $\Pi$ from probabilistic data should be respectful of (14) in order to be intuitively accepted.

2.4. Shafer's approach to the measurement of belief

In a finite setting Shafer [23] proposed to split the total amount of knowledge into quantities assigned to some events for which evidence is available as to whether they are to occur. Such events are called focal elements. A mapping $m$ is defined which allocates to each event a portion of the total knowledge, i.e.

$$m(\emptyset) = 0, \tag{15}$$

$$\sum_{B \subseteq \Omega} m(B) = 1. \tag{16}$$
The set of focal elements is $\{B \mid m(B) > 0\}$. $m$ is called a basic probability assignment. The grade of belief in some event $A$ is then calculated on the basis of events whose occurrence implies that of $A$, i.e.

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B). \tag{17}$$

The plausibility of event $A$ is evaluated on the basis of the events $B$ which may occur simultaneously with $A$:

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B). \tag{18}$$

It is easy to figure out that $\mathrm{Bel}(A) \leq \mathrm{Pl}(A)$ and $\mathrm{Bel}(A) = 1 - \mathrm{Pl}(\bar{A})$. In his book, Shafer [23] proves that belief (resp. plausibility) functions are equivalent to super-additive (resp. sub-additive) probabilities, i.e. we have for instance, $\forall n \in \mathbb{N}$,

$$\mathrm{Bel}(A_1 \cup A_2 \cup \cdots \cup A_n) \geq \sum_{i=1,\ldots,n} \mathrm{Bel}(A_i) - \sum_{i<j} \mathrm{Bel}(A_i \cap A_j) + \cdots + (-1)^{n+1} \mathrm{Bel}(A_1 \cap A_2 \cap \cdots \cap A_n). \tag{19}$$

Probabilities are special cases of belief and plausibility functions, such that $\forall A$, $\mathrm{Pl}(A) = \mathrm{Bel}(A)$. Their focal elements are only elementary events, i.e. the evidence is scattered (hence contradictory), but precise. Possibilities are special cases of plausibility functions. $\mathrm{Pl}$ is a possibility measure if and only if the focal elements are nested, i.e. $\exists F_1 \subseteq F_2 \subseteq \cdots \subseteq F_p$ such that $m(A) > 0$ only if $A$ is one of the $F_i$'s. Of course the corresponding belief functions are necessity measures. Possibility measures are obtained when the evidence is totally consistent (consonant, as termed by Shafer), and only in this case. The contrast with the probabilistic case is drastic. It is interesting to notice that the structure of the body of evidence suffices to characterize the nature of the evidence measures, in the case of probability and possibility.

This theory of evidence actually stems from works in statistical inference by Dempster [2]. The focal elements correspond to observed outcomes of random trials, contained in a set $X$, and there is a multimapping $\Gamma$ which to every outcome $x \in X$ associates an actual event $\Gamma(x) \subseteq \Omega$, and if $P(\{x\}) = p(x)$ is the statistically estimated probability of $x$, the basic assignment is defined by

$$\forall A, \quad m(A) = \begin{cases} p(x) & \text{if } A = \Gamma(x), \\ 0 & \text{otherwise.} \end{cases} \tag{20}$$

$A = \Gamma(x)$ means that $A$ is the (possibly imprecise) interpretation of $x$, in the relevant frame of discernment. It is assumed that $\Gamma(x) \neq \emptyset$, $\forall x$. In this framework $\mathrm{Bel}(A)$ (resp. $\mathrm{Pl}(A)$) is the minimal (resp. maximal) amount of probability which can be allocated to $A$ by transporting the probability measure on $X$ to $\Omega$ through $\Gamma$. Indeed, if

$$A^{*} = \{x \in X \mid \Gamma(x) \cap A \neq \emptyset\}, \tag{21}$$

$$A_{*} = \{x \in X \mid \Gamma(x) \subseteq A\}, \tag{22}$$

i.e. $A^{*}$ (resp. $A_{*}$) is the set of outcomes which may (resp. must) correspond to event $A$, we clearly have

$$\mathrm{Bel}(A) = P(A_{*}), \quad \mathrm{Pl}(A) = P(A^{*}), \tag{23}$$
where $P$ is the probability measure on $X$. If, ultimately, due to some precisiation process, $\Gamma$ can be reduced to an ordinary mapping $X \to \Omega$, i.e. $\forall x$, $\Gamma(x)$ is an elementary event, the transported probability $P_\Gamma$ on $\Omega$, such that

$$P_\Gamma(A) = P(\Gamma^{-1}(A)) \tag{24}$$

will be consistent with $\mathrm{Bel}$ and $\mathrm{Pl}$ in the sense that

$$\forall A, \quad \mathrm{Bel}(A) \leq P_\Gamma(A) \leq \mathrm{Pl}(A). \tag{25}$$
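As an illustration of Dempster's construction, the sketch below computes belief and plausibility by transporting a probability through a multimapping and checks the bracketing (25) for one precisiation. Everything concrete here is assumed for the example: the outcomes `x1`, `x2`, `x3`, their probabilities, the images in `gamma` and the chosen `selection` are hypothetical. The images are deliberately nested, so the plausibility function also turns out to be maxitive.

```python
from itertools import combinations

# Hypothetical frame, outcome probabilities p(x) and imprecise images Gamma(x).
omega = frozenset({"a", "b", "c"})
p = {"x1": 0.5, "x2": 0.3, "x3": 0.2}
gamma = {"x1": frozenset({"a"}),
         "x2": frozenset({"a", "b"}),
         "x3": frozenset({"a", "b", "c"})}

def bel(event):
    """Bel(A): probability of outcomes whose image is included in A."""
    return sum(p[x] for x in p if gamma[x] <= event)

def pl(event):
    """Pl(A): probability of outcomes whose image meets A."""
    return sum(p[x] for x in p if gamma[x] & event)

def transported(selection, event):
    """P_Gamma(A) when each x is reduced to the single element selection[x]."""
    return sum(p[x] for x in p if selection[x] in event)

events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(sorted(omega), r)]
selection = {"x1": "a", "x2": "b", "x3": "c"}  # one possible precisiation
tol = 1e-12
for a in events:
    assert bel(a) <= transported(selection, a) + tol        # left part of (25)
    assert transported(selection, a) <= pl(a) + tol         # right part of (25)
    assert abs(bel(a) - (1.0 - pl(omega - a))) < tol        # Bel(A) = 1 - Pl(not A)
    for b in events:
        # Nested images: Pl behaves as a possibility measure.
        assert abs(pl(a | b) - max(pl(a), pl(b))) < tol
```

Replacing the nested images by overlapping but non-nested ones keeps the bracketing (25) but generally breaks the maxitivity check in the inner loop.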
Particularly if there is an ordering of $X$, say $\{x_{(1)}, x_{(2)}, \ldots, x_{(p)}\} = X$ with

$$\Gamma(x_{(1)}) \subseteq \Gamma(x_{(2)}) \subseteq \cdots \subseteq \Gamma(x_{(p)}),$$

then $\mathrm{Bel}$ and $\mathrm{Pl}$ are necessity and possibility measures and (25) is exactly (14). Hence the consistency condition between possibility and probability is interpreted according to statistical inference.

Remark 1. Dempster's approach can be carried over to infinite settings. This is the topic of the theory of random sets (Kendall [16], Matheron [17]). The above discussion suggests that fuzzy set and possibility theory can be interpreted in the framework of random sets (Goodman [12] and Wang and Sanchez [28]). Besides, Höhle [13] has developed a general theory of plausibility and belief functions in an algebraic setting.

Remark 2. See [8] for a study of the relationship between Shafer's theory of evidence and the set-functions obtained by generalizing the additivity law of probability.
2.5. Possibility theory and qualitative probability

Savage [21], acknowledging the fact that subjective probability need not be quantified, introduced the concept of qualitative probability under the form of a relation among events, denoted $\geq$, such that $A \geq B$ means $A$ is at least as probable as $B$. $A \sim B$ is short for $A \geq B$ and $B \geq A$ simultaneously, and reads $A$ is as probable as $B$. Savage set forth the following axioms for this relation:
(C0) $\Omega > \emptyset$ (non-triviality). Here '$A > B$' means $A \geq B$, and $B \geq A$ is false.
(C1) $A \geq B$ or $B \geq A$ (comparability).
(C2) $A \geq B$, $B \geq C$ imply $A \geq C$ (transitivity).
(C3) $A \geq \emptyset$ (the impossible event is the least probable one).
(C4) $A \cap (B \cup C) = \emptyset \ \Rightarrow \ (B \geq C \ \Leftrightarrow \ A \cup B \geq A \cup C)$.

Axiom (C4) claims that if $A$ and $B$ are mutually exclusive, and $A$ and $C$ are mutually exclusive, it is equivalent to know whether $B$ is at least as probable as $C$ and to know that 'A or B' is at least as probable as 'A or C'. A numerical knowledge assessment function $g$ is said to agree with $\geq$ if and only if

$$A \geq B \ \Leftrightarrow \ g(A) \geq g(B). \tag{26}$$

Possibility measures generally do not agree with qualitative probabilities. In particular, a possibility measure is inconsistent with axiom (C4) since $\max(\Pi(A), \Pi(B)) \geq \max(\Pi(A), \Pi(C))$ does not imply $\Pi(B) \geq \Pi(C)$. In order to encompass possibility measures, it is necessary to modify (C4) into a weaker version:

$$(A \cap (B \cup C) = \emptyset \ \text{and} \ B \geq C) \ \Rightarrow \ A \cup B \geq A \cup C.$$

This point was worth mentioning and would deserve a careful study. However it is not the main topic of this paper. See Dubois [3], in which axioms which characterize possibility are studied.

3. Transforming a probability measure into a possibility measure

From previous discussions, it follows that the possibility of events can be related to their frequency of occurrence, that possibility and probability measures must satisfy some mutual consistency requirements, and that the mathematics of probability involves a richer structure (additivity) than those of possibility (comparability). Hence, given some histogram obtained after a sufficiently large number of experiments, how could possibility grades be derived? This section, based on [7] and [9], tries to provide some intuitively satisfactory answer in a finite setting.

3.1. Pointwise transformation

Assume that a probability assignment $p = (p_1 \cdots p_n)$ has been obtained from statistical data regarding $\Omega = \{\omega_1, \ldots, \omega_n\}$. A first idea¹ is to build a possibility assignment $\pi = (\pi_1 \cdots \pi_n)$ through a linear transformation of $p$, i.e. $\exists k > 0$,

$$\forall i, \quad \pi_i = k p_i. \tag{27}$$

A possibility measure is then defined by

$$\forall A, \quad \Pi(A) = \max_{\omega_i \in A} \pi_i.$$

The condition $\Pi(\Omega) = 1$ clearly implies $1/k = \max_i p_i$, i.e. $\exists i$, $\pi_i = 1$. Since $\sum_{i=1}^{n} p_i = 1$, it is clear that for elementary events $\pi_i \geq p_i$. However, condition (14) generally does not hold for non-elementary events (see Dubois and Prade [6, p. 259] for a counter-example).
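The failure of (14) for non-elementary events is easy to reproduce. The sketch below uses a hypothetical four-outcome assignment chosen for this illustration (it is not the counter-example of [6]); it applies the rescaling (27), evaluates Zadeh's consistency index (13), and exhibits an event whose probability exceeds its possibility.

```python
# Hypothetical probability assignment (illustrative values only).
p = [0.4, 0.2, 0.2, 0.2]

k = 1.0 / max(p)               # Pi(Omega) = 1 forces 1/k = max_i p_i
pi = [k * p_i for p_i in p]    # pointwise rescaling (27)

# For elementary events the possibility grade dominates the probability.
assert all(pi_i >= p_i for pi_i, p_i in zip(pi, p))

# Zadeh's consistency index (13) between pi and p.
gamma = sum(pi_i * p_i for pi_i, p_i in zip(pi, p))

# Non-elementary event A = {w2, w3, w4}, given by zero-based indices.
A = [1, 2, 3]
prob_A = sum(p[i] for i in A)   # P(A)
poss_A = max(pi[i] for i in A)  # Pi(A), maximum of the grades in A
violates_14 = prob_A > poss_A   # True here: P(A) <= Pi(A) fails
```

Here the three less frequent outcomes together carry more probability than the rescaled grade of any one of them, which is exactly the situation the pointwise transformation cannot handle.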
¹ Actually suggested by Shafer [23] independently.

3.2. A non-pointwise transformation

Since the above pointwise transformation is not consistent with intuition, a more complex approach must be contemplated. In [7] a bijective mapping between probability and possibility assignments is proposed, based on the following intuitive argument.
A genuine random phenomenon is one whose outcome frequencies tend to be uniformly distributed. For instance a fair coin-tossing game should provide in the long run an equal number of heads and tails. If heads outnumber tails after a large number of experiments, then the coin is said to be biased, i.e. the phenomenon is no longer completely random. In other words, there is some necessity in favor of heads. If $\omega_1$ denotes heads and $\omega_2$ tails, this necessity is naturally assessed by

$$N(\{\omega_1\}) = p_1 - p_2, \tag{28}$$

$$N(\{\omega_2\}) = 0. \tag{29}$$

Saying the coin is biased, we are sensitive to the excess of probability on one side. (29) expresses that there is no necessity in favor of tails. Carrying this reasoning to any finite partition of the sure event, the necessity of $A$ can be estimated by comparing the frequency of occurrence of elementary events implying the occurrence of $A$, and the most frequent elementary event implying the occurrence of the opposite event $\bar{A}$. Let us define

$$N(A) = \sum_{\omega_i \in A} \max(p_i - p_A, 0) \tag{30}$$

where $p_A = \max\{p_i \mid \omega_i \notin A\}$. Once more we are sensitive to the total excess of probability favoring $A$ compared to adverse outcomes. The quantity $N(A)$ is pictured in Figure 1.

Figure 1. The quantity $N(A)$ (elementary events $\omega_1, \ldots, \omega_7$ on the horizontal axis)

The following properties hold (see [7,9]):
• Let the $\omega_i$'s be reordered such that $p_1 \geq p_2 \geq \cdots \geq p_n$, and $A_j = \{\omega_1, \ldots, \omega_j\}$ for $j = 1, \ldots, n$; $A_0 = \emptyset$. Then

$$N(A) = \max_{A_j \subseteq A} N(A_j). \tag{31}$$

• If $p_1 > p_j$ for $j > 1$, $N(A) > 0$ if and only if $\omega_1 \in A$.
• $N$ is a necessity measure, i.e. $N(A \cap B) = \min(N(A), N(B))$.
• $N$ is consistent with the probability measure it derives from, that is:

$$\forall A, \quad N(A) \leq P(A) \leq \Pi(A)$$

where $\Pi$ is the possibility measure obtained from $N$ by duality.

It is easy to calculate the possibility assignment underlying $N$:

$$\pi_i = 1 - N(\Omega - \{\omega_i\}),$$

i.e.

$$\pi_i = \sum_{j=1}^{n} p_j - \sum_{j \neq i} \max(p_j - p_i, 0).$$

Hence

$$\pi_i = \sum_{k=1}^{n} \min(p_i, p_k), \quad i = 1, \ldots, n. \tag{32}$$

3.3. Interpretation: the converse mapping

Equation (32) defines a bijective mapping between the set

$$\{p = (p_1 \cdots p_n) \mid \textstyle\sum_i p_i = 1\}$$

of probability assignments, and the set

$$\{\pi = (\pi_1 \cdots \pi_n) \mid \max_i \pi_i = 1\}$$

of possibility assignments. To figure it out, note that, by ordering the $p_i$'s as above:

$$\pi_1 = \sum_{i=1}^{n} p_i = 1,$$

$$\pi_j = j p_j + \sum_{k=j+1}^{n} p_k \quad (j = 1, \ldots, n-1),$$

$$\pi_n = n p_n,$$

so that $\pi$ is the unique solution of the linear system (with $\pi_{n+1} = 0$ and, likewise, $p_{n+1} = 0$):

$$\pi_j - \pi_{j+1} = j (p_j - p_{j+1}), \quad j = 1, \ldots, n. \tag{33}$$
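A minimal sketch of the transformation and of its inversion through the linear system (33) is given below. The function names and the three-outcome assignment are hypothetical choices made for this illustration; `poss_to_prob` assumes its input is already sorted in decreasing order, as in the text.

```python
def prob_to_poss(p):
    """Possibility assignment (32): pi_i = sum over k of min(p_i, p_k)."""
    return [sum(min(p_i, p_k) for p_k in p) for p_i in p]

def necessity(p, event):
    """N(A) from (30); `event` is a set of zero-based indices into p."""
    outside = [p[i] for i in range(len(p)) if i not in event]
    p_a = max(outside, default=0.0)    # most probable adverse outcome
    return sum(max(p[i] - p_a, 0.0) for i in event)

def poss_to_prob(pi):
    """Solve (33) for p, assuming pi is sorted in decreasing order."""
    n = len(pi)
    p = [0.0] * n
    tail = 0.0                          # plays the role of p_{j+1}
    for j in range(n, 0, -1):           # j = n, ..., 1
        nxt = pi[j] if j < n else 0.0   # pi_{j+1}, with pi_{n+1} = 0
        tail = tail + (pi[j - 1] - nxt) / j
        p[j - 1] = tail
    return p

p = [0.5, 0.3, 0.2]        # hypothetical, already sorted decreasingly
pi = prob_to_poss(p)       # approximately [1.0, 0.8, 0.6]
back = poss_to_prob(pi)    # recovers p up to rounding
```

On this assignment the duality between (30) and (32) can be read off directly: the necessity of the two most frequent outcomes together, `necessity(p, {0, 1})`, equals one minus the possibility grade of the remaining outcome.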
Conversely, if the $p_i$'s are the unknowns, (33) defines a unique probability assignment. It is easy to check that

$$\forall j, \quad p_j = \sum_{k=j}^{n} \frac{1}{k} (\pi_k - \pi_{k+1}). \tag{34}$$

(34) defines a probability allocation procedure which is well known in Bayesian statistical inference, as is shown now. In a situation when no knowledge is available, it is often admitted to resort to a uniform probability assignment, although it is illicit, strictly speaking, to model a lack of knowledge by pure randomness (i.e. assuming the coin is unbiased, a priori). A careful modelling of total ignorance lies outside the domain of probability, but is easily carried out in Shafer's setting. Namely, we allocate the total amount of belief to the sure event and nothing to other events, i.e.

$$m(\Omega) = 1; \quad \forall A \subset \Omega, \ m(A) = 0.$$

Since $\Omega$ is the only focal element, it defines a possibility and a necessity measure such that $\forall A \neq \emptyset$, $\Pi(A) = 1$ and $\forall A \neq \Omega$, $N(A) = 0$, and the corresponding possibility assignment is

$$\forall i, \quad \pi_i = \mathrm{Pl}(\{\omega_i\}) = 1.$$

The Bayesian prior, in the presence of ignorance, consists in equally splitting the amount of belief between elementary events. More generally, if the available knowledge appears under the form of a Shaferian basic probability assignment, then a purely probabilistic approximation can be obtained if for each focal element $B$, we equally share the weight of evidence $m(B)$ among the elementary events contained in $B$. Hence we define the probability measure $P$ by (see [9])

$$\forall \omega, \quad p(\omega) = P(\{\omega\}) = \sum_{A \ni \omega} \frac{m(A)}{|A|} \tag{35}$$

where $|A|$ stands for the cardinality of $A$. As proved in Dempster [2], sharing the weights of evidence among the elementary events concerned always produces a probability such that

$$\forall A, \quad \mathrm{Bel}(A) \leq P(A) \leq \mathrm{Pl}(A).$$

Now assume the focal elements are nested, i.e. there is an ordering of the $\omega_i$'s such that the focal elements are all of the form $A_j = \{\omega_1, \ldots, \omega_j\}$ for some $j$. The so-defined possibility measure can be characterized by the possibility assignment $\pi$ such that

$$\forall j, \quad \pi_j = \Pi(\{\omega_j\}) = \sum_{k=j}^{n} m(A_k), \quad j = 1, \ldots, n. \tag{36}$$

The basic probability assignment can thus be expressed in terms of $\pi$ as

$$\forall j = 1, \ldots, n, \quad m(A_j) = \pi_j - \pi_{j+1}, \tag{37}$$

$$m(A) = 0 \quad \text{if } A \neq A_j, \ \forall j.$$

Now it is clear that the converse of the probability → possibility transformation suggested in Section 3.2 is nothing but equation (35), i.e. is related to the use of uniformly distributed Bayesian priors. For instance, starting with a uniform probability, we get the total ignorance function or, if we interpret $\pi$ as a normalized fuzzy set, the characteristic function of the set $\Omega$.

3.4. Random experiments with fuzzy sets

Let $F$ be a fuzzy set [30] on $\Omega$, with membership function $\mu_F$ such that

$$\forall \omega_i, \quad \mu_F(\omega_i) = \pi_i. \tag{38}$$

$F$ is for instance the name of a subjective category, but it may as well be derived from the procedure described in Section 3.2. We are now concerned with the random picking of some $\omega$ which would be a prototype of $F$. Kaufmann [15] and Yager [29] have suggested a method based on the concept of a level-cut. Let $\alpha \in (0, 1]$. The $\alpha$-level cut of $F$ is the crisp set $F_\alpha$ which contains all elements with membership grade at least equal to $\alpha$. The method proceeds as follows:
Step 1. Pick some level cut $F_\alpha$ at random.
Step 2. Pick some $\omega \in F_\alpha$ at random.
In the finite case, the $\alpha$-cuts of $F$ are among the already introduced sets $A_j$, such that

$$A_j = F_\alpha \quad \text{for } \alpha \in (\pi_{j+1}, \pi_j], \quad j = 1, \ldots, n,$$
with $\pi_{n+1} = 0$ and $\pi_j > \pi_{j+1}$. The $\alpha$-cuts of $F$ are in other words the focal elements of the associated possibility measure. The probability of picking the focal element $A_k$ as an $\alpha$-cut of $F$ is $\Pr(A_k) = \pi_k - \pi_{k+1}$. The probability of picking element $\omega_j$ in $A_k$ is

$$\Pr(\omega_j \mid A_k) = \begin{cases} \dfrac{1}{|A_k|} = \dfrac{1}{k} & \text{if } j \leq k, \\ 0 & \text{otherwise } (\omega_j \notin A_k \text{ if } j > k). \end{cases}$$

On the whole the probability of picking element $\omega_j$ is

$$\Pr(\omega_j) = \sum_{k=1}^{n} \Pr(\omega_j \mid A_k) \Pr(A_k) = \sum_{k=j}^{n} \frac{\pi_k - \pi_{k+1}}{k} = p_j.$$

Hence the possibility-probability transformation suggested in this section preserves the probabilities of elementary events: it is equivalent to perform a random experiment with the probability assignment $p$ and with the fuzzy set built from $p$ by equation (32). This feature justifies the approach.

3.5. Why transform a probability measure into a possibility measure?

While studying a system, it is often easier to write down equations which describe it than to perform the set of computations which would solve these equations. This is especially true with probabilistic models, many of which cannot be applied due to their complexity. Consider for instance the simple shortest path problem on a graph. As soon as we allow distances between vertices to be random, we are faced with many difficulties, among which are the intricate dependency of paths and the necessity of performing repeated convolutions of random variables. In order to get a flavor of the complexity of the problem, refer to Sigal et al. [25]. On the other hand, the fuzzy version of the shortest path problem is much easier to solve. It requires the ability to add and to calculate the maximum or the minimum of fuzzy numbers. Very simple methods for this purpose are described by the authors in [5,6]. It is shown that the arithmetic of fuzzy numbers straightforwardly generalizes interval analysis (Moore [19]) to fuzzy intervals. Addition of fuzzy numbers is remarkably shape-invariant, contrary to random variable convolution: adding triangular distributions yields triangles, adding parabolic distributions yields parabolas, etc. (see [5,6]). Consequently usual shortest path algorithms can be readily adapted to the fuzzy case (see Dubois and Prade [4], Chanas and Kamburowski [1] for details). Using the probability-possibility transformation described above, one can, for instance, turn a stochastic shortest route problem into a fuzzy shortest route problem in a consistent manner, and get an easier problem to solve. Of course the gain in computability is balanced by a loss in precision, since fuzzy models only provide best and worst case analysis and do not assume that errors compensate.
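The shape invariance of fuzzy addition can be illustrated with triangular fuzzy numbers. The sketch below is not the authors' implementation from [5,6]: the `Triangular` class and the arc lengths are hypothetical, and it relies only on the standard fact that sup-min addition of triangular numbers adds their three parameters, which coincides level by level with interval addition of the cuts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triangular:
    """Triangular fuzzy number with support [low, high] and peak at mode."""
    low: float
    mode: float
    high: float

    def cut(self, alpha):
        """Alpha-level cut as a closed interval (lo, hi), for 0 < alpha <= 1."""
        return (self.low + alpha * (self.mode - self.low),
                self.high - alpha * (self.high - self.mode))

    def __add__(self, other):
        # Sup-min addition of triangular numbers adds the three parameters,
        # so the sum is again triangular.
        return Triangular(self.low + other.low,
                          self.mode + other.mode,
                          self.high + other.high)

# Hypothetical fuzzy arc lengths along one path of a graph.
arcs = [Triangular(2, 3, 5), Triangular(1, 4, 6), Triangular(0, 1, 2)]
path_length = arcs[0] + arcs[1] + arcs[2]

# Level by level, the result agrees with interval addition of the cuts.
for alpha in (0.25, 0.5, 1.0):
    lo = sum(t.cut(alpha)[0] for t in arcs)
    hi = sum(t.cut(alpha)[1] for t in arcs)
    assert path_length.cut(alpha) == (lo, hi)
```

The support of `path_length` is the interval of best-case and worst-case path lengths, which reflects the remark above that fuzzy models do not assume that errors compensate.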
4. Random experiments with imprecise outcomes
In the preceding section, we considered probability distributions obtained from regular random experiments; in order to derive a possibility measure out of it, we had to transform it by some suitable method. In this section we consider random experiments which asymptotically provide a possibility measure directly.
4.1. Example In order to estimate the meaning of the subjective category 'tall' in the sense of a group of people, it is natural to gather some data by asking each person for his or her own opinion about human tallness. There is a common agreement among people, regarding the fact that 'tall' means 'greater than some threshold size s'. The size scale can be the interval S = [0.5, 2.5] meters. S is discretized into n disjoint adjacent intervals S~, i = 1 . . . . . n.
Each person is assumed to provide some value of s. Hence a histogram can be built, and if the number of answers is large enough, probabilities $\{p(S_i),\ i = 1, \ldots, n\}$ are obtained (see Figure 2). Now, it is clear that any answer 's' means 'tall' = [s, 2.5]; in other words the probabilities are assigned to intervals of the form $[s_i, 2.5]$ with $s_i = \inf S_i$. The mapping $\Gamma$ defined by

$$\forall s, \quad \Gamma(s) = [s_i, 2.5] \quad \text{if } s \in S_i$$

is clearly the multimapping considered by Dempster [2] and recalled in Section 2.4. A basic assignment is naturally defined on S such that

$$m([s_i, 2.5]) = P(S_i)$$

with nested focal elements. Hence what is obtained from the statistical data is a possibility measure. The underlying fuzzy set is defined by

$$\forall s, \quad \mu_{\text{tall}}(s) = \mathrm{Pl}(\{s\}) = \sum_{s_i \le s} P(S_i).$$

In other words, the membership function of 'tall' is the distribution function of the probability measure derived from statistical data (see Figure 2). This way of defining a membership function by a distribution function was first independently suggested by MacVicar-Whelan [18]. Of course letting n go to infinity would yield a consonant random set and a continuous membership function could be obtained.
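The construction of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tall_membership`, the choice of equal-width intervals and the sample answers are hypothetical and do not come from the paper. The sketch builds the histogram of the thresholds given by the respondents and accumulates it, so that the membership grade on each interval is the sum of the probabilities of all intervals whose lower bound does not exceed it.

```python
from itertools import accumulate


def tall_membership(thresholds, lo=0.5, hi=2.5, n=4):
    """Estimate the membership function of 'tall' from threshold answers.

    thresholds: threshold sizes s (in meters), one per respondent.
    The scale [lo, hi] is cut into n equal-width intervals (an assumption;
    the text only requires disjoint adjacent intervals).
    Returns (lower_bounds, membership), one membership grade per interval.
    """
    width = (hi - lo) / n
    counts = [0] * n
    for s in thresholds:
        if not lo <= s <= hi:
            raise ValueError(f"threshold {s} lies outside the size scale")
        # Index of the interval containing s; the top endpoint goes to the last one.
        i = min(int((s - lo) / width), n - 1)
        counts[i] += 1
    total = len(thresholds)
    probs = [c / total for c in counts]
    # Cumulative sum of the histogram: the distribution function,
    # read here as the membership function of 'tall'.
    membership = list(accumulate(probs))
    lower_bounds = [lo + i * width for i in range(n)]
    return lower_bounds, membership


# Hypothetical answers from four respondents.
bounds, mu_tall = tall_membership([1.2, 1.6, 1.7, 2.1])
for b, mu in zip(bounds, mu_tall):
    print(f"sizes from {b:.1f} m: membership {mu:.2f}")
```

The resulting grades are non-decreasing and reach 1 on the last occupied interval, which reflects the nestedness of the focal elements $[s_i, 2.5]$.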
[Figure 2. Histogram of the probability measure derived from statistical data. Horizontal axis: size intervals $S_i$ from 0.5 to 2.5; vertical axis: frequency; the curve $\mu_{\text{tall}}$ is superimposed.]
4.2. General case
A fuzzy set cannot always be viewed as a distribution function as in the preceding example. However the statistical method can be used for the estimation of any fuzzy set F describing a concept defined on some numerical scale S. Each person is then asked to provide a crisp representative of F under the form of some interval [a, b]. The same situation is met when estimating the value of some observable, in physics, by means of repeated measurements yielding error intervals. In order to be able to derive a possibility measure out of imprecise outcomes of random experiments, we must require, strictly speaking, the set of obtained intervals $\{[a_k, b_k] \mid k = 1, q\}$ to be nested. However this condition is clearly too strong in practice. A more realistic but weaker substitute is that all intervals somehow overlap. This consistency condition is very common for interval analysis methods in experimental sciences. We then assume

$$\exists\, [a, b] = \bigcap_{i=1,q} [a_i, b_i] \neq \emptyset. \tag{39}$$

Let

$$[A, B] = \bigcup_{i=1,q} [a_i, b_i].$$

The gathering of the data can be done in the following way. First we define a set of nested intervals $\{I_i,\ i = 1, \ldots, n\}$ such that

$$[a, b] \subseteq I_1 \subset I_2 \subset \cdots \subset I_n = [A, B]. \tag{40}$$

The $I_i$'s play the same role as the disjoint adjacent intervals used for building histograms in the case of classical random experiments. Each outcome $[a_k, b_k]$ is assigned in a unique way to the smallest interval which contains it. We then make a count of how many outcomes are assigned to each $I_i$, and if q is large, come up with values $m(I_i)$ which can be assimilated to probabilities; m is exactly a basic probability assignment and defines a possibility measure. The corresponding fuzzy set is defined by (see Figure 3)

$$\mu_F(x) = \mathrm{Pl}(\{x\}) = \sum_{i \,:\, x \in I_i} m(I_i) = \Pi(\{x\}),$$

i.e.

$$\mu_F(x) = \begin{cases} 0 & \text{if } x \notin I_n, \\ \sum_{j=i}^{n} m(I_j) & \text{if } x \in I_i \setminus I_{i-1},\ i \ge 2, \\ 1 & \text{if } x \in I_1, \end{cases}$$

where $I_i \setminus I_{i-1}$ is the set theoretic difference of $I_i$ and $I_{i-1}$. Of course a continuous membership function could be obtained by letting n (the number of intervals $I_i$) and q (the number of experiments) go to infinity; see Dubois and Prade [10] for a study of the continuous case.
[Figure 3. Values of $m(I_i)$ and the corresponding fuzzy set]
$\mu_F$ represents the meaning of the concept F in the sense of a group of people, or in the experimental sciences context, the fuzzy set of possible locations of the values of some observable. It enables us to numerically assess the possibility or the necessity for this value to lie in some prescribed area A by

$$\Pi(A) = \sup_{x \in A} \mu_F(x), \qquad N(A) = \inf_{x \notin A} \bigl(1 - \mu_F(x)\bigr).$$

Here, physical possibility (measurement of observables) and epistemic possibility (aggregation of subjective opinions) have the same mathematical model. See Wang [28] for recent work on imprecise statistics.
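The counting procedure of Section 4.2 can be illustrated by the following Python sketch. It is not the authors' implementation: the function names, the representation of intervals as `(lo, hi)` pairs and the sample data are assumptions made for the illustration. The possibility and necessity of an interval-valued event are computed here through sums over focal elements (those meeting the event, and those included in it); for nested closed intervals these sums coincide with the sup and inf expressions given in the text.

```python
def is_consistent(outcomes):
    """Consistency condition: all observed intervals share a common point."""
    return max(a for a, _ in outcomes) <= min(b for _, b in outcomes)


def basic_assignment(outcomes, gauges):
    """Assign each outcome to the smallest gauge containing it.

    outcomes: list of observed intervals (a_k, b_k).
    gauges:   nested intervals I_1, ..., I_n as (lo, hi), smallest first.
    Returns the relative frequencies m(I_1), ..., m(I_n).
    """
    counts = [0] * len(gauges)
    for a, b in outcomes:
        for i, (lo, hi) in enumerate(gauges):
            if lo <= a and b <= hi:
                counts[i] += 1
                break
        else:
            raise ValueError(f"outcome {(a, b)} is not inside the largest gauge")
    return [c / len(outcomes) for c in counts]


def membership(x, gauges, m):
    """mu_F(x): total mass of the gauges that contain x."""
    return sum(mass for (lo, hi), mass in zip(gauges, m) if lo <= x <= hi)


def possibility(event, gauges, m):
    """Pi(A) for an interval event A: mass of the gauges that meet A."""
    a, b = event
    return sum(mass for (lo, hi), mass in zip(gauges, m) if lo <= b and a <= hi)


def necessity(event, gauges, m):
    """N(A) for an interval event A: mass of the gauges included in A."""
    a, b = event
    return sum(mass for (lo, hi), mass in zip(gauges, m) if a <= lo and hi <= b)


# Hypothetical error intervals from four measurements of one observable.
outcomes = [(1.95, 2.05), (1.9, 2.1), (1.85, 2.1), (1.6, 2.3)]
# Hypothetical nested gauges; the largest one equals the union of the outcomes.
gauges = [(1.9, 2.1), (1.8, 2.2), (1.6, 2.3)]

if is_consistent(outcomes):
    m = basic_assignment(outcomes, gauges)
    print("m =", m)
    print("mu_F(1.85) =", membership(1.85, gauges, m))
    print("Pi([2.15, 2.4]) =", possibility((2.15, 2.4), gauges, m))
    print("N([1.8, 2.2]) =", necessity((1.8, 2.2), gauges, m))
```

The choice of the gauges is left to the analyst; the only constraints used here are that they are nested, that the smallest contains the common intersection of the outcomes, and that the largest covers all of them.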
4.3. Discussion
The procedure outlined above helps to point out the respective specificity of possibility and probability theory, from the point of view of a statistician. A probability measure is a natural tool for modelling a series of independent random experiments with precisely located outcomes. Note that the precision of the outcomes implies their inconsistency (except for the trivial case of a deterministic phenomenon); in particular, the traditional methodology of experimental scientists, who reduce an error interval by intersecting independent measurements, does not apply. On the other hand, experiments with imprecise but consistent outcomes naturally fit the setting of possibility theory. Of course in the case of random experiments producing imprecise and inconsistent data, it is still possible to use a nested set of intervals to process these data. However, the so-obtained fuzzy set tends to be meaningless: the consistency condition (39) prevents very narrow error intervals from being classified into very wide gauges $I_i$; this unfortunate feature could appear when (39) does not hold. In the case of inconsistency, the best thing is to make clusters of consistent data, and to build a basic probability assignment on these clusters. However, something more general than a possibility measure, namely a plausibility measure in the sense of Shafer, is then obtained.
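To illustrate why inconsistent data lead to a plausibility measure rather than a possibility measure, the following Python sketch evaluates Shafer's plausibility and belief functions on a basic probability assignment whose focal intervals are not nested. The two focal intervals and their masses are hypothetical, as are the function names; how clusters should be formed from raw data is not specified in the text and is not addressed here.

```python
def plausibility(event, assignment):
    """Pl(A): total mass of the focal intervals that meet the interval A."""
    a, b = event
    return sum(m for (lo, hi), m in assignment if lo <= b and a <= hi)


def belief(event, assignment):
    """Bel(A): total mass of the focal intervals included in the interval A."""
    a, b = event
    return sum(m for (lo, hi), m in assignment if a <= lo and hi <= b)


def is_nested(assignment):
    """True when the focal intervals form a chain under inclusion."""
    # In a chain, sorting by width orders the intervals by inclusion.
    focals = sorted((hi - lo, lo, hi) for (lo, hi), _ in assignment)
    return all(
        lo2 <= lo1 and hi1 <= hi2
        for (_, lo1, hi1), (_, lo2, hi2) in zip(focals, focals[1:])
    )


# Hypothetical assignment built on two disjoint clusters of consistent data.
clusters = [((1.0, 1.2), 0.75), ((1.5, 1.8), 0.25)]

print("nested focal elements:", is_nested(clusters))
print("Pl at x = 1.1:", plausibility((1.1, 1.1), clusters))
print("Pl at x = 1.6:", plausibility((1.6, 1.6), clusters))
print("Pl([1.0, 1.8]):", plausibility((1.0, 1.8), clusters))
```

In this example no single point has plausibility 1, while the interval [1.0, 1.8] does; the plausibility of an event is therefore not the supremum of the point plausibilities, which is the property that characterizes a possibility measure.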
5. Conclusion
This paper investigated several issues related to fuzzy set, possibility and probability theories, in a rather informal style. It has been stressed that the concept of possibility, either physical or epistemic, which provides a natural interpretation of a fuzzy set, requires axioms different from those of probability, in order to be mathematically modelled. Probabilistic information, with its additive structure, is richer than possibilistic information, ordinal in nature. It has been indicated how statistical data may be helpful for the measurement of membership functions, viewed as possibility assignments. First, a constructive definition of a possibilistic substitute for a probability measure has been suggested. This may be useful when the fuzzy counterpart of a probabilistic model is easier to deal with. Lastly, it has been shown that a possibility measure is a natural model for a random phenomenon producing imprecise but consistent outcomes. In this respect, the fact that fuzzy arithmetics [5,6] generalize interval analysis [19] as performed by physicists supports the proposed possibility-based method for imprecise data processing. It is hoped that the various issues discussed in this paper may help people figure out the meaning and potentialities of fuzzy set and possibility theory.
References
[1] Chanas, S., and Kamburovski, J., "The use of fuzzy variables in PERT", Fuzzy Sets and Systems 5 (1981) 11-20.
[2] Dempster, A.P., "Upper and lower probabilities induced by a multivalued mapping", Annals of Mathematical Statistics 38 (1967) 325-339.
[3] Dubois, D., "Steps to a theory of qualitative possibility", Proceedings of the 6th International Congress of Cybernetics and Systems, Paris, September 10-14, 1984.
[4] Dubois, D., and Prade, H., "Algorithmes de plus courts chemins pour traiter des données floues", RAIRO, Operations Research Series 12 (1978) 213-227.
[5] Dubois, D., and Prade, H., "Operations on fuzzy numbers", International Journal on Systems Sciences 9 (1978) 613-626.
[6] Dubois, D., and Prade, H., Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[7] Dubois, D., and Prade, H., "Unfair coins and necessity measures, towards a possibilistic interpretation of histograms", Fuzzy Sets and Systems 10 (1983) 15-20.
[8] Dubois, D., and Prade, H., "A class of fuzzy measures based on triangular norms", International Journal on General Systems 8 (1982) 43-61.
[9] Dubois, D., and Prade, H., "On several representations of an uncertain body of evidence", in: M.M. Gupta, E. Sanchez (eds.), Fuzzy Information and Decision Processes, North-Holland, Amsterdam, 1982, 167-181.
[10] Dubois, D., and Prade, H., "Upper and lower possibilistic expectations and some applications", 4th Int. Seminar on Fuzzy Set Theory, J. Kepler Universität, Linz, Austria, 1982.
[11] Fine, T., Theories of Probability, Academic Press, New York, 1973.
[12] Goodman, I.R., "Fuzzy sets as equivalence classes of random sets", in: R.R. Yager (ed.), Fuzzy Set and Possibility Theory, Pergamon Press, Oxford, 1982, 327-343.
[13] Höhle, U., "A mathematical theory of uncertainty", in: R.R. Yager (ed.), Fuzzy Set and Possibility Theory: Recent Developments, Pergamon Press, Oxford, 1982, 344-355.
[14] Hughes, G.E., and Cresswell, M.J., An Introduction to Modal Logic, Methuen, London, 1972.
[15] Kaufmann, A., "La simulation des ensembles flous", CNRS Round Table on Fuzzy Sets, Lyon, June 1980.
[16] Kendall, D.G., "Foundations of a theory of random sets", in: E.F. Harding and D.G. Kendall (eds.), Stochastic Geometry, John Wiley and Sons, New York, 1974, 322-376.
[17] Matheron, G., Random Sets and Integral Geometry, John Wiley and Sons, New York, 1975.
[18] MacVicar-Whelan, P.J., "Fuzzy logic: An alternative approach", Proceedings of the 9th IEEE International Symposium on Multiple-Valued Logic, Bath, 1979.
[19] Moore, R., Interval Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1966.
[20] Prade, H., "Modal semantics and fuzzy set theory", in: R.R. Yager (ed.), Fuzzy Set and Possibility Theory: Recent Developments, Pergamon Press, Oxford, 1982, 232-246.
[21] Savage, L.J., The Foundations of Statistics, Dover, New York, 1972.
[22] Shackle, G.L.S., Decision, Order and Time in Human Affairs, Cambridge University Press, 1961.
[23] Shafer, G., A Mathematical Theory of Evidence, Princeton University Press, N.J., 1976.
[24] Shafer, G., "Non-additive probabilities in the work of Bernoulli and Lambert", Archive for History of Exact Sciences 19 (1978) 309-370.
[25] Sigal, G.E., Pritsker, A.A.B., and Solberg, J., "The stochastic route problem", Operations Research 28 (1980) 1122-1129.
[26] Sugeno, M., Theory of Fuzzy Integrals and its Applications, Ph.D. thesis, Tokyo Institute of Technology, 1974.
[27] Wang, P.Z., and Sanchez, E., "Treating a fuzzy subset as a projectable random subset", in: M.M. Gupta, E. Sanchez (eds.), Fuzzy Information and Decision Processes, North-Holland, Amsterdam, 1982, 213-220.
[28] Wang, P.Z., "From the fuzzy statistics to the falling random subsets", in: P.P. Wang (ed.), Advances in Fuzzy Sets, Possibility Theory and Applications, Plenum Press, New York, 1983, 81-96.
[29] Yager, R.R., "Level sets for membership evaluation of fuzzy subsets", in: R.R. Yager (ed.), Fuzzy Set and Possibility Theory: Recent Developments, Pergamon Press, Oxford, 1982, 90-97.
[30] Zadeh, L.A., "Fuzzy sets", Information and Control 8 (1965) 338-353.
[31] Zadeh, L.A., "Fuzzy sets as a basis for a theory of possibility", Fuzzy Sets and Systems 1 (1978) 3-28.
[32] Zadeh, L.A., "PRUF - A meaning representation language for natural languages", International Journal of Man-Machine Studies 10 (1978) 395-460.