Direct algorithms for checking consistency and making inferences from conditional probability assessments

Journal of Statistical Planning and Inference 126 (2004) 119–151
www.elsevier.com/locate/jspi

Peter Walley (a), Renato Pelessoni (b), Paolo Vicig (b)

(a) Departamento de Ciencias de la Computación e Inteligencia Artificial, Universidad de Granada, Granada, Spain
(b) Dipartimento di Matematica Applicata 'B. de Finetti', Università di Trieste, Trieste, Italy

Received 26 June 2001; received in revised form 7 February 2003; accepted 4 September 2003

Abstract

We solve two fundamental problems of probabilistic reasoning: given finitely many conditional probability assessments, how to determine whether the assessments are mutually consistent, and how to determine what they imply about the conditional probabilities of other events? These problems were posed in 1854 by George Boole, who gave a partial solution using algebraic methods. The two problems are fundamental in applications of the Bayesian theory of probability; Bruno de Finetti solved the second problem for the special case of unconditional probability assessments in what he called 'the fundamental theorem of probability'. We give examples to show that previous attempts to solve the two problems, using probabilistic logic and similar methods, can produce incorrect answers. Using ideas from the theory of imprecise probability, we show that the general problems have simple, direct solutions which can be implemented using linear programming algorithms. Unlike earlier proposals, our methods are formulated directly in terms of the assessments, without introducing unknown probabilities. Our methods work when any of the conditioning events may have probability zero, and they work when the assessments include imprecise (upper and lower) probabilities or previsions. The main methodological contribution of the paper is to provide general algorithms for making inferences from any finite collection of (possibly imprecise) conditional probabilities.
(c) 2003 Elsevier B.V. All rights reserved.

MSC: primary 62A01; secondary 68T37; 60A05

Keywords: Avoiding uniform loss; Bayesian inference; Coherent probabilities; Fundamental theorem of probability; Imprecise probability; Lower probability; Natural extension; Probabilistic logic; Probabilistic reasoning

E-mail addresses: [email protected] (P. Walley), [email protected] (R. Pelessoni), [email protected] (P. Vicig).

doi:10.1016/j.jspi.2003.09.005


1. Introduction

1.1. The fundamental problems of probabilistic reasoning

This paper is concerned with solving two fundamental problems of probabilistic reasoning:

The consistency problem. Given any finite collection of conditional probability assessments, how can we determine whether the assessments are mutually consistent?

The inference problem. How can we determine what the assessments imply about other conditional probabilities?

In the simplest version of the problems, conditional probabilities $P(A_i|B_i)$ are specified for $i = 1, 2, \ldots, m$, where $A_i$ and $B_i$ are any events such that $B_i$ is possible. The two problems are to check whether these conditional probability assessments are mutually consistent, and to use them to make inferences about a further conditional probability $P(A|B)$.

The consistency and inference problems are fundamental problems in the Bayesian theory of probability and statistical inference (de Finetti, 1974) and in the theory of imprecise probability (Walley, 1991). In this paper we describe general algorithms that can be used to solve the two problems. The computations involve only linear programming. In almost all applications, each problem can be solved through one or two linear programs. Our solutions are based on concepts of avoiding uniform loss and natural extension from the theory of imprecise probability.

The algorithms we propose for solving the consistency and inference problems are direct: they work by investigating certain kinds of linear combinations of the assessments. The algorithms that have been studied previously, since Boole (1854a), are indirect: they work by finding precise probability measures that extend the assessments to other events. Although direct and indirect solutions are related through duality, the direct formulation is, in general, conceptually simpler than the indirect one. Two further properties that distinguish our algorithms from most of the previous proposals (exceptions are noted in Section 1.4) are that they work when any of the conditioning events may have probability zero, and they work when any of the conditional probability assessments are imprecise.

The most frequently used methods for checking consistency or making inferences are based on probabilistic logic and similar approaches (Hailperin, 1986; Nilsson, 1986), and they can be implemented by solving just a single linear program. For these methods to give reliable answers, it is sufficient that the probabilities of all the conditioning events are bounded away from zero. But these methods may give incorrect answers when this sufficient condition fails, which happens frequently in practical applications. For example, an assessment of a conditional probability $P(A|B)$ is, by itself, completely uninformative about the probability of B, and it is consistent with $P(B) = 0$. Examples of the problems created by zero probabilities will be given in the following sections, especially Examples 1 and 4.


It is also common in applications that some probabilities are assessed imprecisely, e.g., as upper and lower bounds rather than as precise numbers. That may be because there is insufficient information or insufficient time to make a precise assessment, or the information is vague, inconsistent or difficult to evaluate, or the assessor wants to be cautious. Therefore, methods to deal with imprecise probabilities are needed. This issue is discussed in Section 1.3. The main objectives of this paper are to give a unified treatment of the consistency and inference problems, in which probability assessments may be either precise or imprecise and either conditional or unconditional, and to find direct algorithms for solving the two problems.

1.2. Numerical examples

Here we introduce two numerical examples that will be used throughout the paper to illustrate our algorithms.

Example 1 (Hansen and Jaumard, 1996, Ex. 6). Assume that $A_1, \ldots, A_5$ are logically independent events, i.e., all events of the form $\bigcap_{i=1}^{5} D_i$ are possible, where each $D_i$ is either $A_i$ or its complementary event $A_i^c$. Suppose that the following precise, unconditional probability assessments are made: $P(A_1) = 0.6$, $P(A_1^c \cup A_2) = 0.4$, $P(A_2 \cup A_3) = 0.8$, $P(A_3 \cap A_4) = 0.3$, $P(A_4^c \cup A_5) = 0.5$, and $P(A_2 \cup A_5) = 0.6$. In this case, the consistency problem is to determine whether the six assessments are coherent, and the inference problem is to determine what they imply about the probabilities of other events, for instance $P(A_3)$, $P(A_4|A_3)$, or $P(A_1 \cap A_2|B)$ where $B = (A_1 \cap A_2) \cup (A_1^c \cap A_2^c)$.

The six assessments are consistent with many events having probability zero. For example, the first two assessments imply that $P(A_1 \cap A_2) = 0$. If we knew that $P(B)$ was non-zero then we could infer that $P(A_1 \cap A_2|B) = 0$. However, it turns out that the assessments are also consistent with $P(B) = 0$, and it is not immediately clear what we can say about $P(A_1 \cap A_2|B)$. Because the conditioning event B may have probability zero, the standard linear programming methods of probabilistic logic cannot be relied upon to compute bounds for $P(A_1 \cap A_2|B)$. In fact, as the later Example 11 shows, the standard algorithms produce incorrect bounds in this case. This is not an isolated example. There are many other events, such as $A_3 \cap A_5$, $A_4 \cap A_5$, $(A_1 \cup A_4)^c$ and $A_2 \cap (A_1 \cup A_3 \cup A_5)$, which may have probability zero here. If C is such an event and we want to make inferences about a conditional probability $P(A|C)$, we cannot in general use the standard linear programming algorithms. There can be similar difficulties in checking consistency, if further probability assessments are made that are conditional on C.

Example 2. Suppose that three football teams, X, Y and Z, play a tournament involving three matches, X vs. Y, X vs. Z, and Y vs. Z. Each team is assigned two points for a win, one point for a draw, and zero for a loss. The tournament is won by the team which obtains the most points from its two matches. If two or more teams finish equal first on points then further rules are applied to choose a winner from amongst them: the team which scored most goals wins the tournament, and if this still fails to determine a unique winner then the winner is chosen by randomization.


Table 1
Precise probability assessments P(Ci) for the 27 events Ci in the football example

(L,L,L) 0.015   (L,L,D) 0.02    (L,L,W) 0.04
(L,D,L) 0.02    (L,D,D) 0.045   (L,D,W) 0.04
(L,W,L) 0.04    (L,W,D) 0.055   (L,W,W) 0.06
(D,L,L) 0.015   (D,L,D) 0.02    (D,L,W) 0.04
(D,D,L) 0.02    (D,D,D) 0.035   (D,D,W) 0.05
(D,W,L) 0.04    (D,W,D) 0.05    (D,W,W) 0.07
(W,L,L) 0.01    (W,L,D) 0.02    (W,L,W) 0.04
(W,D,L) 0.02    (W,D,D) 0.04    (W,D,W) 0.055
(W,W,L) 0.04    (W,W,D) 0.04    (W,W,W) 0.06

A subject makes precise probability assessments for the 27 events of the form $C_i = (R(X,Y), R(X,Z), R(Y,Z))$, where $R(U,V)$ represents the result, W (win), D (draw) or L (loss), for team U against team V. For instance, (W,W,L) means that X wins against Y and against Z, and Y loses against Z. Suppose that the probability assessments are those given in Table 1, based on the subject's belief that team Z is weaker than teams X and Y, which are about equally strong. In this case, because the 27 events $C_i$ form a partition, it is easily verified that the probability assessments are mutually consistent, by checking only that the numbers are non-negative and sum to one.

The subject wants to evaluate the probability of the event, A, that team X wins the tournament. This is an example of an inference problem. Here, the best possible lower and upper bounds for $P(A)$ can be found simply by summing the probabilities of all events $C_i$ that imply A to get the lower bound 0.325, and summing the probabilities of all events $C_i$ that are consistent with A to get the upper bound 0.53.
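These bounds can be checked mechanically. The following Python sketch (our illustration, not code from the paper) encodes the scoring rules of Example 2 and recovers the bounds from Table 1; it assumes, as in the example, that X certainly wins the tournament exactly when it has strictly more points than both opponents, while a tie on points leaves the winner undetermined (goals or randomization), so such outcomes count only toward the upper bound.

    # P(Ci) from Table 1, keyed by (R(X,Y), R(X,Z), R(Y,Z)).
    prob = {
        ('L','L','L'): .015, ('L','L','D'): .02,  ('L','L','W'): .04,
        ('L','D','L'): .02,  ('L','D','D'): .045, ('L','D','W'): .04,
        ('L','W','L'): .04,  ('L','W','D'): .055, ('L','W','W'): .06,
        ('D','L','L'): .015, ('D','L','D'): .02,  ('D','L','W'): .04,
        ('D','D','L'): .02,  ('D','D','D'): .035, ('D','D','W'): .05,
        ('D','W','L'): .04,  ('D','W','D'): .05,  ('D','W','W'): .07,
        ('W','L','L'): .01,  ('W','L','D'): .02,  ('W','L','W'): .04,
        ('W','D','L'): .02,  ('W','D','D'): .04,  ('W','D','W'): .055,
        ('W','W','L'): .04,  ('W','W','D'): .04,  ('W','W','W'): .06,
    }
    pts  = {'W': 2, 'D': 1, 'L': 0}
    flip = {'W': 'L', 'D': 'D', 'L': 'W'}      # result seen from the opponent's side

    lower = upper = 0.0
    for (xy, xz, yz), p in prob.items():
        x = pts[xy] + pts[xz]                  # X's points from its two matches
        y = pts[flip[xy]] + pts[yz]            # Y's points
        z = pts[flip[xz]] + pts[flip[yz]]      # Z's points
        if x > max(y, z):                      # Ci implies A: X alone on top
            lower += p; upper += p
        elif x == max(y, z):                   # Ci consistent with A: tie on points
            upper += p
    print(round(lower, 3), round(upper, 3))    # 0.325 0.53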


1.3. More general versions of the problems

In general, the conditional probability assessments cannot be expected to determine a precise value for a further conditional probability $P(A|B)$, but only to determine upper and lower bounds for $P(A|B)$. That was just seen in the football example, where we obtained only upper and lower bounds for $P(A)$. The upper and lower bounds are called upper and lower probabilities. Because inferences usually need to be expressed in terms of upper and lower probabilities, it is natural to generalize the problem by allowing the initial assessments to be also in the form of upper and lower probabilities. In fact, this generalization greatly extends the scope of the problem. There are many practical problems in which it is difficult or unreasonable to make precise probability assessments, because either there is little available information on which to base the assessments or the information is difficult to evaluate (Walley, 1991, 1996). In these problems, we may be able to assess only upper and lower probabilities. For example, the precise probability assessments given in Table 1 are unrealistic, as even a subject who knows the teams well would find it difficult to assess precise probabilities for all 27 possible outcomes.

Because an assessment of an upper probability $\overline{P}(A|B)$ is equivalent to the assessment $\underline{P}(A^c|B) = 1 - \overline{P}(A|B)$ of a lower probability, we can assume that all the quantities assessed are lower probabilities. When the assessments of upper and lower probabilities coincide, the common value $\underline{P}(A|B) = \overline{P}(A|B)$ is called a precise probability and it is written as $P(A|B)$. In the general formulation of the problem, we suppose that finitely many conditional lower probabilities $\underline{P}(A_i|B_i) = c_i$ are specified, for $i = 1, 2, \ldots, k$. We do not assume any structure for the collection of conditional events $\{A_1|B_1, A_2|B_2, \ldots, A_k|B_k\}$, except that the occurrence of $B_i$ is logically possible for each $i = 1, 2, \ldots, k$. A user is free to make whatever probability assessments are most natural or convenient.

All the relevant events, $B_i$ and $A_i \cap B_i$ ($i = 1, 2, \ldots, k$), can be identified with subsets of a possibility space $\Omega$, where each set $B_i$ is non-empty. If these events are not defined as subsets of a given possibility space, but instead the logical relationships amongst the events are specified, then we can formulate an appropriate possibility space, $\Omega$, whose atoms are all events of the form $\bigcap_{i=1}^{k} D_i$ that are logically possible, where each $D_i$ is $A_i \cap B_i$, $A_i^c \cap B_i$ or $B_i^c$. For instance, the six unconditional assessments in Example 1 generate a space $\Omega$ which contains 20 atomic events.

This general formulation allows assessments of precise probabilities, upper probabilities and unconditional probabilities. As previously noted, an assessment of an upper probability $\overline{P}(A|B) = c$ can be replaced by an equivalent assessment of a lower probability, $\underline{P}(A^c|B) = 1 - c$. A precise probability assessment $P(A|B) = c$, which is equivalent to specifying equal upper and lower probabilities $\underline{P}(A|B) = \overline{P}(A|B) = c$, can therefore be replaced by the two lower probability assessments $\underline{P}(A|B) = c$ and $\underline{P}(A^c|B) = 1 - c$. An unconditional lower probability assessment $\underline{P}(A) = c$ is equivalent to $\underline{P}(A|\Omega) = c$, i.e., equivalent to conditioning on the certain event $\Omega$, and similarly for precise assessments of unconditional probability. Our formulation of the problem could be generalized further, to allow assessments of conditional upper and lower previsions (expectations) of random variables; see Section 4(a).

Example 3. In the football example, the subject wishes to evaluate the probability of the event, A, that team X wins the tournament. If he were able to assess the 27 values $P(A|C_i)$ precisely, then $P(A)$ would be precisely determined through the conglomerative property $P(A) = \sum_{i=1}^{27} P(A|C_i)P(C_i)$. Now the assessment of $P(A|C_i)$ is trivial for those $A|C_i$ which are impossible, like A|(L,L,D), and those $A|C_i$ which are sure, like A|(W,W,D), but there remain 5 events $A|C_i$ which are neither impossible nor sure. The subject might find it hard to assess precise probabilities for these events because they rely on guessing which teams will score more goals. It may be especially difficult to evaluate the probability of A|(D,D,D).

Suppose that the subject makes the 10 imprecise probability assessments given in Table 2, in addition to the 27 precise assessments in Table 1. Because each of the precise assessments is equivalent to two assessments of lower probabilities, there are a total of 2 x 27 + 10 = 64 lower probability assessments. Now it is necessary to check the consistency of the 64 assessments, and to determine what they imply about $P(A)$. These problems will be solved later in the paper. It turns out that, although the extra assessments in Table 2 are quite imprecise, they substantially reduce the imprecision in the probability of A.
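As a small check of the possibility-space construction described above, the following sketch (our illustration, not from the paper) enumerates the distinct truth-value patterns of the six assessed events of Example 1 over the $2^5 = 32$ sign patterns of $A_1, \ldots, A_5$; the distinct patterns are the atoms of the generated space, and there are indeed 20 of them.

    from itertools import product

    # Truth-value patterns of the six assessed events of Example 1 over all
    # 2^5 = 32 sign patterns of A1,...,A5; distinct patterns are the atoms.
    atoms = {(a1,
              (1 - a1) | a2,    # A1^c U A2
              a2 | a3,          # A2 U A3
              a3 & a4,          # A3 n A4
              (1 - a4) | a5,    # A4^c U A5
              a2 | a5)          # A2 U A5
             for a1, a2, a3, a4, a5 in product([0, 1], repeat=5)}
    print(len(atoms))           # 20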


Table 2
Assessments of upper and lower conditional probabilities for five events A|Ci in the football example

Event        Upper probability   Lower probability
A|(L,W,L)    1                   0.6
A|(D,D,D)    1                   0
A|(D,W,W)    0.75                0.25
A|(W,L,W)    0.65                0.4
A|(W,D,L)    0.8                 0.65

1.4. Previous work on the problems

George Boole (1854a,b) formulated versions of the consistency and inference problems as early as 1854. He recognized that, in general, the probability assessments will determine only upper and lower bounds for the conditional probability of a new event, and he suggested several algebraic methods for finding the upper and lower bounds. The most efficient of his methods involves using the assessments to determine a system of linear equality and inequality constraints on variables which represent the unknown probabilities of the possible atomic events, and then solving this system by successive elimination of variables (Fourier–Motzkin elimination). This method is what we call an indirect solution, since it is formulated in terms of unknown probabilities. Boole's other methods are rather obscure, but one of them (Boole, 1854a, p. 310) appears to involve a direct formulation of the problem. Boole's methods are described in detail in Hailperin (1986) and Hansen et al. (1995).

The methods of Boole and Hailperin (1986) can also be used to give an analytical, rather than numerical, solution to the inference problem. For example, if $A_1$ and $A_2$ are logically independent, $A = A_1 \cup A_2$, and the assessments $P(A_1)$ and $P(A_2)$ are regarded as unspecified parameters, Boole's method of Fourier–Motzkin elimination gives the analytical bounds $\max\{P(A_1), P(A_2)\} \le P(A) \le P(A_1) + P(A_2)$. It appears to be possible to apply the direct methods of this paper in a similar way, to derive the rules of coherence and natural extension for imprecise probabilities, by using Fourier–Motzkin elimination to successively remove the coefficients $\lambda_i$ from (7).

For the special case of unconditional probability assessments, the inference problem was solved by de Finetti (1930) in what he later called 'the fundamental theorem of probability' (de Finetti, 1974). This name indicates how important the problem is for the Bayesian theory of probability. De Finetti's fundamental theorem implies that, for precise unconditional probabilities, the inference problem can be solved by linear programming (LP). In this special case, the consistency problem can also be solved through a linear program, using de Finetti's concept of coherence. LP approaches to the consistency and inference problems were described by Hailperin (1965) and Bruno and Gilio (1980). Generalizations of the fundamental theorem of probability were obtained in Lad et al. (1990), Lad (1996) and Walley (1991).

Linear programming solutions for the consistency and inference problems also have a fundamental role in the theory of probabilistic logic that was proposed by Nilsson


(1986, 1993); see also Paass (1988). It is shown in Jaumard et al. (1991) and Hansen and Jaumard (1996) that these solutions can be extended to cope with very large numbers of assessments and variables, by incorporating column generation methods which avoid explicit formulation of an underlying space of atomic events. Hansen and Jaumard (1996) surveyed earlier work on the consistency and inference problems, which they called 'probabilistic satisfiability' problems, including computational algorithms for implementing the earlier solutions.

Hailperin (1965, 1986) extended Boole's methods to deal with conditional probabilities, by regarding each assessment $P(A_i|B_i) = c_i$ [or $\underline{P}(A_i|B_i) = c_i$] as a linear constraint, of the form $P(A_i \cap B_i) = c_i P(B_i)$ [or $P(A_i \cap B_i) \ge c_i P(B_i)$], on an unknown probability measure P. An equivalent approach is adopted in probabilistic logic and similar methods (Fagin et al., 1990; Jaumard et al., 1991; Frisch and Haddawy, 1994; Hansen and Jaumard, 1996; Lad, 1996). In this approach, the assessments are regarded as consistent if and only if there is a probability measure P that satisfies all the linear constraints ($i = 1, 2, \ldots, k$). In some versions of this approach, inferences concerning a conditional event A|B are carried out by computing upper and lower bounds for $P(A|B)$, as P ranges over all probability measures that satisfy all the linear constraints and $P(B) > 0$.

We emphasise that these solutions for the consistency and inference problems are not equivalent, in general, to the solutions proposed in this paper. The assessment $\underline{P}(A_i|B_i) = c_i$ is equivalent to Hailperin's linear constraint when $P(B_i) > 0$, but not in general. For that reason, Hailperin's method is guaranteed to give correct answers only when all the conditioning events $B_i$ have positive lower probability. In practice, it is common for some conditioning events of interest to have zero lower probability: this does not mean that the probability of such an event is known to be zero, but merely that having probability zero is consistent with the assessments. In such cases, probabilistic logic can give incorrect answers to the consistency and inference problems. That is illustrated by the following, very simple, examples. A detailed comparison between our solutions and those of probabilistic logic, and further examples, will be given in Sections 2.4, 3.7 and 3.8.

Example 4. (a) Suppose that two precise probabilities are assessed: $P(A|B) = 0.9$ and $P(A^c|B) = 0.9$, where A and B are logically independent events. The two assessments appear to be inconsistent since they violate $P(A|B) + P(A^c|B) = 1$, and they are indeed inconsistent under the criterion that we propose in Section 2.1. But Hailperin's method produces the constraints $P(A \cap B) = 0.9P(B)$ and $P(A^c \cap B) = 0.9P(B)$, which are compatible with every probability measure P that satisfies $P(B) = 0$. Hailperin's method and the equivalent methods of probabilistic logic therefore conclude, incorrectly, that the two assessments are consistent; indeed, that would remain true if both values 0.9 were changed to 1!

(b) Suppose that A and $B^c$ are possible events with $A \subset B$, the only assessment is $P(A^c \cap B) = 0$, and we wish to determine what this implies about $P(A|B)$. The assessment is compatible with $P(A \cap B) = P(B) = 0$, and therefore with every value of $P(A|B)$ between 0 and 1. Thus, the correct lower and upper bounds for $P(A|B)$ are the vacuous bounds 0 and 1. These are the bounds given by the inference method we propose in


Section 3.1. But probabilistic logic produces different inferences because, in computing bounds for $P(A|B)$, it uses only the probability measures P that satisfy $P(A^c \cap B) = 0$ and $P(B) > 0$. Every such P has $P(A|B) = P(A \cap B)/P(B) = 1$, so probabilistic logic concludes that $P(A|B) = 1$ precisely, i.e., that the lower and upper bounds for $P(A|B)$ are both equal to 1.

(c) Suppose that A and $B^c$ are possible events with $A \subset C \subset B$, the two assessments are $P(B) = 0$ and $P(A|B) = 0.5$, and we wish to determine what they imply about $P(C|B)$. The correct answer is that $P(C|B)$ must lie between 0.5 and 1. Again, this answer is produced by the method of Section 3.1, but not by probabilistic logic, which takes into account only those probability measures with $P(B) > 0$. In this case, no such probability measure is compatible with the assessment $P(B) = 0$. Because $P(B) = 0$, probabilistic logic is unable to extract any information from the assessment $P(A|B) = 0.5$.

Methods for handling zero probabilities have been studied only quite recently. A general method for checking consistency of precise conditional probability assessments, which works when conditioning events may have probability zero, was developed in a series of papers by Coletti, Gilio and Scozzafava, including Coletti (1994), Coletti and Scozzafava (1996), and Gilio (1995, 1996). Computationally, this method requires solving a sequence of LP problems. A solution for the inference problem in the case of precise conditional probabilities, which again involves a sequence of LP problems, was given by Vicig (1997).

Many of the earlier studies, for example Hailperin (1965), Lad et al. (1990), Jaumard et al. (1991), Coletti (1994), Gilio (1995), Hansen and Jaumard (1996), Lad (1996), have considered imprecise probability assessments, although most of these studies contain only a brief discussion of imprecision. Walley (1991) gave a detailed theory of imprecise conditional probabilities, including general formulae for checking consistency and making inferences, on which the approach in this paper is based. These formulae yield correct answers when conditioning events may have probability zero, and also when infinitely many conditioning events are involved. See also Williams (1975) and Walley (1997). The consistency problem for imprecise assessments was also studied by Coletti (1994) and Gilio (1995), using a criterion of consistency that is similar to what we call 'avoiding uniform loss', and by Vicig (1996), using a stronger criterion which we call 'coherence'. Pelessoni and Vicig (1998) proposed an algorithm for computing the least-committal coherent correction of imprecise assessments that avoid uniform loss, and this can be used to solve the inference problem. Again, this algorithm requires solving a sequence of LP problems. The Pelessoni–Vicig algorithm is generally less efficient than the algorithms proposed in this paper, although it is related to them through duality; see Section 3.7 for a comparison. Recently, Biazzo et al. (2001) used terminology from probabilistic logic to reformulate some of the methods developed in the papers cited in this paragraph.

The computational methods that are proposed in the work we have outlined are indirect, because they involve programming problems in which the variables are taken to be unknown, precise probabilities. In this paper we propose direct solutions which do not involve unknown probabilities.


1.5. Outline of the paper

Section 2 presents a general solution for the consistency problem. In Section 2.1, consistency is characterized mathematically through a condition of 'avoiding uniform loss'. Two algorithms for verifying this condition are described in Section 2.3. When all the probability assessments are precise, avoiding uniform loss is equivalent to de Finetti's concept of coherence (Section 2.2). When some of the assessments are imprecise, avoiding uniform loss still characterizes a basic type of consistency, but there is a stronger notion of consistency, called coherence, which is discussed in Sections 3.2 and 3.9. Avoiding uniform loss differs from the concept of consistency that is used in probabilistic logic; the two concepts are compared in Section 2.4. A dual formulation of the consistency problem is stated in Section 2.5.

Section 3 presents a general solution for the inference problem. Inferences are made by calculating upper and lower probabilities for a conditional event A|B, using a concept of natural extension that is defined in Section 3.1. When all the assessments are precise and coherent, the natural extensions are the maximum and minimum values of $P(A|B)$ that are coherent with the assessments (Section 3.3). Two algorithms for computing natural extensions are proposed in Sections 3.4 and 3.5. In general, natural extension differs from the inference method used in probabilistic logic; the two methods are compared in Section 3.7, which also formulates the dual linear programming problem, and in Section 3.8. Section 3.9 describes methods for checking whether a collection of imprecise probability assessments is coherent. Brief conclusions and suggestions for further research are given in Section 4.

1.6. Notation

Here we summarize the notation that is used throughout the paper. We use the same symbol, usually A or B, to denote both an event and its indicator function (de Finetti's convention). Using this convention, we define the random variables $G_i = B_i[A_i - c_i]$ for $i = 1, 2, \ldots, k$, where $c_i = \underline{P}(A_i|B_i)$. These random variables $G_i$ play an important role in the ensuing theory. When $\lambda = (\lambda_1, \ldots, \lambda_k)$ is a k-vector, we write $\lambda \ge 0$ to mean that $\lambda_i \ge 0$ for $i = 1, 2, \ldots, k$, and we write $\lambda \gneq 0$ to mean that $\lambda \ge 0$ and $\lambda \ne 0$. We shall consider linear combinations of the random variables $G_i$, of the form $\sum_{i=1}^{k} \lambda_i G_i$. Define $I(\lambda) = \{i : \lambda_i \ne 0, \ i = 1, \ldots, k\}$ and $S(\lambda) = \bigcup_{i \in I(\lambda)} B_i$. So $S(\lambda)$, which is called the support of $\lambda$, is the union of those conditioning events $B_i$ for which $\lambda_i$ is non-zero. If X is a bounded random variable and B is a non-empty subset of $\Omega$, $\sup[X|B] = \sup\{X(\omega) : \omega \in B\}$ denotes the supremum possible value of X if B occurs.
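To make the notation concrete, the sketch below (our own illustration; the vector encoding and helper names are not from the paper) represents events as 0/1 indicator vectors over the atoms of $\Omega$, so that each gamble $G_i = B_i[A_i - c_i]$ becomes an ordinary NumPy vector of payoffs, one entry per atom. The later algorithmic sketches in this paper's sections reuse this encoding.

    import numpy as np
    from itertools import product

    # Atoms of Omega for two logically independent events A and B: all four
    # sign patterns (a, b), where a = "A occurs" and b = "B occurs".
    atoms = list(product([0, 1], repeat=2))

    def indicator(predicate):
        """0/1 vector over the atoms of the event {omega : predicate(omega)}."""
        return np.array([1 if predicate(w) else 0 for w in atoms])

    A = indicator(lambda w: w[0] == 1)
    B = indicator(lambda w: w[1] == 1)

    # The gamble attached to the assessment P_(A|B) = 0.9 of Example 4(a):
    # G = B[A - c], a payoff per atom (zero where B fails: the bet is called off).
    G = B * (A - 0.9)
    print(G)   # payoffs: 0 off B, -0.9 on A^c n B, 0.1 on A n B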

2. The consistency problem

The first problem is to determine whether the given assessments $\underline{P}(A_i|B_i) = c_i$ ($i = 1, 2, \ldots, k$) are mutually consistent. This problem will be solved in this section by


formulating a concept of 'avoiding uniform loss', a direct characterization of consistency which is equivalent to definitions in Williams (1975) and Walley (1991).

2.1. Avoiding uniform loss (AUL)

We say that the assessments avoid uniform loss (AUL) when the parametric system of linear inequalities

$\sum_{i=1}^{k} \lambda_i (G_i + \delta B_i) \le 0$ and $\lambda \gneq 0$   (1)

has no solution $(\lambda, \delta)$ with $\delta > 0$. The sum on the left-hand side of (1) is a random variable X, and we write $X \le 0$ to mean that the value of X is certainly less than or equal to 0, i.e., $X(\omega) \le 0$ for all $\omega \in \Omega$. If the assessments do not avoid uniform loss, we say that they incur uniform loss. AUL is defined directly in terms of the assessments $c_i$, since the left-hand side of (1) is a positive linear combination of the gambles $G_i + \delta B_i = B_i[A_i - (c_i - \delta)]$.

The AUL condition (1) is a fundamental consistency condition, which can be justified through a behavioural interpretation of lower probabilities as betting rates (Walley, 1991). (This interpretation is not essential for understanding the rest of the paper.) A lower probability $\underline{P}(A_i|B_i) = c_i$ is interpreted as a marginally acceptable rate for betting on $A_i$ conditional on $B_i$, meaning that the bet whose reward is $B_i[A_i - a]$ is acceptable whenever $a < c_i$. The bet is called off, and the reward is zero, unless $B_i$ occurs. Hence the random reward $G_i = B_i[A_i - c_i]$ is at least marginally acceptable, but it need not be strictly acceptable. For any positive $\delta$, the random reward $G_i + \delta B_i$, which is the reward from a conditional bet at the lower rate $c_i - \delta$, is strictly acceptable. Thus, the small positive quantity $\delta$ can be interpreted as a small reduction to each of the assessed lower probabilities $c_i$, which makes the assessments slightly more cautious and makes the corresponding gambles strictly acceptable. The sum on the left-hand side of (1) represents the net reward from simultaneously accepting bets on each event $A_i$ conditional on $B_i$; the non-negative multiplier $\lambda_i$ can be interpreted as the stake of the $i$th bet. From (1), the assessments incur uniform loss if and only if there is a positive linear combination of strictly acceptable gambles which cannot possibly result in a net gain; this is a basic type of inconsistency.

Lemma 1. The assessments AUL if and only if

$\sup\left[\sum_{i=1}^{k} \lambda_i G_i \mid S(\lambda)\right] \ge 0$ whenever $\lambda \gneq 0$.   (2)

Proof. Using $S(\lambda) = \bigcup\{B_i : \lambda_i > 0\}$ and writing $\rho_1 = \min\{\lambda_i : \lambda_i > 0, \ i = 1, \ldots, k\}$ and $\rho_2 = \sum_{i=1}^{k} \lambda_i$, we see that

$\rho_1 S(\lambda) \le \sum_{i=1}^{k} \lambda_i B_i \le \rho_2 S(\lambda)$.   (3)


If the assessments incur uniform loss then there are $\lambda \gneq 0$ and $\delta > 0$ such that $0 \ge \sum_{i=1}^{k} \lambda_i (G_i + \delta B_i) \ge \sum_{i=1}^{k} \lambda_i G_i + \delta \rho_1 S(\lambda)$, using (3), with $\rho_1 > 0$ since $\lambda \gneq 0$. Hence $\sup[\sum_{i=1}^{k} \lambda_i G_i \mid S(\lambda)] \le -\delta \rho_1 < 0$, so that (2) fails.

Conversely, if there is $\lambda \gneq 0$ such that $\sup[\sum_{i=1}^{k} \lambda_i G_i \mid S(\lambda)] = -\varepsilon < 0$, let $\delta = \varepsilon / \rho_2$. (Here $\rho_2 > 0$ since $\lambda \gneq 0$.) Then $\sum_{i=1}^{k} \lambda_i (G_i + \delta B_i) = \sum_{i=1}^{k} \lambda_i G_i + \delta \sum_{i=1}^{k} \lambda_i B_i \le -\varepsilon S(\lambda) + \delta \rho_2 S(\lambda) = 0$, using (3). Thus the failure of (2) implies that the assessments incur uniform loss.

Another characterization is that the assessments AUL if and only if there is no $\lambda \gneq 0$ such that $\sum_{i=1}^{k} \lambda_i G_i \le -S(\lambda)$. Another characterization (Williams, 1975) is that the assessments AUL if and only if there are precise conditional probabilities $\{P(A_i|B_i) : i = 1, \ldots, k\}$ which satisfy AUL (or, equivalently, de Finetti's definition of 'coherence') and $P(A_i|B_i) \ge c_i$ for $i = 1, \ldots, k$. This indirect characterization is the basis for the previous algorithms for checking AUL or similar conditions. In particular, Gilio (1995) takes the dominance condition in this characterization as a definition of 'coherence' for imprecise probabilities, and Coletti (1994) considers a condition similar to AUL but without assuming the conjugacy relation $\overline{P}(A|B) = 1 - \underline{P}(A^c|B)$. A different proof of essentially the same algorithm for checking the AUL condition is given in Pelessoni and Vicig (1998, Sec. 3). All these algorithms are indirect and involve solving for unknown, precise conditional probabilities.

2.2. Checking coherence of precise probability assessments

Suppose that all the probability assessments are precise: $P(A_i|B_i) = c_i$ for $i = 1, 2, \ldots, m$. We say that the assessments are de Finetti-coherent, or dF-coherent, when

$\sup\left[\sum_{i=1}^{m} \lambda_i G_i \mid S(\lambda)\right] \ge 0$ for all real vectors $\lambda \ne 0$.   (4)

(See Holzer (1985) for an equivalent definition.) This condition is identical to the necessary and sufficient condition (2) for AUL of conditional lower probabilities, except that now the coefficients $\lambda_i$ are not required to be non-negative but are allowed to take any real values.

When all the probability assessments are precise, dF-coherence is equivalent to AUL. To see that, recall that an assessment of a precise conditional probability $P(A|B) = c$ is equivalent to two assessments of conditional lower probabilities, $\underline{P}(A|B) = c$ and $\underline{P}(A^c|B) = 1 - P(A|B) = 1 - c$.

Lemma 2. The precise probability assessments $P(A_i|B_i) = c_i$ ($i = 1, \ldots, m$) are dF-coherent if and only if the corresponding assessments of conditional lower probabilities, $\underline{P}(A_i|B_i) = c_i$ and $\underline{P}(A_i^c|B_i) = 1 - c_i$ ($i = 1, \ldots, m$), avoid uniform loss.

Proof. The assessments $\underline{P}(A_i|B_i) = c_i$ and $\underline{P}(A_i^c|B_i) = 1 - c_i$ have marginal gambles $G_i = B_i[A_i - c_i]$ and $G_i' = B_i[A_i^c - (1 - c_i)] = B_i[1 - A_i - 1 + c_i] = -B_i[A_i - c_i] = -G_i$. Including both $G_i$ and $G_i' = -G_i$ in (2), where the two coefficients are required to be


non-negative, is equivalent to allowing $\lambda_i$ in (4) to take any real values: formally, the sum $\sum_{i=1}^{m} \lambda_i G_i$ in the dF-coherence condition (4), where the $\lambda_i$ are any reals, is equal to the sum $\sum_{i=1}^{m} (\mu_i G_i + \mu_i' G_i')$ in the AUL condition (2), where $\mu_i, \mu_i'$ are non-negative, by setting $\lambda_i = \mu_i - \mu_i'$, or $\mu_i = \max\{\lambda_i, 0\}$ and $\mu_i' = \max\{-\lambda_i, 0\}$. In verifying (2), clearly we can assume that $\mu_i = \mu_i' = 0$ whenever $\mu_i = \mu_i'$ (the two gambles then cancel), so that the vectors $(\lambda_1, \ldots, \lambda_m)$ and $(\mu_1, \ldots, \mu_m, \mu_1', \ldots, \mu_m')$ have the same support $S(\lambda)$. It follows from Lemma 1 that AUL is equivalent to dF-coherence.

dF-coherence of precise probability assessments can therefore be verified by checking AUL, using the following Algorithms 1 and 2.

2.3. Algorithms for checking consistency

To check AUL, we need to determine whether the system of inequalities (1) has a solution $(\lambda, \delta)$ with $\delta > 0$. Because the constraints in (1) become weaker as $\delta$ decreases, (1) has a solution if and only if it has a solution for sufficiently small positive values of $\delta$. That suggests a heuristic algorithm for checking AUL, which is to solve (1) using an extremely small positive value of $\delta$. To develop this idea, define the degree of inconsistency of the assessments to be the supremum value of $\delta$ for which the system (1) has a solution. The degree of inconsistency is positive if and only if the assessments incur uniform loss, and in that case it is the minimal correction to the assessments that is needed to achieve consistency. That is, replacing each assessment $c_i$ by $\max\{0, c_i - \delta\}$, where $\delta$ is the degree of inconsistency, achieves consistency. We can test whether the degree of inconsistency of the assessments exceeds a small positive value, $\delta$, by searching for a solution $\lambda$ to (1) with this fixed value of $\delta$. That is the basis for the following algorithm.

Algorithm 1. Fix a small positive value of $\delta$, and check whether the system of linear inequalities (1) has a solution $\lambda$. This involves a single linear program. If the system has a solution, then the assessments incur uniform loss and their degree of inconsistency is at least $\delta$. If the system has no solution, then the degree of inconsistency of the assessments is no greater than $\delta$.

Algorithm 1, with an extremely small positive value of $\delta$, can be used as a heuristic algorithm to test whether the assessments AUL. To do so, $\delta$ should be chosen to be larger than the rounding error of computations, but also much smaller than the rounding of assessments. Usually the latter condition is easy to satisfy, because probabilities are rarely specified to more than four decimal places. We have successfully used values of $\delta$ ranging from $10^{-7}$ to $10^{-10}$ in standard optimization programs such as Lingo and the simplex package of Maple V, in computations with at least 10 floating-point digits. As a general rule for fixing $\delta$ in Algorithm 1, we suggest $\delta = \min\{10^{-10}, 10\varepsilon\}$, where $\varepsilon$ is the rounding error of computations, and retaining as many floating-point digits as possible in computations to reduce $\varepsilon$.
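As an illustration, Algorithm 1 can be implemented with any LP solver. The sketch below (ours, using scipy.optimize.linprog; the function name is hypothetical) checks the feasibility of system (1) for a fixed small $\delta$, normalizing $\lambda$ by $\sum_i \lambda_i = 1$, which is harmless because (1) is positively homogeneous in $\lambda$.

    import numpy as np
    from scipy.optimize import linprog

    def incurs_uniform_loss(G, B, delta=1e-10):
        """Algorithm 1 (sketch): does system (1) have a solution lambda for this delta?

        G[i] = B_i * (A_i - c_i) and B[i] are payoff/indicator vectors over the
        atoms of Omega.  Since (1) is positively homogeneous in lambda, the
        requirement lambda != 0 is imposed by normalizing sum(lambda) = 1.
        Returns True if the assessments incur uniform loss (degree >= delta).
        """
        k = len(G)
        A_ub = np.column_stack([G[i] + delta * B[i] for i in range(k)])  # row per atom
        res = linprog(np.zeros(k),                     # feasibility only: null objective
                      A_ub=A_ub, b_ub=np.zeros(A_ub.shape[0]),
                      A_eq=np.ones((1, k)), b_eq=[1.0],
                      bounds=[(0, None)] * k)
        return res.status == 0                         # feasible <=> uniform loss

    # Example 4(a): P_(A|B) = P_(A^c|B) = 0.9 incur uniform loss.
    # Atoms of Omega ordered as (A, B) in {(0,0), (0,1), (1,0), (1,1)}.
    A = np.array([0, 0, 1, 1]); B = np.array([0, 1, 0, 1])
    print(incurs_uniform_loss([B * (A - 0.9), B * ((1 - A) - 0.9)], [B, B]))  # True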


Whatever positive value of $\delta$ is used in Algorithm 1, examples can be constructed in which (1) has no solution $\lambda$ but the assessments incur uniform loss. In other words, when AUL fails, the degree of inconsistency may be arbitrarily close to zero. That can be seen from the following example.

Example 5. Given any small positive value of $\delta$, suppose that two assessments are made: $\underline{P}(A|B) = c$ and $\underline{P}(A^c|B) = 1 - c + \delta$, where $0 < \delta \le c \le 1$. Because $\underline{P}(A|B) + \underline{P}(A^c|B) > 1$, these assessments incur uniform loss, as can be seen from the definition of AUL. But, for this value of $\delta$, system (1) has no solution $\lambda$.

This example shows that, however small $\delta$ is chosen to be, there are pathological cases in which Algorithm 1 will fail to detect an inconsistency in the assessments. We emphasize that this is extremely unlikely to occur in non-artificial problems, if $\delta$ is very small ($10^{-10}$ or less). In the very rare cases where it does occur, the assessments must be almost consistent, since their degree of inconsistency can be no greater than $\delta$. In these cases, if $\delta$ is chosen to be not much larger than the rounding error of computations, the degree of inconsistency is so small as to be almost indistinguishable from rounding error, and it will be difficult for any algorithm to detect the inconsistency.

Any algorithm in which computations have finite precision can give an incorrect answer when the degree of inconsistency is extremely small. For example, precise probability assessments that are dF-coherent always have zero degree of inconsistency, and therefore an arbitrarily small perturbation can make them inconsistent. Rounding errors in computations have the effect of introducing such a perturbation, which can produce the erroneous conclusion that coherent, precise probability assessments are inconsistent. This is a serious problem for most algorithms. If the value of $\delta$ in Algorithm 1 is larger than rounding error then it protects against this kind of numerical instability.

To cope with cases like Example 5, Algorithm 1 can be extended as follows. First, if Algorithm 1 finds a solution $\lambda$, then the assessments are definitely not consistent (they incur uniform loss). If Algorithm 1 has no solution $\lambda$, the next step is to check whether the system (1) with $\delta = 0$ has a solution $\lambda$. This involves a second linear programming problem. If this second system has no solution then the assessments are definitely consistent (they satisfy AUL). In the remaining case, where the first system has no solution but the second system has a solution, we can use the following iterative algorithm which (ignoring the effects of rounding errors) will work in all cases.

To motivate this algorithm, note that, by Lemma 1, the assessments incur uniform loss if and only if there is a non-empty set $I \subseteq \{1, \ldots, k\}$ such that

$\sum_{i \in I} \lambda_i G_i + \sum_{i \in I} B_i \le 0$ for some $\lambda \ge 0$.   (5)

Algorithm 2 aims to find the largest set I that satisfies (5), by starting with $I = \{1, 2, \ldots, k\}$, and successively reducing I until (5) holds. If eventually I is reduced to the empty set then we conclude that the assessments AUL.

Algorithm 2. (a) Set $I = \{1, 2, \ldots, k\}$.


(b) Maximise $\sum_{i \in I} \rho_i$ subject to

$\lambda \ge 0$, $0 \le \rho_i \le 1$ ($i \in I$) and $\sum_{i \in I} \lambda_i G_i + \sum_{i \in I} \rho_i B_i \le 0$.   (6)

(c) If $\rho_i = 1$ for all $i \in I$ then the assessments incur uniform loss. Otherwise, replace I by the subset $\{i \in I : \rho_i = 1\}$. If I is empty then the assessments AUL. Otherwise, return to (b).

Algorithm 2 relies on the following properties, which can be easily verified. (i) If $(\lambda, \rho)$ is optimal in step (b), then $\rho_i = 0$ or $\rho_i = 1$ for all $i \in I$. (ii) Let $(\lambda, \rho)$ be optimal in (b) and let $(\lambda', \rho')$ satisfy the constraints in (6). Then $\rho_i = 1$ whenever $\rho_i' > 0$.

Algorithm 2 works by successively reducing the set I. At each iteration of step (b), I is replaced by $I' = \{i \in I : \rho_i = 1\}$. By properties (i) and (ii), step (b) produces a solution $\rho$ with as many components as possible equal to 1. Hence $I'$ is the unique largest subset of I with the property that (for some $\lambda \ge 0$) $\sum_{i \in I} \lambda_i G_i + \sum_{i \in I'} B_i \le 0$. When the algorithm terminates, $\rho_i = 1$ for all $i \in I$ (where possibly I is empty), so that $I' = I$. Hence I satisfies (5) and it is the largest set that does so. It follows from the characterization of AUL in terms of (5) that if I is non-empty then the assessments incur uniform loss, whereas if I is empty then they AUL. Because initially I has cardinality k, and I must be reduced, in step (c), before each iteration of (b), the algorithm is guaranteed to give an answer in at most k steps, i.e., after solving at most k linear programs of the form (6). If the assessments are inconsistent in the sense of probabilistic logic (defined in the next section) then Algorithm 2 requires only one step, and similarly if there is no set $B_j$ such that $\sum_{i=1}^{k} \lambda_i G_i + B_j \le 0$ for some $\lambda \ge 0$. A minimal sketch of Algorithm 2 in code is given below.

Computations for all the later examples were done using both the optimization program Lingo and Maple V (release 5.1). We wrote a Maple program that constructs an appropriate possibility space $\Omega$, sets up the relevant LP problems with several values of $\delta$, and solves the LP problems. Because all our algorithms involve only linear programming, they can be carried out using many other optimization programs, and they are computationally tractable if the number of assessments is not too large. The main source of computational complexity is that the number of constraints involved in (1) is $k + n(k)$, where $n(k)$ is the cardinality of the space $\Omega$ that is generated by the k conditional events (see Section 1.3), and $n(k)$ can grow exponentially with k. A method for handling this problem is mentioned in the Conclusions. There are LP algorithms whose running time depends polynomially on the size of the problem.
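The following sketch of Algorithm 2 (our illustration, in the same vector encoding used earlier; names are hypothetical) solves the LP (6) over the current index set I on each pass and keeps the indices whose auxiliary variable $\rho_i$ equals 1. It relies on the solver returning a vertex-optimal solution, so that property (i) applies; a 0.5 threshold is used to read off the 0/1 values robustly.

    import numpy as np
    from scipy.optimize import linprog

    def avoids_uniform_loss(G, B):
        """Algorithm 2 (sketch): exact AUL check via at most k linear programs.

        G[i], B[i]: payoff/indicator vectors over the atoms of Omega.
        Variables of each LP are (lambda_i, rho_i) for i in the current set I.
        """
        k = len(G)
        n = len(G[0])
        I = list(range(k))
        while I:
            m = len(I)
            c = np.concatenate([np.zeros(m), -np.ones(m)])          # maximise sum(rho)
            A_ub = np.hstack([np.column_stack([G[i] for i in I]),
                              np.column_stack([B[i] for i in I])])  # one row per atom
            res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                          bounds=[(0, None)] * m + [(0, 1)] * m)
            rho = res.x[m:]
            if np.all(rho > 0.5):      # all rho_i = 1: I satisfies (5)
                return False           # the assessments incur uniform loss
            I = [i for i, r in zip(I, rho) if r > 0.5]   # step (c): shrink I
        return True                    # I reduced to the empty set: AUL holds

    # Example 4(a) again: detected as incurring uniform loss in a single pass.
    A = np.array([0, 0, 1, 1]); B = np.array([0, 1, 0, 1])
    print(avoids_uniform_loss([B * (A - 0.9), B * ((1 - A) - 0.9)], [B, B]))  # False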


We briefly mention a different approach to checking AUL. Theoretically, we could determine whether (1) has a solution by applying a method of Jeroslow (1973), using the null function as a (dummy) objective function and viewing (1) as a single non-parametric linear programming problem whose coefficients belong to an ordered field of rational functions of $\delta$. The feasibility of (1) could then be checked by using an extension of the simplex algorithm. In principle, a similar method could be used to solve the inference problem. However, these methods have the drawback that polynomials of a very high degree are likely to appear in computations.

2.4. Avoiding sure loss (ASL), and comparison with probabilistic logic

We call the consistency condition (1) 'avoiding uniform loss' rather than 'avoiding sure loss' because violations of AUL cannot necessarily be exploited to produce a sure loss. Incurring uniform loss means that there is a positive linear combination of strictly acceptable bets which cannot possibly result in a net gain, and which will produce a net loss if any of the bets is carried out. That is, there is a net loss if and only if $S(\lambda)$ occurs.

Example 6. As in Example 4(a), suppose the assessments are $\underline{P}(A|B) = 0.9$ and $\underline{P}(A^c|B) = 0.9$, with A and B logically independent. To show that these assessments incur uniform loss, set $\lambda_1 = \lambda_2 = 1$, so that $S(\lambda) = B$, in (2). However, there is no way to exploit the two assessments to produce a sure loss, because any bets based on the assessments must be conditional on B. If B fails to occur then all bets will be called off and nothing is lost or gained.

We say that the assessments avoid sure loss (ASL) if $\sup[\sum_{i=1}^{k} \lambda_i G_i \mid \Omega] \ge 0$ whenever $\lambda \gneq 0$; equivalently, there is no $\lambda \gneq 0$ such that $\sum_{i=1}^{k} \lambda_i G_i \le -1$. This condition can be checked by solving a single linear program. Comparing ASL with the characterization of AUL in Lemma 1 shows that AUL implies ASL. Example 6 shows that AUL is a strictly stronger condition than ASL, and that ASL is too weak to characterize consistency of the assessments. A sufficient condition for ASL is that $\bigcup_{i=1}^{k} B_i \ne \Omega$ (since $\sum_{i=1}^{k} \lambda_i G_i = 0$ outside $\bigcup_{i=1}^{k} B_i$), i.e., it is sufficient for ASL that possibly none of the events $B_i$ will occur, and yet the assessments may be inconsistent when this holds. For further discussion of the difference between ASL and AUL, and examples which show that ASL is too weak to characterize consistency, see Walley (1991, Ch. 7).

Many of the methods that have been proposed for dealing with conditional probability assessments use definitions of consistency that are equivalent to ASL. That can be seen by applying duality results, such as Lemma 3.3.2 of Walley (1991), to show that the assessments ASL if and only if there is an unconditional probability measure P (defined on subsets of $\Omega$) under which each random variable $G_i$ has non-negative expectation, i.e., such that $P(A_i \cap B_i) \ge c_i P(B_i)$ for $i = 1, \ldots, k$. We say that any such probability measure is compatible with the assessments. Let M(0) denote the set of all compatible probability measures. The assessments ASL if and only if M(0) is non-empty.

As noted in Section 1.4, Hailperin (1986) takes each assessment $\underline{P}(A_i|B_i) = c_i$ to impose a linear constraint $P(A_i \cap B_i) \ge c_i P(B_i)$ on an unknown probability measure P. Hailperin's definition of consistency is therefore equivalent to ASL.
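The contrast between the two conditions is easy to exhibit numerically. The sketch below (our illustration; the helper name is hypothetical) verifies that the assessments of Example 6 avoid sure loss but incur uniform loss, using one feasibility LP for each condition.

    import numpy as np
    from scipy.optimize import linprog

    def feasible(A_ub, b_ub, k, normalize=False):
        """Is there lambda >= 0 (with sum(lambda) = 1 if normalize) with A_ub @ l <= b_ub?"""
        A_eq = np.ones((1, k)) if normalize else None
        b_eq = [1.0] if normalize else None
        res = linprog(np.zeros(k), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * k)
        return res.status == 0

    # Example 6: atoms ordered as (A, B) patterns; columns are G_1, G_2.
    A = np.array([0, 0, 1, 1]); B = np.array([0, 1, 0, 1])
    G = np.column_stack([B * (A - 0.9), B * ((1 - A) - 0.9)])

    # Sure loss: some lambda >= 0 (automatically nonzero) with sum_i lambda_i G_i <= -1.
    print(feasible(G, -np.ones(4), 2))                       # False: ASL holds
    # Uniform loss (Algorithm 1 with a tiny delta): system (1) is feasible.
    d = 1e-10
    print(feasible(G + d * np.column_stack([B, B]), np.zeros(4), 2,
                   normalize=True))                          # True: AUL fails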


So is the definition of consistency in probabilistic logic (Jaumard et al., 1991; Frisch and Haddawy, 1994), where the constraints are often equivalently formulated as: either $P(B_i) = 0$, or $P(B_i) > 0$ and $P(A_i|B_i) \ge c_i$. Because they check ASL rather than AUL, the algorithms for checking consistency in probabilistic logic involve only a single linear program, but they may give incorrect answers about consistency; e.g., they are unable to detect the type of inconsistency in Example 6, where the assessments satisfy ASL but not AUL. A sufficient condition for AUL to be equivalent to ASL, and thus for the probabilistic logic algorithms to check AUL, is that all the conditioning events $B_i$ have probabilities that are bounded away from zero; formally, the natural extensions $\underline{E}(B_i)$, defined in Section 3, are non-zero for $i = 1, \ldots, k$. The probabilistic logic algorithms therefore give the correct answer about consistency when all the probability assessments are unconditional, as in Example 1. The following modification of the football example again shows that these algorithms are unreliable when some conditional probabilities are assessed.

Example 7. Consider the football example with the 10 upper and lower probability assessments given in Table 2, plus one further assessment that $P(A|B) = 0.5$, where A denotes the event that team X wins the tournament, and B is the event that X finishes equal first on points and also Y loses against Z. Here B is the union of the two outcomes (L,W,L) and (W,D,L). Using this fact, it can be shown that the three assessments $P(A|B) = 0.5$, $\underline{P}(A|(L,W,L)) = 0.6$ and $\underline{P}(A|(W,D,L)) = 0.65$ are mutually inconsistent, and therefore the system of 11 upper and lower probabilities incurs uniform loss. This can be verified using Algorithm 1, with any value of $\delta$ smaller than 0.05. However, this system avoids sure loss, because the union of the 11 conditioning events is not certain to occur. Again, the ASL condition is too weak to detect the inconsistency in the assessments; probabilistic logic and similar methods would wrongly conclude that the assessments are consistent.

Next suppose that we add the 27 precise probability assessments in Table 1 to the 11 imprecise assessments. Now the assessments incur sure loss: combining the three assessments that incur uniform loss with $P((L,W,L)) = 0.04$ produces a sure loss. The precise probability assessments tell us that the event B has positive probability, and this is enough to turn the uniform loss (on the set $S(\lambda) = B$) into a sure loss. In the previous version, without the precise assessments, B had lower probability zero. This illustrates that the difference between AUL and ASL is essentially concerned with whether or not the union of conditioning events, $S(\lambda)$, has lower probability zero.

We have argued that AUL, rather than ASL, is the proper characterization of consistency. However, it may appear that ASL is simpler than AUL from a computational point of view, since ASL can always be checked by solving a single linear program. But we have found that, in practice, AUL can also be checked by solving a single linear program, using Algorithm 1 with an extremely small value of $\delta$. Also, whenever the assessments incur sure loss, which is exactly the case where checking ASL is guaranteed to give the correct answer about AUL, Algorithm 2 always terminates in a single step, after solving a single linear program.


It is only when the assessments ASL, which fails to guarantee consistency in the sense of AUL, that Algorithm 2 may require more than one linear program.

In general, we cannot simplify the AUL condition (1) by setting $\delta = 0$. That can be seen by considering the single assessment $\underline{P}(A_1|B_1) = 1$, which avoids uniform loss; taking $\lambda_1 = 1$ gives $\lambda_1 G_1 = B_1(A_1 - 1) \le 0$, so (1) is satisfied with $\delta = 0$. In fact, whenever the assessments imply that some event B has probability zero, there is $\lambda \gneq 0$ such that $\sum_{i=1}^{k} \lambda_i G_i \le -B \le 0$. More generally, if the assessments imply that any non-trivial conditional probability $P(A|B)$ is precisely determined, i.e., $\underline{P}(A|B) = \overline{P}(A|B)$, then condition (1) is satisfied for $\delta = 0$. This shows that the strengthening of AUL that is obtained by setting $\delta = 0$ in (1) is not necessary for consistency.

2.5. The dual problem

Another formulation of the AUL condition (1) can be derived by applying duality theorems, such as the theorem of the alternative from Gale (1960) or Walley (1991, p. 612), to (1) with $\delta$ fixed. This gives the dual condition: the assessments AUL if and only if, for every $\delta > 0$, there is an unconditional probability measure P (defined on subsets of $\Omega$) such that $E_P(G_i) + \delta P(B_i) > 0$ for $i = 1, 2, \ldots, k$, where $E_P$ denotes expectation under P. To check this in practice, we would fix an extremely small positive value of $\delta$ and solve for P; this is the dual form of Algorithm 1. Any such solution P must satisfy $P(B_i) > 0$ for $i = 1, \ldots, k$, since $P(B_i) = 0$ implies that $E_P(G_i) = 0$. The dual condition for AUL differs slightly from the dual condition for ASL: the assessments ASL if and only if there is an unconditional probability measure P such that $E_P(G_i) \ge 0$ for $i = 1, 2, \ldots, k$.
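The dual form of Algorithm 1 can likewise be sketched as a single LP (our illustration; the helper name is hypothetical, and the strict inequalities are handled through a slack variable): maximize a slack t over probability measures P on the atoms, subject to $E_P(G_i) + \delta P(B_i) \ge t$ for all i; the dual condition holds, for the fixed $\delta$, exactly when the optimal slack is positive. Since AUL requires this for every $\delta > 0$, using one fixed tiny $\delta$ is heuristic, just as in the primal Algorithm 1.

    import numpy as np
    from scipy.optimize import linprog

    def dual_aul_check(G, B, delta=1e-10):
        """Dual Algorithm 1 (sketch): seek P on the atoms with
        E_P(G_i) + delta * P(B_i) > 0 for all i, via a maximal slack t."""
        k, n = len(G), len(G[0])
        c = np.zeros(n + 1); c[-1] = -1.0                       # minimise -t
        M = np.vstack([G[i] + delta * B[i] for i in range(k)])  # k x n
        A_ub = np.hstack([-M, np.ones((k, 1))])                 # -M @ p + t <= 0
        A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])   # sum(p) = 1
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * n + [(None, None)])
        return res.x[-1] > 1e-12          # tolerance for numerical noise

    # Example 6 fails the dual check: the best slack is t = 0, attained at P(B) = 0.
    A = np.array([0, 0, 1, 1]); B = np.array([0, 1, 0, 1])
    print(dual_aul_check([B * (A - 0.9), B * ((1 - A) - 0.9)], [B, B]))  # False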


3. The inference problem

3.1. Natural extension

Given the assessments of conditional lower probabilities $\underline{P}(A_i|B_i) = c_i$ ($i = 1, 2, \ldots, k$), we make inferences by calculating further conditional lower and upper probabilities, which will be denoted by $\underline{E}(A|B)$ and $\overline{E}(A|B)$. The symbol E stands for 'Extension'. Here A|B need not be a new conditional event: it may agree with one of the conditional events $A_i|B_i$ for which assessments are made. We always assume that B is possible (i.e., non-empty). Again, upper probabilities are determined by lower probabilities through $\overline{E}(A|B) = 1 - \underline{E}(A^c|B)$, so we concentrate on the lower probability $\underline{E}(A|B)$.

The quantity $\underline{E}(A|B)$ represents what can be inferred from the assessments concerning the conditional lower probability $\underline{P}(A|B)$. Recall that $\underline{P}(A|B)$ is interpreted as a supremum (marginally acceptable) rate for betting on A conditional on B. We therefore define the natural extension $\underline{E}(A|B)$ to be the supremum rate for betting on A conditional on B that can be constructed from the assessments through positive linear combinations of acceptable bets. See Walley (1991, 1996, 1997) for further discussion of this idea.

Formally, the natural extension $\underline{E}(A|B)$ is defined to be the supremum value of $\mu$ for which there are $\lambda \ge 0$ and $\delta > 0$ such that

$B(A - \mu) \ge \sum_{i=1}^{k} \lambda_i (G_i + \delta B_i)$.   (7)

The supremum value of $\mu$ may or may not be achieved. Here $B(A - \mu)$ represents the net reward from a bet on A conditional on B at betting rate $\mu$, and $\sum_{i=1}^{k} \lambda_i (G_i + \delta B_i)$ is the net reward from a positive linear combination of bets on each $A_i$ conditional on $B_i$, at betting rates $\underline{P}(A_i|B_i) - \delta$ with non-negative stakes $\lambda_i$. Since $\underline{P}(A_i|B_i)$ is interpreted as a supremum acceptable betting rate, some positive reduction $\delta$ is needed to ensure that the bets are strictly acceptable. Eq. (7) directly characterizes the natural extension in terms of linear combinations of the assessments $\underline{P}(A_i|B_i)$.

There are always values of $(\mu, \lambda, \delta)$ which satisfy the constraints in the definition of natural extension. For example, $\mu = 0$, $\lambda = 0$ and any positive value of $\delta$ satisfy the constraints, and this shows that always $\underline{E}(A|B) \ge 0$.

To define the natural extension, it is not necessary that the assessments AUL. However, assessments that incur uniform loss may produce bad inferences, in the sense that $\underline{E}(A|B)$ may be infinite: this happens if and only if $B \subseteq S$, where S is the largest set $S(\lambda)$ on which the assessments incur uniform loss. Since $\underline{E}(A|B)$ may be finite even when the assessments incur uniform loss, it is advisable to check AUL, using Algorithm 1 or 2, before computing any natural extensions. If this check reveals that the assessments incur uniform loss then some of them should be modified to satisfy AUL, and typically that will change a natural extension $\underline{E}(A|B)$ even if its initial value is finite.

An alternative characterization of the natural extension is given in the following lemma, which is analogous to Lemma 1.

Lemma 3. The natural extension $\underline{E}(A|B)$ is the supremum value of $\mu$ for which there is $\lambda \ge 0$ such that

$\sup\left[\sum_{i=1}^{k} \lambda_i G_i - B(A - \mu) \mid S(\lambda) \cup B\right] < 0$.   (8)

Here the supremum is not achieved: the set of $\mu$-values which satisfy these constraints is the open interval $(-\infty, \underline{E}(A|B))$.

Proof. First suppose that $(\mu, \lambda, \delta)$ satisfy the system of inequalities (7), and $\nu < \mu$. Also let $S(\lambda) = \bigcup\{B_i : \lambda_i > 0\}$, and $\rho_1 = \min\{\lambda_i : \lambda_i > 0\}$ or $\rho_1 = 1$ if $\lambda = 0$. Then, using (3), $\sum_{i=1}^{k} \lambda_i G_i - B(A - \nu) \le -\delta \sum_{i=1}^{k} \lambda_i B_i - B(\mu - \nu) \le -\delta \rho_1 S(\lambda) - B(\mu - \nu) \le -\varepsilon [S(\lambda) \cup B]$, where $\varepsilon = \min\{\delta \rho_1, \mu - \nu\} > 0$. (This holds also if $\lambda = 0$ since then $S(\lambda) = \emptyset$.) Thus $\sup[\sum_{i=1}^{k} \lambda_i G_i - B(A - \nu) \mid S(\lambda) \cup B] \le -\varepsilon < 0$, so that $(\nu, \lambda)$ satisfy (8), for every $\nu < \mu$. It follows that the quantity defined in the lemma is at least as large as $\underline{E}(A|B)$.

For the reverse inequality, suppose that $\mu$ and $\lambda \ge 0$ satisfy (8). Then there is $\varepsilon > 0$ such that $\sum_{i=1}^{k} \lambda_i G_i - B(A - \mu) \le -\varepsilon [S(\lambda) \cup B]$. (Here all terms are zero outside $S(\lambda) \cup B$.) If $\lambda \gneq 0$ then $\rho_2 = \sum_{i=1}^{k} \lambda_i > 0$, so $\delta = \varepsilon / \rho_2 > 0$, and from (3) $\delta \sum_{i=1}^{k} \lambda_i B_i \le \delta \rho_2 S(\lambda) = \varepsilon S(\lambda) \le \varepsilon [S(\lambda) \cup B]$. (This holds if $\lambda = 0$ since then $\sum_{i=1}^{k} \lambda_i B_i = 0$.)


Hence $\sum_{i=1}^{k} \lambda_i G_i - B(A - \mu) \le -\delta \sum_{i=1}^{k} \lambda_i B_i$, and so $(\mu, \lambda, \delta)$ satisfy the system (7) that defines $\underline{E}(A|B)$. This shows that $\underline{E}(A|B)$ is at least as large as the quantity defined in the lemma.

3.2. Properties of natural extension

Here we state the most important properties of natural extension. Proofs of these results are in Walley (1998); see also Section 3.1.2 of Walley (1991) for proofs in the special case of unconditional probability assessments, most of which readily generalise to the case of conditional probabilities.

(a) For all events A and B, $\underline{E}(A|B) \ge 0$. (This was proved in Section 3.1.)

(b) If the assessments AUL then $\underline{E}(A|B) \le 1$ for all A and B. If the assessments incur uniform loss then there is at least one assessment $\underline{P}(A_i|B_i)$ such that $\underline{E}(A_i|B_i) = \infty$. Thus AUL can be characterized in terms of natural extension: the assessments AUL if and only if $\underline{E}(A_i|B_i) \le 1$ for $i = 1, \ldots, k$.

(c) $\underline{E}(A_i|B_i) \ge \underline{P}(A_i|B_i)$ for $i = 1, \ldots, k$.

(d) Say that a finite collection of conditional lower probabilities is coherent when each conditional lower probability agrees with the corresponding natural extension of the collection. (This is equivalent to the definitions of coherence in Williams (1975) and Walley (1991, 1997).) For example, the assessments are coherent if and only if $\underline{E}(A_i|B_i) = \underline{P}(A_i|B_i)$ for $i = 1, \ldots, k$. For precise probabilities, coherence is equivalent to dF-coherence and to AUL, but for imprecise probabilities coherence is stronger than AUL. If the assessments AUL and their natural extensions $\underline{E}(A|B)$ are defined for any finite collection of conditional events, then these natural extensions are coherent. It follows from coherence that, for example, the natural extensions satisfy: (i) if $A \subseteq B \subseteq C$, then $\underline{E}(A|B) \ge \underline{E}(A|C) \ge \underline{E}(A|B)\underline{E}(B|C)$; (ii) $\underline{E}(A|B) \le \underline{E}(A \cup C|B \cup C)$.

(e) If the assessments are coherent then $\underline{E}(A|B)$ is the minimal value of $\underline{P}(A|B)$ that is coherent with them, i.e., their minimal coherent extension. Any coherent collection of conditional lower probabilities can be coherently extended to any other A|B, and $\underline{E}(A|B)$ is the minimal coherent extension.

(f) If the assessments AUL then the natural extension is the lower envelope of all collections of precise conditional probabilities that dominate the assessments and AUL. (Recall that, for precise probabilities, AUL is equivalent to dF-coherence.) Formally, let $\Gamma$ index the non-empty set of all collections of precise conditional probabilities $(P_\gamma(A|B), P_\gamma(A_1|B_1), \ldots, P_\gamma(A_k|B_k))$ which satisfy AUL and $P_\gamma(A_i|B_i) \ge \underline{P}(A_i|B_i)$ for $i = 1, \ldots, k$. Then $\underline{E}(A|B) = \min\{P_\gamma(A|B) : \gamma \in \Gamma\}$. This property gives an indirect characterization of natural extension, in terms of a set of precise conditional probabilities.

3.3. Making inferences from precise probability assessments

As explained in Section 1.3, m assessments of precise conditional probabilities can be replaced by 2m equivalent assessments of conditional lower probabilities.

138

P. Walley et al. / Journal of Statistical Planning and Inference 126 (2004) 119 – 151

that the assessments are dF-coherent, which is equivalent to AUL. To calculate what the assessments imply about a further conditional probability P(A|B), we compute the natural extensions E(A|B) and E(Ac |B). The next result shows that these two natural extensions give a complete solution to the problem of making inferences about P(A|B). Thus natural extension solves the Bayesian problem of inference. Lemma 4. Suppose that all the conditional probability assessments are precise and dF-coherent. Let E(A|B) = 1 − E(Ac |B). Then the set of values P(A|B) that are dF-coherent with the assessments is the closed interval [E(A|B); E(A|B)]. Proof. Because the assessments are precise, any precise conditional probabilities that dominate the assessments must coincide with them. Hence the set {P, (A|B) : , ∈ +} in 3.2(f) is the set of all values P, (A|B) that are dF-coherent with the assessments. By result 3.2(f), this set has minimum value E(A|B), and similarly its maximum is E(A|B). To complete the proof, verify that, when E(A|B) 6 - 6 E(A|B), adding P(A|B)=- to the assessments preserves the dF-coherence condition (4). Write m=(0 ; 1 ; : : : ; m ). For ; : : : ;  and any  ¿ 0, sup[ any real numbers  1 m 0 i=1 i Gi + 0 B(A − m -)|S()] ¿ sup[ i=1 i Gi + 0 B(A − E(A|B))|S()] ¿ 0 by (4), since P(A|B) = E(A|B) is dF-coherent with the assessments. The case 0 ¡ 0 is similar, with E(A|B) replaced by E(A|B). 3.4. An approximate algorithm for computing natural extension In this section and the next one, we derive two algorithms for computing natural extensions. We assume that the assessments AUL; in practical applications, this should be veri2ed 2rst by applying Algorithm 1 or 2. To compute the natural extension E(A|B) from (7), we must solve: maximise ' subject to and

 ¿ 0;  ¿ 0 k 

(9)

i (Gi + Bi ) + 'B 6 AB:

i=1

This is a parametric linear programming problem with scalar parameter $\varepsilon$. Problems of this type, in which the parameter appears in the matrix of linear constraints, are not easily solvable in general (Gal, 1984). Let $\mu^*(\varepsilon)$ denote the maximum value of $\mu$ when (9) is solved for a fixed value of $\varepsilon$. Using the assumption that the assessments AUL, the supremum $\mu^*(\varepsilon)$ is achieved by some $(\mu, \lambda)$, for any fixed positive $\varepsilon$, because the feasible region is closed and non-empty (the origin is always feasible) and, by 3.2(b), the objective function $\mu$ is bounded from above by 1. A crucial point is that, because the constraints in (9) become stronger as $\varepsilon$ increases, $\mu^*(\varepsilon)$ is a non-increasing function of $\varepsilon$. It follows that $E(A|B)$ is the limit of the maxima $\mu^*(\varepsilon)$ as $\varepsilon \to 0$ from above, and this limit is no larger than $\mu^*(0)$. (Also, by a well known result from parametric linear programming (Dinkelbach, 1969; Freund, 1985), $\mu^*$ can have only finitely many points of discontinuity.) So $\mu^*(\varepsilon)$ can approximate $E(A|B)$ arbitrarily closely by taking $\varepsilon$ to be sufficiently small. By the preceding results, $\mu^*(0) \ge E(A|B) \ge \mu^*(\varepsilon)$ for all $\varepsilon > 0$. Thus we can find upper and lower bounds for $E(A|B)$ by solving two linear programming problems of the form (9), using $\varepsilon = 0$ to give the upper bound and a very small positive value of $\varepsilon$ to give the lower bound. Moreover, the lower bound $\mu^*(\varepsilon)$ is arbitrarily close to $E(A|B)$ when $\varepsilon$ is sufficiently small. That suggests the following simple algorithm for computing the natural extension.

Algorithm 3. Fix a very small positive value of $\varepsilon$ and solve the linear program (9). The maximized value of $\mu$ is a lower bound for $E(A|B)$, and it agrees with $E(A|B)$ to a very close approximation if $\varepsilon$ is sufficiently small.

To fix $\varepsilon$ in Algorithm 3, we suggest using the same rule as for Algorithm 1: $\varepsilon = \min\{10^{-10}, 10\delta\}$, where $\delta$ is the rounding error in computations. Algorithm 3 involves a linear program with $k+1$ variables $(\mu, \lambda_1, \ldots, \lambda_k)$ and at most $k+s$ linear constraints, where $s$ is the cardinality of the possibility space $\Omega$. Because $AB \ge 0$, the simplex method can be used to solve the linear program without any initialization, since a starting point is found at once by adding the slack variables.

Algorithm 3 will give the correct value of $E(A|B)$, with an accuracy that is much better than the precision of the assessments, in almost all practical problems. Nevertheless, whatever positive value of $\varepsilon$ is used in Algorithm 3, we can construct pathological examples, similar to Example 5, in which Algorithm 3 does not give a good approximation to the correct value (e.g., see Example 8 of Walley et al. (1999)). Essentially, that happens when the function $\mu^*$ has a discontinuity at some positive value between 0 and $\varepsilon$.

After using Algorithm 3 to compute $\mu^*(\varepsilon)$, it may be useful to also compute $\mu^*(0)$, by solving (9) with $\varepsilon = 0$. That gives the upper and lower bounds $\mu^*(0) \ge E(A|B) \ge \mu^*(\varepsilon)$. If the difference between the upper and lower bounds is negligible then the upper bound can be adopted as the solution. In many problems, the function $\mu^*$ is right-continuous at 0, in which case the upper bound is the exact solution; see Section 3.8. It is shown in Section 3.7 that one version of probabilistic logic adopts the upper bound $\mu^*(0)$ as the general answer to the inference problem.

If the difference between the upper and lower bounds $\mu^*(0)$ and $\mu^*(\varepsilon)$ is non-negligible, when $\varepsilon$ is extremely small, the most likely reason is that the function $\mu^*$ is discontinuous at 0; several examples of this will be given later. In this case, $\mu^*(\varepsilon)$ is almost certain to be a much better approximation to $E(A|B)$ than $\mu^*(0)$ is. Another reason for preferring $\mu^*(\varepsilon)$ to $\mu^*(0)$ as an estimate of $E(A|B)$ is that $\mu^*(\varepsilon)$ is a conservative or cautious estimate, since it is a lower bound for a lower probability. Some caution is needed when computing $\mu^*(0)$, because the corresponding linear programming problem is not always bounded and it is possible that $\mu^*(0) = +\infty$. For example, given the single assessment $P(B) = 0$ and an event $A$ that is logically independent of $B$, it can be verified from (9) that the natural extension is $E(A|B) = 0$, but that $\mu^*(0) = +\infty$.
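To make the computation concrete, here is a minimal sketch of Algorithm 3 using a generic LP solver. It is an illustration under stated assumptions, not the authors' implementation: events are coded as 0-1 indicator vectors over a finite possibility space, and the function name and interface are ours. Calling it with eps = 0 computes the upper bound $\mu^*(0)$ instead, which may correspond to an unbounded program, as in the example just given.

    import numpy as np
    from scipy.optimize import linprog

    def natural_extension_lower(A, B, As, Bs, cs, eps=1e-10):
        """Algorithm 3 (sketch): solve LP (9) for a fixed eps.

        A, B       : 0-1 arrays over the possibility space (indicators).
        As, Bs, cs : the assessments P(A_i|B_i) = c_i, with A_i and B_i
                     given as 0-1 arrays; assumed to avoid uniform loss.
        Returns mu*(eps), a lower bound for E(A|B) when eps > 0; with
        eps = 0 it returns the upper bound mu*(0), possibly +inf.
        """
        k = len(cs)
        # Marginal gambles G_i = B_i (A_i - c_i).
        G = np.array([Bs[i] * (As[i] - cs[i]) for i in range(k)])
        # One constraint per omega: sum_i lam_i (G_i + eps B_i) + mu B <= A B.
        A_ub = np.column_stack([(G + eps * np.array(Bs)).T, B])
        b_ub = A * B
        # Maximise mu, i.e. minimise -mu; lam_i >= 0, mu unrestricted.
        c_obj = np.zeros(k + 1)
        c_obj[-1] = -1.0
        res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * k + [(None, None)], method="highs")
        if res.status == 3:            # unbounded LP, so mu*(0) = +inf
            return np.inf
        return -res.fun

Solving once with a small positive eps and once with eps = 0 gives the two bounds $\mu^*(\varepsilon) \le E(A|B) \le \mu^*(0)$ discussed above.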


3.5. An exact algorithm for computing natural extension

As noted in the previous section, whatever value of $\varepsilon$ is used in Algorithm 3, we can construct artificial examples in which Algorithm 3 yields a poor approximation to $E(A|B)$. The following theory leads to an iterative algorithm which always gives the exact value of $E(A|B)$.

As shown earlier, $E(A|B) = \lim_{\varepsilon \downarrow 0} \mu^*(\varepsilon) \le \mu^*(0)$, but if $\mu^*$ is not right-continuous at 0 then $E(A|B) < \mu^*(0)$. The crucial step in an exact algorithm to compute $E(A|B)$ is to find a subset of the assessments, indexed by $I \subseteq \{1, 2, \ldots, k\}$, which determines $E(A|B)$ and for which the function $\mu_I^*$ (defined using only the subset of assessments $I$) is right-continuous at 0, so that $E(A|B) = \mu_I^*(0)$. Then $E(A|B)$ can be constructed from the subset $\{P(A_i|B_i) : i \in I\}$, without using the other assessments, and we can set $\varepsilon = 0$ in the computations. The appropriate set $I$ is identified in the next lemma.

Lemma 5. Let $I$ be the largest subset of $\{1, 2, \ldots, k\}$ with $E(B \mid \bigcup_{i \in I} B_i \cup B) > 0$. (A unique largest subset exists because if two sets $I_1$ and $I_2$ have this property then so does $I_1 \cup I_2$.) Then $E(A|B)$ is the maximum value in the following linear program:

maximise $\mu$ subject to $\lambda_i \ge 0\ (i \in I)$ and
$$\sum_{i \in I} \lambda_i G_i + \mu B \le AB. \qquad (10)$$

Proof. First suppose that $(\mu, \lambda)$ satisfy condition (8) of Lemma 3. Then simple manipulation of (8) shows that $\sup[\sum_{i=1}^k \lambda_i G_i - (S(\lambda) \cup B)(B - \delta) \mid S(\lambda) \cup B] < 0$, for some $\delta > 0$. (This holds if $\mu \ge 0$ in (8). If $\mu < 0$, the same inequality can be obtained by dividing by $1 - \mu$.) This implies that $E(B \mid S(\lambda) \cup B) \ge \delta > 0$. It follows that, in (8), $S(\lambda) \subseteq \bigcup_{i \in I} B_i$, so $I(\lambda) \subseteq I$. This means that any $\lambda$ which satisfies the conditions of Lemma 3 must have $\lambda_i = 0$ whenever $i \notin I$. Since Lemma 3 characterizes the natural extension $E(A|B)$, this implies that $E(A|B)$ can be computed by natural extension from the subset of assessments indexed by $I$.

Let $\mu_I^*(\varepsilon)$ denote the maximum value in the modified linear program that is obtained from (9) by adding the constraints that $\lambda_i = 0$ whenever $i \notin I$. Because this is equivalent to reducing the set of assessments to $I$, and $E(B \mid \bigcup_{i \in I} B_i \cup B) > 0$ by definition of $I$, the sufficient condition for right-continuity of $\mu^*$ in the later Lemma 7 implies that $\mu_I^*$ is right-continuous at 0 and hence that $E(A|B) = \mu_I^*(0)$. This gives the characterization (10).

Lemma 5 shows that, given the set $I$, $E(A|B)$ can be computed exactly from the single linear program (10). To use this result in practice, we need to be able to determine $I$. It is not obvious that the definition of $I$ given in Lemma 5 is useful, because it requires finding the natural extensions $E(B \mid \bigcup_{i \in J} B_i \cup B)$ for various sets $J$. The next lemma gives some other characterizations of $I$ which are more useful, especially (c) which will be used in Algorithm 4.

Lemma 6. The set $I$, defined in Lemma 5, is characterized by each of the following conditions.
(a) $I = \{i : E(B \mid B_i \cup B) > 0,\ i = 1, 2, \ldots, k\}$.
(b) $I$ is the largest subset of $\{1, 2, \ldots, k\}$ for which there is $\lambda \ge 0$ such that $\sup[\sum_{i \in I} \lambda_i G_i \mid (\bigcup_{i \in I} B_i) \cap B^c] < 0$.
(c) $I$ is the largest subset of $\{1, 2, \ldots, k\}$ for which there is $\lambda \ge 0$ such that $\sup[\sum_{i \in I} \lambda_i G_i + \sum_{i \in I} B_i \mid B^c] \le 0$.

Proof. (a) $j \in I$ implies that $E(B \mid B_j \cup B) > 0$, since otherwise $E(B \mid \bigcup_{i \in I} B_i \cup B) \le E(B \mid B_j \cup B) \le 0$, by property 3.2(d)(i) which follows from coherence of the natural extensions. Conversely, suppose that $E(B \mid B_j \cup B) > 0$ and let $J = I \cup \{j\}$. By 3.2(d)(ii), $E(B_j \cup B \mid \bigcup_{i \in J} B_i \cup B) \ge E(B \mid \bigcup_{i \in I} B_i \cup B) > 0$. Also $B \subseteq B_j \cup B \subseteq \bigcup_{i \in J} B_i \cup B$, so it follows, using 3.2(d)(i), that $E(B \mid \bigcup_{i \in J} B_i \cup B) \ge E(B \mid B_j \cup B)\, E(B_j \cup B \mid \bigcup_{i \in J} B_i \cup B) > 0$. This shows that $J \subseteq I$, hence $j \in I$.

(b) By definition, $I$ is maximal such that $E(B \mid \bigcup_{i \in I} B_i \cup B) > 0$. Using Lemma 3, this is equivalent to the existence of $\lambda \ge 0$, $\rho > 0$ and $\delta > 0$ such that $\sum_{i=1}^k \lambda_i G_i - (\bigcup_{i \in I} B_i \cup B)(B - \rho) + \delta(S(\lambda) \cup [\bigcup_{i \in I} B_i] \cup B) \le 0$. On $B$, this is equivalent to $\sum_{i=1}^k \lambda_i G_i + \rho + \delta \le 1$, and on $B^c$ it is equivalent to $\sum_{i=1}^k \lambda_i G_i + \rho \bigcup_{i \in I} B_i + \delta(S(\lambda) \cup \bigcup_{i \in I} B_i) \le 0$. If $(\lambda, \rho, \delta)$ satisfy the condition on $B^c$ then so do $(\kappa\lambda, \kappa\rho, \kappa\delta)$ whenever $\kappa > 0$, and by taking $\kappa$ to be sufficiently small the condition on $B$ can also be satisfied. This shows that the condition on $B$ is redundant. Hence, writing $I(\lambda) = \{i : \lambda_i > 0\}$ and rewriting the condition on $B^c$, $I$ is maximal such that there is $\lambda \ge 0$ with $\sup[\sum_{i \in I(\lambda)} \lambda_i G_i \mid (\bigcup_{i \in I \cup I(\lambda)} B_i) \cap B^c] < 0$. But we can replace $I$ by $I \cup I(\lambda)$ without changing this condition, so the maximal $I$ must contain $I(\lambda)$. Hence we obtain the characterization in (b).

(c) By (b), $I$ is maximal such that $\sup[\sum_{i \in I} \lambda_i G_i + \delta \bigcup_{i \in I} B_i \mid B^c] \le 0$ for some $\lambda \ge 0$ and $\delta > 0$. This is equivalent to (c), as can be seen by multiplying this inequality by $k\delta^{-1}$ and using $k \bigcup_{i \in I} B_i \ge \sum_{i \in I} B_i$.

Except for the conditioning on $B^c$, the inequality $\sup[\sum_{i \in I} \lambda_i G_i + \sum_{i \in I} B_i \mid B^c] \le 0$ in Lemma 6(c) agrees with the inequality (5) for uniform loss. We can therefore apply the iterative method of Algorithm 2 to find the largest set $I$ for which the inequality holds (for some $\lambda \ge 0$), in at most $k$ steps, by successively reducing the set $I$ from its initial value $\{1, 2, \ldots, k\}$ until the inequality is satisfied. This is done in steps (a)-(c) of Algorithm 4, which are essentially the same as Algorithm 2 but with conditioning on $B^c$ in (11). (Thus, identifying the set $I$ to use in computing a natural extension $E(A|B)$ is essentially the same as checking AUL on the reduced possibility space $B^c$.) Lemmas 5 and 6(c) tell us that using the resulting set $I$ in the linear program (10) will determine $E(A|B)$; this is the final step (d) of Algorithm 4. The final set $I$ may be empty, in which case (10) gives the vacuous answer: $E(A|B) = 1$ if $A \supseteq B$, $E(A|B) = 0$ otherwise. This produces the following exact algorithm for computing $E(A|B)$.


Algorithm 4.
(a) Set $I = \{1, 2, \ldots, k\}$.
(b) Maximise $\sum_{i \in I} \beta_i$ subject to $\lambda \ge 0$, $0 \le \beta_i \le 1\ (i \in I)$ and
$$\sup\Big[\sum_{i \in I} \lambda_i G_i + \sum_{i \in I} \beta_i B_i \,\Big|\, B^c\Big] \le 0. \qquad (11)$$
(c) If $\beta_i = 1$ for all $i \in I$ then go to (d). Otherwise, replace $I$ by the subset $\{i \in I : \beta_i = 1\}$. If $I$ is non-empty then return to (b).
(d) Solve the linear program (10). The maximized value of $\mu$ in (10) is the exact value of $E(A|B)$.

At each iteration of (b), $I$ is replaced by its largest subset, $I'$, with the property that $\sup[\sum_{i \in I'} \lambda_i G_i + \sum_{i \in I'} B_i \mid B^c] \le 0$. (As in Algorithm 2, the solution $\beta$ in (b) must have $\beta_i = 0$ or $\beta_i = 1$ for all $i \in I$, and $\beta$ has as many components as possible equal to 1.) The final set $I$, used in (d), satisfies $I' = I$ since $\beta_i = 1$ for all $i \in I$, so $I$ is the set characterized in Lemma 6(c).

Whenever we use Algorithm 4 to compute an unconditional natural extension $E(A)$, the constraint in (11) of the algorithm is trivially satisfied since $B^c = \emptyset$, and consequently $I = \{1, 2, \ldots, k\}$ in step (d). In this case, Algorithm 4 works in a single step by solving the linear program (10) with $I = \{1, 2, \ldots, k\}$. Whenever the more general condition $E(B \mid \bigcup_{i=1}^k B_i \cup B) > 0$ is satisfied, Lemmas 5 and 6 show that Algorithm 4 involves no more than two linear programs and again $I = \{1, 2, \ldots, k\}$ in (10).
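The iteration in steps (a)-(c) and the final program (10) translate directly into code. The sketch below is again our own illustration (hypothetical interface, events as 0-1 indicator vectors), assuming the assessments AUL; it is not the authors' implementation.

    import numpy as np
    from scipy.optimize import linprog

    def natural_extension_exact(A, B, As, Bs, cs):
        """Algorithm 4 (sketch): exact natural extension E(A|B)."""
        k = len(cs)
        G = np.array([Bs[i] * (As[i] - cs[i]) for i in range(k)])
        Bmat = np.array(Bs)
        Bc = np.flatnonzero(B == 0)          # the omegas in B^c
        I = list(range(k))                   # step (a)
        # Steps (b)-(c); when B^c is empty, (11) is trivially satisfied
        # and the full set I is used in step (d).
        while I and len(Bc) > 0:
            m = len(I)
            # Variables (lam_1..lam_m, beta_1..beta_m); maximise sum beta_i.
            c_obj = np.concatenate([np.zeros(m), -np.ones(m)])
            # Pointwise on B^c: sum_i lam_i G_i + sum_i beta_i B_i <= 0.
            A_ub = np.hstack([G[I][:, Bc].T, Bmat[I][:, Bc].T])
            res = linprog(c_obj, A_ub=A_ub, b_ub=np.zeros(len(Bc)),
                          bounds=[(0, None)] * m + [(0, 1)] * m,
                          method="highs")
            beta = res.x[m:]
            # Solutions have beta_i in {0,1}; tolerance guards rounding.
            keep = [I[j] for j in range(m) if beta[j] > 1 - 1e-9]
            if len(keep) == m:
                break                        # all beta_i = 1: go to (d)
            I = keep
        # Step (d): LP (10) over the reduced index set I (vacuous if empty).
        m = len(I)
        c_obj = np.zeros(m + 1)
        c_obj[-1] = -1.0                     # maximise mu
        A_ub = np.column_stack([G[I].T, B])
        res = linprog(c_obj, A_ub=A_ub, b_ub=A * B,
                      bounds=[(0, None)] * m + [(None, None)],
                      method="highs")
        return -res.fun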

3.6. Numerical examples

Example 8. In the football example, the subject wishes to evaluate the probability of the event $A$, that team X wins the tournament. First consider just the precise probability assessments in Table 1. By applying Algorithm 3 or 4 to compute $E(A) = 0.325$ and $E(A^c) = 0.47$, we obtain the lower and upper bounds 0.325 and 0.53 for $P(A)$. These bounds could also be obtained by summing the probabilities of the events $C_i$ that imply $A$ (to get the lower bound), and of those $C_i$ that are consistent with $A$ (to get the upper bound).

Now consider the effect of combining the 10 imprecise probability assessments in Table 2 with those in Table 1. By applying Algorithm 3 or 4 again to compute the natural extensions, we now obtain $E(A) = 0.3955$ and $\overline{E}(A) = 0.4945$. Although the 10 extra assessments are quite imprecise, they substantially reduce the interval $[E(A), \overline{E}(A)]$. Again, the LP algorithms were not really needed here: because of the simple structure of the events involved, the natural extensions could have been calculated through the formulae $E(A) = \sum_{i=1}^{27} P(A|C_i)P(C_i)$ and $\overline{E}(A) = \sum_{i=1}^{27} \overline{P}(A|C_i)P(C_i)$.
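These closed-form bounds are easy to reproduce for any partition $\{C_i\}$ with precise probabilities; the fragment below uses hypothetical numbers, not those of Tables 1 and 2.

    # Envelope bounds for a partition {C_i} with precise probabilities p_i:
    # lower bound sum_i P(A|C_i) p_i, upper bound sum_i upper-P(A|C_i) p_i.
    # The numbers are hypothetical, for illustration only.
    p  = [0.2, 0.5, 0.3]    # P(C_i)
    lo = [0.1, 0.4, 0.0]    # lower conditional probabilities P(A|C_i)
    up = [0.3, 0.7, 0.2]    # upper conditional probabilities
    E_lower = sum(l * q for l, q in zip(lo, p))    # = 0.22
    E_upper = sum(u * q for u, q in zip(up, p))    # = 0.47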


Example 9. Consider the six assessments of precise unconditional probabilities that were given in Example 1. Using Algorithm 3 or 4, we can compute the natural extensions to any conditional or unconditional events. For example, we obtain the natural extensions $E(A_3) = 0.4$ and $\overline{E}(A_3) = 0.8$ concerning $A_3$, and $E(A_4|A_3) = 0.375$ and $\overline{E}(A_4|A_3) = 0.75$ concerning $A_4|A_3$.

3.7. The dual problem, and comparison with probabilistic logic

Algorithm 3 requires the solution of a single linear program (9) with fixed $\varepsilon > 0$. Equivalently, we can solve the dual linear programming problem, which can be formulated as follows. Again we assume that the assessments AUL.

Let $M(\varepsilon)$ denote the set of all unconditional probability measures that assign non-negative expectation to each of the random variables $G_i^*(\varepsilon) = G_i + \varepsilon B_i = B_i(A_i - c_i + \varepsilon)$ $(i = 1, 2, \ldots, k)$. Then $M(\varepsilon)$ is the set of all probability measures $P$ that are compatible with each of the reduced assessments $P(A_i|B_i) = c_i - \varepsilon$, in the sense that $P(A_i \cap B_i) \ge (c_i - \varepsilon)P(B_i)$. As $\varepsilon$ increases, these constraints become weaker and therefore the set $M(\varepsilon)$ is non-decreasing in $\varepsilon$. When $\varepsilon > 0$, $M(\varepsilon)$ contains $M(0)$, the set of compatible probability measures that was defined in Section 2.4, which is non-empty since the assessments ASL. Thus $M(\varepsilon)$ is non-empty whenever $\varepsilon \ge 0$.

The acceptable gambles $G_i^*(\varepsilon)$ $(i = 1, 2, \ldots, k)$ generate a larger set of acceptable gambles $D(\varepsilon) = \{Z : Z \ge \sum_{i=1}^k \lambda_i G_i^*(\varepsilon),\ \lambda \ge 0\}$. It follows from standard duality results, as in (Walley, 1991, Sec. 4.2.1), that $D(\varepsilon)$ can be written in the dual form: $D(\varepsilon) = \{Z : E_P(Z) \ge 0$ whenever $P \in M(\varepsilon)\}$. By applying this duality result to $Z = B(A - \mu)$, we obtain a dual expression for the quantity $\mu^*(\varepsilon)$ that was defined in Section 3.4:
$$\mu^*(\varepsilon) = \max\{\mu : B(A - \mu) \ge \textstyle\sum_{i=1}^k \lambda_i G_i^*(\varepsilon),\ \lambda \ge 0\}$$
$$= \max\{\mu : B(A - \mu) \in D(\varepsilon)\}$$
$$= \max\{\mu : E_P(B(A - \mu)) \ge 0 \text{ whenever } P \in M(\varepsilon)\}$$
$$= \max\{\mu : P(A \cap B) \ge \mu P(B) \text{ whenever } P \in M(\varepsilon)\}$$
$$= \sup\{\mu : P(A|B) \ge \mu \text{ whenever } P(B) > 0,\ P \in M(\varepsilon)\}$$
$$= \inf\{P(A|B) : P(B) > 0,\ P \in M(\varepsilon)\}. \qquad (12)$$

When $\varepsilon > 0$, $\mu^*(\varepsilon) \le E(A|B) \le 1$ by 3.2(b) (using AUL), and hence the set in (12) is non-empty. Equation (12) still holds when $\varepsilon = 0$, but then the set in (12) may be empty and in that case $\mu^*(0) = +\infty$.

As noted in Section 2.4, $M(0)$ is the set of probability measures that satisfy Hailperin's linear constraints. In Hailperin's theory and in probabilistic logic, the assessments are regarded as consistent if and only if $M(0)$ is non-empty, which holds if and only if the assessments ASL. The solution to the inference problem that is adopted in some versions of probabilistic logic, such as (Jaumard et al., 1991), is also based on $M(0)$. There
$$\mu^*(0) = \inf\{P(A|B) : P(B) > 0,\ P \in M(0)\} \qquad (13)$$

is taken to be the best possible lower bound for the values of $P(A|B)$ that are consistent with the assessments, whenever the set in (13) is non-empty. Assuming AUL, the set in (13) is non-empty (i.e., $\mu^*(0)$ is finite) if and only if $\overline{E}(B) > 0$. (To prove that, use Corollary 1 in Section 3.8 to show that $\overline{E}(B) = 1 - E(B^c|\Omega) = 1 - \inf\{P(B^c) : P \in M(0)\} = \sup\{P(B) : P \in M(0)\}$.) Formula (13) was proposed in Walley (1991) as a way of constructing conditional probabilities from unconditional ones; there it was called the regular extension to distinguish it from the natural extension.

Thus, one version of probabilistic logic proposes $\mu^*(0)$ as the solution to the inference problem, instead of the natural extension $E(A|B)$ proposed here. The two solutions often differ: generally $E(A|B) = \lim_{\varepsilon \downarrow 0} \mu^*(\varepsilon) \le \mu^*(0)$, but often $E(A|B) < \mu^*(0)$ because $\mu^*$ is discontinuous at 0. Conditions under which the two solutions agree are studied in Section 3.8, and examples where they disagree are given in Examples 4, 10, 11 and 13.

The probabilistic logic solutions to the consistency and inference problems actually deviate from our solutions in opposite directions. When the two approaches give different answers to the consistency problem, the set of compatible probability measures in probabilistic logic, $M(0)$, is too large, because effectively it ignores those assessments $P(A_i|B_i)$ for which the conditioning event $B_i$ can be assigned probability zero, and the resulting concept of consistency (ASL) is too weak. But when the two approaches give different inferences concerning $A|B$, probabilistic logic computes them from a set that is too small, obtained by removing from $M(0)$ all the probability measures with $P(B) = 0$, and the resulting upper and lower bounds for $P(A|B)$ are too tight. Although the inferences in probabilistic logic are equivalent to setting $\varepsilon = 0$ in the definition of natural extension (7), the definition of consistency in probabilistic logic is not equivalent to setting $\varepsilon = 0$ in the definition of AUL (1).

Equation (12) gives an alternative method of computing $E(A|B)$, by solving a linear program that is dual to the one solved in Algorithm 3. To obtain this dual program, let $\Omega = \{\omega_1, \ldots, \omega_s\}$ denote the possibility space. Each unconditional probability measure $P$ for which $P(B) > 0$ corresponds to a vector $(y_1, \ldots, y_s)$ with $y_j \ge 0$ $(j = 1, \ldots, s)$ and $\sum_{j=1}^s B(\omega_j)y_j = 1$, where $y_j$ is determined by $P$ through $y_j = P(\omega_j)/P(B)$. Then $P(A|B) = P(A \cap B)/P(B) = \sum_{j=1}^s A(\omega_j)B(\omega_j)y_j$, and the linear constraints that define $M(\varepsilon)$ correspond to linear constraints on the variables $y_j$. Hence, $\mu^*(\varepsilon)$ is the infimum value of the objective function in the following linear program:
$$\text{minimize} \quad \sum_{j=1}^s A(\omega_j)B(\omega_j)y_j \qquad (14)$$
$$\text{subject to} \quad \sum_{j=1}^s B_i(\omega_j)[A_i(\omega_j) - c_i + \varepsilon]\,y_j \ge 0 \quad (i = 1, \ldots, k), \qquad (15)$$
$$\sum_{j=1}^s B(\omega_j)y_j = 1, \quad \text{and} \quad y_j \ge 0 \quad (j = 1, \ldots, s). \qquad (16)$$
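For illustration, the dual program (14)-(16) can be set up in the same style as the earlier sketches (again our hypothetical interface, not the authors' code); infeasibility of the program corresponds to an empty set in (12), i.e. $\mu^*(\varepsilon) = +\infty$.

    import numpy as np
    from scipy.optimize import linprog

    def natural_extension_dual(A, B, As, Bs, cs, eps=1e-10):
        """Sketch of the dual program (14)-(16) for mu*(eps), assuming the
        assessments AUL; indicators are 0-1 numpy arrays."""
        k = len(cs)
        obj = A * B                              # (14): minimise sum A B y
        # (15): sum_j B_i (A_i - c_i + eps) y_j >= 0, written as <= 0
        # after a sign change, as the solver expects.
        A_ub = -np.array([Bs[i] * (As[i] - cs[i] + eps) for i in range(k)])
        b_ub = np.zeros(k)
        A_eq = B.reshape(1, -1)                  # (16): sum_j B y_j = 1
        b_eq = np.array([1.0])
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * len(A), method="highs")
        # Infeasible: the set in (12) is empty, so mu*(eps) = +inf.
        return res.fun if res.success else np.inf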


The value of $\mu^*(0)$, which is used to make inferences in probabilistic logic, can be computed by setting $\varepsilon = 0$ in the dual linear program (14)-(16), as in Jaumard et al. (1991). The (generally different) value of the natural extension $E(A|B)$ can be computed, to a very close approximation, by using a very small positive $\varepsilon$ in the dual program.

The dual formulation of the problem is closely related to the algorithm of Pelessoni and Vicig (1998). This algorithm computes the exact value of $E(A|B)$ in a finite sequence of steps, each involving one LP problem, except for the last step which involves two LP problems. When the algorithm terminates in just one step, the second LP problem is exactly the problem (14)-(16) with $\varepsilon = 0$. When it requires more than one step, the algorithm sequentially eliminates those of the initial assessments which are not essential in computing $E(A|B)$, solving at the last step a problem analogous to (14)-(16) with $\varepsilon = 0$ that involves only the reduced set of assessments.

3.8. Continuity of $\mu^*$ at zero

We have seen that inferences in probabilistic logic may be made from $\mu^*(0)$, and that $\mu^*(0)$ agrees with the natural extension $E(A|B)$ if and only if $\mu^*$ is right-continuous at 0. In that case, $E(A|B)$ can be computed exactly by setting $\varepsilon = 0$ in either (9) or the dual (14)-(16), and solving a single linear program. It is therefore important to investigate when $\mu^*$ is right-continuous at 0. The next example shows that this does not always happen.

Example 10. As in Example 4(b), suppose that the only assessment is $P(A^c \cap B) = 0$, which is equivalent to $P(A \cup B^c) = 1$, where $\emptyset \subset A \subset B \subset \Omega$, and we wish to compute $E(A|B)$. By (9), $\mu^*(\varepsilon)$ is the maximum value of $\mu$ such that $B(A - \mu) \ge \lambda(G + \varepsilon)$ for some $\lambda \ge 0$, where $G = (A \cup B^c) - 1 = -(A^c \cap B)$. By considering the values of this expression on $B^c$, $A^c \cap B$ and $A$, we obtain the three inequalities:
$$0 \ge \lambda\varepsilon, \qquad -\mu \ge \lambda(-1 + \varepsilon), \qquad 1 - \mu \ge \lambda\varepsilon. \qquad (17)$$

Because $\lambda \ge 0$ and $\varepsilon > 0$, the first inequality gives $\lambda = 0$ and then the second inequality gives $\mu \le 0$. Hence we obtain $\mu^*(\varepsilon) = 0$ for all $\varepsilon > 0$, giving the natural extension $E(A|B) = 0$. But setting $\varepsilon = 0$ in (17) gives $\mu \le \lambda$ and $\mu \le 1$, hence $\mu^*(0) = 1$. Thus $\mu^*$ has a large discontinuity at 0. Here $\overline{E}(A|B) = 1$, and $B$ has positive upper probability $\overline{E}(B) = 1$. According to our solution, the lower and upper bounds for $P(A|B)$ are the vacuous bounds 0 and 1: the assessment is completely uninformative about $P(A|B)$. But according to probabilistic logic, the assessment implies the precise probability $P(A|B) = 1$.
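Example 10 is small enough to be checked numerically. The fragment below assumes the hypothetical natural_extension_lower sketch of Section 3.4 is in scope, and reproduces the discontinuity of $\mu^*$ at 0.

    import numpy as np

    # Possibility space {w1, w2, w3}, with A = {w1} and B = {w1, w2}; the
    # single assessment P(A u B^c) = 1, i.e. P(A^c & B) = 0.
    A  = np.array([1, 0, 0])
    B  = np.array([1, 1, 0])
    A1 = np.array([1, 0, 1])   # A u B^c
    B1 = np.array([1, 1, 1])   # conditioning event: the sure event
    print(natural_extension_lower(A, B, [A1], [B1], [1.0], eps=1e-10))  # 0.0
    print(natural_extension_lower(A, B, [A1], [B1], [1.0], eps=0.0))    # 1.0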

Next we give sufficient conditions for $\mu^*$ to be right-continuous at 0.

Lemma 7. Assume that the assessments AUL. Then a sufficient condition for $E(A|B) = \mu^*(0)$, i.e., for right-continuity of $\mu^*$ at 0, is that $E(B \mid \bigcup_{i=1}^k B_i \cup B) > 0$. Hence it is sufficient that $E(B) > 0$.


Proof. Here we cannot simply apply the sufficient conditions given in the operations research literature (Bereanu, 1976) for continuity of $\mu^*$, because the optimality region and feasible region, in the variables $\mu$ and $\lambda$, are not always bounded. First assume that $\mu^*(0)$ is finite, and write $\tau = \mu^*(0)$. In that case, the supremum $\tau$ is achieved, and hence there is $\lambda^1 \ge 0$ such that $B(A - \tau) \ge \sum_{i=1}^k \lambda^1_i G_i$. Assuming that $E(B \mid \bigcup_{i=1}^k B_i \cup B) > 0$, there are $\lambda^2 \ge 0$ and $\rho > 0$ such that $B - \rho(\bigcup_{i=1}^k B_i \cup B) \ge \sum_{i=1}^k \lambda^2_i G_i$. Given any $\delta > 0$, let $\varepsilon = k^{-1}\delta\rho / \max\{\lambda^1_i + \delta\lambda^2_i : i = 1, \ldots, k\}$, so $\varepsilon > 0$. Then $B(A - \tau + \delta) \ge \sum_{i=1}^k (\lambda^1_i + \delta\lambda^2_i)G_i + \delta\rho(\bigcup_{i=1}^k B_i) \ge \sum_{i=1}^k (\lambda^1_i + \delta\lambda^2_i)(G_i + \varepsilon B_i)$. (This holds also if all values of $\lambda^1_i$ and $\lambda^2_i$ are zero, since then the last term is zero for all $\varepsilon > 0$.) By definition of the natural extension, $E(A|B) \ge \tau - \delta$. Since $\delta$ is arbitrarily small, $E(A|B) \ge \tau$, and it follows that $E(A|B) = \tau = \mu^*(0)$ since always $E(A|B) \le \mu^*(0)$. Thus $\mu^*$ is right-continuous at zero. The same argument shows that $\mu^*(0)$ must be finite, because otherwise $\tau$ can be chosen to be arbitrarily large and we obtain $E(A|B) \ge \tau$, which contradicts $E(A|B) \le 1$. Using 3.2(d), the second statement in the lemma follows from the coherence property $E(B \mid \bigcup_{i=1}^k B_i \cup B) \ge E(B|\Omega) = E(B)$.

When the sufficient condition $E(B \mid \bigcup_{i=1}^k B_i \cup B) > 0$ in Lemma 7 is satisfied, Algorithm 4 computes the natural extension $E(A|B)$ by solving two linear programs: the first linear program effectively verifies that the sufficient condition holds, and the second linear program computes $\mu^*(0)$.

Compare the sufficient condition $E(B) > 0$ in Lemma 7 with the weaker condition $\overline{E}(B) > 0$, which is necessary for continuity of $\mu^*$ at 0 (assuming that the assessments AUL). In fact, if this weaker condition fails then $\mu^*(0) = \infty$. (The condition $\overline{E}(B) > 0$ was omitted by mistake from Walley (1996, Eq. 1).) In the special case where the assessments AUL and a precise probability $P(B)$ is assessed, or is precisely determined by the assessments, it is necessary and sufficient for right-continuity of $\mu^*$ at 0 that $P(B) > 0$, i.e., that $\mu^*(0)$ is finite.

To check either condition in Lemma 7, we must compute a natural extension. The following stronger condition is much easier to verify.

Corollary 1. Assume that the assessments AUL. A sufficient condition for right-continuity of $\mu^*$ at 0 is that $B$ contains $\bigcup_{i=1}^k B_i$. Hence it is sufficient that $B = \Omega$.

Thus, the computation of an unconditional natural extension $E(A)$ requires just the single linear program (9), or its dual (14)-(16), with $\varepsilon = 0$.

Example 11. Consider again the six assessments in Example 1. Let $A = (A_1 \cap A_2)^c$ and $B = (A_1 \cap A_2) \cup (A_1^c \cap A_2^c)$. Here we investigate the continuity of $\mu^*$ at zero, for three natural extensions: $E(A)$, $E(B)$, and $E(A|B)$. To study the behaviour of $\mu^*$, we solved the LP problem (9) for several values of $\varepsilon$, including zero. The results are shown in Table 3.


Table 3
Values of $\mu^*(\varepsilon)$ for several values of $\varepsilon$ in Example 1, in computing three natural extensions, where $A = (A_1 \cap A_2)^c$ and $B = (A_1 \cap A_2) \cup (A_1^c \cap A_2^c)$

    $\varepsilon$    $A|\Omega$    $B|\Omega$    $A|B$
    $10^{-1}$        0.8           0             0
    $10^{-2}$        0.98          0             0
    $10^{-3}$        0.998         0             0
    $10^{-4}$        0.9998        0             0
    $10^{-5}$        0.99998       0             0
    $10^{-6}$        0.999998      0             0
    $10^{-7}$        0.9999998     0             0
    0                1             0             1

First consider the results for $A|\Omega$. The second column of the table shows that $E(A) = 1$ and that $\mu^*$ is continuous at zero, which also follows from Corollary 1 since this is an unconditional natural extension. The values of $\mu^*(\varepsilon)$ for small $\varepsilon$ are very good approximations to $E(A)$. Here $E(A) = 1$ implies that $\overline{E}(A_1 \cap A_2) = 0$, so every coherent extension of the assessments must have $P(A_1 \cap A_2) = 0$. The third column of the table shows that $E(B) = 0$ and again, by Corollary 1, $\mu^*$ is continuous at zero. In both these cases it would have been sufficient to compute $\mu^*(0)$.

The third example, concerning $A|B$, involves conditioning on an event of lower probability zero. From the fourth column of Table 3, the result is $E(A|B) = 0$, which means that $\overline{E}(A_1 \cap A_2 \mid B) = 1$. Here $\mu^*$ has a large discontinuity at 0. Natural extension gives 0 as the lower bound for $P(A|B)$, whereas probabilistic logic gives the very different lower bound $\mu^*(0) = 1$.

3.9. Checking coherence of imprecise probability assessments

By 3.2(d), the lower probability assessments $P(A_i|B_i) = c_i$ $(i = 1, \ldots, k)$ are coherent if and only if $E(A_i|B_i) = c_i$ for $i = 1, \ldots, k$. To check coherence of the assessments, it therefore suffices to compute the natural extensions $E(A_i|B_i)$ $(i = 1, \ldots, k)$. This involves $k$ problems of the form (9), one for each assessment. Using Algorithm 3, it requires solving $k$ linear programs.

This method of checking coherence can be simplified slightly, using result 3.2(c) that $E(A_i|B_i) \ge P(A_i|B_i)$ for $i = 1, \ldots, k$. Hence it suffices to check that $E(A_i|B_i) \le c_i$ for $i = 1, \ldots, k$. Using the definition of natural extension, this is true if and only if the system of inequalities $\lambda \ge 0$, $\varepsilon > 0$ and $G_j - \varepsilon B_j \ge \sum_{i=1}^k \lambda_i(G_i + \varepsilon B_i)$ has no solution $(\lambda, \varepsilon)$, for each $j = 1, \ldots, k$. Again this can be determined in practice by solving $k$ linear programs. If both precise and imprecise (lower probability) assessments are made, to check coherence of the whole system it suffices to check AUL of the system and to check that $E(A_i|B_i) \le P(A_i|B_i)$ for each of the lower probability assessments.

Example 12. Consider the assessments for the football example in Tables 1 and 2, which are equivalent to a system of 64 lower probability assessments. To check the coherence of this system, (a) we used Algorithm 1 to verify that the assessments AUL, and (b) we used Algorithm 3 to compute the natural extensions $E(A|C_i)$ and $E(A^c|C_i)$ corresponding to each of the lower and upper probabilities in Table 2. For (b), we can set $\varepsilon = 0$ in each application of Algorithm 3, since we know from Table 1 that each conditioning event $C_i$ has positive probability. We cannot set $\varepsilon = 0$ in step (a), because precise probability assessments are involved and therefore (1) has solutions for $\varepsilon = 0$ (see the last paragraph of Section 2.4). We find that all the natural extensions agree with the corresponding upper or lower probabilities in Table 2, and therefore the 64 assessments are coherent.

If the assessments AUL then their natural extensions $\{E(A_i|B_i) : i = 1, \ldots, k\}$ are always coherent. If the assessments AUL but are not coherent, then their natural extensions can be regarded as coherent 'corrections' of the assessments, in the sense that at least one lower probability assessment is increased (or an upper probability decreased) to achieve coherence. In fact, the natural extensions make the minimal corrections of this type to achieve coherence.

Example 13. Consider the 10 assessments of conditional upper and lower probabilities for the football example, given in Table 2, plus one further assessment that $P(A|B) = 0.625$, where $A$ and $B$ are defined in Example 7. We can check that the 11 assessments AUL, by using Algorithm 1 or checking that (1) has no solution for $\varepsilon = 0$. But we find, using Algorithm 3 or 4, that the natural extension $\overline{E}(A|(L,W,L)) = 0.625$, which differs from the assessment $P(A|(L,W,L)) = 1$. Thus the 11 assessments are incoherent. Reducing the assessment to $P(A|(L,W,L)) = 0.625$ does achieve coherence.

Another way of achieving coherence in this example is to increase the assessment of $P(A|B)$ slightly, from 0.625 to 0.65. It can be verified, using the same procedure, that these 11 assessments are coherent. In particular, we now obtain $\overline{E}(A|(L,W,L)) = P(A|(L,W,L)) = 1$. Here $\overline{E}(A|(L,W,L))$ is computed via $E(A^c|(L,W,L))$, which produces another example of a function $\mu^*$ that is discontinuous at zero: $E(A^c|(L,W,L)) = \lim_{\varepsilon \downarrow 0} \mu^*(\varepsilon) = 0$ but $\mu^*(0) = 0.35$. Here the conditioning event $(L,W,L)$ does have positive upper probability, and in fact $\overline{E}((L,W,L)) = 1$. Hence, probabilistic logic obtains the upper bound $1 - \mu^*(0) = 0.65$ for $P(A|(L,W,L))$, whereas natural extension gives the upper bound $\overline{E}(A|(L,W,L)) = 1$.
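Before concluding, we note that the coherence check of Section 3.9 is straightforward to script. The sketch below builds on the hypothetical natural_extension_exact of Section 3.5 (again our illustration, not the authors' code) and assumes the assessments have already been verified to AUL.

    def check_coherence(As, Bs, cs, tol=1e-9):
        # Lower-probability assessments that AUL are coherent iff no natural
        # extension strictly exceeds the corresponding assessment; see
        # 3.2(c)-(d). tol guards against floating-point error.
        for j in range(len(cs)):
            e = natural_extension_exact(As[j], Bs[j], As, Bs, cs)
            if e > cs[j] + tol:
                # The minimal coherent correction raises assessment j to e.
                return False, j, e
        return True, None, None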

4. Conclusions

We have shown that the problems of checking the consistency of finitely many conditional probability or lower probability assessments, and making inferences from them, can be formulated as special kinds of parametric linear programming problems. In practice, each problem can be solved through a single linear program, by using Algorithm 1 to check consistency or Algorithm 3 to make inferences, with a sufficiently small positive value of $\varepsilon$. Unlike earlier attempts to solve the problems, we have formulated them directly in terms of the assessments, without introducing unknown probabilities. Our results show that general versions of the consistency and inference problems, where conditional probability assessments may be imprecise and any of the conditioning events may have probability zero, are both conceptually simpler (illustrated by the simple, explicit formulae for consistency (1) and inference (7)) and computationally simpler (using the one-step linear programs involved in Algorithms 1 and 3) than was previously recognized.

The versions of the consistency and inference problems studied here are quite general, but there is scope for further generalization in the following respects.

(a) From probabilities to previsions: Instead of conditional lower probabilities, we can allow any assessments of conditional lower previsions (or expectations) $P(X_i|B_i)$, where $X_i$ is a simple random variable (i.e., one that has only finitely many possible values). This formulation is more general, because the lower probability of an event $A$ can be identified with the lower prevision of its indicator function. Similarly, we may need to calculate the natural extension of the assessments to a lower prevision $E(X|B)$. Results in Walley (1991) show that this more general problem can be solved by the methods described in this paper: the definitions of AUL and natural extension can be extended from events to simple random variables by replacing $A_i$ by $X_i$ in (1) and (7), and replacing $A$ by $X$ in (7) (Walley, 1997).

(b) Independence judgements: The consistency and inference problems need to be generalized to allow judgements of independence or conditional independence (Walley, 1991, 1996), as well as conditional probability assessments.

(c) Infinitely many assessments: The problems need to be generalized to allow infinitely many assessments of conditional lower probabilities or previsions, e.g., to handle statistical problems where either the parameter space or sample space is infinite. This introduces extra complications because the definitions of AUL and natural extension must be generalized to include conglomerative conditions, as in Walley (1991, Chs. 6-8).

(d) Very large LP problems: Problems of similar size to the examples in this paper can be solved by commonly used optimization programs. However, when the number of assessments is large, the possibility space $\Omega$ that they generate (as outlined in Section 1.3) may be so large that the LP problems become intractable. In such cases, it may be useful to combine our algorithms with 'row generation' methods that enable us to solve the LP problems without specifying $\Omega$ explicitly. Such methods are dual to the 'column generation' methods of Jaumard et al. (1991), which make the LP problems computationally tractable even when many assessments are involved.

Acknowledgements

Peter Walley wishes to thank Dipartimento di Matematica Applicata 'B. de Finetti', Università di Trieste, for supporting this research project during a visit to Trieste in 1998, and also FAPESP and Escola Politécnica, Universidade de São Paulo, and Departamento de Ciencias de la Computación e Inteligencia Artificial, Universidad de Granada, for support in the later stages of the project. We are especially grateful to Fabio Cozman for comments on an earlier version of this paper (Walley, Pelessoni and Vicig, 1999), and to the referees for suggesting some improvements to the presentation.


References

Bereanu, B., 1976. The continuity of the optimum in parametric programming and application to stochastic programming. J. Optim. Theory Appl. 18, 319-333.
Biazzo, V., Gilio, A., Lukasiewicz, T., Sanfilippo, G., 2001. Probabilistic logic under coherence, model-theoretic probabilistic logic, and default reasoning. In: Benferhat, S., Besnard, P. (Eds.), Lecture Notes in Artificial Intelligence, Vol. 2143. Springer, Berlin, pp. 290-302.
Boole, G., 1854a. An Investigation of the Laws of Thought, on which are Founded the Mathematical Theories of Logic and Probabilities. Macmillan, London. Reprinted in 1958 by Dover, New York.
Boole, G., 1854b. On the conditions by which solutions of questions in the theory of probabilities are limited. London Edinburgh Dublin Philos. Magazine J. Sci. 4 (8), 91-98.
Bruno, G., Gilio, A., 1980. Applicazione del metodo del simplesso al teorema fondamentale per le probabilità nella concezione soggettiva. Statistica 40, 337-344.
Coletti, G., 1994. Coherent numerical and ordinal probabilistic assessments. IEEE Trans. Systems Man Cybernet. 24, 1747-1754.
Coletti, G., Scozzafava, R., 1996. Characterization of coherent conditional probabilities as a tool for their assessment and extension. Internat. J. Uncertainty Fuzziness Knowledge-Based Systems 4, 103-127.
Dinkelbach, W., 1969. Sensitivitätsanalysen und Parametrische Programmierung. Springer, Berlin.
Fagin, R., Halpern, J.Y., Megiddo, N., 1990. A logic for reasoning about probabilities. Inform. Comput. 87, 78-128.
de Finetti, B., 1930. Problemi determinati e indeterminati nel calcolo delle probabilità. Rend. Reale Accademia dei Lincei 6, 367-373.
de Finetti, B., 1974. Theory of Probability, Vol. 1. Wiley, London.
Freund, R.M., 1985. Postoptimal analysis of a linear program under simultaneous changes in matrix coefficients. Math. Programming Stud. 24, 1-13.
Frisch, A.M., Haddawy, P., 1994. Anytime deduction for probabilistic logic. Artificial Intelligence 69, 93-122.
Gal, T., 1984. Linear parametric programming—a brief survey. Math. Programming Stud. 21, 43-68.
Gale, D., 1960. The Theory of Linear Economic Models. McGraw-Hill, New York.
Gilio, A., 1995. Algorithms for precise and imprecise conditional probability assessments. In: Coletti, G., Dubois, D., Scozzafava, R. (Eds.), Mathematical Models for Handling Partial Knowledge in Artificial Intelligence. Plenum Press, New York, pp. 231-254.
Gilio, A., 1996. Algorithms for conditional probability assessments. In: Berry, D.A., Chaloner, K.M., Geweke, J.K. (Eds.), Bayesian Statistics and Econometrics. Wiley, New York, pp. 29-39.
Hailperin, T., 1965. Best possible inequalities for the probability of a logical function of events. Amer. Math. Monthly 72, 343-359.
Hailperin, T., 1986. Boole's Logic and Probability, second enlarged edition. Studies in Logic and the Foundations of Mathematics, Vol. 85. North-Holland, Amsterdam.
Hansen, P., Jaumard, B., 1996. Probabilistic satisfiability. Research Report G-96-31. Les Cahiers du GERAD, Montréal.
Hansen, P., Jaumard, B., Poggi de Aragão, M., 1995. Boole's conditions of possible experience and reasoning under uncertainty. Discrete Appl. Math. 60, 181-193.
Holzer, S., 1985. On coherence and conditional prevision. Boll. Un. Mat. Ital. Serie VI (IV-C), 441-460.
Jaumard, B., Hansen, P., Poggi de Aragão, M., 1991. Column generation methods for probabilistic logic. ORSA J. Comput. 3, 135-148.
Jeroslow, R.G., 1973. Asymptotic linear programming. Oper. Res. 21, 1128-1141.
Lad, F., 1996. Operational Subjective Statistical Methods. Wiley, New York.
Lad, F., Dickey, J.M., Rahman, M.A., 1990. The fundamental theorem of prevision. Statistica 50, 19-38.
Nilsson, N.J., 1986. Probabilistic logic. Artificial Intelligence 28, 71-87.
Nilsson, N.J., 1993. Probabilistic logic revisited. Artificial Intelligence 59, 39-42.
Paass, G., 1988. Probabilistic logic. In: Smets, P., Mamdani, A., Dubois, D., Prade, H. (Eds.), Non-Standard Logics for Automated Reasoning. Academic Press, London, pp. 213-251.
Pelessoni, R., Vicig, P., 1998. A consistency problem for imprecise conditional probability assessments. In: Proceedings of IPMU'98, Vol. 2. E.D.K., Paris, pp. 1478-1485.


Vicig, P., 1996. An algorithm for imprecise conditional probability assessments in expert systems. In: Proceedings of IPMU'96, Vol. 1. Proyecto Sur de Ediciones, Granada, pp. 61-66.
Vicig, P., 1997. Upper and lower bounds for coherent extensions of conditional probabilities given on finite sets. Research Report 7/97. Dip. Mat. Appl. 'B. de Finetti', Univ. di Trieste, Trieste.
Walley, P., 1991. Statistical Reasoning with Imprecise Probabilities. Chapman & Hall, London.
Walley, P., 1996. Measures of uncertainty in expert systems. Artificial Intelligence 83, 1-58.
Walley, P., 1997. Coherent upper and lower previsions. Unpublished manuscript.
Walley, P., 1998. An introduction to the theory of natural extension. Unpublished manuscript.
Walley, P., Pelessoni, R., Vicig, P., 1999. Direct algorithms for checking coherence and making inferences from conditional probability assessments. Research Report 6/99. Dip. Mat. Appl. 'B. de Finetti', Univ. di Trieste, Trieste.
Williams, P.M., 1975. Notes on conditional previsions. Research Report. School of Math. and Phys. Science, University of Sussex.