Reconciliation of Multiple Probability Assessments


Organizational Behavior and Human Performance, 28, 395-414 (1981)

ANTHONY N. S. FREELING

Decision Science Consortium, Inc., Falls Church, Virginia 22043

AND

St. John's College, Cambridge, England

This paper addresses the problem of a probability assessor who has, directly or indirectly, given two different numbers for the probability of one event. In order to use these assessments in, for example, a decision analysis, the two numbers must be reconciled to give one value as his probability. The previous work on this subject by D. V. Lindley, A. Tversky, and R. V. Brown (Journal of the Royal Statistical Society, Series A, 1979, 142, 146-180) and R. V. Brown and D. V. Lindley (Theory and Decision, to appear, 1981) is explored. In particular, the paper by Lindley et al. is summarized and discussed in the light of its practicability. It is shown that the method of reconciliation they have proposed is formally equivalent to taking a weighted average of log-odds, with weights proportional to the independent information content of each assessment. This method has the advantage of being simple in application. It is further argued that the motivation for taking multiple probability assessments is an attempt to obtain more information from the subject, and that the proposed method of reconciliation captures the essence of this motivation. The relationship between this research and the well-known "expert use" problem is explored.

Contents. 1. Introduction. 2. The problem, and the motivation for studying it. 3. A mathematical formulation of the problem. 4. The Bayesian updating approach. 5. Difficulties with the Bayesian updating approach. 6. The least-squares approach. 7. The importance of correlations. 8. A psychological interpretation of the metric. 9. Least squares and a weighted average. 10. An information-oriented approach. 11. Quantification of information. 12. Expert use. 13. An alternative approach to the problem. 14. Summary and conclusions.

1. INTRODUCTION

This paper presents some research following on from the work of Lindley, Tversky, and Brown (1979) and Brown and Lindley (1981) on the reconciliation of incoherent judgments.

* The work described in this paper was performed for the Office of Naval Research/Naval Analysis Program under Contract Number N00014-79-C-0209 for Decision Science Consortium, Inc. The author would like to thank Dr. R. V. Brown, Dr. M. S. Cohen, and Dr. S. R. Watson for many invaluable comments and discussions during the course of this work, and an anonymous reviewer for comments upon an earlier draft of the manuscript. He also wishes to thank Miss P. Cambell and Ms. J. Lauck for their endless patience in preparing typewritten versions of nearly illegible scribblings! Requests for reprints should be sent to Decision Science Consortium, Inc., Suite 421, 7700 Leesburg Pike, Falls Church, VA 22043.


We summarize the paper by Lindley, Tversky, and Brown (LTB), make some criticisms of their work pertaining to the practicality of their proposed methodology, and then present an extension of their ideas which is both practicable and intuitively appealing.

2. THE PROBLEM, AND THE MOTIVATION FOR STUDYING IT

When a subject, S, is asked to produce judgments (about his utilities, his probabilities, or even his preferred decisions) in a variety of different ways, it is quite likely that his responses will contradict each other in some way, or fail to satisfy the computational constraints imposed by the probability calculus. Perhaps the most obvious example occurs when a decision analysis is carried out and the selected alternative differs from the option that S had chosen via direct introspection. Another example occurs when S produces numbers for the probability of a "target event" A, P(A), in two different ways, and the two numbers differ.

To provide a concrete example of this latter situation, suppose I am interested in the target event that Oxford University will win their next annual rowing race with Cambridge University. Taking all things into consideration, I decide that this has probability .7. Then I decide to make the assessment by conditioning on Oxford winning the toss. I decide that I believe they have a chance of .8 if they win the toss, but only .5 if they lose it. Assuming the probability of winning the toss to be .5, these latter two assessments imply a probability for the target event of .8 × .5 + .5 × .5 = .65, and hence I have caught myself in an inconsistency.

Typically in such a situation, one of the judgments will be considered "better" than the other, and the other judgment simply ignored. So, for example, when a decision analysis has been performed for a decision, it is usually assumed that one should trust the analysis more than a nonanalytical, intuitive judgment. Again, if one has arrived at the probability of a target variable in two ways, once directly (holistic assessment) and once by "extending the conversation" to another, relevant, event and obtaining probabilities conditional on that event (decomposed assessment), the decomposed assessment will typically be used and the holistic assessment disregarded. In fact, this selection is often made implicitly, before any assessments are made, and only a "minimally specified" set of judgments is taken, e.g., only the decomposed assessment, so that there is no chance for incoherence to be discovered.

However, I may have a strong gut feeling that a decision analysis failed adequately to capture all my opinions about a decision, and that my direct choice really did have something to offer. Similarly, I feel that .65 is too low for the probability of Oxford winning, and that my holistic assessment may have captured aspects of my uncertainty left untapped by the decomposed assessment.


Had I not tried multiple approaches to the elicitation, I would not have discovered this. The fundamental thesis of this paper is that there is something to be gained, in terms of digging into S's psychological field, by pursuing several alternative methods of eliciting the same judgment. For further discussion of this motivation, see Brown and Lindley (1981) and Freeling (1980a). This paper looks only at inconsistent probability assessments, although this is just a small part of the much wider field of inconsistent judgment.

At present, if multiple assessments are used and inconsistency discovered, S will typically have the inconsistency pointed out to him and be requested to perform the reconciliation informally. The research described here attempts to provide a theoretical basis leading to a practical technique for a formal reconciliation procedure. Such a theoretical basis is desirable to aid S in his reconciliation. Perhaps more importantly, a theoretical foundation will raise these procedures to the same level of credibility and defensibility as the rest of decision analysis, and will, we hope, cause practicing decision analysts to view this seeking out of inconsistency as an integral part of a good decision analysis.

3. A MATHEMATICAL FORMULATION OF THE PROBLEM

We suppose there is a subject S, whose probabilities qᵢ, i = 1, …, n, we have elicited. These probabilities will typically be inconsistent. Our aim is to provide a reconciled set of probabilities πᵢ, i = 1, …, n, which satisfy the constraints specified by the probability calculus; these can be stated in the form fⱼ(π₁, …, πₙ) = 0, j = 1, …, m. We shall use vector notation for simplicity, in which case the constraints can be stated as f(π) = 0. (Throughout this paper, bold type is used to indicate vectors and matrices.)

As an example, suppose S provides probabilities for an event A and for its complement, ¬A, each of .4. Then q₁ = .4, q₂ = .4, and the single coherence constraint is π₁ + π₂ = 1. This can be viewed geometrically in Fig. 1. The assessed q is the point (.4, .4), and this is incoherent as it does not lie on the constraint set represented by the line π₁ + π₂ = 1. Our reconciliation task is to find the one point on the constraint set which is in some sense "the best." This is π̂.


[Figure 1: the assessed point q = (.4, .4) plotted against axes P(A) and P(¬A), together with the constraint line π₁ + π₂ = 1.]

FIG. 1. A graphical illustration of incoherence.
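To make the figure concrete: under an ordinary Euclidean metric (anticipating the least-squares machinery of Section 6, with the weight matrix taken as the identity, an assumption of this illustration rather than of the paper), the reconciliation of q = (.4, .4) is simply the nearest point on the constraint line:

\[
\min_{\pi_1 + \pi_2 = 1} \; (0.4 - \pi_1)^2 + (0.4 - \pi_2)^2
\quad\Longrightarrow\quad
\hat{\pi} = (0.5,\ 0.5).
\]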

4. THE BAYESIAN UPDATING APPROACH

In Lindley et al. (1979), hereafter referred to as LTB, the authors use Bayesian updating to arrive at π̂. They assume that, lodged somewhere inside S's head, there exist some "true," consistent probabilities, π, which S can access only with partial success. The assessed qᵢ are viewed as readings on π, together with a measurement error. LTB then introduce the concept of a coherent investigator N, who provides information about this measurement error. They suggest two alternative Bayesian procedures to arrive at π̂. Both procedures require the following three probability distributions from N:

(i) p(A): N's (coherent) probabilities corresponding to q.
(ii) p(π|A): N's view of what S will have as true probabilities, if A in fact obtains.
(iii) p(q|π): N's opinion of what S will say, if his true probabilities are in fact π.

It is worthwhile looking in further detail at the meaning of the latter two distributions. They are probability distributions of N concerning S, prior to S making his assessments q. In accordance with subjective probability, N can capture his initial uncertainty regarding the true π by using distribution (ii). It is the third distribution that is unusual: N is viewing the elicited q as an observation arising from a stochastic process Q within S. Thus, the distribution should be read as p(Q = q|π). The stochastic nature of Q arises from the "measurement error" mentioned above. In later sections, we make normality assumptions concerning this distribution, and it should then be borne in mind that all the means and variances introduced are of the process Q, rather than of the individual readings q. However, by abuse of notation, we shall refer to Var(q₁), and so on. Note also that there is a family of distributions p(q|π), one for each possible value of π.

The three distributions can be viewed as representing, respectively, N's own beliefs about A, N's model of S's knowledge acquisition, and N's model of S's performance as a probability appraiser. It should be noted that we assume that p(q|π) = p(q|π, A), so that S's measurement error does not depend on whether A in fact obtains.

With these three distributions, the authors of LTB develop an internal and an external approach. In the internal approach, N derives a probability distribution for π updated in the light of the elicited q, p(π|q), and uses this to arrive at π̂; so here π̂ is viewed as a "best" estimate of the true π.


In the external approach, N updates his own probabilities for A in the light of the information provided by q; so here π̂ is N's revised view of the world, p(A|q). Figure 2 shows diagrammatically the way N combines his probability distributions for each approach.
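For a single event A, the two derivation steps of the external approach can be written out explicitly; a sketch of mine (using the stated assumption that p(q|π) does not depend on A given π), since the paper presents the steps only diagrammatically:

\[
p(q \mid A) = \int p(q \mid \pi)\, p(\pi \mid A)\, d\pi,
\qquad
p(A \mid q) = \frac{p(q \mid A)\, p(A)}{p(q \mid A)\, p(A) + p(q \mid \neg A)\, p(\neg A)}.
\]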

5. DIFFICULTIES WITH THE BAYESIAN UPDATING APPROACH

The procedure described in the above section appears, at first sight, intuitively appealing, but on further consideration some very real problems appear. First, the psychological existence of the "true" probabilities π is, at best, very dubious. Psychological evidence (e.g., Slovic & Tversky, 1974; Kahneman & Tversky, 1979; Tversky, 1969) implies that S does not in fact carry around a set of numbers in his head describing his feelings of uncertainty. In this case the interpretation of distributions such as p(π|A) and p(q|π) will be very difficult, and hence these distributions will be hard to assess. We must also wonder about the validity of a procedure (the internal approach) designed to approximate π, if this is a nonexistent entity.

[Figure 2, internal approach: we elicit p(π|A), p(A), and p(q|π); these are then used consecutively to derive p(π), p(π|q), and hence π̂ = E(π|q). External approach: we elicit p(π|A), p(q|π), and p(A); these are then used, consecutively, to derive p(q|A) and p(A|q).]

FIG. 2. The internal and external Bayesian updating approaches.


A second difficulty lies in the assumption of N. Who is this fully coherent investigator? LTB appear to view him as a part of our subject, in a more reflective mood. If one views decision analysis as a procedure in which all the judgmental inputs come from S rather than the analyst, then LTB's view of N is more satisfactory than assuming him to be the analyst. For the external approach would then view the subject's judgments purely as evidence to update the analyst's opinions, and this takes the decision analyst far away from a supposedly neutral, purely analytical, role.¹ The internal approach also depends very heavily on N's judgment and, again, this situation may be regarded as somewhat unsatisfactory. If, on the other hand, we view N as a part of the subject, we shall be asking him some very strange questions. For example, to get p(A), we shall have to say: "I'm afraid the q's you have just given me were inconsistent. Could you please give me a coherent set of probabilities p(A), so that I can use my method to find coherent probabilities π̂?" Why would we not simply use p(A) instead of q(A)? This raises the general question of second-order incoherence. N will typically be incoherent too, so his judgments will need to be reconciled. This leads to an infinite regress, which LTB hope will converge; but this is far from clear, and it anyway involves us in horrendous calculation. However, the mathematics of this regress should be investigated, as knowledge of the existence and nature of a limit might enable us to approximate it and improve our reconciliation methodology.

¹ There is, however, a very strong case for arguing that we must always use our own beliefs to determine someone else's meaning. In this case, one could view a reconciliation procedure as part of the analyst's model for interpreting S's statements, and it would then be appropriate to take the analyst as N. In the traditional view of decision analysis, the analyst is portrayed as a logical machine, whose only function is to point out necessary logical implications to a decision maker, without any of the analyst's own beliefs ever entering into the analysis. This complete neutrality of the analyst has been one of the big selling points of the methodology, but it is now becoming apparent that the judgmental inputs to an analysis come from the DM-analyst pair, viewed as a single entity. This is especially noticeable with the present problem for, as Savage (1954) noted, the logic of personal probability can only tell us we are inconsistent; it can make no recommendation toward remedying the situation. Hence a reconciliation methodology will of necessity include judgments of some form from others than just the subject. It is important that the true role of the analyst and his position of power be acknowledged and understood by practising analysts. For a further discussion of philosophical implications, see Freeling (1980a).

A third difficulty arises in the internal approach. Even if we can get the distribution p(π|q), how do we then arrive at π̂? LTB tell us to take π̂ = E(π|q), but this π̂ would not satisfy the constraints unless the constraint equations are linear. For a further discussion of these and other problems, see Freeling (1980a). For this paper it is sufficient to note that Bayesian updating, and the internal approach in particular, requires assumptions about the existence of π and N which may well be untenable, and involves us in some very difficult assessments and computation. LTB claim the assumptions to be vital for a formal reconciliation procedure. In this paper we shall show this to be false.

6. THE LEAST-SQUARES APPROACH

There are a variety of ways in which one can avoid the necessity of explicitly assuming true probabilities while remaining within the Bayesian paradigm. For example, in the external approach, one could assess p(q|A) directly, but this assessment would also be difficult. Indeed, I feel that very little is gained, in terms of ease of assessment, by decomposing p(A|q) into p(q|A) and p(A), so one might as well have asked N directly what he thinks the probability of A is, once he has heard S's judgment. So such an approach would not be a very useful aid.

LTB are aware of the practical difficulties of the Bayesian approach, and have suggested a least-squares procedure as an approximation to the internal approach. They propose to take π̂ as the solution to the following constrained minimization problem:

\[
\min_{\pi} \; \sum_{i,j} w_{ij}\,(q_i - \pi_i)(q_j - \pi_j) \tag{1}
\]

subject to the coherence constraints f(π) = 0; or, in full vector notation, minimize (q − π)ᵀW(q − π) subject to the constraints. LTB take W = V⁻¹, where V is the variance-covariance matrix of the distribution p(q|π). The motivation for developing this approach is that it approximates the full Bayesian updating procedure, in the following sense: if we assume that p(q|π) is multivariate normal with mean π, that the variance V is the same for each possible π (i.e., that Var(Q|π) = Var(Q|π′)), and that N's prior beliefs about π are diffuse, so that p(π) is approximately constant, then defining W = V⁻¹ means that the solution to (1) will be a good approximation to the reconciliation that would be achieved using the internal Bayesian approach. So, for example, if we believe the assessed qᵢ to be unrelated, we may take them to be independent, with variances σᵢ²; the function to be minimized is then Σᵢ (qᵢ − πᵢ)²/σᵢ².

It will be noted that the normality assumption is far more reasonable if we are working with log-odds, which can take all values in (−∞, ∞), rather than with probabilities, which are constrained to the interval (0, 1), since the normal distribution has an infinite range.


The assumption of equal variance for each possible π does not make sense when working with probabilities since, for π close to 0 or 1, we may expect the absolute variance to be small. Such a consideration does not hold when using log-odds. For some psychological work which lends support to this theory, see Wheeler and Edwards (1975).

To exemplify the method, consider again the Oxford and Cambridge boat race. Then A, the target event, is "Oxford wins"; take X to be "Oxford wins the toss." Then q₁ = q(A) = .7, q₂ = q(A|X) = .8, q₃ = q(A|¬X) = .5, and we will assume that p(X) is known to be .5. In general, the matrix will take the form

\[
V = \begin{bmatrix}
\sigma^2 & \rho\sigma\tau & \rho\sigma\tau \\
\rho\sigma\tau & \tau^2 & \delta\tau^2 \\
\rho\sigma\tau & \delta\tau^2 & \tau^2
\end{bmatrix} \tag{2}
\]

if q₂ and q₃ are assumed to have equal dispersion characteristics. (So σ² is Var(q₁), τ² = Var(q₂) = Var(q₃), ρ is the correlation between q₁ and q₂, and δ is the correlation between q₂ and q₃.) Then if we assume that all the assessments q are independent and have equal variance, so that σ² = τ² = 1 and ρ = δ = 0, and W becomes the identity matrix, we find that π̂₁ = .67 is the reconciled value. Furthermore, if we define precision as the inverse of the variance (see, e.g., Winkler, 1972), we find that the precision of π̂ is three times that of the original assessments. This calculation appears in LTB, where it is used to indicate the dramatic increase in precision achieved by taking multiple assessments and reconciling them.
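The reconciled value is easy to reproduce numerically. The following sketch solves the constrained minimization (1) directly; the helper code and solver choice (scipy's SLSQP) are my own, the paper reports only the result:

```python
import numpy as np
from scipy.optimize import minimize

# Boat-race assessments: q1 = q(A) = .7, q2 = q(A|X) = .8, q3 = q(A|~X) = .5,
# with p(X) = .5, so coherence requires pi1 = .5*pi2 + .5*pi3.
q = np.array([0.7, 0.8, 0.5])
coherence = {"type": "eq", "fun": lambda pi: pi[0] - 0.5 * pi[1] - 0.5 * pi[2]}

def reconcile(W):
    """Minimize (q - pi)' W (q - pi) subject to the coherence constraint."""
    objective = lambda pi: (q - pi) @ W @ (q - pi)
    return minimize(objective, x0=q, constraints=[coherence]).x

pi_hat = reconcile(np.eye(3))  # independence, equal variances: W = identity
print(np.round(pi_hat, 3))     # pi1 reconciles to ~0.667
```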


There are, however, several caveats about this procedure which should be considered. These are discussed in the next section.

7. THE IMPORTANCE OF CORRELATIONS

I now wish to show that the results of LTB in fact arise largely from the assumption of independence between assessments. Indeed, the threefold increase in precision is directly attributable to this assumption, as may be understood from the following heuristic argument. Precision as defined here is closely related to the statistical concept of the amount of information described by a probability distribution. Also, independence between assessments is equivalent to saying that each donates entirely different information to the reconciliation. Hence, with three independent assessments, we have thrice the information and, equivalently, thrice the precision. Such an analysis, interpreting the quality of assessments in terms of their information content, has a very intuitive appeal: it was, after all, an attempt to consider all the possible available information (in the form of searching the assessor's psychological field) that prompted this search for incoherence. However, the assumption of independence between assessments is clearly untenable, for much of the same information will be used in assessments directed toward the same target variable (e.g., q(A) and q(A|X)). Alternatively, looking at the same situation from a different angle: if q(A) is overestimated, we might well expect q(A|X) to be overestimated as well, since S might make the same mistake in each case, so the correlation would be nonzero.

LTB are aware of the falsehood of the independence assumption, and of the effect this has on precision. They extend the previous example by taking σ² = τ² = 1, ρ = δ = .5. The calculations again give .67 as the reconciled value, but the precision is increased by only a half. However, LTB appear to ignore this fact when drawing one of the major conclusions of their paper, for they conclude that the following procedure is a good one for increasing precision: Find a partition Xᵢ (i = 1, …, n) of the sample space such that p(Xᵢ) = p(Xⱼ) for all i ≠ j (where these probabilities are assumed known, as, perhaps, with a coin toss). Then "extend the conversation about A" to include the Xᵢ, by assessing q(A|Xᵢ), i = 1, …, n. We then have two assessments of the target variable π(A): a direct, holistic assessment q(A), and an indirect, decomposed assessment Σᵢ q(A|Xᵢ)p(Xᵢ). These should be reconciled via the least-squares procedure. LTB show that, under the assumption that all assessments are independent and of equal variance, this procedure increases precision by a factor of n + 1, and that using an equiprobable partition is optimal over all partitions of size n. They then suggest that we should thus always try to extend the conversation to include such an equiprobable partition.

When correlations are included, this conclusion no longer holds. (Indeed, we can see that precision is maximized by utilizing as much information as possible; this concept is made more explicit in a later section.) To take an absurd example, suppose in the boat race example that I decide to condition not on the relevant coin toss, but on a coin I toss myself. The analysis would then be identical to the real case if correlations are ignored, yet clearly there should be no increase in precision from considering irrelevant events. The point is that q(A), q(A|X), and q(A|¬X) should all be very similar, as they are really the same assessment, and so the correlations are very high.

LTB also state that correlations "have little effect on the probabilities," noting that in each of the above examples the reconciled value was .67. They assume that the correlations affect only the precisions, not the values. This is untrue, as can be seen by once again altering the variance-covariance matrix of the previous example. With σ² = 4, τ² = 1, ρ = δ = .5, we find that the reconciled value is .645. This is a reconciliation of .65 and .7 which is very different from .67, and perhaps somewhat counterintuitive. This is due to the fact that describing the relationship between assessments by correlations is difficult and not very intuitive.
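The .645 figure can be checked with the same constrained minimization as before, now weighting by W = V⁻¹; a sketch under the stated variance-covariance assumptions:

```python
import numpy as np
from scipy.optimize import minimize

q = np.array([0.7, 0.8, 0.5])  # q(A), q(A|X), q(A|~X), as before
coherence = {"type": "eq", "fun": lambda pi: pi[0] - 0.5 * pi[1] - 0.5 * pi[2]}

# Form (2) with sigma^2 = 4, tau^2 = 1, rho = delta = .5, so rho*sigma*tau = 1.
V = np.array([[4.0, 1.0, 1.0],
              [1.0, 1.0, 0.5],
              [1.0, 0.5, 1.0]])
W = np.linalg.inv(V)

objective = lambda pi: (q - pi) @ W @ (q - pi)
pi_hat = minimize(objective, x0=q, constraints=[coherence]).x
print(np.round(pi_hat, 3))  # pi1 now reconciles to ~0.645, not .67
```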


Whereas one can assess variances fairly well by asking for credible intervals for an assessment, assessing correlation coefficients is not so easy. With the boat race example, the author was not able even to produce a variance-covariance matrix for his own assessments which was positive-definite.² He thus has very little faith in direct methods of assessment for correlation coefficients. As these correlations have been shown in this section to be of paramount importance to the least-squares technique, we now proceed to look at alternative ways of interpreting the relationships causing nonzero correlations, and thus of making indirect assessments of correlation coefficients. The next section looks at the least-squares technique from one different perspective.

² A positive-definite symmetric matrix A is one satisfying the following condition: xᵀAx > 0 for all nonzero x. It can be shown that a variance-covariance matrix must always be positive-definite. Being positive-definite is the matrix equivalent of being a positive number, and the condition that the variance-covariance matrix of a multivariate distribution be positive-definite is an extension of the condition that the variance of a univariate distribution be positive. A practical check on whether a symmetric matrix is positive-definite is to find its eigenvalues: a theorem of linear algebra shows that a symmetric matrix is positive-definite if and only if all its eigenvalues are positive.
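The footnote's eigenvalue test is easy to automate. A minimal sketch (the matrix entries are invented for illustration: a high q₁-q₂ correlation combined with a low q₂-q₃ correlation, the kind of direct assessment that can fail the test):

```python
import numpy as np

# A candidate variance-covariance matrix of form (2):
# sigma^2 = tau^2 = 1, rho = .9, delta = .1 (illustrative values only).
V = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, 0.1],
              [0.9, 0.1, 1.0]])

eigenvalues = np.linalg.eigvalsh(V)   # eigenvalues of a symmetric matrix
print(eigenvalues)                    # the smallest is negative here
print(bool(np.all(eigenvalues > 0)))  # False: not positive-definite, so not
                                      # a valid variance-covariance matrix
```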

8. A PSYCHOLOGICAL INTERPRETATION OF THE METRIC

Expression (1) is an example of a generalized distance, or metric. The matrix W transforms the familiar Euclidean space Rⁿ into a curvilinear space. In this situation, we interpret this curvilinear space as the psychological field of the assessor, with respect to his assessments. So, if q is the assessed probability vector, and xⁱ (i = 1, …, m) are other possible probability vectors, then the xⁱ minimizing (q − xⁱ)ᵀW(q − xⁱ) is the closest xⁱ to q in this psychological space. An intuitive understanding of this distance is to consider it as measuring the unease of S at being forced to take some vector other than q as his probability vector. So the solution to (1) is that probability vector, satisfying the coherence constraints, which S is least unhappy using as his probability vector.

This then gives us an alternative method for assessing the distance matrix W: if we can discover S's perceived distances between different points, we may then deduce the W these distances imply. Such a program appears very attractive. It does not depend on the assumption of hypothetical true probabilities, and its definition of the "best" reconciliation is totally subjective, in the spirit of the theory of subjective probability. However, on further examination, the method appears unworkable. This can be exemplified by a thought experiment. If such a metric existed, one would be able to use the methods of multidimensional scaling (MDS) to find it. To take a concrete example, suppose S assesses q(A) = .5 and q(¬A) = .4, so q = (.5, .4), revealing incoherence. Suppose the analyst selected x¹ = (.5, .5) and x² = (.6, .4) for presentation to S.


Using MDS, we would require S to answer questions of the form:

(a) Which of x¹ or x² is closer to q?
(b) Which of x¹ or q is closer to x²?

Question (a) is answerable, but question (b) is not. Both x¹ and x² are vectors invented by the analyst, and S may well find it impossible to assess his feelings of discomfort at being forced to move from x² as his probability; the mental gymnastics required are too difficult. In fact, we see that the only feelings of discomfort S truly has concern moving from q to the various xⁱ, not moving between two points arbitrarily selected by the analyst. In this case there exists no matrix W with the interpretation of this section (for, if there were, question (b) would be answerable). We are forced therefore to look further for a practical and satisfactory reconciliation procedure.

9. LEAST SQUARES AND A WEIGHTED AVERAGE

For the rest of this paper, we concentrate on two estimates of the target variable p(A). So q₁ may be a holistic assessment, and q₂ the assessment logically implied by decomposed judgments. Hence from now on the coherence constraint takes the form π₁ = π₂. We also assume that we are working in log-odds, for the reasons noted in Section 6. If we use the least-squares approach, with V equal to

\[
V = \begin{bmatrix} \sigma^2 & \rho\sigma\tau \\ \rho\sigma\tau & \tau^2 \end{bmatrix},
\]

then the reconciled value is

\[
\hat{\pi} = \frac{(\tau^2 - \rho\sigma\tau)\,q_1 + (\sigma^2 - \rho\sigma\tau)\,q_2}{\tau^2 + \sigma^2 - 2\rho\sigma\tau} \tag{3}
\]

(for a proof, see Freeling (1980a)). Note that π̂ is simply a weighted average of q₁ and q₂,

\[
\hat{\pi} = \frac{A q_1 + B q_2}{A + B}, \tag{4}
\]

with A = 1/σ² − ρ/στ and B = 1/τ² − ρ/στ, though one of the weights may be negative. These weights have an appealing intuitive interpretation. For example, 1/σ² may be taken as a measure of how "good" an assessment q₁ is, and may be viewed as a measure of the amount of information contained in q₁; A is then the amount of information in q₁ reduced by a quantity due to the correlation. In the next section we interpret this quantity as the amount of information shared by both q₁ and q₂.
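In code, the whole procedure reduces to a few lines. A sketch, assuming both assessments are first converted to log-odds (the σ², τ², ρ values below are illustrative, not assessments from the paper):

```python
import numpy as np

def log_odds(p):
    return np.log(p / (1.0 - p))

def probability(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconcile(q1, q2, sigma2, tau2, rho):
    """Weighted average (4) of two log-odds assessments, weights as in (3)."""
    st = np.sqrt(sigma2 * tau2)
    A = 1.0 / sigma2 - rho / st  # information unique to q1
    B = 1.0 / tau2 - rho / st    # information unique to q2
    return (A * q1 + B * q2) / (A + B)

# Boat-race example: holistic .7, decomposed .65; equal variances, rho = .5.
x = reconcile(log_odds(0.70), log_odds(0.65), sigma2=1.0, tau2=1.0, rho=0.5)
print(round(probability(x), 3))  # equal variances: the simple mean of log-odds
```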


It is appropriate to note here that (3) can also be derived in a different way. We may decide a priori to make our reconciled value a weighted average of the two elicited target probabilities. In this case, since our motivation is to increase precision, which we may equate with reducing variance, we wish to seek the minimum-variance weighted average. It is easy to show (see Freeling, 1980a) that the optimal weights are as in (3). This idea is discussed by Bunn (1978) with regard to pooling the results of different forecasting models. It provides an alternative motivation for using this approach, which may be more acceptable to some.

We can also see from this interpretation of the least-squares technique that there is an underlying assumption that our elicited values are unbiased estimators, i.e., that the error of qᵢ is expectationally zero, so E(qᵢ|π) = π. For, were this assumption false, π̂ would not be an unbiased estimator, so E(π̂|π) = π + a, where a is nonzero, and a better estimate would be π̂ − a. The implications of this assumption are discussed in the final section.

10. AN INFORMATION-ORIENTED APPROACH

In this section we argue that the least-squares approach is in fact an attempt to quantify the information (in the broad sense discussed in Section 7) captured by an assessment, and to perform the reconciliation based on this quantification. Consider Fig. 3, which illustrates the information accessed by our two (log-odds) assessments q₁ and q₂: q₁ has information I₁ and q₂ has information I₂.³ Then A = |I₁/I₂| quantifies the information accessed only by q₁, B = |I₂/I₁| quantifies the information accessed only by q₂, and C = |I₁ ∩ I₂| quantifies the information common to both q₁ and q₂. In this formulation, the total amount of information is |I₁ ∪ I₂| = A + B + C.

A formal definition of "information" is really necessary in order to fully operationalize and quantify the concept. Such a definition is unfortunately very hard to produce. The use of Shannon and Weaver's information measure may prove a very useful line of research, which is at present being further explored. For the present we may identify two distinct aspects of information: the degree to which S has been able to dip into his psychological field, and the extent to which the gleaned data have been correctly processed in accordance with Bayesian principles. The first aspect has the intuitive meaning of describing what of relevance to the target event was taken into consideration when making the assessment. The second refers to the ability of human beings to adequately process data, which, from the psychological evidence, is limited. I believe these concepts can be further explored and made explicit, but for now we must trust our intuition that they have meaning.

³ Here I₁ and I₂ denote the sets describing the information content of q₁ and q₂. We use the modulus symbol |·| to denote the size of a set (in mathematical terms, its cardinality); so, for example, if A is the set {1, 3, 5, 6, 9}, then |A| = 5. Also, I₁/I₂ signifies I₁ less I₂, and is thus identical to I₁ ∩ ¬I₂.

[Figure 3: two overlapping regions, q₁ with information I₁ and q₂ with information I₂.]

FIG. 3. Information overlap between two assessments.
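In the set language of footnote 3, a toy illustration (the information items are invented for the boat race setting):

```python
# Hypothetical items of information behind each boat race assessment.
I1 = {"recent form", "weather", "crew weights"}         # behind holistic q1
I2 = {"crew weights", "toss advantage", "stroke rate"}  # behind decomposed q2

A = len(I1 - I2)              # accessed only by q1: 2
B = len(I2 - I1)              # accessed only by q2: 2
C = len(I1 & I2)              # common to both: 1
print(A, B, C, len(I1 | I2))  # total |I1 U I2| = A + B + C = 5
```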

We now have a model which is able to describe simply both our motivation for studying incoherence and the weights that are "optimal" when using a weighted average. First, by eliciting both q₁ and q₂, we have obtained more information from S than if we had elicited only one of them. It must be better to take account of all this information if possible, rather than using just some of it by relying on only one assessment of p(A). The increase in quality of our result is measured by the additional information used. Second, an intuitively reasonable way of weighting the two assessments is in proportion to the information unique to each, the information common to both tipping the scales in favor of neither one nor the other. This makes the natural reconciliation (Aq₁ + Bq₂)/(A + B), as in Section 9.

This intuitive procedure has an obvious correspondence to the least-squares procedure: |I₁| corresponds with 1/σ², |I₂| with 1/τ², and |I₁ ∩ I₂| with ρ/στ. In particular, this provides a clear interpretation of the correlation coefficient ρ in this context: the two assessments are related to the extent that they draw on the same information. I believe it is this relationship we are attempting to quantify by including ρ in our analysis. However, quantifying the relationship in terms of information content is a more natural way of proceeding, as ρ is a nonintuitive entity. This explains both the difficulty involved in assessing a ρ that is coherent, and the potential for unexpected (and unsatisfactory) reconciliations arising from a ρ which does not correctly capture one's beliefs.

As an example of the value of these information ideas, consider the classical situation, such as the boat race example, where q₁ is a holistic assessment and q₂ a decomposed one (as defined in Section 2). The assumption (often unspoken) of decision analysts has been that q₂ captures all the information of q₁, and some extra as well


(i.e., it is assumed that by decomposing the judgment we are able to take into account some aspects of the situation that previously we could not, and also that there has been an improvement in the processing of the data by making explicit use of the equation p(A) = Σᵢ p(A|Xᵢ)p(Xᵢ)). Analysts will therefore often not bother to elicit q₁ at all; it would not appear to have anything to offer. In the present formulation, the above argument means that I₁ ⊂ I₂ (see Fig. 4), so that A = 0. Hence the weighted average (4) becomes Bq₂/B = q₂, confirming the heuristic reasoning above. Defining our weights via the variance-covariance structure, however, we see from (3) that in order to achieve such a reconciliation, ρ must equal τ/σ. I very much doubt that such a value would be elicited from a subject who actually held the above beliefs. This example also makes explicit once again our motivation for seeking incoherence: if we do not agree with the above reasoning, but in fact believe that I₁/I₂ ≠ ∅, then we gain by considering both q₁ and q₂.

Another interesting consequence of the present formulation concerns the correct value of ρ to use in a statistical analysis when one has no information about its value. Lindley (1965) has suggested that ρ = .5 is appropriate. From Fig. 3, one could invoke a form of the principle of insufficient reason and take A = B = C. In this case one can easily calculate the implied value of ρ to be .5.
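The last step is a one-line check under the correspondences of this section (|I₁| ↔ 1/σ², |I₂| ↔ 1/τ², |I₁ ∩ I₂| ↔ ρ/στ):

\[
A = B \;\Rightarrow\; \sigma = \tau,
\qquad
A = C \;\Rightarrow\; \frac{1}{\sigma^2} - \frac{\rho}{\sigma\tau} = \frac{\rho}{\sigma\tau}
\;\Rightarrow\; \rho = \tfrac{1}{2}.
\]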

11. QUANTIFICATION OF INFORMATION

The concepts of "information" discussed above have a fairly intuitive interpretation, but it is rather difficult to obtain quantitative assessments of them. In this section we make some suggestions for quantification. The first item to note is that if we have equal confidence in each of q₁ and q₂, then we know that A + C = B + C, so A = B, and we may simply take the arithmetic mean of the (log-odds) q₁ and q₂.

[Figure 4: I₁ contained entirely within I₂, so A = 0.]

FIG. 4. All the information of q₁ contained in q₂.


This illustrates the point that a quantification of C is of use only in assessing the precision of the reconciled estimate, and also that we can arbitrarily assign one of the values, e.g., A, as it is only relative quantities in which we are interested. It should also be noted that this explains the finding of LTB (see Section 7) that correlations will not affect the probabilities: their calculations were performed with σ² = τ², or A = B, and, as we have noted, this eliminates the effect of the correlation.

There is a standard statistical concept upon which we may draw to aid our understanding of information, that of Fisher's information measure; but while such a concept is of value in providing a theoretical basis for the work, it does not aid the practical problem of assessment. The following suggestions are only tentative, and further work is necessary to extend some of the ideas.

One could simply assign the weights for the weighted average directly, without explicitly considering the information content. This, like any other such attempt at quantification, will need to be an interactive process between the analyst and S, so as to capture both the subjective feelings of S and the more objective knowledge the analyst has about the different assessment techniques. A more satisfying method of direct elicitation is to use the intuitive idea of information and ask the following two questions:

(a) How much extra information was gleaned by taking q₂, when q₁ had already been assessed?
(b) How much extra information would have been gleaned by assessing q₁, had q₂ already been assessed?

Each answer should be expressed relative to the amount of information contained in the assessment already made. To exemplify the way A, B, and C can be calculated from these answers, suppose the answer to (a) was "as much again" and to (b) "half as much again." Then we deduce that

\[
B = A + C \quad \text{(from (a))}
\]

and

\[
2A = B + C \quad \text{(from (b))}.
\]

Hence A = 2C = 2B/3. So the weighted average is .4q₁ + .6q₂, and the precision of the reconciled estimate, measured by A + B + C, is twice that of q₁ and one and a half times that of q₂.
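The deduction generalizes to any pair of answers. A sketch (the function and its interface are my own; each answer is expressed as a fraction of the information in the assessment already made):

```python
from fractions import Fraction

def weights_from_answers(extra_from_q2, extra_from_q1):
    """Turn the answers to questions (a) and (b) into averaging weights,
    solving B = extra_from_q2 * (A + C) and A = extra_from_q1 * (B + C)
    with C = 1, since only relative quantities matter."""
    ea, eb = Fraction(extra_from_q1), Fraction(extra_from_q2)
    A = ea * (eb + 1) / (1 - ea * eb)  # substitute B = eb*(A + 1) and solve
    B = eb * (A + 1)
    return A / (A + B), B / (A + B)

# "As much again" for (a), "half as much again" for (b):
print(weights_from_answers(1, Fraction(1, 2)))  # (2/5, 3/5): .4*q1 + .6*q2
```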


One might instead suggest that the weights should be related directly to the confidence placed in the judgments. Some of the confidence in q₁ would arise from the same sources as some of the confidence in q₂. One can envisage displaying Fig. 3 and allocating 100 coins among A, B, and C, in order to quantify these "confidence" judgments.

Alternatively, one might use the concept of equivalent sample size to assess the information content of an assessment, by relating the extra information gained from an assessment to the number of extra observations from a binomial process that would have provided an equivalent gain in information. Bunn (1978) has discussed ways of using this idea to assess the parameters of a β distribution, and an extension of those ideas might provide a good method of dealing with the present situation. One could also use the ideas of LTB to help decide upon the weights: by assessing credible intervals for each assessment, we gain a good idea of the relative degrees of confidence in each assessment, and the variance of an assessment may be taken as proportional to the square of the width of its credible interval. Assessing the information common to the two assessments is not so easy this way.

12. EXPERT USE

In this paper we have discussed the situation of a single decision maker who gives inconsistent probability assessments. However, the technique of taking a weighted average of log-odds is also applicable to the problem of expert use, i.e., the situation in which two or more experts each give probability assessments for a target variable. One would expect the experts to differ somewhat in their probability assessments so, in order for a decision maker to make explicit use of the assessments, a reconciliation needs to be performed. Morris (1974) has developed a Bayesian procedure for performing this reconciliation that is similar to the method of LTB. The argument presented in previous sections can be used to show that the reconciliation should be a weighted average of log-odds. In this case the interpretation of the weights is much easier than before: they are the decision maker's opinion of the relative expertise of the various experts. So, for example, the intersection I₁ ∩ I₂ represents the shared expertise.

We are now in a position to offer an interesting perspective on the well-known problem of what reconciliation to use for multiple experts of equal expertise. The arithmetic mean of the probabilities is an obvious candidate, but Norman Dalkey⁴ has suggested that the geometric mean is better than the arithmetic mean. From our work we can conclude that the arithmetic mean of the log-odds is the appropriate procedure. It will be recalled that log-odds were suggested because the assumption of normality necessary for the least-squares procedure is more valid on that scale. We observe that taking the geometric mean is equivalent to taking the arithmetic mean of the log-probabilities: if the reconciliation is q′ = (q₁q₂)^{1/2}, then letting r₁ = ln q₁ and r₂ = ln q₂, we have r′ = ln q′ = .5(r₁ + r₂).

⁴ Dalkey's point was made at the 18th Annual Bayesian Research Conference, held in Los Angeles, February 14-15, 1980.
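A quick numerical comparison of the three candidate pooling rules (the expert probabilities are hypothetical):

```python
import numpy as np

def log_odds(p):
    return np.log(p / (1.0 - p))

def pool(p1, p2):
    """Arithmetic mean, geometric mean, and mean-of-log-odds reconciliations."""
    arithmetic = (p1 + p2) / 2.0
    geometric = np.sqrt(p1 * p2)  # = exp of the mean log-probability
    mean_lo = 1.0 / (1.0 + np.exp(-(log_odds(p1) + log_odds(p2)) / 2.0))
    return arithmetic, geometric, mean_lo

print([round(x, 3) for x in pool(0.9, 0.5)])  # the three rules diverge
```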


Thus taking the geometric mean would be our recommended procedure if we believed log-probabilities to be normally distributed. Such an assumption may be better than taking probabilities as normal, since log-probabilities have infinite range; but log-probabilities are always nonpositive, so the normality assumption cannot be strictly true. Our work thus implies that taking the geometric mean is better than taking the arithmetic mean, in agreement with Dalkey, but that taking the mean of log-odds is better than either.

13. AN ALTERNATIVE APPROACH TO THE PROBLEM

We have worked from the philosophical position that we must produce a single number to describe a decision maker's uncertainty about an event, so that this number can be used as "the" probability in a decision analysis. The difficulty with this procedure arises from the fact that the axioms of subjective probability do not permit incoherence as a part of "rational" decision making. They permit such incoherence to be exposed, but make no recommendation about how it should be eliminated. However, one can argue that it is precisely these conflicting opinions that cause some decisions to be hard to make, and further, that it is perhaps not desirable to exclude such conflicting opinions from our theory of rational belief. We understand that probability theory is incomplete as a prescription for rationality, and that it is only an approximation to the way persons think. In the present situation one can argue that the approximation is not sufficient and that a better one should be sought.

As an example, we may suppose that one's feelings of uncertainty are themselves vague, and that a range of possible probabilities better describes the state of one's mind. In that case, when forced to provide single numbers as probabilities, we will simply be choosing arbitrarily from within the range; it is then to be expected that differing probabilities will be picked. We can attempt to incorporate this idea into our theory of rationality and thus, hopefully, produce a prescriptive theory of decision making which can cope with the situation. An example of an extended theory of belief, which attempts to capture the inherent vagueness in the way man thinks, and within which the "incoherence" discussed here can be shown to be consistent, is based on fuzzy set theory (FST) (Zadeh, 1965). Two papers (Watson, Weiss, & Donnell, 1979; Freeling, 1980b) which look at the application of FST to decision analysis have been published. A paper at present in preparation (Freeling, 1981) examines the motivation and justification for using FST as a basis for studying "vague probabilities" at a less sophisticated mathematical level. This latter paper uses incoherence as a case study to exemplify the potential of a fuzzy approach.


14. SUMMARY AND CONCLUSIONS

In this paper we have examined in detail the work of Lindley, Tversky, and Brown (1979), and further explored some of the consequences of that work, in particular the internal approach. We have concluded that taking a weighted average of log-odds, with the weights proportional to the independent information content of each assessment, is equivalent to the procedure developed by LTB, while being simpler and of greater intuitive appeal. It should, however, be made clear that this technique is appropriate only when we believe the only discrepancy between assessments to be due to some sort of "random measurement" error. The technique is inappropriate, for example, if we have reason to believe that some biasing is present in the assessments (see Section 9).

The problem is that there are many different possible sources of incoherence, and a particular mathematical reconciliation technique will be appropriate for eliminating only some of them. Rather than using one mathematical approach to resolve inconsistency, the author believes that a complete reconciliation procedure should consist of several different mathematical techniques, each addressed to a different source of error, together with ways of identifying those sources. We would then view the approach described in this paper as a way of performing the final reconciliation, of dealing with the residual error. Some work performed parallel to the research described in this paper, by Detlof von Winterfeldt (1980), discusses the different sources of incoherent assessments from a psychological perspective. Dennis Lindley (1980) has also performed some parallel research, aimed at producing a procedure for performing "calibration-type" adjustments to probability assessments. Such a procedure would form a useful part of a complete reconciliation package.

Further research in this area should be directed at unifying the different strands of research that have been performed on reconciliation. This includes the present work, work on the reconciliation of multiple experts, and work on the pooling of forecasts. It is of prime importance to link mathematical techniques with the sources of incoherence, in order to reduce the arbitrary nature of many reconciliation techniques. Continued investigation of extended theories of belief, such as the use of fuzzy set theory and Shafer's belief functions (Shafer, 1976), should be carried out. The current research is part of a program looking at incoherent judgments in general, so the work will be extended to deal with utilities and, eventually, decisions.

The procedure of taking a weighted average may smack somewhat of "adhockery," but it should be clearly understood that it has been derived as an approximation to a complete Bayesian analysis.


A comment of De Finetti (1974) is appropriate here: the use of "adhockeries . . . may sometimes be an acceptable substitute for a more systematic approach . . . only if--and in so far as--such a method is justifiable as an approximate version of the correct (i.e., Bayesian) approach. (Then it is no longer a mere 'adhockery.')"

It is hoped that we have adequately demonstrated in this paper that the procedure of using a weighted average of log-odds to reconcile inconsistent assessments is sufficiently simple to apply, and that the justification for seeking out incoherence in order to increase the amount of information used is sufficiently compelling, for this strategy to become a standard and useful part of the decision analyst's armory.

REFERENCES

Brown, R. V., & Lindley, D. V. Improving judgment by reconciling incoherence. Theory and Decision, to appear, 1981.
Bunn, D. The synthesis of forecasting models in decision analysis. Basel: Birkhauser Verlag, 1978.
De Finetti, B. Theory of probability (2 vols.). London: Wiley, 1974.
Freeling, A. N. S. Alternative frameworks for the reconciliation of probability assessments (Tech. Rep. 80-4). Falls Church, Va.: Decision Science Consortium, November 1980. (a)
Freeling, A. N. S. Fuzzy sets and decision analysis. IEEE Transactions on Systems, Man, and Cybernetics, 1980, SMC-10, 341-354. (b)
Kahneman, D., & Tversky, A. Prospect theory: An analysis of decisions under risk. Econometrica, 1979, 47, 263-291.
Lindley, D. V. Probability and statistics. London/New York: Cambridge Univ. Press, 1965.
Lindley, D. V. The improvement of probability judgments. In R. V. Brown, A. N. S. Freeling, D. V. Lindley, & D. von Winterfeldt, Papers on the reconciliation of incoherent judgment (Tech. Rep. 80-6). Falls Church, Va.: Decision Science Consortium, November 1980.
Lindley, D. V., Tversky, A., & Brown, R. V. On the reconciliation of probability assessments. Journal of the Royal Statistical Society, Series A, 1979, 142, 146-180.
Morris, P. A. Decision analysis expert use. Management Science, 1974, 20, 1233-1241.
Savage, L. J. The foundations of statistics. New York: Wiley, 1954.
Shafer, G. A mathematical theory of evidence. Princeton, N.J.: Princeton Univ. Press, 1976.
Slovic, P., & Tversky, A. Who accepts Savage's axiom? Behavioral Science, 1974, 19, 368-373.
Tversky, A. Intransitivity of preferences. Psychological Review, 1969, 76, 31-48.
Von Winterfeldt, D. Some sources of incoherent judgments in decision analysis. In R. V. Brown, A. N. S. Freeling, D. V. Lindley, & D. von Winterfeldt, Papers on the reconciliation of incoherent judgment (Tech. Rep. 80-6). Falls Church, Va.: Decision Science Consortium, November 1980.
Watson, S. R., Weiss, J. J., & Donnell, M. L. Fuzzy decision analysis. IEEE Transactions on Systems, Man, and Cybernetics, 1979, SMC-9, 1-9.
Wheeler, G. E., & Edwards, W. Misaggregation explains conservative inference about normally distributed populations (SSRI Tech. Rep. No. 75-11). Los Angeles: University of Southern California, Social Science Research Institute, August 1975.


Winkler, R. L. An introduction to Bayesian inference and decision. New York: Holt, Rinehart & Winston, 1972.
Zadeh, L. A. Fuzzy sets. Information and Control, 1965, 8, 338-353.

REFERENCE NOTE

1. Freeling, A. N. S. Possibilities versus fuzzy probabilities: Two alternative decision aids. In preparation, Decision Science Consortium, 1981.

RECEIVED: September 5, 1980