Symposium on Current Views
of Subjective
Probability
SUBJECTIVE PROBABILITY AS THE MEASURE OF A NON-MEASURABLE SET

I. J. GOOD

Admiralty Research Laboratory, Teddington, Middlesex, England
1. Introduction

I should like to discuss some aspects of axiom systems for subjective and other kinds of probability. Before doing so, I shall summarize some verbal philosophy and terminology. Although the history of the subject is interesting and illuminating, I shall not have time to say much about it.
2. Definition

In order to define the sense in which I am using the expression "subjective probability" it will help to say what it is not, and this can be done by means of a brief classification of kinds of probability [16, 11, 8]. Each application of a theory of probability is made by a communication system that has apparently purposive behaviour. I designate it as "you". It could also be called an "org", a name recently used to mean an organism or organization. "You" may be one person, or an android, or a group of people, machines, neural circuits, telepathic fields, spirits, Martians and other beings. One point of the reference to machines is to emphasize that subjective probability need not be associated with metaphysical problems concerning mind (compare [7]). We may distinguish between various kinds of probability in the following manner.

(i) Physical (material) probability, which most of us regard as existing irrespective of the existence of orgs. For example, the "unknown probability" that a loaded, but symmetrical-looking, die will come up 6.

(ii) Psychological probability, which is the kind of probability that can be
inferred to some extent from your behaviour, including your verbal communications.

(iii) Subjective probability, which is psychological probability modified by the attempt to achieve consistency, when a theory of probability is used combined with mature judgment.

(iv) Logical probability (called "credibility" in [19], for example), which is hypothetical subjective probability when you are perfectly rational, and therefore presumably infinitely large. Credibilities are usually assumed to have unique numerical values, when both the proposition whose credibility is under consideration and the "given" proposition are well defined.

I must interrupt myself in order to defend the description "infinitely large". You might be asked to calculate the logical probabilities of the Riemann, Fermat, and Goldbach conjectures. Each of these probabilities is either 0 or 1. It would be cheating to wait for someone else to produce the answers. Similarly, as pointed out in [17, p. x], you cannot predict the future state of society without first working out the whole of science. The same applies even if you are satisfied with the logical probabilities of future states of society. Therefore a rational being must have an infinite capacity for handling information. It must therefore be infinitely large, or at any rate much larger than is practicable for any known physical org. In other words, logical probabilities are liable to be unknown in practice. This difficulty occurs in a less acute form for subjective probability than for logical probability. Attempts have been made [2, 10] to define logical probability numerically, in terms of a language or otherwise. Although such a programme is stimulating and useful, the previous remarks seem to show that it can never be completed and that there will always remain domains where subjective probability will have to be used instead.
(In Carnap's contribution to this Congress he has shifted his position, and now defines logical probability to mean what I call numerically completely consistent subjective probability. He permits more than one consistent system of probabilities. Thus his present interpretation of logical probability is a consistent system within a "black box" in the sense of Section 3 below.)

Physical probability automatically obeys axioms; subjective probability depends on axioms; psychological probability neither obeys axioms nor depends very much on them. There is a continuous gradation, depending on the "degree of consistency" of the probability judgments with a system of axioms, from psychological probability to subjective probability, and beyond, to logical probability, if it exists. Although I cannot define "degree of consistency", it seems to me to have very important intuitive significance. The notion is indispensable. In my opinion, every measure of a probability can be interpreted as a subjective probability. For example, the physical probability of a 6 with a loaded die can be estimated as equal to the subjective probability of a 6 on the next throw, after several throws. Further, if you can become aware of
the value of a logical probability, you would adopt it as your subjective probability. Therefore a single set of axioms should be applicable to all kinds of probability (except psychological probability), namely the axioms of subjective probability. Superficially, at least, there seems to be a distinction between the axiom systems that are appropriate for physical probability and those appropriate for subjective probability, in that the latter are more often expressed in terms of inequalities, i.e., comparisons between probabilities. Theories in which inequalities are taken seriously are more general than those in which each probability is assumed to be a precise number. I do not know whether physical probabilities are absolutely precise, but they are usually assumed to be, with a resulting simplification in the axioms.
3. A Black-Box Description of the Application of Formalized Theories

I refer here to a "description", and not to a "theory", because I wish to avoid a discussion of the theory of the application of the black-box theory of the application of theories [4, 5, 6]. The description is in terms of the block diagram of Fig. 1, in which observations and experiments have been omitted. It consists of a closed loop in which you feed judgments into a black box and feed "discernments" out of it. These discernments are made in the black box as deductions from the judgments and axioms, and also, as a matter of expediency, from theorems deduced from the axioms alone. If no judgments are fed in, no discernments emerge. The totality of judgments at any time is called a "body of beliefs". You examine each discernment, and if it seems reasonable, you transfer it to the body of beliefs. The purpose of the deductions, in each application of the theory, is to enlarge the body of beliefs, and to detect inconsistencies in it. When these are found, you attempt to remove them by means of more mature judgment.
[Figure: a closed loop between "YOU" and THE BLACK BOX (containing AXIOMS → THEOREMS). Judgments (the body of beliefs) flow from "you" into the black box, mathematical deductions are performed there, and discernments flow back out to "you".]

FIG. 1. The black-box flow diagram for the application of formalized scientific theories.
The particular scientific theory is determined by the axioms and the rules of application. The rules of application refer to the method of formalizing the judgments, and of "deformalizing" the mathematical deductions. For example, in a theory of subjective probability the standard type of judgment might be a comparison of the form

P'(E|F) ≥ P'(G|H),

where P'(E|F) is the intensity of conviction or degree of belief that you would have in E if you regarded F as certain. The P''s are not necessarily numerical, and what is meaningful is not a P' by itself, but a comparison of intensities of conviction of the above type. These judgments are plugged into the black box by simply erasing the two dashes. Likewise, discernments can be obtained by taking an output inequality, P(E|F) ≥ P(G|H), and putting dashes on it. The P's are assumed to be numbers, even if you can never discover their values at all precisely. This is the reason for the expression "black box". The black box may be entirely outside you, and used like a tame mathematician, or it may be partially or entirely inside you, but in any case you do not know the P's precisely. Following Keynes and Koopman, I assume that the P''s are only partially ordered. Apart from the axioms and rules, there are in practice many informal suggestions that you make use of, such as the need to throw away evidence judged to be unimportant, in order to simplify the analysis, and yet to avoid special selection of the evidence. (In spite of all the dangers, objectivistic methods in statistics invariably ignore evidence in order to achieve objectivity. In each application a subjective judgment is required in order to justify this device. Compare the idea of a "Statistician's Stooge" in [9].) But in this paper I am more concerned with axioms and rules of application than with "suggestions". De luxe black boxes are available, with extra peripheral equipment, so that additional types of judgment and discernment can be used, such as direct judgments of odds, log-odds, "weights of evidence", numerical probabilities, judgments of approximate normality, and (for a theory of rational behaviour) comparisons of utilities and of expected utilities [5 or 6]. (There are numerous aids to such judgments, even including black-box theorems, such as the central limit theorem, and a knowledge of the judgments of other orgs. All such aids come in the category of "suggestions".) But, as we shall see shortly, for the simplest kind of black box, a certain kind of output must not be available.
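As a deliberately crude sketch of this loop, the fragment below models judgments as comparisons between labelled intensities of conviction and lets the box return, as discernments, only what follows by transitivity of "≥". All names and the miniature "axiom" are invented for illustration; the real black box would of course apply the full axioms of probability.

```python
from itertools import product

class BlackBox:
    """Toy model of the judgment/discernment loop: judgments are
    comparisons between labelled conditional probabilities, and the
    only deduction rule applied is transitivity of >=."""

    def __init__(self):
        self.judgments = set()   # pairs (a, b) meaning P(a) >= P(b)

    def feed(self, a, b):
        """Plug in the judgment P'(a) >= P'(b), dashes erased."""
        self.judgments.add((a, b))

    def discernments(self):
        """Deduce the transitive closure of the judgments; the new
        pairs are candidate discernments, to be examined and (if
        they seem reasonable) adopted into the body of beliefs."""
        closure = set(self.judgments)
        changed = True
        while changed:
            changed = False
            for (a, b), (c, d) in product(list(closure), repeat=2):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
        return closure - self.judgments

box = BlackBox()
box.feed("E|F", "G|H")
box.feed("G|H", "K|L")
print(box.discernments())  # {('E|F', 'K|L')}
```

If the discernment seems reasonable it is fed back into the body of beliefs; if not, some earlier judgment must be revised.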
4. Axiom Systems for Subjective Probability

See, for example, [18, 3, 13, 14, 4, 20] and, for similar systems for logical probability, [12, 10]. The axioms of subjective probability can be expressed in terms of either
(i) comparisons between probabilities, or preferences between acts, or (ii) numerical probabilities. Koopman's system [13, 14] was concerned with comparisons between probabilities, without reference to utilities or acts. Although it is complicated, it is convincing when you think hard. From his axioms he deduced numerical ones for what he called upper and lower probabilities, P* and P_*. We may define P_*(E|F) and P*(E|F) as the least upper bound and the greatest lower bound, respectively, of numbers, x, for which you can judge or discern that P'(E|F) > x or that P'(E|F) < x. Here P'(E|F) is not a number, although x is. The interpretation of the inequality P'(E|F) > x is as follows. For each integer n, perfect packs of n cards, perfectly well shuffled, are imagined, so that for each rational number x = m/n (m < n) there exist propositions, G and H, for which P(G|H) would usually be said to be equal to x. The inequality is then interpreted as P'(E|F) > P'(G|H). Note that P_*(E|F) and P*(E|F) depend on the whole body of beliefs. Also note that P_*(E|F) is not the least upper bound of all numbers, x, for which you can consistently state that P'(E|F) > x: to assume this interpretation for more than one probability would be liable to lead to an inconsistency. If P_* = P*, then each is called P, and the usual axioms of probability are included in those for upper and lower probability. The analogy with inner and outer measure is obvious. But the axioms for upper and lower probability do not all follow from the theory of outer and inner measure. It is a little misleading to say that probability theory is a branch of measure theory. In order to avoid the complications of Koopman's approach, I have in the past adopted another one, less rigorous, but simpler. I was concerned more with describing how subjective probability should be used in as simple terms as possible than with exact formal justification. (I gave an informal justification which I found convincing myself.)
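On this definition a finite body of beliefs pins down P_*(E|F) as the largest judged lower bound and P*(E|F) as the smallest judged upper bound. A toy numerical illustration, with invented judged fractions m/n (exact arithmetic via `fractions` keeps the bounds as the rationals of the card-pack interpretation):

```python
from fractions import Fraction as F

# Judged rational bounds m/n on P'(E|F), as supplied by comparisons
# with imagined perfect packs of n cards (all numbers invented):
lower_judgments = [F(1, 10), F(1, 4), F(3, 10)]   # judged: P'(E|F) > x
upper_judgments = [F(9, 10), F(2, 3), F(7, 10)]   # judged: P'(E|F) < x

p_star_lower = max(lower_judgments)  # P_*(E|F): l.u.b. of judged lower bounds
p_star_upper = min(upper_judgments)  # P*(E|F):  g.l.b. of judged upper bounds

assert p_star_lower <= p_star_upper  # otherwise the body of beliefs is inconsistent
print(p_star_lower, p_star_upper)    # 3/10 2/3
```

Feeding in a further judgment can only raise P_* or lower P*, which is why both bounds depend on the whole body of beliefs.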
This approach consisted in assuming that a probability inside the black box was numerical and precise. This assumption enables one to use a simple set of axioms such as the following set (axioms C).

C1. P(E|F) is a real number. (Here and later, the "given" proposition is assumed not to be self-contradictory.)

C2. 0 ≤ P(E|F) ≤ 1.

DEFINITION. If P(E|F) = 0 (or 1), then we say that E is "almost impossible" (or "almost certain") given F.

C3. If E·F is almost impossible given H, then P(E ∨ F|H) = P(E|H) + P(F|H) (addition axiom).

C4. If H logically implies E, then E is almost certain given H (but not conversely).
C5. If H·E and H·F are not self-contradictory and H·E implies F and H·F implies E, then P(E|H) = P(F|H) (axiom of equivalence).

C6. P(E·F|H) = P(E|H) · P(F|E·H) (product axiom).

C7. (Optional.) If Eᵢ·Eⱼ is almost impossible given H (i < j; i, j = 1, 2, 3, ... ad inf.), then P(E₁ ∨ E₂ ∨ ··· |H) = Σᵢ P(Eᵢ|H) (complete additivity).
(The above axioms are not quite the same as axiom sets A and B of [4].)

C8. (The Keynes-Russell form of the principle of cogent reason. Optional. See [19, p. 397], [4, p. 37].) Let φ and ψ be propositional functions. Then

P(φ(a)|ψ(a)) = P(φ(b)|ψ(b)).

I describe this axiom as "optional" because I think that in all those circumstances in which it is judged to be (approximately) applicable, the judgment will come to the same thing as that of the equation itself, with dashes on.

It follows from axiom C1 that P(E|F) <, >, or = P(G|H), but we do not want to deduce that P'(E|F) and P'(G|H) are comparable. There is therefore an artificial restriction on what peripheral equipment is available with de luxe black boxes. This artificiality is the price to be paid for making the axioms as simple as possible. It can be removed by formulating the axioms in terms of upper and lower probabilities. To use axioms C is like saying of a non-measurable set that it really has an unknowable ("metamathematical") measure lying somewhere between its inner and outer measures. And as a matter of fact there is something to be said for this paradoxical-sounding idea. If you will bear with me for a moment I shall illustrate this in a non-rigorous manner. Suppose A and B are two non-intersecting and non-measurable sets. Write m for the unknowable measure of a non-measurable set, and assume that
m(A+B) = m(A) + m(B).

Then

m(A+B) ≤ m*(A) + m*(B),
m(A+B) ≥ m_*(A) + m_*(B).

Therefore (for elucidation, compare the following probability argument)

m*(A+B) ≤ m*(A) + m*(B),
m_*(A+B) ≥ m_*(A) + m_*(B),

and these inequalities are true. Similarly,

m(A) = m(A+B) − m(B) ≤ m*(A+B) − m_*(B).
Therefore

m*(A) ≤ m*(A+B) − m_*(B),

which is also true. The same metamathematical procedure can be used, more rigorously, in order to derive without difficulty, from axioms C together with the rules of application, a system of axioms for upper and lower probability. These are the axioms D listed below. As an example, I shall prove axiom D6(iii). We have

P(E·F|H) = P(E|H) · P(F|E·H).

Therefore, if P(F|E·H) ≠ 0,

P(E·F|H)/P(F|E·H) = P(E|H).

But P*(E·F|H) ≥ P(E·F|H), since, in this system, P*(E·F|H) is defined as the greatest lower bound of numbers, x, for which it can be discerned that x > P(E·F|H). Similarly, P_*(F|E·H) ≤ P(F|E·H). Therefore

P*(E·F|H)/P_*(F|E·H) ≥ P(E|H).

Therefore

P*(E·F|H)/P_*(F|E·H) ≥ P*(E|H).

Q.E.D. The main rule of application is now that the judgment or discernment P'(E|F) > P'(G|H) corresponds to the black-box inequality P_*(E|F) > P*(G|H). Koopman derived most, but not all, of axioms D1-D6 from his non-numerical ones, together with an assumption that can be informally described as saying that perfect packs of cards can be imagined to exist. His derived axioms for upper and lower probability do not include axioms D6(iii) and (iv). (D7 and D9 were excluded since he explicitly avoided complete additivity.) I have not yet been able to decide whether it is necessary to add something to his non-numerical axioms in order to be able to derive D6(iii) and (iv). Whether or not it turns out to be necessary, we may say that the present metamathematical approach has justified itself, since it leads very easily to a more complete set of axioms for upper and lower probability than were reached by Koopman with some difficulty. The axioms D will now be listed. I have not proved that the list is complete, i.e., that further independent deductions cannot be made from axioms C.

D1. P_*(E|F) and P*(E|F) are real numbers. (Here and later the given proposition is assumed not to be self-contradictory.)

D2. 0 ≤ P_*(E|F) ≤ P*(E|F) ≤ 1.
DEFINITION. If P_* = P*, each is called P. The previous definitions of "almost certain" and "almost impossible" can then be expressed as P_* = 1 and P* = 0.

D3. If E·F is almost impossible given H, then (addition axiom)

P_*(E|H) + P_*(F|H) ≤ P_*(E ∨ F|H) ≤ P_*(E|H) + P*(F|H) ≤ P*(E ∨ F|H) ≤ P*(E|H) + P*(F|H).

D4. If H logically implies E, then E is almost certain given H.

D5. If H·E and H·F are not self-contradictory and H·E implies F and H·F implies E, then (axiom of equivalence)

P_*(E|H) = P_*(F|H),  P*(E|H) = P*(F|H).

D6. (Product axiom.)
(i) P_*(E·F|H) ≥ P_*(E|H) · P_*(F|E·H);
(ii) P*(E·F|H) ≥ P_*(E|H) · P*(F|E·H);
(iii) P*(E·F|H) ≥ P*(E|H) · P_*(F|E·H);
(iv) P_*(E·F|H) ≤ P_*(E|H) · P*(F|E·H);
(v) P_*(E·F|H) ≤ P*(E|H) · P_*(F|E·H);
(vi) P*(E·F|H) ≤ P*(E|H) · P*(F|E·H).

D7. (Complete super- and sub-additivity. Optional.) If Eᵢ·Eⱼ is almost impossible given H (i < j; i, j = 1, 2, 3, ... ad inf.), then
(i) P_*(E₁|H) + P_*(E₂|H) + ··· ≤ P_*(E₁ ∨ E₂ ∨ ··· |H);
(ii) P*(E₁|H) + P*(E₂|H) + ··· ≥ P*(E₁ ∨ E₂ ∨ ··· |H).

D8. (Cogent reason. Optional.) Let φ and ψ be propositional functions. Then

P_*(φ(a)|ψ(a)) = P_*(φ(b)|ψ(b)),  P*(φ(a)|ψ(a)) = P*(φ(b)|ψ(b)).

D9. (Complete super- and sub-multiplicativity. Optional.) For any (enumerable) sequence of propositions, E₁, E₂, ...,
(i) P_*(E₁|H) P_*(E₂|E₁·H) P_*(E₃|E₁·E₂·H) ··· ≤ P_*(E₁·E₂ ··· |H);
(ii) P*(E₁|H) P*(E₂|E₁·H) P*(E₃|E₁·E₂·H) ··· ≥ P*(E₁·E₂ ··· |H).
The corresponding result in the C system is a theorem. (See Appendix.) I have not been able to prove that

P*(E ∨ F) + P_*(E·F) ≤ P*(E) + P*(F),

even though the corresponding property is true of Lebesgue outer measure, and I suspect that it does not follow by the above methods. It would be possible to prove it (compare [1, p. 14]) provided that we assumed:

D10. (Optional.) Given any proposition, E, and a positive number, ε, there exists a proposition, G, which is implied by E, and has a precise probability P(G) ≤ P*(E) + ε. (This axiom may be made conditional on another proposition, H, in an obvious manner.)
I cannot say that D10 has much intuitive appeal, although the corresponding assertion is true in the theory of measure. It seems that the theory of probability is not quite a branch of the theory of measure, but each can learn something from the other. Incidentally, random variables can be regarded as isomorphic with arbitrary functions, not necessarily measurable. I understand that this thesis is supported by de Finetti. Also, upper and lower expectations can be defined by means of upper and lower integrals in the sense of Stone [21].
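One way to see that the D system is at least consistent is to realize P_* and P* as the lower and upper envelopes of a set of precise distributions held inside the black box. This is only an illustrative model, not part of the theory above; the two distributions below are invented, with masses in tenths so that the arithmetic is exact. For envelopes of this kind the addition chain D3 can be checked directly, and so, as it happens, can the inequality that resisted proof:

```python
# Two invented precise distributions (masses in tenths) over six
# outcomes; P_* and P* of an event are the min and max of its
# precise probability over the admissible distributions.
dists = [
    [1, 2, 1, 2, 3, 1],
    [3, 1, 2, 1, 1, 2],
]

def prob(p, event):          # precise P(event), in tenths
    return sum(p[w] for w in event)

def p_low(event):            # P_* : lower envelope
    return min(prob(p, event) for p in dists)

def p_up(event):             # P*  : upper envelope
    return max(prob(p, event) for p in dists)

E, F = {0, 1}, {2, 3}        # disjoint, so E.F is almost impossible
# Axiom D3, the full addition chain:
assert (p_low(E) + p_low(F) <= p_low(E | F)
        <= p_low(E) + p_up(F)
        <= p_up(E | F)
        <= p_up(E) + p_up(F))

# P*(E v F) + P_*(E.F) <= P*(E) + P*(F), with overlapping events:
G, H = {0, 1, 2}, {2, 3, 4}
assert p_up(G | H) + p_low(G & H) <= p_up(G) + p_up(H)
print("envelope model satisfies the checks")
```

That the last inequality holds here is a property of envelopes (P(E ∨ F) + P(E·F) = P(E) + P(F) holds for each precise distribution, and taking the maximum of the left side and bounding each term on the right preserves it); whether it follows from axioms C by the metamathematical argument is the open question of the text.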
5. Higher Types of Probability

A familiar objection to precise numerical subjective probability is the sarcastic request for an estimate correct to twenty decimal places, say for the probability that the Republicans will win the election. One reason for using upper and lower probabilities is to meet this objection. The objection is, however, raised, more harmlessly, against the precision of the upper and lower probabilities. In [5], and in lectures at Princeton and Chicago in 1955, I attempted to cope with this difficulty by reference to probabilities of "higher type". When we estimate that a probability P'(E|H) lies between 0.2 and 0.8, we may feel that 0.5 is more likely to be rational than 0.2 or 0.8. The probability involved in this expression "more likely" is of "type II". I maintain that we can have a subjective probability distribution concerning the estimate that a perfectly rational org would make for P'(E|H): a subjective probability distribution for a credibility. If this probability distribution were sharp, then it could be used in order to calculate the expected credibility precisely, and this expectation should then be taken as our subjective probability of E given H. But the type II distribution is not sharp; it is expressible only in terms of inequalities. These inequalities themselves have fuzziness; in fact the fuzziness obviously increases as we proceed to higher types of probability, but it becomes of less practical importance. It seems to me that type II probability is decidedly useful as an unofficial aid to the formulation of judgments of upper and lower probabilities of type I. I would not myself advocate even the unofficial use of type III probability for most practical purposes, but the notion of an infinite sequence of types of probability does have the philosophical use of providing a rationale for the lack of precision of upper and lower probabilities.
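The proposal can be put in miniature numerical form. Suppose (all numbers invented for illustration) that the type II distribution over candidate values of the credibility of E given H gives most of its weight to 0.5. A sharp type II distribution would collapse to a single precise expected credibility, while a family of admissible type II weightings, reflecting the inequalities, yields only an interval of type I probabilities:

```python
from fractions import Fraction as Fr

# Candidate values of the credibility P'(E|H), and a family of
# admissible (invented) type II weightings over them.
candidates = [Fr(1, 5), Fr(1, 2), Fr(4, 5)]
weightings = [
    [Fr(1, 5), Fr(3, 5), Fr(1, 5)],
    [Fr(1, 10), Fr(4, 5), Fr(1, 10)],
    [Fr(3, 10), Fr(1, 2), Fr(1, 5)],
]

def expectation(weights):
    """Expected credibility under one type II weighting."""
    return sum(c * w for c, w in zip(candidates, weights))

expectations = [expectation(w) for w in weightings]

# A sharp type II distribution would give one precise value:
print(expectation(weightings[0]))        # 1/2
# The family as a whole supplies only type I bounds:
print(min(expectations), max(expectations))  # 47/100 1/2
```

Repeating the move at type III would put a distribution over the weightings themselves; the point of the text is that this regress becomes progressively less important in practice.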
Appendix. Continuity

(See the remark following D9.) There is a well-known strong analogy between the calculus of sets of points and the calculus of propositions. In this analogy "E is contained in F" becomes "E implies F"; E + F becomes E ∨ F; E − F becomes E·F̄; E ∩ F becomes E·F; "E is empty" becomes "E is impossible"; "all sets are contained in E" becomes "E is certain"; Eₙ ↑ becomes E₁ ⊃ E₂ ⊃ E₃ ⊃ ···, and Eₙ ↓ becomes ··· E₃ ⊃ E₂ ⊃ E₁ (where "⊃" denotes implication). Accordingly we can define, for an infinite sequence of propositions, {Eₙ},
lim sup Eₙ = (E₁ ∨ E₂ ∨ ···) · (E₂ ∨ E₃ ∨ ···) · (E₃ ∨ E₄ ∨ ···) ···,
lim inf Eₙ = (E₁·E₂ ···) ∨ (E₂·E₃ ···) ∨ (E₃·E₄ ···) ∨ ···.

If these are equal, each is called lim Eₙ. The limit of a monotonic increasing (or decreasing) sequence of propositions is E₁ ∨ E₂ ∨ ··· (or E₁·E₂ ···). The other definitions and arguments given, for example, in [15, pp. 84-85], can be at once adapted to propositions, and we see that complete additivity is equivalent to continuity, i.e., lim P(Eₙ) = P(lim Eₙ) if {Eₙ} is a monotonic sequence of propositions. It can then be proved that, for example,

P(E₁·E₂ ···) = P(E₁) · P(E₂|E₁) · P(E₃|E₁·E₂) ···,

i.e., we have "complete multiplicativity". The axiom D9 is derived from this theorem by means of the metamathematical argument of Section 4. The analogy between propositions and sets of points is imperfect. For the analogue of a point should be a logically possible proposition, E, that is not implied by any other distinct proposition that is logically possible. It is not easy to think of any such proposition E, unless the class of propositions has been suitably restricted. Fortunately the notion of a point is inessential for the above analysis: the algebra of sets of points makes little use of the notion of a point.

REFERENCES
[1] BURKILL, J. C. The Lebesgue Integral, Cambridge, University Press, 1951.
[2] CARNAP, R. Logical Foundations of Probability, Chicago, University of Chicago Press, 1950.
[3] FINETTI, B. DE. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, Vol. 7 (1937), pp. 1-68.
[4] GOOD, I. J. Probability and the Weighing of Evidence, London, Griffin; New York, Hafner; 1950.
[5] GOOD, I. J. Rational decisions. Journal of the Royal Statistical Society, Series B, Vol. 14 (1952), pp. 107-114.
[6] GOOD, I. J. Chapter 3 of Uncertainty and Business Decisions, 2nd ed., Liverpool, University Press, 1957. (Ed. by Carter, Meredith and Shackle.)
[7] GOOD, I. J. Could a machine make probability judgments? Computers and Automation, Vol. 8 (1959), pp. 14-16 and 24-26.
[8] GOOD, I. J. Kinds of probability. Science, Vol. 129 (1959), pp. 443-447.
[9] GOOD, I. J. The paradox of confirmation. British Journal for the Philosophy of Science, Vol. 11 (1960), pp. 145-149; Vol. 12 (1961), pp. 63-64.
[10] JEFFREYS, H. Theory of Probability, Oxford, University Press, 1939.
[11] KEMBLE, E. C. The probability concept. Philosophy of Science, Vol. 8 (1941), pp. 204-232.
[12] KEYNES, J. M. A Treatise on Probability, London, Macmillan, 1921.
[13] KOOPMAN, B. O. The axioms and algebra of intuitive probability. Annals of Mathematics, Vol. 41 (1940), pp. 269-292.
[14] KOOPMAN, B. O. The bases of probability. Bulletin of the American Mathematical Society, Vol. 46 (1940), pp. 763-774.
[15] LOÈVE, M. Probability Theory, Toronto, New York, London, Van Nostrand, 1955.
[16] POISSON, S. D. Recherches sur la probabilité des jugements, Paris, Bachelier, 1837.
[17] POPPER, K. R. The Poverty of Historicism, London, Routledge and Kegan Paul, 1957.
[18] RAMSEY, F. P. Chapters 7 and 8 of The Foundations of Mathematics, London, Routledge and Kegan Paul, 1931.
[19] RUSSELL, B. Human Knowledge, its Scope and Limits, London, George Allen, 1948.
[20] SAVAGE, L. J. The Foundations of Statistics, New York, Wiley; London, Chapman and Hall, 1954.
[21] STONE, M. H. Notes on integration, I, II, III, IV. Proceedings of the National Academy of Sciences, Vols. 34 and 35 (1948).