SIMPLICITY, ENTROPY AND INDUCTIVE LOGIC

KURT WALK

IBM Laboratory, Vienna, Austria

Introduction. It seems to be generally agreed that there is a principle of simplicity that plays an important role as a guide of our inductive behaviour: in situations where a choice is to be made among a number of otherwise equally acceptable hypotheses, the principle tells us to compare the hypotheses with respect to their simplicity, and to choose the hypothesis which is the simplest one. There is disagreement, however, concerning the grounds of the acceptance of this principle, and concerning the (quantitative) explication of the notion of simplicity.

In this paper I will concentrate on an explication of the notion of simplicity of state descriptions. As for the justification of the principle of simplicity I refer to the discussion of this topic by Kemeny [1953], who holds that the principle is not a presupposition concerning the simplicity of nature, but that it may be regarded as a consequence of requirements set up by methodological considerations. The formulation of the principle proposed later in this paper is essentially in line with this view.

The purpose of this paper is to advocate an explication of simplicity which leads to an entropy-like measure, and which has turned out to be useful in settling problems of application of inductive logic. Examples of the use of this interpretation of simplicity are given in the second part of the paper; they pertain to Carnap's system of inductive logic and to an extension of it. Measures of simplicity, however, already come into the picture prior to the establishment of a quantitative system of inductive logic, as a means to handle choice situations, and as a guide for the construction of a quantitative system. Hermes, Kiesow and Oberschelp give quantitative explications of the simplicity of state descriptions, with a view to justifying certain axioms of Carnap's theory of probability. In the first part of the paper I shall follow this line of thought, and show that a natural extension of the principle underlying the construction of these measures, as well as other approaches, gives rise to an entropy-like measure, and that the results obtained in this way are deeper than those obtained by Hermes. The aim is to offer reasons for the acceptance of the proposed measure.

1. The explication of simplicity by Hermes, Kiesow and Oberschelp. Carnap's quantitative system of inductive probabilities [1963] is based on a set of axioms, which includes Kolmogorov's axioms of probability theory. These axioms, which are not at all sufficient for determining numerical values of probability (except in trivial cases), may be taken to be explained in terms of the basic notions of betting theory. In fact, the requirement of the rationality of combined bets yields precisely these axioms; see Kemeny [1955], Lehman [1955] and Shimony [1955]. Hermes [1958] poses the question whether an explanation, other than the appeal to common sense, can also be given to the further axioms of Carnap's theory, and he investigates the principle of simplicity for this purpose.

In order to give the principle a precise meaning, an exact explication of simplicity is needed. This problem becomes manageable if the simplicity of statements is considered rather than that of propositions. The simplicity of a hypothesis is then measured by the simplicity of the statement which expresses it in a certain language. The first intuitive idea is to relate the simplicity of a statement to its length. Hermes does not object to the obvious consequence that the properties of the particular language will influence the resulting measure of simplicity, arguing that scientific hypotheses really do not come into existence unless they are formulated in a well-defined language, and that scientific languages therefore play in any case a decisive role in scientific thinking.

A language used to carry out these ideas will preferably be a formal language. Hermes uses for this purpose the predicate calculus with identity, with a finite number of individual constants, an arbitrary number of individual variables, and the usual logical constants, connectives, and quantifiers. What is the length of expressions in this language, and how is the length related to simplicity? An answer to these questions has been proposed by Kiesow [1958]. He first restricts the range of statements taken into consideration to state descriptions, in order to compare only statements of equal descriptive strength. The length of a state description is defined as the number of atomic statements of which it is composed. The simplicity of a state description is measured by the length of the shortest equivalent state description; this assures that equivalent state descriptions are given the same simplicity measure. A state description 1 is simpler than a state description 2 if the length of the shortest
state description equivalent to 1 is smaller than the length of the shortest state description equivalent to 2.

Let us consider a simple example. We speak about n individuals a_1, a_2, ..., a_n and one primitive predicate P. The state description sd asserts that n_1 individuals a_1, a_2, ..., a_{n_1} have the predicate P and the remaining n − n_1 individuals have the predicate ¬P, and we assume that n_1 > n − n_1. The language under consideration allows the representation of this statement as

sd: (x)[(x = a_{n_1+1} ∨ ... ∨ x = a_n) ↔ ¬Px].    (1)

The length of sd according to this convention is equal to n − n_1 + 1, which is obtained by counting the atomic statements x = a_i (i = n_1 + 1, ..., n) and the atomic statement ¬Px. It can be shown that there is no description which is shorter in this sense. The principle of description leading to the construction of this expression is the following: the statement asserts that, as a rule, the individuals have the predicate P, but that there are several exceptions, namely the individuals a_{n_1+1}, ..., a_n, which have the predicate ¬P. So the length of the expression is determined by the number of exceptional individuals which disturb the uniformity of the state described.

Before considering the implications of this measure together with the principle of simplicity, I briefly present two other explications of simplicity. One observes that it is essential for the length of state descriptions in Kiesow's sense that the language used contains the equivalence symbol ↔. This indicates a rather unpleasant dependence on the properties of the language. Although simplicity has been introduced basically as a language concept, efforts have been made to reduce this dependence. To be independent of particular logical operators, Oberschelp [1960] uses languages with no restriction on the kind of logical operators, so that the languages contain all two-place operators, all three-place operators, and so on. The problem is again to find the shortest possible description equivalent to any state description. The result is surprisingly simple, so that it can easily be reported. Let us consider languages with m exclusive predicates (Carnap's Q-predicates); then each state description induces a partitioning of the individuals into classes having the same predicate. If the number of individuals in the largest class is k_1, then the length of the shortest description is

(2)

thus depending, for given m (the number of predicates) and n (the number of individuals), only on the cardinality k_1 of this largest class.
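For the one-predicate example this counting convention is a one-liner. A minimal sketch (my code, not from the paper), assuming, as in the text, that the shortest description enumerates the smaller class as the exceptions to the rule, and extending the case n_1 > n − n_1 to the symmetric case:

    def kiesow_length(n, n1):
        """Length of the shortest state description, in Kiesow's sense, for n
        individuals of which n1 have P and n - n1 have not-P: the exceptional
        (smaller) class is enumerated, plus one atomic statement for the rule."""
        return min(n1, n - n1) + 1

    print(kiesow_length(10, 8))  # 2 exceptional individuals + 1 atom = 3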

The objection to be raised against this measure, and also against Kiesow's measure, clearly is that the distribution of the individuals over the classes other than the largest class is completely ignored. It was this objection that made Oberschelp propose as an alternative measure of simplicity not a single number, but an r-tuple of numbers:

(k_1, k_2, ..., k_r)    (3)

where the k_i are the cardinalities of the classes of individuals, ordered with respect to their size. A state description sd_1 in this sense has to be regarded as simpler than a state description sd_2 if the first number (from left to right) differing for the two descriptions is larger for state description sd_1.

Hermes [1958] makes the following comments on such an explication of simplicity: first, we prefer a measure which is a single number; second, the considerations which lead to measure (2) are based on an arbitrarily complicated language (it contains arbitrarily complicated logical operations), so that the complexity of a situation to be described may only cause the use of complex language elements instead of causing a long descriptive expression.

I will postpone the question of the explication of simplicity at this point and consider the significance of the principle of simplicity for probability theory. Hermes proposes the following formulation as an axiom: a state description sd_1 is more probable than a state description sd_2 if and only if the simplicity of sd_1 is greater than the simplicity of sd_2, and the two state descriptions are equally probable if they are equally simple.

The following example illustrates the use of the principle. We consider the case of a singular prediction. The evidence is a state description of n individuals specifying the cardinal numbers k_1, k_2, ..., k_m of the classes of individuals having the same predicate. We have the choice among a set of hypotheses h_1, h_2, ..., h_m, where the hypothesis h_i predicts that the (n + 1)st individual (not mentioned in the evidence) has the predicate P_i (i = 1, 2, ..., m). We compute the length of the state description of all n + 1 individuals under the assumption that h_i is true. Doing this for all hypotheses h_i allows the comparison of the hypotheses with respect to simplicity. The result is, for all measures of simplicity mentioned, that the simplest case is obtained when the new individual is associated with the largest class, i.e. that hypothesis is the most probable which gives the new individual the predicate that occurs most frequently in the evidence. The result confirms exactly that axiom in Carnap's axiom system which indicates the direction of inductive behaviour: things that are experienced to happen most frequently are most probable to be experienced in the future.
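To make the comparative use of these measures concrete, here is a small sketch (my own code with hypothetical function names, not from the paper) that implements the r-tuple comparison (3) and replays the singular-prediction example; as the text states, the hypothesis joining the new individual to the largest class comes out simplest:

    # Oberschelp's r-tuple measure (3): a state description is summarized by
    # its class cardinalities in decreasing order; the description whose first
    # differing entry is larger counts as the simpler one.

    def simplicity_tuple(class_sizes):
        return tuple(sorted(class_sizes, reverse=True))

    def simpler(sd_a, sd_b):
        """True if sd_a (a list of class cardinalities) is simpler than sd_b."""
        return simplicity_tuple(sd_a) > simplicity_tuple(sd_b)  # lexicographic

    # Singular prediction: evidence with classes k_1, ..., k_m; hypothesis h_i
    # puts the (n+1)st individual into class i.  Rank hypotheses by simplicity.
    evidence = [5, 3, 2]
    candidates = []
    for i in range(len(evidence)):
        extended = evidence.copy()
        extended[i] += 1
        candidates.append((simplicity_tuple(extended), i))
    best = max(candidates)[1]
    print("most probable hypothesis joins class", best)  # 0, the largest class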

2. The explication of simplicity by entropy. To begin with, I return to the measure proposed by Kiesow. I have already stated that the description principle leading to this measure is the description of the universe (i.e. the set of individuals) as being partitioned into a class of individuals following a rule and a class of individuals constituting exceptions to this rule. If the whole universe follows the rule (i.e. if all individuals are alike), we have the most pleasant case for description: it suffices to report the rule. We will extend this principle of description in a natural way. The approach is characterized by the following features:

(1) In the originally proposed measures there was only one rule which could be stated and to which exceptions could be given. We reformulate the description principle by saying that we will first give a rough description of the universe (a higher-order description), i.e. we assign the universe to a member of a set of possible classes of universes; in a second step we specify the details necessary to give a full specification. The total effort of description is the sum of the efforts involved in the two steps. The second effort will be the smaller, the fewer details are left open by the higher-order description. The classes of universes, which are the object of the higher-order description, are to be chosen in a way which is most natural for the universes under consideration. In the example considered in this paper we propose to characterize a class by the set of numbers k_i, which are the cardinalities of the subsets of individuals having the same predicates.

For illustration we look at an example of a different universe. Suppose we know a finite number of points of a curve in the plane and we have to select a hypothesis which gives a description of the whole curve. It seems quite natural to classify all possible curves according to their order, so that a higher-order description specifies the order of a curve. The details needed to specify fully a curve of the nth order are, for example, the coordinates of n + 1 points. A total description consists of a description of the order and a description of the details. We will choose the hypothesis which is the simplest to describe (which for almost any language used will give the curve of the lowest possible order).

(2) We will not consider a specific language in our explication, but will use the amount of decision (the decision content in the sense of information theory) involved in a description as a measure of the length of the description. This means that if a choice among k possibilities is to be made, then the amount of decision involved in the selection of one possibility by a specifying description is equal to log k. If the logarithm of base r is used, then, according to a theorem of information theory, the minimum average length
of a description specifying the selection (averaged over all possible selections), if an alphabet of r letters is used, is greater than or equal to log_r k. In addition, it has been shown in information theory that this is a good bound. Using the decision content as a measure of the effort of description retains the original intent of explicating simplicity as a language concept. But the resulting measure, specifying the minimum length of an expression, is independent of special language properties.

Of course, if we consider the total number of possible state descriptions for n individuals and m predicates, we could compute the decision content of a state description as the logarithm of this number, which is then the same for all state descriptions. This is a useless measure; we get reasonable results only if we compute the decision content in connection with the levels of description required under (1). We thus replace the dependence on specific language properties by a dependence on a specific description principle. The classes of state descriptions specified by the numbers k_i correspond to Carnap's structure descriptions. The number of different structure descriptions for given m and n is

N_s = (n + m − 1)! / (n! (m − 1)!)    (4)

and the number of state descriptions for a given structure description is

N_d = n! / ∏_i k_i!    (5)

The decision content of a total description is

D = log N_s + log N_d    (6)

which for fixed m and n depends only on the second term

D_2 = log N_d = log (n! / ∏_i k_i!).    (7)

(The base of the logarithms is left unspecified; it is only assumed to be fixed.) We now assert that a state description 1 is simpler than a state description 2, and is to be given the higher probability, if D_2 of description 1 is smaller than D_2 of description 2. The measure D_2 is a single number and depends on the distribution of the individuals over all classes. It is thus possible to give a sensible ordering of the state descriptions with respect to their probabilities which is based on the simplicity principle. The measure is an entropy-like measure: for the number n of individuals approaching infinity, D_2 approaches nH:

D_2 → nH    (8)

where H is the entropy of the distribution of the individuals over the classes:

H = −Σ_i (k_i/n) log (k_i/n).    (9)
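The limit (8) can be checked numerically. The following sketch (my code, not the paper's; natural logarithms are assumed, which is allowed since the base was left open) computes D_2 from (7) via the log-gamma function and compares it with nH as the population is scaled up:

    import math

    def D2(ks):
        """Decision content (7): log(n! / prod_i k_i!)."""
        n = sum(ks)
        return math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in ks)

    def nH(ks):
        """n times the entropy (9) of the relative frequencies k_i / n."""
        n = sum(ks)
        return -sum(k * math.log(k / n) for k in ks if k > 0)

    for scale in (1, 10, 100, 1000):
        ks = [3 * scale, 2 * scale, 1 * scale]
        print(sum(ks), round(D2(ks), 2), round(nH(ks), 2))  # ratio tends to 1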

Before discussing the use of our explication of simplicity for probability theory, I sketch another approach which leads to the same measure (8). We again consider a population of individuals and a set of predicates defining equivalence classes of individuals. In the most uniform population, all the individuals belong to one such class, i.e. the individuals are not distinguishable with respect to their properties. Speaking about an individual implies, however, that the individuals can be uniquely identified. The individuals are given names for this purpose, and in order to make n individuals of a uniform population identifiable, we have to supply n different names. Again according to information theory, a name out of a set of n different names must contain at least log_r n decisions between r possibilities, i.e. it consists of at least log_r n symbols out of an r-symbol alphabet (if all names are of equal length, which however is an unnecessary restriction). If the population is not uniform, we have to supply different names only for individuals which are in one and the same class, in order to "individualize" the population. If we have the predicates red and black, we may speak of a "red Jack" and a "black Jack", using the name "Jack" for two different individuals. If the i-th class contains k_i individuals, then the total minimum length of names for this class is k_i log k_i, and the minimum length for the total population is

Σ_i k_i log k_i = n log n − nH.    (10)
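The identity (10) follows in one line by writing k_i = n · (k_i/n) inside the logarithm; in LaTeX, with the notation of (9):

    \sum_i k_i \log k_i
      = \sum_i k_i \log\Bigl(n\,\frac{k_i}{n}\Bigr)
      = \Bigl(\sum_i k_i\Bigr)\log n + n\sum_i \frac{k_i}{n}\log\frac{k_i}{n}
      = n\log n - nH .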

The variable part in this expression is again the entropy H of the distribution. We now consider a population the simpler (or the more uniform), the less individuality its members have, i.e. the greater the effort that is required to make them individuals.

We now return to the example of the singular prediction from the preceding chapter and apply the measure we have defined. We compute the decision content of the over-all state description sd_i under the assumption that the hypothesis h_i is true:

D(sd_i) = log [(n + 1)! / (k_1! k_2! ... (k_i + 1)! ... k_m!)].    (11)

It is easily seen that

D(sd_i) < D(sd_j)  if  k_i > k_j.

This results in an ordering of the hypotheses with respect to their simplicity and thereby with respect to their probability, so that we have the basis for a comparative concept of confirmation. The result is more far-reaching than the result obtained with the measures based on the predicate calculus, which does not give an ordering of the hypotheses but only separates out the one hypothesis which is to be considered the most probable.

A quantitative system of inductive logic should retain the ordering of state descriptions according to their probabilities as defined by their simplicities. In other words, there must exist a monotonic function transforming the simplicities into the probabilities of state descriptions. For the case of Carnap's c*-function this transformation is especially simple. The measure function (a priori probability) of a state description specifying the set of numbers k_1, k_2, ..., k_m in the c*-system is explained in Carnap [1952]:

ca(sd) = (m − 1)! ∏_i k_i! / (m + n − 1)!    (12)

The decision content (6), which explicates the simplicity of state descriptions, is given by

D = log [(m + n − 1)! / ((m − 1)! ∏_i k_i!)]    (13)

so that the relation to the measure function is

D = −log ca(sd).    (14)
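Relation (14) is easy to verify mechanically. In the sketch below (my code; the function names and the use of natural logarithms are my own choices) the measure (12) and the decision content (13) are computed independently, and exp(−D) reproduces ca(sd):

    import math

    def ca(ks, m):
        """Measure function (12): (m-1)! * prod_i k_i! / (m+n-1)!."""
        n = sum(ks)
        num = math.factorial(m - 1)
        for k in ks:
            num *= math.factorial(k)
        return num / math.factorial(m + n - 1)

    def D(ks, m):
        """Decision content (13): log((m+n-1)! / ((m-1)! * prod_i k_i!))."""
        n = sum(ks)
        return (math.lgamma(m + n) - math.lgamma(m)
                - sum(math.lgamma(k + 1) for k in ks))

    ks, m = [4, 2, 1], 3
    print(math.exp(-D(ks, m)), ca(ks, m))  # agree up to rounding: (14)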

This simple relation illustrates the exceptional position of the c*-function.

We have explicated simplicity as the effort of description, and thus as the effort involved in reporting about a situation, so that there is something which may well be called information. We can reformulate the principle of simplicity in the following way: if you have to explain a situation by a hypothesis, then do this by adding as little information as possible, i.e. choose the hypothesis which contains the minimum of what one could call "semantic noise". There is the simple case where we have observed n individuals, n_1 of which have the predicate P, and where we have to select a hypothesis describing a set of new individuals. If n_1 is larger than n − n_1, then we choose the hypothesis
assigning P to all new individuals. This follows from the simplicity principle in all its interpretations and also from Carnap's theory, although we may expect (and in fact do expect) a rather different distribution of P and ¬P among the new individuals. But if we accept this different distribution, we also have to make a decision concerning the arrangement of P- and ¬P-individuals, i.e. we have to add information we do not have, which could equally well be supplied by a random-output procedure, and which is thus "noise".

An important thing to note is that in information theory the concept of probability is prior to the concept of information, and that the use of the notion of information in an inductive system is in danger of becoming circular. In the present approach, however, we have made no reference to probability when speaking about simplicity and information; we just referred to a principle of description.

3. Applications to inductive logic. Inductive logic in the sequel will denote Carnap's system of inductive logic and an extension of it whose essential features have been described in Walk [1965]. The extension concerns a world of individuals that are arranged in a linear order. A short description of this system is appropriate at this point.

The system is based on a set of languages L_{N,m}. If a specific language L_{N,m} is used, then the description of a finite sequence of individuals specifies the properties of all the N-tuples of adjacent individuals in the sequence. One N-tuple is characterized by an ordered sequence of N predicates, where the predicates are from the set of m predicates contained in L_{N,m}. There are, obviously, m^N different N-tuples possible. The information used for induction is the knowledge of the frequencies of the various N-tuples in the sequence. An atomic sentence in Carnap's system, which associates an individual constant with a predicate, corresponds to the description of an N-tuple of individuals in L_{N,m}.

The N-tuples, however, do not occur independently of one another. For example, let N = 3, m = 2, and the two predicates be P_1 and P_2; then a 3-tuple of individuals in a sequence with the predicates P_1 P_2 P_2 implies the existence of a 3-tuple (immediately adjacent to the right) with the predicates P_2 P_2 P_1 or P_2 P_2 P_2. The set of possible adjacent N-tuples at a certain point in a sequence makes up the "state" of this point. The above example illustrates that the state is determined by an (N − 1)-tuple of predicates (in the example the 2-tuple P_2 P_2). The structure of a sequence as induced by L_{N,m} is representable by a directed graph, as shown in the figure below for L_{3,2}. The nodes of the graph represent the states of the sequence; the arcs, labelled with the predicates, represent the transitions from individuals in the
sequence to adjacent individuals having the respective predicates. Each path in the graph corresponds to a sequence of individuals, and vice versa. A singular predictive inference concerns the transition from the end state of a sequence described in the evidence to one of the possible successor states.

[Figure: the state graph of L_{3,2}; nodes are the four states (2-tuples of predicates), arcs are labelled with the predicates P_1, P_2.]

The description of a sequence in L_{N,m} specifies the frequencies of the m^N different N-tuples in the sequence. Because of the interdependence of the N-tuples, only certain sets of frequencies are possible for a sequence of a given number of individuals. For convenience, we think of a set of relative frequencies as represented by a vector f in an m^N-dimensional space S, where each coordinate measures the frequency of a specific N-tuple. Each possible description of a sequence of n individuals marks a vector in the space. For an infinite sequence, a certain continuum of relative frequencies of N-tuples is possible, which defines a subspace S_∞ in the frequency space.

The estimation of the long-run frequencies, given the frequencies in a finite sequence, is a transition from the vector representing these frequencies in S to one of the vectors in S_∞. A distinguished vector in S_∞ is the vector f′ for which the difference vector to f is minimal; f′ is obtained by projection of f on S_∞. The frequency vector f′ corresponds to the "direct estimation" in Carnap's system. The vector of inductive probabilities for the singular predictive inferences is obtained as a weighted mean of this vector f′ and a vector of a priori probabilities of the N-tuples. In a way completely analogous to the establishment of the inductive probability for the singular prediction in Carnap's system, the vector f′ is weighted by the amount of evidence, i.e. the number of N-tuples in the sequence described in the evidence, and the a priori probabilities are weighted by a factor λ. This constitutes a continuum of inductive methods for L_{N,m}; N = 1 yields Carnap's original system.

To be somewhat more general at some points of the following discussion, we mention an obvious extension based on languages L_G. Here G represents a strongly connected finite-state graph whose arcs are labelled with predicates. This graph serves the functions of the graphs of L_{N,m} in a
more general way, allowing for the consideration of sequences with logical constraints in their construction. The method which we just sketched applies to this case, provided that the description of each sequence allows the identification of its states as states of the graph (this is always possible, for instance, when the initial state is given).
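The state graph is easy to construct mechanically. The sketch below (my code, with an assumed encoding of the predicates as integers 0 ... m−1) builds the graph for arbitrary N and m: states are the (N − 1)-tuples of predicates, and each arc appends the predicate of the next individual. For N = 3, m = 2 it reproduces the four-state graph of the figure; a singular predictive inference then amounts to choosing one of the arcs leaving the end state of the observed sequence.

    from itertools import product

    def state_graph(N, m):
        """States of L_{N,m}: (N-1)-tuples of predicates (encoded 0..m-1).
        An arc labelled j leads from state s to s[1:] + (j,), i.e. the next
        individual in the sequence has predicate P_{j+1}."""
        return {s: {j: s[1:] + (j,) for j in range(m)}
                for s in product(range(m), repeat=N - 1)}

    g = state_graph(3, 2)
    for state, arcs in sorted(g.items()):
        print(state, "->", arcs)  # four states, two labelled arcs each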

3.1. The parameter λ. The choice of the parameter λ, corresponding to the choice of a specific inductive method out of the continuum of methods offered by Carnap's system, is determined by a priori considerations on the "uniformity" of the universe to which the method is to be applied. The necessity of this choice usually troubles the prospective user of inductive logic. A picture of the implications of the choice of λ is obtained by computing the a priori probabilities of structure descriptions of a large population. A structure description is characterized by the set of relative frequencies of individuals with equal predicates. The choice λ = m, m being the number of exclusive predicates (Q-predicates), gives equal a priori probabilities to all structure descriptions. A large λ assigns high a priori probabilities to structure descriptions characterized by relative frequencies centred around the value 1/m, i.e. to descriptions of highly inhomogeneous populations. A low λ highly confirms structure descriptions of uniform populations, i.e. populations allowing statements like "almost all individuals have predicate P" or "almost none of the individuals have predicate Q".

The a priori considerations before the application of inductive logic include the matching of one's guess on the uniformity of the universe of discourse with the family of a priori probability functions that are at hand. Leaving "uniformity" unspecified, as it is in Carnap's books, means leaving the user with the problem of matching two unspecified quantities. Even if he were able to give an ordering of his possible universes according to some uniformity criterion, he would not necessarily be able to do so for structure descriptions associated with their a priori probabilities. We will take "uniformity" to be synonymous with "simplicity" and use the explication of simplicity by entropy. The entropy of a structure description specifying the set of relative frequencies f_i of individuals with predicate P_i is given by

H = −Σ_i f_i log f_i.    (15)

Weighting the entropy of every structure description by its a priori probability results in an expectation value of the entropy which is a function of λ (see Walk [1963]):

E(H) = F(λ).    (16)
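For one point of the continuum the expectation (16) can be checked directly: the choice λ = m was noted above to give equal a priori probabilities to all structure descriptions, so E(H) there is the plain average of the entropies (15) over all structure descriptions. A brute-force sketch for small n (my code, not the paper's):

    import math

    def compositions(n, m):
        """All ordered tuples (k_1, ..., k_m) of non-negative integers with
        sum n, i.e. all structure descriptions for n individuals, m predicates."""
        if m == 1:
            yield (n,)
            return
        for k in range(n + 1):
            for rest in compositions(n - k, m - 1):
                yield (k,) + rest

    def entropy(ks):
        """Entropy (15) of the relative frequencies k_i / n."""
        n = sum(ks)
        return -sum(k / n * math.log(k / n) for k in ks if k > 0)

    def expected_H_equal_weights(n, m):
        """E(H) for lambda = m: all structure descriptions equiprobable."""
        structs = list(compositions(n, m))
        return sum(entropy(ks) for ks in structs) / len(structs)

    print(expected_H_equal_weights(20, 2), math.log(2))  # E(H) < H_max = log m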

This function F(λ) is monotonic, so that each value of λ corresponds to a unique value of the expectation of H, and vice versa. λ = ∞ corresponds to E(H) = H_max and λ = 0 to E(H) = 0. The equation relates the parameter of the inductive method to a measure that is well established as a measure of information: the entropy of a structure description is the average amount of information, in the sense of information theory, provided by the observation of an individual mentioned in the structure description. Since H_max is determined by the underlying language, it suffices to deal with relative entropy, or the quantity

R = 1 − H/H_max    (17)

which is known as redundancy. To expect a universe to be redundant to a certain degree gives good reason to choose the parameter λ of the inductive system to be applied in such a way that the expectation of redundancy provided by this system matches this expectation. There is a large variety of situations where a guess of redundancy is at hand. As an example, it is well known that the redundancy of written text in natural language (computed on the basis of letter frequencies) is about 15%. The estimation function of inductive logic, if applied to the universe of written text, is reasonably chosen with the corresponding parameter λ. (Experiments show that the efficiency of estimation is near optimum for this choice.)

3.2. The choice of a priori probabilities. A priori probabilities are usually chosen according to symmetry considerations. The a priori probabilities of the singular predictive inferences in Carnap's system are evenly distributed among all predicates, for there is no a priori reason for preferring one predicate to another, and in the absence of information there is no a posteriori reason either.

In a situation of highest ignorance we expect the highest information to be supplied by the event that removes the ignorance. So the set of possible events that may remove our complete ignorance, say the observation of the properties of the first individual, will be given probabilities that reflect this expectation. This is just another way of expressing the same consideration. Measuring information by entropy means that we have to maximize the entropy defined by the a priori probabilities. This gives, as an almost trivial result, equal a priori probabilities to the singular predictive inferences for all predicates in Carnap's system. The result is not quite trivial when the languages L_G are considered: we choose the a priori probabilities for the transitions in graph G in such a way that the entropy of the resulting Markov
source is maximized. This rule gives a unique solution for the a priori probabilities.

We have discussed in the preceding paragraph the choice of λ on the basis of some a priori knowledge of the redundancy. There seems to be a contradiction now between the presupposition of a priori knowledge of redundancy and the rule that redundancy is minimized by the choice of a priori probabilities. Let us assume that we have good reason to expect a certain amount of redundancy; then we can try to choose a priori probabilities in such a way that this redundancy is obtained. This choice, however, is not unique. A unique solution for the set of probabilities is obtained only in the single case of the entropy being maximal. To make a selection among the different solutions would mean to add information that is not available, in contradiction to the above formulation of the principle of simplicity.

3.3. The choice of the language. Faced with an inductive problem, the first step is the formulation in a suitable language. This step implies a selection of the information which is used in induction, and of the information that is ignored. Using one of the languages underlying Carnap's system, for example, implies that no relation between individuals is considered, that individuals are distinguished by a fixed set of predicates, and that names of individuals have no inherent meaning. A language fits a given problem if the relevant information is used and the irrelevant information is ignored.

The choice of the language is not only a matter of a priori considerations. Let us consider the languages L_{N,m} for the description of sequences. We may describe a given sequence of n individuals by any language L_{N,m} with N ≤ n and a pertinent number of predicates m. With N = 1 the description contains just the frequencies of individuals with equal predicates; correlation properties of the sequence are ignored. The description with N = n is the specification of one single N-tuple; since induction is based on enumeration in Carnapian systems, this provides little evidence.

Which language is the best possible one for the description of a given sequence when the order of the underlying Markov process is not known? There is an answer, at least for the case that there is some a priori knowledge sufficient to justify the choice of λ for each language. Each inductive method, based on a certain language and a certain λ, then gives a different picture of the world, and we may compare these pictures with respect to their simplicity. We may ask for the expected simplicity of the universe, simplicity measured in terms of entropy, and get a definite answer. The expected simplicity will in general be the higher, the more information is provided by the evidence and the more regularities are in the evidence. So it
may be concluded that a description of the evidence which results in the expectation of a simpler universe provides more relevant information and shows up more regularities, under the proviso, of course, that the inductive methods used are reasonable. We will prefer the description which makes us expect the highest simplicity. Thinking of entropy as a measure of information, we prefer a description which, using a reasonable inductive method, minimizes the amount of information which we expect to gain by experiencing further evidence. Gain of new evidence will in general make a different language the optimum language for description. Step-by-step observation will increase N steadily up to a certain point (for ergodic sequences). This is what one would intuitively expect; the above considerations give it quantitative support.

4. Conclusion. The notion of simplicity has been used in the preceding chapter in considering application problems of inductive logic. The first example was the use of the explication of simplicity by entropy for the matching of the a priori expectation of simplicity provided by inductive logic to the expected simplicity of the universe to which it is to be applied; the second example was a use of the principle of simplicity; and the third example was the use of expected simplicity as a measure of success of an inductive method.

The simplicity principle basically governs choice situations. The examples presented concern choice situations from a rather low level (the choice of a hypothesis stating a singular predictive inference) up to the level of the choice of an inductive system. Let us assume for the moment the existence of a general inductive method which would yield a quantitative solution to any inductive problem as soon as it is formulated in some sufficiently rich language. In this hypothetical case there would be no explicit application of the principle of simplicity. The above considerations are symptoms of the fact that there is only a modest range of problems that can be treated with inductive logic in a straightforward manner. Those who believe in the development of inductive logic may hope that these considerations will eventually be subsumed under a more general system of inductive logic.

References

CARNAP, R., 1950, The logical foundations of probability (University of Chicago Press, Chicago; second edition, 1963)

CARNAP, R., 1952, The continuum of inductive methods (University of Chicago Press, Chicago)
CARNAP, R., 1963, An axiom system for inductive logic, in: The Philosophy of Rudolf Carnap (The Library of Living Philosophers), ed. P. A. Schilpp (Open Court, La Salle, Illinois) pp. 973-979

HERMES, H., 1958, Zum Einfachheitsprinzip in der Wahrscheinlichkeitsrechnung, Dialectica, vol. 12, pp. 317-331

KEMENY, J. G., 1953, The use of simplicity in induction, Philosophical Review, vol. 62, pp. 391-408

KEMENY, J. G., 1955, Fair bets and inductive probabilities, J. Symbolic Logic, vol. 20, pp. 263-273

KIESOW, H., 1958, Die Anwendung eines Einfachheitsprinzips auf die Wahrscheinlichkeitsrechnung, Archiv für mathematische Logik und Grundlagenforschung, vol. 4, pp. 27-41

LEHMAN, R. S., 1955, On confirmation and rational betting, J. Symbolic Logic, vol. 20, pp. 251-262

OBERSCHELP, W., 1960, Über Einfachheitsprinzipien in der Wahrscheinlichkeitstheorie, Archiv für mathematische Logik und Grundlagenforschung, vol. 5, pp. 3-25

SHIMONY, A., 1955, Coherence and the axioms of confirmation, J. Symbolic Logic, vol. 20, pp. 1-28

WALK, K., 1963, Kumulative Information, Nachrichtentechnische Zeitschrift, vol. 16, pp. 523-528

WALK, K., 1965, Extension of inductive logic to ordered sequences, IBM Technical Report TR 25.053