On the logic of information retrieval


Inform. Stor. Retr.

Vol. 2, pp. 217-220. Pergamon Press 1965. Printed in Great Britain

ON THE LOGIC OF INFORMATION RETRIEVAL*

WILLIAM GOFFMAN
Center for Documentation and Communication Research, School of Library Science, Western Reserve University, Cleveland, Ohio 44106

I

AN INFORMATION retrieval process may in general be characterized in terms of two sets Q and S together with a relation R. The set Q represents queries, the set S a file of documents, and the relation R, called relevance, assigns to any q ∈ Q a certain subset A of S, called the answer to q.

The classical two valued propositional calculus has generally been accepted as the underlying logic of an information retrieval process. This follows directly from the use of the so-called Boolean Model as a representation of the process. In the Boolean Model queries are represented by propositional functions, and the file of documents to be interrogated by a finite set. Since both the algebra of sets and the propositional calculus are Boolean algebras, there is a unique subset of the file which corresponds to each propositional function. For each member of the file the propositional function takes on a truth value of "1" or "0" depending upon whether the document is or is not relevant to the query. Hence the answer consists of all members of the file which take on a value of one, i.e., all members for which the propositional function is assertable. Moreover, because of the Boolean property, the answer can be obtained by taking unions, intersections, and complements of those sets which correspond to the propositional variables appearing respectively in disjunctions, conjunctions, and negations in the propositional function.

Since the Boolean Model has been shown to be inadequate for describing an information retrieval process [1], it follows that the two valued propositional logic is also inadequate. If we consider relevance not as a property which is definite for each member of the file relative to the query, as is assumed in the Boolean Model, but more realistically as a measure of information conveyed relative to the query [2], it is necessary for the propositional function representing the query to take on more than two truth values. In fact, the propositional function should be permitted to take on any value on [0, 1]† rather than only "0" or "1". Hence what is required to describe the process of information retrieval is an infinitely valued logic. The logical system which seems to be most appropriate for this purpose is the probability logic suggested by Reichenbach [3]. This system, which we shall call Lp, may be constructed as follows:

* This work is supported by AFOSR Grant No. 403-64.
† There is no loss in generality by restricting the set of truth values to [0, 1].
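The Boolean Model described in this section can be illustrated with a small sketch. The file contents, the index terms, and the helper names `docs_with` and `truth_value` are hypothetical and serve only to show how a query built from conjunction, disjunction, and negation maps to intersections, unions, and complements of subsets of the file.

```python
# Hypothetical toy file: document id -> set of index terms (illustrative data only).
FILE = {
    "d1": {"logic", "retrieval"},
    "d2": {"logic", "probability"},
    "d3": {"retrieval"},
    "d4": {"probability"},
}
ALL_DOCS = set(FILE)


def docs_with(term):
    """Subset of the file corresponding to one propositional variable."""
    return {doc for doc, terms in FILE.items() if term in terms}


# Query: logic AND (retrieval OR NOT probability).
# Conjunction -> intersection, disjunction -> union, negation -> complement.
answer = docs_with("logic") & (
    docs_with("retrieval") | (ALL_DOCS - docs_with("probability"))
)


def truth_value(doc):
    """Two valued truth value of the query for a single document."""
    return 1 if doc in answer else 0


print(sorted(answer))
```

In this sketch every document receives exactly "1" or "0", which is the definiteness of relevance that the text goes on to question.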

II

Initially we shall admit a denumerable number of propositional variables denoted by capital letters P, Q, R, S, ... together with a number of propositional operators in the infinitely valued logic. Since these operators correspond to conjunctions, disjunctions, negations, and implications in the two valued case, we can adopt the conventional symbols (∧, ∨, ¬ and →) to denote them without confusion. Propositional functions in Lp are then defined by the following set of rules:

1. Every propositional variable is a propositional function.
2. If P is a propositional function, then ¬P is a propositional function.
3. If P and Q are propositional functions, then P ∧ Q, P ∨ Q, and P → Q are propositional functions.
4. There are no propositional functions except by virtue of rules 1-3.

All real numbers x such that 0 ≤ x ≤ 1 shall constitute the set of truth values which propositional functions in Lp are permitted to take. With every propositional function F(P, Q, ..., T) there is associated a truth function f(p, q, ..., t), where p, q, ..., t are the truth values of the propositional variables P, Q, ..., T respectively. The value of the truth function f is said to be the truth value of the propositional function F. Thus the truth value of a propositional function is determined solely on the basis of the truth values of its propositional variables. Two propositions in Lp are equivalent "≡" if and only if they always take the same truth value.

We shall now define the truth functions of the elementary propositional functions by generalizing the two valued case. Thus

1. if F(P) ≡ ¬P, then f(p) = 1 - p
2. if F(P, Q) ≡ P ∧ Q, then f(p, q) = p·q
3. if F(P, Q) ≡ P ∨ Q, then f(p, q) = p + q - p·q
4. if F(P, Q) ≡ P → Q, then f(p, q) = 1 - p + p·q.
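The four truth functions above translate directly into code. The following is a minimal sketch; the function names `neg`, `conj`, `disj`, `impl`, and `agrees_with_two_valued` are illustrative and not part of the paper. The last function checks that, when the truth values are restricted to 0 and 1, the definitions reduce to the classical two valued operators.

```python
from itertools import product


def neg(p):
    """Truth function of ¬P."""
    return 1.0 - p


def conj(p, q):
    """Truth function of P ∧ Q."""
    return p * q


def disj(p, q):
    """Truth function of P ∨ Q."""
    return p + q - p * q


def impl(p, q):
    """Truth function of P → Q."""
    return 1.0 - p + p * q


def agrees_with_two_valued():
    """Check that on {0, 1} the functions match the classical operators."""
    for p, q in product((0, 1), repeat=2):
        if neg(p) != int(not p):
            return False
        if conj(p, q) != int(p and q):
            return False
        if disj(p, q) != int(p or q):
            return False
        if impl(p, q) != int(not p or q):
            return False
    return True


print(agrees_with_two_valued())
print(neg(0.5), conj(0.5, 0.5), disj(0.5, 0.5), impl(0.5, 0.5))
```

For intermediate values the functions no longer return 0 or 1; with p = q = 0.5, for example, the conjunction takes the value 0.25 and the disjunction 0.75.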

A number ξ in the half open interval [0, 1) denotes a designator truth value [4]. All truth values x such that x > ξ are referred to as designated, while those truth values x ≤ ξ are said to be undesignated. In the two valued logic, propositional functions which take a value of 1 are asserted and those which take a value of 0 are denied. Similarly, in the infinitely valued logic Lp, propositional functions taking designated truth values are said to be asserted, while those taking undesignated truth values are said to be denied. Thus the truth value in Lp can be considered as a measure of the degree of assertability of a propositional function. The designator truth value does not affect the structure of the logic but only determines the number of assertable propositions, i.e., as the designator ξ increases, the number of assertable propositions decreases.

As in the two valued case, propositional functions which take a value of 1 regardless of the values of their propositional variables are universally valid, while those taking a value of 0 irrespective of the values of their propositional variables express contradictions. We wish to impose on our logic the requirement that any proposition implies itself. Hence P → P must be a universally valid proposition, i.e., its truth function must always be equal to 1. Thus f(p) = 1 - p + p·p = 1 and p² = p.
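The role of the designator can be sketched as a simple threshold test. The sketch below reads "designated" as strictly greater than the designator, which is an assumption about the boundary case; the name `is_asserted` and the sample truth values are hypothetical.

```python
def is_asserted(value, xi):
    """True if a truth value is designated for designator xi in [0, 1)."""
    if not 0.0 <= xi < 1.0:
        raise ValueError("designator must lie in the half open interval [0, 1)")
    return value > xi


# Hypothetical truth values taken by five propositional functions.
values = [0.1, 0.4, 0.5, 0.8, 1.0]

# Number of asserted propositions for three increasing designators.
counts = {xi: sum(is_asserted(v, xi) for v in values) for xi in (0.0, 0.45, 0.9)}
print(counts)
```

The truth values themselves are unchanged by the choice of designator; only the count of asserted propositions shrinks as the designator grows.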


It can easily be verified that

1. propositions in Lp obey the commutative, associative, and distributive laws with respect to the operators "∧" and "∨", and
2. every theorem in the two valued logic is a universally valid proposition in Lp.

A feature of the two valued propositional logic is that not only

a. is the truth value of a propositional function determined by the truth values of its propositional variables, but also
b. the designation of a propositional function can be determined by the designations of its propositional variables.

This is easily seen from the truth tables of the operators. Thus

1. A proposition is true if and only if its negation is false.
2. A conjunction is true if and only if both of its conjuncts are true.
3. A disjunction is false if and only if both of its disjuncts are false.

We can construct truth tables for the operators in Lp by assigning the value T to propositions which take a designated truth value for some arbitrary designator ξ and assigning the value F to those propositions which take undesignated truth values. For the case where ξ ≥ 1/2 we obtain

                    P ∧ Q              P ∨ Q
  P     ¬P       Q = T    Q = F     Q = T    Q = F
  T     F        (T, F)   F         T        T
  F     (T, F)   F        F         T        (T, F)

For the case where ξ < 1/2 we obtain

                    P ∧ Q              P ∨ Q
  P     ¬P       Q = T    Q = F     Q = T    Q = F
  T     (T, F)   (T, F)   F         T        T
  F     T        F        F         T        (T, F)
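The entries of the two tables can be explored numerically. The sketch below, whose function names and grid resolution are assumptions, samples truth values on a finite grid, labels each value T or F against a designator, and collects every label the result of an operator is observed to take. Sampling can only show that an outcome is attainable; it does not prove that an unobserved outcome is impossible.

```python
from fractions import Fraction
from itertools import product


def label(x, xi):
    """T for a designated value (x > xi), F otherwise."""
    return "T" if x > xi else "F"


def tables(xi, steps=100):
    """Observed designations of ¬P, P ∧ Q, P ∨ Q on a grid of truth values."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    neg_table, conj_table, disj_table = {}, {}, {}
    for p in grid:
        neg_table.setdefault(label(p, xi), set()).add(label(1 - p, xi))
    for p, q in product(grid, repeat=2):
        key = (label(p, xi), label(q, xi))
        conj_table.setdefault(key, set()).add(label(p * q, xi))
        disj_table.setdefault(key, set()).add(label(p + q - p * q, xi))
    return neg_table, conj_table, disj_table


# One designator from each case: xi >= 1/2 and xi < 1/2.
high = tables(Fraction(3, 5))
low = tables(Fraction(3, 10))
print(high[0], low[0])
```

Exact rational arithmetic is used so that values lying exactly on the designator are labelled consistently.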

Since these truth tables are multi-valued, conditions 1-3, although still necessary, are no longer sufficient. Hence, in this case the designation of the propositional function cannot be obtained in terms of the designations of the propositional variables. This deficiency could be removed if Lp could be represented by any n valued system, where n is finite, which yields single valued truth tables. We have already seen that this condition does not hold for n = 2. Suppose we now consider the case where n = 3.* Let all truth values in the closed interval [1 - ξ, ξ] be denoted as indeterminate, I. This yields the following truth tables:

                         P ∧ Q                            P ∨ Q
  P    ¬P     Q = T       Q = I    Q = F     Q = T   Q = I    Q = F
  T    F      (T, I, F)   (I, F)   F         T       T        T
  I    I      (I, F)      (I, F)   F         T       (I, T)   (I, T)
  F    T      F           F        F         T       (I, T)   (F, I, T)

* We shall pursue the argument for ξ ≥ 1/2 only. The case where ξ < 1/2 follows in a similar manner.
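The three valued labelling can be checked on a grid in the same spirit. In this sketch the designator 3/5, the grid resolution, and the names `label3`, `neg_outcomes`, `tt_conj`, and `ff_disj` are assumptions chosen for illustration; as before, sampling shows which outcomes occur but is not a proof.

```python
from fractions import Fraction


def label3(x, xi):
    """Label for xi >= 1/2: T above xi, F below 1 - xi, I in [1 - xi, xi]."""
    if x > xi:
        return "T"
    if x < 1 - xi:
        return "F"
    return "I"


xi = Fraction(3, 5)
grid = [Fraction(i, 100) for i in range(101)]

# Negation: collect the labels of 1 - p for each label of p.
neg_outcomes = {}
for p in grid:
    neg_outcomes.setdefault(label3(p, xi), set()).add(label3(1 - p, xi))

# Conjunction of two T operands and disjunction of two F operands.
t_vals = [p for p in grid if label3(p, xi) == "T"]
f_vals = [p for p in grid if label3(p, xi) == "F"]
tt_conj = {label3(p * q, xi) for p in t_vals for q in t_vals}
ff_disj = {label3(p + q - p * q, xi) for p in f_vals for q in f_vals}

print(neg_outcomes, sorted(tt_conj), sorted(ff_disj))
```

For this designator the negation column comes out single valued, while the conjunction of two T operands and the disjunction of two F operands each reach all three labels.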


The situation has grown worse, since we now have three valued truth tables. If we increase the number of truth values to five, where I1 represents truth values in (1/2, ξ], I2 a truth value of 1/2, and I3 truth values in [1 - ξ, 1/2), then the resulting truth tables could be five valued. In fact, regardless of the number of truth values (intervals) we select, the truth tables will always be at least two valued. This follows from the density property of the real numbers, for given any real number ξ in [0, 1] there exist two real numbers η1 and η2 in [0, 1] such that η1 > ξ, η2 > ξ and η1·η2 < ξ, and real numbers η3 < ξ and η4 < ξ such that η3 + η4 - η3·η4 > ξ.
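The density argument can be made concrete by exhibiting such numbers for a given designator. The construction below is one possible choice and is an assumption of this sketch, not the author's: for 0 < ξ < 1 it takes η1 = η2 halfway between ξ and √ξ, and η3 = η4 halfway between 1 - √(1 - ξ) and ξ. The function name `witnesses` is hypothetical.

```python
import math


def witnesses(xi):
    """Return (eta12, eta34) for 0 < xi < 1 such that
    eta12 > xi with eta12 * eta12 < xi, and
    eta34 < xi with eta34 + eta34 - eta34 * eta34 > xi."""
    if not 0.0 < xi < 1.0:
        raise ValueError("this construction assumes 0 < xi < 1")
    # Any value strictly between xi and sqrt(xi) exceeds xi but squares below it.
    eta12 = (xi + math.sqrt(xi)) / 2.0
    # Any value strictly between 1 - sqrt(1 - xi) and xi stays below xi,
    # yet its disjunction with itself exceeds xi.
    eta34 = (xi + 1.0 - math.sqrt(1.0 - xi)) / 2.0
    return eta12, eta34


for xi in (0.1, 0.5, 0.9):
    eta12, eta34 = witnesses(xi)
    print(xi, eta12 > xi > eta12 * eta12, eta34 < xi < 2 * eta34 - eta34 * eta34)
```

The first pair gives two designated conjuncts whose conjunction is undesignated, and the second gives two undesignated disjuncts whose disjunction is designated, so neither table entry can be single valued.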

We have thus shown that:

Theorem 1: The system Lp cannot be represented by a finite valued system having single valued truth tables.

In terms of information retrieval, this theorem implies that no finite valued logic is adequate for representing an information retrieval process.

REFERENCES

[1] J. VERHOEFF, W. GOFFMAN and J. BELZER: Inefficiency of the Use of Boolean Functions for Information Retrieval Systems, Commun. A.C.M., 1961, 4, [12], 557-559.
[2] W. GOFFMAN: A Searching Procedure for Information Retrieval, Inform. Stor. Retr., 1964, 2, 73-78.
[3] H. REICHENBACH: The Theory of Probability, University of California Press (1949).
[4] J. B. ROSSER and A. R. TURQUETTE: Many-Valued Logics, Amsterdam (1952).