Knowledge-Based Systems 9 (1996) 483–489
Analogic: Inference beyond logic

Karvel K. Thornber

NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, USA
Abstract

Analogic is the unique class of many-valued, non-monotonic logics which preserves the richness of inference of (Boolean) logic and the manipulability of the (Boolean) algebra underlying logic and which, in addition, contains a number of unexpected, emergent properties that extend inferentiability in non-trivial ways beyond the limits of logic. One such inference is rada (reductio ad absurdum, reasoning by contradiction, but now in the absence of excluded middle). This is important to retain, since direct proofs of many theorems are not known. Another example is chaining. Transitivity is uncommon in many-valued logics; in analogic, however, we can carry out inferences in either direction, even through weak links in the chain. The latter, impossible in logic, simulate intuitive leaps in reasoning. Protologic effects inferences using only (n + 1) implications where logic requires up to 2^n implications. Indeed, protologic has no counterpart in logic, or in any other form of reasoning. Analogic is useful in formulating problems which are largely inferential, including document and pattern classification and retrieval. These inference properties, long sought in alternative logics by adding appropriate axioms or other implicit or explicit restrictions, are now available in analogic, itself the result of removing an axiom and letting inferences become many-valued. © 1997 Elsevier Science B.V.

Keywords: Many-valued logic; Non-monotonic logic; Fuzzy logic; Inference; Fidelity
When one first encounters non-Boolean logics, one's initial reaction is disbelief. Gone is the unmatched manipulability of Boolean algebra which underlies Boolean logic; gone is the rich inferential structure at the heart of all explicit and formal, and at the root of most implicit and informal, reasoning. In fact, until a few years ago there was no non-Boolean, or more specifically no many-valued, logic which preserved these seemingly essential features. Even more frustrating was the difficulty of ascertaining, from the point of view of inference, whether anything was gained by each novel logic. (Many non-Boolean logics, e.g. fuzzy logic, were not created to effect novel inferences. Similarly, many heuristics referred to as inferences, such as probabilistic inference, are not inferences in the sense of reasoning.) All too often the only inference explicitly formalized is modus ponens [(A, A ⇒ B) → B], as if, as is the case in Boolean logic, all other inferences follow readily from this primitive rule and the underlying algebra. They do not, and except for an occasional modus tollens [(¬B, A ⇒ B) → ¬A], none was ever achievable. The purpose of this review is to summarize the now widely scattered inferences which make up analogic [1–12]. These include reductio ad absurdum (reasoning by contradiction), long thought to be impossible in many-valued logics; intuitive leap, an inference impossible in Boolean logic; and protologic, which has no counterpart at all in Boolean or any other logic. In addition,
analogic preserves the manipulability and richness of inference of Boolean logic. It arises when one removes the restriction of excluded middle (A ∨ ¬A = 1), and equivalently that of non-contradiction (A ∧ ¬A = 0), from the list of properties usually demanded of Boolean algebra. This permits many-valued logics to be considered. (Like Euclid's parallel axiom, excluded middle is often regarded as less essential than the other axioms usually stated to define logic formally.) As we discuss these inferences in turn, it is important to keep in mind that they are all emergent: they result from the removal of an axiom. Nothing was put in, either explicitly or implicitly, in order to achieve the derived results. We are thus able to take the process of inference beyond logic, to see what other aspects of reasoning we may reasonably expect to capture. In addition to dropping the axiom of excluded middle, the major conceptual change necessary for analogic was to realize that inference had to be generalized from a two-level to a multilevel procedure. This is non-trivial. 'From A and A ⇒ B, B may be inferred' is the primitive statement on which all formal reasoning is based. Either one can effect an inference by this rule of detachment, or one cannot. It is primitive in that no way is known to express the consequent B as an expression, in the algebra underlying the logic, involving the given antecedent A and implication A ⇒ B. (Indeed, any proof of non-primitiveness would require its use.) In Boolean logic, if A = 1 and (A ⇒ B) = 1 we infer B = 1;
otherwise we can infer nothing. In all other non-Boolean logics this yes/no aspect of inference had been simply carried over from Boolean logic, with the result that almost nothing novel could ever be achieved. On the other hand, one would expect that in order to infer the value of B from the values of A and of A ⇒ B in the context of a many-valued logic, some modification of the various rules of inference would be necessary. The key to analogic, and hence to inference beyond logic, was to effect the simplest modification possible. Before discussing this, however, we must first discuss the algebra underlying the logic. In Boolean algebra quantities A, B, C, … can take on the values {0, 1}. These combine according to AND(A, B) = A ∧ B = 1 if A = B = 1, and = 0 otherwise; OR(A, B) = A ∨ B = 0 if A = B = 0, and = 1 otherwise; complement(A) = ¬A = 1 if A = 0, and = 0 if A = 1. These satisfy a great many properties, the most important of which are:

1. Idempotency: A ∧ A = A, A ∨ A = A
2. Associativity: A ∧ (B ∧ C) = (A ∧ B) ∧ C, A ∨ (B ∨ C) = (A ∨ B) ∨ C
3. Commutativity: A ∧ B = B ∧ A, A ∨ B = B ∨ A
4. Distributivity: A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C), A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C)
5. de Morgan's laws: ¬(A ∧ B) = ¬A ∨ ¬B, ¬(A ∨ B) = ¬A ∧ ¬B
6. Equivalence: (A ∧ B) ∨ (¬A ∧ ¬B) = (¬A ∨ B) ∧ (A ∨ ¬B)
7. Absorption: A ∨ (A ∧ B) = A, A ∧ (A ∨ B) = A
8. Identity: A ∨ 0 = A, A ∧ 1 = A
9. Nullness: A ∧ 0 = 0
10. Fullness: A ∨ 1 = 1
11. Involution: ¬¬A = A
12. Law of excluded middle: A ∨ ¬A = 1
13. Law of non-contradiction: A ∧ ¬A = 0

Satisfying all these properties renders Boolean algebra the single most manipulable of all algebras. Passing now to many-valued logics, A, B, C, … take values in [0, 1]. Many interpretations exist for AND, OR and complement when A, B, C, … can take on three or more values. However, if even only the most basic properties are to be preserved, the choice for AND and OR is unique for a surprising number of combinations of these properties [5].
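The list above is easy to verify exhaustively over {0, 1}. A quick sketch (Python; the check itself is mine, not from the paper) uses min, max and 1 − a, which coincide with the Boolean connectives on {0, 1}:

```python
from itertools import product

AND = min
OR = max
NOT = lambda a: 1 - a

# Exhaustive check of a few of properties (1)-(13) over the Boolean values.
for a, b, c in product((0, 1), repeat=3):
    assert AND(a, AND(b, c)) == AND(AND(a, b), c)            # associativity
    assert AND(a, OR(b, c)) == OR(AND(a, b), AND(a, c))      # distributivity
    assert NOT(AND(a, b)) == OR(NOT(a), NOT(b))              # de Morgan
    assert OR(a, AND(a, b)) == a and AND(a, OR(a, b)) == a   # absorption
    assert OR(a, NOT(a)) == 1 and AND(a, NOT(a)) == 0        # (12) and (13)
print("all checked properties hold on {0, 1}")
```

The same AND, OR and NOT applied to intermediate values of [0, 1] still satisfy (1)–(11) but, as discussed next, not (12) and (13).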
For example, choosing idempotency, associativity, monotonicity in each argument, continuity and Boolean agreement fixes A ∧ B = min(A, B) and A ∨ B = max(A, B). Complementation, on the other hand, is very general: so long as ¬1 = 0, ¬0 = 1, ¬¬A = A, and ¬A is continuous and strictly decreasing with increasing A, properties (1)–(11) will be satisfied, with, of course, AND as minimum and OR as maximum. Unless otherwise specified, we use ¬A = 1 − A. This compromises no subsequent result. We call this algebra the Lukasiewicz algebra in memory of its first proponent. Next, we must choose something for implication. Lukasiewicz first tried (A ⇒ B) = max(¬A, B), the natural
extension from Boolean logic, where (A ⇒ B) = (¬A ∨ B). But since then (A ⇒ B) = 1 only for A = 0 or B = 1, he chose (A ⇒ B) = min(1, B + 1 − A), which is unity for B ≥ A. He, like those who followed, insisted on (A ⇒ B) = 1 if valid inferences were to be effected. The price he paid was extreme: his implication is incompatible with max and min for OR and AND. Hence manipulability is greatly restricted. For example, (A ∧ B) ⇒ C is no longer equivalent to A ⇒ (¬B ∨ C). Others have tried to remedy this situation by choosing (A ∧ B) = max(0, A + B − 1), (A ∨ B) = min(1, A + B), but then one loses idempotency: A ∨ A and A ∧ A are no longer A, which is completely unsatisfactory for logic. In order to retain the manipulability of Boolean logic, we choose the Kleene–Dienes implication

(A ⇒ B) = max(¬A, B)

As we shall see, this choice enables us to infer the value of B whenever ¬A < (A ⇒ B), a feat not possible when (A ⇒ B) = 1 for B ≥ A. It also enables us to write down well-defined extensions of all the inferences of logic. (Regarding manipulability, the lack of (12) and (13) is only somewhat restrictive. For example, there will now be 84 different functions of two variables, with an additional 113 if the value 1/2 is permitted over non-infinitesimal intervals of the argument. Inference is not affected by this limitation, which if dropped would force us back to Boolean logic.) Many-valued logicians eschew (A ⇒ B) = max(¬A, B) because then (A ⇒ A) = max(¬A, A), which is unity only if A ∈ {0, 1}. They argue that 'if A, then A' must have unit membership for all 0 ≤ A ≤ 1 [6]. But this assumes that 'if A, then B' is an interpretation of A ⇒ B, which logicians since George Boole have realized is not the case for logic. (For if in A ⇒ B we set A = 0, then (A ⇒ B) = 1 for B ∈ {0, 1}. Were 'if A, then B' an interpretation of A ⇒ B, both 'if A = 0, then B = 0' and 'if A = 0, then B = 1' would have unit value (would both be true), which is impossible for arbitrary A, B.)
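The contrast between the two candidate implications can be exercised numerically. In the sketch below (Python; the function names and sample values are mine, for illustration), the Lukasiewicz implication saturates at unity and hides B, while the Kleene–Dienes implication remains informative:

```python
def kleene_dienes(a, b):
    # Analogic's choice: (A => B) = max(~A, B), with ~A = 1 - A.
    return max(1 - a, b)

def lukasiewicz_imp(a, b):
    # Lukasiewicz's choice: (A => B) = min(1, B + 1 - A); unity whenever B >= A.
    return min(1, b + 1 - a)

# The Lukasiewicz implication saturates, so B cannot be recovered from it:
assert lukasiewicz_imp(0.4, 0.7) == 1.0
# The Kleene-Dienes implication does not: whenever (A => B) > ~A, the max
# construction forces B = (A => B).
imp = kleene_dienes(0.4, 0.7)
assert imp > 1 - 0.4 and imp == 0.7
# And (A => A) = max(1 - A, A) is unity only at the Boolean corners:
assert kleene_dienes(1.0, 1.0) == 1.0
assert kleene_dienes(0.5, 0.5) == 0.5
```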
I have never seen it explained why they did not also insist on (A ⇒ ¬A) being zero. While in logic (A ⇒ A) = 1, in logic (A ⇒ ¬A) = ¬A rather than zero. How Boolean logic survived this most 'embarrassing' asymmetry I do not know. The simple fact, however, is that if 'if A, then B' is not an interpretation of A ⇒ B even in logic, no intuition regarding the former need carry over to the latter. (There are also numerous mathematical reasons why A ⇒ A need not be unity, but it is satisfying that we need not venture beyond logic to remove this restriction. A scientist is someone to whom nothing is too obvious to be invalid.) (At this point we can at last see why most many-valued logics gridlock with respect to manipulability and inference. To retain excluded middle (A ⇒ A = 1) but pass to a many-valued logic, others of the remaining properties of logic must be modified. Given the uniqueness properties of the Lukasiewicz algebra and the necessity of retaining implication as an algebraic operation, this would seem to be very difficult,
unless perhaps one departs radically from Boolean logic altogether. It is, of course, not at all obvious that our choice will retain the inferences of logic and take us beyond logic; it is to this end that the rest of this paper is devoted.) We can, at last, turn to the inferences. Doing so, however, we encounter the concept of fidelity [3,4], the validity of inference, which plays a key role in all inferences beyond logic [7,8]. As indicated above, if (A ⇒ B) > ¬A we can infer B = (A ⇒ B): since for (A ⇒ B) = (¬A ∨ B) > ¬A, it follows from the property of maximum that B = (¬A ∨ B) = (A ⇒ B). However, should (A ⇒ B) = ¬A, all we know is 0 ≤ B ≤ (A ⇒ B). (While the latter information might be useful in some cases, our only concern here is when a specific value for B (or A) is obtainable.) Now, we should not always expect to be able to infer B from A and A ⇒ B. Indeed, in logic both must be unity to infer that B is unity. However, we see that A ⇒ B need not be unity in order to effect inferences of B's value. Indeed we can simply write (A, A ⇒ B) → B = (A ⇒ B) whenever (A ⇒ B) > ¬A for modus ponens. But what do we put for B when (A ⇒ B) = ¬A? If we imagine a hardware implementation of modus ponens, with A and A ⇒ B as input and B as an output, some voltage, charge, current, flux, etc., will always be on the output line. A simple representation of (A ⇒ B) > ¬A is f_B = (A ⇒ B) v ¬A, where x v y = x − y for x ≥ y and x v y = 0 for x < y. If f_B > 0 the necessary inequality is satisfied and B = (A ⇒ B) can be inferred. If f_B = 0, we ignore B. If (A ⇒ B) and A are noisy, f_B expresses a noise margin. In logic, f_B = 1 would indicate a valid inference and f_B = 0 an invalid one. In this way we are led to the fundamental modus ponens and modus tollens inferences for analogic:

(A, A ⇒ B) → B = (A ⇒ B), f_B = (A ⇒ B) v (¬A)

(¬B, A ⇒ B) → ¬A = (A ⇒ B), f_¬A = (A ⇒ B) v (¬¬B)

and regard f_B and f_¬A as the degrees of validity, i.e. the fidelities, of these inferences.
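These two rules transcribe directly into code. In the sketch below (Python; the operator v is written out as a function, and the numeric values are my own illustrations):

```python
def v(x, y):
    # Bounded difference used for fidelities: x v y = x - y if x >= y, else 0.
    return x - y if x >= y else 0.0

def modus_ponens(a, imp):
    # (A, A => B) -> B = (A => B), with fidelity f_B = (A => B) v (~A).
    return imp, v(imp, 1 - a)

def modus_tollens(not_b, imp):
    # (~B, A => B) -> ~A = (A => B), with fidelity f_~A = (A => B) v (~~B).
    return imp, v(imp, 1 - not_b)

# With A = 0.75 and (A => B) = 0.5: ~A = 0.25 < 0.5, so B = 0.5 is inferred.
b, f_b = modus_ponens(0.75, 0.5)
assert (b, f_b) == (0.5, 0.25)
# With A = 0.25: ~A = 0.75 > 0.5, the fidelity is zero and B is ignored.
b, f_b = modus_ponens(0.25, 0.5)
assert f_b == 0.0
```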
In our hardware implementation, A and (A ⇒ B) are input and B and f_B are output. Fidelity plays an unexpectedly crucial role in simplifying inference expressions, avoiding inequalities, enabling representations, etc. We shall shortly see that if an inference involves chaining, e.g. A ⇒ B, B ⇒ C, C ⇒ D, the fidelity of the result is the minimum of the fidelity of each link in the chain. If the same consequent, say B, can be inferred via several routes, the value corresponding to the highest fidelity, rather than the highest membership value, is to be regarded as most valid. This enables non-monotonicity, in that subsequently obtained lower values of B with higher fidelity can replace higher, more preliminary values with lower fidelity. While some many-valued logics had rudimentary modus ponens inferences, and a few even had a modus tollens inference, no one even attempted reductio ad absurdum (reasoning by contradiction), since presumably such an inference is impossible in the absence of a principle of excluded middle. For analogic, however, such inferences,
which we refer to as rada, are trivial [1,2]:

(A ⇒ B, A ⇒ ¬B) → ¬A = min(A ⇒ B, A ⇒ ¬B), f_¬A = (A ⇒ B) v ¬(A ⇒ ¬B)

These are essential, since in logic no direct proofs are known for many theorems proved using reductio ad absurdum. Another important inference lacking in most many-valued logics is chaining. Usually relying on transitive implications, (A ⇒ B) ∧ (B ⇒ C) ≡ (A ⇒ C), chaining becomes impossible when this equivalence is lost. However, approaching transitivity from inference, we have no problem in effecting chaining [1–4]. Consider the chain A ⇒ X_1, X_1 ⇒ X_2, X_2 ⇒ X_3, …, X_{n−2} ⇒ X_{n−1}, X_{n−1} ⇒ X_n, X_n ⇒ B, where each of these implications is a number R_m in [0, 1], m = 0, …, n. Given A, we can effect the following inferences: (A, A ⇒ X_1) → X_1 = R_0, f_X1 = R_0 v ¬A; (X_1, X_1 ⇒ X_2) → X_2 = R_1, f_1 = R_1 v ¬R_0; …; X_n = R_{n−1}, f_{n−1} = R_{n−1} v ¬R_{n−2}; B = R_n, f_n = R_n v ¬R_{n−1}. The fidelity of the consequent B is the minimum of f_X1 and the fidelity of the chain, f_AB = min(f_1, …, f_n). This latter quantity depends only on the implications composing the chain and not on A (for modus ponens) or B (for modus tollens). The membership value inferred for B depends only on the final implication X_n ⇒ B, and, were we to traverse the chain in the reverse direction for ¬A, that for ¬A depends only on the initial implication A ⇒ X_1. Thus we can represent the chain by

(A ⇒_d B) = (A ⇒ X_1, X_n ⇒ B, f_AB)

where ⇒_d means derived implication. This leads to a transitivity relation for derived implication of the (closed) form

(A ⇒_d B, B ⇒_d C) → (A ⇒_d C) = (A ⇒ X_1, Y_m ⇒ C, f_AC)

where f_AC = min(f_AB, f_BC, (B ⇒ Y_1) v ¬(X_n ⇒ B)), and inferences

(A, A ⇒_d C) → C = (Y_m ⇒ C), f_C = min(f_AC, (A ⇒ X_1) v ¬A)

(¬C, A ⇒_d C) → ¬A = (A ⇒ X_1), f_¬A = min(f_AC, (Y_m ⇒ C) v ¬¬C)

Buried in the fidelity of the chain, f_AB, is a most curious result [1,2]. In order that f_AB > 0, each of the f_m must satisfy f_m > 0, which in turn requires (X_m ⇒ X_{m+1}) v ¬(X_{m−1} ⇒ X_m) > 0 for each m.
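A chain of implications can then be walked mechanically. The sketch below (Python; names and numbers are mine, for illustration) infers B and its fidelity from A and the implication values R_0, …, R_n; note that a weak middle link between strong neighbours still leaves positive fidelity:

```python
def v(x, y):
    return x - y if x >= y else 0.0   # bounded difference for fidelities

def chain_infer(a, rs):
    """rs = [R_0, ..., R_n]: values of A => X_1, X_1 => X_2, ..., X_n => B."""
    f = v(rs[0], 1 - a)                    # f_X1 = R_0 v ~A
    for prev, cur in zip(rs, rs[1:]):
        f = min(f, v(cur, 1 - prev))       # f_m = R_m v ~R_{m-1}
    return rs[-1], f                       # B = R_n; fidelity min(f_X1, f_AB)

# Strong-weak-strong: the 0.25 link sits between 0.875 links, so each f_m
# stays positive and the inference survives, though with diminished fidelity.
b, f = chain_infer(1.0, [0.875, 0.25, 0.875])
assert b == 0.875 and f == 0.125
```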
Now consider two adjacent pairs of implications, say X_3 ⇒ X_4, X_4 ⇒ X_5 and X_5 ⇒ X_6. If (X_3 ⇒ X_4) = 0.9 and (X_5 ⇒ X_6) = 0.8, then so long as (X_4 ⇒ X_5) is larger than 0.2, f_4 and f_5 will be non-zero and the chain fidelity, though diminished, will not vanish owing to the small value of X_4 ⇒ X_5. The curious aspect here is that in the limit of Boolean logic all (X_m ⇒ X_{m+1}) < 1/2 will tend to zero, breaking the chain and rendering any Boolean inference unwarranted. In analogic, however, such inferences can still be effected. We call this phenomenon intuitive leap, in that even with a weak link in (A ⇒_d B) one can still relate an antecedent A
and a consequent B which would be totally missed in logic. Once one knows A and B are related, one can seek stronger links connecting them. This is a common aspect of human reasoning which no prior logic has succeeded in representing. Note that on either side of a weak link (X_m ⇒ X_{m+1}) < 1/2 there must be stronger links (X_{m−1} ⇒ X_m) > 1/2 and (X_{m+1} ⇒ X_{m+2}) > 1/2; thus as many as every other link can be weak. Before describing the more sophisticated inferences involving implications, we must consider inclusion. Instead of regarding A, B, C, … as numbers in [0, 1], we could just as well regard them as functions A(x), B(y), C(z), …, taking values in [0, 1]. Now in logic, if A(x) ⇒ B(x) for all x in some domain of interest, then A ⇒ B would be regarded as a theorem valid for all those x. Passing now to many-valued logics, A, B, … represent membership functions that x has property A, B, …. Just as a set A of elements x satisfying A(x) is said to be a subset of B, A ⊂ B, if for each A(x) we have B(x), a natural definition for inclusion relating the membership of A(x) in B(x) would be (Kitainik, 1987) [13]

(A ⊂ B) = ∧_x (A(x) ⇒ B(x))

where ∧_x represents minimization over the elements x of interest. Thus the inclusion of A in B is governed by the least valid (weakest) implication. This result is both useful and general. It is useful in that all inferences expressed with implications and derived implications can be re-expressed with inclusion and derived inclusion. It is general in that the A ⊂ B above is unique if one requires only (1) that it be independent of the order (permutation) of the elements x, (2) that it reduce to the Boolean result in the {0, 1} limit, (3) that it be contrapositive, (A ⊂ B) = (¬B ⊂ ¬A), and (4) that it be distributive, ((A ∨ B) ⊂ C) = (A ⊂ C) ∧ (B ⊂ C) and (A ⊂ (C ∧ D)) = (A ⊂ C) ∧ (A ⊂ D). Inclusion is often regarded as the key to inference owing to its embodiment of results holding for many examples.
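Since inclusion is the minimum of the pointwise implications, it is one line of code. In this sketch (Python; the membership values are mine, for illustration) the weakest pointwise implication governs the result:

```python
def inclusion(A, B):
    # (A in B) = min over x of (A(x) => B(x)), with (A => B) = max(1 - A, B).
    return min(max(1 - a, b) for a, b in zip(A, B))

A = [0.25, 0.75, 0.875]    # membership of each element x in A
B = [0.5,  0.5,  0.9375]   # membership of each element x in B
# Weakest pointwise implication: max(1 - 0.75, 0.5) = 0.5, at the middle x.
assert inclusion(A, B) == 0.5
```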
Analogic clearly captures this embodiment. By contrast, probability theory would seem unable to do so: if it were able to, then inclusion would have to be representable as some measure, while the above, very non-measure-like expression is unique. Hence any measure-theoretic approach would have to weaken one or more of the above four properties of inclusion and thereby curtail the resulting inferences, much as pre-analogic many-valued logic was unable to emulate the most basic inferences of logic. One of the most important inferences in logic is the cut-rule inference:

(A_1 ⇒ L ∨ B_1, A_2 ∧ L ⇒ B_2) → ((A_1 ∧ A_2) ⇒ (B_1 ∨ B_2))

Here the auxiliary proposition L is eliminated, or cut, from two related implications to yield a single, coupled implication in the principal propositions. Suppose, for example, you are in a line of traffic which dips out of sight a few cars ahead of you and comes back into view many cars down the line. The nearest cars are all white and the farthest are all other colors; so you ask yourself whether a non-white car is
somewhere just ahead of a white one. If W_m is the proposition that the mth car is white and J_{m,m+1} is the just-ahead proposition, then we have the set of implications W_1 ⇒ J_{1,2} ∨ W_2, W_2 ⇒ J_{2,3} ∨ W_3, …, W_{n−2} ⇒ J_{n−2,n−1} ∨ W_{n−1}, W_{n−1} ∧ ¬W_n ⇒ J_{n−1,n}. Cutting W_2 through W_{n−1} yields

(W_1 ∧ ¬W_n) ⇒ ∨_m J_{m,m+1}

which both provides an affirmative answer to the question of adjacency and suggests the (cognitive) rule of change: if at one position in a line an entity is X and at another the entity is ¬X, at some position there must be a change and hence an adjacency of X and ¬X. Although one could go through the above inference step by step, or use mathematical induction on m, in reality the eliminations should be viewed as simultaneous: parallel rather than serial. This is readily seen from the cut rule in analogic [3,4]. Consider the set (A_1 ⇒ (B_1 ∨ L_1)) = R_1, ((L_1 ∧ A_2) ⇒ (B_2 ∨ L_2)) = R_2, …, ((L_{n−2} ∧ A_{n−1}) ⇒ (B_{n−1} ∨ L_{n−1})) = R_{n−1}, ((L_{n−1} ∧ A_n) ⇒ B_n) = R_n. Now the motivation in logic behind the cut rule is that even though for no r is A_r ⇒ B_r valid in all cases, for each case or example (x), A_s(x) ⇒ B_s(x) for some s, and there are auxiliaries L_r(x) which can be chosen to render the other R_r valid. For example, choosing L_1, …, L_{s−1}, ¬L_s, …, ¬L_{n−1} unity would satisfy all implications whenever A_s ⇒ B_s. In analogic, we choose L_1, …, L_{s−1}, ¬L_s, …, ¬L_{n−1} to be larger than R_s, where (A_s ⇒ B_s) > (A_r ⇒ B_r) for all r ≠ s. Then

∧_r R_r = ∧_r ((L_{r−1} ∧ A_r) ⇒ (B_r ∨ L_r)) = ∨_r (A_r ⇒ B_r) = ((∧_r A_r) ⇒ (∨_r B_r))

which effects the cut-rule elimination. Here L_0 = 1 and L_n = 0 to unify the notation. What is the fidelity of such an expression? Can we take advantage of intuitive leap here also? In order that L_r not dominate both R_r and R_{r+1}, it is necessary and sufficient that f_r = (R_{r+1} v ¬R_r) > 0. The overall fidelity of (∧A) ⇒ (∨B) is therefore ∧_r f_r.
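The simultaneous elimination can be checked numerically. In the sketch below (Python; the propositions and values are mine, for illustration), the auxiliaries L_r are chosen as described, and ∧_r R_r does equal (∧_r A_r) ⇒ (∨_r B_r):

```python
def implies(a, b):                       # Kleene-Dienes implication
    return max(1 - a, b)

A = [0.875, 0.9375, 0.75]
B = [0.25, 0.8125, 0.375]
direct = [implies(a, b) for a, b in zip(A, B)]
s = direct.index(max(direct))            # strongest direct implication A_s => B_s
big = max(direct) + 0.03125              # any value above R_s will do
# L_0 = 1, L_n = 0; L_r large up through the strong implication, ~L_r large after:
L = [1.0] + [big if r <= s else 1 - big for r in range(1, len(A))] + [0.0]
R = [implies(min(L[r], A[r]), max(B[r], L[r + 1])) for r in range(len(A))]
# The cut-rule identity holds:
assert min(R) == implies(min(A), max(B)) == max(direct)
```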
As with chaining, some R_r can be less than 0.5 and not prevent executing the cut rule; but note that, as with chaining, these R_r cannot be adjacent (directly coupled, in this case). Analogous results obtain when implication is replaced by inclusion, and even by derived implication and derived inclusion. Unlike control theory, where the laws of nature exert sufficient constraints on system response that methods based on (fuzzy) interpolation are adequate, there are many weakly constrained situations, such as pattern classification and retrieval, where more inferential approaches are required. If one searches for the patterns most similar to a given pattern, clearly the desired patterns are not non-existent interpolated patterns. However, we also know that it is very difficult, if not impossible, to classify patterns of any richness in such a way that all patterns of comparable similarity are classified together: choosing one set of features to classify by will prevent crisp searches in terms of alternative sets. Analogic, because of its unexpected
successes in extending the full range of logical inferences to many-valued membership functions, offers the possibility of overcoming these obstacles. As is well known in computer science, the most efficient search of a database is accomplished by casting the database into a tree structure and then searching the tree from the root to the leaves for the item of interest. Inference on a tree structure is more complicated than simple chaining because at each node one must choose which chain to follow [10–12]. Also, the values of the implications must be determined from the data itself; indeed, the data should indicate its own classification for efficient searching. Consider the following set of inferences on a tree structure:

(A ∧ S_mq ⇒ B_m) = R_m

(B_m ∧ S_mkq ⇒ C_mk) = R_mk

(C_mk ∧ S_mkrq ⇒ D_mkr) = R_mkr

Here A is the membership function for the root of the tree, S_mq is the similarity of a query pattern to the cluster-centre of the mth cluster, B_m is the membership function of the mth cluster, S_mkq is the similarity of the query to the kth subcluster of the mth cluster, C_mk is the latter's membership function, etc., and R_m, R_mk, … are the values (membership functions) of their respective implications. The idea, of course, is that one starts with a query 'belonging' to A, with value A_q, determines S_mq for each of M clusters, and infers B_mq = R_mq with fidelity f_mq = R_mq v ¬(A ∧ S_mq). Now among the M clusters only a fraction (η_M M) will have f_mq > 0 (or f_mq greater than some positive threshold, if more appropriate). Only these clusters will be searched further. These are searched in a similar manner, and the process is iterated down the tree. Leaves reached with non-zero fidelity can be searched pattern-by-pattern for possible commonality. The natural question to ask is: what are the R_m, R_mk, etc.?
Consider the quantity

b_m = max_{j′∉m} S_mj′

where S_mj′ is the similarity of a filed pattern j′ (in the database) not associated with the mth cluster to the cluster centre of cluster m. (Generally, a filed pattern j is nominally associated with cluster m if S_mj > S_m′j for all clusters m′ ≠ m.) Now suppose b_m ≈ 0. This means no pattern outside cluster m is similar to those in cluster m. In this case we expect R_m ≈ 1: similarity of a query to m will indicate a large membership in m; the m cluster is relatively sharp, i.e. non-diffuse. On the other hand, b_m ≈ 1 means that patterns associated with other clusters can nonetheless be quite similar to those in cluster m. Now we would expect R_m ≈ 0: even high similarity of a query to m will not indicate a large membership. It should come as no surprise that we take R_m = 1 − b_m as the membership function of the implication associated with the similarity (S_mq) of a query to cluster m. We can refer to 1 − b_m as the sharpness of cluster m. One more
quantity is also suggested:

a_m = min_{j∈m} S_mj

Here a_m specifies the tightness of cluster m. The larger a_m, the larger must be the similarity of the members of cluster m to its centre (prototype or average member). Returning now to the inference on our tree of clusters, we noted that f_mq = R_mq v ¬(A_q ∧ S_mq) = min(R_mq v ¬A_q, R_mq v ¬S_mq). Since we are taking R_m = 1 − b_m, strictly speaking R_mq ≥ 1 − b_m; however, for simplicity we will ignore the pattern dependence of R_m: that of S_mq will suffice. Thus f_mq = min(R_m v ¬A, R_m v ¬S_mq). Subsequent fidelities include f_mkq = min(f_mq, R_mk v ¬B_m, R_mk v ¬S_mkq) = min(f_mq, R_mk v ¬S_mkq), and f_mkrq = min(f_mkq, R_mkr v ¬S_mkrq). Unlike the fidelities for a chain, those for a tree involve two separate but related chains. The first involves only the filed patterns and the clustered structure chosen to represent them: min((1 − b_mk) v b_m, (1 − b_mkr) v b_mk, …). Recalling intuitive leap, we see that diffuse clusters can be tolerated if they precede and follow sharper clusters. The second reflects the actual similarities of the query pattern to the cluster centres: min(S_mq v b_m, S_mkq v b_mk, S_mkrq v b_mkr, …). Now suppose q is at least as similar to cluster m as the least similar filed pattern associated with m. Then S_mq v b_m ≥ a_m v b_m. By constructing the clusters so that we maximize the minimum of a_m v b_m over m, we can ensure that similar patterns will be found whenever the query is of comparable similarity to at least one centre. On the other hand, if the query is similar to patterns associated with another centre, S_mq v b_m ≈ 0 and the cluster need not be searched further. Suppose that the query is a filed pattern. Then S_mq v b_m is at least a_m v b_m for the m of the pattern, and zero for the other m's. The latter observation follows because b_m is the maximum of S_mj′ over patterns not assigned to m; hence S_mq ≤ b_m for q any filed pattern not assigned to m. Thus filed patterns can be found most efficiently.
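The pieces above assemble into a toy sketch (Python; all names, similarities and membership values are mine, for illustration only): compute a cluster's sharpness R_m = 1 − b_m from the filed patterns, then run one level of query inference, keeping only clusters with positive fidelity f_mq:

```python
def v(x, y):
    return x - y if x >= y else 0.0     # bounded difference

def sharpness(sim_to_centre, members):
    """R_m = 1 - b_m, where b_m is the best similarity to m's centre
    among patterns filed OUTSIDE cluster m."""
    b_m = max(s for j, s in sim_to_centre.items() if j not in members)
    return 1 - b_m

def search_level(a_q, sims, rs):
    """sims[m] = S_mq, rs[m] = R_m. Keep clusters with f_mq > 0."""
    kept = []
    for m, (s_mq, r_m) in enumerate(zip(sims, rs)):
        f_mq = v(r_m, 1 - min(a_q, s_mq))   # f_mq = R_m v ~(A_q ^ S_mq)
        if f_mq > 0:
            kept.append((m, r_m, f_mq))     # inferred membership B_mq = R_m
    return kept

# Cluster 0 is sharp: patterns filed outside it (2, 3) are dissimilar to it.
R0 = sharpness({0: 0.9375, 1: 0.875, 2: 0.25, 3: 0.125}, members={0, 1})
assert R0 == 0.75
# Query similar to clusters 0 and 2; cluster 1 is dropped (zero fidelity).
kept = search_level(1.0, sims=[0.875, 0.125, 0.75], rs=[R0, 0.75, 0.375])
assert [m for m, _, _ in kept] == [0, 2]
```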
Often the query will lie between clusters and (a_m v b_m) > (S_mq v b_m) > 0 for several m. The search time for scanning the query, searching three layers and scanning the chosen leaves is

(s + M I_M + η_M M K I_K + η_M M η_K K R I_R + η_M η_K η_R J I_J) t

where s is the number of features per pattern; I_M, I_K, I_R, I_J are the numbers of features associated with each cluster, subcluster, sub-subcluster and leaf respectively; J is the total number of filed patterns; and t is the processing time per feature. This time is significantly less than full-pattern search times of s(J + I)t and brute-force search times of (s + JMI)t. Again, the key here was the ability to effect inferences under conditions of partial membership of both filed and query patterns in various clusters. Finally, we turn to an example of inference with no counterpart in logic [9]. Consider A ⇒ (¬B ∨ C) = p_1, A ⇒ (B ∨ C) = p_2, A ⇒ (B ∨ ¬C) = p_3 and A ⇒ (¬B ∨ ¬C) = p_4. Using rada we can obtain first (A ⇒ C) = min(p_1, p_2) and (A ⇒ ¬C) = min(p_3, p_4), and then, applying rada again, ¬A = min(p_1, p_2, p_3, p_4) with f_¬A = min(p_1 v ¬p_2, p_3 v ¬p_4, p_1 v ¬p_4, p_2 v ¬p_3, p_1 v ¬p_3, p_2 v ¬p_4). In the limit of logic one obtains the usual result. However, suppose we had only three of these four expressions: suppose the last were missing. In
logic, or even analogic, we could not effect an inference for ¬A: we could get A ⇒ B and A ⇒ C, but not ¬A. Yet under some circumstances we can in fact make this inference, albeit with lower fidelity. For example, here we find ¬A = min(p_1, p_2, p_3), f_¬A = min(p_1 v ¬p_2, p_2 v ¬p_3, p_3 v ¬p_1, max(p_1 v p_2, p_3 v p_2)). Note that the last entry is zero in logic. This remarkable logic, which requires only (n + 1) implications in place of up to 2^n implications in logic, I have called protologic, and I have only begun to explore its interesting properties. We can learn a great deal from the solution of the set of n equations, each of the form (i = 1, …, n)
{a}, two points can be said to be adjacent if they are a (Hamming) distance of 1 apart: they differ in only one component. The set is connected if one can pass from any point in it to any other point via pairs of adjacent points. (Loops are not permitted, as their presence indicates that the equations are underdetermined. Indeed (a k ¹ a l) must be orthogonal to (a r ¹ a s) where (a k, a l) and (a r, a s) are any two pairs of adjacent points.) Once the set of equations is ascertained to be a connected set, ¬A can be inferred at once: ¬A ¼ mini pi
i3 i4 ∧n (A ⇒ (Bi2 2 ∨ B3 B4 … ∨ Bn )) ¼ pi
f¬A ¼ minijk (pi v¬pj (i Þ j), max(pk1 vpk2 , pk3 vpk2 ))
where each variable is either a B j or a ¬B j or 0. Since (A ⇒ B) ¼ max (¬A, B), it is clear that ¬A ¼ min i (p i) ¼ ∧ p i. The more difficult task is to determine the fidelity f ¬A. To effect this we note that B j # p i in all those equations in which B j enters. In particular, B j can be no larger than the minimum p j for an equation in which it appears. Consider, therefore, the equation with maximum p i. There will be one and only one variable, say B r, which appears only in that equation (¬B r can appear in any other equation. For this part of the argument B r and ¬B r are regarded as ‘different’ variables since they have different values. Of course, ¬B r could be the uniquely appearing variable, and B r entering elsewhere, etc.) We then take B r ¼ ∨ ip i, drop that equation, and, employing the cut-rule relations p i . ¬p j for all (i, j) pairs of equations with complementary (B k, ¬B k) entries, eliminate ¬B r from all the remaining equations. If no uniquely occurring variable appears in the equation with maximum p i, no solution is possible. If more than one uniquely occurring variable is present. the remaining equations become over-determined, so again no solution is possible. Turning now to the fidelity, the elimination of the uniquely-occurring variable introduces p mxvp ls into the fidelity minimum. Here p mx ¼ ∨ ip i, the maximum p i of the equations, and p ls is the largest p i of an equation not containing a uniquely-occurring variable. Having dropped the equation containing B r, and eliminate ¬B r from the rest, there remain n ¹ 1 equations, and the above process is iterated until only two equations remain. Then rada provides the final fidelity entry. The fidelity minimum, of course, contains the p iv¬p j, again for each (i,j) pair of equations with complementary (B k,¬B k). 
The above algorithm provides the fidelity for any specific case, but requires iteration and does not provide a formula for f ¬A in terms of the p i as we are normally accustomed to obtaining. I have not found such a formula for a general set of equations: however, if the equations form a connected set, then such a formula is straightforward. Suppose we represent each equation by a point in {0, 1} n¹1, where a ¼ (a 2, …, a n), a i«{0, 1}; a i ¼ 1 if B i is present, a i ¼ 0 if ¬B i is present. (If neither B i nor B j appears, this trick does not work.) Now, within this set of n points
where k refers to the triple of points (k_1, k_2, k_3) in which k_1 and k_3 are each adjacent to k_2, and the minimum is over all such triples. Each point can belong to one or to many triples. The picture here is the following. In a connected set of equations there can be no local maximum; if there is one, the fidelity vanishes. The reason is important. Consider the three adjacent equations:
¬A ∨ S ∨ ¬B ∨ C = p_1
¬A ∨ S ∨ B ∨ C = p_2
¬A ∨ S ∨ B ∨ ¬C = p_3

Here S summarizes the other variables. Clearly ¬A ∨ S ≤ min(p_1, p_2, p_3). If p_2 > p_1, p_3 (the local-maximum condition), then B ≤ p_3 < p_2 and C ≤ p_1 < p_2. Hence the p_2 equation cannot be satisfied, and we conclude that local maxima can occur only at the ends or boundaries of a connected set. Clearly also there can be only one local minimum. Note also that if one does not start with a connected set, once the reduction algorithm reduces the initial set of equations to a single connected set, the above applies at once. One can also develop a protologic using inclusion, derived implication and derived inclusion.

In the above, I have attempted to highlight the essentials of analogic: how it solved the nearly three-quarters-of-a-century-old problem of the lack of inference and manipulability in many-valued logic by extending to many values the two-valued inference used exclusively for 2300 years; how it overcame the phantom limitation of (A ⇒ A) = 1; how it effected reductio ad absurdum in the absence of a principle of excluded middle, the ‘parallel’ postulate of logic; how it represented intuitive leap and non-monotonicity with the help of the identification of fidelity; how it rendered into many-valued logic the inferences of logic while retaining the manipulability of the underlying algebra; how it enabled the inferences necessary for tree searches; and how it adapted to inferences with no counterpart in logic, or any other predecessor for that matter. Omitted were nearly all details of the derivations of these results, and extensions to derived implication, inclusion, and derived inclusion. Throughout, the concept of inference as a rule
of detachment, unrepresentable purely algebraically, has been retained uncompromisingly. Analogic should become a useful tool whenever inference in many-valued, non-monotonic logics is necessary.

References

[1] K.K. Thornber, A new look at fuzzy-logic inference, in: D. Dubois, H. Prade (eds.), IEEE Conference on Fuzzy Systems, 1992, pp. 271–278.
[2] K.K. Thornber, A key to fuzzy-logic inference, Int. J. Approx. Reasoning 8 (1993) 105–121.
[3] K.K. Thornber, The role of fidelity in fuzzy-logic inference, in: P.P. Bonissone (ed.), IEEE International Conference on Fuzzy Systems, 1993, pp. 938–943.
[4] K.K. Thornber, The fidelity of fuzzy-logic inference, IEEE Trans. Fuzzy Systems 1 (1993) 288–297.
[5] W.D. Smith, K.K. Thornber, The uniqueness of the minimum and maximum functions for AND and OR in fuzzy logic, J. Fuzzy Math. 1 (1993) 601–609.
[6] K.K. Thornber, Some novel results in fuzzy-logic inference, Proc. NAFIPS ’93, pp. 247–250.
[7] K.K. Thornber, Fuzzy Syllogistic System, U.S. Patent No. 5 392 383, filed 13 September 1991, issued 21 February 1995.
[8] K. Takahashi, Fuzzy Syllogistic System, U.S. Patent No. 5 361 325, filed 27 March 1993, issued 1 November 1994.
[9] K.K. Thornber, Inference beyond logic: Protologic, in: K. Hirota, T. Fukuda (eds.), Proceedings FUZZ-IEEE/IFES ’95, pp. 1213–1218.
[10] K. Takahashi, K.K. Thornber, Transitive fuzzy-logic inference for the nearest-pattern search, in: K. Hirota, T. Fukuda (eds.), Proceedings FUZZ-IEEE/IFES ’95, pp. 2327–2334.
[11] K. Takahashi, K.K. Thornber, Nearest-document retrieval using fuzzy-logic inference, in: S. Miyamoto (ed.), FUZZ-IEEE/IFES ’95 Workshop on Fuzzy–Database Systems and Information Retrieval, pp. 31–36.
[12] K. Takahashi, K.K. Thornber, Application of fuzzy inference to nearest-neighbor search, in: S. Sandri (ed.), Proc. IFSA ’95, pp. 153–156.
[13] L.M. Kitainik, Fuzzy inclusion and fuzzy dichotomous decision procedures, in: J. Kacprzyk, S.A. Orlovski (eds.), Optimization Models Using Fuzzy Sets and Possibility Theory, Reidel, Boston, MA, 1987, pp. 154–170.