Chapter 6
Diagnostic Abduction in AI

Patient: Doctor, it hurts like blazes if I do this (extending his arm upwards).
Doctor: Well, my advice is: Don't do that. That will be $500.00, please.
Groucho Marx
6.1 Explanationist Diagnostics
One of the environments in which abduction is most at home is diagnostics. In its general form, the diagnostician's task is to match a disorder to an array of symptoms. In so doing, he or she is often faced with an unwanted abundance at both ends of the process. At the beginning there is a plurality of candidate disorders. And, although the diagnostician strives to shrink this plurality to one, it cannot be excluded that at the end of the process a plurality may still exist, albeit a smaller one. Like all abducers, the diagnostician is faced with the Cut Down Problem. He or she also faces issues relating to the process-product distinction, which, as before, presents the abductive logician with a problem in the engagement-sublogic. The problem is that, as for any abduction problem, even if it is justified to postulate the existence of a filtration-structure in which abductive solutions are cut-downs of up to very large possibility spaces, there is no empirical evidence that real-life abducers achieve their abductive targets by constructing such structures. Apart from these quite general features, diagnostics liberally instantiates a distinction of importance for the logic of hypothesis-generation. On the one hand, it is often the case that all the candidate disorders are known in advance, obviating the need for generation. In such cases, the abductive task is to pick a candidate from a known
field each of whose members is underdetermined by the symptoms to which the diagnostician has access. On the other hand, there are cases in which the symptoms are more radically mysterious, since they present themselves in the absence of any candidate disorders. In this chapter we shall briefly review some representative treatments of diagnostic abduction by AI researchers. Josephson and Josephson take an explanationist tack, whereas Peng and Reggia's approach¹ integrates explanationist and probabilistic elements. In a further section, we shall make some effort to adjudicate the rivalry between explanationism and probabilism. Much of the recent, and most promising, work has been produced by computer scientists and by logicians who also work in the AI research programme. Most of the contemporary work to date tends to concentrate on hypothesis generation and engagement. Accordingly, this chapter is a contribution to the relevantly associated sublogics.

We begin by reviewing a representative system for dealing with a class of problems for which hypothesis-generation and hypothesis-engagement are an efficient way of finding the best solutions (or explanations). The system in question is that of Josephson and Josephson [1994, ch. 7].² We should be careful about what we intend by the representativeness claim. Josephson and Josephson [1994] is representative of the explanationist approach to formal diagnostics. It is an approach that meets with daunting complexity problems. The work is more representative in the former respect than the latter; but it is well to emphasize early in the proceedings the general problem posed by complexity in formal reconstructions of real-life human performance. Josephson and Josephson's is an approach with some notable antecedents; e.g., Miller, Pople and Meyers [1982], [Pearl, 1987], de Kleer and Williams [1987], [Reiter, 1987], Dvorak and Kuipers [1989], Struss and Dressler [1989], Peng and Reggia [1990], and [Cooper, 1990], among others.

We here adopt the notational conventions of Josephson and Josephson [1994]. d denotes a fact or a datum; D a class of data; h denotes a particular hypothesis and H a class of these. H itself can be considered a composite hypothesis. An abduction problem is an ordered quadruple ⟨D_all, H_all, e, pl⟩. D_all is a finite set comprising the totality of data to be explained; H_all is a finite set of individual hypotheses; e is a function from subsets of H_all to subsets of D_all (intuitively, H explains e(H)); and pl is a function from subsets of H_all to a partially ordered set (intuitively, H has plausibility pl(H)). In this structure the requirement that the values of pl be partially ordered leaves it underdetermined as to whether pl(H) is "a probability, a measure of belief, a fuzzy value, a degree of fit, or a symbolic likelihood" [Josephson and Josephson, 1994, p. 160].³

¹Other important, indeed seminal, contributions to probabilistic AI are Pearl [1988; 2000].
²The authors of Josephson and Josephson's chapter 7 are Tom Bylander, Dean Allemang, Michael C. Tanner and John R. Josephson.
AP is a logic which aims at a solution of an abduction problem. H is complete if it explains all the data; i.e., e(H) = D_all. H is thrifty if the data explained by H are not explained by any proper subset of it; i.e., ¬∃H′ ⊂ H (e(H) ⊆ e(H′)). If H is both complete and thrifty then it is an explanation. An explanation H is a best explanation if there is no other more plausible explanation; i.e., ¬∃H′ (pl(H′) > pl(H)). Note that since pl gives only partial orderings, there can be more than one best explanation of a d or a D. A solution to an AP both resembles and yet differs from our earlier conception of a filtration-structure. Similar questions also arise. Perhaps most important is the issue of whether practical agents actually construct such solutions in their own successful abductions on the ground. The system AP was designed to model Reiter's account of diagnosis [1987] and Pearl's approach to belief-revision [1987]. We briefly sketch the connection with Reiter's theory.

Reiter on Diagnosis. A diagnosis problem is an ordered triple ⟨SD, COMPONENTS, OBS⟩. SD is a finite set of first-order sentences which describe the diagnostic problematic. OBS is a set of first-order sentences which report observations. COMPONENTS is a finite assortment of constants, and ab is a one-place predicate meaning 'abnormal'. A diagnosis is a least set Δ ⊆ COMPONENTS such that SD ∪ OBS ∪ {ab(c) | c ∈ Δ} ∪ {¬ab(c) | c ∈ COMPONENTS \ Δ} is consistent. A diagnosis problem can be modelled in AP as follows:

H_all = COMPONENTS
D_all = OBS
e(H) = a maximal set D ⊆ D_all such that SD ∪ D ∪ {ab(h) | h ∈ H} ∪ {¬ab(h) | h ∈ H_all \ H} is a consistent set.

³We give a formal model of abduction in Chapters 12 and 13. In this model, given elements d ∈ D_all, an abductive algorithm utilizing the proof theory Π of the logic will yield a family {H_i | i = 1, 2, ...} of possible hypotheses which explain d. From such families it is easy to define the set H_all and a function e as described in the Josephson model. In this context, the ordering pl(H) can be meaningfully defined from the logic involved and using the abductive algorithm available. The Josephson model ⟨D_all, H_all, e, pl⟩ is then no longer an abstract model but a derived entity from our abductive mechanisms. As a derived entity, better complexity bounds may be obtained, or at least its complexity can be reduced to that of the abductive algorithms.

Diagnoses are unranked in Reiter's treatment; hence pl is not needed in the AP reconstruction. In a good many respects AP is a simplified model. For example, both e and pl are assumed to be tractable, notwithstanding indications to the contrary [Reiter, 1987; Cooper, 1990]. Even so, intractability seems to be the inevitable outcome
above a certain level of interaction among the constituents of composite hypotheses. We say that an abduction problem is independent when, should a composite hypothesis explain a datum, so too does a constituent hypothesis; i.e., ∀H ⊆ H_all (e(H) = ∪_{h∈H} e(h)). In systems such as AP, the business of selecting a best explanation resolves into two component tasks. The first task (1) is to find an explanation. The second task (2) is to keep on finding better ones until a best is identified. A number of theorems bear on these two matters, the first seven on (1) and the next three on (2). In the interests of space, proofs are omitted.

Theorem 6.1 In the class of independent abduction problems, computing the number of explanations is #P-complete (i.e., as hard as calculating the number of solutions to any NP-complete problem).
In the case of independent abduction problems, sub-task (1) is tractable. Hence we have

Theorem 6.2 For independent abduction problems there exists an algorithm for specifying an explanation, if one exists. The algorithm's order of complexity is O(nC_e + n²), where C_e is the time complexity of e and nC_e indicates n calls to e.
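To make the definitions concrete, here is a minimal Python sketch (our own illustration, not the Josephson and Josephson implementation) of a small independent abduction problem in the ⟨D_all, H_all, e, pl⟩ format, together with a greedy construction of a complete and parsimonious explanation in the spirit of the algorithm behind Theorem 6.2. The disorders, data and coverage function are invented for the example.

    from itertools import chain

    # A toy abduction problem (D_all, H_all, e, pl); all names are illustrative.
    D_all = {"fever", "cough", "rash"}
    H_all = {"flu", "measles", "allergy"}

    # e for individual hypotheses; the problem is independent, so a composite
    # hypothesis explains the union of what its members explain.
    e_single = {
        "flu":     {"fever", "cough"},
        "measles": {"fever", "rash"},
        "allergy": {"rash"},
    }

    def e(H):
        """Explanatory coverage of a composite hypothesis (independence assumed)."""
        return set(chain.from_iterable(e_single[h] for h in H))

    def complete(H):
        """H is complete iff it explains all the data: e(H) = D_all."""
        return e(H) == D_all

    def parsimonious(H):
        """H is parsimonious (thrifty) iff no proper subset explains as much."""
        return all(e(H - {h}) != e(H) for h in H)

    def find_explanation():
        """Greedy pruning: start from H_all and discard hypotheses that add nothing.
        For independent problems one pass suffices, because a hypothesis kept at
        any stage can never become redundant later."""
        if not complete(H_all):
            return None
        H = set(H_all)
        for h in list(H):
            if e(H - {h}) == e(H):   # h contributes nothing the rest misses
                H.remove(h)
        return H

    H = find_explanation()
    # one complete, parsimonious explanation, e.g. {'flu', 'allergy'}
    print(H, complete(H), parsimonious(H))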
An abduction problem is monotonic if a composite hypothesis explains at least as much as any of its constituent hypotheses; i.e., ∀H, H′ ⊆ H_all (H ⊆ H′ → e(H) ⊆ e(H′)). Any independent abduction problem is monotonic, but a monotonic abduction problem need not be independent.

Theorem 6.3 Given a class of explanations, it is NP-complete to determine whether a further explanation exists in the class of monotonic abduction problems.
Moreover,

Theorem 6.4 In the class of monotonic abduction problems there also exists an O(nC_e + n²) algorithm for specifying an explanation, provided there is one.
Let h be a composite hypothesis. Then an incompatibility abduction problem exists with regard to h if h contains a pair of constituent hypotheses h* and ¬h*. More generally, an incompatibility abduction problem is an ordered quintuple ⟨D_all, H_all, e, pl, I⟩, where all elements but I are as before and I is a set of pairs of subsets of H_all which are incompatible with one another. We put it that ∀H ⊆ H_all ((∃i ∈ I (i ⊆ H)) → e(H) = ∅). In other words, any composite hypothesis containing incompatible sub-hypotheses is explanatorily inert. Any
such composite is at best trivially complete and never a best explanation. However, independent incompatibility problems are independent problems apart from the factor of incompatibility. In other words, they fulfill the following condition:
∀H ⊆ H_all ((¬∃i ∈ I (i ⊆ H)) → e(H) = ∪_{h∈H} e(h)).
Incompatibility abduction problems are less tractable than monotonic or independent abduction problems.

Theorem 6.5 In the class of independent incompatibility abduction problems it is NP-complete to determine whether an explanation exists.
From this it follows that it is also NP-hard to determine a best explanation in the class of independent incompatibility abduction problems. This class of problems can be reduced to Reiter's diagnostic theory. In fact,

Theorem 6.6 In the class of diagnosis problems, it is NP-complete to determine whether a diagnosis exists, depending on the complexity of deciding whether a composite hypothesis is consistent with SD ∪ OBS. (Here, a composite hypothesis is a conjecture that certain components are abnormal and the remainder are normal.)
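The explanatory inertness imposed by the incompatibility set I can be made concrete by a small, hedged extension of the sketch above; the incompatible pair is again our own invention.

    # Extending the toy problem above with an incompatibility set I: a composite
    # hypothesis containing an incompatible pair explains nothing.
    I = [{"measles", "allergy"}]        # suppose these two cannot co-occur

    def e_incompat(H):
        """e for an incompatibility abduction problem: inert if H includes some i in I."""
        if any(i <= set(H) for i in I):
            return set()                 # explanatorily inert
        return e(H)                      # otherwise behaves as in the independent case

    print(e_incompat({"measles", "allergy"}))   # set()
    print(e_incompat({"flu", "allergy"}))       # {'fever', 'cough', 'rash'}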
Independent and monotonic abduction problems are each resistant to cancellation. One hypothesis cancels another if and to the extent that its acceptance requires the loss of at least some of the explanatory force the other would have had otherwise. A cancellation abduction problem can be likened to an ordered sextuple ⟨D_all, H_all, e, pl, e′, e″⟩, of which the first four elements are as before, and e′ is a function from H_all to subsets of D_all, indicating the data 'required' by each hypothesis. This gives an extremely simplified notion of cancellation. Nevertheless, it is enough to be a poison-pill for them all. It suffices, that is to say, for intractability.

Theorem 6.7 It is NP-complete to ascertain in the class of cancellation abduction problems whether an explanation exists.
It follows from Theorem 6.7 that it is NP-hard to find a best explanation in this class of problems. Thrift is also an elusive property, notwithstanding its methodological (and psychological) desirability. Indeed, it is as hard to determine whether a composite hypothesis is thrifty in the class of cancellation abduction problems as it is to determine whether an explanation exists. In other words, both tasks are co-NP-complete in this class of problems. Up to this point, our theorems address the problem of whether an explanation exists in the various problem classes we have been reviewing. The remaining theorems
bear on the task of finding best explanations. In systems such as AP, finding a best explanation is a matter of comparing plausibilities. There is a best-small plausibility rule for this. The rule gives a comparison criterion for classes of hypotheses H and H′. The rule provides that there be a function from H to H′ which matches elements of H to not less plausible elements of H′. If H and H′ have the same cardinality, at least one element in H must be more plausible than its image in H′ if H is to be counted more plausible than H′. On the other hand, if H is larger than H′, it cannot be more plausible than H′. Whereupon

Theorem 6.8 In the class of independent abduction problems, it is NP-hard to determine a best explanation, using the best-small plausibility rule.
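The following fragment gives one simplified, totally ordered reading of the best-small comparison just described (the setting of Theorem 6.9 below); the plausibility values and the pairing-by-rank device are our own illustrative assumptions, not the authors' formulation.

    # A simplified reading of the best-small comparison for the totally ordered case.
    pl = {"flu": 0.9, "measles": 0.6, "allergy": 0.4}   # distinct, totally ordered values

    def more_plausible(H1, H2):
        """H1 counts as more plausible than H2 only if it is no larger, and its
        hypotheses, paired off in decreasing order of plausibility, are at least
        as plausible pointwise, with at least one strict improvement."""
        if len(H1) > len(H2):
            return False                  # a larger composite can never win
        a = sorted((pl[h] for h in H1), reverse=True)
        b = sorted((pl[h] for h in H2), reverse=True)
        pairs = list(zip(a, b))
        return all(x >= y for x, y in pairs) and (
            len(H1) < len(H2) or any(x > y for x, y in pairs))

    print(more_plausible({"flu"}, {"measles", "allergy"}))   # True
    print(more_plausible({"measles", "allergy"}, {"flu"}))   # False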
However, the best-small rule is tractable where the individual hypotheses have different plausibility values u_1, ..., u_n and the u_i are totally ordered. In that case,

Theorem 6.9 In the class of totally ordered monotonic problems satisfying the best-small criterion, there exists an O(nC_e + C_pl + n²) algorithm for determining a best explanation.
It follows that if there exists just one best explanation in the conditions described by Theorem 6.9, it will be found by an algorithm of the stated type. On the other hand, it is notoriously difficult to determine whether the 'just one' condition is indeed met. In fact,

Theorem 6.10 In the class of totally ordered independent abduction problems deploying the best-small rule, if there exists a best explanation it is NP-complete to ascertain whether another best explanation also exists.
These theorems establish that if we take abduction to be the determination of the most plausible composite hypothesis that is omni-explanatory, then the problem of making such determinations is generally intractable. Tractability, such as it may be, requires consistency, non-cancellation, monotonicity, orderedness and fidelity to the best-small rule. But even under these conditions the quest for the most plausible explanation is intractable. What is more, these difficulties inhere in the nature of abduction itself and must not be put down to representational distortion. One encouraging thing is known. If an abduction problem's correct answer-space is small, and if it is possible to increase the knowledge of the system with new information that eliminates large chunks of old information, then this reduces the complexity of explanation in a significant way; but it also threatens to eliminate the abductive character of explanations thus achieved. By and large, however, since there are no tractable algorithms for large assortments of abduction problems (the more psychologically real, the more intractable), most abductive behaviour is heuristic (in the classical sense of the term; see chapter 11).
6.1.1 Difficulties with AP
AP finds a solution for problems in the form ⟨D_all, H_all, e, pl⟩. Except for a specific role accorded to factors of relevance, an AP instantiates the filtration-structures discussed in chapter 3. We've said repeatedly that the empirical record of human cognitive behaviour gives little encouragement to the idea that in solving real-life abduction problems, beings like us actually construct filtration-structures. What's true for the genus is likewise true for the species. Accordingly, we have it that

Proposition 6.11 (The non-execution of APs by practical reasoners) In solving their real-life abduction problems, practical agents do not in the general case execute an AP.

And, as before, we conjecture that

Proposition 6.12 (Complexity and non-execution) It is plausible to suppose, as with the example of filtration-structures, that an important part of why individual agents do not execute APs is the computational complexity of such systems.

We would do well not to be unduly alarmed by Proposition 6.11, even assuming it to be true. What this proposition conjectures is that programs such as AP aren't realistic for beings like us to implement, and that they run up against limitations that inhere in the comparative paucity of an individual's cognitive resources and the comparative modesty of his (or its) cognitive goals. Virtually everything to date that has been offered as a logic of reasoning or as a model of cognitive performance outreaches descriptive adequacy in this same way. Judging by the empirical record, beings like us don't achieve their ends by implementing such logics or instantiating such models. By far the most common apologia for such gaps is that actual behaviour approximates, at its best, to what the logic stipulates or the model sanctions. We have already said our piece about this standard answer. We have difficulties with it. One is that it leaves undealt with the question of how to select those principles and laws that are given privileged place in the logic or in the model. The other is that there has been rather little in the way of successful accounts of the requisite approximation relation, even assuming a wholly satisfactory resolution of the first difficulty. But we should be clearer than we have been before about what our reservation about approximation comes to. It is, therefore, not our view that approximations do not exist between what abductive individuals do and how abduction fares in the likes of an AP or any other appropriately contrived filtration-structure. Moreover, it is not our view that such approximations are theoretically intractable. It is our view, however, that

Proposition 6.13 (Limits of approximation) Such approximation relations as are presently within our capacity to describe do not elucidate relevant details of how the actual abductive behaviour of individuals works in practice.
It is certainly true that actual abductive behaviour more closely resembles what an AP provides than what goes on in a model of the internal combustion engine. It is not a close resemblance, in our view. But this is far from a blanket dismissal. There are a number of reasons for taking a conciliatory view.

Proposition 6.14 (Starting small) Since inadequate approximations themselves approximate to requisitely realistic approximations, the former may be indulged as attempts to achieve the latter.⁴

Proposition 6.15 (Applicability to theoretical agents) To the extent that inadequate approximation is a matter of the resource constraints and goal modesty typical of practical agents, it may be conjectured that the actual behaviour of theoretical agents will have a better approximation fit with AP and with filtration-structures more generally.
Corollary 6.15(a). If Proposition 6.15 is true, then the great value of such structures is that they give accounts of abduction to which the real-world behaviour of agents of that type more satisfactorily approximates. Thus they describe real abduction, if not the real abductions of individuals.

We said in our discussion of filtration-structures that there is reason to believe that in the process of solving an abduction problem, the set of possible candidates takes on the contours of a filtration-structure. This means that, whatever other details might be involved, winning hypotheses stand to the original set of candidates in a complex relationship of the relevant to the plausible to the most plausible. We can suppose this to be so quite independently of whether an individual abducer ever actually selects his winning hypothesis by activating these filters. Even so,
Proposition 6.16 (Filtration-structures as constraints) Whatever the manner in which an individual abducer actually selects his hypothesis from a set of candidate hypotheses, his modus operandi must honour the fact that the winning H has a determinate place in a filtration-structure.

⁴See here [Simon, 1973, p. 327]: "If there is no such thing as a logical method of having new ideas, then there is no such thing as a logical method of having small new ideas".

Approximative adequacy is one thing; explanationist adequacy is another. In chapter three, we argued that the nescience condition has the effect of restricting the type of explanation embedded in explanationist abductions to subjunctive explanations. Let us briefly recapitulate. An abduction problem is triggered when a certain target cannot be hit with the agent's present resources. In a loose and intuitive way, this means that the target cannot be reached on the basis of what the agent currently knows. In strictness, the lack-of-knowledge (or nescience) requirement is not a matter of sheer ignorance, but rather of the unattainability of the target
with what the agent has a certain level k of knowledge of, or higher. Thus in an abduction problem nothing the agent knows at level k or higher enables him to reach his target. Accordingly, a conjecture is necessary. It is a conjecture in the form C(H), i.e., proposition H is conjectured to have a degree of epistemic virtue consonant with the level-k requirement. This is a clearly necessary constraint. For if H were already known, but at a lower degree than k, or were a well-justified belief short of knowledge strictly speaking, then the fact that H facilitates the hitting of a heretofore unhittable target would afford no occasion for the conjecture of H. (Why make what is already known or justifiably believed a matter of conjecture?) The role of conjecture is intimately connected to the ampliative character of abduction, concerning which Peirce gave such emphasis to the requirement that abductive success requires "originary" thinking.

Proposition 6.17 (Testing hypothesis) The requirement that an abductive hypothesis be a conjecture that a given proposition has an epistemic standing of at least degree k sets the standard for the subsequent testing of the proposition. The test is to determine whether it does in fact possess epistemic virtue to that degree.

Corollary 6.17(a) makes it perfectly unintelligible for abducers both to conjecture and to test propositions that are objects of their antecedent beliefs.

It can be seen on inspection that AP logics in the manner of Josephson and Josephson, and of Reiter, are not fully enough developed to meet reasonable adequacy conditions.
6.2 Another Example
In this section we briefly review the parsimonious covering theory of Yun Peng and James Reggia. As was the case with Josephson and Josephson, our discussion will be limited to a description of characteristic features of this approach, as well as representative problems to which it gives rise. An important feature of this account is its sensitivity to the complexities induced by giving the theory a probabilistic formulation. This is something these authors attempt, but with the requisite reduction in computational costs. In our discussion we shall concentrate on the non-quantitative formulation of the theory, and will leave adjudication of the conflict between qualitative and probabilistic methodologies for later in this chapter. This is an appropriate place to make an important terminological adjustment. Explanationism and probabilism are not mutually exclusive methodologies. There is a substantial body of opinion which holds that explanations are intrinsically probabilistic; and that fact alone puts paid to any idea of a strong disjunction. Better that we characterize the contrast as follows.
An explanationist about abduction is one who holds that meeting explanatory targets is intrinsic to his or her enterprise. Probabilists are not natural rivals of explanationists.
A probabilist is one who holds that, whatever his targets are (including explanatory targets), they are only meetable by way of a probabilistic methodology which is intrinsic to the enterprise, or they are best met in this way.

Peng and Reggia are inference-to-the-best-explanation abductionists, who happen to think it possible to integrate, without inordinate cost, their informal explanationism into probability theory [Peng and Reggia, 1990, ch. 4 and 5]. The main target of their theory is procedures for the derivation of plausible explanations from the available data [Peng and Reggia, 1990, p. 1]. Parsimonious covering theory is a theoretical foundation for a class of diagnostic techniques described by association-based abductive models. An associative (or semantic) network is a structure made up of nodes (which may be objects or concepts or events) and links between nodes, which represent their interrelations or associations. Associative models are driven by two basic kinds of procedure: (1) the deployment of symbolic cause-effect associations between nodes, and (2) a recurring hypothesize-and-test process. Association-based models include computer-aided diagnostic systems such as INTERNIST-1 (for internal medicine; Pople [1975] and Miller, Pople and Meyers [1982]); NEUROLOGIST (for neurology; Catanzarite and Greenburg [1979]); PIP (for edema; Pauker, Gorry, Kassirer and Schwarz [1976]); IDT (for fault diagnosis of computer hardware; Shubin and Ulrich [1982]); and domain-free systems such as KMS.HT (Reggia [1981]), MGR (Coombs and Hartley [1987]) and PEIRCE (Punch, Tanner and Josephson [1986]; see also Magnani [2001, ch. 4]). 'Given one or more initial problem features, the inference mechanism generates a set of potential plausible hypotheses of "causes" which can explain the given problem features' [Peng and Reggia, 1990, p. 20]. Thus associative networks involve what we can think of as a logic of discovery. Once the hypotheses have been generated, they are tested in two ways. They are tested for explanatory completeness; and they are tested for their propensity to generate new questions whose answers contribute to the selection of a given hypothesis from its alternatives. This hypothesize-and-test cycle is repeated, taking into account new information produced by its predecessor. This may occasion the need to update old hypotheses; and new ones may be generated. Association-based abduction differs from statistical pattern classification, as well as from rule-based deduction. In statistical pattern classification, the inference mechanism operates on prior and conditional probabilities to generate posterior probabilities. In rule-based deduction, the inference mechanism is deduction
which operates on conditional rules. In the case of association-based abduction, the hypothesize-and-test mechanism operates on semantic networks. Both statistical pattern classification and rule-based deduction have strong theoretical foundations: the probability calculus in the first instance, and first-order predicate logic in the second. The association-based abductive approach has not had an adequate theoretical foundation. Furnishing one is a principal goal of parsimonious covering theory (PCT).

PCT is structured as follows. In the category of nodes are two classes of events or states of affairs, called disorders D and manifestations M. In the category of links is a causal relation on pairs of disorders and manifestations. Disorders are sometimes directly observable, and sometimes not. However, in the context of a diagnostic abduction problem disorders are not directly scrutinizable and must be inferred from the available manifestations. Manifestations in turn fall into two classes: those that are present and those that are not. 'Present' here means "directly available to the diagnostician in the context of his present diagnostic problem". The class of present manifestations is denoted by M+.

A diagnostic problem P is an ordered quadruple ⟨D, M, C, M+⟩, where D = {d1, d2, ..., dn} is a finite, non-empty set of objects or states of affairs called disorders, M = {m1, m2, ..., mn} is a finite, non-empty set of objects or states of affairs called manifestations, and C ⊆ D × M is a relation whose domain is D and whose range is M, D × M being the Cartesian product of D and M. C is called causality. M+ is a distinguished subset of M which is said to be present.

For any diagnostic problem P, and any di and mj, effects(di) is the set of objects or states of affairs directly caused by di, and causes(mj) is the set of objects or states of affairs which can directly cause mj. It is expressly provided that a given mj might have alternative, and even incompatible, possible causes. This leaves it open that P may produce a differential diagnosis of an mj. The effects of a set of disorders DI, effects(DI), is ∪_{di∈DI} effects(di), the union of the effects of all its di. Likewise, the causes of a set of manifestations MJ, causes(MJ), is ∪_{mj∈MJ} causes(mj), the union of all the causes(mj).

The set DI ⊆ D is a cover of MJ ⊆ M if MJ ⊆ effects(DI). Informally, a cover of a set of manifestations is what causally accounts for it. A set E ⊆ D is an explanation of M+ with respect to a problem P if and only if E covers M+ and E meets a parsimony requirement. Intuitively, parsimony can be thought of as minimality, or non-redundancy, or relevance. A cover DI of MJ is minimum if its cardinality is the smallest of all covers of MJ. A cover of MJ is non-redundant if none of its proper subsets is a cover of MJ, and is redundant otherwise. A cover DI of M+ is relevant if it is a subset of causes(M+), and otherwise is irrelevant.
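These definitions can be rendered directly in a few lines of Python. The toy disorders, manifestations and causal relation below are invented for illustration, and the brute-force search over subsets is meant only to exhibit the definitions of cover and minimum cover, not to be an efficient covering algorithm.

    from itertools import combinations

    # A toy diagnostic problem P = <D, M, C, M+> in the parsimonious-covering idiom.
    C = {                                 # causation: disorder -> manifestations it can cause
        "d1": {"m1", "m2"},
        "d2": {"m2", "m3"},
        "d3": {"m3"},
    }
    D = set(C)
    M = set().union(*C.values())
    M_plus = {"m1", "m3"}                 # the manifestations actually present

    def effects(DI):
        """effects(DI) = union of effects(di) for di in DI."""
        return set().union(*(C[d] for d in DI)) if DI else set()

    def causes(mj):
        """causes(mj) = disorders that can directly cause mj."""
        return {d for d in D if mj in C[d]}

    def covers(DI, MJ):
        """DI covers MJ iff MJ is a subset of effects(DI)."""
        return MJ <= effects(DI)

    def minimum_covers(MJ):
        """All covers of MJ of smallest cardinality (minimality as parsimony)."""
        for size in range(len(D) + 1):
            found = [set(DI) for DI in combinations(sorted(D), size)
                     if covers(set(DI), MJ)]
            if found:
                return found
        return []

    print(minimum_covers(M_plus))   # [{'d1', 'd2'}, {'d1', 'd3'}] (display order may vary)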
Finally, the solution Sol(P) of a diagnostic problem P is the set of all explanations of M+. Two especially important facts about diagnostic problems should be noted.
Explanation Existence Theorem. There exists at least one explanation for any diagnostic problem.
Competing Disorders Theorem. Let E be an explanation for M+, and let M+ ∩ effects(d1) ⊆ M+ ∩ effects(d2) for some d1 and d2 ∈ D. Then (1) d1 and d2 are not both in E; and (2) if d1 ∈ E then there is another explanation E* for M+ which contains d2 but not d1 and which is of the same cardinality or less.

Cover Taxonomy Lemma. Let 2^D be the power set of D, and let S_mc, S_nc, S_rc and S_c be, respectively, the set of all minimal covers, the set of all non-redundant covers, the set of all relevant covers, and the set of all covers of M+ for a given diagnostic problem P. Then S_mc ⊆ S_nc ⊆ S_rc ⊆ S_c ⊆ 2^D.
We now introduce the concept of a generator. Let g1, g2, ..., gn be non-empty pairwise disjoint subsets of D. Then G_I = {g1, g2, ..., gn} is a generator. The class [G_I] generated by G_I is {{d1, d2, ..., dn} | di ∈ gi, 1 ≤ i ≤ n}.

Here is an informal illustration of how parsimonious covering theory deploys these structures. We consider the well-known example of a chemical spill diagnosed by KMS.HT, a domain-independent software program for constructing and examining abductive diagnostic problem-solvers. The problem is set as follows. A river has been chemically contaminated by an adjacent manufacturing plant. There are fourteen different kinds of chemical spills capable of producing this contamination. They include sulphuric acid, hydrochloric acid, carbonic acid, benzene, petroleum, thioacetamide and cesium. These constitute the set of all possible disorders di. Determination of the type of spill is based on the following factors: pH of the water (acidic, normal or alkaline), colour of the water (green or brown normally, and red or black when discoloured), appearance of the water (clear or oily), radioactivity, spectrometry results (does the water contain abnormal elements such as carbon, sulphur or metal?), and the specific gravity of the water. M is the set of all abnormal values of these measurements.

The diagnostician is called the Chemical Spill System, CSS. When there is a spill an alarm is triggered and CSS begins collecting manifestation data. CSS's objective is to identify the chemical or chemicals involved in the spill. The knowledge base for CSS includes each type of spill that might occur, together with its associated manifestations. For example, if the di in question is sulphuric acid, the knowledge base reflects that spills of sulphuric acid are possible at any time during the year, but are more likely in May and June, which are
peak periods in the manufacturing cycle. It also tends to make the water acidic, and spectrometry always detects sulphur. If the spilled chemical were benzene, the water would have an oily appearance detectable by photometry; and spectrometry might detect carbon. If petroleum were the culprit, then the knowledge system would indicate its constant use, heaviest in July, August and September. It might blacken the water and give it an oily appearance, and it might decrease the water's specific gravity. It would also be indicated that spectrometry usually detects carbon. KMS.HT encodes this information in the following way.
Sulphuric Acid
  [Description: Month of year = May (h), June (h)
   pH = Acidic (h)
   Spectrometry results = Sulphur (a)]

Benzene
  [Description: Spectrometry results = Carbon (m)
   Appearance = Oily (m)]

Petroleum
  [Description: Month of year = July (h), August (h), September (h)
   Water colour = Black (m)
   Appearance = Oily (m)
   Spectrometry results = Carbon (h)
   Specific gravity of water = Decreased (m)]

Here a is always, h is high likelihood, m is medium likelihood, l is low likelihood and n is never. It is easy to see that the knowledge base provides both causal and non-causal information. For example, that sulphuric acid has a manifestation pH = Acidic is causal information, whereas the information that sulphuric acid use is especially high in May and June is important, but it does not reflect a causal association between sulphuric acid and those months. Information of this second kind reports facts about setting factors, as they are called in KMS.HT. The set effects(di) is the set of all manifestations that may be caused by disorder di, and the set causes(mj) is the set of disorders that may cause manifestation mj. In particular, effects(Sulphuric Acid) = {pH = Acidic, Spectrometry results = Sulphur}, and causes(pH = Acidic) = {Benzenesulphuric Acid, Carbonic Acid, Sulphuric Acid}. Since there are fourteen disorders in this example, there are 2¹⁴ possible sets of disorders, hence 2¹⁴ possible hypotheses. However, for reasons of economy, CSS
confines its attention to sets of disorders that meet a parsimony constraint. If we choose minimality as our parsimony constraint, then a legitimate hypothesis must be a smallest cover of all present manifestations M+ (a cover is a set of disorders that can causally account for all the manifestations). The objective of CSS is to specify all minimum covers of M+. When a manifestation mj presents itself it activates the causal network contained in the system's knowledge base. More particularly, it evokes all disorders causally associated with mj; that is to say, it evokes causes(mj). These disorders combine with already available hypotheses (which are minimum covers of previous manifestations) and new hypotheses are formed with a view to explaining all the previous manifestations together with the new one mj. The solution to P is the set of all minimal covers. For reasons of economy, it is desirable to produce the solution by way of generators. As we saw, a generator is a set of non-empty, pairwise disjoint subsets of D. Let the subsets of D be A, B and C. Then form the set of all sets that can be formed by taking one element from each of the subsets A, B and C. The set of those sets is the class generated by the generator {A, B, C}. If the ith set in a generator contains n_i disorders, then the generator represents Π_i n_i hypotheses, which in the general case is significantly fewer than the complete list.

CSS now continues with its investigation by putting multiple-choice questions to the knowledge base. For example, it asks whether the spill occurred in April, May, June, July, August, or September. The answer is June. This enables the system to reject Carbon Isotope, which is never used in June. It then asks whether the pH was acidic, normal or alkaline, and is told that it was acidic. This gives rise to hypotheses in the form of a generator that identifies the alternative possibilities as Benzenesulphuric Acid, Carbonic Acid, Hydrochloric Acid and Sulphuric Acid. Each of these is a minimum cover. Next there is a question asking whether metal, carbon or sulphur was detected spectrometrically. The answer is that metal and carbon were detected. This excludes Sulphuric Acid, because spectrometry would always detect the presence of sulphur if sulphuric acid were present, and it did not do so in this case. The presence of metal evokes four disorders: Hydroxyaluminum, Cesium, Rubidium and Radium. None of these occurs in the present generator. So to cover the previous manifestation (acidic pH) and the new manifestation, CSS must produce new hypotheses involving at least two disorders. The presence of carbon evokes six disorders: Carbonic Acid, Benzene, Petroleum, Benzenesulphuric Acid, Thioacetamide and Chromogen. Carbonic Acid and Benzenesulphuric Acid are already in the present generator, so they cover both pH = Acidic and Spectrometry = Carbon. Integrating these into the set of four new hypotheses gives eight minimum covers for the three existing manifestations. These in turn give a generator of two sets of competing hypotheses.
These are {Carbonic Acid, Benzenesulphuric Acid} and {Radium, Rubidium, Cesium, Hydroxyaluminum}. The system now asks whether radioactivity was present. The answer is yes. This evokes four disorders: Sulphur Isotope, Cesium, Rubidium and Radium. Each hypothesis to date involves one of these disorders, except for the Hydroxyaluminum hypothesis (which is not radioactive). Thus Hydroxyaluminum is rejected and a new generator is formed. It contains six minimum covers for the current manifestation. The system asks whether specific gravity was normal, increased or decreased. The answer is that it was increased. This evokes four disorders: Hydroxyaluminum, Cesium, Rubidium and Radium, which are also evoked by the radioactivity answer. Accordingly, the solution to the chemical spill problem is given by a generator containing two sets of incompatible alternatives: {Carbonic Acid, Benzenesulphuric Acid} and {Radium, Rubidium, Cesium}.
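As a minimal sketch of how a generator compactly represents competing minimum covers, the final state of the example can be expanded as follows; the set names follow the text, but the expansion logic is our own illustration, not KMS.HT code.

    from itertools import product

    generator = [
        {"Carbonic Acid", "Benzenesulphuric Acid"},
        {"Radium", "Rubidium", "Cesium"},
    ]

    def expand(gen):
        """[G_I]: every hypothesis obtained by taking one disorder from each g_i."""
        return [set(choice) for choice in product(*[sorted(g) for g in gen])]

    hypotheses = expand(generator)
    print(len(hypotheses))      # 2 * 3 = 6 minimum covers, far fewer than the 2**14 subsets of D
    for H in hypotheses:
        print(H)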
6.2.1 Remarks
CSS is a very simple system. (A typical search space involves 250 candidates and higher.) Even so, it is faced with 16,384 distinct sets of disorders. Of these, 91 are two-disorder sets. But CSS identifies the six plausible hypotheses (where plausibility is equated with minimality). This is a significant cut-down from a quite large space of possible explanations. It indicates that PCT does well where the approach discussed in the previous section tends to do badly, namely, on the score of computational effectiveness. In the engineering of this system, finding plausible inferences just is making the system computationally effective. Peng and Reggia point out that for certain ranges of cases non-redundancy is a more realistic parsimony constraint than minimality, and that in other cases relevancy would seem to have the edge [Peng and Reggia, 1990, pp. 118-120]. Non-redundancy was a condition on Aristotle's syllogism, and it is akin to Anderson and Belnap's full-use sense of relevance and to a related property of linear logic. In standard systems of relevant logic, a proof Π of φ from a set of hypotheses Σ is relevant iff Π employs every member of Σ. Relevance is not non-redundancy, however; a relevant proof might have a valid subproof, since previous lines can be re-used. If the logic is linear, all members of Σ must be used in Π, and none can be re-used. Peng and Reggia's non-redundancy is identical to Aristotle's: no satisfactory non-redundant explanation can have a satisfactory proper sub-explanation. Relevance for Peng and Reggia is restricted to causal relevance, and so captures only part of the sense of that broad concept. If we allow non-redundancy as a further sense of relevance, then the following structural pattern is discernible in the Peng-Reggia approach. One wants one's explanations to be plausible. Plausible explanations are those produced parsimoniously. Parsimony is relevance, in the
sense of non-redundancy. Thus relevance is a plausibility-enhancer. In later chapters, we will investigate the possibility that relevance and plausibility combine in a different way from that suggested here. In essence, what CSS does is select a causal explanation of some symptoms from a comparatively small set of phenomena that are not only known to be possible causes of them but are known to be actual causes under particular circumstances. Since the actual circumstances of the symptoms exhibit some variations from the circumstances under which it is known that those symptoms are caused, the abducer's trigger is not those symptoms, but rather that those symptoms do not correlate with a known possible cause sufficiently to meet the abducer's (often implicit) epistemic-level test. In a rough and ready way, the abducing device is attempting to move from knowledge of what causes of these symptoms are known to be like to what, in this particular case, the cause actually is. In the nature of the case, the hypothesizing diagnostician cannot know the answer to this question with the appropriate degree of knowledge. This allows us to say that CSS systems hypothesize that if a certain H did meet epistemic standards which it presently does not appear to meet, it would (subjunctively) be the (best) causal explanation of the manifestations. The winning conjecture in each case is "originary" and "epistemically challenged", i.e., a conjecture that some proposition has a degree of epistemic virtue which it is now not known to have.
6.3 Coherentism and Probabilism

6.3.1 The Rivalry of Explanationism and Probabilism
We briefly characterized explanationism and probabilism in the previous section. As we say, they are not natural rivals. There are ways of being an explanationist which involve no resistance to probabilistic methods, and there are ways of being a probabilist which are compatible with some ways of being an explanationist. For example, one way of being an explanationist about abduction is to insist that all abductive inference is inference to the best explanation. But there is nothing in this that precludes giving a probabilistic account of inference to the best explanation. Equally, someone could be a probabilist about inference in general and yet consistently hold that explanation is intrinsic to abductive inference. Even so, there are explanationists who argue that probabilistic methods are doomed to fail or, at best, to do an inadequate job. If this is right, then knowing it endows the enquirer into abduction with some important guidance for the resource-management aspects of his programme. It is negative guidance: do not give to the calculus of probability any central and load-bearing place in your account of abductive reasoning.
In this section we examine one way of trying to make good on this negative advice. In doing so, we develop a line of reasoning which we find to have been independently developed in Thagard [2000]. Since our case heavily involves Thagard himself, and pits him against the same position that Thagard also examines, we shall not give a fully detailed version of our argument, since the case we advance is already set out in Thagard [2000].
6.4 Explanatory Coherence
Like the approach of Peng and Reggia, Thagard sees abductive inference as a form of causal reasoning [Thagard, 1992]. It is causal reasoning triggered by surprising events (again, a Peircean theme), which, being surprising, call for an explanation. Explanations are produced by hypothesizing causes of which the surprising events are effects. Thagard understands explanationism with respect to causal reasoning to be qualitative, whereas probabilism proceeds quantitatively. In recent years, both approaches have been implemented computationally. In Thagard [1992], a theory of explanatory coherence is advanced. It can be implemented by ECHO, a connectionist program which computes explanatory coherence in propositional networks. Pearl [1988] presents a computational realization of probabilistic thinking. These two styles of computational implementation afford an attractive way of comparing the cognitive theories whose implementations they respectively are. The psychological realities posited by these theories can be understood in part by way of the comparison relations that exist between ECHO and probabilistic realizations. As it happens, there is a probabilistic form of ECHO, which shows not only that coherentist reasoning is not logically incompatible with probabilistic reasoning, but that coherentist reasoning can be considered a special case of probabilistic reasoning. What makes it so is that ECHO's inputs can be manipulated to form a probabilistic network which runs the Pearl algorithms. Coherentism is not the only way of being an explanationist, and Pearl's is not the only way of being a probabilist. But since the computer implementations of these two particular theories are more advanced than alternative computer models, comparing the Thagard and the Pearl computerizations is an efficient way to compare their underlying explanationist and probabilist theories as to type.

The chief difference between explanationist and probabilist accounts of human reasoning lies in how to characterize the notion of degrees of belief. Probabilists hold that doxastic graduations can be described in a satisfactory way by real numbers under conditions that satisfy the principles of the calculus of probability. Probabilism is thus one of those theories drawn to the idea that social and psychological nature has a mathematically describable structure, much in the way that physics holds that physical reality has a mathematical structure.
Explanationism, on the other hand, is understood by Thagard to include the claim that human causal reasoning, along with other forms of human reasoning, can be adequately described non-quantitatively. It rejects the general view that reasoning has a mathematical structure, and it rejects the specific view that reasoning has a structure describable by the applied mathematics of games of chance. Disagreements between explanationists and probabilists are not new; and they precede the particular differences that distinguish Thagard's theory from Pearl's. Probabilism has attracted two basic types of objection. One is that the probabilistic algorithms are too complex for human reasoners to execute [Harman, 1986]. The other is that experimental evidence suggests the descriptive inadequacy of probabilistic accounts [Kahneman et al., 1982], a score on which explanationist accounts do better [Read and Newhall, 1993; Schank and Ranney, 1991; Schank and Ranney, 1992; Thagard and Kunda, 1998]. See also [Thagard, 2000, p. 94]. The importance of the fact that coherentist reasoning can be seen as a special case of probabilistic reasoning is that it suggests that there may be a sense in which the probabilistic approach is less damaged by these basic criticisms than might otherwise have been supposed. There is ample reason to think that the sheer dominance of human reason is evidence of the presence of the Can Do Principle, which we discussed in section 3.3.1. A theorist comports with the Can Do Principle if, in the course of working on a problem belonging to a discipline D, he works up results in a different discipline D*, which he then represents as applicable to the problem in D. The principle is adversely triggered when the theorist's attraction to D* is more a matter of the ease or confidence with which results in D* are achieved than of their well-understood and well-attested applicability to the theorist's targets in D. (A particularly dramatic example of Can Do at work is the decision by neoclassical economists to postulate the infinite divisibility of utilities, because doing so would allow the theory to engage the fire-power of the calculus.) The conceptual core of the theory of explanatory coherence (TEC) is captured by the following qualitative principles [Thagard, 1989; Thagard, 1992; Thagard, 2000], and [Magnani, 2001a, pp. 34 and 138].
1. Symmetry. Unlike conditional probability, the relation of explanatory coherence is symmetric.

2. Explanation. If a hypothesis explains another hypothesis, or if it explains the evidence, then it also coheres with them and they with it. Hypotheses which jointly explain something cohere with each other. Coherence varies inversely with the number of hypotheses used in an explanation. (Cf. the minimality condition of Peng and Reggia [1990].)

3. Analogy. Similar hypotheses that explain similar bodies of evidence cohere with one another.
4. Observational Priority. Propositions stating observational facts have a degree of intrinsic acceptability.

5. Contradiction. Contradictory propositions are incoherent with each other. (It follows that the base logic for TEC must be paraconsistent, lest an episodic inconsistency render any explanatory network radically incoherent.)

6. Competition. If both P and Q explain a proposition, and if P and Q are not themselves explanatorily linked, then P and Q are incoherent with each other. (Note that explanatory incoherence is not here a term of abuse.)⁵

7. Acceptance. The acceptability of a proposition in a system of propositions depends on its coherence with them.

TEC defines the coherence and explanation relations on systems of propositions. In ECHO, propositions are represented by units (viz., artificial neurons). Coherence relations are represented by excitatory and inhibitory links. ECHO postulates a special evidence unit with an activation value of 1. All other units have an initial activation value of 0. Activation proceeds from the special evidence unit to units that represent data, thence to units that represent propositions that explain the data, and then to units representing propositions that explain propositions that explain data; and so forth. ECHO implements principle (7), the Acceptance rule, by means of a connectionist procedure for updating the activation of a unit depending on the units to which it is linked. Excitatory links between units cause them to prompt their respective activation. Inhibitory links between units cause them to inhibit their respective activation. The activation a_j of a unit is subject to the following constraint:
a_j(t+1) = a_j(t)(1 − d) + net_j(max − a_j(t)) if net_j > 0; otherwise a_j(t+1) = a_j(t)(1 − d) + net_j(a_j(t) − min),

in which d is a decay parameter (e.g., 0.5) that decrements every unit at each turn of the cycle, min is the minimum activation (−1), and max is the maximum activation (1). Given the weight w_ij, the net input to a unit, net_j, is determined by
net_j = Σ_i w_ij a_i(t).

In ECHO links are symmetric, but activation flow is not (because it must originate in the evidence unit). ECHO tolerates the presence of numerous loops without damaging the system's ability to make acceptability computations. Updating is performed for each unit u_i. For this to happen the weight of the link between u_i and u_j (for any u_j to which it is linked) must be available to it; and the
same holds for the activation of u_j. Most units in ECHO are unlinked to most units. But even if ECHO were a completely connected system, the maximum number of links for each of its n units would be n − 1. Updating, then, requires no more than n·(n − 1) calculations. However, uncontrolled excitation (the default weight on excitatory links) gives rise to activation oscillations which preclude the calculation of acceptability values. Thagard reports experiments which show high levels of activation stability when excitation, inhibition and decay are appropriately constrained. Accordingly, the default value for excitation is 0.4; for inhibition (the default value of inhibitory links) the value is −0.6; and 0.5 for decay [Thagard, 2000, p. 97]. Thagard also cites experimental evidence which indicates that ECHO's efficiency is not much influenced by the size or the degree of connectivity of networks. Larger networks with more links do not require any systematic increase in the work required by the system to make its acceptability calculations [2000, p. 97].

⁵Propositions are explanatorily linked if one explains the other or if jointly they explain something else.
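The updating regime just described can be sketched schematically as follows. This is not Thagard's ECHO code: the tiny network, the parameter values and the iteration count are our own illustrative choices, made only to show the update rule settling on the better-supported hypothesis.

    # Schematic ECHO-style updating on a tiny network (illustrative assumptions only).
    MIN_ACT, MAX_ACT, DECAY = -1.0, 1.0, 0.05

    weights = {                       # symmetric coherence links
        ("EVIDENCE", "E1"): 0.4,      # data units are excited by the evidence unit
        ("EVIDENCE", "E2"): 0.4,
        ("E1", "H1"): 0.4,            # H1 explains E1 and E2
        ("E2", "H1"): 0.4,
        ("E1", "H2"): 0.4,            # H2 explains only E1
        ("H1", "H2"): -0.6,           # rival hypotheses inhibit one another
    }
    activation = {"EVIDENCE": 1.0, "E1": 0.0, "E2": 0.0, "H1": 0.0, "H2": 0.0}

    def weight(i, j):
        return weights.get((i, j), weights.get((j, i), 0.0))

    def update(act):
        new = {}
        for j, a_j in act.items():
            if j == "EVIDENCE":
                new[j] = 1.0          # the special evidence unit stays clamped at 1
                continue
            net = sum(weight(i, j) * a_i for i, a_i in act.items())
            if net > 0:
                a = a_j * (1 - DECAY) + net * (MAX_ACT - a_j)
            else:
                a = a_j * (1 - DECAY) + net * (a_j - MIN_ACT)
            new[j] = max(MIN_ACT, min(MAX_ACT, a))
        return new

    for _ in range(200):              # iterate until the activations roughly settle
        activation = update(activation)
    # H1, which explains both pieces of evidence, settles high; its rival H2 is driven down.
    print({u: round(a, 2) for u, a in activation.items()})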
6.4.1 Probabilistic Networks
A clear advantage that probabilistic approaches to reasoning have over explanationist approaches is a precise theoretical vocabulary, a clear syntax, and a well-developed semantics. Degrees of belief are associated with members of the [0, 1] interval of the real line, subject to a few special axioms. A central result of this approach is Bayes' theorem:

Pr(h|e) = Pr(h) × Pr(e|h) / Pr(e)
The theorem provides that the probability of a hypothesis on a body of evidence is equal to the probability of the hypothesis alone multiplied by the probability of the evidence given the hypothesis, divided by the probability of the evidence alone. The theorem has a certain intuitive appeal for the abduction theorist, especially with regard to the probability of the evidence relative to the hypothesis, which resembles the AKM-abduction schema of section 1.1 in an obvious way. The Bayesian approach to human reasoning is, however, computationally very costly [Harman, 1986; Woods and Walton, 1972]. Probabilistic updating requires the calculation of conjunctive probabilities, the number of which grows exponentially with the number of conjuncts involved. Three propositions give rise to eight different probability alternatives, and thirty would involve more than a billion probabilities [Harman, 1986, p. 25]. It is also possible that coherence maximization is computationally intractable, but provided the system has the semidefinite programming algorithm, TEC is guaranteed that the system's optimality shortfall will never be more than 13 percent [Thagard and Verbeurgt, 1998]. In their turn, Pearl's networks, in common with many other probabilistic approaches, greatly reduce the number of probabilities and probability determinations by restricting these determinations to specific
classes of dependencies. If, for example, Y depends wholly on X and Z depends wholly on Y, then in calculating the probability of Z the probability of X can be ignored, even though X, Y and Z form a causal network. In Pearl's networks, nodes represent multi-valued variables, such as body temperature. In two-valued cases, the values of the variable can be taken as truth and falsity. ECHO's nodes, on the other hand, represent propositions, one each for a proposition and its negation. In Pearl networks edges represent dependencies, whereas in ECHO they represent coherence. Pearl networks are directed acyclic graphs. Their links are anti-symmetric, while those of ECHO are symmetric. In Pearl networks the calculation of the probabilities of a variable D is confined to those involving variables causally linked to D. Causally independent variables are ignored. If, for example, D is a variable whose values are causally dependent on variables A, B and C, and causally supportive of the further variables E and F, then the probabilities of the values of D can be taken as a vector corresponding to the set of those values. If D is body temperature and its values are high, medium and low, then the vector (.6 .2 .1) associated with D reflects that the probability of high temperature is .6, of medium temperature .2, and of low temperature .1. If subsequent measurement reveals that the temperature is in fact high, then the associated vector is (1 0 0). If we think of A, B and C as giving prior probabilities for the values of D, and of E and F as giving the relevant observations, then the probability of D can be calculated by Bayes' theorem. For each of Pearl's variables X, BEL(x) denotes the degrees of belief calculated for X with regard to each of its values. Accordingly, BEL(x) is a vector with the same number of entries as X has values. BEL(x) is given by the following equation:

BEL(x) = α × λ(x) × π(x),

in which α is a normalization constant which provides that the sum of the vector entries is always 1, λ(x) is a vector representing the support afforded to values of X by variables that depend on X, and π(x) is a vector representing the support lent to values of X by variables on which X depends. It is known that the general problem of inference in probabilistic networks is NP-hard [Cooper, 1990]. Pearl responds to this difficulty in the following way. He considers the case in which there is no more than one path between any two nodes, and produces affordable algorithms for such systems [Pearl, 1988, ch. 4]. If there is more than one path between nodes, the system loops in ways that destabilize the calculation of BEL's values. In response to this difficulty, procedures have been developed for transforming multiply-connected networks into uni-connected networks. This is done by clustering existing nodes into new multi-valued nodes. Pearl [1988, ch. 4] discusses the following case. Metastatic cancer causes both increased serum calcium and brain tumor, either of which can cause a coma. Thus there are two paths between metastatic cancer and coma. Clustering involves collapsing
the serum calcium node and the brain tumor node with a new node representing a variable whose values are all possible combinations of values of the prior two, viz.: increased calcium and tumor; increased calcium and no tumor; no increased calcium and tumor; and no increased calcium and no tumor. Clustering is not the only way of dealing with loops in probabilistic networks. Thagard points out [2000, p. 101] that Pearl himself considers a pair of approximating techniques, and that Lauritzen and Spiegelhalter [1988] have developed a general procedure for transforming any directed acyclic graph into a tree of the graph's cliques. Hrycej [1990] describes approximation by stochastic simulation as a case of sampling from the Gibbs distribution in a Markov random field. Frey [1998] also exploits graph-theoretic inference patterns in developing new algorithms for Bayesian networks. Returning to the earlier example of D depending on A, B and C, and E and F depending on D, D's BEL values must be computed using values for A, B, C, E and F only. To keep the task simple, suppose that a, b, c, e, and f are, respectively, the only values of these variables. Pearl's algorithms require the calculation of d's probability given all possible combinations of values of the variables to which D is causally linked. If we take the case in which the possible values of D are truth (d) or falsity (not-d), Pearl's algorithm requires the calculation of eight conditional probabilities. In the general case in which D is causally linked to n variables each with k values, the system will have to ascertain k^n conditional probabilities. Where n is large, the problem of computational intractability reappears, and occasions the necessity to employ approximation procedures. Even if n isn't large, the question isn't whether the requisite number of calculations of conditional probabilities can be done by humans, but rather whether they are done (in that number) by humans.
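As a toy illustration of the updating equation BEL(x) = α · λ(x) × π(x), the following sketch (ours, not Pearl's algorithm; the three-valued temperature variable and the numbers follow the example above) computes BEL as the normalized elementwise product of λ and π:

    # Hypothetical three-valued variable D (body temperature: high, medium, low).
    # pi: support from the variables D depends on (A, B, C in the example);
    # lam: support from the variables that depend on D (E, F), here an observation.
    pi  = [0.6, 0.2, 0.1]     # prior-like support for (high, medium, low)
    lam = [1.0, 0.0, 0.0]     # temperature observed to be high

    unnormalized = [l * p for l, p in zip(lam, pi)]
    alpha = 1.0 / sum(unnormalized)              # normalization constant
    bel = [alpha * u for u in unnormalized]      # -> [1.0, 0.0, 0.0]
    print(bel)

With n parent variables of k values each, the table of conditional probabilities standing behind π grows as k^n, which is just the intractability worry rehearsed above.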
6.5
Pearl Networks for ECHO
There are important differences between ECHO and Pearl networks. There are also some significant similarities. If ECHO is faced with rival hypotheses, it will favour the ones that have the greatest explanatory power. The same is true of Pearl networks. ECHO favours hypotheses that are explained better than those that aren't. Pearl networks have this bias, too. Thus there is reason to think that there may be less of a gap between ECHO and Pearl networks than might initially have been supposed. Accordingly, Thagard discusses a new network, PECHO. PECHO is a program that accepts ECHO's input and constructs a Pearl network capable of running Pearl's algorithms. (For details see Thagard [2000, pp. 103-108].) The theory of explanatory coherence which ECHO implements via connectionist networks, as well as by the further algorithms in Thagard and Verbeurgt [1998], can also be implemented probabilistically. That alone is a rather striking result. Some of the older objections
to probabilism held that the probability calculus simply misconceived what it is to reason. The integration achieved by Thagard suggests that the inadequacies which critics see in probabilism need to be treated in subtler and more nuanced ways. It remains true that PECHO is computationally a very costly network. Vastly more information is needed to run its update algorithms than is required by ECHO. Special constraints are needed to suppress loops, and it is especially important that the simulated reasoner not discard information about co-hypotheses and rival hypotheses. But given that what coherentists think is the right account of causal reasoning is subsumed by what they have tended to think is the wrong account of it, computational cost problems occur in a dialectically interesting context. How can a good theory be a special case of a bad theory? In general, when a bad theory subsumes a good one, it is often more accurate to say that what the bad theory is bad at is what the good theory is good at. This leaves it open that the bad theory is good in ways that the good theory is not. This gives a context for consideration of computational cost problems. They suggest not that the probabilistic algorithms are wrong, but rather that they are beyond the reach of human reasoners.

If one accepts the view of Goldman [1986] that power and speed are epistemological goals as well as reliability, then explanationist models can be viewed as desirable ways of proceeding apace with causal inference while probabilistic models are still lost in computation. [Thagard, 2000, p. 271]

Thagard goes on to conjecture that 'the psychological and technological applicability of explanation and probabilistic techniques will vary from domain to domain' [Thagard, 2000, p. 112]. He suggests that the appropriateness of explanationist techniques will vary from high to low depending on the type of reasoning involved. He considers the following list, in which explanationist techniques are most appropriate for the first member and least appropriate for the last:

Social reasoning
Scientific reasoning
Legal reasoning
Medical diagnosis
Fault diagnosis
Games of chance.

We find this an attractive conjecture. It blunts the explanationist complaint that probabilism is just wrong about reasoning. It suggests that there is something for probabilism to be right about, just as there is something for explanationism also to be right about. It suggests, moreover, that they are not the same things. The fact that coherentist networks are subcases of probabilistic networks, when conjoined with the fact that what is indisputably and demonstrably troublesome
about probabilistic networks is their inordinate informational and computational complexity and, relatedly, their psychological unreality, occasions a further conjecture. Systems that are well served by explanationism are systems having diminished information-processing and computational power, and psychological make-ups that systems for which probabilism does well don't have. This, in turn, suggests that the variability which attaches to the two approaches is not so much a matter of type of reasoning as it is a matter of type of reasoner. In chapter 2, we proposed that cognitive agency comes in types and that type is governed by levels of access to, and quantities of, cognitively relevant resources (information, time and computational capacity) in relation to the strictness of cognitive goals. If we take the hierarchy generated by this relation, it is easy to see that it does not preserve the ordinal characteristics of Thagard's list. So, for example, scientific reasoning will be represented in our hierarchy of agencies by every type of agent involved in scientific thinking, whether Joe Blow, Isaac Newton, the Department of Physics at the University of Birmingham, all the science departments of every university in Europe, NASA, neuroscience in the 1990s, postwar economics, and so on. This makes us think that we need not take our hierarchy to be a rival of Thagard's. Ours is an approach that also explains the dialectical tension represented by the subsumption of ECHO by probabilistic networks. It explains why explanationism is more appropriate for reasoning by an individual agent, and why probabilism could be appropriate for an institutional agent. And it explains why it is unnecessary to convict either account of the charge that it simply doesn't know what reasoning is. On the other hand, even for a given type of agency there can be variation in the kind of reasoning it conducts. Individual reasoners undertake cognitive tasks for which social reasoning is appropriate, but they also try their hands at games of chance. A medical diagnostician need not always operate under conditions of battlefield triage; sometimes he has plenty of time, an abundance of already-sorted information, as well as the fire-power of his computer to handle the statistics. What this suggests is that the fuller story of what model is appropriate to what reasoning contexts will be one that takes into account both the type of agent involved and the type of reasoning he is engaged in. It is a story which suggests the advisability of criss-crossing a kinds-of-reasoning grid with a kinds-of-resources grid, of laying a Thagardian hierarchy across one such as our own. Things are less good, however, when we move from the solo reasoning of agents to interactive n-agent reasoning. It is a transition that bodes less well for the probabilistic approach than for various others. It is a feature of Thagard's hierarchy that, whereas at the bottom objective probabilities are in play, at the top we must make do with subjective probabilities. But subjective probabilities are notorious for their consensual difficulties in contexts of n-agent reasoning. This is an invariant feature of them irrespective of whether the interacting agents are Harry and Sarah or the
Governments of Great Britain and France. Then, too, an individual reasoner may want to reason about the state of play in Monte Carlo, which is reasoning of a type for which the probabilistic approach is tailor-made. Even so, the individual gambler cannot run the probabilistic algorithms willy-nilly. It remains the case that at least part of the reason that explanationism gets the nod over probabilism in the case of individual reasoning is that, in general, individual reasoners are constitutionally incapable of reasoning in the way that probabilism requires, but are capable of reasoning as coherentism requires. Part, too, of the reason that probabilism is psychologically inappropriate for individuals is that the agency-types for which probabilism is appropriate don't (except figuratively) have psychologies, and hence give no occasion to impose the constraint of fidelity to psychological reality. When PECHO took ECHO into probabilistic form, it endowed PECHO-reasoners with probabilized counterparts of reasonings that an ECHO-reasoner is capable of. But it also endowed the PECHO-reasoner with two qualities which are, in the general case, fatal for the individual reasoner: a capacity to run algorithms on a scale of complexity that removes them from an individual's reach, and a hypothetical psychology which no individual human has any chance of instantiating. Thus the ECHO-reasoner can get most things right which PECHO can get right, but it can't get them right in the way that PECHO does. The heart and soul of this difference is that the way in which the PECHO-reasoner must operate to get things right involves its having properties which no individual can have, in fact or in principle. Institutional reasoners are another matter, subject to the qualifications we have already taken note of. As it would now appear, an individual's cognitive actions are performed in two modalities, i.e., consciously and subconsciously. Bearing in mind the very substantial difference in information-theoretic terms between consciousness and subconsciousness, this is a good place to remind the reader of the possibility that accounts as computationally expensive as probabilistic models might more readily apply to the subconscious cognitive processes of individual agents, which appear to be processes that aren't informationally straitened in anything like the way of conscious processes. It is less clear that unconscious cognitive systems have structures that are at all realistically represented as structures in the real line. But it is harder to make a charge of psychological unreality stick against probabilism precisely because the true psychological character of the unconsciousness of individuals is itself hardly a matter of consensus, to say nothing of theoretical clarity. It is true that some theorists have no time for the idea of unconscious reasoning, or for unconscious cognition of any sort (cf. Peirce's insistence at Peirce [1931-1958, 5.109 and 2.144]). Our present suggestion will be lost on those who share this view. On the other hand, if one's epistemological orientation is a generally reliabilist one, it is difficult to maintain that the idea of unconscious reasoning is simply oxymoronic. We ourselves take the reliabilist stance. If cognition
and good reasoning are the products of processes that are working as they should (i.e., normally, as in Millikan [1984]), there is nothing in this basic idea to which consciousness is either intrinsic or exclusive.
6.6
Neuropharmacological Intervention
Neural disorders are often the result of imbalances of neurochemicals, owing, for example, to cell damage or degeneration. When the requisite neurochemical is irrevocably depleted, a serious neurological malady is the result. Parkinson's disease is such a disorder. Pharmacological remedies of disorders such as Parkinson's disease are the object of what is called rational drug design (RDD) (see, e.g., [van den Bosch, 2001], from which the present section is adapted). In a typical RDD exercise, information is collated concerning the requisite organic structures and the success or failure of past pharmacological interventions. In a wide range of cases, computer models are able to suggest further chemical combinations and/or dosages. RDDs are sometimes supplemented or put aside in favour of hit-and-miss strategies in which large quantities of chemicals are produced and tested in vitro for their capacity to influence receptors in the targeted neural structures. A third procedure is the generation of data from laboratory studies of animals. Following Timmerman et al. [1998], van den Bosch develops the Parkinson's disease example with a description of an important class of subcortical nuclei called the basal ganglia, which are necessary for the control of voluntary behaviour. When Parkinson's disease strikes, a component of the basal ganglia, the substantia nigra pars compacta (SNC), is significantly degraded, a result for which there is no known cause. The SNC furnishes the neurotransmitter dopamine, whose function is to help modulate signals from the cortex. Figure 6.1 schematises the function of dopamine. In Parkinson's disease the object is chemically to stimulate increases in dopamine levels. L-dopa has had mixed success in this regard. In the first five years it is highly effective, but with nausea as a significant side effect; and after five years of use its therapeutic value plummets dramatically. Parkinson's research is, therefore, dominated by the quest for alternative dopamine receptor agonists, especially those that interact with only particular dopamine receptors. Figure 6.1 charts the effect of versions of these dopamine receptor agonists. Two receptors on the pathway from the striatum to the SNR/GPi are indicated as D1 and D2. D1 occurs on the direct route, whereas D2 occurs on the indirect route via the GPe. Both D1 and D2 are receptive to dopamine, but differently. When dopamine stimulates D1, this excites the relevant cell, whereas the stimulation of D2 inhibits it. A natural question is whether a combined excitation/inhibition convergence is necessary for a favourable outcome. Van den Bosch reports studies that show that
[Figure 6.1: schematic of the basal ganglia circuit and the action of dopamine, showing the cortex, the striatum (with its D1(+) and D2(-) dopamine receptors), the GPe, the STN, the SNC, the SNR/GPi, the thalamus and the brainstem, connected by excitatory glutamate (Glu+) and inhibitory GABA(-) pathways, with dopamine (DA) supplied by the SNC.]
compounds that stimulate D1 but not D2 receptors are ineffective [van den Bosch, 2001, p. 32]. The object of drug design is to discover a successful agonist for the disorder in question. In Parkinson's research, as in many other sectors of neuropharmacology, this is done by modelling those of the patient's neurological endowments known to be relevant to dopamine stimulation, together with what is known about how the various pharmacological regimes stimulate such receptors. When these two knowledge bases are coordinated in the appropriate way, it is often possible to model the resultant dynamic systems by a qualitative differential equation (QDE). An example of how a QDE might model the basal ganglia as schematized in Figure 6.1 is sketched in Figure 6.2. M+ and M- are, respectively, monotonically increasing and decreasing functions. Variables have initial values of high, low or normal, and dynamic values of increasing, steady or decreasing. Variables are interconnected in such a way that their differentials determine a value for the combination. These determinacies are subject to a qualitative calculus.
[Figure 6.2: fragment of a QDE model of the basal ganglia, relating through monotonic functions (M+, M-) the derivatives of the firing rates f of the SNC, the striatum, the striatal pathways (D1 to SNR/GPi, D2 to GPe), the GPe and the SNR/GPi to the derivatives of the amounts a of L-dopa, dopamine, GABA and glutamate in the relevant nuclei.]
If a variable v1 is a differential function over time of variables v2 and v3, and if v1 ∈ M+, then an increase in the values of v2 and v3 will drive up the value of v1; but v1 will not be assigned a value when v2 increases and v3 decreases. Such indeterminacies flow from the qualitative features of QDEs. A qualitative state of a system described by a QDE is an attribution of values to all variables of the system, consistent with the constraints in the QDE. Given a QDE and a set of known initial values, a set of all consistent system states can be deduced, together with their possible transitions. When a calculated value is unknown, all possible states are included in the set. This set is complete, but provably not always correct, since spurious states may be included as well [van den Bosch, 2001, p. 34].
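A minimal sketch of such a qualitative calculus (our own illustration, not van den Bosch's code): each influence is a signed monotonic link, and the qualitative derivative of a variable is determined only when its incoming influences agree.

    # Qualitative derivatives: +1 (increasing), -1 (decreasing), 0 (steady), None (unknown).
    def combine(influences):
        """influences: list of (link_sign, source_derivative); link_sign is +1 for M+
        and -1 for M-.  Returns the derivative of the target, or None if indeterminate."""
        effects = {sign * d for sign, d in influences if d is not None}
        if not effects:
            return None
        if all(e >= 0 for e in effects):
            return +1 if +1 in effects else 0
        if all(e <= 0 for e in effects):
            return -1 if -1 in effects else 0
        return None     # conflicting influences: no value assigned

    # v1 depends on v2 and v3, both through M+:
    print(combine([(+1, +1), (+1, +1)]))   # both increasing -> 1
    print(combine([(+1, +1), (+1, -1)]))   # v2 up, v3 down  -> None (indeterminate)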
Figure 6.2 graphs a part of a QDE (which includes a part of the model of the basal ganglia schematized in Figure 6.1) together with the operation of dopamine. It charts the firing rates (f) of nuclei and neural pathways and the amounts (a) of neurotransmitters contained in nuclei. Thus if the firing rate of the SNC increases, this will increase the quantity of dopamine in the striatum, which in turn depresses activation of the neural pathway that signals to the GPe. A drug lead is a specification of the properties a drug should possess if it is to hit a given desired therapeutic target or range of targets. For every disease for which there is some minimal degree, or more, of medical understanding, a profile exists, which is a qualitative specification of the disease. For every profile, the researcher's goal is to supply a drug lead. The search goal is to find those variables by which one can intervene in the profile in such a way that the pathological values of the variables associated with the disease are reversed.

The goal set is defined to consist of the variables of the disease profile with an inverted direction of change, i.e., if a variable is lower in the pathological profile, it is included in the goal to increase that variable value [van den Bosch, 2001, p. 35 (emphasis added)].

The (ideal) goal of such a search is to 'find a minimal set of variables such that a manipulation of the variable values propagates a change in direction of the values of the variables of the goal set' [van den Bosch, 2001, p. 35]. Because of their qualitative nature, QDEs are not typically complete, so that the specification of all possible desired value changes of the goal will not be possible. Accordingly, QDE search models are taken as approximations to this ideal [Kuipers, 1999; van den Bosch, 1997; van den Bosch, 1998]. The search task can now be described. The starting point is a QDE model and whatever is known of the initial values of its variables. Next a goal is selected. A goal is a set of desired values. The searcher then backward-chains from the values set up by the goal to 'possible manipulations of the variables' [van den Bosch, 2001, p. 36]. Because QDE methods provide only approximately good (or bad) outcomes, approximation criteria are deployed to measure the closeness of fit between the values sought by the goal and the values actually produced by the manipulation of variables. The search is successful when it finds a set of manipulations that best approximates the production of the greatest number of goal values, with the least collateral damage. A successful search is thus a piece of backward chaining successfully completed. It is a case of what van den Bosch calls inference to the best intervention [van den Bosch, 2001, p. 34]. In the case of Parkinson's disease, one of the goal values is a reduced activation frequency of the SNR/GPi relative to that present in pathological circumstances. A search of possible manipulations discloses two salient facts. One is an increase (a) of L-
dopa in the striatum. The other is that a reduced firing rate (f) of the indirect channel between the striatum and the GPe induces a suppression of the firing rate of the SNR/GPi. It turns out that a D2 agonist can produce this decrease, but with lighter consequences than dopamine for other such agents (i.e., 'dopaminergics'). QDE searches bear some resemblance to computational diagnostic procedures. In each case, they tell us nothing new about the diseases in question. Their principal advantages are two. One is that these models are a means of making explicit both what is known of the relevant symptoms and how one reasons about them diagnostically and/or interventionistically. Another is that when QDEs are modelled computationally, they, like computerized diagnostic techniques, are especially good at teasing out all consequences compatible with integrity constraints and design limitations. In each case, the backward-chaining component is modest. It is little more than manipulation, by way of the appropriate monotonically increasing and decreasing functions, of variables already known to be interconnected. However, a third feature of the QDE search methodology considerably enhances its status as an instrument of abductive reasoning. For all their incompleteness and, relatedly, their disposition towards misprediction, drug lead exercises can also be used as proposals for experiments, and thus satisfy at least Peirce's idea that decisions about where to invest research resources are a part of the logic of abduction. Claims for research-economic abduction are discussed in the next chapter.
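By way of illustration, here is a schematic sketch of the goal-directed search described above (our own simplification, with illustrative variable names and link signs loosely based on Figure 6.2; it is not van den Bosch's implementation):

    # Signed influence links: model[target] = list of (source_variable, sign).
    # Hypothetical fragment loosely modelled on Figure 6.2 (direct D1 route only).
    model = {
        "f(SNR/GPi)":           [("a(GABA,SNR/GPi)", -1), ("a(glutamate,SNR/GPi)", +1)],
        "a(GABA,SNR/GPi)":      [("f(striatum-D1)", +1)],
        "f(striatum-D1)":       [("a(dopamine,striatum)", +1)],
        "a(dopamine,striatum)": [("a(L-dopa,striatum)", +1)],
    }

    def interventions(variable, direction, model):
        """Backward-chain from a goal (variable, desired direction of change, +1 or -1)
        to the manipulable leaf variables that would propagate that change."""
        if variable not in model:                  # a leaf: manipulate it directly
            return {(variable, direction)}
        found = set()
        for source, sign in model[variable]:
            found |= interventions(source, direction * sign, model)
        return found

    # Parkinson's goal set: decrease the firing rate of the SNR/GPi.
    print(interventions("f(SNR/GPi)", -1, model))
    # -> {('a(L-dopa,striatum)', 1), ('a(glutamate,SNR/GPi)', -1)}
    #    i.e. increase L-dopa in the striatum, or decrease the glutamate input.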
6.7
Mechanizing Abduction
We have attempted to show that explanationist abduction embeds a subjunctive conditional of the form 'If H were to have a degree of cognitive virtue k or higher, this would explain the state of affairs at which the explanation is targeted'; or, for ease of exposition: If H were the case, E would also be the case. Subjunctive conditionals also crop up in another way. In some cases, as we have also seen, an abducer has occasion to consider the explanatory potential of propositions he considers to be false. This is as it should be. One of the tasks of abduction is to set up propositions for trial. One of the purposes of trials is to correct mistakes. The abducer is free to adopt a proposition he now considers false if he also now grants that he might prove to be mistaken in thinking so. Thinking a proposition false does not preclude it from abductive conjecture; neither does its actually being false. For consider the contrary case. If it were a condition of abductive arguments that their conclusions be true, then abduction loses its very rationale twice over. Supposing H to be true, the would-be abducer knows this or not. If he knows it, there is no abduction possible with regard to H. If he does not know it, he does not know whether a condition on the possibility of what he
undertakes to do is met. So he must, at best, be an agnostic about whether what he is doing is abduction. It is one thing, and perfectly in order, for an abducer to be, on occasion, in some doubt about whether his abduction is correct or plausible. But it is not an acceptable account of abduction that the would-be abducer always be in the dark about whether the process he is involved in is abduction at all. Let us grant that essential to the abducer's quest is that, for propositions he now takes to be false and concerning which he also allows that he might be mistaken, he be ready to consider not only subjunctive conditionals but counterfactual conditionals of the form: even though H is false, it remains the case that were H true then E would be true. We may take it, then, that real-life abducers routinely deploy counterfactual conditionals. A psychologically real account of what abducers do must take this fact into account. Computer simulations of what abductive agents do are attempts at producing mechanical models that mimic abductive behaviour. A model gives a good account of itself to the extent that its mimicry approximates to what actually happens in real-life abduction. In particular, therefore, such a model works to the extent that it succeeds in mechanizing counterfactual reasoning. Can it do this? Our answer, which is adapted from [Gabbay and Woods, 2003a], follows closely [Jacquette, 1986]. People who are disposed to give a negative answer to this question are drawn to the following question: What is involved in expressly counterfactual thinking when it is done by real-life human agents? It appears that the human agent is capable of producing some important concurrences. For one, he is able to realize that P is true and yet to entertain the assumption that P is not true, without lapsing into inconsistency. Moreover, the human agent seems capable of keeping the recognition that P and the assumption that not-P in mind at the same time. That is, he is able to be aware of both states concurrently. Thirdly, the human agent is capable of deducing not-Q from the assumption of not-P without in doing so contradicting the (acknowledged) fact that Q might well be true. When the AI theorist sets out to simulate cognitive behaviour of this sort, he undertakes to model these three concurrences by invoking the operations of a finite state Turing machine. Turing machines manipulate syntax algorithmically; that is, their operations are strictly recursive. The critic of AI's claim to mechanize counterfactual reasoning will argue that no single information processing program can capture all three concurrences. It may succeed in mimicking the first, in which the agent consistently both assents to P and assumes its negation, by storing these bits of information in such a way that no subroutine of the program engages them both at the same time. But the cost of this is that the second concurrence is dishonoured. The human agent is able consciously to access both bits of information
at the same time, which is precisely what the Turing machine cannot do in the present case. It is possible to devise a program that will enable the simulation of the first and the second concurrences. The program is capable of distinguishing syntactically between the fact that P and the counterfactual assumption that not-P, say by flagging counterfactual assumptions with a distinguished marker, for example ⊛. Then the program could have subroutines which have concurrent access to 'P' and '⊛not-P⊛', without there being any danger of falling into inconsistency. Here, too, there is a cost. It is the failure of the program to honour the third concurrence, in which it is possible correctly to deduce '⊛not-Q⊛' from 'P', '⊛not-P⊛' and 'if not-P then not-Q'. Of course, the program could rewrite 'If not-P then not-Q' as 'If ⊛not-P⊛ then ⊛not-Q⊛'. From the counterfactual assumption '⊛not-P⊛', the deduction of '⊛not-Q⊛' now goes through, and does so without there being any question of an inconsistency on the deducer's part. Still, there is a problem. It is that ⊛-contexts are intensional. There are interpretations of P and Q for which the deduction of '⊛Q⊛' from 'not-P', 'counterfactually, if P then Q' and '⊛P⊛' fails. Thus it is possible to assume counterfactually that Cicero was a Phoenician fisherman, and that if Cicero was a Phoenician fisherman, then Tully was a Phoenician fisherman, without its following that I assume that Tully was a Phoenician fisherman. The notation '⊛Q⊛' expresses that Q is assumed. Assumption is an opaque context [Quine, 1960], hence a context that does not sanction the intersubstitution of co-referential terms or logically equivalent sentences [Jacquette, 1986]. Thus ⊛-inference-routines are invalid. Their implementation by any information processing program that, as a finite state Turing machine must be, is strictly extensional dooms the simulation of counterfactual reasoning to inconsistency. We should hasten to say that there are highly regarded efforts to mechanize reasoning involving counterfactual or belief-contravening assumptions. Truth-maintenance systems (TMS) are a notable case in point [Rescher, 1964; Doyle, 1979]. See also [de Kleer, 1986; Gabbay et al., 2003; Gabbay et al., 2002; Gabbay et al., 2004]. The main thrust of TMSs is to restore (or acquire) consistency by deletion. These are not programs designed to simulate the retention of information that embeds belief-contravening assumptions and their presentation to a uniformly embracing awareness. The belief that P is not inconsistent with the concurrent assumption that not-P. There is in this no occasion for the consistency-restoration routines of TMS. Thus ⊛-contexts resemble contexts of direct quotation. Such are contexts that admit of no formally sound extensional logic [Quine, 1960; Quine, 1976]. No strictly extensional, recursive or algorithmic operations on syntax can capture the logic of counterfactual reasoning. Whereupon goodbye to a finite state Turing machine's capacity to model this aspect of abductive reasoning.
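A small sketch of the flagging strategy (our own illustration, with hypothetical names): beliefs and marked assumptions are stored as distinct syntactic objects, so no subroutine ever confronts 'P' and 'not-P' in the same unmarked form; but, as the Cicero/Tully example shows, a purely extensional operation such as substitution of co-referential names cannot safely be applied inside the marked context.

    # Beliefs are plain sentences; counterfactual assumptions carry a marker,
    # so the two never clash syntactically within any one subroutine.
    beliefs     = {"Cicero was not a Phoenician fisherman"}
    assumptions = {("ASSUMED", "Cicero was a Phoenician fisherman")}

    def substitute(sentence, a, b):
        """Extensional substitution of one name for another."""
        return sentence.replace(a, b)

    # Outside the marker, substituting co-referential names ('Tully' for 'Cicero')
    # is truth-preserving; inside the ASSUMED context it is not meaning-preserving,
    # because what an agent assumes is an opaque matter.
    _tag, content = next(iter(assumptions))
    print(substitute(content, "Cicero", "Tully"))
    # 'Tully was a Phoenician fisherman' -- not something the agent need have assumed.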
Named after the German word for assumptions, ANNAHMEN is a computer program adapted from Shagrin, Rapaport and Dipert [1985]. It is designed to accommodate hypothetical and counterfactual reasoning without having to endure the costs of either inconsistency or the impossibility of the subject's access to belief-contravening assumptions and the beliefs that they contravene. ANNAHMEN takes facts and counterfactual assumptions and conditionals as input. The latter two are syntactically marked in ways that avoid syntactic inconsistency. This input is then copied and transmitted to a second memory site at which it is subjected to deduction. The previous syntactic markers are renamed or otherwise treated in ways that give a syntactically inconsistent set of sentences. The next step is to apply TMS procedures in order to recover a consistent subset in accordance with an epistemic preference-heuristic with which the program has been endowed. In the case before us, the TMS is Rescher's logic of hypothetical reasoning, or, as we shall say, the Rescher reduction. From this consistent subset the counterfactual conclusion is deduced by a Lewis logic for counterfactuals, and the syntactic markers are re-applied. Then all this is sent back to the original memory site, where it mixes with the initial input of beliefs and belief-contravening assumptions. ANNAHMEN can now perform competent diagnostic tasks and can perform well in a Turing test [Turing, 1950]. As Jacquette [2004] observes, the functions RESCHER REDUCTION and LEWIS LOGIC call procedures for the Rescher-style reduction of an inconsistent input set to a logically consistent subset according to any desired extensionally definable set of recursive or partially recursive heuristics, and for any desired logically valid deductive procedure for detaching counterfactual conditionals, such as David Lewis' formal system in Counterfactuals [Lewis, 1973]. The problem posed by the mechanization of counterfactual reasoning was that there appeared to be no set of extensional procedures for modelling such reasoning which evades syntactic inconsistency and which allows for what Jacquette calls the 'unity of consciousness' of what is concurrently believed and contraveningly assumed. ANNAHMEN is designed to show that this apparent problem is merely apparent. The solution provided by this approach is one in which the inconsistency that occurs at memory site number two exists for nanoseconds at most and occurs, as it were, subconsciously. Thus counterfactual reasoning does involve inconsistency. But it is a quickly eliminable inconsistency; and it does not occur in the memory site at which counterfactual deductions are drawn. Inconsistency is logically troublesome only when harnessed to deduction. It is precisely this that the ANNAHMEN program precludes. It may also be said that the program is phenomenologically real. When human beings infer counterfactually, they are aware of the concurrence of their beliefs and their belief-contravening assumptions, but
they are not aware of the presence of any inconsistency. (Rightly, since the counterfactual inference is performed 'at a site' in which there is no inconsistency.) The ANNAHMEN solution posits for the reasoning subject the brief presence of an inconsistency that is removed subconsciously. It is of some interest to the present authors that the program implements the operation Putter-of-Things-Right of [Gabbay and Woods, 2001b]. This is a device postulated for the human information processor. What makes the ANNAHMEN proposal especially interesting in this context is that, in effect, it purports to show that Putter-of-Things-Right is mechanizable. Whether it is or not, we find ourselves in agreement with Jacquette's assessment of the ANNAHMEN approach to counterfactual reasoning. Jacquette shows that while ANNAHMEN handles certain types of counterfactual reasoning, it fails for other types. Further, even though certain refinements to the ANNAHMEN protocols, in the manner of Lindenbaum's Lemma for Henkin-style consistency and completeness proofs, or in the manner of the Lemma for consistent finite extensions of logically consistent sets, resolve some of these difficulties, they cannot prevent others [Tarski, 1956; Henkin, 1950]. We side with Jacquette in thinking that there 'is no satisfactory extensional substitute for the mind's intentional adoption of distinct propositional attitudes toward beliefs and mere assumptions or hypotheses'. We shall not here reproduce the details of Jacquette's criticisms; they are well presented in [Jacquette, 2004]. Our more immediate interest is in the recurring question of whether the logic of down below is plausibly considered logic. Earlier we briefly noted treatments of abductive insight by connectionist models of prototype activation at the neurological level. (See also [Churchland, 1989; Churchland, 1995; Burton, 1999].) We were drawn to the connectionist model because it seems to capture important aspects of abductive behaviour at subconscious and prelinguistic levels. If this is right, there are crucial aspects of abductive practice which, on pain of distortion, cannot be represented as the conscious manipulation of symbols. As we now see, the manner in which ANNAHMEN is thought to fail bears on this issue in an interesting way. At present there is no evidence that conscious mental functions such as memory and desire, and even consciousness itself, have a unified neurological substructure [Kolb and Whishaw, 2001]. If one subscribes to an out-and-out connectionist materialism with respect to these matters, we would have it that the phenomenological experience of a unified consciousness is an illusion. If that were so, then it would hardly matter that a mechanized logic of counterfactual reasoning is incompatible with the unity of consciousness. Even if we allowed that phenomenologically unified manifestations of consciousness were not always or in all respects illusory, we have already seen in an earlier section that there are strong information-theoretic indications that the conscious mind is neither conscious enough nor efficient enough for the burdens of a human
subject's total cognitive agenda. In the face of mounting evidence that substantial and essential aspects of cognition operate down below, it seems an unattractive dogmatism to refuse logic any purchase there. Add to this that plausible mechanizations of cognitive processes such as hypothetical reasoning require that we postulate subconscious and prelinguistic performance in a way that downgrades the role of a unified consciousness, and not only is the logic of down below given some encouragement, but so too is its algorithmic character. So we think that everyone drawn, for reasons of the sort we have been examining, to the logic of down below is in all consistency pledged to reconsider, with more favour than researchers such as Jacquette propose, the plausibility of mechanical models of abductive practice.
6.8
Abduction in Neural-Symbolic Networks
We suggested earlier that part of an individual's cognitive wherewithal might well be representable in a connectionist logic. Such was the conjecture of Chapter 3. We have also touched briefly on non-representational systems in which various constraints are satisfied, but not by following rules for their satisfaction, which is yet another comment on the process-product dichotomy. An attraction of such systems, apart from their intrinsic interest, is the hope they offer, albeit with qualification, to logicians who take seriously the massively plain fact of cognitive performance 'down below'. Connectionist approaches also offer some (conjectural) relief on the score of computational complexity. It must be admitted, however, that the relief is rendered ambiguously. On the one hand, parallel distributed processes are intuitively plausible subduers of complexity, echoing the old saw that many hands make light work. On the other hand, they are often highly complex systems to implement mechanically. The one fact does no discredit to the other. Whatever the complexities of simulating parallel processes, and whatever the complexities in their evolution in human beings, when they operate in human beings there is every reason to think that they achieve the economies attendant upon the efficient evasion of complexity-overload. Connectionist logics are still in their infancy. But enough is already known about them to make it possible to say that, in their standard forms, they are not especially well-suited to abduction. The problem is that the operation of connectionist backwards-chaining is too coarsely grained for the selective refinements of hypothesis-generation and hypothesis-engagement. Our principal task in this final section of the chapter is to consider ways of mitigating this difficulty. To this end, we will sketch a new parallel model for abductive reasoning based on Neural-Symbolic Learning Systems. Our further purpose, in addition to the benefit of possible parallel speed-ups, is to sketch an integrated reasoning and learning system. This requires the use of simple neural networks to which stan-
dard, off-the-shelf learning algorithms can be applied. A third objective is to give the reader an early taste of what lies in store in the next, and final, part of the book, devoted to formal models. The main problem to tackle here is the fact that neural networks work bottom-up, while for abduction we would like to reason top-down. One might be tempted to reverse the network in an attempt to reason using abduction, but this would not work in the case of neural networks, as the following example illustrates.
Example 6.18 Consider the Neural-Symbolic Learning System of Figure 6.3. It encodes the logic program P = {r1: a, b → x; r2: c → x; r3: x → y}. Each rule ri is mapped from the network's input layer to its output layer through a hidden neuron Ni, such that the output is activated if the input is satisfied. For example, output neuron x will be activated if either input neurons a and b are both activated, or if neuron c is activated. In addition, input and output neurons having the same label (e.g., x) are linked through a feedback connection with weight 1 connecting the output to the input layer of the network. This is responsible for implementing chains such as a → b and b → c in the network. In the case of P, this is how, given a and b, the network would have y activated, via neuron x. From using abduction on P, we know that {a, b} is a possible explanation for x, and so is {c}. If we were to simply reverse the network in an attempt to compute explanations {a, b} and {c} given hypothesis x, we would have a relation instead of a function from the input to the output of the network. As a result, a standard neural network (which computes functions, not relations) would not be able to distinguish {a, b} and {c} as two alternative explanations for x. Instead, {a, b, c} would be activated given x.
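By way of illustration, the behaviour described in Example 6.18 can be emulated with hand-set threshold units (a rough sketch of our own, not the authors' implementation; the weights and thresholds are chosen by inspection rather than learned): each hidden neuron is an AND gate for the body of its rule, each output neuron an OR gate over the rules that conclude it, and output activations are fed back as inputs.

    import numpy as np

    atoms = ["a", "b", "c", "x", "y"]
    idx = {v: i for i, v in enumerate(atoms)}

    # Hidden layer: one AND-gate neuron per rule (r1: a,b -> x; r2: c -> x; r3: x -> y).
    W_in = np.zeros((3, 5))
    b_hid = np.array([-1.5, -0.5, -0.5])
    W_in[0, [idx["a"], idx["b"]]] = 1.0   # N1 fires only if a and b are both active
    W_in[1, idx["c"]] = 1.0               # N2 fires if c is active
    W_in[2, idx["x"]] = 1.0               # N3 fires if x is active

    # Output layer: OR gates collecting the rules that conclude each atom.
    W_out = np.zeros((5, 3))
    b_out = np.full(5, -0.5)
    W_out[idx["x"], 0] = W_out[idx["x"], 1] = 1.0   # x is concluded by r1 or r2
    W_out[idx["y"], 2] = 1.0                        # y is concluded by r3

    def step(v):
        return (v > 0).astype(float)

    def run(facts, iters=3):
        state = np.zeros(5)
        for f in facts:
            state[idx[f]] = 1.0
        for _ in range(iters):   # feedback: output activations fed back as inputs
            out = step(W_out @ step(W_in @ state + b_hid) + b_out)
            state = np.maximum(state, out)
        return {v for v in atoms if state[idx[v]] > 0}

    print(run({"a", "b"}))   # {'a', 'b', 'x', 'y'}  (deduction runs bottom-up)
    print(run({"c"}))        # {'c', 'x', 'y'}

Running the network in this direction is deduction; the point of the example is that simply reversing the arrows does not yield the two alternative explanations for x.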
An alternative to the problem discussed in this example would be not to reverse the network but to treat different sets of input values to input neurons a, b, c as hypotheses. Abduction in this case would be the process of presenting such different input values to the network and inspecting the output for additional hypotheses. For example, if a and b were activated in the input layer of the network of Figure 6.3, and we treated these activations as hypotheses instead of as facts, we would be able to conclude that, since a and b activate x, {a, b} is an explanation for x. Similarly, {c} would be an explanation for x, and x an explanation for y, but not {a} alone, or {b} alone. In addition, {a, b, c} would still be an explanation for x, but a non-minimal explanation. The problem with this approach lies in the choice of inputs to select. Would we, for example, select all combinations of inputs? In that case, we would have an exponential-complexity algorithm, with each input being either activated or deactivated, and 2^n input combinations to check, where n is the number of input neurons (atoms in the corresponding logic program). What we really would like to be able to do is to reason in a top-down, or goal-directed, way. We would like to be able to activate
[Figure 6.3: A Neural-Symbolic Learning System]
y as a hypothesis in the network and obtain {x}, {a, b} and {c} (for example), in parallel, as alternative possible explanations for this hypothesis. A solution to this problem lies with connectionist modal logics [d'Avila Garcez et al., 2002; d'Avila Garcez and Lamb, 2004]. Modal logics can be implemented in Neural-Symbolic Learning Systems with the use of an ensemble of neural networks, each network representing a possible world. The use of an ensemble allows for the representation of relations, such as accessibility relations, in neural networks. Each network in the ensemble is a simple single-hidden-layer network like the network of Figure 6.3, to which standard neural learning algorithms can be applied. Learning, in this setting, can be seen as learning the concepts that hold in each possible world independently, with the accessibility relations providing the information on how the networks should interact. In the case of abductive reasoning, we can model the fact that {a, b} and {c} are possible explanations for x, for example, by having neurons a and b active in one network of the ensemble (say, W1), and neuron c active in a different network of the ensemble (say, W2), whenever neuron x is active in a network W such that R(W, W1) and R(W, W2), where R is an accessibility relation. The following example illustrates the idea.

Example 6.19 Take the same program P = {r1: a, b → x; r2: c → x; r3: x → y}. First, we translate P into a modal program by replacing each rule of
the form L1, ..., Ln → A by a modal rule of the form A → ◇(L1 ∧ ... ∧ Ln). The intuition behind this translation is that L1, ..., Ln is a possible explanation for A (and thus the use of ◇). In addition, we label each rule ri with a world Wi in which ri holds, and define how the worlds relate to each other (i.e., the accessibility relation R(Wi, Wj)) according to the dependency chains in P. This will become clearer when we present the algorithm to translate P in the sequel. For now, as an example, suppose we translate r1, r2 and r3 in this sequence. For r1, we obtain W1: x → ◇(a ∧ b). For r2, since there is no chain from r1 to r2, we keep r2 in W1, and obtain W1: x → ◇c. Finally, for r3, we define W2: y → ◇x and R(W2, W1), since there is now a chain between r3 and the other rules via x. Given W1: x → ◇(a ∧ b), by definition we would like a and b to be true in a world W0 such that R(W1, W0). Similarly, given W1: x → ◇c, we would like c to be true in another world W0', such that R(W1, W0'). Note that, by defining the relation R appropriately, and since in this case we have a different neural network for each world, we can check that when, e.g., x is activated in W1, a and b will be activated in W0, while c will be activated in W0'. Similarly, when y is activated in W2, x will be activated in W1. This allows us to reason top-down, in parallel, and at the same time to keep track of the alternative explanations for our hypotheses.
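A small sketch of this translation step (our own rendering, with an ad hoc world-naming scheme, so the labels differ from the W1, W0, W0', W2 used in the example; the rule and world representations are ours):

    from itertools import count

    # The program P of Example 6.19, written as (body, head) pairs.
    P = [(["a", "b"], "x"), (["c"], "x"), (["x"], "y")]

    def to_modal(program):
        """Rewrite each rule body -> head as head -> <>(body); place every atom in a
        possible world (a fresh one if it has none yet) and record the accessibility
        relation from the head's world to each body atom's world."""
        world_of, access, modal, fresh = {}, set(), [], count(1)

        def place(atoms):
            new = [a for a in atoms if a not in world_of]
            if new:
                w = "W%d" % next(fresh)
                for a in new:
                    world_of[a] = w
            return {world_of[a] for a in atoms}

        for body, head in program:
            head_world = place([head]).pop()
            for body_world in place(body):
                access.add((head_world, body_world))
            modal.append((head_world, "%s -> <>(%s)" % (head, " & ".join(body))))
        return modal, access

    rules, R = to_modal(P)
    print(rules)  # [('W1', 'x -> <>(a & b)'), ('W1', 'x -> <>(c)'), ('W4', 'y -> <>(x)')]
    print(R)      # e.g. {('W1', 'W2'), ('W1', 'W3'), ('W4', 'W1')}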
Given a modal program, an ensemble of neural networks can be constructed by repeating the procedure for constructing single networks. Figure 6.4 shows networks W1, W0 and W0' for the program of Example 6.19. In W1, whenever input neuron X is activated, we would like output neurons ◇(A ∧ B) and ◇C also to be activated. This can be easily implemented by properly setting up the connection weights and thresholds of the hidden neurons connecting the input to the output. In addition, whenever output neuron ◇(A ∧ B) is activated, we would like to have output neurons A and B activated in W0. This is implemented in the same way, with the use of hidden neurons in W0. Similarly, whenever output neuron ◇C is activated, we would like to have C activated in W0'. Note that the fact that neurons A and B, and neuron C, get activated in different networks is responsible for identifying {a, b} and {c} as two alternative explanations for x. In addition, similarly to the feedback connections within a neural network (e.g., linking output neuron X to input neuron X in W1), there might be feedback connections between networks (i.e., from W0 and W0' to W1). From W0 to W1, there is feedback from A and B to X such that whenever both output neurons A and B are activated in W0, output neuron X is activated in W1 (again, this is implemented via an AND-gate hidden neuron). Similarly, from W0' to W1, there is feedback from C to X such that whenever C is activated, X is activated (i.e., output neuron X acts as an OR gate for the hidden neurons that are linked to it). This allows one to reason deductively as well as abductively within the same model. If, for example, for some reason, output neurons A and B are activated in W0 (say,
we force A and B to be activated, or there are other rules and facts in W0 that make A and B activated), output neuron X will be activated in W1 (implementing the rule a, b → x). Similarly, if C were to be activated in W0', then X would be activated in W1. In summary, when activations are propagated forward, according to the accessibility relation, the network computes explanations for hypotheses; when activations are propagated backwards, through the feedback connections in the network, deduction is being performed. This is a very interesting characteristic of the model presented here.
[Figure 6.4: Neural Network Ensemble for Abductive Reasoning, comprising networks W0, W0' and W1]
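The forward (explanatory) and backward (deductive) behaviour of the ensemble in Figure 6.4 can be mimicked schematically as follows (our own simplification: the 'networks' are reduced to lookup tables rather than trained threshold units, and the world labels follow Figure 6.4):

    forward = {                 # along the accessibility relation: explanations
        ("W1", "x"): [("W0", {"a", "b"}), ("W0'", {"c"})],
        ("W2", "y"): [("W1", {"x"})],
    }
    backward = {                # feedback connections between networks: deduction
        ("W0", frozenset({"a", "b"})): ("W1", "x"),
        ("W0'", frozenset({"c"})):     ("W1", "x"),
        ("W1", frozenset({"x"})):      ("W2", "y"),
    }

    def explain(world, atom):
        """Propagate forward from a hypothesis to its alternative explanations."""
        return forward.get((world, atom), [])

    def deduce(world, active):
        """Propagate backward from activated atoms to what they entail."""
        return {target for (w, body), target in backward.items()
                if w == world and body <= set(active)}

    print(explain("W1", "x"))        # [('W0', {'a', 'b'}), ("W0'", {'c'})]
    print(deduce("W0", {"a", "b"}))  # {('W1', 'x')}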
Let us now present the algorithms to translate symbolic rules into connectionist networks that reason abductively.

1. For each rule in P of the form X1, ..., Xn → Y do:
(a) If Y has not been assigned a world, assign a new world Wi to Y;
(b) If X1, ..., Xn have not been assigned a world, assign a new world Wj to X1, ..., Xn;
(c) Make R(Wi, Wj);
2. Make R(Wk, Wi) for any world Wk containing a rule in which Y is used; and
3. Call the Modalities Algorithm to build the network ensemble.

To run the network ensemble to compute explanations (forward), proceed as follows:
(a) Activate a number of neurons (hypotheses) at time t1; and
(b) Check the neuron activations at times t2, t3, ..., until the ensemble is stable (i.e., until the activations at time ti equal the activations at time ti+1).

Theorem 6.20 Each set of activations at time ti in each world is a possible explanation for the activations at time ti-1.

To run the network ensemble to compute answer sets (backwards), proceed as follows:
(a) Introduce a number of facts to the ensemble by setting certain output neurons as being always active, regardless of the input; and
(b) Let the ensemble become stable.

Theorem 6.21 The set of activations in the ensemble will be the set of what can
be deduced from the facts given. Connectionist logics are a lot like human infants. They are difficult to raise, and then take quite a long time to grow up. We offer the suggestions of the present section in the hope that it might be said, sooner rather than later, that connectionism in logic has to some extent grown up and that the days of its infancy are numbered, if not ended quite yet.