(≥_P C) such that at least one member from each equivalence class whose members are ≤_P C (≥_P C) is represented. Clauses may be organized into a tangled hierarchy where all clauses below a given clause in the hierarchy are ≤_P C by conditions (2) and (3).
(Head.Tail is equivalent to [Head|Tail] or cons(Head, Tail).)
generalization hierarchy, the Model Inference System a specialization hierarchy, and the Version Space method [10] the intersection of the two hierarchies. Surprisingly, specialization and generalization hierarchies possess very different characteristics. To investigate these, I consider the advantages gained during search of the hierarchies when using a stronger model of generalization, as achieved by utilizing more of an induction system's current knowledge.

The use of a stronger model of generalization does not in fact increase the scope of the search space defined by a specialization hierarchy. A hierarchy so constructed will still span an equivalent set of clauses. More specifically, let Q ⊇ P be two logic programs, let S_P denote the set of clauses represented in some specialization hierarchy w.r.t. P rooted at the clause C, and let S_Q denote the same for Q. Then every clause in S_P has a member of its equivalence class w.r.t. Q in S_Q (as a specialization w.r.t. P is also a specialization w.r.t. Q), and every clause in S_Q has a member of its equivalence class w.r.t. Q in S_P (let D ∈ S_Q and C_head θ = D_head; then D ∪ Cθ =_Q D and ≤_∅ C, hence ≤_P C). Using a similar argument, this property can be shown to hold in any rule language where the bodies of the rules are closed under conjunction. And any specialization hierarchy rooted at the most general clause (for the sort predicate, this was sort(X, Y)) will span all possible clauses; it forms a complete search space.

A stronger model of generalization, however, reduces the size but not the scope of the search space, because several equivalence classes may collapse into one. This was demonstrated by the second example in Section 2.2, where clause (5) becomes equivalent to clause (4). The stronger model also allows more pruning to occur during search, as demonstrated by clauses (6) and (7) in the same example. To illustrate these ideas with a more extensive example, consider the logic program below, called Philosophers:

    intellectual(socrates) ,
    man(socrates) ,
    mortal(X) ← man(X) ,
    greek(socrates) ,
    man(descartes) .
Part of a specialization hierarchy constructed with the same language and organized using θ-subsumption is shown in Fig. 1. Compare this with Fig. 2 showing part of the same hierarchy organized using subsumption w.r.t. Philosophers. Notice that not only have groups of nonequivalent clauses collapsed into single equivalence classes, as highlighted by the shading, but also the hierarchical structure of the space has been rearranged. For instance, compare the positions of intellectual(socrates), the clause shaded the darkest. Most of the significant differences between the hierarchies stem from the clause mortal(X) ← man(X).
Fig. 1. Part of a specialization hierarchy w.r.t. ∅, that is, under θ-subsumption, rooted at intellectual(X). Arrows indicate "subsumes w.r.t. ∅", that is, "θ-subsumes".
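To make the θ-subsumption ordering of Fig. 1 concrete, a small Prolog sketch of the test is given below. It is not taken from the paper: the Head-Body clause representation (with Body a list of atoms) and the predicate names are my own. It uses the standard trick of grounding the subsumed clause's variables so that a substitution need only be found for the subsuming clause.

    % theta_subsumes(+C, +D): clause C theta-subsumes clause D, i.e. some
    % substitution maps C's head onto D's head and C's body into D's body.
    % A clause is written Head-Body, with Body a list of atoms, for example
    % intellectual(X)-[man(X), mortal(X)].
    theta_subsumes(C, D) :-
        \+ \+ ( copy_term(C, HeadC-BodyC),        % standardize the clauses apart
                copy_term(D, HeadD-BodyD),
                numbervars(HeadD-BodyD, 0, _),    % treat D's variables as constants
                HeadC = HeadD,                    % the substitution maps head to head
                body_covered(BodyC, BodyD) ).

    % Every atom of the subsuming body must occur in the (grounded) subsumed body.
    body_covered([], _).
    body_covered([A|As], Bs) :- occurs_in(A, Bs), body_covered(As, Bs).

    occurs_in(A, [A|_]).
    occurs_in(A, [_|Bs]) :- occurs_in(A, Bs).

For example, theta_subsumes(intellectual(X)-[], intellectual(Y)-[man(Y)]) succeeds, while theta_subsumes(intellectual(X)-[man(X)], intellectual(Y)-[mortal(Y)]) fails: no substitution alone relates man and mortal; only the knowledge in Philosophers can.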
In contrast, for generalization hierarchies w.r.t. different knowledge, the same equivalence of scope does not hold. The more knowledge incorporated into the model of generalization, the greater the number of clauses spanned. In particular, a generalization hierarchy w.r.t. the current knowledge rooted at a given fact contains representatives of exactly those equivalence classes whose clauses are currently known to cover that fact. This stands out dramatically if we compare the clauses found subsuming (above) intellectual(socrates) in Fig. 1 with those in Fig. 2. Thus, when using a generalization hierarchy w.r.t. current knowledge to achieve induction, an assumption is being made about the current knowledge. It must be sufficient to justify that some clause, true in the intended interpretation, subsumes the root clause. I call this the justifiability assumption. The justifiability assumption dictates that knowledge should be acquired incrementally, with new concepts being built on existing knowledge. When the assumption is used, the scope of the resultant search space is greatly restricted in comparison with a full specialization hierarchy.
Fig. 2. Part of a specialization hierarchy w.r.t. Philosophers rooted at intellectual(X). Arrows indicate "subsumes w.r.t. Philosophers". Clauses equivalent w.r.t. Philosophers are shaded identically in Fig. 1 and Fig. 2.
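The arrows of Fig. 2 use subsumption w.r.t. a program rather than θ-subsumption. A rough Prolog sketch of such a test is given below; it is in the spirit of the test of Theorem 4.2 but is not a transcription of it. The clause and program representations, the predicate names, and the naive unbounded prover are all my own choices, and, as the paper notes, the test is in general only semi-decidable, so the prover may fail to terminate.

    % subsumes_wrt(+P, +General, +Specific): General subsumes Specific w.r.t.
    % the definite program P.  A clause is written Head-Body with Body a list
    % of atoms, and P is a list of such clauses (facts have the body []).
    % The Specific clause is grounded, its body atoms are added to P as facts,
    % and the body of General must then be provable with the heads unified.
    subsumes_wrt(P, General, Specific) :-
        \+ \+ ( copy_term(General,  HeadG-BodyG),   % standardize the clauses apart
                copy_term(Specific, HeadS-BodyS),
                numbervars(HeadS-BodyS, 0, _),      % skolemize the specific clause
                HeadG = HeadS,                      % heads must unify
                skolem_facts(BodyS, Facts),
                append(Facts, P, Extended),
                prove_all(BodyG, Extended) ).

    skolem_facts([], []).
    skolem_facts([A|As], [A-[]|Fs]) :- skolem_facts(As, Fs).

    % A naive, depth-unbounded prover over a program given as a list of clauses.
    prove_all([], _).
    prove_all([A|As], Prog) :- prove_one(A, Prog), prove_all(As, Prog).

    prove_one(A, Prog) :-
        clause_for(A-Body, Prog),
        prove_all(Body, Prog).

    % Pick a renamed program clause whose head unifies with the goal.
    clause_for(Clause, [C|_])  :- copy_term(C, Clause).
    clause_for(Clause, [_|Cs]) :- clause_for(Clause, Cs).

For example, if Phil is bound to the list [intellectual(socrates)-[], man(socrates)-[], mortal(X)-[man(X)], greek(socrates)-[], man(descartes)-[]], then subsumes_wrt(Phil, intellectual(X)-[mortal(X)], intellectual(Y)-[man(Y)]) succeeds even though the corresponding θ-subsumption test fails, reflecting the role of the clause mortal(X) ← man(X) noted above. θ-subsumption is recovered as the special case P = [].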
There is one situation where the justifiability assumption is satisfied by default. This occurs when an unknown concept does not need a recursive definition and all predicates that could be used in the definition are such that the truth of their instances is known according to current knowledge. The Version Space method is often explained in this context. I envisage two other situations where the justifiability assumption is applicable. First, induction may be controlled by a knowledgeable trainer, as is expected with MARVIN [18]. Before beginning induction, it is not unreasonable to expect the trainer to supply the system with relevant knowledge and pertinent examples sufficient to justify that a clause is indeed a cover, although Banerji [28] comments, in the context of the induction of recursive rules, that such a system "depends heavily on the facts being presented in a proper sequence". Second, a system may be supplied with all currently available knowledge, that is, observations and a suitably rich description language, and be required to perform induction solely with that knowledge. Any inductive hypothesis can only be accepted if it is currently plausible. I claim that, to a first approximation, plausibility of a clause is equivalent to being able to justify that the clause covers at least some facts known true, but none known false. Most specific generalizations, constructed by some induction techniques to search a generalization hierarchy, are considered separately in Section 7.
6. A Model of Redundancy

In this section I consider a restricted form of redundancy called logical redundancy. With this form, redundancy is determined according to the ideal rules of logic, independently of the particular implementation of the knowledge-based or logic programming system being reasoned about. Before considering logical redundancy in more detail, it is best to consider the issues that it ignores. In the final analysis, a rule or some condition in a rule is redundant for a particular knowledge-based system if, on its removal, the system would continue to perform as required and would still do so after any feasible extensions to the system's knowledge have been made. Apart from just the form of knowledge available, several aspects need to be considered when determining this:
(1) the kind of questions that will be asked of the system by the user;
(2) the completeness of the reasoning component of the system (that is, given enough resources, is it able to draw all possible conclusions);
(3) the resource and timing constraints the system operates under; and
(4) the subset of the system's knowledge base that is correct or complete. (For example, if a set of Horn clauses is complete, that is, does not need to be extended in order to make conclusions about facts, we are able to use the closed world assumption to make conclusions about their negation [23, Chapter 3].)
With logical redundancy, the following assumptions are being made:
(1) any possible question can be asked of the system;
(2) the system's reasoning component is complete;
(3) the system has unbounded resources; and
(4) no part of the knowledge base is complete, although a certain subset (usually represented below by P) is correct.
It is fairly safe to make assumptions (1) and (4) because these only weaken the actual situation. I consider some implications of relaxing assumption (3) in Section 6.2.
6.1. Logical redundancy

At least two kinds of this style of redundancy exist in a logic program. A clause can be redundant and so can an atom within a clause. A clause D is redundant in a logic program P if P − {D} ⊨ ∀D. Such a clause can be removed from the logic program and, by assumptions (2) and (3) above, a particular implementation will still have exactly the same goals succeed. For example, the third and fourth clauses in the program
    member(X, X.Y) ,
    member(X, Z.Y) ← member(X, Y) ,
    member(X, Z.X.[ ]) ,
    member(1, 3.2.1.[ ])
are redundant because they are both logical consequences of the first two. From Theorem 4.3, if a clause is redundant, there always exists another clause in the logic program that can be considered primarily responsible for rendering it redundant. Several such clauses may exist. For example, from Theorem 4.2 the third and fourth clauses in the above program are both more redundant than the second clause, and the first cannot be so related to any of the others in the program. In view of this, the subsumption relation can be considered to order clauses in terms of their relative redundancy. In addition, the subsumption test gives a semi-decidable algorithm to detect logical redundancy, ideally suited to computation by a logic programming system. This allows, for instance, the redundancy of clauses, and not just facts, to be detected (compare with Bowen and Kowalski [29]).

Within clauses themselves, a second type of redundancy is possible. If clauses are being constructed by an algorithm with no proper regard for the underlying semantics, for example, if clauses are being enumerated during induction, they may contain atoms making no effective contribution to the successful working of the clause. The clause
    cuddly-cat(X) ← fluffy(X), cat(X), animal(X)

is such a case if it is known that a cat is always an animal. This is because the atom animal(X) will be proven true whenever cat(X) is. The last atom can be said to be redundant because its only effect is to cause additional but unnecessary computation. Formally, this occurs because if the logic program P contains the clause animal(X) ← cat(X), then
    cuddly-cat(X) ← fluffy(X), cat(X), animal(X) =_P cuddly-cat(X) ← fluffy(X), cat(X)

and, according to Corollary 4.4, the shorter clause can replace the longer in any such logic program and the same goals will still succeed. Plotkin [11] calls the process of ensuring that a clause contains no further redundant atoms, in the context of θ-subsumption, reducing a clause. A corresponding concept is appropriate for generalized subsumption.

Definition 6.1. Clause A_0 ← A_1, ..., A_n is reduced w.r.t. logic program P if, for all i such that 1 ≤ i ≤ n, A_0 ← A_1, ..., A_{i-1}, A_{i+1}, ..., A_n is not equivalent w.r.t. P to A_0 ← A_1, ..., A_n.

A clause D within a logic program P can be replaced by an equivalent but reduced clause w.r.t. P and the new logic program will have exactly the same goals succeed. More precisely
Lemma 6.2. Let D ∈ P. If D' =_P D and D' has been obtained by deleting atoms from the body of D, then ⊢ P ↔ ((P − {D}) ∪ {D'}).

Proof. D ≥_P D', so by Corollary 4.4, ⊢ P → ((P − {D}) ∪ {D'}). D' ≥_∅ D (as D' has been obtained by deleting atoms from D), so the reverse direction holds as well. □

The equivalent but reduced version of a clause will always remain equivalent to the original if the logic program is subsequently expanded. In addition, a reduced form obtained by deleting atoms is guaranteed to cause smaller proofs to be built, and would usually cause less computation, than the original clause, because any proof using the original will always contain a proof using this reduced form. When the reduced form is unique, it is guaranteed to cause smaller proofs to be built than any other equivalent clause. Unfortunately, the uniqueness condition does not always hold, although it is a simple matter to show that the reduced form w.r.t. a nonrecursive program will always be unique.

An algorithm for reducing a clause, given below, is the same as Plotkin's [11, Theorem 2] but is based on the subsumption test given in Theorem 4.2 rather than θ-subsumption, and, as also suggested by Sagiv [27, Section VIII], each atom need only be considered for reduction, in Step 2, once in the entire course of the algorithm. It inherits termination problems from the subsumption tests performed in Step 2. Because the input clause is finite in length, the algorithm is guaranteed to terminate with a clause D set to a reduced form of C if all subsumption tests terminate.

Theorem 6.3 (Reduction algorithm). The reduction algorithm below accepts as input a logic program P and a clause C. Assuming all subsumption tests terminate, the clause D when output will be in reduced form and be equivalent to C w.r.t. P.
Step 1. Set D to C.
Step 2. For each atom A in the body of C, if D − {¬A} ≤_P D, set D to D − {¬A}.
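A small Prolog sketch of this loop is given below. It reuses the subsumes_wrt/3 sketch given after Fig. 2 and inherits its assumptions, in particular that the subsumption tests terminate; the clause representation and predicate names are again my own, not the paper's.

    % reduce_clause(+P, +Head-Body, -Head-Reduced): a sketch of Steps 1 and 2.
    % Each atom of the original body is considered once; it is deleted when the
    % shortened clause is still subsumed by the current clause w.r.t. P.
    reduce_clause(P, Head-Body, Head-Reduced) :-
        reduce_atoms(Body, P, Head, Body, Reduced).

    reduce_atoms([], _, _, Current, Current).
    reduce_atoms([A|As], P, Head, Current, Reduced) :-
        delete_atom(A, Current, Shorter),
        (   subsumes_wrt(P, Head-Current, Head-Shorter)   % A is redundant
        ->  reduce_atoms(As, P, Head, Shorter, Reduced)
        ;   reduce_atoms(As, P, Head, Current, Reduced)
        ).

    % Delete exactly this occurrence of A (compared by identity, not unification).
    delete_atom(A, [B|Bs], Bs) :- A == B, !.
    delete_atom(A, [B|Bs], [B|Cs]) :- delete_atom(A, Bs, Cs).

For the cuddly-cat example above, with P containing animal(X)-[cat(X)] and the clause written cuddly_cat(X)-[fluffy(X), cat(X), animal(X)] (the hyphen is not a legal character in a Prolog atom, hence the underscore), the atom animal(X) is deleted and the other two atoms are kept.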
Proof. Termination has already been discussed. Clearly, D =_P C, since Step 2 only deletes an atom when the shortened clause remains equivalent w.r.t. P. It remains to show that no atom in the output clause can be reduced from it.
Represent D by A_0 ← A_1, ..., A_n. Assume A_n can be reduced from D, that is,

    A_0 ← A_1, ..., A_{n-1} =_P A_0 ← A_1, ..., A_n ,

and A_{n+1} was the last atom reduced to obtain D, that is,

    A_0 ← A_1, ..., A_n =_P A_0 ← A_1, ..., A_{n+1} .

This implies

    A_0 ← A_1, ..., A_{n-1} =_P A_0 ← A_1, ..., A_{n+1}

and

    A_0 ← A_1, ..., A_{n-1}, A_{n+1} ≤_∅ A_0 ← A_1, ..., A_{n-1} ,

and so

    A_0 ← A_1, ..., A_{n-1}, A_{n+1} ≤_P A_0 ← A_1, ..., A_{n+1} .

Consequently, A_n could have been reduced before A_{n+1}, that is,

    A_0 ← A_1, ..., A_{n-1}, A_{n+1} =_P A_0 ← A_1, ..., A_{n+1} .
So, if an atom in the output version of D can be reduced, it would also have been able to be reduced during Step 2 (as earlier versions of D were equivalent but had extra atoms), so it should not exist in the output version. Because this is a contradiction, no atom in the output version of D can be reduced from D. □

The complexity of this algorithm depends on the complexity of subsumption. Unfortunately, θ-subsumption is known to be NP-complete [30, p. 264]. The algorithm can be sped up in some instances by attempting to reduce a small group of atoms before employing the full subsumption test in Step 2. For example, consider the clause

    a(X) ← b(X, Y), c(X, Z), d(Z, W) .
Using Theorem 4.2(2), it can be shown that b(X, Y) can be reduced from this clause if

    a(X) ← c(X, Z), d(Z, W) ≤_P a(X) ← b(X, Y) .
If successful, this computation is far shorter than the full subsumption test of
    a(X) ← c(X, Z), d(Z, W) ≤_P a(X) ← b(X, Y), c(X, Z), d(Z, W) .
Similarly, d(Z, W) can be reduced if

    a(X) ← b(X, Y), c(X, z) ≤_P a(X) ← d(z, W) ,

where z is a unique constant symbol. An application of the reduction algorithm is demonstrated in Section 7.

Maher [31] suggests that a logic program can be reduced under θ-subsumption to a "canonical form" by removing all subsumed clauses and then replacing all remaining clauses by their unique reduced form under θ-subsumption. Likewise, it is possible to remove both kinds of logical redundancy from a program by removing all redundant clauses and then replacing each clause in the program by its reduced form w.r.t. the program, as in Lemma 6.2. Sagiv [27, Fig. 2] proposes a variant of this technique for "minimizing" a DATALOG program (although, in his algorithm, the strict order in which redundancies can be removed is unnecessary); as mentioned in Section 4, subsumption w.r.t. DATALOG programs is decidable. Similar techniques to these need to be developed for the more extensive knowledge representations usually found in knowledge-based systems, to assist in the task of knowledge base maintenance.

6.2. Redundancy when resources are bounded
Consider what happens when assumption (3) for logical redundancy is dropped; resources are always limited in practice. In this situation, rules that are logically redundant may become necessary for the system because the processing resources may not be available to deduce them in the course of answering other questions. It is important then to introduce controlled logical redundancy. There are two main contexts in which this can be done. Logically redundant rules may need to be generated by the system before it could even begin, within its resource constraints, to answer a particular question. This, of course, leads to the problem of lemma conjecturing [32], so important in proof discovery. Alternatively, the system may be currently answering a particular style of question and is required to improve its performance on these. A partial solution to this problem has already been proposed in another setting: explanation-based learning in the context of a strong, logical, domain theory [33]. Methods currently used [33, 34] effectively take the results of a deduction, a proof tree, remove problem specific components from the tree (the interesting question is, which ones?) and then collapse the remainder of the tree to a single rule using a technique such as partial evaluation. This is a simple knowledge compilation method that produces logically redundant rules to allow similar problems to be solved more efficiently. This pre-empts the need for methods such as analogy to be used when solving these similar problems.
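As one very small illustration of the kind of operation involved, the sketch below performs a single unfolding (partial evaluation) step on a clause against a program given as a list of Head-Body clauses, reusing clause_for/2 from the sketch given after Fig. 2. It is only meant to suggest how a proof can be collapsed into a rule; it is not the method of [33] or [34], and deciding which components of a proof are problem specific is exactly the question left open above.

    % unfold(+P, +Head-Body, -Head-NewBody): replace one atom of the body by
    % the body of a program clause whose head it unifies with (one resolution
    % step).  Iterating this over a proof, then generalizing, yields a
    % logically redundant rule of the kind discussed above.
    unfold(P, Head-Body, Head-NewBody) :-
        append(Before, [A|After], Body),   % choose an atom of the body
        clause_for(A-ClauseBody, P),       % resolve it against a clause of P
        append(Before, ClauseBody, Front),
        append(Front, After, NewBody).

For instance, with the Philosophers list used earlier, unfold(Phil, intellectual(X)-[mortal(X)], R) binds R to intellectual(X)-[man(X)].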
The kind of "generalization step" used may not always be directly applicable in more extensive contexts. For instance, with a logic incorporating a time or change component, such as a planning logic, goal regression may also be required to generalize an "explanation"/proof.
7. Most Specific Generalizations

A clause C is a most specific generalization w.r.t. logic program P of clauses D_1 and D_2 if C is constructed from (predicate, constant, and function) symbols occurring in P, D_1 and D_2, C is a common generalization of D_1 and D_2, and for any other common generalization C', C' ≥_P C. Although the clause C is not unique, its equivalence class under =_P is. Now, it is a simple matter to devise a logical formula that is a most specific common generalization: take the disjunction of the two clauses D_1 and D_2. But finding a clause is a more difficult problem. The concept is important for induction because, assuming that known clauses D_1 and D_2 are specializations w.r.t. P of some unknown clause true in the intended interpretation, the most specific generalization w.r.t. P of D_1 and D_2 must also be true. The assumption is similar to the justifiability assumption. Vere [19] gives illustrative examples where the current knowledge is represented by a set of facts.

The following theorem, in conjunction with Plotkin's least generalization algorithm [11, Theorem 3], suggests a method to find a most specific generalization if one exists. Plotkin's algorithm is used to find the least generalization (most specific generalization w.r.t. ∅, under θ-subsumption) of two clauses.

Theorem 7.1 (Finding a most specific generalization of two clauses). Let C and D be two clauses containing disjoint variables and P be a logic program. Let θ_1 be a substitution grounding the variables occurring in C_head to new constants, θ_2 be a substitution likewise grounding the remaining variables occurring in C, and φ_1 and φ_2 be likewise for D. If a most specific generalization of C and D w.r.t. P exists then it is equivalent w.r.t. P to a least generalization of Cθ_1 ∪ {¬A_1, ..., ¬A_n} and Dφ_1 ∪ {¬B_1, ..., ¬B_m}, where for 1 ≤ i ≤ n, P ∧ C_body θ_1 θ_2 ⊢ A_i, and A_i is a ground atom constructed from symbols occurring in P, C, θ_1, θ_2, and D. Likewise for each B_j.

Proof. Let msg(C', D') denote a representative most specific generalization of clauses C' and D' w.r.t. program P. Likewise, let lg(C', D') denote a representative least generalization. Let msg be any most specific generalization of C and D w.r.t. P containing no variables in common with C, D, or P. First, goals G and H shall be constructed such that

    msg =_P lg(Cθ_1 ∪ G, Dφ_1 ∪ H) .

Let τ be the unique substitution affecting only variables in (msg)_head such that (msg)_head τ = C_head. This exists because msg is a generalization of C. Let
C' = C ∪ (msg)τσ for any substitution σ such that (1) σ affects variables in the body of (msg)τ only, (2) (msg)τσ ≥_P C, and (3) variables common to C and (msg)τσ must occur in the head of C. C' ≤_∅ msg by the construction of C', and C' =_P C by conditions (2) and (3). Construct D' correspondingly. Clearly, lg(C', D') ≤_∅ msg by definition of lg because both C' and D' are ≤_∅ msg, so
lg(C', D') ≤_P msg. In addition,

    lg(C', D') ≥_P msg(C', D')

because any generalization w.r.t. ∅ is also a generalization w.r.t. P. But C' =_P C and D' =_P D, so

    lg(C', D') ≥_P msg .

Consequently,

    msg =_P lg(C', D') =_P lg(C'θ_1, D'φ_1) =_P lg(Cθ_1 ∪ G, Dφ_1 ∪ H) ,

where G represents the goal part of (msg)τσθ_1 and likewise for H. If a particular msg and substitution σ can be chosen such that the atoms in G satisfy the conditions on the A_i given in the theorem, and likewise for the corresponding substitution for H, we are done. The argument that the conditions on H are satisfied is similar to the corresponding argument for G, so is not given below.

Now (msg)τ ≥_P C and the heads of these two clauses are identical, so by Theorem 4.2(2),

    P ∧ C_body θ_1 θ_2 ⊢ ∃_V (msg)_body τθ_1 ,

where V is the set of variables occurring only in the body of msg. Using the usual refutation process, a substitution σ can be constructed such that σ assigns variables in V to ground terms composed of the symbols in P ∧ C_body θ_1 θ_2 or (msg)_body τθ_1, and

    P ∧ C_body θ_1 θ_2 ⊢ (msg)_body τθ_1 σ .
Function and constant symbols in msg all occur in P, C, and D by definition. So the corresponding τ must also have function and constant symbols occurring only in P, C, and D. Consequently, (msg)_body τθ_1 σ is composed of function and constant symbols occurring only in P, C, θ_1, θ_2, and D. Finally, note that the atoms in G are exactly the atoms in (msg)_body τθ_1 σ, and so satisfy the necessary conditions. □

Two corollaries indicate special cases when most specific generalizations are guaranteed to exist. The corollaries treat situations such as the logic program P being either a set of facts or a DATALOG program.

Corollary 7.2. If both clauses C and D are unit clauses and only a finite number of ground atoms (constructed from symbols in P, C and D) are a logical consequence of P, then a most specific generalization of C and D w.r.t. P exists.

Proof. In the construction in Theorem 7.1, note that for the situation given in the corollary both C_body and D_body are empty, so the sets {A_1, ..., A_n} and {B_1, ..., B_m} must contain ground atomic consequences of P. Consider the case where both these sets equal the entire set of ground atomic consequences of P. This set is finite, so a least generalization, lg, can be constructed as given in the theorem. Now, for any other common generalization, cg, of C and D w.r.t. P, there exists another common generalization, cg', such that cg' =_P cg and cg' can be constructed as outlined in Theorem 7.1 but using other sets of atoms {A'_1, ..., A'_n} and {B'_1, ..., B'_m}. A proof of this last statement is similar to the proof of Theorem 7.1. Because the sets {A'_1, ..., A'_n} and {B'_1, ..., B'_m} can contain nothing more than ground atomic consequences of P, lg ≤_∅ cg'. Because cg was picked arbitrarily, lg is most specific. □
Corollary 7.3. If P, C, and D contain no function symbols, then a most specific generalization of C and D w.r.t. P exists.

A direct implementation of Theorem 7.1 as it stands is impractical for all but the simplest cases because it essentially involves the deduction of all ground facts logically implied by the logic program P. Furthermore, the resultant most specific generalization is to be reduced w.r.t. the logic program, and this operation is NP-complete for even the simplest case when P = { }. Plotkin, in his thesis, suggests that the processes of constructing a least generalization and of performing its reduction should be interwoven. Buntine [36] discusses this in more detail. Theorem 7.1 is, however, of theoretical significance because it provides a precise characterization of the technique suggested by Plotkin for using his least generalization algorithm, extended to allow the generalization of pairs of clauses, not just facts, and to allow the use of current knowledge expressed as definite clauses.
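Theorem 7.1 reduces the problem to Plotkin's least generalization under θ-subsumption. For two atoms this is ordinary anti-unification, sketched below in Prolog; extending it to whole clauses, enumerating the atoms A_i and B_j, and reducing the result w.r.t. P, as the theorem requires, are all omitted. The representation and the predicate names are my own, not Plotkin's.

    % lgg(+T1, +T2, -G): least general generalization (anti-unification) of two
    % terms or atoms.  Each distinct pair of disagreeing subterms is mapped,
    % consistently, to one fresh variable.
    lgg(T1, T2, G) :- lgg(T1, T2, G, [], _).

    lgg(T1, T2, G, S, S) :-
        seen(T1, T2, G, S), !.                       % reuse the variable already
                                                     % chosen for this pair
    lgg(T1, T2, G, S0, S) :-
        nonvar(T1), nonvar(T2),
        T1 =.. [F|Args1], T2 =.. [F|Args2],
        length(Args1, N), length(Args2, N), !,       % same functor and arity
        lgg_args(Args1, Args2, Args, S0, S),
        G =.. [F|Args].
    lgg(T1, T2, G, S, [map(T1, T2, G)|S]).           % disagreement: fresh variable G

    lgg_args([], [], [], S, S).
    lgg_args([A|As], [B|Bs], [G|Gs], S0, S) :-
        lgg(A, B, G, S0, S1),
        lgg_args(As, Bs, Gs, S1, S).

    seen(T1, T2, G, [map(X, Y, G)|_]) :- T1 == X, T2 == Y, !.
    seen(T1, T2, G, [_|S]) :- seen(T1, T2, G, S).

For instance, lgg(h(a), h(b), G) binds G to h(X) for some fresh variable X; the difficulties illustrated by the two examples that follow lie not in this step but in the sets of body atoms that Theorem 7.1 adds before it is applied.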
Two simple examples below illustrate some problems that may be encountered in practice. In the first example, suggested by Tim Niblett, an infinite number of facts can be generated that contribute to an "infinite" most specific generalization. The simplicity of this example suggests that most specific generalizations will commonly not exist. A most specific generalization of h(a) and h(b) w.r.t. P, given below, does not exist.

    g(f(a)) ,
    g(a, X) ,
    g(b, X) .
A proof is by contradiction. Suppose a most specific generalization does exist. Consider applying the construction given in Theorem 7.1 with C ≡ h(a), D ≡ h(b), the grounding substitutions θ_1, θ_2, φ_1 and φ_2 empty, and the two sets of ground atoms, {A_1, ..., A_n} and {B_1, ..., B_m}, both equal to H_n, where H_n is the set of all ground atomic consequences of P with term depth ≤ n. H_n is given by

    {g(f(a))} ∪ ⋃_{i=0}^{n} G_i ,

where

    G_i ≡ {g(a, f^i(a)), g(b, f^i(a)), g(b, f^i(b)), g(a, f^i(b))} .
The theorem says to construct the least generalization of

    h(a) ← g(f(a)), g(a, a), g(a, f(a)), ..., g(a, f^n(a)), g(b, a), g(b, f(a)), ..., g(b, f^n(a)), g(b, b), g(b, f(b)), ..., g(b, f^n(b)), g(a, b), g(a, f(b)), ..., g(a, f^n(b)) .

    h(b) ← g(f(a)), g(a, a), g(a, f(a)), ..., g(a, f^n(a)), g(b, a), g(b, f(a)), ..., g(b, f^n(a)), g(b, b), g(b, f(b)), ..., g(b, f^n(b)), g(a, b), g(a, f(b)), ..., g(a, f^n(b)) .

This eventually reduces w.r.t. P to cg_n, given by
    h(X) ← g(X, X), g(X, f(X)), ..., g(X, f^n(X)), g(X, a), g(X, f(a)), ..., g(X, f^n(a)), g(X, b), g(X, f(b)), ..., g(X, f^n(b)) .

By Theorem 7.1, the most specific generalization must be equal to cg_n for some n. But then a contradiction exists, as cg_{n+1} is also a common generalization, but it is more specific than cg_n (because both cg_n and cg_{n+1} are known to be reduced).
In the second example, a most specific generalization does exist but is impractical as an immediate induction hypothesis. Consider a most specific generalization of C ≡ member(4, [3, 4]) and D ≡ member(2, [5, 1, 2]) w.r.t. P, given below.
    member(X, X.Y) ,
    member(2, 1.2.[ ]) .

Using a theorem and algorithm developed by Buntine [36], a most specific generalization exists. One has been computed to be 43 atoms in length, but it is unknown at present whether this clause can be reduced further. Had a most specific generalization been developed using a naive application of Theorem 7.1, the intermediate clause constructed by the least generalization algorithm would have been around 2000 atoms long before application of the reduction algorithm.

One method of overcoming the problems of infinite or lengthy most specific generalizations is to incorporate relevance notions [35, 36] as follows. In an induction problem, a ground atom B is irrelevant to a ground atom A if, for the intended hypothesis H, B does not occur in any proof of A in H. The definition of a most specific generalization can then be extended to incorporate relevance. A maximally specific plausible generalization of two clauses is a common generalization of the two clauses satisfying available information on relevance that is not more general than any other common generalization satisfying available information on relevance. For instance, in the second example above, we could say that for N an integer and L a list of integers, member(U, V) is irrelevant to member(N, L) if one of the following holds: U is not an integer, V is not a list of integers, or the length of V is greater than or equal to the length of L. This information about relevance, when incorporated into the example, yields a maximally specific plausible generalization of
    member(X, U.V.W) ← member(X, V.W) .

8. Conclusions

An integral part of the induction process is search through a space of rules. This paper introduces a model of generalization, incorporating theory and algorithms, on which suitable search spaces can be built. This supersedes θ-subsumption as such a model of generalization for definite clauses. Several results have been presented: a characterization of the potential effects of a system's current knowledge on inductive search spaces, methods to detect redundant clauses in logic programs and redundant atoms within clauses, a theoretical characterization of Plotkin's induction technique of finding a most specific generalization, and a significant improvement over an existing tool, θ-subsumption. For the reasons outlined in Section 5, the model also suggests an improvement
over Shapiro's most general refinement operator [8] for enumerating a specialization hierarchy. Of course, to achieve induction in practice, a means of heuristically searching hierarchies needs to be developed. This is currently being investigated. More broadly, however, this work provides a case study of the interaction between different facets of knowledge: generalization, induction, redundancy, relevance, and structure, all key factors in the maintenance of knowledge bases. Finally, how can the model of generalization presented be strengthened without losing the computational properties demonstrated here? Maher [31] and Sagiv [27] both contribute in this direction. The kind of current knowledge utilized, presently definite clauses, could be extended, or generalization could be taken relative to some particular domain, that is, a class of interpretations.

ACKNOWLEDGMENT

I would like to thank Ross Quinlan, Jenny Edwards, and Graham Wrightson for their encouragement and support; Donald Michie, John Potter, Michael Maher, the journal referees, and especially Tim Niblett for the suggestions they made concerning the current extension; Paul O'Rorke for making me aware of the many versions of subsumption; and Tim Niblett for introducing me to Plotkin's thesis. This research has been supported by a Commonwealth Postgraduate Research Award from the Australian Government.
REFERENCES
1. Dietterich, T.G., London, B., Clarkson, K. and Dromey, G., Learning and inductive inference, in: P.R. Cohen and E.A. Feigenbaum (Eds.), The Handbook of Artificial Intelligence III (Kaufmann, Los Altos, CA, 1982) 323-512.
2. Angluin, D. and Smith, C.H., Inductive inference: Theories and methods, Comput. Surveys 15 (3) (1983) 237-269.
3. Hart, A., The role of induction in knowledge elicitation, Expert Syst. 2 (1985) 24-28.
4. Quinlan, J.R., Compton, P.J., Horn, K.A. and Lazarus, L., Inductive knowledge acquisition: A case study, in: J.R. Quinlan (Ed.), Applications of Expert Systems (Addison-Wesley, London, 1987).
5. Kitakami, H., Kunifuji, S., Miyachi, T. and Furukawa, K., A methodology for implementation of a knowledge acquisition system, in: Proceedings IEEE International Symposium on Logic Programming, Atlantic City, NJ (1984) 131-142.
6. Michie, D., Current developments in expert systems, in: J.R. Quinlan (Ed.), Applications of Expert Systems (Addison-Wesley, London, 1987).
7. Quinlan, J.R., Induction of decision trees, Mach. Learning 1 (1986) 81-106.
8. Shapiro, E.Y., Inductive inference of theories from facts, Tech. Rept. 192, Department of Computer Science, Yale University, New Haven, CT (1981).
9. Shapiro, E.Y., Algorithmic Program Debugging (MIT Press, Cambridge, MA, 1983).
10. Mitchell, T.M., Generalization as search, Artificial Intelligence 18 (1982) 203-226.
11. Plotkin, G.D., A note on inductive generalisation, in: B. Meltzer and D. Michie (Eds.), Machine Intelligence 5 (Elsevier North-Holland, New York, 1970) 153-163.
12. Plotkin, G.D., A further note on inductive generalisation, in: B. Meltzer and D. Michie (Eds.), Machine Intelligence 6 (Elsevier North-Holland, New York, 1971) 101-124.
13. Mitchell, T.M., Utgoff, P.E., Nudel, B. and Banerji, R., Learning problem solving heuristics through practice, in: Proceedings IJCAI-81, Vancouver, BC (1981) 127-134.
14. Michalski, R., A theory and methodology of inductive learning, Artificial Intelligence 20 (1983) 111-161.
15. Brachman, R.J. and Levesque, H.J., The tractability of subsumption in frame-based description languages, in: Proceedings AAAI-84, Austin, TX (1984) 34-37.
16. Robinson, J.A., A machine-oriented logic based on the resolution principle, J. ACM 12 (1) (1965) 23-41.
17. Siekmann, J. and Szabo, P., Universal unification and a classification of equational theories, in: Proceedings Conference on Automated Deduction, Lecture Notes in Computer Science 87 (Springer, New York, 1982) 369-389.
18. Sammut, C.A. and Banerji, R.B., Hierarchical memories: An aid to concept learning, in: R.S. Michalski, J. Carbonell and T.M. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach II (Morgan Kaufmann, Los Altos, CA, 1986).
19. Vere, S.A., Induction of relational productions in the presence of background information, in: Proceedings IJCAI-77, Cambridge, MA (1977) 349-355.
20. Fu, L.-M. and Buchanan, B.G., Learning intermediate concepts in constructing a hierarchical knowledge base, in: Proceedings IJCAI-85, Los Angeles, CA (1985) 659-666.
21. Plotkin, G.D., Automatic methods of inductive inference, Ph.D. Thesis, University of Edinburgh, Scotland (1971).
22. Sammut, C.A., Concept development for expert system knowledge bases, Aust. Comput. J. 17 (1) (1985) 49-55.
23. Lloyd, J.W., Foundations of Logic Programming (Springer, New York, 1984).
24. Loveland, D.W., Automated Theorem Proving: A Logical Basis (North-Holland, Amsterdam, 1978).
25. Doyle, J., A truth maintenance system, Artificial Intelligence 12 (1979) 231-272.
26. Gallaire, H. and Minker, J., Logic and Databases (Plenum, New York, 1978).
27. Sagiv, Y., Optimizing datalog programs, in: Proceedings Foundations of Deductive Databases and Logic Programming Workshop, Washington, DC (1986) 136-162; also Rept. STAN-CS-86-1132, Department of Computer Science, Stanford University, CA (1986).
28. Banerji, R.B., Changing language while learning recursive descriptions from examples, in: T.M. Mitchell, J.G. Carbonell and R.S. Michalski (Eds.), Machine Learning: A Guide to Current Research (Kluwer Academic, Boston, MA, 1986) 5-9.
29. Bowen, K.A. and Kowalski, R.A., Amalgamating language and meta-language in logic programming, in: K.L. Clark and S.-A. Tärnlund (Eds.), Logic Programming (Academic Press, New York, 1982) 153-172.
30. Garey, M.R. and Johnson, D.S., Computers and Intractability (Freeman, San Francisco, CA, 1979).
31. Maher, M.J., Equivalences of logic programs, in: Proceedings Third International Conference on Logic Programming, Lecture Notes in Computer Science 225 (Springer, New York, 1986) 410-424.
32. Bledsoe, W.W., Some thoughts on proof discovery, in: Proceedings IEEE Symposium on Logic Programming, Salt Lake City, UT (1986) 2-10.
33. Mitchell, T.M., Keller, R.M. and Kedar-Cabelli, S.T., Explanation-based generalization: A unifying view, Mach. Learning 1 (1) (1986).
34. DeJong, G. and Mooney, R., Explanation-based learning: An alternative view, Mach. Learning 1 (2) (1986) 145-176.
35. Buntine, W.L., Induction of Horn clauses: Methods and the plausible generalization algorithm, Int. J. Man-Mach. Stud. 26 (1987) 499-519; revised version of a paper presented at Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Alta. (1986).
36. Buntine, W.L., A most specific generalisation algorithm for Horn clauses, Unpublished manuscript (1988).
Received November 1986; revised version received January 1988