Int. J. Man-Machine Studies (1992) 36, 617-637
Learning expert systems by being corrected

ROSS PETER CLEMENT

Man-Machine Interaction Laboratory, Department of Systems and Information Engineering, Toyohashi University of Technology, Japan. Currently with: School of Computer Studies, University of Leeds, Leeds LS2 9JT, UK

(Received 27 July 1990 and accepted in revised form 3 March 1991)

This paper describes a new method of knowledge acquisition for expert systems. A program, KABCO, interacts with a domain expert and learns how to make examples of a concept. This is done by displaying examples based upon KABCO's partial knowledge of the domain and accepting corrections from the expert. When the expert judges that KABCO has learnt the domain completely, a large number of examples are generated and given to a standard machine learning program that learns the actual expert system rules. KABCO vastly eases the task of constructing an expert system using machine learning programs because it allows expert system rule bases to be learnt from a mixture of general (rules) and specific (examples) information. At present KABCO can only be used for classification domains, but work is proceeding to extend it to be useful for other domains. KABCO learns disjunctive concepts (represented by frames) by modifying an internal knowledge base to remain consistent with all the corrections that have been entered by the expert. KABCO's incremental learning uses the deductive processes of modification, exclusion, subsumption and generalization. The present implementation is primitive, especially the user interface, but work is proceeding to make KABCO a much more advanced knowledge engineering tool.
1. Introduction

Probably the major problem in artificial intelligence today is the well-known "Knowledge Acquisition Bottleneck" (Feigenbaum, 1981). Attempts to ease this bottleneck by automating knowledge acquisition have spawned a great variety of approaches (e.g. Gaines & Boose, 1988; Michie, 1982) and an even greater number of implemented systems (e.g. Boose & Bradshaw, 1987; Quinlan et al., 1986; Clement, 1991). The difficulties involved in knowledge acquisition (KA) are well known, and good descriptions of these problems can be found in Kodratoff, Manago and Blythe (1987) and Clancey (1986). Knowledge integration (KI) is the task of adding extra knowledge to a pre-existing knowledge base and is also a significant bottleneck (Gaines, 1989). Machine learning (ML) programs that learn from examples (hereafter referred to as "training instances" to avoid confusion) and generate rules have been successfully applied as automatic KA interfaces for expert systems (Quinlan et al., 1986; Kodratoff, Manago & Blythe, 1987; Michalski & Chilausky, 1980). This approach has been found to be suitable for a number of classification domains, especially diagnosis of illnesses, human or otherwise (see any of the above papers). The KA tool presented in this paper is presently applicable to similar domains to those of the above ML programs.
The research in this paper grew out of research into ML programs for knowledge acquisition (Clement, 1991). Advantages of the ML approach are that specific training instances are easier to create than rules (which require the knowledge engineer and expert to consider complex interrelations in the rule base) and the ease of KI: the final expert system (ES) can be modified by adding new training instances to the original training set and regenerating the rule base from scratch. A disadvantage of the ML approach is the sheer number of training instances that must be entered into the computer. In some cases these training instances already exist (Quinlan et al., 1986), but in most cases (e.g. Kodratoff, Manago & Blythe, 1987; Brew & Catlett, 1986) a form listing the attributes and possible conclusions is filled out by hand, then entered into the computer. The task of filling out thousands of forms (even if these forms are computer-readable) is very taxing for the expert. Although ML techniques were originally intended for generating very large ES rule bases (Michie, 1982), most systems actually realized have been very small (a notable exception is a 2500-rule ES developed by British Petroleum; see Expert System User, August 1986, pp. 16-19). Another disadvantage of the ML approach is that ML programs are not easily incorporated into KA systems that combine a number of KA tools (e.g. Boose & Bradshaw, 1987). The user interface of ML programs (if there is one at all) is so different from that of other KA tools that a semi-computer-literate expert is unlikely to abandon a familiar KA interface to use an ML program. This paper presents an implemented system, KABCO (Knowledge Acquisition by Being Corrected), that interacts with the domain expert and learns to make the training set itself. As explained in the remainder of this paper, KABCO starts with minimal knowledge about the domain (the list of classes, attributes and values) and starts generating examples. The expert corrects errors in these examples and KABCO uses these corrections to update its knowledge base. This process continues until the expert is satisfied that KABCO has learnt the domain.
FIGURE 1. The fruit-storing problem.
KABCO then generates a large set of training instances which are given to an ML program, and a rule base is generated. KABCO is an incremental learning system, and hides the learning-from-examples technology so well that during experimentation some users were completely unaware of how the rules were generated, but still found KABCO easy to use. It is claimed here that KABCO makes a pre-existing ML program (used to produce an ES rule base or other data structures such as neural networks) much easier to use, and also that it significantly eases the KI problem. It is also claimed that KABCO allows ML programs to be used with an interface similar to that of other KA tools, and so to be incorporated into larger, multi-approach systems. KABCO's application, creating training sets, is much easier than engineering expert system rule bases directly because much of the work in building rules is left to the ML program. KABCO is explained using a very simple domain: sorting fruits that come down a conveyor belt and are placed in a box by a human worker (Figure 1).

2. Learning from examples

ML programs that learn from training instances, e.g.:

lemon (colour = yellow
       shape = oval
       stem = none
       surface = dimpled
       size = 8 cm )

are useful as automatic KA tools, as mentioned above. As the aim of this research is to design KA tools that improve on ML programs, it is necessary to examine why it is easier to generate a set of training instances than to write the rules by hand (Kodratoff, Manago and Blythe (1987) describe the KA process when an ML program is used). One reason is that training instances are specific, localized pieces of knowledge, in other words characteristic information (Michalski et al., 1983). Figure 2a shows how creating (characteristic) training instances for apple, orange and strawberry does not require any inter-class relations to be considered, while these relations are vital in the case of discriminant rules (Figure 2b). An expert can easily be given forms to fill out which describe training instances; it can be imagined what would happen if a non-computer-literate expert were given a standard "rule" form and asked to write rules for the final ES. Training instances include more attributes than rules, and large numbers are necessary, but individually they are logically simple. The ML program can be defined as a learning system that takes characteristic information as input and generates discriminant information as output. Modern ML programs are able to perform effective search and generalization with no background knowledge for many domains (Clement, 1991). Another major advantage of using an ML program to generate a rule base is that the actual optimality of the training set is far less crucial than the optimality of hand-written rules.
FIGURE 2. Logical simplicity of a training set versus a rule base. (a) The independence of training instances. (b) The interdependence of rules.
For example, if the expert is unsure which attributes to use, then many attributes can be included and the ML program chooses which ones to use. KI is also much simpler, as the ES can be extended by adding new training instances to the original training set and regenerating the rule base. KI is closely related to KA: one way of creating an ES to classify eight fruits is to create an ES that classifies two fruits and extend it fruit by fruit. In practice, this is the way that KABCO works.

3. Learning from general and specific information
The research that led to the creation of KABCO was an attempt to reduce the number of training instances necessary for an ML program by allowing the training set to contain both general and specific information. As an example, consider:

apple (colour = red or green or yellow
       shape = round
       stem = none
       surface = smooth
       size = 6 . . 12 cm )
Because some attributes have general values, this training instance describes a number of possible apples; it is in effect a characteristic rule (Michalski et al., 1983). Allowing the expert to write these general rules should make the task of generating the training set much easier. An immediately obvious extension is to allow various attributes to be ignored completely:
kiwifruit (colour = brown
           shape = oval
           stem = *
           surface = *
           size = * )
where the expert has effectively entered a discriminant rule into the training set: this one training instance represents all possible kiwifruits. A mixture of training instances, characteristic rules and discriminant rules (which are what we want the program to generate) makes up the training set, allowing the expert much more freedom than the traditional, training-instance-only, approach. This is similar to the way that human beings are taught, and is defined here as learning from general characteristic information (since the entries are still training instances). While this extended definition of a training instance makes creating the training set much easier, it does not solve the problem of ensuring that the training set is adequate. As mentioned in Kodratoff, Manago and Blythe (1987) and Feigenbaum (1977), it is necessary for the expert to ensure that rare and extreme training instances are included in the training set. In the fruit example, it is necessary for there to be training instances of yellow apples and melons to prevent the ML program (here MAIN; Clement, 1991) generating the rule:

if colour = yellow then class := lemon

It is very difficult to design ML algorithms that can learn from training sets containing general information. The problem is that it is impossible to decide whether a generated rule covers a general instance. For example, does the rule:

if shape = oval and surface = furry then class := kiwifruit

cover the above kiwifruit instance or not? The author attempted to develop learning theories that handle this while still acting in an intuitive manner, but was unsuccessful both in developing a theory and in proving the task impossible. Another approach is to convert the general instances into a number of specific instances. For example, the previous apple training instance can be preprocessed (as in Figure 3) and converted into a hundred or so red, green and yellow apples of various sizes (a similar concept to preprocessors that fill in unknown attribute values, as in Quinlan et al., 1986).
FIGURE 3. Preprocessing a mixed-information training set.
The advantage of a preprocessor is that unmodified ML programs can be used; the program that became KABCO started out as an attempt to write such a front end. A problem is deciding what values to assign to attributes that have been dropped completely (such as size, surface and stem in the above kiwifruit training instance). It is impossible to design a strategy that is correct in all cases (in the example there are no clues as to what values are correct without resorting to background information). Another possibility is generating training instances which are approved one-by-one by the expert. An improvement is for the expert to type corrections to each training instance rather than just approving or disapproving it. The expert originally gives a partial training set which is completed by interaction between the expert and the preprocessor. Since the program must handle all realistic situations, it must be able to start with an empty training set and "complete" this. This final situation is the definition of "learning by being corrected".
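To make the preprocessing idea concrete, here is a minimal sketch of the conversion from general to specific instances. Python is used purely for illustration (KABCO itself is written in C), and the dictionary layout, names and values are assumptions of this sketch, not KABCO's actual data structures: a general training instance is expanded into the Cartesian product of its slot values.

from itertools import product

# A general training instance: each slot lists its allowed values.
general_apple = {
    "colour": ["red", "green", "yellow"],
    "shape": ["round"],
    "stem": ["none"],
    "surface": ["smooth"],
    "size": list(range(6, 13)),          # 6 . . 12 cm
}

def expand(instance):
    """Yield every specific training instance covered by a general one."""
    slots = list(instance)
    for combination in product(*(instance[slot] for slot in slots)):
        yield dict(zip(slots, combination))

specific_apples = list(expand(general_apple))
# 3 colours x 7 sizes = 21 specific apple instances in this toy case

Slots such as the kiwifruit's "size = *" have no such finite expansion, which is exactly the difficulty discussed above.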
4. KABCO: learning by being corrected
KABCO starts out with a knowledge base containing only a description of the classes present and the attribute vector (Quinlan, 1986). In the fruits example:

classes (melon kiwifruit orange strawberry peach pear apple lemon)

template (colour = (brown green orange pink red yellow)
          shape = (round oval uneven)
          stem = (leafy stout none thin)
          surface = (smooth furry dimpled thin)
          size = integer )

Five points must be emphasized at this point but are discussed later in this paper:

(i) This is a very simple example, and it is only being used to explain the concept of learning by being corrected.
(ii) For realistic problems the creation of this template is a very difficult task, and forms a sizeable part of the knowledge acquisition bottleneck.
(iii) The present implementation of KABCO has an extremely primitive user interface. The original knowledge base file (template and classes) must be created manually using a conventional editor. Displayed training instances, error messages and typed commands are exactly as presented in this paper.
(iv) The expert is entering general information that resembles the target rules, raising the question of whether KABCO is actually doing anything at all. KABCO is a system that learns from general characteristic information and produces specific characteristic information (training instances); the ML program then converts this into general discriminant information (rules). Learning from characteristic information combines the conciseness of hand-written rules with (most of) the logical simplicity of training instances.
(v) The expert continues tutoring KABCO until KABCO's behaviour matches the expert's own conception of the domain.

4.1. INTERNAL KNOWLEDGE REPRESENTATION

After reading in the template, KABCO creates a single frame (Minsky, 1975) for each class. This creates a knowledge base stating that any combination of values is a valid instance for any class. This over-general knowledge base is incrementally specialized until it is judged adequate (although generalization is also possible, as described later). The initial frame for oranges is:
orange (colour = brown or green or orange or pink or red or yellow
        shape = round or oval or uneven
        stem = leafy or stout or none or thin
        surface = smooth or furry or dimpled or thin
        size = -32768 . . 32767 )

and there are similar frames for the other seven classes. A training instance is generated from a frame by choosing one value from a symbolic slot, or a value at random between the high and low bounds for an integer or real slot. For example, a training instance generated from the above orange frame is:

orange (colour = green
        shape = oval
        stem = stout
        surface = dimpled
        size = 28312 )

Because KABCO starts with almost no knowledge of the domain, the first training instances are almost invariably silly (this orange is nearly 300 metres wide).
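As an illustration of instance generation, consider the following sketch. It is written in Python rather than KABCO's implementation language (C), and the frame representation is an assumption made for the sketch: symbolic slots hold lists of values, integer slots hold (low, high) bounds.

import random

orange_frame = {
    "colour": ["brown", "green", "orange", "pink", "red", "yellow"],
    "shape": ["round", "oval", "uneven"],
    "stem": ["leafy", "stout", "none", "thin"],
    "surface": ["smooth", "furry", "dimpled", "thin"],
    "size": (-32768, 32767),             # initial, maximally general bounds
}

def generate_instance(frame):
    """Choose one value per slot: a random member of a symbolic slot,
    or a random integer between the bounds of a numeric slot."""
    instance = {}
    for slot, allowed in frame.items():
        if isinstance(allowed, tuple):   # numeric slot
            instance[slot] = random.randint(*allowed)
        else:                            # symbolic slot
            instance[slot] = random.choice(allowed)
    return instance

print("orange", generate_instance(orange_frame))
# e.g. orange {'colour': 'green', ..., 'size': 28312}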
4.2. MODIFICATION

The expert corrects some, all or none of the errors in the training instance. Errors left uncorrected remain as gaps in KABCO's knowledge base and are corrected when they appear again later. As an example, the expert corrects the size error by typing:

size = 2 . . 60;
which is interpreted by KABCO as:

for all classes: 2 <= size <= 60
and KABCO then updates its knowledge base to be consistent with this fact. This is referred to as a modification because all frames in KABCO's knowledge base are modified to be consistent with the new fact. Formally, each frame FR is modified to be consistent with the fact (correction) F, giving FR & F. Even though this correction is entered in response to the previous training instance, KABCO makes no connection between the two: to restrict the effects of the correction to one frame would prolong the learning process unnecessarily. The modified orange frame is:

orange (colour = brown or green or orange or pink or red or yellow
        shape = round or oval or uneven
        stem = leafy or stout or none or thin
        surface = smooth or furry or dimpled or thin
        size = 2 . . 60 )
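A sketch of the mechanics of a modification follows, using the same assumed Python representation as the earlier sketches (the real C implementation certainly differs). The correction is intersected with the bounds of every frame in the knowledge base, and any frame left with an empty range is a contradiction and is deleted.

def modify_size(frame, low, high):
    """Return the frame restricted so that low <= size <= high,
    or None if the restriction is a contradiction (FR & F = false)."""
    old_low, old_high = frame["size"]
    new_low, new_high = max(old_low, low), min(old_high, high)
    if new_low > new_high:
        return None
    restricted = dict(frame)
    restricted["size"] = (new_low, new_high)
    return restricted

knowledge_base = [                       # toy frames, one per class
    {"class": "orange", "size": (-32768, 32767)},
    {"class": "melon", "size": (-32768, 32767)},
]

# The expert types: size = 2 . . 60;  -- applied to *all* frames:
knowledge_base = [f for f in (modify_size(fr, 2, 60)
                              for fr in knowledge_base) if f is not None]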
A simple statement such as the size correction above is useful for properties applicable to all classes, but to enter finer detail more complex statements are necessary.

4.3. EXCLUSION
The next instance output by KABCO is:

peach (colour = brown
       shape = round
       stem = none
       surface = dimpled
       size = 42 )

and the expert enters some information relating surface and colour:

if surface = dimpled then colour = orange or yellow or red;

Because this correction is an "if-then" statement, there are two ways that frames can be modified to remain consistent. One is to modify colour to take just the values orange, yellow or red; the other is to change surface to exclude the value dimpled, so that the antecedent is no longer valid. KABCO maintains its knowledge base using deductive inference (i.e. no possible training instances are excluded from the knowledge base unless specifically ruled out by the expert's corrections). Therefore both frames (the exclusion to exclude the antecedent and the modification to apply the consequences) are generated and retained in KABCO's knowledge base. The orange frame becomes:

orange (colour = orange or red or yellow
        shape = round or oval or uneven
        stem = leafy or stout or none or thin
        surface = smooth or furry or dimpled or thin
        size = 2 . . 60 )
orange (colour = brown or green or orange or pink or red or yellow
        shape = round or oval or uneven
        stem = leafy or stout or none or thin
        surface = smooth or furry or thin
        size = 2 . . 60 )
An exclusion is the same as a modification by "not antecedent" and is implemented in this way. In the example the "if" statement applies to every frame in KABCO's knowledge base, but after learning has progressed further this is unlikely to be true. For example, a melon frame:

melon (colour = green
       shape = round or oval
       stem = stout or none
       surface = smooth or veined
       size = 29 . . 58 )
is already consistent with the correction and is not modified in any way. KABCO's internal representation is similar to the general training instance format presented in Section 3. There is an implicit "or" between frames: KABCO is incrementally learning disjunctive concepts using deductive reasoning (and can be classified within machine learning research as in Michalski et al., 1983). To explain KABCO's learning strategy, let the knowledge base at any time be KB and the new fact typed by the expert be F. The above (and below) processes are simply KABCO altering its knowledge structures to represent KB & F, in a manner that is both spatially and computationally efficient and allows training instances to be easily generated. The modified knowledge base becomes KB1, which is combined with the next correction to generate KB1 & F1, which is optimized to KB2. This process continues until KBn, the final knowledge base (consistent with the expert's view of the domain), is reached. Modifications and exclusions can be explained as above. If a frame FR is to be made consistent with the statement "if A then C" and the expression A is true in FR, then the following frames are generated:

FR & C (the modification)
FR & not A (the exclusion)

either of which can be a set of several frames (if one of A or C is a disjunctive expression) or consist of no frames whatsoever (if the resulting expression is a contradiction). Another statement that the expert can enter is the "iff" (if-and-only-if) statement. All frames where the antecedent applies are handled in the same way as above, while all other frames become exclusions of the form:

FR & not C
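The following sketch shows how an "if A then C" correction might split one frame into the modification FR & C and the exclusion FR & not A. It is a deliberately simplified illustration in Python (symbolic slots only; the representation and helper names are assumptions of this sketch, not KABCO's C implementation):

def apply_if_then(frame, a_slot, a_values, c_slot, c_values):
    """Make one frame consistent with 'if a_slot in a_values then
    c_slot in c_values'; return the list of surviving frames."""
    if not any(v in a_values for v in frame[a_slot]):
        return [frame]                   # antecedent impossible: consistent
    results = []
    modification = dict(frame)           # FR & C: restrict the consequent
    modification[c_slot] = [v for v in frame[c_slot] if v in c_values]
    if modification[c_slot]:             # an emptied slot is a contradiction
        results.append(modification)
    exclusion = dict(frame)              # FR & not A: rule out the antecedent
    exclusion[a_slot] = [v for v in frame[a_slot] if v not in a_values]
    if exclusion[a_slot]:
        results.append(exclusion)
    return results

# if surface = dimpled then colour = orange or yellow or red;
orange = {"colour": ["brown", "green", "orange", "pink", "red", "yellow"],
          "surface": ["smooth", "furry", "dimpled", "thin"]}
frames = apply_if_then(orange, "surface", {"dimpled"},
                       "colour", {"orange", "yellow", "red"})
# frames[0] has colour restricted; frames[1] has surface without "dimpled",
# matching the two orange frames shown above

Note that a frame such as the melon above, whose surface slot cannot take the value dimpled, is returned unchanged.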
4.4. KABCO’S INPUT LANGUAGE
KABCO accepts corrections written with a full complement of <, >, >=, <=, or, and, not, if-then, iff-then. Quantification is universal for normal statements and existential for the special "exists" statement (described later). There is no restriction on which attributes or values are used in different parts of rules. A special "class" attribute, used just like any other attribute, describes the class of the frame. This very general input language was designed to allow the expert to enter information in any form, from general statements (rules, whether characteristic, discriminant or neither) to specific statements (training instances).

4.5. MULTIPLE GENERATION AND DESTRUCTION OF FRAMES

An "if-then" statement can be entered as a simple expression; for example, the above "if-then" can be rewritten as:
not surface = dimpled or colour = red or yellow or orange;

In this case all frames in KABCO's knowledge base are modified, but because the expression is complex, more than one frame is generated (the above orange frame produces the same frames as the previous exclusion and modification, though by a different process). Although the internal state of the knowledge base appears quite different from that after the "if-then" statement, it represents the same knowledge. Because of this, and because the knowledge base becomes incomprehensible to a human, it is never shown to the expert. The expert only evaluates KABCO's understanding of the concept as displayed by the training instances generated. Modification and exclusion sometimes generate new frames, but they also sometimes lead to a decrease in the number of frames when contradictions occur. For example, if the frame (probably generated by exclusions) is:

strawberry (colour = brown or orange
            shape = round or uneven
            stem = leafy
            surface = furry
            size = 2 . . 9 )

then there is no way that this frame can be modified to be consistent with the fact:

if class = strawberry then colour = red;

and the system deletes it. Formally, FR & C = false. The class attribute only holds one value (class name) at a time, and any attempt to modify this to a different class automatically generates a contradiction. For example, to make the above strawberry frame consistent with the fact:

if colour = brown and surface = furry then class = kiwifruit;

the exclusion (only one is generated) is the frame:

strawberry (colour = orange
            shape = round or uneven
            stem = leafy
            surface = furry
            size = 2 . . 9 )
The modification is a contradiction because it involves solving class = strawberry & class = kiwifruit.

4.6. EXISTENTIAL QUANTIFICATION
KABCO starts with a general knowledge base and successively specializes it to maintain consistency with corrections entered by an expert. In a perfect world there would be no need for generalization, but humans sometimes make mistakes. For example, the expert may enter:

if class = apple or strawberry then colour = red or yellow;

forgetting about the existence of green apples, the possibility of which disappears from KABCO's knowledge base. When the lack of green apples is noticed by the expert (how this is easily noticed is described in Section 4.8), s/he can state that:

exists class = apple and colour = green;

The expression part is taken as a fact F and used to update the knowledge base by modifying each frame and retaining both the modified and unmodified frame in the knowledge base, i.e. FR becomes FR or (FR & F). If no new frames are generated (because of contradictions) then a new frame (as in the original knowledge base before learning started) is generated for each class, and the resulting (FRn & F)s are added to the knowledge base (in the example only the new apple frame does not result in a contradiction). Generalization sometimes results in some learning being repeated, which is very frustrating for the expert. How this can be avoided is discussed further in Section 9.
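A sketch of the "exists" mechanics, under the same assumed Python representation as the earlier sketches (the fallback to fresh frames is only indicated by a comment):

def apply_exists(frames, modify):
    """Generalize: each frame FR becomes FR or (FR & F).  'modify'
    applies the fact F to a frame, returning None on contradiction."""
    result = []
    for frame in frames:
        result.append(frame)             # keep the original FR
        modified = modify(frame)
        if modified is not None:
            result.append(modified)      # add FR & F alongside it
    if len(result) == len(frames):
        # Every FR & F was a contradiction: as described above, fresh
        # maximally general frames (one per class) modified by F would
        # be added here instead (omitted from this sketch).
        pass
    return result

# exists class = apple and colour = green;
def fact(frame):
    if frame["class"] != "apple" or "green" not in frame["colour"]:
        return None
    modified = dict(frame)
    modified["colour"] = ["green"]
    return modified

kb = [{"class": "apple", "colour": ["red", "yellow", "green"]},
      {"class": "lemon", "colour": ["yellow"]}]
kb = apply_exists(kb, fact)
# Here the added apple frame is subsumed by the original and would later
# be removed (Section 4.7); "exists" matters precisely when green apples
# have already been ruled out of every frame.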
4.7. SUBSUMPTION

As mentioned before, KABCO does not associate corrections with the training instance shown to the expert, but uses them to update all relevant frames in the knowledge base. The total number of frames can rapidly grow or shrink (a characteristic of KABCO is that the number of frames tends to rise quickly and then fall gradually as learning continues), and there is the danger of redundant frames coexisting: two frames FR1 and FR2 such that FR1 => FR2. New frames generated are checked against presently existing frames and redundant frames (either new or old) are deleted.
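A sketch of the subsumption test, again in Python under the assumed frame representation (symbolic slots as value lists, numeric slots as bounds; frames of the same class are compared):

def subsumes(fr2, fr1):
    """True if fr2 covers every instance fr1 can generate (FR1 => FR2)."""
    for slot, v1 in fr1.items():
        v2 = fr2[slot]
        if isinstance(v1, tuple):                 # numeric: containment
            if not (v2[0] <= v1[0] and v1[1] <= v2[1]):
                return False
        elif not set(v1) <= set(v2):              # symbolic: subset
            return False
    return True

def drop_redundant(frames):
    """Keep only frames that are not subsumed by another kept frame."""
    kept = []
    for frame in frames:
        if any(subsumes(other, frame) for other in kept):
            continue                              # frame is redundant
        kept = [other for other in kept
                if not subsumes(frame, other)] + [frame]
    return kept

a = {"colour": ["red", "green"], "size": (2, 60)}
b = {"colour": ["red"], "size": (2, 9)}
assert drop_redundant([a, b]) == [a]              # b => a, so b is deleted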
4.8. META-CONTROL

Because the expert often wants to concentrate on just a small part of the domain (for example, just apples), there is a command to instruct KABCO to produce only training instances specified by an expression. For example, the statement:

look at class = apple;

instructs KABCO to generate only apple training instances. It is at this point that the expert may notice the aforementioned lack of green apples, and take a closer look by entering:

look at class = apple and colour = green;

only to be told:

there are no class = apple and colour = greens.
The pseudo-value "all" turns this function off, e.g.:

look at all;
The "look" command is a step towards making KABCO more suitable for long and difficult KA tasks, where the very simple KA model presented in this paper is not adequate: it allows the expert to define a subset of the domain and teach only that subset. Other necessary steps are described in Section 9.

4.9. SELECTION OF INSTANCES TO GENERATE
KABCO chooses instances to show to the expert in order to give an accurate and wide-ranging view of the current state of its knowledge base. To do this KABCO needs to find "near misses", the training instances that are most likely to be errors. This is the reverse of the "near miss" selection process described in Winston (1975), where the expert chooses near misses to show to the program. KABCO uses four mechanisms to choose the frames to generate instances from, and how to generate the instances:

(i) RANDOM: the frame and values are chosen randomly.
(ii) EXTREME: extreme high or low values are chosen for numerical slots.
(iii) MINIMUM: each frame, and each value for each slot, is a record (KABCO is implemented in C on an Omron Unix workstation) which counts how often the value has been shown to the expert. This method selects frames and values that have been shown to the expert few times, and have therefore been validated (by acceptance or correction) the least.
(iv) OVERLAP: KABCO selects frames and instances which, according to the present state of its knowledge base, could be either of two classes. For example, KABCO does not know if the instance:

pear or peach (colour = pink
               shape = uneven
               stem = thin
               surface = furry
               size = 12 )

is a pear or a peach. Since both classes are output with the instance, the expert knows in this case what KABCO expects to be taught.

Selection of instances to display is an important part of the knowledge acquisition process and a guiding influence on the expert. Research must be carried out to find out how different training instance generation methods affect the KA process.
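As an illustration of the MINIMUM mechanism, the following Python sketch keeps a display count per (class, slot, value) triple and always shows the least-validated value. The counters mirror the records described above, but the layout is an assumption of the sketch, not KABCO's C data structures:

import random
from collections import defaultdict

shown_count = defaultdict(int)           # (class, slot, value) -> displays

def minimum_instance(class_name, frame):
    """For each symbolic slot pick the value displayed to the expert the
    fewest times; numeric slots fall back to RANDOM here (EXTREME would
    pick the bounds instead)."""
    instance = {}
    for slot, allowed in frame.items():
        if isinstance(allowed, tuple):
            instance[slot] = random.randint(*allowed)
        else:
            value = min(allowed,
                        key=lambda v: shown_count[(class_name, slot, v)])
            shown_count[(class_name, slot, value)] += 1
            instance[slot] = value
    return instance

apple = {"colour": ["red", "green", "yellow"], "size": (6, 12)}
for _ in range(3):
    print("apple", minimum_instance("apple", apple))
# successive calls cycle through the least-displayed colours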
4.10. MODIFICATION OF THE TEMPLATE

Creation of the template (the list of attributes, values and classes) is a major part of the knowledge acquisition process. KABCO reads in a hand-written template provided by the user and therefore appears to make no contribution to solving the problem of creating the template. This is not true, because KABCO has functions to modify the template during the KA process.
The simplest commands show parts or all of the template. This is important because users often forget what values are possible for attributes. That this is necessary as a command is solely because the implementation has such a primitive user interface; it is planned that KABCO will be given a WIMP interface, rendering many current commands obsolete. Next in complexity are commands to add and delete values from attributes in the template. For example, the command:

add long to shape;

is useful if we are going to add banana to the classes of fruits. This command changes the template and automatically performs an:

exists shape = long;

command to update the knowledge base. The reverse command deletes long from the template and automatically performs:

not shape = long;

clearing the value from all frames. Finally, commands to add and delete whole attributes, and new classes, are planned but not yet implemented. These commands are urgently needed: with the current implementation it is a common (and therefore important) operation for the knowledge engineer (the author) to edit KABCO's internal files on disk and perform these modifications by hand, an ad-hoc measure. In the same way that an ES can be built using ML algorithms (Section 2) by starting with a small set of instances and gradually extending it to a full system, it is possible to invoke KABCO with a one-attribute attribute vector and gradually add attributes as they become necessary to remove overlaps (visible to the expert as explained in Section 4.9). Here KABCO aids in the creation of the template, but the expert must decide when to add new attributes and what attributes and values to add. As explained in Section 6, this can be improved.
5. KABCO and machine learning programs
Gaines (1989) lists "a diversity of techniques and tools that overlap in their applications and where it is not clear whether they are competitive alternatives or complementary partners" as a major impediment to the understanding and application of KA tools. Therefore it is necessary to compare KABCO with other KA tools. KABCO is clearly complementary to ML programs such as C4 (Quinlan et al., 1986), RL (Buchanan & Fu, 1985), NEDDIE (Corlett, 1983) and MAIN: KABCO acts as a front end and produces the set of training instances from which rules are generated. KABCO's usual output is compatible only with MAIN but, in response to Gaines' claim that lack of standardization of knowledge representation is an impediment to the wide usage of KA tools, KABCO can generate files matching those of the ML database of the University of California, Irvine. It is hoped that once KABCO has been developed and debugged further, the program itself will be made available from this database.
The instances generated can also be used as input to programs that do not generate expert systems, such as neural-network systems. A current problem is that KABCO does not generate instances and values in the proportions in which they appear in the real world, but in proportions relative to their logical complexity. This is a major drawback in using the program to generate neural networks or rules with certainty factors (Clancey, 1983). A new version of KABCO capable of producing training sets with correct proportions has been designed but not yet implemented (as described in Section 9).
6. KABCO and knowledge editors

In this section the term knowledge editors refers to a large number of programs that interact with the expert directly and write the ES rules themselves, with no intermediate training instance representation (Marcus, 1988; Boose & Bradshaw, 1987; Gaines & Boose, 1988). These programs can be used to generate large knowledge bases in a short time, and can be used by semi-computer-literate experts to generate their own ES. A limitation of these programs is that they are domain-dependent, i.e. they have been designed to solve a set of problems and cannot be used outside this set. ML programs are not domain-dependent, but are limited by simple training instance languages to classification problems, and cannot be used outside these domains. (MAIN has recently been extended to handle problems outside the classical classification domain, and it is planned that KABCO will be modified to handle the same domains.) The user interfaces of ML programs and knowledge editors are so different that it is difficult to imagine the two being implemented within a single system.
FIGURE 4. Integration of multiple approaches into a single KA system.
The user interface to KABCO is more similar to that of the knowledge editors, meaning that it is possible to design a system as in Figure 4, where a selection interface directs the user to a particular system. Knowledge editors that include several different approaches simulate a single, multi-domain knowledge editor (Tijerino et al., 1990).

7. Actual use of KABCO

The following observations were gained by performing simple experiments with KABCO. The fruits domain has been recreated by several people, who then went on to make their own simple knowledge bases. Observations made include the following. The user is often unaware that a training set is generated and given to another program: KABCO forms the interface between the expert and the computer, and the ML program is a knowledge compiler. This gives a feel surprisingly similar to a normal text editor/language compiler combination. One possible criticism of KABCO is that the user constructs the rules (or decision tree) manually rather than entering examples, and that using KABCO is therefore no easier than writing the rules by hand. This is not true, because different classes are usually taught independently, with no consideration of the complex inter-class interactions shown in Figure 2. In the example, a single:

look at class = strawberry;

followed by teaching of the class strawberry is possible with no consideration of interactions between strawberries and other fruits (the main thing considered when writing rules by hand). Characteristic rules combine the compactness of disjunctive rules with the locality of training instances; experimentation shows that this is how experts use KABCO.

7.1. THE MUTANT WORM EXPERIMENT
The main experiment performed with KABCO so far is the creation of a small knowledge base to classify worms as normal or as one of five types of mutant. This rule base was entered directly by an expert with minimal help from the author. The experiment was as follows. The expert is skilled in the field of molecular biology but had never created any type of computer program before. He was asked to choose a domain suitable for encoding as a classifier-system rule base, and chose the mutant worm domain. He was then shown how to operate KABCO by being coached through the fruits example. This was achieved in about two hours, including explanation of how the system worked, creation of the template, and some argument over how to formalize the domain (even for such a simple system). After this, cherries were added to the domain to simulate KI. Cherries were deliberately chosen to be difficult to add to the existing rule set. Once this process was finished, the expert was asked to engineer the mutant worm domain, starting by entering the template using a normal text editor. KABCO was used to acquire the properties of each class and a set of rules was generated. The final expert system was written as a C language program, compiled and run.
This KA process was recorded on videotape and examined later. At many points during the experiment (especially at the end) the expert was asked to express his opinions freely.

7.2. RESULTS FROM THE MUTANT WORM EXPERIMENT
Some results from this experiment:

(i) The expert found the input language of KABCO easy to memorize and quickly learnt how to enter commands with no assistance. The present input language is suitable for a practical tool used by non-computer experts.
(ii) The fruits domain was entered correctly except for one error, which was found by looking at the rules. The error was quickly fixed and a new rule set generated. This process was overseen by the author. A user-friendly controlling module is necessary to guide an unsupervised expert when debugging rules.
(iii) The new class cherry caused MAIN to produce different rules for both apples and strawberries. The expert did not need to examine these classes (or any other classes) when the concept of cherries was taught. This shows that KI using KABCO is simple.
(iv) The mutant worm domain was entered in about 15 minutes. Almost no assistance was necessary, and when the experiment was finished the expert was confident that he could use KABCO to engineer realistically large domains over a longer period of time.
(v) The expert had little difficulty formalizing the domain into a template, but this may be because the experimental results being used as a guide closely matched the template structure.
(vi) It is easy for an unsupervised expert to make mistakes that delete an entire class from KABCO's knowledge base. An especially non-intuitive feature is the way that such mistakes must be repaired. For example:

S1: if class = apple then surface = furry;
S2: exists class = apple and surface = smooth;
S3: if class = apple then surface = smooth;

The expert felt it was intuitive to type in statement S3 to correct for S1, without first entering statement S2. In this case there is an easy solution: KABCO should ask "are you sure?" if an entire class is about to be deleted, and perform an automatic "exists". When only part of the knowledge for a class is about to be deleted, it is harder to distinguish a mistaken from a correct specialization.
7.3. HUMANIZING KABCO
The expert noted that the interface of KABCO was mechanical and made several suggestions for improving the feel. One was that KABCO should ask questions about the KA process to verify the knowledge being input. This would double as an interface for inductive guidance of learning (see Section 9) and as a humanizing influence. An unexpected suggestion made by the expert was that KABCO should be capable of humour to prevent boredom, occasionally making jokes to amuse the
expert. The expert thought that this would be especially interesting if the jokes had some relation to the KA task at hand. It is easy to think of such jokes; for example, KABCO might feign an aversion or partiality to certain classes:

look at class = apple;
Oh no, not again. <ret>
No, I've had enough of apples, you haven't looked at kiwifruit yet. <ret>
Ok, you want apples? Here's ten of them. Feast yer eyes sucker.

A sense of humour could be implemented while leaving the present KA process intact. Previously it was thought that making KABCO less boring to use for lengthy tasks would require basic changes in the KA strategy and very advanced functions, such as using inductive reasoning to guide induction (see Section 9).

8. Limitations of KABCO

KABCO's user interface is very primitive and the KA task may become long and boring for complex domains. The user interface has many annoying features which are of little theoretical but major practical importance (e.g. some corrections typed by the expert are several lines long, and if there is a single typing error then the whole correction must be typed again). A multi-window environment may help: for example, a "show" command prints the template to the expert, but a separate scrolling template window seems more natural. Gaines and Shaw (1986) noted that a very powerful multi-window environment can confuse experts by offering too many functions, but it is believed that KABCO would benefit from at least some improvements in this area. KABCO reduces KA to a very simple framework, but complex domains require hundreds (if not thousands) of instances to be viewed and corrected. KABCO can speed the KA process by choosing representative near misses, and there are many commands to focus learning on specific areas of the domain (such as "look"; "hide", which hides attributes so that learning on a subset of the attribute vector is possible; and "display", which displays hidden attributes), but none of these removes the inhuman feel of the process. As mentioned before, KABCO is designed for use with classification systems only. This is a solvable restriction that must be tackled, especially since the learning-from-examples program MAIN can already handle other types of domains. KABCO is not suitable for probabilistic domains (Quinlan, 1990; Buchanan & Fu, 1985) in its present form. KABCO can be used for such a domain by ignoring the fact that some overlaps in the knowledge base are not resolved. As an experiment, a fruits knowledge base was generated with only two attributes, colour and surface. Green apples and melons cannot be differentiated, but a rule set accurate 96% of the time is generated. KABCO generates training sets where the proportions of different classes have no relation to the proportions in the real world, and the certainty factors attached to the rules are inaccurate. KABCO does not have any higher-level control strategies able to make hypotheses and test them by supplying relevant examples to the expert. If these
functions existed then the expert would not have to check all parts of the domain manually using the "look" function. Because a number of training instances testing a similar concept (e.g. "can apples really be any of the eight colours?") could be displayed in succession, this should make the KA process more interesting for the expert: KABCO would behave less like an inhuman instance generator and more like an intelligent learner.

9. Future work

The immediate plans for extension of KABCO include a proper user interface, the ability to handle different types of training instances, and more advanced functions for altering the template during the learning process. These have all been described in the preceding text. Extending KABCO to handle probabilistic domains has been a long-term aim and was mentioned by the mutant worm expert. This extension is likely to be difficult, and will probably involve displaying information in a different way. For example, KABCO may choose to display its understanding of apples' colour by creating 100 training instances:

I have 100 class = apple instances.
24 of them are red.
27 of them are pink.
23 of them are yellow.
26 of them are green.

to which the expert replies:
= pink) = 0 and #(colour
= green) = 50.
stating that there are no pink apples and that about 50% of apples are green. KABCO has a very simple input language that non-computer professionals find easy to use, but a natural language interface may be even easier for a novice to learn. For example, the statement:

if class = melon and colour = yellow then not surface = smooth;

matches the English sentence:

No yellow melons have smooth skins.

A natural language interface may prevent KABCO becoming harder to use as the number of commands and functions (such as the suggested "#" command) grows. Natural language may also be a better medium for KABCO to convey its understanding of the domain to the user. An important way that domains can be divided up into manageable chunks is by identifying intermediate classification points (Fu, 1985). The expert can identify some of these points to aid the teaching process, as in:

if class = melon and surface = smooth then sub-class = watermelon;
if class = melon and surface = veined then sub-class = rockmelon;

and once these differences are identified, teaching can continue as if watermelon and rockmelon were different classes, although the superclass melon can still be used for properties common to both. The expert said that this was a feature he definitely wanted included in the program.
FIGURE 5. Integration of different types of information (intermediate classification points, natural language information, other information) learnt by KABCO.
Fu (1985) and others suggest that intermediate classification points can be learnt automatically by ML programs, but sometimes the program designs different classification hierarchies from those commonly used by humans. If KABCO allows the user to build a classification hierarchy as part of the learning process, then this hierarchy can be incorporated in the final expert system. KABCO-like systems will eventually learn many facets of the domain from the expert (as smoothly as possible) and incorporate these in the final expert system, leading to an expert system building strategy as in Figure 5. At present the corrections typed by the expert are not stored. This is a waste of information which may be useful for guiding the expert and for recovering from the expert's errors. For example, if an entire class is accidentally deleted, then KABCO restarts learning for that class from scratch. A better method is to show the stored corrections to the expert and ask for incorrect ones to be deleted; KABCO can then apply all the approved corrections to recreate some of the lost learning. This is important when the expert does not know how to type an "exists" statement before correcting errors. In addition to the above areas for future research, there are hundreds of other possibilities. For example, it may be easier for multiple experts (Gaines, 1989) to agree if they are dealing only with training instances, rather than general rules.
10. Conclusions

As has been stated many times, the implementation of KABCO is very primitive, but even the present implementation is much easier to use than an ML program by itself. The toy fruits example can be fully learnt in about 10 minutes, but requires about 1000 training instances for a correct expert system to be reliably generated. The advantage of the KABCO system is that it combines the small volume of input of writing rules by hand with the logical independence of training instances. Another advantage inherited from the ML approach is that the input does not need to be optimized: finding an optimized set of rules is the task of the ML knowledge compiler. No practically large system has been built, but small systems such as the fruits example and the mutant worm system have been. Since the teaching of classes is independent, much larger systems can be built with a linear increase of time and effort (with respect to the number of classes). The expert consulted felt confident that he could engineer large systems over a period of time.
It is therefore concluded that the concept of learning by being corrected and the KABCO program are useful as KA tools, significantly increasing the usability of pre-existing ML programs.
Acknowledgements

The research in this paper comes from the author's research for the degree of D.Eng. at the Toyohashi University of Technology. The author wishes to acknowledge the financial support of the Japanese Ministry of Education and also his academic supervisors, Professor Hajime Ohiwa and Dr Kazuhisa Kawai. A special debt is owed to Associate Professor Shahid Siddiqui, Head of the Molecular Biology Laboratory at the Toyohashi University of Technology, who was the domain expert for the mutant worm experiment.
References

BOOSE, J. H. & BRADSHAW, J. M. (1987). Expertise transfer and complex problems: using AQUINAS as a knowledge-acquisition workbench for knowledge-based systems. International Journal of Man-Machine Studies, 26, 3-28.
BREW, P. W. & CATLETT, J. (1986). SA: an expert system for diagnosing an aluminum smelter. In Proceedings of the 1st Australian Artificial Intelligence Congress, Melbourne, Australia.
BUCHANAN, B. & FU, L. M. (1985). Inductive Knowledge Acquisition for Rule-Based Expert Systems. Knowledge Systems Laboratory Report No. KSL-85-42. Department of Computer Science, Stanford University, Stanford, CA 94305.
CLANCEY, W. J. (1983). The epistemology of a rule-based expert system: a framework for explanation. Artificial Intelligence, 20, 215-251.
CLANCEY, W. J. (1986). Transcript of plenary session: cognition and expertise. In 1st AAAI Workshop on Knowledge Acquisition for Knowledge-Based Systems, Banff, Canada.
CLEMENT, R. P. (1991). Rule generation and selection with a parallel generalisation architecture. IEICE Transactions, E74, July 1991.
CORLETT, R. (1983). Explaining induced decision trees. In Proceedings of Expert Systems, 136-142.
FEIGENBAUM, E. A. (1977). Themes and case studies of knowledge engineering. In Proceedings of the 5th IJCAI, Cambridge, MA.
FEIGENBAUM, E. A. (1981). Expert systems in the 1980s. In A. BOND, Ed. State of the Art Report on Machine Intelligence. Maidenhead: Pergamon-Infotech.
FU, L.-M. (1985). Learning object-level and meta-level knowledge in expert systems. PhD Thesis, Knowledge Systems Laboratory, Computer Science Department, Stanford University.
GAINES, B. R. (1989). Integration issues in knowledge support systems. International Journal of Man-Machine Studies, 31, 495-515.
GAINES, B. R. & BOOSE, J. H., Eds. (1988). Knowledge Acquisition for Knowledge-Based Systems. London: Academic Press.
GAINES, B. R. & SHAW, M. L. G. (1986). Foundations of dialog engineering: the development of human-computer interaction part II. International Journal of Man-Machine Studies, 24, 101-123.
KODRATOFF, Y., MANAGO, M. & BLYTHE, J. (1987). Generalization and noise. International Journal of Man-Machine Studies, 27, 181-204.
MARCUS, S., Ed. (1988). Automating Knowledge Acquisition for Expert Systems. Boston: Kluwer.
MICHALSKI, R. S. & CHILAUSKY, R. L. (1980). Learning by being told and learning from examples: an experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis. Policy Analysis and Information Systems, 4, 125-160.
MICHALSKI, R. S., CARBONELL, J. G. & MITCHELL, T. M., Eds. (1983). Machine Learning: An Artificial Intelligence Approach, especially chapter 1. California: Morgan Kaufmann.
MICHIE, D. (1982). The state of the art in machine learning. In D. MICHIE, Ed. Introductory Readings in Expert Systems. London: Gordon & Breach.
QUINLAN, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
QUINLAN, J. R. (1990). Probabilistic decision trees. In Y. KODRATOFF & R. S. MICHALSKI, Eds. Machine Learning: An Artificial Intelligence Approach, Vol. III, pp. 140-152. San Mateo, CA: Morgan Kaufmann.
QUINLAN, J. R., COMPTON, P. J., HORN, K. A. & LAZARUS, L. (1986). Inductive knowledge acquisition: a case study. In Proceedings of the 2nd Australian Conference on Applications of Expert Systems, pp. 183-204, Sydney, Australia.
TIJERINO, Y. A., KITAHASHI, T., MIZOGUCHI, M. & KAKUSHO, O. (1990). MULTIS: a task analysis interview system based on prestored problem solving models. In Proceedings of the First Pacific Rim International Conference on Artificial Intelligence, Nagoya, Japan.
WINSTON, P. H. (1975). Learning structural descriptions from examples. In P. H. WINSTON, Ed. The Psychology of Computer Vision. New York: McGraw-Hill.