Case-Based Learning: A New Paradigm for Automated Knowledge Acquisition

Stuart H. Rubin
Naval Ocean Systems Center

Case-Based Learning (CBL) is a new paradigm for solving problems by generalizing and transforming solutions of similar, previously encountered ones. Cases serve as actual problem-solution instances in CBL. Here, cases differ from those in the Case-Based Reasoning (CBR) paradigm in that they are immutable. Expert^n systems define a network of interacting domain-specific subsystems for bootstrapping CBL methods. The primitive subsystems are called next-generation expert systems. All subsystems constrain random generalization spaces. Generalization can be augmented with a CBL method called Random Seeded Crystal Learning (RSCL). RSCL methods generate a population of similar hypotheses using a transformational paradigm. The cases serve to delimit this space while simultaneously refining the process of generating the hypotheses. RSCL methods provide for the shifting of the knowledge engineer's task from the design of rules to the design and bootstrapping of domain-specific languages for their capture. A prototype CBL system called LS has been realized at NOSC. It efficiently induces novel rules that are open under deduction. Results indicate that more knowledge can be generated than is supplied. LS is implemented on a DAP-610 platform in approximately 10,000 lines of FORTRAN-Plus. Its utility, especially as applied to intelligent manufacturing systems, is expected to mirror advances in parallel hardware.

Funding for this project was provided by grants to the author from the Naval Ocean Systems Center (NOSC), San Diego, CA; the ONT Postdoctoral Fellowship Program, Projects Office, ASEE, Washington, DC 20036 USA; the Research Excellence Fund of the State of Michigan, Grant No. 62732; and NSF Grant No. USE-9051617. Funding for Dr. Andersen was provided under the NOSC NP Program.

INTRODUCTION

The hallmark of intelligence is the ability to solve new problems based on experience with old problems. Creativity, in the most general sense, is the process of discovering new paradigms for "intelligent behavior." Albus [1] postulates the role of closed feedback hierarchical control mechanisms in an evolutionary model of intelligent behavior. Case-Based Learning (CBL) is a term that refers to problem-solving methods that make use of generalization and analogical inference rather than deduction in the acquisition of problem-solving knowledge. Unlike other paradigms for the generalization and transformation of solutions, CBL methods are in the limit self-validating. This is because the cases serve as formal constraints upon the solutions. The principal contribution of the present work lies in that it proposes an engineering solution to the outstanding problem of knowledge acquisition in expert systems.

ISSN 0019-0578/92/02/0181/29/$2.50 © ISA 1992

VOLUME 31 • NUMBER 2 • 1992


ARTIFICIAL INTELLIGENCE

This problem has long been recognized as the primary bottleneck in developing an expert system [7]. Case-Based Reasoning (CBR) constitutes a fifth major paradigm of machine intelligence research. Its two long-term agendas are to develop a scientific model of human memory and to build robust computer programs that can assimilate experiences and adapt to new situations [35]. Slade [35] reviews several case-based systems in a variety of domains. Golding and Rosenbloom [11] present a hybrid architecture for combining rule-based and case-based reasoning. Another hybrid CBR system for inventing mechanical devices, telling stories, planning, and problem solving is described by Turner [38].

CBR is a less recent related term, which, like CBL, implies the extraction of knowledge from cases. However, unlike CBL methods, where the cases serve as immutable constraints, CBR methods attempt to modify the cases themselves to arrive at a solution to the current problem. Unfortunately, the process of case modification is far more intricate than would at first be obvious to the casual observer.

CBL is a new paradigm. It is applicable to any domain that can be addressed with conventional expert systems technology, especially those for which the rules are not known. The advantages associated with the use of this paradigm are elaborated upon below.

Induction

Induction implies all inferential processes that expand knowledge in the face of uncertainty [15]. Inductive reasoning as it is employed in science was analyzed mathematically by Ray J. Solomonoff [3]. According to his model, a theory that enables one to understand a series of observations is seen as a small computer program. This program reproduces the observations and makes predictions about possible future observations. The smaller the program, the more comprehensive the theory, and the greater will be the degree of understanding. According to Occam's razor [3], the simplest among a set of equally valid competing scientific theories is to be favored. Building a knowledge-based system is like developing a scientific theory [27]. Thus, there are two interrelated phases of knowledge-base construction: (1) model building and (2) model extension. The methodology developed herein follows from Solomonoff's keen insight.

The process of inducing a theory must necessarily rely upon heuristic means. Furthermore, the validity of the induced theory is bounded by the space of known exemplars (i.e., cases) of the theory. This is not to say that the theory is necessarily invalid outside of this space, but rather that its validity may not hold there. Moreover, the inductive process must be domain-specific or knowledge-driven if it is to be scalable. This assertion follows from the failure of the "General Problem Solver" (GPS) (see the Appendix).

Expert^n System Overview

Expert^n systems are an approach to the construction of intelligent systems using a cooperative distributed problem-solving paradigm. They are interesting because, while remaining domain-specific, they serve to generalize the conventional expert system's operational domain. Expert^n systems essentially consist of a composition of expert^(n-1) subsystems, having a basis in primitive expert subsystems. Expert^0 systems, or next-generation expert systems, are pliable learning systems, which serve a variety of distinct domain-specific functions. These functions can be integrated with one another with the result that more general and powerful expert^n systems can be bootstrapped.

ISA TRANSACTIONS

Next-Generation Expert Systems

Next-generation expert systems are recursively defined (i.e., clockwise) in Figure 1. They consist of a base of immutable case instances at the core, surrounded by successive shells, each composed of three fundamental subsystems. The first of these subsystems consists of a base of generalizations of the cases. The second consists of a base of analogous pairs of these generalizations. This base, like the base of case instances, serves to constrain the local base of generalizations; but, unlike the instance base, it is volatile. The third subsystem consists of a base of generalized rules that define the analogies.

Figure 1 - A Next-Generation Expert System. [Figure: concentric shells of rules, generalized rules, and analogous pairs surrounding a core of case instances.]

It should be noted that the use of the term "second-generation expert system" is a slight variant of that accorded by the general literature [19, 28, 37]. That is, here domain models are specific to representational languages and architectural configuration. This is because randomness, the relative incompressibility of information [3], is defined only with respect to some domain-specific language. The hierarchy of subsystems requires the use of multiple domain-specific languages. This provides for a potentially greater compression of information. The interplay of the subsystems, depicted in Figure 1, is applied in Table 6.

Expert^n Systems

Expert^n systems differ from simple networks of conventional expert systems in that they allow for the transformation of knowledge across similar domains (i.e., subsystems). This fundamental property is analyzed in the Appendix.

Expert^n systems, as recursively defined, are bootstrapped by one or more expert^(n-1) subsystems. These subsystems support the recursive geometry depicted in Figure 1. Each primitive expert^0 subsystem in Figure 2 may be decomposed to yield the heuristic functions of one or more similar expert^(n-1) subsystems (i.e., possibly differing only in the contents of their knowledge bases and representation languages) and vice versa, as depicted in Figure 2. This figure extends the theoretical perspective presented in the Appendix (see Figure 12) because it allows for the concurrent development of expert^(n-1) subsystems. This extension is pragmatically motivated because in abstract theory all components can be serially constructed, allowing for the maximization of their collective level. The number of levels in Figures 1 and 2 is dynamic and finite.
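The recursive composition just described can be sketched in code. The fragment below is an illustration only; the paper specifies no data structures, so the class and field names here are our assumptions. It shows an expert^n system built from expert^(n-1) subsystems, bottoming out in primitive level-0 (next-generation) expert systems.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Expert0:
    """A level-0 (next-generation) expert system: a core of immutable
    cases surrounded by generalizations, as in Figure 1. Names are ours."""
    cases: list = field(default_factory=list)
    generalizations: list = field(default_factory=list)

@dataclass
class ExpertN:
    """An expert^n system: a composition of expert^(n-1) subsystems."""
    level: int
    subsystems: List[Union["ExpertN", Expert0]]

def build_expert(level: int, breadth: int) -> Union[ExpertN, Expert0]:
    """Recursively bootstrap an expert^level system from `breadth`
    expert^(level-1) subsystems; level 0 yields a primitive system."""
    if level == 0:
        return Expert0()
    return ExpertN(level, [build_expert(level - 1, breadth)
                           for _ in range(breadth)])
```

The number of levels is finite and fixed here by the `level` argument, mirroring the paper's remark that the number of levels in Figures 1 and 2 is dynamic and finite.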


Figure 2 - An Expert^n System. [Figure: levels of expert systems, from Level 0 through Level N, composed recursively.]

Intercommunication and Evolution in Expert^n Systems

Intercommunication in expert^n systems is recursively defined as follows. An arbitrarily chosen subsystem, call it A, serves to focus the attention of another arbitrarily chosen subsystem, call it B (see the Appendix). Cycles in the communication architecture are possible and are amenable to analysis. The dynamic domain-specific set of generalizations in B serves as the cases for A. No domain-specific generalizations in A may contradict any of its cases.

Expunging a generalization in B is synonymous with losing a member of the set of cases for A. The occurrence of this event allows for the eventual acquisition of further generalizations in A through random and crystal processes. These rules tend to mitigate against a recurrent induction of the expunged generalization(s) in B. Thus, the generalizations in A converge upon an ideal set through the process of acquisition. They also converge upon the ideal set through the counterbalancing process of deletion. This is shown in Figure 3, which presents an annealed-search algorithm that attempts to generalize a case in B:

Repeat
    Search for a generalization of the case in B using the heuristic guidance of the (remaining) set of generalizations in A for a quantum (see below).
    If a generalization of the case in B was not found, then select a fired generalization in A, if any, at random for temporary removal.
    /* Observe the progression towards blind search. */
Until a generalization of the case in B is found or the quantum expires.
Restore all generalizations in A.
If a generalization(s) of the case in B was found, then any members of the set of temporarily removed generalizations in A that are inconsistent with the new-found generalization(s) in B are permanently expunged;
Else terminate with failure.
/* It follows that the set of generalizations, in the communicating subsystems, evolves, as was to be shown. */

Figure 3 - The Algorithm for Case Generalization

Learning Paradigms

An abbreviated taxonomy of learning paradigms will serve to evidence the capabilities of next-generation expert systems (and, therefore, expert^n systems) relative to two well-known and accepted categories of machine-learning methodologies:

(1) Data-driven methods apply a theory to many training instances to extract a predefined instance of the theory. Example: neural networks.
(2) Analytic methods apply a theory to a single training instance to generate new theories of similar (i.e., bounded) complexity. Example: Explanation-Based Learning (EBL) methods.
(3) Next-generation and expert^n systems apply one or more theories to one or more training instances each to generate analogous theories of case-bounded (i.e., possibly greater) complexity. Example: Random Seeded Crystal Learning (RSCL) methods.

RSCL Overview

RSCL methods are to next-generation expert systems what production systems are to expert systems; that is, RSCL methods define the fundamental paradigm, while next-generation expert systems are the embellishment (i.e., replete with user-friendly interfaces, etc.). An illustrative next-generation expert system, using the RSCL methodology as applied to the naval battle management domain, is provided below.

The basic tenet of RSCL is interesting in that learning is defined to mean two distinct paradigms; namely, the external acquisition of random knowledge and the internal acquisition of transformatively derived redundant knowledge. All acquired knowledge is represented in the form of situation-action rules. Here, random seeds of knowledge aggregate into regular crystalline geometries (i.e., the logical shape being determined by domain-specific attributes) under transformational operators operating over the course of time. The longer the time, the larger and more nearly perfect the crystal (i.e., knowledge base). RSCL methods are best suited for application to symmetric domains [30].

Comparison of RSCL with EBL

EBL is a very powerful method for category formation and for improving the search efficiency of a problem solver [32]. EBL systems, unlike expert^n systems, provide no assistance in the task of acquiring a tractable domain theory, or any new knowledge, during training [2, 6]. Furthermore, they cannot effectively generalize number, or "generalize to N" [33, 34]. Shavlik [34] clarifies this concept with the example that the system should produce a plan for unstacking any number of blocks when generalizing a plan for unstacking them. However, in generalizing a wagon-building example, the number of wheels should not be subjected to such generalization. RSCL methods address this problem by providing for the domain-specific capture of the problem context.

The RSCL methodology is similar to that of EBL (i.e., where a domain model, or background knowledge, is used to generalize training examples into rules). This is because whereas the former merely requires that appropriate "featural" description languages be predefined, the latter requires that an appropriate theory of the domain in which a knowledge base is to be acquired be preprogrammed. Often, it is the case that sufficiently detailed domain theories are simply not known or are too complex to facilitate capture in the event that they are. RSCL methods follow Krawchuk and Witten's [20] recommendation that learning be integrated with problem solving in EBL. RSCL methods can also be applied to the task of learning from failure (i.e., to avoid similar pitfalls in future situations sharing the same underlying failure causes). Failures are a source of negative feedback in expert^n systems. These methods can be extended to incorporate uncertainty information.
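The annealed-search loop of Figure 3 can be sketched as follows. This is a schematic reading, not the NOSC implementation: the representations of cases and generalizations, and the `find_generalization` and `consistent` callables, are hypothetical stand-ins supplied by the caller.

```python
import random

def generalize_case(case, gens_A, find_generalization, consistent, quantum):
    """Schematic version of the Figure 3 loop. gens_A holds the
    generalizations in subsystem A that heuristically guide the search;
    find_generalization(case, active) returns a generalization of the
    case in B or None after one search quantum; consistent(g, found)
    tests a guide against the new-found generalization."""
    active = list(gens_A)   # guides currently steering the search
    removed = []            # temporarily removed (annealed away)
    found = None
    for _ in range(quantum):
        found = find_generalization(case, active)
        if found is not None:
            break
        if active:          # anneal: drop one fired guide at random,
            removed.append(active.pop(random.randrange(len(active))))
            # observe the progression towards blind search
    active.extend(removed)  # restore all generalizations in A
    if found is None:
        return None, list(gens_A)   # terminate with failure
    # permanently expunge guides inconsistent with the new generalization
    surviving = [g for g in gens_A if consistent(g, found)]
    return found, surviving
```

Each pass of the loop weakens the heuristic guidance, so the search degrades gracefully toward blind search before the quantum expires, as the algorithm's comment notes.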

Transformational Analogy

Analogical reasoning is a mechanism for using experience in solving a problem to solve a new, similar problem [14]. Analogy may be a powerful new paradigm with which to exploit reusable specifications, but it has received little attention in the relevant literature [22]. However, a study of software reuse through analogical software specification is currently underway [4]. One of the principal advantages of rule-based systems over standard software solutions is that the rules, unlike the software, are more generally applicable. Transformational analogs are similar rules. McDougal et al. [23] clarify the concept with an example showing how small groups of soldiers can converge from many directions upon a fortress occupied by terrorists and successfully capture it. They go on to show how this scenario is analogous to radiating a tumor from several directions with weak rays, which are collectively strong enough to kill the tumor while preserving the surrounding tissue. Furthermore, they note the importance of similarity in encoding to successful "spontaneous analogical recall."

The definition of transformational analogy is object-oriented. For example, the ordered pair of grammar rules aa->a and bb->b are transformational analogs, since they can be equated through the effects of the rule a->b. Conversely, the ordered pair of grammar rules ab->a and ba->b are not transformational analogs, by similar reasoning. Transformational analogs are ordered pairs of domain-specific entities that can be brought into equivalence through the effects of a transformational operator. This operator may itself be a transformational analog. Its space-time complexity is less than that of the image of its effects [30]. Notice that transformational analogs, while reflexive under the identity transformation, are clearly nontransitive according to the definition. The pragmatic implication of this nontransitivity is that the capability of any expert^n system for learning by way of analogy bears proportion to the multiplicity of its constituent expert^(n-1) systems, as recursively defined. That transformational analogs are not symmetric is shown below.

Enumeration of Transformational Analogs

The definition of transformational analogy is domain-specific. Also, the definition of the space-time complexity of any transformational operator is necessarily dependent upon that of the underlying transformational languages. The complement of the given definition for transformational analogy (i.e., the discovery of ordered pairs of domain-specific entities that are not analogous) may not be recursively enumerable [17]. That is, given a sufficiently powerful transformational language, such as a Type 0 grammar (transformations under which may involve the use of more than one implication), the set of transformational analogs is recursively enumerable. However, the complement of this set is not. The nonrecursively enumerable transformational analogs are only of pragmatic importance to the extent that computability results are of interest.

Fuzzy Temporal Dependencies

Graded membership, as in the definition of fuzzy logic [16], can find expression in RSCL through the multiplicity of a terminal grammatical symbol. For example, the assertions, "If it's cold then wear a layer of clothes," and "If two layers of clothes are worn then it's cold," can be captured by the two grammar rules a->b and bb->a, respectively. However, fuzzy rules are not necessarily symmetric under transformation. For example, the rule bb->a is analogous to the rule b->a, but the reverse is not true (i.e., due to inherent right recursion in the transformational map). This means that transformational analogs are not necessarily symmetric. The practical interpretation of this fact is that the temporal ordering of the search space plays a role in the discovery of analogies [23]. This result agrees with contemporary results in machine learning (i.e., the importance of the order of the training instances to that which can be efficiently learned) [26]. It follows that the RSCL methodology is sensitive to the order of case acquisitions and rule activations. It is preferable that simple cases be presented before more complex ones.

The Problem

The problem is to develop a prototype Learning System (LS) that supports decisions regarding the interpretation of a battlefield scenario. Intelligent decision making requires an ability to predict one's environment and respond in an optimal manner with




respect to some underlying purpose [10]. LS accepts inputs that describe the battlefield situation and produces outputs that detail the desired doctrine to be followed. The methodology developed herein is immediately transferable to the manufacturing sector for application to intelligent systems. The theater is presented to the user in the form of a moving picture. It is shown on any of three possible pairs of labeled coordinate axes (i.e., x-y, x-z, y-z) as selected. Furthermore, these views can be alternated by the user with an appropriate keystroke (i.e., "F" for x-z or front view, "S" for y-z or side view, "T" for x-y or top view). For example, a submarine will disappear from the top view upon submergence. Either of the remaining two views may then be selected to track its movements. Subsequent versions of LS may present the motion picture in a simulated 3-D format.
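The keystroke-selectable views map directly onto coordinate pairs. A minimal sketch follows; the view-to-axes mapping comes from the text, while the function itself is our illustration:

```python
# "F" = front (x-z), "S" = side (y-z), "T" = top (x-y), per the text.
VIEWS = {"F": ("x", "z"), "S": ("y", "z"), "T": ("x", "y")}

def project(position, view):
    """Project a 3-D track position onto the selected pair of axes.
    position is (x, y, z) in units of 100 m; view is 'F', 'S', or 'T'."""
    x, y, z = position
    coords = {"x": x, "y": y, "z": z}
    a, b = VIEWS[view]
    return coords[a], coords[b]
```

For the submerged submarine at (50, 7, -9), the top view shows only (50, 7); switching to the front or side view recovers the depth coordinate.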

The Scaling Problem

The CBL methodology developed below allows for the decomposition of a learning problem into several subproblems, each of which is accorded a domain-specific representation. This view agrees with Minsky [24, 25]. It differs in that the methodology developed herein additionally provides for the broad and formal definition of transformatively-based shells. These shells allow for the sharing of knowledge among distinct domains.

The Approach

A CBL system is implemented on a Distributed Array Processor (DAP-610) SIMD supercomputer having 4,096 nodes, each with a math coprocessor and 64K of local memory. It follows that LS can accommodate up to 4,096 cases, containing up to about 500 vehicle-response combinations each, without incurring any performance degradation. Here, there are enough processors available so that the match between the retrieval probe and every case in the memory can be simultaneously effected. Thus, the need for the indexing of cases evaporates [39]. Subsequent performance degradation is linear with secondary memory swap time. LS is currently implemented in FORTRAN-Plus. Conversion to C is contingent upon the availability of a C compiler.

This next-generation expert system is based upon an RSCL method. The method saves knowledge in the form of cases and automatically extracts a basis of rules from the cases. Furthermore, the system is capable of inducing and verifying novel rules by process of transformational analogy. It also promises to transform knowledge across similar domains, thereby solving the classical knowledge acquisition bottleneck. Case instances are randomly generated in compliance with a black box that consists of up to 25 Naval Aegis Doctrines. The essential idea is that an RSCL method, augmented by some domain-specific knowledge to facilitate implementation, can tractably acquire these rules using nothing more than an enumeration of instances and some domain-specific languages. In effect, it translates declarative into procedural knowledge.
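The exhaustive parallel match just described can be emulated sequentially. In this sketch the similarity metric is a hypothetical feature-overlap count (the text does not specify LS's metric at this level), and the dictionary case representation is our own:

```python
def match_score(probe, case):
    """Feature overlap between a retrieval probe and a stored case:
    a stand-in for whatever similarity measure LS actually applies."""
    return len(set(probe) & set(case["situation"]))

def retrieve(probe, case_memory):
    """On the DAP-610, up to 4,096 processors each score one case against
    the probe simultaneously, so no case index is needed. A sequential
    scan over the whole memory emulates that parallel match."""
    return max(case_memory, key=lambda c: match_score(probe, c))
```

Because every case is scored on every retrieval, performance is flat up to the processor count and degrades only with secondary-memory swapping beyond it, as noted above.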

The Vehicles

A battlefield scenario tracks the movement of aircraft (A), missiles (M), ships (S), and submarines (U). Moreover, vehicle instances may be somewhat undefined. Here, an "F" is used to identify an aircraft or a missile, while a "W" is used to identify a ship or a submarine. Each vehicle is assigned a disposition (i.e., Ally = "A," Enemy = "E," or, by default, Unidentified = "U") and a uniquely identifying number. Unknown case-instance types are designated with a "0." This designation is required for vehicles "F" and "W." The vehicular types are defined in Tables 1 through 4, along with constants that define their minimum range (NRange), maximum range (XRange), minimum velocity (NSpeed), maximum velocity (XSpeed), and a constant of proportionality (C.P.). The constant of proportionality, c, is used for computing the probability that a vehicle will arrive at a target. It can be estimated from the desired probability of arrival, p, using the equation c = 2p - 1. This equation finds use in the instantiation of the four tables. All velocities in the tables are expressed in meters per second, while all distances are expressed as multiples of 100 meters. These tables may be extended in subsequent versions of LS. Vehicular types are not to be confused with type instances. For example, a reconnaissance aircraft could be a fixed-wing plane or a helicopter.

The Vehicular Range and Velocity Constants

The range and velocity constants enable LS to check entered



case instances for validity. Also, the program must recursively add the maximum containing vehicle range to the maximum armament range in the final computation of the maximum strike distance. The computation of the minimum strike distance is similar.

Table 1 - Aircraft (A) Types

No.  Definition      NRange  XRange   NSpeed  XSpeed  C.P.
1    Bomber          200     100,000  200     400     0.99
2    Fighter         50      10,000   300     1,000   0.99
3    Reconnaissance  100     150,000  200     400     0.99

Table 2 - Missile (M) Types

No.  Definition                  NRange  XRange  NSpeed  XSpeed  C.P.
1    Anti-Submarine Weapon       1       1,000   0       1,000   0.30
2    Automatic Standard Missile  1       9,000   400     5,000   0.80
3    Semi-Automatic Guns         1       25      400     1,500   0.00

Table 3 - Ship (S) Types

No.  Definition
1    Aircraft Carrier
2    Battleship
3    Gun-Boat

Table 4 - Submarine (U) Types

No.  Definition    NRange  XRange   NSpeed  XSpeed  C.P.
1    Conventional  50      10,000   0       10      0.99
2    MHD-Stealth   100     200,000  0       15      0.99
3    Nuclear       200     400,000  0       15      0.99
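The recursive strike-distance computation described above might look as follows. The nested-dictionary vehicle representation and field names are our illustration; the XRange values in the usage example are drawn from Tables 1 and 2:

```python
def max_strike_distance(vehicle, xrange):
    """Maximum strike distance of a vehicle: its own maximum range, or,
    recursively, its range plus the maximum strike distance of any
    armament it contains (e.g., ship -> aircraft -> missile)."""
    best = xrange[vehicle["type"]]
    for armament in vehicle.get("carries", []):
        best = max(best,
                   xrange[vehicle["type"]] + max_strike_distance(armament, xrange))
    return best
```

For a stationary carrier (range 0, an assumption for the example) carrying a bomber (XRange 100,000) armed with a standard missile (XRange 9,000), the maximum strike distance is 109,000 units. The minimum strike distance is computed similarly with the NRange constants.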



Containing vehicles are easily identified, since they share the same coordinates with the launch vehicle until the time of launch. A containing vehicle must be a response vehicle that has the same target as its armament. A follow-up system to LS can optimize any vehicular allocations. Constraints are checked before the issuance of a particular response.

The Tracks

The relative position, direction, speed, and identity of the vehicles


define a track (i.e., ignoring acceleration, topographic constraints, etc., for purposes of this prototype). A uniform grid mesh is assumed for the sake of simplicity, having an implicit origin at the center for allied command and control operations. The center is assumed to be at sea level. Again, one geometric unit is equivalent to one hundred meters on any axis. Similarly, one temporal unit is equivalent to one second. Vehicles can be introduced to, or exit from, the theater at any time. Current positional information, direction, and velocity are encoded in the form of 3-tuples. For example, the vector "[U U.3 50,7,-9 0,-2,1]" designates a nuclear submarine of unidentified disposition. This submarine is currently located approximately 5,000 meters down-range on the abscissa, 700 meters down-range on the ordinate, and submerged at a depth of approximately 900 meters. It is not changing its x-coordinate, is changing its y-coordinate in a negative direction at the rate of 2 meters per second, and is surfacing at the rate of 1 meter per second. LS gives priority attention to the most rapidly moving vehicle. Notice that an incredibly large possibility space can be captured by this simple scheme (i.e., reminiscent of the "n-body problem" in classical physics). Such large possibility spaces are characteristic of most real-world problems, particularly those that artificial intelligence (AI) must address if it is to be deemed a pragmatically successful approach.

Weapons miss their targets according to a probability based upon the type of weapon deployed, its speed, and the distance it travels. A simple linear equation for computing the probability of arrival for a given vehicle and type is:

p = c + ((1 - c)/2) * [ (XRange - d)/(XRange - NRange) + (v - NSpeed)/(XSpeed - NSpeed) ]    (1)

where XRange > NRange, XSpeed > NSpeed, and:

c = the constant of proportionality
d = the distance between the vehicle and the target in multiples of 100 meters
p = the probability that the vehicle will arrive at the target
v = the vehicle's velocity in meters per second
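The track encoding and Eq. (1) can be exercised together. The parser follows the submarine example above. The probability function implements one consistent reading of Eq. (1): the printed equation is partially garbled, so the averaging of the range and speed terms (which keeps p within [c, 1]) is an assumption on our part. The last helper implements the stated estimate c = 2p - 1.

```python
def parse_track(vec):
    """Parse a track vector such as "[U U.3 50,7,-9 0,-2,1]" into
    (disposition, vehicle_type, position, velocity)."""
    disposition, vtype, pos, vel = vec.strip("[]").split()
    position = tuple(int(n) for n in pos.split(","))  # units of 100 m
    velocity = tuple(int(n) for n in vel.split(","))  # m/s along each axis
    return disposition, vtype, position, velocity

def arrival_probability(d, v, nrange, xrange, nspeed, xspeed, c):
    """Eq. (1), as read above: probability that a vehicle at distance d
    (units of 100 m), moving at v m/s, arrives at its target. Requires
    XRange > NRange and XSpeed > NSpeed."""
    range_term = (xrange - d) / (xrange - nrange)
    speed_term = (v - nspeed) / (xspeed - nspeed)
    return c + (1.0 - c) * 0.5 * (range_term + speed_term)

def c_from_p(p):
    """Estimate the constant of proportionality from a desired
    probability of arrival: c = 2p - 1."""
    return 2.0 * p - 1.0
```

Under this reading, a nuclear submarine (Table 4: NRange 200, XRange 400,000, NSpeed 0, XSpeed 15, c = 0.99) at minimum range and maximum speed arrives with probability 1.0, and at maximum range and zero speed with probability c.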

The Responses

Each battle scenario must include a description of the appropriate responses (i.e., doctrine types). An appropriate response to the unidentified nuclear submarine example might be, "ASSUMED ENEMY (currently at) [50,7,-9], MISSILE.1 (target currently at) [50,7,-9]." Here, an antisubmarine weapon is to be directed against an assumed enemy. This foe is uniquely identified by the last reported position. In actuality, the track coordinates of the missile will differ based upon such factors as the trajectory of the enemy vessel, the time required to deliver the weapon, its velocity, the topography of the region, etc. Response vehicle trajectories are implicitly computed and displayed.

The purpose of the LS battle management system is not to detail an appropriate response (i.e., since programs already exist for this explicit purpose), but rather to characterize one. For example, it is implicitly assumed that an antisubmarine weapon will be launched from its containing vehicle at the appropriate time subsequent to the latter's launch. It is sufficient that the present coordinates of the containing vehicle be given. Destroyed or incapacitated vehicles (i.e., as determined by a probability function given below) and vehicles that exceed their effective range are automatically removed from the theater. Also, the multiplicity of a particular response need not be detailed. The responses are presently limited to the doctrines defined in Table 5. These doctrines may be extended in subsequent versions of LS. The unidentified disposition response can be applied to enemies who surrender and vice versa. The NIL response serves as a negative training instance, which provides for the retention of more cases. These instances of damped oscillatory behavior augment the reflective process and otherwise constrain the generalization process. Cases serve as "homes" for generalizations.

RESPONSE  DEFINITION
N         NIL or No Response
Ai        Assumed Ally at ith Track Coordinates
Ei        Assumed Enemy at ith Track Coordinates
Ui        Unidentified Disposition at ith Track Coordinates
A.# i     Launch Intercept Aircraft Type # to ith Track Coordinates
M.# i     Fire Missile Type # to ith Track Coordinates
S.# i     Launch Ship Type # to ith Track Coordinates
U.# i     Launch Submarine Type # to ith Track Coordinates

Table 5 - Response Doctrines

The Generalizations

Cases need to be generalized, and domain-specific languages need to be developed to support the process of generalization. These languages facilitate the concise, symbolic expression of relevant features in the case base. RSCL methods serve as schema, which can be instantiated with preceding work in AI. Also, RSCL methods (like neural networks) can continue to operate in the presence of processor (node) failures. This feature allows for the development of fault-tolerant, wafer-scale, parallel host machines.
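Assembling the response tokens of Table 5 is mechanical. The helper below is our illustration of the token syntax only; LS itself is not specified at this level of detail:

```python
def doctrine(kind, track, vtype=None):
    """Build a Table 5 response token. Disposition responses are 'N',
    'Ai', 'Ei', or 'Ui'; launch/fire responses add a vehicle type, as in
    'M.1 3' (fire missile type 1 at track 3's coordinates)."""
    if kind == "N":
        return "N"
    if vtype is None:
        return f"{kind}{track}"
    return f"{kind}.{vtype} {track}"
```

For the submarine example, the characterized response pairs an assumed-enemy token with a missile token against the same track.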

Generalization Languages

Without some form of generalization, the training received by a system would not be transferable to new problems. The choice of generalization language has a major influence on the capabilities and efficiency of the learning system. The system designer, having selected a generalization language, builds in his or her biases concerning useful and irrelevant generalizations in a domain [26]. That is, each generalization language is capable of representing only some of the possible sets of describable instances. Again, inconsistency is unavoidable in many real-world applications. Languages for learning in the presence of such inconsistency are needed. A related problem warranting attention is that of defining languages for representing and using incompletely learned generalizations. Work in machine learning has suggested that useful generalizations do not always exist in the language in which cases are described [9]. RSCL methods require the implementation of distinct languages for the description of cases and their generalizations on each distinct level. Methods by which a program could automatically detect and repair deficiencies in its generalization language would represent a significant advance [26]. Such methods comprise expert^n systems domains.

It should be noted that the determination of an appropriate language for the "featural" description of cases may not be a simple task. Bareiss [2] maintains that the main source of power for many machine learning systems, such as ID3, is the "featural" language defined by the user to characterize the training set. Quinlan [29] reported that the selection of appropriate features for describing chess positions took almost two man-months [12]. Rubin [30] discusses the need for domain-specificity in transformational languages and translators. The importance of domain-specificity to functionality in neural networks has been recently shown by Lin and Vitter [21]. That is, they show that without any domain-specific knowledge, the problem of training neural nets is NP-complete (i.e., generally infeasible). This is true even for very simple two-node nets that have only two noninput nodes, one of which is hidden. Expert^n systems, on the other hand, are predicated upon the existence of domain-specificity to ensure their tractable functionality (see the Appendix). The definition of domain-specific languages can be captured with expert^n compilers (ENCs). Such translators not only facilitate extension in the language (i.e., including the detection and repair of deficiencies) but also permit the bootstrapping of similar new domain-specific languages as well.

The LS Languages

In LS, the generalization languages must support all manner of projected vector geometries for determining appropriate vehicular responses. They must also support the generalization of velocity vectors. A simple domain-specific calculus for generalization enables the strategist to capture heuristic reasoning. An illustration of the


use of this calculus is presented in Table 6. The Level 0 generalization language consists of one very powerful orthogonal, descriptive, and implicit function (i.e., CDR). This language was designed to emphasize the use of symbolic relations. This maximizes its utility for purposes of transformational analogy and generally abstracts the more relevant decision-making criterion for the user. The details of this language follow.

Level 0:

GENERALIZATION

1) [A S.1 0 0 0 10] [U S.2 0 0 0 0]
   [-Max, Max] [0, Max] [0, Max] -> A.3 E
2) [A S.1 0 0 0 10] [U S.2 0 0 0 0]
   [-Max, 0] [0, Max] [0, Max] -> A.3 E

CASE-INSTANCE

[A S.1 70,80,0 10,0,0]1 [U W.0 47,-3,0 0,0,0]2
) A.32 (on S.1) E2

GENERALIZATION

1) [A O.0 -Max Max 10 Max] [U A.0 0 Max 500 Max]
   [-Max, Max] [0, Max] [0, Max] -> E
2) [A O.0 -Max Max 10 Max] [U O.0 0 Max 10 Max]
   [-Max, Max] [0, Max] [0, Max] -> E

CASE-INSTANCE

[A A.3 47,-3,0 186,206,400]1 [A S.1 75,80,0 10,0,0]2
[U A.2 70,70,200 500,100,-50]3 [U W.0 47,-3,0 -4,10,0]4
) E3

Table 6 - Table Structure for the LS Battle Management Problem

Level 0:

GENERALIZATION

1) [A S.1 0 0 10 Max] [E O.0 0 Max 30 Max]
   [-Max, Max] [10, Max] [0, Max] -> A.2 M.2

CASE-INSTANCE

[A A.3 140,100,200 -50,-22,-200]1 [A S.1 80,80,0 10,-2,0]2
[E A.2 70,70,200 430,430,0]3 [E S.2 45,2,0 -4,-33,0]4
) A.23 (on S.1) A.24 (on S.1) M.23 (on A.2) M.24 (on A.2)

Table 6 (continued) - Table Structure for the LS Battle Management Problem

Level 1:

GENERALIZATION 2

1) [* * * 0 10] [* S.2 * 0 0]
   $ [* O.0 -Max Max 10 Max] [* A.0 0 Max 500 Max]
   $ E

CASE-GENERALIZATION 1

[A S.1 0 0 0 10] [U S.2 0 0 0 0]
[-Max, Max] [0, Max] [0, Max] $ A.3 E
) [A O.0 -Max Max 10 Max] [U A.0 0 Max 500 Max]
[-Max, Max] [0, Max] [0, Max] $ E

Table 6 (continued) - Table Structure for the LS Battle Management Problem

Level 1:

GENERALIZATION 2

1) [* * * *] [* * * *]
   $ [* S.1 0 0 *] [E O.0 * 30 Max]
   [10, Max] $ A.2 M.2

CASE-GENERALIZATION

[A O.0 -Max Max 10 Max] [U A.0 0 Max 500 Max]
[-Max, Max] [0, Max] [0, Max] $ E
) [A S.1 0 0 10 Max] [E O.0 0 Max 30 Max]
[-Max, Max] [10, Max] [0, Max] $ A.2 M.2

Table 6 (continued) - Table Structure for the LS Battle Management Problem

The definition of the Level 1 generalization language follows from the definition of the Level 0 generalization language and is summarized, along with the other application languages, below. Consider the function:

CDR ([I V.T z1 z2 s1 s2] [I V.T z1' z2' s1' s2']),   (2)

where the functional variables, referring to the current time, applied in any preferably nonredundant combination in lexicographic order, are:

C = Closing Rate: the required closing rate between constrained vehicles in 3-space in 100-meter increments per second. C > 0 implies converging, C < 0 implies diverging, and C = 0 implies parallel. -Max <= C <= Max for [c1, c2] is implied by default. Thus, there is no default assumption. Here, in keeping with convention,

C = -[(u2 - u1)(x2 - x1) + (v2 - v1)(y2 - y1) + (w2 - w1)(z2 - z1)] / sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2).   (3)

D = Distance: the required Euclidean distance between constrained vehicles in 3-space in 100-meter increments. 0 <= D <= Max for [d1, d2] is implied by default.

R = Range: the closest point of approach, i.e., the extrapolated minimum Euclidean distance between the specified vehicles in 3-space in 100-meter increments. 0 <= R <= Max for [r1, r2] is implied by default. (The computation of vehicle ranges is transparent to the user.) Here,

R = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2), for t <= 0;
R = f(t), for t > 0,   (4)

where

f(t) = sqrt([x2 - x1 + (u2 - u1)t]^2 + [y2 - y1 + (v2 - v1)t]^2 + [z2 - z1 + (w2 - w1)t]^2),   (5)

and

t = -[(u2 - u1)(x2 - x1) + (v2 - v1)(y2 - y1) + (w2 - w1)(z2 - z1)] / [(u2 - u1)^2 + (v2 - v1)^2 + (w2 - w1)^2].   (6)

The parametric variables are:

I = Identification, where I is in {A, E, U}
T = Type, where T is in {0 (i.e., any type), 1, 2, 3}
V = Vehicle, where V is in {A, M, O (i.e., any vehicle), S, U}
x, y, z = the absolute spatial coordinates of a vehicle, where z1, z2 define a locally valid interval generalization
u, v, w = the relative rate of change, expressed in meters per second, of a vehicle's spatial coordinates
s = the absolute speed of a vehicle (s = sqrt(u^2 + v^2 + w^2)), where s1, s2 define a locally valid interval generalization

THE NEXT-GENERATION EXPERT SYSTEM

Table 6 depicts an RSCL method specifically designed for the LS battle management problem. It is followed by a brief description of the example as well as a discussion of the relevant mechanics.

The Descriptive Languages

The Case-Instance column in Table 6 lists several battlefield scenarios that depict the LS battle management problem. Here, alternative generalizations have been approximately ordered according to Occam's razor [3]. That is, the most elegant of competing theories is to be favored. LS evolves and retains the most elegant generalizations on every level. Two implications are used: an active one, "->", and a passive one, "$". The axes of the coordinate pairs may be partially instantiated. For example, the Level 0 generalization [E U.0 -10 -9 0 0] refers to any type of enemy submarine, which is currently stationary at an inclusive depth between 1,000 and 900 meters. The x-y coordinates are not relevant. Most instances of the functional variables are shown with their default values for the sake of simplicity. Any nontrivial optimization is to be effected by an expert n system. All optimization within LS is currently of a trivial nature (e.g., [0, Max] and [1, Max] imply [1, Max]). The optimization of vehicular allocations is, however, nontrivial.

The Level 0 Case-Instances

The Case-Instance column in Table 6 presents the following battle scenario. Initially, the theater contains an allied aircraft carrier. A stationary ship or


submarine whose intentions and allegiance are not clear is introduced shortly thereafter. Therefore, the alliance assumes the unknown ship or submarine to be a foe and launches a reconnaissance aircraft from the deck of the carrier to confirm the supposition. The reconnaissance aircraft rapidly confirms the supposition (i.e., see the third case, where the unknown vehicle is determined to be an enemy battleship). (Apparently, EW countermeasures were picked up once a direct line of sight was established.) The nongeneralizable response subscript refers to the case-vector having the same denotation. The Cartesian coordinates are instantiated from every associated matching case. A fighter aircraft of unidentified disposition is subsequently detected on carrier radar. This fighter is determined to be unfriendly based upon very close range surveillance by the previously launched reconnaissance aircraft. The fighter appears to be following the reconnaissance plane before the latter's landing upon the deck of the carrier. The landing is uneventful. The carrier launches one or more fighters, each containing one or more automatic standard missiles, subsequent to the identification of two hostile targets. These missiles are targeted for the enemy battleship. A transparent computation predicts the point of


impact based upon the intersection of all extrapolated trajectories. Only missiles may change their z-sign in flight. In all other cases, attempting to effect a z-sign change will result in the z-coordinate being fixed at zero. The carrier simultaneously launches one or more fighters, each containing one or more automatic standard missiles, which are targeted for the enemy fighter. Notice that similar weapons may be simultaneously directed at distinct targets, but the response must be orthogonal to facilitate generalization. The carrier does not turn to pursue the battleship in this brief simulation. However, any vehicle's direction and velocity can be preprogrammed by the user at any time and are otherwise randomly reset upon encounter with a spatial boundary. The latter action is a form of "simulated annealing" and facilitates the acquisition of analogous knowledge. Vehicles are allowed to "pass through" each other in view of the coarseness of the geometric scale.

The CDR function allows allies, enemies, and vehicles of unidentified disposition to be mapped onto any disposition. However, instances must be mapped onto themselves to preserve continuity. The implied ordering here is lexicographic. All responses have been ordered to facilitate the operations of the pattern matcher. A consistent internal precedence is established for the pattern matcher to preserve continuity. One of the advantages of LS is that it permits the use of relatively large case-instances. A case-instance consisting of ten allies, ten enemies, and ten unknowns, for example, is not unusual. The user interface checks the user's entries for some possible contradictions. Such contradictions include ships or planes that are below sea level, submarines that fly, etc. The remaining details are assumed to be self-explanatory.
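The user-interface contradiction checks might be sketched as follows; the particular rule set (ships at z = 0, submarines at z <= 0, aircraft at z >= 0) is an assumption consistent with the examples in the text, not code from LS:

```python
# Illustrative consistency check in the spirit of the LS user interface.
# The vehicle letters follow the paper (A = aircraft, S = ship,
# U = submarine); the rules themselves are assumptions.
def entry_contradictions(vehicle, z):
    """Return a list of contradictions for a vehicle at depth/altitude z."""
    errors = []
    if vehicle == "S" and z != 0:
        errors.append("ship off the surface")
    if vehicle == "U" and z > 0:
        errors.append("submarine that flies")
    if vehicle == "A" and z < 0:
        errors.append("aircraft below sea level")
    return errors

print(entry_contradictions("U", 100))  # ['submarine that flies']
```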

The Level 0 Generalizations

Both generalizations of the first case compute the functional variables for an allied aircraft carrier not exceeding 10 meters per second and any unidentified stationary battleships (ships in general), a subset of ships or submarines, presently anywhere on the surface. The constraints on the functional variables are given by default, for purposes of this example, unless otherwise indicated. The first generalization targets the assumed enemy ship for reconnaissance just in case it lies at a distance within the bounds specified in the third row of Table 6. All generalized responses, at either level, refer to the second row of the generalization. The second-row Level 0 generalizations, in turn, acquire targeting coordinates from the associated matching case-instance rows. This is useful in predicting an appropriate response scenario. The carrier is not a response vehicle and thus does not enter into the computation of the effective strike distance. The second of many possible generalizations has the same effect as the first, but only in case an allied ship, diverging from an enemy ship, can be found. The degree of divergence can be specified. The effect iterates over all cases that satisfy the specified constraints.
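The functional variables C and R used by these generalizations reduce to a few lines of vector arithmetic. The sketch below follows equations (3) through (6), with the sign convention chosen so that C > 0 means converging, as the text specifies; the function names are illustrative:

```python
import math

def closing_rate(p1, v1, p2, v2):
    """Equation (3): relative speed projected onto the line of sight.
    Positive when the two vehicles are converging (the paper's convention)."""
    dx = [b - a for a, b in zip(p1, p2)]
    dv = [b - a for a, b in zip(v1, v2)]
    dist = math.sqrt(sum(d * d for d in dx))
    return -sum(u * d for u, d in zip(dv, dx)) / dist

def cpa_range(p1, v1, p2, v2):
    """Equations (4)-(6): extrapolated minimum (closest-point-of-approach)
    distance between two vehicles; current distance if they are separating."""
    dx = [b - a for a, b in zip(p1, p2)]
    dv = [b - a for a, b in zip(v1, v2)]
    speed2 = sum(u * u for u in dv)
    # Equation (6): time of closest approach under straight-line motion.
    t = -sum(u * d for u, d in zip(dv, dx)) / speed2 if speed2 else 0.0
    if t <= 0:                        # diverging or static: use current range
        return math.sqrt(sum(d * d for d in dx))
    # Equation (5): distance after extrapolating both trajectories by t.
    return math.sqrt(sum((d + u * t) ** 2 for d, u in zip(dx, dv)))

# Two vehicles head-on: closing rate 2 m/s, CPA distance 0.
print(closing_rate((0, 0, 0), (1, 0, 0), (10, 0, 0), (-1, 0, 0)))  # 2.0
```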

Both generalizations of the second case compute the functional variables for any allied vehicle presently located anywhere and moving at a speed of at least 10 meters per second, and any unidentified vehicle (i.e., aircraft or aircraft and missiles, respectively). The first generalization assumes that the mere existence of the unidentified aircraft, moving at a speed of at least 500 meters per second, is just cause to assume that it is an enemy. The second generalization assumes that any unidentified vehicle moving at a speed of not less than 10 meters per second is an enemy. Notice that neither generalization subsumes the other, nor is there any interaction with the preceding case.

The generalization of the third and final case computes the functional variables for any allied aircraft carrier located in any surface direction and moving at a speed of at least 10 meters per second, and any enemy vehicle not located beneath the surface and doing at least 30 meters per second. Here, enemy vehicles targeted for response must be positioned more than 1,000 meters from the allied carrier. An anomalous situation would otherwise exist. The launch of attack fighters that contain automatic standard missiles is initiated. These missiles are destined for all enemy targets within striking distance, as previously explained. Notice that the generalization language does not permit such generalizations as "aircraft or submarine" or "Type 1 or 3 vehicles." This is an appropriate design concession and can often be circumvented, where necessary, through the prudent choice of altitude and speed constraints. For


example, the third block of Level 0 generalizations specifies the position and velocity of an enemy aircraft or ship using non-negative bounds upon the z-coordinates and a speed of at least 30 meters per second, respectively. Cases can be random with respect to any chosen generalization language. Relevant generalizations may not exist in the implemented language. This possibility can be minimized in practice through the selection of orthogonal cases. For example, the third case-instance should not be augmented with the response (say, "M.33") unless its orthogonal response (namely, "M.34") is included. The potential for effective generalization is also proportional to the number of expert n-1 systems in an expert n system as recursively defined.

The search space for Level 0 generalizations is clearly intractable without the use of an expert stub. However, the Level 0 stub reduces the search space to 3*A*(E + U) possible generalizations per Level 0 case. Here, the three letters refer to the number of IFF dispositions of that type, from which to choose, in the associated case-instance in LS. Figure 4 presents the Level 0 expert n stub:

R [0]

/* An expert n stub for the Level 0 rule base. */
/* Three states are possible. */

V in {case_instance, 'O'}
T in {case_instance, 'O'}

T != 'O' -> V != 'O'
V = 'S' -> z1, z2 = 0

Expand the bounds of [c1, c2], [d1, d2], [r1, r2], [z1, z2], and [s1, s2] as necessary to cover each case instance minimally as supplied in sequence by the control processor. Intervals are split into two distinct intervals where the introduction of a discontinuity would have otherwise caused the loss of the existing interval. This is always possible.

Arbitrarily select among the allied vehicles for row one. If there are no allied vehicles, then no generalization is possible.

Arbitrarily select among the enemy vehicles and vehicles of unknown disposition for row two. If there are no such vehicles, then no generalization is possible.

Figure 4 - The Level 0 Expert n Stub

RSCL methods provide for truth maintenance by way of update operations. They also evolve ever more general generalizations on every level.

The Level 1 Case-Generalizations

Level 1 case-generalizations are temporally associated (i.e., containing the Most Recently Fired or MRF rule) Level 0 generalizations, which may or may not be analogous under domain-specific transformation. Only the analogous case-generalizations are ever retained, since the retention of a case-generalization implies the retention of a generalization of it. Again, Table 6 depicts a possible battle scenario. Here, it is implicitly assumed that the first generalization of the first case fires, followed by the first generalization of the second case, followed by the generalization of the third case.

The Level 1 Generalizations

The generalization of the first case-generalization approximates the most specific one possible in this instance (i.e., using the Level 1 expert n stub; see below). Similarly, the generalization of the second case-generalization is the most general one that could be given. It is depicted for pedagogical purposes, since it embodies several contradictions on the first case-generalization. Notice that "S.1" is not a contradiction on "O.0", since it is a subset. In actuality, no generalization is retained here. This is a direct consequence of the foresight provided by the stub. The reflective processes (see below) can expand upon the set of realizable generalizations. Level 1 generalizations should be as general as possible, but not more so.

Asterisks are "wild cards" that match any instance in the corresponding position. This instance will be bound to the asterisk unless a different instance is specified subsequently. A single asterisk is used to denote "V.T" or any ordered pair of upper and lower bounds. This compression eliminates redundancy with the Level 0 generalization language and reduces the magnitude of the search space.

It will be observed that every position has two not necessarily distinct states associated with it, namely, the most specific state and the most general state. If there are n positions, then there are 2^n possible states, where n is set to an upper limit of 24 in LS. This implies that there are about 4^12 (i.e., over 16 million) possible generalizations per Level 1 case in LS. The four ordered maps are <*,*>, <*,c2>, <c1,*>, and <c1,c2>. The search space can be reduced to an upper limit of 2^12 (i.e., 4,096) possible generalizations (i.e., <*,c2>, ...). This is possible with the assistance of the Level 1 expert n stub provided. The search space is, on the average, much smaller. The search space can be reduced if simple training instances are presented initially or if the search is terminated upon the discovery of a generalization (i.e., with proportionately less general resultant rules initially). Progressively more complex instances can be saved for later, when progressively more efficient use can be made of computational resources (i.e., crystal learning). Figure 5 presents the Level 1 expert n stub.

R [1]

/* An expert n stub for the Level 1 rule base. */
/* Note: The ordered map ... is never used. */

/* Consider corresponding antecedent, aj, and consequent, cj, positions in a case-generalization. The following rules also apply to the response pairings, which are treated as antecedent and consequent sets for the determination of equivalence and subset membership. */

Else /* Note: 'E' is a subset of 'U' */
aj subset of cj | cj subset of aj | [z1, z2] = [-Max, Max] | [s1, s2] = [0, Max] | [c1, c2] = [-Max, Max] | [d1, d2] = [0, Max] | [r1, r2] = [0, Max] -> <*,c2>
Else <*,c2>

Figure 5 - The Level 1 Expert n Stub

Breadth and Depth of Languages

It is very important to note, as can be proven, that the definition of transformational analogy (i.e., all RSCL levels taken into simultaneous consideration) requires the use of mutually random generalization languages on distinct levels. For example, a generalization language that merely incorporates distinct functional variables (i.e., and implied conditionals) would not qualify as a mutually random language, because it can be viewed as an extension of a similar implemented language. In LS, for example, the Level 1 generalization language uses asterisks for the designation of positional generalizations, while the Level 0 generalization language does not

use them or their isomorphic equivalent. It follows that the two levels of generalization language are mutually random. All levels of generalization language, within a case-instance domain, must be mutually random. If two contiguous levels of generalization language are not mutually random, then the reflective process reduces to the special case of self-reflection. No Level 2 generalization language has been enumerated. This does not, in itself, preclude the existence of one, but rather serves to suggest that the search for one would not be presently justifiable on an empirical basis. Just as it is better to formulate fine-grained rules (which are justifiably replaced rather than updated), it is better to "atomize" generalization languages to reduce their singular generality, while increasing their collective generality. Atomization also renders the random search-space tractable. The atomization of generalization languages implies the same for the case-description languages. Cases

must necessarily be restricted to the detailing of salient features to the greatest possible extent. It follows that only one generalization language need be paired with any case-description language at any time. The simplest among competing generalization languages is to be preferred, according to the dictates of Occam's razor. It should be possible to define the task of creating domain-specific languages, for the atomization of cases or their generalizations, as a domain for expert n systems. For example, in the knowledge-based design of a collocating function-description language, the following two initial rules are germane:

R1: If data is periodic, then function is of transcendental form.
R2: If data is nonperiodic, then function is of polynomial form.

The possibility of cross-domain transference (see below) is contingent upon (and proportional to) the use of similar languages in mutually random domains.
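R1 and R2 amount to a one-line rule base; the sketch below is purely illustrative, with the periodicity test left to the caller:

```python
# The two germane rules R1/R2 from the text, as a trivial rule base for a
# collocating function-description language. Whether the data is periodic
# is assumed to be determined elsewhere (a placeholder assumption here).
def function_form(data_is_periodic: bool) -> str:
    # R1: periodic data -> transcendental form.
    # R2: nonperiodic data -> polynomial form.
    return "transcendental" if data_is_periodic else "polynomial"

print(function_form(True))  # transcendental
```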


The greatest proportion of effort required to implement an expert n system is tied to the definition of constituent domain-specific languages. This effort is, however, at least an order of magnitude less than would be required for the definition of a sizable base of rule-instances. It can be shown that the general automation of the language-acquisition task is attainable only if realized as an expert n domain. That is, ENCs are required for the definition of themselves, where n is arbitrary.

The RSCL Method

Next, an overview of the operational mechanics of RSCL is provided. The discussion is somewhat more general than is required by the scope of the LS battle management system. Nevertheless, the following information should prove useful in the implementation of future next-generation expert systems, in addition to LS.

RSCL Mechanics

First, the acquisition mode is entered. Initially, all levels in Table 6 are empty. The user begins by entering an antecedent case-instance context (i.e., theater definitions) on Level 0. The system will momentarily switch into predictive mode in an attempt to generate a consequent case-instance (i.e., an appropriate response scenario) through the application of any Level 0 generalizations. Subsequent to this attempt, the system will switch back into acquisition mode and query the user for the correct response in the event that the generated consequent differs from that desired. Replies are saved as case-instances. Generalizations found to be in error are expunged. The firing of any generalization results in an update of its time stamp. This is in keeping with an MRF cache-management policy. Cases are never fired because they are represented in a distinct language, the interpretation of which adds nothing that could not, in principle, be captured in the generalization language. The least recently acquired nongeneralized cases on Level 0 are expunged, followed by the least recently fired generalizations, if it becomes necessary to reclaim memory. The same algorithm applies to higher-level generalizations, with the exception that here, case-generalizations cannot exist without having an associated generalization. The concept extends recursively to any cache hierarchy. The cases (rules) in each cache can be maintained as an indexed hierarchy of constraints (predicates), dynamically defined. These indices not only streamline the discovery of constraint violations (i.e., on all levels) but facilitate the interpretive pattern matching process as well. Constraints (predicates) advance in the hierarchy in direct proportion to the relative frequency of their exercise. The system will search for Level 0 generalization rules, satisfying the definition of transformational analogy, whenever the user supplies a proper consequent. The generalization rules explain the case-instance mappings. These rules must not violate any case-instance constraints. In addition, cases and generalizations can be subsumed. They are thus replaced. This iterative process is limited by the domain-specific expressive power of the generalization languages implemented at

each level. Any update operations, where they exist, will propagate to subsequent levels. There may be from zero generalizations (on Level 0; one elsewhere) to many generalizations per case saved in the associated generalization caches. Multiple versions of the generalization rules are allowed so long as each alternative involves exactly one active implication. This restriction is necessary so that each alternative satisfies the definition of transformational analogy. It follows that each distinct level embodies mutually random domain-specific transformation rules. These rules are fired in parallel as part of the search and rule-refinement process. Here, search refers to the search through the problem domain space, including the induction of new search rules (see below). The acquisition of a Level n-1 generalization induces the creation of a Level n case antecedent. The system will momentarily switch into predictive mode in an attempt to generate a case consequent (i.e., a rule analog for possible retention on Level n-1). This is accomplished through the application of any Level n generalizations for which the associated case antecedent does not match the context. No generalization may map to a violation of any case constraint on the immediately preceding level. Generalizations are expunged as necessary to ensure compliance with this requirement. The loss of a Level n-1 generalization, where the rule is a Level n case predicate, induces the loss of the indicated Level n case(s), their associated generalizations, and so on at successive levels. That is, the generalizations are expunged by the root.
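The MRF cache-management policy (firing refreshes a rule's time stamp; reclamation expunges the least recently fired entries first) can be sketched as follows; the class and method names are illustrative, not from the FORTRAN-Plus implementation:

```python
import itertools

# Sketch of the MRF (most-recently-fired) cache-management policy: firing
# a generalization refreshes its time stamp, and reclamation expunges the
# least recently fired entries first. All names are illustrative.
_clock = itertools.count()

class GeneralizationCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.stamp = {}                      # rule -> last-fired time

    def acquire(self, rule):
        self.stamp[rule] = next(_clock)
        while len(self.stamp) > self.capacity:
            lru = min(self.stamp, key=self.stamp.get)
            del self.stamp[lru]              # expunge least recently fired

    def fire(self, rule):
        self.stamp[rule] = next(_clock)      # MRF: refresh the time stamp

cache = GeneralizationCache(2)
cache.acquire("g1"); cache.acquire("g2")
cache.fire("g1")                             # g2 is now least recently fired
cache.acquire("g3")                          # forces eviction of g2
print(sorted(cache.stamp))  # ['g1', 'g3']
```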


Random vs. Crystal Learning

A process is said to be serializable if it can be decomposed into an ordered set of subprocesses, members of which, once solved, remain inviolate [18]. The process of generalization is not generally serializable. This is the case for LS. Thus, a constrained uniform random search of the generalization space is to be effected. It follows that no seed generalization is, in principle, precluded from discovery. A hybrid SIMD/MIMD constrained search algorithm, which makes use of expert n stubs, has been written. It is readily modified for implementation on pure SIMD or MIMD architectures. The essential idea is to improve search efficiency by pruning the search space with the stubs rather than by employing a post-search filter. Processor generalizations are pruned upon the realization of the first constraint violation and are immediately replaced with new generalizations at random. Interprocessor communication is synchronized by a control processor. Pure MIMD machines offer improved efficiency using asynchronous communication (i.e., data-flow architectures). More than one seed generalization may be discovered per case. The genetic diversity of the seed generalizations serves to enlarge the space of crystal generalizations. If a generalization results in a constraint violation, then the generalization is not made. Otherwise, the generalization is made and retained, if proper. The probability of generating and retaining valid rules and expunging invalid rules increases with scale. It is worth noting that genetic algorithms [15] do not converge with scale. The process of crystal generalization can be indirectly applied to the task of automating the acquisition of domain-specific languages with expert n systems. Conversely, the process of crystal generalization is bootstrapped through the application of expert n systems, where next-generation expert systems, such as LS, lie at the basis.

The entire seed generalization process is iterated for a variable amount of time, which is uniform per case or case-block. The time limit, t, is initially set just long enough so that, on the average, the entire generalization space (or at least a tractable subset) is searched for one typically complex case. (The imposition of an order upon the training instances allows for typically simple cases to be used at the outset and for t to be subsequently revised upwards.) From then on, t is equally divided over the generalized cases as they aggregate so that, on the average, given n generalized cases on a given level, possibly redundant seed (i.e., basis) generalizations will be found. That is, the quantum, or allowed time for generalizing a case before switching contexts (i.e., polling to fire another rule) or dreaming (see below), is t/n. Any previously unused quantum slice is ignored. A lower limit, l, is set to prevent thrashing. Thus, a generalization will not be immediately attempted if the quantum is less than the lower limit. That is, l is the quantum for every t/l cases (i.e., rounded off) if l > t/n. The proof of these claims stems from the fact that there are r(r - 1) possible distinct ordered pairs, given r random rules.
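The constrained random search described above (generate candidates, prune on the first constraint violation, retain the proper survivors as seeds) can be sketched sequentially; all names are illustrative, and the real system distributes this work across parallel processors:

```python
import random

# Sketch of constrained random seed-generalization search: candidate
# generalizations are drawn at random and pruned on the first constraint
# violation (the role played by the expert stubs); survivors are retained
# as seeds. Illustrative only.
def seed_search(random_candidate, constraints, tries, rng):
    """Return the candidates that satisfy every constraint."""
    seeds = []
    for _ in range(tries):
        cand = random_candidate(rng)
        # all() short-circuits, so a candidate is abandoned at the
        # first violated constraint.
        if all(ok(cand) for ok in constraints):
            seeds.append(cand)
    return seeds

rng = random.Random(1)
candidate = lambda r: (r.randint(0, 9), r.randint(0, 9))
constraints = [lambda c: c[0] <= c[1]]    # e.g., bounds must form an interval
seeds = seed_search(candidate, constraints, 20, rng)
```

More than one seed may survive per batch, mirroring the observation that more than one seed generalization may be discovered per case.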

The process of generalization is not invertible. That is, if the subsequent acquisition of any case induces a violation in any generalization, then that generalization must be expunged (although it may have already served a useful purpose; see below). The quality of generalizations improves with time. This improvement affects all aspects of system performance. Several details have been omitted from this discussion inasmuch as they are implementation-dependent decisions. Level 0 rules are fired at random (i.e., without contextual effects) if, upon the expiration of a quantum, no rules would otherwise fire. The random variable bears proportion to a rule's time stamp. The more recently fired the rule, the proportionately higher the probability that it will be fired again. This skew effectively preserves the firing sequence within statistical limits. Here, the resting system is said to be in the "dream state." Dreaming perturbs the system so that the reflective processes (see below) can create new rules. These rules will be most similar to the MRF rules. Thus, dreaming tends to increase the utility of the system. If the dream state is of relatively short duration, then the resting system is said to be "daydreaming." Daydreaming buffers throughput to maximize the utilization of system resources.
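The recency-skewed random firing of the dream state might look as follows; the linear weighting on time stamps is an assumption, since the text only requires that more recently fired rules be proportionately more likely to fire again:

```python
import random

# Dream-state firing sketch: rules fire at random, with probability skewed
# toward more recent time stamps, preserving the firing sequence within
# statistical limits. The linear weighting scheme is an assumption.
def dream_fire(stamps, rng=random):
    """stamps: rule -> last-fired time (larger means more recent)."""
    rules = list(stamps)
    base = min(stamps.values())
    weights = [stamps[rule] - base + 1 for rule in rules]   # recency skew
    return rng.choices(rules, weights=weights, k=1)[0]

stamps = {"r1": 3, "r2": 10, "r3": 7}
picked = dream_fire(stamps, random.Random(0))
```

Here "r2", the most recently fired rule, is eight times as likely to fire as "r1" under this particular weighting.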

Dreaming is interleaved with the seed generalization process. The percentage of time allocated to the latter is proportional to 1/(1 + g), where g is the number of generalized cases on the current level. Notice that random generalization is the driving force at the outset. This is also true of biological evolution. It decreases proportionately in importance with time (i.e., concomitant with the increase in importance of crystal generalization).

ISA TRANSACTIONS, VOLUME 31, NUMBER 2, 1992

Scalability of RSCL Methods

RSCL methods are best mapped onto parallel hardware because they are capable of keeping all processors busy doing productive work. Also, the efficiency of the rule generation and validation process is directly proportional to the scale of the system. Therefore, RSCL methods are potentially well-suited to complex real-world environments. These environments are expected to provide roughly one thousand times the processing speed of a Cray 2 (i.e., in the teraflop range) by the mid to late 90s. A further gain in processing speed of several orders of magnitude can be had using Spatial Light Modulators (SLMs) for heuristic pattern matching. What is more important, the processing power provided by these environments is essentially limited only by architectural geometry (i.e., the locality of memory reference).

Reflection

Reflection is defined (i.e., by analogy to the common optical transformation) to occur whenever rule analogs are passed back to the immediately preceding level. They are verified and subsequently saved there as generalization-rules if proper and nonredundant. A reflection occurs whenever a generalization-rule is fired on Level n > 0. Knowledge induced by reflective processes is said to be crystallized. Observe that it is redundant to reflect back a case-consequent by definition. It follows that the process of reflection should exclude cases. Any applied generalization-rules, for which the generated rule analogs violate any case constraints on the immediately preceding level, must be expunged. That is, no image under transformation may violate any case constraints on the immediately preceding level. Reflection is all the more important, since the case-base is dynamic. The deleterious effects of cycles (see below) are avoided through the requirement that a rule may be iteratively reflected if, and only if, each reflection finds a home. This methodology allows for the localization of erroneous transformations. Case-generalizations are not created in the event of a proper reflection. Otherwise, the acquisition of the case-generalizations is initiated by the activation of a generalization-rule on the immediately preceding level. This rule will serve, in turn, as a case-consequent and a case-antecedent. This is necessary since transformational analogs are not generally symmetric. The working set of generalizations supplies the case-antecedents and case-consequents, respectively. That is, if there are n - 1 generalizations in the working set (i.e., the nth generalization is

the MRF rule), then twice that number of candidate case-generalizations will be created for the nth level. One set of case-generalizations will have the same consequent and the other the

same antecedent (i.e., the MRF rule). Any heuristic technique, such as an expert^n system, for pruning case predicates can be equivalently incorporated into the heuristic technique for delimiting the generalization space. The space associated with each case-generalization is simultaneously searched for a generalization (i.e., transformational map). Statistically better results are obtained if each case-generalization is distinct. This follows from the observation that most case-generalizations cannot be generalized and because the same generalization will be created many times through random search in the different processors. Candidate generalizations are checked against the existing base of case-generalizations. Therefore, the base of case-generalizations grows through the acquisition of a case-generalization and associated proper generalization(s). Proper reflected rule analogs, like any other generalization, tend to evolve to a maximally general form with respect to a level's employed generalization language. Indeed, an expert^n system will evolve ever greater capabilities for information compression, as can be shown for increasing values of n.
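The doubling of candidate case-generalizations around the MRF rule can be sketched as follows; representing a generalization as an (antecedent, consequent) pair is an assumption made for illustration.

```python
def candidate_case_generalizations(working_set, mrf_rule):
    """Build candidate case-generalizations for the next level.

    working_set: (antecedent, consequent) generalizations on the current
    level; mrf_rule: the most recently fired rule. Twice as many
    candidates as working-set entries are produced: one set shares the
    MRF consequent and the other shares the MRF antecedent.
    """
    mrf_ante, mrf_cons = mrf_rule
    same_consequent = [(ante, mrf_cons) for ante, _ in working_set]
    same_antecedent = [(mrf_ante, cons) for _, cons in working_set]
    return same_consequent + same_antecedent
```

For a working set of n - 1 generalizations this yields 2(n - 1) candidates, matching the count given in the text.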

Rules may be indexed (i.e., especially in a serial processing environment) to facilitate retrieval operations. A good indexing scheme involves several hierarchies of caches, which can be updated through the application of an MRF methodology. The overall algorithm uses forward chaining, although a backward-chained or a bi-directional system is similarly possible. If a rule analog is expunged from a generalization cache, then it is required that this operation propagate recursively on all subsequent levels. This requirement ensures that any automatically generated transformation rule will have its associated case antecedent and consequent in the same generalization cache (i.e., on the immediately preceding level). Such "bounding" can be shown to be associated with a higher quality of transformation rule. It should be added that cases are not subject to any reflective process. This follows from the fact that generalizations yield more general results through the same process. A case-description language is random with respect to a generalization language (also by definition), implying that cases can never serve as their own generalizations. Furthermore, Level 0 case-instances, which are not generalized, are anomalies. They are first to be expunged. (All retained case-generalizations must be associated with at least one generalization, by definition.) Finally, the presence of noise and uncertainty (i.e., including the uncertainty surrounding the capture of all salient features in a case-context) implies the need for a case-update mechanism. Therefore, it is not a good idea to propagate cases, so that the propagation of errors can be avoided. All levels of cases and generalizations in the table are subject to dynamic creation and destruction at run time. In particular, the existence of the nth defined level in the table (i.e., for n > 0) implies

that at least one pair of generalizations is retained on Level n - 1.
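A hierarchy of MRF-updated caches could be built from units like the following sketch; the class name and eviction details are illustrative assumptions, not the paper's implementation.

```python
from collections import OrderedDict

class MRFCache:
    """A fixed-size rule cache updated by a most-recently-fired policy.

    Firing a rule moves it to the front; when the cache overflows, the
    least recently fired rule is evicted. Several such caches can be
    stacked into the hierarchy of caches described in the text.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self._rules = OrderedDict()

    def fire(self, rule_id, rule):
        if rule_id in self._rules:
            self._rules.move_to_end(rule_id, last=False)
        else:
            self._rules[rule_id] = rule
            self._rules.move_to_end(rule_id, last=False)
            if len(self._rules) > self.capacity:
                self._rules.popitem(last=True)  # evict least recently fired

    def lookup(self, rule_id):
        return self._rules.get(rule_id)
```

The MRF update keeps retrieval fast for the rules most likely to fire again, which is the point of the statistical skew described earlier.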

Self-Reflection

The process of expanding the reliant knowledge-base segment is accomplished through the application of transformation rules to themselves, i.e., self-reflection. For example, the grammar rule bb→bc augments the pair of grammar rules a→b and ab→ac. Only the transformatively generated rule can act upon the context bb, say (i.e., if retained as a case-generalization). Rules may not be right recursive. This possibility can arise only on Level 1 generalizations in LS. Here, generalizations are similar to grammar rules as used for transformation. The MRF rule is simultaneously applied to all generalizations (i.e., in the same language) on the same level in the memory (i.e., the working set) and vice versa. This process facilitates the augmentation of the knowledge base (i.e., up to a tripling of the size of the working set) with transformatively derived rules on the same level. If any of these induced rules is right recursive, then the generating pair contains a cycle. In this event, the non-MRF rule is removed, removing the cycle. Notice that this process will not generally trap cycles. For example, the rule a→b is trivially acyclic. Integrating the rule b→c results in the induction of the rule a→c. This last rule is lost, since it is in conflict with the case-generalization for the rule a→b. Next, integrating the rule c→a with the two remaining acyclic rules results in a total of three retained rules. These three rules are cyclic, as was to be shown. Notice that it would be totally nonproductive to treat the most recently acquired (recursive) rule c→a as an MRF rule. Any process for general cycle detection in a Type 0 grammar is NP-hard (i.e., in the number of states in the digraph). Thus, excepting the special case detailed above, no effort to remove cycles can be justified on a theoretical basis. Any latent cycles may surface and be remedied through the normal course of exercising the system in predictive mode. Level 0 cycles have been precluded from occurring in LS through the prudent design of the same generalization language.
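The worked example (a→b and b→c inducing a→c, with c→a then completing a cycle) can be reproduced with a toy rule-composition sketch. It handles only single-symbol rewrite rules, consistent with the text's note that general cycle detection in a Type 0 grammar is intractable; all names are illustrative.

```python
def compose(rules):
    """Transitively compose rewrite rules, e.g. a->b with b->c induces a->c.

    rules: a set of (lhs, rhs) pairs over single symbols. Returns only
    the newly induced rules.
    """
    induced = set()
    for lhs1, rhs1 in rules:
        for lhs2, rhs2 in rules:
            if rhs1 == lhs2 and (lhs1, rhs2) not in rules:
                induced.add((lhs1, rhs2))
    return induced

def has_cycle(rules):
    """Depth-first search for a directed cycle in the rule digraph."""
    graph = {}
    for lhs, rhs in rules:
        graph.setdefault(lhs, []).append(rhs)

    def visit(node, stack):
        if node in stack:
            return True
        return any(visit(nxt, stack | {node}) for nxt in graph.get(node, []))

    return any(visit(lhs, frozenset()) for lhs in graph)
```

Composing {a→b, b→c} induces exactly a→c; the rule set stays acyclic until c→a is integrated, at which point the three retained rules form a cycle, as in the text's example.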

Results

The key metric used in evaluating the merit of this RSCL implementation is the critical ratio (c.r.) of the number of crystallized rules retained on Level 0 to the number of randomly generated rules retained there. Level 0 generalizations implicitly account for the transformational effects that occur on all higher levels.

Analysis

It will be seen that results with the LS prototype fit the developed model. Moreover, it will be graphically demonstrated that the critical ratio exceeds unity at some point, which is a function of the inherent symmetry of the application domain. Let

c = the number of crystallized rules retained on Level 0,
k = a domain-specific constant of proportionality,
n = the number of case instances, and

r = the number of randomly generated rules retained on Level 0.

First, it is to be observed that c is equivalent to the product of three things. The first item is the maximum number of additional rules that can be generated from the reflection of the initial random rule-set. The second item is the constant, k. The last item is the probability that a crystallized rule can "bubble up" and find a home on Level 0. There can be no means to detect the difference between rules generated by crystallization and random processes from the rules themselves. This is by definition. It follows from the inability to identify crystallized rules, in conjunction with the recursive definition of the contraction defined in the Appendix, that the theoretical upper bound on the total number of rules, c, which can be crystallized from r rules, is given by Σ_{i=1}^{r-1} i. Here, iteratively minimal contractions in the number of crystallized rules are summed. A simplification, where r > 1, leads to result (7):

c ≤ (1/2) r (r - 1)    (7)

A helpful observation is that (7) is equivalent to half the number of possible permutations of r rules taken in pairs. It follows from (7) that

c.r. ≤ (r - 1) / 2    (8)

If the critical ratio exceeds unity, then more knowledge is generated by the efficient processes of crystallization than by much more costly random processes. Furthermore, it follows that, wherever the critical ratio exceeds unity as a minimal condition, crystallized rules are iteratively induced from themselves. This implies that, subject to the upper bound imposed by (7), the more knowledge that is crystallized, the more knowledge that can be crystallized in an expert^n system. The constant k is proportional to the inherent domain symmetry. Perfectly random domains do not exist in theory [3]. That is why k can at best approach zero for nontrivial random domains. Conversely, perfectly symmetric domains also do not exist in theory. That is why k can at best approach one half for nontrivial symmetric domains. It follows from the above discussion that 0 < k < 0.5. The likelihood that a crystallized rule finds a home on Level 0 is given by the probability ratio:

p = (√n - 1) / √n    (9)

The number of generalizations per case-instance tends towards unity as n grows very large. This follows from the fact that each of n case instances is accorded a domain-specific representation, implying that, likewise given a distinct domain-specific generalization language, the proper generalization is unique where it exists. Furthermore, case-instances that do not have generalizations are volatile and thus do not enter into the effective computation. The large magnitude of n implies that the rejection rate for improper rules and, conversely, the retention rate for proper rules tends towards unity. It follows from (7) and (9) that

c < k r (r - 1) p    (10)

where p is independent of c and r. Combining results, it is found that

c < k (r - 1)(√n - 1),  n > 1    (11)

where the constraining surfaces are explicitly given by (11), (12), and (13) as well as implicitly by (7). Here,

r = √n    (12)

c.r. = c / r    (13)

It also follows from (11) and (12) that

c.r. ≤ k (√n - 1)² / √n,  n > 1    (14)

All variables are constrained according to the above. Figure 6 depicts a hypothetical random domain for which k = 0.01.

Figure 6. A Hypothetical Random Domain

Observe that the critical ratio exceeds unity if, and only if,

k (√n - 1)² / √n > 1,  i.e.,

n exceeds 10,400 case-instances (14). This implies that a comparatively large-scale expert^n system would be required for functionality here. Figure 7 depicts a hypothetical symmetrical domain for which k = 0.49. Observe that the critical ratio exceeds unity if n exceeds 14 case-instances (14). This implies that a comparatively small-scale expert^n system would be well-suited here.

Figure 7. A Hypothetical Symmetrical Domain

It is observed that the greater the domain symmetry, the fewer the number of cases that are required to enable the critical ratio to exceed unity. Unity may be substituted for the critical ratio (14) to yield the equality (15). Equation (15) defines Figure 8.

k (√n - 1)² / √n = 1    (15)
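Result (14), as reconstructed here, can be checked numerically; the function names are illustrative, and the computed thresholds land near the text's figures of roughly 14 and 10,400 case-instances.

```python
import math

def critical_ratio_bound(k, n):
    """Reconstructed result (14): c.r. <= k * (sqrt(n) - 1)**2 / sqrt(n), n > 1."""
    s = math.sqrt(n)
    return k * (s - 1.0) ** 2 / s

def min_cases_for_unity(k):
    """Smallest n for which the bound on the critical ratio exceeds unity."""
    n = 2
    while critical_ratio_bound(k, n) <= 1.0:
        n += 1
    return n
```

For k = 0.49 the bound first exceeds unity at n = 15 (i.e., once n exceeds 14, as in Figure 7), and for k = 0.01 only once n passes roughly 10,400 (Figure 6).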

Here, the limit as k → 0 is n → ∞. Also, the limit as k → 0.5 is n → 14.

Figure 8. Effect of Domain Symmetry upon the Critical Ratio

The decrease in the minimum number of cases required for the critical ratio to exceed unity is asymptotically limited with increasing domain symmetry, as shown. Empirical results on a DAP-610 Supercomputer using up to n = 4,096 cases, based upon simulated instantiations of a "black box" containing six hidden rules, are consistent with a choice of k = 0.03. This means that the LS domain is more random than symmetric. Here, the critical ratio (14) exceeded unity whenever n exceeded 1,250 case-instances. Figure 9 depicts the critical ratio for LS.

Implications for Automatic Software Generation

Application software domains are usually associated with a relatively large value for constant k. This follows since distinct algorithmic specifications are more often similar than not when realized. To help see this, first define the function:

f(n) = 1 + n(n + 1)/2,  n ≥ 0    (16)

A simple transformation of (16) yields an equivalent result:

f(n) = 1 + Σ_{i=1}^{n} i    (17)

This function is realized by the recursive algorithm shown in Figure 10. Replacing the addition symbol in Figure 10 with a multiplication symbol results in the recursive algorithm shown in Figure 11.

Figure 9. The Critical Ratio for LS

A Recursive Algorithm:
SUM(N) = IF N = 0 THEN 1 ELSE N + SUM(N - 1)
Figure 10. A Recursive Summation Algorithm

A Recursive Algorithm:
SUM(N) = IF N = 0 THEN 1 ELSE N * SUM(N - 1)
Figure 11. A Recursive Multiplication Algorithm

Observe that the recursive algorithm defined in Figure 11 computes the function:

g(n) = Π_{i=1}^{n} i    (18)
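The two figures render directly into Python; the single operator swap turns 1 + Σ i (result 17) into n! (result 18), making concrete the claimed similarity of the two realizations.

```python
def rec_sum(n):
    """Figure 10: SUM(N) = IF N = 0 THEN 1 ELSE N + SUM(N - 1), i.e. 1 + sum(1..n)."""
    return 1 if n == 0 else n + rec_sum(n - 1)

def rec_product(n):
    """Figure 11: the same algorithm with + replaced by *, computing n!."""
    return 1 if n == 0 else n * rec_product(n - 1)
```

The two functions share their entire control structure and differ only in one operator, which is the sense in which the domains are "similar under transformation."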

Equation (18) is none other than the factorial function (!). The realizations of (16) and (18) are similar under the effects of domain-specific transformations. The similarity occurs despite the dissimilar appearances of (16) and (18). This strongly suggests that many, if not most, application software domains are highly symmetric. This follows, despite dissimilar specifications, as was to be shown. Comparatively small-scale ENCs for assisting in the automatic generation of software should be possible.

Future Work

Cases and generalizations can serve in cross-domain transference (i.e., the porting of knowledge from one domain to a mutually random one). An example is the porting of a knowledge of checkers into a game of chess and vice

versa [8, 30]. Much future work will follow as a result of the transference property. A generalization language can support many case-description languages, but a case-description language can be supported by at most one generalization language at any time. A generalization language can induce a case-generalization language on the next level (i.e., with crossover between multiple uses). This follows from the fact that cross-domain transference can be effected only between similar languages. That is why these languages should be of as fine a granularity as allowed for by the system. Expert (optimizing) compilers allow the clean partitioning of a compiler design problem [13]. Here, if a change in requirements calls for a compiler code change, it can usually be isolated to one or more modules. A module consists of approximately 500 production rules. The software developer can query the compiler and find out the reasoning it used to produce some code with a production rules-based design. What is more, rules can easily be updated or added to accommodate new compiler situations. Some knowledge-based compilers have found use in real environments. In particular, Intermetrics Corporation has built an operational expert that optimizes the Ada compiler [13]. Also, Lexeme Corporation has built an expert translator for translating various languages, such as Jovial to Ada, for example [13]. Expert system-based source code debuggers, such as the University of Waterloo's Message Trace Analyzer, help the user to find and debug errors in source code that have not yet created errors in compiled code [13].


ENCs can improve upon all of these systems. This is accomplished through the effects of their capability for automatic knowledge acquisition. Several germane implementations can serve as natural follow-up projects. Related projects include, but are not limited to:

1) Use of the same (or similar) implemented case-description language(s) in mutually random domains
2) Use of the same (or similar) implemented generalization language(s) in mutually random domains with crossover
3) Implementation of expert^n systems for explanation and explanation-by-analogy (EBA) subsystems
4) Implementation of expert^n systems to replace the stubs on each level
5) The bootstrapping of ENCs for automatic software generation

A good project, concerning the first alternative, entails the optimization of the vehicular allocation domain. The developed system would serve LS by minimizing redundancy among containing vehicles assigned to targets in close relative proximity.

SUMMARY AND CONCLUSIONS

Next-generation expert systems provide a basis for a highly efficient and powerful AI. This is achieved through the development of expert^n systems. These systems are built with generalization and transformation techniques, implying better explanation facilities and a reduced requirement for case memory. They are also built with operating system principles for the efficient organization of memory (i.e., including forgetting). Formal specification languages can be more difficult to learn and use than the programming language itself [13]. However, ENCs for Generalization Augmented Programming (GAP), i.e., the incremental generalization of programs from their instance specifications (see Reference 30 for a preliminary approach), can be bootstrapped and promise to improve programmer productivity. The increase in productivity underpins the bootstrapping process. Here, all implemented code is programmed using instance specifications. All domain-specific transformations are effected by ENCs. The associated constants of proportionality (i.e., symmetries) here are predicted to be relatively high. Thus, the effects of cross-domain transference can render ENCs eminently practical. That is, the larger the system, the proportionately easier it becomes to implement its components.

ENCs for Generalization Augmented Design (GAD) are expected to bestow similar benefits upon the engineering profession (see Reference 5 for related work). For example, ENCs for silicon can provide very powerful hardware definition languages and an intelligent global planning mechanism that can help to select and interconnect blocks. The widespread adoption of networked fine-grained architectures, especially the arrival of wafer-scale parallel (optical) machines, should serve to enlarge the scope of practical applications for next-generation expert systems (i.e., including the bootstrapping of expert^n systems).

It follows from the above discussion that both of Slade's long-term agendas for CBR, as stated at the outset of this paper, can be successfully addressed through CBL [35]. An expanded formal theory of expert^n systems is called for to facilitate progress.

ACKNOWLEDGMENTS The author is grateful to Dr. Brian L. Andersen for numerous insightful conversations and for his virtuosity on the parallel machines. The author also recognizes the efforts of all those who have the courage to pursue science for its intrinsic truths and not necessarily for what others say ought to be the case. The study of information science should be one of symmetry and beauty.

REFERENCES

1. Albus, J. S. (1991). "Outline for a Theory of Intelligence." IEEE Transactions on Systems, Man, and Cybernetics, May-June, 21, 473-509.
2. Bareiss, R. (1989). Exemplar-Based Knowledge Acquisition. New York, NY: Academic Press.
3. Chaitin, G. J. (1975). "Randomness and Mathematical Proof." Scientific American, May, 232, 47-52.
4. Chen, Z. (1991). "Analogical Reasoning for Human Performance Improvement." Performance & Instruction, February, 30, 27-28.
5. Cook, D. J. (1991). "Application of Parallelized Analogical Planning to Engineering Design." Journal of Applied Intelligence, October, 1, 133-144.
6. DeJong, G., and Mooney, R. (1986). "Explanation-Based Learning: An Alternative View." Machine Learning, 1, 145-176.
7. Durkin, J. (1991). "Designing an Induction Expert System." AI Expert, December, 6, 28-35.
8. Engel, B. A.; Baffaut, C.; Barrett, J. R.; Rogers, J. B.; and Jones, D. D. (1990). "Knowledge Transformation." Applied Artificial Intelligence, April-June, 4, 67-80.
9. Flann, N. S., and Dietterich, T. G. (1986). "Selecting Appropriate Representations for Learning from Examples." Proceedings of the Fifth National Conference on Artificial Intelligence, 460-466.
10. Fogel, D. B. (1991). "The Evolution of Intelligent Decision Making in Gaming." Cybernetics and Systems, 22, 223-236.
11. Golding, A. R., and Rosenbloom, P. S. (1991). "Improving Rule-Based Systems through Case-Based Reasoning." Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 22-27.
12. Gray, N. A. B. (1990). "Capturing Knowledge through Top-Down Induction of Decision Trees." IEEE Expert: Intelligent Systems and their Applications, June, 5, 41-50.
13. Harandi, M. T., and Bhansali, S. (1989). "Program Derivation Using Analogy." Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, MI, 389-394.
14. Hindin, H. J. (1986). "Intelligent Tools Automate High-Level Language Programming." Computer Design, May 15, 25, 45-56.
15. Holland, J. H.; Holyoak, K. J.; Nisbett, R. E.; and Thagard, P. R. (1986). Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: The MIT Press.
16. Immarco, P. (1991). "Fuzzy Associative Memories." PC AI, January-February, 5, 52-58.
17. Kfoury, A. J.; Moll, R. N.; and Arbib, M. A. (1982). A Programming Approach to Computability. New York, NY: Springer-Verlag Inc.
18. Korf, R. E. (1985). Learning to Solve Problems by Searching for Macro-Operators. Boston, MA: Pitman Publishing Inc.
19. Keravnou, E. T., and Washbrook, J. (1989). "What Is a Deep Expert System? An Analysis of the Architectural Requirements of Second-Generation Expert Systems." The Knowledge Engineering Review, September, 4, 205-233.
20. Krawchuk, B. J., and Witten, I. H. (1989). "Explanation-Based Learning: Its Role in Problem Solving." Journal of Experimental & Theoretical Artificial Intelligence, January-March, 1, 27-49.
21. Lin, J. H., and Vitter, J. S. (1991). "Complexity Results on Learning by Neural Nets." Machine Learning, May, 6, 211-230.
22. Maiden, N. (1991). "Analogy as a Paradigm for Specification Reuse." Software Engineering Journal, January, 6, 3-15.
23. McDougal, T.; Hammond, K.; and Seifert, C. (1991). "A Functional Perspective on Reminding." Proceedings of a Workshop on Case-Based Reasoning, Washington, D.C., 63-76.
24. Minsky, M. (1987). The Society of Mind. New York, NY: Simon and Schuster.
25. Minsky, M. (1991). "Logical versus Analogical or Symbolic versus Connectionist or Neat versus Scruffy." AI Magazine, Summer, 12, 34-51.
26. Mitchell, T. M. (1982). "Generalization as Search." Artificial Intelligence, March, 18, 203-226.
27. Musen, M. A. (1989). "Automated Support for Building and Extending Expert Models." Machine Learning, December, 4, 347-375.
28. Price, C., and Lee, M. (1988). "Applications of Deep Knowledge." The International Journal for Artificial Intelligence in Engineering, January, 3, 12-17.
29. Quinlan, J. R. (1983). "Learning Efficient Classification Procedures and Their Application to Chess End Games." In (R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors) Machine Learning: An Artificial Intelligence Approach, Vol. 1, Palo Alto, CA: Tioga Publishing.
30. Rubin, S. (1988). The Transformative Compression of Coherent Languages. Ph.D. thesis, Department of Computer Science & Electrical Engineering, Lehigh University, Bethlehem, PA.
31. Rubin, S. (1990). "An Engineering Approach to Automatic Programming." Proceedings, Fifth Conference on Artificial Intelligence for Space Applications, Huntsville, AL, 383-391.
32. Schank, R. C., and Leake, D. B. (1989). "Creativity and Learning in a Case-Based Explainer." Artificial Intelligence, September, 40, 353-385.
33. Shavlik, J. W. (1990a). "Acquiring Recursive and Iterative Concepts with Explanation-Based Learning." Machine Learning, March, 5, 39-70.
34. Shavlik, J. W. (1990b). Extending Explanation-Based Learning by Generalizing the Structure of Explanations. San Mateo, CA: Morgan Kaufmann Publishers.
35. Slade, S. (1991). "Case-Based Reasoning: A Research Paradigm." AI Magazine, Spring, 12, 42-55.
36. Steels, L. (1985). "Second-Generation Expert Systems." Future Generation Computer Systems, June, 1, 213-221.
37. Steels, L. (1990). "Components of Expertise." AI Magazine, Summer, 11, 28-49.
38. Turner, S. R. (1991). "A Case-Based Model of Creativity." Workshop Notes from the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 46-54.
39. Waltz, D. L. (1989). "Is Indexing Used for Retrieval?" Proceedings of a Workshop on Case-Based Reasoning, Pensacola Beach, FL, 41-44.

APPENDIX: THEORY OF EXPERT^n SYSTEMS

Expert systems can be formally represented by recursive collections of specialists, or subsystems, that are atomic at some primitive level. Each primitive subsystem is charged with the effective performance of a domain-specific task. It may be useful to think of conventional expert systems as "macro" expert systems, where all constituent subsystems communicate through a blackboard, but no pair of subsystems can modify the effective procedure of its companion. It is this latter stipulation that serves to differentiate conventional expert, or horizontal, systems from expert^n, or vertical, systems. All vertical systems can be horizontally integrated and vice versa. It will be shown herein that new genres of expert systems (namely, expert^n systems and their constituent next-generation expert systems) are capable of addressing the knowledge acquisition bottleneck and are potentially faster and more robust than their forerunners. To see this, first define f and g such that f, g: S1 → S2 are total computable functions. f and g represent two next-generation, or expert^0, systems, the latter of which, without loss of generality, serves to focus the attention of the former. S1 represents the formal space of a knowledge base and its attributes, including the problem context. S2 represents the formal space of all expert^n system actions, including the generation of a knowledge base that contains transformed attributes and nonmonotonic rules; that is, S2 ⊂ S1 in the derivational sense. Furthermore, let the equation

e = f ∘ g    (A.1)

define an expert^1 system. Note that the functions must be computable in order for the composition (i.e., transformational) operator to be defined. Note too that the totality of the functions can be guaranteed through the provision for a timer-interrupt control transfer mechanism in the interpreter. Without loss of generality, further let S1, S1', and S2 consist of sets of similar elements such that S1' ⊂ S2 ⊂ S1, where S1' is chosen such that its relative cardinality is sufficiently small. It follows that the time (i.e., the number of steps) required to map the domain S1' onto the range S2, f_t: S1' → S2, is less than or equal to the time required to map the domain S1 onto the range S2, f_t: S1 → S2. This is because the number of computational steps bears a direct proportion to the cardinality of the cross

product of the sets being mapped. Next, let

g_t: S1 → S1'    (A.2)

by definition. Therefore, where card{S1 × S1'} + card{S1' × S2} ≤ card{S1 × S2} (i.e., a triangle inequality),

f_t(g_t(S1)) ≤ f_t(S1)    (A.3)

by substitution with result (A.2). Then, it follows from definition (A.1) that

e_t(S1) ≤ f_t(S1)    (A.4)

Result (A.4) demonstrates that expert^1 systems are potentially faster than next-generation expert systems. It may be generalized somewhat with the result:

e_{i+1}(A)(S1) ≤ e_i(A)(S1),  i = 0, 1, 2, ..., n-1    (A.5)
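The composition e = f ∘ g of (A.1), with g focusing attention on a small subset S1' of S1, can be sketched as follows; the step-count model and all names are illustrative assumptions, not a definitive implementation.

```python
def make_expert(f_rule, g_filter):
    """Sketch of (A.1), e = f o g: g focuses attention on a small
    subset S1' of the knowledge space S1, and f then maps only S1'."""
    def e(s1):
        s1_prime = {x for x in s1 if g_filter(x)}  # g: S1 -> S1'
        return {f_rule(x) for x in s1_prime}       # f: S1' -> S2
    return e

def mapping_cost(domain, codomain):
    """Step count taken as proportional to the cardinality of the
    cross product of the sets being mapped, as in the appendix."""
    return len(domain) * len(codomain)
```

Because S1' is much smaller than S1, the cost of mapping S1' onto the range is strictly less than that of mapping all of S1, which is the substance of result (A.4).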

Result (A.5) applies to any given fixed set of knowledge-base attributes, A ⊆ S1, which are totally ordered by the relation ≤. Then, it follows that an expert^n system (i.e., where n > 0) defined by the recurrence equation

e_{i+1}(A)(S1) = e_i²(S1),  i = 0, 1, 2, ..., n-1    (A.6)

the integer power denoting auto-composition, can be bootstrapped. The composition (i.e., transformation) defined by e_i² in result (A.6) maximizes the performance of an expert^n system with respect to A. This can be proven by induction on index i, taking i = 0 as the basis case and using result (A.5) for the inductive step. The following result is derived from results (A.5) and (A.6):

e_i²(S1) ≤ e_i(A)(S1),  i = 0, 1, 2, ..., n    (A.7)

Result (A.7), read "e_i² precedes e_i with respect to metric space A," defines totally ordered subsets of expert^n systems. This result demonstrates that if A is fixed, then iteratively more robust expert^n systems, as measured over the defined metric space, can usually be obtained (i.e., limited only by attaining the infimum, or supremum in the case of the inverse relation, of A) from the synergistic cooperation of two similar expert^{n-1} systems (A.6). These systems can differ in the content of their knowledge bases and representation languages. It is possible to transform the selection of A through the definition of appropriate attribute-mapping (e.g., attention-focusing) transformation functions. The family of these functions is defined by the mapping

e_t(A'): S1 → S2,    (A.8)

where A' ⊆ S2 and represents the transformed metric space. Where the domain of result (A.8) is A, the range will be A'. It follows from an inductive argument similar to the preceding one that the metric space defined in result (A.7) may be transformed with the following result:

e_i²(e(A')(S1)) ≤ e_i(A')(A)(e(A')(S1)),  i = 0, 1, 2, ..., n    (A.9)

Simplifying result (A.9) by substitution with result (A.8) yields:

e_i²(S2) ≤ e_i(A')(S2),  i = 0, 1, 2, ..., n    (A.10)

Result (A.10) may be compared with result (A.7). This comparison evidences that the transformation of knowledge-base attributes will necessarily be accompanied by a contraction in the affected knowledge base, since S2 ⊂ S1 by definition. In practical terms, this means that the best paradigm for bootstrapping knowledge bases, by process of analogy, involves the synergistic cooperation of two similar expert^{n-1} systems. These systems can differ in the content of their knowledge bases and representation languages, as before. Therefore, the process of analogical discovery in expert^n systems is a contraction mapping that has an infimum, or supremum in the case of the inverse relation, of A'. This underscores the importance of knowledge (i.e., expert^n systems) in serving analogical discovery. An alternative simplification of result (A.9) by substitution with result (A.8) yields the following intermediate result:

e_i²(S2) ≤ e_i²(S1),  i = 0, 1, 2, ..., n    (A.11)

It is then proper to substitute result (A.6) into result (A.11) to yield

e_{i+1}(A')(S2) ≤ e_{i+1}(A')(S1),  i = 0, 1, 2, ..., n-1    (A.12)

Result (A.12) demonstrates that more robust expert^n systems (i.e., where n > 0) can be had using transformed knowledge-base attributes (i.e., subject only to the occurrence of the infimum, or the supremum in the case of the inverse relation, of A'). This results in a contraction of the domain (i.e., since S2 ⊂ S1 by definition). This fact precludes the existence of any strong "general problem solver." That is, in common with all other knowledge-based systems, expert^n systems (and thus next-generation expert systems) must be domain-specific. This requirement applies to all incorporated knowledge representation languages as well.

In summary, expert^n systems can bootstrap their knowledge bases by process of transformational analogy as well as contract their domains. The iterative progression of these systems is given by the auto-composition equation (A.6), e_{i+1}(A) = e_i². Figure 12 shows the induced serial geometric structure. The degree of each node (i.e., excepting Level 0 and Level N) is three. This is the minimal provided for by the theory.

Figure 12. Serial Structure of an Expert^n System