Information Systems Vol. 15, No. 1, pp. 151-160, 1990
Printed in Great Britain. All rights reserved
0306-4379/90 $3.00 + 0.00
Copyright 1990 Pergamon Press plc
EXPERT DATABASE SYSTEMS: KNOWLEDGE/DATA MANAGEMENT ENVIRONMENTS FOR INTELLIGENT INFORMATION SYSTEMS

LARRY KERSCHBERG
Department of Information Systems and Systems Engineering, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A.

(Received for publication 19 October 1989)

Abstract: Expert database systems (EDS) are database management systems (DBMS) endowed with knowledge and expertise to support knowledge-based applications which access large shared databases. The architectures, tools and techniques needed to build such systems are varied, and draw upon such fields as artificial intelligence, database management and logic programming. It is precisely the confluence of ideas from these fields that provides the synergism for new insights and tools for building intelligent information systems. Expertise may reside within the system to improve performance by providing intelligent question-answering, using database semantic integrity constraints for query optimization and combining knowledge- and data-driven search techniques in efficient inference schemes. Conversely, expertise may reside outside the system in knowledge-based applications that interpret vast quantities of data and make decision-impelling recommendations to users. Thus the goal of EDS research and development is to provide tools and techniques to make databases "active" agents that can reason, and to allow database systems to support artificial intelligence applications that manage and access large knowledge bases and databases. Expert database systems allow the specification, prototyping and implementation of knowledge-based information systems that represent a vertical extension beyond well-defined, transaction-oriented systems to those with knowledge-directed reasoning and interpretation.
1. INTRODUCTION

During the past few years, we have seen the emergence of expert database systems (EDS) as a vibrant and productive field for research and development. Expert database systems represent the confluence of concepts, tools and techniques from diverse areas: artificial intelligence (AI), database management (DB) and logic programming (LP). Three international forums have provided researchers and practitioners the opportunity to present and discuss their latest insights and research results: the First International Workshop on Expert Database Systems [1], and the First and Second International Conferences on Expert Database Systems [2, 3]. This article provides an introduction to some of the major issues of the field, and in addition presents some recent research results of the author. In addition, most major conferences have sessions devoted to EDS-related topics.
Basically, an EDS supports applications that require "knowledge-directed processing of shared information [4]." Expertise may reside within the system to improve performance by providing intelligent question-answering, using database semantic integrity constraints for query optimization and combining knowledge- and data-driven search techniques in efficient inference schemes. Conversely, expertise may reside outside the system in knowledge-based applications that interpret vast quantities of
data and make decision-impelling recommendations to users. Thus the goal of EDS research and development is to provide tools and techniques to make databases "active" agents that can reason, and to allow database systems to support artificial intelligence applications that manage and access large knowledge bases and databases.
The special appeal of EDS is that they evoke a variety of ways in which knowledge and expertise can be incorporated into system architectures. One can envision several possible scenarios: (1) an expert system loosely-coupled with a database system; (2) a database management system (DBMS) enhanced with reasoning capabilities to perform knowledge-directed problem solving; (3) an LP system or an AI knowledge representation system, enhanced with database access and manipulation primitives; (4) an intelligent user interface for query specification, optimization and processing; and (5) a tightly-coupled EDS "shell" for the specification, management and manipulation of integrated knowledge/databases.
All of the above architectures are meaningful, and there may indeed be many others. The particular one chosen will depend on the application requirements and the availability of tools for their implementation. The terms "loosely-coupled" and "tightly-coupled"
have come to characterize two important classes of EDS. By loose coupling we mean that both the AI system and the DBMS maintain their own functionality and communicate through a well-defined interface. For example, an AI system might send SQL queries to the database system, and conversely, the database system might send messages to the AI system placing data onto its blackboard. In addition, the DBMS could pose questions to the AI system in much the same way a user might consult an expert system. Examples of such architectures include [5-8].
Tight-coupling, on the other hand, implies that at least one system has knowledge of the inner workings of its counterpart, and that special, performance-enhancing access mechanisms are provided. Applications that require tight coupling are those in which both the data and knowledge may be updated and in which the most timely changes must be accessible. Examples of such systems are POSTGRES [9] and the work reported in [10].

2. THE NEED FOR EDS

Let us explore the motivations for EDS architectures in terms of different types of applications. The reality of present-day enterprises indicates that data is considered a corporate-wide resource to be managed by the Data Administration function. Much of that data is managed by one or more DBMS, and corporations have made major investments in database applications. These applications will continue to evolve to meet changing organizational requirements. The data engineering process refers to database requirements specification, database design, implementation and maintenance.
On the other hand, knowledge-based processing of corporate databases is relatively new, as is the process of acquiring and using knowledge, called knowledge engineering. While data is viewed as static and factual, knowledge is considered to be a dynamic and complex commodity used to solve problems. Thus, data and knowledge go hand-in-hand, and it behoves organizations to manage both.
Organizations are realizing that knowledge-based applications can serve as a mechanism for competitive advantage by providing reasoned advice for decision-making. Most of the applications being developed are well-guarded secrets, because the mere mention of the existence of an EDS project may result in the loss of the perceived competitive advantage. There are, however, several case studies reported in the literature, and they will be addressed through the loosely- and tightly-coupled viewpoints.
2.1. Loosely coupled applications

The most obvious approach to data/knowledge integration is to interface a knowledge-based "shell" or an AI language (e.g. Prolog) with a DBMS. A commercially available architecture that uses this
technique is the KEEconnection [8] from IntelliCorp, which allows the frame-based, object-oriented Knowledge Engineering Environment (KEE) to be tailored so that KEE objects (whose data are stored in a DBMS) can be materialized through an interface that generates SQL queries to the DBMS. Loosely-coupled applications are typically those that view the database as a data server, with knowledge-based processing used to interpret data obtained by issuing SQL queries to the database. The amount of expertise and knowledge required to construct the interface will depend on the nature and the amount of data being retrieved, as well as the amount of preprocessing needed to formulate the query.
For example, American Express' Authorizer Assistant [11] uses about 800 rules to summarize company policies and guidelines for credit-worthiness. These rules reside in a knowledge base supported by the expert system shell ART, running on a Symbolics 3645 which can call the Credit Authorization System's database residing on the IBM 3033 processor. The data needed are a customer's current authorization request and past spending history.
In the MEDCLAIM system [12], developed for Blue Cross/Blue Shield of South Carolina, a loosely-coupled EDS architecture was implemented: the M.1 expert system shell from Teknowledge, Inc. contains about 120 claims-evaluation and BC/BS policy rules, while a relational database of 6000 records represents medical knowledge and BC/BS policy regulations about the medical necessity and adequacy of the treatments for 1000 well-established diseases. The ES shell was used to codify problem-solving knowledge regarding valid claims and the relationships among claim items. The relational database was used to encode highly-structured and formatted knowledge regarding typical recuperation profiles and treatment plans for the 1000 most common diseases and diagnoses. Thus, both the ES shell and the relational database contain different types of knowledge about the application domain.
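Both case studies share the same loosely-coupled shape: the knowledge system formulates an SQL query, ships it across the interface, and reasons over the returned tuples. The fragment below is a minimal Prolog sketch of that pattern, not the API of any of the systems cited; db_execute/2 and the table and column names are hypothetical stand-ins for whatever bridge a product such as PROSQL [5] or KEEconnection [8] actually provides.

    % Sketch of a loosely-coupled AI-DBMS interface (names hypothetical).
    :- dynamic cached/1, history/3.

    % spending_history/3 answers from the local fact base when possible;
    % otherwise it fetches the tuples from the DBMS and caches them.
    spending_history(Cust, Month, Amount) :-
        cached(Cust), !,
        history(Cust, Month, Amount).
    spending_history(Cust, Month, Amount) :-
        format(atom(SQL),
               "SELECT month, amount FROM history WHERE cust_id = '~w'",
               [Cust]),
        db_execute(SQL, Rows),                    % assumed bridge predicate
        forall(member(row(M, A), Rows),
               assertz(history(Cust, M, A))),     % cache tuples as Prolog facts
        assertz(cached(Cust)),
        history(Cust, Month, Amount).

Note how the cache in this sketch embodies a concern analyzed by Stonebraker and Hearst in Section 2.2 below: once tuples are asserted locally, subsequent DBMS updates are invisible to the inference engine.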
The novel aspect of having highly-structured and formatted knowledge (as data) is the simplification and automation of knowledge acquisition from an expert. A graphics-oriented interface was constructed so that the expert could "map" both medical and BC/BS knowledge directly into the database. The data abstraction and organization techniques provided by a semantic data model proved quite useful. In fact, this experience proved to our team that an integrated approach to knowledge and data engineering is essential to modeling EDS applications [12]. Our team came up with an adage: "Never partition a knowledge base before its time." As soon as the knowledge base is partitioned into rule-based and data-based components, one must also worry about the interfaces between them; changes to either one will impact the other. The project was so successful that BC/BS has implemented a production version of MEDCLAIM. It is written in COBOL
and runs as part of the normal BC/BS claims processing operation.
Both applications cited above require limited amounts of data, and the database queries are well-defined. An application requiring more domain-specific knowledge about the database(s) concerns heterogeneous database query processing. Assume that the goal is to support a uniform user interface to access multiple heterogeneous databases, each having different query formats, data models and organizations. Clearly the knowledge-based front-end should be responsible for query refinement through a knowledge-based thesaurus, query decomposition and reformulation for target databases, and query result assembly for user reports. Extensive knowledge about the database contents and query formats is essential to this approach. Thus, in some cases, extensive knowledge about the other environment is needed to support a "narrow communication channel" between environments, e.g. database queries. It appears that many new applications will involve the interaction of heterogeneous, distributed knowledge- and data-based systems [13, 14], so that loosely-coupled functional architectures involving knowledge-based system coordination will be essential. One might argue that the extensive knowledge used to support the interface is a form of tight-coupling.

2.2. Tightly coupled applications

Stonebraker and Hearst [15] analyze the drawbacks of loose coupling of AI-DBMS:
1. The rule base is memory resident while the database is disk resident, and changes to the rule base may not be saved unless the user intervenes to expressly save them. In addition, the rule base is not shared among multiple users.
2. The AI system may request data through a query and store the result in a cache for performance reasons. The database, however, may be updated, in which case the inference engine would be using out-of-date information unless the data were "locked" by means of a transaction. The performance of the system might then be degraded because updates to the selected objects would not be permitted.
3. There are some non-partitionable applications in which the shell must retrieve the entire fact base in order to perform its reasoning. The cache size would have to be very large.
The application domain and its performance requirements will be instrumental in determining the overall EDS architecture. For example, Smith [4] provides two military applications that require a very tight coupling between multiple expert systems and a database system:
1. An automated map production system in which the DBMS maintains a time-varying model of the world's surface. Raw image data is relayed by satellites, and this data must be analyzed by expert systems to extract significant features. Information regarding these features is used to update the map database. The performance bottleneck is in the feature analysis, while the map database may be updated at a later time. The database size is approx. 10**19 bytes.
2. A system for Naval tactical situation assessment for use on board ship. Static information consists of maps, charts, ship characteristics and weapons characteristics. Dynamic data includes contact reports on the position and actions of other ships and aircraft in the vicinity. The database must process hundreds of contact reports per second and alert expert systems which analyze potential threats.
These two applications provide certain requirements for an EDS:
(a) a query language and data model that can express the "semantics" of space and time;
(b) the batched reintegration of updated replicated information in a possibly distributed database;
(c) efficient processing of a large number of situation-action rules (triggers) over the database on the arrival of new information; and
(d) the optimization and processing of recursive logic rules over the database in response to query requests.
Several interesting proposals have been made to include the processing of situation-action rules in DBMS. The approach suggested by Stonebraker and Hearst [15] is a tightly-coupled EDS architecture, with the POSTGRES [9] rule system as an example. Another approach, suggested by Delcambre [16, 17], is to extend the database query language SQL to include the specification of queries containing production rules. Finally, Sellis et al. [18] propose the use of an existing relational DBMS to implement a generic production rule processor. This area of research is extremely important because of the many applications that require near real-time knowledge-based processing of large collections of constantly updated data. A sketch of a situation-action rule appears below.
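The flavor of requirement (c) can be conveyed with a small example. The fragment below, written in Prolog for concreteness, fires situation-action rules as each contact report arrives; all predicate names and thresholds are invented, and a production-rule DBMS such as POSTGRES [9] would evaluate such rules inside the engine rather than in an external interpreter.

    % Sketch of situation-action (trigger) processing on arriving data.
    :- dynamic contact/4, alert/2.

    % rule(Name, TriggeringFact, Condition, Action): invented rule format.
    rule(fast_closer,
         contact(Id, Pos, Speed, _Time),
         (Speed > 500, within_range(Pos)),
         assertz(alert(Id, potential_threat))).

    % On arrival of a new contact report, store it and fire matching rules.
    new_contact(Id, Pos, Speed, Time) :-
        assertz(contact(Id, Pos, Speed, Time)),
        forall((rule(_Name, contact(Id, Pos, Speed, Time), Cond, Action),
                call(Cond)),
               call(Action)).

    within_range(pos(X, Y)) :- X*X + Y*Y < 10000.   % invented geometry test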
In subsequent sections we discuss how the fields of database management, artificial intelligence and logic programming are dealing with these and related requirements in conceptualizing EDS architectures.

2.3. The role of database management

Database management systems (DBMS) support the organizational concept that data is a corporate resource that must be managed, refined, protected and made available to multiple users who are authorized to use it. Thus, concurrent access to large shared databases is the raison d'etre of DBMSs.
The commercial success of DBMS supporting the relational model of data has enabled users to develop information systems that query and update large, structured collections of data that are viewed as "flat files". However, DBMS are used in increasingly more complex environments, those in which entities are related in very complex ways, and the rules governing their behavior in response to updates need to be made explicit, rather than be hidden in application programs.
The above requirements indicate that DBMSs should also be enhanced with reasoning capabilities for performance reasons. For example, there is a need to manage different types of data: text, graphics, images, computer-aided design (CAD) schemes, etc. Also, DBMS are being used to support complex environments such as software engineering environments (SEE), configuration management (CM), etc. Traditional DBMS are hard-pressed to handle these new duties, and more robust systems, those supporting more semantically meaningful "data models" and new features such as extensibility and long transactions, are being proposed. This new class of system is called "object-oriented database systems" (OODB) [19, 20].
The goal of OODB is to provide data models whose structures, operations and constraints can deal with complex objects as they are! This implies that the objects may be hierarchical, and the operations on them need not be decomposed into simpler operations. The modeling paradigm for these systems is based on object-oriented programming as exemplified by the Smalltalk language [21]. In OODB the complex objects are organized into typed classes of objects, and each class has associated with it a collection of permissible operations called methods. In order to perform an operation on an instance (read: member) of a class, a message must be sent to the class requesting that the operation be performed. Thus, object classes know what types of operations are admissible, and they are responsible for the execution of those operations. In addition, the object classes are organized in type hierarchies in the traditional AI sense, with property inheritance. Thus, if an object class does not have the requested method associated with it, the method may be found by moving up the type hierarchy. Such hierarchies are very important for the organization of both data and knowledge; they provide a powerful mechanism for placing data attributes and knowledge predicates (rules, constraints, methods) at the appropriate hierarchy level and object class.
There are many important issues to be addressed when building OODB [19]. They are outlined below:
- Data abstraction and encapsulation: an object class has a set of operators similar to abstract data types. In terms of the implementation interface, there must be both a public and a private interface.
- Object identity: every object has a unique identifier which is independent of the particular property values that the object may have.
- Messages: objects communicate by sending messages to one another, and each message consists of a receiver object-identifier, the message name and message arguments.
- Property inheritance: the class hierarchies provide a degree of economy of specification by allowing generic properties to be defined for higher-level object classes and more specialized properties to be associated with lower-level object classes.
- Graphics: complex objects and their methods can best be understood and manipulated through object-oriented graphics interfaces.
- Transaction management: in many complex applications transactions may run for long time periods, and effective methods are needed to handle consistency, concurrency control, recovery, etc.
- Protection: objects must be protected at the instance level. This is particularly important when multiple users are collaborating on the design of these objects, so that version control is an important issue.
- Access management: specialized access paths, storage structures and main memory management are essential for complex objects.
- Methodologies for object-oriented database design: the object-oriented paradigm requires new approaches to database design. It is important to be able to model not only data but also knowledge about objects.
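As a concrete illustration of the message and inheritance mechanisms just outlined, the sketch below resolves a message by searching up the class hierarchy when no local method exists. The class and method names are invented, and it is written in Prolog only for uniformity with the other sketches in this article.

    % Sketch of method lookup with property inheritance in a type hierarchy.
    isa(fighter_aircraft, aircraft).
    isa(aircraft, vehicle).

    % method(Class, MessageName, Action): methods attached to classes.
    method(vehicle, position, writeln('computing position')).
    method(fighter_aircraft, arm, writeln('arming weapons')).

    % send(+Class, +Message): run the nearest method up the hierarchy.
    send(Class, Msg) :-
        method(Class, Msg, Action), !,   % method defined on this class
        call(Action).
    send(Class, Msg) :-
        isa(Class, Super),               % otherwise inherit from superclass
        send(Super, Msg).

    % ?- send(fighter_aircraft, position).   % inherited from vehicle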
Notable projects currently underway to construct the next generation of object-oriented DBMS are POSTGRES at UC Berkeley [9], PROBE at Computer Corporation of America [22], EXODUS at the University of Wisconsin [23], GEMSTONE at the Oregon Graduate Center and Servio Logic Development Corp. [24], ORION at MCC in Austin, Texas [25] and GENESIS at the University of Texas at Austin [26].

2.4. The role of AI research

Researchers in AI have become increasingly aware of the need for, and advantages of, EDS architectures. From the AI view, there is a need to have knowledge-based applications access large databases. Reid Smith [27] has pointed out that the reasoning (inference) component of a knowledge-based system is but a small fraction (6%) of the total system code; real systems require the integration of diverse and possibly distributed data and knowledge sources.
As knowledge bases become larger, they pose serious system performance problems. The pattern-matching processes involved in determining which rules are candidates for "firing" are performance bottlenecks because the rules are not indexed with
respect to their component predicates. Most expert systems rely on operating system virtual memory techniques [28], and overall performance degrades under heavy paging requirements. Thus secondary storage access mechanisms are desirable for AI systems. They can be used to index a rule base and to provide fast access to facts stored in database files. The management of a knowledge base is an area of open research, although some results have been obtained [29, 30].
Another important avenue of AI research that impacts EDS is work concerning the Knowledge Level, and the insights gained by asking modal questions regarding the knowledge base and knowledge derived through inference [31, 13]. By taking a functional view of a knowledge base as a knowledge server, one can "tell" and "ask" the knowledge base what it knows. One fundamental result of this work is that AI knowledge representation systems require complicated processing and interpretation of symbolic information and axioms associated with the "world" being modeled. The processing involved is quite complex, exceeding the capabilities of current database systems; but it may be possible to characterize different types of knowledge "engines" that are amenable to support by EDS. For an excellent review of the International Conference on Expert Database Systems, especially the Keynote Address and the Panels, the reader is referred to [32].

2.5. The role of logic and logic programming

Logic and logic programming play an important role in EDS architectures. Logic provides a formal basis for both relational databases and database theory. For example, logic may be used to extend the expressive power of query languages to include recursive queries that handle the transitive closure operation found in applications such as the inventory parts-explosion problem, CAD/CAM and routing problems. In addition, logic and logic programming provide efficient mechanisms to integrate data, meta-data (that is, data about data, or schema information), domain-specific knowledge and control knowledge.
Logic views databases from two points of view: (1) the model theoretic approach; and (2) the proof theoretic approach. In the model theoretic approach the database definition, or schema, is viewed as a time-variant definition, a theory, specified by means of data structure definitions and integrity constraints. The database state, that is, the collection of data instances at any time, is an interpretation of that theory. Integrity constraints are proper axioms that the database state must satisfy at all times. Queries on the database are expressed as well-formed formulae to be translated into relational operations. Traditional DBMS such as relational systems adhere to this viewpoint.
In the proof theoretic, or deductive database, approach there is no separation of the schema and
the data. The database is represented as a first-order theory with equality. Facts (data instances) are represented as ground well-formed formulae (wffs) of the first-order theory. The set of proper axioms provides the wffs for deduction and the integrity constraints. The set of theorems constitutes the implicitly derivable information. Queries are considered theorems to be proved from the axioms.
Logic programming provides a computational language, Prolog, for the proof theoretic approach. Prolog processes Horn clauses, which are a subset of first-order predicate calculus. However, Prolog is not pure logic programming, because it has several additional features not found in logic programming:
- Built-in I/O predicates to read and write to and from terminals and databases.
- Control of search through depth-first search and backtracking.
- Built-in predicates such as cut and fail which can control the inference process.
- Performance sensitivity to the order of predicates in the knowledge base, and the order of terms within predicates.
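The parts-explosion query mentioned at the start of this section is the canonical recursive example; a minimal deductive version in Prolog, with invented part names, is:

    % Transitive closure over a component relation: the parts-explosion query.
    component(aircraft, engine).
    component(aircraft, wing).
    component(engine, piston).
    component(engine, crankshaft).

    part_of(Assembly, Part) :- component(Assembly, Part).
    part_of(Assembly, Part) :- component(Assembly, Sub), part_of(Sub, Part).

    % ?- part_of(aircraft, P).
    % P = engine ; P = wing ; P = piston ; P = crankshaft.

Classical relational algebra cannot express this closure to arbitrary depth, which is precisely why recursive rule processing appeared among the EDS requirements of Section 2.2.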
A well-known phenomenon is the "impedance mismatch" between Prolog and relational databases: Prolog evaluation is tuple-at-a-time, due to the unification process, while relational databases perform associative retrieval on large collections of data. This presents performance problems in a loosely-coupled architecture with a Prolog-based knowledge system and a relational DBMS. Several excellent articles on the LP view of EDS are found in [33-35].
Recent research in LP has focused on more tightly-coupled Prolog-DBMS architectures [10], which take advantage of the DBMS data dictionary information regarding file index and storage structures to control access to data, and to pre-fetch collections of data into Prolog's fact base for more efficient processing.
Logic is a natural way to specify complex queries to a DBMS, but systems using Prolog as the query language require the user to be aware of Prolog's inference strategy and the order of evaluation. This results in the user having to specify the order of query predicates so as to avoid performance problems. The Logic-Based Data Language [36] alleviates the requirement that the user specify predicate ordering and, in addition, it eliminates the "impedance mismatch" by compiling logic queries into an extended relational algebra.
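To see the tuple-at-a-time behavior that such compilation avoids, consider a hypothetical goal over a pilots relation: Prolog binds and tests one tuple per backtracking step, whereas a DBMS would answer the equivalent set-oriented query (roughly, SELECT name FROM pilots WHERE kills >= 5) with a single associative scan. The facts below are invented.

    % Tuple-at-a-time evaluation: each solution is produced by one
    % unification step against one stored fact.
    pilot(smith, 7).
    pilot(jones, 2).
    pilot(garcia, 5).

    ace(Name) :-
        pilot(Name, Kills),   % binds one tuple per backtrack
        Kills >= 5.

    % ?- findall(N, ace(N), Aces).   % Aces = [smith, garcia]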
3. EDS ARCHITECTURES FOR INTELLIGENT INFORMATION SYSTEMS
EDS architectures will have a major impact on the next generation of software and hardware systems that support knowledge-directed processing of large, shared databases. The research and development
goals and requirements of EDS affecting intelligent information systems are:
1. A unified and formal knowledge/data model that captures not only data semantics (i.e. objects, properties, associations) but also knowledge semantics (i.e. concepts, rules, heuristics, uncertainty, scripts, etc.) of an application.
2. The EDS specification and manipulation languages should have modeling primitives to express and reason about causal, temporal and spatial relationships. They should also have facilities to allow for user-extensible, object-oriented views of the enterprise or application domain.
3. The EDS architecture will merge pattern-directed search and inference, such as those found in production-rule and logic-based systems, with DB associative retrieval and join processing to provide efficient knowledge-based query processing, semantic integrity maintenance, as well as constraint-directed reasoning and system control. This merging of tools and techniques will require a Knowledge Encyclopedia to manage internal system knowledge as well as domain-specific knowledge and data, and to "package" it for the various tools that will access the knowledge/database.
4. The EDS should provide facilities for inexact reasoning over large databases, and an explanation facility to justify the reasoning process to application developers and users.
The above-mentioned EDS features will profoundly influence the design and development of intelligent information systems. The traditional phased, linear development lifecycle will be replaced by an iterative, interactive, fast prototyping mode. In addition, the knowledge-based approach will promote new classes of applications in which the knowledge and data engineers work with an expert to elucidate strategic and domain-specific knowledge and data organizations. These new applications represent a vertical extension of applications beyond the well-defined, predictable, transaction-oriented systems to those with knowledge-directed reasoning and interpretation under conditions of uncertainty.
Just as DBMS are used to manage data as a corporate resource, EDS will manage knowledge as a corporate resource. The availability of a unified Knowledge/Data Model will facilitate a specification-based approach for creating knowledge schemes that express the semantics of both knowledge and data. The Knowledge Encyclopedia will represent both system and application knowledge in an object-oriented view in which object behavior is specified by explicit constraints [37, 38]. These constraints will be made available to tools that will help to design and manage the knowledge base. In fact, the Knowledge Encyclopedia will also be used to formulate
user-interfaces for knowledge acquisition, and to provide a meta-explanation facility that, by accessing both system and domain-specific knowledge, could document and explain system structure and dynamics to the Knowledge/Data Administrator. The following sections present the major ideas regarding the Knowledge Data Model and a prototype system called KORTEX.

3.1. The Knowledge Data Model

Recently, Potter and Kerschberg have developed the Knowledge Data Model (KDM) [30]. The KDM belongs to the class of semantic data models. It provides an object-oriented view of both data and knowledge. KDM modeling primitives include those found in semantic data models, object-oriented models and AI knowledge representation formalisms. The KDM represents an evolution of the Functional Data Model [39, 40] and the PRISM system [37, 38]. It contains modeling features that allow the semantics of an enterprise to be captured. These include data semantics, as captured by semantic data models, and knowledge semantics, as captured in knowledge-based systems, such as heuristics, uncertainty and constraints. The KDM modeling primitives are:
Generalization: provides the facility in the KDM to group similar objects into a more general object, by means of the "is-a" relationship. This generalization hierarchy defines the inheritance structure.
Classification: provides a means whereby specific object instances can be considered as a higher-level object-type (an object-type is a collection of similar objects), through the use of the "is-instance-of" relationship.
Aggregation: an abstraction mechanism in which an object is related to its components via the "is-part-of" relationship.
Membership: an abstraction mechanism that specifically supports the "is-a-member-of" relationship. The underlying notion of the membership relationship is the collection of objects into a set-type object. For example, an attack-mission may be a set object-type with member object-types land-assault-team and air-assault-team.
Temporal: temporal relationship primitives relate object-types by means of synchronous and asynchronous relationships.
Constraint: this primitive is used to place a constraint on some aspect of an object, operation or relationship via the "is-constraint-on" relationship. For example, for an air squadron there may be a limit on the number and types of aircraft participating.
Heuristic: a heuristic can be associated with an object via the "is-heuristic-on" relationship. Heuristics allow the specification of rules and knowledge to be associated with an object. In this way object properties can be inferred using appropriate heuristics.
These primitives give a designer the abstraction mechanisms requisite for flexible knowledge and data modeling. They extend semantic data models by:
1. Incorporating heuristics to model inferential relationships.
2. Organizing and associating these heuristics with objects.
3. Providing a tight coupling (at the specification level) between data objects and knowledge concepts.
Architecturally, the KDM integrates various levels of data and meta-data. These ideas are common to the PRISM system [37, 38] and some knowledge-based systems. In particular, the KDM allows a uniform description of, and access to: (1) a populated knowledge/data base organized in terms of a valid KDM schema; (2) the KDM schema (meta-data in the Data Dictionary sense); (3) the KDM meta-knowledge, e.g. the primitives, functions and predicates used to manage the KDM itself; and (4) the strategic and methodological knowledge about how the system will allow access to and use of the KDM support tools. The KDM philosophy allows Knowledge and Data Administrators to tailor user views and even data models (e.g. an Entity/Relationship model) using KDM primitives. A sketch of the abstraction primitives follows.
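The sketch renders the KDM abstraction relationships as plain relations, reusing the attack-mission example from the primitive definitions above. The KDM and KDL are not Prolog-based, so this encoding, and the attribute names in it, are purely illustrative.

    % KDM abstraction primitives rendered as relations (illustrative only).
    is_a(air_assault_team, assault_team).             % generalization
    is_instance_of(team_17, air_assault_team).        % classification
    is_part_of(navigation_system, aircraft).          % aggregation
    is_a_member_of(air_assault_team, attack_mission). % membership
    is_a_member_of(land_assault_team, attack_mission).

    % Attributes are inherited down the generalization ("is-a") hierarchy.
    attribute(assault_team, commander).
    has_attribute(Type, Attr) :- attribute(Type, Attr).
    has_attribute(Type, Attr) :- is_a(Type, Super), has_attribute(Super, Attr).

    % ?- has_attribute(air_assault_team, A).   % A = commander (inherited)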
The specification language for the KDM is now discussed.

3.2. The Knowledge Data Language

The KDM specification language is the Knowledge Data Language (KDL) [41]. The KDL has two primary constructs, the object-type and the attribute. These are similar to the notions of entity type and function in the Functional Data Model and the DAPLEX language [40]. An object-type is a collection of homogeneous objects having similar properties; the object-type corresponds to real-world objects or concepts. An attribute corresponds to a property or characteristic of an object. In KDL, an attribute of an object-type relates, associates or maps an object of an object-type to either another object, a set of objects, or a list of objects in a target (or range) object-type. The KDM and KDL support attributes that may be computed or inferred, thereby allowing knowledge-based reasoning processes to deduce attribute values and materialize complex views.
Figure 1 shows a template, written in KDL, that defines a generic object-type specification. An object-type can be defined together with its attributes, value-types, any groupings of attributes, and constraints or heuristics associated with attributes. In addition, any subtype or supertype object-types may be specified, together with appropriate constraints or heuristics.
    object-type: OBJECT-TYPE-NAME has
        [attributes:
            {ATTRIBUTE-NAME:
                [set of / list of] VALUE-TYPE    /* default is single-valued */
                [composed of {ATTRIBUTE-NAME,}]
                [with constraints {predicate,}]
                [with heuristics {RULE,}] ;}]
        [subtypes: {OBJECT-TYPE-NAME,}]
        [constraints: {predicate,}]
        [heuristics: {rule,}]
        /* successors, predecessors and concurrents are temporal primitives */
        [successors: {OBJECT-TYPE-NAME,}]
        [predecessors: {OBJECT-TYPE-NAME,}]
        [concurrents: {OBJECT-TYPE-NAME,}]
        [members: {MEMBER-NAME: MEMBER-TYPE}]
        [instances: {INSTANCE,}]
    end-object-type

Fig. 1. The KDL template for object-type specification.
Finally, temporal relationships among the object-type and other object-types may be given. Instances or members of an object-type can also be specified in the "frame." Thus, meta-data and knowledge, in the form of constraints and heuristics, as well as data instances, may exist in the same object-type specification.
In the same way that templates can be used for object-type specification, they can also be used to define KDM meta-data concerning the model itself; that is, the specification of object-types, attributes, constraints and heuristics are themselves defined in KDL. The KDL supports a query facility similar to DAPLEX and allows updates both to the schema and to database instances. However, no facilities as yet exist to study the impact of changes to the database schema or to the constraints or heuristics.
Figure 2 depicts a hypothetical Air Force example in terms of the KDM and its graph representation formalism, which is based on the Functional Data Model. Single-, double- and triple-headed arrows denote single-valued, set-valued and list-valued functions, respectively. The rectangles denote object-types. Here we note that some objects are standard while others are inferred (virtual), e.g. Aces and the Strength of a Squadron. In this particular example the concept of a squadron's strength is based on the missions flown, aircraft scheduled, the number of pilots and the ratio of Aces to regular pilots. The English statement for the squadron's strength is:
A squadron's strength is "Strong" if the squadron conducts at least 50 missions per month, schedules more than 20 aircraft, has at least 10 pilots, and the ratio of Aces to pilots is at least 0.25.
[Figure omitted: KDM graph with virtual objects.]
Fig. 2. An Air Force example.
The associated heuristic, SH, for the strength attribute is:

    For EACH s in Squadron
        SH-C1: IF COUNT(conducts(s)) > 50 AND
        SH-C2: IF COUNT(has-pilots(s)) > 10 AND
        SH-C3: IF COUNT(schedules(s)) > 20 AND
        SH-C4: IF RATIO(COUNT(aces(s)) TO COUNT(pilots(s))) > 0.25
        THEN R1: strength(s) = "strong"

Note that the virtual type Aces can also be defined in terms of a heuristic that defines those pilots qualifying to be considered Aces. In processing the strength heuristic, the KDM processor would first evaluate the Aces (possibly through inference) and then evaluate the strength heuristic.
An important point is that the KDM's constraint and heuristic primitives allow knowledge-based concepts to be specified as easily as database structures. This capability is similar to database views, but the view is very tightly associated with the schema. In addition, views involving views are quite natural, e.g. the Aces virtual type used in the strength heuristic.
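To show how such a heuristic could be evaluated, here is the strength rule rendered as Prolog; the KDM processor is not Prolog, so this rendering, the sample qualifying rule for Aces, and the stored predicates squadron/1, conducts/2, pilot/2, schedules/2 and kills/2 are illustrative assumptions.

    % The squadron-strength heuristic as Prolog rules (illustrative).
    count(Template, Goal, N) :-           % helper: number of solutions of Goal
        findall(Template, Goal, L), length(L, N).

    ace(S, P) :-                          % invented qualifying rule for Aces
        pilot(S, P), kills(P, K), K >= 5.

    strength(S, strong) :-
        squadron(S),
        count(M, conducts(S, M), NM),   NM > 50,   % SH-C1
        count(P, pilot(S, P), NP),      NP > 10,   % SH-C2
        count(A, schedules(S, A), NA),  NA > 20,   % SH-C3
        count(P2, ace(S, P2), NAces),              % SH-C4
        Ratio is NAces / NP,
        Ratio > 0.25.

As in the KDM, the virtual type Aces is evaluated on demand inside the strength rule.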
3.3. The KORTEX EDS shell

KORTEX [42] is an experimental EDS prototype that embodies many of the concepts of the KDM. A discussion of the major features follows.
3.3.1. The KORTEX conceptual framework. KORTEX is a prototype semantic database system which embodies many of the concepts of the KDM, especially the (Attribute, Object, Value) paradigm, and also supports a limited inferential capability. KORTEX provides users with an extended Entity/Relationship view of data together with rule-based knowledge representation for inferring attribute values and materializing user-specified virtual objects. The user represents all entities (types) (such as aircraft, personnel) directly as objects with their corresponding attributes (such as id-number, name, years-of-service). In a similar fashion, the user represents all relationships among entities (such as Personnel-assigned-to-aircraft) as objects. These relationships may also have attributes (such as date-of-assignment). In addition, one may specify that one entity is a generalization or specialization of another entity (such as Colonel IS-A Officer). Finally, an attribute value may be determined from data values, by means of computed formulas, or through inference rules (such as service-required-of-aircraft based upon miles-flown, age-of-aircraft, etc.). The system functionality is next described, followed by a specification of the system architecture.
3.3.2. The KORTEX user interface and data/knowledge specification. KORTEX has an interactive menu system that guides the user to specify relationships among entities. This contrasts with the usual situation found in relational database systems, in which relations are not explicitly represented by the database system, but instead are buried within the data. By allowing relationships to be treated as objects in their own right, the KORTEX model allows users to specify desired attributes for relationships.
KORTEX uses interactive menus to obtain user specifications of entities and relationships. Entities are characterized by their attributes. Attribute values may be numeric (integer or real) or character strings (such as names). A new feature allows attribute values to also be set valued (such as one of [low, medium, high]), group items (such as address, comprised of street, city, state and zip code) or Boolean (true or false) (such as for critical-need). Moreover, the user may place constraints upon appropriate attribute values (such as 0 < age < 100). Attribute values may be specified by the user upon data entry (such as year-of-birth is 1950). Alternately, attribute values may be determined by a user-specified formula, such as age is current-year - year-of-birth. Finally, a collection of rules may be invoked to infer attribute values from other entity attributes (such as critical-shortage of an item inferred from values of quantity-on-hand and degree-of-importance). This last feature is an example of the novel Knowledge Component of the database system; a sketch follows. A central advantage of this embedded inferential knowledge is that such information need be specified only once by the user while constructing the database. Hence, such knowledge need not be specified repeatedly in separate application programs using the database.
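The following sketch shows the three kinds of attribute determination just described: a stored value, a computed formula with its constraint, and an inferred attribute. KORTEX itself is LISP-based and menu-driven, so this Prolog rendering, including the threshold chosen for critical-shortage, is illustrative only.

    % Stored, computed and inferred attribute values (illustrative rendering).
    year_of_birth(p1, 1950).               % stored upon data entry
    current_year(1990).

    age(P, Age) :-                         % computed formula
        year_of_birth(P, Y),
        current_year(C),
        Age is C - Y.

    valid_age(P) :-                        % user-stated constraint 0 < age < 100
        age(P, A), A > 0, A < 100.

    quantity_on_hand(widget, 3).
    degree_of_importance(widget, high).

    critical_shortage(Item) :-             % inferred attribute (invented rule)
        quantity_on_hand(Item, Q), Q < 10,
        degree_of_importance(Item, high).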
The KORTEX rule-based knowledge system has further system applications, including the specification of virtual entities, which appear as distinct objects to the user but are actually inferred by the system from a related entity. There is a trade-off between explicitly storing data resulting from an inference process and recomputing it every time it is needed: the computational cost of inference vs the storage cost of materialized views. Another aspect is the update activity against the database and the effect it might have on the results of the computed view.
KORTEX also supports the notions of generalization and specialization. That is, when a user adds a new entity type to the database, KORTEX allows the user to specify whether this new entity type is a generalization or specialization of any other entity within the database. Since a specialized entity inherits all attributes of its generalized entity, this greatly reduces the effort required to describe an entity. It also simplifies the logical interactions of the database entities, aiding the user in "keeping track" of the overall database structure. Further, the system supports the explicit inheritance mechanism associated with generalization hierarchies.
The functionality just described conforms to that of a knowledge-based extended Entity-Relationship model. The Entity-Relationship model has been used in practice to aid in hand-developing database schemas; KORTEX has automated this process to better aid database development. Adding the semantics of specialization and generalization further increases the expressive power of this database specification tool. The object-oriented semantic data/knowledge model provides a mechanism to define modular data/knowledge "chunks." Finally, concepts from expert and rule-based systems are incorporated in KORTEX, making it an extremely expressive database system employing tools and techniques from both database and artificial intelligence systems.
3.3.3. The knowledge data kernel. KORTEX is structured as an object-oriented system and is written
in both Franz LISP and COMMON LISP. Each prototype currently has about 5000 lines of LISP code. All major database constructs (such as entities and relationships) are modeled as objects. Information pertaining to an object (such as attributes)
resides with that object, so that a database construct and its components reside together as a unit. These objects are represented as frames. Individual units of information are represented within a frame using the attribute-object-value paradigm of the Knowledge Data Model (such as "(type (entity (aircraft)))" to represent "aircraft has type entity"). This structure facilitates the management of the knowledge component of KORTEX.
There is a methodological component in KORTEX. Each object in KORTEX is developed in stages of increasing detail. The first stage of development captures the overall structure of an object and its interrelationships (that is, the database schema). The successive system representations decompose each
object into "concepts," with assigned unique concept numbers. Concepts refer to object characteristics such as generalization/specialization, relations with other objects, and attributes. Concepts are used to bring together metadata associated with objects. It is here that attribute details are listed, including user-specified formulas or inference rules (in the form of AND/OR trees). KORTEX uses the concept numbers to index object information as needed for retrieval and manipulation.
After the user has completed the specification of the database schema, KORTEX allows him/her to enter database instances. Each database instance is in the "is-instance-of" relationship with either an entity type or a relationship type. During specification, the system enforces all user-stated constraints (such as value limits and key specifications). At any time, the user may browse the database to retrieve data based on a variety of conditions that may be placed on attribute values (such as age > 30 and years-of-service < 10). There is also a browsing capability to list general database schema information. Database instances are also stored as objects (or frames) by the system. Specific object information is stored with its corresponding concept number, to aid in data retrieval.

4. CONCLUSIONS

The integration of concepts, tools and techniques from DB, AI and LP, as embodied in EDS, will create new and revolutionary environments for the specification, design, prototyping and maintenance of Intelligent Information Systems. Many researchers and practitioners are providing insights and prototypes that lead to new architectures for the software and hardware systems of the 1990s and beyond.
Acknowledgements: The author would like to thank Richard Baum for his help in perfecting the KORTEX system, and J. Hung, who implemented the KORTEX system as part of his M.Sc. degree.
REFERENCES

[1] L. Kerschberg (Ed.). Expert Database Systems: Proceedings from the First International Workshop. Benjamin/Cummings, Menlo Park, Calif. (1986).
[2] L. Kerschberg (Ed.). Expert Database Systems: Proceedings from the First International Conference. Benjamin/Cummings, Menlo Park, Calif. (1987).
[3] L. Kerschberg (Ed.). Expert Database Systems: Proceedings from the Second International Conference (George Mason University, Fairfax, VA, 1988). Benjamin/Cummings, Menlo Park, Calif. (1988).
[4] J. M. Smith. Expert database systems: a database perspective. In [1].
[5] C. L. Chang and A. Walker. PROSQL: a Prolog programming interface with SQL/DS. In [1].
[6] Y. E. Ioannidis, J. Chen, M. A. Friedman and M. M. Tsangaris. BERMUDA: an architectural perspective on interfacing Prolog to a database machine. In [3].
[7] B. Napheys and D. Herkimer. A look at loosely-coupled Prolog/database systems. In [3].
[8] R. M. Abarbanel and M. D. Williams. A relational representation for knowledge bases. In [2].
[9] M. Stonebraker. Object management in POSTGRES using procedures. In [20].
[10] S. Ceri, G. Gottlob and G. Wiederhold. Interfacing relational databases and Prolog efficiently. In [2].
[11] D. Leinweber. Knowledge-based systems for financial applications. IEEE Expert, Fall, 18-31 (1988).
[12] J. Weitzel and L. Kerschberg. Developing knowledge-based systems: reorganizing the systems development life cycle. Commun. ACM 32(4) (1989).
[13] R. J. Brachman and H. J. Levesque. Tales from the far side of KRYPTON. In [2].
[14] M. L. Brodie (Chair), D. Bobrow, V. Lesser, S. Madnick, D. Tsichritzis and C. Hewitt. Future artificial intelligence requirements for intelligent database systems. Panel report. In [3].
[15] M. Stonebraker and M. Hearst. Future trends in expert data base systems. In [3].
[16] L. M. L. Delcambre and J. N. Etheredge. The relational production language: a production language for relational databases. In [3].
[17] L. M. L. Delcambre. RPL: an expert system language with query power. IEEE Expert, Winter, 51-61 (1988).
[18] T. Sellis, C.-C. Lin and L. Raschid. Implementing large production systems in a DBMS environment: concepts and algorithms. Proc. ACM SIGMOD Conf., May (1988).
[19] C. Zaniolo. Prolog: a database query language for all seasons. In [1].
[20] K. Dittrich and U. Dayal (Eds). Proc. 1986 Int. Workshop on Object-Oriented Database Systems. ACM and IEEE (1986).
[21] A. Goldberg and D. Robson. Smalltalk-80: The Language and Its Implementation. Addison-Wesley, Reading, Mass. (1983).
[22] F. Manola and U. Dayal. PDM: an object-oriented data model. In [20].
[23] M. J. Carey et al. The architecture of the EXODUS extensible DBMS. In [20].
[24] D. Maier and J. Stein. Indexing in an object-oriented DBMS. In [20].
[25] W. Kim. Architectural issues in object-oriented databases. MCC Technical Report ACT-OODS-115-89 (1989).
[26] D. Batory. GENESIS: a project to develop an extensible database management system. In [20].
[27] R. Smith. On the development of commercial expert systems. The AI Mag. 5(3), 61-73 (1984).
[28] M. Deering and J. Faletti. Database support for storage of AI reasoning knowledge. In [1].
[29] L. Cholvy and R. Demolombe. Querying a rule base. In [2].
[30] W. D. Potter and L. Kerschberg. The knowledge data model: a unified approach to modeling knowledge and data. Proc. IFIP DS-2 Working Conference on Knowledge and Data (R. Meersman and J. Sowa, Eds), Albufeira, Portugal, November 1986; also in Data and Knowledge (R. Meersman and J. Sowa, Eds). North-Holland, Amsterdam (1988).
[31] R. J. Brachman and H. J. Levesque. What makes a knowledge base knowledgeable? A view of databases from the knowledge level. In [1].
[32] R. P. van de Riet. Expert database systems, conference report. Future Generations Computer Systems 2(3), 191-196. North-Holland, Amsterdam (1986).
[33] D. S. Parker et al. Logic programming and databases, Working Group report. In [1].
[34] E. Sciore and D. S. Warren. Towards an integrated database-Prolog system. In [1].
[35] C. Zaniolo et al. Object oriented database systems and knowledge systems. In [1].
[36] S. Tsur and C. Zaniolo. LDL: a logic-based data language. Proc. 12th Int. Conf. on Very Large Data Bases, Kyoto, Japan (1986).
[37] A. Shepherd and L. Kerschberg. PRISM: a knowledge-based system for semantic integrity specification and enforcement in database systems. Proc. ACM SIGMOD Int. Conf. on Management of Data, Boston, pp. 307-315 (1984).
[38] A. Shepherd and L. Kerschberg. Constraint management in expert database systems. In [1], pp. 309-331.
[39] E. H. Sibley and L. Kerschberg. Data model and data architecture considerations. Proc. National Computer Conf., pp. 85-96. AFIPS Press, Reston, VA (1977).
[40] D. W. Shipman. The functional data model and the data language DAPLEX. ACM Trans. Database Syst. 6(1), 140-173 (1981).
[41] W. D. Potter, R. P. Trueblood and C. M. Eastman. KDM/KDL: a hyper-semantic data model and specification language. Data and Knowledge Engineering. North-Holland, Amsterdam (1990).
[42] L. Kerschberg, R. Baum and J. Hung. KORTEX: an expert database system shell for a knowledge-based entity-relationship model. Int. Conf. on the Entity/Relationship Approach (1989).