An evaluation of the FGCS data & knowledge base system — expectations and achievements


Future Generation Computer Systems 9 (1993) 153-158 North-Holland


Shojiro Nishio *

Department of Information Systems Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565, Japan

Abstract

This paper describes how the Data & Knowledge Base System evolved during the FGCS project and gives an appraisal of it. The Data & Knowledge Base System consists primarily of the Kappa and Quixote systems. Kappa is the lower-level system; in particular, Kappa-P is the parallel DBMS implemented on a parallel inference machine. Quixote is essentially a constraint logic programming language with object-oriented features, such as object identity, complex objects, encapsulation, type hierarchy and methods. Quixote has been used for demanding applications, such as a legal reasoning system and a system for molecular biological databases.

Keywords. Data and knowledge base systems; deductive databases; object-oriented databases; fifth generation computer systems; logic programming.

1. Introduction

The Fifth Generation Computer Systems (FGCS) project was launched in 1982 after a three-year preliminary study stage financially supported by the Japanese Ministry of International Trade and Industry (MITI). The FGCS project was planned as a ten-year project. The basic framework of the fifth generation computer is parallel and inference processing based on logic programming. A ten-year project with a budget of 50 billion yen is an unprecedented national undertaking, and a ten-year time frame is rather long in a rapidly progressing field such as information processing. During its initial stage, the FGCS project attracted the attention of the entire world. Indeed, the project was a catalyst for the wide acceptance and popularity of artificial intelligence research, and it had the profound effect of stimulating research activity overseas as well as in Japan.

* Email: [email protected]

However, today the impact of this project is not as pronounced as during the initial stage. Is it true that the FGCS project fell short of our expectations? At this point, it is very important to evaluate the overall activities and the results of the project. This paper is devoted to the discussion of such issues with respect to data & knowledge base systems. We note here that the author has been involved in several working and sub-working groups organized by the Institute of New Generation Computer Technology (ICOT). The activities of these groups will be described in Section 5.

0376-5075/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved

2. Evolution of the FGCS Data & Knowledge Base System

In order to evaluate the products of the FGCS project in data & knowledge base systems, it is important to trace the development of these systems (see Fig. 1). At the beginning of the project, it was planned to develop data & knowledge base machines. In the initial stage, from 1982 to 1984, the sequential database machine Delta was intensively developed. Then, in the intermediate stage, the parallel knowledge base testbed system Mu-X was developed. However, the development of the knowledge database machines was interrupted in the intermediate stage. Thereafter the emphasis shifted from hardware systems to the development of the Kappa (knowledge application oriented advanced database management system (DBMS)) series of software. This DBMS software ran on the PSI as well as the Multi-PSI machines, which were produced during the initial and intermediate stages of the project.

This change in development policy for knowledge database systems was due to the complexity of the target systems. These systems have very complex structures, such as nested relations, and a strong capability to handle higher order rules, i.e. rules more complex than those of Datalog. It was thus very hard to design the hardware specification of the planned knowledge database system. As a result, the following strategy was employed: first, develop the software for the target system running on the PSI and Multi-PSI and precisely define the requirements of the necessary functions; then design and implement the knowledge database machine.

In the final stage of the FGCS project, the development of the Kappa DBMS software was extended and finally Kappa-P, a parallel DBMS, was implemented on parallel inference machines. On top of this database engine, Kappa, the knowledge representation language Quixote was developed. A large scale software system integrating Kappa and Quixote was delivered as the final data & knowledge base system of the FGCS project. However, the knowledge database machines were never realized.

Because of the above-mentioned change of development policy, the FGCS project has not achieved the goals of the original plan for the data & knowledge base systems. Strictly speaking, it failed in that development. However, it is fairer to discuss its success or failure by evaluating the large scale data & knowledge base software developed during the project. The importance of advanced large scale DBMS software is obvious, as was emphasized by the Laguna Beach Report [7]. This interesting report discussed the important future research topics in the database field, and suggested that the development of large scale general purpose advanced DBMS software is more important than that of database machines.

Fig. 1. The development of the FGCS Data & Knowledge Base system. [Figure not reproduced. Timeline over the initial, intermediate and final stages, with year marks 1982, 1984, 1985, 1988, 1989 and 1991. Labels: Knowledge Base Machine (Delta, Mu-X, PHI); Knowledge Base Software (Kappa-I, Kappa-II, Kappa-P); Inference Machine (PSI-I, PSI-II, Multi-PSI, CHI, CHI-II, PIM).]

Fig. 2. The framework of a Knowledge-Base (DOOD) Management System of the FGCS project. [Figure not reproduced. Applications (genome databases for genetic information processing, legal precedent databases for legal reasoning, dictionaries for natural language processing, semantic representation in natural language) sit on the knowledge representation language QUIXOTE, which in turn sits on the database layer Kappa-II and Kappa-P.]

3. Configuration of the FGCS Data & Knowledge Base System

Referring to the papers [9,11], we shall take a general view of the configuration of the FGCS data & knowledge base system. The knowledge base management system consists of two layers, as illustrated in Fig. 2 (see footnote 1). Both the upper and lower layers are specified in KL1 (i.e. the parallel logic programming language developed by the project), and now run on PIMOS (i.e. the standard parallel operating system for large-scale parallel machines used in symbol and knowledge processing).

1 This figure is from [11] with permission of the authors.

The lower layer is built from the DBMSs Kappa-II and Kappa-P, and the upper layer uses the knowledge representation language Quixote. The design and implementation of these software components proceeded under the leadership of Mr. Kazumasa Yokota of ICOT.

The development of the database layer, i.e. the Kappa series, started at the beginning of the intermediate stage. Kappa aimed to manage the 'natural databases' accumulated in society, such as natural language dictionaries. It employed a nested relational model so that it could easily handle data sets with irregular record sizes and nested structures. In addition to natural language dictionaries, Kappa is well suited to other 'natural databases' produced in our social systems, such as molecular biological databases and rule databases supporting legal reasoning systems. The first and second versions of Kappa were developed on a PSI machine using ESP, a logic programming language developed for PSI.


The second version, Kappa-II, was completed at the end of the intermediate stage. In the final stage, a parallel and distributed version of Kappa called Kappa-P was implemented in the KL1 language. Kappa-P is intended to use the large main memories of the PIM to implement a main memory database scheme, and to obtain a very high throughput rate for disk input and output by using many disks connected in parallel to the element processors.

For developing the upper layer, intensive research on knowledge representation languages and knowledge base management systems was conducted in conjunction with the development of Kappa-II and Kappa-P. Finally, a deductive and object-oriented database (DOOD) [12] was adopted as the basis, which provided us with the knowledge representation language Quixote. The integrated Quixote and Kappa-P system is a powerful knowledge base management system with a high-level knowledge representation language as well as a parallel and distributed DBMS as the base of the language processor.

Version 3.0 of Quixote will be completed soon and is scheduled to be registered as free software. This software assumes a client/server architecture, where the client side can operate in the UNIX environment (see footnote 2), although the server software still requires the PIMOS environment.
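The nested relational model that Kappa employs for dictionary-like data can be illustrated with a small sketch. The code below is not Kappa's interface; the dictionary rows, the field names (`headword`, `entries`, `pos`, `senses`) and the functions `nest` and `unnest` are hypothetical and only show how records of irregular size fit into one nested tuple per headword instead of many flat rows.

```python
from collections import defaultdict

# Hypothetical flat (first normal form) dictionary data:
# one row per (headword, part of speech, sense).
flat_rows = [
    ("run", "verb", "move quickly on foot"),
    ("run", "verb", "operate a machine"),
    ("run", "noun", "a period of running"),
    ("set", "noun", "a collection of items"),
]


def nest(rows):
    """Group flat rows into one nested record per headword."""
    grouped = defaultdict(lambda: defaultdict(list))
    for headword, pos, sense in rows:
        grouped[headword][pos].append(sense)
    return [
        {
            "headword": headword,
            "entries": [
                {"pos": pos, "senses": senses}
                for pos, senses in by_pos.items()
            ],
        }
        for headword, by_pos in grouped.items()
    ]


def unnest(nested):
    """Flatten nested records back into flat rows."""
    return [
        (record["headword"], entry["pos"], sense)
        for record in nested
        for entry in record["entries"]
        for sense in entry["senses"]
    ]


nested = nest(flat_rows)
for record in nested:
    print(record)
```

Each headword becomes a single record whose `entries` attribute is itself a relation, so an entry with many senses and an entry with one sense are stored in the same scheme without padding or repeated keys.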

2 UNIX is a registered trademark of AT&T Bell Laboratories.

4. Advances in the DOOD system

Since the late seventies, many data models have been proposed as extensions of the relational model in order to overcome its various disadvantages, such as inefficient representation and inadequate query capability. Among them, deductive databases and object-oriented databases have been at the forefront of research into the next generation of intelligent database systems.

Object-oriented programming and design methodologies hold great promise for reducing the complexity of very large software systems in such domains as computer-aided design and manufacturing, integrated office information systems, multimedia information systems, and artificial intelligence. Object-oriented database systems will enhance the programmer/user productivity of such systems.

Research into deductive databases is aimed at discovering efficient schemes to uniformly represent assertions and deductive rules, and to respond to highly expressive queries against the knowledge base of assertions and rules. This area of research interacts strongly with logic programming, which has developed in parallel, sharing logic as a common basis.

Recently, research has been aimed at integrating object-oriented paradigms and rule-based deduction to provide a single powerful framework for intelligent database systems. Such an integrated database is called a DOOD. The first international conference to work towards such integration was held in Kyoto in 1989 [5], and the conference has been held every two years since.

To realize the integration of deductive databases and object-oriented databases, the extension of the relational model, deductive databases, and object-oriented databases has been investigated from three directions: logic, the data model, and the computational model. More specifically, we can consider the following three methodologies:

(1) Extension of deductive databases: Deductive databases are based on first-order predicate logic represented by terms. By extending the term representation to incorporate several important features of the object-oriented paradigm, such as object identity, class hierarchy, and method execution, we can integrate both kinds of database.

(2) Extension of object-oriented databases: We can support a rule-based deductive capability in object-oriented database systems by, for instance, describing the rules declaratively in the method part.

(3) Extension of relational databases: We can directly extend relational database systems to have object-oriented features as well as a deductive capability.

The pioneering work in DOOD research is the extended term representation O-logic [6] proposed by D. Maier, which employed the first method. Following O-logic, many extended term representations such as Hilog [2], F-logic [3,4], IQL [1] and DOT [8] have been proposed. The knowledge representation language of the FGCS project, Quixote, also belongs to this line of extensions of the database model to realize a DOOD system. Quixote is essentially a constraint logic programming language with object-oriented features such as object identity, complex objects, encapsulation, type hierarchy, and methods.

Quixote is currently considered to be the most advanced and powerful knowledge representation language for DOOD systems. Using Quixote, we can steadily construct a large scale knowledge base from a simple database, i.e. starting with the accumulation of passive fact data, then gradually accumulating active rule data, and finally arriving at a complete knowledge base. Several very large scale application systems have already been developed based on Quixote [11] (see Fig. 2). Among them are a molecular biological database and a legal reasoning system (TRIAL). The development of these application programs owes much to the strong capabilities of Quixote, which are supported by the features of DOOD systems.

Among the many advanced features of Quixote, we find significantly rich concepts in its object identifiers, modules, and database updates. At the Second International Conference on DOOD held in Munich in 1991, J. Ullman presented an interesting invited paper titled 'A Comparison between Deductive and Object-Oriented Database Systems' [10], where he wrote: "despite some important concepts originating with the object-oriented approach, the deductive family of systems will ultimately dominate". As evidence of the limitation of the object-oriented approach, he pointed out that a declarative capability (more specifically, a deductive capability) is required for object identifiers to support infinite structures. We often have to handle such structures in knowledge representation. For instance, assume that a graph with a cyclic structure is given. Then it is hard to specify all of the infinite paths (i.e. infinite objects) generated by the graph without recursively defined terms.
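The role of recursion in the cyclic graph example can be made concrete with a small sketch. It uses plain Datalog-style rules evaluated bottom-up, not Quixote's extended terms; the edge facts, the predicate names `edge` and `path`, and the function `reachable` are assumptions introduced for illustration. The cycle generates walks of unbounded length, yet two recursive rules describe all connections finitely.

```python
# Hypothetical edge facts of a graph containing the cycle a -> b -> c -> a.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}


def reachable(edge_facts):
    """Naive bottom-up evaluation of the recursive rules

        path(X, Y) <- edge(X, Y).
        path(X, Z) <- path(X, Y), edge(Y, Z).

    The loop repeats until no new fact is derived, i.e. until a
    fixpoint is reached. It terminates because the set of node
    pairs is finite.
    """
    path = set(edge_facts)
    while True:
        derived = {
            (x, z)
            for (x, y) in path
            for (y2, z) in edge_facts
            if y == y2
        }
        if derived <= path:
            return path
        path |= derived


paths = reachable(edges)
print(sorted(paths))
```

Enumerating every walk through the cycle explicitly would never finish, whereas the recursive definition yields a finite set of `path` facts, including `path(a, a)`, which records that `a` lies on a cycle.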
In Quixote, the object identifiers are given by extended terms defined by attributes of objects, which can support the infinite structure of objects. Note that several other extended term representations, including F-logic, employ extended terms to represent objects; however, Quixote has richer concepts regarding object identifiers than these extended terms.

The module concept of Quixote is introduced in order to classify knowledge and handle local inconsistencies. A module is defined as a set of rules and can specify a unit, or a granule, of knowledge. For instance, let us consider the knowledge involved in a book. Then the knowledge included in a chapter or a section constitutes a module. The knowledge base system based on Quixote can consist of many such modules, and the acyclic relation between these modules is called a submodule relation and is considered to be the means by which the rule inheritance mechanism between modules is supported.

Further, Quixote has powerful database update mechanisms based on the concept of nested transactions. For instance, dynamic insert and dynamic delete of terms is possible during query processing.

In addition to the above capabilities, Quixote has many more functions than those supported by the conventional extended term representations. Quixote is in a sense an over-specification language, which means that users can select any subclass of Quixote according to the intended application area. However, except for [11], there are presently few papers published in English which discuss such functions in detail. We hope that more detailed papers regarding the design and implementation of Quixote, together with its applications, will be available soon.

Here we should note the incompatibility between Quixote and the parallelism of KL1. Currently, the important concepts of Quixote such as object identity and module do not sufficiently correspond to the granularity of parallelism of KL1, although these concepts have been implemented in KL1. As a theorem prover of first-order predicate logic, MGTP (Model Generation Theorem Prover) has been developed in KL1 and it achieved a high performance. However, MGTP does not provide any feature of knowledge representation languages. To improve the compatibility with KL1 is therefore one of the most requested future extensions of Quixote.
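The module and submodule concepts described in this section can be sketched in a few lines. This is an illustration only and does not reproduce Quixote's syntax or semantics: the class `Module`, the method `visible_rules`, and the rule strings are hypothetical, and the direction of inheritance (a module sees the rules of its submodules) is an assumption made for the sketch.

```python
class Module:
    """A named set of rules plus links to submodules (hypothetical)."""

    def __init__(self, name, rules=(), submodules=()):
        self.name = name
        self.rules = set(rules)            # rules are opaque strings here
        self.submodules = list(submodules)

    def visible_rules(self):
        """Return own rules plus rules inherited via the submodule relation.

        Assumption: a module inherits from its submodules, and the
        submodule relation is acyclic, so the recursion terminates.
        """
        collected = set(self.rules)
        for sub in self.submodules:
            collected |= sub.visible_rules()
        return collected


# Knowledge in a book: sections are modules, a chapter groups them.
section_1 = Module("section_1", {"bird(X) -> flies(X)"})
section_2 = Module("section_2", {"penguin(X) -> not flies(X)"})
chapter = Module("chapter", {"penguin(X) -> bird(X)"}, [section_1])

print(sorted(chapter.visible_rules()))
print(sorted(section_2.visible_rules()))
```

The two sections hold rules that would contradict each other if merged, but because `section_2` is not related to `chapter` by the submodule relation, the inconsistency stays local to separate modules.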

5. Concluding remarks

To evaluate the FGCS data & knowledge base system, we have considered two criteria: the hardware system and the software system. As we have demonstrated in this paper, the data & knowledge base machine (i.e. the hardware system) has not been able to achieve the goals set forth at the beginning of the FGCS project. However, after the change in policy for developing the data & knowledge base system in the intermediate stage, the development and implementation of large scale data & knowledge management software began, based on the framework of the DOOD. This system has many distinctive features in concept, scale, and variety in comparison with other systems. The system aims not only to achieve a new paradigm for DOOD but also to provide data & knowledge facilities in practice for many knowledge information processing applications. Consequently, we can reasonably claim that the FGCS project opened a new era of intelligent databases.

In addition to the above-noted technical developments, the FGCS project played a very important role through the activity of several working and sub-working groups, which were realized by the efforts of ICOT. For instance, the DOOD working group, established in 1989, played a key role in the success of the First International Conference on DOOD (DOOD89) in Kyoto. ICOT also organized working and sub-working groups on the topics of database programming languages (DBPL), deductive databases and artificial intelligence (DDB & AI), extended term representation (ETR), biological databases (BioDB), intelligent databases (IDB), and next generation databases (NDB). Crossing the barriers between different universities, institutes, and companies, many active young researchers were involved in these working groups and periodically held technical meetings. At these meetings valuable information on advanced data & knowledge base systems was exchanged; members presented their ideas for the next generation of intelligent databases and held very constructive discussions concerning these ideas.
These activities will surely contribute significantly to the future development of attractive knowledge information processing systems.

References

[1] S. Abiteboul and P. Kanellakis, Object identity as a query language primitive, in: Proc. 1989 ACM SIGMOD Internat. Conf. on the Management of Data (1989) 159-173.
[2] W. Chen, M. Kifer and D.S. Warren, Hilog as a platform for database languages (or why predicate calculus is not enough), in: Proc. 2nd Internat. Workshop on Database Programming Language (1989) 121-135.
[3] M. Kifer and G. Lausen, F-Logic: A higher-order language for reasoning about objects, inheritance, and scheme, in: Proc. 1989 ACM SIGMOD Internat. Conf. on the Management of Data (1989) 134-146.
[4] M. Kifer, G. Lausen and J. Wu, Logical foundation of object-oriented and frame-based languages, Technical Report 90/14 (revised), Dept. of Computer Science, University of New York at Stony Brook, 1990.
[5] W. Kim, J.N. Nicolas and S. Nishio, eds., Deductive and Object-Oriented Databases (North-Holland, Amsterdam, 1990).
[6] D. Maier, A logic for objects, in: Proc. Workshop on Foundation of Deductive Databases and Logic Programming (1986) 6-26.
[7] E. Neuhold and M. Stonebraker, Future Directions in DBMS Research, Technical Report, International Computer Science Institute, TR-88-001, Berkeley, CA, 1988.
[8] M. Tsukamoto, S. Nishio and M. Fujio, DOT: A term representation using DOT algebra for a knowledge-base, in: Proc. 2nd Internat. Conf. on Deductive and Object-Oriented Databases (DOOD91) (1991) 391-410.
[9] S. Uchida, Summary of the Parallel Inference Machine and its basic software, in: Proc. Internat. Conf. on Fifth Generation Computer Systems 1992 (1992) 33-49.
[10] J.D. Ullman, A comparison of deductive and object-oriented database systems, in: Proc. 2nd Internat. Conf. on Deductive and Object-Oriented Databases (DOOD91) (1991) 263-277.
[11] K. Yokota and H. Yasukawa, Towards an integrated knowledge-base management system (overview of R & D on databases and knowledge-bases in the FGCS project), in: Proc. Internat. Conf. on Fifth Generation Computer Systems 1992 (1992) 89-112.
[12] K. Yokota and S. Nishio, Towards integration of deductive databases and object-oriented databases - A limited survey, in: Proc. Advanced Database System Symp. (1989) 253-261.

Shojiro Nishio received the B.E., M.E., and Dr. E. degrees from Kyoto University, Kyoto, Japan, in 1975, 1977, and 1980, respectively. From 1980 to 1988, he was with the Department of Applied Mathematics and Physics, Kyoto University. In October 1988, he joined the faculty of the Department of Information and Computer Sciences, Osaka University, Toyonaka, Osaka, Japan. Since August 1992, he has been a Professor in the Department of Information Systems Engineering of Osaka University, Suita, Osaka, Japan. He has held visiting appointments with the University of Waterloo and the University of Victoria in Canada. His current research interests include database systems, knowledge base systems, and distributed computing systems. Dr. Nishio has served as Program Committee Co-Chair of several international conferences, including the First International Conference on Deductive and Object-Oriented Databases (DOOD89). He currently serves on the editorial boards of IEEE Transactions on Knowledge and Data Engineering and New Generation Computing. He is also a member of five learned societies, including ACM and IEEE.