Data & Knowledge Engineering 1 (1985) 31-73
North-Holland

A conceptual design aid environment for expert-database systems*

Ramin YASDI
Institut für Informatik, University of Stuttgart, D-7000 Stuttgart 1, Fed. Rep. Germany
Abstract. With the penetration of expert systems into the business world, the question is how the expert system idea can be used to enhance existing information systems with more intelligence in usage and operation. This interest is not surprising given the advancement of the fifth generation of computer technology and the avid interest in the field of Artificial Intelligence. As a result, the design of an information system for an application becomes more complex, and the human designer is less and less able to deal with it. For designing intelligent systems, we have to be able to forecast the behavior of the information system more precisely before implementing it, i.e. we have to support the specification process. Clearly, data base technology leads on efficiency issues such as those arising in the construction, retrieval and manipulation of large shared data bases. On the other hand, AI techniques have improved significantly, with functions such as deductive reasoning and natural language processing. It is important to find a way to merge these technologies into one mainstream of computing. A meeting point for the two areas is the issue of conceptual knowledge modelling, so that models can be created that define the role of data in AI systems and the ways to use it. In the framework of this study, an expert system design aid environment is suggested to assist the designer in his work. In a conceptual modelling environment, a model is given for analysing complex real-world problems, known as the Conceptual Knowledge Model (CKM), represented by a Graphical and a Formal Representation. The Graphical Representation consists of three graphs: the Conceptual Requirement Graph, the Conceptual Behavior Graph, and the Conceptual Structure Graph. These graphs are developed by involving the expert during the design process.
The graphs are then transformed into first-order predicate logic to represent the logical axioms of a theory, which constitutes the knowledge base of the Expert System. The model suggested here is a step towards closing the gap between conventional data base theory and AI databases.

Keywords. Logic databases, knowledge engineering, information systems, conceptual modelling, formal verification of specifications.

1. Motivation and goals
Software Engineering can be viewed as the production of a series of concepts, models, and tools, progressing toward techniques that are more and more machine oriented. The development of concepts, methodologies and tools for requirements specification is not a new subject matter for Software Engineers. Unlike other efforts, however, this work tackles the problem from the point of view of a data base designer who wants to design a more intelligent system, one capable of deduction, explanation and handling a large amount of data, with respect to correctness and absence of contradictions. The work reported here is therefore aimed mainly at establishing a bridge across the border of Artificial Intelligence (AI)¹, with the idea of opening the door to future importing of ideas and technology.

* This paper is an extended version of [53,55].
¹ Appendix A contains a list of abbreviations.
0169-023X/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)

The goals of an Information System (IS) design have much in common with those of Knowledge Representation (KR) in AI. Both attempt to provide tools for the representation of real world knowledge (the Universe of Discourse, UoD: all those entities ‘of interest’ that have been, are or ever might be in the specific context [14]) such as concept definitions, assumptions, constraints, and event descriptions.

A major reason for the interest in interactions between AI and data bases is the realization that, on the one hand, significant improvements in the productivity and functionality of information systems require a rich semantic theory such as those being proposed in knowledge-based and reasoning systems. On the other hand, practical application of the same AI technology requires progress on systems and efficiency issues such as those addressed by database management research. A meeting point for the two technologies is the issue of knowledge base management systems (KBMS). KBMS are being proposed as a technology which will provide friendly environments for the retrieval and manipulation of large shared knowledge bases, with functionalities such as deductive reasoning, concurrent access to the knowledge base, distribution of the knowledge base over several geographic locations, error recovery and security. The resulting structures are expected to be natural and direct representations of the knowledge, and they can facilitate communication between domain experts, system builders and system users.

Feigenbaum defines the activity of Knowledge Engineering (KE) as follows: “The knowledge engineer practices the art of bringing the principles and tools of AI research to bear on difficult applications problems requiring experts’ knowledge for their solution. The technical issues of acquiring this knowledge, representing it, and using it appropriately to construct and explain lines-of-reasoning, are important problems in the design of knowledge-based systems. ... The art of constructing intelligent agents is both part of and an extension of the programming art.
It is the art of building the complex computer programs that represent and reason with knowledge of the world” [12].

By an expert we usually mean a person who knows how to define problems and solve them. He knows what facts or data to collect and to investigate, he knows what rules to apply, and he knows how to make inferences.

To build an expert system, we essentially have to store the expert’s knowledge, a collection of rules and facts, in a knowledge base. The ‘process’ of extracting knowledge from an expert is called ‘knowledge acquisition’, whereas the ‘procedure’ of encoding it in a program is called ‘knowledge representation’, and the techniques to use this knowledge are called ‘knowledge manipulation and retrieval’.

Knowledge may be readily available in instructions or regulations (e.g. for tax calculation), it may be distributed over paragraphs, chapters, and even books, or it may be stored in part or as a whole in the brains of human experts. As a matter of fact, the knowledge needed for such expert systems usually originates from a mixture of all these three sources: regulations of the tabular and formula type, natural language text, and unformulated human expertise. This transformation is the heart of the expert system development process.

For designing intelligent systems, we have to be able to predict the behavior of information systems more precisely before implementing them, we have to be able to plan them in advance, and we have to see the consequences from the technological point of view. Since we can prove the correctness of a program according to its formal specification only for small problems, we have to support the specification definition process. If the specification of a problem solution is wrong, then a proof of correctness is useless.

The problem is therefore to arrive at a model which is as complete as possible from a given informal description of the application. With the growing complexity of today’s computer-based systems, this task is becoming more difficult. There is still a lack of a methodology which forces the designer to go through several steps until arriving at a model (or models) which covers all aspects (static, dynamic) of an application.
There are two persons mainly concerned in information systems applications: the application specialist (expert), who knows the application and has no knowledge of the internal structure of the data base, and the data base designer (Knowledge Engineer, KE), who knows a lot about data base design but is not very familiar with the specific application. To design a system which is as complete as possible and free from contradiction and redundancy, an extensive requirements analysis of the application is necessary. The easy way of collecting the requirements, by interviewing a couple of experts at different levels of the managerial hierarchy, usually leads to confusion and misunderstanding on the part of the KE and, as a consequence, to the design of an inconsistent system. To overcome this deficiency the expert should be involved in the design process as early as possible. This can be done in a modelling environment (see Fig. 2), which aims to provide a means to
- Guide the data base designer to arrive at a model which satisfies as many information requirements as possible;
- Provide a convenient interface for interacting with the expert via a model which is easy to understand and use, even for someone who is not familiar with computers;
- Discover inconsistencies in the design;
- Allow the expert to formulate and verify system properties without referring to a certain implementation.

Knowledge acquisition presupposes the availability of suitable techniques for knowledge representation. Having developed in the knowledge acquisition process a model which characterizes the important aspects of the application, we can use this as a basis for the knowledge representation in a prototype. In most existing applications knowledge is part of a program. In order to make such systems more flexible, more transparent and more maintainable, the knowledge should be kept separate from the knowledge processing program, i.e. in a knowledge base.
2. Introduction

Expert systems are problem-solving programs that solve substantial problems generally conceded as being difficult and requiring expertise. They simulate the behavior of a human expert in a specialized problem domain. They can offer ‘intelligent advice’ and justify their line of inference. Usually the expert system engages the user in a dialog. Through dialog, the system may eventually provide expert advice or a solution to the user's problem, or supply information the user is looking for. The user interface with the system is essentially through question answering, that is, the system asks a question and the user responds with an answer, and the system asks the user more questions if it needs to collect more facts. There are many ways to pose questions. For example, we may use menus or sentences. Responses can be selections of options, keywords, phrases, or complete sentences. We may also use tables, in which case data are entered in tables using the sentences as headings.

If knowledge goes beyond the facts, i.e. also contains rules that allow new facts and rules to be deduced, the system must include a deduction mechanism which depends on the knowledge representation and on the type of problem to be solved. A deduction mechanism essentially is a theorem prover, which proves that the answer to some question is implied by the rules and facts in the knowledge base. This theorem prover can also be used to check whether knowledge to be added to a knowledge base is consistent with already stored knowledge.

To satisfy the above requirements we have to distinguish the following major problem areas that must be dealt with during the system development process.
Knowledge acquisition
Knowledge acquisition facilitates the understanding of application domains. It is a component for interacting with an expert in order to obtain the expert knowledge. This component is designed to communicate with an expert even if the expert knows little about programming or computing. The acquisition of knowledge from human experts for the purpose of Knowledge Representation (KR) demands intensive involvement with the subject matter of the application domain. In the field of medicine, for example, knowledge from physicians about diseases, medical procedures, hospital procedures and diagnosis has been encoded in KR languages; knowledge about chemistry and geology has been captured, and physical bodies and time have been formalized and analyzed. Understanding the problem domain(s) is an absolute prerequisite to building correct, reliable systems, and knowledge acquisition techniques have proved extremely useful in this respect.

Knowledge representation
The result of knowledge acquisition is a set of rules and facts which represent the expertise about the particular problem and which has the same structure as the observed application.

Facts are associative tuples, that is, values of attributes. An example is the fact that an employee is 30 years old.

A rule is a conditional sentence relating several fact statements in a logical relation. The nature of the relation varies from rule to rule. An example is the rule that all employees must be older than 30 years.

The conditional part of a rule is called its ‘premise’. In general, a premise is a condition on a number of clauses. Each clause is a simple statement concerning facts, such as “The age of the employee is more than 30 years” or “The status of the employee is married”. Thus the syntax for a rule is:

   premise: *((clause 1) ... (clause n))
   action:  (conclude (new fact))

where * indicates a logical condition.
For example:

   If the employee is older than 30 years and the employee is single, then hire the employee.

This rule, represented internally in PROLOG, reads:

   hire(Employee, Company) ← employee-age(Employee, Age) & Age > 30 & employee-status(Employee, single)
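The premise/action structure of such a rule can be illustrated outside PROLOG as well. The following is a minimal sketch in Python, not the representation used in the system described here; the names Rule, Clause, hire_rule and the attribute keys "age" and "status" are hypothetical, and the clauses of the premise are assumed to be combined by logical AND.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Facts = Dict[str, Any]               # attribute values of one entity
Clause = Callable[[Facts], bool]     # one simple statement about the facts


@dataclass
class Rule:
    premise: List[Clause]            # clauses, combined here by logical AND (assumption)
    action: str                      # the new fact to conclude

    def conclude(self, facts: Facts) -> List[str]:
        """Return the concluded fact if every clause of the premise holds."""
        if all(clause(facts) for clause in self.premise):
            return [self.action]
        return []


# The hiring rule from the text: older than 30 years and single.
hire_rule = Rule(
    premise=[
        lambda f: f["age"] > 30,
        lambda f: f["status"] == "single",
    ],
    action="hire",
)

print(hire_rule.conclude({"age": 35, "status": "single"}))
print(hire_rule.conclude({"age": 28, "status": "single"}))
```

Keeping the premise as a list of separate clauses mirrors the rule syntax given above: each clause can be inspected on its own, which is what an explanation facility would need.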
From a logic point of view this is Proof Theory, which prescribes the means to reason correctly according to the conceptual structure or syntax of the given universe. This is the so-called conceptual schema and consists of a set of axioms and a set of theorems. Proof Theory delivers criteria for the design of an inference engine as well as a knowledge representation formalism for specifying the ‘general state of affairs’, whereas Model Theory assigns meaning to the logic statements of the Proof Theory. This is the interpretation of the theory and constitutes the data base. Model Theory delivers criteria for the design of the data base management as well as the knowledge representation formalism for specifying the ‘specific state of affairs’. This knowledge is acquired by the knowledge acquisition component.

Knowledge representation presupposes the availability of suitable techniques for manipulation and retrieval.
2.1. Building the expert system
Both theoretical and practical considerations led us to the architecture illustrated in Fig. 1. We have a knowledge-based architecture, which contains the following basic components:
- Knowledge acquisition aids the designer in extending, correcting, and refining the knowledge (expertise) base. It has a mechanism that identifies the relevant knowledge at each stage.
- Knowledge base. The knowledge base has both an internal and an external ‘image’. The internal image represents the rules specifying the general state of affairs, and the external image represents the facts which relate to a particular instance of the application (the specific state of affairs). The internal image is embedded in a PROLOG environment, whereas the external image is logically stored in files which can be accessed by the data base management. This distinct separation is due to the fact that we are dealing with a large amount of data, beyond the PROLOG storage capacity.
- Dialog component is used to permit interaction with the expert, the user and the designers. A user-friendly dialog interface with natural language will play an important role in this respect. The transformation of files which PROLOG cannot process directly into PROLOG format is logically contained here.
- Monitor or control knowledge (knowledge about knowledge). This consists of procedures to control the evaluation process and to provide communication to the external interface. These are also embedded within the internal image.
- Knowledge manipulation and retrieval. Supports updates to and queries on the knowledge base, subject to satisfying the constraints. For this purpose a reasoning technique is employed, which is a strategy that determines which knowledge to apply and in what order. It contains an inference (deduction) mechanism which uses a set of knowledge to derive conclusions from the knowledge base. In some instances it should be able to cope with uncertain knowledge.
- Explanation component. This component is able to explain why certain conclusions have been made. It answers “why?” and “why not?” questions. The designer is expected to declare which predicates are interesting.
- Searching engine serves as the search engine for the desired facts.

Fig. 1. Basic components of an expert system. [Diagram not reproduced; legible labels include ‘deduction rules’ and ‘derivation rules’.]

A user may access this combined system through a high-level interface that accepts knowledge and questions in the form of English-like statements and returns answers and explanations for these answers as derived facts and ‘evidence chains’, respectively. A question is in the form of manipulation, deduction or explanation.

An update query from a user is passed to the manipulation component. This component applies all constraints on the affected data base objects, and if these are satisfied then the new knowledge (facts but not rules) is added to the knowledge base.

Updating the rules is done by a designer or expert. In this case the knowledge acquisition component is responsible for the correct rule manipulation. If at some point a constraint is violated, then the transaction fails and the update is not committed.

If a user’s query requires no deduction, it is translated directly from its English-like form to the query form of the underlying searching engine, and the answer is obtained directly and presented to the user.

If a user’s query requires deduction, then a deduction plan is retrieved from the reasoning rules and stored temporarily in the stack, where the inference engine can work on it. As the required deductions become deeper and more complex, the inference engine must operate for a long period of time. If explanations are required, then the explanation component justifies the inference by popping the stack.
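The handling of update queries described in this section, where constraints are checked first and the update is committed only if none is violated, can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions, not the manipulation component of the system itself: facts are plain tuples, a constraint is a function over the whole fact set, and the names KnowledgeBase, update and age_over_30 are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple                               # e.g. ("employee_age", "smith", 35)
Constraint = Callable[[Set[Fact]], bool]   # True when the fact set is acceptable


class KnowledgeBase:
    def __init__(self, constraints: List[Constraint]) -> None:
        self.facts: Set[Fact] = set()
        self.constraints = constraints

    def update(self, new_facts: List[Fact]) -> bool:
        """Add facts as one transaction; commit only if all constraints hold."""
        candidate = self.facts | set(new_facts)
        if all(check(candidate) for check in self.constraints):
            self.facts = candidate         # commit
            return True
        return False                       # transaction fails, nothing is stored


def age_over_30(facts: Set[Fact]) -> bool:
    """Constraint from the text: all employees must be older than 30 years."""
    return all(f[2] > 30 for f in facts if f[0] == "employee_age")


kb = KnowledgeBase([age_over_30])
print(kb.update([("employee_age", "smith", 35)]))   # accepted
print(kb.update([("employee_age", "jones", 25)]))   # rejected by the constraint
```

Checking the constraints against a candidate copy of the fact set keeps the stored facts untouched when a transaction fails, which matches the requirement that a violating update is not committed.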
From the inference engine’s point of view, the combination of searching engine and external fact base is a means of achieving a massive amount of OR-parallelism over the deductive search for relation instances (tuples), thereby greatly increasing the efficiency of deductive question answering. From the point of view of the user and the searching engine, the combination of inference engine and rule base constitutes a body of application-specific expertise that converts high-level user queries into low-level data access strategies. By separating rule search from fact search, both kinds of search may be optimized in their respective engines.
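The division of labour between the inference engine (rule search) and the searching engine (fact search), together with explanation by popping a stack, can be sketched as follows. This is a deliberately small Python illustration under stated assumptions: rules map a goal relation to a list of subgoal relations over a single entity, the class names FactStore and InferenceEngine are hypothetical, and no parallelism or uncertainty handling is attempted.

```python
from typing import Dict, Iterable, List, Tuple


class FactStore:
    """Stands in for the searching engine plus the external fact base."""

    def __init__(self, tuples: Iterable[Tuple[str, str]]) -> None:
        self._tuples = set(tuples)

    def lookup(self, relation: str, entity: str) -> bool:
        return (relation, entity) in self._tuples


class InferenceEngine:
    """Rule search only; every fact lookup is delegated to the fact store."""

    def __init__(self, rules: Dict[str, List[str]], store: FactStore) -> None:
        self.rules = rules                 # goal relation -> subgoal relations
        self.store = store
        self.stack: List[str] = []         # deduction steps kept for explanations

    def prove(self, relation: str, entity: str) -> bool:
        body = self.rules.get(relation)
        if body is None:                   # no rule: ask the fact store
            found = self.store.lookup(relation, entity)
            if found:
                self.stack.append(f"{relation}({entity}) is a stored fact")
            return found
        mark = len(self.stack)
        if all(self.prove(sub, entity) for sub in body):
            self.stack.append(f"{relation}({entity}) follows from {', '.join(body)}")
            return True
        del self.stack[mark:]              # drop the steps of a failed attempt
        return False

    def explain(self) -> List[str]:
        """Justify the last deduction by popping the stack."""
        steps = []
        while self.stack:
            steps.append(self.stack.pop())
        return steps


store = FactStore([("older_than_30", "smith"), ("single", "smith"), ("single", "jones")])
engine = InferenceEngine({"hire": ["older_than_30", "single"]}, store)
print(engine.prove("hire", "smith"))
print(engine.explain())
```

Because the engine only calls `lookup`, the fact store could be replaced by a data base management system without touching the rule search, which is the point of separating the two kinds of search.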
3. Methodology of knowledge acquisition

3.1. Introduction

The acquisition of knowledge from human experts for the purpose of knowledge representation intensifies the involvement with the subject matter of the application. A particularly effective method of analysis is modelling, i.e. representing the application and the real-world situation in a graphical and formal notation. The model can be used as the basis for communication and for understanding the application that will comprise the system. We propose that the best way to capture the needed information is to structure the description as a ‘conceptual model’, i.e. a model that reflects the content and structure of the application world. In this respect we develop a Conceptual Knowledge Model (CKM) to specify the process-oriented and data-oriented concepts of an application as well as its integrity constraints, in that part of the development process where the expert plays a vital role.

During the knowledge acquisition process the expert and the Knowledge Engineer (KE) explicate the objects, relations and information flow characteristics needed to describe the problem. They also specify subtasks, strategies and constraints related to the problem activity. In the following we describe the knowledge acquisition process in our approach.

3.2. The acquisition mechanism
The main task of a knowledge engineer is revealing and formalizing the expert’s knowledge. Through an extended series of interactions, the knowledge engineering team (the knowledge engineer and the expert) defines the problem to be attacked, discovers the basic concepts involved, and develops rules that express the relationships existing between concepts. We distinguish four major problem areas that must be dealt with during the system development process (see Fig. 2):
- abstraction,
- specification,
- realization,
- testing.

During the abstraction phase, the KE selects objects, functions and relationships of the system and documents them in an informal form.

In the specification phase the identified concepts are represented in the CKM. The KE chooses to represent the entire system, a subsystem, an aspect, or a combination of a subsystem and an aspect system in a graphical and formal form. The relevant properties of the elements and relationships in the system are observed. The result of the specification process is a schema representing the dynamic and static structure of the application.

The realization deals with the problems of mapping the conceptual schema into a rapid prototype, which serves as fast feedback about user acceptance.
Fig. 2. Expert system design environment. [Diagram not reproduced; it shows the phases abstraction, specification, realization and testing, with a legend distinguishing communication from data flow.]
The testing involves evaluation of the prototype. Those activities should be tested that are necessary to determine whether the system satisfies the requirements in the sense of functionality, performance, user interface, etc. During this phase conclusions are derived from the model(s) that are tested. If the result of this activity is not satisfactory, the model(s) must be improved and sometimes refined. This will result in a repetition of the phases in the modelling cycle.

As shown in Fig. 2 we divide these phases into two environments, namely modelling the application and realization of the application, which correspond to knowledge acquisition and knowledge representation respectively.

From an initial set of information requirements, a conceptual schema is developed and a prototype is built and used in operation. In order to capture new information requirements, the prototype is augmented by the new or revised information, it is used in operation, and so on, until a satisfactory model is obtained. Such a design situation is characterized by a stepwise integration of new information. Great flexibility in the design is necessary.

Each of the above defined main phases may be divided into subphases. For the specification phase they are: requirements analysis, conceptualization, formalization and implementation. These stages are not clear-cut, well defined, or even independent. At best they roughly characterize the complex process of knowledge acquisition. For example, formalization and implementation are closely related, and changing one may lead to immediate reformulation of the other. In the following we give a short introduction to the process of specification in our approach (see Fig. 3).

3.2.1. Requirements analysis

The information of interest for a specific application is collected by interviews or from documentation and exists first in the KE’s brain. He analyses it, decides with the expert which type of information is to be included in the model, and specifies the requirements on the system. This analysis is documented in an informal form.

3.2.2. Graphical Representation (GR)

We shall describe here concepts, techniques and procedures that can be used to develop three graphs.

3.2.2.1. The Conceptual Requirement Graph (CRG). The main task of the CRG is to derive interactively the system in question by abstracting and/or refining the description levels functionally up to the point where a complete function hierarchy exists. Its purpose is to give an overview of the functions in an organization that must be supported by an IS and to derive the information requirements. In this framework, the modelling of events is used for transferring the state change in the system, for indicating the time for activating the functions, and for identifying dependencies between functions, i.e. sequence, concurrency and mutual exclusion.

3.2.2.2. The Conceptual Behavior Graph (CBG). The term behavior is used to emphasize the fact that we focus on the functional behavior of the system. Here the semantic integrity constraints are defined, i.e. those to be enforced by the system before an operation begins to execute.

3.2.2.3. The Conceptual Structure Graph (CSG). The CSG is a semantic network consisting of ‘classes’ and binary ‘relationships’ between these classes. Each class in the semantic network is assumed to be a direct representation of a set of entities, where an entity represents an object in the real world.
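A semantic network of classes and binary relationships, as used for the CSG, can be held in a very small data structure. The sketch below is written in Python purely for illustration and is not the paper's notation; the class StructureGraph and the example names Employee, Company and works_for are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class StructureGraph:
    classes: Set[str] = field(default_factory=set)
    # Each binary relationship is stored as (name, source class, target class).
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_class(self, name: str) -> None:
        self.classes.add(name)

    def relate(self, name: str, source: str, target: str) -> None:
        """Add a binary relationship; both classes must already exist."""
        if source not in self.classes or target not in self.classes:
            raise ValueError("both classes must be declared before relating them")
        self.relationships.add((name, source, target))

    def outgoing(self, cls: str) -> List[Tuple[str, str]]:
        """Relationships leaving a class, as sorted (name, target) pairs."""
        return sorted((n, t) for n, s, t in self.relationships if s == cls)


csg = StructureGraph()
csg.add_class("Employee")
csg.add_class("Company")
csg.relate("works_for", "Employee", "Company")
print(csg.outgoing("Employee"))
```

Rejecting relationships between undeclared classes is one simple way of catching an inconsistency in the static structure before the graph is formalized.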
Fig. 3. Modelling environment. [Diagram not reproduced.]

3.2.3. Formal Representation (FR)
Formalization involves mapping the graphs into a formal representation language. In order to perform this task the model is defined in the framework of predicate logic. The model then constitutes the non-logical axioms of the first-order theory for the UoD. There are great advantages in having a uniform formalism. Predicate logic as a language is well known. It is easier to prove the correctness of programs or data structures expressed in a uniform and accepted language. The general case of expressing integrity constraints is an example of this. From a logic point of view we can represent the application as a theory T, whose axioms are a set of well-formed formulas which express explicit information in terms of general rules as well as concrete facts. All other rules or concrete facts which can be derived from the explicit axioms represent the implicit axioms of the theory T. The result of axiomatization will be a set of axioms prepared in the form of general clauses.

3.2.4. Verification
Whether the theory we have created is consistent, i.e. whether the rules we have created are correctly formulated (free from redundancy, contradiction and deadlock), is a matter of validation, which is normally investigated by means of a theorem prover or other techniques such as Petri nets.

3.2.5. Implementation
The programming language PROLOG is used for the implementation. PROLOG is a high-level, interactive, non-procedural language with a very simple syntax. This simplifies programming, makes programs easy to modify and gives a natural connection to predicate logic. Programs are generally very small and easy to change. New functions can be added to the system independently of previously defined functions. Definitions are usually recursive and no distinction is made between programs and data. The interpreter is basically a mechanical theorem prover for restricted first-order predicate calculus (Horn clauses). The result of axiomatization is a set of clauses in general form, which are therefore not suited for PROLOG. The first step must be the transformation of general clauses into Horn clauses.

3.2.6. Transformation into a knowledge base management system
The CKM constitutes the non-logical axioms of a theory for the UoD, whereas a conceptual schema constitutes the non-logical axioms of the theory for the UoD of the knowledge base. The latter universe must be a finite subset of the former. Thus, we have to consider possible differences between the universes.

3.2.7. Prototyping
Having an accurate and hopefully complete representation of the application, we can start to design the actual expert system based on the CKM. By means of repetitive prototyping we can see whether the correct information is stored and whether the system can help the user to solve his problems. Further, we build a prototype in order to
- Enable the user to evaluate the interface and to suggest changes;
- Enable the designer to evaluate user experience both objectively and subjectively. It helps to improve the existing model by adding new information;
- Give the user a more immediate sense of the proposed system and its usability for assisting him in his work;
- Reduce the risk of project failure.

3.2.8. Related work
The first proposal for a semantic-based framework specifically intended for requirements came from [52]. His framework used abstraction principles. His implementation translates the conceptual model into a relational data base model, and he prescribes a multi-step process for doing requirements and design based on his approach.

The Information System (IS) area is an abstract source of modelling ideas (e.g. see [33]). The primary purpose of data models was originally to describe the data in the system in a manner independent of the physical implementation, rather than to model the world, but the more recent work on the so-called ‘semantic data models’ [3,44] has been increasingly concerned with what the data represents in the real world.

Certainly, there is a tremendous amount of research in the IS area from which we can learn, and have learned, a great deal. However, our approach differs from those in the following respects:

The approach described by Bubenko [15] is totally descriptive and does not include any operational aspects. He promotes a ‘goal-oriented’ strategy, which implies that the user defines all information requirements in terms of an application and by stating operational criteria which must be satisfied in order to consider a solution-oriented requirement specification, which is normally presented in terms of data sets and flows, and rules for process initialization and synchronization. The two views are quite consistent with our approach.

In [27] the ISAC methodology is proposed. However, the graph structures used within ISAC do not provide a detailed specification of the activation conditions for activities. We employ the event and trigger concept for this purpose. The CKM shares the abstraction principles with ISAC. It allows, through recursion, a theoretically unbounded number of abstraction levels.

CKM differs from channel/agency nets [36] in the following point:
- In our methodology, data and activity aspects are modelled throughout the design phase, and a more parallel development of data and activities is achieved.

In comparing the CKM with the usual software engineering approach using a functional specification such as SADT (Structured Analysis and Design Technique, a trademark of Softech, Inc.), the difference is that SADT does not have a formal definition of concepts while CKM has. Another difference is that CKM captures more of the semantics of the system data and processes than a functional specification approach. A further difference is that CKM captures much more about the intended use of the system and how it must respond to its environment.

In summary we highlight the important advantages of our approach:
- It combines three approaches to specification (procedural, logical, and data modelling) into a single, uniform framework.
- Abstraction principles shown useful previously in other areas, such as Information Systems, AI, and programming languages, are applied to refinement modelling.
- It provides methodological guidelines for developing the model step by step throughout its life cycle.
- It has a formal definition, namely a translation into First Order Logic. The CKM specifies a theory, and it is consistent if and only if the corresponding theory is consistent.
- It follows the new way of Software Engineering, which emphasizes the representation of knowledge as an essential step in system development and maintenance.

3.3. Conceptual Knowledge Model (CKM)
The CKM is a specification of the IS. It will be derived from an informal requirements description of the IS by using several iteration cycles as discussed earlier. The CKM is a 'model type', i.e. it consists of models. A model can be a graphical model, a formal model, or any analytical model. Depending on the aspect we are interested in, we choose a suitable representation. For example, we use a graphical representation in order to have a more pleasant interaction with the users. In the following we will introduce a graphical representation and a formal representation.
3.3.1. Graphical Representation
The basic concepts of GR are graphical symbols, e.g. ellipses, rectangles, arrows, etc. The reason for this representation is not only the communicability factor but also a precise 'definition' of the concepts which we are going to formalize later on. From our point of view it is important to consider both the dynamic and the static aspects of an IS from the very beginning. Since the design of the conceptual schema will be immediately influenced by the static and dynamic aspects of the IS and vice versa, the GR comprises different types of graphs for covering static and dynamic aspects. We will introduce the concepts and techniques that can be used to construct a model represented by three graphs: the requirement graph, the behavior graph, and the information structure graph. The requirement graph supports the determination of information requirements; the
behavior graph serves as a specification of the integrity constraints and the dynamic aspects; the structure graph shows the static aspects of the IS, i.e. the entities and their associations.
3.3.2. Conceptual Requirement Graph (CRG)
The objective of CRG is to provide a broad understanding of the enterprise as a whole and to identify the information and functions upon which attention needs to be focused. The application will be expressed in terms of functions and the information flow through these functions, to show the relationships between them. By means of CRG we will interactively derive the system in question by abstracting and/or refining the description levels functionally to the point where a complete activity hierarchy exists. On the various abstraction levels there are differently detailed system descriptions, where each level is self-consistent; see also Fig. 4. The CRG is a directed graph defined as:
(F, I, E, T, Fl)
where
- F is the set of functions,
- I is the set of information units,
- E is the set of events,
- T is the set of triggers,
- Fl is the set of information flows.
In the following we discuss these concepts.
3.3.3. Definition of basic concepts
CKM consists of a set of interrelated specification units. Each definition describes some real world phenomenon. CRG has the following elements (see Fig. 5):
(1) Function is a group of interrelated activities considered as a black box. We are interested primarily in the purpose of a function, i.e. the relationship that exists between its inputs and its outputs. The set of functions F is composed of two disjoint subsets:
F = eF ∪ cF
where
- eF is the set of elementary functions,
- cF is the set of complex functions. A complex function is composed of elementary functions and/or other complex functions.
A function f ∈ F consumes/reads elements from some of its input information units. If a function f ∈ F is connected with a trigger t ∈ T by an arc of Fl in T × F, the function may be performed as soon as the trigger condition is satisfied.
[Fig. 4. Abstraction levels of refinement of functions (levels 0 to 3; boxes include call for paper planning, Chair/Referee invitation, Person rejection, Chair selection, Chair rejection, Referee rejection).]
[Fig. 5. CRG symbols: function, activity, agency (F, A); internal information unit; external information unit (L); external event (exE); internal event (inE); trigger (T) with input and output logic AND, OR, EXOR, loop; information flow and control flow (Fl) with read and write access; compound activity and compound information unit, each describing an element of the (n-1)th level on the nth level.]
(2) Information unit is an entity representing an object of the real world. The set of information units I is composed of two disjoint subsets
I = L ∪ C
where
- L is the set of external information units, which represent information entering into a function (Li) or outgoing from a function (Lo) when it is in operation. Modification of L does not activate the functions. The list may be used to pass parameters to the IS and can be discarded after use. An example of Li is a set of facts in PROLOG; an example of Lo is the user console.
- C is the set of internal information units, which will be stored as part of the IS and later on transformed into the entity classes of the structure graph.
For a given function f ∈ F the set of input information units is defined by
pre(f) = {i | (i, f) ∈ I × F},
and the set of output information units by
post(f) = {i | (f, i) ∈ F × I}.
(3) Information flow is a relationship between two concepts, where one concept is the source and the other the destination of the flow:
Fl ⊆ ((I × F) ∪ (F × I)) ∪ ((E × T) ∪ (T × F)).
An arrow points to a function that accesses an information unit, or to an information unit that is accessed by a function, i.e. Fl ⊆ (I × F) ∪ (F × I). An arrow also represents the association of events with functions via a trigger, i.e. Fl ⊆ (E × T) ∪ (T × F).
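As an illustration of the definitions above, the following minimal Python sketch stores a CRG as the tuple (F, I, E, T, Fl) and checks that every arc lies in (I × F) ∪ (F × I) ∪ (E × T) ∪ (T × F). The class name `CRG`, its method names, and the small example graph are assumptions made for this sketch; they are not part of the CKM definition.

```python
from dataclasses import dataclass, field


@dataclass
class CRG:
    """Illustrative container for a CRG (F, I, E, T, Fl); names are hypothetical."""
    functions: set = field(default_factory=set)    # F
    info_units: set = field(default_factory=set)   # I = L ∪ C
    events: set = field(default_factory=set)       # E
    triggers: set = field(default_factory=set)     # T
    flows: set = field(default_factory=set)        # Fl: set of (source, destination)

    def is_well_formed(self) -> bool:
        """True if Fl ⊆ (I×F) ∪ (F×I) ∪ (E×T) ∪ (T×F)."""
        allowed = [
            (self.info_units, self.functions),
            (self.functions, self.info_units),
            (self.events, self.triggers),
            (self.triggers, self.functions),
        ]
        return all(
            any(src in a and dst in b for a, b in allowed)
            for src, dst in self.flows
        )

    def pre(self, f):
        """Input information units of function f."""
        return {s for s, d in self.flows if d == f and s in self.info_units}

    def post(self, f):
        """Output information units of function f."""
        return {d for s, d in self.flows if s == f and d in self.info_units}


# Assumed toy example: event E1 raises trigger T1, which activates function A1;
# A1 reads list L1 and writes class C1 and list L2.
crg = CRG(
    functions={"A1"},
    info_units={"L1", "C1", "L2"},
    events={"E1"},
    triggers={"T1"},
    flows={("E1", "T1"), ("T1", "A1"), ("L1", "A1"), ("A1", "C1"), ("A1", "L2")},
)

print(crg.is_well_formed())
print(sorted(crg.pre("A1")), sorted(crg.post("A1")))
```

An arc that connects an event directly to a function, bypassing a trigger, would make `is_well_formed` return False under this reading of the definition.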
An arc with a single arrow specifies that those elements taken from the source are put into the destination. An arc with a double arrow specifies that those elements taken from the source are put back into the same place when the function (action) has taken place. If the arrow head is omitted, a read-only access from the source is modelled.
(4) Event. According to ISO an event is constituted by the fact that something has happened either in the UoD or in the IS. By our definition an event is a 'signal' or a 'message' which arrives from outside the system or is generated within the system (compare with placing a token in Petri nets, or activating a goal in PROLOG). We use events for synchronization of the behavior of the activities within the system. We also use events to transmit information into or within the system. We shall call the sending out of a signal the raising of an event and the receiving of a signal the consumption of an event. An event happens between two points in time, called 'start' and 'end' times, and at any time between them the event is said to be active.
Messages are signals containing in addition some data. They allow us to pass specific information between the activities. In this kind of setting a signal is a primitive message conveying the information that something has happened.
The set of events is composed of two disjoint subsets:
E = exE ∪ inE
where
- exE is the set of external events, which represent instantaneous changes of an object of the real world resulting in a state change of the IS. The occurrence of external events depends on conditions outside the formal scope of the IS (exogenous):
exE = <event-id> (<event-var-list>).
- inE is the set of internal events. An internal event causes a state change of the IS, and its occurrence depends either on another state change (generated by a function) or on a system handled time condition (endogenous):
inE = <event-id> (<event-var-list>).
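The event notion described above (external versus internal, active between a start and an end time, optionally carrying data as a message) can be sketched as a small Python record. The class `Event`, its field names, the inclusive treatment of the start and end times, and the sample values are assumptions of this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Illustrative event record of the form <event-id>(<event-var-list>)."""
    event_id: str
    external: bool       # True for an element of exE, False for inE
    start: float         # 'start' time
    end: float           # 'end' time
    params: tuple = ()   # event-var-list; non-empty for a message

    def is_active(self, t: float) -> bool:
        # Assumption: the event counts as active at its start and end times too.
        return self.start <= t <= self.end

    @property
    def is_message(self) -> bool:
        # A message is a signal that carries some data in addition.
        return len(self.params) > 0


# Hypothetical instances: an external message and an internal plain signal.
paper_arrival = Event("paper_arrival", True, 10.0, 10.0, ("person", "paper", "time"))
done = Event("done", False, 12.0, 13.0)

print(paper_arrival.is_message, done.is_message)
print(done.is_active(12.5), done.is_active(13.5))
```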
Events will be implemented in PROLOG as a goal for querying and manipulating the data base.
(5) Trigger. The trigger mechanism was first formally introduced in SEQUEL, allowing the user to specify certain actions which were to be executed under user specified conditions. The function of a trigger consists in accepting events, activating the activities under specified conditions, and transmitting message information (event parameters) into the activities. In particular it allows control of sequence, choice and repetition of activities to represent the behavioral relationships between the actions. The logic symbol in the upper half of the trigger indicates the input condition of the trigger (compare with the transition rule of a Petri net). According to the label marked in the lower half of the trigger the associated functions will be activated. The upper and the lower label will be omitted if there is only one event or one function, respectively. Where + indicates AND; x sequence; || concurrency; 0 choice; * repetition; and + alternative.
We have implemented the trigger as an 'active' element, monitoring its input events and activating its output activities. In general, the reaction of a trigger is described by 'triggering rules' defined as follows:
T = X(<event>) : X(<function (or activity)>)
- X is the input or output logic AND, ANY, FOR, ...
- <event> is the set of input events E1, ..., En,
- <function (activity)> is the set of functions F1, ..., Fn (activities A1, ..., An) to be triggered.
There are several compositions of triggering rules, for instance:
E : sequential
E : SEQ(A1, A2)
states that if event E occurs, then actions A1, A2 must be triggered in sequence.
and : sequential
AND(E1, E2) : SEQ(A1, A2)
states that if events E1 and E2 occur (in any order), then the activities A1 and A2 must be triggered in sequence.
sequential : sequential
SEQ(E1, E2) : SEQ(A1, A2)
states that if event E2 occurs after E1 in sequence, then action A2 must be triggered after A1.
concurrent : concurrent
CON(E1, E2) : CON(A1, A2)
states that if 'in any order' the event E1 or E2 occurs, then the action A1 or A2 must be triggered 'in any order'.
choice : choice
ANY(E1, E2) : ANY(A1, A2)
states that if E1 or E2, but not both, occur, then A1 or A2, but not both (it does not matter which one), must be triggered.
alternative : alternative
IF(E1, E2) : THEN(A1, A2)
states that if event E1 occurs, action A1 must be triggered; if event E2 occurs, action A2 must be triggered.
E : repetitive
E : FOR(n * A1)
states that for each occurrence of E the action A1 is to be triggered n times.
3.3.4. Functional decomposition
The size of a diagram will be very large when the application is represented at the lowest level of aggregation, i.e. when all the information functions and information units are shown in one diagram. It would be difficult to grasp its contents. But we can use the decomposition of a CRG into subsystems to cut the diagram into smaller pieces.
The diagrams in the model are organized in a hierarchic and modular fashion, often called 'top-down'. The scope of the system is established in a single overview diagram. The component parts (functions, information units, events, triggers) shown in the overview are then detailed, each on another diagram. Each part shown on this detail diagram is again broken down, and so forth, until the system is described to any desired level of detail. Each decomposition describes a function of the (n - 1)th level on the nth level, where n ≥ 2. In this way we obtain a hierarchy of diagrams. By abstraction and/or refining we arrive at a last level that comprises all activities existing in the IS.
3.3.5. Description of graphs
The symbols used in the graphs are shown in Fig. 5; boxes represent active elements and arrows represent relationships between these. An arc without an arrow head specifies flow of information and control. An arc with one arrow head is used to model a read or write access. It is clear that only active elements can read or write. An arc with a double arrow specifies read and write, i.e. entities taken from the class are put back in the same class again.
The large rectangle in the diagram shows the system boundary. The dashed lines describing a compound activity and information unit indicate that the black box on the higher level has this structure.
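The triggering rules of Section 3.3.3 can be made concrete with a small interpreter. The Python sketch below is one possible reading of the rules, not the PROLOG implementation mentioned in the text: the function names, the tuple encoding of a rule, the treatment of a single event E as AND over one event, and the choice of the first action for ANY are all assumptions of this sketch.

```python
def input_satisfied(logic, expected, occurred):
    """Evaluate the input logic of a trigger.

    expected: event names listed in the rule.
    occurred: event names in their order of arrival.
    """
    seen = [e for e in occurred if e in expected]
    if logic == "AND":   # all events, in any order
        return set(seen) == set(expected)
    if logic == "SEQ":   # all events, in the order given by the rule
        arrivals = iter(occurred)
        return all(e in arrivals for e in expected)
    if logic == "CON":   # at least one of the events, in any order
        return len(seen) > 0
    if logic == "ANY":   # exactly one of the events, but not both
        return len(set(seen)) == 1
    raise ValueError(f"unknown input logic: {logic}")


def activations(logic, actions, n=1):
    """Return the actions to be triggered, in order, for an output logic."""
    if logic == "SEQ":   # all actions in sequence
        return list(actions)
    if logic == "ANY":   # one action; which one does not matter, we take the first
        return list(actions[:1])
    if logic == "FOR":   # the first action, n times
        return list(actions[:1]) * n
    raise ValueError(f"unknown output logic: {logic}")


def trigger(rule, occurred, n=1):
    """Apply a rule (input logic, events, output logic, actions) to the arrivals."""
    in_logic, events, out_logic, actions = rule
    if input_satisfied(in_logic, events, occurred):
        return activations(out_logic, actions, n)
    return []


# AND(E1, E2) : SEQ(A1, A2) fires regardless of arrival order.
print(trigger(("AND", ["E1", "E2"], "SEQ", ["A1", "A2"]), ["E2", "E1"]))
# SEQ(E1, E2) : SEQ(A1, A2) does not fire when E2 arrives before E1.
print(trigger(("SEQ", ["E1", "E2"], "SEQ", ["A1", "A2"]), ["E2", "E1"]))
```

The concurrent and alternative output forms (CON, THEN) are left out here because their scheduling semantics depend on the execution environment.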
Data are transferred into the system by event parameters. In addition, there is information which is stored in the system over a long period of time, whose modification can be done manually (by editing) and has no effect on activating the functions, for instance a list of addresses. This information is graphically shown by diamonds. The same symbol is used to represent data leaving the system, for instance to be displayed on the terminal or transformed into some files.
A special notation is provided for referring to the concepts. Each has a capital letter as the first character indicating the type of the concept: 'A' for activities, 'T' for triggers, 'C' for classes, 'E' for events, 'L' for lists. The letters are followed by numbers (not greater than six). For example, the diagram with index A21 is the diagram which details box 1 on the A2 diagram. Similarly, A2 is the diagram that details box 2 on the A0 diagram, which is the top of the model. For convenience we also use texts to make references, so that each concept is associated with a text. Thus we may use only the text references and omit the number references.
Figure 6 shows an overview of a conference organization exercise given in [33] (see Appendix B). The total distribution list L1, the deadline date L6, and the invitation list L7 are examples of input external information units, whereas call for papers L2, invitation to persons L3, paper to referee L4 and invitation list L5 are output external information units. The PREparation (PRE) A1, Program Committee (PC) A2, and Organization Committee (OC) A3 are functions. Person C1, program C2 and attendance C3 are internal information units.
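The numbering convention for activity diagrams (A21 details box 1 of A2, which in turn details box 2 of A0) can be traced mechanically. The following Python sketch assumes that every box number is a single digit from 1 to 6 and that A0 is the only index containing a 0; the helper names `parent_of` and `path_to_top` are hypothetical.

```python
def parent_of(index):
    """Return (parent diagram index, box number) for an activity index, or None for A0."""
    if not index or index[0] != "A":
        raise ValueError(f"not an activity index: {index!r}")
    digits = index[1:]
    if digits == "0":
        return None                      # A0 is the top of the model
    if not digits or any(d not in "123456" for d in digits):
        raise ValueError(f"box numbers must be between 1 and 6: {index!r}")
    if len(digits) == 1:
        return ("A0", int(digits))       # e.g. A2 details box 2 on A0
    return ("A" + digits[:-1], int(digits[-1]))


def path_to_top(index):
    """List the diagrams from the given index up to the top diagram A0."""
    path = [index]
    parent = parent_of(index)
    while parent is not None:
        path.append(parent[0])
        parent = parent_of(parent[0])
    return path


print(parent_of("A21"))    # ('A2', 1)
print(path_to_top("A21"))  # ['A21', 'A2', 'A0']
```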
[Fig. 6. CRG: A0 overview.]
[Fig. 7. Preparing conference (level 2).]
[Fig. 8. Program Committee (level 2).]
The information flow is a result of functions and activities that are triggered by the events Init. E1, paper arrival E2, and response to invitation E3. For instance, the preparation activities begin when a message indicating Init. arrives at the system. Then the addresses of the persons to be invited are read from L1. Names of persons will be stored in the person information unit, and invitations and calls for papers are sent to persons.
An overview of functions at level 1 in connection with the organization of the conference is given in Figs. 7 and 8.
3.3.6. Conceptual Behavior Graph (CBG)
Above we have been looking at 'what' should be done in the CRG; now we are interested in 'how' it should be done. We now reveal the black box and show how it works. Where the CRG provides a purely functional description of the IS (extended by event concepts), the CBG describes in addition the behavior of each function by so-called pre- and post-conditions. They state that whenever the precondition is satisfied the event(s) lead the system to a state where the post-condition is satisfied. When the precondition is not fulfilled and yet the events occur, the activity will not be performed. Thus the CBG is oriented towards specifying the functional behavior of the IS in more detail and towards integrating semantic knowledge about the elements of information units.
The CRG is directly mapped to the CBG:
- The functions are replaced by activities (or actions);
- Information units, events and triggers of CRG correspond to those of CBG.
CRG and CBG are also related to each other through the representation of the information flow. A flow (or part of a flow) in CRG to an information function will also be a flow to an action in CBG. In this way, the information requirement can be traced in the CBG.
The structure of CBG is represented in the same way as in the CRG. The graphical symbols have the same meaning, with the following exceptions.
In the symbols for actions there are also listed the integrity constraints as pre- and post-conditions for activating the action, in a pseudo-language that describes the behavior of the function in a less formal form, specifying the constraints which must be met before and after triggering the action. According to the label marked in the lower half of the trigger symbol the action within the rectangle will be activated. During the processing the lists and classes are read or written. Graphically the pre- and post-conditions are separated by a horizontal line. In Fig. 9 we see an example of the session preparing function. When the external events deadline time 'and' program committee meeting day occur, the papers are to be assigned to the sessions, 'and' sessions are to be grouped into the program, 'and' an internal event is to be generated by the done action to inform the subsequent activity about the performed task. The precondition n < max specifies that a paper may be assigned to a session as long as the maximum number of papers which may be included in a session has not been reached. By assigning a paper to a specific session the number of assigned papers is raised by 1 (as specified by the post-condition incr(n, 1)). Furthermore, since we assume that a session is modelled as a grouping of papers, the relationship pa-se(pa, se) is established between the session and the assigned papers. For the same reason the relationship se-pgm(se, pgm) between session and program is created. According to the output logic of the trigger (*) this triggering is repeated several times (in PROLOG until a 'fail' is detected).
[Fig. 9. An example of triggering of activities.]
We will now provide a definition for CBG, and discuss in detail those parts of the CBG which differ from CRG. A CBG is defined as a directed graph:
(I, A, E, T, Fl)
where
- I is the set of information units,
- A is the set of activities,
- E is the set of events,
- T is the set of triggers,
- Fl is a relation representing the arcs of CBG.
All functions of CRG may be directly taken over in CBG by replacing function by action.
(1) Action. An action is an application-oriented operation on information units. Before performing the action, the preconditions must be met, and actions on other objects may be necessary. After the action is invoked a post-condition must be checked. Thus actions provide the only means for manipulating information units. This ensures the semantic integrity of information units. The objective of an action is that it may be used to construct any legal application-oriented operation:
A ::= (Pre-cond, Action, Post-cond)
where
Pre-cond, Post-cond ::= Ii ∪ Ij,
Ii, Ij ::= term logical-operation term,
term ::= variable, constant.
(2) Behavior of an information unit is completely defined by its activities. If an activity a ∈ A is associated with a trigger by an arc which is an element of T × F, the occurrence of input events and satisfaction of the input condition of the trigger is a prerequisite for executing the activity. An activity may be executed if (some of) its input information units contain appropriate entities and (some of) its output information units are able to accept corresponding entities. The restrictions which must hold for entities of input (output) information units are specified by the pre- and post-conditions of the activity. In the same way as channel/agency nets [36], the CBG does not provide a unique specification with respect to the consumption and generation of entities. During the execution of an activity:
- Not all input information units must contain entities;
- Not all output information units receive entities;
- The arcs do not specify the number of entities which are consumed or produced.
(See also Figs. 10, 11 and 12).
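To illustrate the triple A ::= (Pre-cond, Action, Post-cond), the following Python sketch models the paper-assignment example of the session preparing function: the precondition n < max, the post-condition incr(n, 1), and the relationship pa-se(pa, se). The class `Action`, the dictionary used as system state, and the decision to raise an error on a violated post-condition are assumptions of this sketch, not part of the CBG definition.

```python
import copy
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """Illustrative (Pre-cond, Action, Post-cond) triple over a state dictionary."""
    name: str
    pre: Callable[[dict], bool]
    body: Callable[[dict], dict]
    post: Callable[[dict, dict], bool]   # receives (state before, state after)

    def run(self, state: dict):
        if not self.pre(state):
            return None                  # precondition not met: action not performed
        new_state = self.body(copy.deepcopy(state))
        if not self.post(state, new_state):
            raise ValueError(f"post-condition violated by {self.name}")
        return new_state


def assign_paper(paper, session):
    """Build the action that assigns one paper to one session."""
    def body(s):
        s["n"] += 1                          # incr(n, 1)
        s["pa_se"].add((paper, session))     # establish pa-se(pa, se)
        return s

    return Action(
        name="assign paper to session",
        pre=lambda s: s["n"] < s["max"],
        body=body,
        post=lambda old, new: new["n"] == old["n"] + 1
        and (paper, session) in new["pa_se"],
    )


# Repetition (*): keep triggering until the precondition fails.
state = {"n": 0, "max": 2, "pa_se": set()}
for paper in ["p1", "p2", "p3"]:
    result = assign_paper(paper, "s1").run(state)
    if result is None:
        break
    state = result

print(state["n"], sorted(state["pa_se"]))
```

With a maximum of two papers per session, the third paper is not assigned, which mirrors the repetition stopping at the first failure.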
[Fig. 10. CBG-A1: Preparing Conference (level 3).]
[Fig. 11. CBG-A2: Program Committee (level 3).]
[Fig. 12. CBG-A22: Program Committee (level 4).]
[Fig. 13. Actions hierarchy.]
3.3.6.1. Action hierarchy
In order to specify a complete information handling we also need to model the action hierarchy. Using the example introduced in [30] we are able to execute actions in a hierarchical manner, providing the user with a more explicit description of the interdependencies of the actions. This is shown in Fig. 13 for the event hire. When the event hire(manager) arrives, the manager will be hired. Thereafter the activities hire employee, enter manager salary, calculate manager vacation and set stock option are executed in turn. The event hire(employee), which can also be used to hire secretaries and engineers, for example, will not be used to trigger enter-salary and calculate-vacation, as these are handled by enter-manager-salary and calculate manager vacation. The events are totally independent. For example, the activity create ID may be triggered while hire-employee has not necessarily occurred.
3.3.7. Conceptual Structure Graph (CSG)
The static aspect of the information units (properties and associations to each other) is represented in a so-called semantic network and used to describe the meaning and the type of data stored in the IS. By means of CRG and CBG we arrive at the source level, where information units will not be refined any further, i.e. we will replace information units by entity-classes.
Note. The main problem with the semantic network is that the application specialist (expert person) is not familiar with all the modelling concepts usually offered by a semantic network, so that the identification of entity-classes and recognition of abstractions will not be an easy task. Even a designer is not in a position to get a general idea of it, because of the lack of knowledge about the application. As a result it is not feasible to develop a complete network within one design cycle. But as we develop the CKM iteratively and successively gain new information during each cycle, at last we will have a complete network.
In the field of database and artificial intelligence, several forms of semantic networks or entity-relationship diagrams have been introduced, among others [7, 8, 11, 40, 47]. We have kept the main concepts of these proposals and added some of our own ideas. We will now give a definition of the CSG; for further details see the references.
(1) Definition of CSG concepts. A semantic network is a directed net consisting of labelled nodes and labelled arcs connecting these nodes, where nodes represent the entity-classes and arcs the associations among them. More precisely CSG is defined as:
(EC, AC, IC)
where
- EC stands for entity-classes,
- AC stands for a class of arcs designating the associations,
- IC stands for constraints assigned to EC and AC.
(2) Entity-class. An entity represents an object of the real world. A set of entities having the same properties is an entity-class. We represent an entity-class by the predicate C(X, T) where
- X: a set of entities of the defined type,
- T: time (optional).
For example, employee(Employee-name, Year). The fact employee(Meier, 1984) indicates the existence of Meier in the year 1984.
Restriction: all entity-classes must have unique names.
(3) Association class. We distinguish four types of connections, that is, AC is composed of
AC = R ∪ G ∪ Ag ∪ Gr
where R = set of relationships, G = set of generalizations, Ag = set of aggregations, Gr = set of groupings.
The set of relationships is composed of two disjoint subsets
R = MMR ∪ CMR
where
- MMR is the member-member-relationship; it describes relationships between members of two classes. Example: employee-worksfor-department.
- CMR is the class-member-relationship; it specifies the relation between a class as a whole and a member of another class. It represents the attributes of that entity-type. Example: employee-hasmax.cardinality-number.
We use the notation r(C1, C2) to represent the relationships between two classes. An example would be has-age(Person, Age); this indicates that all persons from class person have an age from class age. Every entity of class person 'can' have a relation to another entity-class, e.g. john-smith has a relation to age-class, but no relation to car-class.
Restrictions to relationships:
(i) All relationships must have different names;
(ii) Cardinality constraints: For a given relationship between two entity-classes A and B, Cr = (min, max) specifies the minimal and maximal number of entities of class A related to class B. The minimum and maximum cardinality constraints attached to a relationship describe how many times an entity can occur or must occur in the set of relationships. Standard cardinalities are: (0, 1), (1, 1), (0, *), (1, *). * means that the maximum cardinality is not restricted.
(4) Generalization is an association arc which connects a sub-class to a super-class, where the sub-class is a subset of the super-class. For example, the facts that an employee is a person and a student is a person are represented by generalization arcs going from the employee-class to the person-class and from the student-class to the person-class. The arcs are labelled with the 'role' that the person has in this particular association. In this case, it is the function of the person to be either an employee or a student.
The most important consequence of this organization is that properties (relations) can be inherited down the hierarchy. Such an organization is efficient, since properties can be associated with the most general applicable class, and natural, because the more two classes have in common, the closer they will be in the hierarchy. To say that person is a generalization of student (conversely, student is a specialization of person) means that every member of class student is a member of class person. Moreover every definitional property of a person must be a property of a student.
Generalization is also a means for a stepwise refinement that is based on the introduction of details for special classes. We use the notation is-a(C2, C1), e.g. is-a(STUDENT, PERSON). Restriction: is-a(C1, C2): an element of C1 must be an element of C2.
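The cardinality restriction and the inheritance of properties along is-a arcs can be checked with a few lines of Python. In this sketch the counting rule (how often each entity of class A occurs in the relationship instances) is one possible reading of Cr = (min, max); the dictionaries `IS_A` and `PROPERTIES`, the function names, and the entity mary are assumptions introduced only for illustration.

```python
def check_cardinality(pairs, members_a, min_card, max_card=None):
    """Check Cr = (min, max) for every entity of class A.

    pairs: set of (a, b) relationship instances.
    members_a: the entities of class A.
    max_card=None stands for '*', i.e. no upper bound.
    """
    counts = {a: 0 for a in members_a}
    for a, _ in pairs:
        counts[a] = counts.get(a, 0) + 1
    return all(
        c >= min_card and (max_card is None or c <= max_card)
        for c in counts.values()
    )


# is-a(sub-class, super-class) arcs and the properties attached to each class.
IS_A = {"student": "person", "employee": "person"}
PROPERTIES = {"person": {"has-age"}, "employee": {"works-for"}}


def inherited_properties(cls):
    """Collect the properties of a class and of all its super-classes."""
    props = set()
    while cls is not None:
        props |= PROPERTIES.get(cls, set())
        cls = IS_A.get(cls)
    return props


has_age = {("john-smith", "age-30")}
print(check_cardinality(has_age, {"john-smith"}, 1, 1))
print(check_cardinality(has_age, {"john-smith", "mary"}, 1, 1))
print(sorted(inherited_properties("employee")))
```

Attaching has-age to the person class once is enough for both sub-classes to obtain it, which is the efficiency argument made in the text.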
(5) Aggregation is an arc which connects an entity-class representing a 'part of' an entity to the entity-class representing the entire entity. For example, the fact that street and city are components of the entity address is represented by an aggregation arc going from the street-class and the city-class to the address-class. Inheritance also serves as a memory aid, since knowing that a class is a subclass of another class allows one to concentrate on the additional information needed to describe the subclass. We use the notation agg(C1, C2). Restriction: agg(C1, C2): an entity of C1 has exactly one component of C2. Like classification, aggregation can be applied recursively so that one can represent the components of the components of an object, etc. Thus aggregation defines a second organizational dimension.
(6) Grouping is an arc which connects an entity-class comprising groups of entities to a super entity-class. Each group is identified by an entity of the super-class. For example, the fact that many sessions are grouped into a program is given by a double arrow going from the session-class to the program-class. We use the notation gr(C2, C1), i.e. an entity of C1 has n entities x1, x2, ..., xn of C2 as elements.
Semantic networks are often conveyed using a directed graph. We will use the graphical convention shown in Fig. 14. After drawing the net, we must determine the cardinality of relationships. A one-to-one cardinality is shown in the graph with a single arrow in the relationship symbol, whereas a one-to-many cardinality is indicated with a double arrow. The example of a semantic network for the exercise of [33] is given in Fig. 15.
[Fig. 14. Symbols of CSG: entity-class or data-class; relationship (cardinality: one to one, one to many); generalization; aggregation; grouping.]
[Fig. 15. Example of a CSG-diagram.]
3.3.8. The methodology in the system life cycle
In this paper the methodology covers two phases of the system life cycle. In the first phase the CKM is developed until we have identified entities, their characteristics, and their behavior. The second phase involves validation of the model, and checking that the stated information requirements are satisfied.
3.3.8.1. Phase 1. The development of CKM graphs
The main idea is that the CKM is incrementally developed during the first phase. There is no strict top-down design strategy. The graphs are elaborated and quite possibly changed in the subsequent stages. In this section the development process is outlined. The process includes 3 major steps:
1. In the first step a 'skeleton' is configured merely by reasoning about the application discourse. The CRG is here successively decomposed and extended.
2. In the subsequent step the behavior of the CKM is represented in the CBG in more depth, by specifying the integrity constraints.
3. The last step involves drawing the CSG, where the entities identified in the previous steps are associated with each other.
Step 1. Development of CRG
An overview of the enterprise is taken as a whole, and documented in terms of its functionality, information flow and major information units. We are dealing here with 'what' rather than 'how'. The following substeps guide this process.
- User interaction: consider the user interaction with the system in the form of invoking functions by means of events IF ... THEN ...
- Function determination: determine the desired functions to carry out the task and divide the task into a sequence of functions or a group of concurrent functions.
- Information recognition: make an inventory of information units and events needed per function in the form:
function    event            input info     output info
PC          paper arrival    papers         program
            ref. report      ref. report    accept. paper
- Information flow: draw the information flow from source to destination. - Decomposition: determine which functions should be further decomposed
and which information units analysed in more detail. Complex functions are grouped into the subfunctions. The functional decomposition should ensure that the users recognize the functions as necessary to perform their job. The process of decomposition is continued until the analyst finds that he is finished with dealing with ‘what’ and can go to consider ‘how’. As a clue, if we decompose a function to subfunctions and we find out, that there is no information flowing from one subfunction to another, then we are finished with the decomposition. The endproduct of this step is a thoroughly understood and documented application. of CBG We are now in a stage that we can reveal the black boxes and look inside, to see ‘how’ they do work. Following substeps help to do this: - Parameter specification: Specify information to be passed as parameter to the action to be obtained through user interaction via events. - Entity class identification: Specify objects in the CBG on which an action will be performed, determine the retrieval and update mode. Step 2. Development
- Activity analysis: in order to determine whether an activity is exactly defined it must be checked whether
  - there exist situations in which not all input/output information units are accessed by an activity,
  - the pre- and post-conditions specified for an activity cover alternatives which should be represented by several subactivities.
  Having identified such situations, each of the alternative state transitions represented by one activity is represented by a corresponding subactivity.
- Pre- and post-condition specification: the pre- and post-conditions of an activity over an entity-class must enforce all properties of that object with respect to its neighbors. Notice that the conditions cannot refer to a class outside the action schema. The behavior of an activity can be based on the behavior specification of its trigger. Accordingly, the pre-conditions are the conditions required to correctly initiate the sequence of actions. The post-conditions are conditions that ensure the correct execution of actions. The pre- and post-conditions should verify any other ‘user defined’ integrity constraints.
Note: at this level we do not consider the integrity constraints based on a data model, with its primitive operations like ‘insert’ and ‘delete’. These are specified at the data base management level.

Step 3. Development of CSG
A major part of this task, in terms of the work involved, is defining the meaning of all entities identified. Once the entities have been identified, we have to find the interactions between entities, known as relationships. These have their own properties and need to be defined precisely as follows:
- Entity-class selection: select the entity-classes already identified from the CBG.
- Partition: partition classes having the same nature. Example: person, employee, worker.
- Association: relate the classes to each other in terms of relationships and abstractions and examine them with respect to the following criteria:
  - the two relationships “C1 a C2” and “C1 is-a C2” must not hold simultaneously between C1 and C2,
  - if a relationship C1 elem C2 is defined and C2 contains only one entity, the entity class C2 may eventually be removed (depending on the specific application).
- Integration: form an integrated structure graph including all classes.
- Cardinality designation: determine dependencies of classes in terms of: one to one (1, 1), one to many (1, *), many to many (*, *).
- Access table generation: once the graphs have been determined it is necessary to find the affinity between functions. This can be done manually by constructing a matrix as shown in Table 1, where entity classes are set against activities. This matrix shows exactly the type of information retrieved and updated by an activity. The matrix contains a column for every activity and a row for every entity-class. A matrix element is filled with g/c/r/m or is empty, indicating generate, create, read, modify. This matrix can be used to perform a completeness check for the IS. As a rule, every entity class should be updated by exactly one activity. If this is not the case one should consider combining the activities; otherwise, synchronization has to be considered. If a class or relationship does not have an entry in the access table, a new iteration cycle is required to introduce the corresponding activities.

Fig. 16.

Table 1. Access table: entity classes C1, ..., Cn against actions.
    Entity classes: reg. paper, referee, expec. referee-report, accep. referee-report, accep. papers, rejec. papers.
    Actions: collecting referee reports, evaluating papers.
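The completeness check on the access table can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the table entries, the class names and the function `check_access_table` are hypothetical and are not the contents of Table 1. It flags entity classes with no entry at all and entity classes that are updated (g, c or m) by more than one activity.

```python
from collections import defaultdict

# Hypothetical access table: (entity_class, activity) -> mode.
# Modes follow the text: g = generate, c = create, r = read, m = modify.
# The entries are invented for illustration and are not Table 1.
ACCESS = {
    ("paper", "register paper"): "c",
    ("paper", "evaluate papers"): "r",
    ("referee-report", "collect referee reports"): "c",
    ("referee-report", "evaluate papers"): "r",
    ("decision", "evaluate papers"): "c",
    ("decision", "notify authors"): "m",
}
ENTITY_CLASSES = ["paper", "referee-report", "decision", "referee"]

UPDATE_MODES = {"g", "c", "m"}  # every mode except a plain read


def check_access_table(access, entity_classes):
    """Return (classes without any entry, classes updated by several activities)."""
    updaters = defaultdict(set)
    touched = set()
    for (entity_class, activity), mode in access.items():
        touched.add(entity_class)
        if mode in UPDATE_MODES:
            updaters[entity_class].add(activity)
    missing = [c for c in entity_classes if c not in touched]
    conflicts = {c: sorted(a) for c, a in updaters.items() if len(a) > 1}
    return missing, conflicts


missing, conflicts = check_access_table(ACCESS, ENTITY_CLASSES)
print("no entry:", missing)
print("updated by several activities:", conflicts)
```

In this invented table the class `referee` would trigger a new iteration cycle, and `decision` would call for either combining the two activities or considering their synchronization.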
3.3.8.2. Phase 2, inspection, verification of CKM.
The objective of constructing the CKM is that it ‘should be’ a representation of knowledge about the UoD, and we ‘assume’ that it is a model of the area of interest. Thus we can use it as a basis for the design. In this sense we have to make sure that the model is correct. In most approaches it is assumed that the correctness of an information model is decided from heuristic considerations, although formal methods may be applied. There are a number of requirements that a model must conform to. Among these are: consistency, satisfiability, completeness, non-redundancy, and absence of deadlocks. We have to provide techniques for formally verifying the design decisions leading to the design of the IS. In the following we will discuss these.

(1) Correctness of CKM. We will discuss three correctness criteria of CKM, namely: consistency, satisfiability and completeness. From a logical point of view CKM constitutes a first-order theory. We employ the resolution principle to check that CKM is satisfied in a defined mathematical model, which is assumed to be a representation of the area of interest. We attach particular importance to this subject, and will discuss it in the next section.

(2) Redundancy. Once the analyst is confident that the graphs are sufficiently complete, he starts searching for any redundancy. Typically low-level functions are found to be repeated in several previously independent function hierarchies. A final check needs to be made to eliminate duplicated entity-classes. These are normally the result of the same class being given different but similar names by different users.

(3) Deadlock properties. The model can be transformed into a net of agencies and channels of a Petri net. On the one hand this gives a well-defined formal base, because Petri nets are axiomatically defined. On the other hand it is possible to examine different system properties such as deadlock, the detection of parts of the system which are never used (dead), the safeness of system components (e.g. storage capacity limitations) or feedback loops. A good discussion of this subject is given in [28]. By extending the interpretation of such nets it is possible to arrive at a simulation of the corresponding parts of the system.
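To make the deadlock and dead-part checks concrete, a minimal Python sketch follows. It assumes an ordinary place/transition net with unit arc weights and explores the reachable markings breadth-first; the net, the place and transition names and the function `explore` are hypothetical, and the source does not state which class of nets or which tool was actually used.

```python
from collections import deque

# Hypothetical net: transition -> (input places, output places), unit arc weights.
TRANSITIONS = {
    "submit": ({"author_ready"}, {"paper_received"}),
    "review": ({"paper_received", "referee_free"}, {"report_written"}),
    "decide": ({"report_written"}, {"decision_made"}),
    "withdraw": ({"paper_withdrawn"}, set()),  # its input place is never marked
}
INITIAL = {"author_ready": 1, "referee_free": 1}


def enabled(marking, transitions):
    """Transitions whose input places all carry at least one token."""
    return [t for t, (ins, _) in transitions.items()
            if all(marking.get(p, 0) >= 1 for p in ins)]


def fire(marking, transition, transitions):
    """Consume one token per input place and produce one per output place."""
    ins, outs = transitions[transition]
    new = dict(marking)
    for p in ins:
        new[p] -= 1
    for p in outs:
        new[p] = new.get(p, 0) + 1
    return {p: n for p, n in new.items() if n > 0}


def explore(initial, transitions, limit=10_000):
    """Breadth-first search of reachable markings.

    Returns (markings in which nothing is enabled, transitions that never fired).
    """
    start = frozenset(initial.items())
    seen, queue = {start}, deque([start])
    dead, fired = [], set()
    while queue and len(seen) <= limit:
        marking = dict(queue.popleft())
        ready = enabled(marking, transitions)
        if not ready:
            dead.append(marking)
        for t in ready:
            fired.add(t)
            nxt = frozenset(fire(marking, t, transitions).items())
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dead, sorted(set(transitions) - fired)


dead_markings, unused = explore(INITIAL, TRANSITIONS)
print(dead_markings, unused)
```

A marking in which nothing is enabled is only a deadlock in the harmful sense if it is not an intended final state, so the designer still has to interpret the result.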
4. Knowledge representation

4.1. Formal representation

In graphical modelling, as we stated above, there is no possibility to analyze the content of the model or to include correctness criteria other than those defined in the graph manipulation procedures. The formal definition of CKM is given in terms of a translation of the graphs into first-order logic.

Reasons for axiomatization. There are several reasons for introducing the axiomatization:
(1) The possibility to determine whether the model is correct;
(2) To determine whether the specification is self-contradictory;
(3) The formal representation will include support for semantical aspects. This can be used by a designer as the final arbiter in cases where there is a disagreement about the exact meaning of some specification;
(4) The formal definition can be transformed into an executable language such as PROLOG, which constitutes the representation of the knowledge.

The knowledge about a UoD is considered to be of two types, namely concrete and abstract knowledge. By concrete knowledge we mean facts, e.g. “jim is an employee”. Abstract knowledge (rules) refers to general statements about the UoD, e.g. “all employees have a salary”. When a first-order language is used for representing knowledge about a UoD, the abstract knowledge is represented by quantified sentences (i.e. formulae in which all variables are quantified) and concrete knowledge is represented by ground instances of the employed predicate symbols. Thus abstract knowledge constitutes the theory, and the concrete facts correspond to the stored data. Besides the syntactical properties, a knowledge base also represents semantics by the use of rules. There are the following types of rules:
- Derivation rules, or implication rules, allow implicit information to be derived from other information that is stored (known) directly. These rules constitute the integrity constraints to be held in the knowledge base and must always be true. One part of these rules is represented in the form of pre- and post-conditions associated with the manipulation of the knowledge base. Another important part is given by the schema rules.
- Schema rules define the schema that specifies a large configuration of commonly associated concept types, for example is-a or part-of. These are rules which are defined by the structure of the conceptual schema.
They are checked in the form of ‘side effects’ upon data manipulation and retrieval.
- Deduction rules are rules which deduce ‘new’ knowledge from the stored knowledge. Sometimes such rules are called ‘virtual rules’ because when knowledge is wanted, it must be computed rather than retrieved.
- Inference rules are a ‘means’ by which we are able to make deductions. One of the simplest inference rules is ‘modus ponens’:

    If A is true AND A implies B then B is true

or

$$\frac{A \qquad A \rightarrow B}{B}$$

In standard notation this is written $A \land (A \rightarrow B) \models B$, where $\rightarrow$ and $\vdash$ are used for ‘implies’ (i.e. is provable) and $\models$ for ‘yields’ (i.e. is a logical consequence).

In order to express knowledge concerning CKM, there is a need for a representation that allows inferences. We will use first-order predicate logic. This implies that the CKM specification is represented by a set of axioms and these axioms constitute a first-order theory.
- Search space rules [20] are used as statistical meta knowledge concerning the size and distribution of search spaces. Size estimates here refer to intermediate results and the number of secondary accesses in a data base, etc.

4.2. Derivation rules
A constraint that is declared in an Expert System is used to restrict the stored knowledge. For example, consider the constraint “every employee has a salary”. This constraint implies that every constant symbol which refers to an employee should be associated with another constant symbol that refers to a salary. This can be reformulated in terms of information as follows: whenever we know about an employee, then we also know about his/her salary. This constraint can easily be represented as follows:

$$\forall X\,(\text{emp}(X) \rightarrow \exists Y\,(\text{sal}(Y) \land \text{has-sal}(X, Y))) \tag{1}$$

However, this formula does not imply that only employees have a salary. We could close the predicate ‘has-sal’ by stating that only employees have a salary as follows:

$$\forall X\,\forall Y\,(\text{has-sal}(X, Y) \rightarrow \text{emp}(X) \land \text{sal}(Y)) \tag{2}$$

Assume that the following predicates are defined for this example:

    emp(x)          “x is an employee”
    sal(x)          “x is a salary”
    has-sal(x, y)   “x has salary y”
    tax(x, y)       “y is tax to be paid for salary x”
    pay(x, y)       “x pays the tax y”

A derivation rule may be formally represented as follows:

$$\forall X\,\forall Z\,(\exists Y\,(\text{has-sal}(X, Y) \land \text{tax}(Y, Z)) \rightarrow \text{pay}(X, Z)) \tag{3}$$

Another rule is

$$\forall X\,\forall Z\,(\exists Y\,(\text{has-sal}(X, Y) \land \text{tax}(Y, Z)) \leftrightarrow \text{pay}(X, Z)) \tag{4}$$

A derivation rule that is represented by an implication as in (3) is called a partial derivation rule, whereas the derivation rule corresponding to (4) is called a total derivation rule [26]. The difference between (3) and (4) is that the latter states that whenever an employee pays tax he/she must have a salary that determines the tax. The difference between a partial and a total derivation rule can also be interpreted as follows. A partial derivation rule states that given some information, other information is implied. A total derivation rule can then be considered as a partial derivation rule plus a constraint on the derived information. Formula (4) is equivalent to the two following formulae:

$$\forall X\,\forall Z\,(\exists Y\,(\text{has-sal}(X, Y) \land \text{tax}(Y, Z)) \rightarrow \text{pay}(X, Z)) \tag{4'}$$

$$\forall X\,\forall Z\,(\text{pay}(X, Z) \rightarrow \exists Y\,(\text{has-sal}(X, Y) \land \text{tax}(Y, Z))) \tag{4''}$$
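The difference between a partial and a total derivation rule can be tried out on a small set of ground facts. The following Python sketch is an illustration only: the facts and the function names are hypothetical, and relations are modelled as sets of tuples rather than as clauses of a logic program. `check_partial` reports pay facts that rule (3) implies but that are not stored; `check_total` additionally reports stored pay facts that have no supporting salary, which is the extra constraint contributed by (4'').

```python
# Hypothetical ground facts for the predicates has-sal, tax and pay.
HAS_SAL = {("jim", 3000), ("ann", 4000)}
TAX = {(3000, 600), (4000, 900)}
PAY = {("jim", 600), ("ann", 900), ("bob", 100)}


def derived_pay(has_sal, tax):
    """Pairs (X, Z) for which some Y exists with has-sal(X, Y) and tax(Y, Z)."""
    return {(x, z) for (x, y) in has_sal for (y2, z) in tax if y == y2}


def check_partial(has_sal, tax, pay):
    """Rule (3): every derivable pay fact must hold; return the missing ones."""
    return derived_pay(has_sal, tax) - pay


def check_total(has_sal, tax, pay):
    """Rule (4): rule (3) plus (4''); return (missing facts, unsupported facts)."""
    missing = check_partial(has_sal, tax, pay)
    unsupported = pay - derived_pay(has_sal, tax)
    return missing, unsupported


partial_result = check_partial(HAS_SAL, TAX, PAY)
total_result = check_total(HAS_SAL, TAX, PAY)
print(partial_result, total_result)
```

With these facts the partial rule is satisfied, while the total rule rejects the stored fact that bob pays tax without having a salary.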
4.2.1. Schema rules

The CKM is not only a representation of objects of the real world, but also a representation of the structure of the objects and their associations. This structure is modelled in the conceptual schema and can be defined in an axiomatic form. In the following we will specify the concepts used, such as entity-class, generalization, action, etc. To formalize the concepts used in the GR we first define axioms for the concepts. From these axioms theorems can be derived, which can be used for correctness checking.

Entities and entity-class. Entity is used to denote an ‘object’ or ‘thing’ of interest in the UoD. Consider an entity set employee with which a predicate emp(X) is associated. The predicate then defines a set of entities, namely the set of entities that satisfy the predicate, i.e. the predicate defines the set of employees. A class may not contain the same entity twice.

Cardinalities of entity-class. A cardinality declaration is intended to denote the number of entities of an entity-class and can be of three types, namely: minimum cardinality, maximum cardinality, and exact cardinality. For example, assume that the number of employees in a UoD is at most two. This can be represented as follows:
$$\forall X\,\forall Y\,\forall Z\,(\text{emp}(X) \land \text{emp}(Y) \land \text{emp}(Z) \rightarrow ((X = Y) \lor (X = Z) \lor (Y = Z)))$$

Correspondingly, if the number of employees is at least three:

$$\exists X\,\exists Y\,\exists Z\,(\text{emp}(X) \land \text{emp}(Y) \land \text{emp}(Z) \land \neg((X = Y) \lor (X = Z) \lor (Y = Z)))$$

An exact cardinality then has to be represented as a combination of minimum and maximum cardinality.
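As a worked instance of this last remark, and assuming that ‘combination’ simply means the conjunction of a minimum and a maximum declaration, the statement “there are exactly two employees” could be written as follows. The first conjunct is the minimum cardinality (at least two distinct employees), the second is the maximum cardinality formula for two given above.

```latex
\[
\exists X\,\exists Y\,\bigl(\text{emp}(X) \land \text{emp}(Y) \land X \neq Y\bigr)
\;\land\;
\forall X\,\forall Y\,\forall Z\,\bigl(\text{emp}(X) \land \text{emp}(Y) \land \text{emp}(Z)
\rightarrow (X = Y) \lor (X = Z) \lor (Y = Z)\bigr)
\]
```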
Relationships and relationship-class. Whenever two or more entities are associated in some way with each other, a relationship is said to hold between them. For example the association between employee and salary is represented by the binary predicate earn(X, Y) and

$$\forall X\,\forall Y\,(\text{earn}(X, Y) \rightarrow \text{emp}(X) \land \text{salary}(Y))$$

while the constraint is

$$\forall X\,\forall Y\,\forall Z\,(\text{earn}(X, Y) \land \text{earn}(X, Z) \rightarrow Y = Z).$$

The theory implies that every employee has at most one salary. Whenever an entity is deleted, all relations connected to the entity are also deleted.
Total and partial relationship. When all entities of an entity-class participate in a relationship, this relationship is said to be total with respect to the entity-class. For example, when all employees have a salary, it can be represented by

$$\forall X\,\exists Y\,(\text{emp}(X) \rightarrow \text{earn}(X, Y)).$$

A partial relationship denotes that some entity of an entity-class may not participate in a relationship class.
Sub-class. A sub-class denotes that all entities of an entity-class, say, employee, are also members of an entity-class, say, person:

$$\forall X\,(\text{emp}(X) \rightarrow \text{person}(X)).$$

If a class has a sub-class, then for all entities of the class a corresponding entity in the super-class must exist.
Generalization. A generalization relates an entity-class to a more generic one; thus a generalization is the counterpart of a sub-class, where the sub-class structure is defined upwards and the generalization structure downwards. For example,

$$\forall X\,(\text{emp}(X) \lor \text{secretary}(X) \leftrightarrow \text{person}(X)).$$

The left to right implication states that the set of employees and the set of secretaries are sub-classes of the set of persons. The right to left implication states that all persons are either secretaries or employees, or both. One can say that the latter implication ‘closes’ the set of persons in that it prohibits an entity from being a person without also being a secretary or an employee. If an entity is inserted in a generalization, then the same entity must be inserted in a sub-class.

Intersection. An intersection denotes that those entities that are members of two entity-classes constitute a third set. For example, managers who are also employees are board members, i.e.

$$\forall X\,(\text{emp}(X) \land \text{manager}(X) \leftrightarrow \text{board-member}(X)).$$

Aggregation. Aggregation is the concept of ‘disjointness’; it denotes that two entity-classes do not have any common entities. For example, the set of salaries is disjoint from the set of managers, i.e.

$$\neg \exists X\,(\text{sal}(X) \land \text{manager}(X)).$$

Here again the aggregation is defined downwards and the intersection structure is defined upwards.
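The schema rules above can be checked against stored data by treating each unary predicate as the set of entities that satisfy it. The Python sketch below is illustrative only; the class extensions and the function names are hypothetical. Each function returns the entities that violate the corresponding axiom, so an empty set means the rule holds.

```python
# Hypothetical extensions (sets of entity names) of the entity-classes.
EXT = {
    "emp": {"jim", "ann"},
    "secretary": {"sue"},
    "person": {"jim", "ann", "sue", "tom"},
    "manager": {"ann"},
    "board-member": {"ann"},
    "sal": {"s1", "s2"},
}


def subclass_violations(ext, sub, sup):
    """Entities violating: for all X, sub(X) -> sup(X)."""
    return ext[sub] - ext[sup]


def generalization_violations(ext, subs, general):
    """Entities violating: for all X, (sub1(X) or sub2(X) or ...) <-> general(X)."""
    union = set().union(*(ext[s] for s in subs))
    return union ^ ext[general]


def intersection_violations(ext, a, b, result):
    """Entities violating: for all X, (a(X) and b(X)) <-> result(X)."""
    return (ext[a] & ext[b]) ^ ext[result]


def disjointness_violations(ext, a, b):
    """Entities violating: there is no X with a(X) and b(X)."""
    return ext[a] & ext[b]


print(subclass_violations(EXT, "emp", "person"))
print(generalization_violations(EXT, ["emp", "secretary"], "person"))
print(intersection_violations(EXT, "emp", "manager", "board-member"))
print(disjointness_violations(EXT, "sal", "manager"))
```

Here only the generalization is violated: tom is stored as a person without being an employee or a secretary, which the right to left implication prohibits.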
Event. Event is intended to indicate that something is happening either in the real world or in the system, so that some action must be activated, for example,

$$\forall X\,(\text{ev}(X, t) \land \text{person}(X) \rightarrow \text{hire}(X)),$$

saying that when event ev has happened at the time t and X is a person, then X has to be hired.

In this way we may formalize all other concepts. These constraints will be packed into the conceptual schema and used to maintain the integrity. For example, when an entity is to be inserted in a person-class, which is a generalization of employee and secretary, then the associated entity (employee or secretary) must be inserted beforehand.

4.3. Deduction rules
These rules are called deduction rules because they make deduction possible from rules when answering user queries. Consider the following example which describes the activity of a university department. Included in the knowledge base could be facts such as

    teaches(schmidt, 115)
    teaches(schmidt, 452)
    size(115, 30)
    day(115, monday)

The facts state that Schmidt teaches courses 115 and 452, the enrollment in 115 is 30 students, and 115 is held on Mondays. These are the known things that the knowledge base would contain. But we can also include rules such as

    occupied(X, Y) ← teaches(X, Y)
    occupied(X, Y) ← attends(X, Y)
    attends(X, 141) ← year(X, 1) & major(X, compsci)

The first two rules could be used for timetabling purposes. They state that a person X is occupied with a course Y if X teaches Y or X is a student attending Y. The third rule states that every student who is in his/her first year and is majoring in computer science must attend course 141. Notice the power of this last rule: it tells us something about all first-year computer science majors. Using it we can deduce that all such students attend 141.

4.4. Inference rules
Modus ponens is the primary rule of inference by which a system adds new facts to a growing data base. For example, “if X is the husband of Y, then Y is the wife of X” and “hans is the husband of susi” lead to the conclusion that “susi is the wife of hans”. Applying an inference rule such as modus ponens yields:

    Axioms:  wife(Y, X) ← husband(X, Y)
             husband(hans, susi)
    Query:   ? wife(X, hans)
    Result:  X = susi

In this sense an axiom system is understood as a representation of a theory in such a way that certain sentences of the theory (the axioms, which are assumed to be self-evident) are placed at the beginning and from them the further sentences (the theorems) are derived by means of logical deduction.
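The way deduction rules and modus ponens work together can be sketched as a small forward-chaining loop. The Python fragment below is an illustration under stated assumptions, not the inference mechanism of the system described here: atoms are tuples, strings starting with a capital letter are variables, rule bodies are conjunctions of atoms, and the student eva is a hypothetical addition to the facts of Section 4.3.

```python
def is_var(term):
    """Variables are strings that start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def match(pattern, fact, binding):
    """Extend binding so that pattern equals the ground fact, or return None."""
    if len(pattern) != len(fact):
        return None
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding


def solve(body, facts, binding):
    """Yield every binding that satisfies all atoms of a rule body."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new ground fact can be added."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for b in solve(body, facts, {}):
                new.add(tuple(b.get(t, t) if is_var(t) else t for t in head))
        if new <= facts:
            return facts
        facts |= new


def query(pattern, facts):
    """Return the bindings under which the pattern matches a known fact."""
    return [b for f in facts if (b := match(pattern, f, {})) is not None]


FACTS = {
    ("teaches", "schmidt", 115), ("teaches", "schmidt", 452),
    ("size", 115, 30), ("day", 115, "monday"),
    ("year", "eva", 1), ("major", "eva", "compsci"),  # hypothetical student
    ("husband", "hans", "susi"),
}
RULES = [  # (head, body) pairs, read as head <- body
    (("occupied", "X", "Y"), [("teaches", "X", "Y")]),
    (("occupied", "X", "Y"), [("attends", "X", "Y")]),
    (("attends", "X", 141), [("year", "X", 1), ("major", "X", "compsci")]),
    (("wife", "Y", "X"), [("husband", "X", "Y")]),
]
closure = forward_chain(FACTS, RULES)
answers = query(("wife", "X", "hans"), closure)
print(answers)
```

A PROLOG interpreter answers such queries backwards from the goal instead; the forward direction is used here only because it shows most directly how new facts are added to the stored ones.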
5. Engineering and implementation issues

In the framework of the DB-EX project¹ we are implementing an expert system. The focus of the project is the integration of AI techniques and data base techniques for the design of Information Systems. We are implementing an Expert System for the conference organization application. Figure 1 shows the various components of the system and their interconnections. An experienced designer who has a good feeling for language and practice with conceptual graphs can easily do the analysis illustrated. Unfortunately, people with that kind of experience are rare. When the total time to gather the information, do the analysis, and write the rules is considered, the typical productivity of a knowledge engineer is about ten rules per week. A toy system with fifty rules can be implemented in a few weeks, but a serious system with about 2000 rules may take four years. To make expert system technology a commercial success, tools and aids must be developed to increase the productivity of the designer. To limit the size of this work we only give an overview of the PROLOG tool kit and the schema design aid.

¹ The project DB-EX is funded by the Deutsche Forschungsgemeinschaft (DFG) under the supervision of Dr. R. Studer and R. Yasdi at the University of Stuttgart. Many students are involved in this project, carrying out their thesis work. Detailed descriptions of the program code of the components, as far as implemented, are available at the Institute for Informatics as DB-EX Reports.

5.1. PROLOG tool kit for developing expert systems
The PROLOG tool kit contains a set of facilities which are useful as extensions of the PROLOG interpreter predicates for developing new expert system applications. Each part of the PROLOG tool kit is described in this document. The presentation style is primarily to explain how to use these facilities. A designer using this tool concentrates on writing the application program without concern for the way the facilities (built-in predicates) are executed, or the way the user dialog looks.

5.1.1. Utility function summary

There are a number of useful utilities available. The following is a high-level list of the various predicates that can be used. The built-in predicates are classified by their operation type.

Initialization commands. When the start predicate is used a file will automatically be consulted and executed. It may contain everything to be done when starting up. For example,
- to consult the files we use,
- to hold test cases,
- to turn on testing facilities.

Operating system commands. It is convenient to use the operating system commands without leaving the PROLOG environment. The following commands can be used:
- list: listing the files,
- edit: editing a file,
- del: deleting a file.

Predicate declaration. We may characterize a predicate by declaration to be
- remainable,
- saveable,
- readable,
- askable,
- explainable.
Getting input from the user. We use different methods to tell the system how to get data from the user:
(1) Screen formats. This is the format for a full-screen presentation. A screen is associated with a predicate by using the ‘menu-for’ declaration.
(2) Variable names. In some cases, the designer may be able to provide meaningful variable names. The goal must always have a fixed structure whenever it is to be asked (it can’t be asked one time with a variable instantiated, another time uninstantiated). Variable names are associated with a predicate by using the ‘give-arg’ declaration.

5.1.2. Utility function description
In this section we define the syntax of the provided predicates.

Remainable. When we define a predicate to be remainable, the system will remember the argument value for the duration of the session. We use remainable to remember that a certain predicate has been proved.

    remainable(Predicate)
    remainable(Predicate) ← condition

Example: Assume that pet(X, Y) is true when pet X is owned by person Y and that cat(X) is true if X is a cat:

    remainable(pet(X, Y)) ← cat(X).
This will remember the relation pet(X, Y) every time a person is proved to own a cat.

Saveable. If it is desirable that a deduced fact (and its proof) be remembered across multiple sessions, the predicate may be declared to be saveable. In that case a permanent record is saved in the associated file. This record may be reconsulted at a subsequent session, or the ‘readable’ attribute may be used.

    saveable(Predicate, File)
    saveable(Predicate, File) ← conditions

Example:

    saveable(P(X), File) ← cat(X) & cat-lover-file(File)

Readable. When a goal predicate is declared to be readable, the system first looks in the associated file to see if it has been proven previously. The action in this case is similar to remainable, except that a file is consulted. If found, the predicate is asserted to be true (analogously with the proof). Each search for a readable predicate starts at the beginning of the file. This approach is not desirable for large files with many references.

    readable(Predicate, Filename)
    readable(Predicate, Filename) ← condition
    readable(P(X), Filename) ← cat(X) & cat-lovers-file(Filename)

Askable. Fundamental to this feature is the concept that the user is simply an extension of the knowledge base, and can be asked for a proof of something if it is needed by the system to demonstrate the proof of another goal. If the sub-goal can be proved satisfactorily, then the user need not be bothered to supply the proof. The system only queries the user as a last resort. If the user should be queried about a certain predicate, the designer needs to declare the predicate ‘askable’ using one of the following:

    askable(Predicate)
    askable(Predicate) ← conditions

Example: If the program has a clause of the form:

    cat-care-rating(Cats-name, Rating) ← cats-home(Cats-name, Cats-home) & analysis-of-cat-care(Cats-name, Cats-home, Rating)

and the user must be asked about the cat’s home, we define cats-home to be ‘askable’:

    askable(cats-home(Cats-name, Cats-home)).
If we only want cats-home to be asked when the cat in question is important, we would define cats-home to be askable by the following:

    askable(cats-home(Cats-name, Cats-home)) ← important(Cats-name).
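The interplay of ‘remainable’ and ‘askable’ can be illustrated outside PROLOG. The following Python sketch is a loose analogue under stated assumptions, not the tool kit's implementation: the class `Session` and all of its attributes are hypothetical, goals are ground tuples, the user dialog is replaced by a callable, and only goals answered by the user are remembered. It shows the two behaviors described above: the user is queried only as a last resort and only if the askable condition holds, and a remainable goal is not asked again during the session.

```python
class Session:
    """Toy prover state; `ask` stands in for the user dialog."""

    def __init__(self, facts, ask):
        self.facts = set(facts)    # stored ground facts
        self.remembered = set()    # goals proved during this session
        self.remainable = set()    # predicate names declared remainable
        self.askable = {}          # predicate name -> condition on the goal
        self.ask = ask             # callable(goal) -> bool
        self.questions = []        # log of the goals put to the user

    def prove(self, goal):
        name = goal[0]
        if goal in self.facts or goal in self.remembered:
            return True
        condition = self.askable.get(name)
        if condition is not None and condition(goal):
            self.questions.append(goal)        # last resort: query the user
            if self.ask(goal):
                if name in self.remainable:
                    self.remembered.add(goal)  # do not ask again this session
                return True
        return False


# The stand-in user confirms everything that is asked.
session = Session(facts={("important", "tom")}, ask=lambda goal: True)
session.remainable.add("cats-home")
# Analogue of: askable(cats-home(Cats-name, Cats-home)) <- important(Cats-name)
session.askable["cats-home"] = lambda goal: session.prove(("important", goal[1]))

first = session.prove(("cats-home", "tom", "stuttgart"))
second = session.prove(("cats-home", "tom", "stuttgart"))
other = session.prove(("cats-home", "felix", "berlin"))
print(first, second, other, session.questions)
```

The second call succeeds without a further question, and the goal about felix is never put to the user because the condition important(felix) cannot be proved.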
Explainable. After a goal has been solved and the proof tree saved, the explanation component parses the tree. At every node there is a predicate. If the predicate is declared ‘explainable’, then some text (supplied by the programmer) is displayed to the user. If the proof is completed and none of the predicates in the proof tree is declared ‘explainable’, the user will not get an explanation. There are three ways to define a predicate as ‘explainable’.

(1) To tell the system that a predicate is explainable, we define it as follows:

    explainable(Predicate, Text) ← condition & computation.

For example, if employee-sex(Employee-name, Sex) is explainable we define a predicate

    explainable(employee-sex(Employee-name, Sex), Text) ← concate-str(Employee-name.‘is a’.Sex.nil, Text).

concate-str is used to create the text to be displayed to the user. It strings together a list of text clauses in quotes, and variables. If the proof is completed and included employee-sex(hans, man), the user would see

    hans is a man and smokes a cigar.

(2) Another alternative is to put the text directly in the second argument of the explainable predicate:

    explainable(employee-sex(Employee-name, Sex), ‘an employee is a man or a woman’.nil).

If the proof is completed and included employee-sex(hans, man) the user would see

    an employee is a man or a woman.

(3) Another alternative is not to specify text. In this case, the predicate will be printed as it is. It has the form

    explainable(employee-sex(Employee-name, Sex)).

If a proof is completed and included employee-sex(hans, man) the user would see

    employee-sex(hans, man).
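How an explanation component might walk a saved proof tree can be sketched as follows. This Python fragment is an assumption-laden illustration, not the tool kit's code: the proof tree, the registry `EXPLAINABLE` and the function names are hypothetical. The registry mirrors the three declaration forms: a callable computes the text from the arguments, a string is fixed text, and `None` prints the goal as it is. Goals without an entry produce no output.

```python
# A proof tree node is (goal, [child nodes]); a goal is a tuple (name, *args).
PROOF = (("hire", "hans"), [
    (("employee-sex", "hans", "man"), []),
    (("person", "hans"), []),
])

# Hypothetical registry of explainable predicates (see the three forms above).
EXPLAINABLE = {
    "employee-sex": lambda name, sex: f"{name} is a {sex}",
    "hire": "a person is hired when the hiring event occurs",
}


def show(goal):
    """Print a goal in predicate notation, e.g. employee-sex(hans, man)."""
    name, *args = goal
    return f"{name}({', '.join(map(str, args))})"


def explain(node, registry):
    """Walk the proof tree depth-first and collect text for explainable goals."""
    goal, children = node
    lines = []
    if goal[0] in registry:
        entry = registry[goal[0]]
        if callable(entry):
            lines.append(entry(*goal[1:]))   # form (1): computed text
        elif entry is None:
            lines.append(show(goal))         # form (3): goal printed as it is
        else:
            lines.append(entry)              # form (2): fixed text
    for child in children:
        lines.extend(explain(child, registry))
    return lines


computed = explain(PROOF, EXPLAINABLE)
as_is = explain(PROOF, {"employee-sex": None})
print(computed, as_is)
```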
We can also declare an ‘explainable’ for a goal that fails. The text created above is displayed if a goal succeeds and the goal is in the proof tree. If a goal fails in the proof tree, and its predicate is explainable in one of the forms:

    explainable(Predicate!, Text) ← condition & computations
    explainable(Predicate!, Text)
    explainable(Predicate)

then the text that corresponds to the predicate failing will be displayed. Often when a goal fails, some of the arguments are left uninstantiated. We may want to check for this in our conditions, so we don’t display something like:

    hans is a *.

Menu-for. Provides the user with a screen, where he can indicate his choice or enter data on a data-entry line. We must use one of the following formats:
    menu-for(Predicate, Menu-file, Display-text, Reply-text)
    menu-for(Predicate, Menu-file, Display-text, Reply-text) ← condition

where the arguments are defined as
- Predicate: the predicate name and its arguments,
- Menu-file: the name of a file containing all parameters and text for the screen picture,
- Display-text: the text to be shown to the user,
- Reply-text: the response of the user.

For example

    menu-for(employee-sex(Employee-name, Sex), employee-menu, what is the sex of employee?, employee-sex-choice)

When menu-for is called, the screen picture employee-menu is shown to the user and an answer is obtained. We must define at least one predicate whose name is employee-sex-choice and which has two arguments. After the user has typed the information requested and before the menu-for predicate finishes, the system looks for this predicate (or these predicates). We can check whether the reply is valid by defining the predicate(s):

    employee-sex-choice(*, man)
    employee-sex-choice(*, woman)

If the predicate(s) fail, the screen is shown again. In this case the user has the choice to enter man or woman. If we do not care to check an argument we replace it by ‘*’. If the user says ‘man’, then the result of predicate employee-sex(hans, *) will be employee-sex(hans, man).
Give-value. If we don’t choose to show the user a full-screen menu, we can let the system ask the user for the values of the predicate’s arguments. We define a predicate of the form:

    give-value(Pred, Pred-with-good-var-name, Bind-list)

where
- Pred is the predicate name and its arguments,
- Pred-with-good-var-name is the predicate again, but with arguments with small letters,
- Bind-list is a list that contains the good variable name for the corresponding argument referred to in the predicate.
For example,
    give-value(employee-sex(X, Y), employee-sex(Employee-name, Sex), (Employee-name.X).(Sex.Y).nil).

If employee-sex is called in the proof with neither variable instantiated and give-value is defined as above, the system displays the following:

    Enter the values
    employee-name:   (user types here hans)
    sex:             (user types here man)

If we don’t choose to show a menu or use the above method, and we have defined the predicate ‘askable’, the system will still ask the user for the value it needs. If employee-sex is called in a proof with the first variable instantiated to hans, and the second variable uninstantiated (employee-sex(hans, *)), the system displays the following:

    Enter the value for Y such that employee-sex(hans, Y) is true.
    Y:   (user types in value for Y here)

5.2. Computer aided tool for CKM design
A tool with a graphical display enables the designer to go through the process of knowledge acquisition and to develop the model interactively. Automating the whole process is impossible, but automating certain parts of it will relieve the designer of much of the complexity of such a design. The tool is a means to help the designer to build the knowledge schema and the transactions on it interactively, by successive enrichments. Such an enrichment can be seen as new knowledge obtained by the designer about the application in relation to already existing knowledge. The tool consists mainly of three modules: (a) information module, (b) definition module, (c) manipulation module. The modules have the following functions:

(a) Information module. In this mode one makes inquiries about graphs already defined. The desired graph is selected by entering CRG, CBG or CSG. The concepts of the graphs can be located by issuing the ‘FIND’ command. Having found the concept, we are able to look around the object by entering ‘DOWNWARD’ or ‘UPWARD’.

(b) Definition module. The designer is asked to enter the objects of the graphs. When the entering is finished, the system creates the desired graph. The tool must be able to detect situations in which there are inconsistencies, redundancy, and incompleteness.

(c) Manipulation module. Manipulations are performed on the existing graphs without a need for redefining the graphs. The objects of the graphs may be erased by a ‘DELETE’ command and renamed by a ‘MODIFY’ command. An object can be created in the graph by typing the ‘INSERT’ command.

Note: Manipulations of the data instances are performed by means of queries. Such user transactions will be executed on the prototype, not on the model.
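The commands of the information and manipulation modules map naturally onto operations on a directed graph of concepts. The Python sketch below is a minimal illustration under stated assumptions: the class `ConceptGraph`, its method names and the sample concepts are hypothetical, an edge from a parent to a child is read as ‘decomposes into’, and no consistency or redundancy detection beyond rejecting a duplicate name is attempted.

```python
class ConceptGraph:
    """Directed graph of concepts; an edge parent -> child means 'decomposes into'."""

    def __init__(self):
        self.children = {}  # concept -> set of child concepts

    def insert(self, concept, parent=None):
        """INSERT: create a concept, optionally below an existing parent."""
        self.children.setdefault(concept, set())
        if parent is not None:
            self.children.setdefault(parent, set()).add(concept)

    def delete(self, concept):
        """DELETE: erase a concept and every edge that points to it."""
        self.children.pop(concept, None)
        for kids in self.children.values():
            kids.discard(concept)

    def modify(self, old, new):
        """MODIFY: rename a concept, refusing to create a duplicate name."""
        if new in self.children:
            raise ValueError(f"{new!r} already exists")
        self.children[new] = self.children.pop(old)
        for kids in self.children.values():
            if old in kids:
                kids.remove(old)
                kids.add(new)

    def find(self, concept):
        """FIND: report whether the concept is present in the graph."""
        return concept in self.children

    def downward(self, concept):
        """DOWNWARD: the concepts directly below the given one."""
        return sorted(self.children.get(concept, ()))

    def upward(self, concept):
        """UPWARD: the concepts directly above the given one."""
        return sorted(p for p, kids in self.children.items() if concept in kids)


crg = ConceptGraph()
crg.insert("conference organization")
crg.insert("collecting referee reports", parent="conference organization")
crg.insert("evaluating papers", parent="conference organization")
crg.modify("evaluating papers", "paper evaluation")
below = crg.downward("conference organization")
above = crg.upward("paper evaluation")
crg.delete("collecting referee reports")
print(below, above, crg.find("collecting referee reports"))
```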
6. Conclusion The research reported in this paper is one of few initial efforts to argue for and demonstrate the usefulness of integration of Data base and Artificial Intelligence. Expert
systems represent an important set of applications of AI to problems of commercial as well as scientific importance. There appear to be three main motivations for building an expert system, apart from research purposes:
(1) Replication of expertise. Providing many (electronic) copies of an expert’s knowledge, so that it can be consulted even when the expert is not personally available. Geographic distance and retirement are two important reasons for unavailability.
(2) Union of expertise. Providing in one place the union of what several different experts know about different specialities.
(3) Documentation. Providing a clear record of the best knowledge available for handling a specific problem.

In this study, we have presented a design-aid environment which is partitioned into two parts: modelling and prototyping. In the modelling part we provide the designer with concepts to define the organization, behavior, and properties of the objects in the application. This model allows the designer to communicate with the expert to find out whether the correct rules have been identified. Without a framework, the designer spends more time on syntactic considerations than on semantic ones. The main areas addressed here are:
- The design of a graphical structure for knowledge modelling;
- Methodology for constructing the model;
- Formalization of the model.

In the prototyping part we use this model as a basis for the design of the knowledge base; thus it is possible to automatically generate a prototype KB from the CKM. We have argued that an inference mechanism can significantly enhance the power and usability of a data base management system. It enables the compact storage of a large amount of information in the form of general assertions or rules, and it enables the combination of these assertions with explicitly stored facts to deduce other specific facts that would otherwise not be available.

Graphic representation
We have designed a graphical structure for knowledge modelling. It has characteristics, such as being object-oriented and making use of abstraction techniques, that are popular in Database management and Artificial Intelligence and apply equally well to knowledge modelling, because they are particularly well suited to the organization of large descriptions that need to be understood by humans. In knowledge modelling, the purpose of the graphs was to give a less formal specification of the application. In this direction, the more recent, so-called semantic network has been used to capture the static associations of the objects of the real world. Two weaknesses of the semantic network are its facilities for modelling behavior and for expressing integrity constraints. By using the Conceptual Behavior Graphs of CKM we overcome these deficiencies. We have dealt with the issue of acquiring knowledge from the expert, which is a major task in building knowledge-based systems. We claim that it is advantageous to use the graphical representation of CKM to solve problems such as misunderstanding between the expert and the designer.

Methodology
We believe that knowledge modelling is difficult in practice unless one first decides WHAT is to be modelled before deciding HOW it can be done. CRG thus serves as the first step, in which the relevant concepts are named and organized. From CRG, it is relatively straightforward to derive the behavior graph and the structure graph.
Formalization

Formalization is of paramount importance when a model is introduced. A formal definition is given for CKM in terms of a translation from graphs into first order logic (FOL). The formalization also highlights the uniformity of the language framework with respect to using PROLOG as the implementation language. Our approach is to consider the CKM as being equivalent to a theory expressed as a set of axioms in FOL. The interpretation of this theory is the set of facts.

An important problem is that of the consistency of the CKM specification. Due to the formal definition, it is known what it means to have a consistent model. Therefore, one can employ theorem provers to determine the consistency of the model. For the FOL foundation, this is not always decidable. However, it is suspected to be decidable if one ignores the axioms that introduce inconsistency into the model.

The expert can formulate assertions in terms of pre- and post-conditions using the PROLOG syntax. An assertion is a closed formula derived from a PROLOG clause by binding each of its free variables (represented by the arguments of the predicate).

We have assumed a single-user, single-processor system, which is not the case in a large application environment. An important question is how Concurrent PROLOG can be used in our approach to overcome this deficiency.

Modelling time is extremely important in expressing real world constraints. We need a better integration of time into the framework of our approach. Although much research has been done on the concept of time, there are still many open questions. A good discussion of this issue is provided in [48], which is particularly suited to extending CKM with time modelling.
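The role of pre- and post-condition assertions mentioned under Formalization can be illustrated with a small Python sketch. It is only a sketch under stated assumptions and does not reproduce the PROLOG formulation of this paper: the function `run_transaction`, the representation of facts as tuples, and the predicates `paper`, `accepted` and `rejected` are hypothetical and chosen to echo the conference example of Appendix B.

```python
def run_transaction(facts, pre, action, post):
    """Apply `action` to a copy of `facts` if `pre` holds; keep the result only if `post` holds."""
    if not pre(facts):
        raise ValueError("pre-condition violated")
    new_facts = action(set(facts))  # work on a copy, the input is left untouched
    if not post(new_facts):
        raise ValueError("post-condition violated")
    return new_facts


def registered(paper):
    """Pre-condition (hypothetical): the paper must be a known fact."""
    return lambda fs: ("paper", paper) in fs


def accept(paper):
    """Transaction body (hypothetical): record the paper as accepted."""
    return lambda fs: fs | {("accepted", paper)}


def consistent(fs):
    """Post-condition (hypothetical): no paper is both accepted and rejected."""
    accepted = {x for (pred, x) in fs if pred == "accepted"}
    rejected = {x for (pred, x) in fs if pred == "rejected"}
    return not (accepted & rejected)


facts = {("paper", "p1"), ("rejected", "p1"), ("paper", "p2")}

# Accepting p2 satisfies both assertions and yields an extended fact set.
updated = run_transaction(facts, registered("p2"), accept("p2"), consistent)
print(("accepted", "p2") in updated)  # True

# Accepting the already rejected p1 would make the fact set inconsistent.
try:
    run_transaction(facts, registered("p1"), accept("p1"), consistent)
except ValueError as err:
    print(err)  # post-condition violated
```

Checking the post-condition on a copy means that a rejected transaction leaves the stored facts unchanged, which is one simple way to keep the fact set a model of the constraints.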
Appendix A. Abbreviations

AI      Artificial Intelligence
CBG     Conceptual Behavior Graph
CKM     Conceptual Knowledge Model
CRG     Conceptual Requirement Graph
CSG     Conceptual Structure Graph
DBMS    Data Base Management System
DP      Deductive Processor
EX      Expert system
FOL     First Order Logic
FR      Formal Representation
GR      Graphical Representation
IS      Information System
KBS     Knowledge Base System
KBMS    Knowledge Base Management System
KE      Knowledge Engineer
UoD     Universe of Discourse
Appendix B. Problem definition

B.1. Background

An IFIP Working Conference is an international conference intended to bring together experts from all IFIP countries to discuss some technical topics of specific interest to one or more IFIP Working Groups. The usual procedure, and the one to be considered for the present paper, is that of an invited conference which is not open to everyone. For such a conference it is something of a problem to ensure that members of the involved IFIP Working Group(s) and Technical Committee(s) are invited even if they do not come. Furthermore, it is important to ensure that sufficient people attend the conference so that the financial break-even point is reached without exceeding the maximum dictated by the
facilities available. IFIP’s policy on Working Conferences suggests the appointment of a Program Committee to deal with the technical content of the conference and an Organizing Committee to handle financial matters, local arrangements, and invitations and/or publicity. These committees clearly need to work together closely, need common information, and must keep their recorded information consistent and up to date.

B.2. Information system to be designed
The information system to be designed should support the activities of both a Program Committee and an Organizing Committee involved in arranging an IFIP Working Conference. The involvement of two committees is seen as analogous to two organizational entities within a corporate structure using some common information. The following activities of the committees should be supported:

Program Committee:
1. Preparing a list of those to whom the call for papers is to be sent.
2. Registering the letters of intent received in response to the call.
3. Registering the contributed papers on receipt.
4. Distributing the papers among those undertaking the refereeing.
5. Collecting the referees’ reports and selecting the papers for inclusion in the program.
6. Grouping selected papers into sessions for presentation and selecting a chairman for each session.

Organizing Committee:
1. Preparing a list of people to be invited to the conference.
2. Issuing priority invitations to National Representatives, Working Group members and members of associated working groups.
3. Ensuring that all authors of each selected paper receive an invitation.
4. Ensuring that authors of rejected papers receive an invitation.
5. Avoiding sending duplicate invitations to any individual.
6. Registering acceptance of invitations.
7. Generating the final list of attendees.

B.3. Boundaries of system
It should be noted that the budgeting and financial aspects of the Organizing Committee’s work, the meeting plans of both committees, hotel accommodation for attendees, and the matter of preparing camera-ready copy for the proceedings have been omitted from this exercise, although a submission may include some or all of these extra aspects if the authors feel so motivated.
References

[1] A. Barr, P. Cohen and E.A. Feigenbaum (eds.), The Handbook of AI (Kaufmann, Los Altos, 1983).
[2] H. Biller and E.J. Neuhold, The semantics of data models, Inform. Systems 3 (1) (1978).
[3] M. Brodie, J. Mylopoulos and J. Schmidt (eds.), On Conceptual Modelling: Perspectives from AI, Databases and Programming Languages (Springer, Berlin, 1983).
[4] M.L. Brodie and E. Silva, Active and passive component modelling: ACM/PCM, in: T.W. Olle, H.G. Sol and A.A. Verrijn Stuart (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1982) 41-91.
[5] A.J. Bubenko, Comments on some comparison of IS Design methodologies, SYSLAB, University of Stockholm, 1983.
[6] C.L. Chang and C.T. Lee, Symbolic Logic and Mechanical Theorem Proving (Academic Press, New York, 1973).
[7] P.P. Chen, Entity-relationship model: Toward a unified view of data, ACM Trans. Database Systems 1 (1) (1976).
[8] P.P. Chen (ed.), Entity-Relationship Approach to Information Modeling and Analysis (North-Holland, Amsterdam, 1983).
[9] W.F. Clocksin and C.S. Mellish, Programming in PROLOG (Springer, Berlin, 1981).
[10] C.J. Date, An Introduction to Data Base Systems (Addison-Wesley, Reading, MA, 1977).
[11] C. Davis, S. Jajodia, P. Ng and R. Yeh (eds.), Entity-Relationship Approach to Software Engineering (North-Holland, Amsterdam, 1983).
[12] E.A. Feigenbaum, The art of artificial intelligence: I. Themes and case studies of knowledge engineering, 5th International Joint Conference on AI, 1977.
[13] H. Gallaire and J. Minker (eds.), Logic and Data Bases (Plenum Press, New York, 1978).
[14] J.J. van Griethuysen (ed.), Concept and Terminology for the Conceptual Schema and the Information Base (American National Standards Institute, New York, 1982).
[15] M.R. Gustafsson, T. Karlsson and J.A. Bubenko Jr., A declarative approach to conceptual information modeling, in: T.W. Olle, H.G. Sol and A.A. Verrijn Stuart (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1982) 93-142.
[16] P. Hammond, APES, A Prolog Expert System, Department of Computing, Imperial College, London.
[17] H. Herms, Introduction to Mathematical Logic (Springer, Berlin, 1973).
[18] W. Hodges, Elementary predicate logic, in: D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic (Reidel, Dordrecht, 1983).
[19] A. Horndasch, R. Studer and R. Yasdi, An approach to conceptual schema design of information systems, Proc. TFAIS 85, Theoretical and Formal Aspects of IS, April 1985, Barcelona (1985) 109-125.
[20] M. Jark, Design choices for different uses of rules in databases and knowledge bases, New York University, 1984.
[21] M. Jark and J. Koch, A survey of query optimization in centralized data base systems, NYU Working Paper Series CRlS #+I, GBA 82-73, 1982.
[22] R. Kowalski, Logic for Problem Solving (North-Holland, New York, 1979).
[23] S. Kunifuji and H. Yokota, Prolog and relational database for the fifth generation computer system, Proc. Workshop on Logical Bases for Databases, Toulouse (1982).
[24] P.T.M. Laagland, Modeling in information system development, Free University, Amsterdam, 1983.
[25] B. Lundberg, m an information modeling tool, SYSLAB, Report No. 3 S-41296, Göteborg, 1981.
[26] B. Lundberg, Information modelling and the axiomatic method, SYSLAB, University of Stockholm, 1984.
[27] M. Lundeberg, The ISAC approach to specification of information systems and its application to the organization of an IFIP working conference, in: T.W. Olle, H.G. Sol and A.A. Verrijn Stuart (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1982) 173-234.
[28] E.J. Neuhold, Development methodologies for event and message based application systems, University of Stuttgart, 1981.
[29] E.J. Neuhold, Vergleichende Analyse von Software Entwurfsmethoden, ZHS J. 5 (1981).
[30] E.J. Neuhold, Objects and abstract data types in information systems, IFIP TC2 Working Conference on Database Semantics (DS-1), January 1985, Hasselt.
[31] N. Nilson, Principles of Artificial Intelligence (Springer, Berlin, 1982).
[32] H.E. Nissen, Subject matter separability of information system design methods, in: T.W. Olle, H.G. Sol and C.J. Tully (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1983) 207-237.
[33] T.W. Olle, H.G. Sol and A.A. Verrijn Stuart (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1982).
[34] T.W. Olle, H.G. Sol and C.J. Tully (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1983).
[35] R. Reiter, Deductive question answering on relational data bases, in: H. Gallaire and J. Minker (eds.), Logic and Data Bases (Plenum Press, New York, 1978).
[36] G. Richter and R. Durchholtz, IML-inscribed high-level Petri nets, in: T.W. Olle, H.G. Sol and A.A. Verrijn Stuart (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1982) 335-368.
[37] J. Robinson, A machine-oriented logic based on the resolution principle, J. ACM 12 (1965).
[38] C. Rolland and C. Richard, The Remora methodology for information systems design and management, in: T.W. Olle, H.G. Sol and A.A. Verrijn Stuart (eds.), Information Systems Design Methodologies (North-Holland, Amsterdam, 1982) 369-426.
[39] D.T. Ross and K.E. Schoman, Structured analysis for requirements definition, IEEE Trans. Software Engrg. 3 (1) (1977) 16-34.
[40] U. Schiel, Ein semantisches Datenmodel für konzeptuelle Schema und ihre Abbildung auf interne relationale Schemata, University Stuttgart, 1984.
[41] H.J. Schneider (ed.), Formal Models and Practical Tools for Information Systems Design (North-Holland, Amsterdam, 1979).
[42] H.J. Schneider (ed.), Lexikon der Informatik und Datenverarbeitung (Oldenburg, München, 1983).
[43] H.J. Schneider, Software der fünften Generation, Die Loesung der Softwarekrise?, IBM Wissenschaftsmagazin (1984).
[44] H.J. Schneider, Entwicklungswerkzeuge der 5. Software-Generation, GI-Fachtagung, Tutzing, 1984.
[45] H.J. Schneider and A. Wasserman (eds.), Automated Tools for Information System Design (North-Holland, Amsterdam, 1980).
[46] W. Schoenfeld, Zum Einsatz von automatischen Beweissuchverfahren in Informationssystem, IBM, Heidelberg, 1984.
[47] J.M. Smith and D.L.P. Smith, Database abstraction: Aggregation and generalization, ACM Trans. Database Systems 2 (1977).
[48] R. Studer, Modeling office information system by using timed THM-nets, Bericht 8184, University of Stuttgart, 1984.
[49] A. Walker, Data base, expert systems, and Prolog, IBM Research Laboratory, San Jose, CA 95123, 1983.
[50] A.I. Wasserman and H.J. Schneider, Toward a unified view of data, Infotech State of the Art Report, Series 8, no 4, 1980 AI 15, 1980, s.19-48.
[51] R. Weyhrauch, Prolegomena to a theory of mechanical formal reasoning, Artificial Intelligence 13 (1980).
[52] M.L. Wilson, The Information about approach to design and implementation of computer based systems, FDS76-0093, IBM, Gaithersburg, MD, 1975.
[53] R. Yasdi, Design aid environment for database based expert systems at the conceptual level, DB-EX, Proc. 1st International Workshop on Expert Database Systems, October, 1984, Kiawah Island, SC.
[54] R. Yasdi and S. Akoto, Implementation of hyper-resolution in Prolog, DB-EX Report on Deduction Component, University of Stuttgart, 1985.
[55] R. Yasdi, Modelling database based expert systems at the conceptual level, Proc. CSC'85, ACM Computer Science Conference, March, 1985, New Orleans.