A conceptual modelling formalism for temporal database applications

A conceptual modelling formalism for temporal database applications

Information Sysrems Vol. 16, No. 4, pp. 401416, Printedin Great Britain. All rights reserved 1991 Copyright A CONCEPTUAL TEMPORAL MODELLING FORMALI...

2MB Sizes 37 Downloads 62 Views

Information Sysrems Vol. 16, No. 4, pp. 401416, Printedin Great Britain. All rights reserved

1991 Copyright

A CONCEPTUAL TEMPORAL

MODELLING FORMALISM DATABASE APPLICATIONS

C. THEODOULIDIS,‘~ ‘Department of Computation,

P. Louco~ou~os’

and

0

0306-4379/91 53.00 + 0.00 199 I PergamonPress plc

FOR

B. WANGLERS

UMIST, P.O. Box 88, Manchester M60 IQD, U.K.

%wedish Institute for Systems Development (SISU), Box 1250, 164 28 KISTA, Sweden (Received 16 August 1990; in revised form 21 February 1991)

Abstract-Arguably the most critical of all activities in the development of an information system is that of requirements modelling. The effectiveness of such a specification depends largely on the ability of the chosen conceptual model to represent the problem domain in such a way so as to permit natural and rigorous descriptions within a methodological framework. Recent years have witnessed an increased demand for information systems which cover a wide spectrum of application domains. This, inevitably, has had the effect of demanding conceptual models of enhanced functionality and expressive power than is currently possible in practice. This paper introduces the TEMPORA modelling paradigm for developing information system applications from a unified perspective which deals with definitional, intentional and constrain knowledge. The paper discusses in detail one of the components of the TEMPORA conceptual model, the entity-relationshiptime (ERT) model, which deals with structural aspects including time and complex objects modelling. Key words: Conceptual modelling, time semantics, temporal databases, database design

I.

INTRODUCTION

In recent years there has been an increasing demand for enhanced functionality of information systems that deal with applications beyond the traditional data processing type. It has also been argued that the development of the next generation of information systems will require the adoption of approaches which provide a closer alignment between business policy and systems operation [l, 21. A number of issues arising from these requirements are addressed within the TEMPORAS project [3] which provides the framework for the work reported in this paper. The need for enhanced systems functionality is addressed through the use of a conceptual modelling formalism which caters for the modelling of: business rules, time and complex objects. This formalism is supported at the database level by an extension of the relational model with temporal semantics and an execution mechanism that provides active database functionality. The TEMPORA paradigm advocates that development of an information system should be viewed as the task of developing or augmenting the policy knowledge base of an organization, which is used throughout the software development process. Within TEMPORA, this knowledge base is concerned with the definition of the principal facts and operations within the organization together with the rules which guide these operations and ensure the integrity of these facts. An abstract view of the TEMPORA conceptual components and their interrelationships is shown in the diagram of Fig. 1. At the specification level three models are defined, the ERT, Process and Rule models, while at the application level a relational schema and a transaction programming language deal with the execution aspects of the application. The two nodes 1 and 2 indicated in Fig. 1, represent mappings from a TEMPORA specification to a relational schema plus a number of application programs expressed in a transaction programming language. The structural properties of an application are expressed in terms of the ERT model. The process component deals with the definition of operations. A process is the smallest independent unit of tTo whom correspondence should be addressed. $The TEMPORA project has been partly funded by the Commission of the European Communities under the ESPRIT R&D programme. The TEMPORA project is a collaborative project between: BIM, Belgium;Hite-c, Gm; Imperial College, U.K.; LPA, U.K.; SINTEF, Norway; SISU, Sweden; University of Liege, Belgiumand UMIST, U.K. SISU is sponsored by the National Swedish Board for Technical Development (STU), ERICSSON and SwedishTelecomm.

IS1614-D

401

402

C. THEODOULIDIS ef al.

ecification Level

Fig. 1. Overview of the TEMPORA architecture.

business activity of meaning, initiated by a specific trigger and which, when complete, leaves the business in a consistent state. The analysis of processes in terms of the ERT model results in a set of primitive actions suffered by an entity such as “salary increase”. Control of the behaviour of a system is modelled in terms of rules. Two general classes of rules are recognized: static rules which are concerned with the integrity of the database; and dynamic rules which are concerned with the control of transactions. More specifically, static rules are expressions that must hold in every valid state of the database. Static rules represent either integrity constraints on the ERT model or derivations on it. Dynamic ruies are expressions that define valid state transitions in the database. In effect, dynamic rules specify the interaction of state components and/or event occurrences and they represent either dynamic integrity rules or control of operations. This paper is concerned with a detailed description of the ERT mode1 and of the mapping node 1. Details of earlier work in the process and rule models are reported in Refs [I, 2,4-6]. The ERT model uses as its basic structure the entity-relationship approach in order to preserve the welI-known advantages of this approach namely, graphical display, increased readability and wide acceptance by practitioners. The basic mechanism differs from the original entity-relationship model [7] in that it regards any association between objects in the unified form of a relationship. Thus, the conceptually unnecessary distinction between attributeships and relationships [S, 91 is avoided. On this basic mechanism the ERT mode1 is extended in its semantics and graphical notation in two directions: the modelling of time; and the modelling of complex objects. The need for modelling time explicitly is that, for many applications when an info~ation item becomes outdated, it need not be forgotten. The lack of temporal support raises serious problems in many cases. For example, conventional DBMS cannot support historical queries about past status, let alone trend analysis which is essential for applications such as decision support systems (DSS). The need to handle time more comprehensively surfaced in the early 19’70sin the area of medical info~ation systems where a patient’s history is particularly important [lo]. Since these early days there has been a large amount of research in the nature of time in computer-based information systems and the handling of the temporal aspects of data [11-l 61. Research interest in the time modelling area has increased dramatically over the past decade as shown by published bibliographies [ 171 and comparative studies [ 181. Without temporal support, many applications

A conceptual modelling formalism for temporal database applications

403

have been forced to manage temporal information in an ad hoc manner. Recent research work attempts to overcome this problem from a number of different perspectives. The ERAE model [19,20] extends the semantics of the entity-relationship model with a distinguished type Event as one of its basic constructs. The ERAE approach considers only the requirements specification area without providing any integration to subsequent activities towards the implementation of a database system. Furthermore, the language is confined to simple objects and avoids the issue of modelling objects with component properties. The CML language uses an object-centred viewpoint and includes time as a primitive notion [21,22]. This language has rich time semantics and the modelling of complex objects is implicitly defined in the object-centred framework adopted by the language. However, because it deals with two time dimensions and also caters for the modelling of relative temporal info~ation, its efficient use in large database applications is debatable. Furthermore, it lacks a notion which is conducive to concept elicitation from and validation by end users. The modelling of complex objects [23] arises from the need to deal with applications which require the management of objects of arbitrary complexity. For example, in CAD~CAM or CASE applications one needs to be able to deal with objects that consist of a number of components and to reason for them whilst being able to deal with their components. Traditional data models fail to deal with this requirement; the structural constraints, for example, of the relational model [24] force a developer to decompose the representation of a complex object into a set of relations. Extensions to the relational model include new types of attributes [25] and the relaxation of the first normal form constraint [26]. In both cases modelling of complex objects is carried out from a machine oriented rather than a user-oriented perspective. In semantic modelling Dayal [271 proposes an extension of DAPLEX [ZS] as the means for dealing with complex objects. Whilst this approach is based on work which has the formally attractive feature of representing everything as an object, from a conceptual modelling perspective it is important to permit views which more naturally reflect a user’s conception of the application domain. The ERT model is based on the premise that at the requirements specification level, the conceptual model should exhibit a number of properties such as implementation independence, abstraction, formality, constructability, ease of analysis, traceability and ability to be mapped onto a relational schema. The ERT model is not advocated by the authors as the only approach to modelling structural aspects of applications-conceptual modelling is a field which is greatly influenced by ontological and epistemological assumptions and a universal agreement on these assumptions has not (and is possibly unlikely to be) agreed upon. Section 2 of the paper describes the basic fo~alism of the ERT model in terms of the basic concepts, the external graphical notation, the semantics of complex objects and the semantics of time. Section 3 discusses the issues associated with mapping an ERT schema onto a relational schema and describes an algorithm for such an undertaking. Section 4 concludes the paper by presenting the current status of the work and discussing future research directions.

2. THE

ERT

MODEL

2.1. Basic concepts and externals The orientation of the ERT model is the entity-relationship formalism which makes a clear distinction between objects and relationships. On this basis, the ERT model offers a number of features which are considered to be necessary for the modelling of complex database applications. Specifically, it accommodates the explicit modelling of time, taxonomic hierarchies and complex objects. The different aspects of data abstraction that are dealt with in the ERT model are: classification, generalization, aggregation and grouping. The most primitive concept in ERT is that of a class which is defined as a collection of individual objects that have common properties i.e. that are of the same type. In an ERT schema only classes of objects are specified. In addition, every relationship is viewed as a named set of two (entity or value, role) pairs where each role expresses the way that a specific entity or value is involved in a relationship, These two named roles are called relationship involvements and for completeness reasons, they are always required in an ERT schema. By using relationship involvements we have

404

C.

THEODOULIDISef al.

the possibility to express each relationship with two sentences which are syntactically different but semantically equivalent. Time is introduced in ERT as a distinguished class called time period class. More specifically, each time-varying simple entity class or complex entity class and each time varying relationship class is timestamped with a time period class. That is, a time period is assigned to every time varying piece of information that exists in an ERT schema. The term time varying refers to pieces of information that the modeller wants to keep track of their evolution, i.e. to keep their history and consequently, to be able to reason about them. For example, for each simple or complex entity class, a time period might be associated which represents the period of time during which an entity is modelled. This is referred to as the existence period of an entity. The same argument applies also to relationships, i.e. each time varying relationship might be associated with a time period which represents the period during which the relationship is valid. This is referred to as the validity period of a relationship. Besides the objects for which history is kept, another type of object, called event is also supported. These are objects that prevail for only one time unit and thus, the notion of history does not apply to them. Alternatively, one can say that these objects become history as soon as they occur. Events are denoted by defining the duration of their timestamp to be one time unit. As a consequence of the adopted timestamping semantics, only the event time is modelled in ERT; i.e. the time that a particular piece of information models reality. At a first glance this might seem to be restrictive in the sense that the captured information is not semantically as rich. However, this assumption is considered to be necessary in order to keep the proposed approach manageable and to permit computational attractive algorithms for reasoning about time. For each timestamp its granularity should be defined. The granularities however, should be carefully chosen as they affect the relative ordering of events stored in the database. For example, if DATE is the granularity of the SHIPMENT object, then two shipments received on the same date will be treated to have occurred at the same time even if their exact time of arrival differs. Another distinguished class that is introduced in ERT is that of a complex object. The distinction between simple and complex objects is that simple objects are irreducible in the sense they cannot be decomposed into other objects and thus, they are capable of independent existence, whereas a complex object is composed of two or more objects and thus, its existence might depend on the existence of its component objects. The relationship between a complex object and its component objects is modelled through the use of the IS_PART_OF relationship. The ERT model accommodates explicitly generalization/specialization hierarchies. This is done through a distinguished ISA relationship which has the usual set theoretic semantics. Furthermore, for each relationship involvement, a user supplied constraint rule must be defined which restricts the number of times an entity or value can participate in this involvement. This constraint is called curdinality constraint and it is applied to the instances of this relationship involvement by restricting its population. Each of the simple entity classes and user defined relationship classes in an ERT schema can be specified as derived. This implies that its instances are not stored by default but they can be obtained dynamically, i.e. when needed, using the derivation rules. For each such derivable component, there is exactly one corresponding derivation rule which defines the members of this entity class or the instances of this relationship class at any time. In addition, if the derivable component is not timestamped then the corresponding derivation rule instantiates this component at all times whereas if this component is time varying then the corresponding derivation rule obtains instances of this class together with its existence period or validity period. An example ERT schema is shown in Fig. 2. As shown in Fig. 2, entity classes, e.g. EMPLOYEE, are represented using rectangles with the addition of a “time box” when they are time varying. Derived entity classes, e.g. TOP-SELLINGPRODUCT, are represented as dashed rectangles. Value classes, e.g. Name, are also represented with rectangles but with a small black triangle at the bottom right corner to distinguish them from entity classes. Complex entity classes, e.g. CAR, and complex value classes, e.g. ADDRESS, are represented using double rectangles. Relationship classes are represented using a small filled rectangle (e.g. the relationship class between MANAGER and CAR), whereas derived relationship classes are represented using a non-filled dashed rectangle (e.g. the relationship

A conceptual modelling formalism for temporal database applications

01

405

L,,

I

L,

1

has f

/

13

employs

T

63)

I

ha (1,‘)

of I

selling_ I, prcduc1: I . : I I

I

: r-t-; : ’,T: L-a., : : I best_ :; sales_ ,

(1,N)

has

1

1

1

Fig. 2. An example ERT schema.

between PRODUCT and SALESPERSON). In addition, relationship involvements and cardinality constraints are specified for each relationship class. The interpretation of these is for example “a MANAGER has one CAR and a CAR belongs to only one MANAGER” (Fig. 2 shows the minimum and maximum relationship involvements). The notation of the ISA hierarchies, e.g. MANAGER ISA EMPLOYEE, is also given in Fig. 2. A distinction is made between different variations of ISA hierarchies. These are based on two constraints that are usually included in any ISA semantics, namely the partial total ISA constraints and the disjoint overlapping ISA constraint. It is assumed of course, that these constraints are applicable to hierarchies under the same specialization criterion. These constraints are defined as follows: 0 The partial ISA constraint states that there are members in the parent or generalized entity class that do not belong in any of its entity subclasses. On the other hand, the total ISA constraint states that there are no members in the parent or generalized entity class that do not belong in any of its entity subclasses. 0 The overlapping ISA constraint states that the subclass of a given parent class under the same specialization criterion are allowed to have common entities, whereas the disjoint ISA constraint states that the subclasses of a given parent class under the same specialization criterion are not allowed to have common entities. The first of these constraints refers to the relationship between the parent class of generalized class and the child class(es) or specialized class(es). The second constraint refers to the relationship between child classes. Based on the above constraints the four kinds of ISA relationships are supported, namely: partial disjoint ISA; total disjoint ISA; partial overlapping ISA; and total overlapping ISA.

C. T~WULIDIS

406

ef al.

In summary, the components of ERT are defined as follows: Entity is anything, concrete or abstract, uniquely identifiable and being of interest during a certain time period. Entity Class is the collection of all the entities to which a specific definition and common properties apply at a specific time period. Relationship is any permanent or temporary association between two entities or between any entity and a value. Relationship Class is the collection of all the relationships to which a specific definition applies at a specific time period. Value is a lexical object perceived individually, which is only of interest when it is associated with an entity. That is, values cannot exist on their own. Value Class is the proposition establishing a domain of values. Time Period is a pair of time points expressed at the same abstraction level. Time Period Class is a collection of time periods. Complex Object is a complex value or a complex entity. A complex entity is an abstraction (a~regation or grouping) of entities, relationships and values (complex or simple). A complex value is an abstraction (aggregation or grouping) of values (complex or simple). Complex Object Class is a collection of complex objects. That is, it can be a complex entity or a complex value class.

In addition, the following axioms apply to the concept of a relationship class: I. An entity can only participate in a relationship if this entity is already in the population of the entity class specified in the relationship. Furthermore, the validity period of the relationship should be a subperiod of the intersection of the existence periods of the two involved entities. 2. Each entity in a subclass population has also a reference in the population of its superclasses. In addition, the existence period of the specialized entity should be a superperiod of the existence period of the generalized entity. 3. If an entity belongs to a population of an entity class, it cannot also belong to the population of a value class at any time and vice versa. Furthermore, any two entity classes which are not themselves subclasses of a third entity class and all have no common subclasses, must be disjoint at any time point. Note that this definition does not prevent entities from moving between entity classes during their lifetime.

The graphical notation for the ERT is summarized in Fig. 3.

In the approach described in this paper, time is introduced as a distinguished entity class. For example, each entity class can be timestamped in order to indicate that its history is of importance to the particular application. The same argument applies also to user defined relationship classes and IS_PART_OF relationships. The “time periods” approach was chosen as the most primitive temporal notion because it satisfies the following requirements [29,30]: l

l

Period representation allows for imprecision and uncertainty of information. For example, modelling that the activity of eating precedes the activity of drinking coffee can be easily represented with the temporal relation before between the two validity periods [31]. If one tries, however, to model this requirement by using the line of dates then a number of problems will arise since the exact start and ending times of the two activities are not known, Period representation allows one to vary the grain of reasoning. For example, one can at the same time reason about turtle movements in days and main memory access times in nanoseconds.

The formal framework employed as the temporal reasoning mechanism is that of Interval Calculus proposed by Allen [31] and which was later refined by Loki [21] but with the addition of a formal calendar system in order to provide for the modelling and reasoning about the usual calendar periods. In Fig. 4, the semantics of the adopted time model is defined using ERT notation. The modelling of information using time periods takes place as follows. First, each time varying object (entity or relationship) of ERT is assigned an instance of the built-in class s~~~oiper~o~. Instances of this class are system-generated unique identifiers of time periods, e.g. SPl, SP2, etc. Members

A conceptual modelling formalism for temporal database applications

407

Explanation

ERT Graphical Notation

Entity classA and derivedentity classA

Complex entityclass C and complex valueclassD. SimplevalueclassE and derivedvalue class F.Mayhave relationshipsto nodesof tyPeA BCandD 3 1

b_

ILL

a F-,-b

-ml- “_ 4 &- ’

Relationship(binary)thatmay connectnodes of type A,B,Cor D. a and b arerelationshipnames(b is inverse of a). ml and m2 indicatemappingin the format(x:y), wherex,y arc non-negativeintegers,or N. Non-tilled boxindicatesderivedrelationship. Time stampedbinaryrc1ationships.T is a symbolictime period

b

&

ISA relationshipsFilled box -> total,non-filled-> partial. Severalarrowspointingto roundbox indicatedisjoint subsets. Fig. 3. ERT externals.

of this class can relate to each other by one of the thirteen temporal relations between periods [31]. Instances of the symbol period class may be displayed in an ERT schema. If they are not included then a simple T in the time box indicates that the corresponding object is timevarying. The two subclasses of the time period class are disjoint, as indicated in Fig. 4. This is because symbol periods are used to model relative time information while calendar periods model absolute time information. Thus, both views are accommodated. In Fig. 4, the symbol t represents a temporal relationship and the symbol t 1 its inverse. In addition, time periods start and end on a hfc and also have a duration expressed in ticks. A tick is defined as smallest unit of time that is permitted to be referenced and it is usually the time unit second. The calendar period class has as instances all the conventional Gregorian calendar periods e.g., 10/3/1989, 21/6/1963, etc. Members of this class are also related to each other and to members of the symbol period class with one or more of the time period comparison predicates. The temporal relationships between calendar

NoOflicks

TickNo

Fig. 4. The time metamodel

c.

408

-hEODOULIDIS

et ai.

periods follow a formal calendar system [32] which is based on the work reported by Clifford and Rao [33] with the addition of the calendar unit “week”. This was considered to be necessary since there is often reference to this unit depending on the particular application domain. Any desirable calendar time unit can be defined as a combination of already defined calendar units. Thus, expressions like END-OF-MONTH and NEXT-FORTNIGHT are easily defined. The usual operators are provided including set operators and comparators. These are distinguished to those that are applied to elements of the same domain and those that are applied to elements of different domains [33]. Additional operators like the time period comparison predicates are also provided together with functions that transform elements of one domain onto another. Note however, that the reasoning always takes place at the lower level of the ones involved. Other notions of time such as duration and periodic time are also represented directly in the proposed formalism in addition to the above specified ones. As a consequence, the expressive power of the proposed formalism is increased and so does its usability. These notions of time are expressed in the conceptual rule language as constraints upon the structural components and also as constraints on the behaviour of procedures. The definition of the duration class is shown in Fig. 5. Members of this class are simple durations expressed in any abstraction level. Each duration consists of an amount of calendar time units expressed using real numbers and it is uniquely identified by the combination of its amount and abstraction level. For example, the duration “1.5 year” is a valid duration according to this definition. The periodic time class is defined in Fig. 6. As seen in Fig. 6, a periodic time has a base which is a calendar period, a duration and also it can be restricted by two calendar periods which restrict the set of values for this periodic time. For example, the expression “first week of each month during next year” is a valid definition of a periodic time according to the above definition. In this case the calendar period corresponds to “l-7 days”, the duration corresponds to “1 month” and the restricting calendar period is the next year corresponding to [l/1/1991, 31/12/1991]. A periodic time is uniquely identified by the combination of its base and its duration. 2.3. Complex object semantics Complex objects can be viewed from at least two different perspectives [34]. The first one is the representational perspective and focuses on how entities in the real world should be represented in the conceptual schema. This entails that objects may consist of several other objects arranged in some structure. Events in the real world are then mapped to operations on the corresponding objects. In contrast, if complex objects are not allowed, e.g. in the relational model, then information about the object is distributed and operations on the object are transformed to a series of associated operations. The second perspective is the methodological perspective which means that the complex object concept is regarded as a means of stepwise refinement for the schema and for hiding away details of the description. This in turn, implies that complex objects are merely treated as abbreviations that may be expanded when needed. The basic motivation for the inclusion of the complex entity/value class in the externals formalism, is to abstract away detail, which in a particular situation is not of interest. In addition, no distinction is made between aggregation and grouping, but rather a general composition mechanism is considered which also involves relationships/attributeships. Graphically, composition is shown by surrounding the components with a rectangle representing the composite object class.

Duration

Fig. 5. Metamodel of the duration class.

I

Fig. 6. Metamodel of the periodic time class.

A conceptual

modelling

formalism

for temporal

database

applications

409

The notation of a complex object in ERT is shown in Fig. 7. The complex entity class CAR and the complex value class ADDRESS of Fig. 2 may be viewed at a more detailed level as shown in Figs 7 and 8. The components of a complex object comprise one or more hierarchically arranged substructures. Each directly subordinate component entity class IS_PART_OF-related to the complex entity class border so that the relationship between the composite object and its components will be completely defined. Whether the HAS-COMPONENT involvement is one of aggregation or grouping, it can be shown by means of the normal cardinality constraints. That is, if its cardinality is (0, 1) or (1, l), the component is aggregate whereas if its cardinality is (0, N) or (1, N), the component is a set. Most conceptual modelling fo~alisms which include complex objects [35-371, model only p~y~jc~~p~rt ~jer~rc~je~, i.e. hierarchies in which an object cannot be part of more than one object at the same time. In the ERT model, this notion is extended in order to be able to also model logical part hierarchies where the same component can be part of more than one complex object. To achieve this, four different kinds of IS-PART-OF relationships are defined according to two constraints, namely the dependency and exclusiveness constraints. The dependency constraint states that when a complex object ceases to exist, all its components also cease to exist (dependent composite reference) and the exclusiveness constraint states that a component object can be part of at most one complex object (exclusive composite reference). That is, the following kinds of IS-PART-OF variations [38] are accommodated: l l l l

Dependent exclusive composite reference. Independent exclusive composite reference. Dependent shared composite reference. Independent shared composite reference.

Note that no specific notation is introdu~d for these constraints. Their interpretation comes from the cardinality constraints of the IS-PART-OF relationship. That is, assume that the cardinality of the IS_PART_OF relationship is (a, /?). Then, u = 0 implies non-dependency, tl # 0 implies dependency, B = 1 implies exclusivity while /I # 1 implies shareness. The following rules concerning complex objects should be observed: 1. Complex values may only have other values as their components. In addition, the corresponding IS-PART-OF relationship will always have dependency semantics unless it takes part in another relationship. 2. Complex entities may have both entities and values as their components. Every component entity must be IS_PART_OF-related to the complex entity. 3. Components, whether entities or values, may in turn be complex, thereby yielding a composition/decomposition hierarchy. Timestamping in a time-varying IS_PART_OF relationship is translated to the following constraints. The dependency constraint in a time varying IS_PART_OF relationship means that: The existence periods of the complex object and the component object should finish at the same time with the validity period of the IS-PART-OF relationship.

ADDRESS1 I IsPanOf 1-N

Fig. 7. The complex

value class ADDRESS

DOOR

MOTOR

STREET NAME

in more detail.

Fig, 8. The complex

entity class CAR

in more detail.

C. THEODOULIDIS et ai.

410

Also, the exclusiveness constraint is translated to: If an object A is part of the complex objects B and C, then the period during which A is part of B should have an empty inters~tion with the period during which A is part of C. 3. MAPPING

AN

ERT

SCHEMA

TO

A

RELATIONAL

SCHEMA

3. I. Overview The mapping of ERT to the relational model was examined in terms of the two possible approaches that can be taken as far as the resulting relations are concerned, i.e. whether to map onto normalized relations of NF2 relations. The approach adopted is that of normalized relations because of the processing overhead and the complexity of the query language that is inherent in the NF* approach. The mapping ~gorithm takes into account the cardinality constraints, the complex objects, the ISA links and the IS-PART-OF links. The rest of the integrity rules that can be expressed in the rule model, do not affect the mapping process for node 1 (Fig. 1). This is because most of these rules are application dependent and thus cannot be applied to any application independent mapping algorithm. A number of assumptions were made in order to satisfy the requirements of the mapping process. The first one concerns the identification of entities in the resulting relational schema. For this, the use of system-generated, globally unique object identifiers, the so-called surrogates, were adopted. More specifically, to each simple or complex entity, complex value and. symbol period a system generated surrogate, e.g. e$, v%and sp$ is assigned. In addition, a set of attributes inside a complex object is assigned a system generated composite reference surrogate, e.g. c # . As far as the reincarnation of entities is considered, two possible assumptions exist. The first one is that no reincarnation of entities is permitted, i.e. any entity which is no longer current can return only by using different surrogates. For example, an employee that is fired and later on is rehired will have associated two different surrogates. This implies that although the information is essentially about the same employee, the system will have recorded this info~ation twice. Alternatively, it can be assumed that reincarnation is permitted by using the same surrogate. This second assumption was chosen since it is intuitively more appealing but also, because it preserves the strong notion of object identity [39]. Another ass~ption refers to the interpretation of the timestamps. That is, according to the semantics of time adopted, the interpretation of timestamps can be explained as follows. The validity period of a time-varying relationship can be explained as in Fig. 9(a). According to this interpretation, the following generated relations depending on the value of the corresponding cardinality constraints exist: Case 1: (ml,nl)=O,N

or 1,N and (m2,n2)=O,N

or 1,N

generated relation @l(A$, --- B$, SPx%)

I

A

a1 ml, nl

I

I

I

I

I

R2 m&r12

B

A 10

exists

fb)

(a) Fig. 9. Interpretation

of time stamps.

A conceptual modelling formalism for temporal database applications

Case 2:

(ml,nl)=O.

1 or 1, 1 and (m2,n2)=O,N

411

or 1,N

generated relation &l(A$, B$, -SPx$) Case 3:

(ml,nl)=O,N

or 1,N and (m2,n2)=0,

1 or 1, 1

generated relation .@2(A%,-B$, SPx$) where the underlined fields represent the primary key of the relation. The existence period of an entity can be interpreted as shown in Fig. 9(b). In this case the resulting relation will be &(A%, SPx%) Entity classes and complex value classes should be named distinctly, whereas simple value classes and relationship classes may not. In addition, the following assumptions are also true: 1. The existence and validity periods should always he mapped to the calendar time axis, i.e. they should he specified explicitly. That is, if the existence period of a timestamped entity is not specified explicitly as an absolute value then we take the current time as the starting point of the existence period. Similarly, if the validity period of a timestamped relationship is not specified explicitly as an absolute value then take as its starting point the most recent starting point of the existence periods of the two involved entities. 2. eon-timestamped entities and relationships are assumed always existing, i.e. from system startup time until now. These last two assumptions are considered in order to avoid efficiency problems introduced by the use of relative time information. As has already been discussed elsewhere [40,41], the problem of

inferring temporal problem.

relations between two relatively specified time periods is an NP-complete

3.2. The mapping algorithm Several algorithms to construct a relational schema from a conceptual schema have already been proposed in the literature (e.g. for the binary relationship model [42,43] and for the entity-relationship model [44,45]). These algorithms mostly have common underlying principles which are basically simple and straightfo~ard. Our algorithm, as exemplified in the Appendix, keeps most of their simplicity and straightforwardness but it also incorporates the mapping of timestamping and complex objects. We presume the ERT schema to be correct and complete according to our semantics, Then it can be shown that we can guarantee at least third normal form (3NF) for the relational schema generated from our algorithm. In the sequel, the different steps of the mapping algorithm are described. Also, in the Appendix, a formal definition of the rules for the mapping node 1 is presented. Step 1 For each simple or complex entity class create a relation with its surrogate as primary

Step 2

Step 3

key and, for every functional dependent and time invarying role of this entity class, append to this relation the corresponding entity class surrogate or value class name (see r&es RI and R2 in the Appendix). For each entity class which is subclass of another entity class, replace its surrogate in its corresponding relation with the surrogate of its superclass (see rule R3 in the Appendix). For each complex value class do (see rules R4 and R5 in the Appendix): 3.1. Create a relation with its surrogate as primary key and append to it every functional dependent component simple value class and surrogates for every functional dependent complex value class. 3.2. For each functional independent component which is simple value class append to this relation distinct composite reference surrogates and create a separate relation for each of these classes with their composite reference surrogate as their key. 3.3. For each functional independent component which is complex value class append tothisrelationdistinctcompositereferencesurrogatesandcreateaseparate relation for each of these classes with their composite reference surrogate as their key.

C. THEOWULIDIS et al.

412 Step 4

For each simple or complex entity class and for every functional dependent and time varying role of this entity (not including the HAS-COMPONENT role), group in a relation the corresponding entity and symbol period class surrogates (see rule R6 in the Appendix). Step 5 For each relationship (except the IS-PART-OF relations) with no identifier constraint create a relation with fields all the corresponding surrogates and define as key their combination. Also, if the relationship is timevarying then include the symbol period class surrogate (see rule R7 in the Appendix). Step 6 For each complexity entity class do (see rule R8 in the Appendix): 6.1. Create a relation containing the entity surrogate as the primary key, every functional-dependent and time-invarying IS-PART-OF component (i.e. if the component is simple value class then append itself and otherwise append its surrogate) and finally, composite reference surrogates for the rest of its components. 6.2. For each composite reference surrogate in the previously defined relation, create a separate relation and group in it the surrogates of the complex entity, the component and the symbol period. If, in addition, the composite reference is functional independent then define as key the combination of all three fields. Step 7

Add additional constraints according to the constraints of the ERT schema and the constraints defined in the rule model. 7.1. For each complex value class relation created in Step 3.3, define as an alternative key the combination of all the fields except its surrogate (see rule R5 in the Appendix). 7.2. For each time-invarying relationship between an entity and a value class (except IS-PART-OF relationships) with cardinality (1: 1, 1: 1) define as an alternative key of the entity relation the combination of the entity surrogate, the value and the time period surrogate of the entity (if it exists) (see rule R2 in the Appendix). 7.3. For each relation created in Step 4, if the corresponding relationship has cardinality (1: 1, 1: 1) then define as an alternative key of the relation the combination of the other entity or value surrogate and the time period surrogate (see rule R6 in the Appendix).

In addition, for the mapping of the symbol periods we assume a simple database relation of the form symbolperiod(sp$,

starts, finishes, HAS-DURATION).

In the above relation the sp$ surrogate is the key of the relation and from the rest three fields only two are necessary to be defined because the third can always be deduced. Furthermore, the always constant used in our algorithm is a special time period which is used to represent the period of time from the system startup until now. Application of the above mapping steps to the ERT schema of Fig. 10, results in the set of database relations shown in Fig. 11. The advantages of this approach are as follows. The resulting relational schema has flat normalized relations which can be used rather directly in the computations. In addition, the algorithm is simple and straightforward, takes into account the cardinality constraints and, complex objects are treated like any other object without providing additional operators. The disadvantages of this approach are that we need more frequently join operations in order to obtain combined information due to the flatness of the resulting database schema. This may result in an unacceptable I/O consumption. Consequence, it needs to be investigated whether an application specific relational schema is optimally defined. A number of different mapping options, including the mapping of IS-PART-OF links and the possibility to combine tables in order to improve query efficiency, are under investigation, 4. CONCLUSIONS

AND

FUTURE

WORK

The main aim of any conceptual modelling approach is to capture knowledge about the universe of discourse and represent it in such a way so as to enable a system developer to reason about this knowledge, communicate this understanding to end users for validation and specify the allowable structures of and transitions on the information base. This paper discussed the ERT model, developed in the context of the TEMPORA conceptual framework, and argued that this model

A conceptual mcxieiling formalism for temporal database applications

Fig. IO. Example ERT schema

provides the necessary and desirable properties for the development of database systems which, as well as dealing with traditional applications, is required to deal with applications which need the explicit modelling of time and complex objects. The ERT model is complemented in its functionality by the conceptual rule language (which defines all integrity constraints on the components of the ERT as well as the permitted transitions on these objects) and by the process language (which specifies the transactions and the process logic of each transaction). These two components are outside the scope of this paper but it should be noted that they are necessary employee@&,

Name, ~QEJ

manager&&$,

sEz$,

proj~t~,

Ak : (Name, spl S)

Ak : (ProjNo, sp8S)

ProjNo, m

department(&@$.DcptNo,

Ak : (DcptNo, ~~3s;)

a)

Ak : (RegNo, sp9S)

CM&I& RegNo.M) motor(motor$, door&&,

Power, aiwavsa

Color, always$)

addrcss(addr&$,

StrcctNamc,

~rnpioy~~Ad~~~~,

CityName) addressf , @_I$)

EmployceWorlcsForDcparlmcnt~,

M~~n~~cr~~~,

car%

DepartmentHasNam~,

CarDoorSct@oor#.

SDI

OS)

DcptName, &TJ

ProjcctOccupiesEmploy~@& CarComponents&&,

dept’& &%)

OfficeNo, so@&)

ManagcrHasOflice~,

Ak : (OfficeNo, sp6S) Ak : (car& splO$) Ak : (DcptName, sp6)

QLQ& m

motor!% door+0 doors, alway%).

Fig. 11. The relational schema for the example ERT of Fig. 10.

C. THEODOULIDIS et al.

414

components which, together with the ERT model, constitute the set of conceptual models and provide the basis for developing guidelines for information systems implementation using the TEMPORA paradigm. In the context of TEMPORA, the process of information systems development is viewed as a sequence of model-building activities which require appropriate mechanisms for “knowledge elicitation, representation and validation” about the modelled application domain and their mapping onto corresponding design and implementation structures. Current work relating to the issues discussed in this paper is concerned with the development of CASE tools to support the design process, developing mapping tools between the conceptual and executable levels and testing the feasibility of the paradigm on large-scale industrial applications. REFERENCES [l] F. Van Assche, P. J. Layzell, P. Loucopoulos and G. Speltincx. Information systems development: a rule-based approach. J. Knowledge Based Systems l(3), 227-234 (1988). [2] P. Loucopoulos. The RUBRIC project-integrating E-R, object and rule-based paradigms. Workshop Session on Design Paradigms; Eur. Conf on Object Oriented Programming (ECOOP), Nottingham, U.K. (July 1989). [3] P. Loucopoulos, P. McBtien, U. Persson, F. Schumaker and P. Vasey. TEMPORA-Integrating database technology rule based systems and temporal reasoning for effective software. In Proc. ESPRIT Conf, Brussels, Belgium (Nov. 1990). [4] C. Theodoulidis, B. Wangler and P. Loucopoulos. Requirements specification in TEMPORA. In Proc. 2hd Nordic Conf on Advanced Inf Systems Engineering (CAiSE’90) Kista, Sweden (1990). [5] P. Loucopoulos, B. Wangler, P. McBrien, F. Schumacker, B. Theodoulidis and V. Kopanas. Integrating database technology, rule based systems and temporal reasoning for effective information systems: the TEMPORA paradigm. Inf Systems l(l), (1991). [6] P. McBrien, M. Niezette, D. Pantazis, A.-H. Seltveit, U. Sundin, G. Tziallas and C. Theodoulidis. A rule language to capture and model business policy specifications. In Proc. 3rd Nordic Conf on Advanced If. Systems Engineering (CAiSE’9f) Trondheim, Norway (1991). [7] P. P.-C. Chen. The entity-relationship model-toward a unified view of data. ACM-TODS l(l), 9-36 (1976). [8] W. Kent. Limitations of record-based information models. TODS (1979). [9] G. M. Niissen. D. J. Duke and S. M. Twine. The entitv-relationshin data model considered harmful. In Proc. 6th Svmo. ~, I on Empir>cal Foundations of Information and Softwaie Sciences Atlanta, GA (Oct. 1988). G. Wiederhold, J. F. Fries and S. Weyl. Structural organization of clinical database. In Proc. the NCC, AFIPS Press, Montvale, NJ (1975). G. Ariav and J. Clifford. A system architecture for temporally oriented data management. In Proc. 5th Int. Conf: on If. Systems. Tucson, AZ (Nov. 1984). I. Ahn and R. Snodgrass. Partitioned storage for temporal databases. Inf Systems 13(4), 369-391 (1988). J. Ben-Zvi. The time relational model. Ph.D. Dissertation, Univ. California, LA (1982). V. Lum, P. Dadam, R. Erbe, J. Guenauer and P. Pistor. Design of an integrated DBMS to support advanced applications. In Proc. Conf Foundation Data Organization, Kyoto, Japan (1985). P. Dadam, V. Lum and H. D. Werner. Integration of time versions into a relational database system. In Proc. VLDB, Singapore (1984). R. H. Katz and T. Lehman. Database support for versions and alternatives of large design files. IEEE Trans. Softw. Engng 10(2),

(1984).

E. McKenzie. Bibliography: temporal databases. ACM SIGMOD 15(4), (1986). C. Theodoulidis and P. Loucopoulos. The time dimension in conceptual modelhng. Inf Systems 16(3), 273-300 (1991). E. Dubois, J. Hagelstein, E. Lahou et al. The ERAE model: a case study. In Information Systems Design Methodologies: Improving the Practice (T. W. Olle, H. G. Sol and A. A. Verrijn-Stuart, Eds). North-Holland, Amsterdam (1986). J. Hagelstein. Declarative approach to information systems requiring modelling. Knowledge-Based Systems l(4), (1988). ESPRIT PlO7-Loki. A logic oriented approach to knowledge and databases supporting natural language user interfaces. Institute of Computer Science, Research Center of Crete, Greece (1986). M. Jarke. The DAIDA demonstrator: development assistance for database application. In ESPRIT Conf Proc. (1989). M. E. Adiba. Modelling complex objects for multimedia databases. In Entity-Relationship Approach: Ten Years of Experience in Information Modeiiing (S. Spaccapietra, Ed.). North-Holland, Amsterdam (1987). E. F. Codd. A relational model of data for large shared data banks. CACM U(6), (1970). R. L. Haskin and R. A. Lorie. On extending the functions of a relational database system. Proc. ACM SIGMOD Conf., Orlando, FL (1982). S. Abiteboul, P. C. Fischer and H.-J. Schek. Nested relations and complex objects in databases. Lecture Notes in Computer Science, No. 361, Springer-Verlag, Berlin (1989). U. Dayal. Simplifying complex objects: the PROBE approach to modelling and querying them. In Proc. GI Conf Datenbanksysteme in Buro, Technik und Wissenschaft, Darmstadt (April 1987). D. Shipman. The functional data model and the data language DAPLEX, ACM-TODS a(l), (1981). M. B. Villain. A system for reasoning about time. Proc. AAAI-82 Pittsburgh, PA (Aug. 1982). P. Ladkin. Logical time pieces. AI Expert 5867 (1987). J. F. Allen. Maintaining knowledge about temporal intervals. CACM 26(1 l), (1983). C. Theodoulidis. A declarative specification language for temporal database applications. Ph.D. Thesis, UMIST (1990). J. Clifford and A. Rao. A simple, general structure for temporal domains. In Proc. IFIP 8/WG8. I Working Cony on Temporal Aspects in Inf. Systems. Sophia Antipolis, France (May 1987). C. Batini and G. Di Battiste. A methodology for conceptual documentation and maintenance. Inf, Systems 13(3), 297-318 (1988).

A conceptual modelling formalism for temporal database applications

415

[35] W. Kim, J. Banerjee, H. T. Chou, J. F. Garza and D. Woelk. Composite object support in object-oriented database systems. In Froc. Id Int. Conf on Object-Oriented Programming Systems, Languages and Applications, Orlando, FL (Get. 19871. [36] k. Lmie and W. Plouffe. Complex objects and their use in design transactions. In Proc. Dat~~es for ~ng~eering Application; Database Week 1983
APPENDIX VE 3! T name(T) = name(E) VT(E) 3! A A A

Fl ET name(F1) = name(E) + “$” 3! F2 ET name(F2) = existence_period(E) + “$” Pk(T(E)) = (Fl, F2) (Vrel(E, Y) where name(rel(E, Y)) # IS-PART-OF A card(rel(E, Y)) = (0, 1) or (1, 1) A validity_period(rel(E, Y)) = always 3! F3 E T (is_entity~Y) -* name(F3) = name(Y) + “S” v is-value(Y) -+ name(F3) = name(Y) A card(rel(Y, E)) = (0, 1) or (I, I) -Ak(T)

Vrel(E1, E2) where name(rel(E1, E2)) = ISA Replace Fl of T(EI) where name(F1) = name(E1) + “$” with F2 of T(E2) where name(F2) = name(E2) + “8” vcv 3! name(T) = name(CV) VTl(CV1 Fl E Tl name(F1) = name(CV) + “S” Pk(Tl(CV)) = Fl (Vrel(CV, Xl) where name(rel(CV, Xl)) = IS-PART-OF A card(rel(CV, Xl)) = (0, 1) or (1, 1) 3! F2 E Tl (is_complex_value(Xl) -+ name(F2) = name(X1) f “S” v is_simple_value(Y) + name(F2) = name(X1))) (Vrel(CV, X2) where name(rel(CV, X2)) = IS-PART-OF A card(rel(CV, X2)) = (0, N) or (1, N) A isralue(X2) 3! F3 E Tl name(F3) = name(X2) + “ # ” A (3! T2 name(T2) = name(CV) + “_” + name(X2) + “-set” A 3! F4 E TZ name(F4) = name(X2) + “ # ” A 3! F5 E T2 (is_simple_value(XZ) + name(F5) = name(X2) v is_complex_value(X2) -) name(F5) = name(X2) + “S”) A Pk(T2(X2)) = F4)) A Ak(TI(CV)) = combination of all fields except Fl Vrel(X, Y) where name(rel(X, Y)) # IS-PART-OF A card(rel(X, Y)) = (0,I) or (1,l) A card(rel(Y, X)) = (0,l) or (1.1) or (0, N) or (1, N) A validity_~riod(rel(X, Y)) # always A is-entity(X) 3! T name(T) = name(X) + “_” + name(rel(x, y)) + ‘*_” + name(Y) A 3! Fl ET name(F1) = name(X) + “S” h 3! F2 E T (is-entity(Y) -+ name(F2) = name(Y) + “S” v is-value(Y) --) name(F2) = name(Y)) A 3! F3 E T name(F3) = name(validity_period(rel(X. Y))) + “%” A Pk(T) = (Fl , F3) h card(rel(Y, X)) = (0,l) or (1,1) --) Ak(T) = (F2, F3)

= (F2, F3)))

C. THEODOULIDIS et al.

416 SW:

Vrel(X, Y) where name(rel(X, Y)) # IS-PART-OF A card(rel(X, Y)) = (0, N) or (1, N) A card(rel(Y, X)) = (0, N) or (1, N) A is-entity(X) 3! T name(T) = name(X) + “_I’ + name(rel(x, y)) + “_” + name(Y) A 3! Fl ET name(F1) = name(X) + “%” A 3! F2 ET (is-entity(Y) + name(F2) = name(Y) + “$” v is-value(Y) + name(F2) = name(Y)) A 3! F3 ET name(F3) = name(validity_period(rel(X, Y))) + “%” A Pk(T) = (Fl, F2, F3)

Ws:

VCE 3! Tl name(T1) = name(CE) + “_Components” rr 3! Fl E Tl name(F1) = name(CE) + “$” A Pk(T1) = Fl A (Vrel(CE, Xl) where name(rel(CE, Xl)) = IS-PART-OF A card(rel(CE, Xl)) = (0, 1) or (1, 1) A validity_period(rel(CE, Xl)) = always 3! F2 E Tl (is_simple_value(Xl) + name(F2) = name(X1) v name(F2) = name(X 1) + ‘5”)) A (Vrel(CE, X2) where name(rel(CE, X2)) = IS_PART_OF A (card(rel(CE, Xl)) # (0, 1) or (1, 1) v validity_period(rel(CE, Xl)) # always) 3! F3 E Tl name(F3) = name(X2) + “ # ” A 3! T2 name(T2) = name(CE) + name(X2) + “-set” A 3! F4 E T2 name(F4) = name(X2) + “ # ” A 3! F5 E T2 name(F5) = name(X2) + ‘5” A 3! F6 E T2 name(F6) = name(validity_period(CE, X2)) + “$” A (card(rel(CE, X2)) = (0, 1) or (1, 1) + Pk(T2) = (F4, F6) v card(rel(CE, X2)) = (0, N) or (1, N) -+ Pk(T2) = (F4, F5, F6))

Legend

E: simple (SE) or complex entity (CE) V: simole (SW or comolex value (CV) T: tabie/reiati& F: field Pk: Primary key Ak: Alternative key