Data & Knowledge Engineering 1 (1985) 75-116 North-Holland
75
The category concept: An extension to’the entity-relationship model R. ELMASRI* Departmentof ComputerScience, Universiry of Houston - University Park, Houston, TX 77004, U.S.A.
J. WEELDREYER* Motorola Microsystems, Tempe, AZ 8.5%
U.S.A.
A. HEVNER** College of Business and Management, University of Maryland,
College Park, MD 20742, U.S.A.
Abstract. An enhanced version of the Entity-Relationship (ER) data model called the Entity-CategoryRelationship (ECR) data model is presented. The principal extension is the introduction of the concept of a category. Categories permit the grouping of entities from different entity types according to the roles they play in a relationship, as well as the representation of ISA and generalization hierarchies. The structures of the ECR data model are defined, and a graphic representation technique for their display is presented. Language operations to define and use an ECR database are defined. Two realistic examples of the use of the ECR model for database design are demonstrated. The examples show how ECR structures can be directly mapped into relational and network structures. The definition of derived relationships on an ECR database gives the power to phrase. higher order recursive queries in a first order query language.
Keywords. Conceptual data models, entity-relationship model, data semantics, generalization and ISA hierarchies, entity types, categories, attributes, relationships, roles, structural constraints, relational databases, network databases, query languages, graphs, connections, database operations, transactions, recursive queries.
1. Introduction A central feature of any database management system (DBMS) is the data model upon which the system is based. At the conceptual level of a DBMS architecture [41], the data model plays two important roles. First, the data model provides a methodology for representing the objects of a particular application environment, and the relationships among these objects (the conceptual or semantic role). Second, the data model is structured to allow a straightforward translation from the conceptual schema into the physical data structures of the internal level of the DBMS (the representational role). A large number of data models have been proposed. However, actual experience with the use of these data models as a basis for implementing a generalized DBMS system is very scant. Nearly all early commercial DBMS implementations were based on either the hierarchical * Most of this work was performed when these authors were with Honeywell Computer Sciences Center, 10701 Lyndale Avenue South, Bloomington, MN 55420, U.S.A. This work is partially supported by NSF grant ECS8007683.
‘*This work was performed when the author was with the Computer Minnesota, Minneapolis, MN 55455, U.S.A. 0169-023?C/85/$3.30 @ 1985, Elsevier Science Publishers B.V. (North-Holland)
Science Department,
University
of
76
R. Elmasri et al. I The category concept
model [63] (such as IMS [39] and SYSTEM-2000 [47]) or the network model [62,14] (such as IDS [37] and IDMS [20]). Many newer DBMS implementations are based on the relational model [16]. Experience with actual systems has led to the realization that these data models do not satisfy the two roles of a conceptual model (semantic and representational) with equal success. The hierarchical and network models are strongly oriented towards facilitating the subsequent implementation of the conceptual schema. This is undoubtedly so because, historically, the physical structure of the DBMS was designed first, and then a data model was developed to allow conceptual modeling on the particular physical design. Thus, the hierarchical data model is based on underlying tree-oriented data structures, while the network model is based on ring-oriented data structures. The use of the hierarchical and network data models in the semantic role is hence burdened with numerous construction rules and artificial constraints. In recognition of some of these problems, the CODASYL committee has produced a revised report [15] that removes some of the implementation dependencies of their earlier work [14]. However, little practical experience has yet been reported about these modifications. The relational data model [16] was proposed as a simple and theoretically appealing representation of data. In spite of its attractive simplicity, however, the relational model must be enhanced to fill the two roles of a conceptual level data model. In pursuit of discovering semantic enhancements to the relational model, a rich theoretical foundation about data dependencies and normalization was produced [17,18,5,31,44]. However, no implemented relational DBMS that we know of uses this theory. In addition, several of the dependencies proposed, such as multi-valued dependencies do not seem to have an intuitive basis in data modeling. Some normal forms have been attacked as counter-intuitive when considered from a purely formal point of view [6]. Criticisms of the ability of the relational model to adequately model the semantics of application environments appear in [55,60,43,40,19,35]. Implementations of relational systems (such as SYSTEM-R [3] and INGRES [36,61]) have demonstrated that practical performance can only be gained by including effective methods of translating relational structures into efficient physical. structures. The optimization of these translation methods is an important area of physical database design research [ll]. Recent research in database design methods has developed the idea of using two distinct data models at the conceptual level [42]. An enhanced conceptual data model would provide an effective means of describing the database application environments. A representational data model would be used for efficient translation of a schema into physical data structures. Thus, the hierarchical, network, or relational data models could be employed as representational data models. The DBMS would provide a user-invisible translation between the conceptual schema and the representational schema. Therefore, database applications are able to declare and reference data as viewed in the conceptual schema. In the past several years, a large number of proposals for new conceptual data models have appeared in the literature [54,10,32,66,60,7,8,48,34,23,59, 19,351. While each new model claims to represent more of the semantics of the data, there are usually only slight differences among the proposed models. One of the most accepted conceptual data models has been the entity-relationship (or ER) model of Chen [lo, 211. The important contribution of the ER data model is the distinction between entities (or objects) and relationships among entities (or connections among objects). In the ER model, it is recognized that relationships have certain semantic properties that distinguish them from entities, such as relationship cardinalities (1 : 1, 1 : N, M : N) and existence dependencies. Important contributions in this area were also reported in [55,65]. Continued research in data models seems to indicate that the constructs of the ER data model are not sufficient to represent some important data semantics. The concepts of
R. Elmasri et al. / The category concept
77
subclasses [60,34] (also called ISA-hierarchies [45]) and superclasses (called generalization categories in [60]) are not directly supported by the ER model. The importance of these concepts is further discussed in [34,48,65,59, 191. A recent survey of data modeling concepts [56] shows that while the ER model provides for the modeling of aggregation [60] (using attributes and relationships), it does not provide modeling primitives for generalization. Another necessary semantic concept that is not directly represented in the ER model is that of grouping of entities, which do not necessarily belong to the same entity set, to participate in a relationship in the same given role. The concept of role is introduced in [4], and is further discussed in [59,57,19]. In the ER model, only a single entity set can participate in a relationship in a given role. Requiring that the entities that participate in a relationship in a certain role to be all of the same entity set is unnecessarily restrictive. In this paper, we present our research to extend the ER model so that we may directly represent the above semantic concepts. We accomplish this by introducing a single additional construct to the ER model; the concept of a category. In Section 2 of this paper, we define this extended ER model, which we call the Entity-Category-Relationship, or ECR model of data. The concept of categories handles both the multiple entity set participation in a given role of a relationship, as well as the subclass and superclass concepts. Other enhancements to the ER model have been incorporated. In the ECR model, entities stand for themselves, and need not be identified by a particular key attribute. This approach is similar to the concept of surrogates [33,19]. Another extension to the ER model is a more general definition of attribute. The attributes in the ECR model can be multi-valued, yet are defined within the same framework as single valued attributes. Attributes are defined as functions with the domain of the function being an entity set, a relationship set, or a category, and the range of the function being the power set of a set of values. Recent research in languages for the ER model [51] shows the importance of multi-valued attributes in the definition of an algebra for the ER model. Finally, a more complete specification of the cardinality and dependency properties of relationships is available in the ECR model. A precise specification is used, similar to that in [l], but extended to n-ary relationships, II > 2. This same technique is used to specify constraints on attributes. The organization of this paper is as follows. In Section 2, we present the ECR model. In Section 3, we demonstrate how the ECR model is easily translated to a representational data model. The translation methodology is illustrated by using the relational model and the CODASYL network model as representational data models. In Section 4, we demonstrate the use of the ECR model in logical database design. ECR schemas are designed for two realistic database applications. In Section 5, we present a query and update language, GORDAS [27,28], associated with the ECR model. In Section 6, we present a mechanism whereby complex, second order, queries (such as recursive closure) can be phrased by the user as first order queries if appropriate derived structures are represented in the ECR schema. Section 7 presents our conclusions, and briefly discusses two experimental database management systems which use the ECR model at the conceptual level and the relational model at the representational level.
2. Definition of the ECR model 2.1. Ovemiew Briefly, the ECR model, like the ER model, views the world as consisting of entities and
78
R. Elmasri et al. / The category concept
relationships among entities. Entities and relationships have attributes which provide information. Entities are classified into entity types according to their basic attributes. Furthermore, in the ECR model, entities are grouped into categories according to the roles which they may play in relationships. Thus, relationships are mathematical* relations over categories of entities. 2.2. Entities and relationships Given the above intuitive overview of the ECR model, we now give more precise definitions of the model constructs. An entity is something which exists in the real world and has attributes which describe it. Attributes of an entity are of two kinds, basic and acquired. A basic attribute is one which is fundamental to the entity, independent of any relationships involving that entity (explicitly or implicitly). An acquired attribute is not fundamental to the entity, but is associated with the entity as the result of an abstraction process in which a relationship involving the entity has been omitted from the database. For example, a basic attribute of a person entity would be his/her hair color, while an acquired attribute might be employee number. Employee number could be considered to be an attribute of an employment relationship involving the person. However, in the case of a personnel database for a single company, such an employment relationship is not of great interest and is likely to be omitted, with the employee number attribute moved to the person. Figure 1 illustrates this abstraction process. The conventions for our diagrams are given in Section 2.6.
Fig. 1. ER diagram of an abstraction resulting in acquired attributes.
The difference between basic and acquired attributes is important when modelling a particular application. In many cases, categories of entities, which are either a subset of entities of an entity type, or a grouping of entities from several entity types, have attributes that apply only to entities that are members of the categories, but not to other entities in the entity sets. 2.3. Entity types and categories Entities which have similar basic attributes are classified into entity types. For example, the particular automobile with registration number, XYZ-123, is an entity, while a set of automobiles. is an entity type. Entity types are disjoint, in that a given entity can be a member of only one entity type. We use the notation e,, e,, . . . , e, to represent individual entities, and T,, T,, . . . , q to represent entity types. An entity type is similar to the entity set of the ER model. Entities are also classified into categories according to the roles they may play within
R. Elmasri et al. I The category concept
79
relationships (whether or not the relationship is actually represented in the application environment). For example, members of the AUTOMOBILE entity type and of the TRUCK entity type may be categorized as VEHICLE in an owner-vehicle relationship. Similarly, members of the PERSON entity type and of the CORPORATION entity type may be categorized as OWNER in the same relationship (Fig. 2(a)). These two examples show how categories are used to represent superclasses (generalization categories) of entity types. Categories are also used to represent subsets (ISA-categories). For example, three categories of the PERSON entity type could be TECHNICIAN, SCIENTIST and FULL-TIMEEMPLOYEE in a company application (Fig. 2(b)). In Fig. 2, categories are shown in hexagonal boxes, which is the convention of the ECR diagrams presented in Section 2.6.
OWNER CATEGORY
VEHICLE CATEGORY
EMPLOYEE
ULLTIME-EMPLOYEE
Fig. 2. Examples of categories; (a) Examples of generalization (superclass) categories; (b) Examples of ISA (subclass) categories.
We use the notation C,, C,, . . . , c to represent categories. Thus,
where, for 1
80
R. Elmasri et al. I The category concept
EMPLOYEE EMPLOYEE.
in addition to its own acquired attributes since every SECRETARY
is also an
2.4. Relationships We define the Cartesian product X of n categories as X(C,, C,, . . . , C,) = {(e,, e,, . . . , en): ej E q for 1 Gj G n}. Then, R is a relationship (or relationship set) over the not necessarily distinct categories C, if it is a subset of X(C,, . . . , C,,). An element of R is termed a relationship c,,..., instance. (Note that R is in fact a mathematical relation over C,, . . . , C,, although we will not use that term in order to avoid the confusion which would obviously result.) For semantic integrity, rules exist in the real-world situation that govern the possible ways in which entities of a category can participate in a relationship. We call these the structural properties of the relationship. We can specify the participation of a category Cj in a relationship R by two numbers (il, iJ, 0 c i, s i2 and i2> 0. The numbers mean that each entity that is a member of category Cj must participate in at least i,, and in at most i, relationship instances in R. The numbers i, and iz are a concise method of specifying structural constraints. Popular constraints on relationships are easily specified in this scheme. A relationship R is total with respect to the particpation of Cj if i, 3 1, and partial if i, = 0. We can also specify that the participation is functional (at most one appearance of each entity x in C will exist in R at a given moment) if we specify i2 = 1. These specifications are equivalent to the cardinality and dependency constraints of [24], and to Abrial’s specifications [l], although the latter are specified for binary relationships only. The default values for i, and iz are i, = 0 and iz = 00, which correspond to not specifying any constraint on the participation. A relationship R is specific with respect to the participation of C if the participation is total, and, in addition, once an entity from C;. is related by some relationship instance in R, that relationship instance cannot be removed unless the entity itself is deleted. An example of partial participation of a category in a relationship would be the OWNER category in the OWNERSHIP relationship of Fig. 2(a). Since the OWNER category is composed of the PERSON and CORPORATION entity types, and since not all persons and corporations are expected to own vehicles, the participation is partial. On the other hand, the participation of the VEHICLE category is total, since the existence of an unowned vehicle is not allowed. The total participation of a category in a relationship implies an existence dependency of entities within that category upon related entities within other participating categories. For example, the deletion of all owners of a given vehicle without its reassignment to another owner or its deletion would result in an inconsistent database. Note, however, that participation of the VEHICLE category is not specific, since vehicles may be reassigned to new owners. An example of a specific relationship is given in Fig. 3(a), which is part of a company database. Here, the association of a dependent to a supporting employee is assumed to be a permanent relationship, and deletion of the supporting employee would imply deletion of the dependent, since reassignment to a new supporting employee would not conform to the real world situation. It is also possible for a single category to play multiple roles within a single relationship, so that the categories over which a relationship set is defined are not necessarily distinct. An example of such a situation is the participation of the PART category/entity type in a COMPONENT relationship (the well-known parts explosion relationship), shown in Fig. 3(b). Here, the PART category plays both roles in the COMPONENT relationship: .assembly and
R. Elmasri et al. / The category
81
concept
EMPLOYEE
subassembly
assembly
1
Fig. 3. Examples of relationships;
(a) Specific participation of DEPENDENT, category.
(b) Multiple
roles of the same
subassembly. A particular PART entity which participates in the relationship in the assembly role is related to its subcomponent PART entities which participate in the subassembly role. 2.5. Attibutes Entities and relationships have attributes (basic and acquired) which describe them. An attribute is associated with either an entity type, a category, or a relationship. Each attribute is defined on a value set, from which the values of the attribute for the entities or relationship instances are taken. An attribute a, of an entity type T, a category C, or a relationship R, defined on value set V, is a function with domain T, C, or R, and range P(V), where P(V) is the power set of V: a:T+P(V)
or
a:C+P(V)
or
a:R+P(V).
This definition allows for a direct representation of multi-valued attributes. If V is a set of single values, a is called a simple attribute. If V is the cross product of several sets V,, . . . , V,, (V = V, X * * * X V,,, where X is the cross-product operation), then a is called a compound attribute. Constraints exist on the attributes in order to represent real-world semantic constraints. These are cardinality constraints and uniqueness constraints. The cardinal@ of an attribute is the number of values from set V that can appear in a(x) for some element x (entity or relationship instance) in the domain D (entity type T, category C, or relationship R) of a. .The cardinality constraint for attribute a is specified by two numbers (iI, iJ, where 0 Q i, c i2 and i2> 0, where i, is the minimum number of values, and iz is the maximum number of values from V that can appear in a(x) for some x in D:
a(x) = h,
v2, . . . , vJ,
The numbers i, and attributes. An attribute is total (no null values default values are i, =
i,Gn
i, are a simple and a is single-valued are allowed) if i i 3 i, = 1 (single-valued,
VforlSjSn.
uniform method to specify popular constraints on if i2= 1, and is multi-valued if i2> 1. The attribute 1, and is partial (null values allowed) if i, = 0. The total).
82
R. Elmasri et al. / The category concept
The uniqueness (or key) constraint is specified on an attribute a (a set of attributes can be considered a compound attribute) as follows. An attribute a is a unique (key) attribute if any value t) in P(V) appears as a value of attribute a for at most one entity x in D at any given consistent state of the database. Unique attributes are used to represent attributes whose values serve to identify entities of a given entity type or category and are hence similar to the ‘key’ concept of relational databases. However, all attributes in the ECR model are updatable. In addition there is no requirement that an entity type must have a set of key attributes, since entities with the same attribute values could be identified by their relationships to other entities. For the purpose of user reference to the database to perform query and update, there is a correspondence between the attributes of a category and the attributes of the entity types whose members comprise the category. Namely, the set of attributes of a category is the union of the basic attributes of all participating entity types together with any acquired attributes which have. been specially defined for the category. Since a given entity ,type attribute A, may not be defined for all entity types in the category, the category attribute, A,, corresponding to A, is defined as follows: A = A, f I 0
for those entities for which A, is defined, for other entities in the category.
From the above definition, it follows that if A, is a total attribute, defined over all participating entity types, then A, is a total attribute. Also, if A, is unique, then so is A,. Note that these attributes are not duplicated in the category but are inherited from the entity types as discussed in Section 2.3 (see also Section 5.2, query Q.ll). A slightly different approach to categorization is suggested in [57], and is based upon abstraction by generalization as defined in [60]. The primary difference between that approach and the one suggested here lies in the treatment of category attributes. In [57], the attributes of the category would be the intersection of the attribute sets of all participating entity types. The approach we have chosen allows the easy specification of data to satisfy .selections of the kind: Find aN information about a subset of entities within a category. A specific example would be: “Find all information about the owner of the vehicle with registration number XYZ-123”, where an owner could be a person or corporation. Within the ECR model, there is one’implicit attribute, ENTITY-TYPE, which is defined for all entity types in the database. The ENTITY-TYPE attribute is a single-valued total function mapping all entities into the ENTITY-TYPE value set, which consists of the names of all entity types defined within the database, i.e., ENTITY-TYPE
: U T *
type names}
for 1 d i s n
where n is the number of entity types. For a given entity, the value of its ENTITY-TYPE attribute is the name of the entity type to which it belongs. This attribute is user uigible, but the value is system assigned when the entity is created. The purpose of this attribute is to allow the user to determine the entity type of a particular entity within a category. 2.6. ECR diagrams Figure 4 shows how the different ECR constructs are displayed in ECR diagrams. ECR diagrams are extensions of ER diagrams [lo]. An entity type is represented as a rectangular box, and a relationship as a diamond-shaped
83
R. Elmasri et al. I The category concept
entity type which also a category
relationship ENTITYTYPE NAME attribute
name
.
entity
is
r’
type
category
-(i, = 0) partial
participation
total participation
specific
participation N
1
(i,=l) functional
--
participation
Fig. 4. Conventions for ECR diagrams.
box, as in ER diagrams. A category is represented as a hexagonal box, and is connected to the entity types that form it by single lines. When a category is identical to an entity type, a hexagon is drawn to surround the rectangle that represents the entity type. Subset (ISA) categories are shown as in Fig. 2(b), with a subset symbol on the arc connecting the category to the entity type. Generalization categories are shown as in Fig. 4(a), where the letter U specifies the set union operation since the category is a subset of the PERSON
CORPORATION
OWNER
Fig. 4(a). Generalization
category.
84
R. Elmasri et al. / The category concept
union of several entity types. If a restricting predicate is specified, it may be displayed next to the appropriate arc. Attributes are shown as functions from the object set (entity type, category, or relationship) to the power set of the value set of the attribute. Hence, a directed arc is drawn from the object set to an oval box which contains P(V), and the attribute name is displayed next to the arc (Fig. 4). The default attribute cardinality is single-valued and total (il = i, = 1). The default cardinality for a multi-valued attribute is (il = 1, i, = w), and such an attribute is tagged with the symbol M.V. next to the attribute name. When alternative values of i, and/or i2 are used, they are displayed next to the attribute arc. Since we do not discuss the detailed definition of value sets in this paper, we will show attributes in a slightly different manner in our diagrams. The name of the attribute itself (rather than P(V)) is displayed in an oval, which is connected to the appropriate entity type, category, or relationship by an undirected arc. For relationships, different representations exist for popular participations of categories in relationships. The default structural constraint for relationships is i, = 0 and i, = CO(no constraint), and is represented by a single line between the category and the relationship in which the category participates. A total participation (i, = 1) is represented by double lines, while a specific participation (which must also be total) is represented by triple lines. Whenever alternative values of i, and/or i2 are used, they are displayed next to the participation line. Functional participations (i2= 1) are represented by an arrow away from the category whose participation is functional. 2.7. Special cases of subset (ISA) categories
In some cases, additional constraints may be specified on categories. For example, two or more subset categories may be known to be disjoint. This can be represented in the ECR model by creating a single-valued attribute in the entity type on which the categories are specified, and specifying the conditions of category membership on different values of the attribute. For example, if MANAGER, ENGINEER and SECRETARY are disjoint subsets of EMPLOYEE, we can use an attribute JOBTYPE in EMPLOYEE to specify the membership condition for each category (JOBTYPE = ‘manager’ for MANAGER, JOBTYPE = ‘engineer’ for ENGINEER, and JOBTYPE = ‘secretary’ for SECRETARY). If it is desired to represent this, or any other constraints, graphically, ECR diagrams may
T
EMPLOYEE
Fig. 4(b). Disjoint ISA categories.
R. Elmasri et al. / The category concept
85
be easily extended. For example, disjoint categories can be shown as in Fig. 4(b). Hence, special cases of categories can be incorporated into ECR diagrams if required.
3. Use of the ECR model in database design 3.1. The design process
The complete design of a database requires four separate but dependent design steps [42]. (1) Requirements analysis. The environment of the applications must be thoroughly analyzed to define the requirements of each potential database system user. (2) Conceptual modeling of information. Using the information gained in Step 1, the requirements of each user are modeled into an external view of the database. The several user views must be reconciled and integrated into a single community view of the data [23]. All user applications are supported in this conceptual level schema of the database. (3) Representational design of the database. The system-independent conceptual schema of Step 2 is mapped into a system-dependent implementation schema. This step is needed to bridge the gap between the rich conceptual data structures used for modeling user applications and the processable data structures required for effective system implementation. (4) Physical database design. The questions of actual database system implementation are dealt with in this step. The choice of data models with which to perform Steps 2 and 3 of the database design process is a very important issue. Most of the previous work in this area has utilized a single data model for the complete design process. The most popular data models have been the relational data model [16] and the CODASYL DBTG network data model [14]. While these models provide the strong structuring and manipulation capabilities needed in Step 3, they are not semantically rich enough to capture the diverse meanings of data needed for the complete view modeling of user requirements in Step 2 [60,43,40,19,64,25,35]. Several new data models that include semantic enhancements to the relational and network models have been proposed [55,10,4,66,65] for use in Step 2. During Step 3 of system design, these enhanced data models must be mapped into data models, such as the relational or network models, which can be readily implemented on a database system. This mapping is invisible to the database user, who defines and manipulates the data as viewed in the high-level semantic-rich data model. In this section, we will demonstrate that the ECR model can be mapped into a relational representation and a DBTG network representation quite easily. Section 4 presents two examples of this mapping. 3.2. ECR representation in the relational data model
In the relational representation of an ECR schema, every relation that represents an entity type or a category will contain a unique surrogate identifier attribute [33,19], which is invisible to system users. We will refer to this attribute as S(*), where * is the name of the entity type or category under consideration. The translation of an ECR schema to a relational schema is straightforward, as follows. An entity type is represented directly as a relation with a unique surrogate attribute (Fig. 5). Note that in the ECR model, there is no notion of an entity being uniquely identified by an attribute or a set of attributes. We can constrain an attribute to be unique, but all attributes can be updated. For implementation purposes, the system supplied surrogate attribute will serve to uniquely identify each entity instance.
R. Elmasri et al. / The category concept
86
PERSON S(PERSON)
SSN
NAME
ADDR
ADDR
PHONE
PHONE
J&f.yIff’y SSN
NAME DBTG
Type
Data Model
Fig. 5. Entity type translation.
A category is represented as a relation in which each tuple is identified by the unique surrogate values of each entity that is a member of the category (Pigs. 6 and 7). The attributes in the category relation are the acquired attributes of the entities in their role defined by the category. Note that no redundancy results, but that the basic attributes of an EMPLOYEE entity such as Name and Ssn are recovered by a relational join operation between the EMPLOYEE and PERSON relations on the surrogate identifiers of both relations. EMPLOYEE S(EMPLOYEE) SAL “(;z
PERSON
SHIFT
(1 Relational
Data Model
=> I
EMPLOYEE SAL
EMPLOYEE
iis
Sal
SHIFT
13 Set 27 DBTG
Shill
PERSON Data Model
Fig. 6. ISA (subclass) category translation. OWNER S(OWNER) CORPORATION S(OWNER)
I sS(PERSON) U S(CORPORATION) Relational Data Model
* 111
OWNER
CORPORATION DBTG Data Model
Fig. 7. Generalization
(superclass) category translation.
87
R. Elmasri et al. I The category concept
Functional relationships (one to one, many to one) can be represented by defining an attribute containing the surrogate values of the range category relation in the domain category relation. In Fig. 8, the attribute S(PLANE TYPE) is sufficient to represent the 1: N relationship OF-TYPE. This is identical to the concept of a foreign key in relational terminology. Any attributes of the 1: 1 or 1 :N relationship itself are also placed in the relation which represents the domain category of the functional relationship. For example, if the OF-TYPE relationship had had any attributes they would have been placed in the AIRPLANE relation. AIRPLANE
foreign key REG # S(PLANETYPE)
S(AIRPLANE) RANGE CATEGORY
Relational
Data
Model
Data
Model
Y
DBTG
Fig. 8. Functional (1 : 1, 1 : N) relationship translation.
Non-functional relationships (many to many) are represented by a relation that contains one surrogate attribute for each category which participates in ‘the relati,onship. These category attributes hold the surrogate values of entities that appear in the relations that represent the categories. The single-valued attributes pertaining to the relationship are also included in the relation (Fig. 9). OWNS combined key S(OWNER) S(AIRPLANE)
1
I Relational
PDATE
I Data
Model
=+
Fig. 9. Non-functional
(M:N)
relationship
translation.
I
88
R. Elmasri et al. / rite category concept
The only other relations needed are for the special case of a multi-valued attribute. A relation is constructed whose attributes are the multi-valued attribute and the surrogate attribute of the relation that represents the entity type, category, or relationship with the multi-valued attribute. This relation relates the surrogate values of the entity-type, category, or relationship with each value of the multi-valued attribute (Fig. 10). This is similar to the concept of ‘nest’ relations [65]. Note that the single-valued attributes are kept in a separate relation which corresponds to the entity type, category or relationship. By application of the appropriate relational join operation on the surrogate identifiers, all relevant information concerning relationships, categories and multi-valued attributes can be recovered from the relational database. SCIENTIST-SOCIETIES combined S(SCIENTIST)
key SOCIETIES
Relational
Model
Data
M.V.
DBTG
Fig. 10. Translation of multi-valued
Data Model
attributes.
3.3. ECR representation in the DBTG network model Using the DBTG Network model [14] as a representation structure is also straightforward. The strategy is to represent each entity type as a record type. The attributes of the entity type are represented as data items of the record type (Fig. 5). Multi-valued attributes are represented as repeating groups (or vectors) (Fig. 10). The database key mechanism may be used to uniquely identify each entity. Categories can be represented in numerous ways in the DBTG model. We will only demonstrate the most straightforward representations. When the entities of a category are all from a single entity type, the category is a subset of the entity type (an ISA category), and we can represent it using a single owner-single member set which is constrained to be a 1: 1 set. The category record type acts as the member and the associated entity type record type acts as the owner (Fig. 6). We can choose the set to be automatic mandatory, so that simple constraints are automatically maintained by the DBTG system. Thus, a category record must be owned by the entity type record which represents the same entity instance. When entities from more than one entity type are members of the category (a generalization category) a single owner-multi member set is needed, as shown in Fig. 7, with the category as the owner. Functional relationships can be directly represented by a DBTG owner-member set with the domain category functioning as the member and the range category as the owner (Fig. 8). Any attributes of the 1: 1 or 1: N relationship are placed in the member record type. A non-functional (M : N) relationship is represented by a distinct linking record-type, together with sufficient owner-member sets to associate the participating category record types with
R. Elmasri et al. / The category concept
89
the relationship record type (Fig. 9). In all such sets, the category record type functions as the owner and the relationship record-type as the member. This approach is identical to the approach in [lo, 621. 3.4. Summary Table 1 gives a comparison summary of the mapping from the ECR conceptual structures into the semantic free relational and DBTG representation structures. Table 1. Representation
structures.
ECR
Relational
DBTG
Entity-type Category (ISA)
Relation Relation
Category (generalization)
Relation
Functional relationship (l:l, 13)
Range surrogate value in domain relation (foreign key) Relation
Record type Record type and single-owner single-member set Record type and single-owner multiple-member set Single-owner single-member set
Non-functional relationship (M : IV) Single-valued attribute Multi-valued attribute Entity identifier
Attribute ‘Nest’ relation with combination key System assigned surrogate attribute
Record type and owner-member sets Field Repeating group, vector Database key
4. Examples of the use of the ECR model Two realistic applications to illustrate the use of the ECR model in database design are now presented. A brief scenerio of the application requirements is given and a conceptual schema using the ECR model is constructed. As a final step, the ECR schema is translated into a relational schema and a DBTG schema. 4.1. Example 1: An airport database A database is needed to manage the operations of a municipal airport. Upon completion of the requirements analysis step the following user ,applications are found to be required: (1) A record of each airplane registered at the airport, and its registration number [Reg#] is needed. Each airplane is of a particular plane type. Plane types have properties of a model name [Model], a capacity [Capacity], and a weight [Weight]. (2) The airport maintains hangars to store the airplanes. Each airplane has a designated hangar. Hangars have a capacity [Capacity] and a location [Location]. (3) Records are kept on all airplane owners. Either individuals or corporations may be owners. Owners and airplanes define a many-to-many relationship. The purchase date [Pdate] of the plane is recorded. For each owner, the properties of n,ame [Name], address [Address] and phone number [Phone] are stored.
90
R. Elmasri et al. / The category concept
(4) Pilot records are kept. Each pilot has a license number [Lit-Num] and certain restrictions [Restr]. Pilots are qualified to fly specific plane types. (5) The airport has employees who work specific shifts [Shift] and earn salaries [Salary]. Employees are qualified to work on certain plane types and they file maintenance records on specific airplanes listing the date [Date], the workcode [Workcode], and hours worked during that date [Hours]. (6) For all persons in the database, the social security number [Ssn] is needed. From the user applications, an integrated conceptual schema of the airport database was designed using the ECR model. The schema is shown in Fig. 11.
CORPORATION
Fig. 11. Airport
database
conceptual schema.
Note particularly the use of categories in -the design. The categories PLANE-TYPE, AIRPLANE, and HANGAR are also entity types in the database. OWNER is a generalization category made up of entity instances from CORPORATION and PERSON. Each owner entity inherits the attributes of either a corporation entity or a person entity. The categories PILOT and EMPLOYEE are subsets of PERSON. It is important to recognize that the three categories whose entities may be persons (OWNER, PILOT, and EMPLOYEE) demonstrate independent roles in the database. Thus, a person may be an owner, a pilot, and an employee. Following the rules presented in Section 3, the ECR conceptual schema of Fig. 11 is mapped into a relational representation schema (Fig. 12) and a DBTG representation schema (Fig. 13).
a R. Elmasri et al. / l’le category concept
91
ENTITY TYPES: PLANE lYPE(S(PlANE TYPE), MODEL, CAPACITY; WEIGHT) HANGAR(S(HANGAR, CAPACITI: LOCATION) AIRPLANE(S(AlRPLANE), REG #, S(PLANE TYPE), S(HANGAR)) PERSON(S(PERSON), SSN, NAME, ADDRESS, PHONE) CORPORATlON(S(CORPORATION), NAME, ADDRESS, PHONE) CATEGORIES: OWNER(S(OWNER)) WHERE S(OWNER) cS(PERSON) U S(CORPORATION) PILOT(S(PILCjT), UC-NUM, RESTR) WHERE S(PILOT) E S(PERSON) EMPLOYEE(S(EMPLOYEE), SALARY SHI!=l-) WHERE S(EMPLOYEE) C S(PERSON) M:N RELATIONSHIPS: OWNS(S(ObVNER), S(AIRPlANE), PDATE) FUES(S(PILOT), S(PLANE TYPE)) WORKS ON(S(EMPLOYEE), S(PLANE TYPE)) MAINTAIN(S(PILOT), S(AIRPLANE), DATE, HOURS,
Fig. 12. Airport database representation
WORKCODE)
schema (relational model).
PLANETYPE EMP4SA.P
OF-TYPE
OWNS-A
STORED-IN I
1
CfRPORATI;N NAME
ADDRESS
rJ 1 ; PHO
E
, SSN
P;,,,, NAME
ADDRESS
,
, PHONE
Fig. 13. Airport database representation schema (nehvork model).
4.2. Example 2: Company database A company needs a database system to manage it& operations. The following applications are found to characterize the required data: (1) The company is divided into divisions. Each division has a name [Name], a number [Number], and a geographical location [Location].
92
R. Elmasri et al. / The category concept
(2) The company contracts for projects which are defined by project name [Name], number [Number], start date [Pstart], and location [Location]. Each project is assigned to a specific division for management control. (3) Personnel control is of special interest to the company. For all employees, the properties of name [Name], address [Address], social security number [Ssn], office phone [Ophone], and home phone [Hphone] are maintained. Full-time employees of the company are assigned to a division for accounting and salary [Salary] purposes. Consultants are hired and are paid depending on a designated payscale [Payscale]. The name of the organization [Organization] of a consultant is kept. (4) Personnel work on projects for a specified number of hours per week [Hours]. A person may work on any number of projects. A full-time employee manages each project in a one-to-one relationship. The start date [Mstart] for managing a project is noted. (5) Special information is maintained on scientists and technicians in the company. Scientist properties are scientist grade [Sgrade] and professional societies [Societies] which is a multi-valued attribute. Technician properties are technician grade [Tgrade] and union [Union]. The integrated conceptual schemas of the company’s database was designed as shown in Fig. 14.using the ECR model. The entity types of EMPLOYEE, DIVISION, and PROJECT
DIVISION
CONTROLSi
( FULL-TIME-EMPLOYEE)‘
I
T5 Salary
fig.
14. Company
database
conceptual
schema.
R. Elmasri et al. / l%e category concept
93
ENTITY TYPES: EMPLOYEE(S(EMPLOYEE), F-NAME, M-NAME, L-NAME, ADDRESS, HPHONE, OPHONE, SSN) OlVlSlON(S(DIVlSION), NAME, LOCATION, NUMBER) PROJECT(S(PROJECT), NAME, PSTAR7; NUMBER, LOCATION S(DIVISION), S(FULLTlME EMPLOYEE), MSTART) CATEGORIES: FULLTIME EMPLOYEE(S(FULLTlME EMPLOYEE), SALARY S(DlVlSlON)) WHERE S(FULLTIME EMPLOYEE) E S(EMPLOYEE) CONSULTANT(S(CONSULTANT), PAYSCALE, ORGANIZATION) WHERE S(CONSULTANl-) E S(EMPLOYEE) SClENTlST(S(SClENTlST), SGRADE) WHERE S(SCIENTIST) E S(EMPLOYEE) rrCHNlClAN(S(TECHNlClAN), TGRADE, UNION) WHERE S(TECHNICIAN) E S(EMPLOYEE) M:N RELATIONSHIPS: WORKS ON (S(EMPLOYEE), MULTl-VALUED SClENTlST
Fig. 15. Company
S(PROJECT),
AlTRIBUTES: SOCll3lES(S(SClENTlST),
HOURS)
SOCIEpo
database representation schema (relational model).
are also categories since they participate in given roles in relationships. The EMPLOYEE entity type has four ISA (subset)- categories for the roles of PULL-TIME-EMPLOYEE, CONSULTANT, SCIENTIST, and TECHNICIAN. An entity in any of these categories inherits the attribute values of the EMPLOYEE instance that it corresponds to. In this application, the categories of CONSULTANT, SCIENTIST and TECHNICIAN
Fig. 16. Company database representation schema (network model).
94
R. Elmasri
et al. / The category
concept
are not shown to participate in any specific relationships. The database designer may still decide to define these categories. By analyzing the applications on the database, the designer recognizes that the groupings of consultants, scientists, and technicians are important. Each of these categories has additional acquired attributes which only apply to members of the category, and need to be represented in the database. Specific queries such as, “List all information for scientists of Sgrade = 10.” should be facilitated by, the database schema. Future expansion of the database application universe may include relationships in which these categories may perform a role. For example, by including company EQUIPMENT as an entity type, a relationship MAINTAINS can be defined between an EQUIPMENT instance and the TECHNICIAN who maintains it. The ECR conceptual schema of the company database can be mapped into the relational representation schema of Fig. 15 and the DBTG representation schema of Fig. 16.
5. Data query and update In this section, we present the update operations, and the selection formalisms associated with the ECR model. Update operations change the data in an actual instance of an ECR database. Data selection formalisms provide the means to specify predicates that select information for retrieval or update. In the design of an ECR language, both query and update will be part of the same language. We present them separately here because we are concerned with the underlying concepts. We do not present the complete formalisms of update and query to limit the size of our paper. A formal specification of the data selection language is presented in [28]. In [27], we present a unified language, GORDAS (Graph Oriented Data Selection Language), for data definition, query and update. Appendix A provides the syntax for the GORDAS language. We illustrate our discussion of update and selection by examples from the ECR schema shown in Fig. 17. This schema is for a company application, and the steps followed for its design are given in Section 4.2. Figure 17 is similar to Fig. 14 except that it is augmented with connection names which are attached to the arcs that represent participation of categories in relationships. These connection names are used to refer to the entities in one category which are related to an entity of another category by a specified relationship. For example, to refer to the EMPLOYEE entities related to a particular PROJECT entity, we use the (employees) connection name and write: employees of PROJECT. Connection names can be considered as ‘reverse’ roles of the category participations in the relationships. In Fig. 17, we use the convention that names of entity types, categories and relationships are specified in upper case, attribute names are capitalized, and connection names are specified in lower case letters. 5.1. Update operations and transactions The following update operations are defined on an ECR database. (1) Insert-entity. (2) Delete-entity. (3) Add-to-relationship. (4) Delete-from-relationship. (5) Modify-attribute. (5.1) Add-attribute-values. (5.2) Remove-attribute-values.
R. Elmasri et al. / The category concept
95
employees 1 A
grade)
< Unioy
Fig. 17. Company database conceptual schema with connection names.
(6) Add-to-category. (7) Remove-from-category. The insert-entity operation gives us the ability to introduce new entities of a given entity type into the database. To perform an add-entity operation, we must specify the entity type T of the entity that will be added to the database, and the values of all applicable basic attributes for that entity. The delete-entity operation gives us the ability to delete entities of a given entity type from the database. To perform a delete-entity operation, we must specify the entity type T from which the entity or entities are to be deleted. We must also give some conditions in order to identify the particular entity or set of entities within the entity type that is to be deleted. Hence, a predicate S is specified, and all entities in T which satisfy S are selected for deletion. The add-to-relafiomhip operation allows us to relate entities in a specified relationship. To perform an add-to-relationship operation, we must specify the relationship, one entity from each of the categories which participate in the relationship, and any attribute values for that relationship instance. A participating entity x is specified by a predicate S on the
96
R. Elmasri
et al. I l’he category
concept
participating category C which x belongs to. It is possible that S will select a set of entities from C rather than a single entity. In the latter case, each of the selected entities is related to each of the other selected entities from the other participating categories, so that the cross product of the sets of entities selected from each participating category is added to the relationship. The remove-from-relationship operation allows us to delete particular relationship instances from a relationship, and hence has the effect of ‘unrelating’ entities. To perform a remove-from-relationship operation, we must specify the relationship involved, and the set of relationship instances to be deleted from the relationship. The set of relationship tuples is specified by predicates on the categories that participate in the relationship. A predicate S on the attributes of the relationship can be applied to further restrict the set of relationship instances specified for removal. The modify-atfribufe operation gives us the ability to change the value of an attribute for an entity (of an entity type or a category) or a relationship instance in the database. To change the value of an attribute of some entity x, we must specify the entity type T of x or some category C to which x belongs. To change the value of an attribute of some relationship instance x, we must specify the relationship R to which x belongs. In either case, we must also specify the attribute to be updated, as well as a predicate to select the entity (or entities), or the relationship instances that will be updated. We must also specify the new value(s) for the attribute(s) to be updated. For a multi-valued attribute, we can specify two kinds of attribute update operations. The modify-attribute operation replaces the current values of the attribute by a new set of values. A second type of operation changes the set by addition or deletion of values. The add-attribute-values operation will augment the set of values for that attribute by an additional set of values that is specified in the update operation. A remove-arzribute-values operation will remove some values from the set. When a relationship R is involved rather than a type T or a category C, the predicate S that selects a set of relationship tuples for changing the attribute values is specified as in the case of the remove-from-relationship operation. The add-to-category operation is needed to add one or more entities to a category. To perform an add-to-category operation, we must specify the entity type of the entities to be added, the category of which the entities will become members, and the values of any acquired attributes of the entities when they become members of that category. The remove-from-category operation is smaller to the add-to-category operation, but is used to remove entities from a category. The entities remain in the database as members of their entity types, but cease to become members of the category. To perform a remove-from-category operation, we must specify the category, and a predicate that specifies the entities to be removed from the category. The first five operations are also needed for an ER database. The last two are the ones that are necessary because of the introduction of the concept of category. In an actual application environment, logical update units are formed from one or more of these basic update operations. This grouping of operations is useful for two reasons: (1) To form larger, logically coherent update operations on the database. (2) In some cases, when basic update operations are performed separately, a semantic integrity violation occurs on the database. Hence, we need the ability to group operations in a transaction that does not, at its termination, violate the constraints. A transaction is formed of parameter declarations and a transaction body. The transaction body is the sequence of basic update operations that form the transaction, while the parameters hold the values needed by these basic update operations. Semantic integrity checks may be automatically inserted into the transaction, to ensure
R. Elmasri
et al. / The category
concept
91
that semantic integrity constraints specified in the schema are not violated by an invocation of the transaction, as discussed in [26,46]. In the following examples, we use the form of the basic update operations defined in the language GORDAS [27] (see Appendix A). A transaction ,which inserts a new entity in the database initializes its attribute values, and relates it to other entities when required. Hence, several parameters are needed for each of the transactions. For example, referring to the schema of Fig. 17, to insert a new project into the database, we may specify the following transaction: DEFINE TRANSACTION INSERT-PROJECT (PROJ-NAME, INIT-DATE, PROJ-LOC, PROJ-NUM, PROJ-DIV, PROJ-MGR-SSN): BEGIN INSERT PROJECT ATTRIBUTES Name +PROJ-NAME, Number t PROJ-NUM, Location cPROJ-LOC, Pstart @INIT-DATE; ADD-TO-RELATIONSHIP CONTROLS WHERE DIVISION: Name = PROJ-DIV, PROJECT: Name = PROJ-NAME; ADD-TO-RELATIONSHIP MANAGES WHERE FULL-TIME-EMPLOYEE: Ssn = PROJ-MGR-SSN, PROJECT: Name = PROJ-NAME ATTRIBUTES Mstart tINIT-DATE; END The INSERT-PROJECT transaction requires the values of the attributes for the new project to be specified. We must also specify the manager of the project, and the division to which the project is assigned, since the two relationships MANAGES and CONTROLS are total with respect to the participation of PROJECT. To insert a full-time employee who is also a scientist, we can use the following transaction: DEFINE TRANSACTION INSERT-FULL-TIME-SCIENTIST (FNAME, MNAME, LNAME, SSN, OPHONE, HPHONE ADDRESS, GRADE, {SOCIETY}, SALARY, DIVNAME): BEGIN INSERT EMPLOYEE ATTRIBUTES F-name +FNAME, M-name +MNAME, L-name +LNAME, Ssn t SSN, Ophone t OPHONE, Hphone c HPHONE, Address c ADDRESS; ADD-TO-CATEGORY SCIENTIST FROM EMPLOYEE WHERE Ssn = SSN ATTRIBUTES Grade c GRADE, Societies c SOCIETY; ADD-TO-CATEGORY FULL-TIME-EMPLOYEE FROM EMPLOYEE WHERE Ssn = SSN ATTRIBUTES SalarycSALARY; ADD-TO-RELATIONSHIP ASSIGNED WHERE FULL-TIME-EMPLOYEE: Ssn = SSN, DIVISION: Name = DIVNAME; END In this transaction,
the new entity must also be added to the required categories. The
98
R. Elmawi et al. I The category concept
scientist is inserted as an EMPLOYEE entity, and is also added to both the SCIENTIST and the FULL-TIME-EMPLOYEE categories as well as being related to a DIVISION. Hence, the transaction requires the values of the attributes for EMPLOYEE, SCIENTIST, and FULL-TIME-EMPLOYEE, as well as the identification of the DIVISION to which the new entity is related. Other transactions could be defined to relate or un-relate entities in a specified relationship. In some cases, it is necessary to do both operations in a single transaction, when the relationship is total with respect to one of the categories. For example, we cannot un-relate a project from a manager without assigning a new manager for that project. The following transaction replaces a project manager: DEFINE TRANSACTION REPLACE-MANAGER (PROJ-NAME, NEW-MGR-SSN, DATE): BEGIN REMOVE FROM RELATIONSHIP MANAGES WHERE PROJECT: Name = PROJ-NAME, FULL-TIME-EMPLOYEE: TRUE; ADD TO RELATIONSHIP MANAGES WHERE PROJECT: Name = PROJ-NAME, FULL-TIME-EMPLOYEE: Ssn = NEW-MGR-SSN ATTRIBUTES Mstart + DATE; END Since the MANAGES relationship requires every project to have a manager at all times, we must place the two operations above in a single transaction. If we performed the REMOVE FROM RELATIONSHIP operation separately, it would result in a semantic integrity violation, and would be rejected. The choice of update transactions is influenced by both the frequency of use of a transaction, and the semantic integrity constraints in the schema may require the grouping of update operations because the individual update operations would result in a semantic integrity violation. The previous transaction (REPLACE-MANAGER) is an example. Transactions to delete entities, or to remove an entity from a category (but not from the database) can also be defined. For example, to delete an employee, we write: DEFINE TRANSACTION DELETE-EMPLOYEE BEGIN DELETE EMPLOYEE WHERE Ssn = EMP-SSN; END
(EMP-SSN):
The DELETE EMPLOYEE transaction can be used to delete any company employee from the database. This will result in automatic removal of the employee from any categories, as well as removal of relationship instances that relate the employee to projects or to a division. However, if the employee was also a project manager, the transaction would be rejected. As mentioned previously, the statements to check for integrity constraint violation can be automatically generated. The transaction is examined as a whole, so that only necessary checks are generated. The above examples illustrate the types of transactions that may be defined on a schema of a particular application. These transactions can be compiled and used by non-technical users who are authorized to update the database. When a compiled transaction is executed, the
R. Eltnasri et al. / The category concept
99
user is prompted for the values of the parameters of the transaction. Other transactions can be written if the need arises, and these transactions can either be compiled and added to the transaction library (if they are expected to be used frequently) or they may be executed once and discarded. 5.2. Data selection (query)
There are two possible approaches to define query capabilities for the ECR model. The first possibility is to adapt some formal relational language, such as the relational algebra or the relational calculus, to the ECR model. The second possibility is to attempt to use the additional information about relationships in an ECR schema and come up with a new formal language. We have chosen to pursue the latter course. We will only sketch the language capabilities here, and demonstrate the language by examples. The language is based on the Structural Model Query Language (SMQL) presented in [25]. The language is called GORDAS, for Graph Oriented Data Selection. GORDAS is designed for a class of data models, the entity-attribute-relationship data models, of which the entity-relationship model [lo] is a prime example. Other models in this class include those described in [52,65,66]. The characteristic of these models is that relationships between entity sets are explicitly recognized. This recognition allows us to refer to entities from one set that are related to a given entity of some other set via a specified relationship. In languages based on the relational model, this capability is missing because no explicit connections between relations is recognized. Hence, in relational queries, we must explicitly specify the join operation that defines this connection. In GORDAS, a connection name is given for each participation of a category in a relationship. This connection name is used as a means of functional reference to related entities. This aspect of GORDAS is hence similar to the proposed functional query languages, such as FORAL [58], DAPLEX [59], FQL [9], and TASL [38]. However. all the above languages are based upon underlying binary relationship or fitnctional data models, while GORDAS is based on underlying entity-relationship-attribute data models. A data selection request, or query in GORDAS is formed of two clauses: the GET clause and the WHERE clause. In the following, we use the word class to stand for a set of entities (an entity type or a category) or a set of relationship instances (a relationship). The word object (of a class) stands for individual entities from the entity set, or individual relationship instances from the relationship. In GORDAS, the GET clause specifies the class (or classes) about which information is desired, and the information requested about the selected objects from that class. The WHERE clause specifies a selection expression that determines which objects from the class (or classes) are desired. We illustrate GORDAS with examples of queries using the schema of Fig. 17. We give each query in English, then give the corresponding GORDAS statement. Following most queries, we give a discussion of the feature of GORDAS illustrated by that query. A formal specification of data selection in GORDAS is presented in [28]. The GORDAS syntax is given in Appendix A. (Q.1) Retrieve the home phone and address of employee John Smith. GET (Hphone, Address) of EMPLOYEE WHERE F-name = ‘John’ AND L-name = ‘Smith’ Here, the set of interest is the EMPLOYEE
set, and the information
requested is the
100
R. Elmasri et al. / The category concept
home phone and the address. The entity selected from the set is the employee whose name is John Smith. The meaning of a GORDAS query is as follows: each object in the class specified at the end of the GET clause (each EMPLOYEE entity in this query) is tested by the selection expression of the WHERE clause. If the selection expression evaluates to TRUE for that object, the object is selected, and the information about that object specified in the GET clause (Hphone and Address in this example) is returned. Hence, if more than one employee had the name ‘John Smith’, a set of employees would be selected, and a set of (Hphone, Address) values would have been returned: one for each selected employee. (Q.2) Retrieve th e names, home phones, and addresses of all employees assigned to the Research Division. GET (Name, Hphone, Address) of EMPLOYEE WHERE Name of division of EMPLOYEE = ‘Research’ The same EMPLOYEE set is the set of interest, and the WHERE clause selects all employee entities assigned to the research division. The connection name [division] is used to refer to the division related to an employee by the ASSIGNED relationship. From the relationship cardinality information in the schema, we know that an employee is assigned to at most one division. We can refer to [division of EMPLOYEE], even though the ASSIGNED relationship is defined between DMSION and FULL-TIME-EMPLOYEE. This is possible because the category FULL-TIME-EMPLOYEE is defined on the entity type EMPLOYEE. If an employee entity is a member of the FULL-TIME-EMPLOYEE category, it will inherit all relationships which the category participates in (as well as all acquired attributes of the category). This query implicitly selects only employees that are full time employees, since other employees do not participate in the ASSIGNED relationship. To give GORDAS the above capability, certain naming restrictions on duplicate connection names and attribute names in a schema are necessary. These restrictions are given in [27], and are all easily verified automatically. Whenever a single class name appears in the GET clause, the class name can be omitted from the WHERE clause, since it must be the same as the class name in the GET clause. That is why in Q.l, no class names appear (we could have specified F-name of EMPLOYEE and L-name of EMPLOYEE rather than F-name and L-name). Similarily, the word EMPLOYEE can be omitted from the WHERE clause in Q.2. Hence, Q.2 can be stated as: GET (Name, Hphone, Address) of EMPLOYEE WHERE Name of division = ‘Research’ Note that we could have specified Q.2 as: GET(Name, Hphone, Address) of employees of DIVISION WHERE Name of DIVISION = ‘Research’ The meaning here is that we select the DMSION entity whose Name = ‘Research’ (from the WHERE clause), then we retrieve all the EMPLOYEE entities related to that DMSION (from employees of DIVISION). The capability to specify queries in different
R. Elmasri et al. / The category concept
ways is important the same query.
101
since different persons may have different points of view when looking at
(Q.3) For each employee assigned to the Research division, retrieve the name of the employee, and the names and hours for each of the projects the employee works on. GET (Name, (Name, Hours) of projects) of EMPLOYEE WHERE Name of division [of EMPLOYEE] = ‘Research’ (The part of the query between square brackets ([. . .]) may be omitted. We will use this convention in subsequent queries.) In Q.3. we refer to attributes of entities related to an employee entity in both the GET and the WHERE clauses. The WHERE clause is identical to that of the previous query. The GET clause specifies that for each selected employee, the value of the Name attribute of the employee, and the values of the Name and Hours attributes for each project entity related to that employee by the WORKS-ON relationship are retrieved. Since the WORKS-ON relationship is of cardinality M :IV, we expect each employee entity to be related to a set of projects. For each such project, a single relationship instance will relate the employee to the project and the value of the Hours attribute for that relationship instance is retrieved along with the Name of the project. (Q.4) Give me the names of all managers in the research division. GET Name of EMPLOYEE WHERE Name of division [of EMPLOYEE] = ‘Research’ AND COUNT proj-managed [of EMPLOYEE] > 0 An employee is a manager if he (or she) is related to a project via the MANAGES relationship. Hence, we want to select all employees who are assigned to the research division, and who manage a project. The function COUNT (or NUMBER-OF) is a fundamental part of GORDAS, and returns the number of entities (or values or relationship instances) in a specified set. In this example, the specified set is the set of project entities related to an employee entity: the employee entity under consideration for satisfying the WHERE clause. In our case, the number of projects will always be 0 or 1 because the relationship MANAGES is 1: 1. (QS) For each project, give me the name of the project, the number of employees who work on that project, and the names of these employees. GET (Name, COUNT
employees, Name of employees) of PROJECT
PwHEREmw The GET clause of this query specifies that for each selected project, the name, number of employees, and the set of employee names who work on that project are retrieved. The WHFRE clause condition (TRUE) selects all project entities. We use the convention that when the logical expression of the WHERE clause is TRUE, the whole WHERE clause can be omitted. (Q.6) For each project, give me the name of the project, the number of employees who work on that project and which are assigned to the research division.
R. Elmasri
102
GET (Name, COUNT [WHERE TRUE]
et al. / nie
category
concept
employees: Name of division = ‘Research’) of PROJECT
Here, we illustrate the feature of GORDAS called restricting logical expression. As we saw, a connection name specifies either a single entity, or a set of entities that is related to some entity by a given relationship. When a set of entities is specified, we sometimes want to restrict that set to a subset which satisfies certain conditions. This is accomplished by the restricting logical expression. In Q.6, for each selected project, we want to return the name of that project. We-also want to return the number of employees related to that project, but only those who work in the research division. The connection name (employees of PROJECT) specifies all the employees related to the project by the WORKS-ON relationship. However, when we specify a logical expression following the connection name (employees: Name of division = ‘Research’) the logical expression is applied to each employee entity related to the specified project, and only those entities that satisfy the logical expression are selected. Hence, the logical expression restricts the specified subset of entities further. (Q.7) Give me the names of all employees who work on both the MX and the XYZ projects (but possibly on more projects). GET Name of EMPLOYEE WHERE Name of projects [of EMPLOYEE]
INCLUDES
{‘MX’, ‘XYZ’}
This query introduces the set comparison operators. In GORDAS, these are INCLUDES, NOT INCLUDES, =, and NOT=. The set comparison operators fall quite naturally into the GORDAS framework because we always know whether to expect a set of values (or objects), or a single value (or object). This is always easily determined from the relationship cardinality information in an ECR schema. (Q.8) Give me the names of all employees who work on all projects that John Smith works on (but possibly on more projects). GET NAME of EMPLOYEE WHERE5 projects [of EMPLOYEE] INCLUDES (GET projects of EMPLOYEE WHERE F-name = ‘John’ AND L-name = ‘Smith’) Query Q.8 is quite similar to Q.7, except that the set of projects that is used in the where clause is the ,result of a retrieval from the database, specified by the embedded query. In Q.7, this set is explicitly specified by its value (‘MX’, ‘XYZ’}. Here, we see the capability in GORDAS to use embedded queries within other queries. All references within the inner query refer to the object set of interest in the inner query. We can also refer to the objects of the outer query within the inner query. This poses no ambiguity if the object sets of the outer and inner queries are different. However, if we need’ to refer to the object set in the outer query, and it happens to be the same as the object set in the inner query, .we have to differentiate between the occurrences of the two. We do this by appending unique integer numbers (1,2, . . .) to the different occurrences. (Q.9) Give me the names of all employees who work on at least two projects that John Smith works on.
R. Elmawi et al. / The category concept
103
GET Name of EMPLOYEE WHERE COUNT (projects [of EMPLOYEE] INTERSECT (GET projects of EMPLOYEE WHERE F-name = ‘John’ AND L-name = ‘Smith’)) 2 2 In Q.9, another feature of GORDAS is illustrated. Since we often deal with sets, we need the ability to form new sets from known sets. We use the set formation operators for this purpose. These are the well-known set operations UNION, INTERSECT, and- (set difference). In Q.9, we form the intersection of two sets in order to count the common elements. Each employee who works on more than two common projects with John Smith is selected. (Q.10) Give the names of all divisions where the average salary of the employees assigned to the division is greater than 25 000. GET Name of DIVISION WHERE AVERAGE Salary of employees [of DIVISION]
> 25 000
As in many query languages, several standard functions exist in GORDAS which correspond to frequently required operations. These are AVERAGE, SD (standard deviation), SUM (or TOTAL), MAX and MIN. Limited capability for expressing arithmetic operations also exists. (Q.11) Give me the names of all consultants who work on the MX project. GET Name of CONSULTANT WHERE Name of projects [of CONSULTANT]
INCLUDES
‘MX
This query illustrates the attribute inheritance capability in GORDAS, which allows the reference to the attributes of an entity type on which a category is defined. The CONSULTANT category is defined on the EMPLOYEE entity type, so we can refer to the attribute Name of EMPLOYEE as an attribute of the CONSULTANT category. We can also refer to attributes of categories defined on an entity type as attributes of the entity type itself. This implies that only members of the category will have values for those attributes. When GORDAS is compared to a high-level relational query language such as SQL, two main differences should be pointed out: (1) Join operations do not have to be specified in the GORDAS query. Connection names are used in place of the join operations, which make the query statement closer to an English language statement. (2) There is no need to specify the GROUP BY clause in GORDAS. Entities and attributes are implicitly grouped by the class name at the end of the GET clause. It is worthwhile to note here that a psychological study reported in [53] showed that join operations and GROUP BY’s were the features of SQL which caused the most difficulty in correct specification of SQL queries. It is also important to note that GORDAS is compatible with, and quite similar to, a recently proposed entity-relationship algebra [51]. 6. Derived relationships for recursive queries In this section, we consider an important family of queries which often arise in common database applications. These queries involve what have been termed ‘recursive’ or ‘least
104
R. Elmasri et al. / The category concept
fixed point’ operations [2]. The well-known ‘parts explosion’ problem is one example of such a query. In [2], a proof is given to show that the transitive closure of a binary relation, which is an elementary operation of this type, is not expressible in the relational calculus. Several approaches to this problem have been suggested in the literature (QBE [67], FQL [9,2]), all involving the incorporation of some of the least fixed point operations in a query language. None of these suggestions is entirely satisfactory in that there is a trade-off between simplicity and generality. Query languages simple enough to be understood and used by non-programmers typically incorporate only a very limited least fixed point capability (such as the transitive closure of a binary relation in QBE). Languages providing a more general capability are procedural and recursive [9,2], and are no less difficult to use than a high level programming language. In what follows, we suggest an alternate approach; namely ,the utilization of the wellknown concept of derived information [43,59,25] to specify (in the schema) information derivable via least fixed point operations. Then, the user can rely upon first order query languages such as GORDAS to extract the information from the database. We also propose a new method of specifying the derived data in the schema. This method is based on graph theory, and seems to be simpler and more straightforward than the least fixed point concept. Our graph-theory method appears to satisfactorily address most least fixed point operations, but a proof of equivalence is a topic for further research. We now state our proposal in terms of the ECR model. A binary relationship over a single category defines a directed graph over the entities in that category. We call such a relationship a digruph relationship. A digraph relationship is similar in concept to the digraph relation as defined in [19]. In the graph defined by a digraph relationship, attributes of the relationship can be viewed as labelings of the edges of the graph. Similarly, attributes of the entities in the category can be viewed as labelings of the vertices of the graph. Consider the example of the bill of materials database shown in Fig. 18. Here, COMPONENT is a digraph relationship over the PART category. The Qty attribute appears as a labeling of the edges of the digraph, and the Pno and Cost. attributes are labelings of the vertices. The concept of a digraph relationship is also useful from a semantic integrity standpoint. It is possible to specify that a digraph relationship is constrained to be acyclic. The COMPONENT relationship of Fig. 18 is an example of an acyclic digraph relationship. Similarly, a
P24, $25.00 n
5
Fig. 18. Bill of materials database.
P43, $18.50
R. E1mq.k et al. / The category concept
105
digraph relationship could be constrained to define a free. An example of a tree digraph would be the relationship describing the reporting hierarchy among employees of a corporation. An example of a digraph relationship which is neither acyclic nor a tree is the relationship describing connecting flights in an airline reservation database. However, the primary rationale for considering digraph relationships is the capability to define operations on the associated directed graphs in order to devise new directed graphs which are of interest. The two useful operations we consider are composition and simplification.
In the following, we use the notation e, to stand for node i in the database graph (which represents an entity in the database), a,, to stand for the set of attribute values for node e, (the node labeling), and a,, to stand for an edge labelling (the values of the attributes of a relationship instance represented by the edge). Given a database path p = (ei, ei+l, . . . , ei-i, ej) (ei # ej) of length n L 1 connecting entities e, and ei, the composition of p creates a new graph, consisting of e,, ii, and one edge from e, to ej, as shown in Fig. 19(a). Further, labelings (attribute values) a, for the new edge can be defined as functions of labelings of the vertices and edges in p; a, = Fk(a,,, . . . , a=,, a ,,’ * * * , a,,-,). The functions Fk are called composition functions. We can now define the closure under composition of a digraph relationship R ‘(denoted R+) with attributes ak = F,(a,, . . . , at,, a,,, . . . , a,,-,) as the directed graph resulting from the application of the composition operation and composition function to all non-circuitous paths in R, and forming the union of the resulting graphs. d,
a, = F,(IaJ,{aJ) F, is a -position
a,= G,(W)
function Fig. 19. Derivation
G, is amplification operations; (a) Composition;
function
(b) Simplification.
Note that R+ is a directed graph, but is not a relationship in general, since there may be parallel edges. (A necessary and sufficient condition for R+ to form a relationship is that R be a tree). In order to derive a relationship from R+, we require another operation. Given two vertices (entities e,, ej) with n > 1 parallel edges connecting them, the simplification of this graph consists of the two nodes, and a single edge which is the result of compacting the n parallel edges. Here also, it is possible to define new attributes a, defined as functions G, of the labelings on the edges created by composition. Hence, a, = G,(a,, . . . , a,). The G, are called simplification functions. We can now define the simplified closure (or closure) of R, denoted R* with attributes: at = G,E,(a,,
. . . , a ,,..,1,. . . , &n(ae,, . . . , a,-,))
R. Elmasri et al. / The category concept
106
as the directed graph resulting from the replacement of all parallel edges in Rt with a single edge through application of the simplification operation. Note that R* is a digraph relationship with newly defined attributes, a,. Also, disregarding labelings (attributes), R* is the transitive closure of R. Thus, R* is suitable for inclusion in an ECR database schema as a derived relationship. Figure 19(b) illustrates a simplification function. We now illustrate the use of this approach with examples of two common database problems: parts explosion from a bill of materials database and employee organization within a corporation. Example 1: Parts explosion Given the Bill-of-Materials explosion’ query would be:
database shown in Fig. 18, an example of a typical ‘parts
“Given a part number, P123, create a list of all primitive required to build the part.”
parts and quantities
In order to support queries of this type, the database administrator would define a derived relationship, COMPONENT*, with a Total-Qty attribute defined by a composition function = (product) applied to all Qty attribute values, and a simplification function = (sum). Figure 20 shows the resulting schema, and Fig. 21 illustrates these functions on an extension of the database.
COMPONENT
Fig. 20. Bill of materials database with derived relationship.
The COMPONENT* relationship differs from the COMPONENT relationship in that while the COMPONENT relationship defines only immediately subordinate subassemblies, the COMPONENT* relationship indicates subassemblies at all levels (Fig. 21). Given our derived relationship, COMPONENT*, stating our parts explosion query in a first order language such as GORDAS is relatively simple. GET (Pno, Total-Qty) of all-components: of PART WHERE Pno [of PART] = ‘P123’
(COUNT
components = 0)
The WHERE clause selects the part whose number is P123. The connection name (all-components of PART) gives the set of all parts that are subassemblies of P123 at any
R. Elmasti
et al. 1 l%e category
concept
d
.
107
COMPONENl
4
2
3
L P26
P57
in COMPONENT’
Fig. 21. Extensions of COMPONENT
and COMPONENT*.
level, and the restricting logical expression (COUNT components = 0) restricts that set to only the components who have no subassemblies themselves and hence are primitive parts. Example 2: Employee organization Given the Employee database in Fig. 22(a), a typical query might be: “Find the names of all employees who report to Smith, together with the level at which they report to him.” As in Example 1, queries of this type are supported via a derived relationship, WORKSFOR*. However, there is a slight difference in the way the derived attribute, Level, is
wr
emps
w
emps
Fig. 22. Base and derived relationships; (a) Employee database; (b) Employee database with derived relationship.
108
R. Elmasri et al. / The category concept
defined. The composition function for Level is the function COUNT applied to the edges in each path being composed. Since WORKS-FOR is a tree, there is no need for a simplification function. Thus, we form the derived relationship as shown in Fig. 22(b), with extensions as shown in Fig. 23. Given WORKS-FOR*, we can state our typical query in GORDAS as follows: GET (Name, Level) of all-emps of EMPLOYEE WHERE Name [of EMPLOYEE] = ‘Smith’ Smith WORKS-FOR
Peters
Johnson
Carter
Dayton
Smith
Peters
Johnson
Carter
Hudson in WORKS-FOR*
Dayton
Fig. 23. Extensions of WORKS-FOR
Hudson
and WORKS-FOR*.
7. collcluslons
In this paper, we presented an extension to the entity-relationship model to allow direct modeling of the important semantic concepts of generalization, ISA-hierarchies, roles, structural constraints and multi-valued attributes. A single concept of category was added to the ER model so that the extension does not unnecessarily complicate the model. We then showed how the ECR model can be used at the conceptual level of a database management system while the relational or network models can be used at the representational level of the same system. Examples to illustrate the use of the ECR model in database design were presented. A query and transaction language, GORDAS, for the ECR model was presented in Section 5. Finally, we discussed a technique to define derived structures on the ECR model which allow the specification of recursive queries as first order queries. The ECR model described in this paper has been used as the basis of database research which requires the use of conceptual semantic models. One such area of research is the
R. Elmasri et al. / The category concept
109
integration of views [29,49]. Another area of research in which the model has been used is semantic integrity of high-level database transactions [26,46]. The ECR model and the GORDAS language have been used in two experimental database management systems (DBMSs). In both systems, the ECR model was used at the conceptual level, and the relational model was used at the representational level. The first DBMS is the Distributed Database Testbed System (DDTS) [22] which was designed and implemented at Honeywell Computer Sciences Center in Minnesota. The DDTS is not a production-type system, but is used as a vehicle for conducting research in many areas of distributed database systems, such as concurrency control algorithms, transaction and query optimization algorithms, semantic data modeling, multi-level DBMS architectures, mappings between different schema levels, query language translation, and heterogeneous database systems. The modular design of DDTS allows the replacement of modules that perform a specific task by alternative modules that use different methods or algorithms for the same task. The system will be used mainly to obtain quantitative comparisons for design choices under different usage environments. In the first version of DDTS, three Honeywell Level-6 minicomputers form three network nodes, and each stores the local data on an IDS-II DBMS. The global semantic schema uses the ECR model, while a global representational schema uses the relational model. A subset of the GORDAS query language was implemented as a user interface for the first version of DDTS. The language is mapped to an internal form, which is then optimized and translated to operations at the different local DBMS%. Different aspects of DDTS are described in [64,22,26]. The second DBMS is a centralized DBMS being implemented as an ongoing research project at the University of Houston. The implementation is on a VAX/ll-780 using VMS. It is coded in PASCAL, and uses VMS/RMS files. One goal of this system is to test the feasibility of implementing a DBMS which incorporates advanced data modeling concepts and high-level languages, such that the DBMS will have acceptable performance. The second, more important goal is to use the system as a vehicle for advanced research in database systems based on a high-level conceptual model. Such research includes semantic integrity of transactions, deduction, intelligent interfaces and data distribution. The DBMS at the University of Houston is currently operational. It is currently being tested systematically and its performance is being enhanced. It is described in [30].
Appendix A. GORDAS syntax in BNF We will
use the following <. . . > 1x1
[xl
( xl ‘ml
BNF conventions:
non-terminal symbol x appears 0 or more times x appears 0 or 1 time I x2 I . . . I xn > xl,or x2 or . . . or xn literal metasymbcl where m is a metasymbol I 1 [ 1 ( 1 < > I
11) DATA DEFINITION STATEMENTS
: := I I
I
R. Elmasri et al. / The category concept
110
(1.1)
DATA STRUCTURESPECIFICATION :
statement>:
set def>::=
:= DEFINE ( I I I *) VALUFSET
name> AS
description>
name>: :=
:
:=
I (
I
I
> 1
comment: includes each of the upper and lower case alphabetic characters (A . . Z and a . . z> , includes each of the decimal digits 0 .. 9 includes each special character used in nasnes
descriptions>:
valueset>::=
: : =
:= ‘1’
E ,
I
valueset>
I
) ‘1’
of characters> : = STRING 1 ALFWSETIC 1 C 1 I INTEGER I RANGE : 1 I REAL C INTERVAL : 1 DATE [ 1 [ : 1
valueset>:
comment: is an integer number in sune format is a real number in sane format is a date in sane format
value
set>: :=
name>: : =
: :=
name> OF
type name> I
name>
name>
type name>::= name>: := type def>::= list>:
ENTITY TYPE
:=
type name> ATTRIBUTES
attribute>
1 ,
attribute>
1
::= ( COMPOUND’ ( ’ ’ 1’ I VALUESET > [ KEY 1 UNIQUE 1 [ MINIt4Jh VALUES < natural nun> 1 [ MAXIMUMVALUES 1 comment:
nun> is any natural
CATEGORY FROM 1 SPECIFIC I ATTRIBUTES
def>::=
nunber
R. Elmasri et al. 1 The category concept
set specifications>:
:=
spec> [ UNION
spec>: := [ 1( 1
comment:
expression>
111
is defined
expression>
def>: := RELATIONSHIP FROM [ ATTRIBUTES 1
name>: : q list>:
::=
(1.2)
’ 1’ 1
in part 2 (DATA SELECTION>
spec> 1
:= , { , 1 [ SPECIFIC I(’ , [ MINIWM PARTICIPATION [ MAXIMUM PARTICIPATION
1 C PD 1 0’ C DLT 1 1 1
CONSTRAINTSPECIFICATION:
canment:
statement>::= DEFINE ( CONSTRAINT 1 TRIGGER [ ON ( 1 1 1 : 1 1
and (message> will
not be specified
>
here
( 1.3) DERIVED DATA SPECIFICATION :
statement>::=
DEFINE DERIVED (
derivation> 1
I
: := ATTRIBUTE OF ( I > TO BE ( I > comment:
and are specified
in PART 2 (DATA SELECTION)
( 1.4) SIMPLE TRANSACTIONSPECIFICATION :
update transaction definition statement>::= DEFINE TRANSACTION I(' I>'
name>::=
::=
::=
:
body>::=
1 , I ‘I1
BEGIN TRANSACTION [ VARIABLES
declarations>::= :=
1 ‘1’
I ‘
declarations>
I ,
1
; 1
I>’
112
R. Elmasri et al. I The category concept
::=
I, I
: := 1 (DISPLAY I PRINT 1 cament:
is specified
:=
I
in PART 2 (DATA SELECTION) is specified in PART 3 (DATA MODIFICATION)
.mDl!TAsELECrIoN(QUERY)STATEMENT : : =
THE GET CLAUSE:
(2.1)
: := GET 1
specification
spec>::=
list>::=
spec list>::=
spec>: : =
::=
spec> I ;
class name>::=
list>
class
name>
( I 1
spec> I ;
::= (2.2)
spec> 1
expression>
1 OF I
EXISTS I CCUNT 1 SUM ( AVERAGEI SD 1 ( MAX I MIN > [ I(’ ‘1’ 1 I
name> [ :
expression>
I OF 1
THE WHERECLAUSE:
: := [ WHERE
expression>
expression>::=
term>: : = C NOT 1 (
:
C
1 ( AND I OR 1 1
I I( *
expression>
:= I ’ 1 NOT 1 IN
name> 1
C 1
::= ( CCUNT I EXISTS 1 [ OF 1 I [ 1 OF 1 1 [ : , 1 . 1 ‘>I OF ::=
spec> 1
comparison
!
op>: := EQUAL I NOT ERUAL I INCLUDES I INCLUDED IN op>: := = I NOT = I > ( 2 I < I I
‘1’
1
113
R. Elmasri et al. / The cateiory concept
: := [ ::=
UNION : INTERSECT
set>: : =
set>::=
‘1’
set>
: DIFFERENCE
I
1
value>
I ’ (’ ’ 1’
1 ,
1 ‘1’
::= I I 1 1 ( I
name> >
: := [ ( + I - 1 1 : := [ ( * I / I DIV I REM 1 I ::=
::= u1
I I(’
‘1’
I
I
DATA ACIDIFICATION (UPDATE) STATEMENT
:
use>::=
I
use>
( EDIT I COMPILE 1 EXECUTE I REMOVE > TRANSACTION
operation>::= I I I I I INSERT 1 INTO 1
::=
values>::=
: : = :
: q
type name>
!
value.9
ATTRIBUTES [ = ] I , 1 = 1 1
expr>
(
:= DELETE [ FROM 1
( type name>
: : = MODIFY OF ( I
list>:
::=
:=
::=
expr list>:
{ ,
name> 1
1
ADD TO RELATIONSHIP 1
name> value.9 1
:= : { , :
: ::=
name> 1 ,
rime> 1
1
:= REMOVEFROM RELATIONSHIP I
ADD TO CATEGORY FROM [ 1
name> >
name>
114
:
R. Elmasri et al. J The category concept
: = REMOVEFROM CATEGORY
name>
References [l] J.R. Abrial, Data semantics, in: J.W. Klimbie and K.L. Koffeman (Eds.), Data Base Management, Proc. IFIP Conference on Data Base Management (North-Holland, Amsterdam, 1974) l-60. [2] A. Aho and J. Ulhnan, Universality of data retrieval languages, Proc. 6th ACM Symposium on Principles of Z+ogrtiming Languages (1979) 110-120. [3] M.M. Astrahan et al., System R: Relational approach to data base management, ACM Trans. Database Systems I (2) (1976) 97-137. [4] C. Bachman and M. Daya, The role concept in database models, Z%c. 3rd ZEEE International Conference on Very Large Data Bases (1977) 464-476. [5] P. Bernstein, Synthesixing third normal form relations from functional dependencies, ACM Trans. Database Systems 1 (2) (1976) 277A298. [6] P. Bernstein and N. Goodman, What does BoyceCodd normal form do? Proc. 6th IEEE International Conference on Very Lurge Data Bases (1980) 245-259. [7] H. Biller and E. Neuhold, Semantics of data bases: the semantics of data models, Information Systems 3 (1) (1978) 11-30. [S] G. Bracchi, P. Paolini and G. Pelagatti, Binary logical association in data modelling, in: [SO]. [9] P. Buneman and R.E. Frankel, FQL: A functional query language, Proc. ACM SZGMOD Conference (1979) 52-57. [lo] P.P. Chen, The entity-relationship model: towards a unified view of data, ACM 7kans. Database Systems 1 (1) (1976) 9-36. [ll] P. Chen and S. Yao, Design and performance tools for data base systems, Proc. 3rd ZEBE International Conference on Very Large Data Bases (1977). [12] P.P. Chen (Ed.), Entity-Relationship Approach to Systems Analysis and Design (North-Holland, Amsterdam, 1980). 1131 P.P. Chen (Ed.), Entity-Relationship Approach to Information Modeling and Analysis (North-Holland, Amsterdam, 1983). [14] Committee on Data Systems Languages, CODASYL Data Base Task Group Report, ACM, New York, 1971. [U] Committee on Data Systems Languages, CODASYL Data Base Task Group Revised Report, ACM, New -York, 1978. [16] E.F. Codd, A relational model for large shared data banks, Comm. ACM 13 (6) (1970) 377-387. [17] E.F. Codd, A data base sublanguage based on the relational calculus, Proc. ACM SZGZTDET Workshop on Data Description, Access, and Control (1971) 35-68. [IS] E.F. Codd. Recent investigations in relational data base systems, Proc. Zi?ZP Conference (1974) 1017-1021. [19] E.F. Codd, Extending the relational model to capture more meaning, ACM Trans. Database Systems 4 (4) (1979) 397-434. [20] Cullinane Corporation, Integrated Database Management System (IDMS) Reference Guide, 1975. [21] C.G. Davis, S. Jajodia, P.A. Ng and R.T. Yeh (Eds.), Entity-Relationship Approach to Software Engineering (North-Holland, Amsterdam, 1983). ..-_ [22] C. Devor and J. Weeidreyer, DDTS: A testbed for distributed database research, Froc. ACM Pacific Conference (1980) 86-94. [U] R. Ehnasri and G. Wiederhold,.Data model integration using the structural model, Proc. ACM SZGMOD Conference (1979) 191-202. 1241 R. Elmasri and G. Wiederhold, Properties of relationships and their representations, Z+oc. National Computer Conference, AFZPS 49 (1979) 319-326. [25] R. Elmasri, On the design, use, and integration.of data models, Computer Science Department, Stanford University Report STAN-CS-80-801 (Ph.D. Thesis), May 1980. [26] R. Ehnasri, Semantic integrity in DDTS (Distributed database testbed system), Honeywell CCSC Technical Report HR-80-274, Bloomington, Minnesota, December 1980. [27] R. Elmasri, GORDAS: A data definition, query and update language for the entity-category-relationship model, Honeywell CCSC Technic&Report HR-81-250, Bloomington, Minnesota, January 1981. [28] R. Elmasri and G. Wiederhold, GORDAS: A formal high-level query language for the entity-relationship model, in: [13], pp. 49-72. [29] R. Elmasri and S. Navathe, Object integration in logical database design, Proc. L5EB COMPDEC Conference (1984) 426-433.
R. Elmasri et al. / The category concept
115
[30] R. Elmasri, A DBMS based on an extended ER model, Technical Report UH-CS-84-5, University of Houston, 1984. [31] R. Fagin, Multivalued dependencies and a new normal form for relational databases, ACM Trans. Database Systems 2 (3) (1977) 262-278. [32] E. Falkenberg, Concepts for modeling information, in: [50]. [33] P. Hall, J. Gwiett and S. Todd, Relations and entities, in: [50]. [34] M. Hammer and D. McLeod, The semantic data model: a modeling mechanism for database applications, Proc. ACM SIGMOD Conference (1978) 26-36. [35] M. Hammer and D. McLeod, Database description with SDM: a semantic data model, ACM Trans. Database Systems 6 (3) (1981) 351-386. [36] G. Held, M. Stonebraker and E. Wong, INGRES: a relational data base system, &oc. National Computer Conference, AFIPS 44 (1975) l-7. [37] Honeywell Information Systems, Integrated Data Store (IDS) Reference Manual, Wellseley, Massachusetts, 1972. [38] B. Housei, V. Waddle and S. Yao, Functional dependency model for logical data base design, Pmt. 5th EEE International Conference on Very Large Data Bases (1979) 194-208. [39] IBM, Information Management System Virtual Store (IMS/VS) Reference Manual, White Plains, New York, 1975. [4O] W. Kent, Limitations of record-based information models, ACM Trans. Database Systems 4 (1) (1979) 107-131. [41] A. Klug and D. Tsichritxis (Eds.), The ANSIIX3lSPARC Report of the Study Group on Data Base Management Systems (AFIPS Press, 1977). [42] V. Lum, 1978 New Orleans Database Design Workshop Report, IBM Report RJ2.554, 1979. [43] D. McLeod, A semantic data base model and its associated structured user interface, MlT Technical Report MlT/LCS/IR-214 (Ph.D. Thesis), Cambridge, Massachusetts, August 1978. [44] D. Maier, The Theory of Relational Databases (Computer Science Press, Rockville, MD, 1983). [45] M. Minsky, Computer science and representation of knowledge, Proc. National Computer Conference, AFIPS 42 (1973). [46] R. Moore, An algorithm for automatic generation of semantic integrity checks into database transactions, Master’s Thesis, University of Houston, August 1983. [47] MRI Systems Corporation, System 2000 General Information Manual, Austin, Texas, 1972. [48] S.B. Navathe and M. Schkolnick, View representation in logical database design, Rot. ACM SZGMOD Conference (1978) 144-156. [49] S.B. Navathe, T. Sashidhar and R. Elmasri, Relationship merging in schema integration, &oc. 1Orh International Conference on Very Large Data Bases (1984). [50] G. Nijssen (Ed.), Modelling in Data Base Management Systems (North-Holland, Amsterdam, 1976). [51] C. Parent and S. Spaccapietra, An entity-relationship algebra, Proc. IEEE COMPDECConference (1984) 500-509. [52] A. Pirrotte, The entity-property-association model: an information-oriented database model, Report R343, MBLE, Brussels, Belgium, March 1977. [53] P. Reisner, The use. of psychological experimentation as an aid to the development of a query language, IEEE ‘Ibans. Software Engrg. 3 (3) (1977). [54] N. Roussopoulos and J. Mylopoulos, Using semantic networks for data base management, Proc. ACM International Conference on Very Large Data Bases (1975). [55] H. Schmid and J. Swenson, On the semantics of the relational data model, &oc. ACMSIGMOD Conference (1975). [56] M. Schrefl, A.M. Tjoa and R.R. Wagner, Comparison criteria for semantic data models, l+oc. BZEE COMPDEC Conference (1984) 120-125. [57] P. Scheuermann, G. Schiffner and H. Weber, Abstraction capabilities and invariant properties modeling within the entity-relationship approach, in: [12]. [58] M. Senko, A query-maintenance language for the data independent accessing model II, Information Systems 5 (4) (1980) 257-272. [59] D. Shipman, The functional data model and the data language DAPLEX, ACM 7kan.v.Database Systems 6 (1) (1981) 140-173. [60] J. Smith and D. Smith, Database abstractions: aggregation and generalization, ACM nans. Database Systems 2 (2) (1977) 105-133. [61] M. Stonebraker, Retrospection on a database system, ACM Tkans. Database Systems 5 (2) (1980) 225-240. [62] R. Taylor and R. Frank, CODASYL data-base. management systems, ACM Comput. Surveys 8 (1) (1976) 67-104. [63] D. Tsichritxis and F. Lochovsky, Hierarchical data-base management, ACM Comput. Suroeys 8 (1) (1976) 105-124.
116
R. Elmasri
et al. / The category
concept
[64] J.A. Weeldreyer, Structural aspects of the entity-category-relationship model of data, Honeywell Technical Report, HR-80-251, Bloomington, Minnesota, March 1980. [65] G. Wiederhold and R. Elmasri. The structural model for database design, in: [12]. [66] G. Wiederhold, Database Design (McGraw-Hill, New York, 2nd ed., 1977). [67] M. Zloof, Query by example, Proc. National Computer Conference, AFZPS 44 (1975).
CCSC,