Relational database design based on the entity-relationship model

Relational database design based on the entity-relationship model

Data & Knowledge Engmeermg 7 (1991) 47-83 North-Holland 47 Relational database design based on the Entity-Relationship model* Veda C. Storey Wtlham ...

2MB Sizes 29 Downloads 132 Views

Data & Knowledge Engmeermg 7 (1991) 47-83 North-Holland

47

Relational database design based on the Entity-Relationship model* Veda C. Storey Wtlham E Stmon Graduate School of Busmess Admmtstratton, Untverstty of Rochester, Rochester, NY 14627, USA Abstract

Storey, V C , Relational database design based on the Entity-Relationship model, Data & Knowledge Engmeermg 7 (1991) 47-83 The Entity-Relationship (E-R) model is often recommended for use m the database design process because its concepts are percewed to be both natural and easy to use This paper describes a methodology for the design of a relational database based on the E-R model The methodology starts with the identification of the basic E-R constructs that occur m an apphcatlon resulting in an mltlal database design Next, ways of capturing certam semantics of an application through data abstraction and procedures for Identifying potential design problems are discussed The translation of the E-R model into a set of normahzed relations completes the design process This methodology budds on previous work on the Extended EntityRelationship Model and on the automation of database design. Keywords Database design, Entity-Relationship model, relattonal model

Part I. Introduction T h e E n t i t y - R e l a t i o n s h i p ( E - R ) M o d e l [7] is o f t e n used as a tool for c o m m u n i c a t i o n b e t w e e n a d e s i g n e r a n d a n e n d - u s e r of a d a t a b a s e b e c a u s e of its ease of use a n d its c o n v e n i e n c e in r e p r e s e n t a t i o n [30]. A c c o r d i n g to B r o d i e [5], the p o p u l a r i t y of the E - R m o d e l for high level design is d u e to its e c o n o m y of c o n c e p t s a n d belief in entities a n d r e l a t i o n s h i p s as n a t u r a l m o d e l l i n g concepts. This p a p e r p r e s e n t s a m e t h o d o l o g y for r e l a t i o n a l d a t a b a s e design b a s e d o n the E - R m o d e l . It p r o v i d e s b o t h a s t e p - b y - s t e p p r o c e d u r e a n d v a r i o u s g u i d e l i n e s for a good design. T h e p r o c e d u r e consists of first i d e n t i f y i n g the m a i n constructs of the E - R m o d e l - entities a n d r e l a t i o n s h i p s - a n d their associated a t t r i b u t e s . R u l e s are p r o v i d e d for selecting p r i m a r y keys f r o m c a n d i d a t e keys a n d for c a p t u r i n g s o m e of the s e m a n t i c s of the a p p l i c a t i o n t h r o u g h d a t a a b s t r a c t i o n s . This results in a n E n t i t y - R e l a t i o n s h i p s m o d e l of the a p p l i c a t i o n . A n initial set of r e l a t i o n s is t h e n c o n s t r u c t e d a n d n o r m a l i z a t i o n principles a p p l i e d to p r o d u c e a set of r e l a t i o n s that r e p r e s e n t s the d a t a b a s e design. T h e p r o c e d u r e e n s u r e s b o t h e n t i t y a n d r e f e r e n t i a l integrity a n d also aims for c o m p l e t e n e s s in the design. In a d d i t i o n , s o m e

*This research was supported by the IBM Program of Support for Education and Research in the Management of Information Systems, Xerox Corporation, Wdham E Simon Graduate School of Business Administration, Umverslty of Rochester The author wishes to thank Robert C. Goldstem, Terence Barton and Wdham Richmond for their helpful comments on an earher version of this paper 0169-023X/91/$03 50 (~) 1991 -Elsevier Science Pubhshers B V All rights reserved

48

V C Storey

suggestions for identifying potential design problems are provided. The final output is the logical design of a relational Database Management System (DBMS). This paper is dwided into four parts. The rest of this part describes the major steps m the database design process and provides an overview of the methodology. Part 2 describes a procedure for the development of an E-R model. Part 3 discusses the translation of an E-R model into a relational model. Concluding remarks are found m Part 4. The appendix illustrates how the methodology can be applied to a sample database design problem.

1. Database design Database design can be defined as the process of capturmg relevant information and processing requirements of an enterprise and mapping them onto an underlying database management system [10]. The database design process can be divided into four phases: (1) requirements analysis; (2) conceptual design; (3) logical design; and (4) physical design.

1.1. Requirements specification The first phase of the database process is the requirements specification phase during which an analysis is made of the information needs within an organization resulting m a preliminary specification of the information requirements of various user groups [1].

1.2. Conceptual design The conceptual design phase models and represents the users' and apphcations' views of information and, possibly, a specification of the processing or use of the information [1]. The objective of this phase is to produce a high-level representation of the requirements independent of the Database Management System which wdl be used. This high-level representation is called a conceptual schema or conceptual design It is often expressed as an Entity-Relationship (E-R) model or a semantic data model.

1.3. Logical design During the logical design phase, a logical design (or schema) that corresponds to the data model of the chosen DBMS is produced; for example, a relational data model. Batini et al. [1] call this step implementation design because it represents the transformation of the conceptual schema into the logical schema of the DBMS.

1.4. Physical destgn Physical database design transforms the logical design into a form that is suitable for the given hardware and database management system It maps the logical schema into an appropriate stored representation, and determines the physical parameters necessary to optimize the database performance against a set of required transactions [1].

1.5. Scope This paper provides a step-by-step methodology for the development of a conceptual model, expressed as an Entity-Relationship model, and its transformation into the logical design of a relational database management system.

Relattonal database destgn

49

2. Design methodology

2.1. Enttty-Relationship model The Entity-Relationship (E-R) model is based on two main constructs-entities and relationships - and their associated attributes. An entity is a 'thing' of interest in a database; for example, Employee. A relationship is an association among entities; for example, Employees assigned to Projects is an association between the entity types, Employee and Project. Attributes are characteristics or properties that can be identified for both entities and relationships. For example, 'job-title' could be an attribute of Employee and 'length-ofassignment' an attribute of the above relationship. Each entity type plays a particular role in a relationship- for example, travellers make reservations; reservations do not make travellers [24]. Many database design methodologies are based on the E-R model. Since its initial conception, there have been various attempts to modify and extend the E-R model. The extensions include the addition of the category concept [21], subtyping and generalization, semantics for optional relationships [30], and work on converting E-R models into relational data models (for example, Ling [16]). Much of these research efforts are summarized by Teorey et al. [30] who provide a database design methodology called the Logical Relational Design Methodology (LRDM).

2.2. Overview of methodology This paper also describes a step-by-step methodology for designing relational databases. The objective of this methodology is to expand previous research by providing the following: 1. rules and heuristics for distinguishing among entities, attributes, and relationships 2. a refinement of relationship cardinalities 3. rules for identifying the kinds of relationships that can have attributes 4. suggestions for ways of highlighting design problems 5. means of capturing some of the semantics of an application through the inclusion of data abstractions 6. rules for choosing primary keys from candidate keys, including how to deal with situations where one entity type requires another for unique identification 7. a detailed procedure for translating an E-R model into a relational model 8. interpretation of null values of foreign keys in entity relations. Finally, the methodology presented in this paper reflects a continuation of our previous research in this area that focuses on automating database design [25-28]. The design procedure can be divided into the following steps as illustrated in Fig. 1: 1. Identify entity types and their attributes. 2. Identify relationships and their attributes. 3. Examine the design for potential problems and reiterate the first three steps until no more problems can be found or refinements required. 4. Select primary keys from candidate keys. 5. Transform the entities and relationships into a set of relations. 6. Apply normalization principles. The final output ~s a set of normalized relations in (at least) Third Normal Form (3NF) that can be used as input to the physical design phase. In general, the higher the normal form of the relations, the better the design with at least 3NF being considered as sufficient for most practical purposes [19]. Each of the design steps is explained in detail in the following sections.

50

V C Storey

IDENTIFY ENTITIES

IDENTIFY RELATIONSHIPS

iterate

CHECK FOR 4 DESIGN PROBLEMS

SELECT P R I M A R Y KEYS

J

I

CREATE INITIAL RELATIONS

APPLY NORMALIZATION PRINCIPLES

1 PRODUCE FINAL DESIGN Fig 1

Design methodology

P a r t II. D e v e l o p m e n t o f E - R m o d e l 3. Entities The first step in the development of an E-R model is the identification of entity types. At least some entity types must be identified first because at is impossible to define relationships among entity types before any such entity types exist. This does not, however, preclude the addition of other entity types during the step that deals with relationships. Rather, the identification of entitles and relationships can be an iterative process. For each entity type, attributes and candidate keys must be defined. Although it is certainly possible to identify attributes later in this process, minimally, candidate keys should be identified. The design should then be checked for indications of missing entities. For the purposes of this paper, let E = {E,) be the set of all entity types and A = {A,.j} the set of all attributes where A,.j as the/th attribute of the ith entity type. 3.1.

Attributes

Each entity type has a set of attributes (properties or characteristacs) associated with it. Two things need to be taken into consideration when identifying entity attributes:

Relattonal database destgn

51

1. multivalued attributes should be eliminated; and 2. the distinction among entity types, attributes, and relationships, although not trivial, should be clearly made. 3.1.1. Multivalued attributes

The final objective in this methodology is to represent a database design problem as a set of relations that are m at least 3NF. 1NF requires that each attribute be atomic. Therefore, any attribute, A,,j, of entity type, E,, that can have multiple values (concurrently) should be converted into a separate entity type, E k. This implies that a relationship should probably • 1 exist between the original and new entity types, E, and E k, respectively ; otherwise, A,,j would not have appeared initially as part of the definition of the entity type, E,. Example: Project: [NAME 2. . . . task[ A project can have m a n y tasks so 'task' ts a multivalued attribute. Thts becomes:

Project: [ N A M E , . . . ] Task: [NAME, description . . . . ] A relationship ts required between 'Project' and 'Task' as illustrated in Fig. 2.

Similarly, if there are attributes of an entity type, E,, that are of the form XI, X 2, X 3, etc., then X~, X z, X 3 are repeating attributes and X should become a separate entity type. A relationship should exist between E, and X to reflect the original association• Example: Employee: [ E M P # , . . . skdl 1, skill 2, skiU3] 'Skill' is a repeating attrtbute so it becomes a new entity type:

Employee: [ E M P # , . . . ] Skill: [NAME . . . . ] A relationshtp should exist between ' E m p l o y e e ' and 'Skill' as shown in Fig. 3.

The roles that the entity types play are indicated in Figs 2 and 3. 3.1.2. Attrtbutes versus entities

Sometimes the distinction between entities and attributes is difficult in the EntityRelationship model 3. Teorey et al. [30] provide some guidelines for distinguishing between

TASK

Poo

Ir IPRO CTt

Fig 2. Multlvalued attribute Task becomes new entity

EMPLOYEE]Acquires ~

Found-in SKILLS

Fig 3 Repeating attribute becomes new entity LAlthough this reflects a normahzation ~ssue, ~t ~s dealt with at this point because it identifies an enttty and, possibly, a relationship that need to be defined 2In this paper, keys are represented by using capital letters See Section 3 2 ~It is often noted in the hterature that the distraction between entity types and relationships is also difficult with a classic example being "Should mamage be a separate entity type or a relationship between two person entity types 9'' Sometimes tt does not matter. For example, marriage can be dealt with using either of the above approaches Both can he correctly represented In the final relational model and provide essentially the same result

V C Storey

52

entity types and attributes. They include the fact that entity types have descriptive information while attributes do not and that multlvalued attributes should be converted to entity types as discussed above. Another situation where it becomes important to distinguish between entities and attributes is when an attribute, A,.I, of entity type, E,, is the name of another entity type, E k. When this occurs, two possibilities exist: 1. The entity type, Ek, is needed for the unique identification of E,. Then, A,,j, should be replaced by the primary key attribute(s) of E k. Since primary keys have not been selected at this point, the entity type's name, A,, 1, is used as a surrogate. Example: Company: Plant: Eventually Plant:

[NAME, address . . . . ] [COMPANY, NAME, location, .. ] 'Plant' will become: [COMPANY_KEY, NAME, location . . . . ].

2. The entity type, Ek, is not needed as part of the key of E,. Then, the attribute is really suggesting a relationship between the two entity types. In essence, it is implying that this relationship should be represented by a foreign key 4 (attribute A,4 is being used as a foreign key in entity E,.) However, one cannot assume that the relationship between E, and E k will be represented by a foreign key until the relationship's cardinalities have been examined. This, in fact, signals a possible design error. A,I should be deleted from E, and a relationship between E, and E k is implied. Example: Company: [NAME, a d d r e s s , . . . ] Employee: [EMP#, address . . . . . company] 'Company' is not needed to identify uniquely the entity type 'Employee' so it wdl be deleted. A relationship is required between 'Employee' and 'Company' as dlustrated m Ftg. 4.

Order of Definition. In the above example, the Company entity type had already been defined when 'company' became an attribute of Employee. However, it could be the case that Company becomes a separate entity type after it already appeared in the definition of

EMPLOYEE

COMPANY

[EMIa#, addr~s ..... company]

[NA.M~, address, . ]

Becomes:

EMPLOYEE

Employs

[EMP#, address.... ]

COMPANY

[NAME, address. . . .

]

Fig 4 Relationship reqmred instead of attribute 4A foreign key is a non-key attribute m one relation that appears as the primary key (or part of the primary key) m a n o t h e r relation [19]

Relattonal database destgn

53

the Employee entity type. As a result, constant checking is required to ensure that all cases of one entity type appearing in another entity type's definition are identified and dealt with.

3.2. Candidate keys Candidate keys should be specified for each entity type. A candidate key is an attribute or group of attributes that can be used to identify uniquely an occurrence of an entity type in the database. Eventually, a primary key will be chosen from the candidate keys to be used in the final design. Example: Employee: [EMP#, name a d d r e s s , . . . ] Employee: [NAME, ADDRESS e m p # , . . . ] 'Employee' has two candidate k e y s - [EMP#] and [ N A M E , ADDRESS].

3.2.1. Generic key attributes Certain attributes such as name, number, code, td, etc. are often used as identifiers. These are called generic key attributes and should be considered as possible candidate keys when they appear in an entity type's definition. In addition, there may be domain-specific attributes such as 'emp#' that re-occur in a design and should also be considered. One simple way to check for missing entity types involves examining each existing entity type to determine if any of its attributes has a 'generic key attribute' in its suffix that could imply a missing entity type. That is, if an attribute, A,.j, of entity type, E,, is of the form Att_Genenc-Key-Attribute, it might be the case that Att should be a new entity type, E k. If so, then a relationship should exist between E, and E k. Example: Example: [EMP#, dept_name . . . . ] 'Department' should become a new entity type and a relationship should exist between ' Employee' and ' Department'. ' Dept_name' ts deleted from ' Employee'. (It might later be included if the relationship between 'Employee' and 'Department' is eventually represented by a foreign key.)

4. Relationships The second major construct of the E-R model is the relationship construct. Binary, unary, and n-ary relationships are discussed.

4.1. Binary relationships Many database problems in the real world can be modelled using binary relationships of the form A Verb Phrase B, where A and B are entity types and the Verb Phrase is the relationship between the two; for example, Companies have Employees. The procedure for dealing with this type of relationship is: 1. identify the relationship 2. assign mln/max cardinalities 3. if appropriate, identify relationship attributes.

4.1.1. Min/max cardinalities For a relationship, A Verb Phrase B, min/max cardinalities describe the minimum and maximum number of A values that can occur for each value of B in the relationship and vice versa. Tsichritzis and Lockovsky [31] refer to this as the minimum and maximum cardinality

54

V C Storey

of the mapping. If, for example, each value of A can have from 1 to many (represented by '* ') corresponding values of B, then the min/max cardmalitles of A are (1, *). If each value of B has one and only one corresponding value of A, the mm/max cardinalities of B are (1, 1) 5. Example:

Departments have Employees

(1,*)

(1,1)

Interpretation: Each department has between one and many employees; each employee works for one and only one department. The min/max cardinalities will eventually be used to ascertain how a relationship will be represented in the final design. The min/max cardinalities are a refinement of the usual values asmgned through the E-R model as illustrated in Fig. 5. In the E-R model, this relationship, Departments have Employees, would be represented as shown in Ftg. 6. Comparing Fig. 5 and Fig. 6 shows that the placement of the min/max cardinalitles appears to be opposite to the notation found in the E-R model.

4.1.2. Relationship attributes As with entities, relationships can have attributes-properties or characteristics of the relationship as a whole. However, unlike entities which usually have corresponding attributes, only certain types of relationships can have attributes. These relationships are identified by examining the min/max cardinalities of A and B. Relationship attributes exist only when neither A nor B have min/max values (1, 1). Such cardmalities correspond to what are normally referred to as many-to-many relationships. Example:

Re~nonsh~:

Employees work-on Projects: [start-date, end-date] (0,*)

(1,*)

Relationsh~ attrtbutes are'smrt-date'and'end-daw'astllustratedmFtg.

7.

Suppose that for a relationship A (1, I) Verb Phrase B (0, .)6, a relationship attribute does exist; call it R.. Then, R. is a function of both A and B; that is, R. = f ( A , B). However,

DEPARTMENTS l~;*~ires ~

Work-in (1,1) EMPLOYEES

Fig 5 MIn/Max cardlnahtles

(N)

DEPARTMENTS

Work-in EMPLOYEES Fig. 6 Mappings of E-R model

SThe m m / m a x cardmahtles for A can be obtained by considering the questions (1) Is there at least one value o f "B' fo r every ' A ' 9 (If there is, then the m m value for A is 1, otherwise, it is 0), and (2) Is there at most one value o f ' B' f o r every ' A ' ~ (If there is, then the max value for A is 1, otherwise, it is many ( * ) ) Similarly for the mln/max cardmalmes of B 6The following proof was suggested by Yalr Wand

55

Relattonal database destgn

(o,,) ~ . ~ J t ~ EMPLOYEES ]/~igned-to ~

(I,,) Requires

PROJECTS

[start-date, end-date] Fig 7 Relationship with attributes

since there is one and only one B for every A, then B is a function of A. As a result, R~ = f ( A , B ) = f ( A , g(A))= f ' ( A ) . Therefore, what was thought of as a relationship attribute is probably an attribute of the entity with the (1, 1) cardinalities.

4.1.3. Relauonship as entity A relationship can also be thought of as an entity type [8]. This corresponds to relationships where netther of the min/max values are (1, 1) being the ones that can be treated as entities. The reason is that, in the final database design, the resulting relation, representing the relationship, will be the same although its name might differ; that is, the semantics will be the same. The details of the translation process are discussed in Section 8. Example:

As a relationship: Employees use Computers: [%-of-time]

Representation in the final design: Employees_use_Computers7: [EMPLOYEE_KEY, COMPUTER_KEY, %-of-time]

As an entity type: Computer-Use: [EMPLOYEE, COMPUTER, %-of-time]

Representation in the final destgn: Computer-Use: [EMPLOYEE_KEY, COMPUTER_KEY, %-of-time].

4.2. Unary relationships A relationship of the form Z Verb Phrase Z is referred to as a unary relationship [30], an example of which is shown in Fig. 8. For such relationships, it might be necessary to obtain further refinement on the kinds of Z entity types that can exist. This is determined by the role that each plays in the relationship and could result in the following: 1. the addition of two new entity types Z~ and Z 2

EMPLOYEE I

m~ages

Fig. 8 Unary relationship. 71n this paper, for a relatlonsh~p A Verb Phrase B, the concatenation of the three components Is used to obtain the name of the relation that represents the relationship; that ,s, A_Verb Phrase_B.

v C Storey

56

2. the addition of two new relationships: (a) Z 1 is-a Z. (b) Z 2 is-a Z. 3. modification of the original relationship, e.g.:

Relationshtp : E m p l o y e e s manage E m p l o y e e s

Becomes: M a n a g e r _ E m p l o y e e manages W o r k e r _ E m p l o y e e s M a n a g e r _ E m p l o y e e i s - a n Employee Worker_Employee i s - a n Employee.

The enttty tape 'Employee' has subtypes 'Manager_Employee' and 'Worker Employee' as illustrated in Fig. 9. Thus, a relationship of the form Z Verb Phrase Z should be examined to determine if it suggests that important subtype entities are missing.

4.3. N-ary relattonships Some relauonships might involve three or more entity types, e.g.:

Employees w o r k - o n Prolects us i n g Equtpment Students t a k e Courses a t Untversities from Professors. These relationships can also have attributes. In the above example, 'hrs-worked' would be an appropriate attribute of the first relauonship and 'grade' an attribute of the second. An n-ary relationship can also be thought of as an entity type where the key of the entity type depends upon the semantics of the application but is, at most, the concatenation of all of the involved entity types. The non-keys are the relationship attributes. For example, the relationship Employees work-on Prolects using Equtpment could be thought of as an entity type Assignment:

Entity: Assignment: [EMPLOYEE, PROJECT, EQUIPMENT, hrs-worked] (This assumes that all three entity types are needed for the untque identification of ' Asstgnment'.)

EMPLOYEE CO,I) J ~ C°uld~"/

(1,1) ~

b-Atw~y,~ MANAGER- (1,.) EMPLOYEE

(O,l)

~

~

Doem-Supervising~

.

(1,1)

"-,~,b-Al~y, (1,1)[ WORKER-

Supervia~l-By[ EMPLOYEE

FAg 9 Employeehas two subtypes- Manager and Worker.

Relattonal database destgn

57

It might not be necessary to include all of the entity types in order to identify uniquely the relationship, e.g. [30]:

Engineer uses Casebook on Project If: (1) an engineer uses one casebook for a given project; (2) different engineers use different casebooks for the same project; (3) no engineer uses the same casebook for different projects, but different engineers can use the same casebook for different projects; then any combination of two entity types will uniquely identify the relationship.

4.4. Missing and redundant relationships In an effort to ensure that the design is a complete representation of the real world, it should be checked for m~ssing relationships. The following is a set of heuristics for doing so. 1. An entity type, E,, that does not appear in any relationship might signal a missing relationship since entity types that are important in a design usually appear in at least one relationship. Such unattached entities are easy to highlight in an E-R diagram and are referred to as 'orphaned' entity types by Tauzovich [29]. 2. The appearance of an attribute of one entity type that refers to some other entity type suggests that a relationship should exist between the two entity types. This situation could arise in a number of ways: (a) If an entity type, E,, originally had a multivalued attribute that was converted to another entity type, E k, then a relationship should exist between E, and E k. (See Section 3.1.1.) (b) If an entity type, E, originally had repeating attributes of the form X 1, X 2, X 3, etc., and X became a new entity type, then a relationship should exist between E and X. (See Section 3.1.1.) (c) If the name of one entity type, E~, originally appeared as an attribute of another entity type, E,, then a relationship between E,, and E k is implied. (See Section 3.1.2.) (d) If an entity type, E, originally had an attribute of the form X_Generic_KeyAttribute and X became a new entity, then a relationship should exist between E and X. (See Section 3.2.1.) Tauzovich [29] suggests that a design should be examined for three types of traps: 1. A fan trap which occurs when an entity type has a one-to-many relationship with two other entity types which are not directly related but perhaps should be, e.g.:

Manager manages Employees Manager heads Projects

Sugges~ ~at ~erecould bea relationsh~ between Emp~yeeand Prolect(e.g. Employee ass~ned-to Pr~ects). 2. A transitive relattonship trap occurs when a direct link between two entity types could be redundant, e.g.:

Foreman is-a Manager Manager is-an Employee Foreman is-an Employee

The relationsh~ between Foreman and Employeets redundantbecauseitcan beinferred ~om ~ e f i ~ t t w o re~onsh~s.

v c Storey

58

3 A connection trap occurs when the only link between two entity types is via a third entity type through an optional relationship which suggests that, for some occurrences, the link that perhaps should exist is missing, e.g.:

Employee assigned-to Project

(0,*)

(0,*)

Special-Skills (0,*)

required-by

Project (0,*)

Suggests that a relationship between Employee and Special-Skdls (e.g. Employee has Special-Skills) might be missing. 4.5 Data abstractions Brodie [4] suggests that there is an ever-growing need to capture more meaning of information stored in a database m order to provide higher-level data modelling concepts. A number of researchers have either discussed the need for data abstractions or have begun to explore their impact on database demgn 8. In general, an abstraction is a simplified description, or specification, of a system that emphasizes some of the system's details or properties while suppressing others [23]. The most common forms of abstractions are specialization, generalization, aggregation, and association.

4.5.1. Is-a relattonshtps The well-known is-a relationship, A is-a B, implies that something that is true of the generic entity type B must also be true of the specific entity type A [3]. For each occurrence of A, there is one and only one corresponding occurrence of B; for each occurrence of B, there may or may not be one corresponding occurrence of A. Therefore, the min/max cardinalitles are: A

is-a

(1,1)

B

(0,1)

Is-a hterarchtes. It is possible to have ts-a hierarchies of the form A is-a B, B ls-a C, etc. Then, any attributes that are common to a set of entity types in an is-a hierarchy should be stored at the highest level entity type and deleted from the specific entity types below (unless they are part of a key). The specific entity types can simply 'inherit' these attributes from the most generic entity type in order to minimize the amount of redundancy that appears in a design. This is possible because all of the is-a relationships must be represented in the final design. Example: Relationships: Employee is-a Person Manager is-an Employee

~tttalsetofenKties:

Person: [SSN, name, address, b-day, phone] Employee: [EMP#, name, b-day, dept, phone] Manager: [TITLE, dept, name, address, phone, b-day]. 8See, for example, Ullman [32], Mattos [17], Mattos and Mlchels [18], Meersman [20], Brodle [5], Smith and Smith [24], Blaha et al [2], Davis and Bonnell [9], Potter and Kerschberg [22], and Goldsteln and Storey [12]

Relanonal database design

59

Employee can inhertt the attrtbutes 'name', 'b-day', and 'phone' from Person. Manager can mherit the attributes 'name', 'address', 'phone', and 'b-day' from Person and 'dept' from Employee. Fmal set of entities: Person:

[SSN, n a m e , a d d r e s s ,

b-day,

phone]

Employee: [EMP#, dept] Manager : [TITLE ] One way, therefore, to check for a design problem is to ensure that, for each relationship,

A is-a B, all of the attributes of B can be attributed to A. If not, there is a design error. Either the is-a relationship is incorrectly specified or some attributes of B are wrong. Example:

Professor is-a Staff Staff: [ SSN,..., hourly-rate] Professor: [SSN,... ] Professors do not have hourly rates tmplying that either the relattonship is incorrect or 'hourly-rate' should not be an attribute of'Staff'. Another possibility is that 'Staff' should be divided into two types, for example, 'Admimstrative-Staff' and 'Faculty', with 'hourly-rate' an attrtbute of the former Generalization and subtype hierarchies. When the is-a relationship involves only two entity types, it is called a specializatton relationship. A generalizatton hierarchy occurs when there are two or more specific entity types that partition the generic entity type [24, 30]. For example, if an Employee must be either a Secretary, Trucker, or Mechanic, then the three relationships Secretary ts-an Employee, Trucker ts-an Employee, and Mechanic is-an Employee form a generalization hierarchy. If an occurrence of the generic entity type can correspond to occurrences of more than one of the specific entity types, then overlapping among the specific entity types can occur. This results in a subtype hierarchy [30]. For example, the relationships Full-Time ts-an Employee, Secretary is-an Employee, and Engmeer is-an Employee form a subtype hierarchy. Specification of Is-a relattonships. Is-a relationships should be examined to ensure that they are properly specified m a design. Multiple ts-a relationships involving the same specific entity type A - A ts-a B and A is-a C, for example - should be examined for possible design errors. The relationships A is-a B and A is-a C are A ts-a C are probably incorrectly specified if B and C are mutually exclusive. If they are not mutually exclusive, this situation could be perfectly correct ~. An example is provided in Fig. i0 which indicates that a Project-Participant is both an Employee and a Consultant at the same time. If Employee and Consultant are mutually exclusive, then there is an error in the design: 1. There could be two kinds of project participants, Staff-Project-Participants who are employees and Contract-ProIect-Participants who are outside consultants. Then, the is-a relationships should be Staff-Project-Partictpant ts-an Employee and Contract-Project-

Parncipant is-a Consultant. 2. A project participant could be either an Employee or a Consultant but not both at the same time. One way to correct the design error is to change the Project-Partwtpant entity type so that it contains only those attributes that are common to all project 9This Is discussed further m Goldstem and Storey [13]

60

V C Storey

EMPLOYEE

CONSULTANT

PROJECTPARTICIPANT I Fxg 10 Two ls-a relationships revolving same

specific

entity type

participants. Then, two other entity types need to be a d d e d - (a) Employee-ProjectParticipant with corresponding is-a relationships to both Employee and Project-Partictpant; and (b) Consultant-Prolect-Participant with corresponding ts-a relationships to both Consultant and Project-Participant. Inheritance confltct. When an entity type appears as the specific of two generics, the possibility of inheritance conflicts needs to be examined. Consider, for example, the two relationships A is-a B and A ts-a C. If B and C have attributes with the same name, it is necessary to determine whether or not the attributes refer to the same thing semantically. Assume, for example, that Student-Employee is-a Student and Student-Employee is-an Employee, and both Student and Employee have an attribute 'fee' which refers to the fee that a student pays to take courses and the hourly rate that an employee charges for his or her services, respectively. Then, Student-Employee would be inheriting two different kinds of attributes with the same name. One way to resolve this problem is to change the name of the attributes to reflect more of their semantics.

4.5.2. Aggregation An aggregation or component-of relationship, A component-of B, can take place among entities, attributes, and relationships, or a combinations of these. For example, the attribute 'address' is the aggregate of 'street', 'city', and 'mpcode'. The entity type 'committee' is an aggregate of the entities Secretary and Chairperson, and the attribute 'Committee Name' [6]. The Assignment relationship is an aggregation of the entity types Employee and Project and the attributes 'start-date' and 'end-date'. There are, m general, three types of components - relevant, characteristic, and identifying [11]. A relevant component may occur, but is not reqmred; for example, Air Conditionmg Component-of Car. A characteristic component is required in order to have the aggregate; for example, Wheel component-of Car. An identifying component is both required and uniquely identifies the aggregate; for example, Engme component-of Car. The min/max cardinalitms for aggregation relationships are [12]:

A component-of (_,_) (_,_) (I,i)

B (0,> O) (~i,__) (I,__)

(Relevant) (Characteristic) (Identifying)

In other words, when the A entity type is relevant, the mimmum cardinality of B should be zero to reflect the fact that the component is an optional one. When A is a characteristic component, the minimum value of B must be at least one in order to ensure that at least one A exists. Finally, when A is identifying the min/max values of A must be (1,1) to reflect the fact that an identifying component can correspond only to one aggregate; each aggregate must have at least one of the identifying components, so the minimum value for the

Relattonal database destgn

61

aggregate, B, must also be 1. Understanding the type of component entity type helps to ensure that the min/max cardinalities are specified correctly. The different types of component entity type also have implications for determining keys. In general, an identifying component, A, can always be used to identify the aggregate, B. Sometimes the key of the aggregate is needed for the unique identification of the component; for example, if Seat component-of Flight, then, in order to identify uniquely a Seat, one needs to know both the key of Seat (for example, 'Seat#') plus the corresponding Flight. Another possibility is that the key of the aggregate might be sufficient to identify the component; for example, the identifier for Car uniquely identifies its Engine.

4.5.3. Association An association is an abstraction in which a collection of members is considered a higher-level (more abstract) set [6]. The details of the member objects are suppressed and properties of the set object are emphasized. There are no restrictions on the min/max cardinalities:

A

member-of

(0,*)

B (0,*)

One could, however, impose some restrictions. For example, if Person member-of Club, it could be possible that a Club could not be defined unless it has at least one member in which case the minimum cardinality for Club would be one. On the other hand, there could be a club without any members (people have not yet paid their dues) so the minimum value of zero holds. In a relationship, A member-of B, the B entity type can have properties of its own. There are two different kinds of such p r o p e r t i e s - a priori and derived. Derived properties are based on the properties of the membership of the set. For example, 'average age' is a derived property of a committee. A priori properties, for example, 'maximum number of members' (of a committee) cannot be derived from its set of members. There must be provisions made in the database design for both the derived and a priori attributes. Furthermore, the key of the association entity type B can be an a priori, but not a derived, attribute. Example:

Entities: Person: Health-Club:

[SSN,... ] [HEALTH-CLUB-ID, max#-members, current#members, avg-age]

The associatton entity type, 'Health-Club' has a prtort attribute 'max#-members' and derived attributes, ' current#-members' and 'average-age'. Association Relationship:

Person member-of Health-Club: [date-joined]

(0,,)

(0,,)

5. Potential design problems At this point, an initial E-R model has been developed. It should now be examined for potential problems. For example, in an attempt to minimize redundancy and ambiguity, the design should be examined for synonyms of both entity types and relationships.

62

V C Storey

5.1. Enttty type s y n o n y m s

One way to identify entity types that are synonymous is by examining the relationships in which they appear. For example, relationships of the form: • E t Verb Phrase E 3 • E 2 Verb Phrase E 3

suggest that E~ and E 2 might be either synonyms or related in some way that has not yet been identified. For example, Relationshtps :

Employees work-on Projects Managers work-on Projects is of the correct form, however, obviously, E m p l o y e e and M a n a g e r are not synonymous. They are related, through, because Managers are Employees. It is important that such a relationship be included m the design. 5.2. Relattonship s y n o n y m s

To check for relationship synonyms, one could examine relauonships of the form: • E I Verb Phrase I E 2 • E I Verb Phrase 2 (or E 2 Verb Phrase 2 E 1)

These forms could imply that the same information is being represented in different ways. For example, the relationships: Employees work-on Projects Project requires Employees could be conveying the same information. 5.3. Iterattons

The lmtial steps m the design process can be iterative. New entity types and attributes might come to mind as relationships are identified and as the design is examined for errors. Thus, these steps might need to be reiterated until the E-R model appears to be an accurate representation of the database problem.

6. Primary keys The initial design of the E-R model is almost complete. The final step in this stage is to select primary keys from candidate keys before transforming the E-R model into a relational data model. Entity integrity requires that each entity have a umque existence in the database. This is done by ensuring that only one object of one type exists and that each object be uniquely identified by a non-null key [5]. For each entity type, a primary key is selected from its set of candidate keys.

Relattonal database destgn

63

6.1. Selection rules Each entity type, E,, has a set of candidate keys that umquely identify an occurrence of it. The following are two heuristics for the selection of a primary key from a set of candidate keys. 1. The simplest candidate key is chosen over all others; that is, the one with the least number of attributes. This will reqmre less work when processing queries against the database. 2. If there ~s a tie for the simplest candidate key, then choose the one that is the most application-specific. If it is not obvious which one is, an arbitrary choice can be made. Primary keys for entity types involved in is-a relationships should be selected first; then all others can be obtained. Further adjustments might be needed to reflect the fact that one entity type requires another for umque identification.

6.2. Entity types tn is-a relattonships In a relationship A is-a B, anything that is true of the B entity type is also true of the A entity type. In particular, any attribute or group of attributes that uniquely identifies an occurrence of B should also uniquely identify an occurrence of A. For entities involved in ts-a hierarchies, the primary key for the highest level entity in the is-a hierarchy (that is, the highest generic entity type) is chosen. Each of the specific entity types below this entity type can inherit the generic entity type's primary key as one of its candidate keys. The general rules for the selection of candidate keys can then be applied to the new set of candidate keys for the specific entity type. After the primary key has been chosen for this specific entity type, any specific entity types of which it is generic can inherit its primary key as a candidate key, and so forth, down the is-a hierarchy. Example: Relationships: Employee is-a Person Manager is-an Employee Entt~es: Person: Employee Manager:

[SSN, name, address,... ] [EMP#,... ] [TITLE, DEPT,... ]

The candida~ keys for'Employee'become:

Employee: [EMP#,...,ssn] Employee: [SSN,...,emp#] ' E m p # ' is selected as the primary key of 'Employee' because it is more domamspecific than ' SSN' : Employee: [EMP#,...,ssn] The candida~ keys for'Manager'become:

Manager: [EMP#,...,dept, title] Manager: [DEPT, TITLE,...,emp#] The primary key for 'Manager' is selected:

Manager: [EMP#,...,dept, title]

64

v. c. Storey

Note that, when a specific entity type 'inherits' the primary key of its generic entity type, it might be inheriting attributes that were not part of its original definition. When this occurs, these attributes are added as non-key attributes. In the above example, when Employee inherits the primary key of Person as one of its candidate keys, 'ssn' is added as a non-key attribute in the existing candidate key. The semantics of the /s-a relationship between Employee and Person dictate that this is a legitimate attribute of Employee. If an entity type is a specific entity type in two or more is-a relationships, then it can inherit the primary keys of each of its generic entity types as candidate keys.

6.3. One entity type requires key of another One entity type name might occur as part of the definition of another. There are two cases to consider: 1. If an entity type, E k, appears as part of the primary key of another entity type, E,, then replace E k with its primary key. Example:

Company: Department: becomes:

[COMPANY_NAME,... ] [COMPANY, DEPARTMENT_NAME,...

Department:

[COMPANY_NAME, DEPARTMENT_NAME, . . . ]

]

2. If an entity type E k, appears, as a non-key attribute of another entity type E,, then E k should be deleted as an attribute of E, and a relationship specified between the two entity types. This situation could arise because a candidate key of E, required E~ as part of its composition but this candidate key was not chosen as E,'s primary key. Example:

Company: Department: becomes:

[COMPANY_NAME,... ] [DEPARTMENT_ID,...,company]

Company:

[COMPANY_RAME,... ]

Department: [DEPARTMENT_ID,... ] Re~tionsh~: Company divided-into Departments

Part III. Logical design The previous part of the methodology was concerned with the development of a conceptual model expressed in the form of an E-R model. The translation from the conceptual to the logical design can now take place. In this case, the logical data model is the relational model. The translation process involves the following three steps: 1. represent entities 2. represent relationships 3. normalize relations. Each of these is discussed below.

7. Entity representation An entity type is represented by constructing an entity relation whose key and non-key attributes correspond directly to the those of the entity type; m other words, the imtial entity relations correspond exactly to the entity types. These entity relations may be modified as relationships are represented.

Relattonal databasedesign

65

8. Binary relationship representation relationship, A Verb Phrase B, is represented in one of the following ways: adding the key of B as a foreign key in the A entity relation adding the key of A as a foreign key in the B entity relation constructing a new relation whose key is the concatenation of the keys of the A and B entity types and whose non-key attributes are the attributes of the relationship. The choice of representation depends on the min/max cardinalities of the A and B entity types. A 1. 2. 3.

8.1. Representation rules Two general rules are used to determine how a binary relationship should be represented in the design. 1. One entity type has (1, 1) cardinalities If the min/max values of one of the entities are (1, 1), then the relationship should be represented by adding the key attributes of the other entity as a foreign key in the entity relation with the min/max values (1, 1) (if, of course, the attributes are not there already). Qualification of the foreign key attributes might be necessary in order to make the semantics clear. This is especially true if two or more entity types have the same key or the key is a 'generic key attribute ''°. Example:

Entices:

Employee [EMP#,...] Manager: [EMP#,...] Project: [PROJECT_ID, description,...] Re~onsh~: Managers supervise

Projects (1,1) Representby making ~e key of'Manager'afore~n key in 'Project':

(1,*)

Project: [PROJECT_ID, description,..., manager_emp#] The reason that such a relationship can be represented in this manner is as follows. First, when one of the entity types has min/max values of (1, 1), there can not be any relationship attributes as discussed in Section 4.1.2. Second, if the min/max cardinalities of B (for example) are (1, 1), then there must be one and only one corresponding value of A for each value of B. Thus, the foreign key will be single-valued (so there will be no violation of 1NF) and, also, it must exist. A special case of this rule occurs when the min/max values of both A and B are (1, 1). Then, the relationship can be represented by: (1) adding the key of A as a foreign key in the B entity relation; (2) adding the key of B as a foreign key in A; or (3) doing both (that is, adding the key of A as a foreign key in B and adding the key of B as a foreign key in A). The choice of representation should depend upon how the queries will be processed against the database. Example:

Manager: [MANAGER_ID,...] Restaurant: [RESTAURANT_NAME,...] Manager manages Restaurant (1,1) (1,1) ~°See Section 3.2 1

V C Storey

66

If the query of most mterest involves determinmg who manages whtch restaurant, then, the relationshtp would be represented by makmg the key of 'Manager' a foreign key in 'Restaurant':

Restaurant : [RESTAURANT_NAME,..., manager_id ] 2. Separate relatton When neither entity type has min/max cardinalities of (1, 1), the relationship can be represented by constructing a separate relation whose key is the concatenation of the keys of the A and B entity types and whose non-keys are the relationshlp's attributes. The name of the relationship could be the name of the verb phrase that describes the relationship if it is unique, the concatenation of the participants in the relationship (that is, A_Verb Phrase_B), or some other name Example: Entities:

Employee: [EMP#,...] Project: [PROJECT_ID,...] Rela~onsh~:

Employees assigned-to Projects; [start-date, end-date]

(0,*)

(1,*)

Representas:

Employees_assigned-to_Projects: [EMP#, PROJECT_ID, start-date, end-date] Although these are the general rules for binary relationships, two cases deserve special consideration: (1) is-a relationships and (2) relationships where the cardmalities of both entity types are (0, 1). These are discussed in Section 8 3 and Section 8.4, respectively.

8.2. Incluston of mm values The incorporation of the mm values - the mlmmum number of occurrences of one entity type that must exist for an occurrence of the other entity type in a relationship-gives precImon to the analysis. It provides a means of: (1) either ensuring referential integrity or providing an interpretation of null values of foreign keys; and (2) identifying when a relationship can have attributes.

8.2.1. Referential integrity Referential integrity requires that if object X is referred to by object Y then Y can exist only if X exists [5]. Example (from [30]):

Department contains Employees

(1,*)

(1,1)

Represent ~ e re~6onsh~ by making ~ e k ~ of'Department' a fore~n key m 'Employee': Department: [DEPT-NO,...] Employee: [EMP-NO,..., dept-no] Here, referential integrity is guaranteed because a foreign key is adopted only when the mln/max values of the entity that receives the foreign key are (1, 1). In the above example, Employee has min/max values of (1, 1) indicating that, for each employee, there must be one and only one corresponding department. Hence, the foreign key should never be null. If

Relattonal database deszgn

67

null values do occur m a foreign key, the interpretation of the null value is that It is either unknown or not available but it must exist. This provides a semantic interpretation of null foreign keys. Thus, due to the manner in which relations are constructed, referential integrity should always be guaranteed. The cost of guaranteeing referential integrity is the cost of constructing a new relation in some cases, as illustrated below. This cost can be traded-off against knowing the semantics of a null foreign key. At an implementation level, a decision would be required as to which is best. Contrast this approach with the manner in which the cardinalities of the E-R model are normally considered. Usually, in the E-R model, one deals with one-to-one (1:1), one-tomany ( I : N ) , and many-to-many ( N : M ) mappings. Two examples of how this methodology compares are given below. 1. Department contains Employees is clearly a 1 : N relationship. In LRDM, for example, this relationship is also represented by a foreign key but with a constraint, i.e. (from [301): Represent as:

Employee: [EMP-NO,..., dept-no] where the attribute 'dept-no' is allowed to have a null value in the 'Employee' entity relation.

Such qualification of foreign key values is not necessary in the methodology presented here. 2. Suppose that the min/max values are (1, *) and (0, 1) for A and B, respectively. In this methodology, the relationship would be represented by a separate relation, e.g. (from

[301): Engineer: [EMP-NO,...] Secretary: [EMP-NO,...] Secretary works-for Engineer (1,,) (0,1) Represent ~ : Secretary_works-for_Engineer: [SECRETARY-EMP-NO, ENGINEER-EMP-NO] Note the qualification on the key attributes. Prefixing the attributes with their entity names enables one to distinguish whxch employee is the engineer and which one is the secretary. In L R D M this is represented as:

Engineer: [EMP_NO,..., sec-emp-no] Secretary: [EMP_N0,...] where Me attribu~ 'sec-emp-no' is allowed ~ have a nuH value m Me ' E n g i n e e r ' e n ~ re~6on.

8.2.2. Relattonship attributes Further suppose that the above relationship, Secretary works-for Engineer, has a relationship attribute, 'begin-date'. Since neither entity type has (1, 1) cardinalities, relationship attributes can exist. Example:

V C Storey

68

Secretary works-for Engineer: [begin-date] (1,*) (0, 1) Represent as: Secretary_works-for_Engineer

:

[SECRETARY_EMP_NO, ENGINEER_EMP_NO, begin-date] It is not clear how this is handled m LRDM, but it is assumed that a separate relation would be constructed, provided it were known m advance that the relationship attribute is important to the problem. If the relationship attribute were not known in advance, however, a modification to the design would be required (from a foreign key to a separate relation representation). According to Wilmot [33], a relation should always be constructed to represent a relationship. Some of his arguments are based on the belief that the requirements of a database are not known in advance, and, thus, there is a need for a great deal of flexibility in database design. However, provided that the mm/max cardinalities truly reflect the semanncs of the real world, for relationships where the min/max cardinalities of at least one of the entity types are (1, 1), the foreign key approach should provide a proper representation and a separate relation need not be constructed (which, presumably, is more expensive).

8.3. Entity types in ls-a relattonships A relationship A is-a B always has cardinalities A(1, I) and B(O, 1). Following the general rules for representing relationships, this would be represented by making the key of B a foreign key in A. There are, however, some special considerations that need to be given to is-a relationships as outlined below.

8.3.1. Speciahzatton If, in a specialization relationship A is-a B, the specific entity type has no unique attributes, then the relationship can be represented by adding a 'role' or discriminating attribute to the generic entity type to indicate whether or not each occurrence of the generic entity type is also an occurrence of the specific one. Example: Relationshtp : Manager is-an Employee Represent as: Employee: [ E M P # , . . . , m a n a g e r - y e s - o r - n o ] When the specific entity type has attributes of its own, then a separate relation must be created for it. Recall that, when the primary key of a specific entity type, A, in a relationship A is-a B is determined, there are two possibilities for the primary key of A. A either: (1) adopts the primary key of B; or (2) adopts one of its own candidate keys as its primary key. When the primary key of A is the same as that of B, the relationship A is-a B is inherently represented. Suppose, however, that an original candidate key of A were chosen as its primary key. Recall, from Section 6.2, that, when a specific entity type inherits the primary key of its generic entity type, it adds any attributes that it does not already have, but which are part of the primary key of the generic entity type, as non-key attributes. In effect, these non-key attributes serve as a foreign key that represents the/s-a relationship. Example (from Secnon 6.2):

Relattonal database design

69

Relationship: Employee is-a Person After primary keys are selected:

Person: [SSN, name, address,... ] Employee: [EMP#,..., ssn] The attribute 'ssn' serves as a foreign key of 'Person' in the 'Employee' relation, thus, representing the relationship.

The /s-a relationship, therefore, is implicitly represented through the selection of primary keys. 8.3.2. Generalization hierarchy In a generalization hierarchy, the generic entity type is the union of non-overlapping specific entity types, and the specific entity types partition the generic entity type. There are two cases to consider: (1) when the specific entity types have their own, unique attributes; and (2) when they do not. 1. Specific entity types have their own attributes. When the specific entity types have their own, unique attributes, then the generic entity type should contain the attributes that are common to all the specific entity types. The specific entity types should contain their keys, plus their unique attributes. Example: Relationships: Trucker is-an Employee Secretary is-an Employee Mechanic is-an Employee Represent as:

Employee: Trucker: Secretary: Mechanic:

[EMP#, name, address, phone, b-day] [TRUCKER_EMP#, licence-plate-no ] [SECRETARY_EMP#, words-per-minute ] [MECHANIC_EMP#, skill-level]

2. Specific entity types do not have their own attributes. If the specific entity types do not have any unique attributes, then there are two ways in which to represent the hierarchy: 1. A role, or discriminating, attribute can be added to the generic entity type. This attribute indicates to which of the specific entity types a particular entity occurrence corresponds. The specific entity types can then be deleted. (They could, however, be included for efficiency in processing but would be redundant.) For example, if all employees are truckers, secretaries or mechanics, then a 'type-of-employee' attribute would have the value 'trucker', 'secretary' or 'mechanic'. Example: Represent as: Employee: [EMP#, name, address, phone, b-day, type-of-employee] E.g.: An occurrence of a 'Trucker Employee' is represented as: Employee: ['53792', 'Wallace, D.' . . . . . trucker'] An occurrence of a 'Secretary Employee' is represented as: Employee: ['32342', 'Smith, Q.' . . . . . 'secretary']

70

V C Storey

The specific entity types can extst as separate entity types whose only attributes are their key attributes. The purpose of the key attributes is to indicate which occurrence of an entity type belongs to which category. In this case, the 'type-of-employee' attribute is not needed in the generic entity type. This representation would be adopted if, for example, the specific entity types occur m other relationships, i.e. Represent as:

Employee: Secretary : Trucker : Mechanic :

[EMP#, name, address, phone, b-day] [SECRETARY_EMP# ] [TRUCKER_EMP# ] [MECHANIC_EMP# ]

8.3.3. Subtype hwrarchy In a subtype hierarchy, an occurrence of the generic entity type can correspond to occurrences of more than one of the specific entity types; thus, overlapping among the specific entity types can occur. In order to represent this hierarchy, the specific entity types can be defined, regardless of whether or not they have their own attributes. This could occur, for example, when an employee can be both a trucker and a mechanic. Then, even if the specific entity types contain only their keys, the values of the keys indicate when an entity occurrence is both of these. Alternatively, if the specific entity types do not have unique attributes, the subtype hierarchy could be represented by multiple discriminating variables in the generic entity type. In general, the number of discriminating variables is equal to the number of simultaneous overlapping specific entity types. Example:

Employee IS either full-time or part-time. Employee is either a secretary, trucker, or mechanic Represent as: Employee: [ E M P # , . . . , f u l l / p a r t - t i m e , type-of-employee] 8.4. Both enttty types have (0, 1) cardmahties

The general rule for representing relationships where neither entity type has min/max cardmahtles (1, 1) is to construct a separate relation. Special consideration, however, needs to be given to a relationship which is optional in both directions with a maximum cardinality of one; that is, both A and B have min/max cardinalities (0, 1). There are two situations to e x a m i n e - when the relationship has attributes and when it does not. 8.4.1. Relationship has attrtbutes If the relationship has attributes, it should be represented as a separate relation following the approach outlined previously except for the key of the relation. In this case, the key of one of the entity types, A or B, is sufficient as the key of the relation [15]. Example ([15]): Entittes :

Employee: [EMP#,... ] Car: [CAR#,... ] Employee uses Car: [total-kilometers]

(0,1)

(0,1)

Represent as"

Employee_uses_Car:

[EMP#, ear#, total-kilometers]

or:

Employee_uses_Car : [CAR#, employee#, total-kilometers ]

Relational database destgn

71

Since the keys of both entity types are candidates to serve as the key of the relation, the choice can be based on information about the types of queries to be processed against the database.

8.4.2. Relationship does not have attributes If the relationship does not have attributes, then there is a choice: 1. A new relation can be constructed. Each occurrence of the relation is guaranteed to correspond to an occurrence of the relationship. 2. The key of A can appear as a foreign key in B or vice versa. The choice depends on usage. For example, the relationship Patient (0,1) has Bed (0,1), could be represented by making the key of Patient a foreign key in Bed. This is because it is probably more likely that a Bed has a patient more frequently than vice versa (due to out-patients, etc ), i.e.: Bed: [BED# . . . . . patient_id] Under this scheme, the foreign key could be null. Both of the above schemes are feasible. The choice of which one to use depends on the participanon rates of the entities and anticipated queries [28].

9. Unary relationship representation The following examples of unary relationships are based on Teorey et al. [30]. As in the rules for binary relationship representation, when one of the entity types has (1, 1) cardinalities, the foreign key approach is adopted; however, it requires qualification of the attribute names. 1. One-to-one mandatory. Example: Every apprentice has exactly one of the other apprentices as a partner.

Apprent ice partner-o f Apprent ice (I,i) (I,I) Represent as: Apprentice: [EMP#,..., partner-emp#] Integrity constraint: EMP# # p a r t n e r - e m p # because, obviously, an apprentice cannot be his or her own partner. 2. One-to-one opuonal. Example: An employee could have one o f t h e other employees as his or herspouse.

Employee married-to Employee:

(0,1)

(0,1)

This situation can be treated as shown in Section 8.4. However, it is necessary to distinguish among the two types of employees. The relationship could be represented as follows:

Husband-Employee_married-to_Wife-Employee: [HUSBAND-EMP#, wife-emp#, marriage-date, #children,...]

72

V C. Storey or:

[WIFE-EMP#, husband-emp#, marriage-date, #chi i dren,... ]. Or (if there are no relationship attributes of mterest): Employee: [ E M P - N O , . . . , sp-emp-no] where 'sp-emp-no' ts allowed to be null. (Thts is the solution provided by L R D M . ) Integrity constraint: EMP-NO # sp-emp-no. 3. One-to-many mandatory. Example: Every apprentice has exactly one of the other apprentices as his or her tutor; an apprentice can tutor zero or more other apprentmes. Apprentice tutors Apprentice (0,*) (i,i) Since the min/max values of one of the entity types are (1, 1), this suggests a foreign key approach. The semantics of the relationship indicate that each apprentice is tutored by one and only one other apprentice so the key of the apprentice doing the tutoring becomes a foretgn key tn the apprentice that ts being tutored. Apprentice: [EMP#,..., apprentice-tutoring-emp#] Integrity constraint: EMP# # apprentice-tutoring-emp#. 4. One-to-many optional. Example:

Engineer manages Engineer (0,*) (0,1) The m m / m a x cardinalittes imply that a separate relation should be created to represent the relattonship and that relationship attributes can exist. E n g i n e e r manages E n g i n e e r s :

[MANAGING-ENGINEER_EMP#,

ENGINEER_EMP#,... ]

If there are no relationship attributes, 'Engineer' could have the key of the engineer domg the supervtsing as a foreign key. In this case, however, the foreign key ts allowed to have a null value: Engineer : [ENGINEER_EMP#,.. o supervisor-engineer_emp# ] Integrity constramts : (a) ENGINEER_EMP# # supervisor-engineer_emp#. (b) Supervisor-engineer_emp# can be null. Suppose, as another example, that an engineer leads a group of other engmeers. Then, there are relationships between the engmeer that is the leader and the group, and between the other engineers and the group. The latter can be modelled as an association abstraction, t.e.:

Leader-Engineer leads Group (i,i) (i,i) Represent by making the key of 'Leader-Engineer' a foreign key in 'Group': G r o u p : [GROUP_ID, # - m e m b e r s , m a x - s i z e , l e a d e r - e n g i n e e r - e m p # ] The assoctation ts: M e m b e r - E n g i n e e r M e m b e r - o f Group (0,I) (i,*)

Relattonal database destgn

73

Represent by: Member-Engineer_Member-of_Group : [EMP#, GROUP_ID, d a c e - j o i n e d , . . . ]. 5. Many-to-many. Example: Each project may reqmre special communication with other projects. Project communication-with Proj eets

(1,,)

(1,,)

Represent as: Proj ect_Communicat ion-With_Proj ect : [PROJECT_ID, COMMUNICATING_PROJECT_ID ] Integrity constraint: PROJECT_ID # COMMUNICATING_PROJECT_ID

I0. N-ary relationship representation Both Teorey et al. [30] and Higa and Sheng [14] provide rules for determining how to represent ternary relationships. These rules involve determining whether or not a combination of two of the entity types' keys can be used to identify uniquely the third. In the worst case, ternary relationships have min/max cardinalities of (0, *) or (1, *) on all entity types, so creating a separate relation is the best way to represent this type of relationship. The key of the relanon ~s the concatenation of the keys of the involved entities. Any relationship attributes become non-keys. Consider, for example, a relationship among the entities Student, Course and Semester where a student takes a course m a given semester and receives a corresponding grade. 'Grade' is an attribute of the relationship since it refers to the relationship as a whole. Hence, the relationship would be represented as: student_course _semester: [STUDENT_KEY, COURSE_KEY, SEMESTER_KEY, grade] or: enroll: [STUDENT_KEY, COURSE_KEY, SEMESTER_KEY, grade] The above can be generalized to N-ary relationships.

II. Summary of representation of E-R constructs The guidelines for translating an E-R model into a relational data model are summarized in Table 1.

12. Normalization The ininal set of relations should be examined for violations of Second Normal Form (2NF) and Third Normal Form (3NF) (and higher). 2NF involves a partial functional

V C. Storey

74 Table 1 E-R representatmn m relational model E-R construct

Representation

Comments

Entity type

Construct entity relation

Keys and non-keys of relation correspond to those of the entity type

Bmary relationship one entity has m m / m a x values (1, I)

Make key of the entity type with non(l, 1) values a foreign key In the enUty type wlth the (1, 1) values

Referential integrity is guaranteed, null foreign keys imply that the values are unknown, but they must exist

Binary relationship both entitles have mln/max values (1, 1)

Make key ot A foreign key of B, make key of B foreign key of A, or both

Depends on query patterns

Binary relationship both entitles have mln/max values (0, 1)

If relationship attnbutes, construct a separate relation whose key is the key of one of the entity types If no relationship attributes, then either (1) make the key one entHy a foreign key in the other, or (2) construct a separate relation

Constructing a separate relation guarantees referential integrity Key of the relation is the key of one of the entities

Binary relationship specmhzatlon

Single relationship A ts-a B, A either adopts the key of B or adds st was a foreign key

Inherently represented due to the procedure for selecting primary keys

Binary relationships GeneralizaUon hierarchy

(1) If specific entity types have unique attributes, then construct a relation for each (2) If specific entity types do not have unique attributes, then, either (a) add a 'role" attribute to the generic entity type and delete the specific entity types, or (b) retain the specific entity types whose only attributes are their keys

Second alternative depends on occurrence of specific entity types m other relationships

Binary relationship subtype hierarchy

Specific entity types can exist and conSlSt of their keys plus any attributes that are unique to them If no unique attributes, relationships can be represented by discriminating attributes in the generic entity type

Number of discriminating attributes is equal to the number of potentially overlapping specific entity types

Binary relationship all other cases

Construct separate relation

Key of the relation is the concatenation of the keys of the entity types, non-keys are the relatlonshlp's attributes

Unary relationships

Foreign key or separate relation depending on m m / m a x cardmahtles

Key attributes need to be qualified with their entity names, integrity c o n s t r a i n t s required to ensure occurrences of different entity types in the relationship

Non-binary relationship

Construct separate relation

Key of the relation is (at most) the concatenation of the keys of the involved entity types, non-keys are the relatlonshlp's attributes

dependency and exists when a non-key attribute depends on part, as opposed to the complete, relation key. A transitive functional dependency is a violation of 3NF and occurs when a non-key attribute depends on other non-key attributes, instead of directly on the key. Following normal database design principles, violations of these types require that a relation be split into two. Example:

Relattonal database destgn

75

Violation of 2NF: Order: [ORDER#, CUSTOMER#, customer-address, amount] 'Customer-address' depends on ' C U S T O M E R # ' only so this becomes: Orde r : [ORDER#, CUSTOMER#, amount ] Customer: [CUSTOMER#, customer-address ] Violation of 3NF: Project: [PROJECT_ID, type, charging-rate] If 'chargmg-rate' depends on the 'type' of project, this becomes: Project: [PROJECT_ID, type] Type-Rate: [TYPE, charging-rate] Higher normal forms can also be obtained.

Part IV. Summary This paper has presented a step-by-step methodology for the design of a relational database based on the Entity-Relationship model. It has shown how the E-R model can be constructed and then transformed into a relational data model. Data abstractions, and their impact on a design, have also been discussed. Some heuristics have been presented for: correctly identifying entity types and attributes; selecting primary keys from candidate keys; identifying missing constructs (entity types or relationships); and highlighting design problems. The methodology strives to expand upon previous work on extensions to the E-R model and its use in the design of relational databases. Some of the discussion (for example, how to identify possible design errors) may only serve as guidelines for a design and are subject to apphcation-specific or physical design considerations. Other parts of the methodology, such as how to identify which relationships may have attributes, should always be followed. It is intended that the methodology be used by a database practitioner or in an automated tool for database design.

Part V. Appendix 13. Database design example This appendix applies the methodology outhned in this paper to a small database design problem.

13.1. E n ~ e s

Person: Person: Employee: Manager: Secretary: Department:

[SSN, name, address, bday] [NAME, ADDRESS, ssn, bday] [EMP#, name, address, phone, title] [EMP#, name, address, phone, title, promotion- date] [EMP#, name, address, phone, level] [COMPANY, NAME, function]

V C Storey

76

Company: Project: Task: Computer:

[NAME, location, phone] [ID, description] [NAME, description] [NUMBER, type, model]

13.2. Re~uonsh~s Company e m p l o y s E m p l o y e e s

(1,*)

(1,1)

Company has Departments (1,*)

(I,I)

Project has Tasks (1,*) (1,*) Manager s u p e r v i s e s E m p l o y e e s

(I,*)

(I,I)

Manager is-an Employee

(1,1)

(0,1)

Secretary is-an Employee (i,i) (0,i) Employee is-a Person

(1,1)

(0,1)

Manager is-a Person

(1,1)

(0,1)

Employee assigned-to Projects: [start-date, end-date] (0,*) (1,*) Employee w o r k s - o n Computer: [ l e n g t h - o f - t i m e ]

(0,I)

(0,1)

• The relationship Project has Task is more accurately represented by the aggregation Task component-of Prolect. Similarly, Company has Departmen~ is changed to Department component-of Company. • The relationship Manager zs-a Pe~on is redundant and, there~re, deleted. It can be in~rred through the relanonshlps Manager ~-an Employee and Employee ~-a Pe~on. • The revised set of relanonships m:

Company employs Employees

(1,*)

(1,1)

Department component-of Company

(I,I)

(I,*)

Task component-of Project

(1,*)

(1,*)

Manager supervises Employees (1,*) (1,1) Manager is-an Employee

(I,I)

(0,I)

Secretary is-an Employee (1,1) (0,1) Employee is-a Person

(I,I)

(0,I)

Employee assigned-to Projects: [start-date, end-date]

(0,*)

(1,*)

Relattonal databasedestgn

77

Employee works-on Computer: [length-of-time] (0,1)

(0,1)

• 'Inherited attributes' are eliminated from the entity types involved m is-a relationships: 1. The attributes 'name' and 'address' are deleted from Employee because Employee can inherit them from Person through the relationship Employee is-a Person. The Employee entity type becomes:

Employee: [EMP#, phone, title] 2. The attributes 'phone' and 'title' are deleted from Manager because Manager can inherit them from Employee through the relationship Manager is-an Employee. The attributes 'name' and 'address' are also deleted from Manager because Manager can inherit them from Person through the relationships Manager is-an Employee and Employee is-a Person. The Manager entity type becomes:

Manager: [EMP#, promotion-date] The attribute 'phone' is deleted from Secretary because Secretary can inherit ~t from Employee through the relationship Secretary ts-an Employee. The attributes 'name' and 'address' are also deleted from Secretary because Secretary can inherit them from Person through the relationships Secretary is-an Employee and Employee ts-a Person. The Secretary entity type becomes:

Secretary: [EMP#, level] • The new set of entity types is:

Person: Person: Employee: Manager: Secretary: Department: Company: Project: Task: Computer:

[SSN, name, address, bday] [NAME, ADDRESS, ssn, bday] [EMP#, phone, title] [EMP#, promotion-date] [EMP#, level] [COMPANY, NAME, function] [NAME, location, phone] lID, description] [NAME, description] [NUMBER, type, model]

13.3. Primary keys • For ennty types in ts-a hierarchies, the specific entity types inherit the primary keys of their generic entity types.

Ini~alsetofen6~ typesin 'ts-a'hierarchy: Person: Person: Employee: Manager: Secretary:

[SSN, name, a d d r e s s , b d a y ] [NAME,ADDRESS, s s n , b d a y ] [EMP#, p h o n e , t i t l e ] [EMP#, p r o m o t i o n - d a t e ] [EMP#, l e v e l ]

The prtma~ key f o r ' P e ~ o n ' ~ se&c~d: Person:

[SSN, name, a d d r e s s ,

bday]

v c Storey

78

'Employee' inherits the prtmary key of 'Person' as one of us candidate keys:

Employee: [EMP#, phone, title, ssn] Employee: [SSN, emp#, phone, title] ' E m p # ' ts selected as the prtmary key of 'Employee' because tt ts an apphcation-speclfiC key. 'Employee' has inherited the attribute 'ssn'. This attribute remains as part of the defimtton of 'Employee'. E m p l o y e e : [EMP#, s s n , p h o n e , t i t l e ]

'Manager' could inherit the primary key of 'Employee' as one of its candidate keys but tt is already there: M a n a g e r : [EMP#, p r o m o t i o n - d a t e ]

Stmilarly, 'Secretary' could mhertt the primary key of 'Employee' as one of its candidate keys but it is already there:

Secretary: [EMP#, level] Prtmary keys for these enttty types: Person: [SSN, name, a d d r e s s , b d a y ]

Employee: [EMP#, ssn, phone, title] Manager: [EMP#, promotion-date] Secretary: [EMP#, level] • Manager and Secretary have unique attributes and, therefore, must remain as separate entity types • Primary keys for the rest of the entity types are chosen. Since there is only one candidate key in each of the remaining cases, they become the entity types' primary keys. • The entities become: Person: Employee:

[ SSN, name, address, bday] [ EMP#, ssn, phone, title]

Manager: Secretary: Department: Company: Project: Task: Computer:

[EMP#, promotion-date] [EMP#, level] [COMPANY, NAME, function] [NAME, location, phone] [ID, description] [NAME, description] [NUMBER, type, model]

• The attribute 'company' of entity type Department Is replaced by its primary key.

Department: [COMPANY, NAME, function] becomes:

Department: [COMPANY_NAME, DEPARTMENT_NAME, function] • The keys of the entity types mvolved in component-of relationships are examined: 1 Department component-of Company - the key of Company already appears in the key of Department so no adjustment is required. 2. Task component-of Prolect - the key of Project does not appear in the key of Task. 'NAME' uniquely identifies Task so no adjustment is required. • The entity types are now:

Person:

[SSN, name, address, bday]

79

Relattonal database destgn

Employee: Manager: Secretary: Department: Company: Project: Task: Computer:

EMP#, ssn, phone, title] EMP#, promotion-date] EMP#, level] COMPANY_NAME, DEPARTMENT_NAME, function] NAME, location, phone] ID, description] NAME, description] NUMBER, type, model]

13.4. E-R model The E-R model for this design is now complete and shown in Fig. 11.

PERSON I (0,11 Could-be

COMPANY

I Hires (1'.1

(1'11 Is-alwa~,s . (1'11/ E M P L O Y E E || Access-to (011) worxa-ioa]

~~

Superv,.., / ( ~ : )

"~

/(w:;)k ....

/ ~(o,1)/ Coud l -~

/ (1,*)

Supervises

/

(1,1) Is-aiways

/

~'Sl.',./ I

I

MANAGER

(I,*) Requires PROJECT

COMPUTER sld-Of

TASK Fig 11 E-R Model

(0,: Used-by

v c Storey

80

13.5. Transformatton of E-R model into relational data model 13.5.1. Enttties The entity types become the initial set of (entity) relations and look exactly the same as the set of entity types above. 13.5.2. Relationships 1. Company e m p l o y s E m p l o y e e s

(1,*)

(1,1)

Representation: Make the key of 'Company' a foreign key m 'Employee': E m p l o y e e : [EMP#, s s n , p h o n e , t i t l e , company_name] . Department component-of Company

(1,1)

(1,*)

Representatton: Add the key of 'Company' as a foretgn key in 'Department' but smce the key of 'Company' already appears as part of the key of 'Department', no modification is required. Department : [COMPANY_NAME, DEPARTMENT_NAME, func t ion ] . Task component-of Project

(1,*)

(1,*)

Representa~on: Cons~ucta separa~ re~tion: Task_component-of_Project:

[TASK_NAME, PROJECT_ID]

. Manager supervises Employees

(1,*)

(1,1)

Representation: Add the key of 'Manager' as a foreign key tn 'Employee': E m p l o y e e : [Fk~P#, s s n , p h o n e , t i t l e , company_name, m a n a g e r _ e m p # ] . Manager is-an Employee

(1,1)

(0,1)

Represen~non: 'Manager' has already adop~d ~e prima~ k ~ of 'Employee' so no mod~catton required. M a n a g e r : [EMP#, p r o m o t i o n - d a t e ] . Secretary is-an Employee (1,1) (0,1) Represenmtion: 'Secrem~' has already adopted ~e key of'Employee'so no mod~canon ts required. Secretary: [EMP#, level]

. Employee is-a Person

(1,1) Representation:

(0,1)

Relattonal database destgn

81

Represented already because 'ssn' was added as a non-key of 'Employee' when 'Employee' inherited the primary key of 'Person' as one of its candidate keys.

Employee: [EMP#, ssn, phone, title, company_name, manager_emp#] 8. Employee assigned-to Projects: [start-date, end-date] (0,*)

(1,*)

Represen~tion: Cons~ucta separa~ re~uon whose non-keys are d e r e ~ o n s h ~ ' s attributes.

Employee_assigned-to_Projects: [EMP#, PROJECT_ID, start-date, end-date] 9. Employee works-on Computer: [length-of-time]

(0,1)

(0,1)

Represen~on: Thh is an opttonal re&6onsh~ in both directions with maximum cardinali~ of 1; therefore, the key of 'Employee' and d e key of'Computer' are b o ~ candidates for the new re~tion's key.

Employee works-on Computer: [EMP#, computer_number, length-of-time]. 13.6. Norma~zanon • The relations are normalized.

13.7. ~ n a l d e s ~ n

Person: [SSN, name, address, bday] [EMP#, ssn, phone, title, company_name, manager_ Employee: emp#] [EMP#, promotion-date] Manager: Secretary: [EMP#, level] Department: [COMPANY_NAME, DEPARTMENT_NAME, function] Company: [NAME, location, phone] Project: [ID, description] Task: [NAME, description] [NUMBER, type, model] Computer: Task_component-of_Project: [TASK_NAME, PROJECT_ID] Employee_assigned-to_Projects: [EMP#, PROJECT_ID, start-date, end-date] Employee works-on Computer: [EMP#, computer_number, length-of-time]. References [1] c Batlm, M Lenzenm and S.B. Navathe, A comparatwe analysis of methodologies for database schema integration, ACM Comput Surveys 18(2) (December 1986) 323-364 [2] M R Blaha, W J Premerlam and J E Rumbaugh, Relational database demgn using an object-oriented methodology, Commun. A CM 31(4) (April 1988) 414-427

[3] R J Brachman, What IS-A Is and isn't' an analyms of taxonomic hnks in semanttc networks, Computer 16(10) (October 1983) 30-36. [4] M Brodle, Database management: a survey, m: M L Brodle, and J. Mylopoulos, eds., On Knowledge Base ManagementSystems (Springer, Berhn, 1986) 201-218 [5] M. Brodle, On the development of data models, m' M L. Brodle, J Mylopoulos, and J W

82

[6] [7]

[8]

19]

[I0]

[11]

[12]

[13]

[14]

[15] [16]

[17]

[18]

[19]

[20]

[21]

V C Storey Schmldt, eds , On Conceptual Modelhng (Springer, Berhn (1984) 19-47 M Brodle, Association a database abstraction, m Proc Enttty-Relattonshtp Conf (1981) P Chen, The Entity-Relationship model - toward a umfied view of data, A C M Trans Database Systems 2(1) (March 1976) 9-36 C J Date, An lntroduetton to Database System~, Vol I, 4th edition (Addison-Wesley, Reading, MA, 1986) J P DawS'and R D Bonnell, Modehng semantics w~th concept abstraction m the EARL data model, m Proc Etghth Internat Conf on EntttyRelattonshtp Approach, Toronto, Canada (October 1989) 102-117 A Dogac, B Yuruten and S Spaccapletra, A generalized expert system for database design, IEEE Trans Software Engrg 15(4) (April 1989) 479-491 C S Dos Santos, E J Neuhold and A L Furtado, A data type approach to the Entity-Relationship model, m P P Chen, ed Proc EntttyRelattonshtp Approach to Systems Analysts and Destgn (North-Holland, Amsterdam (1980) 103119 R C Goldstem and V C Storey, Data abstractions the impact on database management, Working paper, Umverslty of Rochester, 1991 R C Goldstem and V C Storey, Unravelhng IsA structures, Working paper, Umverslty of Rochester, 1991 K Hlga and O R L Sheng, A comparative analysis of dmgrammatlc logical database design methods SEM vs LRDM (EER-based), Working paper, Umverslty of Arizona, 1988 D R Howe, Data Analysts for Data Base Destgn (Edward Arnold, Pans, 1983) T-W Ling, A normal form for Entlty-RelatLonshtp dmgrams, in. Proc 4th lnternat Conf on the Enttty-Relattonshtp Approach (Chicago) (IEEE Computer Society Press, Sdver Spring, MD, 1985) 24-35 N M Mattos, Abstraction concepts' the basis for data and knowledge modelhng, m Proc E-R Conference, 1988 N M Mattos and M Mlchels, Modelhng with KRISYS the design process of DB apphcatlons rewewed, in Proc 8th lnternat Conf on EntttyRelattonshtp Approach, Toronto, Canada (October 1989) 159-173 F R McFadden and J A Holler, Database Management, 2nd edition, (BenJamin/Cummings, Menlo Park, CA, 1988) R A Meersman, Preface, in R Meersman and A C Sernadas, eds (North-Holland, Amsterdam, 1988) vn-xli S B Navathe, R Elmasrl and J Larson Integrating user wews m database design, IEEE Computer (January 1986) 50-62

[22] W D Potter and L Kerschberg, A umfied approach to modehng knowledge and data, m R A Meersman and A C Sernadas eds, Data and Knowledge (DS-2) (North-Holland, Amsterdam, 1988) 265-291 [23] M Shaw, The impact of modelhng and abstraction concerns on modern programmmg languages, m M L Brodle, J Mylopoulos and J W Schmldt, eds , On Conceptual Modelhng (Springer, Berhn, 1984) 19-47 [24] J M Smtth and C P Smith, Database abstractions aggregation and generahzatlon, ACM Trans Database Management Systems 2(2) (June 1977) 105-133 [25] V C Storey and R C Goldstem, Design and development of an expert database design system, Internat J Expert Syst Res Apphc 3(1) 11990) 31-63 [26] V C Storey and R C Goldstem, An expert view creation system for database design Expert Syst Rev 2(3) (1990) 19-45 [27] V C Storey, Vtew Creatton An Expert System for Database Deszgn, Ph D dissertation, Faculty of Commerce and Business Administration, Umverstty of British Columbia, Vancouver, Canada (ICIT Press, Washington, DC, 1988) [28] V C Storey and R C Goldstem, A methodology for creating user views m database design, ACM Trans Database Systems ( TODS) 13(3) (September 1988) 305-338 [29] B Tauzovlch. An expert system for conceptual data modelhng, Cognos Inc Ottawa, Canada, m Proc 8th lnternat Conf on Enttty-Relattonshtp Approach, Toronto, Canada (October 1989) [31)] T K Teorey, D Yang and J P Fry, A logical design methodology for relational databases using the extended Entity-Relationship model, Comput Surveys 18(2) (June 1986) [31] D Tslchntzls and F Lockovsky, Data Models (Prentice-Hall, Englewood Cliffs, N J, 1982) [32] J Ullman, from 'Discussion' m M L Brodle and J Mylopoulos eds , On Knowledge Base Management Systems (Springer, Berhn, 1986) 197-198 [33] R B Wdmot, Foreign keys decrease adaptabdlty of database designs, Commun ACM 27 (12) (December 1984) 1237-1243 Veda Storey, Assistant Professor of Computers and Information Systems at the Wllham E Simon Graduate School ot Business mdmm~stratton, Umverslty of Rochester, has research interests m database management systems and artificial lntelhgence Her research has been pubhshed m A CM Transactions on Database Systems, The lnternattonal Journal of Expert Systems, and Expert Systems Revww She Is the author of View

Relational database design Creation. An Expert System for Database Design, a book based on her doctoral dissertation, published by ICIT (International Center for Information Technology) Press m 1988. Dr Storey received her doctorate in Management Information Systems from the University of British Columbia, Canada, in 1986 She earned a Master of

83

Business Administration degree from Queen's Umverslty, Ontario, Canada, in 1980, and a Bachelor of Science degree (with distinction) from Mt Allison University, New Brunswick, Canada. In addition, she recewed in 1978 her Associate of fhe Royal Conservatory of Music for flute performance from the University of Toronto