Relational databases: a theoretical primer M Worboys
As a result o f their tight and simple formal underpinning, relational techniques are suitable for application to a wide range o f systems. Within the database community, relational databases have become the dominant type. There are now several comprehensive products on the market that meet many o f the requirements o f a truly relational system, and relational methods appear to be ever more extensively used. Relational techniques also have the advantage that they interlock well with standard design methodologies and to newer ideas, such as object modelling approaches. The aim o f the paper is to provide a concise and theoretically based introduction to relational databases, both in their design and in the manipulation o f data that they contain. The formal development proceeds in parallel with the progression o f an extended example. relational databases, SQL, relational algebra, normalization
HISTORICAL
BACKGROUND
It is often unnecessary to go right back to first principles when specifying and designing a system. There are available several 'off the peg' methodologies, where the fundamental concepts are already in place. This article describes one such methodology, based on the concept of a relational database. An introductory but fairly formal approach is taken to the design and application of a relational database system. The foundations of relational database technology were laid by Codd, who in 1970 published the paper 'A relational model of data for large shared data banks 'x. Codd was at that time working for IBM at their research laboratories in San Jose, California, USA. He laid down certain principles of database management, referred to as the relational model. These principles were soon applied in experimental systems, and a start was made on the design of a database language that would interact with such systems.
In 1974, Chamberlin and Boyce (again from IBM, San Jose) read a paper at a database workshop at the University of Ann Arbor, Michigan, USA 2. This paper introduced the database language SEQUEL, which was implemented in 1974/75 by IBM as the prototype language SEQUEL-XRM, The first attempt at a larger scale implementation of Codd's relational model was IBM's System R. This system used a revised version of Chamberlin and Boyce's SEQUEL called SEQUEL/2. In 1977, System R became operational at a number of installations. SEQUEL/2 became shortened to SQL. System R was a success, and relational ideas began to become accessible to a more general computer public. This was due in large part to the writings of Date (again from IBM), who in his book Introduction to database systems, Volume 1, provided a clear and readable introduction. The fourth edition of this book 3 was published in 1985 and is warmly recommended to the reader as an excellent intermediate text on relational databases. At present, there are many implementations of relational database technology. At the mainframe end, there are IBM's system DB2 (a direct descendant of System R), Oracle, and Ingres, all supporting dialects of SQL. Relational systems are now available for all sizes of computer. For microcomputers, Ashton-Tate's new product dBase IV supports a further variation of SQL. SQL is now the major language for interacting with relational databases. That this is so has been recognised by the American National Standards Institute (ANSI), who in 1982 instituted the development of a proposal for a standard relation database language. In 1986, this proposal for a language, which was essentially a skeletal version of IBM SQL, became an ANSI standard and will shortly become a full ISO standard 4. An extended version of standard SQL, SQL2, is currently being developed s.
BASIC DATABASE
Department of Computing Studies, Universityof Leicester,University Road, LeicesterLEI 7RH, UK vol 31 no 3 april 1 9 8 9
CONCEPTS
Relational databases derive their power from the simplicity of their underlying concepts. Intuitively, a relation may be thought of as a rectangular table, and a relational
0950-5849/89/030115-08 $3.00 © 1989 Butterworth & Co (Publishers) Ltd
115
(Local views)
I
I
Conceptual level (Global view)
together with a mapping Dom from {A1, A 2 . . . . . A,}, to the set of domains. A relation r(R)is a finite set of mappings {h, t2 . . . . tin} (called tuples), such that, for 1 <~i<<.m: ti: {AI, A2 . . . . . A,}--, Dom(A1) u Dom(A2) w . . . w Dom(A.) ti(Aj) ~ Dom(Aj)
Internal level (Physical database) Figure 1. Three-tier model for database management system
database management system (RDBMS) as the system required to maintain the storage, manipulation, update, and retrieval of data from these tables. The data may be stored on one of many sites, thus leading to the idea of a distributed relational database. A fundamental idea in all database work is that of data independence, where the system shields the user from the physical details of the storage of the data. A refinement of this concept allows the user to concentrate on only that part of the database (local view) with which he is concerned, rather than on the entire database (global view). An architecture that neatly implements these ideas is summarized by the ANSI/SPARC Study Group on Data Base Management Systems 6, who provided a simple three-tier model for a DBMS. The scheme is shown in Figure 1. The three tiers are: • Internal: concerned with details of physical storage. • Conceptual (global): the overall logical view of the database. • External (local): concerned with each user's (or group of users) logical view of the database.
RELATIONS
AND
KEYS
This section describes some of the fundamental concepts that underlie relational systems. To do this in a clear and unambiguous way, it is necessary to express these ideas using basic mathematical notation. Such notation lies at the heart of most areas of contemporary computing. The notation will be kept to a minimum, but some knowledge of sets and functions is required. For those readers who are unfamiliar with this notation, the author suggests the consultation of a basic text in discrete mathematics v. To begin with, some definitions are needed.
Definition 1 A domain is a nonempty set of cardinality less than or equal to No (i.e., finite or countably infinite). A relation scheme R is a finite set of attribute names {A1, A 2 . . . . . A,}, 116
It would be expected that a database evolves with time. Therefore it is assumed that the relations will change as tuples are changed, added, or deleted. However, the relation schemes are assumed to remain constant with respect to time. To illustrate with an example, the SQL expression Create Table Film ( Title Director Country Date Length
Character(30), Character(20), Character(10), Smalllnt, Smalllnt)
declares a relation scheme, with attributes Title, Director, . . . . Length. The corresponding domains are the sets of character strings of lengths less than or equal to 30, 20, and 10, and the set of integers within a range that is predefined for any given implementation. It is necessary to choose a numeric domain for an attribute if any arithmetic operation, such as taking the average or finding the maximum, is to be performed on the occurrences of that attribute. Standard SQL has only a limited provision for varieties of domains, however some relational systems have a wider range of domains, such as 'Date'. Notice also that SQL insists on an ordering of the attributes, while in general relational theory the attributes in a relation scheme are unordered. A possible relation that corresponds to the above scheme is shown in Table 1. This is a tabular representation of the set of tuples: ta(Title) = 'Jaws', h(Director)= 'Spielberg', ... t2(Title) = 'Star Wars', t2(Director)= 'Lucas', ... An important concept in database theory is that of the key. A key is a means of identifying uniquely a tuple of a relation. It is usually given as a subset of attributes of the relation scheme. For example, it is clear that specifying a value for Director in the relation Film will not in general uniquely define a tuple that contains that value. However, if it is assumed that the database contains no two films with the same title, specifying a value for Title will do so. Therefore, given the previous assumption, {Title} is a key. Note that {Director} is not a key, since in general a director may have any number of films in the database. If it was always the case that films made by a information and software technology
Table 1. Relation Film Title
Director
Country
Date
Length
Jaws Star Wars American Graffiti Raiders of the Lost Ark A Bridge Too Far Manhattan Kramer vs Kramer The Deer Hunter The Great Gatsby The Godfather French Lieutenant's Woman All the President's Men Last Tango in Paris Superman Marathon man
Spielberg Lucas Lucas Spielberg Attenborough Allen Benton Cimino Clayton Coppola Reisz Pakula Bertolucci Donner Schlessinger
USA USA USA USA GB USA USA GB GB USA GB USA IT/FR GB USA
1975 1977 1973 1981 1977 1979 1979 1978 1974 1972 1981 1976 1972 1978 1976
125 121 110 115 175 96 105 182 140 175 123 138 129 143 126
director were released at the rate of no more than one per year, then {Director, Date} would also be a key. More formally, the following definition is obtained.
Definition 2 Let r(R) be a relation as given in Definition 1. K is a key iff
• (1) K ~_ {A1, A2, . . . , A.} • (2) ti(K) = t~{K) implies that i=j • (3) No proper subset of K satisfies (2). Note that (1) defines the key to be a subset of attributes, (2) asserts that the key must define each tuple of the relation uniquely, and (3) provides a minimality condition on a key.
DATABASE
DESIGN
THROUGH
NORMALIZATION Imagine that the requirements analysis stage has already been completed and that a first approximation to a system has been obtained. Such a view of the system might be aided by an entity-relationship approach to data modelling 8. A newer and highly effective approach is to use an object model. Such models spring from concepts in object-oriented programming languages. C o m m o n features of object models are object identity, complex objects, and type hierarchy. Object-oriented modelling shall not be dealt with here, but it is remarked that object models are capable of transformation to normalized relations; the interested reader is referred to a recent well written paper that details these links 9. The question that is addressed is 'Given some kind of data model, how do we construct a database that works?' Standard approaches to this problem depend on database normalization. Normalization was developed by Codd and presented in a series of papers 1°-~2. It is now clear that normalization is by no means the vol 31 no 3 april 1989
whole answer to database design. Normalization gives an excellent first pass at design, however, and forces an examination of critical issues such as redundancy and database integrity. The method is based on a sequence of ever more exacting normal forms for the relations in the database. An explanation is best given by using a mixture of example and formal definition.
THREE
NORMAL
FORMS
Suppose that to begin with there is the following relation scheme: Cinemal (Film, Star, Role, Director, Dir_ nat) where Dir nat is an abbreviation for the Director's nationality. Suppose that the system satisfies the following constraints: • • • • •
(1) "Film' identifies a film uniquely. (2) 'Star' identifies a film star uniquely. (3) 'Director' identifies a film director uniquely. (4) All films in the database may have only one director. (5) For the purposes of the database, each film star may be allocated no more than one role in each film. • (6) A director may have only one nationality recorded in the database.
A typical relation based on this scheme is shown in Table 2. Observe that all the data in Table 2 in noncomposite. Table 2 is in first normal form. The formal definition of first normal form is as follows.
Definition 3 Let r(R) be a relation as given in Definition 1. Then r(R) is said to be in first normal form (1NF) ifffor 1 -%
Table 2. TypicalrelationCinemal Film
Star
Role
Director
Dir
Star Wars American Graffiti Raiders of the Lost Ark Manhattan Kramer vs Kramer Kramer vs Kramer Last Tango in Paris The Godfather Superman All the President's Men All the President's Men
Harrison Ford Harrison Ford Harrison Ford Meryl Streep Dustin Hoffman Meryl Streep Marion Brando Marion Brando Marion Brando Dustin Hoffman Robert Redford
Han Solo Bob Flafa Indiana Jones Jill Ted Kramer Joanna Kramer Paul Don Vito Corleone Superman's father Carl Bernstein Bob Woodward
Lucas Lucas Spielberg Allen Benton Benton Bertolucci Coppola Donner Pakula Pakula
USA USA USA USA USA USA Italy USA GB USA USA
D o m (A~) consists of atomic elements only. That is, D o m (A~) has no elements that are composite, such as arrays, lists, and sets. A set of relations is 1NF iff every relation in the set is INF. Although the database is 1NF, there are problems with representing all the data in a single table.
Insert problems Unless a lot of gaps are to be allowed the fact that a particular film has a particular director cannot be entered without entering further information about a star of the film, the role that the star takes, and the director's nationality.
Delete problems If the only tuple for a particular film/role is deleted then not only is the intended role information destroyed, but so is information about who was the director.
Update problems The information about who directs a particular film appears in Cinemal as m a n y times as roles of stars in the film are given. This redundancy causes update problems. For example, ifa spelling correction for director D is to be entered, either every row in which D appears must be updated or a problem of database integrity faced. The rules (1)-(6) above place constraints on the system. It turns out that the determination of these constraints is important for successful database design. A convenient notation is required for expressing these constraints. This is the purpose of the functional dependency concept.
I ilm 1-~ I 'tar
Director
nat
Dir_nat
Role
Figure 2. Functional dependency diagramsfor example
The definitions are framed in terms of the tuples of a relation, and this leads to an extensional view of the notions of key and functional dependency. It would take only a small amount of extra work, however, to redefine these concepts and the normal forms below, relying only on the semantics of the relationships between the attributes. This would make the definitions independent of the particular tuples in a relation at a particular time and dependent only on the semantics of the relation scheme. To define a higher normal form, some further definitions are needed.
Definition 5 Let r(R) be a relation as given in Definition 1. Let X, Y ~_ {A 1, A 2 . . . . . A,}. Y is said to be fully functionally dependent on X iff Y is functionally dependent on X, and for no proper subset X ' c X is it the case that Y is functionally dependent on X'.
Definition 4 Definition 6 Let r(R) be a relation as given in Definition 1. Let X, Y _ {A1, A 2 . . . . , A,}. Then r satisfies the functional dependency X --. Y iffVti, tj e r, t,(X) = tj(X) implies ti(Y) = tj(Y). It is said that Y is dependent on X. Note that it follows immediately that for all X, Y _ {A t , A 2 . . . . , A,}, i f X is a key then X ~ Y . A set of functional dependencies may be conveniently displayed in a functional dependency diagram. Figure 2 gives this diagram for the example. 118
An attribute of a relation is said to be prime iff it is a member of at least one key of the relation. Otherwise the attribute is nonprime.
Definition 7 Relation r(R) is said to be in second normal form (2NF) iff it is 1NF and every nonprime attribute is fully information and software technology
functionally dependent on every key of r. A set of relations is 2NF iff every relation in the set is 2NF. In the example relation Cinemal, it can be checked that the requirements of Definition 2 are satisfied by the set of attributes {Film, Star}, and by no other set of attributes. {Film, Star} is the only key, and the nonprime attributes are therefore Director and Dir nat. It can be seen from Figure 2 that neither of these attributes is fully functionally dependent on the key {Film, Star}. To transform Cinemal to 2NF, it is split as follows. The normalized relation schemes are: Film Dir2 (Film, Director, Dir Film _Star2 (Film, Star, Role)
nat)
I Film ~-4~ DirectorH Dir-nat I
The functional dependency diagrams are shown in Figure 3 and sample relations in Table 3. The database is now 2NF, and the revised structure overcomes the specific problems mentioned earlier. However, there are still difficulties remaining. The relation Film Star2 is satisfactory, but Film Dir2 still suffers from the lack of mutual independence among its non-key attributes.
Insert problems for Film
Dir2
The fact that a particular director has a particular nationality cannot be entered until there is a film directed by that director for entry into the database. The reason is that until such a film exists, there is no appropriate (i.e., non-null) primary key value.
Delete problems for Film Dir2 If the only Film Dir2 tuple for a particular director is deleted then not only is the information about the film destroyed but so is the information about the director's nationality.
Film
--~ Role
[
Update problems for Film
Star
Figure 3. Functional dependency diagrams for relation schemes Film Dir2 and Film Star2 Table 3. Samplerelations Film Dir2 Film
Director
Dir nat
Star Wars American Graffiti Raiders of the Lost Ark Manhattan Kramer vs Kramer Last Tango in Paris The Godfather Superman All the President's Men
Lucas Lucas Spielberg Allen Benton Bertolucci Coppola Donner Pakula
USA USA USA USA USA Italy USA GB USA
Dir2
The nationality of a given director may appear in the table many times (i.e., there is redundancy). Thus if the nationality of a director needs to be changed, a repetitive task or lack of integrity has to be faced. To define a normal form that overcomes these difficulties, some further definitions are needed.
Definition 8 Let r(R) be a relation as given in Definition 1. Let X ___ {A t, A 2 . . . . . A,}. It is said that attribute Ai is transitively dependent on X iff there exists Y_~ {A1, A 2 . . . . . A,} such that:
• • • •
(1) (2) (3) (4)
x-,Y Y~X Y-~ A i Ai~XwY
Film Star2 Film
Star
Role
Star Wars American Graffiti Raiders of the Lost Ark Manhattan Kramer vs Kramer Kramer vs Kramer Last Tango in Paris The Godfather Superman All the President's Men All the President's Men
Harrison Ford Harrison Ford Harrison Ford Meryl Streep Dustin Hoffman Meryl Streep Marion Brando Marion Brando Marion Brando Dustin Hoffman Robert Redford
Han Solo Bob Flafa Indiana Jones Jill Ted Kramer Joanna Kramer Paul Don Vito Corleone Supcrman's father Carl Bcrnstein Bob Woodward
vol 31 no 3 april 1989
119
Definition 9
Table 4. Sample relations Film Dir3
Relation r(R) is said to be in third normal form (3NF) iff it is 1NF and no nonprime attribute is transitively dependent on a key ofr. A set of relations is 3NF iffevery relation in the set is 3NF. The following proposition shows that 3NF encompasses 2NF.
Proposition Let r(R) be a relation. If r(R) is 3NF then r(R) is 2NF.
Film
Director
Star Wars American Graffiti Raiders of the Lost Ark Manhattan Kramer vs Kramer Last Tango in Paris The Godfather Superman All the President's Men
Lucas Lucas Spielberg Allen Benton Bertolucci Coppola Donner Pakula
Proof Use the notation of Definition 1. Suppose that r(R) is not 2NF. Then there exists a nonprime attribute Ai that is not fully functionally dependent on some key K of R. Clearly, A i is functionally dependent on K. A i is shown to be transitively dependent on K. As Ai is not fully dependent on K, there is a proper subset K ' c K, such that K ' ~ Ai. As K is a key, K -+ K', and it is not the case that K ' ~ K. Since A~ is nonprime, AiCK=K•K'. Therefore, A~ is transitively dependent on K, and r(R) is not 2 N F . [ ] Now, returning to the cinema example, the nonprime attribute Dir nat is transitively dependent on the key {Film} for the relation Film Dir2. To transform to 3NF, this table needs to be split into two. The normalized relation schemes become: • •
•
Film Dir3 (Film, Director) Director3 (Director, Dir nat) Film Star3 (Film, Star, Role)
The functional dependency diagrams are shown in Figure 4, and sample relations for Film Dir3 and Director3 are shown in Table 4. The tables are now 3NF and the difficulties described earlier are overcome.
FURTHER
NORMAL
FORMS
Transformation of the relations in the database has gone some way to decrease data redundancy and increase database integrity. However, there still remains a number of problems. Some of these are briefly addressed in this section, indicating remaining difficulties and pointing to possible solutions, without discussing the details.
I Film H
Director1 --~ Role
IDirectort~ Dir-nat [
I
Star
Figure 4. Functional dependency diagramsfor Film Dir3, Director3, and Film Star3 120
Director3 Director
Dir nat
Lucas Spielberg Allen Benton Bertolucci Coppola Donner Pakula
USA USA USA USA Italy USA GB USA
First, the idea of transitive dependency and its application to third normal form is reconsidered. This normal form guarantees that no nonprime attribute is transitively dependent on a key. But in the case of multiple and overlapping keys, this condition may not be sufficient to eliminate all the problems that arise from transitive dependencies on prime attributes. Maier 13 (section 6.7) gives an example to show this. The normal form that overcomes this problem is Boyce-Codd Normal Form (BCNF), which was introduced by Codd 12. BCNF is stronger than 3NF. The decomposition of a relation scheme to 2NF and 3NF depends on the existence of functional dependencies between attributes of the scheme and of the properties of these dependencies. Such decompositions are sometimes still possible/necessary, however, even when no functional dependencies exist. Multivalued dependencies may still exist, and their elimination by decomposition leads to relation schemes in fourth normal form. Multivalued dependencies and fourth normal form (4NF) were first discussed by Fagin 14. A further normal form is project-join normal form (PJNF), which eliminates so-called join dependencies. Examples and formal details of all these higher normal forms may be found in Maier 13. Thus there are several further normal forms, which address further special database design problems. There is usually a trade-off between the level of normal form of the database and the efficiency (both space and time) of running the database. RELATIONAL
OPERATIONS
Now that some database design issues have been considered in the previous section, a formalism for the information and software technology
manipulation and retrieval of relational data is developed. Most formalisms have their roots in either logic or algebra. Typical logic-based systems are the tuple and domain calculi (see, e.g., Date 3 or Maier 13 for a development of these approaches). Here relational algebra is considered, which as its name suggests is algebraic in nature. In particular, several basic relational operations are looked at and some of their algebraic properties suggested.
SET-BASED
RELATIONAL
OPERATIONS
Since relations are defined as sets (of tuples), this gives the usual operations on sets extended with no change to operations on relations. Thus union, intersection, and set difference are defined in the usual way, with the proviso that these operations are well defined iff the underlying relation schemes for the relations are identical, i.e., iff the relations are compatible. Set complement makes little practical sense and is seldom used. The set operation of Cartesian product also carries over to relations. When relations have common attribute names, however, a little care is required to define their product. A closely allied and more useful operation is that of relational join, which shall be discussed shortly. When relations are viewed in more detail than just as sets of tuples, the essentially relational operations of select, project, and join are arrived at.
SELECTION
AND
The logical symbols ~ and A may be read as 'not' and 'and' respectively. Now that it has been seen how to retrieve rows from a table, the next step is the retrieval of particular columns. The unary operation is projection. The formal definition is given.
Definition 11 Let r be a relation on attribute set {A 1, A 2 . . . . . An} with mapping Dom, and S ___A1, A2, . . . , An}. Then the projection of r onto S, rI-S], is a relation with the properties:
• r[S] has attribute set S. • r[S] has mapping Dom', which is the restriction of Dom to S. • r[S] = {t+S:ter}. In the language of rows and columns, r[S] consists of the table formed from r using only columns in S, and deleting duplicate rows. For example, the interest might be in just the names and directors of films in the database. The operation is represented as Film[Title, Director] (note the omission of set brackets here) and retrieves a relation with attribute set {Title, Director} and example tuple (Jaws, Spielberg).
RELATIONAL
PROJECTION
Select is a unary operation that picks out particular tuples of a relation, whose values on specified attributes satisfy a predefined condition. Let r(R) be a relation on attribute set {A 1, A2 . . . . . A,} with mapping Dom. Atomic formulas are expressions of the form Aria, where a e Dom(Ai), and 0 is some suitable comparison operation on Dom(A~) (such as ' = '). Formulas, well formed in this system, are Boolean but quantifier free combinations of atomic formulas.
Definition 10 Let r(R) be a relation on attribute set {A 1, A 2 . . . . . An} and let F be a formula, then aF(r ) is defined to be the relation that is the set of tuples of r(R) that satisfy F. For example, referring to Figure 1, the tuples of O'( l(Direclor-'Spielberg')A Year>m781(Film) are given in Table 5.
JOIN
Join is a binary relational operation, which takes two tables and produces a single table as a result. Given relations Film Star2 and Film Dir3, shown in Table 3 and 4, suppose that the equipment is to retrieve from the database a list of all the stars that have worked with each director. This information cannot be obtained from the tables individually. The two tables need to be combined together in some suitable manner. This relational operation is called a natural join and is written Film Star2 <~ Film Dir3. Some of the tuples retrieved are shown in Table 6. More formally, this gives the following definition.
Definition 12 Let there be given relations r(R) and s(S) with the property that their domain mappings are equal on
Table 6. Tuples retrieved for Film Star2 D<1 Film Dir3 T a b l e 5.
Tuples for ~t re(Director= ' S p i e l b e r g ' ) A Year>
Title Manhattan Kramer vs Kramer French Lieutenant's Woman
vol 31 no 3 april 1989
1978)
Director Country Date Allen Benton Reisz
USA USA GB
1979 1979 1981
(Film) Director
Film star
Lucas Spielberg Attenborough Allen Benton Benton
Harrison Ford Harrison Ford Robert Redford Meryi Streep Dustin Hoffman Meryl Streep
Length 96 105 123
121
R n S. Then the natural join of r and s, r ~ relation such that:
s, is the
• •
u has attribute set R w S. u has mapping Dom', such that Dom' corresponds on R to the domain mapping of r(R), and on S to the domain mapping of s(S). • u = { t : 3t'er, t " e s such that t~R=t', t~s=t"}.
There is a wide range of join operations, the details of which may be found in Maier 13. RELATIONAL
DIVISION
The final relation operation that is introduced here is the binary operation divide. Proceed straight to the formal definition.
years, relational databases have become dominant over all other types of database system. It was IBM that promoted most of the early developments in the field, and it is IBM's relational database product DB2 which in its latest release is a market leader in this area, on a par with other relational systems such as Oracle and Ingres. A first international standard 4 now exists for the elements of the relational database query language SQL. A new standard is in draft s, which covers SQL more extensively. Relational ideas are still developing, for example, with respect to distributed databases, temporal databases, and logic databases. Close connections exist with object-oriented methodologies. This powerful tool is suitable for the design and implementation of a whole class of computer-based systems.
REFERENCES Definition
13
Let r(R) and s(S) be relations such that S _ R, and with the property that their domain mappings are equal on S. Then r divided by s, r - s, is the relation u(U) defined by: • (1) u = R-S.
• (2) u(U) has mapping Dom', such that Dom' is the restriction to U of the domain mapping of r(R). • (3) u = { t : Vt"es, ~ t ' ~ r such that t ' l U = t and t'~,S = t"}. PROPERTIES OF RELATIONAL OPERATIONS The algebra of relational operations combining together under function composition has a distinctive structure. All the usual set-theoretic properties hold for union, intersection, and difference. Select is itself commutative and commutes with the set-based operations. Project commutes with select provided that the attributes in the selection formula belong to the subset of attributes onto which the relation projects. Natural join is also itself commutative. An interesting question is whether project commutes with join. The answer is 'Only in special circumstances'. If r(R'u R") is a relation, then it is not hard to show that r _ r[R'] D<1 r[R"]. If r = r[R']] 1:><] r[R"], then it is said that r decomposes losslessly onto R' and R". The final property that is discussed here gives a sense in which natural join and division are inverse to each other. If r(R) and r'(R') are relations for which R n R' = ~b, then (r D<~ r ' ) - r ' = r. The proof of this follows from Definitions 12 and 13. CONCLUSION This paper has discussed some of the theoretical bases of relational databases. An attempt has been made to show that the foundations are tight and elegant and so provide a solid footing for a live system. Indeed, in recent 122
1 Codd, E F 'A relational model of data for large shared data banks' Commun. A C M Vol 13 No 6 (June 1970) pp 377-387 2 Chamberlin, D D and Boyce, R F 'SEQUEL: a structured English query language' in Proc. A C M SIGMOD Workshop on Data Description, Access and Control (Ann Arbor, MI, USA, May 1974) 3 Date, C J An introduction to database systems Addison-Wesley, Wokingham, UK (1984) 4 International Organization for Standardization Document ISO/TC97/SC21/WG3 (Nl17) ISO, Geneva, Switzerland 5 International Organization for Standardization Document ISO/TC97/SC21/WG3 (N143) ISO, Geneva, Switzerland 6 ANSI/X3/SPARC Study Group on Data Base Management Systems 'Interim report' A C M SIGMOD Bulletin Vol 7 (1975) 7 Cooke, D J and Bez, H E Computer mathematics Cambridge University Press, Cambridge, UK (1984) 8 Chen, P P-S'The entity-relationship model--toward a unified view of data' A C M Trans. Database Syst. Vol 1 No 1 (March 1976) 9 Blaha, M R, Premerlani, W J and Rnmbaugh, J E 'Relational database design using an object-oriented methodology' Commun. A C M Vol 31 No 4 (April 1988) pp 414-427 10 Codd, E F 'Normalized database structure: a brief tutorial' Proc. A CM SIGFIDET Workshop on Data Description, Access and Control (November 1971) pp 1-17 I 1 Codd, E F 'Further normalization of the relational database model' in Rustin, R(ed) Data base systems Prentice-Hall, Englewoods Cliffs, N J, USA (1972) 12 Codd, E F 'Recent investigations in relational database systems' in Proc. IFIP Conf. (1974) pp 1017 1021 13 Maier, D The theory of relational databases Computer Science Press, New York, NY, USA (1983) 14 Fagin, R 'Multivalued dependencies and a new normal form for relational databases' A CM Trans. Database Syst. Vol 2 No 3 (September 1977) pp 262-278 information and software technology