Note on the structure of joins

Note on the structure of joins

0306-4379/92 $5.00 + 0.00 Copyright 0 1992 Pergamon Press Ltd Information Systems Vol. 17, No. 4, pp. 343-346, 1992 Printed in Great Britain. All rig...

331KB Sizes 8 Downloads 62 Views

0306-4379/92 $5.00 + 0.00 Copyright 0 1992 Pergamon Press Ltd

Information Systems Vol. 17, No. 4, pp. 343-346, 1992 Printed in Great Britain. All rights reserved

SHORT

NOTE

NOTE ON THE STRUCTURE OF JOINS HARM VAN DER LEK~

BSO (Bureau for Systems Development), Department of Management Support, P.O. Box 543, 3740 AM Baarn, The Netherlands (Received 2 July 1991; in revised form 11 March 1992)

Abstract-When SQL is used to formulate queries for a relational database, many conditions in the WHERE clause appear to be very predictable. These are the so-called join conditions which indicate how the tables of a database are related. It seems that a system should be able to generate these conditions to a great extent automatically from the knowledge of the database structure. To this end the notion of the structure of a join is introduced and mathematically described as a graph morphism. It turns out to be a generalization of the notion of a natural join. It is claimed that this approach is theoretically elegant and provides in practice a good basis for the development of query generators. Key words: Relational database, join, query generator, graph morphism

1. A DATABASE

SCHEME

AND

ITS

BACHMAN

GRAPH

As is well known a relational database scheme consists of a set of table schemes (or relation schemes) and each table scheme consists of a set of attributes. Furthermore we are interested in foreign keys. We will regard a foreign key of a table scheme Tl as a nonempty subset of the set of attibutes together with a bijective (one-to-one) mapping from this set to a subset of the attributes of a table scheme T2. In practice this second set of attributes is often the primary key of table T2, but for our purpose this is not important. If we consider the set of table schemes as a set of nodes and the foreign keys as a set of arrows between these nodes we get a directed graph [I]. We will call this graph the bachman graph of the database scheme. Example 1. For a small business we want to record information on personnel. In particular we want to know how the hierarchical lines are (who is boss of who?) and the representatives of the employees in the works council. The database consists of the following table: PERS. There are two foreign keys. One references the boss of an employee and the other references the works council. So the bachman graph has one node (PERS) and two arrows (BOSS and REPR) both of course from PERS to PERS (see Fig. 1). Example 2. For a tube network we want to record information on tubes and stations. A tube has a direction: it leads from one station (the beginstation) to another (the endstation). The database consists of the following two tables: TUBE and STATION. There are two foreign keys both from TUBE to STATION. One references the beginstation of a tube and the other references the endstation. Consequently the bachman graph consists of two nodes and two arrows both from one node (TUBE) to the other (STATION) (see Fig. 3).

2.

THE

JOIN

STRUCTURE

Suppose that Gl and G2 are directed graphs. A graph morphism is a pair (g, h) in which g maps the nodes and h the arrows of Gl to G2 in such a way that the image points of the begin node TAddress correspondence

to: Dr H. van der Lek, Dammolen 18, 3481 AM Harmelen, The Netherlands. 343

HARMVAN DER LEK

344

PERS Fig. I. The graphmorphism of Example 3.

Fig. 2. The graphmorphism

of Example 4.

and the end node of an arrow are equal to the begin node and the end node of the image arrow. Now we come to the main definition of this note: let a database scheme be given. Definition. A join structure is a graph together with a graph morphism from this graph into the

bachman graph of the database scheme. Example 3. In the Small Business Database (Example 1) we consider the following query: “Provide the names of the employees whose manager’s works council representative is called ‘Mark’ “. A translation into SQL: SELECT FROM WHERE AND AND

Pl .NAME PERS Pl, PERS P2, PERS P3 Pl.REPRNR = P2.NR (1) P2.BOSSNR =PB.NR (2) = ‘Mark’; (3) P3.NAME

This query is of the project-select-join type [2]. But (in contrary to the literature) we count condition (1) and (2) as a part of the join (the join conditions). Condition (3) is the condition representing the select-part. The join structure of this query is given by the graph morphism depicted in Fig. 1. The graph Gl consists of 3 nodes (Pl, P2 and P3) and two arrows (Al from Pl to P2 and A2 from P2 to P3). The graph G2 is the bachman graph of Example 1. The morphism is defined by mapping PI, P2 and P3 on PERS and Al on REPR and A2 on BOSS. Example 4. We change the query of Example 3 a little bit: “Provide the names of those employees who have a representative whose manager is called ‘Mark’ “. This query is based on a join that is different from the previous one: Al is now mapped on REPR and A2 on BOSS (see Fig. 2). The relevant part of the SQL statement is as follows: SELECT FROM WHERE AND AND

........ PERS Pl, PERS P2, PERS P3 Pl .BOSSNR = P2.NR P2.REPRNR = P3.NR ........

Example 5. In the Tube Network Database (Example 2) we consider the query: “Provide the tube numbers of tubes of which the arrival station capacity exceeds the depature station capacity”. For

The structure of joins

345

STATION

TUBE

Fig. 3. The graphmorphism

of Example 5.

the structure of the join see Fig. 3. The relevant part of the SQL statement (that is, the part including the structure of the join) is as follows: SELECT FROM WHERE AND AND

.... TUBE Tl, STATION T2, STATION Tl .DEPSTATNR =T2.STATNR Tl .ARRSTATNR =T3.STATNR .....

T3

Example 6. A somewhat simpler query than the foregoing is: “Provide for each tube, which is longer than 10 miles its tube number, the location and the capacity of its arrival station.” The structure of the join turns out to be of a special type: the subgraph type. A SQL translation: SELECT FROM WHERE AND

TUBENR, LOCAT, CAPACI TUBE, STATION TUBE.ARRSTATNR = STATlON.STATNR TUBE.LENGTH > 10;

Example 7. In a Hospital Database consider the query: “Provide the name and address of all the patients treated for ‘oan' “. A SQL translation: SELECT FROM WHERE AND

PATNAME, PATADD PATTRM, PAT PATTRM.PATNR PATTRM.TRMCODE

= PAT.PATNR

(1)

=‘oan’

(2)

Condition (1) represents the structure of the join. Note that the names of the columns are the same in this case (both: PATNR). Condition (2) represents the select-part of the query. In [3] the “query structure diagram” is introduced and Remmen [4] discusses the “query relationship diagram”. These diagrams are (approximately) equal to the graph of the graph morphism. The literature however does not explicitly refer to a relationship with the database structure diagram (or bachman diagram). In [5] graphical query generation is considered for ER-databases, but here the “unfolding idea” (see paragraph 4) seems not to be present. Of course having only the notion of a structure of a join is useless without defining the idea of the join belonging to the structure. A join is a special case of a database query, that means: it is a mapping which assigns to each database (population of the scheme) a table (the answer table). Defining this formally goes beyond the scope of this note, but it is easily seen what is meant in terms of a database query formulated in a language like SQL. Given the structure of a join the FROM-clause is determined by the mapping of the nodes. Some conditions in the WHERE-clause (namely the joinconditions) can be generated by the mapping of the arrows and the knowledge (in the datadictionary; for a suitable structure see [6] of the foreign key columns and the bijections (see the examples).

346

HARM VAN DER LEK

3. SPECIAL

CASES

In this paragraph we consider two special cases: the subgraph type and (as a special case of this) the natural type. As a consequence we will see that the well-known notion of a natural join indeed can be considered as a special case. Suppose we have a subgraph of a bachman graph of a database scheme. Then the inbedding of this graph is an injective graph morphism. Conversely each injective graph morphism can be considered as a subgraph and its inbedding (up to isomorphism). In this case we call the structure of the join of subgraph type. Examples 6 and 7 are of this type. Note that a structure is of the subgraph type if and only if we are not forced to use aliasses in the SQL-query. Now we argue that the well-known notion of a natural join [2] can be seen as a special case of the subgraph type. Suppose that the set of foreign keys is natural. This means that for each foreign key from Tl to T2 the bijective mapping is the identity function on the intersection of the attribute sets of Tl and T2 and for each two table schemes where the intersection of attributes is nonempty the identity function is a foreign key in one of the two directions (as a consequence: one can choose the same names for the attributes in Tl of the foreign key and the referenced attributes in T2). Suppose that, in this natural case, we have a given set of table schemes. Take the full subgraph of the bachman graph generated by this subset (i.e. take all the possible arrows). A structure of a join obtained this way is called of natural type. Example 7 is of this type. Now it is easily seen that the resulting join indeed corresponds to the classical notion of the natural join. 4. DISCUSSION

It is claimed that “the structure of a join” as defined in this note is exactly the extra information one needs to generate the join part of a query. The word “extra” means: in addition to the “static” information of the database structure. An important area of application is the generation of (for instance SQL-) queries. Today much so-called “user-friendly” interface to this extent already exist. But (to the knowledge of the author) they do not fully exploit the information which is (or could be) already present in the datadictionary. In fact often only the natural join is supported in the sense that the system takes as default join conditions the equalization of attributes which happen to have the same name, which of course is not sufficient as can be seen from Examples 2 and 6. That the subgraph case is not general enough can be seen from Examples 3 and 4. It seems to the author that the notion of the structure of a join as defined here is very descriptive indeed in the following sense: Conjecture. Two given structures of a join are isomorphic if and only if the resulting database queries are equal.

The definition of isomorphism of two join structures is straightforward. Unfortunately the author has not yet succeeded to find a proof or a counter example for this conjecture. Note that a bachman graph may contain cycles. In the given examples the graph of a graph morphism is acyclic. There are reasons to require this in general. This means that we can see this graph as a kind of unfolding of (part of) the bachman graph. REFERENCES [l] J. A. Bondy and U. S. R. en Murty. Graphtheory with Applications. Macmillan, NY. [2] D. Maier. The Theory of Relarional Databases. Computer Science Press, Rockville, MD (1983). [3] W. de Jonge, J. Bruining, W. H. Schoenmaker and G. A. M. Otten. Het formuleren van opdrachten in relationele vraagtalen. fnformatie 29(1 I), 407-416 (1988). [4] F. Remmen. Hoe vriendelijk zijn vraagtalen in het gebruik? Informafie 27(7/8), 666-673 (1985). [5] A. Ehnasri and J. A. Larson. A graphical Query Facility for ER Databases. In Proc. 4rh Inr. Co& on the Entity-Relationship Approach, Chicago, IL, pp. 236-245 (1985). [6] R. B. Buitendijk and H. van der Lek. Direct manipulation of a datadictionary with SQL. Information Systems M(3), 323-333 (1991).