0306-4379/93$6.00+ 0.00 Pergmon Press Ltd
Intformationsystems Vol. 18, No. 2, pp. 121-128,1993 Printed in Great Britain
SQL*: A RECURSIVE KEMAL KOYMEN’
SQL
and QUJUNCAIN
‘Moorhead State University, Moorhead, MN 56563, U.S.A. ‘Unicorn Systems Company, Los Angeles, CA 90042, U.S.A. (Received
12 June 1992)
Abstract-SQL, Structured Query Language, supported by most relational database systems since its introduction by IBM in the mid 197Os,has been widely accepted and has enjoyed much popularity because of its declarativeness and the simplicity of its English-like syntax and semantics. SQL satisfied classical database applications which merely involved basic database storage and retrieval operations. However, with the emergence of modem database applications such as CAD and SE databases, we have discovered shortcomings of SQL. A severe problem is the inability of SQL to handle recursion. A relation being defined can not be referenced in conjunction with its own definition. Consequently, a whole class of application problems which require the use of recursion can not be solved with the power of SQL alone. This paper has proposed an enhancement to SQL. The enhanced version is called Recursive SQL, or simply SQL*. The paper begins with the examination of the recursion facilities of Datalog, a logic-based language, and then introduces a similar scheme to extend the syntax and semantics of SQL to provide recursive capabilities. The resultant language, SQL*, has been implemented using a bottom-up incremental query evaluation strategy. SQL* makes it possible to express recursive queries as natural as they would be expressed in Datalog or in some other logic-based language. Moreover, SQL* still uses the original SQL-like syntax. Using SQL*, modern database and knowledge-base applications now become feasible. Key words: SQL, recursive query, transitive closure, datalog logic rules, semi-naive query evaluation, incremental tuples
1. INTRODUCTION
Since the introduction of the notion of relational database in 1970 by Codd [l], several relational DBMS products have emerged. A great majority of them accompanied a form of SQL, as user interface, which had originally been introduced as SEQUEL as part of a research prototype DBMS, System R [2], developed by IBM. The SQL and its dialects have received great attention and enjoyed overwhelming popularity. SQL contended to be a viable language for classical database applications which merely involved basic database storage and retrieval operations. However, with the emergence of modern database applications such as CAD databases, SE databases and VLSI databases, we have started discovering the shortcomings of SQL in dealing with those applications. An important facility required by those applications is the transitiue closure operation. The transitive closure operation is defined in conjunction with a binary relation R, and produces another binary relation R’ which is the smallest relation containing R and is transitive (i.e. the existence of tuples XY and YZ in R’ implies the existence of tuple X2). The transitive closure operation can not be performed in SQL due to its limited capabilities. Several enhanced versions of SQL have been proposed to overcome SQL’s deficiency in handling this problem [3-51. This paper introduces another version of SQL, referred to as SQL*, so as to handle the transitive closure operation. SQL* [6] is basically the SQL itself, enhanced with recursion facilities to handle the transitive closure problem. The principles of recursion in SQL* are based on recursive logic rules. A language of logic rules such as Datalog [7] or Prolog [8] provides us with capabilities to express, in addition to other operations, the transitive closure operation. As an example, let us consider the binary relation MANAGER(Emp,Mng), reflecting who is whose immediate manager, and the query ‘Who are the bosses of a given employee ?‘. The answer to this query could be computed by the evaluation of the following Datalog logic rules: (1) boss(Emp, Mng):-manager(Emp, Mng). (2) boss(Emp, Bss):-boss(Emp, X) & manager(X, Bss). 121
KEMAL KOYMEN
122
and QUJUNCAI
INSERT INTO BOSS SELECT * FROM MANAGER ALSO SELECT BOSS.Emp, MANAGER.Mng FROM BOSS, MANAGER WHERE BOSS.Bss = MANAGER.Emp; Fig. 1. An example SQL* query.
Rule 1 indicates that the relation for predicate manager is to be contained in the relation for predicate boss; rule 2 indicates that the computation of the relation for hoss requires referencing the relation for boas itself. This implies that the computation of the relation for predicate boss is a recursive process. This is why a logical rule such as rule 2 above is referred to as a recursiue rule.
The query specified above would be expressed in SQL* as it appears in Fig. 1. In this example query BOSS is the relation to be computed, whereas MANAGER is the one containing the given managerial information. It is assumed that an empty relation BOSS has been created prior to the evaluation of the query. As it could be observed from the above examples, there is an analogy, in structure, between the SQL* statement and Datalog rules for a given query. For instance, the portion of the SQL* statement preceding the keyword ALSO corresponds to the first rule, whereas the remaining part of the SQL* statement matches the second (recursive) rule. SQL*, an enhanced version of SQL, is referred to as recursive SQL in this paper. This is due to the fact that the relation to be computed is recursively re-referenced in the FROM clause of the SQL* statement; such a recursive reference is not supported by SQL. The reference to BOSS in the FROM clause of the query of Fig. 1. is an example of recursion in SQL*.
2. SQL*: THE SYNTAX
AND THE SEMANTICS
SQL* is simply the original SQL extended with an enhanced INSERT statement which could be used in formulating recursive queries. The syntax of the enhanced INSERT statement is depicted in Fig. 2. The relation R,, , is the one to be computed recursively from the basis relations R,, R . . ., R,. As indicated, the extended INSERT statement is composed of m + 1 subqueries. The &t subquery forms the basis for recursive definition. The subsequent subqueries constitute the recursive definitions. Naturally, a recursive query will have at least two subqueries. The INSERT statement consisting of only one subquery is just the original SQL statement. Each subquery consists of an SQL SELECT statement. The basis relations can be referenced in any subquery, whereas the relation R,, , can not be referenced in the first subquery. The number of attributes referenced in the SELECT clause of a subquery should be the same as the arity of relation R,, ,. The following three examples describe the use of SQL* INSERT statement. Example
1. A genealogical database
Consider the instance of the base relation PARENTHOOD(Child, Parent) shown in Fig. 3(a). In the graph representation of the relation, an edge downward from x toy means that x is a parent of y. INSERT INTO %l Basis-Subquery, ALSO Recursive-Subquery . ALSO
Recursive-Subquery.,, Fig. 2. The syntax of the extended INSERT statement of SQL.
123
SQL*:a recursive SQL
Figure 3(b) shows SQL*-queries to compute the relations: SIBL~G(~e~~~~
1, Person 2) COUSIN(Pemon 1, Person 2) RELATED(Per.son1, Person2)
Figure 3(c) shows the tuples of these relations resulted from the evaluation of the queries by the SQL* processor. It should be observed that these relations are all symmetric. For instance, the tuple (I), C), or (C, 0) of SIFBLING indicates that C and D are siblings, the tuple (F, H) of COUSIN indicates that F and H are cousins, the tuple (C, r>> of RELATED indicates that C and D are relatives, and so forth. Child
(a)
Parent
C
D D E F : H I I Table
A A B
J J K K
z : D D E Form
1)
INSERT INTO SIBLING
2)
INSERT INTO COUSIN SELECT Pl.Child,P2.Child FROM PARENTHOOD Pl,PARENTHOOD P2,SIBLING SX WHEm Pl.Parent = Sl.Personl AND P2.Parent = Sl.Person2 AND Pl.Child # PZ.Child ALSO SELECT Pl.Chi.ld,P2.Child FROM PARENTHOOD Pl,PARENTHOOD P2,COUSIN Cl
SELECT Pl.Child,PZ.Child FROM PARENTHOOD Pl,PARENTHOOD P2 WHERE Pl.Parent = P2.Parent AND Pl.Child #PZ.Child
WHERE Pl.Parent = Cl.Personl AND PP.Parent = Cl.Person2 AND Pl.child # PZ.Child 3)
INSERT INTO RELATED SELECT * FROM SIBLING ALSO SELECT Personl,Child FROM ~~~D,P~~HOOD WHERE ~~TED.Person2 = P~~HOOD.Parent AND Person1 # Child ALSO SELECT Child,Person2 FROM RELATED,PARENTHOOD NHERE RELATED.Personl = PAREiNTHOOD.P&rent AND Child $ Person2 Fig. 3(a) and (b).
Caption overleaf.
l? : I
KEMALKOYMENand QUJ~JNCAI
124
(4
Person1 Pereonl _______________ D C C D E D D E G F F
G
Person1
Person2 -____-_____________.. F Ii J X X J z :: G I Ii F H G H I I F I G I H
I Ii H I I F F I _______________
_____________________
Relation
Relation
SIBLING
COUSIN
Pereonl Person2 ____________________~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ F H J J C D F J C H : H X F H I c X c I X F I I D C J x F J I E c x F X X D C I F X D E G D I J X D F G F I H X G H I I D J D I G I X I J D D J G J I x H G X J C E D E H J D H E J H F E I E J F H G J G E X H J H ____________________~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I X C D E F
J H I J X C E J I
Fig. 3.(a)The relation PARENTHOOD(Child,Purent). (b)SQL* queries to compute the relations. (c) The results of evaluation of the queries ofFig. 3(b).
Example 2. A managerial database
Consider the instance of the base relation MANAGER(Emp, Mng) shown in Fig. 4(a). In the graph, an edge downward from x to y means that x is an immediate manager of y. Figure 4(b) shows the SQL* query to compute the relation BOSS(Emp, Bss), whereas Fig. 4(c) shows the tuples of the relation upon the evaluation of the query. It should be noticed that this relation indicates who are the bosses (i.e. immediate and non-immediate managers) for each employee. For instance, F, G and H are the bosses of A, and H is the only boss of G, and so forth. Example 3. An image processing database
Suppose we wish to use a database system to store visual images composed of cells and to construct images from recursively defined cells. Let us assume that images are black-and-white. A cell is composed of a collection of bits (pixels), each of which is either white (set) or black (reset). A cell also can contain copies of other cells, whose origins are translated to a specified point in the coordinate system of the containing cell. Figure 5(a) contains the descriptions of four example cells. Cell 1 is composed of two pixels. Cell 2 contains four pixels of its own at the coordinates (1,2), (1,3), (4,2) and (4,3). Cell 2 also contains two copies of Cell 1 with their origins at coordinates (2, 1) and (2,4). Cell 3 has two pixels of its own on the two comers and two copies of Cell 2 with the origins at the coordinates (3, 1) and (1,3). Finally, Cell 4 contains two pixels of its own on the upper right corner, a copy of Cell 1 on the lower left comer, a copy of Cell 2 on the lower right corner, and a copy of Cell 3 on the upper left corner.
125
SQL*: a recursive SQL
RnP Bng ___________ A F B C : D H E I F G I G G H
(4
___-_____-_-
__-_________________------
Graph (b)
Table
Form
Form
INSERT INTO BOSS SELECT * FROM MANAGER ALSO
SELECT BOSS.Bmp,MANAGER.Mng FROM BOSS,MANAGER WHERE BOSS.Bes = MANAGER.Emp
(c)
Emp Bes ____________________----E G A F E H A G I E A H F G B F H B G : H B H
I C G I C H D H ____________________-------
G H
Fig. 4. (a) The relation MANAGER(Emp, Mng). (b) The SQL* query to compute the relation BOSS. (c) The results of evaluation of the query of Fig. 4(b).
Figure 5(b) shows the two relations, PXL._SET(CeN, X, Y) and CONTAINS(Embedding, Embedded, X, Y), which describe the example cells defined above. The first relation indicates which pixel in which cell has been set to “1” (i.e. black), whereas the second relation reflects the embedding and embedded relationships among the cells. The tuple (i,j. a, b) of relation CONTAINS indicates that Cell j is embedded in Cell i, where the origin of the embedded copy of Cell j is at the coordinates (a, 6) in the i’s coordinate system. SQL* can be used to compute the relation, say, PXL_ON(Cell, X, Y), describing which pixels should be set to “1” (i.e. black) in a given cell for the purpose of displaying. The tuple (c, d, e) of relation PXL-ON indicates that the pixel of the given cell at the coordinates (d, e) should be set to “1”. For example, Fig. 5(c) shows the SQL* query to compute the relation PXLON(X, Y) for Cell 4. Figure S(d) shows the X and Y attributes of the tuples of this relation that resulted from the evaluation of the query.
3. THE SQL’ PROCESSOR SQL* described in the preceding pages has been implemented on the ORACLE system using Pro*C language [9]. The new SQL* processor, referred to as SQLSTAR, has been designed to replace the original ORACLE product SQLPlus. SQLSTAR accepts not only all the SQL and SQLPlus statements and commands supported by SQLPlus, but also SQL+ statements and several pseudo commands that assist the usage of SQLSTAR. When SQLSTAR is invoked, it first checks the user’s ID and password and establishes the connection between the database and the user. Upon receiving the input from the user, SQLSTAR
KEMAL KOYMEN
126
(a)
m
lm
I
12
and QUIU~SCAI
4
6
3
5
2
4
1
3 123
4
2 1
12 @I
3
4
5
6
7
8
Cell X Y ____________________--
9 10 11 12 Embedding Embedded X Y ____________________---1 a 1 f 3 z 1 t
1 t
INSERT INTO PXL-ON SELECT celi,x,Y FROM PXL SET wmm ceil = 4
(cl
ALSO
SELECT C.Embedding, C.X + P.X, C.Y + P.Y FROM CONTAINS C, PXL_ON P WHERE C.Embedding = 4 AND C.Embedded = P.Cell; (4
x
Y
2'
1
f
:
2 11
4
4 17
7
10 11
1
9 12
:
:
5 :
2 12
z
____________________~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~-~---
9 f i 4 16 z : 13 10 4 12 8 12 f f _____________________ ____________________~~~~~~~~~~~~-~-~ Fig. 5.(a) Examples of cells. (b) Relations PXL_SET and CONTAINS describing the cells of Fig. S(e). (c) The SQL query to compute the relation PXL_ON. (d) The results of the query of Fig. 5(c).
127
SQL*: a recursive SQL %I
s= EVAL_QUERY1(~,R1,.
%.l
:=
. . ,R,,)
a%+,;
repeat &Q := &,,,; for j z= 2 to m+l do w.*, := EVAL_QUERYJ(8Q,RI,...,%); end
until
%*I
1’
a%.,
%.I
I’
%.I
8%.1
- %.I; u &+I:
= a;
Fig. 6. The pseudo-code of the algorithm for the evaluation of SQL* queries. R,,. R,, ,: the relation to be computed recursively.
., R,: basis relations;
parses it and determines if it is an ordinary SQL statement, a pseudo-command or an SQL* statement. It is then processed accordingly. Because the user queries are entered at run time, the implementation of SQLSTAR exploits the facilities of Pro*C concerning the dynamic preparation and execution of SQL operations at run time. When an SQL statement is executed, feedback information is returned to SQLSTAR via SQL Communication Area (SQLCA). The information required in conjunction with a SELECT statement is determined using the dynamic Pro*C statement DESCRIBE. When SQLSTAR is input the SQL* query of Fig. 2. it decomposes the query into one basis (i.e. non-recursive) and m recursive subqueries and performs the query evaluation. Figure 6 shows the pseudo-code of the algorithm that achieves the query evaluation. This algorithm is a bottom-up query evaluation algorithm, referred to as semi-naive or incremental query evaluation algorithm [7]. The algorithm computes the answer by retrieving the tuples constituting the answers to the subqueries and then combining them accordingly. In Fig. 6, the function invocation EVAL_QUERY,(O, R, , . . . , R,) derives the tuples which constitute the answer to the basis subquery, whereas the invocation EVAL_ QUERY,(GQ. R, , . . . , R,) used in the iteration determines the answer set corresponding to recursive subquery,. On entry to the first iteration, the parameter AQ contains the result of the basis subquery. Each subsequent iteration computes a set of tupies using the tuples resulted from the previous iterations. The new tuples (called incremental tuples) resulted from the current iteration are added to the answer set R,, , for the original input query. The iteration continues as long as new tuples can be produced. 4.
CONCLUSIONS
The SQL* introduced in this paper intended to extend the power of SQL with recursion facilities, in an attempt to satisfy the requirements of the modem database and knowledge base applications. The features of SQL, such as the declarativeness and the simplicity of English-like syntax, have been the major reasons for its vast acceptance and overwhelming popularity. However, SQL still falls short so as to satisfy the needs of modem database applications. Therefore, we followed the path of other researchers [3-51 and focused our efforts on enhancing the SQL with recursion facilities, as an alternative to logic-based systems, while keeping the original SQL facilities intact. Now the users of SQL* will enjoy the power of recursion, in addition to the conventional facilities of SQL; they no longer need to embed SQL statements in procedural programs for implementing recursive queries. REFERENCES [l] E. F. Codd. A relational model of data for large shared data banks. Common. ACM 13(6) (1970). [2] M. M. Astrahan ef al. Systems R: relational approach to database management. ACM TODS l(2) (1976).
128
KEMALKOYMENand QU~UNCAI
[3] J. Melton. Database Language SQL2 and SQL3. ANSI X3H2-90-001, IS0 DBL SEL3 @SO-ANSI Working Draft) (1989). [4] I. Mumick er al. Magic is relevant. In Proc. ACM-SIGMOD 1990 ht. Con/. on Management 01 Data, pp. 247-258 (1990). [S] P. Shaw. A generalization of recursive expressions for non-linear recursion of fixed degree. ANSI X3H2-88-93REV (1988). [6] Q. Cai. SQL? A recursive SQL-Design and implementation. M.Sc. Project, Moorhead State University (1991). [7] J. D. Ullmann. Principles of Database and Knowledge Base Systems, Vol. 1. Computer Science Press (1988). [8] I. Bratko. Prolog: Programming For Artificid Intelligence. Addison-Wesley, Reading (1990). [9] Pro lC User’s Guide. ORACLE Corporation (1986).