INFORMATION SCIENCES 79, 61-71 (1994)
An Algorithm to Convert Graphs into EDNF
ALI A. KHAN*, MANSOOR AL-A'ALI*, and NIGAT R. ALAM
Computer Science Department, University of Bahrain, P.O. Box 32038, Isa Town, State of Bahrain
ABSTRACT
The inclusion of inferencing capabilities in traditional database query processing supports recursion, allows Prolog as a language, and uses predicate calculus as a tool. We have studied the extended disjunctive normal form (EDNF) and propose a method to process queries efficiently. The proposed technique is simple, easily implemented, and time efficient.
© Elsevier Science Inc. 1994, 655 Avenue of the Americas, New York, NY 10010. 0020-0255/94/$7.00
INTRODUCTION
Query processing is an important aspect of a database. A lot of research has been done in this area [1], but these studies were limited in the sense that inferencing capabilities were not incorporated. Such capabilities have now been included [2] in traditional databases. This supports recursion, allows Prolog as a language, and uses predicate calculus as a tool. In order to process a query efficiently in terms of space, time, or preferably both, several methods such as magic sets and wavefront [2] are available. The extended disjunctive normal form (EDNF) approach has several advantages, such as use of the existing DBMS, better performance, better understanding of recursive logic queries, and availability of alternative processing algorithms. Basically, the method converts a logic query into an EDNF tree, selects an access path, constructs a set of query plans, and chooses the best plan. The conversion of a query into an EDNF is therefore an important step in query optimization, and it is precisely this step that is studied in this short note. An algorithm to convert query graphs into EDNF is also proposed. This requires the query graphs to be stored as a
binary tree. The nodes are then listed in inorder. A node may be OR or AND; brackets are therefore included to preserve the precedence of operations, and unnecessary brackets are removed. A Pascal program was developed and run on the example in [2]; it is given in the Appendix for easy reference. The proposed technique is better in the sense that it is simple, easily implemented on computers, and requires less time.
BACKGROUND
The EDNF approach uses Ullman's rule/goal graph [3] as the data structure to represent a logic query. The rule/goal graph represents a set of rules by creating rule nodes, one for each rule, and goal nodes, one for each predicate. The rule and the predicate carry strings of b's and f's, where the symbols "b" and "f" indicate that the corresponding variable or argument is bound or free. However, this system requires graphs for all possible bindings. To overcome this drawback, the rule/goal graph is slightly modified and is called a query graph. This graph contains a specific binding value for a bound argument and keeps explicit mapping information among variables. Query graphs can deal with recursion. As an example, consider the recursive query graph in Figure 1 for the rules
r1: a(X, Y) :- c(X, L), e(L, Y)
r2: a(V, W) :- d(M, V), g(V, Z), a(Z, W)
r3: c(X, L) :- h(X, M), j(M, L)
and the query ?a(X, 5). If these graphs are used in a straightforward manner, we might run into problems. First, creating relations for each node takes an excessive amount of storage; thus minimization of temporary relations is required. Second, the execution structure and performance are heavily dependent on rules
Fig. 1. Recursive query graph for rules r1 to r3.
which runs counter to the principle of data independence. This dependence can be eliminated by normalizing the query to keep only the semantic information that is necessary to evaluate the query. Finally, it does not take advantage of existing DBMS optimization. For these reasons, the query graphs are converted into EDNF [2]. The characteristics of the EDNF are as follows:
(a) It is a set of two-level trees.
(b) The root of each such tree is the query goal.
(c) Each tree has one or more leaf nodes that are basic relations (i.e., not temporary relations). A leaf node in the tree must be a leaf node in the query graph.
(d) A tree may have one or more loops on the root that indicate recursion. Such a tree is called a looped tree.
(e) There is only one temporary relation, which is the root of all trees.
The EDNF trees for the recursive query in Figure 1 are shown in Figures 2 and 3. Whang and Navathe [2] have designed the following intuitive (top-down) procedure for converting the query graph into the corresponding EDNF:
1. The query graph is traversed by following one of the alternative branches at every nonleaf goal node (basically an OR node). If a rule node (which is an AND operation) has a leaf node as a child, then the child is attached to the output tree as a leaf node. Otherwise, the traversal continues until all these AND branches reach leaf nodes, at which point one EDNF tree has been constructed.
2. After the first visit, whenever the query goal is visited again (indicating a looped tree), it is treated as a leaf node. If multiple cycles bifurcate from an AND (OR) node, multilooped (multiple single-looped) trees are formed.
3. Whenever a leaf node is encountered, the mapping accumulated through the entire path from the root to the leaf is recorded on the arc unless it is the identity mapping.
PROPOSED ALGORITHM
We start with the query graph stored as a binary tree. The tree is then traversed in inorder and the nodes are listed. It may be noted that here
Fig. 2. EDNF forest of Figure 1. Node labels: a(f/X, b(5)/Y), h(f/X, f/M), j(f/M, f/L), e(f/L, b(5)/Y), d(f/M, f/X), g(f/X, f/Z).
Fig. 3. Unfolded form. Node labels: a(f/X, b(5)/Y), d(f/M, f/X), g(f/X, f/Z), a(f/Z, b(5)/Y).
only one visit is made to each node, whereas in [2] several visits are made. To differentiate between AND and OR nodes, brackets are included in the listing. Mapping is incorporated whenever required, and unnecessary brackets are removed. Based on these criteria, a new algorithm, QEDNF, that produces looped EDNF trees from the input query graphs is proposed as follows:
Definitions:
Procedure GETQUERY: This accepts a query as the input.
Procedure INORDER: This traverses the query graph in inorder and inserts brackets at certain points.
Procedure PROCESS: This processes the bracketed listing to give the EDNF trees.
Procedure GETQUERY
  Get the query graph and store it as a binary tree.
End {GETQUERY}
Procedure INORDER (currentnode)
  If currentnode is an OR then insert a '(' in the list
  INORDER (currentnode.leftchild)
  Copy the data of the current node into the list
  INORDER (currentnode.rightchild)
End {INORDER}
Procedure PROCESS
  Find the innermost set of brackets
  If there is an OR just before the close bracket or just after the open bracket then delete the OR
  If there is an OR just before the innermost open bracket then remove the brackets
  If there is an AND just before the innermost open bracket then
    append the data before the AND to all of the non-AND and non-OR data in the brackets
    remove the brackets
End {PROCESS}
Program QEDNF
  GETQUERY
  INORDER
  PROCESS
End {QEDNF}
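The idea behind the three procedures can be illustrated with a minimal Python sketch. It is not the Pascal program of the Appendix: the names `Node`, `inorder_listing`, and `ednf_terms` are hypothetical, the closing bracket is assumed to be placed directly after each OR subtree, and the bracket removal of PROCESS is replaced by a plain recursive distribution of AND over OR, which yields the same kind of result (one list of leaf relations per EDNF tree). Mappings are omitted. The example tree encodes rules r1 to r3, with '+' for OR and '*' for AND as in the Appendix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Binary-tree node: '+' is OR, '*' is AND, anything else is a leaf relation."""
    data: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder_listing(root: Optional[Node]) -> List[str]:
    """List the nodes in inorder, bracketing every OR subtree.

    Assumption: the ')' is emitted right after the OR subtree, so the
    brackets in the listing are always balanced.
    """
    out: List[str] = []

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        is_or = node.data == "+"
        if is_or:
            out.append("(")
        visit(node.left)
        out.append(node.data)
        visit(node.right)
        if is_or:
            out.append(")")

    visit(root)
    return out


def ednf_terms(node: Node) -> List[List[str]]:
    """Return one list of leaf relations per EDNF tree.

    OR concatenates the alternatives of its children; AND combines every
    alternative on the left with every alternative on the right.
    """
    if node.data == "+":
        return ednf_terms(node.left) + ednf_terms(node.right)
    if node.data == "*":
        return [l + r for l in ednf_terms(node.left) for r in ednf_terms(node.right)]
    return [[node.data]]


# a = r1 OR r2, r1 = c AND e with c = h AND j (r3), r2 = d AND g AND a
tree = Node(
    "+",
    Node("*", Node("*", Node("h"), Node("j")), Node("e")),
    Node("*", Node("*", Node("d"), Node("g")), Node("a")),
)

print(inorder_listing(tree))
print(ednf_terms(tree))
```

For this tree the sketch produces two terms, one with leaves h, j, e and one with leaves d, g, a, where the final a is the recursive occurrence of the query goal and is treated as a leaf. Each node is visited once by the listing; the distribution step can still grow exponentially when OR nodes are nested under many AND nodes.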
CONCLUSION
A new method to convert query graphs into EDNF forests of unfolded trees, used in processing recursive queries, has been proposed. Based on the proposed algorithm, a program in Pascal was developed and run on the VAX 11/780 system. The example of [2] was considered and, as expected, the same results were obtained. The proposed algorithm is better in the sense that it visits each node only once and requires fewer conditions to be tested. However, because the problem is NP-complete, no mathematical justification is provided for the superiority of the algorithm.
APPENDIX
PROGRAM QEDNF(INPUT, OUTPUT);
TYPE
  ROOT = ^ROOTREC;
  ROOTREC = RECORD
    L, R, B : ROOT;  {left child, right child, and back link; all three are used as pointers below}
    MAP : PACKED ARRAY [1..5] OF CHAR;
    DATA : PACKED ARRAY [1..2] OF CHAR;
  END;
  EDROOT = ^EDROOTREC;
  EDROOTREC = RECORD
    BACK : EDROOT;
    NEXT : EDROOT;
    EDMAP : PACKED ARRAY [1..5] OF CHAR;
    EDDATA : PACKED ARRAY [1..2] OF CHAR;
  END;
  TYQUERY = PACKED ARRAY [1..10] OF CHAR;
VAR
  QUERY : TYQUERY;
  BRACK, INSERT : INTEGER;
  CHOICE, STAT, MAPANS : CHAR;
  CUR, P, BASE, PERM : ROOT;
  Q, EDNF, PERMANENT, OPEN, CLOSE, REM, REM1, REM2 : EDROOT;
  LISTER : EDROOT;
  FOUND, LISTEND, STOP : BOOLEAN;

{Lists the tree in inorder as a doubly linked list; a '(' is inserted before every OR node}
PROCEDURE INORDER(CURRENTNODE : ROOT);
BEGIN
  IF CURRENTNODE <> NIL THEN
  BEGIN
    IF CURRENTNODE^.DATA[1] = '+' THEN
    BEGIN
      NEW(EDNF);
      EDNF^.EDDATA := '( ';
      EDNF^.NEXT := NIL;
      EDNF^.BACK := Q;
      IF Q = NIL THEN PERMANENT := EDNF
      ELSE Q^.NEXT := EDNF;
      Q := EDNF;
      BRACK := BRACK + 1;
    END;
    INORDER(CURRENTNODE^.L);
    NEW(EDNF);
    EDNF^.EDDATA := CURRENTNODE^.DATA;
    IF CURRENTNODE^.DATA[1] = '*' THEN
      EDNF^.EDMAP := CURRENTNODE^.MAP;
    Q^.NEXT := EDNF;
    EDNF^.NEXT := NIL;
    EDNF^.BACK := Q;
    Q := EDNF;
    INORDER(CURRENTNODE^.R);
  END;
END;

{Reads the query graph interactively and stores it as a binary tree}
PROCEDURE GETQUERY;
BEGIN
  NEW(CUR);
  PERM := CUR;
  CUR^.L := NIL;
  CUR^.R := NIL;
  CUR^.B := NIL;
  BASE := CUR;
  WRITELN('QUERY TREE INPUT');
  WRITE('ENTER THE QUERY');
  READLN(QUERY);
  WRITE('ENTER THE ROOT');
  READLN(CUR^.DATA);
  REPEAT
    NEW(CUR);
    CUR^.L := NIL;
    CUR^.R := NIL;
    WRITE('ENTER A NODE');
    READLN(CUR^.DATA);
    REPEAT
      WRITE('IS IT IMMEDIATELY TO THE LEFT, RIGHT OR NEITHER TO LEFT NOR RIGHT OF ', BASE^.DATA, ' (L/R/N)');
      READLN(STAT);
      IF STAT = 'L' THEN
      BEGIN
        BASE^.L := CUR;
        CUR^.B := BASE;
      END;
      {Assumed: the printed listing shows no right-child case, so it is added here}
      IF STAT = 'R' THEN
      BEGIN
        BASE^.R := CUR;
        CUR^.B := BASE;
      END;
      IF STAT = 'N' THEN
      BEGIN
        P := BASE^.B;
        BASE := P;
      END;
    UNTIL (STAT = 'L') OR (STAT = 'R');
    IF CUR^.DATA[1] = '*' THEN
    BEGIN
      WRITE('ANY MAPPING Y/N?');
      READLN(MAPANS);
      IF MAPANS = 'Y' THEN
      BEGIN
        WRITE('ENTER MAPPING');
        READLN(CUR^.MAP);
      END;
    END;
    BASE := CUR;
    WRITE('ANY MORE Y/N?');
    READLN(CHOICE);
  UNTIL CHOICE = 'N';
END;

PROCEDURE PROCESS;
BEGIN
  REPEAT
    {Locate the innermost open bracket}
    FOUND := FALSE;
    WHILE FOUND = FALSE DO
    BEGIN
      Q := EDNF;
      EDNF := Q^.BACK;
      IF EDNF^.EDDATA[1] = '(' THEN FOUND := TRUE;
    END;
    OPEN := EDNF;
    EDNF := OPEN^.NEXT;
    {Locate the innermost close bracket}
    FOUND := FALSE;
    WHILE FOUND = FALSE DO
    BEGIN
      Q := EDNF;
      EDNF := Q^.NEXT;
      IF EDNF^.EDDATA[1] = ')' THEN FOUND := TRUE;
    END;
    CLOSE := EDNF;
    {If there is an OR before the close bracket then remove it}
    IF CLOSE^.BACK^.EDDATA[1] = '+' THEN
    BEGIN
      REM := CLOSE^.BACK;
      REM^.BACK^.NEXT := REM^.NEXT;
      REM^.NEXT^.BACK := REM^.BACK;
      DISPOSE(REM);
    END;
    {If there is an OR after the open bracket then remove it}
    IF OPEN^.NEXT^.EDDATA[1] = '+' THEN
    BEGIN
      REM := OPEN^.NEXT;
      REM^.BACK^.NEXT := REM^.NEXT;
      REM^.NEXT^.BACK := REM^.BACK;  {unlink in both directions, as in the case above}
      DISPOSE(REM);
    END;
    IF OPEN^.BACK <> NIL THEN
    BEGIN
      {If there is an OR before the open bracket}
      IF OPEN^.BACK^.EDDATA[1] = '+' THEN
      BEGIN
        OPEN^.BACK^.NEXT := OPEN^.NEXT;
        OPEN^.NEXT^.BACK := OPEN^.BACK;
        DISPOSE(OPEN);
        CLOSE^.BACK^.NEXT := CLOSE^.NEXT;
        CLOSE^.NEXT^.BACK := CLOSE^.BACK;
        DISPOSE(CLOSE);
      END;
      {If there is an AND before the open bracket}
      IF OPEN^.BACK^.EDDATA[1] = '*' THEN
      BEGIN
        Q := OPEN^.NEXT;
        REM2 := OPEN^.BACK;
        REM := REM2;
        STOP := FALSE;  {reset before the backward scan}
        REPEAT
          IF (REM^.EDDATA[1] = '(') OR (REM^.EDDATA[1] = '+') THEN
          BEGIN
            STOP := TRUE;
            REM1 := REM^.NEXT;
          END
          ELSE REM := REM^.BACK;
        UNTIL STOP = TRUE;
        OPEN^.BACK^.NEXT := OPEN^.NEXT;
        OPEN^.NEXT^.BACK := OPEN^.BACK;
        DISPOSE(OPEN);
        REM := Q;
        STOP := FALSE;
        REPEAT
          IF (REM^.EDDATA[1] = '+') OR (REM^.EDDATA[1] = ')') THEN
          BEGIN
            STOP := TRUE;
            IF REM^.EDDATA[1] = '+' THEN
              {Copy the data before the AND in after the OR}
              REPEAT
                NEW(LISTER);
                REM^.NEXT^.BACK := LISTER;
                LISTER^.NEXT := REM^.NEXT;
                REM^.NEXT := LISTER;
                LISTER^.BACK := REM;
                LISTER^.EDDATA := REM1^.EDDATA;
                LISTER^.EDMAP := REM1^.EDMAP;
                REM1 := REM1^.NEXT;
                REM := LISTER;
              UNTIL REM1 = Q
            ELSE
            BEGIN
              {Unlink and remove the close bracket}
              REM^.BACK^.NEXT := REM^.NEXT;
              REM^.NEXT^.BACK := REM^.BACK;
              DISPOSE(REM);
            END;
          END
          ELSE REM := REM^.NEXT;
        UNTIL STOP = TRUE;
      END;  {AND before the open bracket}
    END;  {OPEN^.BACK <> NIL}
    {If this is the last set of brackets then remove them}
    IF OPEN^.BACK = NIL THEN
    BEGIN
      Q := OPEN^.NEXT;
      OPEN^.NEXT := NIL;
      Q^.BACK := NIL;
      DISPOSE(OPEN);
      REM := CLOSE^.BACK;
      CLOSE^.BACK := NIL;
      REM^.NEXT := NIL;
      DISPOSE(CLOSE);
      PERMANENT := Q;
    END;
  UNTIL Q^.BACK = NIL;
END;

BEGIN {MAIN PROGRAM}
  Q := NIL;
  BRACK := 0;
  GETQUERY;
  INORDER(PERM);
  {Append one close bracket for every open bracket inserted by INORDER}
  FOR INSERT := 1 TO BRACK DO
  BEGIN
    NEW(EDNF);
    EDNF^.EDDATA := ') ';
    EDNF^.NEXT := NIL;
    EDNF^.BACK := Q;
    Q^.NEXT := EDNF;
    Q := EDNF;
  END;
  LISTER := EDNF;
  PROCESS;
  EDNF := PERMANENT;
  WRITELN('RESULTS');
  WRITELN('THE EDNF TREES FOR THE QUERY ', QUERY, ' ARE');
  WHILE EDNF <> NIL DO
  BEGIN
    IF EDNF^.EDDATA[1] = '+' THEN
    BEGIN
      WRITELN;
      WRITELN;
      WRITELN;
    END
    ELSE IF EDNF^.EDDATA[1] = '*' THEN
    BEGIN
      WRITE(EDNF^.EDDATA[1]);
      {Assumed: an empty mapping is all blanks; a packed array cannot be compared with NIL}
      IF EDNF^.EDMAP <> '     ' THEN WRITELN(EDNF^.EDMAP);
    END
    ELSE WRITE(EDNF^.EDDATA);
    Q := EDNF;
    EDNF := Q^.NEXT;
  END;
END.

REFERENCES
1. F. Bancilhon and R. Ramakrishnan, An amateur's introduction to recursive query processing strategies, Proceedings of the International Conference on Management of Data, 1986, pp. 16-52.
2. K. Y. Whang and S. B. Navathe, An extended disjunctive normal form approach for optimizing recursive logic queries in loosely coupled environment, Proceedings of the 13th VLDB Conference, Brighton, 1987.
3. J. Ullman, Implementation of logical query languages for databases, ACM Trans. Database Syst. 10(3):289-321 (1985).
Received 3 October 1992; accepted 1 December 1993