Information Systems Vol. 17, No. 4, pp. 299-322, 1992. Printed in Great Britain. All rights reserved.
0306-4379/92 $5.00 + 0.00. Copyright © 1992 Pergamon Press Ltd

AUTOMATIC GENERATION OF COMPILED FORMS FOR LINEAR RECURSIONS

JIAWEI HAN¹ and KANGSHENG ZENG²
¹School of Computing Science, Simon Fraser University, British Columbia, Canada V5A 1S6
²Department of Computer Science, Zhejiang University, Hangzhou, Zhejiang, P.R.C.
(Received 4 September 1991; in revised form 31 March 1992)
Abstract: This article presents a graph-matrix expansion-based compilation technique which compiles complex linear recursions into highly regular compiled forms. The technique uses a variable connection graph-matrix, the V-matrix, to simulate recursion expansions and discover the expansion regularity of complex linear recursions. Our study shows that linear recursions can be compiled into highly regular compiled forms by the V-matrix expansion technique and that such compiled forms can be generated automatically. The compilation of linear recursions into compiled forms not only captures the bindings which are difficult to capture otherwise but also facilitates the development of powerful query analysis and evaluation techniques for complex linear recursions in deductive databases.
Key words: Deductive databases, compilation techniques, linear recursions, recursive query evaluation
1. INTRODUCTION
Compilation is a powerful preprocessing technique in the evaluation of recursions in deductive database systems [1-11]. Compilation methods can be classified into two categories: query-dependent compilation and query-independent compilation. The former rewrites a logic program into an equivalent but more efficiently evaluable one based on specific query forms [1-3, 11]. The Magic rule rewriting technique [1, 3, 11] is a typical method of this kind. The latter compiles IDB predicates into a set of compiled forms independent of queries. When a query is submitted to the system, the compiled forms are incorporated with database statistics and query information in the query analysis and evaluation.
There have been some interesting studies on the query-independent compilation of complex recursions. Henschen and Naqvi [7] studied the compilation of some linear recursions in complex variable patterns. Ioannidis [12] studied the recognition of a subclass of bounded linear recursions. Sagiv [13] studied the optimization of Datalog programs. Naughton studied the compilation of one-sided linear recursions in some complex variable patterns [14]. Jagadish et al. [15] studied the compilation of linear recursions based on an expansion mechanism. Agrawal and Devanbu [16] studied moving selections into complex linear recursions which involve complex variable connections. Youn et al. [17] classified linear recursions into several classes and discovered the expansion regularities for certain classes. Han [18] studied the compilation of complex linear recursions using a variable connection graph, the V-graph, and concluded that a linear recursion can be compiled into either a bounded recursion or a chain recursion. Al-Sukairi and Henschen [19] proposed a simulation-based compilation method which applies a set of matrices to simulate the expansions of linear recursions.
However, their matrix construction method is rather complex, and moreover, the method cannot discover the minimum expansion sequences for some linear recursions. None of the above studies has produced an efficient algorithm to automatically generate precise compiled forms for complex linear recursions.
In this article, we present a variable connection graph-matrix, the V-matrix, and develop a V-matrix expansion technique, which discovers the minimal necessary expansions in the compilation of complex linear recursions. Moreover, the compiled forms of linear recursions can be generated automatically based on such V-matrix expansions. In comparison with the previous studies, the V-matrix captures predicate/variable connection information of linear recursive rules similar to that captured by the α-graph of Ioannidis [12], the A/V-graph of Naughton [20], and the V-graph of Han [18]. However, when the predicate/variable connections become complex, it is
difficult to discover expansion techniques using these graphs. Our study shows that the V-matrix expansion approach provides an organized way to simulate recursion expansion, record predicate/variable connectivity and discover expansion regularity.
The motivation of our study is illustrated in the following example. As a notational convention, the syntax of PROLOG [21] is adopted for logic rules except that upper case letters are used to denote predicates and lower case letters to denote attribute vectors, with those starting with s, ..., z indicating variables and others indicating constants.
Example 1. The recursion {(SG_0), (SG_1)} is a typical linear recursion in chain rules [22], where SG represents the same generation relatives [1, 2, 11].
(SG_0) SG(x, y) ← Sibling(x, y).
(SG_1) SG(x, y) ← Parent(x, u), SG(u, v), Child(v, y).
It is easy to observe by expansions that SG can be compiled into a compiled form (SG_c) [6], which consists of 2 synchronous chains, a Parent-chain and a Child-chain, where synchronous means that the 2 chains grow synchronously (i.e. they are of the same length at each expansion).
(SG_c)
$$SG(x_0, y_0) = \bigcup_{i=0}^{\infty} \bigl( Parent^{i}(x_{i-1}, x_i),\ Sibling(x_i, y_i),\ Child^{i}(y_i, y_{i-1}) \bigr).$$
Notice that the notation Parent^i(x_{i-1}, x_i) indicates a tautology when i = 0, or otherwise, a chain of length i which is a sequence of compositions of i Parent predicates. That is,
$$Parent^{i}(x_{i-1}, x_i) = \begin{cases} True & \text{if } i = 0, \\ Parent^{i-1}(x_{i-2}, x_{i-1}),\ Parent(x_{i-1}, x_i) & \text{if } i > 0. \end{cases}$$
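To make the two synchronous chains concrete, the following is a minimal Python sketch of how the compiled form of SG could be evaluated for a query with the first argument bound. It is an illustration under assumptions, not the evaluation method of this article: the function names (`step`, `sg_answers`), the representation of EDB relations as sets of pairs, and the sample data are all hypothetical. Each loop iteration handles one value of i: the Parent-chain is extended by one step, Sibling is applied, and Child is then applied i times.

```python
def step(relation, values):
    """Follow one edge of a binary relation from every value in `values`."""
    return {b for (a, b) in relation if a in values}


def sg_answers(parent, sibling, child, x0, max_depth=None):
    """Collect all y0 with SG(x0, y0), one term of the union per iteration.

    Without max_depth the loop only stops when the Parent-chain dies out,
    so cyclic Parent data needs an explicit bound.
    """
    answers = set()
    xs = {x0}   # candidate values of x_i: reachable from x0 by i Parent steps
    i = 0
    while xs and (max_depth is None or i <= max_depth):
        ys = step(sibling, xs)      # Sibling(x_i, y_i)
        for _ in range(i):          # Child^i: the Child-chain has the same length i
            ys = step(child, ys)
        answers |= ys
        xs = step(parent, xs)       # extend the Parent-chain by one predicate
        i += 1
    return answers


# Hypothetical EDB relations, stored as sets of (first, second) argument pairs.
parent = {("ann", "bob"), ("cat", "dan")}
sibling = {("bob", "dan"), ("ann", "eve")}
child = {("dan", "cat")}

print(sorted(sg_answers(parent, sibling, child, "ann")))  # prints ['cat', 'eve']
```

Here 'eve' comes from the term with i = 0 and 'cat' from the term with i = 1, in which one Parent step is matched by exactly one Child step.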
It is easy to compile the SG-recursion into a chain form because of the simplicity of the variable connections in the recursion. However, it is challenging to compile complex linear recursions, such as the recursion {(P_0), (P_1)}.
A (x, , x4), WY, 1, C(z19 Y,, zI ), WY,,
a, z,>, F(Y
4 u, b).
□
The goal of this article is to develop an algorithm which automatically discovers the expansion regularity and generates highly regular compiled forms for complex linear recursions.
The article is organized as follows. In Section 2, a variable connection graph-matrix, the V-matrix, is introduced. A V-matrix may consist of 1 or more units depending on the variable connectivity in the matrix. The expansion regularity of the recursions with single-unit V-matrices is studied in Section 3. The automatic generation of compiled forms for such recursions is studied in Section 4. The expansion regularity and the generation of compiled forms for linear recursions with multiple-unit V-matrices are studied in Section 5. The transformation of the compiled forms into a normalized recursion is presented in Section 6. The strengths and limitations of the expansion-based compilation method are discussed in Section 7 and the study is summarized in Section 8.

2. VARIABLE CONNECTION MATRIX: THE V-MATRIX
Like many researchers, we assume that a deductive database consists of an EDB (a set of database relations) and an IDB (a set of predicates defined by function-free Horn-clause rules). A deduction rule is of the form "P_0 ← P_1, P_2, ..., P_k", where P_i (0 ≤ i ≤ k) is of the form p_i(x_1, ..., x_n) where p_i is a predicate name, x_i (1
recursion is a program consisting of one linearly recursive rule and one or more nonrecursive (exit) rules.
The following assumptions are made in our discussion:
Assumption 1. All the nonrecursive predicates are EDB predicates.
Assumption 2. There is no constant appearing in a predicate in the rule.
Assumption 3. There is only one (default) nonrecursive rule (E_0) in the recursion with the head predicate R(x_1, ..., x_n).
(E_0) R(x_1, ..., x_n) ← E_0(x_1, ..., x_n).
It is easy to see that the first and third assumptions can be adopted without loss of generality. Assumption 2 confines the rules to be constant-free. Obviously, a constant in a nonrecursive predicate can be eliminated in preprocessing by performing selection using the constant and then projecting off the column containing the constant. A constant in a recursive predicate may affect the expansion behavior, which will be discussed in Section 7.
Definitions. For a linear recursive rule with the head predicate R(x_1, ..., x_n), the 0th expansion of R is defined as a tautological rule,
R(x_1, ..., x_n) ← R(x_1, ..., x_n).
The first expansion of R is the recursive rule of R. The kth expansion of R (k > 1) is the unification of the recursive rule of R with the (k - 1)st expansion of R. The kth expanded exit rule of R (k ≥ 0), denoted as E_k(x_1, ..., x_n), is the unification of the kth expansion with the exit rule (E_0). The compiled form of R is the union of all its expanded exit rules.
The expansion behavior of a recursion is closely related to the variable connections among its predicates.
Definitions. Two predicates in the body of a rule are connected if they share variable(s) with each other or with a set of connected predicates. Two nonrecursive predicates in the body of a rule are U-connected if they share variable(s) with each other or with a set of U-connected predicates. A set of variables are U-connected if they are in the same nonrecursive predicate or in the same set of U-connected (nonrecursive) predicates. The variables of a recursive rule can be partitioned based on their U-connections.
In order to study the expansion behavior of a linear recursive rule, a variable connection graph-matrix, the V-matrix, is constructed based on the variable connections of a linear recursive rule.
Definition. The variable connection
graph-matrix, V-matrix, for a linear recursive rule of arity n (the arity of the head predicate) consists of a sequence of rows. Each row consists of n columns with the ith one corresponding to the ith argument position of the recursive predicate. Moreover, there are possibly U-connection edges between some columns in a row. The contents of an initial V-matrix reflect the variable connection information in the corresponding arguments in the original recursive rule. Its expansions reflect similar information in the expanded recursive rules.
The initial V-matrix, which consists of the first 2 rows (row [0] and row [1]) of the V-matrix, is constructed according to the following V-matrix initialization rules, while the remaining rows, if any, are constructed based on the V-matrix expansion rules to be presented in the next section.
V-matrix initialization rules
A V-matrix is initialized in the following 4 steps:
(1) partition the variables in the rule according to the U-connections (each partition is called a U-connected set);
(2) copy the variables in the recursive predicate in the head and the body to the corresponding columns in row [0] and row [1] respectively;
(3) replace the variable at each column of row [1], say x, by the set of distinguished variables U-connected with x, if any; and
(4) set up a U-connection edge between each pair of columns in the corresponding row if the pair of columns are in row [0] and contain U-connected distinguished variables, or if they are in row [1] and contain U-connected nondistinguished variables.
Example 2. The initial V-matrices of the recursive rules (A_1) to (G_1) are shown in Fig. 1(a-g).
(A_1) R(x) ← A(x, x_1), R(x_1).
(B_1) R(x) ← A(x, x'), R(x_1).
(C_1) R(x, y) ← A(x, y_1), R(x_1, y_1), B(x_1, y).
(D_1) R(x, y, z) ← A(x, y), R(x_1, z, z_1), B(x_1, z_1).
(F_1) R(x, y) ← A(x, x_1, y), B(y, y_1), R(x_1, y_1).
(G_1) R(x, y, w, t, z, u, v) ← R(y, x_1, t, w_1, z_1, u_1, u_1), A(x, x_1, t), B(w, w_1), C(z), D(u, v, u_1).
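Steps (1) to (3) of the initialization rules can be sketched in Python as follows. This is only an illustrative sketch under assumptions: the encoding of a rule as a tuple of head variables, a list of nonrecursive predicates and the argument tuple of the recursive body predicate is hypothetical, as are the function names, and step (4) (the U-connection edges) is deliberately left out. U-connected sets are computed with a small union-find over the arguments of the nonrecursive predicates.

```python
def u_connected_sets(variables, nonrec):
    """Step (1): partition `variables`; two variables fall in the same set when
    a chain of nonrecursive predicates links them."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for _name, args in nonrec:
        for a in args[1:]:
            parent[find(a)] = find(args[0])  # all arguments of one predicate are U-connected

    groups = {}
    for v in variables:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def initial_row1(head_vars, nonrec, rec_args):
    """Steps (2) and (3): row [1] holds, per column, the set of distinguished
    variables U-connected with the body variable, or the variable itself if none."""
    variables = set(head_vars) | set(rec_args)
    for _name, args in nonrec:
        variables |= set(args)
    group_of = {v: g for g in u_connected_sets(variables, nonrec) for v in g}
    row1 = []
    for v in rec_args:
        dv = group_of[v] & set(head_vars)
        row1.append(frozenset(dv) if dv else v)
    return row1


# Hypothetical encodings of rules (C_1), (D_1) and (F_1) of Example 2.
C1 = (("x", "y"), [("A", ("x", "y1")), ("B", ("x1", "y"))], ("x1", "y1"))
D1 = (("x", "y", "z"), [("A", ("x", "y")), ("B", ("x1", "z1"))], ("x1", "z", "z1"))
F1 = (("x", "y"), [("A", ("x", "x1", "y")), ("B", ("y", "y1"))], ("x1", "y1"))

for name, rule in (("C_1", C1), ("D_1", D1), ("F_1", F1)):
    print(name, initial_row1(*rule))
```

In this sketch a column that keeps a nondistinguished variable is represented by the plain variable name, while a replaced column is a frozenset of distinguished variables.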
For rule (A_1), x_1 in row [1] is replaced by x as shown in Fig. 1(a) because there is a U-connected set {x, x_1}. For (B_1), x_1 in row [1] is retained as shown in Fig. 1(b) because x_1 is not in the U-connected set {x, x'}. For (C_1), x_1 is replaced by y, and y_1 by x in row [1] as shown in Fig. 1(c) because the rule has 2 U-connected sets: {x, y_1} and {x_1, y}. For (D_1), since it has 3 U-connected sets {x, y}, {z} and {x_1, z_1}, row [1] [x_1, z, z_1] is retained, and there are 2 U-connection edges, one between columns 1 and 2 in row [0] and the other between columns 1 and 3 in row [1] as shown in Fig. 1(d). For (F_1), since there is 1 U-connected set {x, x_1, y, y_1}, x_1 and y_1 in row [1] should be replaced by the set of distinguished variables {x, y}, and moreover, there is 1 U-connection edge between columns 1 and 2 in row [0] as shown in Fig. 1(f). Finally, (G_1) contains 6 U-connected sets {x, x_1, t}, {w, w_1}, {z}, {y}, {z_1} and {u, v, u_1}. Thus for the variable vector [y, x_1, t, w_1, z_1, u_1, u_1] of row [1], x_1 and t are replaced by {x, t}, w_1 by w, and u_1 by {u, v}. Moreover, there are 2 U-connection edges in row [0], one between columns 1 and 4 and the other between columns 6 and 7 as shown in Fig. 1(g). □
A V-matrix can be partitioned into 1 or more unit V-matrices based on the connections among matrix columns.
Definitions. Two columns of a V-matrix are connected if the 2 columns in the initial V-matrix share a variable or a set of U-connection edges with each other or with a set of connected columns. A set of connected columns form a unit V-matrix. A linear recursive rule whose V-matrix consists of only one unit is a single-unit rule; otherwise, it is a multiple-unit rule.
In Example 2, the first 5 rules, (A_1) to (F_1), are single-unit rules, while the sixth one, (G_1), is a multiple-unit rule which consists of 3 units headed by (i.e. in row [0]) [x, y, w, t], [z] and [u, v], respectively.

3. EXPANSIONS OF SINGLE-UNIT LINEAR RECURSIVE RULES
We study the expansions of corresponding V-matrices for single-unit linear recursive rules.
Example 3. The recursive rules (A_1) to (F_1) in Example 2 are expanded to (A_2) to (F_2) respectively in the second expansion.
(A_2) R(x) ← A(x, x_1), A(x_1, x_2), R(x_2).
(B_2) R(x) ← A(x, x'), A(x_1, x_1'), R(x_2).
(C_2) R(x, y) ← A(x, y_1), B(x_2, y_1), R(x_2, y_2), A(x_1, y_2), B(x_1, y).
(D_2) R(x, y, z) ← A(x, y), A(x_1, z), R(x_2, z_1, z_2), B(x_2, z_2), B(x_1, z_1).
(F_2) R(x, y) ← A(x, x_1, y), B(y, y_1), A(x_1, x_2, y_1), B(y_1, y_2), R(x_2, y_2).
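The second expansions listed above can be reproduced mechanically from the definition of the kth expansion. The sketch below is a hedged illustration, not part of the original method: the atom encoding `(name, args)`, the function name `expand`, and the renaming scheme (a nondistinguished variable v introduced at level j is written `v#j`, so `x1#2` plays the role of x_2) are all assumptions made for this example. At each level the head variables of a fresh copy of the rule are unified with the arguments of the current recursive body predicate.

```python
def expand(head, nonrec, rec, k):
    """Return (nonrecursive body atoms, recursive body atom) of the kth expansion
    of the linear recursive rule  head <- nonrec, rec."""
    head_vars = head[1]
    body = []
    current = head                      # 0th expansion: R(x...) <- R(x...)
    for level in range(1, k + 1):
        # Unify the head of a fresh rule copy with the current recursive atom.
        subst = dict(zip(head_vars, current[1]))

        def rename(v, subst=subst, level=level):
            # Distinguished variables take the unified value; every other
            # variable gets a fresh name tagged with the expansion level.
            return subst[v] if v in subst else f"{v}#{level}"

        body += [(p, tuple(rename(v) for v in args)) for p, args in nonrec]
        current = (rec[0], tuple(rename(v) for v in rec[1]))
    return body, current


# Rule (C_1): R(x, y) <- A(x, y1), R(x1, y1), B(x1, y).
head = ("R", ("x", "y"))
nonrec = [("A", ("x", "y1")), ("B", ("x1", "y"))]
rec = ("R", ("x1", "y1"))

body, rec2 = expand(head, nonrec, rec, 2)
for atom in body + [rec2]:
    print(atom)
```

Up to the order of the body predicates and the naming of fresh variables, the printed atoms correspond to (C_2): A(x, y_1), B(x_1, y), A(x_1, y_2), B(x_2, y_1) and R(x_2, y_2).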
If each recursive rule generated at the second expansion is treated as an original recursive rule, row [2] of each V-matrix can be constructed in the same way as row [1], using the V-matrix initialization rules presented in Section 2. For example, row [2] in Fig. 2(a) is [x] because x_2 is in the U-connected set {x, x_1, x_2}, and x is a distinguished variable. Row [2] in Fig. 2(b) is [x_2] because x_2 is not U-connected to the distinguished variable x in (B_2). Row [2] in Fig. 2(c) is [x, y] because there are 2 U-connected sets in (C_2): {x, x_2, y_1} and {x_1, y, y_2}, and x_2 can be replaced by x, and y_2 by y.
[Figure 1: panels (a)-(d), (f) and (g), one initial V-matrix per rule, each showing row [0] and row [1].]
Fig. 1. The initial V-matrices of rules (A_1) to (G_1).
Row [2] in Fig. 2(d) is [x_2, z, z_2] because there are 3 U-connected sets in (D_2): {x, y}, {x_1, z, z_1}, and {x_2, z_2}; the second column z_1 should be replaced by the distinguished variable z, and there should be a U-connection edge between x_2 and z_2. Finally, row [2] in Fig. 2(f) is [{x, y}, {x, y}] because there is only 1 U-connected set {x, y, x_1, y_1, x_2, y_2}.
Interestingly, row [2] can be derived not only from the second expansion of the recursive rule but also from the initial V-matrix, i.e. from row [0] and row [1]. For example, Fig. 2(a) indicates that if a distinguished variable x at row [0] derives the same x at row [1], it will derive the same x at row [2]. Figure 2(b) indicates that if a distinguished variable x at row [0] derives a nondistinguished variable x_1 at row [1], it will derive a new nondistinguished variable, such as x_2, at row [2]. Figure 2(c) indicates that if a distinguished variable x at row [0] derives another distinguished variable y at row [1], the same x will derive the same y from row [1] to row [2]. Using the same rules observed in Fig. 2(b and c), [x_2, z_1, z_2] should be obtained for row [2] in Fig. 2(d). However, since the copy of U-connection edges from row [1] to row [2] and from row [0] to row [1] makes z and z_1 U-connected, z_1 in row [2] is replaced by the distinguished variable z, and row [2] becomes [x_2, z, z_2]. Similarly, row [2] for Fig. 2(f) is [{x, y}, {x, y}] because both x and y derive the set {x, y}. □
The above example shows that new rows of a V-matrix can be generated from its initial V-matrix by a set of V-matrix expansion rules, and the generated rows reflect the U-connectivities of the corresponding expanded recursive rules.
Definition. A variable y is a derivative of a distinguished variable x in a V-matrix if y is derived by x, that is, y and x are at the same column in the V-matrix but y's row number is x's row number + 1.
In general, the V-matrix expansion rules can be summarized as follows, where the row NewRow (= LastRow + 1) is generated from the row LastRow of the V-matrix.
V-matrix expansion rules
(1) (Row generation) For each distinguished variable x in V-matrix[LastRow, i], add x's derivatives to V-matrix[NewRow, i].
(2) (U-connection propagation) The U-connection edges are copied from LastRow to NewRow and then from LastRow - 1 to LastRow. If such copying makes a distinguished variable x U-connected to the set of variables in V-matrix[NewRow, i], x is added to the set of variables in V-matrix[NewRow, i].
[Figure 2: panels (a)-(d) and (f), each showing rows [0], [1] and [2] of a V-matrix.]
Fig. 2. V-matrices of the 5 rules at the second expansion.
[Figure 3: rows [0], [1], ..., [k], [k + 1] of a V-matrix.]
Fig. 3. V-matrix expansion corresponding to rule expansions.
Lemma 1. Each row of the V-matrix generated by the above V-matrix expansion rules correctly registers the set of distinguished variables U-connected to each column of the recursive predicate in the body at each expansion.
Proof. According to the V-matrix construction rules, the initial V-matrix registers: (i) the set of distinguished variables U-connected to each argument position in the recursive predicate in the body; (ii) the U-connections among the distinguished variables in the head predicate of the recursive rule; and (iii) the U-connections among the nondistinguished variables in the recursive predicate in the body. Suppose in the recursive rule (R_1), x_i, the ith variable of the head predicate, is U-connected to y_j, the jth variable of the recursive predicate in the body. Notice that the subscript of a variable indicates its argument position in the recursive predicate. Suppose x_j is U-connected to y_p at the kth expansion as shown in (R_k). (R_k) can be rewritten as (R_k') by variable renaming (for resolution). By resolution, it is obvious that x_i is U-connected to z_p at the (k + 1)st expansion, (R_{k+1}).
(R_1) R(..., x_i, ..., x_j, ..., x_p, ...) ← A(x_i, y_j), R(..., y_i, ..., y_j, ..., y_p, ...), ....
(R_k) R(..., x_i, ..., x_j, ..., x_p, ...) ← B(x_j, y_p), R(..., y_i, ..., y_j, ..., y_p, ...), ....
(R_k') R(..., y_i, ..., y_j, ..., y_p, ...) ← B(y_j, z_p), R(..., z_i, ..., z_j, ..., z_p, ...), ....
(R_{k+1}) R(..., x_i, ..., x_j, ..., x_p, ...) ← A(x_i, y_j), B(y_j, z_p), R(..., z_i, ..., z_j, ..., z_p, ...), ....
This U-connection can be obtained by the V-matrix expansion rule (1) (Fig. 3). In the initial V-matrix, x_j derives x_i from row [0] to row [1]. The pth column of row [k] contains x_j. Then the pth column of row [k + 1] should be x_i according to rule (1), which is also derivable from the expanded rule (R_{k+1}). Similarly, if there were no expression A in (R_1), the jth column should be y_j in row [1]. Then the pth column in row [k + 1] should be a nondistinguished variable from the V-matrix expansion rule (1) or from the expanded rule. Similarly, other distinguished variables U-connected with x_i will play the same role. Thus the V-matrix expansion rule (1) is correct.
The correctness of the V-matrix expansion rule (2) (U-connection propagation) is proved as follows. The initial V-matrix sets up the U-connection edges for the U-connections between distinguished variables and for those between nondistinguished variables. Suppose in the initial V-matrix, the ith and jth (distinguished) variables in the head, x_i and x_j, are U-connected via A(x_i, x_j), the nondistinguished variables in the jth and kth columns in the recursive predicate in the body, x_j' and x_k', are U-connected via B(x_j', x_k'), and the variable at the ith column in the recursive predicate in the body and the kth variable in the head, x_i' and x_k, are U-connected via C(x_i', x_k), as shown in (S_1).
(S_1) R(..., x_i, ..., x_j, ..., x_k, ...) ← A(x_i, x_j), B(x_j', x_k'), C(x_i', x_k), R(..., x_i', ..., x_j', ..., x_k', ...), ....
At the second expansion (S_2), the newly set U-connections make x_k U-connected with x_i', x_j', x_k' and x_i''. This verifies the correctness of the V-matrix expansion rule (2). Similarly, we can prove its correctness for other kinds of single-unit recursive rules.
(S_2) R(..., x_i, ..., x_j, ..., x_k, ...) ← A(x_i, x_j), B(x_j', x_k'), C(x_i', x_k), A(x_i', x_j'), B(x_j'', x_k''), C(x_i'', x_k'), R(..., x_i'', ..., x_j'', ..., x_k'', ...), ....
Table 1. Stable levels and periods of the recursions A_1 to F_1
Rule   S   T
A_1    0   1
B_1    1   0
C_1    0   2
D_1    1   1
F_1    0   1
The U-connection information in the original recursive rule is fully registered in the initial V-matrix. Based on the above reasoning, we can conclude that the V-matrix expansion rules register completely the U-connection information for each column relevant to the set of distinguished variables. □
In principle, a V-matrix can be expanded infinitely by following the V-matrix expansion rules. However, it is easy to observe that starting at a certain expansion, future expansions of a V-matrix will repeat the patterns of the existing rows in the V-matrix.
Definitions. The DV-set of a column is the set of all the distinguished variables U-connected to the variable(s) in the column. Two rows, row [i] and row [j], in a V-matrix are identical (denoted as row [i] ≡ row [j]) if each pair of their corresponding columns has the same DV-set.
Notice that if a column in a row of a V-matrix contains a distinguished variable x, the DV-set of the column is the maximal set of U-connected distinguished variables containing x in any column of the V-matrix. In Fig. 2(a and f), row [1] ≡ row [0] (notice that the 2 rows are identical in Fig. 2(f) since x and y are U-connected via a U-connection edge in row [0]). Similarly, row [2] ≡ row [1] in Fig. 2(b and d) and row [2] ≡ row [0] in Fig. 2(c). Clearly, if row [j] ≡ row [i] (j > i), all the expansions starting at row [j] repeat the expansions starting at row [i]. That is,
Lemma 2. In a single-unit V-matrix, if row [i] ≡ row [j], and i
Proof. According to the V-matrix expansion rules, all the rows in a single-unit V-matrix share the same initial V-matrix and follow the same V-matrix expansion rules in their expansions. Therefore, if two rows contain the same set of distinguished variables in their corresponding columns, their expanded rows should still contain the same set of distinguished variables in the corresponding columns. That is, if row [i] ≡ row [j], then row [i + 1] ≡ row [j + 1]. By induction, row [j + k] ≡ row [i + k] for any k > 0. □
Based on such regularity of V-matrix expansions, the stable level and the period of a V-matrix are defined below.
Definitions. If starting at row S, there exists a T such that the row of a single-unit V-matrix repeats at every T more expansions, that is, row [S + k × T] ≡ row [S] for all k > 0, then S is called the stable level and the smallest T the period of the V-matrix. If row [S] contains no distinguished variables, T is defined as 0.
For example, the stable levels and periods of the recursions A_1 to F_1 are shown in Table 1. In general, the following algorithm is presented for the expansion of a single-unit V-matrix and the derivation of its stable level S and the period T.
Algorithm 1. The expansion of a single-unit V-matrix and the derivation of its stable level S and
the period T.
Input. An initial single-unit V-matrix.
Output. An expanded V-matrix, the stable level S and the period T.
Method.
begin
    LastRow := 0; CurrentRow := 1;
    while not RowRepeating(CurrentRow, ExistingRow) do
    begin
        LastRow := CurrentRow; CurrentRow := CurrentRow + 1;
        /* Generate the contents of the CurrentRow. */
        for each column i do   /* Every column in CurrentRow is initially empty. */
            for each distinguished variable x in V-matrix[LastRow, i] do
                Add x's derivatives to V-matrix[CurrentRow, i];
        /* U-connection propagation. */
        Copy the U-connections from LastRow to CurrentRow;
        Copy the U-connections from LastRow - 1 to LastRow;
        for each column i do
            for each x in V-matrix[CurrentRow, i] do
                if x is U-connected to a distinguished variable y which is not already in V-matrix[CurrentRow, i]
                then Add y to V-matrix[CurrentRow, i] and remove, if any, nondistinguished variables there
    end;
    S := ExistingRow;
    if there is no distinguished variable in CurrentRow
    then T := 0
    else T := CurrentRow - ExistingRow
end.
Notice that RowRepeating is a Boolean function which returns true if there is an ExistingRow, where 0 ≤ ExistingRow < CurrentRow, such that row[ExistingRow] ≡ row[CurrentRow]. That is:
function RowRepeating(CurrentRow, var ExistingRow): Boolean;
begin
    ExistingRow := CurrentRow - 1;
    repeat
        if row[CurrentRow] ≡ row[ExistingRow]
        then return(true)
        else ExistingRow := ExistingRow - 1
    until ExistingRow < 0;
    return(false)
end.
□
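The row-repetition test and the assignment of S and T at the end of Algorithm 1 can be illustrated with a short Python sketch. It covers only that part of the algorithm: the rows are assumed to be supplied already, each as a list of DV-sets, and the row generation and U-connection propagation steps are not implemented. The names `stable_level_and_period` and `dv`, and the hand-written rows (taken from the discussion of Fig. 2), are hypothetical choices for this illustration.

```python
def stable_level_and_period(rows):
    """rows[k] is the list of DV-sets (frozensets) of row [k], for k = 0, 1, 2, ...

    Scans the rows in order and stops at the first row that is identical to an
    earlier one, searching the earlier rows from the most recent down to row [0]
    as RowRepeating does. Returns (S, T), or None if no row repeats.
    """
    for current in range(1, len(rows)):
        for existing in range(current - 1, -1, -1):
            if rows[current] == rows[existing]:
                has_distinguished = any(rows[current])   # an empty DV-set is falsy
                period = current - existing if has_distinguished else 0
                return existing, period
    return None


def dv(*names):
    """Shorthand for a DV-set."""
    return frozenset(names)


# DV-sets of rows [0], [1], [2] as described for Fig. 2 (entered by hand).
rows_C = [[dv("x"), dv("y")], [dv("y"), dv("x")], [dv("x"), dv("y")]]
rows_D = [[dv("x", "y"), dv("x", "y"), dv("z")], [dv(), dv("z"), dv()], [dv(), dv("z"), dv()]]
rows_B = [[dv("x")], [dv()], [dv()]]

print(stable_level_and_period(rows_C))  # prints (0, 2)
print(stable_level_and_period(rows_D))  # prints (1, 1)
print(stable_level_and_period(rows_B))  # prints (1, 0)
```

Representing each row purely by its DV-sets makes row identity a plain equality test, which is why nondistinguished variables can be dropped without affecting S and T.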
Algorithm 1 generates almost the same matrices for Example 3 as Fig. 2 except that it does not generate new nondistinguished variables from nondistinguished variables. That is, row [2] of Fig. 2(b) is [∅] instead of [x_2], and row [2] of Fig. 2(d) is [∅, z, ∅] instead of [x_2, z, z_2]. This simplifies the book-keeping effort and does not influence the derived stable levels and periods.
Theorem 1. In a single-unit recursive rule of arity n, the expansion of its V-matrix terminates at or before the nth iteration. That is, S + T

4. AUTOMATIC GENERATION OF COMPILED FORMS FOR SINGLE-UNIT LINEAR RECURSIONS
Since the variable connectivity of an expanded V-matrix corresponds to the variable connectivity of the expanded recursive rules, the regularity of the expansions of a V-matrix corresponds to the regularity of the expansions of the corresponding recursion. When a single-unit V-matrix reaches its stable level, the corresponding linear recursion reaches its stable stage, where a linear recursion is stable at the Sth expansion if: (1) it generates no new results for any EDBs by further expansions (bounded [20]); or (2) it adds to the body of an expanded rule the same set of EDB predicates U-connected to the same set of distinguished variables at every further T expansions (periodicity).
Lemma 3. The recursion becomes stable at level S with period T, where S and T are derived from the V-matrix expansion.
Proof. An expanded V-matrix represents the connectivities of the distinguished variables of its corresponding expanded linear recursion. Starting at the Sth expansion, row [S] ≡ row [S + k × T] (k > 0). This indicates that starting at the Sth expansion, the connections of the distinguished variables repeat at every further T expansions. Therefore, the recursion becomes stable at the level S with period T. □
Therefore, S is also called the stable level of the recursion, and T the period of the recursion. The Sth expanded recursive rule is called the stable rule of the linear recursion.
Definitions. A chain of length k (k > 1) is a sequence of k predicates with the following properties: (1) all k predicates have the same name, say p, and the lth p of the chain is denoted as p_(l); (2) there is at least 1 shared variable in every 2 consecutive predicates, and if the ith variable in the first predicate is identical with the jth variable in the second, then (i, j) is an invariant of the chain in the sense that the ith variable of p_(l) is identical with the jth variable of p_(l+1) for every l where 1 ≤ l ≤ k - 1. Each predicate of the chain is called a chain predicate. A chain predicate may consist of a sequence of connected nonrecursive predicates.
Definitions. A linear recursion is a chain recursion, or more precisely, an n-chain recursion if for any positive integer K, there exists a kth expansion of the recursion consisting of one chain (when n = 1) or n synchronous (of the same length) chains (when n > 1) each with the length greater than K, and possibly some other predicates which do not form a chain. It is a single-chain recursion when n = 1, or a multi-chain recursion when n > 1. A recursion is bounded if it is equivalent to a set of nonrecursive rules.
Lemma 4. A single-unit linear recursion is bounded if its period T = 0.
Proof. According to the definition of the period of a V-matrix (hence a single-unit linear recursion), the period T = 0 indicates that at the stable level S, row [S] contains no distinguished variables.
Variables at the further expansions of the row are U-connected only to the nondistinguished variables which have no U-connections with the distinguished ones. Such U-connections cannot contribute new answers for the distinguished variables. Thus the recursion is bounded. □
For example, the recursion formed by (B_1) and the default exit rule (E_0) is bounded since its period T = 0.
Lemma 5. The number of potential chains of a single-unit linear recursion is the number of distinct DV-sets in the row [S + T] of its V-matrix, where S is the stable level, and T is the period of the V-matrix.
Proof. Let ω be the DV-set of column C of row [S + T] in the V-matrix of a single-unit linear recursion. If the Cth variable v_C in the recursive predicate of the (S + T)th expansion is the same distinguished variable as in the head predicate, the column generates no chains in the expansions since
the distinguished variables at this column are linked to the recursive predicates directly via no nonrecursive predicates. Otherwise, the variable v_C must be U-connected to ω via a set of nonrecursive predicates. This set of nonrecursive predicates or a portion of it will grow repeatedly at every T further expansions, which forms a chain. If column C' of row [S + T] in the V-matrix has the same DV-set ω, it will share the same chain with the column C since the 2 columns are U-connected at the (S + T)th expansion. Thus, we have the lemma. □
Example 4. In Examples 2 and 3, (A_1) has one potential chain since its row [1] (S + T = 0 + 1 = 1) [x] has one distinct set of distinguished variables; (C_1) has 2 potential chains since its row [2] (S + T = 0 + 2 = 2) [x, y] has 2 distinct sets of distinguished variables; (D_1) has 1 potential chain since its row [2] (S + T = 1 + 1 = 2) [x_2, z, z_2] has 1 distinct set of distinguished variables; and (F_1) has 1 potential chain since its row [1] (S + T = 0 + 1 = 1) [{x, y}, {x, y}] has one distinct set of distinguished variables. □
We examine the generation of a chain predicate for each chain. Each chain predicate which corresponds to a distinct DV-set is a set of U-connected predicates which repeats at every T expansion(s) starting at the Sth expansion and U-connects a DV-set to its corresponding columns in the recursive predicate at the (S + T)th expansion.
A chain predicate is formed by the following process. First, the set of nonrecursive predicates generated from the (S + 1)st expansion to the (S + T)th expansion is taken as the candidate set of chain predicate(s) since the same set of predicates repeats at every T further expansions. However, there could be two problems for such a candidate set:
(1) a set of predicates in the candidate set corresponding to a distinct DV-set may not be U-connected together;
(2) some predicates in the candidate set may not be U-connected to any set of distinguished variables.
Such predicates cannot be part of chain predicates and therefore should be replaced by the corresponding predicates in the previous expansions. Thus, a set of U-connected predicates in the replaced set which are also U-connected to a distinct DV-set is the chain predicate for the DV-set.

Secondly, the variables in the chain predicate should be indexed appropriately and renamed when necessary. The variables not shared with any predicate outside of the chain should be ignored in the presentation since they play no role in the information exchange among different chain elements during iterative processing. Other variables should be indexed properly to reflect the information passing during iterative processing. Let the sets of variables in the recursive predicate at the Sth and the (S + T)th expansions be the S-set and the ST-set, respectively. For each variable in a chain predicate, the variable in the ST-set should have the same name (renaming when necessary) as the corresponding variable (i.e. the one at the same column) in the S-set but with the index increased by one. If a variable in the ST-set happens to appear also in the S-set, it should keep the same name and index in the new set of variables. Renaming and indexing of a variable should be performed consistently for every occurrence of the variable in the recursive rules in the (S + T)th expansion. Therefore, we have the following rules for chain generation.

Chain generation rules

The generation of chain-predicates for nonnull chains from the recursive rules in the (S + T)th expansion consists of the following 3 steps.

(1) Take the set of nonrecursive predicates generated from the (S + 1)st expansion to the (S + T)th expansion as the candidate set of the chain predicates.

(2) Replace the predicates in the candidate set which are not U-connected to any set of distinguished variables, or which correspond to a distinct DV-set but are not U-connected together, by their corresponding predicates in the previous expansions. This makes the predicates corresponding to a DV-set U-connect together. Each chain is a set of predicates in the replaced set corresponding to a distinct DV-set.

(3) Rename and index the variables in the (S + T)th expansion. Ignore the variables not shared with any predicate outside of the chain. For the remaining variables in the chain predicate, rename and index them when necessary to make each variable in the ST-set have the same
name as the corresponding variable in the S-set but with the index increased by one. If a variable in the ST-set appears also in the S-set, it should keep the same name and index in the new set of variables. Renaming and indexing of a variable should be performed consistently for every occurrence of the variable. □

Lemma 6. Following the chain generation rules, the chain predicates and their associated variable names and indices are generated correctly.

Proof. Based on the analysis performed before the presentation of the chain generation rules, each chain so generated consists of U-connected predicates, corresponds to a distinct DV-set, and repeats at every T expansions. Also, the variables are consistently (for each occurrence) renamed (when necessary) and appropriately indexed to reflect the information passing at different rounds of expansions. Therefore, the chain generation rules are correct. □

Example 5. We examine the chain generation for the recursions in Examples 2 and 3.

The chain predicate for (A_1) is A(x_{i-1}, x_i) since A(x, x_1) is the only nonrecursive predicate at the first expansion (S + T = 0 + 1 = 1), which corresponds to the only DV-set {x}; x_1 is in the ST-set and x is in the S-set.

The 2 chains for (C_1) are “AB(x_{i-1}, x_i)” and “BA(y_{i-1}, y_i)”, where “AB(x, x_1) ← A(x, t), B(x_1, t)” and “BA(y, y_1) ← A(t, y_1), B(t, y)”. This is because S + T = 0 + 2 = 2, “A(x, y_1), B(x_2, y_1)” corresponds to the DV-set {x}, and “A(x_1, y_2), B(x_1, y)” to {y}. Also, the S-set {x, y} and the ST-set {x_2, y_2} can be renamed and indexed accordingly.

The chain for (D_1) is AB(z_{i-1}, z_i) where “AB(z, z_1) ← A(t, z), B(t, z_1)”. This is because S = 1, T = 1, and the candidate set is “A(x_1, z), B(x_2, z_2)”. However, B(x_2, z_2) is not U-connected to the DV-set in (D_1) and is thus replaced by the corresponding B-predicate, B(x_1, z_1), of the first expansion. Thus, the chain is “A(x_1, z), B(x_1, z_1)”, and the corresponding variables in the S-set and the ST-set are {z} and {z_1}. Thus z → z_{i-1} and z_1 → z_i, and x_1 is ignored in the chain predicate AB.

The chain for (F_1) is AB(x_{i-1}, y_{i-1}, x_i, y_i) where “AB(x, y, x_1, y_1) ← A(x, x_1, y), B(y, y_1)”. This is because S = 0, T = 1, and the candidate set “A(x, x_1, y), B(y, y_1)” is the chain (U-connected and corresponding to the DV-set {x, y}). Also, renaming and indexing lead to x → x_{i-1}, y → y_{i-1} in the S-set, and x_1 → x_i, y_1 → y_i in the ST-set. □

According to the above discussion, the compiled form of a single-unit linear recursion can be generated automatically by extracting chain predicate(s) from the (S + T)th expansion. The algorithm is as follows.

Algorithm 2. Generation of the compiled form for a single-unit linear recursion.
Input. A linear recursion R, its stable level S and period T.

Output. The compiled form of the recursion.

Method.

Case 1: T = 0. The recursion is bounded and the compiled form is the union of the expanded exit rules from the 0th to the Sth expansions. That is:

$$R(x_1, x_2, \ldots, x_n) = E_0(x_1, x_2, \ldots, x_n) \cup E_1(x_1, x_2, \ldots, x_n) \cup \cdots \cup E_S(x_1, x_2, \ldots, x_n).$$

Case 2: T ≠ 0. The compiled form for the recursion can be generated as follows. If a recursion contains only null chain predicates (trivially true), it is bounded and its compiled form is the union of the kth expanded exit rules for 0 ≤ k ≤ S + T − 1. That is:

$$R(x_1, x_2, \ldots, x_n) = E_0(x_1, x_2, \ldots, x_n) \cup E_1(x_1, x_2, \ldots, x_n) \cup \cdots \cup E_{S+T-1}(x_1, x_2, \ldots, x_n).$$
Otherwise, the recursion is a single- or multiple-chain recursion with the following compiled form:

$$R = SS \cup \left( \bigcup_{i=0}^{\infty} (MM, CC^i, TT) \right),$$

which consists of 4 portions: (i) prestable exit rule portion (SS); (ii) miscellaneous portion (MM); (iii) chain-portion (CC); and (iv) stable exit rule portion (TT).

The SS-portion represents the rule expansion before reaching its stable stage, which is the union of E_k for k from 0 to S − 1 if S > 0, or empty otherwise. That is:

$$SS = \bigcup_{k=0}^{S-1} E_k(x_1, x_2, \ldots, x_n).$$

The TT-portion consists of the bodies of the exit rules contributing to the period of the recursion, which is formed by the union from E_0 to E_{T-1}. That is:

$$TT = \bigcup_{k=0}^{T-1} E_k(\ldots, u_i, \ldots, x_i, \ldots).$$

The chain-portion CC consists of a set of nonnull chains in the exponential form with the same exponent i. Each nonnull chain predicate is generated following the chain generation rules and is in the form of A(x_{i-1}, x_i), where A is the chain predicate, and x_{i-1} and x_i are connection variable vectors. The formula consists of a set of unions starting from i = 0 to infinity. The variable indices outside of the chain predicates should be set accordingly based on i = 1.

Finally, the miscellaneous portion, MM, if any, is composed of the predicate(s) left in the (S + T)th expansion, i.e. those not used in the formation of the chain predicate(s). □

Theorem 2. Algorithm 2 generates correct compiled forms for single-unit linear recursions.

Proof. From Lemma 4, if T = 0, the recursion is bounded. Its (S + 1)st expanded rule is absorbed by the Sth expansion. Obviously, the union of the expanded exit rules from the 0th to the Sth expansion generated by Algorithm 2 is the correct compiled form. In the case of T ≠ 0, if the set of the variables at the (S + T)th expansion is the same as the set of distinguished variables of the head predicate, the corresponding potential chain is trivially true, which is the null chain-predicate. If a recursion contains only null chain predicates, the recursion must be bounded. Thus the formula generated by Algorithm 2 is correct. Otherwise, it is easy to show that for any integer K ≥ 0, the Kth expanded exit rule is included in the compiled form generated by Algorithm 2. When 0 ≤ K < S, the Kth expanded exit rule is in the SS-portion. When K ≥ S, let j = (K − S) div T and h = (K − S) mod T (0 ≤ h ≤ T − 1). The term (CC^j, E_h, MM) in the compiled form is the Kth expanded exit rule, where E_h is in the TT-portion, and the chain portion is CC with the period T (i.e.
every (j × T) expansions form a new chain), according to Lemma 6. The predicates which do not form part of the chain, if any, form the miscellaneous portion, MM. The MM-portion will not grow along the expansions and thus should not be in the exponential form. Since all the generated predicates are included in the compiled form, and the formula does not contain those not generated by the expansions, the theorem is proved. □

Example 6. According to Algorithm 2, the recursions (A_1) through (F_1) in Example 2 generate the following compiled forms.

For (A_1), S = 0 and T = 1. Its SS-portion is ∅, its TT-portion is “E_0(x_i)”, and its MM-portion is trivially true. It has one potential chain with the chain-predicate “A(x_{i-1}, x_i)”. Therefore, its compiled form is (A_c).

$$R(x_0) = \bigcup_{i=0}^{\infty} (A^i(x_{i-1}, x_i), E_0(x_i)). \tag{$A_c$}$$

For (B_1), S = 1 and T = 0. The recursion is bounded, and the compiled form is (B_c), the union of the exit rule and the first expanded exit rule.

$$R(x) = E_0(x) \cup E_1(x) = E_0(x) \cup (A(x, x_1), E_0(x_1)). \tag{$B_c$}$$
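As a reading aid for a single-chain compiled form such as (A_c), the sketch below computes all x for which R(x) holds by applying the chain predicate backwards from the exit relation, one chain step per round. It is a minimal illustration and not the authors' evaluation method: the function name `eval_single_chain`, the in-memory set representation and the sample relations are assumptions introduced here.

```python
def eval_single_chain(chain, exit_rel):
    """Return all x with R(x), where R(x0) = U_i (A^i(x_{i-1}, x_i), E0(x_i)).

    chain:    set of pairs (x_prev, x_next), standing for the chain predicate A
    exit_rel: set of values satisfying the exit predicate E0
    (Both relations are hypothetical sample inputs.)
    """
    # i = 0: no chain step is taken, so every E0 value is an answer.
    answers = set(exit_rel)
    delta = set(exit_rel)
    while delta:
        # One more chain step: x_prev is an answer whenever
        # A(x_prev, x_next) holds for a newly found answer x_next.
        delta = {prev for (prev, nxt) in chain if nxt in delta} - answers
        answers |= delta
    return answers


A = {(1, 2), (2, 3), (3, 4), (5, 6)}
E0 = {4}
print(sorted(eval_single_chain(A, E0)))  # [1, 2, 3, 4]
```

Only the newly derived values are joined with the chain in each round, so the loop also terminates on cyclic chain relations.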
Fig. 4. The expansions of the V-matrix of rule (G_1).
For (C_1), S = 0 and T = 2. Its SS-portion is ∅, its TT-portion is “E_0(x_i, y_i) ∪ E_1(x_i, y_i)”, and its MM-portion is trivially true. It has 2 chains with the chain-predicates extracted from (C_1), i.e. “AB(x, x_1) ← A(x, t), B(x_1, t)” and “BA(y, y_1) ← A(t, y_1), B(t, y)”. The compiled form is (C_c).

$$R(x_0, y_0) = \bigcup_{i=0}^{\infty} (AB^i(x_{i-1}, x_i), BA^i(y_{i-1}, y_i), (E_0(x_i, y_i) \cup E_1(x_i, y_i))). \tag{$C_c$}$$

For (D_1), S = 1 and T = 1. Its SS-portion is “E_0(x, y, z)”, and its TT-portion is “E_0(u, z_i, v)”. It has 1 chain with the chain predicate “AB(z_0, z_1) ← A(t, z_0), B(t, z_1)”. Its MM-portion is “A(x, y), B(u, v)”. Thus the compiled form is (D_c).

For (F_1), S = 0 and T = 1. Its SS-portion is ∅, its TT-portion is “E_0(x_i, y_i)”, and its MM-portion is trivially true. It has one chain with the chain predicate “AB(x, y, x_1, y_1) ← A(x, x_1, y), B(y, y_1)”. Thus the compiled form is (F_c).

$$R(x_0, y_0) = \bigcup_{i=0}^{\infty} (AB^i(x_{i-1}, y_{i-1}, x_i, y_i), E_0(x_i, y_i)). \tag{$F_c$}$$ □

5. COMPILATION OF MULTIPLE-UNIT LINEAR RECURSIONS
The V-matrix expansion technique for compiling single-unit linear recursions can be generalized to multiple-unit linear recursions. Although the V-matrix expansion rules and the V-matrix expansion algorithm for single-unit linear recursions can be applied directly to multiple-unit linear recursions, doing so may lead to a relatively large number of expansions. Suppose there are k units in the recursion. The minimum number of expansions to reach the row-repeating stage in the V-matrix should be the least common multiple of the periods of the units, that is, lcm(T_1, T_2, …, T_k), where T_j (for j from 1 to k) is the period of the jth unit. For example, if a V-matrix consists of 3 unit matrices V_1, V_2 and V_3, with S_1 = S_2 = S_3 = 0, T_1 = 5, T_2 = 6 and T_3 = 7, it will take T = lcm(5, 6, 7) = 210 expansions to find a repeating row in the combined V-matrix. Obviously, this is undesirable. Fortunately, since each V-matrix unit reaches its own stable state independent of other units, the stability of a recursion can be determined by the examination of each unit independent of others. For this example, if the 3 unit V-matrices are expanded separately, it will take only 7 expansions to detect the regularity of the expansion patterns and compile the multiple-unit rule.

Definition. A linear recursion whose recursive rule corresponds to a multiple-unit V-matrix reaches a stable stage at the Sth expansion if for every unit V-matrix V_i, either no new results can be generated from this unit on any EDBs (i.e. bounded for this unit), or every further T_i expansions add to the body of the rule the same set of EDB predicates connected to the same set of variables in the unit.

Example 7. The expansion regularity of the recursion with the recursive rule (G_1) of Example 2 can be obtained by the examination of each V-matrix unit. The V-matrix consists of 3 units, V_1 headed by (i.e. in row [0]) [x, y, w, t], V_2 by [z] and V_3 by [u, v]. According to Algorithm 1, V_1 has S_1 = 0 and T_1 = 2; V_2 has S_2 = 1 and T_2 = 0; and V_3 has S_3 = 0 and T_3 = 1 as shown in Fig. 4. Therefore, their combined stable level should be S = maximum(S_1, S_2, S_3) = 1, with T_1 = 2, T_2 = 0 and T_3 = 1. □
In general, the algorithm for the expansion of a multiple-unit V-matrix is summarized below.

Algorithm 3. The expansions of a multiple-unit V-matrix.

Input. An initial V-matrix V which is partitioned into k unit V-matrices, V_1, …, V_k.

Output. A stable level S of the V-matrix and the period T_i (1 ≤ i ≤ k) for each unit V-matrix V_i.

Method. For each unit V-matrix V_i, derive its S_i and T_i based on Algorithm 1. Then S = maximum(S_1, …, S_k), and each unit V_i maintains its own period T_i. □
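The arithmetic behind the combined stable level can be sketched as follows, together with the least common multiple of the nonzero periods, which is what a single combined V-matrix would have to wait for. This is only an illustrative sketch: the function names `lcm` and `combine_units` and the representation of each unit as an (S_i, T_i) pair are assumptions made here, and the per-unit values are taken as given rather than derived by Algorithm 1.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def combine_units(units):
    """units: list of (S_i, T_i) pairs, one per unit V-matrix.

    Returns (S, T): S is the maximum stable level, and T is the lcm of
    the nonzero periods (0 if every unit is bounded).
    """
    stable_level = max(s for s, _ in units)
    nonzero_periods = [t for _, t in units if t != 0]
    common_period = reduce(lcm, nonzero_periods) if nonzero_periods else 0
    return stable_level, common_period


# Units of Example 7: (S_1, T_1) = (0, 2), (S_2, T_2) = (1, 0), (S_3, T_3) = (0, 1).
print(combine_units([(0, 2), (1, 0), (0, 1)]))  # (1, 2)
# Units with periods 5, 6 and 7: a combined V-matrix repeats only after 210 expansions.
print(combine_units([(0, 5), (0, 6), (0, 7)]))  # (0, 210)
```

Keeping the per-unit periods separate, as Algorithm 3 does, avoids ever expanding the combined matrix up to the common period.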
Theorem 3. If the V-matrix of a multiple-unit recursive rule consists of k unit matrices, with the ith unit V-matrix having the stable level S_i and the period T_i, then the expansion becomes stable at the Sth expansion where S = maximum(S_1, …, S_k), and each unit V_i maintains its own period T_i.

Proof. Suppose S_i < S_j, where S_i is the stable level of the ith unit V-matrix V_i, and S_j is that of the jth unit V-matrix V_j. At the S_i-th expansion, V_i stabilizes. For V_i, we have row [S_i] = row [S_i + T_i]. According to Lemma 2, row [S_i + (S_j − S_i)] = row [S_i + T_i + (S_j − S_i)]. That is, for V_i, row [S_j] = row [S_j + T_i]. Therefore, at the S_j-th expansion, V_i still repeats at every T_i further expansions. At the Sth expansion, where S = maximum(S_1, …, S_k), every unit V-matrix repeats with its corresponding period. Thus we have the theorem. □

From Theorems 1 and 3, we have

Corollary. In a multiple-unit linear recursion, the number of expansions to reach the stable stage is less than the arity of its recursive predicate.

Similarly, compiled forms can be generated for multiple-unit linear recursions. We examine an example.

Example 8. In recursion G_1, S = 1, T_1 = 2, T_2 = 0 and T_3 = 1. G_1 can be viewed as if there were 3 independent single-unit linear recursions: G_11, G_12 and G_13.

(G_11) R_1(x, y, w, t) ← R_1(y, x_1, t, w_1), A(x, x_1, t), B(w, w_1).
(G_12) R_2(z) ← R_2(z_1), C(z).
(G_13) R_3(u, v) ← R_3(u_1, u_1), D(u, v, u_1).
Each unit is compiled independently. For G_11, with the header “R_1(x, y_0, w, t_0)”, it has 2 chains with chain predicates “AB(y_{i-1}, w_{i-1}, y_i, w_i)” and “BA(s_{i-1}, t_{i-1}, s_i, t_i)” respectively, where

AB(y_1, w_1, y_2, w_2) ← A(y_1, y_2, w_1), B(w_1, w_2).
BA(s_1, t_1, s_2, t_2) ← A(s_1, s_2, t_2), B(t_1, t_2).

By aligning to S = 1, its TT-portion is “E_01(y_i, s_i, t_i, w_i) ∪ E_11(y_i, s_i, t_i, w_i)”, its SS-portion is “E_01(x, y_0, w, t_0)”, and its MM-portion is “A(x, s_0, t_0), B(w, w_0)”. Therefore, its aligned compiled form is (G_c1).

$$R_1(x, y_0, w, t_0) = E_{01}(x, y_0, w, t_0) \cup \bigcup_{i=0}^{\infty} (A(x, s_0, t_0), B(w, w_0), AB^i(y_{i-1}, w_{i-1}, y_i, w_i), BA^i(s_{i-1}, t_{i-1}, s_i, t_i), (E_{01}(y_i, s_i, t_i, w_i) \cup E_{11}(y_i, s_i, t_i, w_i))). \tag{$G_{c1}$}$$

For G_12, it is a bounded recursion with R_2 = E_0 ∪ E_1 as shown in (G_c2).

$$R_2(z) = E_{02}(z) \cup (E_{02}(z_1), C(z)). \tag{$G_{c2}$}$$

For G_13, S_3 = 0, T_3 = 1, and it has 1 chain with the chain predicate “D'(u_{i-1}, v_{i-1}, u_i, v_i)”, where

D'(u_{i-1}, v_{i-1}, u_i, v_i) ← D(u_{i-1}, v_{i-1}, u_i), (v_i = u_i).
Notice that the chain predicate contains 2 distinguished variables. However, the predicate D consists of only 3 arguments. Thus an auxiliary variable vi is introduced to the predicate with the
constraint “v_i = u_i”. Moreover, its TT-portion is “E_03(u_i, v_i)”, its SS-portion is ∅, and its MM-portion is trivially true. Therefore, its compiled form is (G_c3).
The general compiled form for recursion G_1 can be derived by merging the three independent compiled forms. Since S = 1, T_1 = 2, T_2 = 0 and T_3 = 1, all the independent compiled forms should be aligned to a common S and T in order to generate a combined compiled form. Clearly, S should be the common S, and T should be the least common multiple of the nonzero T_i's (for i from 1 to the number of units). For this example, T = lcm(2, 1) = 2. (G_c1) is in the aligned form already. (G_c2) is a bounded recursion, which has the same form for any expansion greater than S. (G_c3) is aligned to (G'_c3).

$$R_3(u, v) = E_{03}(u, v) \cup \bigcup_{i=0}^{\infty} (D'(u, v, u_0, v_0), DD^i(u_{i-1}, v_{i-1}, u_i, v_i), (E_{03}(u_i, v_i) \cup E_{13}(u_i, v_i))), \tag{$G'_{c3}$}$$

where DD(u_{i-1}, v_{i-1}, u_i, v_i) ← D'(u_{i-1}, v_{i-1}, s, t), D'(s, t, u_i, v_i).

Therefore, the combined compiled form for recursion G_1 should be (G_c), where the SS-portion is E_0, the TT-portion is E_0 ∪ E_1, the chain portion consists of the AB-chain, the BA-chain and the DD-chain, and the MM-portion consists of “A(x, s_0, t_0), B(w, w_0), C(z), D'(u, v, u_0, v_0)”.

$$R(x, y_0, w, t_0, z, u, v) = E_0(x, y_0, w, t_0, z, u, v) \cup \bigcup_{i=0}^{\infty} (A(x, s_0, t_0), B(w, w_0), C(z), AB^i(y_{i-1}, w_{i-1}, y_i, w_i), BA^i(s_{i-1}, t_{i-1}, s_i, t_i), D'(u, v, u_0, v_0), DD^i(u_{i-1}, v_{i-1}, u_i, v_i), (E_0(y_i, s_i, t_i, w_i, z_1, u_i, v_i) \cup E_1(y_i, s_i, t_i, w_i, z_2, u_i, v_i))). \tag{$G_c$}$$ □

In general, we have

Algorithm 4. Generation of the compiled form for a multi-unit linear recursion.
Input. A multiple-unit linear recursion R, its stable level S and the period T_i (1 ≤ i ≤ k) for each unit V_i.

Output. The compiled form of the recursion.

Method.

Step 1. For each unit V_i, generate its compiled form R_i according to Algorithm 2.

Step 2. Generate the aligned compiled form for each unit V_i based on the common stable level S and the common period T, where S = maximum(S_1, …, S_k) and T = lcm(T_1, …, T_k), taken over the nonzero periods.

Step 3. Merge the multiple aligned compiled forms into one combined compiled form in which:
(i) the SS-portion consists of the union of E_0 to E_{S-1} if S > 0, or is empty otherwise;
(ii) the TT-portion consists of the union of the E_i's for i from 0 to T − 1;
(iii) the chain-portion consists of all the nonnull chains, with each chain predicate determined within its unit and then aligned for merging. All the chain predicates are in the exponential form with the same exponent i, and each variable connected to the set of distinguished variables is in the form of x_{i-1} for a distinct x, and that connected to the set of variables in the recursive predicate is in the form of x_i; and
(iv) the MM-portion consists of the predicates at the (S + T)th expansion which do not participate in the chain predicates. □

Theorem 4. Algorithm 4 generates correct compiled forms for multiple-unit linear recursions.

Proof. Step 1, the generation of the compiled form for each component unit, follows Algorithm 2, whose correctness has been proved by Theorem 2. Step 2, the generation of the aligned compiled form for each unit based on the common stable level S and the common period T, is correct since S = maximum(S_1, …, S_k), T = lcm(T_1, …, T_k), and the aligned compiled form simply rewrites the original form by a different grouping. Step 3 merges the multiple aligned compiled forms into one combined compiled form. Its SS-portion consists of the union of all the SS-portions up to the common stable level S. Its TT-portion consists of the union of the E_i's for i from 0 to T − 1. Its chain-portion consists of all the nonnull chains, and its MM-portion consists of the predicates at the (S + T)th expansion. Therefore, the algorithm generates correct compiled forms for multiple-unit linear recursions. □

6. FROM COMPILED FORM TO LINEAR NORMAL FORM: A RULE REWRITING PROCESS
To facilitate the evaluation of compiled linear recursions by the Magic Sets method or other rule rewriting techniques, the V-matrix-based compilation can be viewed alternatively as a query-independent rule rewriting process, which transforms a complex linear recursion into a normalized linear recursion.
Definitions. A recursion is in linear normal form (LNF) if it consists of a set of exit rules and at most 1 normalized recursive rule in the form of (N_1), where x_i and y_i (for 1 ≤ i ≤ n) are variable vectors, and each C_i (for 1 ≤ i ≤ n) is a chain predicate. Notice that a chain-predicate C_i for some i may be null in the sense that there is no C_i predicate and y_i = x_i. The normalization of a linear recursion is the process of transforming a linear recursion into its equivalent LNF.

(N_1) P(x_1, x_2, …, x_n) ← C_1(x_1, y_1), C_2(x_2, y_2), …, C_n(x_n, y_n), P(y_1, y_2, …, y_n).
The compiled form of a linear recursion is either a bounded or an n-chain recursion. A compiled bounded recursion is already in LNF since it consists of only nonrecursive rules. A compiled n-chain recursion can be written into LNF based on the following LNF transformation rules.

LNF transformation rules

The compiled form of an n-chain recursion consists of 4 portions: (i) prestable exit rule portion (SS); (ii) miscellaneous portion (MM); (iii) chain-portion (CC); and (iv) stable exit rule portion (TT). It corresponds to an LNF composed of the following set of rules.

(1) A set of prestable exit rules in the form of “R ← S”, where S is a disjunct in the SS-portion of the compiled form;
(2) one auxiliary rule in the form of “R ← M, P”, where M is the set of predicates in the MM-portion which do not share variables with the TT-portion at the ith expansion, and P is an auxiliary predicate;
(3) one normalized linear recursive rule in the form of (N_1), where C_1, …, C_n is the set of chain predicates which form the CC-portion of the compiled form; and
(4) a set of stable exit rules in the form of “P ← M', T”, where T is a disjunct in the TT-portion of the compiled form, and M' is the set of remaining predicates in the MM-portion.

Theorem 5. A compiled n-chain recursion can be transformed into LNF based on the above LNF transformation rules.

Proof. According to the compilation process presented in Sections 4 and 5, the compiled form of an n-chain recursion consists of the four portions: SS, MM, CC and TT, in the form of

$$R = SS \cup \left( \bigcup_{i=0}^{\infty} (MM, CC^i, TT) \right),$$

where CC^i is in the form of “C_1^i, C_2^i, …, C_n^i” for an n-chain recursion. Based on the equivalence between logic and relational expressions [11], the SS-portion is equivalent to a set of prestable rules; the set of predicates in the MM-portion, M, if not sharing variables with the TT-portion, should be isolated from the recursive predicate, thus we have “R ← M, P” where P is an auxiliary predicate; the remaining portions (CC^i, TT, M') should be grouped together to define P; and
obviously, the exit-portion of P should be TT and M'. Thus the compiled form is equivalent to the above four sets of the transformed rules. □

The LNF transformation is illustrated in the following 2 examples.
Example 9. From Example 6, the compiled form of the recursion {(D_0), (D_1)} of Example 2 is (D_c).

$$R(x, y, z_0) = E_0(x, y, z_0) \cup \bigcup_{i=0}^{\infty} (A(x, y), B(u, v), AB^i(z_{i-1}, z_i), E_0(u, z_i, v)). \tag{$D_c$}$$

Of the 2 predicates in the MM-portion, A(x, y) does not share variables with the TT-portion E_0(u, z_i, v), but B(u, v) does. Following the LNF transformation rules, the compiled form can be transformed into a set of rewritten rules {(D_0), (D'_1), (P_0), (P_1)} in which the only recursive rule is (P_1), a normalized 1-chain rule.

(D_0) R(x, y, z) ← E_0(x, y, z).
(D'_1) R(x, y, z) ← A(x, y), P(z).
(P_0) P(z) ← E_0(u, z, v), B(u, v).
(P_1) P(z) ← AB(z, t), P(t). □
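The rewritten rules of Example 9 can be exercised on a small database to see that they derive the same facts as the original recursion. The following is a hedged sketch only: the function names `eval_lnf` and `eval_original`, the set-of-tuples representation and the sample relations are assumptions made for illustration, and the original recursive rule is taken to be R(x, y, z) ← A(x, y), R(x_1, z, z_1), B(x_1, z_1) with exit relation E_0.

```python
def eval_lnf(A, B, E0):
    """Evaluate the rewritten rules (D_0), (D'_1), (P_0), (P_1) bottom-up."""
    # Chain predicate: AB(z, z1) <- A(t, z), B(t, z1).
    AB = {(z, z1) for (t, z) in A for (t2, z1) in B if t == t2}
    # (P_0): P(z) <- E0(u, z, v), B(u, v).
    P = {z for (u, z, v) in E0 if (u, v) in B}
    delta = set(P)
    while delta:
        # (P_1): P(z) <- AB(z, t), P(t), applied to the newest facts only.
        delta = {z for (z, t) in AB if t in delta} - P
        P |= delta
    # (D_0): R <- E0, and (D'_1): R(x, y, z) <- A(x, y), P(z).
    return set(E0) | {(x, y, z) for (x, y) in A for z in P}


def eval_original(A, B, E0):
    """Naive fixpoint of R(x, y, z) <- A(x, y), R(x1, z, z1), B(x1, z1)."""
    R = set(E0)
    while True:
        new = {(x, y, z) for (x, y) in A
               for (x1, z, z1) in R if (x1, z1) in B} - R
        if not new:
            return R
        R |= new


# Hypothetical sample relations.
A = {("a", "b"), ("t1", "c")}
B = {("t1", "d"), ("u", "v")}
E0 = {("u", "d", "v")}
print(eval_lnf(A, B, E0) == eval_original(A, B, E0))  # True
print(("a", "b", "c") in eval_lnf(A, B, E0))          # True
```

In the normalized version the recursion runs over the unary predicate P only, while A(x, y) is joined once at the end, which mirrors the separation of the MM-portion from the chain.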
Example 10. The compiled form of the recursion {(G_0), (G_1)} is (G_c).

$$R(x, y_0, w, t_0, z, u, v) = E_0(x, y_0, w, t_0, z, u, v) \cup \bigcup_{i=0}^{\infty} (A(x, s_0, t_0), B(w, w_0), C(z), AB^i(y_{i-1}, w_{i-1}, y_i, w_i), BA^i(s_{i-1}, t_{i-1}, s_i, t_i), D'(u, v, u_0, v_0), DD^i(u_{i-1}, v_{i-1}, u_i, v_i), (E_0(y_i, s_i, t_i, w_i, z_1, u_i, v_i) \cup E_1(y_i, s_i, t_i, w_i, z_2, u_i, v_i))). \tag{$G_c$}$$

Since all the 4 predicates in the MM-portion, A(x, s_0, t_0), B(w, w_0), C(z) and D'(u, v, u_0, v_0), do not share variables with the TT-portion at the ith expansion, they should be presented in the auxiliary rule. Thus, the recursion can be transformed into a set of rewritten rules {(G_0), (G'_1), (P_0), (P_1)} in which the only recursive rule is (P_1), a normalized 3-chain rule.

(G_0) R(x, y_0, w, t_0, z, u, v) ← E_0(x, y_0, w, t_0, z, u, v).
(G'_1) R(x, y_0, w, t_0, z, u, v) ← A(x, s_0, t_0), B(w, w_0), C(z), D'(u, v, u_0, v_0), P(y_0, w_0, s_0, t_0, u_0, v_0).
(P_0) P(y, w, s, t, u, v) ← E_0(y, s, t, w, z_1, u, v); E_1(y, s, t, w, z_2, u, v).
(P_1) P(y, w, s, t, u, v) ← AB(y, w, y_1, w_1), BA(s, t, s_1, t_1), DD(u, v, u_1, v_1), P(y_1, w_1, s_1, t_1, u_1, v_1). □

7. STRENGTH AND LIMITATIONS OF EXPANSION-BASED COMPILATION
The expansion-based compilation method compiles complex linear recursions into highly regular compiled forms, or alternatively, linear normal forms, which provide precise chain connection information in the form of relational expressions or normalized rules for detailed query analysis and optimization. Moreover, it provides extra strength on some bindings which are difficult to capture otherwise.

Capture of more bindings for efficient query evaluation
Regular binding propagation techniques, such as magic rule rewriting, can propagate sufficient binding information into recursive rules for many recursions. However, they may encounter difficulties for certain recursions.

Example 11. We examine the binding propagation of the magic rule rewriting method [2, 11] for a query “?- R(a, y, c)” on the linear recursion defined by the rule set {(D_0), (D_1)}, where a and c are constants. Following the binding propagation rules [11] and the Magic rule rewriting method [2, 3], the goal node is adorned as R^{bfb}. The bindings in the adorned goal node are propagated to the subgoal R in the body of the recursive rule, resulting in an adorned subgoal node R^{fbf}, whose bindings are in turn propagated to the subgoal R in the body of the recursive rule at the next expansion, resulting in R^{fff}. Obviously, R^{fff} cannot propagate any binding information to the body of the rule. The adorned rules (D'_1) and (D'_12) represent such binding propagation, where (D'_1) is an adorned rule of (D_1) with respect to the adorned goal node R^{bfb}, and (D'_12) is an adorned rule unifying directly (without renaming variables or modifying adornments) with (D'_1) at the second expansion.

(D'_1) R^{bfb}(x, y, z) ← A^{bf}(x, y), R^{fbf}(x_1, z, z_1), B^{bb}(x_1, z_1).
(D'_12) R^{fbf}(x_1, z, z_1) ← A^{fb}(x_1, z), R^{fff}(x_2, z_1, z_2), B^{bb}(x_2, z_2).
Since there is no binding information which can be passed further to the subgoal R in (D'_12), the Magic Set involves the entire relation A. Clearly, the binding propagation cannot reduce the set of data to be examined in the semi-naive evaluation. The subgoals in the body of the rule may be reordered in search for better rewritten rules [11]. It is easy to verify that every reordering which avoids the generation of the uninstantiated subgoal R^{fff} will introduce uninstantiated subgoals, which require iterative processing of some uninstantiated (entire) data relation.

However, based on the V-matrix expansion-based compilation, a highly regular compiled form (D_c) is derived. According to its LNF presented in Example 9, the constants a and c in the query “?- R(a, y, c)” can be easily pushed into the auxiliary rule (D'_1) of Example 9 as “A(a, y), P(c)”. P(c) is a closed query which can be evaluated fairly efficiently by an existence-checking algorithm [6], that is, the evaluation terminates the first time P(c) is evaluated to true. □

Towards quantitative analysis of recursive queries
The highly regular compiled forms derived from the expansion-based compilation facilitate quantitative analysis and evaluation of deductive queries.

First, when a complex recursion is compiled to a bounded or a chain form, it is straightforward to select an appropriate algorithm from a set of candidate query processing algorithms. A bounded recursion requires a nonrecursive query processing algorithm [23]. A single-chain recursion requires a partial transitive closure algorithm [24, 25]. A multi-chain recursion can be evaluated by Magic Sets, Counting [1] or other proposed methods [4, 7, 8, 26]. Many application-oriented recursion problems, though in complex forms, can be compiled into single-chain forms and be evaluated by partial transitive closure algorithms. Compilation makes Counting applicable to complex linear recursions and facilitates multi-way counting [6] for complex queries.

Secondly, the compiled forms provide precise chain connection information which facilitates the derivation of flexible and efficient processing plans for complex queries. Without compilation, it is difficult to tell how many chains there are in a recursion R(x_1, …, x_n) and which distinguished variables belong to which chain(s). Compilation makes explicit the role of each variable in a recursive predicate. Suppose a query provides instantiations on both x_i and x_j. If x_i and x_j are at the same end of a chain, both instantiations should be pushed in for better selection. However, if x_i and x_j belong to different chains or are at different ends of a chain, their selectivities should be compared and, possibly, only the more selective one should be pushed in for efficient evaluation.

Example 12. Let x and yr_birth_x (x's birth year) be at one end of a chain and y at the other end in a recursion Ancestor(x, yr_birth_x, y). Consider the query: find those in the DB group who were born after 1960 and who have some ancestor(s) born in Canada. A question mark “?” is put in front of x to indicate that only x is inquired in the query.

?-Ancestor(?x, yr_birth_x, y), DBgroup(x), yr_birth_x > 1960, Place_birth(y) = Canada.

Obviously, the instantiations DBgroup(x) and “yr_birth_x > 1960” should be pushed in but not “Place_birth(y) = Canada”. Moreover, since the user is interested only in x (not y), an existence-checking evaluation algorithm (find-first instead of find-all) should be used, and the search following each x terminates as soon as one qualifying ancestor of x is found. Such query binding analysis can be easily performed when the recursion is normalized. □
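The evaluation strategy described in Example 12 can be sketched as follows: the selections at the start end of the chain are applied first, and the chain is then followed from each remaining x with a find-first search. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the function names, the dictionary-based `parent`, `yr_birth` and `place_birth` relations and the sample people are all hypothetical.

```python
from collections import deque


def has_ancestor_where(person, parent, pred):
    """Existence check: stop at the first ancestor that satisfies pred."""
    seen = set()
    queue = deque(parent.get(person, ()))
    while queue:
        ancestor = queue.popleft()
        if ancestor in seen:
            continue
        seen.add(ancestor)
        if pred(ancestor):
            return True  # find-first: the remaining ancestors are not explored
        queue.extend(parent.get(ancestor, ()))
    return False


def answer_query(db_group, yr_birth, parent, place_birth):
    # Selections at the start end of the chain are pushed in before chain processing.
    starts = [x for x in db_group if yr_birth[x] > 1960]
    # The condition on y is only tested while following the chain.
    return [x for x in starts
            if has_ancestor_where(x, parent,
                                  lambda a: place_birth.get(a) == "Canada")]


# Hypothetical sample data.
parent = {"ann": ["bob", "cat"], "bob": ["dan"], "eve": ["fay"], "gus": ["dan"]}
yr_birth = {"ann": 1970, "eve": 1965, "gus": 1950}
place_birth = {"bob": "USA", "cat": "USA", "dan": "Canada", "fay": "France"}
print(answer_query(["ann", "eve", "gus"], yr_birth, parent, place_birth))  # ['ann']
```

Here "gus" is eliminated by the pushed-in selection before any chain step is taken, and the search from "ann" stops as soon as "dan" is reached.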
Thirdly, compilation facilitates constraint-based query evaluation. The above example shows the push of those constraints associated with the start end of a chain. Care should be taken for the constraints not associated with the start end. Although they can be used at the end of the chain
processing, it is beneficial to push them in as early as possible. This can be accomplished by the analysis of integrity constraints in the database.

Example 13. Let us examine the query: find John's ancestors who were born in the 19th century.

?-Ancestor(john, y), Yr_birth(y, yr_birth), yr_birth > 1800, yr_birth < 1900.

Obviously, “john” is more selective than “born in the 19th century” for the Parent-chain in Ancestor(x, y). The processing should start at the end where the constant “john” resides. Since John's ancestors reside at the other end of the chain, the query constraints “Yr_birth(y, yr_birth), yr_birth > 1800, yr_birth < 1900” should be treated differently. However, since parents are older than their children (an integrity constraint), that is, yr_birth of y is an argument whose values decrease monotonically along the chain processing, the query constraints at the other end of the chain can be selectively pushed into the chain for early constraint enforcement. The general rule for pushing query constraints at the other end of the chain is that if a constraint can be used to block the growth or shrinkage of the values of a monotonic argument, it (or its transformed form) can still be pushed into the chain for iterative processing. In the example, the constraint yr_birth > 1800 blocks the shrinkage of the values of the monotonic argument yr_birth. Therefore, “Person(y, yr_birth), yr_birth > 1800” can be pushed into the chain for iterative processing. However, the constraint yr_birth < 1900, which cannot block the shrinkage of the values of the argument, should not be pushed into the chain during the iterative processing. □

Integrity constraints can be discovered based on the characteristics of data and rules. The use of compiled forms and constraint information in the analysis of deductive queries leads to constraint-based query evaluation in deductive databases [27]. In general, a deductive query may contain a set of recursive and nonrecursive predicates. A query planner should consider the cost in the evaluation of both nonrecursive and recursive predicates.
By compilation of a linear recursion into chain forms, the cost of evaluating a recursive predicate can be estimated relatively precisely based on EDB statistics, the available access paths, the available query constraints, integrity constraints, types of queries, connections between EDB predicates within each compiled chain, and connections among recursive and nonrecursive predicates. The cost estimation can be performed when a query is submitted to the system. Cost estimation can be assisted by experimentation or based on the execution history of recursive predicates with different instantiations and constraints. Therefore, a sub-optimal query evaluation plan can be generated based on the execution space, cost model and different search strategies [23]. The query optimization process is very much like access plan generation in relational databases. Quantitative analysis can be performed relatively precisely on the compiled forms. Therefore, compilation is a powerful preprocessing technique for query evaluation in deductive databases.

Constants in linear recursions: relaxation of Assumption 2
Assumption 2 in Section 2 confines our discussion to constant-free linear recursions. However, this assumption can be easily relaxed. A set of rules with constants in nonrecursive predicates can be transformed into a set of equivalent constant-free rules by substituting those nonrecursive predicates with a set of new, constant-free predicates using selection(s) and/or projection(s) on the original ones. A constant in a recursive predicate may affect the expansion behavior of a recursion. There are several cases to be examined. Notice that it is difficult to find semantically meaningful linear recursions with constants in recursive predicates.

Case 1. Constants are at the same argument position in both the body and the head of the recursive predicate.

There are two subcases to be considered: (1) the two constants at the same argument position of the recursive predicate in the body and the head are the same; and (2) the two constants are not the same. In subcase (1), the recursion can be transformed into an equivalent, constant-free recursion by projecting off the constant from the recursive predicate. In subcase (2), the recursive rule is essentially a nonrecursive rule.
Example 14. Let a recursion Anc be defined as below,

    Anc(x, y, z) ← Parent(x, y), Birth_place(y, z).
    Anc(x, y, canada) ← Parent(x, w), Anc(w, y, canada).

It can be transformed into an equivalent, constant-free recursion,

    Anc(x, y, z) ← Parent(x, y), Birth_place(y, z).
    Anc(x, y, canada) ← Anc_1(x, y).
    Anc_1(x, y) ← Parent(x, w), Anc_1(w, y).
    Anc_1(x, y) ← Parent(x, y), Birth_place(y, canada).

On the other hand, a recursive rule below

    Anc(x, y, north_america) ← Parent(x, w), Anc(w, y, canada).

is equivalent to a nonrecursive rule since the constant north_america will not change back to canada, and no iterative processing can proceed. □
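The equivalence claimed in Example 14 can be checked on a small database with the following Python sketch. It uses a naive bottom-up fixpoint, which is an assumption chosen for brevity and not the evaluation method of the paper; the function names and the sample Parent and Birth_place facts are hypothetical.

```python
from typing import Callable, Set, Tuple

Pair = Tuple[str, str]


def fixpoint(step: Callable[[set], set]) -> set:
    """Naive bottom-up evaluation: apply `step` until no new facts appear."""
    facts: set = set()
    while True:
        new = step(facts) - facts
        if not new:
            return facts
        facts |= new


def exit_facts(parent: Set[Pair], birth_place: Set[Pair]) -> set:
    # Anc(x, y, z) <- Parent(x, y), Birth_place(y, z).
    return {(x, y, z) for (x, y) in parent for (y2, z) in birth_place if y == y2}


def original(parent: Set[Pair], birth_place: Set[Pair]) -> set:
    base = exit_facts(parent, birth_place)

    def step(anc: set) -> set:
        # Anc(x, y, canada) <- Parent(x, w), Anc(w, y, canada).
        rec = {(x, y, "canada") for (x, w) in parent
               for (w2, y, z) in anc if w == w2 and z == "canada"}
        return base | rec

    return fixpoint(step)


def transformed(parent: Set[Pair], birth_place: Set[Pair]) -> set:
    def step(anc1: set) -> set:
        # Anc_1(x, y) <- Parent(x, y), Birth_place(y, canada).
        base = {(x, y) for (x, y) in parent if (y, "canada") in birth_place}
        # Anc_1(x, y) <- Parent(x, w), Anc_1(w, y).
        rec = {(x, y) for (x, w) in parent for (w2, y) in anc1 if w == w2}
        return base | rec

    anc1 = fixpoint(step)
    # Anc(x, y, canada) <- Anc_1(x, y), plus the unchanged exit rule.
    return exit_facts(parent, birth_place) | {(x, y, "canada") for (x, y) in anc1}


parent = {("a", "b"), ("b", "c"), ("c", "d")}
birth_place = {("b", "usa"), ("c", "canada"), ("d", "canada")}
result_original = original(parent, birth_place)
result_transformed = transformed(parent, birth_place)
```

The recursive predicate Anc_1 in the transformed program carries only two arguments, which is the practical benefit of projecting off the constant.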
Case 2. A constant is in the head but not in the corresponding position in the body.

In this case, the first evaluation of the recursive rule coerces the value at that argument position of the recursive predicate to the same constant. Further evaluation is then similar to Case 1. Thus, it can still be transformed into an equivalent, constant-free recursion.

Example 15. Let the recursion Anc be defined as below,

    Anc(x, y, z) ← Parent(x, y), Birth_place(y, z).
    Anc(x, y, earth) ← Parent(x, w), Anc(w, y, z).

It can be transformed into an equivalent, constant-free recursion,

    Anc(x, y, z) ← Parent(x, y), Birth_place(y, z).
    Anc(x, y, earth) ← Anc_1(x, y).
    Anc_1(x, y) ← Parent(x, w), Anc_1(w, y).
    Anc_1(x, y) ← Parent(x, y), Birth_place(y, z). □
Case 3. A constant is in the recursive predicate in the body but not in the corresponding position in the head.

In this case, the rule confines the recursive predicate in further expansions to be the same constant at the argument position. Thus, the variable at the position can still be eliminated, and the recursion can still be transformed into an equivalent, constant-free recursion.

Example 16. Let the recursion R be defined as below,

    R(x, y, z) ← E(x, y, z).
    R(x, y, z) ← A(x, w, z), R(w, y, c).

It can be transformed into an equivalent, constant-free recursion,

    R(x, y, z) ← E(x, y, z).
    R(x, y, z) ← A(x, w, z), E(w, y, c).
    R(x, y, z) ← R_1(x, y), z = c.
    R_1(x, y) ← A(x, w, c), R_1(w, y).
    R_1(x, y) ← E(x, y, c). □
Limitations: an optimization problem

Our compilation method has a limitation on the further optimization of the compiled recursions. When a linear recursion is compiled into an n-chain recursion using the expansion-based
compilation algorithm, it is possible that the compiled form can sometimes be further optimized and simplified to an m-chain recursion for some m < n (it becomes a bounded recursion when m = 0). In this sense, the n-chain recursion obtained from the compilation process is an “at most” n-chain recursion. The conditions for bounded linear recursions presented in the expansion-based compilation algorithm are only sufficient conditions. In general, it is a difficult problem to further simplify the compiled recursion. We examine one such example.

Example 17. According to the expansion-based compilation method, the recursion {(A_1), (A_2)} is compiled into (A_c).

    (A_1)  R(x) ← A(x, x_1), R(x_1).
    (A_2)  R(x) ← A(x, x_1).

    (A_c)  $R(x) = \bigcup_{i=1}^{\infty} A^{i}(x_{i-1}, x_i)$.

However, a careful examination of the compiled form may discover that (A_c) is equivalent to a simpler formula (A_c'), a formula derived from the exit rule only.

    (A_c')  R(x) = A(x, x_1).
This is because the formula generated by the second expansion, “A(x, x_1), A(x_1, x_2)”, enforces a stronger restriction on the derivable values of x than the formula “A(x, x_1)” and thus cannot produce any new answers. This is also true for further expansions. This phenomenon was first discovered by Naughton and Sagiv [20]. In fact, if the predicate in the exit rule were different from the nonrecursive predicate in the recursive rule, such as “R(x) ← B(x, x_1)”, where A is not contained in B, the recursion would still be a typical single-chain recursion. However, A and B could consist of a sequence of conjunctive predicates. It is a well-known NP-complete problem to test the containment mapping for complex conjunctive queries [11, 20, 28]. □

Clearly, the compilation technique developed in this study cannot detect the boundedness for the above cases. However, it is an NP-complete or undecidable problem to test for containment mapping and perform further optimization on the compiled forms [20, 28]. Fortunately, without such optimization, regular recursive query evaluation algorithms still terminate and derive all the answers for such recursions in deductive databases. The inability to perform such further optimization is a limitation of our technique. However, it does not prevent the successful application of our compilation and query evaluation techniques to most linear recursive query processing problems.
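The boundedness observed in Example 17 can be seen concretely in a short Python sketch. The function names and the sample relation A are assumptions for illustration; the recursive rule is evaluated by a simple bottom-up iteration and compared with the exit rule alone.

```python
from typing import Set, Tuple


def evaluate_recursion(a: Set[Tuple[int, int]]) -> Set[int]:
    """Bottom-up evaluation of R(x) <- A(x, x1), R(x1) and R(x) <- A(x, x1)."""
    r = {x for (x, _) in a}  # exit rule
    while True:
        # Recursive rule: x qualifies if A(x, x1) holds for some x1 already in R.
        new = {x for (x, x1) in a if x1 in r} - r
        if not new:
            return r
        r |= new


def exit_rule_only(a: Set[Tuple[int, int]]) -> Set[int]:
    """R(x) = A(x, x1), i.e. the projection of A on its first argument."""
    return {x for (x, _) in a}


a = {(1, 2), (2, 3), (3, 3), (5, 1)}
full = evaluate_recursion(a)
bounded = exit_rule_only(a)
```

Every x produced by the recursive rule already appears as a first argument of A, so the iteration stops after its first pass without adding anything. The general evaluation still terminates with the correct answers; it only performs one redundant pass that a bounded form would avoid.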
8. CONCLUSIONS

A V-matrix expansion-based compilation method is developed which constructs a variable connection graph-matrix, the V-matrix, for a linear recursion and simulates recursions by V-matrix expansions. The regularity (stability and periodicity) of linear recursions can be discovered by V-matrix expansions. Based on such regularity, compiled forms in either bounded or highly regular chain forms can be automatically generated for complex linear recursions. Alternatively, a query-independent rule rewriting process can be applied to the compiled n-chain recursions to transform them into equivalent normalized n-chain recursions. The compilation of linear recursions into highly regular relational expressions facilitates the quantitative analysis of recursive queries and generation of efficient query evaluation plans. Moreover, some bindings which are difficult to capture by other techniques can be captured naturally by the expansion-based compilation technique. Therefore, the automatic generation of compiled forms for linear recursions is a powerful tool for the analysis and evaluation of complex linear recursions in deductive databases.

The V-matrix expansion-based compilation method has been implemented on the UNIX system at Simon Fraser University. The software, the V-compiler, is written in C++ using LEX and YACC. It generates highly regular forms for complex linear recursions.
Many database-oriented application examples belong to relatively simple recursions: recursions in chain rules, such as ancestor, supervisor, same-generation, etc. Nevertheless, there are many interesting application programs which involve complex recursions with arithmetic operations and function symbols, such as list processing, air-flight reservation, network traversing, inventory management, etc. [29]. Interestingly, a recursion with function symbols can be transformed into its function-free counterpart by function-predicate transformation [30]. Thus, the compilation technique developed here can also be applied to the compilation and evaluation of complex linear recursions with function symbols in deductive databases.

Our study is confined to linear recursions. The technique can be applied to a mutual recursion with a single resolution cycle (i.e. a single cycle in a dependency graph [31]). It is an interesting research issue to generalize the expansion-based compilation technique to recursions with multiple resolution cycles, such as multiple linear recursions [32, 33], nonlinear recursions [34] and complex mutual recursions in deductive databases.

Acknowledgements-The work was supported in part by the Natural Sciences and Engineering Research Council of Canada under operating grant A-3723 and a research grant from the Centre for Systems Science of Simon Fraser University. The authors would like to thank Tong Lu and Ju Wu for their implementation of the V-matrix-based compilation technique on the UNIX system. Also, the authors would like to express their thanks to Lawrence J. Henschen, Michael Kifer, Raghu Ramakrishnan, Kotagiri Ramamohanarao, Yehoshua Sagiv, Shalom Tsur and Carlo Zaniolo for their helpful discussions on the binding propagation problems.
REFERENCES

[1] F. Bancilhon, D. Maier, Y. Sagiv and J. D. Ullman. Magic Sets and other strange ways to implement Logic Programs. In Proc. 5th ACM Symp. Principles of Database Systems, Cambridge, MA, pp. 1-15 (1986).
[2] F. Bancilhon and R. Ramakrishnan. An amateur’s introduction to recursive query processing strategies. In Proc. 1986 ACM-SIGMOD Conf. Management of Data, Washington, DC, pp. 16-52 (1986).
[3] C. Beeri and R. Ramakrishnan. On the power of Magic. In Proc. 6th ACM Symp. Principles of Database Systems, San Diego, CA, pp. 269-283 (1987).
[4] F. Bry. Query evaluation in recursive databases: bottom-up and top-down reconciled. In Deductive and Object-Oriented Databases (Edited by W. Kim, J.-M. Nicolas and S. Nishio), pp. 25-44. North Holland (1990).
[5] D. Chimenti, R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur and C. Zaniolo. The LDL system prototype. IEEE Trans. Knowl. Data Engng 2(1), 76-90 (1990).
[6] J. Han. Multi-way counting method. Information Systems 14(3), 219-229 (1989).
[7] L. J. Henschen and S. Naqvi. On compiling queries in recursive first-order databases. J. ACM 31(1), 47-85 (1984).
[8] M. Kifer and E. L. Lozinskii. On compile-time query optimization in deductive databases by means of static filtering. ACM Trans. Database Syst. 15(3), 385-426 (1990).
[9] R. Ramakrishnan. Magic templates: a spellbinding approach to logic programs. In Proc. Int. Conf. Logic Programming, Seattle, WA, pp. 140-159 (1988).
[10] H. Seki. On the power of Alexander templates. In Proc. 8th ACM Symp. Principles of Database Systems, Philadelphia, PA, pp. 150-159 (1989).
[11] J. D. Ullman. Bottom-up beats top-down for datalog. In Proc. 8th ACM Symp. Principles of Database Systems, Philadelphia, PA, pp. 140-149 (1989).
[12] Y. E. Ioannidis. A time bound on the materialization of some recursively defined views. In Proc. 11th Int. Conf. Very Large Data Bases, Stockholm, Sweden, pp. 219-226 (1985).
[13] Y. Sagiv. Optimizing datalog programs. In Foundations of Deductive Databases and Logic Programming (Edited by J. Minker), pp. 659-698. Morgan-Kaufmann (1988).
[14] J. F. Naughton. One-sided recursions. In Proc. 6th ACM Symp. Principles of Database Systems, San Diego, CA, pp. 340-348 (1987).
[15] H. V. Jagadish, R. Agrawal and L. Ness. A study of transitive closure as a recursion mechanism. In Proc. 1987 ACM-SIGMOD Conf. Management of Data, San Francisco, CA, pp. 331-344 (1987).
[16] R. Agrawal and P. Devanbu. Moving selections into linear least-fixpoint queries. In Proc. 4th Int. Conf. Data Engineering, Los Angeles, CA, pp. 452-461 (1988).
[17] C. Youn, L. J. Henschen and J. Han. Classification of recursive formulas in deductive databases. In Proc. 1988 ACM-SIGMOD Conf. Management of Data, Chicago, IL, pp. 320-328 (1988).
[18] J. Han. Compiling general linear recursions by variable connection graph analysis. Computat. Intell. 5(1), 12-31 (1989).
[19] A. Al-Sukairi and L. J. Henschen. Query independent compilation of linear recursions. In Proc. 1990 Int. Conf. Software Engineering and Knowledge Engineering, Chicago, IL, pp. 177-182 (1990).
[20] J. F. Naughton and Y. Sagiv. A decidable class of bounded recursions. In Proc. 6th ACM Symp. Principles of Database Systems, San Diego, CA, pp. 214-226 (1987).
[21] L. Sterling and E. Shapiro. The Art of Prolog. MIT Press (1986).
[22] C. Beeri, P. Kanellakis, F. Bancilhon and R. Ramakrishnan. Bounds on the propagation of selection into Logic Programs. In Proc. 6th ACM Symp. Principles of Database Systems, San Diego, CA, pp. 214-226 (1987).
[23] R. Krishnamurthy and C. Zaniolo. Optimization in a logic based language for knowledge and data intensive applications. In Proc. Int. Conf. of Extending Database Technology (EDBT’88), Venice, Italy, pp. 16-33 (1988).
[24] Y. E. Ioannidis and R. Ramakrishnan. Efficient transitive closure algorithms. In Proc. 14th Int. Conf. Very Large Data Bases, Long Beach, CA, pp. 382-394 (1988).
[25] B. Jiang. A suitable algorithm for computing partial transitive closures. In Proc. 6th Int. Conf. Data Engineering, Los Angeles, CA, pp. 264-271 (1990).
[26] L. Vieille. Recursive axioms in deductive databases: the query/subquery approach. In Proc. 1st Int. Conf. Expert Database Systems, Charleston, SC, pp. 179-193 (1986).
[27] J. Han. Constraint-based reasoning in deductive databases. In Proc. 7th Int. Conf. Data Engineering, Kobe, Japan, pp. 257-265 (1991).
[28] M. Y. Vardi. Decidability and undecidability results for boundedness of linear recursive queries. In Proc. 7th ACM Symp. Principles of Database Systems, Austin, TX, pp. 341-351 (1988).
[29] S. Tsur. Deductive databases in action. In Proc. 10th ACM Symp. Principles of Database Systems, Denver, CO, pp. 142-153 (1991).
[30] J. Han and Q. Wang. Evaluation of functional linear recursions: a compilation approach. Information Systems 16(4), 463-469 (1991).
[31] J. D. Ullman. Principles of Database and Knowledge-Base Systems, Vol. 2. Computer Science Press, Rockville, MD (1989).
[32] J. Han and L. Liu. Efficient evaluation of multiple linear recursions. IEEE Trans. Software Engng 17(12), 1241-1252 (1991).
[33] J. F. Naughton, R. Ramakrishnan, Y. Sagiv and J. D. Ullman. Efficient evaluation of right-, left-, and multi-linear rules. In Proc. 1989 ACM-SIGMOD Conf. Management of Data, Portland, OR, pp. 235-242 (1989).
[34] Y. E. Ioannidis and E. Wong. Towards an algebraic theory of recursion. J. ACM 38(2), 329-381 (1991).

APPENDIX

The Proof of Theorem 1
Theorem 1. In a single-unit recursive rule of arity $n$, the expansion of its V-matrix terminates at or before the $n$th iteration. That is, $S + T \le n$, where $S$ is the stable level and $T$ is the period of the V-matrix.

Proof. Let $x_i$ (where $1 \le i \le n$) be a distinguished variable at the $i$th argument position in the head predicate of the recursive rule and $u_k^{(1)}$ (where $1 \le k \le n$) be a nondistinguished variable at the corresponding argument position of the recursive predicate in the body. The initial V-matrix can be represented as in Table A1, where $\mu(x_i)$ is a mapping of $x_i$ to the corresponding column of row [1] following the V-matrix initialization and expansion rules. The mapping results in either a set of distinguished variables derived from $x_i$ or a nondistinguished variable $u_i^{(1)}$. Since the same matrix expansion rules apply to the expansions of other rows in the V-matrix, the element in column $i$ of row $[j]$ can be considered as obtained by applying the mapping rules to the same column of row $[j-1]$, i.e. $\mu(\mu^{j-1}(x_i)) = \mu^{j}(x_i)$.

The theorem can be proved by induction on $n$. First, the theorem is valid when $n = 1$. This is because there are only 2 cases when $n = 1$: (1) $\mu(x_1) = x_1$, and (2) $\mu(x_1) = u_1^{(1)}$. Obviously in Case (1) $S = 0$ and $T = 1$; and in Case (2) $S = 1$ and $T = 0$. In both cases, $S + T = 1$. The theorem is valid.

Suppose the theorem is valid when $n = 1, 2, \ldots, l$. We prove that the theorem is valid for $n = l + 1$. The proof is partitioned into two parts.

Part I: There are no U-connected distinguished variables in row [0]. It can be divided into 2 cases: Case (1): $\mu(x_i) \ne \mu(x_j)$ for any $i \ne j$ ($i, j = 1, 2, \ldots, l + 1$); and Case (2): $\mu(x_i) = \mu(x_j)$ for some $i \ne j$ ($i, j = 1, 2, \ldots, l + 1$). We prove each case as follows.

Case 1: Case 1 can be proved by considering the following three circumstances.

(1) $\mu(x_i) = x_i$ for some $i$ ($i = 1, 2, \ldots, l + 1$). It is easy to show by contradiction that the circumstance is impossible. Without loss of generality, suppose $i = 1$. That is, $\mu(x_1) = x_1$. We have the situation as shown in Table A2. Since this is a single-unit recursive rule, there must exist some $j$ ($j = 2, 3, \ldots, l + 1$) such that $\mu(x_j) = x_1$. This contradicts the condition $\mu(x_i) \ne \mu(x_j)$ for any $i \ne j$.

(2) $\mu(x_i) = u_i^{(1)}$ for some $i$ ($i = 1, 2, \ldots, l + 1$). Without loss of generality, suppose $i = 1$. That is, $\mu(x_1) = u_1^{(1)}$. Since this is a single-unit recursive rule, there must exist some $j$ ($j = 2, 3, \ldots, l + 1$) such that $\mu(x_j) = x_1$. Suppose $j = 2$. For the same reason, there must exist some $k$ ($k = 3, \ldots, l + 1$) such that $\mu(x_k) = x_2$. Suppose $k = 3$. Thus, we have rows [0] and [1] as shown in Table A3. Following the matrix expansion rules, we have row $[l + 1]$ as shown in Table A3, where row $[l + 1]$ contains no distinguished variables. Thus, we have $S = l + 1$, $T = 0$, and $S + T = l + 1$.

(3) $\mu(x_i) = x_j$ where $i \ne j$ for some $i, j$ ($i, j = 1, 2, \ldots, l + 1$). Without loss of generality, suppose $i = 1$ and $j = 2$. Then what can $\mu(x_2)$ be? It cannot be $x_2$ nor $u_2^{(1)}$ as discussed before. Nor can it be $x_1$ unless $l = 1$, based on the single-unit matrix assumption and the Case (1) assumption $\mu(x_i) \ne \mu(x_j)$ for any $i \ne j$. So $\mu(x_2)$ must be $x_h$ ($h = 3, \ldots, l + 1$).
[Tables A1 to A7: V-matrix layouts referenced in the proof. The table bodies are not recoverable from the source.]
Without loss of generality, let $h = 3$. The initial matrix becomes that shown in Table A4. For the same reason, we have $\mu(x_3) = x_4$, $\mu(x_4) = x_5$, etc. Therefore, we have the initial matrix shown in Table A5. By matrix expansions, row $[l + 1]$ = row [0]. Thus, $S = 0$, $T = l + 1$ and $S + T = l + 1$.

Case 2: Case 2 can be proved by induction on its sub-matrix. We show that the theorem is valid when $\mu(x_i) = \mu(x_j)$ for some $i \ne j$ ($i, j = 1, 2, \ldots, l + 1$). Without loss of generality, suppose $i = 1$ and $j = 2$. The matrix becomes that shown in Table A6. Notice that at most $l$ different distinguished variables are contained in row [1]. Assume that some distinguished variable $x_j$ is absent. Then $x_j$ will not appear in the V-submatrix headed by $[\mu(x_1), \mu(x_2), \ldots, \mu(x_{l+1})]$. Ignore the column $\mu(x_i) = u_i^{(1)}$ if any. According to the assumption of induction for the V-submatrix, there exist 2 integers $i$ and $j$ ($1 \le i$

Part II: There are U-connected distinguished variables in row [0]. We will show that the theorem holds for $n = l + 1$ when there are U-connected distinguished variables in row [0]. Without loss of generality, assume that $x_1$ is U-connected with $x_2$ (see Table A7). According to the matrix expansion rules, the distinguished variables $x_1$ and $x_2$ will always appear in row $[i]$ ($i = 1, 2, \ldots, l$) as one set. The set variable $\{x_1, x_2\}$ can serve as a new distinguished variable. Note that there are at most $l$ different distinguished variables contained in row [1]. That is, either one of the elements in row [1] is a nondistinguished variable (for example, $\mu(x_i) = u_i^{(1)}$) or there exist 2 elements such that $\mu(x_i) = \mu(x_j)$. When $\mu(x_i) = u_i^{(1)}$, the element can at most serve as a bridge (when it has U-connections with other nondistinguished variables) to pass the distinguished variables to other elements. Once the pass is done, it will not play any role of binding passing in future expansions. When $\mu(x_i) = \mu(x_j)$, the 2 columns $i$ and $j$ can be treated as one in further expansions. Therefore, after the second expansion, the first row (row [0]) and the $i$th column can be deleted from the V-matrix, which results in a V-submatrix. That is, row [1] in the original V-matrix can serve as the 0th row of the submatrix, and $\{x_1, x_2\}, x_3, \ldots, x_{l+1}$ can be treated as $l$ new distinguished variables. According to the induction assumption, the new V-submatrix will be in the situation that either a row repeats some previous row or the whole row contains no distinguished variables within no more than $l$ expansions. Thus, the original matrix will reach the same situation within no more than $l + 1$ expansions. Therefore, the theorem is proved. □
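The three base situations used in the proof (a self-mapped variable, a chain that ends in a nondistinguished variable, and a cycle of distinguished variables) can be replayed with a short Python sketch. This is a deliberately simplified model and not the V-matrix expansion algorithm itself: it assumes no U-connections and no variable sets, so $\mu$ is encoded as a hypothetical dictionary that maps each column index either to another column index or to None for a nondistinguished variable. The function name is also an assumption.

```python
from typing import Dict, List, Optional, Tuple


def stable_level_and_period(mu: Dict[int, Optional[int]]) -> Tuple[int, int]:
    """Expand rows until one has no distinguished variables or repeats an earlier row.

    Returns (S, T) in the simplified model: T = 0 when a row without
    distinguished variables is reached at level S; otherwise row S is the
    earlier row that reappears T expansions later.
    """
    n = len(mu)
    row: List[Optional[int]] = list(range(1, n + 1))  # row [0]: x_1 .. x_n
    history = [tuple(row)]
    while True:
        # Row [j] is obtained by applying mu to every element of row [j-1].
        row = [mu[v] if v is not None else None for v in row]
        if all(v is None for v in row):
            return len(history), 0
        key = tuple(row)
        if key in history:
            s = history.index(key)
            return s, len(history) - s
        history.append(key)


self_loop = stable_level_and_period({1: 1})                # mu(x_1) = x_1
single_u = stable_level_and_period({1: None})              # mu(x_1) = u_1
chain = stable_level_and_period({1: None, 2: 1, 3: 2})     # circumstance (2), l + 1 = 3
cycle = stable_level_and_period({1: 2, 2: 3, 3: 1})        # circumstance (3), l + 1 = 3
```

In each of these runs $S + T$ equals the number of columns, which matches the values derived in the corresponding steps of the proof; the sketch says nothing about rules with U-connected variables, which Part II handles.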