Data & Knowledge Engineering North-Holland
1 (1985) 139-154
_
139
A relational language with deductions, functions and recursions S. Misbah PRECI United
Dnrabase Kingdom
DEEN Research
Project.
Department
of Computing
Scierlce,
University
of Abcrdeer!,
Aberdeen,
Abstract. We propose here a ,relational algebra capable of deducing tuples from a premise expressed in an extended relational form: it also supports user-defined and recursive functions in the form of parameterised views, and provides a facility for easier specification of queries. The paper shows the power of the language in dealing with problems such as ancestors, part explosions and connected tours. This language, called DEAL, is an enhanced version of the PRECI Algebraic Language (PAL) but presented here in a SQL-like syntax. Keywords: Deductive queries, deductive relational languages, deductive databases, intelligent database queries.
1. Introduction A deductive database is defined as “a database in which new facts may be derived from facts that were explicitly introduced” [lo]. Correspondingly a deductive language may be viewed as one that allows such a derivation facility. The absence of this facility in traditional database languages-notwithstanding some of their recent enhancements [2,4,18,20]-makes intelligent database processing difficult. This has led many researchers to advocate [15,22] and indeed develop Prolog interfaces to databases [ll, 121. The advantage of Prolog [5] and more generally logic programming [14], lies in its flexibility to deal with a large number of rules and analogously expressed facts, although its ability to handle a very large number of facts in secondary storage, typically represented by a database, remains unproven. This motivated us to explore the possibility of extending a relational language to provide a deductive capability. The work presented here represents a step in that direction; it includes user-defined and’ recursive functions along with a facility to deduce tuples from a given premise, the premise being expressed in the form of a relational expression. The vehicle we chose is the PRECI Algebraic Language (PAL) which is a relational algebra implemented in the PRECI database system [8,9] at Aberdeen. Our investigation has led us to design what we call a Deductive Algebra or DEAL which is presented here, on an SQL-like [3] syntax. The plan of this paper is as follows: In Section 2 we introduce our relational operations and functions. Deductive tuples are discussed in Section 3 and macros and functions in Section 4. Recursive processing including examples of ancestors, part explosions and connected tours is presented in Section 5, followed by a conclusion in Section 6. The examples given in the paper are drawn mainly from the following relations: (1) EMP(EN0 ENAME SAL MN0 JOB EDN) where relation EMP contains employee number, employee name, salary, manager number, job title and employee’s department number; (2) DEPT(DN0 DNAME TOTSAL LOC) where relation DEPT contains department number, name, total salary and location; 0169-023X/8.5/$3.30 @ 1985, Elsevier Science Publishers B.V. (North-Holland)
S.M.
140
Dcen
I DEAL
(3) PARENT(DAD MUM PNAME SEX) where relation PARENT contains father’s name, mother’s name, person name and person’s sex; (4) LIKE(PNAME OBJECT) where relation LIKE contains person and object liked by the person. 2. Relational operations 2.1. Basic operations
The basic form of a relational query expression in DEAL optional parts): [SELECT FROM] [WHERE
is (the square brackets imply the
(attribute list, functions or expressions) (relational expression) (predicate)]
The relational expression in the FROM clause can be a relation name, any valid relational expression, ‘including a query expression possibly with a SELECT . . FROM . . WHERE . . block. This is major departure SQL which permits only existing relation(s) in the FROM clause. The relational operators supported in the FROM clause are join (*), outer join (Y;?), Cartesian product (**), union (++), outer union (-i-?), difference (--). The WHERE clause permits a universal quantifier ALL, a special quantifier ONLY and a number of functions mentioned later. DEAL also has a number of database update commands for insertion, deletion and replacement of tuples in permanent relations, not discussed here. For a query to list the employee and department name of employees in Paris or London, we may write in DEAL SELECT ENAME, DNAME FROM EMP[EDN] * [DNO] DEPT WHERE LOC = “Paris” OR LOC = “London” EDN and DNO above are the respective common domains in the natural join operation. As in SQL, a query can be nested to an arbitrary depth, however for economy, we may sometimes express a SELECT. . FROM. . WHERE. . block on a single line, rather than over several lines. The following two formulations are equivalent in DEAL (i). (ii)
/ implies SELECT T:= SELECT* FROM EMP T : = FROM EMP
all tuples as in SQL/:
both of which assign relation EMP to a pre-defined temporary relation T. DEAL permits assignments to not only temporary relations but also to basic variables and attributes. The assignment-expression (i.e. the right-hand side) can be a conditional expression, with a potentially nested IF. . THEN . . ELSE . . clause, as shown in some examples later. 2.2. Data definitions
and functions
We support two data definition commands, static data definition and dynamic data definition, both of which can be used to define parameterised views, ,as shown later. The form
S.M.
of static data definition
Deen
I DEAL
141
is
S==E
where S is called a virtual variable and E is an assignment-expression which may contain ordinary and other virtual variables. During compilation S is replaced everywhere it occurs in a query by the assignment-expression E. Therefore S does not exist after compilation, it being bound to E sfufically (that is during the compilation time). We may refer to S as a statically defined virtual variable or stat variable for short. The other form of data definition permits dynamically defined virtual variables, (dyn variables for short), to be specified as.
where D is a virtual variable, to be evaluated from assignment-expression E during the execution time. This is sometimes known as deferred binding, and has been referred to as lazy evaluation in [2]. D is thus dynamically bound to E, to be evaluated when it occurs in an assignment command, e.g. T:=D
A virtual variable (stat or dyn) can act as a relation, array, scalar or Boolean variable, depending on its right-hand side, and it can have optional arguments for the evaluation of the right-hand side-like a function, as discussed in Section 3. In addition to the five statistical functions COUNT, SUM, AVG, MAX and MIN, DEAL permits the following other functions: GEN CONC DISPLAY ORDER EXT and REP SEQ
to to to to to to
generate a constant relation, concatenate attributes, display result on the screen, order tuples, alter columns, hold incremental tuple number.
A constant relation may be defined with one or more sets of tuples separated by ] (vertical bar). For instance a constant binary relation can be assigned to a suitably defined temporary relation T(X, Y) as T : = GEN(X Y X Y
= = = =
(assignment-expression), (assignment-expression) (assignment-expression) (assignment-expression))
] ]
where the right-hand side has three sets of tuples, with attribute Y in the second and X in the third set missing. A set will contain only one tuple if the expressions for both X and Y in that set yield single values, otherwise the set will be the Cartesian product of the two columns of values. An exainple of relation T above with a single tuple per set can be TX
Y) 5 7 ?
10.2 ? 6.3
where ? represents null values.
142
S.M.
Deerr
I DEAL
The next pair of functions is for the alterations of columns [7], EXT creating a new column and REP replacing an old column. Their basic formats are: R [EXT (C = (assignment-expression))] R [REP C BY B [ = (assignment-expression)]] In EXT, a new column C, defined by the assignment-expression, is temporarily added to relation R, while in REP, column C in relation R is replaced by column B where B is a pre-existing column or is defined by an assignment-expression. For instance, to add a column for the departmental average salary of secretaries in the EMP relation, we may write S==EMP EXT(AVSAL
= AVG(SAL
BY EDN WHERE
JOB = “secretary”))
Stat variable S will now represent the EMP relation extended by a new column AVSAL containing in each EMP tuple the average salary of secretaries in his department. The BY and WHERE clauses are the counterparts of the GROUP BY and HAVING clauses of SQL. However, each statistical function in DEAL can have its own BY/WHERE pair, without being rationed to the single GROUP BY/HAVING pair of a SQL block, and additionally the DEAL constructs avoid some of the SQL problems mentioned in [6]. Returning to the example above, S will represent the EMP relation extended by a new column AVSAL; S can be used like any other relation, and can be assigned to a temporary relation which has all the attributes of relation EMP plus the new attribute AVSAL. To replace a column, suppose we wish to change “Londres” to “London” in relation DEPT: S= =DEPT REP(LOC
BY SITE = IF (LOC = “Londres”) THEN “London” ELSE LOC)
Column SITE will have all the values of column LOC, except “Londres”-if presentchanged to “London”. This will create a virtual relation S(DN0, DNAME TOTSAL, SITE). The assignment-expressions in EXT and REP may contain statistical functions and arithmetical expression; if it is multi-valued, then the Cartesian product rule stated earlier in connection with the GEN function will also apply here. Strictly speaking SEQ is not a function, but a special identifier which can be used in a predicate as T := EMP WHERE
SEQ c 3
where only the first three Employee tuples - in a system-defined order - will be copied to T. An alternative use of SEQ can be in an EXT function as: T:= EMP EXT(TN0 where a new column TN0
= SEQ) with incremental
tuple numbers will be generated for T.
S.M.
Deea
I DEAL
143
3. Deductive tuples,
We may consider two types of information for databases: factual and deductive (alternatively, extensional and intensional). The factual information is typically stored in the permanent relations of the database, whereas the deductive information is derived from the database using deductive rules. In DEAL, we permit the generation of virtual relations for such deductive information; these relations are not permanently stored, but only generated during run-time. A deductive tuple can be viewed as a tuple of a constant relation by an assignmentexpression; thus “Tony likes anyone who likes wine” (from relation LIKE) can be represented by a stat variable S acting as a stat relation S = = GEN(PNAME = “Tony”, OBJECT = SELECT PNAME FROM LIKE WHERE LIKE.OBJECT = “Wine”) WHERE OBJECT NOT = NULL OBJECT in the outer WHERE clause refers to the generated relation and prevents the generation of a tuple with OBJECT = NULL if Tony likes no one. If OBJECT has more than one value, the generated relation would be a Cartesian product of columns PNAME and OBJECT. The answer as it stands will include Tony likes Tony if he likes wine. We can define a new stat relation V as V==LIKE++S where V represents all the tuples of LIKE and S. The content of V is not stored, but generated each time V appears on the right-hand side of an assignment. Further discussion on deductive tuples is deferred to the end of the next section. 4. Macros and functions
Stat and dyn variables can act like macros and functions respectively, and can be evaluated for different parameter values. We can use a stat variable S to define an expression, say, to find the parents of Andrew: S = = SELECT DAD, MUM FROM PARENT WHERE PNAME = “Andrew”
(1)
and then we can write any of these u==s
(2)
V : = SELECT DAD FROM S WHERE MUM = “Sheila”
(3)
or
Expression (3) is meaningful only if there are. several “Andrews”. The compiler does. not evaluate S but replaces it in (2) and’ (3) by the right-hand side of (l), e.g. (3) will be
S.M. Deer1 I DEAL
144
V : = SELECT DAD FROM(SELECT DAD, MUM FROM PARENT WHERE PNAME = “Andrew”) WHERE MUM = “Sheila” which a good compiler [21] could change to V : = SELECT DAD FROM PARENT WHERE PNAME = “Andrew”
AND
MUM = “Sheila”
(4)
It is possible to specify arguments in stat variables but these arguments are replaced by their values during the compilation in the event of an assignment. For instance, we can define S(X) = = SELECT DAD, MUM FROM PARENT WHERE PNAME = X
(5)
Then we may write U := S(“Andrew”)
or
U : = S(Y)
where Y is a variable holding a value such as “Andrew”. The compiler will substitute S by the right-hand side of (5), with X = ‘&Andrew” or Y. In contrast to a stat variable, a dyn variable does not disappear after compilation; instead it is retained and is executed each time it appears on the right-hand side of an assignment. The main differences between dyn and stat variables are: (i) A dyn variable can be compiled like a function in Fortran, to be evaluated later for different parameter values. Therefore it is possible to create a library of compiled dyn variables to be invoked subsequently by different user programs with appropriate parameter values. (ii) A dyn variable provides a basis for recursive processing as will be shown in the next section. A stat variable can be optimised statically (during compilation) whereas a dyn variable requires at least some elements of dynamic optimisation (during execution), with necessarily a higher execution overhead. Therefore, stat variables should be used in preference to dyn variables whenever a macro, optionally with parameters, will do the job. We present below two further queries based on stat relations. Query 4.1. Is Jane a sister of Andrew? SELECT TVAL FROM PARENT: A [DAD, MUM] * [DAD, MUM] PARENT: B EXT(TVAL = IF (A.PNAME = “Andrew” AND B.PNAME = “Jane” AND B.SEX = “F”) THEN “Y”, ELSE “N”) As in SQL, A and B in the FROM clause above are labels of relation PARENT which is joined with itself over the common composite column of DAD and MUM. A new attribute truth
S.M.
Deen
I DEAL
14s
value (TVAL) is created which contains “Y” if the assertion is true and “N” otherwise. (SQL uses an EXISTS clause to ascertain truth values.) We can generalize this query with a stat relation (that is, a stat variable acting as a relation) to check if Y is a sister of X. V (X, Y) = = SELECT TVAL FROM PARENT: A [DAD, MUM] * [DAD, MUM] PARENT: EXT(TVAL = IF (A.PNAME = X AND B.PNAME = Y AND B.SEX = “F”) THEN “Y”, ELSE “N”)
B
We can rewrite the above query with a Boolean stat variable SISTER: SISTER = = A.PNAME
= X AND B.PNAME
= Y AND B.SEX = “F”
and then, V (X, Y) = = SELECT TVAL FROM PARENT: A [DAD, MUM] *[DAD, MUM] PARENT: EXT(TVAL = IF (SISTER) THEN “Y”, ELSE “N”)
B
A virtual variable can thus be used to hold a predicate, optionally with arguments. To evaluate V for for Andrew and Jane, we may write T : = V( “Andrew”,
“Jane”)
Query 4.2. Does Pat like everybody? This is a ambiguous question, as we do not know what ‘everybody’ means. It could be for instance EVERYBODY
= = SELECT
PNAME
FROM PARENT
EVERYBODY
= = (SELECT PNAME FROM PARENT) + + (SELECT DAD FROM PARENT) + -I- (SELECT MUM FROM PARENT)
or
where EVERYBODY is a virtual variable. The definition of everybody will depend on whether a closed or an open world approach [16, 17, 191 is adopted. If we agree to treat the database as a closed world, then ‘everybody’ can be defined by one of the options above. Once we have agreed on the definiton of ‘everybody’ we can write SELECT TVAL FROM LIKE EXT(TVAL
= IF(PNAME = “Pat” THEN “Y” ELSE “N”)
AND OBJECT = ALL EVERYBODY)
146
S.M.
Deer1
I DEAL
where ALL is a universal quantifier. The use of EVERYBODY is not essential, as it can be replaced by the appropriate right-hand side. However, we may define a stat relation as S(X, Y) = = SELECT TVAL FROM LIKE EXT(TVAL
Then we have the flexibility T : = S(“Pat”,
= IF (PNAME THEN “Y” ELSE “N”)
= X AND OBJECT = ALL Y)
of evaluating S as we like, e.g.
EVERYBODY)
We can‘use the concepts of macros and functions for deductive tuples as well. For instance, we can generalise the deductive rule given in the previous section by replacing Tony and wine with arguments X and Y respectively, and by writing S as S(X, Y); thus V(X, Y) = = LIKE + + S(X, Y) This will allow us to evaluate S or V for any pair of (X, Y) values. Now consider another example, say relation TXY containing TXY( X Tony Jim Fiona
Y) Wine Apple Wine
such that X likes anyone who likes Y. We can generate S(X, Y) for each pair of these arguments separately, and then union the resultant S(X, Y)‘s to produce V(X, Y). Alternatively, we can define a virtual relation SS as: SS==TXY REP(X BY PNAME) REP(Y BY OBJECT = SELECT OBJECT = Y)
PNAME
FROM LIKE
WHERE
OBJECT in the WHERE clause refers to the LIKE relation. If desired, the NOT NULL condition can be added. Owing to the Cartesian product rule stated earlier, each PNAME value in SS will generate n tuples, if the corresponding OBJECT gets n values. The second REP command above could have been replaced by an appropriate EXT command, but in that case, column Y must be excluded from SS by a projective operation. Either way, SS will represent all the tuples generated from the three (X, Y) pairs of TXY, unlike S(X, Y) which represents the tuples of only a single pair of arguments. It is up to the user to define a suitable virtual relation. However, once defined, a virtual relation can be used like any other relation in deduction rules. Suppose we have a rule that states “anyone who likes Sara is tall”. We can define a stat relation, say HEIGHT, using V(X, Y) given earlier, as HEIGHT(X,
Y, Z) = = GEN(PNAME
= (SELECT PNAME FROM V(X, Y) WHERE OBJECT = Z), HITE = “Tall”)
S.M. Deen I DEAL
147
We have used an argument Z in place of Sara. If V(X, Y) is already evaluated to, say a temporary relation T, then HEIGHT can be redefined with T, but without arguments X and Y. The expression on HEIGHT above embodies what in Prolog will be regarded as the unification technique, although syntaxes are very different in the two languages. In contrast to Prolog where rules are unified through the matching values of the common attribute names specified by the user for the purpose, unification in DEAL is achieved either through a SELECT as shown above or through the common domain of the natural join operation, either operation being specified by the user [22]. In the above example, since V is a stat relation, it will get substituted successively during compilation until no more stat relations are left on the right-hand side. To evaluate HEIGHT all these steps have to be retraced by the system in a correct sequence. Thus the unification order in DEAL is embedded in the definition of each rule, although determined eventually by an optimiser if present; in contrast rules in Prolog are unified principally in order to their appearance. Backtracking is not needed in DEAL, since all the tuples of a relation are processed together, rather than individually at different times.
5. Recursive processing
The basis for using a dyn relation in recursive processing is its execution time binding’which permits repeated evaluation of a dyn relation with a new set of tuples. The form of a typical expression may be given as: D(M, X) tf(D(M
- 1, X))
where D(M, X), with a termination argument M and an initialisation argument X, represents the tuples of a level (or instance) of D, obtained as a function of the previous level of D. But to evaluate the previous level D(M - 1, X), we must call D(M - 2, X) and substitute D(M 1, X) by f(D(M - 2, X)). Thus the process of substitution continues until the user-supplied initial values of D are found. This recursive function, like most, if not all, recursive functions [l], can be interpreted and implemented as an iterative process, where instead of repeated calls with appropriate arguments, the function is iterated from the initial (lowest) level until the user-specified level is reached. Therefore, it follows that, if we interpret our ‘self-defined’ dyn relations as essentially iterative, then we can use them to represent not only recursive functions through iteration, but also iterative functions. We are likely to adopt this interpretation for implementation, its advantages have been briefly stated in the Conclusion. However, given the definition of a dyn relation D, we can union all its levels through another dyn relation, say DSUM, as DSUM(N,
X) tDSUM(N
- 1, X) + + D(N, X)
assuming we have initialised DSUM appropriately. Alternatively we can use a special function Union, as UNION(D(N, X)) to union all the levels of D up to N. If D is non-recursive or an ordinary relation, then the Union function will return D. A dyn relation may take any number of arguments which can be attributes, relations and other functions, and its right-hand side may contain any relational expression including other dyn relations, statistical and other functions, the WHERE clause and the IF THEN ELSE clause. To find the ancestors of X in the male line (that is, the father and mother of X and the fathers and mothers of the successive fathers of X), we may define a dyn relation as
S.M.
148
Deal
I DEAL
DYN D(*M, X) D(0, X) tGEN(DAD = X, MUM = NULL) D(M, X)+-SELECT DAD, MUM FROM PARENT WHERE PNAME IN SELECT DAD FROM D(M - 1, X) END We use ‘6’ to identify the termination argument(s) in a DYN entry. Function GEN above generates a relation with a single tuple, DAD = X, and MUM = NULL. An equivalent definition of this function by an IF clause has the traditional form: D(M, X) +IF (M = 0) THEN GEN(DAD = X, MUM = NULL) ELSE SELECT DAD, MUM etc. We can evaluate D for the first five generations of Andrew’s T := UNION(D(5,
“Andrew”))
WHERE
ancestors by writing:
DAD # “Andrew”
The WHERE clause excludes D(0, X) which contains Andrew’s name as Dad. (Note that we could have defined D differently without D(0, X) as done in B(N, X) given later.) An explicit dyn relation can also be defined for the union as shown below: DYN DSUM(*N, X) DSUM(0, X) : = NULL DSUM(N, X := DSUM(N END
- 1, X) + +,D(N, X)
Now the assignment statement T : = DSUM(5,
“Andrew”)
will yield the first five level ancestors of Andrew. (Note D(0, X) is excluded from DSUM by setting DSUM(0, X) := NULL.) This type of explicit function for union is useful if several different dyn relations are unioned together. The recursive expressions given above are always terminated by the user-supplied value of the termination argument. Special termination conditions can also be stipulated by an UNTIL statement, for instance DYN B(*N, X) B(l, X) *SELECT DAD, MUM FROM PARENT WHERE PNAME = X UNTIL (B(N, X) = NULL) B(N, X+SELECT DAD, MUM FROM PARENT WHERE PNAME IN SELECT DAD FROM B(N - 1, X) END
S.M. Deen I DEAL
UNTIL acts like a ‘white’ loop. If the current iteration of B (that to null (that is cardinality zero), then the loop will be terminated However, if there is no previous iteration, the final N would B(N, X) will represent the last generation of the ancestors of T : = UNION(B(
149
is the Nth with the be null. X. If we
iteration) evaluates final N set to N - 1. Thus the resultant write
10, “Andrew”))
then the recursion will terminate before N = 10 is reached if Andrew has less than ten generations of ancestors in relation PARENT. However to find all the ancestors of Andrew, we may write T := UNION(B(hv,
“Andrew”))
where we assume hv to represent a High Value. If we define another dyn relation BSUM(X) +-UNION(B(hv, “Andrew”)), then we can find all the ancestors of Andrew also by T : = BSUM( “Andrew”) An equivalent function for all the ancestors of X in the male line can be defined without a termination argument as DYN C(X) C(X) *SELECT DAD, MUM FROM PARENT WHERE PNAME = X UNTIL (C(X) = NULL) C(X) -SELECT DAD, MUM FROM PARENT WHERE PNAME IN SELECT DAD FROM C(X) END This is identical with B(N, X) except for the absence of the integer argument N. Thus C(X) in the UNTIL condition is the C of the current level, whereas C(X) in the FROM clause is obtained from the previous level. Strictly speaking C(X) is an iterative rather than a recursive function since X, the only argument of C, does not change with levels; that is, C cannot be evaluated using repeated calls to C. However, this does not matter in our case as we intend to implement all ‘self-defined’ dyn relations by iteration. The only use of C is to find all the ancestors of X, but unlike B(N, X), it cannot yield the generation number, since the relevant incremental integer argument is not defined. Let us consider another example - say to find all the ancestors (including those given in the earlier examples) of X. We can write DYN ANC(+N, X) ANC (1, X) +SELECT DAD, MUM, PNAME FROM PARENT WHERE PNAME = X ANC(N, X) +SELECT DAD, MUM, PNAME FROM PARENT
150
S.M.
Deen
I DEAL
WHERE
AND
(PNAME IN (SELECT DAD FROM ANC(N - 1, X)) + + (SELECT MUM FROM ANC(N - 1, X))) (PNAME NOT IN SELECT PNAME FROM UNION(ANC(N - 1)))
END The evaluation of ancestors (including those in the earlier examples) will usually lead to an infinite loop if a person name is the same as his/her ancestor’s name, but the condition PNAME NOT IN . . . above prevents it. We consider below two further problems, namely part explosion and connected tours, both to be solved by recursive dyn relations. 5.1. Part explosion problem
To examine the processing of a part explosion problem, let us consider two relations PART(PN COST QST) and LINK(SUP INF QTY) representing a part explosion tree. A PART record contains part number (PN), its assembly cost (COST) excluding the cost of its inferior parts, and quantity in stock (QST). There is a LINK record for each superior/inferior pair, each record containing superior part number (SUP), its immediate inferior part number (INF) and the quantity (QTY) of the inferior part number in that superior part number. We may now define a part explosion tree PT as DYN PT(*N, X) PT(0, X) +GEN(INF = X, TOTQ := 1) UNTIL (PT(N, X) = NULL) PT(N, X) +SELECT INF, TOTQ FROM (LINK WHERE SUP IN SELECT INF FROM PT(N - 1, X)) EXT(TOTQ = LINK.QTY * PT(N - 1, X).TOTQ) END First the GEN function creates a tuple in PT for the wanted part X with TOTQ (total number of X’s in X) as 1. The recursive expression PT then produces tuples for all the successive inferior part numbers from the topmost part X, along with the total quantity (TOTQ) of each inferior part in X. To do this, each qualified tuple of the LINK relation is extended by TOTQ which is obtained as the product of QTY of the inferior part number (in this LINK tuple) by the TOTQ of its immediately superior part number (which was produced by the previous iteration, that is PT(N - 1, X), and hence PT(N - 1, X).TOTQ). Finally the inferior part number (INF) and their total quantity (TOTQ) are projected for PT. To calculate the total cost of producing part X, we may define another dyn relation as: DYN TCOST(X) TCOST(X)
+SELECT SUM(TC) FROM UNION(PT(hv, X)) [INF] * [PN] PART EXT(TC = TOTQ *COST)
END where hv is assumed to stand for a High Value. Attribute COST is obtained from PART through the join of UNION(PT) and PART over part numbers (INF and PN), and employed
KM.
Deen
I DEAL
151
to create the extended attribute TC, which gives the total cost of all the quantities of this inferior part number in X. The SUM of all TCs gives the final total cost TCOST. Note we could have defined a single function for PT and TCOST together. Using TCOST we can find the cost of any superior part A, by writing TOTCOST
: = TCOST( “A”)
To produce level numbers (LL), branch numbers (BR) and tuple numbers (TNO), we need to redefine PT as DYN RT( * N, X) RT(0, X) tGEN(INF = X, TOTQ = 1, LL = 0, BR = 0) UNTIL (PT(N, X) = NULL) RT(N, X) +RT(N - 1, X) + + SELECT INF, TOTQ, LL, BR FROM PT(N, X) EXT(LL = N) EXT(BR = SEQ) END where the final RT represents the union of all the iterations. We have re-used PT(N, X) above instead of defining RT from scratch. SEQ generates tuple numbers from 1 in each iteration and hence provides the branch numbers at each level. All functions, WHERE clauses, etc. specified on the right-hand side of a recursive/iterative dyn relation operate on each iteration separately; for instance AVG will calculate the average in each iteration in isolation. Likewise the incremental tuple numbers for SEQ are generated from 1 in every iteration. Therefore, to find the tuple number irrespective of iterations, we have to use RT as T : = RT( hv, “A”) EXT(TN0 = SEQ) where T will contain TNO. Alternatively,
we could have defined another dyn relation:
DYN PL(X) PL(X) +RT(hv, X) EXT(TN0 = SEQ) END To find part A and its first three subordinate
parts, we may write
T : = RT(hv, “A”) WHERE SEQ < 3 or T : = PL( “A”) WHERE
TN0 s 3
Likewise to find all the subordinate T : = pL(
“A”)
WHERE
LL s 3
part numbers in the first three levels
S.M. Deett I DEAL
152
Finally to list only the lowest level components
of A with quantities (TOTQ)
LLC : = UNION(PT(hv, “A”)) WHERE INF IN ((SELECT PN FROM PART) - - (SELECT SUP FROM LINK)) where (SELECT PN FROM PART) - - (SELECT level components in the inventory.
SUP FROM
LINK)
yields the lowest
5.2. Tours from a city We consider here the problem of finding all (through connected routes, which can form (START END) containing start city (START) flight between two cities. We have instantiated in the adjoining figure the cycles they form:
FLIGHT
(START A A A B B B C D
END) B C D C P A A A
the cities that can be reached from a given city cycles). Let us assume a relation FLIGHTand destination city (END) for each non-stop the relation with some tuples, and also shown
A+
-B
2 r\,/ .
Cycles (note that all paths are not reversible)
To find all the cities that can be reached from a given city, we define a dyn relation TOUR as: DYN TOUR(*N, X) TOUR( 1, X) +-FLIGHT WHERE START = X UNTIL (TOUR(N, X) = NULL) TOUR(N, X)+-FLIGHT WHERE (START IN SELECT END FROM TOUR(N AND (START NOT IN SELECT START FROM UNION(TOUR(N - 1, X))) END
- 1, X))
This function does not retrace the same path even if there are loops or bidirectional routes. In this respect, it is similar to the function ANC on ancestors given earlier, except that it terminates automatically when there are no more cities left to be processed.
S.M.
Deen
153
I DEAL
6. Conclusion We have presented above a number of relational constructs which can be used to facilitate some advanced database queries. DEAL offers deductive tuples, parameterised macros, ordinary and recursive/iterative functions, and a potential for knowledge representation, through use of stat and dyn variables for complex expressions. It also supports the first order predicate calculus in full, which is a superset of Horn clauses used in Prolog. Queries can be generalised in the form of parameterised virtual relations, to be declared in the subschema (or its equivalent) for an earlier query validation and also to hide the details from the users who would be able to invoke these virtual relations,’ with appropriate parameter valuesand treat them like other relations. This facility is expected to enhance programmers’ productivity. DEAL is intended as a high level database language for intelligent query processing, but not as a stand-alone language for writing systems, ,including intelligent systems, where a lower level language, such as Prolog, would be more appropriate. Thus it is primarily aimed at the relational database users who require an extended facility, without the problem of learning and using a Prolog-like language, although it can also provide an easier database interface to knowledge processing systems, with say Prolog as a host language. As a relational language DEAL can be implemented much more efficiently than Prolog, and hence its particular advantage as deductive interface language. It can also act as an internal language for more user-friendly languages. As indicated at the beginning of Section 5, most if not all recursive functions can be implemented by iterative procedures. If the evaluation of a recursive function at a given level always requires the evaluation of all its preceding levels, then obviously the iterative procedures are likely to be more efficient than repeated function calls. This condition holds for all the examples presented in this paper, and is probably true for all non-numeric recursive functions of our interest, including transitive closure operations. Therefore, we may implement recursive functions by iteration, without any differentiation from that of iterative functions. This would have the added advantage of supporting the UNION function almost free of cost, since the tuples(values) of all the levels are automatically produced in the iterative process. However, the conversion of a recursive expression into an iterative form may not always be straightforward. This is an area we wish to examine further, before we make a final decision on it. We are currently studying the implementation issues of both stat and dyn relations, particularly recursive/iterative functions including their optimisation strategies. In addition to static optimisation, some dynamic optimisation-at least at a crude level-would improve the performance. We hope to publish the outcome of this study in a separate paper later. Acknowledgment The author would like to thank a number of people who have suggested improvements on an earlier draft of the paper: David Bell, Ray Carrick, Don Kennedy, Jim Longstaff, Rubic Sadegi, Bill Samson and Malcolm Taylor. Special thanks to the DKE referees of this paper, Reind van de Riet and Peter Apers, for their invaluable comments and suggestions.
References [l] D.W. Barron, Recursive Techniques in Progrmttming (Macdonald, London, 1968). [2] O.P. Buneman et al., FQL-A functional query language, Proc. ACM SIGMOD Conference
(1979).
154
S.M.
Dcet~
I DEAL
(31 D.D. Chamberlin, A summary of user experience with the SQL data language, hoc. hreraafioaai Coafcrcr~ce ON Databases, Aberdeen (Heyden, London, 1980). [4] E. Clemons, Design of an external schema facility to dchne process recursive structures, ACM Trans. Database Systems 6 (1981) 295. [S] W.F. Clocksin and C.S. Mellish, Prograrnndag in Prolog (Springer, Berlin, 1981). [6] C.J. Date, A critique of the SQL database language, ACM SIGMOD Record 14(3) (1984) 8. [7] S.M. Deen, R.R. Amin and M.C. Taylor, Data integration in distributed databases, Department of Computing Science, University of Aberdeen (to be published). [8] S.M. Deen et al., PAL: PRECI algebraic language, Internal Report, Department of Computing Science, University of Aberdeen, Aberdeen. [9] S.M. Deen et al., The design of a canonical database system (PRECI), Corn[>rrr. J. 24(3) (1981). [IO] H. Gallaire, J. Minker and J.-M. Nicolas, Logic and databases-a deductive approach, ACM Cornpat. Survey f6(2) (1984) 153. [ll] J.M. Gallausiaux and G. Jomier, Interfacing Prolog and relational DBMS, Proc. ICOD Workshop on NCIV Applicatiom of Dafabases (Academic Press, New York, 1984). [12] M. Jarke et al., An optimizing Prolog front-end to a relational query, Proc. ACM SIGMOD Corrfcrom (1984) 296. [13] R.A. Kowalski, Logic for Problem Solving (North-Holland, Amsterdam, 1979). [14] R.A. Kowalski, Logic programming, in: R.E.A. Mason (Ed.), hformarioa Processirlg 83 (North-Holland, Amsterdam, 1983). [IS] R.A. Kowalski, Logic as a database language, Proc. 3rd British Nafiorlal Coafererm 011 Dafabases (1984). [16] H.J. Levesque, The logic of incomplete knowledge bases, in: M.L. Brodie, J. Mylopoulos and J.W. Schmidt (Eds.). 011 Cowxptrtnl Modclling (Springer, Berlin, 1983) 165. [17] W. Lipski Jr., On semantic issues connected with incomplete information databases, ACM Trans. Database Systems 4(3) (1979) 262. [18] J. Mylopoulos et al., A language facility for designing database-intensive application, ACM Trans. Database Sysfem 5(2) (1980) 185. [19] R. Reiter, Towards a logical reconstruction of relational database theory, in: M.L. Brodie, J. Mylopoulos and J.W. Schmidt (Eds.), On Coacepmal Modelling (Springer, Berlin, 1983) 191. [20] D.W. Shipman, The functional data model and data language DAPLEX, ACM Tram. Darabase Systems 6( 1) (1981) 140. [21] S.J.P. Todd and J.S.M. Verhofstad, An optimiser for a relational database system-description and evaluation, IBM Peterlee, United Kingdom, 1978. [22] D.H.D. Warren, Efficient processing of interactive relational database queries expressed in logic, Proc. 71 IEEE Coaferetrce OII Very Large Data Bases (1981).