Nurql: A nested universal relation query language

Nurql: A nested universal relation query language

Information Systems Vol. 14, No. 4, pp. 307-316, 1989 Printed in Great Britain. All rights reserved NURQL: 0306-4379/89 $3.00 + 0.00 Copyright 0 198...

1MB Sizes 3 Downloads 172 Views

Information Systems Vol. 14, No. 4, pp. 307-316, 1989 Printed in Great Britain. All rights reserved

NURQL:

0306-4379/89 $3.00 + 0.00 Copyright 0 1989 Pergamon Press plc

A NESTED UNIVERSAL QUERY LANGUAGE M. LEVENE

RELATION

and G. Lo~zou

Department of Computer Science, Birkbeck College, University of London, Malet Street, London WCIE 7HX, U.K. (Received 3 July 1988; received for publication 12 April 1989) Abstract-The

universal relation (UR) model aims to free the user from logical navigation in a relational database. A UR query language simplifies user queries since the database is queried solely through the attributes. Furthermore, the nested relational model extends the applicability of the relational model to non-business applications, and a nested relational query language provides a powerful and flexible user interface. Under the nested UR model, introduced by the authors, nested relations constitute the internal level for the nested UR scheme at the conceptual level, and a UR interface constitutes a view at the external level. Herein we describe NURQL (Nested UR Query Language), which is the query language proposed

for the nested UR model, and show that NURQL combines the simplicity of a UR query language with the power of a nested relational query language. Key words: Universal relation, nested relations, logical data independence, nested UR, query languages, SQL, NURQL.

1. INTRODUCTION

l

Relational database management systems (DBMSs) are currently dominating the commercial marketplace. The impact of relational DBMSs is enhanced by the abundance of DBMS products for microcomputers supporting the relational query language SQL [l] or variants thereof. Although SQL has become the oficiul ANSI standard for relational DBMSs, Date [ 1] has pointed out many shortcomings of this standard. One of the criticisms is the lack of fully nested expressions in the SQL standard. The query language NURQL, described in this paper, is an SQL-based, relational calculus query language; it is the query language for the nested UR model (see [2, 31) which provides a UR interface [4,5] as a view at the external level, supported by the nested VR scheme (NURS) [3] at the conceptual level, on top of nested relations at the internal level [&9]. NURQL has the following advantages over existing query languages for the relational model:

It can be interfaced to existing relational DBMSs to enhance their expressive power and usability.

The remainder of the paper is organized as follows. In Section 2 we briefly describe the UR model, the nested relational model and the nested UR model. In Sections 3 and 4 we briefly review existing query languages for the UR model and the nested relational model, respectively. In Section 5 we present NURQL and in Section 6 we demonstrate its effectiveness with example queries. In Section 7 we give the expressive power of NURQL. Finally, in Section 8 we present our conclusions. 2. THE UNDERLYING

DATA

MODELS

2.1. The nested relational model

We assume the reader is familiar with the relational model as presented in [21,22]. A basic assumption of this model is that relations are flat (in lNF), which implies that their attribute domains are atomic. In the nested relational model, which extends the applicability of the relational model to more advanced applications such as computer aided design, image processing and text retrieval systems, we drop the 1NF assumption and allow attribute domains to be relation-valued. The attributes will correspondingly be referred to as zero order (for atomic domains) and higher order (for relation-valued domains), respectively. In this paper we consider a subclass of the resulting nested relations, hereinafter simply called nested relations; these relations minimize redundancy, explicitly represent dependencies and possess

It is a simple and easy-to-use query language in which querying the database is carried out solely through the attributes, as in other UR query languages (cf. [l&12]). It extends the expressive power of SQL to cope with non-business applications by allowing restructuring of data (cf. [13-201). If it is required the NURQL query language does provide automatic restructuring facilities for output purposes. . It incorporates the suggestions put forward by Date [l] to improve SQL-type query languages (cf. [18]).

l

l

307

M. LEVENE and G. LOIZOU

308

computational efficiency since they obviate the need to compute ail the joins. Let U = (A,, A,, . . . , A,,} be the universal set of attributes. A scheme tree T, defined over a set of zero order attributes WC U, is a tree whose vertices are labelled by pairwise disjoint sets of attributes over W. Following Ozsoyoglu and Yuan [7], we can define a nested relation scheme (NRS) R(T), represented by T, and correspondingly an instance of a nested relation r* over R(T). We now introduce the notation for scheme trees (cf. [7]). Let T be a scheme tree over a set of attributes W G U, e = (u, v) be an edge of T, and n be a node of T. Then: is a label for node n and is equal to the set of attributes associated with n; A(n) is the union of all ATT(u) for all ancestor nodes v of n, including ATT(n); D(n) is the union of all ATT(v) for all descendant nodes v of n, including ATT(n); S(T) is the union of all ATT(n) for all nodes n in T; M(e) is the projected mu~t~valued dependency (PMVD), i.e. an MVD [Zl, 221 that can be inferred by projection, represented by the edge e, namely A(u) --;r~(v)(A(u)U~(u)); MVD(T) is the set of PMVDs represented by the edges of T.

(1) ATT(n) (2)

(3) (4) (5)

(6)

Let u,, u2, , u, be all the leaf nodes of T, then the path set of T, P(T), is given by P(T)={A(u,),A(u,),...,A(u,)}. Example 2. Let T, be the scheme tree over u’ = {STUDENT, DEPT, cbwi, MAJOR, EXAM, PROJECT) shown in Fig. 1. From Fig. 1 it follows that S(T,) = U’ and M(e,) = STUDENT

DEPT -++ MAJOR (U’)

M(e,) = STUDENT DEPT + CLASS EXAM PROJECT (U’)

We3) = STUDENT DEPT CLASS + EXAM ({STUDENT, CLASS, EXAM, PROJECT))

We4) = STUDENT

DEPT CLASS + PROJECT ({STUDENT,CLASS,EXAM,PROJECT}).

The set of PMVDs represented by the edges of T, is given by

STUDENT DEPT

MAJOR

CLASS

EXAM

1 PROJECT

Fig. 1. The scheme tree

r, .

A scheme forest over U is a collection (T,, T,, . . .,T,f of scheme trees such that S(Ti)zU,

lGi
and S(P) = Ug=, S(Ti) = U. A nested database scheme (NDS) over U is a collection of NRSs, (R(T,), R(T,), . . . , R(T,)>, such that . . .,T4) is a scheme forest over U and is F={T,,T,, hereinafter denoted by R(F). A nested datable d* = {r:, rf, _ . . , r:> is a collection of nested relations, where each rf is a nested relation over R(q.)), 1 6 i & q. The path set of F, denoted by P(F), is given by ‘Jg=1f’(T/). In the sequel we use the two standard operators, NEST and UNNEST [23,24]. Intuitively, nesting transforms a nested relation into one which is more “deeply” nested, while unnesting transforms a nested relation into a “flatter” one. We also use the UNNEST* operator, denoted by $*, which transforms any nested relation into a flat relation; the order of unnesting does not affect the resulting flat relation 1241. E.xampIe f ~cont~nued~. Herein we use the notation employed in [S] for representing nested relations. Null values of type “value exists but unknown at the present”, denoted by unk, are indicated by empty cells. For the nested relation r* shown in Fig. 2 the flat relation p*(r*) is shown in Fig. 3. We note that in [2] we extended the nested UR model to include two additional types of null, namely “value does not exist” or “value is not applicable”, denoted by dne, and “no information is available for an appropriate attribute value”, i.e. we do not know whether this value exists or not [8,25]. Z.Z. The UR model

MVD(T,) = {M(e,), M(+), M(e,), M(e,)f and the path set of T, is given by P(T,) = {{STUDENT, DEPT, MAJOR),

(STUDENT, DEPT, CLASS, EXAM}, {STUDENT, DEPT, CLASS, PROJECT}}. Moreover, letting R(T, ) be the NRS for T, of Example 1, we see that R(T,) = STUDENT DEPT (MAJOR)* (CLASS (EXAM)* (PROJECT)*)*.

The

relational model provides physical data as users do not need to know the physical structure of the database, when posing queries. On the other hand, users are not free from Zogical navigation amongst the relations in the database when formulating queries. The UR model [4, 51 provides logical data independence for a relational database by allowing users to view the database as a single relation. independence

NURQL

Fig. 2. The nested relation

In the UR model the UR interface is a view on top of a flat database over U at the conceptual level. The usual assumptions of the UR model are [4]: The universal relation scheme assumption, i.e. each attribute A E U plays a unique role. Thus, in Example 1, PROJECT means a project of a student for a class he is taking and not, say, a departmental project. The unique role assumption, namely that there is a unique relationship among sets of attributes Xs U. The basic relationship on X is denoted by [X] and is called the window for X. Thus, in Example 1, (STUDENT, DEPT} means the department a student is majoring in and not a department in which a student is taking some other class. The oneflavour assumption, which says that the real-world significance of any tuple of a window for X does not depend on the details of its construction (see also [26]).

309

r* over R(T,)

model it was shown that the nested representative instance (NRI) is suitable for storing the information in the nested database in one nested relation (cf. representative instance [4]). The window for X, [Xl, can be computed by unnesting an extended total projection over the NRI [2]. In order to provide a UR interface at the external level, in the nested UR model we employed the NURS, denoted by U(T), at the conceptual level [3]. The NURS provides the structure needed for joining the nested relations in d* when answering queries. For our running example the NURS is shown in Fig. 8. Since creating the NRI is, in general, a computationally expensive operation, in [27] we define an alternative way to compute [Xl, namely via an extended algebra expression. We note that if P(F) is a reduced y-acyclic database scheme [28], then the query processing is simplified as the one flavour assumption is satisfied [26]. In this case no ambiguity arises in queries and there is a unique join sequence for computing [X] for any XG U.

2.3. The running example We now present the running example for the rest of the paper. Let Ii = {STUDENT, DEPT, MAJOR, CLASS, EXAM, PROJECT, TUTOR, TEXT, CHILD, SALARY}, F = {T, , T2, T,J and R(F) = {R(T,), R(T,), R(T,)}. T, is shown in Fig. 1, while T, and T, are shown in Figs 4 and 5, respectively; it therefore follows that S(F) = U, as required. Moreover,weletd*={r:,r:,rf},wherert,r:and r: are shown in Figs 2,6 and 7, respectively. 2.4. The nested UR model The nested UR model was presented in [2, 31. In those papers we generalized the basic concepts of the UR model to the nested relational model and our main theorem showed that nested relations are a suitable internal level for implementing a UR interface as a view at the external level. For the nested UR

Fig. 3. The flat relation

3. QUERY LANGUAGES FOR THE NESTED RELATIONAL MODEL An algebraic query language specifies the precise order of evaluation of the query, while a calculusbased query language is non-procedural. Hence most user query languages are calculus-based, thus relieving the user from specifying the exact way the query is to be evaluated. On the other hand, the query processor can benefit from the algebra from the point of view of optimizing and evaluating the query. Various algebras for the nested relational model have been suggested [6, 8,9,23,24]. It turns out that the flat relational algebra augmented with the NEST and UNNEST operators provides us with the power of the nested relational algebra, as long as we assume that a nested relation contains the same information as its flat counterpart obtained by using the p*

p*(r*)

over S(T, ).

M. LEVENE and G. Lo~zou

310

Ull!lWl pmgramming

logic

Hanna

Knuth

MendeIson

Fig. 4. The scheme tree T,.

Fig. 6. The nested relation r: over R(T,).

operator [B, 24,291. To provide for the concept of duplicate tuples, Dayal et al. [30] developed the multirelational model and Guting et al. [3 I] extended it to nested relations incorporating ordering and nuN values. In [32] the algebra proposed by Scholl et al. has the additional function of providing an internal level for a conceptual level of flat relations. In the nested UR model we go one step further and provide the NURS at the conceptual level supported directly by an internal level of nested relations. Several calculus-based query languages have been developed for the extended relational model. For example, QUEL +[20] and POSTQUEL [19] are extensions of QUEL, STBE [15] is an extension of QBE while SQL/NF [18] and SQL for NF* relations, called HDBL [14, 16, 171, are extensions of SQL. All these query languages extend the nested relational model by encompassing such extensions as duplicates, ordering, functions and recursive queries. NURQL is an SQL-based query language. The syntax of NURQL resembles those found in [I, 16-181, however, no logical ~uvigutio~ amongst the nested relations in the nested database is necessary. NURQL possesses the full expressive power of nested relations under the nested UR model while, as we shall see in Section 6, query formulation is significantly simplified.

UR query languages. Since they are based on the UR model the RANGE clause (cf. FROM clause in SQL [l]) is omitted. A different approach to the UR view is taken in DURST [12], where the concept of QBE is applied to define a tabl~u-o~ent~ UR query language, which is then translated into a QUEL-based UR query language. In a recent paper, Desai et al. [13] propose the use of nested relations to simplify the UR interface and thus the specification of document retrieval queries. The proposal therein does not alleviate the problem of logical navigation as the structure of the nested database needs to be known by the user. In analogy to PIQUE, NURQL is a ~i~do~-~~ed UR query language, and it allows a very flexible output interface according to the needs of the users (see Section 5). While adhering to the simplicity of UR-based query languages, NURQL possesses the full expressive power of SQL under the nested UR model.

4. QUERY LANGUAGES UR MODEL

FOR THE

There have recently been several experimental UR DBMSs developed [lO-12,331 though they are not yet commercially available. A UR query language simplifies, considerably, the user interface to a relational database as logical navigation is automatically undertaken by the DBMS. In a UR query language the user queries the database solely through the attributes. Thus, in addition to physical data independence provided by the relational model, the UR model provides logical data independence. Both SYSTEM/U [lo] and PIQUE [ll] are QUEL-based

Fig. 5. The scheme tree 7’>.

5, THE NURQL INTERFACE The nested UR model provides a very flexible interface at the external level, since nested relations allow users to view the data in a hierarchical fashion as well as in flat tables (see Figs 2 and 3). NURQL supports the restructuring operators NEST, UNNEST and ~NN~ST~, so that users can restructure their output according to their needs. Although these operators allow considerable flexibility as to the format of the output, it is cumbersome for users themselves to restructure the output of a query as befitting their needs. In order to solve this problem we allow three types of output interfaces for NURQL. Each interface specifies the automatic restructuring provided to the user by NURQL, namely: (1) a #at interface, also called the standard interface. This interface coincides with the classical UR model interface;

Fig. 7. The nested relation r: over R(T,).

NURQL (2) the NURS

interface, also called the standard nested interface. This interface coincides with the conceptual level of the nested UR model; (3) the specialized interface, also called the nonstandard nested interface. This interface involves further restructuring operations (using NEST and UNNEST) from the NURS interface and must be specified to NURQL by the database administrator (DBA).

We now make some further output interfaces:

remarks

311 1 STUDENTDEPT

MAJOR

EXAM

1

[ CLASS

(

1 TEXT

1

TUTOR

1 PROJECT

on these

1. The Jut interface and the NURS interface do not have to be specified to NURQL by the DBA, since they are encapsulated within the nested UR model. This is not the case with the specialized interface; however, this interface mav. better serve the demands of the user than the other two interfaces, especially when after using either (1) or (2) the user finds out that he himself spends too much time on restructuring his output. 2. Given a nested interface (standard or nonstandard), the _fZatinterface is realized via the operator UNNEST*. 3. The NURS interface is the most efficient, since it is the conceptual level of the nested UR model. 4. It is possible to provide different specialized interfaces to different users, according to their demands. 5. The DBA can be provided with tools to help him to find the appropriate interface for each group of users. 6. Each user must be aware of the interface at his disposal. This is mainly because any further restructuring of the output starts from the given interface. For the running example, a NEST from the NURS interface on, say DEPT, is bbviously different from a NEST on DEPT from thefIat or specialized interface. We further observe that a NEST on TEXT from the NURS interface does not even affect the output. 7. Due to (6), we recommend theflut interface for casual users and the nested interface for regular users. As we demonstrate in the next section, apart from restructuring, queries are formulated in the same way for any given output interface. As NURQL is a UR-based query language the user accesses the nested database solely through attributes. For the rest of the paper we assume that the output interface is the flat interface, unless explicitly stated otherwise. 6. A QUERY LANGUAGE FOR THE NESTED UR MODEL NURQL is an SQL-based query language for the nested UR model. We have tried to keep the syntax of NURQL as similar as possible to that of SQL [1],

Fig. 8. The NURS, U(T), for R(F) of the running example.

while adhering to the principle of orthogonality [1] as in [18], and incorporating the suggestions of Date [1] in order to improve the usability of NURQL. The addition of the restructuring operators (Section 6.3) adds the power of nested relations to NURQL as in HDBL and SQL/NF. On the other hand, the SQL FROM clause, which specifies the relation list for a query, is omitted in NURQL as tuple variables are bound to windows and not to nested relations (or flat relations). The semantics of the well-known SQL constructs such as, SELECT, WHERE, ORDER BY and IS NULL, are the same in NURQL, keeping in mind the semantics of the specific UR scheme, U. The advantage of this approach is that users can interact with NURQL, SQL and SQL/NF or HDBL, under the same DBMS, according to their sophistication and needs, without any unnecessary confusion between the different query languages. A unique feature of NURQL is its treatment of nulls within the nested UR model (Section 6.4). By adding the simple construct TOTAL the user has the option to remove tuples with nulls from the output, while in the default output, tuples do not contain nulls. This allows more informative answers to queries than other UR-based query languages, such as PIQUE, SYSTEM/U and DURST, which do not consider nulls in the users’ output. From now on we use the BNF notation when describing the syntax of NURQL. Thus, the general form of a query in NURQL is: [SELECT] (nested-query) [ WHERE(nested-query)]. We call this basic structure a SELECT-WHERE (SW) expression, since the logical navigation amongst the nested relations in the nested database, which is normally specified in the FROM clause, is now done automatically by NURQL. Furthermore, we note that the reserved word SELECT is redundant in a query, since we regard it as optional. Consequently, (Q,) SELECT

STUDENT WHERE CLASS = “databases”

M. LEVENE and G. LOIZOU

312

could be simplified to do STUDENT

CLASS = “databases”,

WHERE

and

we further remark that the reserved word is a shorthand for WHERE EXISTS. If an entire relationship among a set of attributes is desired, then we can omit the WHERE clause. For example, the relationship between STUDENT and CLASS is: WHERE

(Q,) SELECT

STUDENTCLASS.

If all the attributes of U are to be retrieved, then we use the reserved word UR. For example, (Q,) SELECT

UR WHERE

(Qs) SELECT

6.1 Simple queries Simple queries in NURQL have the general form: [SELECT](select-spec) [WHERE (search-condition)]. NURQL is a tuple-calculus language [21,22] and thus the (select-spec) is a list of tuple attribute pairs in the form of tuple-variable.attribute-name. As in PIQUE, SYSTEM/U and DURST, we assume a default tuple variable referred to as the blank tuple variable, which is a shorthand notation. Thus, (Q,) above is a shorthand for the query (Q4), namely: (Q,,) SELECT

blank. STUDENT WHERE blank. CLASS = “databases”.

(search-condition) is a collection of predicates combined with the Boolean operators AND, OR, NOT and parentheses which indicate the order of evaluation. Tuples which satisfy (search-condition) are retrieved from the nested database, and in the answer to a query only attributes from (select-spec) are output. Thus, the answer to (Q4) above is {Mark, Iris}, i.e. only tuples with CLASS = “databases” are retrieved. In SQL (SQL/NF) tuple variables are bound to relations (nested relations). In contrast, in NURQL, as in PIQUE, tuple variables are bound to windows. Each window in turn is bound in NURQL to an extended algebra expression; such an expression computes the value of the window [27]. Once tuple variables are bound to windows the query is evaluated, i.e. further operations (such as selection, binary operations, computing functions and restructuring) are performed on the windows to produce the answer to the query. These two stages are called binding and evaluation [4]. A simple query pertaining to this is: u. PROJECT WHERE

u. STUDENT,

TUTOR WHERE

SALARY > 10000,

which returns {Robert,Hanna}, the blank tuple variable is bound to [{TUTOR, SALARY}]. Now, consider a query involving more than one comparison, namely: (Q,) SELECT

t. DEPT, t. CLASS

WHERE t. TUTOR = “Robert” AND t. TEXT E {“Date”,“Ullman”},

CLASS = “databases”.

Throughout the rest of this paper, we include the reserved word SELECT in the query structure whenever it enhances the clarity of the query.

(Q,) SELECT

which returns { INF, NF2, UR, prolog}. We recall that WHERE is a shorthand for WHERE EXISTS, therefore (Q5) returns PROJECTS that are known to have at least one STUDENT. The tuple variable u is bound to [{PROJECT, STUDENT}]. In the query:

which returns {(CS, databases)}. We note that in PIQUE “*” is used for this interpretation of Q, instead of AND. In NURQL the Boolean operators AND, OR and NOT have their usual meaning. The binary operators and nested queries are used to bind the same tuple variable to different windows in the same query. 6.2 Queries with binary operators The form of a query with a binary operator is: (query-exp) (bin-op) (query-exp), where bin-op is one of: (bin-op)::

=

uNIoN 1DIFFERENCE 1 IINTERSECTION I [OUTER] JOIN 1TC (transitive closure).

We note that in queries involving set operators, the output from the corresponding query expressions must be compatible, i.e. it must be over the same set of attributes. To this end we give an example of INTERSECTION, namely: (Q,) (SELECT CLASS WHERE TEXT = “Date”) INTERSECTION (SELECT CLASS WHERE EXAM = “final”), which returns {databases}. More details on the OUTER JOIN and TC operators are given, respectively, in Sections 6.4 and 6.5. 6.3 Structured queries NURQL provides automatic restructuring for the output of a query according to the NURQL interface (see Section 5). The user may specify further restructuring via the operators NEST, UNNEST and UNNEST*. A structured query takes one of the following forms: l

UNNEST*(nested-query)

l

UNNEST

l

NEST

(nested-query)

ON (column-list)

(nested-query) ON {(column-list > INURS} [AS < column-name > 1.

NURQL

The UNNEST* operator converts the current interface into a Jrat interface, i.e. a flat relation is output. The NEST(nested-query)ON NURS converts the current interface into a NURS interface, i.e. the nesting corresponds to the NURS structure of the NDS. We use the “*” notation [6,7] for higher order attributes as the following query demonstrates. (Q9) NEST

(SELECT

TUTOR,

CHILD) ON CHILD

returns the NRS TUTOR(CHILD)*. Alternatively, we can give the higher order attribute a new name [8,24] as in: (Q,,,) NEST

TUTOR, CHILD) ON CHILD AS CHILDREN.

(SELECT

Since Qg results in the same structure as the NURS, query

we could have written the equivalent

(Q,,) NEST

(SELECT

TUTOR,

CHILD) ON NURS.

6.4. Ordering, duplicates and null values in NURQL

At the physical level we allow null values, duplicate tuples and ordering. In [2] we extended the nested UR model to include three types of null at the internal level, but windows only include the total tuples (i.e. tuples over the set of attributes of the given window that contain only non-null values [4]). Furthermore, the internal nested relational level does not support duplicates and ordering. NURQL supports duplicate tuple output, ordering and output of tuples with nulls. It is interesting to note that Ullman [5] argues in favour of the extension of windows to include null values, as information is lost, at times, when only the total projection is considered and output. Ordering of the output is done by using the ORDER BY clause. For example, (Q,,) SELECT (NEST (CLASS, STUDENT) ON STUDENT) WHERE (COUNT (STUDENT)*) >0 ORDER BY CLASS returns the nested relation shown in Fig. 9. The default output from NURQL eliminates duplicates (DISTINCT), outputs only total tuples (TOTAL) and does not order the output. Thus (Q,,) SELECT is equivalent SELECT

TUTOR

to (DISTINCT

TOTAL

(TUTOR)),

which returns {Robert, Hanna}. If null values are desired, then we have: (Qld) SELECT

(DISTINCT

and we get {Robert,Hanna,unk}.

(TUTOR)),

313

Fig. 9. The

ordered nested relation output for the structured query Q,,.

If duplicate tuples are desired but null values are not, then we have: (QIs) SELECT

(DUPLICATE

TOTAL

(TUTOR)).

In this case we would have “Robert” and “Hanna” repeated for as many times as they appear in [TUTOR] before the elimination of duplicates. In order to output duplicates with nulls we must pose the query (Q& SELECT

(DUPLICATE

(TUTOR)).

A tuple t, is subsumed by a tuple t, if t,, being non-null on a set of attributes X, implies that t, is also non-null on X and has the same values as t, over X. Finally, if null values are allowed but we wish to eliminate subsumed tuples, we must pose the query: (Q,,)

SELECT

(SUBSUME

(TUTOR)).

For the running example, the result of (Q,,) is the same as the result of (Q,,), since SUBSUME removes duplicates and a non-null value subsumes a null value. In general, subsumption in a query gives a different result than the default answer, when more than one attribute is output, as can be verified from the running example. Duplicate elimination and subsumption have a strong effect on statistical functions, such as COUNT, when null values and duplicates may be desired in the COUNT output. On the other hand, when null values are output, such functions as A 1/G may become undefined unless we allow default values. Thus, the user has to be cautious when applying statistical functions, by controlling the output of duplicates and nulls, so as to be able to interpret the answer correctly. The default of DISTINCT TOTAL conforms to the semantics of the nested UR model; thus it caters for most queries. In order to be able to interpret queries with statistical functions, we recommend that all the relevant attributes be output together with the function output, so as to make the result clearer to the user. We note that we have decided to include nulls of type dne in the default TOTAL output. This is due to the fact that dne nulls are considered as being just another domain value and not as a placeholder for missing informatoin [8]. In NURQL tuple variables are bound to windows. If we have more than one tuple variable in a query we may have a join operation. As null values are supported by the nested UR model, we may require

M. LEVENEand G. LOIZOU

314

an outer join [34], which preserves tuples that do not participate in the join. For example, in: (Q,,) DZSTZNCT (t. TUTOR, t. CLASS, u. STUDENT WHERE t. CLASS = u. CLASS PRESERVE t, u), we note that we PRESERVE tuple variables, since they are bound to winnows and not to relations. (Q,,) preserves the data for all the classes for which it is unknown whether any students take these classes or any tutors give them. By using the OUTER JOIN binary operator, (Q,,) is simplified to: (Q,,) (DISTINCT (TUTOR, CLASS) OUTER JOIN (~ZSTZNCT (STUDENT, CLASS)). Note the use of DISTINCT in the OUTER JOIN query. If only TOTAL tuples were output the query would reduce to a JOIN query. 6.5. Transitive closure queries In this section we briefly discuss the transitive closure operator 1211 TC which adds recursion and deduction to the DBMS [ 14,351. We note that the TC operator is not necessary for relational completeness [36], however, it is known to be very useful. Herein we provide the TC operator (cf. [19] RETRIEVE*) as a binary operator. Let X, , X,c Li, X]nX, = 0 and /X, f = 1X, 1(where 1S 1denotes the cardinality of a set S). A simple transitive closure query is of the form: (SELECT The interpretation

X,) TC (SELECT

TUTOR)

>Irx,,>m

TC (SELECT CHILD),

which returns the relation in Fig. 10. In the more general form, a transitive closure query may involve further operations on [{(X,),(X,)>] before TC is performed. 7. THE EXPRESSIVE

Fig. 10. The relation output for the transitive closure query Qzo.

the NURS, U(T), and u = p *(r *) be its corresponding flat relation, over U. Then u can always be restructured back into r* by a sequence of NEST operations [24]. More specifically, if (X)* is a higher order attribute of V(T), then the query: NEST(UNNEST

(UR) ON (X)*) ON X,

over r*, always outputs the given nested relation, r* [23,24]. Now, since NURQL supports the restructuring operators, NEST, UNNEST and UNNEST*, the result follows as in [8,24]. !J At this juncture, we stress the fact that in NURQL the logical navigation amongst the nested relations in the nested database is automatically done by the system. Thus, many queries which normally require joins can be stated by a simple query in NURQL. In general, queries in NURQL are simpler than queries in a query language supporting the nested relational mode1 or even the relational model. 8. CONCLUDING

as in the following query: (QzO) (SELECT

Brian Annette Richard Annette Richard

X2).

of this query is

(H, If(X, 1, (X*>)l) 2-C (n&X,

Robert Hanna Hanna Robert ~ Robert

POWER OF NURQL

We prove in this section that the expressive power of NURQL under the nested UR model is at least that of the extended relational algebra with nulls. We note that, since NURQL also allows functions, ordering, duplicates and transitive closure queries, it actually extends the power of the extended relational algebra. Pr~pos~ti~~ 1. The expressive power of NURQL under the nested UR model is at least as powerful as that of the extended relational algebra. Proof-In [21,22] it was shown that SQL is relationally complete in the sense that it has at least the expressive power of the relational algebra of Codd [34. In the context of this paper nested reiations contain the same info~ation as flat relations in the following sense. Let r* be a nested relation over

REMARKS

In this paper we presented a query language, namely NURQL, for the nested UR model. The nested UR mode1 provides a UR interface as a view at the external level, on top of the NURS at the conceptual level and supported by nested relations at the internal level (above the physical level). Thus, on the one hand, we gain the simplicity of the UR model, and on the other hand the expressive power of the nested relational modet. NURQL has the further advantage that it can be used as a UR interface to existing relational DBMSs in order to enhance their expressive power and usability, due to the fact that flat relations are a special case of nested relations. The syntax of NURQL is based on those of SQL, SQL/NF and HDBL. The full syntax is given in Appendix A. The basic construct of NURQL is the SW-expression, which is similar to the RETRIEVE-WHERE expression of PIQUE, SYSTEM/U and DURST. Since NURQL is simpler and more flexible than either SQL or, say SQL/NF, it will be easier for users to adapt to NURQL than to one of the other query languages. In addition, a user who is familiar with NURQL will be able to learn one of the above-mentioned query languages, for the nested relational model or the relational model, much faster, as many of the underlying concepts overlap. As in [18f we adhere to the principle OJ”orthogonality 111 by supporting fully nested expressions.

315

NURQL With the NEST, UNNEST and UNNEST* operators NURQL possesses the expressive power of nested relations under the nested UR model, thus allowing more complex applications to be tackled in comparison to the classical UR model. Moreover, functions in NURQL allow us to model statisticai data [15]. We also support duplicates, ordering and null

DOIH. F. Korth, G. M. Kuper, J. Feigenbaum, A. Van

values, which further enhance the flexibility of NURQL. Finally, we allow recursive queries via the TC operator.

]12] J. Biskup and H, H. Br~~emann. Data manjpuIation languages for the universal relation view DURST. In Proc. Inf. Co& Mafhemut~cal Foundations of Data Base Sysrems, Dresden, pp. 2tX-41 (1987). [131 B. C. Desai. P. Goyal and F. Sadri. Non-first normal form universal relations: an application to info~ation retrieval systems. Inform. Systems 12(l),

We conclude the paper by summarizing of the NURQL interface:

the goals

* To provide

a user-friendly interface, allowing flexible output formats. . To provide a simpler query language than existing ones for the relational and nested relational models. To provide both physical and logical data independence via the nested UR model.

To provide facilities for extended applications via the nested UR model. To provide a query language based on a model which allows efficient internal structures, i.e. nested relations which minimize redundancy,

l

explicitly represent dependencies and possess computational efficiency since some of the joins are realized in the nested relations themselves. To provide compatibility with SQL.

We are currently in the process of implementing parser for NURQL.

a

REFERENCE [I] C. J. Date. A Guide to the SQL Standard. AddisonWesley, Reading, Mass. (1987). [2] M. Levene and G. Loizou. A universal relation model for nested relations. In Prot. Int. Conf. Extending Dutubase T~chnofogy (Edited by J. W. Schmidt, S. Ceri and M. Missikoff), pp. 294-308, Springer-Verlag, Berlin (1988). [3] M. Levene and G. Loizou. A universal relation interface for a nested database. Research Report LL-88-01, Department of Computer Science, Birkbeck College, Univeristy of London, U.K. (1988). [4] D. Maier, J. D. Ullman and M. Y. Vardi. On the foundations of the universal relation model. Ass. comput. Mach. 283-308 (1984).

Trans. Database System.

9(2),

151J. D. Ullman. Universal relation interfaces for data‘.I base systems. In Proc. of IFIP (Edited by R. E. A. Mason), pp. 243-252. North-Holland, Amsterdam (1983). [6] S. Abiteboul and N. Bidoit. Non first normal form relations: an algebra allowing data restructuring. J. Comput. System. Sci. 33(3), 361-393 (1986). [7] 2. M. Ozsoyoglu and L.-Y. Yuan. A new normal form for nested relations. Ass. comput. Mach. Trans. Database System. 12(l), 11I-136 (1987). [8} M. A. Roth, H. F. Korth and A. Silberschatz. Null values in -r 1NF relational databases. Research Report TR-85-32, Department of Computer Science, University of Texas at Austin, U.S.A. (1985). [9] H.-J. Schek and M. H. Scholl. The relational model with relation-value attributes. Inform. Systems 11(2), 137-147 (1986).

Gelder and J. D. Ullman. System/U: a database system based on the universal relation assumption, Ass. comput. Mach. Trans. Database System. 9(3), 331-347 (1984).

PII D. Maier, D. Rozenshtein, S. Salveter, J. Stein and D. S. Warren. PIQUE: a relational query language without relations. Inform. Systems 12(3). 317-335 (1987).

49-55 (1987).

.



1141V. Linnemann. Non first normal form relations and recursive queries: an SQL-based approach. In Prof. 3rd IEEE Int. Conf Data Engineering, Los Angeles, pp. 591-598 (1987). 1151G. Ozsoyoglu, Z. M. Qzsoyoglu and V. Matos. Extending relational algebra and relational calculus with set-valued attributes and aggregate functions. Ass. camput. Mach. Trans. Database System. 12(4), 566592

(1987). P. Pistor and F. Anderson. Designing a generalized [I41 NF2 model with an SQL-type language interface. In Proc. 12th. Int. Conf. Very Large Databases, Kyoto, _._. Japan, pp. 278-285 (1986). 1171P. Pistor and Traunmueller. A database language for sets, lists and tables. Inform. Systems 11(4), 323-336 (1986). [18] M. A. Roth, H. F. Korth and D. S. Batory. SQL/NF: A query language for 7lNF relational databases. Infornt. Systems 12(l), 99-114 (1987). [19] L. A. Rowe and M. R. Stonebraker. The POSTGRES data model. In Prof. fkh fnt. Conf Very Large Databases, Brighton, pp. 83-94 (1987). [20] M. Stonebraker, J. Anton and E. Hanson. Extending a database system with procedures. Ass. eomput. Mach. Trans. Database System. 12(3), 35&376

(1987).

[Zl] D. Maier. The Theory of ~e~ui~o~l Databa.~e.~. Computer Science Press, Rockville (1983). [22] J. D. Ullman. Frinc~les of Database Systems. Computer Science Press, Rockville (1982). 1231G. Jaeschke and H.-J. Schek. Remarks on the algebra of non first normal form relations. In Proc. Ist ACM &Imp. Principles of Database Systems, Los Angeles, pp. i24-138 (1982). [24] S. J. Thomas and P. C. Fischer. Nested relational structures. In Advances in Computing Research, Vol. 3 (Edited by P. C. Kanellakis and F. Preparata), pp. 269-307. JAI Press, Greenwich (1986). [ZS] C. Zaniolo. Database relations with null values. J. Comput. System. Sci. 28(l), 1422166 (1984). ]261 *J. Biskup, H. H. Briiggemann, L. Schnetgiike and M. Kramer. One flavor assumption and y-acyclicity for universal relation views. In Proe. 5th ACM Symp. Principles of Database Systems, Cambridge Mass, pp. 148-159 (1986). ]271 M, Levene. The nested universal relation database model. Ph.D. Thesis, in preparation. Department of Computer Science, Birkbeck College, University of London ( 1989). [281 R. Fagin. Degrees of acyclicity for hypergraphs and relational database systems. J. Ass. comput. Mach. 30(3), 514-550 (1983).

]291 D. Van Gucht and P. C. Fischer. MultiIevel nested relational structures, .J. Compuf. System. Sci. 36(l), 77-105 (1988). [30] U. Dayal, N. Goodman and R. H. Katz. An extended relational algebra with control over duplicate

316

M. LEVENE an d G. Lcnzou

elimination. In Proc. 1st ACM Symp. Principles of Dafubuse Systems, Los Angeles, pp. 117-123 (1982). PiI R. H. Guting, R. Zicari and D. M. Choy. An algebra for structured of&e documents. IBM Research Report RJ 5559 (56648), San Jose, Calif. (1987). f321 M. H. Scholl, H.-B. Paul and H.-J. Schek. Supporting flat relations by a nested relational kernel. In Proc. 13rh Inr. Conf. Very Large Databases, Brighton, pp. 137-146 (1987). 1331 S. M. Kuck and Y. Sagiv. A universal relation database svstem imolemented via the network model. In Proc. k

ACM *Symp. Principles

of Database Systems,

[34] C. J. Date. The outer join. In Relulional Dafabases Seiected Wirings, pp. 335-366. Addison-Wesley,

Reading, Mass. (1987). [35] R. Agrawal. Alpha: an extension of relational algebra to express a class of recursive queries. In Proc. 3rd IEEE Inr. Conf. Data Engineering, Los Angeles, pp. 586590 (1987). [36] E. F. Codd. .Relational completeness of data base sublaneuaees. In Du~araBase &stems (Edited bv R. Rustinr, pi. 65-98. Prentice-H&, Endewood dliffs, New Jersey (1972).

Los Angeles, pp. 147-157 (19821. APPENDIX

A

We use the BNF notation for the NURQI. grammar with the following conventions. 1. 2. 3. 4.

Key words are indicated by uppercase italicized characters. Non-terminal symbols are enclosed with “( )“. Optional clauses are enclosed with “[ 1”. Alternatives are separated by “ I”. If only one of the symbols is to be chosen out of several alternatives, then we enclose them with “{ 1”. 5. “. . .” indicates zero or more repetitions of the previous entry. (query-exp):: = (query-spec) [
1function((query-exp))

(~~~~~~~~)::=MIN~MAX~AVG~SUMICOIINTIDISTINCT[T~TAL]~DL~PLICATE[TOTAL]ISUBSUME

(nested-query):: = (column-list) [((query-exp)) (query-spec):: =[SELECT] (select-spec) [WHERE (search-condition) [PRESERVE (tuple-variable) [,{tuple-variable). . ]]] 1[SELECT] ( nested-query) [WHERE (nested-query)] (structured-query):: =NEST(nested-query) ON ~{co1umn-list} 1NURS) [~~(att~bute-name)] / UNNEST(nested-query) ON (column-list) 1iJNNEST* (nested-query} (order-spec):: = ORDER BY (attribute-name) [, (attribute-name). . .] [AX IDESC] (select-spec):: = (column-exp) I (column-list) (search-condition):: = (Boolean-term) I(search-condition) OR (Boolean-term) {Boolean-term):: = (Boolean-factor) 1(Boolean-term) AND (Boolean-factor) (Boolean-factor}:: = [NO~](Boolean-primary) {Boolean-primary):: = (predicate) /({search~ondition}) (predicate}:: = (camp-predicate) /(in-predicate)I~~(~l~~3~clr (in-predicate):: = (scalar-exp) [NOT] ZN(nested-query) (exists-predicate>:: = [EXS’TS] (nested-query) (like-predicate}:: =(see [I}) (between-predicate):: = (see [l]) (null-predicate):: = (column-ref) IS [NOT] NULL (scalar-exp):: = (term) I(scalar-exp) { + / - } (term) (term):: = (factor) I (term){*l/}(factor) (factor):: =[ + I -](primary) .](attribute-name) (column-list):: =[ALL ~~~(coiumn-rep [, (column-ref). . .]/ UR (column-exp):: = (scalar-exp~[A~(attribute-name)] 1(column-exp) [,(column-exp). . .] (literal):: = (zero-order-value) \ (higher-order-value)