InformorionSystems Vol. 14, No. 3, pp. 231-246, 1989 Printed in Great Britain. All rights reserved
A GRAPHICAL
0306-4379/89 $3.00 + 0.00
Copyright 0 1989MaxwellPergamon Macmillanplc
QUERY LANGUAGE FOR THE BINARY RELATIONSHIP MODEL LEO MARK
Department of Computer Science and Institute of Advanced Computer Studies, University of Maryland, College Park, MD 20742, U.S.A. (Received 30 April 1987; in revisedform 24 November 1988; receivedfor publication 14 February 1989) Abstract-We present a graphical query language for the Binary Relationship Model. The language is closed with respect to the model, and it supports queries as well as view definitions. The user interacts with the database using a mouse on a graphical representation of the database schema. He first specifies a query context graph by highlighting parts of the database schema. A query context graph is an unordered graph that defines the context for the evaluation of a subsequent set of queries and view definitions. A query identifies the object type which is the subject of the query and specifies a set of conditions on the query context graph. The object type and the conditions constitute the minimum of information needed to traverse the query context graph and evaluate the query. The result of a query is the set of elements of the identified object type that are related to elements in the query context graph in accordance with the specified conditions. The language is one of several tools for the Binary Relationship Model being implemented by the Database Systems Group at Maryland.
1. INTRODUCTION
The Binary Relationship Model has been praised for the way it not only supports, but practically forces us to model the “deep structures” of information in the conceptual schema [l-4]. An adverse effect of this is that the conceptual schema becomes very large. Queries in the Binary Relationship Model are expressed by navigating in the schema, and the length of most queries is directly proportional to the size of the schema. There are several ways to make queries shorter, i.e. less wordy. In a previous paper [5] we introduced two new concepts for view definition, DOTS and CUTS. DOTS are derived object types and CUTS are derived relationship types. These concepts allow us to cut down on the size of the schema and to reuse queries in the formulation of new queries. The graphical query language is based on the textual query language in (51, but we have done much more than just replacing the keyboard by a mouse. A major design principle has been to try to relieve the user of the graphical query language from the strict syntax of the underlying textual query language. A query context graph is a connected piece of the database schema, within which a set of queries can be evaluated. It provides the user with the ability to easily change a query without having to redefine the whole query. The elements of a query context graph can be pointed out in any order. A query identifies an object type within the query context graph and specifies a set of conditions on the query context graph. The conditions in a query can be given in any order. The result of a query is the set of
elements of the identified object type that are related to elements in the query context graph in accordance with the specified conditions. The Binary Relationship Model is one of the many data models that appeared in the mid-70s. Abrial’s Semantic Binary Data Model [6] is to our best knowledge the first proposed Binary Relationship Model. The two basic modeling concepts are object types and binary relationship types between them. Two kinds of object types, entity-sets and entity-namesets, are distinguished in Senko’s Data Independent Accessing Model (DIAM) [l]. Surrogates are proposed as a means of representing entity-sets [7]. Several constraint types are added to the data structures of the Binary Relationship Model in [3, 8-111. The addition of “IS-A ” relationships completes the data structures of the model [l l-131. FORAL, the first query language for the Binary Relationship Model, is introduced with DIAM in [l]; it later appears in an elegant light-pen version in [14]. Although significantly different from the light-pen version of FORAL, the main inspiration to the graphical query language presented in this paper is FORAL. Another user-friendly query language, RIDL, is presented in [15]. A schema abstraction tool for the Binary Relationship Model is presented in [16]. Unfortunately, the abstractions are not defined through the query language and can therefore not be made the subject of queries. An information systems analysis and design methodology called Nijssen’s Information Analysis 231
LEOMARK
232
Methodology, or NIAM, is presented in [ 171.At least two companies, CDC and CREON, offer software support for the methodology. This software accepts a database schema in the Binary Relationship Mode1 as input and produces a normalized relational database schema as output. Some of our previous work on the Binary Relationship Model include [l8-201. In [ 181 we raise several questions that unveil problems with the Binary Relationship Model and we propose solutions to some of them. We also define a self-describing meta-schema for the Binary Relationship Model. In 1191we propose a data manipulation language that together with the self-describing meta-schema allows manipulation of all data and data descriptions in a database. In [21] we concentrate on aspects of schema manipulation using this data manipulation language. In [5] we introduce two new concepts, derived object types and derived binary relationship types, for view definition in the Binary Relationship Model. Finally, performance issues in the Binary Relationship Mode1 are analyzed in [22]. In Section 2 of this paper we first give a brief presentation of the graphical formalism used for schema definition in the Binary Relationship Model. As an example of its use we define a simple schema which will be used as a basis for the examples in this paper. After this introductory section, we proceed in Section 3 with a presentation of the major concepts of the graphical query language along with examples of its use. We introduce the concept of a query context graph; we give examples of chain, tree and lattice queries; and we show how the language can be used to define views. The syntax and semantics of the graphical query language are described along the way. A complete definition of the syntax and semantics of the textual query language is given in Appendix A. The syntax of the graphical query language is given in Appendix B.
2. CONCEPTS AND GRAPHICAL DATA DEFINITION LANGUAGE The following symbols are the building blocks of a conceptual schema in the Binary Relationship Mode1 (9, 111. Linguistic object types are represented by circles with a name inside. .. . /LOT". .NAME: .. . .
A lexical object type (LOT) with name “LOT NAME” represents a class of strings used to name objects. For example, NAME is a LOT having as elements (“SMITH”, “MARK”, “JONES”, . . .}; TITLE has as elements {“DATA”, “ADA”, “CLU”, “NORMALIZATION”, . . .).
0
NOLOT NAME
A non-lexical object type (NOLOT) with name “NOLOT NAME” represents a class of real world entities each of which is represented in the system by a surrogate [q. There is a one-to-one correspondence between surrogates and real world entities. Surrogates are distinct and generated by the system. They are invisible to and not changeable by the user of the system. The distinction between lexical and non-lexical object types makes it possible to represent an object separately from the lexical objects naming it. Binary relationship types are represented by a box, and two names inside.
A binary relationship type is an ordered pair of LOTS and/or NOLOTs; it represents a relation between the two linguistic types. r, and r, are names for the ROLES played by the linguistic object types in the binary relationship type. An example is a relationship between the NOLOTs, PERSON and PAPER, with the roles, “wrote” and “was by”, respectively. The relationship type defines a relation between the sets of occurrences of the two object types. We use the term active set for role r to represent the set of objects actually appearing in the tuples of the relation from the object type in role r. The set of occurrences of an object type is equal to the union of all active sets for the object type.
This special symbol for a binary relationship typecalled an IS-A or SUB-TYPE LINK-is used whenever a NOLOT IS-A subtype of another NOLOT or a LOT IS-A subtype of another LOT. An IS-A relationship defines a total function from the set of occurrences of the sub-object type to the set of occurrences of the super-object type. A number of symbols can be used to constrain relationships in the database, however, in this paper we do not need them and we refer to [S] for a detailed description.
Example 1
We illustrate the use of the graphic formalism for schema definition through the example schema in Fig. 1.
Graphical query language
233
.
. - .
l
Fig. 1. Example of a schema.
3. THE GRAPHICAL
QUERY LANGUAGE
We shall present the major concepts of the graphical query language in terms of examples. A complete definition of the syntax and semantics of a textual query language for the Binary Relationship Model is given in Appendix A. A syntax definition for the graphical query language is given in Appendix B. The graphical query language supports chain queries, tree queries and lattice queries. In addition, the user can define derived object types, DOTS and derived relationship types, CUTS. To relieve the user of the graphical query language from the strict syntax of the underlying textual query language, especially the order of mouse clicks, we introduce the concept of a query context graph, which is basically a piece of the database schema. Queries are interpreted by the system within the context of the current query context graph. The query context graph also provides the user with a means to easily change queries without having to redefine the whole query. A language that would allow the user to directly formulate queries using the mouse on the schema could fairly easily be defined, but it would have a very rigid ordering of mouse clicks as part of its syntax and it would not provide the user with a way of easily modifying queries.
3.1. Query context graphs A query context graph defines the context for the evaluation of a set of subsequent queries. A query context graph is a connected unordered subgraph of the database schema. It can consist of any number of LOTS and NOLOTs and binary relationships between them, as long as it is coherent. A query context graph is defined by using the mouse for pointing out the elements of the database schema that are to be part of it. The order in which the elements are pointed out makes no difference. The elements of the query context graph are highlighted in the database schema as they are pointed out. Changes to a query context graph, including the initial definition and subsequent modifications, are preceded by a mouse click to the keyword “change QCG”, and the end of the change is marked by a mouse click to the keyword “process”. A query context graph can be changed by adding or removing any number of elements in any order from it. Coherence of a query context graph is checked on the “process” command. In far most situations, we only need to point out the binary relationship types that are to be included in or removed from the query context graph. We shall see examples where pointing out the binary relationship types is not enough to create a query context graph.
234
h!O
MARK .
’
. .
.
.
. * .
l
TOPIC
?
. . .
. NAME
Fig. 2. Example of a query context graph (chain).
We illustrate the concept of a query context graph in the following example. Example 2
The query context graph in Fig. 2 consists of part of the database schema in Fig. 1, It is defined by using the mouse to point out the command keyword, change QCG, followed by a set of schema elements from Fig. 1, followed by the command keyword, process. The schema elements can be pointed out in any order as long as the query context graph is coherent when the process command is issued. Notice, that when the change QCG command is used, a mouse click to one of the roles of a binary relationship type will add the binary relationship type to the query context graph. -change
QCG
+(of,called -+(reads,read_by) +(denotehby,denoting) -+(on,of) -*process
Notice that there may be several identical pairs of role names in a database schema, but the mouse click uniquely identities the binary relationship types pointed out.
3.2. Queries A query context graph is an unordered graph that can form the basis for the evaluation of several different queries. A query provides the information needed to traverse the query context graph as a directed graph by pointing out a start point and by specifying conditions for the traversal of the query context graph. The result of a query is a set of values that belong to some lexical object type; values of non-lexical object types cannot be printed. Queries are always evaluated with respect to the query context graph. In addition to queries, the language supports the definition of DOTS and CUTS. We shall in the following show queries on query context graphs that are chains, trees and lattices. Example 3 (chain queries) The graphical query:
-list -NAME +process
will list the set of NAMES of PERSONS that read PAPERS on TOPICS denoted by some KEYWORD. The result consists of the set of lexical objects from NAME, that are related via the binary relation
235
Graphical query language (ofcalled) to non-lexical objects from PERSON, that in turn are related via the binary relation (reads,read-by) to non-lexical objects from PAPER, that in turn are related via the binary relation (onof) to non-lexical objects from TOPIC, that finally are related via the binary relation (denoted_by,denoting) to lexical objects from KEYWORD. The equivalent textual query is: list NAME of PERSON reads PAPER TOPIC denoted-by KEYWORD.
on
Using the relational algebra notation (u union, n intersection, - minus, II projection, u selection, D< semi-join, W join), the semantics of this query can be expressed as follows: II .,,,,,((of,called)W(reads,reahby) M(on,of)W(denoted_by,denoting)). Similarly, the following graphical query will list the set of KEYWORDS denoting TOPICS of PAPERS read by PERSON called some NAME. +list
*KEYWORD -*process
considerably simpler because there are fewer possible ways of joining the relations together in the binary model. Since the relations are binary, more joins are needed to evaluate a query than is normally the case in an n-at-y relational model. However, each of the binary relations is smaller, so fewer reads may be needed to read and join binary relations than to read and project an n-ary relation. Furthermore, multiattribute hashing works very well on binary relations. For a detailed analysis of performance of the Binary Relationship Model see [22]. We can specify conditions on the values of lexical object types in a query as in the following command v,hich will list the set of NAMES of PERSONS that read PAPERS on TOPICS denoted by the KEYWORD, “DATABASE”. + list +NAME -*value condition
*KEYWORD = “DATABASE” *process
The equivalent textual query is: list NAME of PERSON reads PAPER on TOPIC denoted-by KEYWORD = “DATABASE”.
The equivalent textual query is: list KEYWORD denoting TOPIC of PAPER reahby PERSON called NAME. The semantics of the query is defined by: II KEYWoRDdenolmg((denoting,denoted-by) W(of,on)W(read_by,reads)W(called,of)).
In general, the reduction of a chain query can be described as follows, where we use rj to denote the temporary result of prior reductions of the query:
It is outside the scope of this paper to discuss performance, but it is obvious that query optimization strategies similar to those used in the n-ary relational model can be applied in the binary model. As a matter of fact, the optimization algorithm is
The semantics of the query is defined by: II ,,,,,((of,called)W(reads,readby ) W(on,of)W (0 KEYWORDdenoting = DATABASE (denoted-by,denoting))). In general, the reduction of value conditions can be described as follows:
Chain queries can become quite complicated, as the following example shows.
Example 4 (chain queries, double back)
In the following graphical query we move all the way out to KEYWORD as in the previous queries,
LEOMARK
236
An alternative solution to these self-joint queries is to provide a facility for copying the query context graph. In the example above we would copy the query context graph into two parts with overlapping KEYWORD object type.
and then double back to NAME. --*list
+NAME -walue condition
+KEYWORD +denoting +of + readby -called -+NAME = “SMITH”
Example 5 (tree queries)
The following graphical command will change the query context graph into the tree in Fig. 3. + change QCG +(with,in) +(of,by)
-process
--*process
The result of the query is the NAMES of all PERSONS that read PAPERS on TOPICS denoted by a KEYWORD, that in turn is the KEYWORD of TOPICS of PAPERS ready by the PERSON called NAME, “SMITH”. The equivalent textual query is:
The following graphical query lists all the NAMES of PERSONS that read PAPERS, where the PAPERS are on the TOPIC denoted by the KEYWORD “DATABASE” and are published in a MAGAZINE published by the “ACM”. +list
+NAME + tree condition +PAPER -on
list NAME of PERSON reads PAPER on TOPIC denotedby KEYWORD denoting TOPIC of PAPER readby PERSON called NAME = “SMITH”.
+and
+in
The reason this query is more wordy than the previous ones is that the system needs more information than the query context graph and a simple condition on NAME to specify the condition on KEYWORD. The problem is that the path from KEYWORD back to NAME overlaps the path from NAME to KEYWORD.
-value
-walue condition
+PUBLISHER = “ACM” -process .
. .
. . . . . KEYWORD . . . . . . *
l
.
.
.
PUBLISHER-
.
.
.
’
.
.
. . .
* .
. .
.-&I
n TOPIC
.
l
NAME
condition
+KEYWORD = “DATABASE”
of
tolled
. l
Fig. 3. Example of a query context graph (tree).
231
Graphical query language Notice that the three conditions in the query can be given in any order.
A condition on a non-lexical object type is a conjunction or disjunction of roles of binary relationship types that elements in the object type must appear in. A condition on a lexical object type is a constraint on its set of values, which may consist of conjunctions or disjunctions of comparisons of lexical objects and constants. The comparison operators in these conditions play the same role as the roles of binary relationship types do for non-lexical object types. The equivalent textual query is: list NAME of PERSON reads PAPER
(on TOPIC denoted-by KEYWORD = “DATABASE” MAGAZINE by PUBLISHER
and in = “ACM”).
+process
The symbol + indicates mouse clicks that add elements to a query context graph, and the symbol + indicates mouse clicks that remove elements. The following graphical query lists the NAMES of all PERSONS who read a PAPER with the TITLE “DATABASE” that they did not themselves write:
+lattice condition
-+PERSON + reads +and +-I
PAPERon ((onvof)W = DATABASE
((in,with)W
(denotecLby,denoting)))nII,,,,,, (QPUBUHER~~=
QCG
+NAME
..,,,,((of,called)W(reads,reahby)W (0 KEYWORDdenoting
+chunge
c(with,in) +(of,by) t(on,of) +(denoted-by,denoting) +(wrote,wasby) +(entitled,of)
-list
The semantics of the query is defined by the following algebra expression:
:
query context graph into the lattice shown in Fig. 4.
-value
ACM (bY,Df))))h
Formally the reduction defined as follows:
+wrote +PAPER
of a tree query can be
condition
-TITLE = “DATABASE’ +process
The following example involves multiple paths between the same two object types.
We notice again that the conditions involved in the query above can be given in any order.
The equivalent textual query is: Example 6 (lattice queries)
The following
graphical
.*.-.
list NAME of PERSON (reads and iwrote)
statement
changes
the
PAPER entitled TITLE = “DATABASE”.
Fig. 4. Example of a query context graph (lattice).
...*.
238
LEOMARK
Using the relational notation the semantics of this query can be expressed as follows: II NAMEol ((of,called)W((reads,reahby) - (wrote,wasby))W (~ri~~r~r=nATABAsE(entitled90f))). It is important to realize the difference between the following two conditions on the result of a query: -lattice
condition
+PERSON *reads -+and
+wrote +PAPER + tree condition -PERSON -reads -*and
+wrote The first condition results in the set of NAMES for PERSONS that read PAPERS they wrote, whereas the second condition results in the NAMES of PERSONS who read PAPERS and wrote PAPERS, but not necessarily the same PAPERS. As can be seen from the condition on PERSON in the graphical query above, the lattice queries end conditions with an object type, whereas tree queries do not. As illustrated in the graphical query above, we can negate a binary relation in a query. This implies, that whenever we have a lattice, i.e. a set of binary relationships types between the same two object types, then we can formulate queries that require the computation of the set difference, intersection, or union of the binary relations in the lattice. The textual equivalents of the two conditions above are: . . . PERSON (reads and wrote) PAPER . . * PERSON (reads PAPER and wrote PAPER) Formally, the reduction defined as follows:
of a lattice query can be
schemata that are more detailed and precise than those designed in a purely lexical Relational Model. The lack of n-ary relationship types fragments the schemata more than those designed in an n-ary Relational Model. There are two disadvantages of the large schemata in the Binary Relationship Model. First, queries tend to be rather “wordy” because their length is directly proportional to the size of the schema. Second, it is sometimes hard to distinguish important concepts in a schema from less important details, especially since the schemata in the Binary Relationship Model are “flat”, i.e. aggregation is not explicitly supported. What we need for practical applications is a way of keeping the “deep structures” while seeing only the “surface structures”. We need to be able to define views on a conceptual schema. We need to be able to select and derive from the conceptual schema precisely the concepts we are interested in for each application of our model. In [5] we propose and discuss two new concepts for view definition in the Binary Relationship Model, DOTS and CUTS. DOTS are derived object types and CUT are derived roles. The two concepts are defined in the appendices. In the following examples we illustrate the concepts. Example 7 (DO Ts)
Based on the following query context graph (Fig. S), we can define the DOT,DATABASE_PERSON, as an object type derived from the non-lexical object type, PERSON. It consists of PERSONS that wrote, or write, or read PAPERS on a TOPIC denoted by the KEYWORD, “DATABASE”. The graphical statement defining the DOT, DATABASE-PERSON, is: *define
DOT “DATABASE-PERSON” *PERSON --*lattice condition
+PERSON -reads +or
3.3. Views Schemata in the Binary Relationship Model are considerably larger than schemata in the n-ary Relational Model. The reasons are the distinction between lexical and non-lexical object types and the lack of n-ary relationship types in the Binary Relationship Model. The distinction between lexical and non-lexical object types forces the database designer to design
+or
+wrote +PAPER *value condition
+KEYWORD = “DATABASE” +process
Graphical query language
239
Fig. 5. Query context graph for the DOT,DATABASE_PERSON.
The derived object type inherits the properties of the object type it is derived from, including its lexicality and its binary relationship types. It is reasonable to think of a derived object type as a dynamically defined subtype. After the process command above, the DOT becomes part of the database schema as a subtype of the object type it is derived from. It will appear as part of query context graph as illustrated in Fig. 6.
The equivalent textual DOT definition is: de& dot: = PERSON DATABASE-PERSON (reads or writes or wrote) PAPER on TOPIC denote&by KEYWORD = “DATABASE”. Using the relational algebra notation the semantics of this query can be expressed as follows: DATABASE-PERSON = lIPaRsoN((reads,read_by) u (writes,isby)u(wrote,wasby))W(on,of) W(~a,~a~n,r,s,&denoted-by9denoting))).
CT TOPIC
Fig. 6. The DOT,DATABASE_PERSON,
appears in the query context graph.
240
Lao MARK
of a DOT is to simplify subsequent queries. The following graphical command removes everything, but the DOT,DATABASE_PERSON, from the query context graph, and adds elements to the query context graph resulting in the graph in Fig. 7. Since DATABASE-PERSON is a subtype of PERSON it inherits the relationships that PERSON is involved in, as illustrated. The purpose
+ change QCG +(reads,read_by) +(writes,isby) t(wrote,was_by) +(on,of) +(denoted_by,denoting) +PERSON +(called,of) +(subscribe_to,subscr_by) +(by,of)
Using the definition of the DOT,DATABASE-PERSON, the query lists the NAMES of PERSONS that subscribe to MAGAZINES published by the PUBLISHER, “ACM”, and reads or writes or wrote PAPERS on TOPICS denoted by the KEYWORD, “DATABASE”. Using the algebra notation the semantics of the resulting query can be described as follows: NAMEor((of,called)WH,,,so, ~reads,read_by)u(writes,is-by)” (wrote,wasby))W(on,of)W (0 KEYWORDof = DATABASE (denote&by, denoting))))W(subscr_by,subscr_to) ~(“P”EILISHERof
=
Aw(bY,Of)).
Example 8
-process
Notice that we had to explicitly remove PERSON to keep DATABASE-PERSON in the query context graph. Had we removed the IS-A relationship between them, we would have removed both PERSON and DATABASEPERSON. The following graphical query lists the NAMES of DATABASEPERSONs that subscribe to MAGAZINES published by the PUBLISHER, “ACM”.
In Example 4 we experienced a problem with queries that overlap themselves on the query context graph. Using DOTS and the query context graph in Fig. 2, we can simplify the query in Example 4. The following graphical statement adds the DOT,SMITH’SXEYWORD, to the query context graph in Fig. 2 as a subtype of KEYWORD: +de$ne
DOT
+“SMITH’S KEYWORD” +KEYWORD -walue condition
+NAME = “SMITH” +process
4ist *NAME
The equivalent textual DOT definition is
-walue condition
define dot: SMITHS-KEYWORD = KEYWORD denoting TOPIC of PAPER read-by PERSON called NAME = “SMITH”.
-PUBLISHER = “ACM” +process
With this new query context graph, the following query is equivalent to the one in Example 4: -+list
-+NAME -+process
The equivalent textual query on the new query context graph is: list NAME of PERSON reads PAPER on TOPIC denoted-by SMITHS-KEYWORD. Example 9 (CUTS)
Using the graphical statement below on the query context graph in Fig. 7, we can define a CUT which further simplifies the formulation of queries: . . .
.
. * . NAME
-. Fig.
.
. .
.
-define CUT = “loyal-to” *from
. .
7. Query context containing graph DOT,DATABASE_PERSON.
the
+NAME *to +PUBLISHER +process
Graphical query language
The statement defines a CUT with the role name “loyaLto” from NAME to PUBLISHER. It is a derived binary relation consisting of pairs of NAME and PUBLISHER, where the NAMES are of DATABASE-PERSONS that subscribe to MAGAZINES published by PUBLISHERS. The equivalent textual CUT definition is: define CUT: loyalto from NAME to PUBLISHER = of DATABASEPERSON subscribes-to MAGAZINE by. Using the algebra notation the semantics of the CUT definition can be described as follows: (loyal-toJoyaLto-‘) = HNAMEofPuBLISHERof ((of,called)W(subscr_to,subscr-by)W(by,of)). The defined CUT becomes highlighted, and thus part of the query context graph. This is illustrated in Fig. 8. If we now remove the binary relationship types the CUT was defined from, leaving us with PUBLISHER, NAME, and the newly defined CUT, then the graphical query listing the NAMES loyal to the PUBLISHER, “IEEE”, is expressed as follows: + list +NAME -+value condition *PUBLISHER = “IEEE” -process Since the CUT, “loyaLto”, was defined on the DOT,DATABASE_PERSON, the full result of the query is a list of NAMES of PERSONS that subscribe to MAGAZINES by the PUBLISHER, “IEEE” and reads or writes or wrote PAPERS on TOPICS denoted by the KEYWORD, “DATABASE”.
.:
NAME
I,+, .
. * .
*
Fig. 8. Query context graph after CUT definition.
241
Using the definitions of the CUT, loyalto, and the DOT,DATABASE-PERSON, the above query is equivalent to the following textual query: list NAME of PERSON(subscr_.to MAGAZINE by PUBLISHER = “IEEE” and (reads or writes or wrote) PAPER on TOPIC denoted-by KEYWORD = “DATABASE”). In the example above, the CUT simply replaced a simple chain between two object types. If the query context graph is a tree or a lattice, then conditions must be defined on all branch points for the CUT to be well defined, just as for DOTS. It should be emphasized that the query context graph is just as important for the result of a query as the query itself. The concept of a query context graph is primarily introduced to allow the user to break up the formulation of a query, to reuse parts of a query, and to avoid a strict order of mouse clicks. 4. GRAPHICAL
ENVIRONMENT
The design of the screen layout will be described in a follow-up paper when our prototyping effort starts. We plan to implement graphical interfaces for schema definition and query processing. The query processing interface will support queries through any application schema and through the meta-schema. Since application schemata may be very large we expect to use the screen as a sliding window that can be moved over a schema. 4.1. Requirements to the schema definition interface The graphical schema definition interface must support the definition of schemata consisting of the following graphical elements: full and dashed circles with object type names; for each lexical object type defined, there must be a selection of primitive types; double boxes with role names connecting two object types; arrows connecting two object types for IS-A relationships. In addition to this, there must be symbols for key constraints, cardinality constraints, totality constraints, uniqueness constraints, subset constraints, exclusion constraints, and partition constraints. For a definition of these constraints, see [5]. Finally, on-line support for further documentation at schema-definition-time should be provided. All of the above elements should be represented by icons that can be selected and used as part of the schema. We need to be able to add and remove elements from a schema, and the schema should be “sticky” in an intelligent way to allow new elements to be added where they belong to form a coherent schema. We need to be. able to move, stretch and bend the graphical schema definition in any direction; and it should not come apart when we do so. The number of command keywords (possibly also represented by icons) can be. kept very small, i.e. less than 10. They will include the following: define a schema, change a schema, store a schema.
LEO MARK
242
The schema definition interface should give immediate feedback to the database designer, i.e. it should report object type name conflicts, role name ambiguity, inconsistencies in subtype hierarchies, inconsistencies among constraints, etc. 4.2. Requirements to the query interface We need to be able to issue queries through any application schema and through the meta-schema. The latter is quite important as a means to find and summarize information about large application schemata. It may also be used as a way to locate a desired part of an application schema for further query processing. We need to be able to highlight parts of the schema and dim others. This facility is needed to support the query context graph concept. Object types, roles and IS-A relations must be able to “feel” mouse clicks. Roles must be able to feel mouse clicks in two different modes. When a query context graph is defined or changed, a mouse click to a role implicates the whole binary relationship type, whereas it only implicates the role itself when a query is specified. Conditions on lexical object types need special consideration from a screen management point-ofview because they may require use of the keyboard rather than the mouse. It is clearly undesirable to change medium, but it is difficult to find simple and nice alternatives. The query processing interface should give immediate feedback to the user when a query is being defined. The feedback can warn the user about the violation of any syntax rules of the graphical query language and about violations of context sensitive rules of the language evaluated against the current schema. The number of command keywords is intensionally kept very small for the query processing interface. They include the commands: change QCG, process, list, dejine DOT, define CUT. . from . . to, and lattice condition, tree condition, and value condition. In addition, the logical comparison operators: =, (, ),, are needed. Finally, we need the logical connectors: and, or, 7.
We have yet to decide how the result of list statements are to be presented to the user. 5. SUMMARY We have defined a graphical query language for the Binary Relationship Model. The language has the same retrieval power as a textual language presented earlier. The language is well-suited for interactive query processing and for validating a schema design. The language can be used for query processing through both the meta-schema and application schemata. Although equivalent, the graphical and textual languages are considerably different. Through the
introduction of the concept of a query context graph, we have managed to avoid a strict order of mouse clicks in the graphical queries, and, at the same time, the query context graph serves as the reusable part of a query. There are many ways to compare the textual and graphical query languages. In the textual language we have to type the names of all the object types and the names of half of the roles in what corresponds to the query context graph. The total number of mouse clicks needed to change the query context graph and formulate a query is hard to estimate. When we change a query contest graph we only need mouse clicks to the binary relationship types that are to be removed or added to the graph; only rarely are mouse clicks needed for the object types. If the query context graph is unchanged or only changed a little, then the query is going to require far fewer mouse clicks than the number of words needed for the equivalent textual query. But more important than quantitative measures, we believe that the graphical query language presented here is qualitatively better than the textual language. Experiments verifying this cannot be conducted until the prototype has been built. The graphical interface is ideal for building and validating prototypes of databases in the Binary Relationship Model. The part of the schema currently being worked on will occupy most of the screen both during schema definition and query processing, and only the menues or icons used for the two functions may have to change. We are currently working on a number of ideas, some of these are described in the following. An n -ary graphical query interface is fairly easy to define. Inspired by Query-by-Example, making a relation part of the query context graph corresponds closely to specifying join variables in QBE. To project the query context graph into an n -ary lexical relation, we simply need to point out more than one lexical object type to be listed. Selection corresponds closely to value conditions. With an appropriate graphical notation for quantifiers we can define a relationally complete graphical query interface for the Binary Relational Model! The only problem is that the results of the queries are not part of the model, so we are talking about a query interface, not a query language. This work is already close to completion. In this paper we have concentrated on the closed part of a query language which makes it possible to define views. A graphical data entry language for the Binary Relationship Model and an interface for the Relational Model are also under definition. In both cases, the data entry facility integrates very well with the nature of the query ianguage and interface. Given the above languages, we can define an n-ary relational interface on top of a binary database. We even believe that we can automatically generate SQL code from graphical queries and use this code in application programs running against the relational
Graphical query language interface. Has anyone ever heard about graphical database application programming languages? As in the relational model, the DDL and the DML are not completely separated because the DML is used for view definition, and, as in the relational model, only minor modifications of the DML are needed in order to use it as a constraint definition language. In the following example we first define a query context graph, then we constrain the PUBLISHER for SMITH to be IEEE: -+change
QCG
+(of,W
-+(subscr_to,subscr_by) +(of,called) -‘process -define +value
constraint condition
+NAME = “SMITH” -+constraint
-+PUBLISHER = “IEEE’ -+process
Including a constraint definition facility like this is another interesting topic for future research. With a little imagination, most of the constraints needed can be expressed using such a tool. Constraint and view definitions must be stored in parsed form under the meta-schema (see Appendix A). It is not good to just represent the view defining expressions as values of LOTS in the meta-schema, rather we would like to explicitly model the components of the view defining expressions. The reason is that the query language processor needs access to both the schema and the view definitions through the meta-schema.
[91 G. M. Nijssen. An architecture for knowledge base
software. Presented to the Australian Computer society, Melbourne, July 30 (1981). WY B. Breutman, E. Falkenberg and A. M. Mauer. CSL: a conceptual schema language. Proc. ZFIP TC 2-WG 2.6 Working Conf, Munchen, Germany, March (1979). Griethuysen (Ed.). Report IS0 1111J. J. van TC97/SC5/WG3: concepts and terminology for the conceptual schema and information base. American National Standards Institute (1982). WI J. M. Smith and D. Smith. Database abstractions: aggregation and generalization. ACM Trans. Database System. 2(2), 105-133 (1977).
(131 S. Ceri, G. Pelagatti and G. methodology for designing static of data base applications. Inform. P41 M. F. Senko. FORAL LP: Design
Bracchi. Structured and dynamic aspects Sysfems 6(l) (1981). and implementation. Proc. Very Large Data Base Conf. Berlin (1978). WI R. Meersman. RIDL: A query system as support for information analysis. ECODO 32, 3-3@3-37 (1981). 1161D. Vermeir. Semantic hierarchy and abstractions in conceptual schemata. Inform. Systems 8, 117-124 (1983). P71 G. M. A. Verheijen and J. Van Bekkum. NIAM: an information analysis method. In Information Systems Design Methodologies: A Comparative Review (Edited by T. W. Olle, H. G. Sol and A. A. Verrijn-Stuart). North-Holland, Amsterdam (1982). WI L. Mark. What is the binary relationship approach?. In Entity-Relationship Approach to Software Engineering
(Edited by Davis). North-Holland, Amsterdam (1983). v91 L. Mark and N. Roussopoulos. Integration of data, schema, and meta-schema in the context of self-documenting data models. In Entity-Relationship Approach to Software Engineering (Edited by Davis). NorthHolland, Amsterdam (1983). PO1 L. Mark. Self-describing database systems-formalization and realization. Ph.D. Dissertation TR-1484, Department of Computer Science, University of Maryland (1985). WI N. Roussopoulos and L. Mark. Schema manipulation in self-describing and self-documenting data models. Int. J. Compur. Inform. Sci. 14(l), l-28 (1985). 1221C.-C. Lin, L. Mark, T. Sellis and C. Faloutsos. Performance issues in the binary relationship model. Data Knowledge Engng (1988).
REFERENCES (11M. E. Senko. Specification of stored data structures and desired output results in DIAM II with FOBAL. Proc. Very Large Dais Base Conf., Berlin (1975). PI W. Kent. Data and Reality. North-Holland, Amsterdam (1978). [31 G. M. Nijssen. One, two or three conceptual schemata. Proc. IFIP TC 2-WG 2.6 Working Conf, Munchen, Germany, March (1979). 141 R. Meersman. Reasoning about conceptual schema design-a semi-tutorial. In Proceedings IFZP TC-2 Conference on Knowledge and Data. North-Holland, Amsterdam (1987). PI L. Mark. Defining views in the binary relationship model. Inform. Systems. 12(3), 281-294 (1987). b51 J. R. Abrial. Data semantics. In Data Base Management (Edited by J. W. Klimbie and K. L. Koffeman), pp. l-59. North-Holland, Amsterdam (1974). [71 P. Hall. J. Owlett and S. Todd. Relations and entities. In Modelling in Database Management Systems (Edited by G. M. Nijssen). North-Holland, Amsterdam (1976). PI G. M. Nijssen. Current issues in conceptual schema concepts. In Architecture and Models in Data Base Management Systems (Edited by G. M. Nijssen). North-Holland, Amsterdam (1977).
243
APPENDIX
A
In this appendix we use a self-describing meta-schema for the Binary Relationship Model as the basis for the definition of the syntax and semantics of the query language. The meia-schema
The meta-schema (Fig. Al) can be read as follows. It defines BIN REL TYPES “with” precisely two (2!) ROLES. Each ROLE can be “called” at most (-) one ROLE NAME, which is the ROLE NAME “of” the ROLE. Each (V) ROLE is “of” at most one (-) BIN REL TYPE and “on” at most one (-) OBJECT TYPE. Each v’) OBJECT TYPE is uniquely (-) “called” an OBJECT TYPE NAME. OBJECT TYPE NAMES are “of’ at most one (-) OBJECT TYPE. OBJECT TYPES are partitioned 8 into lexical- and non-lexical obiect types (LOTS and NOLOTs). SUBTYPE LINKS are BIN RELTYPEs with ROLE called “sub” on one of the OBJECT TYPES and with ROLE called “sup” on the other OBJECT TYPE. (Only SUBTYPE LINKS will be allowed to have role names, “sub” and “sup”). ROLES are partitioned @ into PRIMITIVE ROLES and DERIVED ROLES. Each (V) DERIVED ROLE “has” one (u) SHORTCUT EXPRESSION, which is “for” a DERIVED ROLE. A DERIVED BIN REL
244
LEO MARK
TYPE is-a BIN REL TYPE. OBJECT TYPE is partitioned 0 into PRIMITIVE OBJECT TYPES and DERIVED OBJECT TYPES. Each (V) DERIVED OBJECT TYPE “has” one DOT EXPRESSION, which is “for” a DERIVED OBJECT TYPE. We shall not here concentrate on the details of making a meta-schema self-describing. For a detailed discussion see [201. In the extension of the meta-schema we store both a description of application schema defined in terms of the Binary Relationship Model and a description of the metaschema itself. This latter description can be used whenever we want to retrieve information about the general rules and laws for defining schemata using the Binary Relationship Model since this is exactly what the meta-schema describes. We obviously do not want to change the stored description of the meta-schema. Syntax
Using curly brackets for repetition, square brackets for optionality, slash for choice, bold type for terminals, and less than and greater than brackets for non-terminals, the syntax of the query and view definition language is:
A (lotname) must be an occurrence of the object type “OBJECT TYPE NAME” and this occurrence must be connected to an occurrence of the object type “LOT”. A (nolot_name) must be an occurrence of the object type “OBJECT TYPE NAME” and this occurrence must be connected to an occurrence of the object type “NOLOT”. An (object-type-name) must be an occurrence of the object type “OBJECT TYPE NAME” and this occurrence must be connected to an occurrence of either of the object types “NOLOT” or “LOT”. A (role-name) must be an occurrence of the object type “ROLE NAME” and this occurrence must be connected to an occurrence of the object type “ROLE”. Alternativelv. we could have defined
List (lot_objectset)
(dot_definition)
.._ ...._ ..-
(cut-definition)
.._ ..-
define CUT: ((role-name),(roleename>) from (object_typename) to (object_typename)
(objectset)
(nolot_objectset) (lot-objectset)
(nolotobject-set)
....I .._ ..-
(lot-object-set)
:: I .._ ..-
(liststatement)
define dot: (object-typename)
= (object-set)
= (role)
::
(qualific) (role)
(multi-role) (camp)
:: .._ ..I .._ ..-
(nolotname) (nolot_name)(qualific) (nolot_name)((qualific){and (nolotname)((qualific){or (lot-name) (lot_name)(comp)(value) (lotname)(qualific) (lot.name)((qualific){nnd (lot_name)((qualific){or
(qualific)}) (qualific)})
(qualific)}) (qualific)})
i ((qualific)) (role)(objectset) (rolename) 7 (role) (role)(object-set)(role) ((multi-role))
I ..I ....-
(role){and (role)} (role){or (role)} =ll~/~/f
For brevity, (value), (rolename), (objecttype name), (lotname) and (nolotname) are not defined in further detail. A (value) will be represented by a string of digits or upper-case letters enclosed in single quotes. A (roledame) will be represented by a lower-case string. A (nolotname), (lot-name) and (object-type-name) will be represented by an upper-case string. An acceptable statement must in addition to the syntax rules above obey a set of context sensitiverules, i.e. rules that depend on the particular schema the statement is issued against. These context sensitive rules are best defined in terms of rules on the data, i.e. schema, defined by the meta-schema.
All first (role_name)s in the expansion of all the (qualific)s of a (nolot-objectset) must be names of roles of the NOLOT named by the (nolot_name) in the (nolot-object-set). We note, that subtypes inherit the roles of their supertypes. The first (nolotname) or (lot-name) in the expansion of an (object-set) of a (qualific) must be the name of an object type in the range of the role represented by the (role-name) of the (qualific). In the expansion of the non-terminal, (role), into (role), followed by (object-set) followed by (role),, the first object type name in the expansion of the (objectset) must be the name of an object type in the range of (role), and the last object type name must have (role), as a role. When a (role) expands into a (multirole), then all the first (role_name)s in the expansions of all (role)s in the
Graphical query language
245
..
. : .
DOT_
.,
EXPRES
:
. ,:
&
..... . .
ROLE NAME . .
.Ofi&.
‘.
.
TYPE
for
bar
:
:
:
of
called
Fig. Al. The meta-schema. conjunct/disjunct must be names of roles of the preceeding object type; and, all the last (role_name)s in the expansions of all (role)s in the conjunct/disjunct must be names of roles that have the one object succeeding the conjunct/disjunct in their range. The two statements (dot-definition) and (cut-definition) define derived object types and derived roles, respectively. A derived object type, or DOT, inherits the roles of the object type it is derived from and can be used in place of an object type in a list statement. A derived role, or CUT, can likewise be used in place of a role in a list statement. The two concepts, DOTS and CUTS, provide a view concept for the Binary Relationship Model. For more details, please refer to [S]. The context sensitive rules for DOTS and CUTS are as follows. There are no additional rules for DOTS. The “from” object type in the CUT definition must have a role that is the same as the first role in the expansion of (role), and the “to” object type must be in the range of the last role in the expansion of (role). If the (role) expands into a (multi-role), then these rules apply to all the first and last roles in all roles in the conjunct/disjunct. The semantic of the query language is defined as follows. The list statement prints the set of values that results from evaluating the set of lexical objects defined by (lot-objectset). The define dot statement defines a derived object type with the name (objecttype-name) and the set of objects defined by (object-set), but it does not cause an evaluation of this set. The derived object type has the lexicality of the (object-set). The deline cut statement assigns the role name, (role-name) to the role defined by (role) from the object type denoted by the first
(object-type-name) to the object type denoted by the second (object-type_name), but it does not cause an evaluation of this role. Names of derived object types and roles will simply be replaced by the expression defining them when used in a query. It is important to realize, that the only place objects are stored in this model is in the binary relations. There are no existence lists of objects for the object types. Given the binary relation 0, (r, .r,)O,, then the active set of 0, in role r, is the set {x1$:(x, y)EO,(r,,r,)O,}. The active set of 0, is the union of all the active sets of 0, in some role. Subtype relations are represented as binary relations with the role names “sub” and “sup”. For query optimization purposes, we may choose to store materialized derived object types and roles. A conceptually consistent way of storing materialized derived object types would be to simply create a subtype relation between the derived object type and the object type it is derived from. A materialized derived role can simply be stored as a binary relationship type. A (nolot-objectset) defines a subset of the set of occurrences of the NOLOT denoted by (nolotname). If the (nolot_object-set) consists of (nolot-name) only, then the two sets are identical. If the (nolotobjectset) consists of (nolot-name) followed by (qualific), then there are two cases to consider. First, if (qualific) consists of (role) followed by (object-set), then the (nolot_object_set) is the subset of (nolot-name) which is related by (role> to (objectset). Second, if (qualific) is negated, then the (nolot-objectset) is the (active) subset of (nolot-name) which is not related by (role) to (object-set). If the (nolot-objectset) consists of (nolot_name) followed by a conjunction of (qualific)s, then the (nolot-object-set) is
246
LEOMARK
the intersection of the subsets of (nolot-name) determined by the individual (qualific)s in the conjunction. If the (nolotobjectset) consists of (nolot-name) followed by a disjunction of (qualific)s, then the (nolot-object-set) is the union of the subsets of (nolot-name) determined by the individual (qualific)s in the disjunction. A (lotobject-set) defines a subset of the set of occurrences of the LOT represented by (lot-name). If the (lot_object_set) consists of (lotname) followed by (camp) followed by (value), then the (lot-object-set) is the set of objects oe(lot-name) for which o(comp) (value) evaluates to true. We can think of (camp) as a role on (lot-name) and (value) as a single lexical object in (lotname). The semantics of the other rules for (lotobjectset) are the same as the rules for (nolotobjectset), mutatis mutandis.
A role is a multi-values function from one object type to another. If (role) consists of (role-name), then the role is identical to the role denoted by (role-name). If (role) consists of (role), followed by (objectset) followed by (role),, then (role) is identical to the function composition of (role), and (role),. If (role) is a (multirole), then there are two cases. If the (multi-role) is a conjunction of (role)s, then (role) is the intersection of the conjuncts. If (role) is a disjunction of (role)s, then (role) is the union of the disjuncts. Subtype-links can be part of role deriving expressions like any other role. Since subtypes inherit the binary relationship types of their supertypes we only include subtypes in role deriving expressions when we explicitly want to reserve the resulting CUT for the subtype and prevent it from being used from the supertype.
APPENDIX
B
In this appendix we define the syntax of the graphical query language. Change of query context graph
(change_QCG)
.._ ..-
{add-element)
....-
(remove-element)
:: =
(element)
.._ ..-
*change {[(addelement)/(remove_element)]}-rprocess (*change and process are keywords pointed out by a mouse click*)
+(element) (*corresponds to a mouse click adding an element*) +(element) (*corresponds to a mouse click removing an element*) NOLOT/LOT/ROLE/SUBTYPE_LINK
Query formulation (listquery) (defineDOT)
.._ ......-
(define-CUT)
....-
+list -rLOT{(condition)} *process -+deJine DOT “(name)“+[LOT/NOLOTJ {(condition)} +process +define CUT “(name),(name)” -from -+[LOT/NOLOT]+ to +[LOT/NOLOTJ {(condition)} +process
Conditions on query context graph
(condition) (lattice condition)
....:: =
(tree-condition)
....-
(value-condition)
:: =
[(lattice_condition)/(treerondition)/(value_condition)] -lattice condition +[NOLOT/LOT] {+ROLE~[and/or/~]}+ROLE~[LOT/NOLO~ + tree condition +[NOLOT/LOTl {-+ROLE+nd/or/l]}-ROLE -walue condition -LOT-r{ = /(/)/}“(value)”