UTILIZATION OF DATA ACCESS AND MANIPULATION IN CONCEPTUAL SCHEMA DEFINITIONS

GERNOT RICHTER
Gesellschaft für Mathematik und Datenverarbeitung mbH Bonn (GMD), Postfach 1240, D-5205 St. Augustin 1, Federal Republic of Germany

(Received 20 May 1980)

Abstract-The basic idea of this approach is to use data access and manipulation functions in data definition, such that testing a given individual data object on its conformance to data definition is done by running a (finally boolean) procedure against it. In essence, schema entries (i.e. definitions, declarations, etc.) are viewed as expressions of predicate logic where the individuals are obtained by the execution of data manipulation operations which include a rather general information selection technique or conceptual access method.
Selection of information constructs in databases usually adopts a technique closely tailored to the specific “data model”. It is one of the intentions of this paper to demonstrate the common principles behind the variety of selection techniques by a uniform approach which comprises the selection features of most of the database management systems and makes them comparable. At the level of information structure a kind of “geography” is introduced into a database, which makes it possible to distinguish the same information construct (record, file, segment, item, coset, tuple, ...) in distinct information contexts or at distinct conceptual locations called “spots”. By definition, every spot (“construct in/with context”) exists only once in a given database. In combination with some basic operators a logical addressing mechanism has been designed, which follows the context of a construct to identify it within this very context. This algorithm turns out to be a general vehicle to locate a construct at a spot, independently of whether the database has a relational, network, hierarchical or any other appearance.
The method of context directed addressing along with pertinent operators makes it possible, in a very general way that is neither biased towards nor restricted to a “data model”, to define types of information constructs and of construct transitions as is required in a conceptual community schema. This is demonstrated through examples of schema entries with rather complex cross-consistency conditions and additional transition rules called persistency conditions. The examples are also intended to give an idea of the minimum support to be expected from any future conceptual schema language.

PREFACE

From mid 1977 to mid 1979 a Brazilian/German cooperative project in the field of database technology was carried out in Rio de Janeiro and Porto Alegre. The project resulted in the specification of the database operation language DABOL[1] (or LOBAN in the original Portuguese version[2,3]), a subset of which is being implemented at the Federal University in Porto Alegre and is intended to be implemented at the Federal University in Rio de Janeiro.
The underlying principles of DABOL originate from a conceptual approach which aims at creating a system of concepts for the precise and adequate description of information management structures and operations. Parts of this system of concepts, called IMC (Information Management Concepts), have been published earlier in [4-8] and elsewhere, and have sometimes been misinterpreted as a proposal of a (hierarchical) “data model”. The reason for this might become clear after the notion of information context has been discussed (see Section 3). The availability of an (even still incomplete) system of tried basic concepts before designing a database management interface proved to be of great benefit for the objective of a clear and precise language design and specification, as has been striven for with DABOL.
The same findings have been obtained in another project, which also relies on IMC. This project resulted in the specification of an interface called DAGS
which aims at supporting implementors of database management systems rather than users. An experimental subset of DAGS has been implemented and is running at GMD.

1. INTRODUCTION

In this introduction the basic concepts of the IMC approach for the specification of data structures are summarized in an informal way. Their application to various “data models” is outlined in Section 2, which also serves to introduce the two reference databases of this article. The emphasis of the present paper, however, is on the use of data access and manipulation for conceptual schema definition. This requires presentation of one of the fundamental issues of IMC, which is the conceptual distinction between an information construct (“data object”) and its appearance in an information context. The concept of spot, which reflects this very distinction, underlies in particular the technique of pointing to a spot in a database, i.e. of identifying a construct in a considered context. Therefore, after a short introductory presentation of the concept of spot in Section 3, the technique of context directed spot addressing as available at, but not limited to, the DABOL interface is explained and demonstrated in Section 4. The application of spot addressing for data type definition and transition rule specification (“transitional constraints”) at database interfaces is covered in Sections 5
and 6, which constitute the “message” to be contributed to the current conceptual schema discussion. Finally, other application fields of spot addressing are mentioned in Section 7. The appendix summarizes the few language elements introduced in this paper.

The “data objects” in the IMC approach are of an abstract nature in the sense that their (potential or actual) representations do not enter into the concepts on the model information level. Any abstract “piece of information” which can be referred to at a considered database management interface is called an information construct or simply construct. The fact that constructs are intended to model something (real world, ...) and the way this is achieved are of great importance in information system design and operation, but are not regarded in the present context.

A construct which is considered elementary, i.e. neither composed nor decomposable at the interface in mind, is an atom, as opposed to an aggregate, which is composed according to one of the two composition principles in IMC. An aggregate is either a finite set of constructs (the immediate component constructs) and is then called a collection, or a finite function from constructs (the name constructs) to constructs (the immediate component constructs) and is then called a nomination. Usually only constructs of special kinds like strings (aggregates) and atoms may appear in the role of name constructs, or names for short.

In the graphical representation of constructs (see Section 2) circles are used to indicate the presence of a name (in nominations), whereas a box (in IMC box representation) or the vertex of a tree (in IMC tree representation) is used to depict a considered construct and its component constructs (in aggregates). (For the relationship between box representation and tree representation as well as for their combination see [5].) In the box representation, the way a nomination maps the name constructs onto the component constructs is depicted by “pair symbols”.
To distinguish the role of the name constructs in a nomination, a pair (name, component), which is not itself considered a construct, is represented by a pair symbol that sets the name apart from the component (a circle marks the name) rather than by two like boxes, [name] [component]. This yields representations of nominations as shown in Figs. 1, 2, 5, 7 and 10. In the following section box representation will be used together with a shorthand tabular representation for “flat” nominations and for sets of nominations with the same name set, i.e. for collectives as they are called in IMC.

2. CONSTRUCT COMPOSITION IN DATA MODELS
As far as the mere compositional characteristics of the various data models are concerned they differ only in the way of how constructs are aggregated (apart from terminology). The relational approach (cf. [9]) is concerned with “flat structures” which are nominations on items and are there called tuples.
Fig. 1. IMC box representation of a relational tuple.
In Fig. 1 a tuple is represented, where the set of names is {SNO, SNAME, STATUS, CITY} and the immediate component constructs (or values) are the items S1, SMITH, 20 and LONDON. Constructs which may be components in a relational tuple are atoms or other basic constructs like character strings and numbers. In the following discussions we shall refer to them as items. Incidentally, the layout of a graphic construct representation has no bearing on its interpretation. Thus Fig. 2, for instance, represents the same nomination as Fig. 1.
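The remark that layout does not matter can be made concrete with a small sketch. The modelling choice below is an assumption made for illustration and is not part of IMC: a nomination is held as a Python mapping from name constructs to component constructs, and items are plain values. The helper name `nomination` is hypothetical.

```python
# Assumption for illustration only: a nomination is modelled as a mapping
# from name constructs to component constructs; items are plain values.

def nomination(pairs):
    """Build a nomination from (name, component) pairs; names must be unique."""
    result = {}
    for name, component in pairs:
        if name in result:
            raise ValueError(f"duplicate name construct: {name!r}")
        result[name] = component
    return result

# The tuple of Fig. 1, written down in two different "layouts".
fig1 = nomination([("SNO", "S1"), ("SNAME", "SMITH"),
                   ("STATUS", 20), ("CITY", "LONDON")])
fig2 = nomination([("CITY", "LONDON"), ("STATUS", 20),
                   ("SNO", "S1"), ("SNAME", "SMITH")])

same_construct = fig1 == fig2   # mapping equality ignores the order of pairs
name_set = set(fig1)            # the set of name constructs
```

Because a nomination is a function from names to components, the order in which the pairs are drawn or listed carries no information, which is what the comparison of the two layouts expresses.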
Fig. 2. IMC box representation of the same relational tuple as in Fig. 1.
Apart from the above box representation of nominations, a tabular representation is sometimes more convenient to draw. Figure 3 exemplifies a tabular representation for the tuple of Figs. 1 and 2. Where the type or generic type of a construct is to be indicated as well, the pertaining type designation is represented in a “type plate” attached to the construct symbol[5], as shown in Figs. 4 and 7. (For the concept of type see [4,7] and Section 5 of this paper.)
name constructs:        SNO   SNAME   STATUS   CITY
component constructs:   S1    SMITH   20       LONDON

Fig. 3. IMC tabular representation of the same relational tuple as in Figs. 1 and 2.
A relation in the relational approach is a collective of tuples[4], which means a collection of tuples with the same set of names. (In addition, in all tuples the values of a considered name are usually required to be drawn from the same type or domain.) Taking the sample data of [9] (Fig. 3.1), a full box representation of the table of suppliers (S) may look as in Fig. 4. The repeated appearance of the name symbols and names in the box representation of a collective reflects exactly the compositional properties of the represented construct on the conceptual level. To simplify the drawing of tables, however, a shorthand tabular representation has been provided in IMC. This method “factorizes” the common name set of the tuples and puts it in the familiar way as a header on top of the remaining individual parts of the tuple symbols.

Fig. 4. IMC box representation of a collective of tuples (“relation”).

In Fig. 5 a hybrid representation is shown, where the three tables for the collectives of supplier tuples, of part tuples and of shipment tuples appear in tabular representation under the names S, P and SP, respectively. They make up a relational database which will be used as a reference example throughout the paper. The whole database (last level of aggregation) is considered a nomination which maps the name constructs S, P and SP onto the three relations. The elements of the common name set of the tuples in a relation are called attributes. The type of items, that is the “pool of values from which the actual values” of a given attribute are drawn, is referred to as its domain [9].

Fig. 5. Hybrid IMC representation of a relational database (combination of box representation and tabular representation). The construct identifiers c.. are for use in Section 3 and are not part of the construct representation.

As another example for the composition of constructs consider the entity-relationship model [10]. Further aggregation levels have been introduced there in order to reflect more reality structure in the information structure. Two kinds of tuples are distinguished: An entity tuple is a nomination on items and/or on mathematical tuples of items (called values). Thus it resembles (but is not) a relational tuple. Named entity tuples and named items may form a “higher” nomination called relationship tuple. Depending on the context different terms are used for the concept of name: The name of an item or item tuple in an entity tuple is an entity attribute, the name of an entity tuple in a relationship tuple is a role and the name of an item in a relationship tuple is called a relationship attribute. Schematically this structure can be depicted as in Fig. 6. A relationship tuple of type PROJECT-WORKER taken from [10] (Figs. 7 and 8) is shown together with type designations in Fig. 7. Note that instead of the entity key for the entity tuple of type EMPLOYEE the full entity tuple has been inserted.

Fig. 6. Schematic IMC box representation of the structure of a relationship tuple. The three dots (...) indicate “various”, but not how many at least/at most.

Entity tuples with common name set and relationship tuples with common name set are the components of entity relations and relationship relations, which are collectives of the respective tuples. Apparently, for item type the term value set, for type of entity tuples the term entity set, and for type of relationship tuples the term relationship set is used in [10].

Finally, a rough analysis of the CODASYL network approach, as far as construct composition is concerned, demonstrates the applicability of the IMC primitives for this model, too. The basic idea of the network approach (see [9]) is to form an information unit, i.e. a construct, where one record (the “owner”) is linked to a set of records (the “members”) and vice versa. In addition, an order relation is established on the set of member records. For such a construct a “neutral” and artificial term has been coined which stresses the aspect of linking one record with a set of records. We call it a ligation [2,1]. In a ligation one or more order relation(s) may be, but need not be, established on the set of member records. In the CODASYL network approach at least one order relation is always present; for such a construct the term “set” or, preferably, coset has been used in the literature. On the other hand, in [9] (Section 25.2) a construct called fan is introduced, which is a ligation without any order relation (by definition). Within the scope of the present discussion order relations in
Fig. 7. IMC box representation of a relationship tuple of type PROJECT-WORKER.
Fig. 8. Schematic IMC box representation of the structure of a ligation without order relations on the members.
ligations will not be taken into consideration. In Fig. 8 the general structure of a ligation (without order relations) is represented, where the names have been chosen by analogy with the main components of a coset (in a fan of [9] PARENT is used instead of OWNER, CHILDREN instead of MEMBERS). A record of the network approach may be a rather highly aggregated nomination on items and/or nominations. In order to leave the term record free for use in other contexts we prefer to consider these nominations generalized tuples, with the flat tuples being a subset thereof. The CODASYL approach does have the concept of a set of ligations, but does not have a term for it (see “fan set” in [9]). A set of ligations is an IMC collective. To allow for an easy distinction between a collective of tuples and a collective of ligations we use the terms relational table and ligational table, respectively. Figure 9 gives a schematic overview of the structures introduced so far and the related terminology. The information model of an environment of suppliers,
parts and shipments in terms of relational tables has been represented in Fig. 5 (see [9], Fig. 3.1). A possible model of the same environment in terms of ligational tables (network approach) is shown in Fig. 10, which differs from the corresponding model in ([9], Fig. 3.5) in that the “connector” tuples (shipments) bear their membership information in a visible manner, as is done in the example of Fig. 20.15 in [9]. Thus we obtain a “network database” which is a nomination of the names S-SP and P-SP on the two ligational tables and will serve as a reference example in the remainder of this paper, contrasting with Fig. 5. The characteristic of being a “network structure” will come out as a cross-consistency constraint between ligational tables (to be discussed in Section 5) and is therefore not a matter of mere coset composition (nor of IMC representation technique).

3. CONSTRUCTS IN CONTEXT

So far we have considered isolated constructs or, to put it more precisely, constructs without regard to the information context in which they appear. What is the “context” of a construct? Loosely speaking, the context comprises all the constructs “around” the construct in mind, up to the level of aggregation where the considered context terminates, i.e. where the border of the context (= construct) of reference has been reached. It has been shown in [4-6] that the context around a construct and its logical position within this context can be precisely defined by a sequence of pairs, where in each pair the right side is a construct and the left side is empty or the name of the construct at the right side. Referring to the network database of Fig. 10 we locate as an example the item S3 in the ligational table P-SP. For layout reasons a graphical representation of the comprising aggregates would be too clumsy for the notation of a sequence of pairs. So we use symbols c.. instead, to represent the involved non-item constructs in the
Fig. 9. Schematic IMC box representation of the structure of constructs considered in Sections 3-6.
Fig. 10. Hybrid IMC representation of a network database (ligational tables S-SP and P-SP, each ligation with an OWNER tuple and a MEMBERS table of shipment tuples with the names SNO, PNO and QTY). For the construct identifiers c.. see Fig. 5.
following expressions. The sequence

(-, ca) (P-SP, cc) (-, ce) (MEMBERS, ci) (-, ch) (SNO, S3)

defines the spot where the construct S3 appears in the considered context. The sequence of pairs we call the defining sequence of a spot. There is one and only one defining sequence for a spot. In the present notation a single hyphen (-) indicates the absence of a name (empty name). On the level of immediate components of collections names are always absent by definition, in contrast to nominations, which do not have immediate components without names.

One might be interested in locating the same construct S3 in its two other contexts or, using the new concept, at its two other spots in the network database. This can be achieved by the defining sequences

(-, ca) (S-SP, cb) (-, cd) (OWNER, cf) (SNO, S3)

and

(-, ca) (S-SP, cb) (-, cd) (MEMBERS, cg) (-, ch) (SNO, S3).

In the present discussion it does not matter whether the construct appears at different spots by chance (as, e.g. the construct 17 in two owners in P-SP) or by virtue of a “consistency constraint” (as, e.g. the construct S3 in the given example). In the relational database of Fig. 5 the construct S3 appears at the two spots

(-, cp) (S, cq) (-, cf) (SNO, S3)

and

(-, cp) (SP, cr) (-, ch) (SNO, S3).

Note that the two tuples cf and ch in the relational database are the same constructs as cf and ch, respectively, in the referred network database (they only appear in different contexts, i.e. at different spots).

The defining sequence of a spot has a top-down orientation. It starts with a pair (empty name, reference construct) and concludes with a pair (name, construct at spot in mind) or (empty name, construct at spot in mind), depending on whether the construct at the considered spot appears there under a name (in a nomination) or not (in a collection). In the above examples the constructs ca (the network database) and cp (the relational database) have been considered the reference constructs. In a defining sequence of a spot there are no “jumps” or “gaps”: In two adjacent pairs the right side construct of the successor pair is always an immediate component of the right side construct of the predecessor pair. A sequence where the first and the last pair coincide, i.e. which has only one pair, defines the spot of the construct isolated from any context. We call this spot the ownspot of the construct, because it refers to the “own” construct as its reference construct.

As an important consequence of the definition of the spot concept it turns out that there is a spot hierarchy in every structure oriented data model or, in other words, the spot hierarchy is not typical of a “hierarchical model”. This will be utilized for a uniform addressing approach on a logical or conceptual basis as shown in the following section.

IS Vol. 6, No. 1-E

4. OPERATORS AND SPOT ADDRESSES

The universal hierarchical spot structure makes it possible to design a general logical addressing algorithm for spots, which follows the defining sequence of a spot in its top-down stepping from the ownspot of the reference context (construct) to the spot in mind. The basic idea is to substitute each pair of the defining sequence by a spot selection criterion to be evaluated against each candidate spot, i.e. against each spot immediately beneath the spots which have already been selected by the evaluation of the preceding criterion. In order to syntactically separate the criteria by level and to explicitly indicate the transition (descent) of the evaluation process to the next lower level a “level separator symbol” (say a dot) is introduced. Thus an expression has been obtained which is called a spot address and reflects the idea of the spot defining sequence by its syntax:

spot-address ::= spot-selection-criterion [.spot-selection-criterion]...
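The defining sequences of Section 3, which the spot address syntax mirrors level by level, can be enumerated mechanically. The following is a minimal sketch under stated assumptions, not the DABOL implementation: nominations are modelled as Python dicts, collections as lists, items as plain values, and `None` stands for the empty name. The function names `spots` and `name_path` and the excerpt of the network database are illustrative.

```python
# Assumptions: dict = nomination, list = collection, other values = items.
# None plays the role of the empty name "-".

def spots(reference):
    """Yield the defining sequence of every spot within `reference`."""
    def walk(sequence):
        yield sequence
        construct = sequence[-1][1]
        if isinstance(construct, dict):        # named immediate components
            pairs = list(construct.items())
        elif isinstance(construct, list):      # unnamed immediate components
            pairs = [(None, c) for c in construct]
        else:                                  # item: nothing beneath
            pairs = []
        for name, component in pairs:
            yield from walk(sequence + ((name, component),))
    # The one-pair sequence is the ownspot of the reference construct.
    yield from walk(((None, reference),))

def name_path(sequence):
    """Left elements of a defining sequence, '-' for the empty name."""
    return tuple("-" if name is None else name for name, _ in sequence)

# Illustrative excerpt of the network database (values as legible in Fig. 10).
ch = {"SNO": "S3", "PNO": "P2", "QTY": 200}                   # shipment tuple
cf = {"SNO": "S3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"}
p2 = {"PNO": "P2", "PNAME": "BOLT", "COLOR": "GREEN", "WEIGHT": 17, "CITY": "PARIS"}
db = {
    "S-SP": [{"OWNER": cf, "MEMBERS": [ch]}],
    "P-SP": [{"OWNER": p2, "MEMBERS": [ch]}],
}

# All spots at which the item S3 appears under the name SNO.
s3_spots = [name_path(s) for s in spots(db) if s[-1] == ("SNO", "S3")]
```

In this excerpt the construct S3 is found at three spots, corresponding to the three defining sequences given above for the network database; the tuple `ch` is one construct appearing at two spots.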
Like the first pair in a spot defining sequence, the first spot selection criterion indicates the context of reference or, more precisely, identifies the spot of the reference construct. For this purpose spot identifiers are defined, which may be standard word symbols (“reserved words”) at a given database management interface (e.g. “DB” for the ownspot of the overall database construct). Before we continue with what a spot selection criterion is for spots within the reference construct, two basic operators (primitives) have to be introduced which work on spots. Both are read or fetch operators: By
applying them to a spot, the name construct (operator N) or the component construct (operator C) at this spot is obtained in a virtual intermediate zone which serves to keep intermediate results (in the very same way as, e.g. on the occasion of the evaluation of a nested arithmetic expression). In terms of the defining sequence of a spot we can paraphrase the two operators this way:

N spot-address   “name at spot”: yields in the intermediate zone the left element (Name construct) of the last pair in the defining sequence of the addressed spot (for spots of immediate components of collections it results in “undefined”).

C spot-address   “component at spot”: yields in the intermediate zone the right element (Component construct) of the last pair in the defining sequence of the addressed spot.

With these two operators as a minimum we are in the position to obtain the operands for other operators which work on the obtained constructs (in the intermediate zone) rather than on spots. Examples are the well known arithmetic operators, relational operators, boolean operators, etc. Thus, e.g. a product of two “variables” VAR1 and VAR2 has to be written

C VAR1 * C VAR2

rather than

VAR1 * VAR2

where VAR1 and VAR2 are very simple spot addresses supposed to refer to a nomination with at least two immediate components (“values”) under the names VAR1 and VAR2.

Now we can return to the spot selection criteria and define them as boolean expressions (and, later on, predicate calculus expressions): Only candidate spots for which the evaluation results in TRUE will be selected and taken into consideration when the candidates of the next lower spot level are “nominated”. In order to be able to write down a spot address we have to start with the definition of at least two spot identifiers:

DB   for the ownspot of the DataBase construct,
XS   for the candidate spot (= examinee) against which a considered criterion is about to be evaluated (the examination spot).

As an example we want to address the file with the name P-SP in the network database of Fig. 10. The spot address is

DB.(N XS = P-SP)

where DB is the first spot selection criterion, the dot is the level separator symbol and (N XS = P-SP) is the second spot selection criterion. The evaluation of this spot address begins with the “selection” of the spot identified by DB, i.e. the ownspot of the database. Then the dot is “executed”, which means to descend to the next lower level and to flag all spots immediately beneath the selected spot(s) as candidates with respect to the next spot selection criterion, in this case the two spots where the files S-SP and P-SP appear. Now the system will (conceptually) set up a loop where it examines for each candidate spot whether the name which appears at the examination spot is P-SP (literal) or not. As it has arrived at the last criterion in the given spot address, the spot(s) with the result TRUE is (are) the addressed spot(s). Thus e.g. the expression

N DB.(N XS = P-SP)

yields the name at the addressed spot, that is the string P-SP, whereas the expression

C DB.(N XS = P-SP)

yields the component construct at the addressed spot, that is the ligational table cc. Note that within the spot selection criteria there are again expressions (and spot addresses), by which constructs are obtained in the intermediate zone in order to process them in a boolean operation.

As an example of nested spot addresses we address (the spot of) the owner record of the ligation (“coset”) which represents part P2 and its shipments in the network database of Fig. 10. To depict the top-down evaluation of the spot address we write it in a top-down notation with a more conspicuous layout (with indention instead of parentheses). On the right hand side a shorthand notation is shown which makes use of the syntactical redundancy of the term “XS.” (unless explicitly stated otherwise, an inner spot address refers to the examination spot) and of the term “N XS =” (a single name literal being the whole criterion will always be interpreted as if preceded by this term). The same shorthand notation in a one line layout (below) resembles to some extent a familiar way of notation.

notation               shorthand notation   comment
DB                     DB                   identifier of database ownspot
.N XS = P-SP           .P-SP                criterion for the spot of the ligational table with name P-SP
.C XS                  .C                   criterion for the spot of the ligation for part P2
    .N XS = OWNER          OWNER
    .N XS = PNO            .PNO
  = P2                   = P2
.N XS = OWNER          .OWNER               criterion for the spot of the tuple with the name OWNER (in the P2 ligation)

shorthand notation in one line layout

DB.P-SP.(C OWNER.PNO = P2).OWNER
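The top-down evaluation just described can be traced in a short sketch. It is an illustration under assumptions, not the DABOL implementation: a spot is represented by its defining sequence (a tuple of pairs), nominations are dicts, collections are lists, and each criterion is a Python predicate on the examination spot XS. All function names (`ownspot`, `beneath`, `address`, `named`, `single`) are hypothetical; only the operator names N and C come from the text.

```python
def ownspot(construct):
    """One-pair defining sequence: the ownspot of `construct`."""
    return ((None, construct),)

def N(spot):
    """Name at spot (None where the text says 'undefined')."""
    return spot[-1][0]

def C(spot):
    """Component construct at spot."""
    return spot[-1][1]

def beneath(spot):
    """Candidate spots immediately beneath `spot`."""
    construct = C(spot)
    if isinstance(construct, dict):            # nomination
        pairs = list(construct.items())
    elif isinstance(construct, list):          # collection
        pairs = [(None, c) for c in construct]
    else:                                      # item
        pairs = []
    return [spot + (pair,) for pair in pairs]

def address(start, *criteria):
    """Evaluate a spot address: each criterion stands for one dot (one level)."""
    selected = [start]
    for criterion in criteria:
        candidates = [xs for s in selected for xs in beneath(s)]
        selected = [xs for xs in candidates if criterion(xs)]
    return selected

def named(literal):
    """Shorthand: a bare name literal stands for 'N XS = literal'."""
    return lambda xs: N(xs) == literal

def single(spot_set):
    """N and C accept singleton spot sets only; fails otherwise."""
    (spot,) = spot_set
    return spot

# Illustrative excerpt of the ligational table P-SP (values as legible in Fig. 10).
db = {
    "S-SP": [],
    "P-SP": [
        {"OWNER": {"PNO": "P2", "PNAME": "BOLT", "COLOR": "GREEN",
                   "WEIGHT": 17, "CITY": "PARIS"},
         "MEMBERS": [{"SNO": "S3", "PNO": "P2", "QTY": 200}]},
        {"OWNER": {"PNO": "P3", "PNAME": "SCREW", "COLOR": "BLUE",
                   "WEIGHT": 17, "CITY": "ROME"},
         "MEMBERS": []},
    ],
}
DB = ownspot(db)

# DB.P-SP.(C OWNER.PNO = P2).OWNER
owner_spots = address(
    DB,
    named("P-SP"),
    lambda xs: C(single(address(xs, named("OWNER"), named("PNO")))) == "P2",
    named("OWNER"),
)
file_name = N(single(address(DB, named("P-SP"))))   # N DB.(N XS = P-SP)
```

The inner call `address(xs, ...)` plays the part of the inner spot address that implicitly starts at the examination spot, which is the redundancy the shorthand notation exploits.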
A spot address, by the way, need not be unique. Strictly speaking, a spot address always addresses a (possibly singleton) set of spots. For the operators N and C, however, only singleton spot sets may be addressed. An operator which can cope with various addressed spots is the collection operator:

COLLEC spot-address   yields in the intermediate zone a collection of the component constructs encountered at the addressed spots.

With the three operators N, C and COLLEC (others have been defined in [2,1]) we are in the position to do some comparative exercises of answering queries against different databases by the same uniform conceptual addressing mechanism based on spot addresses. Due to the restricted space available, neither the generalization of ... spot addressing can be presented in this paper. Shorthand notation of spot addresses and a space saving layout will be used.

Example 4.1 Obtain supplier numbers for suppliers in Paris with status > 20 ([9], 5.3.3, 7.2.3). Note that on an “obtained” construct further operations have to be executed in order to get it into a working area or to put it into the database. But this is beyond the scope of this paper.

Against the relational database (Fig. 5)

... STATUS > 20 ...

Against the network database (Fig. 10)

COLLEC DB.S-SP
   .C OWNER.CITY = PARIS AND C OWNER.STATUS > 20
   .OWNER.SNO

In both cases a collection (of supplier numbers) is obtained rather than a one column relation.

Example 4.2 Obtain supplier names for suppliers who supply at least one red part ([9], 5.3.12, 7.2.7).
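Returning to Example 4.1, the claim that the same level-by-level mechanism answers the query against both databases can be tried out in a small sketch. Everything below is an assumption made for illustration: dicts stand for nominations, lists for collections, criteria are Python predicates on (name, component), and the relational expression it mimics, COLLEC DB.S.(C CITY = PARIS AND C STATUS > 20).SNO, is a reconstruction by analogy with the network expression, not a quotation. The supplier rows are those legible in Figs. 5 and 10.

```python
def components(construct):
    """Immediate (name, component) pairs; None stands for the empty name."""
    if isinstance(construct, dict):        # nomination
        return list(construct.items())
    if isinstance(construct, list):        # collection
        return [(None, c) for c in construct]
    return []                              # item

def address(reference, *criteria):
    """Descend one level per criterion, keeping pairs for which it holds."""
    selected = [(None, reference)]
    for criterion in criteria:
        selected = [(name, comp)
                    for _, parent in selected
                    for name, comp in components(parent)
                    if criterion(name, comp)]
    return selected

def named(literal):
    """Bare name literal, i.e. 'N XS = literal'."""
    return lambda name, comp: name == literal

def collec(selected):
    """COLLEC: collection of the components at the addressed spots."""
    return {comp for _, comp in selected}

suppliers = [
    {"SNO": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
    {"SNO": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"},
    {"SNO": "S3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"},
]
relational_db = {"S": suppliers}
network_db = {"S-SP": [{"OWNER": s, "MEMBERS": []} for s in suppliers]}

# Relational: DB.S.(tuple criterion).SNO  (reconstructed form, see lead-in)
from_relational = collec(address(
    relational_db,
    named("S"),
    lambda name, t: t["CITY"] == "PARIS" and t["STATUS"] > 20,
    named("SNO"),
))

# Network: DB.S-SP.(ligation criterion).OWNER.SNO
from_network = collec(address(
    network_db,
    named("S-SP"),
    lambda name, lig: (lig["OWNER"]["CITY"] == "PARIS"
                       and lig["OWNER"]["STATUS"] > 20),
    named("OWNER"),
    named("SNO"),
))
```

The two calls differ only in the path they descend; the result in each case is a plain collection of items, here a Python set, rather than a one column table.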
construct type x a separate type specification (in terms of a composing (pre)type y) which has to be met by every occurrence of x. In a considered language this is achieved by a type definition entry which can be paraphrased as

for each occurrence of type x holds: is occurrence of y and (additionally) meets type specification

or, somewhat loosely, in set theoretic notation

x = {y | type specification}.

... designation, we get a poly-hierarchic type composition like in a bill of materials application, where the part types correspond to the construct types in the present case. Conceptually, a type specification has to be evaluated each time a construct is examined with respect to its type conformance, which means to subject the construct to an operation which accepts or rejects it[7].
Now the point is, that in the type specification (boolean expressions, later on extended to predicate calculus expressions) the same spot addressing mechanism can be applied as in data manipulation, regardingthe candidate construct as the “database” for the purpose of type checking.The only thing we need in
63
Utilizationof data access and manipulationin conceptualschemadefinitions addition is a further spot identifier:
OC for the Ownspot of the Construct to be examined (i.e. for the ownspot of the pretended Cl
Let S-TUP be the type designation of supplier tuples (see x above) which we use in the two databases of Figs. 5 and 10, and TUPLE the built-in pretype designation for tuples (see y above), then the corresponding type definition entry could be

    S-TUP : type of TUPLE such that
        C OC.SNO EL SUPPLIERNO AND
        C OC.SNAME EL SUPPLIERNAME AND
        C OC.STATUS EL STATUS AND
        C OC.CITY EL CITYNAME

where the type designations for the component types are supposed to refer to their type definition entries (e.g. with enumeration terms on the right side of EL). This "schema entry" along with all other entries referred to could be used by the system to check, e.g., whether the tuple of Fig. 1 belongs to S-TUP or not. If the construct S1 is element of SUPPLIERNO, the construct SMITH element of SUPPLIERNAME, the construct 20 element of STATUS and the construct LONDON element of CITYNAME (and by convention, no further component is present), the tuple will pass the check. In a shorthand notation we could dispense with the term "OC." because it is obvious that by default the reference construct of spot addresses in a schema entry is the pretended occurrence.

The utilization of operators like C, N, COLLEC on spots within the database and of boolean, relational and other operators on obtained constructs (see below) renders a type specification an executable expression in that it is considered a boolean procedure the execution of which against a given construct constitutes the verification process and yields "yes" or "no". This opens the way for going beyond the mere compositional type definition. Further (possibly more complex) expressions may be admitted which establish logical conditions on the components of an occurrence by the use of first order predicate logic. These expressions cover all type relevant aspects of the so-called "consistency constraints" or "integrity constraints".

As we don't see any reason for emphasizing composition specification as opposed to (cross-)consistency specification, we shall call a type definition entry of this general nature a consistency entry, that is an entry which allows to test a single construct for whether it is consistent (conforms to the specified type) or not. An (incomplete) set of consistency entries for the databases of Figs. 5 and 10 may demonstrate the definition power of consistency entries. For the purpose of these examples we assume the pretype designations DATABASE, RELTABLE (relational table), TUPLE, LIGTABLE (ligational table), LIGATION, INTEGER and STRING. Moreover, as a minimum "instruction set" the following two operators which work on (obtained) constructs rather than on spots are introduced (in addition to common arithmetic and boolean operators):

    CARD collection-expression
        yields the cardinality of the obtained collection

    PROJ relational-table-expression ON list-of-attributes
        yields the projection of the obtained relational table (as a new relational table in the in-
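The reading of a type definition entry such as S-TUP as a boolean procedure can be sketched in Python. This is only an illustration under stated assumptions, not IMC or DABOL notation: tuples are modelled as dicts, component types as plain sets, and all enumerated values other than S1, SMITH, 20 and LONDON are invented for the example.

```python
# Hypothetical enumerations for the component types. Only S1, SMITH, 20 and
# LONDON come from the text; the remaining members are assumptions.
SUPPLIERNO = {"S1", "S2", "S3"}
SUPPLIERNAME = {"SMITH", "JONES", "BLAKE"}
STATUS = {10, 20, 30}
CITYNAME = {"LONDON", "PARIS", "ATHENS"}

# Component name -> component type, mirroring the four lines of the S-TUP entry.
S_TUP = {
    "SNO": SUPPLIERNO,
    "SNAME": SUPPLIERNAME,
    "STATUS": STATUS,
    "CITY": CITYNAME,
}


def is_s_tup(occurrence: dict) -> bool:
    """Run the S-TUP entry against a pretended occurrence; yields yes or no."""
    # By convention, no further component may be present.
    if set(occurrence) != set(S_TUP):
        return False
    # C OC.<name> EL <type> AND ... for every component.
    return all(occurrence[name] in allowed for name, allowed in S_TUP.items())
```

Calling `is_s_tup({"SNO": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"})` corresponds to the check described for the tuple of Fig. 1.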
Let the relational database of Fig. 5 be a database of type R-DATABASE. Apart from its composition from the three tables of the types S-TAB, P-TAB and SP-TAB the following consistency entry specifies that the set of values appearing in the column SNO (resp. PNO) within the table SP is required to be a subset of the values appearing in column SNO (resp. PNO) within S (resp. P):

    R-DATABASE : type of DATABASE such that
        C S EL S-TAB AND
        C P EL P-TAB AND
        C SP EL SP-TAB AND
        COLLEC SP..SNO SUBEQ COLLEC S..SNO AND
        COLLEC SP..PNO SUBEQ COLLEC P..PNO

The specification of the latter two constraints is tied to the database type. This is the correct level for an interrelational consistency specification (see observation made in [9], p. 399, example 9). Remember that an empty string between dots is equivalent to TRUE, i.e. selects all candidate spots for further consideration.

The consistency entries for the three types of pretype RELTABLE have to specify their composition (i.e. each table is a subset of the component tuple type) and an additional constraint on the key columns (which resembles the technique of SEQUEL as presented in [9], p. 401):

    S-TAB : type of RELTABLE such that
        C OC SUB S-TUP AND
        CARD PROJ C OC ON SNO = CARD C OC

The last expression asserts that the number of distinct SNO values in any occurrence of S-TAB is the same as the number of its tuples, in other words, SNO is key attribute in occurrences of type S-TAB.
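The key condition (CARD PROJ ... = CARD ...) and the subset conditions (COLLEC ... SUBEQ COLLEC ...) can be paraphrased in a few lines of Python. The sketch below is an assumption-laden illustration: a table is modelled as a list of dicts without duplicate rows, the function names are hypothetical, and the compositional checks against the tuple types are left out.

```python
def card(collection):
    """CARD: cardinality of an obtained collection."""
    return len(collection)


def proj(table, attributes):
    """PROJ table ON attributes: the distinct combinations of attribute values."""
    return {tuple(row[a] for a in attributes) for row in table}


def collec(table, attribute):
    """COLLEC table..attribute: the set of values appearing in one column."""
    return {row[attribute] for row in table}


def has_key(table, attributes):
    """CARD PROJ C OC ON attributes = CARD C OC."""
    return card(proj(table, attributes)) == card(table)


def is_r_database(db):
    """Key conditions of the three tables plus the two SUBEQ conditions."""
    return (
        has_key(db["S"], ["SNO"])
        and has_key(db["P"], ["PNO"])
        and has_key(db["SP"], ["SNO", "PNO"])
        and collec(db["SP"], "SNO") <= collec(db["S"], "SNO")
        and collec(db["SP"], "PNO") <= collec(db["P"], "PNO")
    )
```

Python's `<=` on sets is the subset-or-equal test, which matches SUBEQ; the strict `<` would correspond to SUB.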
    P-TAB : type of RELTABLE such that
        C OC SUB P-TUP AND
        CARD PROJ C OC ON PNO = CARD C OC

    SP-TAB : type of RELTABLE such that
        C OC SUB SP-TUP AND
        CARD PROJ C OC ON SNO, PNO = CARD C OC

If the "key specifications" (last line in each entry) were only to hold for table occurrences under the names S, P or SP rather than for occurrences of S-TAB, P-TAB or SP-TAB, respectively, they would have to be placed at database type level:

    CARD PROJ C OC.S ON SNO = CARD C OC.S
    CARD PROJ C OC.P ON PNO = CARD C OC.P
    CARD PROJ C OC.SP ON SNO, PNO = CARD C OC.SP

The conceptual distinctness of these two assertion levels is in general not considered in the literature due to the lack of distinction e.g. between a relation type (= specified set of relations, sometimes called "a time-varying relation") with tuples of type S and a relation occurrence under the name S having tuples of type S.

Another issue in the context of table (relation) consistency is functional dependency. Let us suppose that for the table type SP-TAB no key attributes are specified, but a functional dependency of QTY on SNO and PNO, instead. This example applies the universal quantifier (operator on spots)

    FOR EACH spot-address (boolean-construct-expression)

within which the spot identifier CS denotes the current spot of the loop. To some extent this concept resembles the concept of the current value of a "tuple variable". Now we paraphrase the functional dependency by the assertion that the number of distinct QTY values for a given SNO/PNO value combination is one (see System R example in [9], p. 402), which finally results in the consistency entry

    SP-TAB : type of RELTABLE such that
        C OC SUB SP-TUP AND
        FOR EACH OC.TRUE
            (CARD COLLEC OC .C SNO = C CS.SNO AND C PNO = C CS.PNO .QTY = 1).

It is obvious that "at least" and "at most" assertions to define a lower and an upper bound may use the same technique. For the tuple and item level we only sketch the consistency specifications:

    S-TUP : type of TUPLE such that
        C SNO EL SNO-ITEM AND
        etc (for each component)

    P-TUP : type of TUPLE such that
        C PNO EL PNO-ITEM AND
        etc (for each component)

    SP-TUP : type of TUPLE such that
        C SNO EL SNO-ITEM AND
        C PNO EL PNO-ITEM AND
        C QTY / 100 EL INTEGER AND
        C QTY >= 100 AND C QTY <= 5000.

The last two lines specify two bounds for QTY values and require in addition that any QTY value is an integer multiple of 100.

    SNO-ITEM : type of ITEM such that ...
    PNO-ITEM : type of ITEM such that ...
    etc (for each user defined item type).

Supposing NW-DATABASE to be the designation of the type of database, of which Fig. 10 represents an occurrence, the corresponding consistency entries would be

    NW-DATABASE : type of DATABASE such that
        C S-SP EL S-SP-TAB AND
        C P-SP EL P-SP-TAB AND
        COLLEC S-SP..MEMBERS.TRUE = COLLEC P-SP..MEMBERS.TRUE

    S-SP-TAB : type of LIGTABLE such that
        C OC SUB S-SP-LIG AND
        CARD COLLEC OC..OWNER = CARD COLLEC OC..OWNER.SNO

    P-SP-TAB : type of LIGTABLE such that
        C OC SUB P-SP-LIG AND
        CARD COLLEC OC..OWNER = CARD COLLEC OC..OWNER.PNO
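The functional dependency of QTY on SNO and PNO, and the QTY conditions of SP-TUP, can be paraphrased as follows. This is a hedged Python sketch, assuming a table is a list of dicts; the loop variable `cs` stands in for the current spot CS, and the function names are hypothetical.

```python
def qty_depends_on_sno_pno(sp_table):
    """For each tuple, exactly one distinct QTY value exists for its SNO/PNO pair."""
    for cs in sp_table:  # FOR EACH OC.TRUE, cs plays the role of the current spot
        qty_values = {
            xs["QTY"]
            for xs in sp_table
            if xs["SNO"] == cs["SNO"] and xs["PNO"] == cs["PNO"]
        }
        if len(qty_values) != 1:  # CARD COLLEC ... .QTY = 1
            return False
    return True


def qty_is_valid(qty):
    """C QTY / 100 EL INTEGER AND C QTY >= 100 AND C QTY <= 5000."""
    return isinstance(qty, int) and qty % 100 == 0 and 100 <= qty <= 5000
```

An "at most" or "at least" assertion would replace the comparison `!= 1` by the corresponding bound on the cardinality.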
In occurrences of the two ligational table types the SNO values and the PNO values, respectively, identify the ligations ("owner keys"). At the ligation level it is required that the SNO (or PNO) value of the owner is the same as in the corresponding column of the members:

    S-SP-LIG : type of LIGATION such that
        C OWNER EL S-TUP AND
        C MEMBERS EL SP-TAB AND
        FOR EACH MEMBERS..SNO
            (C CS = C OWNER.SNO)

    P-SP-LIG : type of LIGATION such that
        C OWNER EL P-TUP AND
        C MEMBERS EL SP-TAB AND
        FOR EACH MEMBERS..PNO
            (C CS = C OWNER.PNO).

Specifications like these have a counterpart in the CODASYL DDL source-clause [11], which would appear in the member type entry (a conceptually incorrect place), e.g.

    RECORD NAME IS SP-TUP
    ....
    02 SNO SOURCE IS SNO OF OWNER OF S-SP-LIG

The missing consistency entries (for types of relational tables, tuples and items) are the same as in the previous relational database example.

Although it is not intended to discuss language aspects in this paper, some remarks seem to be appropriate. Consistency entries are discussed here from the point of view of spot address application. Thus the "simplicity aspect" has been neglected to a certain extent. The author is sharing the opinion that for simple and/or frequently used assertions a simpler "method" of definition, i.e. a shorthand notation for the full expressions (e.g. specification of string length) would probably prove more suitable (see [9], 2.4.2), which, by the way, has been adopted for the recently developed DABOL interface [2, 1], too.

On the other hand, not even the functional aspects of consistency specification have been discussed here in an exhaustive way. Among them are issues like ordering, automatic "correction" on consistency violation (see triggers of System R or request modification of INGRES), deferred enforcement of consistency (transactions), etc.

Finally, for the reader who is interested in a more detailed evaluation of the proposed technique, a short contrasting presentation of examples taken from a recent paper of the "abstract data type" approach [12] is given. The referred proposal aims at demonstrating that the "operational" concept of data abstraction can be applied with good profit to databases as well. To give an idea of the relationship between this "operational" approach and the present ("static"?) one the "specification of individual types and further interdependencies" (Sections 4 and 5 of [12]) is given in our terms.

The central construct type considered is the tuple type with type designation PERSON (example 1 in [12]):

    PERSON : type of TUPLE such that
        LENGTH C NAME >= 2 AND LENGTH C NAME <= 50 AND
        C AGE EL INTEGER AND
        C AGE >= 0 AND C AGE <= 150 AND
        C PERSONAL-STATUS EL {SINGLE; MARRIED; WIDOWED; DIVORCED} AND
        C TAX-STATUS EL {1; 2; 3; 4; 5}

The relation between the names of the components and the components themselves is implicitly given by the above consistency specification and needs no further elaboration. The "modes" [12] like STRING and INTEGER are considered pretypes (built-in, e.g. drawn from the host-system of the interface). If we want to define a type VOTING-AGE-PERSON as a subset of type PERSON (example 2 in [12]), we specify

    VOTING-AGE-PERSON : type of PERSON such that
        C AGE >= 18

The specification of "ordered pairs" in order to model a WORKS-IN relation on the "elementary sorts" PERSON and DEPARTMENT (example 3 in [12]) is nothing else than a consistency specification for a tuple type of two aggregation levels:

    WORKS-IN : type of TUPLE such that
        C PERSON EL PERSON AND
        C DEPARTMENT EL DEPARTMENT AND
        C PERSON.DEPT-NR = C DEPARTMENT.DEPT-NR

supposing an appropriate specification of the types PERSON and DEPARTMENT.

Using the same technique a collection type GROUP may be defined, where the components are of type PERSON and no two persons have the same education (a component EDUCATION is assumed).
    GROUP : type of COLLECTION such that
        C OC SUB PERSON AND
        CARD C OC = CARD COLLEC OC..EDUCATION

(or, using the universal quantifier:)

        FOR EACH OC.TRUE /* for each tuple spot */
            (NOT C CS.EDUCATION EL COLLEC OC .NOT C XS = C CS .EDUCATION)

An equivalent specification of course could be given using the existential quantifier "EXIST spot-address". In example 5 of [12] the same type of collection is specified as the set of all groups (C OC SUB PERSON) without those groups, where duplicates of EDUCATION occur.

Many examples for individual type specification as well as some of those, where "interdependent events" ([12], Section 5) are involved, fall within the "static" consistency specification: An enterprise is considered, that produces an article only if ordered by a customer. Let two relational tables ON-ORDER and IN-PRODUCTION model the set of articles on order and in production, respectively. The restriction "only if ordered" appears as a consistency specification on the database level ("inter-relation condition"):

    ... : type of DATABASE such that
        C ON-ORDER EL ORDER-TAB AND
        C IN-PRODUCTION EL PRODUCTION-TAB AND
        C IN-PRODUCTION SUBEQ C ON-ORDER.

Another example of "operational" specification is outlined in [12], where two tables for female persons and for male persons, respectively, are related the following way: Whenever the personal status of a person belonging to one of the two sets changes from SINGLE to MARRIED, the status of a person from the other set must change likewise. If records of married persons may neither enter nor leave the two tables, the following consistency specification would suffice:

    ... : type of DATABASE such that
        C FP EL FEMALE-TAB AND
        C MP EL MALE-TAB AND
        CARD COLLEC FP.C PERSONAL-STATUS = MARRIED =
        CARD COLLEC MP.C PERSONAL-STATUS = MARRIED.

Otherwise, a technique for transition type specification as presented in the next section (see last example there) has to be applied or a general trigger specification technique has to be adopted which would be based on an appropriate event classification.

6. TRANSITION TYPE DEFINITION IN A CONCEPTUAL SCHEMA

Whereas consistency specifications define the properties of consistent constructs, in the last instance the properties of consistent (states of) databases, another type of specifications is needed to define allowed transitions of the database or, in other words, to define the restrictions on the set of transitions from a consistent database to any other consistent database. Sometimes the latter specifications are referred to as "dynamic constraints" as opposed to the so-called "static constraints", which have been discussed in the preceding section. The attribute "static", however, must not be misunderstood in the sense that no operation need be executed in order to check the consistency specification. In [7] it has been outlined, that both consistency specification and transition type specification may be viewed, in a more adequate manner, as pertaining to a general concept, which has been called coherence. Those coherence specifications which establish dependencies between an "old" and a successive "new" database, are called persistency specifications. The term "persistency" alludes to the "inertia" of a database against modification: Not any but only one of those modifications may be executed which effects a persistent transition depending on the preceding database.

The theoretical problem with persistency lies in the need for comparing something "old" with a corresponding "new" one in two successive databases. Within the scope of this paper the meaning of "corresponding" must be left to the intuition, maybe inspired by these two examples:

(1) Consider the transition from a file with name F in a database (old database) to a modified file with the same name in the modified database (new database). In general, the spot of file F in the new database (DB.F) will be viewed as corresponding to the spot of file F in the old database (DB.F, too). This might be so because of the identity of the file name. In a slightly incorrect way of speaking the two files are sometimes referred to as the
"same" file. How would correspondence be affected by an alteration of the file name?

(2) Consider the transition from a record in a file of the old database to the modified record in the new database. As far as the primary key has not been altered one would consider the record spot in the new database corresponding to the record spot in the old database. Is it the "same" record?

In order to facilitate precise discussion of the persistency issue we introduce the concept of place as a sequence of spots, such that of two successive spots the second spot (postspot) corresponds to the first spot (prespot), which implies that the ("new") database with the postspot has been obtained by modification of the ("old") database with the prespot (see Fig. 11). Actually, the concept of place involves the intention of the user when modifying a database, because it is the user who has the desired continuity (correspondence) of his database in mind or, in other words, who calls something which has undergone modification "the same". This makes a general treatment of the place concept quite difficult. On the other hand, this very concept allows for a precise interpretation of everyday wordings like "the same file/record" (= the file/record at the same place). By definition, the old database and the new database are considered to be at the same place, i.e. the ownspot of the new database always corresponds to the ownspot of the old database. As a rule, correspondence is not left to the single user, but built into a database management interface in order to meet a general (community) understanding of continuity. Therefore we may start from a given correspondence and discuss persistency in terms of places.
Suppose a user requiring that to the file with name F at most two records may be added at once (between two successive consistency checks). In the case of F being an immediate component of the database the specification

    CARD C POSTDB.F - CARD C PREDB.F <= 2

would be an evident and sufficiently precise persistency specification. The spot identifiers

    PREDB   for the ownspot of the old database
    POSTDB  for the ownspot of the new database

provide a linking of two successive databases. Note that by definition any persistency check virtually involves both the old database and the new database.

As a more sophisticated example imagine the constraint that the quantity shipped (value of QTY) in an existing tuple of the shipment file may only be incremented. Generalizing the utilization of the prefixes PRE and POST we can apply them in the present case for linking two successive corresponding spots where (both in the old and in the new database) a value of QTY appears in the table SP:

    FOR EACH DB.SP..QTY
        (C PRECS <= C POSTCS)

Here PRE and POST have been prefixed to the spot identifier CS (current spot in the FOR EACH loop). The prefixes PRE and POST are only applicable to spot identifiers pointing to spots of a place where transition will be monitored. Such a place we call a transition
[Fig. 11. Spots and places in a sequence of database states. Labels in the figure: spot, prespot, postspot, corresponding spots, place (sequence of corresponding spots).]
place. So far the spot identifiers DB, XS (Section 4) and CS (Section 5) belong to the category of prefixable spot identifiers, but OC (Section 5) does not. The meaning of the two prefixes is quite obvious:

    PRE   refers to the prespot, i.e. the first spot of two successive spots of a considered transition place (predecessor of a spot according to an established correspondence)
    POST  refers to the postspot, i.e. the second spot of two successive spots of a considered transition place (successor of a spot according to an established correspondence)

The above example of increasing shipped quantities may be formulated by two further but equivalent expressions, namely from the point of view of the old database

    FOR EACH PREDB.SP..QTY
        (C CS <= C POSTCS)

and from the point of view of the new database

    FOR EACH POSTDB.SP..QTY
        (C PRECS <= C CS)

Whereas these two persistency specifications make reference to the database where the spot address of the FOR EACH expression is to be evaluated, the other (first) persistency specification given above does not. Thus it has to hold both for the old and for the new database. The consequences of such a requirement have not yet been investigated sufficiently, so we do not adopt the above "symmetric" specification technique.

Sometimes, starting with persistency specification from the database level may be quite a clumsy technique, although it is the most precise way to model real world persistency requirements ("transitional constraints"). It has turned out that many practical persistency problems can be solved adopting a simplification as far as the level of persistency specification is concerned. If persistency is tied to consistency in the sense that transition constraints are established for type occurrences, we gain a simpler specification at the price of a (maybe troublesome) extension of the scope of this specification: At each place in an evolving database sequence, where an occurrence of the type appears, a persistency check will be performed.

Suppose the above persistency specification for QTY values in tuples of type SP-TUP tied to the consistency specification of this type. This yields the following overall coherence entry (full notation with the spot identifier OC is used here for didactic reasons):

    SP-TUP : type of TUPLE such that
        C OC.SNO EL SNO-ITEM AND
        C OC.PNO EL PNO-ITEM AND
        C OC.QTY / 100 EL INTEGER AND
        C OC.QTY >= 100 AND C OC.QTY <= 5000
    transition such that
        C PRESPT.QTY <= C POSTSPT.QTY.

A new spot identifier has been introduced for use in type bound persistency specification:

    SPT  for the Spot of a Place of Transition, where an occurrence of the type appears

The spot identifier SPT is a prefixable spot identifier. Note that SPT does not identify the ownspot of a type occurrence (see OC), but the spot where the occurrence appears within the database.

Remark. In SEQUEL the reserved keywords OLD and NEW stand for our terms "C PRESPT." and "C POSTSPT.", respectively, and are used for persistency specification on the tuple level only.

Besides the above intermediate persistency specification, which will be evaluated during the "lifetime" of a place (i.e. as long as correspondence is stated), reality modeling calls for an initial and a final persistency specification, too. The initial persistency specification defines the condition which has to hold "at first", i.e. when a spot without corresponding predecessor is to appear or, in other words, when a place comes into existence (see Fig. 11). As an example suppose a registration file, where records for persons born in the considered city may only enter, if the personal status is single. On the other hand, the final persistency specification defines the condition which has to hold "at last", i.e. when a spot without corresponding postdecessor (successor) is to disappear or, in other words, when a place discontinues its existence (see Fig. 11). As an example suppose the constraint in the registration file, that a record of a taxpayer may only be removed, if the balance is not negative.

These considerations suggest a tripartite persistency entry which can be paraphrased as

    for each transition at a place
    with an occurrence of the pertinent type holds:
        at first    meets initial persistency specification,
        in between  meets intermediate persistency specification,
        at last     meets final persistency specification.

To specify initial and final persistency we use the spot identifier SPT without prefix, because it identifies either
the first spot of a place of transition (initial persistency) or the last one (final persistency). As we have already indicated by examples, boolean expressions (and also predicate calculus expressions) are used for persistency specification.

As a concluding demonstration of the utilization of spot addressing in coherence entries (combination of consistency and persistency entry) consider the following two examples. Suppose we want to define that all ligations in a database of the type NW-DATABASE (see previous section) may only enter with an empty list of shipments, and shipment tuples may only be inserted and updated but never be removed from a ligation. The coherence entry for ligations of type S-SP-LIG would be

    S-SP-LIG : type of LIGATION such that
        C OWNER EL S-TUP AND
        C MEMBERS EL SP-TAB AND
        FOR EACH MEMBERS..SNO
            (C CS = C OWNER.SNO)
    transition such that
        /* ligation enters without shipment */
        at first    C SPT.MEMBERS = EMPTY
        /* shipments are only added */
        in between  PROJ C PRESPT.MEMBERS ON SNO, PNO SUBEQ
                    PROJ C POSTSPT.MEMBERS ON SNO, PNO
        /* no constraint for removing a ligation */
        at last     TRUE

The other example refers to the mentioned registration file, for the tuples of which we want to impose the following persistency specification: Initial and final persistency as above, two intermediate persistency rules, viz. transition of personal status only single to married, married to widowed, married to divorced, widowed to married, or divorced to married, and transition of salary such that the new value is greater than the old value. The coherence entry (with the consistency specification only sketched) is the following:

    PERSON : type of TUPLE such that
        C NAME EL STRING AND
        C PERSTAT EL {SINGLE; MARRIED; DIVORCED; WIDOWED} AND
        C ORIGIN EL {HERE; FOREIGN} AND
        C TAX-BALANCE EL INTEGER AND
        C SALARY EL REAL AND
        ... etc.
    transition such that
        /* born here implies single */
        at first    C SPT.ORIGIN = HERE IMPL C SPT.PERSTAT = SINGLE
        /* only legal modifications of personal status */
        in between  C PRESPT.PERSTAT = SINGLE IMPL
                        C POSTSPT.PERSTAT EL {SINGLE; MARRIED} AND
                    C PRESPT.PERSTAT = MARRIED IMPL
                        C POSTSPT.PERSTAT EL {MARRIED; WIDOWED; DIVORCED} AND
                    C PRESPT.PERSTAT = WIDOWED IMPL
                        C POSTSPT.PERSTAT EL {WIDOWED; MARRIED} AND
                    C PRESPT.PERSTAT = DIVORCED IMPL
                        C POSTSPT.PERSTAT EL {DIVORCED; MARRIED} AND
                    /* not decreasing salary */
                    C PRESPT.SALARY <= C POSTSPT.SALARY
        /* remove tuple only with settled tax debts */
        at last     C SPT.TAX-BALANCE >= 0

Referring to the last example of Section 5 a specification of a persistency rule is given which includes no "static" constraint but allows for free entering of records of already married persons into the files FP and/or MP. The persistency specification (at database level) requires that a change from SINGLE to MARRIED in one tuple of a file has to have a corresponding transition in one tuple of the other file:

    transition such that
        in between
            COUNT POSTDB.FP .(C XS.PERSONAL-STATUS = MARRIED AND
                              C PREXS.PERSONAL-STATUS = SINGLE)
            =
            COUNT POSTDB.MP .(C XS.PERSONAL-STATUS = MARRIED AND
                              C PREXS.PERSONAL-STATUS = SINGLE)
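The persistency checks of this section compare an old and a new database state. A minimal Python sketch follows, assuming that correspondence between spots is established by an unchanged key (as in the record example given earlier), that tables are lists of dicts, and that the key attribute `ID` for person tuples is hypothetical. The function names are illustrative and not part of DABOL.

```python
def index_by_key(table, key):
    """Map each key value combination to its tuple (stand-in for a place)."""
    return {tuple(row[k] for k in key): row for row in table}


def at_most_two_added(pre_f, post_f):
    """CARD C POSTDB.F - CARD C PREDB.F <= 2."""
    return len(post_f) - len(pre_f) <= 2


def qty_only_incremented(pre_sp, post_sp, key=("SNO", "PNO")):
    """FOR EACH POSTDB.SP..QTY (C PRECS <= C CS), for spots that have a prespot."""
    pre = index_by_key(pre_sp, key)
    for k, post_row in index_by_key(post_sp, key).items():
        pre_row = pre.get(k)
        if pre_row is not None and pre_row["QTY"] > post_row["QTY"]:
            return False
    return True


def newly_married(pre_table, post_table, key=("ID",)):
    """COUNT of spots that are MARRIED now and were SINGLE at the prespot."""
    pre = index_by_key(pre_table, key)
    return sum(
        1
        for k, row in index_by_key(post_table, key).items()
        if row["PERSONAL-STATUS"] == "MARRIED"
        and k in pre
        and pre[k]["PERSONAL-STATUS"] == "SINGLE"
    )


def marriages_balanced(pre_fp, post_fp, pre_mp, post_mp):
    """The two COUNT expressions of the FP/MP example must agree."""
    return newly_married(pre_fp, post_fp) == newly_married(pre_mp, post_mp)
```

Tuples that enter already married have no prespot and are therefore not counted, which matches the intent that such records may enter freely.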
The prefix PRE applied to the spot identifier XS links the examination spot (in the new database) to the spot of the same place in the old database. The operator COUNT applied to a set of addressed spots yields the cardinality of the spot set.

7. CONCLUSIONS

The distinction between a construct as an abstract data object and the spot or spots where a construct appears in a given context is one of the fundamental issues of the IMC approach. IMC (Information Management Concepts) has been developed as a conceptual tool for the precise and adequate description of database management features rather than as a proposal for a data model itself. On the basis of this system of concepts, in particular the concept of spot, a "logical" addressing technique has been developed, which relies on a top-down stepping algorithm in combination with executable selection criteria to identify a particular spot, that is, to point to a construct in a considered context.

The stepwise selection process is specified by a so-called spot address, which contains a spot selection criterion for each spot level to be passed at evaluation time. A spot selection criterion is, apart from a small set of standard spot identifiers, an expression of operators on addressed spots and operators on obtained constructs. This results in quite a powerful operational spot selection facility which has been demonstrated in this paper for data type definition or, using the IMC term, for consistency specification.

Starting from the spot concept and assuming an application oriented correspondence between spots in evolving databases (defined by or imputed to the user), the IMC concept of place has been presented as a "dynamic extension" of the spot concept. This allows in a spot address to admit spot selection criteria which involve spots of two successive databases and makes the spot address available for the definition of database transition types or, in IMC terminology, for persistency specification.
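The top-down stepping through a spot address can be illustrated by a small Python sketch. It rests on several assumptions that are not part of IMC or DABOL: constructs are modelled as nested dicts (named components) and lists (unnamed components), each spot selection criterion is a function of a component's name and construct, and all function names as well as the sample values for the second supplier are invented.

```python
def components(construct):
    """Return (name, component) pairs one level beneath a construct."""
    if isinstance(construct, dict):
        return list(construct.items())
    if isinstance(construct, list):
        return [(None, item) for item in construct]
    return []  # an item has no components


def named(wanted):
    """Criterion that selects the component with the given name."""
    return lambda name, component: name == wanted


def where(predicate):
    """Criterion that selects components whose construct satisfies a predicate."""
    return lambda name, component: predicate(component)


def any_spot(name, component):
    """Empty criterion between dots: equivalent to TRUE."""
    return True


def evaluate(reference, spot_address):
    """Apply one selection criterion per spot level, starting at the reference."""
    spots = [reference]
    for criterion in spot_address:
        spots = [
            component
            for spot in spots
            for name, component in components(spot)
            if criterion(name, component)
        ]
    return spots


# Illustrative database occurrence; the BLAKE tuple values are assumptions.
DB = {
    "S": [
        {"SNO": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
        {"SNO": "S3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"},
    ]
}

# Roughly DB.S.(C SNAME = BLAKE).CITY
blake_city = evaluate(DB, [named("S"), where(lambda t: t["SNAME"] == "BLAKE"), named("CITY")])
# Roughly COLLEC DB.S..SNO
all_snos = evaluate(DB, [named("S"), any_spot, named("SNO")])
```

The design choice here is that a criterion sees both the name and the construct at a candidate spot, so that name selection and value-based selection share one stepping loop.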
Consistency and persistency specification are the crucial issues of conceptual schema definition as one of the tools for database design. Therefore it is an objective of this paper to propose an approach for consideration within the context of concepts and terminology for the Conceptual Schema as introduced in [13]. On these topics a Working Group of the International Organization for Standardization is currently at work [14].

Spot addressing, however, is not limited to coherence definition. In the development of this research project spot addressing was initially used with data manipulation in a narrow sense, i.e. insertion of a construct beneath a spot, substitution of a construct at a spot, and removal of a construct from a spot of the database. To give an idea we apply a (simplified) SUBSTITUTE command

    SUBSTITUTE type-indication-term AT spot-address BY construct-expression

against the relational database of Fig. 5 in order to model the new reality, that supplier Blake has moved to the city of supplier Smith:

    SUBSTITUTE ITEM AT DB.S.(C SNAME = BLAKE).CITY
        BY C DB.S.(C SNAME = SMITH).CITY

This very simple example is by no means representative of the application of spot addresses in data manipulation, but rather an indication of where it starts. Apart from manipulation of constructs, also navigation in databases, protection against interference on concurrent access and against unauthorized access may be defined in terms of spot addresses as well as input and output operations for which by analogy with the spot address field addresses serve to point to fields on physical media.

In the database operation language DABOL spot addresses are used in all these functional areas, using however two major extensions of the spot address (not presented in this paper, but relevant to conceptual schema definition, too). The one allows at each spot level to label the candidate spots in order to make possible a multiple scanning of the same spot set (see range variables in the relational approach and the remark on the RANGE statement as "a dynamically executable statement" in [9], 5.3.7). The other additional feature provides a means to step upwards from an identified spot for easy reference to the "surroundings" of this spot.

The present paper, however, does not deal with the DABOL interface and its specific data model, but aims at proposing nonspecific conceptual and algorithmic tools for use in uniform and neutral descriptions of database management interfaces. Application needs call for a certain variety of implemented database management approaches also in the future. To manage and to cope with such a variety (e.g. in teaching or in standardization) a common and uniform conceptual and terminological background for data structures, operations and other data management issues is urgently needed.
The present paper tries to contribute to this background and therefore intentionally goes into conceptual details of other relevant work sometimes more than would be necessary for a mere presentation of a new approach.

Acknowledgements-The author is indebted to his Brazilian colleagues in the Project MINIBAN for their valuable discussions which contributed to refine the concept of spot address. In addition the author gratefully acknowledges the influence which has been exerted on his work by a number of papers on database management. It is hardly possible to identify all these intellectual sources. Amongst them in particular [9, 12, 15-18] have inspired the author to improve more and more the power of spot addressing and the approach of coherence verification in order to cope with the consistency and persistency problems addressed by those papers.
REFERENCES

[1] G. Richter: Syntax of the database operation language DABOL. Int. Tech. Rep. IIG.79.214, GMD, St. Augustin (1979).
[2] G. Richter, J. Cunha Pereira F. and J. M. V. Castilho: Projeto MINIBAN-Relatório Final da Segunda Etapa. DIGIBRAS, Rio de Janeiro (1978) (In Portuguese).
[3] J. M. V. Castilho and G. Richter: Uma interface para sistemas de informação: LOBAN-linguagem de operação de banco de dados. In Anais do 11. Congresso Nacional de Processamento de Dados. SUCESU, Rio de Janeiro (1978) (In Portuguese).
[4] R. Durchholz and G. Richter: Concepts for data base management systems. In Data Base Management (Edited by J. W. Klimbie and K. L. Koffemann). North-Holland, Amsterdam (1974).
[5] G. Richter: On the relationship between information and data. In Data Base Systems. Lecture Notes in Computer Science (Edited by H. Hasselmeier and W. G. Spruth), Vol. 39. Springer-Verlag, Berlin (1976).
[6] R. Durchholz and G. Richter: Information management concepts (IMC) for use with DBMS interfaces. In Modelling in Data Base Management Systems (Edited by G. M. Nijssen). North-Holland, Amsterdam (1976).
[7] R. Durchholz: Types and related concepts. In Int. Comput. Symp. 1977 (Edited by E. Morlet and D. Ribbens). North-Holland, Amsterdam (1977).
[8] G. Richter: O sentido e o valor do banco de dados. Dados e Ideias 2(6) (June/July 1977). SERPRO, Rio de Janeiro (In Portuguese).
[9] C. J. Date: An Introduction to Database Systems (2nd Edn). Addison-Wesley, Reading, Mass. (1977).
[10] P. Chen: The entity-relationship model-toward a unified view of data. ACM Trans. on Database Systems 1(1), 9-36 (March 1976).
[11] Data description language committee of CODASYL: J. Development. Secretariat of the Canadian Government, Quebec (1978).
[12] P. C. Lockemann, H. C. Mayr, W. H. Weil and W. H. Wohlleber: Data abstractions for database systems. ACM Trans. on Database Systems 4(1), 60-75 (March 1979).
[13] ANSI/X3/SPARC: Interim report ANSI/X3/SPARC study group on data base management systems. CBEMA, Washington DC (1975). Reprinted in ACM SIGMOD Bull. 7(2) (1976).
[14] ISO/TC97/SC5/WG3: Document ISO/TC 97/SC 5 N 527: Report of ISO/TC97/SC5/WG3 (Data base management systems). ANSI (Secretariat ISO/TC 97), New York, NY (Nov. 1979).
[15] M. M. Astrahan et al.: System R: Relational approach to database management. ACM Trans. on Database Systems 1(2), 97-137 (June 1976).
[16] M. Stonebraker, E. Wong, P. Kreps and G. Held: The design and implementation of INGRES. ACM Trans. on Database Systems 1(3), 189-222 (Sept. 1976).
[17] S. Y. W. Su and A. Emam: CASDAL: CASSM'S DAta Language. ACM Trans. on Database Systems 3(1), 57-91 (March 1978).
[18] A. L. Furtado, K. C. Sevcik and C. S. dos Santos: Permitting updates through views of data bases. Information Systems 4(4), 269-283 (1979).
APPENDIX
In the following the language elements which have been introduced in the paper are listed (not necessarily a subset of the DABOL definition). As far as productions are given, they are not intended to constitute a closed syntax specification; the missing productions are left to the imagination of the reader. Metalinguistic comments are between -. and .-.
1. Spot set expression

spot-address ::= context-identifier [.spot-selection-criterion] | spot-selection-criterion [.spot-selection-criterion] ...
context-identifier ::= spot-identifier | prespot-identifier | postspot-identifier
spot-identifier ::= DB | XS | CS | SPT | OC
prespot-identifier ::= PREDB | PREXS | PRECS | PRESPT
postspot-identifier ::= POSTDB | POSTXS | POSTCS | POSTSPT
spot-selection-criterion ::= boolean-construct-expression

2. Operators on (addressed) spots

N spot-address   -.obtain name construct.-
C spot-address   -.obtain component construct.-
COLLEC spot-address   -.obtain collection of component constructs.-
COUNT spot-address   -.obtain natural number.-
FOR EACH spot-address (boolean-construct-expression)   -.obtain boolean construct (universal quantifier).-
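To make the intended use of these operators more tangible, the following Python sketch mimics COLLEC, COUNT and FOR EACH on a toy database. It is an illustration only and not part of DABOL: the class `Spot`, the function `address` and the sample data are hypothetical, and the reading of a dotted spot-address as "each criterion selects among the components of the spots selected so far" is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Spot:
    # Hypothetical model: a construct together with its context (its components).
    name: str
    value: object = None
    children: List["Spot"] = field(default_factory=list)


Criterion = Callable[[Spot], bool]


def address(context: Spot, *criteria: Criterion) -> List[Spot]:
    """Resolve context.criterion.criterion (assumed reading): each criterion
    selects among the components of the spots selected so far."""
    selected = [context]
    for criterion in criteria:
        selected = [c for s in selected for c in s.children if criterion(c)]
    return selected


def collec(spots: List[Spot]) -> list:
    """COLLEC: obtain the collection of component constructs (here: values)."""
    return [s.value for s in spots]


def count(spots: List[Spot]) -> int:
    """COUNT: obtain a natural number, the number of addressed spots."""
    return len(spots)


def for_each(spots: List[Spot], predicate: Criterion) -> bool:
    """FOR EACH: universal quantifier over the addressed spots."""
    return all(predicate(s) for s in spots)


# Toy database (invented data): two SUPPLIER constructs with CITY and STATUS.
db = Spot("DB", children=[
    Spot("SUPPLIER", children=[Spot("CITY", "Bonn"), Spot("STATUS", 20)]),
    Spot("SUPPLIER", children=[Spot("CITY", "Rio"), Spot("STATUS", 30)]),
])

is_supplier = lambda s: s.name == "SUPPLIER"
is_status = lambda s: s.name == "STATUS"

# Roughly "DB.SUPPLIER.STATUS" in the spirit of a spot-address.
statuses = address(db, is_supplier, is_status)
```

In this reading a schema entry such as "every supplier status is at least 20" becomes `for_each(statuses, lambda s: s.value >= 20)`, i.e. a boolean procedure run against the database.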
3. Operators on (obtained) constructs

CARD collection-expression
LENGTH string-expression
NOT boolean-construct-expression
PROJ relational-table-expression ON attribute [, attribute] ...
number-expression {+ | - | * | /} number-expression
number-expression {<= | < | >= | >} number-expression
construct-expression = construct-expression
construct-expression EL -.element.- construct-set-expression
boolean-construct-expression {AND | OR | IMPL} boolean-construct-expression
construct-set-expression {SUB -.subset, not equal.- | SUBEQ -.subset or equal.-} construct-set-expression

Remark: xxx-expression is an expression which yields a construct of pretype xxx, i.e. a collection, a relational table, a boolean construct (TRUE or FALSE), a number, a string, or a set of constructs in general.
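As a rough illustration of how some of these operators combine into a cross-consistency condition, the following Python sketch maps CARD, PROJ, EL, SUB, SUBEQ and IMPL onto ordinary set operations. The function names, the representation of a relational table as a list of dictionaries, and the SUPPLIER/SHIPMENT sample tables are assumptions made for this sketch; they are not taken from the DABOL definition.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Row = Dict[str, object]  # assumed representation of one tuple of a relational table


def card(collection) -> int:
    """CARD: cardinality of a collection."""
    return len(collection)


def proj(table: List[Row], *attributes: str) -> FrozenSet[Tuple]:
    """PROJ ... ON attribute [, attribute]: projection, duplicates removed."""
    return frozenset(tuple(row[a] for a in attributes) for row in table)


def el(construct, construct_set: Iterable) -> bool:
    """EL: element test."""
    return construct in set(construct_set)


def sub(a: Iterable, b: Iterable) -> bool:
    """SUB: subset, not equal (proper subset)."""
    return set(a) < set(b)


def subeq(a: Iterable, b: Iterable) -> bool:
    """SUBEQ: subset or equal."""
    return set(a) <= set(b)


def impl(a: bool, b: bool) -> bool:
    """IMPL: logical implication."""
    return (not a) or b


# Hypothetical sample tables, invented for this illustration.
supplier = [{"S#": "S1", "CITY": "Bonn"}, {"S#": "S2", "CITY": "Rio"}]
shipment = [{"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"}]

# Cross-consistency: every supplier number used in SHIPMENT occurs in SUPPLIER.
consistent = subeq(proj(shipment, "S#"), proj(supplier, "S#"))
```

Here `consistent` plays the role of the "finally boolean" result of a schema entry: the condition holds exactly when the projection of SHIPMENT on S# is a subset of the projection of SUPPLIER on S#.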
4. Input operator

construct-literal

Remark. A literal is considered a shorthand notation for an expression where an interpretation operator (say INT) is applied to the string represented in the source text. Thus the expression SUPPLIER is equivalent to INT “SUPPLIER” and yields the (abstract) construct in the intermediate zone, which has been obtained by the interpretation of the inscription between quotation marks.