Volume 14, number 3
INFORMATION PROCESSING LETTERS
16 May 1982
DEFINING DATABASE DYNAMICS WITH ATTRIBUTE GRAMMARS *
Dzenan RIDJANOVIC and Michael L. BRODIE
Department of Computer Science, University of Maryland, College Park, MD 20742, U.S.A.
Received 23 March 1981; revised version received 4 January 1982
Semantic data models, specification of database integrity constraints, context-free and attribute grammars, conceptual modelling
1. Introduction
In the programming language area several formalisms are widely used for the precise definition of the syntax and semantics of programming languages and programs. In the database area the specification of the effects of operations on a database, with many associated integrity constraints, is still an important open issue. This issue, which addresses database dynamics, has been informally called database semantics. One approach to the specification of database dynamics, borrowed from programming languages, is to describe database semantics by associated transactions (programs). Another approach, which has just begun to receive attention in programming languages, is to use integrity constraints which are not input-output specifications for programs, but rather specifications for constraints on data [14]. Due to the growing complexity and importance of database applications, the pragmatic database community requires precise definitions of database dynamics without the complexity of existing formal methods [4]. This paper proposes the use of attribute grammars for the simple, yet precise, specification of database integrity constraints.

2. Semantic data models
The relational data model (RDM) [5] presents powerful primitives for the representation of structural properties of database applications. These properties address database statics, which corresponds to the syntax of a programming language. Entities can be represented directly in relations, which imply fewer implementation details than, say, a list structure. Relationships can be represented explicitly in relations or implicitly by means of foreign keys. However, the RDM provides no direct means of defining and maintaining the properties of relationships. The RDM, and indeed most data models, provide low-level manipulation primitives (e.g., insert, update, delete) and inadequate means for composing application-oriented operations from the primitives. Compare the application properties that can be represented in the relation
with the properties represented by the operation
insert(482, RITZ, 4340, co1ID, 010682, 010982).
The insert operation requires more meaning to be expressed depending on the context in which it is used. This meaning is expressed through integrity constraints associated with application-oriented operations such as make-reservation, transfer-reservation and confirm-reservation.

* This work is supported, in part, by the National Science Foundation under grant number MCS 77-22509.
0020-0190/82/0000-0000/$02.75 © 1982 North-Holland

Semantic data models (SDMs) attempt to extend
the ability of data models to represent static and dynamic properties of database applications precisely and abstractly. The semantic hierarchy model (SHM) [13] adds to the RDM three forms of abstraction: classification, aggregation and generalization, with which to represent relationships. The abstractions can be considered structural composition rules. Classification is a form of abstraction in which a class is defined as a set of elements. This is the instance-of relationship, as in the relationship between a type and its instances. Aggregation is a form of abstraction in which a relationship between component classes is considered as a higher level aggregate class. This is the part-of relationship as used in semantic networks. Generalization is a form of abstraction in which a relationship between category classes is considered as a higher level generic class. This is the is-a relationship, also from semantic networks. Figs. 1, 2 and 3 illustrate the three abstractions. (Capital letters denote abstractions and small letters identify class elements.)

Fig. 1. Classification example. Individual persons are instances of the PERSON class (instance-of relationship).

Fig. 2. Aggregation example. RESERVATION is an aggregate of HOTEL and PERSON; HOTEL is part of RESERVATION (part-of relationship).

Fig. 3. Generalization example. EMPLOYEE is the generic class of MANAGER, SECRETARY and CHAMBER-MAID; MANAGER is an EMPLOYEE (is-a relationship).

The extended semantic hierarchy model (SHM+) [2] adds both structural and behavioural abstractions for modelling to SHM. In the process of abstraction, details relevant to the problem at hand are emphasized while less relevant details are ignored. Another view of abstraction is the establishment of a one-to-many relationship. Classification relates one class to many elements (instances) of the class. Aggregation relates an aggregate class to many component classes. Generalization relates a generic class to many category classes. The three forms of abstraction do not provide a means of representing the natural set relationship amongst classes, namely, that a class is composed of a set of members of another class. Clearly, the properties of a set differ from those of its members. For example, an employee may have properties name, sex, salary and department while employees (set of employee) may have properties group-name, number-of-males, number-of-females and average-salary. The member-of relationship between classes is absent in the above abstractions; it is treated implicitly and is modelled using classification. SHM+ introduces a fourth form of abstraction called association to treat sets explicitly at the class level [3]. Association is a form of abstraction in which a set of member elements of the member class is considered as a higher level associate element of a set class. This is the member-of relationship. Association is illustrated in Fig. 4.

Fig. 4. Association example. HOTEL is an aggregate of NAME, ADDRESS and EMPLOYEES. One instance (member element) of the EMPLOYEE member class is a member of the instance of the EMPLOYEES set class (member-of relationship).

In [6] it is suggested that data and control structures are designed using the same principles. In [7,8] it is shown that Cartesian product, discriminated union, and sequence data structuring methods correspond respectively to functional composition, choice, and iteration operation constructs. This simple and appealing idea is fundamental to the ‘structured programming’ area (e.g., see [10,12]). This paper suggests a relationship between these important concepts and heretofore distinct concepts in the database area, thereby establishing a correspondence between structured programming and database design using semantic data models. Aggregation, generalization and association provide means of organizing behaviour as well as data. For example, an operation on RESERVATION, say make-reservation, can be represented as a sequence (functional composition) of operations on HOTEL and PERSON; an operation on EMPLOYEE, say hire-employee, can be represented as a case statement (general choice construct) on MANAGER, SECRETARY and CHAMBER-MAID; and an operation on EMPLOYEES, say raise-salary, can be represented as a set of operations (general iteration construct), e.g., for each E in EMPLOYEES raise-salary(E).

3. A context-free grammar for database statics

In programming languages, context-free grammars (CFGs) are used to define syntax, and attribute grammars (AGs), based on CFGs, are used to define semantics. The main idea here is to use AGs to define database dynamics (integrity constraints). However, CFGs are used first to define database statics. In particular, CFG production rules are used for the specification of SHM+ abstractions (composition rules). A CFG is usually denoted as G = (V,T,P,S) [9]. V and T are disjoint finite sets of variables and terminals, respectively. P is a finite set of productions. Each production rule is of the form A ::= α, where A is a variable and α is a string of symbols from (V ∪ T)* (* is the Kleene star). Finally, S is a special variable called the start symbol. A structural part of the SHM+ is precisely denoted by M = (C,S,A,D). Thus C,S,A,D in SHM+ correspond to V,T,P,S, respectively, in CFG. C is a finite set of classes. There are two basic kinds of classes - composite and simple. Classes are simple if they are not defined in terms of other classes. S is a finite set of simple class elements. We assume that C and S are disjoint. A is a set with four different kinds of composition rules - one for each form of abstraction in SHM+. D is a special database class. In Fig. 5(a) the structural part of the HOTEL RESERVATION database is presented and in Fig. 5(b)
Fig. 5(a). HOTEL RESERVATION abstractions.

HOTEL RESERVATION = (C,S,A,D)
C = {RESERVATIONS, EMPLOYEES, PERSON, RESERVATION, EMPLOYEE, MANAGER, HOTEL, SECRETARY, NAME, ADDRESS, CHAMBER-MAID}
S = {..., Astoria, Hilton, New York, Detroit, Mary Brown, John Green, Peter Welsh, ...}
D = RESERVATIONS
A consists of the following:
  RESERVATIONS ::= RESERVATION
  RESERVATION  ::= HOTEL PERSON
  HOTEL        ::= NAME ADDRESS EMPLOYEES
  EMPLOYEES    ::= EMPLOYEE
  EMPLOYEE     ::= MANAGER | SECRETARY | CHAMBER-MAID
  PERSON       ::= ...
  NAME         ::= Astoria | Hilton
  ADDRESS      ::= New York | Detroit
  MANAGER      ::= Mary Brown | John Green | Peter Welsh
  etc.

Fig. 5(b). Grammar for HOTEL RESERVATION structure.
Table 1

CFG                        SHM+
particular CFG:            particular SHM+ structural model (or database schema):
G = (V,T,P,S)              M = (C,S,A,D)
language L defined by G    data universe DV defined by M
program PR of L            database DB of DV
its corresponding grammar. HOTEL RESERVATION can be considered as the database schema name, where a schema corresponds to the structural part of the SHM+. In the programming language area a CFG is used to define a programming language, i.e., to generate all programs of that language. In the database area, the structural part of SHM+, denoted by M = (C,S,A,D), is used to define a data universe, i.e., to generate all possible databases of that universe. Thus, a data universe corresponds to a programming language and a database corresponds to a program. A derivation tree of a program is a proof that the program is syntactically correct, and a derivation tree of a database is a proof that the database is a part of the defined data universe, i.e., satisfies the schema. The relationship between the structural properties of the SHM+ and the CFG is represented in Table 1.
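The idea that a derivation proves membership in the data universe can be made concrete with a small sketch. The Python below is a minimal illustration and not the authors' notation: the rule classes (Classification, Aggregation, Generalization, Association), the function derives, the nested dict/list/tuple encoding of a database, and the sample simple class elements are all hypothetical choices made here. Association is read as "a set of members of the member class", following the definition given in Section 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:      # instance-of: class ::= element | element | ...
    elements: frozenset


@dataclass(frozen=True)
class Aggregation:         # part-of: class ::= component component ...
    components: tuple


@dataclass(frozen=True)
class Generalization:      # is-a: class ::= category | category | ...
    categories: tuple


@dataclass(frozen=True)
class Association:         # member-of: a set of members of another class
    member: str


# Hypothetical composition rules modelled on the HOTEL RESERVATION example.
# The simple class elements are placeholders, not taken from the paper.
A = {
    "RESERVATIONS": Association("RESERVATION"),
    "RESERVATION": Aggregation(("HOTEL", "PERSON")),
    "HOTEL": Aggregation(("NAME", "ADDRESS", "EMPLOYEES")),
    "EMPLOYEES": Association("EMPLOYEE"),
    "EMPLOYEE": Generalization(("MANAGER", "SECRETARY", "CHAMBER-MAID")),
    "PERSON": Classification(frozenset({"person-1", "person-2"})),
    "NAME": Classification(frozenset({"hotel-name-1"})),
    "ADDRESS": Classification(frozenset({"New York", "Detroit"})),
    "MANAGER": Classification(frozenset({"manager-1"})),
    "SECRETARY": Classification(frozenset({"secretary-1"})),
    "CHAMBER-MAID": Classification(frozenset({"maid-1"})),
}
D = "RESERVATIONS"  # the special database class (plays the role of the start symbol)


def derives(cls, value):
    """Return True if `value` can be derived from class `cls` under the rules A."""
    rule = A[cls]
    if isinstance(rule, Classification):
        return isinstance(value, str) and value in rule.elements
    if isinstance(rule, Aggregation):
        # Every component must be present and derivable (a sequence of checks).
        return (isinstance(value, dict)
                and set(value) == set(rule.components)
                and all(derives(c, value[c]) for c in rule.components))
    if isinstance(rule, Generalization):
        # A generic element is tagged with its category (a choice).
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        category, element = value
        return category in rule.categories and derives(category, element)
    if isinstance(rule, Association):
        # Every member of the set must be derivable (an iteration).
        return isinstance(value, list) and all(derives(rule.member, v) for v in value)
    raise TypeError(f"unknown rule for class {cls}")


hotel = {
    "NAME": "hotel-name-1",
    "ADDRESS": "Detroit",
    "EMPLOYEES": [("MANAGER", "manager-1"), ("CHAMBER-MAID", "maid-1")],
}
database = [{"HOTEL": hotel, "PERSON": "person-1"}]
bad_database = [{"HOTEL": hotel, "PERSON": "unknown-person"}]

print(derives(D, database))      # True: the database satisfies the schema
print(derives(D, bad_database))  # False: "unknown-person" is not a PERSON element
```

In this sketch the checker itself follows the correspondence described in Section 2: aggregation is handled as a sequence of checks, generalization as a choice, and association as an iteration.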
4. An attribute grammar for database dynamics

A major open problem with SDMs is the specification of database dynamics. In the programming languages area, attribute grammars [11] are used to describe language semantics. An attribute grammar (AG) is a CFG, G = (V,T,P,S), in which each variable from V is associated with a set of attributes. These attributes describe the properties of the variables. The values of attributes are given by semantic rules associated with the production rules from P. Thus, a ‘meaning’ may be assigned to a program in a CFG language by stating attributes of the variables in a derivation tree for that program.

In a similar way, meaning may be assigned to a database by using attributes of the classes in a derivation tree for that database. Each class from C in M = (C,S,A,D) is associated with a set of attributes. The attributes define the properties of insert, delete and update operations, and they can be defined by predicate rules (integrity constraints) associated with each abstract composition rule from A. There are four different types of abstract composition rules in SHM+ (for classification, aggregation, generalization and association) and each of them has predefined predicate rules which constitute the semantics of the SHM+. Thus, behavioural properties of a database are derived, on the one hand, from the behavioural characteristics of the abstractions in SHM+ (predefined model constraints), and on the other hand, from the dynamic properties of the database application (application constraints).

Operands for the database operations are class (simple or composite) elements. That is the basic reason that non-terminal nodes in a database derivation tree can be considered as database variables of certain classes. The same names will be used for database variables and classes, assuming that there is a one-to-one correspondence between them and that the structural and behavioural properties of the variables are determined by the properties of their classes.

Fig. 6 shows the make-reservation operation attribute (constraint) valid-reservation, which is defined by the other integrity constraints hotel-exists, person-exists (i.e., exists in a database) and valid-new-person. This is a ‘synthesized’ constraint, i.e., defined solely in terms of constraints of the descendants of the corresponding database variable (or class). The other type of integrity constraint is ‘inherited’, i.e., defined in terms of constraints of the ancestor of the database variable. For example, cancel-person is an operation ‘inherited’ constraint: cancel-person(PERSON) ⇔ no-reservation(RESERVATION).

RESERVATION ::= HOTEL PERSON
valid-reservation(RESERVATION) ⇔ hotel-exists(HOTEL) and (person-exists(PERSON) or valid-new-person(PERSON))
where ⇔ represents the logical biconditional connective.

Fig. 6. Valid-reservation attribute.
Operation constraints (attributes) are defined in terms of their ‘local environments’ (using the principle of abstraction), minimizing the interconnections between different parts of a database. This localization and partitioning of the semantic (predicate) rules makes the definition of database dynamics (integrity constraints) easier to understand and more concise, and explains the use of AGs.
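As a rough illustration of how such localized predicate rules might be evaluated, the Python sketch below encodes the valid-reservation and cancel-person constraints as functions. It is a sketch under stated assumptions rather than the paper's formalism: the dictionary layout of the database, the function names, the reading of valid-new-person as "a non-empty name not yet in the database", and the use of "or" between person-exists and valid-new-person are all choices made here for illustration.

```python
def hotel_exists(db, hotel):
    return hotel in db["hotels"]


def person_exists(db, person):
    return person in db["persons"]


def valid_new_person(db, person):
    # Assumption: a new person is valid if the name is non-empty and unused.
    return bool(person.strip()) and person not in db["persons"]


def valid_reservation(db, hotel, person):
    # Synthesized constraint: defined only by constraints of the
    # components (descendants) HOTEL and PERSON of RESERVATION.
    return hotel_exists(db, hotel) and (
        person_exists(db, person) or valid_new_person(db, person)
    )


def no_reservation(db, person):
    return all(p != person for _, p in db["reservations"])


def cancel_person(db, person):
    # Inherited constraint: defined by a constraint of the ancestor RESERVATION.
    return no_reservation(db, person)


def make_reservation(db, hotel, person):
    """Insert a reservation only if the integrity constraint holds."""
    if not valid_reservation(db, hotel, person):
        raise ValueError("valid-reservation constraint violated")
    db["persons"].add(person)
    db["reservations"].append((hotel, person))


# Hypothetical database state; "guest-1" is a placeholder element.
db = {"hotels": {"RITZ"}, "persons": set(), "reservations": []}
make_reservation(db, "RITZ", "guest-1")
print(cancel_person(db, "guest-1"))  # False: a reservation still refers to guest-1
```

Each predicate looks only at its own class and its immediate neighbours in the derivation tree, which is the localization the paragraph above describes.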
5. Conclusion
Due to the growing complexity and importance of database applications, the pragmatic database community requires precise definitions of database dynamics without the complexity of existing formal methods. Semantic data models have been introduced to represent static and dynamic properties of database applications in a direct, precise and abstract way. So far, very mathematically oriented specification techniques of semantic data models have not been widely accepted in the database area. This paper has proposed a technique that has the flexibility to meet varying precision requirements, yet avoids the complexity of existing formal methods: context-free grammars are used to define database statics and, more importantly, attribute grammars are used to specify database dynamics (integrity constraints).
Acknowledgment

The authors wish to thank the referees for their constructive comments and helpful criticism.
References

[1] M.L. Brodie and S.N. Zilles, eds., Proc. Workshop on Data Abst., Databases and Conceptual Modelling, Special Issue SIGPLAN Notices 16(1) (1981); SIGMOD Record 11(2) (1981); SIGART Newsletter 74 (1981).
[2] M.L. Brodie, On modelling behavioural semantics of databases, Proc. 7th Internat. Conf. Very Large Databases, France, 1981.
[3] M.L. Brodie, Association: a database abstraction for semantic modelling, in: P.P. Chen, ed., Entity-Relationship Approach to Information Modelling and Analysis (ER Institute, Los Angeles, 1981).
[4] M.L. Brodie, Axiomatic definitions of data model semantics, Inform. Systems 7(2) (1982).
[5] E.F. Codd, A relational model of data for large shared data banks, Comm. ACM 13(6) (1970).
[6] C.A.R. Hoare, Notes on data structuring, in: O.-J. Dahl, E.W. Dijkstra and C.A.R. Hoare, eds., Structured Programming (Academic Press, New York, 1972).
[7] C.A.R. Hoare, Data reliability, SIGPLAN Notices 10(6) (1975).
[8] C.A.R. Hoare, Data structures, in: R.T. Yeh, ed., Current Trends in Programming Methodology, Vol. IV (Prentice-Hall, Englewood Cliffs, NJ, 1978).
[9] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages and Computation (Addison-Wesley, Reading, MA, 1979).
[10] M.A. Jackson, Principles of Program Design (Academic Press, New York, 1975).
[11] D.E. Knuth, Semantics of context-free languages, Math. Systems Theory 2(2) (1968).
[12] G.J. Myers, Composite/Structured Design (Van Nostrand, New York, 1978).
[13] J.M. Smith and D.C.P. Smith, Database abstraction: aggregation and generalization, ACM TODS 2(2) (1977).
[14] S.N. Zilles, Types, algebras and modelling, in: M.L. Brodie and S.N. Zilles, eds., Proc. Workshop on Data Abst., Databases and Conceptual Modelling, Special Issue SIGPLAN Notices 16(1) (1981).