North-Holland Microprocessing and Microprogramming 27 (1989) 19-24
MICROCOMPUTER IMPLEMENTATIONS OF AN OBJECT-ORIENTED, MULTIPROCESSING DATABASE PROGRAMMING LANGUAGE

John G. Hughes
Department of Information Systems, Institute of Informatics, University of Ulster, Jordanstown, Co. Antrim, BT37 0QB, N. Ireland

We have designed and implemented a database programming language and environment called RAPP, based on a tight integration of the relational data model and the object-oriented data modelling facilities of a modular, multiprocessing programming language. Persistence is provided through the relation data type and so the simplicity of the relational model is retained. However, attributes of relations may be of type class, thereby permitting the implementation of a wide variety of object-oriented data modelling constructs. Classes in RAPP support data abstraction, encapsulation and multiple inheritance. Concurrency is provided by means of an explicit database transaction mechanism. The core of the system consists of a relatively small, well-structured, multi-pass compiler which generates a simple intermediate code for an abstract machine. This paper outlines the principal features of the RAPP system and its implementation on microcomputers.
1. INTRODUCTION
In the commercial world, relational database systems have now become the de facto standard for data processing applications. Their success has largely been due to their flexibility and ease of use, and to the fact that a number of very efficient relational database management systems are now available commercially. Only in the past few years, however, have these systems succeeded in providing the level of performance required for large-scale transaction processing environments.

However, it is widely recognised that there exists a large class of applications for which the data modelling capabilities of relational systems are too limited. These applications can be characterised as complex, large-scale, data-intensive programs, such as those found in the areas of computer-aided design and computer-integrated manufacturing. Object-oriented database systems are being developed to meet the complex data modelling requirements of such applications. However, these systems give rise to significant implementation problems, particularly with regard to persistence and concurrency control, which are currently receiving considerable attention in the research community.

To investigate some of these problems within a programming language framework, an object-oriented, multiprocessing database programming language called RAPP has been designed and implemented by the author over the past few years. This language, together with its associated software development environment, provides a sophisticated tool for database construction. A prototype is currently in use at several universities for teaching purposes and for database research.

Although programming language objects and database objects are similar in that they encapsulate properties, there are several important differences [1]. First, database objects must persist beyond the lifetime of the program creating them. Second, many database applications require the capability to create and access multiple versions of an object.
(Examples of this are to be found in historical databases, databases for software management, and computer-aided design.) Third, highly active databases (such as those used for air traffic control and other real-time applications) require the ability to associate conditions and actions with objects, where the actions are triggered when the conditions are satisfied. Finally, database integrity control demands the capability to associate constraints with objects. Programming languages do not typically support persistent objects or multiple object versions. Nor do they always provide the facilities to associate constraints and triggers with objects.

In their efforts to provide a better framework for persistent data, database programming languages have typically adopted one of two contrasting approaches. The first approach, pioneered by the language Pascal/R [2], is to provide a tight integration of a database model (usually the relational model) with a conventional programming language. Variables of type relation may persist and additional control structures, based on the relational algebra or calculus, are added to the language for accessing and manipulating relations. Other languages falling into this category include Astral [3], Rigel [4], Theseus [5], Plain [6], and Modula/R [7].

A second group of languages adopts a more elaborate model of persistence in which any value may persist irrespective of its type, and persistent objects are subject to the same rigorous type checking as program variables. The languages PS-Algol [8] and Amber [9] were the first to treat persistence in such a uniform manner. Poly [10] and Galileo [11] are also in this category. These languages do not support any formal data model, however, and a major criticism of their approach is that they are incapable of dealing with arbitrary 'join' queries and that query processing tends to be pointer oriented, as in old-fashioned network databases [12]. The same criticism may also be levelled at many object-oriented database languages [13-17], which have also moved away from the traditional database models and replaced joins by path traversals.
These languages offer the advantage of built-in object identity [18] which obviates the need to maintain, for example, referential and existential integrity constraints. However, object-oriented data modelling remains somewhat vague and significant implementation problems remain with regard to persistence.
J.G. Hughes / Implementation of a Database Programming Language
RAPP falls somewhere between these two approaches in that, while supporting the spartan simplicity of the relational model, it offers a rich typing system for modelling complex attributes. In this respect it bears some similarities to those models which have attempted to introduce some generality to the relational model by relaxing the constraint that relations must be in first normal form. For example, Jaeschke and Schek [19] present an algebra for a non first normal form model which permits attributes to be sets of atomic objects, while the model of Zaniolo [20] supports nested tuples (i.e. attributes may themselves be tuples).

The main advantages of RAPP may be summarised as follows:

(i) It provides a model of persistence in which the simplicity of the relational data model is retained;
(ii) It offers sophisticated facilities for the construction of abstract data types and for data encapsulation, including multiple inheritance and mechanisms for specifying triggers and constraints in classes;
(iii) It provides explicit, high-level constructs for representing and scheduling database transactions;
(iv) It has an associated database development environment, including an extensive library of classes and a variety of tools for debugging and tracing;
(v) The core of the system consists of a relatively small, multi-pass compiler which generates P-code (a simple intermediate code for a stack-based abstract machine), and which is readily implemented on modern microcomputers such as the Apple Macintosh and IBM PC.

2. RELATIONAL DATABASE CONSTRUCTS
In common with other Pascal-based database programming languages, the relation data type in RAPP is based on the existing record data type. However, RAPP differs from many of the relational database programming languages mentioned above in that it permits an attribute type to be an abstract data type or class. This facility is similar to that provided in the database management systems POSTGRES [20] and RAD [21], both of which permit abstract data types to be defined for domains and allow arbitrary operations to be defined on those types. For example, in RAPP the following code defines a relation that represents people:

    type string = packed array [1..30] of char;
         persontype = record
                        id : integer;
                        name, address : string;
                        weight, height : integer;
                        birthdate : date
                      end;
    var persons = relation [id] of persontype;
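The declaration above can be mimicked outside RAPP by a small keyed container. The following Python fragment is only a sketch under our own assumptions: the class name `Relation`, its methods and the sample records are hypothetical and are not part of RAPP. It shows a relation declared with the key attribute `id` rejecting a second tuple with the same key value.

```python
class Relation:
    """Hypothetical stand-in for `relation [key] of recordtype`."""

    def __init__(self, key):
        self.key = tuple(key)   # names of the key attributes
        self._rows = {}         # key value -> stored record

    def _key_of(self, row):
        # Build the key value from the declared key attributes.
        return tuple(row[attr] for attr in self.key)

    def insert(self, row):
        k = self._key_of(row)
        if k in self._rows:
            # Key uniqueness is the only integrity rule sketched here.
            raise KeyError(f"duplicate key {k!r}")
        self._rows[k] = dict(row)

    def __iter__(self):
        return iter(self._rows.values())

    def __len__(self):
        return len(self._rows)


# Illustrative data only.
persons = Relation(key=["id"])
persons.insert({"id": 1, "name": "A. Smith", "weight": 70, "height": 175})
persons.insert({"id": 2, "name": "B. Jones", "weight": 64, "height": 168})
```

Attribute values are stored as supplied, so a field such as the birth date could equally be an instance of a user-defined class.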
Relations may be persistent or temporary, local or global, and may be passed as variable parameters to procedures or functions. The key attributes of the relation are specified within the square brackets. The system uses this information to enforce integrity constraints. There are no restrictions on the type an attribute may take. For example, in the definition of persontype, the type date might be an abstract data type, defined by a class in a manner to be described in the next section. This approach is consistent with the relational model of data in which, at the abstract level, attributes are viewed as atomic or non-decomposable objects. However, for a database management system to actually store and manipulate attribute values, the details of their machine representation must be incorporated into that part of the system which is normally hidden from a user's view. The mapping of class instances between persistent store and main memory is described in section.

The operators provided by RAPP for manipulating relations are based on the relational algebra [22]. These operators consist of selection, projection, natural join, Cartesian product and the set operators of union, intersection and difference. A full description of these operators is given in reference [23]. The advantage of the relational algebra is that only a relatively small number of operations is necessary for relational completeness, so effort can be concentrated on the efficient implementation of these operations.

Relations in RAPP may be indexed on any single-valued attribute by means of an index relation. Index relations are created by the user, but subsequently they are automatically updated by the system when the base relation is updated. This relieves the user of the often complex and mundane responsibility of index maintenance.

3. CLASSES IN RAPP
In RAPP the programmer may introduce arbitrary new data types into a relational database and define arbitrary operations on objects of these new types. Such data abstraction and modularity are provided for in RAPP via enhanced versions of the envelope and envelope module constructs of the modular, multi-processing language Pascal Plus [24]. An envelope is simply a class which envelopes the execution of any block in which an instance of the envelope is declared. This control structure may be used to enforce correct initialisation and finalisation of data structures, and to implement triggers and constraints in a database environment. If only one instance of an envelope is required, the definition and instantiation may be combined into a single envelope module. An envelope module in RAPP is very similar to the module or package facility found in languages such as Modula-2 and Ada. For example, a domain which is of type set, with some user-defined element type, could be represented in RAPP by an envelope of the following form:

    Envelope set;
      Function *empty : Boolean;
      Function *member ( e : elementtype ) : Boolean;
      Procedure *insert ( e : elementtype );
      Procedure *delete ( e : elementtype );
    begin
      { initialisation code };
      ***;
      { finalisation code }
    end;
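For readers more familiar with present-day languages, the way an envelope brackets the execution of a block resembles a context manager. The following Python fragment is a loose analogy only, not RAPP semantics: the class `SetEnvelope`, its `trace` list and the `demo` function are hypothetical names introduced for illustration. Initialisation runs on entry to the enclosed block and finalisation runs when the block terminates, while the four exported operations correspond to the starred routines of the envelope.

```python
class SetEnvelope:
    """Hypothetical analogue of the `set` envelope as a context manager."""

    def __init__(self, trace):
        self._items = None
        self.trace = trace          # records the order of events

    def __enter__(self):
        # Initialisation code: runs before the enclosed block.
        self._items = set()
        self.trace.append("initialised")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Finalisation code: runs when the enclosed block terminates.
        self._items.clear()
        self.trace.append("finalised")
        return False                # do not suppress exceptions

    # Exported ("starred") operations.
    def empty(self):
        return not self._items

    def member(self, e):
        return e in self._items

    def insert(self, e):
        self._items.add(e)

    def delete(self, e):
        self._items.discard(e)


def demo():
    trace = []
    with SetEnvelope(trace) as s:   # the enclosed block is the `with` body
        s.insert(3)
        trace.append("block: member(3) = " + str(s.member(3)))
    return trace
```

Unlike a RAPP envelope, a context manager executes the enclosed block exactly once; it cannot run it conditionally or repeatedly.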
The effect of declaring an instance of set in any block B is to make available to that block a data structure and an associated set of operations, i.e. those variables, constants and operations which are 'starred' in the envelope declaration. In addition, the block B is implicitly enveloped by the code in the body of the envelope. Thus the set is first initialised and when the 'inner statement' (***) is encountered block B is executed. On termination of block B any finalisation code following the inner statement will be executed. The inner statement has full statement status and may be executed conditionally or repetitively. This enables a variety of tracing, monitoring and recovery strategies at block level which few other languages support [24].

RAPP uses a flexible library retrieval mechanism to implement a simple form of polymorphism in classes. With this mechanism, types, constants, procedures and functions may be left undefined when the class is stored in the library and supplied at compile time by the program retrieving the class. For example, a general purpose set abstraction could be implemented by a class in which the type of element is left undefined. A program requiring an abstraction for a set of type T may then retrieve the class from the library with the statement:

    envelope set = envelope set in library with elementtype = T;

Multiple inheritance is central to the object-oriented approach and is provided for in RAPP via an inherit clause, in a manner similar to that of the language Eiffel [25]. For example, if we have a class polygon, we may define a class rectangle in the following way:

    envelope rectangle;
      inherit polygon;
      { definition of properties and operations specific to rectangles }
    end;

The inherit clause lists all the parents of a class. No name conflict is permitted, but a rename clause is provided to resolve such conflicts.

4. MODELLING DATABASE TRANSACTIONS

Concurrent object-oriented languages model the world with concurrently executable objects called processes. A process, like a class, has an interface of executable objects but has one or more threads of control which at any given time may be either active or suspended. In common with the languages ABCL/1 [26] and Orient84/K [27], RAPP offers quasi-concurrent processes based on the monitor concept [28]. A RAPP process is similar in structure to a procedure but represents a program activity which may be executed in parallel with other processes. Thus it is an ideal representation for a database transaction, provided that the implementation takes responsibility for treating processes as units for recovery.

Database transactions must be atomic, i.e. they must represent uninterruptible temporal units of execution. Arbitrary suspension of an execution thread in the middle of a transaction may violate the integrity of the database. Thus database transactions must be distinguished from other processes and in RAPP this is effected by use of the keyword transaction, i.e.

    transaction T;
    end;

A monitor is a module which holds the declarations of items of data which are accessed by more than one process and guarantees exclusion on that data. That is, only one process at a time is capable of modifying data which has been declared local to a monitor. However, starred monitor variables may be inspected by any number of processes at the same time. Thus monitors provide a means for implementing read-shareable and write-exclusive locks and for scheduling database transactions such that serializability is maintained. For example, the following is the interface for a simple monitor which controls access to a relation.

    Monitor RelationAccess;
      type *AccessMode = ( *read, *write );
      Procedure *Acquire ( AccessRequired : AccessMode );
      Procedure *Release;
    end;

An instance of this monitor may be declared for each relation and transactions requiring access to a relation do so through the appropriate monitor instance. For example, we may declare a monitor instance, personmonitor, controlling access to the person relation in the following way:

    instance personmonitor : RelationAccess;

A transaction then acquires write-exclusive access to the person relation by issuing the command

    personmonitor.Acquire ( write );
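The read-shareable, write-exclusive discipline behind the RelationAccess interface can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the RAPP run-time system: it uses operating system threads and `threading.Condition` rather than RAPP's quasi-concurrent processes, the `blocking` parameter is a hypothetical addition that makes the behaviour easy to observe, and no attempt is made to prevent writers from being starved by a steady stream of readers.

```python
import threading
from enum import Enum


class AccessMode(Enum):
    READ = "read"
    WRITE = "write"


class RelationAccess:
    """Sketch of a per-relation lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # number of current read holders
        self._writer = False     # True while a writer holds the lock

    def _available(self, mode):
        if mode is AccessMode.WRITE:
            return not self._writer and self._readers == 0
        return not self._writer

    def acquire(self, mode, blocking=True):
        # Returns True once access is granted; with blocking=False it
        # returns False immediately if access cannot be granted now.
        with self._cond:
            while not self._available(mode):
                if not blocking:
                    return False
                self._cond.wait()
            if mode is AccessMode.WRITE:
                self._writer = True
            else:
                self._readers += 1
            return True

    def release(self):
        # A writer excludes all readers, so the current holder's mode
        # can be inferred from the lock state.
        with self._cond:
            if self._writer:
                self._writer = False
            elif self._readers > 0:
                self._readers -= 1
            self._cond.notify_all()


personmonitor = RelationAccess()
```

A transaction would bracket its updates with `personmonitor.acquire(AccessMode.WRITE)` and `personmonitor.release()`.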
This example shows locks to be at the relation level, but finer granularity can be achieved. In this regard a useful addition to the language might be a selector function, such as that offered by Modula/R [7], which permits transactions to lock a portion of a relation defined by some selection criterion.

Our approach to concurrency is based on the fact that we do not believe that we can define the semantics of sharing for all applications in an object-oriented environment and provide only one synchronisation mechanism that will suffice in all cases. Our philosophy is to provide sufficient primitives so that applications developed in RAPP can implement the transaction mechanism that best suits their environment. Thus the primitives in RAPP provide the necessary mechanisms for coordinating reliable updates to objects, and these mechanisms can be tailored to particular applications.

5. IMPLEMENTATION
The RAPP system has been implemented on a wide variety of machines including the Apple Macintosh. A version for the IBM PC is currently under development. Most of the system code is written in standard Pascal and is therefore easily portable. Memory requirements obviously depend on the size of the application and the underlying relations. However, reasonably sized systems have been developed on the Macintosh with 1 Mbyte of main memory.

The implementation consists of two main parts: program translation and run-time support. The core of the system consists of a relatively small, multi-pass compiler which generates P-code (a simple, low-level intermediate code for a stack-based abstract machine). A P-code interpreter must then be provided for each machine on which the system is to run.

The run-time system has several major components dealing with process management and synchronisation, the storage and retrieval of classes, query optimisation and general database management functions. The process management system is based on 'lightweight processes' in that concurrency is supported by the language, not the underlying operating system. The run-time system is responsible for maintaining the atomicity of transactions and employs standard journalling techniques to achieve this. Journalling, in common with most database management functions in RAPP, is totally transparent to the user.

A major concern of the compiler is the analysis of relational algebraic statements and the generation of code for their implementation. At the lowest level this involves calls to procedures imported from the common file system for opening and closing relations and reading and writing blocks of data to and from relations. Relations are stored as direct-access files with a dense primary index organised as a B-tree. User-defined index relations are also organised as B-trees and the implementation of retrieval operations on these indices (i.e. using the selection operator) takes advantage of this multi-level structure.
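To illustrate why an ordered, dense index helps the selection operator, the following Python sketch keeps one (key, position) entry per stored tuple in sorted order and answers equality and range selections by binary search. It is a deliberate simplification and an assumption on our part: a sorted in-memory list stands in for the B-tree, and the names `DenseIndex`, `add`, `lookup` and `select_range` are hypothetical rather than taken from the RAPP file system.

```python
import bisect


class DenseIndex:
    """Sorted list standing in for a B-tree with one entry per tuple."""

    def __init__(self):
        self._keys = []        # key values, kept in ascending order
        self._positions = []   # file position of the tuple for each key

    def add(self, key, position):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            raise KeyError(f"duplicate key {key!r}")
        self._keys.insert(i, key)
        self._positions.insert(i, position)

    def lookup(self, key):
        # Equality selection: binary search instead of a full scan.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._positions[i]
        return None

    def select_range(self, low, high):
        # Range selection low <= key <= high, in key order.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._positions[lo:hi]


index = DenseIndex()
for key, position in [(30, 2), (10, 0), (20, 1)]:
    index.add(key, position)
```

A real B-tree obtains the same logarithmic search cost while keeping the index on disk in blocks, which the list-based sketch does not attempt.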
The join operation is implemented by means of the classic sort-merge-join algorithm. In this method, the two relations to be joined are sorted on their common attribute values using a multiway merge sort. The two resulting sorted files are then merged and tuples with matching common attribute values are output to the resultant relation.

No attempt as yet has been made to optimise algebraic expressions involving non-index relations. The philosophy adopted in the development of RAPP is that, instead of providing a large and complicated optimiser, we provide appropriate programming tools with which the programmer can improve on efficiency directly. Index relations provide the programmer with a powerful means for optimising the performance of database systems written in RAPP. For this reason, a great deal of effort, during the development of RAPP, was devoted to the efficient storage and maintenance of indices, and to providing the programmer with a simple interface to these structures. Thus, index relations can be created or recreated simply by a call to the procedure: rewrite ().

The system subsequently maintains indices automatically, thereby absolving the programmer from the complicated procedure of updating indices following insertion or deletion operations on the base relation. Also, since indices are treated as relations, the programmer can combine and manipulate them using the normal algebraic and set operators.

Classes for which persistent instances are required (i.e. those which are used in the definition of relations) must have in their definition certain routines to enable the run-time system to manipulate such instances. These include: converting to and from the internal representation of the class used in a RAPP program and an external storage representation; testing for equality; returning the size in bytes of the external representation of a given instance. However, once these routines have been supplied the user is absolved of the responsibility for mapping class instances between persistent store and main memory. Depending on the nature of its representation, an instance of a class may be stored in the relation itself or as a pointer, but such details are transparent to the user.

An important element of the RAPP system is the class library. This library covers many of the common elements of database programming, incorporating many fundamental data structures such as lists, variable length strings, sets, trees, stacks etc. Many of these are polymorphic (in the manner described in Section 3), and they have all been thoroughly tested. An obvious major advantage of the library is that it absolves the user of the responsibility for providing the implementation details of the class. This is particularly advantageous when class instances are to be mapped between persistent store and main memory.

ACKNOWLEDGEMENTS

I should like to thank Dave Bustard and Jenny Johnston of Queen's University Belfast, who kindly provided me with a Macintosh version of Pascal Plus which formed the basis for the RAPP implementation on that machine.

REFERENCES

[1] Agrawal, R. and Gehani, N.H., ODE (Object Database and Environment): The Language and the Data Model, Proc. ACM-SIGMOD Int'l Conf. on Management of Data, Portland, Oregon, May-June (1989).
[2] Schmidt, J.W., Some High Level Language Constructs for Data of Type Relation, ACM Trans. on Database Syst., 2, 247-261, (1977).
[3] Amble, T., Bratbergsengen, K. and Risnes, O., ASTRAL: A Structured and Unified Approach to Database Design and Manipulation, Proc. Database Architecture Conf., Venice, (1979).
[4] Rowe, L.A. and Shoens, K.A., Data Abstraction, Views, and Updates in Rigel, Proc. ACM SIGMOD Conf., Boston, Mass., 71-81, (1979).
[5] Shopiro, J.E., Theseus - A Programming Language for Relational Databases, ACM Trans. on Database Syst., 4, 493-517, (1979).
[6] Wasserman, A.I., Sherertz, D.D., Kersten, M.L., van de Riet, R.P. and Dippe, M.D., Revised Report on the Programming Language Plain, University of California, San Francisco, Technical Report No. 42, (1980).
[7] Reimer, M., Implementation of the Database Programming Language Modula/R on the personal computer Lilith, Software - Pract. & Exper., 14, 945-956, (1984).
[8] Atkinson, M.P., Bailey, P., Cockshott, W.P., Chisholm, K.J. and Morrison, R., An Approach to Persistent Programming, Comput. J., 26, no. 4, (1983).
[9] Cardelli, L., Amber Technical Report, AT&T Bell Labs., Murray Hill, N.J., USA, (1984).
[10] Matthews, D.C.J., Poly Manual, Technical Report No. 63, Computer Laboratory, Univ. of Cambridge, Cambridge, England, (1985).
[11] Albano, A., Cardelli, L. and Orsini, R., Galileo: A Strongly Typed Interactive Conceptual Language, ACM Trans. on Database Syst., 10, 230-260, (1985).
[12] Neuhold, E. and Stonebraker, M., Future Directions in Database Research, Technical Report No. 88-001, Int'l Computer Science Inst., Berkeley, California, USA, (1988).
[13] Maier, D., Stein, J., Otis, A. and Purdy, A., Development of an Object-Oriented DBMS, Proc. 1st Int'l Conf. on Object-Oriented Programming Systems, Languages and Applications, Portland, Oregon, 472-486, (1986).
[14] O'Brien, P., Bullis, P. and Schaffert, C., Persistent and Shared Objects in Trellis/Owl, Proc. Int'l Workshop on Object-Oriented Database Systems, Asilomar, California, USA, (1986).
[15] Bancilhon, F., Briggs, T., Khoshafian, S. and Valduriez, P., FAD, a Powerful and Simple Database Language, Proc. 13th Very Large Database Conf., Brighton, England, 97-105, (1987).
[16] Lecluse, C., Richard, P. and Velez, F., O2, an Object-Oriented Data Model, Proc. ACM SIGMOD Conf., Chicago, Illinois, 424-433, (1988).
[17] Zhao, L. and Roberts, S.A., An Object-Oriented Data Model for Database Modelling, Implementation and Access, Comput. J., 31, 116-124, (1988).
[18] Khoshafian, S.N. and Copeland, G.P., Object Identity, Proc. 1st Int'l Conf. on Object-Oriented Programming, Systems, Languages and Applications, Portland, Oregon, USA, 406-416, (1986).
[19] Jaeschke, G. and Schek, H., Remarks on the Algebra of Non First Normal Form Relations, Proc. ACM Int'l Symp. on PODS, Los Angeles, 124-138, (1982).
[20] Zaniolo, C., The Representation and Deductive Retrieval of Complex Objects, Proc. 11th Very Large Database Conf., Stockholm, (1985).
[20] Rowe, L.A. and Stonebraker, M., The POSTGRES Data Model, Proc. 13th Int'l Conf. on Very Large Databases, Brighton, England, 83-96, (1987).
[21] Osborn, S.L. and Heaven, T.E., The Design of a Relational Database System with Abstract Data Types for Domains, ACM Trans. on Database Syst., 11, No. 3, 357-373, (1986).
[22] Codd, E.F., A Relational Model of Data for Large Shared Data Banks, Comm. ACM, 13, 377-387, (1970).
[23] Hughes, J.G. and Connolly, M., A Portable Implementation of a Modular Multi-Processing Database Programming Language, Software - Pract. & Exper., 17, 533-546, (1987).
[24] Welsh, J. and Bustard, D.W., Pascal Plus - Another Language for Modular Multiprogramming, Software - Pract. & Exper., 9, 947-958, (1979).
[25] Meyer, B., Object-Oriented Software Construction, Prentice Hall International, London, (1988).
[26] Yonezawa, A., Shibayama, E., Takada, T. and Honda, Y., Modelling and Programming in an Object-Oriented Concurrent Language ABCL/1, in Object-Oriented Concurrent Programming, (eds. Yonezawa, A. and Tokoro, M.), MIT Press, Cambridge, MA, USA, 129-158, (1987).
[27] Tokoro, M. and Ishikawa, Y., Orient84/K: A Language with Multiple Paradigms in the Object Framework, 19th Hawaii Conf. on System Sciences, 198-207, (1986).
[28] Hoare, C.A.R., Monitors: An Operating System Structuring Concept, Comm. ACM, 17, 549-557, (1974).