Programming versus databases in the object-oriented paradigm




S A Demurjian,* G M Beshers and T C Ting

The object-oriented paradigm has come to the forefront of the research community in the software engineering, programming language, and database research areas. Moreover, the paradigm appears capable of supporting advanced applications such as software development environments (SDEs) that require both programming ability and persistency via a database system. However, there exists a disparity between the programming and database approaches to the object-oriented paradigm. The paper examines and discusses this disparity between the two approaches for the purpose of formulating an understanding of their commonalities and differences. This understanding has been instrumental in supporting work involving the prototyping of SDEs using the object-oriented paradigm, an examination of the techniques required to evolve a class library for persistency, and the proposal of a software architecture and functionality of a persistent programming language system. Thus, it is believed that the work presented in this paper can serve as a framework for researchers and practitioners whose efforts involve the aforementioned or other, related areas. From a content perspective, this paper provides a comparative analysis between the concepts of programming and databases for the object-oriented paradigm, through a detailed presentation of system-level and model-level considerations. Both philosophical concepts and implementation pragmatics are investigated. A practical examination of the C++ programming language versus the Opal data language has been conducted, revealing many valuable insights into system and application details and issues. Features of both approaches are also analysed and illustrated.

Keywords: object-oriented paradigm, programming languages, database systems

TOWARDS AN UNDERSTANDING OF COMMONALITIES AND DIFFERENCES

As the 1990s are underway [1], there is a resurgence of interest in utilizing the object-oriented paradigm for advanced application specification, design, and development. Object-oriented modelling concepts, which have their roots in abstract data types (ADTs) [1-3], offer data encapsulation, information hiding, and inheritance, so that both the structure and behaviour of advanced applications can be represented [4]. From a modelling perspective, they provide a robust design methodology that stresses modularity, increases productivity, controls data consistency, promotes software reuse, and facilitates software evolution. From a user perspective, many of these benefits accrue through the definition of a uniform public interface (for maintaining consistency and reusing software) and a hidden private implementation (for supporting design changes and enhancements that do not affect the public interface). These concepts and their attainment in the object-oriented paradigm have been discussed elsewhere [5-8].

Computer Science and Engineering Department, Box U-155, 260 Glenbrook Road, The University of Connecticut, Storrs, Connecticut 06269-3155
*The work of this author was partially supported by grant IRI8902755 from the National Science Foundation and grant 1171-00022-00506-35-67 from The University of Connecticut Research Foundation

Increasing interest in object-oriented programming and databases

The increased interest in the object-oriented paradigm has not been limited to the software engineering and programming language research areas. While Smalltalk [9] initially promoted interest in the paradigm, and C++ [10] offered a popular forum for its usage, the paradigm has also found support in the database modelling and systems research areas. Object-oriented database systems such as Gemstone [11] (a Smalltalk extension [9]), ORION [12], Cactis [13], OZ+ [14], and ODE [15] and Ontos [16] (C++ extensions [10]) have all focused on providing support for object-oriented database application development. Coupled with the increasing interest in object-oriented programming and database application development is the recognition that many application areas would benefit from the integration of these two concepts to support data persistency. This is particularly true for software development environments (SDEs), as indicated by the work of the authors of this paper [17, 18] and other research efforts [13, 21-27]. Database systems can play a critical role in supporting an SDE by offering many attractive capabilities, including transaction processing, concurrency control for multiple users, a centralized repository for data storage, data consistency via integrity and security, and querying capabilities [28].

Information and Software Technology
0950-5849/93/020078-11 © 1993 Butterworth-Heinemann Ltd

SDEs with object-oriented database support

There are a number of advantages to developing SDEs with object-oriented database support. First, there is modelling support for the myriad of data types (source code, control-flow and data-flow diagrams, parse trees, symbol tables, designs, specifications, etc.) that are required [28], so that the information which must be stored can be accurately and completely represented. The object-oriented paradigm is a strong candidate for supporting these diverse data needs, as traditional database solutions (i.e., relational, hierarchical, and network) do not appear to have the modelling power required to represent these complex types. Second, the availability of data encapsulation and information hiding in the object-oriented paradigm provides a clear public interface for tool designers and builders, thereby promoting both tool construction and integration. Finally, since the SDE is very dynamic (i.e., new languages and new tools must be supported over time), extensibility is a key issue. The object-oriented paradigm also supports extensibility, through the ability to augment the public interface with new methods and types (via inheritance) and to modify the private implementation to support new functionality.

Disparity of object-oriented paradigm: programming vs. databases

Despite these advantages, experience has shown that there exists a disparity between the object-oriented paradigm as supported in the programming language and database domains. These differences must be understood in order to achieve a successful integration between the tools of an SDE and the database system necessary for supporting their usage. There are two approaches that can be taken to address this problem. One approach incorporates persistency directly into an object-oriented programming language such as C++, an attempt to move a programming language closer to a database system [29], specifically by adding language constructs for persistent data management and for transaction processing for controlled data access. A complementary approach offers more programming language capabilities within a database system, in particular by adding complex data structures for supporting the development of advanced software and by treating functions (methods) as data for uniform storage, access, and execution. Consequently, the goal of this paper is to present, analyse, and understand the tradeoffs between programming language and database systems, and to examine the problems and advantages of the two aforementioned approaches to their integration. This presentation is based, in part, on experience gained in object-oriented analysis and design when constructing design tools and environments [19, 20].

Interest in this work and its goal

The interest in this goal is traced to work done on SDEs with object-oriented database support [17, 18], specifically developing Ozone, an SDE for supporting the implementation phase of the software development process [19, 20] using the Pascal and C programming languages. Ozone was implemented by developing a library of over 200 C++ classes that models software as projects composed of one or more systems (the unit of execution) that are further composed of one or more modules. Each module in Ozone contains multiple data representations, including source code, object code, parse tree, symbol table, and a profile for statistical data on the module. Ozone supports tools for project editing (i.e., defining new projects, adding/deleting systems, adding/deleting modules, etc.), editing/compiling modules (systems/projects), and querying software (i.e., what projects use a particular module, which modules use a particular function/variable, who has edited a particular module, what is the call graph, etc.). As part of the effort on Ozone, the integration of Ozone and Ontos [16, 30] has recently been completed. The research focus of this effort was to examine the requirements and techniques necessary to evolve a class library in order to support persistency. The work in this paper has served as a primer to that integration effort, and developing an understanding of the commonalities and differences between object-oriented programming and database constructs has helped to facilitate the integration. Additionally, the authors look to the future, to propose the software architecture and basic functionality of a persistent programming language system that integrates programming-language and database-system capabilities into a single, unified environment [18]. The work presented in this paper is also the basis for that effort.

Vol 35 No 2 February 1993
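To make the project/system/module hierarchy concrete, the following is a minimal C++ sketch of how such a class library might model it and answer one of the listed queries (what projects use a particular module). The names Project, System, Module, and projectsUsing are hypothetical illustrations, not Ozone's actual classes, and only one of a module's data representations (source code) is shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of the project/system/module hierarchy.
// Not Ozone's actual classes; only one data representation is shown.
struct Module {
    std::string name;
    std::string sourceCode;            // one of a module's representations
    std::vector<std::string> editors;  // users who have edited the module
};

struct System {                        // the unit of execution
    std::string name;
    std::vector<Module> modules;
};

struct Project {
    std::string name;
    std::vector<System> systems;

    // True if any system in this project contains the named module.
    bool usesModule(const std::string& moduleName) const {
        for (const System& s : systems)
            for (const Module& m : s.modules)
                if (m.name == moduleName) return true;
        return false;
    }
};

// Query: which projects use a particular module?
std::vector<std::string> projectsUsing(const std::vector<Project>& projects,
                                       const std::string& moduleName) {
    std::vector<std::string> result;
    for (const Project& p : projects)
        if (p.usesModule(moduleName)) result.push_back(p.name);
    return result;
}
```

Queries such as "who has edited a particular module" would follow the same pattern, walking the composition hierarchy rather than issuing a declarative database query, which is part of what a database-backed version of such a library would change.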

Relevance of the work

Clearly, the understanding of the commonalities and differences between the programming and database approaches to the object-oriented paradigm has been instrumental in the aforementioned research efforts. Consequently, the work presented in this paper can serve as an important framework for other researchers and practitioners whose efforts are in similar or related areas. This work can also function as a starting point for individuals who seek to comprehend the state of the object-oriented paradigm in both the programming and database areas, by outlining their respective scopes and indicating their numerous inter-relationships.

The remainder of the paper is organized into three sections. In the second section, object-oriented programming and database concepts are compared from a conceptual perspective, focusing on system-level and model-level commonalities and differences. In the third section, a more practical comparison of object-oriented programming and database concepts is offered, through an analysis of the C++ programming language [10] and the Opal data language [38]. This comparison is conducted on the basis of 11 criteria for examining the commonalities and differences of object-oriented approaches in general, and of C++ and Opal in particular. The final section offers some overall insight into this work, provides a basic discussion of the authors' future thoughts on a persistent programming language system [18], and details other, associated research.

PLS VS. DBS: COMMONALITIES AND DIFFERENCES

A programming language system (PLS) can be characterized as the environment that software engineers utilize to construct, test, and execute their applications, i.e., an operating system (OS) with its file system and resources, the programming language compiler/debugger, etc. Therefore, the different pieces of a PLS often span the environment under which they exist. Conversely, a database system (DBS) is a single, integrated application that manages its own resources and executes and controls all phases of user transactions in a unified environment. A DBS also offers support for constructing, testing, and debugging database transactions and applications, but this support is usually under the umbrella of tools that comprise the user interface to the database system. In this section, the commonalities and differences between a PLS and a DBS are discussed at both the system and model levels of detail. At the system level, interest is in the functional aspects of PLS and DBS from a macroscopic perspective, that is, the way each system works, the way users utilize the system, and so on. The model-level comparison offers a more detailed examination of microscopic features, i.e., abstract data types (ADTs), classes, signatures, and so on.

System-level considerations

To understand the issues involved in incorporating persistence into a programming language, the inherent differences between the object-oriented paradigm as supported by a PLS and a DBS must be reconciled. At a system/runtime environment level, a comparison of the OS and shared resources versus the DBS and shared data is appropriate. The operating system executes and supports user applications in many different programming languages, providing access to and management of a common set of resources for the running applications. In an analogous fashion, the database system supports user transactions and provides access to and management of a shared database for running transactions. However, there are many differences. First, database systems often circumvent the host operating system (e.g., for direct disc access outside of the file system, network access independent of the network server, etc.) to achieve high performance. Second, the granularity of information access is often much finer in the database system (e.g., security at the instance level vs. the file level in an OS).

Philosophical differences

Although the object-oriented paradigm as supported by PLSs and DBSs is very similar (i.e., support for data encapsulation, information hiding, and inheritance), there are many critical differences that are apparent, due to the differing philosophies of these two systems. First, information hiding in a PLS must be reconciled with data security in a DBS. A PLS has no concept of a user, whereas in a DBS security is essentially the prevention of unauthorised access by users. The goal of information hiding is to enforce the external view of an abstract data type (ADT). This requires the hiding of both the representation of the ADT's value and the method code which manipulates that representation in conformance with the ADT's operations. In an object-oriented PLS (OOPLS), data and methods are explicitly partitioned at design time into public/private categories. The data and methods which are private should be inaccessible to all users and all other objects in the program; debugging is the only exception to this constraint. Compare this with the need for security in the database system, where private means that only the owner, a user, can read the data. The owner may grant other users access to objects he or she owns, but this is a dynamic capability on object instances and may be revoked. Thus, in an object-oriented DBS (OODBS), data is often assumed to be private, with permission to access data and methods provided via security primitives at post-design stages. We believe that information hiding in an OOPLS must be integrated and unified with security in an OODBS, as we have been exploring extensively in a related effort [39-42]. Second, inheritance in an OOPLS is incorporated to allow object instances of different types to be treated in a uniform fashion when it is semantically meaningful to do so. This contrasts with the database interpretation when work on generalization was first conducted, where the underlying motivation for inheritance was to allow '... relevant details to be introduced in a controlled manner ...' (Smith and Smith [43], page 105), which emphasizes abstraction. It remains unclear which approach today's OODBSs are following. However, there are some indications that the differences between these two approaches are beginning to be reconciled.
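The contrast between design-time information hiding and run-time, owner-controlled security can be sketched in C++. In this minimal illustration (assumption: the Account and ObjectSecurity classes and their grant/revoke methods are hypothetical, not part of any particular OODBS), the compiler fixes what is private once and for all, while the access list attached to an individual object can be changed by its owner at any time.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Information hiding (PLS view): fixed at design time and enforced by the
// compiler; no code outside the class can ever name balance_ directly.
class Account {
public:
    explicit Account(double b) : balance_(b) {}
    double balance() const { return balance_; }   // public interface
private:
    double balance_;                              // private representation
};

// Database-style security (DBS view), hypothetical: each object has an owner
// and an access list that the owner may change while the system is running.
class ObjectSecurity {
public:
    explicit ObjectSecurity(std::string owner) : owner_(std::move(owner)) {}

    bool canRead(const std::string& user) const {
        return user == owner_ || readers_.count(user) > 0;
    }
    // Only the owner may grant or revoke; returns false otherwise.
    bool grant(const std::string& by, const std::string& user) {
        if (by != owner_) return false;
        readers_.insert(user);
        return true;
    }
    bool revoke(const std::string& by, const std::string& user) {
        if (by != owner_) return false;
        readers_.erase(user);
        return true;
    }
private:
    std::string owner_;
    std::set<std::string> readers_;
};
```

Note that ObjectSecurity works per instance and per user, two notions that C++'s public/private protocol has no way to express.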

Implementation considerations

The final level of comparison involves the method power and applications that are required. It is not unreasonable to assume that the target platform of any advanced application environment (like SDEs) is a network of workstations with one or more database servers, where many 'transactions' can be executed on the workstation where the application is running, rather than in the database system. It is unclear whether this will always produce the best performance, as the time to move large amounts of data over a network must be taken into account. However, executing method code in the database system significantly complicates the server and can also damage the reliability of the entire system when the method code encounters a fatal error.

Pictorially, PLS and DBS have traditionally overlapped through the existence of embedded data manipulation languages, as a result of an impedance mismatch [44] between the two approaches, as represented in Figure 1(a). The impedance mismatch is traced to the fact that PLSs and DBSs handle data in significantly different ways. Embedded languages provide minimal overlap between a PLS and a DBS to bridge the mismatch, either by making database calls directly from a program or by inserting database commands that are automatically preprocessed into database calls by a precompiler. When object-oriented database systems [11-16] are considered, the amount of overlap is significantly increased, thereby reducing the mismatch. For example, Ontos [16] classes are similar both structurally and syntactically to C++ classes, with minor modifications to support persistency. The overlap has increased because an embedded language is no longer necessary: applications can use both C++ and Ontos C++ directly, and be compiled using the Ontos C++ compiler. As the overlap grows, database constructs become available in the PLS. If this is the case, then object-oriented models as supported by the PLS must contain database constructs. Similarly, object-oriented models in a DBS must contain programming language constructs. Thus, while Figure 1(b) indicates that the overlap is increasing, it is believed that there will always be some functions specific to a PLS (optimization, static semantic program analysis, etc.) and others specific to a DBS (disc I/O routines, recovery, etc.). In summary, by considering how the concepts of a PLS and a DBS can be merged, the capabilities of a PLS can be defined, the functions of the DBS identified, and the interfacing of these capabilities and functions understood, without degrading either the PLS or the DBS. In the process, the impedance mismatch between the two approaches can be substantially reduced.

Figure 1. PLS, DBS, and their overlap: (a) a traditional view; (b) tomorrow's view
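The impedance mismatch can be illustrated with a small C++ sketch. Here fetchDepartments is a hypothetical stand-in for a precompiled embedded query (it simply returns fixed rows; no database is involved), and the point is the conversion loop: set-oriented, untyped results must be unpacked one row at a time into the program's typed objects.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Untyped, set-oriented result, standing in for what an embedded data
// manipulation language typically hands back to the host program.
using Row = std::vector<std::string>;
using ResultSet = std::vector<Row>;

// Hypothetical stand-in for an embedded query such as
// "SELECT name, total_emps FROM department"; not a real database API.
ResultSet fetchDepartments() {
    return {{"Sales", "12"}, {"Research", "7"}};
}

// Typed host-language object that the program actually wants to manipulate.
struct Department {
    std::string name;
    int totalEmps;
};

// The mismatch: iterate over the set one row at a time and convert each
// untyped field into the host language's types by hand.
std::vector<Department> loadDepartments() {
    std::vector<Department> result;
    for (const Row& row : fetchDepartments())
        result.push_back(Department{row.at(0), std::stoi(row.at(1))});
    return result;
}
```

In an object-oriented database system of the kind described above, this hand-written translation layer is largely what disappears, since stored objects already have the program's class structure.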

Model-level considerations

Although the concepts of a data model in a DBS and typing in a PLS are analogous, their traditional realization and usage within each system is significantly different. Programming language typing evolved from compiler theory and is primarily concerned with representation integrity, i.e., not applying floating add to an integer value. This has been extended into classes to support data abstraction and data encapsulation. In this latter case, the emphasis is on the operations which may manipulate the data. In contrast, data models have evolved from the need to specify the data in a database system. This has meant that allowable values (constraints) and the relationships between objects have received primary consideration. The result is that the model-level concepts are only semi-compatible and consequently are in need of refinement. When bridging the gap between a DBS and a PLS, it is increasingly important to distinguish the concepts of abstract data types, classes (types), and objects. This is particularly true for SDEs, where an integral part of the information (data) being stored in the DBS is in the form of ADTs and classes which may or may not take the same form and have the same theory as in the DBS.

One of the primary observations of ADTs is that the values and operations associated with the type of interest are linked together in a fundamental way. This occurs via the signature of the ADT. A signature contains a specification of the behaviour of an ADT, with an emphasis on its operations. For example, the signature for a stack ADT specifies the operations (pop, push, top, init, etc.), where each operation is described by its input parameters and output result. Signatures are especially useful when a programming language supports generics. A generic is essentially a type-parameterizable ADT. For instance, a generic stack would contain all the operations listed previously, but the type of stack elements (integer, real, etc.) would be defined upon the creation of a stack instance (stack(integer) or stack(real)). The signature of such a generic stack would involve operations and their parameters/return types, with the type of stack indicated as an additional parameter. Generics have many benefits in software engineering, including improved reuse/sharing, reduced development time, and localization of modifications [45]. Finally, to transition from design concepts (ADTs, signatures, etc.) to implementation considerations, a class can be defined. The idea behind a class is to implement all the operations given in the signature as methods with appropriate parameters and return types. The chosen representation is encapsulated in an effort to ensure that it is never misinterpreted or abused. When a class is instantiated into an object, some memory is always allocated. The object's memory holds the representation of the object's abstract value. Methods may allocate more memory later as the object is modified, while issues involving object deletion are discussed elsewhere [18, 29].
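In present-day C++, the generic stack described above can be written as a class template. The following is a minimal sketch, assuming a std::vector representation and exceptions for underflow (both choices are illustrative); Stack<int> and Stack<double> play the roles of stack(integer) and stack(real), and the element type is exactly the "additional parameter" of the signature.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// A generic (type-parameterized) stack ADT. The public operations form its
// signature; the element type T is supplied when an instance is created.
template <typename T>
class Stack {
public:
    void init() { items_.clear(); }                 // reset to empty
    void push(const T& x) { items_.push_back(x); }
    T pop() {
        if (items_.empty()) throw std::underflow_error("pop on empty stack");
        T x = items_.back();
        items_.pop_back();
        return x;
    }
    const T& top() const {
        if (items_.empty()) throw std::underflow_error("top on empty stack");
        return items_.back();
    }
    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;   // encapsulated representation; may change freely
};
```

Because clients see only the operations, the vector could later be replaced by a linked list without affecting any code that uses Stack<T>, which is the localization of modifications mentioned above.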

C++ VS. OPAL: A PRACTICAL COMPARISON

The previous section focused on examining the basic commonalities and differences that exist between object-oriented programming and database concepts from a very general perspective. In this section, a more detailed analysis of their relationship is provided using the C++ programming language and the Opal data language. To provide a framework for this discussion, the next section presents a list of 11 object-oriented features which will serve as the basis for identifying language commonalities and differences. Then an example from C++ [10] is described to illustrate the capabilities of an object-oriented programming language and to equate programming-language features with their database analogs, that is, the integration approach that incorporates persistency into an object-oriented programming language. Next, the capabilities of classes in the Opal [38] language of the Gemstone [11] database system are detailed, including a comparison with programming language features: the integration approach that augments a data language/DBS with programming language features. Finally, the commonalities and differences between the class concepts and capabilities of C++ and Opal in the context of the list of language features are examined.

At this point, it is important to note that the work leading to this paper differs from that of other researchers [4, 8, 46, 47] in both approach and content. In Kim [46], the focus is on future research directions for object-oriented database research. The work of this paper's authors involves a more detailed aspect of this effort by providing an initial understanding of the current status of object-oriented programming and databases. This comparison is also valid with respect to Zdonik and Maier [47], which introduces terminology and outlines the fundamental features of object-oriented databases, and Wegner [8], which is similar to Zdonik and Maier [47] but with a decidedly programming language slant to the discussion. Finally, King [4] has compared object-oriented and semantic data models, while the present authors' work seeks to cross the boundaries between the programming and database areas.

A list of language features

The basic difference between a PLS and a DBS is the notion of persistence. A PLS is usually directed at an application: the focus is on a single address space, and on the manipulation of data within that address space while the application is executing. On the other hand, a DBS focuses on the storage and retrieval of data; the calculations to be performed on that data are largely left up to the application. To illustrate these differences, C++ classes and Opal classes are examined according to the following list of criteria or features, posed as questions:

Typing: Does the language require static or dynamic typing, and why was that particular choice made?
Universal class: Is there a class at the top of the hierarchy, and what are the implications of its existence?
Representation: How does the language facilitate the user's ability to represent the data needed to implement an ADT?
Encapsulation: How is the representation (partially) hidden so that the software engineer can assume that the state of an object is not corrupted?
Inheritance: How is inheritance (the isa hierarchy) handled, and what benefits are derived (no pun intended) from the technique?
Partition: How can data be partitioned for large software projects?
Security: How are the concepts of a user and security handled?
Concurrency: How is data shared and concurrent access facilitated?
Constraints: Can data be constrained to known, meaningful values?
Mutability: How is (non)mutability of objects managed?
Class reuse: How are classes and their associated methods stored for promoting accessibility and reuse?
Physical database: How is data stored (accessed), and what is the impact on recovery, versioning, etc.?

C++ and Opal have been chosen for this presentation as they are commercially available systems which are quite different, i.e., the former has evolved from C while the latter has evolved primarily from Smalltalk [9]. Through this choice, it is hoped that the discussion is approachable and understandable to people with a background in either language. C++ and Opal are simply being used to provide a framework for the discussion, as it is believed that they are representative of the programming and database approaches to the object-oriented paradigm, respectively. Also note that other researchers have identified many of these criteria when considering the fundamental components of object-oriented data models and database systems [8, 46, 47].

C++: an object-oriented programming language

C++ is a statically typed language with a moderately rich set of representation primitives. Figure 2 gives the first part of an example to illustrate the capabilities of C++. Notice that a C++ class such as Table is divided (for encapsulation) into sections, each of which may contain members and methods. Entries in the private section are visible only to methods associated with this particular class, entries in the protected section are visible only to methods in this class and derived classes, and entries in the public section are visible everywhere after the point of declaration. The constructor, Table(Symbol s), and the destructor, ~Table(), are protected, signifying that an instance of this class can only be allocated by derived classes. None of the public methods are implemented, as indicated by the pure specifier = 0. In addition, the reserved word virtual indicates that the associated method is called using a late binding protocol. This allows methods to be redefined in the derived class and requires that a runtime decision be made as to which method is the correct one to call, as this may not be apparent from a purely syntactic examination of the code. Consequently, a method call overhead cost in time is incurred. Further, the reserved word const after the retrieve and next methods indicates that these methods do not alter any of the members of the class Table, i.e., they are read methods used mostly for protection, with obvious implications for sharing a Table object among several users. The Key class is specified to have just the equality operator defined. As presented, this class would support Table objects with linear search characteristics.

```cpp
class Symbol;   // Forward declarations, inadequate for Table, but
class Key;      // realistic when parametric polymorphism is not available.

class Table {
protected:
    Symbol def;           // protected (not private) so derived classes can return it
    Table(Symbol s);      // Called to create a Table.
    ~Table();             // Called to deallocate a Table.
public:
    virtual void update(Key, Symbol) = 0;
    virtual Symbol retrieve(Key) const = 0;
    virtual Key next(Key) const = 0;
};

class Symbol { ... };
Table::Table(Symbol s) : def(s) {}

class Key {
public:
    friend bool operator==(Key, Key);
};
```

Figure 2. The Table class

Using this class as a base, other important concepts in C++, such as friends and inheritance, can be examined. Shown below is a class for representing nodes in a binary tree:

```cpp
class TreeNode {
    friend class TreeTable;
private:
    TreeNode *l, *r;
    Key k;
    Symbol s;
};
```
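The two claims about Figure 2, that virtual methods are bound at run time and that a Key offering only equality leads to linear search, can be seen in a self-contained sketch. It simplifies the figure under stated assumptions: Symbol is taken to be std::string, Key wraps an int, next is omitted, and LinearTable and lookup are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Symbol = std::string;   // assumption: symbols are strings

class Key {                   // assumption: a key wraps an int
public:
    explicit Key(int v) : v_(v) {}
    friend bool operator==(Key a, Key b) { return a.v_ == b.v_; }  // equality only
private:
    int v_;
};

class Table {
public:
    virtual void update(Key, Symbol) = 0;
    virtual Symbol retrieve(Key) const = 0;   // const: a read method
protected:
    explicit Table(Symbol d) : def(std::move(d)) {}
    ~Table() = default;   // protected: no deletion through a Table*
    Symbol def;           // default returned when a key is absent
};

class LinearTable : public Table {
public:
    explicit LinearTable(Symbol d) : Table(std::move(d)) {}
    void update(Key k, Symbol s) override {
        for (auto& e : entries_)
            if (e.first == k) { e.second = std::move(s); return; }
        entries_.emplace_back(k, std::move(s));
    }
    Symbol retrieve(Key k) const override {
        for (const auto& e : entries_)        // O(n): equality is all we have
            if (e.first == k) return e.second;
        return def;
    }
private:
    std::vector<std::pair<Key, Symbol>> entries_;
};

// The caller sees only a Table; which retrieve runs is decided at run time.
Symbol lookup(const Table& t, Key k) { return t.retrieve(k); }
```

The call inside lookup goes through the virtual table, which is the per-call overhead the text mentions, and it would dispatch equally well to any other derived table.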

The friend specification allows the methods of the class TreeTable to access and modify the members of this structure. However, as illustrated below, there is no need for the rest of the program to be aware that the class TreeNode exists. To demonstrate inheritance, the concept of a key is refined:

```cpp
class TreeKey : public Key {
public:
    friend bool operator==(TreeKey, TreeKey);
    friend bool operator<(TreeKey, TreeKey);   // Must be a total ordering
};
```
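Once keys carry a total ordering, a table can be organized as a binary search tree, where each comparison with operator< discards one subtree instead of scanning every entry. The sketch below is a simplified, unbalanced illustration (assumptions: int-valued keys, std::string symbols, and the hypothetical names OrderedTable and Node). It hides the node structure as a private nested class owned through std::unique_ptr, which is one alternative to the separate friend class TreeNode.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct TreeKey {                       // assumption: int-valued keys
    int v;
    friend bool operator==(TreeKey a, TreeKey b) { return a.v == b.v; }
    friend bool operator<(TreeKey a, TreeKey b) { return a.v < b.v; }  // total order
};

class OrderedTable {
public:
    explicit OrderedTable(std::string def) : def_(std::move(def)) {}

    // Insert or overwrite, descending left or right by operator<.
    void update(TreeKey k, std::string s) {
        std::unique_ptr<Node>* p = &root_;
        while (*p) {
            if (k == (*p)->k) { (*p)->s = std::move(s); return; }
            p = (k < (*p)->k) ? &(*p)->l : &(*p)->r;
        }
        *p = std::make_unique<Node>(Node{k, std::move(s), nullptr, nullptr});
        ++count_;
    }
    std::string retrieve(TreeKey k) const {
        const Node* t = root_.get();
        while (t) {
            if (k == t->k) return t->s;
            t = (k < t->k) ? t->l.get() : t->r.get();
        }
        return def_;                   // default value on a miss
    }
    int count() const { return count_; }
private:
    struct Node {                      // invisible outside OrderedTable
        TreeKey k;
        std::string s;
        std::unique_ptr<Node> l, r;
    };
    std::unique_ptr<Node> root_;
    int count_ = 0;
    std::string def_;
};
```

The comment's requirement still cannot be checked by the compiler: if operator< were not a total ordering, this code would compile and silently misplace entries.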

The enforcement of the comment, although desirable, is not possible in any current imperative programming language without making unreasonable restrictions. TreeKey is a public, derived class of Key, implying that it inherits all of the characteristics of Key. Now, with all the appropriate pieces in place, the actual class for TreeTable can be defined, as shown in Figure 3. The only actual data stored in this class is the root of the tree, the count of nodes for this tree, and the default value inherited from Table. The static member total_count is allocated only once for the entire application, and is shared among the class and all of its instances. The tree manipulation methods are protected, so that new public methods can be added in a derived class, for different kinds of traversals. It is worth noting that one could take TreeTable class and refine it to be an AVL tree if the members of TreeNode were changed from private to protected and also declared to be friends of the new class. This example is intended to illustrate the utility of nested classes. The remaining issue to discuss in this section concerns the ability to move the object-oriented paradigm as supported in C-t- + into a database domain. Technicalities have been addressed elsewhere, in Ontos ~6, which adds persistency to C + + , and ODE ~5, which extends C + + with constructs for constraints, triggers, security, Vol 35 No 2 February 1993

class TrceTable : public Table { public: virtual void update(TreeKey, Symbol); virtual Symbol retrieve(TreeKey) const; virtual TreeKey next(TreeKey) const; TreeTable(Symbol d) ~TreeTable( ); protected: TreeNode *search(TreeKey); void insert(TreeKey, Symbol); void delete(TreeNode *); TreeNode *next(TreeNode *); TreeNode *prev(TreeNode *); private: TreeNode *root; int count; static int total_count;

};

TreeTable::TreeTable(Symbol d) : Table (d) {root --- nil;} Symbol TreeTable::retrieve(TreeKey k)

{

TrceNode *t = search(k); if (t ! = nil) return t-+s; return def; //Allowed because protected not private

Figure 3. The TreeTable class

etc. This discussion is more conceptual in nature: it attempts to equate the different components of C++ classes to database concepts, as if a C++ class could be made magically persistent, and to indicate where problems could occur. A class in C++ would translate to a class in an object-oriented database system (OODBS). But whereas C++ supports data encapsulation/information hiding via the public/private/protected protocol, in a DBS information would be assumed private by default when a class was declared. The DBS would then use security, via views and other mechanisms, to make portions of the information public. In addition, in a DBS, different users might have different views of the same class, which would not be easily supported in C++, as there is no way for a class to present different public data to two different users. The friend concept in C++ is similar to security, as it allows the designer of a C++ class to make the private information available to other classes. However, this is a limited and dangerous capability, as it unequivocally breaks the class barrier. The transition from methods in a C++ class to transactions in a DBS is not immediately evident. In addition, C++ does not have any explicit constructs for specifying integrity constraints on the data, although individual C++ methods may be written to enforce constraints. Clearly there is a substantial gap between the two concepts. As mentioned previously, the authors' efforts to bridge that gap, and to provide an integrated OOPLS/OODBS with a unified, consistent characterization of information hiding in a PLS and security in a DBS, are ongoing³⁹⁻⁴².
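To make the last point concrete, the following C++ sketch shows a constraint enforced procedurally, inside the methods of a class, rather than declared to a DBS. The Employee class, its members and the salary limit are hypothetical illustrations, not taken from the paper.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class for illustration only. C++ has no declarative
// constraint construct, so the rule 0 <= salary <= maxSalary must be
// checked procedurally by every method that can change the salary.
class Employee {
public:
    static constexpr double maxSalary = 250000.0;  // illustrative limit

    Employee(std::string name, double salary) : name_(std::move(name)) {
        setSalary(salary);  // the constructor routes through the same check
    }

    // Rejects an out-of-range value and leaves the object unchanged.
    void setSalary(double s) {
        if (s < 0.0 || s > maxSalary)
            throw std::invalid_argument("salary violates constraint");
        salary_ = s;
    }

    double salary() const { return salary_; }
    const std::string &name() const { return name_; }

private:
    std::string name_;
    double salary_ = 0.0;
};
```

Because the rule is re-checked by hand in each mutator, a method added later that writes salary_ directly would silently bypass it; a declarative DBS constraint is attached to the data instead and cannot be skipped in this way.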

Object subclass: #Department
    instVarNames: #(#Name #Manager #TotalEmps)
    classVars: #(#AllEmps)
    poolDictionaries: #()
    inDictionary: UserGlobals
    constraints: #[ #[#Name, String] ]
    isInvariant: false.

Figure 4. The structure of an Opal class

Opal: an object-oriented data language

Opal³⁸ is an object-oriented data language supported as part of the Gemstone object-oriented database system¹¹. To demonstrate the class definition capability of Opal, a sample Department class from a typical database application is given in Figure 4. The first line of the declaration defines Department to be a subclass of the system-defined Object class, through which Department acquires persistence. Instance variables are the attributes of a class, in this case the department name, its manager, and its total number of employees. Class variables are shared by the class and all of its instances, which is appropriate for the total number of employees in all departments, i.e. AllEmps. Pool dictionaries allow classes to be shared with other users and schemas. Since no pool dictionaries are given here, all information associated with the class is private unless it is explicitly released for sharing. The field inDictionary refers to where the information on the class is stored, with the default being UserGlobals. Constraints have two uses: indexing the information of a class and all of its instances, and, unintentionally, type checking. Opal is dynamically typed, so there is no need to supply types for instance variables. isInvariant specifies whether an instance can be changed after creation. Finally, methods are also definable in Opal classes, but, unlike in C++, they are not explicitly part of the class declaration and must instead be associated with a given Opal class. By comparing the class concept in Opal and C++, it is possible to evaluate the degree to which Opal aligns with object-oriented programming. This is thus a complementary discussion to that of the previous section, which considered C++ in the context of database system concepts. There are immediate analogies between instance variables in Opal and members in C++, and between class variables and static members.
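For comparison with Figure 4, a minimal sketch of how the Department class might be written in C++ under these analogies. The member names, the string-typed manager and the hire() method are assumptions made for illustration; Opal features such as inDictionary, poolDictionaries, isInvariant and the persistence inherited from Object have no counterpart in this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical C++ counterpart of the Opal Department class in Figure 4.
// Opal instance variables become typed data members (C++ is statically
// typed); the Opal class variable AllEmps becomes a static member.
class Department {
public:
    Department(std::string name, std::string manager)
        : name_(std::move(name)), manager_(std::move(manager)) {}

    // Hiring updates both this department's total and the shared total.
    void hire(int n) {
        totalEmps_ += n;
        allEmps_ += n;
    }

    const std::string &name() const { return name_; }
    const std::string &manager() const { return manager_; }
    int totalEmps() const { return totalEmps_; }
    static int allEmps() { return allEmps_; }

private:
    std::string name_;     // Opal instance variable Name
    std::string manager_;  // Opal instance variable Manager (a string here)
    int totalEmps_ = 0;    // Opal instance variable TotalEmps
    static int allEmps_;   // Opal class variable AllEmps: one shared copy
};

// The single definition of the static member for the whole program.
int Department::allEmps_ = 0;
```

Every Department object reads and updates the same allEmps_ counter, just as every instance shares the Opal class variable AllEmps and every TreeTable shares total_count in Figure 3.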
The inheritance concept is similar, but C++ offers two kinds of derived classes, while Opal has just basic inheritance. At a conceptual level, pool dictionaries are similar to friends in C++. Both Opal and C++ instances can access shared data, in the form of pool variables stored in a pool dictionary and as global data, respectively. Dictionaries are a built-in data type quite similar to the C++ Table, in that they store associations, which are (key, value) pairs. Refinements of the basic dictionary may contain one or more indices to facilitate access. Several pool dictionaries are possible, and although this can be useful it sometimes leads to confusion. At another level, a compilation unit in C++ and static nesting might be equated to a pool dictionary. For example, in C++, storing the header file in an include directory and putting the method code into a library is similar to storing the class in an inDictionary and making it available via a pool dictionary. Unlike C++, where the types of members must be provided, in Opal the constraints, which are primarily intended for indexing, do not need to be given when specifying the class. This points to a major difference between C++ and Opal: C++ is statically typed, while Opal is dynamically typed. Finally, there does not appear to be any directly equivalent construct to isInvariant in C++. In conclusion, while there are many commonalities between the two class concepts, there are also some critical differences that are not immediately reconcilable.
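The (key, value) associations held by an Opal dictionary, and the default value returned by the C++ Table, can be sketched together in C++ as below. IndexedTable is a hypothetical class, not the paper's Table or an Opal API; its secondary value-to-keys map merely stands in for the indices that refined dictionaries may carry.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical associative table of (key, value) associations.
// retrieve() returns a default value for absent keys, mirroring the
// inherited default of the C++ Table; index_ (value -> keys) plays the
// role of an index on a refined dictionary.
class IndexedTable {
public:
    explicit IndexedTable(std::string def) : def_(std::move(def)) {}

    // Insert or replace the association for key, keeping the index in step.
    void update(const std::string &key, const std::string &value) {
        auto it = assoc_.find(key);
        if (it != assoc_.end())
            index_[it->second].erase(key);  // remove the stale index entry
        assoc_[key] = value;
        index_[value].insert(key);
    }

    // Lookup by key; falls back to the default value.
    std::string retrieve(const std::string &key) const {
        auto it = assoc_.find(key);
        return it == assoc_.end() ? def_ : it->second;
    }

    // Lookup through the index: every key currently associated with value.
    std::set<std::string> keysFor(const std::string &value) const {
        auto it = index_.find(value);
        if (it == index_.end()) return {};
        return it->second;
    }

private:
    std::string def_;
    std::map<std::string, std::string> assoc_;
    std::map<std::string, std::set<std::string>> index_;
};
```

update() keeps the index consistent by removing the stale entry before recording the new value; this bookkeeping is what an index costs on every write, in exchange for faster lookup by value.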

Language features revisited

In this section, the implications of the language features are discussed, using the list presented earlier. Some of these features have been examined in the previous two sections, and are therefore now only briefly highlighted.

Typing and the universal class

As indicated above, Opal is dynamically typed, with a universal class Object. This has the effect of guaranteeing the availability of certain methods on every object (for example, a method returning the class the object belongs to, a method for converting the object to a string, etc.), and, from a database perspective, of associating a unique object identifier with every class instance. The latter is used for achieving persistence and is not available for use by the programmer. When a message (essentially a method call) is sent to an object, the object's class is searched first for the method, and then each of its superclasses in turn, until the method is found or an error is generated. C++, on the other hand, due to its static typing, guarantees that the method will be found, and further that it will be found in constant time. Thus, there is no need for a universal class in C++, and also no need for a reference to the associated class to be included in every object. However, the lack of a universal class in C++ inhibits an easy transition to persistency. The tradeoff is essentially between the flexibility necessary for an interactive system and the uniformity of all classes required from a database perspective, on the one hand, and the performance and reliability that static type checking can provide, on the other.
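The search described above can be modelled in C++ to make the contrast explicit. In the sketch below, ClassObj and send() are hypothetical names: each class object keeps its own method table and a superclass pointer, and the root class stands in for Object, so a method defined there (here, class) answers for every subclass.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of Opal-style message sending. A method receives the
// class of the receiver so that inherited methods can report on it.
struct ClassObj {
    std::string name;
    const ClassObj *superclass;  // nullptr only for the root ("Object")
    std::map<std::string, std::function<std::string(const ClassObj &)>> methods;
};

// Dynamic lookup: search the receiver's class first, then each superclass
// in turn. The cost grows with the depth of the hierarchy, and a missing
// method is detected only at run time, here by throwing an exception.
std::string send(const ClassObj &receiverClass, const std::string &selector) {
    for (const ClassObj *c = &receiverClass; c != nullptr; c = c->superclass) {
        auto it = c->methods.find(selector);
        if (it != c->methods.end())
            return it->second(receiverClass);
    }
    throw std::runtime_error("message not understood: " + selector);
}
```

In C++, by contrast, a call such as d.describe() is checked by the compiler, so a missing method is a compile-time error, and a virtual call is typically dispatched through a fixed per-class table rather than a walk up the hierarchy.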

Representation

In practice, Opal has proved to be less useful than C++ in representing complex data structures. This is due primarily to the availability of pointers, arrays of pointers, etc., in C++. Although Opal does support the concept of object identity, explicit references are not directly supported. For this reason, it is difficult to represent complex data structures (e.g., call graphs) without resorting to indexed classes and using integers as pointers. Opal's indexed classes are fairly flexible, but the performance degradation and the unnatural, hard-to-use data structures that result from using indexed classes make Opal inadequate for SDEs. Finally, support for true generics in both C++ and Opal would bring the signature concept (as discussed above) into play, allowing the design and development of class libraries that emphasize the sharing of code. If such a capability existed, software engineers would be able to specify a single, generic class that has a high degree of representational abstraction, thereby promoting reuse as a fundamental aspect of design and development.

Information and Software Technology

Encapsulation and inheritance

The concepts of encapsulation and inheritance for both C++ and Opal have already been discussed in depth. These concepts are also tightly coupled to security, concurrency, and constraint considerations, which are discussed below. Security is related to inheritance because inheritance determines the information that is available to the subtype from the supertype, which is an aspect of security. Multiple inheritance affects the sharing of information among types, and ultimately among users, thereby affecting the concurrency of objects.

Partition

Opal supports both the concept of a user with access privileges (security) and collections of related objects. This works fairly well to support the hierarchy at the macroscopic level, but not the stronger need to encapsulate the microscopic details. For example, a parse tree should consist of a collection of nodes, all of which are reachable from a distinguished root by following a chain of references. None of the references should point to a node in another tree.

Security, concurrency and constraints

Security is another aspect of encapsulation and inheritance, as it determines which information should be available. However, security differs from encapsulation in that it is sometimes dynamic. In a DBS, security can occur at both type and instance levels. At the type level, it is similar to encapsulation and inheritance, but it is not bound to single users; that is, different users can have different type-level security on the same schema.
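C++ has no notion of users, but one common way to approximate different type-level views of the same class is to hand each role a different abstract interface. The sketch below is an illustration under that assumption, not the authors' user-role security model³⁹⁻⁴²; DirectoryView, PayrollView, Employee and the role functions are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical per-role "views" of one class, emulated with abstract
// interfaces. Each role is handed only the interface it is entitled to.
class DirectoryView {                        // e.g. for general staff
public:
    virtual ~DirectoryView() = default;
    virtual std::string name() const = 0;
};

class PayrollView : public DirectoryView {  // e.g. for a payroll clerk
public:
    virtual double salary() const = 0;
    virtual void setSalary(double) = 0;
};

class Employee final : public PayrollView {
public:
    Employee(std::string n, double s) : name_(std::move(n)), salary_(s) {}
    std::string name() const override { return name_; }
    double salary() const override { return salary_; }
    void setSalary(double s) override { salary_ = s; }

private:
    std::string name_;
    double salary_;
};

// Role-specific accessors: the static type returned limits what each role
// can call. This is a compile-time restriction only; a cast can defeat it.
const DirectoryView &directoryRole(const Employee &e) { return e; }
PayrollView &payrollRole(Employee &e) { return e; }
```

The restriction is fixed at compile time and can be defeated by a cast, whereas a DBS can assign and change each user's view at run time and enforce it on every access.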
At the instance level, security is more powerful and, as such, often requires special pre- and postprocessing in a DBS to manage its usage. A number of other researchers have proposed security measures for object-oriented database systems⁴⁸⁻⁵¹. Most of these efforts, however, adopt existing security measures that do not take advantage of object-oriented features. The authors of this paper have been conducting research on potential type-level database security mechanisms that are consistent with object-oriented concepts, precepts, and principles³⁹⁻⁴². Further, this work has focused on extending and integrating information hiding with data security, to provide a consistent and uniform perspective of security for the object-oriented approach. Concurrency is related to security, especially when two different users have the same type-level or instance-level views of the data. However, it differs from security in that it also involves the operational requirement that data are consistently presented when two users try simultaneously to access information. Constraints are related to security, in that they indicate the valid values of information, and to concurrency, as they indicate the allowable, consistent values. Neither C++ nor Opal supports the wide range of capabilities described above, as discussed in earlier sections.

Mutability

Mutability is defined as the ability to specify an operation on a class that, when applied to an instance, causes the instance to change its type dynamically at runtime⁴⁷. The question of mutable versus immutable classes is handled differently in C++ and Opal. In Opal, immutable classes are defined, and immutable objects are created from instances of the associated mutable class. For obvious reasons, the immutable class cannot have instance variables or methods that are not part of the mutable class. In C++, it is possible to have constant values of a class (C++ compilers are supposed to support this, but sometimes break or produce unpredictable results). The C++ approach is preferred, since it views immutable objects as simply elements of the set of values associated with the class. Mutable objects are locations which can only hold those values. But the set of values and the meanings of the methods are independent of the mutability of an object. Alternatively, one can view mutability as a constraint. It is worth noting that neither approach integrates this concept well with concurrency control.

Class reuse

In database systems, a data dictionary is often employed to store information on the schema (e.g., for each schema, its relations; for each relation, its attributes and their types, the number of tuples, etc.), and this information is controlled by the DBS, so that consistency is assured. In C++, this is similar to class/method libraries. The critical difference is that these libraries do not have any means to control access.
In Opal, the dictionaries play a similar role to data dictionaries. Ideally, all class declarations and method implementations should be stored in the DBS, under its controlled environment. From a non-storage perspective, generics in C++ and Opal would promote and facilitate class reuse, allowing a common signature to be shared among many implementations and accruing the associated benefits, as was discussed earlier.
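With C++ class templates, a single generic signature shared among several implementations can be sketched as follows. AssocTable, ListTable, MapTable and countPresent are hypothetical names; the two implementations differ only in representation (a linear list versus a balanced tree), while client code is written once against the shared signature.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic signature: one abstract interface, parameterized by
// key and value types, shared by any number of implementations.
template <typename K, typename V>
class AssocTable {
public:
    virtual ~AssocTable() = default;
    virtual void update(const K &key, const V &value) = 0;
    virtual bool retrieve(const K &key, V &out) const = 0;  // false if absent
};

// Implementation 1: unsorted list of associations, linear search.
template <typename K, typename V>
class ListTable : public AssocTable<K, V> {
public:
    void update(const K &key, const V &value) override {
        for (auto &kv : items_)
            if (kv.first == key) { kv.second = value; return; }
        items_.emplace_back(key, value);
    }
    bool retrieve(const K &key, V &out) const override {
        for (const auto &kv : items_)
            if (kv.first == key) { out = kv.second; return true; }
        return false;
    }
private:
    std::vector<std::pair<K, V>> items_;
};

// Implementation 2: balanced tree (std::map), logarithmic search.
template <typename K, typename V>
class MapTable : public AssocTable<K, V> {
public:
    void update(const K &key, const V &value) override { items_[key] = value; }
    bool retrieve(const K &key, V &out) const override {
        auto it = items_.find(key);
        if (it == items_.end()) return false;
        out = it->second;
        return true;
    }
private:
    std::map<K, V> items_;
};

// Client code written once against the signature works with either one.
template <typename K, typename V>
int countPresent(const AssocTable<K, V> &t, const std::vector<K> &keys) {
    int n = 0;
    V tmp{};
    for (const auto &k : keys)
        if (t.retrieve(k, tmp)) ++n;
    return n;
}
```

Swapping ListTable for MapTable changes the cost of lookup without touching countPresent, which is the kind of reuse the text attributes to generics.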

Physical database issues

Clearly, physical database issues can affect the understanding of the commonalities and differences between object-oriented programming and databases. The ways that data are stored to (accessed from) secondary storage, support for versions and recovery, etc., are all important database issues. The authors hold, however, that these issues are the responsibility of the DBS, as was indicated in Figure 1(b) and the associated section: no matter how much the PLS and DBS overlap, there will still be functions that remain uniquely present in each system. Physical database issues fall into this category. Other issues related to persistent programming are discussed briefly in the next section and in more detail elsewhere¹⁸.

CONCLUSIONS AND ASSOCIATED RESEARCH

This paper has examined the tradeoffs between object-oriented programming and database concepts, in order to identify their commonalities, distinguish their differences, and, consequently, arrive at an understanding of their functions and capabilities when moving towards their integration. First, the conceptual commonalities and differences between a PLS and a DBS were considered at both the system and model levels. This was followed by an explanation of C++ (and Opal) that detailed the identifiable database (programming) analogs. As part of this discussion, 11 criteria were presented that were then used in a more complete comparison and contrast of C++ and Opal. The authors' interest in reconciling these concepts is traced to their work on software development environments (SDEs), both in the prototyping effort on Ozone¹⁹,²⁰ and in the proposal of a persistent programming language system (PPLS) to satisfy future application development and support needs¹⁸. In the former, they have specifically used their understanding of commonalities and differences to facilitate the evolution of the Ozone class library to support persistency via Ontos³⁶,³⁷. In conclusion, it is believed that the work presented in this paper can function as an important and useful framework for researchers and practitioners who are working in similar and related areas, and can also serve as a primer for individuals who are seeking to come to terms with the state of the object-oriented paradigm as supported in the programming and database areas. As to related research, the effort on PPLS is of particular note, as it proposes the need for, and the software architecture of, a PPLS to support advanced application development. A PPLS is implemented as a set of 'trusted' components including a compiler, a data dictionary, an incremental dynamic linker, a network interface and a database system.
The 'trust' displayed by these components is their conservation of the static and dynamic semantics of the system. In particular, the same type-checking algorithm is used by both the database system and the compiler. The compiler supports a persistent language and is capable of manipulating both classes and code for storage in a database. The data dictionary combines its traditional interpretation in a database system with abilities more aligned with symbol table information, and is intended to support dynamic modifications of classes and class hierarchies via schema evolution. The incremental dynamic linker will support the combination of object modules into larger object modules, and must interface with a database browser for debugging and other associated activities. The network interface defines the protocols that facilitate the communication of information between the application (needing to use objects) and the database server (responsible for storing/accessing objects). Finally, the database system must function as a central repository of objects and supply mechanisms for transaction management, concurrency control, and recovery. The interested reader is referred to Demurjian¹⁸ for a more in-depth discussion of the PPLS and other issues relating to object-oriented database support for SDEs. Other related research underway includes the design and prototyping of ADAM, short for Active DAta Model³³. ADAM is an object-oriented database design tool that generates C++ applications (both class header and method implementation files) from a combination of graphically and textually supplied input. ADAM extends the object-oriented paradigm with propagation semantics (equivalent to design-level triggers) and the ability to define relationships between types, in order to model inter-class behaviour other than inheritance. Using a multiphase, graphical tool, designers can specify object types, inheritance among types, relationships among types (i.e., 1-1 and 1-m), and propagation among types. ADAM has been implemented on both Sun 3 and Sparc platforms running Unix, using C++, the X Window System, and InterViews. The code generation of ADAM has also recently been improved by the present authors, so that compilable Ontos C++ code is generated for an application⁵²,⁵³. Current and ongoing work on ADAM involves modelling enhancements⁵⁴ and its extension to support the definition and analysis of user-role based security for an object-oriented design model³⁹⁻⁴². The authors' interest in ADAM also involves collateral research efforts in modelling information for distributed dynamic decision-making (DDD) environments⁵⁵. Additionally, ADAM has been customized to create ADAM/DDD, an application-specific modelling and design tool for DDD³¹,³⁵.
ADAM/DDD is used to design and generate scenarios for the existing DDD environment at The University of Connecticut, and has implemented propagation to ensure that the scenario automatically remains consistent whenever the designer makes modifications.
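Propagation, as described above, behaves like a design-level trigger: a change to one object is pushed to related objects so that the model stays consistent. The C++ sketch below approximates this with an observer-style list of callbacks. Unit, onMove and moveTo are hypothetical names, and this is not the code that ADAM actually generates.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of propagation as a design-level trigger: rules
// registered on an object run whenever it changes, updating related
// objects so the scenario stays consistent without manual edits.
class Unit {
public:
    explicit Unit(std::string name) : name_(std::move(name)) {}

    // Register a propagation rule to run after each change of location.
    void onMove(std::function<void(const Unit &)> rule) {
        rules_.push_back(std::move(rule));
    }

    // Change this unit, then propagate the change through every rule.
    void moveTo(int location) {
        location_ = location;
        for (auto &rule : rules_) rule(*this);
    }

    int location() const { return location_; }
    const std::string &name() const { return name_; }

private:
    std::string name_;
    int location_ = 0;
    std::vector<std::function<void(const Unit &)>> rules_;
};
```

As written, rules that trigger one another in a cycle would recurse without end, and a rule must not register new rules on the unit it is attached to while propagation is running; a design tool would need to detect or forbid both.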

REFERENCES

1 Liskov, B and Zilles, S 'Specification techniques for data abstraction' IEEE Trans. Soft. Eng. Vol SE-1 (March 1975)
2 Liskov, B et al. 'Abstraction mechanisms in CLU' Comm. ACM Vol 20 No 8 (August 1977) Also appearing in 58
3 Shaw, M 'The impact of abstraction concerns on modern programming language' Proc. IEEE Vol 68 No 9 (September 1980)
4 King, R 'My cat is object-oriented' in 57
5 Korson, T and McGregor, J 'Understanding object-oriented: a unifying paradigm' Comm. ACM Vol 33 No 9 (September 1990)
6 Meyer, B Object-oriented software construction Prentice-Hall (1988)
7 Rumbaugh, J et al. Object-oriented modeling and design Prentice-Hall (1990)
8 Wegner, P 'Concepts and paradigms of object-oriented programming' OOPS Messenger Vol 1 No 1 (August 1990)


9 Goldberg, A 'The influence of an object-oriented language on the programming environment' Proc. 1983 ACM Comp. Science Conf. (February 1983) Also in 56
10 Stroustrup, B The C++ programming language Addison-Wesley (1986)
11 Maier, D et al. 'Development of an object-oriented DBMS' Proc. 1986 OOPSLA Conf. (September 1986)
12 Kim, K et al. 'Features of an object-oriented database system' in 57
13 Hudson, S and King, R 'The Cactis project: database support for software environments' IEEE Trans. Soft. Eng. Vol 14 No 6 (June 1988)
14 Weiser, S and Lochovsky, F 'OZ+: an object-oriented database system' in 57
15 Agrawal, R and Gehani, N 'ODE (Object database and environment): the language and the data model' Proc. 1989 ACM SIGMOD Manage. Data (June 1989)
16 'ONTOS object database documentation' Release 2.1, Ontologic, Burlington, MA (June 1991)
17 Demurjian, S, Ammar, R, Beshers, G and Ting, T 'The SEEDS project at the University of Connecticut' Proc. 2nd Int. Workshop Comp. Aided Soft. Eng. (July 1988)
18 Demurjian, S, Beshers, G and Nichols, G 'Object-oriented database support for software development environments' Technical Report CSE-TR-90-10 Department Computer Science and Engineering, University of Connecticut (May 1990); accepted for publication in Prater, J (ed) Progress in object-oriented databases Ablex Publishing
19 Nichols, G and Demurjian, S 'Object-oriented database design for the Ozone software-development environment' Proc. 1990 BNCOD-8 Conf. (July 1990) Putnam Press
20 Nichols, G, Demurjian, S and Ting, T C 'Supporting the needs and requirements of data modeling via the capabilities of the object-oriented paradigm' Technical Report CSE-TR-91-05 Department Computer Science and Engineering, University of Connecticut (March 1991)
21 Brown, A Database support for software engineering Halsted Press, Wiley (1989)
22 Clemm, G 'The workshop system' Proc. ACM SIGSOFT/SIGPLAN Soft. Eng. Symp. on Practical Soft. Devel. Environments (November 1988)

23 Reiss, S 'Working in the garden environment for conceptual programming' IEEE Soft. Vol 4 No 3 (November 1987)
24 Rosenblatt, W, Wileden, J and Wolf, A 'OROS: toward a type model for software development environments' Proc. 1989 OOPSLA Conf. (October 1989)
25 Taylor, R et al. 'Foundations for the arcadia environment architecture' Proc. ACM SIGSOFT/SIGPLAN Soft. Eng. Symp. Practical Soft. Development Environments (November 1988)
26 Thomas, I 'PCTE interfaces: supporting tools in software-engineering environments' IEEE Soft. Vol 6 No 6 (November 1989)
27 Zaroliagis, C et al. 'The GRASPIN DB: a syntax directed, language independent software engineering database' Proc. 1986 Int. Workshop on Object-Oriented Database Systems (September 1986)
28 Bernstein, P 'Database system support for software engineering: an extended abstract' Proc. 9th Int. Conf. Soft. Eng. (March 1987)
29 Beshers, G 'Mobile objects in C++' Proc. of C++ at Work, Wang Institute (October 1989)
30 Demurjian, S, Ellis, H and Hu, M-Y 'Software reuse and evolution in ADAM: a joint object-oriented programming language and database design-tool' Proc. 1990 Symp. on Object-Oriented Programming Emphasizing Practical Applications (September 1990)
31 Demurjian, S, Hu, M-Y, Kleinman, D and Song, A 'ADAM/DDD: an application-specific database design tool for dynamic distributed decisionmaking' Proc. IEEE Syst. Man and Cybernetics Conf. (October 1991)

32 Ellis, H, Demurjian, S, Maryanski, F, Beshers, G and Peckham, J 'Extending the behavioral capabilities of the object-oriented paradigm with an active model of propagation' Proc. 18th Ann. ACM Comp. Science Conf. (February 1990)
33 Ellis, H and Demurjian, S 'ADAM: a graphical, object-oriented database design tool and code generator' Proc. 19th Ann. ACM Comp. Science Conf. (March 1991)
34 Ellis, H, Ammar, R and Demurjian, S 'The role of propagation in database support for performance-modeling environments' Proc. 1992 Phoenix Conf. Comp. and Comm. (April 1992)
35 Hu, M-Y, Demurjian, S, Kleinman, D and Song, A 'ADAM/DDD: a scenario design tool for dynamic distributed decisionmaking' Proc. BRG 1991 Symp. on Command and Control Research (June 1991)
36 Demurjian, S and Ranganathan, S 'Approaches, issues, and experiences in upgrading a class library to support persistency' Technical Report CSE-TR-92-10 Department Computer Science and Engineering, University of Connecticut (May 1992)
37 Ranganathan, S 'Evolving a class library to support persistency: the integration of Ozone and Ontos' Masters Degree Thesis, University of Connecticut (December 1991)
38 'Programming in Opal' Version 1.5, Servio-Logic Dev. Co., Beaverton, OR (1989)
39 Hu, M-Y, Demurjian, S and Ting, T C 'User-role based security profiles for an object-oriented design model' in Landwehr, C and Thuraisingham, B (eds) Database security, VI: status and prospects North-Holland (1993)
40 Ting, T C, Demurjian, S and Hu, M-Y 'On information hiding for supporting user-role based database security in the object-oriented paradigm' Proc. 5th IFIP WG11.3 Working Conf. on Database Security, Shepherdstown, W. Virginia (November 1991)
41 Ting, T C, Demurjian, S and Hu, M-Y 'Requirements, capabilities, and functionalities of user-role based security for an object-oriented design model' in Landwehr, C and Jajodia, S (eds) Database security, V: status and prospects North-Holland (1992)
42 Ting, T C, Demurjian, S and Hu, M-Y 'A specification methodology for user-role based security in an object-oriented design model: experience with a health care application' Proc. 6th IFIP WG11.3 Working Conf. on Database Security, Vancouver, Canada (August 1992)
43 Smith, J and Smith, D 'Database abstractions: aggregation and generalization' ACM Trans. Database Syst. Vol 2 No 2 (June 1977)
44 Copeland, G and Maier, D 'Making Smalltalk a database system' Proc. 1984 ACM SIGMOD Int. Conf. Management of Data (June 1984)
45 Ghezzi, C, Jazayeri, M and Mandrioli, D Fundamentals of software engineering Prentice-Hall (1991)
46 Kim, K 'Object-oriented databases: definition and research directions' IEEE Trans. Knowl. Data Eng. Vol 2 No 3 (September 1990)
47 Zdonik, S and Maier, D 'Fundamentals of object-oriented databases' in 58
48 Keefe, T et al. 'A multilevel security model for object-oriented systems' Proc. 11th Nat. Comp. Security Conf. (October 1988)
49 Rabitti, F et al. 'A model of authorization for next generation database systems' ACM Trans. Database Systems Vol 16 No 1 (March 1991)
50 Shilling, J and Sweeney, P 'Three steps to views: extending the object-oriented paradigm' Proc. 1989 OOPSLA Conf. (October 1989)
51 Thuraisingham, M 'Mandatory security in object-oriented database systems' Proc. 1989 OOPSLA Conf. (October 1989)


52 Demurjian, S and El Guemhioui, K 'Software synthesis in persistent, object-oriented application design and development' Technical Report CSE-TR-92-19 Department Computer Science and Engineering, University of Connecticut (August 1992)
53 El Guemhioui, K 'The integration of the design tool ADAM/DB with the object-oriented database system Ontos' Masters Degree Thesis, University of Connecticut (June 1992)
54 Ellis, H and Demurjian, S 'Object-oriented design and analyses for advanced application development: progress towards a new frontier' Proc. 21st Ann. ACM Comp. Science Conf. (February 1993)
55 Kleinman, D and Serfaty, D 'Team performance assessment in distributed decision making' Proc. Workshop Simulation and Training Research Symp. (April 1989)
56 Barstow, D, Shrobe, H and Sandewall, E (eds) Interactive programming environments McGraw-Hill (1984)
57 Kim, W and Lochovsky, F (eds) Object-oriented concepts, databases and applications Addison-Wesley (1989)
58 Zdonik, S and Maier, D (eds) Readings in object-oriented database systems Morgan Kaufmann (1990)
