Object-oriented database systems: The next miles of the marathon


0306-4379/90 $3.00 + 0.00

Information Systems Vol. 15, No. 1, pp. 161-167, 1990. Printed in Great Britain. All rights reserved.

Copyright © 1990 Pergamon Press plc

OBJECT-ORIENTED DATABASE SYSTEMS: THE NEXT MILES OF THE MARATHON

KLAUS R. DITTRICH

Institut für Informatik, Universität Zürich-Irchel, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland

(Received for publication 16 November 1989)

Abstract: Half a decade or so ago, object-oriented database systems became an extremely hot topic of database research and development. At many places all over the world, people work on individual aspects or complete system prototypes. Some systems have even already reached the marketplace. Experience shows, however, that it takes some 15 years or more until a new software-related technology really makes its way into widespread use in large-scale applications. Obviously, then, the field of object-oriented database systems is not yet in the state that, e.g., relational systems are in today, and it may even be harder to get there than it has been for the latter kind of product: there is no such thing as the object-oriented data model, a number of basic issues are still awaiting (better) solutions, and there is still vast room for improved implementations. This paper first tries to bring some clarification to the notion of “object-orientation” in the context of database systems, and then elaborates on some selected features and issues where further progress seems to be especially desirable.

1. INTRODUCTION

At least since the first publications on Smalltalk appeared in the popular computer science press [1], the notion of “object-orientation” has been one of the major buzzwords of the field, at first in the context of object-oriented programming (languages) and object-oriented system design. As is the case with most buzzwords, “object-orientation” unfortunately suffered (and in many respects still suffers!) from misconceptions, overestimation and glorification on one side, and at the same time from ignorance of the “déjà vu” style at the opposite extreme. Typical symptoms include (slightly exaggerating) that:

- everybody cries for mature products that have it already (without always knowing what exactly it is and what it is potentially good for),
- in the marketplace (and sometimes in research, too), everybody’s system claims to have it, or at least everybody claims to work on it,
- old stuff reappears under the new label (which may, in some cases, even be justified!).

When the database community got attracted a little later and started to carry over object-orientation to databases and database management systems, this situation did of course not improve at all. On the contrary, due to somewhat different intentions, approaches and attitudes, object-oriented database systems (or ooDBS for short) are today in an even less consolidated state than, e.g., object-oriented programming languages. Past experience (e.g. from the development of relational database systems) indicates that it takes some 15 years or more from the first (published!) ideas of a new software-related technology until the widespread use of efficient and reliable products in large-scale applications. Given that ooDBS have been researched and developed for only half a decade or so, it is obvious that no satisfactory degree of maturity can be expected today. What has been achieved so far is that:

- A relatively large number of groups all over the world (including some of the most respected database people) are working on individual ooDBS aspects and/or complete system prototypes; as a result, a number of issues have been clarified, and first-shot solutions have been tried out (and have partially triggered improved solutions); some systems, mainly produced by start-up companies, are even already offered for sale.
- Considerable interest in ooDBS has been created; technical people and managers from numerous potential application areas are eager to acquire information and training; conferences, workshops and professional seminars on the topic attract large numbers of participants.

However, there are also a number of issues on the negative side:

- The history of relational database systems started with a clear concept and a formal basis (at least as far as the data model itself is concerned), and all important system prototypes have been developed on these common grounds. In contrast, the origins of ooDBS were a set of vague ideas that immediately led to a number of prototypes with quite different underlying concepts. Consequently, it is now hard even to agree on a common definition of an ooDBS (apart from a very superficial one), and to some extent also on what the real issues in system development are.


- We do not yet have any major experience in making real use of ooDBS, especially when it comes to applications of realistic size.

Clearly, the second problem is just a question of time. Nevertheless, one should neither underestimate the effort it will take to introduce this (or any other) new technology to users, nor expect that all pitfalls will be known beforehand.

The much more serious problem is the first one. Unfortunately, history is not a transaction that would allow us to undo its effects. The only way to arrive at a common basis will thus be for the various groups working in the field to get together and try to unify their views, and there is hope that such a process is already underway. Once a generally accepted understanding of the ooDBS concepts has been established, it will certainly be possible to come up with the appropriate formal basis and to compete for the best solutions for specific problem areas.

In the remainder of this paper, we will first promote a definition of an ooDBS that has recently been put together by a number of researchers from six different “schools”, including this author. Afterwards, we will elaborate on some selected features and issues where further progress seems to be especially desirable.

2. TOWARDS A COMMON DEFINITION

In the sequel, we present and explain a list of features and characteristics that an ideal ooDBS should have and that together constitute a definition of the notion “object-oriented database system”. It is the result of joint work by six people representing groups that have all taken different approaches so far. The original publication is “The object-oriented database system manifesto (a political pamphlet)” [2]; this chapter contains a very condensed summary (partially in a rearranged form and with some reinterpretation).

Just as a relational database system is one which is based on the relational data model, we define an object-oriented database system to be a database management system with an object-oriented data model. To fall into the category of database management systems means to have the following features (apart from the data model question, to be discussed separately):

• persistence
• disk management
• concurrency control
• recovery
• ad hoc query facility

Access control and distribution (if an issue) might be added. All these features are presented in every good database textbook and do not need much further explanation. Note, however, that we have listed the characteristics that are required and not the mechanisms by which they are provided (e.g. locking, transactions etc. for concurrency control, recovery, ...). Disk management includes such things as access paths, clustering or buffering; they are not directly visible to the user, but no realistic DBMS can fulfill its task without them. By an ad hoc query facility, we understand any means that allows access to database data without the necessity to go through the usual cycle of programming, compilation, execution and debugging. For example, a descriptive query language, a graphical browser or some “fill in the form” facility would do that job.

The list of characteristics we require for an object-oriented data model is much more controversial. First of all, the concept of the data model itself has to be understood in a broad sense as a framework in which to express real world semantics in a database system. Though this is not at all a novel view, traditional data models offer rather limited means and do not address at all some features for advanced semantic capture. They should thus not be taken as a yardstick for understanding what a data model is. With this remark in mind, our requirements include the following, which will be discussed one by one in the sequel:

• composite objects
• user-definable types
• object identity
• encapsulation
• types/classes
• type/class hierarchies
• overloading/overriding/late binding
• computational completeness

The units an ooDBS deals with are called objects. They have a representation, i.e. a (possibly structured) value or state (sometimes also called a set of instance variables), and a set of operations that are applicable to that object.

Composite objects (or, synonymously, complex objects, structured objects, molecular objects), apart from having attributes in the traditional sense, are built from components that are objects in their own right and may be composite themselves. The presence of this characteristic means in particular that objects of any level of composition have to be supported in a uniform way, that object constructors (e.g. for tuples, sets, lists) have to exist, and that specific operations to dynamically modify and exploit object structures are needed.

Every data model comes with a number of predefined data types (which are usually simple ones like integers, characters, strings). Above that, traditional record-oriented data models provide some sort of types with fixed sets of generic “parameterized” operators. In the relational model, for example, there is the parameterized type “(homogeneous) set of (flat) tuples”, called a relation. The only parameters to be chosen for a relation by the user are the number of attributes, their names and their (predefined, simple) types, plus the name of the relation itself. In contrast,

the user-definable types requirement means that really new types can be introduced to the system by its users and afterwards be dealt with in the same way as predefined ones. In particular, mechanisms are needed to program new operators and register them with the system.

Object identity means that the existence of an object is independent of its value. It is thus possible within the database to distinguish between the equality of two objects (i.e. they happen to have, at a given point in time, the same value) and their identity (they really are, always, the same object). Obviously, the identity of an object is unique system-wide and does not change during the object’s lifetime, and even after its deletion it is guaranteed that no other object will ever have the same identity. Object identity (even when not explicitly made visible in a system) is an underlying concept for shared objects (i.e. objects that are components of two or more objects); it is also necessary for easily and correctly reflecting updates of real world entities in the database. Otherwise, if e.g. a person gets married and on this occasion changes his/her name, which has been used as a key for the respective database object, how could one tell whether the updated object still represents the same real world entity as before? The current practice is to have the user introduce artificial attributes like employee numbers etc. and thus make it his task to deal with identity. It is now recognized that the database system can cope much better and more reliably with this problem.

When a user defines a new type, he has to choose a representation for the values of its instances, specify the operator interfaces, and program their bodies (in terms of the representation). In this case, it usually does not make sense for users of this type to look at the representation details or at the operator code; all they need to know to use objects of the type is its interface, i.e. the operator specifications. Encapsulation provides for information hiding in this abstract data type flavor. Note, however, that this should not be made an “all or nothing” principle: in some cases, one may well want to define a new type just on a representational basis and adopt the (generic) operators of the representation (e.g. direct access to the attributes of a tuple) for the type interface, possibly augmented by one or two additional user-defined operators.

The requirement for types/classes as such is not that new for database systems, but it has an extended meaning for ooDBS, and the notion of a class has been carried over from the programming language area. Whatever the favorite definitions (and there is some dispute about whether the two terms have different characteristics or are just synonyms), types/classes include the following:

- the specification of the commonalities of a set of objects with the same properties (i.e. their operators and representational structure),
- a mechanism to create instances (objects) of the type/class (“object factory”),
- mechanisms to query and manipulate the set of instances currently in existence for a type/class (the extension; “object warehouse”).
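The distinction between equality and identity, and the three roles of a type/class listed above, can be made concrete with a small sketch. It is only an illustration in Python, not taken from the paper or from any actual ooDBS: the class `Person`, the counter `_oids`, the list `extension` and the helper `select` are all hypothetical names, and a simple in-memory counter stands in for a system-wide identity generator.

```python
import itertools


class Person:
    """Hypothetical user-defined type; the class itself acts as the object factory."""

    _oids = itertools.count(1)  # stand-in for a system-wide identity generator
    extension = []              # "object warehouse": all instances currently in existence

    def __init__(self, name, age):
        self.oid = next(Person._oids)  # assigned once, never changed or reused
        self.name = name
        self.age = age
        Person.extension.append(self)

    def same_value(self, other):
        """Equality: both objects hold the same value at this point in time."""
        return (self.name, self.age) == (other.name, other.age)

    def same_object(self, other):
        """Identity: both references denote one and the same object."""
        return self.oid == other.oid


def select(cls, predicate):
    """Query the extension of a type/class (hypothetical helper)."""
    return [obj for obj in cls.extension if predicate(obj)]


a = Person("Miller", 30)
b = Person("Miller", 30)
alias = a

print(a.same_value(b), a.same_object(b))      # equal values, yet two distinct objects
a.name = "Smith"                              # the value changes, the identity does not
print(alias.same_object(a), a.same_value(b))  # still the same object, no longer equal to b
```

Because the identity is kept apart from the value, the name change in the marriage example leaves no doubt about which object was updated.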

The real novelty with respect to types/classes in ooDBS is that they can be organized in hierarchies, which makes it possible to express that one type is considered a subtype of another one, i.e. that it is specified in more detail (with respect to representational structure and operators) than the supertype. The standard example is a type person with attributes like name, address, age etc. and subtypes employee (with additional attributes for salary, department, ...) and student (semester, courses, grade points, ...). Along with type hierarchies goes the concept of inheritance: objects of a subtype inherit the properties (again, structure and operators) of the supertype, in addition to those properties that have been specified with the subtype itself. Of course, inheritance applies transitively, all the way to the top of the hierarchy, if there are more than just two levels. There are various solutions to particular questions arising in this context that are beyond the scope of our discussion here, for example how to resolve ambiguities if multiple supertypes are allowed. In summary, type hierarchies and inheritance allow more semantics to be expressed than would otherwise be possible, introduce an additional modeling discipline (refinement!), and may even save coding effort because operators need not always be recoded.

Closely connected are the requirements for overriding, overloading and late binding. Overloading means that the same operator name (and interface) may be used for different operations in different types. This makes it possible to reimplement (“override”) an inherited operator for a subtype, taking into account the additional semantics that may be known there.
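As a minimal sketch of the person/employee/student example, the following Python fragment shows inheritance and the overriding of an inherited operator. The operator name `describe` and all attribute values are assumptions made for illustration; the paper itself names only the types and some of their attributes.

```python
class Person:
    def __init__(self, name, address, age):
        self.name, self.address, self.age = name, address, age

    def describe(self):
        return f"{self.name}, aged {self.age}"


class Employee(Person):
    def __init__(self, name, address, age, salary, department):
        super().__init__(name, address, age)
        self.salary, self.department = salary, department

    def describe(self):
        # Overrides the inherited operator, using the extra semantics known here.
        return f"{super().describe()}, works in {self.department}"


class Student(Person):
    def __init__(self, name, address, age, semester, courses):
        super().__init__(name, address, age)
        self.semester, self.courses = semester, list(courses)
    # describe() is inherited unchanged from Person.


people = [
    Person("Ann", "Bern", 52),
    Employee("Bob", "Basel", 40, 90000, "R&D"),
    Student("Eve", "Zurich", 21, 3, ["Databases"]),
]
for p in people:
    # One uniform call; which implementation runs is decided at run time.
    print(p.describe())
```

The loop contains no case selection on the exact type of each object; the choice of implementation is left to the language runtime, which is the role the text assigns to the database system.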
The advantage one gains is that users who deal with objects of various subtypes of a given supertype do not have to program tedious case selections to find out the exact type of an object and apply the appropriate operator; they may uniformly use the operator specified for the supertype, and the system will automatically determine which implementation to execute. Obviously, this mandates late binding: the system cannot bind names to programs prior to runtime. However, this is again not that much different from traditional approaches (consider e.g. the “find” operator in a network database system).

Finally, computational completeness relates to the language facilities provided for programming the operators of user-defined types. We require that arbitrary algorithms may be coded, and thus mere query language facilities will typically not be enough for this purpose.

2.1. Discussion and classification

The list of characteristics we have presented above is only a first attempt to give a comprehensive definition of an ooDBS. Though it contains most features that have evolved in the area up to now, we did not specify how they should orthogonally cooperate. This is not a superfluous requirement: it may be much easier to provide an uncoordinated potpourri of the above features than an elegant, streamlined system that has them all. However, only the latter will be easy to comprehend and to use, and thus stand a chance of finding wide acceptance.

Also, we have defined an “ideal” ooDBS; most system prototypes and products that are available today do not (yet) fulfill the definition entirely, and for the majority of those, this is due to the data model requirements listed.

In order to nevertheless allow a better evaluation of current and upcoming proposals, it makes sense to establish a coarse classification of object-oriented data models on the basis of the presented definition. It assumes that the first two criteria, composite objects and user-definable types, are the decisive ones, i.e. it is meaningless for a model to support some of the others if it incorporates neither of those two. On the other hand, once composite objects and/or user-definable types are available, the data model is already a big step forward compared to record-oriented ones, even if some of the other goodies are missing. In this respect, [3] introduced the following classification. A data model is called:

• structurally object-oriented if it supports composite objects,
• behaviorally object-oriented if it supports user-defined types,
• fully object-oriented if it supports both and has all the other features as explained above.
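The classification can be read as a simple decision rule over the eight data model features. The sketch below is one possible reading, not a definition from [3]: the function `classify`, the feature strings and the choice to return a list of labels (so that a model with both decisive features but not all the others is reported as both structurally and behaviorally object-oriented) are assumptions.

```python
COMPOSITE_OBJECTS = "composite objects"
USER_DEFINABLE_TYPES = "user-definable types"
OTHER_FEATURES = frozenset({
    "object identity",
    "encapsulation",
    "types/classes",
    "type/class hierarchies",
    "overloading/overriding/late binding",
    "computational completeness",
})


def classify(features):
    """Return the classification labels that apply to a set of supported features."""
    features = set(features)
    structural = COMPOSITE_OBJECTS in features
    behavioral = USER_DEFINABLE_TYPES in features
    if structural and behavioral and OTHER_FEATURES <= features:
        return ["fully object-oriented"]
    labels = []
    if structural:
        labels.append("structurally object-oriented")
    if behavioral:
        labels.append("behaviorally object-oriented")
    return labels  # empty list: not object-oriented in the sense of this scheme


print(classify({"composite objects", "object identity"}))
print(classify({"user-definable types", "encapsulation", "types/classes"}))
print(classify(OTHER_FEATURES | {COMPOSITE_OBJECTS, USER_DEFINABLE_TYPES}))
print(classify({"object identity", "encapsulation"}))
```

The last call returns an empty list, reflecting the remark that the other features alone do not make a data model object-oriented.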

Full object-orientation matches the comprehensive definition as introduced. Of course, the provision of the other six data model features (where applicable; some of them do not make sense for structural object-orientation) does not hurt for the first two classes either. A structurally object-oriented database system etc., then, is again a database system (remember the list of criteria) with an appropriate data model.

Finally, a number of issues have not been addressed at all in these definitions. They include, for example, topics like object versions or “nested” and “long” transactions (or design transactions). Though it is absolutely justified that these are often discussed in the context of object-oriented database systems, they do not decide whether a system is an ooDBS or not.

3. SOME ISSUES FOR FURTHER CONSIDERATION

Taking the definition of an ooDBS as introduced in the previous section as a starting point, the current achievements in the area present themselves as follows.

There is one direction of research and development that aims at adding persistent objects to programming systems. For those, the fulfilment of the data model issues (apart from composite objects as discussed below) is usually not much of a problem, but they have to work harder to get the database system requirements straight, and in fact work remains to be done in this area. This direction also includes efforts to define entirely new database programming languages integrating the language and the persistence issues from their inception (in contrast to adding the latter to a language that was first invented without having database issues in mind). [4-6] report on some approaches that fall into this category.

The second direction is to add an object-oriented data model to a database system, again either by extending a record-oriented one or by coming up with a completely new solution. Here, the database system requirements are less of a problem, but the integration of the appropriate data model mechanisms becomes harder. Some publications on representatives here include [7-10]. [11, 12] are some sort of hybrid approaches that try to unite both directions and thus, not surprisingly, are already rather close to the definition.

Furthermore, though not directly covered by the definition nor directly visible to system users, a lot of interesting work has been going on towards some sound theoretical foundation and towards ways for efficient implementations. However, as already alluded to in the Introduction, both areas should flourish even more once the common understanding of the subject of interest has been established; thus these are the first issues where we would like to propose further investigation.

The reason for the advent of object-oriented database systems (besides the fact that it is a fascinating topic and that it has become fashionable) was twofold. On one side, this approach promised to overcome the “impedance mismatch” that had been experienced for a long time between database systems used to reliably manage large quantities of data, and programming languages used to operate on these data. On the other side, ooDBS are thought to be the answer to a large bulk of upcoming “non-standard” application areas for database technology, including e.g. design systems (CAD in various areas, software engineering environments), integrated manufacturing systems (CIM), office automation systems, knowledge representation or geographic information systems.

While the first issue is mainly satisfied by behavioral object-orientation, the second one clearly relies on the structural aspects. Interestingly enough, most current systems that are offered as products are more or less behaviorally object-oriented database systems only (should that be due to the fact that this kind of system is easier, and faster, to develop than one incorporating the structural aspects, too?). Though they do better with respect to handling complexly structured units than record-oriented systems, they do not really support composite objects in the full sense. In the sequel, we will therefore elaborate a bit on what one would really like to have in this area, and also on how to deal with (arbitrary) relationships between objects. Finally, we will briefly indicate some more areas that deserve attention for further development.

3.1. Composite objects revisited

An object is composite if it has at least one component that is itself an object (i.e. not only an ordinary value). The type of a subobject may be different from the superobject type, and multiple subobjects may have different types themselves. To get an impression of what it means to have composite object support in an ooDBS, one should have a look at the operations that are typically required. Among others, they include:

- retrieve/get an object (including all its component objects),
- retrieve/get all or specific subobjects of a given object,
- navigate within object structures,
- compose/decompose a composite object (structure building operations),
- delete an object (with/without its subobjects),
- copy an object (“shallow” or “deep” copy).

A closer look at composite objects reveals a number of details that further have to be taken care of (cf. e.g. [13]):

- There must be mechanisms to dynamically change the structure of composite objects (by inserting and deleting components). Depending on the real world to model, this should be possible in a top-down or a bottom-up manner (or in a mixed mode, of course).
- There are cases where subobjects may exist in their own right, i.e. without necessarily being a component of any other object at all times. On the other hand, there are also cases where the existence of a subobject is dependent on the existence of its superobject. Obviously, this has to be taken care of e.g. in the delete operation.
- For some objects, it may be allowable that they are shared, i.e. that they are a component of more than one composite object at the same time.
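A few of the operations listed above (navigation, retrieval of all subobjects, and the two flavors of copy) can be sketched in a handful of lines. This is only an illustration under simplifying assumptions: the class `CompositeObject` and its method names are hypothetical, objects live in main memory, and a shared subobject would be duplicated once per occurrence by `deep_copy`.

```python
class CompositeObject:
    """Hypothetical composite object: a name plus component objects."""

    def __init__(self, name, components=()):
        self.name = name
        self.components = list(components)  # own list, shared component objects

    def subobjects(self):
        """Navigate the structure: yield all direct and indirect components."""
        for component in self.components:
            yield component
            yield from component.subobjects()

    def shallow_copy(self):
        # New top-level object that refers to the very same components.
        return CompositeObject(self.name, self.components)

    def deep_copy(self):
        # New top-level object with recursively copied components.
        return CompositeObject(self.name, [c.deep_copy() for c in self.components])


chapter = CompositeObject("chapter", [CompositeObject("section A"), CompositeObject("section B")])
book = CompositeObject("book", [chapter])

shallow = book.shallow_copy()
deep = book.deep_copy()

print([o.name for o in book.subobjects()])
# The shallow copy shares the chapter object, the deep copy has its own.
print(shallow.components[0] is chapter, deep.components[0] is chapter)
```

A later change to the original chapter is therefore visible through the shallow copy but not through the deep one.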

The second and third issues (existence dependency and sharability) show up in combination, which gives rise to four cases that need different treatment:

(1) independent, sharable subobjects,
(2) independent, nonsharable subobjects,
(3) dependent, sharable subobjects,
(4) dependent, nonsharable subobjects.

Sharable components are typical for logical real world components, while physical components are typically not sharable. Examples include a program module in a programming system for (1), a disk drive in a desktop computer for (2), a chapter in a textbook for (3), and a path in a VLSI cell design for (4).

As an example for the different treatment needed, take the delete operation. In cases (1) and (2), the deletion of the superobject may or may not propagate to the component object; if it does not, the component will continue to exist “stand-alone” in the database. In case (3), the subobject has to be deleted unless it is also currently a component in another object (or otherwise the deletion of the superobject has to be prohibited as long as such a subobject exists). In case (4), finally, the deletion of the subobject (or the prohibition of the superobject deletion) is mandatory in all cases.

Most current systems (even structurally object-oriented ones) do not provide these elaborate composite object semantics in full, though they are clearly needed for many applications. Object composition is typically achieved by a special reference type of attribute, which is a kind of object-valued pointer. The problem with this solution is that no general distinction is possible between references used for compositions (with some sort of the above semantics involved) and arbitrary references to model associations between objects without those semantics (Fig. 1). Only the programmers of the individual types can know which reference is meant for what purpose, and thus a general composite object support (which would not have to be reprogrammed for every type and for every composition pattern) is impossible.

Fig. 1. Object composition and arbitrary object association both expressed by references.

There are at least two ways to solve the problem:

• The system provides all necessary support as a built-in mechanism, i.e. there is a facility to specify the desired types of object structures, and there is a set of operators as outlined above, reflecting the various kinds of compositions.
• The user is given the necessary prerequisites so that he can himself provide all he thinks necessary in terms of complex object support. One way to do so is to introduce different kinds of references, e.g. general ones and others for the various types of composition. Operators can then be defined to work transitively across specific cases of the composition hierarchies. Still, it would not be a simple job to define and implement such a mechanism at the user level. It is therefore suggested that it be provided in the form of predefined classes that show up at a rather high level of the class hierarchy.

Fig. 2. Objects only vs objects and relationships.

Fig. 3. Relationships as objects.

Currently, only [13] reports about rather far-reaching composite object support in a fully object-oriented context. Given the importance of the requirement, much more investigation in this direction would be very desirable.
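To illustrate how the four composition cases of this section change the delete operation, here is a minimal sketch. It is not the mechanism of [13] or of any existing system; `Kind`, `Node` and `Database` are hypothetical names, and three simplifying assumptions are made: a subobject takes part in only one kind of composition at a time, independent subobjects always survive the deletion of their superobject (the "does not propagate" choice), and dependent subobjects are deleted rather than blocking the deletion.

```python
from enum import Enum


class Kind(Enum):
    INDEPENDENT_SHARABLE = 1      # case (1)
    INDEPENDENT_NONSHARABLE = 2   # case (2)
    DEPENDENT_SHARABLE = 3        # case (3)
    DEPENDENT_NONSHARABLE = 4     # case (4)


DEPENDENT = {Kind.DEPENDENT_SHARABLE, Kind.DEPENDENT_NONSHARABLE}
NONSHARABLE = {Kind.INDEPENDENT_NONSHARABLE, Kind.DEPENDENT_NONSHARABLE}


class Node:
    def __init__(self, name):
        self.name = name
        self.kind = None   # kind of composition this node currently takes part in
        self.parts = []    # component objects (composition references)
        self.owners = []   # superobjects that currently contain this node


class Database:
    def __init__(self):
        self.objects = set()

    def create(self, name):
        node = Node(name)
        self.objects.add(node)
        return node

    def compose(self, whole, part, kind):
        if part.owners:
            if kind is not part.kind:
                raise ValueError("mixed composition kinds are not modeled in this sketch")
            if kind in NONSHARABLE:
                raise ValueError(f"{part.name} is not sharable")
        part.kind = kind
        whole.parts.append(part)
        part.owners.append(whole)

    def delete(self, obj):
        # Detach obj from any superobjects that still refer to it.
        for owner in obj.owners:
            owner.parts.remove(obj)
        obj.owners.clear()
        for part in list(obj.parts):
            part.owners.remove(obj)
            # Dependent parts go once no owner remains; in case (4) that is always so.
            if part.kind in DEPENDENT and not part.owners:
                self.delete(part)
        obj.parts.clear()
        self.objects.discard(obj)


db = Database()
book, anthology, chapter = db.create("book"), db.create("anthology"), db.create("chapter")
db.compose(book, chapter, Kind.DEPENDENT_SHARABLE)
db.compose(anthology, chapter, Kind.DEPENDENT_SHARABLE)

db.delete(book)
print(chapter in db.objects)  # the chapter is still a component of the anthology
db.delete(anthology)
print(chapter in db.objects)  # its last superobject is gone, so the chapter went too
```

Tagging each reference with its composition kind is what lets a single generic `delete` serve all types, which corresponds to the second of the two solutions outlined above.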

3.2. Explicit relationships

There are (at least) two philosophies in data modeling as to what the real world consists of: just objects, or objects and relationships among them. In the first case, associations between objects have to be expressed by (general) references; in the second, there is a special top-level concept available (Fig. 2). If one sticks with the first solution, it would again be a problem for the user to provide all the typical operators for relationships (in a general way and not individually for every type of relationship!). Furthermore, it has to be known at object type design time in what types of associations its instances will need to take part (which is not always easy to foresee), and it is difficult to find a meaningful place for the attributes that describe the relationship instance (but not any one of the participating objects).

Interestingly enough, this topic has been addressed for structurally object-oriented database systems only, but not in the case of full object-orientation. An idea for a first-shot solution is to provide system-defined classes to model (n-place) relationships with attributes, as sketched in Fig. 3. This would already make it possible to provide the typical operations to establish and exploit relationships. Some more tricky details would, however, be needed to support such nice things as referential integrity: how would the “relationship object” learn that one of the participating objects is deleted, and thus that the relationship instance has to be deleted, too? This problem clearly needs some system-internal help, and suggests that the whole topic of explicit relationships should get some more attention in the future.

3.3. Other issues

Obviously, there are many more issues that deserve further work to achieve optimal solutions at some future time. As was already mentioned, the last word has certainly not yet been spoken on the integration of the whole areas of version and transaction support into ooDBS. The same is true for what is currently being called “active databases” [14]. In the system implementation arena, the search for clever architectures is going on, and help is expected from extensible database systems (as e.g. reported in [15]). Also, considerable effort has to be spent on teaching object-oriented concepts to potential users, especially with respect to their application in database design. Finally, the success of ooDBS will also depend on their ability to cooperate with other worlds.
These days see the demand for the integration of various preexisting systems, e.g. in a CIM environment. There will be very little chance that one can come up with a “from scratch” system in those areas. It seems that ooDBS offer good mechanisms to become the integrating component in such architectures. However, they still have to deliver the proof.

By necessity, this list has to be incomplete, and no such list should preclude anybody from researching topics that have not been mentioned ([16] and [17] discuss some more issues).

4. CONCLUSIONS

This paper has first tried to propagate a commonly acceptable definition of an object-oriented database system, together with a classification scheme for systems that do not completely match the full definition at this time. We have also sketched some problems where further investigations and prototyping seem to be promising. These have clearly been biased by the author’s special area of interest; it is not claimed that they are the only ones, or even the most important ones.

Alluding to the metaphor of a marathon (as the title suggests), it seems to be clear that object-oriented database systems still have a long way to go to reach the goal of general, broad acceptance for large numbers of applications. They certainly do have the potential to make it. However, I am afraid that, as in many cases, the second half of the race is the harder one: much detailed work and improvement has to be done that may easily be less attractive than developing the initial concepts, and that may not promise many “fast” papers. Hopefully, the community will keep its good spirit to make ooDBS a long-lived success, and not just a short sprint.

REFERENCES

[1] A. Goldberg. Introducing the Smalltalk-80 system. BYTE, August (1981).
[2] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier and S. Zdonik. The object-oriented database system manifesto (a political pamphlet). Proc. DOOD 89, Kyoto, Japan, December (1989).
[3] K. R. Dittrich. Preface. Advances in Object-Oriented Database Systems (Dittrich K. R., Ed.), Lecture Notes in Computer Science, Vol. 334. Springer, New York (1988).
[4] A. Albano, L. Cardelli and R. Orsini. Galileo: a strongly-typed interactive conceptual language. ACM TODS 10, 2 (1985).

[5] M. Atkinson, P. J. Bailey, K. Chisholm, W. Cockshott and R. Morrison. An approach to persistent programming. Comput. J. 26, 4 (1983).
[6] D. Maier, J. Stein, A. Otis and A. Purdy. Development of an object-oriented DBMS. Proc. OOPSLA 86 (1986).
[7] P. Dadam et al. A DBMS prototype to support extended NF2 relations: an integrated view on flat tables and hierarchies. Proc. ACM SIGMOD (1986).
[8] K. R. Dittrich, W. Gotthard and P. C. Lockemann. DAMOKLES: a database system for software engineering. Lecture Notes in Computer Science, Vol. 244. Springer, New York (1987).
[9] M. Stonebraker and L. A. Rowe. The design of POSTGRES. Proc. ACM SIGMOD (1986).
[10] R. Lorie and W. Plouffe. Complex objects and their use in design transactions. Proc. ACM SIGMOD (1983).
[11] W. Kim et al. Composite object support in an object-oriented database system. Proc. OOPSLA 87 (1987).
[12] C. Lecluse and Ph. Richard. The O2 database programming language. Proc. VLDB (1989).
[13] W. Kim, E. Bertino and J. F. Garza. Composite objects revisited. Proc. ACM SIGMOD (1989).
[14] D. R. McCarthy and U. Dayal. The architecture of an active database management system. Proc. ACM SIGMOD (1989).
[15] M. J. Carey, D. J. DeWitt and S. L. Vandenberg. A data model and query language for EXODUS. Proc. ACM SIGMOD (1988).
[16] D. Maier. Making database systems fast enough for CAD applications. Object-Oriented Concepts, Databases and Applications. ACM Press (1989).
[17] J. E. B. Moss. Object-orientation as catalyst for language-database integration. Object-Oriented Concepts, Databases and Applications. ACM Press (1989).