The fundamentals of object-oriented database management systems


Biochimie (1993) 75, 331-335 © Société française de biochimie et biologie moléculaire / Elsevier, Paris


D Plateau

O2 Technology, 7, rue du Parc de Clagny, 78000 Versailles, France

(Received 25 November 1992; accepted 29 December 1992)

Summary: The purpose of this document is to characterize the two technologies (database and object-oriented technologies) which constitute the foundation of object-oriented database management systems. The O2 Object-Oriented DataBase Management System is then described as an example of this type of system.

database / object-oriented technologies / management systems

Introduction

The concept of an object-oriented database management system (DBMS) appeared in 1984. The purpose of this new generation of DBMSs was to take into account the requirements of new application domains such as computer aided design and manufacturing, geographic and urban systems, editorial information systems and office automation, as well as the traditional areas such as business and transactional applications. The former DBMS generation (relational DBMS) addressed only the traditional areas. The technical baseline consists of the integration of two technologies: database management system (DBMS) technology and object-oriented programming. The purpose of this article is to identify these two technologies and to show how they were merged in the O2 Object-Oriented Database Management System (OODBMS).

Database features

The main features of a DBMS are described below.

Persistence for the data

The data handled by the user's programs have to survive the end of the program. Any database management system (DBMS) provides a data model, that is, a set of rules that the user will use to represent the real world. Once this mapping between the real world and the data model is performed, the DBMS knows how to store the data on a persistent medium (hard disk, tape, optical disk, etc).

Logical/physical independence

The user's programs know about the logical representation of the data (the mapping with the data model), but the physical layout of these data has to be completely transparent so that the program does not need to be modified each time the data are physically reorganized. Physical layout is concerned with performance, ie mainly the optimization of disk access. A DBMS supports several techniques to optimize performance: indexing, clustering, buffering, etc. The DBMS is in charge of maintaining the mapping between the logical and the physical level.
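As a rough illustration of persistence as described above, the following Python sketch uses the standard `shelve` module as a stand-in for a DBMS. The key `book:1`, the record contents and the two function names are hypothetical; the point is only that data written by one "program" survive its end and that neither program knows how the data are laid out on disk.

```python
import os
import shelve
import tempfile

# Hypothetical location of the persistent store (a temporary directory here).
db_path = os.path.join(tempfile.mkdtemp(), "library")

def first_program():
    # Stores a record and terminates; the shelf is closed on exit.
    with shelve.open(db_path) as db:
        db["book:1"] = {"title": "Databases", "authors": ["D. Example"]}

def second_program():
    # A later program finds the record again without knowing its disk layout.
    with shelve.open(db_path) as db:
        return db["book:1"]["title"]

first_program()
print(second_program())
```

Both functions see only keys and values (the logical level); the file format chosen by `shelve` could change without either function being modified.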

The query language

A query language allows the user to extract and manipulate the data in the database. Modern query languages are declarative as opposed to procedural: the user defines what information is to be extracted and not how to extract it. The DBMS is in charge of deriving from the query how to compute the result efficiently. Declarative query languages are usable by non-professional programmers.

Disk management

DBMSs manage very large amounts of data and performance is a critical issue. The main factor of good performance is efficient management of the disk. Reading from and writing to the disk is the bottleneck of the system. A disk access time is in the range of 10 ms, while an access to core memory takes on the order of a microsecond. Therefore, DBMSs provide features to reduce the number of disk accesses.

Virtual memory

The DBMS manages a virtual memory made up of the disk and the core memory. This virtual memory is divided into pages, some of them in core, most of them on the disk. Each time a program requests a page, the DBMS first checks whether the page is already in core. If so, a disk access is saved. Usually, when a page has been used by a program, there is a high probability that it will be reused later by the same program. Thus, a good page migration technique (caching) will save many disk accesses and significantly improve the performance of the program.
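The page lookup described above can be sketched in Python as follows. This is a minimal illustration, not the mechanism of any particular DBMS: the least-recently-used eviction policy, the `PageCache` class and the `fake_disk` function are assumptions made for the example.

```python
from collections import OrderedDict

class PageCache:
    """Small in-core page cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self._read = read_page_from_disk   # stands in for a real disk read
        self._pages = OrderedDict()        # page_id -> page contents, oldest first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self._pages:
            # Page already in core: no disk access needed.
            self._pages.move_to_end(page_id)
            return self._pages[page_id]
        # Page fault: fetch from disk and keep it in core.
        self.disk_reads += 1
        data = self._read(page_id)
        self._pages[page_id] = data
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
        return data

def fake_disk(page_id):
    # Hypothetical disk: returns a label instead of real page contents.
    return f"contents of page {page_id}"

cache = PageCache(capacity=2, read_page_from_disk=fake_disk)
for page_id in [1, 2, 1, 1, 3, 1]:
    cache.get(page_id)
print(cache.disk_reads)  # 3 disk reads for 6 page requests
```

Because page 1 is reused several times, half of the requests are served from core memory without touching the disk.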

Data sharing

Databases are usually used by several programs or users. Several programs will concurrently read and write the same data items, and the DBMS guarantees that this will happen without corrupting the database or providing programs with inconsistent data.
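The guarantee described above can be illustrated, in a very reduced form, with threads sharing one data item. This Python sketch uses a single lock, whereas a real DBMS relies on more elaborate transaction and concurrency control techniques; the `SharedStock` class and the quantities are hypothetical.

```python
import threading

class SharedStock:
    """One shared data item; every update is serialized by a lock."""

    def __init__(self, quantity):
        self.quantity = quantity
        self._lock = threading.Lock()

    def remove_one(self):
        # The check and the update happen atomically under the lock.
        with self._lock:
            if self.quantity > 0:
                self.quantity -= 1
                return True
            return False

def worker(stock, attempts, results):
    taken = sum(1 for _ in range(attempts) if stock.remove_one())
    results.append(taken)

stock = SharedStock(quantity=1000)
results = []
threads = [threading.Thread(target=worker, args=(stock, 400, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stock.quantity, sum(results))  # 0 1000
```

The four workers make 1600 attempts in total, but exactly 1000 succeed and the quantity never becomes negative, whatever the interleaving of the threads.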

Data reliability

Once a program is finished, the DBMS guarantees that the data created or updated in the database will not disappear because of a disk crash or a system failure. To guard against such problems, the DBMS maintains a log of all the operations that affect the database. In case of a crash, the DBMS replays the log to restore the database to its pre-crash state.
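The log-and-replay idea above can be sketched as follows. This is a simplified illustration under stated assumptions: the log is a text file of JSON lines, the database is an in-memory dictionary, and the `LoggedStore` class and its operation names are hypothetical.

```python
import json
import os
import tempfile

class LoggedStore:
    """Key-value store that logs every operation before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # recovery: rebuild the state from the log

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, op):
        if op["kind"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["kind"] == "delete":
            self.data.pop(op["key"], None)

    def _log_and_apply(self, op):
        # Write the operation to stable storage first, then apply it.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op)

    def set(self, key, value):
        self._log_and_apply({"kind": "set", "key": key, "value": value})

    def delete(self, key):
        self._log_and_apply({"kind": "delete", "key": key})

log_path = os.path.join(tempfile.mkdtemp(), "db.log")
store = LoggedStore(log_path)
store.set("book:1", "Databases")
store.set("book:2", "Objects")
store.delete("book:2")

# Simulated crash: the in-memory state is lost, only the log survives.
recovered = LoggedStore(log_path)
print(recovered.data)  # {'book:1': 'Databases'}
```

Replaying the log in order reproduces the state the store had before the simulated crash.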

Indexing

Indices behave like the index one can find in a book: instead of reading the book from beginning to end to find all the appearances of a given word, the reader uses the book index, which lists the pages where each important word is used. The index provides direct access to the information. Extracting a single data item from a very large non-indexed database requires scanning the complete database (possibly hundreds of disk accesses). The same operation on an indexed database might reduce the number of disk accesses to two: one to read the index and one to read the relevant data item. Traditional indexing techniques include B-trees and hashing. Some applications, such as text databases or geographic information systems, require more specific indexing techniques.
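The difference between a full scan and an indexed lookup can be made concrete with a small page-counting sketch. The page size, the record contents and the function names below are assumptions for illustration, and the index is a plain hash table assumed to cost one access, as in the text above.

```python
PAGE_SIZE = 4  # records per page (hypothetical)

def build_pages(records, page_size=PAGE_SIZE):
    # Split the records into fixed-size "disk pages".
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]

def scan(pages, key):
    """Read pages in order until the key is found; return (value, pages_read)."""
    pages_read = 0
    for page in pages:
        pages_read += 1
        for k, value in page:
            if k == key:
                return value, pages_read
    return None, pages_read

def build_index(pages):
    # Hash index: key -> number of the page holding the record.
    return {k: page_no for page_no, page in enumerate(pages) for k, _ in page}

def indexed_lookup(pages, index, key):
    """One access for the index, one for the data page; return (value, pages_read)."""
    pages_read = 1
    page_no = index.get(key)
    if page_no is None:
        return None, pages_read
    pages_read += 1
    for k, value in pages[page_no]:
        if k == key:
            return value, pages_read
    return None, pages_read

records = [(f"word{i}", f"definition {i}") for i in range(400)]
pages = build_pages(records)
index = build_index(pages)

print(scan(pages, "word399"))                   # ('definition 399', 100)
print(indexed_lookup(pages, index, "word399"))  # ('definition 399', 2)
```

With 400 records stored on 100 pages, the scan reads every page to reach the last record, while the indexed lookup needs two accesses.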

Clustering

Databases are logically made of data items and physically divided into pages. If data items that are logically related can be clustered within a single physical page, disk accesses are saved. For instance, if a program reads the first word of a sentence, there is a high probability that the rest of the sentence will be read soon. Therefore, the complete sentence should be clustered in a single page so that it will be read with one disk access and then cached in core memory.

Query optimization

When a query is issued by a program, the DBMS automatically finds the most efficient way to compute it. To optimize query execution, the query optimizer takes indices and clustering into account. Other techniques for optimizing complex queries aim to reduce the data flow resulting from the computation of the query. Again, query optimization is mostly concerned with decreasing disk access time.

Security

Because data are shared by several users who may not have the same rights to read and update the information, the DBMS provides access control to the databases.

In addition to these traditional database functions, modern DBMSs also support features such as distribution (a database is distributed over several geographic locations) or client/server architecture. However, the main difference between the new generation DBMSs (object-oriented DBMS or OODBMS) and former generation DBMSs lies in the data model they support. The next section describes these object-oriented data modeling features.

Object-oriented system

The word 'object' covers two distinct technologies: object-oriented programming and complex object manipulation. Object-oriented programming is concerned with programmer productivity, program modularity, evolution and maintainability. Complex object manipulation is concerned with modeling the real world. Both of these aspects are considered in this section.

Object

An object is made of a content (its value), an identity and an interface through which it can be manipulated. These concepts are described below.

Content and complex objects

The content of an object reflects information about the real world. Its structure can be very rich. Mapping real-world concepts into the data model is difficult if the data model is poor and easy if the model is rich. The goal of rich data models (those supporting complex object representation) is to be very close to the concepts of the real world. A complex object model will usually include: i) atomic components, which consist of integers, reals, characters, booleans, strings or binary byte sequences (useful to represent a color image or sound, for instance); ii) object references: an object can contain or refer to other objects. For instance, a book is made of a title, a list of authors and a list of paragraphs, each of these elements being in itself an object; and iii) object constructors: in order to combine and structure atomic components and object references, object constructors are proposed. For instance, the collection constructor is used to group the paragraphs of a single book, and a tuple constructor aggregates the title and this list of paragraphs.
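The three ingredients listed above can be sketched with Python dataclasses. This is only an analogy: the class names, field names and sample values are hypothetical, with `str` playing the role of an atomic component, a dataclass that of the tuple constructor and a list that of the collection constructor.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Author:
    name: str      # atomic component
    address: str   # atomic component

@dataclass
class Paragraph:
    text: str      # atomic component

@dataclass
class Book:
    # Tuple constructor: aggregates a title, authors and paragraphs.
    title: str
    # Collection constructor: lists of references to other objects.
    authors: List[Author] = field(default_factory=list)
    paragraphs: List[Paragraph] = field(default_factory=list)

author = Author(name="D. Example", address="Versailles")
book = Book(
    title="Objects",
    authors=[author],
    paragraphs=[Paragraph("First paragraph."), Paragraph("Second paragraph.")],
)
print(book.title, len(book.authors), len(book.paragraphs))  # Objects 1 2
```

The book does not copy its author: the list holds a reference to the `Author` object, which is what item ii) describes.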

Object identity

Any object has an identity which makes it unique in the database. Two distinct objects (that is, two objects having different identities) may have the same content. Over time, the content of an object may change but its identity is invariant. Object identity is used to represent object references. This way, an object can be referred to by several other 'owner' objects. If this object is updated, each owner object will 'see' the update. For instance, two books can have the same author. If the author's address changes, then both books see the correct new address. References can also build cycles: the author object may refer to the set of books he/she wrote, therefore creating a cycle.

Encapsulation

The interface of an object consists of a set of methods. Each method implements an operation. For instance, the book object may have an 'add-paragraph' method. For a program handling books, the way the method is implemented and the structure of the book content are completely hidden. This way, the structure and the method implementation can be changed without modifying the program. This is the principle of encapsulation, and it is how modularity is enforced with object-oriented programming techniques.

Class

In a program, there are usually many objects sharing the same content structure and methods. These objects are grouped in a class. A program then consists of a set of classes, and the database is populated with the objects that are instances of these classes.

Inheritance

Class definitions are organized as inheritance trees or directed acyclic graphs (DAGs). For instance, a technical report is a special book having a registration number. The technical report class inherits the properties of the book class and adds some new ones. Inheritance among classes is one of the fundamental principles of object-oriented programming. The benefit of inheritance is the re-usability of class and method definitions in different contexts. The programmer implementing the technical report class does not need to reimplement the book class methods. Multiple inheritance refers to the possibility of inheriting properties from several classes.

Overriding

A method can have several implementations depending on the class. For instance, the 'add-paragraph' method might be different in the class book and in the class technical report to take into account technical report specificities. The property of having different implementations of a single method across the inheritance tree is called overriding.
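The object concepts described above (identity, encapsulation, class, inheritance and overriding) can be illustrated with the book and technical report example in Python. This is a sketch under assumptions: the class and attribute names are hypothetical, and prefixing paragraphs with the registration number is an invented "technical report specificity" chosen only to make the two implementations differ.

```python
class Author:
    def __init__(self, name, address):
        self.name = name
        self.address = address
        self.books = []  # back-references to books: creates a cycle

class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author      # reference to a shared Author object
        self._paragraphs = []     # hidden content structure (encapsulation)
        author.books.append(self)

    def add_paragraph(self, text):
        self._paragraphs.append(text)

    def paragraph_count(self):
        return len(self._paragraphs)

    def text(self):
        return "\n".join(self._paragraphs)

class TechnicalReport(Book):
    """A special book having a registration number (inheritance)."""

    def __init__(self, title, author, registration_number):
        super().__init__(title, author)
        self.registration_number = registration_number

    def add_paragraph(self, text):
        # Overriding: same method name, report-specific behaviour (assumed here).
        number = self.paragraph_count() + 1
        super().add_paragraph(f"[{self.registration_number}-{number}] {text}")

author = Author("D. Example", "Paris")
book = Book("A book", author)
report = TechnicalReport("A technical report", author, "TR-42")

# Identity: both documents refer to the same author object and see its update.
author.address = "Versailles"
print(book.author.address, report.author.address)  # Versailles Versailles

# Two distinct objects may have the same content but different identities.
twin = Author("D. Example", "Versailles")
print(twin is author)  # False

# The same call runs a different implementation depending on the class.
for document in (book, report):
    document.add_paragraph("Introduction")
print(book.text())    # Introduction
print(report.text())  # [TR-42-1] Introduction
```

The calling code only uses `add_paragraph` and `text`; the internal list `_paragraphs` could be replaced by another structure without changing it.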

Late binding

If a program manipulates a book, this book might be strictly a book or it might be a technical report. When executing the 'add-paragraph' method, the program is not aware that two implementations of this method exist. The object-oriented system automatically decides at run-time which implementation is the relevant one to use. This automatic selection process is called late binding. The advantage of late binding is again modularity: the program is not aware of the various implementations of a single method. Furthermore, another sub-class of book (newspaper) may be created with a new implementation of 'add-paragraph' and the program will remain correct.

To summarize, the object principles listed above all tend to improve programmer productivity: easier data modeling with a complex object data model, re-usability of software components with inheritance and overriding, and maintainability and ease of evolution with encapsulation and late binding. In addition, object-oriented database systems provide much better performance than traditional systems (a 10 to 100 ratio) for the manipulation of complex objects.

The next section illustrates both the database and object-oriented concepts through the example of a system called O2.

O2 is an object-oriented database management system (OODBMS) with a complete development environment and a set of user interface tools. As an OODBMS, it satisfies the rules described in [1]. O2 has four principal objectives: i) increase the productivity of application development for both traditional and new applications; ii) provide better tools to serve the development of new applications; iii) improve the quality of the final applications (in terms of looks, performance, maintainability and customizability); and iv) improve existing applications by porting them to O2 at minimal cost. To reach these objectives, O2 is based on three main ideas: i) the merging of user interface, programming language and database technologies; ii) the use of object-oriented technology; and iii) conformity to standards.

As shown in figure 1, the core of the O2 system is O2Engine, an object database engine. O2Engine stores structured and multimedia objects (complex object data model as defined in the previous section). It also implements all the concepts of an object-oriented system (encapsulation, overriding, late binding, inheritance). O2Engine also has all the features of a complete DBMS: it handles disk management (this includes buffering, indexing, clustering and I/O), distribution and client/server, transaction management, concurrency, recovery, security and data administration.

Fig 1. The O2 system. (Figure not reproduced.)

Fig 2. (Figure not reproduced.)

O2Engine may be used through various multi-level programming interfaces.

An application programming interface (O2API)

O2API is a kernel interface well suited to using O2Engine as an internal repository within a larger system such as a geographic information system or a hypermedia system.

Traditional standard languages such as C and C++

The interface between O2Engine, C and C++ was designed to make O2Engine transparent. The programmer is not aware of the database. Instead, the C and C++ environment has a new feature which makes it possible to store a C structure or a C++ object automatically 'somewhere' and possibly to share it with other programs and users. The data model used by the programmer is the C or the C++ data model. The interface automatically performs the mapping between the C and C++ world on the one side and the O2 world on the other side. This approach is especially suited to adding database functionalities to an existing C or C++ program.

An object 4th generation language (4GL), O2C

O2C is an object-oriented 4th generation language. It is a superset of C that provides complex object manipulation and object-oriented features which do not exist in C (for instance, O2C provides powerful operations to manipulate collections: union, intersection, difference, insertion, etc). The O2C data model is the union of the C data model and the O2 model. O2C is a true object-oriented database programming language which makes development easier and faster.

An object query language, O2SQL

Information stored in an O2 database can be queried with the object query language O2SQL.
O2SQL is a declarative query language which extends SQL (the standard query language for manipulating data managed by a relational DBMS) in several directions: O2SQL can query and build complex objects, and methods can be used within a query.

In combination with any of these programming interfaces, the O2Look and O2Graph graphic user interface tools can be used (see fig 2). O2Look is an extension of the standard X Window System which provides applications with graphic database browsing facilities.

Any O2 object can be displayed and manipulated by O2Look. Object methods can be activated interactively through pop-up menus. O2Graph is an O2Look option; it consists of a graph display and manipulation package.

Last, a graphic programming environment is offered which consists of: i) a complete set of tools, O2Tools, including a graphic schema browser and designer, a source code manager, a symbolic debugger, an incremental compiler and a dynamic loader; and ii) a library of reusable components, O2Kit, which contains basic classes such as text, color image, sound or date.

The O2 system provides all the database and object-oriented features listed in the previous sections, and a few others. O2 has been a commercial product since late 1990. A detailed description of the O2 system can be found in [2, 3].

Conclusion

The concept of object-oriented DBMS appeared in 1984. The first prototypes were available in 1987, while really usable products were shipped in 1990. Object-oriented DBMSs are now becoming very popular in the 'new' domains, where many applications are now in effective use. More traditional domains are also moving to this new but mature technology. The benefits of object-oriented database systems for these new applications are: i) improved programmer productivity in the design, development, maintenance and evolution stages of the application; programming cost can be reduced by up to a factor of 10; ii) the re-usability of generic software components in the context of specific applications; iii) performance improved by up to two orders of magnitude in the case of complex structure manipulation; and iv) new data types can be defined to represent real-world information such as structured documents, sound, color image, video or programs.

References

1 Atkinson M, Bancilhon F, DeWitt D, Dittrich K, Maier D, Zdonik S (1989) The Object-Oriented Database Manifesto. Proceedings of the International Conference on Deductive and Object Oriented Databases, Kyoto, Japan, December 1989

2 Deux O et al (1990) The Story of O2. IEEE Transactions on Knowledge and Data Engineering, vol 2, no 1, March 1990

3 Bancilhon F, Delobel C, Kanellakis P (1992) Building an object-oriented database system: the story of O2. Morgan Kaufmann