Future Generation Computer Systems 16 (2000) 693–703

The replicator coordination design pattern

Heidemarie Wernhart a,∗,1, eva Kühn a, Georg Trausmuth b,2

a Institute of Computer Languages, University of Technology Vienna, Argentinierstrasse 8, A-1040 Wien, Austria
b Tecco Coordination Systems, Mommsengasse 19/7, A-1040 Wien, Austria

Abstract

The problem domain addressed in this paper covers scenarios where several databases, residing on different computers, platforms, or database systems, have to be kept synchronized. A solution design to this problem is presented as a coordination design pattern. A coordination design pattern describes the design of a solution to a recurring problem from distributed or parallel processing, where the solution is based on multiple processes that are coordinated using shared data objects. The coordination framework Corso is introduced as a possible implementation environment. The design pattern, called Replicator, is motivated with a scenario using distributed library databases. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Design patterns; Heterogeneous databases; Replication; Virtual shared memory

1. Introduction

Heterogeneous database integration and replication [1] is one of the major application requirements in industry. The purpose of replication is to synchronize multiple databases that represent the same data content. The databases may be geographically distributed, may be based upon different database systems from different vendors, may represent the data in different database schemas, or may even use different data models. The scenarios where replication is needed come from real-world problems: the architecture of huge distributed applications cannot rely on a single centralized database server. Several such servers would have to be taken into consideration, which requires replicating the data whenever an update occurs to one of these databases.

∗ Corresponding author.
1 The author is currently working for IBM Austria Ltd., Obere Donaustr. 95, 1020 Vienna, Austria.
2 Supported by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.

Whereas this scenario still refers to homogeneous databases, real-world applications are very often built upon heterogeneous database systems. One such scenario shows up when new applications are being built that are based upon a new database system or schema. Often it is not possible to migrate all of the production systems instantaneously. This means that both databases, the old one and the new one, have to be kept synchronized until all applications have been migrated. The scenarios mentioned require a replication mechanism. In this paper we present the coordination framework Corso and show how it is applied as a solution to the problems related to replication. Corso offers data objects that are virtually shared between all participating sites. This means that the participating software processes are coordinated by reading and writing such shared data objects. Using shared data for parallel, distributed or concurrent programming requires a different way of thinking compared to conventional techniques such as explicit message passing.




To illustrate how virtual shared memory can be used to solve a recurring problem, we describe the replicator as a coordination design pattern. A coordination design pattern is a special case of a design pattern where the solution is based on virtual shared memory.

2. Related work

The issues of replication are well known from distributed, heterogeneous database systems. Software products are already available which provide multi-directional data replication among various source and target database systems. This paper presents a general solution for the replication problem, which could either be used as an implementation basis for such a product, or to implement the replication scenario directly without the need for an existing product. The solution to the replication problem is presented as a design pattern. There have been some publications on design patterns for concurrent, parallel and distributed processing [6–8] which are based on object orientation, the traditional domain for design patterns. These patterns use operating system functionality or communication frameworks such as ACE in [7] to achieve parallel or distributed processing based on explicit message passing (send–receive) or event mechanisms. Often their driving force is to hide the complexity of these communication mechanisms. Since we base coordination design patterns on coordination frameworks such as Corso, we do not have to care about this low-level functionality. We can define our solutions on a higher level of abstraction. This distinguishes coordination design patterns from conventional patterns in the area of concurrent, parallel and distributed processing.

3. The library example

We motivate the Replicator coordination design pattern using an example where distributed library databases have to be kept synchronized. The application based on the library databases should provide a user interface to query books by title, author, etc. As a result of a query the user will get a list of the available books and their physical location.

Fig. 1. Hierarchy of Library Databases.

The system is intended to be heavily used, so a single centralized data server would certainly become a bottleneck. Therefore a hierarchy of regional databases is built. A regional database contains entries for all books that are available in the libraries of the region. It contains replicas of all entries of its child library databases. A regional database may again be part of a higher-level regional database to which it replicates all its book entries (Fig. 1). It should be possible, on each site, to register new books and to remove existing book entries. Registering a new book in a local database means that it also has to be registered in all parent databases to maintain system integrity; deleting an entry means removing it from all parent databases where it is registered. No specific database system is required for the library databases. It should even be possible to integrate database systems or file-based systems that do not have a reliable two-phase commit. A commitment mechanism is necessary for reliably adding a book entry to all databases that replicate a record simultaneously. As the information of one database of our example is replicated to other databases, all insert, update, and delete operations have to be applied to all databases that store a copy of the same record. In this case, the replicator coordination design pattern outlines a design for replicating an entry from a local library to other databases. The case of a distributed query is handled differently: first a local search operation is executed, and if the necessary information is found, the search operation does not access any other database. Only if the result of the local operation is not satisfactory is the search operation passed on to the next hierarchical layer.


Therefore, the replicator pattern also provides support for the distributed query. The next sections give an overview of the functionality of the coordination framework Corso, which we used as the basis to implement the replicator pattern. The term coordination design pattern is defined, and finally the replicator coordination design pattern is described with its participating processes and coordination data structures.

4. The coordination framework Corso

The coordination framework Corso consists of the coordination kernel and programming interfaces. The coordination kernel deals with the internals of the coordination system such as replication concepts and communication protocols. Processes which communicate via shared data objects either require a coordination kernel running locally on a site or, in the case of lightweight access, use a coordination kernel on another computer. The programming interfaces are either directly embedded into a programming language (extension) or consist of a set of APIs (application programming interfaces) that may be called by an application program. Language extensions for the languages Prolog and C have been prototyped, which are called Coordination Prolog and Coordination C [5]. APIs exist for the languages C, C++, Java, and VisualBasic. These language-specific interfaces are called Prolog&Co, C&Co, C++&Co, Java&Co, and VB&Co, where Co stands for coordination.

4.1. Coordination kernel

The coordination kernel is a software process running on every participating machine with the responsibility to present a consistent view of the shared data objects to all connected application processes. It deals with internal protocols, communicates with the other connected coordination kernels, encodes and decodes communicated data, manages the connected software processes, handles transactions and data persistency, implements replication strategies, performs garbage collection, etc. All coordination kernel processes in their entirety maintain the shared data objects in the virtual shared memory space. Each shared data object is uniquely identified by its OID (object identification).


It may represent a single value (integer, string) or may also be composed of a number of values and other shared data objects. Using this composition of data objects, endless streams of shared data objects may be constructed. A process has access to a shared data object if it gets its OID passed on invocation in the argument list, if the data object is communicated as a subobject of one that has been passed on invocation, or if the process itself created the object. The coordination kernel is also responsible for automatically recovering the connected processes after a system crash, passing them the original shared data objects in their most recent state. A user console can be connected to a local or remote coordination kernel to start, abort, suspend, or resume Corso processes and to display state information.

4.2. Communication data objects

Corso supports two types of shared data objects: constants and variables. Constants are shared data objects which may only be written once. Variables may have any number of values assigned during their lifetime. Communication data objects may only be written within a transaction. Corso is based on the Flex Transaction Model [2], which allows arbitrarily nested transactions. Compensating actions may be registered that the coordination kernel automatically calls to semantically undo the effects of a nested transaction when an enclosing transaction fails. Corso supports blocking and non-blocking reads of shared data objects. When applying a blocking read, a process trying to read a shared data object that is still undefined (i.e. has not yet been written) will wait until another process eventually writes to this data object. A non-blocking read immediately returns a value or indicates a failure if the value has not been set. Whenever a shared data object is written, it is immediately stored into persistent memory. This allows data objects to be recovered after a system crash, and therefore Corso processes may rely on the communicated data.
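To make the two read modes concrete, the following sketch models them in Java, the language of the Java&Co binding. The type and method names (SharedObject, readBlocking, readNonBlocking) are our own illustration, not the published Java&Co API.

    import java.util.Optional;

    // Illustrative model of the two read modes on a Corso shared data object.
    // The interface and method names are assumptions, not the real Java&Co API.
    interface SharedObject<T> {
        // Blocking (synchronous) read: waits until some process has written
        // a value to this shared data object.
        T readBlocking() throws InterruptedException;

        // Non-blocking (asynchronous) read: returns empty immediately if the
        // object has not yet been assigned a value.
        Optional<T> readNonBlocking();
    }

    class ReadModesDemo {
        static void consume(SharedObject<String> obj) throws InterruptedException {
            // Poll once; fall back to waiting if the value is not yet set.
            Optional<String> now = obj.readNonBlocking();
            String value = now.isPresent() ? now.get() : obj.readBlocking();
            System.out.println("shared value: " + value);
        }
    }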



4.3. Corso processes

A Corso process is an entry point function of an application program. Inside Corso, processes are registered with their names and the software systems to which they belong. There are two ways to start Corso processes: first, an application built upon Corso provides at least one entry that can be started from the Corso console. Second, a Corso process is started independently and later connects to the coordination kernel. A process that can be started from the console may not have any arguments. Such a process will create shared data objects and spawn other participating processes. It passes the OIDs of the created shared data objects to the spawned processes. These objects will then be used for communication and coordination. A Corso process can be spawned on a remote or on the local site. Corso supports two types of processes: independent processes and dependent processes. An independent process running under the control of Corso is reliable in the sense that it will execute even if a system failure occurs. After a system crash or shutdown, the coordination kernel will recover an independent process that has not yet terminated. It will spawn the process again, passing it the same argument list as before. The restarted process can examine the already processed values of the shared data objects and usually continues its work at the point where the system crash happened. A dependent process can only be started from within a transaction. Its purpose is to participate in a transaction on a different computing site. Corso will also recover dependent processes after a system failure. The enclosing transaction can only commit if all its dependent processes have successfully completed, and vice versa the commitment of a dependent process depends on the success of its enclosing transaction. Corso provides a name server which translates unique symbolic names into OIDs. Such a name server is available on every site running Corso. It allows shared data objects identified by name to be registered and looked up.
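The name server just described can be pictured as a per-site map from symbolic names to OIDs. The following minimal sketch is our own model of that behavior; the class and method names (NameServer, register, lookup) are assumptions, not Corso's actual interface.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal model of a per-site Corso name server: it translates unique
    // symbolic names into OIDs of shared data objects.
    final class NameServer {
        private final Map<String, String> nameToOid = new ConcurrentHashMap<>();

        // Register a shared data object under a symbolic name.
        // Returns false if the name is already taken on this site.
        boolean register(String name, String oid) {
            return nameToOid.putIfAbsent(name, oid) == null;
        }

        // Look up the OID registered under a symbolic name; null if unknown.
        String lookup(String name) {
            return nameToOid.get(name);
        }
    }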

4.4. Software systems

The local coordination kernel administrates and dispatches the processes for all connected local software systems (LSYS). These LSYSs are configured to be either permanent or transient. A permanent LSYS has the capability to execute several Corso processes. At creation time of a Corso process, the name of the LSYS and the site uniquely identify the execution context for the Corso process. A transient LSYS executes exactly one Corso process and then terminates. An arbitrary number of instances of a transient LSYS can be active at the same time. Corso processes of a permanent LSYS have the potential to be light-weight. The only information they share with their environment is represented by the references to shared data objects that are passed to them at creation time. If the language binding of the LSYS supports multi-threading (e.g. Java&Co), the coordination kernel immediately passes to it the next Corso process entry, which is executed at the LSYS as a concurrent thread. If the LSYS is single-threaded, the execution of processes there is sequentialized. In both cases, single- and multi-threaded, the LSYS is started only once and then executes all processes determined for this very LSYS. At most one instance of the LSYS can be active at the same time. In summary, a Corso process of a permanent LSYS can be seen as a light-weight process. If the LSYS supports more than one thread, the Corso process can execute concurrently as a real thread. If the LSYS is transient, a Corso process of this LSYS maps to its own operating system process.

4.5. Application programming interfaces

All &Co language APIs provide functions to control Corso processes and transactions and to manipulate shared data objects. Corso uses optimistic concurrency control for its transactions. The following sections briefly describe the functions of the API.

4.5.1. Operations on transactions

Start a transaction: create a top-level or nested transaction and return its transaction identifier. The transaction identifier will be passed as a handle to the functions that read or write values of shared data objects.

Commit a transaction: execute all transactional requests (i.e. read and write operations) on shared data objects. If one of the requests fails, the transaction commitment fails and the transaction is aborted. Otherwise the commitment succeeds and all registered on-commitment actions (if specified) are executed.

Try to commit a transaction: return a failure indication if the commitment did not succeed. The number of the requests that did not succeed can be queried.



After cancelling the failed requests, the commitment may be retried.

Abort a transaction: abort the transaction and all nested subtransactions. If a subtransaction has already been committed, the abortion causes its compensation [3].
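Taken together, the transaction operations could be used as in the sketch below. The Transaction and Kernel types and their method names are our paraphrase of the operations listed above, not the published &Co signatures.

    // Illustrative shape of the transaction operations described above.
    // All identifiers are assumptions; the real &Co APIs may differ.
    interface Transaction {
        void write(String oid, Object value); // delayed until commitment
        boolean tryCommit();                  // false if the commitment failed
        void abort();                         // compensates committed subtransactions
    }

    interface Kernel {
        Transaction startTransaction();       // top-level or nested transaction
    }

    class TransactionDemo {
        static void updateBoth(Kernel kernel, String oidA, String oidB) {
            Transaction tx = kernel.startTransaction();
            tx.write(oidA, "new value A");
            tx.write(oidB, "new value B");
            // Optimistic concurrency control: the writes are applied, and
            // become visible, only if the commitment succeeds.
            if (!tx.tryCommit()) {
                tx.abort(); // alternatively, cancel failed requests and retry
            }
        }
    }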


4.5.2. Operations on shared data objects

Create a shared data object: return an object identifier (OID) as a handle for the newly created object.

Write data to a specified shared data object in a transaction: the data object may contain a value of certain types such as integer or string, it may represent another shared data object, it may group a number of values (struct), or it may represent a list of values. The writing of the actual values to the shared data object is delayed until the enclosing transaction commits. Only if the commitment is successful does the binding become visible.

Read data from a specified shared data object: the non-blocking (asynchronous) read immediately returns with a failure if the object does not yet have a value. The blocking (synchronous) read waits until the shared data object has been assigned a value.
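As a concrete picture of these three operations, the sketch below creates an object, writes it inside a transaction, and reads it back. Again, all names (CoordinationKernel, createObject, and so on) are illustrative stand-ins rather than the actual &Co functions.

    // Illustrative use of the shared-data-object operations described above.
    interface CoordinationKernel {
        String createObject();                             // returns a fresh OID
        String startTransaction();                         // returns a transaction id
        void write(String txId, String oid, Object value); // visible only on commit
        void commit(String txId);
        Object readBlocking(String oid);                   // waits until defined
    }

    class SharedObjectDemo {
        static void produceAndConsume(CoordinationKernel kernel) {
            String oid = kernel.createObject();

            // Producer side: the value is bound only when the transaction commits.
            String tx = kernel.startTransaction();
            kernel.write(tx, oid, "hello");
            kernel.commit(tx);

            // Consumer side: a blocking read would have waited for the commit above.
            System.out.println(kernel.readBlocking(oid));
        }
    }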


4.5.3. Operations on processes

Start an independent process: returns a process identification (PID). The software system to which the process belongs and the computing site have to be specified. The process identification also represents a shared data object, which can be used to query the current state of the process and to asynchronously send signals to the process.

Start a dependent process: returns a process identification (PID). To start a dependent process, a transaction identifier has to be specified, which determines the transaction to which the process belongs.

Terminate the current process: gracefully ends a process so that it will not be recovered. Allowed exit values are, for example, ABORT and COMMIT. The termination state is stored into the shared data object represented by the PID of the process.
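The process operations can be summarized by the following interface sketch; as before, the names are our own reconstruction from the prose, for exposition only.

    // Illustrative shape of the process operations described above.
    interface ProcessApi {
        // Start an independent process: the software system (LSYS) and the
        // computing site identify the execution context. The returned PID is
        // itself a shared data object reflecting the state of the process.
        String startIndependent(String lsys, String site, String entry, String... oids);

        // Start a dependent process within a transaction: its commitment is
        // tied to the enclosing transaction, and vice versa.
        String startDependent(String txId, String lsys, String site, String entry, String... oids);

        // Gracefully end the current process with an exit value such as
        // COMMIT or ABORT; a terminated process is not recovered.
        void terminate(String exitValue);
    }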

5. Coordination design patterns

The traditional term design pattern refers to the design of a solution that is based on object-oriented technology. A coordination design pattern describes a design scenario that is primarily based on a coordination framework on top of virtual shared memory. In this section we describe the term design pattern and explain in detail what we mean by coordination design pattern.

5.1. Design patterns

One of the primary goals of design patterns is to communicate design experience. A design pattern describes a good solution design to a problem that has been perceived in several recurring situations. The term design pattern was first applied to software design in [4]. There we find a description of the following essential elements of a design pattern:
• The pattern name is a clear term used to describe the design problem, its solution, and consequences. This name is used to communicate designs to colleagues. It helps to build a common vocabulary.
• The problem describes when to apply a given design pattern. This description includes the conditions that must be met before it makes sense to apply the pattern.
• The solution describes the elements that make up the design to solve the problem. For coordination design patterns these elements are the processes, the structure of the shared data objects, and the sequences of the read and write operations performed on these objects by the participating processes.
• The consequences are the results and benefits of applying the pattern. These consequences are important for evaluating design alternatives. Using a certain design pattern influences other system components and subsequent releases of the software.
Reusing design experience described as design patterns helps to improve the quality of new software systems. One of the major benefits of identifying design patterns is that the pattern names form a common vocabulary for software design, and this definitely improves the communication about designs. Capturing design experience as design patterns helps novices to learn from experts.

5.2. Definition of coordination design pattern

A design pattern is called a coordination design pattern if the problem domain is from one of the areas of



distributed, parallel, or concurrent processing. A coordination design pattern usually involves more than one participating process, coordination data objects which are virtually shared between the processes, and synchronization that uses read and write operations on the coordination data objects. A design pattern is not a coordination design pattern if its solution is based on explicit message passing or event handling. These approaches contradict the shared object model, which achieves coordination by writing and reading shared data objects. The solution of a coordination design pattern is described by
• the participating processes, their properties and roles,
• the coordination data objects, their properties and roles, and
• the collaboration and coordination of the processes using shared data objects.

6. The replicator coordination design pattern

We describe the replicator coordination design pattern using the template described in [4] and extend it with the coordination design pattern specifics.
• Pattern name. The name of the pattern is the replicator. This term should express that it solves a problem from replication and that it represents an agent or mechanism that actively solves the problem.
• Intent. The replicator design pattern solves a replication problem. It is applied in scenarios where data that are available on one computer have to be consistently replicated on other data stores.
• Motivation. For a usage scenario and motivating example of the replicator, see the library example described in Section 3.
• Applicability. The pattern can be applied in a number of situations:
◦ Several data servers provide access to the same data for availability reasons and to avoid a single-server bottleneck.
◦ Client machines keep part of their required data locally for performance reasons. These local data could even be stored in a file system to avoid installing an expensive database component on the client sites.
◦ For safety reasons, sensitive data are to be replicated immediately to a backup server.
◦ A dedicated machine keeps a log of all or of specific changes made to databases. These databases would replicate their changes to the dedicated machine.
◦ The same data contents shall be stored in different databases with different database schemas. In some applications data is stored in a normalized database schema as well as in a less normalized database representation, which is more efficient for specialized queries, as in the case of data warehouses.
◦ An old and a new database have to be kept synchronized, e.g., when a decision has been made to migrate to a new database system. As long as there are still some applications that rely on the old database, both databases have to replicate their deltas.

7. Overall system structure

Fig. 2 shows the architecture that makes up the replicator coordination design pattern.

7.1. Participating processes

The following processes participate in the Replicator coordination design pattern:
• Replication manager. This process provides the interface of the replication mechanism to the outside world (user or client program). It provides functions for maintaining the database meta information of the local site (insert and delete databases) and for performing replication requests. A replication request consists of the following subtasks:
◦ Trigger all required worker processes on all participating sites.
◦ Compose the database commands that are simultaneously performed by all worker processes.
◦ Control the execution of the database commands and check the return codes from the workers.
◦ Make the global decision and communicate the result, which is either to abort or to commit the database transaction, to the workers.
The replication manager process is a long-running but not everlasting process. This means that it performs a limited number of tasks and ends after the tasks have been completed, for instance when the user logs out.



Fig. 2. The Replicator system structure.

There can be any number of such manager processes running per site. They are usually started as independent processes from the Corso console. The manager process runs on a transient software system. Concerning the shared data, the replication manager never gets data passed on its invocation, nor does it access the name server. Only data objects needed for communicating the commands are created. The replication manager is a starter process, i.e. the first process of an application, which initiates the distributed execution. It also acts as a factory process (creator) for the replication agent process. During runtime of the replication application, the manager acts as a conductor. A conductor process is a dedicated process that directs and coordinates the execution of other processes.
• Replication agent. The replication agent is a permanent software system that implements several functions (Corso processes) which are called from the replication manager process within transactions. The replication agent is responsible for initially starting the permanent multi-database gateway process and contacting the name server of a local site, for adding or removing a database to or from the local database meta information, and for creating the execution slot for a local

database that needs to handle a request and for initiating the slot creation for all its parent databases. Based on these requirements, the replication agent publishes the following functions, which are called from the replication manager: add database, delete database, and get all slots, which is needed for the replication mechanism. The get all slots function may also be called recursively by a replication agent located on this or a different site to get the execution slots for all participating databases. The processes of the replication agent are short-term processes. There exists one replication agent software system per site, which is permanently loaded. Still, there may be several instances of the get all slots process active at the same time (recursion). The Corso processes of the replication agent are dependent processes. They work on shared data objects that are either available from a name server or passed on invocation. The database meta information is accessible through the local name server using the database name as the key. The add database and delete database Corso processes use the name server data. On invocation, the get all slots Corso process is passed shared data objects. These objects are used and extended by adding subobjects containing commands.



• Multi-database gateway. The multi-database gateway handles the requests for all databases located on its site. It repeatedly reads the execution slot list which is produced by the replication agent. It starts an appropriate worker for an execution slot and passes it the entry into its command list. This process is an endless process. It is first initiated by the replication agent, and from then on runs forever. In case of a system crash, Corso will restart it. There is just one instance of the multi-database gateway process running on every participating site. The gateway is executed as an independent process belonging to a permanent software system. The entry into its slot list is provided by the replication agent at startup. The multi-database gateway acts as a factory for the worker process. It extracts the type of the database from the database information provided in an execution slot and starts an appropriate worker process to apply the operations to its database.
• Database worker. The name worker characterizes a class of processes that implement the same functionality. For every database type (Postgres, Oracle, etc.) workers are implemented which provide the same interfaces, understand the same coordination data, and perform semantically equivalent actions on the assigned databases. A worker process reads the commands from the command list passed on invocation. For every command it performs the action described, which could be one of: start a database transaction, perform an insert or delete, commit or abort the transaction. The worker writes a success or failure indication for the command into the shared data object. The replication manager waits for the value of this data object and then decides to either commit or roll back all operations. The worker process performs the commands of one slot and ends immediately after the operations have been committed or rolled back. Multiple instances of worker processes may be active at the same time. Worker processes run as independent processes belonging to a transient, or in a more optimized version, to a permanent software system. They are started by the multi-database gateway process, which immediately continues to handle the next execution slot. The worker processes only use the command list, which is a shared data object they get passed on invocation.

7.2. Coordination data

The following coordination data structures are used by the replicator:
• Database information. The database information contains the meta information for the databases participating in the replication scenario. One database information record exists for every database provided locally by a site. The name of the database is used to retrieve this information from the local name server. The database information structure contains the database type (Postgres, Oracle, file system, etc.), the supported type of two-phase commitment, and a reference to a replication database. The replication agent maintains the database information.
• Execution slot. An execution slot corresponds to a request which is to be handled by one specific database. There exists a list of such slots on every participating site. The entry into this list can be queried from the local name server. A new request is added by a replication agent to the end of the list. The multi-database gateway process gets the entry into this list when it is first started. The execution slot structure is made of:
◦ Name of the database to handle the request.
◦ Type of the database.
◦ Type of two-phase commitment.
◦ List of commands to process.
◦ Indicator whether the request has already been processed. This indication is needed for recovery: when the multi-database gateway process is recovered, it skips the already processed slots.
The replication agent process get all slots produces these execution slots. The multi-database gateway reads and processes the slots by starting appropriate worker processes.
• Commands. A user request to change a record in a replicated database is performed in the following steps: (1) begin the transaction, (2) execute one or more database operations, and (3) commit or abort the transaction depending on the state of all replicas involved. One database operation of step (2) is described by an operation identifier (insert, delete, select,


etc.), the arguments needed to process the command (the title of a book, etc.), and another parameter to communicate success or failure. A list of commands is assigned to an execution slot. A reference to this list is passed to a worker process on its invocation. The commands are executed in two phases. In the first phase, the replication manager starts a transaction and writes the commands for all workers that concurrently have to perform the database actions into the coordination data structure. The worker processes read these commands, perform the described actions, and set the success indications. The replication manager process reads these indications, makes a global decision, and initiates the second phase: if all operations were successful, it tells the worker processes to commit, otherwise to abort. A sketch of these coordination data structures and of the global decision is given after Section 7.4.

7.3. Collaboration

The participating processes collaborate in the following way:
1. A replication manager process is started from the console.
2. Whenever a replication request is issued, the replication manager process starts the get all slots process of the replication agent. This process performs the following steps:
2.1. If the requesting database is not already in the list of databases to synchronize: create a new execution slot for the requesting database; add the slot to the local execution slot list; add the slot to the returned slot list that will be used by the calling process; add the requesting database to the list of databases to synchronize; for all parent databases of the requesting database, start the get all slots process on their computing sites.
3. Executing in parallel, the multi-database gateway process reads the execution slot and starts an appropriate worker process, passing it the command list contained in the slot information.
4. The get all slots process returns to the replication manager process. The replication manager process uses the returned list of execution slots to get hold of all command list entries. It produces the commands for all workers.
5. In parallel, all workers read their command lists and execute the commands. They write a success/failure indication for every processed command.
6. The replication manager process bases its global decision on the success indications communicated by the worker processes. If any worker was not able to successfully perform a request, the operations are aborted; otherwise they are committed. This decision is communicated to the workers as the last command.
7. All workers commit or abort.

7.4. Variants of the pattern

There are two ways to start the recursive calls of the get all slots process of the replication agent, which has a significant influence on the replication behavior:
• As on-commitment actions. This is only possible if the corresponding database supports a reliable two-phase commitment protocol, because it could happen that another request for the same database is handled earlier.
• As dependent processes. This assures that the creation of the execution slots for all participating databases is committed in one step.
If databases of different schemas or data models are involved, the worker process will have to do the mapping to the corresponding database schema or model.
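As announced above, the coordination data of Section 7.2 and the global decision of Section 7.3 can be pictured with the following Java sketch. All class and field names are our own reconstruction from the prose, not code from the pattern's implementation.

    import java.util.List;

    // Reconstruction of the replicator's coordination data structures.
    class DatabaseInfo {
        String name;            // key for the local name server lookup
        String type;            // e.g. "Postgres", "Oracle", "file system"
        String twoPhaseCommit;  // supported kind of two-phase commitment
        String parentDatabase;  // reference to the replication (parent) database
    }

    class Command {
        String operation;       // "begin", "insert", "delete", "commit", "abort", ...
        List<String> arguments; // e.g. the attributes of a book entry
        Boolean succeeded;      // written by the worker, read by the manager
    }

    class ExecutionSlot {
        String databaseName;    // database that handles this request
        String databaseType;
        String twoPhaseCommit;
        List<Command> commands; // shared command list passed to the worker
        boolean processed;      // lets a recovered gateway skip finished slots
    }

    class ReplicationManagerSketch {
        // Second phase: commit only if every worker reported success for
        // every command; the decision goes to the workers as the last command.
        static String globalDecision(List<ExecutionSlot> slots) {
            boolean allOk = slots.stream()
                .flatMap(slot -> slot.commands.stream())
                .allMatch(cmd -> Boolean.TRUE.equals(cmd.succeeded));
            return allOk ? "commit" : "abort";
        }
    }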

8. Solving the library example

The pattern described solves the replication issues of the library example outlined in Section 3. Book entries of a local database can be replicated to its parent regional database by applying the replicator pattern. The database information data can be directly reused from the design pattern. It contains the description of the database hierarchies by storing the parent regional database for every database. The replication manager process is extended with a user interface that allows the user to enter the data for a new database and to specify a database that is to be removed from the replication scenario. The database worker processes have to be implemented to interpret the commands issued by the replication manager process and to apply them to the local database. The commands contain the attributes of a book entry that is to be added or removed. One such worker process is implemented for every possible database type.
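As an illustration of such a worker, the following sketch interprets insert and delete commands for one database type through JDBC. The table layout (a books table with title and author columns) and the command vocabulary are our assumptions for the library example, not part of the pattern's actual implementation.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Hypothetical database worker for the library example: it applies one
    // command to a relational library database and reports success or failure
    // back to the replication manager via the shared command object.
    class LibraryWorkerSketch {
        static boolean apply(Connection db, String operation, String title, String author)
                throws SQLException {
            switch (operation) {
                case "insert":
                    try (PreparedStatement st = db.prepareStatement(
                            "INSERT INTO books (title, author) VALUES (?, ?)")) {
                        st.setString(1, title);
                        st.setString(2, author);
                        return st.executeUpdate() == 1;
                    }
                case "delete":
                    try (PreparedStatement st = db.prepareStatement(
                            "DELETE FROM books WHERE title = ? AND author = ?")) {
                        st.setString(1, title);
                        st.setString(2, author);
                        return st.executeUpdate() >= 1;
                    }
                default:
                    return false; // unknown operation: report failure to the manager
            }
        }
    }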



The library example also requires functionality that is beyond the scope of the Replicator coordination design pattern. A distributed query is executed whenever book entries cannot be found in the local database. In this case the query is delegated to the parent database, which contains all book entries for all the libraries of the region. Therefore, the probability that the query succeeds at the parent database is higher. When a given number of matching entries has been found, the query is considered complete. To achieve this functionality, the infrastructure of the pattern (processes, data structures) can be reused. Just a few extensions have to be made: the replication manager is extended to also handle queries. This means that the user interface allows the user to enter the data to be used for the query. To perform the query, an additional Corso process that belongs to the replication agent software system delegates the query to the next parent database if the local database worker did not succeed in retrieving enough records. The query will recursively be propagated to parent databases until the requested number of query results is found. The database workers are extended to perform the query and to return the results in shared data objects that are read by the interface process that submitted the request.

9. Ongoing and future work

Besides the Replicator coordination design pattern introduced in this paper, seven other coordination design patterns have been identified to date. These patterns were found by analyzing real-world application scenarios. They have been classified according to their complexity and inter-process dependencies. In this classification, the Replicator pattern is one with high complexity and high inter-process dependencies. This is because there is one dedicated process (the replication manager) which plays the role of a so-called conductor, i.e. one that controls the execution of other processes (the database workers). All these patterns have been described using a notation that illustrates the use of shared data objects, the transactions, and the synchronization of the processes based on shared data objects.

In the future we plan to implement samples for all these coordination design patterns. Our library of coordination design patterns should help novices in the subject to learn about the capabilities of shared data and to find a way to rapidly develop a first application.

10. Conclusion

In this paper we described the design of a replication scenario as a coordination design pattern. A coordination design pattern outlines the solution to a typical recurring problem, where the solution is based on a coordination framework working with virtual shared memory. The pattern presented describes the processes, the coordination data objects, and the collaboration among the participating processes based on the coordination data. We motivated the Replicator design pattern using an example from distributed library databases. The driving force for identifying and describing coordination design patterns is to share experience in working with virtual shared memory systems. Novices in the subject tend to reuse well-known designs and solution strategies. We also experienced that using the terminology of the identified patterns eases the communication when talking about designs and leads to higher-quality designs. Another reason for sharing our experiences by communicating coordination design patterns was to demonstrate the elegance of solutions that are based on coordination frameworks. The pattern described in this paper, called the Replicator, is meant to be reused in situations where data needs to be replicated between machines, heterogeneous database systems, and varying database schemas.

References

[1] M.W. Bright, A.R. Hurson, S.H. Pakzad, A taxonomy and current issues in multidatabase systems, IEEE Computer, March 1992.
[2] A. Elmagarmid, Y. Leu, W. Litwin, M. Rusinkiewicz, A multi-database transaction model for InterBase, in: Proceedings of the 16th International Conference on Very Large Data Bases, August 1990.
[3] A.K. Elmagarmid (Ed.), Database Transaction Models for Advanced Applications, Morgan Kaufmann, Los Altos, CA, 1992.

[4] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, ISBN 0-201-63361-2, Addison-Wesley, Reading, MA, 1995.
[5] e. Kühn, Principles of coordination systems: a library approach for distributed and reliable programming in C and Prolog, The CoKe Reference Manual for Version 0.8.2, Vienna, 1996.
[6] D.C. Schmidt, Experience using design patterns to develop reusable object-oriented communication software, Commun. ACM (Special Issue on Object-Oriented Experiences) 38 (10) (1995).
[7] D.C. Schmidt, A family of design patterns for application-level gateways, Theory and Practice of Object Systems, Special Issue on Patterns and Pattern Languages, vol. 2, no. 1, Wiley, New York, 1996.
[8] A.R. Silva, Framework, design patterns and pattern language for object concurrency, in: Proceedings of the Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, Nevada, USA, 1998.

Heidemarie Wernhart received her MS degree in Computer Science from the Technical University of Vienna in 1985. She worked at the Institut für Angewandte Informatik at the University of Klagenfurt, Austria, in a project funded by the Fonds zur Förderung der wissenschaftlichen Forschung, where she implemented a software development environment for the 4th-generation language HIBOL-2. Since 1989 she has been working for IBM Vienna, Austria, in the area of software development.


Her special interests are object-oriented technology, design patterns and workflow management systems.

eva Kühn received her MS degree in Computer Science in 1983 and the Ph.D. degree in 1989 from the University of Technology in Vienna. In 1990 she received the Heinz Zemanek research award for her thesis about the implementation of multidatabase systems in Prolog. In the same year she spent the summer at the InterBase Laboratory at Purdue University, Indiana, supported by the Austrian Government (Kurt-Gödel-Stipendium). In 1994 she received the Venia Docendi (Habilitation) at the University of Technology, Vienna, where she is currently working as a professor and lecturer at the Department of Computer Languages. Since 1997 she has been the chief executive officer of tecco Software Development GesmbH, which was founded to market Corso. The company is supported by the Austrian Government. Her research interests include parallel and distributed programming languages, methodologies and systems, coordination systems and heterogeneous transaction processing.

Georg Trausmuth received his MS in Computer Science in 1993 and the Ph.D. degree in 1996 from the University of Technology in Vienna, where he also works as assistant professor in the distributed systems group. Currently he is on leave to work for tecco on a research project funded by the Fonds zur Förderung der wissenschaftlichen Forschung. His research interests include object-oriented programming and design, distributed systems and software architecture.