Comput. Lang. Vol. 15, No. 2, pp. 95-108, 1990. Printed in Great Britain. All rights reserved.
0096-0551/90 $3.00 + 0.00 Copyright © 1990 Pergamon Press plc
INTEGRATING EMERALD INTO A SYSTEM FOR MIXED-LANGUAGE PROGRAMMING
ROGER HAYES, NORMAN C. HUTCHINSON and RICHARD D. SCHLICHTING
Department of Computer Science, The University of Arizona, Tucson, AZ 85721, U.S.A.
(Received 20 October 1988; revision received 6 October 1989)
Abstract. The difficulties encountered while resolving the programming models of object-oriented and procedure-oriented languages are discussed. As a concrete example, the integration of Emerald, an object-oriented, distributed programming language, into the MLP system for mixed-language programming is described. This integration allows Emerald objects to invoke procedures written in other supported languages, and allows procedures written in the other languages to invoke operations on Emerald objects. Two different aspects of this integration are highlighted. First, the issues involved in bridging the distance between the programming model supported by Emerald and the programming model supported by MLP are discussed; specific problems include reconciling the object-oriented nature of Emerald with the procedure-oriented model of MLP, and accommodating Emerald's concurrency within the previously sequential MLP system. Second, the implementation of this design is described.
Keywords: Object-oriented programming languages, Mixed-language programming, Remote procedure call, Procedural programming languages, Concurrency
1. INTRODUCTION
Object-oriented programming languages support a programming paradigm that is appealing in many situations. By precisely capturing the basic elements of a model and their relationships, object-oriented languages are well suited to the system and process simulations for which they were conceived. In addition, the object abstraction has found general applicability in areas such as user interface design, semantic networks, and rapid prototyping. These advantages have led to the design and implementation of a number of object-oriented languages, including Smalltalk [1], CLU [2], Emerald [3-5], and Trellis/Owl [6].
On the other hand, object-oriented languages often suffer from an inability to communicate with programs or procedures written in other languages. For example, interlanguage communication may be necessary for programs written in object-oriented languages to invoke a wide range of system services. Even when not strictly required, the ability to construct mixed-language programs can be used to advantage in exploiting special language characteristics in specific situations. For example, such a facility might allow Icon [7] to be used for string parsing in a simulation program written primarily in an object-oriented language. The usefulness of mixed-language programming is evident in provisions for multiple languages in RPC (remote procedure call) facilities like HRPC [8], Horus [9], Matchmaker [10], Mercury [11], and MLP [12, 13], and in papers advocating its use as a means for taking advantage of the large amount of numerical software written in Fortran [14, 15].
To date, systems that support mixed-language programming have focused on procedure-oriented languages to the exclusion of object-oriented programming languages. One reason for this omission is the difficulty of meshing the programming model supported by such languages with the programming model typically supported by multi-language RPC systems.
The most obvious difficulty is, of course, reconciling the difference between the object-oriented and procedure-oriented programming paradigms. However, there may also be other, more subtle differences to be resolved. For example, the RPC system may support a type system very different from that of the language, or may be oriented towards sequential execution rather than concurrency. In this paper, we outline an approach to resolving these difficulties in the context of describing the integration of the object-oriented, distributed programming language Emerald with the MLP system for mixed-language programming. In addition to the object-oriented programming paradigm, Emerald supports concurrency, fine-grained object mobility, object persistence, and a type system that concentrates on the specification of objects rather than their implementation. In contrast, MLP is a procedure-oriented sequential system that allows a programmer to construct programs out of modules called components, where each component contains a collection of procedures written in a single language. The intent of MLP is that the programmer implement each component in the language whose features most appropriately support the kind of programming required by that module; the programmer is expected to understand only the high-level interactions between the components that make up a program, leaving the low-level details to the MLP system.
The result of the integration of Emerald and MLP is a system that provides two useful and complementary facilities. First, it allows operations exported from an Emerald object to be invoked from other languages, thus giving programmers access to Emerald's special characteristics from other languages. Second, integrating Emerald into MLP allows procedures written in other languages to be invoked from within Emerald objects. Such functionality is particularly useful for providing access to existing UNIX† programs and libraries without having to modify the Emerald system; this is especially true for programs that require the C run-time environment, such as Suntools [16]. In addition, the ability to invoke procedures outside of Emerald can be used to allow Emerald programs to span different machine architectures.
In the following sections, we describe the merger of Emerald and MLP. In Section 2, the relevant characteristics of the two systems are surveyed. Section 3 then discusses how the different programming models supported by these two systems can be reconciled. The way in which this reconciliation has been realized in practice is described in Section 4, which contains an overview of relevant parts of the implementation.
Finally, some attractive application areas for the merged system are discussed and conclusions offered in Section 5.
2. BACKGROUND
2.1 An overview of Emerald
Emerald is an object-based language and system designed for the construction of distributed applications on a homogeneous collection of machines. The principal feature of Emerald is a uniform object model appropriate for programming objects ranging from private local objects to shared remote objects. Objects are an excellent way to structure a distributed program because they encapsulate the concepts of process, procedure, data, and location. In contrast to a number of existing distributed programming languages and systems that support separate computational models for local and distributed entities, Emerald supports a single object model. All Emerald entities, ranging from Booleans and integers to compilers and entire file systems, are programmed using the same model and have identical invocation semantics, even though they are implemented differently. Each Emerald object has four components: a unique network-wide name; a representation (i.e. the data local to the object, which consists of primitive data and references to other objects); a set of operations that can be invoked on the object; and an optional process. Emerald objects that contain a process are active; objects without a process are passive data structures. Objects with processes make invocations on other objects, which in turn invoke other objects, and so on to any depth. As a consequence, a thread of control originating in one object may span other objects, both locally and on remote machines. Multiple threads of control may be active concurrently within a single object; mutual exclusion is provided by monitors, and synchronization by built-in Condition objects.
Two additional aspects of Emerald make it especially attractive for use from other languages. One is that objects are fully mobile and can move from node to node within a distributed Emerald system, even during an invocation. Moreover, the support for mobility is integrated into the Emerald language.
The language explicitly recognizes the notions of location and mobility and provides primitives to discover the locations of objects and move them about in the network. The second intriguing aspect is object persistence: once created, each object exists in an object space maintained by Emerald as long as references to it exist. Object persistence is enhanced by provisions for programming objects that can survive processor crashes. Specifically, the checkpoint primitive of Emerald allows a collection of objects on a single machine to save their state. When a machine recovers after a failure, all checkpointed objects are restored by the system to their most recently checkpointed state. Programmer-defined recovery actions are supported through the use of object recovery code, which is executed for objects on a recovering machine once all objects have been restored.
There are two major pieces of the Emerald language implementation: the compiler and a run-time support package called the kernel. The compiler translates Emerald source into native code for the target machine, but instead of producing linkable modules, it produces implementation objects that are written to code files. These files are placed in a database shared by the compiler and kernel. The compiler also generates files that, when executed by the operating system, communicate with the kernel and cause the dynamic loading of required implementation objects and the creation of those objects specified in the compilation.
The kernel provides the run-time environment for Emerald objects. A single Emerald kernel executes on each machine in the Emerald environment; all Emerald objects on a machine exist in this single address space. The kernel performs the dynamic linking necessary to incorporate newly defined objects into the environment, manages object creation and invocation, performs garbage collection of objects that are no longer referenced, and manages the concurrency demanded by Emerald. Managing concurrency includes multiplexing light-weight Emerald processes within the kernel's single address space and implementing the monitor entry/exit and condition variable wait/signal operations. In addition, because Emerald is a distributed programming language, the kernel must manage communication with the kernels on other machines in the local environment. This communication is necessary for implementing remote invocation of object operations, performing object migration and location, and performing distributed garbage collection.
In addition to the communication path between Emerald kernels, which is implemented using a custom reliable datagram protocol layered on top of UDP, the kernel also manages multiple connections to the outside world. A kernel measurement and debugging system uses a custom protocol layered on TCP, compiler-kernel communication uses raw TCP, and an interface to the X window system may use either TCP or UNIX domain streams.‡ Emerald is currently implemented under Berkeley UNIX on Vaxes and Suns. Emerald environments are currently running on local networks at the Universities of Arizona, Copenhagen and Washington. A small number of applications have been implemented, including a mail system, a shared calendar system, a file system, and a replicated name server. In addition, a number of load-sharing applications have been implemented to experiment with light-weight mobility.
†UNIX is a trademark of AT&T Bell Laboratories.
2.2 An overview of MLP
MLP is a system for constructing distributed, mixed-language programs. Specifically, the system is a simple RPC (remote procedure call) system that allows each procedure in a sequential program to be written in a different programming language and executed on any machine in a network. MLP is based on a type scheme called UTS (the Universal Type System), which is composed of a data representation standard and a type expression language. The type expression language is used as an interface definition language in a manner similar to the use of Courier in Xerox systems [17]; the data representation standard specifies the format of data transferred as arguments or results between procedures in a machine- and language-independent fashion. A data element represented in this format is known as a UTS value.
An MLP program is made up of components, each providing procedures written in a single programming language, known as the host language for that component. The interface to a component, that is, the number and type of parameters for each imported and exported procedure, is described by an interface specification written in the UTS type expression language. The interface specifications are used by the MLP stub generator for the particular host language to generate stub procedures for each imported or exported procedure; the stubs marshal data between host language format and UTS format as required, and interface the user's code with the MLP run-time system. The user-written procedures and the stub procedures are compiled using the standard language translator and then linked with the MLP library code and run-time to produce an executable component file.
‡This interface to X would more naturally be done now using MLP.
Components communicate by sending and receiving messages in UTS format. A call message contains a procedure identifier, an argument record, and a reply address. When idle, each component is blocked on the receipt of a message in the MLP run-time procedure mlp_server. The receipt of a call message results in invocation of the proper stub procedure, which unmarshals the arguments and calls the exported procedure. If this procedure in turn invokes an imported procedure, a host language call is performed to the import stub for that procedure. This stub then marshals the arguments, sends the call message, and enters a recursive invocation of mlp_server; this last action allows the component to service any new calls that might be generated by the continuing execution. When a reply message is received, the most recent invocation of mlp_server returns to the stub for the imported procedure, which unmarshals the return value and returns to the invoking procedure.
Messages are transmitted between components using streams, a message transport abstraction supported by MLP that allows use of multiple protocols. The mechanism for realizing this abstraction is the new type Stream. When a value of this type is created, it is initialized with the protocol to be used (e.g. TCP, UDP) and the address of the creating component in a form appropriate for the protocol. Stream values are then disseminated to other components that wish to communicate with the creating component. Adding a new transport protocol requires only adding to the definition of the Stream type; this change is transparent to the rest of the MLP system.
Execution of an MLP program is started and managed by a special server component, mlpx. The program is started by executing mlpx as a command, specifying the files containing the program components as arguments. mlpx performs three separate functions.
First, mlpx handles component initialization and finalization. When an MLP program is started, mlpx starts each component and enables communication by supplying a stream value with the address of the server; the component completes the startup exchange by sending its own stream address to the server. When an MLP program is finished, mlpx terminates all components in an orderly fashion. Second, mlpx provides a mapping between procedure names and the components implementing those procedures. Following startup, the server ascertains the exported procedures in the program by invoking a standard 'Export' procedure that is provided by each component. This procedure returns a component value, a data structure containing a description of the procedures exported by the component. Components communicate with mlpx to resolve references to imported procedures when each such procedure is first called. The third task of mlpx is to initiate execution of the MLP program. To do this, mlpx currently calls a designated procedure (usually main), terminating the MLP program when it returns.
In addition to handling straightforward cross-language calls with automatic data marshalling, MLP has facilities to allow explicit user control over data conversion. This provides the flexibility required for procedures whose arguments can vary in type from call to call, for procedure values passed as arguments, and for parameters of UTS types not supported by the host language. The basis for these facilities is the representative, a capability or "ticket" for an encoded UTS value maintained by the MLP system. A Representative data type is added to each MLP language; this type is used for parameters whose values are not to be marshalled automatically. To manipulate instances of type Representative, a collection of procedures is available as part of the MLP library.
These procedures provide the ability to marshal and unmarshal data under program control, to determine the type of the UTS value associated with a representative, and to invoke a procedure value held in a Representative.
Despite these advanced features, MLP is basically a simple system. This simplicity is deliberate; the primary goal of MLP is to support access to many languages, so the data types provided are those that are common to most languages and supportable in all. As a result, the more refined points of semantics are left to the programmer, aided by the provisions for explicit data conversion described above and by the flexibility of the type scheme. We have found that this philosophy results in a system that is simple, yet still useful in practice [13].
The version described above is the second version of the system. The previous version, which has a different internal structure, supports C, Pascal, and Icon. It has been in use since November 1986 on an interconnected collection of Vaxes and Suns running Berkeley UNIX. To date, it has been used to write a mail system, a small database system, and a collection of network-transparent plot routines.
3. RECONCILING PROGRAMMING MODELS
Integrating a programming language into a multi-language programming system requires reconciling the programming model of the language with the model supported by the system. In the case of Emerald and MLP, three specific issues must be addressed. The first is, of course, resolving differences between the object/operation model supported by Emerald and the procedural model supported by MLP. The second is mapping between Emerald's types and UTS, the type system used by MLP. The final issue is accommodating the concurrency introduced into MLP by the addition of Emerald to a system that had previously supported only sequential languages. Interestingly, it is this last issue that has proven perhaps most challenging.
3.1 Object-oriented vs procedure-oriented
The first step in resolving the difference between the two programming paradigms involves developing a mapping between the world of objects and operations supported by Emerald and the world of components and procedures supported by MLP. The simple mapping of an object to a component and an operation to a procedure has proven to be very workable. Under this mapping, each Emerald object participating in the execution of an MLP program appears as a component from outside of Emerald, while the non-Emerald MLP components appear as objects from within Emerald. It is worth mentioning that the success of the mapping between RPC system entities and objects depends greatly on the design of the RPC system. This mapping of objects to components is successful in MLP because components are represented both statically in the definitions of the pieces that make up the system and dynamically in the system's run-time data structures. In other RPC systems this mapping is less successful. For example, while HRPC, SunRPC, and Horus all statically group procedures into interfaces, this grouping is not represented at run-time, and thus objects cannot be represented at run-time either. The straightforward mapping between MLP and Emerald is possible primarily because the component/procedure hierarchy supported by MLP can be viewed as object-oriented in its own right.
Another related issue is the notion of a program itself. In MLP, a program is essentially a traditional command that is initiated from the command line by invocation of mlpx, calls a single "main" procedure, and terminates when that procedure returns. On the other hand, the idea of a program in Emerald is somewhat more amorphous due to the existence of a persistent object space; in particular, objects within that space can interact freely regardless of when they were actually compiled or incorporated into the Emerald system.
Emerald objects continue to exist until they are no longer referenced, at which point they are reclaimed by the garbage collector. The issue with respect to model reconciliation is which paradigm to support. Our solution is to support both paradigms, with the choice depending on whether the MLP program is initiated from the command line or from within Emerald. Section 4 elaborates on this point.
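The object/component correspondence described at the start of Section 3.1 can be illustrated with a short Python sketch, assuming a few simplifications. `Component` stands in for an MLP component value, meaning a name plus a table of exported procedures. `object_as_component` and `ComponentProxy` are hypothetical names and are not part of MLP or Emerald. Calls are dispatched in-process rather than marshalled into UTS call messages.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Component:
    """Hypothetical stand-in for an MLP component value: a name plus the
    procedures it exports, keyed by procedure name."""
    name: str
    exports: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def call(self, proc: str, *args: Any) -> Any:
        # A real MLP call would marshal args into UTS values and send a
        # call message; this sketch simply dispatches in-process.
        return self.exports[proc](*args)


def object_as_component(name: str, obj: Any, operations: List[str]) -> Component:
    """An Emerald-like object seen from outside: each exported operation
    becomes a component procedure bound to that particular object."""
    return Component(name, {op: getattr(obj, op) for op in operations})


class ComponentProxy:
    """A non-Emerald component seen from inside: invoking operation `op`
    on the proxy calls the component's procedure of the same name."""

    def __init__(self, component: Component) -> None:
        self._component = component

    def __getattr__(self, op: str) -> Callable[..., Any]:
        # Only reached for names not found normally, i.e. operation names.
        if op not in self._component.exports:
            raise AttributeError(op)
        return lambda *args: self._component.call(op, *args)


if __name__ == "__main__":
    class Counter:
        def __init__(self) -> None:
            self.total = 0

        def add(self, k: int) -> int:
            self.total += k
            return self.total

    counter = object_as_component("counter", Counter(), ["add"])
    print(counter.call("add", 5))          # 5
    print(ComponentProxy(counter).add(2))  # 7
```

Each exported procedure is bound to one specific object. This works only because, as noted above, MLP components exist at run-time and so give each object something to correspond to.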
3.2 Type systems
The second issue to be addressed in the integration of an object-oriented language into a system for mixed-language programming is reconciling the disparate type systems. One problem here is that object-oriented programming languages take a fundamentally different view of data than their procedure-oriented (or value-oriented) counterparts. In a value-oriented language, data is defined independently of any operations that manipulate it; in an object-oriented language, data and operations are defined in an integrated way. Hence, the mapping between the UTS types and the Emerald type structure must take these differences into account. Problems with identity and the encapsulation of data and operations can be ignored for the base types of UTS, like Integer and Float, as they map directly into predefined types of Emerald. The type constructors of UTS, Record and Array, also map in a straightforward way to their Emerald counterparts. However, this straightforward mapping of composite types is not perfect. Emerald's view of composite types as objects rather than values, combined with the value-result parameter passing mechanism employed by MLP, can cause problems in certain rare circumstances. For example, if an Emerald array, record, or other composite object is passed as an argument to an MLP procedure and later returned as the result of another operation, the identity of the object will be lost. That is, the object will re-appear in Emerald as a new object isomorphic with the old rather than as the old object itself. This can cause coherency problems with other shared references to the original object.
Another problem involves procedure values. As mentioned in Section 2.2, UTS provides a procedure data type that facilitates transmission of procedure-valued arguments between components. Unfortunately, this type does not map directly to Emerald since, like other truly object-oriented languages, Emerald has no notion of an independent procedure; Emerald operations can be understood only in the context of some object. The solution to this problem is to map a UTS procedure value into Emerald as a single-operation object, created when the UTS procedure value is decoded. Invocation of the operation results in a call of the represented MLP procedure. In the other direction, when encoding an Emerald operation into a UTS procedure value, both the object and the operation specifier for the operation must be given. Of course, it is also possible to represent an entire Emerald object as a component value.
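The two type-mapping issues of Section 3.2 can be illustrated with a small Python model built on explicit assumptions. Value-result passing is modelled with `copy.deepcopy`, and an encoded Emerald operation is modelled as a plain (object, operation name) pair. `ProcedureObject`, `decode_procedure`, `encode_operation` and `call_encoded` are hypothetical names, not MLP or Emerald APIs.

```python
import copy
from typing import Any, Callable, Tuple


def pass_value_result(value: Any) -> Any:
    """Model of MLP value-result passing: the callee sees (and returns) a
    copy of the argument, never the caller's object itself."""
    return copy.deepcopy(value)


class ProcedureObject:
    """A decoded UTS procedure value, seen from Emerald as an object with a
    single operation, `invoke`, which calls the represented procedure."""

    def __init__(self, procedure: Callable[..., Any]) -> None:
        self._procedure = procedure

    def invoke(self, *args: Any) -> Any:
        return self._procedure(*args)


def decode_procedure(procedure: Callable[..., Any]) -> ProcedureObject:
    # A new single-operation object is created each time a value is decoded.
    return ProcedureObject(procedure)


def encode_operation(obj: Any, operation: str) -> Tuple[Any, str]:
    """Encoding needs both the object and the operation specifier, since an
    Emerald operation has no meaning outside its object."""
    return (obj, operation)


def call_encoded(value: Tuple[Any, str], *args: Any) -> Any:
    obj, operation = value
    return getattr(obj, operation)(*args)


if __name__ == "__main__":
    original = {"x": 1, "y": [2, 3]}        # stands in for an Emerald record
    returned = pass_value_result(original)
    print(returned == original, returned is original)  # True False: identity lost

    double = decode_procedure(lambda n: 2 * n)
    print(double.invoke(21))                 # 42
    print(call_encoded(encode_operation([3, 1, 2], "index"), 2))  # 2
```

The first demo shows the isomorphic-but-distinct object described in the text. The other two show that each direction of the procedure mapping needs an object in the middle.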
3.3 Concurrency
The final major issue to be resolved in the area of programming models involves concurrency. Prior to the incorporation of Emerald, all languages supported by MLP were sequential, and hence a given MLP program had only a single thread of control. The introduction of Emerald, with its ability to spawn multiple threads, changes this situation by making it possible for invocations generated by independent threads to arrive concurrently at a sequential component. As a result, the programming model has to be modified to accommodate interactions between concurrent and sequential languages. We now survey three alternative solutions to this problem, discussing the positive and negative points of each possibility. The evaluation criteria used include both the correctness and coherence of the model presented to the programmer, and its effect on the complexity of the implementation.
The first alternative is to change the model seen by the programmer from a completely sequential one, as currently exists in MLP, to one that is completely concurrent. To do this, facilities are added to the sequential languages to allow creation of multiple threads and to support synchronization of these threads within the component. In addition to major changes in the various host languages, this alternative requires modifying the MLP runtime to support multiple threads. This approach has a number of drawbacks related both to the implementation cost and to the resulting programming model. The first is obvious: the expense of adding concurrency to each supported language and to the MLP runtime is prohibitive. However, the problems with the model are potentially more serious, since reverse-engineering concurrency into all sequential languages results in a collection of languages that tends towards homogeneity rather than specialization. This approach is counter to the intent of MLP as a system designed to exploit the specific features of its supported languages.
Moreover, the approach adopted in MLP is generally minimalist in nature; the intent of the system is merely to facilitate communication among existing languages, not to create new ones.
A second way to handle the coexistence of concurrent and sequential languages is to enforce complete separation of the programming models by ensuring that the semantics of calls to procedures in sequential components are unaffected by the presence of concurrent components. To achieve this, a thread invoking a procedure in a sequential component is given exclusive access to that component during execution of that procedure; in essence, the component is locked when the invocation is accepted and only unlocked when the invocation returns. Note that this approach implies that a component remains idle if execution of a procedure generates an external call to another MLP component, unless that call results in a recursive invocation of the original component.
While clearly more desirable than the first alternative, this approach also has several negative points. One is that leaving the component locked during an external invocation restricts concurrency without taking the semantics of the component into account. Only when the procedures of a sequential component are not re-entrant does the component require this sort of protection from concurrency. For a large class of sequential components, this is an unnecessary restriction of concurrency. Another problem is that implementing this approach in the MLP runtime system is not as easy as it might first appear. The basic requirement is that those invocations that cannot proceed because the component is locked be deferred until the appropriate return is executed. Although the actual deferring of calls is relatively easy, the problem is distinguishing which invocations must be deferred. Specifically, calls associated with the thread currently owning the lock (i.e. both directly and indirectly recursive calls) must be allowed to execute, while those calls associated with other threads must be blocked. To differentiate these two types of calls, some fraction of the previous call history must be encoded in each call packet and each reply packet. It is sufficient for each component to keep track of its own recursion level and to record in each call and reply message the recursion level of each component in the program. A minor difficulty with this scheme is that the space required to store this history information is not known until program initiation, as it depends on the number of components in the program.
The third approach, and the one we have chosen to adopt, is to support a degree of coexistence by allowing concurrent invocation of procedures in a sequential component, but without reverse-engineering concurrency features into the structure of the language itself. The implication is that the writer of an MLP program must ascertain the potential for concurrent invocations when writing a component in a sequential language and, if such invocations can occur, allow for their effects. Fortunately, dealing with such concurrency is usually not a problem in practice.
For one thing, the type of program in which this situation might arise is well-defined: an Emerald component must have multiple threads that generate invocations to the same external sequential component, either directly or indirectly. Moreover, the most common use of such a structure would be for cases where the sequential component is functioning as a "library" component, and in such cases writing re-entrant code that tolerates concurrent invocations is usually not difficult.
The most direct implementation of this approach would be to change the MLP runtime of each sequential language to support multiple threads of control. However, this strategy has the disadvantage of being expensive; it requires, in essence, adding a lightweight process facility to every MLP language. To simplify the implementation, we opted instead to implement a restricted form of threads in which context switching can occur only at the point of an external call or following a return. Since these are exactly the two places in which new invocations can be serviced in the standard existing runtime for MLP languages, this restriction allows the same recursive strategy to be used in the runtime as described in Section 2.2. That is, whenever an external call is made, mlp_server is invoked recursively to handle new invocations or reply messages, where the latter causes control to exit the topmost instantiation of mlp_server. Of course, the basic difference with the addition of Emerald is that an invocation waiting for service may now be associated with any one of many threads rather than just the thread that made the external call. This strategy for implementing multiple threads within a sequential component does suffer from one drawback, however.
When there is only a single thread in the program, it can be guaranteed that the external calls and resulting replies associated with a single component obey a last-in-first-out (LIFO) discipline. That is, a component waiting for a reply message associated with a particular external invocation can always assume that the first reply matches the invocation. Unfortunately, this LIFO discipline breaks down when pending invocations are from independent threads. The crux of the problem is that these threads may invoke a single MLP component which itself makes external calls to another component. To illustrate this problem, consider an MLP program with two independent threads t1 and t2, and two external components A and B; such a program is diagrammed in Fig. 1. The problem arises if both t1 and t2 invoke procedures within A at roughly the same time (invocations labeled 1 and 2 in Fig. 1). In the course of executing thread t1, suppose that an external call to a procedure in B is generated (invocation 3). While B is executing, A is idle, and so accepts the invocation associated with thread t2. Suppose further that execution of this procedure also generates an external call to a procedure in B (invocation 4). Since component B serializes incoming calls, this second call (from thread t2) must wait for the t1 call to complete before it can execute. The result is that A will receive the reply from B associated with t1 (reply 5) before the reply associated with t2, a violation of the last-in-first-out assumption.
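The ordering problem in the Fig. 1 scenario can be reproduced with a toy Python model based on two assumptions. First, A's outstanding external calls behave as a stack of recursive mlp_server frames. Second, B answers calls strictly in the order they arrive. The function name `lifo_violations` is invented for this sketch. It models only message order and is not part of the MLP runtime.

```python
from collections import deque
from typing import Deque, List, Tuple


def lifo_violations(issue_order: List[str]) -> List[Tuple[str, str]]:
    """Replay Fig. 1 style traffic between component A and component B.

    issue_order lists the threads on whose behalf A makes external calls
    to B, in the order the calls are made. A's pending calls form a stack
    (one recursive mlp_server frame each), while B serializes incoming
    calls and so replies in FIFO order. Returns (arriving reply, innermost
    pending call) for every reply that breaks the LIFO assumption.
    """
    pending: List[str] = []          # A's stack of frames, innermost last
    b_queue: Deque[str] = deque()    # B's service order: first come, first served
    for thread in issue_order:
        pending.append(thread)
        b_queue.append(thread)

    violations: List[Tuple[str, str]] = []
    while b_queue:
        reply = b_queue.popleft()
        innermost = pending[-1]
        if reply != innermost:
            violations.append((reply, innermost))
        # Bookkeeping only: assume the reply eventually reaches its own frame.
        pending.remove(reply)
    return violations


if __name__ == "__main__":
    print(lifo_violations(["t1"]))        # []: a single thread keeps LIFO order
    print(lifo_violations(["t1", "t2"]))  # [('t1', 't2')]: reply 5 arrives first
```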
Fig. 1. Concurrency violates LIFO ordering. [Diagram: Component A and Component B, with invocations 1 to 4 and reply 5.]

Solving this problem while retaining the simplicity of the existing run-time is relatively easy: defer the processing of any reply messages that arrive out of order. Thus, in the scenario from Fig. 1, the reply associated with t1 would be saved by component A and deferred until after the reply associated with t2 has been received and processed. The implementation of this strategy is straightforward. Each call is given a sequence number, and an out-of-order reply, i.e. one with a sequence number different from that of the current external invocation, is saved until the appropriate instantiation of mlp_server regains control. Before executing a blocking receive, mlp_server checks for a deferred reply by comparing the sequence numbers of the replies that have been received and stored against the number of the expected reply. For most programs, this approach yields results identical to those that could be achieved by implementing true multiple threads. To see this, consider again the scenario outlined above. With the deferred reply approach, t1 is forced to wait for t2 to finish. With multiple threads, this would not be the case: t1 could complete execution of its procedure and return before the component receives the reply associated with t2. However, in the absence of synchronization between t1 and t2 following their external calls from the component, the order in which the replies are received is indeterminate and depends solely on timing. In other words, both execution sequences are legal in the sense that both are possible given a component that supports multiple threads. Therefore, even though deferring replies restricts execution to LIFO order, the resulting sequence is a legal one that could also have occurred with multiply-threaded components.
It is, of course, possible to write MLP programs in which the execution sequence mandated by deferred replies yields behavior that would not be expected with true multiple threads. For example, if the execution times of two operations are vastly different, processing of the first reply may be deferred for a long time while waiting for the second reply to arrive.
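The deferral rule described above can be sketched as follows. This is a hypothetical Python model, not the MLP run-time: `DeferringComponent`, `call`, and `on_reply` are illustrative names, and the loop in `on_reply` stands for each instantiation of mlp_server checking the saved replies before it would otherwise block in a receive.

```python
from typing import Dict, List, Tuple


class DeferringComponent:
    """Hypothetical model of reply deferral by sequence number."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.outstanding: List[int] = []   # sequence numbers, innermost call last
        self.saved: Dict[int, str] = {}    # out-of-order replies awaiting their turn
        self.processed: List[Tuple[int, str]] = []

    def call(self) -> int:
        # Each external call is tagged with a fresh sequence number.
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding.append(seq)
        return seq

    def on_reply(self, seq: int, payload: str) -> None:
        # Store the reply, then let each instantiation whose expected reply is
        # already saved consume it and return, innermost first.
        self.saved[seq] = payload
        while self.outstanding and self.outstanding[-1] in self.saved:
            expected = self.outstanding.pop()
            self.processed.append((expected, self.saved.pop(expected)))


a = DeferringComponent()
t1_call = a.call()                   # sequence number 0
t2_call = a.call()                   # sequence number 1
a.on_reply(t1_call, "reply for t1")  # out of order: saved, nothing processed
a.on_reply(t2_call, "reply for t2")  # t2 finishes, then t1's saved reply is used
print(a.processed)  # [(1, 'reply for t2'), (0, 'reply for t1')]
```

The processing order is the LIFO sequence discussed in the text: t1's reply is held until t2's instantiation has consumed its own reply and returned.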
3.4 Summary

Object-oriented languages, especially those that are also concurrent, are a challenge to integrate into multi-language RPC systems that are targeted at sequential, procedure-oriented languages. The primary problem is integrating the programming models so that the resulting system preserves the unique features of each language while providing coherent access to those features from other languages. The more the new language differs from the existing languages, the greater the challenge. The additional goal of minimizing the effect on the existing infrastructure software only makes the task harder. Each of the three aspects of model resolution described above (object- vs procedure-orientation, type systems, and concurrency) has its subtleties, which often require compromises. For example, although the mapping between type systems seems straightforward in the case of Emerald and MLP, it has certain deficiencies from the point of view of both Emerald and the other languages in the system. Such compromises were also seen in the resolution of the concurrency issue; the solution we chose requires programmers of sequential components to be aware of potentially concurrent invocations, as well as of the particular context-switching strategy used to avoid excessive implementation complexity. While these choices are reasonable given that an MLP program is typically written by a single programmer, other strategies might be more appropriate for RPC systems in which modules are typically written by different individuals.
4. IMPLEMENTATION

Adding Emerald to the suite of languages supported by MLP required additions and changes to the implementations of MLP and especially of Emerald. This section surveys these changes. To do so, we first discuss how programming model differences influenced the structure of the implementation, then describe in more detail the required software additions and changes, and finally outline the run-time execution of the resulting system. Where applicable, we also highlight how the model reconciliation decisions described above have been carried out.

4.1 System structure
As emphasized in previous sections, MLP supports a procedure-centric programming model, whereas Emerald's model is object-centric. In addition, Emerald's implementation style, which uses a single UNIX process (the Emerald kernel) to support all the objects on a given machine, is very different from the one-UNIX-process-per-program model of procedure-centric languages. These differences led us to use an implementation style different from that used for procedural languages such as C and Icon. The general structure of an MLP component in such a language is shown on the left side of Fig. 2. In addition to the component body written by the programmer, the component includes three additional kinds of code. The first is the MLP run-time, which oversees the execution of the entire component; it contains routines that facilitate communication with other components, as well as a main procedure that performs initialization, services invocations, and manages the orderly termination of the component. The second is the MLP library, which contains the user-level routines mentioned in Section 2.2 plus low-level routines that encode and decode UTS values. The third consists of the stub procedures, created by the MLP stub generator, for each imported and exported procedure. These stubs perform the data conversion necessary to transform values between the host-language format and the UTS external representation. They also invoke the MLP run-time routines to exchange messages with other components.
Fig. 2. The structure of C and Emerald components. [Diagram: a C component (left) and Emerald components (right), connected by the network.]
Incorporating a new language into MLP requires constructing versions of these three pieces of software for the new language. For procedural languages, this task is usually easy. In fact, since their common programming model implies an identical overall system structure, the software can in most cases be constructed by simply modifying the code for an existing MLP language. Note, however, that this style of implementation relies on the procedure-centric nature of the host language; that is, procedures are the main structuring feature. This allows the stubs, run-time, MLP library, and component body to be compiled separately, relying on the linker to bind them together. Emerald, on the other hand, is object-centric, so the implementation style must be adapted to conform to this different programming model. For example, operations are not global; rather, they are interpreted in the context of the object on which they are invoked. This means that an MLP component containing an external invocation of an Emerald operation must be able to name, at run-time, the object containing the appropriate operation. To illustrate how Emerald objects fit within an MLP program, consider the representative Emerald component shown on the right side of Fig. 2.† The first features to note are the two Emerald objects that function as the stubs for incoming and outgoing calls, respectively. Each stub object contains one operation for each exported or imported operation in the associated Emerald object. Note also the MLPlib object. This object serves as the MLP library and contains operations that allow UTS values in representatives to be manipulated both by the stubs and explicitly by the user. The collection of the stub operations into stub objects and the definition of the MLPlib object reflect the difference in philosophy between Emerald and the other languages supported by MLP.
Emerald encourages the use of objects for structuring: defining stub objects and a shared MLPlib object is the natural Emerald way to provide access to these facilities. The use of objects as the basis for structuring did, however, cause problems in binding references among the component body, stubs, and MLP library. Referring back to Fig. 2, we see that the export stub must be able to refer to the body, and the body must be able to refer to the import stub. In addition, each activation of an Emerald MLP component causes the creation of a new object (and a new pair of stub objects). The only practical solution to this mutual dependency between dynamically created objects and stubs is to compile them together; a simple naming convention then allows the stubs and body to name one another. A second feature to note is that the functionality of the MLP run-time is implemented by the Emerald kernel; it is not part of the component as it is in the C component. This allows the MLP run-time to be shared among the various Emerald MLP components on a given machine. Finally, recall from Section 3.1 that our implementation is intended to support both the traditional notion of an MLP program and the more amorphous Emerald model of a program as a collection of freely interacting objects. In the first case, the mechanics of including an Emerald object in an MLP program are straightforward. The file containing the code for the object and a separate specification file containing the UTS import/export statements are processed by an Emerald stub generator and compiler to produce an executable component. Execution of the MLP program is then initiated by invoking mlpx from the command line. As with components written in other languages, the Emerald object is included in the MLP program by providing the name of its executable file as an argument to mlpx.
To support the Emerald notion of a program, an MLP component must be incorporated into the Emerald world in a way analogous to Emerald objects. A slight modification of the standard MLP method suffices. A programmer wishing to provide Emerald access to an MLP component written in another language must implement a stub object in Emerald that imports the operations provided by the component and exports to Emerald operations that merely call these imported operations. In addition, the programmer must implement an Emerald object that starts mlpx with the "foreign" component and the Emerald component just described as arguments. Note that when started from Emerald, mlpx need perform only two of its three functions: component management and name mapping. Program initiation by mlpx is not needed because the main procedure in this case is in Emerald code.

†The large oval delineating each Emerald component is only a logical boundary, as opposed to the large oval surrounding the C component, which represents the physical boundary of a UNIX process.
4.2 Software additions and modifications

Integrating Emerald and MLP required changes and additions to the implementations of both systems. The modifications to MLP were minimal, however: the addition of a new transport protocol (discussed below) and code to handle the deferred reply messages discussed in Section 3.3. Accordingly, we focus here on Emerald, discussing the Emerald stub generator and the modifications to the Emerald run-time system. Despite the differences in implementation style between MLP components in procedure-based languages and in object-based languages, some of the required software can still be adapted from existing languages. The best example is the Emerald MLP stub generator, which was easily constructed from the prototypical MLP stub generator. Instead of generating stub procedures for the imported and exported operations, the stub generator creates objects that can dynamically create stub objects in response to the creation of new components. The actual construction of the new stub generator took about a day. The prototypical MLP stub generator is divided into a language-independent front end that builds an internal representation of the UTS types contained in a specification, and a language-specific back end that walks this representation and generates the stubs. Since stubs are very straightforward, consisting mostly of calls to UTS routines that marshal and unmarshal parameters, the changes required to an existing stub generator were mostly syntactic. The more substantial modification involved incorporating the MLP run-time system into the Emerald run-time kernel. Our general approach to this task was to use existing MLP support code wherever possible, to write most of the remaining code in Emerald, and to minimize changes to the Emerald run-time system. Three specific issues are worth addressing. First is the problem of communication between the components that make up an MLP program.
MLP allows the use of multiple transport mechanisms, much like HRPC [8]. The extant transport protocols refer to components using addresses provided by the underlying operating system (in this case, UNIX socket addresses). In the case of Emerald, however, every object that is part of an MLP program on a given machine runs as part of the single Emerald kernel process on that machine. Extending the existing addressing scheme to the Emerald domain is not feasible because communication connections (sockets) are a scarce resource: a socket address per Emerald object would imply a socket per object, but UNIX limits the number of sockets that a single process (the Emerald kernel) may own. To overcome this problem, we added a new transport mechanism (that is, a new implementation of the MLP Stream abstraction) that includes two levels of addressing: a socket address for the Emerald kernel, and a second-level address understood by the kernel that identifies the specific object within Emerald. Since the kernel was already designed to handle multiple external connections, it was relatively easy to add another connection for messages associated with MLP programs. In concrete terms, an MLP stream built on TCP is addressed by the usual (IP address, TCP port number) pair, while an MLP stream for Emerald is addressed by an (IP address, TCP port number, Emerald object ID) triple. A message is delivered on a TCP stream by opening a TCP connection and sending the message over that connection; a message is delivered on an Emerald stream by opening a TCP connection to the Emerald kernel and sending the receiving object's ID followed by the message. The Emerald kernel then trivially demultiplexes each incoming message to the addressed object. The second issue is dealing with UTS values within Emerald. These values must conform to the external data representation prescribed by UTS.
Since this serialized form is not valid Emerald data, it is stored as BitChunks, an Emerald type that holds an otherwise uninterpreted sequence of bits. The MLP types required by the run-time system, such as those used for Streams and Representatives, are constructed using the standard Emerald type definition facility. The third issue relates to the initialization of an MLP component. When the server component mlpx is started, it is passed as arguments an executable file corresponding to each component that it is to create. For components written in procedure-based languages, this file is simply the executable consisting of the component linked with the stubs and the MLP run-time. For components written in Emerald, this file takes a different form. As mentioned in Section 2.1, the Emerald compiler already generates files that, when executed, communicate with the kernel and cause the dynamic loading of the required code and the creation of the objects specified in the compilation. These same files may be used to create an Emerald MLP component with one minor extension: they have been modified to accept an additional argument, namely the MLP stream value to be used in communication with this instance of mlpx. This argument is passed on to the kernel; the newly created Emerald objects then retrieve the stream from the kernel in order to complete the standard MLP component initialization.
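The two-level addressing of the Emerald transport (the first of the three issues above) might look roughly like this. The sketch is hypothetical: the 32-bit object ID in network byte order, the class and function names, and the handler-registration interface are assumptions made for illustration, and no real sockets are opened. The framing follows the description of sending the receiving object's ID followed by the message.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TcpStreamAddress:
    # Ordinary MLP stream: the usual (IP address, TCP port number) pair.
    ip: str
    port: int


@dataclass(frozen=True)
class EmeraldStreamAddress:
    ip: str          # host running the Emerald kernel
    port: int        # the kernel's TCP port on that host
    object_id: int   # second-level address, meaningful only to the kernel


OBJECT_ID = struct.Struct("!I")  # assumed: 32-bit unsigned ID, network byte order


def frame_for_emerald(addr: EmeraldStreamAddress, message: bytes) -> bytes:
    # Bytes that would be sent over a TCP connection to (addr.ip, addr.port).
    return OBJECT_ID.pack(addr.object_id) + message


class KernelDemultiplexer:
    """Hypothetical kernel-side dispatch of MLP messages to Emerald objects."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[[bytes], None]] = {}

    def register(self, object_id: int, handler: Callable[[bytes], None]) -> None:
        self.handlers[object_id] = handler

    def deliver(self, frame: bytes) -> None:
        # Strip the object ID and hand the rest of the frame to that object.
        (object_id,) = OBJECT_ID.unpack_from(frame, 0)
        self.handlers[object_id](frame[OBJECT_ID.size:])


received: Dict[int, bytes] = {}


def recorder(object_id: int) -> Callable[[bytes], None]:
    def handle(message: bytes) -> None:
        received[object_id] = message
    return handle


kernel = KernelDemultiplexer()
for oid in (7, 8):
    kernel.register(oid, recorder(oid))

addr = EmeraldStreamAddress("192.0.2.1", 4000, 8)
kernel.deliver(frame_for_emerald(addr, b"call message"))
print(received)  # {8: b'call message'}
```

Only the kernel needs a socket; any number of objects share it, which is the point of the second addressing level.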
4.3 System dynamics

The dynamic behavior of MLP programs after the changes described above includes three aspects: initialization, invocation, and termination. Initialization of an MLP program is much the same after the addition of Emerald as before. Once the mlpx server has been started, whether from the command line or from within Emerald, it causes the creation of the appropriate Emerald objects as outlined in Section 2.2. In particular, the object corresponding to an Emerald MLP component is created along with its stubs, and a stream value identifying the export stub is communicated to mlpx. The rest of the standard MLP initiation protocol is then executed, including the inquiry by the server to a standard "Export" operation, implemented by the export stub object, to determine the operations exported by this Emerald MLP component. The only change is that when mlpx is invoked from within Emerald, it does not execute a call to the "main" procedure, as described in Section 4.1. Following startup, invocations of imported procedures by an object, and of exported operations by procedures in other components, are handled much like any other MLP invocations. The major difference is that, as mentioned above, the Emerald kernel implements the functionality that a separate MLP run-time package provides in other languages. To illustrate the effects of this change, consider the invocation of an operation exported from an Emerald component. The incoming call message, which includes the arguments in the UTS external data representation format, arrives on the stream returned to mlpx when the Emerald component was initialized. The Emerald kernel creates Emerald BitChunk objects containing the UTS arguments, decodes the procedure identifier, and invokes the call operation in the export stub object named by the stream. The export stub then decodes the arguments from the BitChunk into Emerald values and invokes the real operation.
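The call path for an exported Emerald operation can be modeled in a few lines. This is an illustrative Python sketch, not Emerald code: `BitChunk` here is a plain byte container standing in for the Emerald type of that name, and the message layout (a 4-byte procedure identifier followed by the arguments), the use of big-endian 32-bit integers in place of the actual UTS representation, and the names `Adder`, `ExportStub`, and `kernel_dispatch` are all assumptions. The sketch also performs the result encoding for the reply.

```python
import struct
from typing import Callable, Dict


class BitChunk:
    """Stand-in for Emerald's BitChunk: bits the kernel does not interpret."""

    def __init__(self, data: bytes) -> None:
        self.data = bytes(data)


class Adder:
    """Hypothetical Emerald object whose operation is exported to MLP."""

    def add(self, a: int, b: int) -> int:
        return a + b


class ExportStub:
    """Hypothetical export stub: decodes arguments, invokes the real
    operation, and encodes the result for the reply."""

    def __init__(self, body: Adder) -> None:
        self.body = body
        # Procedure identifier -> stub operation (numbering is illustrative).
        self.operations: Dict[int, Callable[[BitChunk], BitChunk]] = {1: self.add_stub}

    def call(self, proc_id: int, args: BitChunk) -> BitChunk:
        return self.operations[proc_id](args)

    def add_stub(self, args: BitChunk) -> BitChunk:
        a, b = struct.unpack("!ii", args.data)      # external form -> host values
        result = self.body.add(a, b)                # invoke the real operation
        return BitChunk(struct.pack("!i", result))  # host value -> external form


def kernel_dispatch(stub: ExportStub, message: bytes) -> bytes:
    # The kernel wraps the arguments in a BitChunk, decodes the procedure
    # identifier, and invokes the stub's call operation.
    (proc_id,) = struct.unpack_from("!I", message, 0)
    reply = stub.call(proc_id, BitChunk(message[4:]))
    return reply.data  # would be sent on the reply stream named in the call message


message = struct.pack("!I", 1) + struct.pack("!ii", 2, 40)
reply = kernel_dispatch(ExportStub(Adder()), message)
print(struct.unpack("!i", reply)[0])  # 42
```

The kernel never interprets the argument bytes; only the stub, generated per operation, knows how to decode them.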
When that operation returns, the Emerald result values are encoded back into representatives, which are then passed back to the kernel for transmission on the reply stream contained in the call message. Invocations from Emerald to MLP are symmetrical; the reply stream passed in the call message identifies the import stub, which is the object that is prepared to receive the reply. Termination of an MLP program started from the command line is also unaffected by the addition of Emerald. However, when MLP components are integrated into an Emerald environment, termination becomes a serious problem, since it is not clear when such components can be terminated. Recall from Section 4.1 that incorporating an MLP component into Emerald relies on the existence of an Emerald object that serves as an interface for the component. As long as this interface object is referenced from within Emerald, new calls to the component may be made. Therefore, the appropriate time to destroy the component is when the interface object is no longer referenced. The Emerald garbage collector detects precisely these situations, destroying objects when all references to them cease to exist. However, the garbage collector does not inform objects of their impending destruction, so there is no way for the interface object to terminate its MLP components before it is destroyed. We are continuing to investigate modifications to the garbage collection strategy that will allow the orderly termination of MLP components started by Emerald.

5. CONCLUSIONS

In this paper, we have described how an object-oriented, distributed programming language can be incorporated into a multi-language RPC system based on sequential programs and procedures. In addition to describing how the integration of Emerald and MLP has been implemented, we have addressed the broader issue of reconciling programming models.
Of the three major issues that arose during this reconciliation process (object-oriented vs procedure-oriented models, type system mapping, and concurrency), it is clear in hindsight that the concurrency problems were the most challenging. Our approach of allowing concurrent invocations of procedures written in sequential languages seems a very reasonable solution to the inherent tensions that arise when concurrent and sequential languages interact within a single program. Future work aims at exploiting these features in the construction of realistic mixed-language applications. We have identified a number of applications that could benefit significantly from this approach. For example, one potential application of this work would be to build a name server similar to the one described in [18]. This name server provides uniform access to various name spaces, such as those found in the Domain name system and the "yellow pages" and "whois" services [19, 20]. It would be convenient to implement this server in Emerald by having an object for each different name space, each providing a name resolution operation. To be useful to the general population, the server must be callable from outside Emerald; to resolve names in a wide variety of name spaces, the Emerald objects must have access to system services such as yellow pages routines and Domain servers. MLP appears ideal for providing both kinds of functionality. Another attractive application is the construction of a user interface management system similar to the Squish system [21]. This system aims at developing a collection of graphical interfaces to UNIX procedures; the same technique could be used in Emerald to create a graphical interface to any MLP procedure. An object-oriented language such as Emerald is ideal for this, since it provides the concurrency and the flexibility needed to implement a wide variety of graphical objects that have similar procedural interfaces yet independent behavior. Finally, it is worth emphasizing that the implementation effort required to integrate Emerald and MLP was very reasonable.
The only changes to MLP were the addition of the Emerald transport protocol and the ability to defer replies, while for Emerald the changes consisted of some straightforward modifications to the Emerald run-time and the adaptation of an existing MLP stub generator to handle Emerald. Although this ease of incorporation is due in part to the specific designs of MLP and Emerald, we also view it as evidence that it is often possible to find more common ground between disparate programming languages and models than might be expected.

Acknowledgements. Special thanks to C. Jeffery for incorporating many of the Emerald run-time changes required for MLP. We also thank two anonymous referees for useful comments on an earlier version of the paper. This research has been sponsored by the National Science Foundation under grants CCR-87-01516 and CCR-88-11423 and the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under grant AFOSR-84-0072. Equipment used in the project was provided by the DoD University Research Instrumentation Program (URIP) under grant AFOSR-8'i-0089.
REFERENCES

1. Goldberg, A. and Robson, D. Smalltalk-80: The Language and its Implementation. Addison Wesley, Reading, MA; 1983.
2. Liskov, B., Snyder, A., Atkinson, R. and Schaffert, C. Abstraction mechanisms in CLU. Commun. ACM 20(8): 564-576; August 1977.
3. Black, A., Hutchinson, N., Jul, E. and Levy, H. Object structure in the Emerald system. Proc. ACM Conf. on Object-Oriented Programming Systems, Languages and Applications, Portland, OR, pp. 78-86; October 1986.
4. Black, A., Hutchinson, N., Jul, E., Levy, H. and Carter, L. Distribution and abstract types in Emerald. IEEE Trans. Software Engng SE-13(1): 65-76; January 1987.
5. Jul, E., Levy, H., Hutchinson, N. and Black, A. Fine-grained mobility in the Emerald system. ACM Trans. Comput. Syst. 6(1): 109-133; February 1988.
6. Schaffert, C., Cooper, T. and Wilpolt, C. Trellis Object-Based Environment: Language Manual. Technical Report DEC-TR-372, Digital Equipment Corporation, Hudson, MA; November 1985.
7. Griswold, R. and Griswold, M. The Icon Programming Language. Prentice Hall, Englewood Cliffs, NJ; 1982.
8. Bershad, B. N., Ching, D. T., Lazowska, E. D., Sanislo, J. and Schwartz, M. A remote procedure call facility for interconnecting heterogeneous computer systems. IEEE Trans. Software Engng SE-13(8): 880-894; August 1987.
9. Gibbons, P. B. A stub generator for multi-language RPC in heterogeneous environments. IEEE Trans. Software Engng SE-13(1): 77-87; January 1987.
10. Jones, M. B., Rashid, R. F. and Thompson, M. R. Matchmaker: An interface specification language for distributed processing. Proc. 12th ACM Symp. on Principles of Programming Languages, New Orleans, pp. 225-235; January 1985.
11. Liskov, B., Bloom, T., Gifford, D., Scheifler, R. and Weihl, W. Communication in the Mercury system. Programming Methodology Group Memo 59, MIT Laboratory for Computer Science; October 1987.
12. Hayes, R. and Schlichting, R. D. Facilitating mixed language programming in distributed systems. IEEE Trans. Software Engng SE-13(12): 1254-1264; December 1987.
13. Hayes, R., Manweiler, S. and Schlichting, R. D. A simple system for constructing distributed, mixed-language programs. Software Pract. Exper. 18(7): 641-660; July 1988.
14. Einarsson, B. and Gentleman, W. M. Mixed language programming. Software Pract. Exper. 14(4): 383-395; April 1984.
15. Einarsson, B. The structure of mixed language programming realization. Research Report LITH-IDA-R-85-01, Department of Computer and Information Science, Linkoping University; January 1985.
16. Sun Microsystems, Inc. SunView Programmer's Guide. Part No. 800-960-1300, Mountain View, CA; February 1986.
17. Xerox Corp. Courier: The Remote Procedure Call Protocol. Xerox System Integration Standard XSIS 038112, Xerox Corp., Stamford, CT; December 1981.
18. Bowman, M., Peterson, L. and Yeatts, A. Univers: An attribute-based name server. Software Pract. Exper. To appear.
19. Sun Microsystems, Inc. The Yellow Pages database service. In Network Programming, pp. 22-26. Part No. 800-1779-10, Mountain View, CA; May 1988.
20. Harrenstien, K., Stahl, M. K. and Feinler, E. J. NICNAME/WHOIS. Request for Comments 954, SRI International, Menlo Park, CA; October 1985.
21. Henry, T. R. and Hudson, S. E. Squish: A graphical shell for UNIX. Proc. Graphics Interface '88, Edmonton, Alberta, pp. 43-49; June 1988.

About the Author. ROGER HAYES received the B.Sc. degree in computer science from Portland State University in 1982, and the M.Sc. and Ph.D. degrees in computer science from the University of Arizona in 1984 and 1989, respectively. He is currently a Member of the Technical Staff of Sun Microsystems, Mountain View, California.

About the Author. NORMAN C. HUTCHINSON received the B.Sc. degree in computer science from the University of Calgary in 1982, and the M.Sc. and Ph.D. degrees in computer science from the University of Washington in 1985 and 1987, respectively. Dr Hutchinson then joined the faculty of the Department of Computer Science at the University of Arizona. His research interests include programming languages and operating systems, specifically those intended for distributed environments. He is a member of both the ACM and IEEE Computer Society.

About the Author. RICHARD D. SCHLICHTING received the B.A. degree in mathematics from the College of William and Mary in 1977, and the M.S. and Ph.D. degrees in computer science from Cornell University in 1979 and 1982, respectively. He is currently an Associate Professor in the Department of Computer Science at the University of Arizona. His research interests include distributed systems, fault-tolerant computing, and programming logics. Dr Schlichting is a member of the ACM and IEEE. He has also been a member of the IFIP Working Group 10.4 on Dependable Computing and Fault-Tolerance since 1986.