On the integration of Smalltalk and Java

On the integration of Smalltalk and Java

JID:SCICO AID:1628 /FLA [m3G; v 1.116; Prn:22/11/2013; 9:28] P.1 (1-17) Science of Computer Programming ••• (••••) •••–••• Contents lists available...

812KB Sizes 3 Downloads 44 Views

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.1 (1-17)

Science of Computer Programming ••• (••••) •••–•••

Contents lists available at ScienceDirect

Science of Computer Programming www.elsevier.com/locate/scico

On the integration of Smalltalk and Java Marcel Hlopko a,∗ , Jan Kurš b , Jan Vraný a,c , Claus Gittinger c a b c

Faculty of Information Technology, Czech Technical University in Prague, Thákurova 9, 160 00 Praha 6, Czech Republic Software Composition Group, Institut für Informatik, Universität Bern, Neubrückstrasse 10, CH 3012 Bern, Switzerland eXept Software AG, Talstraße 3, 74321 Bietigheim-Bissingen, Germany

h i g h l i g h t s • • • • •

We identify and solve problems imposed by Java and Smalltalk integration. STX:LIBJAVA, a JVM implementation built into Smalltalk/X is presented. Interoperability features of STX:LIBJAVA (e.g. dynamic proxy methods) are shown. We validated our implementation on various Java projects, e.g. Groovy or Tomcat. Performance comparation of Oracle JVM and STX:LIBJAVA is provided.

a r t i c l e

i n f o

Article history: Received 12 November 2012 Received in revised form 21 October 2013 Accepted 24 October 2013 Available online xxxx Keywords: Language interoperability Smalltalk Java

a b s t r a c t After decades of development in programming languages and programming environments, Smalltalk is still one of few environments that provide advanced features and is used in the industry. However, as Java became prevalent, the ability to call a Java code from Smalltalk became important. A traditional approach to integrate the Java and Smalltalk languages is through low-level communication between separate Java and Smalltalk virtual machines. To our best knowledge there is no other project attempting to execute and integrate the Java language directly in the Smalltalk environment. A direct integration allows for a very tight integration of the languages and their objects within a single environment. Yet integration and language interoperability impose challenging issues related to method naming conventions, method overloading, exception handling and thread-locking mechanisms. In this paper we describe ways to overcome these challenges and to integrate Java into the Smalltalk environment. We focus on a possibility to call a Java code from Smalltalk using standard Smalltalk idioms while the semantics of both languages remains preserved. We present stx:libjava – an implementation of a Java virtual machine within Smalltalk/X – as a validation of our approach. © 2013 Elsevier B.V. All rights reserved.

1. Introduction The Java programming language has become one of the most widely used programming languages today.1 A significant amount of code is written in Java, ranging from small libraries to large-scale application servers and business applications. Nevertheless, Smalltalk still provides a number of unique features (such as advanced reflection support or expressive exception mechanism) lacking in Java, which makes Smalltalk suitable for many kinds of projects. It is a tempting idea to call Java from Smalltalk and vice versa, as it enables the use of many Java libraries within Smalltalk projects.

* 1

Corresponding author. E-mail addresses: marcel.hlopko@fit.cvut.cz (M. Hlopko), [email protected] (J. Kurš), jan.vrany@fit.cvut.cz (J. Vraný), [email protected] (C. Gittinger). According to http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html.

0167-6423/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.scico.2013.10.011

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.2 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

2

Fig. 1. (a) Two languages using same intermediate program representation; (b) two languages each one using its own VM; (c) Smalltalk/X VM supporting two different languages and their intermediate program representation natively.

The idea of an integration of two languages is not new and has been explored several times in the past. In general, two different approaches have been used. One possible way is to compile one language into another’s virtual machine intermediate program representation (Fig. 1(a)). Examples of such an approach are JRuby,2 Scala3 or IronPython.4 Another example is Redline Smalltalk (Ladd [9]), which executes Smalltalk within Java virtual machine. Although this approach is popular, the disadvantage is that features not directly supported by other VM must be emulated on the object level and that affects the performance of an emulated language. Another approach is to run two virtual machines side by side and communicate via some kind of a communication channel (Fig. 1(b)). Examples of such an approach are JavaConnect (Brichau and De Roover [1]) and JNIPort (Geidel [4]) that uses foreign function interfaces, and bridges such as VisualAge for Java (Deupree and Weitzel [2]) or Expecco Java Interface Library (Expecco [3]) which communicate with a Java virtual machine over an interprocess communication channel. Proxy objects are used to intercept and forward function calls to the other environment. Yet another, more versatile approach, is to execute both languages within the same virtual machine and to use a common object representation for both (Fig. 1(c)). stx:libjava, which is presented in this paper, is an example of such an approach: it executes Java within the Smalltalk virtual machine. Seamless, easy to use integration of two programming languages consists of various parts. First, it must allow one language to call functions in the other, possibly passing argument objects and getting return values (runtime-level integration). Second, it should support a programmer-friendly argument and return value passing between the languages. Third, it should ideally preserve an object identity. The use of replicas or proxy objects can introduce various problems when objects are stored or managed by their identity. This also affects any side effects to such objects when calling functions in the other language. Finally, it should seamlessly integrate the two languages on the syntactic level, which means that (ideally) no additional glue or marshalling code should be required to call the other language (language-level integration). In the case of Java and Smalltalk, a language-level integration raises a number of challenges, due to their different design and semantics. In particular these are:

• Smalltalk uses keyword message selectors, whereas Java uses traditional C-like selectors (virtual function names). • Java supports a method overloading based on static types, whereas there is no static type information in Smalltalk. • Exception mechanisms differ. In Smalltalk the exception handler is executed before the stack unwind, on contrary to Java, where the exception handler is called after. This fact has a significant impact on the implementation of the stack handling in the virtual machine. • Locking mechanisms differ. In Smalltalk, locking is explicit and is provided by the standard library. In Java, locking is a part of the language and is handled by the runtime system, therefore implicit. In this paper, we present stx:libjava,5 a Java virtual machine (JVM) implementation built into the Smalltalk/X environment. stx:libjava allows the program to call a Java code from Smalltalk almost as naturally as a normal Smalltalk code. We will demonstrate how stx:libjava integrates Java into Smalltalk and we will describe how it deals with semantic differences. The contributions of this paper are (i) a new approach of a full Java and Smalltalk integration, (ii) an identification of problems imposed by such an integration and (iii) solutions for these problems and their practical validation in stx:libjava. The paper is organized as follows: Section 2 discusses integration problems in detail. Section 3 gives an overview of stx: libjava and its implementation. Sections 4, 5, 6, 7 and 8 describe techniques to solve these problems. Section 9 presents

2 3 4 5

http://jruby.org/. http://www.scala-lang.org. http://ironpython.net/. https://swing.fit.cvut.cz/projects/stx-libjava.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.3 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

3

real-world examples to validate our solution. Section 10 presents performance benchmarks. Section 11 discusses related work. Finally, Section 12 concludes this paper. 2. Problem description At first glance Smalltalk and Java are very similar. Both are high-level, object oriented, class-based languages with single inheritance and automatic memory management. In both languages message sending is the fundamental way of communication between objects. In this section we enumerate details by which these languages differ and which pose problems when integrating these languages into a common system. 1. Class access. In Smalltalk, classes are identified by name. There is only one class with given name at any time (although it may change over time) and it must be present and resolved when code is being executed. The system triggers a runtime error otherwise. The process of loading classes into the system is not standardized, although some Smalltalk dialects provide namespaces and/or a lazy class loading facility. On the other hand Java provides a well-defined, type-safe, extensible mechanism called classloaders for lazy-loading of classes into a running system. Moreover, in Java a class is not only identified by its name but by its defining classloader as well. In other words two possibly different classes with the same name may coexist in the running system as long as they have been loaded by different classloaders. Which classloader is used to load a particular class at a particular place in the code is subject to complex rules and depends on the runtime context. 2. Field access. In Java, public fields can be accessed directly using the dot-notation whereas in Smalltalk, values of instance variables could only be accessed using accessor methods. Although in modern Java, declaring instance fields public and accessing them directly is considered as a “bad style”, public static fields are often used in Java libraries to expose constant values. 3. Selector mismatch. On the bytecode level, a method is identified by a selector in both languages. However, the syntactic format of selectors differs. Smalltalk uses the same selector in a bytecode and in a source code. The Java language encodes the type information into a selector at the bytecode level. For example, consider the following code in Smalltalk: out println: 10

Let’s assume that out is an instance of the Java class java.io.PrintStream. Smalltalk compiles a message send with the selector println:. However, the Java selector as it would be used by Java compiler is println(I)V. The added suffix (I)V means that the method takes one argument of the Java type int and V means that the method does not return a value (it’s declared to return void). 4. Method overloading. Java has a concept of overloading: in a single class, multiple methods with the same name but with different numbers or types of arguments may coexist. For example, the PrintStream instance from the previous example has multiple methods by the name println, one taking an argument of type int, another taking argument of type String argument, and so on. On the virtual machine level, each overloaded method has a different selector (println(I)V, respectively println(Ljava/lang/String;)V). The method which is called depends on the static types of the arguments at the call site. At compile time the Java compiler finds the best match and generates a send with the particular selector. For example, the source code

out.println("String"); out.println(1); produces the following bytecode with type information encoded in method selectors (arguments of INVOKEVIRT instruction):

ALOAD ... LDC 1 INVOKEVIRT println(Ljava/lang/String;)V BIPUSH 1 INVOKEVIRT println(I)V Smalltalk is dynamically typed and no type information is known at the compile time. Therefore the Smalltalk compiler cannot generate message sends using type-specific selector as Java compiler does. 5. Protocol mismatch. Comparing Java and Smalltalk side by side, many classes with the similar functionality are found. For example both Java’s java.lang.String and Smalltalk’s String class represent a String type. More complex examples are java.util.Map and Smalltalk’s Dictionary class. Here, the classes have different names, possibly have different method names, but their purpose and usage are similar – both store objects under a particular key.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.4 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

4

When using code from both languages the programmer has to take care which kind of object (Java or Smalltalk) is passed as an argument. For example the hash of a string is obtained by sending the hashCode message in Java but hash in Smalltalk. Passing a Smalltalk String as an argument to a Java method which expects a Java-like String may result in a MethodNotFound exception. 6. Exceptions. The exception mechanisms in Smalltalk and Java differ in two very profound ways: First, in Smalltalk both exception and ensure-handlers are blocks. Blocks are self-contained first-class closures that can be manipulated and executed separately. In Java, exception and finally-handlers are syntactic entities and technically generate a sequence of bytecode instructions which are spliced into the instruction stream. They can be only executed by jumping to the beginning of the handler and then resuming the execution from that location. The method in question contains a special exception table which maps instruction ranges to the beginning of a particular handler. Second, Smalltalk provides resumable exceptions. When an exception is thrown (raised) in Smalltalk, a handler is executed on top of the context that caused that exception. The underlying stack frames are still present during the execution of the handler. If the handler returns, i.e., the exception is not “proceeded”,6 ensure blocks are executed after the exception handler. Java provides only return semantics. When a handler is found, all contexts up to the handler context are immediately unwound and the execution is resumed at the beginning of the handler. Any finally block is treated like a special exception handler, which matches any exception. In other words, from the programmer’s point of view the main difference between Smalltalk and Java exceptions is that in Smalltalk, ensure blocks are eventually executed after the handler, if the handler decides to return. In contrast, finally blocks of Java are always executed and their evaluation happens before the execution of the handler. A Smalltalk handler is even free to dynamically decide whether to proceed with execution after the exception is raised. Such proceedable exceptions are useful to continue execution after fixing a problem in the handler, or to implement queries or notifications (for example, to implement loggers or user interaction). 7. Synchronization. The mechanisms for process synchronization differ both in design and implementation. The principal synchronization mechanism in Java is a monitor. Conceptually a monitor is associated with every Java object. Synchronization is done using two basic operations: entering the monitor, which may imply a wait on its availability and leaving the monitor (Section 8.13, Lindholm and Yellin [10]). Monitor enter and leave operations must be properly nested. In Java, whole methods can be marked as synchronized, in which case the Java virtual machine itself is responsible for entering the monitor associated with the receiver and leaving it when the method returns. This happens for both normal and unexpected returns, for example due to unwinding after an uncaught exception. For synchronized blocks, the compiler emits MONITORENTER and MONITOREXIT bytecodes and ensures that no monitor remains entered, no matter how the method returns. (Section 8.4.3.6, Gosling et al. [6], Section 3.11.11, Lindholm and Yellin [10]). The compiler achieves this by generating special finally-handlers, which leave the monitor and rethrow the exception. In Smalltalk (Smalltalk/X, in particular), processes are usually synchronized by semaphores. The programmer is responsible for proper semaphore signaling, although library routines provide support for critical regions. Technically speaking, there is no support at the virtual machine level for semaphores, except for a few low-level primitive methods. Now consider the following code: 1 [ 2 self doSomething 3 ] on: SomethingFailedError do: [ :ex | 4 self handleError: ex 5 ]

Execution of doSomething can result in executing synchronized Java methods, and then calling a Smalltalk method. An exception can be thrown in Smalltalk, and when stack unwinds, all entered monitors have to be left for system to run correctly. A system which integrates Java and Smalltalk must therefore ensure that all monitors possibly entered during the execution of the doSomething method are left when a SomethingFailedError is thrown and caught by the handler. Since Smalltalk and Java exception and locking models differ, a special care has to be taken to ensure the system remains in a consistent state and locks are properly signaled. Solutions to above problems are discussed in details in following sections, namely:

• • • •

Class Access in Section 4 Field Access in Section 5 Selector Mismatch in Section 6.1 Method Overloading in Section 6.2

6 Smalltalk allows for exceptions to be “proceeded”, which effectively means that the execution is resumed at the point where the exception was thrown (raised).

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.5 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

1 2 3 4 5 6 7 8 9 10 11 12

5

factory := JAVA javax xml parsers SAXParserFactory newInstance. parser := factory newSAXParser getXMLReader. parser setContentHandler: JavaExamples::CDDatabaseHandler new. [ parser parse: ’cd.xml’. ] on: JAVA java io IOException do:[:ioe| Transcript showCR: ’I/O error: ’, ioe getMessage. ioe printStackTrace ] on: UserNotification do:[:un| Transcript showCR: un messageText. un proceed. ]

Listing 1: Smalltalk code calling Java XML parser.

• Protocol Mismatch in Section 6.3 • Exceptions in Section 7 • Synchronization in Section 8 3. STX : LIBJAVA stx:libjava is an implementation of the Java virtual machine built into the Smalltalk/X environment. In addition to providing the infrastructure to load and execute Java code, it also integrates Java into the Smalltalk development environment, including browsers, debugger and other tools. Calling Java from Smalltalk is almost natural. Listing 1 demonstrates how a Java library can be used from Smalltalk. In the example, an XML file is parsed and parsed data are printed on a system transcript. The XML file is parsed by the SAX parser, which is completely implemented in Java. However, the SAX events are processed by a Smalltalk object – an instance of CDDatabaseHandler (see Listing 2). This example demonstrates stx:libjava interoperability features:

• Java classes are referred to via a sequence of (unary) messages, which comprise the fully qualified name of a class. Java classes can be reached via the global variable JAVA. For example, to access java.io.File from Smalltalk, one may use7 : JAVA java io File

• To overcome selector mismatch errors, stx:libjava intercepts the first message send and automatically creates a dynamic proxy method. These dynamic proxy methods translate the selector from Smalltalk keyword form into a correctly typed Java method descriptor (or vice versa) and pass control to the corresponding Java (or Smalltalk) method. • stx:libjava provides a bridge between Java and Smalltalk exception models. Java exceptions can be handled by Smalltalk code. When a Smalltalk exception is thrown and the handler returns, all Java finally blocks between the raising context and handler context are correctly executed and all possibly entered monitors are left. No stale Java locks are left behind. 3.1. Architecture of stx:libjava In this section we will briefly outline stx:libjava’s internal architecture. Unlike other projects which integrate Java with other languages, stx:libjava does not use the original JVM in parallel with the host virtual machine, nor does it translate Java source code or Java bytecode to any other host language. Instead, the Smalltalk/X virtual machine is extended to support multiple bytecode sets and execute Java bytecode directly. To our knowledge, Smalltalk/X and stx:libjava is the only available programming environment that took this approach. The required infrastructure for loading .class (Chapter 4, Lindholm and Yellin [10]) files, class loader support and additional support for execution, such as native methods, is implemented in Smalltalk. Java runtime classes and methods are implemented as customized Smalltalk Behavior and Method objects. In particular, Java methods are represented as instances of subclasses of the Smalltalk Method class. However, they refer to the Java bytecode instead of the Smalltalk bytecode. Execution of the Java bytecode is implemented in the virtual machine. In the same way that the Smalltalk bytecode is handled by the VM, the Java bytecode is interpreted and/or dynamically compiled to the machine code (jitted). However, some complex instructions (such as CHECKCAST or MONITORENTER) are handled by the virtual machine calling back into the Smalltalk layer via a so-called trampoline and are implemented in Smalltalk as a library methods. Similarly, native methods are implemented in Smalltalk.

7

Alternatively, the class can also be referred to via a sub-namespace, as JAVA::java::io::File.

JID:SCICO AID:1628 /FLA

6

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.6 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

1 CDDatabaseHandler>>startElement:namespace localName:localName qName:qName attributes:attributes 2 tag := qName. 3 4 CDDatabaseHandler>>endElement:namespace localName:localName qName:qName 5 qName = ’cd’ ifTrue:[ 6 title isNil ifTrue:[self error: ’No title’]. 7 artist isNil ifTrue:[self error: ’No artist’]. 8 index := index + 1. 9 UserNotification notify: 10 (index printString , ’. ’, title , ’ - ’ , artist) 11 ] 12 13 CDDatabaseHandler>>characters: string offset: off length: len 14 tag = ’title’ ifTrue:[ 15 title := string copyFrom: off + 1 to: off + len. 16 tag := nil. 17 ]. 18 tag = ’artist’ ifTrue:[ 19 artist := string copyFrom: off + 1 to: off + len. 20 tag := nil. 21 ].

Listing 2: An excerpt of CDDatabaseParser used in Listing 1.

1 JavaVM 2 classForName: ’org.junit.TestCase’ 3 definedBy: classLoaderObject

Listing 3: Accessing Java class with known class loader.

1 cf := 2 * (JAVA java lang Math PI) * r

Listing 4: An example of accessing Java fields from Smalltalk. Both Smalltalk and Java objects live in the same object memory and are handled by the same object engine and garbage collector. Performance-wise, there is no difference between Smalltalk code calling a Java method or other Smalltalk code. Moreover, all dynamic features of the Smalltalk environment – such as stack reification and advanced reflection – can be used on the Java code. The main disadvantage of our approach (as opposed to having a separate original JVM execute Java bytecodes) is that the whole functionality of the Java virtual machine has to be reimplemented. This includes an extensive number of native methods, which indeed involve a lot of engineering work. However, we believe that this solution opens possibilities to a much tighter integration (as listed below) which would not be possible otherwise. 4. Class access In Smalltalk classes are stored by name in the global Smalltalk dictionary. Obviously, this dictionary cannot be reused by the Java subsystem, as Java classes are specified by name and defining class loader. Therefore loaded classes are accessed through JavaVM class. Listing 3 shows its usage in case of a known class loader instance. As already shown in Listing 1, the JAVA global variable is provided to refer to a Java class using the current class loader. Some approaches integrating JVM with the host virtual machine via foreign-function interfaces (e.g., JNIPort or JavaConnect) suffer from the inability to reclaim classes which have entered the JNI (Java native interface). In stx:libjava all loaded classes are reclaimed in compliance with the JVM specification (Section 12.7, Gosling et al. [6]). 5. Field access To solve the field access problem, stx:libjava allows for public fields to be accessed from Smalltalk in a Smalltalk way, i.e., by access methods. These accessor methods are dynamically compiled when needed using mechanism described in Section 6. Consider the example given in the Listing 4 in which public static field PI of a class java.lang.Math is accessed. The stx:libjava interoperability mechanism catches the send of the message #PI to the class and automatically generates a getter method returning the corresponding field. Accessor methods are generated only for Java fields declared as public. If the field is declared as final, only a getter method is generated. The same mechanism is used to access both static and instance fields.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.7 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

7

1 out := JAVA java lang System out. 2 out toString.

Listing 5: Example of selector mismatch.

1 java.io.PrintStream>>toString 2 ↑ self perform: #’toString()Ljava/lang/String;’

Listing 6: A proxy method for println() Java method.

1 out := JAVA java lang System out. 2 out println: 10. 3 out println: ’Hello World\n’.

Listing 7: An example of overloaded method called from Smalltalk. If there is a method without any parameters and with the same name as the field, an error is reported as it is not clear whether Smalltalk code accesses the field or calls a method. However, such cases are extremely rare in a real Java code. 6. Dynamic Proxy Methods Dynamic Proxy Method is a mechanism employed by stx:libjava to solve selector and protocol mismatch and to deal with Java’s method overloading. A proxy method is an intermediate method that possibly performs an additional method resolution, transforms arguments and finally passes the control to a real method depending on the type and number of arguments. Such a proxy is generated dynamically whenever the control flow crosses the language boundary, i.e., when Smalltalk calls Java or vice versa. In the following sections we describe in detail how dynamic method proxies solve the problems outlined in the previous sections. We will demonstrate proxies on examples of Smalltalk calling Java; the actions performed in the opposite call direction are analogous. 6.1. Selector mismatch Consider the example given in Listing 5. Without additional interoperability support, a DoesNotUnderstand exception would be raised, since there is obviously no method for the toString selector in the Java PrintStream class, there is only a method with the toString()Ljava/lang/String; selector. The stx:libjava’s interoperability mechanism catches cross-language message sends and dynamically generates a dynamic proxy method for the original selector which invokes target method – toString()Ljava/lang/String; in this case. The code of the proxy is shown on Listing 6. The proxy is compiled on the fly and installed into the receiver’s class and the original message-send is restarted. This way a proxy method is generated only for the very first time and subsequent sends will use the “fast path”, directly invoking the already generated proxy. Details on how sends are intercepted are discussed below in Section 6.5. The algorithm used to select the target method is discussed later in Section 6.4 6.2. Method overloading To exercise method overloading, we alter the previous example by adding two subsequent calls to an overloaded method as depicted in Listing 7. The first call should invoke Java method println(int), the second one println(String). After the execution of line 2, a new proxy method has been added to the java.io.PrintStream class. Based on previous example, one may expect a proxy method like the one shown in Listing 6. Such a proxy method will not work correctly in this example as it would invoke Java method println(int) with an argument of type String. Due to the dynamic nature of Smalltalk, argument types cannot be statically inferred and may even change during execution. Therefore, another method resolution step has to be added to the proxy method. To ensure type-safety (as required and assumed by the Java code) the actual call to the Java method is protected by a “guard” that checks the actual argument types. The code for the println: proxy after execution of line 2 is shown in Listing 8. Only one guard is added at a time. Line 5 ensures that if no guard matches – like when line 3 of the example from Listing 7 is executed – the proxy is recompiled, possibly adding a new guard. Line 6 restarts the send. This way, the proxy method contains dispatching code (a guard and a target method call) only for overloaded methods that are dynamically encountered. An alternative implementation based on multiple dispatch could be implemented by dynamically installed double dispatch methods in the encountered argument classes. However, as the number of dynamically encountered argument types is usually relatively small, we believe that the switch code on the argument class is usually sufficient and faster.

JID:SCICO AID:1628 /FLA

8

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.8 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

1 java.io.PrintStream>>println: a1 2 (a1 class == SmallInteger) ifTrue:[ 3 ↑ self perform: #’println(I)V’ with: a1 4 ]. 5 self recompile: a1. 6 ↑ self println: a1.

Listing 8: A proxy method for println() Java method.

1 java.io.PrintStream>>println: a1 2 | a1j | 3 4 (a1 class == SmallInteger) ifTrue:[ 5 ↑ self perform: #’println(I)V’ with: a1 6 ]. 7 (a1 class == String) ifTrue:[ 8 a1j := (Java classForName: ’java.lang.String’) javaWrap: a1. 9 ↑self perform: #’println(Ljava/lang/String;)V’ with: a1j. 10 ] 11 12 self recompile: a1. 13 ↑ self println: a1.

Listing 9: A proxy method with argument and return value conversions. Although it might seem that a polymorphic behavior is lost when concrete classes are hard coded in the method, it precisely mimics the behavior of the Java compiler when handling overloaded methods. Because Java is statically typed language, which overloaded method will be called can be statically determined. The only difference is that Java compiler solves the problem at compile time, and our proxy methods implement lazy, dynamically-updated resolving of the overloaded methods at runtime. 6.3. Protocol mismatch Unfortunately in real applications, it is not enough only to translate a Java selector to a Smalltalk selector and vice versa, and solve method overloads. A protocol – a set of methods an object understands – differs. Smalltalk code can send a #is Empty message to a java.lang.String (because it contains the isEmpty() method), but it cannot send #collect All:, as there is no such method in a Java string. Similarly, to store a value under some key into an instance of Smalltalk Dictionary one should use at:put: method whereas in Java the name of the method is just put. stx:libjava provides two different ways how to cope with mismatching protocols: automatic conversion and cross-language method extensions. Automatic conversion. When crossing a language boundary, certain types are automatically converted to the corresponding form for the target language. This conversion is done in the dynamic proxy method. Let’s take the previous example (Listing 7). To execute line 3 of the example, one cannot simply invoke Java method println(String) as the Java method expects instance of java.lang.String, not a Smalltalk String. Hence the Smalltalk string must be converted to a Java string before the Java method is executed. Listing 9 shows the code of the dynamic proxy method after execution of line 3 of Example 7. If the method returns a value, its return value is converted back if required. Currently, stx:libjava does the automatic conversion for all Java primitive and wrapper types and for strings. However, any type could be automatically converted as long as it implements javaWrap: and javaUnwrap: methods. Object conversion by principle does not preserve object identity and therefore cannot be used when identity matters. Our experience shows, that in case of wrapper objects and strings this is not an issue in practice. Cross-language method extensions. stx:libjava allows for extension methods to be defined on Java class. In other words, any Java class can be extended by a Smalltalk method. While it is technically possible to extend Smalltalk class with a Java method (i.e., there is no limitation in the underlying runtime system), due to the limitations of Java language and existing Java compilers there is currently no way how to compile such an extension method. 6.4. Method resolution There are numerous ways how to select Java method for given Smalltalk selector or vice versa. This section does not discuss any pros and cons of possible approaches. We do not believe that there is only one best way how to translate selectors. stx:libjava simply uses the way which showed to be the most natural, intuitive and easy for our purpose. stx:

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.9 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

9

resolve-s2j(receiver, selector, arguments) { candidates = { } matches = { } name = first-keyword-of(selector) nargs = num-args(selector) for m ∈ all-methods-of(receiver.class) { if (is-java-method(m) ∧ m.name = name ∧ m.nargs = nargs) { candidates = candidates ∪ { m } } } if (not-empty(candidates)) { for m ∈ candidates { for formal ∈ formal-arguments-types-of(m), actual ∈ actual-arguments-types-of(arguments) { if match(actual, formal) { matches = matches ∪ { m } } } } return (candidates, matches) }

Listing 10: A Smalltalk to Java Method Resolution Algorithm.

libjava design allows for an easy changes of the actual method resolution algorithm. This section describes the resolution algorithm that is currently used in stx:libjava. Smalltalk to Java method selection. Listing 10 shows an outline of a method resolution routine used when Smalltalk code sends a message to Java object.8 This routine is called whenever a target method in the other language has to be resolved. It returns a pair of sets: candidates and matches. Candidates is a set of all eligible methods based on given selector and number of arguments. For Java, candidates contains all overloaded methods for given selector regardless of parameter types. The second set – matches – contains only those candidates whose formal parameter types match actual parameter types. Note, that this routine is called in runtime so actual parameters are known. If returned matches set is empty, then message is not understood and #doesNotUnderstand: message is sent as usual. If matches contains more than one method, then the send is ambiguous and an error is reported. Otherwise, a proxy method is generated or updated, calling the matching method. The candidates set is used by proxy method generator to determine if there are overloaded methods. If not, then no guard is compiled (like in Example 6), resulting in faster cross-language call for non-overloaded methods. The actual method resolution algorithm is the following: 1. First, a set of candidates is computed (lines 4–15 in Listing 10). A method is considered to be a candidate if number of arguments matches and if method name is the same as first keyword of the Smalltalk selector. For example, Java method declared as put(Object k, Object v) would match both #put:value: and #put:with:. The test whether the method is actually a Java method is required as Java classes may contain Smalltalk extensions. 2. Second, a set of matching methods is computed (lines 12–20). A method is considered matching if actual parameter types match formal parameter types. Type matching rules are straightforward. Java primitive types are matched by primitive values and wrapper objects (e.g., formal parameter type int is matched by instances of java.lang. Integer and by instances of SmallInteger). Smalltalk types match java.lang.Object, and for everything else the same rules as for casting (Section 5.5, Gosling et al. [6]) are applied. If the candidates set contains two methods declared as foo(int i) and foo(Integer i) and a primitive integer (an instance of SmallInteger in Smalltalk) is passed, then both Java method match. This results in ambiguous message send error. Same happens if there is a subclass-superclass relationship between types of overloaded methods – both will match and thus lead into runtime error. Again, technically stx:libjava can choose either method and do a best-match but we think that would lead into confusing and non-intuitive behavior. Java to Smalltalk resolution. Listing 11 shows an outline of a routine used when a Java code sends a message to a Smalltalk object. The interface is the same as in the previous case – it takes a receiver, a selector and arguments and returns a pair of candidates and matches. In Smalltalk there is no notion of overloaded methods so candidates will never contain more

8

Here we use a C-like pseudo code to distinguish it from other examples. In stx:libjava, the actual implementation is written in Smalltalk.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.10 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

resolve-j2s(receiver, selector, arguments) { candidates = { } matches = { } for m ∈ all-methods-of(receiver.class) { if (get-annotation(m, "javaselector") = selector ∧ m.nargs = nargs) { candidates = candidates ∪ { m } } } if (empty(candidates)) { name = name-part-of(selector) nargs = num-args(selector) for m ∈ all-methods-of(receiver.class) { if (first-keyword-of(m.name) = name ∧ m.nargs = nargs) { candidates = candidates ∪ { m } } } } matches = candidates return (candidates, matches) }

Listing 11: A Java to Smalltalk Method Resolution Algorithm. than one method. Also, as there is no static type information in Smalltalk, any candidate also matches. However to make the system more orthogonal and extensible, both sets are returned as in previous case. The actual method resolution algorithm is the following: 1. First, it searches all methods for those annotated by a special annotation (lines 5–9). Any method that is explicitly annotated and the annotation value matches the Java selector is considered a matching method and returned. This allows for “method aliasing” – for example, one can annotate Smalltalk method #printString to match Java selector toString()L...String; so invoking toString() in Java on Smalltalk object would invoke Smalltalk #printString. This can be also used to simulate overloaded methods on Smalltalk objects when used from Java code. 2. If no method is explicitly annotated, methods are searched by matching their names and number of arguments (lines 11–19) in the same spirit as in the previous case. Note, that for a Java selector at(Object k, Object v) both #at:put: and #at:ifAbsent: Smalltalk methods would match and therefore would lead into an ambiguous message send. To avoid ambiguity, programmer may annotate those methods explicitly so they are found in the first step. Variable number of arguments. Starting with Java 1.6, Java supports a variable number of arguments (Section 8.4.1, Gosling et al. [6]). Technically, variable arguments are wrapped into an array object and passed to the method as a single argument. A Java compiler is responsible for generating code that wraps variable arguments. stx:libjava does not provide any special facilities to handle variable argument method when called from Smalltalk – the programmer must manually pass an array. stx:libjava could do it automatically but we think that would lead into non-intuitive behavior. 6.5. Intercepting the message send To install a proxy, a message send must be intercepted. A standard Smalltalk solution would be to override the

#doesNotUnderstand: method so it creates and installs the generated proxy method. However, stx:libjava utilizes the method lookup meta-object protocol (MOP; Vraný et al. [12]) which is integrated into the Smalltalk/X virtual machine. The MOP allows for a user-defined method lookup routine to be specified on a per-class basis. This user-defined method lookup routine installs the proxy and invokes it. 7. Exceptions As already mentioned in Section 2, the exceptions in Java and Smalltalk differ in multiple important aspects. Concrete example and a deeper explanation of the problem is presented in Hlopko et al. [7]. In this section we present the concrete algorithm for handling both Smalltalk and Java exceptions in the single environment while preserving the individual semantics. Listing 12 shows the algorithm for throwing Smalltalk exceptions in the pseudo code. Similarly Listing 13 shows the algorithm for Java exceptions.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.11 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

11

1 throw-st-exception(ex, this-context) { 2 for each context in context chain having ensure-handler { 3 if (is-java-context(context)) { 4 reset-for-java-exception(context, create-special-finally-token(ex)) 5 unwind-and-restart-java-exception(context) 6 } else { 7 (value, action) = eval-handler(context) 8 switch (action) { 9 case ’proceed’ : 10 return-in(this-context, value) 11 case ’reject’ : 12 continue 13 case ’return’ : 14 eval-unwind-handlers-between(this-context, context) 15 unwind-stack-up-to(context) 16 return-in(handler-context, value) 17 case ’restart’ : 18 eval-unwind-handlers-between(this-context, context) 19 unwind-stack-up-to(context) 20 restart-context(context) 21 } 22 } 23 } 24 }

Listing 12: Smalltalk Exception Throwing Algorithm.

1 throw-java-exception(ex, this-context) { 2 if (is-special-finally-token(ex)) { 3 continue-unwinding-the-stack(ex) 4 } 5 6 handler-context = find-handler-for(ex, this-context) 7 8 if (is-java-context(handler-context)) { 9 reset-for-java-exception(handler-context, ex) 10 unwind-and-restart-for-java-exception(handler-context) 11 } else { 12 value = eval-handler(handler-context) 13 eval-unwind-handlers-between(this-context, handler-context) 14 unwind-stack-up-to(handler-context) 15 return-in(handler-context, value) 16 } 17 }

Listing 13: Java Exception Throwing Algorithm. Lines 8–20 of Listing 12 show the standard handling of Smalltalk exceptions. A Smalltalk exception handler can proceed the exception, meaning the execution will continue in the context which raised the exception (line 10), reject the exception, meaning next handler higher on the stack has to be found (line 12), return, meaning the execution will continue in the context which contains the handler (lines 14–16), and finally restart, meaning the context containing the handler is restarted (lines 18–20). Both return and restart actions do unwind the intermediate contexts. A Smalltalk context can register an unwind-handler, which is called when the stack is unwound. When a handler returns or restarts these unwind-handlers have to be called. The function eval-unwind-handlers-between is responsible for this. When a Java exception is thrown, a handler context is found (line 6 in Listing 13). If a Smalltalk context is found (lines 12–15), the return action is performed, as that is the semantically correct behavior of Java exceptions. Unwind-handlers are executed (line 13) and the intermediate contexts are unwound (line 14). The execution continues in the context which contains the handler (line 15). When a Java context is found, the handling is more complicated. Since the Smalltalk/X virtual machine uses C-like stack for performance reasons and because the Java exception handlers are not self-contained callable objects, the only way to execute the finally code is to (i) set the program counter to the beginning of the handler and (ii) restart the method. reset-for-java-exception and unwind-and-restart-java-exception shown on lines 9 and 10 in Listing 13 are responsible for that. The problem arises when a Smalltalk exception is thrown, and there are Java contexts with finally-handlers on the stack. If the execution after the exception is returned or restarted, the stack needs to be unwound. Normally in Smalltalk the unwinding is controlled by the context raising the exception. This is not possible in our situation, we need to restart all Java

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.12 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

12

Fig. 2. Example of synchronized block in Java.

contexts with finally handlers for the handlers to be evaluated, and restarting implies that the context raising the exception must be destroyed. We need to restart the Java context, but after the finally-handler ends, we need to continue to execute remaining handlers. To solve this problem two facts are exploited. First, at the time when Java finally handler is executed, we already know that the contexts below the handler context are going to be destroyed. Second, Java finally blocks have no access to the exception being thrown. Java compiler generates synthetic local variable slot where the exception is temporarily stored and emits ALOAD and ATHROW instructions at the end of the finally-handler. These instructions push the exception back onto the stack and rethrow it. The exception object can be any object, not necessarily a Java object inheriting from Throwable. Our solution is to create a special finally token (line 4 in Listing 12) which is used as an exception and which is passed to the finally handler. When the handler execution reaches the ATHROW instruction, this special finally token is recognized by the ATHROW instruction handling code in the Smalltalk/X virtual machine (lines 2–4 in Listing 13). Upon detection of that special token, ATHROW handling code knows it should not raise the exception, but it has to continue to evaluate finally and ensure blocks (line 3 in Listing 13). 8. Synchronization The Java runtime library has been designed to be thread-safe. Critical sections guarded by monitors are pervasive through the code. Correct handling of monitors in case of mixed Java–Smalltalk code is essential as a single leftover monitor can easily result in blocking the whole system. Prior to entering a critical section, a monitor is entered, and must be left at the end of the section. This is done either implicitly by the virtual machine (when the whole method is marked as synchronized) or explicitly using special MONITORENTER and MONITOREXIT instructions. Those implement fine-grain synchronization of Java’s synchronized blocks. 8.1. Synchronized blocks When an exception occurs during the execution of a synchronized block, a guarding monitor must be left before the control flow is passed to a handler higher up on an execution stack.9 To ensure this, a Java compiler must generate a synthetic finally block in which the monitor is left and the exception is rethrown (Section 7.13, Lindholm and Yellin [10]). Fig. 2 shows an example of such a finally block. Consider the code shown in Fig. 2. Because an exception could be thrown in the synchronized block (bytecode lines 2–10), the compiler generates a special handler (bytecode lines 11–15, Fig. 2), in which the monitor is left and the exception is rethrown. 8.2. Synchronized methods In the case of synchronized methods, neither MONITORENTER / MONITOREXIT instructions, nor synthetic finally blocks are generated by the Java compiler. Instead, synchronization is done directly by the virtual machine. The monitor used for a synchronization is the monitor associated with the receiver or (in case of a static method) with the class.

9

Assuming that the stack grows downwards.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.13 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

13

Prior to executing a synchronized method, the virtual machine enters the monitor associated with the receiver and tags the context as an unwind context. The VM intercepts returns through such tagged contexts and trampolines into the Smalltalk level for any unwind actions to be executed. In this case, the monitor-leave semantic is performed. 9. STX : LIBJAVA at work As mentioned in Section 1, language integration consists of runtime-level and language-level integration. Runtime-level integration. Several reasonably large Java projects have been chosen as benchmarks to validate the correct implementation of Java runtime support. These projects include:

• Apache Tomcat10 – a Servlet/JSP container, • SAXON11 – an XSLT and XQuery processor, • Groovy12 – a dynamic language for Java platform. Apache Tomcat makes heavy use of almost all Java features, from threads and synchronization, through dynamic class generation, class loading and finalization, to exceptions and finally blocks. Tomcat running on stx:libjava is able to start and stop cleanly, serve applications, deploy a new application while running and so on. Groovy makes heavy use of Java reflection and especially class loading as it dynamically generates Java bytecode and loads it into a running system. SAXON has been successfully used to transform the whole Shakespeare’s work into a set of HTML files using custom XSLT stylesheets. Language-level integration. Language-level interoperability was already demonstrated in Listings 1 and 2, which use the Xerces XML parser to parse an XML file. The filename is passed into the Java method as a Smalltalk string. The SAX handler which is passed to the parser and later called for decoded elements is actually implemented in Smalltalk. No boilerplate code is needed whatsoever. Tool support. A Java implementation in Smalltalk would not be complete without support in the development tools. Java classes can be browsed using the standard Smalltalk class browser, and Java objects or classes can be inspected in the Smalltalk inspectors. For programmer convenience, specialized inspectors are provided for specific Java classes such as Vector, List, Set or Map. Java is also fully supported by the debugger – breakpoints may be set on methods and Java code can be single stepped and debugged just like Smalltalk. A Groovy interpreter has been integrated into the Workspace application and into the debugger so programmers can quickly test Java code (doIt), in just the way they are used to with Smalltalk. Also, JUnit13 has been integrated into the standard tools to allow JUnit tests to be run directly from Smalltalk/X IDE.14 Java classes can be unloaded and then a possibly new version can be loaded again at run-time – a feature missing in other Java virtual machines. Another feature not currently present in other Java virtual machines is a runtime code update support. An existing Java class can be edited and then “accepted” in the class browser. A standard Java compiler is then invoked and an updated class is reloaded into the running system (Hlopko et al. [8]). 10. Performance analysis A natural question to raise concerns about an actual performance of stx:libjava. This section presents some performance analysis comparing stx:libjava with: (a) (b) (c) (d)

with with with with

a a a a

Smalltalk code running on Smalltalk/X VM (the same VM used to run stx:libjava) Java code running on Oracle JVM 1.6.0_27 Ruby code running on Ruby 1.9 VM (known as YARV) Ruby code running on JRuby 1.7 on Oracle JVM 1.6.0_27

Setup. The computer used for following experiments was i5-3360M 2.00 GHz, 12 GB RAM, running Linux 3.5.0 and a Smalltalk/X 6.2.3β 15 using an experimental, non-inlining JIT compiler for Java. Each benchmark was given a warm-up execution before the actual time measurement was performed to allow for caches to be filled and JIT compilers to adapt and generate optimal code. All benchmarks were run 1000 times. We took the

10 11 12 13 14 15

http://tomcat.apache.org. http://saxon.sourceforge.com. http://groovy.codehaus.com. www.junit.org. https://swing.fit.cvut.cz/projects/stx-libjava/wiki/Media. https://swing.fit.cvut.cz/jenkins/job/stx_jv/.

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.14 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

14

Table 1 Performance comparison of Oracle JVM, Smalltalk/X and stx:libjava. Benchmark

JVM [msec]

StX-S [msec]

StX-J [msec]

Ruby [msec]

JRuby [msec]

StX-J/JVM [1]

StX-J/StX-S [1]

Ruby/StX-S [1]

JRuby/JVM [1]

Ackerman Ary Hash Strcat

5 69 8 180

36 1352 4 559

38 2097 471 552

1757 13 404 8 500

1120 15 153 106 899

7.60 30.39 58.88 3.07

1.06 1.55 117.75 0.99

48.81 9.91 2.00 0.89

224.00 219.61 13.25 4.99

Groovy Saxon

56 109

N/A N/A

201 224

N/A N/A

N/A N/A

N/A N/A

N/A N/A

N/A N/A

3.59 2.06

Legend: JVM – time to run on Oracle JVM, 32 bit server mode, version 1.6.0_26-b03. StX-S – time to run equivalent Smalltalk code on Smalltalk/X VM, Smalltalk JIT compiler enabled. StX-J – time to run on Smalltalk/X VM using stx:libjava, Java JIT compiler enabled. Ruby – time to run on Ruby 1.9. StX-J/JVM – how many times is stx:libjava slower than Oracle JVM. StX-J/StX-S – how many times is stx:libjava slower than equivalent Smalltalk code run under same VM. Ruby/StX-S – how many times is Ruby slower than Smalltalk/X. JRuby/JVM – how many times is JRuby slower than Oracle JVM.

minimum value as that is the most accurate benchmark value with the least operating system interference. Full source of the benchmarks as well as the scripts to run them is available.16 10.1. stx:libjava compared to Oracle JVM, Ruby, JRuby and Smalltalk Table 1 shows the results of our comparison of Oracle JVM with a Java and Smalltalk code run on Smalltalk/X VM and with Ruby and JRuby. Following benchmarks were run: Ackermann – a micro benchmark that computes value of Ackermann function A (3, 8), Ary – a micro benchmark stressing array access, Hash – a micro benchmark that inserts into and retrieves elements from a hash table, Strcat – a micro benchmark that performs a string concatenation, Groovy – a macro benchmark that uses a Groovy compiler to evaluate a simple expression, Saxon – a macro benchmark performing an XSLT transformation of a simple XML document. STX : LIBJAVA compared to JVM. Macro-benchmarks show that stx:libjava is only 2–4 times slower than Oracle JVM when running the same Java application. The micro benchmarks show, however, that stx:libjava is 3–60 times slower than Oracle JVM, depending on the code. The difference in Hash benchmark between Java and Smalltalk is caused by the very different implementation of hash tables: Java uses linked-list hashtables whereas Smalltalk/X uses open hashing without quadratic probing. In general, the performance overhead of stx:libjava over Oracle JVM is caused by three main reasons:

1. First, Smalltalk/X VM together with stx:libjava provides the features not currently available in Oracle JVM such as stack reflection, restartable methods or runtime code updates17 [8]. Support for such features naturally comes at some costs. For example, to support stack reflection a special attention has to be paid for case when the stack frame outlives the execution of the corresponding method. In that case, the frame has to be copied to the heap. This requires an additional check in the GC barrier and on the method return. As an example, for Groovy and Saxon benchmarks, about 20–40% of the execution time is spent by these checks and by copying the contexts to the heap. 2. Second, in stx:libjava even the primitive datatypes like int, long or null value are represented as objects. This greatly simplifies the Java–Smalltalk interoperability as in Smalltalk everything is an object. Also, it makes a GC implementation easier. On the other hand, it brings some problems which are most visible on integers. JVM specifies that an int is a 32 bit signed integer. Smalltalk/X uses tagged pointers. In stx:libjava, an int value is represented either as an instance of SmallInteger (which is on the machine level just a tagged pointer) or LargeInteger (which is a full-blown object allocated on the heap). As a consequence, even the simple IADD instruction must handle all possible combinations (two SmallIntegers, SmallInteger and LargeInteger, . . . ). Even in the most common case i.e., two SmallIntegers, values must be de-tagged and the result must be tagged again, along with overflow check (in that case, LargeInteger has to be allocated on the heap).

16

https://swing.fit.cvut.cz/projects/stx-libjava/browser/branches/jk_new_structure/src/benchmarks. Runtime code update is an ability to modify the code at runtime, while the application is still running. Oracle JVM has only a limited support for runtime code updates, not comparable to support provided by stx:libjava. 17

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.15 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

15

3. Third, the JIT compiler used in stx:libjava currently is rather simple. It cannot compile certain Java methods, such as methods that contain exception handlers or methods that use more than 16 local variables. Moreover, current Java JIT compiler does not inline methods. This is purely an engineering problem and nothing prevents these optimizations from being implemented. STX : LIBJAVA compared to Smalltalk. Micro benchmarks show that performance of stx:libjava and equivalent Smalltalk code is roughly the same. This is not surprising as both Smalltalk and Java run-times share the same memory management and JIT compiler backend. The small deviations are caused by the fact that the benchmarked code is not exactly the same. Smalltalk and Java versions of the benchmark use different basic libraries. STX : LIBJAVA compared to Ruby and JRuby. Ruby is very popular, widely used dynamic programming language with similar set of features as Smalltalk. There’s an implementation of Ruby on top of a Java virtual machine – JRuby. Comparison of Smalltalk/X and stx:libjava with Ruby and JRuby shows how it stands comparing (i) with another widespread dynamic language and (ii) with JRuby – an another project that integrates two languages into the single programming environment. The microbenchmarks show that Ruby is comparable, or a bit slower than Smalltalk/X VM. For Hash and Strcat benchmarks, Ruby’s performance is comparable to JVM. This is caused by the fact, that in Ruby string concatenation and hash get/set operations are subject to the special optimization in Ruby VM as they are very common operations in Ruby programs. JRuby is 5 times slower (in the worst case 224 times slower) than corresponding Java code running on the same Oracle JVM. stx:libjava performs better in this case being 3–60 times slower than Oracle JVM.

10.2. Overhead of cross-language invocations To implement a cross-language invocation, i.e., Smalltalk calling a Java method or vice versa, stx:libjava uses a dynamic proxy method which is placed in-between the calling and called method. This proxy method then calls the real target method, performing all necessary arguments and return value transformations. In the case of overloaded methods, this proxy method also performs an additional method resolution based on the runtime types of arguments. To measure the actual performance overhead of such a cross-language method invocation, number of micro benchmarks were run in three setups: (i) running entirely in Smalltalk, (ii) running entirely in Java and (iii) a Smalltalk code calling Java methods. Benchmarking suite consist of following micro benchmarks: Method Invocation – performs 500 000 000 message sends with no arguments. PrimitiveArguments – performs 100 000 000 message sends with one integer argument. The invoked Java method takes one primitive argument of formal type int. ObjectArguments – performs 100 000 000 message sends with one object argument of class Object (a Smalltalk class). The invoked Java method takes one object argument of formal type Object. WrappedArguments – performs 100 000 000 message sends with one integer argument. The invoked Java method takes one object argument of formal type java.lang.Integer. When doing a cross-language send, the parameter passed by the Smalltalk code has to be wrapped into a Java class. OverloadedMethod4/8 – performs 40 000 000 overloaded message sends for 4 (8) different types. When doing a crosslanguage send, the proxy method must do an additional method resolution based on the runtime argument type in order to call current Java methods. Table 2 shows the results. The cross-language method invocation is roughly three times slower than method invocation within the same language. This is expected since every cross-language method send involves invocation of the dynamic proxy method in addition to the real code. Java and cross-language version of WrappedArguments and OverloadedMethods8 benchmarks are significantly slower than the Smalltalk variant. That is caused by an extra wrapper object allocation that is not required in the Smalltalk variant. 10.3. Summary To sum it up, stx:libjava is roughly 2–5 times slower than Oracle JVM when running the same code. At the same time, it provides some advanced features such as runtime stack reflection or dynamic code updates which are not supported by Oracle JVM. Thanks to VM-level integration, the performance of Smalltalk and Java code is the same. There is no performance penalty when running “non-native” code as in the case of Java and Ruby running on Oracle JVM. This is due to the fact, that some language constructs not supported by Oracle JVM have to be emulated at the object level. Being 2–5 slower than Oracle JVM, stx:libjava provides enough performance for real applications. We believe that the future versions of stx:libjava will provide even better runtime performance.

JID:SCICO AID:1628 /FLA

16

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.16 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

Table 2 Performance comparison of cross-language invocations. Benchmark

STX-S [msec]

STX-J [msec]

STX-J2S [msec]

STX-J2S/STX-S [1]

STX-J2S/STX-J [1]

MethodInvocation PrimitiveArguments ObjectArguments WrappedArguments OverloadedMethods4 OverloadedMethods8

5421 912 1023 877 3141 4368

4933 845 711 10 502 2460 13 351

12 561 2797 2792 16 481 8391 36 691

2.32 3.07 2.73 18.79 2.67 8 .4

2.55 3.31 3.93 1.57 3.41 2.75

Legend: STX-S – a pure Smalltalk. STX-J – a pure Java code. STX-S2J – a cross-language invocations: Smalltalk code calling Java methods. STX-S2J/STX-S – how many times the cross-language code is slower than pure Smalltalk code. STX-S2J/STX-J – how many times the cross-language code is slower than pure Java code.

11. Related work 11.1. JavaConnect and JNIPort JavaConnect (Brichau and De Roover [1]) and JNIPort (Geidel [4]) are Smalltalk libraries that allow interaction with a Java code from within Smalltalk. A Java virtual machine is linked into the Smalltalk virtual machine and the communication is made via foreign-function interfaces. Cross-language messages are translated into FFI invocations of either environment. Java classes can be browsed, but cannot be modified from within Smalltalk. The reason is the implementation of the Java virtual machine. Each Java object passed through FFI must be wrapped and registered, which generates additional overhead and disables automatic garbage collection of such objects. Because Java code is running in a separate virtual machine, proxy Java class must be generated for every Smalltalk class passed into Java. To reduce the overhead due to FFI calls, JavaConnect introduces the concept of language shifting objects. Shifted Java objects have part (or whole) of their behavior translated into Smalltalk, but not all instructions and language constructs are supported (such as MONITORENTER instruction and synchronized methods). In multithreaded applications, object state must be synchronized and problems arise, when a single native-threaded Smalltalk virtual machine interacts with a multithreaded Java application. Deadlocks can occur when a Java thread tries to communicate with the Smalltalk virtual machine, whose single native thread is blocked by another Java thread. In stx:libjava, Java classes can be created, modified, or destroyed in runtime. There is no need to synchronize the object state across two virtual machines. Java methods are directly executed, therefore no translation or interprocess communication is needed. 11.2. IBM VisualAge IBM VisualAge for Java (Deupree and Weitzel [2]) includes an interaction mechanism on the Smalltalk side. Communication is realized using remote-method invocation (RMI). Java and Smalltalk virtual machines run in parallel, possibly even on two separate machines. The transition between languages is explicit and managed by the programmer. A lot of boilerplate code must be written and objects must be registered, converted and maintained by the programmer when crossing language barrier. stx:libjava does not require any boilerplate code to access code across the language barrier, nor does it require any explicit handling or conversion when crossing the barrier. There is no need to set up an RMI service and to explicitly register objects in the RMI registry. 11.3. Redline Smalltalk Redline Smalltalk (Ladd [9]) is a Smalltalk implementation running on the JVM. Java classes are dynamically compiled from Smalltalk source code and loaded into JVM using standard class loaders. Smalltalk exceptions are mapped onto Java exceptions. Therefore, it is not possible to proceed or retry an exception. This greatly restricts the number of Smalltalk applications and libraries which could run on Redline Smalltalk. Some methods, especially those related to object space reflection, such as allInstances or become:, are not supported for performance reasons. Common types such as string and integer must be explicitly converted when crossing the language barrier. Due to Java’s static type system, Smalltalk objects can only be passed as argument if the Java method expects that type. stx:libjava does not alter any feature of the Java language, Java reflection is also fully compliant with the original JVM. Any Java application could run on stx:libjava without modifications. Common types are automatically converted. Java and Smalltalk methods can be passed in as needed and the programmer is freed from the need to handle Java or Smalltalk

JID:SCICO AID:1628 /FLA

[m3G; v 1.116; Prn:22/11/2013; 9:28] P.17 (1-17)

M. Hlopko et al. / Science of Computer Programming ••• (••••) •••–•••

17

objects differently. stx:libjava is currently the only language implementation of this kind on Smalltalk/X, however the approach does allow for direct communication between other such languages if they become relevant.18 12. Conclusion In this paper, we have presented stx:libjava, a Java virtual machine implementation integrated into the Smalltalk/X environment. We have described the main problems of a seamless integration of Smalltalk and Java languages and their solutions. We use dynamic method proxies to allow for a Java code to be easily called from Smalltalk and vice-versa. Also, we described how to integrate Smalltalk and Java exception and synchronization mechanisms, so the Smalltalk code can handle Java exceptions while the semantics of Java finally and synchronized blocks are preserved. A number of significantly large applications such as SAXON XSLT processor and Apache Tomcat Server/JSP container run on stx:libjava, as a proof of completeness of the implementation. stx:libjava is ready for and already is used in production. Also, the platform is used as a basis for the runtime code update and language interoperability research. Acknowledgements We would like to gratefully thank to Oscar Nierstrasz and Eliot Miranda for their precious comments. References [1] J. Brichau, C. De Roover, Language-shifting objects from java to smalltalk: an exploration using javaconnect, in: Proceedings of the International Workshop on Smalltalk Technologies, IWST ’09, ACM, New York, NY, USA, 2009, pp. 120–125, http://doi.acm.org/10.1145/1735935.1735956. [2] J. Deupree, M. Weitzel, Visualage integration for the 21st century: Smalltalk, java, websphere, http://www-01.ibm.com/support/docview.wss?uid= swg27000174, 2008. [3] Expecco, Java interface library 2.1, http://wiki.expecco.de/wiki/Java_Interface_Library_2.1, 2012, accessed: 25/06/2012. [4] J. Geidel, Jniport, http://jniport.wikispaces.com/, Feb. 2011. [5] C. Gittinger, Javascript compiler and interpreter, http://live.exept.de/doc/online/english/programming/goody_javaScript.html, 2005, accessed: 25/06/2012. [6] J. Gosling, B. Joy, G. Steele, G. Bracha, Java Language Specification, 3rd edition, Addison Wesley, Santa Clara, California 95054, USA, 2005, http://java.sun. com/docs/books/jls/. [7] M. Hlopko, J. Kurš, J. Vraný, C. Gittinger, On the integration of smalltalk and java: practical experience with stx:libjava, in: Proceedings of the International Workshop on Smalltalk Technologies, IWST ’12, ACM, New York, NY, USA, 2012, pp. 5:1–5:12, http://doi.acm.org/10.1145/2448963.2448968. [8] M. Hlopko, J. Kurš, J. Vraný, Towards a runtime code update in java, in: Proceedings of the Dateso 2013 Workshop, Písek, Czech Republic, April 2013, http://ceur-ws.org/Vol-971/paper12.pdf. [9] J. Ladd, Smalltalk implementation for the jvm, www.redline.st, 2011. [10] T. Lindholm, F. Yellin, Java™ Virtual Machine Specification, 2nd edition, Prentice Hall, Santa Clara, California 95054, USA, 1999, http://java.sun.com/ docs/books/jvms/. [11] J. Vraný, Supporting multiple languages in virtual machines, Ph.D. thesis, Faculty of Information Technologies, Czech Technical University in Prague, Sep. 2010. [12] J. Vraný, J. Kurš, C. Gittinger, Efficient method lookup customization for Smalltalk, in: Objects, Models, Components, Patterns, 2012, pp. 1–16, http://www.springerlink.com/index/PU875371770R562R.pdf.

18 Ruby (Vraný et al. [11], Chapter 8) and a JavaScript dialect (Gittinger [5]) have been integrated into are Smalltalk/X, but these translate the source language into Smalltalk/X bytecode and do not suffer from the complexity resulting from major semantic differences of e.g., the exception mechanism.