Efficiency of standard software architectures for Java-based access to remote databases


Future Generation Computer Systems 15 (1999) 417–424

N. Zingirian ∗, M. Maresca, S. Nalin
Dipartimento di Elettronica ed Informatica, University of Padua, Padua, Italy
Accepted 14 December 1998

Abstract
Novel client–server architectures for remote database access increasingly take advantage of Web technology, adopting Web browsers as graphical user interfaces in the clients and traditional SQL database management systems (DBMSs) in the servers. Inter-operation between standard browsers and specific DBMSs is currently supported by a number of software architectures based on the Java virtual machine embedded in latest-generation browsers. Unfortunately, such software architectures, which appear excellent from the point of view of openness and flexibility, introduce conspicuous latencies in database access. The objective of this paper is to identify such latencies through the analysis of a number of experimental results. The paper describes four different software architectures supporting Java-based SQL database access, reports their performance measurements on different hardware platforms and compares the results obtained.
© 1999 Elsevier Science B.V. All rights reserved.

Keywords: CORBA; Java; JDBC; Performance evaluation; Visual database access

1. Introduction
The traditional approach to client–server database access is based on the combined action of a user interface written in a visual language (the client) and a standard SQL engine (the server). This approach is currently challenged by novel solutions based on the emerging Web paradigm [1]. In Web-based solutions the user interface, typically written in Java, is downloaded by a browser from the server and runs on the browser's virtual machine. The main advantage of such solutions is that the client software resides on the server system as a single copy, rather than being replicated on the client systems. This reduces the costs of installation, configuration and maintenance of the client software, since all these operations can be centralized at the server site [2]. Web-based access to SQL databases can be supported by a number of software architectures [3], characterized by different compositions of a set of basic components [4]. Unfortunately, such software architectures, which appear excellent from the point of view of openness and flexibility, introduce conspicuous latencies in database access.

∗ Corresponding author.

0167-739X/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0167-739X(98)00085-5


Fig. 1. Software architecture 1: database access through Java sockets.

The objective of the study presented in this paper is to describe four different software architectures implementing Web-based SQL database access, to report and compare their performance, and to characterize the sources of the latencies introduced by each architecture. The paper is organized as follows. First we describe the standard software architectures for Java-based database access (Section 2). Then we present the testbed used and the methodology followed in our experiments (Section 3). Finally we present the results of the experiments (Section 4) and discuss the performance profile of the software architectures tested (Section 5). A brief summary (Section 6) concludes the paper.

2. Software architectures for Java-based database access
In this section we present the four software architectures for Java-based database access that we selected for our investigation. Each architecture complies with the client–server paradigm and consists of three main entities: the client, the server and the inter-operability components. The client and the server are the same in every architecture; the inter-operability components are the distinctive feature of each architecture. The client is a Java-enabled browser; the server is an SQL database management system (DBMS) that accepts remote network connections and queries. We assume that the network protocol stack is based on the TCP/IP protocol suite. The inter-operability components of each software architecture are described in the rest of the section.

2.1. Architecture based on Java socket connectivity modules
The architecture based on Java socket connectivity modules is depicted in Fig. 1. The shaded blocks represent the software modules that the programmer has to develop to interconnect the existing components. The client (browser) downloads an applet from the server machine and runs it. The applet sends the queries and receives the responses through the Java socket modules.
The server (DBMS) exchanges data with the client through a software component that acts as an intermediate custom agent. This custom agent receives the client queries and submits them to the database through the database API library. The connectivity between the applet and the custom agent is based on TCP/IP: on the client side, communication is supported by the standard Java socket API; on the server side, by the standard UNIX socket interface. Inter-operation between the applet and the custom agent is based on a custom protocol that supports the fundamental database access operations (open database, submit query, retrieve response, close database).

Fig. 2. Software architecture for database access based on DBMS proprietary Java JDBC.

Fig. 3. Software architecture for database access based on Java JDBC and request broker.

2.2. Architecture based on proprietary DBMS JDBC driver
The architecture based on a proprietary DBMS JDBC driver is depicted in Fig. 2. Java Database Connectivity (JDBC) is a standard interface that allows Java applets and applications to inter-operate with databases. As in the previous architecture, the client (browser) downloads an applet from the server machine and runs it. The applet accesses the DBMS through the standard primitives of the JDBC application programming interface (JDBC API). The JDBC API is available as part of the standard classes of the Java developer kits and of the Java-enabled browsers compliant with the latest versions of Java. The JDBC classes are not specific to any particular DBMS, except for one class, called the JDBC driver, which implements the communication protocol required by the specific DBMS it refers to. Specific JDBC drivers are provided by the DBMS vendors or by third parties and are dynamically linked to the standard JDBC objects. Usually the JDBC driver is downloaded from the server along with the client applet. The DBMS-specific JDBC driver inter-operates with the DBMS over the network by translating the JDBC method calls into proprietary service request messages. The server (DBMS) receives the queries, sends the responses and performs all the operations through its proprietary protocol, with no need for additional inter-operability modules on the server machine.
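For readers unfamiliar with JDBC, the open, execute and close operations measured in this paper map onto the JDBC API roughly as in the following minimal sketch. It is written in present-day Java (try-with-resources, Java 7 and later) rather than the JDK 1.1 dialect of the period; the class name, host, port, database name and credentials are hypothetical, and the jdbc:postgresql://host:port/database URL form is the one used by the PostgreSQL JDBC driver. With drivers of the period, the driver class also had to be registered explicitly (e.g. with Class.forName) before DriverManager could find it; drivers compliant with JDBC 4.0 and later register themselves automatically.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper illustrating the open / exec / close sequence of Section 2.2.
final class JdbcAccessSketch {

    // Builds a PostgreSQL-style JDBC URL; host, port and database are placeholders.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    // Opens a connection, runs one query and returns the first column of the
    // first row (or null if the result is empty). Leaving the try block closes
    // the ResultSet, the Statement and the Connection: the "close" operation.
    static String fetchFirstValue(String url, String user, String password, String sql)
            throws SQLException {
        // "open": DriverManager hands the URL to a registered driver, which
        // opens its own network connection to the DBMS.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             // "exec": the driver sends the query and wraps the reply in a ResultSet.
             ResultSet rs = stmt.executeQuery(sql)) {
            return rs.next() ? rs.getString(1) : null;
        }
    }

    public static void main(String[] args) {
        String url = jdbcUrl("dbhost", 5432, "testdb"); // hypothetical server
        try {
            System.out.println(fetchFirstValue(url, "user", "secret", "SELECT 1"));
        } catch (SQLException e) {
            // Expected when no PostgreSQL driver or server is available.
            System.out.println("Could not access " + url + ": " + e.getMessage());
        }
    }
}
```

Because the driver owns the network connection, closing the Connection also closes its socket; this is the behaviour behind the connection-setup overhead analysed in Section 4.2.1.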

2.3. Architecture based on JDBC request broker middleware
The architecture based on a request broker is depicted in Fig. 3. As in the previous architecture, the applet communicates with the database through the standard JDBC API and a specific JDBC driver. Unlike the previous architecture, however, the JDBC driver does not access the database directly: through a proprietary protocol, it accesses an intermediate object, called the broker, running on the server machine. When the applet requests a connection, the request broker dynamically instantiates two components: a JDBC agent and a database agent specific to the DBMS adopted. The server (DBMS) and the client inter-operate through the database agent and the JDBC agent, which interact by means of a proprietary protocol. The openness and flexibility advantages of using the request broker and the agents are discussed in [5].


Fig. 4. Software architecture for database access based on CORBA.

Table 1
Software products used as components of the tested software architectures

Software component   Software product         Vendor
DBMS                 PostgreSQL 6.1.1         University of California [7]
Browser              Navigator 4              Netscape
Java socket          class java.net.socket    Netscape
DBMS JDBC driver     class postgresql.driver  University of California
Request broker       oplrqb                   Openlink
Database agent       pgr95 sv                 Openlink
JDBC agent           jdbc sv                  Openlink
Broker JDBC driver   class openlink.driver    Openlink
Client ORB           package org.omg.CORBA    Visigenic/Netscape
Server ORB           omniORB2                 Olivetti & Oracle Research Labs

2.4. Architecture based on CORBA inter-operability components
The CORBA-based architecture is shown in Fig. 4. In this case, too, the applet communicates with the DBMS through an object-oriented API. This API instantiates and inter-operates with a database access object that resides on the server machine. Inter-operability is supported by a pair of libraries called object request brokers (ORBs) [6], which communicate through the Internet Inter-ORB Protocol (IIOP). One ORB is a Java package located on the client machine, while the other is a library located on the server machine. The server (DBMS) communicates with the database access object through the DBMS application programming interface. The database access object is developed by the system integrator, and its API is described in a formal specification language called the interface definition language (IDL). The IDL supports reliable inter-operation between the client and the server even though the programs are written in different languages.

3. Testbed and methodology
In this section we describe the testbed that we used and the methodology that we followed to carry out the experiments. We installed the DBMS on a server machine (PC Pentium Pro, 200 MHz, 32 MB RAM, Linux operating system) and the different software components on the client machines (PC Pentium 166 MHz, HP715 100 MHz and Sun Sparc5 75 MHz, each equipped with 32 MB of RAM and running a Unix operating system), choosing the products reported in Table 1. We identified three database operations as the basic operations to be tested, namely database opening, query execution and database closing. The query used in our tests was a single selection of an item consisting of a 34-byte string from a table of more than 39 000 rows of four columns each. Each latency measurement was obtained by repeating the experiment more than 6000 times and selecting the minimum result, in order to minimize incidental overheads.


Table 2
Minimum times (ms) for the basic database operations, executed locally and remotely through Java sockets on each client platform

Operation  Local  Socket PC  Socket HP  Socket Sun
Open          33         37         38          38
Exec          54         60         62          68
Close         67         68         68          69
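As a rough check on the 10–30% figure quoted in Section 4.1, the relative overhead of the socket architecture over local access can be computed directly from Table 2, writing t_local and t_socket for the local and remote times of the same operation:

```latex
\[
  \text{overhead} = \frac{t_{\text{socket}} - t_{\text{local}}}{t_{\text{local}}}
\]
\[
  \text{Open, PC: } \frac{37 - 33}{33} \approx 12\%, \qquad
  \text{Open, HP and Sun: } \frac{38 - 33}{33} \approx 15\%
\]
\[
  \text{Exec, PC: } \frac{60 - 54}{54} \approx 11\%, \qquad
  \text{Exec, HP: } \frac{62 - 54}{54} \approx 15\%, \qquad
  \text{Exec, Sun: } \frac{68 - 54}{54} \approx 26\%
\]
```

The opening and query-execution overheads therefore fall between about 11% and 26%, while the close operation adds at most 2 ms (about 3%).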

To measure time we used the java.util.Date.getTime() method in Java programs, the gettimeofday() system call in C/C++ programs when the source code was available, and the Linux command strace -ttf -p <process PID> when the source code was not available. The first two read the system real-time clock; strace intercepts and records the system calls invoked and the signals received by the traced process, reporting the time of each event. To keep the tracing overhead acceptable, we limited tracing to the system calls significant for our measurements.
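The measurement procedure of this section (repeat an operation many times and keep the minimum) can be sketched as a small Java harness. This is an illustrative sketch, not the authors' code: the class and method names are hypothetical, and it uses System.nanoTime() (available since Java 5) in place of the millisecond-resolution java.util.Date.getTime() used in the paper. The clock is passed in as a parameter so that the harness can be checked with a deterministic fake clock.

```java
import java.util.function.LongSupplier;

// Hypothetical minimum-of-N latency harness, following the methodology of Section 3.
final class MinLatency {

    // Runs `operation` `repetitions` times and returns the smallest elapsed time,
    // in the units of `clock`. Taking the minimum discards samples inflated by
    // incidental overheads such as garbage collection or scheduling delays.
    static long minElapsed(Runnable operation, int repetitions, LongSupplier clock) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < repetitions; i++) {
            long start = clock.getAsLong();
            operation.run();
            long elapsed = clock.getAsLong() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    // Convenience overload using the JVM's monotonic nanosecond clock.
    static long minNanos(Runnable operation, int repetitions) {
        return minElapsed(operation, repetitions, System::nanoTime);
    }

    public static void main(String[] args) {
        // Placeholder operation; in the experiments this would be a database
        // open, query execution or close.
        StringBuilder sb = new StringBuilder();
        long best = minNanos(() -> sb.append('x'), 6000);
        System.out.println("minimum over 6000 runs: " + best + " ns");
    }
}
```

Reporting the minimum rather than the mean gives the best-case cost of an operation; it deliberately hides variability, which the paper discusses separately.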

4. Experiments and results
The experiments were carried out to measure and characterize the database access latencies. In a preliminary set of experiments we measured the latency components common to the four software architectures. In the remaining experiments we measured the latencies of each software architecture and identified its specific overheads, as well as their sources. We report the results of the preliminary experiments in Section 4.1 and those of the remaining experiments in Section 4.2.

4.1. Preliminary experiments
This section presents a preliminary set of experiments carried out to measure the latencies common to all the software architectures based on Java-enabled clients, namely (i) the local database access time and (ii) the Java socket communication time. To measure the local database access time we developed a C program which invokes the basic operations of the database access library on the local machine and measures their execution time. The measurements, in milliseconds, are reported in Table 2 (first column). We also measured the overheads due to the library's access to the database by tracing the internal TCP/IP communication between the library and the DBMS, and found that they are negligible, namely less than 2 ms. To measure the Java socket communication time, we set up the architecture based on socket connectivity modules (see Section 2.1). The custom agent module in this architecture was obtained by making the C program developed for the previous experiment accessible through TCP/IP connections. The timing measurements of this architecture are reported in the second set of columns of Table 2 (each column corresponds to a different hardware platform used in the test). The results reveal that remote access through Java sockets introduces latencies which are not negligible with respect to the database access times, namely 10–30% for database opening and query execution. It is noteworthy that these results were obtained after optimizing the use of the Java socket interface. The optimization consisted of identifying the most efficient methods for accessing the Java socket buffers among the variety available (e.g. read/writeByte, read/writeInt, read/writeBytes, read/write), packing the messages in the format required by those methods and sending each message with a single method invocation.
These optimizations matter because the latency of socket I/O, which heavily affects overall performance, varies greatly depending on the method used.
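The optimization described above, packing a message in memory and handing it to the socket in one call rather than issuing one call per field, can be illustrated with the standard java.io stream classes. In this sketch the message layout (a 4-byte length followed by the UTF-8 bytes of each field) and all class and method names are hypothetical; a counting stream stands in for the socket's output stream so that the number of calls reaching the "socket" can be observed without a network. The receiving side reads each field with readFully instead of a byte-by-byte loop, in the spirit of the bytewise-reading overhead discussed in Section 4.2.1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of per-field socket writes versus a single packed write.
final class MessagePacking {

    // Stand-in for a socket output stream that counts how many write calls reach it.
    static final class CountingStream extends OutputStream {
        int writeCalls = 0;
        final ByteArrayOutputStream received = new ByteArrayOutputStream();

        @Override public void write(int b) { writeCalls++; received.write(b); }

        @Override public void write(byte[] b, int off, int len) {
            writeCalls++;
            received.write(b, off, len);
        }
    }

    // Encodes each field as a 4-byte length followed by its UTF-8 bytes.
    private static void writeFields(DataOutputStream out, String... fields) throws IOException {
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
        }
        out.flush();
    }

    // Unoptimized: every length prefix and every field goes straight to the socket.
    static void sendUnpacked(OutputStream socketOut, String... fields) throws IOException {
        writeFields(new DataOutputStream(socketOut), fields);
    }

    // Optimized: build the whole message in memory, then write it with one call.
    static void sendPacked(OutputStream socketOut, String... fields) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeFields(new DataOutputStream(buffer), fields);
        byte[] message = buffer.toByteArray();
        socketOut.write(message, 0, message.length);
        socketOut.flush();
    }

    // Reads `count` length-prefixed fields, using readFully rather than a per-byte loop.
    static String[] receive(InputStream socketIn, int count) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        String[] fields = new String[count];
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            fields[i] = new String(bytes, StandardCharsets.UTF_8);
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String[] login = {"user", "secret", "testdb"}; // hypothetical login data
        CountingStream unpacked = new CountingStream();
        CountingStream packed = new CountingStream();
        sendUnpacked(unpacked, login);
        sendPacked(packed, login);
        System.out.println("write calls, unpacked: " + unpacked.writeCalls);
        System.out.println("write calls, packed:   " + packed.writeCalls);
        String[] decoded = receive(new ByteArrayInputStream(packed.received.toByteArray()), login.length);
        System.out.println(String.join(",", decoded));
    }
}
```

Both variants put identical bytes on the wire; only the number of calls into the socket layer differs. The exact count for the unpacked variant depends on how the running JDK implements DataOutputStream.writeInt, which is one reason per-call costs vary between Java implementations.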


Table 3
Minimum times (ms) of the basic database operations for different software architectures and client platforms

                   JDBC              JDBC/broker       CORBA
Operation  Local      PC   HP  Sun      PC   HP  Sun      PC   HP  Sun
Open          33     143  192  218      70   93  160      39   42   44
Exec          54     101  104  106      77   80   83      66   71   78
Close         67       5    8   10       0    0    1      72   75   78

4.2. Experiments on standard software architectures
This section presents a set of experiments carried out to evaluate the efficiency of the standard architectures described in Section 2, which are based on JDBC and CORBA inter-operability modules. We present the overheads introduced by two architectures based on the JDBC standard and by one architecture based on the CORBA standard.

4.2.1. Efficiency of the architecture based on proprietary JDBC driver
Columns 2–4 of Table 3 report the basic database access times, in milliseconds, measured on the architecture described in Section 2.2. In the open operation we identified two sources of overhead: (i) a new Java socket connection is established whenever the database is opened, and (ii) each item needed to connect to the database (login, password, database name, etc.) is sent through a separate socket read/write. The first overhead is due to the fact that in the JDBC standard the database close operation also closes the TCP/IP connection. This overhead is 18 ms on the PC platform, 28 ms on the HP platform and 37 ms on the Sun platform.¹ The second overhead is due to the fact that the JDBC driver for the PostgreSQL DBMS does not pack this information into a single string to minimize the number of read and write operations on Java sockets. Moreover, this overhead varies greatly across hardware platforms (10 ms for PC, 22 ms for HP and 67 ms for Sun). In query execution the main source of inefficiency is that the JDBC standard requires the preparation of a complex data structure (an object of class ResultSet) in response to the query. We found that the time required to build this structure is 10 ms in all the architectures.
In addition, we found that reading the results of the query from the network requires bytewise reading from the socket, which often causes abnormal overheads (typically 30 ms) in the Netscape Java Virtual Machine implementation on the HP workstation.

¹ This fact also explains the time measured for the close operation: the client does not wait for the database close operation to terminate before closing the network connection.

4.2.2. Efficiency of the architecture based on JDBC request broker
Columns 5–7 of Table 3 report the basic database access times, in milliseconds, measured on the architecture described in Section 2.3. By tracing the system calls of the request broker, we identified two sources of inefficiency in the open operation: (i) the applet creates two network connections to open the database, one toward the request broker and one toward the JDBC agent; (ii) the server takes a long time to fork and configure the two agents. The first source of overhead produces latencies from 36 to 74 ms depending on the hardware platform. The second produces a latency of about 75 ms, which we obtained by measuring the interval between the time of the connection request to the request broker and the time the query arrives at the DBMS, and subtracting the network connection latencies. In query execution the main source of inefficiency is that the database agent opens the database before sending the query and closes it after sending the result back to the JDBC agent. As a consequence, the overhead is due to the open operation, which takes 33 ms locally (Table 2). In addition, the timing measurements reveal that Java sockets are used efficiently, considering that the execution time is close to the sum of the time measured in the Java socket architecture plus the database opening time. The close time reveals that the database is closed after the network connection is closed: it includes only the handshaking operations necessary before the agents terminate.

4.2.3. Efficiency of the architecture based on CORBA
Columns 8–10 of Table 3 report the basic database access times, in milliseconds, measured on the architecture described in Section 2.4. These measurements reveal that the architecture based on CORBA introduces negligible overheads with respect to the architecture directly based on Java sockets (Table 2), for all operations and on all hardware platforms. Recall that the results for the Java socket architecture were obtained after several optimizations of the communication protocol and of the socket method invocations. Our experiments therefore indicate that the CORBA inter-operability modules make optimized use of Java sockets.

5. Discussion
The experiments presented in the previous section allow us to extract a performance profile of the standard solutions for Java-based database access. This profile is valid for local-area environments (LANs), in which network latencies are negligible with respect to the other system latencies. The performance characteristics of the Java socket modules observed in our experiments are high latencies in connection and read/write operations, and high variability of socket latencies depending on the type of data transmitted and on the hardware platform used. The performance variability of Java sockets affects all the other Java-based solutions, which unavoidably rely on Java sockets. However, we found that the solutions based on JDBC and CORBA are affected differently by this variability. Some experiments revealed that the Java socket interface propagates its poor performance to the JDBC modules. In particular, we found that the Java socket operations add an overhead which ranges approximately from 30% to 100% and varies heavily according to the hardware platform used. By contrast, other experiments revealed that CORBA is not affected by the poor performance of Java socket operations. In particular, we measured a performance close to the best performance obtained by optimizing Java socket method invocations in the custom client, without latency variations among the different hardware platforms. The good performance of CORBA is due to the fact that the communication library (client ORB) accesses the Java socket interface in a way specifically optimized to transport the IIOP protocol. In addition, unlike JDBC drivers, the client ORB can be implemented in different ways on different hardware platforms. The client ORB is not downloaded along with the applet, as it belongs to the standard Java packages.
Finally, we have shown that the CORBA standard inter-operability modules are the best-suited database access solution for high-performance LAN environments, for the following reasons:
– the CORBA architecture acts at a level high enough to support reliable and easily standardizable interfaces between clients and servers written in different languages and running in different environments;
– the CORBA architecture acts at a level low enough to belong to the standard library package of Java, which can be optimized to hide the performance problems of the Java socket interface.

6. Summary
The paper has presented a set of measurements aimed at characterizing the performance of four software architectures, based on the Java socket interface and on the JDBC and CORBA standards. These two standards are the solutions most frequently proposed to system architects for making remote database servers accessible from Java clients. The main contributions of our work are the performance evaluation and comparison of the software architectures tested, and the identification of the bottlenecks in the most significant database access operations for each architecture. The most relevant results are the characterization of the inefficiency of the Java classes for network access (Java sockets), the impact of this inefficiency on the architectures based on JDBC, and the immunity of the architectures based on CORBA to it.

References
[1] J. Gray, Evolution of data management, IEEE Computer Magazine 29(10) (1997) 38–46.
[2] J. Hamilton, Java and JDBC: tools supporting data-centric business application development, IEEE Proc. 4th Int. Symp. on Assessment of Software Tools, May 1996, pp. 121–138.
[3] D. Garlan, D.E. Perry, Introduction to the special issue on software architectures, IEEE Trans. Software Eng. 21(4) (1995) 269–274.
[4] R.M. Adler, Emerging standards for component software, IEEE Computer Magazine 28(28) (1995) 68–77.
[5] K. Idehen, Openlink Java whitepaper, Openlink Technical Report, September 1996.
[6] S. Vinoski, CORBA: integrating diverse applications within distributed heterogeneous environments, IEEE Communications Magazine 14(2) (1997).
[7] J. Chen, A. Yu, The Postgres Implementation Guide, Postgres Group Technical Report, October 1995.
[8] R. Cattell, G. Hamilton, JDBC Database Access with Java, Javasoft Press, Addison-Wesley, New York, 1997.

Nicola Zingirian received the Laurea Degree in Computer Engineering from the University of Genoa in 1994, and is about to defend his Ph.D. thesis at the University of Padua, Italy. His research interests include the performance evaluation of high-performance computing and networking architectures, in particular computing systems based on superscalar processors and network systems based on distributed object technology.

Massimo Maresca has been a Professor in the Department of Computer Engineering and Electronics of the University of Padova, Italy, since 1994. Before joining the University of Padova he was an Assistant Professor (1990–1992) and an Associate Professor (1992–1994) at the University of Genoa, Italy, a Visiting Scientist at the International Computer Science Institute in Berkeley, California (1991), and a postdoctoral fellow at the IBM T.J. Watson Research Center, New York (1985). He received a Laurea (1980) and a Doctorate (1986) in Computer Engineering from the University of Genoa. He is presently working, on leave from the University of Padova, at AIPA (Autorità per l'Informatica nella Pubblica Amministrazione), an Italian government agency which stimulates and coordinates the use of information and communication technologies in the Italian Public Administration. His technical and scientific interests include computer architecture (at the hardware, system and software level), network technologies (models, protocols, design and applications) and network services and applications.

Stefano Nalin received the Laurea Degree in Computer Engineering in 1998 from the University of Padua, Italy. He worked with Engineering S.p.A. in Padova, Italy, and is now working with Aeon Virtual Shopping Services GmbH in Stuttgart, Germany, as a Java-based system developer.