Advanced Resource Connector middleware for lightweight computational Grids


Future Generation Computer Systems 23 (2007) 219–240 www.elsevier.com/locate/fgcs

M. Ellert a, M. Grønager b, A. Konstantinov c,d, B. Kónya e, J. Lindemann f, I. Livenson g, J.L. Nielsen h, M. Niinimäki c, O. Smirnova e,∗, A. Wäänänen h

a Department of Nuclear and Particle Physics, Uppsala University, Box 535, 75121 Uppsala, Sweden
b UNI-C, The Danish IT Centre for Education and Research, DTU Build. 304, DK-2800 Kgs. Lyngby, Denmark
c University of Oslo, Department of Physics, P.O. Box 1048, Blindern, 0316 Oslo, Norway
d Vilnius University, Institute of Material Science and Applied Research, Saulėtekio al. 9, Vilnius 2040, Lithuania
e Institute of Physics, Lund University, Box 118, 22100 Lund, Sweden
f Structural Mechanics, Lund University, Box 118, 22100 Lund, Sweden
g University of Tartu, Institute of Computer Science, J. Liivi 2-306, 50409 Tartu, Estonia
h Niels Bohr Institute, Blegdamsvej 17, DK-2100 Copenhagen Ø, Denmark

Received 13 September 2005; received in revised form 17 May 2006; accepted 20 May 2006. Available online 24 July 2006.

Abstract

As computational Grids move away from the prototyping state, reliability, performance, ease of use and ease of maintenance become the focus areas of their adoption. In this paper, we describe the ARC (Advanced Resource Connector) Grid middleware, where these issues have been given special consideration. We present an in-depth view of the existing components of ARC, and discuss some of the new components, functionalities and enhancements currently under development. This paper also describes architectural and technical choices that have been made to ensure scalability, stability and high performance. The core components of ARC have already been thoroughly tested in demanding production environments, where the middleware has been in use since 2002. The main goal of this paper is to provide a first comprehensive description of ARC.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Grid; Globus; Middleware; Distributed computing; Cluster; Linux; Scheduling

1. Introduction

Computational Grids are not a novel idea. The Condor project [1] led the way in 1988 by setting the goal of creating a computing environment with heterogeneous distributed resources. Further efforts to create a layer for secure, coordinated access to geographically distributed computing and storage resources resulted in the appearance of the Globus Toolkit [2] and its Grid Security Infrastructure (GSI) [3] in the late 1990s.

∗ Corresponding address: Lund University, Institute of Physics, Experimental High Energy Physics Division, Box 118, 22100 Lund, Sweden. Tel.: +46 462227699; fax: +46 462224015. E-mail address: [email protected] (O. Smirnova).

0167-739X/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.future.2006.05.008

From a general point of view, Foster has defined the Grid as a system that coordinates resources that are not subject to centralized control, using standard, open, general-purpose protocols and interfaces to deliver nontrivial qualities of service [4].

Addressing the needs of the Nordic research community (the Nordic states being Denmark, Finland, Iceland, Norway and Sweden), the NorduGrid collaboration [5] in 2001 started to investigate possibilities to set up a regional computational Grid infrastructure. Scientific and academic computing in the Nordic countries has a specific feature in that it is carried out by a large number of small and medium-size facilities of different kinds and ownership. An urgent need to join these facilities into a unified structure emerged in late 2000, as the only way to meet the computing needs of the local High Energy Physics community.


Studies and tests conducted by NorduGrid in 2001 showed that existing solutions, such as Globus Toolkit 1.4, Globus Toolkit 2.0, Condor, UNICORE [6] and the EDG [7] prototype, were either not ready at that time to be used in a heterogeneous production environment, or did not satisfy other requirements of the community (see details in the paper). In order to meet the pressing user demands for urgent Grid system deployment, NorduGrid in early 2002 started to develop its own software solution for wide-area computing and data handling [8,9], making use of the by then de facto standard Globus Toolkit 2. This new open source middleware is now called the Advanced Resource Connector (ARC).

An important requirement from the very start of ARC development has been to keep the middleware portable, compact and manageable both on the server and the client side. The client package currently occupies only 14 Megabytes, is available for most current Linux distributions, and can be installed at any available location by a non-privileged user. For a computing service, only three main processes are needed: the specialized GridFTP service (see Section 3.1), the Grid Manager (see Section 3.3) and the Local Information Service (see Section 3.6.1).

Recently, ARC development and deployment has been funded by the Nordic Natural Science Research Council (NOSN), and the aim is to provide a foundation for a large-scale, multi-disciplinary Nordic Data Grid Facility [10]. ARC is being used as the middleware of choice for this facility and for many other Grid infrastructures, such as the Estonian Grid [11], Swegrid [12], DCGC [13], the Swiss ATLAS Grid [14] and others. The middleware supports production-level systems, having been in non-stop operation since 2002 [15], and has a proven record in demanding High Energy Physics computing tasks [16], being the first middleware ever to provide services to a world-wide High Energy Physics data processing effort, already in summer 2002 [17].

The main purpose of this paper is to give a comprehensive technical overview of the ARC middleware, its architecture and components: the existing core ones as well as the recently introduced and beta-quality services. The rest of the paper is organized as follows. In Section 2, the requirements and design of the architecture of ARC are presented. Section 3 discusses the implementation of the main services, while Section 4 describes the client components and user interfaces. Sections 5 and 6 are more development oriented, discussing the developer interfaces and libraries, and recent additions to ARC, respectively. Section 7 illustrates the performance of the ARC middleware through scientific applications, and a summary is provided in Section 8.

2. Architecture

The architecture and design of the ARC middleware have mainly been driven by the requirements and specifics of the Nordic ATLAS [18] user community.

The basic set of high-level requirements is no different from that of any other community, be it users, resource owners or software developers: the system must be efficient, reliable, scalable, secure and simple. However, the environment that led to the creation of ARC has some specifics that can be outlined as follows:

• Excellent academic network (NORDUNet [19]), providing a solid basis for deployment of wide-area computing architectures based on Grid technologies;
• A multitude of computing resources of various capacity, origin and ownership, effectively excluding the possibility of a centralized, uniform facility;
• Prevalence of embarrassingly parallel, serial batch tasks, requiring Linux flavors as an operating system;
• Necessity to operate with large amounts (Terabytes) of data stored in files and accessed by the tasks via Portable Operating System Interface (POSIX) calls.

This resulted in prioritized development oriented towards commodity Linux clusters, and in an emphasis on serial batch job management, giving lower priority to such problems as optimization of data management or support for non-Linux systems.

Customers' needs and the environment specifics impose strict rules for the middleware development. Independently of the nature of the customer, be it an end-user or a resource owner, the guiding principles for the design are the following:

(1) A Grid system based on ARC should have no single point of failure and no bottlenecks.
(2) The system should be self-organizing, with no need for centralized management.
(3) The system should be robust and fault-tolerant, capable of providing stable round-the-clock services for years.
(4) Grid tools and utilities should be non-intrusive, have a small footprint, not require special underlying system configuration, and be easily portable.
(5) No extra manpower should be needed to maintain and utilize the Grid layer.
(6) Tools and utilities must respect local resource owner policies, in particular security-related ones.

These principles are reflected in the design and implementation of each individual component and of the overall system, as described in the rest of the paper. ARC developers take a conscious approach of creating a solution that addresses the users' needs while meeting the resource owners' strict demands. This requires a delicate balance between the users' wish to get transparent, single-login access to as many resources as possible world-wide, and the system administrators' need to keep unwanted customers and expensive or insecure services at bay. Simplicity is the key word: applied correctly, it ensures robustness, stability, performance and ease of control.

Development of the ARC middleware follows the philosophy "start with something simple that works for users and add functionality gradually". This allows ARC to provide customers with a working setup at very early stages, and to define the priorities of adding functionality and enhancements based on the feedback from the users.


Middleware development was preceded by an extended period of try-outs of the then available solutions and collecting feedback from the users. This process identified a set of more specific requirements, described in the following section.

2.1. Requirements and design considerations

Perhaps the most important requirement that affects the ARC architecture stems from the fact that most user tasks deal with massive data processing. A typical job expects that a certain set of input data is available via a POSIX open call. Moreover, such a job produces another set of data as a result of its activity. These input and output data might well reach several Gigabytes in size. The most challenging task of the Grid is thus not just to find free cycles for a job, but to ensure that the necessary input is staged in for it, and all the output is properly staged out. In late 2001, when ARC requirements started to crystallize, there were very few options available: most Globus Toolkit-based solutions relied on the client to upload the data to the execution site in real time, and likewise, to retrieve the result. Other solutions introduced "wrappers" for applications that performed data movement to/from the execution nodes. EDG [7] decided to minimize data movement by sending jobs to where the data are and storing the results at storage facilities local to the job. None of these approaches were satisfactory for NorduGrid for a variety of reasons, including separate locations of computing and storage resources, wide-area network inaccessibility from the execution nodes, security considerations and other limitations on application functionalities. The only adequate approach seen by the NorduGrid developers was to include massive data staging and corresponding disk space management into the computing resource front-end functionality. Today, other solutions are available, e.g., by the GriPhyN project using workflows [20], but ARC's concept remains unique. The decision to include job data management functionality in the Grid front-end is one of the cornerstones of the architecture, adding to the high reliability of the middleware and optimizing it for massive data processing tasks.

Another important requirement is that of a simple client that can be easily installed at any location by any user. Such a client must provide all the functionality necessary to work in the Grid environment and yet remain small in size and simple to install. The functionality includes, quite obviously, Grid job submission, monitoring and manipulation, the possibility to declare and submit batches of jobs with minimal overhead, and means to work with the data located on the Grid. The client must come primarily as a lightweight command line interface (CLI), with an optional extension to a GUI or a portal. The requirement that the CLI installation procedure be manageable by non-experienced users with non-root accounts practically implies its distribution as a binary archived package (tar, gzip). A user must be able to use clients installed on different workstations interchangeably, without the necessity to instantiate them as permanently active services. Naturally, the configuration process must be reduced to an absolute minimum, with the default configuration covering most of the user cases. Last, but not least, the clients must remain functional in a firewalled environment.


The Grid clients that existed at the end of 2001 did not satisfy these requirements, being either over-complicated for a non-expert user to set up, or lacking the desired functionalities. ARC's decision to use only the necessary components to build the client part, and the usage of a uniform programming environment (C++), resulted in the possibility to deliver a compact and easy to use Grid client.

There is another perspective on the Grid: that of the resource owners. The idea of providing rather expensive and often sensitive services to a set of unknown users worldwide is not particularly appealing to the system administrators, facility owners and funding bodies, whose primary task is to serve local user communities. A set of strict agreements defining a limited user base and specific services effectively results in a limited and highly regulated set of dedicated computing resources, only remotely resembling the World Wide Grid idea. Most Grid solutions of the time were based on the assumption of availability of dedicated application-specific resources, which can be configured in the way optimal for one or another Grid implementation, and had serious security flaws. NorduGrid's approach is to provide resource owners with a non-intrusive and more secure set of tools, requiring minimal installation and maintenance efforts while preserving the existing structures and respecting local usage policies. This is achieved by using a minimal amount of exposed interfaces, and by designing and implementing each service with flexibility, security and simplicity in mind.

Each resource owner that is willing to share some resources on the Grid has a specific set of requirements to be met by the middleware. The most common implementation requirements are:

• The solution must be firewall-friendly (e.g., minimize the number of necessary connections).
• Services must have easy crash recovery procedures.
• Allowed Grid-specific resource usage (load, disk space) should be adjustable and manageable.
• Accounting possibilities must be foreseen.
• Services must be able to run using a non-privileged account.
• A single configuration procedure for site-specific services is highly desirable.

In 2001, existing implementations had many shortcomings and did not meet the outlined requirements: for example, the Globus Toolkit required the number of ports opened in the firewall to be proportional to the number of potential jobs, and EDG and Condor were not adjustable and extensible enough. ARC components were designed and implemented addressing such shortcomings, as explained in the following sections.

2.2. Design

The ARC architecture is carefully planned and designed to satisfy the above mentioned needs of end-users and resource providers. The major effort is put into making the resulting Grid system stable by design.


Fig. 1. Components of the ARC architecture. Arrows point from components that initiate communications towards queried servers. Detailed diagrams for the essential components are presented in the respective sections of the paper.

This is achieved by avoiding centralized services and identifying only three mandatory components (see Fig. 1):

(1) The Computing Service, implemented as a GridFTP–Grid Manager pair of services (see Sections 3.1 and 3.3). The Grid Manager (GM) is the "heart" of the Grid, instantiated at the front-end of each computing resource (usually a cluster of computers) as a new service. It serves as a gateway to the computing resource through a GridFTP channel. GM provides an interface to the local resource management system, facilitates job manipulation and data management, and allows for accounting and other essential functions.
(2) The Information System (see Section 3.6), which serves as the "nervous system" of the Grid. It is realized as a hierarchical distributed database: the information is produced and stored locally for each service (computing, storage), while the hierarchically connected Index Services maintain the list of known resources.
(3) The Brokering Client (see Section 4.2.1), the "brain" of the Grid, which is deployed as a client part in as many instances as users need. It is enabled with powerful resource discovery and brokering capabilities, being able to distribute the workload across the Grid. It also provides client functionality for all the Grid services, and yet is required to be lightweight and installable by any user in an arbitrary location in a matter of a few minutes.

In this scheme, the Grid is defined as a set of resources registering to the common information system. Grid users are those who are authorized to use at least one of the Grid resources by means of Grid tools. Services and users must be properly certified by trusted agencies (Section 3.7). Users interact with the Grid via their personal clients, which interpret the tasks, query the Information System whenever necessary, discover suitable resources, and forward task requests to the appropriate ones, along with the user's proxy and other necessary data.

If the task consists of job execution, it is submitted to a computing resource, where the Grid Manager interprets the job description, prepares the requested environment, stages in the necessary data, and submits the job to the local resource management system (LRMS). Grid jobs within the ARC system possess a dedicated area on the computing resource, called the session directory, which effectively implements limited sandboxing for each job. The location of each session directory is a valid URL and serves as the unique Job Identifier. In most configurations this guarantees that the entire contents of the session directory are available to the authorized persons during the lifetime of the Grid job. Job states are reported in the Information System. Users can monitor the job status and manipulate the jobs with the help of the client tools, which fetch the data from the Information System and forward the necessary instructions to the Grid Manager. The Grid Manager takes care of staging out and registering the data (if requested), submitting logging and accounting information, and eventually cleaning up the space used by the job. Client tools can also be used to retrieve job outputs at any time.

The following innovative ARC approaches made sure that the customer requirements and development principles are satisfied:

• Usage of the specialized GridFTP server (see Section 3.1) for job submission provides for job and resource management flexibility, full GSI benefits, and a lightweight client.
• The Grid Manager carries most of the workload, thus enabling a perfect front-end Grid model, where no Grid-related software (other than the LRMS) has to be installed on cluster worker nodes. The Grid Manager is highly configurable.
• The data management functionality of the Grid Manager includes reliable file transfer, input data caching and registration of output data to data indexing services via built-in interfaces.
• The system has no central element or service: the information tree is multi-rooted, and matchmaking is performed by individual clients.


• The information about Grid jobs, users and resource characteristics is generated on demand and is kept locally on every resource in a Lightweight Directory Access Protocol (LDAP) [21] database. No higher level of data aggregation or cache is used; hence clients query each information source directly, which in turn either serves the data from its local cache, or promptly generates the requested information in case this cache has expired. The Information System adequately describes the Grid entities and resources in a dedicated LDAP schema.
• There is no intermediate service between the client and the resource: the Brokering Client performs resource discovery, matchmaking and actual Grid job submission. Users can share the clients, or have several user interfaces per user.

Fig. 1 shows several optional components, which are not required for an initial Grid setup, but are essential for providing proper Grid services. Most important are the Storage Services and the Data Indexing Services. As long as the users' jobs do not manipulate very large amounts of data, these are not necessary. However, when Terabytes of data are being processed, they have to be reliably stored and properly indexed, allowing further usage. Data are stored at Storage Elements (SE), which in ARC come in two kinds: the "regular" disk-based SE is simply a GridFTP interface to a file system (see Section 3.4), while the Smart Storage Element (SSE) is a Web service providing extended functionality, including file movement and automatic file indexing (see Section 3.5). File indexing itself needs a specially structured database with a proper GSI-enabled interface. ARC currently does not provide any indexing services. Instead, an interface to several widely used data indexing systems is implemented: the Globus Replica Catalog (RC) [22], the Globus Replica Location Service (RLS) [23] and gLite Fireman [24]. RC and RLS were developed by the Globus project with the goal to facilitate cataloging and consequent locating of data sources. The original Replica Catalog was not GSI-enabled; however, ARC developers have added the possibility to perform securely authenticated connections. Fireman is a component of the gLite [25] Data Management System and is developed by the EGEE project [26].

Another non-critical component that nevertheless often comes among the top-ranked in user requirements is the Grid Monitor (Section 4.4). In the ARC solution, it is only a client without capabilities to manipulate resources or jobs, providing information only to the end-users. It relies entirely on the Information System, being basically a specialized user-friendly LDAP browser. Monitoring in ARC does not require special sensors, agents or dedicated databases: it simply re-uses the Information System infrastructure. While monitoring offers nearly real-time transient views of the system, historical information has to be made persistent for further analysis, accounting, auditing and similar purposes. To provide for this, a job's usage record is submitted by the Grid Manager to a Logging Service (Logger) (see Section 6.2). Each group of users that desires to maintain such records can run a dedicated Logger.

For all the structure described above to serve a particular user, this user should be known (authenticated) to the Grid and be authorized to use one or more services according to the policies adopted by the service providers.


The ARC user management scheme is similar to that of most other Grid solutions: typically, Grid users are grouped in Virtual Organizations (VO), which provide their member lists via various User Databases or through a Virtual Organization Membership Service (VOMS [27]). ARC services that require authorization, such as the Computing Service or a Storage Service, subscribe to a subset of these databases (depending on local policies) and regularly retrieve the lists of authorized user credentials.

The overall architecture does not impose any special requirements upon computing resources. ARC services and clients can be deployed on a variety of computing resources, ranging from single-CPU desktop PCs through SMP systems to traditional computing clusters. In the case of the latter, the presence of a shared file system (e.g. the Network File System, NFS) between the head node and the compute (worker) nodes of a cluster is preferred, because if there is no such shared area, the LRMS has to provide means for staging data between the head node and the compute nodes. Cluster-wise, ARC implements a pure front-end model: no Grid middleware has to be installed on the worker nodes.

Being developed mainly for the purposes of serial batch jobs, ARC presently does not explicitly support a range of other computational tasks. Most notably, there is no specialized solution for interactive analysis, such as graphical visualization or other kinds of tasks requiring real-time user input. Although it attempts to appear to an end-user as not much different from a conventional batch system, ARC has a substantial difference: the results of a batch job do not appear in the user's local area until the user fetches them manually. The ARC job submission client does not provide automatic job resubmission in the case of trivial failures (such as network outages), nor does it implement job re-scheduling from one computing service to another. Higher level tools or Grid services can be developed to assist in such tasks (e.g., the Job Manager, see Section 6). ARC is not suitable for jobs that are expected to execute instantly: this would require distributed advance resource reservation, which is not implemented in the Grid world. A related shortcoming is that ARC provides no solution for cross-cluster parallel jobs. All these limitations do not stem from the architectural solution: rather, addressing them would require creating new dedicated services, quite possibly in the framework of the current design.

2.3. Comparison to similar Grid solutions

ARC was designed and implemented following one year of studies and try-outs of existing Grid solutions in 2001. It inherited some approaches from other projects, but it also introduced new ones, making it substantially different in many respects. Since then, ARC has been developing in parallel with such middlewares as the Globus Toolkit, the EDG/LCG/gLite line [7], UNICORE [6] and Condor [1], offering an alternative implementation of a similar service. This section gives a comparison of ARC and other related Grid middleware solutions.

The original purpose of the NorduGrid project was to use the EDG middleware to set up a Nordic Grid testbed.


This middleware was based on the Globus Toolkit, and when it turned out that the EDG solution would neither be delivered in time to set up the testbed, nor satisfy the community requirements of non-intrusiveness, security and portability, the Globus Toolkit became the natural solution on which to base ARC, as it was meant to be interoperable with the EDG middleware. It was also important that users already familiar with batch control systems could relatively easily switch to a new system. For that purpose the Globus Toolkit seemed to be the best solution, as it was modular and could potentially be used to build a system with the desired functionality. Although some of its components were not able to provide all the desired functionality and some interfaces were not documented, the very presence of multiple interfaces, a rich API of underlying libraries and the availability of the source code made the task of developing middleware based on documented protocols and interfaces look feasible.

Grid solutions featuring the Globus Toolkit cover a large array of technologies spanning from security to Grid service platforms [28]. While ARC uses some Globus Toolkit components, the aim of ARC is to provide a deployable out-of-the-box Grid solution with lightweight components, especially the client. The scope of ARC is in that sense similar to UNICORE and EDG with its follow-up middlewares LCG-2 [29] and gLite [25]. EDG, LCG-2 and gLite rely upon the concept of a dedicated Resource Broker service, with a sandbox for input and output data incorporated. The EDG-like middlewares use a two-step submission scheme: from the User Interface to the Resource Broker and then from the Resource Broker to the computing resources. The latter are equipped with the Globus pre-WS GRAM [30], mediated, however, through Condor-G [31] to enable scalability in the pre-WS GRAM interface. The Globus pre-WS GRAM tends to overload the front-end machine by running a new process for each serviced job for the whole duration of job execution. The necessity of backward connections to clients for data staging makes it problematic to use for clients behind network firewalls. The submission interface from the User Interface to the Resource Broker is EDG/LCG specific; however, in gLite this interface is changed to Condor-C, hence closing the gap between EDG/LCG/gLite and Condor even further. In the early days of EDG, the Condor-G interface was not yet added, which was one of the reasons for NorduGrid to diverge from EDG, since the pure pre-WS GRAM could not provide adequate stability and scalability. Further, the reliance of EDG-like solutions on a set of central services like the Resource Broker is seen as one of the shortcomings of these middlewares: not only is it a bottleneck, but in the case of a broker failure the task of job recovery can be problematic if the User Interface completely relies on the broker's job management machinery. ARC, though starting from the same base as LCG and gLite (but preceding them), does not use an intermediate resource broker; rather, the client does the brokering, and jobs are submitted through the interface built on top of the GridFTP channel, whose connection directions can be controlled, hence making firewalled direct user interfaces possible. Further, as the job states are maintained on the resource and the user interface, failure of any single service or client does not cause the loss of all jobs.

UNICORE relies on neither Globus nor Condor, and has a different scheme for security, information provision and file transfers. Quite a significant shortcoming of the UNICORE solution was the absence of a flexible and dynamic information system. Another issue with UNICORE is that it was mostly targeted at providing interfaces to particular applications. Since we needed general-purpose access to computing resources, UNICORE was unfortunately not satisfactory. The closed-source approach was also a strong argument against it.

Distributed computing and data handling services are provided also by Condor [1] and the Condor Stork data placement scheduler [32]. While Condor is essentially a batch processing system for computer clusters, Condor-G adds wide-area distributed inter-domain computing functionality, interfacing, among others, to the Globus Toolkit and, recently, to ARC computing servers. This model is similar to that of ARC, but back in 2001, Condor was not interfaced to other batch systems, while ARC is designed to utilize other local resource managers in addition to Condor. While considering possible solutions, it was taken into account that the middleware should interfere as little as possible with the existing cluster management infrastructures on sites. This consideration prevented us from choosing Condor as a possible solution, because at that time it was mostly meant to replace batch management systems. Another Condor shortcoming at that time was its strongly centralized infrastructure and the public unavailability of technical information and source code. While it is solid and reliable, this prevented any possibility to extend its features or to interface new modules to it.

3. Services

ARC provides a reliable, lightweight, production quality and scalable implementation of fundamental Grid services such as Grid job submission and management, Grid resource management, resource characterization, resource aggregation, and data management. A Grid enabled computing resource runs a non-intrusive Grid layer composed of three main services, see Fig. 2: the specialized GridFTP server (Section 3.1), the Grid Manager (Section 3.3), and the Local Information Service (Section 3.6.1). The first two constitute the Computing Service as such. A Grid enabled storage resource makes use of either of the two types of Grid storage layers provided by ARC: a conventional solution, based upon the GridFTP server (Section 3.4), or the Web service based (Section 3.2) Smart Storage Element (Section 3.5). Storage resources also run Local Information Services. Computing and storage resources are connected to the Grid via the registration service (Section 3.6.3), which links a resource to an Information Index Service (Section 3.6.2), this way implementing resource aggregation.

3.1. ARC GridFTP server

The GridFTP interface was chosen for job management and data movement.


Fig. 2. ARC Computing Service; arrows indicate data and information flow between the components.

ARC provides its own GridFTP server (GFS), with a GridFTP interface implemented using the Globus GSIFTP Control Connection application programming interface (API) of the Globus Toolkit libraries. The reason for introducing this server was the absence of GridFTP servers with flexible authorization. The only available implementation was the modified WU-FTPD FTP server distributed as part of the Globus Toolkit. With its whole configuration based on the UNIX identity of users, it was incapable of providing adequate authorization in distributed environments.

GFS consists of two parts. The front-end part implements the GridFTP protocol, authentication, authorization and configuration of the service. The back-end part implements the file system visible through the GridFTP interface. This solution also adds the possibility of serving various kinds of information through the GridFTP interface and of reducing the number of transport protocols used in the system, hence decreasing the complexity of the software.

GFS does not implement all the latest features of the GridFTP protocols, but it is compatible with the tools provided by the Globus Toolkit and other Grid projects based on it. The server part takes care of identifying the connecting client and applying access rules (see Section 3.7). In the configuration, a virtual tree of directories is defined, with directories assigned to various back-ends. The activation of the virtual directories is performed in the configuration, according to the Grid identity of a client. Therefore, each client may be served by various combinations of services. Information about the Grid identity of a user is propagated through the whole chain down to the back-ends. This makes it possible to write a back-end capable of making internal authorization decisions based entirely on the Grid identity of the connecting client, and thus avoids the necessity to manage dedicated local user accounts.

The back-end part is implemented as a set of plugins. Each plugin is a dynamic library which implements a particular C++ class with methods corresponding to the FTP operations needed to create a view of a virtual file-system-like structure visible through the GridFTP interface. Currently, the ARC distribution comes with three plugins.

Two of them, called fileplugin and gaclplugin, implement different ways of accessing local file systems (see Section 3.4 for descriptions). Their virtual file system almost corresponds to the underlying physical one. The third plugin is called jobplugin: it generates a virtual view of the session directories managed by the Grid Manager service and is described in Section 3.3.

3.2. HTTPSd framework

ARC comes with a framework for building lightweight HTTP, HTTP over SSL (HTTPS) and GSI (HTTPg) services and clients [33]. It consists of C++ classes and a server implementation. These classes implement a subset of the HTTP protocol, including the GET, PUT and POST methods, content ranges, and other most often used features. The implementation is not feature rich; instead, it is lightweight and targeted at transferring content with little overhead. It has some support for gSOAP [34] integrated in the implementation of the POST method, thus allowing creation of and communication with Web Services. This framework does not support message-level security.

The server implements configuration, authentication and authorization in a way common to all the network services of ARC. It listens on two TCP/IP ports for HTTPS and HTTPg connections. The configuration assigns various plugin services to paths. The services themselves are dynamic libraries which provide configuration and creation functions and supply an implementation of a C++ class responsible for handling HTTP requests. The client part is a thin layer which allows one to move data through the GET and PUT methods, and provides functionality matching that of the server part. The DataMove library, described in Section 5.2, uses its functionality to provide multi-stream access to HTTP(S/g) content.

ARC is currently delivered with two HTTP(S/g) plugins. Those are two Web Services: the Smart Storage Element (Section 3.5) and the Logger (Section 6.2). Some others are being developed.


Among them is a Web Service for job submission and control, intended to replace the jobplugin of the GFS. It also provides an interface for ordinary Web browsers to monitor the execution of Grid jobs.

3.3. Grid Manager

The purpose of the ARC Grid Manager (GM) is to take the actions on a computing cluster front-end which are needed to execute a job. The reason for introducing GM was the absence of corresponding services with the desired features in other Grid middlewares and toolkits. Some available solutions were considered and rejected. One of them, GRAM from the Globus Toolkit, lacked data staging capabilities and did not have the ability to share cached files between local user accounts. The way it handled TCP/IP ports for every managed job limited the number of active jobs at firewalled sites. Moreover, the complete absence of protocol documentation prevented us from implementing a compatible service with enhanced functionality. Another possible solution considered, Condor [1], required a central server, and required modification of user applications in order to provide data access capabilities.

In ARC, a job is defined as a set of input files, a main executable, the requested resources (including pre-installed software packages) and a set of produced files. The process of gathering the input files, executing the main executable, and transferring/storing the output files is carried out by GM (Fig. 2). Any data access performed by a job during execution beyond the predefined input files has to be done by the user application itself and is outside the scope of this service.

GM processes job descriptions written in the Extended Resource Specification Language (XRSL, see Section 4.1). Each job gets a directory on the execution cluster, the session directory. This directory is created by GM as a part of the job preparation process. The job is supposed to use this directory for all input and output data, and possibly other auxiliary files. All the file paths in the job description are specified relative to the session directory. There is no notion of the user's home directory, nor user ID, nor other attributes common in shell access environments. The main emphasis of GM is on serving jobs, not UNIX users. Hence, the possibility of sharing any information between jobs being processed is not guaranteed. Jobs executed on ARC-enabled clusters should not rely on anything specific to a user account, not even on the existence of such an account.

In the most common installations, the space for the session directory resides in an area shared by means of NFS with all the computing nodes. In this case, all the data that a job keeps in the session directory is continuously available to the user through the jobplugin of GFS, as described below. It is also possible to have a session directory prepared on the front-end computer and then moved to a computing node. This limits the amount of remotely available information during the job execution to that generated by GM itself.
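To make the job model concrete, the following is a minimal sketch of an XRSL job description of the kind GM processes (the full attribute set is defined in Section 4.1); the executable, file names, URLs and runtime environment name are purely illustrative.

  &(jobName="example-job")
   (executable="run.sh")
   (arguments="data.in")
   (inputFiles=("run.sh" "")
               ("data.in" "gsiftp://se.example.org/data/data.in"))
   (outputFiles=("result.out" "gsiftp://se.example.org/data/result.out"))
   (stdout="stdout.txt")
   (stderr="stderr.txt")
   (cpuTime="120")
   (runTimeEnvironment="APPS/EXAMPLE-1.0")

In this sketch the empty source string for run.sh is meant to indicate that the client itself uploads the file to the session directory, in line with the client push mechanism described below, while the other input and output files are staged in and out by GM.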

Input files are gathered in the session directory by the downloader module of GM. The downloader is a data management client application capable of fetching data from various sources, including those registered in indexing services. Support for replicas of data and for download retries makes data transfer more robust. To do that, the downloader uses the DataMove library described in Section 5.2. In the most common configuration, this module is run by GM directly on the front-end. To limit the load on the front-end machine, it is possible to control the number of simultaneous instances of the downloader modules and the number of downloadable files. GM can be instructed to run the downloader on a computing node as a part of the job execution. Such a configuration is desirable on computing clusters with slow or small disk space on the front-end.

GM supports various kinds of sources for input as well as for output files. These include pseudo-URLs expressed in terms of data registered in Indexing Services. All types of supported URLs can be found in Section 5.2. A client can push files to the session directory as a part of the job preparation. This way, the user can remotely run a replica of an environment prepared and tested on a local workstation without the trouble of arranging space on storage services.

GM handles a cache of input files. Every URL requested for download is first checked against the contents of the cache. If a file associated with a given URL is already cached, the remote storage is contacted to check the file's validity and the access permissions based on the user's credentials. If a file stored in the cache is found to be valid, it is either copied to the session directory, or a soft-link is used, depending on the configuration of GM. The caching feature can be turned off either by the resource administrator or by the user submitting the job. The caching capability may be configured to be private per local account or shared among serviced users.

GM can automatically register the files stored in its cache to a Data Indexing Service. This information is then used by a brokering client (see Section 4.2) to direct jobs to a cluster with a high probability of fast access to input data. This functionality has proved to be very useful when running many jobs which process the same input data. It is possible to instruct GM to modify the files' URLs and hence access them through more suitable protocols and servers. This feature makes it possible to have so-called "local Storage Elements" in a way transparent to users. With the possibility to have this information published in the Information System (see Section 3.6), this can significantly reduce the job preparation costs.

After the contents of the session directory have been prepared, GM communicates with the Local Resource Management System (LRMS) to run the main executable of the job. GM uses a set of back-end shell scripts in order to perform such LRMS operations as job submission, detection of the exit state of the main executable, etc. Currently supported LRMS types include various PBS [35] flavors, Condor and SGE [36]. There is also support for job execution on the front-end, meant for testing purposes only.

The output files generated by the job have to be listed in the job description together with their final destinations, or indicated to be kept on the computing resource for eventual retrieval.


After the job's main executable has stopped running, the output files are delivered to the specified destinations by the uploader module. The uploader is similar to the downloader described above. The session directories are preserved only temporarily after the session is finished. Hence, the user cannot directly re-use any produced data in sequential jobs and has to take care of retrieving output files kept on the computing resource as soon as possible.

Support for software runtime environments (RTE, see also Section 4.1) is also a part of GM. RTEs are meant to provide unified access to pre-installed software packages. By requesting a specific RTE in the job description, the job is meant to be executed in an environment fitted with the necessary utilities, libraries, environment variables etc. In this way, RTEs provide predictable and functionally identical environments for jobs. RTEs are especially useful in providing different versions of the same software package or otherwise overlapping software. In GM, support for RTEs is implemented through shell scripts executed a few times during the lifetime of a job: after the preparation phase of the job, before running the main executable, and after it has finished (a sketch is given below). The scripts may set up environments needed by pre-installed software packages, and even modify the job description itself. To standardize the creation of new RTEs and to make it easier for users to find out how to use available RTEs, a registration portal was established [37] by a NorduGrid partner, the Nordic Data Grid Facility.
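As an illustration only, an RTE script along these lines might look as follows; the numeric stage argument (0 after job preparation, 1 before the main executable starts, 2 after it has finished) and the package paths are assumptions of this sketch rather than definitions quoted from the paper.

  # Hypothetical runtime-environment script for a package "EXAMPLE 1.0".
  case "$1" in
    0)  ;;   # job preparation phase on the front-end: nothing to do here
    1)  EXAMPLE_DIR=/opt/example-1.0             # before the main executable runs
        PATH="$EXAMPLE_DIR/bin:$PATH"
        LD_LIBRARY_PATH="$EXAMPLE_DIR/lib:$LD_LIBRARY_PATH"
        export EXAMPLE_DIR PATH LD_LIBRARY_PATH ;;
    2)  ;;   # after the main executable has finished: optional cleanup
  esac

A job requesting runTimeEnvironment=APPS/EXAMPLE-1.0 in its description would then find the package on its PATH without knowing where it is installed on a particular cluster.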


A job passes through several states in GM, and an external plugin can be assigned to each state. This way, the functionality of GM can be significantly extended with external components to include e.g. logging, fine-grained authorization, accounting, filtering, etc.

GM itself does not have a network port. Instead, it reads the information about job requests from the local file system. This information is delivered through the GridFTP channel by using a customized plugin for the ARC GridFTP server, named jobplugin. Information about the jobs served by GM and the contents of their session directories is presented by the same plugin through the virtual file system. Session directories are mapped to FTP directories. If the computing resource uses a shared filesystem, this also allows access to current information (produced by the job at execution time) through a convenient interface.

The downloader module is not the only agent allowed to provide input files for a job. The files which are not accessible from a machine running the downloader (like those stored on the user's computer) can be provided by the client through direct upload to the session directory via the jobplugin GridFTP interface. This eliminates the need to run remotely accessible services on the client side, and allows creating a user interface which does not require negotiation of open ports in a local firewall and can even be used in Network Address Translation (NAT) environments.

The jobplugin also presents a set of virtual directories and files. FTP manipulations on these objects allow for job management. All job manipulation procedures are done by uploading an instructions file (in XRSL, Section 4.1, or JSDL [38] format) to a virtual directory. For common operations like job cancellation there are short-cuts provided, which allow such operations to be performed by inexpensive FTP manipulations on the directory representing a job. Information about jobs can be obtained through the contents of special virtual files, or through the Information System, as described in Section 3.6. Detailed information about the jobplugin interface is available in the NorduGrid technical note [39].

Access through GridFTP to every piece of information is protected, and flexible access control over the job data is possible. By default, every piece of information about the job is accessible only by a client which presents the same credentials as those used to submit the job. By adding GACL [40] rules to the job description, a user can allow other users to see the results of the job and even to manage it. Thus users can form groups to share the responsibility for submitted jobs.

GM has several other features. Those include e-mail notification of job status changes; support for site-local credentials through customizable plugins; authorization and accounting plugins; and the possibility to report jobs to a centralized logging service (see Section 6.2).

3.4. Classic Storage Element

Most Storage Elements on sites equipped with ARC are GridFTP servers. A significant number of them use the ARC GridFTP server with one of the following plugins.

The fileplugin presents the underlying file system in a way similar to ordinary FTP servers, with file access based on a mapping between the client certificate subject and a local UNIX identity. It is also possible to specify static access restrictions for pre-defined directories directly in the configuration of the GridFTP server (GFS). In addition, there is a way to completely disregard UNIX identities of the clients, thus relying only on the capability of GFS to create virtual file system trees, or to mix both ways. This plugin is meant to be used as a replacement for an ordinary FTP server in user mapping mode, or for user communities which do not need complex dynamic access control inside dedicated storage pools.

The gaclplugin uses GACL [40], an XML-based access control language, in order to control access to files and directories on a storage element. Each directory and file has an associated GACL document stored in a dedicated file. In order not to break compatibility with the FTP protocol, modification of GACL rules is performed by downloading and uploading those files. GACL files themselves are not visible through FTP file listing commands. Each GACL document consists of entries with a list of identities and access rules associated with them. There is no predefined set of supported types of identities. Currently, ARC can recognize an X.509 certificate subject (Distinguished Name [41]), VOMS names, groups, roles and capabilities, and membership of a predefined Virtual Organization (lists of members must be created by a third-party utility). GACL permissions control the possibility to read and write objects, list the contents of directories or get information about stored files. The admin permission allows one to obtain and modify the contents of GACL documents. A GridFTP server with the gaclplugin is most convenient for providing flexible dynamic access control based purely on Grid identities.
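For illustration, a GACL document attached to a directory on such a storage element could look roughly as follows; the distinguished names are fictitious, and only person-type identities with the read, list, write and admin permissions discussed above are shown.

  <gacl>
    <entry>
      <person>
        <dn>/O=Grid/O=NorduGrid/OU=example.org/CN=Resource Owner</dn>
      </person>
      <allow> <read/> <list/> <write/> <admin/> </allow>
    </entry>
    <entry>
      <person>
        <dn>/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe</dn>
      </person>
      <allow> <read/> <list/> </allow>
    </entry>
  </gacl>

Here the first entry grants full control, including modification of the GACL document itself, to the owner identity, while the second entry gives another Grid identity read-only access.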


3.5. Smart Storage Element

A "Smart" Storage Element (SSE) is a data storage service that performs a basic set of the most important data management functions without user intervention. It is not meant to be used in an isolated way. Instead, its main purpose is to form a data storage infrastructure together with similar SSEs and Data Indexing Services, such as the Globus Replica Catalog or Replica Location Service. The main features of SSE and of the data storage infrastructure which can be implemented using SSE are as follows (see also Fig. 3):

• SSE can be used as a stand-alone service, but preferably should be part of a full infrastructure and must be complemented with Data Indexing Services (DIS).
• A DIS can be either a common-purpose or an application-specific one. The modular internal architecture of SSE allows adding (on the source code level) interfaces to previously unsupported DIS types. For every DIS, a C++ class responsible for registering and unregistering instances of data has to be implemented. SSE currently comes with interfaces to the Globus Replica Catalog and Replica Location Service, and an empty interface for standalone deployment.
• SSE has no internal structure for storing data units (files). Files are identified by their identity used in a DIS (Logical File Name, GUID, Logical Path, etc.).
• All operations on data units (creation, replication, deletion of a replica, etc.) are done upon a single request from a client through SSE. Hence most operations are atomic from the client's point of view. Nevertheless, objects created during those operations do not always have a final state and may evolve with time. For example, a created file can be filled with data, then validated, and then registered.
• Entry points to a DIS infrastructure should be used by clients only for operations not involving data storage units. If an operation involves any modification of SSE content which has to be registered in the DIS, any communication with the DIS is done by the SSE itself.
• Access to data units in SSE is based on the Grid identity of the client and is controlled by GACL rules, for flexibility.
• Data transfer between SSEs is done upon request from a client, by the SSEs themselves (third-party transfers using the "pull" model).

The interface to SSE is based on SOAP over HTTPS/HTTPg (Web Services) with data transfer through pure HTTPS/HTTPg. It is implemented as a plugin for the HTTPSd framework (see Section 3.2).

Fig. 3. An example of an architecture of a Data Storage Infrastructure.

The SOAP methods implemented in SSE allow it to:

• Create a new data unit and prepare it to be either filled by the client, or to have its contents pulled from an external source. The SSE can pull data both from inside the infrastructure (replication) and from external sources. SSE uses the DataMove library (see Section 5.2) for data transfer and supports all the corresponding protocols.
• Get information about stored data units; a regular expression search for names is possible.
• Inspect and modify the access rules of data units.
• Update the metadata of stored units. This method is mostly used to provide checksums needed for content validation.
• Remove data units.

Recently, an SRM v1.1 [42] interface has been implemented for SSE. Work to provide support for the current standard, SRM v2.1.1, is in progress.

3.6. Information system

The ARC middleware implements a scalable, production quality, dynamic, LDAP-based distributed information system via a set of coupled resource lists (Index Services) and local LDAP databases. The implementation of the ARC information system is based on OpenLDAP [43,21] and its modifications provided by the Globus MDS framework [44,45]. NorduGrid plans to replace the Globus LDAP modifications with native OpenLDAP functionalities. The system consists of three main components: (1) Local Information Services (LIS), (2) Index Services (IS), and (3) Registration Processes (RP). The components and their connections, shown in the overview Fig. 4, are described in the following subsections.

3.6.1. Local Information Service

Local Information Services (LIS) are responsible for resource (computing or storage) description and characterization. We note that the description of Grid jobs running on a resource is also a part of the resource characterization: job status monitoring within ARC is done via querying the LIS components of the information system.


Fig. 4. Overview of the ARC information system components. Clients issue type-1 queries (dotted arrows) against Index Services which maintain a dynamic list of contact URLs of LISs and further Index Services registering to them (registration processes are marked by dashed arrows). Then clients perform type-2 queries (solid arrows) to discover resource properties stored in the LISs.

Fig. 5. Internal structure of the Local Information Service.

The local information is generated on the resource, and optionally cached locally within the LDAP server. Upon client requests, the information is presented to the clients via an LDAP interface. A LIS is basically nothing more than a specially populated and customized OpenLDAP database. Fig. 5, together with the description below, gives a detailed account of the internal structure of the LIS.

The dynamic resource state information is generated on the resource.

or the GridFTP server), or from the local operating system (e.g., information available in the /proc area). Currently, ARC infoproviders are interfaced with the UNIX fork, the PBS family (OpenPBS, PBS-Pro, Torque), Condor and Sun Grid Engine (SGE) batch systems. The output of the information providers (generated in LDIF format) is used to populate the R LIS. A special OpenLDAP back-end, the Globus GRIS [45], is used to store the LDIF output of the information providers.
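As an illustration, the LDIF produced by an information provider for a computing resource might look roughly as follows. The entry is heavily abbreviated, the host name and values are fictitious, and the attribute names are only indicative of the ARC schema documented in [46].

    dn: nordugrid-cluster-name=grid.example.org, Mds-Vo-name=local, o=grid
    objectClass: nordugrid-cluster
    nordugrid-cluster-name: grid.example.org
    nordugrid-cluster-aliasname: Example Grid Cluster
    nordugrid-cluster-lrms-type: torque
    nordugrid-cluster-totalcpus: 120
    nordugrid-cluster-usedcpus: 87

Such entries are written into the GRIS back-end and served, possibly from the cache, to querying clients.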


The information stored in the LISs follows the ARC information model. This model gives a natural representation of Grid resources and Grid entities. It is implemented via an LDAP schema by specifying an LDAP Directory Information Tree (DIT). The schema naturally describes Grid enabled clusters, queues and storage elements. The schema, unlike other contemporary Grid schemas, also describes Grid jobs and authorized Grid users. A detailed description of this information model is given in the NorduGrid technical note [46].

3.6.2. Index Services

Index Services are used to maintain dynamic lists of available resources, containing the LDAP contact information of the registrants. A registrant can be either a LIS or another Index Service. The content of the Index Service, that is, the information about the registrants, is pushed by the registrants via the registration process and is periodically purged by the Index Service, thus maintaining a dynamic registrant list.

Index Services are implemented as special purpose LDAP databases: the registration entries are stored as pseudo-LDAP entries in the Globus GIIS LDAP back-end [45]. Although the registration information is presented in a valid LDIF format, the registration entries do not constitute a legitimate LDAP tree; therefore they are referred to as the pseudo-LDAP registration entries of the Index Service. A specially crafted LDAP query is required to obtain the registration entries. Besides the registrant tables, no other information is stored in the ARC Index Services. The current implementation, based on misuse of the GIIS back-end, is planned to be replaced by something more suitable: the simplistic functionality of maintaining a dynamic list of registrant tables could be trivially implemented, e.g., in native LDAP. Furthermore, the ARC Index Service does not implement any internal query machinery; Index Services are basically plain lists of contact URLs of other Index Services and LISs.

3.6.3. Registration processes, topology

Individual LISs and Index Services need to be connected and organized into some sort of topological structure in order to create a coherent information system. ARC utilizes registration processes and Index Services to build a distributed information system out of the individual LISs and Index Services. During the registration process, the registrant (lower level) sends its registration packet to an Index Service. The packet contains information about the host (the registrant) initiating the process and the type of the information service running on the registrant (whether it is a LIS or an IS). The registration message is basically an LDAP contact URL of the information service running on the registrant.

Fig. 6. Resources (LIS) and Index Services are linked via the registration process creating a hierarchical multi-rooted tree topology. The hierarchy has several levels: the ω lowest level (populated by LIS), the γ organisation/project level, the β country/international project level and the α top (root) Grid level. The tree is multi-rooted, such that there are several α-level Index Services and in bigger countries there are several β- or country-level Index Services.

Registrations are always initiated by the registrants (bottom-up model) and are performed periodically; thus the registration mechanism follows a periodic push model. Technically, the registrations are implemented as periodic ldapadd operations.

The LISs and the Index Services of ARC are organized into a multi-level tree hierarchy via the registration process. The LISs that describe storage or computing resources represent the lowest level of the tree-like topology. Resources register to first level Index Services, which in turn register to second level Index Services, and so forth. The registration chain ends at the Top Level Indices which represent the root of the hierarchy tree. The structure is built from bottom to top: the lower level always registers to the higher one.

The tree-like hierarchical structure is motivated by the natural geographical organization, where resources belonging to the same region register under a region index, region indices register to the appropriate country index, while country indices are grouped together and register to the top level Grid Index Services. Besides the geographical structuring, there are some Index Services which group resources by specific application area or organization. These application/organization Index Services either link themselves under a country Index or register directly to a Top Level Index. In order to avoid any single point of failure, ARC suggests a multi-rooted tree with several top-level Indices. Fig. 6 shows a simplified schematic view of the hierarchical multi-rooted tree topology of ARC-connected resources and Index Services.
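On the resource side, each registration process is configured by the site administrator, essentially by naming the Index Service to which the LIS should push its contact URL and how often. The block below is only a sketch of such a configuration: the host name is a placeholder and the option names and values are indicative rather than authoritative — the exact syntax is defined by the ARC server configuration documentation.

    [infosys/cluster/registration/NorduGrid]
    targethostname="index.example.org"
    targetport="2135"
    targetsuffix="Mds-Vo-name=NorduGrid,o=grid"
    regperiod="120"

A resource may carry several such blocks, one per Index Service it registers to, which is how the multi-rooted topology of Fig. 6 is realized in practice.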


3.6.4. Operational overview

Grid clients such as monitors, Web portals or user interfaces (Section 4) perform two types of queries against the Information System:
(1) During the resource discovery process, clients query Index Services in order to collect lists of LDAP contact URLs of Local Information Services describing Grid resources.
(2) During a direct resource query, the clients contact the LISs directly, making use of the obtained LDAP contact URLs.

Once the client has collected the list of resource LDAP contact URLs (by performing type-1 queries against the Index Services), the second phase of the information collection begins: type-2 queries are issued against LISs to gather resource descriptions. Both types of queries are carried out and served via the LDAP protocol. Unlike other similar purpose systems (Globus Toolkit 2 GIIS or Globus Toolkit 4 aggregator [47], or R-GMA [48]), ARC has no service which caches or aggregates resource specific information on a higher level. ARC Index Services are not used to store resource information; they only maintain dynamic lists of LDAP URLs. Fig. 4 presents an overview of the ARC information system components, showing the relations between the Index Services, the LISs, the registration processes and the two types of queries.
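As an illustration, both query types can be issued with a standard LDAP client. The host names below are placeholders; the port and base DNs follow the conventional values used by ARC deployments, which individual installations may override, and the exact filter for extracting registration entries may differ between versions.

    # type-1 query: ask an Index Service for the contact URLs of its registrants
    ldapsearch -x -h index.example.org -p 2135 \
        -b 'Mds-Vo-name=NorduGrid,o=grid' -s base giisregistrationstatus

    # type-2 query: ask a LIS directly for the description of a computing resource
    ldapsearch -x -h grid.example.org -p 2135 \
        -b 'Mds-Vo-name=local,o=grid' '(objectClass=nordugrid-cluster)'

The first query returns a list of LDAP contact URLs; the second returns entries of the kind shown in Section 3.6.1.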


3.6.5. Scalability considerations

Designing and implementing a scalable and production quality Grid information system is a major challenge: some of the production Grid middlewares do not even address the problem (e.g. UNICORE), while others simplify it by following one of two extreme approaches:
(1) Collect all the Grid information centrally and provide this central cache to the clients.
(2) Keep all the information locally, close to the services, and require clients to perform direct queries to the fully distributed information sources.

Globus with its original MDS proposed the theoretical "golden mean" of a hierarchical system of Information Caches; however, it provided a rather poor implementation [49], which forced other production Grid middleware deployment and development projects to choose one of the two options above. LCG [29], the largest operational production Grid in the world, deploys and operates a central cache to collect all the Grid information data, while NorduGrid, running on the ARC middleware, follows the second alternative. Other proposed solutions, such as the Relational Grid Monitoring Architecture, R-GMA [48], have not yet been proven in production environments.

Both extreme approaches have their own inherent scalability problems. The central cache faces the challenge of collecting, updating and presenting information describing the entire Grid in a single central service, thus clearly introducing a single point of failure and a severe performance bottleneck. The fully distributed information sources alternative (the ARC approach described in the previous sections) solves the bottleneck problem but results in a different scalability problem, namely the necessity of a large number of client queries. The current ARC deployments have shown that the Local Information Services can easily serve up-to-date (not older than 30 s) information about dynamic properties of resources, hundreds of Grid users and thousands of Grid jobs. Clients using the parallel resource discovery have so far had no problems fetching information from the few hundred LISs. ARC developers are aware of the scalability limitations of the current approach: there are plans to introduce caching services and to implement more sophisticated discovery algorithms.

3.7. Security framework of services

The security model of the ARC middleware is built upon the Grid Security Infrastructure (GSI) [3]. Currently, ARC services use transport level rather than message level security for communication [33]. The ARC services that make use of authentication and authorization are those built on top of the GridFTP server and the HTTPSd framework.

The authentication part makes use of Certificate Authority (CA) signing policies, whereby sites can limit the signing name-space of a CA. Moreover, ARC provides a utility (grid-update-crls) to retrieve certificate revocation lists on a regular basis.

In a basic and largely obsolete authorization model, ARC services support a simple mapping of X.509 certificate subjects (Distinguished Names) to UNIX users through the grid-mapfile mechanism. Automatic grid-mapfile creation can be done using the nordugridmap utility, which can fetch Virtual Organization membership information from LDAP databases and simple X.509 subject lists through FTP and HTTP(S). In addition, it is possible to authorize local users and to specify filters and black-lists with this utility.

For advanced authorization, most of the ARC services are capable of acting purely on the Grid identity of the connecting client without relying on local identities. Those services support a unified set of rules which allows identifying users, grouping them and choosing an adequate service level. A mapping to a local identity must exist, however, since the back-end services usually use the local operating system to handle access control. In addition to the personal Grid identity of a user, other identification mechanisms are supported. Among them are the Virtual Organization Membership Service (VOMS) and general VO membership (through external information gathering tools). It is also possible to use external identification plugins, thus providing a high level of flexibility.

In addition to access-level authorization, many services make use of the client Grid identities internally to provide fine-grained access control. Currently, all such services use the GACL language for that purpose. Fine-grained access control is available in the GridFTP server's file access and job control plugins and in the SSE service.
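For illustration, the two authorization mechanisms mentioned above could look as follows; the certificate subjects, the local account name and the granted rights are all fictitious. A grid-mapfile line simply maps a certificate subject to a local UNIX account:

    "/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe" griduser1

whereas a GACL document attached to an object grants rights directly to Grid identities. The element names below follow the GACL language used by GridSite [40]; the document is a minimal sketch granting read and list access to one person and administrative rights to another:

    <gacl version="0.0.1">
      <entry>
        <person><dn>/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe</dn></person>
        <allow><read/><list/></allow>
      </entry>
      <entry>
        <person><dn>/O=Grid/O=NorduGrid/OU=example.org/CN=SSE Admin</dn></person>
        <allow><admin/></allow>
      </entry>
    </gacl>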


The job handling service – the Grid Manager – also has a pluggable authorization mechanism that can provide fine-grained authorization and accounting during job execution (see Section 3.3) through third-party plugins.

4. User interfaces and clients

Users interact with the ARC services on the Grid resources using various client tools. With the help of the basic command line interface (CLI, Section 4.2), users can submit jobs to the Grid, monitor their progress and perform basic job and file manipulations. Jobs submitted through the command line interface must be described in terms of the Extended Resource Specification Language (Section 4.1). The basic command line interface for job handling is similar to a set of commands that can be found in a local resource management system. In addition to this, there are commands that can be used for basic data management, like creating and registering files on storage elements and administering access rights to the data.

In addition to the CLI, the ARC client package includes some commands from the Globus Toolkit, in particular those needed for certificate handling and data indexing (see Section 4.3). Several other clients interacting with the ARC services have been developed as well. One example is the ARC Grid Monitor (see Section 4.4), which presents the information from the Information System in a Web browser.

4.1. Resource Specification Language

The XRSL language used by ARC to describe job options is an extension of the core Globus RSL 1.0 [50]. The extensions were prompted by the specifics of the architectural solution, and resulted in the introduction of new attributes. In addition, a differentiation between two levels of job option specification was introduced. The user-side XRSL is a set of attributes specified by a user in a job description file. This file is interpreted and pre-processed by the brokering client (see Section 4.2), and after the necessary modifications is passed to the Grid Manager (GM). The GM-side XRSL is a set of attributes prepared by the client, ready to be interpreted by the GM. The purpose of XRSL in ARC is therefore dual: it is used not only by users to describe job requirements, but also by the client and the GM as a communication language.

The major challenge for many applications is pre- and post-staging of a considerable number of files, often of large size. To reflect this, two new attributes were introduced in XRSL: inputFiles and outputFiles. Each of them is a list of local–remote file name or URL pairs. Input files local to the submission node are uploaded to the execution node by the client; the rest are handled by the GM. The output files are moved upon job completion by the GM to a specified location (Storage Element). If no output location is specified, the files are expected to be retrieved by the user via the client.

Another important addition is the runtimeenvironment attribute. Very often jobs require a specific local environment in order to get properly executed. This includes not merely – and not necessarily – the presence of specific pre-installed software, but also proper settings of local variables and paths, the existence of certain services or databases, a particular system configuration and so on. It is convenient to denote such a suite of settings with a single tag – a runtime environment name. This tag is validated and advertised by the resource owners and simplifies matchmaking, since users can request the necessary runtime environment tag by means of XRSL. A runtime environment (RTE) is a more advanced concept than a set of environment variables, as it may indicate the presence of a specific operating system, compiler, file system or even a dedicated hardware setup, and it also allows one to avoid clashes when different environments carry identical names at different sites. RTEs were introduced originally by the EDG middleware, and were developed further for ARC by formalizing RTE name-spaces. It is the responsibility of Virtual Organizations to define such complex environments and to validate and advertise them. In general, a user is not expected to know which particular set-up is needed in order to execute a VO-specific task: a requirement of a well-defined RTE in the job description is sufficient to ensure the task will be executed properly.

Several other attributes were added in XRSL, allowing users to specify their preferences in detail [51]. A typical XRSL file is shown in Box I.

It is worth noting that since the release of Globus Toolkit 3, Globus provides an improved, XML-based version of RSL. It includes several new attributes, such as fileStageIn and fileStageOut, which are almost entirely analogous to the corresponding NorduGrid XRSL attributes. However, Globus Toolkit 3 became available only in 2003, approximately a year after NorduGrid introduced such attributes, and it still does not support a concept as complex as the runtime environment. It is also remarkable that most Grid middleware projects develop their own languages to describe tasks, such as another, independently developed XML-based XRSL by the PROGRESS project [52], Condor's ClassAds [30] and the EDG JDL [7] inspired by it, JRDL by GRASP [53] or AJO by UNICORE [6], to name a few. We in NorduGrid look forward to a standardized language that satisfies everybody's needs, and the recently proposed JSDL [38] by GGF appears to be a proper basis for such a language.

4.2. ARC Command Line Interface

The ARC Command Line Interface (CLI) provides basic tools for simple Grid job management and single file manipulation. It also incorporates Grid resource discovery, matchmaking and some brokering functionality, see Fig. 7. The ARC CLI provides all the tools necessary to work in the Grid environment and yet remains small in size and simple to install. A user can run several CLIs: clients installed on different workstations can be used simultaneously and interchangeably by the same user without the necessity to instantiate them as permanently active services.


Box I.
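The contents of Box I are not reproduced here. As a substitute, the following minimal job description illustrates the attributes discussed in Section 4.1; all file names, URLs and values are hypothetical, and the attribute names are written in the case-insensitive form accepted by XRSL.

    &(executable="run.sh")
     (arguments="input.dat")
     (inputFiles=("run.sh" "")
                 ("input.dat" "gsiftp://se.example.org/data/input.dat"))
     (outputFiles=("result.dat" "rls://rls.example.org/results/result.dat"))
     (jobName="example-job")
     (cpuTime="600")
     (memory="512")
     (runTimeEnvironment="APPS/CHEM/EXAMPLE-1.0")
     (stdout="out.txt")
     (stderr="err.txt")

Here run.sh is taken from the submission directory (empty source), input.dat is staged in by the GM from a Storage Element, and result.dat is uploaded and registered after the job completes.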

Fig. 7. ARC Brokering Client components.

Table 1
The job management commands in the ARC user interface

ngsub     Job submission
ngstat    Find information about resources and jobs
ngcat     Catenate stdout, stderr or the grid manager's error log of a job
ngget     Retrieve the results of a finished job
ngkill    Kill a running job
ngclean   Clean up after a job (deletes the session directory on the cluster)
ngrenew   Renew the GSI proxy certificate for a job
ngresub   Resubmit the job (using the same XRSL) to a different cluster
ngsync    Synchronize the local job data base using the Information System
ngtest    Perform various tests

4.2.1. Job management

A user should be able to submit jobs to the Grid without having to worry about which particular cluster will execute a job. From a user's perspective it should not be much different from submitting a job to a local cluster. The set of commands used for job management in the ARC CLI is therefore similar to the set of commands provided by a local resource management system on a cluster. The commands are listed in Table 1.
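A typical session, with a hypothetical job description file and a shortened job ID, might look as follows; the exact option names may vary between client releases.

    grid-proxy-init                 # create a GSI proxy certificate
    ngsub -f hello.xrsl             # submit the job; the client prints a job ID
                                    # such as gsiftp://grid.example.org:2811/jobs/12345
    ngstat gsiftp://grid.example.org:2811/jobs/12345    # follow the job status
    ngcat  gsiftp://grid.example.org:2811/jobs/12345    # inspect its stdout
    ngget  gsiftp://grid.example.org:2811/jobs/12345    # retrieve the results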

Job submission is done using the ngsub command, which invokes a built-in brokering mechanism. Jobs that are submitted must be described using the XRSL specification language (Section 4.1). When a job description is given to ngsub, its requirements are compared to the capabilities of the computing resources that can be discovered through the Information System. Brokering between the requirements and the capabilities is performed and a queue that satisfies the given criteria is selected as the submission target. The ngsub client then modifies the user-side XRSL by resolving optional arguments before sending the job description to the selected target as a GM-side XRSL.

The brokering capabilities in the ARC middleware are distributed, since each client acts as its user's individual broker, thereby avoiding the possible single point of failure and potential bottleneck that would exist if a centralized broker were used.

Several attributes in the XRSL are used in the brokering. A specific cluster or queue can be selected by providing the corresponding XRSL attributes.


The requested number of CPUs, memory, architecture, operating system or middleware version and pre-installed runtime environments are compared to the capabilities published by the cluster in the information system, and those targets that fail to meet the specified requirements are rejected. The brokering algorithm takes into account the size of the input files and where they are located. It tries to minimize the amount of data that has to be transferred from storage elements that are not local to the submission target. It resolves the location of files registered in file index servers like RC and RLS and, if cache registration is enabled at the clusters, the information about cached files is used as well. If there are no free CPUs available on a cluster where the data is already present, the job description is sent to a different cluster, thereby triggering the transfer of the data to that cluster.

Compared to other brokering schemes, like that of Condor [1], the brokering done in the ARC client is much simpler. Condor uses several variables to calculate a final ranking number which then governs the actual choice of resource. Some of these variables are even based on estimates of when a cluster becomes ready. In that sense the Condor brokering, and hence also the EDG/LCG/gLite one, seems to be superior to the brokering done in ARC. However, it is important to bear in mind the origin of the advanced Condor scheme: to harvest resources for the single researcher in the most optimal way, and not necessarily to optimize the use of the resources. It is our firm belief that the simpler ARC scheme utilizes the resources at least as well as the Condor-based brokering, and comparisons in mass productions like the CERN ATLAS DC2 (Section 7) have shown that the ARC brokering performs very well indeed.

The ngstat command is a user tool that queries the Information System and presents the result to the user. It can be used both for getting information about available computing resources and their capabilities and for checking the status of jobs submitted to the system. There are also commands for killing queuing or running jobs, for retrieving the outputs of finished jobs and for other job management tasks.

4.2.2. Data Management

Most of the Data Management tasks are automatic from a user's perspective, since the file movement and registration are performed on the user's behalf by the Grid Manager, provided the correct attributes are specified in the XRSL job description. There are also user tools that interact with the Storage Services and Data Indexing Services directly (see Table 2). These commands also provide for higher level data management, since they support pseudo-URLs [54] referring to Index Server registrations. If such pseudo-URLs are used, the file registration in the Data Indexing Service will be modified in a consistent way to reflect the changes in the Storage Services.

4.3. External commands

The ARC middleware is built on top of the pre-WS components of the Globus Toolkit.

Table 2
Data management commands in the ARC CLI

ngcopy     Copy files to, from or between Storage Servers
ngremove   Delete files from Storage Servers
ngls       List files on Storage or Index Servers
ngacl      Set or get the access control list of a file
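For example, the commands in Table 2 can be used to copy a file from a Storage Element to local disk, to list a directory and to inspect the access control list of a file. The host names below are placeholders, the URL forms follow [54], and the exact option syntax may differ between releases.

    ngcopy gsiftp://se.example.org/data/result.dat file:///tmp/result.dat
    ngls gsiftp://se.example.org/data/
    ngacl get gsiftp://se.example.org/data/result.dat

When an indexing pseudo-URL (e.g. an rls:// location) is used instead of a plain gsiftp:// URL, the corresponding registrations in the Data Indexing Service are updated consistently.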

The toolkit provides command line interfaces to some of the components used along with ARC. In particular, the ARC middleware uses the Globus Security Infrastructure (GSI) for authentication. Users therefore make use of the toolkit's command line tools that handle the creation and management of proxy certificates, such as grid-proxy-init. Several Globus data management technologies have been integrated into the ARC middleware, such as the Replica Catalog and the Replica Location Service. In order to perform data management, users should use the clients of these services. Most notably, the globus-rls-cli tool is very useful for manipulating and querying registered file attributes in the Replica Location Service.

4.4. The Monitor

Reliable and up-to-date information about the Grid system, its status, its components, its users and even single tasks is provided by means of the ARC Grid Monitor [55,56]. The Monitor is an end-user tool, offering a simple way of checking the status of the Grid.

The ARC Grid Monitor is a Web interface to the ARC Information System (Section 3.6), allowing users to browse all the data published through it. The Information System provides a robust and dynamic model for accessing not only quasi-static information about resources and services, but also such rapidly changing parameters as queue and job status. Being based on OpenLDAP, it can easily be interfaced to any browsing or monitoring tool, giving a user-friendly overview of all the resources. The Grid Monitor makes use of the PHP4 [57] LDAP module to provide real-time monitoring and initial debugging via any Web browser.

The structure of the Grid Monitor to a great extent follows that of the Information System. The basic objects are defined by the schema's objectclasses. For each objectclass, either an essential subset of attributes or the whole list of them is presented in an easily accessible, inter-linked manner. This is realized as a set of browser windows, each being associated with a corresponding module. Each module displays both dynamic and static information: for example, a queue name is static, while the number of running jobs in this queue is dynamic. Most of the displayed objects are linked to appropriate modules, such that, with a simple mouse click, a user can launch another module, expanding the information about the corresponding object or attribute. Each such module opens in its own window and gives access to other modules in turn, providing intuitive browsing. The Grid Monitor's framework allows full localization, making use of the browsers' preferred language settings.


Currently, the Monitor is available in 7 languages, being perhaps the first multilingual Grid tool. The Monitor layout is deliberately simplistic, ensuring that it works smoothly with any Web browser. It has been used via a variety of browsers, from the text-based Lynx to a mobile phone browser. This effectively allows ARC users to interact with the Grid literally by phone.

5. Developer interfaces

The public interfaces of ARC-enabled resources are expressed via standard open protocols: GridFTP, LDAP and HTTP(S). Any client capable of talking LDAP and GridFTP can interface to all ARC computing and most storage resources. In order to ease this communication and to hide the low-level details of the LDAP schemas and the GridFTP internals (e.g. job submission, control commands), ARC provides a set of developer libraries. The libraries also include convenient solutions for handling GSI identities and XRSL processing, among others. The next subsection describes the ARCLib library, focusing on job submission including brokering, while Section 5.2 introduces a common purpose data transfer library.

5.1. ARCLib and JARCLib

ARCLib is a client library for ARC written in C++ with a well-defined API. The library consists of a set of classes for

• handling proxy, user and host certificates,
• discovering and querying resources like clusters and storage elements,
• handling XRSL job descriptions,
• operating on GridFTP servers — uploading and downloading files etc.,
• brokering and
• job submission.

ARCLib also incorporates a set of common useful functionalities suitable for both clients and servers, like URL handling, conversion routines for dates and times and so on. ARCLib is complete enough to perform most client operations but is still very simple to use. Furthermore, ARCLib is wrapped using SWIG [58] and thus also provides a Python [59] interface to the API. ARCLib is therefore well suited for writing client utilities, graphical user interfaces and portals. ARCLib depends on the pre-WS Globus Toolkit for certificate handling, RSL descriptions and interaction with GridFTP servers, and on OpenLDAP for discovering and querying the information system for resources. Otherwise the library is self-contained.

Resource discovery and querying in ARCLib is done by standard LDAP queries to the information system, but the whole process of discovering and querying resources is wrapped in simple method calls. For example, getting information about all the clusters or all the storage elements registering to the ARC information system can be done with one simple method call. ARCLib performs fully threaded LDAP queries to all resources in one go, thus providing maximum speed and performance.
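Since ARCLib is wrapped for Python, a client could in principle be scripted along the lines sketched below, previewing the brokering and submission steps described in the following paragraphs. This is only an illustration of the intended workflow: the module name and every function name used here are hypothetical placeholders, not the actual ARCLib API.

    import arclib  # hypothetical name of the SWIG-generated Python bindings

    # discover computing resources through the information system
    clusters = arclib.GetClusterResources()

    # parse a job description and perform matchmaking and brokering
    xrsl = arclib.Xrsl(open("hello.xrsl").read())
    targets = arclib.ConstructTargets(clusters, xrsl)
    ranked = arclib.PerformStandardBrokering(targets)

    # submit to the first acceptable target and print the returned job ID
    job_id = arclib.SubmitJob(xrsl, ranked)
    print(job_id)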


Brokering is the process of selecting the most suitable resource in accordance with the job specification. Having obtained a list of candidate target resources, the brokering step consists of removing the candidates inconsistent with the user's job specification and sorting the remaining candidates according to some criteria. ARCLib finds the candidate targets by a full resource discovery and querying of the information system, and then removes inconsistent targets according to criteria like the requested memory on the nodes of the clusters, the requested CPU time, the requested installed runtime environments and so on. It then sorts the remaining targets according to the speed of the CPUs of the clusters. The ARCLib brokering framework makes it possible for users to write their own brokers and include them in the brokering. At the moment this requires re-compilation of a part of ARCLib, but it is planned to make it possible to dynamically load user-provided brokers.

Job submission in ARC is done by uploading the corresponding XRSL job description to a specific directory on a cluster's GridFTP server. From there, the Grid Manager (GM) on the cluster takes over and handles the job. Job submission in ARCLib follows logically after the brokering step. Having obtained a sorted list of possible targets from this brokering step, the ARCLib methods try to submit the job description to the first target in the target list. If successful, they upload the corresponding local input files specified in the job description and get back the job ID assigned by the GM on the cluster. This job ID is returned to the user. If job submission to the first target fails, the second one is tried, and so on.

JARCLib is a thin layer over the GridFTP and LDAP protocols written in the Java language. Its aim is to provide a basic ARC API to Java programmers, in the same way as ARCLib provides an API for C/C++ programmers. This enables Java programmers to create platform independent ARC clients, including applets and standalone Java applications. Currently, JARCLib supports job submission and control through the GridFTP interface, basic Information System interfacing methods (e.g. computing resource discovery) and basic matchmaking. All of these actions can be executed in parallel, with a possibility to specify the maximum timeout for the whole operation. JARCLib's implementation relies on the Globus CoG toolkit [60] and the Netscape Directory SDK [61].

5.2. DataMove

DataMove is a common purpose data transfer library used by most services and utilities of the ARC middleware. This way, the capabilities of all parts of ARC are unified. The DataMove library is implemented in C++. It does not provide a POSIX-like interface; instead it is meant to be used for creating data transfer channels to stream data through an application. DataMove consists of an interface for URL abstraction (DataPoint) with integrated support for data registration in a Data Indexing Service. Fig. 8 illustrates typical usage for transferring data between two endpoints.


Fig. 8. Interaction between classes and external services for transferring data from “source” to “destination”.

The data access interface (DataHandle) has expandable support for popular data transfer protocols. An abstraction of a memory buffer (DataBufferPar) is used to establish a multi-stream channel between two DataHandle instances. Other features include getting information about objects, control over transfer speed, checksum calculation, etc. Currently supported services are: the Globus RC and RLS, gLite Fireman [24], GridFTP, HTTP wrapped in SSL and GSI, SRM v1.1 [62], SSE and local file systems.

6. Ongoing development

For the upcoming 0.6 release of ARC several new features have been added. In the next paragraphs, the Job Manager and the HTTPS job submission framework developments are described in brief. Then the Java Graphical User Interface (which can also be used as an applet) is introduced, followed by the Logger framework and the application portals.

The Job Manager is a higher-level client tool for managing jobs on ARC enabled Grids. It is able to supervise jobs by monitoring their states and reacting to changes in the state. The Job Manager is constructed in such a way that a user can tailor it to fit specific needs for controlling Grid jobs by plugging in handlers for different situations. The system acts as a layer between the user and the Grid, i.e., the users interact with the Grid through the Job Manager.

As a first step towards a Web service based job control interface, a new experimental service is being developed. It is going to provide a replacement for the job control over GridFTP currently used in the ARC middleware. The interface fully supports the authentication and authorization used in ARC and uses a lightweight (compared to GridFTP) secure HTTP(S/g) channel. A pure Web interface to session directories and job information is also available.

6.1. Java GUI

The Java GUI client for the ARC middleware, the "Arconaut", is currently in the development stage. It is built on top of JARCLib (see Section 5.1). The main aim of the design was to provide a localizable, user friendly interface to credential management, job description generation, and submission and management of jobs. The main features of the Arconaut are as follows:

• creating jobs for specific applications (such as Scilab, Povray or any other engine based applications) completely transparently with respect to the underlying XRSL job description;
• creating or importing a user proxy from a number of sources (including MyProxy servers [63]);
• submitting a job to the Grid with a single click, or browsing manually for the computing resource that suits the job best;
• issuing standard actions on multiple submitted jobs in an intuitive way.

Arconaut can be used either as a Java Web Start application or as a standalone application. It is platform independent as it depends only on pure Java libraries. The modular architecture provides an easy way to extend the functionality of Arconaut.

6.2. Logging Service

ARC provides a framework to deploy a Grid-wide logging infrastructure. The logging system consists of a Logging Service (Logger) and logging clients. The Logging Service (Logger) is a MySQL database with a Web service interface implemented in the HTTPSd infrastructure (Section 3.2). The Logger database collects Usage Records (UR) of Grid jobs. The URs are produced on the Grid-enabled computing resources and pushed to the Logger database in a SOAP message. Access to the Logger database is controlled via flexible common access rules which provide administrative and logging access levels. For ordinary clients, access to every UR is private.

The Usage Records stored in the Logger database describe Grid job characteristics including resource usage, execution information, ownership, etc. The current UR of ARC is very simplistic and contains only very basic information about jobs. A new UR based on the corresponding Global Grid Forum (GGF) [64] documents has been developed and is currently being implemented.

Usage Records are prepared and submitted to the Logger service by the Grid Manager. By caching URs on the resource front-end, sufficiently reliable message delivery is obtained. Although logging by definition is a centralized service, the target Logger service to which a UR is submitted can be selected both by resource administrators via the Grid Manager configuration and by Grid users via their Grid job description. A Usage Record of a Grid job can be submitted to multiple Logger Services; multiple Loggers can coexist and be used simultaneously. This makes it possible to have a Logger service deployed per infrastructure or per VO, or for a user to have a personal Logger.

To query the information stored in the Logger database, ARC provides a command line tool.


Fig. 9. Application Portal architecture.

Besides the command line tool, the NGLogger Web interface [16] was developed for convenient access to the URs stored in the database. Items like the number of various kinds of jobs per cluster or per user can be viewed in textual and graphical form.

6.3. Application portals

As more research areas need more computational resources, it is very important to provide easy access to these resources, both via Web based interfaces and from the command line. Currently, most users of the computational resources use command line tools for accomplishing their tasks. For other scientific areas this way of accessing a system can be very unfamiliar. To expand the user base at the Center for Scientific and Technical Computing in Lund (Lunarc [65]), an application portal project was initiated. The project aims to provide a Web based interface to the most commonly used applications, such as MOLCAS [66], ABAQUS [67], MATLAB [68] and Python.

The implementation of the Lunarc Application Portal is designed to be lightweight and easily extendable. An important aspect of this is not to reinvent the wheel: existing software packages and tools are used as much as possible. The architecture is illustrated in Fig. 9. The Lunarc Application Portal is implemented in Python using the WebWare application server [69]. WebWare is an object-oriented application server capable of handling multiple user Web sessions. Each page on the server is represented by a Python class. The application server is integrated into the Apache Web server [70] using the mod_webkit Apache module. Job submission and management is handled using the ARC middleware. The current version of the portal uses the command line tools to interface with ARC, but the development version of the portal will use ARCLib.

7. Applications

The performance of the ARC middleware is constantly being tested through a variety of applications in production environments. A substantial number of scientists are using the NorduGrid production Grid as their primary source of computing power and storage capacity. Their applications range over all areas of science — from High Energy Physics simulations through quantum chemistry, genomics and bioinformatics studies to meteorology.


The scientific impact has resulted in a number of works acknowledging the use of the NorduGrid production Grid [71–73].

The primary project using the NorduGrid production Grid has always been the ATLAS [18] experiment at the Large Hadron Collider (LHC) at the European Particle Physics Laboratory, CERN. The challenges faced by the physicists at ATLAS and the LHC at startup in 2007 are unprecedented. The ATLAS detector will at design luminosity observe particle collisions at a rate of 4 × 10^7 per second. This rate is reduced by a set of filter algorithms to less than 1000 events per second for final storage and physics analysis — corresponding to about 10 PB of data per year. Once running, the physicists at ATLAS expect to have almost instantaneous access to data, analysis tools and computing power. To prepare for this, ATLAS has set up a series of so-called Data Challenges to verify the capability of the collaboration to handle the analysis of such an amount of data.

One of these, ATLAS Data Challenge 2 (DC2) [74], consisted of large-scale Monte Carlo simulations of different physics samples. The NorduGrid production Grid participated together with two other Grid projects, LCG and Grid3 [75]. The simulations were performed using a largely automated production system written for the Data Challenge, with components keeping track of which tasks to run, actually running the jobs and returning the output of the jobs. On NorduGrid, this production system allowed only two physicists to run the whole production, with about 1000–2000 jobs submitted for processing at any given point in time. In total, well over 150 000 jobs ranging widely in resource consumption (some taking as long as a CPU-day to process) were run on NorduGrid during the second half of 2004. The total number of dedicated CPU units available through NorduGrid for ATLAS production was approximately 700. More than 20 sites in 7 countries connected to NorduGrid thereby operated as a single resource and contributed approximately 30% of the total ATLAS DC2 production while having only about one-sixth of the CPUs available to, e.g., LCG, which clearly illustrates ARC's superior performance and efficiency.

8. Discussion, conclusions and future work

In this paper, we have presented the NorduGrid collaboration's Advanced Resource Connector (ARC) middleware, starting from the user requirements and proceeding to the architectural and software engineering solutions that were implemented based on those requirements. In order to meet the requirements of performance, security and easy deployment, the cornerstones of ARC on the server side are the Information System, built on LDAP, which provides fast access to information about the computing and storage environment, and the Grid Manager–GridFTP pair, offering a powerful and versatile computing service as well as storage services. On the client side, there is a job submission client that implements matchmaking between the job requirements and the resources described by the Information System. The standalone client package, available for a variety of UNIX platforms, is easy to install and use.


The ARC middleware has quickly grown from a prototype to a production-quality solution allowing quick and easy setup of a robust and reliable Grid computing infrastructure. In order to simplify the task of migrating users to the new environment, the developers offer excellent community support via a ticketing system and mailing lists, comprehensive documentation, and easy to download and install client packages. The software, documentation, access to Web based tools etc. are made available by NorduGrid.

Since there are several solutions on the market offering Grid services, standardization and consequent interoperability between them become as essential as the interoperability of electrical appliances on the power grid — and even more so, as user jobs have a much higher mobility. ARC as a second generation Grid middleware is based on standard libraries and solutions provided by Globus, OpenSSL, OpenLDAP and VOMS, and offers public interfaces through standard channels: GridFTP and LDAP. Nevertheless, interoperability among second generation Grid middlewares (like LCG [29], gLite [25], Globus [2] and UNICORE [6]) is not easily obtainable. The gradual development and acceptance of Grid standards such as the Open Grid Services Architecture (OGSA) [76] and the Web Services Resource Framework (WS-RF) [77] will hopefully bring us closer to an ideal situation where users would not have to care which Grid client to use. In the future, communication with Grid servers that implement such functionalities as resource discovery, job submission, data management etc. will be available through standard interfaces and protocols. For ARC, a Web service based interface (using its own HTTPg engine [78]) is being developed. Another area of standardization is that of job submission: JSDL [38] is a forthcoming standard for an XML based job submission description language, and it is foreseen that ARC will use JSDL in the near future. With these efforts under way, ARC is gradually converging towards the community standards set by the Global Grid Forum and similar bodies.

Acknowledgments

Parts of the work were supported by the NORDUNet2 programme, the Nordic Council of Ministers through the NDGF project, the Swedish Institute Visby Programme and the Swedish Research Council through the Swegrid project. The authors would like to thank the following groups and individuals for their contribution: Profs. Farid Ould-Saada and Alexander Read, Dr. Håkan Riiser and Katarina Pajchel (University of Oslo), Prof. Paula Eerola (Lund University), Prof. Tord Ekelöf (Uppsala University), Prof. John Renner Hansen (NBI Denmark), Miika Tuisku and Michael Gindonis (HIP Finland), Dr. Juha Lento and Arto Teräs (CSC Finland), Åke Sandgren (HPC2N Sweden), Leif Nixon (NSC Sweden), Olle Mulmo (PDC Sweden), Dr. Josva Kleist and Henrik Thostrup Jensen (Aalborg University), Mario Kadastik (NICPB Estonia), Dr. Csaba Anderlik (University of Bergen), Antti Hyvärinen (Helsinki University of Technology), all involved computer center system administrators, and ARC's active user community.

References [1] M. Litzkow, M. Livny, M. Mutka, Condor — A hunter of idle workstations, in: Proceedings of the 8th International Conference of Distributed Computing Systems, 1988, pp. 104–111. [2] I. Foster, C. Kesselman, Globus: A metacomputing infrastructure toolkit, International Journal of Supercomputer Applications 11 (2) (1997) 115–128. [3] I. Foster, et al., A security architecture for computational grids, in: CCS ’98: Proceedings of the 5th ACM Conference on Computer and Communications Security, ACM Press, 1998, pp. 83–92. [4] I. Foster, C. Kesselman (Eds.), The Grid: Blueprint for a New Computing Infrastructure, 2nd edition, Morgan Kaufmann, 2003. [5] The NorduGrid Collaboration, Web site. URL http://www.nordugrid.org. [6] M. Romberg, The UNICORE architecture: Seamless access to distributed resources, in: Proceedings of 8th IEEE International Symposium on High Performance Distributed Computing, HPDC-8 ’99, 1999, p. 44. [7] E. Laure, et al., The EU datagrid setting the basis for production grids, Journal of Grid Computing 2 (4) (2004) 299–400. [8] A. W¨aa¨ n¨anen, et al., An overview of an architecture proposal for a high energy physics grid, in: PARA’02: Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing, in: Lecture Notes in Computer Science, vol. 2367, Springer-Verlag, 2002, pp. 76–88. [9] M. Ellert, et al., The NorduGrid project: Using Globus toolkit for building Grid infrastructure, Nuclear Instrumentation and Methods A 502 (2003) 407–410. [10] F. Ould-Saada, The NorduGrid collaboration, SWITCH Journal 1 (2004) 23–24. [11] Estonian Grid, Web site. URL http://grid.eenet.ee. [12] SWEGRID: The Swedish Grid testbed, Web site. URL http://www.swegrid.se. [13] Danish Center for Grid Computing, Web site. URL http://www.dcgc.dk. [14] S. Gadomski et al., The Swiss ATLAS Computing Prototype, Tech. Rep. CERN-ATL-COM-SOFT-2005-007, ATLAS note, 2005. [15] P. Eerola, et al., Building a production grid in Scandinavia, IEEE Internet Computing 7 (4) (2003) 27–35. [16] K. Pajchel et al., Usage statistics and usage patterns on the NorduGrid: Analyzing the logging informations collected on one of the largest production grids in the world, in: A. Aimar, J. Harvey, N. Knoors (Eds.), Proc. of CHEP 2004, CERN-2005-002, vol. 1, 2005, pp. 711–714. [17] P. Eerola, et al., Atlas data-challenge 1 on NorduGrid, in: Proc. of CHEP 2003, La Jolla, California, 2003. [18] The ATLAS Collaboration, ATLAS — A Toroidal LHC ApparatuS, Web site. URL http://atlas.web.cern.ch. [19] Nordic universities network, Web site. URL http://www.nordu.net. [20] E. Deelman, et al., Mapping abstract complex workflows onto grid environments, Journal of Grid Computing 1 (1) (2003) 25–39. [21] T.A. Howes, M.C. Smith, A Scalable, deployable directory service framework for the internet, Tech. Rep. 95-7, CITI (Center for Information Technology Integration), University of Michigan, USA, 1995. [22] H. Stockinger, et al., File and object replication in data grids, Cluster Computing 5 (3) (2002) 305–314. [23] A.L. Chervenak, et al., Performance and scalability of a replica location service, in: Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing, HPDC’04, IEEE Computer Society Press, 2004, pp. 182–191. [24] EGEE User’s Guide: gLite Data Management Catalog CLI, Tech. Rep. EGEE-TECH-570780-v1.0, EGEE, EU INFSO-RI-508833, 2005. URL https://edms.cern.ch/file/570780/1/EGEE-TECH-570780-v1.0.pdf. 
[25] EGEE gLite, gLite — Lightweight Middleware for Grid Computing, Web site. URL http://glite.web.cern.ch/glite/. [26] EGEE, EGEE — Enabling Grids for E-science, Web site. URL http://egee-intranet.web.cern.ch/egee-intranet/gateway.html.

[27] R. Alfieri, et al., From gridmap-file to VOMS: Managing authorization in a Grid environment, Future Generation Computer Systems 21 (4) (2005) 549–558. [28] I. Foster, C. Kesselman, Globus: A toolkit-based grid architecture, in: The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999, pp. 259–278. [29] CERN, The LHC Computing Grid Project — LCG, Web site, 2003. URL http://lcg.web.cern.ch/LCG/. [30] J. Nabrzyski, J.M. Schopf, J. Weglarz (Eds.), Grid Resource Management: State of the Art and Future Trends, Kluwer Academic Publishers, Norwell, MA, USA, 2004. [31] J. Frey, et al., Condor-G: A computation management agent for multi-institutional grids, Cluster Computing 5 (3) (2002) 237–246. [32] T. Kosar, M. Livny, Stork: Making data placement a first class citizen in the grid, in: Proceedings of the 24th International Conference on Distributed Computing Systems, ICDCS'04, IEEE Computer Society, 2004, pp. 342–349. [33] S. Shirasuna, et al., Performance comparison of security mechanisms for grid services, in: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, GRID'04, IEEE Computer Society Press, 2004, pp. 360–364. [34] G. Aloisio, et al., Secure web services with globus GSI and gSOAP, in: H. Kosch, L. Böszörményi, H. Hellwagner (Eds.), Euro-Par 2003 Parallel Processing: Proceedings of 9th International Euro-Par Conference, in: Lecture Notes in Computer Science, vol. 2790, Springer-Verlag, 2003, pp. 421–426. [35] R.L. Henderson, Job scheduling under the portable batch system, in: D.G. Feitelson, L. Rudolph (Eds.), IPPS '95: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, in: Lecture Notes in Computer Science, vol. 949, Springer-Verlag, 1995, pp. 279–294. [36] W. Gentzsch, Sun grid engine: Towards creating a compute power grid, in: CCGRID'01: Proceedings of the 1st International Symposium on Cluster Computing and the Grid, IEEE Computer Society Press, 2001, p. 35. [37] Runtime Environment Registry, Web site. URL http://www.csc.fi/grid/rer/. [38] A. Anjomshoaa et al., JSDL-WG, Job Submission Description Language Work Group, Tech. Rep. GFD-R-P.056, GGF Document Series, 2005. URL https://forge.gridforum.org/projects/jsdl-wg/document/draftggf-jsdl-spec/en/28. [39] A. Konstantinov, The NorduGrid Grid Manager And GridFTP Server: Description And Administrator's Manual, The NorduGrid Collaboration, NORDUGRID-TECH-2. URL http://www.nordugrid.org/documents/GM.pdf. [40] A. McNab, The gridsite web/grid security system: research articles, Softw. Pract. Exper. 35 (9) (2005) 827–834. [41] R. Housley et al., RFC 2459, Internet X.509 Public Key Infrastructure Certificate and CRL Profile, 1999. URL http://www.ietf.org/rfc/rfc2459.txt. [42] Storage Resource Management Working Group, Web site. URL http://sdm.lbl.gov/srm-wg/. [43] T. Jackiewicz, Deploying OpenLDAP, Apress, Berkeley, CA, USA, 2004. [44] S. Fitzgerald, et al., A directory service for configuring high-performance distributed computations, in: Proc. 6th IEEE Symp. on High Performance Distributed Computing, IEEE Computer Society Press, 1997, pp. 365–375. [45] K. Czajkowski, et al., Grid information services for distributed resource sharing, in: Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing, HPDC-10'01, IEEE Computer Society Press, 2001, pp. 181–194. [46] B. Kónya, The NorduGrid/ARC Information System, The NorduGrid Collaboration, NORDUGRID-TECH-4.
URL http://www.nordugrid.org/documents/arc infosys.pdf. [47] J.M. Schopf et al., Monitoring and discovery in a web services framework: Functionality and performance of the globus toolkit’s MDS4, Tech. Rep. ANL/MCS-P1248-0405, Argonne National Laboratory, 2005.


[48] A.W. Cooke, et al., The relational grid monitoring architecture: Mediating information about the grid, Journal of Grid Computing 2 (4) (2004) 323–339. [49] L. Field et al. Grid deployment experiences: The path to a production quality LDAP based grid information system, in: A. Aimar, J. Harvey, N. Knoors (Eds.), Proc. of CHEP 2004, CERN-2005-002, vol. 1, 2005, pp. 711–714. [50] The Globus Alliance, The Globus Resource Specification Language RSL v1.0, Web site. URL http://www-fp.globus.org/gram/rsl spec1.html. [51] O. Smirnova, Extended Resource Specification Language, The NorduGrid Collaboration, NORDUGRID-MANUAL-4. URL http://www.nordugrid.org/documents/xrsl.pdf. [52] M. Bogdanski, et al., Grid Service Provider: How to improve flexibility of grid user interfaces? in: P. Sloot, et al. (Eds.), Proceedings of the 3rd International Conference on Computational Science vol. 1, in: Lecture Notes in Computer Science, vol. 2567, Springer, 2003, pp. 255–263. [53] O.-K. Kwon et al. A grid resource allocation system for scientific application: Grid resource allocation services package (GRASP), in: Proceedings of the 13th Annual Mardi Gras Conference: Frontiers of Grid Applications and Technologies, 2005, pp. 23–28. [54] A. Konstantinov, Protocols, Uniform Resource Locators (URL) and Extensions Supported in ARC, The NorduGrid Collaboration, NORDUGRID-TECH-7. URL http://www.nordugrid.org/documents/URLs.pdf. [55] O. Smirnova, The Grid Monitor, The NorduGrid Collaboration, NORDUGRID-MANUAL-5. URL http://www.nordugrid.org/documents/monitor.pdf. [56] NorduGrid Web site, The Grid Monitor, the interface. URL http://www.nordugrid.org/monitor. [57] T. Converse, J. Park, PHP 4 Bible, John Wiley & Sons, Inc., New York, NY, USA, 2000. [58] D.M. Beazley, Automated scientific software scripting with SWIG, Future Generation Computer Systems 19 (5) (2003) 599–609. [59] M. Lutz, Programming Python: Object-Oriented Scripting, O’Reilly & Associates, Inc., Sebastopol, CA, USA, 2001. [60] G. von Laszewski, et al., A Java commodity grid kit, Concurrency and Computation: Practice and Experience 13 (8–9) (2001) 645–662. [61] R. Weltman, T. Dahbura, LDAP Programming with Java, Addison Wesley, 2000. [62] A. Shoshani, A. Sim, J. Gu, Storage Resource Managers: Middleware components for Grid Storage, in: Proc. of the 19th IEEE Symposium on Mass Storage Systems, MSS’02, 2002, pp. 209–223. [63] J. Novotny, S. Tuecke, V. Welch, An online credential repository for the grid: MyProxy, in: Proc. of the 10th International Symposium on High Performance Distributed Computing, HPDC-10’01, IEEE Computer Society, 2001, p. 104. [64] Global Grid Forum, Web site. URL http://www.ggf.org. [65] Lunarc — Center for Technical and Scientific Computing, Lund University, Web site. URL http://www.lunarc.lu.se. [66] G. Karlstr¨om, et al., MOLCAS: A program package for computational chemistry, Computational Materials Science 28 (2) (2003) 222–239. [67] Hibbitt, Karlsson and Sorensen Inc, Pawtucket, USA, ABAQUS Theory Manual, 2005. [68] D.M. Etter, D. Kuncicky, D. Hull, Introduction to MATLAB 7, Prentice Hall PTR, Upper Saddle River, NJ, USA, 2005. [69] C. Esterbrook, Introduction to webware for Python, in: Proceedings of the 9th International Python Conference, 2001. http://www.webwareforpython.org/Papers/IntroToWebware.html. [70] R.T. Fielding, G. Kaiser, The apache HTTP server project, IEEE Internet Computing 1 (4) (1997) 88–90. [71] T. Sj¨ostrand, P.Z. Skands, Baryon number violation and string topologies, Nuclear Physics B 659 (1–2) (2003) 243–298. 
[72] O. Syljuåsen, Directed loop updates for quantum lattice models, Physical Review E 67 (1) (2003) 046701-1–17. [73] J. Černák, Inhomogeneous sandpile model: Crossover from multifractal scaling to finite size scaling, ePrint: cond-mat/0510326, 2005. http://arxiv.org/abs/cond-mat/0501070.


[74] R. Sturrock et al., Performance of the NorduGrid ARC and the Dulcinea Executor in ATLAS Data Challenge 2, in: A. Aimar, J. Harvey, N. Knoors (Eds.), Proc. of CHEP 2004, CERN-2005-002, vol. 1, 2005, pp. 765–768. [75] I. Foster, et al., The Grid2003 production grid: Principles and practice, in: Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing, HPDC’04, IEEE Computer Society Press, 2004, pp. 236–245. [76] I. Foster, et al., The physiology of the Grid, in: Grid Computing, John Wiley & Sons Ltd., 2003, pp. 217–250. [77] K. Czajkowski et al., Modeling stateful resources with web services, Tech. Rep., Globus Alliance, 2004. URL http://www.ibm.com/developerworks/library/ws-resource/wsmodelingresources.pdf. [78] A. Konstantinov, The HTTP(s,g) and SOAP framework, The NorduGrid Collaboration, NORDUGRID-TECH-9. URL http://www.nordugrid.org/documents/HTTP SOAP.pdf.

Jonas Lindemann was born in Sweden in 1970. He received a Ph.D. degree in structural mechanics in 2003. He is currently working as a systems engineer at the Center for Scientific and Technical Computing, LUNARC, at Lund University. His research interests include visualisation of finite element simulations, methods for distributing finite element systems using CORBA/Grid, as well as development of user interfaces for computational codes and educational tools in structural mechanics.

Mattias Ellert is a researcher at the Department of Nuclear and Particle Physics at Uppsala University. He obtained a Ph.D. in High Energy Physics at Uppsala University in 2001 for charged Higgs boson searches with the DELPHI experiment at the LEP collider at CERN. He is a member of the NorduGrid collaboration, and is active in the development of the ARC middleware, especially the client command line interface. As a member of the ATLAS collaboration he has also worked on the integration of ARC enabled Grid resources with the ATLAS production system.

Jakob Langgaard Nielsen works as a financial analyst and programmer. He holds a Ph.D. degree in theoretical high energy physics from the University of Copenhagen and has worked on the ATLAS project at CERN for 3 years. He has been a part of the NorduGrid project since 2002.

Michael Grønager is a senior researcher at UNI-C, The Danish IT Centre for Education and Research. He obtained his Ph.D. at the Technical University of Denmark in quantum mechanics and laser interactions. He has been working with virtual reality and distributed computing and is currently visiting the Niels Bohr Institute working with Grid computing and Grid interoperability related to the EGEE project.

Aleksandr Konstantinov obtained his Doctor's degree in Physics at Vilnius University in 2000. He has worked in the fields of electromagnetic scattering by nonhomogeneous and anisotropic bodies, inverse problems in electromagnetics, and Grid computing environments. He is now a researcher at the Institute of Material Science and Applied Research of Vilnius University.

Balázs Kónya is a lecturer at Lund University, Sweden and is working with the NorduGrid project. His recent work in Grid computing focuses on information system models and their implementations. He received a Ph.D. in theoretical physics from the University of Debrecen, Hungary.

Ilja Livenson is a computer science student at Tartu University, Estonia. Currently he works for the BalticGrid project.

Marko Niinimäki is currently an associate professor at the University of Tampere, Finland. He obtained his Ph.D. degree in computer science at the same university. He has worked in several Grid projects, affiliated with the Helsinki Institute of Physics and the University of Oslo.

Dr. Oxana Smirnova is an Associate Professor at the High Energy Particle Physics department at Lund University, Sweden. A graduate of Moscow State University, she obtained a Ph.D. in physics from Lund University, working with the DELPHI experiment at CERN. She is a member of the ATLAS collaboration at CERN and of NorduGrid, and is one of the core developers of the ARC middleware.

Anders Wäänänen is an IT consultant at the Niels Bohr Institute at the University of Copenhagen, where he also earned his Ph.D. in High Energy Physics. He has been working on Grid computing since 2000 and has participated in several Grid projects including NorduGrid, the European DataGrid project, the LHC Computing Grid and EGEE.