Towards a complete grid filesystem functionality


Soha Maad*, Brian Coghlan, Geoff Quigley, John Ryan, Eamonn Kenny, David O'Callaghan
Department of Computer Science, Trinity College Dublin, Ireland
Future Generation Computer Systems 23 (2007) 123–131. doi:10.1016/j.future.2006.06.006
Received 2 May 2006; accepted 8 June 2006; available online 7 August 2006
* Corresponding author.

Abstract

To be successful the grid must serve a variety of application domains. The data management solutions offered by existing grid middleware have significant shortcomings with respect to the needs of some of these application domains. This paper attributes these shortcomings to the lack of an underlying grid storage infrastructure, similar to that within an operating system. In an effort to develop this missing infrastructure we have created a relatively complete grid filesystem comprising four abstract engines for directory management, discovery, data movement and consistency. This will facilitate simple data management both within and between grids.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Grid; Filesystem; Data management; Interoperability

1. Introduction

Grid computing offers a means for accessing compute and data resources distributed across the world. This paper describes a grid filesystem, a piece of infrastructure that reduces the gap between the grid data management solutions expected by applications and those actually available.

Current data management solutions in the grid are largely based on replica catalogues and similar technologies. Each has its own API, requiring both that legacy code be modified in order to access grid data resources, and that new code be written specifically for the target API. A more elegant solution is to join the data storage resources together into a unified grid filesystem that can be accessed using standard operating system file functions: in effect, a grid version of a distributed filesystem like NFS [1] or AFS [2], where a remote filesystem is mounted alongside the local filesystem, with the proviso that this filesystem would still need to enforce authorisation based on grid credentials, such as those used by Globus.

Some existing distributed filesystems:

(1) require the use of proprietary APIs, so applications need to be altered [3,4];


(2) use protocols that are not normally enabled through firewalls [5,6];
(3) deal only with whole files, so even if only a few bytes are accessed, large data transfers are sometimes necessary [5,7];
(4) do not support grid authentication;
(5) deal at the user and group levels rather than with higher-level constructs such as virtual organisations; and
(6) where intended for use with grids, tend to be specific to a particular grid and its middleware, rather than more generally applicable [5].

Clearly it is desirable to produce a grid filesystem free of these shortcomings that can be used even with legacy applications, without requiring changes to the code. Ideally it would hide the actual file accesses behind the operating system, allowing file access via the standard POSIX-compliant libraries rather than via custom APIs. In the case of Linux, this can be provided by integration with the Virtual File System (VFS) layer [8,9]. Communications should preferably go via popular protocols that are normally enabled through firewalls. It should allow block-level file access, so that fractional access to large files can be handled more efficiently. Filesystem security should make full use of grid security models at the various levels — user, VO, etc. Consistency should be maintained at both file and filesystem levels. It also needs to be reasonably fast.



Most importantly, the grid filesystem needs to be usable, and this requires that much of the complexity be hidden. A user should not need to know the actual physical location of a file (i.e. the system should support location transparency); indeed, they should not even need to know which grid the file resides in. The grid filesystem should be transparent to the user, accessed like part of the local filesystem, with details of where files are located and how they are accessed hidden from the user.

In this paper we describe a prototype grid filesystem that we believe achieves most of these goals and serves as a platform for further research into optimisations for speed and consistency. Our filesystem is not built from scratch but harnesses disparate previous efforts undertaken to develop individual grid filesystem functionalities. Key qualities that we aim to embody in the filesystem are simplicity, portability, interoperability, and universal accessibility.

The next section of the paper sheds light on the gap between the actual and the expected data management solutions. Following this we describe a relatively complete grid filesystem composed of four abstract engines: directory, discovery, data movement, and consistency engines. Implementation details are discussed, followed by various use case scenarios. Experimental results are presented and the paper concludes with a summary.

2. The grid data management gap

Although a large variety of distributed filesystems exist, few are a good basis for a grid filesystem. The proposed NFS protocol version 4 [10] is an interesting candidate. AFS [11,2] and Coda [6] share many features; Coda, however, downloads the whole file to a client, whereas AFS introduced chunking of files in version 3. Both use Kerberos security and protocols that are not normally accessible through firewalls.

At present (December 2005), no single grid project has delivered a grid filesystem deployable on various existing grids and operating at grid and inter-grid levels. Such a system would ultimately provide file discovery, directory, and data movement services while respecting consistency, security, timeliness, and transparency. Existing grid middlewares only partially provide this functionality.

Data management in the EGEE [12] context has three major parts [13], provided by a set of services including: storage elements (SEs), to provide file access and storage; data catalogs, to facilitate access to file replicas using a location-independent logical file name (LFN); and transfer scheduling, to make sure that a given file is available at the chosen site where the job is to be run. Recently a good first approximation to a grid filesystem, ELFI [7], has been provided by the EGRID project. ELFI is a filesystem interface to the replica catalog and the LCG [14]/EGEE classic SE. It uses a FUSE [15] user-space daemon, which can trap and redirect system calls made to the Linux VFS.

The Globus Toolkit [16] provides a number of components for data management. These components fall into two basic categories: data movement (including GridFTP and the Reliable File Transfer (RFT) service tools) and data replication (including the Replica Location Service (RLS)).

An important related component, OGSA-DAI [17], enables users to grid-enable their data resources, e.g. databases or files, via web services.

SlashGrid [18] is a framework for Grid-aware filesystems developed by the GridPP project [19]. It is based on a rewrite of Venus, the Coda user-space daemon. It uses Grid Access Control Lists (GACL) [20] to provide file access control based on grid PKI certificates. There are two implementations of SlashGrid: certfs uses grid certificates to authorise access to local files, and curlfs uses the libCurl library [21] for HTTPS access to files on remote webservers (but is read-only). Both variants inherit the Coda practice of downloading the whole file to a client.

The GFARM filesystem is a shared filesystem for a cluster or Grid that can scale up to petascale storage, designed for parallel processing and scalable I/O bandwidth [5]. Like ELFI it uses a FUSE user-space daemon.

The Resource Namespace Service (RNS) [22] is a Web Service Resource Framework (WSRF) [23] compliant Web service capable of providing namespace services for any addressable entity. RNS is intended to facilitate namespace services for a wide variety of grid services; however, it does not provide any actual file service, just the namespace services, and even these are a definition, not an implementation.

The Global Grid Forum (GGF) Grid File System working group [24] proposes a service-oriented architecture for a Grid Filesystem (GFS) [25] that would provide three standard interfaces: a Client Interface (to allow clients to use a namespace); a Resource Interface (to allow any underlying filesystem resource to plug into the GFS); and a Peer Interface (to allow peer-to-peer communication between the GFS resources). Again this is a definition of a filesystem, not an implementation.

TeraGrid [26] and DEISA [27] have recently demonstrated interoperability at the file level with a special grid filesystem with one common file address space. Applications executed at any of the participating sites can transparently access data in the common file address space, although whole-file transfers are done. While this is a very good example of the role of a grid filesystem in promoting grid interoperability, the extent to which grid interoperability is more generally supported (with grids other than TeraGrid and DEISA) is not yet clear.

There is the additional question of the relationship of a grid filesystem with grid services for resource discovery and indexing, such as information systems like Globus MDS [28] or R-GMA [29]. A grid filesystem could leverage these, particularly for discovery, and indeed, as will be seen later, this is what we have done.

The overviews of grid penetration in various application domains presented in [30,31] infer various expected grid data management needs as seen from an application domain perspective. The grid filesystem described below aims at reducing the gap between the existing solutions outlined above and the expected solutions. This gap is summarised in Table 1.

Table 1
Attributes required to fill the gap between actual and expected grid data management solutions

Expected                | Actual                    | Example domain
r/w/etc access modes    | Catalog save/retrieve     | Collaborative Eng.
Secure data access      | Partial                   | e-Hospital, military
Rich metadata           | Metadata about catalogs   | e-Government
Plug-and-play storage   | Proposal (GGF)            | e-Business, enterprise
Fractional access       | No (only complete files)  | Very Large Datasets
Data consistency        | No                        | Parallel processing
User friendly access    | No                        | e-Business, enterprise
User-specific views     | Proposal (RNS)            | All
Native file processing  | No (via API)              | All
Inter-grid data access  | Work in progress          | Grand challenge

At the present time, grid-enabling existing applications involves some re-engineering, for several reasons:

(1) grid services are accessed through special-purpose APIs;
(2) leveraging grid computational power imposes the adoption of a parallel programming paradigm;
(3) grid files are accessed and manipulated through special-purpose APIs; and
(4) the grid environment is not user friendly, especially for a user who is familiar only with desktop environments.

This is due to the lack of a low-level infrastructure that facilitates the above tasks and reduces the re-engineering cost. By analogy to a personal computer, this infrastructure is basically a grid operating system that enables input/output interactions, e.g. accessing of external devices (storage and other devices), or interactions with a user. A core functionality of this grid operating system is a filesystem that allows the user to view and manipulate grid files as local files on a workstation. Users urgently need some of the complexity of the massive interconnectivity of the grid to be hidden, as well as provision of human-readable, easy-to-manage data views.

3. A transparent grid filesystem

A grid filesystem must support standard file operations such as open, read, write, seek, close, etc. Ideally these would be transparent to any grid middleware, in that they would occur at the operating system level, invoked by the standard POSIX-compliant system calls that are supported by the normal libraries for C, C++, Java, Python, etc.

Below we describe a prototype of a transparent grid filesystem, gridFS, with basic but relatively complete functionality provided by four orthogonal abstract engines for directory management, discovery, data movement, and consistency. This grid filesystem can be deployed on every node of EGEE, Globus, or any other existing middleware, and even on users' workstations. The grid filesystem is specifically intended to support interoperability between arbitrary middlewares.

As can be seen from Fig. 1, the data movement engine of the grid filesystem is effectively a client/server pair. The client uses the FUSE user-space daemon and the server uses the GridSite module for the Apache webserver. The exported filesystems are registered with a discovery engine that is based on the R-GMA information system. A directory engine (also based on R-GMA) supports personalised views of the filesystem.
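To make the transparency requirement concrete, the sketch below shows the kind of unmodified POSIX code that should work against a gridFS mount. The path under a hypothetical /home/user/gridfs mount point and the file name are illustrative assumptions, and the example assumes the FUSE client daemon described below is already running; nothing grid-specific appears in the application code.

    /* Minimal sketch: a legacy-style program reading a file that happens to
     * live on a remote gridFS server.  No grid-specific API is involved; the
     * path below (under a hypothetical ~/gridfs mount point) is resolved by
     * the FUSE client daemon behind the VFS layer. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical virtual path created by the user in the directory engine. */
        const char *path = "/home/user/gridfs/TCD/COMPUTER/testfile.txt";
        char buf[4096];
        ssize_t n;

        int fd = open(path, O_RDONLY);               /* standard POSIX open() */
        if (fd < 0) {
            perror("open");
            return 1;
        }
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* standard read()       */
            fwrite(buf, 1, (size_t)n, stdout);
        close(fd);                                   /* standard close()      */
        return 0;
    }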


3.1. Data movement engine

The data movement engine consists of the basic client and server sides of a filesystem. It is intended to overcome two major limitations in existing data management solutions. Firstly, they do not operate at the block level. Secondly, in the main they use custom APIs to access, e.g., file catalogs. With these concerns in mind, the data movement engine focuses on accessing various types of data storage via VFS, and on operation at block rather than file level.

The server side of the data movement engine exploits the GridSite [32] module (mod_gridsite) for the Apache webserver [33]. GridSite accesses files using the HTTP 1.1 protocol. It includes very desirable features such as authentication and GACL [20] authorisation at each directory level, and supports convenient graphical editing of these permissions. Each directory contains an XML .gacl file that defines the permissions, with entries conditioned by person, VO or host: for example a user DN such as /C=IE/O=Grid-Ireland/OU=cs.tcd.ie/L=RA-TCD/CN=Soha Maad, a VO group such as /vo.dom.ain/group, a DN-list such as https://www.vo.dom.ain/dn-lists/group, or a host pattern such as host*.dom.ain. GridSite supports byte-level access for get, but only file-level access for put, move and delete. For the data movement engine we modified mod_gridsite to provide byte-level access for both read and write.

The client side of the data movement engine uses block-level caching to minimise network traffic and support consistency. Like ELFI and GFARM it derives from the FUSE user-space daemon, although it started life as a derivation of the Coda user-space daemon Venus. This approach has several notable attributes:

(1) By using a user-space daemon (FUSE), if the user invokes the grid filesystem, it supports a user-specific grid filesystem view.
(2) By using HTTPS to communicate with the server side, it can:
– traverse firewalls as easily as any browser can;
– authenticate connections;
– leverage web protocol optimizations; and
– leverage the proxy functionality of web servers.
(3) By using a user-space daemon and HTTPS, it supports complete user-level privacy:
– at the client, even the root user cannot access the user's filespace, and all data movement beyond the client daemon is secured by SSL; and
– at the server, privacy can be ensured by encrypting all data within the client daemon, which runs in the user's address space; all information beyond the client daemon will then be private.



Fig. 1. The grid filesystem architecture.

(4) By operating at block level:
– it can fractionally read or write remote files, only transferring the blocks required; and
– it has the potential to provide block-level locking to facilitate future mutually-exclusive multiple writers to the same file.

At a high level, then, these components provide a filesystem that communicates over HTTPS with a web server and allows secure file operations on byte ranges within files. Use of a user-space daemon addresses the transparency requirement, by providing access to remote files through the VFS layer using standard POSIX-compliant libraries that are universally available.
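As an illustration of this block-level access pattern, the sketch below uses libcurl (the library behind curlfs, cited above) to fetch a single byte range of a remote file over HTTPS with an X.509 client certificate. The URL, certificate paths and block size are assumptions made for the example; the actual gridFS client performs such transfers inside the FUSE daemon rather than as a standalone program.

    /* Sketch of a block-level fractional read over HTTPS, assuming the server
     * (e.g. Apache with a GridSite-style module) honours Range requests and
     * authorises the supplied grid certificate.  URL and paths are hypothetical. */
    #include <curl/curl.h>
    #include <stdio.h>

    #define BLOCK_SIZE (1024 * 1024)   /* 1 MB blocks, as in the prototype cache */

    static size_t sink(void *data, size_t size, size_t nmemb, void *userp)
    {
        return fwrite(data, size, nmemb, (FILE *)userp);  /* store the block */
    }

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *h = curl_easy_init();
        if (!h)
            return 1;

        long block = 3;                                   /* fetch block #3 only */
        char range[64];
        snprintf(range, sizeof range, "%ld-%ld",
                 block * (long)BLOCK_SIZE, (block + 1) * (long)BLOCK_SIZE - 1);

        FILE *out = fopen("block3.bin", "wb");
        curl_easy_setopt(h, CURLOPT_URL,
                         "https://gridfs.example.org/export/data/largefile.dat");
        curl_easy_setopt(h, CURLOPT_RANGE, range);        /* byte-range request  */
        curl_easy_setopt(h, CURLOPT_SSLCERT, "/home/user/.globus/usercert.pem");
        curl_easy_setopt(h, CURLOPT_SSLKEY,  "/home/user/.globus/userkey.pem");
        curl_easy_setopt(h, CURLOPT_CAPATH,  "/etc/grid-security/certificates");
        curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, sink);
        curl_easy_setopt(h, CURLOPT_WRITEDATA, out);

        CURLcode rc = curl_easy_perform(h);               /* only ~1 MB moves    */
        if (rc != CURLE_OK)
            fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(rc));

        fclose(out);
        curl_easy_cleanup(h);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }

Because only the requested byte range crosses the network, a read of a few bytes from a multi-gigabyte file costs at most one block transfer rather than a whole-file download.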

3.2. Consistency engine

Consistency semantics define the outcome of multiple accesses to a single file. An example case where inconsistency may arise is when several grid processes access the same filesystem and attempt a simultaneous write to a single file. To avoid such a case several consistency models can be adopted. Initially the engine enforces cache consistency via a simple single-reader single-writer write-back coherency policy.

3.3. Directory engine

Although the data movement engine provides transparency, the directory engine is needed to fully address it: only by allowing views that hide data location and complexity can a truly location-transparent and user-friendly system be constructed. A user creates a personal hierarchy of virtual directories, where leaves in this tree correspond to subdirectories of personal interest that exist on physical filesystems. It is assumed these subdirectories have been exported using the grid filesystem by the same or another user. The user runs their own instance of the FUSE daemon that accesses this personalised view of the grid filesystem namespace, using their personal grid credentials.

The directory engine implements many of the concepts of the Resource Namespace Service (RNS) provided by the Global Grid Forum Grid File System working group. RNS identifies resources within a grid by a universal name that ultimately resolves to a meaningful address, with a particular emphasis on hierarchically managed names that may be used in human interface applications. RNS embodies a three-tier naming architecture, which consists of human interface names, logical reference names, and endpoint references. Name-to-resource mapping in RNS features an optional arrangement of two levels of indirection. The first level of indirection is realized by mapping human interface names directly to endpoint references. A second level of indirection may be useful when mapping human interface names to logical references (identified by logical names), which in turn map to endpoint references. The advantage of using a logical name to represent a logical reference is that it may be referenced and resolved independently of the namespace.

Fig. 2 shows an example of the first level of indirection. The table on the right-hand side shows the list of mappings of the full physical locations to a user-specified namespace that is easily recognised and makes sense to the user.
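As a rough illustration of the first level of indirection, the sketch below resolves a virtual path against a small in-memory mapping table like the one shown in Fig. 2. The table contents, server names, structure layout and prefix-matching policy are hypothetical simplifications: the real mappings live in the directory engine (backed by R-GMA), not in a static array.

    /* Hypothetical sketch of first-level name resolution: a user-namespace
     * prefix is rewritten to the endpoint reference (server + exported
     * directory) it was mapped to. */
    #include <stdio.h>
    #include <string.h>

    struct mapping {
        const char *user_prefix;    /* human interface name (virtual path)    */
        const char *endpoint;       /* endpoint reference (physical location) */
    };

    static const struct mapping table[] = {
        { "/TCD/COMPUTER", "https://gridfs.cs.example.org/export/computer" },
        { "/TCD/PHYSICS",  "https://gridfs.phy.example.org/export/physics" },
        { "/UCC",          "https://gridfs.ucc.example.org/export/shared" },
    };

    /* Rewrite a virtual path into a physical URL; returns 0 on success. */
    static int resolve(const char *vpath, char *out, size_t len)
    {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            size_t p = strlen(table[i].user_prefix);
            if (strncmp(vpath, table[i].user_prefix, p) == 0) {
                snprintf(out, len, "%s%s", table[i].endpoint, vpath + p);
                return 0;
            }
        }
        return -1;   /* purely virtual directory: listable but not resolvable */
    }

    int main(void)
    {
        char url[256];
        if (resolve("/TCD/COMPUTER/testfile.txt", url, sizeof url) == 0)
            printf("%s\n", url);   /* physical location for the data movement engine */
        return 0;
    }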



Fig. 2. GUI showing mapping of endpoint references to user namespaces (node is a server; export dir is a directory).

The directory engine uses the Relational Grid Monitoring Architecture (R-GMA) [34,35,29] to store published namespace information. R-GMA is composed of three components: producers, which insert data into a virtual database; consumers, which retrieve information from the virtual database; and registries, which match consumers to producers that publish the information the consumers are interested in. Information can be inserted into and retrieved from the R-GMA virtual database using a subset of the statements specified in the SQL92 Entry Level standard. Directory engine producers publish namespace information into the database using SQL INSERT statements, and its consumers retrieve that information from the database using SQL SELECT statements.

R-GMA currently offers rather limited security, based on host authentication. Whereas the directory engine functions are subject to credential checks and permissions, data published in R-GMA is not yet subject to user-based authorisation and can be queried by any authenticated user regardless of permissions. This is expected to be remedied by the end of 2006.
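For illustration only, the fragment below builds the kind of SQL92-subset statements a directory engine producer and consumer might exchange through R-GMA. The table and column names are hypothetical, since the paper does not give the actual schema, and the call that would hand each statement to an R-GMA producer or consumer is deliberately omitted.

    /* Hypothetical sketch of the SQL a directory-engine producer might insert
     * into the R-GMA virtual database, and the matching consumer query.
     * Table and column names are illustrative only. */
    #include <stdio.h>

    int main(void)
    {
        char insert_stmt[512], select_stmt[256];

        snprintf(insert_stmt, sizeof insert_stmt,
                 "INSERT INTO gridfs_namespace (owner_dn, user_path, endpoint) "
                 "VALUES ('%s', '%s', '%s')",
                 "/C=IE/O=Grid-Ireland/OU=cs.tcd.ie/L=RA-TCD/CN=Soha Maad",
                 "/TCD/COMPUTER",
                 "https://gridfs.cs.example.org/export/computer");

        snprintf(select_stmt, sizeof select_stmt,
                 "SELECT endpoint FROM gridfs_namespace WHERE user_path = '%s'",
                 "/TCD/COMPUTER");

        /* In the real engine these strings would be passed to the R-GMA
         * producer/consumer interfaces rather than printed. */
        printf("%s\n%s\n", insert_stmt, select_stmt);
        return 0;
    }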

3.4. Discovery engine

Whereas existing grid middleware focuses on delivering partial data management solutions, there is very little existing metadata that may be used to describe a grid filesystem. The discovery engine allows metadata to be published and queried as needed to support grid and inter-grid activity, using two sets of tools:

(1) gridfspublish: command-line/GUI tools to publish metadata to an inter-grid discovery service.
(2) gridfsdiscover: command-line/GUI tools to discover filesystems that match a search string by querying published metadata.

The discovery engine assumes that every machine on the grid is able to export some directories according to given permissions (at grid and inter-grid levels) of read, write, open, execute, admin, etc. Like the directory engine, it depends on the Relational Grid Monitoring Architecture (R-GMA) to store published metadata. The same security constraints are imposed by the fact that currently R-GMA only offers host-based authentication.

Files may not be discovered, only filesystems (directories), as grid-wide discovery down to the file level would not scale. However, if desired for a specific activity, an activity-specific (perhaps VO-wide) discovery engine may very easily be established that publishes richer metadata, e.g. down to file level.

4. Implementation details

Fig. 1 depicts the grid filesystem showing its abstract engines. For scalability and/or management purposes, nodes 1–7 can be separate, but typically nodes 1–3 are combined as a client and nodes 6–7 are combined as a server, while nodes 4–5 can be combined or replicated, and each may in fact encompass one or more machines. The linkages between the client components on nodes 1–3 can be composed in a number of different ways, and the VFS filesystem of node 7 may hide client components too.

4.1. Data cache

A filesystem is part of the storage hierarchy extending from CPU registers to tape repositories. Typically caches at one level of the hierarchy mask the longer latencies of the next level and reduce traffic between the levels. The grid filesystem client includes local caches of file data blocks held at the remote server. The current implementation uses a cache of 200 1 MB blocks, but this can easily be changed. The cache uses a write-back policy for flushing block writes through to the server. Its address function is a hash code generated from the URI of the file and the block number. If large numbers of cache collisions start to take place this can add a significant overhead, as blocks generally need to be read from the server unless the whole block is to be overwritten. File handles ensure that the grid filesystem is aware of which files are currently open, with doubly linked chains to track which blocks of a file are in the cache.
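A minimal sketch of the cache addressing just described is given below: the block index comes from the file offset, and the cache slot from a hash of the file URI and block number. The hash function, slot count and structure layout are assumptions chosen only to mirror the 200-slot, 1 MB-block configuration mentioned in the text; a collision then means the incoming block displaces the resident one, after flushing it if dirty.

    /* Hypothetical sketch of the client-side block cache addressing: the slot
     * for a (URI, block number) pair is a hash modulo the number of slots,
     * and dirty blocks are flushed back to the server (write-back policy). */
    #include <stdint.h>

    #define BLOCK_SIZE  (1024 * 1024)   /* 1 MB blocks     */
    #define CACHE_SLOTS 200             /* 200-block cache */

    struct cache_block {
        char     uri[256];              /* file the block belongs to        */
        uint64_t block_no;              /* block index within that file     */
        int      valid;                 /* slot holds data                  */
        int      dirty;                 /* must be written back on eviction */
        char     data[BLOCK_SIZE];
    };

    /* FNV-1a style hash over the URI plus block number (illustrative only). */
    static uint64_t slot_for(const char *uri, uint64_t block_no)
    {
        uint64_t h = 0xcbf29ce484222325ULL;
        for (const char *p = uri; *p; p++)
            h = (h ^ (uint64_t)(unsigned char)*p) * 0x100000001b3ULL;
        h ^= block_no;
        h *= 0x100000001b3ULL;
        return h % CACHE_SLOTS;
    }

    /* Which block does a byte offset of a file fall into? */
    static uint64_t block_of(uint64_t offset) { return offset / BLOCK_SIZE; }

    /* Cache slot used for a given file offset. */
    uint64_t lookup_slot(const char *uri, uint64_t offset)
    {
        return slot_for(uri, block_of(offset));
    }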



4.2. Directory cache

The directory cache decouples the operation of the data movement engine from the directory engine. In addition, it enables a small but important distinction to be made between statically and dynamically mapped grid filesystems. Static mappings are loaded at startup time from the directory engine and/or local gridfstab files, whereas dynamic mappings are loaded on demand from the directory engine.

The primary advantage of this approach is that those engines hidden behind the VFS layer (the data movement and consistency engines) form a self-sufficient filesystem, albeit without location transparency. After startup, any user or program that knows the full path to a remote filesystem can immediately access the files there, even in the absence of the directory and discovery engines. This highlights the value added by the latter two engines: it is they that virtualise the filesystem into a unified, location-transparent grid filesystem. Since the mappings are generally loaded into and unloaded from the directory cache on demand, there are no mountpoints as with most other filesystems. The advantage is simplicity: no mounting/unmounting.

5. Simple interactive use case

First let us look at a simple use case. Suppose a user wishes to access files within directories on three gridFS servers, as shown under Endpoint Reference in Fig. 2. Assume they discover the filesystems by using the discovery service, then create a virtual directory hierarchy that is more sensible to them, as shown under User Namespace in Fig. 2. After the user starts the grid filesystem with the ./htgridfs fuse ~/gridfs command, ls ~/gridfs will list the directories UCC, TCD and NUIG. Note that there aren't any physical directories corresponding to UCC, TCD or NUIG, since these are purely virtual locations created by the user and stored in the directory engine. ls ~/gridfs/TCD will list this virtual directory's children (COMPUTER and PHYSICS). ls ~/gridfs/TCD/COMPUTER will cause the directory engine to convert the virtual path to a physical path so that the data movement engine can obtain the list of children of that path. File operations can be carried out within the directories that correspond to physical locations, but not within those that are purely virtual. Commands such as vim ~/gridfs/TCD/COMPUTER/testfile.txt treat the grid filesystem exactly like any other area of the filesystem.

6. Grid-job file handling use cases

Assume a user would like to submit a job that uses files on his/her workstation. Typically they use a job description language (e.g. the declarative languages RSL [36] for Globus and JDL [37] for LCG2) to specify all the information needed by the middleware to run a certain job, including details of all the necessary files. There may be other files related to the job, e.g. data input/output files. The way in which these files are presented will depend on the application and the middleware. For example, it may be appropriate that they be transferred via sandboxes, or they may belong to a third party, or their location may be in some sense unknown, etc. Nonetheless, the generic sequence of file operations that may be applied to the files is:

(1) metadata create & publish
(2) discovery, then namespace create & publish, or vice versa
(3) open/read/write/lock/.../close
(4) namespace unpublish & delete
(5) metadata unpublish & delete.

Generally only a subset of these operations will be applied to any one file. Below four use cases are considered, where each involves a different subset of these actions within job submission and execution; see also Table 2. The first two cases do not assume the grid filesystem exists, while the last two do. The first two use cases highlight the need for a grid filesystem offering transparent remote file handling.

6.1. Use Case 1: No grid filesystem, using sandboxes

Local files are passed to the job via sandboxes, i.e. passed by value (names plus contents). This is a non-grid-filesystem case, where files are passed via sandboxes. A good example of sandboxing is in the LCG2 middleware [12,14]. In order to successfully submit the job, a JDL file and all the entities contained in the JDL (InputSandbox files, executable files, etc.) are transferred by LCG2 to the compute node (WN) as part of the job submission protocol, and eventually the job will begin to execute. Upon successful completion of a job, LCG2 will return files that have been created or modified by the job if they are specified as OutputSandbox files. Since all the sandbox files have been passed by value (i.e. copies of both filenames and contents have been transferred), the grid filesystem is not involved in any file handling relating to these files.

6.2. Use Case 2: No grid filesystem, using file catalogs

Files with known paths are copied to file catalogs, i.e. passed by value, then the file catalog handles are passed to the job as arguments, i.e. passed by reference (names only). Again this is a non-grid-filesystem case. Sandboxes are not efficient for large files, so several middlewares have implemented file catalog services, e.g. Globus RC [38] or LCG2 RLS [39]. The catalog file handles can be specified in the job descriptions; e.g. LCG2 catalog files may be specified by the InputData and OutputData keywords, and it then copies those files to/from the compute node. The use of a catalog is sensible for large datasets, and is traditional for some disciplines (e.g. Physics), but can be a notable inconvenience for others (e.g. Collaborative Engineering). Files handled via a catalog do not involve the grid filesystem.

6.3. Use Case 3: Simplest use of gridFS, no discovery

Files with known paths are passed to the job as arguments, i.e. passed by reference (names only). If the grid filesystem is used, then any extra files can be passed to the job by reference (i.e. only the filenames need be passed, as job arguments) without recourse to file catalogs. This is not usual for grid middleware. Let us consider the simplest possible use of the grid filesystem. Assume a data file resides at a known gridFS server, with suitable access permissions.

Table 2
File actions used by six use cases (each case applies a subset of these actions; see text)

Action                        | Actor       | Machine
Create & Publish Metadata     | Remote user | Server
Discovery                     | User        | Client
Create & Publish Namespace    | User        | Client
Create & Publish Metadata     | User        | Client
Copy File to Catalog          | User        | Client
Pass File by Value            | User        | Client
Pass File by Reference        | User        | Client
Discovery                     | Program     | WN
Create & Publish Namespace    | Program     | WN
Create & Publish Metadata     | Program     | WN
Open                          | Program     | WN
Read                          | Program     | WN
Write                         | Program     | WN
Close                         | Program     | WN
Unpublish & Delete Metadata   | Program     | WN
Unpublish & Delete Namespace  | Program     | WN
Unpublish & Delete Metadata   | User        | Client
Unpublish & Delete Namespace  | User        | Client
Unpublish & Delete Metadata   | Remote user | Server

In this case the user knows the full path to the file and can pass it to the job as an argument. The owner of the file does not have to create or publish the export metadata, as the full path to the file is known, including the host on which it resides, i.e. no discovery is necessary. When the user's job makes a system call to open the file, the client side of the data movement engine can remotely open it, perform any necessary reads and writes, conditioned by their credentials, and eventually close the file. Notice that any remote file actions are transparent to the user's job, which does not have to be specially ported to the grid. This transparency allows the grid filesystem to be used with any middleware.

6.4. Use Case 4: Simple use of gridFS, prior discovery

Files with incomplete paths are discovered prior to submission, and the complete paths are passed to the job by reference. The above use case does not involve discovery, since the full path to the file was already known. As discussed in Section 3.4, the user can perform discovery and make their choice prior to job submission. The use case then proceeds as in Use Case 3 above. Again any remote file actions are transparent to the user's job.

There are other more complex use cases. See, for example, Table 2 case 5, where incomplete file paths are passed to the job by reference and the job then discovers the files. As shown, the job also creates and publishes its own view (perhaps to a private subnet). We are still exploring this flexibility. Or see case 6, where files are discovered and mapped to the user's local namespace prior to submission, and the full path to the file in their local namespace is passed to the job by reference: wherever it runs, the job is still able to use the user's namespace. This convenience, that the user's filesystem view is the same regardless of location, makes the grid filesystem very easy to use.


In fact use case 6 is a contextless use case that uses all four engines to enable transparent grid file access for any middleware, and hence nicely illustrates the support for interoperability between arbitrary middlewares.

7. Experimental results

Table 3 presents some preliminary results from performance comparisons between the prototype and other transfer methods: htcp and gsiFTP. The former is a command-line utility supplied with GridSite, and the latter is part of the Globus Toolkit. All tests were performed between the same client and server machines. Both gridFS and htcp use HTTPS to copy files of 10 MiB and 50 MiB, reading from and writing to the same locations on the server. The gsiFTP test uses globus-url-copy for the read and write tests but accesses the same disk locations in order to minimise differences between the tests. These results are the average of at least 10 repetitions of the tests. Results from Bonnie [40] are also shown in Table 3, where the influence of caching is more clearly evident. Performance is entirely unoptimised as yet.



Table 3
(a) Mean transfer rate (in KiB/s) comparison of gridFS with htcp and gsiftp; (b) Bonnie benchmarking results

(a) File Copying | gridFS | htcp | gsiftp
Read 10 MiB      | 7339   | 2568 | 12 272
Write 10 MiB     | 1828   | 3562 | 13 484
Read 50 MiB      | 4787   | 2817 | 22 535
Write 50 MiB     | 2007   | 3612 | 19 356

(b) Bonnie
          | Seq Output (nosync)                      | Seq Input                | Rnd Seek
Size (MB) | Char (K/s) | Block (K/s) | Rewrite (K/s) | Char (K/s) | Block (K/s) | 04k (03) (/s)
50        | 4448       | 6601        | 6478          | 16 407     | 49 170      | 152.5

Table 4
Expected grid data management attributes and their provision

Expected                | The complete gridFS         | Active engine
r/w/etc access modes    | Yes                         | Data Movement
Secure data access      | GACL permissions + privacy  | Data Movement
Rich metadata           | Discovery of various forms  | Discovery
Plug-and-play storage   | Plug-and-play namespaces    | Data Movement
Fractional access       | Yes                         | Data Movement
Data consistency        | Write-back coherency        | Consistency
User friendly access    | Yes                         | Directory
User-specific views     | Yes                         | Directory
Native file processing  | Yes                         | Data Movement
Inter-grid data access  | Yes                         | All

8. Conclusion

This paper outlines the need for a complete grid filesystem to support file manipulation at grid and inter-grid levels. A basic but relatively complete grid filesystem, gridFS, has been presented. The functionality has been evaluated in several use cases, and some experimental results outlined. Table 4 shows the manner in which the desired attributes for closing the gap between expected and actual functionality have been achieved.

Key features of the filesystem are simplicity, portability, interoperability, and universal accessibility. The filesystem is constructed of four simple abstract engines. Although it relies on UNIX technology, it is expected to be portable across a wide variety of UNIX family derivatives. Its native file processing was designed to support interoperability across arbitrary middlewares. Reliance on an HTTP 1.1 transport layer should yield widespread accessibility. It is expected that this new filesystem will encourage further innovation to assist the increasing data management activity at grid and inter-grid levels.

Acknowledgements

We would like to thank Science Foundation Ireland (SFI) for their support in funding this research as part of the WebCom-G project, Andrew McNab of Manchester University for his help with investigating SlashGrid and extending GridSite, and the R-GMA team for their unfailing assistance.

References

[1] B. Callaghan, B. Pawlowski, P. Staubach, NFS version 3 protocol specification, June 1995. URL: http://www.ietf.org/rfc/rfc1813.txt.
[2] OpenAFS. URL: http://www.openafs.org/.
[3] Replica Location Service (RLS). URL: http://www.globus.org/toolkit/data/rls/.
[4] LCG File Catalog. URL: http://wiki.gridpp.ac.uk/wiki/LCG_File_Catalog.
[5] O. Tatebe, N. Soda, Y. Morita, S. Matsuoka, S. Sekiguchi, Gfarm v2: A grid file system that supports high-performance distributed and parallel data computing, in: Proceedings of the 2004 Computing in High Energy and Nuclear Physics, CHEP04, Interlaken, Switzerland, 2004. URL: http://datafarm.apgrid.org/.

[6] Coda. URL: http://www.coda.cs.cmu.edu/.
[7] ELFI file system. URL: http://www.egrid.it/sw/elfi/index.html.
[8] The Linux virtual file system layer. URL: http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html.
[9] M. Bar, Linux File Systems, Book and CD edition, McGraw-Hill, ISBN 0072129557, 2001.
[10] B. Pawlowski, S. Shepler, C. Beame, B. Callaghan, M. Eisler, D. Noveck, D. Robinson, R. Thurlow, The NFS version 4 protocol, in: Proceedings of the 2nd International System Administration and Networking Conference, SANE2000, 2000, p. 94. URL: http://www.citeseer.ist.psu.edu/article/pawlowski00nfs.html.
[11] J. Morris, M. Satyanarayanan, M. Conner, J. Howard, D. Rosenthal, F. Smith, Andrew: A distributed personal computing environment, Communications of the ACM 29 (3) (1986) 184–201.
[12] Enabling Grids for E-sciencE (EGEE), 2006. URL: http://www.eu-egee.org/.
[13] P. Kunszt, EGEE gLite users guide: overview of gLite data management, EGEE-TECH-570643-v1.0, 20th March 2005.
[14] Large Hadron Collider Computing Grid Project (LCG), 2006. URL: http://lcg.web.cern.ch/LCG/.
[15] FUSE, filesystem in user space. URL: http://fuse.sourceforge.net.
[16] Globus project. URL: http://www.globus.org/.
[17] M. Antonioletti, M. Atkinson, R. Baxter, A. Borley, N.C. Hong, B. Collins, N. Hardman, A. Hume, A. Knox, M. Jackson, A. Krause, S. Laws, J. Magowan, N. Paton, D. Pearson, T. Sugden, P. Watson, M. Westhead, The design and implementation of grid database services in OGSA-DAI, Concurrency and Computation: Practice and Experience 17 (2).
[18] A. McNab, SlashGrid: a framework for grid-aware filesystems, in: Storage Element Workshop, CERN, 2002. URL: http://www.gridsite.org/slashgrid/.
[19] GridPP project. URL: http://www.gridpp.ac.uk/.
[20] GACL. URL: http://www.gridpp.ac.uk/authz/gacl/.
[21] The libCurl library. URL: http://curl.haxx.se/libcurl/.
[22] M. Pereira, O. Tatebe, L. Luan, T. Anderson, J. Xu, Resource namespace service specification, November 2005.
[23] K. Czajkowski, F.D. Ferguson, I. Foster, J. Frey, S. Graham, I. Sedukhin, D. Snelling, S. Tuecke, W. Vambenepe, The WS-Resource framework, version 1.0, March 2004.
[24] GGF grid file system working group (gfs-wg). URL: http://phase.hpcc.jp/ggf/gfs-rg/.
[25] The GGF Grid File System architecture workbook, version 0.54, 3rd August 2005.
[26] The TeraGrid project. URL: http://www.teragrid.org/.
[27] The DEISA project. URL: http://www.deisa.org/.
[28] S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke, A directory service for configuring high-performance distributed computations, 1997, pp. 365–375.
[29] The Relational Grid Monitoring Architecture (R-GMA). URL: http://www.r-gma.org/.
[30] S. Maad, B. Coghlan, G. Pierantoni, E. Kenny, J. Ryan, R. Watson, Adapting the development model of the grid anatomy to meet the needs of various application domains, in: Proceedings of the Cracow Grid Workshop, CGW'05, Cracow, Poland, 2005.
[31] S. Maad, B. Coghlan, J. Ryan, E. Kenny, R. Watson, G. Pierantoni, The horizon of the grid for e-government, in: Proceedings of the eGovernment Workshop, Brunel University, United Kingdom, ISBN 1-902316-46-0, 2005.
[32] The GridSite project. URL: http://www.gridsite.org/.
[33] Apache web server. URL: http://httpd.apache.org/.
[34] S. Fisher, Relational model for information and monitoring, 2001.
[35] B. Coghlan, A. Djaoui, S. Fisher, J. Magowan, M. Oevers, Time, information services and the grid, in: K.D. O'Neill, B.J. Read (Eds.), Proc. BNCOD 2001, Advances in Database Systems, RAL-CONF-2001-003, Oxford, 2001.

[36] The Globus Resource Specification Language (RSL), specification 1.0. URL: http://www-fp.globus.org/gram/rsl%5Fspec1.html.
[37] The EDG Job Description Language (JDL). URL: http://server11.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0%5F2.pdf.
[38] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets (based on conference publication from proceedings of NetStore conference 1999), Journal of Network and Computer Applications 23 (2001) 187–200.
[39] D. Cameron, J. Casey, L. Guy, P. Kunszt, S. Lemaitre, G. McCance, H. Stockinger, K. Stockinger, G. Andronico, W. Bell, I. Ben-Akiva, D. Bosio, R. Chytracek, A. Domenici, F. Donno, W. Hoschek, E. Laure, L. Lucio, P. Millar, L. Salconi, B. Segala, M. Silander, Data management services in the European DataGrid project, in: Proc. UK e-Science All Hands Conference, 2004.
[40] T. Bray, Bonnie Disk Benchmark, 1996. URL: http://www.textuality.com/bonnie/.

Soha Maad is a Post-doctoral researcher at the Department of Computer Science at Trinity College Dublin (TCD). She is currently working on the WebCom-G project funded by the Science Foundation Ireland (SFI). In 2002, she received her Ph.D. in Computer Science from the University of Warwick in the United Kingdom. After her Ph.D., she was granted an ERCIM fellowship at the Fraunhofer Institute of Media Communication in Germany. Following her ERCIM fellowship, she joined INRIA Rhone-Alpes in France as a postdoctoral researcher. Her research interests are data management, software engineering, grid applications, and multimedia.

Brian Coghlan, see http://www.cs.tcd.ie/coghlan/, is a Senior Lecturer in the Department of Computer Science, TCD. He is joint leader of the Computer Architecture Group, and is the instigator of and responsible for its grid activities. He has participated in the EU DataGrid, CrossGrid and EGEE projects, and is currently a participant in the EU EGEE-II and int.eu.grid projects and the Irish CosmoGrid and WebCom-G projects. His research staff have extensive expertise in the grid arena, and provide the grid management for Grid-Ireland, of which he is a founding Director. He is a member of the Institution of Electrical Engineers, and a Professional Chartered Engineer (C.Eng.).


Geoff Quigley is a postdoctoral research fellow in the Computer Architecture Group at Trinity College Dublin, and is currently working on the WebCom-G project. He has a B.Sc. in physics from Imperial College, London, and a Ph.D. in optoelectronics from the University of Southampton. After his Ph.D. Geoff worked as a programmer and wireless product manager for Autonomy Systems Ltd in Cambridge, UK, and subsequently as a consultant and trainer specialising in Autonomy's technologies for the processing and management of structured and unstructured data. His current research interests are infrastructure management, data movement and user interfaces in the Grid context.

John Ryan is a post-graduate research student in the Department of Computer Science at Trinity College Dublin. He is currently working on the WebCom-G project funded by Science Foundation Ireland and performing research for his Ph.D. He received his B.A.I. engineering degree from Trinity College Dublin. His research interests include parallel programming models, especially for shared memory and message passing, and hybrid combinations thereof.

Eamonn Kenny is a Post-Doctoral Researcher at the Department of Computer Science in Trinity College Dublin. He is currently the WebCom-G project manager. He received his Ph.D. in electronic engineering at Trinity College Dublin, specialising in ray-tracing methods for indoor telecommunications. He has an M.Sc. in numerical analysis and worked as a researcher in the electronic engineering department on the EU STORMS and Enterprise Ireland Informatics 2000 projects before joining the Computer Architecture group at TCD.

David O'Callaghan is a post-graduate research student in the Department of Computer Science at Trinity College Dublin. He is currently working on the WebCom-G project funded by Science Foundation Ireland and performing research for his Ph.D. He received his B.A. (Mod) in Computer Science from Trinity College Dublin. His research interests include grid security, in particular authentication and trust in a grid context.