ARTICLE IN PRESS
Nuclear Instruments and Methods in Physics Research A 534 (2004) 339–342 www.elsevier.com/locate/nima
Computing technology and environment for physics research

Marcel Kunze

Forschungszentrum Karlsruhe GmbH, P.O. Box 3640, 76021 Karlsruhe, Germany

Available online 2 August 2004
Abstract

Over the past year remarkable progress in computing technology and environments for physics research has been achieved. This article summarizes the contributions to the field as presented in the course of the ACAT2003 conference.

© 2004 Elsevier B.V. All rights reserved.

PACS: 89.20; 07.05; 29.50

Keywords: Computing technology; Physics analysis; Standardization
1. Introduction

In recent years platforms and applications have become far more advanced and powerful, but also far more complex. The vision driving current activity in the field is to enable scientists to concentrate on science, unaware of the details and complexity of the environment they are exploiting. Ideally, a working environment for physics research requires:

- provisioning of common tools, frameworks, environments and data persistency;
- exploiting the resources available to experiments in computing centres, physics institutes and universities around the world;
- presenting all of this as a reliable, coherent environment for the scientist.

E-mail address: [email protected] (M. Kunze).
2. Grid computing

Grid computing has emerged as an important new field, with the potential to offer a suitable environment for physics research. It is distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications and, in some cases, high-performance orientation. The key requirement for grid software to interoperate is standardization. The emergence of a large number of projects in both the US and Europe has, not surprisingly, led to efforts that have diverged in architecture,
0168-9002/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.nima.2004.07.085
capabilities and technical implementation. Web services technology seems to help here and offers a way to set up robust and fault-tolerant systems, but the definition of semantics and the methods for interaction still need to be worked out. In Grid development, standardization backed by a reference implementation is the most pressing need. The Open Grid Services Architecture (OGSA) promises to define a common architectural framework in the medium term (2–3 years) [2]. Distributed OGSA applications are made of stateful Web Services (Grid Services). For this purpose, the existing Web Services technology needs to be extended by features such as:

- service discovery;
- dynamic service creation;
- lifetime management;
- notification.
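These extensions can be illustrated with a toy service registry. All class and method names below (GridService, ServiceRegistry and so on) are hypothetical and only sketch the concepts; they are not part of any actual OGSI or WSRF API:

```python
import time

class GridService:
    """Sketch of a stateful service with soft-state lifetime management
    and notification, in the spirit of OGSI Grid Services."""

    def __init__(self, name, lifetime_s):
        self.name = name
        # Soft state: the service expires unless a client renews it.
        self.termination_time = time.time() + lifetime_s
        self.subscribers = []  # notification callbacks

    def alive(self, now=None):
        return (now if now is not None else time.time()) < self.termination_time

    def extend_lifetime(self, extra_s):
        # Clients keep a service alive by periodically renewing it.
        self.termination_time += extra_s

    def notify(self, event):
        for callback in self.subscribers:
            callback(self.name, event)

class ServiceRegistry:
    """Toy registry supporting dynamic creation and discovery."""

    def __init__(self):
        self._services = {}

    def create(self, name, lifetime_s=60.0):
        # Dynamic service creation: a new stateful instance per request.
        svc = GridService(name, lifetime_s)
        self._services[name] = svc
        return svc

    def discover(self, prefix=""):
        # Service discovery: find live services by name prefix.
        return [s for n, s in self._services.items()
                if n.startswith(prefix) and s.alive()]

registry = ServiceRegistry()
gram = registry.create("gram/job-manager", lifetime_s=120.0)
events = []
gram.subscribers.append(lambda name, ev: events.append((name, ev)))
gram.notify("job-submitted")
```

A real Grid Service would of course expose these operations over a web services protocol; the sketch only shows the four concepts listed above in isolation.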
The Open Grid Services Infrastructure (OGSI) defines a set of standardized interfaces and protocols to implement Grid Services [1]. A reference implementation of OGSI has been published by the Globus team. An evaluation and test of the Globus Toolkit 3 has been presented [3]: the OGSI services provided (GRAM, RFT, IS) seem to work functionally, although their maturity and performance vary. In the meantime, the Web Services Resource Framework (WSRF) has been proposed as a follow-on standard for the extension of Web Services, and a new reference implementation is on the way (Globus Toolkit 4).

2.1. Data replication

Replication of data usually increases Grid system performance. Here, the key open issue is the lack of interoperable systems; OGSA does not seem to help, as it does not offer a definition for this kind of service. The two implementations of the European DataGrid (EDG) and Globus are currently being reconciled. Other approaches to data management exist, such as the Storage Resource Broker (SRB) of the San Diego Supercomputer Center. Work has been invested to integrate SRB with the GIGGLE/EDG framework in an active collaboration between members of CMS, BaBar and the SRB group [4]. The data discovery component is well understood; the open question, however, is how files stored in SRB can be accessed by tools of the LHC Computing Grid. Full interoperation of both environments requires further development effort, and the implementation of corresponding Grid Services is planned. Although workable solutions exist, consistency management remains an inherent problem in applications where users can modify replicas. It was therefore proposed to work out a Replica Consistency Service [5].

2.2. Monitoring

A configuration monitoring tool for large-scale distributed computing was presented [6]. It allows site configuration information to be tracked and queried for large-scale distributed applications in the CMS experiment. There are plans to rework the tool as a Grid Service. In the MonALISA product, monitoring agents use a large integrated services architecture [7]. The idea is to use dynamic registration, discovery and subscription mechanisms. MonALISA is self-organizing and adapts to changing conditions.

2.3. Modeling

The purpose of the modeling effort is to compare grid systems with respect to performance, reliability and scaling. The MONARC simulation framework has been developed for modeling large-scale distributed computing systems. It is a design tool for large distributed systems and allows for performance evaluation and prediction of grid systems [8].
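The consistency problem for modifiable replicas can be sketched with a toy version-number model. The class and method names below are hypothetical illustrations and do not reflect the service interface proposed in [5]:

```python
class ReplicaConsistencyService:
    """Toy model of replica consistency: each master copy carries a
    version number, and replicas whose version lags behind are
    scheduled for re-transfer."""

    def __init__(self):
        self.master_version = {}    # logical file name -> version
        self.replica_version = {}   # (site, lfn) -> version at that site

    def update_master(self, lfn):
        # A user modification bumps the master version.
        self.master_version[lfn] = self.master_version.get(lfn, 0) + 1

    def register_replica(self, site, lfn):
        # A (re-)transfer brings the replica up to the current version.
        self.replica_version[(site, lfn)] = self.master_version.get(lfn, 0)

    def stale_replicas(self):
        # Replicas lagging behind the master need refreshing.
        return [(site, lfn)
                for (site, lfn), v in self.replica_version.items()
                if v < self.master_version.get(lfn, 0)]

rcs = ReplicaConsistencyService()
rcs.update_master("lfn:run42.root")
rcs.register_replica("gridka", "lfn:run42.root")
rcs.update_master("lfn:run42.root")   # user modifies the master copy
```

In a real Grid the hard part, glossed over here, is doing this bookkeeping reliably across administrative domains and partial failures.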
3. Distributed physics data analysis

The major concern is that analysis activities require chaotic access to resources by a large number of potentially inexperienced users. Most HEP experiments are developing frameworks for distributed computing and various prototypes of
analysis frameworks exist. However, there is sometimes parallel and incompatible effort, and the importance of standardization is obvious. The ARDA working group defines an Architectural Roadmap towards Distributed Analysis and proposes a common Grid analysis architecture for all LHC experiments based on standards. ARDA proposes component-by-component deployment; avoiding big-bang releases is a critical part of the implementation strategy. The recommendation is to work out a prototype based on AliEn [9].
4. Interactive physics data analysis

Typical interactive requests will run on O(TB) of distributed data, and the transfer time for the required data volume is a critical boundary condition. Therefore, the data must be transferred once, in advance of the interactive session. This requires the allocation, installation and set-up of the corresponding database servers before the interactive session starts. Products integrating user-friendly interactive access have been presented [9,10].
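A back-of-the-envelope estimate shows why advance transfers are unavoidable. The function below is a simple illustration, not a measured model of any particular Grid link:

```python
def prestage_time_s(volume_bytes, bandwidth_bytes_per_s, overhead_s=0.0):
    """Estimate how long pre-staging a data set takes before an
    interactive session: pure volume/bandwidth plus a fixed overhead."""
    return volume_bytes / bandwidth_bytes_per_s + overhead_s

TB = 1e12
# Moving 1 TB over a sustained 100 MB/s link takes 10^4 s, close to
# three hours. An interactive session cannot wait for that, so the
# transfer has to be scheduled once, ahead of time.
hours = prestage_time_s(1 * TB, 100e6) / 3600.0
```

The assumed 100 MB/s sustained bandwidth is merely a round number for illustration; the qualitative conclusion holds for any realistic wide-area link of the time.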
5. Common libraries

Several common libraries support a common software infrastructure. The purpose of the SEAL project is to provide the basic foundation and utility libraries and tools that are common to the LHC experiments. In addition, the project aims at developing a set of basic framework services to facilitate the integration of LCG and non-LCG software into coherent applications. The first version of the component model and framework services is available; scripting is based on Python [11]. The Physicist Interface (PI) project aims at setting up a framework for analysis service components on a component bus realized in Python. Based on PI, the Abstract Interfaces for Data Analysis (AIDA) define user-level interfaces for common analysis data objects such as binned and unbinned histograms, free-format data points,
and tuples, together with interfaces to fit and plot these objects. Prototypes implementing the AIDA interface are available for HippoDraw and ROOT [12].
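The flavour of such user-level interfaces can be sketched with a minimal binned histogram. The class below mimics AIDA's fill/entries style but is a simplified illustration, not the AIDA specification itself:

```python
class Histogram1D:
    """Minimal fixed-binning 1-D histogram with under/overflow,
    in the spirit of AIDA's user-level histogram interface."""

    def __init__(self, nbins, low, high):
        self.nbins, self.low, self.high = nbins, low, high
        self.bins = [0.0] * nbins
        self.under, self.over = 0.0, 0.0

    def fill(self, x, weight=1.0):
        # Route each entry to its bin, or to under/overflow.
        if x < self.low:
            self.under += weight
        elif x >= self.high:
            self.over += weight
        else:
            i = int((x - self.low) / (self.high - self.low) * self.nbins)
            self.bins[i] += weight

    def entries(self):
        return sum(self.bins) + self.under + self.over

h = Histogram1D(10, 0.0, 1.0)
for x in (0.05, 0.15, 0.15, 1.5):
    h.fill(x)
```

The value of an abstract interface like AIDA is that an analysis written against `fill`/`entries` style methods can be rendered by different back ends, HippoDraw or ROOT, without change.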
6. Graphical user interfaces

A cross-platform approach to creating interactive applications based on ROOT and the Qt GUI libraries has been presented [13]. The Qt package from Trolltech AS is a multi-platform C++ application framework that developers can use to write single-source applications running natively on Windows, Linux, Unix, Mac OS X and embedded Linux. Many Qt widgets are available for re-use, which allows for a consolidation of the ROOT graphics system (TGQt vs. TGWin32, TGX11, TGWin32GDK).
7. Data fabric

In the fabric area there is an increasing need for powerful, high-throughput systems. Storage Area Networks (SANs) seem to offer a solution. The GridKa computing centre has worked out a scalable I/O design based on Fibre Channel technology [18]. The application of new interconnect techniques such as InfiniBand yields data transfer speeds of up to 800 MB/s [17]. In the area of online data processing, the most recent powerful trigger systems require the execution of refined analysis procedures in real time [14]. In that sector, besides embedded solutions, powerful clusters and networks are in use for online event reconstruction and distributed analysis, for instance in the real-time event reconstruction farm of the Belle experiment [15]. Furthermore, a basic development for an analysis framework distributed over a wide-area network has been presented [16].
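The farm idea, independent events distributed across many workers, can be sketched in a few lines. The event format and the reconstruction step below are entirely hypothetical stand-ins, with threads playing the role of worker hosts:

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct(event):
    """Stand-in for a per-event reconstruction step; a real trigger
    farm would run refined analysis code here. The 4-hits-per-track
    rule is an arbitrary placeholder."""
    return {"id": event["id"], "ntracks": len(event["hits"]) // 4}

def run_farm(events, workers=8):
    # Events are independent, so the farm can process them in
    # parallel; map() preserves the input event order in the output.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))

# Mock events with 8, 16, 24 and 32 hits.
events = [{"id": i, "hits": list(range(8 * i))} for i in range(1, 5)]
results = run_farm(events, workers=2)
```

The point of the sketch is the embarrassing parallelism: scaling an online farm is mostly a matter of adding workers and feeding them events fast enough, which is exactly where the high-throughput fabric discussed above comes in.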
8. Outlook

Despite the immense progress in the field of computing technology for physics data processing in recent times, a set of questions remains:
- Is the far-reaching vision offered by Grid computing obscured by the lack of interoperability standards among grid computing technologies?
- Should the next few years be considered a transition period, with multiple prototypes in competition to speed up development?
- How do we design Grid-aware applications?

In order to work on these issues, we first have to make developers and users aware of the strong potential of network-centric applications. Perhaps the current methods of software design are not powerful enough to construct robust distributed systems, and we need to think about new abstract programming models. In any case, there seems to be an urgent need for new programming techniques and tools that specifically address the Grid and encompass heterogeneity, in order to master the distributed computing aspects of Grid programming.
References

[1] I. Foster, C. Kesselman, S. Tuecke, The anatomy of the Grid: enabling scalable virtual organizations, Int. J. Supercomput. Appl. 15 (3) (2001).
[2] I. Foster, C. Kesselman, J.M. Nick, S. Tuecke, The physiology of the Grid: an Open Grid Services Architecture for distributed systems integration, Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
[3] M. Lamanna, et al., OGSA/GT3 evaluation activity at CERN, Nucl. Instr. and Meth. A (2004), these proceedings.
[4] T. Barras, et al., Interfacing the storage resource broker and the GIGGLE framework, Nucl. Instr. and Meth. A (2004), these proceedings.
[5] A. Domenici, et al., Replica consistency in a data grid, Nucl. Instr. and Meth. A (2004), these proceedings.
[6] Y. Wu, et al., Configuration monitoring tool for large-scale distributed computing, Nucl. Instr. and Meth. A (2004), these proceedings.
[7] I. Legrand, et al., Using a mobile agent architecture to monitor, control and optimize the operation of distributed systems, Nucl. Instr. and Meth. A (2004), these proceedings.
[8] I. Legrand, et al., A processes oriented, discrete event simulation framework for modelling and design of large scale distributed systems, Nucl. Instr. and Meth. A (2004), these proceedings.
[9] A. Peters, et al., Distributed analysis with AliEn and ROOT, Nucl. Instr. and Meth. A (2004), these proceedings.
[10] M. Ballintijn, F. Rademakers, Parallel interactive analysis on the GRID using PROOF and CONDOR, Nucl. Instr. and Meth. A (2004), these proceedings.
[11] J. Generowicz, et al., The SEAL project: common core libraries and services for LHC applications, Nucl. Instr. and Meth. A (2004), these proceedings.
[12] A. Pfeiffer, Status of the LCG physicist interface (PI) project, Nucl. Instr. and Meth. A (2004), these proceedings.
[13] R. Brun, et al., Cross-platform approach to create the interactive application based on ROOT and Qt GUI libraries, Nucl. Instr. and Meth. A (2004), these proceedings.
[14] V. Lindenstruth, et al., Realtime analysis for the ALICE high level trigger, Nucl. Instr. and Meth. A (2004), these proceedings.
[15] I. Adachi, et al., The Belle computing system, Nucl. Instr. and Meth. A (2004), these proceedings.
[16] M. Ishino, et al., A basic R&D for an analysis framework distributed on wide area network, Nucl. Instr. and Meth. A (2004), these proceedings.
[17] A. Heiss, U. Schwickerath, First experiences with the InfiniBand interconnect, Nucl. Instr. and Meth. A (2004), these proceedings.
[18] J. van Wezel, First experiences with large SAN storage in a Linux cluster, Nucl. Instr. and Meth. A (2004), these proceedings.