Nuclear Instruments and Methods in Physics Research A 389 (1997) 87-88
Physics analysis in the NA49 experiment using ROOT

Gunther Roland*

Institut für Kernphysik, Universität Frankfurt, Germany
For the NA49 collaboration

Abstract

The NA49 experiment studies collisions of heavy nuclei at the CERN SPS. In terms of data rate (16 Mbyte/s) and total yearly data volume (15 Tbyte), NA49 already faces challenges that will be commonplace at future colliders like RHIC and LHC. To allow a fast, interactive analysis of the NA49 data, the development of the object-oriented ROOT analysis framework was started within the NA49 collaboration. The system has been used for physics analysis for more than a year. In this paper we summarize our experience so far and outline our plans for the near future.
* Tel.: +41 22 767 6406; e-mail: [email protected].

1. ROOT in the NA49 experiment

The decision to develop the ROOT [1] system was based on two facts inherent in the NA49 physics program: the huge data volume and the need for a powerful interactive analysis. The aim of NA49 is to collect the maximum information on the hadronic final state of nuclear collisions at the CERN SPS, allowing an Event-by-Event analysis. In four large-volume Time Projection Chambers, up to 1200 charged particles are measured for a single event. After zero-suppression the raw data volume is more than 8 Mbyte per event. The track reconstruction reduces the data volume to 150 Kbyte per event, which is stored in ROOT mini-DSTs. To achieve sufficient statistics NA49 is running with an event rate of 1-2 Hz, leading to a data rate of 16 Mbyte/s. The resulting data volume per year is of the order of 15 Tbyte, with typically 10^6 events taken under identical run conditions. Over its planned lifetime of 5 years, NA49 will therefore accumulate close to 80 Tbyte of raw data, resulting in about 10 Tbyte of DSTs and 1.5 Tbyte of ROOT-based mini-DSTs.

Typically, the Event-by-Event analysis requires several types of access to different parts of the mini-DSTs:

- All events, all particles: To calculate the properties of every single event, we need high-bandwidth access to all the available events for certain run conditions, possibly reading only parts of the events.
- Correlation analysis: To search for correlations between different event properties and to search for different sub-classes of events and rare fluctuations, we need statistical and graphics tools for an interactive analysis of the event-property database.
- Event selection: We need tools for selectively reading events that have been singled out in the correlation analysis. The mini-DST therefore has to allow direct access to events identified by run and event number.

Finally, we want to perform detailed physics analysis on selected subsets of 10^3-10^5 events. We also required the possibility to integrate custom packages for statistical and physics analysis. The system was interfaced to the standard NA49 DSTs, which are based on C structures and managed by the DSPACK data management system.
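The requirement of direct access to events identified by run and event number can be pictured as a lookup table from (run, event) to a position in the file. The following is a minimal sketch in standard C++, not the NA49 or ROOT implementation; the class EventIndex, its methods, and the choice of storing byte offsets are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical index: maps (run, event) to the byte offset at which the
// event record starts in a mini-DST file. Not part of ROOT or ROOT49.
class EventIndex {
public:
    typedef std::pair<std::uint32_t, std::uint32_t> Key;  // (run, event)

    // Remember where a given event starts.
    void Add(std::uint32_t run, std::uint32_t event, std::uint64_t offset) {
        offsets_[Key(run, event)] = offset;
    }

    // Look up an event. Returns true and sets 'offset' if the event is
    // indexed; returns false and leaves 'offset' untouched otherwise.
    bool Find(std::uint32_t run, std::uint32_t event,
              std::uint64_t &offset) const {
        std::map<Key, std::uint64_t>::const_iterator it =
            offsets_.find(Key(run, event));
        if (it == offsets_.end()) return false;
        offset = it->second;
        return true;
    }

    // Number of indexed events.
    std::size_t Size() const { return offsets_.size(); }

private:
    std::map<Key, std::uint64_t> offsets_;  // ordered by run, then event
};
```

With such a table, a selection produced by the correlation analysis (a list of run and event numbers) could be turned into a list of file positions to read, so that only the selected events are touched.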
2. The ROOT49 library

The implementation of the NA49 physics analysis inside the ROOT environment has four major components:

(i) Data classes for events, tracks, points, etc.
(ii) Standard analysis classes containing reusable statistics tools, etc.
(iii) User analysis classes containing specific physics analysis code.
(iv) C++ macros that are executed by the ROOT C++ interpreter CINT and access the compiled code and data contained in the classes mentioned above.

All the C++ classes used in the NA49 analysis inherit from the ROOT TObject class, allowing them to be fully integrated into the interactive ROOT environment. Routines for I/O and interactive access are generated automatically by the ROOT C++ interpreter based on the header files for the NA49 classes.
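The layering of data classes, reusable analysis classes and user analysis code on a common base class can be sketched in plain C++ as follows. This is an illustration only: Object stands in for the role TObject plays in ROOT, and Track, MeanAccumulator and MeanPt are hypothetical names, not the ROOT49 classes. The rapidity formula y = 0.5 ln((E + pz)/(E - pz)) is the standard definition; that the NA49 track class computes it this way is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a common base class (the role TObject plays in ROOT).
class Object {
public:
    virtual ~Object() {}
    virtual const char *ClassName() const = 0;
};

// (i) Data class: a track described by its momentum components (GeV/c).
class Track : public Object {
public:
    Track(double px, double py, double pz) : px_(px), py_(py), pz_(pz) {}
    const char *ClassName() const override { return "Track"; }

    // Transverse momentum.
    double GetPt() const { return std::sqrt(px_ * px_ + py_ * py_); }

    // Rapidity under a mass hypothesis (GeV/c^2). A positive mass
    // guarantees E > |pz|, so the logarithm is well defined.
    double GetRap(double mass) const {
        assert(mass > 0.0);
        const double e =
            std::sqrt(px_ * px_ + py_ * py_ + pz_ * pz_ + mass * mass);
        return 0.5 * std::log((e + pz_) / (e - pz_));
    }

private:
    double px_, py_, pz_;
};

// (ii) Reusable analysis class: a running mean.
class MeanAccumulator : public Object {
public:
    MeanAccumulator() : sum_(0.0), n_(0) {}
    const char *ClassName() const override { return "MeanAccumulator"; }
    void Fill(double x) { sum_ += x; ++n_; }
    double Mean() const { return n_ > 0 ? sum_ / n_ : 0.0; }

private:
    double sum_;
    std::size_t n_;
};

// (iii) User analysis code: mean transverse momentum of one event.
double MeanPt(const std::vector<Track> &tracks) {
    MeanAccumulator acc;
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        acc.Fill(tracks[i].GetPt());
    }
    return acc.Mean();
}
```

In this picture, component (iv) corresponds to a short interpreted script that builds or reads the tracks and calls MeanPt, without the user code itself being recompiled.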
0168-9002/97/$17.00 Copyright © 1997 Published by Elsevier Science B.V. All rights reserved. PII S0168-9002(97)00049-1
The standard data and analysis classes are compiled into a shared library which can be dynamically linked to the standard ROOT executable to provide full interactive access to the data. Here we show a very simple example of how to access NA49 data using the ROOT C++ interpreter:

    {
       T49Event *event;
       T49Track *track;
       Int_t i;
       // Open the mini-DST and book a 2-D histogram of pt versus rapidity
       T49DST *dst = new T49DST("DATA23/run820.root");
       TH2F *pty = new TH2F("pty", "pt vs y", 30, 0, 6, 20, 0, 2);
       TClonesArray *TrackList;
       // Loop over all events in the file
       while (event = dst->GetNextEvent())
       {
          TrackList = event->GetTrackList();
          // Loop over all tracks of the event
          for (i = 0; i < TrackList->GetEntries(); i++)
          {
             track = (T49Track *) TrackList->At(i);
             pty->Fill2(track->GetRap(PIMASS), track->GetPt());
          }
          delete event;
       }
       pty->Draw("box");
    }

In this macro, we access a ROOT data file using the T49DST class, loop over all events in the file and fill the rapidity and transverse momentum for every track in the event into a two-dimensional histogram. Finally, the histogram is drawn.

3. The ROOT experience

NA49 physics results obtained with ROOT have been presented at several conferences (ICHEP'96, Warsaw; Quark Matter '96, Heidelberg). In the course of the analysis, two features of the ROOT system were perceived as particularly important:

(i) CINT: The CINT C++ interpreter allows the user to work in a consistent environment for all parts of the analysis. The interface to the data and analysis routines is identical in compiled code, in interpreted macros and on the command line. Using C++ as the scripting language also provides far more powerful control of the analysis flow than the traditional CERN software. The downside of this approach is the initial investment required of users to obtain a basic knowledge of C++. We also found that the design of the basic data classes requires great care and a widespread contribution from the collaboration to ensure that the resulting mini-DSTs are useful for many users.

(ii) I/O performance: We have extensively benchmarked the performance in reading the NA49 event objects, containing more than 1000 individual track objects and up to 10’ point objects. This led to the development of the TClonesArray container class in ROOT, which provides management of large collections of individual objects. Our present experience is based on typical mini-DST volumes of 5-8 Gbyte. The data has been stored in the mini-DST using the Streamer() member function of the T49Event class. This I/O routine is generated automatically by the 'rootcint' program [1]. The ROOT analysis was performed on a 200 MHz Pentium Pro PC running the Linux operating system. The throughput for reading NA49 event objects from disk and re-creating them in memory was measured as 3 Mbyte/s, which leads to an event rate in the analysis of up to 20 events per second. In an interactive application a single workstation of this type is sufficient for an analysis of samples of several 10” events. In batch mode, allowing for a 24 h turn-around time, data sets of up to 1,000,000 events (150 Gbyte) can be handled on a single processor.

4. Conclusions and outlook

Based on the expected data volumes and the need for a fast, highly interactive analysis, the NA49 collaboration has investigated a completely new, C++-based analysis environment, ROOT. The data storage and analysis are performed using NA49-specific C++ classes. The analysis flow is controlled by interpreted ROOT macros written in C++. The experience so far with data sets of up to 8 Gbyte and a limited user base of about 10 physicists with previous experience in C or C++ has been very positive, showing that the basic performance goals have been met.

The ROOT49 project now faces two major challenges, namely extending its user base to about 50 physicists without prior C/C++ knowledge and providing interactive access to the growing volume of NA49 reconstructed events, which will reach 500,000 events (75 Gbyte mini-DST) by the end of this year. The first problem will hopefully be solved by a series of ROOT workshops introducing PAW users to the world of object-oriented programming. To deal with the growing NA49 data volume, the ROOT system will be modified to allow parallel execution of data queries on a cluster of SMP workstations, with the aim of achieving an aggregate throughput of up to 50 Mbyte/s, which would allow interactive access to the full NA49 data set.
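The throughput figures quoted in the text can be related to event rates and turn-around times with simple arithmetic. The sketch below is only a consistency check, not part of the NA49 software; the function names are hypothetical, and decimal units (1 Kbyte = 10^3 byte, 1 Mbyte = 10^6 byte) are an assumption.

```cpp
#include <cassert>

// Event rate (events/s) for a given throughput and event size, both in bytes.
double EventsPerSecond(double bytesPerSecond, double bytesPerEvent) {
    assert(bytesPerEvent > 0.0);
    return bytesPerSecond / bytesPerEvent;
}

// Time in seconds needed to read nEvents events of the given size
// at the given throughput.
double SecondsToRead(double nEvents, double bytesPerEvent,
                     double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    return nEvents * bytesPerEvent / bytesPerSecond;
}

// Figures taken from the text, in decimal units (assumption).
const double kMiniDstEventBytes = 150e3;  // 150 Kbyte per mini-DST event
const double kSingleCpuRate = 3e6;        // 3 Mbyte/s measured on one PC
const double kClusterRate = 50e6;         // 50 Mbyte/s aggregate target
```

Under these assumptions, EventsPerSecond(kSingleCpuRate, kMiniDstEventBytes) gives 20 events per second, SecondsToRead(1e6, kMiniDstEventBytes, kSingleCpuRate) gives 50 000 s (about 14 h, within the 24 h batch turn-around), and SecondsToRead(5e5, kMiniDstEventBytes, kClusterRate) gives 1500 s, i.e. about 25 minutes for a full pass over 500,000 events at the target aggregate throughput.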
References

[1] R. Brun and F. Rademakers, these Proceedings (AIHENP 96, Lausanne, Switzerland, 1996), Nucl. Instr. and Meth. A 389 (1997) 81.