
Computer Physics Communications 140 (2001) 31–44 www.elsevier.com/locate/cpc

CMS software architecture
Software framework, services and persistency in high level trigger, reconstruction and analysis

V. Innocente a,∗, L. Silvestris a,b, D. Stickland c

a CERN, CMS/CMC, EP Division, CH-1211 Geneva 23, Switzerland
b INFN, Sezione di Bari, Bari, Italy
c Princeton University, Princeton, USA

∗ Corresponding author. E-mail address: [email protected] (V. Innocente).

On behalf of the CMS Software Group

Abstract

This paper describes the design of a resilient and flexible software architecture that has been developed to satisfy the data processing requirements of a large HEP experiment, CMS, currently being constructed at the LHC machine at CERN. We describe various components of a software framework that allows integration of physics modules and which can be easily adapted for use in different processing environments, both real-time (online trigger) and offline (event reconstruction and analysis). Features such as the mechanisms for scheduling algorithms, configuring the application and managing the dependences among modules are described in detail. In particular, a major effort has been placed on providing a service for managing persistent data, and the experience using a commercial ODBMS (Objectivity/DB) is therefore described in detail. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: HEP; LHC; CMS; Object Oriented; Software architecture; C++; ODBMS; Event reconstruction

1. Introduction

Requirements on CMS software and computing resources [1] will far exceed those of any existing high energy physics experiment, not only because of the complexity of the detector and of the physics task, but also because of the size and the distributed nature of the collaboration and the long time scale. Since 1995 the CMS software group has been engaged in an extensive R&D program to evaluate and prototype a software architecture that will allow CMS reconstruction and analysis software to be developed in the given time frame and cost envelope, to perform correctly and efficiently in the demanding LHC environment, while providing enough flexibility to cope with the inevitable changes of requirements that can be expected in a project with such a lifetime.

This R&D effort has, on one side, recognized that the software technologies and the software development practices in use in HEP so far were not adequate for the CMS software project. On the other side, it has identified new software technologies, such as object oriented programming, object database management systems, flexible software architectures and evolutionary software development processes, which could be successfully used to develop CMS software. These technologies have been used to build prototypes to validate our choices.

Today an object oriented framework is in place, which is used to build real data processing applications, ranging from high level trigger to analysis, by incorporation of physics software modules. Flexibility is achieved by making use of plug-in technology, deferred execution and implicit invocation. Object persistency is implemented using a commercial ODBMS, Objectivity/DB [2], interfaced to a high performance mass storage system, HPSS [3], for tertiary storage. Basic computing services are accessed through solid foundation libraries, LHC++ [4]. The framework is in use in test-beams [5] and in High Level Trigger studies [6].

2. Requirements

CMS has identified [1] the following underlying principles which motivate the overall design of the CMS software architecture:
• Multiple environments: various software modules must be able to run in a variety of environments as different computing tasks are performed. Examples of these environments are level 3 triggering, production reconstruction, program development, and individual analysis;
• Migration between environments: a particular software module may start out being developed for one environment, then later be used in other unforeseen environments as well;
• Migration to new technologies: hardware and software technologies will change during the lifetime of the experiment; a migration to a new technology should require a finite effort localized in a small portion of the software system, ideally without involving changes to physics software modules;
• Dispersed code development: the software will be developed by organizationally and geographically dispersed groups of part-time, non-professional programmers; only some portions of the code will be written by computing professionals;
• Flexibility: not all software requirements will be fully known in advance, therefore the software systems must be adaptable without requiring total rewrites;
• Ease of use: the software systems must be easily usable by collaboration physicists who are not computing experts and cannot devote large amounts of time to learning computing techniques.

These requirements imply that software should be developed keeping in mind not only performance but also modularity, flexibility, maintainability, quality assurance and documentation. Object orientation has been identified as the enabling technology, since it directly addresses these kinds of problems.

3. Architecture design

These requirements on the software architecture result in the following overall structure for the CMS software:
• an application framework, CARF (CMS Analysis & Reconstruction Framework), customizable for each of the computing environments;
• physics software modules with clearly defined interfaces that can be plugged into the framework;
• a service and utility toolkit that can be used by any of the physics modules.

The framework defines the top level abstractions, their behaviour and collaboration patterns. It comprises two components: a set of classes that capture CMS specific concepts like detector components and event features, and a control policy that orchestrates the instances of those classes, taking care of the flow of control, module scheduling, input/output, etc. This control policy is tailored to the task in hand and to the computing environment.

The physics and utility modules are written by detector groups and physicists. The modules can be plugged into the application framework at run time, independently of the computing environment. One can easily choose between different versions of various modules. The physics modules do not communicate with each other directly but only through the data access protocols that are part of the framework itself.

The service and utility toolkit consists of two major categories of services: physics type services (histogrammers, fitters, mathematical algorithms, geometry and physics calculation routines) and computer services (data access, inter module communication, user interface, etc.) and, in the long run, will most probably not be CMS-specific. This toolkit is based on a solid foundation library, which provides basic computer services and a set of classes which form a “dictionary” to be used in all module interfaces. This “dictionary” is eventually extended by CMS specific classes. Today the foundation library of choice has been identified to be LHC++ [4].

Both the application framework and the service and utility toolkit shield the physics software modules from the underlying technologies which will be used for the computer services. This will ensure a smooth transition to new technologies with changes localized in the framework and in specific components of the service toolkit.

3.1. Action on demand

In the past years we have performed studies and evaluations of the architectures used in simulation, reconstruction and analysis programs of various HEP experiments at CERN, Fermilab, SLAC and DESY. These studies have convinced us that the traditional “main program and subroutines” architecture, and even a more elaborate “pipelined” architecture, is not flexible enough to satisfy the CMS requirements described above. To achieve maximum flexibility CARF implements an “implicit invocation architecture” as, for instance, described by Shaw [7]. Modules register themselves at creation time and are invoked when required. In this way only those modules really needed are loaded and executed. Implicit invocation requires some additional mechanisms to construct the required modules, to customize them and to manage dependencies.

3.2. Implicit invocation

Modules whose change of state depends on the occurrence of some external “event” should register themselves as an “observer” with the “dispatcher” of such an event. In CARF applications typical external “events” are the arrival of a new physics event, of a new simulated event (trigger or pile-up), a new run, a new detector set-up, etc. When a new “event” occurs the dispatcher informs all registered observers, which update their state accordingly and eventually perform a specific action. To avoid useless computation, such an action can be deferred until a client asks an observer for a service.

The design of this mechanism makes use of the “Observer pattern” [8]; in CARF it is implemented using template classes with, as template argument, a pointer to the event to dispatch. Fig. 1 shows the class diagram of an example of a user class observing physics events and runs. In this example the User Class is a “standard observer” of runs (it will take immediate action when a run changes) and a “lazy observer” of events (it will take action only when a user asks it for a service). The message trace diagram for two scenarios, corresponding to the occurrence of a new run and of a new event, is presented in Fig. 2.

Fig. 1. Class diagram showing the classes collaborating in the implicit invocation mechanism implemented in CARF.

Fig. 2. A sequence diagram showing the messages exchanged among the observer and dispatcher objects at the occurrence of a new run and of a new event.
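A minimal sketch of such a dispatcher/observer mechanism is given below. The class and member names (Dispatcher, Observer, UserClass, service) are illustrative assumptions and do not reproduce the actual CARF interfaces; in particular, the sketch templates on the event type rather than on a pointer to the event. The "lazy observer" simply marks itself as stale and defers the real work until a client requests a service.

#include <iostream>
#include <vector>

// Illustrative sketch only: names do not correspond to real CARF classes.
template <class Event>
class Observer;

template <class Event>
class Dispatcher {
public:
  static Dispatcher& instance() {            // one dispatcher per "event" type
    static Dispatcher d;
    return d;
  }
  void attach(Observer<Event>* o) { observers_.push_back(o); }
  void dispatch(const Event& e) {            // inform all registered observers
    for (auto* o : observers_) o->update(e);
  }
private:
  std::vector<Observer<Event>*> observers_;
};

template <class Event>
class Observer {
public:
  Observer() { Dispatcher<Event>::instance().attach(this); }  // register at creation time
  virtual ~Observer() = default;
  virtual void update(const Event& e) = 0;
};

// Two kinds of external "events": a new run and a new physics event.
struct Run    { int number; };
struct PEvent { int number; };

// A user class that reacts immediately to runs but lazily to events.
class UserClass : public Observer<Run>, public Observer<PEvent> {
public:
  void update(const Run& r) override {          // standard observer: act at once
    std::cout << "new run " << r.number << ": reload calibrations\n";
  }
  void update(const PEvent&) override {         // lazy observer: just mark as stale
    upToDate_ = false;
  }
  double service() {                            // deferred work done on demand
    if (!upToDate_) { result_ = 42.0; upToDate_ = true; }   // placeholder computation
    return result_;
  }
private:
  bool   upToDate_ = false;
  double result_   = 0.0;
};

int main() {
  UserClass user;
  Dispatcher<Run>::instance().dispatch(Run{1});
  Dispatcher<PEvent>::instance().dispatch(PEvent{1});
  std::cout << "service value: " << user.service() << "\n";  // triggers the lazy update
}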

3.3. Module construction

CMS software is subdivided into “packages”. A package groups all classes required to perform a very specific task. A package is realized physically as a shared library. CARF supports full run-time dynamic loading of these shared libraries. CARF provides a PackageInitializer class which can be specialized in each package. One instance of this class can be statically constructed when the corresponding library is loaded. Such an object can be used to construct and register default versions of the modules contained in the package and to dynamically load any package it requires. Additional “Singletons” can be used to publicize and export other, more direct services.

3.4. Customization

A CARF application can be customized in several ways:
• at load time: the act of loading a shared library will automatically invoke the corresponding PackageInitializer;
• at compilation time: CARF provides “hooks” where users can overwrite default initialization, register their own plug-in modules and define their own preferred configuration;
• at run time: through a user interface (at present a simple ASCII file, in the future a database-driven configuration control system) CARF can be asked to dynamically load packages, and loaded packages can be instructed to construct and register particular modules in some specific configuration.

CARF applications do not need to be fully customized and configured at initialization time: when objects from the database are accessed, the dynamic loading of the packages required to handle them is triggered. A CARF application can start with a minimal set of software loaded: its software base will grow and its functionality will expand while navigating in the database and accessing the objects required by the user. At the same time, software that is not required is not loaded, keeping the application size small.
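The sketch below illustrates the package initialization and load-time customization ideas of Sections 3.3 and 3.4: a statically constructed initializer object registers a factory for the default module version of its package with a central registry, from which modules can later be constructed by name at run time. All names (ModuleRegistry, TrackerPackageInitializer, TrackFinder) are hypothetical stand-ins for the corresponding CARF facilities.

#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the CARF module base class and registry.
struct Module { virtual ~Module() = default; };

class ModuleRegistry {
public:
  using Factory = std::function<std::unique_ptr<Module>()>;
  static ModuleRegistry& instance() { static ModuleRegistry r; return r; }
  void registerFactory(const std::string& name, Factory f) { factories_[name] = std::move(f); }
  std::unique_ptr<Module> create(const std::string& name) const {
    auto it = factories_.find(name);
    return it != factories_.end() ? it->second() : nullptr;
  }
private:
  std::map<std::string, Factory> factories_;
};

// A concrete module provided by this (hypothetical) tracker package.
struct TrackFinder : Module {};

// Each package specializes a package initializer; one static instance is
// constructed when the corresponding shared library is loaded.
class TrackerPackageInitializer {
public:
  TrackerPackageInitializer() {
    ModuleRegistry::instance().registerFactory(
        "TrackFinder", [] { return std::make_unique<TrackFinder>(); });
  }
};

static TrackerPackageInitializer initTrackerPackage;   // runs at library load time

int main() {
  // Run-time customization: construct a registered module by name.
  auto m = ModuleRegistry::instance().create("TrackFinder");
  std::cout << (m ? "TrackFinder constructed\n" : "module not found\n");
}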


3.5. Dependency management

To avoid a central management of dependencies, all plug-in modules work in a “lazy-update” mode. In such a mode the actual code execution is deferred until a service is requested from the object in question. A simple state machine ensures that code is executed once for each “event”. A “lazy-update” approach achieves two goals:
• code is executed if and when it is really required,
• dependencies are automatically taken into account.

The major drawback of the absence of a central management of dependencies is the impossibility of performing a “static check” of the application configuration prior to run time: missing modules and/or circular dependencies will only be detected at run time.
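A minimal sketch of the lazy-update logic follows; the names are illustrative and not CARF's. The module records the identifier of the last "event" for which it ran, so its code is executed at most once per event and only when one of its services is actually requested. If another module calls the service from inside its own update, the dependency is resolved automatically, without any central scheduling.

#include <iostream>

// Illustrative sketch of a lazily updated plug-in module (names are not CARF's).
class LazyModule {
public:
  // Called by the dispatcher when a new event arrives: no real work is done here.
  void newEvent(long eventId) { currentEvent_ = eventId; }

  // Called by clients (possibly other modules); triggers execution at most once per event.
  int clusters() {
    if (lastProcessed_ != currentEvent_) {      // simple state machine: run once per event
      run();
      lastProcessed_ = currentEvent_;
    }
    return nClusters_;
  }

private:
  void run() {                                  // the expensive reconstruction step
    std::cout << "reconstructing event " << currentEvent_ << "\n";
    nClusters_ = 3;                             // placeholder result
  }
  long currentEvent_  = -1;
  long lastProcessed_ = -2;
  int  nClusters_     = 0;
};

int main() {
  LazyModule m;
  m.newEvent(1);
  m.clusters();          // runs the algorithm
  m.clusters();          // cached: nothing is re-executed
  m.newEvent(2);         // module marked stale; nothing executed yet
  m.clusters();          // runs again for the new event
}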


4. Reconstruction

Reconstruction is the data-reduction process which produces the information relevant to physics analysis starting from the raw data delivered by the data acquisition system. Reconstruction algorithms share common sources of information and services from common utility tools (which are re-usable components of CMS reconstruction software). Sources of information for a reconstruction algorithm are:
• real or simulated data provided by the DAQ or simulation program;
• reconstructed objects provided by other reconstruction algorithms;
• environmental data, including:
  – detector description;
  – detector status;
  – calibrations;
  – alignments;
• parameters used to steer the algorithm.
This is schematically shown in Fig. 3. In the reconstruction process we identify two stages: detector reconstruction and global reconstruction.

Fig. 3. Sources of information for a reconstruction algorithm (arrows indicate the direction of requests, not data flow).

4.1. Detector reconstruction

Detector reconstruction is a data-reduction process which can be performed in isolation in a single “Detector Unit”. It can be considered essentially as an algorithmic procedure run as a direct continuation of the digitization process performed online. It is performed offline mainly because environmental data are not available, or not precise enough, at the time the data are acquired. Because of its local nature it is natural to assign the responsibility of detector reconstruction to the detector unit which produces the raw data itself. We will denote the deliverables of such a reconstruction process as “reconstructed hits” (RHITs). The granularity of detector units will of course depend on the detector layout. It is supposed that global reconstruction modules will ask the detector units directly for their RHITs during pattern recognition. A geometry-setup object is therefore required to hold the detector units and eventually provide a navigation tool among them.

It should be noted that detector reconstruction has several additional distinctive features:
• It is usually stable, and it is rarely required to run two different reconstructions for the same detector unit concurrently (although mechanisms to change an algorithm at run time are provided).
• It is usually fast (most of the time just a calibration and a linear clustering algorithm) compared to the combinatorial reconstruction which will follow.

• If it is required to be rerun, it invalidates all the results of the global reconstruction which depend on it. Therefore, in most cases, the results of detector reconstruction as such do not need to be made persistent.

Detector Units are also the natural place where simulation of the digitization process can take place. In this case a mechanism for making persistent the objects created in this process is provided.

Fig. 4 shows the classes collaborating in detector reconstruction and their relationships:
• DetectorSet: represents an application specific transient class which can group several DetectorUnits. It acts as an interface between the physics modules, such as pattern recognition algorithms, and the Detector Units. It can cache the “Reconstructed Hits” to enhance performance.
• DetectorUnit: models an elementary component of a detector. It can be a persistent class and its instances should be grouped into a “Set-Up” which is managed as a configuration item. It shares with the DetectorSet the responsibilities of caching simulated hits, performing digitization simulation and detector reconstruction.
• ReadOutUnit: models the read-out electronics of a detector. Its major responsibility is to create raw data (in an online DAQ application or during digitization simulation) and to read them back in offline reconstruction applications. It is a persistent class and its instances are grouped into a “Set-Up” which is managed as a configuration item. One-to-one, one-to-many and many-to-one relations are allowed between DetectorUnits and ReadOutUnits.
• RawEvent: the entry point to all information about a “raw” event.
• RawData: a simple persistent collection of the raw data belonging to a given ReadOutUnit for a given event.

Fig. 4. Class diagram showing the classes collaborating in raw data handling and detector reconstruction: in the first column are classes whose persistent instances are created event by event, in the second column are classes whose persistent instances exist for each “set-up”, and the third column includes transient classes that are application specific.

A typical collaboration pattern for detector reconstruction having persistent raw data as input is shown in the scenario diagram of Fig. 5.

Fig. 5. Scenario diagram showing the collaboration pattern of the various classes contributing to detector reconstruction.
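The following sketch is a simplified transcription of the collaboration of Fig. 4; all signatures are assumptions and the reconstruction algorithm is a placeholder. It shows a DetectorUnit performing local reconstruction on the raw data read back by its ReadOutUnit, and an application-specific DetectorSet caching the resulting RHITs for the pattern-recognition modules.

#include <map>
#include <utility>
#include <vector>

// Simplified transcription of the Fig. 4 collaboration; all signatures are assumptions.
struct RawData { std::vector<unsigned short> words; };   // raw data of one ReadOutUnit
struct RHIT    { double position; double signal; };      // a "reconstructed hit"

class DetectorUnit {
public:
  // Local, algorithmic detector reconstruction: calibration plus clustering.
  std::vector<RHIT> reconstruct(const RawData& raw) const {
    std::vector<RHIT> hits;
    for (auto w : raw.words) hits.push_back({0.1 * w, 1.0 * w});   // placeholder algorithm
    return hits;
  }
};

class ReadOutUnit {
public:
  explicit ReadOutUnit(int id) : id_(id) {}
  RawData read(const std::map<int, RawData>& event) const {        // read back raw data
    auto it = event.find(id_);
    return it != event.end() ? it->second : RawData{};
  }
private:
  int id_;
};

// Application-specific transient class grouping several DetectorUnits and
// caching their reconstructed hits for the pattern-recognition modules.
class DetectorSet {
public:
  void add(DetectorUnit u, ReadOutUnit r) { units_.push_back({u, r}); }
  const std::vector<RHIT>& rhits(const std::map<int, RawData>& event) {
    if (cache_.empty()) {                                          // reconstruct on first request
      for (auto& [du, rou] : units_) {
        auto hits = du.reconstruct(rou.read(event));
        cache_.insert(cache_.end(), hits.begin(), hits.end());
      }
    }
    return cache_;
  }
  void clearCache() { cache_.clear(); }                            // called at each new event
private:
  std::vector<std::pair<DetectorUnit, ReadOutUnit>> units_;
  std::vector<RHIT> cache_;
};

int main() {
  DetectorSet set;
  set.add(DetectorUnit{}, ReadOutUnit{7});
  std::map<int, RawData> event{{7, RawData{{10, 20, 30}}}};
  return set.rhits(event).size() == 3 ? 0 : 1;   // three raw words give three hits
}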

4.2. Global reconstruction

Global reconstruction delivers “Reconstructed Objects” (RecObjs) suitable for physics analysis or further global reconstruction. By their very nature they are detached from the detector that provided the original information and should be usable in a Lorentz space common to the CMS detector as a whole. They belong more to the event than to a specific detector. Reconstruction is performed by user-provided modules which extend the CARF-provided Reconstruction Unit (RecUnit). A Reconstruction Unit usually performs the following activities:
• Pattern recognition: the process that, starting from collections of lower level RecObj's, constructs the RecObj's of interest. Usually it makes use of time consuming combinatorial algorithms.
• Fit: the process which computes the physical or geometrical attributes of a RecObj starting from information provided by its constituent RecObj's.
• Correction: the process whereby additional detector information (calibrations, alignments, etc.), not available (or meaningful) at constituent level, is used to build or fit a RecObj.

Although these three basic activities can be interleaved and mixed depending on the actual reconstruction algorithm, it is current practice to split them into two parts:
• Object creation: the process which creates “stable” RecObj's in terms of, at least, their constituents. It includes pattern recognition and “zero-order” corrections. In general, changes to the algorithms or information used in this process do not ensure that the resulting RecObj's will keep their “identity”. Full consistency requires that previous RecObj's be deleted if this process needs to be re-run.


• Object updates: the process whereby the physical and/or geometrical attributes of a RecObj are updated. It includes fit and “high-order” corrections. It can include some “non-combinatorial” pattern recognition (object extension). In general, algorithms and information used in this process do not modify the identity of a RecObj but only its attributes. If an update process is run, a new version of a RecObj is created.

It seems natural to assign the responsibility of object creation to the Reconstruction Unit and the responsibility of object update to the object itself. A Reconstruction Unit is identified by a name (an ASCII string) which is also used to identify the collection of RecObjs it creates. A default Reconstruction Unit can be assigned to each RecObj sub-type (usually by the corresponding PackageInitializer). This allows the implementation of a simple “Reconstruction-on-Demand” mechanism:
• a user asks the Event for the collection of a given type of RecObj;
• (s)he can specify the collection name; if not, the default name is assumed;
• the Event checks if this collection is already present:
  – if present, it returns the collection;
  – if not present, it requests the corresponding RecUnit to perform the reconstruction and then returns the freshly built collection.

Fig. 6 shows the classes collaborating in this mechanism and their relationships:
• RecEvent: the entry point to all information about a reconstructed event.
• RecObj: an abstract class which needs to be specialized in the various concrete types (track, electron, jet, etc.).
• Reconstructor: for each type of RecObj the event has a Reconstructor which manages the corresponding RecObj's and the various reconstruction activities. The Reconstructor class is concrete and does not need to be specialized for each RecObj concrete type. In the current design it is identified by the name of the corresponding RecObj.
• RecUnit: the actual Reconstruction Units will be specializations (by inheritance) of this class. The relationship between Reconstructor and RecUnit implements the well known Strategy pattern [8].
• RecObjIterator: the user entry point to this mechanism. It is a parametrized class (a C++ template class) with instances for each concrete RecObj type. It is an extension of the classical iterator with the additional feature that the invocation of its constructor will trigger the “reconstruction on demand” mechanism for its parameter type, as shown in the scenario diagram of Fig. 7.

Fig. 6. Class diagram showing the classes collaborating in the reconstruction-on-demand mechanism.

Fig. 7. Scenario diagram representing the reconstruction-on-demand mechanism.

This architecture has several advantages:
• There is no need to modify the RecEvent class (nor any other class in the top level architecture) when a new concrete RecObj subclass is introduced.
• RecObj's are reconstructed when needed and only if needed. Their reconstruction is triggered by the object which actually uses them: there is no need for a steering routine where the order of the reconstruction process has to be defined and the dependencies among reconstruction units have to be known in advance.
• Reconstructors can be versioned: the event entry point (the RecEvent object) always remains the same while different versions of reconstructed objects are managed at Reconstructor level.
• In the present prototype RecUnit is a transient class. It can become a persistent class with the responsibility of keeping a record of all information actually used by the reconstruction unit itself.

Reconstructed objects need to be made persistent, and CARF provides full support to store and retrieve the RecObjs belonging to a given event. It also supports multiple versions of the same RecObj collection.
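A compact sketch of the reconstruction-on-demand mechanism is given below; the class names echo those of Fig. 6, but the interfaces are assumptions, and the per-type event view merges, for brevity, the roles of RecEvent and Reconstructor. Constructing the iterator asks the event for the named collection; if the collection is absent, the corresponding RecUnit (the Strategy) is invoked to build it.

#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Illustrative sketch of reconstruction on demand; not the real CARF interfaces.
struct RecObj { virtual ~RecObj() = default; };
struct Track : RecObj { double pt = 0; };

// The actual reconstruction units specialize this class (Strategy pattern).
template <class T>
struct RecUnit {
  virtual ~RecUnit() = default;
  virtual std::vector<std::unique_ptr<T>> reconstruct() = 0;
};

template <class T>
class RecEventView {                 // the per-type part of the event ("Reconstructor")
public:
  void setUnit(std::unique_ptr<RecUnit<T>> u) { unit_ = std::move(u); }
  const std::vector<std::unique_ptr<T>>& collection(const std::string& name) {
    auto it = collections_.find(name);
    if (it == collections_.end()) {                                 // not yet present:
      std::cout << "reconstructing collection '" << name << "'\n";
      it = collections_.emplace(name, unit_->reconstruct()).first;  // ask the RecUnit
    }
    return it->second;
  }
private:
  std::unique_ptr<RecUnit<T>> unit_;
  std::map<std::string, std::vector<std::unique_ptr<T>>> collections_;
};

// The user entry point: constructing the iterator triggers reconstruction on demand.
template <class T>
class RecObjIterator {
public:
  RecObjIterator(RecEventView<T>& ev, const std::string& name = "default")
      : objs_(&ev.collection(name)), i_(0) {}
  const T* next() { return i_ < objs_->size() ? (*objs_)[i_++].get() : nullptr; }
private:
  const std::vector<std::unique_ptr<T>>* objs_;
  std::size_t i_;
};

struct DefaultTrackFinder : RecUnit<Track> {              // a user-provided reconstruction unit
  std::vector<std::unique_ptr<Track>> reconstruct() override {
    std::vector<std::unique_ptr<Track>> v;
    v.push_back(std::make_unique<Track>());
    return v;
  }
};

int main() {
  RecEventView<Track> event;
  event.setUnit(std::make_unique<DefaultTrackFinder>());
  RecObjIterator<Track> it(event);        // first access: triggers the track reconstruction
  while (const Track* t = it.next()) std::cout << "track pt = " << t->pt << "\n";
  RecObjIterator<Track> again(event);     // second access: collection already present
}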

5. Persistency service

CMS reconstruction and analysis software is required to store and retrieve the results of computing intensive processes (and, in general, of any process which cannot be “easily” repeated). This responsibility also covers raw-data storage, as we plan to use the same software framework in both offline reconstruction and the High Level Triggers.

5.1. Categories of data to be made persistent

We have identified three major types of information which need to be made persistent:
• Event data: data associated with a given “triggered beam-crossing”. They encompass:
  – Raw data: data read directly from the detector and eventually processed online. These are WORM (Write Once, Read Many) data, to be securely stored and never modified;
  – Reconstructed data: data produced by the reconstruction process. They can belong to the whole collaboration, to a physics analysis group or to a single user. (Partial) re-processing of event data produces new versions of reconstructed data. Reconstructed data may be further structured in several “layers” according to use-cases and access patterns;
  – Event tag: a “small” object which summarizes the main features of an event, to be used for fast selection.
  Event data are usually organized in datasets according to run conditions, physics selection criteria and access patterns.
• Environmental data: data describing the state of the environment at the time the event data were produced. They are identified by:
  – a validity time period, which allows an event to be related to the corresponding environmental data,
  – a version.
  Environmental data encompass:
  – machine status,
  – detector status (setup, calibration, alignment, etc.),
  – running conditions,
  – reconstruction configuration, including algorithm parameters and user options.
  This information can be produced directly from the Experiment Control Systems or from offline processes. Since new versions of environmental data can be produced, a mechanism is required which relates the event data (other than raw) with the environmental data actually used during the reconstruction process.
• Summary data:
  – event catalog,
  – statistics.
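The validity-period idea for environmental data can be sketched as follows; the types and the time representation are assumptions, not the CMS implementation. Each calibration carries a version and is valid from its start time until the next entry, and an event is related to its environmental data by looking up the interval containing the event time.

#include <iostream>
#include <iterator>
#include <map>
#include <string>

// Sketch of the validity-interval idea for environmental data (hypothetical types).
struct Calibration {
  int         version;
  std::string payload;       // e.g. pedestals, alignment constants, ...
};

class CalibrationSet {
public:
  // Register a calibration valid from 'beginTime' (inclusive) until the next entry.
  void add(long beginTime, Calibration c) { byStart_[beginTime] = std::move(c); }

  // Find the calibration whose validity period contains the event time.
  const Calibration* find(long eventTime) const {
    auto it = byStart_.upper_bound(eventTime);
    if (it == byStart_.begin()) return nullptr;    // event precedes all known periods
    return &std::prev(it)->second;
  }
private:
  std::map<long, Calibration> byStart_;
};

int main() {
  CalibrationSet calib;
  calib.add(0,    {1, "initial constants"});
  calib.add(1000, {2, "after realignment"});
  if (const Calibration* c = calib.find(1500))     // event taken at time 1500
    std::cout << "using calibration version " << c->version << "\n";
}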

5.2. ODBMS: an object oriented solution to data persistency

CMS joined the RD45 project [9] as early as 1995 to investigate possible solutions to the problem of persistent data management. As part of the RD45 team the CMS software group has been engaged in several activities such as product evaluation, prototype building, benchmark testing and risk analysis. These activities have been described in the RD45 status reports [10]. We summarize here the main conclusions of this R&D effort:
• A persistent object model, following the ODMG standard [11], is adequate to describe the data of any HEP experiment and of CMS in particular.
• An Object Data Base Management System (ODBMS) satisfies our requirements for managing access to all types of data and provides a coherent solution to the problem of persistent object management.
• Objectivity/DB [2] has been identified as a possible candidate to be used as the ODBMS for CMS. In particular:
  – Objectivity/DB has been evaluated by CMS in several prototypes which successfully stored and retrieved environmental data, test-beam data, simulation data, reconstructed data and statistical data such as histograms.
  – Objectivity/DB successfully passed several benchmark tests: in particular, CMS has been able to write at 100 MB/s into an Objectivity database.
  – We have identified improvements required to make Objectivity/DB fully usable at the LHC. These include support for very large databases, the interface to mass storage, security of the data server, support for user data, and compliance with the standard. All these issues are receiving attention from Objectivity, and several of them have already been solved.
  – We expect that CMS, in collaboration with RD45 and other experiments using Objectivity such as BaBar, will have a working solution to most of the remaining open issues before the end of the year 2001.
• The use of a DBMS requires a level of software configuration management and data administration which is not commonly realized in HEP collaborations. Simple tools based on basic operating system and file system features are not adequate, and the usual skills of physics graduate students are not sufficient. The decision to use an ODBMS implies an investment in professional software personnel and a commitment, at all levels of the collaboration, to improve the processes of software development and data management to match the constraints imposed by the use of industrial-strength software components. CMS has, for a long time, been committed to and has implemented a rigorous software process [12].
• At present the major risk in using a commercial ODBMS comes from the limited market these products have. In line with the recommendation contained in the CMS CTP [1] (p. 22), the RD45 team has started a prototype effort to understand if an in-house product meeting our requirements can be developed. CMS also follows closely the experience of other HEP collaborations (ALICE, STAR, CDF, D0) which have decided not to use an ODBMS, in order to be in a position to readily evaluate different solutions if an ODBMS eventually proves to be inadequate for our needs.

5.3. Persistent object management in CARF

Providing a solution for persistent object management has always been a major goal of the development of CARF: it has always been considered one of the major components of the analysis and reconstruction framework. Indeed, the very first prototypes (notably the prototype used in the test-beam of 1997) already had an Objectivity database as a key component [5]. Today, persistent object management is fully integrated into CARF. CARF manages, directly or through the utility toolkit:
• database transactions;
• creation of databases and containers;
• detector set-up and event collections;
• physical clustering of event objects;
• the persistent event structure and its relation with transient objects;
• de-referencing of persistent objects.

CARF also provides a software middle-layer, mainly in the form of template classes, which helps developers of detector software in providing persistent versions of their own objects. To avoid transaction overheads, event objects are first constructed as transient instances and are made persistent (by a copy to their persistent representations) only at the end of the event processing, when a decision is finally taken about the fate of the event (send it to oblivion or save it in a particular dataset) and about its classification. Such a copy is not performed when accessing persistent events: physics modules access persistent objects directly through simple C++ pointers. CARF takes care of all the details needed to make sure that the required objects are actually loaded in memory and that their pointers do not become invalid while the event is processed.
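The "transient first, persistent at the end of the event" policy can be sketched generically as follows. The storage interface shown is an assumption and does not model the Objectivity/DB API; it only illustrates that physics modules fill transient objects and that a single copy to persistent storage is made once the fate of the event is decided.

#include <iostream>
#include <string>
#include <vector>

// Generic sketch of the "transient first, persistent at end of event" policy.
// The storage interface below is an assumption and does not model Objectivity/DB.
struct TransientTrack { double pt; };

class PersistentStore {                       // stand-in for the database layer
public:
  void write(const std::string& dataset, const TransientTrack& t) {
    std::cout << "store track (pt=" << t.pt << ") in dataset " << dataset << "\n";
  }
};

class EventProcessor {
public:
  explicit EventProcessor(PersistentStore& s) : store_(s) {}

  void addTrack(double pt) { tracks_.push_back({pt}); }   // modules fill transient objects

  // At the end of event processing the fate of the event is decided: either the
  // transient objects are copied to their persistent representations or discarded.
  void endEvent(bool keep, const std::string& dataset) {
    if (keep)
      for (const auto& t : tracks_) store_.write(dataset, t);
    tracks_.clear();                                      // transient objects dropped
  }
private:
  PersistentStore& store_;
  std::vector<TransientTrack> tracks_;
};

int main() {
  PersistentStore db;
  EventProcessor proc(db);
  proc.addTrack(12.5);
  proc.endEvent(true, "jets2001");     // accepted: copied to persistent storage (dataset name is hypothetical)
  proc.addTrack(0.3);
  proc.endEvent(false, "");            // rejected: sent to oblivion, no transaction cost
}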


This architecture avoids the need for detector software developers to become Objectivity experts. Indeed, the goal is to make the use of Objectivity completely transparent to physics software developers without making compromises in efficiency. This will also guarantee a painless transition to a new persistency technology if Objectivity proves not to satisfy our requirements.

The present version of CARF offers persistent object management for:
• the event structure and the event catalog common to test-beams, simulation and reconstruction,
• the detector set-up description common to test-beams, simulation and reconstruction,
• raw data from test-beams,
• simulated particles, simulated tracks and simulated digitizings,
• reconstructed objects common to test-beams and simulation.

5.4. Persistent event model

In designing a persistent object model two aspects should be taken into account:
• The logical model, which describes persistent-capable classes and their relationships, and the corresponding navigation paths a user has to follow (explicitly or implicitly) to obtain a service (in a database application, usually access to some information).
• The physical model, which describes the localization of the various objects on the storage media. I/O granularity is different from object granularity, and the way in which objects are physically clustered together considerably affects the performance of I/O bound applications.

An optimized persistent object model will minimize the number of I/O operations to be performed to satisfy the major use-cases. The critical component of an I/O bound use-case is the “access pattern”, i.e. which objects are accessed and in which order. Therefore the design of a persistent object model will require an optimization of the major access patterns. If a single object model cannot optimize all major access patterns, the use of techniques such as partial replication and re-clustering could be required.

In a multi-user distributed environment concurrency issues should also be considered. Top-level entry point objects could be required to be accessed at the same time by many applications. These objects are usually also the ones which need to be updated most often when new information is added to the database. These activities could create I/O bottlenecks whose solution could require the use of replication and caching techniques.

The great majority of HEP applications are data reduction processes: they read a “large” amount of data and produce a summary of it without modifying the input data. Therefore the major use-case is a typical analysis job which accesses parts of an event and produces high level reconstructed objects, tags or just statistics objects. It should be noted that, depending on the stage at which the analysis is performed, a different portion of the event needs to be accessed, for example:
• raw data belonging to a single detector in a calibration job (and the successive reconstruction jobs);
• a large portion of the reconstructed objects in first-pass physics analyses which will produce high-level (global) reconstructed objects;
• high-level reconstructed objects and a small portion of the reconstructed objects in final physics analyses.
Other use-cases, such as single-event visualization and detailed detector performance studies, should also be taken into account, but they can fit a large spectrum of persistent event models without major additional performance penalties.

In the following description of the CMS persistent event model we will use a nomenclature introduced in BaBar and also used in the MONARC project [13]. These concepts, although useful at this stage to describe our model using a common language, will not necessarily correspond to concrete entities in the CMS final implementation. This nomenclature classifies the information belonging to a HEP experiment event (from higher levels to lower levels) according to creation and access patterns into:
• TAG (Event Selection Tag): used for fast event selection;
• AOD (Analysis Object Data): information used in final analysis;
• ESD (Event Summary Data): information required for detailed analysis and high-level (global) reconstruction;

• REC (Reconstructed Data): detailed information about reconstructed objects, required mainly for detector and algorithm performance studies;
• RAW (Raw Data): written once, never modified. For simulated events they also include all information generated during the simulation step.

We expect TAG, AOD and partially ESD to be accessed by the majority of the analysis jobs, most of which will not be CPU-intensive. REC and RAW will instead be seldom accessed, mostly by scheduled, CPU-intensive production jobs which will run over them in a sequential fashion to generate a new version of the corresponding ESD, AOD and TAG. It is interesting to note that analysis jobs which require access to low level information usually require access also to the corresponding high level information, and indeed in most cases they start from it.

5.4.1. Physical clustering of data

The logical relations among raw data, reconstructed data and the event as a whole have already been described in the previous section, together with the navigation paths used to access them. Raw data belonging to the same “sub-detector” are clustered together. This optimizes detector studies and low-level reconstruction use-cases, which usually access one type of sub-detector. Raw data are not required to be modified and no versioning mechanism is provided. This decision could be revised if a reanalysis of simulation use-cases shows the need for multiple versions of digitization simulation.

Each CMS reconstructed class can be subdivided into AOD, ESD and REC components according to access patterns. AOD, ESD and REC will be physically clustered independently. For a given class, such as track or cluster, the AOD will include physics information such as a Lorentz vector and particle identification; the ESD, detector related information such as position or energy deposited in various calorimeter compartments and quality criteria; while the REC could contain detailed information about the original object constituents, such as hits. This subdivision is not rigid; some high-level objects can have only an AOD component while some low-level objects can be completely clustered into ESD containers.

Fig. 8 shows a model of a possible decomposition. In this case the AOD directly inherits from RecObj (the real reconstructed object) and has, as a private component, a reference to the corresponding ESD, which in turn has a reference to the REC as a private component. In this model it is supposed that the AOD explicitly exports (acting as a “proxy”) the ESD and the REC services. In this way two goals are achieved:
• The user has a unique, coherent view of a reconstructed object through the AOD interface, independently of the details of the AOD–ESD–REC implementation;
• The information can be properly clustered according to the access pattern (moving some information from one component to another requires schema evolution).

Fig. 8. Class diagram representing a possible decomposition of a reconstructed object into AOD, ESD and REC components.

When re-reconstruction is required (due to changes in calibrations or algorithms) a new version of the reconstructed event (RecEvent) is created. Although the possibility of in-place replacement is still being investigated, all use-cases show that versioning offers more flexibility and safety. The new RecEvent refers to the previous version of all reconstructed objects which do not require an update and holds all the ones which have been re-reconstructed. It should be noted that the model of Fig. 8 allows the creation of new versions of AOD and ESD without any modification or copy of the lower level information. Indeed, different AOD versions can share the same version of an ESD. This also allows an easy partial event replication (just a new version with an identical copy of the parts to replicate, instead of a real re-reconstruction). In this schema the responsibility of holding the information about which is the “correct” event collection to access is allocated to the event catalog. At present CARF manages event collections mainly through Objectivity naming features.
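A sketch of the decomposition of Fig. 8 is shown below; the class names and the use of shared references are assumptions made for illustration. The AOD carries the physics quantities, holds a private reference to the ESD (which in turn references the REC), and exports their services as a proxy, so that, for instance, a new AOD version can share an existing ESD.

#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Sketch of the AOD/ESD/REC decomposition of Fig. 8 (class names are assumptions).
struct Hit { int channel; };

struct TrackREC {                      // detailed constituents, rarely accessed
  std::vector<Hit> hits;
};

struct TrackESD {                      // detector-level information
  double chi2;
  std::shared_ptr<TrackREC> rec;       // reference to the REC component
  std::size_t nHits() const { return rec ? rec->hits.size() : 0; }
};

struct RecObj { virtual ~RecObj() = default; };

// The AOD carries the physics quantities and acts as a proxy for ESD and REC,
// so the user has one coherent view independently of the physical clustering.
class TrackAOD : public RecObj {
public:
  TrackAOD(double pt, std::shared_ptr<TrackESD> esd) : pt_(pt), esd_(std::move(esd)) {}
  double pt() const { return pt_; }                        // AOD-level service
  double chi2() const { return esd_->chi2; }               // exported ESD service
  std::size_t nHits() const { return esd_->nHits(); }      // exported REC service
private:
  double pt_;
  std::shared_ptr<TrackESD> esd_;      // private reference to the ESD component
};

int main() {
  auto rec = std::make_shared<TrackREC>(TrackREC{{{1}, {2}, {3}}});
  auto esd = std::make_shared<TrackESD>(TrackESD{0.9, rec});
  TrackAOD track(25.0, esd);           // a second AOD version could share the same ESD
  std::cout << "pt=" << track.pt() << " chi2=" << track.chi2()
            << " hits=" << track.nHits() << "\n";
}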

6. Future activities

The present prototype activities and the use of CARF in test-beams and in simulation reconstruction have unveiled several new requirements and some design imperfections. Although the CARF architecture has been successfully validated by these tests, we plan a major technical review at the beginning of 2001. Some of the areas which will require more attention in the future development of CARF are as follows:
• Harmonization of CARF usage among the various applications: this is part of the continuous process of incremental development which brings into the framework components common to more than one application.
• Adaptation of the framework to make it more robust and resilient to user mistakes.
• Improvement of error and exception handling.
• Improvement of the user interface and of the configuration mechanisms in general.
• Extension of the persistency service to include environmental data.

7. Summary

Thanks to several years of continuous and coherent R&D effort, CMS is now able to provide its physics software developer community with an object oriented reconstruction and analysis framework which satisfies the requirements set in the CTP. Besides the management and scheduling of physics modules, CARF provides a persistency service which is today exploited in different environments and application domains, such as data taking and analysis in the test-beams and High Level Trigger simulation studies.


References

[1] CMS Collaboration, Computing technical proposal, CERN/LHCC 96-45, Geneva, 1996.
[2] Objectivity Technical Overview, Version 5, http://www.objectivity.com/Products/Tech0v.html.
[3] High Performance Storage System, http://www.sdsc.edu/hpss/hpss1.html.
[4] Libraries for HEP Computing, LHC++, http://wwwinfo.cern.ch/asd/lhc++.
[5] L. Silvestris, A prototype of the CMS Object Oriented reconstruction and analysis framework for the beam test data, CMSCR/1998-022, presented at CHEP98, August 31–September 4, 1998, Chicago, IL.
[6] D. Stickland, The design, implementation and deployment of a functional prototype OO reconstruction software for CMS, The ORCA project, CHEP, 2000, Abstract 108.
[7] M. Shaw, Some patterns for software architectures, in: Proc. 2nd Workshop on Pattern Languages for Programming, Addison-Wesley, Reading, MA, 1996.
[8] E. Gamma, Design Patterns, Addison-Wesley, Reading, MA, 1994.
[9] RD45, A persistent object manager for HEP, http://wwwinfo.cern.ch/asd/rd45.
[10] RD45, Reports, http://wwwinfo.cern.ch/asd/rd45/reports.htm.
[11] R.G.G. Cattell et al., The Object Database Standard: ODMG 2.0, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1997.
[12] J.-P. Wellisch, Status of software process improvement in CMS, CMS-IN/1999-033.
[13] MONARC, Models of networked analysis at regional centres for LHC experiments, http://monarc.web.cern.ch/MONARC/.