Nuclear Instruments and Methods in Physics Research A 824 (2016) 329–330
The computing and data infrastructure to interconnect EEE stations

F. Noferini a,b,*

a Museo Storico della Fisica e Centro Studi e Ricerche “Enrico Fermi”, Rome, Italy
b INFN CNAF, Bologna, Italy

On behalf of the EEE Collaboration

Abstract
Available online 10 November 2015
The Extreme Energy Event (EEE) experiment is devoted to the search for high energy cosmic rays through a network of telescopes installed in about 50 high schools distributed throughout the Italian territory. This project requires a peculiar data management infrastructure to collect data registered in stations very far from each other and to allow a coordinated analysis. Such an infrastructure is realized at INFN-CNAF, which operates a Cloud facility based on the OpenStack open-source Cloud framework and provides Infrastructure as a Service (IaaS) for its users. In 2014 EEE started to use it for collecting, monitoring and reconstructing the data acquired in all the EEE stations. For the synchronization between the stations and the INFN-CNAF infrastructure we used BitTorrent Sync, a free peer-to-peer software designed to optimize data synchronization between distributed nodes. All data folders are synchronized with the central repository in real time to allow an immediate reconstruction of the data and their publication on a monitoring web page. We present the architecture and the functionalities of this data management system, which provides a flexible environment for the specific needs of the EEE project. © 2015 Elsevier B.V. All rights reserved.
Keywords: Cosmic rays; Cloud; Data transfer
1. Introduction
One of the main goals of the EEE Project is to involve young students in a high-level scientific enterprise. The setup of the experiment is therefore very peculiar and requires new solutions for data management. For this purpose the EEE Project joined the CNAF Cloud facility to create its own data collection center. The CNAF Cloud provides a flexible environment, based on OpenStack and virtualization, which allows us to allocate on demand resources adapted to the needs of the experiment and to collect data from telescopes distributed over a wide territory. The EEE telescopes [1] consist of three planes of Multigap Resistive Plate Chambers with a time resolution < 100 ps. Each chamber (plane) is segmented into 24 strips, allowing the reconstruction of the direction of the secondaries produced in cosmic ray showers. Each telescope is equipped with a GPS receiver to associate a UTC time to each event and thus to perform searches for coincidences at very large distances. All the telescopes were built by high-school students at CERN (starting from 2004), and they are maintained by the students under the supervision of EEE researchers.
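The common UTC timestamps are what make offline correlation between far-apart stations possible. As a minimal illustration (not the EEE Collaboration's actual analysis code; the time window is an arbitrary choice for the example), a coincidence search between two sorted lists of event times could look like:

```python
# Illustrative sketch: flag candidate coincidences between two stations as
# pairs of GPS/UTC event times closer than a chosen window. Inputs are
# sorted lists of times in seconds; the window value is an assumption.
from bisect import bisect_left

def coincidences(t_a, t_b, window_s=1e-6):
    """Return (ta, tb) pairs with |ta - tb| < window_s."""
    pairs = []
    for ta in t_a:
        # first event in t_b that could still be within the window
        i = bisect_left(t_b, ta - window_s)
        while i < len(t_b) and t_b[i] < ta + window_s:
            pairs.append((ta, t_b[i]))
            i += 1
    return pairs
```

Thanks to the binary search over the sorted second list, each event only triggers a scan of its local neighborhood rather than the full dataset.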
2. Data transfers

After a pilot run in 2014, the EEE Project performed a first global run, Run-1, involving 35 schools in a coordinated data acquisition.1 During Run-1 all the schools were connected and authenticated at CNAF in order to transfer data using BitTorrent technology. To realize this goal, a btsync client (on Windows) is installed in each school, and a frontend at CNAF is dedicated to receiving all the data, with a total required bandwidth of 300 kB/s to collect the expected 5–10 TB per year. All the collected data are considered custodial and for this reason are also stored on tape. In Fig. 1 the general architecture of the EEE data flow is reported. In the period covering the pilot run and Run-1 we collected about 7 billion cosmic ray events, corresponding to 2 TB of data transferred to CNAF. In the same period 3 TB of data from past years were also transferred. In Fig. 2 a summary of the data flow performance during Run-1 is reported.
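As a quick consistency check of the figures quoted above (a back-of-the-envelope sketch using the numbers from the text, not production code), sustaining the upper estimate of 10 TB per year indeed corresponds to an aggregate average rate of roughly 300 kB/s:

```python
# Average inbound rate needed at CNAF to absorb the yearly data volume.
# 1 TB is taken as 10^12 bytes; the 5-10 TB/year figure is from the text.
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 seconds

def required_rate_kBps(tb_per_year: float) -> float:
    """Average transfer rate in kB/s to move tb_per_year TB in one year."""
    return tb_per_year * 1e12 / SECONDS_PER_YEAR / 1e3

print(f"{required_rate_kBps(10):.0f} kB/s")  # upper estimate, ~317 kB/s
```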
* Correspondence address: Museo Storico della Fisica e Centro Studi e Ricerche “Enrico Fermi”, Rome, Italy. E-mail address:
[email protected]
http://dx.doi.org/10.1016/j.nima.2015.10.069
0168-9002/© 2015 Elsevier B.V. All rights reserved.
1 Pilot run from 27-10-2014 to 14-11-2014 and Run-1 from 02-03-2015 to 30-04-2015.
3. Data reconstruction/monitor/analysis

The chain to reconstruct data at CNAF is fully automated. This point is crucial because all the schools also have to be monitored remotely, in order to act promptly in case of problems. It is addressed through automatic agents, running on a dedicated CNAF node, which identify the arrival of a new file and then trigger its reconstruction. A MySQL database is deployed to trace all the actions performed on each single file (run) and the main parameters resulting from the reconstruction. Once the run is reconstructed, a DST (Data Summary Tape) output is created and some quality plots are made available and published on the web page devoted to monitoring [2] (Fig. 3). In parallel, a cluster of analysis nodes is reserved to EEE users via virtual nodes built on a dedicated image of the operating system selected for the experiment (SL6). EEE users authenticated at CNAF can access the data (both RAW and DST files) via a GPFS filesystem, as well as the software of the experiment. The analysis activity [3] on CNAF resources is currently focused on several items [4], such as coincidence searches (two, three or many stations), rate vs. time (rate monitor + pressure correction), East–West asymmetry, cosmic ray anisotropy, upward-going particles and the observation of the Moon shadow.
Fig. 1. Architecture of the EEE tenant at CNAF.
4. Conclusion

Since 2014 the EEE experiment has entered a phase of coordinated activity among its telescopes. This step was realized with the creation of a data collection center at CNAF, which at the same time provides the resources needed for user analysis. The centralization of the EEE activities gave a big boost both to the scientific program and to the participation of high-school students. This “joint venture” between EEE and CNAF is still young and will grow in the next months with the development of other services currently under study. In the future, the CNAF staff plans to provide an Infrastructure-as-a-Service to EEE users to make access to the resources even more flexible, according to the cloud paradigm (users will be able to instantiate VMs on demand for their analyses), and to allow job submission to a dedicated LSF queue of the CNAF Tier-1 data center. Several solutions to release the most relevant data using consolidated Open Data frameworks are under investigation (CKAN, OpenDataKit, etc.). Easy-to-use access mechanisms to CNAF data will be deployed (WebDAV/Swift/torrent), as well as a structure of data replicated and stored among the schools, exploiting the benefits of torrents.
Fig. 2. Statistics for the EEE Run-1 in 2015. For each day the number of schools transferring data and the amount of data collected at CNAF (in GB) are reported.
References

[1] M. Abbrescia, et al., Journal of Instrumentation 7 (2012) 11011.
[2] INFN-CNAF, EEE Monitor, 〈https://www.cnaf.infn.it/eee/monitor/〉.
[3] M. Abbrescia, et al., EPJ Plus 128 (2013) 148.
[4] M.P. Panetta, these proceedings, http://dx.doi.org/10.1016/j.nima.2015.10.073.
Fig. 3. A screenshot of the EEE monitor page [2]. Data Quality Monitor (DQM) plots are provided in real time as well as the status of the connection of each school.