ITERDB—The Data Archiving System for ITER

Gheni Abla a,∗, Gerd Heber b, David P. Schissel a, Dana Robinson b, Lana Abadie c, Anders Wallander c, Sean M. Flanagan a
a General Atomics, P.O. Box 85608, San Diego, CA 92186-5608, USA
b The HDF Group, 1800 South Oak Street, Suite 203, Champaign, IL 61820, USA
c ITER Organization, Route de Vinon-sur-Verdon, 13115 St. Paul-lez-Durance, France

∗ Corresponding author. Tel.: +1 858 455 3103; fax: +1 858 455 3586. E-mail address: [email protected] (G. Abla).
Highlights

• We identified the software requirements and priorities for ITER data archiving.
• We designed ITERDB, a system architecture for ITER data archiving.
• We investigated the feasibility of using HDF5 as the ITERDB internal data store file format.
Article info

Article history: Received 17 May 2013; received in revised form 21 January 2014; accepted 6 February 2014; available online xxx.

Keywords: ITER; ITERDB; Data archiving; Scientific database
Abstract

For ITER, acquiring, managing and archiving its data is an essential task. ITER is foreseen to produce up to one terabyte of data per pulse and several petabytes of data per year. All the produced data needs to be stored and managed. The stored data is expected to serve the data access needs of ITER researchers located both on the ITER premises and worldwide, during ITER's lifetime and beyond. ITERDB is a data management system being designed for centralized ITER data archival and data access. It is designed to manage and serve both unprocessed and processed data from the ITER plant systems and data analysis workflows. In this paper, we report the ITER data archiving system software requirements and priorities that have been identified by working with ITER staff and a large number of stakeholders. We describe the design challenges and the proposed solutions, and we present the current state of the ITERDB software architecture design.

© 2014 Elsevier B.V. All rights reserved.
1. Introduction

Critical to the success of ITER reaching its scientific goal (Q ≥ 10) is a data system that supports the broad range of diagnostics, data analysis, and computational simulations required for this scientific mission [1]. Such a data system will be the centralized data access point and data archival mechanism for all of ITER's scientific data. The data system will provide a unified interface for accessing all types of ITER scientific data regardless of the consumer (e.g., scientist, engineer, plant operator), including interfaces for data management, archiving system administration, and health monitoring capabilities.
ITERDB is the name of the data system being designed and prototyped. Due to the stringent requirements of ITER operations, ITERDB faces multiple design challenges. ITER is foreseen to produce on the order of a terabyte of data per pulse and several petabytes of data per year. All the produced scientific data needs to be stored and managed. The stored data is expected to serve the data access needs of ITER researchers located both on the ITER premises and worldwide, during ITER's lifetime and beyond. By its very nature, ITERDB needs to be tightly integrated with multiple other entities at ITER, some of which may be remote. In Fig. 1, ITERDB's relation to other ITER entities (numbered 1–7) is shown. The aggregate rate of data to be recorded at ITER is anticipated to be quite large, up to 50 GB/s at peak times. Recording this steady stream of data from many sources poses a substantial load-balancing problem for ITERDB. In addition, a data replication process is required to enforce the isolation of the Plant Operation Zone (POZ) network, which forbids access from outside this network. From a networking perspective, the ITER plant is divided into the POZ, the secure isolated operating environment, and the XPOZ, which includes all ITER networking outside the POZ. The Operation Request Gatekeeper (ORG) and Operation Data Gatekeeper (ODG) are responsible for information exchanges between the POZ and XPOZ [2]. In Fig. 2, the position of ITERDB and related systems with respect to the POZ and XPOZ is illustrated.

The data storing process happens in two stages: (1) storing in short-term data storage and (2) storing in permanent data storage. The recorded data must be progressively replicated from short-term data storage to permanent data storage. This requires a robust and efficient data replication mechanism that can ensure data integrity while sustaining fast data transfer. Meanwhile, the data needs to be served to data consumers such as scientists and analysis codes. The permanent data holdings must be stored in an efficient, archive-grade format. ITERDB is also responsible for managing and serving detailed provenance information and a metadata catalog, which are needed for searching and discovering datasets. All collected data, along with the metadata catalog, needs to be delivered to end users via a unified data access interface. The number of data sources and the size of datasets are expected to grow over the years, so the data system must be scalable to meet this growth.

The model of operation of ITER has two distinct phases. The first is the pulse period, the creation and maintenance of the plasma for up to 3600 s. During portions of this period, data will at times be written to ITERDB at a high rate, and the bulk of ITER data is anticipated to be produced during this period. The second phase, the between-pulse period, is the time between two plasma pulses and is expected to last about 600 s. Some data is also produced during this phase.
The stored data is served to data consumers during the pulse and after the pulse. The data consumers include both scientists and data analysis processes. It is also expected that ITERDB will hold some of the data derived from the data analysis activities of researchers and from computational workflows. Finally, ITERDB needs to provide monitoring and management capabilities and is expected to be integrated with other ITER CODAC tools and the ITER network infrastructure. This particular study was also asked to examine the feasibility of using HDF5 [3] as the underlying scientific file format for ITERDB. Multiple data management systems are in use at fusion experiments worldwide [4,5]. However, the existence of the POZ, the high aggregate data rate, the very long pulses at ITER, and HDF5 as the internal data format create unique design challenges for ITERDB.
Fig. 1. ITERDB and its relation to other ITER entities.
2. Overall architecture

Because the ITER operation area is separated into two distinct zones, the POZ and the XPOZ, the functional components of the ITERDB architecture are likewise distributed between them. In Fig. 3, the ITERDB components located inside the POZ are shown.
2.1. POZ configuration store

The configuration store holds ITERDB POZ internal configuration information, including information mirrored from the CODAC Self-Describing Data Toolkit Database (SDD), a database for storing ITER diagnostic device information, and information about stream recording requests and their delegation to recorders and writers.

2.2. POZ data source

Data sources are remote hosts that generate streaming data to be stored. The data sources are connected to the POZ via the Plant Operation Network (PON) and the Data Archiving Network (DAN).

2.3. POZ dispatcher

The primary responsibility of the dispatcher is to accept and delegate stream recording requests to recorder instances. The dispatcher stores requests and delegation information in the configuration store. Recording requests arrive via the ITERDB EPICS interface.
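To make the interaction between the dispatcher and the configuration store more concrete, the following Python sketch shows one possible shape of a stream recording request and its delegation record as they might be persisted in the configuration store. The record types and field names are illustrative assumptions, not part of the ITERDB design.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class RecordingRequest:
        """Hypothetical stream recording request as received via the EPICS interface."""
        request_id: str                 # identifier assigned by the dispatcher
        stream_name: str                # e.g. a diagnostic variable name
        data_source: str                # host on the PON/DAN that emits the stream
        pulse_id: Optional[int] = None  # None for continuous (non pulse-based) streams
        received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @dataclass
    class Delegation:
        """Records which recorder and writer pair is handling a given request."""
        request_id: str
        recorder: str                   # recorder instance chosen by the dispatcher
        writer: str                     # transmitter/HDF5 writer process started by the recorder
        status: str = "recording"       # e.g. "recording", "stopped", "failed"

    # The dispatcher would persist both records in the configuration store so that
    # recorders and the monitor can recover the recording state after a restart.
    req = RecordingRequest("req-0001", "magnetics_bpol", "plant-host-17", pulse_id=123)
    dlg = Delegation(req.request_id, recorder="recorder-2", writer="writer-5")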
Fig. 2. ITERDB and related systems and their relation to the POZ and XPOZ (MCR: Main Control Room; ODG: Outgoing Data Gateway; ORG: Operation Request Gatekeeper; PCS: Plasma Control System; POZ: Plant Operation Zone; XPOZ: outside the POZ).
Fig. 3. Components of ITERDB inside the POZ.
Fig. 4. ITERDB XPOZ components.
2.4. POZ HDF5 writer

The HDF5 writers are responsible for writing data received from one or more data sources to one or more HDF5 files stored in the ITERDB POZ retention pool.

2.5. POZ monitor

The purpose of the monitor is to acquire vital signs and performance data from all other POZ components. It has logic to detect deviation from normal behavior and is responsible for notifying the dispatcher in the event of abnormal system behavior.

2.6. POZ recorder

The recorders accept recording requests delegated by the dispatcher and pass them to the transmitter/HDF5 writer instances that perform the actual stream recording. A recorder is responsible for starting or stopping transmitter/HDF5 writer processes as needed. It also makes load-balancing decisions for recording (a simple illustration follows Section 2.7).

2.7. POZ retention pool

The retention pool stores all raw and processed scientific data produced during a pulse in the POZ. The data is stored in HDF5 files.
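To illustrate the load-balancing decision mentioned in Section 2.6, here is a minimal Python sketch that assigns a new recording request to the transmitter/HDF5 writer instance with the lowest current aggregate data rate. The writer bookkeeping, the rate-based criterion, and the names used are assumptions for illustration only; the actual recorder logic is not specified in this paper.

    # Minimal sketch: choose the least-loaded transmitter/HDF5 writer for a new stream.
    # "Load" is taken here to be the sum of the nominal data rates (bytes/s) of the
    # streams already assigned to a writer; a real recorder could use other metrics.

    def pick_writer(writers: dict[str, list[float]], stream_rate: float) -> str:
        """Return the writer with the lowest current load and assign the stream to it."""
        least_loaded = min(writers, key=lambda w: sum(writers[w]))
        writers[least_loaded].append(stream_rate)
        return least_loaded

    # Example: three writer processes, a new 2 GB/s camera stream arrives.
    writers = {"writer-1": [1.5e9], "writer-2": [0.2e9, 0.3e9], "writer-3": [2.5e9]}
    print(pick_writer(writers, 2.0e9))   # -> "writer-2"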
2.8. POZ transmitter

The data from diagnostic devices is stored temporarily in the POZ retention pool. Meanwhile, the data needs to be accessible to scientists and analysis codes. This requires a fast data multiplexing/transmission component to satisfy both needs.

ITERDB XPOZ includes all components except those that belong to the POZ. The users, located both in the main control room and remotely, access ITERDB via the XPOZ. In Fig. 4, the main components belonging to the XPOZ are shown.

2.9. Archive

The archive's main responsibility is to store multiple copies of all digital objects subject to ITER archival policies and to be able to recall them on request. These digital objects include raw and processed scientific data as well as engineering data. The archive is expected to grow in capacity over the years as the accumulated data at ITER grows.

2.10. Catalog manager

The responsibility of the catalog manager is to maintain an index of all archival holdings, the current state and distribution of digital objects in the POZ storage hierarchy, and the access rights for such objects and collections of objects. The catalog manager also implements ITERDB's main search capability.

2.11. Disk pool

The disk pool's responsibility is to act as a transient data buffer. The data can be logically divided into three regions. Active is raw and processed data that is available for read access from XPOZ clients. Incoming is raw and processed data arriving from the POZ, processed data written by XPOZ clients, and data being recalled from the archive. Outgoing is processed data written by XPOZ clients that has not yet been written to the archive. These are intended to be logical divisions, not physical ones. There would be no data transfer between regions when a file's status changes; instead, flags in the disk pool data structures would be set or unset (a sketch of this bookkeeping follows Section 2.14).

2.12. XPOZ dispatcher

The dispatcher delegates read and write requests to the appropriate HDF5 readers and writers.

2.13. XPOZ HDF5 reader

HDF5 readers are created by the dispatcher in response to read requests. An HDF5 reader reads data from HDF5 files located in the active zone of the disk pool. Data not currently in the active zone is recalled from the archive on demand.

2.14. XPOZ HDF5 writer

HDF5 writers are created by the dispatcher in response to write requests for processed data. An HDF5 writer is responsible for writing data to versioned HDF5 files located in the active zone of the disk pool. Processed data is never overwritten; a new version is created instead.
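The flag-based bookkeeping of the disk pool regions (Section 2.11) can be illustrated with the following Python sketch, in which a file's logical region is changed by toggling flags rather than by moving data. The DiskPoolEntry class, the flag names, and the example path are illustrative assumptions, not part of the ITERDB design.

    from enum import Flag, auto

    class Region(Flag):
        """Hypothetical logical regions of the disk pool (Section 2.11)."""
        INCOMING = auto()
        ACTIVE = auto()
        OUTGOING = auto()

    class DiskPoolEntry:
        """Bookkeeping record for one HDF5 file in the disk pool; no data is copied."""
        def __init__(self, path: str):
            self.path = path
            self.regions = Region.INCOMING        # newly arrived from the POZ

        def mark_active(self):
            # File becomes readable by XPOZ clients; only the flags change.
            self.regions |= Region.ACTIVE
            self.regions &= ~Region.INCOMING

        def mark_outgoing(self):
            # Processed data written by an XPOZ client, not yet archived.
            self.regions |= Region.OUTGOING

        def mark_archived(self):
            # The storage manager has migrated the file into the archive.
            self.regions &= ~Region.OUTGOING

    entry = DiskPoolEntry("/diskpool/pulse_000123/magnetics_v1.h5")
    entry.mark_active()
    print(entry.regions)    # Region.ACTIVE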
2.15. ITERDB Admin API

The ITERDB Admin API is responsible for configuring ITERDB as it relates to ITER data sources (e.g., adding new hardware nodes or detaching them). It provides the capability to configure and modify the data-transfer-related aspects of ITERDB. It also facilitates user management (e.g., whitelisting) as well as monitoring and logging configuration. Finally, it provides a channel for monitoring the ITERDB health status, data transfer status, and write/read access status.

2.16. ITERDB User API

The ITERDB User API allows the reading of raw scientific and engineering data and the reading/writing of processed scientific data. Although not all parameters are required, the API accesses data via pulse ID (for pulse-based data), data source name, time window (absolute or pulse-relative), and version (for processed data; default: latest). Because of potentially high latencies (e.g., the requested data can be large), the API is non-blocking and clients may have multiple outstanding requests.

2.17. XPOZ monitor

The monitor acquires vital signs and performance data from all other XPOZ components. It is also responsible for tracking data access records and logs.

2.18. Schema manager

The XPOZ schema is a collection of known object types, searchable attributes, etc. All objects managed by the catalog manager are instances of object types defined in the POZ. It is the schema manager's responsibility to maintain that schema, which includes the creation of new object types and the versioning of existing types (a toy illustration is sketched after Section 2.19).

2.19. XPOZ storage manager

The storage manager oversees the migration of incoming data into the archive and the recall of previously archived data. This component is most likely part of the storage solution and not an XPOZ component.
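As a toy illustration of the schema versioning responsibility of the schema manager (Section 2.18), the following Python sketch registers each object type under an explicit version number and never mutates an existing definition; a new version is created instead. The type names and attribute lists are hypothetical.

    # Minimal sketch of a versioned schema registry (illustrative only).
    # Each object type maps version -> list of searchable attributes.
    schema: dict[str, dict[int, list[str]]] = {}

    def define_type(name: str, attributes: list[str]) -> int:
        """Register a new object type or a new version of an existing one."""
        versions = schema.setdefault(name, {})
        new_version = max(versions, default=0) + 1
        versions[new_version] = list(attributes)
        return new_version

    v1 = define_type("camera_frame", ["pulse_id", "camera_id", "timestamp"])
    v2 = define_type("camera_frame", ["pulse_id", "camera_id", "timestamp", "exposure"])
    print(v1, v2)                      # 1 2
    print(schema["camera_frame"][1])   # the original definition is still available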
3. Data model
The data from ITER streams, both pulse-based and continuous, is stored across multiple HDF5 files. Internal HDF5 indexing schemes and a metadata store facilitate querying the HDF5 files and their data contents.

3.1. ITERDB internal files

The scientific data produced by the ITER machine will be recorded in a divide-and-conquer fashion by a number of writer instances, which are deployed across an array of recording nodes. The data is stored internally in HDF5; that is, unless clients request the data in this format, they will never be exposed to these internals. The HDF5 file writers and the specific file structure can be customized based on data type and format (e.g., unprocessed raw data versus processed data). Here we present an HDF5 profile for arbitrary byte streams generated by diagnostic devices. Due to space limitations, we ignore the specifics of the streams being recorded and communication artifacts. However, this data model can be optimized for specific diagnostic device, data, and recorder types.

We assume that an ITER stream from a diagnostic device is represented as a series of byte sequences called samples. A sample could be a floating-point temperature value, a 3D field map, or a 2D image. It is one data element or object. Samples typically do not arrive one at a time, but in blocks of samples. The number of samples per block, the sample count, can vary from block to block based on sampling frequency or frame rate. Typical sample counts are in the hundreds to thousands of samples per block. Blocks may include samples from multiple data streams. The sample size (in bytes) in a stream may vary in time, but we assume that it is fixed for all samples in a block. For example, for a block of images, we assume that all images have the same resolution (and color space). If the resolution of a camera is increased, the device would start emitting blocks of images with a larger sample size, but it would not mix images of different resolutions in the same block. Sample sizes range from a few bytes to several megabytes.

Each sample is time-stamped, relative to a pulse or in absolute time (UTC). The two most common representations are: (1) a time-stamped block plus an interpolation procedure for the samples in the block, and (2) an explicit time stamp for each sample in the block. An ITER stream may switch time stamp representations between blocks. The time resolution is typically on the order of nanoseconds. At the storage layer, ITER streams are viewed as a series of (header, payload) pairs and stored in HDF5 files. The header contains the begin and end time stamps, the sample rate, count, and size, information about the time reference for the samples, and a reference to the payload.

While the HDF5 library supports the concurrent reading of a file by multiple processes, it does not support concurrent writing and reading; the Single-Writer/Multiple-Reader (SWMR) functionality will be fully supported in the HDF5 1.10 release. Therefore, the internal HDF5 files are organized in a way that separates the writing and reading processes. One such solution is to create separate HDF5 files for different time segments. The file names include time segment information so that no heavy search process is involved in finding the files. This method separates writing from reading and allows reading a signal with a certain delay. A separate process can later combine smaller time segments into larger files if necessary.

3.2. Metadata

ITERDB automatically creates and maintains metadata that is used by the various APIs to rapidly fulfill user requests. This metadata is not directly available to the end user and thus cannot be directly queried; it is only used via the ITERDB API. We will assume that the HDF5 data files are stored using a deterministic naming scheme, for example a directory path composed of the pulse ID (or a time segment for continuous data), the data source name, the segment ID, and the version, with a .h5 extension. Data sources could be aggregated into HDF5 files in a variety of ways; one suggested approach is to store data that are commonly accessed together in the same file. MongoDB, a NoSQL data store, will be used to hold the metadata describing the data residing in ITERDB [6]. This metadata includes a mapping from pulse IDs, data source or variable names, and time intervals to the HDF5 files that store the data, as well as a catalog of the data sources and diagnostics. Note that the HDF5 writer generates this metadata when the HDF5 file is first loaded.
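To make the (header, payload) block layout, the time-segmented file naming, and the MongoDB mapping described above more concrete, the following is a minimal Python sketch using h5py and pymongo. The group layout, attribute names, file-name convention, and metadata document fields are assumptions chosen for illustration; they do not represent the actual ITERDB file profile or metadata schema.

    import numpy as np
    import h5py
    from pymongo import MongoClient

    def store_block(stream: str, pulse_id: int, segment: int,
                    t_begin: float, t_end: float, samples: np.ndarray) -> str:
        """Append one (header, payload) block to a time-segmented HDF5 file and
        register its location in a MongoDB metadata collection (illustrative only)."""
        # Hypothetical naming scheme: pulse ID, data source, and time segment in the file name.
        path = f"pulse_{pulse_id:06d}_{stream}_seg{segment:04d}.h5"

        with h5py.File(path, "a") as f:
            block = f.create_group(f"block_{len(f.keys()):06d}")
            # Payload: the raw samples of this block (fixed sample size within a block).
            block.create_dataset("payload", data=samples)
            # Header: time stamps and sample bookkeeping stored as attributes.
            block.attrs["t_begin"] = t_begin
            block.attrs["t_end"] = t_end
            block.attrs["sample_count"] = samples.shape[0]
            block.attrs["sample_size"] = samples.dtype.itemsize

        # Register the block in the metadata store so a reader can map
        # (pulse, stream, time window) -> file without scanning the disk pool.
        coll = MongoClient("localhost", 27017)["iterdb_demo"]["blocks"]
        coll.insert_one({"pulse_id": pulse_id, "stream": stream,
                         "t_begin": t_begin, "t_end": t_end, "file": path})
        return path

    # Example: a block of 1000 float32 samples from a hypothetical diagnostic stream.
    store_block("magnetics_bpol", 123, 0, 0.000, 0.001,
                np.zeros(1000, dtype=np.float32))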
In Fig. 5, the ITERDB components and the action sequence of the data reading process are illustrated.
Fig. 5. Graphical illustration of the action sequence required for data reading from ITERDB.
1. A client connects to the ITERDB data service and sends a request to read a specific piece of data, specified as pulse, data name, and/or time range. The Dispatcher creates a new Data Service process dedicated to answering that particular request.
2. The new Data Service process connects to the Metadata Manager and requests the information that allows mapping the requested data to the exact location within an HDF5 file. The Metadata Manager answers with the mapping information.
3. The Data Service process sends a data read request, along with the required metadata, to the HDF5 Reader service.
4. The HDF5 Reader service assigns an HDF5 data reader process, which constructs the requested data.
5. The HDF5 reader sends the data back to the Data Service.
6. The Data Client receives the data from the Data Service in the format specified by the ITERDB API.

4. ITERDB User API and system administration

4.1. User API

ITERDB is the central data system for storing and serving all scientific data relevant to ITER operations. It provides centralized data storage and access for all ITER collaborators, both on-site and off-site. The ITERDB data service is designed on a client–server architecture, for both reading and writing data, over TCP/IP connections. Because of this architectural design, from a data access standpoint there is no difference between local and remote clients once they are authenticated. It must be noted that ITERDB also provides data streaming capability. The ITERDB server will have the administrative capability to configure (possibly dynamically) the server resources that can be utilized for each data request. In this way, ITERDB can handle even a very large data request while still continuing to handle other, more typical, smaller requests.

To simplify the usage of ITERDB, the architectural design hides the location and structure of the HDF5 files that contain the actual data from the client. The metadata manager is responsible for storing the metadata that allows mapping from a client request (pulse, data name, time range) to the name of an HDF5 file as well as the specific location inside that file. The metadata provides the right parameters to the HDF5 reader so that the HDF5 API call parameters can be properly constructed. Currently, the following essential API routines have been proposed.

IdbConnect: Establishes a TCP/IP socket connection to a remote ITERDB server.
IdbDisconnect: Disconnects from the currently connected TCP/IP socket.
IdbRead: Reads a data set from the ITERDB storage system and returns a data object which contains all data relevant to the data
set name for a given pulse or time range, including data, time base, position, data type, rank, etc.
IdbWrite: Writes an ITERDB data object to ITERDB for a given data set name and time range.
IdbSearch: Searches the ITERDB system and returns a Boolean object based on the existence of the data.
IdbInfo: Returns metadata information about the data stored in ITERDB.
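Since client bindings have not been published, the following Python sketch only illustrates how the proposed routines might be used from a client, including the non-blocking, multiple-outstanding-requests behavior noted in Section 2.16. The class, function signatures, and future-based semantics are assumptions; a small placeholder class stands in for a real ITERDB connection.

    from concurrent.futures import ThreadPoolExecutor, Future
    from typing import Optional

    class IterDBClient:
        """Placeholder standing in for a real ITERDB connection; illustrative only."""

        def __init__(self, host: str, port: int):
            # IdbConnect would establish a TCP/IP socket connection here.
            self._addr = (host, port)
            self._pool = ThreadPoolExecutor(max_workers=4)   # enables non-blocking requests

        def idb_read(self, name: str, pulse: int,
                     t0: Optional[float] = None, t1: Optional[float] = None) -> Future:
            # IdbRead: returns immediately; the eventual result object would carry the
            # data, time base, position, data type, rank, etc. Here we just echo the request.
            return self._pool.submit(lambda: {"name": name, "pulse": pulse, "window": (t0, t1)})

        def idb_disconnect(self) -> None:
            # IdbDisconnect would close the socket.
            self._pool.shutdown()

    # A client may keep several requests outstanding and collect the results later.
    client = IterDBClient("iterdb.example.org", 56565)
    pending = [client.idb_read("magnetics_bpol", pulse=123),
               client.idb_read("camera_frame", pulse=123, t0=0.0, t1=1.0)]
    print([f.result() for f in pending])
    client.idb_disconnect()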
4.2. ITERDB administration

ITERDB offers facilities for system monitoring and administration. These capabilities address the following administrative needs: (1) system health and status information; (2) resource usage information; (3) hardware and software component configuration and management; (4) data schema creation and schema versioning management; (5) configuration and adjustment of the data transfer from ITERDB POZ to ITERDB XPOZ; (6) metadata management; (7) management of user access control levels; and (8) system logging, auditing, and reporting. Some of the above functionalities are exposed via an API, while others are implemented as standalone applications; this separation is driven by the complexity of each individual function.
5. Future work

The ITERDB design is a work in progress, and the architecture will continue to evolve. In the near future, we will identify the main functional components that will allow us to demonstrate that the proposed architecture results in a scalable system. The validity of the data model, the file layout, and the corresponding metadata schema will be tested against a wide range of possible diagnostic devices and data types. A natural next step is the development, deployment, and thorough testing of a prototype system.
Acknowledgments

This work was supported by the ITER International Fusion Energy Organization under ITER/CT/12/4300000571. The views and opinions expressed herein do not necessarily reflect those of the ITER Organization.

References

[1] ITER Project Mission, http://goo.gl/Qz9DD (accessed 19.12.13).
[2] G. Abla, T.W. Fredian, D.P. Schissel, J.A. Stillerman, M.J. Greenwald, D.N. Stepanov, D.J. Ciarlette, Operation Request Gatekeeper: a software system for remote access control of diagnostic instruments in fusion experiments, Rev. Sci. Instrum. 81 (2010) 10E124.
[3] HDF5 File Format Specification, http://goo.gl/aDRcZ
[4] T.W. Fredian, J.A. Stillerman, MDSplus. Current developments and future directions, Fusion Eng. Des. 60 (2002) 229.
[5] R. Layne, A. Capel, N. Cook, M. Wheatley, JET EFDA contributors, Long term preservation of scientific data: lessons from JET and other domains, Fusion Eng. Des. 87 (2012) 2209.
[6] MongoDB documentation, http://goo.gl/7i9eT