Recent developments and object-oriented approach in FTU database


Fusion Engineering and Design 56–57 (2001) 981–986 www.elsevier.com/locate/fusengdes

A. Bertocchi a, G. Bracco a, G. Buceti a, C. Centioli a, F. Iannone a, G. Manduchi b, U. Nanni c, M. Panella a,*, C. Stracuzzi c, V. Vitale a

a Centro Ricerche Enea Frascati, Via E. Fermi 45, 00044 Frascati, Rome, Italy
b CNR, Istituto Gas Ionizzati, Padua, Italy
c University of Rome ‘La Sapienza’, Italy

Abstract

During the last two years, the experimental database of the Frascati Tokamak Upgrade (FTU) has changed in several respects: (i) the data and the analysis codes have been moved from the IBM mainframe to Unix platforms, enabling users to take advantage of the large quantity of commercial and free software available under Unix (Matlab, IDL, …); (ii) AFS (Andrew File System) has been chosen as the distributed file system, making the data available on all nodes and distributing the workload; (iii) a ‘one measure/one file’ philosophy (vs. the previous ‘one pulse/one file’) has been adopted, increasing the number of files in the database but, at the same time, allowing the most important data to be available just after the plasma discharge. The client–server architecture has been tested using the signal viewer client jScope. Moreover, an object-oriented data model (OODM) of FTU experimental data has been tried: a generalized model of tokamak experimental data has been developed using typical concepts such as abstraction, encapsulation, inheritance, and polymorphism. The model has been integrated with data coming from different databases, building an Object Warehouse from which data mining techniques can extract meaningful trends and patterns out of huge amounts of data.

© 2001 Elsevier Science B.V. All rights reserved.

Keywords: FTU; Database; Tokamak

1. Introduction

Approximately two years ago, the need arose to update the hardware and software components of the data acquisition system and the experimental data archive of the Frascati Tokamak Upgrade (FTU), for the following reasons: (i) the obsolescence of the previous archive, built on an IBM mainframe platform; (ii) the increase in the time needed to make the data available: a user who wanted to see the first data acquired had to wait for the pulse file to be closed, which meant 15 min from the end of the pulse, and things got worse as new diagnostic systems were added; (iii) the need to make new technologies, such as UNIX and/or Windows NT applications for personal processing, available to users; (iv) the need to optimise the maintenance of the data and the systems from both the hardware and the software point of view.

Further, the home-made tools that we have been using until now cannot cope with the increased data complexity and have proven inadequate in handling project management and software development [1]. The O-O model can describe the domain of data through: (i) static properties (objects, attributes and relationships); and (ii) dynamic properties (methods). Typical concepts such as encapsulation, inheritance, and polymorphism can capture the semantics of data to give a more expressive ‘real world’ description. Moreover, we need persistence, so the Object-Oriented methodology should be integrated with database capabilities such as integrity, security, transaction optimisation, concurrency, recovery, and back-up. This means testing the efficiency of an Object Oriented Database Management System (OODBMS) [2].

* Corresponding author. Tel.: +39-69400-5245; fax: +39-69400-5524. E-mail address: [email protected] (M. Panella).

0920-3796/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0920-3796(01)00441-0

2. Building a Unix archive

2.1. File systems for users and data

The first step was to build a suitable file system based on a single disk system mounted on several nodes connected by Ethernet. The adopted solution is the Andrew File System (AFS), which allows access to files and directories residing on geographically distributed machines over a 100 Mbit/s Ethernet network. AFS has a client–server architecture; AFS client machines are typical workstations (Digital-UNIX, Risc-AIX, Intel-Linux, HP-UX, Sun-Solaris, etc.) with a shared local disk partition dedicated to the operation of the AFS client. On these machines, a process called ‘Cache Manager’ makes the /afs file system visible to end users. All files and directories under /afs are virtually present on each AFS server machine (cells), geographically dispersed around the world. All cells are visible at the second level of the AFS tree.

The end user enjoys the typical advantages of centralized data management, such as retrieval, installation, maintenance and backup of software and data, together with distribution of the workload over several machines. Another fundamental characteristic of AFS is security: it uses a more sophisticated authorisation system, achieving more selective file access control than the standard Unix file system. All of this is backed by an authentication system, based on the Kerberos Authentication Server, to which every AFS user is subject.

2.2. New philosophy for a data archive

Formerly, the data produced by each diagnostic acquisition system were stored in a pulse file organised in two parts: the header, with the various parameters required to extract and interpret each acquisition channel; and the data area, with all data channels placed in sequential order. With the new philosophy we have structured the AFS file system so that each pulse number corresponds to a directory, which in turn contains several subdirectories, each corresponding to one family¹ and containing the channels of that family. In this way, we have replaced the one pulse/one file concept with the one channel/one file concept. One of the main advantages of this solution is that the FTU data acquisition system can be structured in several acquisition nodes. Each node, serving one or more diagnostics, can write its data directly into the AFS archive without passing through a central node. In addition, the outcome of a single diagnostic acquisition node becomes independent of the outcome of the others.
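The pulse/family/channel layout described in this section can be sketched in Java as follows. This is only an illustration: the class name ChannelPath, the archive root and the family and channel names are hypothetical, and the paper does not give the actual naming scheme used in the FTU archive.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: maps (pulse, family, channel) to one file per channel. */
final class ChannelPath {
    private ChannelPath() {}

    // The directory and file naming used here is an assumption for illustration only.
    static Path resolve(Path archiveRoot, int pulse, String family, String channel) {
        if (pulse < 0) {
            throw new IllegalArgumentException("pulse number must be non-negative");
        }
        return archiveRoot
                .resolve(Integer.toString(pulse)) // one directory per pulse
                .resolve(family)                  // one subdirectory per family
                .resolve(channel);                // one file per channel
    }
}

class ArchiveLayoutDemo {
    public static void main(String[] args) {
        // "/afs/example.cell/ftu" is a made-up root, not the real FTU cell.
        Path root = Paths.get("/afs/example.cell/ftu");
        System.out.println(ChannelPath.resolve(root, 18000, "famA", "chan01"));
    }
}
```

Because every channel has its own path, two acquisition nodes writing different families of the same pulse never touch the same file, which is the independence property the section relies on.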

2.3. Client–server concept for data analysis and archiving

The architecture illustrated in Section 2.2 allows the client–server concept to be used in an almost generalised way in the implementation of system applications and utilities. On an AFS client machine, at the end of a pulse, a server receives data from any acquiring node and loads them into the defined directories. This server has been built following the RPC standard, which is suitable for transferring various types of data between machines of different types with different operating systems. Each client sends one channel at a time, and the server satisfies the requests in first-in/first-out order; no information is lost because the RPC standard sets up a queue of requests. Dispatching one channel at a time makes data available immediately: the acquiring node can send first the channels that define the main results of each pulse.

One future improvement will be to develop, on one of our machines that sees the AFS file system, a data server that receives channel extraction requests from any other node in a platform-independent way, allowing users to display and process data on their own personal workstations. As an example, we have built a connection between the MDSPLUS data server [3] and our data reading routine, so that a user on the wide-area (geographic) network, seeing our AFS file system, will be able to send read requests to our archives and display the data.

¹ A family is a collection of correlated acquisition channels (i.e. belonging to the same diagnostic).
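The first-in/first-out handling of channel requests can be illustrated with a small Java sketch. It does not use RPC: an in-memory queue stands in for the request queue that the RPC layer provides, and all class, node and channel names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** One request carries one channel sent by an acquisition node (names are hypothetical). */
final class ChannelRequest {
    final String node;
    final String channel;
    final short[] samples;

    ChannelRequest(String node, String channel, short[] samples) {
        this.node = node;
        this.channel = channel;
        this.samples = samples;
    }
}

/** Stand-in for the archive server: requests are queued and served in arrival order. */
final class FifoArchiveServer {
    // Unbounded queue, so a submitted request is never dropped.
    private final BlockingQueue<ChannelRequest> pending = new LinkedBlockingQueue<>();
    private final List<String> stored = new ArrayList<>();

    /** Called by a client node, once per channel. */
    void submit(ChannelRequest request) {
        pending.add(request);
    }

    /** Serves every queued request, oldest first, and returns the storage order so far. */
    List<String> drain() {
        ChannelRequest next;
        while ((next = pending.poll()) != null) {
            // A real server would write the channel file here.
            stored.add(next.node + "/" + next.channel);
        }
        return new ArrayList<>(stored);
    }
}

class FifoArchiveDemo {
    public static void main(String[] args) {
        FifoArchiveServer server = new FifoArchiveServer();
        // A node sends its most significant channel first so that it is stored first.
        server.submit(new ChannelRequest("nodeA", "plasma_current", new short[] {1, 2, 3}));
        server.submit(new ChannelRequest("nodeB", "density", new short[] {4, 5}));
        server.submit(new ChannelRequest("nodeA", "aux_signal", new short[] {6}));
        System.out.println(server.drain());
    }
}
```

The sending order chosen by each node is the only priority mechanism in this sketch, which matches the idea that a node sends first the channels that define the main results of the pulse.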

3. Object-oriented approach

3.1. FTU object-oriented data model

3.1.1. Data requirements analysis

We start by describing the FTU environment using the Unified Modelling Language (UML) [4], which visualizes general models and extends the core concepts. We focus on aspects such as the acquisition, processing and storage of the data produced by a plasma discharge. In FTU, a plasma discharge is 1.5 s long and each discharge defines one shot. The core of the FTU database is made up of the shots’ data. To introduce the various data sources involved in the FTU experiment we draw a use-case diagram, shown in Fig. 1. For now we consider only the following data sources, because we want to focus mainly on the current FTU database: (i) FTU_DB DAS, the raw data produced by the acquisition systems; (ii) FTU_DB PED, the post-elaborated data produced by analysis software [5]; (iii) DIARY, a set of synthetic information on each shot; (iv) HARDWARE/SOFTWARE TABLE, the DAS configuration.

3.1.2. Database schema

The data schema has been designed to provide: (i) high performance in storing data acquisition objects; and (ii) a general schema that can be adapted to similar experimental data coming from other laboratories. In our case, the most natural semantics for the data objects is the class hierarchy Channel-Family-Shot. A synthetic schema of the relationships between objects belonging to these classes is shown in Fig. 2. The schema models a Shot as a collection of families of channels, and each Channel as a container of a data array and header description data. To distinguish the acquired raw data objects (DAS_Channel) from the post-elaborated data objects (PED_Channel) we have used inheritance to achieve specialisation in the FTU database. In this way, we can use different object accesses for data acquisition systems and analysis software, and implement specific methods and constraints. In this class diagram the header of the channel is represented through many-to-one relationships between Soft_Table and Family objects, which in turn have one-to-many relationships with Channel objects. In this way we have a link between the header description of the channel in each shot and the general data of the Software Table. Further, a relationship between DAS_Channel and Optional_Parameters and its aggregate classes optimises storage of, and access to, the header description of the channel. Another characteristic of this schema is the Stream class, which defines arrays of data of different types. We have restricted the implementation of the DAS_Stream and PED_Stream classes to the specific data types of the FTU database.

Fig. 1. Use case diagram of the FTU data source.

Fig. 2. Schema of FTU data.

Fig. 3. jScope interface to OODB.
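The Channel-Family-Shot hierarchy and the specialisation of raw and post-elaborated channels can be sketched in Java as follows. This is a minimal illustration, not the schema of Fig. 2: the class and method names are assumptions, the Soft_Table, Optional_Parameters and Stream classes are omitted, and data are reduced to a single array of doubles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Common state of every channel: a name (header) plus its data array. */
abstract class Channel {
    private final String name;
    private final double[] data;

    Channel(String name, double[] data) {
        this.name = name;
        this.data = data.clone(); // encapsulation: callers cannot mutate our copy
    }

    String name() { return name; }
    int length() { return data.length; }

    /** Polymorphic hook: each subclass reports where its data came from. */
    abstract String origin();
}

/** Raw channel written by a data acquisition system. */
final class DasChannel extends Channel {
    DasChannel(String name, double[] data) { super(name, data); }
    @Override String origin() { return "DAS"; }
}

/** Post-elaborated channel written by analysis software. */
final class PedChannel extends Channel {
    PedChannel(String name, double[] data) { super(name, data); }
    @Override String origin() { return "PED"; }
}

/** A family groups the correlated channels of one diagnostic. */
final class Family {
    private final String name;
    private final List<Channel> channels = new ArrayList<>();

    Family(String name) { this.name = name; }
    String name() { return name; }
    void add(Channel channel) { channels.add(channel); }
    List<Channel> channels() { return Collections.unmodifiableList(channels); }
}

/** A shot is a collection of families, keyed by family name. */
final class Shot {
    private final int number;
    private final Map<String, Family> families = new LinkedHashMap<>();

    Shot(int number) { this.number = number; }
    int number() { return number; }

    /** Returns the named family, creating it on first use. */
    Family family(String name) {
        return families.computeIfAbsent(name, Family::new);
    }

    int channelCount() {
        int total = 0;
        for (Family f : families.values()) {
            total += f.channels().size();
        }
        return total;
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        Shot shot = new Shot(18000); // hypothetical shot number
        shot.family("bolometer").add(new DasChannel("raw01", new double[] {0.1, 0.2}));
        shot.family("bolometer").add(new PedChannel("power", new double[] {1.5}));
        for (Channel c : shot.family("bolometer").channels()) {
            System.out.println(c.origin() + " " + c.name() + " " + c.length());
        }
    }
}
```

Code that iterates over a family never needs to know which kind of channel it holds, which is what makes separate access paths for acquisition systems and analysis software possible on top of one common model.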

3.2. OODBMS implementation

Our approach to OODBMS has been somewhat educational, mainly because this technology is not yet mature and the current costs of a commercial tool are very high. We have used a trial edition of ObjectStore Personal Storage Edition (PSE) for Java to obtain persistent-object functionality. PSE uses object serialization, which provides a simple yet extensible mechanism for storing objects. Serialization is reliable for applications that operate on small amounts of data and update objects infrequently. Unfortunately, this is not our case; nevertheless, we used it as a case study and to learn object-oriented methodology. Using the Java language reduced the software development time, and we have implemented: (i) read/write access to the Hardware/Software table; and (ii) read/write access to data acquisition and post-elaborated channels. The methods visible to users and processes are typically: (i) select/insert/update/delete of Hard/Soft_Table objects; (ii) select/insert/update of Family objects; and (iii) insert/select of Shot and Channel objects.

We have developed the FTUdb package, which contains packages for the schema and methods of the FTU database, servlet applications that build a web-based interface for browsing objects, and applications for importing data from an old archive. In particular, we have implemented a Manager class to access the Hard/Soft_Table, Shot, Family and Channel objects contained in the database. Manager also controls user access, registration and session management. Manager might run as an application server that operates behind a web server and provides access to the database. The Manager class contains most of the database-specific code, such as starting and ending transactions. Further, the FTUOODBProvider class has been implemented with little effort to interface the Java data viewer jScope [3] to the OODB, as outlined in Fig. 3, showing that it is possible to develop a visual data analysis tool in Java, an object-oriented, platform-independent programming language.
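The serialization mechanism mentioned above can be illustrated with the standard java.io classes. This sketch deliberately does not use the ObjectStore PSE API, whose calls are not described in the paper; the StoredChannel and SerializationStore classes and their fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical persistent object: one channel of one shot. */
final class StoredChannel implements Serializable {
    private static final long serialVersionUID = 1L;

    final int shot;
    final String family;
    final String name;
    final float[] samples;

    StoredChannel(int shot, String family, String name, float[] samples) {
        this.shot = shot;
        this.family = family;
        this.name = name;
        this.samples = samples.clone();
    }
}

/** Plain Java serialization used as a stand-in for a persistent object store. */
final class SerializationStore {
    private SerializationStore() {}

    /** Turns the whole object, header and data, into a byte array. */
    static byte[] write(StoredChannel channel) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(channel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the object from bytes produced by write(). */
    static StoredChannel read(byte[] raw) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (StoredChannel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of stored object not found", e);
        }
    }
}

class SerializationDemo {
    public static void main(String[] args) {
        StoredChannel original = new StoredChannel(18000, "famA", "chan01", new float[] {0.5f, 1.5f});
        StoredChannel copy = SerializationStore.read(SerializationStore.write(original));
        System.out.println(copy.shot + " " + copy.family + " " + copy.name + " " + copy.samples.length);
    }
}
```

Each write re-encodes the complete object, so the cost grows with the size of the data array; this is consistent with the remark that serialization suits small, infrequently updated data better than a large experimental archive.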

4. Conclusions

With the new archive organization we have achieved: (i) a drastic reduction in the waiting time for data availability after a pulse; and (ii) scalability in the number of channels and the amount of data, virtually without limits, because of the strong parallelism in writing and reading the archive. From the point of view of ‘physical’ access to data, since we already have everything that technology makes available today, the next step will be to explore the nature of the data. We have studied and verified how to access an OODB of raw FTU experimental data. It remains to explore and study the nature of the information used in data processing, to design the objects of an OODB for tokamak data, and finally to further increase the efficiency of the access criteria and achieve a true exchange of data with other laboratories.

References

[1] D. Düllmann, Object database as data stores for high energy physics, Proceedings of the CERN School of Computing, CERN 98-08, 1998.
[2] T. Connolly, C. Begg, A. Strachan, Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, Reading, MA, 1998.
[3] G. Manduchi, C. Taliercio, A. Luchetta, The Java interface of MDSPLUS: towards an unified approach for local and remote data access, Second IAEA Technical Committee meeting on Control, Data Acquisition and remote participation on fusion research, Lisboa, Portugal, 19–21 July 1999.
[4] M. Fowler, K. Scott, UML Distilled: Applying the Standard Object Modelling Language, Addison-Wesley, Reading, MA, 1997.
[5] G. Bracco, G. Buceti, A. Imparato, M. Panella, S. Podda, G.B. Righetti, O. Tudisco, V. Zanza, Data analysis software on the FTU experiment and its recent developments, Fusion Engineering and Design 43 (3-4) (1999) 425–432.