Data analysis software on the FTU experiment and its recent developments


Fusion Engineering and Design 43 (1999) 425 – 432

G. Bracco*, G. Buceti, A. Imparato, M. Panella, S. Podda, G.B. Righetti, O. Tudisco, V. Zanza
Associazione EURATOM-ENEA sulla Fusione, Centro Ricerche di Frascati, C.P. 65, 00044 Frascati, Rome, Italy

Abstract
A general-purpose data analysis and display program, named SHOW, has been developed over the years in response to the needs of the experimentalists working on FTU. FTU is a high-magnetic-field tokamak devoted to the study of plasma behaviour with both ohmic and additional heating. The description of the main characteristics of the SHOW program can be seen as a summary of many of the facilities that experimental physicists require in their data analysis activity. Some of these facilities rely heavily on the program architecture. That architecture stems from the awareness that only a very flexible structure allows easy adaptation to the changing requirements of data analysis.

The program has been developed in FORTRAN on the IBM mainframe (running MVS/ESA) where the main FTU databases are located. The GDDM package has been used to implement the panel-based interactive user interface and the graphic display facilities, which include 2-D and 3-D plot sections. The program provides utilities for the evaluation of derived quantities (such as integrals, derivatives and smoothing filters) and for time series analysis. A spread-sheet section allows the analysis of tables of data. A quantitative analysis of the utilisation of the SHOW program over a period of more than 3 years is also presented.

In the last year a preliminary version of the program has been ported to the UNIX environment, on a DEC Alpha workstation, and the user interface has been implemented using only freely available libraries. Client/server software has been developed so that the IBM mainframe can still be used as a file server.
© 1999 Elsevier Science S.A. All rights reserved.
Keywords: FTU experiment; Data analysis; Program

1. Introduction
Data analysis is the core activity of the experimental physicists working on a large fusion device, which produces a vast amount of data during its operation. Most of the data analysis on the FTU experiment [1] is presently performed using the program SHOW. SHOW is a general-purpose software package developed over the years by the experimentalists working on the high-magnetic-field tokamak devices located at ENEA Frascati. The first version of the program was developed on FT [2]. A version has also been installed at JET [3], in connection with the exploitation of the data produced by the neutral particle analyser [4], a diagnostic system built and operated jointly by ENEA Frascati and JET.

* Corresponding author. E-mail: [email protected]

0920-3796/99/$ - see front matter © 1999 Elsevier Science S.A. All rights reserved. PII S0920-3796(98)00414-1


G. Bracco et al. / Fusion Engineering and Design 43 (1999) 425–432

The implementation of the program is based on the hardware configuration available at ENEA Frascati. Various models of IBM mainframe have been operated there over the years; the present one is an IBM 9672-R21 running MVS/ESA. Owing to the general trend towards distributed systems of workstations, a preliminary version of the program has recently been ported to UNIX, on a DEC Alpha workstation. Client/server software has also been developed so that the IBM mainframe is used as a file server.

The present paper refers mostly to the MVS version of the SHOW program and reflects some of the limitations of a system conceived many years ago. In spite of that, the flexible structure of the program has allowed it to follow the evolution of the needs that have emerged over the years. Some of the solutions adopted in the program structure are well suited to a medium-size laboratory, with about 30 regular users of the program. They may not be practical, at least in their present implementation, in a much larger environment with hundreds of users. Nevertheless, the description of the facilities provided by the program can be regarded as a summary of what the experimentalist requires of a general data display and analysis software.

The experimental exploitation of a large fusion device requires the collection of a large amount of data [5–7]. Most of these data are produced in the relatively short duration of a plasma discharge (1.5 s on FTU, up to 60 s on JET), which is identified by a pulse number. As a consequence, the databases generated by the data acquisition systems have the pulse number as primary key and a secondary key, often called channel, identifying the acquired quantity. These databases, called pulse databases in the following sections, typically have a size in the range 10–100 MB per pulse, depending on the availability of diagnostic systems and on the duration of the plasma [5].
The overall analysis of the experiment also requires the construction of global databases, containing the relevant information extracted from many plasma pulses. These global databases constitute a summary of the experimental results, and they can also be used as an index of what is contained in the pulse databases.
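The two-key organisation of the pulse databases, and the role of a global database as both summary and index, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions. The class and function names, the pulse numbers and the channel name "IP" are hypothetical. The choice of the peak value as the per-pulse summary is also only an example.

```python
from dataclasses import dataclass, field


@dataclass
class PulseDatabase:
    """Hypothetical in-memory pulse database keyed by (pulse, channel)."""
    records: dict = field(default_factory=dict)

    def store(self, pulse: int, channel: str, times, values) -> None:
        self.records[(pulse, channel)] = (list(times), list(values))

    def read(self, pulse: int, channel: str):
        # Primary key: pulse number; secondary key: channel name.
        return self.records[(pulse, channel)]

    def channels(self, pulse: int):
        """List the channels stored for one pulse."""
        return sorted(ch for (p, ch) in self.records if p == pulse)


def build_global_table(db, pulses, channel, reduce=max):
    """Extract one summary value per pulse (here, the peak of a channel).

    Pulses lacking the channel are skipped, so the keys of the result
    also act as an index of the pulses where the channel exists."""
    table = {}
    for pulse in pulses:
        if (pulse, channel) in db.records:
            _, values = db.read(pulse, channel)
            table[pulse] = reduce(values)
    return table


# Usage with made-up pulse numbers and a made-up channel name.
db = PulseDatabase()
db.store(1001, "IP", [0.0, 0.5, 1.0], [0.0, 1.2, 0.8])
db.store(1002, "IP", [0.0, 0.5, 1.0], [0.0, 0.9, 0.7])
print(build_global_table(db, [1001, 1002, 1003], "IP"))  # {1001: 1.2, 1002: 0.9}
```

Pulse 1003 is absent from the result. This shows how a global table can double as an index of where data exist.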

The pulse databases originate both from the control systems, which allow the fusion device to be operated, and from each diagnostic apparatus that measures a relevant plasma parameter. Usually each system has its own monitoring software, tailored to its specific requirements. By contrast, the physical exploitation of the experimental results requires software capable of an integrated approach. Such software must access every existing pulse database simultaneously in order to perform all the required cross-correlated analyses of the data. This is the main purpose for which the SHOW program has been written.

The work is organised as follows. Section 2 gives a brief description of the FTU databases. The SHOW program facilities are illustrated in Section 3. Section 4 presents the details of the program structure. The interactive and batch modes of the program are described in Sections 5 and 6, respectively. Section 7 reports the results of monitoring the program utilisation over more than 3 years of FTU experimental activity. Finally, Section 8 presents the new UNIX version of the program and the conclusions.

2. FTU databases
FTU pulse data are stored in two main databases:
- the DAS (data acquisition system) database, containing the data produced by the control and data acquisition systems;
- the PED database (post-pulse elaboration database), produced by elaboration codes running on the IBM mainframe.

The user can also access a third type of data, the elaborated data. For these, only the recipe to evaluate the returned data is stored, as a member of a partitioned dataset.

2.1. DAS database
The data acquired by the DAS [6] are collected by a Digital VAX 4000 running VMS 6.1. These data are structured in a standard format and transmitted to the IBM through a fast FDDI link (100 Mbit s−1). On the IBM they are stored on an array of model 3390 disks.

The data of each plasma pulse are grouped in a set of sequential files, structured as direct-access unformatted datasets. The set consists of a root dataset, containing the general index together with the most frequently required data, and of a number of datasets, each containing data from one or more diagnostic systems. These datasets, like the other datasets in the system, are managed by HSM (the IBM hierarchical storage manager). They are automatically migrated to tape storage depending on the date of last access.

The logical structure of the data is hierarchical: a family name is assigned to each diagnostic system, and a channel name is assigned to each specific quantity produced by the diagnostic. Hence the user selects a data item in the DAS database by providing the complete channel name, formed by the family name and the channel name. The standard format of a DAS data item includes a header and a data part. The header, called channel table, contains all the information that must accompany the data for a complete interpretation of the results of each diagnostic. There are now about 800 DAS channels per plasma pulse. An important characteristic of the DAS database is that its content never changes once it has been created.
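The hierarchical naming (family name plus channel name) and the write-once nature of DAS records can be modelled with an immutable record. The sketch below is a hypothetical illustration, not the DAS format itself. The "." separator in the full name, the field names and the example header entries are all assumptions.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType


@dataclass(frozen=True)
class DasChannel:
    """Hypothetical model of one DAS record: channel table + data.

    frozen=True mirrors the write-once nature of the DAS database."""
    family: str                  # diagnostic system (family name)
    channel: str                 # quantity produced by that diagnostic
    header: MappingProxyType     # read-only view of the channel table
    data: tuple                  # samples, stored immutably

    @property
    def full_name(self) -> str:
        # The "." separator is an assumption: the paper only says that the
        # complete name combines the family name and the channel name.
        return f"{self.family}.{self.channel}"


def make_das_channel(family, channel, header, samples):
    return DasChannel(family, channel, MappingProxyType(dict(header)), tuple(samples))


ch = make_das_channel("ECE", "TE01", {"units": "keV", "dt": 1e-4}, [1.0, 1.1])
print(ch.full_name)  # ECE.TE01
try:
    ch.data = (0.0,)
except FrozenInstanceError:
    print("DAS records cannot be modified")
```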

2.2. PED database
The PED database contains the data produced by all the elaboration codes that are run after the completion of each plasma pulse, or generated during the off-line analysis of each diagnostic system. All the PED channels referring to the same plasma pulse are stored as members of a single partitioned dataset. This dataset is identified by the pulse number and allocated under a common userid. Users can also create a private PED database under their own userid.

The partitioned datasets are automatically compressed every night by the disk management utilities. Occasionally a dataset must be reallocated if its size is not sufficient to store the whole amount of data. When an elaboration code is run several times on the same plasma pulse and data, the old versions of the data are lost unless a special procedure is followed. This structure provides a very simple way to manage a database containing data that must be easily updated. There are now 170 PED channels per plasma pulse.
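In contrast with the write-once DAS records, PED members are simply overwritten when an elaboration code is re-run. The sketch below models one partitioned dataset per pulse as a dictionary of members. The class name and the member name "NE_PROFILE" are hypothetical, and the "special procedure" for keeping old versions is not modelled.

```python
class PedDatabase:
    """Hypothetical sketch of the PED layout: one partitioned dataset per
    pulse (a dict here), one member per PED channel."""

    def __init__(self, owner: str = "common"):
        self.owner = owner      # common userid, or a private one
        self._datasets = {}     # pulse number -> {member name: values}

    def write(self, pulse: int, member: str, values) -> None:
        # Writing an existing member replaces it: the old version is lost,
        # as happens in PED unless a special procedure is followed.
        self._datasets.setdefault(pulse, {})[member] = list(values)

    def read(self, pulse: int, member: str):
        return self._datasets[pulse][member]

    def members(self, pulse: int):
        return sorted(self._datasets.get(pulse, {}))


ped = PedDatabase()
ped.write(1001, "NE_PROFILE", [1.0, 2.0])   # first run of an elaboration code
ped.write(1001, "NE_PROFILE", [1.5, 2.5])   # re-run overwrites the member
print(ped.read(1001, "NE_PROFILE"))          # [1.5, 2.5]
```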

2.3. Elaborated data
The elaborated data database contains only the recipes to evaluate the returned data. Each recipe is defined using a simple interpreted language. The interpreter can perform calculations using data contained in the DAS and PED databases and in other suitably formatted text datasets. In addition, the user can provide input parameters when the elaborated data channel is invoked.

This facility avoids duplicating very similar data in the PED and DAS databases. It also allows new elaborated data channels to be defined promptly according to the experimentalists' needs. These channels are immediately available for all the pulses for which the corresponding PED or DAS data exist. The elaborated data facility also provides user-friendly access to the software packages of various diagnostic systems that have been incorporated in the SHOW program, as described in Section 4.3. About 700 elaborated data channel definitions are presently available.
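The idea of storing only a recipe, and evaluating it on demand for any pulse whose inputs exist, can be illustrated with a tiny interpreter. The paper does not give the syntax of the SHOW recipe language. The postfix (RPN) notation below, the token prefixes 'DAS:', 'PED:' and '$', and the channel names VLOOP, IP and POHM are all invented for illustration.

```python
import operator

BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def _apply(op, a, b):
    """Apply op elementwise, broadcasting scalars against lists."""
    if isinstance(a, list) and isinstance(b, list):
        return [op(x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [op(x, b) for x in a]
    if isinstance(b, list):
        return [op(a, y) for y in b]
    return op(a, b)


def evaluate_recipe(recipe, pulse, read_channel, params=None):
    """Evaluate a recipe written in a hypothetical postfix syntax.

    'DAS:<name>' or 'PED:<name>' reads a channel for `pulse`,
    '$<name>' takes a user-supplied parameter, other tokens are numbers,
    and + - * / combine the two topmost stack entries."""
    params = params or {}
    stack = []
    for token in recipe.split():
        if token in BINARY_OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(_apply(BINARY_OPS[token], a, b))
        elif token.startswith(("DAS:", "PED:")):
            source, name = token.split(":", 1)
            stack.append(read_channel(source, pulse, name))
        elif token.startswith("$"):
            stack.append(params[token[1:]])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError(f"malformed recipe: {recipe!r}")
    return stack[0]


# Only the recipe is stored; it works for any pulse whose inputs exist.
store = {("DAS", 1001, "VLOOP"): [1.0, 2.0], ("DAS", 1001, "IP"): [0.5, 0.5]}


def read_from_store(source, pulse, name):
    return store[(source, pulse, name)]


recipes = {"POHM": "DAS:VLOOP DAS:IP * $scale *"}   # hypothetical channel
print(evaluate_recipe(recipes["POHM"], 1001, read_from_store, {"scale": 1e6}))
# [500000.0, 1000000.0]
```

The user parameter `$scale` plays the role of the input parameters supplied when an elaborated channel is invoked.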

3. SHOW program facilities
This section briefly describes the main facilities, grouped according to the main tasks that the SHOW program can perform:
- database read-out;
- 2-D and 3-D plots;
- computation of derived quantities and time series analysis;
- a spread-sheet tool for global database analysis;
- reading from and writing to external files.

A simple interface with the FTU operation journal is also provided. Most SHOW options can be run in either interactive or batch mode.


3.1. Data read-out
The experimental data collected in the pulse databases are identified by a pulse number and a channel name, as described in Section 2. The SHOW program can read 1-D or 2-D data structures from these databases. The program also accepts sequential text files in an appropriate format as input. For these files the channel name coincides with the dataset name, and the pulse number is not necessarily relevant.

The data read-out can be performed step by step, by providing the keys identifying the data and retrieving their content. Alternatively, it can be driven by a command file, so that the same sequence can easily be repeated. The SHOW program can generate the command file itself, so the user does not need to remember in detail the syntax of the commands needed to construct a sequence. The sequence can consist of different channels for the same pulse series, or of different pulses for the same channel series. Several facilities can be invoked from the command file:
- the plot facilities;
- the computation options;
- the preparation of data in the format required by the spread-sheet facility;
- the spread-sheet facility itself;
- the saving of a data item in the PED database.
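The two orderings of a read-out sequence (many channels for each pulse, or many pulses for each channel) can be sketched as a small command-file generator. The 'READ pulse channel' and 'PLOT' commands are invented for illustration; the paper does not give the actual SHOW command syntax. The channel names are hypothetical too.

```python
def make_command_file(pulses, channels, order="by_pulse", plot=True):
    """Build a read-out command sequence in an invented syntax.

    order='by_pulse'   -> for each pulse, read all channels
    order='by_channel' -> for each channel, read all pulses"""
    if order == "by_pulse":
        pairs = [(p, c) for p in pulses for c in channels]
    elif order == "by_channel":
        pairs = [(p, c) for c in channels for p in pulses]
    else:
        raise ValueError(f"unknown order: {order}")
    lines = [f"READ {p} {c}" for p, c in pairs]
    if plot:
        lines.append("PLOT")
    return "\n".join(lines)


print(make_command_file([1001, 1002], ["IP", "NE"], order="by_channel"))
```

Keeping the sequence as plain text is what makes it repeatable: the same file can be replayed interactively or handed to a batch run.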

3.2. Graphic facilities
The plot capabilities of the SHOW program include 2-D plots, 3-D plots and contour maps. In 3-D plots, hidden-line suppression is applied. In 2-D plots, 1-D data from different pulses and channels, or one-dimensional slices of 2-D data, can be included in the same plot grid. A plot page can hold several plot grids, or the data can be shown in overlay mode.

Further display options are:
- error bars in both the x and y directions;
- log or linear scales for both axes;
- data in the same plot grid normalised to their maximum;
- vertical reference lines on the plot page;
- 1-D data sharing the same independent variable plotted against each other;
- a different kind of line and/or symbol for each data item.

The cursor can be used to change the grid limits and to measure values or distances on the plot page. Values or description labels can also be displayed as character strings on the plot page. A limitation of the present version of the program is that all the grids on the same plot page are assumed to share the same range of abscissa values. The range itself can be adjusted automatically by the program, fixed by the user, or evaluated relative to the abscissa values of one of the input data.

3.3. Computational facilities
Simple algebraic computation between 1-D data can be performed, with implicit interpolation for data with different values of the independent variable. The more common functions and operators (integrals, derivatives, smoothing filters, etc.) are also available. The computation facility is based on a simple interpreter package that can easily be extended with new functions or operators, written as FORTRAN routines. Some fitting capability is included, both for linear or polynomial functions and for more general functional dependencies. The program includes some tools for the study of time series, such as Fourier analysis and the box car technique.

3.4. Spread-sheet utility
A simple spread-sheet utility, named VERSUS, is included in the SHOW program so that simple analyses of the global databases can be performed. The spread-sheet operates on tables of data read from external files in text format and includes some computational, fitting and plotting facilities. VERSUS commands have been tailored to the most common needs of the analysis of tables of data, and some of them are integrated with the other facilities provided by SHOW. For example, one VERSUS command prepares command files in the syntax required by the data read-out section of SHOW. Starting from the data contained in a global database, a table of new data for a given list of plasma pulses and times can then easily be obtained.
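The computational facilities of Section 3.3 combine signals sampled on different time bases through implicit interpolation, and then apply integrals, derivatives and smoothing. The NumPy sketch below is a minimal illustration of these operations. It makes three assumptions. Interpolation is linear and onto the first signal's abscissa; the paper does not specify the scheme SHOW uses. Integration uses the trapezoidal rule. Smoothing is a plain moving average, not necessarily the box car technique mentioned in the paper.

```python
import numpy as np


def combine(t1, y1, t2, y2, op=np.add):
    """Binary operation between two 1-D signals with different time bases.

    The second signal is linearly interpolated onto the first one's
    abscissa (an assumption; the paper does not state the scheme)."""
    t1 = np.asarray(t1, dtype=float)
    y2_on_t1 = np.interp(t1, np.asarray(t2, dtype=float), np.asarray(y2, dtype=float))
    return t1, op(np.asarray(y1, dtype=float), y2_on_t1)


def cumulative_integral(t, y):
    """Running trapezoidal integral, starting from zero."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))


def derivative(t, y):
    """Central differences inside, one-sided differences at the ends."""
    return np.gradient(np.asarray(y, dtype=float), np.asarray(t, dtype=float))


def moving_average(y, width):
    """Moving-average smoothing with an odd window width.

    Near the edges the window is truncated and the mean is taken over the
    samples actually available. Requires width <= len(y)."""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(width)
    sums = np.convolve(y, kernel, mode="same")
    counts = np.convolve(np.ones_like(y), kernel, mode="same")
    return sums / counts


t = np.linspace(0.0, 1.0, 11)
_, total = combine(t, 2 * t, [0.0, 1.0], [1.0, 1.0])   # 2t + 1 on the grid t
print(cumulative_integral(t, total)[-1])               # close to 2.0
```

Interpolating onto the first operand's grid keeps the result on a known abscissa. This also matches the single-abscissa plot grids described in Section 3.2.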


3.5. Write-out options
Data can be written to external files in text format, and the same files can also be read back. The files can be edited or browsed. A table of data can be written in the same syntax as the one accepted by VERSUS. 2-D data can be written in a format generally accepted by other 3-D plotting programs. A data item can also be saved in the PED database.

4. Program structure and system constraints
The hardware and software configuration of the available system has constrained the design of the program. Particular effort has been devoted to providing a clear and effective user interface. To permit a complete analysis of the experimental data, it was also decided to include several FORTRAN packages, each related to a diagnostic system. Some details on the program dimensioning are also provided.

4.1. System constraints
The main databases of the FTU experiment are located on the disk system of an IBM mainframe (presently the model 9672-R21) under the MVS/ESA operating system. The users access the data from 3270 terminals with the GDDM (graphical data display manager) software. Interactive sessions are run under TSO with the PDF/ISPF full-screen facility. In the last 5 years, many of the terminals have been replaced by Macintosh personal computers running 3270 emulators such as TN3270. The code is written in VS FORTRAN, the IBM implementation of FORTRAN 77.

4.2. User interface
In developing the SHOW program, the GDDM package was chosen to implement the whole full-screen user interface and the plot display sections. The PDF/ISPF environment can be invoked from SHOW to use all its features, including the editing and browsing of datasets. It is not, however, required to run most of the code.


The 2-D plot section has been implemented using a system of high-level plot routines developed at ENEA Frascati. The package was originally written on HP computers using Tektronix terminals and the PLOT-5 graphical software. It satisfies the most common requirements for 2-D plotting tasks and also includes a simple character font set. Most of the interface with the hardware is contained in a single FORTRAN routine, so that the package can easily be adapted to a new system. In fact, several versions have been developed over the years, including interfaces:
- with PLOT 10 IGL on VAX-VMS, IBM-MVS and Norsk Data systems;
- with GDDM on IBM-MVS and IBM-VM;
- with GHOST (AEA Technology, Culham, GB) on IBM-MVS.

The 3-D plot section uses a modified version of the Calcomp 3-D software, which has been interfaced with the package.
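Keeping all hardware dependence in one routine is a device-abstraction layer: high-level plotting code calls a few primitives, and porting means rewriting only those primitives. The Python sketch below shows the pattern under assumptions. The class and function names and the move/draw primitive set are hypothetical, and no real graphics library (GDDM, GHOST, PLOT 10) is called. The recording backend simply stands in for one.

```python
from abc import ABC, abstractmethod


class PlotDevice(ABC):
    """Device-dependent layer: the only part rewritten when porting."""

    @abstractmethod
    def move(self, x: float, y: float) -> None:
        """Move the pen without drawing."""

    @abstractmethod
    def draw(self, x: float, y: float) -> None:
        """Draw a line from the current pen position to (x, y)."""


class RecordingDevice(PlotDevice):
    """Stand-in backend that records primitives instead of drawing."""

    def __init__(self):
        self.ops = []

    def move(self, x, y):
        self.ops.append(("move", x, y))

    def draw(self, x, y):
        self.ops.append(("draw", x, y))


def polyline(device: PlotDevice, xs, ys):
    """High-level routine written only in terms of move/draw."""
    points = list(zip(xs, ys))
    if not points:
        return
    device.move(*points[0])
    for x, y in points[1:]:
        device.draw(x, y)


def frame(device: PlotDevice, x0, y0, x1, y1):
    """Draw a rectangular plot frame."""
    polyline(device, [x0, x1, x1, x0, x0], [y0, y0, y1, y1, y0])


dev = RecordingDevice()
frame(dev, 0, 0, 1, 1)
print(dev.ops)
```

Supporting a new system would mean adding one more `PlotDevice` subclass, while `polyline`, `frame` and any higher-level routines stay unchanged.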

4.3. Diagnostic related packages
The present implementation of SHOW also includes in its code several FORTRAN packages, each developed under the responsibility of a plasma diagnostic group. These packages provide data to the SHOW program through a standard interface structure, for which a symbolic calling sequence is available. This symbolic calling sequence is stored as the definition of an elaborated channel (see Section 2.3), so that the user does not need to remember it in detail. For the SHOW user, therefore, there is no practical difference between data retrieved from the pulse databases and data produced by these diagnostic packages.

The main advantages of this feature are:
(1) a reduction of the size of the PED databases;
(2) the possibility for the user to change some of the control parameters for the interpretation of the data of each plasma diagnostic, so that a more complete analysis can be performed;
(3) the diagnosticians can immediately compare the results of their diagnostic with all the others available in the databases, without developing their own plotting program.

The main disadvantages are:
(1) the possibility of unwanted interactions between different sections of the code, due to errors or misunderstandings by a relatively large number of different programmers;
(2) the need to re-link the entire program whenever any of the routines is changed.

This last problem could in principle be avoided by using a run-time library, but no attempt has been made so far. Taking advantage of this facility, some diagnostic systems have no ad hoc application at all: their entire analysis software is contained in the SHOW program. At present 28 different elaboration packages are included in the program.
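A standard interface reached through a symbolic name can be sketched as a registry that maps names to functions. The assumed interface is "pulse number plus keyword control parameters in, samples out". The decorator, the symbolic name "NPA_TI" and the placeholder data are hypothetical. In SHOW the packages are FORTRAN routines linked into the program; that is the source of the re-linking drawback noted above.

```python
from typing import Callable, Dict, List

# Assumed standard interface: pulse number + keyword control parameters
# in, list of samples out.
Package = Callable[..., List[float]]
REGISTRY: Dict[str, Package] = {}


def register(symbolic_name: str):
    """Decorator that records a package under its symbolic name."""
    def decorator(func: Package) -> Package:
        REGISTRY[symbolic_name] = func
        return func
    return decorator


def elaborated_channel(symbolic_name: str, pulse: int, **params) -> List[float]:
    """Dispatch a request to the registered package; to the caller this
    looks the same as reading a stored channel."""
    return REGISTRY[symbolic_name](pulse, **params)


@register("NPA_TI")  # hypothetical package name
def npa_ion_temperature(pulse: int, calibration: float = 1.0) -> List[float]:
    # Placeholder: a real package would read raw DAS channels for `pulse`.
    raw = [0.8, 1.0, 1.2]
    return [calibration * v for v in raw]


print(elaborated_channel("NPA_TI", 1001, calibration=2.0))  # [1.6, 2.0, 2.4]
```

The `calibration` keyword stands for the control parameters that the user can change to reinterpret a diagnostic's data.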

4.4. Program dimensioning
All the data read by the program are kept in memory for as long as the user requires them, in a dynamic common area called SDA (SHOW data area). The SDA is dimensioned as follows:
- at most 20 channel data items can be held in the SDA at the same time;
- the total SDA size is 163 840 (pairs of 4-byte words);
- the maximum size of each channel data item is 65 536.

These sizes can easily be modified, as has been done for the JET implementation. The total size of the static sections of the program is 3.4 MB. The program has 545 routines in total, 37 of which form its core. The program and the plot package consist of about 27 000 lines of code, including 10 000 lines of comments, not counting the diagnostic-specific elaborations.

5. SHOW interactive mode
The user interface for the interactive TSO mode is organised in full-screen panels optimised for a page size of 32 rows and 80 columns, typical of the available 3270 terminals. The 24-line page size is also supported. The standard SHOW panels are composed of an upper section, common to all panels, and a lower section specific to each panel. The upper section displays the data read into the SDA, so that the user always has direct information on the data available in the program memory.

Interactive help is available with the F1 key. It is provided by the browse utility of the PDF/ISPF system, which accesses a dataset containing the help texts, one for each panel. The browse utility includes a find/repeat-find command. Other panels with a different page structure are used for specific tasks, such as:
- listing the data available for a given plasma pulse;
- the user interface of the VERSUS spread-sheet;
- displaying the content of a data item in the SDA;
- the various plot pages: 2-D, 3-D, the time series analysis and the VERSUS spread-sheet.

6. SHOW batch mode
As described in Section 3, SHOW can perform a series of operations, such as data read-out or plot production, by automatically executing a sequence of commands. These commands can be generated by the SHOW program itself in interactive mode, and they can also be executed by invoking the program in batch mode. Not all the features of the program are available in batch mode: 3-D plots, time series analysis, the reading and writing of sequential files and the FTU journal can only be accessed in interactive sessions. Conversely, all the operations available from the data read-out panel, together with the invocation of the VERSUS spread-sheet, can also be performed in batch.

Batch mode is used mostly for two purposes. The first is the analysis of a large number of plasma pulses, which usually means plotting the same data for every pulse of the series. The second is the preparation of global databases, for example tables of data at given time values for many plasma discharges.
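One typical batch job, building a table of values at given times for many discharges, can be sketched as a loop over pulses and sampling times. The accessor `read_channel(pulse, channel) -> (t, y)` and the row layout are assumptions made for illustration. Values are obtained by linear interpolation. Missing channels, and times outside a signal's range, are recorded as NaN.

```python
import numpy as np


def global_table(pulses, channels, times, read_channel):
    """Sample each channel at the requested times for every pulse.

    read_channel(pulse, channel) -> (t, y) is a hypothetical accessor that
    raises KeyError when a channel is missing. Each row is
    (pulse, time, value_1, ..., value_n)."""
    rows = []
    for pulse in pulses:
        for t in times:
            row = [pulse, t]
            for ch in channels:
                try:
                    tt, yy = read_channel(pulse, ch)
                except KeyError:
                    row.append(float("nan"))   # channel absent for this pulse
                    continue
                # Linear interpolation; NaN outside the signal's time range.
                row.append(float(np.interp(t, tt, yy, left=np.nan, right=np.nan)))
            rows.append(tuple(row))
    return rows


store = {(1, "IP"): ([0.0, 1.0], [0.0, 10.0])}   # made-up data


def reader(pulse, channel):
    return store[(pulse, channel)]


rows = global_table([1, 2], ["IP"], [0.5, 2.0], reader)
print(rows[0])  # (1, 0.5, 5.0)
```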

7. Monitoring SHOW utilisation
The SHOW program writes two records in a log dataset, at the beginning and at the end of each program session. These records contain:
- the program version;
- the user and terminal identification;
- the program environment (batch or TSO);
- some summary information, such as the session duration, the number of data items loaded in the SDA and the number of plots performed.

If a session terminates abnormally, all the summary information is lost.

This section reports the results of an analysis of the log file covering about 3 1/2 years, in the period October 1991 to May 1996. Table 1 shows some quantitative information on the program utilisation. In 78% of the cases the program terminates normally. Most abnormal session endings are presumably due to the automatic logoff of a TSO session after 5 min without any input from the user. Batch runs amount to 9% of the total sessions, and most interactive sessions (88%) have been run from a terminal with 32 rows. The large number of program versions arises because SHOW includes several software packages that are updated rather frequently, as described in Section 4.3. On 3270 terminals the program must be terminated to obtain a paper copy of the saved plots, and this affects the average session duration and the average number of plots performed.

When only TSO sessions are considered, the analysis must take into account whether or not the tokamak was operating, as shown in Table 2. A maximum of 15/17 concurrent users has been found, irrespective of the FTU operation status.
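The statistics in this section follow from pairing the begin and end records of each session. A session that has a begin record but no end record counts as an abnormal termination. The sketch below assumes a hypothetical record layout (session id, 'BEGIN'/'END', environment), since the paper does not give the log format. The small log is made up and does not reproduce the measured figures.

```python
def summarise_log(records):
    """Summarise session-log records.

    Each record is (session_id, kind, environment) with kind 'BEGIN' or
    'END' and environment 'TSO' or 'BATCH' (a hypothetical layout). A
    session with a BEGIN record but no END record is abnormally terminated."""
    begun, ended, environment = set(), set(), {}
    for session, kind, env in records:
        if kind == "BEGIN":
            begun.add(session)
            environment[session] = env
        elif kind == "END":
            ended.add(session)
    n = len(begun)
    if n == 0:
        return {"invocations": 0, "normal_fraction": 0.0, "batch_fraction": 0.0}
    normal = len(begun & ended)
    batch = sum(1 for s in begun if environment[s] == "BATCH")
    return {"invocations": n, "normal_fraction": normal / n, "batch_fraction": batch / n}


log = [(1, "BEGIN", "TSO"), (1, "END", "TSO"),
       (2, "BEGIN", "TSO"),                      # e.g. automatic TSO logoff
       (3, "BEGIN", "BATCH"), (3, "END", "BATCH"),
       (4, "BEGIN", "TSO"), (4, "END", "TSO")]
print(summarise_log(log))
# {'invocations': 4, 'normal_fraction': 0.75, 'batch_fraction': 0.25}
```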

Table 1
General information on SHOW program utilisation during 3 1/2 years

Program invocations                        33 500
Normal program terminations                25 400
Program versions                              960
Versions used >100 times                       60
Users                                          71
Users with >200 invocations                    30
Batch users                                    18
Batch users with >20 invocations               11
Plots per session (average)                     9
Average session duration (min)                 26

Table 2
SHOW interactive utilisation, divided according to the tokamak operation status (averages ± standard deviations)

                       Invocations per day    Concurrent sessions
FTU operation days     40 ± 16                6 ± 3
FTU shut-down days     19 ± 11                3 ± 2

8. UNIX version and conclusions
The utilisation of the SHOW program by a rather large number of users has shown that it satisfies a broad spectrum of the most common analysis requests in a nuclear fusion experiment. The main reasons for this success are:
(1) the flexibility of the program, which has been developed by integrating all the new features required by the users in their analysis of the experimental data;
(2) the optimisation of the user interface, within the limits of the available hardware, so that the user performs a minimum of redundant actions for each task.

A number of improvements could be made to the present version of SHOW to correct some of the limitations in its graphic and computation capabilities. However, the general trend in data processing indicates that access to mainframes will become more difficult in the coming years because of increasing costs. At the same time, networks of workstations and file servers have become widespread, in nuclear fusion experiments as elsewhere.

In this framework a preliminary UNIX version of SHOW has been prepared by porting the code to a DEC Alpha workstation running Digital UNIX version 4.0B. Porting the FORTRAN code from IBM MVS to Digital UNIX was quite straightforward. The only problem to be solved was the choice of a proper set of libraries to implement the user interface in place of the GDDM package. To minimise the effort, it has been decided for the moment to keep the same user interface, based on panels and a graphical window. The panel management has been realised with a simple emulation of the subset of GDDM routines actually used, built on the curses library, which is freely available on many UNIX systems.

The plot section follows the same approach of interfacing to a freely available plotting package. At present, two interfaces have been implemented: one to the CERN HIGZ package and one to the PGPLOT package (T.J. Pearson, California Institute of Technology). The final choice between these two solutions will be made after wider use of the two implementations.

The advantage of the adopted approach has been the rapidity of the porting process, but the obvious drawback is the under-utilisation of the facilities provided by the UNIX environment. For that reason the user interface design will be re-examined in the future, so that the program keeps the same functional capability with a more modern look and feel. A further development, which would guarantee true platform independence for the software, would be to adopt Java as the programming language.

The actual feasibility of this last approach remains to be assessed, in particular whether all the functional capabilities can be retained in this language.

References
[1] R. Andreani, et al., in: Fusion Technology 1990, Proc. 16th Symp. on Fusion Technology (SOFT), London, 1990, vol. 1, North-Holland, Amsterdam, 1991, p. 218.
[2] F. De Marco, L. Pieroni, F. Santini, S.E. Segre, Nucl. Fusion 26 (1986) 1193.
[3] P.H. Rebut, B.E. Keen, Fusion Technol. 11 (1987) 13.
[4] R. Bartiromo, et al., Rev. Sci. Instrum. 58 (1987) 788.
[5] P.C. Van Haren, N.A. Oomens, Fusion Technol. 24 (1993) 391.
[6] G. Buceti, G. Bardotti, G.B. Righetti, Proc. 1995 IEEE Conference on Real-Time Computer Applications in Nuclear, Particle and Plasma Physics, Michigan State University National Superconducting Cyclotron Laboratory, May 24–28, 1995, p. 160.
[7] J.P. Christiansen, J. Comput. Phys. 73 (1987) 85.
