Fusion Engineering and Design 60 (2002) 475– 480 www.elsevier.com/locate/fusengdes
The TEC Web-Umbrella
J.G. Krom a,*, M. Korten a, H.R. Koslowski a, A. Krämer-Flecken a, G. Manduchi e, B.U. Nideröst d, J.W. Oosterbeek a, R.P. Schorn a, F. Wijnoltz c, B. Becks a, W. Biel a, M.P. Evrard b, J.C. van Gorkom c, M.G. von Hellermann c
a Institut für Plasmaphysik, Forschungszentrum Jülich GmbH, EURATOM Association, D-52425 Jülich, Germany 1
b Laboratoire de Physique des Plasmas/Laboratorium voor Plasmafysica, ERM/KMS, EURATOM Association, Renaissancelaan 30, B-1000 Brussels, Belgium 1
c FOM Instituut voor Plasmafysica Rijnhuizen, EURATOM Association, Postbus 1207, NL-3430 BE Nieuwegein, The Netherlands 1
d Utrecht University, Princetonplein 5, NL-3584 CC Utrecht, The Netherlands
e Consorzio RFX - Associazione EURATOM ENEA per ricerche sulla fusione, Corso Stati Uniti 4, 35127 Padua, Italy
Abstract
The TEC community operates the TEXTOR device and, in doing so, collects and stores data from a number of different front-end acquisition systems, processing codes and analysis systems. Because of the way these systems evolved in the past, different, distributed data storage technologies were used to record this data. To reduce the number of interfaces that client codes have to use when accessing these data stores, an “umbrella” concept was developed: a software layer that covers (as an “umbrella”) as many of these stores as possible and provides a unified access mechanism to them. We explored the possibility of using the widely supported HTTP protocol for this purpose; this is the core protocol of the World-Wide-Web and it is capable of transporting almost any type of data. The concepts behind using this protocol were based on earlier work at JET. Access via this umbrella has been provided to the most important data stores around TEXTOR and access to others is being added regularly. Client codes, libraries and programs have been developed for several user environments. The HTTP-based concepts and the data access via this system have been found to be highly portable. This paper gives an overview of the TEC Web-Umbrella system, describes its basic concepts and presents some of the client-side codes and programs. It also reports on some first (tentative) user experiences. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: Remote data-access; Platform independence; Programming-language independence; Web technology; HTTP; Data navigation
Paper to accompany a presentation at the 3rd IAEA TCM on Control, Data Acquisition and Remote Participation for Fusion Research, Padova, Italy, 16–19 July 2001.
* Corresponding author. Tel.: +49-24-6161-5451; fax: +49-24-6161-5452. E-mail address: [email protected] (J.G. Krom).
1 Partner in the Trilateral Euregio Cluster (TEC).
0920-3796/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0920-3796(02)00049-2
1. Introduction
The TEC community operates the experimental tokamak device “TEXTOR”. In doing so, it collects and stores data from a number of different front-end data acquisition systems, processing codes and analysis programs. Because of the way these systems evolved in the past, different, distributed data storage technologies were used to record this acquired and computed data. To name a few: we use an in-house developed data storage system for the CAMAC acquisition system; we have the data stores of imported diagnostics, such as those built by other institutes, and of commercial laboratory information systems; and we operate an ORACLE database with processed data.
This situation meant that client programs and codes had to use a number of different interfaces to gain access to data from these stores. One of the aims of project DYNACORE [1] was to reduce this number of interfaces by means of a generic concept. It developed the concept of a software layer that covers (as an “umbrella”) as many of these stores as possible. The first DYNACORE approaches used CORBA to provide a unified access mechanism to these stores for object-oriented programming environments. This approach was extended, as described in this paper, by using web technologies to provide support for other programming environments too.
2. The TEC Web-Umbrella
We explored the possibilities of the widely used and supported HTTP protocol for this purpose [2]; this is the core protocol of the World-Wide-Web and it is capable of transporting almost any type of data. This protocol was also used in earlier work at JET [3–5], but we use it in slightly different ways. Using this protocol brings several advantages:
- Many high-quality servers, clients and support programs exist for several platforms.
- It allows efficient transport of a wide variety of data and file types.
- It allows easy navigation of the experiment data.
2.1. Protocol considerations
Most data objects in the relevant stores represent a “signal”; in the current context this term is used rather loosely to indicate time-traces, profiles, scalar values, multi-dimensional datasets, etc. These signals are made to appear to the user of the Web-Umbrella system as “pseudo-files” in a pseudo-directory structure on a Web server.
To make data addressable via HTTP, a Uniform Resource Locator (URL [6]) scheme was devised that allows unique identification of the signal in question. An example of such a URL would be: http://ipptwu/textor/all/90620/rt2/ivd/ip-star. As with any URL, the http: fragment is the protocol identifier and the //ipptwu fragment is the hostname of the server computer (the fully qualified name is ipptwu.ipp.kfa.juelich.de). The pathname component of the URL specifies the experiment from which data is requested, followed by the shot-number, diagnostic group, subsystem and signal-name hierarchy of that experiment; ip-star is one of the variants of the plasma current signal.
This URL structure is not significantly different from that used by other attempts to use web technology to access data (e.g. JET-RDA [3]), except perhaps on two points:
1. The separation of the web-server name and the experiment name. In principle, one host can provide data for a number of experiments, and data from one experiment can be obtained from different hosts (to provide some degree of load balancing).
2. When a partial signal URL is presented in a request, the TEC Web-Umbrella servers reply with a list of possible choices on the next level. See Fig. 1.
The data structure for a signal often has two main components: the meta-data and the bulk-data. Meta-data is information such as the number of samples in the bulk-data, the units (Volts, Amperes, Kelvin, etc.) of the measurement and reference information such as time-bases. Other systems, e.g. NetCDF [7] as used in JET-RDA, tend to combine meta-data and bulk-data in one file or dataset. It seemed useful to present these two types of data separately; this allows a client program to use the meta-data to tailor its request for bulk-data. Note that this does not imply that the data is stored separately, just that it is presented via this interface scheme as two separate “pseudo-files”.
A URL representing a signal on a TEC Web-Umbrella server actually addresses this meta-data “pseudo-file”. This “file” is structured as a Java properties file, mainly because Java provides a well-defined format for encoding keyword/value pairs; Java is not required for handling this meta-data. Fig. 2 presents an example of such a properties “file”. One of the properties in this “file” identifies the URL of the bulk-data; another defines the encoding of the bulk-data by specifying its MIME type.
At the moment, the bulk-data for most relevant signals is encoded using the ASCII representation of the sample values: a highly portable representation, but with some well-known disadvantages. It is, however, possible, and properly supported by the HTTP protocol, to use different data types, e.g. IEEE binary floating-point numbers, and/or different transport encodings, such as compressed or zipped. Indeed, a number of GIF-encoded pictures are available on one of our servers.
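The addressing scheme and the meta-data format can be illustrated with a short Python sketch. It is not the TEC client code: the helper names `signal_url` and `parse_properties` are hypothetical, the property keys and the `example.invalid` bulk URL in the sample text are invented for illustration (the real keys are those of Fig. 2 and the server documentation), and the parser handles only the simple `key = value` / `key: value` subset of the Java properties format, without line continuations or escape sequences.

```python
from urllib.parse import quote


def signal_url(host, *levels):
    """Join a server name and the hierarchy levels into a signal URL."""
    # Each level becomes one path segment; percent-encode anything unusual.
    path = "/".join(quote(str(level), safe="") for level in levels)
    return f"http://{host}/{path}"


def parse_properties(text):
    """Parse a simple subset of the Java properties format into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":      # blank line or comment
            continue
        # The key ends at the first '=' or ':' on the line.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            props[line] = ""                 # key without a value
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


# Hypothetical meta-data text; the key names are assumptions, not the
# names used by the TEC Web-Umbrella servers.
SAMPLE = """\
# illustrative only
samples = 4096
unit = A
bulk.url = http://example.invalid/bulk/ip-star
bulk.mimetype = text/plain
"""

if __name__ == "__main__":
    print(signal_url("ipptwu", "textor", "all", 90620, "rt2", "ivd", "ip-star"))
    print(parse_properties(SAMPLE))
```

A client would read the property naming the bulk-data URL and the one naming its MIME type before deciding how to request and decode the samples.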
The HTTP protocol also allows for some negotiation between client and server about the exact type and encoding of the bulk-data to use. In this way both can agree to use either a portable representation or perhaps a more efficient one. Requests for bulk-data can be tailored to the requirements using the HTTP query-option mechanism; e.g. a client could request only a part of the bulk-data. A few query options have been defined for use within the TEC Web-Umbrella scheme to indicate the segmenting information. See Fig. 3.
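As a hedged illustration of such a tailored request, the Python sketch below builds (but does not send) a request for one segment of a signal. The option names `start` and `total` are those named in the caption of Fig. 3; their exact spelling in the query string, the `example.invalid` URL and the helper name `segment_request` are assumptions. The `Accept` and `Accept-Encoding` headers are the standard HTTP negotiation headers; the source does not state which values the TEC servers honour.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request


def segment_request(bulk_url, start, total, accept="text/plain"):
    """Build a GET request for `total` samples beginning at sample `start`."""
    if total <= 0:
        raise ValueError("total must be a positive number of samples")
    query = urlencode({"start": start, "total": total})
    # Append to an existing query string if the bulk URL already has one.
    separator = "&" if urlsplit(bulk_url).query else "?"
    return Request(
        bulk_url + separator + query,
        headers={
            "Accept": accept,            # preferred data type (MIME type)
            "Accept-Encoding": "gzip",   # client can handle compressed transport
        },
    )


if __name__ == "__main__":
    req = segment_request("http://example.invalid/bulk/ip-star", 5000, 10)
    print(req.full_url)
    print(req.header_items())
```

Passing the returned object to `urllib.request.urlopen` would perform the actual transfer; that step is left out so the sketch runs without a server.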
2.2. Server implementations
Access via this umbrella has been provided to some of the most important data stores around TEXTOR and access to others is being added regularly. The implementations of the web-servers for these various stores differ according to the storage platform, but all servers adhere to the URL naming and data structuring sketched above. Access to the CAMAC data is provided by an Apache [8] server running on an OpenVMS machine, with a CGI [9] program written in standard C; a similar setup is used for the data from the Vacuum-Ultra-Violet (VUV) diagnostic, but this server runs on a Linux computer. Data from the TEXTOR Physics Data store (TPD [10]) is provided using an ORACLE web-server with the CGI program coded in PL/SQL. The data acquired by the FOM diagnostics is provided by an Apache server with Tomcat [11] and Java-coded servlets, which in turn access a data-server using CORBA.
Fig. 1. Output of a web request for a partial TEC Web-Umbrella URL, giving a list of choices, in this case of diagnostic groups. On the relevant pages we make only limited use of HTML markup, to allow processing by simple text-only oriented codes. The output shown is from the text-only web-browser lynx.
Fig. 2. Lynx output of a web request for a signal property page. The meaning of some properties is indicated in the main text of the current paper; for a detailed and up-to-date description of these properties, see the documentation at http://ipptwu.ipp.kfa-juelich.de.
In this way it is simple to add a new data server, using the technology most appropriate for that server, without requiring changes to the client codes. Indeed several new servers are already under consideration.
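Whatever the back-end technology, each server presents the same pseudo-directory behaviour: a partial URL yields the choices on the next level, a complete URL yields the signal's meta-data. The Python sketch below shows only that shared dispatch logic, under stated assumptions: the nested-dictionary store, the `Signal` class and the `resolve` function are hypothetical stand-ins for a real data store, and the stored property text is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Leaf of the pseudo-directory tree: holds the meta-data text."""
    properties: str


# Toy store shaped like the example URL of Section 2.1; contents are invented.
STORE = {
    "textor": {"all": {"90620": {"rt2": {"ivd": {
        "ip-star": Signal("samples = 4096\n"),
    }}}}},
}


def resolve(store, path):
    """Map a URL path to ('list', names) or ('signal', properties text)."""
    node = store
    for part in (p for p in path.split("/") if p):
        # Fail if we try to descend below a signal or into an unknown name.
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    if isinstance(node, dict):
        return "list", sorted(node)      # partial URL: next-level choices
    return "signal", node.properties     # complete URL: meta-data pseudo-file


if __name__ == "__main__":
    print(resolve(STORE, "/textor/all/90620"))
    print(resolve(STORE, "/textor/all/90620/rt2/ivd/ip-star"))
```

Because clients depend only on this behaviour, a CGI program, a PL/SQL procedure or a servlet can each implement it against its own store.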
2.3. Client facilities
Client codes, libraries and programs have been developed for several user environments.
1. Several of the authors developed or adapted stand-alone client applications. The following are presently available as user-side client software for Web-Umbrella signals.
(a) An experimental adaptation of jScope, an MDSplus tool [12].
(b) An adaptation of Xplot, an existing in-house tool for OpenVMS written in Fortran77 and Tcl/Tk.
(c) An adaptation of texplotter, an existing Java-based in-house tool [13].
(d) A Win32-based visualisation tool, a new in-house development, written in Delphi 5 plus Indy Winsocks [14] and the SDL component suite [15].
(e) A VBA program for Microsoft Office tools (Excel, Access, etc.).
2. Support libraries for several programming environments have been made available; currently: Fortran, C and C++, Java, Matlab, Scilab, Delphi, IDL. Others are being added when the need arises.
These support libraries handle the HTTP connection with the web-server, provide access to the property “files” and convert any transport encoding. In fact most, if not all, of this activity is handled by the normal run-time libraries of modern languages like Java and Delphi. These codes have, where relevant, been tested on platforms as diverse as Apple Macintosh, OpenVMS, Windows and several UNIX flavours.
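The conversion step that such a support library performs can be sketched in a few lines of Python. This is an illustration under assumptions, not one of the TEC libraries: the function name `decode_bulk` is hypothetical, the ASCII bulk-data is assumed to be whitespace-separated decimal numbers (the source does not specify the layout), and gzip is used as one example of a compressed transport encoding.

```python
import gzip


def decode_bulk(body, content_encoding=None):
    """Undo the transport encoding, then convert ASCII samples to floats."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    elif content_encoding not in (None, "", "identity"):
        raise ValueError(f"unsupported transport encoding: {content_encoding}")
    # Assumed layout: sample values separated by spaces and/or newlines.
    return [float(token) for token in body.decode("ascii").split()]


if __name__ == "__main__":
    raw = b"1.0 2.5\n-3e2\n"
    print(decode_bulk(raw))
    print(decode_bulk(gzip.compress(raw), "gzip"))
```

In practice the value of `content_encoding` would come from the `Content-Encoding` header of the HTTP response.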
2.4. Additional descriptions
The use of a web server also allows us to make additional, supporting information available. This additional information is normally processed and displayed by web-browsers aimed at interactive users. In these cases we have all the power of HTML, PDF and related formatting languages available to present it. One of the authors provided a number of physics-oriented descriptions of measurement channels. Similarly, all relevant documentation of the TEC Web-Umbrella approach and several coding examples are published via this web server.
3. Experiences
The current operational phase of TEXTOR (a shutdown of about a year) limits the experience gained with the TEC Web-Umbrella; however, there has been some interesting local and remote use. In the last days before the shutdown, early versions of the TEC Web-Umbrella codes were used during a Remote-Participation experiment, in which collaborators in Canada successfully used the Web-Umbrella interface to read data into a local Matlab session directly following a TEXTOR shot.
Most of the local users are pleased by the convenience of having one interface to all data stores. The ability to navigate through the signal tree using easily available tools, such as standard web-browsers, is also appreciated.
Use of the HTTP protocol is convenient when accessing data remotely: most firewalls allow HTTP traffic, and more and more institutes operate web-caches for HTTP traffic, reducing network load.
During experiments with dummy data on a far from state-of-the-art server (350 MHz Pentium II, Linux 2.2.14, Apache and CGI program) and a minimal “get and forget” client on an IBM AIX machine, connected via 100 Mbit/s Ethernet, we measured transfer times of (0.17 + B/4 250 000) s for B bytes. Using a client coded in Matlab and Java on the same AIX machine, we found times of (0.3 + B/1 023 000) s. Using an 8 byte/sample encoding, this equates to approximately (0.3 + N8/128 000) s for N8 samples. The servers with real data can be up to a factor of 10 slower, but these times are dominated by other effects, such as even older hardware and software, additional network access and non-optimal data store organisation. We are currently in the process of acquiring a more powerful server (multi-CPU IBM RS/6000, 2 TByte local disks, multiple gigabit Ethernet connections); with this, and with the use of HTTP features such as compressed transport encoding, it seems reasonable to expect acceptable performance.
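The measured timings follow a simple "fixed latency plus bytes over throughput" model, which the Python sketch below evaluates. The coefficients are the fitted values quoted in the text; the function and constant names are ours. It also makes the per-sample conversion explicit: 1 023 000 bytes/s divided by 8 bytes/sample gives 127 875 samples/s, which the text rounds to 128 000.

```python
def transfer_time(n_bytes, latency_s, rate_bytes_per_s):
    """Estimated transfer time in seconds: fixed latency plus payload time."""
    return latency_s + n_bytes / rate_bytes_per_s


# (latency in s, throughput in bytes/s) as reported for the dummy-data tests.
MINIMAL_CLIENT = (0.17, 4_250_000)
MATLAB_JAVA_CLIENT = (0.3, 1_023_000)


def samples_time(n_samples, bytes_per_sample=8):
    """Matlab/Java client time for n_samples at a fixed sample size."""
    return transfer_time(n_samples * bytes_per_sample, *MATLAB_JAVA_CLIENT)


if __name__ == "__main__":
    one_megabyte = 1_000_000
    print(transfer_time(one_megabyte, *MINIMAL_CLIENT))
    print(transfer_time(one_megabyte, *MATLAB_JAVA_CLIENT))
    print(MATLAB_JAVA_CLIENT[1] / 8)   # samples per second at 8 bytes/sample
```

For small requests the fixed latency term dominates, so fetching only the needed segment of a long signal pays off mainly for large signals.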
Fig. 3. Lynx output of a web request for an (unusually small) segment, roughly in the middle of a signal, using query options. The “start” option specifies the sample number with which to start the segment and “total” specifies the total segment length.

4. Conclusions
The HTTP-based concepts and the data access via this protocol have been found to be highly portable. Adapting existing client programs has been shown to be straightforward. Users tend to appreciate the easy navigation through the signal stores using HTML facilities. This approach allows a low coupling between client-side program coding, server technologies and data storage designs. We believe that this HTTP-based umbrella approach has clear advantages for both local and remote, platform-independent data access.
References
[1] The European Telematics Application Program, project DYNACORE (RE4005), http://www.cordis.lu/telematics/home.html
[2] R. Fielding, et al., Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, June 1999.
[3] K. Blackler, Remote access to JET data and computers, in: 2nd IAEA Technical Committee Meeting on Control, Data Acquisition and Remote Participation on Fusion Research, July 1999, Lisbon, Portugal.
[4] V. Schmidt, Technical aspects of remote participation at JET in the framework of the European Fusion Research Program, in: 11th IEEE NPSS Real-Time Conference Record, Alphagraphics, Santa Fe, NM, 1999, pp. 463–467.
[5] M.R. Wheatley, M. Rainford, CODAS Object monitoring service, in: 21st Symposium On Fusion Technology, 11–15 September 2000, Madrid, Spain.
[6] T. Berners-Lee, et al., Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998.
[7] NetCDF User's Guide for C, An Interface for Data Access.
[8] B. Laurie, P. Laurie, Apache, The Definitive Guide, O'Reilly, 1999.
[9] Th. Boutell, CGI Programming in C & Perl, Addison-Wesley, 1996–1999.
[10] A. Krämer-Flecken, et al., Database and WEB-application server as a tool for remote participation at TEXTOR, in: 21st Symposium On Fusion Technology, 11–15 September 2000, Madrid, Spain.
[11] http://jakarta.apache.org/tomcat/
[12] G. Manduchi, et al., The Java interface of MDSplus: towards a unified approach for local and remote data access, Fusion Eng. Des. 48 (2000) 163–170.
[13] M. Korten, et al., Upgrading a TEXTOR data acquisition system for remote participation using Java and CORBA, Fusion Eng. Des. 48 (2000) 239–245.
[14] http://www.nevrona.com/Indy/
[15] http://www.lohninger.com/