Computer Physics Communications 45 (1987) 299—310 North-Holland, Amsterdam
DATABASE SYSTEMS FOR HEP EXPERIMENTS

Richard P. MOUNT
California Institute of Technology, Pasadena, CA 91125, USA
I review the role of commercial and home-made database systems in HEP experiments. I draw heavily on the qualitative and quantitative evaluations performed by the L3 offline computing group, but I also review some of the activities of other experiments. I consider to what extent features, such as query language, Fortran interface, efficiency, portability, robustness, security and cheapness, are needed for various applications and are offered by existing systems. The commercial systems ORACLE and SQL/DS, and the HEP-made ‘access methods’ KAPACK and ZEBRA-RZ will be discussed in some detail. Finally, I present a brief overview of the L3 database system.
1. What does this talk contain?
More than 2 years ago, the L3 collaboration realised that providing a database service to L3 off-line programs was not going to be trivial. This talk results mainly from the experience gained in evaluating how to use or misuse database systems to serve the needs of large HEP programs. I have also taken into account some of the evaluations performed by other collaborations (particularly ALEPH), and I will give a brief summary of the various solutions to the ‘database problem’ adopted by the LEP experiments. Obviously, even a short exposure to database systems qualifies me as an expert, and I am therefore bound to add (or at least cannot be prevented from adding) philosophical discussions or diversions whenever I feel it vaguely appropriate.
2. What is a database system?
The first philosophical diversion comes in the form of my own definition of a database system:
‘A system for the storage and retrieval of data in which the data are identified by the attributes which are most relevant to the user or application.’ This definition can be illustrated by giving two examples of how a user or a program might ‘query’
systems which qualify as databases:

1) SELECT NAME, PHONE FROM L3_COLLABORATORS WHERE RESPONSIBILITY LIKE '%TEC%GAS_SYSTEM%' ORDER BY DATE_LAST_CALLED_IN_THE_MIDDLE_OF_THE_NIGHT

2) CALL GETVALIDCALIBRATION ('BGO/BARREL', '03:14:56 19-MAR-1990')

As these examples show, database systems can vary greatly in their sophistication and generality. According to my definition, simple file management systems are not databases, since access to data generally requires the specification of things like device names which have nothing to do with what you really want. However, the distinction between the best file management systems and simple database systems becomes blurred.

3. What do we want a database system for?

First, to get them out of the way, I will mention the 'conventional' types of applications for which commercial database systems are an obvious choice. Typical applications are lists of collaborators, lists of LEP cables, lists of optical and mechanical properties of the BGO crystals delivered to L3, etc. Almost as obvious an application is the Particle Data Group database. In this talk, I will be much more concerned
with HEP activities which clearly need 'something like a database system', but which have technical or financial complications which make the choice of system less than obvious. The insertion and retrieval of bulky data, mainly by Fortran programs, is one such area. Alignment and calibration information for large modern detectors can be both bulky and frequently changing. These data may be created and accessed by many different processes, often operating simultaneously. Histogram and statistics archives are typical of slightly less demanding Fortran applications. Moving a little towards conventional database applications we find areas like book-keeping for data-taking and analysis, and Fastbus system management. In these areas there is a need for both high-performance Fortran access, and insertion and retrieval from terminals. Finally there are less essential applications, such as document filing systems, which may be able to use the facilities provided for the storage of bulky data, but are not a prime motivation for the creation of the system.
4. What features do we need?

Many of the 'conventional' applications can benefit from most of the features of a commercial relational database system. These applications are all characterised by small data-rates and a dominance of terminal access. Thus network access to a few systems running commercial relational database systems is enough to satisfy the needs of these applications.
For the more specialised HEP applications, the needs are:
1. Efficient retrieval of bulky and/or numerous data according to simple search criteria. Normally, when a physicist says 'efficient' he means 'close to hardware speeds'. Occasionally, he is more demanding, and requires that elements of the database which will be accessed repeatedly are available at memory- rather than disk-access speeds.
2. Reasonably efficient insertion of bulky and/or numerous data. (Unlike an airline reservation system, insertions are much less frequent than retrievals.)
3. Fortran access (essential).
4. Terminal access (desirable; it can always be provided via Fortran if it doesn't exist).
5. Concurrent write access. It must be possible to lock (ideally automatically) an adequately small fraction of the database.
6. Portability, both for the Fortran code and the database contents. The latter is impossible unless the database system 'knows' the data type of every data object.
7. Robustness. It is desirable that program or system crashes leave the database in a well-defined state. This is essential for banks and must be a good idea for airline reservations and HEP.
8. Security to reduce accidental damage or hacking. No system can be expected to know whether you really want to do the daft operation you
have just requested, so no security system will prevent all accidents.
9. Cheapness (preferably infinite cheapness). This is almost mandatory for any system which must be installed at most institutes.
To break up any semblance of boring logical development, I inject, by means of table 1, a quick review of a number of database systems compared with these needs. The studies on which much of this information is based will be described later. I have included four systems tested by L3 (ORACLE, IBM SQL/DS, KAPACK and ZEBRA-RZ), and one system (INGRES) already widely used in HEP about which I have 'hearsay evidence'. I have used the ISO 5-blob system to indicate how well each system performs in each of the above categories. The 'Full Features' comparison shows immediately which of these products is a commercial database system, and which are packages written within HEP to satisfy more restricted needs. Apart from this, only a few entries merit further comment at this stage. The commercial systems get full marks for portability of Fortran if they are available on most machines, and the code can be written in the SQL query language. Full marks for portability of data would require that the system ran on many machines, and could automatically translate all data-types when exchanging data.

Table 1
A quick review of database systems and HEP needs. The systems compared are ORACLE, SQL/DS, INGRES, KAPACK and ZEBRA-RZ; each is rated against: full features, efficiency, Fortran access, terminal access, concurrent writes, portability of Fortran, portability of data, robustness, security and cheapness.
5. Database architectures

After a first taste of real information, it is now time for another philosophical diversion. I will outline the architectural differences between database systems, and how these differences might affect HEP usage. Let it be clear from the start that I consider only the logical structure visible to the user, and not the various clever tricks for efficient storage and retrieval which may be internal to the database system.
The structure of relational, hierarchical and network databases is compared in fig. 1, using lists of L3 collaborators and institutes to illustrate some of the features and problems.

Fig. 1. Examples of database structures: (a) relational; (b) hierarchical; (c) network. The examples use the L3 collaborator and institute lists: tables COLLABORATORS (NAME, INSTITUTE, GROUP) and INSTITUTES (INSTITUTE, ADDRESS, TELEX), with institutes MIT, Princeton, ETH Zurich and Michigan and groups Muon, TEC, BGO and HADCAL.

Relational Database Systems handle this sort of
'administrative' data most naturally. The data are arranged in tables where both the tables and the columns have names. All columns in a table have equal status; it is equally easy to list all collaborators who belong to Princeton, or all collaborators who are in the MUON group, or all collaborators whose names begin with 'P'. In languages like SQL, a single query can also extract, for example, the telex number of J. Branson, by a 'join' of the 'Collaborators' and 'Institutes' tables (a sketch of such a query is given at the end of this section).
Hierarchical Database Systems are fairly closely adapted to the needs for data storage in HEP. However, my simple administrative example serves to point out some of the logical difficulties encountered when using hierarchical systems. The fact that the L3 collaborators may be subdivided into either Institutes or Groups forces us to choose between duplication of information (to simplify access) or very complicated 'navigation' of the database to answer seemingly simple queries. In fig. 1 I have chosen to duplicate some information for ease of access. Duplication brings with it not only a waste of resources, but also Mr. Murphy's assurance that the 'duplicated' data will never be the same in the two places.
Network Database Systems fix up some of the problems of hierarchical systems by allowing a departure from a simple tree structure. The figure shows how additional links remove the need to duplicate information. However, since the structure of a network database is arbitrary, it can be queried only by a person or process with detailed knowledge of the structure. In spite of (or because of) their potential for chaotic complexity, small network databases are very popular in HEP as memory-resident 'event data-structures'. It is debatable whether it is desirable to permit such complexity in larger HEP databases.
The example I gave helped to bolster the propaganda statements that relational databases are simple to use because they offer 'automatic navigation'. I will now try to look less kindly on this propaganda. As soon as a relational system is used for any large scale application, it becomes necessary to impose an effective hierarchy. The data must first be subdivided into tables.
Then unpleasant things called 'indices' must be created on columns or combinations of columns to make access to the data acceptably efficient. The creation of indices is a statement of how the data are likely to be accessed; in other words it is a statement of their hierarchical structure. It is my opinion that, for many HEP applications, such as the handling of alignment/calibration information, a relational database system requires just as much additional software as a more simple hierarchical 'access method'.
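To make the 'join' mentioned above concrete, the telex query could be written roughly as follows; the exact syntax is my illustration, using the table and column names of fig. 1:

SELECT I.TELEX
  FROM COLLABORATORS C, INSTITUTES I
  WHERE C.NAME = 'Branson, J.'
  AND C.INSTITUTE = I.INSTITUTE

The 'automatic navigation' consists of the WHERE clause matching the INSTITUTE columns of the two tables; the user never says how the match is to be performed.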
6. Personal history

To understand why I am prepared to go to a conference and talk about databases, we must look back a few years to the EMC muon scattering experiment. I joined EMC before the experiment was installed and I was involved in installation, data-taking and analysis for five years. EMC was the largest collaboration in HEP, and like most of its contemporaries and predecessors, it used the HEP-standard notebook/back-of-envelope/non-existent data management and analysis management system. As, I suspect, in many other experiments, this system delayed the publication of physics results, reduced the amount of usable data, and increased the systematics we had to quote. Perhaps because we had almost no competition, this was a shame rather than a disaster.
7. L3 database history

L3 is the largest and most precise HEP detector ever built. For example, the muon chambers occupy a volume 12 m long by nearly 12 m in diameter and can measure muon trajectories with a systematic error of less than 30 microns. Manual management of the data from the many alignment and calibration systems would be a hopeless task. The L3 software group began to look at the 'database problem' in 1984 and quickly formulated the following questions:
1. How much data?
2. Frequency of access? Read access? Write access?
3. How to use the database on diverse computers in diverse places?
4. Does a suitable system exist? In HEP? Commercially?
A survey of the needs of the components of the experiment [1] provided rough answers to the first two questions. The database would grow by several hundred megabytes per year, and data would be read from it every few events when processing events. Write access would be less frequent. It was also obvious from the beginning that L3 could not use any system which was so expensive, or so machine dependent, that it could not run at many home institutes. Most people have heard horror stories about database systems which consume complete mainframes and still take minutes to answer simple queries. Against this backdrop, and knowing that our application was somewhat unconventional, we decided to build a 'model database' using any available tools. Although the primary motivation for the model database study was to see whether we needed to panic, it soon developed into a way of making comparative evaluations of database systems.
8. L3 database evaluation

8.1. The model database

To understand the approach used in the model (and final) L3 database system, it is essential to consider the following axiom.
AXIOM: No existing database system should be used to store just a few numbers per row (for bulky data).
If, for example, we tried to store the calibration constants for 12000 BGO crystals in 12000 rows, the results would be inefficient storage and murderous access times. Although I have elevated this statement to the level of an Axiom, it is really
pure pragmatism. Future database systems may have sufficient intelligence or psychic powers to cache in memory most of the rows a program is likely to want. Thus using a commercial relational system to describe and access event data structures may eventually prove efficient and elegant.
The L3 model database contained geometry, calibration, alignment and status data in large chunks. Some of the chunks, for example a calorimeter calibration, were assumed to be about 200 kbytes, whereas more typical chunks were 20 kbytes to 40 kbytes long. Some of the data were assumed stable, but most chunks had validity periods ranging from minutes to days. To query the database, a user program might ask for:
- Muon-Chamber Geometry
- on 1-April-1990 at 11:59
- Latest Version
The program would then:
- do nothing if the Muon-Chamber Geometry currently in memory was valid on 1-April-1990 at 11:59,
- otherwise read a directory record and as many update records as necessary to get a valid 'chunk',
- return an 'old version' if explicitly requested.
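In Fortran terms this is the familiar cached-lookup pattern. The following is a minimal sketch of the logic; the routine and variable names (GETVAL, DBREAD and so on) are illustrative inventions, not the actual model-database or L3 code:

      SUBROUTINE GETVAL(PATH, ITIME, CHUNK, MXWORD, NWORD)
*
*     Sketch of the validity-check logic described above.  PATH
*     identifies the data object, ITIME is the detector time for
*     which a valid chunk is wanted; the chunk is returned in
*     CHUNK(1..NWORD).  All names here are illustrative.
*
      CHARACTER*(*) PATH
      INTEGER ITIME, MXWORD, NWORD
      REAL CHUNK(MXWORD)
*
*     The chunk currently in memory and its validity interval
      CHARACTER*64 CPATH
      INTEGER IVFROM, IVUPTO, NCACHE
      REAL CACHE(10000)
      SAVE CPATH, IVFROM, IVUPTO, NCACHE, CACHE
      DATA CPATH/' '/, IVFROM/0/, IVUPTO/-1/, NCACHE/0/
*
*     Do nothing if the chunk in memory is still valid
      IF (PATH.EQ.CPATH .AND. ITIME.GE.IVFROM .AND.
     +    ITIME.LE.IVUPTO) GO TO 100
*
*     Otherwise read the directory record and as many basic/update
*     records as necessary (DBREAD is a hypothetical stand-in for
*     the real retrieval calls) and remember the validity interval
      CALL DBREAD(PATH, ITIME, CACHE, 10000, NCACHE, IVFROM, IVUPTO)
      CPATH = PATH
*
  100 NWORD = MIN(NCACHE, MXWORD)
      DO 110 I = 1, NWORD
  110 CHUNK(I) = CACHE(I)
      END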
The model system was first implemented using the CERN 'Keyword Access Package' KAPACK. This package had such limited facilities that it was necessary to invent an ad hoc directory record format and to store these directories as KAPACK records. When the test program was later converted to run with SQL/DS, ORACLE and ZEBRA-RZ, laziness preserved the ad hoc directory records. It would have been much more elegant, and possibly more efficient, to use the directory or table searching capabilities of the more sophisticated packages. I will now review the facilities offered by the four systems before arriving, eventually, at a summary of their performance.

8.2. KAPACK

KAPACK [2] is a Fortran callable package written at CERN which can store and retrieve
variable length data with two keywords. Although it offers only two keywords, the keywords can be long character strings. In the model database implementation we used keywords like:

TEC.STATUS
UPDATE.SERIAL005736.VERSION007

Obviously, KAPACK is not a database system, and its main attraction was simply that it was free.

8.3. SQL/DS

SQL/DS is an IBM product for VM systems. The SQL query language was originally a product of IBM research. However, the published ideas were first implemented by the Oracle Corporation, and the IBM product is relatively new. SQL/DS is a full relational database system supporting (with effectiveness described later) both terminal access and Fortran access. Each column of an SQL/DS table can be declared to be one of ten types. The most relevant types are:

INTEGER       32-bit signed integer
FLOAT         IBM floating-point number
VARCHAR       variable number of characters <= 255
LONG VARCHAR  variable number of bytes <= 32763

The LONG VARCHAR field can be inserted and retrieved from Fortran. It cannot be displayed interactively and it cannot be searched in an SQL query. The KAPACK implementation of the model database was converted to SQL/DS by using five columns for the data which had been compressed into KAPACK keywords, and using a LONG VARCHAR column for the data itself.

8.4. ORACLE

ORACLE is also a full relational database system. The supported data types are:

NUMBER  integer or floating point, up to 20 significant digits
DATE    date and time (to one second)
CHAR    variable number of characters <= 255
LONG    variable number of bytes <= 65535
Apart from the addition of the very useful DATE data-type, the ORACLE data types are rather similar in functions and restrictions to those of SQL/DS. It was possible to convert the model database software from SQL/DS to ORACLE in about half an hour. Both SQL/DS and ORACLE were chosen for test because they were available at CERN. SQL/DS was installed for a trial period on CERNVM, and was later replaced by ORACLE.

8.5. Using SQL

I found SQL, and in particular its use from Fortran, full of pleasant and unpleasant surprises. For this reason, I think it worthwhile to describe some of what has to be typed or coded to use SQL. Most of my examples are identical in ORACLE and SQL/DS.

Terminal access

It was a pleasant surprise to find that SQL statements are usually easy to understand and remember. For example, here is how the model database table was created:

CREATE TABLE MASTERTABLE
  (DETECTOR CHAR(8),
   DATA_TYPE CHAR(8),
   RECORD_TYPE CHAR(8),
   VERSION INTEGER,
   SERIAL INTEGER,
   DATA_FIELD LONG VARCHAR)
  IN L3TESTDB
An unpleasant surprise was that it could take up to 15 minutes for SQL/DS to answer a simple query on such a table containing 100 megabytes of data. For both SQL/DS and ORACLE it was, in practice, essential to create an index on the table:

CREATE INDEX MYINDEX ON MASTERTABLE
  (DETECTOR, DATA_TYPE, RECORD_TYPE, VERSION, SERIAL)

Simple queries were indeed simple, for example,

SELECT DETECTOR, DATA_TYPE, RECORD_TYPE, VERSION, SERIAL
  FROM MASTERTABLE
  WHERE DETECTOR = 'TEC' AND DATA_TYPE = 'GEOMETRY'

which would result in:

DETECTOR  DATA_TYPE  RECORD_TYPE  VERSION  SERIAL
TEC       GEOMETRY   MASTER       0        0
TEC       GEOMETRY   BASIC        0        0
TEC       GEOMETRY   UPDATE       1        1
TEC       GEOMETRY   UPDATE       1        2
TEC       GEOMETRY   UPDATE       1        3
TEC       GEOMETRY   UPDATE       1        4
TEC       GEOMETRY   UPDATE       2        1
TEC       GEOMETRY   UPDATE       2        2
TEC       GEOMETRY   UPDATE       2        3
TEC       GEOMETRY   UPDATE       2        4
TEC       GEOMETRY   UPDATE       3        1
TEC       GEOMETRY   UPDATE       3        2
TEC       GEOMETRY   UPDATE       3        3

Fortran access

Both SQL/DS and ORACLE support Fortran access via a pre-compiler. In principle, the SQL commands are just prefixed by 'EXEC SQL' and inserted in the code. Some sample code from the subroutine which inserted data into the model database shows how this is done:

      SUBROUTINE PUTSQL(C1,C2,C3,I1,I2,NDATA,DATA)
      CHARACTER*(*) DATA
*
      EXEC SQL BEGIN DECLARE SECTION
      CHARACTER*8 C1,C2,C3
      INTEGER I1,I2
      CHARACTER*32765 LVCHAR
      INTEGER*2 STRNGL
      CHARACTER*32763 STRING
      COMMON /SDATA/ STRNGL, STRING
      EQUIVALENCE (LVCHAR, STRNGL)
      EXEC SQL END DECLARE SECTION
*
      STRNGL = NDATA
      STRING(1:STRNGL) = DATA(1:NDATA)
*
      EXEC SQL INSERT INTO MASTERTABLE
     1   (DETECTOR, DATA_TYPE, RECORD_TYPE, VERSION,
     2    SERIAL, DATA_FIELD)
     3   VALUES (:C1, :C2, :C3, :I1, :I2, :LVCHAR)
*
      EXEC SQL COMMIT WORK
*
      END

The INSERT statement is exactly equivalent to an INSERT typed at a terminal. In contrast, the unpleasant rigmarole in the 'DECLARE SECTION' is needed to handle the LONG VARCHAR data type in Fortran. This example is the SQL/DS rigmarole; the corresponding ORACLE rigmarole is different.
The COMMIT WORK command is a key feature of the SQL error recovery system. Until a COMMIT WORK is executed, any changes made to the database may be removed by a ROLLBACK WORK command. In the event of a serious error or crash, the database will automatically be 'rolled back'.
Retrieving data into Fortran variables is more complicated. A glance at the SELECT command and its result shown above demonstrates the problem. Although INSERT commands put in one row at a time, a SELECT command may return a table. The inelegant way in which this problem is 'solved' in SQL is shown below:

      EXEC SQL DECLARE CRSR CURSOR FOR
     1   SELECT DATA_FIELD FROM MASTERTABLE
     2   WHERE DETECTOR = :C1 AND DATA_TYPE = :C2
     3   AND RECORD_TYPE = :C3 AND VERSION = :I1
     4   AND SERIAL = :I2
*
      EXEC SQL OPEN CRSR
*
      EXEC SQL FETCH CRSR INTO :LVCHAR
*
      EXEC SQL CLOSE CRSR
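When several rows are expected, the cursor would typically be drained in a loop. The following sketch (my illustration, not from the original test code) shows one common form, using the embedded-SQL WHENEVER mechanism to detect the end of the data:

      EXEC SQL WHENEVER NOT FOUND GO TO 200
*
   10 CONTINUE
      EXEC SQL FETCH CRSR INTO :LVCHAR
*     ... use STRING(1:STRNGL), i.e. the row just fetched ...
      GO TO 10
*
  200 CONTINUE
      EXEC SQL CLOSE CRSR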
If there really are multiple rows satisfying the SELECT, then repeated execution of the FETCH statement will return the data from them.

8.6. ZEBRA-RZ

The ZEBRA package has been written at CERN to manage data structures within Fortran programs, and to transport these structures in a machine independent format. Within ZEBRA, the RZ utility manages the storage and retrieval of data on random access disk files. RZ supports a UNIX-like hierarchical tree of directories. Any directory may contain both sub-directories and data objects. Within a directory, data objects are identified by up to nine 32-bit 'keys'. Provided that RZ is given enough space, the directories and keys reside in memory, and only the data reside on disk. As with ORACLE and SQL/DS, the model database was only minimally adapted to take advantage of the power of RZ. The database was subdivided into directories like:

//L3/BGO/GEOMETRY
//L3/MUON/CALIBRAT

etc. Within the directories the data objects had keys:

Data Type (Master, Basic, Update)
Version No.
Serial No.
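For flavour, a retrieval in this scheme might look roughly as follows. This is a sketch only: RZCDIR and RZVIN are genuine RZ routines, but the argument lists are quoted from memory and should be checked against the ZEBRA Long Writeup [3], and the key values are invented:

      INTEGER KEYS(3), NDATA
      REAL CALIB(10000)
*
*     Select the directory holding the muon calibration objects
      CALL RZCDIR('//L3/MUON/CALIBRAT', ' ')
*
*     Keys identifying the wanted object: data type (here 3 for
*     'update'), version number and serial number; the coding of
*     the keys is invented for this example
      KEYS(1) = 3
      KEYS(2) = 7
      KEYS(3) = 5736
*
*     Read the keyed object into CALIB; the last-but-one argument
*     selects the cycle (i.e. which written copy) of the object
      CALL RZVIN(CALIB, 10000, NDATA, KEYS, 9999, ' ')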
8.7. Good news, bad news, anecdotes

In order to save the comparative performance information until the last moment, I will now present some snippets of information which didn't fit in anywhere else. First a caveat. Database systems, particularly commercial systems, develop rapidly. The problems and limitations which I describe existed when we tested the systems, but are unlikely to persist indefinitely. For example, when I attempted to criticise INGRES (which we did not test), it was immediately pointed out to me that all the limitations I described no longer existed.
KAPACK

KAPACK provided no surprises. It is a simple
Fortran callable system which keeps no record of data types, provides single user access only, has no rollback facilities, etc.

SQL/DS

I think we tested version 0.001 of the Fortran binding. The very first version we tried didn't work at all, but this was rapidly replaced by a version that was merely cranky. Apart from constructs which wouldn't work unless coded in some particular way, and the tendency of the pre-compiler to crash in response to syntax errors, there was an annoying architectural limitation. The pre-compiler retrieves information about a table from the database before generating the Fortran calls to satisfy a query. Thus, a program which creates, fills and queries a table will be rejected by the pre-compiler.
Not specifically a Fortran problem, but one which had a severe effect on the test results, was the need for explicit requests to update the indices on a table. Before creating an index, a model database query took 15 minutes. The creation of the index also took 15 minutes, but after this queries were fast. However, if more rows were added, these rows were not automatically indexed, and query times expanded dramatically again.
In most other respects SQL/DS did not give problems. The system got stuck a few times, but since it was only being offered as a trial service by 'managers' only marginally more familiar with it than the users, this should not be taken too seriously.

ORACLE

The ORACLE Fortran pre-processor gave us no problems, although the comparison with SQL/DS is not entirely fair since we used the code which SQL/DS had finally been persuaded to accept. Indices were treated in the way we wanted. It was possible to create an index on an empty table and the index was updated automatically as the table was filled. When we tested ORACLE it had just been installed as a service on the VM and VAX systems at CERN. However, it had been in use by the LEP machine group for more than a year, and so it is not surprising that it appeared more stable than SQL/DS. Nevertheless some
instances of crashes involving data loss have been recorded by other ORACLE users at CERN.

ZEBRA-RZ

Like KAPACK, RZ [3] is a relatively simple Fortran callable package written within HEP. RZ scores over KAPACK in its hierarchical directory structure, its support for data-typing and a machine-independent 'exchange mode', and in its integration with the increasingly popular ZEBRA data manager. Each directory can be password protected, and multi-user write access is supported by allowing individual directories to be locked and unlocked by Fortran calls. By caching the directory structure and the 'keys' in memory, the package ensures that most retrievals require only one disk read. ZEBRA itself provides facilities for storing a description of the data-types within each 'bank' and for automatic translation according to these types when the data are transferred from one machine to another. RZ was very new when we tested it. Minor bugs were still emerging, and the specifications were still changing (often after requests from L3). My impression was that stability was not far away.

8.8. Performance tests

The results of the model database tests are shown in table 2. Please remember that the L3 model database was a very specific test, and further, that the model was first optimised for the simplest package KAPACK, and then minimally restructured to use SQL/DS, ORACLE and ZEBRA-RZ. Before I draw any conclusions from the results it is necessary to explain some of the figures.
The database size usually had little effect on performance provided an index was created over the columns acting as keywords in the SQL/DS and ORACLE tables. The notable exception was insertion into an SQL/DS table, where the necessary occasional retrieval of model database directory records took longer and longer as more 'unindexed' rows were added to the table. As implied above, the 'insert performance' figures include the time required to retrieve directory
records for modification. Where possible I have broken the timings down into the time to retrieve a small row and the time per kilobyte to retrieve large rows. In some cases, only an average time was available, which is most properly compared with the large row timing since the average size was about 30 kilobytes. The very small insertion times for small rows in RZ reflect the buffering in memory which required only occasional disk writes. (They also reflect the fact that the buffered data would have been lost if the program had crashed.) The 'read performance' figures show much less spread, partly because no package had major problems, and partly because the disk access speed provided an upper limit on performance. It is my prejudice that the relatively poor performance of RZ on an IBM 3090 (compared with KAPACK on a 3081) was due to strangulation by the VM/CMS file system. KAPACK running under MVS did not have this problem.

Table 2
Results of the L3 model database tests

                           ORACLE    ORACLE    SQL/DS    KAPACK    RZ        RZ
                           IBM 3090  VAX 8600  IBM 3081  IBM 3081  IBM 3090  VAX 8650
                                               (night)   (day)
Database size
  Megabytes                120       6         120       420       18        18
  'Rows'                   7000      350       7000      20000     1000      1000
Insert performance
 Elapsed time
  small row (ms)           330       77        200-1000  100-300   <10       <10
  large row (ms/kbyte)     26                  13-77     6-17      6.5       3-7
 CPU time
  small row (ms)                               25                  1         1
  large row (ms/kbyte)                         3.5                 0.7       0.7
Read performance
 Elapsed time
  small row (ms)           55        100-150   59        20        50-80     78
  large row (ms/kbyte)     8         10        7         4         6         4
 CPU time
  small row (ms)           2                   25        1         2         1
  large row (ms/kbyte)     0.4-0.7             2.7       0.65      0.13      0.32
Read model L3 reconstruction
environment: 40 rows,
500 kbytes                 6 s       10 s      6 s       3 s       6 s       5 s

8.9. Conclusions of the L3 model database tests

As in any other branch of computing, those who need the very highest performance should
write the software themselves (in assembler, of course). If you would be happy with what I consider acceptable performance, then you could use ORACLE, RZ or KAPACK (plus probably other systems not tested, such as INGRES). Whichever system you use, you will have to add a lot of software, both to support the high level functions needed by the application, and to optimise performance where the simplest way to use the database would be too slow. Since RZ offers much more than KAPACK, and is still free, I have no hesitation in narrowing the choice to RZ versus ORACLE.
Both RZ and ORACLE would have been technically acceptable to L3. Both are available on the main IBM and VAX systems at CERN, and ORACLE could be installed on the L3 data acquisition computer at tolerable cost. Some of the more administrative applications in L3 already make use of ORACLE, and since there is no clear point at which administration stops and physics begins, it would have simplified matters to use ORACLE throughout.
Although it is clear that RZ has many strong points, its choice as the service for all L3 offline
software was made primarily, and inevitably, on financial grounds. Requiring any institute in the collaboration to install ORACLE as a pre-requisite for most L3 analysis software was an impossibility. The ORACLE corporation were ready to negotiate on price, on the basis of acceptable restrictions on use and on support, but we ended up a few factors of two away from something I had the energy to sell to L3.
9. The L3 database system
The detailed design of the L3 offline database system is now in progress, and attempts will be made to extend the system to as much of the online system as possible. I will not give details of this work, but I would like to outline what we intend to achieve.
The L3 offline database has two almost distinct functions. First, and most mundane, is the storage and retrieval of information related to the state of the detector. These data objects will be identified principally by 'pathname' and 'detector time', the data-taking time for which they describe the detector. For special purposes the insertion time and inserting process identification may also be relevant. The first part of the L3 database system, which is already working in a simple form [4], manages the creation and usage of time-keyed data objects. In this context 'usage' has a rather precise meaning, involving physical retrieval only if the object is not in memory or is not valid for the current detector time.
The second part of the L3 database system will manage data about the state of the analysis. This will involve much more ambitious logic than the data-intensive but conceptually simple 'state of the detector' database. The 'analysis log' database is still at an early design stage. The aims are to keep a full and automatic record of all activities such as calibration processing and event reconstruction. Program identification, details of any code modifications, details of all input parameters and datasets, details of output datasets and database objects, should all be recorded automatically.
10. Other examples of database usage by HEP experiments

L3 did not invent the idea of using database systems in HEP experiments. Among the LEP experiments, DELPHI was the first to build a database system (using KAPACK) and ALEPH performed careful reviews of the commercial products available about two years ago [5]. I apologise in advance for oversimplifying the situation, but I still think it is valuable to review the database usage at CERN with which I am reasonably familiar. I will ignore all 'trivial' uses such as lists of collaborators.
ALEPH are using ORACLE as the 'master database' for online and offline applications. Why can ALEPH afford ORACLE when L3 cannot? A full answer would require a detailed description of the ALEPH software design. A simple answer is that it is not intended that the majority of offline programs access the ORACLE database. Database information needed for offline processing will normally be extracted from ORACLE and written on to the data tapes. Within offline programs ADAMO will give the software access to the database information. By accepting severe limitations on program-database interaction, ALEPH have ensured that they will only need ORACLE on computers at CERN.
DELPHI are paying a small penalty for being the first to build a database system into their software. Their KAPACK based system was designed three years ago when KAPACK was the only system available at CERN. They may now be considering a move to ZEBRA-RZ in view of its better facilities and likelihood of long-term support.
OPAL have still not made any firm decisions, and therefore might be able to do better than everybody else. My understanding is that use of ORACLE or ZEBRA-RZ or both is being discussed.
The Fastbus Management System is a project led by CERN-DD with strong participation of the two experiments with the most complicated Fastbus readout systems, ALEPH and DELPHI.
ORACLE, accessed through an ADAMO interface, will be used to store details of every module and crate in a readout system. Information on the physical interconnections will also be stored so that (for example) routing tables can be generated almost automatically with user guidance being sought to select preferred routings.
11. Concluding comments
Commercial database systems almost made it as general purpose tools for LEP-era HEP software. This time round a combination of unfamiliarity and cost has restricted their wider use, and promoted the writing of efficient but rather restricted packages within HEP. For the next (SSC) generation of experiments, further performance enhancements and cost reductions may make commercial systems more attractive. Nobody today complains about having to install Fortran, and there is some chance that the idea of installing a database system will approach the same level of acceptability.
A final comment for those who notice all the features of commercial systems which HEP doesn't really need. We don't refuse to buy IBM computers because they have decimal instructions, and we must accept that any successful product will not be matched exactly to the HEP needs. The most that we can hope for is that our needs will be met in some respects and exceeded in many others.

References
[1] L3 Offline Software Group, Requirements for the L3 Data-Base, L3 Report No. 315 (1984).
[2] KAPACK, CERN Program Library No. Z303; R. Matthews, KAPACK Long Writeup, August 1983.
[3] ZEBRA, CERN Program Library No. Q100; R. Brun, M. Goossens and J. Zoll, ZEBRA Long Writeup, 1987.
[4] E. Nagy, The Data-Base System DBL3, L3 Report No. 468 (1987).
[5] Examples of ALEPH studies: T. Kokkinias, P. Palazzi, S. Santiago and E. Zevgolatakos, Evaluation of BCS RIM, a Relational Database Management System, ALEPH SOFTWR 84-07 (1984); R. McClatchey, Evaluation of a Commercial Relational Database Management System for a Particle Physics Experiment, ALEPH-COMPRO 85-3 (1985); S. Fisher, ALEPH Data Base Workshop, ALEPH-COMPRO 85-5 (1985).