OSIPE — a tool for scientific programming in FORTRAN

OSIPE — a tool for scientific programming in FORTRAN

Computer Physics Communications ELSEVIER OSIPE Computer Physics Communications 81(1994) 293—317 — a tool for scientific programming in FORTRAN Fr...

2MB Sizes 5 Downloads 33 Views

Computer Physics Communications ELSEVIER

OSIPE

Computer Physics Communications 81(1994) 293—317



a tool for scientific programming in FORTRAN

François Colonna “

a,

Luc-Henri Jolly a Raymond A. Poirier Georg Jansen

b,

János G. Angyán

Laboratoire de Dynamique des Interactions Moléculaires (UPR 271), Université Pierre et Mane Curie, 4, Place Jussieu, 75005 Paris, France b Department of Chemistry, Memorial University of Newfoundland, St. John’s, Newfoundland, A1B 3X7 Canada Laboratoire de Chimie Théorique, Université de Nancy I, B.P. 239, 54506 Vanda?uvre-lès-Nancy, France

Received 12 July

1993

Abstract FORTRAN tools have been designed to support well-structured, modular, route-independent and easily interfaceable programming. This set of tools, which creates a kind of low-level object-oriented programming environment, is refered to as OSIPE (Open Structured Interfaceable Programming Environment). Its main role is to transform meaningful FORTRAN variables into Objects which are linked via “created-by” relationships. Two programs, written in the OSIPE environment, an ab initio electronic structure code, MUNGAUSS, and a novel implementation of a quantum chemical solvent effect theory, SOLVENT, are briefly described. Efficient tools are provided to restructure existing FORTRAN codes and make them compatible with the OSIPE environment.

1. Introduction Scientific computer programs are often becoming, during their evolution, quite complex and very programmer unfriendly, containing many idiosyncrasies of the programmer(s). Large computational chemistry codes provide a typical example, where modifying or adding new features can be tedious and hazardous, even for those well acquainted with the code. In general, when working with these program systems, changing or adding new features requires knowing a lot of technical details, for example: the name of the variables and whether they are in common blocks or on disk; the unit number of the files used by the program; the options, which are often identified by numerical values; such as: —

— —

if(IOP(20).eq.1O)

then;

the names of the various commons and their content; how to ensure that all the data required to compute a variable is available, that is, how routes #1 (sequence of calls) are generated; how to redimension (if possible). Furthermore, these programs tend not to be modular and portable. All in all, these codes are quite difficult to understand and hard to debug, which may limit their value as an efficient research tool. — —



OO1O-4655/94/$07.OO © 1994 Elsevier Science B.V. All rights reserved SSDI 0010-4655(94)00016-U

294

F. Cabana eta!. / Computer Physics Communications 81 (1994) 293—317

OSIPE (Open Structured Interfaceable Programming Environment) eliminates these difficulties. OSIPE is not a language but a set of simple but rigorous FORTRAN programming tools that make adding and understanding code easy. The tools guarantee: (i) unambiguous identification and access of any Object #2 anywhere in the program (Open), (ii) the easy addition of any new features to an existing code (Interfaceable), and (iii) that additions will have no effect on the program logic (Structured). Unambiguous identification of an Object means that its content is always guaranteed to be well-defined and cannot be overwritten accidentally. OSIPE considers a program as an ensemble of independent routines creating Objects which occupy well-defined memory segments. The Open, Structured and Interfaceable properties mentioned above are realized obeying two basic principles: (i) an Object can only be created by one routine (its allocationroutine), which creates nothing else, and (ii) in order to ensure their accessibility all Objects are stored in a single array, located in a large, unique common block. A code written according to the first principle is necessarily structured and modular. In order to cope with the above requirements Objects are handled in OSIPE by the functions putobj, getobj and bldobj. Function putobj returns the address of a new-Object, i.e. an Object to be created and therefore supposedly not existing. Any previously existing version of this Object will be deleted. Function getobj returns the address of an existing-Object needed to create a new-Object. If the existing-Object is not present in memory or on disk, g e to b j activates its creation by calling a third function, bldobj. This function “knows” which allocation-routine is associated with a given Object and executes it. Adding a new feature to a program merely involves adding a new item to b Id o b j. The program is then able to automatically build this new-Object whenever it is required. By the virtue of these tools to handle Objects, the concept of route and of main have disappeared. For example, the request to print an Object triggers its creation and recursively the creation of all Objects it depends on. In addition to the three principal functions OSIPE provides other tools written in standard FORTRAN, necessary for safe, route-independent and easy management of Objects: compression, deletion, printing, checking and saving, and also to transform semi-automatically the old code into OSIPE-code. The OSIPE project is not intended to make any contribution to ongoing research in computer sciences. Instead, it just tries to add some of the concepts of non-procedural programming languages to standard FORTRAN in order to facilitate the evolution of computational chemistry codes. The text below presents a brief overview of facilities offered by OSIPE. We shall define and present the concepts used as well as the main tools. Furthermore, two scientific codes, MUNGAUSS and SOLVENT, which exemplify the use of OSIPE are briefly described.

2. OSIPE: an overview In this section, we will show how the principle, essential for program extension, “an Object can be built from anywhere in the program” is implemented via OSIPE tools. 2.1. Global data communication When programming in FORTRAN one is faced with two choices for communicating information throughout the program: common blocks or argument lists. In general the most important FORTRAN

~ #2

The route structure is due to the use of a procedural language (FORTRAN) and is a serious obstacle to code evolution. Here, Object is taken to be entity the program is able to address i.e. scalar, vector, matrix, tensor,... It is not a type extension as

in Object-Oriented Programming [1,2]. It is yet an Object in the sense that its creation is implied by its declaration.

F. Colonna eta!. /Computer Physics Communications 81 (1994) 293—317

allocation—routine: subroutine ALLOC_O common / CStack / stack(maxstk) common / CO / addresso, dimension_0 common / COl / address_O1, dimension_Ol common / C02 / address_02, dimension_02 * gathering addresses: address_O = allocation-of-a—new—Object-in-Stack * computation: call CALC...O (Stack(address_O), Stack(address_O1), ! from common Stack(address_02), ! from common dimension_O, dimension_Ol, ! from common dimension_02) from common return endS algorithmic-routine: subroutine CALC_O (0, 01, 02, dimension_O, dimension_Ol, dimension_02) double precision O(dimension_O) double precision O1(dimension_O1) double precision 02(dimension_02) algorithm 0 = function—of (01, 02) return end

295

COl C02 COl C02

Fig. 1. Example of an allocation-routine without argument programmed with usual FORTRAN. Object 0 is created from Objects 01 and 02. 0’s address is computed in a special function. The addresses are stored in common blocks and passed to the algorithmic-routine as addresses of stack elements.

variables (which are considered in OSIPE as Objects) are stored as over-dimensioned arrays of a

common block. This procedure makes them global, i.e. accessible from any part of the program, but implies a loss of memory space, and a dangerous and inconvenient management of the array dimensions. Moreover, each common block contains several variables, some of which are not necessarily used in a given routine and can interfere with local variables. On the other hand passing all the variables as arguments of a subroutine allows for the use of exact dimensions, but this method makes the maintainance of a large code very difficult: all routines depend closely on each other by their argument lists. In some sophisticated codes, such as CHARMm [31or Xplor [4], a combined ‘tipproach is used, which guarantees minimal memory occupation, and encapsulation of local variables (non-interference with global ones). These codes are written according to the scheme illustrated in Fig. 1. • All the important variables are stored as segments of one large array (named stack), located in a common block and refered to here as the S t a c k. Allocation is done dynamically during execution of the program. The position in the S t a c k (stack-address) and the size (dimension) of the variables are stored in other common blocks. This procedure avoids over-dimensioning of arrays and ensures globality. • In order to recover individual arrays stored in the S t a c k segments one uses the property of FORTRAN to pass variables by address. Subroutines are therefore sub-divided into allocation- and algorithmic-routines. The allocation-routine contains the S t a c k common block and the common blocks with the stack-addresses and variable (arrays) sizes. It calls the algorithmic-routine to which variable

296

F. Colonna et at. / Computer Physics Communications 81 (1994) 293—317

addresses are passed by its argument list as addresses of elements of the S t a c k (stack-addresses). This algorithmic-routine performs the calculation. Arrays coming from the argument list can therefore be dimensioned exactly and the other variables, local to the routine, cannot interfere with the content of the common blocks located outside. Fig. 1 illustrates such an implementation. In this example Object 0 (array a) is built from two existing Objects (arrays) 01 and 02. The calculation of 0 from 01 and 02 is done by the algorithmic-routine CALCU. In this subroutine the arrays 0, 01, and 02 are dimensioned exactly at dimension 0, d i men s i onUl, d i men s i on_U 2. ALL U Ca, on the other hand, does no calculation, has no argument list and its purpose is simply to extract from the S t a c k Objects 01 and 02 necessary for the calculation of U and to allocate it. This allocation is done by a function which knows where to implement any new Object in the Stack, given its size. In our example, 01 is stored at index address 01 and 02 at index add r e s s_U 2 of the array stack. In other words, the first value of the array 01 is located at the address of the Stack element stack(address 01), and thevalue Ui(i)stack(address 01). The stack-addresses of Objects 0, 01 and U2, (indices in array stack: address_U, address_Ui, address_U?) and their size (di mensi onO, dimension 01, di mensi on 02) are stored in common blocks C U, CU 1, CU 2, respectively. The memory addresses of Objects U, 01 and U 2 are communicatedtoCALCoargumentlistasmemoryaddressesofstack(addresso),stack(addressUi) and stack (address 02). Programming in this way ensures the modularity of the code and good encapsulation of the algorithmic parts avoiding possible interference between Objects. Moreover, this kind of code is relatively easy to maintain and modify. 2.2. Route independence As it stands, this approach yet does not satisfy the open requirement that, “an Object be accessible from anywhere” as the programmer has no information whether the Object already exists. Therefore, this approach still requires a deep knowledge of the route leading to Ui and 02 (i.e. compute Ui, compute 02 then compute U) before implementing AL LU C_U, which is exactly what we want to avoid. To overcome this difficulty the program would need to “know” whether a given Object, say 01, has been built, and if not, how to build it dynamically. Another difficulty which arises from this approach is that stack-addresses and sizes are handled using FORTRAN variables stored in common blocks spread all over the program. The programmer has to manipulate a lot of common blocks and to know in which common block the required stack-addresses are stored. Furthermore, no dynamic creation of Objects is possible. The management of Objects can be greatly simplified by giving each object a name (i.e. associate them with a character string, which is stored in the character array o b j n am (see Section 3.1). Object management tools can easily be built, and any kind of information associated with each Object (stack-address, size, type, name of allocation-routine etc.) can be accessed from the knowledge of its name by a simple scan of obj nam. All the common blocks like CO, CU1, CU? are no longer required. Moreover, it also becomes easy to check whether an Object is stored in memory, on disk, or has simply not been built yet. In that latter case, it is straightforward to build it by calling its allocation-routine. The implementation of such a methodology is shown in Fig. 2. Contrary to the example in Fig. 1 the allocation-routine A L LU C_U has only one argument: the name of the Object to be built. Since there is a unique relationship between the Object-name and its allocation-routine, having the Object-name as an argument does not interfere with our Open requirement. This argument is, in fact, necessary if one wants to build the names (at execution time) of Objects 01 and 02 from the name of U, by simple string manipulations (see Modalities in Section 3.4).

F. Colonna et a!. / Computer Physics Communications 81(1994) 293—317

297

allocation-routine: subroutine ALLOC_O (Object-name) include ‘stack.h’ ! contains array stack character* (*) Obj ect—name character*(maxstr) 0, 01, 02 lbuild .false. ! initialize * build existing-Object names from modalities of Object—name (if any): 01 = function-of (Object—name) 02 = function-of (Object—name) * gather indices: index_Ui = getobj (01) ! set lbuild=.true. if 01 does not exist if(lbuild) index_Ol = bldobj (01) ! automatic creation if needed lbuild re—set to .false. index_O2 = getobj (02) if(lbuild) index_02 = bldobj (02) * creation: index_O = putobj (Object—name, dimension_U, ‘REAL’, 8) * gather addresses: address_U = objadd(index...O) address_O1 = objadd(index_01) address_02 = objadd(index_02) * computation: call CALC_0 (Stack(address_O), Stack(address_O1), Stack(add.ress_02), dimension_0, dimension_Ol, dimension_02) return end algorithmic—routine: subroutine CALC_O (0 01, 02, dimension_0, dimension_Ol, dimension_02) double precision O(dimension_O) double precision Oi(dimension_Oi) double precision 02(dimension_02) Non-declarative part of the code 0 = function—of (01, 02) return end -

Fig. 2. OSIPE’s implementation of the building of Object 0 from two other Objects 01 and 02. The differences between this version and that shown in Fig. 1 are inserted in the inner frame. They clearly show the OSIPE specificity: Objects are not handled as FORTRAN variables but as character strings.

The indices (i ndex 01 and i ndex 02) of 01 and 02 in array objnam are gathered by the function g e to b j. This function switches the logical variable I b u lid to t rue. when the Object does not exist. This activates the function b I do b j, which creates the missing Object and returns its index (see Sections 4.2 and 4.3). When all necessary objects (here 01 and 02) are built, 0 is allocated by putobj (see Section 4.1). The stack-addresses are collected from the array objadd #3 and the calculation can be done in C A L C_U. Although this technique is obviously closely related to that of example 1, the introduction of the management of Objects by name represents a significant qualitative improvement, namely, the dynamic creation of Objects by b Id o b j. This frees the programmer not only from having to know in which

298

F. Cobonna eta!. /Computer Physics Communications 81 (1994) 293—317

FORTRAN variable the stack-address is stored but also about the route leading to the creation of an Object. Routine AL LU C_U is now completely independent of the way the rest of the program has been written: an Object address can be provided from any place in the program and an Object can be created dynamically if it does not exist. It must be emphasized that the algorithmic-routine (CAL CU, here) is independent of OSIPE tools and can be extracted without any change from an old code (there is a tool to do that, see Section 9). However, it must obey the simple rule “an Object can only be created by one routine, which creates nothing else”. If it is not the case, the code has to be restructured. The route problem having been solved, there is no need to know neither how nor when and where the Objects are built and the program can be easily interfaced or modified by people other than the author. The dynamic creation of Objects has another interesting property: it facilitates the parallelization of program execution on clusters. If we denote the relationship “created-by” by “<=“ then routine ALLOC U of Fig. 2 can be summarized by the scheme: 0<

=

01

0< = U? Suppose we also have these relationships: 01<

=

Oil

01< = 012 02< = U21 02< = 02? Any request to build 0 will set up the dynamic route which is the tree structure shown in Fig. 3. For cases where the branches of the tree, or at least sub-trees, are independent, they can, in principle, be executed on different processors. In other words, the creation of Objects leads to independent sub-tasks, each of them can be executed simultaneously on different processors (or cluster of workstations). This facility is under development.

3. Objects 3.1. Memory management

In order to ensure complete accessibility of Objects, the memory is reduced to a unique array S t a c k, of ma x s t k 8-bytes elements, located in a common block C S t a c k of include file (stack.h). The CS t a c k common image is shown in Fig. 4. In a certain sense the main role of the OSIPE package is to manage the Stack.

In OSIPE an Object is an arbitrary partition of the Stack, aligned on a 8-byte boundary. As mentioned in the introduction and in Section 2, it is produced by a unique routine: its allocation-routine. Each Object is associated with at least five items stored in arrays and common blocks of include file S t a c k h. These items include: name (array ob j n am), number of elements or cardinal (array o b j e I in), .

#3

The stack-addresses which have been defined by putobj could have been modified by compressing objects (cprobj) or by

dumping them on disk (freobj), that is why they are kept in the array objadd and not directly provided by getobj.parallelization of program execution on clusters.

F. Colonna et a!. / Computer Physics Communications 81(1994) 293—317

0 ~D

1

___________________

011

~

012

299

021

02

022

Fig. 3. Tree structure defined by the “created-by” relationship between Objects. This tree represents the dynamic route followed by OSIPE to build Object 0 from Objects 01 and 02, and Objects 01 and 02 from Objects 011, 012, 021, and 022. This structure can be used as a basis to parallelize the code automatically.

type (array o b j ty p), word size in bytes (array o b j by t) and address (array o b j add). All these arrays are indexed according to the sequence of Object names in the array ob j n am. These five arrays are described below: ob j n am is a character array containing the names of the Objects, which can be of arbitrary length. Names are usually 20 to 50 characters long. Each name is composed of a sequence of fields separated by a ““and freely chosen by the programmer. At present these fields do not obey any hierarchy. For example: DENS I I Y_M AIR I X could be the name of an Object containing the density matrix. As far as an Object is not created dynamically during execution, its name can be known at compilation

double precision head-mark, Stack common / CStack /head-mark, Stack(maxstk) <—Object 1 ——><—— Object 2 -->. . .<-—last Object——> content: HEADMARX<--data-->END...MARK<--data-->END_MARK.. <--data-->END_MARK

I I

I I

II I I

head-ma4

end-ma4

address of let Object

.

I I

end—mark

II II

end-m9~k

I

address of 2nd Object

I last Stack address:lsLdd

Fig. 4. Schematic representation of the organisation of the common block CSTACK.

F. Cobonna et a!. / Computer Physics Communications 81(1994) 293—317

300

time and therefore its index in array o b j n am can be determined a priori even if the object has not been stored yet. In this case its address is undefined (= 999999). ob j e I m is an integer array which stores the number of elements (cardinal) of each Object. o b j ty p is a character array which stores the type of each Object (‘REAL’, ‘I NI E GE R’, ‘BOO LEAN’ or ‘CHARACTER’). o b j b yt is an integer array which stores the length in bytes of the elements for each Object, (i.e. 8 for real Objects, 4 for Integers or Booleans and any integer value for Character strings). ob j add is an integer array which stores the address of each Object. An Object address is the index in the Stack of the first element of the Object i.e. stack (address) (see Section 2). 3.2. Scalars as objects Although Objects are usually arrays in OSIPE, it is also convenient to handle “global” scalars (single-word FORTRAN variables or individual strings) as Objects of one element, rather than storing them in common blocks. They can be any kind of parameter necessary to define the conditions of a given calculation. Scalars can be associated to a given Object or a collection of Objects. In this case their names are built from the name of the associated array-Object, to which the string _S Cx_ is added (“x” may be B, C, I or R) followed by a field specific to the scalar content: For example DENSITY MATRIX SCR THRESHOLD stores the threshold used to compute the arrayObject DENSITY MATRIX. In fact, OSIPE handles any name containing the string _S C x_ as a Scalar name. There are four types of scalars: SCB

for Booleans

_SCC_ for Characters

_SCI_ for Integers

_SCR_ for Reals

3.3. Matrices and tensors Matrix types It is sometimes useful to be able to recognize that an Object is a matrix (and which kind of matrix) or a tensor. This information is optional but can be specified by the addition of a six character field at the beginning of the name. For example, an upper triangular density matrix, can be named as MATUPI DENSITY MATRIX. There are five matrix types: MAT RE C_name: general rectangular matrix. MATSQG_name: general square matrix. MATSQS name: symmetrical square matrix. MATUPT name: upper triangular square matrix. MAT LOT_name: lower triangular square matrix. Cartesian tensors of any dimension can be names as: TENS2C name: 2 indices Cartesian tensor XX, XY, etc. TENS3C name: 3 indices Cartesian tensor XXX, XXY, etc.

301

F. Colonna eta!. /Computer Physics Communications 81 (1994) 293—317

Dimensions In the case of rectangular matrices and tensors, the dimensions are stored as scalar Objects. The names

of these scalars are built from the matrix or tensor name to which is added the field _s C ID

I MnO Fm.

This name characterizes the nth dimension of a tensor of m dimensions (D I Mension n 0 F m). For example, in the case of the rectangular matrix MATREC_COEF F_OCCUPIED_ORBITALS, the number of rows corresponds to the number of basis functions while the number of columns corresponds to the number of occupied orbitals. Therefore, the dimensions are stored in the two scalars MATRECCOEF F OCCUPIED ORBITALS Sd DIM1 OF2 =nb-of-rows and MATRECCOEF FOC CUPIED ORBITALS S C I_D 1M20 F2 = nb-of-columns respectively

In the case of tensors, 2 or more dimensions are stored, such as: TENS2C THIS THAT Sd TENS3C THIS THAT Sd

DIM1 0F2, DIM1OF3,

. .

...

.SCIDIM2OF2 SCIDIM2OF3,

...SCIDIM3OF3

3.4. Modalities: polymorphism of Objects Up to this point, Objects have been presented as individual entities. However, in some cases it is convenient to consider a group of Objects as belonging to the same class. For example, density matrices can be calculated using different formalisms (RH F, U H F, M P2 etc.) and albeit different they are all “density matrices”. Moreover, several Objects belonging to the same class may be required and therefore exist simultaneously in memory (Stack). Since Object-names correspond unequivocally to a memory zone, it is necessary to have distinct names for different Objects of the same class. However, some routines must be independent of which object in the class they are dealing with (i.e. the density matrix, irrespective of whether it is from an RHF, UHF,...). In those cases, the Object name must include a special field, containing the modalities of its creation, and which can be modified automatically during program execution. This technique avoids code duplication while ensuring the consistency of the content of the Object with the modalities of its creation. This is done by adding a “field” (currently 3 characters) to the name (which defines the class, e.g. ‘DENSITY_MATRIX )delimited by”:” and followed by the value or “modality” (string of 8 characters or less) of the field. For example, the different density matrices would be identified by: “=“

DENSITY MATRIX :WFNRHF DENSITY_MATRIX: WFN UHF DENSITY_MATRIX :WFNMP2 W F N is the field that defines the type of wave function used to build this particular density matrix. Several

fields can be included simultaneously in the same name, e.g. OBJECT CLASS NAME:FD1VALUE1:FD2VALUE2...:FDnVALUEn

4. Object management tools The first three functions described below constitute the core of OSIPE. They all return the index of the Object name in array 0 B J N AM and only indirectly the address of the Object (the index in array STACK) (see Section 4.2). Although this procedure may seem inefficient, it is the only way to get the current address of the Object just before calling the algorithmic-routine. In fact, the address can change by the deletion of an intermediate Object (cf. Section 4.7) between the call to g e to b j and the actual use of the Object address in the algorithmic-routine.

F. Cobonna et a!. / Computer Physics Communications 81 (1994) 293—317

302

4.1. putobj: Object creation The pu to b j function, called only by the allocation-routine, creates a new Object. If the Object already exists, it is deleted and then re-created. The function, which has four parameters: syntax: index = putobj (‘Object—name’, Cardinal, ‘Type’, Bytes) example: index U = putobj (‘0’, dimension—O, ‘REAL’, 8) returns the index in array o b j n a in of the Object-name. It allocates space in the array S t a c k for the new Object, initializes it to a standard value chosen to be (—8888.0) and fills the five arrays described in Section 3.1 with the name, the address, the number of elements, the type and the number of bytes by element. 4.2. getobj: Object calling Any Object address can be requested from anywhere. This is done by the function g e to b j, which returns the index in array o b j n am of an existing and properly initialized Object (not containing 8888.0). syntax: index = getobj (‘Object—name’) —

It works according to the following hierarchy: • gets the index of Object-name in array o b j n am. • gets the address as address = obj add (index) if the address is not defined (= 999999, Object not stored in the S t a c k and/or the Object does not exist, then: restores the Object from its “dump-file” (see Section 6); if no “dump-file” is available on disk, switches logical variable I b u i I d to t rue, to activate the building of the Object through the b I do b j function. Therefore, in the allocation-routine code, any call to g e to b j will be followed by a conditional call to b I do b j in this way: — —

-

index

=

getobj

if(Ibuild)

The”if

(‘Object—name’)

index

bldobj

=

(Ibuild) index

=

btdobj

(‘Object—name’)

(‘Ubject—name’)”lineisnotpartofgetobjtolower

the recursivity level of g e t o b j (it is also convenient for another reason, as will be seen in Section 4.4). • If the Object contains only 8888.0’s (Object not initialized) an error has occurred and the program —

stops. 4.3. bldobj: automatic building of Objects The automatic building of an Object (execution of its allocation-routine) is done by the function b Id o b j (‘0 b j e c t name’ In its simplest version, this function is structured as an “e 1 se if” construct using as many “else i f” statements as there are Objects in the program #4 —

else

if call

#4

).

(object—name. ALLOCO

eq.

‘XXX’)

then

(‘object—name’)

In a more sophisticated version written in C by A. Tabary (IBM-France), the routine addresses are stored in a sixth array rtnadd

indexed as objnam and directly executed, to avoid all the “else if” statements.

F. Co!onna eta!. /Computer Physics Communications 81(1994) 293—317

b I do b j is indirectly recursive and since many care has been taken #5

303

FORTRAN compilers are still not recursive, special

4.4. defobj: building Objects by default The situation may arise (usually during the creation of Objects by reading data files), when an Object must not be built by its allocation-routine but initialized to some default value. In this case, instead of calling bldobj the Ibui Id flag calls defobj. d e fob j is a menu similar to b I. d b o j but calls the default-routines instead of the allocation-routines. These routines compute the cardinal of the new Object whenever it is possible or use some reasonably

over-estimated maximum value stored in an associated scalar. 4.5.

cprobj: release of unused memory space

When the cardinal of an Object has been over-evaluated before its creation (when it is read from a

file, for example) it can be reduced to its current dimension by the routine c pro b j. This ensures contiguity of Objects in array S t a c k and minimizes memory occupation. syntax:

call cdpobj

(‘Object—name’, actual—dimension)

4.6. putscx and getscx: Scalar-Objects management

Two types of routines deal with Scalars: Put sc x and gets cx. Where the “x” stands for B, C, I or R depending on the scalar type. Eight different routines (not functions) are used to create a scalar Object with a given value or get back its value: syntax call putscb (‘Scalar—name’, Boolean—value) call putscc (‘Scalar—name’, String—value) call putsci (‘Scalar—name’,Integer—value) call putscr

(‘Scalar—name’,

Real—value)

call getscb (‘Scalar—nanie’, Boolean—value) call getScc (‘Scalar—name’, String—value) call getsci call getscr

(‘Scalar—name’, Integer—value) (‘Scalar—name’, Real—value)

Allocation of default values and automatic building of scalars is done by a function b I d s c I organized similarly to the function b I do b j used to build Array-Objects. 4.7. delobj and freobj: deletion of Objects An Object can be deleted from the S t a c k or freed (i.e. deleted and dumped on disk). In this case the

whole St a c k is shifted (to the left) to remove empty space and all the addresses of the moved Objects

#5

To avoid some spurious effects, the names of the input Objects and their indices are saved in an array of include file recurs.h

indexed according to the recursion level.

F. Cobonna et a!. / Computer Physics Communications 81 (1994) 293—317

304

are re-actualized in array o b j add, minimizing memory usage. syntax; call delobj

(‘Object—name’)

call

(‘Object—name’)

freobj

Complete reorganization of the S t a c k as a consequence of deleting several Objects at the beginning or in the middle of the stack might lead to a considerable increase of overhead. Nevertheless this is not necessarily a serious problem, since the typical case of Object deletion involves the work arrays of an

algorithmic routine. They can usually be created at the very end of the stack and after completion of the algorithmic routine deleted in reverse order. 4.8.

prtobj: printing of Objects

Any Object can be printed using the Pr to b j (‘0 b j e c t — name’) routine. This is a powerful debugging tool. It can be activated from the input stream by the “output” command or used in the FORTRAN code for debugging purposes in association with the “s e t debug” command (see debug-

ging tools in Section 8). 4.9. Modality tools

Two functions, g e t mod and Pu t mod, allow the program to pick up the modality value (see section 3.4) of a given field (g e t mod) or to attribute a given modality value to a given field (pu t mod). Two Objects are associated with each field: The first one, MOD A L I T Y_N AMES: F L D is a character array which contains all the modality-names associated with the field F L D. In the example of Section 3.4, F L D = W F N and therefore the object MODALITY NAMES:WFN contains the three names: RHF, UHF and MP2. The second one, MODALITY_NAMES: FLD SCC CURRENT is a scalar character Object which contains the current modality associated with the field F L D. In our example MODAL I T Y_N A ME S : W F N = _SCC_CURRENT may take the value ‘RH F’, ‘UHF’ or ‘MP2’ depending on the part of the program executing. The following illustrates the way in which modality functions can be used in a program: function getmod: This string function extracts the modality value of field from an Object-name. syntax

: modality—name

example:

modnam

=

getmod (‘Object—name’,

getmod (‘COORX:MOLA’,

=

Field)

‘MOL’)

In this example modnam is extracted from the Object-name COOR X : MOL = A and gives ‘A’. function

putmod:

This function updates the modality value of a field. syntax

: New—Object—name

=

putmod

(‘Object—name’,

‘Field’,

‘New—modality’)

example : newnam

=

putmod

(‘COORX:MOLA’,

‘MOL’,

‘B’)

In this example, putmod has replaced ‘A’ by ‘B’. newman is therefore equal to ‘COUR X : MOL

=

B’.

305

F. Co!onna eta!. /Computer Physics Communications 81(1994) 293—317

The use of these two functions introduces a kind of inheritance between Objects: the modality of a new-Object collected with get mod can be transmitted to the corresponding field of the existing-Objects it is built from, ensuring a correct switch in the dynamic route. function cutmod: This function cuts out the modality field and modality value from a given object name to return the class name and remaining modalities. syntax

: Root—name

example:

rootnam

=

=

cutmod (‘Object—name’,

cutmod

Field)

(‘COORX:MOLA:FLDX’,

‘MOL’)

In this example rootnam is equal to ‘COOR X: FLD=X’. 4.10.

Multiple modalities

There are some Objects which essentially need to be described by two modalities of the same field, simultaneously. For instance, the Objects encountered in intermolecular interaction problems fall into this category. The interaction energy value between molecule A and molecule B is a good example of such a composite scalar Object which can be defined as: INTERACT ENERGY :MOLA’/.B SCR VALUE The multiple modality AZ B indicates that this energy corresponds to the interaction of molecule A with molecule B. The energy of the dimer A + B can be named: ENERGY: MOL

=

ABSCRVALUE

the dimer A + B being considered as the “molecule” A B. The functions g tnt od p and Pt mod p are used in this case of g e t mod and Pu t mod respectively, as in the following examples: syntax: modality

=

GTMODP (‘Object—name’, ‘Field’, Position)

Syntax: new—Object—name

=

PUMODP (‘Object—name’, ‘Field’, Position, ‘new—modal i ty’)

examples: modnam

=

GTMODP (‘INTERACT ENERGY:MOLAXB SCR VALUE’, ‘MOL’, 1)

newnam

=

PTMODP (‘INTERACT ENERGY:MOLAZB SCR VALUE’, ‘MOL’, 1,

‘C’)

In the above examples modnam will give ‘A’ and newnam will be ‘INTERACT:MOLCZB SCR VALUE’.

5.

Disk-file management

5.1. General aspects OSIPE also has its own tools for managing files, eliminating the need for knowing the FORTRAN logical units. All file manipulations (open, close, delete, read, write) use filenames instead of FORTRAN units. A file-name is connected to a FORTRAN logical unit by the routine a 110 C f.

306

F. Cobonna et a!. /Computer Physics Communications 81(1994) 293 —317 syntax

call ALLOCF ( filnam,

!

input

file—name



type,

input

file type (object name)



untype,

input

unit type



status, action,

input input



access,

• •

form, unit,

• •

lerror, lexist)

input !

input output output output

Similar to Objects, information associated with file-names is stored in arrays of several common blocks in include file f i I e s h. These arrays are indexed by the FORTRAN unit number. Array name f I stores the file-name and array f i I t y p stores the file type. The file type is a character string used by the program to define the content of the file. In general, file management is done with the file type name, rather than with the file-name which is arbitrary and not always known in advance. As far as possible, f 11 t yp coincides with the name of the corresponding Object. If the file contains several Objects which can be grouped in a class, f i I ty p is the name of this class, i.e. a sub-string of the Object’s names. To restrict the use of some special units, another character array u n i t yp stores the unit type. There are three unit types: ‘f r e e’ if the unit is not yet connected, ‘1 o c k e d’ if the unit is not connectable (unit 6 for example) or ‘s t ream’ when the unit is connected to a streamed command-file (unit 5 by default). The following are other functions used for file management: get u n I (‘file-type’) returns the unit number of the file of type file-type, clsuni (‘file-type’) closes the file of type file-type, delfil (‘file-name’) deletes the file of name file-name. As a rule the specific routines used to read or write files sequentially call a 1 1 o c f (to open the file), read or write the file and close it. As a consequence, only one unit is usually open at a time (see section 10 for an exception). -

5.2. File organization To avoid misuse of file contents, a file written by OSIPE has two parts: the header and the data. The header contains several items such as, the file-name, the file format, the physical units in which the data are measured, title strings, date, user name etc. When a file is read, these items are stored in the common block of files.h or in scalar Objects and checked for consistency. The header is delimited by its last record: “END OF HEADER”. The data records can be anything and end with “END OF FILE”. To avoid having to worry about file formats when reading files, a special routine c u t f I d has been designed which returns all the fields of given record separated by a given character (for example, a blank). 5.3. Program catalog When a file is closed it is catalogued in the program catalog to keep track of its existence. The program catalog is a set of arrays necessary to store this information. These arrays are stored in common blocks of catalog.h.

F. Co!onna etaL/Computer Physics Communications 81(1994) 293—317

307

6. Dump-files 6.1. Disk-images of Stack From time to time (when the memory size is not large enough) it may be necessary to dump objects on disk. These freed-Objects constitute an “image” of the S t a c k, and each Object is stored as a persistent Object on a special dump — f i I e thus avoiding the use of direct access files. Since a dump f I I e corresponds to only one object, its name is automatically built by concatenation of the current job name (runnam word) and the Object-name. example: H2OSCF DENSITY MATRIX where H2OSCF is the name of the run and DENSITY MATRIX the name of the Object. The dump-file header contains all the information necessary to recover the Object (i.e. to fill the put o b j argument list) and to check the consistency of the data contained in the file and those of the job currently running i.e. if the basis set, or the coordinates etc. are consistent. —

6.2. Restart-file The function get o b j tries to restore at first from disk a non-existing Object and only thereafter calls b Id o b j, therefore, dump-files can be used to restart a job. The only requirement to get an Object from its dump-file, is to have the same run n am for the restore and the dump step. For this purpose, the run n am variable can be modified from the input stream by the command: set run name = (run—name) end end These dumping and restoring facilities are enabled by default, but they can be disabled by the set command: set dump = false end set store = false end

7. Parseii an elementary command language To make program inputs readable and format-free, they are read with the help of an elementary parser, using some of the concepts as implemented for instance in L e x and Y a c c (see Ref. [5] for

comprehensive description). All the commands have the same structure: they begin with the command name and terminate with an “end”. The information is provided to the program by as many affectations of the type “token

=

v a I u e” as necessary, where token is any kind of string chosen by the

programmer. Instead of token a command may include as many sub-commands as desired, each sub-command ends with an “end”. The information is expressed in this way: command token = value token = value sub— command token token end end

=

value

=

value

308

F. Co!onna et a!. /Computer Physics Communications 81(1994) 293—317

The parser can also substitute symbolic items #6 present in the input. All the parser information (current token value, current line content, current cursor position, current stream unit number) are passed to the program through commons of include file pa r s e r h. In connection with substitution of symbolic items a fo r loop facility has also been developed together with a set of tools to manipulate Objects from the input stream (Object copy, deletion, rename, extraction of value(s) as symbolic item). The possibility of streaming command files is also supported. .

8. Debugging tools Interactive debuggers, such as dbx, are very sophisticated and powerful, but the possibilities they offer do not always correspond to the expectations of the programmer. That is why it is a common practice to put some extra w r i t e instructions in their program for debugging purposes. To avoid the insertion and deletion of debugging code, a set of flags (Iocdbg, t ra 1ev, pgmchk, .) gives the developer an opportunity to add debugging code and extra print messages into the program without any impact on the performance. This extra code can be activated or deactivated from the input stream (s e command) without a further compilation step. Moreover, OSIPE has all the tools necessary to obtain full details of the program flow and memory storage. This typically open feature makes debugging of newly written programs extremely easy. From our experience the time spent to debug OSIPE-programs is much shorter than the time spent to write code, which is a situation just opposite to that commonly experienced by programmers. The different debugging tools are described briefly below. •

t

8.1. Debugging a routine It is possible to activate the debugging mode of any routine by typing in the input stream: set debug = ROUTIN end The effect of this command is to activate printing inside the routine, by setting 1 o c d b g to ‘ . t r u e. as shown in the small example below. subroutine • - • fortran

if(locdbg)

ROUTIN code

then

call prtobj end

• •

(‘XXX_YYY’)

if

return end

which results in the following output: RUUTIN> output of array:XXXYYY

#6

>

1

=

2.01600000000000

> >

2 3

=

> >

4 5

=

>

6

=

O•351562500000000E—U1 O.365381649624472E—02 O.738281250000000E—O3 O•210004182213293E—03 O•741169410150891E—04

=

=

Any kind of arithmetic, logical or string operation can be performed on the symbolic items.

‘,

F. Co!onna eta!. /Computer Physics Communications 81 (1994) 293—317

309

Other items of the set command can be used to debug a routine: set tralev = end set silent set prtlev

= =

end end

All routines whose nesting-level is less than t r a I e v are traced, i.e. the following messages are issued on the output: • .

-

entering

routine: routine

routine—name output

exit routine—name the number of dots . - - - is equal to the nesting level of the routine. The s I I en t sub-command turns off tracing messages for the routine defined by rout i n e name. The sub-command pr t I e v enables all printing inside the program according to a level defined from the input. This corresponds inside the code to: —

if(prtlev.ge

.20)

then

print

end if Thus, if pr t I ev has been set to, say, 20 by the set command, the message will be printed. Checking stack boundaries

8.2.

The program can be executed under the check mode with the command: set

check

=

program

end

This activates the execution of the routine c h k b n d, by setting the variable c h k pg m equal to

-

t rue

e.g.

if (chkpgm) call chkbnd (‘Routine—name’) This routine checks whether the HE A DMA R K and the EN DM AR K labels in the S t a c k (see section 4) have not been overwritten. This line of code is usually inserted after each call to an algorithmic-routine. The same check is executed after each call to g e to b j. 8.3. Stack image printing If necessary or desired, an image of the S t a c k content can be printed with the command: show stack end which results in printing all Object names, type, addresses, index, cardinal, boundary, first and last element. This information allows the programmer to make sure that nothing has been destroyed or deteriorated in the S t a c k. The following is an example of the result of a ‘show st a c k’ command: shostk> head mark :HEADMARK >

name:

“ATOM

MASSES’

> creation:DFATMA >261 addr.: >

nb eLem.:

reference: 26 first: 6 Last

:

FREE

end_mark:

END_MARK’•

type:REAL

O.160000000000000D+02

4210000000000000

0.100000000000000D+01

4110000000000000

F. Co!onna et a!. / Computer Physics Communications 81 (1994) 293—317

310 >

name:

>

creation:DFMLNM

‘‘MOLECULE_NAMES’ reference:FREE

>263 addr.:

33

> nb eLem. :

2

> name: >

4120204220200000

Last

B

4120204220200000

:A

63

>

nb eLem. :

6

>

name:

>

creation:DEFSIT

type:CHAR

bytes:3

ATOM TO’’

reference:FREE

>267 addr.:

‘‘END_MARK’’

B

‘‘SITE FIRST BOND

creation:DEFSIT

end_mark:

first:A

end mark:END MARK type:INTE

first:

0

0000000000000000

Last

0

0000000000000000

:

‘‘SITES_CHARGE_EXIST’’

>275 addr.:

reference:FREE

end_mark:

‘‘END_MARK’’

226 first: 5

:

000000010000001

F

000000000000000

>

nb eLem. :

>

name:

>

creation:SETCRD

>

41 addr.:

308

first:B

4220202020202020

>

nb elem. :

1

Last :B

4220202020202020

‘‘MODALITY

Last

NAMES:MOL

type:BOOL

T

SSC CURRENT’’

reference:LOCKED

end_mark:

‘‘END_MARK’’

type:CHAR bytes:1

The head mark is the first item displayed, it must contain the string H E A DMA R K if nothing has been overwritten. The next four lines briefly describe each Object stored. The first line contains the name of the Object (for instance ATOM_MASSES). The second line contains the name of the routine allocation-

routine and that of the routine currently using the Object (F RE E if the Object is not currently used), the end mark of the Object, which must contain EN D_M AR K, followed by its type and, if the Object is a Character array by its number of bytes. The third line displays in sequence: the index in S t a c k (261 for the first Object in the example, i.e. ‘ATOM MASSES’ objnam(261)), the address (26 i.e. address = S ta c k (26)) and the content of the first element in a readable format and in hexadecimal. The fourth line displays the number of elements (6 in this case) and the content of the last element.

9. Automatic transformation of code Dressing and Composi programs An old code written in standard FORTRAN can be quasi-automatically transformed into OSIPE of the type shown in Fig. 2 by a three step procedure schematized as: 1.

old—code

———

Ftax

Fortran 2.

ftax

file

———

code

Dressing

———>

ftax

file

analysis ———>

osi

file

extraction of Objects attribution of a name, a

type and a size

extraction of algorithmic code 3. osi

file

———

Composi

———>

Osipe—code

generation of the allocation—routine encapsulation of algorithmic—routing 4. osipe—code

———

f77

———>

object file

by allocation—routine

F. Colonna et a!. / Computer Physics Communications 81(1994) 293—317

311

F t a x is a public-domain FORTRAN analyser written by D. Taupin (Orsay, France) [6]. It gives a description of the variables used in a given subroutine. This information is treated by the program D r e s s I n g. This FORTRAN program, designed by one of us (L.H.J.), transforms FORTRAN variables from F t a x into new-Objects, existing-Objects, work-Objects and local variables. It analyses the Object dimensions and extracts the non-declarative part of the code (algorithmic part). The Objects are given a name, a type and their size is inferred. This declarative information is used to create the intermediate osi-file. This file contains the OSIPE declaration of the Objects and the algorithmic part of the code. It is treated by the program Comp os I which encapsulates the algorithmic-routine by the allocation-routine. The necessary calls to getobj, bldobj and putobj are automatically implemented as well as the debugging tools. The argument list of the algorithmic-routine is automatically built, all the variables explicitly declared and the arrays exactly dimensioned. The resulting code is compilable FORTRAN. If one wants to write a new piece of code, only the osi-file has to be written by the programmer, i.e. a file containing the Object declarations and the algorithmic code. The program Corn po s I will then generate the FORTRAN source code. Direct transformation of a code to OSIPE standard has been done by one of us (R.A.P.) with MUNGAUSS (Section 10). The main problem one is faced with, is re-structuring code which does not build one Object at a time. However, in a first step, one can let some routines produce several Objects, provided that they have a logical connection. This is acceptable if no contradiction arises. On the other hand it is impossible to produce one object from two different routines as this is controlled by b I do b j. Moreover, if by chance an Object has been allocated but not initialized (not given values), OSIPE will immediately stop the program. If an Object has no allocation-routine, OSIPE will print a message. The question “which Object produces which” is not a trivial question and in many cases it is necessary to exhibit intermediate Objects which were hidden in the existing code. This work is never wasted time, because the re-structured code is improved by being more secure, readable, easier to use in further applications and in some cases more efficient.

10. Mungauss



The closed shell SCF

The OSIPE version of MUNGAUSS represents the first ab initio program to use the OSIPE tools and was developed concurrently with OSIPE. Although the pre-OSIPE version of MUNGAUSS was fairly modular and readable, many major changes were made to meet the OSIPE standard. The closed shell SCF (RHF) will be used to illustrate a feature of OSIPE which has not been presented to this point, that is, how to deal with the two-electron integrals. Dealing with two-electron integrals is an especially difficult problem since they do not fall into one of the object types described in Section 3 (i.e. scalar, matrix or tensor). The number of two-electron integrals can be very large and each two-electron integral is identified by four indices. For that reason, it is generally not possible to keep all the integrals in memory (S t a c k). The OSIPE version of the SCF is similar to the one described in Ref. [7] and will only be briefly described here to illustrate the handling of the two-electron integrals. The main steps in the algorithmic-routine are given is Fig. 5. In STEP 1, both the core Hamiltonian (H) and the E~Gmatrix have been built via the g e to b j (‘I 1 E_T V_A 0’) (routine I 1 E C L c) and getobj (‘I2E2JMKAODELTA’) (routine DGRBLD), respectively. The i~Gmatrix calculation requires the z~ P (guess) matrix and the two-electron integrals. Both these objects are built as a result of the calls to g e to b j s in routine D G RB L D. The dynamic calculation of H and ~G is illustrated in the

312

F. Colonna eta!. /Computer Physics Communications 81(1994) 293—317 Step 1



Step 2 Step 3

Step 4 Step 5 Step 6 Step 7

— —

— — — —

Generate the Fock matrix (F): F

1~,= + Calculate the energy: Eejec = ~tr(H+ F)P. Determine the coefficients (routine DETLIN): convert FC = eSC to F’C’ = cC’; diagonalize F’ to give C’. Construct the density matrix: P = C’C’~. Extrapolate/Interpolate and construct AP (routine CONCLO). Check for convergence. Construct the two-electron correction (AG) to the Fock matrix:

~

=

AP~(4(i~

kl)—(ik I jl)—(ill jk))+ ~~AP~~(2(ij

k,1 k>/ —

kk)—(ik I jk)).

k

go to step I Fig. 5. The main steps in the closed shell SCF routine.

following scheme. The arrow means “created-by”. The name of the allocation-routine is given above the arrow.

10.1 SCF iteration The following scheme illustrates the program behavior during an SCF iteration. Ii ECLC

H4=

I2ECLC DGRBLD

12E

~

DP1BLD

P0 In the above scheme 12E represents all the two-electron integrals. During the SCF cycles the ~1G matrix is calculated (STEP 8) using the algorithmic-routine D 6 RB L 8 directly. 10.2. Two-electron integrals The two-electron integrals are classified into a total of eight different types as given in Fig. 6. Unlike the version described in Ref. [7] the integrals and the indices are stored separately resulting in a total of 22 Objects as shown in Fig. 7. 1. (~JIk1)i>j> k >1 2.(iilkl)i>k>1

5. (iilkk) i> k 6.(ijljj)i>j

3.(ijlkj)i>j>k

7.(iilil)i>1

4.(ijljl)i>.j>!

8.(iijii)

Fig. 6. The eight different types of two-electron integrals.

F. Co!onna et a!. / Computer Physics Communications 81(1994) 293—317

1. i, j, k, ! indices 2. (ill k!) integrals 3. (ik If!) integrals 4. (i! I jk) integrals 5. i, k,! indices 6. (ii k!) integrals 7. (ik ii) integrals 8. i, j, k indices 9. (tj I kj) integrals 10. (ik If]) integrals 11. i, f, I indices

313

12. (till!) integrals 13. (ill jj) integrals 14. i, k indices 15. (ii I kk) integrals 16. (ik ik) integrals 17. i, j indices 18. (if Iii) integrals 19. i,! indices 20. (ii I ii) integrals 21. i index 22. (ii I ii) integrals

Fig. 7. The content of the 22 two-electron integral Objects.

There are a number of advantages in storing the integrals and indices as different Objects, but the main advantage is that each integral type can be dealt with very efficiently without testing for the type

(using specialized routines) [7]. Since it may not always be possible to keep in memory (as a whole) even one of the 22 integral Objects, the two-electron integrals are treated as buffers #7 and distinguished from other OSIPE-Objects by adding the _B U F F suffix to the Object-name, for example: I2EIJKLINDBUFF (mdi ces) 12E_IJKL_VAL_BUFF ((ij IkI) 12E_IKJL_IND_BUFF ((ijikl) 12E ILKJ_IND BUFF ((ijikl)

integrals) integrals)

integrals)

In this case the g e to b j (‘I 2 E_. - - _B U F F’) will check if the integrals exist, if not, build is set to .true. (as for any other object) which will result in a call to b I dab j and therefore to I2ECLC. Therefore, in this case the “Object” in the S t a c k is a buffer of two-electron integrals (or indices). Routine I2ECLC works with all 22 buffers and uses a special write routine to save the buffers to disk (for a conventional SCF). Work is underway to split routine I2ECLC into eight different routines corresponding to the eight types of two-electron integrals. Further splitting of the (i ~ k I) type (the most numerous) into

sub-types is also being investigated as a possibility of parallelizing the calculation of the two-electron integrals. With this approach an SCF can use any combination of conventional, direct or in-core schemes. 11. SOLVENT

-

A generalized SCRF program

SOLVENT is a newly written computer program for performing generalized self-consistent reaction field (GSCRF) calculations [8,9] on subsystems (salutes) embedded in some molecular environment

(solvent) such as a molecular crystal, a solution, or a cage molecule. The charge density of the solute is

#7

To eliminate this somewhat limited buffer concept we are now developing a new integral organization, based on modalities,

allowing a selection of specific integrals resulting in well defined Objects.

314

F. Colonna et al. / Computer Physics Communications 81 (1994) 293—317

obtained by means of some ab initio quantum chemical method, and the environment is approximated by a set of multipoles and polarizabilities. During the calculation the electrostatic interactions between subsystem and environment are optimized until self-consistency is achieved. SOLVENT was developed concurrently with OSIPEd and makes use of its tools in many ways; some of them will be discussed in the following. A more detailed description of the algorithm and the capabilities of the program will be published elsewhere [10]. 11.1. Interfacing with exterior program packages The goal of a calculation with SOLVENT is to find an approximate solution of the non-linear GSCRF equation. The non-linear term in the GSCRF equation is a one-electron operator which at the same time acts on and depends on the charge-density of the solute, a situation comparable to that of the two-electron terms in an SCF. So it seems natural to implement such an approach into some existing SCF code. What, however, if one wishes to study transition states which are made up by several electron configurations, or electron-correlation effects? There is a large variety of ab initio methods for electronic structure determination starting with several open shell Hartree—Fock methods, multi-configurational SCF, the many approximate schemes for taking electron correlation into account such a many-body perturbation theory, configuration interaction, coupled pair and coupled cluster methods and finally full configuration interaction. The number of program packages available is even larger, and although some of them are quite complete one often meets the situation that one wishes to use a special feature which is not included in the particular package. So if one wants to benefit from the work already done the ability to interface easily with several existing codes is mandatory. Thinking in terms of the Objects one needs for a GSCRF calculation, it becomes clear how this can be achieved: The role of the SOLVENT program is to modify the one-electron integrals needed in any of these methods to include the interaction terms with the environment. For this purpose it needs the basis set of the quantum calculation and the electron density matrix. This information very often can be extracted from some intermediary files used in the ab initio packages. The strategy of SOLVENT is to provide a set of interface routines which first read the necessary pieces of information from files and convert them into well-defined Objects stored in S t a c k. Furthermore for modification of the one-electron integrals there exist routines converting the Object MODIFIED ONE ELECTRON INTEGRALS to the specific file format of the ab initio package. In this way there is no need to change the code of the package; one only has to provide the interface routines. For the time being such interface routines are available for the MOLECULE-SWEDEN code [11], where one meets the ideal situation that one- and two-electron integrals are on different files, and for GAUSSIAN 90 [12] and GAUSSIAN 92 [13]. Since MUNGAUSS has already been incorporated in the OSIPE environment all quantum chemical Objects created by this program are readily accessible at any time, making it especially easy to interface with SOLVENT. Besides the need to interface with quantum chemical program packages it is also necessary to interface with Monte Carlo or molecular dynamics codes, since if one wants to study the properties of a solute in a liquid calculations of many geometries of the total system are required. 11.2. Using the for loop facility of the parser Since solving the GSCRF equation requires an iterative approach (start with some electron density, calculate from that the modified one-electron integrals, determine a new electron density, etc.) there must be repeated runs of the ab initio package/SOLVENT pair. This can be achieved, for example, using the for loop facility of a UNIX shell script.

F. Co!onna eta!. /Computer Physics Communications 81(1994) 293—317

315

A better possibility is, however, to use the for loop facility of the OSIPE parser combined with a call of the exterior package through parser (the latter possibility is easily implemented under UNIX employing the function s y s tern which performs a call to the operating system from a FORTRAN program). The necessary SOLVENT commands and the call to the exterior package are included in the for loop. This allows to keep those Objects, which do not change during the SOLVENT iterations, always in the S t a c k array, without the need to write them out and read them in again at the next iteration. The other ones are deleted from S t a c k and re-constructed. Finally it should be mentioned that the facility to print every Object in S t a c k has proven a powerful tool for interpretation of the results of a calculation. For example, the electric field of the solute or the dipole moments induced in the environment are usually Objects of a more intermediate character (as compared to the interaction energy), which nevertheless can be useful to know and are easily accessible by a simple input command, without having to write a specific print routine for them.

12. Discussion Over the past decades a large amount of program packages, often comprising several hundred thousand lines of code, have been developed in the different branches of computational chemistry, such as ab initio or semi-empirical quantum chemistry, density functional theory, molecular modeling, molecular mechanics, etc., almost exclusively in FORTRAN. Many of the underlying algorithms have been thoroughly researched and are highly optimized and specialized for a given application and/or computer architecture. Although for future developments it would be desirable to have all these algorithms available in some modern programming language which makes use of the more sophisticated developments in computer science, this is clearly out of reach for the near future: The contributions to computational chemistry come from a multitude of different groups all over the world whose main

concern has to be the development of more and more realistic and precise scientific models and which, to stay competitive, cannot spend too much time on rewriting old and running programs. Furthermore, it is not at all clear which of the more sophisticated object-oriented programming languages such as C + +, SMALLTALK, EIFFEL, etc., to choose as a common standard, and without a common standard once again the free flow of codes within the community and their reusability would be inhibited. So, to our mind, it is absolutely necessary to have tools which allow for a slow migration, piece by piece, algorithm by algorithm, of old codes towards a more object-oriented program system. This is most easily done within the limits of standard FORTRAN, although these limits at first sight lead to a somewhat cumbersome handling of “objects”. Due to our experience with adapting scientific codes to OSIPE this, however, soon pays off, and the adaptation of the program can really be done in pieces, always having a running code. Once a sufficient amount of code has been restructured in this way one certainly could as a second stage, if desirable, change to another language with the help of automatic translation tools. The concept of global dynamic memory has been realized in many programming languages besides FORTRAN. Its current implementation in OSIPE is very simple and certainly by no means competitive with the more advanced techniques used in these languages. So, for example, we did not pay much attention to the question of how to efficiently delete an object in the middle of the S t a c k. Problems of this kind arise, for instance, during the course of an iterative algorithm, where at a certain point the current, an updated object has to replace a previous, no longer needed one. Questions of this type are under study now and will be taken into account in future releases of OSIPE. The current version, however, already eliminates some of the main obstacles of FORTRAN.

316

F. Colonna et al. /Computer Physics Communications 81 (1994) 293—317

13. OSIPE group OSIPE has been designed to allow developers to exchange their code. More precisely, a developer using OSIPE can use what has been already done by other people without entering into the computational details of other codes: OSIPE provides them with intelligible Object names and guarantees the unicity of their content. OSIPE codes belong to their authors, they can be copyrighted or not, it does not matter: they all produce Objects accessible for any usage outside the original code. Up to now the codes available under OSIPE are the following: • MUNGAUSS V1.0 Author: R.A. Poirier, Memorial University of Newfoundland, St. John’s, Newfoundland, Canada A1B 3X7. E-mail: [email protected]. Content: this code is the OSIPE version of MUNGAUSS. Status: copyrighted. • POLFIT Author: F. Colonna, Université de Paris VI, France. E-mail: [email protected]. Content: fit of empirical site-site potentials (electrostatic, Lennard-Jones) from ab initio data. Status: free. • SAPTH Author: 0. Hess, IBM France. E-mail: [email protected]. Content: molecular interaction energies using symmetry adapted perturbation theory. Status: free. • SOLVENT Author: J.G. Angyán, Université de Nancy, France. E-mail: [email protected]. Author: G. Jansen, Université de Nancy, France. E-mail: [email protected]. Content: solvent effects using microscopic solvent description. Status: free.

Acknowledgements The authors thank the Groupement Scientifique “Modelisation Moleculaire” IBM-CNRS for providing computer facilities and technical assistance. We are particularly grateful to the IBM-France team (G. Galleyrand, 0. Hess, J. Quincy and A. Tabary) for their help. We wish to acknowledge the following people for their suggestions and support: 0. Ciccotti (CECAM, Lyon, France), A. Desnoyers (Université de Paris VII, France), M. Dupuis (IBM, Kingston, USA), E.M. Evleth (Université de Paris VI, France), J.P. Flament (Ecole Polytechnique, Palaiseau, France), Yves-Henri Sanejouand (Université de Toulouse, France), W. Saurin (Institut Pasteur, Paris, France), and J.M. Teuler (IDRIS, Orsay, France). J.G.A. is indebted to the Alexander von Humboldt Foundation for the fellowship, which made possible his stay in

F. Co!onna eta!. /Computer Physics Communications 81

(1994)

293—317

317

Bonn. R.A.P. gratefully acknowledges the continued support of the Natural Sciences and Engineering Research Council of Canada. G.J. is indebted to the Alexander von Humboldt Foundation and to the CNRS for a Feudor—Lynen fellowship.

References [1] B. Meyer, Object-Oriented Software Construction (Prentice-Hall, London, 1988). [2] B. Stroustrup, The C+ + Programming Language (Addison—Wesley, New York, 1986). [3] B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, DJ. States, S. Swaminathan and M. Karplus, J. Comput. Chem. 4 (1983) 187. [4] AT. Brunger, Xplor, Yale University. [5] J.R. Levine, T. Mason and D. Brown, Lex and Yacc (O’Reilly and Associates, Sebastopol, CA, 1990). [6] D. Taupin, FTAX: Fortran Analyzer, CNRS, (Orsay, France, 1992). [7] R.A. Poirier and CD. Keefe, J. Mol. Struct. (Theochem) 234 (1992) 19. [8] 0. Tapia, J. Mol. Struct. (Theochem) 226 (1991) 59. [9] J.G. Angyán, J. Math. Chem. 10 (1992) 93. [10] J.G. Angyán, F. Colonna and G. Jansen, in preparation. [11] J. Almlöf, C.W. Bauschlicher, D.P. Chong, M.R.A. Blomberg, A. Heiberg, S.R. Langhoff, P.-A. Malmqvist, A.P. Rendell, B.O. Roos, P.E.M. Siegbahn and P.R. Taylor, MOLECULE-SWEDEN: an electronic structure program system, Lund University, Sweden (1990). [12] MJ. Frisch, G.W. Trucks M. Head-Gordon, J.B. Foresoman, H.B. Schlegel, K. Raghavachari, M. Robb, J.S. Binkley, C. Gonzalez, D.J. Defrees, D.J. Fox, R.A. Whiteside, R. Seeger, C.F. Melius, J. Baker, R.L. Martin, L.R. Kahn, J.J.P. Stewart, S. Topiol and J.A. Pople, Gaussian 90, Revision H, Gaussian, Inc., Pittsburgh PA (1990). [13] M.J. Frisch, G.W. Trucks, M. Head-Gordon, P.M.W. Gill, MW. Wong, J.B. Foresman, B.G. Johnson, H.B. Schlegel, M.B. Robb, E.S. Replogle, R. Gomperts, J.L. Andres, K. Raghavachari, J.S. Binkley, C. Gonzalez, R.L. Martin, D.J. Fox, DJ. Defrees, J. Baker, J.J.P. Stewart and J.A. Pople, Gaussian 92, Revision Dl, Gaussian, Inc., Pittsburgh PA (1992).