POLYP, A NEW LANGUAGE FOR SYSTEM SOFTWARE PROGRAMMING
G. Musstopf Scientific Control Systems Ltd & Co GmbH Hamburg, Germany
ABSTRACT
System software is mostly programmed in assembler language. This is especially true of real time operating systems for process control. The problem oriented language POLYP (Problem Oriented Language for sYstem software Programming) was conceived to aid in the rationalization of the design, programming and maintenance of system software. The design of the language was strongly influenced by considerations of:
Hardware independence
Ease of compilation
Ease of programming and testing
Efficiency of the object program
The language contains extensive facilities for table definition and table handling and for the application of address variables. This aids in the production of efficient object code. Furthermore, POLYP contains extensive facilities for storage organisation and management which also allow interprogram communication and program interaction. By the use of a so-called control language, it is possible within certain limits to adapt the language to special problems without having to change the compiler.

1. INTRODUCTION
The initiative to develop POLYP (Problem Oriented Language for sYstem software Programming) came from a question put during a discussion in the early phase of a project meeting. It was asked: "Must this program package really be programmed in Assembler language?" The installation in question had only a simple Assembler without macro generator as programming support. The instruction set of the machine was very small. The volume of programming promised to be very extensive. The operating system and its monitor lacked the properties which are normally considered necessary for the rational development and testing of such large program complexes. Furthermore, core storage was scarce and the processing speed of the central unit hardly sufficient for the problem concerned. The situation was typical for real time applications and unfortunately only too well known to programmers in this field.
The question which arose at the beginning of the development of POLYP was: Is it possible to construct a problem oriented language for the production of system software and similar program packages which enables the generation of efficient object code without extensive optimization? The object code should be efficient in two ways: with respect to execution time and with respect to storage requirement. As in the case of assembler programming, it should be possible for the programmer to choose which of these two is of primary importance for a given part of his program.

Before looking for an answer to this question it is useful to consider causes of inefficiency in object programs generated from such everyday languages as ALGOL 60 or FORTRAN. It was established that:

1/1. The compiler can only recognize the static structure of a program. Recognition of the dynamic structure is not possible. This restricts the number of optimizations possible if the programmer has no means of communicating the dynamic flow of his program to the compiler.
1/2. The usual declarations of identifiers are insufficient. It must be possible to produce a better layout of storage.
1/3. Statements which interrupt the sequential flow of a program are also causes of inefficiency in the object program.
1/4. Facilities for operating with single bits and with storage addresses are too scarce.
1/5. The demand for compatibility, in the sense that only recompilation is necessary in order to transfer a program from one machine to a different one, often leads to difficulties in writing the compiler or to decreased efficiency on the second machine.

One of the results of these considerations was the recognition of the fact that nothing would be gained by simply extending existing languages and their compilers to meet the necessary requirements. Languages such as PL360 (1) or PS440 (2), on the other hand, show clearly that it is possible at the machine oriented level to produce system software more rationally. Both languages, as their names imply, are however only suitable for use with a given hardware (IBM/360 and TR 440). A new question presents itself: Is it possible to obtain the required language by simplifying an existing problem oriented language and then extending the simplified language with new elements? First thoughts were that it would seem possible subject to the following restrictions:

2/1. The programmer must appreciate the structure and mode of operation of computers. He must also understand the special properties of the machine on which he is working.
2/2. Source programs must be modified upon transfer from one machine to another (different) machine. It should however be possible to take advantage of the hardware properties of the object machine in question.

The difficulties which these properties of the language present to the programmer can be largely alleviated by the introduction of a control language. This control language is based on a subset of the problem language and offers programming aids similar to conditional assembly and macros in machine oriented languages.

During the initial development phase the question arose whether real time application programs as well as system software (such as real time operating systems) could be programmed in POLYP. The question was considered to be of secondary importance but arose again in a basic study (autumn 1970) of real time programming languages (3). This study led essentially to the following results:

3/1. POLYP is a GECOL level language (GEneral Computer Oriented Language). The GECOL level is between the machine and problem oriented levels. It is not sharply bounded, however: some elements of the GECOL level are close to the machine oriented level whereas others are close to the problem oriented level.
3/2. POLYP permits the programming of real time problems. As mentioned above, relatively high demands are set on the programmer.
3/3. Because of the control language, POLYP can be used as base language for the construction of real time programming languages. The transition from the real time language to POLYP is achieved by substitution and from POLYP to machine oriented language by normal compilation.

Especially the last point mentioned has far-reaching consequences, since the number of compilers is thereby diminished and the modification of real time languages, i.e. adaptation to special problems, is possible without change to the compiler. The only existing language to meet the requirements of the GECOL level at least approximately is CORAL 66 (4). The main deficiencies of this language are the insufficient declarations and character operators and the weakness of the control language (only macros as copy function).

2. LANGUAGE
This section is restricted to a description of the more outstanding properties of POLYP, whereby the control language and the problem language are discussed separately. The control language serves to rationalize programming and allow greater legibility of the source program.

2.1 Problem Language
POLYP has, in addition to the normal information types integer, real and character, the types logical, address, label and fixed point. A variable declared as logical with length 1 is identical to the Boolean of ALGOL 60 (for example). Types label and address are explained later. The complete declaration of a variable must always include its length. The unit of length is normally one bit and for character variables one character (because of 6 or 8 bit representations).

Information groups are defined as arrays, tables and queues. An array may only contain information units of the same type and length, whereas in a table the information units in a row may be of different types and lengths. Within a column of the table, however, type and length must be the same for all rows. As a consequence of this only the rows of a table can be referenced by index. Columns are referenced by CID (Column IDentifier). CID names are defined by the programmer in the table declaration. The structure of a table row is defined in a tformat (table format) declaration. Each table declaration contains a reference to a tformat declaration. The following abbreviations are used in format definitions:

I   integer
R   real
F   fixed point (scaled)
C   character
L   logical
A   address
T   target (label)
E   empty
B   boundary

The difference between A and T is that an address refers (directly or indirectly) only to data whereas a target can refer only to a labelled program statement. E and B enable the positioning of information in storage and multiple naming of bits and bit groups. The following example should help to clarify this:

tformat FORNAME (B(32,0), 2I16, L8, I24, E4, 4L, E(-4), L4, E(-8), I8, A24, 2C6);
table TNAME (0:20, FORNAME, N01, N03, FLAX, N09, BOA, BOB, BOC, BOD, FLAY, BIND, POINT, TEXAS, TEXTIL);
The element B(32,0) specifies that the beginning address of each row should be a multiple of 32. A replicator can be written in front of the type abbreviation, which in turn can be followed by the information length (default = 1). Following the integer of length 24 in the above example is an 8-bit unit which should be considered together with its CIDs. These bits have relative bit addresses 64-71 within the row. Bits 68-71 can be referenced individually by BOA, BOB, BOC and BOD or together by FLAY. All 8 bits can be referenced as a unit with the name BIND, which could be used in arithmetic operations or as an index.
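The bit addresses quoted above can be checked mechanically. The following Python sketch is not part of POLYP; it assumes an element grammar inferred only from the FORNAME example (optional replicator, type letter, optional length or parenthesised argument), assumes that B occupies no bits, that E moves the current position forward or backward without taking a CID, and that a character is 6 bits wide. The function name layout and the constant CHAR_BITS are hypothetical.

```python
import re

# Element list and CIDs copied from the FORNAME / TNAME example.
FORNAME = ["B(32,0)", "2I16", "L8", "I24", "E4", "4L", "E(-4)", "L4",
           "E(-8)", "I8", "A24", "2C6"]
CIDS = ["N01", "N03", "FLAX", "N09", "BOA", "BOB", "BOC", "BOD",
        "FLAY", "BIND", "POINT", "TEXAS", "TEXTIL"]

CHAR_BITS = 6  # assumption: the paper mentions 6 or 8 bit character codes

# Assumed grammar: [replicator] type [length | (argument[,argument])]
ELEMENT = re.compile(r"(\d*)([IRFCLATEB])(?:\((-?\d+)(?:,\s*\d+)?\)|(\d+))?")


def layout(fmt, cids, char_bits=CHAR_BITS):
    """Return {CID: (type, relative bit address, length in bits)} for one row."""
    names = iter(cids)
    pos = 0
    fields = {}
    for elem in fmt:
        m = ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unrecognised format element: {elem}")
        rep = int(m.group(1) or 1)
        kind = m.group(2)
        length = int(m.group(3) or m.group(4) or 1)
        if kind == "B":
            continue  # boundary: aligns the row start, occupies no bits
        if kind == "E":
            pos += rep * length  # empty: skip forward, or step back if negative
            continue
        bits = length * char_bits if kind == "C" else length
        for _ in range(rep):
            fields[next(names)] = (kind, pos, bits)
            pos += bits
    return fields


if __name__ == "__main__":
    for cid, (kind, offset, bits) in layout(FORNAME, CIDS).items():
        print(f"{cid:7s} type {kind}  bits {offset}-{offset + bits - 1}")
```

Under these assumptions the sketch places BOA to BOD at bits 68 to 71, FLAY over the same four bits, and BIND over bits 64 to 71, in agreement with the description of the example.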
In the simple tformat example above it can be clearly seen that each column identifier is provided with a type, an information length, and a bit address relative to the beginning of the row. In special cases these declarations could be used for variables and for a better representation of procedure parameters. Such detailed definitions allow a program to be adapted very easily to the structure of another computer where the compile time and execution time machines are not the same.

Apart from simple variables, a program statement may also have subscripted variables as operands. A subscripted variable denotes an array or table element, the latter being identified by:

table identifier (row number, column identifier)

A column identifier (CID) can also be used together with an address variable (AV) to form an indirect variable:

AV:CID

The address variable must in this case contain the beginning address of a row. As already mentioned, the relative bit address, the information type and the information length are defined by the CID declaration. The assignment of addresses to address variables is achieved using standard functions. The address variable can also be used for address calculation in arithmetic expressions and assignments.

Normally it is allowed to write assignments as operands of program statements:

Z := A + (X := B + SIN(C)) + D*(E + F/X);

This provides a simple means of allowing the programmer to optimize his own program. The above example shows that ALGOL 60 notation is used for assignments.

The label concept of a language can have a great influence on the structure of the generated object code. The content of a variable declared as label, as already mentioned, may only refer to a labelled program statement. A distinction is made between implicitly and explicitly declared labels. Implicit labels are declared by their appearance in the program. An explicit label declaration consists of the delimiter label followed by a list of identifiers (see the declaration of ELAB and ETA in the example below). Only labels thus defined may appear on the left hand side of assignment statements, whereas both implicit and explicit labels may appear on the right (no expressions, however).

label ELAB, ETA;
LAB: if A < B then
begin Z := A + E/C; ELAB := ETA; end;
IMLAB: Y := X + (A + Z)/F;
ETA := IMPLI;
goto ELAB;
IMPLI: Z := X + Y;

The basic form can be extended to include label arrays and label elements (T) in tables.

The operand L in the goto statement

goto L;

may be an implicit or explicit label or a subscripted variable, which must then be of type label.

L: Y := B + C;

Only the simple form "if condition then statement;" of the if-statement is allowed. Statement brackets for compound statements are the delimiters begin and end. Thus other forms of the if-statement (cf. ALGOL 60, PL/I) can be constructed using compound and goto statements:
if A < B then
begin X := B + C;
Z := A + K*P;
goto L; end;
X := B - C; Y := A - L/Q;
L:

All six relational operators are allowed in Boolean expressions. Not only numerical but also character types are allowed as operands. If S1 and S2 are of type character then the result of the comparison S1 < S2 depends upon the internal collating sequence of the character set being used. The necessity for such operations is one of the reasons for the restriction 2/2 in the introduction.

Nearly all problem oriented languages provide loop statements for the description of the controlled repetition of program statement sequences. Upon examination of the optimizations implemented in compilers (e.g. ALGOL 60, PL/I) for loop statements and the processing of subscripted variables within loops, it is found that in spite of the large effort invested the desired result is not always achieved. The degree to which such optimizations can be successfully carried out also depends largely on the hardware structure of the machine on which the target programs are to run. The classification of loop statements into various types is no solution to the problem since the classification is itself hardware dependent. These arguments led to the decision to exclude loop statements from POLYP. The following indirect forms, however, are allowed:

A(I,*) := 0;
B(I,*) := A(*,K) + C(*,L);

These statements can be rewritten in ALGOL 60 as follows, assuming that the indices of the above arrays are defined from 1 to 100:

for LV := 1 step 1 until 100 do A[I, LV] := 0;
for LV := 1 step 1 until 100 do B[I, LV] := A[LV, K] + C[LV, L];

A further language element which is often required by the programmer and which mostly leads to inefficient object code is the procedure call and the processing of the call parameters. Once again the quality of the object code generated depends also on the hardware structure of the target machine. In developing the POLYP procedure concept it was attempted to choose a structure which would allow the generation of efficient object programs. If this should not be possible directly it should at least be possible by the use of certain programming techniques.

A distinction is made between statement procedures and function procedures, the latter being of type real, integer, or logical. There is no restriction on the number of allowable parameters. Such additional procedure attributes as reenterable or recursive must be explicitly specified in the declaration.

The explicit declaration of normal standard procedures is not necessary (additional entries in compiler lists). The object code for these routines should be available in a program library. In the same way it is possible to call system (or supervisor) routines. The differentiation between normal standard procedure calls and system (or supervisor) routines is necessary because many machines make a distinction between problem state and system state (supervisor state). It is thus possible to exclude I/O operations as language elements. Experience shows that it is nearly always necessary to modify calls to I/O routines in a program written in a problem oriented language when the program is transferred from one machine type to another. As opposed to this the POLYP concept has few disadvantages and allows use to be made of all special properties of the operating system in question. This advantage becomes apparent especially in the real time field.

Procedure parameters are processed on the basis of call by reference. Expressions as call parameters are not allowed. The effect which this restriction has on the programming of calls is not so large as might at first be expected, since address variables and address arrays are also allowed as call parameters. The use of address arrays as subroutine parameters is often found in assembler programs. This use of address arrays provides an indirect method of passing a variable number of parameters to a procedure.

The structure of data storage and the transfer of information between programs is of great importance. The block structure of ALGOL 60 was taken as a basis. All identifiers must be declared. Default lengths can be defined for the individual information types in order to reduce the coding effort.

The local storage of a block can be overlapped with the local storage of a parallel block or be reserved at a lower block level. In all cases, however, the corresponding identifiers are only valid in the block in which they are declared. In order to extend the possibilities of transfer of information between programs, two delimiters external and entry are introduced. These delimiters should only be used in the outermost block of a program. The declaration:

entry integer I, K;

has the effect of reserving storage for I and K and of making those two identifiers available to the system. This means that another program can have access to I and K by declaring:

external integer I, K;
A program which consists solely of entry declarations defines a common area. Whether dynamic arrays should be included as part of the block structure concept depends on the problem and machine concerned. In most cases, depending on the hardware of the machine, dynamic arrays bring about a decided reduction in object program efficiency.
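The entry and external declarations behave much like exported and imported symbols. The following Python sketch is only an analogy under that assumption: the class CommonArea, its methods, and the use of one-element lists as storage cells are hypothetical and say nothing about how a POLYP system would actually resolve such names.

```python
class CommonArea:
    """Toy model of storage that programs publish to the system via entry."""

    def __init__(self):
        self._cells = {}

    def entry(self, name, value=0):
        """Reserve storage for `name` and make it known to the system."""
        if name in self._cells:
            raise KeyError(f"{name} is already declared as entry")
        self._cells[name] = [value]  # a one-element list stands in for a storage cell
        return self._cells[name]

    def external(self, name):
        """Give another program access to storage declared elsewhere as entry."""
        return self._cells[name]  # raises KeyError if no entry declaration exists


system = CommonArea()

# Program 1:  entry integer I, K;
i_owner = system.entry("I")
k_owner = system.entry("K")

# Program 2:  external integer I, K;
i_user = system.external("I")
k_user = system.external("K")

i_owner[0] = 42       # program 1 writes I
k_user[0] = 7         # program 2 writes K
print(i_user[0], k_owner[0])
```

Both programs end up referring to the same cells, which is the sense in which a program consisting solely of entry declarations defines a common area.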
2.2 Control Language
The purpose of the control language for POLYP is to provide an aid to rationalize the writing of source programs. This language enables the modification and preparation of programs in the sense of conditional assembly and macro calls at the assembler level. Due to the high level of POLYP (as opposed to assembler) it is relatively easy to construct such a control language. The language consists of a subset of the POLYP problem language. Elements of the control language are identified by the delimiter control.

The following example shows the way in which the control language is used:

control real E, PI, UMGB;
E := 2.718282;
PI := 3.141593;
UMGB := PI/180;
X := A + E*Y;
Z := T + SIN(UMGB*BETA);

which after "execution of the control program" becomes:

X := A + 2.718282*Y;
Z := T + SIN(0.017453*BETA);

The most important elements of the language are control if, control goto and control procedure. The declaration of a control procedure corresponds to a macro definition and the call of the procedure to a macro call. Parameters can be passed to the control procedure which cause modification of the procedure body by substitution. Of special interest is the use of control procedures together with a control procedure library. A sub-library could for example contain frequently needed functions. The following is an example of two control procedures for loop organisation:

control procedure SILOBEG (AV, STEP, NO);
control address AV; control integer STEP, NO;
control begin
begin address AVLAST;
AVLAST := AV + NO*STEP;
control end;

control procedure SILEND (AV, STEP, LAB);
control address AV; control label LAB; control integer STEP;
control begin
if (AV := AV + STEP) ≤ AVLAST then goto LAB;
end;
control end;

Using the control procedures:

SILOBEG (ADDR, LENGTH, 20);
START: X := A + ADDR:CID;
SILEND (ADDR, LENGTH, START); becomes:
begin address AVLAST;
AVLAST := ADDR + 20*LENGTH;
START: X := A + ADDR:CID;
if (ADDR := ADDR + LENGTH) ≤ AVLAST then goto START;
end;

3. POLYP compiler
The complete POLYP compiler consists of two compilers. The first compiler processes the control language in an interpretative fashion. Elements not belonging to the control language are transferred as character strings. The output from this first phase is the actual source program from which the second compiler generates the object program. The target language of the second compilation should normally be an assembler or macro-assembler language.

Since the control language is a subset of the problem language the same syntax checking and decoding procedures can be used in both compilers. In a later development phase the generative (second) compiler is split into a machine independent and a machine dependent part. This provides further rationalization in the production of software. By changing the machine dependent part, object programs can be generated for different machines.

4. Development phases
To begin with, POLYP is being used only internally so that the experience gained in working with the language can be used to modify and improve it. The actual development phases are:

Phase I: Definition of a POLYP subset.
Phase II: Production of a pilot compiler for the subset. This compiler is written in FORTRAN IV.
Phase III: Rewriting of the pilot compiler from FORTRAN IV into subset POLYP. This work can be carried out partly in parallel with phase II. It provides the first experience in using the language. At the end of the phase the new compiler is translated using the pilot compiler.
Phase IV: Extension of the subset compiler to the full compiler.
Phase V: Splitting of the generative compiler into the machine independent and machine dependent parts.

The main aim of the development is to produce a working preversion of POLYP so that experience can be gained as soon as possible. The pilot compiler will be ready in the middle of 1971.

5. Conclusions
The POLYP language, as opposed to assembler, provides a more rational means of programming such system software as operating systems, compilers, etc. This is achieved with no essential loss of efficiency in the object program. This is especially true for computers with a small instruction set. Programming in POLYP sets higher demands on the qualification of the programmer than a problem oriented language such as FORTRAN. In a later development stage it must be investigated whether POLYP is suitable as a base language for the construction of real time oriented languages which are easier for the programmer to use.

REFERENCES
(1) N. Wirth, PL 360, a programming language for the 360 computers. Journ. ACM (1968) p. 37-74.
(2) G. Goos, K. Lagally, G. Sapper, PS 440: Eine niedere Programmiersprache. Bericht 7002, RZ der TH München.
(3) F.L.T. Clark, J. Gohlke, H. Hotes, G. Musstopf, Real-Time Programmiersprachen, Grundsatz-Studie. SCS Schriftenreihe Heft 2 (1971).
(4) Inter-establishment committee for computer applications, Official Definition of CORAL 66 (1970).
Discussion on Paper XIII-4 by G. Musstopf

F. Vlietstra, Netherlands: Could the control part not have been solved more efficiently by initialization at compile time and the addition of a powerful first-pass MACRO pre-processor?

G. Musstopf: The first part of the whole compiler, the so-called interpretative compiler, is a "powerful first-pass MACRO pre-processor". The important thing in this area is that we use the same syntax for the problem language and the control language. It is thereby possible to use the same method for the decomposition of problem and control languages.

B. Kruger, German Federal Republic: You said that the normal call is not too efficient, so I think you have a special technique in calling a procedure. Take the following for example: You have a global field F. If you want to work with one field element F(I) in a procedure, can you check in this case the upper and lower boundaries of the field?

G. Musstopf: It is allowed to use address variables as procedure parameters. In this way you get a more efficient, assembler-like parameter handling. But if you use address variables, for instance in your example, you have to pay for this higher object program efficiency with greater programmer responsibility.
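The first answer above describes the interpretative compiler as a macro pre-processor. A minimal Python sketch of the substitution step is given here, using the bodies of the control procedures SILOBEG and SILEND from Section 2.2. It is an illustration under stated assumptions, not the POLYP implementation: the class ControlProcedure and its expand method are hypothetical, parameters are replaced as whole identifiers by a regular expression, and control declarations, control if and control goto are not modelled.

```python
import re
from dataclasses import dataclass


@dataclass
class ControlProcedure:
    """Hypothetical stand-in for a control procedure, i.e. a macro definition."""
    params: list
    body: str

    def expand(self, *args):
        """Return the body with every formal parameter replaced by its argument."""
        if len(args) != len(self.params):
            raise TypeError("wrong number of call parameters")
        mapping = dict(zip(self.params, (str(a) for a in args)))
        # Match whole identifiers only, so that AV is not replaced inside AVLAST.
        pattern = re.compile(
            r"\b(" + "|".join(re.escape(p) for p in self.params) + r")\b")
        return pattern.sub(lambda m: mapping[m.group(1)], self.body)


# Bodies taken from the two control procedures of Section 2.2.
SILOBEG = ControlProcedure(
    ["AV", "STEP", "NO"],
    "begin address AVLAST; AVLAST := AV + NO*STEP;")
SILEND = ControlProcedure(
    ["AV", "STEP", "LAB"],
    "if (AV := AV + STEP) ≤ AVLAST then goto LAB; end;")

# Text that is not a control element is passed through as a character string.
source = [
    SILOBEG.expand("ADDR", "LENGTH", 20),
    "START: X := A + ADDR:CID;",
    SILEND.expand("ADDR", "LENGTH", "START"),
]

if __name__ == "__main__":
    print("\n".join(source))
```

The three resulting lines correspond to the expanded loop shown in Section 2.2; the second compiler would then see only this ordinary problem language text.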