Incremental Compilation and Conversational Interpretation
M. BERTHAUD, IBM France, Grenoble Scientific Centre
and M. GRIFFITHS, Université de Grenoble, France
1. Introduction

In a previous paper [1], a bird's-eye view was given of certain of the ideas involved in the incremental compilation of ALGOL 60 [7] or PL/I [8]. The space allocated by I.F.I.P. did not allow sufficient development of these ideas, as was pointed out by the reviewer of the paper [2]. Apart from necessary expansions of ideas already put forward, further information is now available as a result of the implementation of both languages using the algorithms suggested.

Rough measures of performance have been obtained from comparisons of the time taken to interpret particular programs under the incremental system with the time taken to execute the code produced by the standard OS/360 compiler. As expected, these results vary considerably with the type of statement examined. An overall measure of the gains and losses will take much more time, since this can only come from user experience of the system over a period of months, and this only after the users themselves have become proficient. For the ALGOL project, a community of students and research workers should be the first to reach this state.

The main technical interest in the projects lies in the interpretation techniques, which are heavily affected by the definition of incrementalism. If a programmer wishes to change one or more increments of an existing segment, the compiler does not consider the contents of any other increments when recompiling the new ones. Since there are no restrictions on the increments which can be altered, this means that the compiler uses no cross-increment information. In particular, uses of identifiers cannot be associated with their declarations until the segment is executed. The increment used is at most one statement or declaration, and in certain cases the increment is less than a statement.

In view of the technical emphasis on the interpretation stage, the external specifications and the generator are reviewed briefly in the following two chapters.
The rest of the paper contains details of the methods used to store data and to evaluate program and data. The intermediate pseudo-code used for ALGOL 60 is given in an appendix. Although the implementations were carried out on S/360, the description of techniques makes no mention of the particular machine.
2. User's View of the System

The executable unit is a segment, which is itself made up of increments. The concept of program is somewhat weakened, since segments can call each other freely without prior warning being given to the system. A segment is inserted into the system by:

      segment segmentname
1
2
3
...
n
      endsegment

The segment name is an identifier. The integers are assigned by the system to the increments of the segment. The contents of each line are treated as successive items; if more than one increment occurs on a given line, the increment number is augmented accordingly. For example, in ALGOL 60:

      segment algol1;
1     begin real x, y;
3     m: x := 1; y := x + 1;
6     ...

begin, declaration, label and assignment are all increments. Each increment is analysed syntactically as soon as the line is received. If the increment is syntactically incorrect, a message is printed and the increment number repeated until a syntactically legal increment is inserted. At the end of the segment a global check is carried out on the structure of the segment, for example to confirm that begin-end matching is correct. This type of error is not definitive, since it depends on inter-increment effects, which can be changed dynamically.

Any existing segment can be executed (interpreted) by execute segmentname. If the segment calls other segments, they will also be interpreted at the point at which they are called. Names are local to their segment, so that association of names across segments is done by the programmer using parameters. Parameters are passed by lexicographical replacement. For example:

      segment seg1;
1     ...
5     a := b + c;
10    nonlocal a;
11    endsegment

      segment seg2;
1     begin real x;
3     ...
10    include seg1 using x;
20    endsegment

The nonlocal at the end of seg1 indicates that a is to be replaced by an object declared in the calling segment. The word include is a call of a segment from another. In the example, seg2 calls seg1 with x as parameter. The replacement of a by x is a strict name association, a taking the type of x and referring to the same object.

The interpretation of a segment may temporarily cease for one of the following reasons:
The statement wait was encountered in the segment.
The attention button is pressed at the user console.
An error is found.

In this state, the programmer may type language statements or editing statements for immediate execution. After each such statement the system returns to the wait state, unless a goto or a continue is obeyed. The statement continue is used to leave the wait state and resume sequential operation. Integers are allowed following goto, but are interpreted as increment numbers in the same segment. Such a goto is legal if and only if it would be legal to jump to a (nonexistent) label placed at the destination.

The editing facilities are available during the reception of a segment as well as during interpretation. They are limited to the addition, replacement and elimination of whole increments or sequences of increments, and new increments are accepted in the same way as the increments of the original segment.

This short overview of the external specifications, which will be completely described in the users' manual, should serve to set the scene for the technical details which follow.
3. Generation

Given the original restriction that the compiler must be incremental, it is evident that the generative part of the system is very limited. The transformation of the original text is essentially little more than a recoding, together with syntax analysis on a strictly local scale. The degree of transformation can best be judged by studying the code output from the ALGOL 60 generator, which is given in the appendix.

The process of transformation is very rapid. A top-down, deterministic syntax analyser is used, which was produced using a grammar-transforming program [3] which stems from work by Foster [4] and Knuth [5]. Semantic functions are called from the syntax to produce the required output. The axiom of the analyser is a single increment, and thus the generator is a routine which is called whenever a new increment needs to be treated, either when entering a segment at the console or as a result of editing.

The pseudo-code produced by the generator for successive increments is placed in a buffer, which is transferred to the disc when filled. Because of editing, the order of arrival of increments is not necessarily their true lexicographical order. The pseudo-code is nevertheless left in the order in which it arrives, possibly wasting space since some of it may no longer be accessible, and the chaining between increments is done by means of a control dictionary. This contains one entry per increment, and is a summary of the state of a segment. Entries are arranged by increment number, and contain the number of the lexicographically following increment, the type of the increment, and a pointer to its pseudo-code. If a segment is entered sequentially, the lexicographically following increment has the next number, but after editing this is not so.
For example:

      segment seg1;
1     begin real x;
3     x := y;
      ...
10    after 2
10    integer y;
11    endedit
11    ...

The user arrives at increment 10 before realising that he wished to declare y as well as x. He can in fact write this declaration anywhere in the block, but let us suppose that he wants it to be after real x ;. after 2 is an editing command which inserts successive increments until another command is reached. As a result the increment number 10 is considered to follow the increment number 2. The control dictionary will be:

    Increment   Type      Next   Pointer
    1           begin     2      to pseudo-code
    2           real      10     to pseudo-code
    3           assign    4      to pseudo-code
    ...         ...       ...    ...
    10          integer   3      to pseudo-code

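As a rough illustration of how such a control dictionary chains increments, the following Python sketch models a cut-down version of the example above (only increments 1, 2, 3 and 10). The field names `kind`, `next` and `code`, the pseudo-code offsets, and the helpers `lexical_order` and `insert_after` are assumptions made for this illustration; they are not taken from the implementation described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str   # type of the increment, e.g. "begin", "real", "assign"
    next: int   # number of the lexicographically following increment (0 = none)
    code: int   # hypothetical offset of this increment's pseudo-code


def lexical_order(control, first=1):
    """Return the increment numbers in lexicographical order by following `next`."""
    order, n = [], first
    while n != 0:
        order.append(n)
        n = control[n].next
    return order


def insert_after(control, target, new_number, kind, code):
    """Chain a new increment after `target`, as the editing command `after` does."""
    control[new_number] = Entry(kind, control[target].next, code)
    control[target].next = new_number


# Segment entered sequentially: begin, real x, x := y.
control = {
    1: Entry("begin", 2, 0),
    2: Entry("real", 3, 0),
    3: Entry("assign", 0, 8),
}
insert_after(control, 2, 10, "integer", 16)   # "after 2" followed by "integer y;"
print(lexical_order(control))  # [1, 2, 10, 3]
```

The pseudo-code itself is never moved: only the `next` fields change, which is why the order of arrival and the lexicographical order can differ.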
A further advantage of this control dictionary will be seen when we consider the interpretation of pseudo-code; it contains the structure of the segment, and it is possible to decide which statement to consider next during execution without consulting the pseudo-code.

The form of the pseudo-code is important. It must be compact, interpretable and symmetric, that is, the original text can be deduced from the pseudo-code (perhaps losing layout features). It is normally in functional notation, apart from the expressions, which are in reverse Polish. A normal segment fits into a pseudo-code buffer of 1024 bytes. Segments are limited to 256 increments and 128 identifiers. These limitations allow generation always to take place in a standard data zone of 4096 bytes per terminal.

The ALGOL 60 language as normally implemented (that is, no integer labels, no dynamic own arrays, fully specified parameters) is accepted, with local style input/output. For PL/I, more restrictions have been made, since we have no ambitions in the field of producing subroutine libraries.

Let us consider a particular segment and its corresponding pseudo-code and dictionaries:

      segment example of generation;
1     begin real a, b;
3     integer array i[1 : 100, 1 : 2];
4     integer p;
5     for p := 1 step 1 until 100 do
6     begin read(a, b);
8     i[p, 1] := a + b;
9     i[p, 2] := a * b
10    end
11    end
12    endsegment

After reception of the segment, an identifier dictionary, a control dictionary and a buffer of pseudo-code have been produced :

IDENTIFIER DICTIONARY

    Code   Name
    1      A
    2      B
    3      I
    4      P
    5      READ

The unfilled spaces contain the internal representation of blank.

CONTROL DICTIONARY

    Increment   Type (1 byte)   Next (1 byte)   Pointer to pseudo-code (2 bytes)
    1           begin           2               ...
    2           real            3               ...
    3           int arr         4               ...
    4           int             5               ...
    5           for             6               ...
    6           begin           7               ...
    7           procst          8               ...
    8           ass             9               ...
    9           ass             10              ...
    10          end             11              ...
    11          end             12              ...
    12          endseg          0               ...

The pseudo-code is given below, with comments in brackets and starting a new line for each increment:

    (none for begin)
    2 (number of identifiers) 1 2 (their codes)
    1 (number of groups with bounds) 1 (number of identifiers with same bounds) 3 (name of identifier) 2 (number of dimensions) 3 1 9 3 100 9 3 1 9 3 2 9 (four expressions)
    1 4
    1 (simple var) 4 3 (type of for-group) 3 1 9 3 1 9 3 100 9 (three expressions)
    (none for begin)
    5 2 (number of parameters) 1 1 9 1 2 9 (two expressions)
    1 (number of left-hand sides) 2 (subscripted variable) 3 (name) 2 (number of subscripts) 1 4 9 3 1 9 (two expressions) 1 1 1 2 17 (plus) (right-hand side in reverse Polish) 9
    1 2 3 2 1 4 9 3 2 9 1 1 1 2 19 (times) 9
    (none for end end endseg)

This example should give the flavour of the transformation made by the generator.
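To make the reverse Polish expressions in the listing concrete, here is a small Python sketch that evaluates one expression. The operation codes are inferred from the worked example only and should be read as assumptions: 1 introduces a simple variable (followed by its identifier code), 3 introduces a constant (followed by its value), 17 is plus, 19 is times, and 9 ends the expression. The full code table is in the appendix of the paper and may differ; the function name `eval_expression` is hypothetical.

```python
VAR, CONST, END, PLUS, TIMES = 1, 3, 9, 17, 19   # codes inferred from the example


def eval_expression(code, pos, values):
    """Evaluate one reverse Polish expression starting at code[pos].

    `values` maps identifier codes to their current values.
    Returns (result, position just after the END marker).
    """
    stack = []
    while code[pos] != END:
        op = code[pos]
        if op == VAR:                      # operand: value of a simple variable
            stack.append(values[code[pos + 1]])
            pos += 2
        elif op == CONST:                  # operand: literal constant
            stack.append(code[pos + 1])
            pos += 2
        elif op in (PLUS, TIMES):          # binary operator on the top two operands
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == PLUS else left * right)
            pos += 1
        else:
            raise ValueError(f"unknown code {op} at position {pos}")
    return stack.pop(), pos + 1


values = {1: 2.0, 2: 5.0}                                      # a = 2, b = 5
total, nxt = eval_expression([1, 1, 1, 2, 17, 9], 0, values)   # a + b
product, _ = eval_expression([1, 1, 1, 2, 19, 9], 0, values)   # a * b
print(total, product)  # 7.0 10.0
```

Because each expression carries its own end marker, consecutive expressions such as the array bounds `3 1 9 3 100 9` can be evaluated one after another by passing the returned position back in.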
4. Interpretation

The main elements used by the interpreter are the control dictionary and pseudo-code provided by the generator, a symbol table, and a stack. The symbol table has one entry for each name used in the segment being interpreted, and this entry points at (contains the displacement of) the most recent occurrence of a value of the corresponding name. Values are kept in the stack, along with a great deal of other necessary information, and hence pointers to values are displacements from the stack base.

Obviously, the symbol table is used to perform the association of references to names with their declarations. One of the effects of incrementalism is to complicate this process, for which the algorithms will be given in later chapters. In particular, name association must be done by the interpreter during a second pass of the source text, the first being taken up by the dynamic execution of the declarations. We will show that, for PL/I, the execution of the declarations is itself a multi-pass process.

The existence of the control dictionary is particularly important in the control of the different passes. In fact, each pass is not a pass of the source text, but of the control dictionary, and is usually limited to one block. Since the control dictionary contains the type and the order of the increments, it contains implicitly the structure of the program. It is thus possible to select the next increment to be obeyed dynamically as a function only of the control dictionary, thus avoiding unnecessary references to the pseudo-code. Since the pseudo-code is paged on the disk, this can lead to important savings of system time in some cases.

The interpreters themselves are written in the OS/360 assembly language, which has been augmented by macro-definitions [6] to allow the use of recursive procedures, conditions, loops and expressions in a form which resembles ALGOL 60.
It was originally thought that the recursive procedures would allow the interpreter to model directly the recursive structure of the languages.
For example, we would like to treat a compound statement as:

    ROUTINE COMPOUNDST
    WHILE B, STATTYPE, NE, END
      CALL OBEYSTATEMENT
      CALL GETNEXTSTATEMENT
    ENDWHILE
    RETURN

If one of the statements inside the compound statement is itself compound, this leads to a harmless recursive call of COMPOUNDST. Unfortunately this technique does not work, since if a goto is obeyed which leads to a non-local label the recursion must be broken. It is easy enough to change the level of the data stack, and this is done as necessary, but the stack of return points controlling the interpreter routines is a different matter. This problem is more fully treated in the next chapter.

This difficulty is typical of the difference between compilation and interpretation, and it is due to the fact that the decision of what to do is not treated independently of the execution of the action. It has been suggested that interpretation is easier for the compiler-writer than is compilation. In our view this is only true for trivial languages, the opposite being true for heavily-structured languages.
5. Structure of the Stack

We have already seen that the entry in the symbol table corresponding to a particular name points at the most recent occurrence of that name in the stack. Occurrences are put on the stack by the execution of a declaration of the name. If the name already existed in another block or in a previous and still current activation of the same block, this previous occurrence must not be lost. Thus each occurrence on the stack points to the preceding occurrence of the same name, if one exists. Since the type of objects is also considered during the interpretation, the stack entry for a simple variable is a triplet of the type, the value and a pointer to the preceding entry for the same name. In the case of ALGOL 60 on the 360 computer, this entry takes two words:

[Figure: two-word stack entry, one word holding the Value and the other holding the Type and the Pointer.]
The type is 1 byte, the pointer 3 bytes and the value one word. The more varied and more complex data types of PL/I require more space. In particular, the type of ALGOL 60 becomes an attribute summary in PL/I, and this takes up a full word. Thus the minimum entry for a PL/I variable is three words.

In general the life of a name is the same as the life of the block in which it is declared. At the end of a block its local names are "dedeclared", that is to say that the corresponding symbol table entries are replaced by the values they had at the start of the block. These values are found in the pointer field of the stack entry, since the pointer is to the preceding stack entry for the name. The current block level in the stack is kept in a particular memory location, and each block entry points to the preceding block entry. Let us consider a simple example in ALGOL 60:

1     begin real a, b;
2       begin integer b, c;
          ...
      A   ...
        end
      end

At a point A in the internal block, the stack and the symbol table would be:

[Figure: stack and symbol table at point A, showing the direction of growth of the stack and the current block (block 2).]
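The identifier chaining and "dedeclaration" just described can be sketched in Python as follows. This is a simplified model under stated assumptions: the class name `Interpreter` and its methods are hypothetical, stack entries are Python lists rather than packed words, and the block chain is held in a separate list (`blocks`) instead of in block entries on the stack itself.

```python
class Interpreter:
    def __init__(self):
        self.stack = []     # entries: [type, value, displacement of previous occurrence]
        self.symbols = {}   # name -> displacement of the most recent occurrence
        self.blocks = []    # one (stack height at entry, declared names) per open block

    def enter_block(self):
        self.blocks.append((len(self.stack), []))

    def declare(self, name, type_, value=None):
        # The new entry remembers the previous occurrence of the same name.
        self.stack.append([type_, value, self.symbols.get(name)])
        self.symbols[name] = len(self.stack) - 1
        self.blocks[-1][1].append(name)

    def exit_block(self):
        # "Dedeclare" local names: restore the symbol table from the pointer fields.
        height, names = self.blocks.pop()
        for name in reversed(names):
            previous = self.stack[self.symbols[name]][2]
            if previous is None:
                del self.symbols[name]
            else:
                self.symbols[name] = previous
        del self.stack[height:]

    def lookup(self, name):
        return self.stack[self.symbols[name]]


it = Interpreter()
it.enter_block(); it.declare("a", "real"); it.declare("b", "real")
it.enter_block(); it.declare("b", "integer"); it.declare("c", "integer")
inner_b = it.lookup("b")[0]    # at point A the most recent b is the integer one
it.exit_block()
outer_b = it.lookup("b")[0]    # after the inner end, b is the real one again
print(inner_b, outer_b, "c" in it.symbols)  # integer real False
```

This sketch covers only local lookup through the symbol table; it does not handle the non-local and procedure cases, which need the additional pointers discussed next.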
This simple identifier and block chaining is the basis of all the storage allocation and variable retrieval used in the compilers, but needs some improvement to treat certain particular cases which arise in ALGOL 60 or in PL/I. In particular, the search for non-local variables in the stack requires a second pointer in the case of procedures. Consider the ALGOL 60 program:

1     begin real x;
2       procedure f; begin x := 1; end;
3       begin integer x;
          f;
        end
      end

Procedure f is called from block 3, and at the moment that the assignment to x in block 2 is executed, the stack and the symbol table will be:

[Figure: stack and symbol table during the call of f from block 3.]
The most recent declaration of x is in block 3, but the x required is the x of block 1, since block 2 is lexicographically contained in block 1 and not in block 3. To avoid referencing the wrong x, a second pointer is needed in the case of procedures, which indicates the lexicographically containing block. We call this pointer the static pointer, the pointer to the calling block being the dynamic pointer. A static pointer is needed for procedures and in the case of the evaluation of an actual parameter corresponding to a formal parameter by name. The use of two pointers in this latter case is already known in ALGOL 60 compilers.

The pointers described above are sufficient to find all variables in ALGOL 60 and most in PL/I. The algorithm for a reference is the following:
1. Consider the address given by the relevant entry of the symbol table. If this address is in the current block, the reference is to a local variable and is given directly; otherwise the reference is non-local.
2. Consider the non-local address to decide if it is in scope. To do this, follow the static chain up to its first value before the address considered.
3. Follow the dynamic chain to its last value before the value found on the static chain.
4. These two values now enclose a block. If the address considered lies between them, the reference is found. Otherwise consider the next address in the identifier chain.
5. If this address is still higher in the stack than the static pointer value considered, go back to 4; otherwise go back to 2.
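The effect of the static pointer can be illustrated with a deliberately simplified Python sketch. It does not reproduce the address comparisons of steps 2 to 5: instead, each occurrence records the block activation that owns it, and an occurrence is accepted if its owner lies on the static chain of the current block. The class names `Activation` and `Occurrence` and the function `resolve` are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Activation:
    name: str
    static: Optional["Activation"]    # lexicographically containing block
    dynamic: Optional["Activation"]   # calling block


@dataclass(eq=False)
class Occurrence:
    owner: Activation                 # block activation that declared the name
    value: object
    previous: Optional["Occurrence"]  # preceding occurrence of the same name


def static_chain(block):
    while block is not None:
        yield block
        block = block.static


def resolve(symbols, name, current):
    """Return the occurrence of `name` visible from `current`, or None."""
    visible = list(static_chain(current))
    occ = symbols.get(name)           # most recent occurrence first
    while occ is not None:
        if any(occ.owner is block for block in visible):
            return occ
        occ = occ.previous            # follow the identifier chain
    return None


block1 = Activation("block 1", static=None, dynamic=None)
block3 = Activation("block 3", static=block1, dynamic=block1)
block2 = Activation("block 2 (body of f)", static=block1, dynamic=block3)

x_real = Occurrence(block1, 0.0, None)
x_int = Occurrence(block3, 0, x_real)
symbols = {"x": x_int}                # the most recent x is the integer x of block 3

found = resolve(symbols, "x", block2)
print(found.owner.name)  # block 1
```

Following the dynamic pointer alone from block 2 would reach block 3 and pick the integer x; the static pointer skips it, which is the behaviour the text requires.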
This process will always find an ALGOL 60 reference, if one exists, but needs modification due to the attributes INTERNAL/EXTERNAL and AUTOMATIC/STATIC/CONTROLLED/BASED in PL/I.

References to arrays are done by the normal extensions to simple references, but references to structures need thought. In PL/I structure references would normally be evaluated at compile time, but this cannot happen in incremental mode. The organization of the structure is modelled in the pseudo-code, so that with each element is associated its level number and the names of any immediate successors. For example, in

    DCL 1 a, 2 b FIXED BIN, 2 c, 3 a CHAR (10), 3 d FIXED DEC;

the element a is of level 1 and is followed by the names b and c. The generator detects multiple declarations, like

    DCL 1 x, 2 y, 2 y;

The interpreter can stack directly the information given in the pseudo-code and also test the compatibility of the declaration with the others in the same block; thus the element at level 1 must not be declared at this level elsewhere in the block. However, a name declared at a level other than level 1 can occur in other declarations in the same block.

Reference can be made to particular elements or sets of elements either by simple or by qualified names. The rules which decide whether the reference is legal are complicated, depending in the first instance on the distinction between complete and incomplete references. A complete reference is one in which each occurring name which is not of level 1 is preceded by the name of the immediately including element, the two names being separated by a decimal point. We note that a complete reference is always unique in any block, since no name can have more than one declaration at level 1 and no two immediate substructures of the same structure element can have the same name. Complete references take priority over incomplete references, but if a simple or qualified name is not a complete reference, it must correspond to only one incomplete reference.

In the previous example declaring the structure a:
a is a complete reference to the whole structure, taking priority over the possibility of its being an incomplete reference to the element at level 3.
a.c.a is a complete reference to the level 3 element.
a.a and c.a are (legal) incomplete references to this same element of level 3.

In the declaration

    DCL 1 p, 2 q, 3 z, 2 r, 3 z;

p.z is an ambiguous incomplete reference, since there are two possibilities. The diagram shows the stack and symbol table corresponding to the original example:
[Figure: symbol table and stack for the structure a. Recoverable labels: Symbol table, Stack, Value, STR, attributes.]
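The distinction between complete and incomplete references can be tried out with the following Python sketch, which works on the two declarations used as examples above. It is an assumption-laden model: structures are nested dictionaries, attributes and block nesting are ignored, an incomplete reference is taken to be any in-order selection of names from an element's full path that ends with the element's own name, and the function names are hypothetical.

```python
def paths(tree, prefix=()):
    """Yield the full path (level 1 name first) of every element in the tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)


def matches_incompletely(ref, path):
    """True if the names of `ref` occur in `path` in order and end at its last name."""
    if ref[-1] != path[-1]:
        return False
    remaining = iter(path)
    return all(name in remaining for name in ref)   # `in` consumes the iterator


def resolve(tree, reference):
    ref = tuple(reference.split("."))
    all_paths = list(paths(tree))
    if ref in all_paths:                             # complete references take priority
        return ("complete", ref)
    candidates = [p for p in all_paths if matches_incompletely(ref, p)]
    if len(candidates) == 1:
        return ("incomplete", candidates[0])
    return ("ambiguous" if candidates else "not found", None)


# DCL 1 a, 2 b, 2 c, 3 a, 3 d;   and   DCL 1 p, 2 q, 3 z, 2 r, 3 z;
block = {
    "a": {"b": {}, "c": {"a": {}, "d": {}}},
    "p": {"q": {"z": {}}, "r": {"z": {}}},
}
for reference in ("a", "a.c.a", "a.a", "c.a", "p.z"):
    print(reference, resolve(block, reference))
```

With these declarations, a and a.c.a resolve as complete references, a.a and c.a both resolve to the level 3 element as unique incomplete references, and p.z is reported as ambiguous.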
Thus the general algorithm treating simple or qualified references in PL/I is the following:
1. Apply the ALGOL 60 algorithm to the first name in the reference. If the reference is qualified, go to 4.
2. The reference is unqualified. If the level number of the element found is equal to one, the element is the one referred to.
3. The level number of the element found is greater than one. Find any other references to the same name in the same block. If there are no others, the element is found. If a reference to a level one object exists, this is the correct one. If more than one reference exists in the block, and none of them is of level one, the reference is ambiguous.
4. The name is qualified. Search for a complete reference in the block. If none is found, then search for all incomplete references. If there is more than one incomplete reference, the reference is ambiguous.
5. Search for a complete reference. If the first name in the element found is of level 1 and is not a terminal element, go to 6, otherwise go to 7.
6. The structure is searched directly for a complete reference. If one is found, it is the reference sought, otherwise go to 8.
7. Search the other entries on the identifier chain for an element in the same block which is non-terminal and of level 1. If one is found, go to 6.
8. No complete reference exists in the block. Find all possible incomplete references in the block. If there are none, go down the identifier chain to an including block and go to 5. If there is more than one, there is an error. If there is exactly one, it is the reference sought.

Finding incomplete references is a tree search in which nodes may be jumped, which is a long process. An initial check to see if all the names of the reference are included within the part of the stack to be searched usually saves time. Obviously, if the end of the identifier chain is reached at any stage, the reference does not exist.

A further chain is established in the stack to cope with the problem of recursion introduced in the previous chapter. This chain, called the scope chain, indicates which statements are contained in other statements. For example, consider the following portion of an ALGOL 60 program:

    if x > 0 then
    begin
      for i := 1 step 1 until 10 do
      begin
        a[i] := a[i] + x;
        x := x + 1
      end
    end

When the assignment is made to x inside the loop, the scope chain in the stack will be:

    Compound st.
    For st.
    Compound st.
    Conditional st.

Space in the entry for the for statement will also be used to indicate the current state of the for list. The scope chain is used to check that the current increment is permissible in the context, for example, that an else is preceded by an if then pair. It is also used to control the sequencing of increments, as detailed in the next chapter, and in the execution of the goto statement.
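A minimal Python sketch of such a scope chain is given below. It keeps the chain as an explicit list rather than as interpreter recursion, so that it can be cut back when control leaves several statements at once. The class `ScopeChain`, its method names, and in particular `unwind_to` (one plausible way a goto to an outer label could be served) are assumptions for illustration and are not described in this form in the paper.

```python
class ScopeChain:
    def __init__(self):
        self.entries = []   # kinds of the open statements, innermost last

    def open(self, kind):
        self.entries.append(kind)

    def close(self, kind):
        if not self.entries or self.entries[-1] != kind:
            raise SyntaxError(f"{kind} closed out of context")
        self.entries.pop()

    def check_else(self):
        """An else is only permissible when the innermost open statement is an if-then."""
        if not self.entries or self.entries[-1] != "conditional":
            raise SyntaxError("else without matching if ... then")

    def unwind_to(self, depth):
        """Drop every entry above `depth`, e.g. when a goto leaves nested statements."""
        del self.entries[depth:]


chain = ScopeChain()
for kind in ("conditional", "compound", "for", "compound"):
    chain.open(kind)
print(list(reversed(chain.entries)))
# ['compound', 'for', 'compound', 'conditional']

chain.close("compound")
chain.close("for")
chain.close("compound")
chain.check_else()          # legal here: the innermost open statement is the conditional
```

Printed innermost first, the chain matches the four entries listed for the example above.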
6. Evaluation of Declarations
On entry into a block, the local identifiers must be declared before executing the instructions within the block. In ALGOL, the compiler accepts declarations anywhere in the block and not only at its head. So it is necessary to see all the increments of the block, in order to declare the identifiers appearing in declarations or as labels. There are two problems:
1. Find the end of the block.
2. Ignore declarations in nested blocks, but declare the labels in nested compound statements.

The control dictionary is used to detect the structure of the segment. Declarations are immediately interpreted by updating the symbol table and allocating space in the stack for the type and the value of each identifier; the same process is applied to labels. When a procedure declaration is encountered, the procedure identifier is also declared and the body of the procedure is skipped. When a begin is found, the type of the following increments is checked; if a declaration appears, it is a block and the corresponding end is found; otherwise, it is a compound statement and, in a second pass, the labels are declared. At the end of the current block, control returns to the beginning and the executable statements are interpreted according to the order given by the control dictionary.

It should be noted that, in ALGOL, the declarations in a block are independent and the expressions appearing as bounds of an array must only invoke variables declared in an outer block.

In PL/I it is more difficult to find the end of a block, since the same END can close several blocks or DO groups. For example:

1     p: PROC OPTIONS (MAIN);
      ...
5     lab(2): loop: ...;
      ...
10    x: PROC;
11    DCL z FLOAT;
      ...
14    y: ENTRY;
      ...
20    continue: DO ...
      ...
28    END x;
29    DCL i FIXED BIN, lab(10) LABEL;
30    END;

On entry to p, the variables i, lab, loop, x and y become active, but the declarations of z and continue do not. The END x in line 28 closes the procedure x and all other blocks or DO-groups within x. Thus the DO of line 20 is also closed. Thus the structure of a PL/I program depends not only on the control dictionary, but also on the labels (simple labels only) which are attached to BEGIN, PROC or DO. This structure information is thus stacked, and the right number of closures done for a multiple END, testing first that the label exists.

Line 14 of the example shows up another problem, which is that of the secondary entry point: y exists at the same level as the primary entry point x of the procedure, that is, in the block containing the block in which it occurs. To avoid confusion, the PL/I interpreter does a preliminary pass of a segment to define its structure. The control dictionary is at this moment slightly changed to include the number of the increment containing the corresponding END in the case of BEGIN, PROC and DO statements. Since the segment can be changed by editing, this operation is performed each time the segment is edited.

In an incremental compiler, the declarations are normally interpreted in the order given by the control dictionary. However, in PL/I, the declarations within a block can be interdependent in such a way that the order of evaluation becomes important. For example:

1     example: PROC;
2     DCL a(x);
3     DCL x FIXED BINARY INITIAL (10);

The declaration of x, together with its initialisation, must be performed before the declaration of a. In general, the operations performed at the head of a block constitute the prologue. The language [8] defines the list of items which are available at the start of the prologue (for example variables declared in enclosing blocks) and the list of items which become available as the prologue progresses (automatic variables declared in the block, etc.). The prologue may need to evaluate expressions, concerning only automatic and defined data, in lengths, bounds, area sizes, and in the INITIAL attribute as iteration factors or as arguments in the CALL option. These references are dynamic in the sense that they make use of the value of the variable concerned. Conversely, the attribute LIKE considers only the description of the structure referred to and then creates new names without evaluating. In other cases, such as CONTROLLED and DEFINED, there exist expressions which refer to names, but these expressions are evaluated either when storage is allocated or when the variable is used.

The different memory classes in PL/I require different treatment. AUTOMATIC variables go on the stack, as in ALGOL 60, but STATIC variables require the value to be kept alive between activations of the block in which they are declared. In a batch compiler the STATIC memory is allocated at compile time and is treated, like the ALGOL 60 own, as if it were declared in the outermost block. This method is not easily applicable in the incremental compiler since the concept of outermost block is weakened and also the attribute LIKE can be applied to a STATIC structure. This latter implies the recognition of the name following LIKE before the name has been declared. It will therefore be necessary to separate the notions of name and value of identifiers, since these do not always have the same scope, and to divide the prologue in two:
1. Reserve all the names declared in the block being activated.
2. Allocate memory, initialising as required.

We suppose that all identifiers are explicitly declared. This simplifies the compiler, since implicit and contextual declarations have the whole program as their scope, which at best implies a reorganisation of the stack and at worst a problem of transfer of names across segments. In an incremental compiler it would be preferable to declare the identifier locally and AUTOMATIC.
The control dictionary indicates those statements which must be examined during the first phase of the prologue (DECLARE, label, etc.). The corresponding pseudo-code is examined and an entry is made in the symbol table to reserve each name. The symbol table entry points to a stack entry which contains the following information:
previous stack entry of the same name;
type (for example label, static, etc., but not a full description);
where to find the attributes (increment and pseudo-code address), or the value in the case of label constants;
pointers to immediate substructures for structure variables;
flag indicating that the memory space is not yet allocated, and thus that the variable cannot be used in an expression;
indications of particularities, for example that an expression is contained in the declaration or that the attribute LIKE is present, etc.

During this pass, multiple declarations are detected. This is not always trivial, since the label attached to PROC, for example, can also be declared, but not be used as a label elsewhere in the same block. In a second scan, names affected by LIKE are evaluated, and any new names produced by this process are reserved.

The storage allocation scan can now be carried out, except for the case of CONTROLLED and BASED variables, which are allocated by the programmer. The STATIC variables within the block are allocated a continuous zone in the stack, zones corresponding to the STATIC variables of different blocks being chained together. Allocation of STATIC space is only done the first time that the block is entered, since it is kept until the end of the program. Thus, at the end of a block, the STATIC space is attached to the enclosing block, which sometimes leads to recopying of STATIC values to a lower level. The memory allocation consists of replacing the address of the pseudo-code by a pointer to an attribute list and space for a value.

If the block has previously been active its STATIC zone is found on the STATIC chain and the pointers recalculated. If STATIC declarations are altered during the course of an interpretation, the interpretation should restart to take the changes fully into account. The example shows the stack and symbol table after execution of the prologue to block 5:
[Figure: stack and symbol table after the prologue of block 5. Recoverable labels: Value = 5, attributes, pointer to last static zone, static chain, Static, Symbol table, Stack.]
Program:
1 BEGIN;
5 BEGIN;
6 DCL i STATIC INIT (5);
10 END;
11 END;
For DEFINED variables, memory space cannot yet be allocated, although their complete attribute list must be obtained. This can require the evaluation of expressions. The value of such a variable is the pseudo-code address, so that the code is interpreted at each reference. AUTOMATIC and DEFINED variables are artificially put into two categories:
declarations in which the attributes contain expressions referring to names;
declarations in which the attributes either contain no expressions or only constants.
This second category can be treated immediately. At the same time the declarations of the former category are analysed and a list is made of the names they reference. In a second pass, these declarations are examined cyclically. If a declaration refers only to names which have been treated, it also can be treated. If at any time no declaration can be treated, then there exists an interdependence; otherwise, at the end of the process, after a number of cycles which is not limited in advance, all the declarations have been treated. For completeness, we note that EXTERNAL variables require that the name itself shall be known to the interpreter. Finally the prologue initialises arrays of AUTOMATIC LABEL variables. ALLOCATE and FREE are instructions which allow the user to allocate or liberate memory corresponding to previously declared variables. This requires interpretation of the corresponding declaration and the existence of a second zone of storage, since such quantities are not kept on the stack.
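The cyclic treatment of AUTOMATIC and DEFINED declarations described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function name treatment_order, the refs mapping (declared name to the set of names its attribute expressions mention) and the known parameter (names already treated, for example in enclosing blocks) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def treatment_order(refs: Dict[str, Set[str]],
                    known: Iterable[str] = ()) -> List[str]:
    """Return an order in which the declarations can be treated.

    refs maps each declared name to the names referenced by the
    expressions in its attributes (empty set: constants only).
    Raises ValueError if the declarations are interdependent.
    """
    treated = set(known)
    order: List[str] = []
    pending: Dict[str, Set[str]] = {}

    # Second category: no references to names, treated immediately.
    for name, names in refs.items():
        if names:
            pending[name] = set(names)
        else:
            order.append(name)
            treated.add(name)

    # First category: examined cyclically until none is left.
    while pending:
        ready = [n for n, names in pending.items() if names <= treated]
        if not ready:
            # A whole cycle treated nothing: interdependence.
            raise ValueError("interdependent declarations: "
                             + ", ".join(sorted(pending)))
        for n in ready:
            order.append(n)
            treated.add(n)
            del pending[n]
    return order
```

For example, if a refers to n and b refers to a and n, the sketch treats n first, then a, then b; two declarations that refer to each other raise the interdependence error.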
7. Evaluation of Particular Statements We discuss in this chapter certain problems arising from particular constructions in the languages. The scope chain was introduced in ALGOL to treat statement nesting. In general, at the end of each statement interpreted, the first element of the scope chain is examined. The following actions are taken, depending on the type of this element:
if. The then part has been obeyed. If an else exists, the else clause is skipped. An else encountered elsewhere is illegal. The if is removed from the scope chain and the next element examined.
for. The condition for continuation of the loop is found in the pseudo-code of the for statement. If the for is terminated, it is removed from the scope chain and the next element examined.
procedure. The procedure is completed and the return point is found. The procedure is removed from the scope chain. If the procedure returns a value, control is passed to the
expression containing the call; otherwise the next element on the scope chain is examined after the return, but before the following increment is obeyed.
begin. Compound statement or block. This is the normal state, and the one which allows continuation without looking at further elements of the scope chain. begin is obviously removed by the corresponding end.
The scope chain also serves to confirm the legality of statements in their context, for example that else follows if then and that then if and then for are not encountered. In PL/I, since the program structure is treated at the start of interpretation, the different parts of the conditional statement are chained. The general form is:
IF expression THEN instruction 1; [ELSE instruction 2;] instruction 3
The square brackets indicate that the ELSE part is optional. In the control dictionary, the IF increment points to the ELSE (if it exists) or to instruction 3. Thus, since DO groups are included in the block chain, PL/I does not use a scope chain. Since DO is also terminated by END, this is more practical in the particular case of PL/I. In both languages, the interpretation of a GOTO uses the block chain, and in ALGOL also the scope chain. The value of a label is in two parts:
the number of the increment;
the block level at its “declaration”.
The algorithm is the following.
1. If the block level of the label is higher than the current block level, there is an error. This can only happen in PL/I with a program like:
1 BEGIN;
2 DCL l(10) LABEL;
...
10 BEGIN;
11 p: ...
15 l(1) = p;
20 END;
21 GOTO l(1);
30 END;
The label l(1) referred to in 21 is declared, but its value is not. Since variables are “dedeclared” at the end of their block, a jump to a non-existent label is normally shown up by the absence of an entry in the symbol table.
2. If the label is declared in an enclosing block, the block chain is used to close the right number of blocks.
3. The label and the program are now at the same block level. There can be two further difficulties:
the GOTO can leave an internal statement (DO group in PL/I or IF, FOR or compound statement in ALGOL). The scope and/or block chain can be affected;
the GOTO causes a jump into an internal statement. This can be an error, for example a jump into an iterative DO in PL/I, or can require additions to the scope chain, for example a jump into a compound statement in ALGOL. The considerations outlined above imply that the interpretation of a GOTO cannot be done directly from the pseudo-code and the stack, but needs an analysis of the control dictionary to determine the contexts of the GOTO and its target.
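Steps 1 and 2 of the algorithm above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: LabelValue, GotoError and goto are hypothetical names, the block chain is modelled as a plain list of the increments that opened the currently active blocks (outermost first), and the block level is assumed to be the nesting depth. Step 3, the analysis of the control dictionary for jumps into or out of internal statements, is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabelValue:
    increment: int      # number of the increment carrying the label
    block_level: int    # block level at the label's "declaration"


class GotoError(Exception):
    pass


def goto(label: LabelValue, block_chain: List[int]) -> int:
    """Close blocks as needed and return the increment to continue at.

    block_chain holds one item per active block; its length is taken
    as the current block level (an assumption of this sketch).
    """
    current_level = len(block_chain)

    # Step 1: the label belongs to a block that is no longer active.
    if label.block_level > current_level:
        raise GotoError("label value refers to a block that is not active")

    # Step 2: the label is in an enclosing block; close the right number of blocks.
    while len(block_chain) > label.block_level:
        block_chain.pop()

    # Step 3 (contexts of the GOTO and of its target) is not modelled here.
    return label.increment
```

In the program above, the value assigned to l(1) would carry block level 2, whereas increment 21 is obeyed at block level 1, so the sketch reports the error of step 1.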
8. Conclusion The compilers described here were interesting projects from the programming point of view, but are not necessarily economic in a commercial environment. The first steps towards a decision concerning the viability of the products are measurement and use in a real system. Both of these are under way, but of course the first is easier than the second. The method of measurement chosen was the simplest: to compare the performance of the standard IBM compilers with that of ours. The results are not surprising, since they reinforce intuitive judgements. The incremental generators are about ten times faster than their batch counterparts, since they do much less work. The comparison of execution speeds is heavily dependent upon instruction type, but for PL/I interpretation is about 100 times slower than execution of the compiled code. The ALGOL interpreter runs about twice as fast as the PL/I interpreter, and since the batch ALGOL is worse than the batch PL/I the ALGOL figures can be made artificially reasonable. The reasons for the factors are obvious: interpretation at this level automatically costs at least an order of magnitude in efficiency, and PL/I is much more complicated than ALGOL. In particular the number of data types in PL/I implies that the routines which fetch variables and evaluate expressions will take a long time. Since descriptions of variables are also held at run time, PL/I is also much more expensive in terms of space. Outside the student environment these figures probably mean that the system is uneconomic. Students tend to have an execution time which is negligible compared with initialisation and loading. For commercial exploitation a pre-interpretation phase would be necessary, which at least constructs the symbol table for a segment and does name association. This operation would be repeated after editing, and its impact on the reactive facilities would need to be considered.
Acknowledgements We would like to thank the various members of the IBM France Scientific Centre and the University of Grenoble who have contributed to these projects.
Appendix. Pseudo-code for ALGOL 60 The pseudo-code is given in the form of a grammar with the following conventions. Characters between square brackets are in the control dictionary. Characters between round brackets are in the pseudo-code. Characters not between brackets are class names of the grammar. An asterisk indicates repetition.
Increment -> Begin
    End
    Label
    Declaration
    If
    Else
    Goto
    Procedure
    For
    Assign
    Vide
Begin -> [25]
End -> [26]
Label -> [29] (Index)
Declaration -> Simpledec
    Arraydec
    Procdec
    Switchdec
    Labelspec
    Stringspec
    Valuepart
Simpledec -> [Type] Idlist
Type -> Real
    Boolean
    Integer
Real -> 1
Boolean -> 2
Integer -> 3
Idlist -> (Numberofids Index*)
Arraydec -> [4] Arraylist
    [4 + Type] Arraylist
Arraylist -> Idlist
    Boundedlist
Boundedlist -> Boundedarray
    Boundedarray Boundedlist
Boundedarray -> Idlist (Numberofdimensions) Ex*
Procdec -> Proctype Idlist
    Proctype Id Parpart
Proctype -> [20]
    [20 + Type]
Parpart -> (Pars) Idlist
Switchdec -> [16] (Index Numberofelements) Ex*
Labelspec -> [17] Idlist
Stringspec -> [24] Idlist
Valuepart -> [19] Idlist
If -> [30] Ex
Else -> [31]
Goto -> [32] Ex
For -> [27] Var Forgroup*
Forgroup -> (1) Ex
    (2) Ex Ex
    (3) Ex Ex Ex
Procedure -> [33] (Index Numberofpars) Ex*
Assign -> [28] (Numberofleftsides) Var* Ex
Vide -> [34]
Ex -> Exel* (9)
    String
Exel -> Const
    Var
    Operator
    Bracket
    Function
    Condition
Const -> (Consttype Value)
Consttype -> Shortint
    Longint
    Realconst
    Booleanconst
Var -> (1 Index)
    (2 Numberofdims) Ex*
Function -> (10 Numberofpars) Ex*
Condition -> (11) Ex (12)
String -> (Numberofchars Char*)
A new line indicates an alternative. Certain obvious expansions are not given, for example ‘Numberofpars’ is any integer. ‘Index’ is the address of an identifier in the name table. The grammar should be treated as indicative rather than definitive.
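As an illustration of how the count-prefixed forms of this grammar might be read, the following Python sketch decodes an Idlist and a Simpledec. It assumes, purely for illustration, that the pseudo-code is a flat sequence of integers and that the control dictionary code is passed separately; the paper does not give the physical layout, and the names TYPE_NAMES, read_idlist and read_simpledec are hypothetical.

```python
from typing import List, Sequence, Tuple

# Control dictionary codes for Type, as given by the grammar above.
TYPE_NAMES = {1: "real", 2: "Boolean", 3: "integer"}


def read_idlist(code: Sequence[int], pos: int = 0) -> Tuple[List[int], int]:
    """Read Idlist -> (Numberofids Index*) starting at pos.

    Returns the name table indices and the position after the list.
    """
    count = code[pos]
    end = pos + 1 + count
    if count < 0 or end > len(code):
        raise ValueError("malformed Idlist")
    return list(code[pos + 1:end]), end


def read_simpledec(control_code: int,
                   code: Sequence[int]) -> Tuple[str, List[int]]:
    """Read Simpledec -> [Type] Idlist.

    control_code is the entry of the control dictionary; code is the
    pseudo-code of the increment (assumed layout, see the lead-in).
    """
    if control_code not in TYPE_NAMES:
        raise ValueError("not a simple declaration")
    indices, _ = read_idlist(code)
    return TYPE_NAMES[control_code], indices
```

Under these assumptions, a declaration of two integer variables whose names sit at (hypothetical) addresses 17 and 42 of the name table would be held as control code 3 and pseudo-code 2 17 42.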
References
1. M. GRIFFITHS, M. PECCOUD and M. PELTIER, Incremental interactive compilation, Proc. I.F.I.P., August 1968.
2. W. D. SEES, Review of ref. 1, Computing Reviews, November 1968.
3. M. GRIFFITHS and M. PELTIER, Grammar transformation as an aid to compiler production, IMAG, February 1968.
4. J. M. FOSTER, A syntax improving device, Computer Journal, May 1968.
5. D. E. KNUTH, Top-down syntax analysis, Copenhagen, August 1968.
6. M. GRIFFITHS and M. PELTIER, A macro-generable language for the 360 computers, Computer Bulletin, November 1969.
7. J. W. BACKUS et al., Report on the algorithmic language ALGOL 60, CACM, December 1960.
8. IBM System/360, PL/I Language Specifications, IBM Form Y33-6003.
9. B. RANDELL and L. J. RUSSELL, ALGOL 60 Implementation, Academic Press, 1964.