A simulator for teaching the internal workings of an assembler

M. H. WILLIAMS*, G. R. POTE and J. P. BROOKS
Rhodes University, Grahamstown, South Africa

Abstract: In many courses on systems programming, students are taught about assemblers. However, due to constraints on the time available, they seldom get the opportunity to construct one. This paper describes an assembler simulator which can be used as a tool to assist in teaching about assemblers. Using it, students can design and simulate the running of their own assemblers in a very short time. This approach not only stimulates the student’s interest but also gives him a deeper insight into assemblers.

* Present address: Department of Computer Science, Heriot-Watt University, Edinburgh EH1 2HJ, Scotland.

INTRODUCTION

Many courses on systems programming or compiler construction devote some attention to the subject of assemblers and assembly languages [1-3]. However, since assemblers are generally fairly large programs and since the amount of time available is usually limited, there is seldom sufficient time for the student to obtain any practical experience in the construction of an assembler. For this reason courses sometimes concentrate on what an assembler should do rather than providing an understanding of how it does it. We have attempted to solve this problem by creating an assembler simulator and using it in such a course.

An assembler simulator is a program which simulates the actions of an assembler. In order to use it, the user must decide on the details of the particular assembly language which he wants to simulate (the assembly language instruction and pseudo-op formats, the machine code produced, the actions to be performed, etc.). These details are then encoded in a suitable notation and are referred to as the assembler specification. The notation used for the specification is a special-purpose high level language. Using this notation one can describe the operation of a wide range of different assemblers with a variety of different properties.

Once an assembler specification for a particular assembler has been drawn up, it is read in by the simulator, whereupon the latter adopts the characteristics of this assembler and goes through the motions of the assembler: it accepts as input the source lines of a program written in the particular assembly language, extracts the information from these, assembles machine code and prints the listings expected as assembler output. At any point in the assembly language source code one may include directives to print out the contents of the assembler’s tables, variables or the machine code produced, or to switch the tracing mechanism on or off. The whole process is illustrated diagrammatically in Fig. 1.

[Fig. 1. Diagrammatic representation of assembler simulator.]

Thus the simulator serves a dual purpose in teaching students. In the first place it provides a model which enables one to “see” the inner workings of an assembler; in the second place, in formulating the assembler specification one obtains an insight into the way in which different facilities may be implemented.

Besides its use as an educational device, the simulator can also be used as a design tool in designing assemblers, especially for microprocessors. The mechanism behind the desired assembler can be specified and its operation debugged and tested on the simulator in a relatively short space of time, before proceeding with an actual implementation.

In the following sections the notation used for the assembler specification is described briefly and then applied to an example.

THE ASSEMBLER SPECIFICATION

The assembler specification describes the operation of the assembler to be simulated. It is divided into two parts:

(a) The Declaration Section: This consists of a sequence of declarations describing the formats of tables used by the assembler, the variables used by the assembler, etc.
(b) The Assembler Description: This is an algorithm in a high level notation which specifies the operation of the assembler.

Data types and identifiers

There are two basic types in this system: numbers and strings. In the declaration section one may declare variables of either type: a numeric variable is denoted by a numeric identifier (which consists of a letter followed by zero or more letters and digits), and a string variable is denoted by a string identifier (which is similar except that it is preceded by a “$” symbol). In other declarations, the format of the input record is described in terms of named fields, the format of a machine code instruction is described in terms of the fields which comprise the machine code word, and the entries of a table are described in terms of the fields of which they are composed. These named fields may be either string or numeric, and the same naming convention applies.

The Declaration Section

The declaration section is divided into eight subsections, each preceded by an asterisk followed by a two-letter qualifier code, viz. *CO, *IF, *OF, *MF, *TF, *EF, *EM, *VA.
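The naming convention is simple enough to check mechanically. The following Python sketch is ours rather than part of the simulator; the function name identifier_kind is hypothetical, and we assume that “letters” and “digits” mean the ASCII characters A to Z, a to z and 0 to 9.

```python
import re

# Hypothetical helper (not from the paper): a numeric identifier is a letter
# followed by zero or more letters and digits; a string identifier is the same
# preceded by "$". Assumption: only ASCII letters and digits are allowed.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def identifier_kind(name: str) -> str:
    """Return "string" or "numeric", or raise ValueError for an invalid name."""
    if name.startswith("$") and _NAME.fullmatch(name[1:]):
        return "string"
    if _NAME.fullmatch(name):
        return "numeric"
    raise ValueError(f"not a valid identifier: {name!r}")


print(identifier_kind("$LABEL"))      # string
print(identifier_kind("WORDLENGTH"))  # numeric
```

The same test applies to field names inside input, machine and table formats, so a declaration reader could use it to decide whether a field holds text or a number.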

Constant section

The constant section is used to set up certain constants for the simulator. The most important of these is the word length of the machine for which code is being generated. This is set up by means of a simple assignment statement. For example,

*CO WORDLENGTH = 16

will set up a word length of 16 for the machine code produced.
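A simulator needs the word length to know how large an assembled value may be. As a hedged sketch, the constant section might be read and used as follows; the helper names parse_constants and fits_in_word are hypothetical, and we assume each *CO entry has the form NAME = decimal integer and that a word holds an unsigned value.

```python
from typing import Dict, List


def parse_constants(lines: List[str]) -> Dict[str, int]:
    """Read *CO entries such as "WORDLENGTH = 16" into a dictionary.

    Assumption: every entry has the form NAME = decimal integer.
    """
    constants: Dict[str, int] = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected NAME = value, got {line!r}")
        constants[name.strip()] = int(value)  # int() ignores surrounding blanks
    return constants


def fits_in_word(value: int, word_length: int) -> bool:
    """True if value fits in an unsigned word of word_length bits (assumption)."""
    return 0 <= value < (1 << word_length)


constants = parse_constants(["WORDLENGTH = 16"])
word_length = constants["WORDLENGTH"]
print(word_length)                       # 16
print(fits_in_word(65535, word_length))  # True
print(fits_in_word(65536, word_length))  # False
```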

Input formats

The input format specifications describe the different forms which input records read by the assembler can have. Each specification indicates how a line of source code may be broken into fields. There can be any number of different input format specifications, each consisting of a sequence of field specifications separated by commas and terminated by a semicolon. Each field specification consists of one of the following:

A fixed length field. This is specified by writing the name of the field followed by its length in parentheses, e.g. $LABEL(9).

A variable length field. This is written as a string identifier followed by a delimiter character in quotation marks, e.g. $LAB ",". Under certain circumstances the delimiter character may be omitted from the assembly language source which is to be assembled.

A constant field. This is indicated by a numeric identifier followed by a string constant, e.g. ASTERISK "*". If the string is recognised, the numeric variable is set to 1; otherwise it is set to zero.

A dummy field, written as (0). This indicates the presence of zero or more spaces.

For example,

*IF $LABEL(9), $OP(6), $OPERAND(60);
    $LAB " ", (0), $POP " ", (0), $ARGUMENT " ";
    ASTERISK "*", $ADDRESS " ";

defines three input formats: the first consists of three fixed length fields $LABEL, $OP and $OPERAND of length 9, 6 and 60 characters respectively; the second has three variable length fields $LAB, $POP and $ARGUMENT, each terminated by one or more spaces; the third indicates that if the line begins with an asterisk, the remainder of the line is to be taken as a variable length field $ADDRESS, terminated by a space.

Output formats

The output format specifications indicate how a line of output is to be assembled. Again there can be any number of different output format specifications, each consisting of a sequence of field specifications separated by commas and terminated by a semicolon. Field specifications are the same as fixed length input field specifications. For example,

*OF $SPACE(8), $CARD(80);

Machine formats

The machine format specifications are similar to the output format specifications except that they are used to describe the formats of machine code instructions. There can be any number of machine format specifications, each consisting of a sequence of fixed length numeric fields. For example,

*MF F(4), N(12);
    NUM(16);

Table formats

In this section the user defines the tables (Symbol Table, Machine Opcode Table, etc.) which his assembler will use. Each table format specification has the form

Table name (max entries) = field1, field2, ..., fieldn;

followed by zero or more entries of the form

@dat1, dat2, ..., datn;

where max entries is an integer specifying the maximum number of entries in the table for which space is to be reserved; field1, ..., fieldn are string or numeric field names referring to the different fields within each entry of the table; and @dat1, dat2, ..., datn; initializes an entry in the table. In the case of a string field name, the name must be followed by an integer in square brackets, representing the maximum length which a string in this field may have. This is necessary in order for the system to print the contents of tables neatly. Each dati is a string constant (enclosed in quotation marks), a binary integer (preceded by #) or a decimal integer. For example,

*TF ST(20) = $SYMBOL[8], $TYPE[1], ADDRESS, VAL;
    MOT(3) = $OPCODE[6], MCOP, BRN;
    @"LOAD", #0000, 0;
    @"ADD", #0001, 0;
    @"JMP", #0010, 1;

defines a table ST with a maximum of 20 entries, each of which consists of four fields: a string field $SYMBOL whose maximum size is 8 characters, a string field $TYPE whose maximum size is 1 character, and two numeric fields ADDRESS and VAL; and a table MOT consisting of 3 entries (each with three fields: a string field $OPCODE and two numeric fields MCOP and BRN). The entries in MOT are initialized with the data given.

Expression formats

Since an operand field may contain an expression which may have one of a number of different possible forms, this set of declarations defines the set of possible expression forms which might occur. Each expression format is written in a form similar to a BNF production, with a string variable followed by an equal sign followed by one or more alternatives separated by exclamation marks and terminated by a semicolon. Each alternative consists of a sequence of string constants, string variables or pseudo-terminals (e.g. $LETTER, $DIGIT, $LAMBDA) separated by commas. If the left hand variable is separated from the equal sign by a slash symbol and an integer i, this means that the token i will be returned whenever this symbol is recognized. For example,

*EF $ADDR    = $EXP;
    $EXP     = $TERM ! $TERM, $PLUS, $EXP;
    $TERM    = $ID ! $INT;
    $ID/1    = $LETTER, $RESTID;
    $RESTID  = $LETTER, $RESTID ! $LAMBDA;
    $INT/2   = $DIGIT, $RESTINT;
    $RESTINT = $DIGIT, $RESTINT ! $LAMBDA;
    $PLUS/3  = "+";
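To make the meaning of these productions concrete, here is a hand-written recursive-descent recognizer for the $ADDR grammar, in Python. It is our sketch, not the simulator's parsing mechanism, which the text does not describe; the name recognize_addr is hypothetical, $LETTER and $DIGIT are assumed to be ASCII letters and digits, and the right recursion in $EXP is written as a loop. Each recognized $ID, $INT or $PLUS is reported with the token number (1, 2 or 3) that follows its slash.

```python
import string
from typing import List, Optional, Tuple

# Token numbers taken from the /i suffixes in the *EF example.
ID_TOKEN, INT_TOKEN, PLUS_TOKEN = 1, 2, 3


def recognize_addr(text: str) -> Optional[List[Tuple[int, str]]]:
    """Return the (token, lexeme) pairs of an $ADDR expression, or None."""
    tokens: List[Tuple[int, str]] = []
    pos = 0

    def scan(chars: str, token: int) -> bool:
        # $ID/1 = $LETTER, $RESTID and $INT/2 = $DIGIT, $RESTINT:
        # one character from chars followed by zero or more further ones.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] in chars:
            pos += 1
        if pos == start:
            return False
        tokens.append((token, text[start:pos]))
        return True

    def term() -> bool:
        # $TERM = $ID ! $INT
        return scan(string.ascii_letters, ID_TOKEN) or scan(string.digits, INT_TOKEN)

    # $EXP = $TERM ! $TERM, $PLUS, $EXP: one or more terms separated by "+".
    if not term():
        return None
    while pos < len(text) and text[pos] == "+":
        tokens.append((PLUS_TOKEN, "+"))  # $PLUS/3 = "+"
        pos += 1
        if not term():
            return None
    # $ADDR = $EXP, and the whole operand must be consumed.
    return tokens if pos == len(text) else None


print(recognize_addr("X+12"))  # [(1, 'X'), (3, '+'), (2, '12')]
print(recognize_addr("X+"))    # None
```

Because the first alternative of $EXP is a prefix of the second, the loop always tries to extend the expression with another "+" before accepting it, which is what the grammar intends.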

Thus an expression of form $ADDR consists of one or more terms separated by the symbol "+", where each term is either an identifier or an integer. If an expression is recognized, the tokens associated with the nonterminals will be used by the assembler in analyzing the expression and generating code.

Error messages

If the user wants his assembler to print error messages at certain points, he specifies these by number in the Assembler Description. This section relates the numbers to actual error messages. Each entry has the form

error number, error message text;

For example,

*EM 1, "INVALID OPCODE";
    2, "INVALID OPERAND";
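The *EM entries amount to a lookup table keyed by error number. A sketch of how they might be read (the function parse_error_messages and its regular expression are our assumptions, and the message text is assumed never to contain a quotation mark):

```python
import re
from typing import Dict

# One *EM entry: an error number, a comma, a quoted message and a semicolon.
# Assumption: the message text never contains a quotation mark.
_ENTRY = re.compile(r'\s*(\d+)\s*,\s*"([^"]*)"\s*;')


def parse_error_messages(body: str) -> Dict[int, str]:
    """Map error numbers to message texts for the body of an *EM section."""
    messages: Dict[int, str] = {}
    body = body.rstrip()
    pos = 0
    while pos < len(body):
        match = _ENTRY.match(body, pos)
        if match is None:
            raise ValueError(f"malformed *EM entry at offset {pos}")
        messages[int(match.group(1))] = match.group(2)
        pos = match.end()
    return messages


messages = parse_error_messages('1, "INVALID OPCODE";\n2, "INVALID OPERAND";')
print(messages[2])  # INVALID OPERAND
```

When the Assembler Description asks for error 2, the simulator would then only need to look up messages[2] and print it in the listing.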

The variables required in the Assembler Description are declared by writing them out as a list separated by semicolons. For example,

*VA LC; FLAG; ENDFLAG;

Comments

Any line which has asterisks in the first two character positions is treated as a comment.
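Reading a specification begins by routing each declaration line to its subsection and discarding comments. The sketch below assumes that a qualifier code starts a line, that declarations may continue on the following lines, and that a line beginning *AD ends the declaration section; split_declarations is a hypothetical name.

```python
from typing import Dict, List, Optional

# The eight declaration subsections, in the order listed in the text.
SECTION_CODES = ("CO", "IF", "OF", "MF", "TF", "EF", "EM", "VA")


def split_declarations(spec: str) -> Dict[str, List[str]]:
    """Group the lines of a declaration section by their *XX qualifier code."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in spec.splitlines():
        if line.startswith("**"):
            continue  # comment: asterisks in the first two positions
        if line.startswith("*AD"):
            break  # assumption: the Assembler Description follows
        if line.startswith("*") and line[1:3] in SECTION_CODES:
            current = line[1:3]
            sections.setdefault(current, [])
            line = line[3:]  # keep any declaration on the same line
        text = line.strip()
        if not text:
            continue
        if current is None:
            raise ValueError(f"declaration before any qualifier code: {line!r}")
        sections[current].append(text)
    return sections


spec = '** example\n*CO WORDLENGTH = 16\n*VA LC; FLAG; ENDFLAG;\n*AD\nBEGIN'
print(sorted(split_declarations(spec)))  # ['CO', 'VA']
```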

The Assembler Description

This section (preceded by the characters *AD) contains a description of the action of the assembler written as a program in a high level notation. It consists of a single compound statement (i.e. a sequence of labelled or unlabelled statements enclosed between BEGIN and END) followed by a full stop. A label has the form $6