Information Processing Letters 32 (1989) 263-266
North-Holland

ANOTHER NOTE ON RECURSIVE ASCENT
George H. ROBERTS
13134 S. 125th E. Avenue, Broken Arrow, OK 74011, U.S.A.

Communicated by G.R. Andrew
Received 4 October 1988
Revised 15 March 1989 and 23 May 1989
Not all implementations of LR parsers are created equal. The differences become apparent when the language that each LR state recognizes is considered.

Keywords: Recursive ascent, bottom-up parsing, LR parsing, language approximation
1. Introduction
Pennello [4] introduced the idea of compiled LR parsers by representing each row of the LR parsing table as a block of code. Both Aretz [2] and Roberts [5] extended this concept by representing these blocks as recursive ascent subroutines, but Roberts started with the sets of items rather than the parsing table. The purpose of these procedural implementations of LR parsers is to sacrifice space for speed. Pennello reported a two-fold increase of space for a six-fold increase of speed. Our purpose is to show that Roberts solved a more specific problem, one that leads to “better” procedural implementations and to more insight into parsing. In the sequel, the representations are also referred to as [4], [2], and [5] respectively. To allow comparison of the problems being solved, Section 2 introduces the concept of an LR state as a recognizer of a language. Section 3 compares the problems solved in [4], [2], and [5]. Section 4 discusses implementation details from a compiling point of view. Section 5 summarizes the results. Details about language specification, LR parsing, and optimizing compilers may be found in [1,3,6].
2. Languages for LR states

Let the grammar $(V_T, V_N, P, S)$ be an LR grammar. $V_T$ and $V_N$ are the terminal and nonterminal alphabets. $S$, a subset of $V_N$, is the set of start symbols. Elements of $P$, the production rules, are written in the form $\alpha \to \ldots \beta \ldots$, and items, elements of $P$ with the place marker “$\cdot$”, as $\alpha \to \ldots \cdot\beta \ldots$. An LR state is a set of items restricted by

if $\alpha \to \ldots \cdot\beta \ldots$ is in the state, then so is $\beta \to \cdot\delta \ldots$ for all $\beta \to \delta \ldots \in P$.
0020-0190/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)
Volume 32, Number 5, INFORMATION PROCESSING LETTERS, 22 September 1989

This restriction indicates that after you “read” a $\delta$ you might be able to “read” a $\beta$. This idea is captured by a language for each LR state, specified by a state grammar $(v_t, v_n, p, s)$ constructed as follows ($\Sigma^+ \in (V_T \cup V_N)^+$ and $\Sigma^* \in (V_T \cup V_N)^*$):
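(1) for each $\alpha \to \Sigma^+ \cdot \beta\, \Sigma^*$, an item in the state,
    $\beta' \to \beta \in p$, $\beta' \in v_n$, $\beta \in v_t$ (if $\beta \in V_T$ then $\beta' \in s$);
(2) for each $\alpha \to \cdot\beta\, \Sigma^*$, an item in the state,
    $\beta' \to \beta\,\alpha' \in p$, $\alpha', \beta' \in v_n$, $\beta \in v_t$ (if $\beta \in V_T$ then $\beta' \in s$);
(3) for each $\alpha \to \cdot$, an item in the state,
    $\varepsilon' \to \alpha' \in p$, $\alpha' \in v_n$, $\varepsilon' \in s$;
(4) and for each $\alpha \to \Sigma^+ \cdot$, an item in the state,
    $\varepsilon' \to \varepsilon \in p$, $\varepsilon' \in s$.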
A set of rules $\beta' \to \beta(\alpha'_1 \mid \cdots \mid \alpha'_n)$ in a state grammar is interpreted as: when you are ready to “read” $\beta$, you “read” $\beta$ and then you are ready to “read” one of the $\alpha_i$. This construction leads to regular expressions for the state languages, written in the symbols of the LR grammar, of the form

$t_1 G_1 \mid \cdots \mid t_n G_n \mid G_{n+1}$

where $t_i \in V_T$ and the $G_i$ are regular expressions over $V_N$. An implementation of an LR state recognizes that state's state language.

We finish this section with an example of an LR state for the usual expression grammar (state 4 of [1, p. 208]):

set of items:
    F → "(" · E ")"
    E → · E "+" T
    E → · T
    T → · T "*" F
    T → · F
    F → · "(" E ")"
    F → · id

the state grammar:
    v_n:  { E', T', F', "("', id' }
    v_t:  { E, T, F, "(", id }
    p:    E' → E          E' → E E'
          T' → T T'       T' → T E'
          F' → F T'
          "("' → "(" F'   id' → id F'
    s:    { "("', id' }

the state language as a regular expression:
    ( "(" | id ) F T⁺ E⁺
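The construction of a state grammar from a set of items can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Item`, `prime`, and `state_grammar` are hypothetical, the symbol `eps'` stands in for the primed empty word, and the handling of rules (3) and (4) follows one reading of the construction. Applied to the items of state 4 of the expression grammar, only rules (1) and (2) fire.

```python
from typing import NamedTuple


class Item(NamedTuple):
    lhs: str      # left-hand side of the production
    rhs: tuple    # right-hand side symbols
    dot: int      # position of the place marker within rhs


def prime(symbol):
    """Name of the state-grammar nonterminal that means 'ready to read symbol'."""
    return symbol + "'"


def state_grammar(items, terminals):
    """Return (rules, starts); rules is a set of (lhs, rhs-tuple) pairs."""
    rules, starts = set(), set()
    for item in items:
        if item.dot < len(item.rhs):
            beta = item.rhs[item.dot]
            if item.dot > 0:
                # rule (1): marker inside the right-hand side, reading beta ends the word
                rules.add((prime(beta), (beta,)))
            else:
                # rule (2): marker at the front, after beta we are ready to read lhs
                rules.add((prime(beta), (beta, prime(item.lhs))))
            if beta in terminals:
                starts.add(prime(beta))
        elif not item.rhs:
            # rule (3): empty production (assumed reading)
            rules.add(("eps'", (prime(item.lhs),)))
            starts.add("eps'")
        else:
            # rule (4): completed item (assumed reading)
            rules.add(("eps'", ()))
            starts.add("eps'")
    return rules, starts


# Items of state 4 of the usual expression grammar.
STATE_4 = [
    Item("F", ("(", "E", ")"), 1),
    Item("E", ("E", "+", "T"), 0),
    Item("E", ("T",), 0),
    Item("T", ("T", "*", "F"), 0),
    Item("T", ("F",), 0),
    Item("F", ("(", "E", ")"), 0),
    Item("F", ("id",), 0),
]

rules, starts = state_grammar(STATE_4, terminals={"(", ")", "+", "*", "id"})
for lhs, rhs in sorted(rules):
    print(lhs, "->", " ".join(rhs))
print("start symbols:", sorted(starts))
```

The seven rules and two start symbols printed here can be compared by hand with the state grammar given for state 4.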
3. Language comparison

The state language need not be recognized exactly. Since no unexpected errors can appear in the input, words added to the state language will never appear and cannot affect the action of the recognizer. While [5] begins with the exact state language, the usual table-driven implementation of an LR parser, as well as [4] and [2], approximates the state language by adding words to it. For the above example, this approximation is

    ( "(" | id ) (F | T | E)*

These two cases are at opposite extremes of the set of languages that lead to correct procedural implementations of LR parsing states.
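The relationship between the exact state language and its approximation for state 4 can be checked by brute force. The sketch below is an illustration only: the space-separated encoding of words, the length bound, and the names `EXACT`, `APPROX`, and `language` are assumptions made for the example.

```python
import re
from itertools import product

# Symbols of the LR grammar that can be "read" in state 4.
ALPHABET = ["(", "id", "F", "T", "E"]

# Words are encoded as space-separated symbols, e.g. "id F T E".
EXACT = re.compile(r"(\(|id) F( T)+( E)+")     # ( "(" | id ) F T+ E+
APPROX = re.compile(r"(\(|id)( (F|T|E))*")     # ( "(" | id ) (F | T | E)*


def language(pattern, max_len):
    """All words of 1..max_len symbols that the pattern accepts."""
    accepted = set()
    for n in range(1, max_len + 1):
        for word in product(ALPHABET, repeat=n):
            if pattern.fullmatch(" ".join(word)):
                accepted.add(word)
    return accepted


exact = language(EXACT, 5)
approx = language(APPROX, 5)
extra = approx - exact          # words added by the approximation

print(len(exact), "exact words,", len(approx), "approximate words")
print("approximation contains the exact language:", exact <= approx)
print("one added word:", " ".join(min(extra)))
```

Every exact word is also accepted by the approximation, while added words such as `( E` are accepted only by the approximation; by the argument above such words never reach the state in a correct parse.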
4. Implementation concerns

Implementation concerns fall into two areas: LR state linkage and state language selection. The LR state linkages of [4], [2], and [5], written in a uniformly low-level language, are presented in the following display.

                    subroutine call              subroutine return

    (Pennello [4])  push  next_FSA_state         dec   stack-pointer, rule-length
                    jump  next_LR_state          return

    (Aretz [2])     call  next_LR_state          load  n, rule-length
                    dec   n                      return
                    return if n <> 0
                    jump  next_FSA_state

    (Roberts [5])   call  next_LR_state          dec   stack-pointer, rule-length
                    jump  next_FSA_state         return

Rather than selecting the “best” low-level implementation, the functionality should be expressed as high-level language constructs:

    “invoke” subroutine “but return to” local_label
    “skip” n “procedure invocations as you return”
deferring to an optimizing compiler the appropriate low-level implementation.

The selection of the “best” approximation to a state language should also not be arbitrary but should be made on a state-by-state basis. This can be accomplished within an optimizing compiler as suggested in [5]. A simple optimizing compiler was built for the purpose of compiling LR parser state descriptions. The compiler selects an approximate language as a side-effect of performing constant propagation, block merging, and branch elimination, and selects the appropriate subroutine linkage. For the few states compiled (several hundred LR states from BASIC, Pascal, and Fortran grammars), on a state-by-state basis, the implementations of [5] were always as fast and as compact as the best of [4] and [2]. The next display shows implementations for the example:

code using [4] or [2]: ( "(" | id ) (F | T | E)*

    I4: read(sym)
    L1: if sym = "(" then call I4; goto L2
        if sym = id then call I5; goto L2
        sym := id; goto L1                    -- error recovery
    L2: if sym = F then call I3; goto L2      -- a multi-level return
        if sym = T then call I2; goto L2      -- exits loop
        if sym = E then call I8; goto L2

code using [5]: ( "(" | id ) F T⁺ E⁺

    I4: read(sym)
    L1: if sym = "(" then call I4; goto L2
        if sym = id then call I5; goto L2
        sym := id; goto L1                    -- error recovery
    L2: call I3
    L3: call I2; if sym = T then goto L3      -- a multi-level return
    L4: call I8; goto L4                      -- exits loop
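To make the linkage concrete, here is a minimal recursive ascent recognizer in Python that follows the counter-based linkage listed for [2]: a reducing state loads n with the rule length and returns, and each caller decrements n and keeps returning while n is nonzero. Everything specific in it is an assumption for illustration: the toy grammar E → E + a | a (augmented with S' → E), the state numbering, and the names `Parser`, `ParseError`, and `recognize`. It is a sketch, not the code generated by any of the cited systems.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive ascent recognizer for the toy grammar  E -> E + a | a."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    def shift(self, expected):
        if self.tokens[self.pos] != expected:
            raise ParseError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    # Each state returns (lhs, n): a reduction to lhs is in progress and
    # n counts the invocations that still have to be left.

    def state0(self):                # S' -> .E    E -> .E + a    E -> .a
        self.shift("a")
        lhs, n = self.state2()
        while True:
            n -= 1                   # "dec n"
            assert n == 0            # every marker is at the front, so returns stop here
            if lhs == "S'":
                return True          # accept
            lhs, n = self.state1()   # goto on E

    def state1(self):                # S' -> E.    E -> E. + a
        if self.tokens[self.pos] == "$":
            return "S'", 1           # reduce S' -> E: "load n, rule-length; return"
        self.shift("+")
        lhs, n = self.state3()
        return lhs, n - 1            # "dec n"; n is still nonzero, so keep returning

    def state2(self):                # E -> a.
        return "E", 1                # rule length 1

    def state3(self):                # E -> E + .a
        self.shift("a")
        lhs, n = self.state4()
        return lhs, n - 1            # "dec n"; n is still nonzero, so keep returning

    def state4(self):                # E -> E + a.
        return "E", 3                # rule length 3


def recognize(tokens):
    """True if tokens form a sentence of the toy grammar, False otherwise."""
    try:
        return Parser(tokens).state0()
    except ParseError:
        return False


print(recognize(["a"]))                 # a single operand
print(recognize("a + a + a".split()))   # left-recursive sum
print(recognize(["a", "+"]))            # incomplete input
```

In this sketch the loop in `state0` reads one `a`, then any number of `E` symbols, then `S'`, which plays the same role as the loops at L3 and L4 in the code using [5].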
5. Discussion

The state language approach provides insight into LR parsing by providing a means of comparing the problems solved by alternative LR parser implementations. In [2,4,5] alternatives are suggested to the usual table-based implementation of LR parsers. While all suggest different implementations, only [5] solves a different problem. An optimizing compiler was constructed to sort through the alternatives. An experiment indicates that the “best” implementation is closer to that proposed in [5] than that proposed elsewhere.
References

[1] A.V. Aho and J.D. Ullman, Principles of Compiler Design (Addison-Wesley, Reading, MA, 1978).
[2] F.E.J.K. Aretz, On a recursive ascent parser, Inform. Process. Lett. 29 (1988) 201-206.
[3] B. Lorho, ed., Methods and Tools for Compiler Construction (Cambridge University Press, New York, 1984).
[4] T.J. Pennello, Very fast LR parsing, SIGPLAN Notices 21 (7) (1986) 145-151.
[5] G.H. Roberts, Recursive ascent: an LR analog to recursive descent, SIGPLAN Notices 23 (8) (1988) 23-29.
[6] W.M. Waite and G. Goos, Compiler Construction (Springer, New York, 1984).