JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 35, 2–17 (1996)
ARTICLE NO. 0063
An LR Substring Parser Applied in a Parallel Environment

GWEN CLARKE AND DAVID T. BARNARD

Department of Computing and Information Science, Queen's University, Kingston, Ontario, Canada K7L 3N6
A sequential substring parser implemented by Cormack naturally lends itself to parallelization. Cormack's algorithm implements the theory developed by Richter for a suffix parser that parses the bounded context class of LR grammars. Of interest is the behavior of the parallel version, particularly the number of reductions done with small substrings. Timing gains are made when parsing sentences of an expression language on a balanced binary tree architecture. The behavior is analyzed for sentences of an expression language parsed on 7-, 15-, and 31-node trees. The expression language is parsable in O(log n) time in the best case. How evenly work is distributed among the processors depends on the number of tokens per leaf and on the shape of the program's derivation tree. © 1996 Academic Press, Inc.
1. INTRODUCTION

Research is currently being done in the area of parallelizing the parsing of programs during compilation. Of particular interest is parallel parsing that potentially employs hundreds, if not thousands, of processors. This paper examines the parallelization of a parsing algorithm developed by Cormack [6]. Cormack implements the ideas of suffix analysis and substring parsing developed by Richter [14]. The algorithm was developed for error recovery during sequential parsing. Cormack's parsing method uses the LR parsing algorithm to parse the bounded context class of grammars [8, 21], a subclass of the LR grammars. The sequential parsing method naturally lends itself to parallelization since it is a substring parsing algorithm: the parser, in a special recovery state, begins at the token immediately following an error token and determines whether the remainder of the input is a proper suffix of a string generated by the grammar. In the parallel version [2], each processor is given a portion of the input, or partially parsed sections, to process. A processor parses its allocated section starting in the special recovery state. Summarized trials demonstrate the behavior of a parallel version of Cormack's parsing algorithm on three balanced binary tree architectures using programs of a small language. Trees with 7, 15, and 31 nodes are examined. The distribution of the parsing work among the processors, and how this applies to a highly parallel environment, is discussed; communication costs are not considered.

Section 2 discusses the background material in the areas of substring parsing and parallel parsing. Section 3 discusses the adaptation of the sequential parsing algorithm to the parallel environment. Section 4 discusses trials using the parallel LR substring parser on an expression grammar; the applicability of these results to the highly parallel environment and to other bounded context grammars is also discussed. Finally, Section 5 contains the conclusions.

2. BACKGROUND

This paper assumes that the reader is familiar with compilation and parsing; for more information, the reader is directed to [1]. Some knowledge of LR parsing is also assumed.
2.1. Terminology and Conventions

Any terminology and conventions not defined here can be found in [1]. The following conventions are followed:
• T is the finite set of terminal symbols;
• N is the finite set of nonterminal symbols;
• capital letters at the beginning of the alphabet (A, B, C, ...) are nonterminal characters, e.g., A ∈ N;
• capital letters at the end of the alphabet are strings of terminals and nonterminals, e.g., X ∈ (N ∪ T)*;
• small letters at the beginning of the alphabet are terminal characters, e.g., a ∈ T; and
• small letters at the end of the alphabet are strings of terminals, e.g., x ∈ T*.

A context-free grammar (CFG) is defined as a 4-tuple, G = ⟨T, N, S, P⟩, where S is the start symbol and P is the set of rules, also referred to as productions. Productions are of the form A → X. LHS refers to the left-hand side of a production; RHS refers to the right-hand side. A ⇒ X means that A derives X in one step. A ⇒* X means that A derives X in one or more steps, e.g., A ⇒ X1 ⇒ X2 ⇒ ··· ⇒ X. L(G) is the language generated by the grammar G, L(G) = {x | S ⇒* x}; x is said to be a sentence of L(G). A sentential form is derivable from S according to the grammar G, SF(G) = {X | S ⇒* X}.
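The 4-tuple definition above can be made concrete with a small sketch. The class names and the sample grammar below are illustrative choices, not taken from the paper:

```python
# A minimal sketch of the CFG 4-tuple G = <T, N, S, P>.
from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    lhs: str     # a nonterminal A
    rhs: tuple   # a string X over (N | T)*, as a tuple of symbols

@dataclass
class Grammar:
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    start: str               # S
    productions: tuple       # P, rules of the form A -> X

# A tiny expression grammar in this representation (illustrative only):
G = Grammar(
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    nonterminals=frozenset({"E", "T", "F"}),
    start="E",
    productions=(
        Production("E", ("E", "+", "T")),
        Production("E", ("T",)),
        Production("T", ("T", "*", "F")),
        Production("T", ("F",)),
        Production("F", ("(", "E", ")")),
        Production("F", ("id",)),
    ),
)
```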
0743-7315/96 $18.00 Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
The rightmost derivation for X ⇒*rm x is the sequence X ⇒ X1 ⇒ X2 ⇒ ··· ⇒ x, where in each step the rightmost nonterminal is replaced using a production, p ∈ P, of the grammar G. If X = WAz and A ⇒ Z, where W and z may be empty, then X ⇒ WZz. A CFG is said to be unambiguous if x ∈ L(G) implies that x has a unique rightmost derivation. For any language L(G), if A → XY ∈ P, then A → YX ∈ P^rev for the language L^rev(G^rev). A grammar class is symmetric when G ∈ G_class implies G^rev ∈ G_class.

LR parsing is a left to right, bottom-up strategy that successively reduces input tokens and previously recognized nonterminals until the start symbol is recognized. LR parsing produces the rightmost derivation of a sentence. Grammars are described as being LR(k), where k refers to the number of lookahead symbols. When k = 0, only the current input symbol is needed to choose a parsing action. When k = 1, the current input symbol and the next input symbol must both be known.

BC(1,1) refers to 1–1 bounded context grammars [8, 21]. A 1–1 bounded context grammar has two characteristics: (1) for every rule A → X, if for some sentential form containing αXβ, A derives X, then A derives X for all sentential forms containing αXβ; and (2) the grammar is LR(1). SBC(1,1) refers to 1–1 simple bounded context grammars. A 1–1 simple bounded context grammar has two characteristics: (1) for every rule A → X, if for some sentential form containing αX, and for another sentential form containing Xβ, A derives X, then A derives X for all sentential forms containing αXβ; and (2) the grammar is LR(0). Both the BC(1,1) and the SBC(1,1) grammar classes are symmetric.

When referring to input, left is toward the beginning and right toward the end. A processor x is said to be left of a processor y if it parses tokens corresponding to input nearer the beginning than those parsed by y and the sections parsed by x and y are adjacent.
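The reversal convention for productions (each right-hand side reversed in G^rev) can be sketched mechanically. The function name and tuple encoding are illustrative assumptions:

```python
# Sketch: constructing the productions of G^rev by reversing every RHS.
# Reversing symbol by symbol is consistent with the rule "A -> XY in P
# implies A -> YX in P^rev" applied at every split point.
def reverse_grammar(productions):
    """Each rule A -> s1 s2 ... sn becomes A -> sn ... s2 s1."""
    return [(lhs, tuple(reversed(rhs))) for lhs, rhs in productions]

P = [("E", ("E", "+", "T")), ("T", ("T", "*", "F"))]
P_rev = reverse_grammar(P)
# P_rev: E -> T + E and T -> F * T
```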
The leftmost processor at any given level in the tree is the processor parsing tokens that correspond to the leftmost section of input; similarly, the rightmost parses the rightmost section. A token is a nonterminal or a terminal symbol.

2.2. Substring LR Parsing

Richter [14] introduces the notion of noncorrecting error handling and develops the theory necessary for suffix analysis, substring analysis, and interval analysis. Three limitations of these methods are noted by Richter. The first is that substring parsers are not necessarily deterministic unless the grammar is a bounded context grammar [8, 21]. The second is that a suffix parser does not produce a parse tree that can be used for semantic analysis; Richter suggests that a normal LR parse be done and suffix analysis only initiated upon finding the first error. Hence, if no errors
are found, an LR parse is completed. A final limitation is that it is possible to miss errors with these strategies; the errors missed involve mismatched parentheses. A missing parenthesis is not detected when another error occurs within its scope. It is argued that such situations are not common and are worth overlooking because at least all the errors that are detected are actual errors; such errors would be found on a subsequent parse after the correction of the existing errors. Cormack [6] implements a substring parser; his implementation is motivated by Richter's paper. The substring parser is constructed in a manner similar to the construction of an LR parser. Adjustments are made to the LR parser in order to accommodate the recognition of substrings. The usual items A → X • Y, where A → XY ∈ P, are included, and also new suffix items of the form A → ... • Y. It is these suffix items that give the parser the ability to recognize a suffix of a sentence when the entire sentence is not available. The parser uses the LR parsing automaton. The parser recognizes a suffix language and, hence, it also recognizes a substring language because of the correct prefix property of all LR parsers. Two constructions are developed, one for BC(1,1) grammars, referred to as BC-LR(1,1), and the other for SBC(1,1) grammars, referred to as SBC-LR(1,1). The construction sets are provided in Appendix A. Sentences of BC(1,1) grammars can be recognized by an LR(1) automaton, while those of SBC(1,1) grammars can be recognized by an LR(0) automaton. Cormack enhances the SBC-LR(1,1) construction by eliminating unit rules and merging structurally equivalent subparses during construction; his parser recognizes a slightly larger class of grammars than SBC(1,1). A grammar for Pascal that runs on the enhanced SBC-LR(1,1) parser is given by Cormack. The grammars are symmetric and, therefore, the substring parser can produce a right-to-left parse.
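The suffix items A → ... • Y can be enumerated mechanically from the productions. This is a rough sketch with an illustrative tuple encoding; it is not Cormack's construction sets (those are in Appendix A):

```python
# Sketch: enumerating suffix items alongside the usual LR items.
# A suffix item A -> ... • Y marks that only a suffix of the rule's RHS
# may be visible when parsing starts mid-sentence.
def suffix_items(productions):
    """For each rule A -> s1 ... sn, yield one suffix item per non-empty
    suffix of the RHS: A -> ... • si ... sn."""
    items = []
    for lhs, rhs in productions:
        for i in range(len(rhs)):
            items.append((lhs, ("...",) + rhs[i:]))
    return items

# For F -> ( E ), the suffix items let the parser start inside the rule:
#   F -> ... • ( E )    F -> ... • E )    F -> ... • )
items = suffix_items([("F", ("(", "E", ")"))])
```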
Interval analysis is implemented by parsing left-to-right until an error is found and then parsing right-to-left starting with the error token until another error is found. Parsing then proceeds from the token immediately to the right of the right margin of the last interval processed. The substring parser is deterministic and, if there are no syntax errors, it produces a parse tree that can be used as input for semantic analysis. If there are syntax errors, the algorithm produces a forest of partial parse trees. A different substring parser for a context-free grammar has been developed [13]. The intended applications are syntax error recovery and use in a completion tool; this completion tool guides the programmer through writing each construct so that the program is syntactically correct. The parser is not as efficient as a classical LR parser, but for the intended applications this is not seen as a drawback. Multiple parse stacks are maintained, similarly to the parallel LR parsing algorithm discussed by Fischer [7]. The application of this substring parser to a parallel environment is not discussed. A parallel tree implementation similar to our implementation of Cormack's substring parser could be done. However, each node would have considerably more work to leave open all possible left and right contexts for a substring, and a matching-up process would be needed when the results of two processors are combined.

2.3. Parallel Parsing

A summary of research in the area of parallel parsing is found in [18]. In that paper, two reasons are given for the interest in this area. First, as parallel systems become widely used, it will be advantageous to be able to parse on these systems and not require a host for sequential compilation. Second, the theory behind compilation is well understood, and therefore it seems reasonable that parallelizing compilation will expand our knowledge about parallelization in general. In that paper, a distinction is made between parallelism in the sense of several processors and massive parallelism with thousands of processors. The open question of what class of languages lend themselves to parallel parsing is presented as well. It is desirable to find parallel parsing algorithms that run in logarithmic time using n processors [17]. Parallel lexing can be done in logarithmic time using O(n) processors [9]; an implementation is simulated in [19]. The algorithm uses the prefix sum algorithm [10] to combine the results of the processors; this is a method that allows n processors to combine their results in logarithmic time. Four papers of interest on LR(k) parsing in parallel are [5, 12, 7, 11]. The algorithms in all of these papers begin by splitting the input into sections of equal length that are to be parsed by the various processors. References [12, 11, 5] all describe asynchronous algorithms, and when k is greater than 0, lookahead is possible for nonterminals. In contrast, Ref. [7] describes a synchronous algorithm. To deal with insufficient left context for all but the processor parsing the leftmost portion of the input, Refs. [12, 11] add states. All but the leftmost processor begin parsing in a new initial state, the super-initial state [12] or mid state [11].
This state includes all of the incomplete items in the grammar; an item A → X • Y is incomplete if Y is not empty. There are two possible complications encountered by all but the leftmost processor: first, insufficient stack depth when a reduction is indicated and, second, a shift–reduce or reduce–reduce conflict. A conflict occurs when a unique operation is not determinable for the particular state when a token is input: a shift–reduce conflict occurs when both a possible shift and a possible reduction are indicated in a state; a reduce–reduce conflict occurs when more than one rule is indicated for a possible reduction. The algorithm for a single processor given in [12] empties the stack in the case of insufficient stack depth, issues a cancellation message to the processor to its left, and recommences parsing in the super-initial state. The first input upon recommencement is the LHS of the rule just reduced. A cancellation message indicates to a processor that it must pop a certain number of items from its stack and, in the case of recurring insufficient stack depth, issue another cancellation message to the processor to its left. In the case of conflict, Ref. [12] has a processor pass its stack to its left and begin in yet another type of initial state known as a continuation state. A continuation state is generated from every state in which a conflict appears; it is used instead of the super-initial state so that the observed left context is not ignored. In contrast, Ref. [11] deals with both conflicts and insufficient stack depth in the same manner: a processor passes its stack to its left and begins parsing from the mid state. A different means of dealing with partial left context is presented by [5]. Each processor parses the portions of its input that it has sufficient context to understand. Initially, each processor skips forward in its respective section until the beginning of a construct is found; in trials, the token following a semicolon was searched for. When a conflict occurs, the stack is passed to the left and parsing begins again after the next semicolon. Any tokens skipped over are passed to the left. Partial left context is handled in yet another manner by [7]. Each processor maintains a number of parse stacks, one for each possible state for a given input; naturally, the leftmost processor will have just one. Stacks are merged in a manner similar to that used by the parallel lexing algorithm [9]. In all four parallel LR papers the leftmost processor completes the parse. For all the approaches the maximum time is O(n); the leftmost processor does all the work when no reduction is possible until the entire input has been seen. Useful speedup is shown by [5, 7, 11]. However, speedup levels off long before reaching a number of processors that is considered massive. Reference [12] suggests that parsing may require time proportional to the height of the derivation tree. Using a bottom-up strategy, Ref. [4] demonstrates that short and fat derivation trees can be expected to parse faster in parallel than tall and thin trees; communication time is not used in their analysis.
Reference [15] demonstrates that for parallel parsing speedup increases with the number of processors up to a maximum and then levels off or decreases; communication time is included in its timing analysis. The parallel parsing of bounded context grammars is mentioned by [16]. Schell demonstrates a means of expanding the grammar rules that enables a bounded context grammar to fit into the shift–reduce–cancel–continue model that he derived for LR parsing in parallel. Unlike Cormack's substring parser, Schell allows the reduction of partial phrases; when such a reduction occurs, a cancellation message is sent to the left.
3. THE PARALLEL IMPLEMENTATION OF THE LR SUBSTRING PARSER
A parallel LR substring parser modelled after Cormack’s sequential LR substring parser has been implemented. The implementation is restricted by the type and size of machine available.
Communication costs are not considered in this paper and therefore do not restrict the architecture design. The parallel parsing results of interest in this paper are twofold: (1) the sharing of the parsing task among a small number of processors with a small language; and (2) the applicability of the parallel version of the substring parsing algorithm to a massively parallel environment and a larger language. The particular communication behavior of this architecture may not be characteristic of multiprocessors in the future when massive parallelism is commonplace, so looking at current communication costs could be misleading. Also, isolating parsing behavior from communication costs allows a clearer picture of the former. The parallel version of the substring parser is implemented on 7-, 15-, and 31-node balanced binary tree architectures. The machine used is the Transtech MCP1000 System; 60 processors are available. The architecture is loosely coupled, multiple instruction multiple data (MIMD) and, therefore, runs asynchronously. Each processor has its own memory; there is no memory shared amongst the processors. Communication between processors is done synchronously.

3.1. The Parallel Substring Parsing Algorithm

In the parallel tree implementation each processor works independently. Appendix B contains the parallel parsing algorithm. Each processor reads in the parse table and all of the input tokens. During the parse at a node, the derivation trees created by reductions are pointed to from the parse stack. The root of a derivation tree is placed on the parse stack during a reduction and the root points to the remainder of the tree. If the processor is a leaf, then its parser's input is a section of the initial input tokens; its parse stack is empty and its parsing state is the initial state upon commencement of its parse.
Once a leaf completes parsing it passes its parse stack to its parent; the parse stack contains the partially parsed section resulting from the corresponding section in the input tokens. A derivation tree is passed along when its root is found on top of the parse stack. A processor that is not a leaf receives two parse stacks, one from its left child, the other from its right. Upon commencement of its parse, a processor’s parse stack is the parse stack from its left child and its parsing state is the last state pushed on this stack by its left child. Input to a nonleaf processor’s parser is its right child’s parse stack. The input is ordered from the bottom to the top of the right child’s parse stack. Once its parse is complete the root node prints its derivation tree or trees to a file; other nonleaf nodes pass their parse stacks to their parent in the same manner as does a leaf node. The parsing table produced by the sequential substring parsing algorithm is used by the parallel substring parser.
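The flow of parse stacks through the processor tree described above can be sketched as follows. The real driver is the LR substring parser; here a shift-only stand-in is used so that the data flow is runnable, and all names are illustrative assumptions:

```python
# Sketch of stack flow in the processor tree (not the authors' code).
def parse(state, stack, tokens):
    # Stand-in for the LR substring driver: it simply shifts every token,
    # so we can observe how stacks move through the tree.
    return stack + list(tokens)

def leaf_parse(section):
    # A leaf starts with an empty stack in the special recovery state
    # and parses its slice of the raw input.
    return parse("recovery", [], section)

def internal_parse(left_stack, right_stack):
    # A non-leaf node resumes from its left child's stack: the initial
    # state is the last state the left child pushed, and the input is the
    # right child's stack, read from bottom to top.
    state = left_stack[-1] if left_stack else "recovery"
    return parse(state, left_stack, right_stack)

# A 3-node tree over input split in two:
tokens = ["a", "+", "b", "*", "c"]
left = leaf_parse(tokens[:2])
right = leaf_parse(tokens[2:])
root = internal_parse(left, right)
# the root now holds every token, in input order
```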
In the sequential version, parsing may begin at any token by starting in a special recovery state; the special recovery state allows any terminal symbol as input. In the parallel version, a processor starting in this special state may parse any portion of the terminal tokens using the same parse table. However, a processor that is not a leaf receives input tokens that are the partially parsed section coming from its right child. There are two considerations when input symbols can be nonterminals: first, whether input to this special recovery state will ever be a nonterminal and, second, whether nonterminals encountered as input while in other states can be correctly parsed. A nonterminal can only be input when in the special recovery state if a processor has already made a reduction that eliminates any terminals in its left context. This does not occur because, within the closure of a set, when a suffix item has been recognized a reduction is not done. Instead, another suffix item is added for every rule that contains a string of terminals and nonterminals following the identified nonterminal. The reason for this is that a suffix of a nonterminal symbol, A, could be the beginning of the suffix for any phrase containing A. Within the closure of the items in a state, the last set of items for each grammar adds suffix items when a suffix item is recognized. Also, the definition of reduce does not include recognized suffix items. See Appendix A for Cormack's construction sets for SBC-LR(1,1) and BC-LR(1,1). The second consideration is the possibility of reaching a state in which a nonterminal input token becomes an error token when it should not. This occurs if the parser is not ready for the nonterminal. Again, this is impossible: if the right child of the processor already had enough context to reduce to the nonterminal, then the same reduction or reductions would be done in a canonical parse.
Since the reductions have already been done, the parent is ready to parse the nonterminal. Restricting the grammar class to bounded context guarantees this property. Hence, the parse table from the sequential substring parser can be used, unchanged, in the parallel version. A discussion of error handling within the parallel LR substring parser is found in [3].

3.2. Input and Output of the Parallel Substring Parser

Each processor loads the left to right parse table from a file during the initialization phase. A file that contains the sentence to be parsed is read by each leaf processor. Without an error handler, a leaf only needs to store the portion of the input that it will be parsing. The scanning/screening is done by the leaves during the initialization phase. Once the root of the tree completes its parse, it contains the entire derivation tree. Each node of the derivation tree contains a production number used in the derivation of the sentence. The root prints a postfix representation of the derivation tree to a file. Leaves and empty positions in the tree are represented by a value of (−1).
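The postfix dump described above can be sketched as below. The tuple encoding of derivation-tree nodes is an assumption; only the production numbers and the −1 markers for leaves and empty positions come from the text:

```python
# Sketch: postfix dump of a derivation tree whose nodes carry production
# numbers; leaves and empty positions emit -1, as described in the paper.
def postfix(node, out):
    """node is (production_number, left_child, right_child) or None."""
    if node is None:
        out.append(-1)
        return
    prod, left, right = node
    postfix(left, out)
    postfix(right, out)
    out.append(prod)

# Production 3 applied over subtrees built by productions 1 and 2:
tree = (3, (1, None, None), (2, None, None))
out = []
postfix(tree, out)
# out == [-1, -1, 1, -1, -1, 2, 3]
```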
FIG. 1. Input with corresponding output.

An example of syntactically correct input and its corresponding output is found in Fig. 1. The intermediate output represents the internal states of the various nodes; the significant part of the output is the indication of acceptance.

3.3. Complexity Analysis

As Cormack [6] states, the space requirements are slightly larger, but not unreasonable, for a substring parser, and the time complexity is still O(n) for parsing correct programs. The space increase is due to the extra states needed to allow substring recognition. Cormack demonstrates that, using an SBC-LR(1,1) definition with structural merging, the table for Pascal is twice the size of that for an LR definition. It is desirable not to increase space requirements when parallelizing any algorithm. In the parallel version, the space requirements at each processor are the same as for the sequential version because the same tables are used. As pointed out in [17], it is desirable in parallel parsing to achieve logarithmic time. The worst case time complexity of the parallel version is O(n log n); this occurs when the entire input must be seen before any reductions are done. The best case time complexity of the parallel version is O(log n); this occurs when the height of the derivation tree is O(log n). The observed time complexity of the parallel version is discussed in Section 4.3.

3.4. Limitations of the Parallel Implementation

The parallel implementation on the Transtech MCP1000 System is limited in several ways. First, there is a lengthy initialization time because reading and writing are all done through the host. Second, there is difficulty starting all the processors at the same instant. These two limitations do not interfere with the analysis of the sharing of the parsing task. The third limitation is that the stack size is exceeded when recursively traversing too large a tree; input of up to approximately 4000 tokens can be parsed. This is large enough to observe the parallel parsing behavior, especially considering that a maximum of 31 processors are used. Also, testing is limited by the size of processor tree possible on the MCP1000 System; the largest balanced tree possible has only 31 nodes. Finally, the memory of an individual processor is too small to run Pascal; the Pascal parse table is too large. To compensate, the sentences of the expression language used for testing attempt to emulate the scoping pattern of procedural language programs. The expression language specification is given in Appendix C.

4. TESTING ON THE PARALLEL LR SUBSTRING PARSER

Trials are designed to emphasize the particular aspects of parallel substring parsing behavior that are the focus of this paper. In order to understand the behavior, the following are analyzed:
• whether or not LR parsing is inherently left to right;
• the number of reductions done at each node throughout the tree; and
• how various styles of derivation trees affect the division of the parsing task among processors.

As previously outlined, communication costs are not considered. The testing of the parser is limited to expression language programs; the language specification is given in Appendix C. Trials are designed to resemble the scoping patterns of procedural programs.

4.1. Measurement

Two measurements are taken in order to discuss the parsing behavior. Each processor records the time taken to parse and the number of reductions done. The time to receive information from children and the time to pass information to the parent are not included in the parsing time; a processor's parsing time is simply the time to parse once the parse stack and input are available in memory at the processor. Analyzing the division of reductions amongst the various processors gives an interesting picture of the division of labor. In order to parse a given input and provide a derivation tree for further compilation, a certain number of reductions must be done. If these reductions can be shared amongst the various processors, then useful work is being shared. The parsing times at the processors are found to reflect the reduction distribution, and the latter provides more specific information. Speedup is estimated as the number of reductions done divided by the maximum weighted path in the processor tree, where the weight at each node is the number of reductions done by that processor.

4.2. Test Sample

The analysis is done by parsing syntactically correct programs. If the parsing of a grammar class is inherently left to right, then little can be gained in a parallel parse. The amount of left context necessary to carry out reductions must be limited so that processors other than the one with the complete left context can perform reductions. If a reasonable percentage of reductions is carried out by processors not having the complete left context, then timing gains can be expected in the parallel version. All figures in this section contain trees that represent the processors; the number at a processor is the number of reductions done by that processor.
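The speedup estimate defined in Section 4.1 (total reductions divided by the maximum weighted root-to-leaf path) can be sketched as below; the tuple encoding of the processor tree is an illustrative assumption:

```python
# Sketch: estimated speedup over a binary processor tree, where each node
# is (reductions, left_child, right_child) and children may be None.
def total(node):
    """Total reductions done by all processors in the tree."""
    reds, left, right = node
    return reds + (total(left) if left else 0) + (total(right) if right else 0)

def max_path(node):
    """Heaviest root-to-leaf path, weighting each node by its reductions."""
    if node is None:
        return 0
    reds, left, right = node
    return reds + max(max_path(left), max_path(right))

def est_speedup(root):
    return total(root) / max_path(root)

# A 3-processor tree: the root did 10 reductions, each leaf did 20.
tree = (10, (20, None, None), (20, None, None))
# est_speedup(tree) is 50/30, i.e., about 1.67
```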
4.2.1. A Large Number of Input Tokens—Minimal Parallelism. To test the left to right parsing behavior of the expression language, trials with a large ratio of input tokens per leaf are run. Parallelism is considered minimal when the input per processor is high. See Figs. 2–5 for the trees displaying minimal parallelism. The trial that produces the pattern in Fig. 3 has a shorter and fatter derivation tree than that of Fig. 2. The style of input for Fig. 2 is provided in Series 1 of Table I and the style of input for Fig. 3 is provided in Series 2 of Table I. Figure 5 has a worst case derivation tree shape; reductions are done strictly left to right and begin only once half of the input has been seen. The input used for this trial contains a series of left parentheses, a number, and then a series of right parentheses. A procedural program with this characteristic is highly improbable because it would correspond, for example, to a sequence of deeply nested procedures with only the innermost containing statements. Timing gains are possible in a minimally parallel environment, i.e., using tens of processors. Speedups in the range of approximately 2–9 are observed, except in extreme cases.

4.2.2. Token Doubling Runs. Once the sharing of the parsing task amongst the nodes of the tree and the amount of left context needed to reduce are both understood, deductions are possible with respect to massive parallelism.
FIG. 2. Minimal parallelism: 4397 input tokens.
FIG. 3. Minimal parallelism: 4609 input tokens.
FIG. 4. Minimal parallelism: 4403 input tokens.
FIG. 5. Minimal parallelism: 417 input tokens.
FIG. 6. Parse trees.
The amount of left context necessary before reductions begin gives a further demonstration that the expression language does not parse strictly in a left-to-right manner. The number of tokens at the leaves that produces at least one reduction gives a feel for the amount of left context needed. To further observe the sharing of the parsing task and to further understand the question of left context, two series of tests are done that successively double the amount of input. Each series has a different scoping pattern. A sample of the input is provided in Table I. The difference between the two series is demonstrated in Fig. 6.
A subset of the trials run for the two series is reviewed in Figs. 7–14 for the 31-node tree. Series 2 contains sentences with shorter and fatter derivation trees than those of Series 1. Figure 2 is also part of Series 1 and Fig. 3 is part of Series 2. An interesting pattern emerges from the trials run. Until there is a reasonable number of tokens at a processor, reductions are not done. Once within a reasonable range of tokens per leaf, reductions occur at the leaves and tend to occur fairly evenly throughout the tree. We refer to values of tokens per leaf within this range as ideal. For this expression grammar the ideal number of tokens per leaf appears to be between 9 and 18.
TABLE I
Input Series

Series 1 (137 tokens):
a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2 * a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2 * a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2 * a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2

Series 2 (145 tokens):
(a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2) * (a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2) * (a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2) * (a + q - a * b + (2/3) * 2 + (12 - 3 * (8 - 4)) + w/4 - 2)
FIG. 7. Series 1: 69 input tokens.
FIG. 9. Series 1: 273 input tokens.
FIG. 8. Series 1: 137 input tokens.
FIG. 10. Series 1: 545 input tokens.
FIG. 11. Series 2: 73 input tokens.
FIG. 13. Series 2: 289 input tokens.
FIG. 12. Series 2: 145 input tokens.
FIG. 14. Series 2: 577 input tokens.
FIG. 15. Tokens per leaf 5 ideal; 7 nodes.
It is also noticeable that the series with taller, thinner derivation trees have more reductions done higher in the tree than the other series. Finally, once the number of tokens per leaf becomes large, work mostly occurs at the leaf level. In a minimally parallel environment two observations are relevant; first, the leaves each have a reasonably equal number of reductions with the leftmost leaf usually having a few more and, secondly, the leftmost processor at subsequent levels has far more reductions than other processors at that level. On subsequent levels parsing is very much dominated by a left to right direction. The same pattern is observed for the 7- and 15-node trees. Figures 15–17 show all three sizes of trees with the same number of tokens per leaf when this number appears within the ideal range. Figures 18–20 show all three sizes of trees when this number is far larger than ideal and Figs. 21–23 have the smaller than ideal samples. The results from these two series look promising. There appears to be an ideal number of tokens per processor needed before reductions begin. This number would depend both on the grammar and on the scoping pattern of the program. When there are too few tokens per processor then no reductions occur. Also, observing these two distinct series it is noticed that when the scopes are larger work is done higher in the tree. To further investigate these observations, testing with a variety of sentences within the ideal range is presented in the next section.
FIG. 17. Tokens per leaf = ideal; 31 nodes.
4.2.3. Derivation Tree Shapes. Now that there is a feel for the amount of left context needed to begin reductions,
FIG. 18. Tokens per leaf > ideal; 7 nodes.
FIG. 16. Tokens per leaf = ideal; 15 nodes.
FIG. 19. Tokens per leaf > ideal; 15 nodes.
FIG. 20. Tokens per leaf > ideal; 31 nodes.
FIG. 21. Tokens per leaf < ideal; 7 nodes.
FIG. 23. Tokens per leaf < ideal; 31 nodes.
a series of tests with approximately the ideal ratio of tokens per leaf is run on the 31-node tree. These trials present various derivation tree shapes in order to observe the sharing of the parsing task and to further demonstrate this ideal ratio. Figures 24–28 present these trials. Note that Fig. 28 demonstrates worst case behavior: reductions are done only in a left-to-right manner because there are no internal scopes. A procedural program of this nature is highly improbable. Derivation tree shapes greatly affect the amount of work done between levels of the tree. The taller the derivation tree, the higher in the processor tree the majority of the reductions occur. To distribute reductions evenly among the levels, programs need to contain a variety of scope sizes and nested scopes.

4.3. Observed Complexity
FIG. 22. Tokens per leaf < ideal; 15 nodes.
The parallel implementation has O(log n) steps and the best case time is O(1) at each processor. However, the worst case time at a processor is O(n) as shown in Fig. 5. This appears discouraging, but timing gains in a parallel implementation are possible. Figures 3 and 12 are indicative of this. Parsing is shared by all processors when parsing sentences that do not have tall, thin derivation trees, and when the number of input tokens per processor is ideal. If reductions are shared equally by all the processors, the
FIG. 24. Ideal: 143 input tokens.
FIG. 25. Ideal: 147 input tokens.
FIG. 26. Ideal: 157 input tokens.
FIG. 27. Ideal: 113 input tokens.
size of the input to each processor is reasonably consistent. In such a situation, the time for parsing at each level approximates constant time. Figures 15–17 and 24–27 demonstrate approximations to this pattern. Hence, an average case running time is between O(n) and O(log n).

4.4. Limitations

Bounded context grammars suffer from an inherent O(n) LR parsing behavior. The worst case time for LR parsing of any grammar appears to be O(n) regardless of the number of processors [20]. Restricting the LR grammars to those with a bounded context does not alleviate this characteristic; it is still possible for reductions to be restricted to a left-to-right pattern. Figures 5 and 28 illustrate two instances of this problematic behavior. Figure 28 is for a derivation tree that is completely left branching, while Fig. 5 is center branching. Another problematic derivation tree is one that is completely right branching. The distinguishing characteristic of these problematic trees is that there is only one nonterminal at each level of the tree.

4.5. Implications

In a minimally parallel environment, a better implementation of the parallel substring parser is possible. In the tree implementation, after the leaf level, parsing is mostly done left to right; many of the processors higher in the tree are not accomplishing useful work. To eliminate this waste, an array of processors could parse sections of the initial input and then pass their results to just one processor. The observed behavior patterns can be applied to a massively parallel environment in which each leaf receives one token. Reductions would not be expected for a few levels of the tree until processors have a reasonable number of tokens. However, once reductions start occurring, work is expected to be reasonably shared by all the processors at the remaining levels if the derivation tree is not tall and thin. Further investigation of the number of tokens at each processor and the distribution of reductions could prove fruitful in the area of parsing in a massively parallel environment. The behavior of the parallel substring parser on a programming language, such as Pascal, needs to be analyzed. However, the bounded context property assures similarities to the expression language tested. Any reductions done by a processor would be valid. Some left context would be needed in order to begin reductions, and the pattern of reductions would greatly depend on the shapes of the derivation trees for the programs parsed.

FIG. 28. Ideal: 113 input tokens.

5. CONCLUSIONS
Bounded context grammars naturally lend themselves to a parallel parsing algorithm; gains are evident when parsing sentences of an expression language on a balanced binary tree architecture. Several tokens need to be allocated to each processor before reductions are likely to occur; this number is grammar-dependent. However, useful work is done on substrings of reasonably small size when there are internal scopes. When parsing sentences with shallow, wide derivation trees, the work is shared by all levels of the tree, provided the leaves initially start with several tokens so that there is some left context with which to make reductions. One anticipates finding a class of grammars that lends itself to parallel parsing much as LR grammars do to sequential parsing. In the search for the largest such class, BC-LR(1,1) could well be a subclass. The results of this paper indicate that further investigation of this grammar class for parallel parsing may prove fruitful. Trials were run demonstrating that, when the number of input tokens per leaf is too high, parsing mostly occurs at the leaves; on subsequent levels of the tree, reductions mostly occur at the leftmost processor. In this case, with a small tree relative to the size of the input, the worst case time for parsing at each level is O(n). However, timing gains are possible because the work is shared amongst the leaves. An array of processors, instead of a tree, could obtain this speedup without having some processors accomplish little useful work. When parsing sentences without tall, thin derivation trees and with an ideal number of input tokens per processor, parsing is shared equally by all processors; in this case the height of the tree is O(log n).
Because reductions are shared by all the processors in the trials run with several scopes, the size of the input to each processor is reasonably consistent. Hence, the time for parsing at each level approximates constant time. In a massively parallel environment, if there is one leaf per input token, then useful work does not occur for a few levels. However, with large input this does not affect the overall performance. The testing done demonstrates that, once reductions begin, the work is distributed fairly evenly if the derivation tree is not tall and thin. In this case, the expression grammar is parsable in between linear and logarithmic time. The sharing of the parsing task is the focus of this paper; therefore, communication time is excluded.

APPENDIX A: CONSTRUCTION SETS
A.1. BC-SLR(1,1)

The closure of a set for BC-SLR(1,1) grammars is defined as

closure(s) = s
    ∪ {B → • X | A → Y • BZ ∈ s}
    ∪ {B → ...• X | A → ...• BZ ∈ s}
    ∪ {B → ...• X | A → ...• ∈ s and B → YAX ∈ P}.

The function shift is defined as

shift(s, a) = closure({A → Ya • Z | A → Y • aZ ∈ s}
    ∪ {A → ...• Z | A → ...• aZ ∈ s}).

The function goto is defined as

goto(s, B) = closure({A → YB • Z | A → Y • BZ ∈ s}
    ∪ {A → ...• Z | A → ...• BZ ∈ s}).

The function follow is defined as

follow(A) = {a | S ⇒* YAaZ}.

The function reduce is defined as

reduce(s, a) = {A → Y | A → Y • ∈ s and a ∈ follow(A)}.

The set of start states is defined as

start state = {A → ...• aZ | A → YaZ ∈ P}.

The set of reachable states is defined as

reachable = startstate ∪ {shift(s, a) | s ∈ reachable} ∪ {goto(s, A) | s ∈ reachable}.

A.2. BC-LR(1,1)

The closure of a set for BC-LR(1,1) grammars is defined as

closure(s) = s
    ∪ {B → • X : b | A → Y • BZ : a ∈ s and Z ⇒* bW}
    ∪ {B → • X : a | A → Y • BZ : a ∈ s and Z ⇒* ε}
    ∪ {B → ...• X : b | A → ...• BZ : a ∈ s and Z ⇒* bW}
    ∪ {B → ...• X : a | A → ...• BZ : a ∈ s and Z ⇒* ε}
    ∪ {B → ...• X : b | A → ...• ∈ s and B → YAX ∈ P and b ∈ follow(B)}.

The function shift is defined as

shift(s, a) = closure({A → Ya • Z : b | A → Y • aZ : b ∈ s}
    ∪ {A → ...• Z : b | A → ...• aZ : b ∈ s}).

The function goto is defined as

goto(s, B) = closure({A → YB • Z : b | A → Y • BZ : b ∈ s}
    ∪ {A → ...• Z : b | A → ...• BZ : b ∈ s}).

The function reduce is defined as

reduce(s, a) = {A → Y | A → Y • : a ∈ s}.

The set of start states is defined as

start state = {A → ...• aZ : b | A → YaZ ∈ P and b ∈ follow(A)}.

The set of reachable states is defined as

reachable = startstate ∪ {shift(s, a) | s ∈ reachable} ∪ {goto(s, A) | s ∈ reachable}.

APPENDIX B: PARALLEL PARSING ALGORITHM
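Before the detailed pseudocode, the LR shift/reduce loop that each processor runs can be sketched in executable form. This is my illustration, not the authors' implementation: it follows the pseudocode's table convention (a positive entry is the next state for a SHIFT; a negative entry −r means reduce by rule r), but the two-rule grammar and table below are invented, and the substring start states, the TRANSFER of subtree roots between processors, and error recovery are omitted.

```python
# Sketch (not the authors' code) of the sequential LR shift/reduce loop.
# Table convention mirrors the pseudocode: positive entry = shift state,
# negative entry -r = reduce by rule r, "accept" ends the parse.
# Toy grammar, rule 1: S -> a b.

RULES = {1: ("S", 2)}          # rule number -> (LHS, length of RHS)

TABLE = {
    (0, "a"): 1,               # shift a, go to state 1
    (1, "b"): 2,               # shift b, go to state 2
    (2, "$"): -1,              # reduce by rule 1 on the end marker
    (0, "S"): 3,               # goto on S
    (3, "$"): "accept",
}

def lr_parse(tokens, table=TABLE, rules=RULES):
    stack = [(0, None, None)]       # (state, symbol, derivation tree)
    inp = list(reversed(tokens))    # input stack, top at the end
    while inp:
        current = inp[-1]
        state = stack[-1][0]
        entry = table.get((state, current))
        if entry == "accept":
            return stack[-1][2]     # root of the derivation tree
        if entry is None:           # ERROR
            raise SyntaxError(f"unexpected {current!r} in state {state}")
        if entry > 0:               # SHIFT
            inp.pop()
            stack.append((entry, current, current))
        else:                       # REDUCE by rule -entry
            lhs, rhs_len = rules[-entry]
            children = [tree for (_, _, tree) in stack[-rhs_len:]]
            del stack[-rhs_len:]    # pop the RHS of the rule
            goto_state = table[(stack[-1][0], lhs)]
            stack.append((goto_state, lhs, (lhs, children)))
    raise SyntaxError("input exhausted before accept")
```

For example, `lr_parse(["a", "b", "$"])` returns the derivation tree `("S", ["a", "b"])`.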
/* INITIALIZATION */
read parse table
input tokens
if leaf
    parse stack = empty
    current state = 0
    calculate position of first and last token to parse
    from last to first input token
        push (token, index) onto input stack
else
    while not last token from left child
        push token received onto temporary stack
    while temporary stack not empty
        pop from temporary stack
        push onto parse stack
    while not last token from right child
        push (token, index) received onto input stack

/* PARSE */
while input stack is not empty    /* LR parse */
    current input = symbol on top of input stack
    pop input stack
    if (current input is an error message) or (current input is the root of a parse tree)
        /* TRANSFER */
        push error message or root onto parse stack
    else if parse table (current state, current input) is not negative then
        /* SHIFT */
        current state = parse table (current state, current input)
        push (current state, current input, index) onto parse stack
    else if parse table (current state, current input) is not empty then
        /* REDUCE */
        rule = parse table (current state, current input) * (−1)
        pop (size of RHS of rule) tokens off of the parse stack
        build a derivation tree with the rule being reduced as the root
            and any subtrees popped from the parse stack as its children
        index = index of last parse stack token popped
        current state = state at top of stack
        current state = parse table (current state, LHS of rule)
        push (current state, LHS of rule, index) onto parse stack
    else
        /* ERROR */
        call error handler

if root
    print derivation tree to file
else
    while parse stack not empty
        send top parse stack item to parent
        pop the parse stack

APPENDIX C: PRODUCTIONS FOR THE EXPRESSION GRAMMAR
s → bof expression eof
expression → term
expression → expression + term
expression → expression - term
term → factor
term → term * factor
term → term / factor
factor → id
factor → ( expression )
factor → constant
factor → id ** factor
factor → ( expression ) ** factor
factor → constant ** factor

ACKNOWLEDGMENT

The authors thank Dr. G. V. Cormack for the LR substring parsing tables.
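For reference, the Appendix C productions can be transcribed as data (my transcription, decoding the printed operators as +, -, *, /, and ** for exponentiation), from which the terminal and nonterminal vocabularies fall out directly:

```python
# Appendix C's expression grammar transcribed as (LHS, RHS) pairs.
PRODUCTIONS = [
    ("s", ["bof", "expression", "eof"]),
    ("expression", ["term"]),
    ("expression", ["expression", "+", "term"]),
    ("expression", ["expression", "-", "term"]),
    ("term", ["factor"]),
    ("term", ["term", "*", "factor"]),
    ("term", ["term", "/", "factor"]),
    ("factor", ["id"]),
    ("factor", ["(", "expression", ")"]),
    ("factor", ["constant"]),
    ("factor", ["id", "**", "factor"]),
    ("factor", ["(", "expression", ")", "**", "factor"]),
    ("factor", ["constant", "**", "factor"]),
]

# Nonterminals are the symbols that appear on a left-hand side;
# everything else appearing in a right-hand side is a terminal.
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}
TERMINALS = {sym for _, rhs in PRODUCTIONS for sym in rhs} - NONTERMINALS
```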
REFERENCES

1. Aho, A. V., and Ullman, J. D. The Theory of Parsing, Translation, and Compiling. Prentice–Hall, Englewood Cliffs, NJ, 1972.
2. Clarke, G. An LR substring parser applied in a parallel environment. Master's thesis, Queen's Univ., 1991.
3. Clarke, G., and Barnard, D. T. Error handling on a parallel LR substring parser. Comput. Languages 19, 4 (1993), 247–259.
4. Cohen, J., Hickey, T., and Katcoff, J. Upper bounds for speedup in parallel parsing. J. Assoc. Comput. Mach. 29, 2 (Apr. 1982), 408–428.
5. Cohen, J., and Kolodner, S. Estimating the speedup in parallel parsing. IEEE Trans. Software Engrg. SE-11, 1 (Jan. 1985), 114–124.
6. Cormack, G. V. An LR substring parser for noncorrecting syntax error recovery. SIGPLAN Notices 24, 7 (July 1989), 161–169.
7. Fischer, C. N. On parsing context free languages in parallel environments. Ph.D. thesis, Cornell Univ., 1975.
8. Floyd, R. W. Bounded context syntactic analysis. Comm. ACM 7, 2 (Feb. 1964), 62–67.
9. Hillis, W. D., and Steele, G. L. Data parallel algorithms. Comm. ACM 29, 12 (Dec. 1986), 1170–1183.
10. Ladner, R. E., and Fischer, M. J. Parallel prefix computation. J. Assoc. Comput. Mach. 27 (1980), 831–838.
11. Ligett, D., McCluskey, G., and McKeeman, W. M. Parallel LR parsing. Tech. Rep. TR-82-93, Wang Institute of Graduate Studies, July 1982.
12. Mickunas, M. D., and Schell, R. M. Parallel compilation in a multiprocessor environment (extended abstract). J. Assoc. Comput. Mach. 29, 2 (1978), 241–246.
13. Rekers, J., and Koorn, W. Substring parsing for arbitrary context-free grammars. ACM SIGPLAN Notices 26, 5 (May 1991), 59–66.
14. Richter, H. Noncorrecting syntax error recovery. ACM Trans. Programming Languages and Systems 7, 3 (July 1985), 478–489.
15. Sarkar, D., and Deo, N. Estimating the speedup in parallel parsing. Proceedings of the International Conference on Parallel Processing, St. Charles, IL, 1986, pp. 157–163.
16. Schell, R. M., Jr. Methods for constructing parallel compilers for use in a multiprocessor environment. Ph.D. thesis, Univ. of Illinois at Urbana–Champaign, 1979.
17. Schmeiser, J. P. Parallel parsing: A survey. Unpublished, Apr. 1990.
18. Skillicorn, D. B., and Barnard, D. T. Compiling in parallel. Tech. Rep. 90-289, Queen's Univ., Oct. 1990.
19. Skillicorn, D. B., and Barnard, D. T. Parallel compilation: A status report. Extern. Tech. Rep. ISSN-0826-0227-90-267, Queen's Univ., Mar. 1990.
20. Skillicorn, D. B., and Barnard, D. T. Parallel parsing on the Connection Machine. Inform. Process. Lett. 31, 3 (May 1988), 111–117.
21. Williams, J. H. Bounded context parsable grammars. Inform. and Control 28 (1975), 314–334.

Received May 15, 1992; accepted September 7, 1995
GWEN CLARKE is currently a principal in a consulting firm, involved in software development. She completed her M.Sc. degree in 1992 at Queen’s University, Kingston, Ontario, Canada, specializing in parallel parsing and error recovery. As a research associate at Queen’s University, she was involved with the MacDonald project, a program development environment wherein management of programs and supporting documents is integrated into a consistent uniform framework. DAVID T. BARNARD joined the Department of Computing and Information Science at Queen’s University in 1977, having studied at the University of Toronto. He is now a professor in that department. His research applies formal language analysis to compiling programming languages with a focus on using parallel machines, and to treating documents as members of a formal language.