TOTAL PRECEDENCE LANGUAGES AND ELR(1)-LANGUAGES*
A. A. ORDYAN
Erevan (Received 3 October 1974; revised 16 December 1974)

A CONNECTION is established between the class of ELR(1)-grammars, introduced in [1], and the total precedence grammars (TP-grammars). It is shown that the set of ELR(1)-grammars (languages) includes the set of TP-grammars (languages). The methods of ELR(1)-parsing and TP-parsing are compared.
In [1] an extension of Knuth's well-known algorithm for the syntactic parsing of context-free languages was presented. The new parsing algorithm (ELR(1)-parsing), which is not a left-hand parsing algorithm, makes it possible to extend the class of languages recognized by the LR(1)-parsing algorithm. If a grammar does not satisfy the uniqueness conditions of ELR(1)-parsing, it is shown how in many cases such a parsing can still be achieved (sometimes by means of so-called elementary transformations, after changing the grammar slightly). In [1] it is also shown that the set of ELR(1)-grammars includes the set of LR(1)-grammars and is not identical with it.

In [2] total precedence grammars (TP-grammars) are considered. For the strings of a TP-language a method of grammatical parsing (TP-parsing) is presented, using precedence relations defined on the whole dictionary of the grammar investigated.

In this paper it is shown that the set of ELR(1)-grammars (languages) includes the set of TP-grammars (languages). A connection is established between the method of ELR(1)-parsing and that of TP-parsing. The fundamental concepts are defined in Section 1. Some concepts whose definitions can be found in [1] are also used.
1. TP-grammars
1. Definitions. We define a CF-grammar as an ordered quadruple $G = (V_T, V_N, P, S)$, where $V_T$ and $V_N$ are finite disjoint alphabets of terminal and non-terminal symbols respectively, $S \in V_N$ is an isolated symbol, and $P$ is a finite set of rules of the form $A \to \varphi$, where $A \in V_N$, $\varphi \in V^*$, $V = V_T \cup V_N$. We denote by $V^*$ the set of all the strings (including the empty string $\varepsilon$) on the alphabet $V$.
*Zh. vychisl. Mat. mat. Fiz., 15, 5, 1289-1296, 1975.
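We say that $\beta$ directly generates $\alpha$ (we write this as $\beta \Rightarrow \alpha$), if strings $\gamma_1$ and $\gamma_2$ (possibly empty) exist such that $\beta = \gamma_1 A \gamma_2$, $\alpha = \gamma_1 \varphi \gamma_2$, and there is a rule $(A \to \varphi) \in P$. We say that $\beta$ generates $\alpha$ (we write this as $\beta \Rightarrow^{*} \alpha$), if strings $\rho_0, \ldots, \rho_r$, $r \ge 0$, exist such that
$$\beta = \rho_0 \Rightarrow \rho_1 \Rightarrow \cdots \Rightarrow \rho_r = \alpha.$$
If $r > 0$ (in the generation of $\alpha$ at least one rule is applied), we write this generation $\beta \Rightarrow^{+} \alpha$. We call sentential forms any strings generated from the isolated symbol $S$. We define a language $L(G)$ as the set of sentential forms consisting only of terminal symbols:
$$L(G) = \{\, y \mid S \Rightarrow^{*} y,\ y \in V_T^* \,\}.$$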
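The CF-grammar $G = (V_T, V_N, P, S)$ is said to be reduced, if $(A \to \alpha) \in P$ implies that $\alpha \ne \varepsilon$; $A \Rightarrow^{+} A$ does not hold for any $A \in V_N$; and for any $A \in V_N$ there exist $\alpha, \beta \in V^*$, $\gamma \in V_T^*$ such that $S \Rightarrow^{*} \alpha A \beta \Rightarrow^{*} \alpha \gamma \beta$. It is known [3] that every CF-language not containing the empty string is generated by a reduced CF-grammar.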
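The three conditions for a reduced grammar can be checked mechanically. The following is a minimal sketch, assuming a grammar is given as a list of `(lhs, rhs)` pairs with `rhs` a tuple of symbols; the function name `is_reduced` and this encoding are illustrative choices, not notation from the paper. It uses the fact that, once empty right sides are excluded, $A \Rightarrow^{+} A$ can only arise through a cycle of unit rules $A \to B$.

```python
def is_reduced(rules, nonterminals, start):
    """Check the three conditions of a reduced CF-grammar (hypothetical helper)."""
    # (i) no rule has an empty right side
    if any(len(rhs) == 0 for _, rhs in rules):
        return False

    # (ii) A =>+ A must not hold; without empty right sides this can only
    # happen through a cycle of unit rules A -> B with B a non-terminal
    unit = {A: {rhs[0] for lhs, rhs in rules
                if lhs == A and len(rhs) == 1 and rhs[0] in nonterminals}
            for A in nonterminals}
    for A in nonterminals:
        seen, stack = set(), list(unit[A])
        while stack:
            B = stack.pop()
            if B == A:
                return False
            if B not in seen:
                seen.add(B)
                stack.extend(unit[B])

    # (iii) every non-terminal generates some terminal string ...
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(
                    s in productive or s not in nonterminals for s in rhs):
                productive.add(lhs)
                changed = True

    # ... and occurs in some sentential form
    reachable, stack = {start}, [start]
    while stack:
        A = stack.pop()
        for lhs, rhs in rules:
            if lhs == A:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        stack.append(s)

    return productive == set(nonterminals) and reachable == set(nonterminals)


# Toy grammar (an assumption for illustration): S -> aSb | c
toy = [("S", ("a", "S", "b")), ("S", ("c",))]
print(is_reduced(toy, {"S"}, "S"))
print(is_reduced(toy + [("S", ("S",))], {"S"}, "S"))
```

The second call adds the unit rule $S \to S$, which violates the second condition.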
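Let $G = (V_T, V_N, P, S)$ be a reduced CF-grammar. We define four binary relations on the set $V_\bot = V_T \cup V_N \cup \{\bot\}$, where $\bot \notin V_T \cup V_N$ is a special symbol, a marker:

$X < Y$, if $A, B \in V_N$ and $\alpha, \beta, \gamma \in V^*$ exist such that $(A \to \alpha X B \beta) \in P$ and $B \Rightarrow^{+} Y\gamma$;

$X \doteq Y$, if $A \in V_N$ and $\alpha, \beta \in V^*$ exist such that $(A \to \alpha X Y \beta) \in P$;

$X > Y$, if $A, B \in V_N$ and $\alpha, \beta, \gamma \in V^*$ exist such that $(A \to \alpha B Y \beta) \in P$ and $B \Rightarrow^{+} \gamma X$;

$X \lessgtr Y$, if $A, B, C \in V_N$ and $\alpha, \beta, \gamma, \delta \in V^*$ exist such that $(A \to \alpha B C \beta) \in P$, $B \Rightarrow^{+} \gamma X$ and $C \Rightarrow^{+} Y\delta$;

$\bot < X$, if $S \Rightarrow^{*} X\alpha$ for some $\alpha \in V^*$;

$X > \bot$, if $S \Rightarrow^{*} \alpha X$ for some $\alpha \in V^*$.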
The relations $<$, $\doteq$, $\lessgtr$, $>$ are called precedence relations.

An automaton with two magazines (pushdown stores) in some alphabet $V$ is defined as a device consisting, firstly, of two magazines in each of which there is part of a row $Z \in V^*$. This is represented schematically by $[Z_1 \mid Z_2]$, where $Z = Z_1 Z_2$, and is called a configuration of the automaton, and secondly, of a finite set of rules of the form $(X_1, X_2) \to (Y_1, Y_2)$, where $X_1, X_2, Y_1, Y_2 \in V^*$. The operation of the automaton begins from some configuration, called the initial configuration. If at a given instant the configuration of the automaton is of the form $[Z_1 X_1 \mid X_2 Z_2]$ and the rule $(X_1, X_2) \to (Y_1, Y_2)$ exists, then the automaton can pass into the new configuration $[Z_1 Y_1 \mid Y_2 Z_2]$. The automaton stops if none of the existing rules is applicable or if the automaton has reached a final configuration known in advance.
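The move of a two-magazine automaton described above can be sketched in a few lines. This is only an illustration under assumed encodings: a configuration is a pair of tuples `(left, right)`, a rule is `((X1, X2), (Y1, Y2))`, and the names `step` and `run` are hypothetical.

```python
def step(config, rules):
    """Return every configuration reachable from `config` in one move.

    config: (left, right), two tuples of symbols; the left magazine is read
            at its right end and the right magazine at its left end.
    rules:  list of ((X1, X2), (Y1, Y2)), each component a tuple of symbols.
    """
    left, right = config
    successors = []
    for (x1, x2), (y1, y2) in rules:
        cut = len(left) - len(x1)
        # the rule applies if the left magazine ends with X1
        # and the right magazine begins with X2
        if cut >= 0 and left[cut:] == x1 and right[:len(x2)] == x2:
            successors.append((left[:cut] + y1, y2 + right[len(x2):]))
    return successors


def run(config, rules, final, max_steps=1000):
    """Follow the first applicable rule until the automaton stops."""
    for _ in range(max_steps):
        if config == final:
            break
        successors = step(config, rules)
        if not successors:
            break
        config = successors[0]
    return config


# One illustrative rule: (a, b) -> (c, empty string)
demo_rules = [((("a",), ("b",)), (("c",), ()))]
start = (("⊥", "a"), ("b", "⊥"))
print(step(start, demo_rules))
```

Returning a list of successors, rather than a single configuration, leaves room for the nondeterministic case in which several rules apply at once.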
In [2] a method of syntactical analysis is given which uses precedence relations defined in the whole dictionary of the grammar investigated. Below we explain briefly the main results of [2] .
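2. TP-relations. Definition of TP-grammars and TP-languages. We consider a two-magazine automaton in the alphabet $V_T \cup V_N \cup \{\bot, <, >\}$, where $(V_T \cup V_N) \cap \{\bot, <, >\} = \varnothing$, defined as follows:

1) the initial configuration of the automaton is $[\bot \mid x \bot]$, where $x$ is the string to be analyzed;

2) the final configuration of the automaton is $[\bot \mid S \bot]$, where $S$ is the isolated symbol of $G$;

3) the rules of the automaton are

$(A, B) \to (A{<}B, \Lambda)$, if $A <_t B$ or $A = \bot$, $B \ne \bot$,

$(A, B) \to (AB, \Lambda)$, if $A \doteq_t B$,

$(A, B) \to (A{>}, B)$, if $A >_t B$ or $A \ne \bot$, $B = \bot$,

$({<}\alpha{>}, \Lambda) \to (\Lambda, U)$, if $U \to \alpha$ is a rule of the grammar $G$.

Here $\Lambda$ denotes the empty string, and $<_t$, $\doteq_t$, $>_t$ are abstract relations not yet specified.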
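The automaton can be simulated directly once concrete relations are supplied. The sketch below rests on several assumptions that should be read as such: the four rules are taken in the reading in which shifted symbols move to the left magazine and a reduced non-terminal $U$ is returned to the right magazine; a configuration is a pair of tuples; the search runs breadth-first over all subautomata; and the name `tp_accepts` is hypothetical. The grammar is the one with rules $S \to aSB$, $S \to bSB$, $S \to a$, $B \to b$ from Section 3, and the three relation sets were chosen by hand for this illustration.

```python
from collections import deque

BOT = "⊥"


def tp_accepts(word, rules, start, lt, eq, gt):
    """True if some subautomaton reaches the final configuration [⊥ | S ⊥]."""
    by_rhs = {}
    for lhs, rhs in rules:
        by_rhs.setdefault(tuple(rhs), []).append(lhs)

    initial = ((BOT,), tuple(word) + (BOT,))
    final = ((BOT,), (start, BOT))
    seen, queue = {initial}, deque([initial])
    while queue:
        left, right = queue.popleft()
        if (left, right) == final:
            return True
        successors = []
        top = left[-1]
        if top == ">":
            # (<alpha>, Λ) -> (Λ, U): cancel the phrase after the last "<"
            if "<" in left:
                cut = len(left) - 1 - left[::-1].index("<")
                alpha = left[cut + 1:-1]
                for lhs in by_rhs.get(alpha, []):
                    successors.append((left[:cut], (lhs,) + right))
        elif right:
            nxt = right[0]
            if (top, nxt) in lt or (top == BOT and nxt != BOT):
                successors.append((left + ("<", nxt), right[1:]))
            if (top, nxt) in eq:
                successors.append((left + (nxt,), right[1:]))
            if (top, nxt) in gt or (top != BOT and nxt == BOT):
                successors.append((left + (">",), right))
        for cfg in successors:
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False


# Grammar from Section 3 (rules 1-4); relation sets chosen by hand (assumption)
rules = [("S", ("a", "S", "B")), ("S", ("b", "S", "B")),
         ("S", ("a",)), ("B", ("b",))]
lt = {("a", "a"), ("a", "b"), ("b", "a"), ("b", "b"), ("S", "b")}
eq = {("a", "S"), ("b", "S"), ("S", "B")}
gt = {("a", "B"), ("b", "B"), ("B", "B"), ("B", "b")}

print(tp_accepts("baabb", rules, "S", lt, eq, gt))
print(tp_accepts("abb", rules, "S", lt, eq, gt))
```

With these relations at most one rule applies in every configuration, so the search follows a single path; the `seen` set only matters when the supplied relations overlap.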
In the general case a two-magazine automaton defined in this way may be nondeterministic, therefore its operation may be represented as the simultaneous operation of several automata (known as subautomata). The automaton described is regarded as an analyzer of $L(G)$ if and only if, for every terminal string $x$ applied to the automaton in its initial configuration, $x \notin L(G)$ means that no subautomaton stops in the terminal configuration, and if $x \in L(G)$, then at least one subautomaton stops in the terminal configuration. The structure of $x$ may be deduced directly from the series of cancellations performed by the automaton which stops in the terminal configuration. If the automaton described is an analyzer of the grammar $G$, then this analyzer is called a TP-analyzer, and the relations $<_t$, $\doteq_t$, $>_t$ are called TP-relations. For any CF-grammar $G$ there always exists a TP-analyzer. For this it is sufficient to take $A <_t B$, $A \doteq_t B$, $A >_t B$ for all $A, B \in V_T \cup V_N$.
However, it is obvious that an analyzer constructed in this way is nondeterministic. A grammar for which there exists a deterministic TP-analyzer is called a TP-grammar. Languages which may be generated by TP-grammars are called TP-languages.

3. Necessary and sufficient conditions for the existence of TP-grammars. TP-grammars may be easily recognized. If it is agreed that the TP-relations $<_t$, $\doteq_t$, $>_t$ are unique when they are pairwise disjoint,
$$(<_t \cap \doteq_t) \cup (\doteq_t \cap >_t) \cup (<_t \cap >_t) = \varnothing,$$
then the necessary and sufficient conditions that a grammar be a TP-grammar are as follows: a) all the rules have different right sides; b) unique TP-relations can be defined for the grammar. For a given grammar it is easy to verify condition a). To verify condition b) the following theorem is proved.

Theorem 1
The necessary and sufficient condition for the existence in the grammar of unique TP-relations is the condition
$$(\doteq \cap <) \cup (\doteq \cap \lessgtr) \cup (\doteq \cap >) \cup (< \cap >) = \varnothing. \qquad (1)$$
In the general case several TP-analyzers can be found for a given grammar. The following theorem enables us to enumerate all the TP-analyzers which are capable of analyzing not only the terminal strings, but all the strings on $V_T \cup V_N$.

Theorem 2
The three binary relations $<_t$, $\doteq_t$, $>_t$ in the alphabet $V_T \cup V_N$ of the grammar $G$ are TP-relations, and the corresponding analyzer is capable of analyzing all the strings on $V_T \cup V_N$, if and only if the following inclusions are satisfied:
4. Left-sided parsing and TP-languages. It is known that the necessary and sufficient conditions for achieving left-sided parsing by means of the precedence relations $<$, $\doteq$, $\lessgtr$, $>$ are as follows: a') all the right sides of the rules of the grammar are different;

Condition a') is the same as condition a). Condition b') requires the satisfaction of $< \cap \lessgtr = \varnothing$, which is absent from b). This is a fundamental feature. In the analysis of the strings of TP-languages those relations which are contained in the set $< \cap \lessgtr$ are combined with the set $\lessgtr$, and therefore the parsing of the strings of TP-languages is not always unique.

Figure 1, taken from [2], shows the relation of the set of TP-languages to the sets of other known languages. In Fig. 1 the following notation has been adopted: $R$ is the set of regular languages, $D$ is the set of deterministic languages, $\bar{D}$ is the set of inversions of deterministic languages, $P$ is the set of TP-languages, $P_{LR}$ is the set of left-to-right TP-languages, and $\bar{P}_{LR}$ is the set of inversions of left-to-right TP-languages.
If $\bar{P}$ is the set of inversions of $P$, then $\bar{P} = P$.
Therefore, the set of languages generated by TP-grammars is closed with respect to inversion and is not contained in the set of deterministic languages and their inversions.
2. The membership of TP-languages in the set of ELR(1)-languages

In [1] a generalized Knuth algorithm is described, and it is shown that the set of languages (ELR(1)-languages) which can be analyzed by means of this parsing algorithm includes the set of deterministic languages and is not identical with this set. It is shown below that the set of ELR(1)-languages also includes the set of TP-languages.

Lemma 1
In the control table of the generalized Knuth algorithm for the TP-grammar $G$ every graph can contain not more than one convolution operation. (A convolution by some rule with different expected right contexts is regarded as one convolution operation.)

Proof. By condition a) of the definition of a TP-grammar, all the rules of the grammar have different right sides, therefore two different convolutions with the same right side cannot be encountered in the given graph. Let all the graphs in the control table be enumerated in increasing order, beginning with the first graph, whose generating state is $[00\bot]$. Let some graph with the number $N$ contain two different convolutions with respect to the rules $p$ and $q$. Since rules of the form $A \to \varepsilon$ are excluded, the particular states
$$(p, n_p; t_p), \qquad (q, n_q; t_q) \qquad (2)$$
are in the generating state of the given graph. This implies that the rules $p$ and $q$ are, respectively, of the form
$$A_p \to X_{p1} \cdots X_{pi} \cdots X_{pn_p}, \qquad A_q \to X_{p,i+1} \cdots X_{pn_p}, \quad \text{where } i > 0$$
(we have assumed here that the right side of the rule $p$ is longer than the right side of $q$). How is it possible for the particular states (2) to be in one and the same generating state? Two types of conflict are possible.

Type 1. In the generating state of some graph with number less than $N$ there is the particular state $(p, i; t_p)$, where $X_{p,i+1} = A_q \in V_N$, $i > 0$. In this case $X_{pi} \doteq X_{p,i+1}$ and $X_{pi} < X_{p,i+1}$.

Type 2. In the generating state of some graph with number less than $N$ there are the particular states $(p, i; t_p)$, $(r, j; t_r)$, where $i > 0$, $j > 0$, $X_{r,j+1} = B$ and $B \Rightarrow^{+} A_q \alpha$, $\alpha \in (V_T \cup V_N)^*$. Since $X_{rj} = X_{pi}$, therefore $X_{pi} \doteq X_{p,i+1}$ from rule $p$ and $X_{pi} < X_{p,i+1}$ from $X_{rj} \doteq B$ and $B \Rightarrow^{+} A_q \alpha$.

In both cases we have $(\doteq \cap <) \ne \varnothing$, which contradicts condition (1) of the definition of unique TP-relations. Lemma 1 is proved.
FIG. 1
FIG. 2
Lemma 2
In the construction of the control table of the generalized Knuth algorithm for the TP-grammar $G$ there cannot arise conflicts of type 1 for whose removal an elementary transformation of the grammar is required.

Proof. In the construction of the next graph let a conflict of type 1 occur. Then YE@~~)@ exists. It follows from YE@, that in the generating state of the given graph there is the particular state $(p, n_p; Y)$, where

We show that X=V,\V, and is not encountered in a second column of the given graph. It follows from XEVA, that either $A_p \doteq X$ or $A_p > X$. In both cases $X_{pn_p} > X$. If $X$ determines a shift into some state, then $X_{pn_p} \doteq X$ or $X_{pn_p} < X$. We obtain that $(\doteq \cap >) \cup (< \cap >) \ne \varnothing$, which contradicts condition (1) of the definition of unique TP-relations. By Lemma 1, $X$ cannot define a convolution with respect to some rule different from $p$. But if $X$ were to define a convolution by rule $p$, then we would not have a conflict of type 1, therefore XEV,\V,. Since XEV, and is not encountered in the second column of the right side, the removal of the conflict arising does not require an elementary transformation. Lemma 2 is proved.

Lemma 3
Let $G = (V_T, V_N, P, S)$ be a TP-grammar, $X, Y \in V_N$, $X \in V_Y$. Then the inference $X \Rightarrow^{+} X\alpha$, where $\alpha \in (V_T \cup V_N)^*$, does not exist in the grammar $G$.
Proof. It follows from $X \in V_Y$ that a $Z \in V_N$ exists such that $Z \doteq X$. If the inference $X \Rightarrow^{+} X\alpha$ were to hold, then $Z < X$ and $(\doteq \cap <) \ni (Z, X)$, so that $(\doteq \cap <) \ne \varnothing$, which contradicts condition (1) of the definition of the uniqueness of TP-relations. Lemma 3 is proved.

Lemmas 1, 2 and 3 show that in the construction of the control table of the generalized Knuth algorithm there can occur in each graph only once a conflict of type 1, and that this conflict can be removed without using an elementary transformation. This permits us to formulate the following theorem.

Theorem 3
TP-languages can be analyzed by the generalized Knuth algorithm. Therefore, the set of ELR(1)-languages includes the set of TP-languages.

It is known that the language $L = \{a^n b^{2n}\} \cup \{a^n b^n\}$ is neither a deterministic nor a TP-language. On the other hand, in [1] a control table for some grammar generating this language was constructed and it was shown that it is an ELR(1)-language. Figure 2 shows the connection between the sets of deterministic, TP- and ELR(1)-languages.
3. The elementary transformation of a grammar and precedence relations

We consider the following question: why do we often succeed in continuing the parsing by means of an elementary transformation of the grammar, and how is such a transformation connected with precedence relations? For convenience we consider the language $L = \{a, b\}^n a b^n$, where $n \ge 0$, and the grammar $G_1 = (V_T, V_N, P_1, S_0)$ generating this language, where $P_1$ is the set of rules

0. $S_0 \to S\bot$
1. $S \to aSb$
2. $S \to bSb$
3. $S \to a$

We construct the precedence matrix 1 in the whole dictionary of this grammar.

Matrix 1
[Table 1: control table for the grammar $G_1$]
As is obvious from this matrix, $(< \cap >) = \{(a, b), (b, b)\} \ne \varnothing$.
[Table: control table after the elementary transformation, with the new symbol $B$]
This completes the construction of the control table. Therefore, after constructing the control table the grammar is of the form
$G_2 = (\{a, b, \bot\}, \{S_0, S, B\}, P_2, S_0)$, where $P_2$ is the set of the following rules

0. $S_0 \to S\bot$
1. $S \to aSB$
2. $S \to bSB$
3. $S \to a$
4. $B \to b$

We construct the precedence matrix 2 for this grammar.

Matrix 2
All the right sides of the rules of the grammar $G_2$ are different, and the condition
$$(\doteq \cap <) \cup (\doteq \cap \lessgtr) \cup (\doteq \cap >) \cup (< \cap >) = \varnothing$$
is satisfied. The grammar is a TP-grammar.
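The effect of the elementary transformation on the precedence relations can be checked by computing them for both grammars. The sketch below is an illustration under stated assumptions: the four relations are computed from the definitions of Section 1 with the marker treated as an ordinary symbol of rule 0, the fourth relation is called `dia`, the helper names are hypothetical, and the existence test encodes one reading of the condition for unique TP-relations (that $\doteq$ is disjoint from the other three relations and $< \cap > = \varnothing$).

```python
from itertools import product


def relations(rules, nonterminals):
    """Compute the four precedence relations of a grammar without empty rules."""
    # first[A]: symbols Y with A =>+ Y...;  last[A]: symbols X with A =>+ ...X
    first = {A: set() for A in nonterminals}
    last = {A: set() for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for table, sym in ((first, rhs[0]), (last, rhs[-1])):
                new = ({sym} | table.get(sym, set())) - table[lhs]
                if new:
                    table[lhs] |= new
                    changed = True

    lt, eq, gt, dia = set(), set(), set(), set()
    for _, rhs in rules:
        for x, y in zip(rhs, rhs[1:]):      # adjacent symbols of a right side
            eq.add((x, y))
            if y in nonterminals:
                lt |= {(x, f) for f in first[y]}
            if x in nonterminals:
                gt |= {(l, y) for l in last[x]}
            if x in nonterminals and y in nonterminals:
                dia |= set(product(last[x], first[y]))
    return lt, eq, gt, dia


def unique_tp_relations_exist(lt, eq, gt, dia):
    """Assumed reading of the existence condition for unique TP-relations."""
    return not (eq & (lt | gt | dia)) and not (lt & gt)


G1 = [("S0", ("S", "⊥")), ("S", ("a", "S", "b")),
      ("S", ("b", "S", "b")), ("S", ("a",))]
G2 = [("S0", ("S", "⊥")), ("S", ("a", "S", "B")),
      ("S", ("b", "S", "B")), ("S", ("a",)), ("B", ("b",))]

lt1, eq1, gt1, dia1 = relations(G1, {"S0", "S"})
lt2, eq2, gt2, dia2 = relations(G2, {"S0", "S", "B"})

print(sorted(lt1 & gt1))
print(unique_tp_relations_exist(lt1, eq1, gt1, dia1))
print(unique_tp_relations_exist(lt2, eq2, gt2, dia2))
print(sorted(lt2 & dia2))
```

For $G_1$ the pairs $(a, b)$ and $(b, b)$ lie in both $<$ and $>$; after the transformation they no longer lie in $>$ but in the fourth relation, which the condition does not require to be disjoint from $<$.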
It is easy to verify that the parsing of an arbitrary string of the language $L(G_2)$ by means of the generalized Knuth algorithm and with the aid of a two-magazine automaton is executed in the same way: first all the symbols $b$ standing at the end of the string are replaced by symbols $B$, then the symbol $a$ standing in front of them is replaced by $S$, and then the cancellation rules $aSB \to S$, $bSB \to S$, corresponding to rules 1 and 2 of the grammar $G_2$, are applied. It is also obvious that in both cases, for successful parsing, the pair $(a, b) \in \lessgtr$ is included in the set $<_t$.

The preceding example shows that in some cases an elementary transformation of the grammar may be a means of transforming the grammar into a TP-grammar.

The author thanks S. S. Lavrov for supervising this research.

Translated by J. Berry

REFERENCES

1. LAVROV, S. S. and ORDYAN, A. A., An extension of Knuth's algorithm for the analysis of context-free languages. Zh. vychisl. Mat. mat. Fiz., 15, 4, 1006-1019, 1975.
2. COLMERAUER, A., Total precedence relations. J. ACM, 17, 1, 14-30, 1970.
3. GINSBURG, S., Mathematical theory of context-free languages (Matematicheskaya teoriya kontekstno-svobodnykh yazykov), "Mir", Moscow, 1970.