Volume 7, number 5    INFORMATION PROCESSING LETTERS    August 1978
A NOTE ON WEAK OPERATOR PRECEDENCE GRAMMARS *

I.H. SUDBOROUGH **
Department of Computer Sciences, The Technological Institute, Northwestern University, Evanston, IL 60201, USA

Received 30 December 1976; revised version received 1 July 1977
Precedence grammars, operator precedence relations, Wirth-Weber precedence relations, parsing
* This work was supported in part by NSF grant GJ-43228. A preliminary version of this material was presented at the 1977 Conference on Information Sciences and Systems (sponsored by Johns Hopkins University), March-April 1977, and appears in the proceedings (pages [first page illegible]-206).
** An unrevised version has been published in Information Processing Letters 6 (6) (1977) 213-218; this is the revised version.

The families of operator precedence, simple precedence, and weak precedence grammars have been investigated primarily because of their application to parsing programming languages and the fact that they allow simple shift-reduce parsing algorithms [1-3,5-6,9-10]. These families of grammars have in common the property that certain precedence relations are defined between various symbols of the grammar in order to allow for easy identification of reductions in parsing. In the case of operator precedence grammars the grammar must be in operator form, the precedence relations are defined only on the set of terminal symbols of the grammar, and at most one precedence relation holds between any pair of terminal symbols. Simple precedence grammars extend this concept by eliminating the requirement that the grammar be in operator form and by defining precedence relations on nonterminal as well as terminal symbols. Weak precedence grammars extend the concept further by relaxing somewhat the requirement that at most one precedence relation holds between any pair of symbols. It is known that the family of languages defined by simple precedence grammars properly contains the family of operator precedence languages and is identical to the family of languages generated by weak precedence grammars [3].

In this note we consider a family of grammars, called weak operator precedence grammars, which are a generalization of the concept of an operator precedence grammar. For weak operator precedence grammars the operator precedence relations are defined as for operator precedence grammars and the grammar must be in operator form, but the requirement that at most one operator precedence relation holds between any pair of terminal symbols is somewhat relaxed. It is shown that the family of languages generated by weak operator precedence grammars properly contains the family of operator precedence languages and is properly contained in the family of simple precedence languages. The complexity of parsing languages defined by weak operator precedence grammars is discussed and it is argued that the traditional shift-reduce parsing algorithm for parsing operator precedence languages [1] need not be substantially altered to obtain a parsing algorithm for the weak operator precedence languages.

We begin with a description of fundamental concepts which are needed in the following discussion. A context-free grammar is a four-tuple G = (V, Σ, P, S), where V is a finite alphabet, Σ ⊆ V is a finite
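alphabet (of terminal symbols), N = V − Σ is the set of nonterminal symbols, P is a finite set of productions of the form A → w with A in N and w in V*, and S in N is the initial nonterminal.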
The binary relation ⇒_G ⊆ V*NV* × V* is defined by: uAx ⇒_G uwx if A → w is in P, where A is in N and u, x are in V*. The binary relation ⇒_rm,G ⊆ V*NV* × V* is defined by: uAx ⇒_rm,G uwx if A → w is in P, where A is in N, u is in V*, and x is in Σ*. (We shall often omit the subscript G from ⇒_G or ⇒_rm,G when the grammar G is understood.) For any binary relation R on a set S let R⁺ (R*) denote the transitive closure of R (transitive, reflexive closure of R).

Let G = (V, Σ, P, S) be a context-free grammar. The symbol A in N is useful if S ⇒* uAx ⇒* uwx, for some u, w, x in Σ*. The symbols in N which are not useful are called useless. There are well known algorithms for eliminating useless symbols [1]. A production A → B, where A, B are nonterminals, is called a single production. A (rightmost) sentential form is any string w in V* such that S ⇒* w (S ⇒*_rm w). The grammar G = (V, Σ, P, S) is called an operator grammar if A → w in P implies w is not in V*NNV*. (That is, no production in an operator grammar has a right side with two adjacent nonterminals.) A production of the form A → e, where e is the empty string and A is any nonterminal symbol, is called an e-production. A context-free grammar G = (V, Σ, P, S) is proper if it does not possess useless symbols, does not possess any e-productions except possibly S → e, in which case S does not appear on the right side of any production, and no derivation of the form A ⇒⁺ A, for any nonterminal A, is allowed [1]. A context-free grammar G = (V, Σ, P, S) such that A → α and B → β in P, with A ≠ B, implies α ≠ β is called uniquely invertible. For any grammar G = (V, Σ, P, S), let the language generated by G, denoted by L(G), be L(G) = {w in Σ* | S ⇒* w}.

Let G = (V, Σ, P, S) be an operator grammar. The three operator precedence relations ⋖, ≐, ⋗ are defined on pairs (a, b) of terminal symbols as follows (where α, β, δ are arbitrary strings in V* and γ is in N ∪ {e}):
(1) a ≐ b if A → αaγbβ is in P,
(2) a ⋖ b if A → αaBβ is in P and B ⇒⁺ γbδ,
(3) a ⋗ b if A → αBbβ is in P and B ⇒⁺ δaγ.
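The three definitions above can be turned into a small fixed-point computation. The following Python is only a sketch under stated assumptions: the grammar is in operator form with no e-productions, productions are given as (left side, tuple of right-side symbols), and the names `operator_precedence_relations`, `lead` and `trail` are illustrative, not taken from the note. It is tried on the two-production grammar S → aaSb, S → aab that the note uses later as its example for L1.

```python
def operator_precedence_relations(prods, nonterms):
    """Return (eq, lt, gt): the relations ≐, ⋖, ⋗ as sets of terminal pairs.

    Assumes an operator grammar without e-productions.
    """
    # lead[B]:  terminals b with B =>+ gamma b delta, gamma in N ∪ {e}
    # trail[B]: terminals a with B =>+ delta a gamma, gamma in N ∪ {e}
    lead = {A: set() for A in nonterms}
    trail = {A: set() for A in nonterms}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for lhs, rhs in prods:
            for table, seq in ((lead, rhs), (trail, rhs[::-1])):
                found = set()
                if seq[0] in nonterms:
                    found |= table[seq[0]]
                    if len(seq) > 1:
                        found.add(seq[1])  # a terminal, by operator form
                else:
                    found.add(seq[0])
                if not found <= table[lhs]:
                    table[lhs] |= found
                    changed = True
    eq, lt, gt = set(), set(), set()
    for _, rhs in prods:
        for i in range(len(rhs) - 1):
            x, y = rhs[i], rhs[i + 1]
            if x not in nonterms and y not in nonterms:
                eq.add((x, y))                      # ... a b ...
            elif x not in nonterms:                 # ... a B ...
                lt |= {(x, b) for b in lead[y]}
                if i + 2 < len(rhs):
                    eq.add((x, rhs[i + 2]))         # ... a B b ...
            elif y not in nonterms:                 # ... B b ...
                gt |= {(a, y) for a in trail[x]}
    return eq, lt, gt


# The grammar S -> aaSb, S -> aab from the note.
PRODS = [("S", ("a", "a", "S", "b")), ("S", ("a", "a", "b"))]
EQ, LT, GT = operator_precedence_relations(PRODS, {"S"})
print("≐:", sorted(EQ), " ⋖:", sorted(LT), " ⋗:", sorted(GT))
```

For this grammar the pair (a, a) ends up in both ≐ and ⋖, which is exactly the kind of overlap that the weak operator precedence definition tolerates.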
An operator grammar is called an operator precedence grammar if it has no e-productions and at most one operator precedence relation holds between any pair of terminal symbols [1]. An operator grammar G = (V, Σ, P, S) is called a weak operator precedence grammar if
(1) it has no e-productions, is uniquely invertible, and does not possess any single productions except possibly involving the initial nonterminal (in which case the initial nonterminal does not appear on the right side of any production),
(2) the operator precedence relation ⋗ is disjoint from the union of the operator precedence relations ≐ and ⋖, and
(3) if A → αXβ and B → β are productions in P, where X is in Σ, then XB does not occur as a substring of any right sentential form.

A language L is an operator precedence language (weak operator precedence language) if there is an operator precedence grammar (weak operator precedence grammar) G such that L = L(G).

Let G = (V, Σ, P, S) be a context-free grammar. The Wirth-Weber precedence relations R(⋖), R(≐), and R(⋗) are binary relations on V defined as follows [1,10]:

(1) (X, Y) ∈ R(⋖) if A → αXBβ is in P and B ⇒⁺ Yγ, for some α, β, γ in V*,
(2) (X, Y) ∈ R(≐) if A → αXYβ is in P for some α, β in V*, and
(3) (X, Y) ∈ R(⋗) if A → αBCβ is in P, B ⇒⁺ γX, and C ⇒* Yδ, for some α, β, γ, δ in V*.
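The Wirth-Weber relations can be computed in much the same way as the operator precedence relations, except that they range over all of V. The sketch below assumes a grammar without e-productions and without useless symbols, represents productions as (left side, tuple of symbols), and restricts R(⋗) to pairs whose second component is a terminal, as is customary. The function name `wirth_weber_relations` and the helper tables `first` and `last` are illustrative choices.

```python
def wirth_weber_relations(prods, nonterms):
    """Return (r_eq, r_lt, r_gt) for R(≐), R(⋖), R(⋗) as sets of symbol pairs."""
    # first[B] = {Y in V : B =>+ Y gamma},  last[B] = {X in V : B =>+ gamma X}
    first = {A: set() for A in nonterms}
    last = {A: set() for A in nonterms}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            for table, sym in ((first, rhs[0]), (last, rhs[-1])):
                found = {sym} | table.get(sym, set())
                if not found <= table[lhs]:
                    table[lhs] |= found
                    changed = True
    r_eq, r_lt, r_gt = set(), set(), set()
    for _, rhs in prods:
        for x, y in zip(rhs, rhs[1:]):
            r_eq.add((x, y))                                  # A -> alpha X Y beta
            if y in nonterms:
                r_lt |= {(x, z) for z in first[y]}            # Y =>+ Z gamma
            if x in nonterms:
                heads = {y} | first.get(y, set())             # Y =>* t delta
                r_gt |= {(z, t) for z in last[x]
                         for t in heads if t not in nonterms}
    return r_eq, r_lt, r_gt


# Tried on the grammar S -> aaSb, S -> aab that the note uses as an example.
PRODS = [("S", ("a", "a", "S", "b")), ("S", ("a", "a", "b"))]
R_EQ, R_LT, R_GT = wirth_weber_relations(PRODS, {"S"})
print(sorted(R_EQ), sorted(R_LT), sorted(R_GT))
```

Unlike the operator precedence relations, R(≐) here also contains pairs involving the nonterminal S, namely (a, S) and (S, b).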
(The relation R(⋗) is customarily restricted to pairs (X, Y) such that Y is in Σ, since the symbol immediately to the right of a string to be reduced in a rightmost parse is a terminal symbol. We shall adopt this restriction here also.) A context-free grammar G = (V, Σ, P, S) which is proper, has no e-productions, and in which at most one Wirth-Weber precedence relation exists between any pair of symbols in V is a precedence grammar. A precedence grammar which is uniquely invertible is called a simple precedence grammar. Let G = (V, Σ, P, S) be a proper context-free grammar with no e-productions. We say that G is a weak precedence grammar if the following conditions hold
[1]:

(1) the Wirth-Weber precedence relation R(⋗) is disjoint from the union of R(⋖) and R(≐), and
(2) if A → αXβ and B → β are in P with X in V, then neither of the Wirth-Weber precedence relations R(⋖) or R(≐) holds on the pair (X, B).
A language L is called a simple precedence language if there exists a simple precedence grammar G such that L = L(G). It is known that for every uniquely invertible weak precedence grammar G one can effectively construct a simple precedence grammar G' such that L(G') = L(G) [3].

It should be noted that the third condition in the definition of a weak operator precedence grammar can easily be detected. That is, XB occurs as a substring of a right sentential form of a grammar G if and only if either of the Wirth-Weber precedence relations R(≐) or R(⋖) exists between X and B in the grammar obtained from G by deleting useless symbols. (This follows from Theorem 5.14 of [1].) Since the Wirth-Weber precedence relations can easily be computed, the third condition is easily verified.

Although the uniquely invertible weak precedence grammars generate only simple precedence languages, we shall see that weak operator precedence grammars generate languages which are not operator precedence languages. For example, let L1 = {a^{2n} b^n | n ≥ 1}. L1 is a weak operator precedence language. For instance, the following set of productions comprises a weak operator precedence grammar which generates L1:

    S → aaSb,
    S → aab.
For the grammar with the two productions as described above, the table of operator precedence relations is as listed below (the entry in row x and column y gives the relations that hold for the pair (x, y)):

         a        b
    a    ⋖, ≐     ≐
    b             ⋗
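The conditions in the definition of a weak operator precedence grammar can be checked mechanically once the relations are known. The sketch below is an illustration, not the author's procedure: it takes the operator precedence relations and the Wirth-Weber relations R(≐) and R(⋖) as precomputed sets (here written out by hand for S → aaSb, S → aab), assumes the grammar has no useless symbols, and uses the hypothetical name `weak_operator_precedence_violations`.

```python
def weak_operator_precedence_violations(prods, nonterms, start, op_rel, ww_eq, ww_lt):
    """List the parts of the definition that fail (an empty list means none fail)."""
    eq, lt, gt = op_rel
    found = []
    rhs_owner = {}  # right side -> left side, used for unique invertibility
    start_on_rhs = any(start in rhs for _, rhs in prods)
    for lhs, rhs in prods:
        if any(x in nonterms and y in nonterms for x, y in zip(rhs, rhs[1:])):
            found.append(f"not in operator form: {lhs} -> {rhs}")
        if not rhs:
            found.append(f"(1) e-production for {lhs}")
        if rhs_owner.setdefault(rhs, lhs) != lhs:
            found.append(f"(1) not uniquely invertible: {rhs}")
        if len(rhs) == 1 and rhs[0] in nonterms and (lhs != start or start_on_rhs):
            found.append(f"(1) single production {lhs} -> {rhs[0]}")
    if gt & (eq | lt):
        found.append("(2) ⋗ overlaps ≐ or ⋖")
    for _, rhs in prods:
        for i, x in enumerate(rhs):
            # A -> alpha X beta with X terminal and beta the right side of B
            owner = rhs_owner.get(rhs[i + 1:])
            if x not in nonterms and owner is not None and (x, owner) in ww_eq | ww_lt:
                found.append(f"(3) {x}{owner} can occur in a right sentential form")
    return found


PRODS = [("S", ("a", "a", "S", "b")), ("S", ("a", "a", "b"))]
OP_REL = ({("a", "a"), ("a", "b")}, {("a", "a")}, {("b", "b")})   # ≐, ⋖, ⋗
WW_EQ = {("a", "a"), ("a", "S"), ("S", "b"), ("a", "b")}          # R(≐)
WW_LT = {("a", "a")}                                              # R(⋖)
print(weak_operator_precedence_violations(PRODS, {"S"}, "S", OP_REL, WW_EQ, WW_LT))
```

Condition (3) is vacuous for this grammar, since no proper suffix of a right side that follows a terminal is itself a right side; the overlap of ≐ and ⋖ on (a, a) is permitted because only ⋗ has to be disjoint from the other two relations.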
It follows that L1 is a weak operator precedence language. However, L1 is not an operator precedence language; a proof that L1 cannot be generated by any operator precedence grammar is described below.

Suppose that G = (V, Σ, P, S) were an operator precedence grammar such that L(G) = L1. It may be assumed, without any loss of generality, that G does not contain single productions and does not possess useless symbols [2]. First it is shown that the relations a ⋖ a and b ⋗ b
must hold. Since L1 is not a regular set, any context-free grammar generating L1 must be self-embedding [8]. Hence there is a derivation in G of the form described below (where u, v, w, x, y ∈ {a, b}*):

    S ⇒* uAy ⇒⁺ uvAxy ⇒* uvwxy = a^{2n} b^n,    (*)

for some n ≥ 1 and strings v and x such that vx ≠ e. It follows that u v^i w x^i y is in L(G) for all i ≥ 0. Therefore, u ∈ a*, v ∈ a⁺, w ∈ a*b*, x ∈ b⁺, and y ∈ b*. (Otherwise, u v² w x² y would either (1) not contain twice as many occurrences of the symbol a as occurrences of the symbol b, or (2) not be in the set a⁺b⁺.) This implies that a ⋖ a and b ⋗ b hold. Secondly, we note that G cannot possess productions of the form

    A → αBaβ  or  A → αbBβ
for any strings α, β in V* and any nonterminal B. (That is, no nonterminal symbol may precede an occurrence of the symbol a and no nonterminal symbol may follow an occurrence of the symbol b.) This is so, since if there were a production of the type A → αBaβ, then B must be a nonterminal such that B ⇒⁺ a^k for some k > 0. This implies, however, that a ⋗ a, and we have already established the fact that a ⋖ a. A similar argument shows that productions of the form A → αbBβ cannot occur. Finally, from the above we may conclude that every production in G must be of one of the three forms: A → aBb, A → aB, A → Bb, where A ∈ N and B is in N ∪ {e}. (That is, if there is a production in G of the form A → aaBb, for example, then a ≐ a and that would contradict the fact that a ⋖ a and G is an operator precedence grammar.) Thus, the partial derivation A ⇒⁺ vAx from (*) must be accomplished by using only the above three types of productions with B ∈ N. Consequently, we can write this partial derivation as indicated below:

    A = B_0 ⇒ v_1 B_1 x_1 ⇒ v_1 v_2 B_2 x_2 x_1 ⇒ ... ⇒ v_1 v_2 ... v_k B_k x_k ... x_2 x_1 = vAx

for some k ≥ 1, where v_i is in {a, e}, x_i is in {b, e}, and v_i x_i ≠ e, for all 1 ≤ i ≤ k.
(... for some i, then b ⋗ a must hold.) Therefore, since at most one of the operator precedence relations ≐, ⋖, or ⋗ can hold between the symbol a and the symbol b, it must be that either (1) for all i, v_i = a and x_i = b (in which case a ≐ b holds), (2) for all i, v_i = e and x_i = b (in which case a ⋖ b holds), or (3) for all i, v_i = a and x_i = e (in which case a ⋗ b holds). In all three of these cases, however, it is easily verified that u v² w x² y would not contain twice as many occurrences of the symbol a as occurrences of the symbol b. This contradicts the assumption that L(G) = L1. Therefore, there are weak operator precedence languages which are not operator precedence languages.

It is shown next that every operator precedence language is also a weak operator precedence language. Let G = (V, Σ, P, S) be an operator precedence grammar with no single productions. Suppose that A → αXβ and B → β are in P with X in Σ. It must be shown that XB is not a substring of any rightmost sentential form of G. Suppose there were a rightmost sentential form γXBw for some γ in V* and w in Σ*. Then γXBw ⇒_rm γXβw and, by Theorem 5.25, page 439, of [1], the operator precedence relationship ⋖ must hold between X and the leftmost terminal symbol of β. However, since Xβ occurs in the right side of a production in P, the operator precedence relationship ≐ must hold between X and the leftmost terminal symbol of β. This contradicts the assumption that the operator precedence relations are disjoint. Since every operator precedence grammar can be replaced by an equivalent operator precedence grammar without single productions [2], the result follows. Thus, we have established the following:

Lemma. The family of weak operator precedence languages properly contains the family of operator precedence languages.

It is known that the simple precedence language

    L2 = {a 0^n 1^n 0^m | m, n ≥ 1} ∪ {b 0^m 1^n 0^n | m, n ≥ 1}

is not an operator precedence language [2,5]. In fact,
a straightforward proof of this fact will show that in any operator grammar for L2 both of the operator precedence relations ⋖ and ⋗ must hold for the pair (1, 1). (That is, 1 ⋖ 1 and 1 ⋗ 1 must be true.) It follows immediately that L2 is also not generated by any weak operator precedence grammar.

It is shown next that every weak operator precedence language is a weak precedence language. The result is obtained by describing an algorithm to convert any weak operator precedence grammar into an equivalent weak precedence grammar. The algorithm is essentially the same as that given for operator precedence grammars in Section 8.3.2 of [2]. The major difference is that the proof of the algorithm's correctness must be modified for the case of weak precedence grammars. These differences are indicated in the following excerpts from [2].

We take a weak operator precedence grammar G and change it into a uniquely invertible weak precedence grammar G1 such that L(G1) = ¢L(G), where ¢ is a new symbol not occurring in the terminal alphabet of G. Once again, the algorithm is the same as that given in [2] (see Algorithm 8.7). We describe the algorithm briefly here for completeness.

Let G = (V, Σ, P, S) be a weak operator precedence grammar. Let N0 = {[XA] | X is in Σ ∪ {¢} and A is in N} and V0 = N0 ∪ Σ. Let h be the homomorphism from (V0 ∪ {¢})* to V* defined by:

(a) h(a) = a, for a in Σ ∪ {¢}, and
(b) h([aA]) = aA.
Then h⁻¹(α) is defined only for strings α in V* which begin with a symbol in Σ ∪ {¢} and do not have adjacent nonterminals. Moreover, h⁻¹(α) is unique if it is defined. That is, h⁻¹ combines a nonterminal with the terminal to its left. Let P0 consist of all productions [aA] → h⁻¹(aα) such that A → α is in P and the symbol a is in Σ ∪ {¢}. Let G1 = (V1, Σ, P1, S), where V1 and P1 are obtained from V0 and P0 by deleting all useless symbols and productions involving useless symbols.

It needs to be shown that G1 is a uniquely invertible weak precedence grammar such that L(G1) = ¢L(G). The proof of Lemma 8.15 in [2] carries over directly to this case except for the argument that if A → αXβ and B → β are in P1, then neither (X, B) ∈ R(⋖) nor (X, B) ∈ R(≐) holds. (The referenced proof uses at this point that the operator precedence relation ⋖ is disjoint from the operator precedence relation ≐. This no longer need be true.) We describe an alternative proof of the required fact. Let A → αXβ and
B → β be productions in P1. Let A = [aA0] and B = [bB0] for some a, b in Σ ∪ {¢} and A0, B0 in N. By the definition of P1, since B → β is in P1, there must be a production B0 → β0 in P, where β = h⁻¹(bβ0). Likewise, since A → αXβ is in P1, there must be a production A0 → α0bβ0 in P, where αXβ = h⁻¹(aα0bβ0). It is shown that if (X, B) ∈ R(≐) or (X, B) ∈ R(⋖) in G1, then b is in Σ and (b, B0) ∈ R(≐) or (b, B0) ∈ R(⋖) holds in G. Since (b, B0) ∈ R(≐) or (b, B0) ∈ R(⋖) implies that bB0 appears as a substring of some rightmost sentential form of G, and the third condition in the definition of a weak operator precedence grammar prohibits such occurrences, we may conclude that neither of the Wirth-Weber precedence relations R(≐) or R(⋖) holds between X and B.

Case 1. If (X, B) ∈ R(≐) holds in G1, then G1 must contain a production of the form C → γ1XBγ2, for some nonterminal C and γ1, γ2 in V1*. Since B = [bB0], it follows that there must be a production in P of the form C' → γ1'bB0γ2' and that b is in Σ. It follows immediately that (b, B0) ∈ R(≐) holds in G.

Case 2. If (X, B) ∈ R(⋖) holds in G1, then G1 must contain a production of the form C → γ1XTγ2, for some nonterminals C, T and strings γ1, γ2 in V1* such that T ⇒⁺ Bγ3, for some γ3 in V1*. It follows that there is a production in G of the form C' → γ1'bT'γ2', where γ1XTγ2 = h⁻¹(cγ1'bT'γ2') for some c in Σ ∪ {¢}, and T' ⇒⁺ B0γ3', for some γ3' in V*. (That T = [bT'] for some T' in N such that T' ⇒⁺ B0γ3' follows from the construction of the productions in P1.) Therefore, the Wirth-Weber precedence relation R(⋖) holds between b and B0 in G, i.e. (b, B0) ∈ R(⋖).

It follows then, as stated in Theorem 8.22, page 716 of [2], that if L is a weak operator precedence language, then L is a simple precedence language. (That is, from the weak precedence grammar G1 for ¢L(G) we may construct a simple precedence grammar for L(G) using results given in [2].)

Theorem. The family of weak operator precedence languages properly includes the operator precedence languages and is properly included in the family of simple precedence languages.
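The grammar transformation used in the proof, in which h⁻¹ merges every nonterminal with the terminal to its left and P0 collects the productions [aA] → h⁻¹(aα), can be sketched as follows. This is a minimal illustration under assumptions: composite nonterminals [aA] are modelled as Python tuples (a, A), the names `h_inverse` and `build_p0` are hypothetical, and the final step of deleting useless symbols (which yields P1) is left out.

```python
CENT = "¢"  # the new left end marker, assumed not to be a terminal of G


def h_inverse(symbols, nonterms):
    """Merge each nonterminal with the terminal to its left.

    Returns None when h^-1 is undefined, that is, when the string starts
    with a nonterminal or contains two adjacent nonterminals.
    """
    out = []
    for s in symbols:
        if s in nonterms:
            if not out or isinstance(out[-1], tuple):
                return None
            out[-1] = (out[-1], s)  # the composite nonterminal [aA]
        else:
            out.append(s)
    return tuple(out)


def build_p0(prods, nonterms, terminals):
    """All productions [aA] -> h^-1(a alpha) with A -> alpha in P, a in Σ ∪ {¢}."""
    p0 = []
    for lhs, rhs in prods:
        for a in sorted(terminals | {CENT}):
            body = h_inverse((a,) + tuple(rhs), nonterms)
            if body is not None:
                p0.append(((a, lhs), body))
    return p0


# Tried on the note's grammar S -> aaSb, S -> aab.
PRODS = [("S", ("a", "a", "S", "b")), ("S", ("a", "a", "b"))]
P0 = build_p0(PRODS, {"S"}, {"a", "b"})
for left, body in P0:
    print(left, "->", body)
```

In this example the productions with left side [bS] are built as well, although [bS] cannot be reached from [¢S]; they are among the useless productions that the construction deletes when passing from P0 to P1.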
It should be noted that a parsing algorithm for weak operator precedence languages can be constructed in a manner similar to that for an operator precedence language. That is, since we have the condition "if A → αXβ and B → β are in P, where X is in Σ, then XB does not occur as a substring of any right sentential form", the fact that both of the operator precedence relations ≐ and ⋖ may hold between X and the leftmost terminal symbol of β does not create any ambiguity in parsing. (That is, one cannot reduce β to B, since this would yield a potential rightmost sentential form containing XB as a substring.) Furthermore, if X is a nonterminal symbol, then β also cannot be reduced to B. If β were replaced by B in this case, then the resulting string would have two adjacent nonterminals and, hence, would not be a sentential form of an operator grammar. Thus, the parsing algorithm may operate by the rule: reduce the longest suffix of the processed portion of the input string that occurs as the right side of some production. (This suffix may be identified, as in the case of parsing operator precedence languages, by the precedence relations. In particular, if β is the string to be reduced, then the operator precedence relation ⋗ will hold between the rightmost terminal symbol of β and the next input symbol, and the operator precedence relations ≐ or ⋖ (possibly both) will hold between all the "adjacent" terminal symbols of the processed string (including those in β).)

In conclusion, we note that some languages of practical interest are weak operator precedence and not operator precedence languages. To see this consider the language L of all well-formed functional expressions involving a three place function symbol b and an operand symbol a. This language is generated by the grammar: S → bSSS, S → a. It may also be generated by the following weak operator precedence grammar:

    S → a        S → Ta
    T → bSaT     T → bSa
    T → bUaT     T → bUa
    U → aT       U → TaT,
where b ⋖ b, b ≐ a, b ⋖ a, a ⋖ b, and a ⋗ a. It can be shown that this language cannot be generated by any operator precedence grammar. (One may use in a proof that the language {b^n a^{2n+1} | n ≥ 1} is a subset of this language L. It should be observed, also, that the related language generated by the grammar S → bSS, S → a is an operator precedence language.)
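The parsing rule described above (shift while ≐ or ⋖ holds, and when ⋗ holds reduce the longest suffix of the processed string that is a right side) can be tried out on the eight-production grammar for the three place function language. The Python below is a sketch under assumptions rather than the author's algorithm: the end marker `$` and the names `PRODUCTIONS`, `REDUCE_BEFORE` and `parse` are illustrative, ⋗ is taken to hold only for the pair (a, a) as worked out from the definitions, and every terminal is assumed to be in relation ⋗ with the end marker.

```python
# Right side -> left side; the grammar is uniquely invertible, so this is a function.
PRODUCTIONS = {
    ("a",): "S", ("T", "a"): "S",
    ("b", "S", "a", "T"): "T", ("b", "S", "a"): "T",
    ("b", "U", "a", "T"): "T", ("b", "U", "a"): "T",
    ("a", "T"): "U", ("T", "a", "T"): "U",
}
NONTERMINALS = {"S", "T", "U"}
REDUCE_BEFORE = {("a", "a")}  # terminal pairs related by ⋗


def parse(text):
    """Return True if text reduces to S under the longest-suffix rule."""
    if any(ch not in "ab" for ch in text):
        return False
    stack, tokens, pos = [], list(text) + ["$"], 0
    while True:
        # topmost terminal of the processed portion ("$" if there is none)
        top = next((s for s in reversed(stack) if s not in NONTERMINALS), "$")
        nxt = tokens[pos]
        if top == "$" and nxt == "$":
            return stack == ["S"]
        if top != "$" and (nxt == "$" or (top, nxt) in REDUCE_BEFORE):
            for k in range(len(stack)):        # smallest k gives the longest suffix
                lhs = PRODUCTIONS.get(tuple(stack[k:]))
                if lhs is not None:
                    break
            else:
                return False                   # no suffix is a right side
            if k > 0 and stack[k - 1] in NONTERMINALS:
                return False                   # would create adjacent nonterminals
            stack[k:] = [lhs]
        else:
            stack.append(nxt)                  # shift on ≐ or ⋖
            pos += 1


for word in ["a", "baaa", "babaaaa", "baabaaa", "baa", "aa"]:
    print(word, parse(word))
```

The interesting step is the input baabaaa: after the inner baa has been reduced, the stack holds b S a T, and both aT (for U) and bSaT (for T) are right sides; choosing the longest suffix gives the correct reduction to T.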
The author gratefully acknowledges the assistance provided by conversations with W.M. Evangelist, who wrote his M.S. thesis on a related family of grammars [4], and the referees who provided appropriate criticisms and suggestions for improvements.
References
[1] A.V. Aho and J.D. Ullman, The Theory of Parsing, Translation, and Compiling, Volume I: Parsing (Prentice-Hall Publishing Co., Englewood Cliffs, NJ, 1972).
[2] A.V. Aho and J.D. Ullman, The Theory of Parsing, Translation, and Compiling, Volume II: Compiling (Prentice-Hall Publishing Co., Englewood Cliffs, NJ, 1973).
[3] A.V. Aho, P.J. Denning and J.D. Ullman, Weak and mixed strategy precedence parsing, J. Assoc. Comput. Mach. 19 (2) (1972) 225-243.
[4] W.M. Evangelist, An extension of the operator precedence parsing technique, M.S. Thesis, Department of Computer Science, Northwestern University, Evanston, IL, USA.
[5] M.J. Fischer, Some properties of precedence languages, Proc. Assoc. Comput. Mach. Symposium on Theory of Computing