Information Processing Letters 31(1989) 47-51 North-Holland
12 April 1989
A PUMPING LEMMA FOR DETERMINISTIC CONTEXT-FREE LANGUAGES
Sheng YU
Department of Mathematical Sciences, Kent State University, Kent, OH 44242, U.S.A.
Communicated by David Gries
Received 4 August 1988
In this paper, we introduce a new pumping lemma and a new iteration theorem for deterministic context-free languages (DCFLs). As applications, several languages are shown not to be deterministic context-free by using the lemma. An iteration theorem for DCFLs has already been introduced by Harrison in [3]. However, the result presented in this paper is different from his. Instead of considering the pumping property of individual words, we consider the pumping property of pairs of words. The latter approach seems more natural for the DCFLs (LR(1) languages). The pumping lemma introduced in this paper also appears to be easier to understand and use. We conjecture that there exist languages that can be shown not to be DCFLs by this new result but cannot be shown so by the existing one. In Section 1, we introduce the basic notations used in this paper. In Section 2, we state our pumping lemma (without a proof) and show two applications of the lemma. In Section 3, a left-part theorem for LR(k) grammars is introduced, which is crucial to the proof of the pumping lemma and interesting in its own right. In the last section, we prove the lemma and extend it to an iteration theorem.
Keywords: Deterministic context-free languages, iteration theorems, pumping lemmas, LR(k) grammars, Greibach normal form, left-part theorems
1. Basic notations

A context-free grammar (CFG) is denoted by $G = (N, T, P, S)$, where $N$ and $T$ are finite, disjoint sets of nonterminals and terminals, respectively. $P$ is a finite set of productions, each having the form $A \to \alpha$ where $A \in N$ and $\alpha \in (N \cup T)^*$. $S \in N$ is the start symbol. $L(G)$ denotes the language generated by a grammar $G$. A language $L$ is context-free if $L = L(G)$ for some context-free grammar $G$.

A pushdown automaton (PDA) is denoted by a 7-tuple $(Q, \Sigma, \Gamma, \delta, s, Z_0, F)$, where $Q$ is a finite set of states; $\Sigma$ is a finite set of input symbols; $\Gamma$ is a finite set of stack symbols; $\delta$ is a mapping from $Q \times (\Sigma \cup \{\varepsilon\}) \times \Gamma$ to finite subsets of $Q \times \Gamma^*$; $s \in Q$ is the initial state; $Z_0$ is the initial bottom-of-pushdown symbol; and $F \subseteq Q$ is a set of final states. A PDA $M = (Q, \Sigma, \Gamma, \delta, s, Z_0, F)$ is deterministic if for each triple $(q, a, Z) \in Q \times (\Sigma \cup \{\varepsilon\}) \times \Gamma$
(1) $|\delta(q, a, Z)| \le 1$;
(2) $\delta(q, a, Z) = \emptyset$, for all $a \ne \varepsilon$, if $\delta(q, \varepsilon, Z) \ne \emptyset$.
A language is a deterministic context-free language (DCFL) if it is accepted by a deterministic pushdown automaton (DPDA).

For $\alpha \in (N \cup \Sigma)^*$, $|\alpha|$ denotes the number of symbols in $\alpha$. We define ${}^{(k)}w$, for $w \in \Sigma^*$, to mean the first $k$ symbols of $w$:
${}^{(k)}w = x$ if $|w| > k$, $w = xy$, and $|x| = k$;
${}^{(k)}w = w$ if $|w| \le k$.

0020-0190/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)
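The two notions defined above, the prefix ${}^{(k)}w$ and determinism of a PDA, can be illustrated with a minimal Python sketch. It is not part of the paper: the dictionary encoding of $\delta$, the use of the empty string for $\varepsilon$, and the names `prefix` and `is_deterministic` are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

EPSILON = ""  # assumed encoding of the empty input ε

# Assumed encoding of δ: (state, input symbol or EPSILON, stack symbol)
# maps to a finite set of (next state, string pushed in place of the symbol).
Delta = Dict[Tuple[str, str, str], Set[Tuple[str, str]]]


def prefix(k: int, w: str) -> str:
    """Return (k)w: the first k symbols of w, or w itself if |w| <= k."""
    return w[:k]


def is_deterministic(delta: Delta) -> bool:
    """Check conditions (1) and (2) of the DPDA definition."""
    # Condition (1): at most one move for every triple (q, a, Z).
    if any(len(moves) > 1 for moves in delta.values()):
        return False
    # Condition (2): if an ε-move exists for (q, Z), no input symbol
    # may have a move for the same (q, Z).
    for (q, a, Z), moves in delta.items():
        if a == EPSILON and moves:
            for (q2, b, Z2), other in delta.items():
                if (q2, Z2) == (q, Z) and b != EPSILON and other:
                    return False
    return True
```

For instance, `prefix(1, y) == prefix(1, z)` expresses that two words $y$ and $z$ start with the same symbol (or are both empty).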
The proof of the lemma can be found in Section 4. We first provide two applications of the pumping lemma.

Example 1. $L = \{a^n b^n \mid n \ge 0\} \cup \{a^n b^{2n} \mid n \ge 0\}$ is not a DCFL.

Proof. Assume that $L$ is a DCFL and let $C$ be the constant for $L$ in the pumping lemma. Choose $w = a^n b^n$ and $w' = a^n b^{2n}$ for some integer $n > C$, and $x = a^n b^{n-1}$, $y = b$, and $z = b^{n+1}$. Obviously, the choice of $w = xy$ and $w' = xz$ satisfies (1) and (2) of the lemma. According to the lemma, either (3) or (4) should hold. Let us consider (3) first. The only possible factorizations $x = x_1x_2x_3x_4x_5$ such that $|x_2x_4| > 0$ and, for all $i \ge 0$, $x_1x_2^ix_3x_4^ix_5y \in L$ must satisfy the condition $x_2 = a^k$ and $x_4 = b^k$ for some $k > 0$. But then $x_1x_2^0x_3x_4^0x_5z = x_1x_3x_5z = a^{n-k}b^{2n-k} \notin L$. Therefore, (3) does not hold.

Now, we consider (4). Any factorization $x = x_1x_2x_3$ such that $|x_2| > 0$ and $|x_2x_3| \le C < n$ results in $x_2 \in b^+$ and $y_2 \in b^*$, so $x_1x_3y_1y_3 = a^n b^{n-|x_2|-|y_2|} \notin L$. So (4) does not hold. This is contrary to the pumping lemma. Thus, $L$ is not a DCFL. □

Example 2. The language $L = \{uv \mid |u| = |v|$ and [...]$\}$ is not a DCFL. [...]
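The case analysis of Example 1 can also be checked mechanically for small parameters. The Python sketch below is not part of the paper. It assumes that condition (3) asks for a factorization $x = x_1x_2x_3x_4x_5$ with $|x_2x_4| \ge 1$ and $|x_2x_3x_4| \le C$ such that $x_1x_2^ix_3x_4^ix_5y$ and $x_1x_2^ix_3x_4^ix_5z$ are in $L$ for all $i \ge 0$, and that condition (4) asks for factorizations $x = x_1x_2x_3$, $y = y_1y_2y_3$, $z = z_1z_2z_3$ with $|x_2| \ge 1$ and $|x_2x_3| \le C$ such that $x_1x_2^ix_3y_1y_2^iy_3$ and $x_1x_2^ix_3z_1z_2^iz_3$ are in $L$ for all $i \ge 0$. The length bounds, all function names, and the values $C = 3$, $n = 4$ are assumptions made for illustration. Only $i \le$ `max_i` is tested, so a `False` answer is conclusive while a `True` answer is not.

```python
from typing import Callable, Iterator, Tuple

Member = Callable[[str], bool]


def in_example_language(s: str) -> bool:
    """Membership in {a^n b^n} ∪ {a^n b^(2n)}, n >= 0 (Example 1)."""
    n = len(s) - len(s.lstrip("a"))  # number of leading a's
    rest = s[n:]
    return set(rest) <= {"b"} and len(rest) in (n, 2 * n)


def splits3(s: str) -> Iterator[Tuple[str, str, str]]:
    """Enumerate all factorizations s = s1 s2 s3."""
    for i in range(len(s) + 1):
        for j in range(i, len(s) + 1):
            yield s[:i], s[i:j], s[j:]


def cond3(member: Member, x: str, y: str, z: str, C: int, max_i: int = 3) -> bool:
    """Assumed condition (3), tested only for 0 <= i <= max_i."""
    for x1, x2, rest in splits3(x):
        for x3, x4, x5 in splits3(rest):
            if not (x2 or x4) or len(x2 + x3 + x4) > C:
                continue
            if all(member(x1 + x2 * i + x3 + x4 * i + x5 + tail)
                   for i in range(max_i + 1) for tail in (y, z)):
                return True
    return False


def cond4(member: Member, x: str, y: str, z: str, C: int, max_i: int = 3) -> bool:
    """Assumed condition (4), tested only for 0 <= i <= max_i."""
    def pumps(x1: str, x2: str, x3: str, tail: str) -> bool:
        # Some factorization tail = t1 t2 t3 pumps together with x2.
        return any(all(member(x1 + x2 * i + x3 + t1 + t2 * i + t3)
                       for i in range(max_i + 1))
                   for t1, t2, t3 in splits3(tail))

    for x1, x2, x3 in splits3(x):
        if x2 and len(x2 + x3) <= C and pumps(x1, x2, x3, y) and pumps(x1, x2, x3, z):
            return True
    return False


C, n = 3, 4  # hypothetical small constant and some n > C
x, y, z = "a" * n + "b" * (n - 1), "b", "b" * (n + 1)
print(cond3(in_example_language, x, y, z, C), cond4(in_example_language, x, y, z, C))
```

Both checks fail for this choice of $x$, $y$, $z$, which mirrors the two cases of the proof. A single small value of $C$ does not replace the argument for arbitrary $C$.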
INFORMATION PROCESSING LETTERS
Volume 31, Number 1
A grammatical (derivation) tree is denoted by $T = (D, \Gamma, \vdash, r, \rho)$, where $D$ is the set of all nodes in the tree; $\Gamma : D \times D$ is a parent-of relation such that $d_1\,\Gamma\,d_2$ if and only if $d_1$ is the parent of $d_2$; $\vdash\, : D \times D$ is called a left-to relation and is defined as follows: $d \vdash d'$ if (i) for some $i \ge 1$, $d$ is the $i$th child of a node and $d'$ is the $(i+1)$st child of the same node, or (ii) $d\,(\Gamma^{-1})^* \vdash \Gamma^*\,d'$ and there is no $e \in D$ such that $d\,(\Gamma^{-1})^* \vdash \Gamma^*\,e$ and $e\,(\Gamma^{-1})^* \vdash \Gamma^*\,d'$; $r$ is the root of the tree; $\rho : D \to (N \cup \Sigma)$ is a mapping that maps each node to a grammatical symbol.

Let $T = (D, \Gamma, \vdash, r, \rho)$ be a grammatical tree of an LR($k$) grammar. Let $d_1, d_2, \ldots, d_n$ be the complete leaves of $T$ in the left-to-right order. Then $T$ is called a right canonical grammatical tree (RCGT) if $\rho(r) \Rightarrow^* \rho(d_1)\rho(d_2)\cdots\rho(d_n)$ by a rightmost derivation.
Let $T_1 = (D_1, \Gamma_1, \vdash_1, r_1, \rho_1)$ and $T_2 = (D_2, \Gamma_2, \vdash_2, r_2, \rho_2)$. Then $T_1 = T_2$ if there exists a bijection $f : D_1 \to D_2$ such that
(1) $f(r_1) = r_2$,
(2) $c\,\Gamma_1\,d$ in $T_1$ if and only if $f(c)\,\Gamma_2\,f(d)$ in $T_2$,
(3) $c \vdash_1 d$ in $T_1$ if and only if $f(c) \vdash_2 f(d)$ in $T_2$,
(4) $\rho_1(d) = \rho_2(f(d))$ for all $d \in D_1$.

Let $l_1, l_2, \ldots, l_m$ be the complete left-to-right sequence of leaves of a grammatical tree $T$. Then for a nonnegative integer $n \le m$, ${}^{(n)}T$ denotes $T$ restricted to the set of nodes $\{d \mid d\,\Gamma^*\,l_i$ for some $1 \le i \le n\}$. ${}^{(n)}T$ is a forest, i.e. a sequence of separate trees $(T_1, T_2, \ldots, T_t)$ for some $t \ge 0$. Two forests $F_1 = (T_{11}, T_{12}, \ldots, T_{1m})$ and $F_2 = (T_{21}, T_{22}, \ldots, T_{2n})$ are equal if $m = n$ and $T_{1i} = T_{2i}$ for all $1 \le i \le m$.

Theorem 1 (Left-part theorem). Let $T_1$ and $T_2$ be two RCGTs of an LR($k$) grammar [...] $\ge 0$ and $t \ge 0$. If the following conditions hold:
(1) $\rho_1(r_1) = \rho_2(r_2)$,
(2) $\rho_1(c_1 \cdots c_m) = \rho_2(d_1 \cdots d_n) = \alpha \in (N \cup \Sigma)^*$, for some integers $m$ and $n$ such that $0 \le$ [...]
then $m = n$ and ${}^{(m)}T_1 = {}^{(n)}T_2$.

To explain the theorem, let us look at Fig. 1. $T_1$ and $T_2$ are two RCGTs of an LR(1) grammar. Their roots have the same label $A$, and the first $m$ leaves of $T_1$ and the first $n$ leaves of $T_2$ are labeled by the same string. The nodes $c_{m+1}$ and $d_{n+1}$ are both labeled by the same terminal. And all the leaves to the right of $c_{m+1}$ and $d_{n+1}$ in $T_1$ and $T_2$, respectively, are labeled by terminals or $\varepsilon$. The left-part theorem says that from the above conditions we can conclude that the two forests $F_1 = (T_{11}, T_{12}, \ldots, T_{1p})$ and $F_2 = (T_{21}, T_{22}, \ldots, T_{2q})$, which are circled by dotted lines in Fig. 1, should be identical. Note that there may be $\varepsilon$-labeled nodes at the boundaries between the first $m$ leaves and the rest of the leaves in $T_1$, and between the first $n$ leaves and the rest of the leaves in $T_2$. Condition (5) guarantees that both $F_1$ and $F_2$ include all of those $\varepsilon$-labeled nodes. Note also that the paths from $r_1$ to $c_{m+1}$ in $T_1$ and from $r_2$ to $d_{n+1}$ in $T_2$ are not necessarily identical. They may be different not only in labeling but also in structure.

[Fig. 1. The RCGTs $T_1$ and $T_2$; the forests $F_1$ and $F_2$ are circled by dotted lines.]

The above theorem can be proved by induction on $m$. We omit the proof here. A detailed proof is given in [7].
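The equality of trees and of forests used in the left-part theorem can be sketched in Python. This is an illustration only, not the paper's construction: it assumes an ordered tree is stored as a node with a label and a list of children, in which case a bijection preserving the root, the parent-of relation, the left-to relation, and the labels amounts to recursive structural equality. The names `Node`, `leaves`, `trees_equal`, and `forests_equal` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """A node of an ordered, labeled tree (assumed representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def leaves(t: Node) -> List[Node]:
    """Return the leaves of t in left-to-right order."""
    if not t.children:
        return [t]
    return [leaf for child in t.children for leaf in leaves(child)]


def trees_equal(t1: Node, t2: Node) -> bool:
    """Same labels and same ordered shape, checked recursively."""
    return (t1.label == t2.label
            and len(t1.children) == len(t2.children)
            and all(trees_equal(a, b) for a, b in zip(t1.children, t2.children)))


def forests_equal(f1: Sequence[Node], f2: Sequence[Node]) -> bool:
    """Forests are equal if they have the same length and equal trees pairwise."""
    return len(f1) == len(f2) and all(trees_equal(a, b) for a, b in zip(f1, f2))
```

With this representation, the conclusion of the theorem would read `forests_equal(F1, F2)` for the two forests cut out of $T_1$ and $T_2$.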
4. Proof of the pumping lemma

In the proof of the pumping lemma, we need the following result, which has been proved by Lomet in [5] and Geller, Harrison, and Havel in [2].

Theorem 2. Every deterministic context-free language has an LR(1) grammar in Greibach normal form.

Proof of Lemma 1. Let $L$ be a DCFL. Then there exists an LR(1) grammar $G$ such that $L = L(G)$. By Theorem 2, we can assume that $G$ is in Greibach normal form. Let $k$ be the number of nonterminals in $G$ and $h$ be the maximum length of the right-hand side of a production. We show that $C = 2'(k^2 + 1)(h - 1)$ is an appropriate choice for the constant for $L$ in the pumping lemma. Consider two words $w = xy$ and $w' = xz$ of $L(G)$ such that $w$, $w'$, $x$, $y$, $z$ satisfy (1) and (2) of the pumping lemma, i.e. $|x| > C$ and ${}^{(1)}y = {}^{(1)}z$. Let $T_w = (D_w, \Gamma_w, \vdash_w, r_w, \rho_w)$ and $T_{w'} = (D_{w'}, \Gamma_{w'}, \vdash_{w'}, r_{w'}, \rho_{w'})$ be the two RCGTs that derive $w$ and $w'$, respectively. By the left-part theorem for LR($k$) grammars, the two forests (see Fig. 2) $F_1 = (T_{11}, T_{12}, \ldots, T_{1p})$ and $F_2 = (T_{21}, T_{22}, \ldots, T_{2q})$, both of which derive $x$, are equal. That is, $p = q$ and $T_{1i} = T_{2i}$ for all $1 \le i \le p$. Let $F = F_1 = F_2$. [...]
[Fig. 2. The RCGTs $T_w$ and $T_{w'}$ with the factorizations $x = x_1x_2x_3$, $y = y_1y_2y_3$, and $z = z_1z_2z_3$.]
If no tree in $F$ has $2'$ or more leaves, then there are more than $(k^2 + 1)(h - 1)$ separate trees in $F$, i.e., $p > (k^2 + 1)(h - 1)$. Consider the last $(k^2 + 1)(h - 1) + 1$ such trees in $F$. Obviously, these $(k^2 + 1)(h - 1) + 1$ trees are derived from a suffix of $x$ whose length is at most $C$. Remember that $G$ is in Greibach normal form, so there are at least $k^2 + 1$ separate trees among them that consist of only a single node labeled by a terminal. Let $T_{i_1}, T_{i_2}, \ldots, T_{i_t}$ denote these "terminal" trees (nodes) such that $t \ge k^2 + 1$. Note that the parents of these single-node trees in $T_w$ ($T_{w'}$) are nodes in the path from $r_1$ to ${}^{(1)}y$ (from $r_2$ to ${}^{(1)}z$). Let $n_{i_1}, n_{i_2}, \ldots, n_{i_t}$ ($n'_{i_1}, n'_{i_2}, \ldots, n'_{i_t}$) denote these parent nodes in the path from $r_1$ to ${}^{(1)}y$ (from $r_2$ to ${}^{(1)}z$). Since $t \ge k^2 + 1$ and $k$ is the number of nonterminals in $G$, by the pigeonhole principle there is a nonterminal $A$ of $G$ such that $A$ is the label of at least $k + 1$ nodes among $n_{i_1}, n_{i_2}, \ldots, n_{i_t}$. Again by the same principle, there is a pair of distinct indices $u$ and $v$, with nodes $n_{i_u}$ and $n_{i_v}$ in $T_w$ and $n'_{i_u}$ and $n'_{i_v}$ in $T_{w'}$, such that $A$ is the label for both $n_{i_u}$ and $n_{i_v}$ and $B$ is the label for both $n'_{i_u}$ and $n'_{i_v}$, for some $A, B \in N$. Then we have factorizations $x = x_1x_2x_3$, $y = y_1y_2y_3$, and $z = z_1z_2z_3$ as shown in Fig. 2. Condition (4) of the pumping lemma clearly holds. □
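The double use of the pigeonhole principle above amounts to the following: among $k^2 + 1$ positions, each carrying a pair of nonterminal labels (one from $T_w$, one from $T_{w'}$), two positions must carry the same pair. A minimal Python sketch of that counting step follows. It is not part of the paper, and the names `repeated_pair` and `pigeonhole_holds` as well as the encoding of labels as strings are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

LabelPair = Tuple[str, str]  # (label of n_i in T_w, label of n'_i in T_w')


def repeated_pair(pairs: Sequence[LabelPair]) -> Optional[Tuple[int, int]]:
    """Return the first two positions u < v carrying the same label pair, if any."""
    seen: Dict[LabelPair, int] = {}
    for idx, pair in enumerate(pairs):
        if pair in seen:
            return seen[pair], idx
        seen[pair] = idx
    return None


def pigeonhole_holds(nonterminals: Sequence[str]) -> bool:
    """Exhaustively confirm that every sequence of k^2 + 1 label pairs repeats a pair."""
    k = len(nonterminals)
    all_pairs = list(product(nonterminals, repeat=2))  # the k^2 possible pairs
    return all(repeated_pair(seq) is not None
               for seq in product(all_pairs, repeat=k * k + 1))
```

For example, with the two nonterminals `"S"` and `"A"`, any five positions must contain a repeated pair, and the two positions returned play the role of the indices $u$ and $v$ that delimit the pumped factors.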
Acknowledgment

I wish to thank J.C. Beatty for his hint on the left-part theorem for LR(k) grammars, and K. Culik II and H.F. Rolletschek for their helpful discussions and corrections.
References

[1] J.C. Beatty, Two iteration theorems for the LL(k) languages, Theoret. Comput. Sci. 12 (1980) 193-228.
[2] M.M. Geller, M.A. Harrison and I.M. Havel, Normal forms of deterministic grammars, Discrete Math. 16 (1976) 313-321.
[3] M.A. Harrison, Introduction to Formal Language Theory (Addison-Wesley, Reading, MA, 1978).
[4] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation (Addison-Wesley, Reading, MA, 1979).
[5] D.B. Lomet, A formalization of transition diagram systems, J. ACM 20 (1973) 235-257.
[6] A. Salomaa, Formal Languages (Academic Press, New York, 1973).
[7] S. Yu, A left-part theorem for LR(k) grammars, Technical Report, Math. Dept., Kent State Univ., 1988.