A method for proving programming languages non context-free


Information Processing Letters, Volume 7, number 3, April 1978

Stefan SOKOLOWSKI

Institute of Mathematics, University of Gdansk, ul. Wita Stwosza 57, 80-952 Gdansk, Poland

Received 22 June 1977; revised version received 24 January 1978

Programming languages, identifiers, context dependence, pumping lemma

1. Introduction

The fact that ALGOL 60 is not context-free was first proved by Floyd in [2]. He noticed that in any context-free language $L$ every sufficiently long word $p$ has a partition $p = qrstu$ with either $r$ or $t$ nonempty, such that the words $q r^{(i)} s t^{(i)} u$ all belong to $L$ for $i = 0, 1, 2, \ldots$ ($x^{(i)}$ denotes the $i$-th iteration of $x$). If ALGOL 60 were context-free it would permit such a partition for every word of the form

begin real $x^{(n)}$; $x^{(n)}$ := $x^{(n)}$ end,

which turns out to be untrue. In his paper Floyd conjectures that similar considerations hold for "any reasonable language in which all variables must be declared". It is intuitively obvious that one of the main reasons for ALGOL's context-dependence is the obligation to predeclare the variables. However, Floyd's lemma is not strong enough to prove that obligatory declarations exclude context-independence (see Appendix). The aim of this paper is to develop a criterion of context-independence applicable to real-life programming languages - a criterion based on the fact that, intuitively speaking, there is no context-free way of checking whether two subwords of a word are equal.

2. The criterion and how to use it

Let $\Sigma$ be the alphabet of a language $L$.

Theorem. If $L$ is context-free, then

(CR) for every subset $\Sigma'$ of $\Sigma$ containing at least two distinct letters (say $a$ and $b$) and for every $u_1, u_2, u_3 \in \Sigma^+$, if $\{u_1 x u_2 x u_3 \mid x \in (\Sigma')^+\} \subset L$, then there exist two different words $x', x'' \in (\Sigma')^+$ such that $u_1 x' u_2 x'' u_3 \in L$.

In order to prove that ALGOL is not context-free, let $\Sigma'$ be the set of letters and $\Sigma$ the set of ALGOL's terminals, and consider the program

begin real x; x := 0 end.

This program remains correct under every substitution of a word from $(\Sigma')^+$ for both occurrences of x. If ALGOL were context-free, then it would contain (according to the Theorem) a word

begin real $x'$; $x''$ := 0 end, with $x' \neq x''$,

which it doesn't. It is clear that the criterion (CR) fails to be satisfied by any programming language with obligatory declarations and indefinite length of identifiers.

Moreover, (CR) applies to abstract formal languages as well. As an example we may prove that $L = \{xx \mid x \in \Sigma^*\}$ is not context-free provided $\Sigma$ contains at least three elements, say $a, b, c$. Let $\Sigma' = \{a, b\}$, $u_1 = u_2 = c$, $u_3 = \lambda$ (the empty word). Now $\{cxcx \mid x \in \{a, b\}^+\} \subset L$ and if $L$ were context-free it would contain a word $c x' c x''$ with $x' \neq x''$ and $x', x'' \in \{a, b\}^+$, which it obviously doesn't.
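The use of (CR) on the language $\{xx\}$ can be mimicked by a bounded brute-force search. The sketch below is only an illustration: the function names (`words`, `in_copy_language`, `in_regular_language`, `premise_holds`, `cr_witness`) and the bound `max_len` are hypothetical and do not come from the paper. A finite search can only report that no witness exists up to the chosen length; it proves nothing about longer words.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield all non-empty words over `alphabet` of length at most `max_len`."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def in_copy_language(w):
    """Membership test for L = {xx | x in Sigma*}."""
    half, odd = divmod(len(w), 2)
    return odd == 0 and w[:half] == w[half:]


def in_regular_language(w):
    """Contrast case (an assumption for illustration): c x c y with x, y in {a, b}+."""
    return re.fullmatch(r"c[ab]+c[ab]+", w) is not None


def premise_holds(member, sub_alphabet, u1, u2, u3, max_len):
    """Check that u1 x u2 x u3 is in the language for every x up to max_len."""
    return all(member(u1 + x + u2 + x + u3) for x in words(sub_alphabet, max_len))


def cr_witness(member, sub_alphabet, u1, u2, u3, max_len):
    """Search for x' != x'' (both up to max_len) with u1 x' u2 x'' u3 in the language."""
    for x1 in words(sub_alphabet, max_len):
        for x2 in words(sub_alphabet, max_len):
            if x1 != x2 and member(u1 + x1 + u2 + x2 + u3):
                return x1, x2
    return None


if __name__ == "__main__":
    # The paper's choice: Sigma' = {a, b}, u1 = u2 = c, u3 = empty word.
    print(premise_holds(in_copy_language, "ab", "c", "c", "", 4))
    print(cr_witness(in_copy_language, "ab", "c", "c", "", 4))
    print(cr_witness(in_regular_language, "ab", "c", "c", "", 4))
```

For the copy language the premise holds on every tested word while the search finds no pair of different words, which is the pattern (CR) forbids for a context-free language. The regular contrast language yields a witness immediately.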

3. Lemma 1. If $T$ is a finite tree with the order of branching nowhere exceeding $r$, and the longest path in $T$ has length $k$, then

$k \le$ number of leaves of $T$ $\le r^k$.

(Remark. By the length of a path I mean the number of segments between its vertices. Only the branching points of order zero or not less than 2 are counted as vertices.)

The lemma is obvious.

Let $G$ be a context-free monotonous grammar of $L$ over the terminal alphabet $\Sigma$ and the nonterminal alphabet $V$. Let $n$ be the number of nonterminals (the cardinality of $V$) and $r$ - the length of the longest production in $G$ (thus throughout the proof all the branching points of all derivation trees considered will have order not exceeding $r$). Let the denotation $|x|$ stand for the length of the word $x$.

Lemma 2. If $s$ is a positive integer, $z$ - a word from $\Sigma^+$, $y$ - a connected subword of $z$, $|y| \le s$, $|z| \ge r^{(s+1)(n+1)}$ and there exist a nonterminal $N$ and a derivation $N \xrightarrow{*} z$, then by erasing some letters of $z$ outside the subword $y$ we may obtain a properly shorter word $z'$ also derivable from $N$.

Proof. Let $P$ be the longest path in the derivation tree $T$ of $N \xrightarrow{*} z$. According to Lemma 1, length$(P) \ge (s+1)(n+1)$. Since $|y| \le s$, there are no more than $s$ vertices of $T$ where the path $P$ meets a path from a letter of $y$. Hence there exists a subpath $P'$ of $P$ without the vertices of this kind and such that length$(P') = n + 1$. At least one nonterminal $M$ must be repeated along this path. So there exist the words $d_1, d_2, d_3, d_4, d_5$ such that

(1) $N \xrightarrow{*} d_1 M d_5$,

(2) $M \xrightarrow{*} d_2 M d_4$,

(3) $M \xrightarrow{*} d_3$,

with at least one of $d_2, d_4$ non-empty and both disjoint with $y$. By composing (1) and (3) we get

$N \xrightarrow{*} d_1 d_3 d_5 = z'$

and this completes the proof of Lemma 2.
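The excision step in the proof of Lemma 2 can be sketched on a concrete derivation tree. What follows is a minimal illustration, not the paper's construction: trees are encoded as hypothetical nested tuples `(nonterminal, children)` with terminal letters as plain strings, the names `word`, `nodes`, `overlaps` and `shorten` are invented here, and instead of walking the longest path the code simply tries every ancestor/descendant pair carrying the same nonterminal.

```python
def word(tree):
    """Terminal word spelled by a derivation tree."""
    if isinstance(tree, str):
        return tree
    return "".join(word(child) for child in tree[1])


def nodes(tree, start=0, ancestors=()):
    """Yield (label, start, end, ancestors) for every nonterminal node.

    `ancestors` is a tuple of (label, start, end) triples, root first.
    """
    if isinstance(tree, str):
        return
    end = start + len(word(tree))
    yield tree[0], start, end, ancestors
    pos = start
    for child in tree[1]:
        yield from nodes(child, pos, ancestors + ((tree[0], start, end),))
        pos += len(word(child))


def overlaps(lo1, hi1, lo2, hi2):
    """True if the half-open intervals [lo1, hi1) and [lo2, hi2) intersect."""
    return max(lo1, lo2) < min(hi1, hi2)


def shorten(tree, y_start, y_end):
    """Return z' = d1 d3 d5 obtained by erasing d2 and d4, or None.

    The protected subword y is z[y_start:y_end]; the erased parts must not
    touch it, and at least one letter must actually be erased.
    """
    z = word(tree)
    for label, lo, hi, ancestors in nodes(tree):
        for a_label, a_lo, a_hi in ancestors:
            if a_label != label or (a_lo, a_hi) == (lo, hi):
                continue  # different nonterminal, or d2 = d4 = empty
            # d2 = z[a_lo:lo], d4 = z[hi:a_hi]
            if overlaps(y_start, y_end, a_lo, lo) or overlaps(y_start, y_end, hi, a_hi):
                continue
            return z[:a_lo] + z[lo:hi] + z[a_hi:]
    return None


# Hypothetical toy grammar S -> a S b | c, derivation of "aacbb".
EXAMPLE = ("S", ["a", ("S", ["a", ("S", ["c"]), "b"]), "b"])
```

With `EXAMPLE`, protecting the middle letter via `shorten(EXAMPLE, 2, 3)` erases one `a` and one `b`, while protecting the whole prefix `aa` leaves no admissible pair. That matches the lemma's requirement that $z$ be long compared with $s$.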

Now let $s = \max(|u_1|, |u_2|, |u_3|)$ and $m = r^{r^{(s+1)(n+1)}}$. Assume that all the words of the form $w = u_1 x u_2 x u_3$ with $x \in (\Sigma')^+$ belong to $L$. In particular, if we put $x = a^{(m)} b^{(m)}$ we get a word from $L$. Let $T$ be the derivation tree of $w$ in the grammar $G$ and $P$ - its longest path. Let $t_0$ be the "youngest" vertex on $P$ such that the full subtree $T_0$ of $T$ with $t_0$ as its root has at least $r^{(s+1)(n+1)}$ leaves. Let $t_1$ be the immediate successor of $t_0$ along $P$ and $T_1$ - the full subtree of $T$ with $t_1$ as its root. $T_1$ has less than $r^{(s+1)(n+1)}$ leaves, so by Lemma 1 the length of the segment of $P$ contained in $T_1$ is less than $r^{(s+1)(n+1)}$ (this time we use the left-hand side of the inequality in Lemma 1). Therefore the length of the segment of $P$ contained in $T_0$ is at most $r^{(s+1)(n+1)}$, hence the number of leaves of $T_0$ is not greater than $r^{r^{(s+1)(n+1)}} = m$. So we obtained the estimate

$r^{(s+1)(n+1)} \le$ number of leaves of $T_0$ $\le m$.

Let $N$ be the nonterminal at $t_0$ in the derivation tree $T$, and $z$ - the terminal word of $T_0$. Because of its length, $z$ may have a non-empty intersection with not more than one of $u_1, u_2, u_3$. Let $y$ be this intersection. Since the preassumptions of Lemma 2 are fulfilled, we may substitute a shorter word $z'$ for $z$ in $w$ without influencing $u_1, u_2, u_3$. Once more comparing the length of $z$ and $m$ we see that this substitution changes either the first $x$ in $w$, or the second one, or the number of $b$'s in the first one and the number of $a$'s in the second one. Anyway, after this change both $x$'s fail to remain equal. And this completes the proof.
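To get a feeling for the sizes involved, the constants of the proof can be computed for small parameters. The sketch below assumes the double-exponential reading $m = r^{r^{(s+1)(n+1)}}$ of the constant, and all names (`proof_constants`, `blocks`, `touched`, `windows_behave`) are hypothetical. Using interval arithmetic only, it checks the two length observations the final step relies on: a connected subword of length at most $m$ meets at most one of $u_1, u_2, u_3$, and it never reaches from the $a$-block of the first $x$ into the second $x$ (nor from the $b$-block of the second $x$ back into the first).

```python
def proof_constants(u1, u2, u3, n, r):
    """Return (s, threshold, m) with threshold = r^((s+1)(n+1)), m = r^threshold."""
    s = max(len(u1), len(u2), len(u3))
    threshold = r ** ((s + 1) * (n + 1))
    return s, threshold, r ** threshold


def blocks(u1, u2, u3, m):
    """Half-open intervals of the seven blocks of w = u1 a^m b^m u2 a^m b^m u3."""
    names = ["u1", "a1", "b1", "u2", "a2", "b2", "u3"]
    sizes = [len(u1), m, m, len(u2), m, m, len(u3)]
    layout, pos = {}, 0
    for name, size in zip(names, sizes):
        layout[name] = (pos, pos + size)
        pos += size
    return layout, pos


def touched(window, layout):
    """Names of the blocks that intersect the half-open interval `window`."""
    lo, hi = window
    return {name for name, (b_lo, b_hi) in layout.items()
            if max(lo, b_lo) < min(hi, b_hi)}


def windows_behave(u1, u2, u3, m):
    """Check every window of length exactly m.

    Shorter windows touch a subset of the blocks touched by some window of
    length m, so checking the maximal length is enough.
    """
    layout, total = blocks(u1, u2, u3, m)
    for start in range(total - m + 1):
        hit = touched((start, start + m), layout)
        if len(hit & {"u1", "u2", "u3"}) > 1:
            return False
        if "a1" in hit and hit & {"a2", "b2"}:
            return False
        if "b2" in hit and hit & {"a1", "b1"}:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical toy parameters: one nonterminal, productions of length <= 2.
    s, threshold, m = proof_constants("c", "c", "", n=1, r=2)
    print(s, threshold, m)
    print(windows_behave("c", "c", "", m))
```

Even for these toy parameters $m$ is already $2^{16}$, which shows why the argument is purely existential and never constructs the words explicitly.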

4. Final remarks

The (CR) criterion provides a necessary condition for a language to be context-free. The natural question arises: is it also a sufficient one? I suppose the answer is no. The intuitive background for it is the feeling that (CR) expresses only one of possible context-type restrictions on languages, namely the one that reads "every variable has to be predeclared". There exist several others even in real-life programming languages, for instance "no variable may be declared more than once". It is easy to imagine a programming language where there is no compulsion to declare but declaring a variable more than once is outlawed - this being the only context-type restriction. Such a context-sensitive language might nevertheless satisfy the (CR) criterion.

Acknowledgement

As it would have been completely impossible for this paper to come into being without the inspiration of Prof. A.W. Mostowski, I express to him my gratitude for introducing me to the problem, for the incentive to write the solution down, and for his concern about its future.

Appendix

This appendix gives an example of a language 'with obligatory declarations' whose context-dependence may not be proved by means of Floyd's lemma even in its stronger version that reads:

Lemma. For every context-free language $L$ there exist constants $m$ and $n$ such that every word $p \in L$ with $|p| > m$ may be written as $p = qrstu$ with $|rst| \le n$ and either $r$ or $t$ non-empty, such that for every $i \ge 0$: $q r^{(i)} s t^{(i)} u$ belongs to $L$.

Let $\Sigma$ be an alphabet and $c \notin \Sigma$. Let

$L = \Sigma^+ \cup \{x c^{(k)} x \mid x \in \Sigma^+ \text{ and } k > 0\}$.

We may say that any word of $L$ consists either of a declaration of an identifier or of both - declaration and application separated by a sequence of separators $c$. For a partition of a word $w$ from $L$ as demanded in the Lemma we take:

if $w$ contains a "c" ($w = x c^{(k)} x$):

$x$ - first "c" - empty - empty - rest of $w$,

if $w$ does not contain "c":

empty - first letter of $w$ - empty - empty - rest of $w$.

So nothing may be stated of $L$ using the Lemma. It is obvious though that $L$ does not satisfy (CR) and hence is not context-free.
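The claim that the partition given in this appendix survives pumping can be checked mechanically on short words. The sketch below is an illustration under assumptions: it fixes a hypothetical concrete alphabet `SIGMA = "ab"` with separator `SEP = "c"`, takes the lemma's constants as $m = 1$ and $n = 1$, and uses invented names (`in_L`, `partition`, `pump`, `lemma_satisfied`). It only tests words and exponents up to the given bounds.

```python
from itertools import product

SIGMA, SEP = "ab", "c"  # hypothetical concrete alphabet and separator


def in_L(w):
    """Membership in L = Sigma+ united with {x c^k x | x in Sigma+, k > 0}."""
    if not w or any(ch not in SIGMA + SEP for ch in w):
        return False
    if SEP not in w:
        return True  # a bare declaration
    first, last = w.index(SEP), w.rindex(SEP)
    x, middle, tail = w[:first], w[first:last + 1], w[last + 1:]
    return x != "" and x == tail and set(middle) == {SEP}


def partition(w):
    """The partition q, r, s, t, u proposed in the appendix."""
    if SEP in w:
        i = w.index(SEP)
        return w[:i], SEP, "", "", w[i + 1:]
    return "", w[0], "", "", w[1:]


def pump(parts, i):
    """Return q r^i s t^i u."""
    q, r, s, t, u = parts
    return q + r * i + s + t * i + u


def lemma_satisfied(max_len, max_i):
    """Check all words of L with 1 < |w| <= max_len and all 0 <= i <= max_i."""
    for n in range(2, max_len + 1):
        for letters in product(SIGMA + SEP, repeat=n):
            w = "".join(letters)
            if in_L(w) and not all(in_L(pump(partition(w), i))
                                   for i in range(max_i + 1)):
                return False
    return True


if __name__ == "__main__":
    print(pump(partition("abcab"), 0))  # separator removed: still in Sigma+
    print(lemma_satisfied(6, 4))
```

The case $i = 0$ with a single separator is the interesting one: erasing the only `c` turns $x c x$ into $xx$, which lies in $\Sigma^+$ and therefore still in $L$.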

References

[1] A. Blikle, Wybrane zagadnienia lingwistyki matematycznej, in: Problemy przetwarzania informacji, Wydawnictwa Naukowo-Techn., Warszawa (1974) (in Polish) 94-152.

[2] R.W. Floyd, On the nonexistence of a phrase structure grammar for ALGOL 60, Comm. ACM 5 (1962) 483-484.

[3] P. Naur et al., Revised Report on the Algorithmic Language ALGOL 60, Numerische Mathematik 4 (1963) 420-453.
