ON SOME DECIDABILITY PROBLEMS CONCERNING DEVELOPMENTAL LANGUAGES
ARTO SALOMAA
University of Turku, Turku, Finland
1. Introduction
Developmental languages were defined by Lindenmayer [7] in connection with a theory proposed to model the development of filamentous organisms. Each letter in a word is interpreted as a state of a biological cell. Thus, stages of development are represented by strings of letters corresponding to filaments of cells. The developmental instructions which are presumed to generate the organism are modelled by rewriting rules or productions. These productions are applied simultaneously to all letters to reflect the simultaneity of growth in the organism. Such generating devices, generally referred to as L-systems, have been studied extensively from the point of view of formal language theory during the past few years. Thus, the essential characteristic of an L-system is that at every step of a derivation a production is applied to every letter in the word considered.
In this paper, we use the customary abbreviations for different kinds of L-systems. Thus, 0L, 1L and 2L mean context-free, one-sided context-sensitive and two-sided context-sensitive, respectively. D means that rewriting is deterministic, and P that it is propagating, i.e., no letter goes into the empty word. Systems with tables are abbreviated by T, and extended systems (i.e., systems in which a terminal alphabet is allowed) by E. The reader is referred to [10] for an introductory discussion, and to [9] for a recent bibliography of the field.
The resistance of some L-families against closure operations is mainly due to the fact that there is no terminal alphabet and, consequently, all intermediate words are also in the language. Families generated by extended systems possess strong closure properties. E.g., the family T0L is an anti-AFL, whereas ET0L is a full AFL. An important observation (made in [2]) as regards decidability problems is that all 0L-systems (including systems with tables and extended systems) can be simulated by the indexed grammars of Aho [1].
2. Basic decidability problems
Since membership, emptiness and finiteness are decidable for indexed languages [1] and [5], the following theorem is immediately obtained.
THEOREM 1. Membership, emptiness and finiteness are decidable for all 0L-families, in particular for the family ET0L.
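For propagating systems, membership can also be seen directly, without the detour through indexed grammars: a derivation step never shortens a word, so every derivation of a word $w$ passes only through words of length at most $|w|$, of which there are finitely many. The following Python sketch is an illustration under that assumption only (P0L-systems, productions given as a dictionary from letters to lists of non-empty right-hand sides); the function name `p0l_member` is hypothetical, and this is not the decision method behind Theorem 1.

```python
from collections import deque


def p0l_member(productions, axiom, target):
    """Decide whether target belongs to the language of a propagating 0L-system.

    productions: dict mapping each letter to a list of non-empty right-hand sides.
    Since no right-hand side is empty, a step never shortens a word, so only words
    of length <= len(target) can occur on a derivation of target.
    """
    if len(axiom) > len(target):
        return False
    seen = {axiom}
    queue = deque([axiom])
    while queue:
        word = queue.popleft()
        if word == target:
            return True
        # All parallel rewritings: each occurrence picks a right-hand side independently;
        # prefixes already longer than target are discarded.
        partial = {""}
        for letter in word:
            partial = {p + rhs for p in partial for rhs in productions[letter]
                       if len(p) + len(rhs) <= len(target)}
        for nxt in partial - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False
```

For example, with the single production $a \to aa$ and axiom $a$, the word $a^4$ is accepted and $a^3$ is rejected, since the search exhausts all words of length at most 3.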
THEOREM 2. Consider any family XY0L among the 0L-families. There is an algorithm for deciding, of an arbitrary language L in XY0L and an arbitrary word P, whether or not P occurs as a subword of a word in L.
PROOF. Let G be an indexed grammar for L, with the terminal alphabet $V_T$. Since the indexed languages form an AFL, the language $L(G) \cap V_T^* P V_T^*$ is also an indexed language. Moreover, an indexed grammar $G_1$ for it can be effectively found. Theorem 2 now follows because we can decide the emptiness of $L(G_1)$. □
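For P0L-systems the question of Theorem 2 can also be settled by a direct closure computation. Every subword of length at most $m$ of a derived word lies within the image of a subword of length at most $m$ of its predecessor (each letter yields at least one letter), and in a 0L-system every occurrence chooses its production independently. Hence the set of all subwords of length at most $|P|$ of words in $L$ is the least set that contains the short subwords of the axiom and is closed under rewriting. The Python sketch below computes this finite set; the function names are hypothetical, and the sketch is only an illustration, not the author's procedure.

```python
from itertools import product


def factors(word, m):
    """All non-empty subwords of word of length at most m."""
    return {word[i:j] for i in range(len(word))
            for j in range(i + 1, min(i + m, len(word)) + 1)}


def occurs_as_subword(productions, axiom, pattern):
    """Decide whether pattern is a subword of some word of a propagating 0L-system.

    productions: dict mapping each letter to a list of non-empty right-hand sides.
    """
    if not pattern:
        return True
    m = len(pattern)
    known = factors(axiom, m)
    frontier = list(known)
    while frontier:
        u = frontier.pop()
        # Rewrite the short subword u in every possible way and keep its short subwords.
        for choice in product(*(productions[x] for x in u)):
            for v in factors("".join(choice), m):
                if v not in known:
                    known.add(v)
                    frontier.append(v)
    return pattern in known
```

With the productions $a \to a$, $a \to ba$, $b \to b$ and axiom $a$, the language is $\{b^i a \mid i \ge 0\}$, so `bb` is reported as a subword and `ab` is not.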
In many cases (e.g., for P0L-systems), the decision method of Theorem 2 can be replaced by a simpler device of checking through all 'direct ancestors' in derivation trees.
THEOREM 3. Consider any family XY0L among the 0L-families. There is an algorithm for deciding, of an arbitrary language L in XY0L and an arbitrary word P, whether or not P occurs infinitely many times as a subword in L.
Theorem 3 is obtained similarly to the previous theorem by deciding the finiteness of $L(G_1)$. Because the equivalence of sentential forms of context-free grammars is undecidable [12], the following theorem is obtained.
10 Kanger, Symposium
THEOREM 4. The equivalence of 0L-systems (even of P0L-systems) is undecidable. The equivalence of DT0L-systems is also undecidable.
Theorem 4 leaves open the equivalence problem for D0L- and PD0L-languages. In fact, these are the open problems of longest standing in this area. The equivalence problems for D0L- and PD0L-sequences are also open (i.e., given two D0L-systems, one has to decide whether they generate the same sequence of words). It seems likely that all of these problems are decidable. In fact [M. Nielsen, personal communication], if we can solve the equivalence problem for PD0L-sequences, we can solve it for PD0L-languages. On the other hand, to solve the former problem, it suffices to consider two PD0L-sequences $P_i$ and $Q_i$, $i = 1, 2, \ldots$, such that, for every $i$ and every letter $a$, the number of occurrences of $a$ in $P_i$ equals the number of occurrences of $a$ in $Q_i$. (The condition is obviously necessary for equivalence, and it can be decided by the theory of growth functions considered in the next section.) One then has to determine a bound $k$ such that $P_i = Q_i$ for $i \le k$ implies $P_i = Q_i$ for $i > k$. Results concerning subword complexity [4] might also be relevant for these problems.
Decision methods of an arithmetic nature can be obtained for the case where the alphabet consists of one letter only. (Such systems are called UL-systems. In this case it is obviously irrelevant whether the system is 0L, 1L or 2L.) Herman et al. [6] have shown that the generated language is always either regular or of the form $\{a^{jk^i} \mid i \ge 0\}$, for some $k$ and $j$. All their constructions are effective and, thus, there is an algorithm for deciding whether a given UL-language is regular. Herman et al. [6] propose as an open problem the existence of a converse algorithm. Such an algorithm can indeed be found [13] by considering the minimal automaton accepting the given regular language:
THEOREM 5. There is an algorithm for deciding whether a given regular language is UL.
3. Growth functions
Consider any deterministic L-system defining a unique sequence of words $P_0, P_1, P_2, \ldots$, where $P_0$ is the axiom. (Thus, systems with tables are excluded.) The function
$f(n) = |P_n|$, $n = 0, 1, 2, \ldots$,
is termed the growth function associated with the system. E.g., for the D0L-system determined by the axiom $a$ and the productions
$a \to abd^6$, $b \to bcd^{11}$, $c \to cd^6$, $d \to d$,
we have $f(n) = (n+1)^3$. The basic paper in this area is by Szilard [14]. Growth functions of D0L-systems fit in the framework of the theory of integral sequential word functions [8]. The latter have been studied extensively in connection with probabilistic automata.
Assume that the alphabet of a D0L-system consists of the letters $a_1, \ldots, a_k$. Let $\pi$ be the $k$-dimensional row vector whose $i$th component equals the number of occurrences of $a_i$ in the axiom $P_0$, for $i = 1, \ldots, k$. Let $\eta$ be the $k$-dimensional column vector with all components equal to 1. Let $M$ be the $k \times k$ square matrix whose $(i,j)$th entry equals the number of occurrences of $a_j$ on the right-hand side of the production with $a_i$ on the left-hand side. Then the growth function has the matrix representation
$f(n) = \pi M^n \eta$.
This representation gives rise to the following theorem [14, 8]. (By the growth equivalence problem we mean the problem of deciding, of two L-systems of a given type, whether or not they possess the same growth function.)
THEOREM 6. The generating function of the growth function $f$ of a D0L-system equals $\pi (I - Mx)^{-1} \eta$, where $I$ is the identity matrix. Consequently, the growth equivalence problem of D0L-systems is decidable.
By changing $\eta$, one gets the same result also for the case where only the number of occurrences of some letters in a D0L-sequence is considered. Theorem 6 also gives a solution to the 'growth analysis' problem of D0L-systems: given a system, one has to determine its growth function. Another, more practical solution is based on difference equations [11, 16, 17]. The converse 'growth synthesis' problem (i.e., given a function, one has to realize it, if possible, as the growth function of a system of some previously specified type) is much more difficult. The following result holds [8].
THEOREM 7. There is an effective procedure A with the following property. Given a function $f(n)$ and an upper bound $r$ for the Hankel rank of $f$, A produces a D0L-system whose growth function equals $f$, provided such a system exists. If no such system exists, A runs forever.
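The matrix representation $f(n) = \pi M^n \eta$ translates directly into code. The Python sketch below (hypothetical function names; a D0L-system given as a dictionary mapping each letter to its unique right-hand side) computes initial values of a growth function by iterating the row vector $\pi M^i$, and checks them on the example $f(n) = (n+1)^3$. It also decides growth equivalence by a standard argument that is not necessarily the one in [14, 8]: if the two alphabets have $k_1$ and $k_2$ letters, then $f_1 - f_2$ satisfies a linear recurrence of order $k_1 + k_2$ by the Cayley-Hamilton theorem, so agreement for $n = 0, \ldots, k_1 + k_2 - 1$ implies agreement for all $n$.

```python
def parikh_matrix(rules, letters):
    """M[i][j] = number of occurrences of letters[j] in the right-hand side for letters[i]."""
    return [[rules[a].count(b) for b in letters] for a in letters]


def growth_values(rules, axiom, count):
    """The values f(0), ..., f(count - 1) of f(n) = pi M^n eta for a D0L-system."""
    letters = sorted(rules)
    M = parikh_matrix(rules, letters)
    k = len(letters)
    v = [axiom.count(a) for a in letters]  # the row vector pi
    values = []
    for _ in range(count):
        values.append(sum(v))  # v * eta, where eta = (1, ..., 1)^T
        v = [sum(v[i] * M[i][j] for i in range(k)) for j in range(k)]  # v := v M
    return values


def growth_equivalent(rules1, axiom1, rules2, axiom2):
    """Growth equivalence of two D0L-systems via the Cayley-Hamilton bound k1 + k2."""
    bound = len(rules1) + len(rules2)
    return growth_values(rules1, axiom1, bound) == growth_values(rules2, axiom2, bound)
```

For instance, $a \to ab$, $b \to b$ and $c \to cd$, $d \to d$ both have growth function $n + 1$, while $a \to aa$ (growth $2^n$) agrees with them for $n = 0, 1$ and first differs at $n = 2$, which is still within the bound $2 + 1 = 3$.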
It is a consequence of Theorem 6 that D0L growth functions are always exponential, polynomial or combinations of the two. In the D0L case, one can give an explicit characterization of the different types of growth. Denote by 3 the exponential growth (i.e., there are $n_0$ and $t > 1$ such that the growth function satisfies $f(n) \ge t^n$ for $n \ge n_0$), by 2 the growth bounded by a polynomial but not bounded by a constant, by 1 the growth bounded by a constant but not becoming ultimately 0, and by 0 the growth becoming ultimately 0. (These exhaust all the possibilities in the D0L case.) Given a semi-D0L-system, i.e., a D0L-system without the axiom, different types of growth may result from different choices of the axiom. Thus, the productions
$a \to a^2 b$, $b \to bc$, $c \to cd$, $d \to \varepsilon$
give rise to the type combination 3210 (the axioms $a$, $b$, $c$ and $d$ yield the types 3, 2, 1 and 0, respectively). The following theorem [16] tells which combinations may occur in D0L-systems.
THEOREM 8. Type 1 never occurs without type 2. All other combinations are possible.
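The growth type of a D0L-system can be read off its letter graph, in which $y$ is a successor of $x$ if $y$ occurs on the right-hand side for $x$. The Python sketch below (hypothetical names) relies on the Perron-Frobenius fact that the submatrix of a strongly connected component containing a cycle has spectral radius exactly 1 when every letter of the component produces exactly one letter of the component, and spectral radius greater than 1 otherwise. It is an illustration under these stated assumptions, not the proof in [16]: growth is exponential if some component reachable from the axiom contains a letter producing two letters of its own component, polynomial and unbounded if one cyclic component feeds a different one, bounded if some letter on a cycle is reachable, and ultimately 0 otherwise.

```python
def growth_type(rules, axiom):
    """Growth type (3, 2, 1 or 0) of the D0L-system with productions rules and axiom."""
    letters = sorted(rules)
    count = {x: {y: rules[x].count(y) for y in letters} for x in letters}
    # reach[x][y]: y occurs in some word derived from x in one or more steps.
    reach = {x: {y: count[x][y] > 0 for y in letters} for x in letters}
    for k in letters:  # Warshall's transitive closure
        for i in letters:
            if reach[i][k]:
                for j in letters:
                    if reach[k][j]:
                        reach[i][j] = True
    start = set(axiom)
    reachable = start | {y for x in start for y in letters if reach[x][y]}
    # Letters lying on a cycle of the letter graph; they keep reappearing.
    recurrent = [x for x in reachable if reach[x][x]]

    def internal_count(x):
        # Occurrences in rules[x] of letters from the strongly connected component of x.
        return sum(count[x][y] for y in letters if reach[x][y] and reach[y][x])

    # A component in which some letter produces two component letters grows exponentially.
    if any(internal_count(x) >= 2 for x in recurrent):
        return 3
    # Otherwise every component is a simple cycle; growth is unbounded (polynomial)
    # exactly when one cyclic component feeds a different one.
    if any(reach[x][y] and not reach[y][x] for x in recurrent for y in recurrent):
        return 2
    return 1 if recurrent else 0
```

Applied with each single letter as axiom to the productions $a \to a^2 b$, $b \to bc$, $c \to cd$, $d \to \varepsilon$, it returns 3, 2, 1, 0, in agreement with the combination 3210 above; the system with growth $(n+1)^3$ is classified as type 2.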
Assume that $f(n)$ is the growth function of some (deterministic) L-system and, furthermore, that $f(n)$ is not bounded by a constant. Then $f(n)$ is at most exponential and at least logarithmic. These bounds restrict the growth realizable by an L-system. Consider the following example due to Lindenmayer. A filament of cells grows in such a way that, at each step of the growth process, the first cell remains undivided, the second cell is divided into two new cells, the third cell into three new cells, and so forth. Since a filament of $m$ cells thus becomes a filament of $1 + 2 + \cdots + m = m(m+1)/2$ cells, this growth is more than exponential and, hence, not realizable by any L-system. On the other hand, the possibilities for D1L and D2L growth are much richer than those for D0L growth. Consequently, the former growth functions are much harder to characterize than the latter. (In fact, no general characterizations are known.) The following theorem gives some examples.
THEOREM 9. For D1L-systems without the axiom, all combinations among the types 3, 2, 1, 0 are possible. There are D1L-systems with a logarithmic growth function. For each natural number $t$, there is a D2L-system whose growth function is asymptotically equal to $n^{1/t}$.
PROOF. The first sentence follows by Theorem 8 and by considering the semi-D1L-system over the single letter a in which a is rewritten as a when its left neighbour is g, and as $a^2$ when its left neighbour is a, where g is the input from the environment. (The axiom $a$ then gives type 1 and the axiom $a^2$ type 3, a combination without type 2.) An example establishing the second sentence is due to Herman, cf. [8]. The example given before [8, Theorem 35] has a growth function asymptotically equal to $n^{1/2}$. This is due to the fact that the lengths of the constant intervals grow linearly. The growth $n^{1/3}$ is obtained similarly by making these lengths grow quadratically. This is achieved by letting a messenger travel back and forth in the string, always going one step further than the previous time. Growth occurs only when the messenger has reached the right end of the string. A PD2L-system with these properties is defined as follows. The axiom is bca, g is the input from the environment, and the productions are:
a → a, except that a becomes b if its left neighbour is b, c if its left neighbour is h or f, d if its right neighbour is d, and e if its right neighbour is e;
b → a;
c → c, except that c becomes h if its left neighbour is b and its right neighbour is a letter, and ae if its left neighbour is b and its right neighbour is g;
d → a, except that d becomes b if its left neighbour is g;
e → a, except that e becomes f if its left neighbour is g;
f → b;
h → d.
Then the generated sequence is: bca, aha, adc, dac, bac, abc, aaae, aaea, aeaa, eaaa, faaa, bcaa, ahaa, adca, daca, baca, abca, aaha, aadc, adac, daac, baac, abac, aabc, aaaae, aaaea, aaeaa, aeaaa, eaaaa, faaaa, bcaaa, .... Other fractional powers are obtained by iterating this procedure. (Thus, to get $n^{1/4}$, each of the messengers d and b has to travel back and forth.)
Theorem 9 gives many examples of growth functions of context-dependent L-systems which are not realizable by any D0L-system. An interesting open problem is whether or not there exists an L-system with growth type 2+, i.e., growth faster than polynomial but slower than exponential, as exemplified by the function $n^{\log n}$. (With biological connotations, type 3 growth has been called 'malignant' and type 2 growth 'normal'. Thus, type 2+ could be considered as growth which is neither malignant nor normal.)
We mention finally an example showing that, for D1L-systems, very small changes in the number of occurrences of some letter in the axiom can cause big changes in the growth type, a phenomenon not possible for D0L-systems. Consider the productions $b \to c$, $c \to c$ and $a \to a^2$, except that a becomes b if its left neighbour is b. Then the axiom $ba^2$ gives rise to linear growth, whereas the axiom $ba^3$ gives rise to exponential growth. By letting c grow as well, other similar changes are obtained.
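The context-dependent examples of this section are easy to check by simulation. In the Python sketch below (hypothetical names), a deterministic 2L-system is given as a function of the left neighbour, the letter and the right neighbour, with `None` standing for the environment input g. The rule `messenger` encodes the PD2L-system from the proof of Theorem 9, with its productions as reconstructed from the listed sequence bca, aha, adc, ...; the rule `axiom_sensitive` encodes the D1L example with productions $b \to c$, $c \to c$, $a \to a^2$ (and a becoming b after b).

```python
from typing import Callable, List, Optional

# rule(left, letter, right) -> right-hand side; None stands for the environment input g.
Rule = Callable[[Optional[str], str, Optional[str]], str]


def step(word: str, rule: Rule) -> str:
    """One parallel derivation step of a deterministic 2L-system."""
    out = []
    for i, letter in enumerate(word):
        left = word[i - 1] if i > 0 else None
        right = word[i + 1] if i + 1 < len(word) else None
        out.append(rule(left, letter, right))
    return "".join(out)


def derive(axiom: str, rule: Rule, steps: int) -> List[str]:
    """The words P_0, ..., P_steps of the generated sequence."""
    words = [axiom]
    for _ in range(steps):
        words.append(step(words[-1], rule))
    return words


def messenger(left: Optional[str], letter: str, right: Optional[str]) -> str:
    """PD2L-system of Theorem 9 (growth asymptotically n^(1/3))."""
    if letter == "a":
        if left == "b":
            return "b"
        if left in ("h", "f"):
            return "c"
        if right == "d":
            return "d"
        if right == "e":
            return "e"
        return "a"
    if letter == "c":
        if left == "b":
            return "ae" if right is None else "h"
        return "c"
    if letter == "d":
        return "b" if left is None else "a"
    if letter == "e":
        return "f" if left is None else "a"
    return {"b": "a", "f": "b", "h": "d"}[letter]


def axiom_sensitive(left: Optional[str], letter: str, right: Optional[str]) -> str:
    """D1L example: b -> c, c -> c, a -> a^2 except that a becomes b after b."""
    if letter == "a":
        return "b" if left == "b" else "aa"
    return "c"  # covers both b -> c and c -> c; the right context is unused
```

With these definitions, `derive("bca", messenger, 6)` yields `bca, aha, adc, dac, bac, abc, aaae`, the beginning of the sequence listed in the proof of Theorem 9. The word lengths of `derive("baa", axiom_sensitive, 5)` are 3, 4, 5, 6, 7, 8, whereas those of `derive("baaa", axiom_sensitive, 5)` are 4, 6, 9, 14, 23, 40.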
4. Macro systems and Lindenmayer AFL's
Let $\mathcal{L}$ be a family of languages. By definition, $\mathcal{L}$M0L consists of the languages L such that L is the result of an $\mathcal{L}$-substitution into a 0L-language. Languages in $\mathcal{L}$M0L are referred to as $\mathcal{L}$-macro-0L-languages. The family $\mathcal{L}$MT0L is defined similarly by considering $\mathcal{L}$-substitutions into T0L-languages. Macro systems were introduced in [3]. They correspond to the biological situation where one first observes longer segments ('macros') in a growth process and, finally, the structure of each macro is inserted. If $\mathcal{L}$ is the family of finite (resp. regular) languages, the corresponding macro families are denoted by FM0L and FMT0L (resp. RM0L and RMT0L). As usual, a cone is a family of languages (containing at least one nonempty language) closed under rational transductions (or, equivalently, under homomorphism, inverse homomorphism and intersection with regular languages).
THEOREM 10. If $\mathcal{L}$ is a cone, then $\mathcal{L}_1 = \mathcal{L}$M0L and $\mathcal{L}_2 = \mathcal{L}$MT0L are full AFL's.
PROOF. We prove the theorem for $\mathcal{L}_1$, the proof for $\mathcal{L}_2$ being similar.
It suffices [10, p. 135] to prove that $\mathcal{L}_1$ is closed under union, star, regular substitution, and intersection with regular languages. Closure under union and star is obvious, and closure under regular substitution follows immediately from the fact that $\mathcal{L}$ is closed under regular substitution. To show closure under intersection with regular languages, we assume that $\mathcal{L}_1$ contains the language $L$ and that $R$ is a regular language, accepted by the finite automaton $M$ with the state set $Q$, initial state $q_0$ and final state set $F$. For $q_i, q_j \in Q$, denote by $K_{ij}$ the regular language consisting of all words which move $M$ from $q_i$ to $q_j$. Let $L$ be the result of substituting languages $L_i$, $i = 1, \ldots, k$, belonging to $\mathcal{L}$ into the language generated by the 0L-system $G$ whose alphabet consists of the letters $a_1, \ldots, a_k$. Without loss of generality, we assume that there is only one production $X \to \varepsilon$ (where $X$ is one of the $a_i$'s) in $G$ with the empty word $\varepsilon$ on the right side. (Otherwise, we modify $G$ by introducing a new letter $X$ with productions $X \to X$ and $X \to \varepsilon$, removing all other productions $A \to \varepsilon$, and replacing zero or more occurrences of such letters $A$ on the right-hand sides of the original productions by $X$ in all possible ways.) We may also assume without loss of generality that $a_1$ is the axiom of $G$.
A new 0L-system $G_1$ is now defined as follows. The letters are $S$ (the axiom) and triples of the form $(q_i, a, q_j)$, where $q_i, q_j \in Q$ and $a$ is a letter of $G$. The productions of $G_1$ consist of all of the following:
(i) $S \to (q_0, a_1, q_F)$, where $q_F \in F$.
(ii) If $A \to B_1 B_2 \cdots B_n$, $n \ge 1$, where the $B_i$'s are letters, is a production of $G$, then
$(q_i, A, q_j) \to (q_i, B_1, q_{l_1})(q_{l_1}, B_2, q_{l_2}) \cdots (q_{l_{n-1}}, B_n, q_j)$
is a production of $G_1$ for all states $q_i, q_j, q_{l_1}, \ldots, q_{l_{n-1}}$. (For $n = 1$ the production is $(q_i, A, q_j) \to (q_i, B_1, q_j)$.)
(iii) $(q_i, X, q_i) \to \varepsilon$, for all states $q_i$.
The language substituted for $S$ is empty. The language substituted for $(q_i, a_f, q_j)$ is $L_f \cap K_{ij}$. Since $\mathcal{L}$ is a cone and $K_{ij}$ is regular, the language $L_f \cap K_{ij}$ is in $\mathcal{L}$. □
Similarly, one can prove that if $\mathcal{L}$ is a faithful cone, then $\mathcal{L}_1$ and $\mathcal{L}_2$ are AFL's. $\mathcal{L}_1$ and $\mathcal{L}_2$ are termed the Lindenmayer AFL's associated with $\mathcal{L}$. In [2], a different method is used to show that if $\mathcal{L}$ is a full AFL, then so are $\mathcal{L}_1$ and $\mathcal{L}_2$.
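The construction of $G_1$ in the proof of Theorem 10 is purely combinatorial and can be sketched in Python. The sketch below (hypothetical names) assumes, as in the proof, that the 0L-system $G$ is given as a dictionary from letters to lists of right-hand sides in which only the erasing letter $X$ has the empty right-hand side, and that the states of the automaton are listed explicitly. It generates the productions (i)-(iii) of $G_1$; it does not compute the substituted languages $L_f \cap K_{ij}$.

```python
from itertools import product


def triple_system(productions, axiom, states, q0, finals):
    """Productions of G_1: a dict from each letter of G_1 ("S" or a triple (q_i, a, q_j))
    to the list of its right-hand sides, each right-hand side being a list of triples."""
    g1 = {"S": [[(q0, axiom, qf)] for qf in finals]}  # rule (i)
    for a, rhss in productions.items():
        for qi, qj in product(states, repeat=2):
            options = []
            for rhs in rhss:
                if rhs == "":
                    if qi == qj:  # rule (iii): (q_i, X, q_i) -> empty word
                        options.append([])
                    continue
                # Rule (ii): thread the intermediate states q_{l_1}, ..., q_{l_{n-1}}.
                for mids in product(states, repeat=len(rhs) - 1):
                    chain = (qi, *mids, qj)
                    options.append([(chain[t], b, chain[t + 1]) for t, b in enumerate(rhs)])
            g1[(qi, a, qj)] = options
    return g1
```

For the productions $a \to aX$, $X \to X$, $X \to \varepsilon$ and a two-state automaton with states 0 (initial) and 1 (final), the letter $(0, a, 1)$ receives the two right-hand sides $(0, a, 0)(0, X, 1)$ and $(0, a, 1)(1, X, 1)$, and only the triples $(q, X, q)$ may be erased.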
THEOREM 11. FM0L = E0L.
PROOF. As in the previous proof, one can show that FM0L is closed under intersection with regular languages. This implies the inclusion E0L $\subseteq$ FM0L. The reverse inclusion follows from the fact [15] that E0L is closed under arbitrary homomorphisms.
REMARK. In the study of FM0L-languages, one has to be careful to distinguish between $\varepsilon$-productions in the 0L-system and $\varepsilon$-substitutions. The closure of FM0L under arbitrary homomorphisms is obvious by definition, whereas the proof of this closure for E0L is rather complicated. Thus, Theorem 11 is not as immediate as claimed in [2]. All of the following facts are rather hard to prove directly, but any one of them easily implies the other two: (i) E0L is closed under homomorphism. (ii) FM0L $\subseteq$ E0L. (iii) The two definitions of FM0L-languages given in [2] and [3] (one as above and the other involving a finite set of terminal productions) are equivalent.
Along similar lines, one can show that FMT0L = RMT0L = ET0L.
References
[1] A. Aho, Indexed grammars - an extension of context-free grammars, J. Assoc. Comput. Mach. 15 (1968) 647-671.
[2] K. Culik II, On some families of languages related to developmental systems, Internat. J. Comput. Math., to appear.
[3] K. Culik II and J. Opatrny, Macro 0L systems, to appear.
[4] A. Ehrenfeucht and G. Rozenberg, Subwords in deterministic T0L systems, to appear.
[5] T. Hayashi, On derivation trees of indexed grammars, an extension of the uvwxy-theorem, Kyoto University, Tech. Rep. RIMS-122 (1972).
[6] G. Herman, K. Lee, J. v. Leeuwen and G. Rozenberg, Characterization of unary developmental languages, Discrete Math. 6 (1973) 235-247.
[7] A. Lindenmayer, Mathematical models for cellular interactions in development, Parts I-II, J. Theoret. Biol. 18 (1968) 280-315.
[8] A. Paz and A. Salomaa, Integral sequential word functions and growth equivalence of Lindenmayer systems, Information and Control, to appear.
[9] G. Rozenberg and D. Wood, Generative models for parallel processes, McMaster University, Hamilton, Ont., Computer Science Tech. Rept. 73/6 (1973).
[10] A. Salomaa, Formal Languages (Academic Press, New York, 1973).
[11] A. Salomaa, On exponential growth in Lindenmayer systems, Indag. Math. 35 (1973) 23-30.
[12] A. Salomaa, On sentential forms of context-free grammars, Acta Informatica 2 (1973) 40-49.
[13] A. Salomaa, Solution of a decision problem concerning unary Lindenmayer systems, Discrete Math., to appear.
[14] A. Szilard, Growth functions of Lindenmayer systems, to appear.
[15] J. van Leeuwen, Pre-set push-down automata, University of California, Berkeley, Calif., Computer Science Tech. Rept. 10 (1973).
[16] P. Vitanyi, Structure of growth in Lindenmayer systems, Tech. Rept. No. 1/73, Mathematisch Centrum, Amsterdam (1973).
[17] P. Vitanyi, Growth of strings in parallel rewriting systems, unpublished.