On the number of nonterminals in linear conjunctive grammars

On the number of nonterminals in linear conjunctive grammars

Theoretical Computer Science 320 (2004) 419 – 448 www.elsevier.com/locate/tcs On the number of nonterminals in linear conjunctive grammars Alexande...

399KB Sizes 6 Downloads 67 Views

Theoretical Computer Science 320 (2004) 419 – 448

www.elsevier.com/locate/tcs

On the number of nonterminals in linear conjunctive grammars Alexander Okhotin School of Computing, Queen’s University, Kingston, Ont., Canada K7L3N6 Received 7 September 2003; received in revised form 25 February 2004; accepted 3 March 2004 Communicated by D. Perrin

Abstract The number of nonterminals in a linear conjunctive grammar is considered as a descriptional complexity measure of this family of languages. It is proved that a hierarchy collapses, and for every linear conjunctive grammar there exists and can be e4ectively constructed a linear conjunctive grammar that accepts the same language and contains exactly two nonterminals. This yields a partition of linear conjunctive languages into two nonempty disjoint classes of those with nonterminal complexity 1 and 2. The basic properties of the family of languages for which one nonterminal su5ces are established. Nonterminal complexity of grammars in the linear normal form is also investigated. c 2004 Elsevier B.V. All rights reserved.  Keywords: Formal languages; Conjunctive grammar; Trellis automaton; Cellular automaton; Language equation; Descriptional complexity; Minimal linear grammar

1. Introduction Linear conjunctive grammars, which are linear context-free grammars augmented with an explicit intersection operation, were introduced several years ago [16,18], and soon thereafter were shown to generate a language family that has been studied long before [23]. It is now known that the following formalisms de@ne the same family of languages: one-way real-time cellular automata [6,24], trellis automata [2,4,5], a certain very restricted type of Turing machines [13,14], linear conjunctive grammars  A preliminary version of this paper was presented at the Descriptional Complexity of Formal Systems (DCFS 2003) workshop held in Budapest, Hungary, July 12–14, 2003, and its extended abstract appeared in the proceedings. E-mail address: [email protected] (A. Okhotin).

c 2004 Elsevier B.V. All rights reserved. 0304-3975/$ - see front matter  doi:10.1016/j.tcs.2004.03.002

420

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

[16,23], language equations with union, intersection and linear concatenation [17] and the recently introduced linear Boolean grammars [21]. The abundancy of natural characterizations coming from di4erent areas of theoretical computer science clearly justi@es the importance of this language family and gives a reason for the continued study of its properties. Let us summarize the most important known facts about this family. It contains some usual examples of non-context-free languages, such as {an bn cn | n¿0} [4,6,18] and {wcw | w ∈ {a; b}∗ } [4,16], as well as the language of all computations of any Turing machine [15,16,18] and the von Dyck language [6,23], which is not linear contextfree. It contains some P-complete languages [13,20]. Every language it contains can be recognized in time O(n2 ) and space O(n) [4,16]. It is known to be closed under all set-theoretic operations [4,6,18], not closed under concatenation [25] and not closed under star [23]. It is incomparable with the context-free languages [25] and is properly contained in the conjunctive languages [23]. The descriptional complexity of one-way real-time cellular automata was @rst investigated in [15], where the tradeo4s between them and several devices of di4erent recognition power were proved to be recursively unbounded using the method of [12]. The succinctness of description of these languages by linear conjunctive grammars was @rst studied in [22], where the transition from them to trellis automata was shown to cause superpolynomial blowup in the total length of description. It was also proved in [22] that for every n there exist languages that can be denoted with an n-state but not an (n − 1)-state trellis automaton, thus establishing a strict in@nite hierarchy of n-state languages. This paper considers another descriptional complexity measure for this family of languages—the number of nonterminals in a linear conjunctive grammar. Somewhat surprisingly, the more or less expected in@nite hierarchy of n-nonterminal languages actually collapses, as it turns out that two nonterminals always su3ce. This contrasts the known in@nite hierarchy of n-nonterminal context-free and linear context-free languages [9] and can be compared with the results on the representability of all recursively enumerable languages in several di4erent grammar formalisms using a @xed number of nonterminals [7]. The proof of this main result, given in Section 3, is constructive: given a trellis automaton, one can represent its entire operation in the form of a certain language and denote this language using a single nonterminal, employing the other one as a start symbol that “analyzes” the encoded computation by referring to the former nonterminal and generates exactly the strings accepted by the automaton. This upper bound of 2 nonterminals is strict: as demonstrated in Section 4, two nonterminals are required even for some regular languages. At the same time, for every linear conjunctive language a certain closely related language can be denoted using a single nonterminal; among these are some P-complete languages. The basic properties of one-nonterminal linear conjunctive grammars are developed in Section 4. The class of languages they generate admits another noteworthy characterization, as the set of languages de@ned by one-variable language equations with union, intersection and linear concatenation, and thus the results on one-nonterminal languages have important implications on the theory of language equations.

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

421

In the @nal Section 5 the nonterminal complexity of linear conjunctive grammars in the linear normal form [16] is investigated and compared with the number of states in trellis automata. 2. Grammars, equations and automata 2.1. Linear conjunctive grammars Conjunctive grammars, introduced in [16], are an extension of context-free grammars with an explicit intersection operation. Denition 1. A conjunctive grammar is a quadruple G = (; N; P; S), in which  and N are disjoint @nite nonempty sets of terminal and nonterminal symbols respectively; P is a @nite set of grammar rules, each of the form (A ∈ N; n ¿ 1; i ∈ ( ∪ N )∗ for all i);

A → 1 & · · · & n

(1)

where the strings i are distinct and their order is considered insigni@cant; S ∈ N is a nonterminal designated as the start symbol. For a rule (1), an object of the form A → i is called a conjunct; denote the set of all conjuncts as conjuncts(P). A conjunctive grammar generates strings by deriving them from the start symbol, generally in the same way as context-free grammars do. Intermediate strings used in course of a derivation are de@ned as follows: Denition 2. Let G = (; N; P; S) be a conjunctive grammar. The set of conjunctive formulae F ⊂ ( ∪ N ∪ {“(”; “&”; “)”})∗ is de@ned inductively: • The empty string  is a conjunctive formula. • Any symbol from  ∪ N is a formula. • If A and B are nonempty formulae, then AB is a formula. • If A1 ; : : : ; An (n¿1) are formulae, then (A1 & · · · &An ) is a formula. G

Next, de@ne the notion of derivability in one step as a binary relation =⇒ on the set F. There are two types of derivation steps: (1) A nonterminal can be rewritten with a body of a rule enclosed in parentheses: G

s As =⇒ s ( 1 & · · · & n )s

(2)

if A → 1 & · · · & n ∈ P and s As ∈ F, where s ; s ∈ ( ∪ N ∪ {“(”; “&”; “)”})∗ . (2) A conjunction of one or more identical terminal strings enclosed in parentheses can be replaced with one such string without the parentheses: G

s (w& · · · &w)s =⇒ s ws if w ∈ ∗ and s ws ∈ F.

(3)

422

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Denition 3. Let G = (; N; P; S) be a conjunctive grammar. The language of a formula is the set of all terminal strings derivable from the formula in zero or more steps: G LG (A) = {w ∈ ∗ | A =⇒ ∗ w}. De@ne L(G) = LG (S). Let us now restrict conjunctive grammars of the general form to obtain the subclass of linear conjunctive grammars: Denition 4. A conjunctive grammar G = (; N; P; S) is said to be linear, if each rule in P is of the form A → u1 B1 v1 & · · · &um Bm vm A→w

(m ¿ 1; ui ; vi ∈ ∗ ; Bi ∈ N )

or

(w ∈ ∗ )

(4a) (4b)

It is said to be in the linear normal form, if the rules of the type (4a) are restricted to be of the form A → bB1 & · · · &bBm &C1 c& · · · &Cn c, where b; c ∈ , m + n¿1 and Bi ∈ N . 2.2. Language equations with union, intersection and linear concatenation Context-free grammars are known to have an algebraic representation by least solutions of systems of language equations with union and concatenation [1,3]. Conjunctive grammars possess a similar characterization by language equations with union, intersection and concatenation [17]. Let us retell the main results of [17], simplifying them for the subcase of linear conjunctive grammars studied in this paper. Denition 5 (System of equations). Let  be an alphabet. Let n¿1. Let X = (X1 ; : : : ; Xn ) be a set of language variables, which assume values of languages over . Let ’1 ; : : : ; ’n be expressions that depend upon the variables X and may contain these variables, the constant languages {} and {a} (for all a ∈ ), set-theoretic union and intersection, as well as linear concatenation (i.e., left- and right-concatenation of {a}). Then X1 = ’1 (X1 ; : : : ; Xn ) .. .

(5)

Xn = ’n (X1 ; : : : ; Xn )

is called a resolved system of equations over  in variables X . A vector of languages L = (L1 ; : : : ; Ln ) is a solution of (5) if for every i the value of ’i under the assignment Xj = Lj (for all j) is Li . Note that language equations can be de@ned in a more rigorous way by @rst de@ning a formula as a syntactical concept and then supplying semantics for it by de@ning its value on a vector of languages [17,21].

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

423

Denote the right-hand side of a system (5) as a vector function: ’(X1 ; : : : ; Xn ) = (’1 (X1 ; : : : ; Xn ); : : : ; ’n (X1 ; : : : ; Xn ))

(6)

and inductively de@ne its substitutions into itself as ’0 (X1 ; : : : ; Xn ) = (X1 ; : : : ; Xn )

(7a)

’i+1 (X1 ; : : : ; Xn ) = ’(’i (X1 ; : : : ; Xn )):

(7b)

and

De@ne a partial order “4” on the set of language vectors of length n as componentwise inclusion: (L1 ; : : : ; Ln ) 4 (L1 ; : : : ; Ln ) if and only if Li ⊆ Li for all i (16i6n). ∗ Then one can prove that the operator ’ on the set (2 )n is monotone and ∪-continuous with respect to this partial order [17], which, by the @xed point theory, yields the following result: Theorem 1 (Okhotin [17]). Every system (5) over an alphabet  and in variables X1 ; : : : ; Xn has least solution (with respect to “4”) given by   L = (L1 ; : : : ; Ln ) = sup ’i ∅; : : : ; ∅ :    i¿0

(8)

n

Theorem 2 (Okhotin [17]). A language L is generated by a linear conjunctive grammar if and only there exists a system of equations (5) with union, intersection and linear concatenation, such that L is the 8rst component of its least solution. Another important issue treated in [17] is the uniqueness of solution. For language equations with unrestricted concatenation it is convenient to use the notion of a strict system, developed in [17] by generalizing the corresponding notion for the context-free case [1], as a su5cient condition for uniqueness. When concatenation is linear, this condition degrades to quite a simple one, which will now be stated: Denition 6 (Okhotin [17]). An expression (in the sense of De@nition 5) ’ is said to be admissible in strict systems if it is {}, {a}, union or intersection of two expressions admissible in strict systems, or a left- or right-concatenation of a terminal symbol to an arbitrary expression as in De@nition 5. A system (5) of language equations with union, intersection and linear concatenation is said to be strict if every expression ’i on its right-hand side is admissible in strict systems. Basically, the only restriction imposed on strict systems is that variables can appear in expressions only when concatenated to terminal symbols. This dismisses pathological cases of language equations like X = X and guarantees the following property: Theorem 3 (Okhotin [17]). Every strict system has unique solution.

424

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Now let us strengthen the result of Theorem 2 on the equivalence of linear conjunctive grammars and language equations as in De@nition 5. In this stronger formulation it will be very useful in the following: Theorem 4. Let n¿1. For any vector of languages (L1 ; : : : ; Ln ) over , the following two conditions are equivalent: • There exists a linear conjunctive grammar G = (; {A1 ; : : : ; An }; P; A1 ), such that LG (Ai ) = Li for all i. • There exists a system of language equations with union, intersection and linear concatenation over  in variables (X1 ; : : : ; Xn ), such that its unique solution is (L1 ; : : : ; Ln ). Proof. The conversion of an n-nonterminal grammar to a system of n equations can be done as follows: @rst, unit conjuncts are removed from the grammar using the method of [16], i.e., all rules of the form A → B& 2 & · · · & n are eliminated; this does not a4ect the number of nonterminals and the languages they generate. Then every nonterminal Ai is represented as a variable Xi . If there are some rules for Ai , then the equation for Xi is Xi =



n

Ai → 1 &···& n ∈P j=1

j ;

(9a)

where all instances of nonterminals in j are replaced with the corresponding variables. If there are no rules for Xi , then the equation may be taken as, say, Xi = &a

(for some a ∈ ):

(9b)

This system has least solution (LG (A1 ); : : : ; LG (An )) by the results of [17]. On the other hand, it is a strict system, so the mentioned solution is unique by Theorem 3. Going from language equations back to linear conjunctive grammars is somewhat complicated by the fact that the expressions in the right-hand sides of equations might be arbitrarily complex, while the rules in linear conjunctive grammars must be of the speci@c form (4). This can be trivially solved by introducing new variables and moving subexpressions to separate equations, as in Theorem 2, but this would contradict our goal of having as many nonterminals as there were variables. In order to meet this goal, let us start from equivalent transformations of right-hand sides. Two expressions over union, intersection and linear concatenation that depend upon language variables (X1 ; : : : ; Xn ) and contain constant languages {a} (a ∈ ) and {} are said to be equivalent, denoted ’ ≡ , if they evaluate to the same language for every substitution of languages for its variables. The following equivalencies can easily be proved to hold for all expressions ’, and : ’ ∩ ( ∪ ) ≡ (’ ∩ ) ∪ (’ ∩ );

(10a)

( ∪ ) ∩ ’ ≡ ( ∩ ’) ∪ ( ∩ ’);

(10b)

a(’ ∪ ) ≡ a’ ∪ a ;

(10c)

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

a(’ ∩ ) ≡ a’ ∩ a ;

425

(10d)

(’ ∪ ) · a ≡ ’a ∪

a;

(10e)

(’ ∩ ) · a ≡ ’a ∩

a:

(10f)

Now, given any expression ’ and using these equivalencies as rewriting rules that operate from left to right, one can move linear concatenation to the bottom of the formula, raise union to the top level and let intersection remain in between. This can be done with the right-hand sides of a given system X = ’(X ), resulting in a system X = ’ (X ) with the same unique solution, in which every equation is of the form Xi =

lij mi

j=1 k=1

uijk Xijk vijk :

(11)

This system of equations can be directly simulated by a grammar Ai → uij1 Xij1 vij1 & · · · &uijlij Xijlij vijlij

(1 6 i 6 n; 1 6 j 6 mi )

(12)

which satis@es the @rst condition of the theorem. It should be noted that the claim of Theorem 4 is speci@c to linear conjunctive grammars, and likely does not hold for conjunctive grammars of the general form. Indeed, equalities (10d) and (10f) cannot be generalized for the general case (as concatenation is not distributive over intersection), and hence one would probably need to add extra nonterminals to represent concatenation of intersections. A statement similar to Theorem 2 [17] remains the strongest known statement on the equivalence of conjunctive grammars of the general form and language equations with intersection. 2.3. Trellis automata (Systolic) trellis automata [4] were introduced in early 1980s as a model of a massively parallel system with simple identical processors connected in a uniform pattern. Triangular (real-time, homogeneous) trellis automata are a particular case of trellis automata, in which the connections between nodes form a @gure of triangular shape, as shown in Fig. 1. These automata are used as acceptors of strings loaded from the bottom, and the acceptance is determined by the topmost element. In the following they will be referred to as just trellis automata. Following the notation of [23], de@ne a trellis automaton as a quintuple M = (; Q; I; $; F), where  is the input alphabet, Q is a @nite nonempty set of states (of processing units), I :  → Q is a function that sets the initial states (loads values into the bottom processors), $ : Q × Q → Q is the transition function (the function computed by processors) and F ⊆ Q is the set of @nal states (e4ective in the top processor). Given a string a1 : : : an (ai ∈ ; n¿1), every node corresponds to a certain substring ai : : : aj (16i6j6n) of symbols on which its value depends. The value of a bottom node corresponding to one symbol of the input is I (ai ); the value of a successor of

426

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

a1

a2

a3

a4

a5

Fig. 1. Computation of a trellis automaton.

two nodes is $ of the values of these ancestors. Denote the value of a node corresponding to ai : : : aj as &(I (ai : : : aj )) ∈ Q: here I (ai : : : aj ) is a string of states (the bottom row of the trellis), while & denotes the result (a single state) of a triangular computation starting from a row of states. By de@nition, &(I (ai )) = I (ai ) and &(I (ai : : : aj )) = $(&(I (ai : : : aj−1 )); &(I (ai+1 : : : aj ))). Now de@ne L(M ) = {w | &(I (w)) ∈ F}. The computational equivalence of trellis automata to linear conjunctive grammars was proved as follows: Theorem 5 (Okhotin [23]). For every linear conjunctive grammar G there exists and can be e:ectively constructed a trellis automaton M , such that L(M ) = L(G)\{}. Assuming without loss of generality that G is in the linear normal form, the proof of Theorem 5 is based upon a subset construction and results in exponentially as many states as there are nonterminals in the normal form grammar G. To be more speci@c, if G = (; N; P; S), then the set of states of a constructed trellis automaton is Q =  × 2N × . Theorem 6 (Okhotin [23]). For every trellis automaton M , there exists and can be e:ectively constructed a linear conjunctive grammar G, such that L(G) = L(M ). Theorem 6 is proved by a simple construction [23], where for each state q of the automaton (; Q; I; $; F) one creates a nonterminal Aq , for each initial state q = I (a) one adds a rule Aq → a, for each transition q = $(q1 ; q2 ) one adds ||2 rules Aq → bAq2 &Aq1 c

(for all b; c ∈ )

(13)

and @nally one introduces a start symbol S with a rule S → Aq for every q ∈ F. The correctness of this construction is stated as follows: G

Lemma 1 (Okhotin [23]). For every w ∈ + and q ∈ Q, Aq =⇒ ∗ w i: &(I (w)) = q.

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

427

3. Two nonterminals su#ce In this section the main result of the paper is established, which is a construction of a linear conjunctive grammar with two nonterminals out of an arbitrary linear conjunctive grammar. 3.1. Encoding of a computation The given grammar is @rst converted to an equivalent trellis automaton M = (; Q; I; $; F)—for instance, using the known method [23],—and the rest of the construction is a particular kind of simulation of this automaton by another grammar. The straightforward method of simulating trellis automata by linear conjunctive grammars, explained in the previous section, represents every state with a nonterminal. As a result, for every string w ∈ + , the set {Aq | Aq =⇒∗ w} of nonterminals that derive w directly corresponds to the state of one node &(I (w)) in the trellis. If one aims to reOect the value of each and every element of the trellis in the form of a derivation of the corresponding string from some nonterminal, then it is easy to see that no less than log2 |Q| nonterminals are required just to be able to store all this information. However, storing every element is unnecessary, as one can simulate at a time larger portions of a computation than a single $ transition, and explicitly store only some selection of the computed states, such that the rest of the states could be inferred from those that are stored. The grammar that will be constructed records every (6|Q| − 1)-th row of the computation and simulates the whole $ machinery of a trellis automaton using just one nonterminal instead of the set {Aq | q ∈ Q} from [23]. This nonterminal can be viewed as a single bit corresponding to every substring of the input string, and the operation of the grammar, which involves deriving longer strings from the shorter ones, is essentially a computation of these bits using @nite-range data access. The other nonterminal will be a start symbol that will decode the appropriate information from the @rst nonterminal and use it to accept the right strings. Let d = 6|Q| − 1. Our one-bit simulation determines and uses the values in every dth row of the computation of the trellis automaton, starting from the @rst one; these rows are emphasized in Fig. 2(a). Every dth row of the simulation (i.e., the rows 1; 1 + d; 1 + 2d; : : :) is called a control row, and its bits are always set to true (shown as black squares in Fig. 2(b)). The numbers of states forming each (kd + 1)-th (k¿0) row of the trellis are encoded as bits in the rows kd + 2; : : : ; kd + d of the one-bit simulation (white squares in Fig. 2(b), which can be true or false). Let u be a substring (of some longer input string), such that |u| = 1 (mod d), and let us show how the state qi = &(I (u)) is encoded in the bit rows following the row number |u| in the simulated computation. In Fig. 3, the central black square in the lower control row corresponds to the substring u. The number of the state qi is encoded in the bits shown in grey: both bits labeled qi in the @gure are set to true, while the rest of the grey bits are set to false. ReOecting the same state qi in two identical bits is a clear redundancy of the encoding. However, this redundancy is necessary, because the “data access” capabilities of a grammar simulating an automaton are fairly limited: the derivability of a substring

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

q q q q q 2d+1 q q q q q q 2d q q q q q q

3 q q 2 q q 1 qqq a a a (a) 1 2 3

...

d+2 d+1 encoded d states from the row 1 3 2 1 a1 a2 a3 . . . ad ad+1 (b)

q q q q q q q q q q q q q

qqqqqqq

. . . ad ad+1

...

control rows

...

...

d+2 q q q q q q q q q d+1 q q q q q q q q q d q q q q q q q q q q ...

not used

2d+1 2d

encoded states from the row d+1

...

not used

...

428

...

Fig. 2. (a) A computation and (b) its one-bit simulation.

.

..

..

.

control row

2n-1 empty rows

.

..

..

.

q n? 2n-1 more data rows: again, every second one is empty

q2? q1?

.

..

..

.

qn? 2n data rows: q2? every second one is empty q1 ?

control row

...

... u

Fig. 3. Encoding of a state qi = &(I (u)), where |u| = 1 (mod d).

from a nonterminal depends on this substring alone, and cannot be inOuenced by the context it appears. As will be shown below, most of the time only one of these two bits will be accessible. This information is stored in 4n − 1 rows above the control row; every second row among these is intentionally left blank (i.e., set to false). The remaining 2n − 1 rows are also empty. As will be proved below, these empty rows ensure that one can always tell where the control rows are. Once the control rows of the simulated computation are located, it is possible to decode the numbers of those states from the original computation that are located in every dth row. Knowing the states in these selected rows, one can easily compute the states in any row located in between: indeed, in

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448 1 2 3 2 1 3 2 2 3 3 2 1 4 3 3 2 2 3 4 3 3 2 2 4 3 4 3 3 2 2 4 4 3 4 3 3 2 2 4 4 4 3 4 3 3

2244443433

I a b 2 3

429

δ 1 2 3 4

1 3 2 2 2 1 3 4 3 4 3 4 3

2 2 4 4 4 4 4 3 4 3 3 2 2 4 4 4 4 4 4 3 4 3 3 2 2 4 4 4 4 4 4 4 3 4 1 3 2 2 4 4 4 4 4 4 4 4 3 2 3 3 2 2 4 4 4 4 4 4 4 4 4 1 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 1 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 1 4 1 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 3 2 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 1 4 1 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 3 2 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 1 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 1 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 1 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 1 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 1 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 3 3 4 3 4 3 3 4 3 3 4 3 3

F={1}

2 2 4 4

224444444442242422241343433433433 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 1 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 1 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 1 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 1 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 1 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 1 4 1 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 3 2 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 1 4 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 4 1 4 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 3 4 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 4 2 3 4 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 1 3 4 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 4 2 1 3 4 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 2 3 3 4 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 4 2 2 3 3 4 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 1 4 1 3 4 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 4 2 1 4 3 3 4 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 2 3 2 3 3 4 4 4 4 2 2 4 2 4 2 2 2 4 2 2 2 4 2 2 4 2 2 3 4 3 3 4 3 3 3 4 3 3 3 4 3 4 3 3 4 3 3 4 3 3 2 2 2 4 1 4 3 3 4 4 4 2 1 4 2 4 2 2 2 4 2 2 2 4 2 2 4 2 1 4 3 4 1 3 4 3 3 3 4 3 3 3 4 3 4 1 3 4 3 3 4 3 3 2 2 2 4 2 3 4 3 3 4 4 2 2 3 2 4 2 2 2 4 2 2 2 4 2 2 4 2 2 3 4 3 2 3 3 4 3 3 3 4 3 3 3 4 3 2 3 3 4 3 3 4 3 3 2 2 1 4 2 1 3 4 1 3 4 2 1 4 1 4 2 2 1 4 2 2 1 4 2 1 4 2 2 1 3 4 1 4 1 3 4 1 3 3 4 1 3 3 4 1 4 1 3 4 1 3 4 1 3

22232233233223232223222322322233232332333233323233233233

a a a ba a bba bba a ba ba a a ba a a ba a ba a a bba ba bba bbba bbba ba bba bba bb Fig. 4. A computation of a trellis automaton for the von Dyck language.

order to compute one state in a row kd + i, it is su5cient to know i adjacent states in the row kd + 1. This decoding process will be explained in more detail in a formal construction below, and for now let us illustrate the encoding on an example. Consider a certain four-state trellis automaton for the Dyck language over  = {a; b} [4]. This automaton and its computation on the 56-symbol-long string w = aaabaabbabb aababaaabaaabaabaaabbababbabbbabbbababbabbabb are given in Fig. 4. The number d equals 6|Q| − 1 = 6 · 4 − 1 = 23, and thus every 23rd row of this computation will be stored in its one-bit simulation; in the @gure these rows (1, 24, 47) are emphasized with a boldface font. The next Fig. 5 contains the one-bit simulation of this computation. The top left corner of the @gure reminds how every state is encoded, and indeed one can easily decipher the states emphasized in Fig. 4 by a careful examination of the bits in Fig. 5. For instance, the row number 24 of the original computation contains one instance of state “1”; it is easy to @nd the encoding of this state by looking right above the middle control row in the simulated computation.

430

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448 1\

4\ 3\ 2\ 1\ 4/ 3/ control

4/ 3/ 2/ 1/

control

2/ 1/

d = 6|Q|-1 = 23

4\ 3\ 2\ 1\ 4/ 3/ 2/ 1/

control

4\ 3\ 2\ 1\ 4/ 3/ 2/ 1/ ctrl a a a ba a bba bba a ba ba a a ba a a ba a ba a a bba ba bba bbba bbba ba bba bba bb

Fig. 5. One-bit simulation of the computation from Fig. 4.

3.2. A language equation for the encoded computation Now let us formalize this one-bit simulation as a language of the strings that have their corresponding bit set to true. Fix a trellis automaton M with the set of states Q = {q1 ; : : : ; qn } and let d = 6n − 1. Consider the language LM = Lcontrol ∪ Lleft ∪ Lright ;

(14a)

Lcontrol = {w | |w| = 1 (mod d)};

(14b)

Lleft = {xw | |w| = 1 (mod d); |x| = 2n + 2i − 1; where &(I (w)) = qi };

(14c)

Lright = {wy | |w| = 1 (mod d); |y| = 2i − 1; where &(I (w)) = qi }

(14d)

where

de@ned with respect to M . Note that all strings from Lleft have length 2n + 2i modulo d (16i6n), while the strings from Lright have length 2i modulo d. Thus each of the three

...

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

431

k

...

...

control row

2i-1 control row

x

u

v y

Fig. 6. Decoding of the states: xuv ∈ locate[k](LM ) (&(I (x)) = qi for some i).

components in (14) has its own set of admissible lengths modulo d, and these sets are disjoint. Lemma 2. For every trellis automaton M = (; Q; I; $; F) there exists and can be effectively constructed a language equation X = ’(X ), where ’ may contain the operations of union, intersection and concatenation of terminals, such that language (14) corresponding to M is the unique solution of this equation. Proof. For every k, such that 16k6d, de@ne the function locate[k](A) = Auv ∩ Av; qi ∈Q u;v: |u|=2i−1; |uv|=d+k−1

(15)

which maps a language (denoted by the language variable A) to a language. Let us substitute A = LM ; it is claimed that a string w is in locate[k](LM ) if and only if |w|¿d + 1 and |w| = k (mod d). ⇐  In order to prove that every such w is in locate[k](LM ), factorize it as w = xy, where |x| = 1 (mod d) and d6|y|¡2d. Since |w| = k (mod d), |y| = k − 1 (mod d), and hence |y| = d + k − 1. Let qi = &(I (x)) be the result of computation of M on x. Further factorize y as uv, where |u| = 2i − 1. Then xu ∈ Lright . On the other hand, x ∈ Lcontrol . Hence, xu ∈ LM and x ∈ LM , and therefore, xuv ∈ LM v ∩ LM uv ⊆ locate[k](LM ). This is illustrated in Fig. 6. ⇒  Let us prove the converse. If w ∈ locate[k](LM ), then w can be factorized as xuv, where x ∈ LM , xu ∈ LM , |u| = 2i − 1 and |uv| = d − 1 + k for some 16i6n. Since x is in (14), consider the two potential possibilities: • x ∈ Lcontrol (i.e., the control row has been correctly identi@ed as intended). Then |x| = 1 (mod d); therefore, |w| = |xuv| = 1 + d − 1 + k = k (mod d). Also, x

432

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

must be at least 1 symbol long, and hence |w| = |xuv|¿1 + d − 1 + k = d+ k¿d + 1. • x ∈ Lright or x ∈ Lleft (supposing that the some data bit could be mistaken for a control row). Then |x| = 2j (mod d) for some 16j62n, and thus x can be factorized as yz, where |y| = 1 (mod d) and |z| = 2j − 1. Then xu = y · zu is a concatenation of y (of length 1 modulo d) and a string zu, such that |zu| = 2(i + j) − 2, where 26i + j63n, and therefore |zu| is an even number between 2 and 6n − 2. This upper bound 6n − 2 does not permit yzu to reach the next control row and the rows above; the evenness of |zu| shows that yzu cannot be in Lleft or in Lright . Hence, yzu = xu ∈= LM , which yields a contradiction, proving that this case is in fact impossible. Note that the empty rows inserted between the data rows (see Fig. 3) were essentially used to rule out this possibility. Now, de@ne ’control (A) = locate[1](A) ∪

a∈

a;

(16)

which, by the argument above, implies ’control (LM ) = Lcontrol :

(17)

Obtaining a similar representation of Lright and Lleft requires a more complicated construction. First, it is needed to decode the information on the states contained in (14c) and (14d), which will be done by the following function:

is[qi ; j](A) =

        

u;z: |u|=j−1; |z|=d+1−j−(2i−1)

uAz



y;v: |y|=j−1−(2n+2i−1); |v|=d+1−j

if 1 6 j 6 4n; yAv if 4n + 1 6 j 6 d + 1;

(18)

which ensures that the state in the jth relative position in the control row below is qi (see Fig. 7). Formally, the claim is that for every string w (|w|¿d) of length 1 modulo d, w ∈ is[qi ; j](LM ) if and only if for the partition w = uxv (|u| = j−1; |v| = d+ 1 − j) it holds that &(I (x)) = qi . ⇐  Let w = uxv, where |u| = j − 1, |v| = d + 1 − j and &(I (x)) = qi . Note that, since |uxv| = 1 (mod d) and |uv| = d, x also has length 1 modulo d. Consider two cases: • If j64n, then |v| = 6n−j¿2n, and hence v can be factorized as yz, where |y| = 2i−1 and |z| = d+1−j−(2i−1). By (14d), xy ∈ Lright , and thus w = uxyz ∈ uLM z ⊆ is[qi ; j] (LM ). This case is illustrated in Fig. 7(a). • If j¿4n + 1, |u|¿4n, and u can be factorized as yz, where |z| = 2n + 2i − 1 and |y| = j − 1 − (2n + 2i − 1). By (14c), zx ∈ Lleft , and w = yzxv ∈ yLM v ⊆ is[qi ; j](LM ). See Fig. 7(b). Note that for low values of j the relevant bits from Lleft would be inaccessible, and similarly for j close to d + 1 the bits from Lright might be out of reach; this

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

433

control row

2n+2i-1 2i -1 control row 1 2 3

j

u

x

(a)

4n

d+1

y

1 2 3

z

y

v

x'

4n

(b)

z u

j

d+1

x

v

x'

Fig. 7. Decoding of the states: is[qi ; j](LM ), the cases (a) j64n and (b) j¿4n.

is the exact reason why the same state is redundantly encoded in two di4erent places. ⇒  Let w ∈ is[qi ; j](LM ) (|w|¿d; |w| = 1 (mod d)). Then, by (18): • If 16j64n, then w = ux z, where |u| = j − 1, x ∈ LM and |z| = d + 1 − j − (2i − 1). Factorize x as xy, where |y| = 2i − 1. Note that |uyz| = j − 1 + 2i − 1 + d + 1 − j − (2i − 1) = d and |x| = |w| − |uyz| = 1 − d = 1 (mod d). Since |u| = j − 1 and |yz| = 2i − 1 + d + 1 − j − (2i − 1) = d + 1 − j, the factorization w = u · x · yz satis@es the requirement above. Since |y| = 2i − 1, the string xy ∈ LM has to be in Lright , and therefore, by (14d), &(I (x)) = qi . • If 4n + 16j6d + 1, then w = yx v, where |y| = j − 1 − (2n + 2i − 1), x ∈ LM and |v| = d + 1 − j. Factorize x = zx, where |z| = 2n + 2i − 1. As in the previous case, |yzv| = j − 1 − (2n + 2i − 1) + (2n + 2i − 1) + d + 1 − j = d and |x| = 1 (mod d). The factorization w = yz · x · v is again as required, since |yz| = j − 1. According to (14c), zx ∈ Lleft , and therefore &(I (x)) = qi . This proves that (18) works as intended: for every w, such that |w|¿d and |w| = 1 (mod d), w ∈ is[qi ; j](LM ) if and only if M produces qi on certain substring of w that is d symbols shorter. Let us now devise a expression over LM that contains w (|w|¿d; |w| = 1 (mod d)) if and only if M produces qi on the same string w. Indeed, decoding d + 1 adjacent states using is[qi ; j] allows us to simulate d rows of a computation of trellis automaton and thus determine &(I (w)). De@ne computes[qi ](A) =



d+1

qt1 ;:::;qtd+1 ∈Q: j=1 &(qt1 :::qtd+1 )=qi

is[qtj ; j](A)

(19)

and now, for every w as above (i.e., |w| = 1 (mod d) and |w|¿d), w ∈ computes[qi ](A) if and only if &(I (w)) = qi . This process is explained in Fig. 8.

434

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

?

qi = ...

...

control row

d+1

... ...

... ... ...

encoded states

simulation of TA

is[q

(for a

tj , j](L ') M

ll j=1

ctrl row

..d+1

qt1

)

qt2

...

qtj

qtd+1

...

Fig. 8. Decoding of the states: computes[qi ](LM ).

In terms of locate[k] and computes[qi ] it is easy to de@ne the functions: w∪ ’left (A) = w∈Lleft ; |w|6d





locate[2n + 2i](A) ∩

qi ∈Q

’right (A) =

w∈Lright ; |w|6d



u: |u|=2n+2i−1

 u · computes[qi ](A) ;

(20a)

w∪



qi ∈Q



locate[2i](A) ∩

v: |v|=2i−1

 computes[qi ](A) · v

(20b)

and prove that ’left (LM ) = Lleft ;

(21a)

’right (LM )

(21b)

= Lright :

Let us give the proof for (21b). By (20b), w ∈ ’right (LM ) if and only if either |w|6d and w ∈ Lright , or there exists i (16i6n), such that w ∈ locate[2i](LM )

(22)

and a factorization w = xv (where |v| = 2i − 1) yields x ∈ computes[qi ](LM ):

(23)

As proved above, (22) is equivalent to |w|¿d and |w| = 2i (mod d). This means that |x| = |w| − |v| = 1 (mod d), and hence, by the properties of computes[qi ], (23) is equivalent to &(I (x)) = qi . Taking note that these conditions on v and x are as in (14d), it can be concluded that w ∈ ’right (LM ) if and only if either |w|6d and w ∈ Lright , or |w|¿d and w ∈ Lright —i.e., if and only if w ∈ Lright . This completes the proof of (21b); the case of ’left is handled in exactly the same way.

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

ψ

F

435

?

k k+1 control row

k+1

j

1 2

com

pute

(for

simulation of TA

s[q ] t (

LM ')

j=1.

j

.k+1

)

qt1

qt2

...

qtj

...

qtk+1

computes[qtj](L'M)

v

u

Fig. 9. Decoding L(M ) out of LM using

.

Now consider the language equation: X = ’control (X ) ∪ ’left (X ) ∪ ’right (X ):

(24)

By (17), (21a), (21b) and (14a), the language LM is a solution of this equation. On the other hand, this equation is strict (see De@nition 6), and therefore its solution LM is unique, which completes the proof of the lemma. 3.3. Decoding the original language Lemma 3. For every trellis automaton M = (; Q; I; $; F), there exists an expression (X ) over union, intersection and linear concatenation, such that (LM ) = L(M ), where LM is as in (14). Proof (Sketch of proof). Let locate[k](X ) and computes[q](X ) be as in the proof of Lemma 2. De@ne  =

w∈L(M ): |w|6d



w∪

d  locate[k](X )∩

k=1

 k



qt1 ;:::;qtk+1 ∈Q: j=1 u;v: |u|=j−1; |v|=k+1−j &(qt1 :::qtk+1 )∈F

 u · computes[qtj ](X ) · v :

(25)

It is left to demonstrate that, indeed, (LM ) = L(M ). For every string w of length ld+k (l¿1; 16k6d), it su5ces to decode the outcome of the computation of M on all substrings of w of length ld from the language LM (using the mapping computes), thus obtaining a vector of k + 1 states, and then ensure that the subcomputation of M on this vector leads to a @nal state. This method is illustrated in Fig. 9.

436

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

d Formula (25) separately considers all possible lengths of w modulo d (the union k=1 and the reference to locate[k](X )), and for each k it tries all possible vectors of k + 1 states that lead to a @nal state. For every such vector (qt1 ; : : : ; qtk+1 ), each of its states qtj is checked for being the outcome of the computation of M on the substring x of w, such that w = uxv, |u| = j − 1 and |v| = k + 1 − j. This can be done by a single reference to computes[qtj ](LM ). Now the main result of this paper can be stated: Theorem 7. For every trellis automaton M = (; Q; I; $; F), there exist and can be e:ectively constructed: • a system of language equations with two variables that has unique solution (L(M ); LM ). • a linear conjunctive grammar G with two nonterminals, such that L(G) = L(M ). Proof. Let X = ’(X ) be the language equation constructed in Lemma 2, which has unique solution LM . Let be the expression constructed in Lemma 3. Then the system S = (X ); X = ’(X )

(26)

is easily seen to have unique solution (L(M ); LM ), which proves the @rst part of the theorem. A two-nonterminal linear conjunctive grammar for L(M ) can be constructed out of system (26) by Theorem 4. Another related result is that every linear conjunctive language can be “almost” represented by a single language equation: Theorem 8. Let  be an alphabet, let # ∈= . Then for every trellis automaton M there exists and can be e:ectively constructed a single language equation that has unique solution LM ∪ #L(M ). Proof. Let X = ’(X ) be the language equation from Lemma 2 and let pression from Lemma 3. De@ne the expression    a∪ /(X ) = X ∩  a∈



u∈ : 16|u|6d+1

 uX 

be the ex-

(27)

and the equation X = ’(/(X )) ∪ # (/(X )):

(28)

Let us prove that Eq. (28) has unique solution LM ∪ #L(M ). The @rst claim is that /(LM ∪ #L(M )) = LM :

(29)

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

437

The left-hand side of (29) can be equivalently transformed as follows: /(LM ∪ #L(M ))





 = (LM ∪ #L(M )) ∩  a∪ a∈

  a∪ = (LM ∪ #L(M )) ∩  a∈



u∈ : 16|u|6d+1



u∈ : 16|u|6d+1

 u · (LM ∪ #L(M ))  uLM ∪

u∈∗ : 16|u|6d+1

 u#L(M )

= L 1 ∪ L2 ; where





 L1 = LM ∩  a∪ a∈



u∈∗ : 16|u|6d+1

 L2 = #L(M ) ∩  a∪ a∈

uLM ∪



u∈ : 16|u|6d+1

Language (30a) equals 

u∈∗ : 16|u|6d+1

uLM ∪

 u#L(M ) ;

(30a) 



u∈ : 16|u|6d+1

 u#L(M ) :

(30b)



 L1 = LM ∩  a∪ a∈



u∈ : 16|u|6d+1

 uLM  ∪ LM ∩





LM

 

u∈∗ : 16|u|6d+1

 ∅

u#L(M ) = LM ; 

where the @rst part follows from the containment of every string of length 1 modulo d in LM , while the second component is ∅, because no string in LM contains #. Turning to (30b):    a∪ L2 = #L(M ) ∩  a∈

u∈∗ : 16|u|6d+1

uLM ∪

u∈∗ : 16|u|6d+1

 u#L(M ) = ∅;

here the two languages are disjoint, because all strings in #L(M ) start from #, while all strings in the other language start from some symbol in . Putting together the equalities obtained: /(LM ∪ #L(M )) = L1 ∪ L2 = LM ∪ ∅ = LM ;

(31)

which proves claim (29). Now let us substitute LM ∪ #L(M ) for X in (28): ’(/(LM ∪ #L(M ))) ∪ # (/(LM ∪ #L (M ))) equals ’(LM ) ∪ # (LM ) by Eq. (29), which in turn equals LM ∪ # (LM ) by

438

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Lemma 2 and LM ∪ #L(M ) by Lemma 3, thus proving that LM ∪ #L(M ) is a solution of (28). This solution is unique, because the equation is strict.

4. One-nonterminal grammars Having proved that every linear conjunctive language can be generated using two nonterminal symbols, it is natural to investigate whether this estimation is precise, and could not a single nonterminal be su5cient. It turns out that it could not, and, as it will be shortly demonstrated, there exist linear conjunctive languages that require two nonterminals. This section is devoted to the study of one-nonterminal linear conjunctive grammars and of the language family they generate. Due to the correspondence between linear conjunctive grammars and systems of language equations, these are the languages denoted by unique (or least) solutions of individual language equations with union, intersection and linear concatenation, resolved with respect to their single variable; this characterization makes the study of their properties worthwhile. A similar object, linear context-free grammars with one nonterminal—minimal linear context-free grammars—have been studied in the early days of formal language theory [3,8,10] (note that Chomsky and SchRutzenberger [3] and Haines [10] use a more restrictive de@nition than just the singularity of a nonterminal; the present paper follows the de@nition of Greibach [8], also assumed by Harrison [11]), and some key results on linear context-free grammars were demonstrated to hold for this class: this is the existence of languages with non-context-free complement [10] and the undecidability of the ambiguity problem [8]. The conjunctive counterpart of this class, linear conjunctive languages of nonterminal complexity 1, as it was already demonstrated in Theorem 8, similarly share some essential qualities of linear conjunctive languages of the general form. Their basic formal properties are studied in this section. 4.1. The shrinking lemma Let us @rst establish the following shrinking lemma for one-nonterminal linear conjunctive grammars, using which one can prove that some linear conjunctive languages actually require two nonterminals. Lemma 4 (Shrinking lemma). Let G = (; {S}; P; S) be a linear conjunctive grammar comprised of a single nonterminal. Then there exists a constant k¿0, such that for every string w ∈ L(G), where |w|¿k, there exists a factorization w = xuy (0¡|x| + |y|6k), such that u ∈ L(G). Proof. Let G = (; {S}; P; S) be an arbitrary linear conjunctive grammar that generates an in@nite language. Assume, without loss of generality, that G does not contain unit conjuncts of the form S → S (all rules containing this conjunct are of no use and can

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

439

be removed from P [16]). Let k=

max

S→ ∈conjuncts(P)

| |

(32)

be the maximum length of a right-hand side of a conjunct. Let w ∈ L(G) be any string, such that |w|¿k. Since S =⇒∗ w, there exists a rule S → 1 & · · · & m , such that i =⇒∗ w for all i. Consider the @rst conjunct of the rule: 1 cannot be in ∗ , since that would imply

1 = w and | 1 |¿k, which is a contradiction. So, 1 = xSy for some x; y ∈ ∗ (0¡|x| + |y|6k). Since 1 =⇒∗ w, this implies that w = xuy for some u ∈ ∗ , such that S =⇒∗ u. This factorization satis@es the statement of the lemma. The shrinking lemma is a necessary but not a su5cient condition of being a onenonterminal language. An example of language that satis@es shrinking lemma but is not one-nonterminal will be constructed in Theorem 9 (the case of union). However, in many cases the shrinking lemma does work: Proposition 1. The regular language ba∗ b cannot be generated by a one-nonterminal linear conjunctive grammar. Proof. Suppose L = ba∗ b is generated by such a grammar. Then, by Lemma 4, there exists a constant k¿0, such that for the string bak b ∈ L there exists a factorization bak b = xuy (|x| + |y|¿0), where u ∈ L. Therefore, a string of the form bai , ai b or ai must be in L, which is a contradiction. Could there be a pumping lemma for one-nonterminal linear conjunctive grammars? The answer is negative. It is known that linear conjunctive languages of the general form do not have a pumping lemma due to the undecidability of the emptiness problem. If one supposes that there is a pumping lemma for the one-nonterminal case, then, given an arbitrary linear conjunctive grammar G, one can construct an equivalent trellis automaton M and consider the language LM ∪ #L(M ) that is a one-nonterminal language by Theorem 8. Since the supposed one-nonterminal pumping lemma should apply to every su5ciently long string #w ∈ #L(M ) from this language, producing #w ∈ #L(M ), its statement would at the same time be applicable to the corresponding string w ∈ L(M ), similarly producing w ∈ L(M ). This gives a pumping lemma for linear conjunctive grammars of the general form, which is a contradiction. Let us continue with some examples of languages that can be denoted with onenonterminal grammars. Proposition 2. There exists a linear conjunctive grammar with one nonterminal for the regular language a∗ b ∪ ba∗ . Proof. The grammar is S → b|ab|ba|aaS&aS|Saa&Sa. While the @rst three rules denote a base set of strings, the last two specify that if a string begins with an a (ends with an a, resp.), then one more a can be appended to its beginning (to its end, resp.).

440

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Formally, it can be proved that this grammar generates a∗ b ∪ ba∗ by checking that this language is a solution of the corresponding language equation. Note that, although the language a∗ b ∪ ba∗ is obviously linear context-free, it is easy to prove that no one-nonterminal linear context-free grammar generates it. Proposition 3. There exists a language generated by a one-nonterminal linear conjunctive grammar, which cannot be represented as a 8nite intersection of context-free languages. Proof. Consider the linear conjunctive language L = {wcw | w ∈ {a; b}∗ }, and let M = (; Q; I; $; F) be an arbitrary trellis automaton for this language. Let LM be the (onenonterminal) language constructed using the method of Lemma 2, and let us prove that it is not in the intersection closure of the context-free languages. Suppose it can be represented as an intersection of @nitely many context-free languages. Then, by (25) in Lemma 3, the language L can also be represented as such a @nite intersection, which is known to be untrue [26]. A wealth of other one-nonterminal languages is provided by Theorem 8. Its result will be used in the later Section 4.4 to @nd the hardest language in the family.

4.2. Closure properties Theorem 9. The family of one-nonterminal linear conjunctive languages is not closed under union, intersection, complement, concatenation and star. Proof. The case of concatenation is straightforward: it su5ces to represent the language ba∗ b, which is not a one-nonterminal language by Proposition 1, as b · a∗ b. In order to prove nonclosure under intersection, consider the languages L1 = (a ∪ b)∗ b and L2 = b(a ∪ b)∗ . Both can clearly be generated using one nonterminal, but their intersection L1 ∩ L2 = b(a ∪ b)∗ b ∪ b cannot, which is proved in the same way as in Proposition 1. Turning to the case of complement, let L = a∗ b ∪ ba∗ ∪ a∗ . L is generated by a grammar S → |a|b|ab|ba|aaS&aS|Saa&Sa, based on the same idea as the grammar from Proposition 2. Let us prove that its complement, L = a∗ ba∗ b(a ∪ b)∗ ∪ a+ ba+ , cannot be generated with one nonterminal. Supposing that it can, let k be the constant given by the shrinking lemma, and consider the string w = bak b ∈ L. There should exist a factorization w = xuy, such that 0¡|x| + |y|6k and the string u is in L; however, every such factorization results in u ∈ ba∗ , u ∈ a∗ b or u ∈ a∗ , which implies u ∈= L and yields a contradiction. Proving that one-nonterminal languages are not closed under union is somewhat harder, because the shrinking lemma will not work: it is easy to see that for every L1 ; L2 that satisfy the shrinking lemma, the language L1 ∪ L2 also does. Consider the

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

441

languages: L1 = {am ban | m ¡ n};

(33a)

L2 = {am ban | m ¿ n}:

(33b)

L1 is generated by the grammar S → aS|aSa|b, and L2 by a similar one. Let us give a direct proof that their union, L = L1 ∪ L2 = {am ban | m = n} is not a one-nonterminal language. Suppose L is generated by a linear conjunctive grammar G = ({a; b}; {S}; P; S); as in Lemma 4, assume without loss of generality that S → S ∈= conjuncts(P). As in (32), let k be the maximum length of conjuncts’ right-hand sides. Consider the derivation of the string bak ∈ L: S =⇒ ( 1 & · · · & l ) =⇒ · · · =⇒ bak

(34)

It is easy to see that every i must be of the form Saji , where ji ¿0. Indeed, the case

i = bak is impossible by the de@nition of k, because |bak |¿k; i cannot be in ba∗ Sa∗ , because then ba∗ Sa∗ =⇒∗ bak would mean that some string from a∗ is derivable from S, which cannot be true; i ∈ Sa+ is the only remaining opportunity to derive bak . Therefore, derivation (34) is of the form S =⇒ (Saj1 & · · · &Sajl ) =⇒ · · · =⇒ bak

(35)

and the grammar contains the rule S → Saj1 & · · · &Sajl (ji ¿0). Using this rule and taking note that for every i the string ak bak−ji is in L, construct the following derivation: S =⇒ (Saj1 & · · · &Sajl ) =⇒ · · · =⇒ (ak bak−j1 aj1 & · · · &ak bak−jl ajl ) =⇒ ak bak : k

(36)

k

Since a ba ∈= L, this contradicts the assumption that L(G) = L, proving that L = L1 ∪ L2 is not a one-nonterminal linear conjunctive language, and thus this family is not closed under union. It remains to prove nonclosure under star. Let L = {an bn | n¿0}, an obvious onenonterminal language. Suppose that L∗ is a one-nonterminal language as well, and let k be the constant given by the shrinking lemma. Consider w = ak bk ak bk ∈ L∗ . Since every factorization w = xuy (0¡|x| + |y|6k) gives u = ak−|x| bk ak bk−|y| ∈= L∗ (because k − |x| = k or k = k − |y|), a contradiction is obtained. 4.3. Decision problems Theorem 10. The emptiness and universality problems for one-nonterminal linear conjunctive grammars are decidable. The equivalence problem is undecidable. The decidability results are proved by establishing necessary and su5cient conditions for emptiness and universality. These conditions are given in Lemmata 5 and 6, which will be followed by the proof of Theorem 10.

442

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Lemma 5. Let G = (; {S}; P; S) be a one-nonterminal linear conjunctive grammar over . Then L(G) = ∅ if and only if P contains a rule of the form S → w (w ∈ ∗ ). Proof. If there is such a rule, then w ∈ L(G) and thus L(G) = ∅. Conversely, for any conjunctive derivation to be successful, it has to contain at least one application of a rule of the form A → w, and thus if there is no rule of this form, then L(G) = ∅. Lemma 6. Let G = (; {S}; P; S) be a linear conjunctive grammar, such that S → S ∈= conjuncts(P), and let d be the maximum length of right-hand side of a conjunct. Then L(G) = ∗ if and only if • Every string of length less or equal to 2d is in L(G), and • For every string w of length 2d there exists a rule S → u1 Sv1 & · · · &ul Svl ∈ P;

(37)

such that every ui is a pre8x of w and every vi is a su3x of w. Proof. ⇒  If L(G) = ∗ , then the @rst condition is obviously satis@ed. Supposing that the second is not—i.e., there exists a string w, such that no rule in P has all conjuncts in the required form—one can easily see that w cannot be derived at all and therefore L(G) = ∗ , which is untrue. ⇐  Let both conditions hold and let us show that every w ∈ ∗ is in L(G). The proof is an induction on the length of w. Basis: |w|62d. Every such w is in L(G) by the @rst condition. Induction step: |w|¿2d. Factorize the string as w = w zw , where |w | = |w | = d. Now for the string w w there should exist a rule of form (37), such that every ui is a pre@x of w w and every vi is a su5x of w w . Since |ui |; |vi |6d and |w | = |w | = d, it follows that for every i, w = ui xi and w = yi vi for some xi ; yi ∈ ∗ . Since every string xi zyi is shorter than w = ui xi zyi vi , it is in L(G) by the induction hypothesis. This allows to construct the derivation S =⇒ (u1 Sv1 & · · · &ul Svl ) =⇒ (u1 x1 zy1 v1 & · · · &ul xl zyl vl ) =⇒ w zw ;

(38)

which proves that w ∈ L(G). Proof (Proof of Theorem 10). The emptiness problem, stated as “given a linear conjunctive grammar G = (; {S}; P; S), determine whether L(G) = ∅”, can be algorithmically solved by checking the condition given in Lemma 5. Similarly, the universality problem (“given G, determine whether L(G) = ∗ ”) can be decided according to Lemma 6. Let us show that there is no algorithm to solve the equivalence problem for onenonterminal linear conjunctive grammars (given G1 and G2 , determine whether L(G1 ) = L(G2 )). Suppose it is decidable and construct an algorithm for checking the emptiness of the language generated by a linear conjunctive grammar of the general form, which is easily seen to be undecidable by reduction from Turing machine halting problem [18].

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

443

Let G be an arbitrary linear conjunctive grammar over . Construct a trellis automaton M = (; Q; I; $; F) that accepts L(G) and consider the equation X = ’(X ) from Lemma 2, which has unique solution LM as in (14), and the equation X = ’(/(X )) ∪ # (/(X )) from Theorem 8, which has unique solution LM ∪ #L(M ). Taking note that LM = LM ∪ #L(M ) if and only if L(M ) = ∅, each of the two equations can be converted to a one-nonterminal grammar using the method of Theorem 4, and then by checking the equivalence of these grammars one can decide the emptiness of L(G), which contradicts its known undecidability. 4.4. The hardest language and the membership problem In [13] it was shown that some P-complete languages can be accepted by a certain restricted type of Turing machines and thus by trellis automata; this was rediscovered in [20] using trellis automata and linear conjunctive grammars. Using the results of this paper one can prove a stronger claim, stating that a one-nonterminal grammar (or a single resolved language equation) is actually enough to denote a P-complete language. Theorem 11. The family of one-nonterminal linear conjunctive languages contains a P-complete language. Proof. Let L0 be some P-complete linear conjunctive language, let M be a trellis automaton for L0 . Using the method of Theorem 8, construct a one-variable language equation that has unique solution L = LM ∪ #L(M ) Clearly, L0 is reducible to L under any sensible de@nition of reducibility (it su5ces to add # to the beginning of a string), which proves the P-hardness of L. On the other hand, L is in P as a linear conjunctive language. This completes the proof of the statement that L is a P-complete one-nonterminal linear conjunctive language. The complexity of the general membership problem for this family of grammars now follows as a simple corollary of Theorem 11. Theorem 12. The membership problem for one-nonterminal linear conjunctive grammars, stated as “Given a grammar G = (; {S}; P; S) and a string w ∈ ∗ , determine whether w ∈ L(G)”, is P-complete. Proof. It is solvable in polynomial time, because so is the membership problem for conjunctive grammars of the general form [19]. Its P-hardness can be proved by reducing the @xed membership problem in some one-nonterminal P-complete language, which exists by Theorem 11, to this more general problem. The properties of one-nonterminal linear conjunctive grammars are summarized and compared to those of linear conjunctive grammars of the general form in Table 1.

444

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Table 1 Properties of one-nonterminal linear conjunctive grammars

Closure properties ∪ ∩ ∼ · ∗

Decision problems Emptiness Universality Equivalence The hardest language

One-nonterminal

Of the general form

− − − − −

+ + + − [25] − [23]

+ +

− − −



P-complete

P-complete

5. Nonterminal complexity of grammars in the linear normal form While a @nite number of nonterminals turns out to be su5cient in linear conjunctive grammars of the general form, this is not the case for the grammars in the linear normal form, since every n-nonterminal grammar in this normal form can be converted to a trellis automaton of 2n+C states, where C depends upon the alphabet [23], and there exist @nitely many automata of a @xed size. Let us investigate succinctness tradeo4s between these two representations. 5.1. Simulating trellis automata with grammars The original proof of simulation of trellis automata by linear conjunctive grammars [23] represented every state of an n-state automaton with a nonterminal symbol and added one extra start symbol, thus resulting in n + 1 nonterminals (see Theorem 6). In this section a new method, much more e5cient in terms of the number of nonterminals, is proposed: an n-state trellis automaton is converted to a grammar with O(log2 n) nonterminals. Given an automaton M = (; Q; I; $; F), where |Q| = n, let k = log2 n and rename the states (without loss of generality), so that Q ⊆ {0; 1}k . Now construct the grammar (k) (1) (k) with the nonterminals {S; A(1) 0 ; : : : ; A0 ; A1 ; : : : ; A1 } and with the following rules: Ai(t) → a (if I (a) = (i1 ; : : : ; ik ); for all t); t

(39a)

(k) (1) (k) Al(t) → bA(1) j1 & · · · &bAjk &Ai1 c& · · · &Aik c t

(if $((i1 ; : : : ; ik ); (j1 ; : : : ; jk )) = (l1 ; : : : ; lk ); for all b; c ∈  and t);

(k) S → A(1) i1 & · · · &Aik

(for all (i1 ; : : : ; ik ) ∈ F):

(39b)

(39c)

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

445

Let us prove the correctness of construction. Lemma 7. For every string w ∈ + , and for every t (16t6k) and x ∈ {0; 1}, if G Ax(t) =⇒ ∗ w then the tth component of &(I (w)) is x. Proof. Induction on the length of w. Basis: w = a ∈ . If Ax(t) =⇒∗ a, then there should be a rule Ax(t) → a in the grammar. By construction (39a), this implies that the tth component of I (a) is x. Induction step: Let |w|¿2 and let Ax(t) =⇒∗ w. Then there exists a derivation of the form (k) (1) (k) Ax(t) =⇒ (bA(1) j1 & · · · &bAjk &Ai1 c& · · · &Aik c) =⇒ · · · =⇒ w:

(40)

This implies that w = buc for some u ∈ ∗ , Ai(t) =⇒∗ bu (for all t) and Aj(t) =⇒∗ uc (for t t all t). Invoking the induction hypothesis 2k times, one obtains that the @rst component of &(I (bu)) is i1 , the second component of &(I (bu)) is i2 and so on, and similarly every tth component of &(I (uc)) is jt . Putting these facts together, &(I (bu)) = (i1 ; : : : ; ik ) and &(I (uc)) = (j1 ; : : : ; jk ). Let (l1 ; : : : ; lk ) = $((i1 ; : : : ; ik ); (j1 ; : : : ; jk )) = &(I (w)). By (39b) in the construction (k) (1) (k) of the grammar, the existence of the rule Ax(t) = bA(1) j1 & · · · &bAjk &Ai1 c& · · · &Aik implies that x = lt , which is exactly the tth component of &(I (w)). G

Lemma 8. For every string w ∈ + , if &(I (w)) = (i1 ; : : : ; ik ), then Ai(t) =⇒ ∗ w for all t. t Proof. Induction on |w|. Basis: w = a ∈ . By the construction of grammar (39a), for every t there is a rule Ai(t) → a, and hence Ai(t) =⇒∗ a. t t Induction step: Let &(I (w)) = (l1 ; : : : ; lk ). Let w = buc, where b; c ∈ , u ∈ ∗ . Let &(I (bu)) = (i1 ; : : : ; ik ) and &(I (uc)) = (j1 ; : : : ; jk ). By the induction hypothesis: =⇒∗ bu Ai(t) t

(for all t)

(41a)

Aj(t) =⇒∗ uc t

(for all t):

(41b)

and

Since $((i1 ; : : : ; ik ); (j1 ; : : : ; jk )) = $(&(I (bu)); &(I (uc))) = &(I (w)) = (l1 ; : : : ; lk ), by the construction of the grammar there should be a rule (39b) for every t, which, together with all 2k rules (41), is enough to construct a derivation of w from Al(t) . t Theorem 13. For every n-state trellis automaton there exists an can be e:ectively constructed an equivalent linear conjunctive grammar in the linear normal form with 2log2 n + 1 nonterminals.

446

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

Proof. Construct the grammar as (39), and let us prove that for every string w ∈ + , S =⇒∗ w if and only if &(I (w)) ∈ F. ⇒  Let S =⇒∗ w. Consider the derivation (k) S =⇒ (A(1) i1 & · · · &Aik ) =⇒ · · · =⇒ w;

(42)

G

which implies that Ai(t) =⇒ ∗ w for all t. Using Lemma 7, it can be obtained that for t every t the tth component of &(I (w)) is it . The conjunction of these k statements (k) implies &(I (w)) = (i1 ; : : : ; ik ). Now, since S → A(1) i1 & · · · &Aik ∈ P, from construction (39c) it follows that (i1 ; : : : ; ik ) ∈ F. ⇐  Let &(I (w)) = (i1 ; : : : ; ik ) ∈ F. Then, by Lemma 8, Ai(t) =⇒∗ w for every t. By the t construction of the grammar, there is a rule (39c). This allows to construct the derivation (k) S =⇒ (A(1) i1 & · · · &Aik ) =⇒ · · · =⇒ (w& · · · &w) =⇒ w;

(43)

proving that w ∈ L(G). 5.2. Simulating grammars with trellis automata How many states does a minimal trellis automaton equivalent to an n-nonterminal linear conjunctive grammar in the linear normal form have? The grammar-to-automaton construction of [23] provides ||2 · 2n = 2n+C upper bound. Let us infer 2 n=2 −1 lower bound from the automaton-to-grammar construction given in Theorem 13. Suppose the contrary, i.e., that there exists n¿0, such that for every grammar in the linear normal form with n nonterminals there exists an equivalent automaton with 2 n=2 −1 − 1 states. Choose m, such that n = 2log2 m + 1. Then for every m-state automaton the construction from Theorem 13 can be applied to obtain a grammar comprised of n nonterminals, and then the supposed construction can be used to get another (2 (2 log2 m +1)=2 −1 − 1)-state automaton that accepts the same language as the original one. However, 1

2 (2 log2 m +1)=2 −1 − 1=2 log2 m +2 −1 − 1 = 2 log2 m −1 − 1 ¡ 2log2 m − 1 = m − 1 and thus it is shown that an arbitrary m-state automaton can be reduced, which cannot be true in light of the results of [22]. Thus it has been proved that the number of states in the minimal trellis automata equivalent to an n-nonterminal linear conjunctive grammar in the linear normal form lies between 2 n=2 −1 and 2n+C , i.e., is 2O(n) . 6. Conclusion It was demonstrated that two nonterminals in a linear conjunctive grammar are su5cient to denote any language from this quite noteworthy family, and a certain variation of any such language can even be generated with a single nonterminal. Besides characterizing the nonterminal complexity of this family of grammars, these results will surely

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

447

have impact on the theory of language equations, showing how much can be expressed in a single resolved equation that uses quite a modest set of language-theoretic operations. A classi@cation of linear conjunctive languages into two classes of those that require one nonterminal and those that require two of them was developed. One-nonterminal linear conjunctive languages, akin to the known minimal linear context-free languages, were found to cover some paradigmatic examples of languages from this family. Additionally, they turned out to have di4erent decidability and closure properties than linear conjunctive grammars with two nonterminals. For the class of linear conjunctive grammars in the linear normal form, an exponential tradeo4 between the number of nonterminals and the number of states in trellis automata was proved. Naturally, the given technique of converting a linear conjunctive grammar to an equivalent one with two nonterminals causes a signi@cant blowup in the total length of description. In future research it might be interesting to investigate the tradeo4 between these two measures of succinctness. A more interesting question is whether a bounded number of nonterminals is enough for conjunctive grammars of the general form, or is there an in@nite hierarchy of nnonterminal conjunctive languages, similar to the context-free one? If a @xed number of nonterminals is enough, then the construction would probably be quite di4erent from the one presented in this paper, as the present method does not seem to be extendable to general conjunctive grammars. On the other hand, proving the more likely hierarchy result appears to be a challenging task, since it would most probably require to prove some languages not to be conjunctive, for which no technique is known yet. Acknowledgements I am grateful to an anonymous referee for noting a subtle technical error in the initial version of the construction in Section 3, and to Kai Salomaa for his comments on di4erent revisions of the paper. References [1] J. Autebert, J. Berstel, L. Boasson, Context-free languages and pushdown automata, Rozenberg, Salomaa (Eds.), Handbook of Formal Languages, Vol. 1, Springer, Berlin, 1997, pp. 111–174. [2] C. Cho4rut, K. Culik II, On real-time cellular automata and trellis automata, Acta Inform. 21 (1984) 393–407. [3] N. Chomsky, M.P. SchRutzenberger, The algebraic theory of context-free languages, in: Bra4ort, Hirschberg (Eds.), Computer Programming and Formal Systems, 1963, pp. 118–161. [4] K. Culik II, J. Gruska, A. Salomaa, “Systolic trellis automata”, I, International J. Comput. Math. 15 (1984) 195–212; K. Culik II, J. Gruska, A. Salomaa, “Systolic trellis automata”, II, International J. Comput. Math. 16 (1984) 3–22. [5] K. Culik II, J. Gruska, A. Salomaa, Systolic trellis automata: stability, decidability and complexity, Inform. and Control 71 (1986) 218–230. [6] C. Dyer, One-way bounded cellular automata, Inform. and Control 44 (1980) 261–281.

448

A. Okhotin / Theoretical Computer Science 320 (2004) 419 – 448

[7] H. Fernau, Nonterminal complexity of programmed grammars, Theoret. Comput. Sci. 296 (2) (2003) 225–251. [8] S.A. Greibach, The undecidability of the ambiguity problem for minimal linear grammars, Inform. and Control 6 (2) (1963) 119–125. [9] J. Gruska, Descriptional complexity of context-free languages, Proc. MFCS, 1973, pp. 71–83. [10] L.A. Haines, Note on the complement of a (minimal) linear language, Inform. and Control 7 (1964) 307–314. [11] M.A. Harrison, Introduction to Formal Language Theory, Addison-Wesley, Reading, MA, 1978. [12] J. Hartmanis, On the succinctness of di4erent representations of languages, SIAM J. Comput. 9 (1980) 114–120. [13] O.H. Ibarra, S.M. Kim, Characterizations and computational complexity of systolic trellis automata, Theoret. Comput. Sci. 29 (1984) 123–153. [14] O.H. Ibarra, S.M. Kim, S. Moran, Sequential machine characterizations of trellis and cellular automata and applications, SIAM J. Comput. 14 (2) (1985) 426–447. [15] A. Malcher, Descriptional complexity of cellular automata and decidability questions, J. Automat. Languages Combin. 7 (4) (2002) 549–560. [16] A. Okhotin, Conjunctive grammars, J. Automat. Languages Combin. 6 (4) (2001) 519–535. [17] A. Okhotin, Conjunctive grammars and systems of language equations, Programming Comput. Software 28 (2002) 243–249. [18] A. Okhotin, On the closure properties of linear conjunctive languages, Theoret. Comput. Sci. 299 (2003) 663–685. [19] A. Okhotin, A recognition and parsing algorithm for arbitrary conjunctive grammars, Theoret. Comput. Sci. 302 (2003) 365–399. [20] A. Okhotin, The hardest linear conjunctive language, Inform. Process. Lett. 86 (5) (2003) 247–253. [21] A. Okhotin, Boolean grammars, Developments in Language Theory, Proceedings of DLT 2003, Szeged, Hungary, July 7–11, 2003, Lecture Notes in Computer Science, Vol. 2710, Springer, Berlin, pp. 398– 410. [22] A. Okhotin, State complexity of linear conjunctive languages, J. Automat. Languages Combin. 9 (2004) to appear. [23] A. Okhotin, On the equivalence of linear conjunctive grammars to trellis automata, Inform. ThVeorique Appl. 38 (2004) 69–88. [24] A.R. Smith III, Real-time language recognition by one-dimensional cellular automata, J. Comput. System Sci. 6 (1972) 233–252. [25] V. Terrier, On real-time one-way cellular array, Theoret. Comput. Sci. 141 (1995) 331–335. [26] D. Wotschke, The Boolean closures of deterministic and nondeterministic context-free languages, in: W. Brauer (Ed.), Gesellschaft fRur Informatik e., Vol. 3, Jahrestagung 1973, Lecture Notes in Computer Science, Vol. 1, Springer, Berlin, 1973, pp. 113–121.