A canonical automaton for one-rule length-preserving string rewrite systems

Information and Computation 244 (2015) 203–228. Contents lists available at ScienceDirect. www.elsevier.com/locate/yinco



Michel Latteux, Yves Roos ∗

Univ. Lille, CRIStAL, UMR 9189, 59650 Villeneuve d’Ascq, France

Article info

Article history: Received 21 February 2014. Received in revised form 3 October 2014. Available online 17 July 2015.

Keywords: String rewrite system; Rational transduction

Abstract

In this work, we use rearrangements of rewriting positions sequences in order to study precisely the structure of the derivations in one-rule length-preserving string rewrite systems. This leads to the definition of a letter-to-letter transducer that computes the relation induced by a one-rule length-preserving string rewrite system. This transducer can be seen as an automaton over the alphabet $A \times A$. We prove that this automaton is finite if and only if the corresponding relation is rational. We also identify a sufficient condition for the context-freeness of the language $L$ recognized by this automaton and, when this condition is satisfied, we construct a pushdown automaton that recognizes $L$.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Rewrite systems are of primordial interest for computational problems. The problems that are mainly investigated for rewrite systems are the accessibility problem, the common descendant problem, the confluence problem, and the termination and uniform termination problems. Some intriguing decidability problems remain open even for very simple rewrite systems.¹ One-rule rewrite systems are among the simplest rewrite systems. Nevertheless, they have been intensively studied for several years and several deep results have been obtained [11,12,20,15,22,10,16,7,8,18]. It is particularly noteworthy that the decidability of the termination of one-rule rewrite systems has remained an open question for more than twenty years.

One-rule rewrite systems are simply defined by two words $u$, $v$ over an alphabet $A$ and denoted $S = \{u \to v\}$. For a word $w$, $S(w)$ is the set of words obtainable from $w$ by repeatedly replacing $u$ by $v$. Thus $S$ induces a relation over $A^* \times A^*$, and we address here the problem of defining a letter-to-letter transducer that computes this relation in the case when the system $S = \{u \to v\}$ is length-preserving, that is, when $|u| = |v|$. It is known from [6] (cf. also [5]) that a length-preserving rational relation over $A^* \times A^*$ is a rational subset of $(A \times A)^*$, so a length-preserving rational relation can be computed by a letter-to-letter finite transducer. Unfortunately, the relation induced by a one-rule length-preserving rewrite system need not be rational: for instance, the rewrite system $S_1 = \{ba \to ab\}$ clearly induces a relation that does not preserve regularity. As a consequence, the letter-to-letter transducer associated with a one-rule length-preserving rewrite system that we define in this work need not be finite, and we do not provide an effective algorithm for computing it either.
Nevertheless, by proving here the if part, we get that it is finite if and only if the relation induced by its corresponding one-rule length-preserving rewrite system is rational.

∗ Corresponding author. E-mail addresses: [email protected] (M. Latteux), [email protected] (Y. Roos).

¹ See [4] for a history of these problems and the attempts to solve them.

http://dx.doi.org/10.1016/j.ic.2015.07.002 0890-5401/© 2015 Elsevier Inc. All rights reserved.


Clearly, a letter-to-letter finite transducer over $A^* \times A^*$ can be seen as an automaton whose input alphabet is $A \times A$. The states of this automaton have to store all the potential and future rewritings that can be applied in a given position in a whole derivation for a given one-rule length-preserving rewrite system $S = \{u \to v\}$. For this, we introduce the notion of rewriting positions sequence (rps for short), that is, the sequence of all the positions where the rule $u \to v$ is applied during the derivation, and we also define the crucial notion of left rps. The left derivation (or left rps) from a word $w$ to a word $w'$ satisfies the property that each step in the derivation is applied at the leftmost position (in the current word) such that a derivation to $w'$ is still possible. This notion of left rps gives a canonical representative for the set of all derivations from a given word $w$ to a given word $w'$ and plays a central role in the definition of the states of the canonical automaton associated with a one-rule length-preserving rewrite system.

The paper is organized as follows: first, in Section 3, we define the notion of rewriting positions sequence and explore some of its properties. We also define left derivations and left rps, and the contribution of a rps to a given position $i$. For every one-rule length-preserving rewrite system $S = \{u \to v\}$, we define the semi-commutation relation $\theta_S$ and prove that its corresponding semi-commutation rewrite system can be used to compute from every rps its equivalent left rps. In Section 4, we give the definition of the canonical automaton associated with a one-rule length-preserving rewrite system $S = \{u \to v\}$ and we prove that this automaton is finite if and only if the relation induced by the rewrite system $S$ is rational.
In Section 5, we give a sufficient condition for the context-freeness of the language $L$ recognized by the automaton defined in Section 4 and, when this condition is satisfied, we construct a pushdown automaton that recognizes $L$. Finally, in the conclusion, we identify some problems that deserve to be studied in the context of the rationality of one-rule length-preserving rewrite systems.

2. Preliminaries and notations

Let $\Sigma$ be a finite or infinite alphabet; $\Sigma^*$ will denote the free monoid over $\Sigma$ and $\varepsilon$ the empty word in $\Sigma^*$. For a word $w \in \Sigma^*$, $|w|$ denotes the length of the word $w$ and, for any letter $a \in \Sigma$, $|w|_a$ denotes the number of occurrences of the letter $a$ in $w$. A word $w'$ is a factor of a word $w$ if there exist two words $w_1$ and $w_2$ such that $w = w_1 w' w_2$, and we denote by $\mathrm{F}(w)$ the set of the factors of the word $w$. We denote by $\mathrm{SF}(w)$ (respectively $\mathrm{PF}(w)$) the set of suffixes (respectively prefixes) of the word $w$, that is:

\[ \mathrm{SF}(w) = \{ w' \in \Sigma^* \mid \exists w'' \in \Sigma^*,\ w = w'' w' \}, \qquad \mathrm{PF}(w) = \{ w' \in \Sigma^* \mid \exists w'' \in \Sigma^*,\ w = w' w'' \}. \]

A rewrite system over an alphabet $\Sigma$ is a subset $S \subseteq \Sigma^* \times \Sigma^*$. Members of $S$ are denoted $u \to v$. We shall denote by $S^{-1}$ the system obtained from the system $S$ by reversing the rules of $S$, that is, $u \to v \in S$ iff $v \to u \in S^{-1}$. One-step derivation, denoted $\to$, is the binary relation over words defined by: for all $w, w' \in \Sigma^*$, $w \to w'$ iff there exist $u \to v \in S$ and $\alpha, \beta \in \Sigma^*$ such that $w = \alpha u \beta$ and $w' = \alpha v \beta$. The relation $\xrightarrow{*}$, called derivation relation, is the reflexive and transitive closure of the relation $\to$. Abusing notation, we shall identify in the following a given rewrite system $S$ with its associated transformation over languages: for every word $w \in \Sigma^*$, we shall denote by $S(w)$ the set $S(w) = \{ w' \in \Sigma^* \mid w \xrightarrow{*} w' \}$ and for every language $L \subseteq \Sigma^*$, $S(L) = \bigcup_{w \in L} S(w)$. For a derivation $w = w_0 \to w_1 \to \dots \to w_n = w'$, $n$ is called the length of the derivation.
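The definitions of one-step derivation and of $S(w)$ can be made concrete with a small sketch. It is only an illustration under the assumption that words are Python strings and that the system has a single length-preserving rule; the function names `one_step` and `descendants` are hypothetical and do not come from the paper.

```python
from collections import deque


def one_step(w, u, v):
    """Yield (position, word) for every one-step derivation from w with rule u -> v."""
    start = w.find(u)
    while start != -1:
        yield start, w[:start] + v + w[start + len(u):]
        # Search again from the next index so overlapping occurrences are found.
        start = w.find(u, start + 1)


def descendants(w, u, v):
    """Return S(w) for the one-rule system {u -> v}.

    Assumes |u| == |v|: every reachable word then has length |w|, so the
    breadth-first search below visits finitely many words and terminates.
    """
    if len(u) != len(v):
        raise ValueError("this sketch only handles length-preserving rules")
    seen = {w}
    queue = deque([w])
    while queue:
        current = queue.popleft()
        for _, nxt in one_step(current, u, v):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

For instance, with the rule $ba \to ab$ of the introduction, `descendants("baa", "ba", "ab")` collects the three words reachable from `baa`.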

A rewrite system $S$ is called length preserving if for every rule $u \to v$ in $S$, $u$ and $v$ have the same length. Semi-commutation systems are particular cases of length preserving rewrite systems: a semi-commutation $\theta$ over an alphabet $\Sigma$ is an irreflexive binary relation included in $\Sigma \times \Sigma$. When the relation $\theta$ is symmetric, it is called a partial commutation. Semi-commutations and partial commutations were introduced in the context of trace theory [2]. With every semi-commutation $\theta$ is associated a semi-commutation rewrite system $R_\theta$ defined by $R_\theta = \{ab \to ba \mid (a, b) \in \theta\}$. For every one-rule length-preserving rewrite system $S = \{u \to v\}$, we denote:

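\[ X = \mathrm{PF}(u) \cap \mathrm{SF}(v) \cap \Sigma^+, \qquad Z = \mathrm{SF}(u) \cap \mathrm{PF}(v) \cap \Sigma^+, \]

\[ U = uZ^{-1} = \{u' \in \Sigma^* \mid u \in u'Z\}, \qquad U = X^{-1}u, \]

\[ V = vX^{-1}, \qquad V = Z^{-1}v, \]

as depicted in Fig. 1.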
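As a hedged illustration of these overlap sets, the sketch below computes $X$, $Z$ and the two kinds of quotient for a given rule. Words are assumed to be Python strings; the helper names (`prefixes`, `suffixes`, `overlap_sets`, `right_quotient`, `left_quotient`) are hypothetical, and the rule $aba \to aca$ used in the tests is an illustrative example that does not appear in the paper.

```python
def prefixes(w):
    """Non-empty prefixes of w."""
    return {w[:k] for k in range(1, len(w) + 1)}


def suffixes(w):
    """Non-empty suffixes of w."""
    return {w[k:] for k in range(len(w))}


def overlap_sets(u, v):
    """Return (X, Z) with X = PF(u) ∩ SF(v) and Z = SF(u) ∩ PF(v), non-empty words only."""
    X = prefixes(u) & suffixes(v)
    Z = suffixes(u) & prefixes(v)
    return X, Z


def right_quotient(w, language):
    """w L^{-1}: the words w' such that w belongs to w' L."""
    return {w[:len(w) - len(x)] for x in language if w.endswith(x)}


def left_quotient(language, w):
    """L^{-1} w: the words w' such that w belongs to L w'."""
    return {w[len(x):] for x in language if w.startswith(x)}
```

For the rule $baba \to abba$ this gives $X = \{ba\}$ and $Z = \{a\}$, so `right_quotient("baba", Z)` is `{"bab"}`.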

Fig. 1. The sets $X$, $Z$, $U$, $U$, $V$ and $V$.

3. Rewritings and commutations

In the rest of this article, we consider a fixed non-trivial rewrite system $S = \{u \to v\}$ such that $|u| = |v|$. Here $S$ is called non-trivial if $u \neq v$. We denote by $A$ the alphabet of $S$: $A = \{x \mid |u|_x + |v|_x > 0\}$. Since $u$ and $v$ are distinct, there exist a word $d$ and two distinct letters $a$ and $b$ in $A$ such that $u \in dbA^*$ and $v \in daA^*$. We have:

Lemma 1. If $w = w_0 \dots w_k \xrightarrow{*} w' = w'_0 \dots w'_k$ with $w \neq w'$ and if $i$ is the smallest index such that $w_i \neq w'_i$ then:
1. $w_0 \dots w_{i-1} \in A^* d$,
2. $w_i = b$ and $w'_i = a$,
3. $d w_i \dots w_k \xrightarrow{*} d w'_i \dots w'_k$.

The proof of this lemma is an induction over the length of the derivation $w \xrightarrow{*} w'$ and, as a matter of fact, it does not use the hypothesis that $S$ is length-preserving; Lemma 1 is true for every one-rule rewrite system. As a direct consequence of this lemma, we get that $S(vA^*) \subseteq daA^*$, so $S(vA^*) \cap uA^* = \emptyset$. Symmetrically we also have $S(A^*v) \cap A^*u = \emptyset$.

In order to keep precise information on the positions where rewritings apply, we use the following notion of rewriting positions sequence associated with a derivation. We associate with every derivation $w \xrightarrow{*} w'$ of length $k$ in $S$ its rewriting positions sequence (rps) $s$, defined by:

• if $k = 0$ then $s = \varepsilon$,
• else $w = \alpha u \beta \to \alpha v \beta \xrightarrow{*} w'$ and $s = |\alpha|.s'$ where $s'$ is the rps associated with the derivation $\alpha v \beta \xrightarrow{*} w'$.

Observe that every derivation, starting from a word $w$, is completely characterized by its rewriting positions sequence, which can be seen as a (finite) word over the (infinite) alphabet $\mathbb{N}$. We say that a word $s \in \mathbb{N}^*$ is a rps (for $S$) if there exists a derivation $w \xrightarrow{*} w'$ in $S$ whose rps is $s$; we denote this derivation by $w \xrightarrow{s} w'$ and we denote by $\mathrm{RPS}(w, w')$ the set of all rps $s$ corresponding to the derivations from $w$ to $w'$. For every integer $k \in \mathbb{N}$, let us denote by $\mathrm{shift}_k$ the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by: $\forall i \in \mathbb{N}$, $\mathrm{shift}_k(i) = i + k$. For every sequence $s \in \mathbb{N}^*$, we also denote $\min(s) = \min(\{i \mid |s|_i > 0\})$ and $\max(s) = \max(\{i \mid |s|_i > 0\})$. The following properties are clearly satisfied:

Property 1.
1. every factor of a rps is a rps,
2. for every rps $s$, for every $i \in \mathbb{N}$, $ii \notin \mathrm{F}(s)$,
3. if $s \in \mathrm{RPS}(w, w')$ then for every $\alpha \in A^*$, $s \in \mathrm{RPS}(w\alpha, w'\alpha)$,
4. $s \in \mathrm{RPS}(w, w')$ if and only if for every $\alpha \in A^t$, $\mathrm{shift}_t(s) \in \mathrm{RPS}(\alpha w, \alpha w')$,
5. in particular, if $s$ is a rps then the sequence $s'$ defined by $\mathrm{shift}_{\min(s)}(s') = s$ is a rps,
6. if $s$ is a rps then $|s|_0 \leq 1$,
7. if $s$ is a rps then $|s|_{\min(s)} = |s|_{\max(s)} = 1$.

In the rest of this article, we assume $u = u_0 u_1 \dots u_n$ and $v = v_0 v_1 \dots v_n$ with, for every $i \in \{0, \dots, n\}$, $u_i, v_i \in A$. Let us consider a one-step derivation $w_0 w_1 \dots w_p \xrightarrow{i} w'_0 w'_1 \dots w'_p$. For every integer $j$ that satisfies $0 \leq j \leq p - n$, if $j < i$ or $j > i + n$ then $w_j = w'_j$, and if $0 \leq j - i \leq n$, then $w_j = u_{j-i}$ and $w'_j = v_{j-i}$.

More generally, we can define the sequence of transformations that occur in a position $j$ during a derivation $w \xrightarrow{s} w'$: with every integer $j$, we associate the morphism $\mathrm{cont}_j : \mathbb{N}^* \to \mathbb{N}_n^*$, where $\mathbb{N}_n = \{i \in \mathbb{N} \mid i \leq n\}$, defined by $\mathrm{cont}_j(i) = j - i$ if $0 \leq j - i \leq n$, else $\mathrm{cont}_j(i) = \varepsilon$. For every rps $s$ and every integer $j$, we say that $\mathrm{cont}_j(s)$ is the contribution of the rps $s$ to the position $j$. Observe that if j = i + 1 we have conti +1 ( j ) = shift1 (i that is the morphism defined from N∗ to N∗ by: >i ( j ) = j if j > i else >i ( j ) = ε.

Example 1. Let $S_2 = \{baba \to abba\}$ and let us consider the following derivation from $w = babbababa$ to $w' = ababbabba$:

babbababa → bababbaba → abbabbaba → abbababba → ababbabba.

The rps associated with this derivation is the sequence $3.0.5.2$. Observe that we also have $3.5.0.2$ in $\mathrm{RPS}(w, w')$. The contribution of the rps $3.0.5.2$ to the position 3 is $\mathrm{cont}_3(3.0.5.2) = 0.3.1$.

Lemma 2. If $w = w_0 w_1 \dots w_t \xrightarrow{s} w' = w'_0 w'_1 \dots w'_t$ and $\mathrm{cont}_j(s) = i_1 \dots i_k$ with $k > 0$, then $w_j = u_{i_1}$, $w'_j = v_{i_k}$ with $u_{i_2} \dots u_{i_k} = v_{i_1} \dots v_{i_{k-1}}$.

Proof. The proof is an induction over $|s|$, the length of the rps $s$. Observe that $|s| > 0$ since $k > 0$.

• if $|s| = 1$, the property is satisfied since $k = 1$, $w_j = u_{i_1}$, $w'_j = v_{i_1}$ and $u_{i_2} \dots u_{i_k} = v_{i_1} \dots v_{i_{k-1}} = \varepsilon$,
• if $|s| > 1$, then $s = is'$ with $i \in \mathbb{N}$ and $w \xrightarrow{i} w'' \xrightarrow{s'} w'$. We consider two cases:
  1. $\mathrm{cont}_j(i) = \varepsilon$: in this case $w_j = w''_j$ and $\mathrm{cont}_j(s') = i_1 \dots i_k$, so the property is satisfied by inductive hypothesis,
  2. $\mathrm{cont}_j(i) \neq \varepsilon$: in this case $\mathrm{cont}_j(i) = i_1$ and $\mathrm{cont}_j(s') = i_2 \dots i_k$. It follows $w_j = u_{i_1}$ and $w''_j = v_{i_1}$. By inductive hypothesis, we get $w''_j = u_{i_2}$, $w'_j = v_{i_k}$ and $u_{i_3} \dots u_{i_k} = v_{i_2} \dots v_{i_{k-1}}$. It follows $w''_j = v_{i_1} = u_{i_2}$, so $u_{i_2} \dots u_{i_k} = v_{i_1} \dots v_{i_{k-1}}$. □
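The notions of rps and contribution, and the statement of Lemma 2, can be replayed on Example 1 with the following sketch. It assumes words are Python strings and an rps is a tuple of integers; the names `apply_rps`, `cont` and `check_lemma2` are hypothetical helpers introduced only for this illustration.

```python
def apply_rps(w, s, u, v):
    """Follow the derivation described by the rps s from w.

    Returns the final word, or None if some step does not find u at the
    requested position (so s is not a rps starting from w).
    """
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w


def cont(j, s, n):
    """Contribution of s to position j: the values j - i, for i in s, with 0 <= j - i <= n."""
    return [j - i for i in s if 0 <= j - i <= n]


def check_lemma2(w, s, u, v):
    """Check the three equalities of Lemma 2 at every position of w."""
    target = apply_rps(w, s, u, v)
    n = len(u) - 1
    for j in range(len(w)):
        c = cont(j, s, n)
        if not c:
            continue  # Lemma 2 only speaks about non-empty contributions.
        if w[j] != u[c[0]] or target[j] != v[c[-1]]:
            return False
        if [u[i] for i in c[1:]] != [v[i] for i in c[:-1]]:
            return False
    return True
```

With `w = "babbababa"` and the rule $baba \to abba$, `apply_rps(w, (3, 0, 5, 2), "baba", "abba")` replays the derivation of Example 1 and `cont(3, (3, 0, 5, 2), 3)` gives the contribution quoted there.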

We shall now prove that if $s$ is in $\mathrm{RPS}(w, w')$ and $s'$ is in $\mathrm{RPS}(w, w'')$ for some words $w$, $w'$, $w''$ then $w' = w''$ if and only if $s$ and $s'$ are permutations of each other. Recall that $u \in dbA^*$ and $v \in daA^*$ for some word $d$ and distinct letters $a$ and $b$. In the rest of the article, we shall denote $r = |d|$.

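Lemma 3. If $xy \xrightarrow{s} x'y'$ and $xz \xrightarrow{s'} x'z'$ for some words $x$, $y$, $z$, $x'$, $y'$, $z'$ with $|x| = |x'|$ then for every $i < |x| - r$, $|s|_i = |s'|_i$.

Proof. Assume that $i < |x| - r$ is the smallest integer such that $|s|_i \neq |s'|_i$. Thus $|x| > r$, so $x = x_0 \dots x_{|x|-1}$ and $x' = x'_0 \dots x'_{|x|-1}$. Let $j = i + r$, $\mathrm{cont}_j(s) = i_1 \dots i_k$, $\mathrm{cont}_j(s') = i'_1 \dots i'_{k'}$ and let us compute

\[ \delta_a = |v_{i_1} \dots v_{i_k}|_a - |u_{i_1} \dots u_{i_k}|_a . \]

We can write $\delta_a = \delta_a^{>r} + \delta_a^{<r} + \delta_a^{=r}$ where

\[ \delta_a^{>r} = \sum_{\substack{h = 1, \dots, k \\ i_h > r}} \bigl(|v_{i_h}|_a - |u_{i_h}|_a\bigr), \qquad \delta_a^{<r} = \sum_{\substack{h = 1, \dots, k \\ i_h < r}} \bigl(|v_{i_h}|_a - |u_{i_h}|_a\bigr), \qquad \delta_a^{=r} = \sum_{\substack{h = 1, \dots, k \\ i_h = r}} \bigl(|v_{i_h}|_a - |u_{i_h}|_a\bigr). \]

Since $v_r = a$ and $u_r = b \neq a$, we get $\delta_a^{=r} = |\mathrm{cont}_j(s)|_r = |s|_i$. Moreover, if $i_h < r$ then $u_{i_h} = v_{i_h}$ by definition of $d$, and it follows that $\delta_a^{<r} = 0$. Hence $\delta_a = \delta_a^{>r} + |s|_i$ and, similarly, $\delta'_a = |v_{i'_1} \dots v_{i'_{k'}}|_a - |u_{i'_1} \dots u_{i'_{k'}}|_a = \delta_a'^{>r} + |s'|_i$, by defining $\delta_a'^{>r}$ like $\delta_a^{>r}$. An index $i_h > r$ corresponds to a rewriting at position $j - i_h < i$, so, by the minimality of $i$, it follows that $\delta_a^{>r} = \delta_a'^{>r}$ and therefore $\delta'_a - \delta_a = |s'|_i - |s|_i \neq 0$.

On the other hand, we have from Lemma 2: $\delta_a = |v_{i_k}|_a - |u_{i_1}|_a$ since $u_{i_2} \dots u_{i_k} = v_{i_1} \dots v_{i_{k-1}}$ and similarly $\delta'_a = |v_{i'_{k'}}|_a - |u_{i'_1}|_a$. It follows

\[ \delta'_a - \delta_a = |v_{i'_{k'}}|_a - |v_{i_k}|_a + |u_{i_1}|_a - |u_{i'_1}|_a \]

and, since $v_{i'_{k'}} = x'_j = v_{i_k}$ and $u_{i'_1} = x_j = u_{i_1}$, we finally obtain $\delta'_a - \delta_a = 0$, a contradiction. □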

We are now able to prove the following theorem where, for every word $w \in \Sigma^*$, the set $\mathrm{com}(w) = \{ w' \in \Sigma^* \mid \forall a \in \Sigma,\ |w'|_a = |w|_a \}$ denotes the commutative closure of the word $w$.

Theorem 1. If $s \in \mathrm{RPS}(w, w')$ and $s' \in \mathrm{RPS}(w, w'')$ for some words $w$, $w'$, $w''$ then $w' = w''$ if and only if $\mathrm{com}(s) = \mathrm{com}(s')$.

Proof. Let $w = w_0 \dots w_t$, $w' = w'_0 \dots w'_t$ and $w'' = w''_0 \dots w''_t$. Let us suppose $\mathrm{com}(s) = \mathrm{com}(s')$ and take $j < |w|$; we shall prove $w'_j = w''_j$.

Let $\mathrm{cont}_j(s) = i_1 \dots i_k$ and $\mathrm{cont}_j(s') = i'_1 \dots i'_k$. From the hypothesis, $\mathrm{com}(\mathrm{cont}_j(s)) = \mathrm{com}(\mathrm{cont}_j(s'))$ and it follows that $\mathrm{com}(u_{i_1} \dots u_{i_k}) = \mathrm{com}(u_{i'_1} \dots u_{i'_k})$ and $\mathrm{com}(v_{i_1} \dots v_{i_k}) = \mathrm{com}(v_{i'_1} \dots v_{i'_k})$. From Lemma 2, we get $w_j = u_{i_1} = u_{i'_1}$, which implies $\mathrm{com}(u_{i_2} \dots u_{i_k}) = \mathrm{com}(u_{i'_2} \dots u_{i'_k})$. Moreover, since $u_{i_2} \dots u_{i_k} = v_{i_1} \dots v_{i_{k-1}}$ and $u_{i'_2} \dots u_{i'_k} = v_{i'_1} \dots v_{i'_{k-1}}$, it follows that $\mathrm{com}(v_{i_1} \dots v_{i_{k-1}}) = \mathrm{com}(v_{i'_1} \dots v_{i'_{k-1}})$ and we get $v_{i_k} = v_{i'_k}$, so $w'_j = w''_j$.

The converse implication is a direct consequence of Lemma 3 in the case² $y = z = \varepsilon$. □

² The case $y \neq \varepsilon$ or $z \neq \varepsilon$ of Lemma 3 will be needed in the proof of Proposition 4.


Note that we symmetrically get: if $\mathrm{com}(s) = \mathrm{com}(s')$ and $s \in \mathrm{RPS}(w, w'')$ and $s' \in \mathrm{RPS}(w', w'')$ then $w = w'$. We also observe that Theorem 1 does not hold if the rewrite system is not length-preserving, as shown by the following example:

Example 2. Let $S_3 = \{a \to aba\}$. We have $aaa \xrightarrow{0} abaaa \xrightarrow{2} ababaaa$ and $aaa \xrightarrow{2} aaaba \xrightarrow{0} abaaaba$. Conversely, we have $aa \xrightarrow{0} abaa \xrightarrow{3} abaaba$ and $aa \xrightarrow{1} aaba \xrightarrow{0} abaaba$.

Moreover, it is clear that, even if $S$ is a one-rule length-preserving rewrite system, if $s \in \mathrm{RPS}(w, w')$ and if $s'$ is a sequence that satisfies $\mathrm{com}(s) = \mathrm{com}(s')$, it does not always hold that $s' \in \mathrm{RPS}(w, w')$: indeed, for $S_1 = \{ba \to ab\}$, $0.1 \in \mathrm{RPS}(baa, aab)$ but $1.0 \notin \mathrm{RPS}(baa, aab)$. So, a natural question arises in this context: given a one-rule length-preserving rewrite system and a rps $s$ corresponding to a derivation between two words $w$ and $w'$, how can the set $\mathrm{RPS}(w, w')$ be computed? From Theorem 1, we know that $\mathrm{RPS}(w, w') \subseteq \mathrm{com}(s)$, so this set is finite. Hence, $\mathrm{RPS}(w, w')$ can be computed as follows: enumerate $\mathrm{com}(s)$ and, for each $s' \in \mathrm{com}(s)$, test whether $w \xrightarrow{s'} w'$. As a matter of fact, there exists a particular member of $\mathrm{RPS}(w, w')$ from which a more efficient procedure can be applied to compute $\mathrm{RPS}(w, w')$. This particular rps corresponds to the left derivation from $w$ to $w'$ defined below. Roughly speaking, in each step of a left derivation from a word $w$ to a word $w'$, the rewriting rule is applied to the leftmost occurrence of $u$ that keeps $w'$ reachable. It corresponds to the smallest sequence of $\mathrm{RPS}(w, w')$ with respect to the lexicographic order.
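The brute-force procedure just described (enumerate $\mathrm{com}(s)$ and test each candidate) can be sketched as follows. This is only a minimal illustration, exponential in $|s|$, assuming words are strings and rps are tuples; `apply_rps`, `rps_set` and `left_rps` are hypothetical names. The last function relies on the remark above that the left rps is the lexicographically smallest member of $\mathrm{RPS}(w, w')$, and the tests reuse the rule $ba \to ab$ and the words of Example 3.

```python
from itertools import permutations


def apply_rps(w, s, u, v):
    """Return the word reached from w by the rps s, or None if s is not applicable."""
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w


def rps_set(w, target, s, u, v):
    """RPS(w, target), computed from one known rps s from w to target.

    Assumes |u| == |v|, so that by Theorem 1 every rps from w to target
    is a permutation of s.
    """
    candidates = set(permutations(s))
    return {p for p in candidates if apply_rps(w, p, u, v) == target}


def left_rps(w, target, s, u, v):
    """Lexicographically smallest member of RPS(w, target)."""
    return min(rps_set(w, target, s, u, v))
```

Python compares tuples lexicographically, so `min` picks the candidate whose first position is smallest, then the second, and so on.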

Definition 1. A derivation $w \xrightarrow{i} w'' \xrightarrow{s} w'$ (and its corresponding rps $is$) is left if the two following properties are satisfied:
1. the derivation $w'' \xrightarrow{s} w'$ is left,
2. if $w = \alpha u \beta$ with $|\alpha| < i$ then $w' \notin S(\alpha v \beta)$.

Note that we shall consider that an empty derivation is left. Furthermore, it is clear from the definition that, given two words $w$ and $w'$ with $w \xrightarrow{*} w'$, there exists a unique left derivation from $w$ to $w'$ and we denote by $\mathrm{leftrps}(w, w')$ its corresponding rps. We say that a sequence $s$ is a left rps if there exist words $w$ and $w'$ such that $s = \mathrm{leftrps}(w, w')$.

Example 3. Let $S_4 = \{abab \to bbaa\}$. The derivation

\[ w = ababbaabababab \xrightarrow{6.10.0.3} w' = bbabbaabaabbaa \]

is not left since $w \xrightarrow{6.0.10.3} w'$. The left derivation from $w$ to $w'$ is $w \xrightarrow{0.6.3.10} w'$. We can observe that the indices in the rps of a left derivation need not occur in ascending order. As a matter of fact, it is the case if and only if $Z \subseteq X$, as shown in [13].

Even though we now have a notion of canonical derivation and its associated rps, we do not yet have an operation that can transform a rps into the canonical one. This operation will be the following semi-commutation defined over indices of rps. In this commutation, two indices $i$ and $j$ can commute if the application of a rewriting in position $i$ followed by an application of a rewriting in position $j$ is always equivalent to the application of a rewriting in position $j$ followed by an application of a rewriting in position $i$. It is the case when $|i - j| > n$ or when $u = xu'x$, $v = xv'x$ for some words $x$, $u'$ and $v'$ with $|xu'| = |i - j|$. Indeed, let us consider $ji \in \mathrm{RPS}(w, w')$ for some words $w$, $w'$ and assume $i < j$. If $j - i > n$ then $w = \alpha u \beta u \gamma$ and $w' = \alpha v \beta v \gamma$ with $|\alpha| = i$ and $|\alpha u \beta| = j$, and we clearly have $ij$ in $\mathrm{RPS}(w, w')$. Now if $j - i \leq n$ and $u = xu'x$, $v = xv'x$ for some words $x$, $u'$, $v'$ with $|xu'| = j - i$, it follows that $w = \alpha x u' x u' x \beta$ and $w' = \alpha x v' x v' x \beta$ with $|\alpha| = i$ and $|\alpha x u'| = j$, so $ij$ is in $\mathrm{RPS}(w, w')$. This motivates the following definition of the semi-commutation $\theta_S$, which is not symmetric since our goal is to get left derivations:

Definition 2. The semi-commutation $\theta_S \subseteq \mathbb{N} \times \mathbb{N}$ is defined by

\[ \theta_S = \{ (j, i) \mid (j > i + n) \vee (i < j \leq i + n \wedge i + n - j + 1 \in F) \} \]

where $F = \{ |x| \mid x \in X \cap Z \}$.

Clearly if $(j, i) \in \theta_S$, then the sequence $ji$ is a non-left rps and the sequence $ij$ is a left rps. We also observe that, if the semi-commutation $\theta_S$ is not trivial, that is if $F \neq \emptyset$, then $u_0 = v_0$ and $u_n = v_n$. In particular, we have:

Lemma 4. For every $0 < j \leq n$, $(j, 0) \in \theta_S$ if and only if $u_j \dots u_n \in X \cap Z$.

Furthermore, whether $(j, i) \in \theta_S$ only depends on the value $j - i$ in the definition; it follows that $(j, i) \in \theta_S$ if and only if $(j - i, 0) \in \theta_S$. We also have the following, used in the proof of Proposition 4:


Lemma 5. If $(i, j) \in \theta_S$ then for every $k \geq i$, the sequence $kj$ is not a left rps. Symmetrically, for every $k \leq j$, the sequence $ik$ is not a left rps.

Proof. Let $(i, j) \in \theta_S$ and $k \geq i$. If $kj$ is not a rps then it is not a left rps. Else, if $k > j + n$, then $(k, j) \in \theta_S$ and it follows that $kj$ is not left. It remains the case $k \leq j + n$, which implies $i \leq j + n$ too. Let $x = u_{i-j} \dots u_n$; since $(i, j) \in \theta_S$, it follows that $x \in X \cap Z$. Let $x' = u_{k-j} \dots u_n$; since $kj$ is a rps, it follows that $x' \in Z$. Moreover $|x'| \leq |x|$, so $x' \in \mathrm{PF}(x) \subseteq \mathrm{PF}(u)$. On the other hand, since $x' \in \mathrm{SF}(u)$ and $x \in Z$, we get $x' \in \mathrm{SF}(x) \subseteq \mathrm{SF}(v)$, so $x' \in X$. This implies $(k, j) \in \theta_S$ and $kj$ is not left. One can symmetrically prove that if $k \leq j$, the sequence $ik$ is not a left rps. □

Another property of the relation $\theta_S$ is:

Lemma 6. The semi-commutation $\theta_S$ is transitive.

Proof. Let $(j, i) \in \theta_S$ and $(i, k) \in \theta_S$. It follows that $k < i < j$. Suppose that $k < j \leq k + n$. It follows that $k < i \leq k + n$, $i < j \leq i + n$,

\[ u' = \mathrm{PF}(u) \cap A^{i+n-j+1} \in \mathrm{SF}(u) \cap \mathrm{PF}(v) \cap \mathrm{SF}(v) \]

and

\[ u'' = \mathrm{PF}(u) \cap A^{k+n-i+1} \in \mathrm{SF}(u) \cap \mathrm{PF}(v) \cap \mathrm{SF}(v). \]

Since $j \leq k + n$, we get $|u'| + |u''| \geq n + 1 = |u|$, which implies $u = v$, a contradiction. It follows that $j > k + n$, so $(j, k) \in \theta_S$. □

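Métivier and Ochmański [17] have proved that if a semi-commutation $\theta$ has no symmetric rule then its corresponding system $R_\theta$ is confluent if and only if $\theta$ is transitive. So, we directly obtain:

Corollary 1. $R_{\theta_S}$ is confluent.

A first and easy result connecting the semi-commutation $\theta_S$ and derivations in $S$ is:

Proposition 1. Let $s \in \mathbb{N}^*$ and $s' \in R_{\theta_S}(s)$; then $s \in \mathrm{RPS}(w, w')$ if and only if $s' \in \mathrm{RPS}(w, w')$.

Proof. By an induction argument, it is sufficient to consider one step of rewriting in $R_{\theta_S}$ from $s$ to $s'$. Thus we can assume $s = ji$ and $s' = ij$ with $(j, i) \in \theta_S$. Let us consider two words $w$ and $w'$ such that $w \xrightarrow{ji} w'$. We shall prove that $w \xrightarrow{ij} w'$. We consider two cases:

1. $j > i + n$. In this case,

\[ w = \alpha_1 u \alpha_2 u \alpha_3 \xrightarrow{j = |\alpha_1 u \alpha_2|} \alpha_1 u \alpha_2 v \alpha_3 \xrightarrow{i = |\alpha_1|} \alpha_1 v \alpha_2 v \alpha_3 = w' \]

and, clearly, we can also have

\[ w = \alpha_1 u \alpha_2 u \alpha_3 \xrightarrow{i = |\alpha_1|} \alpha_1 v \alpha_2 u \alpha_3 \xrightarrow{j = |\alpha_1 u \alpha_2|} \alpha_1 v \alpha_2 v \alpha_3 = w'. \]

2. $i < j \leq i + n$. In this case, $u = \alpha \beta \alpha$ and $v = \alpha \gamma \alpha$ for some words $\alpha$, $\beta$ and $\gamma$ with $|\alpha| = i + n - j + 1$. It follows that $w = \alpha_1 \alpha \beta \alpha \beta \alpha \alpha_2$ with $|\alpha_1| = i$, $|\alpha_1 \alpha \beta| = j$ and $w \xrightarrow{j} \alpha_1 \alpha \beta \alpha \gamma \alpha \alpha_2 \xrightarrow{i} \alpha_1 \alpha \gamma \alpha \gamma \alpha \alpha_2 = w'$. This implies that we also have $w \xrightarrow{i} \alpha_1 \alpha \gamma \alpha \beta \alpha \alpha_2 \xrightarrow{j} \alpha_1 \alpha \gamma \alpha \gamma \alpha \alpha_2 = w'$.

Similarly we can prove that $w \xrightarrow{ij} w'$ implies $w \xrightarrow{ji} w'$, which finishes the proof. □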
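To make the semi-commutation $\theta_S$ and its rewrite system $R_{\theta_S}$ concrete, here is a small sketch that computes $F$, tests membership in $\theta_S$, and rewrites a sequence until no adjacent pair $ji$ with $(j, i) \in \theta_S$ remains. It assumes words are strings and sequences are tuples; the function names are hypothetical, and the rule $aba \to aca$ used in the tests is an illustrative example (with $F = \{1\}$) that does not appear in the paper. Each swap removes an inversion, so the loop terminates, and by Corollary 1 the result does not depend on the order of the swaps.

```python
def overlap_lengths(u, v):
    """F = { |x| : x in X ∩ Z }: lengths of non-empty words that are at once
    a prefix and a suffix of both u and v."""
    lengths = set()
    for k in range(1, len(u) + 1):
        x = u[:k]
        if u.endswith(x) and v.startswith(x) and v.endswith(x):
            lengths.add(k)
    return lengths


def in_theta(j, i, n, F):
    """Membership of the pair (j, i) in theta_S, following Definition 2."""
    return j > i + n or (i < j <= i + n and (i + n - j + 1) in F)


def normal_form(s, u, v):
    """Normal form of the sequence s for the semi-commutation system R_{theta_S}."""
    n = len(u) - 1
    F = overlap_lengths(u, v)
    s = list(s)
    changed = True
    while changed:
        changed = False
        for k in range(len(s) - 1):
            if in_theta(s[k], s[k + 1], n, F):
                # Rewrite the factor "j i" into "i j".
                s[k], s[k + 1] = s[k + 1], s[k]
                changed = True
    return tuple(s)


def apply_rps(w, s, u, v):
    """Return the word reached from w by the rps s, or None if s is not applicable."""
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w
```

On the data of Example 3, `normal_form((6, 10, 0, 3), "abab", "bbaa")` yields the sequence given there as the left rps, in line with Proposition 1 and the corollary that follows.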

We directly obtain as a corollary:

Corollary 2. If $s = \mathrm{leftrps}(w, w')$ then $s$ is in $R_{\theta_S}$ normal form.

Conversely, we shall prove that if $s = \mathrm{leftrps}(w, w')$, it holds that $s' \in \mathrm{RPS}(w, w')$ implies $s \in R_{\theta_S}(s')$. The proof is not so immediate and, in a first step, we show that the property holds in a special case. For this, we need several intermediate properties. Recall that $u = dbg$ and $v = dag'$ for some words $g$ and $g'$ and some distinct letters $a$ and $b$, where $d$ is the longest common prefix of $u$ and $v$. Let us denote $C = \{u' \mid u'da \in \mathrm{PF}(u)\}$. Observe that $\varepsilon \notin C$ and that $C^+$ is not always included in $dbA^*$: for instance, if $S = \{bba \to bab\}$ then $C = \{b\}$ is not included in $bbA^*$, but we have:

Lemma 7. $C^* db \subseteq dbA^*$.


Proof. The inclusion is clearly true if $C = \emptyset$. Otherwise, we shall prove that $C^i db \subseteq dbA^*$ for every integer $i$. This is clear for $i = 0$. Next, assume that $C^i db \subseteq dbA^*$ and let $u' \in C$. It follows that $u' C^i db \subseteq u' dbA^*$. Since $u'da \in \mathrm{PF}(u)$ and by a length argument, we get $db \in \mathrm{PF}(u'd)$, so $u' C^i db \subseteq dbA^*$. □

Lemma 8. If $xda \in \mathrm{PF}(C^*)$ then $x \in C^*$.

Proof. There exists some word $y$ such that $xday = x'x''$ with $x' \in C^*$, $x'' \in C$ and $|y| < |x''|$. We can distinguish two cases:
1. $|day| < |x''|$. In that case, $x = x'z$ with $zday \in C$. It follows that $zda \in \mathrm{PF}(u)$, which implies $z \in C$ and $x = x'z \in C^*$.
2. $|day| \geq |x''|$. In that case $d = d'd''$ with $d'' \neq \varepsilon$, $x' = xd'$ and $x'' = d''ay \in C$. It follows that $xd' \in C^*$, $d' = d_1 d_2$ with $d_2 \in C^*$, $x \in C^* f$ for some word $f$ and $f d_1 \in C$. Since $d_2 d'' a \in \mathrm{PF}(C^*)$ with $|d_2 d'' a| \leq |db|$, we get $d_2 d'' a \in \mathrm{PF}(db)$ by Lemma 7, so $d_2 d'' a \in \mathrm{PF}(d)$, and we get $fda = f d_1 d_2 d'' a \in \mathrm{PF}(f d_1 da)$, which is included in $\mathrm{PF}(u)$ because $f d_1 \in C$. So $f \in C$ and $x \in C^*$. □

Lemma 9. Let $w$ and $z$ be two words that satisfy $w, wz \in C^*$ with $db \in \mathrm{PF}(zdb)$; then $z \in C^*$.

Proof. The property is clearly true if $C = \emptyset$. Otherwise, let $P = (db)^{-1} C db$; $P$ is a prefix code that satisfies $dbP^* = C^* db$: indeed, by definition of $P$ and Lemma 7, we obtain $dbP = Cdb$ and, by induction, $dbP^k = C^k db$ for every $k \geq 0$. We get $wzdb = wdbz'$ with $wdb, wdbz' \in dbP^*$, and it follows that $z' \in P^*$ and $zdb = dbz' \in dbP^* = C^* db$, so $z \in C^*$. □

Lemma 10. If $dbu'b \in \mathrm{PF}(u)$, then $C^* u A^* \cap C^* dbu'a A^* = \emptyset$.

Proof. First observe that $C^+ dbA^* = CdbA^*$ since $C^* db \subseteq dbA^*$ by Lemma 7. Now, if $C = \emptyset$, we are done. Otherwise, assume $wuy = w'dbu'ay'$ with $w, w' \in C^*$. Since $dbu'a \notin \mathrm{PF}(u)$, $w \neq w'$. We can distinguish two cases:
1. $w' = wz$. In that case $uy = zdbu'ay'$ with $z \in C^+$ by Lemma 9, so $C^+ dbA^* \cap uA^* \neq \emptyset$. On the other hand, $Cda \subseteq \mathrm{PF}(u)$ implies that $CdbA^* \cap uA^* = \emptyset$ and so $C^+ dbA^* \cap uA^* = \emptyset$, a contradiction.
2. $w = w'z$. In that case $zuy = dbu'ay'$ with $z \in C^*$ by Lemma 9 and $dbu'a \in \mathrm{PF}(C^* u)$. Since $dbu'a \notin \mathrm{PF}(u)$, we get

\[ dbu'a \in \mathrm{PF}(C^+ u) \subseteq \mathrm{PF}(C^+ dbA^*) = \mathrm{PF}(CdbA^*). \]

It follows that there exists some word $x \in C$ such that $dbu'a \in \mathrm{PF}(xdbA^*)$. Since $xda \in \mathrm{PF}(u)$, it follows that $xdb \notin \mathrm{PF}(dbu') \subseteq \mathrm{PF}(u)$. From $dbu'a \in \mathrm{PF}(xdbA^*)$ and $xdb \notin \mathrm{PF}(dbu')$, we get $|x| > |u'|$. That implies $dbu'a \in \mathrm{PF}(xdb)$, so $dbu'a \in \mathrm{PF}(xd) \subseteq \mathrm{PF}(u)$, a contradiction. □

Lemma 11. If $S(w) \cap uA^* \neq \emptyset$ then $w \in C^* u A^*$.

Proof. Let us consider a derivation of minimal length $w \xrightarrow{s} ux$ for some word $x \in A^*$. The proof is by induction on $|s|$. If $s = \varepsilon$ then $w \in uA^*$; else let us consider the smallest $i$ such that $|s|_i > 0$. We have $s = s' i s''$ and $w \xrightarrow{s'} yuw' \xrightarrow{i} yvw' \xrightarrow{s''} ux$ with $i = |y|$. It follows that $w = yz$ with $z \xrightarrow{s_1} uw'$ where $\mathrm{shift}_{|y|}(s_1) = s'$. By inductive hypothesis, we get $z \in C^* u A^*$. We also have $ux = yz'$ and $vw' \xrightarrow{*} z'$. It follows that $z' = daz''$ by $S(vA^*) \subseteq daA^*$. Now, from the minimality of the length of the derivation $w \xrightarrow{s} ux$, we get $u \notin \mathrm{PF}(yuw')$. This implies $u \notin \mathrm{PF}(yd)$, so $yda \in \mathrm{PF}(u)$, $y \in C$ and $w = yz \in C^* u A^*$. □

Note that the converse of this lemma does not hold: if we consider $S_5 = \{baa \to aba\}$, we have $d = \varepsilon$ and $C = \{b, ba\}$, so $w = babbaa \in C^2 u$ but $S_5(w) = \{babbaa, bababa\}$ and $S_5(w) \cap uA^* = \emptyset$. The following lemma shows that there exists only one occurrence of $u$ as a factor in $C^* u$:

Lemma 12. $C^* u \cap A^* u A^+ = \emptyset$.

Proof. The case $C = \emptyset$ is clear, so we can suppose $C \neq \emptyset$, which implies $da \in \mathrm{F}(u)$. Assume that $wu = w'uw''$ for some $w \in C^*$ and $w'' \neq \varepsilon$, and consider the last occurrence of $da$ as a factor in $u$: $u = \alpha da \beta$ with $da\beta \notin A^+ daA^*$. It follows that $\alpha \in C$ and $|w''| > |\beta|$, so $u \in \mathrm{F}(C^* d)$. Moreover, by Lemma 7 and a length argument, we get $d \in \mathrm{PF}(C^*)$, so $u \in \mathrm{F}(C^*)$. Let $u'$ be the shortest member of $C$ that satisfies $u' = xy$ with $y \neq \varepsilon$ and $u \in \mathrm{PF}(yC^*)$. Since $u'db \notin \mathrm{PF}(C^*)$, it follows that $x \neq \varepsilon$. Moreover, since $u'da \in \mathrm{PF}(u)$, we also get $u' = yz$ with $zda \in \mathrm{PF}(C^*)$, which implies $z \in C^*$ by Lemma 8. Now, from the equalities $u' = xy = yz$, it follows that $u' \in \mathrm{SF}(z^*)$ with $|z| < |u'|$, a contradiction with the choice of $u'$. So $C^* u \cap A^* u A^+ = \emptyset$. □
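The set $C$ and the statement of Lemma 11 can be explored experimentally. The sketch below is a hedged illustration, assuming words are strings and $|u| = |v|$ with $u \neq v$; all function names are hypothetical. It computes $d$, $a$, $b$ and $C$, tests membership in $C^* u A^*$, and checks Lemma 11 exhaustively on short words, which is a sanity check on small cases and not a proof.

```python
from itertools import product


def split_rule(u, v):
    """Return (d, b, a): the longest common prefix d of u and v, then the
    first differing letters b (in u) and a (in v). Assumes |u| == |v|, u != v."""
    r = 0
    while u[r] == v[r]:
        r += 1
    return u[:r], u[r], v[r]


def c_set(u, v):
    """C = { u' : u' d a is a prefix of u }."""
    d, _, a = split_rule(u, v)
    return {u[:k] for k in range(1, len(u)) if u[k:].startswith(d + a)}


def in_c_star_u_a_star(w, u, C):
    """Does w belong to C* u A* ? Every member of C is non-empty, so this terminates."""
    if w.startswith(u):
        return True
    return any(w.startswith(c) and in_c_star_u_a_star(w[len(c):], u, C) for c in C)


def descendants(w, u, v):
    """S(w) for the length-preserving rule u -> v."""
    seen, stack = {w}, [w]
    while stack:
        current = stack.pop()
        i = current.find(u)
        while i != -1:
            nxt = current[:i] + v + current[i + len(u):]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
            i = current.find(u, i + 1)
    return seen


def lemma11_holds(u, v, alphabet, max_len):
    """Check 'S(w) meets uA*  implies  w in C* u A*' for all words up to max_len."""
    C = c_set(u, v)
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            reaches = any(x.startswith(u) for x in descendants(w, u, v))
            if reaches and not in_c_star_u_a_star(w, u, C):
                return False
    return True
```

For $S_5 = \{baa \to aba\}$, `c_set("baa", "aba")` recovers the set $C$ given above, and the word $babbaa$ illustrates that the converse of Lemma 11 fails.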


We denote by $f$ the longest common suffix of $u$ and $v$ and we first suppose that $|d| = r \leq |f|$. In this case, we shall prove that, for $w' \in S(w)$, we can construct step by step the left rps from $w$ to $w'$. First we have:

Lemma 13. If $\mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \subseteq \mathrm{SF}(v)$ then for every derivation $wuw' \xrightarrow{s} uw''$ with $wu \notin A^* u A^+$ and $|w| = i$, there exists a rps $s'$ with $is' \in R_{\theta_S}(s0)$.

Proof. The proof is an induction over $|s|$, the length of the sequence $s$. From Lemma 11 we get $wuw' = xux'$ for some $x \in C^*$. If $|x| < |w|$ then $wu \in A^* u A^+$, a contradiction. If $|x| > |w|$ then $xu \in A^* u A^+$, a contradiction by Lemma 12. It follows that $w = x$, so $w \in C^*$. If $s = \varepsilon$ then $w = \varepsilon$ and $i = 0$ and we have $is' = 0 \in R_{\theta_S}(0)$. If $s \neq \varepsilon$ then $s = js_1$. If $j = i$ then the property is satisfied, so we assume $j \neq i$ and, since $wu \notin A^* u A^+$, we have $j > i$. Let us distinguish two cases:

• $j < i + n + 1 - r$. In this case, $wuw' \xrightarrow{j} wu'daz \xrightarrow{s_1} uw''$ with $u'db \in \mathrm{PF}(u)$. Since $j > i$, it follows that $u' \neq \varepsilon$ and then $u'd = dbu''$ for some word $u''$. From Lemma 10 it follows that $wdbu''az = wu'daz \notin C^* u A^*$, a contradiction by Lemma 11.
• $j \geq i + n + 1 - r$. In this case, $wuw' = wu'u''u'''z \xrightarrow{j} wu'u''v'z \xrightarrow{s_1} uw''$ with $u'u'' = u = u''u'''$ and $u''v' = v$. It follows that $u'' \in X \cap Z$, so $(j, i) \in \theta_S$. By inductive hypothesis, there exists $s'_1$ such that $is'_1 \in R_{\theta_S}(s_1 0)$ and it follows that $ijs'_1 \in R_{\theta_S}(s0)$. □

Remark 1.

1. Observe that, if there exists some $m \in \mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \setminus \mathrm{SF}(v)$, we have $u = u'm = mu''$, $v = mv'$ and $m \notin \mathrm{SF}(v)$. For the derivation $u'mu'' \xrightarrow{j} u'mv'$, with $j = |u'|$, there does not exist $s'$ with $0s' \in R_{\theta_S}(j0)$ since $u \notin \mathrm{SF}(vu'')$. For instance, if $S = \{aa \to ab\}$ then $aaa \xrightarrow{1} aab \xrightarrow{0} abb$ and $aaa \xrightarrow{0} aba$, but $aba$ is not in $A^* aa A^*$.

2. Lemma 13 indicates that, when its condition is satisfied, in each step of any left derivation from a word that is not in $uA^*$ to a word of $uA^*$, one rewrites the leftmost occurrence of the word $u$ as long as the derivation has not reached a word of $uA^*$. Observe that the condition $\mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \subseteq \mathrm{SF}(v)$ of Lemma 13 is a necessary condition: if $S = \{babb \to bbba\}$ then $\mathrm{leftrps}(bababbabb, babbbabba)$ is not in $2\mathbb{N}^*$ but is in $5\mathbb{N}^*$.

3. More generally, at each step of a left derivation from a word $w$ to a word $w'$, if the current word $z = z_0 \dots z_k$ is different from $w'$, one takes the smallest $i$ such that $z_i \neq w'_i$ and one rewrites the leftmost occurrence of $u$ that ends after the position $i$.

One can observe that, if $r \leq |f|$, then for every $m \in \mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v)$, we get $|m| \leq r \leq |f|$, so $m \in \mathrm{SF}(f) \subseteq \mathrm{SF}(v)$. Hence, we get as a corollary of Lemma 13:

Lemma 14. If $r \leq |f|$ then for every derivation $wuw' \xrightarrow{s} uw''$ with $wu$ not in $A^* u A^+$ and $|w| = i$, there exists a rps $s'$ with $is' \in R_{\theta_S}(s0)$.

Note that the condition $\mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \subseteq \mathrm{SF}(v)$ of Lemma 13 is weaker than the condition $r \leq |f|$ of Lemma 14. Indeed, the system $\{aaca \to aaba\}$ satisfies both $r > |f|$ and $\mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \subseteq \mathrm{SF}(v)$. Recall that our goal is to show that if $s = \mathrm{leftrps}(w, w')$ then for every $s' \in \mathrm{RPS}(w, w')$ it holds that $s \in R_{\theta_S}(s')$. Although Lemma 13 generalizes Lemma 14, we consider in the rest of this section the two cases $r \leq |f|$ and $r > |f|$.

Lemma 15. If $r \leq |f|$ and $s = \mathrm{leftrps}(w, w')$ then

s ∈ RPS( w , w ) ⇒ s ∈ R θ S (s). Proof. The proof is an induction over |s|. The case |s| = 0 is clear. Otherwise, one can assume without loss of generality that |s|0 > 0. Indeed, if |s|0 = 0, since com(s) = com(s ), we get w = α β and w = α β with |α | = min(s) and we can consider the derivation from β to β applying the corresponding translation on s and s. So, we have s = s1 0s2 with w − −→ u w −−→ w . Let w = xuy with xu not in A ∗ u A + and |x| = i. From Lemma 14, there s1

0s2

exists s1 such that is1 ∈ R θ S (s1 0). It follows is1 s2 ∈ R θ S (s) and w −s− → w that implies that s = is for some rps s . By − 1 s2 inductive hypothesis, s ∈ R θ S (s1 s2 ) so s = is ∈ R θ S (s). 2 In order to solve the case r > | f |, we study the links between the system S and the system S R = {u R → v R } where u R denotes the reverse of the word u that is: if u = u 0 u 1 . . . un then u R = un . . . u 1 u 0 . When r > | f |, we can apply Lemma 15 on system S R : indeed, the longest common preﬁx of u R and v R is f R and the longest common suﬃx of u R and v R is d R with | f R | ≤ |d R |. Let

$$X' = \mathrm{PF}(u^R) \cap \mathrm{SF}(v^R) = \mathrm{SF}(u)^R \cap \mathrm{PF}(v)^R = Z^R$$

and

$$Z' = \mathrm{SF}(u^R) \cap \mathrm{PF}(v^R) = X^R.$$

So $X' \cap Z' = (X \cap Z)^R$.

Lemma 16. $\theta_S = \theta_{S^R}$.

Proof. Let us consider $(j, i) \in \theta_{S^R}$. If $j > i + n$ then $(j, i) \in \theta_S$; else $j > i$ and $i + n - j + 1 \in \{|x| \mid x \in X' \cap Z'\}$. Clearly, $\{|x| \mid x \in X' \cap Z'\} = F$ so $(j, i) \in \theta_S$. Symmetrically, we also get the implication $(j, i) \in \theta_S \Rightarrow (j, i) \in \theta_{S^R}$ so $\theta_S = \theta_{S^R}$. □

Clearly there exists a bijection between the derivations $w \xrightarrow{*} w'$ in $S$ and the derivations $w^R \xrightarrow{*} (w')^R$ in $S^R$. More precisely, we define the following morphism $h_k$ from $\mathbb{N}_{k-n}^*$ to $\mathbb{N}_{k-n}^*$ by $h_k(i) = k - n - 1 - i$ where $k = |w|$. Observe that $h_k \circ h_k$ is the identity and $h_k(i) - h_k(j) = j - i$.

Lemma 17. $w \xrightarrow{s} w'$ in $S$ if and only if $w^R \xrightarrow{h_k(s)} (w')^R$ in $S^R$ where $k = |w|$.

Proof. From the definition of $h_k$, we get $w \xrightarrow{i} w'$ in $S$ if and only if $w^R \xrightarrow{h_k(i)} (w')^R$ in $S^R$, so $w \xrightarrow{s} w'$ in $S$ if and only if $w^R \xrightarrow{h_k(s)} (w')^R$ in $S^R$. □
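The mirror correspondence of Lemma 17 can be checked mechanically on a small instance. The following Python sketch is only an illustration: the helper names `apply_rps` and `mirror` are hypothetical, and the system $\{ba \to ab\}$ with the word `bbaa` is an arbitrary choice made for the example.

```python
def apply_rps(word, positions, u, v):
    """Apply the rule u -> v successively at the given start positions."""
    for i in positions:
        # A position is valid only if u really occurs there.
        if word[i:i + len(u)] != u:
            raise ValueError(f"{u!r} does not occur at position {i} in {word!r}")
        word = word[:i] + v + word[i + len(u):]
    return word


def mirror(k, positions, n):
    """Letter-wise image under h_k(i) = k - n - 1 - i, with n = |u| - 1."""
    return [k - n - 1 - i for i in positions]


# Illustrative instance (an assumption of this sketch, not taken from the paper).
u, v = "ba", "ab"
w, s = "bbaa", [1, 0, 2, 1]
n, k = len(u) - 1, len(w)

w_prime = apply_rps(w, s, u, v)
# Same derivation read in the reversed system S^R = {u^R -> v^R}.
w_prime_mirrored = apply_rps(w[::-1], mirror(k, s, n), u[::-1], v[::-1])
```

Here `w_prime_mirrored` is the reverse of `w_prime`, and applying `mirror` twice returns the original sequence, in line with $h_k \circ h_k$ being the identity.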

Let $\gamma_S$ be the partial commutation defined by $\gamma_S = \theta_S \cup \theta_S^{-1}$. Then:

Lemma 18. Let $s \in \mathrm{RPS}(w, w')$, $k = |w|$ and $s' \in \mathbb{N}^*$. $R_{\gamma_S}(h_k(s')) = R_{\gamma_S}(h_k(s))$ if and only if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$.

Proof. We have $(h_k(i), h_k(j)) \in \theta_S$ if and only if $(j, i) \in \theta_S$ by $h_k(i) - h_k(j) = j - i$. It follows that $(h_k(i), h_k(j)) \in \gamma_S$ if and only if $(i, j) \in \gamma_S$. This implies $R_{\gamma_S}(h_k(s')) = R_{\gamma_S}(h_k(s))$ if and only if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$. □

We are now able to prove:

Proposition 2. Let $s$ and $s'$ be two rps and $w$, $w'$ be two words with $s \in \mathrm{RPS}(w, w')$; then $s' \in \mathrm{RPS}(w, w')$ if and only if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$.

Proof.
1. For the if part, we know that $R_{\theta_S}$ is confluent from Corollary 1. It follows that if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$, there exists $s'' \in R_{\theta_S}(s) \cap R_{\theta_S}(s')$. From Proposition 1 we get $s'' \in \mathrm{RPS}(w, w')$ and $s' \in \mathrm{RPS}(w, w')$.
2. For the only if part, one distinguishes two cases:
• $r \le |f|$. Let us consider $s'' = \mathrm{leftrps}(w, w')$. From Lemma 15 we get $s'' \in R_{\theta_S}(s)$ and $s'' \in R_{\theta_S}(s')$ so $R_{\gamma_S}(s') = R_{\gamma_S}(s)$.
• $r > |f|$. In this case, we can use system $S^R$. Since $w \xrightarrow{s} w'$ and $w \xrightarrow{s'} w'$ in $S$, we have, from Lemma 17, $w^R \xrightarrow{h_k(s)} (w')^R$ and $w^R \xrightarrow{h_k(s')} (w')^R$ in $S^R$ with $k = |w|$. We can apply Lemma 15 on $S^R$ as in the previous case and, thanks to Lemma 16, we get $R_{\gamma_S}(h_k(s')) = R_{\gamma_S}(h_k(s))$ which implies $R_{\gamma_S}(s') = R_{\gamma_S}(s)$ from Lemma 18. □
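Proposition 2 implies in particular that any two rps between the same pair of words are permutations of each other, since a partial commutation only swaps adjacent letters. The Python sketch below checks only this weaker consequence by brute force; the function name `all_rps` and the chosen instance are assumptions made for illustration, and the recursion relies on $u \neq v$ so that every derivation is finite.

```python
from collections import Counter


def all_rps(w, target, u, v):
    """Enumerate every sequence of rewriting positions leading from w to target."""
    assert len(u) == len(v) and u != v
    results = [[]] if w == target else []
    for i in range(len(w) - len(u) + 1):
        if w[i:i + len(u)] == u:
            rewritten = w[:i] + v + w[i + len(u):]
            # Prefix the current position to every rps of the remaining derivation.
            results += [[i] + rest for rest in all_rps(rewritten, target, u, v)]
    return results


# Illustrative instance: S = {ba -> ab}, from bbaa to aabb.
sequences = all_rps("bbaa", "aabb", "ba", "ab")
commutative_images = {frozenset(Counter(s).items()) for s in sequences}
```

On this instance two sequences are found and `commutative_images` contains a single element, so both sequences use each position the same number of times.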

As a consequence, we obtain the following desired theorem without any condition over the lengths of $d$ and $f$:

Theorem 2. If $s = \mathrm{leftrps}(w, w')$ then $s' \in \mathrm{RPS}(w, w')$ if and only if $s$ is in $R_{\theta_S}(s')$.

Proof. If $s \in R_{\theta_S}(s')$, it follows from Proposition 1 that $s' \in \mathrm{RPS}(w, w')$. Conversely, if $s \in \mathrm{RPS}(w, w')$ and $s' \in \mathrm{RPS}(w, w')$, it follows from Proposition 2 that $R_{\gamma_S}(s') = R_{\gamma_S}(s)$ and, since $R_{\theta_S}$ is confluent from Corollary 1, there exists $s'' \in R_{\theta_S}(s) \cap R_{\theta_S}(s')$. Since $s'' = s$ from Corollary 2, we get $s \in R_{\theta_S}(s')$. □

We directly get by Theorem 2 and Corollary 2:

Corollary 3. If $s \in \mathrm{RPS}(w, w')$ then $s = \mathrm{leftrps}(w, w')$ if and only if $s$ is in $R_{\theta_S}$ normal form, i.e., $(i, j) \notin \theta_S$ for all $ij \in F(s)$.

In the next section, the contributions of left derivations will play a crucial role and we call these contributions left contributions. The next proposition points out links between left contributions and the semi-commutation $\theta_S$.


Proposition 3. A contribution c is a left contribution if and only if the property (P ): ∀i j ∈ F(c ), ( j , i ) ∈ / θ S is satisﬁed. Proof. Assume ﬁrst that c satisﬁes (P ). Since c is a contribution, there exists a rps s such that c = contt (s) for some t. If s is left, we are done. Otherwise, one use an induction over the number of steps of rewriting in R θ S that are needed to transform s into its equivalent left rps. Let s = s lms with (l, m) ∈ θ S . If l ≤ t ≤ m + n, denoting i = t − l and j = t − m, we get i j ∈ F(c ) and ( j , i ) ∈ θ S , a contradiction. So t < l or t > m + n. This implies contt (s mls ) = contt (s) = c and it follows by induction that c is a contribution of a left derivation. Conversely, let c = contt (s) for some rps s. Assume that there exists i j in F(c ) such that ( j , i ) ∈ θ S . It follows s = s (t − i )s (t − j )s with contt (s ) = ε . Thus for all letters k in s , t < k or k < t − n, moreover (t − i , t − j ) ∈ θ S since (t − i ) −(t − j ) = j − i. Assume that s = s1 s2 for all s1 = t (s ). This implies s = s1 k1 k2 s2 with k1 > t and k2 < t − n and it follows that (k1 , k2 ) ∈ θ S so s is not left. If s = s1 s2 , one can distinguish three cases: 1. if s1 = ε , then s = s (t − i )k1 s1 s2 (t − j )s with k1 < t − n ≤ t − j. It follows that (t − i )k1 is not left by Lemma 5 so s is not left. 2. if s2 = ε , we get s = s (t − i )s1 s2 k2 (t − j )s with k2 > t ≥ t − i. It follows that k2 (t − j ) is not left by Lemma 5 so s is not left. 3. if s1 = s2 = ε , then s = s (t − i )(t − j )s is not left since (t − i , t − j ) ∈ θ S . 2 Corollary 4. A rps s is left if and only if: 1. if i j ∈ F(s) then i ≤ j + n and 2. all the contributions of s are left. Proof. We only have to prove the if part. If s is a non-left rps, it follows from Corollary 3 that there exists i j ∈ F(s) with (i , j ) ∈ θ S . Assume i ≤ j + n, it follows 0m ∈ F(conti (s)) with m = i − j and (m, 0) ∈ θ S and by Proposition 3 conti (s) is not a left contribution. 
2 As a consequence we also get: Corollary 5. It holds that all the contributions are left if and only if X ∩ Z = ∅. To ﬁnish this section, we now prove a property, that extends the result of Lemma 3 and that will be useful in the next section. We need the following lemma: Lemma 19. Let w 0 w 1 . . . w k − → w 0 w 1 . . . w k for some letters w 0 , w 1 , . . . , w k , w 0 , w 1 , . . . , w k and some sequence s. Let 0 ≤ i ≤ s

k − r + 1 such that for every j ∈ F(s), j + n < i + r or j ≥ i then: 1. w 0 . . . w i +r −1 −−−−→ w 0 . . . w i +r −1 , < i ( s )

2. if |s|i = 0 then w i +r = w i +r , 3. if w i +r = v r then (|s|i = 0 ∧ w i +r = v r ). Proof.

→ 1. The property is clearly true if s = ε . Now let s = js for some integer j and some sequence s and w 0 w 1 . . . w k − j w 0 w 1 . . . w k −→ w 0 w 1 . . . w k . By inductive hypothesis, we have w 0 . . . w i +r −1 −−−−− → w 0 . . . w i +r −1 . Now, if j ≥ i s

< i ( s )

then w 0 . . . w i +r −1 = w 0 . . . w i +r −1 and

2. Clearly, it is sufficient to prove the property for $s = j$ for some integer $j \neq i$. If $j + n < i + r$ or if $j > i + r$ then $w_{i+r}$ is not affected by the step of rewriting so $w'_{i+r} = w_{i+r}$; else by hypothesis $i < j \le i + r$ so $w_{i+r} = u_{i+r-j}$, moreover since $i + r - j < r$, we get $u_{i+r-j} = v_{i+r-j} = w'_{i+r}$.

3. As above, we only have to consider the case $s = j$ for some integer $j$. Since $w_{i+r} \neq u_r$, it follows $j \neq i$ and, from 2, $w'_{i+r} = w_{i+r} = v_r$. □

We can now prove the following property:

Proposition 4. For all left derivations $xy \xrightarrow{s} x'y'$ and $xz \xrightarrow{s'} x'z'$ where $x$, $y$, $z$, $x'$, $y'$ and $z'$ are words with $|x| = |x'|$, for every $0 \le i < |x| - r$, $\mathrm{cont}_i(s) = \mathrm{cont}_i(s')$.


Proof. The proof is by contradiction: let xy − → x y and xz −→ x z be two left derivations for some words x, y , z, x , y , z s s with |x| = |x | that contradict the above statement. Moreover, we assume that K = |x| + |s| + |s | is minimal. Let i be the smallest integer such that conti (s) = conti (s ). This implies i = |x| − r − 1 else we could choose a shorter x and a lower K . Since conti (s) = conti (s ) and for every 0 ≤ i < i, conti (s) = conti (s ), we get that |s|i + |s |i > 0. Furthermore, from Lemma 3, |s|i = |s |i so s and s are not empty. Then s = i 1 s1 and s = i 1 s1 for some integers i 1 , i 1 and some sequences s1 , s1 . Let us suppose i 1 = i 1 = j then xy − → x y −−→ x y with |x | = |x| and xz − → x z −− x z but this leads to a contradiction since s→ j

j

s1

1

|x | + |s1 | + |s1 | < |x| + |s| + |s | so i 1 = i 1 . We assume in the following i 1 > i 1 . Observe that, from the hypothesis on i, it follows by induction on |s| + |s | that i. Since i = |x| − r − 1, we have xy −i→ xy −− → x y . Moreover, conti (i 1 s1 ) = s1 z , we obtain again1 a contradiction conti (s1 ) so taking the left derivations xy − − → x y and xz −→ x since |x| + |s1 | + |s | < s1 s |x| + |s| + |s |; it follows i 1 = i and i 1 < i. Since
as follows:

s

x0 . . . xi +r y − → w 0 . . . w i +r y − −−−−→ w 0 . . . w i +r y − −→ x y α k ...α k δ i

1 1

t

t

and

x0 . . . xi +r z − −−−−→ w 0 . . . w i +r z − −→ x z k ...β k δ 1

t

t −1

with δ = αt +1 . . . k p α p +1 , δ = βt . . . k p β p if t < p and δ = α p +1 , δ = β p if t = p. In particular, w i +r = v r and, from item 3 of Lemma 19, w i +r = v r . On the other hand, from item 2 of Lemma 19 we get w i +r = xi +r = u r . But this leads to a contradiction since the derivations w 0 . . . w i +r y − −→ x y and w 0 . . . w i +r z −k−− → x z k δ δ t

t

imply w i +r = w i +r = u i +r −kt . This contradiction proves that k j + n < i + r for every j ≤ t. Let us now consider the two rps i α1k1 . . . αt kt and k1 β1 . . . kt β i and their corresponding derivations from xy and xz. We have:

x0 . . . xi +r −1 u r y − → x0 . . . xi +r −1 v r y − −−−−−→ x0 . . . xi +r −1 xi +r y α k ...α k i

t t

1 1

and

x0 . . . xi +r −1 u r z − −−−→ x0 . . . xi +r −1 ur z − → x0 . . . xi +r −1 v r z . k ...k β 1

t

i

From Item 1 of Lemma 19, we have x0 . . . xi +r −1 −−−−→ x0 . . . xi +r −1 and x0 . . . xi +r −1 −−−−→ x0 . . . xi +r −1 so x0 . . . xi +r −1 = k1 ...kt k1 ...kt x0 . . . xi +r −1 . Moreover, by Item 3 of Lemma 19, xi +r = v r which implies x0 . . . xi +r = x0 . . . xi +r −1 v r , and from the minimality of K = |x| + |s| + |s |, we have p = t, |s|i = |s |i = 1, α p +1 = ε and β p = β p i. In particular, denoting w = x0 . . . xi +r −1 u r . . . un and w = x0 . . . xi +r −1 v r . . . v n , it follows the existence of the two following derivations:

$$w \xrightarrow{i} x_0 \ldots x_{i+r-1} v_r \ldots v_n \xrightarrow{k_1 \ldots k_p} w'$$

$$w \xrightarrow{k_1 \ldots k_p} x'_0 \ldots x'_{i+r-1} u_r \ldots u_n \xrightarrow{i} w'.$$

From Proposition 2, $R_{\gamma_S}(ik_1 \ldots k_p) = R_{\gamma_S}(k_1 \ldots k_p i)$. This implies that for any $k \in \{k_1, \ldots, k_p\}$, $(i, k) \in \theta_S$ since $k + n < i + r$. Let us finally consider the rps $s = i\alpha_1 k_1 \ldots \alpha_p k_p$ and let us distinguish two cases:

1. if $\alpha_1 = \varepsilon$, it follows that $ik_1$ is not a left rps since $(i, k_1) \in \theta_S$ and that implies that $s$ is not a left rps either, a contradiction.
2. $\alpha_1 = \alpha_1' j$ for some integer $j$. Since $(i, k_1) \in \theta_S$ and $j \ge i$, it follows from Lemma 5 that $jk_1$ is not left and that implies that $s$ is not left, a contradiction. □

4. A transducer for one-rule length-preserving system

In a previous paper [13], we addressed the problem of deciding whether the relation induced by a one-rule length-preserving rewrite system is rational. We have studied and partially proved the following conjecture proposed in [14,21]:

Conjecture 1. A non-trivial one-rule length-preserving rewrite system is a rational transduction if and only if it is not quasi-conjugate.

Fig. 2. The contributions of the left derivation from aababaabaa to aaababaaba.

Here a one-rule system $S = \{u \to v\}$ is called quasi-conjugate if there are words $x$, $y$ and $z$ such that $u = xzy$ and $v = yzx$. In [13] we proved the only if part of the conjecture and, conversely, we considered two cases for which the if part is satisfied. These cases are based on the kind of overlaps that exist between $u$ and $v$ and depend on the presence of short pairs of overlaps, that is, pairs of overlaps $(x, z) \in X \times Z$ such that $|xz| \le |u|$, and on the presence of large pairs of overlaps, that is, pairs of overlaps $(x, z) \in X \times Z$ such that $|xz| > |u|$. In this context, the aim of this section is, given a one-rule length-preserving rewrite system $S$, to define a canonical automaton that recognizes the following language $L_S$:

Definition 3. With every one-rule length-preserving rewrite system $S$, we associate the language $L_S = \{w \otimes w' \mid w \xrightarrow{*} w'\}$ where $\otimes$ is the binary operation defined from $\bigcup_{n \ge 0} A^n \times A^n$ to $(A \times A)^*$ by:

• $\varepsilon \otimes \varepsilon = (\varepsilon, \varepsilon)$
• $xw \otimes yw' = (x, y)(w \otimes w')$, where $x, y \in A$ and $w, w' \in A^*$.

A one-rule length-preserving rewrite system $S$ is a rational transduction if and only if the language $L_S$ is regular [6,5]. The language $L_S$ need not be regular: for instance, for $S_1 = \{ba \to ab\}$, $a^m b^n \in S_1(b^m a^n)$ if and only if $m = n$ so

$$L_{S_1} \cap (b, a)^*(a, b)^* = \{(b, a)^n (a, b)^n \mid n \ge 0\}.$$

As a matter of fact, $L_S$ need not even be context-free:

Example 4. Let $S_6 = \{baa \to aab\}$; it is easily seen that $(aa)^{n+p}(ba)^m(bb)^q \in S_6((ba)^n(bb)^p(aa)^{m+q})$ if and only if $n = m$ and $p = q$. It follows that

$$L_{S_6} \cap ((b,a)(a,a))^*((b,a)(b,a))^*((a,b)(a,a))^*((a,b)(a,b))^* = \{((b,a)(a,a))^n((b,a)(b,a))^p((a,b)(a,a))^n((a,b)(a,b))^p \mid n, p \ge 0\}$$

which is not context-free.

Nevertheless it is possible to give a definition of an automaton recognizing $L_S$ where the states are exactly all left contributions, even if, as shown by the previous examples, this automaton may be infinite (i.e. it may have an infinite number of states). On the contrary, this automaton will always be finite when $S$ is a rational transduction, as, for instance, in the cases pointed out in [13]. Let us first give an example that illustrates the idea of the construction.

Example 5. Let $S_7 = \{abaa \to aaba\}$. We have $u_0 = u_2 = u_3 = v_0 = v_1 = v_3 = a$ and $u_1 = v_2 = b$. Let us consider in Fig. 2 the left derivation from $w = aababaabaa$ to $w' = aaababaaba$ and its corresponding contributions. This would lead to the automaton part given in Fig. 3. One can make several observations on this example:

• In this part of the automaton, the left derivation $w \xrightarrow{s} w'$ with $s = 3.1.6$ corresponds to the successful path going successively through the states $\mathrm{cont}_0(s) = \varepsilon$, $\mathrm{cont}_1(s) = 0$, $\mathrm{cont}_2(s) = 1$, $\mathrm{cont}_3(s) = 0.2$, $\mathrm{cont}_4(s) = 1.3$, $\mathrm{cont}_5(s) = 2$, $\mathrm{cont}_6(s) = 3.0$, $\mathrm{cont}_7(s) = 1$, $\mathrm{cont}_8(s) = 2$ and $\mathrm{cont}_9(s) = 3$,
• the successors of the different states can easily be computed: for instance the successor of the state $\mathrm{cont}_3(s) = 0.2$ is the state $\mathrm{cont}_4(s) = (0 + 1).(2 + 1) = 1.3$,
• the state 1 has two successors: $2 = 1 + 1$ and $0.2 = 0.(1 + 1)$ because $v_0 = u_2$,


Fig. 3. An automaton part corresponding to a left derivation.

• the label of every transition $(q, x/y, q')$ with $q' \neq \varepsilon$ is completely defined by the destination state of the transition: for instance in the label $b/a$ of the transition $(0.2, w_4/w'_4, 1.3) = (0.2, b/a, 1.3)$, we have $b = u_1$ where 1 is the first index of 1.3 and $a = v_3$ where 3 is the last index of 1.3.

We shall now give precisely the construction. Since (left) contributions play a central role in this automaton, we now give some basic properties that are satisfied by contributions:

Property 2.
1. every factor of a contribution is a contribution,
2. every factor of a left contribution is a left contribution,
3. no contribution contains 00 as a factor,
4. no contribution contains $nn$ as a factor.

Proof. 1 is a consequence of item 1 of Property 1 and 2 is a consequence of Proposition 3. For 3, it is sufficient to prove that 00 is not a contribution. Let $s$ be a rps such that there exists an index $j$ that satisfies $\mathrm{cont}_j(s) = 00$. Then $s = s_1 j s' j s_2$ with $\mathrm{cont}_j(s') = \varepsilon$. It follows that every $i$ with $|s'|_i > 0$ satisfies $i < j - n$ or $i > j$. Let $s'_{<}$ (resp. $s'_{>}$) be the subsequence of $s'$ formed by its letters smaller than $j - n$ (resp. greater than $j$). Since $s'_{<}\, j\, s'_{>}\, j \in R_{\theta_S}(js'j)$, we get by Proposition 1 that $s'_{<}\, j\, s'_{>}\, j$ is a rps and it follows that there exists a rps $js''j$ such that for every $i$ with $|s''|_i > 0$, $i > j$. From item 5 of Property 1 we obtain that there exists a rps $0s'''0$, which implies $S(vA^*) \cap uA^* \neq \emptyset$, a contradiction. Symmetrically, one can prove 4 using the property $S(A^*v) \cap A^*u = \emptyset$. □

Note that Property 2 does not imply that no contribution contains $ii$ as a factor if $i \neq 0$ and $i \neq n$. Indeed let us consider $S_8 = \{cba \to abc\}$. We have $\mathrm{leftrps}(cbcbaba, ababcbc) = 2.0.4.2$, $\mathrm{cont}_2(2.0.4.2) = 0.2.0$ and $\mathrm{cont}_3(2.0.4.2) = 1.1$.

As we said before, the states of our automaton will be the left contributions. The difficulty is that we do not know how to compute the set of all these left contributions or how to decide, given a sequence $c = i_1 \ldots i_k$ of $\mathbb{N}_n^*$, whether this sequence is a (left) contribution or not. Of course, if $c = \varepsilon$ or if $|c| = 1$ then $c$ is a left contribution; else, if $k > 1$, we must apply a first filter, and we consider the sequences $c = i_1 \ldots i_k \in \mathbb{N}_n^*$ that satisfy the following properties that must be satisfied by left contributions:

P1) $v_{i_1} v_{i_2} \ldots v_{i_{k-1}} = u_{i_2} \ldots u_{i_k}$ (coming from Lemma 2)
P2) $\forall\, 1 \le t < k$, $(i_t i_{t+1} \neq 00) \wedge (i_t i_{t+1} \neq nn)$ (coming from Property 2)
P3) $\forall\, 1 \le t < k$, $(i_{t+1}, i_t) \notin \theta_S$ (coming from Proposition 3).

We define the set of state candidates (denoted $SC_S$) as the set of sequences that satisfy the three properties P1, P2 and P3. In particular, we have $\varepsilon \in SC_S$ and $\mathbb{N}_n \subseteq SC_S$. We call these sequences candidates because the three properties P1, P2 and P3 are not sufficient to ensure that a state candidate will really be a state in the automaton. The reason is that the candidate must appear as a contribution in a successful rewriting and a definition of accessible and co-accessible contributions is needed.

Let $h$ be the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by $h(n) = \varepsilon$ and $h(i) = i + 1$ for $i \neq n$ and let $\pi$ be the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by $\pi(0) = \varepsilon$ and $\pi(i) = i$ for $i \neq 0$. Note that if $c = \mathrm{cont}_i(s)$ for some $s$ and $c' = \mathrm{cont}_{i+1}(s)$ then $\pi(c') = h(c)$, so we define the operation successor over the set of state candidates $SC_S$, denoted succ, by:

$$\forall c \in SC_S,\ \mathrm{succ}(c) = \{c' \in SC_S \mid \pi(c') = h(c)\}.$$
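The relation $\pi(c') = h(c)$ between consecutive contributions can be observed on the rps $3.1.6$ of Example 5. The Python sketch below is only an illustration: the formula used for `cont` is our reading of the definition of contributions given earlier in the paper (the offsets $t - i$ of the steps $i$ of $s$ that cover position $t$), checked against the values listed in Example 5, and all function names are hypothetical.

```python
def cont(t, s, n):
    """Contribution at position t: offsets t - i of the steps i of s covering t."""
    return [t - i for i in s if i <= t <= i + n]


def h(c, n):
    """Morphism h: erase the letter n, shift every other letter by one."""
    return [i + 1 for i in c if i != n]


def pi(c):
    """Morphism pi: erase the letter 0, keep every other letter."""
    return [i for i in c if i != 0]


n = 3            # |u| - 1 for u = abaa
s = [3, 1, 6]    # the rps 3.1.6 of Example 5
contributions = [cont(t, s, n) for t in range(10)]

# Consecutive contributions are related by pi(c') = h(c).
successor_ok = all(
    pi(contributions[t + 1]) == h(contributions[t], n) for t in range(9)
)
```

The list `contributions` reproduces the states $\varepsilon, 0, 1, 0.2, 1.3, 2, 3.0, 1, 2, 3$ quoted in Example 5.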


It is easily seen that the operation succ satisfies the following properties:

• $\mathrm{succ}(\varepsilon) = \mathrm{succ}(n) = \{\varepsilon, 0\}$,
• $\forall\, 0 \le i < n$, $i + 1 \in \mathrm{succ}(i)$,
• if $iq' \in \mathrm{succ}((i - 1)q)$ then $q' \in \mathrm{succ}(q)$,
• if $qnq'$ and $qq'$ are in $SC_S$ then $\mathrm{succ}(qnq') = \mathrm{succ}(qq')$,
• if $q \in \mathrm{succ}(p)$ then $|q| + 1 \le 2(|p| + 1)$ and $|p| + 1 \le 2(|q| + 1)$.

Symmetrically, we define the operation predecessor over the set of state candidates $SC_S$, denoted pred, by: $\forall c \in SC_S$, $\mathrm{pred}(c) = \{c' \in SC_S \mid c \in \mathrm{succ}(c')\}$. Since the operations succ and pred can be seen as binary relations over $SC_S$, we also define their reflexive and transitive closures that we denote $\mathrm{succ}^*$ and $\mathrm{pred}^*$. Clearly, for every $i \in \mathbb{N}_n$, $\mathrm{succ}^*(\varepsilon) = \mathrm{succ}^*(i)$ and $\mathrm{pred}^*(\varepsilon) = \mathrm{pred}^*(i)$.

Example 6. Let $S_8 = \{cba \to abc\}$. Note that $\theta_{S_8} = \emptyset$. We have $\mathrm{succ}(\varepsilon) = \{\varepsilon, 0\}$ and $\mathrm{succ}(0) = \{1\}$: indeed neither $0.1$ nor $1.0$ belongs to $SC_{S_8}$ by property P1. We have $\mathrm{succ}(1) = \{2, 0.2, 2.0, 0.2.0\}$: indeed $v_0 = u_2 = a$ and $v_2 = u_0 = c$. Since $\mathrm{pred}(1.1) = \{0.2.0, 2.0.2.0, 0.2.0.2, 2.0.2.0.2\}$, we have $1.1 \in \mathrm{succ}^*(\varepsilon)$.

We are now able to give the definition of our automaton:

Definition 4. Let $S = \{u \to v\}$ be a one-rule length-preserving rewrite system. The left-contribution automaton $LCA_S = (A \times A, Q_{lca}, I_{lca}, F_{lca}, \Delta_{lca})$, where $Q_{lca}$ is the set of states, $I_{lca} \subseteq Q_{lca}$ is the set of initial states, $F_{lca} \subseteq Q_{lca}$ is the set of final states and $\Delta_{lca}$ is the set of transitions, is defined by:

• $Q_{lca} = \mathrm{succ}^*(\varepsilon) \cap \mathrm{pred}^*(n)$
• $I_{lca} = \{\varepsilon\}$
• $F_{lca} = \{\varepsilon, n\}$
• $\Delta_{lca} = \{(c, x/y, c') \mid c' \in \mathrm{succ}(c),\ c' = i_1 \ldots i_k \neq \varepsilon,\ x = u_{i_1},\ y = v_{i_k}\} \cup \{(\varepsilon, x/x, \varepsilon) \mid x \in A\} \cup \{(n, x/x, \varepsilon) \mid x \in A\}$.
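The state set of Definition 4 can be explored mechanically when it is finite. The following Python sketch is a hedged illustration, not the authors' procedure: the names `succ`, `lca_states`, `in_theta` and `max_states` are hypothetical, membership in $\theta_S$ must be supplied by the caller through `in_theta`, and the default (empty relation) is an assumption. That default matches $\theta_{S_8} = \emptyset$ from Example 6; for $\{aba \to bba\}$ it is merely an assumption under which the sketch returns the states that the paper reports for that system.

```python
from itertools import product


def h(c, n):
    """Morphism h: erase n, shift every other letter by one."""
    return tuple(i + 1 for i in c if i != n)


def is_candidate(c, u, v, in_theta):
    """Check P1, P2 and P3 on every pair of consecutive letters of c."""
    n = len(u) - 1
    for a, b in zip(c, c[1:]):
        if v[a] != u[b]:                  # P1
            return False
        if (a, b) in ((0, 0), (n, n)):    # P2
            return False
        if in_theta(b, a):                # P3
            return False
    return True


def succ(c, u, v, in_theta=lambda j, i: False):
    """State candidates c' with pi(c') = h(c)."""
    base = h(c, len(u) - 1)
    found = set()
    # By P2, at most one 0 can be inserted in each of the len(base) + 1 gaps.
    for mask in product((False, True), repeat=len(base) + 1):
        cand = []
        for insert_zero, letter in zip(mask, base + (None,)):
            if insert_zero:
                cand.append(0)
            if letter is not None:
                cand.append(letter)
        if is_candidate(tuple(cand), u, v, in_theta):
            found.add(tuple(cand))
    return found


def lca_states(u, v, in_theta=lambda j, i: False, max_states=1000):
    """succ*(epsilon) intersected with pred*(n), explored up to max_states."""
    n = len(u) - 1
    edges, seen, todo = {}, {()}, [()]
    while todo:
        c = todo.pop()
        edges[c] = succ(c, u, v, in_theta)
        for d in edges[c] - seen:
            if len(seen) >= max_states:
                raise RuntimeError("bound reached: the automaton may be infinite")
            seen.add(d)
            todo.append(d)
    # Backward closure: keep the states from which the state n is reachable.
    good = {(n,)} & seen
    changed = True
    while changed:
        changed = False
        for c, targets in edges.items():
            if c not in good and targets & good:
                good.add(c)
                changed = True
    return good
```

For example, `succ((1,), "cba", "abc")` returns the four candidates listed in Example 6, and `lca_states("aba", "bba")` returns five states. The bound `max_states` is needed because the exploration does not terminate on systems whose automaton is infinite.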

Note that, by its definition based on succ and pred, the state set $Q_{lca}$ satisfies an important property: it is closed under factors. We also observe that $LCA_S$ is a trim automaton (i.e. all states are accessible and co-accessible) and it has a single initial state. Also note the three following properties satisfied by the set of transitions $\Delta_{lca}$:

P4) for $0 < i \le n$, if $((i - 1)q, u_i/y, iq') \in \Delta_{lca}$ then $(q, v_i/y, q') \in \Delta_{lca}$,
P5) if $(q, u_0/y, 0q') \in \Delta_{lca}$ then $(q, v_0/y, q') \in \Delta_{lca}$,
P6) if $(nq, x/y, q') \in \Delta_{lca}$ then $(q, x/y, q') \in \Delta_{lca}$.

In order to prove the correctness of this automaton, that is, that $LCA_S$ recognizes the language $L_S$, we first show that every successful path in the automaton is associated with a left derivation:

Lemma 20. Let $(\varepsilon, x_0/y_0, q_0)(q_0, x_1/y_1, q_1) \ldots (q_{k-1}, x_k/y_k, q_k)$ be a successful path in the automaton $LCA_S$ and let $w = x_0 x_1 \ldots x_k$ and $w' = y_0 y_1 \ldots y_k$. Then $w' \in S(w)$ and for every $i \in [0, k]$, $q_i = \mathrm{cont}_i(s)$ where $s = \mathrm{leftrps}(w, w')$.

Proof. The proof is an induction on $N = \sum_{i \in [0,k]} |q_i|_0$. If $N = 0$ then for every $i \in [0, k]$, $q_i = \varepsilon$, so $q_i = \mathrm{cont}_i(\varepsilon)$ and $w = w'$. If $N > 0$, let us consider the smallest $i$ such that

$$q_i = 0q'_i,\ q_{i+1} = 1q'_{i+1},\ \ldots,\ q_{i+n} = nq'_{i+n}.$$

Such an index $i$ exists since the path is successful: for instance consider the greatest $j$ such that $q_j = 0q'_j$. Note that $i + n$ is the smallest index such that $q_{i+n} \in n\mathbb{N}_n^*$. We get $x_i = u_0$, $x_{i+1} = u_1$, ..., $x_{i+n} = u_n$ so it follows that $(q_{i-1}, u_0/y_i, q_i), (q_i, u_1/y_{i+1}, q_{i+1}), \ldots, (q_{i+n-1}, u_n/y_{i+n}, q_{i+n}) \in \Delta_{lca}$. This implies $(q_{i-1}, v_0/y_i, q'_i) \in \Delta_{lca}$ from P5 and we also get from P4:

$$(q'_i, v_1/y_{i+1}, q'_{i+1}), \ldots, (q'_{i+n-1}, v_n/y_{i+n}, q'_{i+n}) \in \Delta_{lca}.$$

Moreover, since $q_{i+n} = nq'_{i+n}$, if $i + n < k$, we get from P6:

$$(q'_{i+n}, x_{i+n+1}/y_{i+n+1}, q_{i+n+1}) \in \Delta_{lca}$$

and, if $i + n = k$, it follows that $q'_{i+n} = \varepsilon$. So, in both cases, we get a new successful path and, using the inductive hypothesis, we obtain the left derivation


$$x_0 x_1 \ldots x_{i-1}\, v\, x_{i+n+1} \ldots x_k \xrightarrow{s'} w'$$

with $\forall j \in [0, i - 1] \cup [i + n + 1, k]$, $q_j = \mathrm{cont}_j(s')$, and $\forall j \in [i, i + n]$, $q'_j = \mathrm{cont}_j(s')$. Let $s = is'$; it remains to prove:

1. $\forall\, 0 \le j \le k$, $q_j = \mathrm{cont}_j(is')$,
2. $is'$ is left.

For 1, it is sufficient to observe that

$$\forall j \in [0, i - 1] \cup [i + n + 1, k],\ q_j = \mathrm{cont}_j(s') = \mathrm{cont}_j(is') \quad\text{and}\quad \forall j \in [i, i + n],\ q_j = (j - i)q'_j = \mathrm{cont}_j(is').$$

For 2, the property is clearly true if $s' = \varepsilon$, so let $s' = js''$ for some $j$ and $s''$. For a proof by contradiction, assume that $i > j + n$. Then it follows that

$$q_j = \mathrm{cont}_j(s) = \mathrm{cont}_j(ijs'') = \mathrm{cont}_j(js'') = 0q'_j,\ q_{j+1} = \mathrm{cont}_{j+1}(js'') = 1q'_{j+1},\ \ldots,\ q_{j+n} = nq'_{j+n},$$

a contradiction to $i$ being smallest. So $i \le j + n$. By 1, all the contributions of $is'$ are left contributions, and then by Corollary 4, $is'$ is left. □

As a direct consequence of Lemma 20, we obtain:

Corollary 6. The automaton $LCA_S$ is unambiguous.3

We shall now prove that every word of $L_S$ is recognized by automaton $LCA_S$:

Lemma 21. Let $w$, $w'$ be two words with $w' \in S(w)$; then there exists a successful path in automaton $LCA_S$ that is labeled by $w \otimes w'$.

Proof. Let $w$, $w'$ be two words with $w' \in S(w)$ and $s = \mathrm{leftrps}(w, w')$. We prove the lemma by induction over $|s|$, the length of the rps $s$.

• If $s = \varepsilon$ then $w = w'$ and there clearly exists a successful path, labeled by $w \otimes w'$, from the state $\varepsilon$ to the state $\varepsilon$, using the transitions in $\{(\varepsilon, x/x, \varepsilon) \mid x \in A\}$.
• If $s \neq \varepsilon$ then $s = s'i$ and there exist words $w_1$, $\alpha$, $w_2$, $w'_1$, $w'_2$ with $|w_1| = |w'_1| = i$ and $|w_2| = |w'_2|$ such that $w = w_1 \alpha w_2 \xrightarrow{s'} w'_1 u w'_2 \xrightarrow{i} w'_1 v w'_2 = w'$. By inductive hypothesis, there exists a successful path in $LCA_S$, labeled by $w \otimes w'_1 u w'_2$. This path can be decomposed in three paths: one path from the state $\varepsilon$ to a state $q$ labeled by $w_1 \otimes w'_1$, a path from $q$ to a state $q'$ labeled by $\alpha \otimes u$ and a path from $q'$ to a final state $q''$ labeled by $w_2 \otimes w'_2$. We shall prove that there exists a path from $q$ to $q'n$ labeled by $\alpha \otimes v$ that will give a successful path in $LCA_S$ labeled by $w \otimes w'$. We can write the path from the state $q$ to the state $q'$ labeled by $\alpha \otimes u$ as the concatenation of transitions:

$$(q, \alpha_0/u_0, q_0)(q_0, \alpha_1/u_1, q_1) \ldots (q_{n-1}, \alpha_n/u_n, q_n)$$

with $q_n = q'$ and, from Lemma 20, we have for every $j \in [0, n]$, $q_j = \mathrm{cont}_{i+j}(s')$. It follows that for every $j \in [0, n]$, $q_j j = \mathrm{cont}_{i+j}(s)$ is a state in the automaton $LCA_S$. Moreover, we have $q_0 0 \in \mathrm{succ}(q)$ and for every $j \in [1, n]$, $q_j j \in \mathrm{succ}(q_{j-1}(j - 1))$. Now, if $q_0 = \varepsilon$ it follows that $\alpha_0 = u_0$, $q \in \{\varepsilon, n\}$ and $(q, \alpha_0/v_0, 0) \in \Delta_{lca}$; else, if $q_0 = i_1 \ldots i_k$ then $\alpha_0 = u_{i_1}$ and $(q, \alpha_0/v_0, q_0 0) \in \Delta_{lca}$. Similarly, we have for every $j \in [1, n]$, $(q_{j-1}(j - 1), \alpha_j/v_j, q_j j) \in \Delta_{lca}$ so there exists a path from $q$ to $q'n$ labeled by $\alpha \otimes v$. We now distinguish two cases:

1. If $w_2 = \varepsilon$ then $q'$ is final and, since $q'n$ is a state, it follows that $q' = \varepsilon$ and $q'n = n$ is final.
2. If $w_2 = xz$ for some letter $x$ then $w'_2 = yz'$ for some letter $y$ and there exists a state $p$ such that $(q', x/y, p) \in \Delta_{lca}$ and such that there exists a path from $p$ to the final state $q''$ labeled by $z \otimes z'$. Since $(q'n, x/y, p) \in \Delta_{lca}$ from the definition of succ, there exists a path from $q$ to $q''$ labeled by $\alpha w_2 \otimes v w'_2$.

In all cases we obtain a successful path in automaton $LCA_S$ that is labeled by $w \otimes w'$. □

By Lemma 20 and Lemma 21, we get:

3 We consider here the ambiguity of the automaton LCA S recognizing the regular language L S and not the ambiguity of the associated transducer that computes S ( w ) for every word w.


Fig. 4. The automaton LCA S 9 for S 9 = {aba → bba}.

Proposition 5. For every one-rule length-preserving rewrite system $S$, the left-contribution automaton $LCA_S$ recognizes the language $L_S$.

Note that another consequence of Lemma 20 and Lemma 21 is that the state set $Q_{lca}$ is exactly the set of all the left contributions.

Example 7. Let $S_9 = \{aba \to bba\}$. Let us compute the set of accessible states of the automaton $LCA_{S_9}$ starting from the initial state $\varepsilon$: $\mathrm{succ}(\varepsilon) = \{\varepsilon, 0\}$; $\mathrm{succ}(0) = \{1, 0.1\}$ because $1.0$ does not satisfy property P1; $\mathrm{succ}(1) = \{2, 2.0\}$ ($0.2$ does not satisfy property P1); $\mathrm{succ}(0.1) = \emptyset$ because $1.2$, $1.0$ and $0.2$ do not satisfy property P1. It follows that $0.1$ is not co-accessible and will not be a state in the automaton; $\mathrm{succ}(2) = \{\varepsilon, 0\}$ and finally $\mathrm{succ}(2.0) = \{1, 0.1\}$ and, as $0.1$ is not co-accessible, it is not in $Q_{lca}$ and there is a single outgoing transition leaving state $2.0$. This gives the automaton $LCA_{S_9}$ of Fig. 4.

The automaton in Example 7 is not only unambiguous but also deterministic. Observe that it would not be the case if $u_0 = v_0$ since in this case there exist a transition $(\varepsilon, u_0/u_0, \varepsilon)$ and a transition $(\varepsilon, u_0/u_0, 0)$ in $LCA_S$. Conversely, we can prove that $u_0 \neq v_0$ is sufficient to ensure that $LCA_S$ is deterministic:

Proposition 6. The automaton $LCA_S$ is deterministic if and only if $u_0 \neq v_0$.

Proof. Let $q$, $q'$, $q''$ be states in $LCA_S$ with $q' \neq q''$ and $x$, $y$ be two letters such that $(q, x/y, q') \in \Delta_{lca}$ and $(q, x/y, q'') \in \Delta_{lca}$. We claim that $u_0 = v_0$. If $q \in \{\varepsilon, n\}$ then $\{q', q''\} = \{\varepsilon, 0\}$. We can suppose $q' = \varepsilon$ and $q'' = 0$; then the transition $(q, x/y, \varepsilon)$ implies $x = y$ and the transition $(q, x/y, 0)$ implies $x = u_0$ and $y = v_0$, so $u_0 = v_0$. If $q \notin \{\varepsilon, n\}$ then $\varepsilon \notin \{q', q''\}$. Since $q' \in \mathrm{succ}(q)$, $q'' \in \mathrm{succ}(q)$ and $q' \neq q''$, we can distinguish three cases:

1. $q' = c_1 ij c_2$ and $q'' = c'_1 i0j c'_2$ for some sequences $c_1$, $c_2$, $c'_1$, $c'_2$ and some integers $i$, $j$. It follows from property P1 that $v_i = u_j$, $v_i = u_0$ and $v_0 = u_j$, which implies $u_0 = v_0$.
2. $q' = c_1 i$ and $q'' = c'_1 i0$ for some sequences $c_1$, $c'_1$ and some integer $i$. It follows from property P1 that $v_i = u_0$ and from the definition of the transitions in $LCA_S$ that $v_0 = y = v_i$, which implies $u_0 = v_0$.
3. $q' = ic_1$ and $q'' = 0ic'_1$ for some sequences $c_1$, $c'_1$ and some integer $i$. It follows from property P1 that $v_0 = u_i$ and from the definition of the transitions in $LCA_S$ that $u_i = x = u_0$, which again implies $u_0 = v_0$. □

Even if automaton $LCA_S$ is not deterministic (which happens when $u_0 = v_0$), it is in fact not very far from being deterministic. More precisely, as a consequence of Proposition 4 and from Lemma 20, we obtain that automaton $LCA_S$ has a bounded delay (see [1]). This property generalizes the notion of deterministic automata since a deterministic automaton is an automaton with a delay 0.

Definition 5. Let $\mathcal{A} = (A, Q, I, F, \Delta)$ be an automaton over alphabet $A$ where $Q$ is the set of states, $I \subseteq Q$ is the set of initial states, $F \subseteq Q$ is the set of final states and $\Delta \subseteq Q \times A \times Q$ is the set of transitions. Automaton $\mathcal{A}$ has a bounded delay $\delta \ge 0$ if $q_1 = q'_1$ holds for all transitions $(p, a, q_1)$ and $(p, a, q'_1)$ and paths from state $q_1$ to some state $q_2$ and from state $q'_1$ to some state $q'_2$ labeled by a same word $v$ with $|v| = \delta$.

Recall that $r$ is the length of the longest common prefix of $u$ and $v$. We have:

Proposition 7. Automaton $LCA_S$ is an automaton with a bounded delay $r$.

Proof. Let $(p, \alpha/\beta, p')$ and $(p, \alpha/\beta, p'')$ be two transitions in $\Delta_{lca}$. Let $q'$, $q''$ in $Q_{lca}$ and $w$ in $(A \times A)^*$ such that $|w| = r$ and there exist a path from $p'$ to $q'$ and a path from $p''$ to $q''$ both labeled by $w$. Since automaton $LCA_S$ is trim and has


a single initial state $\varepsilon$, it follows that there exist a path from $\varepsilon$ to $p$, labeled by some word $w' \in (A \times A)^*$, a path from $q'$ to a final state $f'$ labeled by some word $w'' \in (A \times A)^*$ and a path from $q''$ to a final state $f''$ labeled by some word $w''' \in (A \times A)^*$. Let $x, y, z, x', y', z' \in A^*$ be defined by $x \otimes x' = w'(\alpha/\beta)w$, $y \otimes y' = w''$ and $z \otimes z' = w'''$. From Lemma 20, we have $xy \xrightarrow{s} x'y'$ and $xz \xrightarrow{s'} x'z'$ for some left rps $s$ and some left rps $s'$. Moreover $p' = \mathrm{cont}_{|w'|}(s)$ and $p'' = \mathrm{cont}_{|w'|}(s')$. Since $|w| = r$, it follows that $|w'| < |x| - r$ and, from Proposition 4, it follows that $\mathrm{cont}_{|w'|}(s) = \mathrm{cont}_{|w'|}(s')$ so $p' = p''$. □

We observe that the automaton in Example 7 is not minimal since some states are equivalent: clearly 2 is equivalent to $\varepsilon$ and $2.0$ is equivalent to 0. The reason comes from the definition of succ: if we consider the morphism $\psi$ defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by $\psi(n) = \varepsilon$ and $\psi(i) = i$ for $i \neq n$ then for all states $q$ and $q'$ such that $\psi(q) = \psi(q')$ it holds that $\mathrm{succ}(q) = \mathrm{succ}(q')$. This property is used to define the canonical automaton for a one-rule length-preserving rewrite system $S$ as follows:

Definition 6. The canonical automaton $CAN_S = (A \times A, Q_{can}, I_{can}, F_{can}, \Delta_{can})$ for a one-rule length-preserving rewrite system $S$ is defined by:

• $Q_{can} = \psi(\mathrm{succ}^*(\varepsilon) \cap \mathrm{pred}^*(n))$
• $I_{can} = F_{can} = \{\varepsilon\}$
• $\Delta_{can} = \{(\psi(c), x/y, \psi(c')) \mid (c, x/y, c') \in \Delta_{lca}\}$.

Example 8. The canonical automaton corresponding to the automaton of Example 7 is given in Fig. 5.

Fig. 5. The canonical automaton $CAN_{S_9}$ for $S_9 = \{aba \to bba\}$.

Observe that we have $Q_{can} \subseteq \mathbb{N}_{n-1}^*$ by definition. The following properties are clearly preserved from automaton $LCA_S$ to automaton $CAN_S$:

• The state set $Q_{can}$ is closed under factors.
• Automaton $LCA_S$ and automaton $CAN_S$ recognize the same language: indeed if $(c, x/y, c') \in \Delta_{lca}$ and $\psi(c) = \psi(c'')$ for some $c'' \in Q_{lca}$ then $(c'', x/y, c') \in \Delta_{lca}$.
• Automaton $CAN_S$ is an automaton with bounded delay $r$ and so is unambiguous.
• Automaton $CAN_S$ is deterministic if and only if $u_0 \neq v_0$ and is bi-deterministic if and only if $u_0 \neq v_0$ and $u_n \neq v_n$.

In contrast, some properties that are satisfied by automaton $LCA_S$ are no longer satisfied by automaton $CAN_S$:

• 00 may appear as a factor in a label of a state of $CAN_S$: indeed $00 \in Q_{can}$ if and only if $0n0 \in Q_{lca}$. Note that in this case, $v_0 = u_n$ and $v_n = u_0$.
• The label of a transition is not given by the label of its reaching state anymore, so different transitions reaching the same state may have different labels.

Example 9. The canonical automaton for the rewrite system $S_8 = \{cba \to abc\}$ is the infinite automaton given in Fig. 6.

Observe that, for the system $S_8 = \{cba \to abc\}$ of Example 9, we get that $L_{S_8} \cap (c, a)((b, b)(c, a))^*(b, b)((a, c)(b, b))^*(a, c)$ is equal to the non-regular language $\{(c, a)((b, b)(c, a))^k (b, b)((a, c)(b, b))^k (a, c) \mid k \ge 0\}$. It follows that $S_8$ is not a rational transduction and it is not surprising that the canonical automaton for this system is infinite. Conversely, one can wonder whether the canonical automaton for a system $S$ such that $L_S$ is regular could be infinite. The answer is clear if $u_0 \neq v_0$ and $u_n \neq v_n$: indeed in this case $CAN_S$ is bi-deterministic, which implies its minimality, so $L_S$ is regular if and only if $CAN_S$ is finite. It is not so immediate in the general case and we shall use the following lemma that establishes a link between the sizes of different states of automaton $LCA_S$ that are reached by a same word. For every state $q$ of $LCA_S$, we will denote by $\mathrm{pre}(q)$ the set of all the words labeling a path from state $\varepsilon$ to state $q$ and, symmetrically, $\mathrm{post}(q)$ will denote the set of all the words labeling a path from state $q$ to a final state.

Lemma 22. There exists a positive integer $K$ such that for all states $q$, $q'$ of $LCA_S$, if $\mathrm{pre}(q) \cap \mathrm{pre}(q') \neq \emptyset$ then $|q'| \le K(|q| + 1)$.


Fig. 6. The canonical automaton CAN S 8 for S 8 = {cba → abc }.
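The observation made in Example 9 can be explored on small instances. The following Python sketch is only a sanity check on finitely many words, not a proof of non-regularity. The helper names `descendants` and `example9_pair` are ours and do not come from the paper. The search terminates because the rule is length-preserving, so $S(w)$ is finite.

```python
from collections import deque


def descendants(word, u, v):
    """Return S(word): every word reachable by repeatedly replacing u by v."""
    assert len(u) == len(v), "the rule must be length-preserving"
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        start = current.find(u)
        while start != -1:
            rewritten = current[:start] + v + current[start + len(u):]
            if rewritten not in seen:
                seen.add(rewritten)
                queue.append(rewritten)
            start = current.find(u, start + 1)
    return seen


def example9_pair(k, m):
    """Source and target words read off the pattern of Example 9,
    with independent exponents k and m."""
    source = "c" + "bc" * k + "b" + "ab" * m + "a"
    target = "a" + "ba" * k + "b" + "cb" * m + "c"
    return source, target


if __name__ == "__main__":
    for k in range(3):
        for m in range(3):
            source, target = example9_pair(k, m)
            reachable = target in descendants(source, "cba", "abc")
            print(k, m, reachable)
```

On these small values the target is reachable exactly when the two exponents agree, which matches the shape of the non-regular language given in Example 9.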

Proof. Let $(p, x/y, p')$ be a transition of $\mathrm{LCA}_S$; then $|p'| + 1 \leq 2(|p| + 1)$ and $|p| + 1 \leq 2(|p'| + 1)$. It follows by induction that if there exists a path from some state $p$ to some state $p'$ labeled by some word $w$ in $\mathrm{LCA}_S$ then $|p'| + 1 \leq 2^{|w|}(|p| + 1)$ and $|p| + 1 \leq 2^{|w|}(|p'| + 1)$. Now, if we consider two states $q, q'$ and a word $w \in \mathrm{pre}(q) \cap \mathrm{pre}(q')$, it follows from Proposition 7 that there exist a state $q''$ and two words $w', w''$ with $w = w'w''$ and $|w''| \leq r$ such that $w' \in \mathrm{pre}(q'')$ and there exist a path from $q''$ to $q$ labeled by $w''$ and a path from $q''$ to $q'$ labeled by $w''$. It follows that $|q'| + 1 \leq 2^{|w''|}(|q''| + 1) \leq 4^{|w''|}(|q| + 1)$. Since $|w''| \leq r$, taking $K = 4^r$, we get $|q'| \leq K(|q| + 1)$. □

We can now state:

Theorem 3. A one-rule length-preserving rewrite system $S = \{u \to v\}$ is a rational transduction if and only if its canonical automaton $\mathrm{CAN}_S$ is finite.

Proof. We only have to prove the only if part. Let $S = \{u \to v\}$ be a one-rule length-preserving rewrite system such that $\mathrm{CAN}_S$ is infinite. It follows that automaton $\mathrm{LCA}_S$ is infinite too. We shall prove that the language $L_S$ has an infinite number of residuals and so is not regular. More precisely, we prove that for every integer $p$ there exists a word $w \in (A \times A)^*$ such that

$$\emptyset \neq w^{-1} L_S \subseteq (A \times A)^p (A \times A)^*.$$

Indeed, let $p$ be a positive integer and let us consider a state $q$ of $\mathrm{LCA}_S$ such that $|q| \geq 2^p K$ where $K$ is the constant of Lemma 22. Let $w \in \mathrm{pre}(q)$; it follows from Lemma 22 that every state $q'$ such that $w \in \mathrm{pre}(q')$ satisfies $|q'| \geq 2^p$. Moreover, from the definition of succ, one can prove by induction on $|w'|$ that for all $q' \in Q_{can}$ and $w' \in \mathrm{pre}(q')$, $|q'| \leq 2^{|w'|}$. It follows that, if $|q'| \geq 2^p$, every word $w'$ in $\mathrm{post}(q')$ satisfies $|w'| \geq p$. This implies

$$w^{-1} L_S = \bigcup_{q',\ w \in \mathrm{pre}(q')} \mathrm{post}(q') \subseteq (A \times A)^p (A \times A)^*,$$

which proves the theorem. □

As a consequence of Theorem 3, we obtain, in the case of a one-rule rewrite system, the converse of a result of Bala Ravikumar in [19] that gives a sufficient condition for a length-preserving rewrite system to be a rational transduction. This condition is based on the notion of change-bounded length-preserving rewrite system that was introduced in the same article and that we recall here in another but equivalent form. Observe that the definition of rps, introduced in the case of one-rule length-preserving rewrite systems, makes sense for arbitrary rewrite systems.

Definition 7. A length-preserving rewrite system $S = \{u_1 \to v_1, \ldots, u_t \to v_t\}$ is called change-bounded (by $K$) if there exists an integer $K$ such that for every rps $s$ and every integer $j$, $|s|_j \leq K$.

Intuitively, this means that in every derivation in $S$, the number of times that a change (an application of a rule) is made at the same position during the derivation is bounded by $K$. Bala Ravikumar has proved:

Proposition 8. (See [19].) A change-bounded length-preserving rewrite system is a rational transduction.

Now, from Theorem 3, we get that if a one-rule length-preserving rewrite system is a rational transduction then its canonical automaton $\mathrm{CAN}_S$ is finite. This implies that $\mathrm{LCA}_S$ is finite too and, since the state set $Q_{lca}$ is the set of all left contributions, it follows that for every left contribution $c$, $|c|_0$ is bounded. This directly implies that $S$ is change-bounded, since $|c|_0 = |s|_j$ if $c = \mathrm{cont}_j(s)$, and we have:

Theorem 4. A one-rule length-preserving string rewrite system is a rational transduction if and only if it is change-bounded.
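To make the quantity $|s|_j$ of Definition 7 concrete, here is a small Python sketch. It assumes that a rewriting positions sequence lists, in order, the 0-based start position of each rule application; this reading and the helper name `replay` are our assumptions, not definitions taken from the paper. The function replays such a sequence for a one-rule system and counts how often each position is used.

```python
from collections import Counter


def replay(word, u, v, positions):
    """Apply the rule u -> v at each listed position, in order.

    Returns the final word together with a Counter that maps each
    position j to the number of times the rule was applied at j.
    """
    counts = Counter()
    for pos in positions:
        if word[pos:pos + len(u)] != u:
            raise ValueError(f"rule not applicable at position {pos} in {word!r}")
        word = word[:pos] + v + word[pos + len(u):]
        counts[pos] += 1
    return word, counts


if __name__ == "__main__":
    # A derivation in S8 = {cba -> abc}; position 2 is rewritten twice.
    final_word, counts = replay("cbcbaba", "cba", "abc", [2, 0, 4, 2])
    print(final_word, dict(counts))
```

A system is change-bounded when these per-position counts stay below one fixed constant over all derivations; the sketch only inspects a single derivation and therefore cannot decide that property.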


Clearly, for every one-rule length-preserving rewrite system $S = \{u \to v\}$, for all words $w$ and $w' \in S(w)$, it holds that if $S$ is change-bounded then the system $\{w \to w'\}$ is change-bounded too, and we get:

Corollary 7. For every one-rule length-preserving rewrite system $S = \{u \to v\}$, for all words $w$ and $w' \in S(w)$, if $S$ is a rational transduction then the system $\{w \to w'\}$ is a rational transduction.

We also easily get:

Corollary 8. Let $S = \{u \to v\}$ be a one-rule length-preserving rewrite system and $h$ be a letter-to-letter morphism such that $h(u) \neq h(v)$. If the system $\{h(u) \to h(v)\}$ is a rational transduction then $S$ is a rational transduction.

Proof. Clearly, if $\{h(u) \to h(v)\}$ is change-bounded then $S$ is change-bounded. □

5. A sufficient condition for context-freeness of L

The language $L_S$ of Definition 3 need not even be context-free, as we have seen in Example 4. The aim of this section is to identify a recursive family of length-preserving rewrite systems for which the language $L_S$ is always context-free. This family is the family of systems $S$ such that the state set $Q_{lca}$ of automaton $\mathrm{LCA}_S$ satisfies the single continuation property that we now introduce:

Definition 8. Let $\Sigma$ be an alphabet. A language $L \subseteq \Sigma^*$ satisfies the single continuation property if for every $a \in \Sigma$ there exists a unique $b \in \Sigma$ such that $ab \in F(L)$.

We first prove that the single continuation property is easily decidable. We have:

Proposition 9. The state set $Q_{lca}$ satisfies the single continuation property if and only if
1. $u = xyz$, $v = zyx$ for some words $x, y, z$ with $X = \{x\}$ and $Z = \{z\}$,
2. $A^*x \cap A^*z = xA^* \cap zA^* = \emptyset$.

Some lemmas are needed to prove this proposition.

Lemma 23.
1. $j0 \in Q_{lca}$ if and only if $S(v_j \ldots v_n A^*) \cap uA^* \neq \emptyset$,
2. $0i \in Q_{lca}$ if and only if $S(vA^*) \cap u_i \ldots u_n A^* \neq \emptyset$ and $(i, 0) \notin \theta_S$,
3. $tn \in Q_{lca}$ if and only if $S(A^* v_0 \ldots v_t) \cap A^*u \neq \emptyset$ and $(n, t) \notin \theta_S$,
4. $ns \in Q_{lca}$ if and only if $S(A^*v) \cap A^* u_0 \ldots u_s \neq \emptyset$.

Proof. We only prove 1, proofs of 2, 3 and 4 being very similar. Let us ﬁrst suppose that S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅ and let us consider two cases: 1. v j . . . v n A ∗ ∩ u A ∗ = ∅. In this case, v = v x and u = xu for some words v and u with x = v j . . . v n . It follows uu − → 0 vu − → v v that is a left derivation so j0 = cont j (0 j ) ∈ Q lca . j ∗ ∗ → w −→ u w be a left derivation of minimal length for some words w , w , w , 2. v j . . . v n A ∩ u A = ∅. Let v j . . . v n w − s

p

some rps s and some index p. By the minimality of the length of the derivation, we get w ∈ / u A ∗ and |s|0 = 0. Moreover, since it is a left derivation, we have p ≤ n and ( p , 0) ∈ / θ S so sp0 is a left rps. Let us consider the derivation uw − → v w −→ v 0 . . . v j −1 v w with s = shift j (sp0). This derivation is left and cont j (0s ) = j0 so j0 ∈ Q lca . 0

s

Conversely, let j0 ∈ Q lca . In this case, there exists a left rps s and an index k ∈ N such that contk (s) = j0. It follows that s = s (k − j )s ks with

contk (s ) = contk (s ) = contk (s ) = ε and we can suppose s = s = ε . Moreover, since contk (s ) = ε , it follows that for every index t such that |s |t > 0, either t > k or t < k − n and s = k (s ) because (k − j )s is left. By (k − j )k (s )k ∈ R θ S ((k − j )>k (s )kk (s )kk (s )k is a rps, which is moreover left. By (k − j )>k (s )k = shiftk− j (0α j ), it follows that 0α j is also left and satisﬁes t > j for every t such that |α |t > 0. Now, since 0α j is a left rps, there exists a left derivation u w 1 − → v w 1 −→ w 2 − → w 3 for some words w 1 , w 2 , w 3 . From the propj α 0 erty |α |t > 0 ⇒ t > j, we get that w 2 = v 0 . . . v j −1 w 2 for some word w 2 and w 3 = v 0 . . . v j −1 w 3 for some word w 3 and


we have the derivation v j . . . v n w 1 − −→ w 2 − → w 3 with shift j (α ) = α . This implies u ∈ PF( w 2 ) and S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅ α 0 that ﬁnishes the proof. 2 Lemma 24. If A ∗ U ∩ A ∗ V = ∅ then 1. 2. 3. 4.

/ F( V ∗ v ), u∈ S ( v A ∗ ) = V ∗ v A ∗ , j0 ∈ Q lca if and only if v j . . . v n ∈ X , 0i ∈ Q lca if and only if u i . . . un ∈ Z \ X .

Proof. 1. Suppose that u ∈ F( V ∗ v ), then u = u 1 u 2 with u 1 = ε ∈ SF( V + ) and u 2 = ε ∈ PF( v ). It follows u 2 ∈ Z so u 1 ∈ U , a contradiction. 2. For every k ≥ 0, for every word w ∈ V k v A ∗ and for every word w such that w → w , w ∈ V k+1 v A ∗ ∪ V k v A ∗ since u∈ / F( V ∗ v ). It follows by induction that S ( v A ∗ ) ⊆ S ( V ∗ v A ∗ ) ⊆ V ∗ v A ∗ . Conversely, we clearly have for any k ≥ 0, V k+1 v A ∗ ⊆ S ( V k v A ∗ ) so by induction we get V ∗ v A ∗ ⊆ S ( v A ∗ ). 3. From Item 1 of Lemma 23, we have j0 ∈ Q lca if and only if S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅. Since v j . . . v n ∈ X implies S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅, it remains to prove that, if A ∗ U ∩ A ∗ V = ∅, S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅ implies v j . . . v n ∈ X . ∗ Assume that v j . . . v n ∈ / X and that we have a derivation v j . . . v n w → w 1 −→ u w . Clearly, one can assume that w 1 ∈ / ∗ ∗ ∗ v j . . . v n A . Then w 1 ∈ v j . . . v p v A with v 0 . . . v p ∈ V . Thus u ∈ / F( v j . . . v p V v ) and u w ∈ S ( w 1 ) ⊆ v j . . . v p S ( v A ∗ ) = v j . . . v p V ∗ v A ∗ from item 2 but, according to item 1, this leads to a contradiction. / θ S ; it follows from item 2 of Lemma 23 4. Assume ﬁrst that u i . . . un ∈ Z \ X . Then S ( v A ∗ ) ∩ u i . . . un A ∗ = ∅ and (i , 0) ∈ / θ S . Since S ( v A ∗ ) = V ∗ v A ∗ from that 0i ∈ Q lca . Conversely, if 0i ∈ Q lca we get S ( v A ∗ ) ∩ u i . . . un A ∗ = ∅ and (i , 0) ∈ item 2, u i . . . un ∈ PF( V ∗ v ) and u i . . . un ∈ V ∗ Z . If u i . . . un ∈ V + Z , there exists some p > i such that u p . . . un ∈ Z and u 0 . . . u p −1 ∈ U ∩ A ∗ V , a contradiction. So u i . . . un ∈ Z and, since (i , 0) ∈ / θs , u i . . . u n ∈ Z \ X . 2 McNaughton [15] calls {u → v } right barren if u ∈ / F( v ) and U ∩ SF( V ∗ ) = ∅. He proves Lemma 24, items 1 and 2 in the + case PF(u ) ∩ SF(u ) ∩ A = ∅ in order to prove termination of right barren one-rule string rewrite systems. Symmetrically, we can prove: Lemma 25. 
If $U A^* \cap V A^* = \emptyset$ then
1. $u \notin F(vV^*)$,
2. $S(A^*v) = A^*vV^*$,
3. $tn \in Q_{lca}$ if and only if $v_0 \ldots v_t \in Z \setminus X$,
4. $ns \in Q_{lca}$ if and only if $u_0 \ldots u_s \in X$.

We can now prove the if part of Proposition 9: Lemma 26. If u = xyz and v = zyx for some words x, y , z with X = {x} and Z = { z}, and A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅ then the state set Q lca satisﬁes the single continuation property. Proof. Let x = v j . . . v n = u 0 . . . u s , z = v 0 . . . v t = u i . . . un and y = u s+1 . . . u i −1 = v t +1 . . . v j −1 . Observe that U = {xy }, V = { zy }, U = { yz} and V = { yx}. Since A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅, we have A ∗ U ∩ A ∗ V = U A ∗ ∩ V A ∗ = ∅. It follows from Lemmas 24 and 25:

• $0k \in Q_{lca} \iff k = i$,
• $k0 \in Q_{lca} \iff k = j$,
• $kn \in Q_{lca} \iff k = t$,
• $nk \in Q_{lca} \iff k = s$.

We claim that for every $k \in N_n$, there exists some $k'$ such that $kk' \in Q_{lca}$. To prove this, let us consider the derivation

$$xyxyzyz \xrightarrow{\,i\,} xyzyxyz \xrightarrow{\,0\,} zyxyxyz \xrightarrow{\,i+j\,} zyxyzyx \xrightarrow{\,j\,} zyzyxyx.$$

Observe that $s = i\,0\,(i+j)\,j = \mathrm{leftrps}(xyxyzyz, zyzyxyx)$: indeed, $(i, 0) \notin R_{\theta_S}$ since $xA^* \cap zA^* = \emptyset$, and that also implies $(i+j, j) \notin R_{\theta_S}$. Hence we get from Lemma 21 and Lemma 20 that for every $0 \leq k \leq n$, $\mathrm{cont}_{i+k}(s)$ is in $Q_{lca}$. Moreover $\mathrm{cont}_{i+k}(s) \in k N_n^+$: indeed, assume $\mathrm{cont}_{i+k}(0) = \varepsilon$ and $\mathrm{cont}_{i+k}(i+j) = \varepsilon$; then it follows that $i + k > n$ and $k < j$. That implies $i + k - j > 0$ and $i + k - j < n$, so $\mathrm{cont}_{i+k}(j) \neq \varepsilon$. Now, since $Q_{lca}$ is closed under factors, it follows that there exists $k'$ with $kk' \in Q_{lca}$. Hence, we can consider the smallest $k$ such that there exist $p$ and $p'$ with $p < p'$ and $kp \in Q_{lca}$, $kp' \in Q_{lca}$. From the above, $k > 0$ and $k < n$. Let us distinguish two cases:


1. If p > 0, let q, q ∈ Q lca such that kp ∈ succ(q) and kp ∈ succ(q ). Since k is minimal, we cannot have the following possibilities: • q = (k − 1)( p − 1) and q = (k − 1)( p − 1), • q = (k − 1)n( p − 1) and q = (k − 1)( p − 1), • q = (k − 1)( p − 1) and q = (k − 1)n( p − 1). So q = (k − 1)n( p − 1) and q = (k − 1)n( p − 1), but this implies p − 1 = p − 1 = s, a contradiction. 2. If p = 0, it follows k = j. Let us consider two cases: • If j ≥ p , let us consider the following different states accessible from jp : ( j + 1)( p + 1), . . . , ns. This leads to a contradiction since n − j = s and p > 0. • If j < p , let us consider the following states co-accessible from jp : ( j − 1)( p − 1), . . . 0i. It follows v 0 . . . v j −1 = u i . . . u p −1 . This is a contradiction since zy = v 0 . . . v j −1 and z = u i . . . un . Finally, the state set Q lca satisﬁes the single continuation property.

□

The following lemma proves the only if part of Proposition 9: Lemma 27. If the state set Q lca satisﬁes the single continuation property then u = xyz and v = zyx for some words x, y , z with X = {x} and Z = { z}, and A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅. Proof. Since Q lca satisﬁes the single continuation property, there exists an index i = 0 such that 0i ∈ Q lca and there exists an index s = n such that ns ∈ Q lca . It follows that 1(i + 1), 2(i + 2), . . . , (n − i )n are in Q lca and (n − 1)(s − 1), . . . , (n − s)0 are in Q lca . Let us denote t = n − i and j = n − s, we have v j . . . v n = u 0 . . . u s = x and v 0 . . . v t = u i . . . un = z. Assume t ≥ j and let p = i + j. We get jp ∈ succ j (0i ) and jp ∈ predt − j (tn) so jp ∈ Q lca , a contradiction since p = 0 and j0 ∈ Q lca . So t < j and s = n − j < n − t = i that implies j − t = i − s. Let k be the (unique) index such that (t + 1)k ∈ Q lca , we consider two cases: 1. if k > 0 then t (k − 1) ∈ Q lca so k = s + 1 and (t + 1)(s + 1), (t + 2)(s + 2) . . . ( j − 1)(i − 1) are in Q lca . We get v t +1 . . . v j −1 = u s+1 . . . u i −1 = y so we have u = xyz and v = zyx. 2. if k = 0, since s < i, we have (t + 1)0, (t + 2)1, . . . ns ∈ Q lca so t = j − 1 and we have u = xyz and v = zyx with y = ε . Observe that, as a consequence, the function follow that is deﬁned for every index 0 ≤ p ≤ n by follow( p ) = q such that pq ∈ Q lca is surjective so it is bijective and we have both ∀ p ∃!q | pq ∈ Q lca and ∀q∃! p | pq ∈ Q lca . Thus, we get that j is the unique index such that j0 ∈ Q lca and this implies X = {x}. Let us now consider i = i such that u i . . . un ∈ Z . Since 0i ∈ / Q lca , it follows from Lemma 23 that (i , 0) ∈ θ S which implies, from Lemma 4, u i . . . un ∈ X ∩ Z , so u i . . . un = x. Let us consider two cases: 1. If |x| ≤ | z| then z = z1 x = xz2 for some words z1 and z2 . It follows u = xyxz2 and v = z1 xyx, but that implies xyx ∈ X , a contradiction. 2. If | z| < |x|. 
This implies z ∈ X , a contradiction since X = {x}. It follows that u i . . . un ∈ Z implies i = i so Z = { z}. It remains to prove A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅. v and it follows that |x| = | z| implies A ∗ x ∩ A ∗ z = ∅. Hence, we Suppose ﬁrst that A ∗ x ∩ A ∗ z = ∅. Note that x = z by u = have to consider two cases: 1. |x| > | z|. In this case z ∈ SF(x), so there exists an index p such that z = u p . . . u s so zyz = u p . . . un and that implies vyz = zyu → zyv = zyzyx ∈ u p . . . un A ∗ so S ( v A ∗ ) ∩ u p . . . un A ∗ = ∅. Observe that we have p = i since s < i so we cannot have 0p ∈ Q lca . From Lemma 23, item 2, which implies ( p , 0) ∈ θ S so, from Lemma 4, u p . . . un = zyz ∈ X ∩ Z , a contradiction since Z = { z}. 2. |x| < | z|. In this case x ∈ SF( z), so z = v 0 . . . v p −1 x for some index p which implies xyx = v p . . . v n . It follows v p . . . v n yz = xyu → xyv = uyx and, from Lemma 23, item 1, p0 ∈ Q lca . But this would imply p = j, a contradiction since p < | z| ≤ j. Symmetrically, we can prove x A ∗ ∩ z A ∗ = ∅.

□
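The purely word-combinatorial part of the conditions of Proposition 9 is easy to test mechanically. The Python sketch below is a partial check only: it enumerates the decompositions $u = xyz$, $v = zyx$ and tests condition 2 (neither of $x$, $z$ is a prefix or a suffix of the other). It does not test the requirements $X = \{x\}$ and $Z = \{z\}$, since the sets $X$ and $Z$ are defined earlier in the paper and are not reproduced here. The function name `xyz_decompositions` is ours.

```python
def xyz_decompositions(u, v):
    """List all (x, y, z) with u = x + y + z and v = z + y + x such that
    neither of x, z is a prefix or a suffix of the other.

    x and z are taken non-empty: an empty x or z would be a prefix and a
    suffix of every word, so condition 2 could not hold.
    """
    found = []
    n = len(u)
    if len(v) != n:
        return found
    for len_x in range(1, n):
        for len_z in range(1, n - len_x + 1):
            x, y, z = u[:len_x], u[len_x:n - len_z], u[n - len_z:]
            if z + y + x != v:
                continue
            comparable = (x.startswith(z) or z.startswith(x)
                          or x.endswith(z) or z.endswith(x))
            if not comparable:
                found.append((x, y, z))
    return found


if __name__ == "__main__":
    print(xyz_decompositions("cba", "abc"))  # system S8
    print(xyz_decompositions("bab", "aba"))  # system S10
```

For $S_8 = \{cba \to abc\}$ the only candidate is $x = c$, $y = b$, $z = a$, while $S_{10} = \{bab \to aba\}$ admits no decomposition of this shape.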

We now prove the main result of this section that was stated at its beginning: if the state set $Q_{lca}$ of automaton $\mathrm{LCA}_S$ satisfies the single continuation property then the language $L_S$ is context-free. In order to prove this result, we consider a more general case. Indeed, if the state set $Q_{lca}$ satisfies the single continuation property then it is also the case for the state set $Q_{can}$ of automaton $\mathrm{CAN}_S$. The converse is generally false, as shown in the following examples:

Example 10. Let $S_{10} = \{bab \to aba\}$. We have $v_0 = u_1$, $v_1 = u_0$, $v_1 = u_2$ and $v_2 = u_1$. The state set $Q_{can} = \{\varepsilon, 0, 1, 01, 10\}$ satisfies the single continuation property while $Q_{lca} = \{\varepsilon, 0, 1, 2, 01, 10, 12, 21, 012, 210\}$ does not satisfy it.
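For a finite language, the single continuation property of Definition 8 can be checked directly from the set of factors of length two. The following Python sketch does this for the two state sets of Example 10. Words are encoded as tuples of integers, and the function names are ours; the sketch only covers finite sets given explicitly.

```python
def length_two_factors(language):
    """All factors of length 2 of the words of a finite language."""
    return {word[i:i + 2] for word in language for i in range(len(word) - 1)}


def has_single_continuation(language, alphabet):
    """True if every letter a has exactly one b such that ab is a factor."""
    pairs = length_two_factors(language)
    return all(
        sum(1 for b in alphabet if (a, b) in pairs) == 1
        for a in alphabet
    )


if __name__ == "__main__":
    # State sets of Example 10 for S10 = {bab -> aba}.
    q_can = {(), (0,), (1,), (0, 1), (1, 0)}
    q_lca = {(), (0,), (1,), (2,), (0, 1), (1, 0), (1, 2), (2, 1),
             (0, 1, 2), (2, 1, 0)}
    print(has_single_continuation(q_can, (0, 1)))
    print(has_single_continuation(q_lca, (0, 1, 2)))
```

In $Q_{lca}$ the letter 1 can be followed by both 0 and 2, which is why the property fails there while it holds for $Q_{can}$.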


Remark 2. We observe in the previous example that Q lca and Q can are ﬁnite. As a matter of fact, if the state set Q lca satisﬁes the single continuation property then it is inﬁnite and there exist words w 0 , w 1 , . . . , w t ∈ Nn∗ such that Q lca = F( w ∗0 + w ∗1 + · · · + w t∗ ). It follows that, in this case, L S is context-free and not regular. Concerning Q can , it is possible that a ﬁnite Q can satisﬁes the single continuation property, on the other hand, when it is inﬁnite, it also satisﬁes the property that there exist words w 0 , w 1 , . . . , w t ∈ Nn∗−1 such that Q can = F( w ∗0 + w ∗1 + · · · + w t∗ ). In the following example, Q lca and Q can are inﬁnite but Q can satisﬁes the single continuation property while Q lca does not satisfy it. Example 11. Let S 11 = {acaba → abaca}. We have v 0 = v 2 = v 4 = u 0 = u 2 = u 4 , v 1 = u 3 and v 3 = u 1 . It is easily seen that 22 is neither accessible nor co-accessible and (4, 0) ∈ θ S so Q lca ∩ N2 = {02, 13, 20, 24, 31, 40, 42}. One can also verify that 202 is not accessible and 242 is not co-accessible and ﬁnally Q lca = F(42(402)∗ 0 + (13)∗ ). So Q lca is inﬁnite and does not satisfy the single continuation property while Q can = F((02)∗ + (13)∗ ) does. Following the above observations, we shall ﬁnally prove that if Q can satisﬁes the single continuation property then the language L S is context-free. This proof is based on the following crucial lemma. Deﬁnition 9. Let J be a set of non-negative integers, for every sequence p ∈ J ∗ , we deﬁne first( p ) and last( p ) as follows:

• if $p = \varepsilon$ then $\mathrm{first}(p) = \mathrm{last}(p) = \varepsilon$,
• if $p \in iJ^* \cap J^*j$ then $\mathrm{first}(p) = i$ and $\mathrm{last}(p) = j$ with $i, j \in J$.

By convention, we set $|\varepsilon|_\varepsilon = 0$. We also denote by $\equiv$ the equivalence relation defined over the states of $\mathrm{CAN}_S$ by: $p \equiv q$ if and only if $\mathrm{first}(p) = \mathrm{first}(q)$ and $\mathrm{last}(p) = \mathrm{last}(q)$. Clearly, this equivalence relation is of finite index. For every state $q \in Q_{can}$, we denote by ⟦q⟧ the equivalence class of state $q$ for the relation $\equiv$.

Lemma 28. Let $\mathrm{CAN}_S$ be the canonical automaton associated with a one-rule length-preserving rewrite system $S$ such that $Q_{can}$ satisfies the single continuation property. Let $\delta = (p, x/y, q)$ and $\delta' = (p', x/y, q')$ be transitions of $\mathrm{CAN}_S$ with $p \equiv p'$ and $q \equiv q'$. Then

$$|q|_{\mathrm{first}(q)} - |p|_{\mathrm{first}(p)} = |q'|_{\mathrm{first}(q')} - |p'|_{\mathrm{first}(p')} \in \{-1, 0, +1\}.$$

Proof. The property is clearly satisfied if $p = \varepsilon$ since in this case $p' = \varepsilon$ and $q = q' = 0$ or $q = q' = \varepsilon$. If $p \neq \varepsilon$ and $q = \varepsilon$ then $q' = \varepsilon$ and $p, p' \in (n-1)^+$ but, since the factor $nn$ is not allowed in left contributions, we have $p = p' = n-1$, else 0 would appear as a factor in $q$ or in $q'$. Hence, we can suppose $p \neq \varepsilon$, $q \neq \varepsilon$ and $\mathrm{first}(p) = \mathrm{first}(p') = i$, $\mathrm{first}(q) = \mathrm{first}(q') = i'$, $\mathrm{last}(p) = \mathrm{last}(p') = j$, $\mathrm{last}(q) = \mathrm{last}(q') = j'$. From the definition of succ, we have to consider several cases:

1. $i' = i + 1$. Then $|q|_{i'} - |p|_i = |q'|_{i'} - |p'|_i = 0$.

2. $i' = 0$ and $i < n - 1$. This implies $0(i+1) \in PF(q)$, so $0(i+1) \in Q_{can}$, $|q|_{i+1} = |p|_i$ and $|q'|_{i+1} = |p'|_i$. From the equalities

$$|q|_0 - |q|_{i+1} = |q'|_0 - |q'|_{i+1} = \begin{cases} 0 & \text{if } j' \neq 0, \\ +1 & \text{if } j' = 0 \end{cases}$$

we get

$$|q|_0 - |p|_i = |q'|_0 - |p'|_i = \begin{cases} 0 & \text{if } j' \neq 0, \\ +1 & \text{if } j' = 0. \end{cases}$$

3. $i' > 0$ and $i = n - 1$. Then $(n-1)(i'-1) \in PF(p)$, so $(n-1)(i'-1) \in Q_{can}$, $|q|_{i'} = |p|_{i'-1}$ and $|q'|_{i'} = |p'|_{i'-1}$. From the equalities

$$|p|_{i'-1} - |p|_{n-1} = |p'|_{i'-1} - |p'|_{n-1} = \begin{cases} 0 & \text{if } j \neq n-1, \\ -1 & \text{if } j = n-1 \end{cases}$$

we get

$$|q|_{i'} - |p|_{n-1} = |q'|_{i'} - |p'|_{n-1} = \begin{cases} 0 & \text{if } j \neq n-1, \\ -1 & \text{if } j = n-1. \end{cases}$$

4. $i' = 0$ and $i = n - 1$ and $(n-1)k \in Q_{can}$ with $k < n - 1$. Then $|q|_{k+1} = |p|_k$ and $|q'|_{k+1} = |p'|_k$. It follows

$$|p|_k - |p|_i = |p'|_k - |p'|_i = \begin{cases} 0 & \text{if } j \neq i, \\ -1 & \text{if } j = i \end{cases}$$

and

$$|q|_0 - |q|_{k+1} = |q'|_0 - |q'|_{k+1} = \begin{cases} 0 & \text{if } j' \neq 0, \\ +1 & \text{if } j' = 0 \end{cases}$$

so

$$|q|_0 - |p|_i = |q'|_0 - |p'|_i = \begin{cases} +1 & \text{if } j \neq i \text{ and } j' = 0, \\ -1 & \text{if } j = i \text{ and } j' \neq 0, \\ 0 & \text{in the other cases.} \end{cases}$$

5. $i = n - 1$, $i' = 0$ and $(n-1)(n-1) \in Q_{can}$. In this case, $p$ and $p'$ belong to $(n-1)^+$ and $q$ and $q'$ belong to $0^+$. By definition of $\mathrm{CAN}_S$, there exists a transition $(p'', x/y, q'')$ of $\mathrm{LCA}_S$ such that the state $q'' \in Q_{lca}$ satisfies $q'' \in F((0n)^*)$, $|q''|_n = |p|_{n-1}$ and $|q''|_0 = |q|_0$. Moreover

$$|q''|_0 - |q''|_n = \begin{cases} +1 & \text{if } q'' \in (0n)^*0, \\ -1 & \text{if } q'' \in (n0)^*n, \\ 0 & \text{in the other cases.} \end{cases}$$

Now, from the definition of the transitions in $\mathrm{LCA}_S$, we have

$$|q''|_0 - |q''|_n = \begin{cases} +1 & \text{if } x/y = u_0/v_0, \\ -1 & \text{if } x/y = u_n/v_n, \\ 0 & \text{in the other cases.} \end{cases}$$

Observe that this property is consistent since in this case $u_0 \neq u_n$: first, we prove by contradiction that for every $1 \leq k \leq n-1$, $kk \in Q_{lca}$. For this, let us consider the biggest $k$ such that $kk \notin Q_{lca}$. Then $knk \in Q_{lca}$: indeed, if $k = n-1$ then $kk \in Q_{can}$, else $(k+1)(k+1) \in Q_{lca}$, and in both cases it follows that $knk \in Q_{lca}$. This implies $\{(n-1)(k-1), (n-1)n(k-1)\} \cap Q_{lca} \neq \emptyset$ and so $(n-1)(k-1) \in Q_{can}$, a contradiction since $Q_{can}$ satisfies the single continuation property, $(n-1)(n-1) \in Q_{can}$ and $k \leq n-1$. Now, since $kk \in Q_{lca}$ for every $1 \leq k \leq n-1$, we get $v_1 \ldots v_{n-1} = u_1 \ldots u_{n-1}$. Moreover, $0n0 \in Q_{lca}$ since $11 \in Q_{lca}$, so $v_0 = u_n$ and $v_n = u_0$. It follows that $u_0 \neq u_n$ since $u \neq v$. Finally, since $|q''|_n = |p|_{n-1}$ and $|q''|_0 = |q|_0$, and using a similar reasoning for $p'$ and $q'$, we obtain

$$|q|_0 - |p|_{n-1} = |q'|_0 - |p'|_{n-1} = \begin{cases} +1 & \text{if } x/y = u_0/v_0, \\ -1 & \text{if } x/y = u_n/v_n, \\ 0 & \text{in the other cases.} \end{cases}$$

□

Clearly, when Q can satisﬁes the single continuation property then, for every δ = ( p , x/ y , q) in can , if |q|first(q) = | p |first( p ) + 1 then first(q) = last(q) = 0 and if |q|first(q) = | p |first( p ) − 1 then first( p ) = last( p ) = n − 1 so we directly obtain as a consequence of Lemma 28: Corollary 9. Let CAN S be the canonical automaton associated with a one-rule length-preserving rewrite system S such that its state set Q can satisﬁes the single continuation property. Then can = 0 ∪ + ∪ − with

0 = {( p , x/ y , q) ∈ can | |q|first(q) = | p |first( p ) }, + = {( p , x/ y , q) ∈ can | first(q) = last(q) = 0 ∧ |q|0 = | p |first( p ) + 1} and − = {( p , x/ y , q) ∈ can | first( p ) = last( p ) = n − 1 ∧ |q|first(q) = | p |n−1 − 1}. In particular, we get

(ε , x/x, ε ) ∈ 0 , (ε , u 0 / v 0 , 0) ∈ + , and (n − 1, un / v n , ε ) ∈ − . We can now prove: Theorem 5. Let CAN S be the canonical automaton associated with a one-rule length-preserving rewrite system S such that Q can satisﬁes the single continuation property. Then L S is a context-free language. Proof. We construct from CAN S a pushdown automaton that recognizes L S . Intuitively, in this automaton, the states will be the equivalence classes of the relation ≡ and the stack will give information on the size of the label of the corresponding state in Q can . More precisely, we deﬁne the pushdown automaton PDA S = { A , , , ⊥, Jε K, } where

• • • • •

A is the input alphabet,

= {JqK | q ∈ Q can } is the set of states, = {⊥, } is the stack alphabet, ⊥ is the initial stack symbol, Jε K is the initial and ﬁnal state,


• is the set of rules of the automaton deﬁned by:

= {(Jε K, x/x, ⊥) → (Jε K, ⊥) | x ∈ A } ∪ {(Jε K, u 0 / v 0 , ⊥) → (J0K, ⊥)} ∪ {(Jn − 1K, un / v n , ⊥) → (Jε K, ⊥)} ∪ {(J p K, x/ y , γ ) → (JqK, γ ) | ( p , x/ y , q) ∈ 0 , p = ε , q = ε , γ ∈ } ∪ {(J p K, x/ y , γ ) → (JqK, γ ) | ( p , x/ y , q) ∈ + , γ ∈ , p = ε } ∪ {(J p K, x/ y , ) → (JqK, ε ) | ( p , x/ y , q) ∈ − , q = ε }. Using Corollary 9 and Lemma 28, it is easy to verify that one can establish a one to one correspondence between every run in CAN S with a run in PDA S : more precisely, one can prove by induction over the length of the runs that any path in CAN S from ε to a state q labeled by w ⊗ w corresponds to a run labeled by w ⊗ w in PDA S starting from the initial conﬁguration (Jε K, ⊥) and reaching the conﬁguration (JqK, ⊥t ) with t = 0 if q = ε and t = |q|first(q) − 1 if q = ε . In particular, there exists a one to one correspondence between every path from state ε to state ε in CAN S labeled by a word of L S with a successful run labeled by the same word in PDA S . This implies that PDA S recognizes L S so L S is a context-free language. 2 Example 12. The pushdown automaton corresponding to the canonical automaton for S 8 = {cba → abc } given in Example 9 is the automaton

PDA S 8 = { A , {qε , q0 , q1 }, {⊥, }, ⊥, qε , } where qε = Jε K, q0 = J0K = 0+ , q1 = J1K = 1+ and 4

={ (qε , a/a, ⊥) → (qε , ⊥), (qε , b/b, ⊥) → (qε , ⊥), (qε , c /c , ⊥) → (qε , ⊥), (qε , c /a, ⊥) → (q0 , ⊥), (q1 , a/c , ⊥) → (qε , ⊥), (q1 , a/a, ⊥) → (q0 , ⊥), (q1 , c /c , ⊥) → (q0 , ⊥), (q0 , b/b, ⊥) → (q1 , ⊥), (q1 , a/a, ) → (q0 , ), (q1 , c /c , ) → (q0 , ), (q0 , b/b, ) → (q1 , ), (q1 , c /a, ⊥) → (q0 , ⊥), (q1 , c /a, ) → (q0 , ), (q1 , a/c , ) → (q0 , ε ) } As a corollary of Proposition 9 and Theorem 5 we get: Corollary 10. Let S = {u → v } be a one-rule length-preserving rewrite system such that 1. u = xyz, v = zyx for some words x, y , z with X = {x} and Z = { z}, 2. A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅, then L S is a context-free language. From Proposition 6, the pushdown automaton built in the proof of Theorem 5 is deterministic if and only if u 0 = v 0 . Nevertheless, from Proposition 7 we know that automata LCA S and CAN S are not very far from being deterministic and it is not too diﬃcult, using Proposition 7 and Lemma 28, to provide a deterministic pushdown automaton that is equivalent to pushdown automaton PDA S . As a consequence, in the case when the state set Q can satisﬁes the single continuation property, we obtain a linear algorithm for the accessibility problem that is to decide, given a one-rule length-preserving rewrite system S and two words w and w whether w ∈ S ( w ). 6. Conclusion In this paper, we have deﬁned the canonical automaton CAN S associated with a one-rule length-preserving string rewrite system S. This automaton is based on the notion of rps and we think that it could be a crucial tool in order to solve

(Footnote 4: Recall that, in this example, $Q_{can} = 0^* + 1^*$.)


Conjecture 1. Since this automaton has bounded delay, we also think that it should induce a polynomial algorithm to solve the accessibility problem.

Some decision problems remain open. For instance, Theorem 4 and Theorem 3 give a characterization of one-rule length-preserving rewrite systems that are rational transductions. Nevertheless, we do not know if these characterizations are decidable: we do not know yet how to decide whether a system is change-bounded or not, and an important open question is the recursivity of the set $Q_{can}$ of states of the canonical automaton. As a matter of fact, we conjecture that $Q_{can}$ is always a regular language: as seen in the previous section, it is the case when $Q_{can}$ satisfies the single continuation property, but it is also the case for instance for the system $S_6 = \{baa \to aab\}$ while we have seen, in the beginning of Section 4, that for this system $L_S$ is not a context-free language. Indeed, it is not difficult to check that, in this case, $Q_{lca} = F((1 + 20)^*)$ so $Q_{can} = (0 + 1)^*$ is a regular language.

It is worth noting that the conjecture "$Q_{can}$ is a regular language" is equivalent to the conjecture "$Q_{lca}$ is a regular language". Indeed, if $Q_{lca}$ is a regular language, then, clearly, $Q_{can}$ is also a regular language. The converse is also true: one can retrieve $Q_{lca}$ from $Q_{can}$ using only operations that preserve regularity. Proving the conjecture does not seem to be an easy task, anyway: the state set $Q_{lca}$ is defined as the intersection $\mathrm{succ}^*(\varepsilon) \cap \mathrm{pred}^*(n)$; unfortunately, there is no hope to have $\mathrm{succ}^*(\varepsilon)$ and $\mathrm{pred}^*(n)$ both regular: for $S_{12} = \{baa \to abb\}$, $Q_{lca}$ is finite because $\mathrm{pred}^*(n) = \{2, 1, 0, \varepsilon, 2.0.2, 0.2, 2.0\}$ is finite but $\mathrm{succ}^*(\varepsilon)$ is not a regular language. Indeed, in this case, the set >0 (succ∗(ε)) is equal to the set of factors of the Fibonacci word, which is not regular: it has been proved in [9] that the Fibonacci word is fourth power-free.
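The role of fourth power-freeness can be illustrated numerically. By the pumping lemma, an infinite regular language that is closed under factors contains arbitrarily high powers of some non-empty word, so an infinite factorial language without fourth powers cannot be regular. The Python sketch below checks this on a finite prefix of the Fibonacci word only. We assume the usual construction $s_0 = 0$, $s_1 = 01$, $s_k = s_{k-1}s_{k-2}$, and the helper names are ours.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word built from s_k = s_{k-1} s_{k-2}."""
    previous, current = "0", "01"
    while len(current) < length:
        previous, current = current, current + previous
    return current[:length]


def has_power(word, exponent):
    """True if some factor of word has the form x^exponent with x non-empty.

    A factor x^e with |x| = p exists exactly when there are (e - 1) * p
    consecutive positions i with word[i] == word[i + p].
    """
    n = len(word)
    for period in range(1, n // exponent + 1):
        run = 0
        for i in range(n - period):
            if word[i] == word[i + period]:
                run += 1
                if run >= (exponent - 1) * period:
                    return True
            else:
                run = 0
    return False


if __name__ == "__main__":
    prefix = fibonacci_prefix(200)
    print(has_power(prefix, 3), has_power(prefix, 4))
```

On this prefix a cube such as $(01001)^3$ is found, but no fourth power, in line with the result cited from [9].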
Some questions also remain in the context of the single continuation property and the context-freeness of the language $L_S$: we have proved that the single continuation property for $Q_{can}$ is a sufficient condition to get the context-freeness of $L_S$. Conversely, if we consider the system $\{dacad \to cadac\}$, it is easy to verify that 1.1 and 1.3 are both in $Q_{can}$, so $Q_{can}$ does not satisfy the single continuation property while $L_S$ is regular and so context-free. However, we do not have an example of a system such that $Q_{can}$ does not satisfy the single continuation property while $L_S$ is a context-free non-regular language. Similarly, if we consider the construction of the pushdown automaton in the proof of Theorem 5, we can wonder if, when $L_S$ is context-free, there always exists a pushdown automaton recognizing $L_S$ using an initial stack symbol and only one extra stack symbol.

Another question deserves to be studied. If for a given one-rule length-preserving rewrite system $S$, its associated language $L_S$ is context-free then $S$ is a context-free transduction: indeed, in this case, there clearly exist two morphisms $g$ and $h$ such that $S(w) = g(h^{-1}(w) \cap L_S)$ for every word $w$. As a consequence, the image of any regular language by $S$ is a context-free language. Transformations that map any regular language to a context-free language are called algebrico-rational in [3], which characterizes the semi-commutation systems (special cases of length-preserving rewrite systems) that are algebrico-rational. This characterization implies in particular that every one-rule semi-commutation system is algebrico-rational. So, what happens if we consider a one-rule length-preserving rewrite system $S$ that is not a semi-commutation system and such that $L_S$ is not a context-free language, like system $S_6 = \{baa \to aab\}$?

As a matter of fact, for this system, one can define two morphisms $g$ and $h$ such that, for every word $w$, $S_6(w) = g\big(h^{-1}(w) \cap (D_1'^{*}(x, y) \sqcup\!\sqcup (a + b)^*)\big)$ where $D_1'^{*}(x, y)$ denotes the semi-Dyck language over the alphabet $\{x, y\}$ and $\sqcup\!\sqcup$ denotes the shuffle operation. Thus $S_6$ is a context-free transduction. So the non-context-freeness of language $L_S$ does not allow us to disprove the conjecture, stated in [13], that every one-rule length-preserving rewrite system is algebrico-rational and more precisely is a context-free transduction.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

References

[1] Véronique Bruyère, Automata and codes with bounded deciphering delay, in: Imre Simon (Ed.), LATIN, in: Lecture Notes in Computer Science, vol. 583, Springer, 1992, pp. 99–107. [2] Mireille Clerbout, Michel Latteux, Semi-commutations, Inf. Comput. 73 (1) (1987) 59–74. [3] Mireille Clerbout, Yves Roos, Semicommutations and algebraic languages, Theor. Comput. Sci. 103 (1) (1992) 39–49. [4] Nachum Dershowitz, Open. Closed. Open, in: Jürgen Giesl (Ed.), RTA, in: Lecture Notes in Computer Science, vol. 3467, Springer, 2005, pp. 376–393. [5] Samuel Eilenberg, Automata, Languages and Machines. Volume A, Pure Appl. Math., vol. 59, Academic Press, New York, 1974. [6] Calvin C. Elgot, Jorge E. Mezei, On relations defined by generalized finite automata, IBM J. Res. Dev. 9 (1) (January 1965) 47–68. [7] Alfons Geser, Decidability of termination of grid string rewriting rules, SIAM J. Comput. 31 (4) (2002) 1156–1168. [8] Alfons Geser, Termination of string rewriting rules that have one pair of overlaps, in: Robert Nieuwenhuis (Ed.), RTA, in: Lecture Notes in Computer Science, vol. 2706, Springer, 2003, pp. 410–423. [9] Juhani Karhumäki, On cube-free ω-words generated by binary morphisms, Discrete Appl. Math. 5 (3) (1983) 279–297.
[10] Yuji Kobayashi, Masashi Katsura, Kayoko Shikishima-Tsuji, Termination and derivational complexity of confluent one-rule string-rewriting systems, Theor. Comput. Sci. 262 (1–2) (2001) 583–632.
[11] Winfried Kurth, Termination und Konfluenz von Semi-Thue-Systemen mit nur einer Regel, PhD thesis, Technische Universität Clausthal, 1990.
[12] Winfried Kurth, One-rule semi-Thue systems with loops of length one, two or three, Inform. Théor. Appl. 30 (5) (1996) 415–429.
[13] Michel Latteux, Yves Roos, One-rule length-preserving rewrite systems and rational transductions, RAIRO Theor. Inform. Appl. 48 (2) (2014) 149–171.
[14] Éric Lilin, Une généralisation des semi-commutations, Technical report IT-210, Laboratoire d'Informatique Fondamentale de Lille, Université de Lille 1, France, April 1991, in French.


[15] Robert McNaughton, The uniform halting problem for one-rule semi-Thue systems, Technical report 94-18, Rensselaer Polytechnic Institute, Troy, NY, August 1994.
[16] Robert McNaughton, Semi-Thue systems with an inhibitor, J. Autom. Reason. 26 (4) (2001) 409–431.
[17] Yves Métivier, Edward Ochmański, On lexicographic semi-commutations, Inf. Process. Lett. 26 (2) (1987) 55–59.
[18] Wojciech Moczydłowski, Alfons Geser, Termination of single-threaded one-rule semi-Thue systems, in: Jürgen Giesl (Ed.), Term Rewriting and Applications, in: Lecture Notes in Computer Science, vol. 3467, Springer, Berlin/Heidelberg, 2005, pp. 338–352.
[19] Bala Ravikumar, Peg-solitaire, string rewriting systems and finite automata, Theor. Comput. Sci. 321 (2–3) (2004) 383–394.
[20] Géraud Sénizergues, On the termination problem for one-rule semi-Thue system, in: Harald Ganzinger (Ed.), Rewriting Techniques and Applications, in: Lecture Notes in Computer Science, vol. 1103, Springer, Berlin/Heidelberg, 1996, pp. 302–316.
[21] Alain Terlutte, David Simplot, Iteration of rational transductions, Inform. Théor. Appl. 34 (2) (2000) 99–130.
[22] Hans Zantema, Alfons Geser, A complete characterization of termination of $0^p 1^q \to 1^r 0^s$, Appl. Algebra Eng. Commun. Comput. 11 (1) (2000) 1–25.
