A canonical automaton for one-rule length-preserving string rewrite systems


Information and Computation 244 (2015) 203–228

Contents lists available at ScienceDirect

Information and Computation www.elsevier.com/locate/yinco

Michel Latteux, Yves Roos ∗

Univ. Lille, CRIStAL, UMR 9189, 59650 Villeneuve d’Ascq, France

Article info

Article history: Received 21 February 2014. Received in revised form 3 October 2014. Available online 17 July 2015.

Keywords: String rewrite system; Rational transduction

Abstract

In this work, we use rearrangements of rewriting positions sequences to study precisely the structure of the derivations in one-rule length-preserving string rewrite systems. This leads to the definition of a letter-to-letter transducer that computes the relation induced by a one-rule length-preserving string rewrite system. This transducer can be seen as an automaton over the alphabet $A \times A$. We prove that this automaton is finite if and only if the corresponding relation is rational. We also identify a sufficient condition for the context-freeness of the language $L$ recognized by this automaton and, when this condition is satisfied, we construct a pushdown automaton that recognizes $L$.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Rewrite systems are of primordial interest for computational problems. The problems mainly investigated for rewrite systems are the accessibility problem, the common descendant problem, the confluence problem, and the termination and uniform termination problems. Some intriguing decidability problems remain open even for very simple rewrite systems.¹

One-rule rewrite systems are among the simplest rewrite systems. Nevertheless, they have been intensively studied for several years and several deep results have been obtained [11,12,20,15,22,10,16,7,8,18]. It is particularly noteworthy that the decidability of the termination of one-rule rewrite systems has remained an open question for more than twenty years.

One-rule rewrite systems are simply defined by two words $u, v$ over an alphabet $A$ and denoted $S = \{u \to v\}$. For a word $w$, $S(w)$ is the set of words obtainable from $w$ by repeatedly replacing $u$ by $v$. Thus $S$ induces a relation over $A^* \times A^*$, and we address here the problem of defining a letter-to-letter transducer that computes this relation when the system $S = \{u \to v\}$ is length-preserving, that is, when $|u| = |v|$. It is known from [6] (cf. also [5]) that a length-preserving rational relation over $A^* \times A^*$ is a rational subset of $(A \times A)^*$, so a length-preserving rational relation can be computed by a letter-to-letter finite transducer. Unfortunately, the relation induced by a one-rule length-preserving rewrite system need not be rational: for instance, the rewrite system $S_1 = \{ba \to ab\}$ clearly induces a relation that does not preserve regularity. As a consequence, the letter-to-letter transducer associated with a one-rule length-preserving rewrite system that we define in this work need not be finite, and we do not provide an effective algorithm for computing it either. Nevertheless, we get that it is finite if and only if the relation induced by its corresponding one-rule length-preserving rewrite system is rational, by proving here the if part.

∗ Corresponding author. E-mail addresses: [email protected] (M. Latteux), [email protected] (Y. Roos).

¹ See [4] for a history of these problems and the attempts to solve them.

http://dx.doi.org/10.1016/j.ic.2015.07.002 0890-5401/© 2015 Elsevier Inc. All rights reserved.


Clearly, a letter-to-letter finite transducer over $A^* \times A^*$ can be seen as an automaton whose input alphabet is $A \times A$. The states of this automaton have to store all the potential and future rewritings that can be applied in a given position in a whole derivation for a given one-rule length-preserving rewrite system $S = \{u \to v\}$. For this, we introduce the notion of rewriting positions sequence (rps for short), that is, the sequence of all the positions where the rule $u \to v$ is applied during the derivation, and we also define the crucial notion of left rps. The left derivation (or left rps) from a word $w$ to a word $w'$ satisfies the property that each step in the derivation is applied at the leftmost position (in the current word) such that a derivation to $w'$ is still possible. This notion of left rps gives a canonical representative for the set of all derivations from a given word $w$ to a given word $w'$ and plays a central role in the definition of the states of the canonical automaton associated with a one-rule length-preserving rewrite system.

The paper is organized as follows. First, in Section 3, we define the notion of rewriting positions sequence and explore some of its properties. We also define left derivations and left rps, and the contribution of a rps to a given position $i$. For every one-rule length-preserving rewrite system $S = \{u \to v\}$, we define the semi-commutation relation $\theta_S$ and prove that its corresponding semi-commutation rewrite system can be used to compute from every rps its equivalent left rps. In Section 4, we give the definition of the canonical automaton associated with a one-rule length-preserving rewrite system $S = \{u \to v\}$ and we prove that this automaton is finite if and only if the relation induced by the rewrite system $S$ is rational. In Section 5, we give a sufficient condition for the context-freeness of the language $L$ recognized by the automaton defined in Section 4 and, when this condition is satisfied, we construct a pushdown automaton that recognizes $L$. Finally, in the conclusion, we identify some problems that deserve to be studied in the context of the rationality of one-rule length-preserving rewrite systems.

2. Preliminaries and notations

Let $\Sigma$ be a finite or infinite alphabet; $\Sigma^*$ will denote the free monoid over $\Sigma$ and $\varepsilon$ the empty word in $\Sigma^*$. For a word $w \in \Sigma^*$, $|w|$ denotes the length of the word $w$ and, for any letter $a \in \Sigma$, $|w|_a$ denotes the number of occurrences of the letter $a$ in $w$. A word $w'$ is a factor of a word $w$ if there exist two words $w_1$ and $w_2$ such that $w = w_1 w' w_2$, and we denote by $F(w)$ the set of the factors of the word $w$. We denote by $SF(w)$ (respectively $PF(w)$) the set of suffixes (respectively prefixes) of the word $w$, that is:

$$SF(w) = \{w'' \in \Sigma^* \mid \exists w' \in \Sigma^*,\ w = w'w''\}, \qquad PF(w) = \{w' \in \Sigma^* \mid \exists w'' \in \Sigma^*,\ w = w'w''\}.$$

A rewrite system over an alphabet $\Sigma$ is a subset $S \subseteq \Sigma^* \times \Sigma^*$. Members of $S$ are denoted $u \to v$. We shall denote by $S^{-1}$ the system obtained from the system $S$ by reversing the rules of $S$, that is, $u \to v \in S$ iff $v \to u \in S^{-1}$. One-step derivation, denoted $\to$, is the binary relation over words defined by: for all $w, w' \in \Sigma^*$, $w \to w'$ iff there exist $u \to v \in S$ and $\alpha, \beta \in \Sigma^*$ such that $w = \alpha u \beta$ and $w' = \alpha v \beta$. The relation $\xrightarrow{*}$, called derivation relation, is the reflexive and transitive closure of the relation $\to$. Abusing notation, we shall identify in the following a given rewrite system $S$ with its associated transformation over languages: for every word $w \in \Sigma^*$, we shall denote by $S(w)$ the set $S(w) = \{w' \in \Sigma^* \mid w \xrightarrow{*} w'\}$ and, for every language $L \subseteq \Sigma^*$, $S(L) = \bigcup_{w \in L} S(w)$. For a derivation $w = w_0 \to w_1 \to \ldots \to w_n = w'$, $n$ is called the length of the derivation.

A rewrite system $S$ is called length preserving if for every rule $u \to v$ in $S$, $u$ and $v$ have the same length. Semi-commutation systems are particular cases of length preserving rewrite systems: a semi-commutation $\theta$ over an alphabet $\Sigma$ is an irreflexive binary relation included in $\Sigma \times \Sigma$. When the relation $\theta$ is symmetric, it is called a partial commutation. Semi-commutations and partial commutations were introduced in the context of trace theory [2]. With every semi-commutation $\theta$ is associated a semi-commutation rewrite system $R_\theta$ defined by $R_\theta = \{ab \to ba \mid (a, b) \in \theta\}$.

For every one-rule length-preserving rewrite system $S = \{u \to v\}$, we denote:

$$X = PF(u) \cap SF(v) \cap \Sigma^+, \qquad Z = SF(u) \cap PF(v) \cap \Sigma^+,$$
$$U' = uZ^{-1} = \{u' \in \Sigma^* \mid u \in u'Z\}, \qquad U'' = X^{-1}u,$$
$$V' = vX^{-1}, \qquad V'' = Z^{-1}v,$$

as depicted in Fig. 1.
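The sets defined above can be computed directly from $u$ and $v$. The following is a minimal Python sketch, not taken from the paper; the function names (`prefixes`, `suffixes`, `overlap_sets`) are hypothetical and chosen only for illustration. It assumes $u \neq v$ and $|u| = |v|$.

```python
def prefixes(w):
    """Non-empty prefixes of w (PF(w) without the empty word)."""
    return {w[:k] for k in range(1, len(w) + 1)}


def suffixes(w):
    """Non-empty suffixes of w (SF(w) without the empty word)."""
    return {w[k:] for k in range(len(w))}


def overlap_sets(u, v):
    """Return (X, Z, U', U'', V', V'') for the rule u -> v."""
    X = prefixes(u) & suffixes(v)
    Z = suffixes(u) & prefixes(v)
    U1 = {u[:len(u) - len(z)] for z in Z}  # U'  = u Z^{-1}
    U2 = {u[len(x):] for x in X}           # U'' = X^{-1} u
    V1 = {v[:len(v) - len(x)] for x in X}  # V'  = v X^{-1}
    V2 = {v[len(z):] for z in Z}           # V'' = Z^{-1} v
    return X, Z, U1, U2, V1, V2


# Illustration on the rule abab -> bbaa.
X, Z, U1, U2, V1, V2 = overlap_sets("abab", "bbaa")
```

For this rule the only non-empty overlaps are single letters, so each of the six sets contains exactly one word.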

Fig. 1. The sets $X$, $Z$, $U'$, $U''$, $V'$ and $V''$.

3. Rewritings and commutations

In the rest of this article, we consider a fixed non-trivial rewrite system $S = \{u \to v\}$ such that $|u| = |v|$. Here $S$ is called non-trivial if $u \neq v$. We denote by $A$ the alphabet of $S$: $A = \{x \mid |u|_x + |v|_x > 0\}$. Since $u$ and $v$ are distinct, there exist a word $d$ and two distinct letters $a$ and $b$ in $A$ such that $u \in dbA^*$ and $v \in daA^*$. We have:

Lemma 1. If $w = w_0 \ldots w_k \xrightarrow{*} w' = w'_0 \ldots w'_k$ with $w \neq w'$ and if $i$ is the smallest index such that $w_i \neq w'_i$ then:
1. $w_0 \ldots w_{i-1} \in A^* d$,
2. $w_i = b$ and $w'_i = a$,
3. $d w_i \ldots w_k \xrightarrow{*} d w'_i \ldots w'_k$.

The proof of this lemma is an induction over the length of the derivation $w \xrightarrow{*} w'$ and, as a matter of fact, it does not use the hypothesis that $S$ is length-preserving; Lemma 1 is true for every one-rule rewrite system. As a direct consequence of this lemma, we get that $S(vA^*) \subseteq daA^*$, so $S(vA^*) \cap uA^* = \emptyset$. Symmetrically we also have $S(A^*v) \cap A^*u = \emptyset$.

In order to keep precise information on the positions where rewritings apply, we use the following notion of rewriting positions sequence associated with a derivation. We associate with every derivation $w \xrightarrow{*} w'$ of length $k$ in $S$ its rewriting positions sequence (rps) $s$, defined by:

• if $k = 0$ then $s = \varepsilon$,
• else $w = \alpha u \beta \to \alpha v \beta \xrightarrow{*} w'$ and $s = |\alpha|.s'$ where $s'$ is the rps associated with the derivation $\alpha v \beta \xrightarrow{*} w'$.

Observe that every derivation, starting from a word $w$, is completely characterized by its rewriting positions sequence, which can be seen as a (finite) word over the (infinite) alphabet $\mathbb{N}$. We say that a word $s \in \mathbb{N}^*$ is a rps (for $S$) if there exists a derivation $w \xrightarrow{*} w'$ in $S$ whose rps is $s$; we denote this derivation by $w \xrightarrow{s} w'$ and we denote by $RPS(w, w')$ the set of all rps $s$ corresponding to the derivations from $w$ to $w'$. For every integer $k \in \mathbb{N}$, let us denote by $\mathrm{shift}_k$ the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by: $\forall i \in \mathbb{N}$, $\mathrm{shift}_k(i) = i + k$. For every sequence $s \in \mathbb{N}^*$, we also denote $\min(s) = \min(\{i \mid |s|_i > 0\})$ and $\max(s) = \max(\{i \mid |s|_i > 0\})$. The following properties are clearly satisfied:

Property 1.
1. every factor of a rps is a rps,
2. for every rps $s$, for every $i \in \mathbb{N}$, $ii \notin F(s)$,
3. if $s \in RPS(w, w')$ then for every $\alpha \in A^*$, $s \in RPS(w\alpha, w'\alpha)$,
4. $s \in RPS(w, w')$ if and only if for every $\alpha \in A^t$, $\mathrm{shift}_t(s) \in RPS(\alpha w, \alpha w')$,
5. in particular, if $s$ is a rps then the sequence $s'$ defined by $\mathrm{shift}_{\min(s)}(s') = s$ is a rps,
6. if $s$ is a rps then $|s|_0 \leq 1$,
7. if $s$ is a rps then $|s|_{\min(s)} = |s|_{\max(s)} = 1$.
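The definition of a rps and items 3 and 4 of Property 1 can be checked mechanically on small cases. The sketch below is only an illustration under stated assumptions: `apply_rps` and `shift` are hypothetical helper names, not notation from the paper, and the rule used is $S_1 = \{ba \to ab\}$ from the introduction.

```python
def apply_rps(u, v, w, s):
    """Apply the rule u -> v at the positions listed in s, in order.

    Returns the resulting word, or None when some position does not
    hold an occurrence of u (that is, when s is not a rps from w).
    """
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w


def shift(k, s):
    """The morphism shift_k applied letter by letter to s."""
    return [i + k for i in s]


u, v = "ba", "ab"
# 0.1 is a rps from baa to aab, whereas 1.0 is not a rps from baa.
first = apply_rps(u, v, "baa", [0, 1])
second = apply_rps(u, v, "baa", [1, 0])
# Item 3: appending a right context keeps the same rps.
right = apply_rps(u, v, "baa" + "b", [0, 1])
# Item 4: prepending a context of length t shifts the rps by t.
left = apply_rps(u, v, "bb" + "baa", shift(2, [0, 1]))
```

Returning `None` for an invalid sequence is a design choice of this sketch; it keeps "not a rps" distinct from any actual word.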

In the rest of this article, we assume $u = u_0 u_1 \ldots u_n$ and $v = v_0 v_1 \ldots v_n$ with, for every $i \in \{0, \ldots, n\}$, $u_i, v_i \in A$. Let us consider a one-step derivation $w_0 w_1 \ldots w_p \xrightarrow{i} w'_0 w'_1 \ldots w'_p$. For every integer $j$ that satisfies $0 \leq j \leq p - n$, if $j < i$ or $j > i + n$ then $w_j = w'_j$, and if $0 \leq j - i \leq n$ then $w_j = u_{j-i}$ and $w'_j = v_{j-i}$.

More generally, we can define the sequence of transformations that occur in a position $j$ during a derivation $w \xrightarrow{s} w'$: with every integer $j$, we associate the morphism $\mathrm{cont}_j : \mathbb{N}^* \to \mathbb{N}_n^*$, where $\mathbb{N}_n = \{i \in \mathbb{N} \mid i \leq n\}$, defined by $\mathrm{cont}_j(i) = j - i$ if $0 \leq j - i \leq n$, else $\mathrm{cont}_j(i) = \varepsilon$. For every rps $s$ and every integer $j$, we say that $\mathrm{cont}_j(s)$ is the contribution of the rps $s$ to the position $j$. Observe that if $j = i + 1$ we have $\mathrm{cont}_{i+1}(j) = \mathrm{shift}_1(i$ […] that is the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by: […]$_{>i}(j) = j$ if $j > i$, else […]$_{>i}(j) = \varepsilon$.

Example 1. Let $S_2 = \{baba \to abba\}$ and let us consider the following derivation from $w = babbababa$ to $w' = ababbabba$:

babbababa → bababbaba → abbabbaba → abbababba → ababbabba.


The rps associated with this derivation is the sequence 3.0.5.2. Observe that we also have 3.5.0.2 in $RPS(w, w')$. The contribution of the rps 3.0.5.2 to the position 3 is $\mathrm{cont}_3(3.0.5.2) = 0.3.1$.

Lemma 2. If $w = w_0 w_1 \ldots w_t \xrightarrow{s} w' = w'_0 w'_1 \ldots w'_t$ and $\mathrm{cont}_j(s) = i_1 \ldots i_k$ with $k > 0$, then $w_j = u_{i_1}$, $w'_j = v_{i_k}$ with $u_{i_2} \ldots u_{i_k} = v_{i_1} \ldots v_{i_{k-1}}$.

Proof. The proof is an induction over $|s|$, the length of the rps $s$. Observe that $|s| > 0$ since $k > 0$.

• If $|s| = 1$, the property is satisfied since $k = 1$, $w_j = u_{i_1}$, $w'_j = v_{i_1}$ and $u_{i_2} \ldots u_{i_k} = v_{i_1} \ldots v_{i_{k-1}} = \varepsilon$.
• If $|s| > 1$, then $s = is'$ with $i \in \mathbb{N}$ and $w \xrightarrow{i} w'' \xrightarrow{s'} w'$. We consider two cases:
  1. $\mathrm{cont}_j(i) = \varepsilon$: in this case $w_j = w''_j$ and $\mathrm{cont}_j(s') = i_1 \ldots i_k$, so the property is satisfied by inductive hypothesis;
  2. $\mathrm{cont}_j(i) \neq \varepsilon$: in this case $\mathrm{cont}_j(i) = i_1$ and $\mathrm{cont}_j(s') = i_2 \ldots i_k$. It follows $w_j = u_{i_1}$ and $w''_j = v_{i_1}$. By inductive hypothesis, we get $w''_j = u_{i_2}$, $w'_j = v_{i_k}$ and $u_{i_3} \ldots u_{i_k} = v_{i_2} \ldots v_{i_{k-1}}$. It follows $w''_j = v_{i_1} = u_{i_2}$, so $u_{i_2} \ldots u_{i_k} = v_{i_1} \ldots v_{i_{k-1}}$. □
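The contribution morphism and the statement of Lemma 2 can be checked on Example 1. The following Python sketch is illustrative only; the names `cont` and `lemma2_holds` are hypothetical, and positions to which the rps contributes nothing are simply skipped.

```python
def cont(j, s, n):
    """Contribution of the rps s to position j, where n = |u| - 1."""
    return [j - i for i in s if 0 <= j - i <= n]


def lemma2_holds(u, v, w, w_prime, s):
    """Check the three equalities of Lemma 2 at every position of w."""
    n = len(u) - 1
    for j in range(len(w)):
        c = cont(j, s, n)
        if not c:
            continue  # Lemma 2 only speaks about non-empty contributions
        first_ok = w[j] == u[c[0]]
        last_ok = w_prime[j] == v[c[-1]]
        chain_ok = [u[i] for i in c[1:]] == [v[i] for i in c[:-1]]
        if not (first_ok and last_ok and chain_ok):
            return False
    return True


# Data of Example 1: S_2 = {baba -> abba}, rps 3.0.5.2.
u, v = "baba", "abba"
w, w_prime = "babbababa", "ababbabba"
s = [3, 0, 5, 2]
c3 = cont(3, s, len(u) - 1)
ok = lemma2_holds(u, v, w, w_prime, s)
```

The chain condition compares the letters read by each rewriting after the first with the letters written by each rewriting before the last, which is exactly the equality $u_{i_2} \ldots u_{i_k} = v_{i_1} \ldots v_{i_{k-1}}$.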

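We shall now prove that if $s$ is in $RPS(w, w')$ and $s'$ is in $RPS(w, w'')$ for some words $w, w', w''$ then $w' = w''$ if and only if $s$ and $s'$ are permutations of each other. Recall that $u \in dbA^*$ and $v \in daA^*$ for some word $d$ and distinct letters $a$ and $b$. In the rest of the article, we shall denote $r = |d|$.

Lemma 3. If $xy \xrightarrow{s} x'y'$ and $xz \xrightarrow{s'} x'z'$ for some words $x, y, z, x', y', z'$ with $|x| = |x'|$ then for every $i < |x| - r$, $|s|_i = |s'|_i$.

Proof. Assume that $i < |x| - r$ is the smallest integer such that $|s|_i \neq |s'|_i$. Thus $|x| > r$, so $x = x_0 \ldots x_{|x|-1}$ and $x' = x'_0 \ldots x'_{|x|-1}$. Let $j = i + r$, $\mathrm{cont}_j(s) = i_1 \ldots i_k$, $\mathrm{cont}_j(s') = i'_1 \ldots i'_{k'}$ and let us compute

$$\delta_a = |v_{i_1} \ldots v_{i_k}|_a - |u_{i_1} \ldots u_{i_k}|_a.$$

We can write $\delta_a = \Delta_{>r} + \Delta_{<r} + \Delta_r$ with

$$\Delta_{>r} = \sum_{\substack{h = 1, \ldots, k \\ i_h > r}} (|v_{i_h}|_a - |u_{i_h}|_a), \qquad \Delta_{<r} = \sum_{\substack{h = 1, \ldots, k \\ i_h < r}} (|v_{i_h}|_a - |u_{i_h}|_a) \qquad \text{and} \qquad \Delta_r = \sum_{\substack{h = 1, \ldots, k \\ i_h = r}} (|v_{i_h}|_a - |u_{i_h}|_a).$$

Since $v_r = a$ and $u_r = b \neq a$, we get $\Delta_r = |\mathrm{cont}_j(s)|_r = |s|_i$. Moreover, if $i_h < r$ then $u_{i_h} = v_{i_h}$ by definition of $d$, and it follows $\Delta_{<r} = 0$, so $\delta_a = \Delta_{>r} + |s|_i$. Similarly, $\delta'_a = |v_{i'_1} \ldots v_{i'_{k'}}|_a - |u_{i'_1} \ldots u_{i'_{k'}}|_a = \Delta'_{>r} + |s'|_i$ by defining $\delta'_a = \Delta'_{>r} + \Delta'_{<r} + \Delta'_r$ like $\delta_a = \Delta_{>r} + \Delta_{<r} + \Delta_r$. An index $i_h > r$ comes from a position $j - i_h < i$ and, by the minimality of $i$, $s$ and $s'$ contain the same number of occurrences of every position smaller than $i$; it follows $\Delta_{>r} = \Delta'_{>r}$, hence $\delta'_a - \delta_a = |s'|_i - |s|_i \neq 0$.

On the other hand, we have from Lemma 2: $\delta_a = |v_{i_k}|_a - |u_{i_1}|_a$ since $u_{i_2} \ldots u_{i_k} = v_{i_1} \ldots v_{i_{k-1}}$, and similarly $\delta'_a = |v_{i'_{k'}}|_a - |u_{i'_1}|_a$. It follows

$$\delta'_a - \delta_a = |v_{i'_{k'}}|_a - |v_{i_k}|_a + |u_{i_1}|_a - |u_{i'_1}|_a$$

and, since $v_{i'_{k'}} = x'_j = v_{i_k}$ and $u_{i'_1} = x_j = u_{i_1}$, we finally obtain $\delta'_a - \delta_a = 0$, a contradiction. □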

We are now able to prove the following theorem where, for every word $w \in \Sigma^*$, the set $\mathrm{com}(w) = \{w' \in \Sigma^* \mid \forall a \in \Sigma, |w|_a = |w'|_a\}$ denotes the commutative closure of the word $w$.

Theorem 1. If $s \in RPS(w, w')$ and $s' \in RPS(w, w'')$ for some words $w, w', w''$ then $w' = w''$ if and only if $\mathrm{com}(s) = \mathrm{com}(s')$.

Proof. Let $w = w_0 \ldots w_t$, $w' = w'_0 \ldots w'_t$ and $w'' = w''_0 \ldots w''_t$. Let us suppose $\mathrm{com}(s) = \mathrm{com}(s')$ and take $j \leq |w|$; we shall prove $w'_j = w''_j$.

Let $\mathrm{cont}_j(s) = i_1 \ldots i_k$ and $\mathrm{cont}_j(s') = i'_1 \ldots i'_k$. From the hypothesis, $\mathrm{com}(\mathrm{cont}_j(s)) = \mathrm{com}(\mathrm{cont}_j(s'))$ and it follows that $\mathrm{com}(u_{i_1} \ldots u_{i_k}) = \mathrm{com}(u_{i'_1} \ldots u_{i'_k})$ and $\mathrm{com}(v_{i_1} \ldots v_{i_k}) = \mathrm{com}(v_{i'_1} \ldots v_{i'_k})$. From Lemma 2, we get $w_j = u_{i_1} = u_{i'_1}$, which implies $\mathrm{com}(u_{i_2} \ldots u_{i_k}) = \mathrm{com}(u_{i'_2} \ldots u_{i'_k})$. Moreover, since $u_{i_2} \ldots u_{i_k} = v_{i_1} \ldots v_{i_{k-1}}$ and $u_{i'_2} \ldots u_{i'_k} = v_{i'_1} \ldots v_{i'_{k-1}}$, it follows $\mathrm{com}(v_{i_1} \ldots v_{i_{k-1}}) = \mathrm{com}(v_{i'_1} \ldots v_{i'_{k-1}})$ and we get $v_{i_k} = v_{i'_k}$, so $w'_j = w''_j$.

The converse implication is a direct consequence of Lemma 3 in the case² $y = z = \varepsilon$. □

² The case $y \neq \varepsilon$ or $z \neq \varepsilon$ of Lemma 3 will be needed in the proof of Proposition 4.

Note that we symmetrically get: if $\mathrm{com}(s) = \mathrm{com}(s')$ and $s \in RPS(w', w)$ and $s' \in RPS(w'', w)$ then $w' = w''$. We also observe that Theorem 1 does not hold if the rewrite system is not length-preserving, as shown by the following example:

Example 2. Let $S_3 = \{a \to aba\}$. We have $aaa \xrightarrow{0} abaaa \xrightarrow{2} ababaaa$ and $aaa \xrightarrow{2} aaaba \xrightarrow{0} abaaaba$. Conversely, we have $aa \xrightarrow{0} abaa \xrightarrow{3} abaaba$ and $aa \xrightarrow{1} aaba \xrightarrow{0} abaaba$.
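The four derivations of Example 2 can be replayed with a few lines of Python. This is a hedged sketch: `apply_rps` is a hypothetical helper (not notation from the paper) that applies a rule at the listed positions and returns `None` when a position does not hold an occurrence of the left-hand side.

```python
def apply_rps(u, v, w, s):
    """Apply u -> v at the positions of s in order; None if impossible."""
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w


u, v = "a", "aba"  # S_3 = {a -> aba}, not length-preserving

# Same positions in a different order, different results.
r1 = apply_rps(u, v, "aaa", [0, 2])
r2 = apply_rps(u, v, "aaa", [2, 0])

# Different multisets of positions, same result.
r3 = apply_rps(u, v, "aa", [0, 3])
r4 = apply_rps(u, v, "aa", [1, 0])
```

Both directions of the equivalence of Theorem 1 therefore fail for this system: `r1` and `r2` differ although 0.2 and 2.0 are permutations of each other, while `r3` and `r4` coincide although 0.3 and 1.0 are not.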

Moreover, it is clear that, even if $S$ is a one-rule length-preserving rewrite system, if $s \in RPS(w, w')$ and if $s'$ is a sequence that satisfies $\mathrm{com}(s) = \mathrm{com}(s')$, it does not always hold that $s' \in RPS(w, w')$: indeed, for $S_1 = \{ba \to ab\}$, $0.1 \in RPS(baa, aab)$ but $1.0 \notin RPS(baa, aab)$. So, a natural question arises in this context: given a one-rule length-preserving rewrite system and a rps $s$ corresponding to a derivation between two words $w$ and $w'$, how can the set $RPS(w, w')$ be computed? From Theorem 1, we know that $RPS(w, w') \subseteq \mathrm{com}(s)$, so $RPS(w, w')$ is finite. Hence, $RPS(w, w')$ can be computed as follows: enumerate $\mathrm{com}(s)$ and, for each $s' \in \mathrm{com}(s)$, test whether $w \xrightarrow{s'} w'$. As a matter of fact, there exists a particular member of $RPS(w, w')$ from which a more efficient procedure can be applied to compute $RPS(w, w')$. This particular rps corresponds to the left derivation from $w$ to $w'$ defined below. Roughly speaking, in each step of a left derivation from a word $w$ to a word $w'$, the rewriting rule is applied on the leftmost occurrence of $u$ that keeps $w'$ reachable. It corresponds to the smallest sequence of $RPS(w, w')$ with respect to the lexicographic order.
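The brute-force procedure just described (enumerate $\mathrm{com}(s)$ and test each candidate) can be sketched as follows. This is an illustrative Python sketch, not the more efficient procedure the paper goes on to develop; `apply_rps` and `rps_set` are hypothetical names, and the enumeration is exponential in $|s|$.

```python
from itertools import permutations


def apply_rps(u, v, w, s):
    """Apply u -> v at the positions of s in order; None if impossible."""
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w


def rps_set(u, v, w, s):
    """RPS(w, w') in lexicographic order, given one rps s from w to w'."""
    target = apply_rps(u, v, w, s)
    if target is None:
        raise ValueError("s is not a rps starting from w")
    # By Theorem 1, every rps from w to w' is a permutation of s.
    valid = {p for p in permutations(s) if apply_rps(u, v, w, p) == target}
    return sorted(valid)


# Example 1: S_2 = {baba -> abba}, w = babbababa, s = 3.0.5.2.
all_rps = rps_set("baba", "abba", "babbababa", [3, 0, 5, 2])
smallest = all_rps[0]  # lexicographically smallest member
```

On Example 1 the enumeration keeps exactly the two sequences mentioned there, and the first element of the sorted list is the lexicographically smallest member of $RPS(w, w')$.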

Definition 1. A derivation $w \xrightarrow{i} w'' \xrightarrow{s} w'$ (and its corresponding rps $is$) is left if the two following properties are satisfied:
1. the derivation $w'' \xrightarrow{s} w'$ is left,
2. if $w = \alpha u \beta$ with $|\alpha| < i$ then $w' \notin S(\alpha v \beta)$.

Note that we shall consider that an empty derivation is left. Furthermore, it is clear from the definition that, given two words $w$ and $w'$ with $w \xrightarrow{*} w'$, there exists a unique left derivation from $w$ to $w'$, and we denote by $\mathrm{leftrps}(w, w')$ its corresponding rps. We say that a sequence $s$ is a left rps if there exist words $w$ and $w'$ such that $s = \mathrm{leftrps}(w, w')$.

Example 3. Let $S_4 = \{abab \to bbaa\}$. The derivation

$$w = ababbaabababab \xrightarrow{6.10.0.3} w' = bbabbaabaabbaa$$

is not left since $w \xrightarrow{6.0.10.3} w'$. The left derivation from $w$ to $w'$ is $w \xrightarrow{0.6.3.10} w'$. We can observe that the indices in the rps of a left derivation need not occur in ascending order. As a matter of fact, it is the case if and only if $Z \subseteq X$, as shown in [13].

Even though we now have a notion of canonical derivation and its associated rps, we do not yet have an operation that can transform a rps into the canonical one. This operation will be the following semi-commutation defined over indices of rps. In this commutation, two indices $i$ and $j$ can commute if the application of a rewriting in position $i$ followed by an application of a rewriting in position $j$ is always equivalent to the application of a rewriting in position $j$ followed by an application of a rewriting in position $i$. It is the case when $|i - j| > n$ or when $u = xu'x$, $v = xv'x$ for some words $x$, $u'$ and $v'$ with $|xu'| = |i - j|$. Indeed, let us consider $ji \in RPS(w, w')$ for some words $w$, $w'$ and assume $i < j$. If $j - i > n$ then $w = \alpha u \beta u \gamma$ and $w' = \alpha v \beta v \gamma$ with $|\alpha| = i$ and $|\alpha u \beta| = j$, and we clearly have $ij$ in $RPS(w, w')$. Now if $j - i \leq n$ and $u = xu'x$, $v = xv'x$ for some words $x, u', v'$ with $|xu'| = j - i$, it follows $w = \alpha x u' x u' x \beta$ and $w' = \alpha x v' x v' x \beta$ with $|\alpha| = i$ and $|\alpha x u'| = j$, so $ij$ is in $RPS(w, w')$. This motivates the following definition of the semi-commutation $\theta_S$, which is not symmetric since our goal is to get left derivations:

Definition 2. The semi-commutation $\theta_S \subseteq \mathbb{N} \times \mathbb{N}$ is defined by

$$\theta_S = \{(j, i) \mid (j > i + n) \vee (i < j \leq i + n \wedge i + n - j + 1 \in F)\}$$

where $F = \{|x| \mid x \in X \cap Z\}$.

Clearly if $(j, i) \in \theta_S$, then the sequence $ji$ is a non-left rps and the sequence $ij$ is a left rps. We also observe that, if the semi-commutation $\theta_S$ is not trivial, that is if $F \neq \emptyset$, then $u_0 = v_0$ and $u_n = v_n$. In particular, we have:

Lemma 4. For every $0 < j \leq n$, $(j, 0) \in \theta_S$ if and only if $u_j \ldots u_n \in X \cap Z$.

Furthermore, $(j, i) \in \theta_S$ only depends on the value $j - i$ in the definition; it follows that $(j, i) \in \theta_S$ if and only if $(j - i, 0) \in \theta_S$. We also have the following, used in the proof of Proposition 4:

Lemma 5. If $(i, j) \in \theta_S$ then for every $k \geq i$, the sequence $kj$ is not a left rps. Symmetrically, for every $k \leq j$, the sequence $ik$ is not a left rps.

Proof. Let $(i, j) \in \theta_S$ and $k \geq i$. If $kj$ is not a rps then it is not a left rps. Else, if $k > j + n$, then $(k, j) \in \theta_S$ and it follows that $kj$ is not left. It remains the case $k \leq j + n$, which implies $i \leq j + n$ too. Let $x = u_{i-j} \ldots u_n$; since $(i, j) \in \theta_S$, it follows $x \in X \cap Z$. Let $x' = u_{k-j} \ldots u_n$; since $kj$ is a rps, it follows $x' \in Z$. Moreover $|x'| \leq |x|$ so $x' \in PF(x) \subseteq PF(u)$. On the other hand, since $x' \in SF(u)$ and $x \in Z$, we get $x' \in SF(x) \subseteq SF(v)$ so $x' \in X$. This implies $(k, j) \in \theta_S$ and $kj$ is not left. One can symmetrically prove that if $k \leq j$, the sequence $ik$ is not a left rps. □

Another property of the relation $\theta_S$ is:

Lemma 6. The semi-commutation $\theta_S$ is transitive.

Proof. Let $(j, i) \in \theta_S$ and $(i, k) \in \theta_S$. It follows $k < i < j$. Suppose that $k < j \leq k + n$. It follows $k < i \leq k + n$, $i < j \leq i + n$,

$$u' = PF(u) \cap A^{i+n-j+1} \in SF(u) \cap PF(v) \cap SF(v) \quad \text{and} \quad u'' = PF(u) \cap A^{k+n-i+1} \in SF(u) \cap PF(v) \cap SF(v).$$

Since $j \leq k + n$, we get $|u'| + |u''| \geq n + 1 = |u|$, which implies $u = v$, a contradiction. It follows that $j > k + n$, so $(j, k) \in \theta_S$. □
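Definition 2 and Lemma 6 can be explored on small systems. The sketch below is an illustration under stated assumptions: `overlap_lengths` and `in_theta` are hypothetical names, the rule $aba \to aca$ is a made-up system chosen because it has $F \neq \emptyset$, and transitivity is only tested on a finite range of indices, which illustrates Lemma 6 without proving it.

```python
def overlap_lengths(u, v):
    """F = { |x| : x in X ∩ Z }, the lengths of the non-empty words that
    are at once a prefix and a suffix of both u and v (u != v assumed)."""
    F = set()
    for k in range(1, len(u)):
        x = u[:k]
        if x == u[-k:] and x == v[:k] and x == v[-k:]:
            F.add(k)
    return F


def in_theta(j, i, u, v):
    """Membership of the pair (j, i) in the semi-commutation theta_S."""
    n = len(u) - 1
    if j > i + n:
        return True
    return i < j <= i + n and (i + n - j + 1) in overlap_lengths(u, v)


def transitive_on(u, v, bound):
    """Check transitivity of theta_S restricted to indices below bound."""
    rng = range(bound)
    return all(in_theta(j, k, u, v)
               for j in rng for i in rng for k in rng
               if in_theta(j, i, u, v) and in_theta(i, k, u, v))


F_example = overlap_lengths("aba", "aca")  # hypothetical rule aba -> aca
F_s4 = overlap_lengths("abab", "bbaa")     # S_4, no common overlap
```

For the hypothetical rule, $F = \{1\}$, so positions at distance exactly $n = 2$ commute; for $S_4$ the set $F$ is empty and only positions at distance greater than $n$ commute.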

Métivier and Ochmański [17] have proved that if a semi-commutation $\theta$ has no symmetric rule then its corresponding system $R_\theta$ is confluent if and only if $\theta$ is transitive. So, we directly obtain:

Corollary 1. $R_{\theta_S}$ is confluent.

A first and easy result connecting the semi-commutation $\theta_S$ and derivations in $S$ is:

Proposition 1. Let $s \in \mathbb{N}^*$ and $s' \in R_{\theta_S}(s)$; then $s \in RPS(w, w')$ if and only if $s' \in RPS(w, w')$.

Proof. By an induction argument, it is sufficient to consider one step of rewriting in $R_{\theta_S}$ from $s$ to $s'$. Thus we can assume $s = ji$ and $s' = ij$ with $(j, i) \in \theta_S$. Let us consider two words $w$ and $w'$ such that $w \xrightarrow{ji} w'$. We shall prove that $w \xrightarrow{ij} w'$. We consider two cases:

1. $j > i + n$. In this case,
$$w = \alpha_1 u \alpha_2 u \alpha_3 \xrightarrow{j = |\alpha_1 u \alpha_2|} \alpha_1 u \alpha_2 v \alpha_3 \xrightarrow{i = |\alpha_1|} \alpha_1 v \alpha_2 v \alpha_3 = w'$$
and, clearly, we can also have
$$w = \alpha_1 u \alpha_2 u \alpha_3 \xrightarrow{i = |\alpha_1|} \alpha_1 v \alpha_2 u \alpha_3 \xrightarrow{j = |\alpha_1 u \alpha_2|} \alpha_1 v \alpha_2 v \alpha_3 = w'.$$

2. $i < j \leq i + n$. In this case, $u = \alpha \beta \alpha$ and $v = \alpha \gamma \alpha$ for some words $\alpha$, $\beta$ and $\gamma$ with $|\alpha| = i + n - j + 1$. It follows $w = \alpha_1 \alpha \beta \alpha \beta \alpha \alpha_2$ with $|\alpha_1| = i$, $|\alpha_1 \alpha \beta| = j$ and $w \xrightarrow{j} \alpha_1 \alpha \beta \alpha \gamma \alpha \alpha_2 \xrightarrow{i} \alpha_1 \alpha \gamma \alpha \gamma \alpha \alpha_2 = w'$. This implies that we also have $w \xrightarrow{i} \alpha_1 \alpha \gamma \alpha \beta \alpha \alpha_2 \xrightarrow{j} \alpha_1 \alpha \gamma \alpha \gamma \alpha \alpha_2 = w'$.

Similarly we can prove that $w \xrightarrow{ij} w'$ implies $w \xrightarrow{ji} w'$, which finishes the proof. □
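Since every rule $ji \to ij$ of $R_{\theta_S}$ has $j > i$, each step removes an inversion, so rewriting terminates, and Corollary 1 makes the resulting normal form unique. The Python sketch below computes this normal form and checks Proposition 1 on two rps met earlier. All function names are hypothetical. The fact that the normal form coincides with the left rps on these two instances is only an observation about these instances (both left rps are stated in Examples 1 and 3), not a claim the sketch establishes in general.

```python
def overlap_lengths(u, v):
    """F = { |x| : x in X ∩ Z } for the rule u -> v (u != v assumed)."""
    return {k for k in range(1, len(u))
            if u[:k] == u[-k:] == v[:k] == v[-k:]}


def in_theta(j, i, u, v):
    """Membership of (j, i) in the semi-commutation theta_S."""
    n = len(u) - 1
    return j > i + n or (i < j <= i + n
                         and (i + n - j + 1) in overlap_lengths(u, v))


def normal_form(s, u, v):
    """Rewrite adjacent pairs j.i into i.j while (j, i) is in theta_S."""
    s = list(s)
    changed = True
    while changed:
        changed = False
        for p in range(len(s) - 1):
            if in_theta(s[p], s[p + 1], u, v):
                s[p], s[p + 1] = s[p + 1], s[p]
                changed = True
    return s


def apply_rps(u, v, w, s):
    """Apply u -> v at the positions of s in order; None if impossible."""
    for i in s:
        if w[i:i + len(u)] != u:
            return None
        w = w[:i] + v + w[i + len(u):]
    return w


nf3 = normal_form([6, 10, 0, 3], "abab", "bbaa")  # rps of Example 3
nf1 = normal_form([3, 5, 0, 2], "baba", "abba")   # rps of Example 1
```

Scanning adjacent pairs until nothing changes is a bubble-sort style strategy; any other rewriting order would reach the same result because of confluence.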

We directly obtain as a corollary:

Corollary 2. If $s = \mathrm{leftrps}(w, w')$ then $s$ is in $R_{\theta_S}$ normal form.

Conversely, we shall prove that if $s = \mathrm{leftrps}(w, w')$, it holds that $s'$ in $RPS(w, w')$ implies $s$ in $R_{\theta_S}(s')$. The proof is not so immediate and, in a first step, we show that the property holds in a special case. For this, we need several intermediate properties. Recall that $u = dbg$ and $v = dag'$ for some words $g$ and $g'$ and some distinct letters $a$ and $b$, where $d$ is the longest common prefix of $u$ and $v$. Let us denote $C = \{u' \mid u'da \in PF(u)\}$. Observe that $\varepsilon \notin C$ and that $C^+$ is not always included in $dbA^*$: for instance, if $S = \{bba \to bab\}$ then $C = \{b\}$ is not included in $bbA^*$, but we have:

Lemma 7. $C^* db \subseteq dbA^*$.

Proof. The inclusion is clearly true if $C = \emptyset$. Otherwise, we shall prove that $C^i db \subseteq dbA^*$ for every integer $i$. This is clear for $i = 0$. Next, assume that $C^i db \subseteq dbA^*$ and let $u' \in C$. It follows $u' C^i db \subseteq u' dbA^*$. Since $u'da \in PF(u)$ and by a length argument, we get $db \in PF(u'd)$, so $u' C^i db \subseteq dbA^*$. □

Lemma 8. If $xda \in PF(C^*)$ then $x \in C^*$.

Proof. There exists some word $y$ such that $xday = x'x''$ with $x' \in C^*$, $x'' \in C$ and $|y| < |x''|$. We can distinguish two cases:
1. $|day| < |x''|$. In that case, $x = x'z$ with $zday \in C$. It follows $zda \in PF(u)$, which implies $z \in C$ and $x = x'z \in C^*$.
2. $|day| \geq |x''|$. In that case $d = d'd''$ with […] $\neq \varepsilon$, $x' = xd'$ and $x'' = d''ay \in C$. It follows $xd' \in C^*$, $d' = d_1 d_2$ with $d_2 \in C^*$, $x \in C^* f$ for some word $f$ and $f d_1 \in C$. Since $d_2 d'' a \in PF(C^*)$ with $|d_2 d'' a| \leq |db|$, we get $d_2 d'' a \in PF(db)$ by Lemma 7, so $d_2 d'' a \in PF(d)$, and we get $fda = f d_1 d_2 d'' a \in PF(f d_1 d a)$, which is included in $PF(u)$ because $f d_1 \in C$. So $f \in C$ and $x \in C^*$. □

Lemma 9. Let $w$ and $z$ be two words that satisfy $w, wz \in C^*$ with $db \in PF(zdb)$; then $z \in C^*$.

Proof. The property is clearly true if $C = \emptyset$. Otherwise, let $P = (db)^{-1} C db$; $P$ is a prefix code that satisfies $dbP^* = C^* db$: indeed, by definition of $P$ and Lemma 7, we obtain $dbP = Cdb$ and, by induction, $dbP^k = C^k db$ for every $k \geq 0$. We get $wzdb = wdbz'$ with $wdb, wdbz' \in dbP^*$, and it follows $z' \in P^*$ and $zdb = dbz' \in dbP^* = C^* db$, so $z \in C^*$. □

Lemma 10. If $dbu'b \in PF(u)$, then $C^* u A^* \cap C^* dbu'a A^* = \emptyset$.

Proof. First observe that $C^+ dbA^* = CdbA^*$ since $C^* db \subseteq dbA^*$ by Lemma 7. Now, if $C = \emptyset$, we are done. Otherwise, assume $wuy = w'dbu'ay'$ with $w, w' \in C^*$. Since $dbu'a \notin PF(u)$, $w \neq w'$. We can distinguish two cases:
1. $w' = wz$. In that case $uy = zdbu'ay'$ with $z \in C^+$ by Lemma 9, so $C^+ dbA^* \cap uA^* \neq \emptyset$. On the other hand, $Cda \subseteq PF(u)$ implies that $CdbA^* \cap uA^* = \emptyset$ and so $C^+ dbA^* \cap uA^* = \emptyset$, a contradiction.
2. $w = w'z$. In that case $zuy = dbu'ay'$ with $z \in C^*$ by Lemma 9 and $dbu'a \in PF(C^* u)$. Since $dbu'a \notin PF(u)$, we get
$$dbu'a \in PF(C^+ u) \subseteq PF(C^+ dbA^*) = PF(CdbA^*).$$
It follows that there exists some word $x \in C$ such that $dbu'a \in PF(xdbA^*)$. Since $xda \in PF(u)$, it follows $xdb \notin PF(dbu') \subseteq PF(u)$. From $dbu'a \in PF(xdbA^*)$ and $xdb \notin PF(dbu')$, we get $|x| > |u'|$. That implies $dbu'a \in PF(xdb)$, so $dbu'a \in PF(xd) \subseteq PF(u)$, a contradiction. □

Lemma 11. If $S(w) \cap uA^* \neq \emptyset$ then $w \in C^* u A^*$.

Proof. Let us consider a derivation of minimal length $w \xrightarrow{s} ux$ for some word $x \in A^*$. The proof is by induction on $|s|$. If $s = \varepsilon$ then $w \in uA^*$; else let us consider the smallest $i$ such that $|s|_i > 0$. We have $s = s' i s''$ and $w \xrightarrow{s'} yuw' \xrightarrow{i} yvw' \xrightarrow{s''} ux$ with $i = |y|$. It follows $w = yz$ with $z \xrightarrow{s'''} uw'$ where $\mathrm{shift}_{|y|}(s''') = s'$. By inductive hypothesis, we get $z \in C^* u A^*$. We also have $ux = yz'$ and $vw' \xrightarrow{*} z'$. It follows $z' = daz''$ by $S(vA^*) \subseteq daA^*$. Now, from the minimality of the length of the derivation $w \xrightarrow{s} ux$, we get $u \notin PF(yuw')$. This implies $u \notin PF(yd)$, so $yda \in PF(u)$, $y \in C$ and $w = yz \in C^* u A^*$. □

Note that the converse of this lemma does not hold: if we consider $S_5 = \{baa \to aba\}$, we have $d = \varepsilon$ and $C = \{b, ba\}$, so $w = babbaa \in C^2 u$ but $S_5(w) = \{babbaa, bababa\}$ and $S_5(w) \cap uA^* = \emptyset$. The following lemma shows that there exists only one occurrence of $u$ as a factor in $C^* u$:

Lemma 12. $C^* u \cap A^* u A^+ = \emptyset$.

Proof. The case $C = \emptyset$ is clear, so we can suppose $C \neq \emptyset$, which implies $da \in F(u)$. Assume that $wu = w'uw''$ for some $w \in C^*$ and $w'' \neq \varepsilon$ and consider the last occurrence of $da$ as a factor in $u$: $u = \alpha da \beta$ with $da\beta \notin A^+ daA^*$. It follows $\alpha \in C$ and $|w''| > |\beta|$, so $u \in F(C^* d)$. Moreover, by Lemma 7 and a length argument, we get $d \in PF(C^*)$, so $u \in F(C^*)$. Let $u'$ be the shortest member of $C$ that satisfies $u' = xy$ with $y \neq \varepsilon$ and $u \in PF(yC^*)$. Since $u'db \notin PF(C^*)$, it follows $x \neq \varepsilon$. Moreover, since $u'da \in PF(u)$, we also get $u' = yz$ with $zda \in PF(C^*)$, which implies $z \in C^*$ by Lemma 8. Now, from the equalities $u' = xy = yz$, it follows $u' \in SF(z^*)$ with $|z| < |u'|$, a contradiction with the choice of $u'$. So $C^* u \cap A^* u A^+ = \emptyset$. □
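The counterexample with $S_5$ and the set $C$ are easy to recompute. The following Python sketch is only illustrative; `common_prefix`, `c_set` and `descendants` are hypothetical names. It assumes $|u| = |v|$ and $u \neq v$, so that $S(w)$ is finite and the letter $a$ following $d$ in $v$ exists.

```python
def common_prefix(u, v):
    """Longest common prefix d of u and v."""
    k = 0
    while k < len(u) and u[k] == v[k]:
        k += 1
    return u[:k]


def c_set(u, v):
    """C = { u' | u'da in PF(u) }, with d the longest common prefix of
    u and v and a the letter that follows d in v."""
    d = common_prefix(u, v)
    da = d + v[len(d)]
    return {u[:k] for k in range(len(u) - len(da) + 1)
            if u[k:k + len(da)] == da}


def descendants(u, v, w):
    """S(w): all words reachable from w with the rule u -> v."""
    seen, todo = {w}, [w]
    while todo:
        x = todo.pop()
        for i in range(len(x) - len(u) + 1):
            if x[i:i + len(u)] == u:
                y = x[:i] + v + x[i + len(u):]
                if y not in seen:
                    seen.add(y)
                    todo.append(y)
    return seen


C5 = c_set("baa", "aba")                  # S_5 = {baa -> aba}
S5w = descendants("baa", "aba", "babbaa")
reaches_u = any(x.startswith("baa") for x in S5w)
```

Here `babbaa` factors as `ba`·`b`·`baa`, a word of $C^2 u$, yet no descendant starts with $u$, which is the failure of the converse noted above.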


We denote by $f$ the longest common suffix of $u$ and $v$ and we first suppose that $|d| = r \leq |f|$. In this case, we shall prove that, for $w' \in S(w)$, we can construct step by step the left rps from $w$ to $w'$. First we have:

Lemma 13. If $PF(u) \cap SF(u) \cap PF(v) \subseteq SF(v)$ then for every derivation $wuw' \xrightarrow{s} uw''$ with $wu \notin A^* u A^+$ and $|w| = i$, there exists a rps $s'$ with $is' \in R_{\theta_S}(s0)$.

Proof. The proof is an induction over $|s|$, the length of the sequence $s$. From Lemma 11 we get $wuw' = xux'$ for some $x \in C^*$. If $|x| < |w|$ then $wu \in A^* u A^+$, a contradiction. If $|x| > |w|$ then $xu \in A^* u A^+$, a contradiction by Lemma 12. It follows $w = x$, so $w \in C^*$. If $s = \varepsilon$ then $w = \varepsilon$ and $i = 0$, and we have $is' = 0 \in R_{\theta_S}(0)$. If $s \neq \varepsilon$ then $s = js_1$. If $j = i$ then the property is satisfied, so we assume $j \neq i$ and, since $wu \notin A^* u A^+$, we have $j > i$. Let us distinguish two cases:

• $j < i + n + 1 - r$. In this case, $wuw' \xrightarrow{j} wu'daz \xrightarrow{s_1} uw''$ with $u'db \in PF(u)$. Since $j > i$, it follows $u' \neq \varepsilon$; then $u'd = dbu''$ for some word $u''$. From Lemma 10 it follows that $wdbu''az = wu'daz \notin C^* u A^*$, a contradiction by Lemma 11.
• $j \geq i + n + 1 - r$. In this case, $wuw' = wu'u''u'''z \xrightarrow{j} wu'u''v'z \xrightarrow{s_1} uw''$ with $u'u'' = u = u''u'''$ and $u''v' = v$. It follows $u'' \in X \cap Z$, so $(j, i) \in \theta_S$. By inductive hypothesis, there exists $s'_1$ such that $is'_1 \in R_{\theta_S}(s_1 0)$, and it follows $ijs'_1 \in R_{\theta_S}(s0)$. □

Remark 1.

1. Observe that, if there exists some $m \in PF(u) \cap SF(u) \cap PF(v) \setminus SF(v)$, we have $u = u'm = mu''$, $v = mv'$ and $m \notin SF(v)$. For the derivation $u'mu'' \xrightarrow{j} u'mv'$, with $j = |u'|$, there does not exist $s$ with $0s \in R_{\theta_S}(j0)$ since $u \notin SF(vu'')$. For instance, if $S = \{aa \to ab\}$ then $aaa \xrightarrow{1} aab \xrightarrow{0} abb$ and $aaa \xrightarrow{0} aba$, but $aba$ is not in $A^* aa A^*$.

2. Lemma 13 indicates that, when its condition is satisfied, in each step of any left derivation from a word that is not in $uA^*$ to a word of $uA^*$, one rewrites the leftmost occurrence of the word $u$ while the derivation does not reach a word of $uA^*$. Observe that the condition $PF(u) \cap SF(u) \cap PF(v) \subseteq SF(v)$ of Lemma 13 is a necessary condition: if $S = \{babb \to bbba\}$ then $\mathrm{leftrps}(bababbabb, babbbabba)$ is not in $2\mathbb{N}^*$ but is in $5\mathbb{N}^*$.

3. More generally, at each step of a left derivation from a word $w$ to a word $w'$, if the current word $z = z_0 \ldots z_k$ is different from $w'$, one takes the smallest $i$ such that $z_i \neq w'_i$ and one rewrites the leftmost occurrence of $u$ that ends after the position $i$.

One can observe that, if $r \leq |f|$, then for every $m \in PF(u) \cap SF(u) \cap PF(v)$, we get $|m| \leq r \leq |f|$, so $m \in SF(f) \subseteq SF(v)$. Hence, we get as a corollary of Lemma 13:

Lemma 14. If $r \le |f|$ then for every derivation $w u w' \xrightarrow{s} u w''$ with $wu$ not in $A^* u A^+$ and $|w| = i$, there exists a rps $s'$ with $i s' \in R_{\theta_S}(s0)$.

Note that the condition $\mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \subseteq \mathrm{SF}(v)$ of Lemma 13 is weaker than the condition $r \le |f|$ of Lemma 14. Indeed, the system $\{aaca \to aaba\}$ satisfies both $r > |f|$ and $\mathrm{PF}(u) \cap \mathrm{SF}(u) \cap \mathrm{PF}(v) \subseteq \mathrm{SF}(v)$. Recall that our goal is to show that if $s = \mathrm{leftrps}(w, w')$ then for every $s' \in \mathrm{RPS}(w, w')$ it holds that $s \in R_{\theta_S}(s')$. Although Lemma 13 generalizes Lemma 14, we consider in the rest of this section the two cases $r \le |f|$ and $r > |f|$.

Lemma 15. If $r \le |f|$ and $s = \mathrm{leftrps}(w, w')$ then

$$s' \in \mathrm{RPS}(w, w') \Rightarrow s \in R_{\theta_S}(s').$$

Proof. The proof is an induction over $|s'|$. The case $|s'| = 0$ is clear. Otherwise, one can assume without loss of generality that $|s|_0 > 0$. Indeed, if $|s|_0 = 0$, since $\mathrm{com}(s) = \mathrm{com}(s')$, we get $w = \alpha\beta$ and $w' = \alpha\beta'$ with $|\alpha| = \min(s)$ and we can consider the derivation from $\beta$ to $\beta'$, applying the corresponding translation on $s$ and $s'$. So, we have $s' = s_1 0 s_2$ with $w \xrightarrow{s_1} u w'' \xrightarrow{0 s_2} w'$. Let $w = xuy$ with $xu$ not in $A^* u A^+$ and $|x| = i$. From Lemma 14, there exists $s_1'$ such that $i s_1' \in R_{\theta_S}(s_1 0)$. It follows that $i s_1' s_2 \in R_{\theta_S}(s')$ and $w \xrightarrow{i s_1' s_2} w'$, which implies that $s = i s''$ for some rps $s''$. By inductive hypothesis, $s'' \in R_{\theta_S}(s_1' s_2)$ so $s = i s'' \in R_{\theta_S}(s')$. □

In order to solve the case $r > |f|$, we study the links between the system $S$ and the system $S^R = \{u^R \to v^R\}$ where $u^R$ denotes the reverse of the word $u$, that is: if $u = u_0 u_1 \ldots u_n$ then $u^R = u_n \ldots u_1 u_0$. When $r > |f|$, we can apply Lemma 15 on system $S^R$: indeed, the longest common prefix of $u^R$ and $v^R$ is $f^R$ and the longest common suffix of $u^R$ and $v^R$ is $d^R$ with $|f^R| \le |d^R|$. Let

$$X' = \mathrm{PF}(u^R) \cap \mathrm{SF}(v^R) = \mathrm{SF}(u)^R \cap \mathrm{PF}(v)^R = Z^R$$

and

$$Z' = \mathrm{SF}(u^R) \cap \mathrm{PF}(v^R) = X^R.$$

So $X' \cap Z' = (X \cap Z)^R$.

Lemma 16. $\theta_S = \theta_{S^R}$.

Proof. Let us consider $(j, i) \in \theta_{S^R}$. If $j > i + n$ then $(j, i) \in \theta_S$; else $j > i$ and $i + n - j + 1 \in \{|x| \mid x \in X' \cap Z'\}$. Clearly, $\{|x| \mid x \in X' \cap Z'\} = F$ so $(j, i) \in \theta_S$. Symmetrically, we also get the implication $(j, i) \in \theta_S \Rightarrow (j, i) \in \theta_{S^R}$ so $\theta_S = \theta_{S^R}$. □

Clearly there exists a bijection between the derivations $w \xrightarrow{*} w'$ in $S$ and the derivations $w^R \xrightarrow{*} w'^R$ in $S^R$. More precisely, we define the following morphism $h_k$ from $\mathbb{N}_{k-n}^*$ to $\mathbb{N}_{k-n}^*$ by $h_k(i) = k - n - 1 - i$ where $k = |w|$. Observe that $h_k \circ h_k$ is the identity and $h_k(i) - h_k(j) = j - i$.

Lemma 17. $w \xrightarrow{s} w'$ in $S$ if and only if $w^R \xrightarrow{h_k(s)} w'^R$ in $S^R$, where $k = |w|$.

Proof. From the definition of $h_k$, we get $w \xrightarrow{i} w'$ in $S$ if and only if $w^R \xrightarrow{h_k(i)} w'^R$ in $S^R$, so $w \xrightarrow{s} w'$ in $S$ if and only if $w^R \xrightarrow{h_k(s)} w'^R$ in $S^R$. □

Let $\gamma_S$ be the partial commutation defined by $\gamma_S = \theta_S \cup \theta_S^{-1}$. Then:

Lemma 18. Let $s \in \mathrm{RPS}(w, w')$, $k = |w|$ and $s' \in \mathbb{N}^*$. Then $R_{\gamma_S}(h_k(s')) = R_{\gamma_S}(h_k(s))$ if and only if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$.

Proof. We have $(h_k(i), h_k(j)) \in \theta_S$ if and only if $(j, i) \in \theta_S$, by $h_k(i) - h_k(j) = j - i$. It follows that $(h_k(i), h_k(j)) \in \gamma_S$ if and only if $(i, j) \in \gamma_S$. This implies that $R_{\gamma_S}(h_k(s')) = R_{\gamma_S}(h_k(s))$ if and only if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$. □

We are now able to prove:

Proposition 2. Let $s$ and $s'$ be two rps and $w$, $w'$ be two words with $s \in \mathrm{RPS}(w, w')$. Then $s' \in \mathrm{RPS}(w, w')$ if and only if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$.

Proof.
1. For the if part, we know that $R_{\theta_S}$ is confluent from Corollary 1. It follows that if $R_{\gamma_S}(s') = R_{\gamma_S}(s)$, there exists $s'' \in R_{\theta_S}(s) \cap R_{\theta_S}(s')$. From Proposition 1 we get $s'' \in \mathrm{RPS}(w, w')$ and $s' \in \mathrm{RPS}(w, w')$.
2. For the only if part, one distinguishes two cases:
• $r \le |f|$. Let us consider $s'' = \mathrm{leftrps}(w, w')$. From Lemma 15 we get $s'' \in R_{\theta_S}(s)$ and $s'' \in R_{\theta_S}(s')$ so $R_{\gamma_S}(s') = R_{\gamma_S}(s)$.
• $r > |f|$. In this case, we can use system $S^R$. Since $w \xrightarrow{s} w'$ and $w \xrightarrow{s'} w'$ in $S$, we have, from Lemma 17, $w^R \xrightarrow{h_k(s)} w'^R$ and $w^R \xrightarrow{h_k(s')} w'^R$ in $S^R$ with $k = |w|$. We can apply Lemma 15 on $S^R$ as in the previous case and, thanks to Lemma 16, we get $R_{\gamma_S}(h_k(s')) = R_{\gamma_S}(h_k(s))$, which implies $R_{\gamma_S}(s') = R_{\gamma_S}(s)$ from Lemma 18. □

As a consequence, we obtain the following desired theorem without any condition over the lengths of $d$ and $f$:

Theorem 2. If $s = \mathrm{leftrps}(w, w')$ then $s' \in \mathrm{RPS}(w, w')$ if and only if $s$ is in $R_{\theta_S}(s')$.

Proof. If $s \in R_{\theta_S}(s')$, it follows from Proposition 1 that $s' \in \mathrm{RPS}(w, w')$. Conversely, if $s' \in \mathrm{RPS}(w, w')$, since $s \in \mathrm{RPS}(w, w')$, it follows from Proposition 2 that $R_{\gamma_S}(s') = R_{\gamma_S}(s)$ and, since $R_{\theta_S}$ is confluent from Corollary 1, there exists $s'' \in R_{\theta_S}(s) \cap R_{\theta_S}(s')$. Since $s'' = s$ from Corollary 2, we get $s \in R_{\theta_S}(s')$. □

We directly get by Theorem 2 and Corollary 2:

Corollary 3. If $s \in \mathrm{RPS}(w, w')$ then $s = \mathrm{leftrps}(w, w')$ if and only if $s$ is in $R_{\theta_S}$ normal form, i.e., $(i, j) \notin \theta_S$ for all $ij \in F(s)$.

In the next section, the contributions of left derivations will play a crucial role and we call these contributions left contributions. The next proposition points out links between left contributions and the semi-commutation $\theta_S$.


Proposition 3. A contribution c is a left contribution if and only if the property (P ): ∀i j ∈ F(c ), ( j , i ) ∈ / θ S is satisfied. Proof. Assume first that c satisfies (P ). Since c is a contribution, there exists a rps s such that c = contt (s) for some t. If s is left, we are done. Otherwise, one use an induction over the number of steps of rewriting in R θ S that are needed to transform s into its equivalent left rps. Let s = s lms with (l, m) ∈ θ S . If l ≤ t ≤ m + n, denoting i = t − l and j = t − m, we get i j ∈ F(c ) and ( j , i ) ∈ θ S , a contradiction. So t < l or t > m + n. This implies contt (s mls ) = contt (s) = c and it follows by induction that c is a contribution of a left derivation. Conversely, let c = contt (s) for some rps s. Assume that there exists i j in F(c ) such that ( j , i ) ∈ θ S . It follows s = s (t − i )s (t − j )s with contt (s ) = ε . Thus for all letters k in s , t < k or k < t − n, moreover (t − i , t − j ) ∈ θ S since (t − i ) −(t − j ) = j − i. Assume that s = s1 s2 for all s1 = t (s ). This implies s = s1 k1 k2 s2 with k1 > t and k2 < t − n and it follows that (k1 , k2 ) ∈ θ S so s is not left. If s = s1 s2 , one can distinguish three cases: 1. if s1 = ε , then s = s (t − i )k1 s1 s2 (t − j )s with k1 < t − n ≤ t − j. It follows that (t − i )k1 is not left by Lemma 5 so s is not left. 2. if s2 = ε , we get s = s (t − i )s1 s2 k2 (t − j )s with k2 > t ≥ t − i. It follows that k2 (t − j ) is not left by Lemma 5 so s is not left. 3. if s1 = s2 = ε , then s = s (t − i )(t − j )s is not left since (t − i , t − j ) ∈ θ S . 2 Corollary 4. A rps s is left if and only if: 1. if i j ∈ F(s) then i ≤ j + n and 2. all the contributions of s are left. Proof. We only have to prove the if part. If s is a non-left rps, it follows from Corollary 3 that there exists i j ∈ F(s) with (i , j ) ∈ θ S . Assume i ≤ j + n, it follows 0m ∈ F(conti (s)) with m = i − j and (m, 0) ∈ θ S and by Proposition 3 conti (s) is not a left contribution. 
2 As a consequence we also get: Corollary 5. It holds that all the contributions are left if and only if X ∩ Z = ∅. To finish this section, we now prove a property, that extends the result of Lemma 3 and that will be useful in the next section. We need the following lemma: Lemma 19. Let w 0 w 1 . . . w k − → w 0 w 1 . . . w k for some letters w 0 , w 1 , . . . , w k , w 0 , w 1 , . . . , w k and some sequence s. Let 0 ≤ i ≤ s

k − r + 1 such that for every j ∈ F(s), j + n < i + r or j ≥ i then: 1. w 0 . . . w i +r −1 −−−−→ w 0 . . . w i +r −1 , < i ( s )

2. if |s|i = 0 then w i +r = w i +r , 3. if w i +r = v r then (|s|i = 0 ∧ w i +r = v r ). Proof.

→ 1. The property is clearly true if s = ε . Now let s = js for some integer j and some sequence s and w 0 w 1 . . . w k − j   w 0 w 1 . . . w k −→ w 0 w 1 . . . w k . By inductive hypothesis, we have w 0 . . . w i +r −1 −−−−− → w 0 . . . w i +r −1 . Now, if j ≥ i  s

< i ( s )

then w 0 . . . w i +r −1 = w 0 . . . w i +r −1 and 

2. Clearly, it is sufficient to prove the property for s = j for some integer j = i. If j + n < i + r or if j > i + r then w i +r is not affected by the step of rewriting so w i +r = w i +r ; else by hypothesis i < j ≤ i + r so w i +r = u i +r − j , moreover since i + r − j < r, we get u i +r − j = v i +r − j = w i +r . 3. As above, we only have to consider the case s = j for some integer j. Since w i +r = u r , it follows j = i and, from 2, w i +r = w i +r = v r . 2 We can now prove the following property:

Proposition 4. For all left derivations $xy \xrightarrow{s} x'y'$ and $xz \xrightarrow{s'} x'z'$ where $x$, $y$, $z$, $x'$, $y'$ and $z'$ are words with $|x| = |x'|$, for every $0 \le i < |x| - r$, $\mathrm{cont}_i(s) = \mathrm{cont}_i(s')$.


Proof. The proof is by contradiction: let xy − → x y  and xz −→ x z be two left derivations for some words x, y , z, x , y  , z s s with |x| = |x | that contradict the above statement. Moreover, we assume that K = |x| + |s| + |s | is minimal. Let i be the smallest integer such that conti (s) = conti (s ). This implies i = |x| − r − 1 else we could choose a shorter x and a lower K . Since conti (s) = conti (s ) and for every 0 ≤ i  < i, conti  (s) = conti  (s ), we get that |s|i + |s |i > 0. Furthermore, from Lemma 3, |s|i = |s |i so s and s are not empty. Then s = i 1 s1 and s = i 1 s1 for some integers i 1 , i 1 and some sequences s1 , s1 . Let us suppose i 1 = i 1 = j then xy − → x y  −−→ x y  with |x | = |x| and xz − → x z −− x z but this leads to a contradiction since  s→ j

j

s1

1

|x | + |s1 | + |s1 | < |x| + |s| + |s | so i 1 = i 1 . We assume in the following i 1 > i 1 . Observe that, from the hypothesis on i, it follows by induction on |s| + |s | that  i. Since i = |x| − r − 1, we have xy −i→ xy  −− → x y  . Moreover, conti (i 1 s1 ) = s1  z , we obtain again1 a contradiction conti (s1 ) so taking the left derivations xy  − − → x y  and xz −→ x since |x| + |s1 | + |s | < s1 s |x| + |s| + |s |; it follows i 1 = i and i 1 < i. Since 
as follows:

s

x0 . . . xi +r y − → w 0 . . . w i +r y  − −−−−→ w 0 . . . w i +r y  − −→ x y  α k ...α k δ i

1 1

t

t

and

x0 . . . xi +r z − −−−−→ w 0 . . . w i +r z − −→ x z k ...β  k δ 1

t

t −1

with δ = αt  +1 . . . k p α p +1 , δ  = βt  . . . k p β p if t  < p and δ = α p +1 , δ  = β p if t  = p. In particular, w i +r = v r and, from item 3 of Lemma 19, w i +r = v r . On the other hand, from item 2 of Lemma 19 we get  w i +r = xi +r = u r . But this leads to a contradiction since the derivations w 0 . . . w i +r y  − −→ x y  and w 0 . . . w i +r z −k−− → x z k δ δ t

t

imply w i +r = w i +r = u i +r −kt  . This contradiction proves that k j + n < i + r for every j ≤ t. Let us now consider the two rps i α1k1 . . . αt kt and k1 β1 . . . kt β  i and their corresponding derivations from xy and xz. We have:

x0 . . . xi +r −1 u r y − → x0 . . . xi +r −1 v r y  − −−−−−→ x0 . . . xi +r −1 xi +r y  α k ...α k i

t t

1 1

and

x0 . . . xi +r −1 u r z − −−−→ x0 . . . xi +r −1 ur z − → x0 . . . xi +r −1 v r z . k ...k β  1

t

i

From Item 1 of Lemma 19, we have x0 . . . xi +r −1 −−−−→ x0 . . . xi +r −1 and x0 . . . xi +r −1 −−−−→ x0 . . . xi +r −1 so x0 . . . xi +r −1 = k1 ...kt k1 ...kt x0 . . . xi +r −1 .      Moreover, by Item 3 of Lemma 19, xi +r = v r which implies x0 . . . xi +r = x0 . . . xi +r −1 v r , and from the minimality of K = |x| + |s| + |s |, we have p = t, |s|i = |s |i = 1, α p +1 = ε and β p = β p i. In particular, denoting w = x0 . . . xi +r −1 u r . . . un and w  = x0 . . . xi +r −1 v r . . . v n , it follows the existence of the two following derivations:

$$w \xrightarrow{i} x_0 \ldots x_{i+r-1} v_r \ldots v_n \xrightarrow{k_1 \ldots k_p} w'$$

$$w \xrightarrow{k_1 \ldots k_p} x'_0 \ldots x'_{i+r-1} u_r \ldots u_n \xrightarrow{i} w'.$$

From Proposition 2, $R_{\gamma_S}(i k_1 \ldots k_p) = R_{\gamma_S}(k_1 \ldots k_p i)$. This implies that for any $k \in \{k_1, \ldots, k_p\}$, $(i, k) \in \theta_S$ since $k + n < i + r$. Let us finally consider the rps $s = i \alpha_1 k_1 \ldots \alpha_p k_p$ and let us distinguish two cases:
1. if $\alpha_1 = \varepsilon$, it follows that $i k_1$ is not a left rps since $(i, k_1) \in \theta_S$, which implies that $s$ is not a left rps either, a contradiction.
2. $\alpha_1 = \alpha_1' j$ for some integer $j$. Since $(i, k_1) \in \theta_S$ and $j \ge i$, it follows from Lemma 5 that $j k_1$ is not left, which implies that $s$ is not left, a contradiction. □

4. A transducer for one-rule length-preserving system

In a previous paper [13], we addressed the problem of deciding whether the relation induced by a one-rule length-preserving rewrite system is rational. We have studied and partially proved the following conjecture proposed in [14,21]:

Conjecture 1. A non-trivial one-rule length-preserving rewrite system is a rational transduction if and only if it is not quasi-conjugate.

Fig. 2. The contributions of the left derivation from aababaabaa to aaaabababa.
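The contributions displayed in this figure can be recomputed from the rewriting position sequence alone. The sketch below is illustrative only and assumes the reading of the definition given earlier in the paper that is used in the proofs of this section: $\mathrm{cont}_t(s)$ lists, in order, the offsets $t - p$ for the positions $p$ of $s$ with $p \le t \le p + n$. The function name `contribution` is hypothetical; the sequence $s = 3.1.6$ and the value $n = 3$ are those of the system $\{abaa \to aaba\}$ discussed with the figure.

```python
def contribution(s, t, n):
    """cont_t(s): offsets t - p of the rewriting positions p in s covering letter t."""
    return [t - p for p in s if p <= t <= p + n]


n = 3            # u = u_0 ... u_n for the rule abaa -> aaba
s = [3, 1, 6]    # rewriting position sequence 3.1.6
for t in range(10):
    c = contribution(s, t, n)
    # print the contribution in the dotted notation of the paper ("ε" for the empty one)
    print(t, ".".join(map(str, c)) or "ε")
```

Each rewriting step at position $p$ thus leaves the offsets $0, 1, \ldots, n$ on the $n + 1$ consecutive letters it touches, which is why the states along a path increase by one from a letter to the next.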

Here a one-rule system $S = \{u \to v\}$ is called quasi-conjugate if there are words $x$, $y$ and $z$ such that $u = xzy$ and $v = yzx$. In [13] we proved the only if part of the conjecture and, conversely, we considered two cases for which the if part is satisfied. These cases are based on the kind of overlaps that exist between $u$ and $v$ and depend on the presence of short pairs of overlaps, that are pairs of overlaps $(x, z) \in X \times Z$ such that $|xz| \le |u|$, and the presence of large pairs of overlaps, that are pairs of overlaps $(x, z) \in X \times Z$ such that $|xz| > |u|$. In this context, the aim of this section is, given a one-rule length-preserving rewrite system $S$, to define a canonical automaton that recognizes the following language $L_S$:

Definition 3. With every one-rule length-preserving rewrite system $S$, we associate the language $L_S = \{w \otimes w' \mid w \xrightarrow{*} w'\}$ where $\otimes$ is the binary operation defined from $\bigcup_{n \ge 0} A^n \times A^n$ to $(A \times A)^*$ by:

• $\varepsilon \otimes \varepsilon = (\varepsilon, \varepsilon)$
• $xw \otimes yw' = (x, y)(w \otimes w')$, where $x, y \in A$ and $w, w' \in A^*$.

A one-rule length-preserving rewrite system $S$ is a rational transduction if and only if the language $L_S$ is regular [6,5]. The language $L_S$ need not be regular: for instance, for $S_1 = \{ba \to ab\}$, $a^m b^n \in S_1(b^m a^n)$ if and only if $m = n$ so

$$L_{S_1} \cap (b, a)^* (a, b)^* = \{(b, a)^n (a, b)^n \mid n \ge 0\}.$$

As a matter of fact, $L_S$ need not even be context-free:

Example 4. Let $S_6 = \{baa \to aab\}$; it is easily seen that $(aa)^{n+p} (ba)^m (bb)^q \in S_6((ba)^n (bb)^p (aa)^{m+q})$ if and only if $n = m$ and $p = q$. It follows that

$$L_{S_6} \cap ((b,a)(a,a))^* ((b,a)(b,a))^* ((a,b)(a,a))^* ((a,b)(a,b))^* = \{((b,a)(a,a))^n ((b,a)(b,a))^p ((a,b)(a,a))^n ((a,b)(a,b))^p \mid n, p \ge 0\}$$

which is not context-free.

Nevertheless it is possible to define an automaton recognizing $L_S$ whose states are exactly the left contributions, even if, as shown by the previous examples, this automaton may be infinite (i.e. it may have an infinite number of states). On the contrary, this automaton is always finite when $S$ is a rational transduction, as, for instance, in the cases pointed out in [13]. Let us first give an example that illustrates the idea of the construction.

Example 5. Let $S_7 = \{abaa \to aaba\}$. We have $u_0 = u_2 = u_3 = v_0 = v_1 = v_3 = a$ and $u_1 = v_2 = b$. Let us consider in Fig. 2 the left derivation from $w = aababaabaa$ to $w' = aaaabababa$ and its corresponding contributions. This would lead to the automaton part given in Fig. 3. One can make several observations on this example:

• In this part of the automaton, the left derivation $w \xrightarrow{s} w'$ with $s = 3.1.6$ corresponds to the successful path going successively through the states $\mathrm{cont}_0(s) = \varepsilon$, $\mathrm{cont}_1(s) = 0$, $\mathrm{cont}_2(s) = 1$, $\mathrm{cont}_3(s) = 0.2$, $\mathrm{cont}_4(s) = 1.3$, $\mathrm{cont}_5(s) = 2$, $\mathrm{cont}_6(s) = 3.0$, $\mathrm{cont}_7(s) = 1$, $\mathrm{cont}_8(s) = 2$ and $\mathrm{cont}_9(s) = 3$,
• the successors of the different states can easily be computed: for instance the successor of the state $\mathrm{cont}_3(s) = 0.2$ is the state $\mathrm{cont}_4(s) = (0 + 1).(2 + 1) = 1.3$,

• the state 1 has two successors: 2 = 1 + 1 and 0.2 = 0.(1 + 1) because v 0 = u 2 ,


Fig. 3. An automaton part corresponding to a left derivation.

• the label of every transition $(q, x/y, q')$ with $q' \neq \varepsilon$ is completely defined by the destination state of the transition: for instance in the label $b/a$ of the transition $(0.2, w_4/w'_4, 1.3) = (0.2, b/a, 1.3)$, we have $b = u_1$ where 1 is the first index of 1.3 and $a = v_3$ where 3 is the last index of 1.3.

We shall now give precisely the construction. Since (left) contributions play a central role in this automaton, we now give some basic properties that are satisfied by contributions:

Property 2.
1. every factor of a contribution is a contribution,
2. every factor of a left contribution is a left contribution,
3. no contribution contains 00 as a factor,
4. no contribution contains $nn$ as a factor.

Proof. Item 1 is a consequence of item 1 of Property 1 and item 2 is a consequence of Proposition 3. For 3, it is sufficient to prove that 00 is not a contribution. Let $s$ be a rps such that there exists an index $j$ that satisfies $\mathrm{cont}_j(s) = 00$. Then $s = s_1 j s' j s_2$ with $\mathrm{cont}_j(s') = \varepsilon$. It follows that every $i$ with $|s'|_i > 0$ satisfies $i < j - n$ or $i > j$. Since $\Pi_{<j-n}(s')\, j\, \Pi_{>j}(s')\, j \in R_{\theta_S}(j s' j)$, we get by Proposition 1 that $\Pi_{<j-n}(s')\, j\, \Pi_{>j}(s')\, j$ is a rps and it follows that there exists a rps $j s'' j$ such that for every $i$ with $|s''|_i > 0$, $i > j$. From item 5 of Property 1 we obtain that there exists a rps $0 s''' 0$, which implies $S(vA^*) \cap uA^* \neq \emptyset$, a contradiction. Symmetrically, one can prove 4 using the property $S(A^*v) \cap A^*u = \emptyset$. □

Note that Property 2 does not imply that no contribution contains $ii$ as a factor if $i \neq 0$ and $i \neq n$. Indeed let us consider $S_8 = \{cba \to abc\}$. We have $\mathrm{leftrps}(cbcbaba, ababcbc) = 2.0.4.2$, $\mathrm{cont}_2(2.0.4.2) = 0.2.0$ and $\mathrm{cont}_3(2.0.4.2) = 1.1$.

As we said before, the states of our automaton will be the left contributions. The difficulty is that we do not know how to compute the set of all these left contributions or how to decide, given a sequence $c = i_1 \ldots i_k$ of $\mathbb{N}_n^*$, whether this sequence is a (left) contribution or not. Of course, if $c = \varepsilon$ or if $|c| = 1$ then $c$ is a left contribution; else, if $k > 1$, we must apply a first filter, and we consider the sequences $c = i_1 \ldots i_k \in \mathbb{N}_n^*$ that satisfy the following properties, which must be satisfied by left contributions:

P1) $v_{i_1} v_{i_2} \ldots v_{i_{k-1}} = u_{i_2} \ldots u_{i_k}$ (coming from Lemma 2)
P2) $\forall\, 1 \le t < k$, $(i_t i_{t+1} \neq 00) \wedge (i_t i_{t+1} \neq nn)$ (coming from Property 2)
P3) $\forall\, 1 \le t < k$, $(i_{t+1}, i_t) \notin \theta_S$ (coming from Proposition 3).

We define the set of state candidates (denoted $SC_S$) as the set of sequences that satisfy the three properties P1, P2 and P3. In particular, we have $\varepsilon \in SC_S$ and $\mathbb{N}_n \subseteq SC_S$. We call these sequences candidates because the three properties P1, P2 and P3 are not sufficient to ensure that a state candidate will really be a state in the automaton. The reason is that the candidate must appear as a contribution in a successful rewriting and a definition of accessible and co-accessible contributions is needed.

Let $h$ be the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by $h(n) = \varepsilon$ and $h(i) = i + 1$ for $i \neq n$ and let $\pi$ be the morphism defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by $\pi(0) = \varepsilon$ and $\pi(i) = i$ for $i \neq 0$. Note that if $c = \mathrm{cont}_i(s)$ for some $s$ and $c' = \mathrm{cont}_{i+1}(s)$ then $\pi(c') = h(c)$, so we define the operation successor over the set of state candidates $SC_S$, denoted succ, by:

$$\forall c \in SC_S,\quad \mathrm{succ}(c) = \{c' \in SC_S \mid \pi(c') = h(c)\}.$$
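The operation succ is easy to enumerate because $h(c)$ contains no 0 and P2 forbids the factor 00: every $c'$ with $\pi(c') = h(c)$ is obtained from $h(c)$ by inserting at most one 0 in each gap. The following sketch illustrates this; it is not taken from the paper. The names `is_candidate` and `succ` and the representation of $\theta_S$ as a set of pairs passed by the caller are assumptions, and the usage example relies on the system $\{cba \to abc\}$, for which the text states that $\theta_S$ is empty.

```python
from itertools import product


def is_candidate(c, u, v, theta=frozenset()):
    """Check P1, P2 and P3 on a tuple c of indices in 0..n (theta is supplied by the caller)."""
    n = len(u) - 1
    for a, b in zip(c, c[1:]):
        if v[a] != u[b]:                 # P1: v_{i_t} = u_{i_{t+1}}
            return False
        if a == b and a in (0, n):       # P2: no factor 00 and no factor nn
            return False
        if (b, a) in theta:              # P3: (i_{t+1}, i_t) not in theta_S
            return False
    return True


def succ(c, u, v, theta=frozenset()):
    """State candidates c' such that pi(c') = h(c)."""
    n = len(u) - 1
    shifted = [i + 1 for i in c if i != n]           # h(c): erase n, add 1 elsewhere
    result = set()
    # one boolean per gap of h(c): insert a 0 there or not
    for zeros in product((False, True), repeat=len(shifted) + 1):
        cand = []
        for put_zero, i in zip(zeros, shifted):
            cand += [0, i] if put_zero else [i]
        if zeros[-1]:
            cand.append(0)
        if is_candidate(tuple(cand), u, v, theta):
            result.add(tuple(cand))
    return result


u, v = "cba", "abc"                  # theta_S is empty for this system
print(sorted(succ((), u, v)))        # successors of the empty contribution
print(sorted(succ((1,), u, v)))      # successors of the contribution 1
```

The enumeration is exponential in $|c|$, which is harmless for the short contributions of the examples but is not meant as an efficient procedure.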


It is easily seen that the operation succ satisfies the following properties:

• $\mathrm{succ}(\varepsilon) = \mathrm{succ}(n) = \{\varepsilon, 0\}$,
• $\forall\, 0 \le i < n$, $i + 1 \in \mathrm{succ}(i)$,
• if $iq' \in \mathrm{succ}((i - 1)q)$ then $q' \in \mathrm{succ}(q)$,
• if $qnq'$ and $qq'$ are in $SC_S$ then $\mathrm{succ}(qnq') = \mathrm{succ}(qq')$,
• if $q \in \mathrm{succ}(p)$ then $|q| + 1 \le 2(|p| + 1)$ and $|p| + 1 \le 2(|q| + 1)$.

Symmetrically, we define the operation predecessor over the set of state candidates $SC_S$, denoted pred, by: $\forall c \in SC_S$, $\mathrm{pred}(c) = \{c' \in SC_S \mid c \in \mathrm{succ}(c')\}$. Since the operations succ and pred can be seen as binary relations over $SC_S$, we also define their reflexive and transitive closures, which we denote $\mathrm{succ}^*$ and $\mathrm{pred}^*$. Clearly, for every $i \in \mathbb{N}_n$, $\mathrm{succ}^*(\varepsilon) = \mathrm{succ}^*(i)$ and $\mathrm{pred}^*(\varepsilon) = \mathrm{pred}^*(i)$.

Example 6. Let $S_8 = \{cba \to abc\}$. Note that $\theta_{S_8} = \emptyset$. We have $\mathrm{succ}(\varepsilon) = \{\varepsilon, 0\}$ and $\mathrm{succ}(0) = \{1\}$: indeed neither 0.1 nor 1.0 belongs to $SC_{S_8}$ by property P1. We have $\mathrm{succ}(1) = \{2, 0.2, 2.0, 0.2.0\}$: indeed $v_0 = u_2 = a$ and $v_2 = u_0 = c$. Since $\mathrm{pred}(1.1) = \{0.2.0, 2.0.2.0, 0.2.0.2, 2.0.2.0.2\}$, we have $1.1 \in \mathrm{succ}^*(\varepsilon)$.

We are now able to give the definition of our automaton:

Definition 4. Let $S = \{u \to v\}$ be a one-rule length-preserving rewrite system. The left-contribution automaton $LCA_S = (A \times A, Q_{lca}, I_{lca}, F_{lca}, \Delta_{lca})$, where $Q_{lca}$ is the set of states, $I_{lca} \subseteq Q_{lca}$ is the set of initial states, $F_{lca} \subseteq Q_{lca}$ is the set of final states and $\Delta_{lca}$ is the set of transitions, is defined by:

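• $Q_{lca} = \mathrm{succ}^*(\varepsilon) \cap \mathrm{pred}^*(n)$
• $I_{lca} = \{\varepsilon\}$
• $F_{lca} = \{\varepsilon, n\}$
• $\Delta_{lca} = \{(c, x/y, c') \mid c' \in \mathrm{succ}(c),\ c' = i_1 \ldots i_k \neq \varepsilon,\ x = u_{i_1},\ y = v_{i_k}\} \cup \{(\varepsilon, x/x, \varepsilon) \mid x \in A\} \cup \{(n, x/x, \varepsilon) \mid x \in A\}$.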
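When the exploration of $\mathrm{succ}^*(\varepsilon)$ terminates, this definition can be turned directly into a program. The sketch below is an illustration under stated assumptions rather than an algorithm from the paper, which gives no effective construction in general: it takes $\theta_S$ to be empty on $\mathbb{N}_n$ (so P3 is vacuous), it gives up after a hypothetical bound `max_states` because the automaton may be infinite, and all function names are invented for the example. It builds the automaton for the system $\{aba \to bba\}$ and runs it on $w \otimes w'$; a brute-force computation of $S(w)$ is included for comparison.

```python
from itertools import product


def is_candidate(c, u, v):
    """P1 and P2 on a tuple of indices; P3 is vacuous under the assumption theta_S = {}."""
    n = len(u) - 1
    return all(v[a] == u[b] and not (a == b and a in (0, n))
               for a, b in zip(c, c[1:]))


def succ(c, u, v):
    """Candidates c' with pi(c') = h(c): shift c, then insert at most one 0 per gap."""
    n = len(u) - 1
    shifted = [i + 1 for i in c if i != n]
    out = set()
    for zeros in product((False, True), repeat=len(shifted) + 1):
        cand = []
        for put_zero, i in zip(zeros, shifted):
            cand += [0, i] if put_zero else [i]
        if zeros[-1]:
            cand.append(0)
        if is_candidate(tuple(cand), u, v):
            out.add(tuple(cand))
    return out


def build_lca(u, v, max_states=500):
    """Return (states, edges); raise if more than max_states candidates are explored."""
    n = len(u) - 1
    edges, todo = {}, [()]
    while todo:                                  # accessible part: succ*(epsilon)
        c = todo.pop()
        if c in edges:
            continue
        if len(edges) >= max_states:
            raise ValueError("state bound exceeded; the automaton may be infinite")
        edges[c] = succ(c, u, v)
        todo.extend(edges[c])
    states, grew = {(n,)}, True                  # co-accessible part: pred*(n)
    while grew:
        grew = False
        for c, targets in edges.items():
            if c not in states and targets & states:
                states.add(c)
                grew = True
    return states, edges


def accepts(states, edges, u, v, w, w2):
    """Run the automaton on w (x) w2; labels are read off the destination state."""
    n = len(u) - 1
    current = {()}
    for x, y in zip(w, w2):
        current = {d for c in current for d in edges[c] & states
                   if (x == y if d == () else (x, y) == (u[d[0]], v[d[-1]]))}
    return len(w) == len(w2) and bool(current & {(), (n,)})


def descendants(w, u, v):
    """Brute-force S(w): all words reachable from w by rewriting u into v."""
    seen, todo = {w}, [w]
    while todo:
        z = todo.pop()
        for i in range(len(z) - len(u) + 1):
            if z.startswith(u, i):
                t = z[:i] + v + z[i + len(u):]
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return seen


states, edges = build_lca("aba", "bba")
print(sorted(states))
print(accepts(states, edges, "aba", "bba", "ababa", "bbbba"))
```

Comparing `accepts` with membership in `descendants(w, u, v)` on all short words is a convenient sanity check of the construction for a given system, but it is of course no substitute for the correctness proof.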

Note that, by its definition based on succ and pred, the state set $Q_{lca}$ satisfies an important property: it is closed under factors. We also observe that $LCA_S$ is a trim automaton (i.e. all states are accessible and co-accessible) and it has a single initial state. Also note the three following properties satisfied by the set of transitions $\Delta_{lca}$:

P4) for $0 < i \le n$, if $((i - 1)q, u_i/y, iq') \in \Delta_{lca}$ then $(q, v_i/y, q') \in \Delta_{lca}$,
P5) if $(q, u_0/y, 0q') \in \Delta_{lca}$ then $(q, v_0/y, q') \in \Delta_{lca}$,
P6) if $(nq, x/y, q') \in \Delta_{lca}$ then $(q, x/y, q') \in \Delta_{lca}$.

In order to prove the correctness of this automaton, that is, that $LCA_S$ recognizes the language $L_S$, we first show that every successful path in the automaton is associated with a left derivation:

Lemma 20. Let $(\varepsilon, x_0/y_0, q_0)(q_0, x_1/y_1, q_1) \ldots (q_{k-1}, x_k/y_k, q_k)$ be a successful path in the automaton $LCA_S$ and let $w = x_0 x_1 \ldots x_k$ and $w' = y_0 y_1 \ldots y_k$. Then $w' \in S(w)$ and for every $i \in [0, k]$, $q_i = \mathrm{cont}_i(s)$ where $s = \mathrm{leftrps}(w, w')$.



Proof. The proof is an induction on $N = \sum_{i \in [0,k]} |q_i|_0$. If $N = 0$ then for every $i \in [0, k]$, $q_i = \varepsilon$, so $q_i = \mathrm{cont}_i(\varepsilon)$ and $w = w'$. If $N > 0$, let us consider the smallest $i$ such that

$$q_i = 0q_i',\ q_{i+1} = 1q_{i+1}',\ \ldots,\ q_{i+n} = nq_{i+n}'.$$

Such an index $i$ exists since the path is successful: for instance consider the greatest $j$ such that $q_j = 0q_j'$. Note that $i + n$ is the smallest index such that $q_{i+n} \in n\mathbb{N}_n^*$. We get $x_i = u_0$, $x_{i+1} = u_1$, ..., $x_{i+n} = u_n$ so it follows that $(q_{i-1}, u_0/y_i, q_i), (q_i, u_1/y_{i+1}, q_{i+1}), \ldots, (q_{i+n-1}, u_n/y_{i+n}, q_{i+n}) \in \Delta_{lca}$. This implies $(q_{i-1}, v_0/y_i, q_i') \in \Delta_{lca}$ from P5 and we also get from P4:

$$(q_i', v_1/y_{i+1}, q_{i+1}'),\ \ldots,\ (q_{i+n-1}', v_n/y_{i+n}, q_{i+n}') \in \Delta_{lca}.$$

Moreover, since $q_{i+n} = nq_{i+n}'$, if $i + n < k$, we get from P6:

$$(q_{i+n}', x_{i+n+1}/y_{i+n+1}, q_{i+n+1}) \in \Delta_{lca}$$

and, if $i + n = k$, it follows that $q_{i+n}' = \varepsilon$. So, in both cases, we get a new successful path and, using the inductive hypothesis, we obtain the left derivation

$$x_0 x_1 \ldots x_{i-1}\, v\, x_{i+n+1} \ldots x_k \xrightarrow{s'} w'$$

with $\forall j \in [0, i - 1] \cup [i + n + 1, k]$, $q_j = \mathrm{cont}_j(s')$ and $\forall j \in [i, i + n]$, $q_j' = \mathrm{cont}_j(s')$. Let $s = is'$; it remains to prove:
1. $\forall\, 0 \le j \le k$, $q_j = \mathrm{cont}_j(is')$,
2. $is'$ is left.

For 1, it is sufficient to observe that $\forall j \in [0, i - 1] \cup [i + n + 1, k]$, $q_j = \mathrm{cont}_j(s') = \mathrm{cont}_j(is')$ and $\forall j \in [i, i + n]$, $q_j = (j - i)q_j' = \mathrm{cont}_j(is')$.

For 2, the property is clearly true if $s' = \varepsilon$, so let $s' = js''$ for some $j$ and $s''$. For a proof by contradiction, assume that $i > j + n$. Then it follows that $q_j = \mathrm{cont}_j(s) = \mathrm{cont}_j(ijs'') = \mathrm{cont}_j(js'') = 0q_j'$, $q_{j+1} = \mathrm{cont}_{j+1}(js'') = 1q_{j+1}'$, ..., $q_{j+n} = nq_{j+n}'$, a contradiction to $i$ being smallest. So $i \le j + n$. By 1, all the contributions of $is'$ are left contributions, and then by Corollary 4, $is'$ is left. □

As a direct consequence of Lemma 20, we obtain:

Corollary 6. The automaton $LCA_S$ is unambiguous.³

We shall now prove that every word of $L_S$ is recognized by automaton $LCA_S$:

Lemma 21. Let $w$, $w'$ be two words with $w' \in S(w)$; then there exists a successful path in automaton $LCA_S$ that is labeled by $w \otimes w'$.

Proof. Let $w$, $w'$ be two words with $w' \in S(w)$ and $s = \mathrm{leftrps}(w, w')$. We prove the lemma by induction over $|s|$, the length of the rps $s$.

• If $s = \varepsilon$ then $w = w'$ and there clearly exists a successful path, labeled by $w \otimes w'$, from the state $\varepsilon$ to the state $\varepsilon$, using the transitions in $\{(\varepsilon, x/x, \varepsilon) \mid x \in A\}$.
• If $s \neq \varepsilon$ then $s = s'i$ and there exist words $w_1$, $\alpha$, $w_2$, $w_1'$, $w_2'$ with $|w_1| = |w_1'| = i$ and $|w_2| = |w_2'|$ such that $w = w_1 \alpha w_2 \xrightarrow{s'} w_1' u w_2' \xrightarrow{i} w_1' v w_2' = w'$. By inductive hypothesis, there exists a successful path in $LCA_S$, labeled by $w \otimes w_1' u w_2'$. This path can be decomposed in three paths: one path from the state $\varepsilon$ to a state $q$ labeled by $w_1 \otimes w_1'$, a path from $q$ to a state $q'$ labeled by $\alpha \otimes u$ and a path from $q'$ to a final state $q''$ labeled by $w_2 \otimes w_2'$. We shall prove that there exists a path from $q$ to $q'n$ labeled by $\alpha \otimes v$, which will give a successful path in $LCA_S$ labeled by $w \otimes w'$. We can write the path from the state $q$ to the state $q'$ labeled by $\alpha \otimes u$ as the concatenation of transitions:

$$(q, \alpha_0/u_0, q_0)(q_0, \alpha_1/u_1, q_1) \ldots (q_{n-1}, \alpha_n/u_n, q_n)$$

with $q_n = q'$ and, from Lemma 20, we have for every $j \in [0, n]$, $q_j = \mathrm{cont}_{i+j}(s')$. It follows that for every $j \in [0, n]$, $q_j j = \mathrm{cont}_{i+j}(s)$ is a state in the automaton $LCA_S$. Moreover, we have $q_0 0 \in \mathrm{succ}(q)$ and for every $j \in [1, n]$, $q_j j \in \mathrm{succ}(q_{j-1}(j - 1))$. Now, if $q_0 = \varepsilon$ it follows that $\alpha_0 = u_0$, $q \in \{\varepsilon, n\}$ and $(q, \alpha_0/v_0, 0) \in \Delta_{lca}$; else, if $q_0 = i_1 \ldots i_k$ then $\alpha_0 = u_{i_1}$ and $(q, \alpha_0/v_0, q_0 0) \in \Delta_{lca}$. Similarly, we have for every $j \in [1, n]$, $(q_{j-1}(j - 1), \alpha_j/v_j, q_j j) \in \Delta_{lca}$ so there exists a path from $q$ to $q'n$ labeled by $\alpha \otimes v$. We now distinguish two cases:
1. If $w_2 = \varepsilon$ then $q'$ is final and, since $q'n$ is a state, it follows that $q' = \varepsilon$ and $q'n = n$ is final.
2. If $w_2 = xw''$ for some letter $x$ then $w_2' = yw'''$ for some letter $y$ and there exists a state $q'''$ such that $(q', x/y, q''') \in \Delta_{lca}$ and such that there exists a path from $q'''$ to the final state $q''$ labeled by $w'' \otimes w'''$. Since $(q'n, x/y, q''') \in \Delta_{lca}$ from the definition of succ, there exists a path from $q$ to $q''$ labeled by $\alpha w_2 \otimes v w_2'$.

In all cases we obtain a successful path in automaton $LCA_S$ that is labeled by $w \otimes w'$. □

By Lemma 20 and Lemma 21, we get:

3 We consider here the ambiguity of the automaton LCA S recognizing the regular language L S and not the ambiguity of the associated transducer that computes S ( w ) for every word w.


Fig. 4. The automaton LCA S 9 for S 9 = {aba → bba}.

Proposition 5. For every one-rule length-preserving rewrite system $S$, the left-contribution automaton $LCA_S$ recognizes the language $L_S$.

Note that another consequence of Lemma 20 and Lemma 21 is that the state set $Q_{lca}$ is exactly the set of all the left contributions.

Example 7. Let $S_9 = \{aba \to bba\}$. Let us compute the set of accessible states of the automaton $LCA_{S_9}$ starting from the initial state $\varepsilon$: $\mathrm{succ}(\varepsilon) = \{\varepsilon, 0\}$; $\mathrm{succ}(0) = \{1, 0.1\}$ because 1.0 does not satisfy property P1; $\mathrm{succ}(1) = \{2, 2.0\}$ (0.2 does not satisfy property P1); $\mathrm{succ}(0.1) = \emptyset$ because 1.2, 1.0 and 0.2 do not satisfy property P1. It follows that 0.1 is not co-accessible and will not be a state in the automaton; $\mathrm{succ}(2) = \{\varepsilon, 0\}$ and finally $\mathrm{succ}(2.0) = \{1, 0.1\}$ and, as 0.1 is not co-accessible, it is not in $Q_{lca}$ and there is a single outgoing transition leaving state 2.0. This gives the automaton $LCA_{S_9}$ of Fig. 4.

The automaton in Example 7 is not only unambiguous but also deterministic. Observe that it would not be the case if $u_0 = v_0$ since in this case there exist a transition $(\varepsilon, u_0/u_0, \varepsilon)$ and a transition $(\varepsilon, u_0/u_0, 0)$ in $LCA_S$. Conversely, we can prove that $u_0 \neq v_0$ is sufficient to ensure that $LCA_S$ is deterministic:

Proposition 6. The automaton $LCA_S$ is deterministic if and only if $u_0 \neq v_0$.

Proof. Let $q$, $q'$, $q''$ be states in $LCA_S$ with $q' \neq q''$ and $x$, $y$ be two letters such that $(q, x/y, q') \in \Delta_{lca}$ and $(q, x/y, q'') \in \Delta_{lca}$. We claim that $u_0 = v_0$. If $q \in \{\varepsilon, n\}$ then $\{q', q''\} = \{\varepsilon, 0\}$. We can suppose $q' = \varepsilon$ and $q'' = 0$; then the transition $(q, x/y, \varepsilon)$ implies $x = y$ and the transition $(q, x/y, 0)$ implies $x = u_0$ and $y = v_0$, so $u_0 = v_0$. If $q \notin \{\varepsilon, n\}$ then $\varepsilon \notin \{q', q''\}$. Since $q' \in \mathrm{succ}(q)$, $q'' \in \mathrm{succ}(q)$ and $q' \neq q''$, we can distinguish three cases:
1. $q' = c_1 i j c_2$ and $q'' = c_1' i 0 j c_2'$ for some sequences $c_1$, $c_2$, $c_1'$, $c_2'$ and some integers $i$, $j$. It follows from property P1 that $v_i = u_j$, $v_i = u_0$ and $v_0 = u_j$, which implies $u_0 = v_0$.
2. $q' = c_1 i$ and $q'' = c_1' i 0$ for some sequences $c_1$, $c_1'$ and some integer $i$. It follows from property P1 that $v_i = u_0$ and from the definition of the transitions in $LCA_S$ that $v_0 = y = v_i$, which implies $u_0 = v_0$.
3. $q' = i c_1$ and $q'' = 0 i c_1'$ for some sequences $c_1$, $c_1'$ and some integer $i$. It follows from property P1 that $v_0 = u_i$ and from the definition of the transitions in $LCA_S$ that $u_i = x = u_0$, which again implies $u_0 = v_0$. □

Even if automaton $LCA_S$ is not deterministic (which happens when $u_0 = v_0$), it is in fact not very far from being deterministic. More precisely, as a consequence of Proposition 4 and from Lemma 20, we obtain that automaton $LCA_S$ has a bounded delay (see [1]). This property generalizes the notion of deterministic automata since a deterministic automaton is an automaton with a delay 0.

Definition 5. Let $\mathcal{A} = (A, Q, I, F, \Delta)$ be an automaton over alphabet $A$ where $Q$ is the set of states, $I \subseteq Q$ is the set of initial states, $F \subseteq Q$ is the set of final states and $\Delta \subseteq Q \times A \times Q$ is the set of transitions. Automaton $\mathcal{A}$ has a bounded delay $\delta \ge 0$ if $q_1 = q_1'$ holds for all transitions $(p, a, q_1)$ and $(p, a, q_1')$ and paths from state $q_1$ to some state $q_2$ and from state $q_1'$ to some state $q_2'$ labeled by a same word $v$ with $|v| = \delta$.

Recall that $r$ is the length of the longest common prefix of $u$ and $v$. We have:

Proposition 7. Automaton $LCA_S$ is an automaton with a bounded delay $r$.

Proof. Let $(p, \alpha/\beta, p')$ and $(p, \alpha/\beta, p'')$ be two transitions in $\Delta_{lca}$. Let $q'$, $q''$ in $Q_{lca}$ and $w$ in $(A \times A)^*$ such that $|w| = r$ and there exist a path from $p'$ to $q'$ and a path from $p''$ to $q''$ both labeled by $w$. Since automaton $LCA_S$ is trim and has a single initial state $\varepsilon$, it follows that there exist a path from $\varepsilon$ to $p$, labeled by some word $w' \in (A \times A)^*$, a path from $q'$ to a final state $f'$ labeled by some word $w'' \in (A \times A)^*$ and a path from $q''$ to a final state $f''$ labeled by some word $w''' \in (A \times A)^*$. Let $x, y, z, x', y', z' \in A^*$ be defined by $x \otimes x' = w'(\alpha/\beta)w$, $y \otimes y' = w''$ and $z \otimes z' = w'''$. From Lemma 20, we have $xy \xrightarrow{s} x'y'$ and $xz \xrightarrow{s'} x'z'$ for some left rps $s$ and some left rps $s'$. Moreover $p' = \mathrm{cont}_{|w'|}(s)$ and $p'' = \mathrm{cont}_{|w'|}(s')$. Since $|w| = r$, it follows that $|w'| < |x| - r$ and, from Proposition 4, it follows that $\mathrm{cont}_{|w'|}(s) = \mathrm{cont}_{|w'|}(s')$ so $p' = p''$. □

Fig. 5. The canonical automaton $CAN_{S_9}$ for $S_9 = \{aba \to bba\}$.

We observe that the automaton in Example 7 is not minimal since some states are equivalent: clearly 2 is equivalent to $\varepsilon$ and 2.0 is equivalent to 0. The reason comes from the definition of succ: if we consider the morphism $\psi$ defined from $\mathbb{N}^*$ to $\mathbb{N}^*$ by $\psi(n) = \varepsilon$ and $\psi(i) = i$ for $i \neq n$ then for all states $q$ and $q'$ such that $\psi(q) = \psi(q')$ it holds that $\mathrm{succ}(q) = \mathrm{succ}(q')$. This property is used to define the canonical automaton for a one-rule length-preserving rewrite system $S$ as follows:

Definition 6. The canonical automaton $CAN_S = (A \times A, Q_{can}, I_{can}, F_{can}, \Delta_{can})$ for a one-rule length-preserving rewrite system $S$ is defined by:

• $Q_{can} = \psi(\mathrm{succ}^*(\varepsilon) \cap \mathrm{pred}^*(n))$
• $I_{can} = F_{can} = \{\varepsilon\}$
• $\Delta_{can} = \{(\psi(c), x/y, \psi(c')) \mid (c, x/y, c') \in \Delta_{lca}\}$.

Example 8. The canonical automaton corresponding to the automaton of Example 7 is given in Fig. 5.

Observe that we have $Q_{can} \subseteq \mathbb{N}_{n-1}^*$ by definition. The following properties are clearly preserved from automaton $LCA_S$ to automaton $CAN_S$:

• The state set $Q_{can}$ is closed under factors.
• Automaton $LCA_S$ and automaton $CAN_S$ recognize the same language: indeed if $(c', x/y, c) \in \Delta_{lca}$ and $\psi(c') = \psi(c'')$ for some $c'' \in Q_{lca}$ then $(c'', x/y, c) \in \Delta_{lca}$.
• Automaton $CAN_S$ is an automaton with bounded delay $r$ and so is unambiguous.
• Automaton $CAN_S$ is deterministic if and only if $u_0 \neq v_0$ and is bi-deterministic if and only if $u_0 \neq v_0$ and $u_n \neq v_n$.

In contrast, some properties that are satisfied by automaton $LCA_S$ are no longer satisfied by automaton $CAN_S$:

• 00 may appear as a factor in a label of a state of $CAN_S$: indeed $00 \in Q_{can}$ if and only if $0n0 \in Q_{lca}$. Note that in this case, $v_0 = u_n$ and $v_n = u_0$.
• The label of a transition is no longer given by the label of its reaching state, so different transitions reaching the same state may have different labels.

Example 9. The canonical automaton for the rewrite system $S_8 = \{cba \to abc\}$ is the infinite automaton given in Fig. 6.

Observe that, for the system $S_8 = \{cba \to abc\}$ of Example 9, we get that $L_{S_8} \cap (c, a)((b, b)(c, a))^* (b, b)((a, c)(b, b))^* (a, c)$ is equal to the non-regular language $\{(c, a)((b, b)(c, a))^k (b, b)((a, c)(b, b))^k (a, c) \mid k \ge 0\}$. It follows that $S_8$ is not a rational transduction and it is not surprising that the canonical automaton for this system is infinite. Conversely, one can wonder whether the canonical automaton for a system $S$ such that $L_S$ is regular could be infinite. The answer is clear if $u_0 \neq v_0$ and $u_n \neq v_n$: indeed in this case $CAN_S$ is bi-deterministic, which implies its minimality, so $L_S$ is regular if and only if $CAN_S$ is finite. It is not so immediate in the general case and we shall use the following lemma, which establishes a link between the sizes of different states of automaton $LCA_S$ that are reached by a same word. For every state $q$ of $LCA_S$, we will denote by $\mathrm{pre}(q)$ the set of all the words labeling a path from state $\varepsilon$ to state $q$ and, symmetrically, $\mathrm{post}(q)$ will denote the set of all the words labeling a path from state $q$ to a final state.

Lemma 22. There exists a positive integer $K$ such that for all states $q$, $q'$ of $LCA_S$, if $\mathrm{pre}(q) \cap \mathrm{pre}(q') \neq \emptyset$ then $|q'| \le K(|q| + 1)$.


Fig. 6. The canonical automaton CAN S 8 for S 8 = {cba → abc }.

Proof. Let $(p, x/y, p') \in \Delta_{lca}$; then $|p'| + 1 \leq 2(|p| + 1)$ and $|p| + 1 \leq 2(|p'| + 1)$. It follows by induction that if there exists a path from some state $p$ to some state $p'$ labeled by some word $w'$ in $\mathrm{LCA}_S$ then $|p'| + 1 \leq 2^{|w'|}(|p| + 1)$ and $|p| + 1 \leq 2^{|w'|}(|p'| + 1)$. Now, if we consider two states $q, q'$ and a word $w \in \mathrm{pre}(q) \cap \mathrm{pre}(q')$, it follows from Proposition 7 that there exist a state $q''$ and two words $w', w''$ with $w = w'w''$ and $|w''| \leq r$ such that $w' \in \mathrm{pre}(q'')$ and there exist a path from $q''$ to $q$ labeled by $w''$ and a path from $q''$ to $q'$ labeled by $w''$. It follows that $|q'| + 1 \leq 2^{|w''|}(|q''| + 1) \leq 4^{|w''|}(|q| + 1)$. Since $|w''| \leq r$, taking $K = 4^r$, we get $|q'| \leq K(|q| + 1)$. □

We can now state:

Theorem 3. A one-rule length-preserving rewrite system S = {u → v} is a rational transduction if and only if its canonical automaton $\mathrm{CAN}_S$ is finite.

Proof. We only have to prove the only if part. Let S = {u → v} be a one-rule length-preserving rewrite system such that $\mathrm{CAN}_S$ is infinite. It follows that automaton $\mathrm{LCA}_S$ is infinite too. We shall prove that the language $L_S$ has an infinite number of residuals and so is not regular. More precisely, we prove that for every integer p there exists a word $w \in (A \times A)^*$ such that

\[ \emptyset \neq w^{-1} L_S \subseteq (A \times A)^p (A \times A)^*. \]

Indeed, let p be a positive integer and let us consider a state q of $\mathrm{LCA}_S$ such that $|q| \geq 2^p K$ where K is the constant of Lemma 22. Let $w \in \mathrm{pre}(q)$; it follows from Lemma 22 that every state $q'$ such that $w \in \mathrm{pre}(q')$ satisfies $|q'| \geq 2^p$. Moreover, from the definition of succ, one can prove by induction on $|w'|$ that for all $q' \in Q_{can}$, $w' \in \mathrm{pre}(q')$, $|q'| \leq 2^{|w'|}$. It follows that, if $|q'| \geq 2^p$, we get that every word $w''$ in $\mathrm{post}(q')$ satisfies $|w''| \geq p$. This implies

\[ w^{-1} L_S = \bigcup_{q,\; w \in \mathrm{pre}(q)} \mathrm{post}(q) \subseteq (A \times A)^p (A \times A)^* \]

that proves the theorem. □
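The non-regular slice of $L_{S_8}$ described after Example 9 can be checked by brute force on small instances. The sketch below is only an illustration and not part of the construction: the function names `descendants` and `in_relation` are hypothetical, and the search simply enumerates every word reachable by replacing cba with abc.

```python
from collections import deque


def descendants(word, u, v):
    """Return every word reachable from `word` by repeatedly replacing u with v."""
    assert len(u) == len(v), "the rule must be length-preserving"
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        start = current.find(u)
        while start != -1:
            rewritten = current[:start] + v + current[start + len(u):]
            if rewritten not in seen:
                seen.add(rewritten)
                queue.append(rewritten)
            start = current.find(u, start + 1)
    return seen


def in_relation(k, m, u="cba", v="abc"):
    """Is (c,a)((b,b)(c,a))^k (b,b) ((a,c)(b,b))^m (a,c) in the relation of {u -> v}?"""
    # First components give the source word, second components the target word.
    source = "c" + "bc" * k + "b" + "ab" * m + "a"
    target = "a" + "ba" * k + "b" + "cb" * m + "c"
    return target in descendants(source, u, v)


if __name__ == "__main__":
    for k in range(3):
        print(k, [in_relation(k, m) for m in range(3)])
```

On these small values the pair belongs to the relation exactly when the two exponents agree, which is the behaviour that rules out a finite automaton for $S_8$. The search is exponential in general, so it is meant for experiments on short words only.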

As a consequence of Theorem 3, we obtain, in the case of a one-rule rewrite system, the converse of a result of Bala Ravikumar in [19] that gives a sufficient condition for a length-preserving rewrite system to be a rational transduction. This condition is based on the notion of change-bounded length-preserving rewrite system that was introduced in the same article and that we recall here in another but equivalent form. Observe that the definition of rps, introduced in the case of one-rule length-preserving rewrite systems makes sense for arbitrary rewrite systems. Definition 7. A length-preserving rewrite system S = {u 1 → v 1 , . . . , ut → v t } is called change-bounded (by K ) if there exists an integer K such that for every rps s and every integer j, |s| j ≤ K . Intuitively, this means that in every derivation in S, the number of times that a change (an application of a rule) is made at a same position during the derivation is bounded by K . Bala Ravikumar has proved: Proposition 8. (See [19].) A change-bounded length-preserving rewrite system is a rational transduction. Now, from Theorem 3, we get that if a one-rule length-preserving rewrite system is a rational transduction then its canonical automaton CAN S is finite. This implies that LCA S is finite too and, since the state set Q lca is the set of all left contributions, it follows that for every left contribution c, |c |0 is bounded. This directly implies that S is change-bounded by |c |0 = |s| j if c = cont j (s) and we have: Theorem 4. A one-rule length-preserving string rewrite system is a rational transduction if and only if it is change-bounded.
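To make the quantity $|s|_j$ of Definition 7 concrete, the following sketch replays one explicit family of rewriting positions sequences for $S_8 = \{cba \to abc\}$ and counts how often each position is used. The names `apply_rps` and `diamond_rps` are hypothetical, and the "diamond" family is an assumption chosen for illustration; it is one possible family of derivations, not one taken from the paper.

```python
from collections import Counter


def apply_rps(word, positions, u="cba", v="abc"):
    """Apply the rule u -> v successively at the given positions."""
    for p in positions:
        if word[p:p + len(u)] != u:
            raise ValueError(f"{u!r} does not occur at position {p} in {word!r}")
        word = word[:p] + v + word[p + len(u):]
    return word


def diamond_rps(k):
    """Positions rewriting c(bc)^k b(ab)^k a into a(ba)^k b(cb)^k c.

    Level d rewrites at slots k-d, k-d+2, ..., k+d (slot m is position 2m);
    levels go up from 0 to k and then back down to 0.
    """
    levels = list(range(k + 1)) + list(range(k - 1, -1, -1))
    return [2 * slot for d in levels for slot in range(k - d, k + d + 1, 2)]


if __name__ == "__main__":
    for k in range(4):
        source = "c" + "bc" * k + "b" + "ab" * k + "a"
        rps = diamond_rps(k)
        print(k, apply_rps(source, rps), Counter(rps)[2 * k])
```

In this family the middle position 2k is rewritten k + 1 times, so no single bound K works for all k. This is consistent with Theorem 4, since $S_8$ is not a rational transduction.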


Clearly, for every one-rule length-preserving rewrite system S = {u → v}, for all words w and w′ ∈ S(w), it holds that if S is change-bounded then the system {w → w′} is change-bounded too, and we get:

Corollary 7. For every one-rule length-preserving rewrite system S = {u → v}, for all words w and w′ ∈ S(w), if S is a rational transduction then the system {w → w′} is a rational transduction.

We also easily get:

Corollary 8. Let S = {u → v} be a one-rule length-preserving rewrite system and h be a letter-to-letter morphism such that h(u) ≠ h(v). If the system {h(u) → h(v)} is a rational transduction then S is a rational transduction.

Proof. Clearly, if {h(u) → h(v)} is change-bounded then S is change-bounded. □

5. A sufficient condition for context-freeness of $L_S$

The language $L_S$ of Definition 3 need not even be context-free as we have seen in Example 4. The aim of this section is to identify a recursive family of length-preserving rewrite systems for which the language $L_S$ is always context-free. This family is the family of systems S such that the state set $Q_{lca}$ of automaton $\mathrm{LCA}_S$ satisfies the single continuation property that we now introduce:

Definition 8. Let $\Sigma$ be an alphabet. A language $L \subseteq \Sigma^*$ satisfies the single continuation property if for every $a \in \Sigma$ there exists a unique $b \in \Sigma$ such that $ab \in F(L)$.

We first prove that the single continuation property is easily decidable. We have:

Proposition 9. The state set $Q_{lca}$ satisfies the single continuation property if and only if
1. $u = xyz$, $v = zyx$ for some words $x, y, z$ with $X = \{x\}$ and $Z = \{z\}$,
2. $A^*x \cap A^*z = xA^* \cap zA^* = \emptyset$.

Some lemmas are needed to prove this proposition.

Lemma 23.
1. $j0 \in Q_{lca}$ if and only if $S(v_j \ldots v_n A^*) \cap uA^* \neq \emptyset$,
2. $0i \in Q_{lca}$ if and only if $S(vA^*) \cap u_i \ldots u_n A^* \neq \emptyset$ and $(i, 0) \notin \theta_S$,
3. $tn \in Q_{lca}$ if and only if $S(A^* v_0 \ldots v_t) \cap A^*u \neq \emptyset$ and $(n, t) \notin \theta_S$,
4. $ns \in Q_{lca}$ if and only if $S(A^*v) \cap A^* u_0 \ldots u_s \neq \emptyset$.

Proof. We only prove 1, proofs of 2, 3 and 4 being very similar. Let us first suppose that S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅ and let us consider two cases: 1. v j . . . v n A ∗ ∩ u A ∗ = ∅. In this case, v = v  x and u = xu  for some words v  and u  with x = v j . . . v n . It follows uu  − → 0 vu  − → v  v that is a left derivation so j0 = cont j (0 j ) ∈ Q lca . j ∗ ∗    → w −→ u w be a left derivation of minimal length for some words w , w , w  , 2. v j . . . v n A ∩ u A = ∅. Let v j . . . v n w − s

p

some rps s and some index p. By the minimality of the length of the derivation, we get w  ∈ / u A ∗ and |s|0 = 0. Moreover, since it is a left derivation, we have p ≤ n and ( p , 0) ∈ / θ S so sp0 is a left rps. Let us consider the derivation uw − → v w −→ v 0 . . . v j −1 v w  with s = shift j (sp0). This derivation is left and cont j (0s ) = j0 so j0 ∈ Q lca .  0

s

Conversely, let j0 ∈ Q lca . In this case, there exists a left rps s and an index k ∈ N such that contk (s) = j0. It follows that s = s (k − j )s ks with

contk (s ) = contk (s ) = contk (s ) = ε and we can suppose s = s = ε . Moreover, since contk (s ) = ε , it follows that for every index t such that |s |t > 0, either t > k or t < k − n and s = k (s ) because (k − j )s is left. By (k − j )k (s )k ∈ R θ S ((k − j )>k (s )kk (s )kk (s )k is a rps, which is moreover left. By (k − j )>k (s )k = shiftk− j (0α j ), it follows that 0α j is also left and satisfies t > j for every t such that |α |t > 0. Now, since 0α j is a left rps, there exists a left derivation u w 1 − → v w 1 −→ w 2 − → w 3 for some words w 1 , w 2 , w 3 . From the propj α 0 erty |α |t > 0 ⇒ t > j, we get that w 2 = v 0 . . . v j −1 w 2 for some word w 2 and w 3 = v 0 . . . v j −1 w 3 for some word w 3 and


we have the derivation $v_j \ldots v_n w_1 \xrightarrow{\alpha'} w'_2 \xrightarrow{0} w'_3$ with $\mathrm{shift}_j(\alpha') = \alpha$. This implies $u \in \mathrm{PF}(w'_2)$ and $S(v_j \ldots v_n A^*) \cap uA^* \neq \emptyset$, which finishes the proof. □

Lemma 24. If A ∗ U  ∩ A ∗ V  = ∅ then
1. $u \notin F(V^*v)$,
2. $S(vA^*) = V^*vA^*$,
3. $j0 \in Q_{lca}$ if and only if $v_j \ldots v_n \in X$,
4. $0i \in Q_{lca}$ if and only if $u_i \ldots u_n \in Z \setminus X$.

Proof. 1. Suppose that u ∈ F( V ∗ v ), then u = u 1 u 2 with u 1 = ε ∈ SF( V + ) and u 2 = ε ∈ PF( v ). It follows u 2 ∈ Z so u 1 ∈ U  , a contradiction. 2. For every k ≥ 0, for every word w ∈ V k v A ∗ and for every word w  such that w → w  , w  ∈ V k+1 v A ∗ ∪ V k v A ∗ since u∈ / F( V ∗ v ). It follows by induction that S ( v A ∗ ) ⊆ S ( V ∗ v A ∗ ) ⊆ V ∗ v A ∗ . Conversely, we clearly have for any k ≥ 0, V k+1 v A ∗ ⊆ S ( V k v A ∗ ) so by induction we get V ∗ v A ∗ ⊆ S ( v A ∗ ). 3. From Item 1 of Lemma 23, we have j0 ∈ Q lca if and only if S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅. Since v j . . . v n ∈ X implies S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅, it remains to prove that, if A ∗ U  ∩ A ∗ V  = ∅, S ( v j . . . v n A ∗ ) ∩ u A ∗ = ∅ implies v j . . . v n ∈ X . ∗ Assume that v j . . . v n ∈ / X and that we have a derivation v j . . . v n w → w 1 −→ u w  . Clearly, one can assume that w 1 ∈ / ∗ ∗  ∗ v j . . . v n A . Then w 1 ∈ v j . . . v p v A with v 0 . . . v p ∈ V . Thus u ∈ / F( v j . . . v p V v ) and u w  ∈ S ( w 1 ) ⊆ v j . . . v p S ( v A ∗ ) = v j . . . v p V ∗ v A ∗ from item 2 but, according to item 1, this leads to a contradiction. / θ S ; it follows from item 2 of Lemma 23 4. Assume first that u i . . . un ∈ Z \ X . Then S ( v A ∗ ) ∩ u i . . . un A ∗ = ∅ and (i , 0) ∈ / θ S . Since S ( v A ∗ ) = V ∗ v A ∗ from that 0i ∈ Q lca . Conversely, if 0i ∈ Q lca we get S ( v A ∗ ) ∩ u i . . . un A ∗ = ∅ and (i , 0) ∈ item 2, u i . . . un ∈ PF( V ∗ v ) and u i . . . un ∈ V ∗ Z . If u i . . . un ∈ V + Z , there exists some p > i such that u p . . . un ∈ Z and u 0 . . . u p −1 ∈ U  ∩ A ∗ V  , a contradiction. So u i . . . un ∈ Z and, since (i , 0) ∈ / θs , u i . . . u n ∈ Z \ X . 2 McNaughton [15] calls {u → v } right barren if u ∈ / F( v ) and U  ∩ SF( V ∗ ) = ∅. He proves Lemma 24, items 1 and 2 in the + case PF(u ) ∩ SF(u ) ∩ A = ∅ in order to prove termination of right barren one-rule string rewrite systems. Symmetrically, we can prove: Lemma 25. 
If U  A ∗ ∩ V  A ∗ = ∅ then
1. $u \notin F(vV^*)$,
2. $S(A^*v) = A^*vV^*$,
3. $tn \in Q_{lca}$ if and only if $v_0 \ldots v_t \in Z \setminus X$,
4. $ns \in Q_{lca}$ if and only if $u_0 \ldots u_s \in X$.

We can now prove the if part of Proposition 9: Lemma 26. If u = xyz and v = zyx for some words x, y , z with X = {x} and Z = { z}, and A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅ then the state set Q lca satisfies the single continuation property. Proof. Let x = v j . . . v n = u 0 . . . u s , z = v 0 . . . v t = u i . . . un and y = u s+1 . . . u i −1 = v t +1 . . . v j −1 . Observe that U  = {xy }, V  = { zy }, U  = { yz} and V  = { yx}. Since A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅, we have A ∗ U  ∩ A ∗ V  = U  A ∗ ∩ V  A ∗ = ∅. It follows from Lemmas 24 and 25:

• $0k \in Q_{lca} \iff k = i$,
• $k0 \in Q_{lca} \iff k = j$,
• $kn \in Q_{lca} \iff k = t$,
• $nk \in Q_{lca} \iff k = s$.

We claim that for every $k \in \mathbb{N}_n$, there exists some $k'$ such that $kk' \in Q_{lca}$. To prove this, let us consider the derivation

\[ xyxyzyz \xrightarrow{i} xyzyxyz \xrightarrow{0} zyxyxyz \xrightarrow{i+j} zyxyzyx \xrightarrow{j} zyzyxyx. \]

Observe that $s = i\,0\,(i+j)\,j = \mathrm{leftrps}(xyxyzyz, zyzyxyx)$: indeed, $(i, 0) \notin R_{\theta_S}$ since $xA^* \cap zA^* = \emptyset$ and that also implies $(i+j, j) \notin R_{\theta_S}$. Hence we get from Lemma 21 and Lemma 20 that for every $0 \leq k \leq n$, $\mathrm{cont}_{i+k}(s)$ is in $Q_{lca}$. Moreover $\mathrm{cont}_{i+k}(s) \in k\,\mathbb{N}_n^+$: indeed, assume $\mathrm{cont}_{i+k}(0) = \varepsilon$ and $\mathrm{cont}_{i+k}(i+j) = \varepsilon$; then it follows $i + k > n$ and $k < j$. That implies $i + k - j > 0$ and $i + k - j < n$ so $\mathrm{cont}_{i+k}(j) \neq \varepsilon$. Now, since $Q_{lca}$ is closed under factors, it follows that there exists $k'$ with $kk' \in Q_{lca}$.

Hence, we can consider the smallest k such that there exist $p$ and $p'$ with $p < p'$ and $kp \in Q_{lca}$, $kp' \in Q_{lca}$. From the above, $k > 0$ and $k < n$. Let us distinguish two cases:

1. If $p > 0$, let $q, q' \in Q_{lca}$ such that $kp \in \mathrm{succ}(q)$ and $kp' \in \mathrm{succ}(q')$. Since k is minimal, we cannot have the following possibilities:
• $q = (k-1)(p-1)$ and $q' = (k-1)(p'-1)$,
• $q = (k-1)n(p-1)$ and $q' = (k-1)(p'-1)$,
• $q = (k-1)(p-1)$ and $q' = (k-1)n(p'-1)$.
So $q = (k-1)n(p-1)$ and $q' = (k-1)n(p'-1)$, but this implies $p - 1 = p' - 1 = s$, a contradiction.
2. If $p = 0$, it follows $k = j$. Let us consider two cases:
• If $j \geq p'$, let us consider the following different states accessible from $jp'$: $(j+1)(p'+1), \ldots, ns$. This leads to a contradiction since $n - j = s$ and $p' > 0$.
• If $j < p'$, let us consider the following states co-accessible from $jp'$: $(j-1)(p'-1), \ldots, 0i$. It follows $v_0 \ldots v_{j-1} = u_i \ldots u_{p'-1}$. This is a contradiction since $zy = v_0 \ldots v_{j-1}$ and $z = u_i \ldots u_n$.

Finally, the state set $Q_{lca}$ satisfies the single continuation property. □
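The four-step derivation used in the proof above can be replayed mechanically for any words x, y, z. The sketch below is a hedged illustration: the function name `replay` is hypothetical, and the positions are taken as $i = |xy|$ and $j = |zy|$, which is how the indices i and j of the proof are read here.

```python
def replay(x, y, z):
    """Replay xyxyzyz -> ... -> zyzyxyx for the rule xyz -> zyx.

    Returns the list of words visited, rewriting at positions i, 0, i+j, j.
    """
    u, v = x + y + z, z + y + x
    i, j = len(x + y), len(z + y)  # z starts at index i in u, x starts at index j in v
    word = x + y + x + y + z + y + z
    trace = [word]
    for p in (i, 0, i + j, j):
        if word[p:p + len(u)] != u:
            raise ValueError(f"{u!r} does not occur at position {p} in {word!r}")
        word = word[:p] + v + word[p + len(u):]
        trace.append(word)
    return trace


if __name__ == "__main__":
    # S_8 = {cba -> abc} corresponds to x = c, y = b, z = a.
    print(replay("c", "b", "a"))
```

For x = c, y = b, z = a this visits cbcbaba, cbabcba, abcbcba, abcbabc and ababcbc, where the position i = j = 2 is rewritten twice.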

The following lemma proves the only if part of Proposition 9:

Lemma 27. If the state set $Q_{lca}$ satisfies the single continuation property then $u = xyz$ and $v = zyx$ for some words $x, y, z$ with $X = \{x\}$ and $Z = \{z\}$, and $A^*x \cap A^*z = xA^* \cap zA^* = \emptyset$.

Proof. Since $Q_{lca}$ satisfies the single continuation property, there exists an index $i \neq 0$ such that $0i \in Q_{lca}$ and there exists an index $s \neq n$ such that $ns \in Q_{lca}$. It follows that $1(i+1), 2(i+2), \ldots, (n-i)n$ are in $Q_{lca}$ and $(n-1)(s-1), \ldots, (n-s)0$ are in $Q_{lca}$. Let us denote $t = n - i$ and $j = n - s$; we have $v_j \ldots v_n = u_0 \ldots u_s = x$ and $v_0 \ldots v_t = u_i \ldots u_n = z$. Assume $t \geq j$ and let $p = i + j$. We get $jp \in \mathrm{succ}^j(0i)$ and $jp \in \mathrm{pred}^{t-j}(tn)$ so $jp \in Q_{lca}$, a contradiction since $p \neq 0$ and $j0 \in Q_{lca}$. So $t < j$ and $s = n - j < n - t = i$, which implies $j - t = i - s$. Let k be the (unique) index such that $(t+1)k \in Q_{lca}$; we consider two cases:

1. if $k > 0$ then $t(k-1) \in Q_{lca}$ so $k = s + 1$ and $(t+1)(s+1), (t+2)(s+2), \ldots, (j-1)(i-1)$ are in $Q_{lca}$. We get $v_{t+1} \ldots v_{j-1} = u_{s+1} \ldots u_{i-1} = y$ so we have $u = xyz$ and $v = zyx$.
2. if $k = 0$, since $s < i$, we have $(t+1)0, (t+2)1, \ldots, ns \in Q_{lca}$ so $t = j - 1$ and we have $u = xyz$ and $v = zyx$ with $y = \varepsilon$.

Observe that, as a consequence, the function follow that is defined for every index $0 \leq p \leq n$ by $\mathrm{follow}(p) = q$ such that $pq \in Q_{lca}$ is surjective so it is bijective and we have both $\forall p\, \exists! q \mid pq \in Q_{lca}$ and $\forall q\, \exists! p \mid pq \in Q_{lca}$. Thus, we get that j is the unique index such that $j0 \in Q_{lca}$ and this implies $X = \{x\}$.

Let us now consider $i' \neq i$ such that $u_{i'} \ldots u_n \in Z$. Since $0i' \notin Q_{lca}$, it follows from Lemma 23 that $(i', 0) \in \theta_S$ which implies, from Lemma 4, $u_{i'} \ldots u_n \in X \cap Z$, so $u_{i'} \ldots u_n = x$. Let us consider two cases:

1. If $|x| \leq |z|$ then $z = z_1 x = x z_2$ for some words $z_1$ and $z_2$. It follows $u = xyxz_2$ and $v = z_1xyx$, but that implies $xyx \in X$, a contradiction.
2. If $|z| < |x|$. This implies $z \in X$, a contradiction since $X = \{x\}$.

It follows that $u_{i'} \ldots u_n \in Z$ implies $i' = i$ so $Z = \{z\}$.

It remains to prove $A^*x \cap A^*z = xA^* \cap zA^* = \emptyset$. Suppose first that $A^*x \cap A^*z \neq \emptyset$. Note that $x \neq z$ by $u \neq v$ and it follows that $|x| = |z|$ implies $A^*x \cap A^*z = \emptyset$. Hence, we have to consider two cases:

1. $|x| > |z|$. In this case $z \in \mathrm{SF}(x)$, so there exists an index p such that $z = u_p \ldots u_s$ so $zyz = u_p \ldots u_n$ and that implies $vyz = zyu \to zyv = zyzyx \in u_p \ldots u_n A^*$ so $S(vA^*) \cap u_p \ldots u_n A^* \neq \emptyset$. Observe that we have $p \neq i$ since $s < i$ so we cannot have $0p \in Q_{lca}$. From Lemma 23, item 2, this implies $(p, 0) \in \theta_S$ so, from Lemma 4, $u_p \ldots u_n = zyz \in X \cap Z$, a contradiction since $Z = \{z\}$.
2. $|x| < |z|$. In this case $x \in \mathrm{SF}(z)$, so $z = v_0 \ldots v_{p-1}x$ for some index p which implies $xyx = v_p \ldots v_n$. It follows $v_p \ldots v_n yz = xyu \to xyv = uyx$ and, from Lemma 23, item 1, $p0 \in Q_{lca}$. But this would imply $p = j$, a contradiction since $p < |z| \leq j$.

Symmetrically, we can prove $xA^* \cap zA^* = \emptyset$. □
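The two conditions of Proposition 9 can be tested directly on the words u and v. The sketch below rests on an assumption that must be stated loudly: the sets X and Z are defined earlier in the paper, outside this section, and are taken here to be the nonempty proper words that are both a prefix of u and a suffix of v (for X), respectively both a prefix of v and a suffix of u (for Z). The function names `overlaps` and `satisfies_prop9` are hypothetical.

```python
def overlaps(prefix_of, suffix_of):
    """Nonempty proper words that are a prefix of one word and a suffix of the other."""
    limit = min(len(prefix_of), len(suffix_of))
    return {prefix_of[:k] for k in range(1, limit) if suffix_of.endswith(prefix_of[:k])}


def satisfies_prop9(u, v):
    """Check conditions 1 and 2 of Proposition 9 under the assumed reading of X and Z."""
    X = overlaps(u, v)  # assumed: prefixes of u that are suffixes of v
    Z = overlaps(v, u)  # assumed: prefixes of v that are suffixes of u
    if len(X) != 1 or len(Z) != 1:
        return False
    x, z = next(iter(X)), next(iter(Z))
    if len(x) + len(z) > len(u):
        return False
    y = u[len(x):len(u) - len(z)]
    # Condition 1: u = xyz and v = zyx.
    if u != x + y + z or v != z + y + x:
        return False
    # Condition 2: neither of x, z is a suffix or a prefix of the other.
    comparable = (x.endswith(z) or z.endswith(x)
                  or x.startswith(z) or z.startswith(x))
    return not comparable


if __name__ == "__main__":
    for u, v in [("cba", "abc"), ("bab", "aba"), ("acaba", "abaca")]:
        print(u, "->", v, satisfies_prop9(u, v))
```

Under this reading the check succeeds for cba → abc and fails for bab → aba and acaba → abaca, which agrees with what Examples 10 and 11 state about $Q_{lca}$ for the latter two systems.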

We now prove the main result of this section that was stated at its beginning: if the state set Q lca of automaton LCA S satisfies the single continuation property then the language L S is context-free. In order to prove this result, we consider a more general case. Indeed, if the state set Q lca satisfies the single continuation property then it is also the case for the state set Q can of automaton CAN S . The converse is generally false as shown in the following examples: Example 10. Let S 10 = {bab → aba}. We have v 0 = u 1 , v 1 = u 0 , v 1 = u 2 and v 2 = u 1 . The state set Q can = {ε , 0, 1, 01, 10} satisfies the single continuation property while Q lca = {ε , 0, 1, 2, 01, 10, 12, 21, 012, 210} does not satisfy it.
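Example 10 can be checked mechanically from the finite state sets it lists. The sketch below applies the morphism ψ of Definition 6 to $Q_{lca}$ and tests the single continuation property of Definition 8 on both sets; states are represented as tuples of integers, and the function names are hypothetical.

```python
def psi(state, n):
    """Erase every occurrence of n from a state (the morphism psi of Definition 6)."""
    return tuple(i for i in state if i != n)


def factors_of_length_two(language):
    """All factors of length 2 of the words in `language`."""
    return {w[k:k + 2] for w in language for k in range(len(w) - 1)}


def single_continuation(language, alphabet):
    """True if every letter a has exactly one letter b with ab a factor of the language."""
    pairs = factors_of_length_two(language)
    return all(len({b for (a, b) in pairs if a == letter}) == 1 for letter in alphabet)


if __name__ == "__main__":
    n = 2  # S_10 = {bab -> aba}, so indices range over 0..2
    q_lca = {(), (0,), (1,), (2,), (0, 1), (1, 0), (1, 2), (2, 1), (0, 1, 2), (2, 1, 0)}
    q_can = {psi(q, n) for q in q_lca}
    print(sorted(q_can))
    print(single_continuation(q_can, {0, 1}), single_continuation(q_lca, {0, 1, 2}))
```

The image of $Q_{lca}$ under ψ is exactly the set $Q_{can}$ given in the example, and the property holds for $Q_{can}$ but fails for $Q_{lca}$ because 1 is followed by both 0 and 2.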


Remark 2. We observe in the previous example that Q lca and Q can are finite. As a matter of fact, if the state set Q lca satisfies the single continuation property then it is infinite and there exist words w 0 , w 1 , . . . , w t ∈ Nn∗ such that Q lca = F( w ∗0 + w ∗1 + · · · + w t∗ ). It follows that, in this case, L S is context-free and not regular. Concerning Q can , it is possible that a finite Q can satisfies the single continuation property, on the other hand, when it is infinite, it also satisfies the property that there exist words w 0 , w 1 , . . . , w t ∈ Nn∗−1 such that Q can = F( w ∗0 + w ∗1 + · · · + w t∗ ). In the following example, Q lca and Q can are infinite but Q can satisfies the single continuation property while Q lca does not satisfy it. Example 11. Let S 11 = {acaba → abaca}. We have v 0 = v 2 = v 4 = u 0 = u 2 = u 4 , v 1 = u 3 and v 3 = u 1 . It is easily seen that 22 is neither accessible nor co-accessible and (4, 0) ∈ θ S so Q lca ∩ N2 = {02, 13, 20, 24, 31, 40, 42}. One can also verify that 202 is not accessible and 242 is not co-accessible and finally Q lca = F(42(402)∗ 0 + (13)∗ ). So Q lca is infinite and does not satisfy the single continuation property while Q can = F((02)∗ + (13)∗ ) does. Following the above observations, we shall finally prove that if Q can satisfies the single continuation property then the language L S is context-free. This proof is based on the following crucial lemma. Definition 9. Let J be a set of non-negative integers, for every sequence p ∈ J ∗ , we define first( p ) and last( p ) as follows:

• if $p = \varepsilon$ then $\mathrm{first}(p) = \mathrm{last}(p) = \varepsilon$,
• if $p \in iJ^* \cap J^*j$ then $\mathrm{first}(p) = i$ and $\mathrm{last}(p) = j$ with $i, j \in J$.

By convention, we set $|\varepsilon|_\varepsilon = 0$. We also denote ≡ the equivalence relation defined over $\mathrm{CAN}_S$ by: $p \equiv q$ if and only if $\mathrm{first}(p) = \mathrm{first}(q)$ and $\mathrm{last}(p) = \mathrm{last}(q)$. Clearly, this equivalence relation is of finite index. For every state $q \in Q_{can}$, we denote $⟦q⟧$ the equivalence class of state q for the relation ≡.

Lemma 28. Let $\mathrm{CAN}_S$ be the canonical automaton associated with a one-rule length-preserving rewrite system S such that $Q_{can}$ satisfies the single continuation property. Let $\delta = (p, x/y, q)$ and $\delta' = (p', x/y, q')$ in $\Delta_{can}$ with $p \equiv p'$ and $q \equiv q'$. Then

\[ |q|_{\mathrm{first}(q)} - |p|_{\mathrm{first}(p)} = |q'|_{\mathrm{first}(q')} - |p'|_{\mathrm{first}(p')} \in \{-1, 0, +1\}. \]

Proof. The property is clearly satisfied if $p = \varepsilon$ since in this case $p' = \varepsilon$ and $q = q' = 0$ or $q = q' = \varepsilon$. If $p \neq \varepsilon$ and $q = \varepsilon$ then $q' = \varepsilon$ and $p, p' \in (n-1)^+$ but, since the factor $nn$ is not allowed in left contributions, we have $p = p' = n - 1$, else 0 would appear as a factor in $q$ or in $q'$. Hence, we can suppose $p \neq \varepsilon$, $q \neq \varepsilon$ and $\mathrm{first}(p) = \mathrm{first}(p') = i$, $\mathrm{first}(q) = \mathrm{first}(q') = i'$, $\mathrm{last}(p) = \mathrm{last}(p') = j$, $\mathrm{last}(q) = \mathrm{last}(q') = j'$. From the definition of succ, we have to consider several cases:

1. $i' = i + 1$. Then $|q|_{i'} - |p|_i = |q'|_{i'} - |p'|_i = 0$.

2. $i' = 0$ and $i < n - 1$. This implies $0(i+1) \in \mathrm{PF}(q)$, so $0(i+1) \in Q_{can}$, $|q|_{i+1} = |p|_i$ and $|q'|_{i+1} = |p'|_i$. From the equalities

\[ |q|_0 - |q|_{i+1} = |q'|_0 - |q'|_{i+1} = \begin{cases} 0 & \text{if } j' \neq 0, \\ +1 & \text{if } j' = 0 \end{cases} \]

we get

\[ |q|_0 - |p|_i = |q'|_0 - |p'|_i = \begin{cases} 0 & \text{if } j' \neq 0, \\ +1 & \text{if } j' = 0. \end{cases} \]

3. $i' > 0$ and $i = n - 1$. Then $(n-1)(i'-1) \in \mathrm{PF}(p)$, so $(n-1)(i'-1) \in Q_{can}$, $|q|_{i'} = |p|_{i'-1}$ and $|q'|_{i'} = |p'|_{i'-1}$. From the equalities

\[ |p|_{i'-1} - |p|_{n-1} = |p'|_{i'-1} - |p'|_{n-1} = \begin{cases} 0 & \text{if } j \neq n-1, \\ -1 & \text{if } j = n-1 \end{cases} \]

we get

\[ |q|_{i'} - |p|_{n-1} = |q'|_{i'} - |p'|_{n-1} = \begin{cases} 0 & \text{if } j \neq n-1, \\ -1 & \text{if } j = n-1. \end{cases} \]

4. $i' = 0$ and $i = n - 1$ and $(n-1)k \in Q_{can}$ with $k < n - 1$. Then $|q|_{k+1} = |p|_k$ and $|q'|_{k+1} = |p'|_k$. It follows

\[ |p|_k - |p|_i = |p'|_k - |p'|_i = \begin{cases} 0 & \text{if } j \neq i, \\ -1 & \text{if } j = i \end{cases} \]

and

\[ |q|_0 - |q|_{k+1} = |q'|_0 - |q'|_{k+1} = \begin{cases} 0 & \text{if } j' \neq 0, \\ +1 & \text{if } j' = 0 \end{cases} \]

so

\[ |q|_0 - |p|_i = |q'|_0 - |p'|_i = \begin{cases} +1 & \text{if } j \neq i \text{ and } j' = 0, \\ -1 & \text{if } j = i \text{ and } j' \neq 0, \\ 0 & \text{in the other cases.} \end{cases} \]

5. $i = n - 1$, $i' = 0$ and $(n-1)(n-1) \in Q_{can}$. In this case, $p$ and $p'$ belong to $(n-1)^+$ and $q$ and $q'$ belong to $0^+$. By definition of $\mathrm{CAN}_S$, there exists a transition $(p'', x/y, q'') \in \Delta_{lca}$ such that the state $q'' \in Q_{lca}$ satisfies $q'' \in F((0n)^*)$, $|q''|_n = |p|_{n-1}$ and $|q''|_0 = |q|_0$. Moreover

\[ |q''|_0 - |q''|_n = \begin{cases} +1 & \text{if } q'' \in (0n)^*0, \\ -1 & \text{if } q'' \in (n0)^*n, \\ 0 & \text{in the other cases.} \end{cases} \]

Now, from the definition of the transitions in $\mathrm{LCA}_S$, we have

\[ |q''|_0 - |q''|_n = \begin{cases} +1 & \text{if } x/y = u_0/v_0, \\ -1 & \text{if } x/y = u_n/v_n, \\ 0 & \text{in the other cases.} \end{cases} \]

Observe that this property is consistent since in this case $u_0 \neq u_n$: first, we prove by contradiction that for every $1 \leq k \leq n - 1$, $kk \in Q_{lca}$. For this, let us consider the biggest k such that $kk \notin Q_{lca}$. Then $knk \in Q_{lca}$: indeed, if $k = n - 1$ then $kk \in Q_{can}$ else $(k+1)(k+1) \in Q_{lca}$ and in both cases it follows $knk \in Q_{lca}$. This implies $\{(n-1)(k-1), (n-1)n(k-1)\} \cap Q_{lca} \neq \emptyset$ and so $(n-1)(k-1) \in Q_{can}$, a contradiction since $Q_{can}$ satisfies the single continuation property, $(n-1)(n-1) \in Q_{can}$ and $k \leq n - 1$. Now, since $kk \in Q_{lca}$ for every $1 \leq k \leq n - 1$, we get $v_1 \ldots v_{n-1} = u_1 \ldots u_{n-1}$. Moreover, $0n0 \in Q_{lca}$ since $11 \in Q_{lca}$ so $v_0 = u_n$ and $v_n = u_0$. It follows that $u_0 \neq u_n$ since $u \neq v$. Finally, since $|q''|_n = |p|_{n-1}$ and $|q''|_0 = |q|_0$, and using a similar reasoning for $p'$ and $q'$, we obtain

\[ |q|_0 - |p|_{n-1} = |q'|_0 - |p'|_{n-1} = \begin{cases} +1 & \text{if } x/y = u_0/v_0, \\ -1 & \text{if } x/y = u_n/v_n, \\ 0 & \text{in the other cases.} \end{cases} \]

□

Clearly, when Q can satisfies the single continuation property then, for every δ = ( p , x/ y , q) in can , if |q|first(q) = | p |first( p ) + 1 then first(q) = last(q) = 0 and if |q|first(q) = | p |first( p ) − 1 then first( p ) = last( p ) = n − 1 so we directly obtain as a consequence of Lemma 28: Corollary 9. Let CAN S be the canonical automaton associated with a one-rule length-preserving rewrite system S such that its state set Q can satisfies the single continuation property. Then can = 0 ∪ + ∪ − with

0 = {( p , x/ y , q) ∈ can | |q|first(q) = | p |first( p ) }, + = {( p , x/ y , q) ∈ can | first(q) = last(q) = 0 ∧ |q|0 = | p |first( p ) + 1} and − = {( p , x/ y , q) ∈ can | first( p ) = last( p ) = n − 1 ∧ |q|first(q) = | p |n−1 − 1}. In particular, we get

(ε , x/x, ε ) ∈ 0 , (ε , u 0 / v 0 , 0) ∈ + , and (n − 1, un / v n , ε ) ∈ − . We can now prove: Theorem 5. Let CAN S be the canonical automaton associated with a one-rule length-preserving rewrite system S such that Q can satisfies the single continuation property. Then L S is a context-free language. Proof. We construct from CAN S a pushdown automaton that recognizes L S . Intuitively, in this automaton, the states will be the equivalence classes of the relation ≡ and the stack will give information on the size of the label of the corresponding state in Q can . More precisely, we define the pushdown automaton PDA S = { A , , , ⊥, Jε K, } where

• • • • •

A is the input alphabet,

= {JqK | q ∈ Q can } is the set of states,  = {⊥, } is the stack alphabet, ⊥ is the initial stack symbol, Jε K is the initial and final state,


•  is the set of rules of the automaton defined by:

 = {(Jε K, x/x, ⊥) → (Jε K, ⊥) | x ∈ A } ∪ {(Jε K, u 0 / v 0 , ⊥) → (J0K, ⊥)} ∪ {(Jn − 1K, un / v n , ⊥) → (Jε K, ⊥)} ∪ {(J p K, x/ y , γ ) → (JqK, γ ) | ( p , x/ y , q) ∈ 0 , p = ε , q = ε , γ ∈ } ∪ {(J p K, x/ y , γ ) → (JqK, γ ) | ( p , x/ y , q) ∈ + , γ ∈ , p = ε } ∪ {(J p K, x/ y , ) → (JqK, ε ) | ( p , x/ y , q) ∈ − , q = ε }. Using Corollary 9 and Lemma 28, it is easy to verify that one can establish a one to one correspondence between every run in CAN S with a run in PDA S : more precisely, one can prove by induction over the length of the runs that any path in CAN S from ε to a state q labeled by w ⊗ w  corresponds to a run labeled by w ⊗ w  in PDA S starting from the initial configuration (Jε K, ⊥) and reaching the configuration (JqK, ⊥t ) with t = 0 if q = ε and t = |q|first(q) − 1 if q = ε . In particular, there exists a one to one correspondence between every path from state ε to state ε in CAN S labeled by a word of L S with a successful run labeled by the same word in PDA S . This implies that PDA S recognizes L S so L S is a context-free language. 2 Example 12. The pushdown automaton corresponding to the canonical automaton for S 8 = {cba → abc } given in Example 9 is the automaton

PDA S 8 = { A , {qε , q0 , q1 }, {⊥, }, ⊥, qε , } where qε = Jε K, q0 = J0K = 0+ , q1 = J1K = 1+ and 4

={ (qε , a/a, ⊥) → (qε , ⊥), (qε , b/b, ⊥) → (qε , ⊥), (qε , c /c , ⊥) → (qε , ⊥), (qε , c /a, ⊥) → (q0 , ⊥), (q1 , a/c , ⊥) → (qε , ⊥), (q1 , a/a, ⊥) → (q0 , ⊥), (q1 , c /c , ⊥) → (q0 , ⊥), (q0 , b/b, ⊥) → (q1 , ⊥), (q1 , a/a, ) → (q0 , ), (q1 , c /c , ) → (q0 , ), (q0 , b/b, ) → (q1 , ), (q1 , c /a, ⊥) → (q0 , ⊥), (q1 , c /a, ) → (q0 , ), (q1 , a/c , ) → (q0 , ε ) } As a corollary of Proposition 9 and Theorem 5 we get: Corollary 10. Let S = {u → v } be a one-rule length-preserving rewrite system such that 1. u = xyz, v = zyx for some words x, y , z with X = {x} and Z = { z}, 2. A ∗ x ∩ A ∗ z = x A ∗ ∩ z A ∗ = ∅, then L S is a context-free language. From Proposition 6, the pushdown automaton  built in the proof of Theorem 5 is deterministic if and only if u 0 = v 0 . Nevertheless, from Proposition 7 we know that automata LCA S and CAN S are not very far from being deterministic and it is not too difficult, using Proposition 7 and Lemma 28, to provide a deterministic pushdown automaton that is equivalent to pushdown automaton PDA S . As a consequence, in the case when the state set Q can satisfies the single continuation property, we obtain a linear algorithm for the accessibility problem that is to decide, given a one-rule length-preserving rewrite system S and two words w and w  whether w  ∈ S ( w ). 6. Conclusion In this paper, we have defined the canonical automaton CAN S associated with a one-rule length-preserving string rewrite system S. This automaton is based on the notion of rps and we think that it could be a crucial tool in order to solve

4

Recall that, in this example, Q can = 0∗ + 1∗ .


Conjecture 1. This automaton being with a bounded delay, we also think that it should induce a polynomial algorithm to solve the accessibility problem. Some decision problems remain open. For instance, Theorem 4 and Theorem 3 give a characterization of one-rule lengthpreserving rewrite systems that are rational transductions. Nevertheless, we do not know if these characterizations are decidable: we do not know yet how to decide whether a system is change-bounded or not and an important open question is the recursivity of the set Q can of states of the canonical automaton. As a matter of fact, we conjecture that Q can is always a regular language: as seen in the previous section, it is the case when Q can satisfies the single continuation property but it is also the case for instance for the system S 6 = {baa → aab} while we have seen, in the beginning of Section 4, that for this system L S is not a context-free language. Indeed, it is not difficult to check that, in this case, Q lca = F((1 + 20)∗ ) so Q can = (0 + 1)∗ is a regular language. It is worth noting that the conjecture Q can is a regular language is equivalent to the conjecture Q lca is a regular language. Indeed, if Q lca is a regular language, then, clearly, Q can is also a regular language. The converse is also true: one can retrieve Q lca from Q can using only operations that preserve regularity. Proving the conjecture does not seem to be an easy task, anyway: the state set Q lca is defined as the intersection succ∗ (ε ) ∩ pred∗ (n); unfortunately, there is no hope to have succ∗ (ε ) and pred∗ (n) both regular: for S 12 = {baa → abb}, Q lca is finite because pred∗ (n) = {2, 1, 0, ε , 2.0.2, 0.2, 2.0} is finite but succ∗ (ε ) is not a regular language. Indeed, in this case, the set >0 (succ∗ (ε )) is equal to the set of factors of the Fibonacci word that is not regular: indeed it has been proved in [9] that the Fibonacci word is fourth power-free. 
Some questions also remain in the context of the single continuation property and the context-freeness of the language L S : we have proved that the single continuation property for Q can is a sufficient condition to get the context-freeness of L S . Conversely, if we consider the system {dacad → cadac}, it is easy to verify that 1.1 and 1.3 are both in Q can so Q can does not satisfy the single continuation property while L S is regular and so context-free. However, we do not have an example of a system such that Q can does not satisfy the single continuation property while L S is a context-free non-regular language. Similarly, if we consider the construction of the pushdown automaton in the proof of Theorem 5, we can wonder if, when L S is context-free, it is always the case that there exists a pushdown automaton recognizing L S using an initial stack symbol and only one extra stack symbol. Another question deserves to be studied. If for a given one-rule length-preserving rewrite system S, its associated language L S is context-free then S is a context-free transduction: indeed, in this case, there clearly exist two morphisms g and h such that S ( w ) = g (h−1 ( w ) ∩ L S ) for every word w. As a consequence, the image of any regular language by S is a context-free language. Such transformations that transform any regular language into a context-free language are called algebrico-rational in [3] where are characterized the semi-commutation systems, so special cases of length-preserving rewrite systems, that are algebrico-rational. This characterization implies in particular that every one-rule semi-commutation system is algebrico-rational. So, what happens if we consider a one-rule length-preserving rewrite system S that is not a semicommutation system and such that L S is not a context-free language like system S 6 = {baa → aab}? 
As a matter of fact, for this system, one can define two morphisms $g'$ and $h$ such that, for every word $w$,

$$S_6(w) = g'\bigl(h^{-1}(w) \cap D_1^*(x, y) \mathbin{\sqcup\!\sqcup} (a + b)^*\bigr),$$

where $D_1^*(x, y)$ denotes the semi-Dyck language over the alphabet $\{x, y\}$ and $\sqcup\!\sqcup$ denotes the shuffle operation. Thus $S_6$ is a context-free transduction. So the non-context-freeness of the language $L_S$ does not disprove the conjecture, stated in [13], that every one-rule length-preserving rewrite system is algebrico-rational and, more precisely, is a context-free transduction.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

References

[1] Véronique Bruyère, Automata and codes with bounded deciphering delay, in: Imre Simon (Ed.), LATIN, in: Lecture Notes in Computer Science, vol. 583, Springer, 1992, pp. 99–107.
[2] Mireille Clerbout, Michel Latteux, Semi-commutations, Inf. Comput. 73 (1) (1987) 59–74.
[3] Mireille Clerbout, Yves Roos, Semicommutations and algebraic languages, Theor. Comput. Sci. 103 (1) (1992) 39–49.
[4] Nachum Dershowitz, Open. Closed. Open, in: Jürgen Giesl (Ed.), RTA, in: Lecture Notes in Computer Science, vol. 3467, Springer, 2005, pp. 376–393.
[5] Samuel Eilenberg, Automata, Languages and Machines. Volume A, Pure Appl. Math., vol. 59, Academic Press, New York, 1974.
[6] Calvin C. Elgot, Jorge E. Mezei, On relations defined by generalized finite automata, IBM J. Res. Dev. 9 (1) (January 1965) 47–68.
[7] Alfons Geser, Decidability of termination of grid string rewriting rules, SIAM J. Comput. 31 (4) (2002) 1156–1168.
[8] Alfons Geser, Termination of string rewriting rules that have one pair of overlaps, in: Robert Nieuwenhuis (Ed.), RTA, in: Lecture Notes in Computer Science, vol. 2706, Springer, 2003, pp. 410–423.
[9] Juhani Karhumäki, On cube-free ω-words generated by binary morphisms, Discrete Appl. Math. 5 (3) (1983) 279–297.
[10] Yuji Kobayashi, Masashi Katsura, Kayoko Shikishima-Tsuji, Termination and derivational complexity of confluent one-rule string-rewriting systems, Theor. Comput. Sci. 262 (1–2) (2001) 583–632.
[11] Winfried Kurth, Termination und Konfluenz von Semi-Thue-Systemen mit nur einer Regel, PhD thesis, Technische Universität Clausthal, 1990.
[12] Winfried Kurth, One-rule semi-Thue systems with loops of length one, two or three, Inform. Théor. Appl. 30 (5) (1996) 415–429.
[13] Michel Latteux, Yves Roos, One-rule length-preserving rewrite systems and rational transductions, RAIRO Theor. Inform. Appl. 48 (2) (2014) 149–171.
[14] Éric Lilin, Une généralisation des semi-commutations, Technical report IT-210, Laboratoire d’Informatique Fondamentale de Lille, Université de Lille 1, France, April 1991, in French.

[15] Robert McNaughton, The uniform halting problem for one-rule semi-Thue systems, Technical report 94-18, Rensselaer Polytechnic Institute, Troy, NY, August 1994.
[16] Robert McNaughton, Semi-Thue systems with an inhibitor, J. Autom. Reason. 26 (4) (2001) 409–431.
[17] Yves Métivier, Edward Ochmański, On lexicographic semi-commutations, Inf. Process. Lett. 26 (2) (1987) 55–59.
[18] Wojciech Moczydłowski, Alfons Geser, Termination of single-threaded one-rule semi-Thue systems, in: Jürgen Giesl (Ed.), Term Rewriting and Applications, in: Lecture Notes in Computer Science, vol. 3467, Springer, Berlin/Heidelberg, 2005, pp. 338–352.
[19] Bala Ravikumar, Peg-solitaire, string rewriting systems and finite automata, Theor. Comput. Sci. 321 (2–3) (2004) 383–394.
[20] Géraud Sénizergues, On the termination problem for one-rule semi-Thue system, in: Harald Ganzinger (Ed.), Rewriting Techniques and Applications, in: Lecture Notes in Computer Science, vol. 1103, Springer, Berlin/Heidelberg, 1996, pp. 302–316.
[21] Alain Terlutte, David Simplot, Iteration of rational transductions, Inform. Théor. Appl. 34 (2) (2000) 99–130.
[22] Hans Zantema, Alfons Geser, A complete characterization of termination of $0^p 1^q \to 1^r 0^s$, Appl. Algebra Eng. Commun. Comput. 11 (1) (2000) 1–25.