A combinatorial property of the factor poset of a word☆


Information Processing Letters 81 (2002) 35–39

Arturo Carpi (a), Aldo de Luca (b,c,∗)

(a) Istituto di Cibernetica del CNR, via Toiano 6, 80072 Arco Felice (NA), Italy
(b) Dipartimento di Matematica dell’Università di Roma “La Sapienza”, piazzale Aldo Moro 2, 00185 Roma, Italy
(c) Centro Interdisciplinare “B. Segre”, Accademia dei Lincei, via della Lungara 10, 00100 Roma, Italy

Received 25 January 2001; received in revised form 16 February 2001 Communicated by L. Boasson

Abstract

We prove the following interesting combinatorial property of the poset of the factors of a word. Let $w$ be a word and $n = G_w + 2$, where $G_w$ is the maximal length of a repeated factor of $w$. If $v$ is any word such that the posets of the factors of $v$ and of $w$ up to length $n$ are isomorphic, then $v$ can be obtained by renaming the letters of $w$ or of the reversal of $w$. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Combinatorial problems; Factor poset; Similar words

1. Introduction

A problem of combinatorics on words of great interest, from both the theoretical and the applied point of view, is to get information about a word from suitable sets of its factors (or subwords). In this frame, some uniqueness theorems on words have recently been proved [2–4]. In particular, a word $w$ is uniquely determined by its factors up to length $G_w + 2$, where $G_w$ is the maximal length of a repeated factor in $w$ [3].

In this paper we consider the following more general problem. Suppose that one knows the ‘shape’ of the graph of the partially ordered set, or poset, $F_n(w)$ of the factors of a word $w$ up to a certain length $n$, i.e., $F_n(w)$ is known up to a poset-isomorphism. What can be said about the word? The main result of the paper is that the knowledge of the shape of the graph of $F_n(w)$, with $n = G_w + 2$, determines the word up to a monoid-isomorphism or anti-isomorphism. In other terms, a word $v$ such that the poset $F_n(v)$ is isomorphic to the poset $F_n(w)$ can be obtained by renaming the letters of $w$ or of the reversal of $w$.

For instance, let $w$ be the word $w = abbac$. In this case $G_w + 2 = 3$ and the shape of the graph of $F_3(w)$ (where the edges are oriented upward) is reported below.

[Figure: graph of the poset $F_3(abbac)$, edges oriented upward.]



✩ The work for this paper has been supported by the Italian Ministry of Education under Project COFIN’98 “Modelli di calcolo innovativi: metodi sintattici e combinatori”.
∗ Corresponding author. E-mail addresses: [email protected] (A. Carpi), [email protected] (A. de Luca).
0020-0190/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0020-0190(01)00192-2


The only other words whose posets of factors up to length 3 have this shape are cabba and all the words obtained by renaming the letters in abbac and cabba.
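As an illustration that is not part of the original paper, the quantities of this example can be recomputed with a short Python sketch. The helper names `factors`, `max_repeated_length` and `cover_edges` are hypothetical, and words are assumed to be plain Python strings. The edges are the pairs (p, q) in which q is a one-letter extension of p, which is how the graph of a factorial language is described in the Preliminaries.

```python
def factors(w, n=None):
    """Set of factors of w of length <= n (all factors if n is None), empty word included."""
    if n is None:
        n = len(w)
    return {w[i:i + k] for k in range(min(n, len(w)) + 1)
            for i in range(len(w) - k + 1)}


def max_repeated_length(w):
    """G_w: maximal length of a factor with at least two distinct occurrences in w."""
    best = 0  # the empty word is repeated in every non-empty word
    for k in range(1, len(w)):
        occurrences = [w[i:i + k] for i in range(len(w) - k + 1)]
        if len(set(occurrences)) < len(occurrences):
            best = k
    return best


def cover_edges(P):
    """Edges (p, q) of the graph of a factorial language P: q extends p by one letter."""
    return sorted({(p, q) for q in P if q for p in (q[:-1], q[1:])})


w = "abbac"
n = max_repeated_length(w) + 2
P = factors(w, n)
print(n)  # 3
print(sorted(P, key=lambda u: (len(u), u)))
for p, q in cover_edges(P):
    print(repr(p), "->", repr(q))
```

For `abbac` the sketch yields 11 factors of length at most 3 and 16 edges; the word `cabba` gives a graph with the same counts, as expected from the statement above.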

2. Preliminaries

Let $A$ be a finite non-empty set. We denote by $A^*$ the free monoid generated by $A$. The set $A$ is also called the alphabet and its elements letters. The elements of $A^*$, usually called words, are the finite sequences of letters of $A$, including the empty sequence, called the empty word, which will be denoted by $\varepsilon$. A non-empty word $w \in A^*$ can be written uniquely as a sequence of letters as

$$w = a_1 a_2 \cdots a_n, \quad \text{with } a_i \in A,\ 1 \le i \le n, \text{ and } n > 0.$$

The integer $n$ is called the length of $w$ and denoted by $|w|$. The length of $\varepsilon$ is 0. The reversal (or mirror image) of the word $w = a_1 a_2 \cdots a_n$ is the word $w^{\sim} = a_n a_{n-1} \cdots a_1$. Moreover, one sets $\varepsilon^{\sim} = \varepsilon$.

A word $u$ is a factor, or subword, of a word $w$ if there exist words $r$ and $s$ such that $w = rus$. If $w = us$ for some word $s$ (respectively $w = ru$ for some word $r$), then $u$ is called a prefix (respectively a suffix) of $w$. We shall denote by $\mathrm{Fact}(w)$ the set of the factors of $w$. For any $n \ge 0$, we denote by $F_n(w)$ the set of all factors of $w$ of length $\le n$. For any $w \in A^*$, $\mathrm{alph}(w)$ denotes the set $\mathrm{Fact}(w) \cap A$.

A language over the alphabet $A$ is any subset of $A^*$. A language $L$ over $A$ is factorial if all the factors of any word of $L$ belong to $L$. Let $L$ be a factorial language and $u \in L$. A right extension of $u$ in $L$ is any word of $L$ of the form $ua$, where $a$ is a letter. In a similar way one defines left extensions of $u$ in $L$. By extension, without specification, we mean indifferently a right or a left extension. In the case $L = \mathrm{Fact}(w)$, where $w$ is a word, the right and left extensions of $u \in \mathrm{Fact}(w)$ in $\mathrm{Fact}(w)$ will be called, simply, right and left extensions of $u$ in $w$, respectively.

A factor $u$ of $w \in A^*$ is called repeated if there are at least two distinct occurrences of $u$ in $w$, i.e.,

$$w = rus = r'us', \quad \text{with } r, r', s, s' \in A^* \text{ and } r \ne r'.$$

In the opposite case, the factor $u$ is called unrepeated. The maximal length of a repeated factor of a non-empty word $w$ will be denoted by $G_w$.

For $u, v \in A^*$, if $u$ is a factor of $v$, one writes $u \le v$. This is a partial order relation in $A^*$, usually called the factorial order. If $u \le v$ and $u \ne v$, we write $u < v$. For any word $w \in A^*$, the set $\mathrm{Fact}(w)$ with the order $\le$ in $A^*$ is a poset called the factor poset of $w$.

With any finite poset $P$ one usually associates a directed graph (or diagram) whose nodes are the elements of $P$ and whose edges are the pairs $(p, q)$ such that $p < q$ and there is no $r \in P$ such that $p < r < q$. We remark that if $P$ is a factorial language over the alphabet $A$, then the edges of the associated directed graph are the pairs $(p, q)$ with $p, q \in P$ such that $q$ is an extension of $p$ in $P$.

Let $P$ and $Q$ be two posets. A function $\theta : P \to Q$ is a poset-isomorphism (see, e.g., [1]) if it is a bijection which satisfies the following condition: for any $p, q \in P$,

$$p \le q \quad \text{if and only if} \quad \theta(p) \le \theta(q).$$

It is well known that when $P$ and $Q$ are finite posets, $\theta : P \to Q$ is a poset-isomorphism if and only if $\theta$ is an isomorphism of the corresponding graphs. Suppose that $P$ and $Q$ are (finite) factorial languages over the alphabets $A$ and $B$, respectively, and let $\theta : P \to Q$ be a poset-isomorphism. By the isomorphism of the corresponding graphs, one has that $\theta$ is length preserving and that, for any $p \in P$, the extensions of $p$ in $P$ are mapped into extensions of $\theta(p)$ in $Q$.

Let $A$ and $B$ be two alphabets. A bijection $\varphi : A^* \to B^*$ is a monoid-isomorphism (respectively anti-isomorphism) if for any $u, v \in A^*$, $\varphi(uv) = \varphi(u)\varphi(v)$ (respectively $\varphi(uv) = \varphi(v)\varphi(u)$). Clearly, the reversal operation in $A^*$ is an involutory anti-automorphism of $A^*$. One easily verifies that any monoid-isomorphism or anti-isomorphism $\varphi : A^* \to B^*$ is also a poset-isomorphism of $A^*$ and $B^*$.

Two words $u$ and $v$ are said to be similar if there exists a monoid-isomorphism or anti-isomorphism of $(\mathrm{alph}(u))^*$ onto $(\mathrm{alph}(v))^*$ mapping $u$ onto $v$. In other terms, $u$ and $v$ are similar if $v$ is obtained by renaming the letters of $u$ or $u^{\sim}$. For instance, the words $abcc$, $bbac$, $bbca$, and $cbdd$ are all similar.
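The notion of similarity can be tested mechanically. The following Python sketch is an illustration under our own assumptions, not a construction from the paper: it encodes each word by the order of first occurrence of its letters, so that two words are renamings of each other exactly when their encodings coincide. The names `pattern` and `similar` are hypothetical.

```python
def pattern(w):
    """Canonical form of w up to renaming: each letter becomes the rank of its first occurrence."""
    rank = {}
    return tuple(rank.setdefault(letter, len(rank)) for letter in w)


def similar(u, v):
    """True if v is obtained by renaming the letters of u or of the reversal of u."""
    return pattern(v) in (pattern(u), pattern(u[::-1]))


# The four words of the example above are pairwise similar.
words = ["abcc", "bbac", "bbca", "cbdd"]
print(all(similar(u, v) for u in words for v in words))  # True
print(similar("abcc", "abca"))  # False
```

Comparing canonical patterns avoids enumerating all bijections between the two alphabets, which keeps the test linear in the length of the words.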

3. Main results

Let us introduce in $A^*$ the relation $\approx$ defined as follows: for $u, v \in A^*$, $u \approx v$ if at least one of the following conditions is verified:
(1) $u = v$,
(2) $u = v^{\sim}$,
(3) there exist $a, b \in A$, $k \ge 1$ such that $u = (ab)^k a$ and $v = (ba)^k b$.
Notice that $\approx$ is an equivalence relation in $A^*$ and that, whenever $u \approx v$, $u$ and $v$ are similar.

Proposition 1. Let $w$ and $w'$ be two words of length $n > 1$. Denote by $u$ and $v$ the (possibly coincident) factors of $w$ of length $n - 1$ and by $u'$ and $v'$ the (possibly coincident) factors of $w'$ of length $n - 1$. If

$$u \approx u' \quad \text{and} \quad v \approx v',$$

then $w \approx w'$.

Proof. Let us assume that $u$ and $v$ are respectively the prefix and the suffix of $w$ of length $n - 1$. Moreover, by possibly replacing $u'$, $v'$, and $w'$ with their reversal, we can reduce ourselves to the case that $u'$ is a prefix of $w'$ and $v'$ is a suffix of $w'$. Thus, the following relations hold, for suitable letters $x, y, x', y' \in A$:

$$w = uy = xv \quad \text{and} \quad w' = u'y' = x'v'. \tag{1}$$

By hypothesis, $u \approx u'$ and $v \approx v'$. We have to distinguish some cases. Let us first consider the case that

$$u = (ab)^k a, \quad u' = (ba)^k b,$$

with $a, b \in A$, $a \ne b$, and $k \ge 1$. By replacing $u$ and $u'$ in (1) with the preceding expressions one obtains

$$w = (ab)^k a y = xv \quad \text{and} \quad w' = (ba)^k b y' = x'v'. \tag{2}$$

Thus $x = a$, $x' = b$, $v = (ba)^k y$, and $v' = (ab)^k y'$, so that, since $k \ge 1$ and $b \ne a$, one has $v \ne v'$ and $v' \ne v^{\sim} = y(ab)^k$. Since $v \approx v'$, one derives $y = b$ and $y' = a$. Thus, from (2) one has

$$w = uy = (ab)^{k+1} \quad \text{and} \quad w' = u'y' = (ba)^{k+1} = w^{\sim},$$

so that $w \approx w'$.

By a symmetrical argument, if

$$v = (ba)^k b, \quad v' = (ab)^k a,$$

with $a, b \in A$, $a \ne b$, and $k \ge 1$, one derives that $w' = w^{\sim}$, so that $w \approx w'$. Thus, in the sequel, we can assume that

$$u' = u \ \text{or} \ u' = u^{\sim} \quad \text{and} \quad v' = v \ \text{or} \ v' = v^{\sim}. \tag{3}$$

First, we consider the case that $u' = u$. From (1), there exist $s, s' \in A^*$ such that

$$u = xs, \quad v = sy \quad \text{and} \quad u' = x's', \quad v' = s'y'.$$

Since $u = u'$, it follows $x = x'$ and $s = s'$. Thus, one has

$$u' = xs, \quad v = sy, \quad v' = sy'.$$

If $y \ne y'$, then $v \ne v'$, so that, from (3) one derives $v' = v^{\sim}$, i.e., $sy' = ys^{\sim}$. Consequently, there exists $t \in A^*$ such that $s = yt$, $s^{\sim} = ty'$, and $ty' = t^{\sim}y$; thus $y = y'$, which is a contradiction. We conclude that $y = y'$ and $w = uy = u'y' = w'$. By a symmetrical argument, if $v' = v$ one obtains $w = w'$.

Let us then consider the only remaining case, that is

$$u' = u^{\sim} \quad \text{and} \quad v' = v^{\sim}.$$

In this case, from (1) one has

$$w' = u^{\sim}y' = x'v^{\sim} \quad \text{and} \quad w^{\sim} = yu^{\sim} = v^{\sim}x. \tag{4}$$

By (4) one derives

$$w'xy' = x'v^{\sim}xy' = x'yu^{\sim}y' = x'yw'$$

and

$$w^{\sim}y'x = yu^{\sim}y'x = yx'v^{\sim}x = yx'w^{\sim}.$$

From these equations, by using a classical lemma of Lyndon and Schützenberger [7] (see also [6]), one derives

$$w' = (x'y)^k x'^{\delta} \quad \text{and} \quad w^{\sim} = (yx')^k y^{\delta},$$

where $k \ge 0$ and $\delta = 0$ or $\delta = 1$ according to the parity of $|w|$. Since, by hypothesis, $|w| = |w'| > 1$, one has $k \ge 1$, so that $w' \approx w^{\sim} \approx w$. ✷

Theorem 1. The factor posets of two words $f$ and $g$ are isomorphic if and only if $f$ and $g$ are similar.

Proof. Let $\theta : \mathrm{Fact}(f) \to \mathrm{Fact}(g)$ be a poset-isomorphism and

$$\pi : (\mathrm{alph}(f))^* \to (\mathrm{alph}(g))^*$$

be the monoid-isomorphism defined by

$$\pi(a) = \theta(a), \quad \text{for all } a \in \mathrm{alph}(f).$$

We shall prove that for all $w \in \mathrm{Fact}(f)$,

$$\pi(w) \approx \theta(w). \tag{5}$$

This is trivial if $|w| \le 1$. Thus we assume $|w| = n > 1$ and proceed by induction on $n$. Since $\theta$ preserves the lengths, one has $|\theta(w)| = n$. Moreover, since $\theta$ is a poset-isomorphism, if the factors of $w$ of length $n - 1$ are $u$ and $v$ (possibly $u = v$), then the factors of $\theta(w)$ of length $n - 1$ are $\theta(u)$ and $\theta(v)$. Since $\pi$ is a monoid-isomorphism, the factors of $\pi(w)$ of length $n - 1$ are $\pi(u)$ and $\pi(v)$. By the inductive hypothesis, one has

$$\theta(u) \approx \pi(u) \quad \text{and} \quad \theta(v) \approx \pi(v),$$

so that (5) follows from Proposition 1. In particular, one has $\pi(f) \approx \theta(f) = g$, so that $f$, $\pi(f)$, and $g$ are similar.

Conversely, suppose that $f$ and $g$ are similar, i.e., $g = \pi(f)$, where $\pi$ is an isomorphism or an anti-isomorphism of $(\mathrm{alph}(f))^*$ and $(\mathrm{alph}(g))^*$. Since $\pi$ is also a poset-isomorphism of $(\mathrm{alph}(f))^*$ and $(\mathrm{alph}(g))^*$, it defines a poset-isomorphism of $\mathrm{Fact}(f)$ and $\mathrm{Fact}(g)$. ✷

The subword complexity $\lambda_w$ of a word $w$ is the map $\lambda_w : \mathbb{N} \to \mathbb{N}$ defined as

$$\lambda_w(n) = \mathrm{Card}(\{v \in \mathrm{Fact}(w) \mid |v| = n\}), \quad \text{for all } n \in \mathbb{N}.$$

Let us recall [5] that the subword complexity $\lambda_w$ of any word $w$ is non-decreasing for $0 \le n \le 1 + G_w$ and is strictly decreasing for $1 + G_w \le n \le |w|$, having in this interval

$$\lambda_w(n + 1) = \lambda_w(n) - 1. \tag{6}$$

Thus, $\lambda_w$ reaches its maximum value at $1 + G_w$. Since $\lambda_w(|w|) = 1$, from (6) one easily derives that

$$\lambda_w(1 + G_w) = |w| - G_w. \tag{7}$$
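The behaviour of the subword complexity recalled above can be observed numerically. The sketch below is only an illustration with hypothetical helper names (`complexity`, `max_repeated_length`, `has_expected_shape`); it assumes non-empty words given as Python strings and checks the monotonicity statement together with relations (6) and (7).

```python
def complexity(w):
    """Return the list [lambda_w(0), ..., lambda_w(|w|)]."""
    return [len({w[i:i + n] for i in range(len(w) - n + 1)})
            for n in range(len(w) + 1)]


def max_repeated_length(w):
    """G_w: maximal length of a factor with at least two distinct occurrences in w."""
    best = 0  # the empty word is repeated in every non-empty word
    for k in range(1, len(w)):
        occurrences = [w[i:i + k] for i in range(len(w) - k + 1)]
        if len(set(occurrences)) < len(occurrences):
            best = k
    return best


def has_expected_shape(w):
    """Check the recalled shape of lambda_w for a non-empty word w."""
    lam, G = complexity(w), max_repeated_length(w)
    non_decreasing = all(lam[n] <= lam[n + 1] for n in range(G + 1))
    unit_steps = all(lam[n + 1] == lam[n] - 1 for n in range(G + 1, len(w)))  # relation (6)
    peak = lam[G + 1] == len(w) - G                                            # relation (7)
    return non_decreasing and unit_steps and peak


print(complexity("abbac"))          # [1, 3, 4, 3, 2, 1]
print(has_expected_shape("abbac"))  # True
```

For $w = abbac$ one has $G_w = 1$, and the maximum $\lambda_w(2) = 4 = |w| - G_w$ agrees with (7).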

Lemma 1. Let $f$ and $g$ be words and set $n_f = 2 + G_f$. If $\lambda_f(n) = \lambda_g(n)$ for $0 \le n \le n_f$, then $f$ and $g$ have the same subword complexity function, the same length, and $G_f = G_g$.

Proof. Since $\lambda_f(n) = \lambda_g(n)$ for $0 \le n \le n_f$, $\lambda_g$ is non-decreasing for $0 \le n \le 1 + G_f$ and

$$\lambda_g(n_f) = \lambda_f(n_f) = \lambda_f(n_f - 1) - 1 = \lambda_g(n_f - 1) - 1.$$

This implies that $G_f = G_g$. By (7), one has $|f| = |g|$. Moreover, from (6) one derives that $\lambda_f(n) = \lambda_g(n)$ for $n \ge n_f$. ✷

Proposition 2. Let $f$ and $g$ be words and set $n_f = 2 + G_f$. A poset-isomorphism

$$\theta : F_{n_f}(f) \to F_{n_f}(g)$$

can be extended in a unique way to a poset-isomorphism $\hat{\theta} : \mathrm{Fact}(f) \to \mathrm{Fact}(g)$.

Proof. Since $F_{n_f}(f)$ and $F_{n_f}(g)$ are isomorphic, one has $\lambda_f(m) = \lambda_g(m)$ for $0 \le m \le n_f$. By Lemma 1, $f$ and $g$ have the same subword complexity function, the same length, and $G_f = G_g$. Consequently, any factor of $f$ or $g$ of length $\ge n_f - 1$ is unrepeated in $f$ or in $g$, respectively.

We suppose that there is a poset-isomorphism $\theta_n : F_n(f) \to F_n(g)$, with $n_f \le n < |f|$, and we prove that there is a unique isomorphism

$$\theta_{n+1} : F_{n+1}(f) \to F_{n+1}(g)$$

which extends $\theta_n$. As a consequence, if one takes $\theta_{n_f} = \theta$, one finds inductively that $\hat{\theta} = \theta_{|f|}$ is the unique extension of $\theta$ to a poset-isomorphism of $\mathrm{Fact}(f)$ and $\mathrm{Fact}(g)$.

Let us extend $\theta_n$ as follows: let $w \in F_{n+1}(f)$ be of length $n + 1$; since $n \ge n_f \ge 2$, we can write

$$w = asb, \quad \text{with } a, b \in A \text{ and } s \in F_{n-1}(f).$$

Since $|s| \ge n_f - 1$, $s$ is an unrepeated factor of $f$, so that $as \ne sb$. Thus, since $\theta_n$ is a poset-isomorphism, $\theta_n(as)$ and $\theta_n(sb)$ are two distinct extensions of $\theta_n(s)$ in $g$. Consequently, since $\theta_n(s)$ is unrepeated in $g$, there exist unique letters $c$ and $d$ such that $c\theta_n(s)d \in \mathrm{Fact}(g)$. Moreover, $c\theta_n(s)$ and $\theta_n(s)d$ are the only extensions (left and right) of $\theta_n(s)$ in $g$, so that

$$\{\theta_n(as), \theta_n(sb)\} = \{c\theta_n(s), \theta_n(s)d\}.$$

We shall set $\theta_{n+1}(w) = c\theta_n(s)d$.

Let us verify that the map $\theta_{n+1}$ is injective. Indeed, let $w, w' \in F_{n+1}(f)$ be distinct factors of $f$ of length $n + 1$ and write

$$w = asb, \quad w' = a's'b',$$

with $a, b, a', b'$ letters and $s, s' \in F_{n-1}(f)$. One has $|s| = |s'| = n - 1 \ge n_f - 1$ and $s \ne s'$, since $s$ and $s'$ are unrepeated factors of $f$. Thus, as $\theta_n$ is injective, it follows $\theta_n(s) \ne \theta_n(s')$. Since

$$\theta_{n+1}(w) = c\theta_n(s)d \quad \text{and} \quad \theta_{n+1}(w') = c'\theta_n(s')d',$$

for suitable letters $c, d, c'$, and $d'$, one derives $\theta_{n+1}(w) \ne \theta_{n+1}(w')$.

Actually, $\theta_{n+1}$ is a bijection. Indeed, since $f$ and $g$ have the same subword complexity function, one has $\mathrm{Card}(F_{n+1}(f)) = \mathrm{Card}(F_{n+1}(g))$. By the construction, trivially, $\theta_{n+1}$ is a poset-isomorphism. Also by the construction, it is clear that $\theta_{n+1}$ is the only poset-isomorphism of $F_{n+1}(f)$ and $F_{n+1}(g)$ extending $\theta_n$. ✷

By Proposition 2 and Theorem 1 one derives the following

Theorem 2. Let $f$ and $g$ be words and set $n_f = 2 + G_f$. The posets $F_{n_f}(f)$ and $F_{n_f}(g)$ are isomorphic if and only if $f$ and $g$ are similar.

Remark 1. The preceding theorem shows that the knowledge of the isomorphism class of the poset of the factors of a word $f$ up to length $2 + G_f$ determines the word up to a monoid-isomorphism or anti-isomorphism. A similar result, proved in [3], shows that a word $f$ is uniquely determined by the knowledge of its factors up to length $2 + G_f$, i.e., if $g$ is any word such that $F_{n_f}(f) = F_{n_f}(g)$, then $f = g$. We observe that this latter result can be easily derived from Proposition 2. In fact, if $F_{n_f}(f) = F_{n_f}(g)$, the poset-isomorphism $\theta : F_{n_f}(f) \to F_{n_f}(g)$ in Proposition 2 can be taken equal to the identity isomorphism. Moreover, one can verify that for $n_f \le n < |f|$, if the poset-isomorphism $\theta_n$ considered in the proof is equal to the identity morphism, the same holds for $\theta_{n+1}$. From this one derives that the unique extension of $\theta$ to a poset-isomorphism of $\mathrm{Fact}(f)$ and $\mathrm{Fact}(g)$ is the identity isomorphism. Thus, $\mathrm{Fact}(f) = \mathrm{Fact}(g)$, which implies $f = g$.

Finally, let us stress that the proof of Theorem 2 is based on two quite different results, the first (see Theorem 1) stating that two words $f$ and $g$ are similar if and only if their factor posets are isomorphic and the second (see Proposition 2) allowing one to restrict the isomorphism to the posets $F_{n_f}(f)$ and $F_{n_f}(g)$. The technique of proof of this second result is, for some aspects, similar to that developed in [3].
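Theorem 2 lends itself to an exhaustive check on short words. The Python sketch below is an illustration under our own assumptions and is not the method of the paper: it decides isomorphism of the truncated factor posets by a level-by-level backtracking search, using the facts recalled in the Preliminaries that a poset-isomorphism of factorial languages preserves lengths and maps extensions to extensions. All function names are hypothetical, and the search is practical only for small posets.

```python
from itertools import permutations, product


def max_repeated_length(w):
    """G_w: maximal length of a factor with at least two distinct occurrences in w."""
    best = 0
    for k in range(1, len(w)):
        occurrences = [w[i:i + k] for i in range(len(w) - k + 1)]
        if len(set(occurrences)) < len(occurrences):
            best = k
    return best


def levels(w, n):
    """Factors of w of length 0..n, grouped by length."""
    return [sorted({w[i:i + k] for i in range(len(w) - k + 1)})
            for k in range(min(n, len(w)) + 1)]


def lower_covers(q):
    """Elements covered by a non-empty factor q: its prefix and suffix of length |q| - 1."""
    return frozenset((q[:-1], q[1:]))


def isomorphic(f, g, n):
    """True if the posets F_n(f) and F_n(g) are isomorphic (exhaustive search)."""
    Lf, Lg = levels(f, n), levels(g, n)
    if [len(level) for level in Lf] != [len(level) for level in Lg]:
        return False

    def extend(k, theta):
        if k == len(Lf):
            return True
        for images in permutations(Lg[k]):
            cand = dict(zip(Lf[k], images))
            # the image of each factor must cover exactly the images of what it covers
            if all(frozenset(theta[p] for p in lower_covers(q)) == lower_covers(cand[q])
                   for q in Lf[k]):
                if extend(k + 1, {**theta, **cand}):
                    return True
        return False

    return extend(1, {"": ""})


def pattern(w):
    rank = {}
    return tuple(rank.setdefault(letter, len(rank)) for letter in w)


def similar(f, g):
    return pattern(g) in (pattern(f), pattern(f[::-1]))


# All non-empty words over {a, b, c} of length at most 4.
words = ["".join(t) for k in range(1, 5) for t in product("abc", repeat=k)]
mismatches = [(f, g) for f in words for g in words
              if isomorphic(f, g, max_repeated_length(f) + 2) != similar(f, g)]
print(mismatches)  # expected to be empty if the check agrees with Theorem 2
```

Calling `isomorphic(f, g, max(len(f), len(g)))` compares the full factor posets, so the same sketch can be used to observe Theorem 1 on small cases.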

References

[1] G. Birkhoff, Lattice Theory, 3rd edn., American Mathematical Society Colloquium Publications, Vol. 25, Amer. Math. Soc., Providence, RI, 1967.
[2] A. Carpi, A. de Luca, Words and repeated factors, Séminaire Lotharingien de Combinatoire B42l (1998) 24; also in: D. Foata, G. Han (Eds.), The Andrews Festschrift, Seventeen Papers on Classical Number Theory and Combinatorics, Springer, Berlin, 2001, pp. 231–251.
[3] A. Carpi, A. de Luca, Words and special factors, Theoret. Comput. Sci. 259 (2001) 145–182.
[4] A. Carpi, A. de Luca, S. Varricchio, Words, univalent factors, and boxes, Dipartimento di Matematica dell’Università di Roma “La Sapienza”, Preprint no. 41/2000, 2000.
[5] A. de Luca, On the combinatorics of finite words, Theoret. Comput. Sci. 218 (1999) 13–39.
[6] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983.
[7] R.C. Lyndon, M.P. Schützenberger, The equation $a^M = b^N c^P$ in a free group, Michigan Math. J. 9 (1962) 289–298.