Infinite Lyndon words




ELSEVIER

Information Processing Letters 50 (1994) 101-104


Rani Siromoney *, Lisa Mathew, V.R. Dare, K.G. Subramanian

Department of Mathematics, Madras Christian College, Tambaram, Madras 600 059, India

(Communicated by L. Boasson; received 29 November 1993)

Abstract

We define an infinite Lyndon word as the limit of an increasing sequence of prefix preserving Lyndon words and show that some of the interesting properties of Lyndon words generalize to the infinite case. We construct a queue automaton that recognizes the set of Lyndon words and show that it can be extended to recognize infinite Lyndon words. We discuss certain topological properties of the set of infinite Lyndon words such as homeomorphism with a subspace of the Cantor space.

Key words: Lyndon words; Infinite words; Queue automata; Formal languages

1. Lyndon words

Consider a totally ordered alphabet $A$. The free monoid $A^*$ is ordered by the lexicographic ordering. Hence for $u, v \in A^*$ we write $u < v$, or equivalently, $v > u$, whenever (1) $u$ is a prefix of $v$ and $u \neq v$, or (2) $u = xay$, $v = xbz$ with $x, y, z \in A^*$, $a, b \in A$ and $a < b$.
* Corresponding author. Elsevier Science B.V. SSDI 0020-0190(94)00013-O

In other words, a Lyndon word is a primitive word which is minimal in its conjugacy class. The following properties are true for the set $L$ of Lyndon words [2].

(1) A nonempty word $w \in A^*$ is in $L$ if and only if for each nonempty proper suffix $v$ of $w$ we have $w < v$.

(2) Let $u, v \in L$. Then $uv \in L$ if and only if $u < v$.

(3) Any word $w \in A^*$ admits a unique factorization $w = w_1 w_2 \cdots w_m$ such that each $w_i$ ($1 \leq i \leq m$) is a Lyndon word and $w_1 \geq w_2 \geq \cdots \geq w_m$.
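Property (1) gives a direct, if inefficient, test for membership in $L$. The following is a minimal sketch, assuming that Python's built-in string comparison stands in for the lexicographic order on $A^*$ (a prefix is smaller than any proper extension, and letters compare by code point). The function name `is_lyndon` is illustrative and not from the paper.

```python
def is_lyndon(w: str) -> bool:
    """Test property (1): w is nonempty and smaller than each nonempty proper suffix."""
    if not w:
        return False  # the empty word is not a Lyndon word
    # Python compares strings lexicographically, with a prefix smaller
    # than any of its proper extensions, matching the order defined above.
    return all(w < w[i:] for i in range(1, len(w)))


# Property (2) on two small cases: u, v Lyndon, and uv is Lyndon iff u < v.
print(is_lyndon("ab" + "b"))   # u = "ab" < v = "b", so "abb" is Lyndon: True
print(is_lyndon("b" + "ab"))   # u = "b" > v = "ab", so "bab" is not: False
```

The check is quadratic in the length of the word, which is adequate for experimenting with small examples.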

Duval [2] has given a linear time algorithm that uses pattern matching to obtain the factorization mentioned above.
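For readers who want to experiment, here is a sketch of the factorization in property (3), written in the commonly presented form of Duval's algorithm. It is an illustration rather than a transcription of [2]; the function name `lyndon_factorization` is an assumption, and Python's string order again stands in for the order on the alphabet.

```python
def lyndon_factorization(w: str) -> list[str]:
    """Return Lyndon words w_1 >= w_2 >= ... >= w_m whose concatenation is w."""
    factors = []
    n = len(w)
    k = 0  # start of the part of w not yet factorized
    while k < n:
        i, j = k, k + 1  # w[j] is compared against w[i]
        while j < n and w[i] <= w[j]:
            # A larger letter restarts the comparison at k (a longer Lyndon
            # word has been found); an equal letter advances it by one.
            i = k if w[i] < w[j] else i + 1
            j += 1
        # w[k:j] is a power of a Lyndon word of length j - i followed by a
        # proper prefix of it; output the complete copies.
        while k <= i:
            factors.append(w[k:k + j - i])
            k += j - i
    return factors


print(lyndon_factorization("abaab"))   # ['ab', 'aab']
print(lyndon_factorization("banana"))  # ['b', 'an', 'an', 'a']
```

Each letter is compared a bounded number of times, which is the source of the linear running time.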

2. Infinite Lyndon words

Infinite words and their topological properties have been studied extensively [4]. However, no study has been made so far on extending finite Lyndon words to the infinite case, although there are several studies on the infinite Fibonacci and Thue-Morse words. Hence we define the set $IL$ of $\omega$-Lyndon words over the alphabet $A$ as the limit of the set of Lyndon words over $A$. In other words, an infinite word is $\omega$-Lyndon if it has an infinite number of prefixes which are Lyndon. A number of properties of Lyndon words can be extended to the set of $\omega$-Lyndon words.

Proposition 2.1. Let $u \in L$, $v \in IL$. Then $uv \in IL$ if and only if $u < v$.
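The prefix characterization of $\omega$-Lyndon words can be explored on finite truncations. The sketch below lists which prefixes of a truncation are Lyndon; a finite truncation can only suggest, never prove, that there are infinitely many. The name `lyndon_prefix_lengths` and the two sample words ($ab^\omega$ and $(ab)^\omega$, cut off after eight letters) are illustrative assumptions.

```python
def is_lyndon(w: str) -> bool:
    """A nonempty word is Lyndon iff it is smaller than each nonempty proper suffix."""
    return bool(w) and all(w < w[i:] for i in range(1, len(w)))


def lyndon_prefix_lengths(truncation: str) -> list[int]:
    """Lengths n such that the prefix of length n of the truncation is Lyndon."""
    return [n for n in range(1, len(truncation) + 1) if is_lyndon(truncation[:n])]


# Every prefix a b^n of a b^omega is Lyndon, consistent with a b^omega being omega-Lyndon.
print(lyndon_prefix_lengths("a" + "b" * 7))  # [1, 2, 3, 4, 5, 6, 7, 8]
# The periodic word (ab)^omega has only two Lyndon prefixes, "a" and "ab".
print(lyndon_prefix_lengths("ab" * 4))       # [1, 2]
```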

Proof. Suppose $u < v$. Since $v \in IL$ it has an infinite number of prefixes, say $v_1, v_2, \ldots \in L$ with $|v_i| > |u|$. Since $u$
Proposition 2.2. A word $w \in A^\omega$ is in $IL$ if and only if $w < v$ for each proper suffix $v$ of $w$.

Proof. Suppose $w = uv$ is $\omega$-Lyndon. We can choose prefixes $w_i \in L$ of $w$ such that $|u| < |w_i|$ for $i \in \mathbb{N}$. Then $w_i = u v_i$ for some $v_i \in A^*$, $v_i \neq \Lambda$. Since each $w_i$ is Lyndon, $w_i < v_i$ and hence each $w_i$
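Theorem 2.3. Any word $w \in A^\omega$ has a unique factorization of one of the following two forms:

$w = w_1 w_2 \cdots w_m$    (2.1)

such that $w_1 \geq w_2 \geq \cdots \geq w_m$, where $w_i \in L$ ($1 \leq i < m$) and $w_m \in IL$, or

$w = w_1 w_2 \cdots w_i \cdots$    (2.2)

such that $w_1 \geq w_2 \geq \cdots \geq w_i \geq \cdots$ and $w_i \in L$.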


Proof. Since each letter $a \in A$ is Lyndon, it is clear that every word in $A^\omega$ has at least one factorization (the trivial factorization where each letter is a separate factor) in terms of Lyndon words. The condition $w_1 \geq w_2 \geq \cdots$ can be obtained by applying the unique factorization theorem for finite words on successively longer prefixes of $w$. The proof is complete if we can show that this factorization is unique.

It is clear that there cannot be two factorizations of the type (2.1), since any finite prefix of $w$ has a unique factorization. The same argument holds for factorizations of the type (2.2). Thus the only possibility is that $w$ has two factorizations, one of the form (2.1) and the other of the form (2.2), i.e.

$w = w_1 w_2 \cdots w_m = w'_1 w'_2 \cdots w'_i \cdots$

with $w_1 \geq w_2 \geq \cdots \geq w_m$ and $w'_1 \geq w'_2 \geq \cdots \geq w'_k \geq \cdots$, where $w_i$ is Lyndon for $1 \leq i < m$, $w_m$ is $\omega$-Lyndon and $w'_i$ is Lyndon for $i \in \mathbb{N}$. However, in this case, it is clear that $w_i = w'_i$ for $1 \leq i < m$, so that $w_m = w'_m w'_{m+1} w'_{m+2} \cdots$ and hence $w'_m \geq w'_{m+1} \geq w'_{m+2} \geq \cdots$, contradicting the fact that $w_m$ is $\omega$-Lyndon. □

We use the notation $FL(w)$ for the factorization used in Theorem 2.3.

Proposition 2.4. Let $w \in A^\omega$ and $FL(w)$ be finite with $FL(w) = w_1 \cdots w_m$. Then the following are true:

(1) $w_m$ is the least suffix of $w$ with respect to the ordering $<$.

(2) If $w_m \in IL$, it is the maximal suffix of $w$ in $IL$.

Proof. (1) Let $s$ be a suffix of $w$. Then it is of the form $w'_i w_{i+1} \cdots w_m$ where $w'_i$ is a nonempty suffix of $w_i$ for some $i \leq m$. Hence $w_m \leq w_i \leq w'_i \leq s$. Thus $w_m$ is least with respect to the ordering $<$. (2) Since $w_m$ is a proper suffix of $s$, we have $i$

Proposition 2.5. Let $w \in A^\omega$, $w = w_1 w'$ and $FL(w) = w_1 FL(w')$. Then $w_1$ is the maximal prefix of $w$ in $L$.

Proof. Suppose $u$ is a prefix of $w$ longer than $w_1$. It is of the form $u = w_1 \cdots w'_i$ with $w'_i$ a nonempty prefix of $w_i$ for some $i$. Hence $w'_i \leq w_i \leq \cdots \leq w_1$ and $w_1 < u$. Thus $w'_i < u$ and $u \notin L$. This completes the proof. □
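The finite-word analogue of Proposition 2.5 is the standard fact that the first factor of the Lyndon factorization of a finite word is its longest Lyndon prefix. The sketch below checks this on one example by comparing a brute-force search with the first pass of the usual Duval-style loop. Both function names are illustrative assumptions, and the check says nothing by itself about infinite words.

```python
def is_lyndon(w: str) -> bool:
    """A nonempty word is Lyndon iff it is smaller than each nonempty proper suffix."""
    return bool(w) and all(w < w[i:] for i in range(1, len(w)))


def longest_lyndon_prefix(w: str) -> str:
    """Brute force: the longest prefix of the nonempty word w that is Lyndon."""
    lengths = [n for n in range(1, len(w) + 1) if is_lyndon(w[:n])]
    return w[:lengths[-1]]  # the one-letter prefix is always Lyndon


def first_lyndon_factor(w: str) -> str:
    """First factor of the Lyndon factorization, via one pass of a Duval-style loop."""
    i, j = 0, 1
    while j < len(w) and w[i] <= w[j]:
        i = 0 if w[i] < w[j] else i + 1
        j += 1
    return w[:j - i]


word = "abaab"
print(longest_lyndon_prefix(word), first_lyndon_factor(word))  # ab ab
```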

3. The queue automaton

Generally it has been of interest to construct automata to accept specific classes of languages or sets of words. So far no such construction has been given for the set of Lyndon words. Queue automata (machines with one or more FIFO tapes) have been studied in connection with various grammars [1]. A $k$-tape non-deterministic queue automaton (FIFO$_k$) $M$ is defined as a 7-tuple $(Q, A, B, \delta, q_0, Z, F)$ where $Q$ is a finite set of internal states, $A$ a finite input alphabet, $B$ a finite memory alphabet, $\delta$ is a (possibly partial) transition mapping given by

$\delta : Q \times A \times B^k \to \mathcal{P}(\cdots)$

(where $\mathcal{P}(E)$ is the set of finite subsets of the set $E$), $q_0 \in Q$ is the initial state, $Z \in B$ is the initial memory symbol and $F \subseteq Q$ is the set of final states. The automaton $M$ is deterministic if $\delta(q, a, A_1, A_2, \ldots, A_k)$ contains at most one element for every choice of the arguments.

We construct a 2-tape deterministic queue automaton $M_L$ that recognizes the set of Lyndon words by a method similar to Duval's algorithm [2]. As each letter is read by the input head it is stored in one of the FIFO tapes, say $T_1$. As each symbol on this FIFO tape is read it is stored on the other FIFO tape, $T_2$. At each stage we compare the current symbol $x$ on the input tape with that on $T_1$, say $x'$. If the symbols on the two tapes are identical the tape heads move to the succeeding symbols on the tapes. If not, the word is rejected if $x < x'$.

The states of $M_L$ are $q_0, q_1, q_{01}, q_{10}, q_2$, and its transitions are:

$(q_0, a, \Lambda, \Lambda) \to (q_0, S, (H, a), (H, \Lambda))$

$(q_0, b, \Lambda, \Lambda) \to (q_0, S, (H, b), (H, \Lambda))$

$(q_0, a, a, x) \to (q_0, S, (S, a), (H, a))$

$(q_0, b, b, x) \to (q_0, S, (S, b), (H, b))$

$(q_0, b, a, x) \to (q_{01}, H, (S, b), (H, a))$

$(q_{01}, b, a, x) \to (q_{01}, H, (S, \Lambda), (H, a))$

$(q_{01}, b, b, x) \to (q_{01}, H, (S, \Lambda), (H, b))$

$(q_{01}, b, \Lambda, x) \to (q_1, S, (H, \Lambda), (H, \Lambda))$

$(q_1, a, x, a) \to (q_1, S, (H, a), (S, a))$

$(q_1, b, x, b) \to (q_1, S, (H, b), (S, b))$

$(q_1, b, x, a) \to (q_{10}, H, (H, a), (S, b))$

$(q_{10}, b, x, a) \to (q_{10}, H, (H, a), (S, \Lambda))$

$(q_{10}, b, x, b) \to (q_{10}, H, (H, b), (S, \Lambda))$

$(q_{10}, b, x, \Lambda) \to (q_0, S, (H, \Lambda), (H, \Lambda))$

$(q_0, a, b, x) \to (q_2, \alpha, (\beta, \Lambda), (\gamma, \Lambda))$

$(q_1, a, x, b) \to (q_2, \alpha, (\beta, \Lambda), (\gamma, \Lambda))$

$(q_2, x, y, z) \to (q_2, \alpha, (\beta, \Lambda), (\gamma, \Lambda))$

Here $x, y, z \in \{a, b, \Lambda\}$ and $\alpha, \beta, \gamma \in \{S, H\}$. $\Lambda$ represents the empty word. We know that an automaton recognizes an infinite word if it passes through the set of final states infinitely many times on reading the word. Extending the same definition to a queue automaton, we get the set of infinite words recognized by $M_L$ to be the set $IL$.
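The comparison idea behind the automaton can be imitated in a few lines. The sketch below is not the queue automaton $M_L$: it replaces the two FIFO tapes by a stored prefix and an index into it, and it is only an assumption-laden illustration of the Duval-style comparison. After each letter it reports whether the prefix read so far is Lyndon, which plays the role of visiting a final state. The name `lyndon_prefix_flags` is hypothetical.

```python
def lyndon_prefix_flags(letters):
    """Yield, after each letter read, True iff the prefix read so far is Lyndon."""
    seen = []      # letters read so far (stands in for the FIFO tapes)
    i = 0          # position of the stored letter x' the next input is compared with
    dead = False   # set once some suffix is smaller than the word: reject forever
    for x in letters:
        if not seen:
            seen.append(x)
            yield True            # a single letter is a Lyndon word
            continue
        if dead:
            yield False
            continue
        ref = seen[i]
        seen.append(x)
        if x == ref:
            i += 1                # symbols agree: advance the comparison
            yield False           # the prefix has a border, so it is not Lyndon
        elif x > ref:
            i = 0                 # the whole prefix is a new, longer Lyndon word
            yield True
        else:
            dead = True           # x < x': no extension can be Lyndon
            yield False


print(list(lyndon_prefix_flags("aabab")))  # [True, False, True, False, True]
print(list(lyndon_prefix_flags("abaab")))  # [True, True, False, False, False]
```

Because the function consumes any iterable lazily, it can be fed a generator for an infinite word and stopped after any number of letters; an $\omega$-Lyndon word corresponds to a stream in which `True` keeps recurring.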

4. Topological properties

We define the metric $d(u, v)$ [4] over $A^\omega$ as follows: Let $u, v \in A^\omega$. If $u = v$, $d(u, v) = 0$. Otherwise $d(u, v) = 1/2^n$ where $n$ is the least positive integer such that $u(n) \neq v(n)$. It is easily seen that $A^\omega$ is a compact metric space since the topology that $d$ imposes on $A^\omega$ coincides with the product topology obtained from the discrete topology of $A$. The limit of every infinite sequence of Lyndon words is a Lyndon word or an $\omega$-Lyndon word. Hence $INL = IL \cup L$ is a closed set and therefore a compact set, since it is a closed subset of a compact set. $INL$ is zero-dimensional since it has a basis (the collection $S_{1/n}(u) \cap INL$, $u \in INL$, $n \in \mathbb{N}$) which is both open and closed. Here $S_{1/n}(u)$ is the open ball with centre at $u$ and radius $1/n$. Since $INL$ is a zero-dimensional compact metric space, from [3] we see that it is homeomorphic to a subspace of the Cantor space.
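As a small illustration of the metric, the sketch below computes $d$ on finite strings standing in for truncations of words. It assumes that $n$ is the first position (counted from 1) at which the two words differ, and it treats the position just past the end of a shorter string as a difference; that last convention and the name `distance` are assumptions made for the example.

```python
def distance(u: str, v: str) -> float:
    """d(u, v) = 0 if u == v, else 2**(-n) with n the first 1-based position where they differ."""
    if u == v:
        return 0.0
    n = 1
    for x, y in zip(u, v):
        if x != y:
            break
        n += 1
    # If one string is a proper prefix of the other, n ends up one past its
    # length, which is treated here as the first differing position.
    return 2.0 ** -n


print(distance("abb", "aab"))  # words differ at position 2: 0.25
print(distance("ab", "abb"))   # "ab" is a proper prefix: 0.125
```

Words sharing a longer common prefix are closer, so a sequence of Lyndon words in which each is a prefix of the next converges in this metric.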


5. Acknowledgement

The authors thank the referee for his critical comments which were useful in the revision of this paper.

6. References

[1] A. Cherubini, C. Citrini, S. Crespi-Reghizzi and D. Mandrioli, QRT FIFO automata, breadth first grammars and their relations, Theoret. Comput. Sci. 85 (1991) 171-203.

[2] J.P. Duval, Factorizing words over an ordered alphabet, J. Algorithms 4 (1983) 363-381.

[3] T. Head, The adherence of languages as topological spaces, in: Automata on Infinite Words, Lecture Notes in Computer Science 192 (Springer, Berlin, 1984) 147-163.

[4] M. Nivat, Infinite words, infinite trees, infinite computations, in: Math. Centre Tracts 109 (1979) 1-52.