JOURNAL OF COMPUTER AND SYSTEM SCIENCES 20, 379-395 (1980)

Test Sets and Checking Words for Homomorphism Equivalence*

KAREL CULIK II
Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

AND

ARTO SALOMAA
Department of Mathematics, University of Turku, Turku, Finland

Received January 29, 1979; revised October 10, 1979
Given a language L over an alphabet Σ and two homomorphisms g and h, defined on Σ*, we want to decide whether or not g and h are equivalent on L, i.e., whether or not g(w) = h(w) holds for all words w in L. We prove the following results for the case where Σ consists of two letters. Every language L possesses a finite subset $L_1$ such that, for any pair (g, h), g and h are equivalent on L if and only if they are equivalent on $L_1$. For every language L (with the exception of some trivial cases), there is a word w (not necessarily in L) such that, for any pair (g, h), g and h are equivalent on L if and only if g(w) = h(w). Our constructions are, in general, noneffective. Some related notions are also discussed in the paper.
1. INTRODUCTION

Problems concerning homomorphism equivalence have turned out to be of crucial importance in some recent developments in formal language theory. Perhaps the best example is the DOL equivalence problem [2]. Very recently a considerable amount of research has been done along these lines. Basically, one can distinguish two different lines of development. In the first place, given two homomorphisms g and h, we define their equality set by
$E(g, h) = \{w \mid g(w) = h(w)\}.$

Equality sets have turned out to be a powerful tool in the characterization of various language classes. The reader is referred to [1] and [3] for further details.

In the second place, given a language L, the question of determining whether or not two given homomorphisms g and h are equivalent on L, i.e., whether or not g(w) = h(w) holds for all words w in L, has proved very interesting in its own right, apart from being a crucial tool in many decision problems. (Of course, this problem amounts to determining whether or not L is contained in E(g, h).) For instance, it was shown in [5] that this problem is decidable for context-free languages L, and conjectured that it is decidable even for indexed languages L. A special case of this conjecture was considered in [4], where the decidability for ETOL languages over a two-letter alphabet was established.

This paper continues the second line of research described above from a more abstract point of view. Our starting point is the following conjecture, sometimes referred to as "Ehrenfeucht's Conjecture": Every language L possesses a finite subset $L_1$ such that, for any pair of homomorphisms (g, h), g and h are equivalent on L if and only if they are equivalent on $L_1$. It follows from known undecidability results that the set $L_1$ cannot, in general, be effectively computable from L.

We prove that the conjecture holds true for languages L over a two-letter alphabet Σ = {a, b}. Moreover, we show that for such a language L one of the following two alternatives holds: (i) two homomorphisms g and h are equivalent on L only if g = h, or else (ii) there is a word w (not necessarily in L) such that, for any pair of homomorphisms (g, h), g and h are equivalent on L if and only if g(w) = h(w).

Most of the technical discussions in this paper deal with languages over two-letter alphabets Σ. Because of some special properties of such languages, many of the techniques are not applicable in the general case. Thus it remains an open problem whether or not our main results are valid in the general case. However, the basic definitions are given for the general case.

* This research was supported by the Natural Sciences and Engineering Council of Canada, Grant A7403.

0022-0000/80/030379-17$02.00/0 Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.

2. DEFINITIONS
This paper deals mostly with very fundamental notions in formal language theory. The special definitions are given below; for the remaining unexplained notions (which are few in number and fundamental in formal language theory), the reader is referred to [7] or [9].

Consider two homomorphisms g and h mapping $\Sigma^*$ into $\Sigma_1^*$, where $\Sigma$ and $\Sigma_1$ are finite alphabets and possibly $\Sigma = \Sigma_1$. The equality set of g and h is defined by

$E(g, h) = \{w \in \Sigma^* \mid g(w) = h(w)\}.$

For a word w in $\Sigma^*$, the balance of w is defined by

$B(w) = |g(w)| - |h(w)|.$

(Thus, B(w) is an integer depending, apart from w, also on g and h. However, we write it simply B(w) because the homomorphisms, as well as their ordering, will always be clear from the context.) For an integer $k \ge 0$, we say that the pair (g, h) has a k-bounded balance on a language L if

$|B(w)| \le k$
holds for all initial subwords w of the words in L. We denote by $E_k(g, h)$ the largest subset of E(g, h) such that the pair (g, h) has a k-bounded balance on $E_k(g, h)$.

Two homomorphisms g and h are equivalent on a language L if $L \subseteq E(g, h)$. g and h are equivalent on L with bounded balance if

$L \subseteq E_k(g, h)$

holds for some k. A finite subset $L_1$ of a language L is a test set for L if, for any pair of homomorphisms (g, h), g and h are equivalent on L if and only if g and h are equivalent on $L_1$. A word w is a checking word for a language L if, for any pair of homomorphisms (g, h), g and h are equivalent on L if and only if g(w) = h(w). Observe that it is not required that w is in L.

A language L is rich if two homomorphisms g and h are equivalent on L only in case g = h. A word w is homomorphically forced by a language L, in symbols $L \vdash w$, if whenever two homomorphisms g and h are equivalent on L then also g(w) = h(w). A language L is homomorphically independent if there is no word w in L such that $L - \{w\} \vdash w$.

Following [6], we call a homomorphism $h: \Sigma^* \to \Sigma_1^*$ simplifiable if it can be decomposed as

$h = gf$, where $g: \Sigma^* \to \Sigma_2^*$ and $f: \Sigma_2^* \to \Sigma_1^*$,

and the cardinality of $\Sigma_2$ is smaller than that of $\Sigma$. If a homomorphism is not simplifiable, it is called elementary. A homomorphism $h: \Sigma^* \to \Sigma_1^*$ is periodic if there is a word w such that, for each $a \in \Sigma$, there is an integer q such that $h(a) = w^q$. It is clear that every homomorphism $h: \{a, b\}^* \to \Sigma_1^*$ is either elementary or periodic.
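The definitions above translate directly into a few lines of code. The following is only a sketch for experimenting with small examples: a homomorphism is represented by a dictionary of letter images, and the names `apply`, `balance`, `equivalent_on`, and `has_k_bounded_balance`, as well as the sample homomorphisms over the target letter `c`, are illustrative choices that do not appear in the paper.

```python
from typing import Dict, Iterable

Hom = Dict[str, str]  # a homomorphism, given by the images of the letters


def apply(h: Hom, w: str) -> str:
    """Image of the word w under the homomorphism h."""
    return "".join(h[c] for c in w)


def balance(g: Hom, h: Hom, w: str) -> int:
    """B(w) = |g(w)| - |h(w)|."""
    return len(apply(g, w)) - len(apply(h, w))


def equivalent_on(g: Hom, h: Hom, language: Iterable[str]) -> bool:
    """True if every listed word lies in the equality set E(g, h)."""
    return all(apply(g, w) == apply(h, w) for w in language)


def has_k_bounded_balance(g: Hom, h: Hom, language: Iterable[str], k: int) -> bool:
    """True if |B(p)| <= k for every prefix p of every listed word."""
    return all(
        abs(balance(g, h, w[:i])) <= k
        for w in language
        for i in range(len(w) + 1)
    )


# Hypothetical sample data: two distinct periodic homomorphisms.
g = {"a": "cc", "b": "c"}
h = {"a": "c", "b": "cc"}
sample = ["ab", "aabb", "abab"]
```

On this sample the two homomorphisms are equivalent, and the balance of the prefix `aa` of `aabb` is 2, so the pair has 2-bounded but not 1-bounded balance on the sample.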
3. PRELIMINARY RESULTS

This section contains results of a basic nature needed later on and also results obtainable from those in the literature. We consider first an example of a test set and a checking word.
THEOREM 3.1. The set $L_1 = \{ab, a^2b^2\}$ is a test set for the language $L = \{a^nb^n \mid n \ge 1\}$. The language L possesses no test set consisting of only one word but the word $aba^2b^2$ is a checking word for L. Every two words of L constitute a homomorphically independent set.

Proof. To prove the first sentence, consider two homomorphisms g and h equivalent on $L_1$. Denoting $g(a) = x$, $g(b) = y$, $h(a) = x_1$, $h(b) = y_1$, we obtain

$xy = x_1y_1$ and $xxyy = x_1x_1y_1y_1$.

If $x = x_1$ (and consequently $y = y_1$), there is nothing to prove. Otherwise, we assume without loss of generality that $x_1$ is a proper prefix of x, i.e., $x = x_1z$, where z is not the empty word. Consequently,

$y_1 = zy.$

We obtain now by the equation $xxyy = x_1x_1y_1y_1$

$x_1zx_1zyy = x_1x_1zyzy,$

and hence $zx_1zy = x_1zyz$. This equation is possible only if

$zx_1 = x_1z$ and $zy = yz$.

These commuting relations imply (for instance, cf. [7]) the existence of a word v and integers p, q, r such that

$x_1 = v^p$, $z = v^q$, $y = v^r$.

But this means that for any n

$g(a^nb^n) = v^{n(p+q+r)} = h(a^nb^n),$

which proves the first sentence.

For any two homomorphisms g and h, $g(aba^2b^2) = h(aba^2b^2)$ if and only if

$g(ab) = h(ab)$ and $g(a^2b^2) = h(a^2b^2)$.

The "if" part of this claim is obvious. The "only if" part follows because the balance of any word w depends on its Parikh vector only. Consequently, if B(w) = 0 then $B(w_1) = 0$ for any prefix $w_1$ of w whose Parikh vector is proportional to that of w. This shows that $aba^2b^2$ is a checking word for L.
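The periodic situation reached in the first part of the proof can be checked on a concrete instance. The sketch below picks hypothetical values v = ab and (p, q, r) = (1, 2, 3), builds the two homomorphisms from $x_1 = v^p$, $z = v^q$, $y = v^r$, and confirms that they are distinct, agree on the test set, agree on the checking word, and agree on an initial portion of L. The names and the chosen values are illustrative only.

```python
def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


# Hypothetical instance of the proof's notation.
v, p, q, r = "ab", 1, 2, 3
g = {"a": v * (p + q), "b": v * r}        # g(a) = x = x1 z,  g(b) = y
h = {"a": v * p, "b": v * (q + r)}        # h(a) = x1,        h(b) = y1 = z y

test_set = ["ab", "aabb"]
checking_word = "abaabb"                  # the word a b a^2 b^2

agree_on_test_set = all(apply(g, w) == apply(h, w) for w in test_set)
agree_on_checking_word = apply(g, checking_word) == apply(h, checking_word)
# Only finitely many words of L = {a^n b^n} can be tried, of course.
agree_on_language_sample = all(
    apply(g, "a" * n + "b" * n) == apply(h, "a" * n + "b" * n)
    for n in range(1, 30)
)
```

Both images of $a^nb^n$ are $v^{n(p+q+r)}$, here $(ab)^{6n}$, as the computation in the proof predicts.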
For each n = 1, 2, ..., consider the homomorphisms $g_n$ and $h_n$ defined by

$g_n(a) = a$, $g_n(b) = ba^n$, $h_n(a) = a^nb$, $h_n(b) = a$.

Then $g_n(a^mb^m) = h_n(a^mb^m)$ if and only if m = n. Consequently, L possesses no test set consisting of only one word. The homomorphisms $g_n$ and $h_n$ show also that every two words of L constitute a homomorphically independent set. ∎

At the time of this writing we do not know any homomorphically independent language over the alphabet {a, b} consisting of three words. It is clear that, for any alphabet, there are homomorphically independent languages whose cardinality equals that of the alphabet. It is also clear that every subset of a homomorphically independent language is itself homomorphically independent and that the empty word does not belong to any homomorphically independent language of cardinality $\ge 2$. If every language possesses a test set then clearly there are no infinite homomorphically independent languages. We show below that every language over a two-letter alphabet {a, b} possesses a test set and, consequently, there are no infinite homomorphically independent languages over the alphabet {a, b}. The existence of test sets in the general case remains open.

We consider next some properties of rich languages.

THEOREM 3.2. Every language L over a one-letter alphabet, i.e., $L \subseteq b^*$ (where b is a letter), and not consisting of the empty word alone is rich and possesses a checking word. No rich language strictly over an alphabet of at least two letters possesses a checking word.
Proof. The first sentence is obvious. To prove the second sentence, consider a rich language L whose minimal alphabet Σ contains at least two letters and a word w over Σ. If w does not contain occurrences of two distinct letters, it cannot be a checking word for L. Otherwise, it is possible to define two periodic homomorphisms g and h such that

$g \ne h$ but $g(w) = h(w)$.

Because L is rich, w cannot be a checking word for L. ∎
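The proof only asserts that suitable periodic homomorphisms exist. One possible explicit choice over {a, b}, given here as an assumption rather than as the authors' construction, maps both letters to powers of a single target letter `c` so that the total lengths coincide on w. The function name `separating_periodic_pair` is hypothetical.

```python
def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


def separating_periodic_pair(w):
    """Return distinct periodic homomorphisms g, h on {a, b}* with g(w) = h(w).

    Assumes w contains both letters. With m occurrences of a and n of b,
    both images of w have length 3*m*n:
      |g(w)| = m*(2n) + n*m   and   |h(w)| = m*n + n*(2m).
    """
    m, n = w.count("a"), w.count("b")
    if m == 0 or n == 0:
        raise ValueError("w must contain both letters a and b")
    g = {"a": "c" * (2 * n), "b": "c" * m}
    h = {"a": "c" * n, "b": "c" * (2 * m)}
    return g, h


g, h = separating_periodic_pair("aabab")
```

Since g and h differ on the letter a, no word containing both letters can serve as a checking word for a rich language.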
For a nonempty word w over {a, b}, we denote by $R_{ab}(w)$ the ratio between the number of occurrences of a and that of b in w. Thus,

$R_{ab}(a^n) = \infty$, $R_{ab}(b) = 0$, $R_{ab}(a^2bab) = 3/2$.

THEOREM 3.3. If a language L over the alphabet {a, b} possesses two words $w_1$ and $w_2$ such that

$R_{ab}(w_1) \ne R_{ab}(w_2)$     (1)

then L is rich.
Proof. The claim is obvious if one of the ratios equals 0 or $\infty$. Consequently, we may assume that both ratios are finite and different from 0. Let the number of occurrences of a (resp. b) in $w_i$ be $m_i$ (resp. $n_i$), for i = 1, 2. Thus, all of the numbers $m_i$ and $n_i$ are $\ne 0$. Consider now two homomorphisms g and h satisfying

$g(w_1) = h(w_1)$ and $g(w_2) = h(w_2)$.     (2)

We want to show that g = h. Assume the contrary. Then there are positive integers $\alpha$ and $\beta$ such that

$|g(a)| - |h(a)| = \alpha$ and $|h(b)| - |g(b)| = \beta$.

(Of course, it is also possible that h(a) is longer than g(a) but we restrict without loss of generality our attention to one of the two symmetric cases.) We obtain now from the equations (2)

$m_1\alpha - n_1\beta = 0 = m_2\alpha - n_2\beta,$

and hence, $m_1/n_1 = m_2/n_2$, which is a contradiction. ∎
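Theorem 3.3 can be sanity-checked by brute force on small cases. The sketch below enumerates all homomorphisms on {a, b}* whose letter images are words of length at most 2 over {a, b} (an arbitrary cutoff chosen for illustration) and lists the distinct pairs that agree on a given set of words. The helper names are hypothetical, and the search is evidence on a finite sample, not a proof.

```python
import math
from fractions import Fraction
from itertools import product


def ratio(w):
    """R_ab(w): occurrences of a divided by occurrences of b (inf if no b)."""
    a, b = w.count("a"), w.count("b")
    return math.inf if b == 0 else Fraction(a, b)


def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


def small_homomorphisms(max_len=2):
    """All homomorphisms with letter images of length <= max_len over {a, b}."""
    images = ["".join(t) for n in range(max_len + 1) for t in product("ab", repeat=n)]
    return [{"a": x, "b": y} for x in images for y in images]


def agreeing_distinct_pairs(words, max_len=2):
    """Pairs (g, h) with g != h that agree on every word in `words`."""
    homs = small_homomorphisms(max_len)
    return [
        (g, h)
        for g in homs
        for h in homs
        if g != h and all(apply(g, w) == apply(h, w) for w in words)
    ]


different_ratios = agreeing_distinct_pairs(["ab", "aab"])   # ratios 1 and 2
equal_ratios = agreeing_distinct_pairs(["ab", "aabb"])      # both ratios 1
```

With two different ratios no distinct pair survives in the searched range, whereas for two words of equal ratio there are distinct agreeing pairs, for instance g(a) = aa, g(b) = ε against h(a) = h(b) = a.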
The following result is now an immediate consequence of Theorem 3.3.

THEOREM 3.4. A language L satisfying the assumptions of Theorem 3.3 possesses a test set $L_1$ consisting of two elements. In fact, any two words $w_1$ and $w_2$ in L satisfying (1) constitute a test set for L.

Theorem 3.4 shows that, as regards test sets for languages L over {a, b}, we may restrict our attention to the case where $R_{ab}(w)$ is constant when w ranges over L. A further classification of such languages turns out to be useful. A language L over the alphabet {a, b} with $R_{ab}(w) = r$ for all $w \in L$ has m-bounded prefix difference, where m is a positive constant, if for all prefixes w of the words in L, the absolute value of the difference between the number of occurrences of a and that of b multiplied by r in w is at most m. L has bounded prefix difference if it has m-bounded prefix difference for some m. Otherwise, it has unbounded prefix difference. For instance, the language L of Theorem 3.1 has unbounded prefix difference.

LEMMA 3.5. Let g and h be homomorphisms on $\Sigma^*$, $\Sigma = \{a, b\}$. Let g be elementary and h simplifiable. Then there is w in $\Sigma^*$ so that $E(g, h) = \{w\}^*$.

Proof. As already mentioned, a simplifiable homomorphism over a two-letter alphabet must be periodic. So, there is z in $\Sigma^+$ so that $h(a) = z^n$, $h(b) = z^m$ for some $m, n \ge 0$. So

$x \in E(g, h)$ iff $g(x) = z^{n\#_a(x) + m\#_b(x)},$     (3)

where $\#_a(x)$ is the number of occurrences of a in x. Let $u, v \in E(g, h)$. Then also $uv, vu \in E(g, h)$ and by (3) $g(uv) = g(vu) = z^{n\#_a(uv) + m\#_b(uv)}$. Since g is elementary it is injective by [8, Theorem III.1.7]. Therefore uv = vu and clearly there exists w in $\Sigma^*$
such that $u = w^t$, $v = w^s$ for some integers t and s. Assume that w is the minimal such string (see [7, Chap. 1]). Hence $E(g, h) \subseteq \{w\}^*$. Now, assume $E(g, h) \ne \{\varepsilon\}$ ($\varepsilon$ is the empty word). Then for some k > 0, $w^k \in E(g, h)$. So

$g(w)^k = g(w^k) = z^{k(n\#_a(w) + m\#_b(w))}$

and we conclude that

$g(w) = z^{n\#_a(w) + m\#_b(w)}.$

Thus by (3) we have $w \in E(g, h)$. ∎
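A small instance illustrates the shape $E(g, h) = \{w\}^*$. In the sketch below, g is the identity on {a, b}* (which is elementary) and h is the periodic homomorphism with h(a) = ab and h(b) = ε, i.e., z = ab, n = 1, m = 0 in the notation of the proof. These two homomorphisms and the helper names are hypothetical choices made for illustration; the enumeration only inspects words up to a fixed length.

```python
from itertools import product


def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


g = {"a": "a", "b": "b"}    # identity: elementary
h = {"a": "ab", "b": ""}    # periodic: h(a) = z, h(b) = empty word, z = "ab"


def equality_set_up_to(g, h, max_len):
    """All words over {a, b} of length <= max_len on which g and h agree."""
    found = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            x = "".join(letters)
            if apply(g, x) == apply(h, x):
                found.append(x)
    return found


def is_power_of(x, w):
    """True if x = w^t for some integer t >= 0 (w nonempty)."""
    return len(x) % len(w) == 0 and x == w * (len(x) // len(w))


members = equality_set_up_to(g, h, 8)
```

Every member found is a power of ab, in agreement with the lemma for w = ab.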
We need the following result which is a consequence of the considerations in [6].

LEMMA 3.6. If g and h are elementary homomorphisms (defined on any alphabet Σ) then

$E(g, h) = E_k(g, h)$

for some integer k.

LEMMA 3.7. If g and h are homomorphisms over a two-letter alphabet and g is elementary, then $E(g, h) = E_k(g, h)$ for some $k \ge 0$.
Proof. If both g and h are elementary, then we have a special case of Lemma 3.6. If h is periodic, then the result follows by Lemma 3.5. ∎

THEOREM 3.8. If L is a language over the alphabet {a, b} possessing unbounded prefix difference, and g and h are distinct homomorphisms equivalent on L, then g and h are both periodic.

Proof. We have already pointed out that, in the case of a two-letter source alphabet, every homomorphism is either elementary or periodic. Because the balance of a word depends only on its Parikh vector, and because L possesses unbounded prefix difference, the result follows by Lemma 3.7. ∎

THEOREM 3.9. Assume that L is a language over {a, b} possessing a test set $L_1$ and such that the ratio $R_{ab}(w)$ is constant when w ranges over L. Then L possesses a checking word.

Proof. A checking word is obtained by catenating all words in $L_1$. ∎
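The catenation argument can be tried out on the test set {ab, a^2b^2} of Theorem 3.1. The sketch below forms the catenated word and checks, over all homomorphisms with letter images of length at most 2 over {a, b} (an arbitrary illustrative cutoff), that agreement on the catenated word coincides with agreement on every word of the test set. Helper names are hypothetical.

```python
from itertools import product


def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


def small_homomorphisms(max_len=2):
    """All homomorphisms with letter images of length <= max_len over {a, b}."""
    images = ["".join(t) for n in range(max_len + 1) for t in product("ab", repeat=n)]
    return [{"a": x, "b": y} for x in images for y in images]


def catenated_checking_word(test_set):
    """Catenate the words of a constant-ratio test set in the given order."""
    return "".join(test_set)


test_set = ["ab", "aabb"]                 # constant ratio R_ab = 1
w = catenated_checking_word(test_set)     # the word a b a^2 b^2

homs = small_homomorphisms()
consistent = all(
    (apply(g, w) == apply(h, w))
    == all(apply(g, t) == apply(h, t) for t in test_set)
    for g in homs
    for h in homs
)
```

The equivalence relies on the constant ratio: a zero balance on the whole word forces a zero balance on each factor, so the factors must match one by one.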
Theorem 3.9 is valid also for arbitrary alphabets: The assumption of $R_{ab}$ being constant is to be replaced by the assumption that no two Parikh vectors of words in L are linearly independent.

It is clear that if every language L belonging to some family of languages possesses a test set $L_1$ and if, furthermore, $L_1$ is effectively computable from L, then the problem of homomorphism equivalence is decidable for the family. This means that it is decidable, given a language L from the family and two homomorphisms g and h, whether or not g
and h are equivalent on L. The following result shows that the constructions given below cannot, in general, be effective.

THEOREM 3.10. The problem of homomorphism equivalence is undecidable for context-sensitive languages over {a, b}.

Proof. Consider the homomorphisms g and h defined by

$g(a) = h(a) = a$, $g(b) = a$, $h(b) = a^2$.

If we could decide whether g and h are equivalent on a context-sensitive language L over {a, b}, we could also decide whether b occurs in such a language. But the latter problem is known to be undecidable. ∎

We conclude this section by pointing out a result established already in [5].

THEOREM 3.11. Every regular language (over any alphabet) possesses a test set.

4. MAIN THEOREM
This section is devoted to the explicit statement of our main results already hinted at above. The proof of Theorem 4.1 will be given in the next section, the final one in the paper.

THEOREM 4.1. Every language L over {a, b} possesses a test set.
The test set constructed in the proof in the next section might be unnecessarily complicated. A more explicit characterization of the equality sets in the case of a two-letter alphabet might give rise to a much simpler theory of test sets. However, at the time of this writing we are unable to give such a characterization. Theorem 3.10 shows that the construction is not, in general, effective.

The following result is an immediate consequence of Theorems 3.3, 3.9 and 4.1.

THEOREM 4.2. Every language over the alphabet {a, b} is either rich or possesses a checking word.
Our last theorem is also a consequence of Theorems 3.3, 3.9 and 4.1 because star languages (i.e., languages of the form L = K*) are closed under catenation.

THEOREM 4.3. Every star language over the alphabet {a, b} is either rich or possesses a test set consisting of only one word.
In conclusion, we would still like to emphasize that most of our constructions depend heavily on the alphabet consisting of only two letters. Thus, the problems remain open in the general case. The preceding discussions show that also many other fundamental questions in the theory of free monoids, such as the characterization of homomorphically
independent languages, remain open. A more explicit characterization of the equality sets in the case of a two-letter alphabet might also settle the well-known open problem: Is the Post Correspondence Problem decidable if both of the lists consist of two words only?

5. PROOF OF THEOREM 4.1

We may assume by Theorem 3.4 that the given language L has "constant ratio":

$R_{ab}(w) = r$

when w ranges over L. (We assume, of course, that L is infinite.) We write the ratio in the form

$r = r_2/r_1$, where $r_1 + r_2 = 1$.

For a word w over {a, b}, we define the weighted difference by

$d(w) = r_1\#_a(w) - r_2\#_b(w),$

where $\#_a(w)$ and $\#_b(w)$ denote the number of occurrences of a and b in w, respectively. Thus, d(w) = 0 for words w belonging to L. Observe also that if $h_1$ and $h_2$ are two homomorphisms equivalent on L and w is a prefix of a word in L, then

$|B(w)| = \left|\frac{d(w)}{r_1}\right| \bigl|\,|h_1(a)| - |h_2(a)|\,\bigr| = \left|\frac{d(w)}{r_2}\right| \bigl|\,|h_1(b)| - |h_2(b)|\,\bigr|.$
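The relation between balance and weighted difference can be checked numerically. The sketch below assumes the conventions $R_{ab}(w) = r_2/r_1$ with $r_1 + r_2 = 1$ and $d(w) = r_1\#_a(w) - r_2\#_b(w)$; this reading of the definitions is an assumption, as are the sample homomorphisms over the target letter `c` and the helper names. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


def weighted_difference(w, r1, r2):
    """d(w) = r1 * #_a(w) - r2 * #_b(w) (assumed convention)."""
    return r1 * w.count("a") - r2 * w.count("b")


# Hypothetical example with constant ratio #_a / #_b = 2 = r2 / r1.
r1, r2 = Fraction(1, 3), Fraction(2, 3)
h1 = {"a": "cc", "b": "c"}
h2 = {"a": "c", "b": "ccc"}
word = "aab"                               # h1 and h2 agree on this word

diff_a = abs(len(h1["a"]) - len(h2["a"]))  # | |h1(a)| - |h2(a)| |
diff_b = abs(len(h1["b"]) - len(h2["b"]))  # | |h1(b)| - |h2(b)| |


def identity_holds(prefix):
    """Check |B(p)| = |d(p)/r1| * diff_a = |d(p)/r2| * diff_b for one prefix."""
    b = abs(len(apply(h1, prefix)) - len(apply(h2, prefix)))
    d = abs(weighted_difference(prefix, r1, r2))
    return b == d / r1 * diff_a == d / r2 * diff_b


all_prefixes_ok = all(identity_holds(word[:i]) for i in range(len(word) + 1))
```

For the prefix `aa`, for example, the balance is 2 and d = 2/3, so both sides of the identity equal 2.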
It is clear that a language has bounded prefix difference in the sense defined in Section 3 if and only if there is a constant D such that $|d(w)| \le D$ holds for all prefixes w of the words in the language.

We now outline the proof of Theorem 4.1. Two cases will be considered. Assume first that L has bounded prefix difference. In this case we factorize each word w in L as $w = x_1 \cdots x_k$, where the lengths of $x_i$ are between C and 2C. Here C is a constant large enough compared with the bound on the prefix difference such that the following condition will be satisfied. Whenever $h_1(w) = h_2(w)$ and $1 \le i \le k - 1$, then

$\max(|h_1(x_1 \cdots x_i)|, |h_2(x_1 \cdots x_i)|) < \min(|h_1(x_1 \cdots x_{i+1})|, |h_2(x_1 \cdots x_{i+1})|).$

In other words, we always get "overlap" between the two homomorphisms when the above factorization is considered since, because of the bounded prefix difference, none of the homomorphisms can "run" too much faster than the other. The explicit technical details are contained in the notion of piece defined below and in Lemma 5.1. The overlap makes it sufficient to test for homomorphism equivalence all situations of the form

$(d(x_1 \cdots x_{i-1}),\ x_i,\ x_{i+1}).$
Because the lengths of the x's, as well as |d(x)|, are bounded, this gives rise to a finite test set.

Second, assume that L has unbounded prefix difference. By Theorem 3.8, two homomorphisms can be equivalent on L only in case they are periodic. Hence, in this case we must show that L possesses a finite "periodicity forcing" subset $L_1$: two distinct homomorphisms can be equivalent on $L_1$ only in case they are periodic. This second case is needed for Theorem 4.1 only; Theorem 4.2 follows from the case of bounded prefix difference above.

After this outline, we now go into the technical details. For the case of bounded prefix difference, the following definition will be crucial. Assume that n is a rational number (possibly negative) and w is a word over {a, b}. We say that the pair (n, w) is a piece if

$|w| > \frac{|n| + |d(w)|}{\min(r_1, r_2)} \cdot \left(\max\left(\frac{D}{r_1}, \frac{D}{r_2}\right) + 1\right),$

where D is an upper bound on the absolute values of weighted differences of all subwords of w. The following lemma formalizes the notion of "overlap" between two homomorphisms. Intuitively, the lemma says that none of $h_1$ and $h_2$ can "run" much faster than the other with respect to a sufficiently long w.

LEMMA 5.1. Assume that $uwv \in \{a, b\}^*$, (d(u), w) is a piece, and $h_1$ and $h_2$ are distinct homomorphisms satisfying $h_1(uwv) = h_2(uwv)$. Then

$\max(|h_1(u)|, |h_2(u)|) < \min(|h_1(uw)|, |h_2(uw)|).$

Proof. We may assume that $|h_1(u)| \ge |h_2(u)|$, otherwise, we interchange $h_1$ and $h_2$. We may also assume that $|h_2(a)| > |h_1(a)|$, otherwise, we interchange a and b. Observe that the latter interchange is "legal" in the sense that it preserves (d(u), w) as a piece. By the assumption, $|d(x)| \le D$ for each subword x of w. Consequently, each subword x satisfying

$|x| \ge \max\left(\frac{D}{r_1}, \frac{D}{r_2}\right) + 1$

contains at least one occurrence of the letter a. Otherwise, |d(x)| > D. Because (d(u), w) is a piece,

$|h_2(w)| > \frac{|d(u)|}{r_1}\,|h_2(a)|.$     (1)

On the other hand,

$|h_1(u)| - |h_2(u)| = |B(u)| = \frac{|d(u)|}{r_1}\bigl(|h_2(a)| - |h_1(a)|\bigr) \le \frac{|d(u)|}{r_1}\,|h_2(a)|.$
This yields by (1) $|B(u)| < |h_2(w)|$, i.e.,

$|h_1(u)| < |h_2(u)| + |h_2(w)| = |h_2(uw)|.$

This together with the relation

$|h_2(u)| \le |h_1(u)| < |h_1(uw)|$

proves the lemma. ∎
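The notion of a piece and the conclusion of Lemma 5.1 can be tried on a small example. The sketch below reads the defining inequality as $|w| > \frac{|n| + |d(w)|}{\min(r_1, r_2)}\bigl(\max(D/r_1, D/r_2) + 1\bigr)$ with $d(w) = r_1\#_a(w) - r_2\#_b(w)$, and takes D to be the exact maximum of |d(x)| over all subwords x of w; both the reading of the formula and the choice of D as the least upper bound are assumptions, as are the helper names and the sample data.

```python
from fractions import Fraction


def apply(h, w):
    """Image of the word w under the homomorphism h (dict of letter images)."""
    return "".join(h[c] for c in w)


def weighted_difference(w, r1, r2):
    """d(w) = r1 * #_a(w) - r2 * #_b(w) (assumed convention)."""
    return r1 * w.count("a") - r2 * w.count("b")


def max_subword_difference(w, r1, r2):
    """Least upper bound D on |d(x)| over all subwords x of w."""
    return max(
        abs(weighted_difference(w[i:j], r1, r2))
        for i in range(len(w) + 1)
        for j in range(i, len(w) + 1)
    )


def is_piece(n, w, r1, r2):
    """Check the (assumed) defining inequality of a piece (n, w)."""
    D = max_subword_difference(w, r1, r2)
    bound = (abs(n) + abs(weighted_difference(w, r1, r2))) / min(r1, r2)
    bound *= max(D / r1, D / r2) + 1
    return len(w) > bound


# Hypothetical example with ratio 1, i.e. r1 = r2 = 1/2.
r1 = r2 = Fraction(1, 2)
u, w, v = "a", "abab", "b"
h1 = {"a": "cc", "b": "c"}
h2 = {"a": "c", "b": "cc"}

piece = is_piece(weighted_difference(u, r1, r2), w, r1, r2)
left = max(len(apply(h1, u)), len(apply(h2, u)))
right = min(len(apply(h1, u + w)), len(apply(h2, u + w)))
```

Here D = 1/2 and the bound evaluates to 2, so the word `abab` of length 4 yields a piece, and the lemma's inequality reads 2 < 7.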
We are now in the position to actually begin the proof of Theorem 4.1. We assume first that L has bounded prefix difference. This clearly implies the existence of a constant D such that $|d(x)| \le D$ holds for all subwords x of words in L. We define another constant C by

$C = 2\left(\max\left(\frac{D}{r_1}, \frac{D}{r_2}\right) + 1\right)^2.$

Denote by $L_1$ the finite subset of L consisting of words of length $\le 3C$. Consider a word x in L with |x| > 3C. x can be decomposed as

$x = x_1x_2 \cdots x_k$, $C \le |x_i| \le 2C$ for $i = 1, \ldots, k$.     (2)

For each such x, we fix such a decomposition (2). It is easy to verify that, by the choice of C, $(d(x_1 \cdots x_{i-1}), x_i)$ is always a piece. Let now $L_2$ be the collection of triples

$(n, w_1, w_2),$     (3)

where n is an integer and $w_1$ and $w_2$ are words satisfying $C \le |w_i| \le 2C$ such that there is a word (2) in L and a number i, $1 \le i \le k - 1$, with the following properties:

$d(x_1 \cdots x_{i-1}) = n$, $w_1 = x_i$, $w_2 = x_{i+1}$.

(In the case i = 1 it is understood that $d(x_1 \cdots x_{i-1}) = 0$.) It is obvious that $L_2$ is finite. Moreover, there is a finite subset T' of L that generates $L_2$ in the following sense: whenever (3) is in $L_2$, there is a word $uw_1w_2v$ in T' such that d(u) = n. We define now

$T = T' \cup L_1$

and claim that T is a test set for L. Clearly, T is finite. It suffices to show that, for an arbitrary pair $(h_1, h_2)$ of distinct homomorphisms, whenever $h_1$ and $h_2$ are equivalent on T, then they are also equivalent on L. Hence, assume that $h_1$ and $h_2$ are equivalent on T. To show that $h_1$ and $h_2$ are equivalent on L, we consider an arbitrary word x in $L - L_1$, written in the form (2).
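The decomposition (2) and the triples (3) are easy to compute for a single word. The sketch below uses one possible fixed decomposition (blocks of length C, with the last block absorbing the remainder) and the convention $d(w) = r_1\#_a(w) - r_2\#_b(w)$; the paper only requires that some decomposition be fixed, so this particular choice, the helper names, and the small value of C used in the example are assumptions made for illustration.

```python
from fractions import Fraction


def weighted_difference(w, r1, r2):
    """d(w) = r1 * #_a(w) - r2 * #_b(w) (assumed convention)."""
    return r1 * w.count("a") - r2 * w.count("b")


def decompose(x, C):
    """Split x into blocks x_1 ... x_k with C <= |x_i| <= 2C.

    All blocks have length C except the last, which has length
    C + (len(x) mod C) and therefore at most 2C - 1.
    """
    if C <= 0 or len(x) < C:
        raise ValueError("need C > 0 and len(x) >= C")
    k = len(x) // C
    blocks = [x[i * C:(i + 1) * C] for i in range(k - 1)]
    blocks.append(x[(k - 1) * C:])
    return blocks


def triples(x, C, r1, r2):
    """Triples (d(x_1...x_{i-1}), x_i, x_{i+1}) for i = 1, ..., k-1."""
    blocks = decompose(x, C)
    result = []
    prefix = ""
    for i in range(len(blocks) - 1):
        result.append((weighted_difference(prefix, r1, r2), blocks[i], blocks[i + 1]))
        prefix += blocks[i]
    return result


# Illustrative data only: C = 3 is not the constant C of the proof.
r1 = r2 = Fraction(1, 2)
x = "ab" * 5
blocks = decompose(x, 3)
found = triples(x, 3, r1, r2)
```

For this word the blocks are `aba`, `bab`, `abab`, and the two triples are (0, aba, bab) and (1/2, bab, abab).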
For i = 1, ..., k, we denote by A(i) the following assertion: one of the words

$h_1(x_1 \cdots x_i)$ and $h_2(x_1 \cdots x_i)$

is a prefix of the other. Our aim is to establish inductively the assertion A(k). The basis of induction is clear: one of the words $h_1(x_1)$ and $h_2(x_1)$ is a prefix of the other. This follows because

$h_1(ux_1x_2v) = h_2(ux_1x_2v)$

for some word $ux_1x_2v$ in T' such that d(u) = 0.

Assume now, inductively, that A(i) holds true for some i such that $1 \le i \le k - 1$. Without loss of generality (the situation being symmetric), we assume that there is a word z such that

$h_1(x_1 \cdots x_i) = h_2(x_1 \cdots x_i)z.$

On the other hand, there is a word $ux_ix_{i+1}v$ in T' such that $d(u) = d(x_1 \cdots x_{i-1})$. (In the case i = 1, this means that d(u) = 0.) Because $h_1$ and $h_2$ are equivalent on T', one of the words $h_1(ux_i)$ and $h_2(ux_i)$ is a prefix of the other. We claim that

$h_1(ux_i) = h_2(ux_i)z,$

where z is the word defined above. To prove this claim, we observe first that

$h_1(ux_i) = h_2(ux_i)z',$

for some z' with the property |z'| = |z|. This is an immediate consequence of the fact that $d(x_1 \cdots x_i) = d(ux_i)$. But now, by Lemma 5.1,

$|h_2(ux_i)| > |h_1(u)|,$

which implies that

$|z'| < |h_1(x_i)|.$

In the same way we infer from the equation

$h_1(x_1 \cdots x_i) = h_2(x_1 \cdots x_i)z$

that

$|z| < |h_1(x_i)|,$

a fact obvious also by the relation |z| = |z'|. But this means that both z and z' consist of the |z| = |z'| last letters of $h_1(x_i)$, showing our claim z = z' to be true. Because $h_1$ and $h_2$ are equivalent on T', we infer that one of the words

$zh_1(x_{i+1})$ and $h_2(x_{i+1})$
is a prefix of the other. This together with the fact that z = z' now immediately gives the result that one of the words

$h_1(x_1 \cdots x_{i+1})$ and $h_2(x_1 \cdots x_{i+1})$

is a prefix of the other, completing the inductive step. Consequently, the assertion A(k) holds true, i.e., one of the words

$h_1(x_1 \cdots x_k)$ and $h_2(x_1 \cdots x_k)$

is a prefix of the other. On the other hand, $x_1 \cdots x_k$ is a word in the language L and has, consequently, the correct ratio for which $h_1$ and $h_2$ already have been tested. This implies that $|h_1(x_1 \cdots x_k)| = |h_2(x_1 \cdots x_k)|$, which gives the desired result

$h_1(x_1 \cdots x_k) = h_2(x_1 \cdots x_k).$
This concludes the proof in the case where L has bounded prefix difference. We now proceed to the other case: L has unbounded prefix difference. In this case, by Theorem 3.8, two distinct homomorphisms $h_1$ and $h_2$ are equivalent on L only if both are periodic. On the other hand, to test the equivalence of two periodic homomorphisms, it suffices to test whether or not the homomorphisms agree on an arbitrary word in the language. (Remember that L has constant ratio.) Hence, the construction of a test set amounts in this case to the construction of a finite subset of L which is "periodicity forcing" in the sense that if two homomorphisms are equivalent on this subset, they are necessarily periodic. The remainder of the paper deals with the construction of this subset. Note, however, that if we only want to prove Theorem 4.2, we are finished. This is seen exactly as in the proof of Theorem 3.1: we just construct a periodicity forcing checking word. The latter is not necessarily in the language L.

We now proceed with the construction of the periodicity forcing finite subset of L. It will consist of four words chosen so that a pair of injective homomorphisms with "unbalanced" images of single letters cannot agree on the first two words, and a pair of homomorphisms for which the balance of single letters is relatively small with respect to the length of their images cannot agree on the other two words. The precise distinction between these two cases will be specified later by formula (6). We make first the following observation.

Claim. Let g, h be homomorphisms on $\Sigma^*$, $u, v, w, x, y \in \Sigma^*$, d(u) = d(x), (d(u), w) is a piece, g(uwv) = h(uwv) and g(xwy) = h(xwy). Then g(uwy) = h(uwy) and g(xwv) = h(xwv).

Proof. Immediately by Lemma 5.1.

In view of the above we may extend the given language L when looking for a periodicity forcing subset and then return back to a subset of L.
The completion of L, denoted by C(L), is obtained as follows. If uwv and xwy are in L, where d(u) = d(x) and (d(u), w) is a piece, then add strings uwy and xwv to L. Repeat until no more strings can be added. We say that L is complete if C(L) = L. Clearly, it follows from the above claim that C(L) possesses a periodicity forcing finite subset if and only if L does. Thus we may assume without loss of generality that L is complete. Let M be the set of all maximal common prefixes of the words in L, i.e.,

$M = \{u \mid uw \in L,\ ux \in L$, where $1{:}w \ne 1{:}x$, or else $w = \varepsilon$ and $x \ne \varepsilon\}.$

Clearly, M is infinite. Consider the set of rational numbers

$S = \{|d(w)| \mid w \in M\}.$

Since L is complete, it is easy to verify that S is unbounded. Since |d(w)| is unbounded, L must contain two words

$\alpha aab\beta$, $\gamma aab\delta$, $d(\alpha) \ne d(\gamma)$     (4)

or, otherwise, we can find a periodicity forcing set quite easily. We choose such words (4) for which $|d(\alpha)|$ is the smallest possible. (The words (4) may also result by decomposing one word in two different ways.) Denote $D_1 = |d(\alpha)|$. We also assume that
is smallest possible. We assume without loss of generality that $|h_2(a)| \ge |h_2(b)|$, otherwise we interchange a and b. Now, we define
K = W, + 4 + Y>/Q. We now choose a word w from M with the property
$|d(w)| \ge K/\min(r_1, r_2).$     (5)

(Remember that S is unbounded.) Let u and v be words such that wu and wv are in L and $1{:}u \ne 1{:}v$. Because clearly w is not in L, the words u and v are nonempty. Consider now two distinct homomorphisms $h_1$ and $h_2$ equivalent on L. Hence, one of the words $h_1(w)$ and $h_2(w)$ is a prefix of the other, assume $h_1(w) = h_2(w)k$. Denote

$H_1 = \bigl|\,|h_1(a)| - |h_2(a)|\,\bigr|$ and $H_2 = \bigl|\,|h_1(b)| - |h_2(b)|\,\bigr|.$

Hence,
Assume that h and h, satisfy the inequality
Then we claim that the set {wu, ww} is periodicity forcing. Indeed, (5) and (6) imply that
Thus, if h, and h, are equivalent on {WU, ruw}, then h,(u) and h,(u) have a common prefix of length ) I 1, i.e., of length at least I h,(a)1 + I h&)1, and starting with both h(u) and with h#). By Theorem 111.1.6 from [8], this is impossible for elementary homomorphism hl , Clearly, there is no x such that (WU,ww} 2 z*. Hence by Lemma 3.5 if hr is periodic and equivalent to h, on {UN, RXJ},then also h, is periodic. We now claim that the set
is periodicity forcing for homomorphisms $(h_1, h_2)$ not satisfying the condition (6). Consequently, for any pair $(h_1, h_2)$, (7) is periodicity forcing. We now apply the shifting argument of [2], considering the common substring aab appearing after the prefixes $\alpha$ and $\gamma$. We assume without loss of generality that $B(\alpha) \ge 0$, otherwise we interchange $h_1$ and $h_2$. The situation can be depicted as follows.

[Figure: the images of the two words (4) under $h_1$ and $h_2$, aligned by the common subword, with the shifts $B(\alpha)$ and $B(\gamma)$ and the periods $p'$, $p''$ indicated.]
We have “aligned” the images by the common subword h,(mb). Because (6) is not satisfied, we have H
1
<
12
I M4l + ~1I 4(4l . K
By (8) and the definition of K it is now easy to deduce the estimates
(8)
which shows that there is enough “overlapping” to get periodicity and that h,(u) is “periodic,” i.e., of the form p,pmp, , m 2 2, pap, = p. If h,(a) # pm, then the second occurrence of h,(u) in ha(&) is periodic with the same period but shifted. Consequently, h,(a) has even a shorter period. Repeating this reasoning, we finally obtain a period p’ such that h,(a) = (p’)“, s 2 2, i.e., h,(a) is strictly periodic. If now WY) + 2 I h2(4/ + I h,(b)1 5 2 I 44l + I Mb)lY (9 we use the same shifting situation as before and conclude that the whole word A,(&) is periodic with the period p’ deduced above. Since 1As(a)11 1 h,(b)1 and ha(a) is strictly periodic, As(b) must be a prefix of h,(u). Therefore, independently on which are the first letters in strings /3,6 (which follow the two occurrences of substring a&), both occurrences of h,(aab) are followed by h,(b). Therefore, we have a “shifting situation” also for h,(M) and conclude that h,(b) is strictly periodic, i.e., h,(b) = (p’)” for some t. There remains the case where (9) is not satisfied. In this case we use the shifting argument for uub in both Luaub/3and yuab& However, we now align the images by the common subword h,(uab) occurring in both h,(aA$) and h,(yu&). By (8) and the definition of K we have H
<
~2 I44
+
~1 I h(b)l
D2
+
1 W,
Since(01+
4Yr2
2
B(y)/Hl ~B(Y)
and +
r2 I @)I 7%
Furthermore, since Hi = ( j h2(u)l we have 2B(Y)
-=I 2
+
<
rz>/rz
-
+ rl I h,(b)1 = r, I h,(u) + r2 I h,(b)/ we have
r2 I h,Wl
+ ~1I )I.
j h,(u)/ 1, 0 < rl ,
I &)I - 2 I I h2Wl
y2
< 1 and I h,(u)1 3 1h,(b)1
- I M4l I 5 2 I $(4l.
Thus
Now (10) guarantees that both occurrences of h,(ub) in the shifting position are “under” the common substring h,(uub), which implies the periodicity of h,(ub). Since h,(u) is strictly periodic, 1h2(u)l > I h,(b)1 we see first that h,(u) is strictly periodic, therefore either h,(u) is a prefix of h,(b) or vice versa, and finally that also h(b) is strictly periodic. This together with Lemma 3.5 concludes the argument in this final case, where (9) is not satisfied.
ACKNOWLEDGMENT

The authors are grateful to C. Choffrut and J. Karhumäki for discussion of the material. In particular, J. Karhumäki proved Lemma 3.5.
REFERENCES

1. K. CULIK II, A purely homomorphic characterization of recursively enumerable sets, J. Assoc. Comput. Mach. 26 (1979), 345-350.
2. K. CULIK II AND I. FRIS, The decidability of the equivalence problem for DOL-systems, Inform. Contr. 35 (1977), 20-39.
3. K. CULIK II AND H. A. MAURER, On simple representation of language families, R.A.I.R.O. 13 (1979), 241-250.
4. K. CULIK II AND J. L. RICHIER, Homomorphism equivalence on ETOL languages, Internat. J. Computer Math., Sect. A 7 (1979), 43-51.
5. K. CULIK II AND A. SALOMAA, On the decidability of homomorphism equivalence for languages, J. Comput. System Sci. 17 (1978), 163-175.
6. A. EHRENFEUCHT AND G. ROZENBERG, Elementary homomorphisms and a solution of the DOL sequence equivalence problem, Theoret. Comput. Sci. 7 (1978), 169-183.
7. M. A. HARRISON, "Introduction to Formal Languages," Addison-Wesley, Reading, Mass., 1978.
8. G. ROZENBERG AND A. SALOMAA, "The Mathematical Theory of L Systems," Academic Press, New York/London, 1980.
9. A. SALOMAA, "Formal Languages," Academic Press, New York/London, 1973.