INFORMATION AND CONTROL
11, 167-176 (1967)
The Loop Complexity of Pure-Group Events
ROBERT McNAUGHTON
Rensselaer Polytechnic Institute, Troy, New York 12181
An algorithm is presented for finding the loop complexity (sometimes known as "star height") of certain regular events. The algorithm works for those regular events whose syntactic monoid (i.e., semigroup) is a group. The existence of an algorithm that works for all regular events remains an open question.

This paper continues the study of the problem presented by the author (McNaughton, 1967). As it is not expected that anyone will want to read this paper who has not already read its predecessor, a general knowledge of that paper and its availability as a reference are presupposed. Frustration will come early to anyone who has not read as far as the theorems following the definition of "all-admissible graph." It is also necessary that the reader have some knowledge of the syntactic monoid of an event, also referred to as the semigroup of the event, as explained in the first section of Rabin and Scott (1959).

The main result is the presentation of an algorithm for determining the loop complexity of a regular event whose syntactic monoid is a group. The idea of investigating the loop complexity of these events was inspired by recent papers by Dejean and Schützenberger (1966) and Dejean (1967). The term "pure-group event" is used to distinguish them both from events whose syntactic monoids have only trivial or relatively small and insignificant subgroups and from events whose syntactic monoids are not groups but have large and important subgroups.

Let $E$ be a regular event over an alphabet $\Sigma$ whose syntactic monoid $M$ is a group. Thus $E \subseteq \Sigma^*$ and there is a homomorphism $\phi$ mapping $\Sigma^*$ onto $M$ such that, for some subset $M'$ of $M$, $E = \phi^{-1}(M')$. Since $E$ is a regular event, $M$ is a finite group. If $m \in M'$ then let us say that $m$ is a terminal element of $M$. Let $e$ be the identity of $M$.

The main technique used in this paper is that of establishing "pathwise homomorphisms" between graphs. If $\gamma$ is a mapping of the set of
nodes and branches of $G'$ into the set of nodes and branches of $G$ such that $\gamma(N')$, for any node $N'$ of $G'$, is a node of $G$, then $\gamma$ is a pathwise homomorphism if the conditions (PH1) and (PH2) hold.

(PH1): For each branch $B'$ of $G'$, if $B'$ leads from $N_1'$ to $N_2'$ then either $\gamma(B')$ is a node of $G$ and $\gamma(N_1') = \gamma(B') = \gamma(N_2')$, or else $\gamma(B')$ is a branch of $G$ leading from $\gamma(N_1')$ to $\gamma(N_2')$.

Thus, for any path $P'$ in $G'$, $\gamma$ determines a path $\gamma(P')$ in $G$; if $P' = N_0'B_1'N_1' \cdots N_p'$, then $\gamma(P') = N_0B_1N_1 \cdots N_q$, which is obtained from the sequence $\gamma(N_0')\gamma(B_1')\gamma(N_1') \cdots \gamma(N_p')$ by deleting $\gamma(B_i')\gamma(N_i')$ for any $i$ such that $\gamma(N_{i-1}') = \gamma(B_i') = \gamma(N_i')$.

(PH2): For every path $P$ of $G$ there is a path $P'$ of $G'$ such that $\gamma(P') = P$.

Let $N_0B_1N_1 \cdots B_pN_p$ be a path $P$. Then $P' = N_0'B_1'N_1' \cdots N_q'$, where $0 \le q \le p$, is a subpath of $P$ if, for some $i \le p - q$, $N_0' = N_i$ and for every $j$, $1 \le j \le q$, $N_j' = N_{i+j}$ and $B_j' = B_{i+j}$. Note that any node by itself is a path of zero length. Hence (PH2) trivially implies that $\gamma$ is onto the set of nodes and branches of $G$. If $P_1$ and $P_2$ are paths, and $P_1$ ends at the same node at which $P_2$ begins, then $P_1P_2$ is the path that results from joining the two thus: if $P_1$ is $N_0B_1N_1 \cdots N_p$ and $P_2$ is $N_pB_{p+1}N_{p+1} \cdots N_q$ then $P_1P_2$ is $N_0B_1N_1 \cdots N_q$. Thus $P'$ is a subpath of $P$ if and only if there exist paths $P_1'$ and $P_2'$ such that $P = P_1'P'P_2'$. ($P_1'$ or $P_2'$ may be a path of zero length.) If $P$ begins and ends at the same node then $P^2$ is $PP$, $P^3$ is $PPP$, etc.

Theorem 3 will state that if $G'$ is pathwise homomorphic to $G$ then the rank of $G'$ is greater than or equal to the rank of $G$. Theorems 1 and 2 are in effect lemmas for that result.

THEOREM 1. If $S$ is a section of $G$ and $G'$ is pathwise homomorphic to $G$ under $\gamma$, then there is a section $S'$ of $G'$ pathwise homomorphic to $S$ under $\gamma$. [A section is a maximal strongly connected component, as defined in McNaughton (1967).]

Proof.
Since $S$ is strongly connected, for any finite set $\mathcal{P}$ of paths of $S$ we can find a path $P_0$ such that every path of $\mathcal{P}$ is a subpath of $P_0$. From this it follows that there exists an infinite sequence $P_1, P_2, \ldots$ of paths of $S$ such that, for each $i$, all paths of $S$ of length $i$ and less are subpaths of $P_i$. Furthermore, we can suppose that each $P_i$ begins and ends at the same node $N$ of $S$. Let $n$ be the number of nodes of $G'$. Since $G'$ is pathwise homomorphic to $G$, for each $i$, there is a $P'$ of $G'$ such that $\gamma(P') = P_i^n$. Suppose that $P' = P_1'P_2' \cdots P_n'$, where, for each $k$, $\gamma(P_k') = P_i$; let $N_{k-1}'$ and $N_k'$ be the end nodes of $P_k'$, for each $k$. Since
$G'$ has only $n$ nodes, at least two of the nodes $N_0', \ldots, N_n'$ must be the same. Thus we can say that for each $P_i$ there exists an $x$, $1 \le x \le n$, such that there is a path $P''$ where $\gamma(P'') = P_i^x$ and where $P''$ begins and ends at the same node, from which it follows that $P''$ is in a section $S_i'$ of $G'$. Consider the infinite sequence of sections $S_1', S_2', \ldots$ thus obtained from the sequence $P_1, P_2, \ldots$. There are only finitely many sections of $G'$. Thus there must be a section $S'$ that occurs infinitely often in this infinite sequence. But this means that for every path $P$ of $S$, there is a path $P'$ of $S'$ such that $\gamma(P') = P$, since $P$ is a subpath of $P_i$ for every sufficiently large $i$. Thus (PH2) has been verified, and in order to complete the proof that $\gamma$ is a pathwise homomorphism from $S'$ to $S$, we must verify (PH1). But $\gamma$ is given to be a pathwise homomorphism from $G'$ to $G$; thus in order to verify (PH1) it suffices to show that for every node $N'$ of $S'$, $\gamma(N')$ is a node of $S$. But from what has already been proved we know that there exists a node $N_0'$ of $S'$ such that $\gamma(N_0')$ is a node of $S$. There must be a path $P_1'$ from $N'$ to $N_0'$ and a path $P_2'$ from $N_0'$ to $N'$. $\gamma(P_2'P_1')$ must be a path from $\gamma(N_0')$ to $\gamma(N_0')$ and hence must be entirely in $S$. Thus $\gamma(N')$, which is a node of this path, must also be in $S$, which concludes the proof of Theorem 1.

THEOREM 2. If $G'$ is pathwise homomorphic to $G$ under $\gamma$, $G_0$ is a subgraph of $G$, and $G_0'$ the subgraph of $G'$ determined by $\gamma^{-1}(A)$ where $A$ is the
set of nodes determining $G_0$, then $G_0'$ is pathwise homomorphic to $G_0$ under $\gamma$. ["Subgraph" is defined in the text of McNaughton (1967). A subgraph is determined by a set of nodes and has just the nodes of that set and all the branches touching just nodes of that set.]

The proof is immediate from the definition of "pathwise homomorphism."

THEOREM 3. If $G'$ is pathwise homomorphic to $G$ then the rank of $G'$ is
greater than or equal to the rank of $G$.

Proof. Suppose the theorem is false and let $s \ge 0$ be the smallest integer such that some $G'$ of rank $s$ is pathwise homomorphic to a graph $G$ of greater rank $r$. Now $s$ cannot be 0; for that would mean that $G'$ has no sections but $G$ does, violating Theorem 1. At least one section $S$ of $G$ is of rank $r$ and by Theorem 1 there is a section $S'$ of $G'$ pathwise homomorphic to it. The rank of $S'$ can be no more than $s$, since $G'$ has rank $s$, and can be no less than $s$ since $s$ is the smallest integer for which some graph with that rank is pathwise homomorphic to a graph of larger rank. Thus $S'$ is of rank $s$.
Since $S'$ is strongly connected and of rank $s$, there is a node $N'$ of $S'$ such that the subgraph resulting from $S'$ by the deletion of $N'$ is of rank $s - 1$. $\gamma(N')$ is in $S$ and if we take the subgraph resulting by deleting from $S'$ all the nodes of $\gamma^{-1}(\gamma(N')) \cap S'$, then the result is a graph $G_0'$ of rank no greater than $s - 1$ and (by Theorem 2) pathwise homomorphic under $\gamma$ to $G_0$, the graph resulting from $S$ by deleting $\gamma(N')$. Since $S$ is strongly connected and of rank $r$, the rank of $G_0$ (since it results by the deletion of just one node) is at least $r - 1 > s - 1$. Thus we have $G_0'$ of rank less than $s$ pathwise homomorphic to $G_0$ of greater rank, contradicting the original assumption that $s$ was the smallest rank of any graph pathwise homomorphic to a graph of greater rank, which concludes the proof of Theorem 3 by reductio ad absurdum.

A mapping $\mu$ will now be defined from the set of all nodes of all transition graphs $G'$ for $E$ onto the class of sets of elements of $M$, the syntactic monoid for $E$. $\mu$ will give rise to some fruitful pathwise homomorphisms. For a node $N$ of $G'$, a transition graph for $E$, $\mu(N)$ is the set of all elements $m$ of $M$ such that for some path $P$ from an initial node to $N$ in $G'$, $P$ spells out a word $W \in \phi^{-1}(m)$. For any set $A$, let $\mathrm{Card}(A)$ be the cardinality of $A$.

THEOREM 4. If $N_1$ and $N_2$ are nodes of the same section of $G'$, a transition graph for $E$, and if some path from $N_1$ to $N_2$ spells out $W$, then $\mathrm{Card}(\mu(N_1)) = \mathrm{Card}(\mu(N_2))$ and $\mu(N_2) = \mu(N_1)\phi(W)$.

Proof. Clearly $\mu(N_1)\phi(W) \subseteq \mu(N_2)$. But the cancellation law for groups implies that $\mathrm{Card}(\mu(N_1)) = \mathrm{Card}(\mu(N_1)\phi(W))$. Thus $\mathrm{Card}(\mu(N_1)) \le \mathrm{Card}(\mu(N_2))$. Now suppose a path from $N_2$ to $N_1$ spells out $W'$; the path exists because $N_1$ and $N_2$ are in the same section. Then $\mu(N_2)\phi(W') \subseteq \mu(N_1)$, and so $\mathrm{Card}(\mu(N_2)) \le \mathrm{Card}(\mu(N_1))$. Thus $\mathrm{Card}(\mu(N_1)) = \mathrm{Card}(\mu(N_2))$ and $\mu(N_2) = \mu(N_1)\phi(W)$.

A $\mu$-graph is any graph obtained in the following manner. An arbitrary subset $A_1$ of $M$ is selected. Then the class of sets $\{A_1, \ldots, A_p\}$ is determined as the smallest class containing the set $A_1$ and containing $A_im$, for every $m \in M$. The graph is constructed, a node $N_i$ corresponding to every set $A_i$. There is exactly one branch labeled with each letter $a$ of the alphabet from each node $N_i$, leading to $N_j$ where $A_i\phi(a) = A_j$. A node $N_i$ is initial if and only if $e \in A_i$ and is terminal if and only if every $m \in A_i$ is terminal. (Such a graph is a state graph if and only if there is one initial node.) The graph is useful if it has at least one terminal node; otherwise useless. It is not difficult to see that, for every node $N_i$ of a $\mu$-graph, $\mu(N_i) = A_i$, the set corresponding to $N_i$ in the construction of the
$\mu$-graph. [To show that $A_i \subseteq \mu(N_i)$, let $m \in A_i$. Let $W_0 \in \phi^{-1}(m^{-1})$ and suppose that the path from $N_i$ spelling out $W_0$ ends at $N_j$. Then $A_j = A_im^{-1}$, so $e \in A_j$, and so $N_j$ is an initial node. If $W \in \phi^{-1}(m)$, $\phi(W_0W) = e$ and the path beginning at $N_j$ spelling out $W$ must end at $N_i$. Thus $m \in \mu(N_i)$. To show that $\mu(N_i) \subseteq A_i$, let $m \in \mu(N_i)$. For some $W \in \phi^{-1}(m)$, there is a path from an initial node $N_j$ to $N_i$ spelling out $W$. Then, by construction of the $\mu$-graph, $A_i = A_jm$ with $e \in A_j$, and so $m \in A_i$.] Furthermore, every word accepted by a $\mu$-graph is in the event $E$, although the converse is not true in general.

THEOREM 5. $W$ is accepted by a $\mu$-graph $G$ if and only if $\phi(W) \in \mu(N)$, for some terminal node $N$ of $G$.

If $W$ is accepted by virtue of a path spelling out $W$ from an initial node to a terminal node $N$, then $\phi(W) \in \mu(N)$ by definition of $\mu$. On the other hand, suppose $\phi(W) \in \mu(N)$, for some terminal node $N$. Then for some $W'$ such that $\phi(W') = \phi(W)$, there is a path spelling out $W'$ from an initial node $N_0$ to $N$, again by the definition of $\mu$. Thus $\mu(N_0)\phi(W') = \mu(N)$. But since $\phi(W) = \phi(W')$, $\mu(N_0)\phi(W) = \mu(N)$, and the unique path beginning at $N_0$ spelling out $W$ must end at $N$. Hence $W$ is accepted.

Theorem 5 tells us that every useful $\mu$-graph accepts an event $\phi^{-1}(T_0)$ such that $T_0$ is a nonempty subset of the set of terminal elements of $M$. The objective is now to show that a graph for $E$ of minimal rank can be formed as the union graph of some set of useful $\mu$-graphs. (A union graph of several graphs is simply the result of considering these graphs together as a single graph. No new branches or nodes are added and a node is initial, or terminal, if and only if it was so before. Thus the union graph of two graphs has two portions, such that there is no path from either portion to the other.)

THEOREM 6. If $G'$ is any graph for $E$ and $m$ is a terminal element of $M$ then there is a section of $G'$ pathwise homomorphic to a $\mu$-graph accepting all words of $\phi^{-1}(m)$.

LEMMA.
For each $i$ there exists a word $W_i$ such that $\phi(W_i) = e$, and, for every path $P$ in every $\mu$-graph $G$ of length $i$ or less, the path beginning at any node of $G$ spelling out $W_i$ contains $P$ as a subpath.

Proof. There are finitely many nodes in each of the finitely many $\mu$-graphs. Also there are finitely many paths of length $i$ or less in each $\mu$-graph. A word can be written from left to right, covering each of these finitely many eventualities in turn. For example, suppose we have written an initial portion $U$ of the word and we wish to add something
to the right of $U$ to make sure that the path beginning at $N_0$ in $G$ spelling out $W_i$ (when it is completed) has a path $P$ as a subpath. Suppose $P$ goes from $N_2$ to $N_3$, spelling out $V_1$, and suppose the path from $N_0$ spelling out $U$ (which is unique) ends at $N_1$. Then there is a path $P_0$, spelling out some word $V_0$, from $N_1$ to $N_2$ since $G$ is strongly connected. In that case simply add $V_0V_1$ to $U$, obtaining $UV_0V_1$. After all these eventualities have been taken care of a word $U_i$ has been obtained. Simply tack on a word $V$ so that $\phi(U_i)\phi(V) = \phi(U_iV) = e$, which is possible since $M$ is a group. $W_i = U_iV$ then is a word satisfying the Lemma, whose proof is concluded.

Now let $U_m$ be a word such that $\phi(U_m) = m$. Thus, for every $x$, $\phi(W_i^xU_m) = m$, and $W_i^xU_m$ must be accepted by $G'$. Let $n$ be the number of nodes of $G'$. Then let $P'$ be a path in $G'$ from an initial node to a terminal node spelling out $W_i^nU_m$. Let $P' = P_1'P_2' \cdots P_n'P_{n+1}'$ where each $P_h'$, $1 \le h \le n$, spells out $W_i$ and $P_{n+1}'$ spells out $U_m$. For each $h$, assume that $P_h'$ goes from $N_{h-1}$ to $N_h$. Now, of the nodes $N_0, \ldots, N_n$, at least two must be the same, say, $N_h$ and $N_j$, $h < j$. Thus we have a path from $N_h$ to $N_h$ spelling out $W_i^{j-h}$. Let $S_i$ be the section of $G'$ containing this path. In this way a section $S_i$ of $G'$ is determined for each $W_i$ defined above. In the infinite sequence $S_1, S_2, \ldots$, at least one section $S$ occurs infinitely often since there are only finitely many sections. Furthermore for any $i$ and $j$, if $i < j$, then $S_j$ would work as well in place of $S_i$. Thus we can replace every section in $S_1, S_2, \ldots$ by $S$. We then have that for every $W_i$ there is a node $N$ of $S$ and an $x \ge 1$ such that some path from $N$ to $N$ in $S$ spells out $W_i^x$.

We can now define a mapping $\gamma$ of the nodes and branches of $S$ into the nodes and branches of a $\mu$-graph $G$ which will turn out to be a pathwise homomorphism. For any node $N'$ of $S$ take $G$ to be that $\mu$-graph having a node $N$ such that $\mu(N') = \mu(N)$.
Then by Theorem 4, for every node $N'$ of $S$, there is a node $N$ of $G$ such that $\mu(N') = \mu(N)$: take $\gamma(N')$ to be this node of $G$. For any branch $B$ in $S$ leading from $N_1$ to $N_2$ there are two possibilities.

Case I. If $B$ is labeled $\lambda$, then $\mu(N_1) = \mu(N_2)$ by definition of $\mu$ and Theorem 4, and so $\gamma(N_1) = \gamma(N_2)$: take $\gamma(B) = \gamma(N_1) = \gamma(N_2)$.

Case II. If $B$ is labeled $a \in \Sigma$ then $\mu(N_1)\phi(a) = \mu(N_2)$, by Theorem 4, and so there must be a branch $B_0$ in $G$ labeled $a$ from $\gamma(N_1)$ to $\gamma(N_2)$: take $\gamma(B) = B_0$.

Note that (PH1) for $\gamma$ follows immediately by remarks made in the above paragraph. To verify (PH2) consider any path $P = N_0B_1N_1 \cdots N_p$
in $G$. $P$ has length $p$, and there must be a path $P_1$ in $S$ spelling out $W_p^x$, for some $x \ge 1$, from some node $N'$ to $N'$. Then, by definition of $W_p$, the path in $G$ beginning at $\gamma(N')$ spelling out $W_p$ must have every path of length $p$ of $G$ as a subpath; thus it must have $P$ as a subpath. Thus, there must be a subpath $P'$ of $P_1$ such that $\gamma(P') = P$, which concludes the verification of (PH2) and the proof of Theorem 6.

A complete set of $\mu$-graphs is a set having, for each terminal $m$ in $M$, a $G$ accepting all words of $\phi^{-1}(m)$. A union graph of any complete set of $\mu$-graphs is a graph for $E$. But from Theorem 6 it can be seen that, for any graph $G'$ for $E$, there is a complete set $\{G_1, \ldots, G_u\}$ such that for each $G_i$, $1 \le i \le u$, there is a section $S_i$ of $G'$ such that $S_i$ is pathwise homomorphic to $G_i$. Thus, by Theorem 3, the rank of $G'$ is no less than the rank of the union graph for this set, since $G_1, \ldots, G_u$ are all the sections of the union graph. Theorem 7 follows from these considerations.

THEOREM 7. The loop complexity of a pure-group event equals the rank of a graph which is the union graph of a complete set of $\mu$-graphs.

Before stating the main theorem in which an algorithm is presented for determining the complexity of a given pure-group event, note first that the syntactic monoid of any regular event can be constructed from a state graph for the event (or from any other representation, such as a regular expression, which can be converted into a state graph), as is shown by Rabin and Scott (1959). Then all of the finitely many $\mu$-graphs are constructible. Next note that Theorem 5 assures us that it is an easy matter to determine whether a given set of $\mu$-graphs is complete.

THEOREM 8 (Main Theorem). An algorithm to determine the loop complexity proceeds by constructing all useful $\mu$-graphs and computing their rank.
Next all the finitely many complete sets of useful $\mu$-graphs are examined; a rank is associated with each set, namely the largest rank of all the $\mu$-graphs in that set. The smallest such rank associated with any complete set is the loop complexity of the event.

The next theorem contains a subsidiary result of some interest. In the proof use will be made of the well-known fact that in the reduced state graph of a pure-group event every word is a permutation of the set of nodes. Thus, for any word $W$, if there is a path from $N_1$ to $N_3$ and a path from $N_2$ to $N_4$ both spelling out $W$, then $N_1 = N_2$ if and only if $N_3 = N_4$.
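The permutation property just stated can be checked mechanically on a small state graph. The following is a minimal sketch and not part of the original argument; the names `delta`, `letters_permute_states`, and `run` are hypothetical, and the example graph (a counter modulo 4 over the letters 0 and 1) is chosen only for illustration.

```python
def letters_permute_states(delta, states, alphabet):
    """Return True if every letter maps the state set one-to-one onto itself."""
    return all({delta[(s, a)] for s in states} == set(states) for a in alphabet)


def run(delta, state, word):
    """State reached from `state` by the unique path spelling out `word`."""
    for a in word:
        state = delta[(state, a)]
    return state


# Hypothetical example: residues mod 4; letter "0" adds 1, letter "1" subtracts 1.
states = range(4)
alphabet = "01"
delta = {(s, "0"): (s + 1) % 4 for s in states}
delta.update({(s, "1"): (s - 1) % 4 for s in states})

assert letters_permute_states(delta, states, alphabet)

# Since each letter is a permutation, so is each word: distinct start nodes
# are carried to distinct end nodes.
word = "00101"
ends = [run(delta, s, word) for s in states]
assert sorted(ends) == list(states)
```

If some letter failed the check, two distinct nodes would be carried to the same node by that letter, and the "if and only if" above would no longer hold.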
THEOREM 9. If the reduced state graph $G$ of a pure-group event $E$ has exactly one terminal state, then the loop complexity of $E$ equals the rank of $G$.

Theorem 9 is proved by showing that any other all-admissible transition graph $G'$ for $E$ is pathwise homomorphic to $G$. ("All-admissible" is defined in Section 3 of McNaughton (1967); it means that every node of $G'$ is part of a path from an initial to a terminal node.)

LEMMA. If there are two paths each from some initial node to a common node $N'$ in $G'$, one spelling out $W_1$ and the other spelling out $W_2$, and if there are paths from the initial node to $N_1$ and $N_2$ in $G$ spelling out $W_1$ and $W_2$, respectively, then $N_1 = N_2$.

Proof. Suppose the hypothesis of the Lemma. Then since $G'$ is all-admissible there is a path from $N'$ to a terminal node spelling out some word $W$. Thus $W_1W \in E$ and $W_2W \in E$. It follows that there are paths from $N_1$ and from $N_2$, respectively, to the one terminal node of $G$ spelling out $W$; it follows that $N_1 = N_2$, since $W$ must permute the set of nodes of $G$, by a remark made above. This proves the Lemma.

The Lemma permits a definition of $\gamma$, a pathwise homomorphism. $\gamma(N')$, for $N'$ any node of $G'$, is the one node $N$ of $G$ such that there exists a word $W$ spelled out by a path from an initial node to $N'$ in $G'$ and by a path from the initial node to $N$ in $G$. For any branch $B'$ of $G'$ from $N_1'$ to $N_2'$: if $B'$ is labeled $\lambda$, since $\gamma(N_1') = \gamma(N_2')$, take $\gamma(B') = \gamma(N_1')$; but if $B'$ is labeled $a \in \Sigma$, since the branch $B$ labeled $a$ from $\gamma(N_1')$ must go to $\gamma(N_2')$, take $\gamma(B') = B$. Clearly (PH1) is satisfied for $\gamma$. (PH2) follows from the fact that every path $P_0$ of $G$ must be a subpath of a path $P$ from the initial node to the terminal node spelling out a word $W \in E$. There must be a path $P'$ in $G'$ from an initial to a terminal node spelling out $W$, and $\gamma(P') = P$, by the Lemma and by the construction of $\gamma$. Hence for some $P_0'$, a subpath of $P'$, $\gamma(P_0') = P_0$, which concludes the proof of Theorem 9.
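As an illustration of how Theorem 9 might be applied by machine, the sketch below computes the rank of a small graph recursively. It rests on assumptions that should be checked against McNaughton (1967): the rank is taken to be 0 when there is no section, the largest rank of a section otherwise, and, for a strongly connected graph, one more than the smallest rank obtainable by deleting a single node (as suggested by the proof of Theorem 3); a section is taken to be a strongly connected component containing at least one branch. All function names are hypothetical, and the recursion is exponential, so it suits only small graphs. The example graph is the counter modulo $2^k$ over the letters 0 and 1, in which 0 adds one and 1 subtracts one; residue 0 is both the initial and the single terminal state.

```python
def reachable(start, nodes, edges):
    """Nodes reachable from `start` without leaving the set `nodes`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sections(nodes, edges):
    """Strongly connected components that contain at least one branch."""
    reach = {u: reachable(u, nodes, edges) for u in nodes}
    remaining, result = set(nodes), []
    while remaining:
        u = remaining.pop()
        comp = frozenset(v for v in reach[u] if u in reach[v])
        remaining -= comp
        if len(comp) > 1 or u in edges.get(u, ()):
            result.append(comp)
    return result


def rank(nodes, edges):
    """Rank of the subgraph induced on `nodes` (exponential; small graphs only)."""
    nodes = frozenset(nodes)
    best = 0
    for comp in sections(nodes, edges):
        best = max(best, 1 + min(rank(comp - {v}, edges) for v in comp))
    return best


def counter_graph(k):
    """Successor sets of the mod-2**k counter: "0" adds 1, "1" subtracts 1."""
    n = 2 ** k
    return {i: {(i + 1) % n, (i - 1) % n} for i in range(n)}


for k in (1, 2, 3):
    graph = counter_graph(k)
    print(k, rank(graph, graph))  # prints "1 1", "2 2", "3 3"
```

Every letter permutes the states of this graph and there is exactly one terminal state, so under Theorem 9 the computed rank would be the loop complexity of the corresponding event.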
(Theorem 9 can be generalized slightly so as to include some events that are not pure-group events: thus if $E$ has an all-admissible $G$ with one initial node and one terminal node, if for every node $N$ and for every $a \in \Sigma$ there is at most one branch leaving $N$ labeled $a$ and at most one branch entering $N$ labeled $a$, and if no branches are labeled $\lambda$, then the rank of $G$ is the loop complexity of $E$. The entire proof of Theorem 9 goes over with obvious modifications. The case actually covered in Theorem 9 is the special case in which both occurrences of the phrase "at most one" are replaced by the phrase "exactly one.")

Theorem 9 is useful, for it enables us to ascertain the loop complexity of some events without even constructing the syntactic monoid. (We can verify by inspection of the reduced state graph that the syntactic monoid is a group by noting that every input letter is a permutation of
the set of nodes.) For example, Theorem 9 could be used to verify the result of Dejean and Schützenberger (1966) that the loop complexity of the event $E_k$ is $k$, where $E_k$ is the event over $\{0, 1\}$ consisting of the set of all words with the property that the number of 0's minus the number of 1's is an integral multiple of $2^k$.

However, the general case is surely not as simple as this, and the complication of the algorithm described in the proof of Theorem 8 would appear to be necessary. Consider the event given by the reduced state graph of Fig. 1 (which in this case happens also to be a graph for the syntactic monoid of the event). The rank of this graph is 3, whereas the event has loop complexity 2, since it can be represented by the union of the two $\mu$-graphs of Fig. 2 (rank 2) and Fig. 3 (rank 1).
[FIG. 1, FIG. 2, FIG. 3: graph drawings, not reproduced here.]
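The procedure of Theorem 8 can be sketched in code for small cases. What follows is a hedged illustration and not the author's implementation. Its assumptions are these: the group is obtained as the set of permutations of the states generated by the letters of a reduced state graph, standing in for the syntactic monoid; only nonempty subsets of the terminal elements are tried as the starting set $A_1$, since every useful $\mu$-graph has a terminal node and is generated by the set attached to that node; and, because the rank of a complete set is its largest member rank, the sketch records for each terminal element the least rank of a $\mu$-graph accepting it and returns the largest of these, which gives the same value as examining every complete set. The rank routine uses the same assumed recursive definition (one more than the best single-node deletion for each section), and all names are hypothetical. The graphs of Figs. 1 to 3 are not available here, so the example is the modulo-4 counter.

```python
from itertools import combinations


def compose(p, q):
    """Permutation that applies p first and then q (tuples: index -> image)."""
    return tuple(q[i] for i in p)


def generate_group(letter_perms):
    """All permutations obtainable as products of the letter permutations."""
    identity = tuple(range(len(next(iter(letter_perms.values())))))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in letter_perms.values():
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def reachable(start, nodes, edges):
    """Nodes reachable from `start` without leaving the set `nodes`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sections(nodes, edges):
    """Strongly connected components that contain at least one branch."""
    reach = {u: reachable(u, nodes, edges) for u in nodes}
    remaining, result = set(nodes), []
    while remaining:
        u = remaining.pop()
        comp = frozenset(v for v in reach[u] if u in reach[v])
        remaining -= comp
        if len(comp) > 1 or u in edges.get(u, ()):
            result.append(comp)
    return result


def rank(nodes, edges):
    """Rank of the subgraph induced on `nodes` (exponential; small graphs only)."""
    nodes = frozenset(nodes)
    best = 0
    for comp in sections(nodes, edges):
        best = max(best, 1 + min(rank(comp - {v}, edges) for v in comp))
    return best


def mu_graph(seed, letter_perms):
    """Successor sets of the mu-graph whose class of sets is generated by `seed`."""
    edges, frontier = {}, [seed]
    while frontier:
        a = frontier.pop()
        if a in edges:
            continue
        edges[a] = {frozenset(compose(m, h) for m in a)
                    for h in letter_perms.values()}
        frontier.extend(edges[a])
    return edges


def loop_complexity(letter_perms, start, finals):
    """Largest, over terminal elements, of the least rank of a mu-graph accepting it."""
    group = generate_group(letter_perms)
    terminal = frozenset(m for m in group if m[start] in finals)
    best = {}  # terminal element -> least rank found so far
    for size in range(1, len(terminal) + 1):
        for seed in combinations(sorted(terminal), size):
            edges = mu_graph(frozenset(seed), letter_perms)
            r = rank(edges, edges)
            for node in edges:
                if node <= terminal:  # terminal node: all its elements are terminal
                    for m in node:
                        best[m] = min(r, best.get(m, r))
    return max(best.values(), default=0)


# Hypothetical example: mod-4 counter ("0" adds 1, "1" subtracts 1),
# initial state 0 and single terminal state 0.
letter_perms = {"0": (1, 2, 3, 0), "1": (3, 0, 1, 2)}
print(loop_complexity(letter_perms, start=0, finals={0}))  # prints 2
```

The enumeration is exponential in the number of terminal elements, so this is a demonstration of the bookkeeping rather than a practical procedure.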
It was proved (Dejean, 1967) that all pure-group events of loop complexity 1 over an alphabet $\Sigma$ are of the form $(\Sigma^k)^*(\Sigma^{i_1} \cup \Sigma^{i_2} \cup \cdots \cup \Sigma^{i_q})$, where for each $i_x$, $0 \le i_x \le k - 1$. Verifying this result by reference to graphs is possible, although it is no improvement. Noting that in a $\mu$-graph of a pure-group event every word serves to permute the set of nodes, the key step is to show that every such graph of rank 1 consists of a set of nodes $N_0, N_1, \ldots, N_{p-1}$ where, for all $a \in \Sigma$, the branch from $N_i$ labeled $a$ goes to $N_{i+1}$ (or to $N_0$ if $i = p - 1$). The proof of this step, and the verification by means of it of Dejean's result are left as problems of some difficulty for the interested reader with a working knowledge of the material by McNaughton (1967).

RECEIVED: March 1, 1967

REFERENCES

McNAUGHTON, R. (1967), The loop complexity of regular events. In "Proceedings of the 1965 Symposium on Logic, Computability and Automata" (Frank B. Cannonito, ed.). Thompson Book Company, Washington.
RABIN, M. AND SCOTT, D. (1959), Finite automata and their decision problems. IBM J. Res. Develop. 3, 114-125.
DEJEAN, F. AND SCHÜTZENBERGER, M. P. (1966), On a question of Eggan. Inform. Control 9, 23-25.
DEJEAN, F. (1967), Une propriété de certaines parties d'un monoïde libre. Inform. Control 10, 434-444.