On the boundary of regular languages


Theoretical Computer Science 578 (2015) 42–57

Contents lists available at ScienceDirect

Theoretical Computer Science www.elsevier.com/locate/tcs

On the boundary of regular languages ✩

Jozef Jirásek a,1, Galina Jirásková b,∗,2

a Institute of Computer Science, Faculty of Science, P.J. Šafárik University, Jesenná 5, 040 01 Košice, Slovakia
b Mathematical Institute, Slovak Academy of Sciences, Grešákova 6, 040 01 Košice, Slovakia

Article info

Article history: Received 28 October 2013; received in revised form 25 June 2014; accepted 14 January 2015; available online 20 January 2015.
Keywords: Regular languages; Boundary; Finite automata; State complexity

Abstract

We prove that the tight bound on the state complexity of the boundary of regular languages, defined as $\mathrm{bd}(L) = L^* \cap (\overline{L})^*$, is $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$. Our witness languages are described over a five-letter alphabet. Next, we show that this bound cannot be met by any quaternary language if $n \ge 5$. However, the state complexity of boundary in the quaternary case is smaller by just one. Finally, we prove that the state complexity of boundary in the binary and ternary cases is $\Theta(4^n)$.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Kuratowski's famous "14-theorem" states that, in a topological space, at most 14 sets can be produced by applying the operations of closure and complement to a given set [2,5]. In analogy with this theorem, Brzozowski et al. [1] proved that only finitely many distinct languages arise from the operations of Kleene (or positive) closure and complement performed in any order and any number of times. Every such language can be expressed, up to inclusion of the empty string, as one of the following five languages or their complements:
$$L, \quad L^*, \quad (\overline{L})^*, \quad (\overline{L^*})^*, \quad \bigl(\overline{(\overline{L})^*}\bigr)^*,$$
where $\overline{L}$ and $L^*$ denote the complement and the Kleene closure of $L$, respectively. If the state complexity of a regular language $L$, that is, the number of states of the minimal deterministic finite automaton for $L$, is $n$, then the state complexity of $\overline{L}$ is also $n$, and the state complexity of $L^*$ and of $(\overline{L})^*$ is at most $3/4 \cdot 2^n$ [6,13]. The state complexity of $(\overline{L^*})^*$ could potentially be double-exponential [9]; however, as shown in [3], it is only $2^{\Theta(n \log n)}$.

Brzozowski, Grant, and Shallit [1] also studied the concepts of "open" and "closed" sets. A language $L$ is said to be Kleene-closed if $L = L^*$, where $L^*$ is the Kleene closure of $L$. A language is Kleene-open if its complement is Kleene-closed. The same notions can be defined for positive closure. These are natural analogues of the concepts with the same names from point-set topology, and in [1] the authors found natural analogues of many classical theorems. The boundary of a language is defined as $\mathrm{bd}(L) = L^* \cap (\overline{L})^*$ for Kleene closure, and as $L^+ \cap (\overline{L})^+$ for positive closure [1,9,10].

In this paper, we study the state complexity of the boundary of regular languages in the case of Kleene closure. To simplify the exposition, we write everything in exponent notation, using $c$ to represent complement; thus $L^{c*}$ stands for $(\overline{L})^*$, and so $\mathrm{bd}(L) = L^* \cap L^{c*}$. We show that if a language $L$ over an alphabet $\Sigma$ is accepted by an $n$-state deterministic finite automaton (DFA), then the boundary $\mathrm{bd}(L)$ is accepted by a DFA with at most $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ states. We also show that this bound is tight when the alphabet $\Sigma$ has at least five symbols. Next, we show that if $n \ge 5$, then this bound cannot be met by any language over a four-letter alphabet, and that the tight bound in the quaternary case is $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 1$. Finally, we prove that the state complexity of boundary in the binary and ternary cases is $\Theta(4^n)$. We also study the case when, in a DFA for a language $L$, only the initial state is final. The upper bound for the boundary of $L$ in this case is $(n+1) \cdot 2^{n-2} + 1$ (see Lemma 4), and we prove that this bound can be met by a binary language.

✩ This work was presented at the CIAA 2013 conference, held in Halifax, Canada, on July 16–19, 2013, and its extended abstract appeared in the conference proceedings [4].
∗ Corresponding author. E-mail addresses: [email protected] (J. Jirásek), [email protected] (G. Jirásková).
1 Supported by VEGA grant 1/0142/15.
2 Supported by VEGA grant 2/0084/15.

http://dx.doi.org/10.1016/j.tcs.2015.01.022
0304-3975/© 2015 Elsevier B.V. All rights reserved.

2. Preliminaries

In this section, we recall some basic definitions. For details and all unexplained notions, the reader may refer to [8,11,12]. For integers $i$ and $j$ with $i \le j$, we denote by $[i, j]$ the set of integers $\{k \mid i \le k \le j\}$. The cardinality of a finite set $A$ is denoted by $|A|$, and its power set by $2^A$. Let $\Sigma$ be a finite non-empty alphabet. Then $\Sigma^*$ denotes the set of all strings over the alphabet $\Sigma$, including the empty string $\varepsilon$. A language over $\Sigma$ is any subset of $\Sigma^*$. Let $K$ and $L$ be languages over $\Sigma$. Then $L^c = \Sigma^* \setminus L$, $K \cap L = \{w \in \Sigma^* \mid w \in K \text{ and } w \in L\}$, $KL = \{uv \mid u \in K \text{ and } v \in L\}$, and $L^* = \bigcup_{i \ge 0} L^i$, where $L^0 = \{\varepsilon\}$ and $L^{i+1} = L^i L$. The boundary of a language $L$ is the set $\mathrm{bd}(L) = L^* \cap L^{c*}$, where $L^{c*}$ denotes $(L^c)^*$.
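The definition of the boundary can be illustrated by a brute-force membership test. The following Python sketch is our own illustration, not part of the paper: it assumes that $L$ is given only as a membership predicate, and the names `in_star` and `in_boundary` are hypothetical.

```python
from typing import Callable


def in_star(w: str, in_lang: Callable[[str], bool]) -> bool:
    """Decide whether w belongs to L* by dynamic programming over prefixes.

    reach[j] is True when the prefix w[:j] can be split into non-empty
    factors that all belong to L; reach[0] is True because ε is in L*.
    """
    reach = [True] + [False] * len(w)
    for j in range(1, len(w) + 1):
        reach[j] = any(reach[i] and in_lang(w[i:j]) for i in range(j))
    return reach[len(w)]


def in_boundary(w: str, in_lang: Callable[[str], bool]) -> bool:
    """Decide whether w belongs to bd(L) = L* ∩ (L^c)*."""
    return in_star(w, in_lang) and in_star(w, lambda x: not in_lang(x))


# Example: L = {"a"} over {a, b}.  Then L* = a*, while "a" is not in (L^c)*,
# so bd(L) consists of the strings in a* other than "a".
is_a = lambda x: x == "a"
print([w for w in ["", "a", "aa", "aaa", "ab", "b"] if in_boundary(w, is_a)])
# prints ['', 'aa', 'aaa']
```

The test calls the predicate at most $O(|w|^2)$ times per closure; it only illustrates the definition and is unrelated to the automaton construction of Section 3.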
A nondeterministic finite automaton (NFA) is a quintuple $A = (Q, \Sigma, \cdot, s, F)$, where $Q$ is a finite non-empty set of states, $\Sigma$ is a finite alphabet, $\cdot : Q \times \Sigma \to 2^Q$ is the transition function, which is extended to the domain $2^Q \times \Sigma^*$ in the natural way, $s \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. The language accepted by $A$ is the set $L(A) = \{w \in \Sigma^* \mid s \cdot w \cap F \ne \emptyset\}$. An NFA $A = (Q, \Sigma, \cdot, s, F)$ is deterministic (and complete), a DFA, if $|q \cdot a| = 1$ for each $q$ in $Q$ and each $a$ in $\Sigma$. In such a case, we write $q \cdot a = q'$ instead of $q \cdot a = \{q'\}$. A state $q$ of the DFA $A$ is reachable if there exists a string $w$ in $\Sigma^*$ such that $s \cdot w = q$. Two states $p$ and $q$ are distinguishable if there exists a string $w$ such that exactly one of the states $p \cdot w$ and $q \cdot w$ is final. Two states are equivalent if they are not distinguishable. The state complexity of a regular language $L$, $\mathrm{sc}(L)$, is the smallest number of states in any DFA recognizing $L$. It is well known that a DFA is minimal (with respect to the number of states) if all its states are reachable and no two distinct states are equivalent.

Every symbol $a$ of the DFA $A$ may be viewed as a transformation on the set $Q$, that is, as a mapping from $Q$ to $Q$. A symbol $a$ is called a permutation symbol if $a$ performs a permutation on $Q$. The symmetric group is the group of all permutations of the set $\{0, 1, \ldots, n-1\}$. It is generated by the circular shift that maps $i$ to $(i+1) \bmod n$ together with the swap that interchanges $0$ and $1$ and maps every other $i$ to itself.

Every NFA $A = (Q, \Sigma, \cdot, s, F)$ can be converted to an equivalent DFA $A' = (2^Q, \Sigma, \cdot', \{s\}, F')$, where $R \cdot' a = R \cdot a$ and $F' = \{R \in 2^Q \mid R \cap F \ne \emptyset\}$, by the subset construction [7]. The DFA $A'$ is called the subset automaton of the NFA $A$. The subset automaton need not be minimal, since some of its states may be unreachable or equivalent. Let $A = (Q_A, \Sigma, \cdot_A, s_A, F_A)$ and $B = (Q_B, \Sigma, \cdot_B, s_B, F_B)$ be two DFAs.
Then $L(A) \cap L(B)$ is recognized by the product automaton $A \times B = (Q_A \times Q_B, \Sigma, \cdot, (s_A, s_B), F_A \times F_B)$, where $(p, q) \cdot a = (p \cdot_A a, q \cdot_B a)$.

3. Upper bound: construction of DFAs for boundary

The boundary of a regular language $L$ is defined by $\mathrm{bd}(L) = L^* \cap L^{c*}$, where $L^{c*} = (L^c)^*$. Since the state complexity of star is $3/4 \cdot 2^n$ [6,13], the trivial upper bound on the state complexity of boundary is $(3/4 \cdot 2^n)^2 = 9/16 \cdot 4^n$. The aim of this section is to obtain the slightly better upper bound $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$.

We start with the construction of a DFA for $L^* \cap L^{c*}$. Without loss of generality, we may assume that the empty string is in $L$: since $\mathrm{bd}(L^c) = L^{c*} \cap L^* = \mathrm{bd}(L)$ and $\mathrm{sc}(L^c) = \mathrm{sc}(L)$, we can replace $L$ by its complement if necessary. Let $L$ be accepted by a DFA $A = (Q, \Sigma, \cdot, s, F)$, where $|Q| = n$, $s \in F$, $|F| = k$, and $\cdot$ is the transition function extended to the domain $2^Q \times \Sigma^*$ in the natural way. Let $F^c = Q \setminus F$.

Construct an NFA $N$ for the language $L^*$ from the DFA $A$ as follows: for each state $q$ and each symbol $a$, if $q \cdot a \in F$, then add the transition from $q$ to $s$ on $a$. Next, construct an NFA $N'$ for the language $L^{c*}$ from the DFA $A$ as follows. First, interchange the sets of final and non-final states to get a DFA for $L^c$. Then add a transition from a state $q$ to the state $s$ on a symbol $a$ whenever $q \cdot a \in F^c$. Finally, add a new initial and final state $q_0$ that goes on each symbol $a$ to $\{s \cdot a\}$ if $s \cdot a \notin F^c$, and to $\{s, s \cdot a\}$ if $s \cdot a \in F^c$. Fig. 1 illustrates the construction of the NFAs $N$ and $N'$. Denote the transition functions of the NFAs $N$ and $N'$, extended to the domain $2^Q \times \Sigma^*$ in the usual way, by $\circ$ and $\bullet$, respectively.

Let $D$ and $D'$ be the subset automata of the NFAs $N$ and $N'$, respectively. Then the language $L^* \cap L^{c*}$ is accepted by the product automaton $D \times D'$, whose states are pairs of subsets of $Q$. The initial state of the product automaton is the pair $(\{s\}, \{q_0\})$, and a pair $(S, T)$ is final if $S$ is a final state in $D$ and $T$ is a final state in $D'$, that is, if $S \cap F \ne \emptyset$ and $T \cap F^c \ne \emptyset$ (the initial pair is final as well, since $q_0$ is final in $N'$).
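Each pair $(S, T)$ goes to the pair $(S \circ a, T \bullet a)$ in the product automaton. Notice that, for subsets $S$ and $T$ of $Q$,
$$S \circ a = \begin{cases} S \cdot a, & \text{if } (S \cdot a) \cap F = \emptyset,\\ S \cdot a \cup \{s\}, & \text{otherwise,} \end{cases} \qquad T \bullet a = \begin{cases} T \cdot a, & \text{if } (T \cdot a) \cap F^c = \emptyset,\\ T \cdot a \cup \{s\}, & \text{otherwise.} \end{cases}$$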


Fig. 1. A DFA $A$ of a language $L$ and the NFAs $N$ and $N'$ for the languages $L^*$ and $L^{c*}$, respectively.

It follows that in the DFA $D$, a set $S$ can go on a symbol $a$ to a set of larger cardinality (larger by at most one) only if the set $S \cdot a$ contains a final state of $A$. Next, if a reachable set $S$ of the DFA $D$ contains a state in $F$, then $S$ must also contain the initial state $s$ of the DFA $A$, since the NFA $N$ has a transition from a state $q$ to the initial state $s$ on a symbol $a$ whenever $q \cdot a \in F$. Similarly, in the DFA $D'$, a set $T$ can go on $a$ to a set of larger cardinality (again larger by at most one) only if $T \cdot a$ contains a state in $F^c$, and each set that is reachable in $D'$ and contains a state in $F^c$ must also contain the initial state $s$, since the NFA $N'$ has a transition from a state $q$ to $s$ on a symbol $a$ whenever $q \cdot a \in F^c$.

The following observation shows that for each pair that is reachable in the product automaton, except for the initial pair $(\{s\}, \{q_0\})$, the left and right components have a non-empty intersection, and the initial state $s$ occurs in at least one of the two components.

Proposition 1. Let $(S, T)$ be a non-initial pair that is reachable in the product automaton for $L^* \cap L^{c*}$. Then $S$ and $T$ have a non-empty intersection, and at least one of $S$ and $T$ contains the initial state $s$.

Proof. Let $(S, T)$ be a non-initial reachable pair. Then there exist a symbol $a$ and a string $w$ such that $(S, T) = (\{s\} \circ aw, \{q_0\} \bullet aw)$. Since the state $s \cdot a$ is in the intersection of $\{s\} \circ a$ and $\{q_0\} \bullet a$, and both $N$ and $N'$ contain all the transitions of $A$, the state $s \cdot aw$ must be in the intersection of $S$ and $T$. To show the second part of the proposition, let $q \in S \cap T$. If $q \in F$, then $s \in S$; otherwise $q \in F^c$, and so $s \in T$. This concludes the proof. □

We use the above observation to show that in the product automaton, we cannot reach any pair $(S, T)$ such that $S \cap T = \{s\}$, $T$ contains $k - 1$ non-initial states, and every other non-initial state is in $S$. Recall that $n = |Q|$, $k = |F|$, and $s \in F$. In what follows, we assume that $n \ge 3$ and $k \ge 2$.

Lemma 1.
Let $\mathcal{N} = \{(S, T) \in 2^Q \times 2^Q \mid S \cap T = \{s\},\ |T| = k,\ |S| = n - k + 1\}$. No pair in $\mathcal{N}$ is reachable in the product automaton for $L^* \cap L^{c*}$.

Proof. Assume that a pair $(S, T)$ in $\mathcal{N}$ is reached from a pair $(P, R)$ by a symbol $a$ in the product automaton, that is, $(S, T) = (P \circ a, R \bullet a)$. Let us show that the pair $(P, R)$ must also be in $\mathcal{N}$. The pair $(P, R)$ is different from the initial state, since the initial state $(\{s\}, \{q_0\})$ goes by $a$ either to $(\{s, s \cdot a\}, \{s \cdot a\})$ or to $(\{s \cdot a\}, \{s, s \cdot a\})$, and these pairs are not in $\mathcal{N}$: either they do not have $s$ in the intersection of their components, or they are equal to $(\{s\}, \{s\})$, whose right component has only one state (recall that we assume $n \ge 3$ and $k \ge 2$). Hence $(P, R)$ is a pair in $2^Q \times 2^Q$, and $P \cap R \ne \emptyset$ by Proposition 1.

Let $S' = S \setminus \{s\}$ and $T' = T \setminus \{s\}$. Then the sets $S'$ and $T'$ are disjoint, do not contain $s$, $|T'| = k - 1$, and $|S'| = n - k$. Let $P' = P \setminus R$ and $R' = R \setminus P$. Notice that for each state $q$ in $P \cap R$, we must have $q \cdot a = s$, because otherwise the intersection $S \cap T$ would contain a state different from $s$. Hence $S' \subseteq P' \cdot a$ and $T' \subseteq R' \cdot a$, which is illustrated in Fig. 2. It follows that $|S'| \le |P'|$ and $|T'| \le |R'|$. Thus $|P'| \ge n - k$ and $|R'| \ge k - 1$, so $|P'| + |R'| \ge n - 1$. Since $P'$, $R'$, and $P \cap R$ are pairwise disjoint subsets of $Q$ with $|Q| = n$ and $P \cap R \ne \emptyset$, it follows that $|P \cap R| = 1$, $|P'| = n - k$, and $|R'| = k - 1$. Hence $|R| = |R'| + |P \cap R| = k$, and $|P| = |P'| + |P \cap R| = n - k + 1$.

Next, since $|P| = n - k + 1$ and $|F^c| = n - k$, the set $P$ must contain a state in $F$. Therefore it also contains the state $s$. Since $|R| = k = |F|$, either $R = F$ or $R$ contains at least one state in $F^c$. In both cases, the state $s$ is in $R$. Thus $P \cap R = \{s\}$, which means that the pair $(P, R)$ is in the family $\mathcal{N}$. Consequently, on any path from the initial state, the first pair in $\mathcal{N}$ would have a predecessor outside $\mathcal{N}$, which is impossible. The proof is complete. □

Now we are able to prove an upper bound on the state complexity of the boundary of $L$. Recall that the state complexity of a regular language $L$, $\mathrm{sc}(L)$, is the number of states of the minimal DFA recognizing the language $L$.
The following lemma provides an upper bound that depends on the number of final states in the minimal DFA for $L$. Then we show that this upper bound is maximal if the minimal DFA has two or $n - 1$ final states. At the end of this section, we discuss the case when the initial state is the unique final state.
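The construction of $D \times D'$ described at the beginning of this section can be explored with a short Python sketch. This is our own illustration rather than code from the paper: the DFA is assumed to be given as a list `delta` of dictionaries with `delta[q][x]` equal to $q \cdot x$, the value `None` is a hypothetical encoding of the extra initial state $\{q_0\}$ of $D'$, and `boundary_product` returns the reachable pairs $(S, T)$ of the product automaton.

```python
from collections import deque


def boundary_product(delta, s, F, alphabet):
    """Reachable pairs (S, T) of D x D' for L* ∩ L^{c*}.

    delta[q][x] is the transition of the DFA A, s is its initial state
    (assumed final, i.e. ε ∈ L), and F is the set of final states.
    T = None encodes the extra initial state {q0} of D'.
    """
    F = frozenset(F)

    def star_step(S, x):
        # S ∘ x: follow A, and add s whenever a final state of A is entered.
        R = frozenset(delta[q][x] for q in S)
        return R | {s} if R & F else R

    def cstar_step(T, x):
        # T • x: follow A, and add s whenever a non-final state of A is entered.
        if T is None:
            t = delta[s][x]
            return frozenset({t}) if t in F else frozenset({s, t})
        R = frozenset(delta[q][x] for q in T)
        return R | {s} if R - F else R

    start = (frozenset({s}), None)
    seen, queue = {start}, deque([start])
    while queue:
        S, T = queue.popleft()
        for x in alphabet:
            nxt = (star_step(S, x), cstar_step(T, x))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

On small DFAs with $s \in F$ and $2 \le k \le n - 1$ final states, every non-initial pair returned should satisfy Proposition 1, and no pair of the family $\mathcal{N}$ from Lemma 1 should occur. Breadth-first search is used only because it is the simplest way to enumerate the reachable pairs.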


Fig. 2. Unreachable states; $S \cap T = \{s\}$, $|T| = k$, and $|S| = n - k + 1$.

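Lemma 2. Let $n \ge 3$ and $2 \le k \le n - 1$. Let $L$ be a regular language with $\varepsilon \in L$ and $\mathrm{sc}(L) = n$. Let the minimal DFA for $L$ have $k$ final states. Then
$$\mathrm{sc}\left(L^* \cap L^{c*}\right) \le 4^{n-1} - \binom{n-1}{k-1} + 2^{n-k} \cdot 2^{n-1} - 3^{n-k} \cdot 2^{k-1} + 2^{k-1} \cdot 2^{n-1} - 3^{k-1} \cdot 2^{n-k} + 1.$$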

Proof. Let $L$ be accepted by a minimal DFA $A = (Q, \Sigma, \cdot, s, F)$ with $|F| = k$. Since $\varepsilon \in L$, the initial state $s$ is in $F$. Let $F^c = Q \setminus F$. Construct the NFAs $N$ and $N'$ and the DFAs $D$ and $D'$ as described above, and consider the product automaton $D \times D'$ for the language $L^* \cap L^{c*}$. Let us count the reachable pairs in the product automaton. By Proposition 1, for each non-initial pair $(S, T)$, the sets $S$ and $T$ have a non-empty intersection, and at least one of them contains the initial state $s$ of the DFA $A$. We now count the reachable pairs $(S, T)$ in $2^Q \times 2^Q$ such that (i) $s \in S$ and $s \in T$, (ii) $s \notin S$ and $s \in T$, and (iii) $s \in S$ and $s \notin T$.

(i) Consider the case of $s \in S$ and $s \in T$. We have $2^{n-1} \cdot 2^{n-1} = 4^{n-1}$ such pairs. However, by Lemma 1, the pairs in $\{(S, T) \in 2^Q \times 2^Q \mid S \cap T = \{s\}, |T| = k, |S| = n - k + 1\}$ are unreachable. Such a pair is determined by choosing the $k - 1$ states of $T \setminus \{s\}$ among the $n - 1$ states of $Q \setminus \{s\}$, so we have to subtract $\binom{n-1}{k-1}$ from $4^{n-1}$.

(ii) If $s \notin S$ and $s \in T$, then $S$ must be a subset of $F^c$, and $T$ is a subset of $Q$ containing $s$. The number of all such pairs is $2^{n-k} \cdot 2^{n-1}$. However, the subsets $S$ and $T$ must have a non-empty intersection, so we need to subtract all the pairs with $S$ and $T$ disjoint. The number of such pairs is $3^{n-k} \cdot 2^{k-1}$, since every function $f \colon F^c \to \{1, 2, 3\}$ may be viewed as a code of two disjoint subsets of $F^c$, namely $S = \{i \mid f(i) = 1\}$ and $T \cap F^c = \{i \mid f(i) = 2\}$, while the remaining elements of $T$ may be chosen arbitrarily in the set $F \setminus \{s\}$. Thus in this case, we get $2^{n-k} \cdot 2^{n-1} - 3^{n-k} \cdot 2^{k-1}$ pairs.

(iii) The case of $s \in S$ and $s \notin T$ is symmetric. The set $T$ must be a subset of $F$ not containing $s$, and after subtracting the pairs of disjoint subsets, we get $2^{k-1} \cdot 2^{n-1} - 3^{k-1} \cdot 2^{n-k}$ pairs in this case.

By adding up the numbers of pairs in all three cases, and adding the initial state $(\{s\}, \{q_0\})$, we get the expression in the statement of the lemma. □

Now we show that the upper bound given by Lemma 2 is maximal if the minimal DFA for the language $L$ has $2$ or $n - 1$ final states.

Lemma 3.
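Let $n \ge 3$ and $2 \le k \le n - 1$. The value of the function
$$f(k) = 4^{n-1} - \binom{n-1}{k-1} + 2^{n-k} \cdot 2^{n-1} - 3^{n-k} \cdot 2^{k-1} + 2^{k-1} \cdot 2^{n-1} - 3^{k-1} \cdot 2^{n-k} + 1$$
is maximal if $k = 2$ or $k = n - 1$, and the maximum of the function $f(k)$ is $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$.

Proof. Since $\binom{n-1}{k-1}$ is minimal, and hence $4^{n-1} - \binom{n-1}{k-1}$ is maximal, if $k = 2$ or $k = n - 1$, we only need to prove that
$$2^{n-k} \cdot 2^{n-1} - 3^{n-k} \cdot 2^{k-1} + 2^{k-1} \cdot 2^{n-1} - 3^{k-1} \cdot 2^{n-k}$$
is maximal if $k = 2$ or $k = n - 1$. After the substitution $x = \frac{n+1}{2} - k$, we have $n - k = \frac{n-1}{2} + x$ and $k - 1 = \frac{n-1}{2} - x$, and we get the function
$$g(x) = 2^{\frac{n-1}{2} + x} \cdot 2^{n-1} - 3^{\frac{n-1}{2} + x} \cdot 2^{\frac{n-1}{2} - x} + 2^{\frac{n-1}{2} - x} \cdot 2^{n-1} - 3^{\frac{n-1}{2} - x} \cdot 2^{\frac{n-1}{2} + x}.$$
Straightforward calculations result in
$$g(x) = 8^{\frac{n-1}{2}} \cdot \left(2^x + 2^{-x}\right) - 6^{\frac{n-1}{2}} \cdot \left((3/2)^x + (3/2)^{-x}\right).$$
It follows that $g$ is a symmetric function, that is, $g(x) = g(-x)$. We show that $g$ is increasing for $x > 0$, and therefore decreasing for $x < 0$, so that it has its minimum at $x = 0$. To this aim, consider the first derivative
$$g'(x) = 8^{\frac{n-1}{2}} \cdot \ln 2 \cdot \left(2^x - (1/2)^x\right) - 6^{\frac{n-1}{2}} \cdot \ln(3/2) \cdot \left((3/2)^x - (2/3)^x\right).$$
If $x > 0$, then $2^x - (1/2)^x > 0$, $8^{\frac{n-1}{2}} > 6^{\frac{n-1}{2}}$, and $\ln 2 > \ln(3/2)$, so
$$g'(x) > 6^{\frac{n-1}{2}} \cdot \ln(3/2) \cdot \left(2^x - (1/2)^x - (3/2)^x + (2/3)^x\right) = 6^{\frac{n-1}{2}} \cdot \ln(3/2) \cdot 2^{-x} \cdot \left(3^x + 1\right) \cdot \left((4/3)^x - 1\right) > 0,$$
and $g'(0) = 0$. Since we have $2 \le k \le n - 1$, we are interested in the values of $x$ in the interval $\left[\frac{3-n}{2}, \frac{n-3}{2}\right]$. For such values of $x$, the function $g$ attains its maximum at both endpoints of the interval, which correspond to $k = n - 1$ and $k = 2$. Finally, for $k = 2$ we have $4^{n-1} + 2^{n-2} \cdot 2^{n-1} = 3/8 \cdot 4^n$, $-3^{n-2} \cdot 2^{1} = -2 \cdot 3^{n-2}$, $2^{1} \cdot 2^{n-1} - 3^{1} \cdot 2^{n-2} = 2^{n-2}$, and $-\binom{n-1}{1} + 1 = -n + 2$, so $f(2) = 3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$, and the lemma follows. □

Fig. 3. The DFA $A$ over $\{a, b, c, d, e\}$ accepting a language $L$ with $\mathrm{sc}(L^* \cap L^{c*}) = 3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$.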
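Since Lemmas 2 and 3 are purely combinatorial, they can be spot-checked with exact integer arithmetic. The following Python sketch is our own (the names `f`, `closed_form`, and `count_pairs` are hypothetical). For small $n$ and $k$ it counts the pairs $(S, T)$ that survive the necessary conditions used in the proof of Lemma 2 (Proposition 1, the rule that $s$ is added whenever a state of $F$, resp. $F^c$, is present, and Lemma 1), assuming without loss of generality that $s = 0$ and $F = \{0, \ldots, k-1\}$, and compares the result with $f(k)$ and with the closed form of Lemma 3.

```python
from math import comb


def f(n: int, k: int) -> int:
    """The expression of Lemma 2 / Lemma 3, in exact integer arithmetic."""
    return (4 ** (n - 1) - comb(n - 1, k - 1)
            + 2 ** (n - k) * 2 ** (n - 1) - 3 ** (n - k) * 2 ** (k - 1)
            + 2 ** (k - 1) * 2 ** (n - 1) - 3 ** (k - 1) * 2 ** (n - k)
            + 1)


def closed_form(n: int) -> int:
    """3/8 * 4^n + 2^(n-2) - 2 * 3^(n-2) - n + 2, using 3/8 * 4^n = 3 * 2^(2n-3)."""
    return 3 * 2 ** (2 * n - 3) + 2 ** (n - 2) - 2 * 3 ** (n - 2) - n + 2


def count_pairs(n: int, k: int) -> int:
    """Pairs (S, T) of subsets of {0, ..., n-1}, encoded as bitmasks, that are
    not excluded by the proof of Lemma 2, plus the initial pair ({s}, {q0})."""
    s = 1                                # s = 0 is bit 0
    F = (1 << k) - 1                     # F = {0, ..., k-1}
    Fc = ((1 << n) - 1) & ~F
    total = 1
    for S in range(1 << n):
        for T in range(1 << n):
            if not (S & T):                          # Proposition 1
                continue
            if (S & F) and not (S & s):              # S meets F, so s is in S
                continue
            if (T & Fc) and not (T & s):             # T meets F^c, so s is in T
                continue
            if ((S & T) == s and bin(T).count("1") == k
                    and bin(S).count("1") == n - k + 1):
                continue                             # the family N of Lemma 1
            total += 1
    return total


print(all(count_pairs(n, k) == f(n, k) for n in range(3, 7) for k in range(2, n)))
print(f(5, 2), f(5, 3), f(5, 4), closed_form(5))
# prints True, then 335 307 335 335
```

The symmetry $f(k) = f(n + 1 - k)$ visible in the second line corresponds to the symmetry $g(x) = g(-x)$ used in the proof of Lemma 3.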

Next, consider the case when, in a DFA for $L$, the initial state is the sole final state.

Lemma 4. Let $n \ge 3$ and let $L$ be a regular language accepted by an $n$-state DFA in which only the initial state is final. Then $\mathrm{sc}(L^* \cap L^{c*}) \le (n+1) \cdot 2^{n-2} + 1$.

Proof. If $L$ is accepted by a DFA in which the initial state is the unique final state, then $L$ contains $\varepsilon$ and is closed under concatenation, so $L = L^*$. Thus the DFA $D$ for the language $L^*$ is the same as the DFA $A$. Therefore the language $L^* \cap L^{c*}$ is accepted by the product automaton $A \times D'$, where $D'$ is the DFA for $L^{c*}$ described at the beginning of this section. Let $s$ be the initial and the unique final state of $A$; all the other states of $A$ are non-final. Therefore, in every reachable state $(S, T)$ of the product automaton, the set $S$ is a singleton $\{q\}$, where $q$ is a state of $A$. The initial state of the product automaton is $(\{s\}, \{q_0\})$. Next, if $(\{q\}, T)$ is a reachable non-initial state of the product automaton, then $T \cap \{q\} \ne \emptyset$, that is, $q \in T$; moreover, if $q \ne s$, then $q \in F^c$, so the state $s$ must be in $T$. This gives $2^{n-1}$ states $(\{s\}, T)$ and $2^{n-2}$ states $(\{q\}, T)$ for every non-final state $q$, that is, at most $2^{n-1} + (n-1) \cdot 2^{n-2} + 1 = (n+1) \cdot 2^{n-2} + 1$ reachable states. The lemma follows. □

Finally, let us show that $(n+1) \cdot 2^{n-2} + 1$ does not exceed our upper bound $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$.

Lemma 5. Let $n \ge 2$. Then $(n+1) \cdot 2^{n-2} + 1 < 3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$.

Proof. The inequality is equivalent to $n \cdot 2^{n-2} + n - 1 + 2 \cdot 3^{n-2} < 3/8 \cdot 4^n$, so it is enough to show that $n \cdot 2^{n-2} + n + 2 \cdot 3^{n-2} \le 3/8 \cdot 4^n$. If $n \ge 2$, then $1 \le 2^{n-2}$ and $n \le 2^{n-1}$. Therefore

$$n \cdot 2^{n-2} + n \le n \cdot 2^{n-2} + n \cdot 2^{n-2} = n \cdot 2^{n-1} \le 2^{2n-2}. \tag{1}$$

Next, we have

$$2 \cdot 3^{n-2} \le 2 \cdot 4^{n-2} = 2^{2n-3}. \tag{2}$$

Using (1) and (2), we get

$$n \cdot 2^{n-2} + n + 2 \cdot 3^{n-2} \le 2^{2n-2} + 2^{2n-3} = 3/8 \cdot 4^n,$$

and the lemma follows. □
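The counting in the proof of Lemma 4 and the inequality of Lemma 5 can likewise be checked numerically for small $n$. The sketch below is ours (all names are hypothetical) and assumes, as in the proof, that the initial state is $0$; pairs $(\{q\}, T)$ are represented by $q$ and a bitmask for $T$.

```python
def lemma4_bound(n: int) -> int:
    """(n + 1) * 2^(n-2) + 1 from Lemma 4."""
    return (n + 1) * 2 ** (n - 2) + 1


def theorem1_bound(n: int) -> int:
    """3/8 * 4^n + 2^(n-2) - 2 * 3^(n-2) - n + 2, with 3/8 * 4^n = 3 * 2^(2n-3)."""
    return 3 * 2 ** (2 * n - 3) + 2 ** (n - 2) - 2 * 3 ** (n - 2) - n + 2


def count_lemma4_pairs(n: int) -> int:
    """Pairs ({q}, T) allowed by the proof of Lemma 4 (s = 0), plus the initial pair."""
    total = 1                                   # ({s}, {q0})
    for q in range(n):
        for T in range(1 << n):                 # T encoded as a bitmask
            if not ((T >> q) & 1):              # q must be in T (Proposition 1)
                continue
            if q != 0 and not (T & 1):          # q is non-final, so s = 0 is in T
                continue
            total += 1
    return total


for n in range(3, 8):
    print(n, count_lemma4_pairs(n), lemma4_bound(n), theorem1_bound(n))
```

Each printed line shows $n$, the count from the enumeration, the value $(n+1) \cdot 2^{n-2} + 1$, and the general bound of Theorem 1; the first two should agree, and both should stay below the third.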

As a result, we get the following upper bound on the state complexity of the boundary of regular languages.

Theorem 1 (Boundary: upper bound). Let $n \ge 3$ and let $L$ be a regular language such that $\mathrm{sc}(L) = n$. Then $\mathrm{sc}(L^* \cap L^{c*}) \le 3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$. □

4. Matching lower bound

In this section, we show that the upper bound given by Theorem 1 is tight. Our witness languages are defined over a five-letter alphabet. However, the fifth symbol is used only to prove the reachability of one particular pair. As a consequence, the lower bound we obtain for a four-letter alphabet is smaller by just one. In the next section, we show that the upper bound cannot be met by any quaternary language.

Let $n \ge 4$ and $\Sigma = \{a, b, c, d, e\}$. Let $L$ be the language accepted by the DFA $A = (Q, \Sigma, \cdot, 0, \{0, 1\})$ shown in Fig. 3, with the state set $Q = \{0, 1, \ldots, n-1\}$ and the transitions defined as follows. By $a$, states $0$ and $1$ go to themselves, state $n - 1$ goes to $2$, and every other state $i$ goes to $i + 1$. By $b$, states $0$ and $1$ are interchanged, state $2$ goes to state $0$, and every other state goes to itself. The inputs $c$, $d$, and $e$ interchange the states $0$ and $2$, $1$ and $2$, and $0$ and $1$, respectively.

Construct an NFA $N$ for the language $L^*$ from the DFA $A$ as described in the previous section, that is, add the transitions $(0, b, 0)$, $(0, e, 0)$, $(1, a, 0)$, $(1, c, 0)$, and $(2, d, 0)$. Next, construct an NFA $N'$ for the language $L^{c*}$ from the DFA $A$: first exchange the final and non-final states, and then add a new initial and final state $q_0$ going to $\{0\}$ by $a$ and $d$, to $\{1\}$ by $b$ and $e$, and to $\{0, 2\}$ by $c$. Next, for each state $q$ in $Q$ and each symbol $\sigma$ in $\Sigma$, add the transition $(q, \sigma, 0)$ whenever $q \cdot \sigma \in \{2, 3, \ldots, n-1\}$, that is, add the transitions $(0, c, 0)$, $(1, d, 0)$, $(2, a, 0)$, $(2, e, 0)$, and $(i, \sigma, 0)$ for each $i$ with $3 \le i \le n - 1$ and each $\sigma$ in $\Sigma$. Let $D$ and $D'$ be the subset automata of the NFAs $N$ and $N'$, respectively.
In what follows, we consider the product automaton $D \times D'$ for the language $L^* \cap L^{c*}$.
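The witness can be examined computationally for small $n$. The Python sketch below is our own (all names are hypothetical): it encodes the transitions of Fig. 3, builds the reachable part of $D \times D'$ with the subset steps $\circ$ and $\bullet$ (with `None` standing for $\{q_0\}$), and minimizes it with Moore's partition refinement. According to Corollary 1 and Lemma 14, the last three numbers on each printed line should coincide; the bound equals 80 for $n = 4$ and 335 for $n = 5$.

```python
from collections import deque


def witness(n):
    """Transitions of the DFA A of Fig. 3 (states 0..n-1, initial 0, final {0, 1})."""
    def a(i):
        return i if i in (0, 1) else (2 if i == n - 1 else i + 1)

    def b(i):
        return {0: 1, 1: 0, 2: 0}.get(i, i)

    def swap(x, y):
        return lambda i: y if i == x else (x if i == y else i)

    return {"a": a, "b": b, "c": swap(0, 2), "d": swap(1, 2), "e": swap(0, 1)}


def product_automaton(n, letters="abcde"):
    """Reachable part of D x D' for L* ∩ L^{c*}; None stands for {q0}."""
    f, F = witness(n), frozenset({0, 1})

    def star(S, x):                      # S ∘ x in D
        R = frozenset(f[x](q) for q in S)
        return R | {0} if R & F else R

    def cstar(T, x):                     # T • x in D'
        if T is None:
            t = f[x](0)
            return frozenset({t}) if t in F else frozenset({0, t})
        R = frozenset(f[x](q) for q in T)
        return R | {0} if R - F else R

    start = (frozenset({0}), None)
    seen, queue, delta = {start}, deque([start]), {}
    while queue:
        p = queue.popleft()
        for x in letters:
            r = (star(p[0], x), cstar(p[1], x))
            delta[p, x] = r
            if r not in seen:
                seen.add(r)
                queue.append(r)
    # (S, T) is final iff S meets F and T is {q0} or meets F^c
    final = {(S, T) for (S, T) in seen if S & F and (T is None or T - F)}
    return seen, delta, final


def minimal_size(states, delta, final, letters="abcde"):
    """Number of states of the minimal DFA, by Moore's partition refinement."""
    block = {p: p in final for p in states}
    while True:
        signature = {p: (block[p],) + tuple(block[delta[p, x]] for x in letters)
                     for p in states}
        ids = {}
        new_block = {p: ids.setdefault(signature[p], len(ids)) for p in states}
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block


def bound(n):
    return 3 * 2 ** (2 * n - 3) + 2 ** (n - 2) - 2 * 3 ** (n - 2) - n + 2


for n in (4, 5):
    states, delta, final = product_automaton(n)
    print(n, len(states), minimal_size(states, delta, final), bound(n))
```

Restricting `letters` to `"abcd"` should, by Corollary 1, lose exactly the pair $(Q, \{0\})$. The refinement loop stops as soon as a round does not increase the number of blocks, since each new partition refines the previous one.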


4.1. Reachability

The aim of this subsection is to show that the product automaton has $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ reachable states. We need to show that the following pairs in $2^Q \times 2^Q$ are reachable:

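$(S, T)$ with $0 \in S \cap T$, except for the pairs with $S \cap T = \{0\}$, $|T| = 2$, and $|S| = n - 1$;
$(S, T)$ with $S \subseteq \{2, 3, \ldots, n-1\}$, $0 \in T$, and $S \cap T \ne \emptyset$;
$(S, \{1\})$, where $\{0, 1\} \subseteq S$.

By the counting in the proof of Lemma 2 with $k = 2$, these three families, together with the initial pair, contain exactly $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ pairs. We start with the reachability of some special pairs, and then we prove the reachability of the pairs $(S, T)$ with $S \cap T = \{0\}$. Notice that we need the symbol $e$ only to get the reachability of the pair $(Q, \{0\})$.

Lemma 6. Let $k, \ell \ge 0$ and $k + \ell \le n - 1$. Moreover, if $\ell = 1$, let $k \le n - 3$. Then the pair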



$$\bigl(\{0, \ell+1, \ell+2, \ldots, \ell+k\},\ \{0, 1, 2, \ldots, \ell\}\bigr)$$



is reachable in the product automaton.

Proof. Consider three cases. (i) Let $\ell = 0$. Let us show by induction on $k$ that every pair $(\{0, 1, 2, \ldots, k\}, \{0\})$ is reachable in the product automaton. The claim holds if $k = 0$ or $k = 1$, since we have
$$(\{0\}, \{q_0\}) \xrightarrow{a} (\{0\}, \{0\}) \xrightarrow{b} (\{0, 1\}, \{1\}) \xrightarrow{b} (\{0, 1\}, \{0\}).$$

Assume that $2 \le k \le n - 2$ and that $(\{0, 1, 2, \ldots, k-1\}, \{0\})$ is reachable. Since
$$\begin{aligned}
(\{0, 1, 2, \ldots, k-1\}, \{0\}) &\xrightarrow{a} (\{0, 1, 3, 4, \ldots, k\}, \{0\}) \xrightarrow{d} (\{0, 2, 3, 4, \ldots, k\}, \{0\}) \xrightarrow{a} (\{0, 3, 4, 5, \ldots, k+1\}, \{0\})\\
&\xrightarrow{bb} (\{0, 1, 3, 4, 5, \ldots, k+1\}, \{0\}) \xrightarrow{a^{n-3}} (\{0, 1, 2, 3, 4, \ldots, k\}, \{0\}),
\end{aligned}$$

the pair $(\{0, 1, 2, \ldots, k\}, \{0\})$ is reachable as well. Finally, if $k = n - 1$, then $(\{0, 1, 2, \ldots, n-1\}, \{0\})$ is reached from the pair $(\{0, 1, 2, \ldots, n-2\}, \{0\})$ by $adee$. Notice that this is the only place in the proofs of reachability and distinguishability where we use the symbol $e$. (ii) Let $\ell = 1$. By the assumption of the lemma, we have $k \le n - 3$. As shown in case (i), the pair $(\{0, 1, 2, \ldots, k\}, \{0\})$ is reachable, and
$$\begin{aligned}
(\{0, 1, 2, 3, \ldots, k\}, \{0\}) &\xrightarrow{ada} (\{0, 3, 4, \ldots, k+2\}, \{0\}) \xrightarrow{cc} (\{0, 3, 4, \ldots, k+2\}, \{0, 2\})\\
&\xrightarrow{d} (\{0, 3, 4, \ldots, k+2\}, \{0, 1\}) \xrightarrow{a^{n-3}} (\{0, 2, 3, \ldots, k+1\}, \{0, 1\}).
\end{aligned}$$

Thus the lemma holds in the case of $\ell = 1$ and $k \le n - 3$. (iii) Let $2 \le \ell \le n - 1$ and $k + \ell \le n - 1$. Then $k \le n - 3$, and therefore the pair $(\{0, 2, 3, \ldots, k+1\}, \{0, 1\})$ is reachable, as shown in case (ii). Next,
$$(\{0, 2, 3, \ldots, k+1\}, \{0, 1\}) \xrightarrow{(acc)^{\ell-1}} (\{0, \ell+1, \ell+2, \ldots, \ell+k\}, \{0, 1, \ldots, \ell\}).$$

The proof is complete.

□
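The transition sequences in the proof of Lemma 6 can be replayed mechanically. The following Python sketch is ours (`witness`, `run`, and `pair` are hypothetical helper names); it encodes the DFA of Fig. 3 together with the subset steps $\circ$ and $\bullet$, and checks the base case of (i) and the use of $adee$ for $n = 6$.

```python
def witness(n):
    """Transitions of the DFA A of Fig. 3 (states 0..n-1, initial 0, final {0, 1})."""
    def a(i):
        return i if i in (0, 1) else (2 if i == n - 1 else i + 1)

    def b(i):
        return {0: 1, 1: 0, 2: 0}.get(i, i)

    def swap(x, y):
        return lambda i: y if i == x else (x if i == y else i)

    return {"a": a, "b": b, "c": swap(0, 2), "d": swap(1, 2), "e": swap(0, 1)}


def run(n, word, start=None):
    """Read `word` in D x D' for the witness, starting from `start`
    (default: the initial pair ({0}, {q0}); None stands for {q0})."""
    f, F = witness(n), frozenset({0, 1})
    S, T = (frozenset({0}), None) if start is None else start
    for x in word:
        R = frozenset(f[x](q) for q in S)
        S = R | {0} if R & F else R               # S ∘ x
        if T is None:                              # leaving q0
            t = f[x](0)
            T = frozenset({t}) if t in F else frozenset({0, t})
        else:
            R = frozenset(f[x](q) for q in T)
            T = R | {0} if R - F else R           # T • x
    return S, T


def pair(S, T):
    return frozenset(S), frozenset(T)


n = 6
print(run(n, "abb") == pair({0, 1}, {0}))                               # prints True
print(run(n, "adee", pair(range(n - 1), {0})) == pair(range(n), {0}))   # prints True
```

The same function can replay the inductive step of case (i), the string $ada\,cc\,d\,a^{n-3}$ of case (ii), and $(acc)^{\ell-1}$ of case (iii) for any small $n$.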

Lemma 7. Let $S$ and $T$ be subsets of $Q$ with $S \cap T = \{0\}$. Moreover, if $|T| = 2$, let $|S| \le n - 2$. Then the pair $(S, T)$ is reachable in the product automaton.

Proof. Let $S = \{0, s_1, s_2, \ldots, s_k\}$ and $T = \{0, t_1, t_2, \ldots, t_\ell\}$ be subsets of $Q$ with $\{s_1, s_2, \ldots, s_k\} \cap \{t_1, t_2, \ldots, t_\ell\} = \emptyset$; then $k + \ell \le n - 1$. By the assumption of the lemma, if $\ell = 1$, then $k \le n - 3$. By Lemma 6, the pair $(\{0, \ell+1, \ell+2, \ldots, \ell+k\}, \{0, 1, 2, \ldots, \ell\})$ is reachable in the product automaton. Next, notice that in the DFA $A$, the string $ad$ performs the circular shift $(1, 2, \ldots, n-1)$, while the input $d$ swaps the states $1$ and $2$. Recall that a swap and a circular shift generate the whole symmetric group. Therefore, for every permutation $\pi$ of $\{1, 2, \ldots, n-1\}$, there is a string $w_\pi$ in $\{ad, d\}^*$ such that in the DFA $A$, each state $i$ in $\{1, 2, \ldots, n-1\}$ goes to the state $\pi(i)$ by $w_\pi$. Moreover, by both $a$ and $d$, the state $0$ goes to itself in the DFA $A$; since $0$ belongs to both components, the added transitions to $0$ in $N$ and $N'$ have no effect. Thus in the product automaton, the state




$$\bigl(\{0, \ell+1, \ell+2, \ldots, \ell+k\},\ \{0, 1, 2, \ldots, \ell\}\bigr)$$

goes to the state

$$\bigl(\{0, \pi(\ell+1), \pi(\ell+2), \ldots, \pi(\ell+k)\},\ \{0, \pi(1), \pi(2), \ldots, \pi(\ell)\}\bigr)$$

by the string $w_\pi$. Now, by considering a permutation $\hat{\pi}$ such that $\hat{\pi}(i) = t_i$ for $i = 1, 2, \ldots, \ell$ and $\hat{\pi}(\ell + i) = s_i$ for $i = 1, 2, \ldots, k$, we get the reachability of $(S, T)$. Notice that $(Q, \{0\})$ is the only pair that needs the symbol $e$ to be reached. □

Next, we prove the reachability of pairs $(S, T)$ such that $S \cap T$ contains $0$ and at least one more state.

Lemma 8. Let $S, T \subseteq Q$ and $\{0\} \subsetneq S \cap T$. Then the pair $(S, T)$ is reachable in the product automaton.

Proof. Let $\{0\} \subsetneq S \cap T$. Let $S' = (S \setminus (S \cap T)) \cup \{0\}$ and $T' = (T \setminus (S \cap T)) \cup \{0\}$. Then the subsets $S'$ and $T'$ satisfy the conditions of Lemma 7, since if $|T'| = 2$, then $|S'| \le n - 2$ because $S$ and $T$ have a non-zero state in their intersection. Therefore the pair $(S', T')$ is reachable by a string over $\{a, b, c, d\}$. Now it is enough to prove that if $0 \in S \cap T$ and $i \notin S \cup T$, then the pair $(\{i\} \cup S, \{i\} \cup T)$ is reached from $(S, T)$. First, let $i = 2$. If $1 \in S$, then $(S, T)$ goes to $(\{2\} \cup S, \{2\} \cup T)$ by the input $c$; otherwise, by the string $bd$. Now, let $i \ge 3$. Let $S' = S \cdot a^{n-i}$ and $T' = T \cdot a^{n-i}$. Then $|S'| = |S|$, $|T'| = |T|$, $2 \notin S' \cup T'$, and $0 \in S' \cap T'$. By the previous case, $(\{2\} \cup S', \{2\} \cup T')$ is reached from $(S', T')$ by a string $x$ in $\{c, bd\}$. Thus we have
$$(S, T) \xrightarrow{a^{n-i}} (S', T') \xrightarrow{x} (\{2\} \cup S', \{2\} \cup T') \xrightarrow{a^{i-2}} (\{i\} \cup S, \{i\} \cup T).$$
Finally, let $i = 1$. Let $S' = S \cdot d$ and $T' = T \cdot d$. Then $|S'| = |S|$, $|T'| = |T|$, $2 \notin S' \cup T'$, and $0 \in S' \cap T'$. Hence we have
$$(S, T) \xrightarrow{d} (S', T') \xrightarrow{x} (\{2\} \cup S', \{2\} \cup T') \xrightarrow{d} (\{1\} \cup S, \{1\} \cup T). \qquad \square$$

The next two lemmata prove the reachability of all pairs that contain $0$ in only one of their components.

Lemma 9. Let $S \subseteq \{2, 3, \ldots, n-1\}$ and $T \subseteq Q$. Let $0 \in T$ and $S \cap T \ne \emptyset$. Then the pair $(S, T)$ is reachable in the product automaton.

Proof. Let $i \in S \cap T$; thus $i \ge 2$. Let $S' = S \setminus \{i\}$ and $T' = T \setminus \{i\}$. Next, let $S'' = S' \cdot a^{n-i}$ and $T'' = T' \cdot a^{n-i}$. Then $S''$ and $T''$ do not contain the state $2$, and $0 \in T''$. By Lemmata 7 and 8, the pair $(\{0\} \cup S'', T'')$ is reachable by a string over $\{a, b, c, d\}$, and we have
$$(\{0\} \cup S'', T'') \xrightarrow{c} (\{2\} \cup S'', \{2\} \cup T'') \xrightarrow{a^{i-2}} (\{i\} \cup S', \{i\} \cup T') = (S, T).$$

This proves the lemma.

□
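The group-theoretic step in the proof of Lemma 7, namely that the strings $ad$ and $d$ induce permutations of $\{1, \ldots, n-1\}$ that generate the whole symmetric group, can be confirmed for small $n$ by closing the two generators under composition. The sketch below is ours and uses the same hypothetical `witness` encoding of Fig. 3.

```python
from math import factorial


def witness(n):
    """Transitions of the DFA A of Fig. 3 (states 0..n-1, initial 0, final {0, 1})."""
    def a(i):
        return i if i in (0, 1) else (2 if i == n - 1 else i + 1)

    def b(i):
        return {0: 1, 1: 0, 2: 0}.get(i, i)

    def swap(x, y):
        return lambda i: y if i == x else (x if i == y else i)

    return {"a": a, "b": b, "c": swap(0, 2), "d": swap(1, 2), "e": swap(0, 1)}


def induced_permutation(n, word):
    """Permutation of {1, ..., n-1} induced by reading `word` in A,
    as a tuple p with p[i - 1] = image of state i."""
    f = witness(n)
    images = []
    for i in range(1, n):
        q = i
        for x in word:
            q = f[x](q)
        images.append(q)
    return tuple(images)


def generated_group_size(n, words=("ad", "d")):
    """Size of the group generated by the permutations of `words` (closure by search)."""
    gens = [induced_permutation(n, w) for w in words]
    identity = tuple(range(1, n))
    seen, frontier = {identity}, [identity]
    while frontier:
        p = frontier.pop()
        for g in gens:
            q = tuple(g[p[j] - 1] for j in range(n - 1))   # apply p first, then g
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return len(seen)


print(induced_permutation(5, "ad"), induced_permutation(5, "d"))
print(generated_group_size(5), factorial(4))
# prints (2, 3, 4, 1) (2, 1, 3, 4), then 24 24
```

Closing under the generators alone suffices here because every element of a finite permutation group is a product of its generators, so no inverses need to be added.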

Lemma 10. Let S ⊆ Q and {0, 1} ⊆ S. Then the pair ( S , {1}) is reachable in the product automaton.

Proof. First, let $2 \notin S$. By Lemma 7, the pair $(S, \{0\})$ is reachable by a string over $\{a, b, c, d\}$, and it goes to $(S, \{1\})$ by $b$. If $2 \in S$, then $(S \setminus \{2\}, \{1\})$ is reachable by the former case, and it goes to $(S, \{1\})$ by $c$. □

As a consequence of Lemmata 6–10, we get the following result.

Corollary 1. Let $L$ be the language accepted by the DFA over $\{a, b, c, d, e\}$ shown in Fig. 3. Then the product automaton $D \times D'$ for the language $L^* \cap L^{c*}$ has $3/8 \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ reachable pairs. Moreover, all the pairs, except for $(Q, \{0\})$, can be reached via strings in $\{a, b, c, d\}^*$. □

4.2. Distinguishability

The idea of the proof of distinguishability of the states of the product automaton $D \times D'$ for the language $L^* \cap L^{c*}$ is the following. We show that for every state $q$ of the DFA $A$, there exist strings $u_q$ and $v_q$ such that in the NFA $N$ for the language $L^*$, the string $u_q$ is accepted only from the state $q$, and the string $v_q$ is accepted from each of its states, while in the NFA $N'$ for the language $L^{c*}$, the string $v_q$ is accepted only from the state $q$, and the string $u_q$ is accepted from each of its states. This is enough to prove distinguishability, since if $(S, T)$ and $(S', T')$ are two distinct pairs in the product automaton $D \times D'$, then either $S \ne S'$ or $T \ne T'$. In the first case, without loss of generality, there is a state $q$ with $q \in S$ and $q \notin S'$. Then the string $u_q$ is accepted in $D$ from $S$ and rejected from $S'$. Moreover, in $D'$, the string $u_q$ is accepted from $T$, and therefore this string is accepted by the product automaton $D \times D'$ from $(S, T)$, but rejected from $(S', T')$. The second case is symmetric; now the string $v_q$ distinguishes the pairs $(S, T)$ and $(S', T')$.

Assume that $n \ge 4$. Recall that $\circ$ and $\bullet$ denote the (nondeterministic) transition functions of the NFAs $N$ and $N'$ for the languages $L^*$ and $L^{c*}$, respectively. Let us start with the following two technical results.


Lemma 11. Let $S$ be a non-empty subset of $Q$, and let $z = (ab)^{n-3}a$.
(a) If $2 \in S$, then $S \circ z \supseteq \{2\}$ and $S \bullet z \supseteq \{0, 2\}$.
(b) If $2 \notin S$, then $S \circ z$ and $S \bullet z$ are non-empty subsets of $\{0, 1\}$.

Proof. Let $S$ be a non-empty subset of $Q$ and $z = (ab)^{n-3}a$. (a) Let $2 \in S$. In the DFA $A$, we have $2 \cdot z = 2$. Therefore $S \circ z \supseteq \{2\}$. Since $2$ is final in $N'$, we also have $S \bullet z \supseteq \{0, 2\}$. (b) Notice that $a$ does not change the states $0$ and $1$ in the DFA $A$, while $b$ interchanges the states $0$ and $1$. Since the string $z$ contains only the symbols $a$ and $b$, we get $\{0, 1\} \circ z \subseteq \{0, 1\}$ and $\{0, 1\} \bullet z \subseteq \{0, 1\}$. Now let $i > 2$. Then $\{i\} \circ (ab)^{n-i} = \{n-1\} \circ ab = \{2\} \circ b = \{0\}$ and $\{i\} \bullet (ab)^{n-i} \subseteq \{0, 1, n-1\} \bullet ab = \{0, 1, 2\} \bullet b = \{0, 1\}$. By reading the remaining symbols of $z$, we cannot leave the set $\{0, 1\}$ in $N$ or in $N'$. Thus if $2 \notin S$, then each state in $S$ goes by $z$ to a non-empty subset of $\{0, 1\}$ in both NFAs, and the lemma follows. □

Lemma 12. For every state $q$ in $Q$, there exists a string $w_q$ in $\{a, b, c, d\}^*$ such that

$$q \circ w_q \supseteq \{2\}, \qquad q \bullet w_q \supseteq \{0, 2\},$$

and for every state $p$ in $Q$ with $p \ne q$, the sets $p \circ w_q$ and $p \bullet w_q$ are non-empty subsets of $\{0, 1\}$.

Proof. We consider three cases, $q = 0$, $q = 1$, and $q \ge 2$, and give the proof in the first case; the proofs in the remaining two cases are similar. Let $z = (ab)^{n-3}a$ be the string given by Lemma 11.
(1) If $q = 0$, set $w_0 = cz$. Since $0 \cdot c = 2$, we have $2 \in 0 \circ c$ and $2 \in 0 \bullet c$. Hence by Lemma 11, $0 \circ w_0 = (0 \circ c) \circ z \supseteq \{2\}$ and $0 \bullet w_0 = (0 \bullet c) \bullet z \supseteq \{0, 2\}$. On the other hand, if $p \ne 0$, then $p \cdot c \ne 2$, since $c$ is a permutation on $Q$ with $0 \cdot c = 2$. It follows that $2 \notin p \circ c$ and $2 \notin p \bullet c$. Hence by Lemma 11, $p \circ w_0 = (p \circ c) \circ z \subseteq \{0, 1\}$ and $p \bullet w_0 = (p \bullet c) \bullet z \subseteq \{0, 1\}$.
(2) If $q = 1$, set $w_1 = dz$.
(3) If $q \ge 2$, set $w_q = a^{n-q}z$; notice that now if $p \ne q$, then $p \cdot a^{n-q} \ne 2$, and since $0 \cdot a = 0$, we have $2 \notin p \circ a^{n-q}$ and $2 \notin p \bullet a^{n-q}$. □

Now we are ready to define the strings $u_q$ and $v_q$ described above.

Lemma 13. For every state $q$ in $Q$, there exist strings $u_q$ and $v_q$ in $\{a, b, c, d\}^*$ such that (i) in $N$, the string $u_q$ is accepted only from the state $q$, while the string $v_q$ is accepted from each of its states; (ii) in $N'$, the string $v_q$ is accepted only from the state $q$, while the string $u_q$ is accepted from each of its states.

Proof. Let $w_q$ be the string given by Lemma 12. Let

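$$u_q = w_q\, dac, \qquad v_q = w_q\, ddab.$$

Let $p \neq q$. By Lemma 12, in the NFA $N$ (with final states 0 and 1) we have

$$\begin{aligned}
q \circ u_q &= (q \circ w_q) \circ dac \supseteq \{2\} \circ dac = \{0, 1, 2\},\\
p \circ u_q &= (p \circ w_q) \circ dac \subseteq \{0, 1\} \circ dac = \{2, 3\},\\
q \circ v_q &= (q \circ w_q) \circ ddab \supseteq \{2\} \circ ddab = \{0, 1, 3\},\\
p \circ v_q &= (p \circ w_q) \circ ddab, \quad \text{where } \emptyset \neq p \circ w_q \subseteq \{0, 1\},
\end{aligned}$$

and $\{0\} \circ ddab = \{0, 1\}$, $\{1\} \circ ddab = \{0, 1\}$. It follows that in the NFA $N$, the string $u_q$ is accepted only from the state $q$, while the string $v_q$ is accepted from each of its states. By the same lemma, in the NFA $N'$ (with final states $2, 3, \ldots, n-1$) we have

$$\begin{aligned}
q \bullet v_q &= (q \bullet w_q) \bullet ddab \supseteq \{0, 2\} \bullet ddab = \{0, 1, 3\},\\
p \bullet v_q &= (p \bullet w_q) \bullet ddab \subseteq \{0, 1\} \bullet ddab = \{0, 1\},\\
q \bullet u_q &= (q \bullet w_q) \bullet dac \supseteq \{0, 2\} \bullet dac = \{0, 1, 2\},\\
p \bullet u_q &= (p \bullet w_q) \bullet dac, \quad \text{where } \emptyset \neq p \bullet w_q \subseteq \{0, 1\},
\end{aligned}$$

and $0 \bullet dac = \{0, 2\}$ and $1 \bullet dac = \{0, 2, 3\}$. Therefore in $N'$, the string $v_q$ is accepted only from the state $q$, while the string $u_q$ is accepted from each of its states. □

The next lemma proves the distinguishability of the states in the product automaton.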
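The operations $S \circ w$ and $S \bullet w$ used in Lemmas 11–13 are subset steps of the NFAs $N$ and $N'$. The following Python sketch assumes the construction of Section 3 behaves as the proofs above use it: a state $i$ goes on a symbol $\sigma$ to $\{i \cdot \sigma\}$, plus state 0 whenever $i \cdot \sigma$ is final in the respective NFA, and $N'$ has an extra initial state $q_0$ that acts like state 0. The three-state DFA `TOY` is a hypothetical example chosen for illustration; it is not the DFA of Fig. 3.

```python
from typing import Dict, FrozenSet, Iterable, Set

Q0 = "q0"  # extra initial state of N' (assumed to behave like state 0)

# Hypothetical 3-state DFA over {a, b}; states 0..2, final states F = {0, 1}.
TOY: Dict[str, Dict[int, int]] = {
    "a": {0: 0, 1: 2, 2: 1},
    "b": {0: 1, 1: 0, 2: 2},
}


def star_step(states: Iterable, sym: str, delta, final: Set[int]) -> FrozenSet[int]:
    """One subset step of the star NFA built from a DFA with the given final states."""
    out = set()
    for i in states:
        j = delta[sym][0 if i == Q0 else i]
        out.add(j)
        if j in final:  # entering a final state also re-enters state 0
            out.add(0)
    return frozenset(out)


def star_run(states: Iterable, word: str, delta, final: Set[int]) -> FrozenSet:
    """Compute S ∘ word (final = F) or S • word (final = Q minus F)."""
    current = frozenset(states)
    for sym in word:
        current = star_step(current, sym, delta, final)
    return current


F = {0, 1}     # final states of N, the NFA for L*
F_PRIME = {2}  # final states of N', the NFA for the starred complement

print(sorted(star_run({2}, "a", TOY, F)))         # 2.a = 1 is final in N: [0, 1]
print(sorted(star_run({Q0}, "b", TOY, F_PRIME)))  # q0 acts as 0, 0.b = 1: [1]
print(sorted(star_run({1}, "a", TOY, F_PRIME)))   # 1.a = 2 is final in N': [0, 2]
```

Because each step is applied state by state, $(S \cup S') \circ w = (S \circ w) \cup (S' \circ w)$, which is exactly the property the proofs use when they argue "each state in $S$ goes by $z$ to …".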


J. Jirásek, G. Jirásková / Theoretical Computer Science 578 (2015) 42–57

Lemma 14. Let $L$ be the language accepted by the DFA $A$ shown in Fig. 3. All the reachable states of the product automaton $D \times D'$ for the language $L^* \cap (L^c)^*$ are pairwise distinguishable by strings in $\{a, b, c, d\}^*$. Proof. First, let us show that the initial and final state $(\{0\}, \{q_0\})$ can be distinguished from any other final state. Notice that the initial state $(\{0\}, \{q_0\})$ goes to the non-final state $(\{0\}, \{0\})$ by $a$. Let us show that the one-letter string $a$ is accepted from any other final state of the product automaton. To this aim, let $(S, T)$ be a final state of the product automaton other than $(\{0\}, \{q_0\})$. Then $0 \in S$ and $T \cap \{2, 3, \ldots, n-1\} \neq \emptyset$. In the DFA $A$, the state 0 goes to itself by $a$, while every state in $\{2, 3, \ldots, n-1\}$ goes to a state in $\{2, 3, \ldots, n-1\}$ by $a$. This means that the final state $(S, T)$ goes by $a$ to a state $(S', T')$ with $0 \in S'$ and $T' \cap \{2, 3, \ldots, n-1\} \neq \emptyset$. It follows that $(S', T')$ is final, so $a$ is accepted from $(S, T)$.

Now, let $(S, T)$ and $(S', T')$ be two distinct reachable states, where $S, S', T, T' \subseteq Q$. Then either $S \neq S'$ or $T \neq T'$. In the former case, without loss of generality, we may assume that there is a state $q$ such that $q \in S$ and $q \notin S'$. Let $u_q$ be the string given by Lemma 13. Then the string $u_q$ is accepted by the NFA $N$ only from the state $q$. It follows that the DFA $D$, obtained from $N$ by the subset construction, accepts $u_q$ from the subset $S$ and rejects it from the subset $S'$. Moreover, $u_q$ is accepted from each state of the NFA $N'$. Hence the DFA $D'$, obtained from $N'$ by the subset construction, accepts $u_q$ from the subset $T$. Therefore, the product automaton $D \times D'$ accepts $u_q$ from the pair $(S, T)$ and rejects it from the pair $(S', T')$. The latter case is symmetric: now the sets $T$ and $T'$ differ in a state $q$, and the string $v_q$ distinguishes the pairs $(S, T)$ and $(S', T')$ in the product automaton. This completes the proof. □

Corollary 1 and Lemma 14 give the following lower bound.
Theorem 2 (Boundary: lower bound, $|\Sigma| \geq 5$). Let $n \geq 4$. Let $L$ be the language over $\{a, b, c, d, e\}$ accepted by the DFA $A$ shown in Fig. 3. Then $\mathrm{sc}(L^* \cap (L^c)^*) = \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$. □

Since we used the symbol $e$ only to reach one particular pair in the product automaton, we also get a lower bound for a four-letter alphabet.

Theorem 3 (Boundary: lower bound, $|\Sigma| = 4$). Let $n \geq 4$. Let $K$ be the language accepted by the DFA $A$ in Fig. 3 restricted to the alphabet $\{a, b, c, d\}$. Then $\mathrm{sc}(K^* \cap (K^c)^*) \geq \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 1$. □

Next, the lower bound in Theorem 2 matches our upper bound in Theorem 1. This gives the exact value of the state complexity of the boundary of regular languages over an alphabet of at least five letters. Moreover, our lower bound for quaternary languages is smaller by just one.

Theorem 4 (State complexity of boundary). Let $n \geq 4$ and let $L$ be a language over an alphabet $\Sigma$ with $\mathrm{sc}(L) = n$. Then

$$\mathrm{sc}(L^* \cap (L^c)^*) \leq \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2,$$

and the bound is tight if $|\Sigma| \geq 5$. Moreover, there is a language $K$ defined over a four-letter alphabet such that

$$\mathrm{sc}(K^* \cap (K^c)^*) \geq \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 1. \qquad \square$$

5. Quaternary case

The aim of this section is to show that the upper bound $\frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ on the state complexity of the boundary of regular languages cannot be met by any quaternary language if $n \geq 5$. As a result, taking into account our lower bound in Theorem 3, we will get the exact state complexity of the boundary in the quaternary case, given by the function $\frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 1$. Recall that for integers $i$ and $j$ with $i \leq j$, we denote by $[i, j]$ the set of integers $\{k \mid i \leq k \leq j\}$. Let $n \geq 5$ and let $L$ be a regular language accepted by a DFA $A = (Q, \Sigma, \cdot, 0, F)$, where $Q = [0, n-1]$ and $F = \{0, 1\}$. Let $N$ and $N'$ be the NFAs for $L^*$ and $(L^c)^*$; recall that the final states in $N$ are 0 and 1, while the final states in $N'$ are $2, 3, \ldots, n-1$. Next, recall that to meet the upper bound $\frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$, the following pairs must be reachable and pairwise distinguishable in the corresponding product automaton for $L^* \cap (L^c)^*$:

$$\bigl(\{0\} \cup S,\ \{0\} \cup T\bigr), \quad \text{where } S, T \subseteq [1, n-1], \text{ except for the pairs with } |T| = 1 \text{ and } S = [1, n-1] \setminus T. \tag{3}$$
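The family (3) is small enough to enumerate for concrete $n$. The following Python sketch (the helper names are ours, for illustration only) lists the pairs and compares their number with $4^{n-1} - (n-1)$: there are $2^{n-1} \cdot 2^{n-1}$ choices of $(S, T)$, and exactly $n-1$ of them are excluded, one for each singleton $T$.

```python
from itertools import combinations


def subsets(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for k in range(len(xs) + 1) for c in combinations(xs, k)]


def pairs_in_3(n):
    """The pairs ({0} | S, {0} | T) of condition (3), with S, T subsets of [1, n-1]."""
    nonzero = range(1, n)
    full = frozenset(nonzero)
    zero = frozenset({0})
    pairs = []
    for S in subsets(nonzero):
        for T in subsets(nonzero):
            if len(T) == 1 and S == full - T:
                continue  # the pairs excluded in (3)
            pairs.append((zero | S, zero | T))
    return pairs


for n in (5, 6, 7):
    print(n, len(pairs_in_3(n)), 4 ** (n - 1) - (n - 1))
```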

Our first observation shows that if each symbol in $\Sigma$ performs a permutation on $Q$, then some of the above-mentioned pairs must be equivalent. Proposition 2. Let each symbol in $\Sigma$ perform a permutation on $Q$. Then the pairs $(Q, Q)$ and $(Q, Q \setminus \{i\})$ with $i \in [1, n-1]$ are equivalent in the product automaton for $L^* \cap (L^c)^*$.


Fig. 4. The reachability of a pair $(Q \setminus R, \{0\} \cup R)$; $R = \{r_1, r_2, \ldots, r_k\}$.

Proof. Let $a$ be an arbitrary symbol in $\Sigma$. Then $a$ performs a permutation on $Q$. It follows that the pair $(Q, Q)$ goes to itself by $a$ in the product automaton. Let $i \in [1, n-1]$. Then the pair $(Q, Q \setminus \{i\})$ goes by $a$ to $(Q, Q)$ if $i \cdot a = 0$, and to $(Q, Q \setminus \{i \cdot a\})$ if $i \cdot a \neq 0$. The pair $(Q, Q)$ and all the pairs $(Q, Q \setminus \{i\})$ are final in the product automaton. This means that every string is accepted from the pair $(Q, Q)$ as well as from each $(Q, Q \setminus \{i\})$. Therefore these pairs are equivalent in the product automaton. □

Hence the upper bound on the state complexity of the boundary of $L$ cannot be met if $\Sigma$ contains only permutation symbols. Our next aim is to show that to reach all the pairs in (3), the alphabet $\Sigma$ must contain at least four distinct permutation symbols. The first two permutations are given in the next observation.

Proposition 3. To reach the pairs $(Q, \{0\})$ and $(\{0\}, Q)$, the alphabet $\Sigma$ must contain a permutation $a$ with $1 \cdot a = 0$ and a permutation $b$ with $2 \cdot b = 0$, respectively. Proof. Suppose that the pair $(Q, \{0\}) = (\{0, 1, 2, \ldots, n-1\}, \{0\})$ is reached from a pair $(S, T)$ by a symbol $a$. Notice that in the NFA $N$, each state $i$ goes by $a$ to $\{i \cdot a\}$ if $i \cdot a$ is non-final in $N$, and to $\{0, i \cdot a\}$ otherwise; the same holds in $N'$ with respect to the final states of $N'$. It follows that there must exist pairwise distinct states $i_1, i_2, \ldots, i_{n-1}$ in $Q$ such that $i_j \cdot a = j$ $(1 \leq j \leq n-1)$ and $\{i_1, i_2, \ldots, i_{n-1}\} \subseteq S \setminus T$. This gives $n-1$ distinct states that are in $S$ but not in $T$. Since $S$ and $T$ must have a non-empty intersection, there must be one more state $i_0$ with $i_0 \cdot a = 0$ and $\{i_0\} = S \cap T$. Hence $a$ is a permutation on $Q$. Moreover, we must have $i_0 \neq 0$, because otherwise $(Q, \{0\})$ could only be reached from itself by $a$. This means that $i_0 = 1$: otherwise $i_0 \in [2, n-1]$, and since each state in $[2, n-1]$ is final in $N'$, the set $T$ would also contain state 0, which is one of the states $i_1, \ldots, i_{n-1}$ in $S \setminus T$. Therefore we have $1 \cdot a = 0$. The proof for the pair $(\{0\}, Q)$ is similar;
now we must have $i_0 \in [2, n-1]$, since $1 \in S$ would imply $0 \in S$, while 0 must be in $T \setminus S$. Without loss of generality, $i_0 = 2$, since we can rename the states in $[2, n-1]$ if necessary. □

Thus to reach all the pairs in (3), the alphabet $\Sigma$ must contain a permutation $a$ that maps state 1 to state 0, and a permutation $b$ that maps state 2 to state 0. Let $i$ and $j$ be the images of 0 and 1, respectively, under $b$. Since $b$ is a permutation on $Q$ that maps 2 to 0, we must have $i, j \in [1, n-1]$ and $i \neq j$. Now, consider the following subset of the pairs in (3):

$$\mathcal{R} = \bigl\{ \bigl(Q \setminus R,\ \{0\} \cup R\bigr) \bigm| R \subseteq [1, n-1] \text{ and } |R| \geq 2 \bigr\},$$

that is, for each pair in $\mathcal{R}$ the right component contains at least two non-zero states, each non-zero state occurs in exactly one component, and 0 is the only state in the intersection of the left and right components. To meet the upper bound for the boundary, each pair in $\mathcal{R}$ must be reachable in the product automaton. The next observation shows how the pairs in $\mathcal{R}$ may be reached.

Proposition 4. Assume that a pair $(Q \setminus R, \{0\} \cup R)$ in $\mathcal{R}$ is reached from a pair $(S, T)$ by a symbol $d$. Then (a) the symbol $d$ performs a permutation on $Q$, and moreover, if $0 \cdot d \neq 0$, then $\{0, 1\} \subseteq T \setminus S$; (b) $d \neq a$; (c) if $d = b$, then $\{i, j\} \subseteq R$; recall that $i = 0 \cdot b$ and $j = 1 \cdot b$. Proof. (a) Let $(Q \setminus R, \{0\} \cup R)$ be a pair in $\mathcal{R}$. Let $R = \{r_1, r_2, \ldots, r_k\}$ with $k \geq 2$, and $[1, n-1] \setminus R = \{r_{k+1}, r_{k+2}, \ldots, r_{n-1}\}$, where $r_1, r_2, \ldots, r_{n-1}$ are pairwise distinct states in $[1, n-1]$. Assume that the pair $(\{0, r_{k+1}, r_{k+2}, \ldots, r_{n-1}\}, \{0, r_1, r_2, \ldots, r_k\})$ is reached from a reachable pair $(S, T)$ by a symbol $d$. Recall that in both NFAs $N$ and $N'$, each state $i$ goes by $d$ either to $\{i \cdot d\}$ or to $\{0, i \cdot d\}$. It follows that there must be pairwise distinct states $i_j$ in $Q$ with $i_j \cdot d = r_j$ $(1 \leq j \leq n-1)$ such that $\{i_1, i_2, \ldots, i_k\} \subseteq T \setminus S$ and $\{i_{k+1}, i_{k+2}, \ldots, i_{n-1}\} \subseteq S \setminus T$; note that none of $r_1, r_2, \ldots, r_k$ is in $Q \setminus R$ and none of $r_{k+1}, \ldots, r_{n-1}$ is in $R$. Since $S$ and $T$ must have a non-empty intersection, there must be one more state $i_0$ with $i_0 \cdot d = 0$ such that $\{i_0\} = S \cap T$. This is illustrated in Fig. 4. It follows that $d$ is a permutation on $Q$. Now let $0 \cdot d \neq 0$, thus $i_0 \neq 0$. Since $|T| \geq 3$ (it contains $i_0, i_1, \ldots, i_k$ with $k \geq 2$), the set $T$ must contain at least one final state of $N'$, and therefore it must also contain state 0. Thus $0 \in \{i_1, i_2, \ldots, i_k\}$. Since


$\{i_1, i_2, \ldots, i_k\} \cap S = \emptyset$, the state 0 cannot be in $S$. Then also 1 cannot be in $S$, since 1 is final in $N$, and therefore $1 \in S$ would imply $0 \in S$. This means that 1 must be in $T \setminus S$. Hence $\{0, 1\} \subseteq T \setminus S$. (b) Let $d = a$. Since $a$ maps 1 to 0, we have $0 \cdot a \neq 0$. By case (a), we must have $\{0, 1\} \subseteq T \setminus S$. However, since $1 \cdot a = 0$, we have $\{1\} = S \cap T$, which contradicts $\{0, 1\} \subseteq T \setminus S$. (c) Let $d = b$. Since $0 \cdot b \neq 0$, we must have $\{0, 1\} \subseteq T \setminus S$ by case (a). It follows that $\{i, j\} = \{0, 1\} \cdot b \subseteq R$. □

The following observation shows that if $\Sigma$ does not contain any permutation $c$ with $0 \cdot c = 0$, then it must contain three distinct permutations $c_1, c_2, c_3$, each of which is different from the permutation $a$.

Proposition 5. Assume that $\Sigma$ does not contain any permutation $c$ with $0 \cdot c = 0$. To reach all pairs in $\mathcal{R}$, there must be permutations $c_1, c_2, c_3$ in $\Sigma$ with $\{0, 1\} \cdot c_1 = \{1, 2\}$, $\{0, 1\} \cdot c_2 = \{1, 3\}$, and $\{0, 1\} \cdot c_3 = \{1, 4\}$. Proof. Consider the pair $(Q \setminus \{1, 2\}, \{0, 1, 2\})$ in $\mathcal{R}$. By Proposition 4(a), this pair must be reached from a pair $(S, T)$ by a permutation $c_1$. Since we assume that $\Sigma$ does not contain any permutation that maps 0 to 0, we have $0 \cdot c_1 \neq 0$, and therefore $\{0, 1\} \subseteq T \setminus S$. It follows that $\{0, 1\}$ is mapped to $\{1, 2\}$ by $c_1$. The same reasoning, applied to the pairs $(Q \setminus \{1, 3\}, \{0, 1, 3\})$ and $(Q \setminus \{1, 4\}, \{0, 1, 4\})$ in $\mathcal{R}$, gives two more permutations $c_2$ and $c_3$, respectively, such that $\{0, 1\} \cdot c_2 = \{1, 3\}$ and $\{0, 1\} \cdot c_3 = \{1, 4\}$. By Proposition 4(b), none of $c_1, c_2, c_3$ is equal to $a$. □

Hence if the alphabet $\Sigma$ does not contain any permutation that maps 0 to 0, then it must contain four distinct permutations $a, c_1, c_2, c_3$ to reach all the pairs in (3). Now assume that $\Sigma$ contains permutations $a, b, c$ with $1 \cdot a = 0$, $2 \cdot b = 0$, and $0 \cdot c = 0$. Let us show that to reach all the pairs in $\mathcal{R}$, the alphabet $\Sigma$ must contain one more permutation.

Proposition 6. Assume that the alphabet $\Sigma$ contains three permutations $a, b, c$ with $1 \cdot a = 0$, $2 \cdot b = 0$, and $0 \cdot c = 0$.
To reach all the pairs in $\mathcal{R}$, the alphabet $\Sigma$ must contain a fourth permutation. Proof. Recall that $i = 0 \cdot b$ and $j = 1 \cdot b$, and by Proposition 4(c), if a pair $(Q \setminus R, \{0\} \cup R)$ in $\mathcal{R}$ is reached by $b$, then $\{i, j\} \subseteq R$. Since $0 \cdot c = 0$, the states $i$ and $j$ lie on cycles of $c$ inside $[1, n-1]$; we consider four cases depending on these cycles. (i) First, assume that the states $i$ and $j$ are in two disjoint cycles $C_i$ and $C_j$ of the permutation $c$. Without loss of generality, assume that the length of $C_i$ is less than or equal to the length of $C_j$. Let $R = [1, n-1] \setminus C_i$; thus $|R| \geq 2$. Consider the pair $(Q \setminus R, \{0\} \cup R)$, which is in $\mathcal{R}$. By Proposition 4(b), this pair cannot be reached by $a$. By Proposition 4(c), it cannot be reached by $b$, since $i \notin R$. By $c$, this pair goes to itself, and by the argument in the proof of Proposition 4(a), it is reached by $c$ only from itself; therefore, to reach it, we need a fourth permutation $d$. (ii) Next, assume that the states $i$ and $j$ are in the same cycle $C$ of the permutation $c$, and that the length of this cycle is at most $n-3$. Let $R = [1, n-1] \setminus C$; thus $|R| \geq 2$. Consider the pair $(Q \setminus R, \{0\} \cup R)$, which is in $\mathcal{R}$. The same reasoning as in the previous case gives the fourth permutation. (iii) Now, assume that the states $i$ and $j$ are in the same cycle $C$ of the permutation $c$, and that the length of this cycle is $n-2$. Then there is a state $k$ in $[1, n-1]$ with $k \cdot c = k$. Let $R = \{j, k\}$ and consider the pair $(Q \setminus \{j, k\}, \{0, j, k\})$ in $\mathcal{R}$. For the same reason as in the above cases, this pair cannot be reached by $a$ or by $b$. Next, in the product automaton, we have the following cycle of pairs on $c$:

$$\begin{aligned}
\bigl(Q \setminus \{j, k\}, \{0, j, k\}\bigr) &\xrightarrow{c} \bigl(Q \setminus \{j \cdot c, k\}, \{0, j \cdot c, k\}\bigr) \xrightarrow{c} \bigl(Q \setminus \{j \cdot c^2, k\}, \{0, j \cdot c^2, k\}\bigr) \xrightarrow{c} \cdots\\
&\xrightarrow{c} \bigl(Q \setminus \{j \cdot c^{n-2}, k\}, \{0, j \cdot c^{n-2}, k\}\bigr) = \bigl(Q \setminus \{j, k\}, \{0, j, k\}\bigr).
\end{aligned}$$

All these pairs are in $\mathcal{R}$, and thus cannot be reached by $a$. Next, the right components of these pairs contain at most one of $i$ and $j$, and therefore these pairs cannot be reached by $b$. Since, as in case (i), each pair on this cycle is reached by $c$ only from its predecessor on the cycle, it follows that to reach the pairs on this cycle, we need a fourth permutation. (iv) Finally, assume that the states $i$ and $j$ are in the same cycle $C$ of the permutation $c$, and that the length of this cycle is $n-1$. Then we have $i = j \cdot c^{\ell}$, where $1 \leq \ell \leq n-2$. If $\ell \neq 1$ and $\ell \neq n-2$, then take the pair $(Q \setminus \{j, j \cdot c\}, \{0, j, j \cdot c\})$; otherwise, take the pair $(Q \setminus \{j, j \cdot c^2\}, \{0, j, j \cdot c^2\})$. In both cases, in a similar way as in case (iii), we get a cycle on $c$ of at most $n-1$ pairs in the product automaton. Since $n \geq 5$, the right components of the pairs in this cycle cannot contain both $i$ and $j$. Thus these pairs cannot be reached by $a$ or by $b$. Therefore, their reachability requires a fourth permutation, which concludes the proof. □

By summarizing the results of this section, and taking into account our lower bound in Theorem 3, we get the exact complexity of the boundary in the quaternary case.

Theorem 5 (Boundary: state complexity, $|\Sigma| = 4$). Let $n \geq 5$. Let $L$ be a language over a four-letter alphabet with $\mathrm{sc}(L) = n$. Then $\mathrm{sc}(L^* \cap (L^c)^*) \leq \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 1$, and the bound is tight.
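Case (iv) in the proof of Proposition 6 is the place where $n \geq 5$ is needed, and the choice of the pair can be checked mechanically. In the Python sketch below (our own encoding, not taken from the paper), the cycle of $c$ is represented by positions $0, \ldots, n-2$, with $j$ at position 0 and $i = j \cdot c^{\ell}$ at position $\ell$. The right components along the chosen cycle are $\{0, j \cdot c^t, j \cdot c^{t+d}\}$, with $d = 1$ unless $\ell \in \{1, n-2\}$, where $d = 2$. The sketch confirms that no such component contains both $i$ and $j$ when $n \geq 5$, and that the argument breaks down for $n = 4$.

```python
def windows_avoid_both(n, ell):
    """True if no right component {0, j.c^t, j.c^(t+d)} contains both i and j.

    Positions 0..m-1 (m = n - 1) encode the cycle of c: j sits at 0 and
    i = j . c^ell sits at position ell, where 1 <= ell <= n - 2.
    """
    m = n - 1
    d = 2 if ell in (1, n - 2) else 1  # distance used in case (iv)
    for t in range(m):
        if {t % m, (t + d) % m} == {0, ell}:
            return False
    return True


def case_iv_holds(n):
    """Check every possible position ell of i on the (n-1)-cycle."""
    return all(windows_avoid_both(n, ell) for ell in range(1, n - 1))


for n in range(4, 10):
    print(n, case_iv_holds(n))
```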


Fig. 5. The binary DFA $A$ accepting a language $L$ with $\mathrm{sc}(L^* \cap (L^c)^*) \geq \frac{1}{256} \cdot 4^n$.

Proof. The lower bound is given by Theorem 3. The upper bound follows from the results in this section: we proved that to reach all the pairs in (3), the alphabet $\Sigma$ must contain at least four distinct permutation symbols. Since $|\Sigma| = 4$, every symbol in $\Sigma$ must then be a permutation, and by Proposition 2, the $n$ pairs $(Q, Q)$ and $(Q, Q \setminus \{i\})$ with $i \in [1, n-1]$ in (3) are equivalent. Hence the bound $\frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ cannot be met, which gives the upper bound. □

6. Binary case

In this section we consider the binary case, and our goal is to prove an $\Omega(4^n)$ lower bound on the state complexity of the boundary of binary languages. This will result in the $\Theta(4^n)$ complexity of the boundary in the binary and ternary cases, since the upper bound $\frac{3}{8} \cdot 4^n$ follows from Theorem 1. To this aim, let $n \geq 7$, and consider the binary language $L$ accepted by the DFA $A$ shown in Fig. 5, with the state set $Q = [0, n-1]$, the initial state 0, and the set of final states $F = \{0, 1\}$. The input $a$ performs the permutation $(0, n-1)(1, 2, \ldots, n-2)$; the input $b$ maps states 0 and 1 to state 1, state 2 to state 0, and state $n-1$ to state 2, and performs the cycle $(3, n-2, n-3, \ldots, 4)$. Construct the NFAs $N$ and $N'$ for the languages $L^*$ and $(L^c)^*$, respectively, as described in Section 3. Let $D$ and $D'$ be the corresponding subset automata. Our goal is to show that every pair $(\{0, 1\} \cup S, \{0, 1\} \cup T)$ with $S, T \subseteq [3, n-2]$ is reachable in the product automaton $D \times D'$ for $L^* \cap (L^c)^*$, and that these pairs are pairwise distinguishable. This proves reachability and distinguishability of $2^{n-4} \cdot 2^{n-4} = 4^{n-4} = \frac{1}{256} \cdot 4^n$ pairs, and gives the $\frac{1}{256} \cdot 4^n$ lower bound. We will proceed in three steps. First, we prove that every pair $(\{0, 1\} \cup S, \{0, 1\})$ with $S \subseteq [3, n-2]$ is reachable. Next, we show that all the pairs $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T)$ can be reached, where $S$ and $T$ are disjoint subsets of $[3, n-2]$. Finally, we get the reachability of all pairs $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T \cup R)$, where $S$, $T$, and $R$ are disjoint subsets of $[3, n-2]$.
Hence for all subsets $S$ and $T$ of $[3, n-2]$, we will be able to reach the pair $(\{0, 1\} \cup (S \setminus T) \cup (S \cap T), \{0, 1\} \cup (S \cap T) \cup (T \setminus S))$, that is, the pair $(\{0, 1\} \cup S, \{0, 1\} \cup T)$. In the second part of this section, we prove the distinguishability of these pairs. We will use the following facts. Since the input $a$ performs the cycle $(1, 2, \ldots, n-2)$, for every subset $S$ of $[1, n-2]$, the set $S \cdot a$ is again a subset of $[1, n-2]$. Moreover, $|S \cdot a| = |S|$ and $S \cdot a^{n-2} = S$. Similarly, the input $b$ performs the cycle $(3, n-2, n-3, \ldots, 4)$. Hence for every subset $T$ of $[3, n-2]$, we have $T \cdot b \subseteq [3, n-2]$, $|T \cdot b| = |T|$, and $T \cdot b^{n-4} = T$. Let us start with the following observation, which shows how the state $n-2$ can be added (i) only to the left component, (ii) to both components, and (iii) only to the right component, while each state from $[3, n-3]$ stays unchanged in both components. Then we continue with the three steps mentioned above.

Proposition 7. Let $L$ be the binary language accepted by the DFA $A$ in Fig. 5. In the product automaton $D \times D'$ for $L^* \cap (L^c)^*$, we have

$$\begin{aligned}
&(\{0, 1\}, \{1\}) \xrightarrow{(ab)^2 b^{n-4}} (\{0, 1, n-2\}, \{1\}), && (4)\\
&(\{0, 1\}, \{0, 1\}) \xrightarrow{(ab)^2 b^{n-4}} (\{0, 1, n-2\}, \{0, 1, n-2\}), && (5)\\
&(\{0, 1\}, \{0, 1\}) \xrightarrow{(ab)^4 b^{n-4}} (\{0, 1\}, \{0, 1, n-2\}). && (6)
\end{aligned}$$

Next, let $i \in [3, n-3]$. Then in the NFAs $N = (Q, \{a, b\}, \circ, 0, \{0, 1\})$ and $N' = (Q \cup \{q_0\}, \{a, b\}, \bullet, q_0, [2, n-1])$ for $L^*$ and $(L^c)^*$, respectively, we have

$$\begin{aligned}
&i \circ ab = \{i\} \quad \text{and} \quad i \circ b^{n-4} = \{i\}, && (7)\\
&i \bullet (ab)^2 b^{n-4} = \{0, 1, i\}, && (8)\\
&i \bullet (ab)^4 b^{n-4} = \{0, 1, i, n-2\}. && (9)
\end{aligned}$$

Proof. In the product automaton, we have

$$\begin{aligned}
(\{0, 1\}, \{1\}) &\xrightarrow{a} (\{2, n-1\}, \{0, 2\}) \xrightarrow{b} (\{0, 2\}, \{0, 1\}) \xrightarrow{a} (\{3, n-1\}, \{0, 2, n-1\}) \xrightarrow{b} (\{2, n-2\}, \{0, 1, 2\})\\
&\xrightarrow{b} (\{0, n-3\}, \{0, 1\}) \xrightarrow{b} (\{0, 1, n-4\}, \{1\}) \xrightarrow{b^{n-6}} (\{0, 1, n-2\}, \{1\}),
\end{aligned}$$

which proves (4). Next, let us see where the pair $(\{0, 1\}, \{0, 1\})$ goes by $(ab)^2$ and by $(ab)^4$:

$$\begin{aligned}
(\{0, 1\}, \{0, 1\}) &\xrightarrow{a} (\{2, n-1\}, \{0, 2, n-1\}) \xrightarrow{b} (\{0, 2\}, \{0, 1, 2\})\\
&\xrightarrow{a} (\{3, n-1\}, \{0, 2, 3, n-1\}) \xrightarrow{b} (\{2, n-2\}, \{0, 1, 2, n-2\})\\
&\xrightarrow{a} (\{0, 1, 3\}, \{0, 1, 2, 3, n-1\}) \xrightarrow{b} (\{0, 1, n-2\}, \{0, 1, 2, n-2\})\\
&\xrightarrow{a} (\{0, 1, 2, n-1\}, \{0, 1, 2, 3, n-1\}) \xrightarrow{b} (\{0, 1, 2\}, \{0, 1, 2, n-2\}).
\end{aligned}$$

Finally, $(\{2, n-2\}, \{0, 1, 2, n-2\}) \xrightarrow{b^{n-4}} (\{0, 1, n-2\}, \{0, 1, n-2\})$ and $(\{0, 1, 2\}, \{0, 1, 2, n-2\}) \xrightarrow{b^{n-4}} (\{0, 1\}, \{0, 1, n-2\})$, which proves (5) and (6), respectively. Now, let $i \in [3, n-3]$. Then we have $i \cdot ab = (i+1) \cdot b = i$ and $i \cdot b^{n-4} = i$ in the DFA $A$. Since all states in $[3, n-2]$ are non-final in $N$, we get (7). On the other hand, all states in $[3, n-2]$ are final in $N'$, and therefore after reading each symbol we must add state 0 to the resulting set. Thus in the NFA $N'$ we have

$$i \xrightarrow{a} \{0, i+1\} \xrightarrow{b} \{0, 1, i\} \xrightarrow{a} \{0, 2, i+1, n-1\} \xrightarrow{b} \{0, 1, 2, i\} \xrightarrow{b^{n-4}} \{0, 1, i\},$$

which gives (8). The proof of (9) is similar. □
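The identities (4)–(9) can be checked by direct simulation for small $n$. The Python sketch below encodes the DFA of Fig. 5 from the textual description above (the figure itself is not reproduced here, so this encoding is our assumption) and runs the product automaton as a pair of subset simulations; the helper names `witness`, `star_step`, `run`, and `check_prop7` are hypothetical.

```python
def witness(n):
    """Transitions of the binary DFA A of Fig. 5, as described in the text (n >= 7)."""
    a = {0: n - 1, n - 1: 0, n - 2: 1}
    a.update({q: q + 1 for q in range(1, n - 2)})  # cycle (1, 2, ..., n-2)
    b = {0: 1, 1: 1, 2: 0, n - 1: 2, 3: n - 2}
    b.update({q: q - 1 for q in range(4, n - 1)})  # cycle (3, n-2, n-3, ..., 4)
    return {"a": a, "b": b}


def star_step(states, sym, delta, final):
    """Subset step of a star NFA: i -> {i.sym}, plus 0 if i.sym is final."""
    out = set()
    for i in states:
        j = delta[sym][i]
        out.add(j)
        if j in final:
            out.add(0)
    return frozenset(out)


def run(pair, word, n):
    """Run the product automaton D x D' on word, starting from pair = (S, T)."""
    delta = witness(n)
    S, T = frozenset(pair[0]), frozenset(pair[1])
    for sym in word:
        S = star_step(S, sym, delta, {0, 1})            # N: final states 0, 1
        T = star_step(T, sym, delta, set(range(2, n)))  # N': final states 2..n-1
    return S, T


def check_prop7(n):
    w2 = "ab" * 2 + "b" * (n - 4)
    w4 = "ab" * 4 + "b" * (n - 4)
    ok = run(({0, 1}, {1}), w2, n) == ({0, 1, n - 2}, {1})                         # (4)
    ok = ok and run(({0, 1}, {0, 1}), w2, n) == ({0, 1, n - 2}, {0, 1, n - 2})    # (5)
    ok = ok and run(({0, 1}, {0, 1}), w4, n) == ({0, 1}, {0, 1, n - 2})           # (6)
    for i in range(3, n - 2):                                                    # i in [3, n-3]
        ok = ok and run(({i}, {i}), "ab", n)[0] == {i}                           # (7)
        ok = ok and run(({i}, {i}), "b" * (n - 4), n)[0] == {i}                  # (7)
        ok = ok and run(({i}, {i}), w2, n)[1] == {0, 1, i}                       # (8)
        ok = ok and run(({i}, {i}), w4, n)[1] == {0, 1, i, n - 2}                # (9)
    return ok


print(all(check_prop7(n) for n in range(7, 13)))
```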

Lemma 15. Let $L$ be the binary language accepted by the DFA $A$ shown in Fig. 5. Let $S \subseteq [3, n-2]$. Then the pair $(\{0, 1\} \cup S, \{0, 1\})$ is reachable in the product automaton $D \times D'$ for $L^* \cap (L^c)^*$. Proof. First, let us show that every pair $(\{0, 1\} \cup S, \{1\})$ with $S \subseteq [3, n-2]$ is reachable. The proof is by induction on $|S|$. The basis is $|S| = 0$: the pair $(\{0, 1\}, \{1\})$ is reached from the initial state $(\{0\}, \{q_0\})$ by $b$. Let $S$ be a subset of $[3, n-2]$ such that the pair $(\{0, 1\} \cup S, \{1\})$ is reachable, and let $i$ be a state in $[3, n-2]$ such that $i \notin S$. Let us show that the pair $(\{0, 1, i\} \cup S, \{1\})$ is also reachable. Consider two cases: (i) $i = n-2$. Then $S \subseteq [3, n-3]$. Using (4) and (7), we get

$$(\{0, 1\} \cup S, \{1\}) \xrightarrow{(ab)^2 b^{n-4}} (\{0, 1, n-2\} \cup S, \{1\}).$$

(ii) $i \neq n-2$. Let $S' = S \cdot b^{i-2}$. Then $|S'| = |S|$ and $n-2 \notin S'$, since $i \cdot b^{i-2} = n-2$ and $i \notin S$. It follows that the pair $(\{0, 1, n-2\} \cup S', \{1\})$ is reachable as shown in (i). This pair goes to $(\{0, 1, i\} \cup S, \{1\})$ by $b^{n-2-i}$.

Now let $S \subseteq [3, n-2]$, and let $S' = S \cdot b^{n-6}$. Then, as shown above, the pair $(\{0, 1\} \cup S', \{1\})$ is reachable. Let us show that $(\{0, 1\} \cup S', \{1\})$ goes to $(\{0, 1\} \cup S, \{0, 1\})$ by $a^{n-2} b^2$. We have

$$(\{0, 1\} \cup S', \{1\}) \xrightarrow{a^{n-3}} \bigl(J \cup \{n-2\} \cup S' \cdot a^{n-3}, \{0, n-2, n-1\}\bigr),$$

where $J = \{0\}$, $J = \{n-1\}$, or $J = \{0, n-1\}$. The latter pair goes to $(\{0, 1\} \cup S, \{0, 1\})$ by $ab^2$. □
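The last step of the proof of Lemma 15 can be confirmed exhaustively for small $n$. The sketch below assumes the same textual encoding of the DFA of Fig. 5 as our reading of the description above; `b_power` and `check_lemma15_step` are hypothetical helper names. For every $S \subseteq [3, n-2]$ it sets $S' = S \cdot b^{n-6}$ and checks that $a^{n-2}b^2$ leads from $(\{0,1\} \cup S', \{1\})$ to $(\{0,1\} \cup S, \{0,1\})$.

```python
from itertools import combinations


def witness(n):
    """Transitions of the binary DFA A of Fig. 5, as described in the text (n >= 7)."""
    a = {0: n - 1, n - 1: 0, n - 2: 1}
    a.update({q: q + 1 for q in range(1, n - 2)})
    b = {0: 1, 1: 1, 2: 0, n - 1: 2, 3: n - 2}
    b.update({q: q - 1 for q in range(4, n - 1)})
    return {"a": a, "b": b}


def star_step(states, sym, delta, final):
    """Subset step of a star NFA: i -> {i.sym}, plus 0 if i.sym is final."""
    out = set()
    for i in states:
        j = delta[sym][i]
        out.add(j)
        if j in final:
            out.add(0)
    return frozenset(out)


def run(pair, word, n):
    """Run the product automaton D x D' on word, starting from pair = (S, T)."""
    delta = witness(n)
    S, T = frozenset(pair[0]), frozenset(pair[1])
    for sym in word:
        S = star_step(S, sym, delta, {0, 1})
        T = star_step(T, sym, delta, set(range(2, n)))
    return S, T


def b_power(states, k, n):
    """Image of a set of DFA states under b^k (plain DFA map, no state 0 added)."""
    b = witness(n)["b"]
    for _ in range(k):
        states = {b[q] for q in states}
    return states


def check_lemma15_step(n):
    middle = range(3, n - 1)  # the interval [3, n-2]
    word = "a" * (n - 2) + "bb"
    for k in range(len(middle) + 1):
        for combo in combinations(middle, k):
            S = set(combo)
            S_prime = b_power(S, n - 6, n)
            if run(({0, 1} | S_prime, {1}), word, n) != ({0, 1} | S, {0, 1}):
                return False
    return True


print(all(check_lemma15_step(n) for n in range(7, 11)))
```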

Lemma 16. Let $L$ be the binary regular language accepted by the DFA $A$ in Fig. 5. Let $S$ and $T$ be disjoint subsets of $[3, n-2]$. Then the pair $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T)$ is reachable in the product automaton for $L^* \cap (L^c)^*$. Proof. The proof is by induction on $|T|$. The basis is $T = \emptyset$, and the pair $(\{0, 1\} \cup S, \{0, 1\})$ is reachable by Lemma 15. Now let $S$ and $T$ be disjoint subsets of $[3, n-2]$ such that the pair $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T)$ is reachable, and let $i$ be a state in $[3, n-2]$ such that $i \notin S \cup T$. Let us show that the pair $(\{0, 1, i\} \cup S \cup T, \{0, 1, i\} \cup T)$ is reachable as well. Consider two cases: (i) $i = n-2$. Then $S$ and $T$ are disjoint subsets of $[3, n-3]$. Using (5), (7), and (8), we get

$$(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T) \xrightarrow{(ab)^2 b^{n-4}} (\{0, 1, n-2\} \cup S \cup T, \{0, 1, n-2\} \cup T).$$

(ii) $i \neq n-2$. Let $S' = S \cdot b^{i-2}$ and $T' = T \cdot b^{i-2}$. Then $n-2 \notin S' \cup T'$. As shown in (i), the pair $(\{0, 1, n-2\} \cup S' \cup T', \{0, 1, n-2\} \cup T')$ is reachable. It goes to $(\{0, 1, i\} \cup S \cup T, \{0, 1, i\} \cup T)$ by $b^{n-2-i}$. This concludes the proof. □
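The shifting trick in case (ii), which first rotates the sets by $b^{i-2}$ and then rotates back by $b^{n-2-i}$ after $n-2$ has been added, is also used in Lemmas 15 and 17. A small Python sketch, again assuming our textual encoding of the DFA of Fig. 5 and using hypothetical helper names, checks case (ii) of Lemma 16 for all disjoint $S, T$ and every $i \in [3, n-3]$:

```python
from itertools import product


def witness(n):
    """Transitions of the binary DFA A of Fig. 5, as described in the text (n >= 7)."""
    a = {0: n - 1, n - 1: 0, n - 2: 1}
    a.update({q: q + 1 for q in range(1, n - 2)})
    b = {0: 1, 1: 1, 2: 0, n - 1: 2, 3: n - 2}
    b.update({q: q - 1 for q in range(4, n - 1)})
    return {"a": a, "b": b}


def star_step(states, sym, delta, final):
    """Subset step of a star NFA: i -> {i.sym}, plus 0 if i.sym is final."""
    out = set()
    for i in states:
        j = delta[sym][i]
        out.add(j)
        if j in final:
            out.add(0)
    return frozenset(out)


def run(pair, word, n):
    """Run the product automaton D x D' on word, starting from pair = (S, T)."""
    delta = witness(n)
    S, T = frozenset(pair[0]), frozenset(pair[1])
    for sym in word:
        S = star_step(S, sym, delta, {0, 1})
        T = star_step(T, sym, delta, set(range(2, n)))
    return S, T


def b_power(states, k, n):
    """Image of a set of DFA states under b^k (plain DFA map)."""
    b = witness(n)["b"]
    for _ in range(k):
        states = {b[q] for q in states}
    return states


def check_shift(n):
    """Case (ii) of Lemma 16 for every i in [3, n-3] and all disjoint S, T."""
    for i in range(3, n - 2):
        others = [q for q in range(3, n - 1) if q != i]
        for tags in product("STx", repeat=len(others)):  # x: in neither set
            S = {q for q, tag in zip(others, tags) if tag == "S"}
            T = {q for q, tag in zip(others, tags) if tag == "T"}
            S1, T1 = b_power(S, i - 2, n), b_power(T, i - 2, n)
            if n - 2 in S1 | T1:
                return False
            start = ({0, 1, n - 2} | S1 | T1, {0, 1, n - 2} | T1)
            target = ({0, 1, i} | S | T, {0, 1, i} | T)
            if run(start, "b" * (n - 2 - i), n) != target:
                return False
    return True


print(all(check_shift(n) for n in range(7, 10)))
```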

Lemma 17. Let $L$ be the binary regular language accepted by the DFA $A$ in Fig. 5. Let $S$, $T$, $R$ be disjoint subsets of $[3, n-2]$. Then the pair $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T \cup R)$ is reachable in the product automaton for $L^* \cap (L^c)^*$. Proof. The proof is by induction on $|R|$. The basis is $R = \emptyset$, and the pair $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T)$ is reachable by Lemma 16. Now let $S$, $T$, $R$ be disjoint subsets of $[3, n-2]$ such that the pair $(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T \cup R)$ is reachable, and let $i$ be a state in $[3, n-2]$ such that $i \notin S \cup T \cup R$. Let us show that the pair $(\{0, 1\} \cup S \cup T, \{0, 1, i\} \cup T \cup R)$ is reachable. Consider two cases:


(i) $i = n-2$. Then $S$, $T$, $R$ are disjoint subsets of $[3, n-3]$. Using (6), (7), and (9), we get

$$(\{0, 1\} \cup S \cup T, \{0, 1\} \cup T \cup R) \xrightarrow{(ab)^4 b^{n-4}} (\{0, 1\} \cup S \cup T, \{0, 1, n-2\} \cup T \cup R).$$

(ii) $i \neq n-2$. Let $S' = S \cdot b^{i-2}$, $T' = T \cdot b^{i-2}$, and $R' = R \cdot b^{i-2}$. Then $n-2 \notin S' \cup T' \cup R'$. The pair $(\{0, 1\} \cup S \cup T, \{0, 1, i\} \cup T \cup R)$ is reached from $(\{0, 1\} \cup S' \cup T', \{0, 1, n-2\} \cup T' \cup R')$ by $b^{n-2-i}$, and the latter pair is reachable by (i). This completes the proof. □
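Lemmas 15–17 together give the reachability claim stated in Corollary 2 below, and for small $n$ it can be confirmed by a breadth-first search over the reachable part of $D \times D'$. The sketch assumes our textual encoding of the DFA of Fig. 5 and represents the extra initial state $q_0$ of $N'$ by the hypothetical marker `Q0 = -1`, which is assumed to act like state 0. It only checks reachability; distinguishability is the subject of Lemma 18.

```python
from collections import deque
from itertools import combinations

Q0 = -1  # marker for the extra initial state q0 of N' (assumed to act like state 0)


def witness(n):
    """Transitions of the binary DFA A of Fig. 5, as described in the text (n >= 7)."""
    a = {0: n - 1, n - 1: 0, n - 2: 1}
    a.update({q: q + 1 for q in range(1, n - 2)})
    b = {0: 1, 1: 1, 2: 0, n - 1: 2, 3: n - 2}
    b.update({q: q - 1 for q in range(4, n - 1)})
    return {"a": a, "b": b}


def star_step(states, sym, delta, final):
    """Subset step of a star NFA: i -> {i.sym}, plus 0 if i.sym is final."""
    out = set()
    for i in states:
        j = delta[sym][0 if i == Q0 else i]
        out.add(j)
        if j in final:
            out.add(0)
    return frozenset(out)


def reachable_pairs(n):
    """Breadth-first search from ({0}, {q0}) in the product automaton D x D'."""
    delta = witness(n)
    final_n, final_np = {0, 1}, set(range(2, n))
    start = (frozenset({0}), frozenset({Q0}))
    seen, queue = {start}, deque([start])
    while queue:
        S, T = queue.popleft()
        for sym in "ab":
            nxt = (star_step(S, sym, delta, final_n),
                   star_step(T, sym, delta, final_np))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def corollary2_holds(n):
    """Is every ({0,1} | S, {0,1} | T) with S, T subsets of [3, n-2] reachable?"""
    seen = reachable_pairs(n)
    middle = list(range(3, n - 1))
    subsets = [frozenset(c) for k in range(len(middle) + 1)
               for c in combinations(middle, k)]
    base = frozenset({0, 1})
    return all((base | S, base | T) in seen for S in subsets for T in subsets)


for n in (7, 8):
    print(n, corollary2_holds(n), len(reachable_pairs(n)))
```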

Corollary 2. Let $L$ be the binary regular language accepted by the DFA $A$ shown in Fig. 5. Let $S, T \subseteq [3, n-2]$. Then the pair $(\{0, 1\} \cup S, \{0, 1\} \cup T)$ is reachable in the product automaton $D \times D'$ for $L^* \cap (L^c)^*$.

Proof. Let $S, T$ be two subsets of $[3, n-2]$. Then $S \setminus T$, $S \cap T$, and $T \setminus S$ are disjoint subsets of $[3, n-2]$. By Lemma 17, the pair $(\{0, 1\} \cup (S \setminus T) \cup (S \cap T), \{0, 1\} \cup (S \cap T) \cup (T \setminus S))$ is reachable, and since this pair is equal to $(\{0, 1\} \cup S, \{0, 1\} \cup T)$, the corollary follows. □

Hence we have $\frac{1}{256} \cdot 4^n$ reachable pairs in the product automaton. Now we prove distinguishability.

Lemma 18. Let $L$ be the binary language accepted by the DFA $A$ in Fig. 5. The pairs $(\{0, 1\} \cup S, \{0, 1\} \cup T)$ with $S, T \subseteq [3, n-2]$ are pairwise distinguishable in the product automaton $D \times D'$ for $L^* \cap (L^c)^*$. Proof. Consider the NFAs $N$ and $N'$ for the languages $L^*$ and $(L^c)^*$, respectively. We are going to show that for each state $i$ in $[3, n-2]$, there exist strings $u_i$ and $v_i$ such that in the NFA $N$, the string $u_i$ is accepted only from the state $i$, while $v_i$ is accepted from the state 0, and in the NFA $N'$, the string $v_i$ is accepted only from the state $i$, while $u_i$ is accepted from the state 0. This is enough to prove distinguishability: for two distinct pairs $(\{0, 1\} \cup S, \{0, 1\} \cup T)$ and $(\{0, 1\} \cup S', \{0, 1\} \cup T')$, either $S$ and $S'$ differ in a state $i$ in $[3, n-2]$, and then the string $u_i$ distinguishes the two pairs, or the sets $T$ and $T'$ differ in a state $i$ in $[3, n-2]$, and then the string $v_i$ distinguishes the two pairs. Recall that we have $n \geq 7$, thus $n-4 \geq 3$. First, let $w = ab^{n-4}$. Let $j \in [3, n-3]$. Notice that in the NFAs $N$ and $N'$, we have

$$\begin{aligned}
&\{0, 1\} \circ w = \{0, 1\} \circ ab^{n-4} = \{2, n-1\} \circ b^{n-4} = \{0, 1\}, && (10)\\
&\{0, 1\} \bullet w = \{0, 1\} \bullet ab^{n-4} = \{0, 2, n-1\} \bullet b^{n-4} = \{1\}, && (11)\\
&j \bullet w = j \bullet ab^{n-4} = \{0, j+1\} \bullet b^{n-4} = \{0, 1, j+1\}, && (12)\\
&(n-2) \bullet w = (n-2) \bullet ab^{n-4} = \{1\} \bullet b^{n-4} = \{1\}. && (13)
\end{aligned}$$

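Now, define the strings $u_i$ and $v_i$ by

$$u_i = b^{n-4}\, b^{i-2}\, a \qquad \text{and} \qquad v_i = b^{n-4}\, b^{i-3}\, w^{n-5}.$$

Let $i, j$ be states in $[3, n-2]$ with $i \neq j$. Then in the NFA $N$ (with final states 0 and 1), we have

$$\begin{aligned}
&i \xrightarrow{b^{n-4} b^{i-2}} \{n-2\} \xrightarrow{a} \{0, 1\},\\
&j \xrightarrow{b^{n-4} b^{i-2}} \{j'\} \xrightarrow{a} \{j'+1\}, \quad \text{where } j' = j \cdot b^{i-2} \neq n-2,\\
&\{0, 1, 2, n-1\} \xrightarrow{b^{n-4}} \{0, 1\} \xrightarrow{b^{i-2}} \{0, 1\} \xrightarrow{a} \{2, n-1\},\\
&\{0\} \xrightarrow{b^{n-4} b^{i-3}} \{0, 1\} \xrightarrow{w^{n-5}} \{0, 1\} \quad \text{by (10)}.
\end{aligned}$$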

Thus $u_i$ is accepted by $N$ only from the state $i$, and $v_i$ is accepted from the state 0. Next, in the NFA $N'$ (with final states $2, 3, \ldots, n-1$), we have

$$\begin{aligned}
&i \xrightarrow{b^{n-4}} \{0, 1, i\} \xrightarrow{b^{i-3}} \{0, 1, 3\} \xrightarrow{w^{n-5}} \{0, 1, n-2\} \quad \text{by (11) and (12)},\\
&j \xrightarrow{b^{n-4} b^{i-3}} \{0, 1, j'\} \xrightarrow{w^{n-5}} \{1\} \quad \text{by (11)–(13), since } j' = j \cdot b^{i-3} > 3,\\
&\{0, 1, 2, n-1\} \xrightarrow{b^{n-4}} \{1\} \xrightarrow{b^{i-3}} \{1\} \xrightarrow{w^{n-5}} \{1\} \quad \text{by (11)},\\
&\{0\} \xrightarrow{b^{n-4} b^{i-2}} \{1\} \xrightarrow{a} \{0, 2\}.
\end{aligned}$$

Thus $v_i$ is accepted by $N'$ only from the state $i$, and $u_i$ is accepted from the state 0. Hence the strings $u_i$ and $v_i$ have the desired properties, and the lemma follows. □

The results of this section show that the product automaton for $L^* \cap (L^c)^*$ has at least $\frac{1}{256} \cdot 4^n$ reachable and pairwise distinguishable states. Hence the state complexity of the boundary of binary languages is $\Omega(4^n)$.

Theorem 6 (Boundary: lower bound, $|\Sigma| = 2$). Let $n \geq 7$. Let $L$ be the binary regular language accepted by the DFA $A$ in Fig. 5. Then $\mathrm{sc}(L^* \cap (L^c)^*) \geq \frac{1}{256} \cdot 4^n$. □
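The separating strings of Lemma 18 can be tested directly against the acceptance conditions claimed in the proof. The sketch below assumes our textual encoding of the DFA of Fig. 5 (hypothetical helper names again). For every $i \in [3, n-2]$ it checks that $u_i$ is accepted in $N$ exactly from state $i$, that $v_i$ is accepted in $N'$ exactly from state $i$, and that $v_i$ (in $N$) and $u_i$ (in $N'$) are accepted from state 0.

```python
def witness(n):
    """Transitions of the binary DFA A of Fig. 5, as described in the text (n >= 7)."""
    a = {0: n - 1, n - 1: 0, n - 2: 1}
    a.update({q: q + 1 for q in range(1, n - 2)})
    b = {0: 1, 1: 1, 2: 0, n - 1: 2, 3: n - 2}
    b.update({q: q - 1 for q in range(4, n - 1)})
    return {"a": a, "b": b}


def star_run(states, word, delta, final):
    """Subset simulation of a star NFA (i -> {i.sym}, plus 0 if i.sym is final)."""
    current = frozenset(states)
    for sym in word:
        out = set()
        for i in current:
            j = delta[sym][i]
            out.add(j)
            if j in final:
                out.add(0)
        current = frozenset(out)
    return current


def lemma18_strings(n, i):
    """The strings u_i and v_i of Lemma 18, with w = a b^(n-4)."""
    w = "a" + "b" * (n - 4)
    u = "b" * (n - 4) + "b" * (i - 2) + "a"
    v = "b" * (n - 4) + "b" * (i - 3) + w * (n - 5)
    return u, v


def check_lemma18(n):
    delta = witness(n)
    final_n, final_np = {0, 1}, set(range(2, n))

    def accepts(final, q, word):
        return bool(star_run({q}, word, delta, final) & final)

    for i in range(3, n - 1):  # i in [3, n-2]
        u, v = lemma18_strings(n, i)
        if [q for q in range(n) if accepts(final_n, q, u)] != [i]:
            return False       # u_i must be accepted in N only from i
        if [q for q in range(n) if accepts(final_np, q, v)] != [i]:
            return False       # v_i must be accepted in N' only from i
        if not (accepts(final_n, 0, v) and accepts(final_np, 0, u)):
            return False       # v_i from 0 in N, u_i from 0 in N'
    return True


print(all(check_lemma18(n) for n in range(7, 13)))
```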


Fig. 6. The binary DFA with $F = \{0\}$ accepting a language $L$ with $\mathrm{sc}(L^* \cap (L^c)^*) = (n+1) \cdot 2^{n-2} + 1$.

7. The case when only the initial state is final

This section deals with DFAs in which only the initial state is final. The upper bound $(n+1) \cdot 2^{n-2} + 1$ for the boundary in this case is given in Lemma 4. Here we show that it is tight in the binary case. To this aim, let $n \geq 4$, and consider the binary language $L$ accepted by the DFA shown in Fig. 6. Since $a$ performs the cycle $(0, 1, \ldots, n-1)$, for every subset $S$ of $[0, n-1]$ we have $S \cdot a^n = S$. Since $b$ performs the cycle $(2, 3, \ldots, n-1)$ and maps 0 to 0, for every subset $S$ of $\{0\} \cup [2, n-1]$ we have $S \cdot b^{n-2} = S$. Moreover, if $0 \in S$, then $S \bullet b = S \cdot b$. Next, in the NFA $N'$, every state $i$ goes to $\{0, i+1\}$ by $a$, except for the state $n-1$, which goes to $\{0\}$ by $a$. Hence if $n-1 \notin S$, then $S \bullet a = \{0\} \cup S \cdot a$; otherwise $S \bullet a = S \cdot a$.

Lemma 19. Let $S \subseteq Q$. Then the pairs $(\{0\}, S)$ with $0 \in S$ and the pairs $(\{i\}, S)$ with $\{0, i\} \subseteq S$ are reachable in the product automaton for $L^* \cap (L^c)^*$. All these pairs are pairwise distinguishable. Proof. The proof of reachability is by induction on $n - |S|$, that is, from larger sets to smaller ones. The basis is $S = Q$. In the product automaton, we have $(\{0\}, \{q_0\}) \xrightarrow{b} (\{0\}, \{0\}) \xrightarrow{a^n} (\{0\}, Q) \xrightarrow{a^i} (\{i\}, Q)$. This proves the basis. Assume that the claim holds for every set of size $k+1$, and let $S$ be a set of size $k$. Let us show that each pair $(\{i\}, S)$ with $\{0, i\} \subseteq S$ is reachable. Consider five cases: (i) $i = 0$ and $0 \in S$. First assume that $1 \notin S$. Let $S' = S \cdot b^{n-3}$. Then $|S'| = |S|$, $0 \in S'$, and $1 \notin S'$. It follows that the pair $(\{0\}, \{1\} \cup S')$ is reachable by induction. This pair goes to $(\{0\}, S)$ by $b$. (ii) $i \geq 2$ and $\{0, i\} \subseteq S$. Assume that $1 \notin S$. Then $(\{i\}, S)$ is reached from $(\{i \cdot b^{n-3}\}, \{1\} \cup S \cdot b^{n-3})$ by $b$, and the latter pair is reachable by induction. (iii) $i = 0$ and $0 \in S$. Now let $1 \in S$. Then there is a state $t$ with $1 \leq t \leq n-2$ such that $S = [0, t] \cup T$ and $t+1 \notin T$; that is, the minimal state that is not in $S$ is $t+1$. Let $S' = S \cdot a^{n-t}$.
Then $|S'| = |S|$, $\{0, n-t\} \subseteq S'$, and $1 \notin S'$. Since $2 \leq n-t \leq n-1$, the pair $(\{n-t\}, S')$ is reachable by (ii). Notice that each set $S' \cdot a^j$ with $0 \leq j \leq t-1$ contains $n-1$. Therefore, the pair $(\{n-t\}, S')$ goes to $(\{0\}, S)$ by $a^t$. (iv) $i = 1$ and $\{0, 1\} \subseteq S$. Let $S' = S \cdot a^{n-1}$. Then $|S'| = |S|$ and $0 \in S'$. Moreover, we have $n-1 \in S'$. Therefore, the pair $(\{0\}, S')$ is reachable by (i) and (iii), and it goes to $(\{1\}, S)$ by $a$. (v) $i \geq 2$ and $\{0, 1, i\} \subseteq S$. Assume that each pair $(\{i-1\}, S'')$ with $|S''| = k$ and $\{0, i-1\} \subseteq S''$ is reachable; for $i = 2$ this holds by (iv), and for larger $i$ by (ii) and induction on $i$. Let $S' = S \cdot a^{n-1}$. Then $|S'| = |S|$, $\{0, i-1\} \subseteq S'$, and $n-1 \in S'$. Therefore, the pair $(\{i-1\}, S')$ is reachable, and it goes to $(\{i\}, S)$ by $a$. This proves reachability.

Now, let us show that all these pairs are pairwise distinguishable in the product automaton. The initial and final pair $(\{0\}, \{q_0\})$ goes to the non-final pair $(\{0\}, \{0\})$ by $b$ as well as by $ab$. On the other hand, every other final pair $(\{0\}, T)$ either contains a state from $[2, n-1]$ in its right component, or is equal to $(\{0\}, \{0, 1\})$. In the first case, it accepts $b$; in the second case, it accepts $ab$. Now, let $(\{p\}, S)$ and $(\{q\}, T)$ be two distinct reachable pairs with $S, T \subseteq Q$. If $p \neq q$, then the string $w = a^n a^{n-p}$ distinguishes the two pairs, since $(\{p\}, S)$ goes to the final pair $(\{0\}, Q)$ by $w$, while $(\{q\}, T)$ goes by $w$ to a pair $(\{q'\}, Q)$ with $q' \neq 0$, which is non-final. Finally, consider two distinct reachable pairs $(\{p\}, S)$ and $(\{p\}, T)$. Then $S$ and $T$ differ in a state $t$ with $t \in [1, n-1]$ and $t \neq p$. For each state $i$ in $[1, n-1]$, define the string $v_i$ by $v_1 = (ab^{n-2})^{n-2}$ and $v_i = b^{n-2} b^{n-i} (ab^{n-2})^{n-3}$ if $i \geq 2$. Then in the NFA $N'$, the string $v_i$ is accepted only from the state $i$, since $i \bullet v_i = \{0, n-1\}$ and $j \bullet v_i = \{0\}$ if $j \neq i$. Next, if $p \neq i$, then in the NFA $N$, the string $v_i$ is accepted from the state $p$, since $p \circ v_i = \{0\}$.
It follows that the string $v_t$ distinguishes the pairs $(\{p\}, S)$ and $(\{p\}, T)$. □

Hence we get the following result.

Theorem 7. Let $n \geq 4$. Let $L$ be a language over an alphabet $\Sigma$ such that in the minimal $n$-state DFA for $L$ only the initial state is final. Then $\mathrm{sc}(L^* \cap (L^c)^*) \leq (n+1) \cdot 2^{n-2} + 1$, and the bound is tight if $|\Sigma| \geq 2$. □

8. Conclusions

Let $f_k(n)$ be the state complexity of the boundary of regular languages over a $k$-letter alphabet, defined as $f_k(n) = \max\{\mathrm{sc}(L^* \cap (L^c)^*) \mid L \subseteq \Sigma^*, |\Sigma| = k, \mathrm{sc}(L) = n\}$. Using this notation, we can summarize our results in the following theorem.

Theorem 8. Let $n \geq 5$. Let $f_k(n)$ be the state complexity of the boundary of regular languages over a $k$-letter alphabet as defined above. Then


(i) $f_1(n) = (n-1)^2 + 1$;
(ii) $f_2(n) \in \Theta(4^n)$ and $f_3(n) \in \Theta(4^n)$;
(iii) $f_4(n) = \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 1$;
(iv) $f_k(n) = \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ if $k \ge 5$.

Table 1
The state complexity of the boundary of regular languages over 1-, 2-, 3-, 4-, and 5-letter alphabets.

n    f_1(n)   f_2(n)    f_3(n)     f_4(n)   f_5(n)
3    5        11        18         19       19
4    10       32        77         80       80
5    17       113       313        334      335
6    26       424       1289       1385     1386
7    37       1747      5490       5684     5685
8    50       ≥ 6872    ≥ 22705    23175    23176

Here $f_1(n) = (n-1)^2 + 1$ and $f_5(n) = \frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$.
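The closed forms in Theorem 8 can be cross-checked against the numbers in Table 1 with a few lines of arithmetic. The Python sketch below is only an illustration: it evaluates the formulas and compares them with values transcribed from Table 1; it does not compute the state complexity of any boundary. The helper names `f1`, `f4`, `f5`, `check_table`, and the `TABLE` dictionary are hypothetical. The entries $f_2(8)$ and $f_3(8)$ are only lower bounds in Table 1, so they are left out.

```python
# Illustrative sketch (hypothetical helper names): evaluate the closed forms of
# Theorem 8 and compare them with the values transcribed from Table 1.

def f1(n: int) -> int:
    """Theorem 8 (i), unary alphabet: (n - 1)^2 + 1."""
    return (n - 1) ** 2 + 1


def f5(n: int) -> int:
    """Theorem 8 (iv), k >= 5 letters: 3/8 * 4^n + 2^(n-2) - 2 * 3^(n-2) - n + 2."""
    if n < 2:
        raise ValueError("3/8 * 4^n is an integer only for n >= 2")
    return 3 * 4 ** n // 8 + 2 ** (n - 2) - 2 * 3 ** (n - 2) - n + 2


def f4(n: int) -> int:
    """Theorem 8 (iii), quaternary alphabet (n >= 5): one less than f5."""
    if n < 5:
        raise ValueError("Theorem 8 is stated for n >= 5")
    return f5(n) - 1


NS = [3, 4, 5, 6, 7, 8]
# Rows of Table 1; f2 and f3 are exact only for n = 3..7 (n = 8 gives lower bounds).
TABLE = {
    "f1": [5, 10, 17, 26, 37, 50],
    "f2": [11, 32, 113, 424, 1747],
    "f3": [18, 77, 313, 1289, 5490],
    "f4": [19, 80, 334, 1385, 5684, 23175],
    "f5": [19, 80, 335, 1386, 5685, 23176],
}


def check_table() -> None:
    for idx, n in enumerate(NS):
        assert TABLE["f1"][idx] == f1(n)
        assert TABLE["f5"][idx] == f5(n)
        # For n = 3 and n = 4 the quaternary value already meets the k >= 5 bound.
        assert TABLE["f4"][idx] == (f4(n) if n >= 5 else f5(n))
    for idx, n in enumerate(NS[:5]):
        # Each row of the table is non-decreasing in the alphabet size k.
        assert TABLE["f2"][idx] <= TABLE["f3"][idx] <= TABLE["f4"][idx] <= TABLE["f5"][idx]
        # Upper bound cited from Theorem 1: f3(n) <= 3/8 * 4^n.
        assert TABLE["f3"][idx] <= 3 * 4 ** n // 8


check_table()
print("Table 1 agrees with the closed forms of Theorem 8")
```

The monotonicity assertion only restates, for the listed numbers, the observation that the complexity of the boundary decreases as the alphabet shrinks; it is a consistency check, not a proof.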

Proof. In the unary case, the string $a$ must be either in $L$ or in $L^c$. Therefore one of $L^*$ or $(L^c)^*$ is equal to $a^*$, and the boundary is equal to the other. The state complexity of the star operation in the unary case is known to be $(n-1)^2 + 1$ [13]. By Theorem 6, we have $f_2(n) \ge \frac{1}{256} \cdot 4^n$ if $n \ge 7$. Next, $f_2(n) \le f_3(n)$, and $f_3(n) \le \frac{3}{8} \cdot 4^n$ by Theorem 1. Hence $\frac{1}{256} \cdot 4^n \le f_2(n) \le f_3(n) \le \frac{3}{8} \cdot 4^n$ for $n \ge 7$, which gives (ii). The tight bounds in (iii) and (iv) are given by Theorem 5 and Theorem 4. □

In this paper, we obtained tight bounds on the state complexity of the boundary of regular languages over 4- and 5-letter alphabets. Moreover, we proved an $\Omega(4^n)$ lower bound for the boundary in the binary case. We also studied the case when only the initial state is final. In this case we obtained the upper bound $(n+1) \cdot 2^{n-2} + 1$, and we showed that this bound can be met by a binary language.

We did some computations, which are summarized in Table 1. Notice that if $n = 3$ or $n = 4$, then the upper bound $\frac{3}{8} \cdot 4^n + 2^{n-2} - 2 \cdot 3^{n-2} - n + 2$ can be met by quaternary languages. The computations show that the complexity of the boundary decreases as the size of the alphabet decreases. The tight bounds in the binary and ternary cases remain open, and obtaining them seems to be a very hard problem.

References
[1] J.A. Brzozowski, E. Grant, J. Shallit, Closures in formal languages and Kuratowski's theorem, Internat. J. Found. Comput. Sci. 22 (2011) 301–321.
[2] J.H. Fife, The Kuratowski closure-complement problem, Math. Mag. 64 (1991) 180–182.
[3] G. Jirásková, J. Shallit, The state complexity of star-complement-star, in: H.-C. Yen, O.H. Ibarra (Eds.), DLT 2012, LNCS, vol. 7410, Springer, Heidelberg, 2012, pp. 380–391.
[4] S. Konstantinidis (Ed.), Conference on Implementation and Application of Automata, LNCS, vol. 7982, 2013, pp. 208–219.
[5] C. Kuratowski, Sur l'opération Ā de l'analysis situs, Fund. Math. 3 (1922) 182–199.
[6] A.N. Maslov, Estimates of the number of states of finite automata, Sov. Math., Dokl. 11 (1970) 1373–1375.
[7] M. Rabin, D. Scott, Finite automata and their decision problems, IBM J. Res. Develop. 3 (1959) 114–129.
[8] A. Salomaa, K. Salomaa, S. Yu, State complexity of combined operations, Theoret. Comput. Sci. 383 (2007) 140–152.
[9] J. Shallit, Open problems in automata theory and formal languages, https://cs.uwaterloo.ca/~shallit/Talks/open10r.pdf.
[10] J. Shallit, State complexity of $((L^*)^c)^*$ and $L^* \cap (L^c)^*$, Personal communication, 2010.
[11] M. Sipser, Introduction to the Theory of Computation, PWS Publishing Company, Boston, 1997.
[12] S. Yu, Regular languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, vol. I, Springer, Heidelberg, 1997, pp. 41–110, Ch. 2.
[13] S. Yu, Q. Zhuang, K. Salomaa, The state complexity of some basic operations on regular languages, Theoret. Comput. Sci. 125 (1994) 315–328.