On the teaching complexity of linear sets

On the teaching complexity of linear sets

JID:TCS AID:11396 /FLA Doctopic: Algorithms, automata, complexity and games [m3G; v1.226; Prn:8/12/2017; 12:57] P.1 (1-17) Theoretical Computer Sci...

1003KB Sizes 2 Downloads 32 Views

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.1 (1-17)

Theoretical Computer Science ••• (••••) •••–•••

Contents lists available at ScienceDirect

Theoretical Computer Science www.elsevier.com/locate/tcs

On the teaching complexity of linear sets Ziyuan Gao a,∗ , Hans Ulrich Simon b , Sandra Zilles a a b

Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada Horst Görtz Institute for IT Security and Faculty of Mathematics, Ruhr-Universität Bochum, D-44780 Bochum, Germany

a r t i c l e

i n f o

Article history: Available online xxxx Keywords: Teaching complexity Teaching dimension Recursive teaching dimension Linear sets

a b s t r a c t Linear sets are the building blocks of semilinear sets, which are in turn closely connected to automata theory and formal languages. Prior work has investigated the learnability of linear sets and semilinear sets in three models – Valiant’s PAC-learning model, Gold’s learning in the limit model, and Angluin’s query learning model. This paper considers teacher– learner models of learning families of linear sets, in which a benevolent teacher presents a set of labelled examples to the learner. First, we study the classical teaching model, in which a teacher must successfully teach any consistent learner. Second, we will apply a generalisation of the recently introduced recursive teaching model to several infinite classes of linear sets, and show that thus the maximum sample complexity of teaching these classes can be drastically reduced compared to classical teaching. To this end, a major focus of the paper will be on determining two relevant teaching parameters, the teaching dimension and recursive teaching dimension, for various families of linear sets. © 2017 Elsevier B.V. All rights reserved.

1. Introduction A mathematical notion that has found numerous applications in Computer Science over the last five decades is the notion of semilinear set. A semilinear set is a set of vectors (m-tuples) of nonnegative integers following a specific structure. Consider for example fixed m-tuples c = (1, 0, 0 . . . , 0), p 1 = (2, 2, 0, . . . , 0), p 2 = (0, 0, 1, . . . , 1), for m ≥ 3, and the set L of vectors that can be expressed as a sum c + n1 · p 1 + n2 · p 2 , where n1 , n2 are nonnegative integers. L consists of all vectors of the shape (k1 , . . . , km ), where k1 is any odd positive integer, k2 = k1 − 1, and k3 = . . . = km are any nonnegative integers independent of k1 and k2 . The set L is then called a linear set with constant c and periods p 1 and p 2 . A linear set may have any finite number of periods, not just two as in the example above, and the components of the constant and the periods can be any nonnegative integers. A semilinear set is now simply a union of finitely many linear sets. Semilinear sets generalise the class of ultimately periodic sets of integers to higher dimensions and have applications in formal language theory and database theory. One of the earliest and most important results on the connection between semilinear sets and context-free languages is Parikh’s theorem [17], which states that any context-free language is mapped to a semilinear set via a function known as the Parikh vector1 of a string. Another interesting result, due to Ibarra [14], characterises semilinear sets in terms of reversal-bounded multicounter machines. Moving beyond abstract theory, semilinear sets have also recently been applied in the fields of DNA self-assembly [6] and membrane computing [15].

*

Corresponding author. E-mail addresses: [email protected] (Z. Gao), [email protected] (H.U. Simon), [email protected] (S. Zilles). 1 For a fixed order (a1 , . . . , am ) over a given finite alphabet  = {a1 , . . . , am }, the Parikh vector of a finite string s over  is the vector (q1 , . . . , qm ) ∈ Nm 0, where qi is the number of occurrences of the symbol ai in s. https://doi.org/10.1016/j.tcs.2017.11.024 0304-3975/© 2017 Elsevier B.V. All rights reserved.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.2 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

2

Considering the variety of areas in which semilinear sets play a role, it is not surprising that the machine learning community has shown interest in the learnability of such sets in various formal models. Learning of semilinear sets has been investigated in Valiant’s PAC-learning model [1], Gold’s learning in the limit model [22], and Angluin’s query learning model [22]. Abe [1] showed that when the integers are encoded in unary, the class of semilinear sets of dimension 1 or 2 is polynomially PAC-learnable; on the other hand, the question as to whether classes of semilinear sets of higher dimensions are PAC-learnable is open. Takada [22] established that for any fixed dimension, the family of linear sets is learnable in the limit from positive examples but the family of semilinear sets is not learnable in the limit from only positive examples. In addition, Takada showed the existence of a learning procedure via restricted subset and restricted superset queries that identifies any semilinear set and halts; however, he proved at the same time that any such algorithm must necessarily be time consuming. He also proved that for any variable dimension, if for any unknown linear set L and any conjectured semilinear set U , queries as to whether or not L ⊆ U can be made, then the class of linear sets is learnable in polynomial time of the minimum size of representations (of each linear set) and the dimension. Due to the complexity of the class of semilinear sets in general, several of the positive learnability results in the literature concern the special case of linear sets; further it is common to deal with restrictions on the dimensionality m. The present work is primarily concerned with classes of linear sets of nonnegative integers and pairs of integers, i.e., the cases when m = 1 and when m = 2. We study the learnability of such classes in settings different from those studied by Abe or Takada, namely teaching settings, in which the information presented to the learner consists of labelled examples2 that are selected by a benevolent source called the teacher. After the teacher presents a set S of labelled examples, the learner makes a conjecture as to the identity of the target concept. If the learner’s conjecture is correct, then it is said to have successfully learnt the target concept. All teaching and learning models discussed in the present work involve a protocol between the teacher and learner. A protocol maps a concept class C to a teacher–learner pair; for any c ∈ C and any set S of labelled instances over the universe, the teacher maps c to a set of labelled instances and the learner maps S to some c  ∈ C . As an illustration, consider the teaching set protocol [11,12,19]. According to this protocol, for any given c ∈ C , the teacher selects a smallest possible set S c of labelled instances such that c is the only concept in C consistent with S c while the learner maps any set S of labelled instances to some c  ∈ C that is consistent with S. Zilles, Lange, Holte and Zinkevich [24] later designed a protocol – the recursive teaching protocol – that only exploits an inherent hierarchical structure of any concept class. The RTD of a concept class is the maximum sample complexity derived by applying the recursive teaching protocol to the class. Our interest in the RTD partly stems from the fact that this parameter has a number of surprising connections to other central notions in computational learning theory, namely to the VC-dimension and to sample compression schemes [4,5,20]. In addition, the RTD possesses several regularity properties and has been fruitfully applied to the analysis of pattern languages [9,16], which, when defined over unary alphabets, exhibit a correspondence to linear sets. When studying the sample complexity of concept learning from teachers, a core question is to determine the number of labelled examples that are needed in the worst case to teach any concept in a given concept class; the present paper deals with exactly this question for various classes of linear sets. Two combinatorial parameters (and some variants) will be used to measure the sample complexity, namely the teaching dimension (TD) [11,19], which refers to the sample complexity of the teaching set protocol mentioned above, and the recursive teaching dimension (RTD) [24], which measures the sample complexity of the recursive teaching protocol. In some cases, teaching with positive examples, i.e., using only m-tuples contained in the target set, appears to be a natural strategy. In order to study how restrictive such a choice of examples is, we compare the TD and RTD parameters to the corresponding parameters referring to settings when only examples labelled with “+” are allowed. We denote these parameters by TD+ and RTD+ , respectively. Our main results can be summarised as follows.

• We determine the exact values or almost matching upper and lower bounds of the TD, TD+ , RTD and RTD+ for various • •

• •

2

classes of linear sets (see Table 1 and Section 2.1 for the definitions of these classes). For many classes of linear sets considered in the present work, the RTD is significantly smaller than the TD (Table 1). We show that there are natural classes of linear sets that cannot be taught in the recursive setting using only positive examples while the RTD is finite; these examples illustrate how supplying negative information may sometimes be indispensable to successful teaching and learning. We demonstrate that the RTD+ is closely related to the notions of weak and strong spanning sets. In particular, the RTD+ of a concept class C is upper bounded by the size of the largest minimum strong spanning set over all L ∈ C , and lower bounded by the size of the largest minimum weak spanning set over all L ∈ C (Lemma 24). For intersectionclosed concept classes such as the class of linear sets, weak spanning sets are also strong spanning sets and vice versa (Lemma 25). Coinfinite linear subsets of N0 with constant 0 do not have finite teaching sets (Corollary 15). The minimum teaching set of any cofinite linear subset L of N0 with constant 0 may be characterised by means of a partial order defined with respect to the set of periods of L (Corollary 16).

m A labelled example for a linear set L over Nm 0 is a pair ((a1 , . . . , am ), ) ∈ N0 × {+, −} where  = + if (a1 , . . . , am ) ∈ L and  = − otherwise.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.3 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

3

Table 1 Summary of results on TD, TD+ , RTD and RTD+ . Class

TD

TD+

CF–LINSET1 CF–LINSET2 CF–LINSET3 CF–LINSETk , k ≥ 5 LINSET1 LINSET2 LINSET3 LINSET NE–LINSETk , k ≥ 1 NE–LINSET LINSET2=2 S–LINSETm,n , m ≥ 1, n ≥ 2

0 (Theorem 22) 3 (Theorem 22) 5 (Theorem 22) ∞ (Theorem 22(IV)) ∞ (Remark 18) ∞ (Remark 18) ∞ (Remark 18) ∞ (Remark 18) ∞ (Remark 43) ∞ (Remark 43) ∞ (Theorem 45) ∞ (Theorem 34)

0 (Theorem ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark ∞ (Remark

22) 23) 23) 23) 23) 23) 23) 23) 23) 23) 23) 23)

RTD

RTD+

0 (Theorem 22) 2 (Theorem 30) {2, 3} (Theorem 30) {k − 1, k} (Theorem 30) 1 (Theorem 32(II)) 3 (Theorem 32(II)) {3, 4, 5} (Theorem 32(II)) ∞ (Corollary 31) {k − 1, k, k + 1} (Theorem 42) ∞ (Remark 43) {3, 4} (Theorem 45) ∞ (Theorem 34)

0 2 3 k 1

(Theorem 22) (Theorem 30) (Theorem 30) (Theorem 30) (Theorem 32(II)) ∞ (Theorem 32(I)) ∞ (Theorem 32(I)) ∞ (Theorem 32(I)) k + 1 (Theorem 41) ∞ (Remark 43) ∞ (Theorem 45) ∞ (Theorem 34)

• To ease the analysis of recursive teaching of linear sets, we exhibit a visually intuitive graph-theoretic interpretation of the RTD (Section 3), which may be of general interest beyond the scope of the present paper. Our results are of relevance not only from a computational learning theory perspective, but also from a formal language theory perspective. They uncover a number of structural properties of linear sets, especially in the one-dimensional case, which could be applied to study formal languages via the Parikh vector function. Consider, for example, the set L (π ) of all words obtained by substituting nonempty strings over {a} for variables in some nonempty string π of symbols chosen from {a} ∪ X , where X is an infinite set of variables. L (π ) is a special case of a non-erasing pattern language [2]. As will be seen later, the Parikh vector maps L (π ) to a linear subset L of the natural numbers such that the sum of L’s periods does not exceed the constant associated to L. Thus one could determine various teaching complexity measures of any non-erasing pattern language from the teaching complexity measures of a certain linear subset of the natural numbers. One may also investigate the teaching properties of parallel computation models via the Parikh vector function, which maps equal matrix languages [21] and weakly persistent Petri nets [23] to semilinear sets. This paper is an expanded version of an earlier conference paper [10]. 2. Preliminaries

N0 denotes the set of all nonnegative integers and N denotes the set of all positive integers. For each integer m ≥ 1, let m Nm 0 = N0 × . . . × N0 (m times). N0 is regarded as a subset of the vector space of all m-tuples of rational m numbers over the rational numbers. For each r ∈ N, [r ] denotes {1, . . . , r }. For any v = (a1 , . . . , am ) ∈ Nm i =1 ai . 0 will denote 0 , define v 1 = the zero vector in Nm when there is no possibility of confusion. 0 2.1. Linear sets m A subset L of Nm 0 is said to be linear iff there exist an element c and a finite subset P of N0 such that L = c + P := {q : q = c + n1 p 1 + . . . + nk pk , ni ∈ N0 , p i ∈ P }. c is called the constant and each p i is called a period of c + P . Denote 0 + P by P . For any linear set L, if L = c + P , then (c , P ) is called a representation of L. Any finite P ⊂ Nm 0 is independent iff for   all P   P , it holds that P  = P . A representation (c , P ) of a linear set L is canonical iff P is independent. A linear subset of Nm 0 will also be called a linear set of dimension m. { p 1 , . . . , p k } will often be written as p 1 , . . . , p k and d p 1 , . . . , p k will denote dp 1 , . . . , dpk . Our paper will focus on the linear subsets of N0 . The main classes of linear sets investigated are denoted as follows. In these definitions, k ∈ N.

(i) (ii) (iii) (iv) (v) (vi)

3 4

LINSETk := { P : P ⊂ N ∧ 1 ≤ | P | ≤ k}.  LINSET := k∈N LINSETk . CF–LINSETk := { P : P ⊂ N ∧ gcd( P ) = 1 ∧ 1 ≤ | P | ≤ k}.3  CF–LINSET := k∈N CF–LINSETk .  NE–LINSETk := {c + P : c ∈ N0 ∧ P ⊂ N0 ∧ | P | ≤ k ∧ p ∈ P p ≤ c }.4  NE–LINSET := k∈N NE–LINSETk .

CF stands for “cofinite.” NE stands for “non-erasing.”

JID:TCS AID:11396 /FLA

4

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.4 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

Note that the classes in items (I) to (IV) exclude singleton linear sets; the reason for this omission will be explained later. The motivation for studying each subfamily in items (III) to (VI) will be explained as it is introduced in the forthcoming sections. 2.2. Teaching dimension and recursive teaching dimension The two main teaching parameters studied in this paper are the teaching dimension and the recursive teaching dimension. Let L be a family of subsets of some set X , where X is called the instance space. Any set L ∈ L is called a concept. Let L ∈ L and T be a subset of X × {+, −}. Furthermore, let T + (resp. T − ) be the set of elements x ∈ X for which (x, +) ∈ T (resp. (x, −) ∈ T ). X ( T ) is defined to be T + ∪ T − . A subset L of X is said to be consistent with T iff T + ⊆ L and T − ∩ L = ∅. T is a teaching set for L w.r.t. L iff T is consistent with L and for all L  ∈ L \ { L }, T is not consistent with L  . Every element of X × {+, −} is known as a labelled example. Most of the discussion in this paper focuses on the case where X = Nm 0 for some m ∈ N, but some results presented below also hold for more general choices of X . Definition 1. [11,19] Let L be any family of subsets of X . Let L ∈ L. The size of a smallest teaching set for L w.r.t. L is called the teaching dimension of L w.r.t. L, denoted by TD( L , L). The teaching dimension of L is defined as sup{TD( L , L) : L ∈ L} and is denoted by TD(L). If there is a teaching set for L w.r.t. L that consists of only positive examples, then the positive teaching dimension of L w.r.t. L is defined to be the smallest possible size of such a set, and is denoted by TD+ ( L , L). If there is no teaching set for L w.r.t. L that consists of only positive examples, then TD+ ( L , L) is defined to be ∞. The positive teaching dimension of L is defined as sup{TD+ ( L , L) : L ∈ L}. Another complexity parameter recently studied in computational learning theory is the recursive teaching dimension. It refers to the maximum size of teaching sets in a series of nested subfamilies of the family. Definition 2. (Based on [24,16].) Let L be any family of subsets of X . A teaching sequence for L is any sequence R = ((F0 , d0 ), (F1 , d1 ), . . .) where (i) the families Fi form a partition of L and each Fi is nonempty, and (ii) di = sup{TD( L , L \ 0≤ j
JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.5 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

5

Proof. Assertion (I). Let F0 = {{x} : x ∈ N0 } and F1 = {∅}. Note that for all L ∈ F0 , TD( L , L) = 1 (since {(x, +)} is a teaching set for {x} w.r.t. L for all x ∈ N0 ) and TD(∅, L \ F0 ) = 0 (since the only member of L \ F0 is the empty set). Hence ((F0 , 1), (F1 , 0)) is a positive teaching sequence for L. Assertion (II). Any teaching set for ∅ must be a subset of N = {(x, −) : x ∈ N0 }. Since any finite subset of N is consistent with some concept { y }, TD(∅, L) = ∞. Hence there is no teaching plan for L and so RTD1 (L) = RTD+ 1 (L) = ∞. In particular, TD+ (L) = TD(L) = ∞. 2 The following proposition sums up several useful properties of the teaching complexity measures introduced so far; their proofs are almost immediate from Definitions 1, 2 and 3. Proposition 5. Let L be any family of sets over an instance space. (i) (ii) (iii) (iv) (v)

If TD( L , L) = ∞ for some L ∈ L, then RTD1 (L) = ∞. If TD+ ( L , L) = ∞ for some L ∈ L, then RTD+ 1 (L) = ∞.  (Monotonicity) For all L ⊆ L and K ∈ {TD, TD+ , RTD, RTD+ , RTD1 , RTD+ 1 }, K (L ) ≤ K (L). inf{TD( L , L) : L ∈ L} ≤ RTD(L) and inf{TD+ ( L , L) : L ∈ L} ≤ RTD+ (L). TD(L) ≤ TD+ (L), RTD(L) ≤ RTD+ (L) and RTD1 (L) ≤ RTD+ 1 (L).

3. A graph-theoretic characterisation of teaching This section shall consider the teaching of infinite families in a general setting. To streamline the proofs of some results in the present paper, particularly for upper bounds on the RTD and its variants, it will be useful to adopt a graph-theoretic formulation of the teaching complexity measures introduced so far. We will show that the RTD has a rather visually intuitive graph-theoretic interpretation. Given a family L of sets over an instance space, let T be a mapping L → T ( L ) that assigns a set T ( L ) of labelled examples to every concept L ∈ L such that the labels in T ( L ) are consistent with L. Define the digraph induced by T as the graph G = ( V G , A G ), where the nodes of G are identified with the members L of L, i.e., V G = L, and a pair ( L  , L ) ∈ L × L is included in A G iff L  is consistent with T ( L ). The order of T , denoted by ord( T ), is defined as sup L ∈L | T ( L )| (possibly ∞). Define the depth of a node v in G as the length of the longest path ending in v (or as ∞ if the paths ending in v can become arbitrarily long). The weight of a node v in G is defined as the number of vertices w ∈ V G such that there exists a path from w to v in G. T is said to satisfy the finite-depth condition (resp. the finite-weight condition) if every node v in G has finite depth (resp. finite weight). Say that the mapping L → T ( L ) with L ranging over all members of L and the labels of T ( L ) consistent with L is RTD-admissible for L iff | T ( L )| < ∞ for all L ∈ L, the digraph G induced by T is acyclic, and G satisfies the finite depth condition; say that T is RTD1 -admissible for L iff T is RTD-admissible and the digraph G induced by T satisfies the finiteweight condition.5 T is RTD+ -admissible for L (RTD+ 1 -admissible for L) iff T is RTD-admissible for L (RTD1 -admissible for L) and for all L ∈ L, T ( L ) consists of only positive examples. If T is RTD-admissible for L, then it will often simply be called RTD-admissible if L is clear from the context. We define RTD+ -admissibility, RTD1 -admissibility and RTD+ 1 -admissibility analogously. The first proposition of this section establishes a firm connection between RTD-admissibility and teaching sequences, as well as between RTD1 -admissibility and teaching plans. Theorem 6. Given any family L over an instance space, let T be a mapping L → T ( L ) that assigns a set T ( L ) of labelled examples to every L ∈ L, such that L is consistent with T ( L ). (i) T is RTD-admissible  iff there exists a partition of L into L0 , L1 , . . . such that, for all i and all L ∈ Li , it holds that T ( L ) is a teaching set for L w.r.t. j ≥i L j . (ii) Let L be finite or countably infinite. T is RTD1 -admissible iff there exists an ordering L 1 < L 2 < . . . of L such that for all i, it holds that T ( L i ) is a teaching set for L i w.r.t. { L j : j ≥ i }. Proof. Assertion (I). Let G = ( V G , A G ) be the digraph induced by T . Suppose that T is RTD-admissible. For all i, define Li = { v ∈ V G : depth( v ) = i }. Since every node of G has a finite depth and G is acyclic, {Li : i ∈ N0 } is a partition of V G . Furthermore, suppose for a contradiction that there are i , j and L 1 ∈ Li , L 2 ∈ L j such that i ≤ j and L 2 is consistent with T ( L 1 ). Then the depth of L 1 must be at least j + 1 > i and so L 1 ∈ / Li , a contradiction. Conversely, suppose that there  exists a partition of L into L0 , L1 , . . . such that, for all i and all L ∈ Li , it holds that T ( L ) is a teaching set for L w.r.t. j ≥i L j . Note that for any distinct v 1 , v 2 ∈ V G such that v 1 ∈ Li and v 2 ∈ L j , if ( v 1 , v 2 ) ∈ A G ,  so that v 1 is consistent with T ( v 2 ), then i < j. (This follows from the condition that T ( v 2 ) is a teaching set for v 2 w.r.t. l≥ j Ll .) This implies that G cannot

5

Note that if G satisfies the finite-weight condition, then it also satisfies the finite-depth condition.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.6 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

6

contain any cycle. Now consider any v ∈ V G such that v ∈ Li . By the above argument, for any v 1 ∈ Li 1 with ( v 1 , v ) ∈ A G , i 1 < i. Applying this argument inductively, it follows that the length of any path ending in v is at most i. Assertion (II). Suppose there exists an ordering L 1 < L 2 < . . . of L such that for all i, it holds that T ( L i ) is a teaching set for L i w.r.t. { L j : j ≥ i }. By the proof of Assertion (I), T is RTD-admissible. Furthermore, since T ( L i ) is a teaching set for L i w.r.t. almost all concepts in L, there are at most finitely many concepts in L that are consistent with T ( L i ), so that L i has finite weight. Conversely, suppose that T is RTD1 -admissible. Define the relation R T = {( L , L  ) ∈ L × L : ( L = L  ) ∧ L is consistent with T ( L  )}, and let trcl( R T ) be the transitive closure of R T . It follows from the acyclicity of the digraph G = ( V G , A G ) induced by T that trcl( R T ) is asymmetric. Furthermore, by the finite-weight condition on G, every element of L has only finitely many predecessors w.r.t. trcl( R T ). Hence (L, trcl( R T )) is a countable strict partial order on L such that every element of L has finitely many predecessors. By [7, Theorem 1.3(1)], one can extend (L, trcl( R T )) to a linear order (L, ≺) such that every element of L has finitely many predecessors (w.r.t. ≺). In particular, there is a linear ordering L 1 ≺ L 2 ≺ . . . of L. Pick any i ∈ N. Since, for all L  ∈ L with L  = L i ,

L  is consistent with T ( L i ) → ( L  , L i ) ∈ trcl( R T ) → L  ≺ L i , it follows that T ( L i ) is a teaching set for L i w.r.t. { L j : j ≥ i }.

2

Corollary 7. For all finite or countably infinite families L of sets over an instance space, K (L) = inf({ord( T ) : T is K -admissible for L ∨ ord( T ) = ∞}), where K ∈ {RTD, RTD+ , RTD1 , RTD+ 1 }. Note that for any finite family L of sets over an instance space X and any mapping T : L → P ( X × {+, −}) such that the labels in T ( L ) are consistent with L for all L ∈ L, T is RTD1 -admissible (and hence also RTD-admissible). The next corollary is a consequence of this observation. Corollary 8. For all finite families L of sets over an instance space, RTD(L) = RTD1 (L). A family L of subsets of X is said to have finite thickness [3] iff for every x ∈ X , the class { L ∈ L : x ∈ L } is finite. Note that finite thickness is a sufficient condition for families that do not contain the empty set to have a teaching plan with positive examples.

/ L. There is a mapping L → T ( L ) that Proposition 9. Let L be any family of subsets of X such that L has finite thickness and ∅ ∈ + + assigns a set of positive examples to every L ∈ L such that T is RTD+ 1 -admissible; furthermore, RTD1 (L) = RTD (L). + / L and L has the finite thickness property. RTD+ Proof. Let L be any family of subsets of X such that ∅ ∈ 1 (L) ≥ RTD (L) follows from Proposition 5(V). First, it is shown that there is a mapping L → T ( L ) that is RTD+ 1 -admissible. As L has finite thickness, it is at most countable and so there is a one–one enumeration L 0 , L 1 , L 2 , . . . of L. By the finite thickness of L, there are only finitely 0 m many sets L 00 , . . . , L m 0 ∈ L such that min( L 0 ) ∈ L 0 ∩ . . . ∩ L 0 . (Note that L 0 = ∅ by hypothesis.) For each i ∈ {0, . . . , m}, define j

j

T ( L 0i ) = {( y , +) : y ∈ S i }, where S i is the smallest subset of L 0i such that min( L 0 ) ∈ S i and S i  L 0 for all L 0  L 0i . Inductively, assume that T ( L ik ) has been defined for all k ≤ l. Define ik+1 to be the least index such that L ik+1 ∈ / { L i 0 , . . . , L il }, and for all L j with min( L ik+1 ) ∈ L j such that T ( L j ) has not been defined yet, define T ( L j ) in a similar way to the case when L ik+1 = L 0 . Let G = ( V G , A G ) be the digraph induced by T . Note that T partitions V G into finite sets such that no edge connects any two vertices belonging to distinct partitions, and so T satisfies the finite-weight condition. Furthermore, G is acyclic as ( L , L  ) ∈ A G only if L is a proper superset of L  and each partition induced by T is finite. + It remains to show that RTD+ 1 (L) ≤ RTD (L). By the preceding argument, there is a mapping L → T ( L ) for all L ∈ L such that T is RTD+ -admissible. Let T be an RTD+ -admissible mapping for L such that ord( T ) = RTD+ (L), and let G = ( V G , A G ) be the digraph induced by T . If L is finite, then each L ∈ V G has only finitely many predecessors, implying that T is + + also RTD+ 1 -admissible (so that RTD1 (L) ≤ RTD (L)). Suppose that L were infinite. Note that ord( T ) ≥ 1, for otherwise the infinite digraph G would be cyclic, contradicting the RTD-admissibility of T . Thus one can define a mapping T  such that for all L ∈ L,

T  (L) =



T (L) {(min( L ), +)}

if T ( L ) = ∅; if T ( L ) = ∅.

Note that ord( T  ) = ord( T ) = RTD+ (L). Furthermore, T  is RTD+ -admissible and by the finite thickness of L, T  satisfies the + +  finite weight condition. Hence T  is RTD+ 1 -admissible and so RTD1 (L) ≤ ord( T ) = RTD (L). 2 Example 10. Let L = {{x} : x ∈ N0 } be a family of sets over the instance space N0 . Note that L has finite thickness (as every x ∈ N0 belongs to exactly one singleton concept) and RTD+ (L) = 1.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

[m3G; v1.226; Prn:8/12/2017; 12:57] P.7 (1-17)

7

The next proposition provides a necessary condition for any family to have a teaching sequence with positive examples. This condition will be used later to establish the non-existence of positive teaching sequences for some families of linear sets. Proposition 11. Let L be a family of subsets of X that has at least one positive teaching sequence. Let L ∈ L. Then there does not exist any infinite descending chain H 0  H 1  H 2  . . . such that { H 0 , H 1 , H 2 , . . .} ⊆ L and L  H i for each i. Proof. Suppose there is some L ∈ L for which there exists an infinite descending chain H 0  H 1  H 2  . . . with { H 0 , H 1 , H 2 , . . .} ⊆ L and L  H i for each i. Assume by way of a contradiction that ((L0 , d0 ), (L1 , d1 ), . . .) were a positive teaching sequence for L. Suppose L ∈ Li for some i. Note that for all j ∈ {0, . . . , i }, L  H j implies that H j ∈ Lk j for some k j < i. Further, for all j ∈ {0, . . . , i − 1}, since H j +1  H j , it must hold that k j < k j +1 . This contradicts the fact that 0 ≤ k j < i for all j ∈ {0, . . . , i }. 2 To conclude this section, we remark that if L has an RTD-admissible mapping, then a protocol for teaching and learning can be defined in an analogous way to the recursive teaching protocol for finite families [24]. First, define the canonical teaching sequence for L to be the sequence R can = ((L0 , d0 ), (L1 , d1 ), . . .) such that for all i,

Li = { L ∈ L \

 j
L j : TD( L , L \



L j ) < min({RTD(L) + 1, ∞})}.

j
If L is RTD-admissible, then for every L ∈ L , there exists a finite sequence (Q0 , . . . , Qm ) of subsets of L such that Qm = { L } and for all j ≤ m and all L ∈ Q j , TD( L , L \ j  < j Q j  ) < min({RTD(L) + 1, ∞}), so that L ∈ Ll for some unique l. This implies  that i ∈N0 Li = L. Furthermore, it follows by construction that ord( R can ) ≤ RTD(L), and therefore ord( R can ) = RTD(L). Given any L ∈ L, the teacher uses the canonical teaching sequence R can for L to determine the unique index i with L ∈ Li .  The teacher then presents the examples in a minimal teaching set for L w.r.t. L \ j 1. (ii) If H ⊆ P , then the following hold: (a) H ⊆ P . (b) The partial ordering ≤ P is a refinement of the partial ordering ≤ H ; that is, for any x, y, x ≤ H y implies x ≤ P y. (c) MIN P ∩ H ⊆ MIN H . (iii) Let s = a1 p 1 + . . . + ar p r ∈ P and let I = {i ∈ [r ] : ai = 0}. Then p i ≤ P s for each i ∈ I . This implies that MIN P ⊆ P .

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.8 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

8

(iv) If p , p  ∈ P and p < P p  , then p  is superfluous, i.e., P = P \ { p  } . (v) If P is independent, then P = MIN P and P is a spanning set of P in the sense that P is the unique smallest set of LINSET containing P . Moreover, every subset of P of cardinality less than | P | spans a proper subset of P . For the rest of this section, it will always be assumed that P is independent. We now study teaching sets. Lemma 14. (i) Let T be a teaching set for P ∈ LINSET w.r.t. CF–LINSET. Then P ⊆ T + . (ii) Let T be a teaching set for P ∈ LINSETk−1 w.r.t. LINSETk . Then, for each x ∈ N \ P , there exists y ∈ T − such that x ≤ P y. This assertion remains valid with CF–LINSET in place of LINSET. (iii) For any finite set P and k ∈ N with P ∈ CF–LINSETk , let T be a teaching set for P w.r.t. CF–LINSETk . Then | T + | ≥ k. + T − ∩ P = ∅. Therefore, Proof.  Assertion (I). Since  the labelling in T is consistent with P , it follows that T ⊆ P and   T + ⊆ P so that T + is consistent with T . Suppose for the sake of a contradiction that p ∈ / T + for some p ∈ P . Thus T +   is a proper subset of P that is consistent with T . There is a sufficiently large prime q such that if P  = T + ∪ {q}, then P       is consistent with T , gcd( P ) = 1, and p ∈ / P ; thus P ∈ CF–LINSET. This is a contradiction since we assumed T to be a teaching set for P w.r.t. CF–LINSET. Assertion (II). Pick an arbitrary but fixed x ∈ N \ P . We have to show that T − contains some element y such that   x ≤ P y. Let P  = P ∪ {x}. Clearly, P is a proper subset of P  and P  is consistent with T + . Since T is a teaching set     for P w.r.t. LINSETk  and P , P  ∈ LINSETk , it follows that T − contains an element y ∈ P  \ P . Thus y can be written in the form y = ax + p ∈ P a( p ) p for some properly chosen a ∈ N and a( p ) ∈ N0 . It follows that x ≤ P y, as desired. Since gcd( P ) = 1 implies that gcd( P  ) = 1, the same argument holds with CF–LINSET in place of LINSET. + + Assertion  +  (III). Let T be a teaching set for some P ∈ CF–LINSETk w.r.t. CF–LINSETk . Suppose that | T | < k. If gcd( T +) = 1, then T is a proper subset of P that is consistent with T and belongs to CF–LINSETk , which is impossible. If gcd( T ) > 1   or | T + | = 0, then, since P is cofinite, there is a sufficiently large prime p such that CF–LINSETk  T + ∪ { p }  P and   + T ∪ { p } is consistent with T , a contradiction. 2

Corollary 15. If gcd( P ) > 1 and k = 1 + | P |, then TD( P , LINSETk ) = ∞. Proof. According to Lemma 13, N \ P contains an infinite ascending chain x1 < P x2 < P x3 < P . . .. According to the second assertion in Lemma 14, a teaching set for P must contain infinitely many elements of this chain. 2 Corollary 16. If gcd( P ) = 1, then the set T ( P ) given by T ( P )+ = P and T ( P )− = MAX P is the unique smallest teaching set for P w.r.t. LINSET and also w.r.t. CF–LINSET. Proof. Suppose that H is consistent with T ( P ). We show that H = P (implying that T ( P ) is a teaching set for P ). Since P ⊆ H (by consistency), it follows that P ⊆ H . Pick an arbitrary but fixed element x from N \ P . Recall that N \ P is finite. Thus, by the definition of MAX P , there must exist an element y ∈ MAX P such that x ≤ P y. From x ≤ P y and P ⊆ H , we may conclude that x ≤ H y. The number y ∈ MAX P cannot belong to H (because H is consistent with T ( P )). Now, x ≤ H y implies that x ∈ / H . Thus, H does not contain any element outside P . It follows that P = H . Thus, T ( P ) is a teaching set for P w.r.t. LINSET, indeed. According to Lemma 14, any other teaching set must contain T ( P ) as a subset. According to Lemma 14, any other teaching set must contain T ( P ) as a subset even when P is taught w.r.t. CF–LINSET. 2 The next example shows that Lemma 14.1 does not hold in general for the restricted class LINSETk . Example 17. For any k > 1, let P = {k, k + 1, . . . , 2k − 1}. Then T = {(x, −) : x < k} ∪ {(k + i , +) : 1 ≤ i ≤ k + 1} is a teaching set for P w.r.t. LINSETk . Proof. Suppose that L ∈ LINSETk is consistent with T . Now L must be uniquely generated by an independent set H of size at most k. Since (k + i , +) ∈ T for all i ∈ {1, . . . , k + 1} and y ∈ / L for all y < k, k ∈ H holds. Thus P ⊆ L ⊆ P , and so L = P . 2 m + − Remark 18. If T is a teaching set for L  ⊆ Nm 0 w.r.t. L, then for any c ∈ N0 , c + T = {(c + x, +) : x ∈ T } ∪ {(c + y , −) : y ∈ T } is a teaching set for c + L  w.r.t. L[n] = {c + L : L ∈ L}. Thus Lemma 14 and Corollaries 15 and 16 may be readily generalised, mutatis mutandis, to the class LINSET[c ] = {c + L : L ∈ LINSET} for any c ∈ N0 . The proof of [16, Theorem 6] provides a construction that may be slightly modified to show that TD(LINS–ET1 ) = ∞ even though TD( q , LINSET1 ) < ∞ for any q > 0. By the monotonicity of TD, TD(LINSET) = TD(LINSETk ) = ∞ for all k > 0.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.9 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

9

By Corollary 15, LINSET contains infinitely many members that have an infinite TD w.r.t. LINSET. Thus for any L ⊆ LINSET, it may be difficult to interpret a value of ∞ for TD(L): (1) on the one hand, all cofinite subclasses of L may have an infinite TD w.r.t. L; (2) on the other hand, there may be a cofinite subclass of L that has a finite TD w.r.t. L. Intuitively, it seems that L in Case (2) is unteachable in a weaker sense than in Case (1), but the TD makes no such distinction. It shall be shown, however, that the RTD is a bit more well-behaved when applied to LINSET. In particular, for all L ⊂ LINSET, RTD(L, LINSET) grows only linearly with sup{min( P ) : P ∈ L ∧ min( P ) > 0}. We will also give a finer analysis of LINSETk for k ∈ {1, 2, 3}, showing that while LINSET2 does not have any positive teaching sequence, RTD(LINSETk ) < ∞ for k ∈ {1, 2, 3}. In addition, RTD(LINSETk ) grows at least linearly in k, implying that RTD(LINSET) = ∞. The question as to whether RTD(LINSETk ) < ∞ for any k > 3 remains open. First, the following proposition explains why the singleton {0} was excluded from the definition of LINSET. The proof is quite similar to that of Proposition 44, which will be proven later. Proposition 19. RTD({{0}}, {{0}} ∪ LINSET1 ) = ∞. Proof. Assume for a contradiction that R = ((L0 , d0 ), (L1 , d1 ), . . .)were a teaching subsequence for {{0}} ∪ LINSET1 covering {{0}}. Suppose {0} ∈ Li , and let T be a teaching set for {0} w.r.t. j ≥i L j . Choose N > max({d j : j < i }) so that N is greater than x for every x ∈ X ( T ). Further, let p 0 , . . . , p N +i −1 be a strictly increasing sequence of N + i primes. Observe that by  N +i −1 the choice of N, L 0 = j =0 p j is consistent with T and so it must belong to L j 0 for some j 0 < i. For any two distinct    ( N + i − 1)-subsets S , S  of { p 0 , . . . , p N +i −1 }, L 0  y∈ S y and y∈ S y ∩ y∈ S  y ⊆ L 0 . Thus by the choice of N, there  must exist some ( N + i − 1)-subset S 1 of { p 0 , . . . , p N +i −1 } so that L 1 = y ∈ S 1 y ∈ L j 1 for some j 1 < j 0 . By iterating a  similar argument, one can show that for some N-subset S  of { p 0 , . . . , p N +i −1 }, y ∈ S  y ∈ L0 . But the teaching dimension  of y ∈ S  y w.r.t. {{0}} ∪ LINSET1 is at least N > d0 , a contradiction. 2 Proposition 19 should be contrasted with the observation that RTD(LINSET1 ) = 1: (( 1 , 1), ( 2 , 1), . . .) is a teaching plan with positive examples for LINSET1 , where the ith linear set in the plan is i and {(i , +)} is a teaching set for i w.r.t. { j : j ≥ i }. The next theorem shows that for any finite L ⊂ LINSET, RTD(L, LINSET) < ∞; in fact, for any L ⊂ LINSET, RTD(L, LINSET) is at most linear in sup{min( P ) : P ∈ L ∧ min( P ) > 0}. Theorem 20. Let Fn = { P : P is independent ∧ min( P ) ≤ n}. Then RTD(Fn , LINSET) ≤ 2n − 1. Proof. We shall use the following two facts: (1) if P = { p 1 , . . . , pk } ⊆ N0 is independent, p 1 = min P , and U ⊆ N0 is a set whose elements are pairwise incomparable w.r.t. ≤ P , then for all i = 1, . . . , k, the elements of the sequence (u mod p i )u ∈U are distinct, which implies that |U | ≤ p 1 ; (2) if P = { p 1 , . . . , pk } ⊆ N0 is independent, p 1 = min( P ) and d = gcd( P ), then k = | P | ≤ p 1 and |MAX P | ≤ p 1 − 1. Proof of (1). If u 1 ≡ u 2 (mod p i ) for some u 1 , u 2 ∈ U with u 1 < u 2 , then u 2 − u 1 = ap i for some a ∈ N0 and so u 1 ≤ P u 2 , which is impossible by the definition of U . Proof of (2). Note that P = MIN P and MAX P both are sets whose elements are pairwise incomparable w.r.t. ≤ P . Moreover, no element of MAX P is a multiple of p 1 (which rules out 0 as the smallest residue). (2) then follows from (1). Let P range over finite independent subsets of N0 . We shall show that the mapping P → T ( P ) given by T ( P )+ = P and T ( P )− = MAX P is RTD-admissible for L. It suffices to show that the digraph  G = (L, A ) induced by the mapping

P → T ( P ) is acyclic and every node P ∈ L has a finite depth. Suppose that ( P  , P ) ∈ A. It follows from the con  struction of G that P  is consistent with T ( P ). The consistency with T ( P )+ = P implies that d = gcd( P  ) is a divisor of d = gcd( P ). It suffices to show that d is a proper divisor of d since this implies that G is acyclic and that the depth of   d = gcd( P ). Suppose for sake of contradiction that d = d so that  P  is bounded by the number of prime power divisors of P , P ⊆ d · N0 . Now we may argue as follows. Since ( P  , P ) ∈ A, the two linear sets do not coincide so that we may  pick a point u from their symmetric difference. Since both sets are subsets of d · N0 , there exists  linear    u ∈ N such that u = du  . But then u  is in the symmetric difference of (1/d) P  and (1/d) P . On the other hand, since P  is consistent with   T ( P ), it follows that (1/d) P  is consistent with (1/d) P labelled “+” and (1/d)MAX P = MAX(1/d) P labelled “−”. According to Corollary 16, (1/d) · P labelled “+” and MAX(1/d) P labelled “−” is a teaching set for (1/d) P w.r.t. L, a contradiction. Finally, observe that from Facts (1), (2), (3), the RTD-admissibility of T for L, and the condition that min( P ) ≤ n for all

P ∈ Fn with P independent, one can find a teaching subsequence for L covering Fn such that the order of this teaching subsequence is at most n + (n − 1) = 2n − 1. 2 Theorem 30 (to be shown later) will imply that the order of any teaching sequence for LINSET must necessarily be infinite. Nonetheless, the preceding theorem shows roughly that the growth of RTD(L, LINSET) with L is relatively modest if the minimum positive periods of all L ∈ L vary only linearly. The next series of results will present a detailed study of CF–LINSETk for each k and CF–LINSET, which comprises all linear sets P such that P ( = ∅) is a finite subset of N and gcd( P ) = 1. By Proposition 12, this is precisely the class of

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.10 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

10

cofinite linear subsets of N0 with constant 0. Since LINSETk is a union of classes of linear sets, each of which is isomorphic to CF–LINSETk , it is hoped that investigating the teaching complexity of CF–LINSETk may lead to some insights into the question of whether RTD(LINSETk ) is finite for each k. CF–LINSET is also perhaps interesting in its own right: on the one hand, the teaching dimension of CF–LINSETk is finite for k ≤ 3 but infinite for k ≥ 5; on the other hand, for all k, CF–LINSETk has a relatively simple teaching sequence that gives it an RTD+ of k. We shall give an almost complete analysis of TD(CF–LINSETk ) for all k; the case k = 4 is left open. First, we make explicit the connection between the present work and the theory of numerical semigroups [18]. The sets in the family CF–LINSET are called numerical semigroups in the literature (see, for example, [18]). Let S = P be a numerical semigroup. Denote by PF( S ) the set of pseudo-Frobenius numbers of S. It is well-known [18] that PF( S ) coincides with the set of maximal elements of N0 \ S w.r.t. the following partial ordering:

a P b ⇔ b − a ∈ S. Note that the partial ordering ≤ P is a refinement of  P . It follows that maximality w.r.t. ≤ P implies maximality w.r.t.  P ; that is, MAX P ⊆ PF( S ). The following result presents a sufficient condition for MAX P = PF( S ) to hold. Lemma 21. Let S = P ∈ CF–LINSET. Then the following holds:

(∀a ∈ N0 )((a ∈ PF( S ) ∧ {2a, 3a} ⊂ S ) → (a ∈ MAX P )). In particular, if 2a, 3a ∈ S for all a ∈ PF( S ), then MAX P = PF( S ). Proof. Let a ∈ PF( S ). We show that if {2a, 3a} ⊂ S, then a ∈ MAX P ; in other words, for all y ∈ N0 \ P such that a ≤ P y, it holds that a = y. Consider any y ∈ N0 \ P with a ≤ P y. There are m ∈ N and x ∈ P such that y = ma + x. If m ≥ 2, then, since {2a, 3a} ⊆ S, it would follow that ma + x ∈ S, contradicting the choice of y. Thus m = 1, and so a  P y. The condition a ∈ PF( S ) then yields y = a. 2 Theorem 22. (i) (ii) (iii) (iv)

TD(CF–LINSET1 ) = 0; TD(CF–LINSET2 ) = 3; TD(CF–LINSET3 ) = 5; for each k ≥ 5, TD(CF–LINSETk ) = ∞.

Proof. Assertion (I). Note that CF–LINSET1 = { 1 }. The empty set is a teaching set for 1 w.r.t. CF–LINSET1 . Assertion (II). We first prove the upper bound. Note that N0 is the only member of CF–LINSET2 that is generated by one number. {(1, +)} is a teaching set for N0 w.r.t. CF–LINSET2 , and so TD(N0 , CF–LINSET2 ) = 1. Now consider L = p 1 , p 2 , where gcd( p 1 , p 2 ) = 1. It is well-known [18, Example 2.22] that PF( S ) = { p 1 p 2 − p 1 − p 2 }, and so it follows from Lemma 21 that MAX P = { p 1 p 2 − p 1 − p 2 }. According to Corollary 16, the set T ( P ) given by T + = P and T − = MAX P is a teaching set w.r.t. LINSET. Since this teaching set consists of two positive examples and one negative example, TD(CF–LINSET2 ) ≤ 3. For the lower bound, choose primes p 1 , p 2 , p 3 such that 2 < p 1 < p 2 < p 3 . Note that 2, p 1 p 2 p 3  2, p 1 p 2  2, p 1 is a chain in CF–LINSET2 . Thus if T is any teaching set for 2, p 1 p 2 w.r.t. CF–LINSET2 such that | T | ≤ 2, then T must contain exactly one positive example (x1 , +) and exactly one negative example (x2 , −). Choose any prime p > max({x1 , x2 , 2, p 1 p 2 }). Then x1 , p is consistent with T but 2, p 1 p 2 = x1 , p . Hence | T | ≥ 3. Assertion (III). We first show that TD(CF–LINSET3 ) ≤ 5. Let P = { p 1 , p 2 , p 3 } such that gcd( P ) = 1 and let S = P . It is well-known [18, Corollary 10.22] that S has at most two pseudo-Frobenius numbers. Hence |MAX P | ≤ 2. By Corollary 16, the set {(a, +), (b, +), (c , +)} ∪ {( y , −) : y ∈ MAX P }, which has size at most 5 by the preceding argument, is a teaching set for P w.r.t. CF–LINSET3 . Combining the preceding result with Corollary 16 and the inequality TD( P , CF–LINSET3 ) ≤ TD( P , LINSET) gives the upper bound of 5. To show that TD(CF–LINSET3 ) ≥ 5, consider the set E = 3, 10, 14 . Note that {3, 10, 14} is independent. By Lemma 14.3, any teaching set for E w.r.t. CF–LINSET3 must contain at least 3 positive examples. In addition, note that E = 3, 5 ∩ 3, 7 . Thus T must contain at least one negative instance ( y 1 , −) to distinguish E from 3, 5 and at least one negative instance ( y 2 , −) with y 2 = y 1 to distinguish E from 3, 7 , and so T contains at least two negative examples. Therefore | T | ≥ 5. due to J. Backelin [8]. Consider the sequence ( S n,r ) of numerical semigroups such Assertion (IV). We adapt a construction   that each S n,r is given by S n,r = P n,r and P n,r = {s, s + 3, s + 3n + 1, s + 3n + 2}, where n ≥ 2, r ≥ 3n + 2 and s = r (3n + 2) + 3. As was stated in [8], the set of Frobenius numbers of S n,r satisfies

PF( S n,r ) ⊇ M := {(r + 1)s + i : −2 ≤ i ≤ 3n − 1 ∧ i ≡ 0 (mod 3)}; Note that |MAX P n,r | ≥ | M | = 2n + 2 grows with n to infinity. We claim that for each a ∈ M, we have {2a, 3a} ⊂ S n,r . First, it is shown that for each a ∈ M, 2a ∈ S n,r . Note that for any i ∈ {−2, . . . , 3n − 1} with i ≡ 0 (mod 3), it holds that 2i ≡ 0 (mod 3). Consider any i ∈ {−2, . . . , 3n − 1} such that 2i ≡ 1 (mod 3) and 2i ≥ 3n + 1. Then there exists a

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.11 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

11

j ≤ n − 1 so that s + 3n + 1 + js + 3 j = ( j + 1)s + 3n + 1 + 3 j = ( j + 1)s + 2i. Since 2r + 1 − j ≥ 0, it follows that (2r + 1 − j )s + ( j + 1)s + 2i = (2r + 2)s + 2i ∈ S n,r . A similar argument applies if 2i ≡ 2 (mod 3). To see that 3a ∈ S n,r , consider again any i so that i ≤ 3n − 1. Then 3r + 3 − i ≥ 0, and therefore (3r + 3 − i )s + i (s + 3) = (3r + 3)s + 3i ∈ S n,r . Thus {2a, 3a} ⊂ S n,r for any a ∈ M. Consequently, by Lemma 21, M ⊆ MAX P n,r . Note that |MAX P n,r | ≥ | M | = 2n + 2 grows with n to infinity. According to the second assertion in Lemma 14, a teaching set for S n w.r.t. CF–LINSETk must contain all elements of MAX P n,r as negative examples if k ≥ 5. Thus TD(CF–LINSETk ) = ∞ holds for all k ≥ 5. 2 Remark 23. Note that for any family L such that there are L 1 , L 2 ∈ L with L 1 ⊂ L 2 , TD+ ( L 1 , L) = TD+ (L) = ∞. In particular, TD+ (L) = ∞ for any L ∈ {LINSETk , CF–LINSETk } whenever k ≥ 1 and k ≥ 2. To establish the next main result, we shall first relate the RTD+ to certain general structural properties of families of sets over an instance space. Recall that a family L of subsets of an instance space X is said to be intersection-closed iff for all L 1 , L 2 ∈ L, L 1 ∩ L 2 ∈ L. It is readily seen that LINSET and CF–LINSET are intersection-closed families. Say that S is a weak spanning set for L w.r.t. L iff S ⊆ L but S  L  for any L  ∈ L such that L  ⊂ L.6 That is, L contains S but there is no proper subset of L in L that contains S as well. Say that S is a strong spanning set for L w.r.t. L iff S ⊆ L but S ⊆ L  for some L  ∈ L can only happen if L ⊂ L  . In other words, L contains S but the members of L do not contain S unless they are supersets of L. Let I W ( L , L) denote the size of the minimum weak spanning set for L w.r.t. L (possibly ∞) and let I W (L) = sup L ∈L I W ( L , L) (again possibly ∞). Let I S ( L , L) and I S (L) be the corresponding notions for strong spanning sets. Note that I W and I S are monotonic, that is,

L ⊆ L ⇒ I W (L ) ≤ I W (L) and I S (L ) ≤ I S (L). Say that L contains infinite ascending chains if there exists a sequence ( L i )i ≥0 of sets L i ∈ L such that L i ⊂ L i +1 for all i ≥ 0. Lemma 24. Let L be any family of sets over an instance space. (i) RTD+ (L) ≥ I W (L). (ii) Suppose that L does not contain any infinite ascending chain. Then RTD+ (L) ≤ I S (L). (iii) Suppose that L does not contain any set L that has infinitely many distinct proper supersets in L. Then RTD+ 1 (L) ≤ I S (L). Proof. Assertion (I). Let L ∈ L. In any positive teaching sequence S for L and any L  , L  ∈ L such that L  ⊂ L  , L  must be taught before L  . Thus the positive teaching set S for L that is used in S must distinguish L from all its proper subsets in L. Thus S is a weak spanning set for L w.r.t. L. Consequently, RTD+ ( L , L) ≥ I W ( L , L) and so RTD+ (L) ≥ I W (L). Assertion (II). Let S be a mapping that assigns a minimum strong spanning set S ( L ) for L w.r.t. L. to every L ∈ L. S ( L ) distinguishes L from any L  ∈ L except for supersets L  ⊇ L. It follows that the digraph G S induced by S contains only arcs ( L  , L ) ∈ L × L such that L ⊂ L  . G S is acyclic, and since L does not contain any infinite ascending chain, it follows that every node in G S has a finite depth. Thus S is RTD-admissible and RTD+ (L) ≤ I S (L). Assertion (III). Given the stronger condition that L does not contain any set L that has infinitely many distinct proper supersets in L, the conclusion that RTD+ (L) ≤ I S (L) can be strengthened to RTD+ 1 (L) ≤ I S (L). Under this stronger condition, the mapping S is even RTD1 -admissible. 2 Lemma 25. If L is intersection-closed, then every weak spanning set is a strong one, so that I W (L) = I S (L). Proof. Let S be a weak spanning set for L w.r.t. L. Consider any L  ∈ L such that S ⊆ L  . The intersection-closed property of L implies that S ⊆ L ∩ L  ∈ L. By the definition of a weak spanning set, L ∩ L  cannot be a proper subset of L. Thus L ∩ L  = L and so L ⊆ L  . Therefore S is a strong spanning set for L w.r.t. L. 2 Lemma 26. Let L = P ∈ LINSETk . Then P is a strong spanning set for L w.r.t. LINSET (and thus also w.r.t. LINSETk ). Moreover, there is no minimum weak spanning set for L w.r.t. LINSETk other than P (and thus no minimum weak spanning set for L w.r.t. LINSET other than P ). Proof. P ⊆ L  ∈ LINSET implies that L = P ⊆ L  . Thus P is a strong spanning set for L w.r.t. LINSET. Now let S ⊆ L = P be a weak spanning set for L w.r.t. LINSETk , so that | S | ≤ | P | = k and S ⊆ L = P . Since S distinguishes L from any L  ∈ LINSETk such that L  ⊂ L, it follows that S = L. This is only possible if P ⊆ S, and since | S | ≤ | P |, S = P . 2 Corollary 27. I W (LINSETk ) = I S (LINSETk ) = k. 6

A finite weak spanning set for L w.r.t. L is also known in the computational learning theory literature as a tell-tale set for L w.r.t. L [3].

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.12 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

12

Lemma 28. I W (CF–LINSETk ) = I S (CF–LINSETk ) = k. Proof. Let P be an independent set such that | P | = k and gcd( P ) = 1. Let L = P . We claim that I W ( L |CF–LINSETk ) ≥ k. Consider any set S with | S | ≤ k − 1. It suffices to show that there is a proper subset L  ∈ CF–LINSETk of L that contains S. If gcd( S ) = 1, set L  = S . Suppose that gcd( S ) > 1. Choose any prime p ∈ L \ S and set L  = S ∪ { p } . Note that such a prime exists because L is cofinite. Further, gcd( S ∪ { p }) = 1, so that S ∪ { p } ∈ CF–LINSETk . 2





Lemma 29. Let Lk = { k, p 1 , . . . , pk−1 : p i ∈ {k + i , 2k + i }}. Then TDmin (Lk ) ≥ k − 1. Proof. If k + i is neither given as a positive example nor as a negative example, then, even if the set P  = P \ { p i } is known, it is impossible to distinguish between the numerical semigroup generated by P  ∪ {k + i } and the numerical semigroup generated by P  ∪ {2k + i }. 2 Theorem 30. For all k ≥ 1, RTD(CF–LINSETk ) ∈ {k − 1, k} and RTD+ (CF–LINSETk ) = RTD+ (CF–LINSETk ) = k. Moreover, RTD(CF–LINSET2 ) = 2. Proof. We first prove RTD+ (CF–LINSETk ) = k. Every L ∈ CF–LINSETk is cofinite and thus has only finitely many supersets. Combining Assertions (I) and (III) of Lemma 24 with Lemma 28 gives RTD+ (CF–LINSETk ) ≥ I W (CF–LINSETk ) = k and RTD+ 1 (CF–LINSETk ) ≤ I S (CF–LINSETk ) = k. The lower bound k − 1 for RTD(CF–LINSETk ) follows from k − 1 ≤ TDmin (Lk ) ≤ RTD(Lk ), where Lk is as defined in Lemma 29. The upper bound follows from RTD(CF–LINSETk ) ≤ RTD+ (CF–LINSETk ) = k. To see that RTD(CF–LINSET2 ) ≥ 2, consider the subfamily S = { p 1 , p 2 : p 1 = p 2 ∧ { p 1 , p 2 } is independent ∧ gcd( p 1 , p 2 ) = 1} and observe that RTD(CF–LINSET2 ) ≥ RTD(S ) ≥ min({ T D ( L , S ) : L ∈ S }) ≥ 2. 2 Corollary 31. TD(CF–LINSET) = RTD(CF–LINSET) = RTD+ (CF–LINSET) = RTD(LINSET) = ∞. For each k ∈ {1, 2, 3}, the result on TD(CF–LINSETk ) may be directly applied to construct a teaching sequence of finite order for LINSETk . Theorem 32. (i) LINSET2 does not have any positive teaching sequence. (ii) RTD+ (LINSET1 ) = RTD(LINSET1 ) = 1, RTD(LINSET2 ) = 3 and 3 ≤ RTD(LINSET3 ) ≤ 5. Proof. Assertion (I). Let p 1 , p 2 , p 3 , . . . be a strictly increasing infinite sequence of primes with p 1 > 2. Note that for all j,

2  2, p 1 . . . p j . Further, 2, p 1  2, p 1 p 2  2, p 1 p 2 p 3  . . .  2, p 1 . . . p j  2, p 1 . . . p j p j +1  . . . is an infinite descending chain in LINSET2 . Thus by Proposition 11, LINSET2 does not have a positive teaching sequence. Assertion (II). Earlier, after Proposition 19, we described a teaching plan with positive examples of order 1 for LINSET1 . Here, we will define an RTD-admissible mapping T : LINSET2 → N0 × {+, −} such that ord( T ) ≤ 3; an RTD-admissible mapping T  : LINSET3 → N0 × {+, −} of order no more than 5 can be constructed analogously. For each P ∈ LINSET2 such that P is independent, let T ( L )+ = P and T ( L )− = MAX P . One can argue as in Theorem 20 that T is RTD-admissible. The proof of Assertion (II) in Theorem 22 shows that | P | + |MAX P | ≤ 3. Thus by Corollary 7, RTD(LINSET2 ) ≤ 3. To see that RTD(LINSET2 ) ≥ 3, assume by way of a contradiction that R = ((L0 , d0 ), (L1 , d1 ), (L2 , d2 ), . . .) were a teaching ≤2 sequence of order at most 2 for LINSET0 . Choose the minimum i such that Li contains some linear set m with m ∈ N. Fix such an m and let T be a teaching set for m w.r.t. j ≥i L j . Now choose i + 1 numbers p 0 , p 1 , . . . , p i with max( X ( T )) < p 0 < p 1 < . . . < p i such that gcd(m, pl ) = 1 for all l ∈ {0, 1, . . . , i } and m, p 0 ⊃ m, p 1 ⊃ . . . ⊃ m, p i . Note that for all l ∈ {0, 1, . . . , i }, m, pl is consistent with T . Thus m, p i ∈ L j 0 for some j 0 < i. We claim that m, p i −1 ∈L j 1 for some j 1 < j 0 . Suppose not. Then, since m, p i −1 ⊃ m, p i ⊃ m and j 0 < i, any teaching set T  for m, p i w.r.t. j ≥ j 0 L j must contain at least one positive example (m , +) and one negative example. Since ord( R ) ≤ 2, (m , +) must be the only positive example in T  . Moreover, as m is consistent with T  , it must hold that m ∈ L i  for some i  < j 0 < i. But this contradicts the assumption that i is the least index l such that Ll contains a linear set with exactly one generator. Thus m, p i −1 ∈ L j 1 for some j 1 < j 0 . Repeating a similar argument for m, p i −2 , . . . , m, p 0 gives that m, p 0 , m, p 1 , . . . , m, p i belong to L j i , L j i−1 , . . . , L j 1 , L j 0 respectively for some j 0 , j 1 , . . . , j i ∈ N0 such that j i < j i −1 < . . . < j 0 < i, which is impossible. Therefore no such teaching sequence R for LINSET2 exists, and so RTD(LINSET2 ) ≥ 3. 2 A semilinear set is a finite union of linear sets. Semilinear sets are objects of interest in models of parallel computation such as matrix grammars and Petri nets; for example, the Parikh image of an equal matrix language [21], as well as any reachability set of a weakly persistent Petri net [13], is semilinear. Unfortunately, the next result shows that even the class of semilinear sets that are unions of at most two linear sets generated by one period has infinite RTD.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.13 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

13

Definition 33. Let m, n ∈ N. The class of semilinear sets { P 1 ∪ . . . ∪ P n : 0 < | P i | ≤ m} is denoted by S–LINSETm,n . Theorem 34. For all m, n ∈ N with n ≥ 2, RTD(S–LINSETm,n ) = ∞. Proof. By the monotonicity of RTD, it suffices to show that RTD(S–LINSET1,2 ) = ∞. Suppose there were some mapping T : S–LINSET1,2 → N0 × {+, −} such that T is RTD-admissible; T induces an acyclic digraph G = ( V G , A G ). For all L ∈ S–LINSET1,2 , let v L be the node in G representing L. Suppose that depth( v 2 ) = i < ∞. Choose N > max({max( X ( T ( 2 ))), sup{| T ( L )| : L ∈ S–LINSET1,2 ∧ depth( V L ) ≤ i }}). Note that such an N can be found because {| T ( L )| : L ∈ S–LINSET1,2 } is bounded. Consider the semilinear set S L 1 = 2 ∪ p 1 · . . . · p N +i , where, for all j ≥ 2, p j −1 is the jth prime number. Note that since S L 1 is consistent with , depth

T ( 2 ) ( v S L 1 ) = i − 1. Furthermore, given any two distinct ( N + i − 1)-subsets S 1 , S 2 of { p 1 , . . . , p N +i },

2 ∪



∩ 2 ∪



= S L 1 . Since N > sup{| T ( L )| : L ∈

 S–LINSET1,2 ∧ depth( V L ) ≤ i }, there is an ( N + i − 1)-subset S of { p 1 , . . . , p N +i } such that S L 2 = 2 ∪ j ∈ S p j has depth less than i − 1. Repeating a similar argument i times produces some semilinear set S L i +1 that has depth less than 0, a contradiction. 2 j∈ S 1

pj

j∈ S 2

pj

5. Linear subsets of N0 with bounded period sums The present section will examine a special family of linear subsets of N0 that arises from studying an invariant property of the class of non-erasing pattern languages over varying unary alphabets. Prior work has established the values of TD and RTD for various families of non-erasing pattern languages over alphabets of arbitrary size [9,16]. Recall that the commutative image, or Parikh image, of w ∈ {a}∗ is the number of times that a appears in w, or the length of w. Thus the commutative k k image of the language generated by a non-erasing pattern ak0 x11 . . . xnn is the linear subset (k0 + k1 + . . . + kn ) + k1 , . . . , kn of N0 , which is in NE–LINSET. Conversely, any L ∈ NE–LINSET is the commutative image of a non-erasing pattern language. This gives a one-to-one correspondence between the class of non-erasing pattern languages and the class NE–LINSET, so that the two classes have equivalent teachability properties. We will establish in this section that the RTD+ of NE–LINSETk is exactly k + 1. Our first observation is a consequence of Proposition 9; note that for all k, NE–LINSETk has finite thickness and ∅ ∈ / NE–LINSETk . Proposition 35. RTD+ (NE–LINSETk ) = RTD+ 1 (NE–LINSETk ). The following lemma gives an upper bound on RTD+ (NE–LINSETk ). Lemma 36. RTD+ (NE–LINSETk ) ≤ k + 1.



NE–LINSETk [c  ]. ConProof. It suffices to show that there is a teaching plan of order k + 1 for NE–LINSETk [c ] w.r.t. c  ≥c   sider an arbitrary L = c + P ∈ NE–LINSETk [c ]. The example (c , +) distinguishes L from all sets in c  >c NE–LINSETk [c ]. Thus the presentation of (c , +) collapses the teaching problem to teaching P w.r.t. LINSETk [c ]. It suffices therefore to show that RTD+ (LINSETk [c ]) ≤ k. We may conclude from the third assertion of Lemma 24 that RTD+ (LINSETk [c ]) ≤ I S (LINSETk [c ]). From a monotonicity argument and an application of Corollary 27, we get I S (LINSETk [c ]) ≤ I S (LINSETk ) = k. Putting everything together, the inequality RTD+ (NE–LINSETk ) ≤ k + 1 is obtained. 2 We will show that the upper bound in Lemma 36 is tight. The following notion is central to our derivation of the matching lower bound. Definition 37. Let k, N ≥ 1 be integers. Say that a set L ∈ NE–LINSET is (k, N )-special if it is of the form L = N + P such that the following hold: 1. P is an independent set of cardinality k and gcd( P ) = 1. 2. Let a1 = min( P ) and, for r = 0, . . . , a1 − 1, let

tr ( P ) = min({s ∈ P : s ≡ r (mod a1 )}). a1 −1 Then N > a21 + r = 0 t r ( P ). The following result shows that we need at least k positive examples in order to distinguish a (k, N )-special set from all its proper subsets in NE–LINSETk [ N ]. Lemma 38. If L ∈ NE–LINSET is (k, N )-special, then I W ( L , NE–LINSETk [ N ]) ≥ k.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.14 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

14

Proof. Suppose that L = N + P is of the form as described in Definition 37. For the sake of simplicity, we will write tr instead of tr ( P ). Assume by way of contradiction that there is a weak spanning set S of size k − 1 for L w.r.t. NE–LINSETk [ N ], say S = { N + x1 , . . . , N + xk−1 } for integers xi ≥ 1. For i = 1, . . . , k − 1, let r i = xi (mod a1 ) ∈ {0, 1, . . . , a1 − 1}. It follows that each xi is of the form xi = qi a1 + tri for some integer q i ≥ 0. Let X = {x1 , . . . , xk−1 }. If X = {a2 , . . . , ak }, then S ⊆ N + a2 , . . . , ak – an immediate contradiction since N + a2 , . . . , ak belongs to NE–LINSETk [ N ] and is a proper subset of L = N + P . We may therefore assume that X = {a2 , . . . , ak }. Then there must exist an index i  such that q i  = 1 or

tri ∈ / {a2 , . . . , ak }. For i = 1, . . . , k − 1, let qi = min({qi , 1}) and xi = qi a1 + tri . It is easy to verify that N + x1 , x1 , . . . , xk −1 contains S although it belongs to NE–LINSETk [ N ] and it is a proper subset of L = N + P – again a contradiction.

2

Lemmas 36 and 38, in combination with the first assertion of Lemma 24, lead to the following corollary. Corollary 39. k ≤ I W (NE–LINSETk ) ≤ RTD+ (NE–LINSETk ) ≤ k + 1. The lower bound k on RTD+ (NE–LINSETk ) can be improved to k + 1, as the following lemma shows. Lemma 40. For all k ≥ 1, RTD+ (NE–LINSETk ) ≥ k + 1. Proof. We first discuss the case k = 1. Consider any set of the form L = c + 1 for some constant c ≥ 2. It suffices to show that any positive teaching set T that distinguishes L from all its proper subsets in NE–LINSET1 must contain at least two examples. Clearly, c ∈ T + , because otherwise the set c + 1 + 1 would be consistent with T too. But T = {c } is impossible, because otherwise the set c + 2 would be consistent with T too. Hence RTD+ (NE–LINSET1 ) ≥ 2. Assume now that k ≥ 2. We will pursue the following strategy: 1. We define a set L ∈ NE–LINSETk of the form L = N + P + 1 . If RTD+ ( L , NE–LINSETk ) ≥ k + 1, then we are done. We may therefore assume that there is a positive teaching plan for NE–LINSETk whose teaching set T for L is of size at most k. 2. We define a second set L  = N + G ∈ NE–LINSET that is (k, N )-special and consistent with T . Moreover, L  \ L = { N }. If this can be achieved, then the proof will be accomplished as follows:

• According to Lemma 38, we need k examples for distinguishing L  from all its proper subsets in NE–LINSETk [ N ]. • Since L  is consistent with T , it must occur in the teaching plan before L, so that we need a positive example that distinguishes L  from L. But the only positive example that fits this purpose is ( N , +). • Altogether, we need at least k + 1 positive examples for distinguishing L  from all its proper subsets in NE–LINSETk [ N ] and from L. We start with the definition of L. Choose some p ≥ k + 1 so that { p , p + 1, . . . , p + k} is independent. Let M = F ( p , p + 1) = p ( p + 1) − p − ( p + 1). An easy calculation shows that k ≥ 2 and p ≥ k + 1 imply that M ≥ p + k. Let I = { p , p + 1, . . . , M }. Choose N large enough so that all sets of the form

N + P where | P | = k, gcd( P ) = 1 and P ⊆ I are (k, N )-special. With these choices of p and N, let L = N + p + 1 . As already explained in our proof strategy above, we may assume that there is a positive teaching plan for NE–LINSETk whose teaching set T for L is of size at most k. Note that N + p , N + p + 1 ∈ T + , because otherwise one of the sets N + p + 1 + 1 , N + p + 2, 3 ⊂ L would be consistent with T whereas T must distinguish L from all its proper subsets in NE–LINSETk . Setting A = {x : N + x ∈ T }, it follows that | A | = | T | ≤ k and p , p + 1 ∈ A. The set A is not necessarily independent but it contains an independent subset B such that p , p + 1 ∈ B and A = B . Since M = F ( p , p + 1), it follows that any integer greater than M is contained in p , p + 1 . Since B is an independent extension of { p , p + 1}, it cannot contain any integer greater than M. It follows that B ⊆ I . Clearly, | B | ≤ k and gcd( B ) = 1. We would like to transform B into another generating system G ⊆ I such that

B ⊆ G , gcd(G ) = 1 and |G | = k. If | B | = k, we can simply set G = B. If | B | < k, then we can make use of the elements in the independent set { p , p + 1, . . . , p + k} ⊆ I and add them, one after another, to B (thereby removing other elements from B whenever their removal leaves B invariant) until the resulting set G contains k elements. We now define the set L  by setting L  = N + G . Since G ⊆ I = { p , p + 1, . . . , M }, the smallest element in L \ { N } is N + p. Thus L  \ L = { N }, as desired. Moreover, since N had been chosen large enough, the set L  clearly is (k, N )-special. Since L and L  have all properties that are required by our proof strategy, the proof of the lemma is complete. 2 Combining Lemma 40 with Corollary 39 and Proposition 35 gives the main result of this section.

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.15 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

15

Theorem 41. RTD+ (NE–LINSETk ) = RTD+ 1 (NE–LINSETk ) = k + 1. The lower bound on RTD(NE–LINSETk ) in the following theorem may be obtained by adapting the proof of the corresponding result for CF–LINSETk ; the proof of [16, Theorem 6] immediately implies that TD(NE–LINSETk ) = ∞. Theorem 42. For all k ≥ 1, k − 1 ≤ RTD(NE–LINSETk ) ≤ k + 1 and TD(NE–LINSETk ) = ∞. Remark 43. Our results on NE–LINSET may be generalised to classes of linear subsets of Nm 0 for any m > 1 in the following way. For each m, define m := {c + P : c ∈ Nm (i) NE–LINSETm 0 ∧ P ⊂ N0 ∧ | P | ≤ k ∧ k  m m (ii) NE–LINSET := k∈N NE–LINSETk .



p∈ P

p 1 ≤ c 1 }.

) = Note that NE–LINSETk1 = NE–LINSETk and NE–LINSET1 = NE–LINSET. Then one has RTD+ (NE–LINSETm k m RTD+ (NE–LINSETk ) = k + 1, k − 1 ≤ RTD(NE–LINSETk1 ) ≤ RTD(NE–LINSETm ) and RTD ( NE–LINSET ) = RTD ( NE–LINSET )= k RTD+ (NE–LINSETm ) = RTD+ (NE–LINSET) = ∞. 6. Linear subsets of N 20 with constant 0 Finally, we consider how our preceding results may be extended to general classes of linear subsets of higher dimensions. Finding teaching sequences for families of linear sets with dimension m > 1 seems to present a new set of challenges, as many of the proof methods for the case m = 1 do not carry over directly to the higher dimensional cases. The classes of linear subsets of N20 briefly studied in this section are denoted as follows. In the first definition, k ∈ N. (i) LINSETk2 := { P : P ⊂ N20 ∧ ∃ p ∈ P [ p = 0] ∧ | P | ≤ k}.

(ii) LINSET2=2 := LINSET22 \ LINSET21 .

We will show later that the RTD of LINSET2=2 is either 3 or 4. The following result suggests that to identify interesting classes of linear subsets of Nm 0 for m > 1 that have finite teaching complexity measures, it might be a good idea to first exclude certain linear sets. Proposition 44. RTD({ (0, 1) }, LINSET22 ) = ∞. Proof. Assume that R = ((L0 , d0 ), (L1 , d1 ), . . .) were a teaching subsequence for LINSET22 covering { (0, 1) }. Suppose that 

(0, 1) ∈ Li and T were a teaching set for (0, 1) w.r.t. LINSET22 \ j max({d j : j < i }) such that N is larger than every component of any instance (a, b) ∈ N20 in T . Further, let p 0 , . . . , p N +i be a strictly increasing sequence of primes. Observe that by the choice of N, (0, 1), p 0 . . . p N +i (1, 1) is consistent with T . Hence this linear set occurs in some  L j0 with  j 0 < i. In addition, since,  for any two distinct( N + i )-subsets S , S of { p 0 , . . . , p N +i }, (0, 1), p 0 . . . p N +i (1, 1) 

(0, 1), x∈ S x(1, 1) and (0, 1), x∈ S x(1, 1) ∩ (0, 1), x∈ S   x(1, 1) ⊆ (0, 1), p 0 . . . p N +i (1, 1) , the choice of N again gives that for some ( N + i )-subset S 1 of { p 0 , . . . , p N +i }, (0, 1), x∈ S 1 x(1, 1) ∈ L j 1 for some j 1< j 0 . The preceding line of argument can be applied again to show that for some ( N + i − 1)-subset S 2 of S 1 , (0, 1), x∈ S 2 x(1, 1) ∈ L j 2 for some j 2 < j 1 .Repeating the argument successively thus yields a chain S 1  S 2  . . .  S i of subsets of { p 0 , . . . , p N +i } such that

(0, 1), x∈ S l x(1, 1) ∈ L jl for all l ∈ {1, . . . , i }, where j i < . . . < j 1 < j 0 < i, which is impossible as j i ≥ 0. Hence there is no teaching subsequence of LINSET22 covering { (0, 1) }.

2

One can define quite a meaningful subclass of LINSET22 that does have a finite RTD. LINSET2=2 consists of all linear sets in LINSET22 that are strictly 2-generated. Examples of strictly 2-generated linear subsets include (1, 0), (0, 1) and (4, 6), (6, 9) .

(0, 1) is not a strictly 2-generated linear subset. Theorem 45. (i) TD(LINSET2=2 ) = ∞. (ii) LINSET2=2 does not have any positive teaching sequence. (iii) RTD(LINSET2=2 ) ∈ {3, 4}. Proof. Assertion (I). Observe from the proof of Proposition 44 that for any N distinct primes p 0 , p 1 , . . . , p N −1 , TD( (0, 1), p 0 p 1 . . . p N −1 (1, 0) , LINSET2=2 ) ≥ N. Hence TD(L, LINSET2=2 ) = ∞ for any cofinite subclass L of LINSET2=2 .

JID:TCS AID:11396 /FLA

16

Doctopic: Algorithms, automata, complexity and games

[m3G; v1.226; Prn:8/12/2017; 12:57] P.16 (1-17)

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

Fig. 1. p 1 and p 2 (not drawn to scale).

 Assertion (II). Let p 1 , p 2 , p 3 , . . . be a strictly increasing infinite sequence of primes. Note that for all j, (2, 0), (3, 0)  (1, 0), p 1 . . . p j (0, 1) . Further, (1, 0), p 1 (0, 1)  (1, 0), p 1 p 2 (0, 1)  (1, 0), p 1 p 2 p 3 (0, 1)  . . .  (1, 0), p 1 . . . p j (0, 1) 

(1, 0), p 1 . . . p j p j +1 (0, 1)  . . . is an infinite descending chain in LINSET2=2 . Thus by Proposition 11, LINSET2=2 does not have

a positive teaching sequence. Assertion (III). The main idea is that for each strictly 2-generated linear subset S of N20 with canonical representation (0, P ) , if M denotes  the class of all S  ∈ LINSET2=2 for which each S  ∈ M with canonical representation (0, P  ) satisfies p  ∈ P  p  1 ≥ p ∈ P p 1 , then TD( S , M ) ≤ 4. The sequence ((L0 , d0 ), (L1 , d1 ), . . .) defined by Li = { u 1 , u 2 :

u 1 + u 2 1 = i + 2} would then be a teaching sequence for LINSET2=2 of order at most 4. To prove this assertion, it suffices 2   to find a teaching set of size at most 4 for any u 1 , u 2 w.r.t. the class of all S ∈ LINSET=2 such that if S has the canonical representation (0, P  ), then p  ∈ P  p  1 ≥ u 1 + u 2 1 .

Case (i): {u 1 , u 2 } is linearly independent. For a given linear set L with canonical representation (c , P ), call each p ∈ P a minimal period of L. Assume that u 1 lies to the left of u 2 . Consider the set A of linear sets L in M such that u 1 , u 2  L. Since no single vector in N20 can generate two linearly independent vectors in N20 , each L ∈ A must have two linearly independent periods p 1 and p 2 , neither of which lies strictly between u 1 and u 2 ; in addition, max({ p 1 1 , p 2 1 }) ≤ max({ u 1 1 , u 2 1 }). Thus A is finite. Furthermore, for each L ∈ A with canonical representation (0, P  ), at least one    of the periods in P is not parallel to u 1 and also not parallel to u 2 , for otherwise p  ∈ P  p 1 < u 1 + u 2 1 . If  A = ∅, then {(u 1 , +), (u 2 , +)} is a teaching set for u 1 , u 2 w.r.t. M. Assume that A = ∅. Consider the set Q = L ∈ A { w : w is a minimal period of L not parallel to u 1 and not parallel to u 2 }. Choose some p 1 among the periods in Q that are closest to u 1 to the left of u 1 , and choose p 2 so that p 2 is among the periods in Q that are closest to u 2 to the right of u 2 (see Fig. 1); note that at least one of p 1 , p 2 exists. For every L ∈ A with canonical representation (0, { v 1 , v 2 }), at least one of p 1 and p 2 lies between (not necessarily strictly) v 1 and v 2 , and {kp 1 , k p 2 } ∩ u 1 , u 2 = ∅ for all k, k ∈ N. Thus there is a sufficiently large K ∈ N such that for all L ∈ A, either K p 1 ∈ L \ u 1 , u 2 or K p 2 ∈ L \ u 1 , u 2 . Therefore a teaching set for

u 1 , u 2 w.r.t. M is {(u 1 , +), (u 2 , +), ( K p 1 , −), ( K p 2 , −)}. If p i does not exist for exactly one i, then remove ( K p i , −) from this teaching set. Case (ii): {u 1 , u 2 } is linearly dependent. Then u 1 and u 2 can be expressed as ma and na respectively for some a ∈ Q2 and some m, n > 1 such that gcd(m, n) = 1. Consider the set B of linear sets L in M such that u 1 , u 2  L. The case B = ∅ can be dealt with as in Case (i). Assume that B = ∅. Given any L ∈ B, let w 1 and w 2 be the periods of L. Then either u 1 or u 2 cannot be written in the form λ1 w 1 + λ2 w 2 for some λ1 , λ2 > 0, for otherwise the sum of the components of L’s periods would be less than u 1 + u 2 1 . Thus exactly one of the periods { w 1 , w 2 }, say w 1 , is such that ma = k1 w 1 and na = k2 w 1 for some k1 , k2 ∈ N. Hence mk2 = nk1 , and as gcd(m, n) = 1, k2 = q1 n and k1 = q2 m for some q1 , q2 ∈ N. This implies that a = q2 w 1 = q1 w 1 ∈ L \ u 1 , u 2 , so that {(u 1 , +), (u 2 , +), (a, −)} is a teaching set for u 1 , u 2 with respect to M. Now define L2i = { u 1 , u 2 : u 1 + u 2 1 = i + 2 ∧ {u 1 , u 2 } is linearly independent} and L2i +1 = { u 1 , u 2 : u 1 + u 2 1 = i + 2 ∧ {u 1 , u 2 } is linearly dependent} for all i ∈ N0 . The analyses of Cases (i) and (ii) show that ((L0 , d0 ), (L1 , d1 ), . . .) is indeed a teaching sequence for 2LINSET0=2 of order at most 4. To prove the lower bound of 3, suppose for a contradiction that R = ((Q0 , d0 ), (Q1 , d1 ), . . .) were a teaching sequence of order 2 for LINSET2=2 . Let i be the minimum index for which Qi contains some L ∈ LINSET2=2 generated by two linearly dependent vectors mw and nw for some w ∈ N2 and integers m, n > 1 with gcd(m, n) = 1. Choose i + 1 primes p 0 , . . . , p i with p 0 < . . . < p i and v 1 , v 2 ∈ N2 such that v 1 lies to the left of w and v 2 lies to the right of w. Let p be any prime such that p > max({m, n}). Since mw , np w  L, the choice of i implies that a recursive teaching set T for L with respect to R must contain at least one positive example, say (m w , +). As p 0 . . . p i v j , m w is consistent with {(m w , +)} for any j ∈ {1, 2}   and p 0 . . . p i v 1 , m w ∩ p 0 . . . p i v 2 , m w ⊆ m w  L, at least one of p 0 . . . p i v 1 , m  w and p 0 . . . p i v2 , m w belongs to some Q j 0 with j 0 < i; suppose that p 0 . . . p i v 1 , m w satisfies this condition. As p 0 . . . p i v 1 , m w  p 0 . . . p i −1 v 1 , m w       and {kw : k ∈ N0 } ∩ p 0 . . . p i v 1 , m w = {kw : k ∈ N0 } ∩ p 0 . . . p i −1 v 1 , m w , either p 0 . . . p i −1 v 1 , m w belongs to some Q j 1   with j 1 < j 0 , or any recursive teaching set T for p 0 . . . p i v 1 , m w with respect to R contains at least one negative example / {kw : k ∈ N0 }. In the latter case, there is some prime p  > m such that { v 4 , −} is also with ( v 4 , −) such  that v 4 ∈   consistent   m w , p w ; by the choice of i, T must contain a positive example ( v 3 , +) to distinguish p 0 . . . p i v 1 , m w from m w , p  w . Further, there exists some L  = p 0 . . . p i v 5 , v 3 such that v 5 lies strictly between v 4 and v 3 , so that L  is consistent with T and must belong to some Q j 1 with j 1 < j 0 . One can then apply the preceding line of argument again to either L  or p 0 . . . p i −1 v 1 , m w and yield some L  ∈ LINSET2=2 such that L  has a linearly independent set of periods, one of which is equal to p 0 . . . pl v  for some l ≥ i − 2,

JID:TCS AID:11396 /FLA

Doctopic: Algorithms, automata, complexity and games

Z. Gao et al. / Theoretical Computer Science ••• (••••) •••–•••

[m3G; v1.226; Prn:8/12/2017; 12:57] P.17 (1-17)

17

v  ∈ N2 , and L  ∈ Q j 2 for some j 2 < j 1 . Repeating the argument successively gives i + 1 linear sets of LINSET2=2 contained in Q j 0 , . . . , Q j i such that 0 ≤ j i < j i −1 < . . . < j 0 < i, which is impossible. Hence RTD(LINSET2=2 ) ≥ 3. 2 7. Conclusion We have studied two main teaching parameters, the TD and RTD (as well as their variants, TD+ and RTD+ ), of classes of linear sets with a fixed dimension. Notice that in Table 1, even though all the classes have an infinite TD, there are finer notions of teachability that occasionally yield different finite sample complexity measures. In particular, there are families of linear sets that have an infinite TD and RTD+ and yet have a finite RTD. We broadly interpret a class that has an infinite RTD as being “unteachable” in a stronger sense than merely having an infinite TD. Quite interestingly, the fact that some classes in Table 1 have an infinite RTD contrasts with Takada’s [22] theorem that the family of linear subsets of Nm 0 is learnable in the limit from just positive examples. One possible interpretation of this contrast is that classes of linear sets may be generally harder to teach than to learn. Further, a number of quantitative problems remain open. For example, we did not solve the question of whether RTD(LINSETk ) is finite for each k > 3. A more precise analysis of the values of RTD for various families of linear sets studied in the present paper (see Table 1) would also be desirable. Acknowledgements We would like to thank the anonymous reviewers for their insightful comments, which helped to improve the presentation of the paper. We would also like to thank the referees of ALT 2015 for their critical reading of the earlier conference version. Sandra Zilles was partially supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) discovery grant, application number 386246–2010, and by the NSERC Canada Research Chairs program. References [1] [2] [3] [4]

[5] [6] [7] [8] [9] [10]

[11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24]

Naoki Abe, Polynomial learnability of semilinear sets, in: Proceedings of the 2nd Annual Conference on Learning Theory (COLT), 1989, pp. 25–40. Dana Angluin, Finding patterns common to a set of strings, J. Comput. System Sci. 21 (1980) 46–62. Dana Angluin, Inductive inference of formal languages from positive data, Inf. Control 45 (2) (1980) 117–135. Malte Darnstädt, Thorsten Doliwa, Hans U. Simon, Sandra Zilles, Order compression schemes, in: Algorithmic Learning Theory, Twenty-Fourth International Conference, ALT 2013, Singapore, October 6–9, 2013, Proceedings, in: Lecture Notes in Computer Science Proceedings, vol. 8139, Springer, 2013, pp. 173–187. Thorsten Doliwa, Gaojian Fan, Hans U. Simon, Sandra Zilles, Recursive teaching dimension, VC-dimension and sample compression, J. Mach. Learn. Res. 15 (1) (2014) 3107–3131. David Doty, Matthew J. Patitz, Scott M. Summers, Limitations of self-assembly at temperature 1, Theoret. Comput. Sci. 412 (1–2) (2011) 145–158. Emanuele Frittaion, Alberto Marcone, Linear extensions of partial orders and reverse mathematics, Math. Log. Q. 58 (6) (2012) 417–423, available at https://arxiv.org/pdf/1203.5207v2.pdf. Ralf Fröberg, Christian Gottlieb, Roland Häggkvist, On numerical semigroups, Semigroup Forum 35 (1987) 63–83. Ziyuan Gao, Zeinab Mazadi, Regan Meloche, Hans U. Simon, Sandra Zilles, Distinguishing pattern languages with membership examples, Manuscript, 2014. Ziyuan Gao, Hans U. Simon, Sandra Zilles, On the teaching complexity of linear sets, in: Algorithmic Learning Theory, Twenty-Sixth International Conference, ALT 2015, Banff, AB, Canada, October 4–6, 2015, Proceedings, in: Lecture Notes in Computer Science Proceedings, vol. 9355, Springer, 2015, pp. 102–116. Sally A. Goldman, Michael J. Kearns, On the complexity of teaching, J. Comput. System Sci. 50 (1) (1995) 20–31. Sally A. Goldman, H. David Mathias, Teaching a smarter learner, J. Comput. System Sci. 52 (2) (1996) 255–267. Kunihiko Hiraishi, Atsunobu Ichikawa, On structural conditions for weak persistency and semilinearity of Petri nets, Theoret. Comput. Sci. 93 (2) (1992) 185–199. Oscar H. Ibarra, Reversal-bounded multicounter machines and their decision problems, J. ACM 25 (1) (1978) 116–133. Oscar H. Ibarra, Zhe Dang, Omer Egecioglu, Catalytic P systems, semilinear sets, and vector addition systems, Theoret. Comput. Sci. 312 (2–3) (2004) 379–399. Zeinab Mazadi, Ziyuan Gao, Sandra Zilles, Distinguishing pattern languages with membership examples, in: LATA 2014, Proceedings, in: Lecture Notes in Computer Science Proceedings, Springer, 2014, pp. 528–540. Rohit J. Parikh, On context-free languages, J. ACM 13 (4) (1966) 570–581. José C. Rosales, Pedro A. García-Sánchez, Numerical Semigroups, Springer, New York, 2009. Ayumi Shinohara, Satoru Miyano, Teachability in computational learning, New Gener. Comput. 8 (4) (1991) 337–347. Hans U. Simon, Sandra Zilles, Open problem: recursive teaching dimension versus VC dimension, in: Proceedings of the 28th Annual Conference on Learning Theory (COLT), 2015, pp. 1770–1772. Rani Siromoney, On equal matrix languages, Inf. Control 14 (2) (1969) 135–151. Yuji Takada, Learning semilinear sets from examples and via queries, Theoret. Comput. Sci. 104 (2) (1992) 207–233. Hideki Yamasaki, On weak persistency of Petri nets, Inform. Process. Lett. 13 (3) (1981) 94–97. Sandra Zilles, Steffen Lange, Robert Holte, Martin Zinkevich, Models of cooperative teaching and learning, J. Mach. Learn. Res. 12 (2011) 349–384.