Information Elsevier
Processing
Letters
47 (1993) 81-88
20 August
1993
On the non-existence of maximal inference degrees for language identification Sanjay Jain Institute of Systems Science, National
University of Singapore, Singapore 0511, Singapore
Arun Sharma School of Computer
Science and Engineering,
University of New South Wales, Sydney, NSW 2033, Australia
Communicated by L.A. Hemaspaandra Received 16 February 1993 Revised 27 April 1993
Abstract Jain, S. and A. Sharma, On the non-existence Processing Letters 47 (1993) 81-88.
of maximal
inference
degrees
for language
identification,
Information
Identification of grammars (r.e. indices) for recursively enumerable languages from positive data by algorithmic devices is a well-studied problem in learning theory. The present paper considers identification of r.e. languages by machines that have access to membership oracles for noncomputable sets. It is shown that for any set A there exists another set B such that the collections of r.e. languages that can be identified by machines with access to a membership oracle for B is strictly larger than the collections of r.e. languages that can be identified by machines with access to a membership oracle for A. In other words, there is no maximal inference degree for language identification. Keywords:
Machine
learning;
inductive
inference;
methodology
of science;
recursion
theory;
theory
of computation
1. Introduction
A model for a subject learning a concept could be described thus. At any given time, the subject receives some finite data about the concept. Based on this data, the subject conjectures a hypothesis about the concept. Availability of more data may cause the subject to revise its hypotheses. The subject is said to learn the concept just in case the sequence of hypotheses conjectured by the subject converges to a fixed hypothesis and this final hypothesis is a correct representation of the concept. This is essentially the theme of Gold’s [93 identification in the limit when the subject is an algorithmic device. Identification of programs for computable functions from their graphs and identification of grammars (r.e. indices) for recursively enumerable languages from positive data are two extensively studied problems in the above framework. Lately, with a view to gain a deeper insight into the process of identification, there has been considerable interest in investigating identification by devices that have access to membership oracles for non-computable sets [1,8,7]. This study has mainly concentrated on
Correspondence to: Professor S. Jain, Institute of Systems Science, Ridge, Singapore 0511, Singapore. Email:
[email protected]. 0020-0190/93/%06.00
0 1993 - Elsevier
Science
Publishers
National
University
B.V. AII rights reserved
of Singapore,
Heng Miu Keng Terrace,
Kent
81
Volume
47, Number
2
INFORMATION
PROCESSING
LETTERS
20 August
1993
identification of functions; a representative result being that there exists a single machine with access to an oracle for the halting set that identifies every computable function [l]. The present paper investigates identification of recursively enumerable languages from positive data by machines with access to membership oracles for noncomputable sets. It is shown that for this problem, there is no maximal inference degree in the sense that for every set A there exists another set B such that the collections of r.e. languages that can be identified by machines with access to a membership oracle for B is strictly larger than the collections of r.e. languages that can be identified by machines with access to a membership oracle for A. Similar results are shown about more general identification criteria, namely vacillatory language identification and behaviorally correct language identification. We now proceed formally. In Section 2 and Section 3, we introduce the notation and definitions, respectively. Section 4 contains the results.
2. Notation Recursion-theoretic concepts not explained below are treated in [16]. N is the set of natural numbers, (0, 1, 2, . . .}; N+ is the set of positive integers, 11, 2, 3, . . .I. The symbols a, b, i, j, m, II, s, t, x, and y, with or without decorations (decorations are subscripts, superscripts and the like), range over natural numbers unless otherwise specified. 5, c , denote subset, proper subset, respectively. E denotes “element of”. fl denotes the empty set. A, B, C, D, S range over sets of natural numbers; we usually reserve D to denote finite sets. We denote the cardinality of the set S by card(S). AAB denotes the symmetric difference of A and B, i.e. (A - B) U (B -A). The symbol * denotes “finite but unbounded” such that (Vn>[n < * < ~1. For a EN U { * 1, we say that A =a B iff card (AAB) G a. Thus, A =* B means that card (AAB) is finite. max(->, min(-) denote the maximum and minimum of a set, respectively. By convention max(@) = 0 and min(@>= m. For a partial recursive function n, domain(q) denotes the domain of n. 1 denotes defined. t denotes undefined. L, with or without decorations, ranges over recursively enumerable (r.e.> subsets of N. 8 denotes the class of all r.e. languages. 9, with or without decorations, ranges over subsets of 8. cp denotes a standard acceptable programming system (also referred to as standard acceptable numbering) [15,16], ‘pi denotes the partial recursive function computed by the ith program in the standard acceptable programming system cp. @ denotes an arbitrary fixed Blum complexity measure [3,101 for the q-system. &. denotes the domain of ‘pi. W, is, then, the r.e. set/language (L N) accepted by q-program i. We also think of the r.e. index i for W, as a grammar for accepting Wi. u/;,, denotes the set {X G s ( Q&x) GS}. FIN denotes the set of all non-empty finite sets. We do not include the empty set in our definition of FIN to facilitate the presentation of our proof of Theorem 5. Note that an effective canonical indexing for FIN can be easily obtained. For n EN, C,” and Ilz denote the nth level of the Kleene hierarchy relativized with respect to C [161. (i, j) stands for an arbitrary computable one to one encoding of all pairs of natural numbers onto N [16]. The quantifiers “V”’ and “3”” mean “for all but finitely many” and “there exist infinitely many”, respectively.
3. Preliminaries In this section, we briefly adopt notions and results from the recursion-theoretic machine learning literature to learning with oracles. We first introduce a notion that facilitates discussion about elements of a language being fed to a learning machine. 82
Volume
47, Number
2
INFORMATION
PROCESSING
LETTERS
20 August
1993
A finite sequence is a mapping from {x 1x < a), for some a E N, into (N U {#)I. We let u and 7, with or without decorations, range of finite sequences. The content of a finite sequence U, denoted content(a), is the set of natural numbers in the range of u. Intuitively, #‘s represent pauses in the presentation of data. The length of u, denoted I (T I, is the number of elements in the domain of u. u c T means that (+ is an initial sequence of T. SEQ denotes the set of all finite sequences. An oracular learning machine is an algorithmic device which has access to a membership oracle and which computes a mapping from SEQ into N. Without loss of generality, we assume our oracle learning machines to be total with respect to any oracle. We let M, with or without decorations, range over oracular learning machines. MA(a) denotes the hypothesis conjectured by machine M with access to a membership oracle for A on input u. A text T for a language L is a mapping from N into (N U {#)) such that L is the set of natural numbers in the range of T. The content of a text T, denoted content(T), is the set of natural numbers in the range of T. We let T, with or without decorations, range over texts. T(n) denotes the nth element of text T. T[nl denotes the finite initial sequence of T with length n. Hence, domain(T[nl) = {x I x
Suppose M is an oracular learning machine and T is a text. Let A LN. MA( (read: MA(T) converges) iff (3i)(W’n)[MA(T[n]) = i]. If MA(T>J, then MA(T) is defined to be the unique i such that (Wn)[MA(T[n]) = i]; otherwise we say that MA(T) diverges (written: M*(T)? ). 1. Let A c N. Let a E N U { * 1. (1) M OATxfExa-identifies L (written: L E OATxfExa(M)) iff (V texts T for L)[MA(T)J A WIM~CTj =P Ll. (2) OATxtExa = (9~8 1(3M)[_5?~ OATxtEx=(M)]). If A is a recursive then OATxtExa is the same class as the class TxtEx” defined in the literature OATxtExo is usually written OATxtEx. Definition
[6].
3.2. Vacillatory language identification
We now extend the notion of oracular identification to vacillatory learning [4,13]. Intuitively, according to this criteria of success, instead of converging to a single index for the language, the machine converges to a finite set of indices, all of which are correct. We first introduce the notion of an oracular learning machine finitely converging on a text. Let M be an oracular learning machine and T be a text. Let A c N. MA(T) finitely-converges (written: MA(T)J) iff {MA(u)1 UC: T) is finite; otherwise we say that MA(T) finitely-diverges (written: MA(T)fl). If MA(T) J, then MA(T) is defined as P, where P = (i I(3”u c T)[MA(u) = i]). Let ~EN+U{*). Let a eNU{*). L (written: L E OATxfFexg(M)) iff (V texts T for LX3P [card(P) G b A (Vi E P)[K =P L])[MA(T)I[
Definition
2. Let A cN.
(1) M OATxfFexg-identifies
A MA(T) = PI. (2) OATxtFex,” = (9 I(3M)[_Yc OATxfFex,“(M)]). If A is recursive then OATxfFexz is the same class as the class TxtFexg defined in the literature [13,41. 83
Volume
47. Number
INFORMATION
2
PROCESSING
20 August
LEl-l-ERS
1993
3.3. Behaviorally correct language identification We extend the notion of oracular identification
to TxtBc”-identification
3. Let A cN. Let a ENU{*). (1) M OATxtBca-identifies L (written: L E OATxtBca(M)) (V texts T for L W’ n) [WM~cT,nIj2 L].
[6,14,13].
Definition
(2) OATxtBca
= (9’ I(3M)[_Yc
iff
OATxtBcn(M)]}.
4. Results 4.1. 0 ATxtEx”-identification
The following lemma is a relativization
of a central result about TxtEx”-identification
[2,12].
Lemma 4. Let a EN U (*}. Let A G N. Zf M OATxtExa-identifies L, then there exists a u such that (1) content(a) c L, (2) WM~cUj2 L, and (3) (VT I u L T A content(r) c L>[MA(a) = MA(r)]. u is referred to as an OATxtExn-locking sequence for M on L. Theorem 5. Let a EN. Consider a sequence of sets, A,, A,, A,. . . . Consider the set, B = {(i, x) 1x EAT A i EN). Let C be such that {i I Ai is finite} @ 2:. Then there exists a class of languages 9 such that 9~ OBTxtEx - OCTxtEx”. Proof. Consider the following languages and classes of languages: For iEN, S&N, let Li,s denote
((i,
X) I x E S}
For i E N, let p.= ’
if Ai is finite,
ILi,Nly i
{Li,,
I D E FIN],
otherwise.
Then, let
Note that according to our definition FIN consists of all the finite sets except the empty set. We leave out the empty set for ease of writing the proof of Claim 1 below. For D E FIN, let Gi,, denote a grammar for Li,, such that Gi,, can be found effectively from i and D. Let Gi,N denote a grammar for Li,N.
Now the theorem follows from Claims 1 and 2 below. Claim 1. _!YE OBTxtEx. 84
Volume
47, Number
2
INFORMATION
PROCESSING
LETTERS
20 August
1993
Proof. We construct a machine M that OBTxtEx-identifies .9’. Suppose T is a text for L ES?. Then, M, on seeing the initial sequence T[x], will make use of a number of functions to issue its hypothesis MB(T[x]). Each of these functions will either be recursive or recursive in B. It is clear that elements of L are of the form (i, n) for some fixed i; if Ai is finite then n ranges over N, else n ranges over some finite set. We refer to i as the key of L. Formally,
key(T[x])
=min({x]
U
{.il(~y)[(j,
Y> Econtent(T[x])]}).
We now define three functions g,, g,, and g,. gi(T[xI)
=card({y
g2(
Y> EB}),
T[ x])
= card(content(
T[ xl)).
The information provided by g, and g, is collected in the following function g,. If L is Li,,,, then is 0 for all but finitely many x. If L is finite, then g,(T[xl) is 1 for all but finitely many x. Clearly, g, is recursive in B.
g,(T[xl)
g3(T]x]) It is easy (1) Ai (2) Ai Now, let
= (” 0,
if g,(T[x]) otherwise.
>g2(T[x])y
to verify that the following two statements summarize the properties of g,, g,, and g,. is finite iff L = Li,,, iff (V”x)[g,(T[x]) > g,(T[x])l iff (V”x)[g,(T[xl) = 01. is infinite iff L = Li,, for some D E F’IN iff (V”x)[g,(T[xl) >gg,(T[xl)l iff (VmxXg,(T[xI) = 11. M be defined as follows:
MB(WI)
=
G kw(WlW’
if g,(T[x])
= 0,
%cey(qxjp
if g&%1)
= 1
x)~content(T[x])}.
~D={xl(key(T[x]),
I
It is easy to verify that M indeed OBTxtEx-identifies
3.
q
Claim 2. 3’66 OCTxtEx”. Proof. Suppose by way of contradiction 3’~ OCTxtEx”. Then, we show that the set {i I Ai is finite} E 2;. Suppose M OCTxtEx”-identifies 9. Let i EN. Recall that pi = {Li N} if Ai is finite. Thus, by Lemma 4, there is an OCTxtEx”-locking sequence for M on Li,,, if Ai is finite. On the other hand, if Ai is infinite, then for every u such that content(a) c Li,N, there exist infinitely many L ~2~ such that L Z“ WMcuj. Thus we have Ai is finite
*
(3~ Icontent( a) c Li,N)(V7 I ff C 7 A content( 7) c Li,N) [MC(a)
= MC(r)] .
Thus, Ii I Ai is finite} E Ct. A contradiction.
q
We would like to note that the above theorem can be extended to show that there exists a class of languages 3’ such that 9” E OBTxtEx - OCTxtEx *. A proof can be worked out on the lines of the above proof by taking 3” to be a cylindrification of 9. We leave details to the reader. The following corollary is essentially a restatement of Osherson, Stob and Weinstein’s result that 8 cannot be TxtEx*-identified by even non-computable learning machines [12]. Osherson, Stob, and Weinstein’s proof follows from their observation that Blum and Blum’s [2] locking sequence construction also holds for non-computable devices. 85
Volume
47, Number
Corollary
2
INFORMATION
PROCESSING
20 August
LETTERS
1993
6. (VAIL 8 P 0 ATxtEx * I.
The following corollary says that there does not exist a maximal inference degree for TxtEx”-identification. Corollary
7. Let a EN U { *}. (VA)(3B)[OATxtEx”
c OBTxtExa].
Kummer and Stephan [ll] have recently refined our results. 4.2. 0 ATxtFex $-identification We now show that an analog of Theorem 5 holds for 0 ATxtFexg-identification. As in the case of Theorem 5, we only give a proof of the cases where a EN; the a = * case can similarly be obtained by considering a cylindrification of class used in the proof of other cases. But first, we present the following lemma, which is a relativization of a result about TxtFexg-identification [13,4]. Lemma 8. Let A c N. Let a E N U { * ). Let b E N+ U { * ). u E SEQ and D E FIN such that
ZfM
0 ATxtFex g-identifies L, then there exists a
(1) content(a) CL, (2) card(D) =Gb A (Vi E D)[w. A L], and (3) (VT I u c T A content(r) c L>[MA(7) E 01. 9. Let a E N. Consider a sequence of sets, A,, A,, A,. . . . Consider the set, B = {(i, x >I x E Ai A i EN). Let C be such that (i I Ai is finite) 642 $. Then there exists a class of languages _Y such that _YE OBTxtEx - OCTxtFex”, . Theorem
Proof. Consider the collection of languages _.Yfrom the proof of Theorem 5. We only need to show that _Y$ZOCTxtFex”, . Suppose by way of contradiction, _!Z= O=TxtFex”, . We show that {i 1Ai is finite) E Ct. Suppose M OCTxtFex”,-identifies 9. Now, using Lemma 8 and proceeding as in the proof of Claim 2, we have that Ai is finite iff (3~ Icontent c Li,,)(g a finite set D)(Vr I u G T A content(r) c Li,NHMC(~> E D]. Thus, {i I Ai is finite) E Ct. A contradiction. 0
As already stated, we leave it to the reader to extend the above result for the a = * case. The following corollary is essentially a restatement of Osherson, Stob and Weinstein’s result that 8 cannot be TxtFex *,-identified by even non-computable learning machines [12]. Corollary
10. (VA)[8 G 0 A TxtFex Z I.
The following corollary says that there does not exist a maximal inference degree for TxtFexg-identification. Corollary
11. Let a EN U {*I. Let b E N+U{ *}. (VAX3B)[OATxtFex;
c OBTxtFexg].
4.3. OATxtBca-identification We now show that a similar result also holds for behaviorally correct language identification. following lemma is a relativized version of a result about TxtBc”-identification [13,6]. 86
The
Volume
47, Number
Lemma 12.
If
2
INFORMATION
M OATxfBc”-identifies
PROCESSING
LETTERS
20 August
1993
L, then there exists a u such that
(1) content(a) c L, (2) WM~caj2 L, and (3) (V7 I u c 7 A content(T) c L)[WM_4c7j2 L]. Theorem 13. Consider a sequence of sets, A,, A 1, A 2 . . . . Consider the set, I3 = {(i, x) I x EA, A i EN). Let C be such that (i 1Ai is finite) 4 Z$. Then there exists a class of languages 22 such that _Y E OBTxfEx - OCTxtBc *. Proof. Again, consider the collection show that _FG OCTxtBc*.
of languages _.F from the proof of Theorem
5. We only need to
Suppose by way of contradiction, _YE OCTxtBc*. We then show that {i I Ai is finite) E Zf. Suppose M OCTxtBc*-identifies _Y. Now, Ai is finite implies M O’TxtBc*-identifies Li,,,, which implies (by Lemma 12) that (3a Icontent( a) c 15~,~)(V7 I u c 7 A content( 7) E LI,N) [ WM~cTj=* L,,,]
,
which in turn implies that (3a (content(a)
CLi,N)(V7 I u c 7 A content(~)
G
Li,N) [ WMcc7) is infinite] .
c
Li,N)
AISO,
(3~7 Icontent( c) c
Li,~)(vT
Iu
c
7 A
content(~)
implies that M does not OCTxtBc *-identify any Li,,
[ WMcCTj
such that content(u)
is infinite] & Li,D, thereby implying that
Ai is finite.
Thus, we have Ai is finite
e
(37 (content(u)
z Li,N)(V~ Iu c
7 A
content( 7) cLi,N)
[ wMcW is finite] . Now, it is easy to see that [W1M~(7j is infinite ] is II: (since, [WMp7, is infinite] iff (VmX3n > m)(3 t)[n ence, (i I Ai is finite) E Cf. A contradiction. q E WMC(7),rl1. H
The following corollary is essentially a restatement of Osherson, Stob and Weinstein’s result that 8 cannot be TxtBc *-identified by even non-computable learning machines [ 121. Corollary 14. (VA)[8 64 OATxtBc *I. The following corollary says that there does not exist a maximal inference degree for TxtBc”-identification. Corollary 15. Let a EN
U (*}.
(VAX3B)[OATxtBc”
c OBTxtBca].
Acknowledgement We wish to thank the referees for many helpful comments, for pointing out an oversight in an earlier version of the paper, and for suggesting that we also consider TxtBc-identification by oracular machines. 87
Volume
47, Number
2
INFORMATION
PROCESSING
LETTERS
20 August
1993
References [l] L. Adleman and M. Blum, Inductive inference and unsolvability, J. Symbolic Logic 56 (3) (1991) 891-900. [2] L. Blum and M. Blum, Toward a mathematical theory of inductive inference, Inform. and Control 28 (1975) 125155. [3] M. Blum, A machine independent theory fo the complexity of recursive functions, J. ACM 14 (1967) 322-336. [4] J. Case, The power of vacillation, in: D. Haussler and L. Pitt, eds., Proc. Workshop on Computational Learning Theory (Morgan Kaufmann, Los Altos, CA, 19881 133142. 151 J. Case, The power of vacillation in language learning, Tech. Rept. 93-08, University of Delaware, 1992. [6] J. Case and C. Lynes, Machine inductive inference and language identification, in: M. Nielsen and E.M. Schmidt, eds., Proc. 9th Internat. Colloq. on Automata, Languages and Programming, Lecture Notes in Computer Science 140 (Springer, Berlin, 1982) 107-115. [7] P. Cholak, R. Downey, L. Fortnow, W. Gasarch, E. Knber, M. Kummer, S. Kurtz and T. Slaman, Degrees of inferability, in: Proc. Fifth Ann. Workshop on Computational Learning Theory, Pittsburgh, PA (ACM Press, New York, 1992) 180-192. 181 W.I. Gasarch and M.B. Pleszkoch, Learning via queries to an oracle, in: R. Rivest, D. Haussler and M.K. War-
88
[9] [lOI
[ill
[12]
[13] [14] I151 [16]
muth, eds., Proc. Second Ann. Workshop on Computational Learning Theory, Santa Cruz, CA (Morgan Kaufmann, Los Altos, CA, 19891 214-229. E.M. Gold, Language identification in the limit, Inform. and Control 10 (1967) 441-414. J. Hopcroft and J. Ullman, Introduction to Automata Theory Languages and Computation (Addison-Wesley, Reading, MA, 1979). M. Kummer and F. Stephan, On the structure of degrees of inferability, in: Proc. Sixth Ann. Workshop on Computational Learning Theory, Santa Cruz, CA (ACM Press, New York, 1993). D. Osherson, M. Stob and S. Weinstein, Systems that Learn, An Introduction to Learning Theory for Cognitive and Computer Scientists (MIT Press, Cambridge, MA 1986). D. Osherson and S. Weinstein, Criteria of language learning, Inform. and Control 52 (1982) 123-138. D. Osherson and S. Weinstein, A note on formal learning theory, Cognition 11 (1982) 77-88. H. Rogers, Giidel numberings of partial recursive functions, J. Symbolic Logic 23 (1958) 331-341. H. Rogers, Theory of Recursive Functions and Effectiue Computability (McGraw Hill, New York, 1967); Reprint: MIT Press, Cambridge, MA, 1987.